* [PATCH 00/35] [PULL] qemu-kvm.git uq/master queue
@ 2011-01-06 17:56 ` Marcelo Tosatti
  0 siblings, 0 replies; 300+ messages in thread
From: Marcelo Tosatti @ 2011-01-06 17:56 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: qemu-devel, kvm, Marcelo Tosatti

The following changes since commit 23979dc5411befabe9049e37075b2b6320debc4e:

  microblaze: Use more TB chaining (2011-01-05 02:23:09 +0100)

are available in the git repository at:
  git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git uq/master

Jan Kiszka (27):
      kvm: x86: Fix DPL write back of segment registers
      kvm: x86: Remove obsolete SS.RPL/DPL aligment
      kvm: x86: Prevent sign extension of DR7 in guest debugging mode
      kvm: x86: Fix a few coding style violations
      kvm: Fix coding style violations
      kvm: Drop return value of kvm_cpu_exec
      kvm: Stop on all fatal exit reasons
      kvm: Improve reporting of fatal errors
      x86: Optionally dump code bytes on cpu_dump_state
      kvm: x86: Align kvm_arch_put_registers code with comment
      kvm: x86: Prepare kvm_get_mp_state for in-kernel irqchip
      kvm: x86: Remove redundant mp_state initialization
      kvm: x86: Fix xcr0 reset mismerge
      kvm: x86: Refactor msr_star/hsave_pa setup and checks
      kvm: x86: Reset paravirtual MSRs
      Synchronize VCPU states before reset
      kvm: x86: Drop MCE MSRs write back restrictions
      kvm: Eliminate KVMState arguments
      kvm: x86: Fix !CONFIG_KVM_PARA build
      kvm: x86: Introduce kvmclock device to save/restore its state
      kvm: Drop smp_cpus argument from init functions
      kvm: Consolidate must-have capability checks
      kvm: x86: Rework identity map and TSS setup for larger BIOS sizes
      kvm: Flush coalesced mmio buffer on IO window exits
      kvm: Do not use qemu_fair_mutex
      kvm: x86: Implicitly clear nmi_injected/pending on reset
      kvm: x86: Only read/write MSR_KVM_ASYNC_PF_EN if supported

Jin Dongming (6):
      Clean up cpu_inject_x86_mce()
      Add "broadcast" option for mce command
      Add function for checking mca broadcast of CPU
      kvm: introduce kvm_mce_in_progress
      kvm: kvm_mce_inj_* subroutines for templated error injections
      kvm: introduce kvm_inject_x86_mce_on

Lai Jiangshan (2):
      kvm: Enable user space NMI injection for kvm guest
      kvm: convert kvm_ioctl(KVM_CHECK_EXTENSION) to kvm_check_extension()

 configure             |   36 ++-
 cpu-all.h             |    5 +-
 cpu-defs.h            |    2 -
 cpus.c                |    2 -
 hmp-commands.hx       |    6 +-
 kvm-all.c             |  447 ++++++++++++-------------
 kvm-stub.c            |    8 +-
 kvm.h                 |   29 +-
 monitor.c             |    7 +-
 target-i386/cpu.h     |    9 +-
 target-i386/cpuid.c   |   14 +-
 target-i386/helper.c  |   97 +++++-
 target-i386/kvm.c     |  882 +++++++++++++++++++++++++++++--------------------
 target-i386/kvm_x86.h |    8 +-
 target-ppc/kvm.c      |   20 +-
 target-s390x/kvm.c    |   12 +-
 vl.c                  |    3 +-
 17 files changed, 929 insertions(+), 658 deletions(-)

* [PATCH 01/35] kvm: Enable user space NMI injection for kvm guest
  2011-01-06 17:56 ` [Qemu-devel] " Marcelo Tosatti
@ 2011-01-06 17:56   ` Marcelo Tosatti
  -1 siblings, 0 replies; 300+ messages in thread
From: Marcelo Tosatti @ 2011-01-06 17:56 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: qemu-devel, kvm, Lai Jiangshan, Marcelo Tosatti

From: Lai Jiangshan <laijs@cn.fujitsu.com>

Use the new KVM_NMI ioctl to inject NMIs into the KVM guest when user space
raises them (for example, via the QEMU monitor's "nmi" command).

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
---
 configure         |    3 +++
 target-i386/kvm.c |    7 +++++++
 2 files changed, 10 insertions(+), 0 deletions(-)
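
The hunk in kvm_arch_pre_run() below follows a test-and-clear pattern: a
pending NMI request bit is consumed exactly once before the vCPU re-enters
the guest. A minimal sketch of that pattern, under stated assumptions: every
sketch_* name and the bit values are hypothetical stand-ins, and the real
kvm_vcpu_ioctl(env, KVM_NMI) call is replaced by a mock that only counts
invocations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for CPU_INTERRUPT_HARD / CPU_INTERRUPT_NMI;
 * the values are illustrative, not QEMU's. */
#define SKETCH_INTERRUPT_HARD 0x002u
#define SKETCH_INTERRUPT_NMI  0x200u

struct sketch_cpu {
    uint32_t interrupt_request; /* bitmask of pending requests */
    int nmi_injected;           /* how often the mock "KVM_NMI" ran */
};

/* Mock of kvm_vcpu_ioctl(env, KVM_NMI): only records the call. */
static void sketch_inject_nmi(struct sketch_cpu *cpu)
{
    cpu->nmi_injected++;
}

/* Consume a pending NMI request before the vCPU re-enters the guest. */
static void sketch_pre_run(struct sketch_cpu *cpu)
{
    assert(cpu != NULL);
    if (cpu->interrupt_request & SKETCH_INTERRUPT_NMI) {
        /* Clear first, so the same request is never delivered twice. */
        cpu->interrupt_request &= ~SKETCH_INTERRUPT_NMI;
        sketch_inject_nmi(cpu);
    }
}
```

Calling sketch_pre_run() twice on a CPU with one pending NMI injects once;
other request bits are left untouched for the regular interrupt path.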

diff --git a/configure b/configure
index 47e4cf0..ec37a91 100755
--- a/configure
+++ b/configure
@@ -1674,6 +1674,9 @@ if test "$kvm" != "no" ; then
 #if !defined(KVM_CAP_DESTROY_MEMORY_REGION_WORKS)
 #error Missing KVM capability KVM_CAP_DESTROY_MEMORY_REGION_WORKS
 #endif
+#if !defined(KVM_CAP_USER_NMI)
+#error Missing KVM capability KVM_CAP_USER_NMI
+#endif
 int main(void) { return 0; }
 EOF
   if test "$kerneldir" != "" ; then
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 7dfc357..755f8c9 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -1417,6 +1417,13 @@ int kvm_arch_get_registers(CPUState *env)
 
 int kvm_arch_pre_run(CPUState *env, struct kvm_run *run)
 {
+    /* Inject NMI */
+    if (env->interrupt_request & CPU_INTERRUPT_NMI) {
+        env->interrupt_request &= ~CPU_INTERRUPT_NMI;
+        DPRINTF("injected NMI\n");
+        kvm_vcpu_ioctl(env, KVM_NMI);
+    }
+
     /* Try to inject an interrupt if the guest can accept it */
     if (run->ready_for_interrupt_injection &&
         (env->interrupt_request & CPU_INTERRUPT_HARD) &&
-- 
1.7.2.3


* [PATCH 02/35] kvm: convert kvm_ioctl(KVM_CHECK_EXTENSION) to kvm_check_extension()
  2011-01-06 17:56 ` [Qemu-devel] " Marcelo Tosatti
@ 2011-01-06 17:56   ` Marcelo Tosatti
  -1 siblings, 0 replies; 300+ messages in thread
From: Marcelo Tosatti @ 2011-01-06 17:56 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: qemu-devel, kvm, Lai Jiangshan, Marcelo Tosatti

From: Lai Jiangshan <laijs@cn.fujitsu.com>

Simple cleanup: use the existing kvm_check_extension() helper instead of
open-coded kvm_ioctl(s, KVM_CHECK_EXTENSION, ...) calls.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
---
 kvm-all.c         |    2 +-
 target-i386/kvm.c |    4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)
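
For readers unfamiliar with the helper, the idea can be sketched as a thin
wrapper around the capability probe. This is an illustration, not the QEMU
implementation: the sketch_* names are hypothetical, the ioctl is replaced
by a callback, and mapping a failed probe to 0 ("not supported") is an
assumption of this sketch. The callers in the diff only test ret > 0 or
ret <= 0, so they behave the same either way.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical probe callback standing in for
 * ioctl(fd, KVM_CHECK_EXTENSION, cap). */
typedef int (*sketch_probe_fn)(int cap);

/* Return the capability value; treat a failed probe as "not supported"
 * (an assumption of this sketch). */
static int sketch_check_extension(sketch_probe_fn probe, int cap)
{
    int ret;

    assert(probe != NULL);
    ret = probe(cap);
    return ret < 0 ? 0 : ret;
}

/* Mock kernel: capability 1 reports 32, capability 2 is absent,
 * anything else fails. */
static int sketch_mock_probe(int cap)
{
    switch (cap) {
    case 1:
        return 32;
    case 2:
        return 0;
    default:
        return -1;
    }
}
```

A positive return can carry a value, as with KVM_CAP_MCE in the diff, where
the result is stored as the maximum number of banks.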

diff --git a/kvm-all.c b/kvm-all.c
index cae24bb..35fc73c 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -678,7 +678,7 @@ int kvm_init(int smp_cpus)
 
     s->broken_set_mem_region = 1;
 #ifdef KVM_CAP_JOIN_MEMORY_REGIONS_WORKS
-    ret = kvm_ioctl(s, KVM_CHECK_EXTENSION, KVM_CAP_JOIN_MEMORY_REGIONS_WORKS);
+    ret = kvm_check_extension(s, KVM_CAP_JOIN_MEMORY_REGIONS_WORKS);
     if (ret > 0) {
         s->broken_set_mem_region = 0;
     }
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 755f8c9..4004de7 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -187,7 +187,7 @@ static int kvm_get_mce_cap_supported(KVMState *s, uint64_t *mce_cap,
 {
     int r;
 
-    r = kvm_ioctl(s, KVM_CHECK_EXTENSION, KVM_CAP_MCE);
+    r = kvm_check_extension(s, KVM_CAP_MCE);
     if (r > 0) {
         *max_banks = r;
         return kvm_ioctl(s, KVM_X86_GET_MCE_CAP_SUPPORTED, mce_cap);
@@ -540,7 +540,7 @@ int kvm_arch_init(KVMState *s, int smp_cpus)
      * versions of KVM just assumed that it would be at the end of physical
      * memory but that doesn't work with more than 4GB of memory.  We simply
      * refuse to work with those older versions of KVM. */
-    ret = kvm_ioctl(s, KVM_CHECK_EXTENSION, KVM_CAP_SET_TSS_ADDR);
+    ret = kvm_check_extension(s, KVM_CAP_SET_TSS_ADDR);
     if (ret <= 0) {
         fprintf(stderr, "kvm does not support KVM_CAP_SET_TSS_ADDR\n");
         return ret;
-- 
1.7.2.3


* [PATCH 03/35] Clean up cpu_inject_x86_mce()
  2011-01-06 17:56 ` [Qemu-devel] " Marcelo Tosatti
@ 2011-01-06 17:56   ` Marcelo Tosatti
  -1 siblings, 0 replies; 300+ messages in thread
From: Marcelo Tosatti @ 2011-01-06 17:56 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: qemu-devel, kvm, Jin Dongming, Marcelo Tosatti

From: Jin Dongming <jin.dongming@np.css.fujitsu.com>

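Clean up cpu_inject_x86_mce() in preparation for a later patch: move the
emulated injection path into a new static helper, qemu_inject_x86_mce(), and
keep the bank/status validation and the KVM dispatch in cpu_inject_x86_mce().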

Signed-off-by: Jin Dongming <jin.dongming@np.css.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
---
 target-i386/helper.c |   27 +++++++++++++++++----------
 1 files changed, 17 insertions(+), 10 deletions(-)
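
The resulting shape is "validate once, then dispatch to a backend". A
minimal sketch of that shape, assuming hypothetical sketch_* names and
returning an enum instead of calling the real kvm_inject_x86_mce() or
qemu_inject_x86_mce():

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MCI_STATUS_VAL (1ULL << 63) /* bank status "valid" bit */

enum sketch_backend { SKETCH_NONE, SKETCH_KVM, SKETCH_EMULATED };

struct sketch_cpu {
    uint64_t mcg_cap; /* low 8 bits: number of MCE banks */
    int kvm_enabled;  /* stand-in for kvm_enabled() */
};

/* Validate the request once, then pick the backend that would handle it. */
static enum sketch_backend sketch_inject_mce(const struct sketch_cpu *cpu,
                                             int bank, uint64_t status)
{
    unsigned bank_num;

    assert(cpu != NULL);
    bank_num = cpu->mcg_cap & 0xff;

    /* Reject out-of-range banks and status values without the VAL bit. */
    if (bank < 0 || (unsigned)bank >= bank_num ||
        !(status & SKETCH_MCI_STATUS_VAL)) {
        return SKETCH_NONE;
    }
    return cpu->kvm_enabled ? SKETCH_KVM : SKETCH_EMULATED;
}
```

Keeping the checks in the public entry point means the static helper can
assume a valid bank and status.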

diff --git a/target-i386/helper.c b/target-i386/helper.c
index 25a3e36..2c94130 100644
--- a/target-i386/helper.c
+++ b/target-i386/helper.c
@@ -1021,21 +1021,12 @@ static void breakpoint_handler(CPUState *env)
 /* This should come from sysemu.h - if we could include it here... */
 void qemu_system_reset_request(void);
 
-void cpu_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
+static void qemu_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
                         uint64_t mcg_status, uint64_t addr, uint64_t misc)
 {
     uint64_t mcg_cap = cenv->mcg_cap;
-    unsigned bank_num = mcg_cap & 0xff;
     uint64_t *banks = cenv->mce_banks;
 
-    if (bank >= bank_num || !(status & MCI_STATUS_VAL))
-        return;
-
-    if (kvm_enabled()) {
-        kvm_inject_x86_mce(cenv, bank, status, mcg_status, addr, misc, 0);
-        return;
-    }
-
     /*
      * if MSR_MCG_CTL is not all 1s, the uncorrected error
      * reporting is disabled
@@ -1076,6 +1067,22 @@ void cpu_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
     } else
         banks[1] |= MCI_STATUS_OVER;
 }
+
+void cpu_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
+                        uint64_t mcg_status, uint64_t addr, uint64_t misc)
+{
+    unsigned bank_num = cenv->mcg_cap & 0xff;
+
+    if (bank >= bank_num || !(status & MCI_STATUS_VAL)) {
+        return;
+    }
+
+    if (kvm_enabled()) {
+        kvm_inject_x86_mce(cenv, bank, status, mcg_status, addr, misc, 0);
+    } else {
+        qemu_inject_x86_mce(cenv, bank, status, mcg_status, addr, misc);
+    }
+}
 #endif /* !CONFIG_USER_ONLY */
 
 static void mce_init(CPUX86State *cenv)
-- 
1.7.2.3


* [PATCH 04/35] Add "broadcast" option for mce command
  2011-01-06 17:56 ` [Qemu-devel] " Marcelo Tosatti
@ 2011-01-06 17:56   ` Marcelo Tosatti
  -1 siblings, 0 replies; 300+ messages in thread
From: Marcelo Tosatti @ 2011-01-06 17:56 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: qemu-devel, kvm, Jin Dongming, Marcelo Tosatti

From: Jin Dongming <jin.dongming@np.css.fujitsu.com>

When the following test case is injected with the mce command, the user may
not get the expected result.
    DATA
               command cpu bank status             mcg_status  addr   misc
        (qemu) mce     1   1    0xbd00000000000000 0x05        0x1234 0x8c

    Expected Result
           panic type: "Fatal Machine check"

That is because each mce command injects only into the given CPU and cannot
deliver the MCE to the other CPUs, so the user gets the following result
instead:
    panic type: "Fatal machine check on current CPU"

The "broadcast" option injects dummy data into the other CPUs as well.
Injecting an MCE with this option produces the expected result.

Usage:
    Broadcast[on]
           command broadcast cpu bank status             mcg_status  addr   misc
    (qemu) mce     -b        1   1    0xbd00000000000000 0x05        0x1234 0x8c

    Broadcast[off]
           command cpu bank status             mcg_status  addr   misc
    (qemu) mce     1   1    0xbd00000000000000 0x05        0x1234 0x8c
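
The status and mcg_status values in the examples above are bit fields. The
sketch below decodes them, assuming the architectural IA32_MCi_STATUS and
IA32_MCG_STATUS bit positions; the SKETCH_* macros and sketch_* functions
are names made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* IA32_MCi_STATUS bits (architectural positions). */
#define SKETCH_MCI_VAL   (1ULL << 63) /* register contents valid */
#define SKETCH_MCI_OVER  (1ULL << 62) /* overflow */
#define SKETCH_MCI_UC    (1ULL << 61) /* uncorrected error */
#define SKETCH_MCI_EN    (1ULL << 60) /* error reporting enabled */
#define SKETCH_MCI_MISCV (1ULL << 59) /* MISC register valid */
#define SKETCH_MCI_ADDRV (1ULL << 58) /* ADDR register valid */
#define SKETCH_MCI_PCC   (1ULL << 57) /* processor context corrupt */
#define SKETCH_MCI_S     (1ULL << 56) /* signaling */

/* IA32_MCG_STATUS bits. */
#define SKETCH_MCG_RIPV  (1ULL << 0)  /* restart IP valid */
#define SKETCH_MCG_EIPV  (1ULL << 1)  /* error IP valid */
#define SKETCH_MCG_MCIP  (1ULL << 2)  /* machine check in progress */

/* True when the bank reports a valid, uncorrected error. */
static int sketch_is_valid_uncorrected(uint64_t status)
{
    return (status & SKETCH_MCI_VAL) && (status & SKETCH_MCI_UC);
}

/* Self-check against the values used in the commit message. */
static void sketch_check_example(void)
{
    assert(0xbd00000000000000ULL ==
           (SKETCH_MCI_VAL | SKETCH_MCI_UC | SKETCH_MCI_EN |
            SKETCH_MCI_MISCV | SKETCH_MCI_ADDRV | SKETCH_MCI_S));
    assert(0x05ULL == (SKETCH_MCG_RIPV | SKETCH_MCG_MCIP));
}
```

So the example injects a valid, uncorrected error with ADDR and MISC
marked valid, while mcg_status 0x05 sets RIPV and MCIP.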

Signed-off-by: Jin Dongming <jin.dongming@np.css.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
---
 cpu-all.h             |    3 ++-
 hmp-commands.hx       |    6 +++---
 monitor.c             |    7 +++++--
 target-i386/helper.c  |   20 ++++++++++++++++++--
 target-i386/kvm.c     |   16 ++++++++++++----
 target-i386/kvm_x86.h |    5 ++++-
 6 files changed, 44 insertions(+), 13 deletions(-)
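
The broadcast logic in the non-KVM path can be sketched as: inject the real
record on the target CPU, then walk the CPU list and give every other CPU a
dummy valid, uncorrected record. The sketch assumes hypothetical sketch_*
names and a simplified CPU list, and it stores the status instead of
calling qemu_inject_x86_mce(); the flag values and the dummy status
0xa000000000000000 are taken from the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flag bits as introduced in target-i386/kvm_x86.h by this patch. */
#define SKETCH_ABORT_ON_ERROR 0x01
#define SKETCH_MCE_BROADCAST  0x02

/* Dummy record for the other CPUs: VAL | UC, as in the helper.c hunk. */
#define SKETCH_DUMMY_STATUS 0xa000000000000000ULL

struct sketch_cpu {
    struct sketch_cpu *next; /* singly linked list, like next_cpu */
    uint64_t last_status;    /* last status "injected", 0 if none */
};

/* Inject on the target; with SKETCH_MCE_BROADCAST, also hit the rest. */
static void sketch_inject(struct sketch_cpu *first, struct sketch_cpu *target,
                          uint64_t status, int flags)
{
    struct sketch_cpu *c;

    assert(first != NULL && target != NULL);
    target->last_status = status;

    if (!(flags & SKETCH_MCE_BROADCAST)) {
        return;
    }
    for (c = first; c != NULL; c = c->next) {
        if (c == target) {
            continue; /* the target already has the real record */
        }
        c->last_status = SKETCH_DUMMY_STATUS;
    }
}
```

Because the two flags occupy separate bits, a caller can combine them
(SKETCH_ABORT_ON_ERROR | SKETCH_MCE_BROADCAST) in the single int argument
that replaces the old abort_on_error parameter.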

diff --git a/cpu-all.h b/cpu-all.h
index 30ae17d..4ce4e83 100644
--- a/cpu-all.h
+++ b/cpu-all.h
@@ -964,6 +964,7 @@ int cpu_memory_rw_debug(CPUState *env, target_ulong addr,
                         uint8_t *buf, int len, int is_write);
 
 void cpu_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
-                        uint64_t mcg_status, uint64_t addr, uint64_t misc);
+                        uint64_t mcg_status, uint64_t addr, uint64_t misc,
+                        int broadcast);
 
 #endif /* CPU_ALL_H */
diff --git a/hmp-commands.hx b/hmp-commands.hx
index df134f8..c82fb10 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1091,9 +1091,9 @@ ETEXI
 
     {
         .name       = "mce",
-        .args_type  = "cpu_index:i,bank:i,status:l,mcg_status:l,addr:l,misc:l",
-        .params     = "cpu bank status mcgstatus addr misc",
-        .help       = "inject a MCE on the given CPU",
+        .args_type  = "broadcast:-b,cpu_index:i,bank:i,status:l,mcg_status:l,addr:l,misc:l",
+        .params     = "[-b] cpu bank status mcgstatus addr misc",
+        .help       = "inject a MCE on the given CPU [and broadcast to other CPUs with -b option]",
         .mhandler.cmd = do_inject_mce,
     },
 
diff --git a/monitor.c b/monitor.c
index f258000..f4f624b 100644
--- a/monitor.c
+++ b/monitor.c
@@ -2671,12 +2671,15 @@ static void do_inject_mce(Monitor *mon, const QDict *qdict)
     uint64_t mcg_status = qdict_get_int(qdict, "mcg_status");
     uint64_t addr = qdict_get_int(qdict, "addr");
     uint64_t misc = qdict_get_int(qdict, "misc");
+    int broadcast = qdict_get_try_bool(qdict, "broadcast", 0);
 
-    for (cenv = first_cpu; cenv != NULL; cenv = cenv->next_cpu)
+    for (cenv = first_cpu; cenv != NULL; cenv = cenv->next_cpu) {
         if (cenv->cpu_index == cpu_index && cenv->mcg_cap) {
-            cpu_inject_x86_mce(cenv, bank, status, mcg_status, addr, misc);
+            cpu_inject_x86_mce(cenv, bank, status, mcg_status, addr, misc,
+                               broadcast);
             break;
         }
+    }
 }
 #endif
 
diff --git a/target-i386/helper.c b/target-i386/helper.c
index 2c94130..2cfb4a4 100644
--- a/target-i386/helper.c
+++ b/target-i386/helper.c
@@ -1069,18 +1069,34 @@ static void qemu_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
 }
 
 void cpu_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
-                        uint64_t mcg_status, uint64_t addr, uint64_t misc)
+                        uint64_t mcg_status, uint64_t addr, uint64_t misc,
+                        int broadcast)
 {
     unsigned bank_num = cenv->mcg_cap & 0xff;
+    CPUState *env;
+    int flag = 0;
 
     if (bank >= bank_num || !(status & MCI_STATUS_VAL)) {
         return;
     }
 
     if (kvm_enabled()) {
-        kvm_inject_x86_mce(cenv, bank, status, mcg_status, addr, misc, 0);
+        if (broadcast) {
+            flag |= MCE_BROADCAST;
+        }
+
+        kvm_inject_x86_mce(cenv, bank, status, mcg_status, addr, misc, flag);
     } else {
         qemu_inject_x86_mce(cenv, bank, status, mcg_status, addr, misc);
+        if (broadcast) {
+            for (env = first_cpu; env != NULL; env = env->next_cpu) {
+                if (cenv == env) {
+                    continue;
+                }
+
+                qemu_inject_x86_mce(env, 1, 0xa000000000000000, 0, 0, 0);
+            }
+        }
     }
 }
 #endif /* !CONFIG_USER_ONLY */
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 4004de7..8b868ad 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -264,11 +264,13 @@ static void kvm_do_inject_x86_mce(void *_data)
         }
     }
 }
+
+static void kvm_mce_broadcast_rest(CPUState *env);
 #endif
 
 void kvm_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
                         uint64_t mcg_status, uint64_t addr, uint64_t misc,
-                        int abort_on_error)
+                        int flag)
 {
 #ifdef KVM_CAP_MCE
     struct kvm_x86_mce mce = {
@@ -288,10 +290,15 @@ void kvm_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
         return;
     }
 
+    if (flag & MCE_BROADCAST) {
+        kvm_mce_broadcast_rest(cenv);
+    }
+
     run_on_cpu(cenv, kvm_do_inject_x86_mce, &data);
 #else
-    if (abort_on_error)
+    if (flag & ABORT_ON_ERROR) {
         abort();
+    }
 #endif
 }
 
@@ -1716,7 +1723,8 @@ static void kvm_mce_broadcast_rest(CPUState *env)
                 continue;
             }
             kvm_inject_x86_mce(cenv, 1, MCI_STATUS_VAL | MCI_STATUS_UC,
-                               MCG_STATUS_MCIP | MCG_STATUS_RIPV, 0, 0, 1);
+                               MCG_STATUS_MCIP | MCG_STATUS_RIPV, 0, 0,
+                               ABORT_ON_ERROR);
         }
     }
 }
@@ -1816,7 +1824,7 @@ int kvm_on_sigbus(int code, void *addr)
             | 0xc0;
         kvm_inject_x86_mce(first_cpu, 9, status,
                            MCG_STATUS_MCIP | MCG_STATUS_RIPV, paddr,
-                           (MCM_ADDR_PHYS << 6) | 0xc, 1);
+                           (MCM_ADDR_PHYS << 6) | 0xc, ABORT_ON_ERROR);
         kvm_mce_broadcast_rest(first_cpu);
     } else
 #endif
diff --git a/target-i386/kvm_x86.h b/target-i386/kvm_x86.h
index 04932cf..9d7b584 100644
--- a/target-i386/kvm_x86.h
+++ b/target-i386/kvm_x86.h
@@ -15,8 +15,11 @@
 #ifndef __KVM_X86_H__
 #define __KVM_X86_H__
 
+#define ABORT_ON_ERROR  0x01
+#define MCE_BROADCAST   0x02
+
 void kvm_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
                         uint64_t mcg_status, uint64_t addr, uint64_t misc,
-                        int abort_on_error);
+                        int flag);
 
 #endif
-- 
1.7.2.3


^ permalink raw reply related	[flat|nested] 300+ messages in thread

* [Qemu-devel] [PATCH 04/35] Add "broadcast" option for mce command
@ 2011-01-06 17:56   ` Marcelo Tosatti
  0 siblings, 0 replies; 300+ messages in thread
From: Marcelo Tosatti @ 2011-01-06 17:56 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Marcelo Tosatti, qemu-devel, kvm, Jin Dongming

From: Jin Dongming <jin.dongming@np.css.fujitsu.com>

When the following test case is injected with the mce command, the user may
not get the expected result.
    DATA
               command cpu bank status             mcg_status  addr   misc
        (qemu) mce     1   1    0xbd00000000000000 0x05        0x1234 0x8c

    Expected Result
           panic type: "Fatal Machine check"

That is because each mce command injects only into the given CPU and cannot
deliver the MCE to the other CPUs, so the user gets the following result
instead:
    panic type: "Fatal machine check on current CPU"

The "broadcast" option injects dummy data into the other CPUs as well.
Injecting an MCE with this option produces the expected result.

Usage:
    Broadcast[on]
           command broadcast cpu bank status             mcg_status  addr   misc
    (qemu) mce     -b        1   1    0xbd00000000000000 0x05        0x1234 0x8c

    Broadcast[off]
           command cpu bank status             mcg_status  addr   misc
    (qemu) mce     1   1    0xbd00000000000000 0x05        0x1234 0x8c

Signed-off-by: Jin Dongming <jin.dongming@np.css.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
---
 cpu-all.h             |    3 ++-
 hmp-commands.hx       |    6 +++---
 monitor.c             |    7 +++++--
 target-i386/helper.c  |   20 ++++++++++++++++++--
 target-i386/kvm.c     |   16 ++++++++++++----
 target-i386/kvm_x86.h |    5 ++++-
 6 files changed, 44 insertions(+), 13 deletions(-)

diff --git a/cpu-all.h b/cpu-all.h
index 30ae17d..4ce4e83 100644
--- a/cpu-all.h
+++ b/cpu-all.h
@@ -964,6 +964,7 @@ int cpu_memory_rw_debug(CPUState *env, target_ulong addr,
                         uint8_t *buf, int len, int is_write);
 
 void cpu_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
-                        uint64_t mcg_status, uint64_t addr, uint64_t misc);
+                        uint64_t mcg_status, uint64_t addr, uint64_t misc,
+                        int broadcast);
 
 #endif /* CPU_ALL_H */
diff --git a/hmp-commands.hx b/hmp-commands.hx
index df134f8..c82fb10 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1091,9 +1091,9 @@ ETEXI
 
     {
         .name       = "mce",
-        .args_type  = "cpu_index:i,bank:i,status:l,mcg_status:l,addr:l,misc:l",
-        .params     = "cpu bank status mcgstatus addr misc",
-        .help       = "inject a MCE on the given CPU",
+        .args_type  = "broadcast:-b,cpu_index:i,bank:i,status:l,mcg_status:l,addr:l,misc:l",
+        .params     = "[-b] cpu bank status mcgstatus addr misc",
+        .help       = "inject a MCE on the given CPU [and broadcast to other CPUs with -b option]",
         .mhandler.cmd = do_inject_mce,
     },
 
diff --git a/monitor.c b/monitor.c
index f258000..f4f624b 100644
--- a/monitor.c
+++ b/monitor.c
@@ -2671,12 +2671,15 @@ static void do_inject_mce(Monitor *mon, const QDict *qdict)
     uint64_t mcg_status = qdict_get_int(qdict, "mcg_status");
     uint64_t addr = qdict_get_int(qdict, "addr");
     uint64_t misc = qdict_get_int(qdict, "misc");
+    int broadcast = qdict_get_try_bool(qdict, "broadcast", 0);
 
-    for (cenv = first_cpu; cenv != NULL; cenv = cenv->next_cpu)
+    for (cenv = first_cpu; cenv != NULL; cenv = cenv->next_cpu) {
         if (cenv->cpu_index == cpu_index && cenv->mcg_cap) {
-            cpu_inject_x86_mce(cenv, bank, status, mcg_status, addr, misc);
+            cpu_inject_x86_mce(cenv, bank, status, mcg_status, addr, misc,
+                               broadcast);
             break;
         }
+    }
 }
 #endif
 
diff --git a/target-i386/helper.c b/target-i386/helper.c
index 2c94130..2cfb4a4 100644
--- a/target-i386/helper.c
+++ b/target-i386/helper.c
@@ -1069,18 +1069,34 @@ static void qemu_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
 }
 
 void cpu_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
-                        uint64_t mcg_status, uint64_t addr, uint64_t misc)
+                        uint64_t mcg_status, uint64_t addr, uint64_t misc,
+                        int broadcast)
 {
     unsigned bank_num = cenv->mcg_cap & 0xff;
+    CPUState *env;
+    int flag = 0;
 
     if (bank >= bank_num || !(status & MCI_STATUS_VAL)) {
         return;
     }
 
     if (kvm_enabled()) {
-        kvm_inject_x86_mce(cenv, bank, status, mcg_status, addr, misc, 0);
+        if (broadcast) {
+            flag |= MCE_BROADCAST;
+        }
+
+        kvm_inject_x86_mce(cenv, bank, status, mcg_status, addr, misc, flag);
     } else {
         qemu_inject_x86_mce(cenv, bank, status, mcg_status, addr, misc);
+        if (broadcast) {
+            for (env = first_cpu; env != NULL; env = env->next_cpu) {
+                if (cenv == env) {
+                    continue;
+                }
+
+                qemu_inject_x86_mce(env, 1, 0xa000000000000000, 0, 0, 0);
+            }
+        }
     }
 }
 #endif /* !CONFIG_USER_ONLY */
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 4004de7..8b868ad 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -264,11 +264,13 @@ static void kvm_do_inject_x86_mce(void *_data)
         }
     }
 }
+
+static void kvm_mce_broadcast_rest(CPUState *env);
 #endif
 
 void kvm_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
                         uint64_t mcg_status, uint64_t addr, uint64_t misc,
-                        int abort_on_error)
+                        int flag)
 {
 #ifdef KVM_CAP_MCE
     struct kvm_x86_mce mce = {
@@ -288,10 +290,15 @@ void kvm_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
         return;
     }
 
+    if (flag & MCE_BROADCAST) {
+        kvm_mce_broadcast_rest(cenv);
+    }
+
     run_on_cpu(cenv, kvm_do_inject_x86_mce, &data);
 #else
-    if (abort_on_error)
+    if (flag & ABORT_ON_ERROR) {
         abort();
+    }
 #endif
 }
 
@@ -1716,7 +1723,8 @@ static void kvm_mce_broadcast_rest(CPUState *env)
                 continue;
             }
             kvm_inject_x86_mce(cenv, 1, MCI_STATUS_VAL | MCI_STATUS_UC,
-                               MCG_STATUS_MCIP | MCG_STATUS_RIPV, 0, 0, 1);
+                               MCG_STATUS_MCIP | MCG_STATUS_RIPV, 0, 0,
+                               ABORT_ON_ERROR);
         }
     }
 }
@@ -1816,7 +1824,7 @@ int kvm_on_sigbus(int code, void *addr)
             | 0xc0;
         kvm_inject_x86_mce(first_cpu, 9, status,
                            MCG_STATUS_MCIP | MCG_STATUS_RIPV, paddr,
-                           (MCM_ADDR_PHYS << 6) | 0xc, 1);
+                           (MCM_ADDR_PHYS << 6) | 0xc, ABORT_ON_ERROR);
         kvm_mce_broadcast_rest(first_cpu);
     } else
 #endif
diff --git a/target-i386/kvm_x86.h b/target-i386/kvm_x86.h
index 04932cf..9d7b584 100644
--- a/target-i386/kvm_x86.h
+++ b/target-i386/kvm_x86.h
@@ -15,8 +15,11 @@
 #ifndef __KVM_X86_H__
 #define __KVM_X86_H__
 
+#define ABORT_ON_ERROR  0x01
+#define MCE_BROADCAST   0x02
+
 void kvm_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
                         uint64_t mcg_status, uint64_t addr, uint64_t misc,
-                        int abort_on_error);
+                        int flag);
 
 #endif
-- 
1.7.2.3
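
The change from a boolean abort_on_error argument to a flag word can be
illustrated in isolation. The sketch below reuses the two values that the
patch adds to kvm_x86.h; the helper names (mce_make_flags, mce_should_abort,
mce_should_broadcast) are hypothetical and are not part of QEMU.

```c
#include <assert.h>

/* Same values as the defines the patch adds to target-i386/kvm_x86.h. */
#define ABORT_ON_ERROR  0x01
#define MCE_BROADCAST   0x02

/* Hypothetical helper: fold two booleans into one flag word. */
int mce_make_flags(int abort_on_error, int broadcast)
{
    int flag = 0;

    if (abort_on_error) {
        flag |= ABORT_ON_ERROR;
    }
    if (broadcast) {
        flag |= MCE_BROADCAST;
    }
    return flag;
}

/* Hypothetical helpers: test a single bit, as kvm_inject_x86_mce does. */
int mce_should_abort(int flag)
{
    assert((flag & ~(ABORT_ON_ERROR | MCE_BROADCAST)) == 0);
    return !!(flag & ABORT_ON_ERROR);
}

int mce_should_broadcast(int flag)
{
    assert((flag & ~(ABORT_ON_ERROR | MCE_BROADCAST)) == 0);
    return !!(flag & MCE_BROADCAST);
}
```

Callers that used to pass 1 for abort_on_error keep the same behaviour,
because ABORT_ON_ERROR is also 0x01.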


* [PATCH 05/35] Add function for checking mca broadcast of CPU
  2011-01-06 17:56 ` [Qemu-devel] " Marcelo Tosatti
@ 2011-01-06 17:56   ` Marcelo Tosatti
  -1 siblings, 0 replies; 300+ messages in thread
From: Marcelo Tosatti @ 2011-01-06 17:56 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: qemu-devel, kvm, Jin Dongming, Marcelo Tosatti

From: Jin Dongming <jin.dongming@np.css.fujitsu.com>

Add a function for checking whether the current CPU supports MCA broadcast.

Signed-off-by: Jin Dongming <jin.dongming@np.css.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
---
 target-i386/cpu.h    |    1 +
 target-i386/helper.c |   33 +++++++++++++++++++++++++++++++++
 target-i386/kvm.c    |    6 +-----
 3 files changed, 35 insertions(+), 5 deletions(-)

diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index f0c07cd..dddcd74 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -760,6 +760,7 @@ int cpu_x86_exec(CPUX86State *s);
 void cpu_x86_close(CPUX86State *s);
 void x86_cpu_list (FILE *f, fprintf_function cpu_fprintf, const char *optarg);
 void x86_cpudef_setup(void);
+int cpu_x86_support_mca_broadcast(CPUState *env);
 
 int cpu_get_pic_interrupt(CPUX86State *s);
 /* MSDOS compatibility mode FPU exception support */
diff --git a/target-i386/helper.c b/target-i386/helper.c
index 2cfb4a4..6dfa27d 100644
--- a/target-i386/helper.c
+++ b/target-i386/helper.c
@@ -110,6 +110,32 @@ void cpu_x86_close(CPUX86State *env)
     qemu_free(env);
 }
 
+static void cpu_x86_version(CPUState *env, int *family, int *model)
+{
+    int cpuver = env->cpuid_version;
+
+    if (family == NULL || model == NULL) {
+        return;
+    }
+
+    *family = (cpuver >> 8) & 0x0f;
+    *model = ((cpuver >> 12) & 0xf0) + ((cpuver >> 4) & 0x0f);
+}
+
+/* Broadcast MCA signal for processor version 06H_EH and above */
+int cpu_x86_support_mca_broadcast(CPUState *env)
+{
+    int family = 0;
+    int model = 0;
+
+    cpu_x86_version(env, &family, &model);
+    if ((family == 6 && model >= 14) || family > 6) {
+        return 1;
+    }
+
+    return 0;
+}
+
 /***********************************************************/
 /* x86 debug */
 
@@ -1080,6 +1106,13 @@ void cpu_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
         return;
     }
 
+    if (broadcast) {
+        if (!cpu_x86_support_mca_broadcast(cenv)) {
+            fprintf(stderr, "Current CPU does not support broadcast\n");
+            return;
+        }
+    }
+
     if (kvm_enabled()) {
         if (broadcast) {
             flag |= MCE_BROADCAST;
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 8b868ad..2115a58 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -1711,13 +1711,9 @@ static void hardware_memory_error(void)
 static void kvm_mce_broadcast_rest(CPUState *env)
 {
     CPUState *cenv;
-    int family, model, cpuver = env->cpuid_version;
-
-    family = (cpuver >> 8) & 0xf;
-    model = ((cpuver >> 12) & 0xf0) + ((cpuver >> 4) & 0xf);
 
     /* Broadcast MCA signal for processor version 06H_EH and above */
-    if ((family == 6 && model >= 14) || family > 6) {
+    if (cpu_x86_support_mca_broadcast(env)) {
         for (cenv = first_cpu; cenv != NULL; cenv = cenv->next_cpu) {
             if (cenv == env) {
                 continue;
-- 
1.7.2.3



* [PATCH 06/35] kvm: introduce kvm_mce_in_progress
  2011-01-06 17:56 ` [Qemu-devel] " Marcelo Tosatti
@ 2011-01-06 17:56   ` Marcelo Tosatti
  -1 siblings, 0 replies; 300+ messages in thread
From: Marcelo Tosatti @ 2011-01-06 17:56 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: qemu-devel, kvm, Jin Dongming, Hidetoshi Seto, Marcelo Tosatti

From: Jin Dongming <jin.dongming@np.css.fujitsu.com>

Share the same error handling, and rename this function after the
MCIP (Machine Check In Progress) flag.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Jin Dongming <jin.dongming@np.css.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
---
 target-i386/kvm.c |   15 +++++----------
 1 files changed, 5 insertions(+), 10 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 2115a58..5a699fc 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -219,7 +219,7 @@ static int kvm_get_msr(CPUState *env, struct kvm_msr_entry *msrs, int n)
 }
 
 /* FIXME: kill this and kvm_get_msr, use env->mcg_status instead */
-static int kvm_mce_in_exception(CPUState *env)
+static int kvm_mce_in_progress(CPUState *env)
 {
     struct kvm_msr_entry msr_mcg_status = {
         .index = MSR_MCG_STATUS,
@@ -228,7 +228,8 @@ static int kvm_mce_in_exception(CPUState *env)
 
     r = kvm_get_msr(env, &msr_mcg_status, 1);
     if (r == -1 || r == 0) {
-        return -1;
+        fprintf(stderr, "Failed to get MCE status\n");
+        return 0;
     }
     return !!(msr_mcg_status.data & MCG_STATUS_MCIP);
 }
@@ -248,10 +249,7 @@ static void kvm_do_inject_x86_mce(void *_data)
     /* If there is an MCE exception being processed, ignore this SRAO MCE */
     if ((data->env->mcg_cap & MCG_SER_P) &&
         !(data->mce->status & MCI_STATUS_AR)) {
-        r = kvm_mce_in_exception(data->env);
-        if (r == -1) {
-            fprintf(stderr, "Failed to get MCE status\n");
-        } else if (r) {
+        if (kvm_mce_in_progress(data->env)) {
             return;
         }
     }
@@ -1752,10 +1750,7 @@ int kvm_on_sigbus_vcpu(CPUState *env, int code, void *addr)
              * If there is an MCE excpetion being processed, ignore
              * this SRAO MCE
              */
-            r = kvm_mce_in_exception(env);
-            if (r == -1) {
-                fprintf(stderr, "Failed to get MCE status\n");
-            } else if (r) {
+            if (kvm_mce_in_progress(env)) {
                 return 0;
             }
             /* Fake an Intel architectural Memory scrubbing UCR */
-- 
1.7.2.3
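
The effect of moving the error report into the predicate can be sketched
without KVM. Below, the MSR read is replaced by a hypothetical callback
(read_mcg_status_fn) that returns 1 on success and 0 or -1 on failure, which
is the convention the patch checks for; MCG_STATUS_MCIP is bit 2 of
IA32_MCG_STATUS. None of these helper names exist in QEMU.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define MCG_STATUS_MCIP (1ULL << 2)

/* Hypothetical reader: 1 on success, 0 or -1 on failure. */
typedef int (*read_mcg_status_fn)(uint64_t *value);

/*
 * Returns 1 only when the read succeeded and MCIP is set. A failed read is
 * logged here and reported as "not in progress", so callers need no
 * separate error branch.
 */
int mce_in_progress(read_mcg_status_fn read_status)
{
    uint64_t mcg_status = 0;
    int r;

    assert(read_status != NULL);

    r = read_status(&mcg_status);
    if (r == -1 || r == 0) {
        fprintf(stderr, "Failed to get MCE status\n");
        return 0;
    }
    return !!(mcg_status & MCG_STATUS_MCIP);
}

/* Stub readers for illustration only. */
int read_mcip_set(uint64_t *value)   { *value = MCG_STATUS_MCIP | 1; return 1; }
int read_mcip_clear(uint64_t *value) { *value = 1; return 1; }
int read_failure(uint64_t *value)    { (void)value; return -1; }
```

One consequence of this design is that a failed read now behaves like
"no MCE in progress", so the SRAO event is injected rather than dropped.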



* [PATCH 07/35] kvm: kvm_mce_inj_* subroutines for templated error injections
  2011-01-06 17:56 ` [Qemu-devel] " Marcelo Tosatti
@ 2011-01-06 17:56   ` Marcelo Tosatti
  -1 siblings, 0 replies; 300+ messages in thread
From: Marcelo Tosatti @ 2011-01-06 17:56 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: qemu-devel, kvm, Jin Dongming, Hidetoshi Seto, Marcelo Tosatti

From: Jin Dongming <jin.dongming@np.css.fujitsu.com>

Refactor the code for maintainability.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Jin Dongming <jin.dongming@np.css.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
---
 target-i386/kvm.c |  111 ++++++++++++++++++++++++++++++++++-------------------
 1 files changed, 71 insertions(+), 40 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 5a699fc..ce01e18 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -1722,44 +1722,75 @@ static void kvm_mce_broadcast_rest(CPUState *env)
         }
     }
 }
+
+static void kvm_mce_inj_srar_dataload(CPUState *env, target_phys_addr_t paddr)
+{
+    struct kvm_x86_mce mce = {
+        .bank = 9,
+        .status = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN
+                  | MCI_STATUS_MISCV | MCI_STATUS_ADDRV | MCI_STATUS_S
+                  | MCI_STATUS_AR | 0x134,
+        .mcg_status = MCG_STATUS_MCIP | MCG_STATUS_EIPV,
+        .addr = paddr,
+        .misc = (MCM_ADDR_PHYS << 6) | 0xc,
+    };
+    int r;
+
+    r = kvm_set_mce(env, &mce);
+    if (r < 0) {
+        fprintf(stderr, "kvm_set_mce: %s\n", strerror(errno));
+        abort();
+    }
+    kvm_mce_broadcast_rest(env);
+}
+
+static void kvm_mce_inj_srao_memscrub(CPUState *env, target_phys_addr_t paddr)
+{
+    struct kvm_x86_mce mce = {
+        .bank = 9,
+        .status = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN
+                  | MCI_STATUS_MISCV | MCI_STATUS_ADDRV | MCI_STATUS_S
+                  | 0xc0,
+        .mcg_status = MCG_STATUS_MCIP | MCG_STATUS_RIPV,
+        .addr = paddr,
+        .misc = (MCM_ADDR_PHYS << 6) | 0xc,
+    };
+    int r;
+
+    r = kvm_set_mce(env, &mce);
+    if (r < 0) {
+        fprintf(stderr, "kvm_set_mce: %s\n", strerror(errno));
+        abort();
+    }
+    kvm_mce_broadcast_rest(env);
+}
+
+static void kvm_mce_inj_srao_memscrub2(CPUState *env, target_phys_addr_t paddr)
+{
+    uint64_t status;
+
+    status = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN
+            | MCI_STATUS_MISCV | MCI_STATUS_ADDRV | MCI_STATUS_S
+            | 0xc0;
+    kvm_inject_x86_mce(env, 9, status,
+                       MCG_STATUS_MCIP | MCG_STATUS_RIPV, paddr,
+                       (MCM_ADDR_PHYS << 6) | 0xc, ABORT_ON_ERROR);
+
+    kvm_mce_broadcast_rest(env);
+}
+
 #endif
 
 int kvm_on_sigbus_vcpu(CPUState *env, int code, void *addr)
 {
 #if defined(KVM_CAP_MCE)
-    struct kvm_x86_mce mce = {
-            .bank = 9,
-    };
     void *vaddr;
     ram_addr_t ram_addr;
     target_phys_addr_t paddr;
-    int r;
 
     if ((env->mcg_cap & MCG_SER_P) && addr
         && (code == BUS_MCEERR_AR
             || code == BUS_MCEERR_AO)) {
-        if (code == BUS_MCEERR_AR) {
-            /* Fake an Intel architectural Data Load SRAR UCR */
-            mce.status = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN
-                | MCI_STATUS_MISCV | MCI_STATUS_ADDRV | MCI_STATUS_S
-                | MCI_STATUS_AR | 0x134;
-            mce.misc = (MCM_ADDR_PHYS << 6) | 0xc;
-            mce.mcg_status = MCG_STATUS_MCIP | MCG_STATUS_EIPV;
-        } else {
-            /*
-             * If there is an MCE excpetion being processed, ignore
-             * this SRAO MCE
-             */
-            if (kvm_mce_in_progress(env)) {
-                return 0;
-            }
-            /* Fake an Intel architectural Memory scrubbing UCR */
-            mce.status = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN
-                | MCI_STATUS_MISCV | MCI_STATUS_ADDRV | MCI_STATUS_S
-                | 0xc0;
-            mce.misc = (MCM_ADDR_PHYS << 6) | 0xc;
-            mce.mcg_status = MCG_STATUS_MCIP | MCG_STATUS_RIPV;
-        }
         vaddr = (void *)addr;
         if (qemu_ram_addr_from_host(vaddr, &ram_addr) ||
             !kvm_physical_memory_addr_from_ram(env->kvm_state, ram_addr, &paddr)) {
@@ -1772,13 +1803,20 @@ int kvm_on_sigbus_vcpu(CPUState *env, int code, void *addr)
                 hardware_memory_error();
             }
         }
-        mce.addr = paddr;
-        r = kvm_set_mce(env, &mce);
-        if (r < 0) {
-            fprintf(stderr, "kvm_set_mce: %s\n", strerror(errno));
-            abort();
+
+        if (code == BUS_MCEERR_AR) {
+            /* Fake an Intel architectural Data Load SRAR UCR */
+            kvm_mce_inj_srar_dataload(env, paddr);
+        } else {
+            /*
+             * If there is an MCE excpetion being processed, ignore
+             * this SRAO MCE
+             */
+            if (!kvm_mce_in_progress(env)) {
+                /* Fake an Intel architectural Memory scrubbing UCR */
+                kvm_mce_inj_srao_memscrub(env, paddr);
+            }
         }
-        kvm_mce_broadcast_rest(env);
     } else
 #endif
     {
@@ -1797,7 +1835,6 @@ int kvm_on_sigbus(int code, void *addr)
 {
 #if defined(KVM_CAP_MCE)
     if ((first_cpu->mcg_cap & MCG_SER_P) && addr && code == BUS_MCEERR_AO) {
-        uint64_t status;
         void *vaddr;
         ram_addr_t ram_addr;
         target_phys_addr_t paddr;
@@ -1810,13 +1847,7 @@ int kvm_on_sigbus(int code, void *addr)
                     "QEMU itself instead of guest system!: %p\n", addr);
             return 0;
         }
-        status = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN
-            | MCI_STATUS_MISCV | MCI_STATUS_ADDRV | MCI_STATUS_S
-            | 0xc0;
-        kvm_inject_x86_mce(first_cpu, 9, status,
-                           MCG_STATUS_MCIP | MCG_STATUS_RIPV, paddr,
-                           (MCM_ADDR_PHYS << 6) | 0xc, ABORT_ON_ERROR);
-        kvm_mce_broadcast_rest(first_cpu);
+        kvm_mce_inj_srao_memscrub2(first_cpu, paddr);
     } else
 #endif
     {
-- 
1.7.2.3
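
The two templates introduced here differ only in the status word and in
mcg_status. The sketch below builds both outside of QEMU to make the
resulting values visible. struct mce_template is a hypothetical stand-in for
struct kvm_x86_mce, and the bit positions are restated here as assumptions
taken from the architectural IA32_MCi_STATUS and IA32_MCG_STATUS layouts.

```c
#include <assert.h>
#include <stdint.h>

/* IA32_MCi_STATUS bits (restated here; QEMU defines them in cpu.h). */
#define MCI_STATUS_VAL   (1ULL << 63)
#define MCI_STATUS_UC    (1ULL << 61)
#define MCI_STATUS_EN    (1ULL << 60)
#define MCI_STATUS_MISCV (1ULL << 59)
#define MCI_STATUS_ADDRV (1ULL << 58)
#define MCI_STATUS_S     (1ULL << 56)
#define MCI_STATUS_AR    (1ULL << 55)

/* IA32_MCG_STATUS bits. */
#define MCG_STATUS_RIPV  (1ULL << 0)
#define MCG_STATUS_EIPV  (1ULL << 1)
#define MCG_STATUS_MCIP  (1ULL << 2)

/* Address mode for the MISC register: physical address. */
#define MCM_ADDR_PHYS    2ULL

/* Hypothetical stand-in for struct kvm_x86_mce. */
struct mce_template {
    uint8_t bank;
    uint64_t status;
    uint64_t mcg_status;
    uint64_t addr;
    uint64_t misc;
};

/* Common part of both templates: valid, uncorrected, signalled UCR. */
#define MCE_UCR_COMMON (MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN \
                        | MCI_STATUS_MISCV | MCI_STATUS_ADDRV | MCI_STATUS_S)

/* Data Load SRAR: action required, error code 0x134. */
struct mce_template mce_srar_dataload(uint64_t paddr)
{
    struct mce_template mce = {
        .bank = 9,
        .status = MCE_UCR_COMMON | MCI_STATUS_AR | 0x134,
        .mcg_status = MCG_STATUS_MCIP | MCG_STATUS_EIPV,
        .addr = paddr,
        .misc = (MCM_ADDR_PHYS << 6) | 0xc,
    };
    return mce;
}

/* Memory scrubbing SRAO: action optional, error code 0xc0. */
struct mce_template mce_srao_memscrub(uint64_t paddr)
{
    struct mce_template mce = {
        .bank = 9,
        .status = MCE_UCR_COMMON | 0xc0,
        .mcg_status = MCG_STATUS_MCIP | MCG_STATUS_RIPV,
        .addr = paddr,
        .misc = (MCM_ADDR_PHYS << 6) | 0xc,
    };
    return mce;
}
```

With these definitions the SRAR template differs from the SRAO one by the
AR bit, the error code, and EIPV in place of RIPV.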



* [PATCH 08/35] kvm: introduce kvm_inject_x86_mce_on
  2011-01-06 17:56 ` [Qemu-devel] " Marcelo Tosatti
@ 2011-01-06 17:56   ` Marcelo Tosatti
  -1 siblings, 0 replies; 300+ messages in thread
From: Marcelo Tosatti @ 2011-01-06 17:56 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: qemu-devel, kvm, Jin Dongming, Hidetoshi Seto, Marcelo Tosatti

From: Jin Dongming <jin.dongming@np.css.fujitsu.com>

Pass a pointer to a struct kvm_x86_mce instead of multiple arguments.

Note:

    kvm_inject_x86_mce(env, bank, status, mcg_status, addr, misc,
                       abort_on_error);

is equivalent to:

    struct kvm_x86_mce mce = {
        .bank = bank,
        .status = status,
        .mcg_status = mcg_status,
        .addr = addr,
        .misc = misc,
    };
    kvm_inject_x86_mce_on(env, &mce, abort_on_error);

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Jin Dongming <jin.dongming@np.css.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
---
 target-i386/kvm.c |   57 +++++++++++++++++++++++++++++++++-------------------
 1 files changed, 36 insertions(+), 21 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index ce01e18..9a4bf98 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -263,6 +263,23 @@ static void kvm_do_inject_x86_mce(void *_data)
     }
 }
 
+static void kvm_inject_x86_mce_on(CPUState *env, struct kvm_x86_mce *mce,
+                                  int flag)
+{
+    struct kvm_x86_mce_data data = {
+        .env = env,
+        .mce = mce,
+        .abort_on_error = (flag & ABORT_ON_ERROR),
+    };
+
+    if (!env->mcg_cap) {
+        fprintf(stderr, "MCE support is not enabled!\n");
+        return;
+    }
+
+    run_on_cpu(env, kvm_do_inject_x86_mce, &data);
+}
+
 static void kvm_mce_broadcast_rest(CPUState *env);
 #endif
 
@@ -278,21 +295,12 @@ void kvm_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
         .addr = addr,
         .misc = misc,
     };
-    struct kvm_x86_mce_data data = {
-            .env = cenv,
-            .mce = &mce,
-    };
-
-    if (!cenv->mcg_cap) {
-        fprintf(stderr, "MCE support is not enabled!\n");
-        return;
-    }
 
     if (flag & MCE_BROADCAST) {
         kvm_mce_broadcast_rest(cenv);
     }
 
-    run_on_cpu(cenv, kvm_do_inject_x86_mce, &data);
+    kvm_inject_x86_mce_on(cenv, &mce, flag);
 #else
     if (flag & ABORT_ON_ERROR) {
         abort();
@@ -1708,6 +1716,13 @@ static void hardware_memory_error(void)
 #ifdef KVM_CAP_MCE
 static void kvm_mce_broadcast_rest(CPUState *env)
 {
+    struct kvm_x86_mce mce = {
+        .bank = 1,
+        .status = MCI_STATUS_VAL | MCI_STATUS_UC,
+        .mcg_status = MCG_STATUS_MCIP | MCG_STATUS_RIPV,
+        .addr = 0,
+        .misc = 0,
+    };
     CPUState *cenv;
 
     /* Broadcast MCA signal for processor version 06H_EH and above */
@@ -1716,9 +1731,7 @@ static void kvm_mce_broadcast_rest(CPUState *env)
             if (cenv == env) {
                 continue;
             }
-            kvm_inject_x86_mce(cenv, 1, MCI_STATUS_VAL | MCI_STATUS_UC,
-                               MCG_STATUS_MCIP | MCG_STATUS_RIPV, 0, 0,
-                               ABORT_ON_ERROR);
+            kvm_inject_x86_mce_on(cenv, &mce, ABORT_ON_ERROR);
         }
     }
 }
@@ -1767,15 +1780,17 @@ static void kvm_mce_inj_srao_memscrub(CPUState *env, target_phys_addr_t paddr)
 
 static void kvm_mce_inj_srao_memscrub2(CPUState *env, target_phys_addr_t paddr)
 {
-    uint64_t status;
-
-    status = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN
-            | MCI_STATUS_MISCV | MCI_STATUS_ADDRV | MCI_STATUS_S
-            | 0xc0;
-    kvm_inject_x86_mce(env, 9, status,
-                       MCG_STATUS_MCIP | MCG_STATUS_RIPV, paddr,
-                       (MCM_ADDR_PHYS << 6) | 0xc, ABORT_ON_ERROR);
+    struct kvm_x86_mce mce = {
+        .bank = 9,
+        .status = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN
+                  | MCI_STATUS_MISCV | MCI_STATUS_ADDRV | MCI_STATUS_S
+                  | 0xc0,
+        .mcg_status = MCG_STATUS_MCIP | MCG_STATUS_RIPV,
+        .addr = paddr,
+        .misc = (MCM_ADDR_PHYS << 6) | 0xc,
+    };
 
+    kvm_inject_x86_mce_on(env, &mce, ABORT_ON_ERROR);
     kvm_mce_broadcast_rest(env);
 }
 
-- 
1.7.2.3



* [Qemu-devel] [PATCH 08/35] kvm: introduce kvm_inject_x86_mce_on
@ 2011-01-06 17:56   ` Marcelo Tosatti
  0 siblings, 0 replies; 300+ messages in thread
From: Marcelo Tosatti @ 2011-01-06 17:56 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Hidetoshi Seto, Marcelo Tosatti, qemu-devel, kvm, Jin Dongming

From: Jin Dongming <jin.dongming@np.css.fujitsu.com>

Pass a pointer to a struct kvm_x86_mce instead of multiple arguments.

Note:

    kvm_inject_x86_mce(env, bank, status, mcg_status, addr, misc,
                       abort_on_error);

is equivalent to:

    struct kvm_x86_mce mce = {
        .bank = bank,
        .status = status,
        .mcg_status = mcg_status,
        .addr = addr,
        .misc = misc,
    };
    kvm_inject_x86_mce_on(env, &mce, abort_on_error);

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Jin Dongming <jin.dongming@np.css.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
---
 target-i386/kvm.c |   57 +++++++++++++++++++++++++++++++++-------------------
 1 files changed, 36 insertions(+), 21 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index ce01e18..9a4bf98 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -263,6 +263,23 @@ static void kvm_do_inject_x86_mce(void *_data)
     }
 }
 
+static void kvm_inject_x86_mce_on(CPUState *env, struct kvm_x86_mce *mce,
+                                  int flag)
+{
+    struct kvm_x86_mce_data data = {
+        .env = env,
+        .mce = mce,
+        .abort_on_error = (flag & ABORT_ON_ERROR),
+    };
+
+    if (!env->mcg_cap) {
+        fprintf(stderr, "MCE support is not enabled!\n");
+        return;
+    }
+
+    run_on_cpu(env, kvm_do_inject_x86_mce, &data);
+}
+
 static void kvm_mce_broadcast_rest(CPUState *env);
 #endif
 
@@ -278,21 +295,12 @@ void kvm_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
         .addr = addr,
         .misc = misc,
     };
-    struct kvm_x86_mce_data data = {
-            .env = cenv,
-            .mce = &mce,
-    };
-
-    if (!cenv->mcg_cap) {
-        fprintf(stderr, "MCE support is not enabled!\n");
-        return;
-    }
 
     if (flag & MCE_BROADCAST) {
         kvm_mce_broadcast_rest(cenv);
     }
 
-    run_on_cpu(cenv, kvm_do_inject_x86_mce, &data);
+    kvm_inject_x86_mce_on(cenv, &mce, flag);
 #else
     if (flag & ABORT_ON_ERROR) {
         abort();
@@ -1708,6 +1716,13 @@ static void hardware_memory_error(void)
 #ifdef KVM_CAP_MCE
 static void kvm_mce_broadcast_rest(CPUState *env)
 {
+    struct kvm_x86_mce mce = {
+        .bank = 1,
+        .status = MCI_STATUS_VAL | MCI_STATUS_UC,
+        .mcg_status = MCG_STATUS_MCIP | MCG_STATUS_RIPV,
+        .addr = 0,
+        .misc = 0,
+    };
     CPUState *cenv;
 
     /* Broadcast MCA signal for processor version 06H_EH and above */
@@ -1716,9 +1731,7 @@ static void kvm_mce_broadcast_rest(CPUState *env)
             if (cenv == env) {
                 continue;
             }
-            kvm_inject_x86_mce(cenv, 1, MCI_STATUS_VAL | MCI_STATUS_UC,
-                               MCG_STATUS_MCIP | MCG_STATUS_RIPV, 0, 0,
-                               ABORT_ON_ERROR);
+            kvm_inject_x86_mce_on(cenv, &mce, ABORT_ON_ERROR);
         }
     }
 }
@@ -1767,15 +1780,17 @@ static void kvm_mce_inj_srao_memscrub(CPUState *env, target_phys_addr_t paddr)
 
 static void kvm_mce_inj_srao_memscrub2(CPUState *env, target_phys_addr_t paddr)
 {
-    uint64_t status;
-
-    status = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN
-            | MCI_STATUS_MISCV | MCI_STATUS_ADDRV | MCI_STATUS_S
-            | 0xc0;
-    kvm_inject_x86_mce(env, 9, status,
-                       MCG_STATUS_MCIP | MCG_STATUS_RIPV, paddr,
-                       (MCM_ADDR_PHYS << 6) | 0xc, ABORT_ON_ERROR);
+    struct kvm_x86_mce mce = {
+        .bank = 9,
+        .status = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN
+                  | MCI_STATUS_MISCV | MCI_STATUS_ADDRV | MCI_STATUS_S
+                  | 0xc0,
+        .mcg_status = MCG_STATUS_MCIP | MCG_STATUS_RIPV,
+        .addr = paddr,
+        .misc = (MCM_ADDR_PHYS << 6) | 0xc,
+    };
 
+    kvm_inject_x86_mce_on(env, &mce, ABORT_ON_ERROR);
     kvm_mce_broadcast_rest(env);
 }
 
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 300+ messages in thread

* [PATCH 09/35] kvm: x86: Fix DPL write back of segment registers
  2011-01-06 17:56 ` [Qemu-devel] " Marcelo Tosatti
@ 2011-01-06 17:56   ` Marcelo Tosatti
  -1 siblings, 0 replies; 300+ messages in thread
From: Marcelo Tosatti @ 2011-01-06 17:56 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: qemu-devel, kvm, Jan Kiszka, Avi Kivity

From: Jan Kiszka <jan.kiszka@siemens.com>

The DPL is stored in the flags and not in the selector. In fact, the RPL
may differ from the DPL at some point in time, so until now we were
corrupting the guest state.
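
To illustrate the difference, here is a minimal sketch, not the QEMU code
itself. The struct seg_cache type is a hypothetical, trimmed-down stand-in
for SegmentCache, the value 13 for DESC_DPL_SHIFT is an assumption based on
the x86 descriptor layout, and the selector/flags values used with it are
purely illustrative.

```c
#include <stdint.h>

/* Assumed value: DPL occupies bits 13-14 of the descriptor's high dword. */
#define DESC_DPL_SHIFT 13

/* Hypothetical, trimmed-down stand-in for QEMU's SegmentCache. */
struct seg_cache {
    uint32_t selector;   /* visible selector; bits 0-1 hold the RPL */
    uint32_t flags;      /* cached descriptor attributes, including the DPL */
};

/* Old behaviour: reports the selector's RPL as if it were the DPL. */
unsigned dpl_from_selector(const struct seg_cache *sc)
{
    return sc->selector & 3;
}

/* Fixed behaviour: reads the DPL from the cached descriptor flags. */
unsigned dpl_from_flags(const struct seg_cache *sc)
{
    return (sc->flags >> DESC_DPL_SHIFT) & 3;
}
```

With a selector whose RPL is 0 and cached flags that encode DPL 3, the two
helpers disagree, which is the case the patch addresses.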

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
---
 target-i386/kvm.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 9a4bf98..ee7bdf8 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -602,7 +602,7 @@ static void set_seg(struct kvm_segment *lhs, const SegmentCache *rhs)
     lhs->limit = rhs->limit;
     lhs->type = (flags >> DESC_TYPE_SHIFT) & 15;
     lhs->present = (flags & DESC_P_MASK) != 0;
-    lhs->dpl = rhs->selector & 3;
+    lhs->dpl = (flags >> DESC_DPL_SHIFT) & 3;
     lhs->db = (flags >> DESC_B_SHIFT) & 1;
     lhs->s = (flags & DESC_S_MASK) != 0;
     lhs->l = (flags >> DESC_L_SHIFT) & 1;
-- 
1.7.2.3


^ permalink raw reply related	[flat|nested] 300+ messages in thread

* [PATCH 10/35] kvm: x86: Remove obsolete SS.RPL/DPL aligment
  2011-01-06 17:56 ` [Qemu-devel] " Marcelo Tosatti
@ 2011-01-06 17:56   ` Marcelo Tosatti
  -1 siblings, 0 replies; 300+ messages in thread
From: Marcelo Tosatti @ 2011-01-06 17:56 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: qemu-devel, kvm, Jan Kiszka, Avi Kivity

From: Jan Kiszka <jan.kiszka@siemens.com>

This seems to date back to the days when KVM didn't support real mode. The
check is no longer needed and, even worse, is corrupting the guest state
in case SS.RPL != DPL.
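
As a rough sketch of what the removed block did to the selector, the helper
below mirrors the deleted statement in isolation. The function name
force_ss_rpl and the selector values used with it are hypothetical and only
serve as an illustration.

```c
#include <stdint.h>

/*
 * Mirrors the removed statement: the low two bits of the SS selector are
 * overwritten with the low two bits of the CS selector.
 */
uint16_t force_ss_rpl(uint16_t ss_selector, uint16_t cs_selector)
{
    return (uint16_t)((ss_selector & ~3) | (cs_selector & 3));
}
```

Whenever the two selectors carry different low bits, the value written back
to the kernel no longer matches what the guest had loaded into SS.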

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
---
 target-i386/kvm.c |    7 -------
 1 files changed, 0 insertions(+), 7 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index ee7bdf8..7e5982b 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -780,13 +780,6 @@ static int kvm_put_sregs(CPUState *env)
 	    set_seg(&sregs.fs, &env->segs[R_FS]);
 	    set_seg(&sregs.gs, &env->segs[R_GS]);
 	    set_seg(&sregs.ss, &env->segs[R_SS]);
-
-	    if (env->cr[0] & CR0_PE_MASK) {
-		/* force ss cpl to cs cpl */
-		sregs.ss.selector = (sregs.ss.selector & ~3) |
-			(sregs.cs.selector & 3);
-		sregs.ss.dpl = sregs.ss.selector & 3;
-	    }
     }
 
     set_seg(&sregs.tr, &env->tr);
-- 
1.7.2.3


^ permalink raw reply related	[flat|nested] 300+ messages in thread

* [PATCH 11/35] kvm: x86: Prevent sign extension of DR7 in guest debugging mode
  2011-01-06 17:56 ` [Qemu-devel] " Marcelo Tosatti
@ 2011-01-06 17:56   ` Marcelo Tosatti
  -1 siblings, 0 replies; 300+ messages in thread
From: Marcelo Tosatti @ 2011-01-06 17:56 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: qemu-devel, kvm, Jan Kiszka, Avi Kivity

From: Jan Kiszka <jan.kiszka@siemens.com>

This unbreaks guest debugging when the 4th hardware breakpoint used for
guest debugging is a watchpoint of 4 or 8 byte length. Bit 31 of DR7 is
set in that case and used to cause a sign extension to the high word,
which broke the guest state (VM entry failure).
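
The effect can be reproduced outside QEMU with the small sketch below. The
helper names are hypothetical, and the old behaviour is modelled by passing
in an int32_t whose sign bit is already set instead of repeating the
overflowing signed shift. The shift count 30 follows from 18 + n*4 with
n = 3 in the diff; the length code 2 is an assumed example value.

```c
#include <stdint.h>

/*
 * Old behaviour (modelled): a signed 32-bit value with bit 31 set is
 * converted to 64 bits before the OR, so bits 32-63 become ones as well.
 */
uint64_t dr7_or_signed(uint64_t dr7, int32_t bits)
{
    return dr7 | bits;
}

/*
 * Fixed behaviour: an unsigned 32-bit value is zero-extended, so only
 * the low 32 bits of the 64-bit DR7 image can be set.
 */
uint64_t dr7_or_unsigned(uint64_t dr7, uint32_t bits)
{
    return dr7 | bits;
}
```

This is why the patch casts the length code to uint32_t before shifting: the
intermediate result then stays unsigned when it is OR-ed into the 64-bit
debugreg[7] field.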

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
---
 target-i386/kvm.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 7e5982b..85edacc 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -1686,7 +1686,7 @@ void kvm_arch_update_guest_debug(CPUState *env, struct kvm_guest_debug *dbg)
             dbg->arch.debugreg[n] = hw_breakpoint[n].addr;
             dbg->arch.debugreg[7] |= (2 << (n * 2)) |
                 (type_code[hw_breakpoint[n].type] << (16 + n*4)) |
-                (len_code[hw_breakpoint[n].len] << (18 + n*4));
+                ((uint32_t)len_code[hw_breakpoint[n].len] << (18 + n*4));
         }
     }
     /* Legal xcr0 for loading */
-- 
1.7.2.3


^ permalink raw reply related	[flat|nested] 300+ messages in thread

* [PATCH 12/35] kvm: x86: Fix a few coding style violations
  2011-01-06 17:56 ` [Qemu-devel] " Marcelo Tosatti
@ 2011-01-06 17:56   ` Marcelo Tosatti
  -1 siblings, 0 replies; 300+ messages in thread
From: Marcelo Tosatti @ 2011-01-06 17:56 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: qemu-devel, kvm, Jan Kiszka, Avi Kivity

From: Jan Kiszka <jan.kiszka@siemens.com>

No functional changes.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
---
 target-i386/kvm.c |  335 +++++++++++++++++++++++++++++------------------------
 1 files changed, 182 insertions(+), 153 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 85edacc..fda07d2 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -150,34 +150,34 @@ uint32_t kvm_arch_get_supported_cpuid(CPUState *env, uint32_t function,
 
 #ifdef CONFIG_KVM_PARA
 struct kvm_para_features {
-        int cap;
-        int feature;
+    int cap;
+    int feature;
 } para_features[] = {
 #ifdef KVM_CAP_CLOCKSOURCE
-        { KVM_CAP_CLOCKSOURCE, KVM_FEATURE_CLOCKSOURCE },
+    { KVM_CAP_CLOCKSOURCE, KVM_FEATURE_CLOCKSOURCE },
 #endif
 #ifdef KVM_CAP_NOP_IO_DELAY
-        { KVM_CAP_NOP_IO_DELAY, KVM_FEATURE_NOP_IO_DELAY },
+    { KVM_CAP_NOP_IO_DELAY, KVM_FEATURE_NOP_IO_DELAY },
 #endif
 #ifdef KVM_CAP_PV_MMU
-        { KVM_CAP_PV_MMU, KVM_FEATURE_MMU_OP },
+    { KVM_CAP_PV_MMU, KVM_FEATURE_MMU_OP },
 #endif
 #ifdef KVM_CAP_ASYNC_PF
-        { KVM_CAP_ASYNC_PF, KVM_FEATURE_ASYNC_PF },
+    { KVM_CAP_ASYNC_PF, KVM_FEATURE_ASYNC_PF },
 #endif
-        { -1, -1 }
+    { -1, -1 }
 };
 
 static int get_para_features(CPUState *env)
 {
-        int i, features = 0;
+    int i, features = 0;
 
-        for (i = 0; i < ARRAY_SIZE(para_features) - 1; i++) {
-                if (kvm_check_extension(env->kvm_state, para_features[i].cap))
-                        features |= (1 << para_features[i].feature);
+    for (i = 0; i < ARRAY_SIZE(para_features) - 1; i++) {
+        if (kvm_check_extension(env->kvm_state, para_features[i].cap)) {
+            features |= (1 << para_features[i].feature);
         }
-
-        return features;
+    }
+    return features;
 }
 #endif
 
@@ -389,13 +389,15 @@ int kvm_arch_init_vcpu(CPUState *env)
                 c->index = j;
                 cpu_x86_cpuid(env, i, j, &c->eax, &c->ebx, &c->ecx, &c->edx);
 
-                if (i == 4 && c->eax == 0)
+                if (i == 4 && c->eax == 0) {
                     break;
-                if (i == 0xb && !(c->ecx & 0xff00))
+                }
+                if (i == 0xb && !(c->ecx & 0xff00)) {
                     break;
-                if (i == 0xd && c->eax == 0)
+                }
+                if (i == 0xd && c->eax == 0) {
                     break;
-
+                }
                 c = &cpuid_data.entries[cpuid_i++];
             }
             break;
@@ -425,17 +427,18 @@ int kvm_arch_init_vcpu(CPUState *env)
         uint64_t mcg_cap;
         int banks;
 
-        if (kvm_get_mce_cap_supported(env->kvm_state, &mcg_cap, &banks))
+        if (kvm_get_mce_cap_supported(env->kvm_state, &mcg_cap, &banks)) {
             perror("kvm_get_mce_cap_supported FAILED");
-        else {
+        } else {
             if (banks > MCE_BANKS_DEF)
                 banks = MCE_BANKS_DEF;
             mcg_cap &= MCE_CAP_DEF;
             mcg_cap |= banks;
-            if (kvm_setup_mce(env, &mcg_cap))
+            if (kvm_setup_mce(env, &mcg_cap)) {
                 perror("kvm_setup_mce FAILED");
-            else
+            } else {
                 env->mcg_cap = mcg_cap;
+            }
         }
     }
 #endif
@@ -577,7 +580,7 @@ int kvm_arch_init(KVMState *s, int smp_cpus)
 
     return kvm_init_identity_map_page(s);
 }
-                    
+
 static void set_v8086_seg(struct kvm_segment *lhs, const SegmentCache *rhs)
 {
     lhs->selector = rhs->selector;
@@ -616,23 +619,23 @@ static void get_seg(SegmentCache *lhs, const struct kvm_segment *rhs)
     lhs->selector = rhs->selector;
     lhs->base = rhs->base;
     lhs->limit = rhs->limit;
-    lhs->flags =
-	(rhs->type << DESC_TYPE_SHIFT)
-	| (rhs->present * DESC_P_MASK)
-	| (rhs->dpl << DESC_DPL_SHIFT)
-	| (rhs->db << DESC_B_SHIFT)
-	| (rhs->s * DESC_S_MASK)
-	| (rhs->l << DESC_L_SHIFT)
-	| (rhs->g * DESC_G_MASK)
-	| (rhs->avl * DESC_AVL_MASK);
+    lhs->flags = (rhs->type << DESC_TYPE_SHIFT) |
+                 (rhs->present * DESC_P_MASK) |
+                 (rhs->dpl << DESC_DPL_SHIFT) |
+                 (rhs->db << DESC_B_SHIFT) |
+                 (rhs->s * DESC_S_MASK) |
+                 (rhs->l << DESC_L_SHIFT) |
+                 (rhs->g * DESC_G_MASK) |
+                 (rhs->avl * DESC_AVL_MASK);
 }
 
 static void kvm_getput_reg(__u64 *kvm_reg, target_ulong *qemu_reg, int set)
 {
-    if (set)
+    if (set) {
         *kvm_reg = *qemu_reg;
-    else
+    } else {
         *qemu_reg = *kvm_reg;
+    }
 }
 
 static int kvm_getput_regs(CPUState *env, int set)
@@ -642,8 +645,9 @@ static int kvm_getput_regs(CPUState *env, int set)
 
     if (!set) {
         ret = kvm_vcpu_ioctl(env, KVM_GET_REGS, &regs);
-        if (ret < 0)
+        if (ret < 0) {
             return ret;
+        }
     }
 
     kvm_getput_reg(&regs.rax, &env->regs[R_EAX], set);
@@ -668,8 +672,9 @@ static int kvm_getput_regs(CPUState *env, int set)
     kvm_getput_reg(&regs.rflags, &env->eflags, set);
     kvm_getput_reg(&regs.rip, &env->eip, set);
 
-    if (set)
+    if (set) {
         ret = kvm_vcpu_ioctl(env, KVM_SET_REGS, &regs);
+    }
 
     return ret;
 }
@@ -683,8 +688,9 @@ static int kvm_put_fpu(CPUState *env)
     fpu.fsw = env->fpus & ~(7 << 11);
     fpu.fsw |= (env->fpstt & 7) << 11;
     fpu.fcw = env->fpuc;
-    for (i = 0; i < 8; ++i)
-	fpu.ftwx |= (!env->fptags[i]) << i;
+    for (i = 0; i < 8; ++i) {
+        fpu.ftwx |= (!env->fptags[i]) << i;
+    }
     memcpy(fpu.fpr, env->fpregs, sizeof env->fpregs);
     memcpy(fpu.xmm, env->xmm_regs, sizeof env->xmm_regs);
     fpu.mxcsr = env->mxcsr;
@@ -709,8 +715,9 @@ static int kvm_put_xsave(CPUState *env)
     struct kvm_xsave* xsave;
     uint16_t cwd, swd, twd, fop;
 
-    if (!kvm_has_xsave())
+    if (!kvm_has_xsave()) {
         return kvm_put_fpu(env);
+    }
 
     xsave = qemu_memalign(4096, sizeof(struct kvm_xsave));
     memset(xsave, 0, sizeof(struct kvm_xsave));
@@ -718,8 +725,9 @@ static int kvm_put_xsave(CPUState *env)
     swd = env->fpus & ~(7 << 11);
     swd |= (env->fpstt & 7) << 11;
     cwd = env->fpuc;
-    for (i = 0; i < 8; ++i)
+    for (i = 0; i < 8; ++i) {
         twd |= (!env->fptags[i]) << i;
+    }
     xsave->region[0] = (uint32_t)(swd << 16) + cwd;
     xsave->region[1] = (uint32_t)(fop << 16) + twd;
     memcpy(&xsave->region[XSAVE_ST_SPACE], env->fpregs,
@@ -743,8 +751,9 @@ static int kvm_put_xcrs(CPUState *env)
 #ifdef KVM_CAP_XCRS
     struct kvm_xcrs xcrs;
 
-    if (!kvm_has_xcrs())
+    if (!kvm_has_xcrs()) {
         return 0;
+    }
 
     xcrs.nr_xcrs = 1;
     xcrs.flags = 0;
@@ -767,19 +776,19 @@ static int kvm_put_sregs(CPUState *env)
     }
 
     if ((env->eflags & VM_MASK)) {
-	    set_v8086_seg(&sregs.cs, &env->segs[R_CS]);
-	    set_v8086_seg(&sregs.ds, &env->segs[R_DS]);
-	    set_v8086_seg(&sregs.es, &env->segs[R_ES]);
-	    set_v8086_seg(&sregs.fs, &env->segs[R_FS]);
-	    set_v8086_seg(&sregs.gs, &env->segs[R_GS]);
-	    set_v8086_seg(&sregs.ss, &env->segs[R_SS]);
+        set_v8086_seg(&sregs.cs, &env->segs[R_CS]);
+        set_v8086_seg(&sregs.ds, &env->segs[R_DS]);
+        set_v8086_seg(&sregs.es, &env->segs[R_ES]);
+        set_v8086_seg(&sregs.fs, &env->segs[R_FS]);
+        set_v8086_seg(&sregs.gs, &env->segs[R_GS]);
+        set_v8086_seg(&sregs.ss, &env->segs[R_SS]);
     } else {
-	    set_seg(&sregs.cs, &env->segs[R_CS]);
-	    set_seg(&sregs.ds, &env->segs[R_DS]);
-	    set_seg(&sregs.es, &env->segs[R_ES]);
-	    set_seg(&sregs.fs, &env->segs[R_FS]);
-	    set_seg(&sregs.gs, &env->segs[R_GS]);
-	    set_seg(&sregs.ss, &env->segs[R_SS]);
+        set_seg(&sregs.cs, &env->segs[R_CS]);
+        set_seg(&sregs.ds, &env->segs[R_DS]);
+        set_seg(&sregs.es, &env->segs[R_ES]);
+        set_seg(&sregs.fs, &env->segs[R_FS]);
+        set_seg(&sregs.gs, &env->segs[R_GS]);
+        set_seg(&sregs.ss, &env->segs[R_SS]);
     }
 
     set_seg(&sregs.tr, &env->tr);
@@ -822,10 +831,12 @@ static int kvm_put_msrs(CPUState *env, int level)
     kvm_msr_entry_set(&msrs[n++], MSR_IA32_SYSENTER_CS, env->sysenter_cs);
     kvm_msr_entry_set(&msrs[n++], MSR_IA32_SYSENTER_ESP, env->sysenter_esp);
     kvm_msr_entry_set(&msrs[n++], MSR_IA32_SYSENTER_EIP, env->sysenter_eip);
-    if (kvm_has_msr_star(env))
-	kvm_msr_entry_set(&msrs[n++], MSR_STAR, env->star);
-    if (kvm_has_msr_hsave_pa(env))
+    if (kvm_has_msr_star(env)) {
+        kvm_msr_entry_set(&msrs[n++], MSR_STAR, env->star);
+    }
+    if (kvm_has_msr_hsave_pa(env)) {
         kvm_msr_entry_set(&msrs[n++], MSR_VM_HSAVE_PA, env->vm_hsave);
+    }
 #ifdef TARGET_X86_64
     if (lm_capable_kernel) {
         kvm_msr_entry_set(&msrs[n++], MSR_CSTAR, env->cstar);
@@ -854,13 +865,15 @@ static int kvm_put_msrs(CPUState *env, int level)
 #ifdef KVM_CAP_MCE
     if (env->mcg_cap) {
         int i;
-        if (level == KVM_PUT_RESET_STATE)
+
+        if (level == KVM_PUT_RESET_STATE) {
             kvm_msr_entry_set(&msrs[n++], MSR_MCG_STATUS, env->mcg_status);
-        else if (level == KVM_PUT_FULL_STATE) {
+        } else if (level == KVM_PUT_FULL_STATE) {
             kvm_msr_entry_set(&msrs[n++], MSR_MCG_STATUS, env->mcg_status);
             kvm_msr_entry_set(&msrs[n++], MSR_MCG_CTL, env->mcg_ctl);
-            for (i = 0; i < (env->mcg_cap & 0xff) * 4; i++)
+            for (i = 0; i < (env->mcg_cap & 0xff) * 4; i++) {
                 kvm_msr_entry_set(&msrs[n++], MSR_MC0_CTL + i, env->mce_banks[i]);
+            }
         }
     }
 #endif
@@ -878,14 +891,16 @@ static int kvm_get_fpu(CPUState *env)
     int i, ret;
 
     ret = kvm_vcpu_ioctl(env, KVM_GET_FPU, &fpu);
-    if (ret < 0)
+    if (ret < 0) {
         return ret;
+    }
 
     env->fpstt = (fpu.fsw >> 11) & 7;
     env->fpus = fpu.fsw;
     env->fpuc = fpu.fcw;
-    for (i = 0; i < 8; ++i)
-	env->fptags[i] = !((fpu.ftwx >> i) & 1);
+    for (i = 0; i < 8; ++i) {
+        env->fptags[i] = !((fpu.ftwx >> i) & 1);
+    }
     memcpy(env->fpregs, fpu.fpr, sizeof env->fpregs);
     memcpy(env->xmm_regs, fpu.xmm, sizeof env->xmm_regs);
     env->mxcsr = fpu.mxcsr;
@@ -900,8 +915,9 @@ static int kvm_get_xsave(CPUState *env)
     int ret, i;
     uint16_t cwd, swd, twd, fop;
 
-    if (!kvm_has_xsave())
+    if (!kvm_has_xsave()) {
         return kvm_get_fpu(env);
+    }
 
     xsave = qemu_memalign(4096, sizeof(struct kvm_xsave));
     ret = kvm_vcpu_ioctl(env, KVM_GET_XSAVE, xsave);
@@ -917,8 +933,9 @@ static int kvm_get_xsave(CPUState *env)
     env->fpstt = (swd >> 11) & 7;
     env->fpus = swd;
     env->fpuc = cwd;
-    for (i = 0; i < 8; ++i)
+    for (i = 0; i < 8; ++i) {
         env->fptags[i] = !((twd >> i) & 1);
+    }
     env->mxcsr = xsave->region[XSAVE_MXCSR];
     memcpy(env->fpregs, &xsave->region[XSAVE_ST_SPACE],
             sizeof env->fpregs);
@@ -940,19 +957,22 @@ static int kvm_get_xcrs(CPUState *env)
     int i, ret;
     struct kvm_xcrs xcrs;
 
-    if (!kvm_has_xcrs())
+    if (!kvm_has_xcrs()) {
         return 0;
+    }
 
     ret = kvm_vcpu_ioctl(env, KVM_GET_XCRS, &xcrs);
-    if (ret < 0)
+    if (ret < 0) {
         return ret;
+    }
 
-    for (i = 0; i < xcrs.nr_xcrs; i++)
+    for (i = 0; i < xcrs.nr_xcrs; i++) {
         /* Only support xcr0 now */
         if (xcrs.xcrs[0].xcr == 0) {
             env->xcr0 = xcrs.xcrs[0].value;
             break;
         }
+    }
     return 0;
 #else
     return 0;
@@ -966,8 +986,9 @@ static int kvm_get_sregs(CPUState *env)
     int bit, i, ret;
 
     ret = kvm_vcpu_ioctl(env, KVM_GET_SREGS, &sregs);
-    if (ret < 0)
+    if (ret < 0) {
         return ret;
+    }
 
     /* There can only be one pending IRQ set in the bitmap at a time, so try
        to find it and save its number instead (-1 for none). */
@@ -1005,21 +1026,19 @@ static int kvm_get_sregs(CPUState *env)
     env->efer = sregs.efer;
     //cpu_set_apic_tpr(env->apic_state, sregs.cr8);
 
-#define HFLAG_COPY_MASK ~( \
-			HF_CPL_MASK | HF_PE_MASK | HF_MP_MASK | HF_EM_MASK | \
-			HF_TS_MASK | HF_TF_MASK | HF_VM_MASK | HF_IOPL_MASK | \
-			HF_OSFXSR_MASK | HF_LMA_MASK | HF_CS32_MASK | \
-			HF_SS32_MASK | HF_CS64_MASK | HF_ADDSEG_MASK)
-
-
+#define HFLAG_COPY_MASK \
+    ~( HF_CPL_MASK | HF_PE_MASK | HF_MP_MASK | HF_EM_MASK | \
+       HF_TS_MASK | HF_TF_MASK | HF_VM_MASK | HF_IOPL_MASK | \
+       HF_OSFXSR_MASK | HF_LMA_MASK | HF_CS32_MASK | \
+       HF_SS32_MASK | HF_CS64_MASK | HF_ADDSEG_MASK)
 
     hflags = (env->segs[R_CS].flags >> DESC_DPL_SHIFT) & HF_CPL_MASK;
     hflags |= (env->cr[0] & CR0_PE_MASK) << (HF_PE_SHIFT - CR0_PE_SHIFT);
     hflags |= (env->cr[0] << (HF_MP_SHIFT - CR0_MP_SHIFT)) &
-	    (HF_MP_MASK | HF_EM_MASK | HF_TS_MASK);
+                (HF_MP_MASK | HF_EM_MASK | HF_TS_MASK);
     hflags |= (env->eflags & (HF_TF_MASK | HF_VM_MASK | HF_IOPL_MASK));
     hflags |= (env->cr[4] & CR4_OSFXSR_MASK) <<
-	    (HF_OSFXSR_SHIFT - CR4_OSFXSR_SHIFT);
+                (HF_OSFXSR_SHIFT - CR4_OSFXSR_SHIFT);
 
     if (env->efer & MSR_EFER_LMA) {
         hflags |= HF_LMA_MASK;
@@ -1029,19 +1048,16 @@ static int kvm_get_sregs(CPUState *env)
         hflags |= HF_CS32_MASK | HF_SS32_MASK | HF_CS64_MASK;
     } else {
         hflags |= (env->segs[R_CS].flags & DESC_B_MASK) >>
-		(DESC_B_SHIFT - HF_CS32_SHIFT);
+                    (DESC_B_SHIFT - HF_CS32_SHIFT);
         hflags |= (env->segs[R_SS].flags & DESC_B_MASK) >>
-		(DESC_B_SHIFT - HF_SS32_SHIFT);
-        if (!(env->cr[0] & CR0_PE_MASK) ||
-                   (env->eflags & VM_MASK) ||
-                   !(hflags & HF_CS32_MASK)) {
-                hflags |= HF_ADDSEG_MASK;
-            } else {
-                hflags |= ((env->segs[R_DS].base |
-                                env->segs[R_ES].base |
-                                env->segs[R_SS].base) != 0) <<
-                    HF_ADDSEG_SHIFT;
-            }
+                    (DESC_B_SHIFT - HF_SS32_SHIFT);
+        if (!(env->cr[0] & CR0_PE_MASK) || (env->eflags & VM_MASK) ||
+            !(hflags & HF_CS32_MASK)) {
+            hflags |= HF_ADDSEG_MASK;
+        } else {
+            hflags |= ((env->segs[R_DS].base | env->segs[R_ES].base |
+                        env->segs[R_SS].base) != 0) << HF_ADDSEG_SHIFT;
+        }
     }
     env->hflags = (env->hflags & HFLAG_COPY_MASK) | hflags;
 
@@ -1061,10 +1077,12 @@ static int kvm_get_msrs(CPUState *env)
     msrs[n++].index = MSR_IA32_SYSENTER_CS;
     msrs[n++].index = MSR_IA32_SYSENTER_ESP;
     msrs[n++].index = MSR_IA32_SYSENTER_EIP;
-    if (kvm_has_msr_star(env))
-	msrs[n++].index = MSR_STAR;
-    if (kvm_has_msr_hsave_pa(env))
+    if (kvm_has_msr_star(env)) {
+        msrs[n++].index = MSR_STAR;
+    }
+    if (kvm_has_msr_hsave_pa(env)) {
         msrs[n++].index = MSR_VM_HSAVE_PA;
+    }
     msrs[n++].index = MSR_IA32_TSC;
 #ifdef TARGET_X86_64
     if (lm_capable_kernel) {
@@ -1084,15 +1102,17 @@ static int kvm_get_msrs(CPUState *env)
     if (env->mcg_cap) {
         msrs[n++].index = MSR_MCG_STATUS;
         msrs[n++].index = MSR_MCG_CTL;
-        for (i = 0; i < (env->mcg_cap & 0xff) * 4; i++)
+        for (i = 0; i < (env->mcg_cap & 0xff) * 4; i++) {
             msrs[n++].index = MSR_MC0_CTL + i;
+        }
     }
 #endif
 
     msr_data.info.nmsrs = n;
     ret = kvm_vcpu_ioctl(env, KVM_GET_MSRS, &msr_data);
-    if (ret < 0)
+    if (ret < 0) {
         return ret;
+    }
 
     for (i = 0; i < ret; i++) {
         switch (msrs[i].index) {
@@ -1320,7 +1340,7 @@ static int kvm_get_debugregs(CPUState *env)
 
     ret = kvm_vcpu_ioctl(env, KVM_GET_DEBUGREGS, &dbgregs);
     if (ret < 0) {
-       return ret;
+        return ret;
     }
     for (i = 0; i < 4; i++) {
         env->dr[i] = dbgregs.db[i];
@@ -1339,44 +1359,44 @@ int kvm_arch_put_registers(CPUState *env, int level)
     assert(cpu_is_stopped(env) || qemu_cpu_self(env));
 
     ret = kvm_getput_regs(env, 1);
-    if (ret < 0)
+    if (ret < 0) {
         return ret;
-
+    }
     ret = kvm_put_xsave(env);
-    if (ret < 0)
+    if (ret < 0) {
         return ret;
-
+    }
     ret = kvm_put_xcrs(env);
-    if (ret < 0)
+    if (ret < 0) {
         return ret;
-
+    }
     ret = kvm_put_sregs(env);
-    if (ret < 0)
+    if (ret < 0) {
         return ret;
-
+    }
     ret = kvm_put_msrs(env, level);
-    if (ret < 0)
+    if (ret < 0) {
         return ret;
-
+    }
     if (level >= KVM_PUT_RESET_STATE) {
         ret = kvm_put_mp_state(env);
-        if (ret < 0)
+        if (ret < 0) {
             return ret;
+        }
     }
-
     ret = kvm_put_vcpu_events(env, level);
-    if (ret < 0)
+    if (ret < 0) {
         return ret;
-
+    }
     /* must be last */
     ret = kvm_guest_debug_workarounds(env);
-    if (ret < 0)
+    if (ret < 0) {
         return ret;
-
+    }
     ret = kvm_put_debugregs(env);
-    if (ret < 0)
+    if (ret < 0) {
         return ret;
-
+    }
     return 0;
 }
 
@@ -1387,37 +1407,37 @@ int kvm_arch_get_registers(CPUState *env)
     assert(cpu_is_stopped(env) || qemu_cpu_self(env));
 
     ret = kvm_getput_regs(env, 0);
-    if (ret < 0)
+    if (ret < 0) {
         return ret;
-
+    }
     ret = kvm_get_xsave(env);
-    if (ret < 0)
+    if (ret < 0) {
         return ret;
-
+    }
     ret = kvm_get_xcrs(env);
-    if (ret < 0)
+    if (ret < 0) {
         return ret;
-
+    }
     ret = kvm_get_sregs(env);
-    if (ret < 0)
+    if (ret < 0) {
         return ret;
-
+    }
     ret = kvm_get_msrs(env);
-    if (ret < 0)
+    if (ret < 0) {
         return ret;
-
+    }
     ret = kvm_get_mp_state(env);
-    if (ret < 0)
+    if (ret < 0) {
         return ret;
-
+    }
     ret = kvm_get_vcpu_events(env);
-    if (ret < 0)
+    if (ret < 0) {
         return ret;
-
+    }
     ret = kvm_get_debugregs(env);
-    if (ret < 0)
+    if (ret < 0) {
         return ret;
-
+    }
     return 0;
 }
 
@@ -1451,10 +1471,11 @@ int kvm_arch_pre_run(CPUState *env, struct kvm_run *run)
      * interrupt, request an interrupt window exit.  This will
      * cause a return to userspace as soon as the guest is ready to
      * receive interrupts. */
-    if ((env->interrupt_request & CPU_INTERRUPT_HARD))
+    if ((env->interrupt_request & CPU_INTERRUPT_HARD)) {
         run->request_interrupt_window = 1;
-    else
+    } else {
         run->request_interrupt_window = 0;
+    }
 
     DPRINTF("setting tpr\n");
     run->cr8 = cpu_get_apic_tpr(env->apic_state);
@@ -1464,11 +1485,11 @@ int kvm_arch_pre_run(CPUState *env, struct kvm_run *run)
 
 int kvm_arch_post_run(CPUState *env, struct kvm_run *run)
 {
-    if (run->if_flag)
+    if (run->if_flag) {
         env->eflags |= IF_MASK;
-    else
+    } else {
         env->eflags &= ~IF_MASK;
-    
+    }
     cpu_set_apic_tpr(env->apic_state, run->cr8);
     cpu_set_apic_base(env->apic_state, run->apic_base);
 
@@ -1524,8 +1545,9 @@ int kvm_arch_insert_sw_breakpoint(CPUState *env, struct kvm_sw_breakpoint *bp)
     static const uint8_t int3 = 0xcc;
 
     if (cpu_memory_rw_debug(env, bp->pc, (uint8_t *)&bp->saved_insn, 1, 0) ||
-        cpu_memory_rw_debug(env, bp->pc, (uint8_t *)&int3, 1, 1))
+        cpu_memory_rw_debug(env, bp->pc, (uint8_t *)&int3, 1, 1)) {
         return -EINVAL;
+    }
     return 0;
 }
 
@@ -1534,8 +1556,9 @@ int kvm_arch_remove_sw_breakpoint(CPUState *env, struct kvm_sw_breakpoint *bp)
     uint8_t int3;
 
     if (cpu_memory_rw_debug(env, bp->pc, &int3, 1, 0) || int3 != 0xcc ||
-        cpu_memory_rw_debug(env, bp->pc, (uint8_t *)&bp->saved_insn, 1, 1))
+        cpu_memory_rw_debug(env, bp->pc, (uint8_t *)&bp->saved_insn, 1, 1)) {
         return -EINVAL;
+    }
     return 0;
 }
 
@@ -1551,10 +1574,12 @@ static int find_hw_breakpoint(target_ulong addr, int len, int type)
 {
     int n;
 
-    for (n = 0; n < nb_hw_breakpoint; n++)
+    for (n = 0; n < nb_hw_breakpoint; n++) {
         if (hw_breakpoint[n].addr == addr && hw_breakpoint[n].type == type &&
-            (hw_breakpoint[n].len == len || len == -1))
+            (hw_breakpoint[n].len == len || len == -1)) {
             return n;
+        }
+    }
     return -1;
 }
 
@@ -1573,8 +1598,9 @@ int kvm_arch_insert_hw_breakpoint(target_ulong addr,
         case 2:
         case 4:
         case 8:
-            if (addr & (len - 1))
+            if (addr & (len - 1)) {
                 return -EINVAL;
+            }
             break;
         default:
             return -EINVAL;
@@ -1584,12 +1610,12 @@ int kvm_arch_insert_hw_breakpoint(target_ulong addr,
         return -ENOSYS;
     }
 
-    if (nb_hw_breakpoint == 4)
+    if (nb_hw_breakpoint == 4) {
         return -ENOBUFS;
-
-    if (find_hw_breakpoint(addr, len, type) >= 0)
+    }
+    if (find_hw_breakpoint(addr, len, type) >= 0) {
         return -EEXIST;
-
+    }
     hw_breakpoint[nb_hw_breakpoint].addr = addr;
     hw_breakpoint[nb_hw_breakpoint].len = len;
     hw_breakpoint[nb_hw_breakpoint].type = type;
@@ -1604,9 +1630,9 @@ int kvm_arch_remove_hw_breakpoint(target_ulong addr,
     int n;
 
     n = find_hw_breakpoint(addr, (type == GDB_BREAKPOINT_HW) ? 1 : len, type);
-    if (n < 0)
+    if (n < 0) {
         return -ENOENT;
-
+    }
     nb_hw_breakpoint--;
     hw_breakpoint[n] = hw_breakpoint[nb_hw_breakpoint];
 
@@ -1627,11 +1653,12 @@ int kvm_arch_debug(struct kvm_debug_exit_arch *arch_info)
 
     if (arch_info->exception == 1) {
         if (arch_info->dr6 & (1 << 14)) {
-            if (cpu_single_env->singlestep_enabled)
+            if (cpu_single_env->singlestep_enabled) {
                 handle = 1;
+            }
         } else {
-            for (n = 0; n < 4; n++)
-                if (arch_info->dr6 & (1 << n))
+            for (n = 0; n < 4; n++) {
+                if (arch_info->dr6 & (1 << n)) {
                     switch ((arch_info->dr7 >> (16 + n*4)) & 0x3) {
                     case 0x0:
                         handle = 1;
@@ -1649,10 +1676,12 @@ int kvm_arch_debug(struct kvm_debug_exit_arch *arch_info)
                         hw_watchpoint.flags = BP_MEM_ACCESS;
                         break;
                     }
+                }
+            }
         }
-    } else if (kvm_find_sw_breakpoint(cpu_single_env, arch_info->pc))
+    } else if (kvm_find_sw_breakpoint(cpu_single_env, arch_info->pc)) {
         handle = 1;
-
+    }
     if (!handle) {
         cpu_synchronize_state(cpu_single_env);
         assert(cpu_single_env->exception_injected == -1);
@@ -1676,9 +1705,9 @@ void kvm_arch_update_guest_debug(CPUState *env, struct kvm_guest_debug *dbg)
     };
     int n;
 
-    if (kvm_sw_breakpoints_active(env))
+    if (kvm_sw_breakpoints_active(env)) {
         dbg->control |= KVM_GUESTDBG_ENABLE | KVM_GUESTDBG_USE_SW_BP;
-
+    }
     if (nb_hw_breakpoint > 0) {
         dbg->control |= KVM_GUESTDBG_ENABLE | KVM_GUESTDBG_USE_HW_BP;
         dbg->arch.debugreg[7] = 0x0600;
@@ -1696,8 +1725,8 @@ void kvm_arch_update_guest_debug(CPUState *env, struct kvm_guest_debug *dbg)
 
 bool kvm_arch_stop_on_emulation_error(CPUState *env)
 {
-      return !(env->cr[0] & CR0_PE_MASK) ||
-              ((env->segs[R_CS].selector  & 3) != 3);
+    return !(env->cr[0] & CR0_PE_MASK) ||
+           ((env->segs[R_CS].selector  & 3) != 3);
 }
 
 static void hardware_memory_error(void)
-- 
1.7.2.3
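
A note for readers of the hardware-breakpoint hunks above: the two bit
tricks that this patch re-braces but does not change can be checked in
isolation. The following is a minimal sketch in C, not code from
target-i386/kvm.c; the helper names watchpoint_is_aligned() and
dr7_rw_field() are made up for illustration. It assumes the usual x86
DR7 layout, where the per-breakpoint fields start at bit 16 and occupy
four bits each.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of target-i386/kvm.c.  It mirrors the
 * "addr & (len - 1)" test in kvm_arch_insert_hw_breakpoint(): for a
 * power-of-two len, (len - 1) is a mask of the low address bits that
 * must all be zero for the address to be naturally aligned. */
int watchpoint_is_aligned(uint64_t addr, int len)
{
    /* Only the lengths accepted by the patched switch are meaningful. */
    assert(len == 1 || len == 2 || len == 4 || len == 8);
    return (addr & (uint64_t)(len - 1)) == 0;
}

/* Hypothetical helper, not part of target-i386/kvm.c.  It extracts the
 * 2-bit R/W field of breakpoint n from DR7 with the same expression as
 * the switch in kvm_arch_debug().  Each breakpoint owns four bits
 * starting at bit 16: R/W in the low two, LEN in the high two. */
unsigned dr7_rw_field(uint64_t dr7, int n)
{
    assert(n >= 0 && n < 4);
    return (unsigned)((dr7 >> (16 + n * 4)) & 0x3);
}
```

With these, watchpoint_is_aligned(0x1004, 8) is false, which corresponds
to the -EINVAL path in kvm_arch_insert_hw_breakpoint(), and an R/W field
of 0 is the case that sets handle = 1 in kvm_arch_debug().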




* [PATCH 13/35] kvm: Fix coding style violations
  2011-01-06 17:56 ` [Qemu-devel] " Marcelo Tosatti
@ 2011-01-06 17:56   ` Marcelo Tosatti
  -1 siblings, 0 replies; 300+ messages in thread
From: Marcelo Tosatti @ 2011-01-06 17:56 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: qemu-devel, kvm, Jan Kiszka, Marcelo Tosatti

From: Jan Kiszka <jan.kiszka@siemens.com>

No functional changes.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
---
 kvm-all.c |  139 ++++++++++++++++++++++++++++++++++--------------------------
 1 files changed, 79 insertions(+), 60 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index 35fc73c..2538283 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -82,10 +82,12 @@ static KVMSlot *kvm_alloc_slot(KVMState *s)
 
     for (i = 0; i < ARRAY_SIZE(s->slots); i++) {
         /* KVM private memory slots */
-        if (i >= 8 && i < 12)
+        if (i >= 8 && i < 12) {
             continue;
-        if (s->slots[i].memory_size == 0)
+        }
+        if (s->slots[i].memory_size == 0) {
             return &s->slots[i];
+        }
     }
 
     fprintf(stderr, "%s: no free slot available\n", __func__);
@@ -220,9 +222,10 @@ int kvm_init_vcpu(CPUState *env)
     }
 
 #ifdef KVM_CAP_COALESCED_MMIO
-    if (s->coalesced_mmio && !s->coalesced_mmio_ring)
-        s->coalesced_mmio_ring = (void *) env->kvm_run +
-		s->coalesced_mmio * PAGE_SIZE;
+    if (s->coalesced_mmio && !s->coalesced_mmio_ring) {
+        s->coalesced_mmio_ring =
+            (void *)env->kvm_run + s->coalesced_mmio * PAGE_SIZE;
+    }
 #endif
 
     ret = kvm_arch_init_vcpu(env);
@@ -269,16 +272,14 @@ static int kvm_dirty_pages_log_change(target_phys_addr_t phys_addr,
 
 int kvm_log_start(target_phys_addr_t phys_addr, ram_addr_t size)
 {
-        return kvm_dirty_pages_log_change(phys_addr, size,
-                                          KVM_MEM_LOG_DIRTY_PAGES,
-                                          KVM_MEM_LOG_DIRTY_PAGES);
+    return kvm_dirty_pages_log_change(phys_addr, size, KVM_MEM_LOG_DIRTY_PAGES,
+                                      KVM_MEM_LOG_DIRTY_PAGES);
 }
 
 int kvm_log_stop(target_phys_addr_t phys_addr, ram_addr_t size)
 {
-        return kvm_dirty_pages_log_change(phys_addr, size,
-                                          0,
-                                          KVM_MEM_LOG_DIRTY_PAGES);
+    return kvm_dirty_pages_log_change(phys_addr, size, 0,
+                                      KVM_MEM_LOG_DIRTY_PAGES);
 }
 
 static int kvm_set_migration_log(int enable)
@@ -350,7 +351,7 @@ static int kvm_get_dirty_pages_log_range(unsigned long start_addr,
  * @end_addr: end of logged region.
  */
 static int kvm_physical_sync_dirty_bitmap(target_phys_addr_t start_addr,
-					  target_phys_addr_t end_addr)
+                                          target_phys_addr_t end_addr)
 {
     KVMState *s = kvm_state;
     unsigned long size, allocated_size = 0;
@@ -441,9 +442,8 @@ int kvm_check_extension(KVMState *s, unsigned int extension)
     return ret;
 }
 
-static void kvm_set_phys_mem(target_phys_addr_t start_addr,
-			     ram_addr_t size,
-			     ram_addr_t phys_offset)
+static void kvm_set_phys_mem(target_phys_addr_t start_addr, ram_addr_t size,
+                             ram_addr_t phys_offset)
 {
     KVMState *s = kvm_state;
     ram_addr_t flags = phys_offset & ~TARGET_PAGE_MASK;
@@ -550,13 +550,13 @@ static void kvm_set_phys_mem(target_phys_addr_t start_addr,
     }
 
     /* in case the KVM bug workaround already "consumed" the new slot */
-    if (!size)
+    if (!size) {
         return;
-
+    }
     /* KVM does not need to know about this memory */
-    if (flags >= IO_MEM_UNASSIGNED)
+    if (flags >= IO_MEM_UNASSIGNED) {
         return;
-
+    }
     mem = kvm_alloc_slot(s);
     mem->memory_size = size;
     mem->start_addr = start_addr;
@@ -572,30 +572,29 @@ static void kvm_set_phys_mem(target_phys_addr_t start_addr,
 }
 
 static void kvm_client_set_memory(struct CPUPhysMemoryClient *client,
-				  target_phys_addr_t start_addr,
-				  ram_addr_t size,
-				  ram_addr_t phys_offset)
+                                  target_phys_addr_t start_addr,
+                                  ram_addr_t size, ram_addr_t phys_offset)
 {
-	kvm_set_phys_mem(start_addr, size, phys_offset);
+    kvm_set_phys_mem(start_addr, size, phys_offset);
 }
 
 static int kvm_client_sync_dirty_bitmap(struct CPUPhysMemoryClient *client,
-					target_phys_addr_t start_addr,
-					target_phys_addr_t end_addr)
+                                        target_phys_addr_t start_addr,
+                                        target_phys_addr_t end_addr)
 {
-	return kvm_physical_sync_dirty_bitmap(start_addr, end_addr);
+    return kvm_physical_sync_dirty_bitmap(start_addr, end_addr);
 }
 
 static int kvm_client_migration_log(struct CPUPhysMemoryClient *client,
-				    int enable)
+                                    int enable)
 {
-	return kvm_set_migration_log(enable);
+    return kvm_set_migration_log(enable);
 }
 
 static CPUPhysMemoryClient kvm_cpu_phys_memory_client = {
-	.set_memory = kvm_client_set_memory,
-	.sync_dirty_bitmap = kvm_client_sync_dirty_bitmap,
-	.migration_log = kvm_client_migration_log,
+    .set_memory = kvm_client_set_memory,
+    .sync_dirty_bitmap = kvm_client_sync_dirty_bitmap,
+    .migration_log = kvm_client_migration_log,
 };
 
 int kvm_init(int smp_cpus)
@@ -612,9 +611,9 @@ int kvm_init(int smp_cpus)
 #ifdef KVM_CAP_SET_GUEST_DEBUG
     QTAILQ_INIT(&s->kvm_sw_breakpoints);
 #endif
-    for (i = 0; i < ARRAY_SIZE(s->slots); i++)
+    for (i = 0; i < ARRAY_SIZE(s->slots); i++) {
         s->slots[i].slot = i;
-
+    }
     s->vmfd = -1;
     s->fd = qemu_open("/dev/kvm", O_RDWR);
     if (s->fd == -1) {
@@ -625,8 +624,9 @@ int kvm_init(int smp_cpus)
 
     ret = kvm_ioctl(s, KVM_GET_API_VERSION, 0);
     if (ret < KVM_API_VERSION) {
-        if (ret > 0)
+        if (ret > 0) {
             ret = -EINVAL;
+        }
         fprintf(stderr, "kvm version too old\n");
         goto err;
     }
@@ -711,8 +711,9 @@ int kvm_init(int smp_cpus)
 #endif
 
     ret = kvm_arch_init(s, smp_cpus);
-    if (ret < 0)
+    if (ret < 0) {
         goto err;
+    }
 
     kvm_state = s;
     cpu_register_phys_memory_client(&kvm_cpu_phys_memory_client);
@@ -721,10 +722,12 @@ int kvm_init(int smp_cpus)
 
 err:
     if (s) {
-        if (s->vmfd != -1)
+        if (s->vmfd != -1) {
             close(s->vmfd);
-        if (s->fd != -1)
+        }
+        if (s->fd != -1) {
             close(s->fd);
+        }
     }
     qemu_free(s);
 
@@ -788,8 +791,9 @@ static void kvm_handle_internal_error(CPUState *env, struct kvm_run *run)
     cpu_dump_state(env, stderr, fprintf, 0);
     if (run->internal.suberror == KVM_INTERNAL_ERROR_EMULATION) {
         fprintf(stderr, "emulation failure\n");
-        if (!kvm_arch_stop_on_emulation_error(env))
-		return;
+        if (!kvm_arch_stop_on_emulation_error(env)) {
+            return;
+        }
     }
     /* FIXME: Should trigger a qmp message to let management know
      * something went wrong.
@@ -829,8 +833,9 @@ static void do_kvm_cpu_synchronize_state(void *_env)
 
 void kvm_cpu_synchronize_state(CPUState *env)
 {
-    if (!env->kvm_vcpu_dirty)
+    if (!env->kvm_vcpu_dirty) {
         run_on_cpu(env, do_kvm_cpu_synchronize_state, env);
+    }
 }
 
 void kvm_cpu_synchronize_post_reset(CPUState *env)
@@ -970,9 +975,9 @@ int kvm_ioctl(KVMState *s, int type, ...)
     va_end(ap);
 
     ret = ioctl(s->fd, type, arg);
-    if (ret == -1)
+    if (ret == -1) {
         ret = -errno;
-
+    }
     return ret;
 }
 
@@ -987,9 +992,9 @@ int kvm_vm_ioctl(KVMState *s, int type, ...)
     va_end(ap);
 
     ret = ioctl(s->vmfd, type, arg);
-    if (ret == -1)
+    if (ret == -1) {
         ret = -errno;
-
+    }
     return ret;
 }
 
@@ -1004,9 +1009,9 @@ int kvm_vcpu_ioctl(CPUState *env, int type, ...)
     va_end(ap);
 
     ret = ioctl(env->kvm_fd, type, arg);
-    if (ret == -1)
+    if (ret == -1) {
         ret = -errno;
-
+    }
     return ret;
 }
 
@@ -1067,8 +1072,9 @@ struct kvm_sw_breakpoint *kvm_find_sw_breakpoint(CPUState *env,
     struct kvm_sw_breakpoint *bp;
 
     QTAILQ_FOREACH(bp, &env->kvm_state->kvm_sw_breakpoints, entry) {
-        if (bp->pc == pc)
+        if (bp->pc == pc) {
             return bp;
+        }
     }
     return NULL;
 }
@@ -1123,8 +1129,9 @@ int kvm_insert_breakpoint(CPUState *current_env, target_ulong addr,
         }
 
         bp = qemu_malloc(sizeof(struct kvm_sw_breakpoint));
-        if (!bp)
+        if (!bp) {
             return -ENOMEM;
+        }
 
         bp->pc = addr;
         bp->use_count = 1;
@@ -1138,14 +1145,16 @@ int kvm_insert_breakpoint(CPUState *current_env, target_ulong addr,
                           bp, entry);
     } else {
         err = kvm_arch_insert_hw_breakpoint(addr, len, type);
-        if (err)
+        if (err) {
             return err;
+        }
     }
 
     for (env = first_cpu; env != NULL; env = env->next_cpu) {
         err = kvm_update_guest_debug(env, 0);
-        if (err)
+        if (err) {
             return err;
+        }
     }
     return 0;
 }
@@ -1159,8 +1168,9 @@ int kvm_remove_breakpoint(CPUState *current_env, target_ulong addr,
 
     if (type == GDB_BREAKPOINT_SW) {
         bp = kvm_find_sw_breakpoint(current_env, addr);
-        if (!bp)
+        if (!bp) {
             return -ENOENT;
+        }
 
         if (bp->use_count > 1) {
             bp->use_count--;
@@ -1168,21 +1178,24 @@ int kvm_remove_breakpoint(CPUState *current_env, target_ulong addr,
         }
 
         err = kvm_arch_remove_sw_breakpoint(current_env, bp);
-        if (err)
+        if (err) {
             return err;
+        }
 
         QTAILQ_REMOVE(&current_env->kvm_state->kvm_sw_breakpoints, bp, entry);
         qemu_free(bp);
     } else {
         err = kvm_arch_remove_hw_breakpoint(addr, len, type);
-        if (err)
+        if (err) {
             return err;
+        }
     }
 
     for (env = first_cpu; env != NULL; env = env->next_cpu) {
         err = kvm_update_guest_debug(env, 0);
-        if (err)
+        if (err) {
             return err;
+        }
     }
     return 0;
 }
@@ -1197,15 +1210,17 @@ void kvm_remove_all_breakpoints(CPUState *current_env)
         if (kvm_arch_remove_sw_breakpoint(current_env, bp) != 0) {
             /* Try harder to find a CPU that currently sees the breakpoint. */
             for (env = first_cpu; env != NULL; env = env->next_cpu) {
-                if (kvm_arch_remove_sw_breakpoint(env, bp) == 0)
+                if (kvm_arch_remove_sw_breakpoint(env, bp) == 0) {
                     break;
+                }
             }
         }
     }
     kvm_arch_remove_all_hw_breakpoints();
 
-    for (env = first_cpu; env != NULL; env = env->next_cpu)
+    for (env = first_cpu; env != NULL; env = env->next_cpu) {
         kvm_update_guest_debug(env, 0);
+    }
 }
 
 #else /* !KVM_CAP_SET_GUEST_DEBUG */
@@ -1237,8 +1252,9 @@ int kvm_set_signal_mask(CPUState *env, const sigset_t *sigset)
     struct kvm_signal_mask *sigmask;
     int r;
 
-    if (!sigset)
+    if (!sigset) {
         return kvm_vcpu_ioctl(env, KVM_SET_SIGNAL_MASK, NULL);
+    }
 
     sigmask = qemu_malloc(sizeof(*sigmask) + sizeof(*sigset));
 
@@ -1293,13 +1309,16 @@ int kvm_set_ioeventfd_pio_word(int fd, uint16_t addr, uint16_t val, bool assign)
         .fd = fd,
     };
     int r;
-    if (!kvm_enabled())
+    if (!kvm_enabled()) {
         return -ENOSYS;
-    if (!assign)
+    }
+    if (!assign) {
         kick.flags |= KVM_IOEVENTFD_FLAG_DEASSIGN;
+    }
     r = kvm_vm_ioctl(kvm_state, KVM_IOEVENTFD, &kick);
-    if (r < 0)
+    if (r < 0) {
         return r;
+    }
     return 0;
 #else
     return -ENOSYS;
-- 
1.7.2.3


^ permalink raw reply related	[flat|nested] 300+ messages in thread

* [PATCH 14/35] kvm: Drop return value of kvm_cpu_exec
  2011-01-06 17:56 ` [Qemu-devel] " Marcelo Tosatti
@ 2011-01-06 17:56   ` Marcelo Tosatti
  -1 siblings, 0 replies; 300+ messages in thread
From: Marcelo Tosatti @ 2011-01-06 17:56 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: qemu-devel, kvm, Jan Kiszka, Marcelo Tosatti

From: Jan Kiszka <jan.kiszka@siemens.com>

It is not used, it is not needed, so let's remove it.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
---
 kvm-all.c  |    6 ++----
 kvm-stub.c |    4 ++--
 kvm.h      |    2 +-
 3 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index 2538283..7518f2c 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -850,7 +850,7 @@ void kvm_cpu_synchronize_post_init(CPUState *env)
     env->kvm_vcpu_dirty = 0;
 }
 
-int kvm_cpu_exec(CPUState *env)
+void kvm_cpu_exec(CPUState *env)
 {
     struct kvm_run *run = env->kvm_run;
     int ret;
@@ -943,7 +943,7 @@ int kvm_cpu_exec(CPUState *env)
 #ifdef KVM_CAP_SET_GUEST_DEBUG
             if (kvm_arch_debug(&run->debug.arch)) {
                 env->exception_index = EXCP_DEBUG;
-                return 0;
+                return;
             }
             /* re-enter, this exception was guest-internal */
             ret = 1;
@@ -960,8 +960,6 @@ int kvm_cpu_exec(CPUState *env)
         env->exit_request = 0;
         env->exception_index = EXCP_INTERRUPT;
     }
-
-    return ret;
 }
 
 int kvm_ioctl(KVMState *s, int type, ...)
diff --git a/kvm-stub.c b/kvm-stub.c
index 5384a4b..352c6a6 100644
--- a/kvm-stub.c
+++ b/kvm-stub.c
@@ -79,9 +79,9 @@ void kvm_cpu_synchronize_post_init(CPUState *env)
 {
 }
 
-int kvm_cpu_exec(CPUState *env)
+void kvm_cpu_exec(CPUState *env)
 {
-    abort ();
+    abort();
 }
 
 int kvm_has_sync_mmu(void)
diff --git a/kvm.h b/kvm.h
index 60a9b42..51ad56f 100644
--- a/kvm.h
+++ b/kvm.h
@@ -46,7 +46,7 @@ int kvm_has_xcrs(void);
 #ifdef NEED_CPU_H
 int kvm_init_vcpu(CPUState *env);
 
-int kvm_cpu_exec(CPUState *env);
+void kvm_cpu_exec(CPUState *env);
 
 #if !defined(CONFIG_USER_ONLY)
 int kvm_log_start(target_phys_addr_t phys_addr, ram_addr_t size);
-- 
1.7.2.3


^ permalink raw reply related	[flat|nested] 300+ messages in thread

* [PATCH 15/35] kvm: Stop on all fatal exit reasons
  2011-01-06 17:56 ` [Qemu-devel] " Marcelo Tosatti
@ 2011-01-06 17:56   ` Marcelo Tosatti
  -1 siblings, 0 replies; 300+ messages in thread
From: Marcelo Tosatti @ 2011-01-06 17:56 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: qemu-devel, kvm, Jan Kiszka, Marcelo Tosatti

From: Jan Kiszka <jan.kiszka@siemens.com>

Ensure that we stop the guest whenever we face a fatal or unknown exit
reason. If we stop, we also have to enforce a cpu loop exit.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
---
 kvm-all.c         |   15 +++++++++++----
 target-i386/kvm.c |    4 ++++
 target-ppc/kvm.c  |    4 ++++
 3 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index 7518f2c..a46a3b6 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -774,7 +774,7 @@ static int kvm_handle_io(uint16_t port, void *data, int direction, int size,
 }
 
 #ifdef KVM_CAP_INTERNAL_ERROR_DATA
-static void kvm_handle_internal_error(CPUState *env, struct kvm_run *run)
+static int kvm_handle_internal_error(CPUState *env, struct kvm_run *run)
 {
 
     if (kvm_check_extension(kvm_state, KVM_CAP_INTERNAL_ERROR_DATA)) {
@@ -792,13 +792,13 @@ static void kvm_handle_internal_error(CPUState *env, struct kvm_run *run)
     if (run->internal.suberror == KVM_INTERNAL_ERROR_EMULATION) {
         fprintf(stderr, "emulation failure\n");
         if (!kvm_arch_stop_on_emulation_error(env)) {
-            return;
+            return 0;
         }
     }
     /* FIXME: Should trigger a qmp message to let management know
      * something went wrong.
      */
-    vm_stop(0);
+    return -1;
 }
 #endif
 
@@ -926,16 +926,19 @@ void kvm_cpu_exec(CPUState *env)
             break;
         case KVM_EXIT_UNKNOWN:
             DPRINTF("kvm_exit_unknown\n");
+            ret = -1;
             break;
         case KVM_EXIT_FAIL_ENTRY:
             DPRINTF("kvm_exit_fail_entry\n");
+            ret = -1;
             break;
         case KVM_EXIT_EXCEPTION:
             DPRINTF("kvm_exit_exception\n");
+            ret = -1;
             break;
 #ifdef KVM_CAP_INTERNAL_ERROR_DATA
         case KVM_EXIT_INTERNAL_ERROR:
-            kvm_handle_internal_error(env, run);
+            ret = kvm_handle_internal_error(env, run);
             break;
 #endif
         case KVM_EXIT_DEBUG:
@@ -956,6 +959,10 @@ void kvm_cpu_exec(CPUState *env)
         }
     } while (ret > 0);
 
+    if (ret < 0) {
+        vm_stop(0);
+        env->exit_request = 1;
+    }
     if (env->exit_request) {
         env->exit_request = 0;
         env->exception_index = EXCP_INTERRUPT;
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index fda07d2..2431a1f 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -1534,6 +1534,10 @@ int kvm_arch_handle_exit(CPUState *env, struct kvm_run *run)
         DPRINTF("handle_hlt\n");
         ret = kvm_handle_halt(env);
         break;
+    default:
+        fprintf(stderr, "KVM: unknown exit reason %d\n", run->exit_reason);
+        ret = -1;
+        break;
     }
 
     return ret;
diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
index 5caa07c..849b404 100644
--- a/target-ppc/kvm.c
+++ b/target-ppc/kvm.c
@@ -307,6 +307,10 @@ int kvm_arch_handle_exit(CPUState *env, struct kvm_run *run)
         dprintf("handle halt\n");
         ret = kvmppc_handle_halt(env);
         break;
+    default:
+        fprintf(stderr, "KVM: unknown exit reason %d\n", run->exit_reason);
+        ret = -1;
+        break;
     }
 
     return ret;
-- 
1.7.2.3


^ permalink raw reply related	[flat|nested] 300+ messages in thread

* [PATCH 16/35] kvm: Improve reporting of fatal errors
  2011-01-06 17:56 ` [Qemu-devel] " Marcelo Tosatti
@ 2011-01-06 17:56   ` Marcelo Tosatti
  -1 siblings, 0 replies; 300+ messages in thread
From: Marcelo Tosatti @ 2011-01-06 17:56 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: qemu-devel, kvm, Jan Kiszka, Marcelo Tosatti

From: Jan Kiszka <jan.kiszka@siemens.com>

Report KVM_EXIT_UNKNOWN, KVM_EXIT_FAIL_ENTRY, and KVM_EXIT_EXCEPTION
with more details to stderr. The latter two are so far x86-only, so move
them into the arch-specific handler. Integrate the Intel real mode
warning on KVM_EXIT_FAIL_ENTRY that qemu-kvm carries, but actually
restrict it to Intel CPUs. Moreover, always dump the CPU state in case
we fail.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
---
 kvm-all.c           |   22 ++++++++--------------
 target-i386/cpu.h   |    2 ++
 target-i386/cpuid.c |    5 ++---
 target-i386/kvm.c   |   33 +++++++++++++++++++++++++++++++++
 4 files changed, 45 insertions(+), 17 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index a46a3b6..ad1d0a8 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -776,22 +776,22 @@ static int kvm_handle_io(uint16_t port, void *data, int direction, int size,
 #ifdef KVM_CAP_INTERNAL_ERROR_DATA
 static int kvm_handle_internal_error(CPUState *env, struct kvm_run *run)
 {
-
+    fprintf(stderr, "KVM internal error.");
     if (kvm_check_extension(kvm_state, KVM_CAP_INTERNAL_ERROR_DATA)) {
         int i;
 
-        fprintf(stderr, "KVM internal error. Suberror: %d\n",
-                run->internal.suberror);
-
+        fprintf(stderr, " Suberror: %d\n", run->internal.suberror);
         for (i = 0; i < run->internal.ndata; ++i) {
             fprintf(stderr, "extra data[%d]: %"PRIx64"\n",
                     i, (uint64_t)run->internal.data[i]);
         }
+    } else {
+        fprintf(stderr, "\n");
     }
-    cpu_dump_state(env, stderr, fprintf, 0);
     if (run->internal.suberror == KVM_INTERNAL_ERROR_EMULATION) {
         fprintf(stderr, "emulation failure\n");
         if (!kvm_arch_stop_on_emulation_error(env)) {
+            cpu_dump_state(env, stderr, fprintf, 0);
             return 0;
         }
     }
@@ -925,15 +925,8 @@ void kvm_cpu_exec(CPUState *env)
             ret = 1;
             break;
         case KVM_EXIT_UNKNOWN:
-            DPRINTF("kvm_exit_unknown\n");
-            ret = -1;
-            break;
-        case KVM_EXIT_FAIL_ENTRY:
-            DPRINTF("kvm_exit_fail_entry\n");
-            ret = -1;
-            break;
-        case KVM_EXIT_EXCEPTION:
-            DPRINTF("kvm_exit_exception\n");
+            fprintf(stderr, "KVM: unknown exit, hardware reason %" PRIx64 "\n",
+                    (uint64_t)run->hw.hardware_exit_reason);
             ret = -1;
             break;
 #ifdef KVM_CAP_INTERNAL_ERROR_DATA
@@ -960,6 +953,7 @@ void kvm_cpu_exec(CPUState *env)
     } while (ret > 0);
 
     if (ret < 0) {
+        cpu_dump_state(env, stderr, fprintf, 0);
         vm_stop(0);
         env->exit_request = 1;
     }
diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index dddcd74..a457423 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -874,6 +874,8 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
                    uint32_t *ecx, uint32_t *edx);
 int cpu_x86_register (CPUX86State *env, const char *cpu_model);
 void cpu_clear_apic_feature(CPUX86State *env);
+void host_cpuid(uint32_t function, uint32_t count,
+                uint32_t *eax, uint32_t *ebx, uint32_t *ecx, uint32_t *edx);
 
 /* helper.c */
 int cpu_x86_handle_mmu_fault(CPUX86State *env, target_ulong addr,
diff --git a/target-i386/cpuid.c b/target-i386/cpuid.c
index 165045e..5382a28 100644
--- a/target-i386/cpuid.c
+++ b/target-i386/cpuid.c
@@ -103,9 +103,8 @@ typedef struct model_features_t {
 int check_cpuid = 0;
 int enforce_cpuid = 0;
 
-static void host_cpuid(uint32_t function, uint32_t count,
-                       uint32_t *eax, uint32_t *ebx,
-                       uint32_t *ecx, uint32_t *edx)
+void host_cpuid(uint32_t function, uint32_t count,
+                uint32_t *eax, uint32_t *ebx, uint32_t *ecx, uint32_t *edx)
 {
 #if defined(CONFIG_KVM)
     uint32_t vec[4];
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 2431a1f..d4f253e 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -1525,8 +1525,19 @@ static int kvm_handle_halt(CPUState *env)
     return 1;
 }
 
+static bool host_supports_vmx(void)
+{
+    uint32_t ecx, unused;
+
+    host_cpuid(1, 0, &unused, &unused, &ecx, &unused);
+    return ecx & CPUID_EXT_VMX;
+}
+
+#define VMX_INVALID_GUEST_STATE 0x80000021
+
 int kvm_arch_handle_exit(CPUState *env, struct kvm_run *run)
 {
+    uint64_t code;
     int ret = 0;
 
     switch (run->exit_reason) {
@@ -1534,6 +1545,28 @@ int kvm_arch_handle_exit(CPUState *env, struct kvm_run *run)
         DPRINTF("handle_hlt\n");
         ret = kvm_handle_halt(env);
         break;
+    case KVM_EXIT_FAIL_ENTRY:
+        code = run->fail_entry.hardware_entry_failure_reason;
+        fprintf(stderr, "KVM: entry failed, hardware error 0x%" PRIx64 "\n",
+                code);
+        if (host_supports_vmx() && code == VMX_INVALID_GUEST_STATE) {
+            fprintf(stderr,
+                    "\nIf you're running a guest on an Intel machine without "
+                        "unrestricted mode\n"
+                    "support, the failure is most likely due to the guest "
+                        "entering an invalid\n"
+                    "state for Intel VT. For example, the guest may be "
+                        "running in big real mode\n"
+                    "which is not supported on less recent Intel processors."
+                        "\n\n");
+        }
+        ret = -1;
+        break;
+    case KVM_EXIT_EXCEPTION:
+        fprintf(stderr, "KVM: exception %d exit (error code 0x%x)\n",
+                run->ex.exception, run->ex.error_code);
+        ret = -1;
+        break;
     default:
         fprintf(stderr, "KVM: unknown exit reason %d\n", run->exit_reason);
         ret = -1;
-- 
1.7.2.3
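
For readers following the KVM_EXIT_FAIL_ENTRY handling in this patch, the decision of when to print the real mode hint can be sketched outside of QEMU. This is a minimal sketch, not the QEMU code: the helper names ecx_reports_vmx, should_print_real_mode_hint and report_fail_entry are made up for illustration, and the caller is assumed to have already obtained ECX from CPUID leaf 1 (the patch does that through host_cpuid). The constants follow the patch: VMX is reported in bit 5 of CPUID.1:ECX, and 0x80000021 is the VMX "invalid guest state" reason (33) with the entry-failure bit (31) set.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define CPUID_EXT_VMX           (1u << 5)     /* CPUID.1:ECX bit 5 */
#define VMX_INVALID_GUEST_STATE 0x80000021u   /* reason 33 | entry-failure bit 31 */

/* True if the given CPUID leaf 1 ECX value advertises Intel VMX. */
static bool ecx_reports_vmx(uint32_t cpuid1_ecx)
{
    return (cpuid1_ecx & CPUID_EXT_VMX) != 0;
}

/* The hint only makes sense on VMX hosts and only for this failure code. */
static bool should_print_real_mode_hint(uint32_t cpuid1_ecx, uint64_t code)
{
    return ecx_reports_vmx(cpuid1_ecx) && code == VMX_INVALID_GUEST_STATE;
}

/* Hypothetical reporter: prints the failure and always returns -1 (fatal). */
static int report_fail_entry(FILE *out, uint32_t cpuid1_ecx, uint64_t code)
{
    assert(out != NULL);

    fprintf(out, "KVM: entry failed, hardware error 0x%" PRIx64 "\n", code);
    if (should_print_real_mode_hint(cpuid1_ecx, code)) {
        fprintf(out, "hint: guest may be in a state Intel VT cannot enter\n");
    }
    return -1;
}
```

Keeping the CPUID check separate from the failure-code check is what restricts the hint to Intel hosts: an AMD host never sets the VMX bit, so the same failure code there produces only the generic message.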


^ permalink raw reply related	[flat|nested] 300+ messages in thread

* [PATCH 17/35] x86: Optionally dump code bytes on cpu_dump_state
  2011-01-06 17:56 ` [Qemu-devel] " Marcelo Tosatti
@ 2011-01-06 17:56   ` Marcelo Tosatti
  -1 siblings, 0 replies; 300+ messages in thread
From: Marcelo Tosatti @ 2011-01-06 17:56 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: qemu-devel, kvm, Jan Kiszka, Marcelo Tosatti

From: Jan Kiszka <jan.kiszka@siemens.com>

Introduce the cpu_dump_state flag CPU_DUMP_CODE and implement it for
x86. This writes out the code bytes around the current instruction
pointer. Use this feature in KVM to help debug fatal VM exits.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
---
 cpu-all.h            |    2 ++
 kvm-all.c            |    4 ++--
 target-i386/helper.c |   21 +++++++++++++++++++++
 3 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/cpu-all.h b/cpu-all.h
index 4ce4e83..ffbd6a4 100644
--- a/cpu-all.h
+++ b/cpu-all.h
@@ -765,6 +765,8 @@ int page_check_range(target_ulong start, target_ulong len, int flags);
 CPUState *cpu_copy(CPUState *env);
 CPUState *qemu_get_cpu(int cpu);
 
+#define CPU_DUMP_CODE 0x00010000
+
 void cpu_dump_state(CPUState *env, FILE *f, fprintf_function cpu_fprintf,
                     int flags);
 void cpu_dump_statistics(CPUState *env, FILE *f, fprintf_function cpu_fprintf,
diff --git a/kvm-all.c b/kvm-all.c
index ad1d0a8..ef2ca3b 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -791,7 +791,7 @@ static int kvm_handle_internal_error(CPUState *env, struct kvm_run *run)
     if (run->internal.suberror == KVM_INTERNAL_ERROR_EMULATION) {
         fprintf(stderr, "emulation failure\n");
         if (!kvm_arch_stop_on_emulation_error(env)) {
-            cpu_dump_state(env, stderr, fprintf, 0);
+            cpu_dump_state(env, stderr, fprintf, CPU_DUMP_CODE);
             return 0;
         }
     }
@@ -953,7 +953,7 @@ void kvm_cpu_exec(CPUState *env)
     } while (ret > 0);
 
     if (ret < 0) {
-        cpu_dump_state(env, stderr, fprintf, 0);
+        cpu_dump_state(env, stderr, fprintf, CPU_DUMP_CODE);
         vm_stop(0);
         env->exit_request = 1;
     }
diff --git a/target-i386/helper.c b/target-i386/helper.c
index 6dfa27d..af2ce10 100644
--- a/target-i386/helper.c
+++ b/target-i386/helper.c
@@ -249,6 +249,9 @@ done:
     cpu_fprintf(f, "\n");
 }
 
+#define DUMP_CODE_BYTES_TOTAL    50
+#define DUMP_CODE_BYTES_BACKWARD 20
+
 void cpu_dump_state(CPUState *env, FILE *f, fprintf_function cpu_fprintf,
                     int flags)
 {
@@ -434,6 +437,24 @@ void cpu_dump_state(CPUState *env, FILE *f, fprintf_function cpu_fprintf,
                 cpu_fprintf(f, " ");
         }
     }
+    if (flags & CPU_DUMP_CODE) {
+        target_ulong base = env->segs[R_CS].base + env->eip;
+        target_ulong offs = MIN(env->eip, DUMP_CODE_BYTES_BACKWARD);
+        uint8_t code;
+        char codestr[3];
+
+        cpu_fprintf(f, "Code=");
+        for (i = 0; i < DUMP_CODE_BYTES_TOTAL; i++) {
+            if (cpu_memory_rw_debug(env, base - offs + i, &code, 1, 0) == 0) {
+                snprintf(codestr, sizeof(codestr), "%02x", code);
+            } else {
+                snprintf(codestr, sizeof(codestr), "??");
+            }
+            cpu_fprintf(f, "%s%s%s%s", i > 0 ? " ": "",
+                        i == offs ? "<" : "", codestr, i == offs ? ">" : "");
+        }
+        cpu_fprintf(f, "\n");
+    }
 }
 
 /***********************************************************/
-- 
1.7.2.3
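
The output format of the CPU_DUMP_CODE loop in this patch can be tried in isolation with a sketch like the one below. It is only an approximation under stated assumptions: guest memory is modelled as a flat byte array instead of going through cpu_memory_rw_debug, the result is written to a caller-supplied buffer instead of cpu_fprintf, no trailing newline is emitted, and format_code_dump with its total/backward parameters is a hypothetical name and signature (the patch hardcodes 50 and 20 via DUMP_CODE_BYTES_TOTAL and DUMP_CODE_BYTES_BACKWARD).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Format `total` bytes around `ip`, starting up to `backward` bytes before
 * it. The byte at `ip` is wrapped in <>; unreadable bytes are shown as "??".
 * Returns 0 on success, -1 if `out` is too small.
 */
static int format_code_dump(const uint8_t *mem, size_t mem_len, size_t ip,
                            size_t total, size_t backward,
                            char *out, size_t out_len)
{
    /* Never reach back past address 0, like MIN(env->eip, ...) in the patch. */
    size_t offs = ip < backward ? ip : backward;
    size_t pos;
    int n;

    assert(mem != NULL && out != NULL);

    n = snprintf(out, out_len, "Code=");
    if (n < 0 || (size_t)n >= out_len) {
        return -1;
    }
    pos = (size_t)n;

    for (size_t i = 0; i < total; i++) {
        size_t addr = ip - offs + i;
        char codestr[3];

        if (addr < mem_len) {
            snprintf(codestr, sizeof(codestr), "%02x", (unsigned)mem[addr]);
        } else {
            memcpy(codestr, "??", 3);   /* stand-in for a failed read */
        }

        n = snprintf(out + pos, out_len - pos, "%s%s%s%s",
                     i > 0 ? " " : "",
                     i == offs ? "<" : "", codestr, i == offs ? ">" : "");
        if (n < 0 || (size_t)n >= out_len - pos) {
            return -1;
        }
        pos += (size_t)n;
    }
    return 0;
}
```

With the four bytes 90 f4 eb fe as memory, ip = 1, total = 5 and backward = 2, the buffer ends up holding "Code=90 <f4> eb fe ??": one byte of context before the instruction pointer, the marked byte, and a "??" for the address past the end of the array.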


^ permalink raw reply related	[flat|nested] 300+ messages in thread

* [PATCH 18/35] kvm: x86: Align kvm_arch_put_registers code with comment
  2011-01-06 17:56 ` [Qemu-devel] " Marcelo Tosatti
@ 2011-01-06 17:56   ` Marcelo Tosatti
  -1 siblings, 0 replies; 300+ messages in thread
From: Marcelo Tosatti @ 2011-01-06 17:56 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: qemu-devel, kvm, Jan Kiszka, Marcelo Tosatti

From: Jan Kiszka <jan.kiszka@siemens.com>

The ordering doesn't matter in this case, but better keep it consistent.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
---
 target-i386/kvm.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index d4f253e..684430f 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -1388,12 +1388,12 @@ int kvm_arch_put_registers(CPUState *env, int level)
     if (ret < 0) {
         return ret;
     }
-    /* must be last */
-    ret = kvm_guest_debug_workarounds(env);
+    ret = kvm_put_debugregs(env);
     if (ret < 0) {
         return ret;
     }
-    ret = kvm_put_debugregs(env);
+    /* must be last */
+    ret = kvm_guest_debug_workarounds(env);
     if (ret < 0) {
         return ret;
     }
-- 
1.7.2.3


^ permalink raw reply related	[flat|nested] 300+ messages in thread

* [PATCH 19/35] kvm: x86: Prepare kvm_get_mp_state for in-kernel irqchip
  2011-01-06 17:56 ` [Qemu-devel] " Marcelo Tosatti
@ 2011-01-06 17:56   ` Marcelo Tosatti
  -1 siblings, 0 replies; 300+ messages in thread
From: Marcelo Tosatti @ 2011-01-06 17:56 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: qemu-devel, kvm, Jan Kiszka, Marcelo Tosatti

From: Jan Kiszka <jan.kiszka@siemens.com>

This code path will not yet be taken as we still lack in-kernel irqchip
support. But qemu-kvm can already make use of it and drop its own
mp_state access services.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
---
 target-i386/kvm.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 684430f..30aa51c 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -1198,6 +1198,9 @@ static int kvm_get_mp_state(CPUState *env)
         return ret;
     }
     env->mp_state = mp_state.mp_state;
+    if (kvm_irqchip_in_kernel()) {
+        env->halted = (mp_state.mp_state == KVM_MP_STATE_HALTED);
+    }
     return 0;
 }
 
-- 
1.7.2.3


^ permalink raw reply related	[flat|nested] 300+ messages in thread

* [PATCH 20/35] kvm: x86: Remove redundant mp_state initialization
  2011-01-06 17:56 ` [Qemu-devel] " Marcelo Tosatti
@ 2011-01-06 17:56   ` Marcelo Tosatti
  -1 siblings, 0 replies; 300+ messages in thread
From: Marcelo Tosatti @ 2011-01-06 17:56 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: qemu-devel, kvm, Jan Kiszka, Marcelo Tosatti

From: Jan Kiszka <jan.kiszka@siemens.com>

kvm_arch_reset_vcpu initializes mp_state, and that function is invoked
right after kvm_arch_init_vcpu.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
---
 target-i386/kvm.c |    2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 30aa51c..1403327 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -321,8 +321,6 @@ int kvm_arch_init_vcpu(CPUState *env)
     uint32_t signature[3];
 #endif
 
-    env->mp_state = KVM_MP_STATE_RUNNABLE;
-
     env->cpuid_features &= kvm_arch_get_supported_cpuid(env, 1, 0, R_EDX);
 
     i = env->cpuid_ext_features & CPUID_EXT_HYPERVISOR;
-- 
1.7.2.3


^ permalink raw reply related	[flat|nested] 300+ messages in thread

* [PATCH 21/35] kvm: x86: Fix xcr0 reset mismerge
  2011-01-06 17:56 ` [Qemu-devel] " Marcelo Tosatti
@ 2011-01-06 17:56   ` Marcelo Tosatti
  -1 siblings, 0 replies; 300+ messages in thread
From: Marcelo Tosatti @ 2011-01-06 17:56 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: qemu-devel, kvm, Jan Kiszka, Marcelo Tosatti

From: Jan Kiszka <jan.kiszka@siemens.com>

For unknown reasons, xcr0 reset ended up in kvm_arch_update_guest_debug
on upstream merge. Fix this and also remove the misleading comment (1 is
THE reset value).

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
---
 target-i386/kvm.c |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 1403327..e46b901 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -450,6 +450,7 @@ void kvm_arch_reset_vcpu(CPUState *env)
     env->interrupt_injected = -1;
     env->nmi_injected = 0;
     env->nmi_pending = 0;
+    env->xcr0 = 1;
     if (kvm_irqchip_in_kernel()) {
         env->mp_state = cpu_is_bsp(env) ? KVM_MP_STATE_RUNNABLE :
                                           KVM_MP_STATE_UNINITIALIZED;
@@ -1756,8 +1757,6 @@ void kvm_arch_update_guest_debug(CPUState *env, struct kvm_guest_debug *dbg)
                 ((uint32_t)len_code[hw_breakpoint[n].len] << (18 + n*4));
         }
     }
-    /* Legal xcr0 for loading */
-    env->xcr0 = 1;
 }
 #endif /* KVM_CAP_SET_GUEST_DEBUG */
 
-- 
1.7.2.3


^ permalink raw reply related	[flat|nested] 300+ messages in thread

* [PATCH 22/35] kvm: x86: Refactor msr_star/hsave_pa setup and checks
  2011-01-06 17:56 ` [Qemu-devel] " Marcelo Tosatti
@ 2011-01-06 17:56   ` Marcelo Tosatti
  -1 siblings, 0 replies; 300+ messages in thread
From: Marcelo Tosatti @ 2011-01-06 17:56 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: qemu-devel, kvm, Jan Kiszka, Marcelo Tosatti

From: Jan Kiszka <jan.kiszka@siemens.com>

Simplify kvm_has_msr_star/hsave_pa to booleans and push their one-time
initialization into kvm_arch_init. Also handle potential errors of that
setup procedure.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
---
 target-i386/kvm.c |   47 +++++++++++++++++++----------------------------
 1 files changed, 19 insertions(+), 28 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index e46b901..d8f26bf 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -54,6 +54,8 @@
 #define BUS_MCEERR_AO 5
 #endif
 
+static bool has_msr_star;
+static bool has_msr_hsave_pa;
 static int lm_capable_kernel;
 
 #ifdef KVM_CAP_EXT_CPUID
@@ -459,13 +461,10 @@ void kvm_arch_reset_vcpu(CPUState *env)
     }
 }
 
-int has_msr_star;
-int has_msr_hsave_pa;
-
-static void kvm_supported_msrs(CPUState *env)
+static int kvm_get_supported_msrs(KVMState *s)
 {
     static int kvm_supported_msrs;
-    int ret;
+    int ret = 0;
 
     /* first time */
     if (kvm_supported_msrs == 0) {
@@ -476,9 +475,9 @@ static void kvm_supported_msrs(CPUState *env)
         /* Obtain MSR list from KVM.  These are the MSRs that we must
          * save/restore */
         msr_list.nmsrs = 0;
-        ret = kvm_ioctl(env->kvm_state, KVM_GET_MSR_INDEX_LIST, &msr_list);
+        ret = kvm_ioctl(s, KVM_GET_MSR_INDEX_LIST, &msr_list);
         if (ret < 0 && ret != -E2BIG) {
-            return;
+            return ret;
         }
         /* Old kernel modules had a bug and could write beyond the provided
            memory. Allocate at least a safe amount of 1K. */
@@ -487,17 +486,17 @@ static void kvm_supported_msrs(CPUState *env)
                                               sizeof(msr_list.indices[0])));
 
         kvm_msr_list->nmsrs = msr_list.nmsrs;
-        ret = kvm_ioctl(env->kvm_state, KVM_GET_MSR_INDEX_LIST, kvm_msr_list);
+        ret = kvm_ioctl(s, KVM_GET_MSR_INDEX_LIST, kvm_msr_list);
         if (ret >= 0) {
             int i;
 
             for (i = 0; i < kvm_msr_list->nmsrs; i++) {
                 if (kvm_msr_list->indices[i] == MSR_STAR) {
-                    has_msr_star = 1;
+                    has_msr_star = true;
                     continue;
                 }
                 if (kvm_msr_list->indices[i] == MSR_VM_HSAVE_PA) {
-                    has_msr_hsave_pa = 1;
+                    has_msr_hsave_pa = true;
                     continue;
                 }
             }
@@ -506,19 +505,7 @@ static void kvm_supported_msrs(CPUState *env)
         free(kvm_msr_list);
     }
 
-    return;
-}
-
-static int kvm_has_msr_hsave_pa(CPUState *env)
-{
-    kvm_supported_msrs(env);
-    return has_msr_hsave_pa;
-}
-
-static int kvm_has_msr_star(CPUState *env)
-{
-    kvm_supported_msrs(env);
-    return has_msr_star;
+    return ret;
 }
 
 static int kvm_init_identity_map_page(KVMState *s)
@@ -543,9 +530,13 @@ static int kvm_init_identity_map_page(KVMState *s)
 int kvm_arch_init(KVMState *s, int smp_cpus)
 {
     int ret;
-
     struct utsname utsname;
 
+    ret = kvm_get_supported_msrs(s);
+    if (ret < 0) {
+        return ret;
+    }
+
     uname(&utsname);
     lm_capable_kernel = strcmp(utsname.machine, "x86_64") == 0;
 
@@ -830,10 +821,10 @@ static int kvm_put_msrs(CPUState *env, int level)
     kvm_msr_entry_set(&msrs[n++], MSR_IA32_SYSENTER_CS, env->sysenter_cs);
     kvm_msr_entry_set(&msrs[n++], MSR_IA32_SYSENTER_ESP, env->sysenter_esp);
     kvm_msr_entry_set(&msrs[n++], MSR_IA32_SYSENTER_EIP, env->sysenter_eip);
-    if (kvm_has_msr_star(env)) {
+    if (has_msr_star) {
         kvm_msr_entry_set(&msrs[n++], MSR_STAR, env->star);
     }
-    if (kvm_has_msr_hsave_pa(env)) {
+    if (has_msr_hsave_pa) {
         kvm_msr_entry_set(&msrs[n++], MSR_VM_HSAVE_PA, env->vm_hsave);
     }
 #ifdef TARGET_X86_64
@@ -1076,10 +1067,10 @@ static int kvm_get_msrs(CPUState *env)
     msrs[n++].index = MSR_IA32_SYSENTER_CS;
     msrs[n++].index = MSR_IA32_SYSENTER_ESP;
     msrs[n++].index = MSR_IA32_SYSENTER_EIP;
-    if (kvm_has_msr_star(env)) {
+    if (has_msr_star) {
         msrs[n++].index = MSR_STAR;
     }
-    if (kvm_has_msr_hsave_pa(env)) {
+    if (has_msr_hsave_pa) {
         msrs[n++].index = MSR_VM_HSAVE_PA;
     }
     msrs[n++].index = MSR_IA32_TSC;
-- 
1.7.2.3


^ permalink raw reply related	[flat|nested] 300+ messages in thread
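
Editor's sketch for patch 22/35: the shape of the refactoring (probe once at init time, cache the result in file-scope booleans, and propagate probe errors to the caller) can be shown in isolation. This is a minimal sketch and not QEMU code. `probe_msr_list` and `arch_init_sketch` are hypothetical stand-ins for `kvm_get_supported_msrs` and `kvm_arch_init`, and the MSR list is passed in as an array instead of being fetched with the KVM_GET_MSR_INDEX_LIST ioctl. Only the two MSR index values are architectural.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MSR_STAR        0xc0000081u
#define MSR_VM_HSAVE_PA 0xc0010117u

/* Feature flags: filled in once during init, read directly afterwards. */
static bool has_msr_star;
static bool has_msr_hsave_pa;

/* Hypothetical stand-in for the MSR index list query. A NULL list models
 * a failed ioctl and yields a negative errno value. */
static int probe_msr_list(const uint32_t *list, size_t n)
{
    size_t i;

    if (list == NULL) {
        return -EINVAL;
    }
    for (i = 0; i < n; i++) {
        if (list[i] == MSR_STAR) {
            has_msr_star = true;
        } else if (list[i] == MSR_VM_HSAVE_PA) {
            has_msr_hsave_pa = true;
        }
    }
    return 0;
}

/* Init-time entry point: a probe failure aborts initialization instead of
 * being silently dropped, as it was with the old void helper. */
static int arch_init_sketch(const uint32_t *list, size_t n)
{
    int ret = probe_msr_list(list, n);

    if (ret < 0) {
        return ret;
    }
    /* ... further architecture setup would follow here ... */
    return 0;
}
```

After a successful init, hot paths such as the MSR get/put routines can test `has_msr_star` directly, without a function call or a lazy first-use check.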

* [PATCH 23/35] kvm: x86: Reset paravirtual MSRs
  2011-01-06 17:56 ` [Qemu-devel] " Marcelo Tosatti
@ 2011-01-06 17:56   ` Marcelo Tosatti
  -1 siblings, 0 replies; 300+ messages in thread
From: Marcelo Tosatti @ 2011-01-06 17:56 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: qemu-devel, kvm, Jan Kiszka, Glauber Costa, Marcelo Tosatti

From: Jan Kiszka <jan.kiszka@siemens.com>

Make sure to write the cleared MSR_KVM_SYSTEM_TIME, MSR_KVM_WALL_CLOCK,
and MSR_KVM_ASYNC_PF_EN to the kernel state so that a freshly booted
guest cannot be disturbed by old values.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
CC: Glauber Costa <glommer@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
---
 target-i386/kvm.c |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index d8f26bf..8267655 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -845,6 +845,13 @@ static int kvm_put_msrs(CPUState *env, int level)
         if (smp_cpus == 1 || env->tsc != 0) {
             kvm_msr_entry_set(&msrs[n++], MSR_IA32_TSC, env->tsc);
         }
+    }
+    /*
+     * The following paravirtual MSRs have side effects on the guest or are
+     * too heavy for normal writeback. Limit them to reset or full state
+     * updates.
+     */
+    if (level >= KVM_PUT_RESET_STATE) {
         kvm_msr_entry_set(&msrs[n++], MSR_KVM_SYSTEM_TIME,
                           env->system_time_msr);
         kvm_msr_entry_set(&msrs[n++], MSR_KVM_WALL_CLOCK, env->wall_clock_msr);
-- 
1.7.2.3


^ permalink raw reply related	[flat|nested] 300+ messages in thread
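
Editor's sketch for patch 23/35: the level check added by this patch gates which MSRs are queued for writeback. The following is a minimal sketch under stated assumptions, not the QEMU implementation. The `put_level` enum and `build_put_list` are hypothetical; the numeric level values are illustrative and only their ordering (runtime < reset < full) matters for the `>=` comparison. The MSR indices are the legacy kvmclock ones.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed ordering: the `level >= reset` test only works if the reset
 * level sorts below the full level. Values are illustrative. */
enum put_level {
    PUT_RUNTIME_STATE = 1,
    PUT_RESET_STATE   = 2,
    PUT_FULL_STATE    = 3
};

#define MSR_IA32_SYSENTER_CS 0x174u
#define MSR_KVM_WALL_CLOCK   0x11u   /* legacy paravirtual clock MSR */
#define MSR_KVM_SYSTEM_TIME  0x12u   /* legacy paravirtual clock MSR */

struct msr_entry {
    uint32_t index;
    uint64_t data;
};

/* Fill `out` (room for at least 3 entries) and return the entry count. */
static size_t build_put_list(struct msr_entry *out, enum put_level level,
                             uint64_t sysenter_cs, uint64_t system_time,
                             uint64_t wall_clock)
{
    size_t n = 0;

    /* Cheap state without side effects: written on every writeback. */
    out[n++] = (struct msr_entry){ MSR_IA32_SYSENTER_CS, sysenter_cs };

    /* Paravirtual MSRs: only on reset or full state updates, so that a
     * reset pushes the cleared values into the kernel. */
    if (level >= PUT_RESET_STATE) {
        out[n++] = (struct msr_entry){ MSR_KVM_SYSTEM_TIME, system_time };
        out[n++] = (struct msr_entry){ MSR_KVM_WALL_CLOCK, wall_clock };
    }
    return n;
}
```

With a runtime-level writeback the list holds one entry; at reset or full level it holds three, including the zeroed clock MSRs.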

* [PATCH 24/35] Synchronize VCPU states before reset
  2011-01-06 17:56 ` [Qemu-devel] " Marcelo Tosatti
@ 2011-01-06 17:56   ` Marcelo Tosatti
  -1 siblings, 0 replies; 300+ messages in thread
From: Marcelo Tosatti @ 2011-01-06 17:56 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: qemu-devel, kvm, Jan Kiszka, Marcelo Tosatti

From: Jan Kiszka <jan.kiszka@siemens.com>

This is required to support keeping VCPU states across a system reset.
If we do not read the current state before the reset,
cpu_synchronize_all_post_reset may write back incorrect state
information.

The first user of this will be MCE MSR synchronization which currently
works around the missing cpu_synchronize_all_states.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
---
 vl.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/vl.c b/vl.c
index 78fcef1..b0b6605 100644
--- a/vl.c
+++ b/vl.c
@@ -1422,6 +1422,7 @@ static void main_loop(void)
         }
         if (qemu_reset_requested()) {
             pause_all_vcpus();
+            cpu_synchronize_all_states();
             qemu_system_reset();
             resume_all_vcpus();
         }
-- 
1.7.2.3


^ permalink raw reply related	[flat|nested] 300+ messages in thread
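
Editor's sketch for patch 24/35: why the state must be read before the reset can be illustrated with a toy model. Everything here is hypothetical (`toy_vcpu`, the field names, the reset value); it only mimics the interaction between a kernel-held state, a user-space cached copy, and a post-reset writeback that pushes the whole cached copy.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy VCPU: kernel_* models state held in the kernel, cached_* the
 * user-space copy, cache_valid whether that copy is current. */
struct toy_vcpu {
    uint64_t kernel_reg;   /* state meant to survive a reset */
    uint64_t kernel_pc;    /* state the reset reinitializes */
    uint64_t cached_reg;
    uint64_t cached_pc;
    bool     cache_valid;
};

/* Pull the kernel state into the cache unless it is already current. */
static void synchronize_state(struct toy_vcpu *v)
{
    if (!v->cache_valid) {
        v->cached_reg = v->kernel_reg;
        v->cached_pc = v->kernel_pc;
        v->cache_valid = true;
    }
}

/* The reset only touches the part of the cached state it owns. */
static void system_reset(struct toy_vcpu *v)
{
    v->cached_pc = 0xfff0;
}

/* Post-reset writeback pushes the whole cached copy into the kernel. */
static void synchronize_post_reset(struct toy_vcpu *v)
{
    v->kernel_reg = v->cached_reg;
    v->kernel_pc = v->cached_pc;
    v->cache_valid = false;
}

static void reset_sequence(struct toy_vcpu *v, bool sync_first)
{
    if (sync_first) {
        synchronize_state(v);
    }
    system_reset(v);
    synchronize_post_reset(v);
}
```

In this model, skipping the initial synchronization lets the stale `cached_reg` overwrite `kernel_reg` during the post-reset writeback, which is the kind of incorrect state the commit message warns about.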

* [PATCH 25/35] kvm: x86: Drop MCE MSRs write back restrictions
  2011-01-06 17:56 ` [Qemu-devel] " Marcelo Tosatti
@ 2011-01-06 17:56   ` Marcelo Tosatti
  -1 siblings, 0 replies; 300+ messages in thread
From: Marcelo Tosatti @ 2011-01-06 17:56 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: qemu-devel, kvm, Jan Kiszka, Huang Ying, Marcelo Tosatti

From: Jan Kiszka <jan.kiszka@siemens.com>

There is no need to restrict writing back MCE MSRs to reset or full
state updates as setting their values has no side effects.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
CC: Huang Ying <ying.huang@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
---
 target-i386/kvm.c |   12 ++++--------
 1 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 8267655..1789bff 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -863,14 +863,10 @@ static int kvm_put_msrs(CPUState *env, int level)
     if (env->mcg_cap) {
         int i;
 
-        if (level == KVM_PUT_RESET_STATE) {
-            kvm_msr_entry_set(&msrs[n++], MSR_MCG_STATUS, env->mcg_status);
-        } else if (level == KVM_PUT_FULL_STATE) {
-            kvm_msr_entry_set(&msrs[n++], MSR_MCG_STATUS, env->mcg_status);
-            kvm_msr_entry_set(&msrs[n++], MSR_MCG_CTL, env->mcg_ctl);
-            for (i = 0; i < (env->mcg_cap & 0xff) * 4; i++) {
-                kvm_msr_entry_set(&msrs[n++], MSR_MC0_CTL + i, env->mce_banks[i]);
-            }
+        kvm_msr_entry_set(&msrs[n++], MSR_MCG_STATUS, env->mcg_status);
+        kvm_msr_entry_set(&msrs[n++], MSR_MCG_CTL, env->mcg_ctl);
+        for (i = 0; i < (env->mcg_cap & 0xff) * 4; i++) {
+            kvm_msr_entry_set(&msrs[n++], MSR_MC0_CTL + i, env->mce_banks[i]);
         }
     }
 #endif
-- 
1.7.2.3


^ permalink raw reply related	[flat|nested] 300+ messages in thread
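
Editor's sketch for patch 25/35: the loop bound `(env->mcg_cap & 0xff) * 4` comes from the machine-check layout, where the low 8 bits of MCG_CAP give the bank count and each bank owns four consecutive MSRs (CTL, STATUS, ADDR, MISC) starting at MSR_MC0_CTL. The helper below is a hypothetical, stand-alone rendering of the unconditional writeback list after this patch; `build_mce_list` and `struct msr_entry` are illustrative names, not QEMU API.

```c
#include <stddef.h>
#include <stdint.h>

#define MSR_MCG_STATUS 0x17au
#define MSR_MCG_CTL    0x17bu
#define MSR_MC0_CTL    0x400u

struct msr_entry {
    uint32_t index;
    uint64_t data;
};

/* Number of per-bank MSRs: bank count (low 8 bits of MCG_CAP) times the
 * four registers of each bank. */
static size_t mce_bank_msr_count(uint64_t mcg_cap)
{
    return (size_t)(mcg_cap & 0xff) * 4;
}

/* Fill `out` with all MCE MSRs regardless of writeback level. `out` must
 * have room for 2 + mce_bank_msr_count(mcg_cap) entries and `mce_banks`
 * must hold mce_bank_msr_count(mcg_cap) values. */
static size_t build_mce_list(struct msr_entry *out, uint64_t mcg_cap,
                             uint64_t mcg_status, uint64_t mcg_ctl,
                             const uint64_t *mce_banks)
{
    size_t n = 0;
    size_t i;

    if (!mcg_cap) {
        return 0;   /* no machine-check support exposed to the guest */
    }
    out[n++] = (struct msr_entry){ MSR_MCG_STATUS, mcg_status };
    out[n++] = (struct msr_entry){ MSR_MCG_CTL, mcg_ctl };
    for (i = 0; i < mce_bank_msr_count(mcg_cap); i++) {
        out[n++] = (struct msr_entry){ MSR_MC0_CTL + (uint32_t)i,
                                       mce_banks[i] };
    }
    return n;
}
```

For a guest with two banks, the list holds ten entries: the two global MSRs plus eight per-bank MSRs at indices 0x400 through 0x407.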

* [PATCH 26/35] kvm: Eliminate KVMState arguments
  2011-01-06 17:56 ` [Qemu-devel] " Marcelo Tosatti
@ 2011-01-06 17:56   ` Marcelo Tosatti
  -1 siblings, 0 replies; 300+ messages in thread
From: Marcelo Tosatti @ 2011-01-06 17:56 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: qemu-devel, kvm, Jan Kiszka, Alexander Graf, Marcelo Tosatti

From: Jan Kiszka <jan.kiszka@siemens.com>

QEMU supports only one VM, so there is only one kvm_state per process,
and we gain nothing passing a reference to it around. Eliminate any need
to refer to it outside of kvm-all.c.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
CC: Alexander Graf <agraf@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
---
 cpu-defs.h            |    2 -
 kvm-all.c             |  232 +++++++++++++++++++++----------------------------
 kvm-stub.c            |    2 +-
 kvm.h                 |   15 +--
 target-i386/cpuid.c   |    9 +-
 target-i386/kvm.c     |   77 ++++++++--------
 target-i386/kvm_x86.h |    3 +
 target-ppc/kvm.c      |   12 ++--
 target-s390x/kvm.c    |    8 +--
 9 files changed, 160 insertions(+), 200 deletions(-)

diff --git a/cpu-defs.h b/cpu-defs.h
index 8d4bf86..0e04239 100644
--- a/cpu-defs.h
+++ b/cpu-defs.h
@@ -131,7 +131,6 @@ typedef struct icount_decr_u16 {
 #endif
 
 struct kvm_run;
-struct KVMState;
 struct qemu_work_item;
 
 typedef struct CPUBreakpoint {
@@ -207,7 +206,6 @@ typedef struct CPUWatchpoint {
     struct QemuCond *halt_cond;                                         \
     struct qemu_work_item *queued_work_first, *queued_work_last;        \
     const char *cpu_model_str;                                          \
-    struct KVMState *kvm_state;                                         \
     struct kvm_run *kvm_run;                                            \
     int kvm_fd;                                                         \
     int kvm_vcpu_dirty;
diff --git a/kvm-all.c b/kvm-all.c
index ef2ca3b..d8820c7 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -52,8 +52,7 @@ typedef struct KVMSlot
 
 typedef struct kvm_dirty_log KVMDirtyLog;
 
-struct KVMState
-{
+static struct KVMState {
     KVMSlot slots[32];
     int fd;
     int vmfd;
@@ -72,21 +71,19 @@ struct KVMState
     int irqchip_in_kernel;
     int pit_in_kernel;
     int xsave, xcrs;
-};
-
-static KVMState *kvm_state;
+} kvm_state;
 
-static KVMSlot *kvm_alloc_slot(KVMState *s)
+static KVMSlot *kvm_alloc_slot(void)
 {
     int i;
 
-    for (i = 0; i < ARRAY_SIZE(s->slots); i++) {
+    for (i = 0; i < ARRAY_SIZE(kvm_state.slots); i++) {
         /* KVM private memory slots */
         if (i >= 8 && i < 12) {
             continue;
         }
-        if (s->slots[i].memory_size == 0) {
-            return &s->slots[i];
+        if (kvm_state.slots[i].memory_size == 0) {
+            return &kvm_state.slots[i];
         }
     }
 
@@ -94,14 +91,13 @@ static KVMSlot *kvm_alloc_slot(KVMState *s)
     abort();
 }
 
-static KVMSlot *kvm_lookup_matching_slot(KVMState *s,
-                                         target_phys_addr_t start_addr,
+static KVMSlot *kvm_lookup_matching_slot(target_phys_addr_t start_addr,
                                          target_phys_addr_t end_addr)
 {
     int i;
 
-    for (i = 0; i < ARRAY_SIZE(s->slots); i++) {
-        KVMSlot *mem = &s->slots[i];
+    for (i = 0; i < ARRAY_SIZE(kvm_state.slots); i++) {
+        KVMSlot *mem = &kvm_state.slots[i];
 
         if (start_addr == mem->start_addr &&
             end_addr == mem->start_addr + mem->memory_size) {
@@ -115,15 +111,14 @@ static KVMSlot *kvm_lookup_matching_slot(KVMState *s,
 /*
  * Find overlapping slot with lowest start address
  */
-static KVMSlot *kvm_lookup_overlapping_slot(KVMState *s,
-                                            target_phys_addr_t start_addr,
+static KVMSlot *kvm_lookup_overlapping_slot(target_phys_addr_t start_addr,
                                             target_phys_addr_t end_addr)
 {
     KVMSlot *found = NULL;
     int i;
 
-    for (i = 0; i < ARRAY_SIZE(s->slots); i++) {
-        KVMSlot *mem = &s->slots[i];
+    for (i = 0; i < ARRAY_SIZE(kvm_state.slots); i++) {
+        KVMSlot *mem = &kvm_state.slots[i];
 
         if (mem->memory_size == 0 ||
             (found && found->start_addr < mem->start_addr)) {
@@ -139,13 +134,13 @@ static KVMSlot *kvm_lookup_overlapping_slot(KVMState *s,
     return found;
 }
 
-int kvm_physical_memory_addr_from_ram(KVMState *s, ram_addr_t ram_addr,
+int kvm_physical_memory_addr_from_ram(ram_addr_t ram_addr,
                                       target_phys_addr_t *phys_addr)
 {
     int i;
 
-    for (i = 0; i < ARRAY_SIZE(s->slots); i++) {
-        KVMSlot *mem = &s->slots[i];
+    for (i = 0; i < ARRAY_SIZE(kvm_state.slots); i++) {
+        KVMSlot *mem = &kvm_state.slots[i];
 
         if (ram_addr >= mem->phys_offset &&
             ram_addr < mem->phys_offset + mem->memory_size) {
@@ -157,7 +152,7 @@ int kvm_physical_memory_addr_from_ram(KVMState *s, ram_addr_t ram_addr,
     return 0;
 }
 
-static int kvm_set_user_memory_region(KVMState *s, KVMSlot *slot)
+static int kvm_set_user_memory_region(KVMSlot *slot)
 {
     struct kvm_userspace_memory_region mem;
 
@@ -166,10 +161,10 @@ static int kvm_set_user_memory_region(KVMState *s, KVMSlot *slot)
     mem.memory_size = slot->memory_size;
     mem.userspace_addr = (unsigned long)qemu_safe_ram_ptr(slot->phys_offset);
     mem.flags = slot->flags;
-    if (s->migration_log) {
+    if (kvm_state.migration_log) {
         mem.flags |= KVM_MEM_LOG_DIRTY_PAGES;
     }
-    return kvm_vm_ioctl(s, KVM_SET_USER_MEMORY_REGION, &mem);
+    return kvm_vm_ioctl(KVM_SET_USER_MEMORY_REGION, &mem);
 }
 
 static void kvm_reset_vcpu(void *opaque)
@@ -181,33 +176,31 @@ static void kvm_reset_vcpu(void *opaque)
 
 int kvm_irqchip_in_kernel(void)
 {
-    return kvm_state->irqchip_in_kernel;
+    return kvm_state.irqchip_in_kernel;
 }
 
 int kvm_pit_in_kernel(void)
 {
-    return kvm_state->pit_in_kernel;
+    return kvm_state.pit_in_kernel;
 }
 
 
 int kvm_init_vcpu(CPUState *env)
 {
-    KVMState *s = kvm_state;
     long mmap_size;
     int ret;
 
     DPRINTF("kvm_init_vcpu\n");
 
-    ret = kvm_vm_ioctl(s, KVM_CREATE_VCPU, env->cpu_index);
+    ret = kvm_vm_ioctl(KVM_CREATE_VCPU, env->cpu_index);
     if (ret < 0) {
         DPRINTF("kvm_create_vcpu failed\n");
         goto err;
     }
 
     env->kvm_fd = ret;
-    env->kvm_state = s;
 
-    mmap_size = kvm_ioctl(s, KVM_GET_VCPU_MMAP_SIZE, 0);
+    mmap_size = kvm_ioctl(KVM_GET_VCPU_MMAP_SIZE, 0);
     if (mmap_size < 0) {
         DPRINTF("KVM_GET_VCPU_MMAP_SIZE failed\n");
         goto err;
@@ -222,9 +215,9 @@ int kvm_init_vcpu(CPUState *env)
     }
 
 #ifdef KVM_CAP_COALESCED_MMIO
-    if (s->coalesced_mmio && !s->coalesced_mmio_ring) {
-        s->coalesced_mmio_ring =
-            (void *)env->kvm_run + s->coalesced_mmio * PAGE_SIZE;
+    if (kvm_state.coalesced_mmio && !kvm_state.coalesced_mmio_ring) {
+        kvm_state.coalesced_mmio_ring =
+            (void *)env->kvm_run + kvm_state.coalesced_mmio * PAGE_SIZE;
     }
 #endif
 
@@ -243,8 +236,7 @@ err:
 static int kvm_dirty_pages_log_change(target_phys_addr_t phys_addr,
                                       ram_addr_t size, int flags, int mask)
 {
-    KVMState *s = kvm_state;
-    KVMSlot *mem = kvm_lookup_matching_slot(s, phys_addr, phys_addr + size);
+    KVMSlot *mem = kvm_lookup_matching_slot(phys_addr, phys_addr + size);
     int old_flags;
 
     if (mem == NULL)  {
@@ -260,14 +252,14 @@ static int kvm_dirty_pages_log_change(target_phys_addr_t phys_addr,
     mem->flags = flags;
 
     /* If nothing changed effectively, no need to issue ioctl */
-    if (s->migration_log) {
+    if (kvm_state.migration_log) {
         flags |= KVM_MEM_LOG_DIRTY_PAGES;
     }
     if (flags == old_flags) {
             return 0;
     }
 
-    return kvm_set_user_memory_region(s, mem);
+    return kvm_set_user_memory_region(mem);
 }
 
 int kvm_log_start(target_phys_addr_t phys_addr, ram_addr_t size)
@@ -284,14 +276,13 @@ int kvm_log_stop(target_phys_addr_t phys_addr, ram_addr_t size)
 
 static int kvm_set_migration_log(int enable)
 {
-    KVMState *s = kvm_state;
     KVMSlot *mem;
     int i, err;
 
-    s->migration_log = enable;
+    kvm_state.migration_log = enable;
 
-    for (i = 0; i < ARRAY_SIZE(s->slots); i++) {
-        mem = &s->slots[i];
+    for (i = 0; i < ARRAY_SIZE(kvm_state.slots); i++) {
+        mem = &kvm_state.slots[i];
 
         if (!mem->memory_size) {
             continue;
@@ -299,7 +290,7 @@ static int kvm_set_migration_log(int enable)
         if (!!(mem->flags & KVM_MEM_LOG_DIRTY_PAGES) == enable) {
             continue;
         }
-        err = kvm_set_user_memory_region(s, mem);
+        err = kvm_set_user_memory_region(mem);
         if (err) {
             return err;
         }
@@ -353,7 +344,6 @@ static int kvm_get_dirty_pages_log_range(unsigned long start_addr,
 static int kvm_physical_sync_dirty_bitmap(target_phys_addr_t start_addr,
                                           target_phys_addr_t end_addr)
 {
-    KVMState *s = kvm_state;
     unsigned long size, allocated_size = 0;
     KVMDirtyLog d;
     KVMSlot *mem;
@@ -361,7 +351,7 @@ static int kvm_physical_sync_dirty_bitmap(target_phys_addr_t start_addr,
 
     d.dirty_bitmap = NULL;
     while (start_addr < end_addr) {
-        mem = kvm_lookup_overlapping_slot(s, start_addr, end_addr);
+        mem = kvm_lookup_overlapping_slot(start_addr, end_addr);
         if (mem == NULL) {
             break;
         }
@@ -377,7 +367,7 @@ static int kvm_physical_sync_dirty_bitmap(target_phys_addr_t start_addr,
 
         d.slot = mem->slot;
 
-        if (kvm_vm_ioctl(s, KVM_GET_DIRTY_LOG, &d) == -1) {
+        if (kvm_vm_ioctl(KVM_GET_DIRTY_LOG, &d) == -1) {
             DPRINTF("ioctl failed %d\n", errno);
             ret = -1;
             break;
@@ -395,16 +385,15 @@ static int kvm_physical_sync_dirty_bitmap(target_phys_addr_t start_addr,
 int kvm_coalesce_mmio_region(target_phys_addr_t start, ram_addr_t size)
 {
     int ret = -ENOSYS;
-#ifdef KVM_CAP_COALESCED_MMIO
-    KVMState *s = kvm_state;
 
-    if (s->coalesced_mmio) {
+#ifdef KVM_CAP_COALESCED_MMIO
+    if (kvm_state.coalesced_mmio) {
         struct kvm_coalesced_mmio_zone zone;
 
         zone.addr = start;
         zone.size = size;
 
-        ret = kvm_vm_ioctl(s, KVM_REGISTER_COALESCED_MMIO, &zone);
+        ret = kvm_vm_ioctl(KVM_REGISTER_COALESCED_MMIO, &zone);
     }
 #endif
 
@@ -414,27 +403,26 @@ int kvm_coalesce_mmio_region(target_phys_addr_t start, ram_addr_t size)
 int kvm_uncoalesce_mmio_region(target_phys_addr_t start, ram_addr_t size)
 {
     int ret = -ENOSYS;
-#ifdef KVM_CAP_COALESCED_MMIO
-    KVMState *s = kvm_state;
 
-    if (s->coalesced_mmio) {
+#ifdef KVM_CAP_COALESCED_MMIO
+    if (kvm_state.coalesced_mmio) {
         struct kvm_coalesced_mmio_zone zone;
 
         zone.addr = start;
         zone.size = size;
 
-        ret = kvm_vm_ioctl(s, KVM_UNREGISTER_COALESCED_MMIO, &zone);
+        ret = kvm_vm_ioctl(KVM_UNREGISTER_COALESCED_MMIO, &zone);
     }
 #endif
 
     return ret;
 }
 
-int kvm_check_extension(KVMState *s, unsigned int extension)
+int kvm_check_extension(unsigned int extension)
 {
     int ret;
 
-    ret = kvm_ioctl(s, KVM_CHECK_EXTENSION, extension);
+    ret = kvm_ioctl(KVM_CHECK_EXTENSION, extension);
     if (ret < 0) {
         ret = 0;
     }
@@ -445,7 +433,6 @@ int kvm_check_extension(KVMState *s, unsigned int extension)
 static void kvm_set_phys_mem(target_phys_addr_t start_addr, ram_addr_t size,
                              ram_addr_t phys_offset)
 {
-    KVMState *s = kvm_state;
     ram_addr_t flags = phys_offset & ~TARGET_PAGE_MASK;
     KVMSlot *mem, old;
     int err;
@@ -459,7 +446,7 @@ static void kvm_set_phys_mem(target_phys_addr_t start_addr, ram_addr_t size,
     phys_offset &= ~IO_MEM_ROM;
 
     while (1) {
-        mem = kvm_lookup_overlapping_slot(s, start_addr, start_addr + size);
+        mem = kvm_lookup_overlapping_slot(start_addr, start_addr + size);
         if (!mem) {
             break;
         }
@@ -476,7 +463,7 @@ static void kvm_set_phys_mem(target_phys_addr_t start_addr, ram_addr_t size,
 
         /* unregister the overlapping slot */
         mem->memory_size = 0;
-        err = kvm_set_user_memory_region(s, mem);
+        err = kvm_set_user_memory_region(mem);
         if (err) {
             fprintf(stderr, "%s: error unregistering overlapping slot: %s\n",
                     __func__, strerror(-err));
@@ -491,16 +478,16 @@ static void kvm_set_phys_mem(target_phys_addr_t start_addr, ram_addr_t size,
          * address as the first existing one. If not or if some overlapping
          * slot comes around later, we will fail (not seen in practice so far)
          * - and actually require a recent KVM version. */
-        if (s->broken_set_mem_region &&
+        if (kvm_state.broken_set_mem_region &&
             old.start_addr == start_addr && old.memory_size < size &&
             flags < IO_MEM_UNASSIGNED) {
-            mem = kvm_alloc_slot(s);
+            mem = kvm_alloc_slot();
             mem->memory_size = old.memory_size;
             mem->start_addr = old.start_addr;
             mem->phys_offset = old.phys_offset;
             mem->flags = 0;
 
-            err = kvm_set_user_memory_region(s, mem);
+            err = kvm_set_user_memory_region(mem);
             if (err) {
                 fprintf(stderr, "%s: error updating slot: %s\n", __func__,
                         strerror(-err));
@@ -515,13 +502,13 @@ static void kvm_set_phys_mem(target_phys_addr_t start_addr, ram_addr_t size,
 
         /* register prefix slot */
         if (old.start_addr < start_addr) {
-            mem = kvm_alloc_slot(s);
+            mem = kvm_alloc_slot();
             mem->memory_size = start_addr - old.start_addr;
             mem->start_addr = old.start_addr;
             mem->phys_offset = old.phys_offset;
             mem->flags = 0;
 
-            err = kvm_set_user_memory_region(s, mem);
+            err = kvm_set_user_memory_region(mem);
             if (err) {
                 fprintf(stderr, "%s: error registering prefix slot: %s\n",
                         __func__, strerror(-err));
@@ -533,14 +520,14 @@ static void kvm_set_phys_mem(target_phys_addr_t start_addr, ram_addr_t size,
         if (old.start_addr + old.memory_size > start_addr + size) {
             ram_addr_t size_delta;
 
-            mem = kvm_alloc_slot(s);
+            mem = kvm_alloc_slot();
             mem->start_addr = start_addr + size;
             size_delta = mem->start_addr - old.start_addr;
             mem->memory_size = old.memory_size - size_delta;
             mem->phys_offset = old.phys_offset + size_delta;
             mem->flags = 0;
 
-            err = kvm_set_user_memory_region(s, mem);
+            err = kvm_set_user_memory_region(mem);
             if (err) {
                 fprintf(stderr, "%s: error registering suffix slot: %s\n",
                         __func__, strerror(-err));
@@ -557,13 +544,13 @@ static void kvm_set_phys_mem(target_phys_addr_t start_addr, ram_addr_t size,
     if (flags >= IO_MEM_UNASSIGNED) {
         return;
     }
-    mem = kvm_alloc_slot(s);
+    mem = kvm_alloc_slot();
     mem->memory_size = size;
     mem->start_addr = start_addr;
     mem->phys_offset = phys_offset;
     mem->flags = 0;
 
-    err = kvm_set_user_memory_region(s, mem);
+    err = kvm_set_user_memory_region(mem);
     if (err) {
         fprintf(stderr, "%s: error registering slot: %s\n", __func__,
                 strerror(-err));
@@ -602,27 +589,24 @@ int kvm_init(int smp_cpus)
     static const char upgrade_note[] =
         "Please upgrade to at least kernel 2.6.29 or recent kvm-kmod\n"
         "(see http://sourceforge.net/projects/kvm).\n";
-    KVMState *s;
     int ret;
     int i;
 
-    s = qemu_mallocz(sizeof(KVMState));
-
 #ifdef KVM_CAP_SET_GUEST_DEBUG
-    QTAILQ_INIT(&s->kvm_sw_breakpoints);
+    QTAILQ_INIT(&kvm_state.kvm_sw_breakpoints);
 #endif
-    for (i = 0; i < ARRAY_SIZE(s->slots); i++) {
-        s->slots[i].slot = i;
+    for (i = 0; i < ARRAY_SIZE(kvm_state.slots); i++) {
+        kvm_state.slots[i].slot = i;
     }
-    s->vmfd = -1;
-    s->fd = qemu_open("/dev/kvm", O_RDWR);
-    if (s->fd == -1) {
+    kvm_state.vmfd = -1;
+    kvm_state.fd = qemu_open("/dev/kvm", O_RDWR);
+    if (kvm_state.fd == -1) {
         fprintf(stderr, "Could not access KVM kernel module: %m\n");
         ret = -errno;
         goto err;
     }
 
-    ret = kvm_ioctl(s, KVM_GET_API_VERSION, 0);
+    ret = kvm_ioctl(KVM_GET_API_VERSION, 0);
     if (ret < KVM_API_VERSION) {
         if (ret > 0) {
             ret = -EINVAL;
@@ -637,8 +621,8 @@ int kvm_init(int smp_cpus)
         goto err;
     }
 
-    s->vmfd = kvm_ioctl(s, KVM_CREATE_VM, 0);
-    if (s->vmfd < 0) {
+    kvm_state.vmfd = kvm_ioctl(KVM_CREATE_VM, 0);
+    if (kvm_state.vmfd < 0) {
 #ifdef TARGET_S390X
         fprintf(stderr, "Please add the 'switch_amode' kernel parameter to "
                         "your host kernel command line\n");
@@ -651,7 +635,7 @@ int kvm_init(int smp_cpus)
      * just use a user allocated buffer so we can use regular pages
      * unmodified.  Make sure we have a sufficiently modern version of KVM.
      */
-    if (!kvm_check_extension(s, KVM_CAP_USER_MEMORY)) {
+    if (!kvm_check_extension(KVM_CAP_USER_MEMORY)) {
         ret = -EINVAL;
         fprintf(stderr, "kvm does not support KVM_CAP_USER_MEMORY\n%s",
                 upgrade_note);
@@ -661,7 +645,7 @@ int kvm_init(int smp_cpus)
     /* There was a nasty bug in < kvm-80 that prevents memory slots from being
      * destroyed properly.  Since we rely on this capability, refuse to work
      * with any kernel without this capability. */
-    if (!kvm_check_extension(s, KVM_CAP_DESTROY_MEMORY_REGION_WORKS)) {
+    if (!kvm_check_extension(KVM_CAP_DESTROY_MEMORY_REGION_WORKS)) {
         ret = -EINVAL;
 
         fprintf(stderr,
@@ -670,66 +654,55 @@ int kvm_init(int smp_cpus)
         goto err;
     }
 
-    s->coalesced_mmio = 0;
 #ifdef KVM_CAP_COALESCED_MMIO
-    s->coalesced_mmio = kvm_check_extension(s, KVM_CAP_COALESCED_MMIO);
-    s->coalesced_mmio_ring = NULL;
+    kvm_state.coalesced_mmio = kvm_check_extension(KVM_CAP_COALESCED_MMIO);
 #endif
 
-    s->broken_set_mem_region = 1;
+    kvm_state.broken_set_mem_region = 1;
 #ifdef KVM_CAP_JOIN_MEMORY_REGIONS_WORKS
-    ret = kvm_check_extension(s, KVM_CAP_JOIN_MEMORY_REGIONS_WORKS);
+    ret = kvm_check_extension(KVM_CAP_JOIN_MEMORY_REGIONS_WORKS);
     if (ret > 0) {
-        s->broken_set_mem_region = 0;
+        kvm_state.broken_set_mem_region = 0;
     }
 #endif
 
-    s->vcpu_events = 0;
 #ifdef KVM_CAP_VCPU_EVENTS
-    s->vcpu_events = kvm_check_extension(s, KVM_CAP_VCPU_EVENTS);
+    kvm_state.vcpu_events = kvm_check_extension(KVM_CAP_VCPU_EVENTS);
 #endif
 
-    s->robust_singlestep = 0;
 #ifdef KVM_CAP_X86_ROBUST_SINGLESTEP
-    s->robust_singlestep =
-        kvm_check_extension(s, KVM_CAP_X86_ROBUST_SINGLESTEP);
+    kvm_state.robust_singlestep =
+        kvm_check_extension(KVM_CAP_X86_ROBUST_SINGLESTEP);
 #endif
 
-    s->debugregs = 0;
 #ifdef KVM_CAP_DEBUGREGS
-    s->debugregs = kvm_check_extension(s, KVM_CAP_DEBUGREGS);
+    kvm_state.debugregs = kvm_check_extension(KVM_CAP_DEBUGREGS);
 #endif
 
-    s->xsave = 0;
 #ifdef KVM_CAP_XSAVE
-    s->xsave = kvm_check_extension(s, KVM_CAP_XSAVE);
+    kvm_state.xsave = kvm_check_extension(KVM_CAP_XSAVE);
 #endif
 
-    s->xcrs = 0;
 #ifdef KVM_CAP_XCRS
-    s->xcrs = kvm_check_extension(s, KVM_CAP_XCRS);
+    kvm_state.xcrs = kvm_check_extension(KVM_CAP_XCRS);
 #endif
 
-    ret = kvm_arch_init(s, smp_cpus);
+    ret = kvm_arch_init(smp_cpus);
     if (ret < 0) {
         goto err;
     }
 
-    kvm_state = s;
     cpu_register_phys_memory_client(&kvm_cpu_phys_memory_client);
 
     return 0;
 
 err:
-    if (s) {
-        if (s->vmfd != -1) {
-            close(s->vmfd);
-        }
-        if (s->fd != -1) {
-            close(s->fd);
-        }
+    if (kvm_state.vmfd != -1) {
+        close(kvm_state.vmfd);
+    }
+    if (kvm_state.fd != -1) {
+        close(kvm_state.fd);
     }
-    qemu_free(s);
 
     return ret;
 }
@@ -777,7 +750,7 @@ static int kvm_handle_io(uint16_t port, void *data, int direction, int size,
 static int kvm_handle_internal_error(CPUState *env, struct kvm_run *run)
 {
     fprintf(stderr, "KVM internal error.");
-    if (kvm_check_extension(kvm_state, KVM_CAP_INTERNAL_ERROR_DATA)) {
+    if (kvm_check_extension(KVM_CAP_INTERNAL_ERROR_DATA)) {
         int i;
 
         fprintf(stderr, " Suberror: %d\n", run->internal.suberror);
@@ -805,9 +778,8 @@ static int kvm_handle_internal_error(CPUState *env, struct kvm_run *run)
 void kvm_flush_coalesced_mmio_buffer(void)
 {
 #ifdef KVM_CAP_COALESCED_MMIO
-    KVMState *s = kvm_state;
-    if (s->coalesced_mmio_ring) {
-        struct kvm_coalesced_mmio_ring *ring = s->coalesced_mmio_ring;
+    if (kvm_state.coalesced_mmio_ring) {
+        struct kvm_coalesced_mmio_ring *ring = kvm_state.coalesced_mmio_ring;
         while (ring->first != ring->last) {
             struct kvm_coalesced_mmio *ent;
 
@@ -963,7 +935,7 @@ void kvm_cpu_exec(CPUState *env)
     }
 }
 
-int kvm_ioctl(KVMState *s, int type, ...)
+int kvm_ioctl(int type, ...)
 {
     int ret;
     void *arg;
@@ -973,14 +945,14 @@ int kvm_ioctl(KVMState *s, int type, ...)
     arg = va_arg(ap, void *);
     va_end(ap);
 
-    ret = ioctl(s->fd, type, arg);
+    ret = ioctl(kvm_state.fd, type, arg);
     if (ret == -1) {
         ret = -errno;
     }
     return ret;
 }
 
-int kvm_vm_ioctl(KVMState *s, int type, ...)
+int kvm_vm_ioctl(int type, ...)
 {
     int ret;
     void *arg;
@@ -990,7 +962,7 @@ int kvm_vm_ioctl(KVMState *s, int type, ...)
     arg = va_arg(ap, void *);
     va_end(ap);
 
-    ret = ioctl(s->vmfd, type, arg);
+    ret = ioctl(kvm_state.vmfd, type, arg);
     if (ret == -1) {
         ret = -errno;
     }
@@ -1017,9 +989,7 @@ int kvm_vcpu_ioctl(CPUState *env, int type, ...)
 int kvm_has_sync_mmu(void)
 {
 #ifdef KVM_CAP_SYNC_MMU
-    KVMState *s = kvm_state;
-
-    return kvm_check_extension(s, KVM_CAP_SYNC_MMU);
+    return kvm_check_extension(KVM_CAP_SYNC_MMU);
 #else
     return 0;
 #endif
@@ -1027,27 +997,27 @@ int kvm_has_sync_mmu(void)
 
 int kvm_has_vcpu_events(void)
 {
-    return kvm_state->vcpu_events;
+    return kvm_state.vcpu_events;
 }
 
 int kvm_has_robust_singlestep(void)
 {
-    return kvm_state->robust_singlestep;
+    return kvm_state.robust_singlestep;
 }
 
 int kvm_has_debugregs(void)
 {
-    return kvm_state->debugregs;
+    return kvm_state.debugregs;
 }
 
 int kvm_has_xsave(void)
 {
-    return kvm_state->xsave;
+    return kvm_state.xsave;
 }
 
 int kvm_has_xcrs(void)
 {
-    return kvm_state->xcrs;
+    return kvm_state.xcrs;
 }
 
 void kvm_setup_guest_memory(void *start, size_t size)
@@ -1070,7 +1040,7 @@ struct kvm_sw_breakpoint *kvm_find_sw_breakpoint(CPUState *env,
 {
     struct kvm_sw_breakpoint *bp;
 
-    QTAILQ_FOREACH(bp, &env->kvm_state->kvm_sw_breakpoints, entry) {
+    QTAILQ_FOREACH(bp, &kvm_state.kvm_sw_breakpoints, entry) {
         if (bp->pc == pc) {
             return bp;
         }
@@ -1080,7 +1050,7 @@ struct kvm_sw_breakpoint *kvm_find_sw_breakpoint(CPUState *env,
 
 int kvm_sw_breakpoints_active(CPUState *env)
 {
-    return !QTAILQ_EMPTY(&env->kvm_state->kvm_sw_breakpoints);
+    return !QTAILQ_EMPTY(&kvm_state.kvm_sw_breakpoints);
 }
 
 struct kvm_set_guest_debug_data {
@@ -1140,8 +1110,7 @@ int kvm_insert_breakpoint(CPUState *current_env, target_ulong addr,
             return err;
         }
 
-        QTAILQ_INSERT_HEAD(&current_env->kvm_state->kvm_sw_breakpoints,
-                          bp, entry);
+        QTAILQ_INSERT_HEAD(&kvm_state.kvm_sw_breakpoints, bp, entry);
     } else {
         err = kvm_arch_insert_hw_breakpoint(addr, len, type);
         if (err) {
@@ -1181,7 +1150,7 @@ int kvm_remove_breakpoint(CPUState *current_env, target_ulong addr,
             return err;
         }
 
-        QTAILQ_REMOVE(&current_env->kvm_state->kvm_sw_breakpoints, bp, entry);
+        QTAILQ_REMOVE(&kvm_state.kvm_sw_breakpoints, bp, entry);
         qemu_free(bp);
     } else {
         err = kvm_arch_remove_hw_breakpoint(addr, len, type);
@@ -1202,10 +1171,9 @@ int kvm_remove_breakpoint(CPUState *current_env, target_ulong addr,
 void kvm_remove_all_breakpoints(CPUState *current_env)
 {
     struct kvm_sw_breakpoint *bp, *next;
-    KVMState *s = current_env->kvm_state;
     CPUState *env;
 
-    QTAILQ_FOREACH_SAFE(bp, &s->kvm_sw_breakpoints, entry, next) {
+    QTAILQ_FOREACH_SAFE(bp, &kvm_state.kvm_sw_breakpoints, entry, next) {
         if (kvm_arch_remove_sw_breakpoint(current_env, bp) != 0) {
             /* Try harder to find a CPU that currently sees the breakpoint. */
             for (env = first_cpu; env != NULL; env = env->next_cpu) {
@@ -1285,7 +1253,7 @@ int kvm_set_ioeventfd_mmio_long(int fd, uint32_t addr, uint32_t val, bool assign
         iofd.flags |= KVM_IOEVENTFD_FLAG_DEASSIGN;
     }
 
-    ret = kvm_vm_ioctl(kvm_state, KVM_IOEVENTFD, &iofd);
+    ret = kvm_vm_ioctl(KVM_IOEVENTFD, &iofd);
 
     if (ret < 0) {
         return -errno;
@@ -1314,7 +1282,7 @@ int kvm_set_ioeventfd_pio_word(int fd, uint16_t addr, uint16_t val, bool assign)
     if (!assign) {
         kick.flags |= KVM_IOEVENTFD_FLAG_DEASSIGN;
     }
-    r = kvm_vm_ioctl(kvm_state, KVM_IOEVENTFD, &kick);
+    r = kvm_vm_ioctl(KVM_IOEVENTFD, &kick);
     if (r < 0) {
         return r;
     }
diff --git a/kvm-stub.c b/kvm-stub.c
index 352c6a6..3a058ad 100644
--- a/kvm-stub.c
+++ b/kvm-stub.c
@@ -53,7 +53,7 @@ int kvm_uncoalesce_mmio_region(target_phys_addr_t start, ram_addr_t size)
     return -ENOSYS;
 }
 
-int kvm_check_extension(KVMState *s, unsigned int extension)
+int kvm_check_extension(unsigned int extension)
 {
     return 0;
 }
diff --git a/kvm.h b/kvm.h
index 51ad56f..26ca8c1 100644
--- a/kvm.h
+++ b/kvm.h
@@ -74,12 +74,9 @@ int kvm_irqchip_in_kernel(void);
 
 /* internal API */
 
-struct KVMState;
-typedef struct KVMState KVMState;
+int kvm_ioctl(int type, ...);
 
-int kvm_ioctl(KVMState *s, int type, ...);
-
-int kvm_vm_ioctl(KVMState *s, int type, ...);
+int kvm_vm_ioctl(int type, ...);
 
 int kvm_vcpu_ioctl(CPUState *env, int type, ...);
 
@@ -104,7 +101,7 @@ int kvm_arch_get_registers(CPUState *env);
 
 int kvm_arch_put_registers(CPUState *env, int level);
 
-int kvm_arch_init(KVMState *s, int smp_cpus);
+int kvm_arch_init(int smp_cpus);
 
 int kvm_arch_init_vcpu(CPUState *env);
 
@@ -146,10 +143,8 @@ void kvm_arch_update_guest_debug(CPUState *env, struct kvm_guest_debug *dbg);
 
 bool kvm_arch_stop_on_emulation_error(CPUState *env);
 
-int kvm_check_extension(KVMState *s, unsigned int extension);
+int kvm_check_extension(unsigned int extension);
 
-uint32_t kvm_arch_get_supported_cpuid(CPUState *env, uint32_t function,
-                                      uint32_t index, int reg);
 void kvm_cpu_synchronize_state(CPUState *env);
 void kvm_cpu_synchronize_post_reset(CPUState *env);
 void kvm_cpu_synchronize_post_init(CPUState *env);
@@ -179,7 +174,7 @@ static inline void cpu_synchronize_post_init(CPUState *env)
 
 
 #if !defined(CONFIG_USER_ONLY)
-int kvm_physical_memory_addr_from_ram(KVMState *s, ram_addr_t ram_addr,
+int kvm_physical_memory_addr_from_ram(ram_addr_t ram_addr,
                                       target_phys_addr_t *phys_addr);
 #endif
 
diff --git a/target-i386/cpuid.c b/target-i386/cpuid.c
index 5382a28..17ab619 100644
--- a/target-i386/cpuid.c
+++ b/target-i386/cpuid.c
@@ -23,6 +23,7 @@
 
 #include "cpu.h"
 #include "kvm.h"
+#include "kvm_x86.h"
 
 #include "qemu-option.h"
 #include "qemu-config.h"
@@ -1138,10 +1139,10 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
             break;
         }
         if (kvm_enabled()) {
-            *eax = kvm_arch_get_supported_cpuid(env, 0xd, count, R_EAX);
-            *ebx = kvm_arch_get_supported_cpuid(env, 0xd, count, R_EBX);
-            *ecx = kvm_arch_get_supported_cpuid(env, 0xd, count, R_ECX);
-            *edx = kvm_arch_get_supported_cpuid(env, 0xd, count, R_EDX);
+            *eax = kvm_x86_get_supported_cpuid(0xd, count, R_EAX);
+            *ebx = kvm_x86_get_supported_cpuid(0xd, count, R_EBX);
+            *ecx = kvm_x86_get_supported_cpuid(0xd, count, R_ECX);
+            *edx = kvm_x86_get_supported_cpuid(0xd, count, R_EDX);
         } else {
             *eax = 0;
             *ebx = 0;
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 1789bff..cb6883f 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -60,7 +60,7 @@ static int lm_capable_kernel;
 
 #ifdef KVM_CAP_EXT_CPUID
 
-static struct kvm_cpuid2 *try_get_cpuid(KVMState *s, int max)
+static struct kvm_cpuid2 *try_get_cpuid(int max)
 {
     struct kvm_cpuid2 *cpuid;
     int r, size;
@@ -68,7 +68,7 @@ static struct kvm_cpuid2 *try_get_cpuid(KVMState *s, int max)
     size = sizeof(*cpuid) + max * sizeof(*cpuid->entries);
     cpuid = (struct kvm_cpuid2 *)qemu_mallocz(size);
     cpuid->nent = max;
-    r = kvm_ioctl(s, KVM_GET_SUPPORTED_CPUID, cpuid);
+    r = kvm_ioctl(KVM_GET_SUPPORTED_CPUID, cpuid);
     if (r == 0 && cpuid->nent >= max) {
         r = -E2BIG;
     }
@@ -85,20 +85,20 @@ static struct kvm_cpuid2 *try_get_cpuid(KVMState *s, int max)
     return cpuid;
 }
 
-uint32_t kvm_arch_get_supported_cpuid(CPUState *env, uint32_t function,
-                                      uint32_t index, int reg)
+uint32_t kvm_x86_get_supported_cpuid(uint32_t function, uint32_t index,
+                                     int reg)
 {
     struct kvm_cpuid2 *cpuid;
     int i, max;
     uint32_t ret = 0;
     uint32_t cpuid_1_edx;
 
-    if (!kvm_check_extension(env->kvm_state, KVM_CAP_EXT_CPUID)) {
+    if (!kvm_check_extension(KVM_CAP_EXT_CPUID)) {
         return -1U;
     }
 
     max = 1;
-    while ((cpuid = try_get_cpuid(env->kvm_state, max)) == NULL) {
+    while ((cpuid = try_get_cpuid(max)) == NULL) {
         max *= 2;
     }
 
@@ -126,7 +126,7 @@ uint32_t kvm_arch_get_supported_cpuid(CPUState *env, uint32_t function,
                     /* On Intel, kvm returns cpuid according to the Intel spec,
                      * so add missing bits according to the AMD spec:
                      */
-                    cpuid_1_edx = kvm_arch_get_supported_cpuid(env, 1, 0, R_EDX);
+                    cpuid_1_edx = kvm_x86_get_supported_cpuid(1, 0, R_EDX);
                     ret |= cpuid_1_edx & 0x183f7ff;
                     break;
                 }
@@ -142,8 +142,8 @@ uint32_t kvm_arch_get_supported_cpuid(CPUState *env, uint32_t function,
 
 #else
 
-uint32_t kvm_arch_get_supported_cpuid(CPUState *env, uint32_t function,
-                                      uint32_t index, int reg)
+uint32_t kvm_x86_get_supported_cpuid(uint32_t function, uint32_t index,
+                                     int reg)
 {
     return -1U;
 }
@@ -170,12 +170,12 @@ struct kvm_para_features {
     { -1, -1 }
 };
 
-static int get_para_features(CPUState *env)
+static int get_para_features(void)
 {
     int i, features = 0;
 
     for (i = 0; i < ARRAY_SIZE(para_features) - 1; i++) {
-        if (kvm_check_extension(env->kvm_state, para_features[i].cap)) {
+        if (kvm_check_extension(para_features[i].cap)) {
             features |= (1 << para_features[i].feature);
         }
     }
@@ -184,15 +184,14 @@ static int get_para_features(CPUState *env)
 #endif
 
 #ifdef KVM_CAP_MCE
-static int kvm_get_mce_cap_supported(KVMState *s, uint64_t *mce_cap,
-                                     int *max_banks)
+static int kvm_get_mce_cap_supported(uint64_t *mce_cap, int *max_banks)
 {
     int r;
 
-    r = kvm_check_extension(s, KVM_CAP_MCE);
+    r = kvm_check_extension(KVM_CAP_MCE);
     if (r > 0) {
         *max_banks = r;
-        return kvm_ioctl(s, KVM_X86_GET_MCE_CAP_SUPPORTED, mce_cap);
+        return kvm_ioctl(KVM_X86_GET_MCE_CAP_SUPPORTED, mce_cap);
     }
     return -ENOSYS;
 }
@@ -323,18 +322,18 @@ int kvm_arch_init_vcpu(CPUState *env)
     uint32_t signature[3];
 #endif
 
-    env->cpuid_features &= kvm_arch_get_supported_cpuid(env, 1, 0, R_EDX);
+    env->cpuid_features &= kvm_x86_get_supported_cpuid(1, 0, R_EDX);
 
     i = env->cpuid_ext_features & CPUID_EXT_HYPERVISOR;
-    env->cpuid_ext_features &= kvm_arch_get_supported_cpuid(env, 1, 0, R_ECX);
+    env->cpuid_ext_features &= kvm_x86_get_supported_cpuid(1, 0, R_ECX);
     env->cpuid_ext_features |= i;
 
-    env->cpuid_ext2_features &= kvm_arch_get_supported_cpuid(env, 0x80000001,
-                                                             0, R_EDX);
-    env->cpuid_ext3_features &= kvm_arch_get_supported_cpuid(env, 0x80000001,
-                                                             0, R_ECX);
-    env->cpuid_svm_features  &= kvm_arch_get_supported_cpuid(env, 0x8000000A,
-                                                             0, R_EDX);
+    env->cpuid_ext2_features &= kvm_x86_get_supported_cpuid(0x80000001,
+                                                            0, R_EDX);
+    env->cpuid_ext3_features &= kvm_x86_get_supported_cpuid(0x80000001,
+                                                            0, R_ECX);
+    env->cpuid_svm_features  &= kvm_x86_get_supported_cpuid(0x8000000A,
+                                                            0, R_EDX);
 
 
     cpuid_i = 0;
@@ -353,7 +352,7 @@ int kvm_arch_init_vcpu(CPUState *env)
     c = &cpuid_data.entries[cpuid_i++];
     memset(c, 0, sizeof(*c));
     c->function = KVM_CPUID_FEATURES;
-    c->eax = env->cpuid_kvm_features & get_para_features(env);
+    c->eax = env->cpuid_kvm_features & get_para_features();
 #endif
 
     cpu_x86_cpuid(env, 0, 0, &limit, &unused, &unused, &unused);
@@ -423,11 +422,11 @@ int kvm_arch_init_vcpu(CPUState *env)
 #ifdef KVM_CAP_MCE
     if (((env->cpuid_version >> 8)&0xF) >= 6
         && (env->cpuid_features&(CPUID_MCE|CPUID_MCA)) == (CPUID_MCE|CPUID_MCA)
-        && kvm_check_extension(env->kvm_state, KVM_CAP_MCE) > 0) {
+        && kvm_check_extension(KVM_CAP_MCE) > 0) {
         uint64_t mcg_cap;
         int banks;
 
-        if (kvm_get_mce_cap_supported(env->kvm_state, &mcg_cap, &banks)) {
+        if (kvm_get_mce_cap_supported(&mcg_cap, &banks)) {
             perror("kvm_get_mce_cap_supported FAILED");
         } else {
             if (banks > MCE_BANKS_DEF)
@@ -461,7 +460,7 @@ void kvm_arch_reset_vcpu(CPUState *env)
     }
 }
 
-static int kvm_get_supported_msrs(KVMState *s)
+static int kvm_get_supported_msrs(void)
 {
     static int kvm_supported_msrs;
     int ret = 0;
@@ -475,7 +474,7 @@ static int kvm_get_supported_msrs(KVMState *s)
         /* Obtain MSR list from KVM.  These are the MSRs that we must
          * save/restore */
         msr_list.nmsrs = 0;
-        ret = kvm_ioctl(s, KVM_GET_MSR_INDEX_LIST, &msr_list);
+        ret = kvm_ioctl(KVM_GET_MSR_INDEX_LIST, &msr_list);
         if (ret < 0 && ret != -E2BIG) {
             return ret;
         }
@@ -486,7 +485,7 @@ static int kvm_get_supported_msrs(KVMState *s)
                                               sizeof(msr_list.indices[0])));
 
         kvm_msr_list->nmsrs = msr_list.nmsrs;
-        ret = kvm_ioctl(s, KVM_GET_MSR_INDEX_LIST, kvm_msr_list);
+        ret = kvm_ioctl(KVM_GET_MSR_INDEX_LIST, kvm_msr_list);
         if (ret >= 0) {
             int i;
 
@@ -508,17 +507,17 @@ static int kvm_get_supported_msrs(KVMState *s)
     return ret;
 }
 
-static int kvm_init_identity_map_page(KVMState *s)
+static int kvm_init_identity_map_page(void)
 {
 #ifdef KVM_CAP_SET_IDENTITY_MAP_ADDR
     int ret;
     uint64_t addr = 0xfffbc000;
 
-    if (!kvm_check_extension(s, KVM_CAP_SET_IDENTITY_MAP_ADDR)) {
+    if (!kvm_check_extension(KVM_CAP_SET_IDENTITY_MAP_ADDR)) {
         return 0;
     }
 
-    ret = kvm_vm_ioctl(s, KVM_SET_IDENTITY_MAP_ADDR, &addr);
+    ret = kvm_vm_ioctl(KVM_SET_IDENTITY_MAP_ADDR, &addr);
     if (ret < 0) {
         fprintf(stderr, "kvm_set_identity_map_addr: %s\n", strerror(ret));
         return ret;
@@ -527,12 +526,12 @@ static int kvm_init_identity_map_page(KVMState *s)
     return 0;
 }
 
-int kvm_arch_init(KVMState *s, int smp_cpus)
+int kvm_arch_init(int smp_cpus)
 {
     int ret;
     struct utsname utsname;
 
-    ret = kvm_get_supported_msrs(s);
+    ret = kvm_get_supported_msrs();
     if (ret < 0) {
         return ret;
     }
@@ -546,7 +545,7 @@ int kvm_arch_init(KVMState *s, int smp_cpus)
      * versions of KVM just assumed that it would be at the end of physical
      * memory but that doesn't work with more than 4GB of memory.  We simply
      * refuse to work with those older versions of KVM. */
-    ret = kvm_check_extension(s, KVM_CAP_SET_TSS_ADDR);
+    ret = kvm_check_extension(KVM_CAP_SET_TSS_ADDR);
     if (ret <= 0) {
         fprintf(stderr, "kvm does not support KVM_CAP_SET_TSS_ADDR\n");
         return ret;
@@ -563,12 +562,12 @@ int kvm_arch_init(KVMState *s, int smp_cpus)
         perror("e820_add_entry() table is full");
         exit(1);
     }
-    ret = kvm_vm_ioctl(s, KVM_SET_TSS_ADDR, 0xfffbd000);
+    ret = kvm_vm_ioctl(KVM_SET_TSS_ADDR, 0xfffbd000);
     if (ret < 0) {
         return ret;
     }
 
-    return kvm_init_identity_map_page(s);
+    return kvm_init_identity_map_page();
 }
 
 static void set_v8086_seg(struct kvm_segment *lhs, const SegmentCache *rhs)
@@ -1861,7 +1860,7 @@ int kvm_on_sigbus_vcpu(CPUState *env, int code, void *addr)
             || code == BUS_MCEERR_AO)) {
         vaddr = (void *)addr;
         if (qemu_ram_addr_from_host(vaddr, &ram_addr) ||
-            !kvm_physical_memory_addr_from_ram(env->kvm_state, ram_addr, &paddr)) {
+            !kvm_physical_memory_addr_from_ram(ram_addr, &paddr)) {
             fprintf(stderr, "Hardware memory error for memory used by "
                     "QEMU itself instead of guest system!\n");
             /* Hope we are lucky for AO MCE */
@@ -1910,7 +1909,7 @@ int kvm_on_sigbus(int code, void *addr)
         /* Hope we are lucky for AO MCE */
         vaddr = addr;
         if (qemu_ram_addr_from_host(vaddr, &ram_addr) ||
-            !kvm_physical_memory_addr_from_ram(first_cpu->kvm_state, ram_addr, &paddr)) {
+            !kvm_physical_memory_addr_from_ram(ram_addr, &paddr)) {
             fprintf(stderr, "Hardware memory error for memory used by "
                     "QEMU itself instead of guest system!: %p\n", addr);
             return 0;
diff --git a/target-i386/kvm_x86.h b/target-i386/kvm_x86.h
index 9d7b584..304d0cb 100644
--- a/target-i386/kvm_x86.h
+++ b/target-i386/kvm_x86.h
@@ -22,4 +22,7 @@ void kvm_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
                         uint64_t mcg_status, uint64_t addr, uint64_t misc,
                         int flag);
 
+uint32_t kvm_x86_get_supported_cpuid(uint32_t function, uint32_t index,
+                                     int reg);
+
 #endif
diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
index 849b404..56d30cc 100644
--- a/target-ppc/kvm.c
+++ b/target-ppc/kvm.c
@@ -56,13 +56,13 @@ static void kvm_kick_env(void *env)
     qemu_cpu_kick(env);
 }
 
-int kvm_arch_init(KVMState *s, int smp_cpus)
+int kvm_arch_init(int smp_cpus)
 {
 #ifdef KVM_CAP_PPC_UNSET_IRQ
-    cap_interrupt_unset = kvm_check_extension(s, KVM_CAP_PPC_UNSET_IRQ);
+    cap_interrupt_unset = kvm_check_extension(KVM_CAP_PPC_UNSET_IRQ);
 #endif
 #ifdef KVM_CAP_PPC_IRQ_LEVEL
-    cap_interrupt_level = kvm_check_extension(s, KVM_CAP_PPC_IRQ_LEVEL);
+    cap_interrupt_level = kvm_check_extension(KVM_CAP_PPC_IRQ_LEVEL);
 #endif
 
     if (!cap_interrupt_level) {
@@ -164,7 +164,7 @@ int kvm_arch_get_registers(CPUState *env)
         env->gpr[i] = regs.gpr[i];
 
 #ifdef KVM_CAP_PPC_SEGSTATE
-    if (kvm_check_extension(env->kvm_state, KVM_CAP_PPC_SEGSTATE)) {
+    if (kvm_check_extension(KVM_CAP_PPC_SEGSTATE)) {
         env->sdr1 = sregs.u.s.sdr1;
 
         /* Sync SLB */
@@ -371,8 +371,8 @@ int kvmppc_get_hypercall(CPUState *env, uint8_t *buf, int buf_len)
 #ifdef KVM_CAP_PPC_GET_PVINFO
     struct kvm_ppc_pvinfo pvinfo;
 
-    if (kvm_check_extension(env->kvm_state, KVM_CAP_PPC_GET_PVINFO) &&
-        !kvm_vm_ioctl(env->kvm_state, KVM_PPC_GET_PVINFO, &pvinfo)) {
+    if (kvm_check_extension(KVM_CAP_PPC_GET_PVINFO) &&
+        !kvm_vm_ioctl(KVM_PPC_GET_PVINFO, &pvinfo)) {
         memcpy(buf, pvinfo.hcall, buf_len);
 
         return 0;
diff --git a/target-s390x/kvm.c b/target-s390x/kvm.c
index adf4a9e..927a37e 100644
--- a/target-s390x/kvm.c
+++ b/target-s390x/kvm.c
@@ -70,7 +70,7 @@
 #define SCLP_CMDW_READ_SCP_INFO         0x00020001
 #define SCLP_CMDW_READ_SCP_INFO_FORCED  0x00120001
 
-int kvm_arch_init(KVMState *s, int smp_cpus)
+int kvm_arch_init(int smp_cpus)
 {
     return 0;
 }
@@ -186,10 +186,6 @@ static void kvm_s390_interrupt_internal(CPUState *env, int type, uint32_t parm,
     struct kvm_s390_interrupt kvmint;
     int r;
 
-    if (!env->kvm_state) {
-        return;
-    }
-
     env->halted = 0;
     env->exception_index = -1;
 
@@ -198,7 +194,7 @@ static void kvm_s390_interrupt_internal(CPUState *env, int type, uint32_t parm,
     kvmint.parm64 = parm64;
 
     if (vm) {
-        r = kvm_vm_ioctl(env->kvm_state, KVM_S390_INTERRUPT, &kvmint);
+        r = kvm_vm_ioctl(KVM_S390_INTERRUPT, &kvmint);
     } else {
         r = kvm_vcpu_ioctl(env, KVM_S390_INTERRUPT, &kvmint);
     }
-- 
1.7.2.3



* [Qemu-devel] [PATCH 26/35] kvm: Eliminate KVMState arguments
From: Marcelo Tosatti @ 2011-01-06 17:56 UTC
  To: Anthony Liguori
  Cc: Jan Kiszka, Marcelo Tosatti, qemu-devel, kvm, Alexander Graf

From: Jan Kiszka <jan.kiszka@siemens.com>

QEMU supports only one VM, so there is only one kvm_state per process,
and we gain nothing by passing a reference to it around. Eliminate any need
to refer to it outside of kvm-all.c.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
CC: Alexander Graf <agraf@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
---
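Illustration only, not part of the patch: a minimal sketch of the pattern
applied below, where a heap-allocated state object passed to every helper
becomes a single file-scope static instance that helpers access directly.
All demo_* names and DEMO_NUM_SLOTS are hypothetical and do not exist in
QEMU. Unlike kvm_alloc_slot(), which aborts when no slot is free, this
sketch returns NULL to stay self-contained.

```c
#include <stddef.h>

#define DEMO_NUM_SLOTS 4   /* hypothetical; the real KVMState has 32 slots */

typedef struct DemoSlot {
    int slot;
    size_t memory_size;
} DemoSlot;

/*
 * One instance per process, private to this file. This mirrors the shape
 * of "static struct KVMState { ... } kvm_state;" in the patch.
 */
static struct DemoState {
    DemoSlot slots[DEMO_NUM_SLOTS];
    int fd;
} demo_state;

/* Initialize the singleton in place; there is nothing to allocate or free. */
void demo_init(void)
{
    size_t i;

    for (i = 0; i < DEMO_NUM_SLOTS; i++) {
        demo_state.slots[i].slot = (int)i;
        demo_state.slots[i].memory_size = 0;
    }
    demo_state.fd = -1;
}

/*
 * No state argument: the helper reaches the singleton directly.
 * Returns the first unused slot, or NULL when every slot is in use.
 */
DemoSlot *demo_alloc_slot(void)
{
    size_t i;

    for (i = 0; i < DEMO_NUM_SLOTS; i++) {
        if (demo_state.slots[i].memory_size == 0) {
            return &demo_state.slots[i];
        }
    }
    return NULL;
}
```

Callers outside the file never see the state type; they only call
demo_init() and demo_alloc_slot(). The trade-off is that a second
instance per process is no longer possible without reworking the API.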
 cpu-defs.h            |    2 -
 kvm-all.c             |  232 +++++++++++++++++++++----------------------------
 kvm-stub.c            |    2 +-
 kvm.h                 |   15 +--
 target-i386/cpuid.c   |    9 +-
 target-i386/kvm.c     |   77 ++++++++--------
 target-i386/kvm_x86.h |    3 +
 target-ppc/kvm.c      |   12 ++--
 target-s390x/kvm.c    |    8 +--
 9 files changed, 160 insertions(+), 200 deletions(-)

diff --git a/cpu-defs.h b/cpu-defs.h
index 8d4bf86..0e04239 100644
--- a/cpu-defs.h
+++ b/cpu-defs.h
@@ -131,7 +131,6 @@ typedef struct icount_decr_u16 {
 #endif
 
 struct kvm_run;
-struct KVMState;
 struct qemu_work_item;
 
 typedef struct CPUBreakpoint {
@@ -207,7 +206,6 @@ typedef struct CPUWatchpoint {
     struct QemuCond *halt_cond;                                         \
     struct qemu_work_item *queued_work_first, *queued_work_last;        \
     const char *cpu_model_str;                                          \
-    struct KVMState *kvm_state;                                         \
     struct kvm_run *kvm_run;                                            \
     int kvm_fd;                                                         \
     int kvm_vcpu_dirty;
diff --git a/kvm-all.c b/kvm-all.c
index ef2ca3b..d8820c7 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -52,8 +52,7 @@ typedef struct KVMSlot
 
 typedef struct kvm_dirty_log KVMDirtyLog;
 
-struct KVMState
-{
+static struct KVMState {
     KVMSlot slots[32];
     int fd;
     int vmfd;
@@ -72,21 +71,19 @@ struct KVMState
     int irqchip_in_kernel;
     int pit_in_kernel;
     int xsave, xcrs;
-};
-
-static KVMState *kvm_state;
+} kvm_state;
 
-static KVMSlot *kvm_alloc_slot(KVMState *s)
+static KVMSlot *kvm_alloc_slot(void)
 {
     int i;
 
-    for (i = 0; i < ARRAY_SIZE(s->slots); i++) {
+    for (i = 0; i < ARRAY_SIZE(kvm_state.slots); i++) {
         /* KVM private memory slots */
         if (i >= 8 && i < 12) {
             continue;
         }
-        if (s->slots[i].memory_size == 0) {
-            return &s->slots[i];
+        if (kvm_state.slots[i].memory_size == 0) {
+            return &kvm_state.slots[i];
         }
     }
 
@@ -94,14 +91,13 @@ static KVMSlot *kvm_alloc_slot(KVMState *s)
     abort();
 }
 
-static KVMSlot *kvm_lookup_matching_slot(KVMState *s,
-                                         target_phys_addr_t start_addr,
+static KVMSlot *kvm_lookup_matching_slot(target_phys_addr_t start_addr,
                                          target_phys_addr_t end_addr)
 {
     int i;
 
-    for (i = 0; i < ARRAY_SIZE(s->slots); i++) {
-        KVMSlot *mem = &s->slots[i];
+    for (i = 0; i < ARRAY_SIZE(kvm_state.slots); i++) {
+        KVMSlot *mem = &kvm_state.slots[i];
 
         if (start_addr == mem->start_addr &&
             end_addr == mem->start_addr + mem->memory_size) {
@@ -115,15 +111,14 @@ static KVMSlot *kvm_lookup_matching_slot(KVMState *s,
 /*
  * Find overlapping slot with lowest start address
  */
-static KVMSlot *kvm_lookup_overlapping_slot(KVMState *s,
-                                            target_phys_addr_t start_addr,
+static KVMSlot *kvm_lookup_overlapping_slot(target_phys_addr_t start_addr,
                                             target_phys_addr_t end_addr)
 {
     KVMSlot *found = NULL;
     int i;
 
-    for (i = 0; i < ARRAY_SIZE(s->slots); i++) {
-        KVMSlot *mem = &s->slots[i];
+    for (i = 0; i < ARRAY_SIZE(kvm_state.slots); i++) {
+        KVMSlot *mem = &kvm_state.slots[i];
 
         if (mem->memory_size == 0 ||
             (found && found->start_addr < mem->start_addr)) {
@@ -139,13 +134,13 @@ static KVMSlot *kvm_lookup_overlapping_slot(KVMState *s,
     return found;
 }
 
-int kvm_physical_memory_addr_from_ram(KVMState *s, ram_addr_t ram_addr,
+int kvm_physical_memory_addr_from_ram(ram_addr_t ram_addr,
                                       target_phys_addr_t *phys_addr)
 {
     int i;
 
-    for (i = 0; i < ARRAY_SIZE(s->slots); i++) {
-        KVMSlot *mem = &s->slots[i];
+    for (i = 0; i < ARRAY_SIZE(kvm_state.slots); i++) {
+        KVMSlot *mem = &kvm_state.slots[i];
 
         if (ram_addr >= mem->phys_offset &&
             ram_addr < mem->phys_offset + mem->memory_size) {
@@ -157,7 +152,7 @@ int kvm_physical_memory_addr_from_ram(KVMState *s, ram_addr_t ram_addr,
     return 0;
 }
 
-static int kvm_set_user_memory_region(KVMState *s, KVMSlot *slot)
+static int kvm_set_user_memory_region(KVMSlot *slot)
 {
     struct kvm_userspace_memory_region mem;
 
@@ -166,10 +161,10 @@ static int kvm_set_user_memory_region(KVMState *s, KVMSlot *slot)
     mem.memory_size = slot->memory_size;
     mem.userspace_addr = (unsigned long)qemu_safe_ram_ptr(slot->phys_offset);
     mem.flags = slot->flags;
-    if (s->migration_log) {
+    if (kvm_state.migration_log) {
         mem.flags |= KVM_MEM_LOG_DIRTY_PAGES;
     }
-    return kvm_vm_ioctl(s, KVM_SET_USER_MEMORY_REGION, &mem);
+    return kvm_vm_ioctl(KVM_SET_USER_MEMORY_REGION, &mem);
 }
 
 static void kvm_reset_vcpu(void *opaque)
@@ -181,33 +176,31 @@ static void kvm_reset_vcpu(void *opaque)
 
 int kvm_irqchip_in_kernel(void)
 {
-    return kvm_state->irqchip_in_kernel;
+    return kvm_state.irqchip_in_kernel;
 }
 
 int kvm_pit_in_kernel(void)
 {
-    return kvm_state->pit_in_kernel;
+    return kvm_state.pit_in_kernel;
 }
 
 
 int kvm_init_vcpu(CPUState *env)
 {
-    KVMState *s = kvm_state;
     long mmap_size;
     int ret;
 
     DPRINTF("kvm_init_vcpu\n");
 
-    ret = kvm_vm_ioctl(s, KVM_CREATE_VCPU, env->cpu_index);
+    ret = kvm_vm_ioctl(KVM_CREATE_VCPU, env->cpu_index);
     if (ret < 0) {
         DPRINTF("kvm_create_vcpu failed\n");
         goto err;
     }
 
     env->kvm_fd = ret;
-    env->kvm_state = s;
 
-    mmap_size = kvm_ioctl(s, KVM_GET_VCPU_MMAP_SIZE, 0);
+    mmap_size = kvm_ioctl(KVM_GET_VCPU_MMAP_SIZE, 0);
     if (mmap_size < 0) {
         DPRINTF("KVM_GET_VCPU_MMAP_SIZE failed\n");
         goto err;
@@ -222,9 +215,9 @@ int kvm_init_vcpu(CPUState *env)
     }
 
 #ifdef KVM_CAP_COALESCED_MMIO
-    if (s->coalesced_mmio && !s->coalesced_mmio_ring) {
-        s->coalesced_mmio_ring =
-            (void *)env->kvm_run + s->coalesced_mmio * PAGE_SIZE;
+    if (kvm_state.coalesced_mmio && !kvm_state.coalesced_mmio_ring) {
+        kvm_state.coalesced_mmio_ring =
+            (void *)env->kvm_run + kvm_state.coalesced_mmio * PAGE_SIZE;
     }
 #endif
 
@@ -243,8 +236,7 @@ err:
 static int kvm_dirty_pages_log_change(target_phys_addr_t phys_addr,
                                       ram_addr_t size, int flags, int mask)
 {
-    KVMState *s = kvm_state;
-    KVMSlot *mem = kvm_lookup_matching_slot(s, phys_addr, phys_addr + size);
+    KVMSlot *mem = kvm_lookup_matching_slot(phys_addr, phys_addr + size);
     int old_flags;
 
     if (mem == NULL)  {
@@ -260,14 +252,14 @@ static int kvm_dirty_pages_log_change(target_phys_addr_t phys_addr,
     mem->flags = flags;
 
     /* If nothing changed effectively, no need to issue ioctl */
-    if (s->migration_log) {
+    if (kvm_state.migration_log) {
         flags |= KVM_MEM_LOG_DIRTY_PAGES;
     }
     if (flags == old_flags) {
             return 0;
     }
 
-    return kvm_set_user_memory_region(s, mem);
+    return kvm_set_user_memory_region(mem);
 }
 
 int kvm_log_start(target_phys_addr_t phys_addr, ram_addr_t size)
@@ -284,14 +276,13 @@ int kvm_log_stop(target_phys_addr_t phys_addr, ram_addr_t size)
 
 static int kvm_set_migration_log(int enable)
 {
-    KVMState *s = kvm_state;
     KVMSlot *mem;
     int i, err;
 
-    s->migration_log = enable;
+    kvm_state.migration_log = enable;
 
-    for (i = 0; i < ARRAY_SIZE(s->slots); i++) {
-        mem = &s->slots[i];
+    for (i = 0; i < ARRAY_SIZE(kvm_state.slots); i++) {
+        mem = &kvm_state.slots[i];
 
         if (!mem->memory_size) {
             continue;
@@ -299,7 +290,7 @@ static int kvm_set_migration_log(int enable)
         if (!!(mem->flags & KVM_MEM_LOG_DIRTY_PAGES) == enable) {
             continue;
         }
-        err = kvm_set_user_memory_region(s, mem);
+        err = kvm_set_user_memory_region(mem);
         if (err) {
             return err;
         }
@@ -353,7 +344,6 @@ static int kvm_get_dirty_pages_log_range(unsigned long start_addr,
 static int kvm_physical_sync_dirty_bitmap(target_phys_addr_t start_addr,
                                           target_phys_addr_t end_addr)
 {
-    KVMState *s = kvm_state;
     unsigned long size, allocated_size = 0;
     KVMDirtyLog d;
     KVMSlot *mem;
@@ -361,7 +351,7 @@ static int kvm_physical_sync_dirty_bitmap(target_phys_addr_t start_addr,
 
     d.dirty_bitmap = NULL;
     while (start_addr < end_addr) {
-        mem = kvm_lookup_overlapping_slot(s, start_addr, end_addr);
+        mem = kvm_lookup_overlapping_slot(start_addr, end_addr);
         if (mem == NULL) {
             break;
         }
@@ -377,7 +367,7 @@ static int kvm_physical_sync_dirty_bitmap(target_phys_addr_t start_addr,
 
         d.slot = mem->slot;
 
-        if (kvm_vm_ioctl(s, KVM_GET_DIRTY_LOG, &d) == -1) {
+        if (kvm_vm_ioctl(KVM_GET_DIRTY_LOG, &d) == -1) {
             DPRINTF("ioctl failed %d\n", errno);
             ret = -1;
             break;
@@ -395,16 +385,15 @@ static int kvm_physical_sync_dirty_bitmap(target_phys_addr_t start_addr,
 int kvm_coalesce_mmio_region(target_phys_addr_t start, ram_addr_t size)
 {
     int ret = -ENOSYS;
-#ifdef KVM_CAP_COALESCED_MMIO
-    KVMState *s = kvm_state;
 
-    if (s->coalesced_mmio) {
+#ifdef KVM_CAP_COALESCED_MMIO
+    if (kvm_state.coalesced_mmio) {
         struct kvm_coalesced_mmio_zone zone;
 
         zone.addr = start;
         zone.size = size;
 
-        ret = kvm_vm_ioctl(s, KVM_REGISTER_COALESCED_MMIO, &zone);
+        ret = kvm_vm_ioctl(KVM_REGISTER_COALESCED_MMIO, &zone);
     }
 #endif
 
@@ -414,27 +403,26 @@ int kvm_coalesce_mmio_region(target_phys_addr_t start, ram_addr_t size)
 int kvm_uncoalesce_mmio_region(target_phys_addr_t start, ram_addr_t size)
 {
     int ret = -ENOSYS;
-#ifdef KVM_CAP_COALESCED_MMIO
-    KVMState *s = kvm_state;
 
-    if (s->coalesced_mmio) {
+#ifdef KVM_CAP_COALESCED_MMIO
+    if (kvm_state.coalesced_mmio) {
         struct kvm_coalesced_mmio_zone zone;
 
         zone.addr = start;
         zone.size = size;
 
-        ret = kvm_vm_ioctl(s, KVM_UNREGISTER_COALESCED_MMIO, &zone);
+        ret = kvm_vm_ioctl(KVM_UNREGISTER_COALESCED_MMIO, &zone);
     }
 #endif
 
     return ret;
 }
 
-int kvm_check_extension(KVMState *s, unsigned int extension)
+int kvm_check_extension(unsigned int extension)
 {
     int ret;
 
-    ret = kvm_ioctl(s, KVM_CHECK_EXTENSION, extension);
+    ret = kvm_ioctl(KVM_CHECK_EXTENSION, extension);
     if (ret < 0) {
         ret = 0;
     }
@@ -445,7 +433,6 @@ int kvm_check_extension(KVMState *s, unsigned int extension)
 static void kvm_set_phys_mem(target_phys_addr_t start_addr, ram_addr_t size,
                              ram_addr_t phys_offset)
 {
-    KVMState *s = kvm_state;
     ram_addr_t flags = phys_offset & ~TARGET_PAGE_MASK;
     KVMSlot *mem, old;
     int err;
@@ -459,7 +446,7 @@ static void kvm_set_phys_mem(target_phys_addr_t start_addr, ram_addr_t size,
     phys_offset &= ~IO_MEM_ROM;
 
     while (1) {
-        mem = kvm_lookup_overlapping_slot(s, start_addr, start_addr + size);
+        mem = kvm_lookup_overlapping_slot(start_addr, start_addr + size);
         if (!mem) {
             break;
         }
@@ -476,7 +463,7 @@ static void kvm_set_phys_mem(target_phys_addr_t start_addr, ram_addr_t size,
 
         /* unregister the overlapping slot */
         mem->memory_size = 0;
-        err = kvm_set_user_memory_region(s, mem);
+        err = kvm_set_user_memory_region(mem);
         if (err) {
             fprintf(stderr, "%s: error unregistering overlapping slot: %s\n",
                     __func__, strerror(-err));
@@ -491,16 +478,16 @@ static void kvm_set_phys_mem(target_phys_addr_t start_addr, ram_addr_t size,
          * address as the first existing one. If not or if some overlapping
          * slot comes around later, we will fail (not seen in practice so far)
          * - and actually require a recent KVM version. */
-        if (s->broken_set_mem_region &&
+        if (kvm_state.broken_set_mem_region &&
             old.start_addr == start_addr && old.memory_size < size &&
             flags < IO_MEM_UNASSIGNED) {
-            mem = kvm_alloc_slot(s);
+            mem = kvm_alloc_slot();
             mem->memory_size = old.memory_size;
             mem->start_addr = old.start_addr;
             mem->phys_offset = old.phys_offset;
             mem->flags = 0;
 
-            err = kvm_set_user_memory_region(s, mem);
+            err = kvm_set_user_memory_region(mem);
             if (err) {
                 fprintf(stderr, "%s: error updating slot: %s\n", __func__,
                         strerror(-err));
@@ -515,13 +502,13 @@ static void kvm_set_phys_mem(target_phys_addr_t start_addr, ram_addr_t size,
 
         /* register prefix slot */
         if (old.start_addr < start_addr) {
-            mem = kvm_alloc_slot(s);
+            mem = kvm_alloc_slot();
             mem->memory_size = start_addr - old.start_addr;
             mem->start_addr = old.start_addr;
             mem->phys_offset = old.phys_offset;
             mem->flags = 0;
 
-            err = kvm_set_user_memory_region(s, mem);
+            err = kvm_set_user_memory_region(mem);
             if (err) {
                 fprintf(stderr, "%s: error registering prefix slot: %s\n",
                         __func__, strerror(-err));
@@ -533,14 +520,14 @@ static void kvm_set_phys_mem(target_phys_addr_t start_addr, ram_addr_t size,
         if (old.start_addr + old.memory_size > start_addr + size) {
             ram_addr_t size_delta;
 
-            mem = kvm_alloc_slot(s);
+            mem = kvm_alloc_slot();
             mem->start_addr = start_addr + size;
             size_delta = mem->start_addr - old.start_addr;
             mem->memory_size = old.memory_size - size_delta;
             mem->phys_offset = old.phys_offset + size_delta;
             mem->flags = 0;
 
-            err = kvm_set_user_memory_region(s, mem);
+            err = kvm_set_user_memory_region(mem);
             if (err) {
                 fprintf(stderr, "%s: error registering suffix slot: %s\n",
                         __func__, strerror(-err));
@@ -557,13 +544,13 @@ static void kvm_set_phys_mem(target_phys_addr_t start_addr, ram_addr_t size,
     if (flags >= IO_MEM_UNASSIGNED) {
         return;
     }
-    mem = kvm_alloc_slot(s);
+    mem = kvm_alloc_slot();
     mem->memory_size = size;
     mem->start_addr = start_addr;
     mem->phys_offset = phys_offset;
     mem->flags = 0;
 
-    err = kvm_set_user_memory_region(s, mem);
+    err = kvm_set_user_memory_region(mem);
     if (err) {
         fprintf(stderr, "%s: error registering slot: %s\n", __func__,
                 strerror(-err));
@@ -602,27 +589,24 @@ int kvm_init(int smp_cpus)
     static const char upgrade_note[] =
         "Please upgrade to at least kernel 2.6.29 or recent kvm-kmod\n"
         "(see http://sourceforge.net/projects/kvm).\n";
-    KVMState *s;
     int ret;
     int i;
 
-    s = qemu_mallocz(sizeof(KVMState));
-
 #ifdef KVM_CAP_SET_GUEST_DEBUG
-    QTAILQ_INIT(&s->kvm_sw_breakpoints);
+    QTAILQ_INIT(&kvm_state.kvm_sw_breakpoints);
 #endif
-    for (i = 0; i < ARRAY_SIZE(s->slots); i++) {
-        s->slots[i].slot = i;
+    for (i = 0; i < ARRAY_SIZE(kvm_state.slots); i++) {
+        kvm_state.slots[i].slot = i;
     }
-    s->vmfd = -1;
-    s->fd = qemu_open("/dev/kvm", O_RDWR);
-    if (s->fd == -1) {
+    kvm_state.vmfd = -1;
+    kvm_state.fd = qemu_open("/dev/kvm", O_RDWR);
+    if (kvm_state.fd == -1) {
         fprintf(stderr, "Could not access KVM kernel module: %m\n");
         ret = -errno;
         goto err;
     }
 
-    ret = kvm_ioctl(s, KVM_GET_API_VERSION, 0);
+    ret = kvm_ioctl(KVM_GET_API_VERSION, 0);
     if (ret < KVM_API_VERSION) {
         if (ret > 0) {
             ret = -EINVAL;
@@ -637,8 +621,8 @@ int kvm_init(int smp_cpus)
         goto err;
     }
 
-    s->vmfd = kvm_ioctl(s, KVM_CREATE_VM, 0);
-    if (s->vmfd < 0) {
+    kvm_state.vmfd = kvm_ioctl(KVM_CREATE_VM, 0);
+    if (kvm_state.vmfd < 0) {
 #ifdef TARGET_S390X
         fprintf(stderr, "Please add the 'switch_amode' kernel parameter to "
                         "your host kernel command line\n");
@@ -651,7 +635,7 @@ int kvm_init(int smp_cpus)
      * just use a user allocated buffer so we can use regular pages
      * unmodified.  Make sure we have a sufficiently modern version of KVM.
      */
-    if (!kvm_check_extension(s, KVM_CAP_USER_MEMORY)) {
+    if (!kvm_check_extension(KVM_CAP_USER_MEMORY)) {
         ret = -EINVAL;
         fprintf(stderr, "kvm does not support KVM_CAP_USER_MEMORY\n%s",
                 upgrade_note);
@@ -661,7 +645,7 @@ int kvm_init(int smp_cpus)
     /* There was a nasty bug in < kvm-80 that prevents memory slots from being
      * destroyed properly.  Since we rely on this capability, refuse to work
      * with any kernel without this capability. */
-    if (!kvm_check_extension(s, KVM_CAP_DESTROY_MEMORY_REGION_WORKS)) {
+    if (!kvm_check_extension(KVM_CAP_DESTROY_MEMORY_REGION_WORKS)) {
         ret = -EINVAL;
 
         fprintf(stderr,
@@ -670,66 +654,55 @@ int kvm_init(int smp_cpus)
         goto err;
     }
 
-    s->coalesced_mmio = 0;
 #ifdef KVM_CAP_COALESCED_MMIO
-    s->coalesced_mmio = kvm_check_extension(s, KVM_CAP_COALESCED_MMIO);
-    s->coalesced_mmio_ring = NULL;
+    kvm_state.coalesced_mmio = kvm_check_extension(KVM_CAP_COALESCED_MMIO);
 #endif
 
-    s->broken_set_mem_region = 1;
+    kvm_state.broken_set_mem_region = 1;
 #ifdef KVM_CAP_JOIN_MEMORY_REGIONS_WORKS
-    ret = kvm_check_extension(s, KVM_CAP_JOIN_MEMORY_REGIONS_WORKS);
+    ret = kvm_check_extension(KVM_CAP_JOIN_MEMORY_REGIONS_WORKS);
     if (ret > 0) {
-        s->broken_set_mem_region = 0;
+        kvm_state.broken_set_mem_region = 0;
     }
 #endif
 
-    s->vcpu_events = 0;
 #ifdef KVM_CAP_VCPU_EVENTS
-    s->vcpu_events = kvm_check_extension(s, KVM_CAP_VCPU_EVENTS);
+    kvm_state.vcpu_events = kvm_check_extension(KVM_CAP_VCPU_EVENTS);
 #endif
 
-    s->robust_singlestep = 0;
 #ifdef KVM_CAP_X86_ROBUST_SINGLESTEP
-    s->robust_singlestep =
-        kvm_check_extension(s, KVM_CAP_X86_ROBUST_SINGLESTEP);
+    kvm_state.robust_singlestep =
+        kvm_check_extension(KVM_CAP_X86_ROBUST_SINGLESTEP);
 #endif
 
-    s->debugregs = 0;
 #ifdef KVM_CAP_DEBUGREGS
-    s->debugregs = kvm_check_extension(s, KVM_CAP_DEBUGREGS);
+    kvm_state.debugregs = kvm_check_extension(KVM_CAP_DEBUGREGS);
 #endif
 
-    s->xsave = 0;
 #ifdef KVM_CAP_XSAVE
-    s->xsave = kvm_check_extension(s, KVM_CAP_XSAVE);
+    kvm_state.xsave = kvm_check_extension(KVM_CAP_XSAVE);
 #endif
 
-    s->xcrs = 0;
 #ifdef KVM_CAP_XCRS
-    s->xcrs = kvm_check_extension(s, KVM_CAP_XCRS);
+    kvm_state.xcrs = kvm_check_extension(KVM_CAP_XCRS);
 #endif
 
-    ret = kvm_arch_init(s, smp_cpus);
+    ret = kvm_arch_init(smp_cpus);
     if (ret < 0) {
         goto err;
     }
 
-    kvm_state = s;
     cpu_register_phys_memory_client(&kvm_cpu_phys_memory_client);
 
     return 0;
 
 err:
-    if (s) {
-        if (s->vmfd != -1) {
-            close(s->vmfd);
-        }
-        if (s->fd != -1) {
-            close(s->fd);
-        }
+    if (kvm_state.vmfd != -1) {
+        close(kvm_state.vmfd);
+    }
+    if (kvm_state.fd != -1) {
+        close(kvm_state.fd);
     }
-    qemu_free(s);
 
     return ret;
 }
@@ -777,7 +750,7 @@ static int kvm_handle_io(uint16_t port, void *data, int direction, int size,
 static int kvm_handle_internal_error(CPUState *env, struct kvm_run *run)
 {
     fprintf(stderr, "KVM internal error.");
-    if (kvm_check_extension(kvm_state, KVM_CAP_INTERNAL_ERROR_DATA)) {
+    if (kvm_check_extension(KVM_CAP_INTERNAL_ERROR_DATA)) {
         int i;
 
         fprintf(stderr, " Suberror: %d\n", run->internal.suberror);
@@ -805,9 +778,8 @@ static int kvm_handle_internal_error(CPUState *env, struct kvm_run *run)
 void kvm_flush_coalesced_mmio_buffer(void)
 {
 #ifdef KVM_CAP_COALESCED_MMIO
-    KVMState *s = kvm_state;
-    if (s->coalesced_mmio_ring) {
-        struct kvm_coalesced_mmio_ring *ring = s->coalesced_mmio_ring;
+    if (kvm_state.coalesced_mmio_ring) {
+        struct kvm_coalesced_mmio_ring *ring = kvm_state.coalesced_mmio_ring;
         while (ring->first != ring->last) {
             struct kvm_coalesced_mmio *ent;
 
@@ -963,7 +935,7 @@ void kvm_cpu_exec(CPUState *env)
     }
 }
 
-int kvm_ioctl(KVMState *s, int type, ...)
+int kvm_ioctl(int type, ...)
 {
     int ret;
     void *arg;
@@ -973,14 +945,14 @@ int kvm_ioctl(KVMState *s, int type, ...)
     arg = va_arg(ap, void *);
     va_end(ap);
 
-    ret = ioctl(s->fd, type, arg);
+    ret = ioctl(kvm_state.fd, type, arg);
     if (ret == -1) {
         ret = -errno;
     }
     return ret;
 }
 
-int kvm_vm_ioctl(KVMState *s, int type, ...)
+int kvm_vm_ioctl(int type, ...)
 {
     int ret;
     void *arg;
@@ -990,7 +962,7 @@ int kvm_vm_ioctl(KVMState *s, int type, ...)
     arg = va_arg(ap, void *);
     va_end(ap);
 
-    ret = ioctl(s->vmfd, type, arg);
+    ret = ioctl(kvm_state.vmfd, type, arg);
     if (ret == -1) {
         ret = -errno;
     }
@@ -1017,9 +989,7 @@ int kvm_vcpu_ioctl(CPUState *env, int type, ...)
 int kvm_has_sync_mmu(void)
 {
 #ifdef KVM_CAP_SYNC_MMU
-    KVMState *s = kvm_state;
-
-    return kvm_check_extension(s, KVM_CAP_SYNC_MMU);
+    return kvm_check_extension(KVM_CAP_SYNC_MMU);
 #else
     return 0;
 #endif
@@ -1027,27 +997,27 @@ int kvm_has_sync_mmu(void)
 
 int kvm_has_vcpu_events(void)
 {
-    return kvm_state->vcpu_events;
+    return kvm_state.vcpu_events;
 }
 
 int kvm_has_robust_singlestep(void)
 {
-    return kvm_state->robust_singlestep;
+    return kvm_state.robust_singlestep;
 }
 
 int kvm_has_debugregs(void)
 {
-    return kvm_state->debugregs;
+    return kvm_state.debugregs;
 }
 
 int kvm_has_xsave(void)
 {
-    return kvm_state->xsave;
+    return kvm_state.xsave;
 }
 
 int kvm_has_xcrs(void)
 {
-    return kvm_state->xcrs;
+    return kvm_state.xcrs;
 }
 
 void kvm_setup_guest_memory(void *start, size_t size)
@@ -1070,7 +1040,7 @@ struct kvm_sw_breakpoint *kvm_find_sw_breakpoint(CPUState *env,
 {
     struct kvm_sw_breakpoint *bp;
 
-    QTAILQ_FOREACH(bp, &env->kvm_state->kvm_sw_breakpoints, entry) {
+    QTAILQ_FOREACH(bp, &kvm_state.kvm_sw_breakpoints, entry) {
         if (bp->pc == pc) {
             return bp;
         }
@@ -1080,7 +1050,7 @@ struct kvm_sw_breakpoint *kvm_find_sw_breakpoint(CPUState *env,
 
 int kvm_sw_breakpoints_active(CPUState *env)
 {
-    return !QTAILQ_EMPTY(&env->kvm_state->kvm_sw_breakpoints);
+    return !QTAILQ_EMPTY(&kvm_state.kvm_sw_breakpoints);
 }
 
 struct kvm_set_guest_debug_data {
@@ -1140,8 +1110,7 @@ int kvm_insert_breakpoint(CPUState *current_env, target_ulong addr,
             return err;
         }
 
-        QTAILQ_INSERT_HEAD(&current_env->kvm_state->kvm_sw_breakpoints,
-                          bp, entry);
+        QTAILQ_INSERT_HEAD(&kvm_state.kvm_sw_breakpoints, bp, entry);
     } else {
         err = kvm_arch_insert_hw_breakpoint(addr, len, type);
         if (err) {
@@ -1181,7 +1150,7 @@ int kvm_remove_breakpoint(CPUState *current_env, target_ulong addr,
             return err;
         }
 
-        QTAILQ_REMOVE(&current_env->kvm_state->kvm_sw_breakpoints, bp, entry);
+        QTAILQ_REMOVE(&kvm_state.kvm_sw_breakpoints, bp, entry);
         qemu_free(bp);
     } else {
         err = kvm_arch_remove_hw_breakpoint(addr, len, type);
@@ -1202,10 +1171,9 @@ int kvm_remove_breakpoint(CPUState *current_env, target_ulong addr,
 void kvm_remove_all_breakpoints(CPUState *current_env)
 {
     struct kvm_sw_breakpoint *bp, *next;
-    KVMState *s = current_env->kvm_state;
     CPUState *env;
 
-    QTAILQ_FOREACH_SAFE(bp, &s->kvm_sw_breakpoints, entry, next) {
+    QTAILQ_FOREACH_SAFE(bp, &kvm_state.kvm_sw_breakpoints, entry, next) {
         if (kvm_arch_remove_sw_breakpoint(current_env, bp) != 0) {
             /* Try harder to find a CPU that currently sees the breakpoint. */
             for (env = first_cpu; env != NULL; env = env->next_cpu) {
@@ -1285,7 +1253,7 @@ int kvm_set_ioeventfd_mmio_long(int fd, uint32_t addr, uint32_t val, bool assign
         iofd.flags |= KVM_IOEVENTFD_FLAG_DEASSIGN;
     }
 
-    ret = kvm_vm_ioctl(kvm_state, KVM_IOEVENTFD, &iofd);
+    ret = kvm_vm_ioctl(KVM_IOEVENTFD, &iofd);
 
     if (ret < 0) {
         return -errno;
@@ -1314,7 +1282,7 @@ int kvm_set_ioeventfd_pio_word(int fd, uint16_t addr, uint16_t val, bool assign)
     if (!assign) {
         kick.flags |= KVM_IOEVENTFD_FLAG_DEASSIGN;
     }
-    r = kvm_vm_ioctl(kvm_state, KVM_IOEVENTFD, &kick);
+    r = kvm_vm_ioctl(KVM_IOEVENTFD, &kick);
     if (r < 0) {
         return r;
     }
diff --git a/kvm-stub.c b/kvm-stub.c
index 352c6a6..3a058ad 100644
--- a/kvm-stub.c
+++ b/kvm-stub.c
@@ -53,7 +53,7 @@ int kvm_uncoalesce_mmio_region(target_phys_addr_t start, ram_addr_t size)
     return -ENOSYS;
 }
 
-int kvm_check_extension(KVMState *s, unsigned int extension)
+int kvm_check_extension(unsigned int extension)
 {
     return 0;
 }
diff --git a/kvm.h b/kvm.h
index 51ad56f..26ca8c1 100644
--- a/kvm.h
+++ b/kvm.h
@@ -74,12 +74,9 @@ int kvm_irqchip_in_kernel(void);
 
 /* internal API */
 
-struct KVMState;
-typedef struct KVMState KVMState;
+int kvm_ioctl(int type, ...);
 
-int kvm_ioctl(KVMState *s, int type, ...);
-
-int kvm_vm_ioctl(KVMState *s, int type, ...);
+int kvm_vm_ioctl(int type, ...);
 
 int kvm_vcpu_ioctl(CPUState *env, int type, ...);
 
@@ -104,7 +101,7 @@ int kvm_arch_get_registers(CPUState *env);
 
 int kvm_arch_put_registers(CPUState *env, int level);
 
-int kvm_arch_init(KVMState *s, int smp_cpus);
+int kvm_arch_init(int smp_cpus);
 
 int kvm_arch_init_vcpu(CPUState *env);
 
@@ -146,10 +143,8 @@ void kvm_arch_update_guest_debug(CPUState *env, struct kvm_guest_debug *dbg);
 
 bool kvm_arch_stop_on_emulation_error(CPUState *env);
 
-int kvm_check_extension(KVMState *s, unsigned int extension);
+int kvm_check_extension(unsigned int extension);
 
-uint32_t kvm_arch_get_supported_cpuid(CPUState *env, uint32_t function,
-                                      uint32_t index, int reg);
 void kvm_cpu_synchronize_state(CPUState *env);
 void kvm_cpu_synchronize_post_reset(CPUState *env);
 void kvm_cpu_synchronize_post_init(CPUState *env);
@@ -179,7 +174,7 @@ static inline void cpu_synchronize_post_init(CPUState *env)
 
 
 #if !defined(CONFIG_USER_ONLY)
-int kvm_physical_memory_addr_from_ram(KVMState *s, ram_addr_t ram_addr,
+int kvm_physical_memory_addr_from_ram(ram_addr_t ram_addr,
                                       target_phys_addr_t *phys_addr);
 #endif
 
diff --git a/target-i386/cpuid.c b/target-i386/cpuid.c
index 5382a28..17ab619 100644
--- a/target-i386/cpuid.c
+++ b/target-i386/cpuid.c
@@ -23,6 +23,7 @@
 
 #include "cpu.h"
 #include "kvm.h"
+#include "kvm_x86.h"
 
 #include "qemu-option.h"
 #include "qemu-config.h"
@@ -1138,10 +1139,10 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
             break;
         }
         if (kvm_enabled()) {
-            *eax = kvm_arch_get_supported_cpuid(env, 0xd, count, R_EAX);
-            *ebx = kvm_arch_get_supported_cpuid(env, 0xd, count, R_EBX);
-            *ecx = kvm_arch_get_supported_cpuid(env, 0xd, count, R_ECX);
-            *edx = kvm_arch_get_supported_cpuid(env, 0xd, count, R_EDX);
+            *eax = kvm_x86_get_supported_cpuid(0xd, count, R_EAX);
+            *ebx = kvm_x86_get_supported_cpuid(0xd, count, R_EBX);
+            *ecx = kvm_x86_get_supported_cpuid(0xd, count, R_ECX);
+            *edx = kvm_x86_get_supported_cpuid(0xd, count, R_EDX);
         } else {
             *eax = 0;
             *ebx = 0;
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 1789bff..cb6883f 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -60,7 +60,7 @@ static int lm_capable_kernel;
 
 #ifdef KVM_CAP_EXT_CPUID
 
-static struct kvm_cpuid2 *try_get_cpuid(KVMState *s, int max)
+static struct kvm_cpuid2 *try_get_cpuid(int max)
 {
     struct kvm_cpuid2 *cpuid;
     int r, size;
@@ -68,7 +68,7 @@ static struct kvm_cpuid2 *try_get_cpuid(KVMState *s, int max)
     size = sizeof(*cpuid) + max * sizeof(*cpuid->entries);
     cpuid = (struct kvm_cpuid2 *)qemu_mallocz(size);
     cpuid->nent = max;
-    r = kvm_ioctl(s, KVM_GET_SUPPORTED_CPUID, cpuid);
+    r = kvm_ioctl(KVM_GET_SUPPORTED_CPUID, cpuid);
     if (r == 0 && cpuid->nent >= max) {
         r = -E2BIG;
     }
@@ -85,20 +85,20 @@ static struct kvm_cpuid2 *try_get_cpuid(KVMState *s, int max)
     return cpuid;
 }
 
-uint32_t kvm_arch_get_supported_cpuid(CPUState *env, uint32_t function,
-                                      uint32_t index, int reg)
+uint32_t kvm_x86_get_supported_cpuid(uint32_t function, uint32_t index,
+                                     int reg)
 {
     struct kvm_cpuid2 *cpuid;
     int i, max;
     uint32_t ret = 0;
     uint32_t cpuid_1_edx;
 
-    if (!kvm_check_extension(env->kvm_state, KVM_CAP_EXT_CPUID)) {
+    if (!kvm_check_extension(KVM_CAP_EXT_CPUID)) {
         return -1U;
     }
 
     max = 1;
-    while ((cpuid = try_get_cpuid(env->kvm_state, max)) == NULL) {
+    while ((cpuid = try_get_cpuid(max)) == NULL) {
         max *= 2;
     }
 
@@ -126,7 +126,7 @@ uint32_t kvm_arch_get_supported_cpuid(CPUState *env, uint32_t function,
                     /* On Intel, kvm returns cpuid according to the Intel spec,
                      * so add missing bits according to the AMD spec:
                      */
-                    cpuid_1_edx = kvm_arch_get_supported_cpuid(env, 1, 0, R_EDX);
+                    cpuid_1_edx = kvm_x86_get_supported_cpuid(1, 0, R_EDX);
                     ret |= cpuid_1_edx & 0x183f7ff;
                     break;
                 }
@@ -142,8 +142,8 @@ uint32_t kvm_arch_get_supported_cpuid(CPUState *env, uint32_t function,
 
 #else
 
-uint32_t kvm_arch_get_supported_cpuid(CPUState *env, uint32_t function,
-                                      uint32_t index, int reg)
+uint32_t kvm_x86_get_supported_cpuid(uint32_t function, uint32_t index,
+                                     int reg)
 {
     return -1U;
 }
@@ -170,12 +170,12 @@ struct kvm_para_features {
     { -1, -1 }
 };
 
-static int get_para_features(CPUState *env)
+static int get_para_features(void)
 {
     int i, features = 0;
 
     for (i = 0; i < ARRAY_SIZE(para_features) - 1; i++) {
-        if (kvm_check_extension(env->kvm_state, para_features[i].cap)) {
+        if (kvm_check_extension(para_features[i].cap)) {
             features |= (1 << para_features[i].feature);
         }
     }
@@ -184,15 +184,14 @@ static int get_para_features(CPUState *env)
 #endif
 
 #ifdef KVM_CAP_MCE
-static int kvm_get_mce_cap_supported(KVMState *s, uint64_t *mce_cap,
-                                     int *max_banks)
+static int kvm_get_mce_cap_supported(uint64_t *mce_cap, int *max_banks)
 {
     int r;
 
-    r = kvm_check_extension(s, KVM_CAP_MCE);
+    r = kvm_check_extension(KVM_CAP_MCE);
     if (r > 0) {
         *max_banks = r;
-        return kvm_ioctl(s, KVM_X86_GET_MCE_CAP_SUPPORTED, mce_cap);
+        return kvm_ioctl(KVM_X86_GET_MCE_CAP_SUPPORTED, mce_cap);
     }
     return -ENOSYS;
 }
@@ -323,18 +322,18 @@ int kvm_arch_init_vcpu(CPUState *env)
     uint32_t signature[3];
 #endif
 
-    env->cpuid_features &= kvm_arch_get_supported_cpuid(env, 1, 0, R_EDX);
+    env->cpuid_features &= kvm_x86_get_supported_cpuid(1, 0, R_EDX);
 
     i = env->cpuid_ext_features & CPUID_EXT_HYPERVISOR;
-    env->cpuid_ext_features &= kvm_arch_get_supported_cpuid(env, 1, 0, R_ECX);
+    env->cpuid_ext_features &= kvm_x86_get_supported_cpuid(1, 0, R_ECX);
     env->cpuid_ext_features |= i;
 
-    env->cpuid_ext2_features &= kvm_arch_get_supported_cpuid(env, 0x80000001,
-                                                             0, R_EDX);
-    env->cpuid_ext3_features &= kvm_arch_get_supported_cpuid(env, 0x80000001,
-                                                             0, R_ECX);
-    env->cpuid_svm_features  &= kvm_arch_get_supported_cpuid(env, 0x8000000A,
-                                                             0, R_EDX);
+    env->cpuid_ext2_features &= kvm_x86_get_supported_cpuid(0x80000001,
+                                                            0, R_EDX);
+    env->cpuid_ext3_features &= kvm_x86_get_supported_cpuid(0x80000001,
+                                                            0, R_ECX);
+    env->cpuid_svm_features  &= kvm_x86_get_supported_cpuid(0x8000000A,
+                                                            0, R_EDX);
 
 
     cpuid_i = 0;
@@ -353,7 +352,7 @@ int kvm_arch_init_vcpu(CPUState *env)
     c = &cpuid_data.entries[cpuid_i++];
     memset(c, 0, sizeof(*c));
     c->function = KVM_CPUID_FEATURES;
-    c->eax = env->cpuid_kvm_features & get_para_features(env);
+    c->eax = env->cpuid_kvm_features & get_para_features();
 #endif
 
     cpu_x86_cpuid(env, 0, 0, &limit, &unused, &unused, &unused);
@@ -423,11 +422,11 @@ int kvm_arch_init_vcpu(CPUState *env)
 #ifdef KVM_CAP_MCE
     if (((env->cpuid_version >> 8)&0xF) >= 6
         && (env->cpuid_features&(CPUID_MCE|CPUID_MCA)) == (CPUID_MCE|CPUID_MCA)
-        && kvm_check_extension(env->kvm_state, KVM_CAP_MCE) > 0) {
+        && kvm_check_extension(KVM_CAP_MCE) > 0) {
         uint64_t mcg_cap;
         int banks;
 
-        if (kvm_get_mce_cap_supported(env->kvm_state, &mcg_cap, &banks)) {
+        if (kvm_get_mce_cap_supported(&mcg_cap, &banks)) {
             perror("kvm_get_mce_cap_supported FAILED");
         } else {
             if (banks > MCE_BANKS_DEF)
@@ -461,7 +460,7 @@ void kvm_arch_reset_vcpu(CPUState *env)
     }
 }
 
-static int kvm_get_supported_msrs(KVMState *s)
+static int kvm_get_supported_msrs(void)
 {
     static int kvm_supported_msrs;
     int ret = 0;
@@ -475,7 +474,7 @@ static int kvm_get_supported_msrs(KVMState *s)
         /* Obtain MSR list from KVM.  These are the MSRs that we must
          * save/restore */
         msr_list.nmsrs = 0;
-        ret = kvm_ioctl(s, KVM_GET_MSR_INDEX_LIST, &msr_list);
+        ret = kvm_ioctl(KVM_GET_MSR_INDEX_LIST, &msr_list);
         if (ret < 0 && ret != -E2BIG) {
             return ret;
         }
@@ -486,7 +485,7 @@ static int kvm_get_supported_msrs(KVMState *s)
                                               sizeof(msr_list.indices[0])));
 
         kvm_msr_list->nmsrs = msr_list.nmsrs;
-        ret = kvm_ioctl(s, KVM_GET_MSR_INDEX_LIST, kvm_msr_list);
+        ret = kvm_ioctl(KVM_GET_MSR_INDEX_LIST, kvm_msr_list);
         if (ret >= 0) {
             int i;
 
@@ -508,17 +507,17 @@ static int kvm_get_supported_msrs(KVMState *s)
     return ret;
 }
 
-static int kvm_init_identity_map_page(KVMState *s)
+static int kvm_init_identity_map_page(void)
 {
 #ifdef KVM_CAP_SET_IDENTITY_MAP_ADDR
     int ret;
     uint64_t addr = 0xfffbc000;
 
-    if (!kvm_check_extension(s, KVM_CAP_SET_IDENTITY_MAP_ADDR)) {
+    if (!kvm_check_extension(KVM_CAP_SET_IDENTITY_MAP_ADDR)) {
         return 0;
     }
 
-    ret = kvm_vm_ioctl(s, KVM_SET_IDENTITY_MAP_ADDR, &addr);
+    ret = kvm_vm_ioctl(KVM_SET_IDENTITY_MAP_ADDR, &addr);
     if (ret < 0) {
         fprintf(stderr, "kvm_set_identity_map_addr: %s\n", strerror(ret));
         return ret;
@@ -527,12 +526,12 @@ static int kvm_init_identity_map_page(KVMState *s)
     return 0;
 }
 
-int kvm_arch_init(KVMState *s, int smp_cpus)
+int kvm_arch_init(int smp_cpus)
 {
     int ret;
     struct utsname utsname;
 
-    ret = kvm_get_supported_msrs(s);
+    ret = kvm_get_supported_msrs();
     if (ret < 0) {
         return ret;
     }
@@ -546,7 +545,7 @@ int kvm_arch_init(KVMState *s, int smp_cpus)
      * versions of KVM just assumed that it would be at the end of physical
      * memory but that doesn't work with more than 4GB of memory.  We simply
      * refuse to work with those older versions of KVM. */
-    ret = kvm_check_extension(s, KVM_CAP_SET_TSS_ADDR);
+    ret = kvm_check_extension(KVM_CAP_SET_TSS_ADDR);
     if (ret <= 0) {
         fprintf(stderr, "kvm does not support KVM_CAP_SET_TSS_ADDR\n");
         return ret;
@@ -563,12 +562,12 @@ int kvm_arch_init(KVMState *s, int smp_cpus)
         perror("e820_add_entry() table is full");
         exit(1);
     }
-    ret = kvm_vm_ioctl(s, KVM_SET_TSS_ADDR, 0xfffbd000);
+    ret = kvm_vm_ioctl(KVM_SET_TSS_ADDR, 0xfffbd000);
     if (ret < 0) {
         return ret;
     }
 
-    return kvm_init_identity_map_page(s);
+    return kvm_init_identity_map_page();
 }
 
 static void set_v8086_seg(struct kvm_segment *lhs, const SegmentCache *rhs)
@@ -1861,7 +1860,7 @@ int kvm_on_sigbus_vcpu(CPUState *env, int code, void *addr)
             || code == BUS_MCEERR_AO)) {
         vaddr = (void *)addr;
         if (qemu_ram_addr_from_host(vaddr, &ram_addr) ||
-            !kvm_physical_memory_addr_from_ram(env->kvm_state, ram_addr, &paddr)) {
+            !kvm_physical_memory_addr_from_ram(ram_addr, &paddr)) {
             fprintf(stderr, "Hardware memory error for memory used by "
                     "QEMU itself instead of guest system!\n");
             /* Hope we are lucky for AO MCE */
@@ -1910,7 +1909,7 @@ int kvm_on_sigbus(int code, void *addr)
         /* Hope we are lucky for AO MCE */
         vaddr = addr;
         if (qemu_ram_addr_from_host(vaddr, &ram_addr) ||
-            !kvm_physical_memory_addr_from_ram(first_cpu->kvm_state, ram_addr, &paddr)) {
+            !kvm_physical_memory_addr_from_ram(ram_addr, &paddr)) {
             fprintf(stderr, "Hardware memory error for memory used by "
                     "QEMU itself instead of guest system!: %p\n", addr);
             return 0;
diff --git a/target-i386/kvm_x86.h b/target-i386/kvm_x86.h
index 9d7b584..304d0cb 100644
--- a/target-i386/kvm_x86.h
+++ b/target-i386/kvm_x86.h
@@ -22,4 +22,7 @@ void kvm_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
                         uint64_t mcg_status, uint64_t addr, uint64_t misc,
                         int flag);
 
+uint32_t kvm_x86_get_supported_cpuid(uint32_t function, uint32_t index,
+                                     int reg);
+
 #endif
diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
index 849b404..56d30cc 100644
--- a/target-ppc/kvm.c
+++ b/target-ppc/kvm.c
@@ -56,13 +56,13 @@ static void kvm_kick_env(void *env)
     qemu_cpu_kick(env);
 }
 
-int kvm_arch_init(KVMState *s, int smp_cpus)
+int kvm_arch_init(int smp_cpus)
 {
 #ifdef KVM_CAP_PPC_UNSET_IRQ
-    cap_interrupt_unset = kvm_check_extension(s, KVM_CAP_PPC_UNSET_IRQ);
+    cap_interrupt_unset = kvm_check_extension(KVM_CAP_PPC_UNSET_IRQ);
 #endif
 #ifdef KVM_CAP_PPC_IRQ_LEVEL
-    cap_interrupt_level = kvm_check_extension(s, KVM_CAP_PPC_IRQ_LEVEL);
+    cap_interrupt_level = kvm_check_extension(KVM_CAP_PPC_IRQ_LEVEL);
 #endif
 
     if (!cap_interrupt_level) {
@@ -164,7 +164,7 @@ int kvm_arch_get_registers(CPUState *env)
         env->gpr[i] = regs.gpr[i];
 
 #ifdef KVM_CAP_PPC_SEGSTATE
-    if (kvm_check_extension(env->kvm_state, KVM_CAP_PPC_SEGSTATE)) {
+    if (kvm_check_extension(KVM_CAP_PPC_SEGSTATE)) {
         env->sdr1 = sregs.u.s.sdr1;
 
         /* Sync SLB */
@@ -371,8 +371,8 @@ int kvmppc_get_hypercall(CPUState *env, uint8_t *buf, int buf_len)
 #ifdef KVM_CAP_PPC_GET_PVINFO
     struct kvm_ppc_pvinfo pvinfo;
 
-    if (kvm_check_extension(env->kvm_state, KVM_CAP_PPC_GET_PVINFO) &&
-        !kvm_vm_ioctl(env->kvm_state, KVM_PPC_GET_PVINFO, &pvinfo)) {
+    if (kvm_check_extension(KVM_CAP_PPC_GET_PVINFO) &&
+        !kvm_vm_ioctl(KVM_PPC_GET_PVINFO, &pvinfo)) {
         memcpy(buf, pvinfo.hcall, buf_len);
 
         return 0;
diff --git a/target-s390x/kvm.c b/target-s390x/kvm.c
index adf4a9e..927a37e 100644
--- a/target-s390x/kvm.c
+++ b/target-s390x/kvm.c
@@ -70,7 +70,7 @@
 #define SCLP_CMDW_READ_SCP_INFO         0x00020001
 #define SCLP_CMDW_READ_SCP_INFO_FORCED  0x00120001
 
-int kvm_arch_init(KVMState *s, int smp_cpus)
+int kvm_arch_init(int smp_cpus)
 {
     return 0;
 }
@@ -186,10 +186,6 @@ static void kvm_s390_interrupt_internal(CPUState *env, int type, uint32_t parm,
     struct kvm_s390_interrupt kvmint;
     int r;
 
-    if (!env->kvm_state) {
-        return;
-    }
-
     env->halted = 0;
     env->exception_index = -1;
 
@@ -198,7 +194,7 @@ static void kvm_s390_interrupt_internal(CPUState *env, int type, uint32_t parm,
     kvmint.parm64 = parm64;
 
     if (vm) {
-        r = kvm_vm_ioctl(env->kvm_state, KVM_S390_INTERRUPT, &kvmint);
+        r = kvm_vm_ioctl(KVM_S390_INTERRUPT, &kvmint);
     } else {
         r = kvm_vcpu_ioctl(env, KVM_S390_INTERRUPT, &kvmint);
     }
-- 
1.7.2.3

* [PATCH 27/35] kvm: x86: Fix !CONFIG_KVM_PARA build
From: Marcelo Tosatti @ 2011-01-06 17:56 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: qemu-devel, kvm, Jan Kiszka, Marcelo Tosatti

From: Jan Kiszka <jan.kiszka@siemens.com>

If we lack kvm_para.h, MSR_KVM_ASYNC_PF_EN is not defined. The change in
kvm_arch_init_vcpu is just for consistency reasons.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
---
 target-i386/kvm.c |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)
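
Note: a minimal sketch of the guard pattern applied here, not the QEMU code
itself. CONFIG_KVM_PARA and KVM_CAP_ASYNC_PF are the macro names used in
this patch; the DEMO_* constants, their values, and demo_build_msr_list()
are hypothetical and exist only for illustration. The sketch assumes the
capability macro is known while the header that would define the MSR
constant is absent, which is the build configuration this patch targets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a capability that <linux/kvm.h> already defines. */
#define KVM_CAP_ASYNC_PF 1

/*
 * CONFIG_KVM_PARA is deliberately left undefined, as if kvm_para.h were
 * missing. The constant below then does not exist, just like
 * MSR_KVM_ASYNC_PF_EN in that configuration.
 */
#ifdef CONFIG_KVM_PARA
#define DEMO_MSR_ASYNC_PF_EN 3u   /* arbitrary demo value */
#endif

#define DEMO_MSR_SYSTEM_TIME 1u   /* arbitrary demo value */
#define DEMO_MSR_WALL_CLOCK  2u   /* arbitrary demo value */

/* Fill idx[] with the MSR indices to transfer and return their count. */
static size_t demo_build_msr_list(uint32_t *idx)
{
    size_t n = 0;

    assert(idx != NULL);
    idx[n++] = DEMO_MSR_SYSTEM_TIME;
    idx[n++] = DEMO_MSR_WALL_CLOCK;
    /* Checking only KVM_CAP_ASYNC_PF here would not compile. */
#if defined(CONFIG_KVM_PARA) && defined(KVM_CAP_ASYNC_PF)
    idx[n++] = DEMO_MSR_ASYNC_PF_EN;
#endif
    return n;
}
```

Called with a three-element array, demo_build_msr_list() fills in two
entries in this configuration. Building with -DCONFIG_KVM_PARA adds the
third.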

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index cb6883f..69b8234 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -318,7 +318,7 @@ int kvm_arch_init_vcpu(CPUState *env)
     uint32_t limit, i, j, cpuid_i;
     uint32_t unused;
     struct kvm_cpuid_entry2 *c;
-#ifdef KVM_CPUID_SIGNATURE
+#ifdef CONFIG_KVM_PARA
     uint32_t signature[3];
 #endif
 
@@ -854,7 +854,7 @@ static int kvm_put_msrs(CPUState *env, int level)
         kvm_msr_entry_set(&msrs[n++], MSR_KVM_SYSTEM_TIME,
                           env->system_time_msr);
         kvm_msr_entry_set(&msrs[n++], MSR_KVM_WALL_CLOCK, env->wall_clock_msr);
-#ifdef KVM_CAP_ASYNC_PF
+#if defined(CONFIG_KVM_PARA) && defined(KVM_CAP_ASYNC_PF)
         kvm_msr_entry_set(&msrs[n++], MSR_KVM_ASYNC_PF_EN, env->async_pf_en_msr);
 #endif
     }
@@ -1086,7 +1086,7 @@ static int kvm_get_msrs(CPUState *env)
 #endif
     msrs[n++].index = MSR_KVM_SYSTEM_TIME;
     msrs[n++].index = MSR_KVM_WALL_CLOCK;
-#ifdef KVM_CAP_ASYNC_PF
+#if defined(CONFIG_KVM_PARA) && defined(KVM_CAP_ASYNC_PF)
     msrs[n++].index = MSR_KVM_ASYNC_PF_EN;
 #endif
 
@@ -1162,7 +1162,7 @@ static int kvm_get_msrs(CPUState *env)
             }
 #endif
             break;
-#ifdef KVM_CAP_ASYNC_PF
+#if defined(CONFIG_KVM_PARA) && defined(KVM_CAP_ASYNC_PF)
         case MSR_KVM_ASYNC_PF_EN:
             env->async_pf_en_msr = msrs[i].data;
             break;
-- 
1.7.2.3


* [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
From: Marcelo Tosatti @ 2011-01-06 17:56 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: qemu-devel, kvm, Jan Kiszka, Glauber Costa, Marcelo Tosatti

From: Jan Kiszka <jan.kiszka@siemens.com>

If kvmclock is used, which implies the kernel supports it, register a
kvmclock device with the sysbus. Its main purpose is to save and restore
the kernel state on migration, but it will also make it possible to
visualize that state one day.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
CC: Glauber Costa <glommer@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
---
 target-i386/kvm.c |   92 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 91 insertions(+), 1 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 69b8234..47cb22b 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -29,6 +29,7 @@
 #include "hw/apic.h"
 #include "ioport.h"
 #include "kvm_x86.h"
+#include "hw/sysbus.h"
 
 #ifdef CONFIG_KVM_PARA
 #include <linux/kvm_para.h>
@@ -309,6 +310,85 @@ void kvm_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
 #endif
 }
 
+#if defined(CONFIG_KVM_PARA) && defined(KVM_CAP_ADJUST_CLOCK)
+typedef struct KVMClockState {
+    SysBusDevice busdev;
+    uint64_t clock;
+    bool clock_valid;
+} KVMClockState;
+
+static void kvmclock_pre_save(void *opaque)
+{
+    KVMClockState *s = opaque;
+    struct kvm_clock_data data;
+    int ret;
+
+    if (s->clock_valid) {
+        return;
+    }
+    ret = kvm_vm_ioctl(KVM_GET_CLOCK, &data);
+    if (ret < 0) {
+        fprintf(stderr, "KVM_GET_CLOCK failed: %s\n", strerror(-ret));
+        data.clock = 0;
+    }
+    s->clock = data.clock;
+    /*
+     * If the VM is stopped, declare the clock state valid to avoid re-reading
+     * it on next vmsave (which would return a different value). Will be reset
+     * when the VM is continued.
+     */
+    s->clock_valid = !vm_running;
+}
+
+static int kvmclock_post_load(void *opaque, int version_id)
+{
+    KVMClockState *s = opaque;
+    struct kvm_clock_data data;
+
+    data.clock = s->clock;
+    data.flags = 0;
+    return kvm_vm_ioctl(KVM_SET_CLOCK, &data);
+}
+
+static void kvmclock_vm_state_change(void *opaque, int running, int reason)
+{
+    KVMClockState *s = opaque;
+
+    if (running) {
+        s->clock_valid = false;
+    }
+}
+
+static int kvmclock_init(SysBusDevice *dev)
+{
+    KVMClockState *s = FROM_SYSBUS(KVMClockState, dev);
+
+    qemu_add_vm_change_state_handler(kvmclock_vm_state_change, s);
+    return 0;
+}
+
+static const VMStateDescription kvmclock_vmsd = {
+    .name = "kvmclock",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
+    .pre_save = kvmclock_pre_save,
+    .post_load = kvmclock_post_load,
+    .fields = (VMStateField []) {
+        VMSTATE_UINT64(clock, KVMClockState),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+static SysBusDeviceInfo kvmclock_info = {
+    .qdev.name = "kvmclock",
+    .qdev.size = sizeof(KVMClockState),
+    .qdev.vmsd = &kvmclock_vmsd,
+    .qdev.no_user = 1,
+    .init = kvmclock_init,
+};
+#endif /* CONFIG_KVM_PARA && KVM_CAP_ADJUST_CLOCK */
+
 int kvm_arch_init_vcpu(CPUState *env)
 {
     struct {
@@ -335,7 +415,6 @@ int kvm_arch_init_vcpu(CPUState *env)
     env->cpuid_svm_features  &= kvm_x86_get_supported_cpuid(0x8000000A,
                                                             0, R_EDX);
 
-
     cpuid_i = 0;
 
 #ifdef CONFIG_KVM_PARA
@@ -442,6 +521,13 @@ int kvm_arch_init_vcpu(CPUState *env)
     }
 #endif
 
+#if defined(CONFIG_KVM_PARA) && defined(KVM_CAP_ADJUST_CLOCK)
+    if (cpu_is_bsp(env) &&
+        (env->cpuid_kvm_features & (1ULL << KVM_FEATURE_CLOCKSOURCE))) {
+        sysbus_create_simple("kvmclock", -1, NULL);
+    }
+#endif
+
     return kvm_vcpu_ioctl(env, KVM_SET_CPUID2, &cpuid_data);
 }
 
@@ -531,6 +617,10 @@ int kvm_arch_init(int smp_cpus)
     int ret;
     struct utsname utsname;
 
+#if defined(CONFIG_KVM_PARA) && defined(KVM_CAP_ADJUST_CLOCK)
+    sysbus_register_withprop(&kvmclock_info);
+#endif
+
     ret = kvm_get_supported_msrs();
     if (ret < 0) {
         return ret;
-- 
1.7.2.3


* [PATCH 29/35] kvm: Drop smp_cpus argument from init functions
From: Marcelo Tosatti @ 2011-01-06 17:56 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: qemu-devel, kvm, Jan Kiszka, Marcelo Tosatti

From: Jan Kiszka <jan.kiszka@siemens.com>

The smp_cpus argument of kvm_init() and kvm_arch_init() is no longer used,
so drop it.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
---
 kvm-all.c          |    4 ++--
 kvm-stub.c         |    2 +-
 kvm.h              |    4 ++--
 target-i386/kvm.c  |    2 +-
 target-ppc/kvm.c   |    2 +-
 target-s390x/kvm.c |    2 +-
 vl.c               |    2 +-
 7 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index d8820c7..190fcdf 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -584,7 +584,7 @@ static CPUPhysMemoryClient kvm_cpu_phys_memory_client = {
     .migration_log = kvm_client_migration_log,
 };
 
-int kvm_init(int smp_cpus)
+int kvm_init(void)
 {
     static const char upgrade_note[] =
         "Please upgrade to at least kernel 2.6.29 or recent kvm-kmod\n"
@@ -687,7 +687,7 @@ int kvm_init(int smp_cpus)
     kvm_state.xcrs = kvm_check_extension(KVM_CAP_XCRS);
 #endif
 
-    ret = kvm_arch_init(smp_cpus);
+    ret = kvm_arch_init();
     if (ret < 0) {
         goto err;
     }
diff --git a/kvm-stub.c b/kvm-stub.c
index 3a058ad..e00d7df 100644
--- a/kvm-stub.c
+++ b/kvm-stub.c
@@ -58,7 +58,7 @@ int kvm_check_extension(unsigned int extension)
     return 0;
 }
 
-int kvm_init(int smp_cpus)
+int kvm_init(void)
 {
     return -ENOSYS;
 }
diff --git a/kvm.h b/kvm.h
index 26ca8c1..31d9f21 100644
--- a/kvm.h
+++ b/kvm.h
@@ -34,7 +34,7 @@ struct kvm_run;
 
 /* external API */
 
-int kvm_init(int smp_cpus);
+int kvm_init(void);
 
 int kvm_has_sync_mmu(void);
 int kvm_has_vcpu_events(void);
@@ -101,7 +101,7 @@ int kvm_arch_get_registers(CPUState *env);
 
 int kvm_arch_put_registers(CPUState *env, int level);
 
-int kvm_arch_init(int smp_cpus);
+int kvm_arch_init(void);
 
 int kvm_arch_init_vcpu(CPUState *env);
 
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 47cb22b..a907578 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -612,7 +612,7 @@ static int kvm_init_identity_map_page(void)
     return 0;
 }
 
-int kvm_arch_init(int smp_cpus)
+int kvm_arch_init(void)
 {
     int ret;
     struct utsname utsname;
diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
index 56d30cc..72f2f94 100644
--- a/target-ppc/kvm.c
+++ b/target-ppc/kvm.c
@@ -56,7 +56,7 @@ static void kvm_kick_env(void *env)
     qemu_cpu_kick(env);
 }
 
-int kvm_arch_init(int smp_cpus)
+int kvm_arch_init(void)
 {
 #ifdef KVM_CAP_PPC_UNSET_IRQ
     cap_interrupt_unset = kvm_check_extension(KVM_CAP_PPC_UNSET_IRQ);
diff --git a/target-s390x/kvm.c b/target-s390x/kvm.c
index 927a37e..4f9075c 100644
--- a/target-s390x/kvm.c
+++ b/target-s390x/kvm.c
@@ -70,7 +70,7 @@
 #define SCLP_CMDW_READ_SCP_INFO         0x00020001
 #define SCLP_CMDW_READ_SCP_INFO_FORCED  0x00120001
 
-int kvm_arch_init(int smp_cpus)
+int kvm_arch_init(void)
 {
     return 0;
 }
diff --git a/vl.c b/vl.c
index b0b6605..fd47f4c 100644
--- a/vl.c
+++ b/vl.c
@@ -2837,7 +2837,7 @@ int main(int argc, char **argv, char **envp)
     }
 
     if (kvm_allowed) {
-        int ret = kvm_init(smp_cpus);
+        int ret = kvm_init();
         if (ret < 0) {
             if (!kvm_available()) {
                 printf("KVM not supported for this target\n");
-- 
1.7.2.3


* [PATCH 30/35] kvm: Consolidate must-have capability checks
From: Marcelo Tosatti @ 2011-01-06 17:56 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: qemu-devel, kvm, Jan Kiszka, Marcelo Tosatti

From: Jan Kiszka <jan.kiszka@siemens.com>

Instead of splattering the code with #ifdefs and runtime checks for
capabilities we cannot work without anyway, provide central test
infrastructure for verifying their availability both at build and
runtime.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
---
 configure          |   39 ++++++++++++++++++++++-----------
 kvm-all.c          |   61 ++++++++++++++++++++++-----------------------------
 kvm.h              |   10 ++++++++
 target-i386/kvm.c  |   39 ++++++--------------------------
 target-ppc/kvm.c   |    4 +++
 target-s390x/kvm.c |    4 +++
 6 files changed, 78 insertions(+), 79 deletions(-)
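
Note: a minimal sketch of the table-driven check this patch introduces,
assuming a fake probe function in place of kvm_check_extension(). The
DEMO_* names and capability numbers are hypothetical, and the name string
is built with the preprocessor's # operator here rather than a stringify()
helper. As in kvm_init(), the generic list is checked first and the
arch-specific list only if nothing is missing there.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct DemoCapInfo {
    const char *name;
    int value;
} DemoCapInfo;

/* Arbitrary demo numbers, not the values from <linux/kvm.h>. */
enum {
    DEMO_CAP_USER_MEMORY = 1,
    DEMO_CAP_SET_TSS_ADDR = 2,
    DEMO_CAP_MP_STATE = 3
};

#define DEMO_CAP_INFO(CAP) { "DEMO_CAP_" #CAP, DEMO_CAP_##CAP }
#define DEMO_CAP_LAST_INFO { NULL, 0 }

/* Hypothetical probe: pretend the host lacks DEMO_CAP_MP_STATE. */
static int demo_check_extension(int cap)
{
    return cap != DEMO_CAP_MP_STATE;
}

/* Return the first missing entry, or NULL if every capability is there. */
static const DemoCapInfo *demo_check_extension_list(const DemoCapInfo *list)
{
    assert(list != NULL);
    while (list->name) {
        if (!demo_check_extension(list->value)) {
            return list;
        }
        list++;
    }
    return NULL;
}

static const DemoCapInfo demo_generic_caps[] = {
    DEMO_CAP_INFO(USER_MEMORY),
    DEMO_CAP_LAST_INFO
};

static const DemoCapInfo demo_arch_caps[] = {
    DEMO_CAP_INFO(SET_TSS_ADDR),
    DEMO_CAP_INFO(MP_STATE),
    DEMO_CAP_LAST_INFO
};

/* Generic list first, then the arch list. */
static const DemoCapInfo *demo_first_missing(void)
{
    const DemoCapInfo *missing = demo_check_extension_list(demo_generic_caps);

    if (!missing) {
        missing = demo_check_extension_list(demo_arch_caps);
    }
    return missing;
}
```

Because each entry carries its own name, the caller can print a single
generic "does not support <name>" message for whichever entry
demo_first_missing() returns, instead of one hand-written message per
capability.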

diff --git a/configure b/configure
index ec37a91..e6ee5c3 100755
--- a/configure
+++ b/configure
@@ -1665,18 +1665,31 @@ if test "$kvm" != "no" ; then
 #if !defined(KVM_API_VERSION) || KVM_API_VERSION < 12 || KVM_API_VERSION > 12
 #error Invalid KVM version
 #endif
-#if !defined(KVM_CAP_USER_MEMORY)
-#error Missing KVM capability KVM_CAP_USER_MEMORY
-#endif
-#if !defined(KVM_CAP_SET_TSS_ADDR)
-#error Missing KVM capability KVM_CAP_SET_TSS_ADDR
-#endif
-#if !defined(KVM_CAP_DESTROY_MEMORY_REGION_WORKS)
-#error Missing KVM capability KVM_CAP_DESTROY_MEMORY_REGION_WORKS
-#endif
-#if !defined(KVM_CAP_USER_NMI)
-#error Missing KVM capability KVM_CAP_USER_NMI
+EOF
+    must_have_caps="KVM_CAP_USER_MEMORY \
+                    KVM_CAP_DESTROY_MEMORY_REGION_WORKS \
+                    KVM_CAP_COALESCED_MMIO \
+                    KVM_CAP_SYNC_MMU \
+                   "
+    if test \( "$cpu" = "i386" -o "$cpu" = "x86_64" \) ; then
+      must_have_caps="$must_have_caps \
+                      KVM_CAP_SET_TSS_ADDR \
+                      KVM_CAP_EXT_CPUID \
+                      KVM_CAP_CLOCKSOURCE \
+                      KVM_CAP_NOP_IO_DELAY \
+                      KVM_CAP_PV_MMU \
+                      KVM_CAP_MP_STATE \
+                      KVM_CAP_USER_NMI \
+                     "
+    fi
+    for c in $must_have_caps ; do
+      cat >> $TMPC <<EOF
+#if !defined($c)
+#error Missing KVM capability $c
 #endif
+EOF
+    done
+    cat >> $TMPC <<EOF
 int main(void) { return 0; }
 EOF
   if test "$kerneldir" != "" ; then
@@ -1711,8 +1724,8 @@ EOF
 	| awk -F "error: " '{if (NR>1) printf(", "); printf("%s",$2);}'`
         if test "$kvmerr" != "" ; then
           echo -e "${kvmerr}\n\
-      NOTE: To enable KVM support, update your kernel to 2.6.29+ or install \
-  recent kvm-kmod from http://sourceforge.net/projects/kvm."
+NOTE: To enable KVM support, update your kernel to 2.6.29+ or install \
+recent kvm-kmod from http://sourceforge.net/projects/kvm."
         fi
       fi
       feature_not_found "kvm"
diff --git a/kvm-all.c b/kvm-all.c
index 190fcdf..7a5b299 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -57,9 +57,7 @@ static struct KVMState {
     int fd;
     int vmfd;
     int coalesced_mmio;
-#ifdef KVM_CAP_COALESCED_MMIO
     struct kvm_coalesced_mmio_ring *coalesced_mmio_ring;
-#endif
     int broken_set_mem_region;
     int migration_log;
     int vcpu_events;
@@ -73,6 +71,12 @@ static struct KVMState {
     int xsave, xcrs;
 } kvm_state;
 
+static const KVMCapabilityInfo kvm_required_capabilites[] = {
+    KVM_CAP_INFO(USER_MEMORY),
+    KVM_CAP_INFO(DESTROY_MEMORY_REGION_WORKS),
+    KVM_CAP_LAST_INFO
+};
+
 static KVMSlot *kvm_alloc_slot(void)
 {
     int i;
@@ -214,12 +218,10 @@ int kvm_init_vcpu(CPUState *env)
         goto err;
     }
 
-#ifdef KVM_CAP_COALESCED_MMIO
     if (kvm_state.coalesced_mmio && !kvm_state.coalesced_mmio_ring) {
         kvm_state.coalesced_mmio_ring =
             (void *)env->kvm_run + kvm_state.coalesced_mmio * PAGE_SIZE;
     }
-#endif
 
     ret = kvm_arch_init_vcpu(env);
     if (ret == 0) {
@@ -386,7 +388,6 @@ int kvm_coalesce_mmio_region(target_phys_addr_t start, ram_addr_t size)
 {
     int ret = -ENOSYS;
 
-#ifdef KVM_CAP_COALESCED_MMIO
     if (kvm_state.coalesced_mmio) {
         struct kvm_coalesced_mmio_zone zone;
 
@@ -395,7 +396,6 @@ int kvm_coalesce_mmio_region(target_phys_addr_t start, ram_addr_t size)
 
         ret = kvm_vm_ioctl(KVM_REGISTER_COALESCED_MMIO, &zone);
     }
-#endif
 
     return ret;
 }
@@ -404,7 +404,6 @@ int kvm_uncoalesce_mmio_region(target_phys_addr_t start, ram_addr_t size)
 {
     int ret = -ENOSYS;
 
-#ifdef KVM_CAP_COALESCED_MMIO
     if (kvm_state.coalesced_mmio) {
         struct kvm_coalesced_mmio_zone zone;
 
@@ -413,7 +412,6 @@ int kvm_uncoalesce_mmio_region(target_phys_addr_t start, ram_addr_t size)
 
         ret = kvm_vm_ioctl(KVM_UNREGISTER_COALESCED_MMIO, &zone);
     }
-#endif
 
     return ret;
 }
@@ -430,6 +428,18 @@ int kvm_check_extension(unsigned int extension)
     return ret;
 }
 
+static const KVMCapabilityInfo *
+kvm_check_extension_list(const KVMCapabilityInfo *list)
+{
+    while (list->name) {
+        if (!kvm_check_extension(list->value)) {
+            return list;
+        }
+        list++;
+    }
+    return NULL;
+}
+
 static void kvm_set_phys_mem(target_phys_addr_t start_addr, ram_addr_t size,
                              ram_addr_t phys_offset)
 {
@@ -589,6 +599,7 @@ int kvm_init(void)
     static const char upgrade_note[] =
         "Please upgrade to at least kernel 2.6.29 or recent kvm-kmod\n"
         "(see http://sourceforge.net/projects/kvm).\n";
+    const KVMCapabilityInfo *missing_cap;
     int ret;
     int i;
 
@@ -630,33 +641,19 @@ int kvm_init(void)
         goto err;
     }
 
-    /* initially, KVM allocated its own memory and we had to jump through
-     * hooks to make phys_ram_base point to this.  Modern versions of KVM
-     * just use a user allocated buffer so we can use regular pages
-     * unmodified.  Make sure we have a sufficiently modern version of KVM.
-     */
-    if (!kvm_check_extension(KVM_CAP_USER_MEMORY)) {
-        ret = -EINVAL;
-        fprintf(stderr, "kvm does not support KVM_CAP_USER_MEMORY\n%s",
-                upgrade_note);
-        goto err;
+    missing_cap = kvm_check_extension_list(kvm_required_capabilites);
+    if (!missing_cap) {
+        missing_cap =
+            kvm_check_extension_list(kvm_arch_required_capabilities);
     }
-
-    /* There was a nasty bug in < kvm-80 that prevents memory slots from being
-     * destroyed properly.  Since we rely on this capability, refuse to work
-     * with any kernel without this capability. */
-    if (!kvm_check_extension(KVM_CAP_DESTROY_MEMORY_REGION_WORKS)) {
+    if (missing_cap) {
         ret = -EINVAL;
-
-        fprintf(stderr,
-                "KVM kernel module broken (DESTROY_MEMORY_REGION).\n%s",
-                upgrade_note);
+        fprintf(stderr, "kvm does not support %s\n%s",
+                missing_cap->name, upgrade_note);
         goto err;
     }
 
-#ifdef KVM_CAP_COALESCED_MMIO
     kvm_state.coalesced_mmio = kvm_check_extension(KVM_CAP_COALESCED_MMIO);
-#endif
 
     kvm_state.broken_set_mem_region = 1;
 #ifdef KVM_CAP_JOIN_MEMORY_REGIONS_WORKS
@@ -777,7 +774,6 @@ static int kvm_handle_internal_error(CPUState *env, struct kvm_run *run)
 
 void kvm_flush_coalesced_mmio_buffer(void)
 {
-#ifdef KVM_CAP_COALESCED_MMIO
     if (kvm_state.coalesced_mmio_ring) {
         struct kvm_coalesced_mmio_ring *ring = kvm_state.coalesced_mmio_ring;
         while (ring->first != ring->last) {
@@ -790,7 +786,6 @@ void kvm_flush_coalesced_mmio_buffer(void)
             ring->first = (ring->first + 1) % KVM_COALESCED_MMIO_MAX;
         }
     }
-#endif
 }
 
 static void do_kvm_cpu_synchronize_state(void *_env)
@@ -988,11 +983,7 @@ int kvm_vcpu_ioctl(CPUState *env, int type, ...)
 
 int kvm_has_sync_mmu(void)
 {
-#ifdef KVM_CAP_SYNC_MMU
     return kvm_check_extension(KVM_CAP_SYNC_MMU);
-#else
-    return 0;
-#endif
 }
 
 int kvm_has_vcpu_events(void)
diff --git a/kvm.h b/kvm.h
index 31d9f21..153d7b9 100644
--- a/kvm.h
+++ b/kvm.h
@@ -32,6 +32,14 @@ extern int kvm_allowed;
 
 struct kvm_run;
 
+typedef struct KVMCapabilityInfo {
+    const char *name;
+    int value;
+} KVMCapabilityInfo;
+
+#define KVM_CAP_INFO(CAP) { "KVM_CAP_" stringify(CAP), KVM_CAP_##CAP }
+#define KVM_CAP_LAST_INFO { NULL, 0 }
+
 /* external API */
 
 int kvm_init(void);
@@ -82,6 +90,8 @@ int kvm_vcpu_ioctl(CPUState *env, int type, ...);
 
 /* Arch specific hooks */
 
+extern const KVMCapabilityInfo kvm_arch_required_capabilities[];
+
 int kvm_arch_post_run(CPUState *env, struct kvm_run *run);
 
 int kvm_arch_handle_exit(CPUState *env, struct kvm_run *run);
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index a907578..58122d9 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -55,12 +55,17 @@
 #define BUS_MCEERR_AO 5
 #endif
 
+const KVMCapabilityInfo kvm_arch_required_capabilities[] = {
+    KVM_CAP_INFO(SET_TSS_ADDR),
+    KVM_CAP_INFO(EXT_CPUID),
+    KVM_CAP_INFO(MP_STATE),
+    KVM_CAP_LAST_INFO
+};
+
 static bool has_msr_star;
 static bool has_msr_hsave_pa;
 static int lm_capable_kernel;
 
-#ifdef KVM_CAP_EXT_CPUID
-
 static struct kvm_cpuid2 *try_get_cpuid(int max)
 {
     struct kvm_cpuid2 *cpuid;
@@ -94,10 +99,6 @@ uint32_t kvm_x86_get_supported_cpuid(uint32_t function, uint32_t index,
     uint32_t ret = 0;
     uint32_t cpuid_1_edx;
 
-    if (!kvm_check_extension(KVM_CAP_EXT_CPUID)) {
-        return -1U;
-    }
-
     max = 1;
     while ((cpuid = try_get_cpuid(max)) == NULL) {
         max *= 2;
@@ -141,30 +142,14 @@ uint32_t kvm_x86_get_supported_cpuid(uint32_t function, uint32_t index,
     return ret;
 }
 
-#else
-
-uint32_t kvm_x86_get_supported_cpuid(uint32_t function, uint32_t index,
-                                     int reg)
-{
-    return -1U;
-}
-
-#endif
-
 #ifdef CONFIG_KVM_PARA
 struct kvm_para_features {
     int cap;
     int feature;
 } para_features[] = {
-#ifdef KVM_CAP_CLOCKSOURCE
     { KVM_CAP_CLOCKSOURCE, KVM_FEATURE_CLOCKSOURCE },
-#endif
-#ifdef KVM_CAP_NOP_IO_DELAY
     { KVM_CAP_NOP_IO_DELAY, KVM_FEATURE_NOP_IO_DELAY },
-#endif
-#ifdef KVM_CAP_PV_MMU
     { KVM_CAP_PV_MMU, KVM_FEATURE_MMU_OP },
-#endif
 #ifdef KVM_CAP_ASYNC_PF
     { KVM_CAP_ASYNC_PF, KVM_FEATURE_ASYNC_PF },
 #endif
@@ -631,15 +616,7 @@ int kvm_arch_init(void)
 
     /* create vm86 tss.  KVM uses vm86 mode to emulate 16-bit code
      * directly.  In order to use vm86 mode, a TSS is needed.  Since this
-     * must be part of guest physical memory, we need to allocate it.  Older
-     * versions of KVM just assumed that it would be at the end of physical
-     * memory but that doesn't work with more than 4GB of memory.  We simply
-     * refuse to work with those older versions of KVM. */
-    ret = kvm_check_extension(KVM_CAP_SET_TSS_ADDR);
-    if (ret <= 0) {
-        fprintf(stderr, "kvm does not support KVM_CAP_SET_TSS_ADDR\n");
-        return ret;
-    }
+     * must be part of guest physical memory, we need to allocate it. */
 
     /* this address is 3 pages before the bios, and the bios should present
      * as unavaible memory.  FIXME, need to ensure the e820 map deals with
diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
index 72f2f94..7918426 100644
--- a/target-ppc/kvm.c
+++ b/target-ppc/kvm.c
@@ -37,6 +37,10 @@
     do { } while (0)
 #endif
 
+const KVMCapabilityInfo kvm_arch_required_capabilities[] = {
+    KVM_CAP_LAST_INFO
+};
+
 static int cap_interrupt_unset = false;
 static int cap_interrupt_level = false;
 
diff --git a/target-s390x/kvm.c b/target-s390x/kvm.c
index 4f9075c..29fcd46 100644
--- a/target-s390x/kvm.c
+++ b/target-s390x/kvm.c
@@ -70,6 +70,10 @@
 #define SCLP_CMDW_READ_SCP_INFO         0x00020001
 #define SCLP_CMDW_READ_SCP_INFO_FORCED  0x00120001
 
+const KVMCapabilityInfo kvm_arch_required_capabilities[] = {
+    KVM_CAP_LAST_INFO
+};
+
 int kvm_arch_init(void)
 {
     return 0;
-- 
1.7.2.3


^ permalink raw reply related	[flat|nested] 300+ messages in thread

* [Qemu-devel] [PATCH 30/35] kvm: Consolidate must-have capability checks
@ 2011-01-06 17:56   ` Marcelo Tosatti
  0 siblings, 0 replies; 300+ messages in thread
From: Marcelo Tosatti @ 2011-01-06 17:56 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Jan Kiszka, Marcelo Tosatti, qemu-devel, kvm

From: Jan Kiszka <jan.kiszka@siemens.com>

Instead of splattering the code with #ifdefs and runtime checks for
capabilities we cannot work without anyway, provide central test
infrastructure for verifying their availability both at build and
runtime.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
---
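A minimal sketch of the table-driven check described above, assuming a
stand-in probe function instead of the real KVM_CHECK_EXTENSION ioctl. The
names CapInfo, fake_check_extension and first_missing_cap, and the capability
numbers, are hypothetical and only mirror the shape of the code in this patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical capability numbers, for illustration only. */
enum { CAP_USER_MEMORY = 3, CAP_SET_TSS_ADDR = 4, CAP_MP_STATE = 14 };

typedef struct CapInfo {
    const char *name;   /* NULL marks the end of a list */
    int value;
} CapInfo;

/* Build a { "CAP_FOO", CAP_FOO } entry from the short name FOO. */
#define CAP_INFO(CAP) { "CAP_" #CAP, CAP_##CAP }
#define CAP_LAST_INFO { NULL, 0 }

static const CapInfo generic_caps[] = {
    CAP_INFO(USER_MEMORY),
    CAP_LAST_INFO
};

static const CapInfo arch_caps[] = {
    CAP_INFO(SET_TSS_ADDR),
    CAP_INFO(MP_STATE),
    CAP_LAST_INFO
};

/* Stand-in for the kernel probe: bit N set means capability N exists. */
static unsigned long fake_supported_mask;

static int fake_check_extension(int cap)
{
    return (int)((fake_supported_mask >> cap) & 1UL);
}

/* Return the first unsupported entry, or NULL if the whole list passes. */
static const CapInfo *check_extension_list(const CapInfo *list)
{
    assert(list != NULL);
    while (list->name) {
        if (!fake_check_extension(list->value)) {
            return list;
        }
        list++;
    }
    return NULL;
}

/* Generic list first, then the arch list, in the order kvm_init() uses. */
static const CapInfo *first_missing_cap(void)
{
    const CapInfo *missing = check_extension_list(generic_caps);

    if (!missing) {
        missing = check_extension_list(arch_caps);
    }
    return missing;
}
```

Because the function returns the failing entry rather than a boolean, the
caller can print missing->name in a single error path, which is what lets the
per-capability fprintf() calls go away.
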
 configure          |   39 ++++++++++++++++++++++-----------
 kvm-all.c          |   61 ++++++++++++++++++++++-----------------------------
 kvm.h              |   10 ++++++++
 target-i386/kvm.c  |   39 ++++++--------------------------
 target-ppc/kvm.c   |    4 +++
 target-s390x/kvm.c |    4 +++
 6 files changed, 78 insertions(+), 79 deletions(-)

diff --git a/configure b/configure
index ec37a91..e6ee5c3 100755
--- a/configure
+++ b/configure
@@ -1665,18 +1665,31 @@ if test "$kvm" != "no" ; then
 #if !defined(KVM_API_VERSION) || KVM_API_VERSION < 12 || KVM_API_VERSION > 12
 #error Invalid KVM version
 #endif
-#if !defined(KVM_CAP_USER_MEMORY)
-#error Missing KVM capability KVM_CAP_USER_MEMORY
-#endif
-#if !defined(KVM_CAP_SET_TSS_ADDR)
-#error Missing KVM capability KVM_CAP_SET_TSS_ADDR
-#endif
-#if !defined(KVM_CAP_DESTROY_MEMORY_REGION_WORKS)
-#error Missing KVM capability KVM_CAP_DESTROY_MEMORY_REGION_WORKS
-#endif
-#if !defined(KVM_CAP_USER_NMI)
-#error Missing KVM capability KVM_CAP_USER_NMI
+EOF
+    must_have_caps="KVM_CAP_USER_MEMORY \
+                    KVM_CAP_DESTROY_MEMORY_REGION_WORKS \
+                    KVM_CAP_COALESCED_MMIO \
+                    KVM_CAP_SYNC_MMU \
+                   "
+    if test \( "$cpu" = "i386" -o "$cpu" = "x86_64" \) ; then
+      must_have_caps="$must_have_caps \
+                      KVM_CAP_SET_TSS_ADDR \
+                      KVM_CAP_EXT_CPUID \
+                      KVM_CAP_CLOCKSOURCE \
+                      KVM_CAP_NOP_IO_DELAY \
+                      KVM_CAP_PV_MMU \
+                      KVM_CAP_MP_STATE \
+                      KVM_CAP_USER_NMI \
+                     "
+    fi
+    for c in $must_have_caps ; do
+      cat >> $TMPC <<EOF
+#if !defined($c)
+#error Missing KVM capability $c
 #endif
+EOF
+    done
+    cat >> $TMPC <<EOF
 int main(void) { return 0; }
 EOF
   if test "$kerneldir" != "" ; then
@@ -1711,8 +1724,8 @@ EOF
 	| awk -F "error: " '{if (NR>1) printf(", "); printf("%s",$2);}'`
         if test "$kvmerr" != "" ; then
           echo -e "${kvmerr}\n\
-      NOTE: To enable KVM support, update your kernel to 2.6.29+ or install \
-  recent kvm-kmod from http://sourceforge.net/projects/kvm."
+NOTE: To enable KVM support, update your kernel to 2.6.29+ or install \
+recent kvm-kmod from http://sourceforge.net/projects/kvm."
         fi
       fi
       feature_not_found "kvm"
diff --git a/kvm-all.c b/kvm-all.c
index 190fcdf..7a5b299 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -57,9 +57,7 @@ static struct KVMState {
     int fd;
     int vmfd;
     int coalesced_mmio;
-#ifdef KVM_CAP_COALESCED_MMIO
     struct kvm_coalesced_mmio_ring *coalesced_mmio_ring;
-#endif
     int broken_set_mem_region;
     int migration_log;
     int vcpu_events;
@@ -73,6 +71,12 @@ static struct KVMState {
     int xsave, xcrs;
 } kvm_state;
 
+static const KVMCapabilityInfo kvm_required_capabilites[] = {
+    KVM_CAP_INFO(USER_MEMORY),
+    KVM_CAP_INFO(DESTROY_MEMORY_REGION_WORKS),
+    KVM_CAP_LAST_INFO
+};
+
 static KVMSlot *kvm_alloc_slot(void)
 {
     int i;
@@ -214,12 +218,10 @@ int kvm_init_vcpu(CPUState *env)
         goto err;
     }
 
-#ifdef KVM_CAP_COALESCED_MMIO
     if (kvm_state.coalesced_mmio && !kvm_state.coalesced_mmio_ring) {
         kvm_state.coalesced_mmio_ring =
             (void *)env->kvm_run + kvm_state.coalesced_mmio * PAGE_SIZE;
     }
-#endif
 
     ret = kvm_arch_init_vcpu(env);
     if (ret == 0) {
@@ -386,7 +388,6 @@ int kvm_coalesce_mmio_region(target_phys_addr_t start, ram_addr_t size)
 {
     int ret = -ENOSYS;
 
-#ifdef KVM_CAP_COALESCED_MMIO
     if (kvm_state.coalesced_mmio) {
         struct kvm_coalesced_mmio_zone zone;
 
@@ -395,7 +396,6 @@ int kvm_coalesce_mmio_region(target_phys_addr_t start, ram_addr_t size)
 
         ret = kvm_vm_ioctl(KVM_REGISTER_COALESCED_MMIO, &zone);
     }
-#endif
 
     return ret;
 }
@@ -404,7 +404,6 @@ int kvm_uncoalesce_mmio_region(target_phys_addr_t start, ram_addr_t size)
 {
     int ret = -ENOSYS;
 
-#ifdef KVM_CAP_COALESCED_MMIO
     if (kvm_state.coalesced_mmio) {
         struct kvm_coalesced_mmio_zone zone;
 
@@ -413,7 +412,6 @@ int kvm_uncoalesce_mmio_region(target_phys_addr_t start, ram_addr_t size)
 
         ret = kvm_vm_ioctl(KVM_UNREGISTER_COALESCED_MMIO, &zone);
     }
-#endif
 
     return ret;
 }
@@ -430,6 +428,18 @@ int kvm_check_extension(unsigned int extension)
     return ret;
 }
 
+static const KVMCapabilityInfo *
+kvm_check_extension_list(const KVMCapabilityInfo *list)
+{
+    while (list->name) {
+        if (!kvm_check_extension(list->value)) {
+            return list;
+        }
+        list++;
+    }
+    return NULL;
+}
+
 static void kvm_set_phys_mem(target_phys_addr_t start_addr, ram_addr_t size,
                              ram_addr_t phys_offset)
 {
@@ -589,6 +599,7 @@ int kvm_init(void)
     static const char upgrade_note[] =
         "Please upgrade to at least kernel 2.6.29 or recent kvm-kmod\n"
         "(see http://sourceforge.net/projects/kvm).\n";
+    const KVMCapabilityInfo *missing_cap;
     int ret;
     int i;
 
@@ -630,33 +641,19 @@ int kvm_init(void)
         goto err;
     }
 
-    /* initially, KVM allocated its own memory and we had to jump through
-     * hooks to make phys_ram_base point to this.  Modern versions of KVM
-     * just use a user allocated buffer so we can use regular pages
-     * unmodified.  Make sure we have a sufficiently modern version of KVM.
-     */
-    if (!kvm_check_extension(KVM_CAP_USER_MEMORY)) {
-        ret = -EINVAL;
-        fprintf(stderr, "kvm does not support KVM_CAP_USER_MEMORY\n%s",
-                upgrade_note);
-        goto err;
+    missing_cap = kvm_check_extension_list(kvm_required_capabilites);
+    if (!missing_cap) {
+        missing_cap =
+            kvm_check_extension_list(kvm_arch_required_capabilities);
     }
-
-    /* There was a nasty bug in < kvm-80 that prevents memory slots from being
-     * destroyed properly.  Since we rely on this capability, refuse to work
-     * with any kernel without this capability. */
-    if (!kvm_check_extension(KVM_CAP_DESTROY_MEMORY_REGION_WORKS)) {
+    if (missing_cap) {
         ret = -EINVAL;
-
-        fprintf(stderr,
-                "KVM kernel module broken (DESTROY_MEMORY_REGION).\n%s",
-                upgrade_note);
+        fprintf(stderr, "kvm does not support %s\n%s",
+                missing_cap->name, upgrade_note);
         goto err;
     }
 
-#ifdef KVM_CAP_COALESCED_MMIO
     kvm_state.coalesced_mmio = kvm_check_extension(KVM_CAP_COALESCED_MMIO);
-#endif
 
     kvm_state.broken_set_mem_region = 1;
 #ifdef KVM_CAP_JOIN_MEMORY_REGIONS_WORKS
@@ -777,7 +774,6 @@ static int kvm_handle_internal_error(CPUState *env, struct kvm_run *run)
 
 void kvm_flush_coalesced_mmio_buffer(void)
 {
-#ifdef KVM_CAP_COALESCED_MMIO
     if (kvm_state.coalesced_mmio_ring) {
         struct kvm_coalesced_mmio_ring *ring = kvm_state.coalesced_mmio_ring;
         while (ring->first != ring->last) {
@@ -790,7 +786,6 @@ void kvm_flush_coalesced_mmio_buffer(void)
             ring->first = (ring->first + 1) % KVM_COALESCED_MMIO_MAX;
         }
     }
-#endif
 }
 
 static void do_kvm_cpu_synchronize_state(void *_env)
@@ -988,11 +983,7 @@ int kvm_vcpu_ioctl(CPUState *env, int type, ...)
 
 int kvm_has_sync_mmu(void)
 {
-#ifdef KVM_CAP_SYNC_MMU
     return kvm_check_extension(KVM_CAP_SYNC_MMU);
-#else
-    return 0;
-#endif
 }
 
 int kvm_has_vcpu_events(void)
diff --git a/kvm.h b/kvm.h
index 31d9f21..153d7b9 100644
--- a/kvm.h
+++ b/kvm.h
@@ -32,6 +32,14 @@ extern int kvm_allowed;
 
 struct kvm_run;
 
+typedef struct KVMCapabilityInfo {
+    const char *name;
+    int value;
+} KVMCapabilityInfo;
+
+#define KVM_CAP_INFO(CAP) { "KVM_CAP_" stringify(CAP), KVM_CAP_##CAP }
+#define KVM_CAP_LAST_INFO { NULL, 0 }
+
 /* external API */
 
 int kvm_init(void);
@@ -82,6 +90,8 @@ int kvm_vcpu_ioctl(CPUState *env, int type, ...);
 
 /* Arch specific hooks */
 
+extern const KVMCapabilityInfo kvm_arch_required_capabilities[];
+
 int kvm_arch_post_run(CPUState *env, struct kvm_run *run);
 
 int kvm_arch_handle_exit(CPUState *env, struct kvm_run *run);
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index a907578..58122d9 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -55,12 +55,17 @@
 #define BUS_MCEERR_AO 5
 #endif
 
+const KVMCapabilityInfo kvm_arch_required_capabilities[] = {
+    KVM_CAP_INFO(SET_TSS_ADDR),
+    KVM_CAP_INFO(EXT_CPUID),
+    KVM_CAP_INFO(MP_STATE),
+    KVM_CAP_LAST_INFO
+};
+
 static bool has_msr_star;
 static bool has_msr_hsave_pa;
 static int lm_capable_kernel;
 
-#ifdef KVM_CAP_EXT_CPUID
-
 static struct kvm_cpuid2 *try_get_cpuid(int max)
 {
     struct kvm_cpuid2 *cpuid;
@@ -94,10 +99,6 @@ uint32_t kvm_x86_get_supported_cpuid(uint32_t function, uint32_t index,
     uint32_t ret = 0;
     uint32_t cpuid_1_edx;
 
-    if (!kvm_check_extension(KVM_CAP_EXT_CPUID)) {
-        return -1U;
-    }
-
     max = 1;
     while ((cpuid = try_get_cpuid(max)) == NULL) {
         max *= 2;
@@ -141,30 +142,14 @@ uint32_t kvm_x86_get_supported_cpuid(uint32_t function, uint32_t index,
     return ret;
 }
 
-#else
-
-uint32_t kvm_x86_get_supported_cpuid(uint32_t function, uint32_t index,
-                                     int reg)
-{
-    return -1U;
-}
-
-#endif
-
 #ifdef CONFIG_KVM_PARA
 struct kvm_para_features {
     int cap;
     int feature;
 } para_features[] = {
-#ifdef KVM_CAP_CLOCKSOURCE
     { KVM_CAP_CLOCKSOURCE, KVM_FEATURE_CLOCKSOURCE },
-#endif
-#ifdef KVM_CAP_NOP_IO_DELAY
     { KVM_CAP_NOP_IO_DELAY, KVM_FEATURE_NOP_IO_DELAY },
-#endif
-#ifdef KVM_CAP_PV_MMU
     { KVM_CAP_PV_MMU, KVM_FEATURE_MMU_OP },
-#endif
 #ifdef KVM_CAP_ASYNC_PF
     { KVM_CAP_ASYNC_PF, KVM_FEATURE_ASYNC_PF },
 #endif
@@ -631,15 +616,7 @@ int kvm_arch_init(void)
 
     /* create vm86 tss.  KVM uses vm86 mode to emulate 16-bit code
      * directly.  In order to use vm86 mode, a TSS is needed.  Since this
-     * must be part of guest physical memory, we need to allocate it.  Older
-     * versions of KVM just assumed that it would be at the end of physical
-     * memory but that doesn't work with more than 4GB of memory.  We simply
-     * refuse to work with those older versions of KVM. */
-    ret = kvm_check_extension(KVM_CAP_SET_TSS_ADDR);
-    if (ret <= 0) {
-        fprintf(stderr, "kvm does not support KVM_CAP_SET_TSS_ADDR\n");
-        return ret;
-    }
+     * must be part of guest physical memory, we need to allocate it. */
 
     /* this address is 3 pages before the bios, and the bios should present
      * as unavaible memory.  FIXME, need to ensure the e820 map deals with
diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
index 72f2f94..7918426 100644
--- a/target-ppc/kvm.c
+++ b/target-ppc/kvm.c
@@ -37,6 +37,10 @@
     do { } while (0)
 #endif
 
+const KVMCapabilityInfo kvm_arch_required_capabilities[] = {
+    KVM_CAP_LAST_INFO
+};
+
 static int cap_interrupt_unset = false;
 static int cap_interrupt_level = false;
 
diff --git a/target-s390x/kvm.c b/target-s390x/kvm.c
index 4f9075c..29fcd46 100644
--- a/target-s390x/kvm.c
+++ b/target-s390x/kvm.c
@@ -70,6 +70,10 @@
 #define SCLP_CMDW_READ_SCP_INFO         0x00020001
 #define SCLP_CMDW_READ_SCP_INFO_FORCED  0x00120001
 
+const KVMCapabilityInfo kvm_arch_required_capabilities[] = {
+    KVM_CAP_LAST_INFO
+};
+
 int kvm_arch_init(void)
 {
     return 0;
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 300+ messages in thread

* [PATCH 31/35] kvm: x86: Rework identity map and TSS setup for larger BIOS sizes
  2011-01-06 17:56 ` [Qemu-devel] " Marcelo Tosatti
@ 2011-01-06 17:56   ` Marcelo Tosatti
  -1 siblings, 0 replies; 300+ messages in thread
From: Marcelo Tosatti @ 2011-01-06 17:56 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: qemu-devel, kvm, Jan Kiszka, Marcelo Tosatti

From: Jan Kiszka <jan.kiszka@siemens.com>

In order to support loading BIOSes > 256K, reorder the code, adjusting
the base if the kernel supports moving the identity map.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
---
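The address arithmetic behind the 256K and 16M limits can be sketched as
follows. This assumes the BIOS image is mapped so that it ends at the 4G
boundary and that the reserved range is the 4 pages (0x4000 bytes) passed to
e820_add_entry() in this patch. The Layout type and layout_for() helper are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SZ        0x1000ULL
#define RESERVED_PAGES 4ULL              /* identity map page plus TSS area */
#define FOUR_GB        0x100000000ULL    /* assumed end of the BIOS mapping */

typedef struct Layout {
    uint64_t identity_base;  /* start of the reserved e820 range */
    uint64_t tss_base;       /* one page after the identity map */
    uint64_t reserved_end;   /* first byte after the reserved range */
    uint64_t max_bios_size;  /* room left between reserved_end and 4G */
} Layout;

/* Compute the layout for a kernel with or without a movable identity map. */
static Layout layout_for(int can_move_identity_map)
{
    Layout l;

    l.identity_base = can_move_identity_map ? 0xfeffc000ULL : 0xfffbc000ULL;
    assert((l.identity_base % PAGE_SZ) == 0);

    l.tss_base = l.identity_base + PAGE_SZ;
    l.reserved_end = l.identity_base + RESERVED_PAGES * PAGE_SZ;
    l.max_bios_size = FOUR_GB - l.reserved_end;
    return l;
}
```

With the default base the reserved range ends at 0xfffc0000, leaving 0x40000
bytes (256K) below 4G. With the moved base it ends at 0xff000000, leaving
0x1000000 bytes (16M).
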
 target-i386/kvm.c |   63 +++++++++++++++++++++++++---------------------------
 1 files changed, 30 insertions(+), 33 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 58122d9..50d8ec8 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -578,27 +578,9 @@ static int kvm_get_supported_msrs(void)
     return ret;
 }
 
-static int kvm_init_identity_map_page(void)
-{
-#ifdef KVM_CAP_SET_IDENTITY_MAP_ADDR
-    int ret;
-    uint64_t addr = 0xfffbc000;
-
-    if (!kvm_check_extension(KVM_CAP_SET_IDENTITY_MAP_ADDR)) {
-        return 0;
-    }
-
-    ret = kvm_vm_ioctl(KVM_SET_IDENTITY_MAP_ADDR, &addr);
-    if (ret < 0) {
-        fprintf(stderr, "kvm_set_identity_map_addr: %s\n", strerror(ret));
-        return ret;
-    }
-#endif
-    return 0;
-}
-
 int kvm_arch_init(void)
 {
+    uint64_t identity_base = 0xfffbc000;
     int ret;
     struct utsname utsname;
 
@@ -614,27 +596,42 @@ int kvm_arch_init(void)
     uname(&utsname);
     lm_capable_kernel = strcmp(utsname.machine, "x86_64") == 0;
 
-    /* create vm86 tss.  KVM uses vm86 mode to emulate 16-bit code
-     * directly.  In order to use vm86 mode, a TSS is needed.  Since this
-     * must be part of guest physical memory, we need to allocate it. */
-
-    /* this address is 3 pages before the bios, and the bios should present
-     * as unavaible memory.  FIXME, need to ensure the e820 map deals with
-     * this?
-     */
     /*
-     * Tell fw_cfg to notify the BIOS to reserve the range.
+     * On older Intel CPUs, KVM uses vm86 mode to emulate 16-bit code directly.
+     * In order to use vm86 mode, an EPT identity map and a TSS are needed.
+     * Since these must be part of guest physical memory, we need to allocate
+     * them, both by setting their start addresses in the kernel and by
+     * creating a corresponding e820 entry. We need 4 pages before the BIOS.
+     *
+     * Older KVM versions may not support setting the identity map base. In
+     * that case we need to stick with the default, i.e. a 256K maximum BIOS
+     * size.
      */
-    if (e820_add_entry(0xfffbc000, 0x4000, E820_RESERVED) < 0) {
-        perror("e820_add_entry() table is full");
-        exit(1);
+#ifdef KVM_CAP_SET_IDENTITY_MAP_ADDR
+    if (kvm_check_extension(KVM_CAP_SET_IDENTITY_MAP_ADDR)) {
+        /* Allows up to 16M BIOSes. */
+        identity_base = 0xfeffc000;
+
+        ret = kvm_vm_ioctl(KVM_SET_IDENTITY_MAP_ADDR, &identity_base);
+        if (ret < 0) {
+            return ret;
+        }
     }
-    ret = kvm_vm_ioctl(KVM_SET_TSS_ADDR, 0xfffbd000);
+#endif
+    /* Set TSS base one page after EPT identity map. */
+    ret = kvm_vm_ioctl(KVM_SET_TSS_ADDR, identity_base + 0x1000);
+    if (ret < 0) {
+        return ret;
+    }
+
+    /* Tell fw_cfg to notify the BIOS to reserve the range. */
+    ret = e820_add_entry(identity_base, 0x4000, E820_RESERVED);
     if (ret < 0) {
+        fprintf(stderr, "e820_add_entry() table is full\n");
         return ret;
     }
 
-    return kvm_init_identity_map_page();
+    return 0;
 }
 
 static void set_v8086_seg(struct kvm_segment *lhs, const SegmentCache *rhs)
-- 
1.7.2.3


^ permalink raw reply related	[flat|nested] 300+ messages in thread

* [PATCH 32/35] kvm: Flush coalesced mmio buffer on IO window exits
  2011-01-06 17:56 ` [Qemu-devel] " Marcelo Tosatti
@ 2011-01-06 17:56   ` Marcelo Tosatti
  -1 siblings, 0 replies; 300+ messages in thread
From: Marcelo Tosatti @ 2011-01-06 17:56 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: qemu-devel, kvm, Jan Kiszka, Marcelo Tosatti

From: Jan Kiszka <jan.kiszka@siemens.com>

We must flush pending mmio writes if we leave kvm_cpu_exec for an IO
window. Otherwise we risk losing those requests when migrating to a
different host during that window.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
---
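A minimal sketch of why the flush has to come before the early exit, using a
small stand-alone ring buffer. The Ring type, RING_MAX, apply_write() and
run_once() are hypothetical; the real ring is struct kvm_coalesced_mmio_ring
and its size is KVM_COALESCED_MMIO_MAX.

```c
#include <assert.h>
#include <stdint.h>

#define RING_MAX 8   /* hypothetical size, for illustration only */

typedef struct MmioWrite {
    uint64_t addr;
    uint32_t data;
} MmioWrite;

typedef struct Ring {
    unsigned first;            /* consumer index */
    unsigned last;             /* producer index */
    MmioWrite slot[RING_MAX];
} Ring;

static unsigned applied;       /* writes delivered to the device model */
static uint64_t last_addr;     /* address of the most recent delivery */

static void apply_write(const MmioWrite *w)
{
    applied++;
    last_addr = w->addr;
}

/* Queue one write; returns -1 if the ring is full. */
static int ring_push(Ring *r, uint64_t addr, uint32_t data)
{
    unsigned next = (r->last + 1) % RING_MAX;

    if (next == r->first) {
        return -1;
    }
    r->slot[r->last] = (MmioWrite){ addr, data };
    r->last = next;
    return 0;
}

/* Drain every queued write, advancing the consumer index modulo the size. */
static void ring_flush(Ring *r)
{
    assert(r->first < RING_MAX && r->last < RING_MAX);
    while (r->first != r->last) {
        apply_write(&r->slot[r->first]);
        r->first = (r->first + 1) % RING_MAX;
    }
}

/* One loop pass; ret < 0 models the -EINTR/-EAGAIN "io window exit". */
static int run_once(Ring *r, int ret)
{
    ring_flush(r);      /* flush first so the early return cannot skip it */
    if (ret < 0) {
        return 1;       /* left the loop for an IO window */
    }
    return 0;
}
```

If ring_flush() were called only after the ret < 0 check, the queued writes
would still sit in the ring when the loop exits for an IO window.
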
 kvm-all.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index 7a5b299..a5e9246 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -851,6 +851,8 @@ void kvm_cpu_exec(CPUState *env)
         cpu_single_env = env;
         kvm_arch_post_run(env, run);
 
+        kvm_flush_coalesced_mmio_buffer();
+
         if (ret == -EINTR || ret == -EAGAIN) {
             cpu_exit(env);
             DPRINTF("io window exit\n");
@@ -863,8 +865,6 @@ void kvm_cpu_exec(CPUState *env)
             abort();
         }
 
-        kvm_flush_coalesced_mmio_buffer();
-
         ret = 0; /* exit loop */
         switch (run->exit_reason) {
         case KVM_EXIT_IO:
-- 
1.7.2.3


^ permalink raw reply related	[flat|nested] 300+ messages in thread

* [PATCH 33/35] kvm: Do not use qemu_fair_mutex
  2011-01-06 17:56 ` [Qemu-devel] " Marcelo Tosatti
@ 2011-01-06 17:56   ` Marcelo Tosatti
  -1 siblings, 0 replies; 300+ messages in thread
From: Marcelo Tosatti @ 2011-01-06 17:56 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: qemu-devel, kvm, Jan Kiszka, Marcelo Tosatti

From: Jan Kiszka <jan.kiszka@siemens.com>

The imbalance in the hold time of qemu_global_mutex only exists in TCG
mode. In contrast to TCG VCPUs, KVM drops the global lock during guest
execution. We already avoid touching the fairness lock from the
IO-thread in KVM mode, so also stop using it from the VCPU threads.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
---
 cpus.c |    2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/cpus.c b/cpus.c
index 0309189..4c9928e 100644
--- a/cpus.c
+++ b/cpus.c
@@ -735,9 +735,7 @@ static sigset_t block_io_signals(void)
 void qemu_mutex_lock_iothread(void)
 {
     if (kvm_enabled()) {
-        qemu_mutex_lock(&qemu_fair_mutex);
         qemu_mutex_lock(&qemu_global_mutex);
-        qemu_mutex_unlock(&qemu_fair_mutex);
     } else {
         qemu_mutex_lock(&qemu_fair_mutex);
         if (qemu_mutex_trylock(&qemu_global_mutex)) {
-- 
1.7.2.3


^ permalink raw reply related	[flat|nested] 300+ messages in thread

* [PATCH 34/35] kvm: x86: Implicitly clear nmi_injected/pending on reset
  2011-01-06 17:56 ` [Qemu-devel] " Marcelo Tosatti
@ 2011-01-06 17:56   ` Marcelo Tosatti
  -1 siblings, 0 replies; 300+ messages in thread
From: Marcelo Tosatti @ 2011-01-06 17:56 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: qemu-devel, kvm, Jan Kiszka, Marcelo Tosatti

From: Jan Kiszka <jan.kiszka@siemens.com>

All CPUX86State variables before CPU_COMMON are automatically cleared on
reset. Reorder nmi_injected and nmi_pending to avoid having to touch
them explicitly.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
---
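A minimal sketch of the "clear everything before a marker field" pattern the
commit message relies on, assuming reset is implemented as a memset() up to an
offsetof() boundary. The VcpuState type and its reset_boundary field are
hypothetical stand-ins for CPUX86State and the start of CPU_COMMON.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct VcpuState {
    /* Cleared on reset: every field placed before reset_boundary. */
    uint32_t smbase;
    uint8_t nmi_injected;
    uint8_t nmi_pending;

    /* Hypothetical marker standing in for the first CPU_COMMON field. */
    int reset_boundary;

    /* Preserved across reset. */
    uint32_t cpuid_features;
    uint8_t has_error_code;
} VcpuState;

/* Zero the leading part of the structure, leaving the tail untouched. */
static void vcpu_reset(VcpuState *s)
{
    assert(s != NULL);
    memset(s, 0, offsetof(VcpuState, reset_boundary));
}
```

Moving a field above the marker is then enough to have it cleared, so no
explicit assignment is needed in the reset hook.
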
 target-i386/cpu.h |    6 ++++--
 target-i386/kvm.c |    2 --
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index a457423..af701a4 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -699,6 +699,10 @@ typedef struct CPUX86State {
     uint32_t smbase;
     int old_exception;  /* exception in flight */
 
+    /* KVM states, automatically cleared on reset */
+    uint8_t nmi_injected;
+    uint8_t nmi_pending;
+
     CPU_COMMON
 
     /* processor features (e.g. for CPUID insn) */
@@ -726,8 +730,6 @@ typedef struct CPUX86State {
     int32_t exception_injected;
     int32_t interrupt_injected;
     uint8_t soft_interrupt;
-    uint8_t nmi_injected;
-    uint8_t nmi_pending;
     uint8_t has_error_code;
     uint32_t sipi_vector;
     uint32_t cpuid_kvm_features;
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 50d8ec8..79a1da8 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -520,8 +520,6 @@ void kvm_arch_reset_vcpu(CPUState *env)
 {
     env->exception_injected = -1;
     env->interrupt_injected = -1;
-    env->nmi_injected = 0;
-    env->nmi_pending = 0;
     env->xcr0 = 1;
     if (kvm_irqchip_in_kernel()) {
         env->mp_state = cpu_is_bsp(env) ? KVM_MP_STATE_RUNNABLE :
-- 
1.7.2.3


^ permalink raw reply related	[flat|nested] 300+ messages in thread

* [Qemu-devel] [PATCH 34/35] kvm: x86: Implicitly clear nmi_injected/pending on reset
@ 2011-01-06 17:56   ` Marcelo Tosatti
  0 siblings, 0 replies; 300+ messages in thread
From: Marcelo Tosatti @ 2011-01-06 17:56 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Jan Kiszka, Marcelo Tosatti, qemu-devel, kvm

From: Jan Kiszka <jan.kiszka@siemens.com>

All CPUX86State variables before CPU_COMMON are automatically cleared on
reset. Reorder nmi_injected and nmi_pending to avoid having to touch
them explicitly.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
---
 target-i386/cpu.h |    6 ++++--
 target-i386/kvm.c |    2 --
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index a457423..af701a4 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -699,6 +699,10 @@ typedef struct CPUX86State {
     uint32_t smbase;
     int old_exception;  /* exception in flight */
 
+    /* KVM states, automatically cleared on reset */
+    uint8_t nmi_injected;
+    uint8_t nmi_pending;
+
     CPU_COMMON
 
     /* processor features (e.g. for CPUID insn) */
@@ -726,8 +730,6 @@ typedef struct CPUX86State {
     int32_t exception_injected;
     int32_t interrupt_injected;
     uint8_t soft_interrupt;
-    uint8_t nmi_injected;
-    uint8_t nmi_pending;
     uint8_t has_error_code;
     uint32_t sipi_vector;
     uint32_t cpuid_kvm_features;
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 50d8ec8..79a1da8 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -520,8 +520,6 @@ void kvm_arch_reset_vcpu(CPUState *env)
 {
     env->exception_injected = -1;
     env->interrupt_injected = -1;
-    env->nmi_injected = 0;
-    env->nmi_pending = 0;
     env->xcr0 = 1;
     if (kvm_irqchip_in_kernel()) {
         env->mp_state = cpu_is_bsp(env) ? KVM_MP_STATE_RUNNABLE :
-- 
1.7.2.3


* [PATCH 35/35] kvm: x86: Only read/write MSR_KVM_ASYNC_PF_EN if supported
  2011-01-06 17:56 ` [Qemu-devel] " Marcelo Tosatti
@ 2011-01-06 17:56   ` Marcelo Tosatti
  -1 siblings, 0 replies; 300+ messages in thread
From: Marcelo Tosatti @ 2011-01-06 17:56 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: qemu-devel, kvm, Jan Kiszka, Marcelo Tosatti

From: Jan Kiszka <jan.kiszka@siemens.com>

If the kernel does not support KVM_CAP_ASYNC_PF, it also does not know
about the related MSR. So skip it during state synchronization in that
case. Fixes annoying kernel warnings.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
---
 target-i386/kvm.c |   12 ++++++++++--
 1 files changed, 10 insertions(+), 2 deletions(-)
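
The gating pattern can be sketched in isolation as follows. This is an illustration under stated assumptions, not the code from this patch: the DEMO_* constants are placeholders and demo_build_msr_list is a hypothetical helper that only mimics how the MSR index list is assembled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder values; the sketch does not depend on the real numbers. */
#define DEMO_MSR_WALL_CLOCK   0x11u
#define DEMO_MSR_SYSTEM_TIME  0x12u
#define DEMO_MSR_ASYNC_PF_EN  0x4b564d02u
#define DEMO_FEATURE_ASYNC_PF 4

/*
 * Fill 'out' with the MSR indices to synchronize and return how many
 * were written. The async-PF MSR is only listed when the feature bit
 * reported by the kernel is set.
 */
size_t demo_build_msr_list(uint32_t para_features, uint32_t *out, size_t cap)
{
    bool has_async_pf = (para_features & (1u << DEMO_FEATURE_ASYNC_PF)) != 0;
    size_t n = 0;

    if (n < cap) {
        out[n++] = DEMO_MSR_SYSTEM_TIME;
    }
    if (n < cap) {
        out[n++] = DEMO_MSR_WALL_CLOCK;
    }
    if (has_async_pf && n < cap) {
        out[n++] = DEMO_MSR_ASYNC_PF_EN;
    }
    return n;
}
```

Probing the feature once and caching the result in a flag keeps the get and put paths symmetric, so neither of them touches an MSR the kernel does not know.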

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 79a1da8..af79526 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -64,6 +64,9 @@ const KVMCapabilityInfo kvm_arch_required_capabilities[] = {
 
 static bool has_msr_star;
 static bool has_msr_hsave_pa;
+#if defined(CONFIG_KVM_PARA) && defined(KVM_CAP_ASYNC_PF)
+static bool has_msr_async_pf_en;
+#endif
 static int lm_capable_kernel;
 
 static struct kvm_cpuid2 *try_get_cpuid(int max)
@@ -165,6 +168,7 @@ static int get_para_features(void)
             features |= (1 << para_features[i].feature);
         }
     }
+    has_msr_async_pf_en = features & (1 << KVM_FEATURE_ASYNC_PF);
     return features;
 }
 #endif
@@ -917,7 +921,9 @@ static int kvm_put_msrs(CPUState *env, int level)
                           env->system_time_msr);
         kvm_msr_entry_set(&msrs[n++], MSR_KVM_WALL_CLOCK, env->wall_clock_msr);
 #if defined(CONFIG_KVM_PARA) && defined(KVM_CAP_ASYNC_PF)
-        kvm_msr_entry_set(&msrs[n++], MSR_KVM_ASYNC_PF_EN, env->async_pf_en_msr);
+        if (has_msr_async_pf_en) {
+            kvm_msr_entry_set(&msrs[n++], MSR_KVM_ASYNC_PF_EN, env->async_pf_en_msr);
+        }
 #endif
     }
 #ifdef KVM_CAP_MCE
@@ -1149,7 +1155,9 @@ static int kvm_get_msrs(CPUState *env)
     msrs[n++].index = MSR_KVM_SYSTEM_TIME;
     msrs[n++].index = MSR_KVM_WALL_CLOCK;
 #if defined(CONFIG_KVM_PARA) && defined(KVM_CAP_ASYNC_PF)
-    msrs[n++].index = MSR_KVM_ASYNC_PF_EN;
+    if (has_msr_async_pf_en) {
+        msrs[n++].index = MSR_KVM_ASYNC_PF_EN;
+    }
 #endif
 
 #ifdef KVM_CAP_MCE
-- 
1.7.2.3



* Re: [PATCH 26/35] kvm: Eliminate KVMState arguments
  2011-01-06 17:56   ` [Qemu-devel] " Marcelo Tosatti
@ 2011-01-06 19:24     ` Anthony Liguori
  -1 siblings, 0 replies; 300+ messages in thread
From: Anthony Liguori @ 2011-01-06 19:24 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: qemu-devel, kvm, Jan Kiszka, Alexander Graf

On 01/06/2011 11:56 AM, Marcelo Tosatti wrote:
> From: Jan Kiszka <jan.kiszka@siemens.com>
>
> QEMU supports only one VM, so there is only one kvm_state per process,
> and we gain nothing passing a reference to it around. Eliminate any need
> to refer to it outside of kvm-all.c.
>
> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> CC: Alexander Graf <agraf@suse.de>
> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
>    

I think this is a big mistake.

Having to manage kvm_state keeps the abstraction lines well defined.  
Otherwise, it's far too easy for portions of code to call into KVM 
functions that really shouldn't.
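
As a hypothetical illustration of the point (not code from QEMU or from this patch), compare what an explicit handle buys: a function that needs a state pointer can only be reached by code that was handed one. DemoKVMState, DemoAccel and the demo_* functions are invented names for this sketch.

```c
#include <stddef.h>

/* Invented stand-in for the per-VM state; not the real KVMState. */
typedef struct DemoKVMState {
    int vmfd;
    int ioctl_calls;
} DemoKVMState;

/* Explicit-handle style: callers must present the state to get in. */
int demo_vm_ioctl(DemoKVMState *s, int request)
{
    if (s == NULL) {
        return -1;      /* no handle, no access */
    }
    s->ioctl_calls++;
    return request;     /* stands in for the ioctl result */
}

/* Only components that were given the handle can make the call. */
typedef struct DemoAccel {
    DemoKVMState *kvm;
} DemoAccel;

int demo_accel_sync(DemoAccel *a)
{
    return demo_vm_ioctl(a->kvm, 42);
}
```

With a file-scope global instead of the parameter, any translation unit that sees the prototype could call demo_vm_ioctl, and that loss of a visible boundary is the concern here.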

Regards,

Anthony Liguori

> ---
>   cpu-defs.h            |    2 -
>   kvm-all.c             |  232 +++++++++++++++++++++----------------------------
>   kvm-stub.c            |    2 +-
>   kvm.h                 |   15 +--
>   target-i386/cpuid.c   |    9 +-
>   target-i386/kvm.c     |   77 ++++++++--------
>   target-i386/kvm_x86.h |    3 +
>   target-ppc/kvm.c      |   12 ++--
>   target-s390x/kvm.c    |    8 +--
>   9 files changed, 160 insertions(+), 200 deletions(-)
>
> diff --git a/cpu-defs.h b/cpu-defs.h
> index 8d4bf86..0e04239 100644
> --- a/cpu-defs.h
> +++ b/cpu-defs.h
> @@ -131,7 +131,6 @@ typedef struct icount_decr_u16 {
>   #endif
>
>   struct kvm_run;
> -struct KVMState;
>   struct qemu_work_item;
>
>   typedef struct CPUBreakpoint {
> @@ -207,7 +206,6 @@ typedef struct CPUWatchpoint {
>       struct QemuCond *halt_cond;                                         \
>       struct qemu_work_item *queued_work_first, *queued_work_last;        \
>       const char *cpu_model_str;                                          \
> -    struct KVMState *kvm_state;                                         \
>       struct kvm_run *kvm_run;                                            \
>       int kvm_fd;                                                         \
>       int kvm_vcpu_dirty;
> diff --git a/kvm-all.c b/kvm-all.c
> index ef2ca3b..d8820c7 100644
> --- a/kvm-all.c
> +++ b/kvm-all.c
> @@ -52,8 +52,7 @@ typedef struct KVMSlot
>
>   typedef struct kvm_dirty_log KVMDirtyLog;
>
> -struct KVMState
> -{
> +static struct KVMState {
>       KVMSlot slots[32];
>       int fd;
>       int vmfd;
> @@ -72,21 +71,19 @@ struct KVMState
>       int irqchip_in_kernel;
>       int pit_in_kernel;
>       int xsave, xcrs;
> -};
> -
> -static KVMState *kvm_state;
> +} kvm_state;
>
> -static KVMSlot *kvm_alloc_slot(KVMState *s)
> +static KVMSlot *kvm_alloc_slot(void)
>   {
>       int i;
>
> -    for (i = 0; i < ARRAY_SIZE(s->slots); i++) {
> +    for (i = 0; i < ARRAY_SIZE(kvm_state.slots); i++) {
>           /* KVM private memory slots */
>           if (i >= 8 && i < 12) {
>               continue;
>           }
> -        if (s->slots[i].memory_size == 0) {
> -            return &s->slots[i];
> +        if (kvm_state.slots[i].memory_size == 0) {
> +            return &kvm_state.slots[i];
>           }
>       }
>
> @@ -94,14 +91,13 @@ static KVMSlot *kvm_alloc_slot(KVMState *s)
>       abort();
>   }
>
> -static KVMSlot *kvm_lookup_matching_slot(KVMState *s,
> -                                         target_phys_addr_t start_addr,
> +static KVMSlot *kvm_lookup_matching_slot(target_phys_addr_t start_addr,
>                                            target_phys_addr_t end_addr)
>   {
>       int i;
>
> -    for (i = 0; i < ARRAY_SIZE(s->slots); i++) {
> -        KVMSlot *mem = &s->slots[i];
> +    for (i = 0; i < ARRAY_SIZE(kvm_state.slots); i++) {
> +        KVMSlot *mem = &kvm_state.slots[i];
>
>           if (start_addr == mem->start_addr &&
>               end_addr == mem->start_addr + mem->memory_size) {
> @@ -115,15 +111,14 @@ static KVMSlot *kvm_lookup_matching_slot(KVMState *s,
>   /*
>    * Find overlapping slot with lowest start address
>    */
> -static KVMSlot *kvm_lookup_overlapping_slot(KVMState *s,
> -                                            target_phys_addr_t start_addr,
> +static KVMSlot *kvm_lookup_overlapping_slot(target_phys_addr_t start_addr,
>                                               target_phys_addr_t end_addr)
>   {
>       KVMSlot *found = NULL;
>       int i;
>
> -    for (i = 0; i < ARRAY_SIZE(s->slots); i++) {
> -        KVMSlot *mem = &s->slots[i];
> +    for (i = 0; i < ARRAY_SIZE(kvm_state.slots); i++) {
> +        KVMSlot *mem = &kvm_state.slots[i];
>
>           if (mem->memory_size == 0 ||
>               (found && found->start_addr < mem->start_addr)) {
> @@ -139,13 +134,13 @@ static KVMSlot *kvm_lookup_overlapping_slot(KVMState *s,
>       return found;
>   }
>
> -int kvm_physical_memory_addr_from_ram(KVMState *s, ram_addr_t ram_addr,
> +int kvm_physical_memory_addr_from_ram(ram_addr_t ram_addr,
>                                         target_phys_addr_t *phys_addr)
>   {
>       int i;
>
> -    for (i = 0; i < ARRAY_SIZE(s->slots); i++) {
> -        KVMSlot *mem = &s->slots[i];
> +    for (i = 0; i < ARRAY_SIZE(kvm_state.slots); i++) {
> +        KVMSlot *mem = &kvm_state.slots[i];
>
>           if (ram_addr >= mem->phys_offset &&
>               ram_addr < mem->phys_offset + mem->memory_size) {
> @@ -157,7 +152,7 @@ int kvm_physical_memory_addr_from_ram(KVMState *s, ram_addr_t ram_addr,
>       return 0;
>   }
>
> -static int kvm_set_user_memory_region(KVMState *s, KVMSlot *slot)
> +static int kvm_set_user_memory_region(KVMSlot *slot)
>   {
>       struct kvm_userspace_memory_region mem;
>
> @@ -166,10 +161,10 @@ static int kvm_set_user_memory_region(KVMState *s, KVMSlot *slot)
>       mem.memory_size = slot->memory_size;
>       mem.userspace_addr = (unsigned long)qemu_safe_ram_ptr(slot->phys_offset);
>       mem.flags = slot->flags;
> -    if (s->migration_log) {
> +    if (kvm_state.migration_log) {
>           mem.flags |= KVM_MEM_LOG_DIRTY_PAGES;
>       }
> -    return kvm_vm_ioctl(s, KVM_SET_USER_MEMORY_REGION, &mem);
> +    return kvm_vm_ioctl(KVM_SET_USER_MEMORY_REGION, &mem);
>   }
>
>   static void kvm_reset_vcpu(void *opaque)
> @@ -181,33 +176,31 @@ static void kvm_reset_vcpu(void *opaque)
>
>   int kvm_irqchip_in_kernel(void)
>   {
> -    return kvm_state->irqchip_in_kernel;
> +    return kvm_state.irqchip_in_kernel;
>   }
>
>   int kvm_pit_in_kernel(void)
>   {
> -    return kvm_state->pit_in_kernel;
> +    return kvm_state.pit_in_kernel;
>   }
>
>
>   int kvm_init_vcpu(CPUState *env)
>   {
> -    KVMState *s = kvm_state;
>       long mmap_size;
>       int ret;
>
>       DPRINTF("kvm_init_vcpu\n");
>
> -    ret = kvm_vm_ioctl(s, KVM_CREATE_VCPU, env->cpu_index);
> +    ret = kvm_vm_ioctl(KVM_CREATE_VCPU, env->cpu_index);
>       if (ret < 0) {
>           DPRINTF("kvm_create_vcpu failed\n");
>           goto err;
>       }
>
>       env->kvm_fd = ret;
> -    env->kvm_state = s;
>
> -    mmap_size = kvm_ioctl(s, KVM_GET_VCPU_MMAP_SIZE, 0);
> +    mmap_size = kvm_ioctl(KVM_GET_VCPU_MMAP_SIZE, 0);
>       if (mmap_size < 0) {
>           DPRINTF("KVM_GET_VCPU_MMAP_SIZE failed\n");
>           goto err;
> @@ -222,9 +215,9 @@ int kvm_init_vcpu(CPUState *env)
>       }
>
>   #ifdef KVM_CAP_COALESCED_MMIO
> -    if (s->coalesced_mmio && !s->coalesced_mmio_ring) {
> -        s->coalesced_mmio_ring =
> -            (void *)env->kvm_run + s->coalesced_mmio * PAGE_SIZE;
> +    if (kvm_state.coalesced_mmio && !kvm_state.coalesced_mmio_ring) {
> +        kvm_state.coalesced_mmio_ring =
> +            (void *)env->kvm_run + kvm_state.coalesced_mmio * PAGE_SIZE;
>       }
>   #endif
>
> @@ -243,8 +236,7 @@ err:
>   static int kvm_dirty_pages_log_change(target_phys_addr_t phys_addr,
>                                         ram_addr_t size, int flags, int mask)
>   {
> -    KVMState *s = kvm_state;
> -    KVMSlot *mem = kvm_lookup_matching_slot(s, phys_addr, phys_addr + size);
> +    KVMSlot *mem = kvm_lookup_matching_slot(phys_addr, phys_addr + size);
>       int old_flags;
>
>       if (mem == NULL)  {
> @@ -260,14 +252,14 @@ static int kvm_dirty_pages_log_change(target_phys_addr_t phys_addr,
>       mem->flags = flags;
>
>       /* If nothing changed effectively, no need to issue ioctl */
> -    if (s->migration_log) {
> +    if (kvm_state.migration_log) {
>           flags |= KVM_MEM_LOG_DIRTY_PAGES;
>       }
>       if (flags == old_flags) {
>               return 0;
>       }
>
> -    return kvm_set_user_memory_region(s, mem);
> +    return kvm_set_user_memory_region(mem);
>   }
>
>   int kvm_log_start(target_phys_addr_t phys_addr, ram_addr_t size)
> @@ -284,14 +276,13 @@ int kvm_log_stop(target_phys_addr_t phys_addr, ram_addr_t size)
>
>   static int kvm_set_migration_log(int enable)
>   {
> -    KVMState *s = kvm_state;
>       KVMSlot *mem;
>       int i, err;
>
> -    s->migration_log = enable;
> +    kvm_state.migration_log = enable;
>
> -    for (i = 0; i < ARRAY_SIZE(s->slots); i++) {
> -        mem = &s->slots[i];
> +    for (i = 0; i < ARRAY_SIZE(kvm_state.slots); i++) {
> +        mem = &kvm_state.slots[i];
>
>           if (!mem->memory_size) {
>               continue;
> @@ -299,7 +290,7 @@ static int kvm_set_migration_log(int enable)
>           if (!!(mem->flags & KVM_MEM_LOG_DIRTY_PAGES) == enable) {
>               continue;
>           }
> -        err = kvm_set_user_memory_region(s, mem);
> +        err = kvm_set_user_memory_region(mem);
>           if (err) {
>               return err;
>           }
> @@ -353,7 +344,6 @@ static int kvm_get_dirty_pages_log_range(unsigned long start_addr,
>   static int kvm_physical_sync_dirty_bitmap(target_phys_addr_t start_addr,
>                                             target_phys_addr_t end_addr)
>   {
> -    KVMState *s = kvm_state;
>       unsigned long size, allocated_size = 0;
>       KVMDirtyLog d;
>       KVMSlot *mem;
> @@ -361,7 +351,7 @@ static int kvm_physical_sync_dirty_bitmap(target_phys_addr_t start_addr,
>
>       d.dirty_bitmap = NULL;
>       while (start_addr < end_addr) {
> -        mem = kvm_lookup_overlapping_slot(s, start_addr, end_addr);
> +        mem = kvm_lookup_overlapping_slot(start_addr, end_addr);
>           if (mem == NULL) {
>               break;
>           }
> @@ -377,7 +367,7 @@ static int kvm_physical_sync_dirty_bitmap(target_phys_addr_t start_addr,
>
>           d.slot = mem->slot;
>
> -        if (kvm_vm_ioctl(s, KVM_GET_DIRTY_LOG, &d) == -1) {
> +        if (kvm_vm_ioctl(KVM_GET_DIRTY_LOG, &d) == -1) {
>               DPRINTF("ioctl failed %d\n", errno);
>               ret = -1;
>               break;
> @@ -395,16 +385,15 @@ static int kvm_physical_sync_dirty_bitmap(target_phys_addr_t start_addr,
>   int kvm_coalesce_mmio_region(target_phys_addr_t start, ram_addr_t size)
>   {
>       int ret = -ENOSYS;
> -#ifdef KVM_CAP_COALESCED_MMIO
> -    KVMState *s = kvm_state;
>
> -    if (s->coalesced_mmio) {
> +#ifdef KVM_CAP_COALESCED_MMIO
> +    if (kvm_state.coalesced_mmio) {
>           struct kvm_coalesced_mmio_zone zone;
>
>           zone.addr = start;
>           zone.size = size;
>
> -        ret = kvm_vm_ioctl(s, KVM_REGISTER_COALESCED_MMIO, &zone);
> +        ret = kvm_vm_ioctl(KVM_REGISTER_COALESCED_MMIO, &zone);
>       }
>   #endif
>
> @@ -414,27 +403,26 @@ int kvm_coalesce_mmio_region(target_phys_addr_t start, ram_addr_t size)
>   int kvm_uncoalesce_mmio_region(target_phys_addr_t start, ram_addr_t size)
>   {
>       int ret = -ENOSYS;
> -#ifdef KVM_CAP_COALESCED_MMIO
> -    KVMState *s = kvm_state;
>
> -    if (s->coalesced_mmio) {
> +#ifdef KVM_CAP_COALESCED_MMIO
> +    if (kvm_state.coalesced_mmio) {
>           struct kvm_coalesced_mmio_zone zone;
>
>           zone.addr = start;
>           zone.size = size;
>
> -        ret = kvm_vm_ioctl(s, KVM_UNREGISTER_COALESCED_MMIO, &zone);
> +        ret = kvm_vm_ioctl(KVM_UNREGISTER_COALESCED_MMIO, &zone);
>       }
>   #endif
>
>       return ret;
>   }
>
> -int kvm_check_extension(KVMState *s, unsigned int extension)
> +int kvm_check_extension(unsigned int extension)
>   {
>       int ret;
>
> -    ret = kvm_ioctl(s, KVM_CHECK_EXTENSION, extension);
> +    ret = kvm_ioctl(KVM_CHECK_EXTENSION, extension);
>       if (ret < 0) {
>           ret = 0;
>       }
> @@ -445,7 +433,6 @@ int kvm_check_extension(KVMState *s, unsigned int extension)
>   static void kvm_set_phys_mem(target_phys_addr_t start_addr, ram_addr_t size,
>                                ram_addr_t phys_offset)
>   {
> -    KVMState *s = kvm_state;
>       ram_addr_t flags = phys_offset & ~TARGET_PAGE_MASK;
>       KVMSlot *mem, old;
>       int err;
> @@ -459,7 +446,7 @@ static void kvm_set_phys_mem(target_phys_addr_t start_addr, ram_addr_t size,
>       phys_offset &= ~IO_MEM_ROM;
>
>       while (1) {
> -        mem = kvm_lookup_overlapping_slot(s, start_addr, start_addr + size);
> +        mem = kvm_lookup_overlapping_slot(start_addr, start_addr + size);
>           if (!mem) {
>               break;
>           }
> @@ -476,7 +463,7 @@ static void kvm_set_phys_mem(target_phys_addr_t start_addr, ram_addr_t size,
>
>           /* unregister the overlapping slot */
>           mem->memory_size = 0;
> -        err = kvm_set_user_memory_region(s, mem);
> +        err = kvm_set_user_memory_region(mem);
>           if (err) {
>               fprintf(stderr, "%s: error unregistering overlapping slot: %s\n",
>                       __func__, strerror(-err));
> @@ -491,16 +478,16 @@ static void kvm_set_phys_mem(target_phys_addr_t start_addr, ram_addr_t size,
>            * address as the first existing one. If not or if some overlapping
>            * slot comes around later, we will fail (not seen in practice so far)
>            * - and actually require a recent KVM version. */
> -        if (s->broken_set_mem_region &&
> +        if (kvm_state.broken_set_mem_region &&
>               old.start_addr == start_addr && old.memory_size < size &&
>               flags < IO_MEM_UNASSIGNED) {
> -            mem = kvm_alloc_slot(s);
> +            mem = kvm_alloc_slot();
>               mem->memory_size = old.memory_size;
>               mem->start_addr = old.start_addr;
>               mem->phys_offset = old.phys_offset;
>               mem->flags = 0;
>
> -            err = kvm_set_user_memory_region(s, mem);
> +            err = kvm_set_user_memory_region(mem);
>               if (err) {
>                   fprintf(stderr, "%s: error updating slot: %s\n", __func__,
>                           strerror(-err));
> @@ -515,13 +502,13 @@ static void kvm_set_phys_mem(target_phys_addr_t start_addr, ram_addr_t size,
>
>           /* register prefix slot */
>           if (old.start_addr < start_addr) {
> -            mem = kvm_alloc_slot(s);
> +            mem = kvm_alloc_slot();
>               mem->memory_size = start_addr - old.start_addr;
>               mem->start_addr = old.start_addr;
>               mem->phys_offset = old.phys_offset;
>               mem->flags = 0;
>
> -            err = kvm_set_user_memory_region(s, mem);
> +            err = kvm_set_user_memory_region(mem);
>               if (err) {
>                   fprintf(stderr, "%s: error registering prefix slot: %s\n",
>                           __func__, strerror(-err));
> @@ -533,14 +520,14 @@ static void kvm_set_phys_mem(target_phys_addr_t start_addr, ram_addr_t size,
>           if (old.start_addr + old.memory_size > start_addr + size) {
>               ram_addr_t size_delta;
>
> -            mem = kvm_alloc_slot(s);
> +            mem = kvm_alloc_slot();
>               mem->start_addr = start_addr + size;
>               size_delta = mem->start_addr - old.start_addr;
>               mem->memory_size = old.memory_size - size_delta;
>               mem->phys_offset = old.phys_offset + size_delta;
>               mem->flags = 0;
>
> -            err = kvm_set_user_memory_region(s, mem);
> +            err = kvm_set_user_memory_region(mem);
>               if (err) {
>                   fprintf(stderr, "%s: error registering suffix slot: %s\n",
>                           __func__, strerror(-err));
> @@ -557,13 +544,13 @@ static void kvm_set_phys_mem(target_phys_addr_t start_addr, ram_addr_t size,
>       if (flags >= IO_MEM_UNASSIGNED) {
>           return;
>       }
> -    mem = kvm_alloc_slot(s);
> +    mem = kvm_alloc_slot();
>       mem->memory_size = size;
>       mem->start_addr = start_addr;
>       mem->phys_offset = phys_offset;
>       mem->flags = 0;
>
> -    err = kvm_set_user_memory_region(s, mem);
> +    err = kvm_set_user_memory_region(mem);
>       if (err) {
>           fprintf(stderr, "%s: error registering slot: %s\n", __func__,
>                   strerror(-err));
> @@ -602,27 +589,24 @@ int kvm_init(int smp_cpus)
>       static const char upgrade_note[] =
>           "Please upgrade to at least kernel 2.6.29 or recent kvm-kmod\n"
>           "(see http://sourceforge.net/projects/kvm).\n";
> -    KVMState *s;
>       int ret;
>       int i;
>
> -    s = qemu_mallocz(sizeof(KVMState));
> -
>   #ifdef KVM_CAP_SET_GUEST_DEBUG
> -    QTAILQ_INIT(&s->kvm_sw_breakpoints);
> +    QTAILQ_INIT(&kvm_state.kvm_sw_breakpoints);
>   #endif
> -    for (i = 0; i < ARRAY_SIZE(s->slots); i++) {
> -        s->slots[i].slot = i;
> +    for (i = 0; i < ARRAY_SIZE(kvm_state.slots); i++) {
> +        kvm_state.slots[i].slot = i;
>       }
> -    s->vmfd = -1;
> -    s->fd = qemu_open("/dev/kvm", O_RDWR);
> -    if (s->fd == -1) {
> +    kvm_state.vmfd = -1;
> +    kvm_state.fd = qemu_open("/dev/kvm", O_RDWR);
> +    if (kvm_state.fd == -1) {
>           fprintf(stderr, "Could not access KVM kernel module: %m\n");
>           ret = -errno;
>           goto err;
>       }
>
> -    ret = kvm_ioctl(s, KVM_GET_API_VERSION, 0);
> +    ret = kvm_ioctl(KVM_GET_API_VERSION, 0);
>       if (ret < KVM_API_VERSION) {
>           if (ret > 0) {
>               ret = -EINVAL;
> @@ -637,8 +621,8 @@ int kvm_init(int smp_cpus)
>           goto err;
>       }
>
> -    s->vmfd = kvm_ioctl(s, KVM_CREATE_VM, 0);
> -    if (s->vmfd < 0) {
> +    kvm_state.vmfd = kvm_ioctl(KVM_CREATE_VM, 0);
> +    if (kvm_state.vmfd < 0) {
>   #ifdef TARGET_S390X
>           fprintf(stderr, "Please add the 'switch_amode' kernel parameter to "
>                           "your host kernel command line\n");
> @@ -651,7 +635,7 @@ int kvm_init(int smp_cpus)
>        * just use a user allocated buffer so we can use regular pages
>        * unmodified.  Make sure we have a sufficiently modern version of KVM.
>        */
> -    if (!kvm_check_extension(s, KVM_CAP_USER_MEMORY)) {
> +    if (!kvm_check_extension(KVM_CAP_USER_MEMORY)) {
>           ret = -EINVAL;
>           fprintf(stderr, "kvm does not support KVM_CAP_USER_MEMORY\n%s",
>                   upgrade_note);
> @@ -661,7 +645,7 @@ int kvm_init(int smp_cpus)
>       /* There was a nasty bug in < kvm-80 that prevents memory slots from being
>        * destroyed properly.  Since we rely on this capability, refuse to work
>        * with any kernel without this capability. */
> -    if (!kvm_check_extension(s, KVM_CAP_DESTROY_MEMORY_REGION_WORKS)) {
> +    if (!kvm_check_extension(KVM_CAP_DESTROY_MEMORY_REGION_WORKS)) {
>           ret = -EINVAL;
>
>           fprintf(stderr,
> @@ -670,66 +654,55 @@ int kvm_init(int smp_cpus)
>           goto err;
>       }
>
> -    s->coalesced_mmio = 0;
>   #ifdef KVM_CAP_COALESCED_MMIO
> -    s->coalesced_mmio = kvm_check_extension(s, KVM_CAP_COALESCED_MMIO);
> -    s->coalesced_mmio_ring = NULL;
> +    kvm_state.coalesced_mmio = kvm_check_extension(KVM_CAP_COALESCED_MMIO);
>   #endif
>
> -    s->broken_set_mem_region = 1;
> +    kvm_state.broken_set_mem_region = 1;
>   #ifdef KVM_CAP_JOIN_MEMORY_REGIONS_WORKS
> -    ret = kvm_check_extension(s, KVM_CAP_JOIN_MEMORY_REGIONS_WORKS);
> +    ret = kvm_check_extension(KVM_CAP_JOIN_MEMORY_REGIONS_WORKS);
>       if (ret > 0) {
> -        s->broken_set_mem_region = 0;
> +        kvm_state.broken_set_mem_region = 0;
>       }
>   #endif
>
> -    s->vcpu_events = 0;
>   #ifdef KVM_CAP_VCPU_EVENTS
> -    s->vcpu_events = kvm_check_extension(s, KVM_CAP_VCPU_EVENTS);
> +    kvm_state.vcpu_events = kvm_check_extension(KVM_CAP_VCPU_EVENTS);
>   #endif
>
> -    s->robust_singlestep = 0;
>   #ifdef KVM_CAP_X86_ROBUST_SINGLESTEP
> -    s->robust_singlestep =
> -        kvm_check_extension(s, KVM_CAP_X86_ROBUST_SINGLESTEP);
> +    kvm_state.robust_singlestep =
> +        kvm_check_extension(KVM_CAP_X86_ROBUST_SINGLESTEP);
>   #endif
>
> -    s->debugregs = 0;
>   #ifdef KVM_CAP_DEBUGREGS
> -    s->debugregs = kvm_check_extension(s, KVM_CAP_DEBUGREGS);
> +    kvm_state.debugregs = kvm_check_extension(KVM_CAP_DEBUGREGS);
>   #endif
>
> -    s->xsave = 0;
>   #ifdef KVM_CAP_XSAVE
> -    s->xsave = kvm_check_extension(s, KVM_CAP_XSAVE);
> +    kvm_state.xsave = kvm_check_extension(KVM_CAP_XSAVE);
>   #endif
>
> -    s->xcrs = 0;
>   #ifdef KVM_CAP_XCRS
> -    s->xcrs = kvm_check_extension(s, KVM_CAP_XCRS);
> +    kvm_state.xcrs = kvm_check_extension(KVM_CAP_XCRS);
>   #endif
>
> -    ret = kvm_arch_init(s, smp_cpus);
> +    ret = kvm_arch_init(smp_cpus);
>       if (ret < 0) {
>           goto err;
>       }
>
> -    kvm_state = s;
>       cpu_register_phys_memory_client(&kvm_cpu_phys_memory_client);
>
>       return 0;
>
>   err:
> -    if (s) {
> -        if (s->vmfd != -1) {
> -            close(s->vmfd);
> -        }
> -        if (s->fd != -1) {
> -            close(s->fd);
> -        }
> +    if (kvm_state.vmfd != -1) {
> +        close(kvm_state.vmfd);
> +    }
> +    if (kvm_state.fd != -1) {
> +        close(kvm_state.fd);
>       }
> -    qemu_free(s);
>
>       return ret;
>   }
> @@ -777,7 +750,7 @@ static int kvm_handle_io(uint16_t port, void *data, int direction, int size,
>   static int kvm_handle_internal_error(CPUState *env, struct kvm_run *run)
>   {
>       fprintf(stderr, "KVM internal error.");
> -    if (kvm_check_extension(kvm_state, KVM_CAP_INTERNAL_ERROR_DATA)) {
> +    if (kvm_check_extension(KVM_CAP_INTERNAL_ERROR_DATA)) {
>           int i;
>
>           fprintf(stderr, " Suberror: %d\n", run->internal.suberror);
> @@ -805,9 +778,8 @@ static int kvm_handle_internal_error(CPUState *env, struct kvm_run *run)
>   void kvm_flush_coalesced_mmio_buffer(void)
>   {
>   #ifdef KVM_CAP_COALESCED_MMIO
> -    KVMState *s = kvm_state;
> -    if (s->coalesced_mmio_ring) {
> -        struct kvm_coalesced_mmio_ring *ring = s->coalesced_mmio_ring;
> +    if (kvm_state.coalesced_mmio_ring) {
> +        struct kvm_coalesced_mmio_ring *ring = kvm_state.coalesced_mmio_ring;
>           while (ring->first != ring->last) {
>               struct kvm_coalesced_mmio *ent;
>
> @@ -963,7 +935,7 @@ void kvm_cpu_exec(CPUState *env)
>       }
>   }
>
> -int kvm_ioctl(KVMState *s, int type, ...)
> +int kvm_ioctl(int type, ...)
>   {
>       int ret;
>       void *arg;
> @@ -973,14 +945,14 @@ int kvm_ioctl(KVMState *s, int type, ...)
>       arg = va_arg(ap, void *);
>       va_end(ap);
>
> -    ret = ioctl(s->fd, type, arg);
> +    ret = ioctl(kvm_state.fd, type, arg);
>       if (ret == -1) {
>           ret = -errno;
>       }
>       return ret;
>   }
>
> -int kvm_vm_ioctl(KVMState *s, int type, ...)
> +int kvm_vm_ioctl(int type, ...)
>   {
>       int ret;
>       void *arg;
> @@ -990,7 +962,7 @@ int kvm_vm_ioctl(KVMState *s, int type, ...)
>       arg = va_arg(ap, void *);
>       va_end(ap);
>
> -    ret = ioctl(s->vmfd, type, arg);
> +    ret = ioctl(kvm_state.vmfd, type, arg);
>       if (ret == -1) {
>           ret = -errno;
>       }
> @@ -1017,9 +989,7 @@ int kvm_vcpu_ioctl(CPUState *env, int type, ...)
>   int kvm_has_sync_mmu(void)
>   {
>   #ifdef KVM_CAP_SYNC_MMU
> -    KVMState *s = kvm_state;
> -
> -    return kvm_check_extension(s, KVM_CAP_SYNC_MMU);
> +    return kvm_check_extension(KVM_CAP_SYNC_MMU);
>   #else
>       return 0;
>   #endif
> @@ -1027,27 +997,27 @@ int kvm_has_sync_mmu(void)
>
>   int kvm_has_vcpu_events(void)
>   {
> -    return kvm_state->vcpu_events;
> +    return kvm_state.vcpu_events;
>   }
>
>   int kvm_has_robust_singlestep(void)
>   {
> -    return kvm_state->robust_singlestep;
> +    return kvm_state.robust_singlestep;
>   }
>
>   int kvm_has_debugregs(void)
>   {
> -    return kvm_state->debugregs;
> +    return kvm_state.debugregs;
>   }
>
>   int kvm_has_xsave(void)
>   {
> -    return kvm_state->xsave;
> +    return kvm_state.xsave;
>   }
>
>   int kvm_has_xcrs(void)
>   {
> -    return kvm_state->xcrs;
> +    return kvm_state.xcrs;
>   }
>
>   void kvm_setup_guest_memory(void *start, size_t size)
> @@ -1070,7 +1040,7 @@ struct kvm_sw_breakpoint *kvm_find_sw_breakpoint(CPUState *env,
>   {
>       struct kvm_sw_breakpoint *bp;
>
> -    QTAILQ_FOREACH(bp, &env->kvm_state->kvm_sw_breakpoints, entry) {
> +    QTAILQ_FOREACH(bp, &kvm_state.kvm_sw_breakpoints, entry) {
>           if (bp->pc == pc) {
>               return bp;
>           }
> @@ -1080,7 +1050,7 @@ struct kvm_sw_breakpoint *kvm_find_sw_breakpoint(CPUState *env,
>
>   int kvm_sw_breakpoints_active(CPUState *env)
>   {
> -    return !QTAILQ_EMPTY(&env->kvm_state->kvm_sw_breakpoints);
> +    return !QTAILQ_EMPTY(&kvm_state.kvm_sw_breakpoints);
>   }
>
>   struct kvm_set_guest_debug_data {
> @@ -1140,8 +1110,7 @@ int kvm_insert_breakpoint(CPUState *current_env, target_ulong addr,
>               return err;
>           }
>
> -        QTAILQ_INSERT_HEAD(&current_env->kvm_state->kvm_sw_breakpoints,
> -                          bp, entry);
> +        QTAILQ_INSERT_HEAD(&kvm_state.kvm_sw_breakpoints, bp, entry);
>       } else {
>           err = kvm_arch_insert_hw_breakpoint(addr, len, type);
>           if (err) {
> @@ -1181,7 +1150,7 @@ int kvm_remove_breakpoint(CPUState *current_env, target_ulong addr,
>               return err;
>           }
>
> -        QTAILQ_REMOVE(&current_env->kvm_state->kvm_sw_breakpoints, bp, entry);
> +        QTAILQ_REMOVE(&kvm_state.kvm_sw_breakpoints, bp, entry);
>           qemu_free(bp);
>       } else {
>           err = kvm_arch_remove_hw_breakpoint(addr, len, type);
> @@ -1202,10 +1171,9 @@ int kvm_remove_breakpoint(CPUState *current_env, target_ulong addr,
>   void kvm_remove_all_breakpoints(CPUState *current_env)
>   {
>       struct kvm_sw_breakpoint *bp, *next;
> -    KVMState *s = current_env->kvm_state;
>       CPUState *env;
>
> -    QTAILQ_FOREACH_SAFE(bp, &s->kvm_sw_breakpoints, entry, next) {
> +    QTAILQ_FOREACH_SAFE(bp, &kvm_state.kvm_sw_breakpoints, entry, next) {
>           if (kvm_arch_remove_sw_breakpoint(current_env, bp) != 0) {
>               /* Try harder to find a CPU that currently sees the breakpoint. */
>               for (env = first_cpu; env != NULL; env = env->next_cpu) {
> @@ -1285,7 +1253,7 @@ int kvm_set_ioeventfd_mmio_long(int fd, uint32_t addr, uint32_t val, bool assign
>           iofd.flags |= KVM_IOEVENTFD_FLAG_DEASSIGN;
>       }
>
> -    ret = kvm_vm_ioctl(kvm_state, KVM_IOEVENTFD, &iofd);
> +    ret = kvm_vm_ioctl(KVM_IOEVENTFD, &iofd);
>
>       if (ret < 0) {
>           return -errno;
> @@ -1314,7 +1282,7 @@ int kvm_set_ioeventfd_pio_word(int fd, uint16_t addr, uint16_t val, bool assign)
>       if (!assign) {
>           kick.flags |= KVM_IOEVENTFD_FLAG_DEASSIGN;
>       }
> -    r = kvm_vm_ioctl(kvm_state, KVM_IOEVENTFD, &kick);
> +    r = kvm_vm_ioctl(KVM_IOEVENTFD, &kick);
>       if (r < 0) {
>           return r;
>       }
> diff --git a/kvm-stub.c b/kvm-stub.c
> index 352c6a6..3a058ad 100644
> --- a/kvm-stub.c
> +++ b/kvm-stub.c
> @@ -53,7 +53,7 @@ int kvm_uncoalesce_mmio_region(target_phys_addr_t start, ram_addr_t size)
>       return -ENOSYS;
>   }
>
> -int kvm_check_extension(KVMState *s, unsigned int extension)
> +int kvm_check_extension(unsigned int extension)
>   {
>       return 0;
>   }
> diff --git a/kvm.h b/kvm.h
> index 51ad56f..26ca8c1 100644
> --- a/kvm.h
> +++ b/kvm.h
> @@ -74,12 +74,9 @@ int kvm_irqchip_in_kernel(void);
>
>   /* internal API */
>
> -struct KVMState;
> -typedef struct KVMState KVMState;
> +int kvm_ioctl(int type, ...);
>
> -int kvm_ioctl(KVMState *s, int type, ...);
> -
> -int kvm_vm_ioctl(KVMState *s, int type, ...);
> +int kvm_vm_ioctl(int type, ...);
>
>   int kvm_vcpu_ioctl(CPUState *env, int type, ...);
>
> @@ -104,7 +101,7 @@ int kvm_arch_get_registers(CPUState *env);
>
>   int kvm_arch_put_registers(CPUState *env, int level);
>
> -int kvm_arch_init(KVMState *s, int smp_cpus);
> +int kvm_arch_init(int smp_cpus);
>
>   int kvm_arch_init_vcpu(CPUState *env);
>
> @@ -146,10 +143,8 @@ void kvm_arch_update_guest_debug(CPUState *env, struct kvm_guest_debug *dbg);
>
>   bool kvm_arch_stop_on_emulation_error(CPUState *env);
>
> -int kvm_check_extension(KVMState *s, unsigned int extension);
> +int kvm_check_extension(unsigned int extension);
>
> -uint32_t kvm_arch_get_supported_cpuid(CPUState *env, uint32_t function,
> -                                      uint32_t index, int reg);
>   void kvm_cpu_synchronize_state(CPUState *env);
>   void kvm_cpu_synchronize_post_reset(CPUState *env);
>   void kvm_cpu_synchronize_post_init(CPUState *env);
> @@ -179,7 +174,7 @@ static inline void cpu_synchronize_post_init(CPUState *env)
>
>
>   #if !defined(CONFIG_USER_ONLY)
> -int kvm_physical_memory_addr_from_ram(KVMState *s, ram_addr_t ram_addr,
> +int kvm_physical_memory_addr_from_ram(ram_addr_t ram_addr,
>                                         target_phys_addr_t *phys_addr);
>   #endif
>
> diff --git a/target-i386/cpuid.c b/target-i386/cpuid.c
> index 5382a28..17ab619 100644
> --- a/target-i386/cpuid.c
> +++ b/target-i386/cpuid.c
> @@ -23,6 +23,7 @@
>
>   #include "cpu.h"
>   #include "kvm.h"
> +#include "kvm_x86.h"
>
>   #include "qemu-option.h"
>   #include "qemu-config.h"
> @@ -1138,10 +1139,10 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
>               break;
>           }
>           if (kvm_enabled()) {
> -            *eax = kvm_arch_get_supported_cpuid(env, 0xd, count, R_EAX);
> -            *ebx = kvm_arch_get_supported_cpuid(env, 0xd, count, R_EBX);
> -            *ecx = kvm_arch_get_supported_cpuid(env, 0xd, count, R_ECX);
> -            *edx = kvm_arch_get_supported_cpuid(env, 0xd, count, R_EDX);
> +            *eax = kvm_x86_get_supported_cpuid(0xd, count, R_EAX);
> +            *ebx = kvm_x86_get_supported_cpuid(0xd, count, R_EBX);
> +            *ecx = kvm_x86_get_supported_cpuid(0xd, count, R_ECX);
> +            *edx = kvm_x86_get_supported_cpuid(0xd, count, R_EDX);
>           } else {
>               *eax = 0;
>               *ebx = 0;
> diff --git a/target-i386/kvm.c b/target-i386/kvm.c
> index 1789bff..cb6883f 100644
> --- a/target-i386/kvm.c
> +++ b/target-i386/kvm.c
> @@ -60,7 +60,7 @@ static int lm_capable_kernel;
>
>   #ifdef KVM_CAP_EXT_CPUID
>
> -static struct kvm_cpuid2 *try_get_cpuid(KVMState *s, int max)
> +static struct kvm_cpuid2 *try_get_cpuid(int max)
>   {
>       struct kvm_cpuid2 *cpuid;
>       int r, size;
> @@ -68,7 +68,7 @@ static struct kvm_cpuid2 *try_get_cpuid(KVMState *s, int max)
>       size = sizeof(*cpuid) + max * sizeof(*cpuid->entries);
>       cpuid = (struct kvm_cpuid2 *)qemu_mallocz(size);
>       cpuid->nent = max;
> -    r = kvm_ioctl(s, KVM_GET_SUPPORTED_CPUID, cpuid);
> +    r = kvm_ioctl(KVM_GET_SUPPORTED_CPUID, cpuid);
>       if (r == 0 && cpuid->nent >= max) {
>           r = -E2BIG;
>       }
> @@ -85,20 +85,20 @@ static struct kvm_cpuid2 *try_get_cpuid(KVMState *s, int max)
>       return cpuid;
>   }
>
> -uint32_t kvm_arch_get_supported_cpuid(CPUState *env, uint32_t function,
> -                                      uint32_t index, int reg)
> +uint32_t kvm_x86_get_supported_cpuid(uint32_t function, uint32_t index,
> +                                     int reg)
>   {
>       struct kvm_cpuid2 *cpuid;
>       int i, max;
>       uint32_t ret = 0;
>       uint32_t cpuid_1_edx;
>
> -    if (!kvm_check_extension(env->kvm_state, KVM_CAP_EXT_CPUID)) {
> +    if (!kvm_check_extension(KVM_CAP_EXT_CPUID)) {
>           return -1U;
>       }
>
>       max = 1;
> -    while ((cpuid = try_get_cpuid(env->kvm_state, max)) == NULL) {
> +    while ((cpuid = try_get_cpuid(max)) == NULL) {
>           max *= 2;
>       }
>
> @@ -126,7 +126,7 @@ uint32_t kvm_arch_get_supported_cpuid(CPUState *env, uint32_t function,
>                       /* On Intel, kvm returns cpuid according to the Intel spec,
>                        * so add missing bits according to the AMD spec:
>                        */
> -                    cpuid_1_edx = kvm_arch_get_supported_cpuid(env, 1, 0, R_EDX);
> +                    cpuid_1_edx = kvm_x86_get_supported_cpuid(1, 0, R_EDX);
>                       ret |= cpuid_1_edx & 0x183f7ff;
>                       break;
>                   }
> @@ -142,8 +142,8 @@ uint32_t kvm_arch_get_supported_cpuid(CPUState *env, uint32_t function,
>
>   #else
>
> -uint32_t kvm_arch_get_supported_cpuid(CPUState *env, uint32_t function,
> -                                      uint32_t index, int reg)
> +uint32_t kvm_x86_get_supported_cpuid(uint32_t function, uint32_t index,
> +                                     int reg)
>   {
>       return -1U;
>   }
> @@ -170,12 +170,12 @@ struct kvm_para_features {
>       { -1, -1 }
>   };
>
> -static int get_para_features(CPUState *env)
> +static int get_para_features(void)
>   {
>       int i, features = 0;
>
>       for (i = 0; i < ARRAY_SIZE(para_features) - 1; i++) {
> -        if (kvm_check_extension(env->kvm_state, para_features[i].cap)) {
> +        if (kvm_check_extension(para_features[i].cap)) {
>               features |= (1 << para_features[i].feature);
>           }
>       }
> @@ -184,15 +184,14 @@ static int get_para_features(CPUState *env)
>   #endif
>
>   #ifdef KVM_CAP_MCE
> -static int kvm_get_mce_cap_supported(KVMState *s, uint64_t *mce_cap,
> -                                     int *max_banks)
> +static int kvm_get_mce_cap_supported(uint64_t *mce_cap, int *max_banks)
>   {
>       int r;
>
> -    r = kvm_check_extension(s, KVM_CAP_MCE);
> +    r = kvm_check_extension(KVM_CAP_MCE);
>       if (r > 0) {
>           *max_banks = r;
> -        return kvm_ioctl(s, KVM_X86_GET_MCE_CAP_SUPPORTED, mce_cap);
> +        return kvm_ioctl(KVM_X86_GET_MCE_CAP_SUPPORTED, mce_cap);
>       }
>       return -ENOSYS;
>   }
> @@ -323,18 +322,18 @@ int kvm_arch_init_vcpu(CPUState *env)
>       uint32_t signature[3];
>   #endif
>
> -    env->cpuid_features &= kvm_arch_get_supported_cpuid(env, 1, 0, R_EDX);
> +    env->cpuid_features &= kvm_x86_get_supported_cpuid(1, 0, R_EDX);
>
>       i = env->cpuid_ext_features & CPUID_EXT_HYPERVISOR;
> -    env->cpuid_ext_features &= kvm_arch_get_supported_cpuid(env, 1, 0, R_ECX);
> +    env->cpuid_ext_features &= kvm_x86_get_supported_cpuid(1, 0, R_ECX);
>       env->cpuid_ext_features |= i;
>
> -    env->cpuid_ext2_features &= kvm_arch_get_supported_cpuid(env, 0x80000001,
> -                                                             0, R_EDX);
> -    env->cpuid_ext3_features &= kvm_arch_get_supported_cpuid(env, 0x80000001,
> -                                                             0, R_ECX);
> -    env->cpuid_svm_features &= kvm_arch_get_supported_cpuid(env, 0x8000000A,
> -                                                             0, R_EDX);
> +    env->cpuid_ext2_features &= kvm_x86_get_supported_cpuid(0x80000001,
> +                                                            0, R_EDX);
> +    env->cpuid_ext3_features &= kvm_x86_get_supported_cpuid(0x80000001,
> +                                                            0, R_ECX);
> +    env->cpuid_svm_features &= kvm_x86_get_supported_cpuid(0x8000000A,
> +                                                            0, R_EDX);
>
>
>       cpuid_i = 0;
> @@ -353,7 +352,7 @@ int kvm_arch_init_vcpu(CPUState *env)
>       c = &cpuid_data.entries[cpuid_i++];
>       memset(c, 0, sizeof(*c));
>       c->function = KVM_CPUID_FEATURES;
> -    c->eax = env->cpuid_kvm_features & get_para_features(env);
> +    c->eax = env->cpuid_kvm_features & get_para_features();
>   #endif
>
>       cpu_x86_cpuid(env, 0, 0, &limit, &unused, &unused, &unused);
> @@ -423,11 +422,11 @@ int kvm_arch_init_vcpu(CPUState *env)
>   #ifdef KVM_CAP_MCE
>       if (((env->cpuid_version >> 8)&0xF) >= 6
>           && (env->cpuid_features&(CPUID_MCE|CPUID_MCA)) == (CPUID_MCE|CPUID_MCA)
> -        && kvm_check_extension(env->kvm_state, KVM_CAP_MCE) > 0) {
> +        && kvm_check_extension(KVM_CAP_MCE) > 0) {
>           uint64_t mcg_cap;
>           int banks;
>
> -        if (kvm_get_mce_cap_supported(env->kvm_state, &mcg_cap, &banks)) {
> +        if (kvm_get_mce_cap_supported(&mcg_cap, &banks)) {
>               perror("kvm_get_mce_cap_supported FAILED");
>           } else {
>               if (banks > MCE_BANKS_DEF)
> @@ -461,7 +460,7 @@ void kvm_arch_reset_vcpu(CPUState *env)
>       }
>   }
>
> -static int kvm_get_supported_msrs(KVMState *s)
> +static int kvm_get_supported_msrs(void)
>   {
>       static int kvm_supported_msrs;
>       int ret = 0;
> @@ -475,7 +474,7 @@ static int kvm_get_supported_msrs(KVMState *s)
>           /* Obtain MSR list from KVM.  These are the MSRs that we must
>            * save/restore */
>           msr_list.nmsrs = 0;
> -        ret = kvm_ioctl(s, KVM_GET_MSR_INDEX_LIST, &msr_list);
> +        ret = kvm_ioctl(KVM_GET_MSR_INDEX_LIST, &msr_list);
>           if (ret < 0 && ret != -E2BIG) {
>               return ret;
>           }
> @@ -486,7 +485,7 @@ static int kvm_get_supported_msrs(KVMState *s)
>                                                 sizeof(msr_list.indices[0])));
>
>           kvm_msr_list->nmsrs = msr_list.nmsrs;
> -        ret = kvm_ioctl(s, KVM_GET_MSR_INDEX_LIST, kvm_msr_list);
> +        ret = kvm_ioctl(KVM_GET_MSR_INDEX_LIST, kvm_msr_list);
>           if (ret >= 0) {
>               int i;
>
> @@ -508,17 +507,17 @@ static int kvm_get_supported_msrs(KVMState *s)
>       return ret;
>   }
>
> -static int kvm_init_identity_map_page(KVMState *s)
> +static int kvm_init_identity_map_page(void)
>   {
>   #ifdef KVM_CAP_SET_IDENTITY_MAP_ADDR
>       int ret;
>       uint64_t addr = 0xfffbc000;
>
> -    if (!kvm_check_extension(s, KVM_CAP_SET_IDENTITY_MAP_ADDR)) {
> +    if (!kvm_check_extension(KVM_CAP_SET_IDENTITY_MAP_ADDR)) {
>           return 0;
>       }
>
> -    ret = kvm_vm_ioctl(s, KVM_SET_IDENTITY_MAP_ADDR, &addr);
> +    ret = kvm_vm_ioctl(KVM_SET_IDENTITY_MAP_ADDR, &addr);
>       if (ret < 0) {
>           fprintf(stderr, "kvm_set_identity_map_addr: %s\n", strerror(ret));
>           return ret;
> @@ -527,12 +526,12 @@ static int kvm_init_identity_map_page(KVMState *s)
>       return 0;
>   }
>
> -int kvm_arch_init(KVMState *s, int smp_cpus)
> +int kvm_arch_init(int smp_cpus)
>   {
>       int ret;
>       struct utsname utsname;
>
> -    ret = kvm_get_supported_msrs(s);
> +    ret = kvm_get_supported_msrs();
>       if (ret < 0) {
>           return ret;
>       }
> @@ -546,7 +545,7 @@ int kvm_arch_init(KVMState *s, int smp_cpus)
>        * versions of KVM just assumed that it would be at the end of physical
>        * memory but that doesn't work with more than 4GB of memory.  We simply
>        * refuse to work with those older versions of KVM. */
> -    ret = kvm_check_extension(s, KVM_CAP_SET_TSS_ADDR);
> +    ret = kvm_check_extension(KVM_CAP_SET_TSS_ADDR);
>       if (ret <= 0) {
>           fprintf(stderr, "kvm does not support KVM_CAP_SET_TSS_ADDR\n");
>           return ret;
> @@ -563,12 +562,12 @@ int kvm_arch_init(KVMState *s, int smp_cpus)
>           perror("e820_add_entry() table is full");
>           exit(1);
>       }
> -    ret = kvm_vm_ioctl(s, KVM_SET_TSS_ADDR, 0xfffbd000);
> +    ret = kvm_vm_ioctl(KVM_SET_TSS_ADDR, 0xfffbd000);
>       if (ret < 0) {
>           return ret;
>       }
>
> -    return kvm_init_identity_map_page(s);
> +    return kvm_init_identity_map_page();
>   }
>
>   static void set_v8086_seg(struct kvm_segment *lhs, const SegmentCache *rhs)
> @@ -1861,7 +1860,7 @@ int kvm_on_sigbus_vcpu(CPUState *env, int code, void *addr)
>               || code == BUS_MCEERR_AO)) {
>           vaddr = (void *)addr;
>           if (qemu_ram_addr_from_host(vaddr, &ram_addr) ||
> -            !kvm_physical_memory_addr_from_ram(env->kvm_state, ram_addr, &paddr)) {
> +            !kvm_physical_memory_addr_from_ram(ram_addr, &paddr)) {
>               fprintf(stderr, "Hardware memory error for memory used by "
>                       "QEMU itself instead of guest system!\n");
>               /* Hope we are lucky for AO MCE */
> @@ -1910,7 +1909,7 @@ int kvm_on_sigbus(int code, void *addr)
>           /* Hope we are lucky for AO MCE */
>           vaddr = addr;
>           if (qemu_ram_addr_from_host(vaddr, &ram_addr) ||
> -            !kvm_physical_memory_addr_from_ram(first_cpu->kvm_state, ram_addr, &paddr)) {
> +            !kvm_physical_memory_addr_from_ram(ram_addr, &paddr)) {
>               fprintf(stderr, "Hardware memory error for memory used by "
>                       "QEMU itself instead of guest system!: %p\n", addr);
>               return 0;
> diff --git a/target-i386/kvm_x86.h b/target-i386/kvm_x86.h
> index 9d7b584..304d0cb 100644
> --- a/target-i386/kvm_x86.h
> +++ b/target-i386/kvm_x86.h
> @@ -22,4 +22,7 @@ void kvm_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
>                           uint64_t mcg_status, uint64_t addr, uint64_t misc,
>                           int flag);
>
> +uint32_t kvm_x86_get_supported_cpuid(uint32_t function, uint32_t index,
> +                                     int reg);
> +
>   #endif
> diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
> index 849b404..56d30cc 100644
> --- a/target-ppc/kvm.c
> +++ b/target-ppc/kvm.c
> @@ -56,13 +56,13 @@ static void kvm_kick_env(void *env)
>       qemu_cpu_kick(env);
>   }
>
> -int kvm_arch_init(KVMState *s, int smp_cpus)
> +int kvm_arch_init(int smp_cpus)
>   {
>   #ifdef KVM_CAP_PPC_UNSET_IRQ
> -    cap_interrupt_unset = kvm_check_extension(s, KVM_CAP_PPC_UNSET_IRQ);
> +    cap_interrupt_unset = kvm_check_extension(KVM_CAP_PPC_UNSET_IRQ);
>   #endif
>   #ifdef KVM_CAP_PPC_IRQ_LEVEL
> -    cap_interrupt_level = kvm_check_extension(s, KVM_CAP_PPC_IRQ_LEVEL);
> +    cap_interrupt_level = kvm_check_extension(KVM_CAP_PPC_IRQ_LEVEL);
>   #endif
>
>       if (!cap_interrupt_level) {
> @@ -164,7 +164,7 @@ int kvm_arch_get_registers(CPUState *env)
>           env->gpr[i] = regs.gpr[i];
>
>   #ifdef KVM_CAP_PPC_SEGSTATE
> -    if (kvm_check_extension(env->kvm_state, KVM_CAP_PPC_SEGSTATE)) {
> +    if (kvm_check_extension(KVM_CAP_PPC_SEGSTATE)) {
>           env->sdr1 = sregs.u.s.sdr1;
>
>           /* Sync SLB */
> @@ -371,8 +371,8 @@ int kvmppc_get_hypercall(CPUState *env, uint8_t *buf, int buf_len)
>   #ifdef KVM_CAP_PPC_GET_PVINFO
>       struct kvm_ppc_pvinfo pvinfo;
>
> -    if (kvm_check_extension(env->kvm_state, KVM_CAP_PPC_GET_PVINFO) &&
> -        !kvm_vm_ioctl(env->kvm_state, KVM_PPC_GET_PVINFO, &pvinfo)) {
> +    if (kvm_check_extension(KVM_CAP_PPC_GET_PVINFO) &&
> +        !kvm_vm_ioctl(KVM_PPC_GET_PVINFO, &pvinfo)) {
>           memcpy(buf, pvinfo.hcall, buf_len);
>
>           return 0;
> diff --git a/target-s390x/kvm.c b/target-s390x/kvm.c
> index adf4a9e..927a37e 100644
> --- a/target-s390x/kvm.c
> +++ b/target-s390x/kvm.c
> @@ -70,7 +70,7 @@
>   #define SCLP_CMDW_READ_SCP_INFO         0x00020001
>   #define SCLP_CMDW_READ_SCP_INFO_FORCED  0x00120001
>
> -int kvm_arch_init(KVMState *s, int smp_cpus)
> +int kvm_arch_init(int smp_cpus)
>   {
>       return 0;
>   }
> @@ -186,10 +186,6 @@ static void kvm_s390_interrupt_internal(CPUState *env, int type, uint32_t parm,
>       struct kvm_s390_interrupt kvmint;
>       int r;
>
> -    if (!env->kvm_state) {
> -        return;
> -    }
> -
>       env->halted = 0;
>       env->exception_index = -1;
>
> @@ -198,7 +194,7 @@ static void kvm_s390_interrupt_internal(CPUState *env, int type, uint32_t parm,
>       kvmint.parm64 = parm64;
>
>       if (vm) {
> -        r = kvm_vm_ioctl(env->kvm_state, KVM_S390_INTERRUPT, &kvmint);
> +        r = kvm_vm_ioctl(KVM_S390_INTERRUPT, &kvmint);
>       } else {
>           r = kvm_vcpu_ioctl(env, KVM_S390_INTERRUPT, &kvmint);
>       }
>    



* [Qemu-devel] Re: [PATCH 26/35] kvm: Eliminate KVMState arguments
@ 2011-01-06 19:24     ` Anthony Liguori
From: Anthony Liguori @ 2011-01-06 19:24 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Jan Kiszka, qemu-devel, kvm, Alexander Graf

On 01/06/2011 11:56 AM, Marcelo Tosatti wrote:
> From: Jan Kiszka<jan.kiszka@siemens.com>
>
> QEMU supports only one VM, so there is only one kvm_state per process,
> and we gain nothing passing a reference to it around. Eliminate any need
> to refer to it outside of kvm-all.c.
>
> Signed-off-by: Jan Kiszka<jan.kiszka@siemens.com>
> CC: Alexander Graf<agraf@suse.de>
> Signed-off-by: Marcelo Tosatti<mtosatti@redhat.com>
>    

I think this is a big mistake.

Having to manage kvm_state keeps the abstraction lines well defined.
Otherwise, it's far too easy for portions of code that really shouldn't
call into KVM functions to do so.
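
To make the trade-off concrete, here is a minimal C sketch. It is not taken
from QEMU: DemoState, demo_ioctl_explicit(), demo_ioctl_global(),
demo_global_call_count() and unrelated_device_model() are hypothetical names
standing in for KVMState and the kvm_*ioctl() wrappers. It only illustrates
that an explicit handle restricts callers to code that was given the state,
while a file-scope singleton can be reached by anything that sees the
prototype.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for KVMState; not QEMU's real structure. */
typedef struct DemoState {
    int fd;          /* pretend device handle */
    int ioctl_calls; /* counts calls so the effect is observable */
} DemoState;

/* Style A: explicit handle. A caller must have been handed a DemoState. */
int demo_ioctl_explicit(DemoState *s, int request)
{
    assert(s != NULL);
    s->ioctl_calls++;
    return request; /* echo the request in place of a real ioctl() */
}

/* Style B: one file-scope instance, similar in shape to what the patch does. */
static DemoState demo_state = { .fd = -1, .ioctl_calls = 0 };

int demo_ioctl_global(int request)
{
    demo_state.ioctl_calls++;
    return request;
}

/* Accessor so code outside this file can observe the singleton's counter. */
int demo_global_call_count(void)
{
    return demo_state.ioctl_calls;
}

/*
 * Code that was never given a DemoState. It has nothing to pass to
 * demo_ioctl_explicit(), but nothing stops it from using Style B.
 */
int unrelated_device_model(void)
{
    return demo_ioctl_global(42);
}
```

In the explicit style, unrelated_device_model() could only reach the wrapper
if some caller chose to hand it a DemoState; the singleton style gives up that
check in exchange for shorter signatures.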

Regards,

Anthony Liguori

> ---
>   cpu-defs.h            |    2 -
>   kvm-all.c             |  232 +++++++++++++++++++++----------------------------
>   kvm-stub.c            |    2 +-
>   kvm.h                 |   15 +--
>   target-i386/cpuid.c   |    9 +-
>   target-i386/kvm.c     |   77 ++++++++--------
>   target-i386/kvm_x86.h |    3 +
>   target-ppc/kvm.c      |   12 ++--
>   target-s390x/kvm.c    |    8 +--
>   9 files changed, 160 insertions(+), 200 deletions(-)
>
> diff --git a/cpu-defs.h b/cpu-defs.h
> index 8d4bf86..0e04239 100644
> --- a/cpu-defs.h
> +++ b/cpu-defs.h
> @@ -131,7 +131,6 @@ typedef struct icount_decr_u16 {
>   #endif
>
>   struct kvm_run;
> -struct KVMState;
>   struct qemu_work_item;
>
>   typedef struct CPUBreakpoint {
> @@ -207,7 +206,6 @@ typedef struct CPUWatchpoint {
>       struct QemuCond *halt_cond;                                         \
>       struct qemu_work_item *queued_work_first, *queued_work_last;        \
>       const char *cpu_model_str;                                          \
> -    struct KVMState *kvm_state;                                         \
>       struct kvm_run *kvm_run;                                            \
>       int kvm_fd;                                                         \
>       int kvm_vcpu_dirty;
> diff --git a/kvm-all.c b/kvm-all.c
> index ef2ca3b..d8820c7 100644
> --- a/kvm-all.c
> +++ b/kvm-all.c
> @@ -52,8 +52,7 @@ typedef struct KVMSlot
>
>   typedef struct kvm_dirty_log KVMDirtyLog;
>
> -struct KVMState
> -{
> +static struct KVMState {
>       KVMSlot slots[32];
>       int fd;
>       int vmfd;
> @@ -72,21 +71,19 @@ struct KVMState
>       int irqchip_in_kernel;
>       int pit_in_kernel;
>       int xsave, xcrs;
> -};
> -
> -static KVMState *kvm_state;
> +} kvm_state;
>
> -static KVMSlot *kvm_alloc_slot(KVMState *s)
> +static KVMSlot *kvm_alloc_slot(void)
>   {
>       int i;
>
> -    for (i = 0; i < ARRAY_SIZE(s->slots); i++) {
> +    for (i = 0; i < ARRAY_SIZE(kvm_state.slots); i++) {
>           /* KVM private memory slots */
>           if (i >= 8 && i < 12) {
>               continue;
>           }
> -        if (s->slots[i].memory_size == 0) {
> -            return &s->slots[i];
> +        if (kvm_state.slots[i].memory_size == 0) {
> +            return &kvm_state.slots[i];
>           }
>       }
>
> @@ -94,14 +91,13 @@ static KVMSlot *kvm_alloc_slot(KVMState *s)
>       abort();
>   }
>
> -static KVMSlot *kvm_lookup_matching_slot(KVMState *s,
> -                                         target_phys_addr_t start_addr,
> +static KVMSlot *kvm_lookup_matching_slot(target_phys_addr_t start_addr,
>                                            target_phys_addr_t end_addr)
>   {
>       int i;
>
> -    for (i = 0; i < ARRAY_SIZE(s->slots); i++) {
> -        KVMSlot *mem = &s->slots[i];
> +    for (i = 0; i < ARRAY_SIZE(kvm_state.slots); i++) {
> +        KVMSlot *mem = &kvm_state.slots[i];
>
>           if (start_addr == mem->start_addr &&
>               end_addr == mem->start_addr + mem->memory_size) {
> @@ -115,15 +111,14 @@ static KVMSlot *kvm_lookup_matching_slot(KVMState *s,
>   /*
>    * Find overlapping slot with lowest start address
>    */
> -static KVMSlot *kvm_lookup_overlapping_slot(KVMState *s,
> -                                            target_phys_addr_t start_addr,
> +static KVMSlot *kvm_lookup_overlapping_slot(target_phys_addr_t start_addr,
>                                               target_phys_addr_t end_addr)
>   {
>       KVMSlot *found = NULL;
>       int i;
>
> -    for (i = 0; i < ARRAY_SIZE(s->slots); i++) {
> -        KVMSlot *mem = &s->slots[i];
> +    for (i = 0; i < ARRAY_SIZE(kvm_state.slots); i++) {
> +        KVMSlot *mem = &kvm_state.slots[i];
>
>           if (mem->memory_size == 0 ||
>               (found && found->start_addr < mem->start_addr)) {
> @@ -139,13 +134,13 @@ static KVMSlot *kvm_lookup_overlapping_slot(KVMState *s,
>       return found;
>   }
>
> -int kvm_physical_memory_addr_from_ram(KVMState *s, ram_addr_t ram_addr,
> +int kvm_physical_memory_addr_from_ram(ram_addr_t ram_addr,
>                                         target_phys_addr_t *phys_addr)
>   {
>       int i;
>
> -    for (i = 0; i < ARRAY_SIZE(s->slots); i++) {
> -        KVMSlot *mem = &s->slots[i];
> +    for (i = 0; i < ARRAY_SIZE(kvm_state.slots); i++) {
> +        KVMSlot *mem = &kvm_state.slots[i];
>
>           if (ram_addr >= mem->phys_offset &&
>               ram_addr < mem->phys_offset + mem->memory_size) {
> @@ -157,7 +152,7 @@ int kvm_physical_memory_addr_from_ram(KVMState *s, ram_addr_t ram_addr,
>       return 0;
>   }
>
> -static int kvm_set_user_memory_region(KVMState *s, KVMSlot *slot)
> +static int kvm_set_user_memory_region(KVMSlot *slot)
>   {
>       struct kvm_userspace_memory_region mem;
>
> @@ -166,10 +161,10 @@ static int kvm_set_user_memory_region(KVMState *s, KVMSlot *slot)
>       mem.memory_size = slot->memory_size;
>       mem.userspace_addr = (unsigned long)qemu_safe_ram_ptr(slot->phys_offset);
>       mem.flags = slot->flags;
> -    if (s->migration_log) {
> +    if (kvm_state.migration_log) {
>           mem.flags |= KVM_MEM_LOG_DIRTY_PAGES;
>       }
> -    return kvm_vm_ioctl(s, KVM_SET_USER_MEMORY_REGION, &mem);
> +    return kvm_vm_ioctl(KVM_SET_USER_MEMORY_REGION, &mem);
>   }
>
>   static void kvm_reset_vcpu(void *opaque)
> @@ -181,33 +176,31 @@ static void kvm_reset_vcpu(void *opaque)
>
>   int kvm_irqchip_in_kernel(void)
>   {
> -    return kvm_state->irqchip_in_kernel;
> +    return kvm_state.irqchip_in_kernel;
>   }
>
>   int kvm_pit_in_kernel(void)
>   {
> -    return kvm_state->pit_in_kernel;
> +    return kvm_state.pit_in_kernel;
>   }
>
>
>   int kvm_init_vcpu(CPUState *env)
>   {
> -    KVMState *s = kvm_state;
>       long mmap_size;
>       int ret;
>
>       DPRINTF("kvm_init_vcpu\n");
>
> -    ret = kvm_vm_ioctl(s, KVM_CREATE_VCPU, env->cpu_index);
> +    ret = kvm_vm_ioctl(KVM_CREATE_VCPU, env->cpu_index);
>       if (ret < 0) {
>           DPRINTF("kvm_create_vcpu failed\n");
>           goto err;
>       }
>
>       env->kvm_fd = ret;
> -    env->kvm_state = s;
>
> -    mmap_size = kvm_ioctl(s, KVM_GET_VCPU_MMAP_SIZE, 0);
> +    mmap_size = kvm_ioctl(KVM_GET_VCPU_MMAP_SIZE, 0);
>       if (mmap_size < 0) {
>           DPRINTF("KVM_GET_VCPU_MMAP_SIZE failed\n");
>           goto err;
> @@ -222,9 +215,9 @@ int kvm_init_vcpu(CPUState *env)
>       }
>
>   #ifdef KVM_CAP_COALESCED_MMIO
> -    if (s->coalesced_mmio && !s->coalesced_mmio_ring) {
> -        s->coalesced_mmio_ring =
> -            (void *)env->kvm_run + s->coalesced_mmio * PAGE_SIZE;
> +    if (kvm_state.coalesced_mmio && !kvm_state.coalesced_mmio_ring) {
> +        kvm_state.coalesced_mmio_ring =
> +            (void *)env->kvm_run + kvm_state.coalesced_mmio * PAGE_SIZE;
>       }
>   #endif
>
> @@ -243,8 +236,7 @@ err:
>   static int kvm_dirty_pages_log_change(target_phys_addr_t phys_addr,
>                                         ram_addr_t size, int flags, int mask)
>   {
> -    KVMState *s = kvm_state;
> -    KVMSlot *mem = kvm_lookup_matching_slot(s, phys_addr, phys_addr + size);
> +    KVMSlot *mem = kvm_lookup_matching_slot(phys_addr, phys_addr + size);
>       int old_flags;
>
>       if (mem == NULL)  {
> @@ -260,14 +252,14 @@ static int kvm_dirty_pages_log_change(target_phys_addr_t phys_addr,
>       mem->flags = flags;
>
>       /* If nothing changed effectively, no need to issue ioctl */
> -    if (s->migration_log) {
> +    if (kvm_state.migration_log) {
>           flags |= KVM_MEM_LOG_DIRTY_PAGES;
>       }
>       if (flags == old_flags) {
>               return 0;
>       }
>
> -    return kvm_set_user_memory_region(s, mem);
> +    return kvm_set_user_memory_region(mem);
>   }
>
>   int kvm_log_start(target_phys_addr_t phys_addr, ram_addr_t size)
> @@ -284,14 +276,13 @@ int kvm_log_stop(target_phys_addr_t phys_addr, ram_addr_t size)
>
>   static int kvm_set_migration_log(int enable)
>   {
> -    KVMState *s = kvm_state;
>       KVMSlot *mem;
>       int i, err;
>
> -    s->migration_log = enable;
> +    kvm_state.migration_log = enable;
>
> -    for (i = 0; i < ARRAY_SIZE(s->slots); i++) {
> -        mem = &s->slots[i];
> +    for (i = 0; i < ARRAY_SIZE(kvm_state.slots); i++) {
> +        mem = &kvm_state.slots[i];
>
>           if (!mem->memory_size) {
>               continue;
> @@ -299,7 +290,7 @@ static int kvm_set_migration_log(int enable)
>           if (!!(mem->flags & KVM_MEM_LOG_DIRTY_PAGES) == enable) {
>               continue;
>           }
> -        err = kvm_set_user_memory_region(s, mem);
> +        err = kvm_set_user_memory_region(mem);
>           if (err) {
>               return err;
>           }
> @@ -353,7 +344,6 @@ static int kvm_get_dirty_pages_log_range(unsigned long start_addr,
>   static int kvm_physical_sync_dirty_bitmap(target_phys_addr_t start_addr,
>                                             target_phys_addr_t end_addr)
>   {
> -    KVMState *s = kvm_state;
>       unsigned long size, allocated_size = 0;
>       KVMDirtyLog d;
>       KVMSlot *mem;
> @@ -361,7 +351,7 @@ static int kvm_physical_sync_dirty_bitmap(target_phys_addr_t start_addr,
>
>       d.dirty_bitmap = NULL;
>       while (start_addr < end_addr) {
> -        mem = kvm_lookup_overlapping_slot(s, start_addr, end_addr);
> +        mem = kvm_lookup_overlapping_slot(start_addr, end_addr);
>           if (mem == NULL) {
>               break;
>           }
> @@ -377,7 +367,7 @@ static int kvm_physical_sync_dirty_bitmap(target_phys_addr_t start_addr,
>
>           d.slot = mem->slot;
>
> -        if (kvm_vm_ioctl(s, KVM_GET_DIRTY_LOG, &d) == -1) {
> +        if (kvm_vm_ioctl(KVM_GET_DIRTY_LOG, &d) == -1) {
>               DPRINTF("ioctl failed %d\n", errno);
>               ret = -1;
>               break;
> @@ -395,16 +385,15 @@ static int kvm_physical_sync_dirty_bitmap(target_phys_addr_t start_addr,
>   int kvm_coalesce_mmio_region(target_phys_addr_t start, ram_addr_t size)
>   {
>       int ret = -ENOSYS;
> -#ifdef KVM_CAP_COALESCED_MMIO
> -    KVMState *s = kvm_state;
>
> -    if (s->coalesced_mmio) {
> +#ifdef KVM_CAP_COALESCED_MMIO
> +    if (kvm_state.coalesced_mmio) {
>           struct kvm_coalesced_mmio_zone zone;
>
>           zone.addr = start;
>           zone.size = size;
>
> -        ret = kvm_vm_ioctl(s, KVM_REGISTER_COALESCED_MMIO, &zone);
> +        ret = kvm_vm_ioctl(KVM_REGISTER_COALESCED_MMIO, &zone);
>       }
>   #endif
>
> @@ -414,27 +403,26 @@ int kvm_coalesce_mmio_region(target_phys_addr_t start, ram_addr_t size)
>   int kvm_uncoalesce_mmio_region(target_phys_addr_t start, ram_addr_t size)
>   {
>       int ret = -ENOSYS;
> -#ifdef KVM_CAP_COALESCED_MMIO
> -    KVMState *s = kvm_state;
>
> -    if (s->coalesced_mmio) {
> +#ifdef KVM_CAP_COALESCED_MMIO
> +    if (kvm_state.coalesced_mmio) {
>           struct kvm_coalesced_mmio_zone zone;
>
>           zone.addr = start;
>           zone.size = size;
>
> -        ret = kvm_vm_ioctl(s, KVM_UNREGISTER_COALESCED_MMIO, &zone);
> +        ret = kvm_vm_ioctl(KVM_UNREGISTER_COALESCED_MMIO, &zone);
>       }
>   #endif
>
>       return ret;
>   }
>
> -int kvm_check_extension(KVMState *s, unsigned int extension)
> +int kvm_check_extension(unsigned int extension)
>   {
>       int ret;
>
> -    ret = kvm_ioctl(s, KVM_CHECK_EXTENSION, extension);
> +    ret = kvm_ioctl(KVM_CHECK_EXTENSION, extension);
>       if (ret < 0) {
>           ret = 0;
>       }
> @@ -445,7 +433,6 @@ int kvm_check_extension(KVMState *s, unsigned int extension)
>   static void kvm_set_phys_mem(target_phys_addr_t start_addr, ram_addr_t size,
>                                ram_addr_t phys_offset)
>   {
> -    KVMState *s = kvm_state;
>       ram_addr_t flags = phys_offset & ~TARGET_PAGE_MASK;
>       KVMSlot *mem, old;
>       int err;
> @@ -459,7 +446,7 @@ static void kvm_set_phys_mem(target_phys_addr_t start_addr, ram_addr_t size,
>       phys_offset &= ~IO_MEM_ROM;
>
>       while (1) {
> -        mem = kvm_lookup_overlapping_slot(s, start_addr, start_addr + size);
> +        mem = kvm_lookup_overlapping_slot(start_addr, start_addr + size);
>           if (!mem) {
>               break;
>           }
> @@ -476,7 +463,7 @@ static void kvm_set_phys_mem(target_phys_addr_t start_addr, ram_addr_t size,
>
>           /* unregister the overlapping slot */
>           mem->memory_size = 0;
> -        err = kvm_set_user_memory_region(s, mem);
> +        err = kvm_set_user_memory_region(mem);
>           if (err) {
>               fprintf(stderr, "%s: error unregistering overlapping slot: %s\n",
>                       __func__, strerror(-err));
> @@ -491,16 +478,16 @@ static void kvm_set_phys_mem(target_phys_addr_t start_addr, ram_addr_t size,
>            * address as the first existing one. If not or if some overlapping
>            * slot comes around later, we will fail (not seen in practice so far)
>            * - and actually require a recent KVM version. */
> -        if (s->broken_set_mem_region &&
> +        if (kvm_state.broken_set_mem_region &&
>               old.start_addr == start_addr && old.memory_size < size &&
>               flags < IO_MEM_UNASSIGNED) {
> -            mem = kvm_alloc_slot(s);
> +            mem = kvm_alloc_slot();
>               mem->memory_size = old.memory_size;
>               mem->start_addr = old.start_addr;
>               mem->phys_offset = old.phys_offset;
>               mem->flags = 0;
>
> -            err = kvm_set_user_memory_region(s, mem);
> +            err = kvm_set_user_memory_region(mem);
>               if (err) {
>                   fprintf(stderr, "%s: error updating slot: %s\n", __func__,
>                           strerror(-err));
> @@ -515,13 +502,13 @@ static void kvm_set_phys_mem(target_phys_addr_t start_addr, ram_addr_t size,
>
>           /* register prefix slot */
>           if (old.start_addr<  start_addr) {
> -            mem = kvm_alloc_slot(s);
> +            mem = kvm_alloc_slot();
>               mem->memory_size = start_addr - old.start_addr;
>               mem->start_addr = old.start_addr;
>               mem->phys_offset = old.phys_offset;
>               mem->flags = 0;
>
> -            err = kvm_set_user_memory_region(s, mem);
> +            err = kvm_set_user_memory_region(mem);
>               if (err) {
>                   fprintf(stderr, "%s: error registering prefix slot: %s\n",
>                           __func__, strerror(-err));
> @@ -533,14 +520,14 @@ static void kvm_set_phys_mem(target_phys_addr_t start_addr, ram_addr_t size,
>           if (old.start_addr + old.memory_size>  start_addr + size) {
>               ram_addr_t size_delta;
>
> -            mem = kvm_alloc_slot(s);
> +            mem = kvm_alloc_slot();
>               mem->start_addr = start_addr + size;
>               size_delta = mem->start_addr - old.start_addr;
>               mem->memory_size = old.memory_size - size_delta;
>               mem->phys_offset = old.phys_offset + size_delta;
>               mem->flags = 0;
>
> -            err = kvm_set_user_memory_region(s, mem);
> +            err = kvm_set_user_memory_region(mem);
>               if (err) {
>                   fprintf(stderr, "%s: error registering suffix slot: %s\n",
>                           __func__, strerror(-err));
> @@ -557,13 +544,13 @@ static void kvm_set_phys_mem(target_phys_addr_t start_addr, ram_addr_t size,
>       if (flags>= IO_MEM_UNASSIGNED) {
>           return;
>       }
> -    mem = kvm_alloc_slot(s);
> +    mem = kvm_alloc_slot();
>       mem->memory_size = size;
>       mem->start_addr = start_addr;
>       mem->phys_offset = phys_offset;
>       mem->flags = 0;
>
> -    err = kvm_set_user_memory_region(s, mem);
> +    err = kvm_set_user_memory_region(mem);
>       if (err) {
>           fprintf(stderr, "%s: error registering slot: %s\n", __func__,
>                   strerror(-err));
> @@ -602,27 +589,24 @@ int kvm_init(int smp_cpus)
>       static const char upgrade_note[] =
>           "Please upgrade to at least kernel 2.6.29 or recent kvm-kmod\n"
>           "(see http://sourceforge.net/projects/kvm).\n";
> -    KVMState *s;
>       int ret;
>       int i;
>
> -    s = qemu_mallocz(sizeof(KVMState));
> -
>   #ifdef KVM_CAP_SET_GUEST_DEBUG
> -    QTAILQ_INIT(&s->kvm_sw_breakpoints);
> +    QTAILQ_INIT(&kvm_state.kvm_sw_breakpoints);
>   #endif
> -    for (i = 0; i<  ARRAY_SIZE(s->slots); i++) {
> -        s->slots[i].slot = i;
> +    for (i = 0; i<  ARRAY_SIZE(kvm_state.slots); i++) {
> +        kvm_state.slots[i].slot = i;
>       }
> -    s->vmfd = -1;
> -    s->fd = qemu_open("/dev/kvm", O_RDWR);
> -    if (s->fd == -1) {
> +    kvm_state.vmfd = -1;
> +    kvm_state.fd = qemu_open("/dev/kvm", O_RDWR);
> +    if (kvm_state.fd == -1) {
>           fprintf(stderr, "Could not access KVM kernel module: %m\n");
>           ret = -errno;
>           goto err;
>       }
>
> -    ret = kvm_ioctl(s, KVM_GET_API_VERSION, 0);
> +    ret = kvm_ioctl(KVM_GET_API_VERSION, 0);
>       if (ret<  KVM_API_VERSION) {
>           if (ret>  0) {
>               ret = -EINVAL;
> @@ -637,8 +621,8 @@ int kvm_init(int smp_cpus)
>           goto err;
>       }
>
> -    s->vmfd = kvm_ioctl(s, KVM_CREATE_VM, 0);
> -    if (s->vmfd<  0) {
> +    kvm_state.vmfd = kvm_ioctl(KVM_CREATE_VM, 0);
> +    if (kvm_state.vmfd<  0) {
>   #ifdef TARGET_S390X
>           fprintf(stderr, "Please add the 'switch_amode' kernel parameter to "
>                           "your host kernel command line\n");
> @@ -651,7 +635,7 @@ int kvm_init(int smp_cpus)
>        * just use a user allocated buffer so we can use regular pages
>        * unmodified.  Make sure we have a sufficiently modern version of KVM.
>        */
> -    if (!kvm_check_extension(s, KVM_CAP_USER_MEMORY)) {
> +    if (!kvm_check_extension(KVM_CAP_USER_MEMORY)) {
>           ret = -EINVAL;
>           fprintf(stderr, "kvm does not support KVM_CAP_USER_MEMORY\n%s",
>                   upgrade_note);
> @@ -661,7 +645,7 @@ int kvm_init(int smp_cpus)
>       /* There was a nasty bug in<  kvm-80 that prevents memory slots from being
>        * destroyed properly.  Since we rely on this capability, refuse to work
>        * with any kernel without this capability. */
> -    if (!kvm_check_extension(s, KVM_CAP_DESTROY_MEMORY_REGION_WORKS)) {
> +    if (!kvm_check_extension(KVM_CAP_DESTROY_MEMORY_REGION_WORKS)) {
>           ret = -EINVAL;
>
>           fprintf(stderr,
> @@ -670,66 +654,55 @@ int kvm_init(int smp_cpus)
>           goto err;
>       }
>
> -    s->coalesced_mmio = 0;
>   #ifdef KVM_CAP_COALESCED_MMIO
> -    s->coalesced_mmio = kvm_check_extension(s, KVM_CAP_COALESCED_MMIO);
> -    s->coalesced_mmio_ring = NULL;
> +    kvm_state.coalesced_mmio = kvm_check_extension(KVM_CAP_COALESCED_MMIO);
>   #endif
>
> -    s->broken_set_mem_region = 1;
> +    kvm_state.broken_set_mem_region = 1;
>   #ifdef KVM_CAP_JOIN_MEMORY_REGIONS_WORKS
> -    ret = kvm_check_extension(s, KVM_CAP_JOIN_MEMORY_REGIONS_WORKS);
> +    ret = kvm_check_extension(KVM_CAP_JOIN_MEMORY_REGIONS_WORKS);
>       if (ret>  0) {
> -        s->broken_set_mem_region = 0;
> +        kvm_state.broken_set_mem_region = 0;
>       }
>   #endif
>
> -    s->vcpu_events = 0;
>   #ifdef KVM_CAP_VCPU_EVENTS
> -    s->vcpu_events = kvm_check_extension(s, KVM_CAP_VCPU_EVENTS);
> +    kvm_state.vcpu_events = kvm_check_extension(KVM_CAP_VCPU_EVENTS);
>   #endif
>
> -    s->robust_singlestep = 0;
>   #ifdef KVM_CAP_X86_ROBUST_SINGLESTEP
> -    s->robust_singlestep =
> -        kvm_check_extension(s, KVM_CAP_X86_ROBUST_SINGLESTEP);
> +    kvm_state.robust_singlestep =
> +        kvm_check_extension(KVM_CAP_X86_ROBUST_SINGLESTEP);
>   #endif
>
> -    s->debugregs = 0;
>   #ifdef KVM_CAP_DEBUGREGS
> -    s->debugregs = kvm_check_extension(s, KVM_CAP_DEBUGREGS);
> +    kvm_state.debugregs = kvm_check_extension(KVM_CAP_DEBUGREGS);
>   #endif
>
> -    s->xsave = 0;
>   #ifdef KVM_CAP_XSAVE
> -    s->xsave = kvm_check_extension(s, KVM_CAP_XSAVE);
> +    kvm_state.xsave = kvm_check_extension(KVM_CAP_XSAVE);
>   #endif
>
> -    s->xcrs = 0;
>   #ifdef KVM_CAP_XCRS
> -    s->xcrs = kvm_check_extension(s, KVM_CAP_XCRS);
> +    kvm_state.xcrs = kvm_check_extension(KVM_CAP_XCRS);
>   #endif
>
> -    ret = kvm_arch_init(s, smp_cpus);
> +    ret = kvm_arch_init(smp_cpus);
>       if (ret<  0) {
>           goto err;
>       }
>
> -    kvm_state = s;
>       cpu_register_phys_memory_client(&kvm_cpu_phys_memory_client);
>
>       return 0;
>
>   err:
> -    if (s) {
> -        if (s->vmfd != -1) {
> -            close(s->vmfd);
> -        }
> -        if (s->fd != -1) {
> -            close(s->fd);
> -        }
> +    if (kvm_state.vmfd != -1) {
> +        close(kvm_state.vmfd);
> +    }
> +    if (kvm_state.fd != -1) {
> +        close(kvm_state.fd);
>       }
> -    qemu_free(s);
>
>       return ret;
>   }
> @@ -777,7 +750,7 @@ static int kvm_handle_io(uint16_t port, void *data, int direction, int size,
>   static int kvm_handle_internal_error(CPUState *env, struct kvm_run *run)
>   {
>       fprintf(stderr, "KVM internal error.");
> -    if (kvm_check_extension(kvm_state, KVM_CAP_INTERNAL_ERROR_DATA)) {
> +    if (kvm_check_extension(KVM_CAP_INTERNAL_ERROR_DATA)) {
>           int i;
>
>           fprintf(stderr, " Suberror: %d\n", run->internal.suberror);
> @@ -805,9 +778,8 @@ static int kvm_handle_internal_error(CPUState *env, struct kvm_run *run)
>   void kvm_flush_coalesced_mmio_buffer(void)
>   {
>   #ifdef KVM_CAP_COALESCED_MMIO
> -    KVMState *s = kvm_state;
> -    if (s->coalesced_mmio_ring) {
> -        struct kvm_coalesced_mmio_ring *ring = s->coalesced_mmio_ring;
> +    if (kvm_state.coalesced_mmio_ring) {
> +        struct kvm_coalesced_mmio_ring *ring = kvm_state.coalesced_mmio_ring;
>           while (ring->first != ring->last) {
>               struct kvm_coalesced_mmio *ent;
>
> @@ -963,7 +935,7 @@ void kvm_cpu_exec(CPUState *env)
>       }
>   }
>
> -int kvm_ioctl(KVMState *s, int type, ...)
> +int kvm_ioctl(int type, ...)
>   {
>       int ret;
>       void *arg;
> @@ -973,14 +945,14 @@ int kvm_ioctl(KVMState *s, int type, ...)
>       arg = va_arg(ap, void *);
>       va_end(ap);
>
> -    ret = ioctl(s->fd, type, arg);
> +    ret = ioctl(kvm_state.fd, type, arg);
>       if (ret == -1) {
>           ret = -errno;
>       }
>       return ret;
>   }
>
> -int kvm_vm_ioctl(KVMState *s, int type, ...)
> +int kvm_vm_ioctl(int type, ...)
>   {
>       int ret;
>       void *arg;
> @@ -990,7 +962,7 @@ int kvm_vm_ioctl(KVMState *s, int type, ...)
>       arg = va_arg(ap, void *);
>       va_end(ap);
>
> -    ret = ioctl(s->vmfd, type, arg);
> +    ret = ioctl(kvm_state.vmfd, type, arg);
>       if (ret == -1) {
>           ret = -errno;
>       }
> @@ -1017,9 +989,7 @@ int kvm_vcpu_ioctl(CPUState *env, int type, ...)
>   int kvm_has_sync_mmu(void)
>   {
>   #ifdef KVM_CAP_SYNC_MMU
> -    KVMState *s = kvm_state;
> -
> -    return kvm_check_extension(s, KVM_CAP_SYNC_MMU);
> +    return kvm_check_extension(KVM_CAP_SYNC_MMU);
>   #else
>       return 0;
>   #endif
> @@ -1027,27 +997,27 @@ int kvm_has_sync_mmu(void)
>
>   int kvm_has_vcpu_events(void)
>   {
> -    return kvm_state->vcpu_events;
> +    return kvm_state.vcpu_events;
>   }
>
>   int kvm_has_robust_singlestep(void)
>   {
> -    return kvm_state->robust_singlestep;
> +    return kvm_state.robust_singlestep;
>   }
>
>   int kvm_has_debugregs(void)
>   {
> -    return kvm_state->debugregs;
> +    return kvm_state.debugregs;
>   }
>
>   int kvm_has_xsave(void)
>   {
> -    return kvm_state->xsave;
> +    return kvm_state.xsave;
>   }
>
>   int kvm_has_xcrs(void)
>   {
> -    return kvm_state->xcrs;
> +    return kvm_state.xcrs;
>   }
>
>   void kvm_setup_guest_memory(void *start, size_t size)
> @@ -1070,7 +1040,7 @@ struct kvm_sw_breakpoint *kvm_find_sw_breakpoint(CPUState *env,
>   {
>       struct kvm_sw_breakpoint *bp;
>
> -    QTAILQ_FOREACH(bp,&env->kvm_state->kvm_sw_breakpoints, entry) {
> +    QTAILQ_FOREACH(bp,&kvm_state.kvm_sw_breakpoints, entry) {
>           if (bp->pc == pc) {
>               return bp;
>           }
> @@ -1080,7 +1050,7 @@ struct kvm_sw_breakpoint *kvm_find_sw_breakpoint(CPUState *env,
>
>   int kvm_sw_breakpoints_active(CPUState *env)
>   {
> -    return !QTAILQ_EMPTY(&env->kvm_state->kvm_sw_breakpoints);
> +    return !QTAILQ_EMPTY(&kvm_state.kvm_sw_breakpoints);
>   }
>
>   struct kvm_set_guest_debug_data {
> @@ -1140,8 +1110,7 @@ int kvm_insert_breakpoint(CPUState *current_env, target_ulong addr,
>               return err;
>           }
>
> -        QTAILQ_INSERT_HEAD(&current_env->kvm_state->kvm_sw_breakpoints,
> -                          bp, entry);
> +        QTAILQ_INSERT_HEAD(&kvm_state.kvm_sw_breakpoints, bp, entry);
>       } else {
>           err = kvm_arch_insert_hw_breakpoint(addr, len, type);
>           if (err) {
> @@ -1181,7 +1150,7 @@ int kvm_remove_breakpoint(CPUState *current_env, target_ulong addr,
>               return err;
>           }
>
> -        QTAILQ_REMOVE(&current_env->kvm_state->kvm_sw_breakpoints, bp, entry);
> +        QTAILQ_REMOVE(&kvm_state.kvm_sw_breakpoints, bp, entry);
>           qemu_free(bp);
>       } else {
>           err = kvm_arch_remove_hw_breakpoint(addr, len, type);
> @@ -1202,10 +1171,9 @@ int kvm_remove_breakpoint(CPUState *current_env, target_ulong addr,
>   void kvm_remove_all_breakpoints(CPUState *current_env)
>   {
>       struct kvm_sw_breakpoint *bp, *next;
> -    KVMState *s = current_env->kvm_state;
>       CPUState *env;
>
> -    QTAILQ_FOREACH_SAFE(bp,&s->kvm_sw_breakpoints, entry, next) {
> +    QTAILQ_FOREACH_SAFE(bp,&kvm_state.kvm_sw_breakpoints, entry, next) {
>           if (kvm_arch_remove_sw_breakpoint(current_env, bp) != 0) {
>               /* Try harder to find a CPU that currently sees the breakpoint. */
>               for (env = first_cpu; env != NULL; env = env->next_cpu) {
> @@ -1285,7 +1253,7 @@ int kvm_set_ioeventfd_mmio_long(int fd, uint32_t addr, uint32_t val, bool assign
>           iofd.flags |= KVM_IOEVENTFD_FLAG_DEASSIGN;
>       }
>
> -    ret = kvm_vm_ioctl(kvm_state, KVM_IOEVENTFD,&iofd);
> +    ret = kvm_vm_ioctl(KVM_IOEVENTFD,&iofd);
>
>       if (ret<  0) {
>           return -errno;
> @@ -1314,7 +1282,7 @@ int kvm_set_ioeventfd_pio_word(int fd, uint16_t addr, uint16_t val, bool assign)
>       if (!assign) {
>           kick.flags |= KVM_IOEVENTFD_FLAG_DEASSIGN;
>       }
> -    r = kvm_vm_ioctl(kvm_state, KVM_IOEVENTFD,&kick);
> +    r = kvm_vm_ioctl(KVM_IOEVENTFD,&kick);
>       if (r<  0) {
>           return r;
>       }
> diff --git a/kvm-stub.c b/kvm-stub.c
> index 352c6a6..3a058ad 100644
> --- a/kvm-stub.c
> +++ b/kvm-stub.c
> @@ -53,7 +53,7 @@ int kvm_uncoalesce_mmio_region(target_phys_addr_t start, ram_addr_t size)
>       return -ENOSYS;
>   }
>
> -int kvm_check_extension(KVMState *s, unsigned int extension)
> +int kvm_check_extension(unsigned int extension)
>   {
>       return 0;
>   }
> diff --git a/kvm.h b/kvm.h
> index 51ad56f..26ca8c1 100644
> --- a/kvm.h
> +++ b/kvm.h
> @@ -74,12 +74,9 @@ int kvm_irqchip_in_kernel(void);
>
>   /* internal API */
>
> -struct KVMState;
> -typedef struct KVMState KVMState;
> +int kvm_ioctl(int type, ...);
>
> -int kvm_ioctl(KVMState *s, int type, ...);
> -
> -int kvm_vm_ioctl(KVMState *s, int type, ...);
> +int kvm_vm_ioctl(int type, ...);
>
>   int kvm_vcpu_ioctl(CPUState *env, int type, ...);
>
> @@ -104,7 +101,7 @@ int kvm_arch_get_registers(CPUState *env);
>
>   int kvm_arch_put_registers(CPUState *env, int level);
>
> -int kvm_arch_init(KVMState *s, int smp_cpus);
> +int kvm_arch_init(int smp_cpus);
>
>   int kvm_arch_init_vcpu(CPUState *env);
>
> @@ -146,10 +143,8 @@ void kvm_arch_update_guest_debug(CPUState *env, struct kvm_guest_debug *dbg);
>
>   bool kvm_arch_stop_on_emulation_error(CPUState *env);
>
> -int kvm_check_extension(KVMState *s, unsigned int extension);
> +int kvm_check_extension(unsigned int extension);
>
> -uint32_t kvm_arch_get_supported_cpuid(CPUState *env, uint32_t function,
> -                                      uint32_t index, int reg);
>   void kvm_cpu_synchronize_state(CPUState *env);
>   void kvm_cpu_synchronize_post_reset(CPUState *env);
>   void kvm_cpu_synchronize_post_init(CPUState *env);
> @@ -179,7 +174,7 @@ static inline void cpu_synchronize_post_init(CPUState *env)
>
>
>   #if !defined(CONFIG_USER_ONLY)
> -int kvm_physical_memory_addr_from_ram(KVMState *s, ram_addr_t ram_addr,
> +int kvm_physical_memory_addr_from_ram(ram_addr_t ram_addr,
>                                         target_phys_addr_t *phys_addr);
>   #endif
>
> diff --git a/target-i386/cpuid.c b/target-i386/cpuid.c
> index 5382a28..17ab619 100644
> --- a/target-i386/cpuid.c
> +++ b/target-i386/cpuid.c
> @@ -23,6 +23,7 @@
>
>   #include "cpu.h"
>   #include "kvm.h"
> +#include "kvm_x86.h"
>
>   #include "qemu-option.h"
>   #include "qemu-config.h"
> @@ -1138,10 +1139,10 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
>               break;
>           }
>           if (kvm_enabled()) {
> -            *eax = kvm_arch_get_supported_cpuid(env, 0xd, count, R_EAX);
> -            *ebx = kvm_arch_get_supported_cpuid(env, 0xd, count, R_EBX);
> -            *ecx = kvm_arch_get_supported_cpuid(env, 0xd, count, R_ECX);
> -            *edx = kvm_arch_get_supported_cpuid(env, 0xd, count, R_EDX);
> +            *eax = kvm_x86_get_supported_cpuid(0xd, count, R_EAX);
> +            *ebx = kvm_x86_get_supported_cpuid(0xd, count, R_EBX);
> +            *ecx = kvm_x86_get_supported_cpuid(0xd, count, R_ECX);
> +            *edx = kvm_x86_get_supported_cpuid(0xd, count, R_EDX);
>           } else {
>               *eax = 0;
>               *ebx = 0;
> diff --git a/target-i386/kvm.c b/target-i386/kvm.c
> index 1789bff..cb6883f 100644
> --- a/target-i386/kvm.c
> +++ b/target-i386/kvm.c
> @@ -60,7 +60,7 @@ static int lm_capable_kernel;
>
>   #ifdef KVM_CAP_EXT_CPUID
>
> -static struct kvm_cpuid2 *try_get_cpuid(KVMState *s, int max)
> +static struct kvm_cpuid2 *try_get_cpuid(int max)
>   {
>       struct kvm_cpuid2 *cpuid;
>       int r, size;
> @@ -68,7 +68,7 @@ static struct kvm_cpuid2 *try_get_cpuid(KVMState *s, int max)
>       size = sizeof(*cpuid) + max * sizeof(*cpuid->entries);
>       cpuid = (struct kvm_cpuid2 *)qemu_mallocz(size);
>       cpuid->nent = max;
> -    r = kvm_ioctl(s, KVM_GET_SUPPORTED_CPUID, cpuid);
> +    r = kvm_ioctl(KVM_GET_SUPPORTED_CPUID, cpuid);
>       if (r == 0&&  cpuid->nent>= max) {
>           r = -E2BIG;
>       }
> @@ -85,20 +85,20 @@ static struct kvm_cpuid2 *try_get_cpuid(KVMState *s, int max)
>       return cpuid;
>   }
>
> -uint32_t kvm_arch_get_supported_cpuid(CPUState *env, uint32_t function,
> -                                      uint32_t index, int reg)
> +uint32_t kvm_x86_get_supported_cpuid(uint32_t function, uint32_t index,
> +                                     int reg)
>   {
>       struct kvm_cpuid2 *cpuid;
>       int i, max;
>       uint32_t ret = 0;
>       uint32_t cpuid_1_edx;
>
> -    if (!kvm_check_extension(env->kvm_state, KVM_CAP_EXT_CPUID)) {
> +    if (!kvm_check_extension(KVM_CAP_EXT_CPUID)) {
>           return -1U;
>       }
>
>       max = 1;
> -    while ((cpuid = try_get_cpuid(env->kvm_state, max)) == NULL) {
> +    while ((cpuid = try_get_cpuid(max)) == NULL) {
>           max *= 2;
>       }
>
> @@ -126,7 +126,7 @@ uint32_t kvm_arch_get_supported_cpuid(CPUState *env, uint32_t function,
>                       /* On Intel, kvm returns cpuid according to the Intel spec,
>                        * so add missing bits according to the AMD spec:
>                        */
> -                    cpuid_1_edx = kvm_arch_get_supported_cpuid(env, 1, 0, R_EDX);
> +                    cpuid_1_edx = kvm_x86_get_supported_cpuid(1, 0, R_EDX);
>                       ret |= cpuid_1_edx&  0x183f7ff;
>                       break;
>                   }
> @@ -142,8 +142,8 @@ uint32_t kvm_arch_get_supported_cpuid(CPUState *env, uint32_t function,
>
>   #else
>
> -uint32_t kvm_arch_get_supported_cpuid(CPUState *env, uint32_t function,
> -                                      uint32_t index, int reg)
> +uint32_t kvm_x86_get_supported_cpuid(uint32_t function, uint32_t index,
> +                                     int reg)
>   {
>       return -1U;
>   }
> @@ -170,12 +170,12 @@ struct kvm_para_features {
>       { -1, -1 }
>   };
>
> -static int get_para_features(CPUState *env)
> +static int get_para_features(void)
>   {
>       int i, features = 0;
>
>       for (i = 0; i<  ARRAY_SIZE(para_features) - 1; i++) {
> -        if (kvm_check_extension(env->kvm_state, para_features[i].cap)) {
> +        if (kvm_check_extension(para_features[i].cap)) {
>               features |= (1<<  para_features[i].feature);
>           }
>       }
> @@ -184,15 +184,14 @@ static int get_para_features(CPUState *env)
>   #endif
>
>   #ifdef KVM_CAP_MCE
> -static int kvm_get_mce_cap_supported(KVMState *s, uint64_t *mce_cap,
> -                                     int *max_banks)
> +static int kvm_get_mce_cap_supported(uint64_t *mce_cap, int *max_banks)
>   {
>       int r;
>
> -    r = kvm_check_extension(s, KVM_CAP_MCE);
> +    r = kvm_check_extension(KVM_CAP_MCE);
>       if (r>  0) {
>           *max_banks = r;
> -        return kvm_ioctl(s, KVM_X86_GET_MCE_CAP_SUPPORTED, mce_cap);
> +        return kvm_ioctl(KVM_X86_GET_MCE_CAP_SUPPORTED, mce_cap);
>       }
>       return -ENOSYS;
>   }
> @@ -323,18 +322,18 @@ int kvm_arch_init_vcpu(CPUState *env)
>       uint32_t signature[3];
>   #endif
>
> -    env->cpuid_features&= kvm_arch_get_supported_cpuid(env, 1, 0, R_EDX);
> +    env->cpuid_features&= kvm_x86_get_supported_cpuid(1, 0, R_EDX);
>
>       i = env->cpuid_ext_features&  CPUID_EXT_HYPERVISOR;
> -    env->cpuid_ext_features&= kvm_arch_get_supported_cpuid(env, 1, 0, R_ECX);
> +    env->cpuid_ext_features&= kvm_x86_get_supported_cpuid(1, 0, R_ECX);
>       env->cpuid_ext_features |= i;
>
> -    env->cpuid_ext2_features&= kvm_arch_get_supported_cpuid(env, 0x80000001,
> -                                                             0, R_EDX);
> -    env->cpuid_ext3_features&= kvm_arch_get_supported_cpuid(env, 0x80000001,
> -                                                             0, R_ECX);
> -    env->cpuid_svm_features&= kvm_arch_get_supported_cpuid(env, 0x8000000A,
> -                                                             0, R_EDX);
> +    env->cpuid_ext2_features&= kvm_x86_get_supported_cpuid(0x80000001,
> +                                                            0, R_EDX);
> +    env->cpuid_ext3_features&= kvm_x86_get_supported_cpuid(0x80000001,
> +                                                            0, R_ECX);
> +    env->cpuid_svm_features&= kvm_x86_get_supported_cpuid(0x8000000A,
> +                                                            0, R_EDX);
>
>
>       cpuid_i = 0;
> @@ -353,7 +352,7 @@ int kvm_arch_init_vcpu(CPUState *env)
>       c =&cpuid_data.entries[cpuid_i++];
>       memset(c, 0, sizeof(*c));
>       c->function = KVM_CPUID_FEATURES;
> -    c->eax = env->cpuid_kvm_features&  get_para_features(env);
> +    c->eax = env->cpuid_kvm_features&  get_para_features();
>   #endif
>
>       cpu_x86_cpuid(env, 0, 0,&limit,&unused,&unused,&unused);
> @@ -423,11 +422,11 @@ int kvm_arch_init_vcpu(CPUState *env)
>   #ifdef KVM_CAP_MCE
>       if (((env->cpuid_version>>  8)&0xF)>= 6
>           &&  (env->cpuid_features&(CPUID_MCE|CPUID_MCA)) == (CPUID_MCE|CPUID_MCA)
> -&&  kvm_check_extension(env->kvm_state, KVM_CAP_MCE)>  0) {
> +&&  kvm_check_extension(KVM_CAP_MCE)>  0) {
>           uint64_t mcg_cap;
>           int banks;
>
> -        if (kvm_get_mce_cap_supported(env->kvm_state,&mcg_cap,&banks)) {
> +        if (kvm_get_mce_cap_supported(&mcg_cap,&banks)) {
>               perror("kvm_get_mce_cap_supported FAILED");
>           } else {
>               if (banks>  MCE_BANKS_DEF)
> @@ -461,7 +460,7 @@ void kvm_arch_reset_vcpu(CPUState *env)
>       }
>   }
>
> -static int kvm_get_supported_msrs(KVMState *s)
> +static int kvm_get_supported_msrs(void)
>   {
>       static int kvm_supported_msrs;
>       int ret = 0;
> @@ -475,7 +474,7 @@ static int kvm_get_supported_msrs(KVMState *s)
>           /* Obtain MSR list from KVM.  These are the MSRs that we must
>            * save/restore */
>           msr_list.nmsrs = 0;
> -        ret = kvm_ioctl(s, KVM_GET_MSR_INDEX_LIST,&msr_list);
> +        ret = kvm_ioctl(KVM_GET_MSR_INDEX_LIST,&msr_list);
>           if (ret<  0&&  ret != -E2BIG) {
>               return ret;
>           }
> @@ -486,7 +485,7 @@ static int kvm_get_supported_msrs(KVMState *s)
>                                                 sizeof(msr_list.indices[0])));
>
>           kvm_msr_list->nmsrs = msr_list.nmsrs;
> -        ret = kvm_ioctl(s, KVM_GET_MSR_INDEX_LIST, kvm_msr_list);
> +        ret = kvm_ioctl(KVM_GET_MSR_INDEX_LIST, kvm_msr_list);
>           if (ret>= 0) {
>               int i;
>
> @@ -508,17 +507,17 @@ static int kvm_get_supported_msrs(KVMState *s)
>       return ret;
>   }
>
> -static int kvm_init_identity_map_page(KVMState *s)
> +static int kvm_init_identity_map_page(void)
>   {
>   #ifdef KVM_CAP_SET_IDENTITY_MAP_ADDR
>       int ret;
>       uint64_t addr = 0xfffbc000;
>
> -    if (!kvm_check_extension(s, KVM_CAP_SET_IDENTITY_MAP_ADDR)) {
> +    if (!kvm_check_extension(KVM_CAP_SET_IDENTITY_MAP_ADDR)) {
>           return 0;
>       }
>
> -    ret = kvm_vm_ioctl(s, KVM_SET_IDENTITY_MAP_ADDR,&addr);
> +    ret = kvm_vm_ioctl(KVM_SET_IDENTITY_MAP_ADDR,&addr);
>       if (ret<  0) {
>           fprintf(stderr, "kvm_set_identity_map_addr: %s\n", strerror(ret));
>           return ret;
> @@ -527,12 +526,12 @@ static int kvm_init_identity_map_page(KVMState *s)
>       return 0;
>   }
>
> -int kvm_arch_init(KVMState *s, int smp_cpus)
> +int kvm_arch_init(int smp_cpus)
>   {
>       int ret;
>       struct utsname utsname;
>
> -    ret = kvm_get_supported_msrs(s);
> +    ret = kvm_get_supported_msrs();
>       if (ret<  0) {
>           return ret;
>       }
> @@ -546,7 +545,7 @@ int kvm_arch_init(KVMState *s, int smp_cpus)
>        * versions of KVM just assumed that it would be at the end of physical
>        * memory but that doesn't work with more than 4GB of memory.  We simply
>        * refuse to work with those older versions of KVM. */
> -    ret = kvm_check_extension(s, KVM_CAP_SET_TSS_ADDR);
> +    ret = kvm_check_extension(KVM_CAP_SET_TSS_ADDR);
>       if (ret<= 0) {
>           fprintf(stderr, "kvm does not support KVM_CAP_SET_TSS_ADDR\n");
>           return ret;
> @@ -563,12 +562,12 @@ int kvm_arch_init(KVMState *s, int smp_cpus)
>           perror("e820_add_entry() table is full");
>           exit(1);
>       }
> -    ret = kvm_vm_ioctl(s, KVM_SET_TSS_ADDR, 0xfffbd000);
> +    ret = kvm_vm_ioctl(KVM_SET_TSS_ADDR, 0xfffbd000);
>       if (ret<  0) {
>           return ret;
>       }
>
> -    return kvm_init_identity_map_page(s);
> +    return kvm_init_identity_map_page();
>   }
>
>   static void set_v8086_seg(struct kvm_segment *lhs, const SegmentCache *rhs)
> @@ -1861,7 +1860,7 @@ int kvm_on_sigbus_vcpu(CPUState *env, int code, void *addr)
>               || code == BUS_MCEERR_AO)) {
>           vaddr = (void *)addr;
>           if (qemu_ram_addr_from_host(vaddr,&ram_addr) ||
> -            !kvm_physical_memory_addr_from_ram(env->kvm_state, ram_addr,&paddr)) {
> +            !kvm_physical_memory_addr_from_ram(ram_addr,&paddr)) {
>               fprintf(stderr, "Hardware memory error for memory used by "
>                       "QEMU itself instead of guest system!\n");
>               /* Hope we are lucky for AO MCE */
> @@ -1910,7 +1909,7 @@ int kvm_on_sigbus(int code, void *addr)
>           /* Hope we are lucky for AO MCE */
>           vaddr = addr;
>           if (qemu_ram_addr_from_host(vaddr,&ram_addr) ||
> -            !kvm_physical_memory_addr_from_ram(first_cpu->kvm_state, ram_addr,&paddr)) {
> +            !kvm_physical_memory_addr_from_ram(ram_addr,&paddr)) {
>               fprintf(stderr, "Hardware memory error for memory used by "
>                       "QEMU itself instead of guest system!: %p\n", addr);
>               return 0;
> diff --git a/target-i386/kvm_x86.h b/target-i386/kvm_x86.h
> index 9d7b584..304d0cb 100644
> --- a/target-i386/kvm_x86.h
> +++ b/target-i386/kvm_x86.h
> @@ -22,4 +22,7 @@ void kvm_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
>                           uint64_t mcg_status, uint64_t addr, uint64_t misc,
>                           int flag);
>
> +uint32_t kvm_x86_get_supported_cpuid(uint32_t function, uint32_t index,
> +                                     int reg);
> +
>   #endif
> diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
> index 849b404..56d30cc 100644
> --- a/target-ppc/kvm.c
> +++ b/target-ppc/kvm.c
> @@ -56,13 +56,13 @@ static void kvm_kick_env(void *env)
>       qemu_cpu_kick(env);
>   }
>
> -int kvm_arch_init(KVMState *s, int smp_cpus)
> +int kvm_arch_init(int smp_cpus)
>   {
>   #ifdef KVM_CAP_PPC_UNSET_IRQ
> -    cap_interrupt_unset = kvm_check_extension(s, KVM_CAP_PPC_UNSET_IRQ);
> +    cap_interrupt_unset = kvm_check_extension(KVM_CAP_PPC_UNSET_IRQ);
>   #endif
>   #ifdef KVM_CAP_PPC_IRQ_LEVEL
> -    cap_interrupt_level = kvm_check_extension(s, KVM_CAP_PPC_IRQ_LEVEL);
> +    cap_interrupt_level = kvm_check_extension(KVM_CAP_PPC_IRQ_LEVEL);
>   #endif
>
>       if (!cap_interrupt_level) {
> @@ -164,7 +164,7 @@ int kvm_arch_get_registers(CPUState *env)
>           env->gpr[i] = regs.gpr[i];
>
>   #ifdef KVM_CAP_PPC_SEGSTATE
> -    if (kvm_check_extension(env->kvm_state, KVM_CAP_PPC_SEGSTATE)) {
> +    if (kvm_check_extension(KVM_CAP_PPC_SEGSTATE)) {
>           env->sdr1 = sregs.u.s.sdr1;
>
>           /* Sync SLB */
> @@ -371,8 +371,8 @@ int kvmppc_get_hypercall(CPUState *env, uint8_t *buf, int buf_len)
>   #ifdef KVM_CAP_PPC_GET_PVINFO
>       struct kvm_ppc_pvinfo pvinfo;
>
> -    if (kvm_check_extension(env->kvm_state, KVM_CAP_PPC_GET_PVINFO)&&
> -        !kvm_vm_ioctl(env->kvm_state, KVM_PPC_GET_PVINFO,&pvinfo)) {
> +    if (kvm_check_extension(KVM_CAP_PPC_GET_PVINFO)&&
> +        !kvm_vm_ioctl(KVM_PPC_GET_PVINFO,&pvinfo)) {
>           memcpy(buf, pvinfo.hcall, buf_len);
>
>           return 0;
> diff --git a/target-s390x/kvm.c b/target-s390x/kvm.c
> index adf4a9e..927a37e 100644
> --- a/target-s390x/kvm.c
> +++ b/target-s390x/kvm.c
> @@ -70,7 +70,7 @@
>   #define SCLP_CMDW_READ_SCP_INFO         0x00020001
>   #define SCLP_CMDW_READ_SCP_INFO_FORCED  0x00120001
>
> -int kvm_arch_init(KVMState *s, int smp_cpus)
> +int kvm_arch_init(int smp_cpus)
>   {
>       return 0;
>   }
> @@ -186,10 +186,6 @@ static void kvm_s390_interrupt_internal(CPUState *env, int type, uint32_t parm,
>       struct kvm_s390_interrupt kvmint;
>       int r;
>
> -    if (!env->kvm_state) {
> -        return;
> -    }
> -
>       env->halted = 0;
>       env->exception_index = -1;
>
> @@ -198,7 +194,7 @@ static void kvm_s390_interrupt_internal(CPUState *env, int type, uint32_t parm,
>       kvmint.parm64 = parm64;
>
>       if (vm) {
> -        r = kvm_vm_ioctl(env->kvm_state, KVM_S390_INTERRUPT,&kvmint);
> +        r = kvm_vm_ioctl(KVM_S390_INTERRUPT,&kvmint);
>       } else {
>           r = kvm_vcpu_ioctl(env, KVM_S390_INTERRUPT,&kvmint);
>       }
>    

^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [PATCH 26/35] kvm: Eliminate KVMState arguments
  2011-01-06 19:24     ` [Qemu-devel] " Anthony Liguori
@ 2011-01-07  9:03       ` Jan Kiszka
  -1 siblings, 0 replies; 300+ messages in thread
From: Jan Kiszka @ 2011-01-07  9:03 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Marcelo Tosatti, qemu-devel, kvm, Alexander Graf

On 06.01.2011 20:24, Anthony Liguori wrote:
> On 01/06/2011 11:56 AM, Marcelo Tosatti wrote:
>> From: Jan Kiszka<jan.kiszka@siemens.com>
>>
>> QEMU supports only one VM, so there is only one kvm_state per process,
>> and we gain nothing passing a reference to it around. Eliminate any need
>> to refer to it outside of kvm-all.c.
>>
>> Signed-off-by: Jan Kiszka<jan.kiszka@siemens.com>
>> CC: Alexander Graf<agraf@suse.de>
>> Signed-off-by: Marcelo Tosatti<mtosatti@redhat.com>
>>    
> 
> I think this is a big mistake.

Obviously, I don't share your concerns. :)

> 
> Having to manage kvm_state keeps the abstraction lines well defined. 

How does it help?

> Otherwise, it's far too easy for portions of code to call into KVM
> functions that really shouldn't.

I can't imagine we gain anything from requiring kvm_check_extension
callers to hold a kvm_state "capability". Yes, it's now much easier to
call kvm_[vm_]ioctl, but that's the key point of this change:

So far, we have primarily complicated the internal interface between
generic and arch-dependent kvm parts by requiring kvm_state juggling. But
external users already find interfaces without this restriction
(kvm_log_*, kvm_ioeventfd_*, ...). That's because it is complicated, to
say the least, to _cleanly_ pass kvm_state references to all users that
need them - e.g. sysbus devices like kvmclock or upcoming in-kernel irqchips.
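
To make the trade-off concrete, here is a minimal sketch of the two
interface shapes under discussion. All names (DemoState,
demo_check_extension, ...) are hypothetical and not part of QEMU; only
the pattern is meant to mirror what the patch does to kvm_check_extension
and friends.

```c
#include <stddef.h>

/* Hypothetical stand-in for KVMState; the fields are illustrative only. */
typedef struct DemoState {
    int fd;
    int caps[8];   /* caps[n] != 0 means "capability n is available" */
} DemoState;

/* Shape A (before the patch): callers must hold and pass a state reference. */
static int demo_check_extension_explicit(const DemoState *s, unsigned int cap)
{
    size_t ncaps = sizeof(s->caps) / sizeof(s->caps[0]);

    return cap < ncaps ? s->caps[cap] : 0;
}

/* Shape B (after the patch): a single file-local state object, so callers
 * pass only the capability number. */
static DemoState demo_state = { .fd = -1, .caps = { [3] = 1 } };

static int demo_check_extension(unsigned int cap)
{
    return demo_check_extension_explicit(&demo_state, cap);
}
```

With shape B, a device model that has no state reference at hand can
still ask demo_check_extension(3); with shape A, it would first need some
way to obtain a DemoState pointer.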

Let's just stop this artificial abstraction that has no practical use
and focus on detecting layering violations via code review. That's more
reliable IMHO.

Jan

^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [PATCH 26/35] kvm: Eliminate KVMState arguments
  2011-01-07  9:03       ` [Qemu-devel] " Jan Kiszka
@ 2011-01-07 23:27         ` Anthony Liguori
  -1 siblings, 0 replies; 300+ messages in thread
From: Anthony Liguori @ 2011-01-07 23:27 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Marcelo Tosatti, qemu-devel, kvm, Alexander Graf

On 01/07/2011 03:03 AM, Jan Kiszka wrote:
> Am 06.01.2011 20:24, Anthony Liguori wrote:
>    
>> On 01/06/2011 11:56 AM, Marcelo Tosatti wrote:
>>      
>>> From: Jan Kiszka<jan.kiszka@siemens.com>
>>>
>>> QEMU supports only one VM, so there is only one kvm_state per process,
>>> and we gain nothing passing a reference to it around. Eliminate any need
>>> to refer to it outside of kvm-all.c.
>>>
>>> Signed-off-by: Jan Kiszka<jan.kiszka@siemens.com>
>>> CC: Alexander Graf<agraf@suse.de>
>>> Signed-off-by: Marcelo Tosatti<mtosatti@redhat.com>
>>>
>>>        
>> I think this is a big mistake.
>>      
> Obviously, I don't share your concerns. :)
>
>    
>> Having to manage kvm_state keeps the abstraction lines well defined.
>>      
> How does it help?
>
>    
>> Otherwise, it's far too easy for portions of code to call into KVM
>> functions that really shouldn't.
>>      
> I can't imagine we gain anything from requiring kvm_check_extension
> callers to hold a kvm_state "capability". Yes, it's now much easier to
> call kvm_[vm_]ioctl, but that's the key point of this change:
>
> So far we primarily complicated the internal interface between generic
> and arch-dependent kvm parts by requiring kvm_state juggling. But
> external users already find interfaces without this restriction
> (kvm_log_*, kvm_ioeventfd_*, ...). That's because it's at least
> complicated to _cleanly_ pass kvm_state references to all users that
> need it - e.g. sysbus devices like kvmclock or upcoming in-kernel irqchips.
>    

I think you're basically making my point for me.

ioeventfd is a broken interface.  It shouldn't be a VM ioctl but rather 
a VCPU ioctl because PIO events are dispatched on a per-VCPU basis.

kvm_state is available as part of CPU state so it's quite easy to get at 
if these interfaces just took a CPUState argument (and they should).
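
As a rough sketch of what "take a CPUState argument" would look like,
assuming each VCPU carries a pointer back to its VM state: the types
DemoCPUState and DemoKVMState and the function demo_vm_fd_for_cpu are
hypothetical and much smaller than anything in QEMU.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real structures are far larger. */
typedef struct DemoKVMState {
    int vmfd;                     /* made-up VM file descriptor */
} DemoKVMState;

typedef struct DemoCPUState {
    int cpu_index;
    DemoKVMState *kvm_state;      /* each VCPU points back at its VM state */
} DemoCPUState;

/*
 * The dependency on a VCPU is visible in the signature, and the VM
 * state is derived from the CPU instead of being passed separately.
 */
int demo_vm_fd_for_cpu(const DemoCPUState *env)
{
    assert(env != NULL && env->kvm_state != NULL);
    return env->kvm_state->vmfd;
}
```

This only works for callers that have a CPU at hand, which is the case
the interface style is aimed at.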

> Let's just stop this artificial abstraction that has no practical use
> and focus on detecting layering violations via code review. That's more
> reliable IMHO.
>    

Documenting relationships between devices and the CPU is a very 
important task.  Being able to grep for cpu_single_env to see what 
device models are interacting with the CPU is a very good thing.

When you've got these interactions hidden in a spaghetti of function 
calls, things become impossible to understand.

Regards,

Anthony Liguori

> Jan
>
>    


^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [PATCH 26/35] kvm: Eliminate KVMState arguments
  2011-01-07 23:27         ` [Qemu-devel] " Anthony Liguori
@ 2011-01-08  8:47           ` Jan Kiszka
  -1 siblings, 0 replies; 300+ messages in thread
From: Jan Kiszka @ 2011-01-08  8:47 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Marcelo Tosatti, qemu-devel, kvm, Alexander Graf

[-- Attachment #1: Type: text/plain, Size: 2673 bytes --]

Am 08.01.2011 00:27, Anthony Liguori wrote:
> On 01/07/2011 03:03 AM, Jan Kiszka wrote:
>> Am 06.01.2011 20:24, Anthony Liguori wrote:
>>   
>>> On 01/06/2011 11:56 AM, Marcelo Tosatti wrote:
>>>     
>>>> From: Jan Kiszka<jan.kiszka@siemens.com>
>>>>
>>>> QEMU supports only one VM, so there is only one kvm_state per process,
>>>> and we gain nothing passing a reference to it around. Eliminate any
>>>> need
>>>> to refer to it outside of kvm-all.c.
>>>>
>>>> Signed-off-by: Jan Kiszka<jan.kiszka@siemens.com>
>>>> CC: Alexander Graf<agraf@suse.de>
>>>> Signed-off-by: Marcelo Tosatti<mtosatti@redhat.com>
>>>>
>>>>        
>>> I think this is a big mistake.
>>>      
>> Obviously, I don't share your concerns. :)
>>
>>   
>>> Having to manage kvm_state keeps the abstraction lines well defined.
>>>      
>> How does it help?
>>
>>   
>>> Otherwise, it's far too easy for portions of code to call into KVM
>>> functions that really shouldn't.
>>>      
>> I can't imagine we gain anything from requiring kvm_check_extension
>> callers to hold a kvm_state "capability". Yes, it's now much easier to
>> call kvm_[vm_]ioctl, but that's the key point of this change:
>>
>> So far we primarily complicated the internal interface between generic
>> and arch-dependent kvm parts by requiring kvm_state juggling. But
>> external users already find interfaces without this restriction
>> (kvm_log_*, kvm_ioeventfd_*, ...). That's because it's at least
>> complicated to _cleanly_ pass kvm_state references to all users that
>> need it - e.g. sysbus devices like kvmclock or upcoming in-kernel
>> irqchips.
>>    
> 
> I think you're basically making my point for me.
> 
> ioeventfd is a broken interface.  It shouldn't be a VM ioctl but rather
> a VCPU ioctl because PIO events are dispatched on a per-VCPU basis.

OK, but I don't want to argue about the ioeventfd API. So let's put this
case aside. :)

> 
> kvm_state is available as part of CPU state so it's quite easy to get at
> if these interfaces just took a CPUState argument (and they should).

My point is definitely NOT about cpu-bound devices. That case is clear
and is not touched at all by this patch.

My point is about devices that have clear system scope like kvmclock,
ioapic, pit, pic, whatever-the-future-will-bring. And about KVM services
that have global scope like capability checks and other feature
explorations or VM configurations done by the KVM arch code. You still
didn't explain what we gain in these concrete scenarios by handing the
technically redundant abstraction kvm_state around, especially _inside_
the KVM core.

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 259 bytes --]

^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [PATCH 14/35] kvm: Drop return value of kvm_cpu_exec
  2011-01-06 17:56   ` [Qemu-devel] " Marcelo Tosatti
@ 2011-01-08 13:09     ` Jan Kiszka
  -1 siblings, 0 replies; 300+ messages in thread
From: Jan Kiszka @ 2011-01-08 13:09 UTC (permalink / raw)
  To: Marcelo Tosatti, Anthony Liguori; +Cc: qemu-devel, kvm

[-- Attachment #1: Type: text/plain, Size: 2308 bytes --]

Am 06.01.2011 18:56, Marcelo Tosatti wrote:
> From: Jan Kiszka <jan.kiszka@siemens.com>
> 
> It is not used, it is not needed, so let's remove it.
> 

Please do not apply this for now. Digging deeper into execution loop
issues, it turned out that we likely do need the return code to clean up
the kvm mess in cpu_exec.
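
The message does not say how the return code would be used. Purely as
an illustration of the general pattern, and assuming a convention where
non-zero means "re-enter the guest", a caller could act on it like
this. All names (demo_exit, demo_cpu_exec, demo_run) and values are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical exit reasons; not QEMU's or KVM's actual values. */
enum demo_exit { DEMO_EXIT_IO = 0, DEMO_EXIT_DEBUG = 1, DEMO_EXIT_SHUTDOWN = 2 };

/* Returns 1 if the caller should re-enter the guest, 0 otherwise. */
int demo_cpu_exec(enum demo_exit reason)
{
    switch (reason) {
    case DEMO_EXIT_IO:
        return 1;                 /* handled internally, keep running */
    case DEMO_EXIT_DEBUG:
    case DEMO_EXIT_SHUTDOWN:
    default:
        return 0;                 /* hand control back to the outer loop */
    }
}

/* Counts how many exits are processed before control leaves the loop. */
int demo_run(const enum demo_exit *exits, int n)
{
    int handled = 0;

    assert(exits != NULL || n == 0);
    for (int i = 0; i < n; i++) {
        handled++;
        if (!demo_cpu_exec(exits[i])) {
            break;
        }
    }
    return handled;
}
```

With a void function the caller would have to infer the same decision
from side effects on the CPU state.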

Jan

> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
> ---
>  kvm-all.c  |    6 ++----
>  kvm-stub.c |    4 ++--
>  kvm.h      |    2 +-
>  3 files changed, 5 insertions(+), 7 deletions(-)
> 
> diff --git a/kvm-all.c b/kvm-all.c
> index 2538283..7518f2c 100644
> --- a/kvm-all.c
> +++ b/kvm-all.c
> @@ -850,7 +850,7 @@ void kvm_cpu_synchronize_post_init(CPUState *env)
>      env->kvm_vcpu_dirty = 0;
>  }
>  
> -int kvm_cpu_exec(CPUState *env)
> +void kvm_cpu_exec(CPUState *env)
>  {
>      struct kvm_run *run = env->kvm_run;
>      int ret;
> @@ -943,7 +943,7 @@ int kvm_cpu_exec(CPUState *env)
>  #ifdef KVM_CAP_SET_GUEST_DEBUG
>              if (kvm_arch_debug(&run->debug.arch)) {
>                  env->exception_index = EXCP_DEBUG;
> -                return 0;
> +                return;
>              }
>              /* re-enter, this exception was guest-internal */
>              ret = 1;
> @@ -960,8 +960,6 @@ int kvm_cpu_exec(CPUState *env)
>          env->exit_request = 0;
>          env->exception_index = EXCP_INTERRUPT;
>      }
> -
> -    return ret;
>  }
>  
>  int kvm_ioctl(KVMState *s, int type, ...)
> diff --git a/kvm-stub.c b/kvm-stub.c
> index 5384a4b..352c6a6 100644
> --- a/kvm-stub.c
> +++ b/kvm-stub.c
> @@ -79,9 +79,9 @@ void kvm_cpu_synchronize_post_init(CPUState *env)
>  {
>  }
>  
> -int kvm_cpu_exec(CPUState *env)
> +void kvm_cpu_exec(CPUState *env)
>  {
> -    abort ();
> +    abort();
>  }
>  
>  int kvm_has_sync_mmu(void)
> diff --git a/kvm.h b/kvm.h
> index 60a9b42..51ad56f 100644
> --- a/kvm.h
> +++ b/kvm.h
> @@ -46,7 +46,7 @@ int kvm_has_xcrs(void);
>  #ifdef NEED_CPU_H
>  int kvm_init_vcpu(CPUState *env);
>  
> -int kvm_cpu_exec(CPUState *env);
> +void kvm_cpu_exec(CPUState *env);
>  
>  #if !defined(CONFIG_USER_ONLY)
>  int kvm_log_start(target_phys_addr_t phys_addr, ram_addr_t size);



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 259 bytes --]

^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [PATCH 04/35] Add "broadcast" option for mce command
  2011-01-06 17:56   ` [Qemu-devel] " Marcelo Tosatti
@ 2011-01-09 18:51     ` Jan Kiszka
  -1 siblings, 0 replies; 300+ messages in thread
From: Jan Kiszka @ 2011-01-09 18:51 UTC (permalink / raw)
  To: Marcelo Tosatti, Anthony Liguori; +Cc: qemu-devel, kvm, Jin Dongming

[-- Attachment #1: Type: text/plain, Size: 7886 bytes --]

Am 06.01.2011 18:56, Marcelo Tosatti wrote:
> From: Jin Dongming <jin.dongming@np.css.fujitsu.com>
> 
> When the following test case is injected with the mce command, the user may
> not get the expected result.
>     DATA
>                command cpu bank status             mcg_status  addr   misc
>         (qemu) mce     1   1    0xbd00000000000000 0x05        0x1234 0x8c
> 
>     Expected Result
>            panic type: "Fatal Machine check"
> 
> That is because each mce command injects only into the given CPU and cannot
> inject an MCE into the other CPUs. So the user gets the following result:
>     panic type: "Fatal machine check on current CPU"
> 
> The "broadcast" option injects dummy data into the other CPUs. Injecting an
> MCE with this option produces the expected result.
> 
> Usage:
>     Broadcast[on]
>            command broadcast cpu bank status             mcg_status  addr   misc
>     (qemu) mce     -b        1   1    0xbd00000000000000 0x05        0x1234 0x8c
> 
>     Broadcast[off]
>            command cpu bank status             mcg_status  addr   misc
>     (qemu) mce     1   1    0xbd00000000000000 0x05        0x1234 0x8c
> 
> Signed-off-by: Jin Dongming <jin.dongming@np.css.fujitsu.com>
> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
> ---
>  cpu-all.h             |    3 ++-
>  hmp-commands.hx       |    6 +++---
>  monitor.c             |    7 +++++--
>  target-i386/helper.c  |   20 ++++++++++++++++++--
>  target-i386/kvm.c     |   16 ++++++++++++----
>  target-i386/kvm_x86.h |    5 ++++-
>  6 files changed, 44 insertions(+), 13 deletions(-)
> 
> diff --git a/cpu-all.h b/cpu-all.h
> index 30ae17d..4ce4e83 100644
> --- a/cpu-all.h
> +++ b/cpu-all.h
> @@ -964,6 +964,7 @@ int cpu_memory_rw_debug(CPUState *env, target_ulong addr,
>                          uint8_t *buf, int len, int is_write);
>  
>  void cpu_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
> -                        uint64_t mcg_status, uint64_t addr, uint64_t misc);
> +                        uint64_t mcg_status, uint64_t addr, uint64_t misc,
> +                        int broadcast);
>  
>  #endif /* CPU_ALL_H */
> diff --git a/hmp-commands.hx b/hmp-commands.hx
> index df134f8..c82fb10 100644
> --- a/hmp-commands.hx
> +++ b/hmp-commands.hx
> @@ -1091,9 +1091,9 @@ ETEXI
>  
>      {
>          .name       = "mce",
> -        .args_type  = "cpu_index:i,bank:i,status:l,mcg_status:l,addr:l,misc:l",
> -        .params     = "cpu bank status mcgstatus addr misc",
> -        .help       = "inject a MCE on the given CPU",
> +        .args_type  = "broadcast:-b,cpu_index:i,bank:i,status:l,mcg_status:l,addr:l,misc:l",
> +        .params     = "[-b] cpu bank status mcgstatus addr misc",
> +        .help       = "inject a MCE on the given CPU [and broadcast to other CPUs with -b option]",
>          .mhandler.cmd = do_inject_mce,
>      },
>  
> diff --git a/monitor.c b/monitor.c
> index f258000..f4f624b 100644
> --- a/monitor.c
> +++ b/monitor.c
> @@ -2671,12 +2671,15 @@ static void do_inject_mce(Monitor *mon, const QDict *qdict)
>      uint64_t mcg_status = qdict_get_int(qdict, "mcg_status");
>      uint64_t addr = qdict_get_int(qdict, "addr");
>      uint64_t misc = qdict_get_int(qdict, "misc");
> +    int broadcast = qdict_get_try_bool(qdict, "broadcast", 0);
>  
> -    for (cenv = first_cpu; cenv != NULL; cenv = cenv->next_cpu)
> +    for (cenv = first_cpu; cenv != NULL; cenv = cenv->next_cpu) {
>          if (cenv->cpu_index == cpu_index && cenv->mcg_cap) {
> -            cpu_inject_x86_mce(cenv, bank, status, mcg_status, addr, misc);
> +            cpu_inject_x86_mce(cenv, bank, status, mcg_status, addr, misc,
> +                               broadcast);
>              break;
>          }
> +    }
>  }
>  #endif
>  
> diff --git a/target-i386/helper.c b/target-i386/helper.c
> index 2c94130..2cfb4a4 100644
> --- a/target-i386/helper.c
> +++ b/target-i386/helper.c
> @@ -1069,18 +1069,34 @@ static void qemu_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
>  }
>  
>  void cpu_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
> -                        uint64_t mcg_status, uint64_t addr, uint64_t misc)
> +                        uint64_t mcg_status, uint64_t addr, uint64_t misc,
> +                        int broadcast)
>  {
>      unsigned bank_num = cenv->mcg_cap & 0xff;
> +    CPUState *env;
> +    int flag = 0;
>  
>      if (bank >= bank_num || !(status & MCI_STATUS_VAL)) {
>          return;
>      }
>  
>      if (kvm_enabled()) {
> -        kvm_inject_x86_mce(cenv, bank, status, mcg_status, addr, misc, 0);
> +        if (broadcast) {
> +            flag |= MCE_BROADCAST;
> +        }
> +
> +        kvm_inject_x86_mce(cenv, bank, status, mcg_status, addr, misc, flag);
>      } else {
>          qemu_inject_x86_mce(cenv, bank, status, mcg_status, addr, misc);
> +        if (broadcast) {
> +            for (env = first_cpu; env != NULL; env = env->next_cpu) {
> +                if (cenv == env) {
> +                    continue;
> +                }
> +
> +                qemu_inject_x86_mce(env, 1, 0xa000000000000000, 0, 0, 0);

Constant lacks "ULL". Can probably be fixed up on commit.

Jan

> +            }
> +        }
>      }
>  }
>  #endif /* !CONFIG_USER_ONLY */
> diff --git a/target-i386/kvm.c b/target-i386/kvm.c
> index 4004de7..8b868ad 100644
> --- a/target-i386/kvm.c
> +++ b/target-i386/kvm.c
> @@ -264,11 +264,13 @@ static void kvm_do_inject_x86_mce(void *_data)
>          }
>      }
>  }
> +
> +static void kvm_mce_broadcast_rest(CPUState *env);
>  #endif
>  
>  void kvm_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
>                          uint64_t mcg_status, uint64_t addr, uint64_t misc,
> -                        int abort_on_error)
> +                        int flag)
>  {
>  #ifdef KVM_CAP_MCE
>      struct kvm_x86_mce mce = {
> @@ -288,10 +290,15 @@ void kvm_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
>          return;
>      }
>  
> +    if (flag & MCE_BROADCAST) {
> +        kvm_mce_broadcast_rest(cenv);
> +    }
> +
>      run_on_cpu(cenv, kvm_do_inject_x86_mce, &data);
>  #else
> -    if (abort_on_error)
> +    if (flag & ABORT_ON_ERROR) {
>          abort();
> +    }
>  #endif
>  }
>  
> @@ -1716,7 +1723,8 @@ static void kvm_mce_broadcast_rest(CPUState *env)
>                  continue;
>              }
>              kvm_inject_x86_mce(cenv, 1, MCI_STATUS_VAL | MCI_STATUS_UC,
> -                               MCG_STATUS_MCIP | MCG_STATUS_RIPV, 0, 0, 1);
> +                               MCG_STATUS_MCIP | MCG_STATUS_RIPV, 0, 0,
> +                               ABORT_ON_ERROR);
>          }
>      }
>  }
> @@ -1816,7 +1824,7 @@ int kvm_on_sigbus(int code, void *addr)
>              | 0xc0;
>          kvm_inject_x86_mce(first_cpu, 9, status,
>                             MCG_STATUS_MCIP | MCG_STATUS_RIPV, paddr,
> -                           (MCM_ADDR_PHYS << 6) | 0xc, 1);
> +                           (MCM_ADDR_PHYS << 6) | 0xc, ABORT_ON_ERROR);
>          kvm_mce_broadcast_rest(first_cpu);
>      } else
>  #endif
> diff --git a/target-i386/kvm_x86.h b/target-i386/kvm_x86.h
> index 04932cf..9d7b584 100644
> --- a/target-i386/kvm_x86.h
> +++ b/target-i386/kvm_x86.h
> @@ -15,8 +15,11 @@
>  #ifndef __KVM_X86_H__
>  #define __KVM_X86_H__
>  
> +#define ABORT_ON_ERROR  0x01
> +#define MCE_BROADCAST   0x02
> +
>  void kvm_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
>                          uint64_t mcg_status, uint64_t addr, uint64_t misc,
> -                        int abort_on_error);
> +                        int flag);
>  
>  #endif
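
On the missing "ULL" suffix flagged inline above: 0xa000000000000000
needs all 64 bits, and without a suffix some compilers (for example in
C89 mode on 32-bit hosts) may warn that the constant is too large for
"long". That is my reading of the comment, not something stated in the
thread. The sketch below spells the value with the suffix and shows
that it equals the VAL and UC status bits that the KVM path in this
patch passes symbolically. The DEMO_ macros and demo_broadcast_status
are stand-ins defined here, not the QEMU headers.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions in IA32_MCi_STATUS; DEMO_ names are local stand-ins. */
#define DEMO_MCI_STATUS_VAL (1ULL << 63)   /* register contents valid */
#define DEMO_MCI_STATUS_UC  (1ULL << 61)   /* uncorrected error */

/* Status value used for the "other" CPUs in the non-KVM broadcast path. */
uint64_t demo_broadcast_status(void)
{
    uint64_t status = DEMO_MCI_STATUS_VAL | DEMO_MCI_STATUS_UC;

    /* Same value as the literal in the patch, written with a ULL suffix. */
    assert(status == 0xa000000000000000ULL);
    return status;
}
```

Using the symbolic names would also keep the two broadcast paths
visibly consistent, but that is a suggestion beyond what the review
comment asks for.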


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 259 bytes --]

^ permalink raw reply	[flat|nested] 300+ messages in thread

* [Qemu-devel] Re: [PATCH 04/35] Add "broadcast" option for mce command
@ 2011-01-09 18:51     ` Jan Kiszka
  0 siblings, 0 replies; 300+ messages in thread
From: Jan Kiszka @ 2011-01-09 18:51 UTC (permalink / raw)
  To: Marcelo Tosatti, Anthony Liguori; +Cc: qemu-devel, kvm, Jin Dongming

[-- Attachment #1: Type: text/plain, Size: 7886 bytes --]

Am 06.01.2011 18:56, Marcelo Tosatti wrote:
> From: Jin Dongming <jin.dongming@np.css.fujitsu.com>
> 
> When the following test case is injected with the mce command, the user may
> not get the expected result.
>     DATA
>                command cpu bank status             mcg_status  addr   misc
>         (qemu) mce     1   1    0xbd00000000000000 0x05        0x1234 0x8c
> 
>     Expected Result
>            panic type: "Fatal Machine check"
> 
> That is because each mce command injects only into the given CPU and cannot
> inject an MCE into the other CPUs. So the user gets the following result:
>     panic type: "Fatal machine check on current CPU"
> 
> The "broadcast" option injects dummy data into the other CPUs. Injecting an
> MCE with this option produces the expected result.
> 
> Usage:
>     Broadcast[on]
>            command broadcast cpu bank status             mcg_status  addr   misc
>     (qemu) mce     -b        1   1    0xbd00000000000000 0x05        0x1234 0x8c
> 
>     Broadcast[off]
>            command cpu bank status             mcg_status  addr   misc
>     (qemu) mce     1   1    0xbd00000000000000 0x05        0x1234 0x8c
> 
> Signed-off-by: Jin Dongming <jin.dongming@np.css.fujitsu.com>
> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
> ---
>  cpu-all.h             |    3 ++-
>  hmp-commands.hx       |    6 +++---
>  monitor.c             |    7 +++++--
>  target-i386/helper.c  |   20 ++++++++++++++++++--
>  target-i386/kvm.c     |   16 ++++++++++++----
>  target-i386/kvm_x86.h |    5 ++++-
>  6 files changed, 44 insertions(+), 13 deletions(-)
> 
> diff --git a/cpu-all.h b/cpu-all.h
> index 30ae17d..4ce4e83 100644
> --- a/cpu-all.h
> +++ b/cpu-all.h
> @@ -964,6 +964,7 @@ int cpu_memory_rw_debug(CPUState *env, target_ulong addr,
>                          uint8_t *buf, int len, int is_write);
>  
>  void cpu_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
> -                        uint64_t mcg_status, uint64_t addr, uint64_t misc);
> +                        uint64_t mcg_status, uint64_t addr, uint64_t misc,
> +                        int broadcast);
>  
>  #endif /* CPU_ALL_H */
> diff --git a/hmp-commands.hx b/hmp-commands.hx
> index df134f8..c82fb10 100644
> --- a/hmp-commands.hx
> +++ b/hmp-commands.hx
> @@ -1091,9 +1091,9 @@ ETEXI
>  
>      {
>          .name       = "mce",
> -        .args_type  = "cpu_index:i,bank:i,status:l,mcg_status:l,addr:l,misc:l",
> -        .params     = "cpu bank status mcgstatus addr misc",
> -        .help       = "inject a MCE on the given CPU",
> +        .args_type  = "broadcast:-b,cpu_index:i,bank:i,status:l,mcg_status:l,addr:l,misc:l",
> +        .params     = "[-b] cpu bank status mcgstatus addr misc",
> +        .help       = "inject a MCE on the given CPU [and broadcast to other CPUs with -b option]",
>          .mhandler.cmd = do_inject_mce,
>      },
>  
> diff --git a/monitor.c b/monitor.c
> index f258000..f4f624b 100644
> --- a/monitor.c
> +++ b/monitor.c
> @@ -2671,12 +2671,15 @@ static void do_inject_mce(Monitor *mon, const QDict *qdict)
>      uint64_t mcg_status = qdict_get_int(qdict, "mcg_status");
>      uint64_t addr = qdict_get_int(qdict, "addr");
>      uint64_t misc = qdict_get_int(qdict, "misc");
> +    int broadcast = qdict_get_try_bool(qdict, "broadcast", 0);
>  
> -    for (cenv = first_cpu; cenv != NULL; cenv = cenv->next_cpu)
> +    for (cenv = first_cpu; cenv != NULL; cenv = cenv->next_cpu) {
>          if (cenv->cpu_index == cpu_index && cenv->mcg_cap) {
> -            cpu_inject_x86_mce(cenv, bank, status, mcg_status, addr, misc);
> +            cpu_inject_x86_mce(cenv, bank, status, mcg_status, addr, misc,
> +                               broadcast);
>              break;
>          }
> +    }
>  }
>  #endif
>  
> diff --git a/target-i386/helper.c b/target-i386/helper.c
> index 2c94130..2cfb4a4 100644
> --- a/target-i386/helper.c
> +++ b/target-i386/helper.c
> @@ -1069,18 +1069,34 @@ static void qemu_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
>  }
>  
>  void cpu_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
> -                        uint64_t mcg_status, uint64_t addr, uint64_t misc)
> +                        uint64_t mcg_status, uint64_t addr, uint64_t misc,
> +                        int broadcast)
>  {
>      unsigned bank_num = cenv->mcg_cap & 0xff;
> +    CPUState *env;
> +    int flag = 0;
>  
>      if (bank >= bank_num || !(status & MCI_STATUS_VAL)) {
>          return;
>      }
>  
>      if (kvm_enabled()) {
> -        kvm_inject_x86_mce(cenv, bank, status, mcg_status, addr, misc, 0);
> +        if (broadcast) {
> +            flag |= MCE_BROADCAST;
> +        }
> +
> +        kvm_inject_x86_mce(cenv, bank, status, mcg_status, addr, misc, flag);
>      } else {
>          qemu_inject_x86_mce(cenv, bank, status, mcg_status, addr, misc);
> +        if (broadcast) {
> +            for (env = first_cpu; env != NULL; env = env->next_cpu) {
> +                if (cenv == env) {
> +                    continue;
> +                }
> +
> +                qemu_inject_x86_mce(env, 1, 0xa000000000000000, 0, 0, 0);

Constant lacks "ULL". Can probably be fixed up on commit.

Jan
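
For reference, the literal above sets bits 63 and 61, which is the same
MCI_STATUS_VAL | MCI_STATUS_UC combination that kvm_mce_broadcast_rest()
passes on the KVM path. A minimal sketch of how the constant could be
spelled with an explicit 64-bit suffix follows. The helper name
mce_broadcast_status() is hypothetical and not part of the patch; the
two bit positions are the architectural ones for IA32_MCi_STATUS.
Without the suffix, older compilers or C89 mode may warn on 32-bit
hosts that the constant is too large for 'long'.

```c
#include <stdint.h>

/* Bit positions of the IA32_MCi_STATUS flags used for the broadcast. */
#define MCI_STATUS_VAL (1ULL << 63)   /* register contents are valid */
#define MCI_STATUS_UC  (1ULL << 61)   /* error was not corrected */

/*
 * Hypothetical helper (not in the patch): builds the status word that
 * the TCG path passes as the literal 0xa000000000000000.
 */
static inline uint64_t mce_broadcast_status(void)
{
    return MCI_STATUS_VAL | MCI_STATUS_UC;
}
```

Using the named bits also keeps the TCG and KVM broadcast paths
visibly in sync.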

> +            }
> +        }
>      }
>  }
>  #endif /* !CONFIG_USER_ONLY */
> diff --git a/target-i386/kvm.c b/target-i386/kvm.c
> index 4004de7..8b868ad 100644
> --- a/target-i386/kvm.c
> +++ b/target-i386/kvm.c
> @@ -264,11 +264,13 @@ static void kvm_do_inject_x86_mce(void *_data)
>          }
>      }
>  }
> +
> +static void kvm_mce_broadcast_rest(CPUState *env);
>  #endif
>  
>  void kvm_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
>                          uint64_t mcg_status, uint64_t addr, uint64_t misc,
> -                        int abort_on_error)
> +                        int flag)
>  {
>  #ifdef KVM_CAP_MCE
>      struct kvm_x86_mce mce = {
> @@ -288,10 +290,15 @@ void kvm_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
>          return;
>      }
>  
> +    if (flag & MCE_BROADCAST) {
> +        kvm_mce_broadcast_rest(cenv);
> +    }
> +
>      run_on_cpu(cenv, kvm_do_inject_x86_mce, &data);
>  #else
> -    if (abort_on_error)
> +    if (flag & ABORT_ON_ERROR) {
>          abort();
> +    }
>  #endif
>  }
>  
> @@ -1716,7 +1723,8 @@ static void kvm_mce_broadcast_rest(CPUState *env)
>                  continue;
>              }
>              kvm_inject_x86_mce(cenv, 1, MCI_STATUS_VAL | MCI_STATUS_UC,
> -                               MCG_STATUS_MCIP | MCG_STATUS_RIPV, 0, 0, 1);
> +                               MCG_STATUS_MCIP | MCG_STATUS_RIPV, 0, 0,
> +                               ABORT_ON_ERROR);
>          }
>      }
>  }
> @@ -1816,7 +1824,7 @@ int kvm_on_sigbus(int code, void *addr)
>              | 0xc0;
>          kvm_inject_x86_mce(first_cpu, 9, status,
>                             MCG_STATUS_MCIP | MCG_STATUS_RIPV, paddr,
> -                           (MCM_ADDR_PHYS << 6) | 0xc, 1);
> +                           (MCM_ADDR_PHYS << 6) | 0xc, ABORT_ON_ERROR);
>          kvm_mce_broadcast_rest(first_cpu);
>      } else
>  #endif
> diff --git a/target-i386/kvm_x86.h b/target-i386/kvm_x86.h
> index 04932cf..9d7b584 100644
> --- a/target-i386/kvm_x86.h
> +++ b/target-i386/kvm_x86.h
> @@ -15,8 +15,11 @@
>  #ifndef __KVM_X86_H__
>  #define __KVM_X86_H__
>  
> +#define ABORT_ON_ERROR  0x01
> +#define MCE_BROADCAST   0x02
> +
>  void kvm_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
>                          uint64_t mcg_status, uint64_t addr, uint64_t misc,
> -                        int abort_on_error);
> +                        int flag);
>  
>  #endif
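
The patch turns the former boolean abort_on_error parameter into a bit
mask. A small sketch of how the two bits combine is given below; the
predicate helpers are hypothetical (the patch tests the bits inline),
while the two macro values are taken from the kvm_x86.h hunk above.
Note that kvm_mce_broadcast_rest() passes only ABORT_ON_ERROR, so the
broadcast does not recurse.

```c
#include <stdbool.h>

#define ABORT_ON_ERROR  0x01
#define MCE_BROADCAST   0x02

/* Hypothetical predicate: true if the caller asked for abort(). */
static bool mce_flag_abort(int flag)
{
    return (flag & ABORT_ON_ERROR) != 0;
}

/* Hypothetical predicate: true if the MCE should reach other CPUs. */
static bool mce_flag_broadcast(int flag)
{
    return (flag & MCE_BROADCAST) != 0;
}
```

Because ABORT_ON_ERROR is 0x01, a caller that still passes the old
literal 1 keeps its previous meaning.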



* Re: [PATCH 26/35] kvm: Eliminate KVMState arguments
  2011-01-08  8:47           ` [Qemu-devel] " Jan Kiszka
@ 2011-01-10 19:59             ` Anthony Liguori
  -1 siblings, 0 replies; 300+ messages in thread
From: Anthony Liguori @ 2011-01-10 19:59 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Marcelo Tosatti, qemu-devel, kvm, Alexander Graf

On 01/08/2011 02:47 AM, Jan Kiszka wrote:
> Am 08.01.2011 00:27, Anthony Liguori wrote:
>    
>> On 01/07/2011 03:03 AM, Jan Kiszka wrote:
>>      
>>> Am 06.01.2011 20:24, Anthony Liguori wrote:
>>>
>>>        
>>>> On 01/06/2011 11:56 AM, Marcelo Tosatti wrote:
>>>>
>>>>          
>>>>> From: Jan Kiszka<jan.kiszka@siemens.com>
>>>>>
>>>>> QEMU supports only one VM, so there is only one kvm_state per process,
>>>>> and we gain nothing passing a reference to it around. Eliminate any
>>>>> need
>>>>> to refer to it outside of kvm-all.c.
>>>>>
>>>>> Signed-off-by: Jan Kiszka<jan.kiszka@siemens.com>
>>>>> CC: Alexander Graf<agraf@suse.de>
>>>>> Signed-off-by: Marcelo Tosatti<mtosatti@redhat.com>
>>>>>
>>>>>
>>>>>            
>>>> I think this is a big mistake.
>>>>
>>>>          
>>> Obviously, I don't share your concerns. :)
>>>
>>>
>>>        
>>>> Having to manage kvm_state keeps the abstraction lines well defined.
>>>>
>>>>          
>>> How does it help?
>>>
>>>
>>>        
>>>> Otherwise, it's far too easy for portions of code to call into KVM
>>>> functions that really shouldn't.
>>>>
>>>>          
>>> I can't imagine we gain anything from requiring kvm_check_extension
>>> callers to hold a kvm_state "capability". Yes, it's now much easier to
>>> call kvm_[vm_]ioctl, but that's the key point of this change:
>>>
>>> So far we primarily complicated the internal interface between generic
>>> and arch-dependent kvm parts by requiring kvm_state joggling. But
>>> external users already find interfaces without this restriction
>>> (kvm_log_*, kvm_ioeventfd_*, ...). That's because it's at least
>>> complicated to _cleanly_ pass kvm_state references to all users that
>>> need it - e.g. sysbus devices like kvmclock or upcoming in-kernel
>>> irqchips.
>>>
>>>        
>> I think you're basically making my point for me.
>>
>> ioeventfd is a broken interface.  It shouldn't be a VM ioctl but rather
>> a VCPU ioctl because PIO events are dispatched on a per-VCPU basis.
>>      
> OK, but I don't want to argue about the ioeventfd API. So let's put this
> case aside. :)
>
>    
>> kvm_state is available as part of CPU state so it's quite easy to get at
>> if these interfaces just took a CPUState argument (and they should).
>>      
> My point is definitely NOT about cpu-bound devices. That case is clear
> and is not touched at all by this patch.
>
> My point is about devices that have clear system scope like kvmclock,
> ioapic, pit, pic,

I don't see how ioapic, pit, or pic have a system scope.

I don't know enough about kvmclock.

>   whatever-the-future-will-bring. And about KVM services
> that have global scope like capability checks and other feature
> explorations or VM configurations done by the KVM arch code. You still
> didn't explain what we gain in these concrete scenarios by handing the
> technically redundant abstraction kvm_state around, especially _inside_
> the KVM core.
>    

If you have to pass around a KVMState pointer, you establish an explicit 
relationship and communication between subsystems.  Any place where the 
global KVMState is used is a red flag that something is wrong.

I don't see what the advantage is in making all of the KVMState global 
and implicit.  It seems like a big step backwards to me.  Can you give a 
very concrete example where you think it results in code that is easier 
to understand?  I don't see how making relationships implicit ever makes 
code easier to understand.
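
The two API shapes under discussion can be sketched as follows. This
is only an illustration under stated assumptions: the KVMState struct
here is a stand-in with a made-up caps field, and both function names
are hypothetical rather than the actual kvm-all.c interfaces.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in type; the real KVMState is private to kvm-all.c. */
typedef struct KVMState {
    unsigned caps;   /* made-up bitmap of supported extensions */
} KVMState;

/* Explicit style: the caller must hold a handle to ask the question. */
static bool check_extension_explicit(const KVMState *s, unsigned ext)
{
    return s != NULL && ext < 32 && (s->caps & (1u << ext)) != 0;
}

/* Implicit style: a single per-process state hidden behind the call. */
static KVMState *global_state;

static bool check_extension_implicit(unsigned ext)
{
    return check_extension_explicit(global_state, ext);
}
```

With the explicit form the dependency on KVM shows up in every
signature; with the implicit form any code can call in without holding
a handle, which is exactly the trade-off being argued here.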

Regards,

Anthony Liguori

> Jan
>
>    



* Re: [Qemu-devel] Re: [PATCH 26/35] kvm: Eliminate KVMState arguments
  2011-01-08  8:47           ` [Qemu-devel] " Jan Kiszka
  (?)
  (?)
@ 2011-01-10 20:11           ` Anthony Liguori
  2011-01-10 20:15             ` Jan Kiszka
  2011-01-11  9:17               ` Avi Kivity
  -1 siblings, 2 replies; 300+ messages in thread
From: Anthony Liguori @ 2011-01-10 20:11 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Marcelo Tosatti, qemu-devel, kvm, Alexander Graf

On 01/08/2011 02:47 AM, Jan Kiszka wrote:
> OK, but I don't want to argue about the ioeventfd API. So let's put this
> case aside. :)
>    

I often reply too quickly without explaining myself.  Let me use 
ioeventfd as an example to highlight why KVMState is a good thing.

In real life, PIO and MMIO are never directly communicated to the device 
from the processor.  Instead, they go through a series of other 
devices.  In the case of something like an ISA device, a PIO first goes 
to the chipset and into the PCI complex, then through a PCI-to-ISA 
bridge via subtractive decoding, and is finally forwarded over the ISA 
bus, where it is interpreted by some device.

The path to the chipset may be shared among different processors but it 
may also be unique.  The APIC is the best example as there are historic 
APICs that hung directly off of the CPUs such that the same MMIO access 
across different CPUs did not go to the same device.  The APIC 
emulation in QEMU is so weird because we don't model this behavior 
correctly.

This means that a PIO operation needs to flow from a CPUState to a 
DeviceState.  It can then flow through to another DeviceState until it's 
finally handled.

The first problem with ioeventfd is that it's a per-VM operation.  It 
should be per VCPU.

But even if this were the case, the path that a PIO operation takes 
should not be impacted by ioeventfd.  IOW, a device shouldn't be 
allocating an eventfd() and handing it to a magical KVM call.  Instead, 
a device should register a callback for a particular port in the same 
way it always does.  *As an optimization*, we should have another 
interface that says that these values are only valid for this IO port.  
That would let us create eventfds and register things behind the scenes.

That means we can handle TCG, older KVM kernels, and newer KVM kernels 
without any special support in the device model.  It also means that the 
device models never have to worry about KVMState because there's an 
entirely different piece of code that's consulting the set of special 
ports and then deciding how to handle it.  The result is better, more 
portable code that doesn't have KVM-isms.
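
The registration scheme described above can be sketched roughly like
this. All names (port_register_write, port_hint_single_value,
port_fast_path_candidate, ExampleDevice) are hypothetical and do not
exist in QEMU; the sketch only shows the split between the ordinary
callback a device registers and the separate hint that an accelerator
backend could consult to set up an eventfd behind the scenes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*pio_write_fn)(void *opaque, uint32_t value);

typedef struct PortHandler {
    pio_write_fn write;   /* ordinary callback, always registered */
    void *opaque;
    bool has_hint;        /* optional optimisation hint */
    uint32_t hint_value;  /* the only value the device cares about */
} PortHandler;

static PortHandler port_table[65536];

/* What every device does, regardless of accelerator. */
static void port_register_write(uint16_t port, pio_write_fn fn, void *opaque)
{
    port_table[port].write = fn;
    port_table[port].opaque = opaque;
}

/* Optional hint: only writes of 'value' matter on this port. */
static void port_hint_single_value(uint16_t port, uint32_t value)
{
    port_table[port].has_hint = true;
    port_table[port].hint_value = value;
}

/* Consulted by an accelerator backend; devices never call this. */
static bool port_fast_path_candidate(uint16_t port, uint32_t value)
{
    const PortHandler *h = &port_table[port];
    return h->write != NULL && h->has_hint && h->hint_value == value;
}

/* Generic slow path, usable with TCG or kernels lacking ioeventfd. */
static void port_dispatch_write(uint16_t port, uint32_t value)
{
    const PortHandler *h = &port_table[port];
    if (h->write != NULL) {
        h->write(h->opaque, value);
    }
}

/* Example device that merely counts notifications. */
typedef struct ExampleDevice {
    unsigned kicks;
    uint32_t last_value;
} ExampleDevice;

static void example_device_write(void *opaque, uint32_t value)
{
    ExampleDevice *dev = opaque;
    dev->kicks++;
    dev->last_value = value;
}
```

In this arrangement the device code contains no KVM-specific calls; a
backend that supports ioeventfd would walk the table and wire up the
ports flagged by port_fast_path_candidate(), and everything else falls
back to port_dispatch_write().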

If passing state around seems to be ugly, it's probably because we're 
not abstracting things correctly.  Removing the state and making it 
implicit is the wrong solution.  Fixing the abstraction is the right 
solution (or living with the ugliness until someone else is motivated to 
fix it properly).

Regards,

Anthony Liguori


* Re: [PATCH 26/35] kvm: Eliminate KVMState arguments
  2011-01-10 19:59             ` [Qemu-devel] " Anthony Liguori
@ 2011-01-10 20:12               ` Jan Kiszka
  -1 siblings, 0 replies; 300+ messages in thread
From: Jan Kiszka @ 2011-01-10 20:12 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Marcelo Tosatti, qemu-devel, kvm, Alexander Graf

Am 10.01.2011 20:59, Anthony Liguori wrote:
> On 01/08/2011 02:47 AM, Jan Kiszka wrote:
>> Am 08.01.2011 00:27, Anthony Liguori wrote:
>>   
>>> On 01/07/2011 03:03 AM, Jan Kiszka wrote:
>>>     
>>>> Am 06.01.2011 20:24, Anthony Liguori wrote:
>>>>
>>>>       
>>>>> On 01/06/2011 11:56 AM, Marcelo Tosatti wrote:
>>>>>
>>>>>         
>>>>>> From: Jan Kiszka<jan.kiszka@siemens.com>
>>>>>>
>>>>>> QEMU supports only one VM, so there is only one kvm_state per
>>>>>> process,
>>>>>> and we gain nothing passing a reference to it around. Eliminate any
>>>>>> need
>>>>>> to refer to it outside of kvm-all.c.
>>>>>>
>>>>>> Signed-off-by: Jan Kiszka<jan.kiszka@siemens.com>
>>>>>> CC: Alexander Graf<agraf@suse.de>
>>>>>> Signed-off-by: Marcelo Tosatti<mtosatti@redhat.com>
>>>>>>
>>>>>>
>>>>>>            
>>>>> I think this is a big mistake.
>>>>>
>>>>>          
>>>> Obviously, I don't share your concerns. :)
>>>>
>>>>
>>>>       
>>>>> Having to manage kvm_state keeps the abstraction lines well defined.
>>>>>
>>>>>          
>>>> How does it help?
>>>>
>>>>
>>>>       
>>>>> Otherwise, it's far too easy for portions of code to call into KVM
>>>>> functions that really shouldn't.
>>>>>
>>>>>          
>>>> I can't imagine we gain anything from requiring kvm_check_extension
>>>> callers to hold a kvm_state "capability". Yes, it's now much easier to
>>>> call kvm_[vm_]ioctl, but that's the key point of this change:
>>>>
>>>> So far we primarily complicated the internal interface between generic
>>>> and arch-dependent kvm parts by requiring kvm_state joggling. But
>>>> external users already find interfaces without this restriction
>>>> (kvm_log_*, kvm_ioeventfd_*, ...). That's because it's at least
>>>> complicated to _cleanly_ pass kvm_state references to all users that
>>>> need it - e.g. sysbus devices like kvmclock or upcoming in-kernel
>>>> irqchips.
>>>>
>>>>        
>>> I think you're basically making my point for me.
>>>
>>> ioeventfd is a broken interface.  It shouldn't be a VM ioctl but rather
>>> a VCPU ioctl because PIO events are dispatched on a per-VCPU basis.
>>>      
>> OK, but I don't want to argue about the ioeventfd API. So let's put this
>> case aside. :)
>>
>>   
>>> kvm_state is available as part of CPU state so it's quite easy to get at
>>> if these interfaces just took a CPUState argument (and they should).
>>>      
>> My point is definitely NOT about cpu-bound devices. That case is clear
>> and is not touched at all by this patch.
>>
>> My point is about devices that have clear system scope like kvmclock,
>> ioapic, pit, pic,
> 
> I don't see how ioapic, pit, or pic have a system scope.

They are not bound to any CPU, unlike the APIC, which you may have in mind.

> 
> I don't know enough about kvmclock.

It's just the same.

> 
>>   whatever-the-future-will-bring. And about KVM services
>> that have global scope like capability checks and other feature
>> explorations or VM configurations done by the KVM arch code. You still
>> didn't explain what we gain in these concrete scenarios by handing the
>> technically redundant abstraction kvm_state around, especially _inside_
>> the KVM core.
>>    
> 
> If you have to pass around a KVMState pointer, you establish an explicit
> relationship and communication between subsystems.  Any place where the
> global KVMState is used is a red flag that something is wrong.

It is and will be _only_ used inside kvm-all.c. Again: What is the
benefit of restricting access to kvm_check_extension this way?

> 
> I don't see what the advantage to making all of the KVMState global and
> implicit.  It seems like a big step backwards to me.  Can you give a
> very concrete example of where you think it results in easier to
> understand code as I don't see how making relationships implicit ever
> makes code easier to understand?

The best example does not yet exist (fortunately): Just look at patch 28
and then try to pass some kvm_state reference to the kvmclock device. Is
this handle worth changing the sysbus API?

Jan



* Re: [Qemu-devel] Re: [PATCH 26/35] kvm: Eliminate KVMState arguments
  2011-01-10 20:11           ` Anthony Liguori
@ 2011-01-10 20:15             ` Jan Kiszka
  2011-01-11  9:17               ` Avi Kivity
  1 sibling, 0 replies; 300+ messages in thread
From: Jan Kiszka @ 2011-01-10 20:15 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Marcelo Tosatti, qemu-devel, kvm, Alexander Graf

Am 10.01.2011 21:11, Anthony Liguori wrote:
> On 01/08/2011 02:47 AM, Jan Kiszka wrote:
>> OK, but I don't want to argue about the ioeventfd API. So let's put this
>> case aside. :)
>>    
> 
> I often reply too quickly without explaining myself.  Let me use
> ioeventfd as an example to highlight why KVMState is a good thing.
> 
> In real life, PIO and MMIO are never directly communicated to the device
> from the processor.  Instead, they go through a series of other
> devices.  In the case of something like an ISA device, a PIO first goes
> to the chipset into the PCI complex, it will then go through a
> PCI-to-ISA bridge via subtractive decoding, and then forward over the
> ISA device where it will be interpreted by some device.
> 
> The path to the chipset may be shared among different processors but it
> may also be unique.  The APIC is the best example as there are historic
> APICs that hung directly off of the CPUs such that the same MMIO access
> across different CPUs did not go to the same device.  This is why the
> APIC emulation in QEMU is so weird because we don't model this behavior
> correctly.
> 
> This means that a PIO operation needs to flow from a CPUState to a
> DeviceState.  It can then flow through to another DeviceState until it's
> finally handled.
> 
> The first problem with ioeventfd is that it's a per-VM operation.  It
> should be per VCPU.
> 
> But even if this were the case, the path that a PIO operation takes
> should not be impacted by ioeventfd.  IOW, a device shouldn't be
> allocating an eventfd() and handing it to a magical KVM call.  Instead,
> a device should register a callback for a particular port in the same
> way it always does.  *As an optimization*, we should have another
> interface that says that these values are only valid for this IO port. 
> That would let us create eventfds and register things behind the scenes.
> 
> That means we can handle TCG, older KVM kernels, and newer KVM kernels
> without any special support in the device model.  It also means that the
> device models never have to worry about KVMState because there's an
> entirely different piece of code that's consulting the set of special
> ports and then deciding how to handle it.  The result is better, more
> portable code that doesn't have KVM-isms.
> 
> If passing state around seems to be ugly, it's probably because we're
> not abstracting things correctly.  Removing the state and making it
> implicit is the wrong solution.  Fixing the abstraction is the right
> solution (or living with the ugliness until someone else is motivated to
> fix it properly).

Look at my other reply; it should answer this. ioeventfd is the wrong
example IMHO, as one may argue about its relation to VCPUs.

Jan



* Re: [Qemu-devel] Re: [PATCH 26/35] kvm: Eliminate KVMState arguments
  2011-01-10 20:12               ` [Qemu-devel] " Jan Kiszka
  (?)
@ 2011-01-10 20:23               ` Anthony Liguori
  2011-01-10 20:34                 ` Jan Kiszka
  2011-01-11  9:01                   ` Avi Kivity
  -1 siblings, 2 replies; 300+ messages in thread
From: Anthony Liguori @ 2011-01-10 20:23 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Anthony Liguori, Marcelo Tosatti, qemu-devel, kvm, Alexander Graf

On 01/10/2011 02:12 PM, Jan Kiszka wrote:
> Am 10.01.2011 20:59, Anthony Liguori wrote:
>    
>> On 01/08/2011 02:47 AM, Jan Kiszka wrote:
>>      
>>> Am 08.01.2011 00:27, Anthony Liguori wrote:
>>>
>>>        
>>>> On 01/07/2011 03:03 AM, Jan Kiszka wrote:
>>>>
>>>>          
>>>>> Am 06.01.2011 20:24, Anthony Liguori wrote:
>>>>>
>>>>>
>>>>>            
>>>>>> On 01/06/2011 11:56 AM, Marcelo Tosatti wrote:
>>>>>>
>>>>>>
>>>>>>              
>>>>>>> From: Jan Kiszka<jan.kiszka@siemens.com>
>>>>>>>
>>>>>>> QEMU supports only one VM, so there is only one kvm_state per
>>>>>>> process,
>>>>>>> and we gain nothing passing a reference to it around. Eliminate any
>>>>>>> need
>>>>>>> to refer to it outside of kvm-all.c.
>>>>>>>
>>>>>>> Signed-off-by: Jan Kiszka<jan.kiszka@siemens.com>
>>>>>>> CC: Alexander Graf<agraf@suse.de>
>>>>>>> Signed-off-by: Marcelo Tosatti<mtosatti@redhat.com>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>                
>>>>>> I think this is a big mistake.
>>>>>>
>>>>>>
>>>>>>              
>>>>> Obviously, I don't share your concerns. :)
>>>>>
>>>>>
>>>>>
>>>>>            
>>>>>> Having to manage kvm_state keeps the abstraction lines well defined.
>>>>>>
>>>>>>
>>>>>>              
>>>>> How does it help?
>>>>>
>>>>>
>>>>>
>>>>>            
>>>>>> Otherwise, it's far too easy for portions of code to call into KVM
>>>>>> functions that really shouldn't.
>>>>>>
>>>>>>
>>>>>>              
>>>>> I can't imagine we gain anything from requiring kvm_check_extension
>>>>> callers to hold a kvm_state "capability". Yes, it's now much easier to
>>>>> call kvm_[vm_]ioctl, but that's the key point of this change:
>>>>>
>>>>> So far we primarily complicated the internal interface between generic
>>>>> and arch-dependent kvm parts by requiring kvm_state joggling. But
>>>>> external users already find interfaces without this restriction
>>>>> (kvm_log_*, kvm_ioeventfd_*, ...). That's because it's at least
>>>>> complicated to _cleanly_ pass kvm_state references to all users that
>>>>> need it - e.g. sysbus devices like kvmclock or upcoming in-kernel
>>>>> irqchips.
>>>>>
>>>>>
>>>>>            
>>>> I think you're basically making my point for me.
>>>>
>>>> ioeventfd is a broken interface.  It shouldn't be a VM ioctl but rather
>>>> a VCPU ioctl because PIO events are dispatched on a per-VCPU basis.
>>>>
>>>>          
>>> OK, but I don't want to argue about the ioeventfd API. So let's put this
>>> case aside. :)
>>>
>>>
>>>        
>>>> kvm_state is available as part of CPU state so it's quite easy to get at
>>>> if these interfaces just took a CPUState argument (and they should).
>>>>
>>>>          
>>> My point is definitely NOT about cpu-bound devices. That case is clear
>>> and is not touched at all by this patch.
>>>
>>> My point is about devices that have clear system scope like kvmclock,
>>> ioapic, pit, pic,
>>>        
>> I don't see how ioapic, pit, or pic have a system scope.
>>      
> They are not bound to any CPU like the APIC which you may have in mind.
>    

And none of the above interact with KVM.

They may be replaced by KVM but if you look at the PIT, this is done by 
having two distinct devices.  The KVM specific device can (and should) 
be instantiated with kvm_state.

The way the IOAPIC/APIC/PIC is handled in qemu-kvm is nasty.  The kernel 
devices are separate devices and that should be reflected in the device 
tree.
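
A rough sketch of the "two distinct devices" idea follows. The types
and functions are hypothetical stand-ins, not the real QEMU PIT code:
the emulated device knows nothing about KVM, while the kernel-backed
one receives its KVMState handle explicitly at creation time.

```c
#include <stddef.h>

/* Stand-in type; the real KVMState is private to kvm-all.c. */
typedef struct KVMState {
    int vm_fd;
} KVMState;

/* Userspace-emulated device: no KVM dependency at all. */
typedef struct EmulatedPIT {
    unsigned count;
} EmulatedPIT;

/* Kernel-backed device: carries the handle it was created with. */
typedef struct KernelPIT {
    KVMState *kvm;
} KernelPIT;

static void emulated_pit_init(EmulatedPIT *pit)
{
    pit->count = 0;
}

/* Returns 0 on success, -1 if no KVM handle was supplied. */
static int kernel_pit_init(KernelPIT *pit, KVMState *kvm)
{
    if (kvm == NULL) {
        return -1;
    }
    pit->kvm = kvm;
    return 0;
}
```

Only the code that instantiates the kernel-backed variant needs to
know about KVMState; the rest of the machine model deals with
whichever device was created.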

>> I don't know enough about kvmclock.
>>      
> It's just the same.
>
>    
>>      
>>>    whatever-the-future-will-bring. And about KVM services
>>> that have global scope like capability checks and other feature
>>> explorations or VM configurations done by the KVM arch code. You still
>>> didn't explain what we gain in these concrete scenarios by handing the
>>> technically redundant abstraction kvm_state around, especially _inside_
>>> the KVM core.
>>>
>>>        
>> If you have to pass around a KVMState pointer, you establish an explicit
>> relationship and communication between subsystems.  Any place where the
>> global KVMState is used is a red flag that something is wrong.
>>      
> It is and will be _only_ used inside kvm-all.c. Again: What is the
> benefit of restricting access to kvm_check_extension this way?
>    

The more places that need to deal with KVM compatibility code, the worse 
off we will be, because there are more opportunities to get it wrong.

>> I don't see what the advantage to making all of the KVMState global and
>> implicit.  It seems like a big step backwards to me.  Can you give a
>> very concrete example of where you think it results in easier to
>> understand code as I don't see how making relationships implicit ever
>> makes code easier to understand?
>>      
> The best example does not yet exist (fortunately): Just look at patch 28
> and then try to pass some kvm_state reference to the kvmclock device. Is
> this handle worth changing the sysbus API?
>    

Let me look at that patch and reply there.

Regards,

Anthony Liguori

> Jan
>
>    


^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
  2011-01-06 17:56   ` [Qemu-devel] " Marcelo Tosatti
@ 2011-01-10 20:31     ` Anthony Liguori
  -1 siblings, 0 replies; 300+ messages in thread
From: Anthony Liguori @ 2011-01-10 20:31 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Anthony Liguori, Jan Kiszka, Glauber Costa, qemu-devel, kvm

On 01/06/2011 11:56 AM, Marcelo Tosatti wrote:
> From: Jan Kiszka <jan.kiszka@siemens.com>
>
> If kvmclock is used, which implies the kernel supports it, register a
> kvmclock device with the sysbus. Its main purpose is to save and restore
> the kernel state on migration, but this will also allow to visualize it
> one day.
>
> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> CC: Glauber Costa <glommer@redhat.com>
> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
> ---
>   target-i386/kvm.c |   92 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
>   1 files changed, 91 insertions(+), 1 deletions(-)
>
> diff --git a/target-i386/kvm.c b/target-i386/kvm.c
> index 69b8234..47cb22b 100644
> --- a/target-i386/kvm.c
> +++ b/target-i386/kvm.c
> @@ -29,6 +29,7 @@
>   #include "hw/apic.h"
>   #include "ioport.h"
>   #include "kvm_x86.h"
> +#include "hw/sysbus.h"
>
>   #ifdef CONFIG_KVM_PARA
>   #include <linux/kvm_para.h>
> @@ -309,6 +310,85 @@ void kvm_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
>   #endif
>   }
>
> +#if defined(CONFIG_KVM_PARA) && defined(KVM_CAP_ADJUST_CLOCK)
> +typedef struct KVMClockState {
> +    SysBusDevice busdev;
> +    uint64_t clock;
> +    bool clock_valid;
> +} KVMClockState;
> +
> +static void kvmclock_pre_save(void *opaque)
> +{
> +    KVMClockState *s = opaque;
> +    struct kvm_clock_data data;
> +    int ret;
> +
> +    if (s->clock_valid) {
> +        return;
> +    }
> +    ret = kvm_vm_ioctl(KVM_GET_CLOCK, &data);
> +    if (ret < 0) {
> +        fprintf(stderr, "KVM_GET_CLOCK failed: %s\n", strerror(ret));
> +        data.clock = 0;
> +    }
> +    s->clock = data.clock;
> +    /*
> +     * If the VM is stopped, declare the clock state valid to avoid re-reading
> +     * it on next vmsave (which would return a different value). Will be reset
> +     * when the VM is continued.
> +     */
> +    s->clock_valid = !vm_running;
> +}
> +
> +static int kvmclock_post_load(void *opaque, int version_id)
> +{
> +    KVMClockState *s = opaque;
> +    struct kvm_clock_data data;
> +
> +    data.clock = s->clock;
> +    data.flags = 0;
> +    return kvm_vm_ioctl(KVM_SET_CLOCK, &data);
> +}
> +
> +static void kvmclock_vm_state_change(void *opaque, int running, int reason)
> +{
> +    KVMClockState *s = opaque;
> +
> +    if (running) {
> +        s->clock_valid = false;
> +    }
> +}
> +
> +static int kvmclock_init(SysBusDevice *dev)
> +{
> +    KVMClockState *s = FROM_SYSBUS(KVMClockState, dev);
> +
> +    qemu_add_vm_change_state_handler(kvmclock_vm_state_change, s);
> +    return 0;
> +}
> +
> +static const VMStateDescription kvmclock_vmsd= {
> +    .name = "kvmclock",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .minimum_version_id_old = 1,
> +    .pre_save = kvmclock_pre_save,
> +    .post_load = kvmclock_post_load,
> +    .fields = (VMStateField []) {
> +        VMSTATE_UINT64(clock, KVMClockState),
> +        VMSTATE_END_OF_LIST()
> +    }
> +};
> +
> +static SysBusDeviceInfo kvmclock_info = {
> +    .qdev.name = "kvmclock",
> +    .qdev.size = sizeof(KVMClockState),
> +    .qdev.vmsd = &kvmclock_vmsd,
> +    .qdev.no_user = 1,
> +    .init = kvmclock_init,
> +};
> +#endif /* CONFIG_KVM_PARA && KVM_CAP_ADJUST_CLOCK */
> +
>   int kvm_arch_init_vcpu(CPUState *env)
>   {
>       struct {
> @@ -335,7 +415,6 @@ int kvm_arch_init_vcpu(CPUState *env)
>       env->cpuid_svm_features &= kvm_x86_get_supported_cpuid(0x8000000A,
>                                                               0, R_EDX);
>
> -
>       cpuid_i = 0;
>
>   #ifdef CONFIG_KVM_PARA
> @@ -442,6 +521,13 @@ int kvm_arch_init_vcpu(CPUState *env)
>       }
>   #endif
>
> +#if defined(CONFIG_KVM_PARA) && defined(KVM_CAP_ADJUST_CLOCK)
> +    if (cpu_is_bsp(env) &&
> +        (env->cpuid_kvm_features & (1ULL << KVM_FEATURE_CLOCKSOURCE))) {
> +        sysbus_create_simple("kvmclock", -1, NULL);
> +    }
> +#endif
> +
>       return kvm_vcpu_ioctl(env, KVM_SET_CPUID2, &cpuid_data);
>   }
>
> @@ -531,6 +617,10 @@ int kvm_arch_init(int smp_cpus)
>       int ret;
>       struct utsname utsname;
>
> +#if defined(CONFIG_KVM_PARA) && defined(KVM_CAP_ADJUST_CLOCK)
> +    sysbus_register_withprop(&kvmclock_info);
> +#endif
> +
>       ret = kvm_get_supported_msrs();
>       if (ret < 0) {
>           return ret;
>    

There are a couple of things wrong with this patch.  It breaks
compatibility because it does not allow kvmclock to be created or 
initiated in machines.  Older machines didn't expose kvmclock but now 
they do.  It also makes it impossible to pass parameters to kvmclock in 
the future because the device creation is hidden deep in other code 
paths.  Calling any qdev creation function in anything but pc.c (or the 
equivalent) should be a big red flag.

The solution is simple: introduce a kvm_has_clocksource().  Within the
machine init, create the kvm clock device after CPU creation, wrapped
in an if (kvm_has_clocksource()) check.  kvmclock should be created with
kvm_state as a parameter, and kvm_vm_ioctl() is passed the stored
reference.  Taking a global reference to kvm_state in machine_init is
not a bad thing; obviously the machine initialization function needs
access to the kvm_state.
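
The proposal above could be read roughly as follows. This is a minimal
sketch, not QEMU code: the KVMState and KVMClockState layouts,
mock_kvm_vm_ioctl(), kvmclock_create(), kvmclock_save(), kvmclock_load()
and machine_init_kvmclock() are hypothetical stand-ins for illustration,
and kvm_has_clocksource() is only the helper suggested above, with an
assumed signature. It shows the device being created from machine init,
only when the clocksource is available, and keeping the kvm_state
reference it was handed:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Mock stand-ins; none of these are the real QEMU definitions. */
typedef struct KVMState {
    bool has_clocksource;   /* would come from a capability/CPUID check */
    uint64_t kernel_clock;  /* stands in for the in-kernel clock value */
} KVMState;

typedef struct KVMClockState {
    KVMState *kvm;          /* stored reference, set at creation time */
    uint64_t clock;         /* value saved to / restored from migration */
} KVMClockState;

enum { MOCK_GET_CLOCK, MOCK_SET_CLOCK };

/* Hypothetical helper named in the review; the signature is assumed. */
static bool kvm_has_clocksource(const KVMState *s)
{
    return s != NULL && s->has_clocksource;
}

/* Mock VM ioctl that takes the state explicitly instead of a global. */
static int mock_kvm_vm_ioctl(KVMState *s, int request, uint64_t *clock)
{
    assert(s != NULL && clock != NULL);
    switch (request) {
    case MOCK_GET_CLOCK:
        *clock = s->kernel_clock;
        return 0;
    case MOCK_SET_CLOCK:
        s->kernel_clock = *clock;
        return 0;
    default:
        return -1;
    }
}

/* Device creation receives kvm_state as a parameter and stores it. */
static void kvmclock_create(KVMClockState *dev, KVMState *kvm)
{
    dev->kvm = kvm;
    dev->clock = 0;
}

/* Save path: read the kernel clock through the stored reference. */
static int kvmclock_save(KVMClockState *dev)
{
    return mock_kvm_vm_ioctl(dev->kvm, MOCK_GET_CLOCK, &dev->clock);
}

/* Load path: write the saved clock back through the stored reference. */
static int kvmclock_load(KVMClockState *dev)
{
    return mock_kvm_vm_ioctl(dev->kvm, MOCK_SET_CLOCK, &dev->clock);
}

/* Called from machine init after CPU creation; true if a device exists. */
static bool machine_init_kvmclock(KVMState *kvm, KVMClockState *dev)
{
    if (!kvm_has_clocksource(kvm)) {
        return false;
    }
    kvmclock_create(dev, kvm);
    return true;
}
```

In this reading, the machine init function is the only place that decides
whether the device exists, so a machine type that should not expose
kvmclock could simply not make the call.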

Regards,

Anthony Liguori

^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] Re: [PATCH 26/35] kvm: Eliminate KVMState arguments
  2011-01-10 20:23               ` Anthony Liguori
@ 2011-01-10 20:34                 ` Jan Kiszka
  2011-01-11  9:01                   ` Avi Kivity
  1 sibling, 0 replies; 300+ messages in thread
From: Jan Kiszka @ 2011-01-10 20:34 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Anthony Liguori, Marcelo Tosatti, qemu-devel, kvm, Alexander Graf

[-- Attachment #1: Type: text/plain, Size: 5989 bytes --]

Am 10.01.2011 21:23, Anthony Liguori wrote:
> On 01/10/2011 02:12 PM, Jan Kiszka wrote:
>> Am 10.01.2011 20:59, Anthony Liguori wrote:
>>   
>>> On 01/08/2011 02:47 AM, Jan Kiszka wrote:
>>>     
>>>> Am 08.01.2011 00:27, Anthony Liguori wrote:
>>>>
>>>>       
>>>>> On 01/07/2011 03:03 AM, Jan Kiszka wrote:
>>>>>
>>>>>         
>>>>>> Am 06.01.2011 20:24, Anthony Liguori wrote:
>>>>>>
>>>>>>
>>>>>>           
>>>>>>> On 01/06/2011 11:56 AM, Marcelo Tosatti wrote:
>>>>>>>
>>>>>>>
>>>>>>>             
>>>>>>>> From: Jan Kiszka<jan.kiszka@siemens.com>
>>>>>>>>
>>>>>>>> QEMU supports only one VM, so there is only one kvm_state per
>>>>>>>> process,
>>>>>>>> and we gain nothing passing a reference to it around. Eliminate any
>>>>>>>> need
>>>>>>>> to refer to it outside of kvm-all.c.
>>>>>>>>
>>>>>>>> Signed-off-by: Jan Kiszka<jan.kiszka@siemens.com>
>>>>>>>> CC: Alexander Graf<agraf@suse.de>
>>>>>>>> Signed-off-by: Marcelo Tosatti<mtosatti@redhat.com>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>                
>>>>>>> I think this is a big mistake.
>>>>>>>
>>>>>>>
>>>>>>>              
>>>>>> Obviously, I don't share your concerns. :)
>>>>>>
>>>>>>
>>>>>>
>>>>>>           
>>>>>>> Having to manage kvm_state keeps the abstraction lines well defined.
>>>>>>>
>>>>>>>
>>>>>>>              
>>>>>> How does it help?
>>>>>>
>>>>>>
>>>>>>
>>>>>>           
>>>>>>> Otherwise, it's far too easy for portions of code to call into KVM
>>>>>>> functions that really shouldn't.
>>>>>>>
>>>>>>>
>>>>>>>              
>>>>>> I can't imagine we gain anything from requiring kvm_check_extension
>>>>>> callers to hold a kvm_state "capability". Yes, it's now much
>>>>>> easier to
>>>>>> call kvm_[vm_]ioctl, but that's the key point of this change:
>>>>>>
>>>>>> So far we primarily complicated the internal interface between
>>>>>> generic
>>>>>> and arch-dependent kvm parts by requiring kvm_state joggling. But
>>>>>> external users already find interfaces without this restriction
>>>>>> (kvm_log_*, kvm_ioeventfd_*, ...). That's because it's at least
>>>>>> complicated to _cleanly_ pass kvm_state references to all users that
>>>>>> need it - e.g. sysbus devices like kvmclock or upcoming in-kernel
>>>>>> irqchips.
>>>>>>
>>>>>>
>>>>>>            
>>>>> I think you're basically making my point for me.
>>>>>
>>>>> ioeventfd is a broken interface.  It shouldn't be a VM ioctl but
>>>>> rather
>>>>> a VCPU ioctl because PIO events are dispatched on a per-VCPU basis.
>>>>>
>>>>>          
>>>> OK, but I don't want to argue about the ioeventfd API. So let's put
>>>> this
>>>> case aside. :)
>>>>
>>>>
>>>>       
>>>>> kvm_state is available as part of CPU state so it's quite easy to
>>>>> get at
>>>>> if these interfaces just took a CPUState argument (and they should).
>>>>>
>>>>>          
>>>> My point is definitely NOT about cpu-bound devices. That case is clear
>>>> and is not touched at all by this patch.
>>>>
>>>> My point is about devices that have clear system scope like kvmclock,
>>>> ioapic, pit, pic,
>>>>        
>>> I don't see how ioapic, pit, or pic have a system scope.
>>>      
>> They are not bound to any CPU like the APIC which you may have in mind.
>>    
> 
> And none of the above interact with KVM.
> 
> They may be replaced by KVM but if you look at the PIT, this is done by
> having two distinct devices.  The KVM specific device can (and should)
> be instantiated with kvm_state.
> 
> The way the IOAPIC/APIC/PIC is handled in qemu-kvm is nasty.  The kernel
> devices are separate devices and that should be reflected in the device
> tree.

Whether it is a separate device or a hack to an existing one, both need to
sync their user space state with the kernel when QEMU asks them to. That's
how they have to interact with KVM all the time. The same holds for kvmclock,
if you want to look at a really trivial example.

> 
>>> I don't know enough about kvmclock.
>>>      
>> It's just the same.
>>
>>   
>>>     
>>>>    whatever-the-future-will-bring. And about KVM services
>>>> that have global scope like capability checks and other feature
>>>> explorations or VM configurations done by the KVM arch code. You still
>>>> didn't explain what we gain in these concrete scenarios by handing the
>>>> technically redundant abstraction kvm_state around, especially _inside_
>>>> the KVM core.
>>>>
>>>>        
>>> If you have to pass around a KVMState pointer, you establish an explicit
>>> relationship and communication between subsystems.  Any place where the
>>> global KVMState is used is a red flag that something is wrong.
>>>      
>> It is and will be _only_ used inside kvm-all.c. Again: What is the
>> benefit of restricting access to kvm_check_extension this way?
>>    
> 
> The more places that need to deal with KVM compatibility code, the worse
> we will be because it's more opportunities to get it wrong.

That code belongs where the related logic is. IMHO, it would be a
needless abstraction to push in-kernel access services and workaround
definitions into the KVM core instead of the KVM device model code,
provided there is only one user.

But this discussion is a bit abstract right now as we do not yet have
anything more complex than kvmclock on the table for QEMU.

> 
>>> I don't see what the advantage to making all of the KVMState global and
>>> implicit.  It seems like a big step backwards to me.  Can you give a
>>> very concrete example of where you think it results in easier to
>>> understand code as I don't see how making relationships implicit ever
>>> makes code easier to understand?
>>>      
>> The best example does not yet exist (fortunately): Just look at patch 28
>> and then try to pass some kvm_state reference to the kvmclock device. Is
>> this handle worth changing the sysbus API?
>>    
> 
> Let me look at that patch and reply there.
> 

OK, great.

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 259 bytes --]

^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
  2011-01-10 20:31     ` Anthony Liguori
@ 2011-01-10 21:06       ` Jan Kiszka
  -1 siblings, 0 replies; 300+ messages in thread
From: Jan Kiszka @ 2011-01-10 21:06 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Marcelo Tosatti, Anthony Liguori, Glauber Costa, qemu-devel, kvm

[-- Attachment #1: Type: text/plain, Size: 6187 bytes --]

Am 10.01.2011 21:31, Anthony Liguori wrote:
> On 01/06/2011 11:56 AM, Marcelo Tosatti wrote:
>> From: Jan Kiszka <jan.kiszka@siemens.com>
>>
>> If kvmclock is used, which implies the kernel supports it, register a
>> kvmclock device with the sysbus. Its main purpose is to save and restore
>> the kernel state on migration, but this will also allow to visualize it
>> one day.
>>
>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>> CC: Glauber Costa <glommer@redhat.com>
>> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
>> ---
>>   target-i386/kvm.c |   92 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
>>   1 files changed, 91 insertions(+), 1 deletions(-)
>>
>> diff --git a/target-i386/kvm.c b/target-i386/kvm.c
>> index 69b8234..47cb22b 100644
>> --- a/target-i386/kvm.c
>> +++ b/target-i386/kvm.c
>> @@ -29,6 +29,7 @@
>>   #include "hw/apic.h"
>>   #include "ioport.h"
>>   #include "kvm_x86.h"
>> +#include "hw/sysbus.h"
>>
>>   #ifdef CONFIG_KVM_PARA
>>   #include <linux/kvm_para.h>
>> @@ -309,6 +310,85 @@ void kvm_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
>>   #endif
>>   }
>>
>> +#if defined(CONFIG_KVM_PARA) && defined(KVM_CAP_ADJUST_CLOCK)
>> +typedef struct KVMClockState {
>> +    SysBusDevice busdev;
>> +    uint64_t clock;
>> +    bool clock_valid;
>> +} KVMClockState;
>> +
>> +static void kvmclock_pre_save(void *opaque)
>> +{
>> +    KVMClockState *s = opaque;
>> +    struct kvm_clock_data data;
>> +    int ret;
>> +
>> +    if (s->clock_valid) {
>> +        return;
>> +    }
>> +    ret = kvm_vm_ioctl(KVM_GET_CLOCK, &data);
>> +    if (ret < 0) {
>> +        fprintf(stderr, "KVM_GET_CLOCK failed: %s\n", strerror(ret));
>> +        data.clock = 0;
>> +    }
>> +    s->clock = data.clock;
>> +    /*
>> +     * If the VM is stopped, declare the clock state valid to avoid re-reading
>> +     * it on next vmsave (which would return a different value). Will be reset
>> +     * when the VM is continued.
>> +     */
>> +    s->clock_valid = !vm_running;
>> +}
>> +
>> +static int kvmclock_post_load(void *opaque, int version_id)
>> +{
>> +    KVMClockState *s = opaque;
>> +    struct kvm_clock_data data;
>> +
>> +    data.clock = s->clock;
>> +    data.flags = 0;
>> +    return kvm_vm_ioctl(KVM_SET_CLOCK, &data);
>> +}
>> +
>> +static void kvmclock_vm_state_change(void *opaque, int running, int reason)
>> +{
>> +    KVMClockState *s = opaque;
>> +
>> +    if (running) {
>> +        s->clock_valid = false;
>> +    }
>> +}
>> +
>> +static int kvmclock_init(SysBusDevice *dev)
>> +{
>> +    KVMClockState *s = FROM_SYSBUS(KVMClockState, dev);
>> +
>> +    qemu_add_vm_change_state_handler(kvmclock_vm_state_change, s);
>> +    return 0;
>> +}
>> +
>> +static const VMStateDescription kvmclock_vmsd= {
>> +    .name = "kvmclock",
>> +    .version_id = 1,
>> +    .minimum_version_id = 1,
>> +    .minimum_version_id_old = 1,
>> +    .pre_save = kvmclock_pre_save,
>> +    .post_load = kvmclock_post_load,
>> +    .fields = (VMStateField []) {
>> +        VMSTATE_UINT64(clock, KVMClockState),
>> +        VMSTATE_END_OF_LIST()
>> +    }
>> +};
>> +
>> +static SysBusDeviceInfo kvmclock_info = {
>> +    .qdev.name = "kvmclock",
>> +    .qdev.size = sizeof(KVMClockState),
>> +    .qdev.vmsd = &kvmclock_vmsd,
>> +    .qdev.no_user = 1,
>> +    .init = kvmclock_init,
>> +};
>> +#endif /* CONFIG_KVM_PARA && KVM_CAP_ADJUST_CLOCK */
>> +
>>   int kvm_arch_init_vcpu(CPUState *env)
>>   {
>>       struct {
>> @@ -335,7 +415,6 @@ int kvm_arch_init_vcpu(CPUState *env)
>>       env->cpuid_svm_features &= kvm_x86_get_supported_cpuid(0x8000000A,
>>                                                               0, R_EDX);
>>
>> -
>>       cpuid_i = 0;
>>
>>   #ifdef CONFIG_KVM_PARA
>> @@ -442,6 +521,13 @@ int kvm_arch_init_vcpu(CPUState *env)
>>       }
>>   #endif
>>
>> +#if defined(CONFIG_KVM_PARA) && defined(KVM_CAP_ADJUST_CLOCK)
>> +    if (cpu_is_bsp(env) &&
>> +        (env->cpuid_kvm_features & (1ULL << KVM_FEATURE_CLOCKSOURCE))) {
>> +        sysbus_create_simple("kvmclock", -1, NULL);
>> +    }
>> +#endif
>> +
>>       return kvm_vcpu_ioctl(env, KVM_SET_CPUID2, &cpuid_data);
>>   }
>>
>> @@ -531,6 +617,10 @@ int kvm_arch_init(int smp_cpus)
>>       int ret;
>>       struct utsname utsname;
>>
>> +#if defined(CONFIG_KVM_PARA) && defined(KVM_CAP_ADJUST_CLOCK)
>> +    sysbus_register_withprop(&kvmclock_info);
>> +#endif
>> +
>>       ret = kvm_get_supported_msrs();
>>       if (ret < 0) {
>>           return ret;
>>    
> 
> There are a couple things wrong with this patch.  It breaks
> compatibility because it does not allow kvmclock to be created or
> initiated in machines.  Older machines didn't expose kvmclock but now
> they do.  It also makes it impossible to pass parameters to kvmclock in
> the future because the device creation is hidden deep in other code
> paths.

Device parameters should get passed as properties. Would already work
today if we had any.

>  Calling any qdev creation function in anything but pc.c (or the
> equivalent) should be a big red flag.
> 
> The solution is simple: introduce a kvm_has_clocksource().  Within the
> machine init, create the kvm clock device after CPU creation, wrapped
> in an if (kvm_has_clocksource()) check.

No problem with moving sysbus_create_simple to machine initialization,
though.

>  kvmclock should be created with
> kvm_state as a parameter and kvm_vm_ioctl() is passed the stored
> reference.   Taking a global reference to kvm_state in machine_init is
> not a bad thing, obviously the machine initialization function needs
> access to the kvm_state.

This would also require changing sysbus interfaces for the sake of KVM's
"abstraction". If this is the only way forward, I could look into this.

Still, I do not see any benefit for the affected code. You then either
need to "steal" a kvm_state reference from the first cpu or introduce a
marvelous interface like kvm_get_state() to make this work from outside
of the KVM core.
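
The two options named here can be put side by side in a small mock. This
is a sketch under assumptions, not QEMU code: the struct layouts are
reduced stand-ins, kvm_state_from_first_cpu() and mock_kvm_init() are
hypothetical names, and kvm_get_state() is only the accessor floated
above, with an assumed signature.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced mock types; the real structures carry far more state. */
typedef struct KVMState {
    int vmfd;
} KVMState;

typedef struct CPUState {
    KVMState *kvm_state;        /* per-CPU copy of the VM-wide reference */
    struct CPUState *next_cpu;
} CPUState;

/* Option 1: "steal" the reference from the first CPU. */
static KVMState *kvm_state_from_first_cpu(CPUState *first_cpu)
{
    return first_cpu != NULL ? first_cpu->kvm_state : NULL;
}

/* Option 2: an explicit accessor exported by the KVM core. */
static KVMState *global_kvm_state;

static void mock_kvm_init(KVMState *s)
{
    assert(s != NULL);
    global_kvm_state = s;
}

static KVMState *kvm_get_state(void)
{
    return global_kvm_state;
}
```

Both routes hand back the same pointer once KVM is initialized; the
difference is only in which subsystem the caller has to go through to
obtain it.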

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 259 bytes --]

^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
  2011-01-10 21:06       ` Jan Kiszka
@ 2011-01-10 22:21         ` Jan Kiszka
  -1 siblings, 0 replies; 300+ messages in thread
From: Jan Kiszka @ 2011-01-10 22:21 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Marcelo Tosatti, Anthony Liguori, Glauber Costa, qemu-devel, kvm

[-- Attachment #1: Type: text/plain, Size: 1009 bytes --]

Am 10.01.2011 22:06, Jan Kiszka wrote:
>>  kvmclock should be created with
>> kvm_state as a parameter and kvm_vm_ioctl() is passed the stored
>> reference.   Taking a global reference to kvm_state in machine_init is
>> not a bad thing, obviously the machine initialization function needs
>> access to the kvm_state.
> 
> This would also require changing sysbus interfaces for the sake of KVM's
> "abstraction". If this is the only way forward, I could look into this.

Actually, there is already a channel to pass pointers to qdev devices:
the pointer property hack. I'm not sure we should contribute to its user
base or take the chance for a cleanup, but we are not alone with this
requirement. Point below remains valid, though.

> 
> Still, I do not see any benefit for the affected code. You then either
> need to "steal" a kvm_state reference from the first cpu or introduce a
> marvelous interface like kvm_get_state() to make this work from outside
> of the KVM core.
> 

Jan


^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
  2011-01-10 22:21         ` Jan Kiszka
@ 2011-01-10 23:02           ` Anthony Liguori
  -1 siblings, 0 replies; 300+ messages in thread
From: Anthony Liguori @ 2011-01-10 23:02 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Marcelo Tosatti, Glauber Costa, qemu-devel, kvm

On 01/10/2011 04:21 PM, Jan Kiszka wrote:
> On 10.01.2011 22:06, Jan Kiszka wrote:
>    
>>>   kvmclock should be created with
>>> kvm_state as a parameter and kvm_vm_ioctl() is passed the stored
>>> reference.   Taking a global reference to kvm_state in machine_init is
>>> not a bad thing, obviously the machine initialization function needs
>>> access to the kvm_state.
>>>        
>> This would also require changing sysbus interfaces for the sake of KVM's
>> "abstraction". If this is the only way forward, I could look into this.
>>      
> Actually, there is already a channel to pass pointers to qdev devices:
> the pointer property hack. I'm not sure we should contribute to its user
> base or take the chance for a cleanup, but we are not alone with this
> requirement. Point below remains valid, though.
>    

It probably makes sense to have a KVMBus and, rather than passing
kvm_state as a property, have the device access it through the KvmBusState.
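
A rough sketch of that idea, assuming a device that keeps a pointer to its
parent bus and a bus state that holds the kvm_state reference.
KVMStateStub, KvmBusStub and KvmClockDev are hypothetical stand-ins, not
the real qdev or KVM types.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for KVMState. */
typedef struct KVMStateStub {
    int vmfd;
} KVMStateStub;

/* Hypothetical stand-in for the proposed KvmBusState. */
typedef struct KvmBusStub {
    KVMStateStub *kvm_state;   /* the one reference, owned by the bus */
} KvmBusStub;

/* Hypothetical kvmclock device sitting on that bus. */
typedef struct KvmClockDev {
    KvmBusStub *parent_bus;
} KvmClockDev;

/* The device never receives kvm_state directly; it asks its bus. */
static KVMStateStub *kvmclock_get_state(const KvmClockDev *dev)
{
    assert(dev != NULL && dev->parent_bus != NULL);
    return dev->parent_bus->kvm_state;
}
```

The device needs no pointer property in this model, at the cost of being
tied to one specific bus.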

Regards,

Anthony Liguori

>    
>> Still, I do not see any benefit for the affected code. You then either
>> need to "steal" a kvm_state reference from the first cpu or introduce a
>> marvelous interface like kvm_get_state() to make this work from outside
>> of the KVM core.
>>
>>      
> Jan
>    


^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
  2011-01-10 21:06       ` Jan Kiszka
@ 2011-01-10 23:04         ` Anthony Liguori
  -1 siblings, 0 replies; 300+ messages in thread
From: Anthony Liguori @ 2011-01-10 23:04 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Marcelo Tosatti, Glauber Costa, kvm, qemu-devel

On 01/10/2011 03:06 PM, Jan Kiszka wrote:
> On 10.01.2011 21:31, Anthony Liguori wrote:
>    
>> On 01/06/2011 11:56 AM, Marcelo Tosatti wrote:
>>      
>>> From: Jan Kiszka <jan.kiszka@siemens.com>
>>>
>>> If kvmclock is used, which implies the kernel supports it, register a
>>> kvmclock device with the sysbus. Its main purpose is to save and restore
>>> the kernel state on migration, but this will also allow visualizing it
>>> one day.
>>>
>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>>> CC: Glauber Costa <glommer@redhat.com>
>>> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
>>> ---
>>>    target-i386/kvm.c |   92 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
>>>    1 files changed, 91 insertions(+), 1 deletions(-)
>>>
>>> diff --git a/target-i386/kvm.c b/target-i386/kvm.c
>>> index 69b8234..47cb22b 100644
>>> --- a/target-i386/kvm.c
>>> +++ b/target-i386/kvm.c
>>> @@ -29,6 +29,7 @@
>>>    #include "hw/apic.h"
>>>    #include "ioport.h"
>>>    #include "kvm_x86.h"
>>> +#include "hw/sysbus.h"
>>>
>>>    #ifdef CONFIG_KVM_PARA
>>>    #include <linux/kvm_para.h>
>>> @@ -309,6 +310,85 @@ void kvm_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
>>>    #endif
>>>    }
>>>
>>> +#if defined(CONFIG_KVM_PARA) && defined(KVM_CAP_ADJUST_CLOCK)
>>> +typedef struct KVMClockState {
>>> +    SysBusDevice busdev;
>>> +    uint64_t clock;
>>> +    bool clock_valid;
>>> +} KVMClockState;
>>> +
>>> +static void kvmclock_pre_save(void *opaque)
>>> +{
>>> +    KVMClockState *s = opaque;
>>> +    struct kvm_clock_data data;
>>> +    int ret;
>>> +
>>> +    if (s->clock_valid) {
>>> +        return;
>>> +    }
>>> +    ret = kvm_vm_ioctl(KVM_GET_CLOCK, &data);
>>> +    if (ret < 0) {
>>> +        fprintf(stderr, "KVM_GET_CLOCK failed: %s\n", strerror(-ret));
>>> +        data.clock = 0;
>>> +    }
>>> +    s->clock = data.clock;
>>> +    /*
>>> +     * If the VM is stopped, declare the clock state valid to avoid re-reading
>>> +     * it on next vmsave (which would return a different value). Will be reset
>>> +     * when the VM is continued.
>>> +     */
>>> +    s->clock_valid = !vm_running;
>>> +}
>>> +
>>> +static int kvmclock_post_load(void *opaque, int version_id)
>>> +{
>>> +    KVMClockState *s = opaque;
>>> +    struct kvm_clock_data data;
>>> +
>>> +    data.clock = s->clock;
>>> +    data.flags = 0;
>>> +    return kvm_vm_ioctl(KVM_SET_CLOCK, &data);
>>> +}
>>> +
>>> +static void kvmclock_vm_state_change(void *opaque, int running, int reason)
>>> +{
>>> +    KVMClockState *s = opaque;
>>> +
>>> +    if (running) {
>>> +        s->clock_valid = false;
>>> +    }
>>> +}
>>> +
>>> +static int kvmclock_init(SysBusDevice *dev)
>>> +{
>>> +    KVMClockState *s = FROM_SYSBUS(KVMClockState, dev);
>>> +
>>> +    qemu_add_vm_change_state_handler(kvmclock_vm_state_change, s);
>>> +    return 0;
>>> +}
>>> +
>>> +static const VMStateDescription kvmclock_vmsd = {
>>> +    .name = "kvmclock",
>>> +    .version_id = 1,
>>> +    .minimum_version_id = 1,
>>> +    .minimum_version_id_old = 1,
>>> +    .pre_save = kvmclock_pre_save,
>>> +    .post_load = kvmclock_post_load,
>>> +    .fields = (VMStateField []) {
>>> +        VMSTATE_UINT64(clock, KVMClockState),
>>> +        VMSTATE_END_OF_LIST()
>>> +    }
>>> +};
>>> +
>>> +static SysBusDeviceInfo kvmclock_info = {
>>> +    .qdev.name = "kvmclock",
>>> +    .qdev.size = sizeof(KVMClockState),
>>> +    .qdev.vmsd = &kvmclock_vmsd,
>>> +    .qdev.no_user = 1,
>>> +    .init = kvmclock_init,
>>> +};
>>> +#endif /* CONFIG_KVM_PARA && KVM_CAP_ADJUST_CLOCK */
>>> +
>>>    int kvm_arch_init_vcpu(CPUState *env)
>>>    {
>>>        struct {
>>> @@ -335,7 +415,6 @@ int kvm_arch_init_vcpu(CPUState *env)
>>>        env->cpuid_svm_features &= kvm_x86_get_supported_cpuid(0x8000000A,
>>>                                                                0, R_EDX);
>>>
>>> -
>>>        cpuid_i = 0;
>>>
>>>    #ifdef CONFIG_KVM_PARA
>>> @@ -442,6 +521,13 @@ int kvm_arch_init_vcpu(CPUState *env)
>>>        }
>>>    #endif
>>>
>>> +#if defined(CONFIG_KVM_PARA) && defined(KVM_CAP_ADJUST_CLOCK)
>>> +    if (cpu_is_bsp(env) &&
>>> +        (env->cpuid_kvm_features & (1ULL << KVM_FEATURE_CLOCKSOURCE))) {
>>> +        sysbus_create_simple("kvmclock", -1, NULL);
>>> +    }
>>> +#endif
>>> +
>>>        return kvm_vcpu_ioctl(env, KVM_SET_CPUID2, &cpuid_data);
>>>    }
>>>
>>> @@ -531,6 +617,10 @@ int kvm_arch_init(int smp_cpus)
>>>        int ret;
>>>        struct utsname utsname;
>>>
>>> +#if defined(CONFIG_KVM_PARA) && defined(KVM_CAP_ADJUST_CLOCK)
>>> +    sysbus_register_withprop(&kvmclock_info);
>>> +#endif
>>> +
>>>        ret = kvm_get_supported_msrs();
>>>        if (ret < 0) {
>>>            return ret;
>>>
>>>        
>> There are a couple things wrong with this patch.  It breaks
>> compatibility because it does not allow kvmclock to be created or
>> initiated in machines.  Older machines didn't expose kvmclock but now
>> they do.  It also makes it impossible to pass parameters to kvmclock in
>> the future because the device creation is hidden deep in other code
>> paths.
>>      
> Device parameters should get passed as properties. Would already work
> today if we had any.
>
>    
>>   Calling any qdev creation function in anything but pc.c (or the
>> equivalent) should be a big red flag.
>>
>> The solution is simple: introduce a kvm_has_clocksource().  Within the
>> machine init, create the kvm clock device after CPU creation, wrapped
>> in an if (kvm_has_clocksource()) check.
>>      
> No problem with moving sysbus_create_simple to machine initialization,
> though.
>
>    
>>   kvmclock should be created with
>> kvm_state as a parameter and kvm_vm_ioctl() is passed the stored
>> reference.   Taking a global reference to kvm_state in machine_init is
>> not a bad thing, obviously the machine initialization function needs
>> access to the kvm_state.
>>      
> This would also require changing sysbus interfaces for the sake of KVM's
> "abstraction". If this is the only way forward, I could look into this.
>
> Still, I do not see any benefit for the affected code. You then either
> need to "steal" a kvm_state reference from the first cpu or introduce a
> marvelous interface like kvm_get_state() to make this work from outside
> of the KVM core.
>    

Or move kvm_init() to pc_init() and then pc_init() has the kvm_state 
reference.

Regards,

Anthony Liguori

> Jan
>
>    


^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
  2011-01-10 23:02           ` Anthony Liguori
@ 2011-01-11  5:54             ` Jan Kiszka
  -1 siblings, 0 replies; 300+ messages in thread
From: Jan Kiszka @ 2011-01-11  5:54 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Marcelo Tosatti, Glauber Costa, qemu-devel, kvm

On 11.01.2011 00:02, Anthony Liguori wrote:
> On 01/10/2011 04:21 PM, Jan Kiszka wrote:
>> On 10.01.2011 22:06, Jan Kiszka wrote:
>>   
>>>>   kvmclock should be created with
>>>> kvm_state as a parameter and kvm_vm_ioctl() is passed the stored
>>>> reference.   Taking a global reference to kvm_state in machine_init is
>>>> not a bad thing, obviously the machine initialization function needs
>>>> access to the kvm_state.
>>>>        
>>> This would also require changing sysbus interfaces for the sake of KVM's
>>> "abstraction". If this is the only way forward, I could look into this.
>>>      
>> Actually, there is already a channel to pass pointers to qdev devices:
>> the pointer property hack. I'm not sure we should contribute to its user
>> base or take the chance for a cleanup, but we are not alone with this
>> requirement. Point below remains valid, though.
>>    
> 
> It probably makes sense to have a KVMBus and, rather than passing
> kvm_state as a property, have the device access it through the KvmBusState.

KVM is orthogonal to the qtree. Some devices like vga (kvm_coalesce_*,
kvm_log_start/stop*) would actually have to be attached to multiple
buses in this model.

Jan


^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
  2011-01-10 23:04         ` Anthony Liguori
@ 2011-01-11  5:55           ` Jan Kiszka
  -1 siblings, 0 replies; 300+ messages in thread
From: Jan Kiszka @ 2011-01-11  5:55 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Marcelo Tosatti, Glauber Costa, kvm, qemu-devel

On 11.01.2011 00:04, Anthony Liguori wrote:
>>>   kvmclock should be created with
>>> kvm_state as a parameter and kvm_vm_ioctl() is passed the stored
>>> reference.   Taking a global reference to kvm_state in machine_init is
>>> not a bad thing, obviously the machine initialization function needs
>>> access to the kvm_state.
>>>      
>> This would also require changing sysbus interfaces for the sake of KVM's
>> "abstraction". If this is the only way forward, I could look into this.
>>
>> Still, I do not see any benefit for the affected code. You then either
>> need to "steal" a kvm_state reference from the first cpu or introduce a
>> marvelous interface like kvm_get_state() to make this work from outside
>> of the KVM core.
>>    
> 
> Or move kvm_init() to pc_init() and then pc_init() has the kvm_state
> reference.

Or pass the reference to the machine_init service to avoid duplicating
kvm_init logic for every KVM arch.

That alone would still be consistent. But as long as we do not pass a
kvm_state to each and every memory registration (for
kvm_setup_guest_memory), all of this is like putting a fence around half
of your yard and simply declaring it closed.
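
To make the first alternative concrete, a minimal sketch assuming the
generic startup code owns the KVM state and hands it to the board's init
callback. MachineInitFn, PcBoard and start_machine() are hypothetical
names, not existing QEMU interfaces.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for KVMState. */
typedef struct KVMStateStub {
    int vmfd;
} KVMStateStub;

/* Hypothetical machine init signature that receives the reference. */
typedef void (*MachineInitFn)(KVMStateStub *kvm_state, void *board);

/* Hypothetical board state; remembers what it passes on to kvmclock. */
typedef struct PcBoard {
    KVMStateStub *clock_kvm_state;
} PcBoard;

/* Board init: forwards the reference to the devices that need it. */
static void pc_machine_init(KVMStateStub *kvm_state, void *opaque)
{
    PcBoard *pc = opaque;
    pc->clock_kvm_state = kvm_state;
}

/* Generic code: KVM is set up once, then any arch's board init runs. */
static void start_machine(MachineInitFn init, KVMStateStub *kvm_state,
                          void *board)
{
    assert(init != NULL);
    init(kvm_state, board);
}
```

This keeps the KVM setup in one place for all architectures, but it does
nothing for callers such as memory registration that never see the
reference, which is the inconsistency described above.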

Jan


^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
  2011-01-10 22:21         ` Jan Kiszka
@ 2011-01-11  8:00           ` Paolo Bonzini
  -1 siblings, 0 replies; 300+ messages in thread
From: Paolo Bonzini @ 2011-01-11  8:00 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Anthony Liguori, Marcelo Tosatti, Anthony Liguori, Glauber Costa,
	qemu-devel, kvm

On 01/10/2011 11:21 PM, Jan Kiszka wrote:
> On 10.01.2011 22:06, Jan Kiszka wrote:
>>>   kvmclock should be created with
>>> kvm_state as a parameter and kvm_vm_ioctl() is passed the stored
>>> reference.   Taking a global reference to kvm_state in machine_init is
>>> not a bad thing, obviously the machine initialization function needs
>>> access to the kvm_state.
>>
>> This would also require changing sysbus interfaces for the sake of KVM's
>> "abstraction". If this is the only way forward, I could look into this.
>
> Actually, there is already a channel to pass pointers to qdev devices:
> the pointer property hack. I'm not sure we should contribute to its user
> base or take the chance for a cleanup, but we are not alone with this
> requirement. Point below remains valid, though.

The pointer property uses, at least last time I checked, were all:

1) for a CPU (apic, etrax interrupt controller)

2) for a device (sparc IIRC)

3) useless (smbus_eeprom.c)

So adding a fourth kind is not really a good idea, I think.

Paolo

^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
  2011-01-10 22:21         ` Jan Kiszka
@ 2011-01-11  8:53           ` Gerd Hoffmann
  -1 siblings, 0 replies; 300+ messages in thread
From: Gerd Hoffmann @ 2011-01-11  8:53 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Anthony Liguori, Marcelo Tosatti, Anthony Liguori, Glauber Costa,
	qemu-devel, kvm

   Hi,

> Actually, there is already a channel to pass pointers to qdev devices:
> the pointer property hack. I'm not sure we should contribute to its user
> base or take the chance for a cleanup, but we are not alone with this
> requirement. Point below remains valid, though.

It is considered bad/hackish style because you can't create that kind of
device using the -device command line switch (or from a machine
description config file some day in the future).  So we should not add
more uses of this, especially not in patches which are supposed to
clean things up ;)
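
A small sketch of why that is, assuming a simplified property model in
which anything given on the command line arrives as text. Property,
prop_set_from_string() and prop_set_ptr() are hypothetical and much
simpler than the real qdev property code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef enum { PROP_UINT32, PROP_PTR } PropType;

/* Hypothetical, simplified device property. */
typedef struct Property {
    const char *name;
    PropType type;
    union {
        uint32_t u32;
        void *ptr;
    } value;
} Property;

/* What a "-device foo,name=value" path must do: turn text into a value. */
static int prop_set_from_string(Property *p, const char *text)
{
    assert(p != NULL && text != NULL);
    switch (p->type) {
    case PROP_UINT32: {
        char *end;
        unsigned long v;

        errno = 0;
        v = strtoul(text, &end, 0);
        if (errno != 0 || end == text || *end != '\0' || v > UINT32_MAX) {
            return -EINVAL;
        }
        p->value.u32 = (uint32_t)v;
        return 0;
    }
    case PROP_PTR:
        /* An in-process pointer has no meaningful textual form. */
        return -EINVAL;
    }
    return -EINVAL;
}

/* Only code that already holds the pointer (board code) can set it. */
static void prop_set_ptr(Property *p, void *ptr)
{
    assert(p != NULL && p->type == PROP_PTR);
    p->value.ptr = ptr;
}
```

A device that needs such a property to be set can therefore only be
instantiated from C code, never from a user-supplied description.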

cheers,
   Gerd


^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] Re: [PATCH 26/35] kvm: Eliminate KVMState arguments
  2011-01-10 20:23               ` Anthony Liguori
@ 2011-01-11  9:01                   ` Avi Kivity
  2011-01-11  9:01                   ` Avi Kivity
  1 sibling, 0 replies; 300+ messages in thread
From: Avi Kivity @ 2011-01-11  9:01 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Jan Kiszka, Anthony Liguori, Marcelo Tosatti, qemu-devel, kvm,
	Alexander Graf

On 01/10/2011 10:23 PM, Anthony Liguori wrote:
>>> I don't see how ioapic, pit, or pic have a system scope.
>> They are not bound to any CPU like the APIC which you may have in mind.
>
> And none of the above interact with KVM.

They're implemented by kvm.  What deeper interaction do you have in mind?

>
> They may be replaced by KVM but if you look at the PIT, this is done 
> by having two distinct devices.  The KVM specific device can (and 
> should) be instantiated with kvm_state.
>
> The way the IOAPIC/APIC/PIC is handled in qemu-kvm is nasty.  The 
> kernel devices are separate devices and that should be reflected in 
> the device tree.

I don't see why.  Those are just two different implementations for the 
same guest visible device.  It's like saying IDE should be seen 
differently if it's backed by qcow2 or qed.

The device tree is about the guest view of devices.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] Re: [PATCH 26/35] kvm: Eliminate KVMState arguments
  2011-01-10 20:11           ` Anthony Liguori
@ 2011-01-11  9:17               ` Avi Kivity
  2011-01-11  9:17               ` Avi Kivity
  1 sibling, 0 replies; 300+ messages in thread
From: Avi Kivity @ 2011-01-11  9:17 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Jan Kiszka, Marcelo Tosatti, qemu-devel, kvm, Alexander Graf

On 01/10/2011 10:11 PM, Anthony Liguori wrote:
> On 01/08/2011 02:47 AM, Jan Kiszka wrote:
>> OK, but I don't want to argue about the ioeventfd API. So let's put this
>> case aside. :)
>
> I often reply too quickly without explaining myself.  Let me use 
> ioeventfd as an example to highlight why KVMState is a good thing.
>
> In real life, PIO and MMIO are never directly communicated to the 
> device from the processor.  Instead, they go through a series of other 
> devices.  In the case of something like an ISA device, a PIO first 
> goes to the chipset into the PCI complex, it will then go through a 
> PCI-to-ISA bridge via subtractive decoding, and then forward over the 
> ISA device where it will be interpreted by some device.
>
> The path to the chipset may be shared among different processors but 
> it may also be unique.  The APIC is the best example as there are 
> historic APICs that hung directly off of the CPUs such that the same 
> MMIO access across different CPUs did not go to the same device.  This 
> is why the APIC emulation in QEMU is so weird because we don't model 
> this behavior correctly.
>
> This means that a PIO operation needs to flow from a CPUState to a 
> DeviceState.  It can then flow through to another DeviceState until 
> it's finally handled.
>
> The first problem with ioeventfd is that it's a per-VM operation.  It 
> should be per VCPU.

Just consider ioeventfd as something that hooks the system bus, not the 
processor-chipset link, and the problem goes away.  In practice, any 
per-cpu io port (for SMM or power management) would need synchronous 
handling; an eventfd isn't a suitable way to communicate it.

>
> But even if this were the case, the path that a PIO operation takes 
> should not be impacted by ioeventfd.  IOW, a device shouldn't be 
> allocating an eventfd() and handing it to a magical KVM call.  
> Instead, a device should register a callback for a particular port in 
> the same way it always does.  *As an optimization*, we should have 
> another interface that says that these values are only valid for this 
> IO port.  That would let us create eventfds and register things behind 
> the scenes.

The semantics are different.  The normal callbacks are synchronous; the 
vcpu is halted until the callback is serviced.  For most callbacks, this 
is critical, not just per-cpu things like vmport (example: cirrus bank 
switching).

I agree it shouldn't be done by a kvm-specific kvm call, but instead by 
an API, but that API needs to be explicitly asynchronous.  When we 
further thread qemu, we'd also need to specify which thread is to 
execute the callback (and the implementation would add the eventfd to 
the thread's fd poll list).
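
To make the asynchronous variant concrete, here is a minimal sketch, 
assuming Linux eventfd(2) and poll(2).  Every name in it 
(AsyncPortNotifier, async_port_register and so on) is hypothetical and 
is not an existing qemu or kvm interface.  The notifying side only 
signals and returns; a chosen thread later picks the event up from its 
poll list and runs the handler.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical names throughout; none of this is an existing qemu API. */
typedef void (*AsyncPortHandler)(void *opaque);

typedef struct AsyncPortNotifier {
    uint16_t port;        /* guest I/O port being watched */
    int fd;               /* eventfd signalled on a matching write */
    AsyncPortHandler fn;  /* runs later, in the polling thread */
    void *opaque;
} AsyncPortNotifier;

/* Example handler: counts how often it was run. */
static void count_kick(void *opaque)
{
    ++*(int *)opaque;
}

/* Register an explicitly asynchronous handler; returns 0 on success. */
static int async_port_register(AsyncPortNotifier *n, uint16_t port,
                               AsyncPortHandler fn, void *opaque)
{
    assert(n != NULL && fn != NULL);
    n->fd = eventfd(0, EFD_NONBLOCK);
    if (n->fd < 0) {
        return -1;
    }
    n->port = port;
    n->fn = fn;
    n->opaque = opaque;
    return 0;
}

/* Notifying side: signal the eventfd and return without waiting. */
static int async_port_notify(AsyncPortNotifier *n)
{
    uint64_t one = 1;
    return write(n->fd, &one, sizeof(one)) == sizeof(one) ? 0 : -1;
}

/*
 * Polling-thread side: wait up to timeout_ms, drain the counter and run
 * the handler once.  Returns 1 if the handler ran, 0 on timeout, -1 on
 * error.  Several notifications may coalesce into a single handler run.
 */
static int async_port_poll(AsyncPortNotifier *n, int timeout_ms)
{
    struct pollfd pfd = { .fd = n->fd, .events = POLLIN };
    uint64_t count;
    int ret = poll(&pfd, 1, timeout_ms);

    if (ret <= 0) {
        return ret;
    }
    if (read(n->fd, &count, sizeof(count)) != sizeof(count)) {
        return -1;
    }
    n->fn(n->opaque);
    return 1;
}
```

Note that two notifications followed by one poll run the handler only 
once.  That coalescing is exactly why this cannot stand in for a 
synchronous callback.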

>
> That means we can handle TCG, older KVM kernels, and newer KVM kernels 
> without any special support in the device model.  It also means that 
> the device models never have to worry about KVMState because there's 
> an entirely different piece of code that's consulting the set of 
> special ports and then deciding how to handle it.  The result is 
> better, more portable code that doesn't have KVM-isms.

Yes.

>
> If passing state around seems to be ugly, it's probably because we're 
> not abstracting things correctly.  Removing the state and making it 
> implicit is the wrong solution. 

I agree with the general sentiment that utilizing the fact that a 
variable is global to make it implicit is bad from a software 
engineering point of view.  By restricting access to variables and 
functions, you can enforce modularity on the code.  Much like the 
private: specifier in C++ and other languages.

> Fixing the abstraction is the right solution (or living with the 
> ugliness until someone else is motivated to fix it properly).

And with that too, especially the parenthesized statement.  We have 
qemu-kvm that is overly pragmatic and trie[sd] to avoid modifying 
qemu as much as possible.  We have the qemu.git kvm implementation that 
tries a perfectionist approach that failed because most of the users 
need the featured and tested pragmatic approach.  The two mix like oil 
and water.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
  2011-01-10 22:21         ` Jan Kiszka
@ 2011-01-11  9:31           ` Markus Armbruster
  -1 siblings, 0 replies; 300+ messages in thread
From: Markus Armbruster @ 2011-01-11  9:31 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Anthony Liguori, Anthony Liguori, Marcelo Tosatti, Glauber Costa,
	kvm, qemu-devel

Jan Kiszka <jan.kiszka@web.de> writes:

> Am 10.01.2011 22:06, Jan Kiszka wrote:
>>>  kvmclock should be created with
>>> kvm_state as a parameter and kvm_vm_ioctl() is passed the stored
>>> reference.   Taking a global reference to kvm_state in machine_init is
>>> not a bad thing, obviously the machine initialization function needs
>>> access to the kvm_state.
>> 
>> This would also require changing sysbus interfaces for the sake of KVM's
>> "abstraction". If this is the only way forward, I could look into this.
>
> Actually, there is already a channel to pass pointers to qdev devices:
> the pointer property hack. I'm not sure we should contribute to its user
> base

We shouldn't.

>      or take the chance for a cleanup, but we are not alone with this
> requirement. Point below remains valid, though.
>
>> 
>> Still, I do not see any benefit for the affected code. You then either
>> need to "steal" a kvm_state reference from the first cpu or introduce a
>> marvelous interface like kvm_get_state() to make this work from outside
>> of the KVM core.

^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
  2011-01-11  9:31           ` Markus Armbruster
@ 2011-01-11 13:54             ` Anthony Liguori
  -1 siblings, 0 replies; 300+ messages in thread
From: Anthony Liguori @ 2011-01-11 13:54 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Jan Kiszka, Marcelo Tosatti, Glauber Costa, kvm, qemu-devel

On 01/11/2011 03:31 AM, Markus Armbruster wrote:
> Jan Kiszka<jan.kiszka@web.de>  writes:
>
>    
>> Am 10.01.2011 22:06, Jan Kiszka wrote:
>>      
>>>>   kvmclock should be created with
>>>> kvm_state as a parameter and kvm_vm_ioctl() is passed the stored
>>>> reference.   Taking a global reference to kvm_state in machine_init is
>>>> not a bad thing, obviously the machine initialization function needs
>>>> access to the kvm_state.
>>>>          
>>> This would also require changing sysbus interfaces for the sake of KVM's
>>> "abstraction". If this is the only way forward, I could look into this.
>>>        
>> Actually, there is already a channel to pass pointers to qdev devices:
>> the pointer property hack. I'm not sure we should contribute to its user
>> base
>>      
> We shouldn't.
>    

Right, we should introduce a KVMBus that KVM devices are created on.  
The devices can get at KVMState through the BusState.
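
A rough sketch of the idea, assuming heavily simplified stand-ins for 
the qdev types (the real BusState and DeviceState carry much more, and 
KVMBus, kvm_bus_attach and kvm_dev_get_state are hypothetical names): 
the bus embeds the generic bus object and holds the KVMState pointer, 
and a device recovers it from its parent bus instead of from a global 
or a pointer property.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins, not the real qdev definitions. */
typedef struct KVMState { int vmfd; } KVMState;
typedef struct BusState { const char *name; } BusState;
typedef struct DeviceState { BusState *parent_bus; } DeviceState;

/* Hypothetical bus type: generic bus plus the KVM handle. */
typedef struct KVMBus {
    BusState qbus;
    KVMState *kvm_state;
} KVMBus;

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Done once by the machine init code, which owns the KVMState. */
static void kvm_bus_init(KVMBus *bus, KVMState *s)
{
    bus->qbus.name = "kvm-bus";
    bus->kvm_state = s;
}

/* Creating a device on the bus is what gives it access to KVM. */
static void kvm_bus_attach(KVMBus *bus, DeviceState *dev)
{
    dev->parent_bus = &bus->qbus;
}

/* Device code reaches KVMState via its parent bus only. */
static KVMState *kvm_dev_get_state(DeviceState *dev)
{
    assert(dev->parent_bus != NULL);
    return container_of(dev->parent_bus, KVMBus, qbus)->kvm_state;
}
```

A kvmclock device created on such a bus would pass 
kvm_dev_get_state(dev) to its VM ioctl wrapper.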

Regards,

Anthony Liguori

>>       or take the chance for a cleanup, but we are not alone with this
>> requirement. Point below remains valid, though.
>>
>>      
>>> Still, I do not see any benefit for the affected code. You then either
>>> need to "steal" a kvm_state reference from the first cpu or introduce a
>>> marvelous interface like kvm_get_state() to make this work from outside
>>> of the KVM core.
>>>        


^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] Re: [PATCH 26/35] kvm: Eliminate KVMState arguments
  2011-01-11  9:01                   ` Avi Kivity
@ 2011-01-11 14:00                     ` Anthony Liguori
  -1 siblings, 0 replies; 300+ messages in thread
From: Anthony Liguori @ 2011-01-11 14:00 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Jan Kiszka, Marcelo Tosatti, qemu-devel, kvm, Alexander Graf

On 01/11/2011 03:01 AM, Avi Kivity wrote:
> On 01/10/2011 10:23 PM, Anthony Liguori wrote:
>>>> I don't see how ioapic, pit, or pic have a system scope.
>>> They are not bound to any CPU like the APIC which you may have in mind.
>>
>> And none of the above interact with KVM.
>
> They're implemented by kvm.  What deeper interaction do you have in mind?

The emulated ioapic/pit/pic do not interact with KVM at all.

The KVM versions should be completely separate devices.

>
>>
>> They may be replaced by KVM but if you look at the PIT, this is done 
>> by having two distinct devices.  The KVM specific device can (and 
>> should) be instantiated with kvm_state.
>>
>> The way the IOAPIC/APIC/PIC is handled in qemu-kvm is nasty.  The 
>> kernel devices are separate devices and that should be reflected in 
>> the device tree.
>
> I don't see why.  Those are just two different implementations for the 
> same guest visible device.

Right, they should appear the same to the guest but the fact that 
they're two different implementations should be reflected in the device 
tree.

>   It's like saying IDE should be seen differently if it's backed by 
> qcow2 or qed.

No, it's not at all.

Advantages of separating KVM devices:

1) it becomes very clear what functionality is handled in the kernel 
versus in userspace (you can actually look at the code and tell)

2) a user can explicitly create either the emulated version of the 
device or the in-kernel version of the device (no need for -no-kvm-irqchip)

3) a user can pass parameters directly to the in-kernel version of the 
device that are different from the userspace version (like selecting 
different interrupt catch-up methods)

Regards,

Anthony Liguori

> The device tree is about the guest view of devices.
>


^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] Re: [PATCH 26/35] kvm: Eliminate KVMState arguments
  2011-01-11 14:00                     ` Anthony Liguori
@ 2011-01-11 14:06                       ` Alexander Graf
  -1 siblings, 0 replies; 300+ messages in thread
From: Alexander Graf @ 2011-01-11 14:06 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Avi Kivity, Jan Kiszka, Marcelo Tosatti, qemu-devel, kvm


On 11.01.2011, at 15:00, Anthony Liguori wrote:

> On 01/11/2011 03:01 AM, Avi Kivity wrote:
>> On 01/10/2011 10:23 PM, Anthony Liguori wrote:
>>>>> I don't see how ioapic, pit, or pic have a system scope.
>>>> They are not bound to any CPU like the APIC which you may have in mind.
>>> 
>>> And none of the above interact with KVM.
>> 
>> They're implemented by kvm.  What deeper interaction do you have in mind?
> 
> The emulated ioapic/pit/pic do not interact with KVM at all.
> 
> The KVM versions should be completely separate devices.
> 
>> 
>>> 
>>> They may be replaced by KVM but if you look at the PIT, this is done by having two distinct devices.  The KVM specific device can (and should) be instantiated with kvm_state.
>>> 
>>> The way the IOAPIC/APIC/PIC is handled in qemu-kvm is nasty.  The kernel devices are separate devices and that should be reflected in the device tree.
>> 
>> I don't see why.  Those are just two different implementations for the same guest visible device.
> 
> Right, they should appear the same to the guest but the fact that they're two different implementations should be reflected in the device tree.
> 
>>  It's like saying IDE should be seen differently if it's backed by qcow2 or qed.
> 
> No, it's not at all.
> 
> Advantages of separating KVM devices:
> 
> 1) it becomes very clear what functionality is handled in the kernel versus in userspace (you can actually look at the code and tell)
> 
> 2) a user can explicitly create either the emulated version of the device or the in-kernel version of the device (no need for -no-kvm-irqchip)
> 
> 3) a user can pass parameters directly to the in-kernel version of the device that are different from the userspace version (like selecting different interrupt catch-up methods)

Disadvantages:

1) you lose migration / savevm between KVM and non-KVM VMs

I'm not saying this is unsolvable, but it's certainly something that bothers me :). Some sort of meta-device for KVM implemented devices and emulated devices would be nice. That device would then be the one state gets saved/restored from.


Alex


^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] Re: [PATCH 26/35] kvm: Eliminate KVMState arguments
  2011-01-11 14:06                       ` Alexander Graf
@ 2011-01-11 14:09                         ` Anthony Liguori
  -1 siblings, 0 replies; 300+ messages in thread
From: Anthony Liguori @ 2011-01-11 14:09 UTC (permalink / raw)
  To: Alexander Graf; +Cc: Avi Kivity, Jan Kiszka, Marcelo Tosatti, qemu-devel, kvm

On 01/11/2011 08:06 AM, Alexander Graf wrote:
> On 11.01.2011, at 15:00, Anthony Liguori wrote:
>
>    
>> On 01/11/2011 03:01 AM, Avi Kivity wrote:
>>      
>>> On 01/10/2011 10:23 PM, Anthony Liguori wrote:
>>>        
>>>>>> I don't see how ioapic, pit, or pic have a system scope.
>>>>>>              
>>>>> They are not bound to any CPU like the APIC which you may have in mind.
>>>>>            
>>>> And none of the above interact with KVM.
>>>>          
>>> They're implemented by kvm.  What deeper interaction do you have in mind?
>>>        
>> The emulated ioapic/pit/pic do not interact with KVM at all.
>>
>> The KVM versions should be completely separate devices.
>>
>>      
>>>        
>>>> They may be replaced by KVM but if you look at the PIT, this is done by having two distinct devices.  The KVM specific device can (and should) be instantiated with kvm_state.
>>>>
>>>> The way the IOAPIC/APIC/PIC is handled in qemu-kvm is nasty.  The kernel devices are separate devices and that should be reflected in the device tree.
>>>>          
>>> I don't see why.  Those are just two different implementations for the same guest visible device.
>>>        
>> Right, they should appear the same to the guest but the fact that they're two different implementations should be reflected in the device tree.
>>
>>      
>>>   It's like saying IDE should be seen differently if it's backed by qcow2 or qed.
>>>        
>> No, it's not at all.
>>
>> Advantages of separating KVM devices:
>>
>> 1) it becomes very clear what functionality is handled in the kernel versus in userspace (you can actually look at the code and tell)
>>
>> 2) a user can explicitly create either the emulated version of the device or the in-kernel version of the device (no need for -no-kvm-irqchip)
>>
>> 3) a user can pass parameters directly to the in-kernel version of the device that are different from the userspace version (like selecting different interrupt catch-up methods)
>>      
> Disadvantages:
>
> 1) you lose migration / savevm between KVM and non-KVM VMs
>    

This doesn't work today and it's never worked.  KVM exposes things that 
TCG cannot emulate (like pvclock).

Even as two devices, nothing prevents it from working.  Both devices 
just have to support each other's savevm format.  If they use the same 
code, it makes it very easy.  Take a look at how the KVM PIT is 
implemented for an example of this.
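
As an illustration only, a sketch of what "use the same code" can mean. 
It assumes a made-up five-byte wire format and hypothetical names 
(TimerCommonState, timer_common_save, timer_common_load), not the real 
PIT layout: both device variants embed one common state struct and 
share the save/load helpers, so a stream written by one can be read by 
the other.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified timer state; not the real PIT layout. */
typedef struct TimerCommonState {
    uint32_t count;
    uint8_t mode;
} TimerCommonState;

#define TIMER_WIRE_SIZE 5

/* Userspace-emulated variant. */
typedef struct EmulatedTimer {
    TimerCommonState common;
} EmulatedTimer;

/*
 * In-kernel variant.  A real one would fetch the state from the kernel
 * before saving and push it back after loading; that part is omitted.
 */
typedef struct KernelTimer {
    TimerCommonState common;
    int vmfd;
} KernelTimer;

/* Shared by both variants: fixed little-endian wire format. */
static void timer_common_save(const TimerCommonState *s, uint8_t *buf)
{
    buf[0] = s->count & 0xff;
    buf[1] = (s->count >> 8) & 0xff;
    buf[2] = (s->count >> 16) & 0xff;
    buf[3] = (s->count >> 24) & 0xff;
    buf[4] = s->mode;
}

static void timer_common_load(TimerCommonState *s, const uint8_t *buf)
{
    s->count = (uint32_t)buf[0] | ((uint32_t)buf[1] << 8) |
               ((uint32_t)buf[2] << 16) | ((uint32_t)buf[3] << 24);
    s->mode = buf[4];
}
```

Saving from an EmulatedTimer and loading into a KernelTimer (or the 
reverse) goes through the same two functions, so the formats cannot 
drift apart.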

Regards,

Anthony Liguori

> I'm not saying this is unsolvable, but it's certainly something that bothers me :). Some sort of meta-device for KVM implemented devices and emulated devices would be nice. That device would then be the one state gets saved/restored from.
>
>
> Alex
>
>    


^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] Re: [PATCH 26/35] kvm: Eliminate KVMState arguments
  2011-01-11 14:00                     ` Anthony Liguori
@ 2011-01-11 14:18                       ` Avi Kivity
  -1 siblings, 0 replies; 300+ messages in thread
From: Avi Kivity @ 2011-01-11 14:18 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Jan Kiszka, Marcelo Tosatti, qemu-devel, kvm, Alexander Graf

On 01/11/2011 04:00 PM, Anthony Liguori wrote:
> On 01/11/2011 03:01 AM, Avi Kivity wrote:
>> On 01/10/2011 10:23 PM, Anthony Liguori wrote:
>>>>> I don't see how ioapic, pit, or pic have a system scope.
>>>> They are not bound to any CPU like the APIC which you may have in 
>>>> mind.
>>>
>>> And none of the above interact with KVM.
>>
>> They're implemented by kvm.  What deeper interaction do you have in 
>> mind?
>
> The emulated ioapic/pit/pic do not interact with KVM at all.

How can they "not interact" with kvm if they're implemented by kvm?

I really don't follow here.

>
> The KVM versions should be completely separate devices.
>

Why?

>> I don't see why.  Those are just two different implementations for 
>> the same guest visible device.
>
> Right, they should appear the same to the guest but the fact that 
> they're two different implementations should be reflected in the 
> device tree.

Why?

To move beyond single-word questions, what is the purpose of the device 
tree?  In my mind, it reflects the virtual hardware.  What's important 
is that we have a PIC, virtio network adapter, and IDE disk.  Not that 
they're backed by kvm, vhost-net, and qcow2.

>
>>   It's like saying IDE should be seen differently if it's backed by 
>> qcow2 or qed.
>
> No, it's not at all.
>
> Advantages of separating KVM devices:
>
> 1) it becomes very clear what functionality is handled in the kernel 
> versus in userspace (you can actually look at the code and tell)

How something is implemented is not important, certainly not important 
enough to expose to the user as a monitor or live migration ABI.

>
> 2) a user can explicitly create either the emulated version of the 
> device or the in-kernel version of the device (no need for 
> -no-kvm-irqchip)

-device ioapic,model=kernel vs. -device kvm-ioapic?

Is it really important to do that? 110% of the time we want the kernel 
irqchips.  The remaining -10% are only used for testing.

>
> 3) a user can pass parameters directly to the in-kernel version of the 
> device that are different from the userspace version (like selecting 
> different interrupt catch-up methods)

-device pit,model=qemu,catchup=slew

error: catchup=slew not supported in this model

I'm not overly concerned about the implementation part.  Though I think 
it's better to have a single implementation with kvm acting as an 
accelerator, having it the other way is no big deal.  What I am worried 
about is exposing it as a monitor and migration ABI.  IMO the only 
important thing is the spec that the device implements, not what piece 
of code implements it.
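
A minimal sketch of the single-device, model= approach shown above, in Python for brevity. The option tables, the default model, and the helper names are assumptions made up for illustration; they are not QEMU's actual property handling. It shows one "pit" device type whose accepted options are checked against the selected model:

```python
# Hypothetical sketch: one "pit" device type whose accepted options depend on
# a "model" property. Names and option sets are illustrative, not QEMU's.

# Assumed per-model option tables (made up for this example).
SUPPORTED_OPTIONS = {
    "qemu": {"catchup": {"drop"}},
    "kernel": {"catchup": {"drop", "slew"}},
}


def parse_device_arg(arg):
    """Split 'driver,key=value,...' into (driver, {key: value})."""
    driver, *pairs = arg.split(",")
    props = dict(pair.split("=", 1) for pair in pairs)
    return driver, props


def validate_pit(props):
    """Check every option against the table of the chosen model."""
    model = props.get("model", "kernel")  # assumed default
    if model not in SUPPORTED_OPTIONS:
        raise ValueError(f"unknown model: {model}")
    for key, value in props.items():
        if key == "model":
            continue
        allowed = SUPPORTED_OPTIONS[model].get(key)
        if allowed is None or value not in allowed:
            raise ValueError(f"{key}={value} not supported in this model")
    return model
```

With these assumed tables, validate_pit() rejects the props parsed from "pit,model=qemu,catchup=slew" with the error text quoted above, while the same option with model=kernel is accepted.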

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] Re: [PATCH 26/35] kvm: Eliminate KVMState arguments
  2011-01-11 14:09                         ` Anthony Liguori
@ 2011-01-11 14:22                           ` Avi Kivity
  -1 siblings, 0 replies; 300+ messages in thread
From: Avi Kivity @ 2011-01-11 14:22 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Alexander Graf, Jan Kiszka, Marcelo Tosatti, qemu-devel, kvm

On 01/11/2011 04:09 PM, Anthony Liguori wrote:
>> Disadvantages:
>>
>> 1) you lose migration / savevm between KVM and non-KVM VMs
>
> This doesn't work today and it's never worked.  KVM exposes things 
> that TCG cannot emulate (like pvclock).

If you run kvm without pvclock, or implement pvclock in qemu, it works 
fine.  It should work fine for the PIT, PIC, and IOAPIC (never tried it 
myself).

If we decide to have a kernel hpet implementation, for example, it would 
be good to be able to live migrate from a version without kernel hpet, 
to a version with kernel hpet, and have the kernel hpet enabled.

>
> Even as two devices, nothing prevents it from working.  Both devices 
> just have to support each other's savevm format.  If they use the same 
> code, it makes it very easy.  Take a look at how the KVM PIT is 
> implemented for an example of this.

They need to use the same device id then.  And if they share code, that 
indicates that they need to be the same device even more.
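
A minimal sketch of the "same device id" point, in Python for brevity. The section id string, the state fields, and the JSON container are assumptions for illustration; this is not the real savevm format. Two implementations that save and load under one id, with one layout, can restore each other's state:

```python
# Hypothetical model of savevm sections keyed by a device id string.
import json

SECTION_ID = "pit"  # assumed shared id; not taken from the QEMU source


class UserspacePit:
    section_id = SECTION_ID

    def __init__(self, count=0, mode=0):
        self.count, self.mode = count, mode

    def save(self):
        return {"count": self.count, "mode": self.mode}

    def load(self, state):
        self.count, self.mode = state["count"], state["mode"]


class KernelPit(UserspacePit):
    """Stands in for an in-kernel implementation; reuses the same format."""


def savevm(devices):
    """Serialize every device's state under its section id."""
    return json.dumps({dev.section_id: dev.save() for dev in devices})


def loadvm(blob, devices):
    """Hand each saved section to the device registered under the same id."""
    sections = json.loads(blob)
    by_id = {dev.section_id: dev for dev in devices}
    for section_id, state in sections.items():
        if section_id not in by_id:
            raise KeyError(f"no device for section {section_id!r}")
        by_id[section_id].load(state)
```

In this model a snapshot taken from UserspacePit loads into KernelPit because both use the id "pit"; a device registered under a different id would make loadvm() fail.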

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] Re: [PATCH 26/35] kvm: Eliminate KVMState arguments
  2011-01-11 14:09                         ` Anthony Liguori
@ 2011-01-11 14:24                           ` Alexander Graf
  -1 siblings, 0 replies; 300+ messages in thread
From: Alexander Graf @ 2011-01-11 14:24 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Avi Kivity, Jan Kiszka, Marcelo Tosatti, qemu-devel, kvm


On 11.01.2011, at 15:09, Anthony Liguori wrote:

> On 01/11/2011 08:06 AM, Alexander Graf wrote:
>> On 11.01.2011, at 15:00, Anthony Liguori wrote:
>> 
>>   
>>> On 01/11/2011 03:01 AM, Avi Kivity wrote:
>>>     
>>>> On 01/10/2011 10:23 PM, Anthony Liguori wrote:
>>>>       
>>>>>>> I don't see how ioapic, pit, or pic have a system scope.
>>>>>>>             
>>>>>> They are not bound to any CPU like the APIC which you may have in mind.
>>>>>>           
>>>>> And none of the above interact with KVM.
>>>>>         
>>>> They're implemented by kvm.  What deeper interaction do you have in mind?
>>>>       
>>> The emulated ioapic/pit/pic do not interact with KVM at all.
>>> 
>>> The KVM versions should be completely separate devices.
>>> 
>>>     
>>>>       
>>>>> They may be replaced by KVM but if you look at the PIT, this is done by having two distinct devices.  The KVM specific device can (and should) be instantiated with kvm_state.
>>>>> 
>>>>> The way the IOAPIC/APIC/PIC is handled in qemu-kvm is nasty.  The kernel devices are separate devices and that should be reflected in the device tree.
>>>>>         
>>>> I don't see why.  Those are just two different implementations for the same guest visible device.
>>>>       
>>> Right, they should appear the same to the guest but the fact that they're two different implementations should be reflected in the device tree.
>>> 
>>>     
>>>>  It's like saying IDE should be seen differently if it's backed by qcow2 or qed.
>>>>       
>>> No, it's not at all.
>>> 
>>> Advantages of separating KVM devices:
>>> 
>>> 1) it becomes very clear what functionality is handled in the kernel versus in userspace (you can actually look at the code and tell)
>>> 
>>> 2) a user can explicitly create either the emulated version of the device or the in-kernel version of the device (no need for -no-kvm-irqchip)
>>> 
>>> 3) a user can pass parameters directly to the in-kernel version of the device that are different from the userspace version (like selecting different interrupt catch-up methods)
>>>     
>> Disadvantages:
>> 
>> 1) you lose migration / savevm between KVM and non-KVM VMs
>>   
> 
> This doesn't work today and it's never worked.  KVM exposes things that TCG cannot emulate (like pvclock).

Those cases simply shouldn't exist and hurt us (or at least me). I had exactly the pvclock issue with xenner. Xenner can't do proper timekeeping in emulation mode. So an emulated pvclock implementation is (pretty low) on my todo list.

> Even as two devices, nothing prevents it from working.  Both devices just have to support each other's savevm format.  If they use the same code, it makes it very easy.  Take a look at how the KVM PIT is implemented for an example of this.

If that's all it takes, fine. It makes it pretty hard to enforce, but I guess we can get away with that :).

Making devices separate basically hurts abstraction. I don't see any use case where we should have a KVM device without an emulated equivalent. For the CPU we also think of KVM as an accelerator instead of a separate device, no? :)


Alex


^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] Re: [PATCH 26/35] kvm: Eliminate KVMState arguments
  2011-01-11 14:18                       ` Avi Kivity
@ 2011-01-11 14:28                         ` Anthony Liguori
  -1 siblings, 0 replies; 300+ messages in thread
From: Anthony Liguori @ 2011-01-11 14:28 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Jan Kiszka, Marcelo Tosatti, qemu-devel, kvm, Alexander Graf

On 01/11/2011 08:18 AM, Avi Kivity wrote:
> On 01/11/2011 04:00 PM, Anthony Liguori wrote:
>> On 01/11/2011 03:01 AM, Avi Kivity wrote:
>>> On 01/10/2011 10:23 PM, Anthony Liguori wrote:
>>>>>> I don't see how ioapic, pit, or pic have a system scope.
>>>>> They are not bound to any CPU like the APIC which you may have in 
>>>>> mind.
>>>>
>>>> And none of the above interact with KVM.
>>>
>>> They're implemented by kvm.  What deeper interaction do you have in 
>>> mind?
>>
>> The emulated ioapic/pit/pic do not interact with KVM at all.
>
> How can they "not interact" with kvm if they're implemented by kvm?
>
> I really don't follow here.

"emulated ioapic/pit/pic" == versions implemented in QEMU.  That's what 
I'm trying to say.  When not using the KVM versions of the devices, 
there are no interactions with KVM.

>>
>> The KVM versions should be completely separate devices.
>>
>
> Why?

Because the KVM versions are replacements.

>>> I don't see why.  Those are just two different implementations for 
>>> the same guest visible device.
>>
>> Right, they should appear the same to the guest but the fact that 
>> they're two different implementations should be reflected in the 
>> device tree.
>
> Why?
>
> To move beyond single-word questions, what is the purpose of the 
> device tree?  In my mind, it reflects the virtual hardware.  What's 
> important is that we have a PIC, virtio network adapter, and IDE 
> disk.  Not that they're backed by kvm, vhost-net, and qcow2.

Let me give a very concrete example to illustrate my point.

One thing I have on my TODO is to implement catch-up support for the 
emulated devices.  I want to implement three modes of catch-up support: 
drop, fast, and gradual.  Gradual is the best policy IMHO but fast is 
necessary on older kernels without highres timers.  Drop is necessary to 
maintain compatibility with what we have today.
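
A rough sketch of how the three modes could differ, in Python for brevity. The semantics here are assumptions for illustration only: "drop" discards missed ticks, "fast" reinjects the whole backlog at once, and "gradual" reinjects at most max_extra additional ticks per period. The function name and the max_extra parameter are hypothetical:

```python
def inject_schedule(missed, policy, periods, max_extra=1):
    """Return the ticks delivered in each of `periods` timer periods after a
    stall during which `missed` ticks could not be delivered.

    Every period delivers its own regular tick; the policy decides what
    happens to the backlog of missed ticks.
    """
    if policy not in ("drop", "fast", "gradual"):
        raise ValueError(f"unknown policy: {policy}")
    backlog = 0 if policy == "drop" else missed
    schedule = []
    for _ in range(periods):
        if policy == "fast":
            extra = backlog                  # reinject everything at once
        elif policy == "gradual":
            extra = min(backlog, max_extra)  # bounded extra ticks per period
        else:
            extra = 0                        # drop: backlog is already empty
        backlog -= extra
        schedule.append(1 + extra)
    return schedule
```

Under these assumptions, three missed ticks over four periods give [1, 1, 1, 1] for drop, [4, 1, 1, 1] for fast, and [2, 2, 2, 1] for gradual: fast and gradual deliver the same total, only spread differently.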

The kernel PIT only implements one mode and even if the other two were 
added, even the newest version of QEMU needs to deal with the fact that 
there are old kernels out there with PITs that only do fast.

So how does this get exposed to management tools?  Do you check for 
drift-mode=fast and transparently enable the KVM pit?  Do you fail if 
anything but drift-mode=fast is specified?

We need to have the following mechanisms:

1) the ability to select an in-kernel PIT vs. a userspace PIT

2) an independent mechanism to configure the userspace PIT

3) an independent mechanism to configure the in-kernel PIT.

The best way to do this is to make the in-kernel PIT a separate device.  
Then we get all of this for free.

>>
>> 2) a user can explicitly create either the emulated version of the 
>> device or the in-kernel version of the device (no need for 
>> -no-kvm-irqchip)
>
> -device ioapic,model=kernel vs. -device kvm-ioapic?
>
> Is it really important to do that? 110% of the time we want the kernel 
> irqchips.  The remaining -10% are only used for testing.

If model=kernel makes the supported options different, then you end up 
introducing another layer of option validation.  By using the latter form, 
you get to leverage the option validation of qdev plus it makes it much 
clearer to users what options are supported in what model because now 
the documentation is explicit about it.
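
A minimal sketch of the separate-device form, in Python for brevity. The registry, the "kvm-pit" type name, and the property sets are hypothetical; the code only loosely models per-type property lists and is not qdev itself. Each device type declares its own properties, so a single generic check covers both devices:

```python
# Hypothetical registry: each device type declares its own properties.
# "drift-mode" follows the option name used above; the value sets are assumed.
DEVICE_TYPES = {
    "pit": {"drift-mode": {"drop", "fast", "gradual"}},
    "kvm-pit": {"drift-mode": {"fast"}},
}


def create_device(arg):
    """Parse 'driver,key=value,...' and validate against the driver's table."""
    driver, *pairs = arg.split(",")
    if driver not in DEVICE_TYPES:
        raise ValueError(f"unknown device type: {driver}")
    schema = DEVICE_TYPES[driver]
    props = {}
    for pair in pairs:
        key, value = pair.split("=", 1)
        if key not in schema:
            raise ValueError(f"{driver}: unknown property {key}")
        if value not in schema[key]:
            raise ValueError(f"{driver}: invalid value {key}={value}")
        props[key] = value
    return driver, props
```

Here the choice between userspace and in-kernel is just the type name, and the table for each type doubles as its documentation of supported options.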

>>
>> 3) a user can pass parameters directly to the in-kernel version of 
>> the device that are different from the userspace version (like 
>> selecting different interrupt catch-up methods)
>
> -device pit,model=qemu,catchup=slew
>
> error: catchup=slew not supported in this model
>
> I'm not overly concerned about the implementation part.  Though I 
> think it's better to have a single implementation with kvm acting as 
> an accelerator, having it the other way is no big deal.  What I am 
> worried about is exposing it as a monitor and migration ABI.  IMO the 
> only important thing is the spec that the device implements, not what 
> piece of code implements it.

Just as we do in the PIT, there's nothing wrong with making the devices 
migration compatible.  I'm not entirely sure what your concerns about 
the monitor are but there's simply no way to hide the fact that a device 
is implemented in KVM at the monitor level.  But really, is this 
something that management tools want?  I doubt it.  I think they want to 
have ultimate control over what gets created with us providing a 
recommended set of defaults.

Regards,

Anthony Liguori


^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] Re: [PATCH 26/35] kvm: Eliminate KVMState arguments
  2011-01-11 14:22                           ` Avi Kivity
@ 2011-01-11 14:36                             ` Anthony Liguori
  -1 siblings, 0 replies; 300+ messages in thread
From: Anthony Liguori @ 2011-01-11 14:36 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Alexander Graf, Jan Kiszka, Marcelo Tosatti, qemu-devel, kvm

On 01/11/2011 08:22 AM, Avi Kivity wrote:
> On 01/11/2011 04:09 PM, Anthony Liguori wrote:
>>> Disadvantages:
>>>
>>> 1) you lose migration / savevm between KVM and non-KVM VMs
>>
>> This doesn't work today and it's never worked.  KVM exposes things 
>> that TCG cannot emulate (like pvclock).
>
> If you run kvm without pvclock, or implement pvclock in qemu, it works 
> fine.  It should work fine for the PIT, PIC, and IOAPIC (never tried 
> it myself).
>
> If we decide to have a kernel hpet implementation, for example, it 
> would be good to be able to live migrate from a version without kernel 
> hpet, to a version with kernel hpet, and have the kernel hpet enabled.
>
>>
>> Even as two devices, nothing prevents it from working.  Both devices 
>> just have to support each other's savevm format.  If they use the 
>> same code, it makes it very easy.  Take a look at how the KVM PIT is 
>> implemented for an example of this.
>
> They need to use the same device id then.  And if they share code, 
> that indicates that they need to be the same device even more.

No, it really doesn't :-)  Cirrus VGA and std VGA share a lot of code.  
But that doesn't mean that we treat them as one device.

And BTW, there are guest visible differences between the KVM 
IOAPIC/PIC/PIT and the QEMU versions.  The only reason PIT live 
migration works today is because it usually delivers all interrupts 
quickly.  But it actually does maintain state in the work queue that 
isn't saved.  If PIT tried to implement gradual catchup, there would be 
no way not to expose that state to userspace.
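
A toy illustration of the unsaved-state problem, in Python for brevity. The class and field names are hypothetical and do not reflect the kernel PIT's real structures. It models a device whose saved state omits its queue of pending ticks:

```python
class PitModel:
    """Toy PIT: `count` is migrated, `pending` (queued ticks) is not."""

    def __init__(self):
        self.count = 0    # guest-visible counter state (saved)
        self.pending = 0  # ticks queued for injection (not saved here)

    def save(self):
        # Only the fields the save format knows about are written out.
        return {"count": self.count}

    @classmethod
    def load(cls, state):
        pit = cls()
        pit.count = state["count"]
        return pit
```

If the queue is nearly always empty, losing it on migration goes unnoticed; a gradual catch-up policy would keep a backlog around for longer, so the backlog would have to become part of the saved state.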



Regards,

Anthony Liguori



^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] Re: [PATCH 26/35] kvm: Eliminate KVMState arguments
  2011-01-11 14:28                         ` Anthony Liguori
@ 2011-01-11 14:52                           ` Avi Kivity
  -1 siblings, 0 replies; 300+ messages in thread
From: Avi Kivity @ 2011-01-11 14:52 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Jan Kiszka, Marcelo Tosatti, qemu-devel, kvm, Alexander Graf

On 01/11/2011 04:28 PM, Anthony Liguori wrote:
> On 01/11/2011 08:18 AM, Avi Kivity wrote:
>> On 01/11/2011 04:00 PM, Anthony Liguori wrote:
>>> On 01/11/2011 03:01 AM, Avi Kivity wrote:
>>>> On 01/10/2011 10:23 PM, Anthony Liguori wrote:
>>>>>>> I don't see how ioapic, pit, or pic have a system scope.
>>>>>> They are not bound to any CPU like the APIC which you may have in 
>>>>>> mind.
>>>>>
>>>>> And none of the above interact with KVM.
>>>>
>>>> They're implemented by kvm.  What deeper interaction do you have in 
>>>> mind?
>>>
>>> The emulated ioapic/pit/pic do not interact with KVM at all.
>>
>> How can they "not interact" with kvm if they're implemented by kvm?
>>
>> I really don't follow here.
>
> "emulated ioapic/pit/pic" == versions implemented in QEMU.  That's 
> what I'm trying to say.  When not using the KVM versions of the 
> devices, there are no interactions with KVM.

Okay.  Isn't that the same for the cpu?  Yet we use the same CPUState 
and are live-migration compatible (as long as cpuids match).

>
>>>
>>> The KVM versions should be completely separate devices.
>>>
>>
>> Why?
>
> Because the KVM versions are replacements.

Only the implementation.  The guest doesn't see the replacement.  They 
have exactly the same state.

>
>>>> I don't see why.  Those are just two different implementations for 
>>>> the same guest visible device.
>>>
>>> Right, they should appear the same to the guest but the fact that 
>>> they're two different implementations should be reflected in the 
>>> device tree.
>>
>> Why?
>>
>> To move beyond single-word questions, what is the purpose of the 
>> device tree?  In my mind, it reflects the virtual hardware.  What's 
>> important is that we have a PIC, virtio network adapter, and IDE 
>> disk.  Not that they're backed by kvm, vhost-net, and qcow2.
>
> Let me give a very concrete example to illustrate my point.
>
> One thing I have on my TODO is to implement catch-up support for the 
> emulated devices.  I want to implement three modes of catch-up 
> support: drop, fast, and gradual.  Gradual is the best policy IMHO but 
> fast is necessary on older kernels without highres timers.  Drop is 
> necessary to maintain compatibility with what we have today.
>
> The kernel PIT only implements one mode and even if the other two were 
> added, even the newest version of QEMU needs to deal with the fact 
> that there are old kernels out there with PITs that only do fast.
>
> So how does this get exposed to management tools?  Do you check for 
> drift-mode=fast and transparently enable the KVM pit?  Do you fail if 
> anything but drift-mode=fast is specified?
>
> We need to have the following mechanisms:
>
> 1) the ability to select an in-kernel PIT vs. a userspace PIT
>
> 2) an independent mechanism to configure the userspace PIT
>
> 3) an independent mechanism to configure the in-kernel PIT.
>
> The best way to do this is to make the in-kernel PIT a separate 
> device.  Then we get all of this for free.

And it buys us live migration and ABI issues for the same price.

Really, can't we do

     class i8254 {
         ...
         virtual void set_catchup_policy(std::string policy) = 0;
         ...
     };

to deal with the differences?
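
A minimal sketch of that interface, fleshed out under assumptions that are 
not stated in this thread: the class names, the backend() accessor, and 
the policy strings "drop", "fast", and "gradual" (borrowed from the 
catch-up modes quoted above) are hypothetical and are not QEMU or KVM 
APIs.  It shows one guest-visible device type whose backends reject the 
policies they cannot provide.

```cpp
#include <cassert>
#include <set>
#include <stdexcept>
#include <string>

// Hypothetical sketch: one guest-visible i8254 device type with two backends.
// None of these names exist in QEMU or KVM.
class I8254 {
public:
    virtual ~I8254() = default;

    // The migration section id follows the device spec, not the backend.
    std::string section_id() const { return "i8254"; }

    // Reported for diagnostics only (for example by a monitor command).
    virtual std::string backend() const = 0;

    // Throws std::invalid_argument if this backend lacks the policy.
    virtual void set_catchup_policy(const std::string &policy) = 0;

    const std::string &catchup_policy() const { return policy_; }

protected:
    // Shared validation: accept the policy only if the backend supports it.
    void apply(const std::set<std::string> &supported,
               const std::string &policy) {
        if (supported.count(policy) == 0) {
            throw std::invalid_argument("catchup=" + policy +
                                        " not supported by " + backend() +
                                        " backend");
        }
        policy_ = policy;
    }

private:
    std::string policy_;
};

class UserspaceI8254 : public I8254 {
public:
    std::string backend() const override { return "userspace"; }
    void set_catchup_policy(const std::string &policy) override {
        apply({"drop", "fast", "gradual"}, policy);
    }
};

class KernelI8254 : public I8254 {
public:
    std::string backend() const override { return "kernel"; }
    void set_catchup_policy(const std::string &policy) override {
        // Assumption taken from the thread: older kernels only do "fast".
        apply({"fast"}, policy);
    }
};
```

In this sketch, option validation stays an internal matter of each 
backend, while section_id() is the same for both, so the choice of 
backend never becomes part of the migration ABI.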

>
>>>
>>> 2) a user can explicitly create either the emulated version of the 
>>> device or the in-kernel version of the device (no need for 
>>> -no-kvm-irqchip)
>>
>> -device ioapic,model=kernel vs. -device kvm-ioapic?
>>
>> Is it really important to do that? 110% of the time we want the 
>> kernel irqchips.  The remaining -10% are only used for testing.
>
> If model=kernel makes the support options different, then you end up 
> introducing another layer of option validation.  By using the latter 
> form, you get to leverage the option validation of qdev plus it makes 
> it much clearer to users what options are supported in what model 
> because now the documentation is explicit about it.

Option validation = internals.  ABI = ABI.  We can deal with the former 
in any number of ways, but exposing it to the ABI is forever.

>
>>>
>>> 3) a user can pass parameters directly to the in-kernel version of 
>>> the device that are different from the userspace version (like 
>>> selecting different interrupt catch-up methods)
>>
>> -device pit,model=qemu,catchup=slew
>>
>> error: catchup=slew not supported in this model
>>
>> I'm not overly concerned about the implementation part.  Though I 
>> think it's better to have a single implementation with kvm acting as 
>> an accelerator, having it the other way is no big deal.  What I am 
>> worried about is exposing it as a monitor and migration ABI.  IMO the 
>> only important thing is the spec that the device implements, not what 
>> piece of code implements it.
>
> Just as we do in the PIT, there's nothing wrong with making the 
> device's migration compatible. 

Then the two devices have the same migration section id?  That's my 
biggest worry.  Not really worried about PIT and PIC (no one uses the 
user PIT now), but more about future devices moving into the kernel, if 
we have to do that.
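
To make the concern concrete, here is a rough sketch, assuming a 
simplified wire format in which each saved section carries only an id 
and named fields.  The Section and Device types and the save()/load() 
helpers are hypothetical and do not reflect QEMU's actual savevm code.  
Loading matches on the section id alone, so a stream saved by one 
backend loads into the other only if both use the same id.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical wire format: a section id plus named fields.
struct Section {
    std::string id;
    std::map<std::string, std::uint32_t> fields;
};

// Hypothetical device: the backend label never reaches the wire.
struct Device {
    std::string section_id;  // part of the migration ABI
    std::string backend;     // implementation detail ("userspace", "kernel")
    std::map<std::string, std::uint32_t> state;
};

// Saving copies the state under the device's section id.
Section save(const Device &dev) {
    return Section{dev.section_id, dev.state};
}

// Loading checks the section id only; the destination backend may differ.
void load(Device &dev, const Section &sec) {
    if (sec.id != dev.section_id) {
        throw std::runtime_error("unknown section '" + sec.id + "'");
    }
    dev.state = sec.fields;
}
```

With a shared id such as "i8254", a userspace source and a kernel 
destination are interchangeable in this model; giving the kernel 
version its own id makes load() fail unless something translates the 
stream first.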

> I'm not entirely sure what your concerns about the monitor are but 
> there's simply no way to hide the fact that a device is implemented in 
> KVM at the monitor level. 

Why is that?  A PIT is a PIT.  Why does the monitor care where the state 
is managed?

> But really, is this something that management tools want?  I doubt 
> it.  I think they want to have ultimate control over what gets created 
> with us providing a recommended set of defaults.

They also want a forward migration path.  Splitting into two separate 
devices (at the ABI level, ignoring the source level for now) denies 
them that.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] Re: [PATCH 26/35] kvm: Eliminate KVMState arguments
  2011-01-11 14:36                             ` Anthony Liguori
@ 2011-01-11 14:56                               ` Avi Kivity
  -1 siblings, 0 replies; 300+ messages in thread
From: Avi Kivity @ 2011-01-11 14:56 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Alexander Graf, Jan Kiszka, Marcelo Tosatti, qemu-devel, kvm

On 01/11/2011 04:36 PM, Anthony Liguori wrote:
>> They need to use the same device id then.  And if they share code, 
>> that indicates that they need to be the same device even more.
>
>
> No, it really doesn't :-)  Cirrus VGA and std VGA share a lot of 
> code.  But that doesn't mean that we treat them as one device.

Cirrus and VGA really are separate devices.  They share code because one 
evolved from the other, and is backwards compatible with the other.  
i8254 and i8254-kvm did not evolve from each other; both are 
implementations of the i8254 spec, and both are 100% compatible with 
each other (modulo bugs).

>
> And BTW, there are guest visible differences between the KVM 
> IOAPIC/PIC/PIT and the QEMU versions.  The only reason PIT live 
> migration works today is because it usually delivers all interrupts 
> quickly.  But it actually does maintain state in the work queue that 
> isn't saved.  If PIT tried to implement gradual catchup, there would 
> be no way not to expose that state to userspace.

Why not?  Whatever state the kernel keeps, we expose to userspace and 
allow sending it over the wire.


-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] Re: [PATCH 26/35] kvm: Eliminate KVMState arguments
  2011-01-11 14:56                               ` Avi Kivity
@ 2011-01-11 15:12                                 ` Anthony Liguori
  -1 siblings, 0 replies; 300+ messages in thread
From: Anthony Liguori @ 2011-01-11 15:12 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Alexander Graf, Jan Kiszka, Marcelo Tosatti, qemu-devel, kvm

On 01/11/2011 08:56 AM, Avi Kivity wrote:
> On 01/11/2011 04:36 PM, Anthony Liguori wrote:
>>> They need to use the same device id then.  And if they share code, 
>>> that indicates that they need to be the same device even more.
>>
>>
>> No, it really doesn't :-)  Cirrus VGA and std VGA share a lot of 
>> code.  But that doesn't mean that we treat them as one device.
>
> Cirrus and VGA really are separate devices.  They share code because 
> one evolved from the other, and is backwards compatible with the 
> other.  i8254 and i8254-kvm did not evolve from each other,

Actually, they did, but that's beside the point.

> both are implementations of the i8254 spec, and both are 100% 
> compatible with each other (modulo bugs).
>
>>
>> And BTW, there are guest visible differences between the KVM 
>> IOAPIC/PIC/PIT and the QEMU versions.  The only reason PIT live 
>> migration works today is because it usually delivers all interrupts 
>> quickly.  But it actually does maintain state in the work queue that 
>> isn't saved.  If PIT tried to implement gradual catchup, there would 
>> be no way not to expose that state to userspace.
>
> Why not?  Whatever state the kernel keeps, we expose to userspace and 
> allow sending it over the wire.

What exactly is the scenario you're concerned about?

Migration between userspace HPET and in-kernel HPET?

One thing I've been considering is essentially migration filters.  It 
would be a set of rules that essentially were "hpet-kvm.* = hpet.*" 
which would allow migration from hpet to hpet-kvm given a translation of 
state.  I think this sort of higher level ruleset would make it easier 
to support migration between versions of the device model.

Of course, that only gives you a forward path.  It doesn't give you a 
backwards path.
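
As a rough illustration of such a filter, here is a sketch that treats 
device state as a flat map of dotted field names.  The State alias, the 
PrefixRule type, and apply_rules() are all hypothetical; nothing like 
this exists in QEMU as far as this thread shows.  A rule such as 
"hpet-kvm.* = hpet.*" becomes a prefix rename applied to the incoming 
state.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical flat view of migrated state: "device.field" -> value.
using State = std::map<std::string, std::uint64_t>;

// One rule of the form "<to>.* = <from>.*".
struct PrefixRule {
    std::string from;  // e.g. "hpet"
    std::string to;    // e.g. "hpet-kvm"
};

// Rename every "<from>.<field>" key to "<to>.<field>".
// Keys that match no rule pass through unchanged.
State apply_rules(const State &in, const std::vector<PrefixRule> &rules) {
    State out;
    for (const auto &kv : in) {
        std::string key = kv.first;
        for (const auto &r : rules) {
            const std::string prefix = r.from + ".";
            if (key.compare(0, prefix.size(), prefix) == 0) {
                key = r.to + "." + key.substr(prefix.size());
                break;  // first matching rule wins
            }
        }
        out[key] = kv.second;
    }
    return out;
}
```

A pure rename like this is the easy case.  A real translation of state 
might have to drop or synthesize fields, and that is where a rule set 
stops being trivially reversible.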

Regards,

Anthony Liguori



^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] Re: [PATCH 26/35] kvm: Eliminate KVMState arguments
  2011-01-11 15:12                                 ` Anthony Liguori
@ 2011-01-11 15:17                                   ` Alexander Graf
  -1 siblings, 0 replies; 300+ messages in thread
From: Alexander Graf @ 2011-01-11 15:17 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Avi Kivity, Jan Kiszka, Marcelo Tosatti, qemu-devel, kvm


On 11.01.2011, at 16:12, Anthony Liguori wrote:

> On 01/11/2011 08:56 AM, Avi Kivity wrote:
>> On 01/11/2011 04:36 PM, Anthony Liguori wrote:
>>>> They need to use the same device id then.  And if they share code, that indicates that they need to be the same device even more.
>>> 
>>> 
>>> No, it really doesn't :-)  Cirrus VGA and std VGA share a lot of code.  But that doesn't mean that we treat them as one device.
>> 
>> Cirrus and VGA really are separate devices.  They share code because one evolved from the other, and is backwards compatible with the other.  i8254 and i8254-kvm did not evolve from each other,
> 
> Actually, they did, but that's beside the point.
> 
>> both are implementations of the i8254 spec, and both are 100% compatible with each other (modulo bugs).
>> 
>>> 
>>> And BTW, there are guest visible differences between the KVM IOAPIC/PIC/PIT and the QEMU versions.  The only reason PIT live migration works today is because it usually delivers all interrupts quickly.  But it actually does maintain state in the work queue that isn't saved.  If PIT tried to implement gradual catchup, there would be no way not to expose that state to userspace.
>> 
>> Why not?  Whatever state the kernel keeps, we expose to userspace and allow sending it over the wire.
> 
> What exactly is the scenario you're concerned about?
> 
> Migration between userspace HPET and in-kernel HPET?
> 
> One thing I've been considering is essentially migration filters.  It would be a set of rules that essentially were "hpet-kvm.* = hpet.*" which would allow migration from hpet to hpet-kvm given a translation of state.  I think this sort of higher level ruleset would make it easier to support migration between versions of the device model.
> 
> Of course, that only gives you a forward path.  It doesn't give you a backwards path.

Why not? Just include the version in the rule set and define a backwards rule if it's easy to do. If not, migration isn't possible.


Alex


^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] Re: [PATCH 26/35] kvm: Eliminate KVMState arguments
  2011-01-11 15:12                                 ` Anthony Liguori
@ 2011-01-11 15:37                                   ` Avi Kivity
  -1 siblings, 0 replies; 300+ messages in thread
From: Avi Kivity @ 2011-01-11 15:37 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Alexander Graf, Jan Kiszka, Marcelo Tosatti, qemu-devel, kvm

On 01/11/2011 05:12 PM, Anthony Liguori wrote:
>>> No, it really doesn't :-)  Cirrus VGA and std VGA share a lot of 
>>> code.  But that doesn't mean that we treat them as one device.
>>
>> Cirrus and VGA really are separate devices.  They share code because 
>> one evolved from the other, and is backwards compatible with the 
>> other.  i8254 and i8254-kvm did not evolve from each other,
>
>
> Actually, they did, but that's beside the point.

The code did, the devices did not.

>> Why not?  Whatever state the kernel keeps, we expose to userspace and 
>> allow sending it over the wire.
>
> What exactly is the scenario you're concerned about?
>
> Migration between userspace HPET and in-kernel HPET?

Yes.  To a lesser extent, a client doing 'info hpet' or similar and 
failing for kernel hpet.

>
> One thing I've been considering is essentially migration filters.  It 
> would be a set of rules that essentially were "hpet-kvm.* = hpet.*" 
> which would allow migration from hpet to hpet-kvm given a translation 
> of state.  I think this sort of higher level ruleset would make it 
> easier to support migration between versions of the device model.
>
> Of course, that only gives you a forward path.  It doesn't give you a 
> backwards path.
>

It would be easier to have them use the same device id in the first place.

If it looks like an i8254, quacks like an i8254, and live migrates like 
an i8254, it's probably an i8254.

^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] Re: [PATCH 26/35] kvm: Eliminate KVMState arguments
  2011-01-11 15:37                                   ` Avi Kivity
@ 2011-01-11 15:55                                     ` Anthony Liguori
  -1 siblings, 0 replies; 300+ messages in thread
From: Anthony Liguori @ 2011-01-11 15:55 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Alexander Graf, Jan Kiszka, Marcelo Tosatti, qemu-devel, kvm

On 01/11/2011 09:37 AM, Avi Kivity wrote:
>>> Why not?  Whatever state the kernel keeps, we expose to userspace 
>>> and allow sending it over the wire.
>>
>> What exactly is the scenario you're concerned about?
>>
>> Migration between userspace HPET and in-kernel HPET?
>
> Yes.  To a lesser extent, a client doing 'info hpet' or similar and 
> failing for kernel hpet.

That's pretty easy to address.

>>
>> One thing I've been considering is essentially migration filters.  It 
>> would be a set of rules that essentially were "hpet-kvm.* = hpet.*" 
>> which would allow migration from hpet to hpet-kvm given a translation 
>> of state.  I think this sort of higher level ruleset would make it 
>> easier to support migration between versions of the device model.
>>
>> Of course, that only gives you a forward path.  It doesn't give you a 
>> backwards path.
>>
>
> It would be easier to have them use the same device id in the first 
> place.
>
> If it looks like an i8254, quacks like an i8254, and live migrates 
> like an i8254, it's probably an i8254.

And that's fine.  I'm not suggesting you call it i8253.  But it's two 
separate implementations.  We should make that visible, not try to hide 
it.  It's an important detail.

Imagine getting a sosreport that includes a dump of the device tree.  
You really want to see something in there that tells you it's an 
in-kernel PIT and not the userspace one.

Regards,

Anthony Liguori

^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] Re: [PATCH 26/35] kvm: Eliminate KVMState arguments
  2011-01-11 15:55                                     ` Anthony Liguori
@ 2011-01-11 16:03                                       ` Avi Kivity
  -1 siblings, 0 replies; 300+ messages in thread
From: Avi Kivity @ 2011-01-11 16:03 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Alexander Graf, Jan Kiszka, Marcelo Tosatti, qemu-devel, kvm

On 01/11/2011 05:55 PM, Anthony Liguori wrote:
>
>>>
>>> One thing I've been considering is essentially migration filters.  
>>> It would be a set of rules that essentially were "hpet-kvm.* = 
>>> hpet.*" which would allow migration from hpet to hpet-kvm given a 
>>> translation of state.  I think this sort of higher level ruleset 
>>> would make it easier to support migration between versions of the 
>>> device model.
>>>
>>> Of course, that only gives you a forward path.  It doesn't give you 
>>> a backwards path.
>>>
>>
>> It would be easier to have them use the same device id in the first 
>> place.
>>
>> If it looks like an i8254, quacks like an i8254, and live migrates 
>> like an i8254, it's probably an i8254.
>
> And that's fine.  I'm not suggesting you call it i8253.  But it's two 
> separate implementations.  We should make that visible, not try to 
> hide it.  It's an important detail.
>

Visible, yes, but not in live migration, or in 'info i8254', or
similar.  We can live migrate between qcow2 and qed (using block
migration); we should be able to do the same for the two i8254
implementations.

I'm not happy about separate implementations, but that's a minor
detail.  We can change it 2n+1 times without anybody noticing.  Not so
for ABI stuff.

> Imagine getting a sosreport that includes a dump of the device tree.  
> You really want to see something in there that tells you it's an 
> in-kernel PIT and not the userspace one.

Sure.  Not the device tree though.  The command line would give all the 
information?

Or 'info i8254' can say something about the implementation.  I don't 
want to have the user say 'info i8254-kvm'.
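
A minimal sketch of the "same device id" idea, not QEMU code: both PIT
backends save and load under one migration id, so state written by one
can be read by the other, while an 'info'-style report still names the
implementation.  Every name here (PITState, PITWire, pit_save, pit_load,
pit_info, the section id string) is hypothetical and chosen for
illustration only.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical illustration only; none of these names are QEMU APIs. */
enum pit_impl { PIT_IMPL_USERSPACE, PIT_IMPL_KERNEL };

typedef struct PITState {
    enum pit_impl impl;   /* host-side detail, deliberately not migrated */
    uint16_t count[3];    /* guest-visible state, migrated */
    uint8_t mode[3];
} PITState;

typedef struct PITWire {
    char id[8];           /* same id for both implementations */
    uint16_t count[3];
    uint8_t mode[3];
} PITWire;

static const char PIT_SECTION_ID[] = "i8254";

/* Serialize only the guest-visible state; impl is not written. */
static void pit_save(const PITState *s, PITWire *w)
{
    assert(s != NULL && w != NULL);
    memset(w, 0, sizeof(*w));
    strcpy(w->id, PIT_SECTION_ID);
    memcpy(w->count, s->count, sizeof(w->count));
    memcpy(w->mode, s->mode, sizeof(w->mode));
}

/* Returns 0 on success, -1 if the section id does not match. */
static int pit_load(PITState *s, const PITWire *w)
{
    if (strcmp(w->id, PIT_SECTION_ID) != 0) {
        return -1;
    }
    memcpy(s->count, w->count, sizeof(s->count));
    memcpy(s->mode, w->mode, sizeof(s->mode));
    return 0;   /* s->impl stays as configured on the destination */
}

/* One 'info' entry point; the implementation is part of the output. */
static void pit_info(const PITState *s, char *buf, size_t len)
{
    snprintf(buf, len, "i8254: implementation=%s count0=%u",
             s->impl == PIT_IMPL_KERNEL ? "kernel" : "userspace",
             (unsigned)s->count[0]);
}
```

Saving from a userspace-backed PITState and loading into a kernel-backed
one succeeds because the wire format never mentions the backend; only
pit_info reveals which one is in use.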

^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] Re: [PATCH 26/35] kvm: Eliminate KVMState arguments
  2011-01-11 16:03                                       ` Avi Kivity
@ 2011-01-11 16:26                                         ` Anthony Liguori
  -1 siblings, 0 replies; 300+ messages in thread
From: Anthony Liguori @ 2011-01-11 16:26 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Alexander Graf, Jan Kiszka, Marcelo Tosatti, qemu-devel, kvm


> Visible, yes, but not in live migration, or in 'info i8254', or
> similar.  We can live migrate between qcow2 and qed (using block
> migration); we should be able to do the same for the two i8254
> implementations.
>
> I'm not happy about separate implementations, but that's a minor
> detail.  We can change it 2n+1 times without anybody noticing.  Not
> so for ABI stuff.
>
>> Imagine getting a sosreport that includes a dump of the device tree.  
>> You really want to see something in there that tells you it's an 
>> in-kernel PIT and not the userspace one.
>
> Sure.  Not the device tree though.  The command line would give all 
> the information?

Then it's a one-off option.  We really want as much info as possible
stored in the device tree.

>
> Or 'info i8254' can say something about the implementation.  I don't 
> want to have the user say 'info i8254-kvm'.

info doesn't take a qdev device so yes, it can show whatever we want it 
to show.

Regards,

Anthony Liguori

^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] Re: [PATCH 26/35] kvm: Eliminate KVMState arguments
  2011-01-11 16:26                                         ` Anthony Liguori
@ 2011-01-11 17:05                                           ` Avi Kivity
  -1 siblings, 0 replies; 300+ messages in thread
From: Avi Kivity @ 2011-01-11 17:05 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Alexander Graf, Jan Kiszka, Marcelo Tosatti, qemu-devel, kvm

On 01/11/2011 06:26 PM, Anthony Liguori wrote:
>
>> Visible, yes, but not in live migration, or in 'info i8254', or
>> similar.  We can live migrate between qcow2 and qed (using block
>> migration); we should be able to do the same for the two i8254
>> implementations.
>>
>> I'm not happy about separate implementations, but that's a minor
>> detail.  We can change it 2n+1 times without anybody noticing.  Not
>> so for ABI stuff.
>>
>>> Imagine getting a sosreport that includes a dump of the device 
>>> tree.  You really want to see something in there that tells you it's 
>>> an in-kernel PIT and not the userspace one.
>>
>> Sure.  Not the device tree though.  The command line would give all 
>> the information?
>
> Then it's a one-off option.  We really want as much info as possible
> stored in the device tree.
>
>>
>> Or 'info i8254' can say something about the implementation.  I don't 
>> want to have the user say 'info i8254-kvm'.
>
> info doesn't take a qdev device so yes, it can show whatever we want 
> it to show.
>

It may be a qdev read-only attribute (and thus not migrated?) if we have 
to have it in qdev for some reason.


^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
  2011-01-11  8:53           ` Gerd Hoffmann
@ 2011-01-11 17:13             ` Jan Kiszka
  -1 siblings, 0 replies; 300+ messages in thread
From: Jan Kiszka @ 2011-01-11 17:13 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: Anthony Liguori, Marcelo Tosatti, Anthony Liguori, Glauber Costa,
	qemu-devel, kvm

Am 11.01.2011 09:53, Gerd Hoffmann wrote:
>   Hi,
> 
>> Actually, there is already a channel to pass pointers to qdev devices:
>> the pointer property hack. I'm not sure we should contribute to its user
>> base or take the chance for a cleanup, but we are not alone with this
>> requirement. Point below remains valid, though.
> 
> It is considered bad/hackish style as you can't create that kind of
> device using the -device command line switch (or from a machine
> description config file some day in the future).

That kind of instantiation wouldn't be possible for device models that
require someone actively passing kvm_state to them...

>  So we should not add
> more uses of this, especially not in patches which are supposed to
> clean things up ;)

You won't see me disagree.

Jan


^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
  2011-01-11 13:54             ` Anthony Liguori
@ 2011-01-12 10:22               ` Avi Kivity
  -1 siblings, 0 replies; 300+ messages in thread
From: Avi Kivity @ 2011-01-12 10:22 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Markus Armbruster, Jan Kiszka, Marcelo Tosatti, Glauber Costa,
	kvm, qemu-devel

On 01/11/2011 03:54 PM, Anthony Liguori wrote:
>
> Right, we should introduce a KVMBus that KVM devices are created on.  
> The devices can get at KVMState through the BusState.

There is no kvm bus in a PC (I looked).  We're bending the device model 
here because a device is implemented in the kernel and not in 
userspace.  An implementation detail is magnified beyond all proportions.

An ioapic that is implemented by kvm lives in exactly the same place 
that the qemu ioapic lives in.  An assigned pci device lives on the PCI 
bus, not a KVMBus.  If we need a pointer to KVMState, then we must find 
it elsewhere, not through creating imaginary buses that don't exist.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
  2011-01-12 10:22               ` Avi Kivity
@ 2011-01-12 10:31                 ` Jan Kiszka
  -1 siblings, 0 replies; 300+ messages in thread
From: Jan Kiszka @ 2011-01-12 10:31 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Anthony Liguori, Markus Armbruster, Marcelo Tosatti,
	Glauber Costa, kvm, qemu-devel

Am 12.01.2011 11:22, Avi Kivity wrote:
> On 01/11/2011 03:54 PM, Anthony Liguori wrote:
>>
>> Right, we should introduce a KVMBus that KVM devices are created on. 
>> The devices can get at KVMState through the BusState.
> 
> There is no kvm bus in a PC (I looked).  We're bending the device model
> here because a device is implemented in the kernel and not in
> userspace.  An implementation detail is magnified beyond all proportions.
> 
> An ioapic that is implemented by kvm lives in exactly the same place
> that the qemu ioapic lives in.  An assigned pci device lives on the PCI
> bus, not a KVMBus.  If we need a pointer to KVMState, then we must find
> it elsewhere, not through creating imaginary buses that don't exist.
> 

Exactly.

So we can either "infect" the whole device tree with kvm (or maybe a
more generic accelerator structure that also deals with Xen) or we need
to pull the reference inside the device's init function from some global
service (kvm_get_state).
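
The second option can be sketched roughly as follows.  This is a toy
model under stated assumptions, not QEMU code: kvm_get_state is the name
floated above, while the KVMState layout, kvm_set_state, and the
KVMClockDevice type with its init function are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in; the real KVMState is opaque to this sketch. */
typedef struct KVMState {
    int vmfd;
} KVMState;

/* Single global instance, set once when the accelerator comes up. */
static KVMState *global_kvm_state;

/* Hypothetical setter, called by the accelerator init code. */
static void kvm_set_state(KVMState *s)
{
    /* Refuse to silently replace an already registered state. */
    assert(global_kvm_state == NULL || s == NULL);
    global_kvm_state = s;
}

/* Global service; the name is taken from the thread. */
static KVMState *kvm_get_state(void)
{
    return global_kvm_state;
}

/* Hypothetical device: its init takes no KVMState argument. */
typedef struct KVMClockDevice {
    KVMState *kvm;
} KVMClockDevice;

/* Returns 0 on success, -1 when KVM is not available. */
static int kvmclock_device_init(KVMClockDevice *dev)
{
    dev->kvm = kvm_get_state();
    return dev->kvm ? 0 : -1;
}
```

With this shape nobody has to hand a pointer to the device at creation
time, so generic instantiation keeps working; the price is a hidden
dependency on global state.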

Jan

PS: I started refreshing my whole patch series with the two
controversial patches removed. Will send out later so that we can
proceed with merging the critical bits, specifically all the bug fixes.

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
  2011-01-12 10:22               ` Avi Kivity
@ 2011-01-12 12:04                 ` Markus Armbruster
  -1 siblings, 0 replies; 300+ messages in thread
From: Markus Armbruster @ 2011-01-12 12:04 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Anthony Liguori, kvm, Glauber Costa, Marcelo Tosatti, qemu-devel,
	Jan Kiszka

Avi Kivity <avi@redhat.com> writes:

> On 01/11/2011 03:54 PM, Anthony Liguori wrote:
>>
>> Right, we should introduce a KVMBus that KVM devices are created on.
>> The devices can get at KVMState through the BusState.
>
> There is no kvm bus in a PC (I looked).  We're bending the device
> model here because a device is implemented in the kernel and not in
> userspace.  An implementation detail is magnified beyond all
> proportions.
>
> An ioapic that is implemented by kvm lives in exactly the same place
> that the qemu ioapic lives in.

Exactly.  And that place is a bus.

What if the device interfaces in bus-specific ways with its parent bus?
Then we can't simply replace the parent bus by a KVM bus.  We'd need
*two* parent buses, as Jan pointed out upthread.

>                                 An assigned pci device lives on the
> PCI bus, not a KVMBus.  If we need a pointer to KVMState, then we must
> find it elsewhere, not through creating imaginary buses that don't
> exist.

^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [PATCH 04/35] Add "broadcast" option for mce command
  2011-01-09 18:51     ` [Qemu-devel] " Jan Kiszka
@ 2011-01-15 16:24       ` Jan Kiszka
  -1 siblings, 0 replies; 300+ messages in thread
From: Jan Kiszka @ 2011-01-15 16:24 UTC (permalink / raw)
  To: Marcelo Tosatti, Jin Dongming
  Cc: Anthony Liguori, qemu-devel, kvm, Hidetoshi Seto

Am 09.01.2011 19:51, Jan Kiszka wrote:
> Am 06.01.2011 18:56, Marcelo Tosatti wrote:
>> From: Jin Dongming <jin.dongming@np.css.fujitsu.com>
>>
>> When the following test case is injected with the mce command, the user may not
>> get the expected result.
>>     DATA
>>                command cpu bank status             mcg_status  addr   misc
>>         (qemu) mce     1   1    0xbd00000000000000 0x05        0x1234 0x8c
>>
>>     Expected Result
>>            panic type: "Fatal Machine check"
>>
>> That is because each mce command injects only into the given cpu and cannot
>> inject an mce interrupt into other cpus. So the user will get the following result:
>>     panic type: "Fatal machine check on current CPU"
>>
>> The "broadcast" option injects dummy data into the other cpus. Injecting an
>> mce with this option produces the expected result.
>>
>> Usage:
>>     Broadcast[on]
>>            command broadcast cpu bank status             mcg_status  addr   misc
>>     (qemu) mce     -b        1   1    0xbd00000000000000 0x05        0x1234 0x8c
>>
>>     Broadcast[off]
>>            command cpu bank status             mcg_status  addr   misc
>>     (qemu) mce     1   1    0xbd00000000000000 0x05        0x1234 0x8c
>>
>> Signed-off-by: Jin Dongming <jin.dongming@np.css.fujitsu.com>
>> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
>> ---
>>  cpu-all.h             |    3 ++-
>>  hmp-commands.hx       |    6 +++---
>>  monitor.c             |    7 +++++--
>>  target-i386/helper.c  |   20 ++++++++++++++++++--
>>  target-i386/kvm.c     |   16 ++++++++++++----
>>  target-i386/kvm_x86.h |    5 ++++-
>>  6 files changed, 44 insertions(+), 13 deletions(-)
>>
>> diff --git a/cpu-all.h b/cpu-all.h
>> index 30ae17d..4ce4e83 100644
>> --- a/cpu-all.h
>> +++ b/cpu-all.h
>> @@ -964,6 +964,7 @@ int cpu_memory_rw_debug(CPUState *env, target_ulong addr,
>>                          uint8_t *buf, int len, int is_write);
>>  
>>  void cpu_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
>> -                        uint64_t mcg_status, uint64_t addr, uint64_t misc);
>> +                        uint64_t mcg_status, uint64_t addr, uint64_t misc,
>> +                        int broadcast);
>>  
>>  #endif /* CPU_ALL_H */
>> diff --git a/hmp-commands.hx b/hmp-commands.hx
>> index df134f8..c82fb10 100644
>> --- a/hmp-commands.hx
>> +++ b/hmp-commands.hx
>> @@ -1091,9 +1091,9 @@ ETEXI
>>  
>>      {
>>          .name       = "mce",
>> -        .args_type  = "cpu_index:i,bank:i,status:l,mcg_status:l,addr:l,misc:l",
>> -        .params     = "cpu bank status mcgstatus addr misc",
>> -        .help       = "inject a MCE on the given CPU",
>> +        .args_type  = "broadcast:-b,cpu_index:i,bank:i,status:l,mcg_status:l,addr:l,misc:l",
>> +        .params     = "[-b] cpu bank status mcgstatus addr misc",
>> +        .help       = "inject a MCE on the given CPU [and broadcast to other CPUs with -b option]",
>>          .mhandler.cmd = do_inject_mce,
>>      },
>>  
>> diff --git a/monitor.c b/monitor.c
>> index f258000..f4f624b 100644
>> --- a/monitor.c
>> +++ b/monitor.c
>> @@ -2671,12 +2671,15 @@ static void do_inject_mce(Monitor *mon, const QDict *qdict)
>>      uint64_t mcg_status = qdict_get_int(qdict, "mcg_status");
>>      uint64_t addr = qdict_get_int(qdict, "addr");
>>      uint64_t misc = qdict_get_int(qdict, "misc");
>> +    int broadcast = qdict_get_try_bool(qdict, "broadcast", 0);
>>  
>> -    for (cenv = first_cpu; cenv != NULL; cenv = cenv->next_cpu)
>> +    for (cenv = first_cpu; cenv != NULL; cenv = cenv->next_cpu) {
>>          if (cenv->cpu_index == cpu_index && cenv->mcg_cap) {
>> -            cpu_inject_x86_mce(cenv, bank, status, mcg_status, addr, misc);
>> +            cpu_inject_x86_mce(cenv, bank, status, mcg_status, addr, misc,
>> +                               broadcast);
>>              break;
>>          }
>> +    }
>>  }
>>  #endif
>>  
>> diff --git a/target-i386/helper.c b/target-i386/helper.c
>> index 2c94130..2cfb4a4 100644
>> --- a/target-i386/helper.c
>> +++ b/target-i386/helper.c
>> @@ -1069,18 +1069,34 @@ static void qemu_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
>>  }
>>  
>>  void cpu_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
>> -                        uint64_t mcg_status, uint64_t addr, uint64_t misc)
>> +                        uint64_t mcg_status, uint64_t addr, uint64_t misc,
>> +                        int broadcast)
>>  {
>>      unsigned bank_num = cenv->mcg_cap & 0xff;
>> +    CPUState *env;
>> +    int flag = 0;
>>  
>>      if (bank >= bank_num || !(status & MCI_STATUS_VAL)) {
>>          return;
>>      }
>>  
>>      if (kvm_enabled()) {
>> -        kvm_inject_x86_mce(cenv, bank, status, mcg_status, addr, misc, 0);
>> +        if (broadcast) {
>> +            flag |= MCE_BROADCAST;
>> +        }
>> +
>> +        kvm_inject_x86_mce(cenv, bank, status, mcg_status, addr, misc, flag);
>>      } else {
>>          qemu_inject_x86_mce(cenv, bank, status, mcg_status, addr, misc);
>> +        if (broadcast) {
>> +            for (env = first_cpu; env != NULL; env = env->next_cpu) {
>> +                if (cenv == env) {
>> +                    continue;
>> +                }
>> +
>> +                qemu_inject_x86_mce(env, 1, 0xa000000000000000, 0, 0, 0);
> 
> Constant lacks "ULL". Can probably be fixed up on commit.

Actually, the right fix is MCI_STATUS_VAL | MCI_STATUS_UC instead of the
magic number.
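
For reference, the magic number decomposes as sketched here.  The bit
positions follow the architectural IA32_MCi_STATUS layout (VAL is bit
63, UC is bit 61); the macro names match the ones used above, but the
definitions are written out as an assumption rather than copied from
the tree, and the helper is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed definitions, following the architectural bit positions. */
#define MCI_STATUS_VAL   (1ULL << 63)   /* register contents are valid */
#define MCI_STATUS_UC    (1ULL << 61)   /* uncorrected error */

/* The status value injected into the other CPUs in the broadcast loop. */
static const uint64_t broadcast_status = MCI_STATUS_VAL | MCI_STATUS_UC;

/* Hypothetical check: the symbolic form equals the patch's literal. */
static void check_broadcast_status(void)
{
    assert(broadcast_status == 0xa000000000000000ULL);
}
```

Because both macros already carry the ULL suffix, the symbolic form
also takes care of the missing-suffix remark.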

Still, there is an inconsistency: kvm_mce_broadcast_rest injects
mcg_status = MCG_STATUS_MCIP | MCG_STATUS_RIPV, while the above code sets
it to 0. I presume the KVM code is correct, isn't it?
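
Written out, the mismatch looks like this.  The sketch assumes the
architectural IA32_MCG_STATUS bit positions (RIPV bit 0, EIPV bit 1,
MCIP bit 2) and only compares the two values described above; the
function names are hypothetical and nothing here calls into QEMU.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed definitions, following the architectural bit positions. */
#define MCG_STATUS_RIPV  (1ULL << 0)    /* restart IP valid */
#define MCG_STATUS_EIPV  (1ULL << 1)    /* error IP valid */
#define MCG_STATUS_MCIP  (1ULL << 2)    /* machine check in progress */

/* Value the KVM broadcast path injects, per the description above. */
static uint64_t kvm_broadcast_mcg_status(void)
{
    return MCG_STATUS_MCIP | MCG_STATUS_RIPV;
}

/* Value the TCG broadcast loop passes in the quoted patch. */
static uint64_t tcg_broadcast_mcg_status(void)
{
    return 0;
}

/* Returns 1 only if both paths inject the same mcg_status. */
static int broadcast_paths_agree(void)
{
    return kvm_broadcast_mcg_status() == tcg_broadcast_mcg_status();
}

/* 0x05 is also the mcg_status used in the commit message's example. */
static void check_example_value(void)
{
    assert(kvm_broadcast_mcg_status() == 0x05);
}
```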

This demonstrates one problem of the MCE code base: It contains too much
redundancy between TCG and KVM paths. And as the KVM part is also not
yet well integrated with VCPU state writeback, e.g. VCPU events, we have
some races over there. The good news is that we can drop half of the KVM
MCE bits when reusing qemu_inject_x86_mce for setting up the event
injection.

How to proceed? Fix up those nits, merge the patches, and then rework
MCE on top (I started like this already)? Or rather do the rework on top
of current qemu?

I'm willing to drive this, but I would also welcome it if we could
distribute the effort.

Jan


^ permalink raw reply	[flat|nested] 300+ messages in thread

* [Qemu-devel] Re: [PATCH 04/35] Add "broadcast" option for mce command
@ 2011-01-15 16:24       ` Jan Kiszka
  0 siblings, 0 replies; 300+ messages in thread
From: Jan Kiszka @ 2011-01-15 16:24 UTC (permalink / raw)
  To: Marcelo Tosatti, Jin Dongming
  Cc: Anthony Liguori, Hidetoshi Seto, qemu-devel, kvm

Am 09.01.2011 19:51, Jan Kiszka wrote:
> Am 06.01.2011 18:56, Marcelo Tosatti wrote:
>> From: Jin Dongming <jin.dongming@np.css.fujitsu.com>
>>
>> When the following test case is injected with the mce command, the user may not
>> get the expected result.
>>     DATA
>>                command cpu bank status             mcg_status  addr   misc
>>         (qemu) mce     1   1    0xbd00000000000000 0x05        0x1234 0x8c
>>
>>     Expected Result
>>            panic type: "Fatal Machine check"
>>
>> That is because each mce command injects only into the given cpu and cannot
>> inject an mce interrupt into other cpus. So the user will get the following result:
>>     panic type: "Fatal machine check on current CPU"
>>
>> The "broadcast" option injects dummy data into the other cpus. Injecting an
>> mce with this option produces the expected result.
>>
>> Usage:
>>     Broadcast[on]
>>            command broadcast cpu bank status             mcg_status  addr   misc
>>     (qemu) mce     -b        1   1    0xbd00000000000000 0x05        0x1234 0x8c
>>
>>     Broadcast[off]
>>            command cpu bank status             mcg_status  addr   misc
>>     (qemu) mce     1   1    0xbd00000000000000 0x05        0x1234 0x8c
>>
>> Signed-off-by: Jin Dongming <jin.dongming@np.css.fujitsu.com>
>> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
>> ---
>>  cpu-all.h             |    3 ++-
>>  hmp-commands.hx       |    6 +++---
>>  monitor.c             |    7 +++++--
>>  target-i386/helper.c  |   20 ++++++++++++++++++--
>>  target-i386/kvm.c     |   16 ++++++++++++----
>>  target-i386/kvm_x86.h |    5 ++++-
>>  6 files changed, 44 insertions(+), 13 deletions(-)
>>
>> diff --git a/cpu-all.h b/cpu-all.h
>> index 30ae17d..4ce4e83 100644
>> --- a/cpu-all.h
>> +++ b/cpu-all.h
>> @@ -964,6 +964,7 @@ int cpu_memory_rw_debug(CPUState *env, target_ulong addr,
>>                          uint8_t *buf, int len, int is_write);
>>  
>>  void cpu_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
>> -                        uint64_t mcg_status, uint64_t addr, uint64_t misc);
>> +                        uint64_t mcg_status, uint64_t addr, uint64_t misc,
>> +                        int broadcast);
>>  
>>  #endif /* CPU_ALL_H */
>> diff --git a/hmp-commands.hx b/hmp-commands.hx
>> index df134f8..c82fb10 100644
>> --- a/hmp-commands.hx
>> +++ b/hmp-commands.hx
>> @@ -1091,9 +1091,9 @@ ETEXI
>>  
>>      {
>>          .name       = "mce",
>> -        .args_type  = "cpu_index:i,bank:i,status:l,mcg_status:l,addr:l,misc:l",
>> -        .params     = "cpu bank status mcgstatus addr misc",
>> -        .help       = "inject a MCE on the given CPU",
>> +        .args_type  = "broadcast:-b,cpu_index:i,bank:i,status:l,mcg_status:l,addr:l,misc:l",
>> +        .params     = "[-b] cpu bank status mcgstatus addr misc",
>> +        .help       = "inject a MCE on the given CPU [and broadcast to other CPUs with -b option]",
>>          .mhandler.cmd = do_inject_mce,
>>      },
>>  
>> diff --git a/monitor.c b/monitor.c
>> index f258000..f4f624b 100644
>> --- a/monitor.c
>> +++ b/monitor.c
>> @@ -2671,12 +2671,15 @@ static void do_inject_mce(Monitor *mon, const QDict *qdict)
>>      uint64_t mcg_status = qdict_get_int(qdict, "mcg_status");
>>      uint64_t addr = qdict_get_int(qdict, "addr");
>>      uint64_t misc = qdict_get_int(qdict, "misc");
>> +    int broadcast = qdict_get_try_bool(qdict, "broadcast", 0);
>>  
>> -    for (cenv = first_cpu; cenv != NULL; cenv = cenv->next_cpu)
>> +    for (cenv = first_cpu; cenv != NULL; cenv = cenv->next_cpu) {
>>          if (cenv->cpu_index == cpu_index && cenv->mcg_cap) {
>> -            cpu_inject_x86_mce(cenv, bank, status, mcg_status, addr, misc);
>> +            cpu_inject_x86_mce(cenv, bank, status, mcg_status, addr, misc,
>> +                               broadcast);
>>              break;
>>          }
>> +    }
>>  }
>>  #endif
>>  
>> diff --git a/target-i386/helper.c b/target-i386/helper.c
>> index 2c94130..2cfb4a4 100644
>> --- a/target-i386/helper.c
>> +++ b/target-i386/helper.c
>> @@ -1069,18 +1069,34 @@ static void qemu_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
>>  }
>>  
>>  void cpu_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
>> -                        uint64_t mcg_status, uint64_t addr, uint64_t misc)
>> +                        uint64_t mcg_status, uint64_t addr, uint64_t misc,
>> +                        int broadcast)
>>  {
>>      unsigned bank_num = cenv->mcg_cap & 0xff;
>> +    CPUState *env;
>> +    int flag = 0;
>>  
>>      if (bank >= bank_num || !(status & MCI_STATUS_VAL)) {
>>          return;
>>      }
>>  
>>      if (kvm_enabled()) {
>> -        kvm_inject_x86_mce(cenv, bank, status, mcg_status, addr, misc, 0);
>> +        if (broadcast) {
>> +            flag |= MCE_BROADCAST;
>> +        }
>> +
>> +        kvm_inject_x86_mce(cenv, bank, status, mcg_status, addr, misc, flag);
>>      } else {
>>          qemu_inject_x86_mce(cenv, bank, status, mcg_status, addr, misc);
>> +        if (broadcast) {
>> +            for (env = first_cpu; env != NULL; env = env->next_cpu) {
>> +                if (cenv == env) {
>> +                    continue;
>> +                }
>> +
>> +                qemu_inject_x86_mce(env, 1, 0xa000000000000000, 0, 0, 0);
> 
> Constant lacks "ULL". Can probably be fixed up on commit.

Actually, the right fix is MCI_STATUS_VAL | MCI_STATUS_UC instead of the
magic number.

Still, there is an inconsistency: kvm_mce_broadcast_rest injects
mcg_status = MCG_STATUS_MCIP | MCG_STATUS_RIPV, while the above code sets
it to 0. I presume the KVM code is correct, isn't it?
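
For reference, a minimal sketch of what the named constants would look
like. The bit positions are assumed from the x86 machine-check register
layout (VAL = bit 63, UC = bit 61, MCIP = bit 2, RIPV = bit 0), and the
two helper functions are hypothetical names used only for illustration,
not existing QEMU functions:

```c
#include <stdint.h>

/* Assumed definitions, following the x86 machine-check bit layout. */
#define MCI_STATUS_VAL  (1ULL << 63)  /* bank status register is valid */
#define MCI_STATUS_UC   (1ULL << 61)  /* uncorrected error */
#define MCG_STATUS_RIPV (1ULL << 0)   /* restart IP valid */
#define MCG_STATUS_MCIP (1ULL << 2)   /* machine check in progress */

/* Hypothetical helper: bank status injected on the non-target CPUs.
 * Equals the magic number 0xa000000000000000 from the patch. */
static inline uint64_t broadcast_status(void)
{
    return MCI_STATUS_VAL | MCI_STATUS_UC;
}

/* Hypothetical helper: the mcg_status value described above for
 * kvm_mce_broadcast_rest, as opposed to the 0 used in the TCG path. */
static inline uint64_t broadcast_mcg_status(void)
{
    return MCG_STATUS_MCIP | MCG_STATUS_RIPV;
}
```

Spelling the value this way also sidesteps the missing "ULL" suffix,
since the macros already carry the 64-bit type.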

This demonstrates one problem of the MCE code base: It contains too much
redundancy between TCG and KVM paths. And as the KVM part is also not
yet well integrated with VCPU state writeback, e.g. VCPU events, we have
some races over there. The good news is that we can drop half of the KVM
MCE bits when reusing qemu_inject_x86_mce for setting up the event
injection.

How to proceed? Fix up those nits, merge the patches, and then rework
MCE on top (I started like this already)? Or rather do the rework on top
of current qemu?

I'm willing to drive this, but I would also welcome it if we could
distribute the effort.

Jan


^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
  2011-01-12 10:31                 ` Jan Kiszka
@ 2011-01-18 14:28                   ` Jan Kiszka
  -1 siblings, 0 replies; 300+ messages in thread
From: Jan Kiszka @ 2011-01-18 14:28 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Avi Kivity, Markus Armbruster, Marcelo Tosatti, Glauber Costa,
	kvm, qemu-devel

On 2011-01-12 11:31, Jan Kiszka wrote:
> Am 12.01.2011 11:22, Avi Kivity wrote:
>> On 01/11/2011 03:54 PM, Anthony Liguori wrote:
>>>
>>> Right, we should introduce a KVMBus that KVM devices are created on. 
>>> The devices can get at KVMState through the BusState.
>>
>> There is no kvm bus in a PC (I looked).  We're bending the device model
>> here because a device is implemented in the kernel and not in
>> userspace.  An implementation detail is magnified beyond all proportions.
>>
>> An ioapic that is implemented by kvm lives in exactly the same place
>> that the qemu ioapic lives in.  An assigned pci device lives on the PCI
>> bus, not a KVMBus.  If we need a pointer to KVMState, then we must find
>> it elsewhere, not through creating imaginary buses that don't exist.
>>
> 
> Exactly.
> 
> So we can either "infect" the whole device tree with kvm (or maybe a
> more generic accelerator structure that also deals with Xen) or we need
> to pull the reference inside the device's init function from some global
> service (kvm_get_state).

Note that this topic is still waiting for good suggestions, specifically
from those who believe in kvm_state references :). This is not only
blocking the kvmclock merge but will affect KVM irqchips as well.

It boils down to how we reasonably pass a kvm_state reference from
machine init code to a sysbus device. I'm probably biased, but I don't
see any way that neither works against the idea of confining access to
kvm_state nor breaks device instantiation from the command line or a
config file.
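
To make the trade-off concrete, here is a rough sketch of the two
options. All types and function names below are stand-ins invented for
illustration (the real KVMState and qdev structures look different);
only the name kvm_get_state is taken from the discussion above:

```c
#include <stddef.h>

/* Stand-in types; the real structures are much larger. */
typedef struct KVMState { int vmfd; } KVMState;
typedef struct KVMClockDev { KVMState *kvm; } KVMClockDev;

/* Assumed to be set once when the accelerator is initialized. */
static KVMState *global_kvm_state;

/* Hypothetical global service, in the spirit of kvm_get_state(). */
static KVMState *kvm_get_state(void)
{
    return global_kvm_state;
}

/* Option A: the init function pulls the reference itself. The device
 * can still be created from the command line or a config file, but
 * access to kvm_state is no longer confined. */
static int kvmclock_init_global(KVMClockDev *dev)
{
    dev->kvm = kvm_get_state();
    return dev->kvm ? 0 : -1;
}

/* Option B: machine init code passes the reference explicitly. Access
 * stays confined, but whoever creates the device must already hold the
 * pointer, which generic instantiation does not. */
static int kvmclock_init_explicit(KVMClockDev *dev, KVMState *s)
{
    dev->kvm = s;
    return s ? 0 : -1;
}
```

Neither variant is a proposal; they only show where the pointer would
have to come from in each case.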

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
  2011-01-18 14:28                   ` Jan Kiszka
@ 2011-01-18 15:04                     ` Anthony Liguori
  -1 siblings, 0 replies; 300+ messages in thread
From: Anthony Liguori @ 2011-01-18 15:04 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Avi Kivity, Markus Armbruster, Marcelo Tosatti, Glauber Costa,
	kvm, qemu-devel

On 01/18/2011 08:28 AM, Jan Kiszka wrote:
> On 2011-01-12 11:31, Jan Kiszka wrote:
>    
>> Am 12.01.2011 11:22, Avi Kivity wrote:
>>      
>>> On 01/11/2011 03:54 PM, Anthony Liguori wrote:
>>>        
>>>> Right, we should introduce a KVMBus that KVM devices are created on.
>>>> The devices can get at KVMState through the BusState.
>>>>          
>>> There is no kvm bus in a PC (I looked).  We're bending the device model
>>> here because a device is implemented in the kernel and not in
>>> userspace.  An implementation detail is magnified beyond all proportions.
>>>
>>> An ioapic that is implemented by kvm lives in exactly the same place
>>> that the qemu ioapic lives in.  An assigned pci device lives on the PCI
>>> bus, not a KVMBus.  If we need a pointer to KVMState, then we must find
>>> it elsewhere, not through creating imaginary buses that don't exist.
>>>
>>>        
>> Exactly.
>>
>> So we can either "infect" the whole device tree with kvm (or maybe a
>> more generic accelerator structure that also deals with Xen) or we need
>> to pull the reference inside the device's init function from some global
>> service (kvm_get_state).
>>      
> Note that this topic is still waiting for good suggestions, specifically
> from those who believe in kvm_state references :). This is not only
> blocking kvmstate merge but will affect KVM irqchips as well.
>
> It boils down to how we reasonably pass a kvm_state reference from
> machine init code to a sysbus device. I'm probably biased, but I don't
> see any way that does not work against the idea of confining access to
> kvm_state or breaks device instantiation from the command line or a
> config file.
>    

A KVM device should sit on a KVM-specific bus that hangs off sysbus.
It can get to kvm_state through that bus.

That bus doesn't get instantiated through qdev, so requiring a pointer
argument should not be an issue.
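
A rough sketch of that idea, using stand-in types rather than the real
qdev BusState/DeviceState (every name below is hypothetical and only
illustrates the lookup path from device to bus to kvm_state):

```c
#include <stddef.h>

/* Stand-in for the real KVMState. */
typedef struct KVMState { int vmfd; } KVMState;

/* Hypothetical bus object, created by machine init code (not through
 * qdev), so it can take the pointer as a plain argument. */
typedef struct KVMBus { KVMState *kvm; } KVMBus;

/* Hypothetical device that records the bus it was plugged into. */
typedef struct KVMDevice { KVMBus *parent_bus; } KVMDevice;

static void kvm_bus_init(KVMBus *bus, KVMState *s)
{
    bus->kvm = s;
}

static void kvm_device_attach(KVMDevice *dev, KVMBus *bus)
{
    dev->parent_bus = bus;
}

/* The device never touches a global; it walks up to its bus instead. */
static KVMState *kvm_device_get_state(const KVMDevice *dev)
{
    return dev->parent_bus ? dev->parent_bus->kvm : NULL;
}
```

In this sketch a device has exactly one parent bus, so it only works for
devices that do not also need to sit on another bus.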

Regards,

Anthony Liguori

> Jan
>
>    


^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
  2011-01-18 15:04                     ` Anthony Liguori
@ 2011-01-18 15:43                       ` Jan Kiszka
  -1 siblings, 0 replies; 300+ messages in thread
From: Jan Kiszka @ 2011-01-18 15:43 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Avi Kivity, Markus Armbruster, Marcelo Tosatti, Glauber Costa,
	kvm, qemu-devel

On 2011-01-18 16:04, Anthony Liguori wrote:
> On 01/18/2011 08:28 AM, Jan Kiszka wrote:
>> On 2011-01-12 11:31, Jan Kiszka wrote:
>>    
>>> Am 12.01.2011 11:22, Avi Kivity wrote:
>>>      
>>>> On 01/11/2011 03:54 PM, Anthony Liguori wrote:
>>>>        
>>>>> Right, we should introduce a KVMBus that KVM devices are created on.
>>>>> The devices can get at KVMState through the BusState.
>>>>>          
>>>> There is no kvm bus in a PC (I looked).  We're bending the device model
>>>> here because a device is implemented in the kernel and not in
>>>> userspace.  An implementation detail is magnified beyond all proportions.
>>>>
>>>> An ioapic that is implemented by kvm lives in exactly the same place
>>>> that the qemu ioapic lives in.  An assigned pci device lives on the PCI
>>>> bus, not a KVMBus.  If we need a pointer to KVMState, then we must find
>>>> it elsewhere, not through creating imaginary buses that don't exist.
>>>>
>>>>        
>>> Exactly.
>>>
>>> So we can either "infect" the whole device tree with kvm (or maybe a
>>> more generic accelerator structure that also deals with Xen) or we need
>>> to pull the reference inside the device's init function from some global
>>> service (kvm_get_state).
>>>      
>> Note that this topic is still waiting for good suggestions, specifically
>> from those who believe in kvm_state references :). This is not only
>> blocking kvmstate merge but will affect KVM irqchips as well.
>>
>> It boils down to how we reasonably pass a kvm_state reference from
>> machine init code to a sysbus device. I'm probably biased, but I don't
>> see any way that does not work against the idea of confining access to
>> kvm_state or breaks device instantiation from the command line or a
>> config file.
>>    
> 
> A KVM device should sit on a KVM specific bus that hangs off of sysbus.  
> It can get to kvm_state through that bus.
> 
> That bus doesn't get instantiated through qdev so requiring a pointer 
> argument should not be an issue.
> 

This design conflicts with the requirement to also attach KVM-assisted
devices to their home bus, e.g. an assigned PCI device to the PCI
bus. We don't support multi-homed qdev devices.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
  2011-01-18 15:43                       ` Jan Kiszka
@ 2011-01-18 15:48                         ` Anthony Liguori
  -1 siblings, 0 replies; 300+ messages in thread
From: Anthony Liguori @ 2011-01-18 15:48 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Avi Kivity, Markus Armbruster, Marcelo Tosatti, Glauber Costa,
	kvm, qemu-devel

On 01/18/2011 09:43 AM, Jan Kiszka wrote:
> On 2011-01-18 16:04, Anthony Liguori wrote:
>    
>> On 01/18/2011 08:28 AM, Jan Kiszka wrote:
>>      
>>> On 2011-01-12 11:31, Jan Kiszka wrote:
>>>
>>>        
>>>> Am 12.01.2011 11:22, Avi Kivity wrote:
>>>>
>>>>          
>>>>> On 01/11/2011 03:54 PM, Anthony Liguori wrote:
>>>>>
>>>>>            
>>>>>> Right, we should introduce a KVMBus that KVM devices are created on.
>>>>>> The devices can get at KVMState through the BusState.
>>>>>>
>>>>>>              
>>>>> There is no kvm bus in a PC (I looked).  We're bending the device model
>>>>> here because a device is implemented in the kernel and not in
>>>>> userspace.  An implementation detail is magnified beyond all proportions.
>>>>>
>>>>> An ioapic that is implemented by kvm lives in exactly the same place
>>>>> that the qemu ioapic lives in.  An assigned pci device lives on the PCI
>>>>> bus, not a KVMBus.  If we need a pointer to KVMState, then we must find
>>>>> it elsewhere, not through creating imaginary buses that don't exist.
>>>>>
>>>>>
>>>>>            
>>>> Exactly.
>>>>
>>>> So we can either "infect" the whole device tree with kvm (or maybe a
>>>> more generic accelerator structure that also deals with Xen) or we need
>>>> to pull the reference inside the device's init function from some global
>>>> service (kvm_get_state).
>>>>
>>>>          
>>> Note that this topic is still waiting for good suggestions, specifically
>>> from those who believe in kvm_state references :). This is not only
>>> blocking kvmstate merge but will affect KVM irqchips as well.
>>>
>>> It boils down to how we reasonably pass a kvm_state reference from
>>> machine init code to a sysbus device. I'm probably biased, but I don't
>>> see any way that does not work against the idea of confining access to
>>> kvm_state or breaks device instantiation from the command line or a
>>> config file.
>>>
>>>        
>> A KVM device should sit on a KVM specific bus that hangs off of sysbus.
>> It can get to kvm_state through that bus.
>>
>> That bus doesn't get instantiated through qdev so requiring a pointer
>> argument should not be an issue.
>>
>>      
> This design is in conflict with the requirement to attach KVM-assisted
> devices also to their home bus, e.g. an assigned PCI device to the PCI
> bus. We don't support multi-homed qdev devices.
>    

With vfio, would an assigned PCI device even need kvm_state?

Regards,

Anthony Liguori

> Jan
>
>    


^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
  2011-01-18 15:43                       ` Jan Kiszka
@ 2011-01-18 15:50                         ` Anthony Liguori
  -1 siblings, 0 replies; 300+ messages in thread
From: Anthony Liguori @ 2011-01-18 15:50 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Avi Kivity, Markus Armbruster, Marcelo Tosatti, Glauber Costa,
	kvm, qemu-devel

On 01/18/2011 09:43 AM, Jan Kiszka wrote:
> On 2011-01-18 16:04, Anthony Liguori wrote:
>    
>> On 01/18/2011 08:28 AM, Jan Kiszka wrote:
>>      
>>> On 2011-01-12 11:31, Jan Kiszka wrote:
>>>
>>>        
>>>> Am 12.01.2011 11:22, Avi Kivity wrote:
>>>>
>>>>          
>>>>> On 01/11/2011 03:54 PM, Anthony Liguori wrote:
>>>>>
>>>>>            
>>>>>> Right, we should introduce a KVMBus that KVM devices are created on.
>>>>>> The devices can get at KVMState through the BusState.
>>>>>>
>>>>>>              
>>>>> There is no kvm bus in a PC (I looked).  We're bending the device model
>>>>> here because a device is implemented in the kernel and not in
>>>>> userspace.  An implementation detail is magnified beyond all proportions.
>>>>>
>>>>> An ioapic that is implemented by kvm lives in exactly the same place
>>>>> that the qemu ioapic lives in.  An assigned pci device lives on the PCI
>>>>> bus, not a KVMBus.  If we need a pointer to KVMState, then we must find
>>>>> it elsewhere, not through creating imaginary buses that don't exist.
>>>>>
>>>>>
>>>>>            
>>>> Exactly.
>>>>
>>>> So we can either "infect" the whole device tree with kvm (or maybe a
>>>> more generic accelerator structure that also deals with Xen) or we need
>>>> to pull the reference inside the device's init function from some global
>>>> service (kvm_get_state).
>>>>
>>>>          
>>> Note that this topic is still waiting for good suggestions, specifically
>>> from those who believe in kvm_state references :). This is not only
>>> blocking kvmstate merge but will affect KVM irqchips as well.
>>>
>>> It boils down to how we reasonably pass a kvm_state reference from
>>> machine init code to a sysbus device. I'm probably biased, but I don't
>>> see any way that does not work against the idea of confining access to
>>> kvm_state or breaks device instantiation from the command line or a
>>> config file.
>>>
>>>        
>> A KVM device should sit on a KVM specific bus that hangs off of sysbus.
>> It can get to kvm_state through that bus.
>>
>> That bus doesn't get instantiated through qdev so requiring a pointer
>> argument should not be an issue.
>>
>>      
> This design is in conflict with the requirement to attach KVM-assisted
> devices also to their home bus, e.g. an assigned PCI device to the PCI
> bus. We don't support multi-homed qdev devices.
>    

The bus topology reflects how I/O flows in and out of a device.  We do 
not model a perfect PC bus architecture and I don't think we ever intend 
to.  Instead, we model a functional architecture.

I/O from an assigned device does not flow through the emulated PCI bus.  
Therefore, it does not belong on the emulated PCI bus.

Assigned devices need to interact with the emulated PCI bus, but they 
shouldn't be children of it.

Regards,

Anthony Liguori

> Jan
>
>    


^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
  2011-01-18 15:48                         ` Anthony Liguori
@ 2011-01-18 15:54                           ` Jan Kiszka
  -1 siblings, 0 replies; 300+ messages in thread
From: Jan Kiszka @ 2011-01-18 15:54 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Avi Kivity, Markus Armbruster, Marcelo Tosatti, Glauber Costa,
	kvm, qemu-devel, Alex Williamson

On 2011-01-18 16:48, Anthony Liguori wrote:
> On 01/18/2011 09:43 AM, Jan Kiszka wrote:
>> On 2011-01-18 16:04, Anthony Liguori wrote:
>>    
>>> On 01/18/2011 08:28 AM, Jan Kiszka wrote:
>>>      
>>>> On 2011-01-12 11:31, Jan Kiszka wrote:
>>>>
>>>>        
>>>>> Am 12.01.2011 11:22, Avi Kivity wrote:
>>>>>
>>>>>          
>>>>>> On 01/11/2011 03:54 PM, Anthony Liguori wrote:
>>>>>>
>>>>>>            
>>>>>>> Right, we should introduce a KVMBus that KVM devices are created on.
>>>>>>> The devices can get at KVMState through the BusState.
>>>>>>>
>>>>>>>              
>>>>>> There is no kvm bus in a PC (I looked).  We're bending the device model
>>>>>> here because a device is implemented in the kernel and not in
>>>>>> userspace.  An implementation detail is magnified beyond all proportions.
>>>>>>
>>>>>> An ioapic that is implemented by kvm lives in exactly the same place
>>>>>> that the qemu ioapic lives in.  An assigned pci device lives on the PCI
>>>>>> bus, not a KVMBus.  If we need a pointer to KVMState, then we must find
>>>>>> it elsewhere, not through creating imaginary buses that don't exist.
>>>>>>
>>>>>>
>>>>>>            
>>>>> Exactly.
>>>>>
>>>>> So we can either "infect" the whole device tree with kvm (or maybe a
>>>>> more generic accelerator structure that also deals with Xen) or we need
>>>>> to pull the reference inside the device's init function from some global
>>>>> service (kvm_get_state).
>>>>>
>>>>>          
>>>> Note that this topic is still waiting for good suggestions, specifically
>>>> from those who believe in kvm_state references :). This is not only
>>>> blocking kvmstate merge but will affect KVM irqchips as well.
>>>>
>>>> It boils down to how we reasonably pass a kvm_state reference from
>>>> machine init code to a sysbus device. I'm probably biased, but I don't
>>>> see any way that does not work against the idea of confining access to
>>>> kvm_state or breaks device instantiation from the command line or a
>>>> config file.
>>>>
>>>>        
>>> A KVM device should sit on a KVM specific bus that hangs off of sysbus.
>>> It can get to kvm_state through that bus.
>>>
>>> That bus doesn't get instantiated through qdev so requiring a pointer
>>> argument should not be an issue.
>>>
>>>      
>> This design is in conflict with the requirement to attach KVM-assisted
>> devices also to their home bus, e.g. an assigned PCI device to the PCI
>> bus. We don't support multi-homed qdev devices.
>>    
> 
> With vfio, would an assigned PCI device even need kvm_state?

IIUC: Yes, for establishing the irqfd link.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
  2011-01-18 15:50                         ` Anthony Liguori
@ 2011-01-18 16:01                           ` Jan Kiszka
  -1 siblings, 0 replies; 300+ messages in thread
From: Jan Kiszka @ 2011-01-18 16:01 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Avi Kivity, Markus Armbruster, Marcelo Tosatti, Glauber Costa,
	kvm, qemu-devel

On 2011-01-18 16:50, Anthony Liguori wrote:
> On 01/18/2011 09:43 AM, Jan Kiszka wrote:
>> On 2011-01-18 16:04, Anthony Liguori wrote:
>>    
>>> On 01/18/2011 08:28 AM, Jan Kiszka wrote:
>>>      
>>>> On 2011-01-12 11:31, Jan Kiszka wrote:
>>>>
>>>>        
>>>>> On 12.01.2011 11:22, Avi Kivity wrote:
>>>>>
>>>>>          
>>>>>> On 01/11/2011 03:54 PM, Anthony Liguori wrote:
>>>>>>
>>>>>>            
>>>>>>> Right, we should introduce a KVMBus that KVM devices are created on.
>>>>>>> The devices can get at KVMState through the BusState.
>>>>>>>
>>>>>>>              
>>>>>> There is no kvm bus in a PC (I looked).  We're bending the device model
>>>>>> here because a device is implemented in the kernel and not in
>>>>>> userspace.  An implementation detail is magnified beyond all proportions.
>>>>>>
>>>>>> An ioapic that is implemented by kvm lives in exactly the same place
>>>>>> that the qemu ioapic lives in.  An assigned pci device lives on the PCI
>>>>>> bus, not a KVMBus.  If we need a pointer to KVMState, then we must find
>>>>>> it elsewhere, not through creating imaginary buses that don't exist.
>>>>>>
>>>>>>
>>>>>>            
>>>>> Exactly.
>>>>>
>>>>> So we can either "infect" the whole device tree with kvm (or maybe a
>>>>> more generic accelerator structure that also deals with Xen) or we need
>>>>> to pull the reference inside the device's init function from some global
>>>>> service (kvm_get_state).
>>>>>
>>>>>          
>>>> Note that this topic is still waiting for good suggestions, specifically
>>>> from those who believe in kvm_state references :). This is not only
>>>> blocking kvmstate merge but will affect KVM irqchips as well.
>>>>
>>>> It boils down to how we reasonably pass a kvm_state reference from
>>>> machine init code to a sysbus device. I'm probably biased, but I don't
>>>> see any way that does not work against the idea of confining access to
>>>> kvm_state or breaks device instantiation from the command line or a
>>>> config file.
>>>>
>>>>        
>>> A KVM device should sit on a KVM specific bus that hangs off of sysbus.
>>> It can get to kvm_state through that bus.
>>>
>>> That bus doesn't get instantiated through qdev so requiring a pointer
>>> argument should not be an issue.
>>>
>>>      
>> This design is in conflict with the requirement to attach KVM-assisted
>> devices also to their home bus, e.g. an assigned PCI device to the PCI
>> bus. We don't support multi-homed qdev devices.
>>    
> 
> The bus topology reflects how I/O flows in and out of a device.  We do 
> not model a perfect PC bus architecture and I don't think we ever intend 
> to.  Instead, we model a functional architecture.
> 
> I/O from an assigned device does not flow through the emulated PCI bus.  
> Therefore, it does not belong on the emulated PCI bus.
> 
> Assigned devices need to interact with the emulated PCI bus, but they 
> shouldn't be children of it.

You should be able to find assigned devices on some PCI bus, so you
either have to hack up the existing bus to host devices that are, on the
other hand, not really part of it, or branch off a pci-kvm sub-bus, just
like you would have to create a sysbus-kvm. I guess, if at all, we want
the latter.

Is that acceptable for everyone?
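
To make the two options discussed in this thread concrete, here is a
minimal sketch in C: a device either reaches kvm_state through a
KVM-specific bus (sysbus-kvm or pci-kvm) that carries the pointer, or
pulls it from a global kvm_get_state() service. The struct layouts and
the helpers kvm_bus_init, kvm_state_from_bus and kvm_set_state are
hypothetical stand-ins for illustration, not QEMU's actual qdev API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not QEMU's real definitions. */
typedef struct KVMState { int vmfd; } KVMState;
typedef struct BusState { const char *name; } BusState;
typedef struct DeviceState { BusState *parent_bus; } DeviceState;

/* Option 1: a KVM-specific bus that carries the reference. */
typedef struct KVMBus {
    BusState bus;          /* first member, so a BusState* can be cast back */
    KVMState *kvm_state;   /* handed in by machine init code */
} KVMBus;

static void kvm_bus_init(KVMBus *kbus, const char *name, KVMState *s)
{
    kbus->bus.name = name;
    kbus->kvm_state = s;
}

/* A device on such a bus finds kvm_state through its parent bus. */
static KVMState *kvm_state_from_bus(DeviceState *dev)
{
    assert(dev->parent_bus != NULL);
    return ((KVMBus *)dev->parent_bus)->kvm_state;
}

/* Option 2: a global service queried from the device's init function. */
static KVMState *global_kvm_state;

static void kvm_set_state(KVMState *s)
{
    global_kvm_state = s;
}

static KVMState *kvm_get_state(void)
{
    return global_kvm_state;
}
```

Option 1 keeps access confined to devices created on the KVM bus but
forces the device off its home bus; option 2 leaves the device where it
is but makes the reference reachable from anywhere.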

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
  2011-01-18 16:01                           ` Jan Kiszka
@ 2011-01-18 16:04                             ` Anthony Liguori
  -1 siblings, 0 replies; 300+ messages in thread
From: Anthony Liguori @ 2011-01-18 16:04 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Avi Kivity, Markus Armbruster, Marcelo Tosatti, Glauber Costa,
	kvm, qemu-devel

On 01/18/2011 10:01 AM, Jan Kiszka wrote:
> On 2011-01-18 16:50, Anthony Liguori wrote:
>    
>> On 01/18/2011 09:43 AM, Jan Kiszka wrote:
>>      
>>> On 2011-01-18 16:04, Anthony Liguori wrote:
>>>
>>>        
>>>> On 01/18/2011 08:28 AM, Jan Kiszka wrote:
>>>>
>>>>          
>>>>> On 2011-01-12 11:31, Jan Kiszka wrote:
>>>>>
>>>>>
>>>>>            
>>>>>> On 12.01.2011 11:22, Avi Kivity wrote:
>>>>>>
>>>>>>
>>>>>>              
>>>>>>> On 01/11/2011 03:54 PM, Anthony Liguori wrote:
>>>>>>>
>>>>>>>
>>>>>>>                
>>>>>>>> Right, we should introduce a KVMBus that KVM devices are created on.
>>>>>>>> The devices can get at KVMState through the BusState.
>>>>>>>>
>>>>>>>>
>>>>>>>>                  
>>>>>>> There is no kvm bus in a PC (I looked).  We're bending the device model
>>>>>>> here because a device is implemented in the kernel and not in
>>>>>>> userspace.  An implementation detail is magnified beyond all proportions.
>>>>>>>
>>>>>>> An ioapic that is implemented by kvm lives in exactly the same place
>>>>>>> that the qemu ioapic lives in.  An assigned pci device lives on the PCI
>>>>>>> bus, not a KVMBus.  If we need a pointer to KVMState, then we must find
>>>>>>> it elsewhere, not through creating imaginary buses that don't exist.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>                
>>>>>> Exactly.
>>>>>>
>>>>>> So we can either "infect" the whole device tree with kvm (or maybe a
>>>>>> more generic accelerator structure that also deals with Xen) or we need
>>>>>> to pull the reference inside the device's init function from some global
>>>>>> service (kvm_get_state).
>>>>>>
>>>>>>
>>>>>>              
>>>>> Note that this topic is still waiting for good suggestions, specifically
>>>>> from those who believe in kvm_state references :). This is not only
>>>>> blocking kvmstate merge but will affect KVM irqchips as well.
>>>>>
>>>>> It boils down to how we reasonably pass a kvm_state reference from
>>>>> machine init code to a sysbus device. I'm probably biased, but I don't
>>>>> see any way that does not work against the idea of confining access to
>>>>> kvm_state or breaks device instantiation from the command line or a
>>>>> config file.
>>>>>
>>>>>
>>>>>            
>>>> A KVM device should sit on a KVM specific bus that hangs off of sysbus.
>>>> It can get to kvm_state through that bus.
>>>>
>>>> That bus doesn't get instantiated through qdev so requiring a pointer
>>>> argument should not be an issue.
>>>>
>>>>
>>>>          
>>> This design is in conflict with the requirement to attach KVM-assisted
>>> devices also to their home bus, e.g. an assigned PCI device to the PCI
>>> bus. We don't support multi-homed qdev devices.
>>>
>>>        
>> The bus topology reflects how I/O flows in and out of a device.  We do
>> not model a perfect PC bus architecture and I don't think we ever intend
>> to.  Instead, we model a functional architecture.
>>
>> I/O from an assigned device does not flow through the emulated PCI bus.
>> Therefore, it does not belong on the emulated PCI bus.
>>
>> Assigned devices need to interact with the emulated PCI bus, but they
>> shouldn't be children of it.
>>      
> You should be able to find assigned devices on some PCI bus, so you
> either have to hack up the existing bus to host devices that are, on the
> other side, not part of it or branch off a pci-kvm sub-bus, just like
> you would have to create a sysbus-kvm.

Management tools should never traverse the device tree to find 
devices.  This is a recipe for disaster in the long term because the 
device tree will not remain stable.

So yes, a management tool should be able to enumerate assigned devices 
as they would enumerate any other PCI device but that has almost nothing 
to do with what the tree layout is.
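
As a rough illustration of enumerating PCI devices without walking the
device tree, the sketch below keeps a flat registry keyed by the
guest-visible bus/slot/function address. The internal parent is recorded
but never used for lookup. All names here (PciEntry, pci_registry_add,
pci_registry_find, pci_registry_count) are hypothetical and not an
existing QEMU interface.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_PCI_DEVS 32

typedef struct PciEntry {
    unsigned bus, slot, func;  /* guest-visible PCI address */
    const char *id;            /* name the management tool knows */
    const char *parent;        /* internal tree parent, free to change */
} PciEntry;

static PciEntry registry[MAX_PCI_DEVS];
static size_t registry_len;

/* Register a device under its guest-visible address; -1 if full. */
static int pci_registry_add(unsigned bus, unsigned slot, unsigned func,
                            const char *id, const char *parent)
{
    if (registry_len == MAX_PCI_DEVS) {
        return -1;
    }
    registry[registry_len++] = (PciEntry){ bus, slot, func, id, parent };
    return 0;
}

/* Look a device up by address only; the tree layout plays no role. */
static const PciEntry *pci_registry_find(unsigned bus, unsigned slot,
                                         unsigned func)
{
    for (size_t i = 0; i < registry_len; i++) {
        const PciEntry *e = &registry[i];
        if (e->bus == bus && e->slot == slot && e->func == func) {
            return e;
        }
    }
    return NULL;
}

static size_t pci_registry_count(void)
{
    return registry_len;
}
```

With this shape, an emulated NIC parented by the PCI bus and an assigned
device parented elsewhere would show up in the same enumeration.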

Regards,

Anthony Liguori

>   I guess, if at all, we want the
> latter.
>
> Is that acceptable for everyone?
>
> Jan
>
>    


^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
  2011-01-18 16:04                             ` Anthony Liguori
@ 2011-01-18 16:17                               ` Jan Kiszka
  -1 siblings, 0 replies; 300+ messages in thread
From: Jan Kiszka @ 2011-01-18 16:17 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Avi Kivity, Markus Armbruster, Marcelo Tosatti, Glauber Costa,
	kvm, qemu-devel

On 2011-01-18 17:04, Anthony Liguori wrote:
>>>>> A KVM device should sit on a KVM specific bus that hangs off of
>>>>> sysbus.
>>>>> It can get to kvm_state through that bus.
>>>>>
>>>>> That bus doesn't get instantiated through qdev so requiring a pointer
>>>>> argument should not be an issue.
>>>>>
>>>>>
>>>>>          
>>>> This design is in conflict with the requirement to attach KVM-assisted
>>>> devices also to their home bus, e.g. an assigned PCI device to the PCI
>>>> bus. We don't support multi-homed qdev devices.
>>>>
>>>>        
>>> The bus topology reflects how I/O flows in and out of a device.  We do
>>> not model a perfect PC bus architecture and I don't think we ever intend
>>> to.  Instead, we model a functional architecture.
>>>
>>> I/O from an assigned device does not flow through the emulated PCI bus.
>>> Therefore, it does not belong on the emulated PCI bus.
>>>
>>> Assigned devices need to interact with the emulated PCI bus, but they
>>> shouldn't be children of it.
>>>      
>> You should be able to find assigned devices on some PCI bus, so you
>> either have to hack up the existing bus to host devices that are, on the
>> other side, not part of it or branch off a pci-kvm sub-bus, just like
>> you would have to create a sysbus-kvm.
> 
> Management tools should never traverse the device tree to find
> devices.  This is a recipe for disaster in the long term because the
> device tree will not remain stable.
> 
> So yes, a management tool should be able to enumerate assigned devices
> as they would enumerate any other PCI device but that has almost nothing
> to do with what the tree layout is.

I'm probably misunderstanding you, but if the bus topology as the guest
sees it is not properly reflected in an object tree on the qemu side, we
are creating hacks again.

Management and analysis tools must be able to traverse the system buses
and find guest devices this way. If they create a device on bus X, it
must never end up on bus Y just because it happens to be KVM-assisted or
has some other property. On the other hand, trying to hide this
dependency will likely cause severe damage to the qdev design.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
  2011-01-18 16:17                               ` Jan Kiszka
@ 2011-01-18 16:37                                 ` Anthony Liguori
  -1 siblings, 0 replies; 300+ messages in thread
From: Anthony Liguori @ 2011-01-18 16:37 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Avi Kivity, Markus Armbruster, Marcelo Tosatti, Glauber Costa,
	kvm, qemu-devel

On 01/18/2011 10:17 AM, Jan Kiszka wrote:
> On 2011-01-18 17:04, Anthony Liguori wrote:
>    
>>>>>> A KVM device should sit on a KVM specific bus that hangs off of
>>>>>> sysbus.
>>>>>> It can get to kvm_state through that bus.
>>>>>>
>>>>>> That bus doesn't get instantiated through qdev so requiring a pointer
>>>>>> argument should not be an issue.
>>>>>>
>>>>>>
>>>>>>
>>>>>>              
>>>>> This design is in conflict with the requirement to attach KVM-assisted
>>>>> devices also to their home bus, e.g. an assigned PCI device to the PCI
>>>>> bus. We don't support multi-homed qdev devices.
>>>>>
>>>>>
>>>>>            
>>>> The bus topology reflects how I/O flows in and out of a device.  We do
>>>> not model a perfect PC bus architecture and I don't think we ever intend
>>>> to.  Instead, we model a functional architecture.
>>>>
>>>> I/O from an assigned device does not flow through the emulated PCI bus.
>>>> Therefore, it does not belong on the emulated PCI bus.
>>>>
>>>> Assigned devices need to interact with the emulated PCI bus, but they
>>>> shouldn't be children of it.
>>>>
>>>>          
>>> You should be able to find assigned devices on some PCI bus, so you
>>> either have to hack up the existing bus to host devices that are, on the
>>> other side, not part of it or branch off a pci-kvm sub-bus, just like
>>> you would have to create a sysbus-kvm.
>>>        
>> Management tools should never traverse the device tree to find
>> devices.  This is a recipe for disaster in the long term because the
>> device tree will not remain stable.
>>
>> So yes, a management tool should be able to enumerate assigned devices
>> as they would enumerate any other PCI device but that has almost nothing
>> to do with what the tree layout is.
>>      
> I'm probably misunderstanding you, but if the bus topology as the guest
> sees it is not properly reflected in an object tree on the qemu side, we
> are creating hacks again.
>    

There is no such thing as the "bus topology as the guest sees it".

The guest just sees a bunch of devices.  The guest can only infer things 
like ISA busses.  The guest sees a bunch of devices: an i8254, i8259, 
RTC, etc.  Whether those devices are on an ISA bus, an LPC bus, or all 
in a SuperI/O chip that's part of the southbridge is all invisible to 
the guest.

The device model topology is 100% a hidden architectural detail.
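
A small sketch of that point, assuming the conventional PC port
assignments (i8259 at 0x20 and 0xa0, i8254 at 0x40, RTC at 0x70): the
guest addresses a device by I/O port, and nothing in the lookup says
which bus object parents the device. PortRange and port_to_device are
hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct PortRange {
    uint16_t base;       /* first I/O port of the range */
    uint16_t len;        /* number of ports in the range */
    const char *device;  /* device that answers on those ports */
} PortRange;

/* What the guest can observe: which ports respond, not which bus. */
static const PortRange port_map[] = {
    { 0x20, 2, "i8259-master" },
    { 0x40, 4, "i8254" },
    { 0x70, 2, "rtc" },
    { 0xa0, 2, "i8259-slave" },
};

/* Return the device claiming a port, or NULL if none does. */
static const char *port_to_device(uint16_t port)
{
    for (size_t i = 0; i < sizeof(port_map) / sizeof(port_map[0]); i++) {
        const PortRange *r = &port_map[i];
        if (port >= r->base && port < r->base + r->len) {
            return r->device;
        }
    }
    return NULL;
}
```

Moving the RTC from an ISA bus object to an LPC or SuperI/O object would
leave this table, and so the guest's view, unchanged.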

> Management and analysis tools must be able to traverse the system buses
> and find guest devices this way.

We need to provide a compatible interface to the guest.  If you agree 
with my above statements, then you'll also agree that we can do this 
without keeping the device model topology stable.

But we also need to provide a compatible interface to management tools.  
Exposing the device model topology as a compatible interface 
artificially limits us.  It's far better to provide higher level 
supported interfaces to give us the flexibility to change the device 
model as we need to.

>   If they create a device on bus X, it
> must never end up on bus Y just because it happens to be KVM-assisted or
> has some other property.

Nope.  This is exactly what should happen.

90% of the devices in the device model are not created by management 
tools.  They're part of a chipset.  The chipset has well defined 
extension points and we provide management interfaces to create devices 
on those extension points.  That is, interfaces to create PCI devices.

Regards,

Anthony Liguori

>   On the other hand, trying to hide this
> dependency will likely cause severe damage to the qdev design.
>
> Jan
>
>    


^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
  2011-01-18 16:37                                 ` Anthony Liguori
@ 2011-01-18 16:56                                   ` Jan Kiszka
  -1 siblings, 0 replies; 300+ messages in thread
From: Jan Kiszka @ 2011-01-18 16:56 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Avi Kivity, Markus Armbruster, Marcelo Tosatti, Glauber Costa,
	kvm, qemu-devel

On 2011-01-18 17:37, Anthony Liguori wrote:
> On 01/18/2011 10:17 AM, Jan Kiszka wrote:
>> On 2011-01-18 17:04, Anthony Liguori wrote:
>>    
>>>>>>> A KVM device should sit on a KVM specific bus that hangs off of
>>>>>>> sysbus.
>>>>>>> It can get to kvm_state through that bus.
>>>>>>>
>>>>>>> That bus doesn't get instantiated through qdev so requiring a pointer
>>>>>>> argument should not be an issue.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>              
>>>>>> This design is in conflict with the requirement to attach KVM-assisted
>>>>>> devices also to their home bus, e.g. an assigned PCI device to the PCI
>>>>>> bus. We don't support multi-homed qdev devices.
>>>>>>
>>>>>>
>>>>>>            
>>>>> The bus topology reflects how I/O flows in and out of a device.  We do
>>>>> not model a perfect PC bus architecture and I don't think we ever intend
>>>>> to.  Instead, we model a functional architecture.
>>>>>
>>>>> I/O from an assigned device does not flow through the emulated PCI bus.
>>>>> Therefore, it does not belong on the emulated PCI bus.
>>>>>
>>>>> Assigned devices need to interact with the emulated PCI bus, but they
>>>>> shouldn't be children of it.
>>>>>
>>>>>          
>>>> You should be able to find assigned devices on some PCI bus, so you
>>>> either have to hack up the existing bus to host devices that are, on the
>>>> other side, not part of it or branch off a pci-kvm sub-bus, just like
>>>> you would have to create a sysbus-kvm.
>>>>        
>>> Management tools should never traverse the device tree to find
>>> devices.  This is a recipe for disaster in the long term because the
>>> device tree will not remain stable.
>>>
>>> So yes, a management tool should be able to enumerate assigned devices
>>> as they would enumerate any other PCI device but that has almost nothing
>>> to do with what the tree layout is.
>>>      
>> I'm probably misunderstanding you, but if the bus topology as the guest
>> sees it is not properly reflected in an object tree on the qemu side, we
>> are creating hacks again.
>>    
> 
> There is no such thing as the "bus topology as the guest sees it".
> 
> The guest just sees a bunch of devices.  The guest can only infer things 
> like ISA busses.  The guest sees a bunch of devices: an i8254, i8259, 
> RTC, etc.  Whether those devices are on an ISA bus, an LPC bus, or all 
> in a SuperI/O chip that's part of the southbridge is all invisible to 
> the guest.
> 
> The device model topology is 100% a hidden architectural detail.

This is true for the sysbus, but it is obviously not the case for PCI and
similarly discoverable buses. There we have a guest-explorable topology
that is currently equivalent to the qdev layout.

> 
>> Management and analysis tools must be able to traverse the system buses
>> and find guest devices this way.
> 
> We need to provide a compatible interface to the guest.  If you agree 
> with my above statements, then you'll also agree that we can do this 
> without keeping the device model topology stable.
> 
> But we also need to provide a compatible interface to management tools.  
> Exposing the device model topology as a compatible interface 
> artificially limits us.  It's far better to provide higher level 
> supported interfaces to give us the flexibility to change the device 
> model as we need to.

How do you want to change qdev to keep the guest and management tool
view stable while branching off kvm sub-buses? Please propose such
extensions so that they can be discussed. IIUC, that would be a second
relation between qdev and qbus instances besides the physical topology.
What further use cases (besides passing kvm_state around) do you have in
mind?
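
To make the "second relation" concrete, here is a minimal sketch in which a
device keeps its guest-visible home bus and, separately, an optional link to
an accelerator bus. All names (AccelState, Bus, Device, accel_bus,
device_get_accel) are hypothetical illustrations and not existing qdev API;
AccelState merely stands in for kvm_state.

```c
#include <stddef.h>

/* Hypothetical stand-in for kvm_state; not the real KVMState. */
typedef struct AccelState {
    int vm_fd;
} AccelState;

/* Hypothetical bus node; accel is non-NULL only on an accelerator bus. */
typedef struct Bus {
    const char *name;
    struct Bus *parent;
    AccelState *accel;
} Bus;

/* A device with two independent relations to buses. */
typedef struct Device {
    const char *id;
    Bus *home_bus;   /* topology the guest can explore, e.g. PCI */
    Bus *accel_bus;  /* second relation; NULL for purely emulated devices */
} Device;

/* Reach the accelerator state through the second relation, if present. */
AccelState *device_get_accel(const Device *dev)
{
    return dev->accel_bus ? dev->accel_bus->accel : NULL;
}

/* The guest-facing placement never depends on the accelerator link. */
const char *device_guest_bus_name(const Device *dev)
{
    return dev->home_bus->name;
}
```

In this sketch an assigned device would report the same home bus whether or
not accel_bus is set, which is the stability property asked about above.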

> 
>>   If they create a device on bus X, it
>> must never end up on bus Y just because it happens to be KVM-assisted or
>> has some other property.
> 
> Nope.  This is exactly what should happen.
> 
> 90% of the devices in the device model are not created by management 
> tools.  They're part of a chipset.  The chipset has well defined 
> extension points and we provide management interfaces to create devices 
> on those extension points.  That is, interfaces to create PCI devices.
> 

Creating kvm irqchips via the management tool would be one simple way
(not the only one, though) to enable/disable kvm assistance for those
devices.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
  2011-01-18 15:54                           ` Jan Kiszka
@ 2011-01-18 17:02                             ` Alex Williamson
  -1 siblings, 0 replies; 300+ messages in thread
From: Alex Williamson @ 2011-01-18 17:02 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Anthony Liguori, Avi Kivity, Markus Armbruster, Marcelo Tosatti,
	Glauber Costa, kvm, qemu-devel

On Tue, 2011-01-18 at 16:54 +0100, Jan Kiszka wrote:
> On 2011-01-18 16:48, Anthony Liguori wrote:
> > On 01/18/2011 09:43 AM, Jan Kiszka wrote:
> >> On 2011-01-18 16:04, Anthony Liguori wrote:
> >>    
> >>> On 01/18/2011 08:28 AM, Jan Kiszka wrote:
> >>>      
> >>>> On 2011-01-12 11:31, Jan Kiszka wrote:
> >>>>
> >>>>        
> >>>>> Am 12.01.2011 11:22, Avi Kivity wrote:
> >>>>>
> >>>>>          
> >>>>>> On 01/11/2011 03:54 PM, Anthony Liguori wrote:
> >>>>>>
> >>>>>>            
> >>>>>>> Right, we should introduce a KVMBus that KVM devices are created on.
> >>>>>>> The devices can get at KVMState through the BusState.
> >>>>>>>
> >>>>>>>              
> >>>>>> There is no kvm bus in a PC (I looked).  We're bending the device model
> >>>>>> here because a device is implemented in the kernel and not in
> >>>>>> userspace.  An implementation detail is magnified beyond all proportions.
> >>>>>>
> >>>>>> An ioapic that is implemented by kvm lives in exactly the same place
> >>>>>> that the qemu ioapic lives in.  An assigned pci device lives on the PCI
> >>>>>> bus, not a KVMBus.  If we need a pointer to KVMState, then we must find
> >>>>>> it elsewhere, not through creating imaginary buses that don't exist.
> >>>>>>
> >>>>>>
> >>>>>>            
> >>>>> Exactly.
> >>>>>
> >>>>> So we can either "infect" the whole device tree with kvm (or maybe a
> >>>>> more generic accelerator structure that also deals with Xen) or we need
> >>>>> to pull the reference inside the device's init function from some global
> >>>>> service (kvm_get_state).
> >>>>>
> >>>>>          
> >>>> Note that this topic is still waiting for good suggestions, specifically
> >>>> from those who believe in kvm_state references :). This is not only
> >>>> blocking kvmstate merge but will affect KVM irqchips as well.
> >>>>
> >>>> It boils down to how we reasonably pass a kvm_state reference from
> >>>> machine init code to a sysbus device. I'm probably biased, but I don't
> >>>> see any way that does not work against the idea of confining access to
> >>>> kvm_state or breaks device instantiation from the command line or a
> >>>> config file.
> >>>>
> >>>>        
> >>> A KVM device should sit on a KVM specific bus that hangs off of sysbus.
> >>> It can get to kvm_state through that bus.
> >>>
> >>> That bus doesn't get instantiated through qdev so requiring a pointer
> >>> argument should not be an issue.
> >>>
> >>>      
> >> This design is in conflict with the requirement to attach KVM-assisted
> >> devices also to their home bus, e.g. an assigned PCI device to the PCI
> >> bus. We don't support multi-homed qdev devices.
> >>    
> > 
> > With vfio, would an assigned PCI device even need kvm_state?
> 
> IIUC: Yes, for establishing the irqfd link.

We abstract this through the msi/msix layer though, so the vfio driver
doesn't directly know anything about kvm_state.
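
A minimal sketch of that layering, assuming hypothetical names throughout
(AccelState, MsiLayer, AssignedDevice and both functions are illustrations,
not the actual msi/msix or vfio code): only the MSI layer holds the
accelerator state, and the device calls the layer without ever seeing it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for kvm_state; device code never touches it. */
typedef struct AccelState {
    int irqfd_routes;
} AccelState;

/* Hypothetical MSI layer: the only holder of the accelerator state. */
typedef struct MsiLayer {
    AccelState *accel;        /* NULL when no kernel assistance is used */
    int emulated_deliveries;  /* counts vectors left to userspace */
} MsiLayer;

/* Hypothetical assigned device: it knows the layer, not the accelerator. */
typedef struct AssignedDevice {
    MsiLayer *msi;
    int vector;
} AssignedDevice;

/* Returns true if the vector was routed through the accelerated path. */
bool msi_layer_enable_vector(MsiLayer *msi, int vector)
{
    (void)vector; /* a real layer would program a route for this vector */
    if (msi->accel) {
        msi->accel->irqfd_routes++;
        return true;
    }
    msi->emulated_deliveries++;
    return false;
}

/* Device-side call: identical with or without kernel assistance. */
bool assigned_device_enable_irq(AssignedDevice *dev)
{
    return msi_layer_enable_vector(dev->msi, dev->vector);
}
```

The sketch leaves open how the layer itself receives its AccelState pointer.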

Alex


^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
  2011-01-18 17:02                             ` Alex Williamson
@ 2011-01-18 17:08                               ` Jan Kiszka
  -1 siblings, 0 replies; 300+ messages in thread
From: Jan Kiszka @ 2011-01-18 17:08 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Anthony Liguori, Avi Kivity, Markus Armbruster, Marcelo Tosatti,
	Glauber Costa, kvm, qemu-devel

On 2011-01-18 18:02, Alex Williamson wrote:
> On Tue, 2011-01-18 at 16:54 +0100, Jan Kiszka wrote:
>> On 2011-01-18 16:48, Anthony Liguori wrote:
>>> On 01/18/2011 09:43 AM, Jan Kiszka wrote:
>>>> On 2011-01-18 16:04, Anthony Liguori wrote:
>>>>    
>>>>> On 01/18/2011 08:28 AM, Jan Kiszka wrote:
>>>>>      
>>>>>> On 2011-01-12 11:31, Jan Kiszka wrote:
>>>>>>
>>>>>>        
>>>>>>> Am 12.01.2011 11:22, Avi Kivity wrote:
>>>>>>>
>>>>>>>          
>>>>>>>> On 01/11/2011 03:54 PM, Anthony Liguori wrote:
>>>>>>>>
>>>>>>>>            
>>>>>>>>> Right, we should introduce a KVMBus that KVM devices are created on.
>>>>>>>>> The devices can get at KVMState through the BusState.
>>>>>>>>>
>>>>>>>>>              
>>>>>>>> There is no kvm bus in a PC (I looked).  We're bending the device model
>>>>>>>> here because a device is implemented in the kernel and not in
>>>>>>>> userspace.  An implementation detail is magnified beyond all proportions.
>>>>>>>>
>>>>>>>> An ioapic that is implemented by kvm lives in exactly the same place
>>>>>>>> that the qemu ioapic lives in.  An assigned pci device lives on the PCI
>>>>>>>> bus, not a KVMBus.  If we need a pointer to KVMState, then we must find
>>>>>>>> it elsewhere, not through creating imaginary buses that don't exist.
>>>>>>>>
>>>>>>>>
>>>>>>>>            
>>>>>>> Exactly.
>>>>>>>
>>>>>>> So we can either "infect" the whole device tree with kvm (or maybe a
>>>>>>> more generic accelerator structure that also deals with Xen) or we need
>>>>>>> to pull the reference inside the device's init function from some global
>>>>>>> service (kvm_get_state).
>>>>>>>
>>>>>>>          
>>>>>> Note that this topic is still waiting for good suggestions, specifically
>>>>>> from those who believe in kvm_state references :). This is not only
>>>>>> blocking kvmstate merge but will affect KVM irqchips as well.
>>>>>>
>>>>>> It boils down to how we reasonably pass a kvm_state reference from
>>>>>> machine init code to a sysbus device. I'm probably biased, but I don't
>>>>>> see any way that does not work against the idea of confining access to
>>>>>> kvm_state or breaks device instantiation from the command line or a
>>>>>> config file.
>>>>>>
>>>>>>        
>>>>> A KVM device should sit on a KVM specific bus that hangs off of sysbus.
>>>>> It can get to kvm_state through that bus.
>>>>>
>>>>> That bus doesn't get instantiated through qdev so requiring a pointer
>>>>> argument should not be an issue.
>>>>>
>>>>>      
>>>> This design is in conflict with the requirement to attach KVM-assisted
>>>> devices also to their home bus, e.g. an assigned PCI device to the PCI
>>>> bus. We don't support multi-homed qdev devices.
>>>>    
>>>
>>> With vfio, would an assigned PCI device even need kvm_state?
>>
>> IIUC: Yes, for establishing the irqfd link.
> 
> We abstract this through the msi/msix layer though, so the vfio driver
> doesn't directly know anything about kvm_state.

Which version/tree are you referring to? It wasn't the case in the last
version I found on the list.

Does the msi layer use irqfd for every source in kvm mode then? Of
course, the key question will be how that layer eventually obtains kvm_state.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
  2011-01-18 16:56                                   ` Jan Kiszka
@ 2011-01-18 17:09                                     ` Anthony Liguori
  -1 siblings, 0 replies; 300+ messages in thread
From: Anthony Liguori @ 2011-01-18 17:09 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Avi Kivity, Markus Armbruster, Marcelo Tosatti, Glauber Costa,
	kvm, qemu-devel

On 01/18/2011 10:56 AM, Jan Kiszka wrote:
>
>> The device model topology is 100% a hidden architectural detail.
>>      
> This is true for the sysbus, it is obviously not the case for PCI and
> similarly discoverable buses. There we have a guest-explorable topology
> that is currently equivalent to the qdev layout.
>    

But we also don't do PCI passthrough so we really haven't even explored 
how that maps in qdev.  I don't know if qemu-kvm has attempted to 
qdev-ify it.

>>> Management and analysis tools must be able to traverse the system buses
>>> and find guest devices this way.
>>>        
>> We need to provide a compatible interface to the guest.  If you agree
>> with my above statements, then you'll also agree that we can do this
>> without keeping the device model topology stable.
>>
>> But we also need to provide a compatible interface to management tools.
>> Exposing the device model topology as a compatible interface
>> artificially limits us.  It's far better to provide higher level
>> supported interfaces to give us the flexibility to change the device
>> model as we need to.
>>      
> How do you want to change qdev to keep the guest and management tool
> view stable while branching off kvm sub-buses?

The qdev device model is not a stable interface.  I think that's been 
clear from the very beginning.

>   Please propose such
> extensions so that they can be discussed. IIUC, that would be a second
> relation between qdev and qbus instances besides the physical topology.
> What further use cases (besides passing kvm_state around) do you have in
> mind?
>    

The -device interface is a stable interface.  Right now, you don't 
specify any type of identifier of the pci bus when you create a PCI 
device.  It's implied in the interface.
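
As a rough illustration of that split, the sketch below has a creation entry
point that takes only a driver name and an id, while the bus the device lands
on stays an internal detail that may change. Every name here (Machine,
machine_add_pci_device, machine_find_device, the bus strings) is a
hypothetical stand-in and not the real -device or qdev code.

```c
#include <stddef.h>
#include <string.h>

#define MAX_DEVICES 8

/* Hypothetical device record; internal_bus is an implementation detail. */
typedef struct Device {
    const char *id;
    const char *driver;
    const char *internal_bus;
} Device;

/* Hypothetical machine with a flat, id-indexed device registry. */
typedef struct Machine {
    Device devices[MAX_DEVICES];
    size_t count;
    const char *default_pci_bus; /* may change between versions */
} Machine;

/* Stable entry point: the caller names driver and id, never the bus. */
const Device *machine_add_pci_device(Machine *m, const char *driver,
                                     const char *id)
{
    Device *d;

    if (m->count >= MAX_DEVICES) {
        return NULL;
    }
    d = &m->devices[m->count++];
    d->id = id;
    d->driver = driver;
    d->internal_bus = m->default_pci_bus;
    return d;
}

/* Stable lookup by id, independent of where the device sits in the tree. */
const Device *machine_find_device(const Machine *m, const char *id)
{
    size_t i;

    for (i = 0; i < m->count; i++) {
        if (strcmp(m->devices[i].id, id) == 0) {
            return &m->devices[i];
        }
    }
    return NULL;
}
```

A caller that only uses machine_add_pci_device() and machine_find_device()
keeps working if default_pci_bus is later pointed at a different bus.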

>    
>>      
>>>    If they create a device on bus X, it
>>> must never end up on bus Y just because it happens to be KVM-assisted or
>>> has some other property.
>>>        
>> Nope.  This is exactly what should happen.
>>
>> 90% of the devices in the device model are not created by management
>> tools.  They're part of a chipset.  The chipset has well defined
>> extension points and we provide management interfaces to create devices
>> on those extension points.  That is, interfaces to create PCI devices.
>>
>>      
> Creating kvm irqchips via the management tool would be one simple way
> (not the only one, though) to enable/disable kvm assistance for those
> devices.
>    

It's automatically created as part of the CPUs or as part of the 
chipset.  How to enable/disable kvm assistance is a property of the CPU 
and/or chipset.

Regards,

Anthony Liguori


> Jan
>
>    


^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
  2011-01-18 17:09                                     ` Anthony Liguori
@ 2011-01-18 17:20                                       ` Jan Kiszka
  -1 siblings, 0 replies; 300+ messages in thread
From: Jan Kiszka @ 2011-01-18 17:20 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Avi Kivity, Markus Armbruster, Marcelo Tosatti, Glauber Costa,
	kvm, qemu-devel

On 2011-01-18 18:09, Anthony Liguori wrote:
> On 01/18/2011 10:56 AM, Jan Kiszka wrote:
>>
>>> The device model topology is 100% a hidden architectural detail.
>>>      
>> This is true for the sysbus, it is obviously not the case for PCI and
>> similarly discoverable buses. There we have a guest-explorable topology
>> that is currently equivalent to the qdev layout.
>>    
> 
> But we also don't do PCI passthrough so we really haven't even explored 
> how that maps in qdev.  I don't know if qemu-kvm has attempted to 
> qdev-ify it.

It is. And even if it weren't or the current version in qemu-kvm was not
perfect, we need to consider those use cases now as we are about to
define a generic model for kvm device integration. That's the point of
this discussion.

> 
>>>> Management and analysis tools must be able to traverse the system buses
>>>> and find guest devices this way.
>>>>        
>>> We need to provide a compatible interface to the guest.  If you agree
>>> with my above statements, then you'll also agree that we can do this
>>> without keeping the device model topology stable.
>>>
>>> But we also need to provide a compatible interface to management tools.
>>> Exposing the device model topology as a compatible interface
>>> artificially limits us.  It's far better to provide higher level
>>> supported interfaces to give us the flexibility to change the device
>>> model as we need to.
>>>      
>> How do you want to change qdev to keep the guest and management tool
>> view stable while branching off kvm sub-buses?
> 
> The qdev device model is not a stable interface.  I think that's been 
> clear from the very beginning.

Internals aren't stable, but they should only be changed for a good
reason, specifically when the change may impact the whole set of device
models.

> 
>>   Please propose such
>> extensions so that they can be discussed. IIUC, that would be a second
>> relation between qdev and qbus instances besides the physical topology.
>> What further use cases (besides passing kvm_state around) do you have in
>> mind?
>>    
> 
> The -device interface is a stable interface.  Right now, you don't 
> specify any type of identifier of the pci bus when you create a PCI 
> device.  It's implied in the interface.

Which only works as long as we expose a single bus. You don't need to
be an oracle to predict that this is not a stable interface.

> 
>>    
>>>      
>>>>    If they create a device on bus X, it
>>>> must never end up on bus Y just because it happens to be KVM-assisted or
>>>> has some other property.
>>>>        
>>> Nope.  This is exactly what should happen.
>>>
>>> 90% of the devices in the device model are not created by management
>>> tools.  They're part of a chipset.  The chipset has well defined
>>> extension points and we provide management interfaces to create devices
>>> on those extension points.  That is, interfaces to create PCI devices.
>>>
>>>      
>> Creating kvm irqchips via the management tool would be one simple way
>> (not the only one, though) to enable/disable kvm assistance for those
>> devices.
>>    
> 
> It's automatically created as part of the CPUs or as part of the 
> chipset.  How to enable/disable kvm assistance is a property of the CPU 
> and/or chipset.

If we exclude creation via command line / config files, we could also
pass the kvm_state directly from the machine or chipset setup code and
save us at least the kvm system buses.
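
As a rough sketch of that alternative: machine setup code hands the
accelerator state to the device at init time, so no extra bus is needed
just to carry the pointer. All names below (ToyKVMState, ToyClockDevice,
toy_clock_init, toy_machine_init) are hypothetical and are not QEMU APIs.

```c
#include <stddef.h>

/* Hypothetical stand-in for the accelerator state; not QEMU's KVMState. */
typedef struct ToyKVMState {
    int vm_fd;
} ToyKVMState;

/* Hypothetical device that needs the accelerator state to save/restore. */
typedef struct ToyClockDevice {
    ToyKVMState *kvm;   /* borrowed reference, owned by the machine */
    int initialized;
} ToyClockDevice;

/* The init function takes the state as an explicit argument. */
int toy_clock_init(ToyClockDevice *dev, ToyKVMState *kvm)
{
    if (dev == NULL || kvm == NULL) {
        return -1;
    }
    dev->kvm = kvm;
    dev->initialized = 1;
    return 0;
}

/* Machine setup code is the only place that passes the pointer along. */
int toy_machine_init(ToyKVMState *kvm, ToyClockDevice *clock)
{
    return toy_clock_init(clock, kvm);
}
```

The cost is the one named above: a device initialized this way cannot be
created from the command line or a config file, because there is no
generic way to supply the pointer there.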

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
  2011-01-18 17:20                                       ` Jan Kiszka
@ 2011-01-18 17:31                                         ` Anthony Liguori
  -1 siblings, 0 replies; 300+ messages in thread
From: Anthony Liguori @ 2011-01-18 17:31 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Avi Kivity, Markus Armbruster, Marcelo Tosatti, Glauber Costa,
	kvm, qemu-devel

On 01/18/2011 11:20 AM, Jan Kiszka wrote:
>
> Which only works as long as we expose a single bus. You don't need to
> be an oracle to predict that this is not a stable interface.
>    

Today we only have a very low level factory interface--device creation 
and deletion.

I think we should move to higher level bus factory interfaces.  An 
interface to create a PCI device and to delete PCI devices.  This is the 
only sane way to do hot plug.

This also makes supporting multiple busses a lot more reasonable since 
this factory interface could be a method of the controller.
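
A minimal sketch of what a per-controller factory could look like,
assuming a toy model in which each controller owns a fixed number of
slots. The type and function names (ToyPCIController, toy_pci_create,
toy_pci_delete, toy_pci_controller_init) are hypothetical and do not
exist in qdev.

```c
#include <stddef.h>

#define TOY_MAX_SLOTS 4

typedef struct ToyPCIController ToyPCIController;

/* Hypothetical controller: it owns the slots and the factory methods. */
struct ToyPCIController {
    const char *name;
    const char *slots[TOY_MAX_SLOTS];  /* device model per slot, or NULL */
    int (*create_device)(ToyPCIController *c, const char *model);
    int (*delete_device)(ToyPCIController *c, int slot);
};

/* Place the device in the first free slot; return the slot, -1 if full. */
int toy_pci_create(ToyPCIController *c, const char *model)
{
    for (int i = 0; i < TOY_MAX_SLOTS; i++) {
        if (c->slots[i] == NULL) {
            c->slots[i] = model;
            return i;
        }
    }
    return -1;
}

/* Unplug the device in the given slot; return 0 on success, -1 otherwise. */
int toy_pci_delete(ToyPCIController *c, int slot)
{
    if (slot < 0 || slot >= TOY_MAX_SLOTS || c->slots[slot] == NULL) {
        return -1;
    }
    c->slots[slot] = NULL;
    return 0;
}

/* Wire the factory methods into a controller instance. */
void toy_pci_controller_init(ToyPCIController *c, const char *name)
{
    c->name = name;
    for (int i = 0; i < TOY_MAX_SLOTS; i++) {
        c->slots[i] = NULL;
    }
    c->create_device = toy_pci_create;
    c->delete_device = toy_pci_delete;
}
```

Because creation and deletion hang off the controller instance, a second
bus is just a second controller with its own slots, and the caller never
has to name a position in the device tree.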

>> It's automatically created as part of the CPUs or as part of the
>> chipset.  How to enable/disable kvm assistance is a property of the CPU
>> and/or chipset.
>>      
> If we exclude creation via command line / config files, we could also
> pass the kvm_state directly from the machine or chipset setup code and
> save us at least the kvm system buses.
>    

Which is fine in the short term.  This is exactly why we don't want the 
device model to be an ABI.   It gives us the ability to make changes as 
they make sense instead of trying to be perfect from the start (which we 
never will be).

Regards,

Anthony Liguori

> Jan
>
>    


^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
  2011-01-18 17:08                               ` Jan Kiszka
@ 2011-01-18 17:39                                 ` Alex Williamson
  -1 siblings, 0 replies; 300+ messages in thread
From: Alex Williamson @ 2011-01-18 17:39 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Anthony Liguori, Avi Kivity, Markus Armbruster, Marcelo Tosatti,
	Glauber Costa, kvm, qemu-devel

On Tue, 2011-01-18 at 18:08 +0100, Jan Kiszka wrote:
> On 2011-01-18 18:02, Alex Williamson wrote:
> > On Tue, 2011-01-18 at 16:54 +0100, Jan Kiszka wrote:
> >> On 2011-01-18 16:48, Anthony Liguori wrote:
> >>> On 01/18/2011 09:43 AM, Jan Kiszka wrote:
> >>>> On 2011-01-18 16:04, Anthony Liguori wrote:
> >>>>    
> >>>>> On 01/18/2011 08:28 AM, Jan Kiszka wrote:
> >>>>>      
> >>>>>> On 2011-01-12 11:31, Jan Kiszka wrote:
> >>>>>>
> >>>>>>        
> >>>>>>> Am 12.01.2011 11:22, Avi Kivity wrote:
> >>>>>>>
> >>>>>>>          
> >>>>>>>> On 01/11/2011 03:54 PM, Anthony Liguori wrote:
> >>>>>>>>
> >>>>>>>>            
> >>>>>>>>> Right, we should introduce a KVMBus that KVM devices are created on.
> >>>>>>>>> The devices can get at KVMState through the BusState.
> >>>>>>>>>
> >>>>>>>>>              
> >>>>>>>> There is no kvm bus in a PC (I looked).  We're bending the device model
> >>>>>>>> here because a device is implemented in the kernel and not in
> >>>>>>>> userspace.  An implementation detail is magnified beyond all proportions.
> >>>>>>>>
> >>>>>>>> An ioapic that is implemented by kvm lives in exactly the same place
> >>>>>>>> that the qemu ioapic lives in.  An assigned pci device lives on the PCI
> >>>>>>>> bus, not a KVMBus.  If we need a pointer to KVMState, then we must find
> >>>>>>>> it elsewhere, not through creating imaginary buses that don't exist.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>            
> >>>>>>> Exactly.
> >>>>>>>
> >>>>>>> So we can either "infect" the whole device tree with kvm (or maybe a
> >>>>>>> more generic accelerator structure that also deals with Xen) or we need
> >>>>>>> to pull the reference inside the device's init function from some global
> >>>>>>> service (kvm_get_state).
> >>>>>>>
> >>>>>>>          
> >>>>>> Note that this topic is still waiting for good suggestions, specifically
> >>>>>> from those who believe in kvm_state references :). This is not only
> >>>>>> blocking kvmstate merge but will affect KVM irqchips as well.
> >>>>>>
> >>>>>> It boils down to how we reasonably pass a kvm_state reference from
> >>>>>> machine init code to a sysbus device. I'm probably biased, but I don't
> >>>>>> see any way that does not work against the idea of confining access to
> >>>>>> kvm_state or breaks device instantiation from the command line or a
> >>>>>> config file.
> >>>>>>
> >>>>>>        
> >>>>> A KVM device should sit on a KVM specific bus that hangs off of sysbus.
> >>>>> It can get to kvm_state through that bus.
> >>>>>
> >>>>> That bus doesn't get instantiated through qdev so requiring a pointer
> >>>>> argument should not be an issue.
> >>>>>
> >>>>>      
> >>>> This design is in conflict with the requirement to attach KVM-assisted
> >>>> devices also to their home bus, e.g. an assigned PCI device to the PCI
> >>>> bus. We don't support multi-homed qdev devices.
> >>>>    
> >>>
> >>> With vfio, would an assigned PCI device even need kvm_state?
> >>
> >> IIUC: Yes, for establishing the irqfd link.
> > 
> > We abstract this through the msi/msix layer though, so the vfio driver
> > doesn't directly know anything about kvm_state.
> 
> Which version/tree are you referring to? It wasn't the case in the last
> version I found on the list.
> 
> Does the msi layer use irqfd for every source in kvm mode then? Of
> course, the key question will be how that layer will eventually obtain kvm_state.

Looking at "[RFC PATCH v2] VFIO based device assignment" sent on Nov
5th, I guess we do call kvm_set_irqfd.  Maybe I'm just wishing that the
msi layer abstracted it better.  I'd like to be able to pass in a
userspace interrupt handler function pointer and an event notifier fd
and let the interrupt layers worry about how it's hooked up.
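
To make the wished-for interface concrete, here is a sketch under stated
assumptions: the caller supplies only a handler and a notifier fd, and
the interrupt layer decides whether delivery stays in userspace or is
handed to an in-kernel path. Every name here is hypothetical, and the
kernel_assist flag merely stands in for whatever the layer would use to
detect an irqfd-style facility.

```c
#include <stddef.h>

typedef void (*ToyIRQHandler)(void *opaque);

typedef enum {
    TOY_ROUTE_NONE,
    TOY_ROUTE_USERSPACE,   /* layer calls the handler itself */
    TOY_ROUTE_KERNEL       /* layer hands the fd to an in-kernel path */
} ToyRoute;

typedef struct ToyIRQSource {
    ToyIRQHandler handler;
    void *opaque;
    int notifier_fd;
    ToyRoute route;
} ToyIRQSource;

/* Example handler: counts deliveries in the int that opaque points to. */
void toy_count_irq(void *opaque)
{
    (*(int *)opaque)++;
}

/* The device passes handler + fd; the layer picks the route. */
int toy_irq_connect(ToyIRQSource *src, ToyIRQHandler handler, void *opaque,
                    int notifier_fd, int kernel_assist)
{
    if (src == NULL || handler == NULL || notifier_fd < 0) {
        return -1;
    }
    src->handler = handler;
    src->opaque = opaque;
    src->notifier_fd = notifier_fd;
    src->route = kernel_assist ? TOY_ROUTE_KERNEL : TOY_ROUTE_USERSPACE;
    return 0;
}

/* Userspace dispatch: returns 1 if the handler ran, 0 if the kernel path
 * owns delivery and nothing is done here. */
int toy_irq_notify(ToyIRQSource *src)
{
    if (src->route != TOY_ROUTE_USERSPACE) {
        return 0;
    }
    src->handler(src->opaque);
    return 1;
}
```

The point of the shape is that the device code never touches kvm_state;
only the layer behind toy_irq_connect would need it.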

Alex


^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
  2011-01-18 17:31                                         ` Anthony Liguori
@ 2011-01-18 17:45                                           ` Jan Kiszka
  -1 siblings, 0 replies; 300+ messages in thread
From: Jan Kiszka @ 2011-01-18 17:45 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Avi Kivity, Markus Armbruster, Marcelo Tosatti, Glauber Costa,
	kvm, qemu-devel

On 2011-01-18 18:31, Anthony Liguori wrote:
>>> It's automatically created as part of the CPUs or as part of the
>>> chipset.  How to enable/disable kvm assistance is a property of the CPU
>>> and/or chipset.
>>>      
>> If we exclude creation via command line / config files, we could also
>> pass the kvm_state directly from the machine or chipset setup code and
>> save us at least the kvm system buses.
>>    
> 
> Which is fine in the short term.

Unless we want to abuse the pointer property for this, and there was
some resistance, we would have to change the sysbus init function
signature. I don't want to propose this for a short-term workaround; we
need a clearer vision and roadmap to avoid multiple invasive changes to
the device model.

>  This is exactly why we don't want the 
> device model to be an ABI.   It gives us the ability to make changes as 
> they make sense instead of trying to be perfect from the start (which we 
> never will be).

The device model will always consist of a stable part, the guest and
management visible topology. That beast needs to be modeled as well,
likely via some new bus objects. If that's the way to go, starting now
is probably the right time as we have an urgent use case, right?

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
  2011-01-18 17:09                                     ` Anthony Liguori
@ 2011-01-19  9:48                                       ` Gerd Hoffmann
  -1 siblings, 0 replies; 300+ messages in thread
From: Gerd Hoffmann @ 2011-01-19  9:48 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Jan Kiszka, Avi Kivity, Markus Armbruster, Marcelo Tosatti,
	Glauber Costa, kvm, qemu-devel

On 01/18/11 18:09, Anthony Liguori wrote:
> On 01/18/2011 10:56 AM, Jan Kiszka wrote:
>>
>>> The device model topology is 100% a hidden architectural detail.
>> This is true for the sysbus, it is obviously not the case for PCI and
>> similarly discoverable buses. There we have a guest-explorable topology
>> that is currently equivalent to the qdev layout.
>
> But we also don't do PCI passthrough so we really haven't even explored
> how that maps in qdev. I don't know if qemu-kvm has attempted to
> qdev-ify it.

It is qdev-ified.  It is a normal pci device from qdev's point of view.

BTW: is there any reason why (vfio-based) pci passthrough couldn't work 
with tcg?

> The -device interface is a stable interface. Right now, you don't
> specify any type of identifier of the pci bus when you create a PCI
> device. It's implied in the interface.

Wrong.  You can specify the bus you want to attach the device to via 
bus=<name>.  This is true for *every* device, including all pci devices. 
  If unspecified qdev uses the first bus it finds.

As long as there is a single pci bus only there is simply no need to 
specify it, that's why nobody does that today.  Once q35 finally arrives 
this will change of course.
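
The lookup rule described above can be sketched as follows. This is a
toy model, not qdev code: ToyBus and toy_find_bus are hypothetical
names, and matching on a bus type string before falling back to the
first bus is an assumption about how "the first bus it finds" is
narrowed down.

```c
#include <stddef.h>
#include <string.h>

typedef struct ToyBus {
    const char *name;   /* e.g. "pci.0" */
    const char *type;   /* e.g. "PCI" */
} ToyBus;

/*
 * Resolve the bus for a new device.
 * name != NULL: behave like bus=<name>, require an exact name match.
 * name == NULL: take the first bus of the requested type.
 * Returns NULL when nothing matches.
 */
const ToyBus *toy_find_bus(const ToyBus *buses, size_t n,
                           const char *type, const char *name)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(buses[i].type, type) != 0) {
            continue;
        }
        if (name == NULL || strcmp(buses[i].name, name) == 0) {
            return &buses[i];
        }
    }
    return NULL;
}
```

With a single PCI bus both forms resolve to the same bus, which is why
omitting the name works today; with two PCI buses the unnamed form
silently picks whichever comes first.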

cheers,
   Gerd

^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
  2011-01-18 17:09                                     ` Anthony Liguori
@ 2011-01-19 13:09                                       ` Markus Armbruster
  -1 siblings, 0 replies; 300+ messages in thread
From: Markus Armbruster @ 2011-01-19 13:09 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Jan Kiszka, kvm, Glauber Costa, Marcelo Tosatti, qemu-devel, Avi Kivity

Anthony Liguori <aliguori@linux.vnet.ibm.com> writes:

> On 01/18/2011 10:56 AM, Jan Kiszka wrote:
>>
>>> The device model topology is 100% a hidden architectural detail.
>>>      
>> This is true for the sysbus, it is obviously not the case for PCI and
>> similarly discoverable buses. There we have a guest-explorable topology
>> that is currently equivalent to the qdev layout.
>>    
>
> But we also don't do PCI passthrough so we really haven't even
> explored how that maps in qdev.  I don't know if qemu-kvm has
> attempted to qdev-ify it.
>
>>>> Management and analysis tools must be able to traverse the system buses
>>>> and find guest devices this way.
>>>>        
>>> We need to provide a compatible interface to the guest.  If you agree
>>> with my above statements, then you'll also agree that we can do this
>>> without keeping the device model topology stable.
>>>
>>> But we also need to provide a compatible interface to management tools.
>>> Exposing the device model topology as a compatible interface
>>> artificially limits us.  It's far better to provide higher level
>>> supported interfaces to give us the flexibility to change the device
>>> model as we need to.
>>>      
>> How do you want to change qdev to keep the guest and management tool
>> view stable while branching off kvm sub-buses?
>
> The qdev device model is not a stable interface.  I think that's been
> clear from the very beginning.
>
>>   Please propose such
>> extensions so that they can be discussed. IIUC, that would be a second
>> relation between qdev and qbus instances besides the physical topology.
>> What further use cases (besides passing kvm_state around) do you have in
>> mind?
>>    
>
> The -device interface is a stable interface.  Right now, you don't
> specify any type of identifier of the pci bus when you create a PCI
> device.  It's implied in the interface.

Now I'm confused.  Isn't "-device FOO,bus=pci.0" specifying the PCI bus?

[...]

^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
  2011-01-19  9:48                                       ` Gerd Hoffmann
@ 2011-01-19 13:11                                         ` Markus Armbruster
  -1 siblings, 0 replies; 300+ messages in thread
From: Markus Armbruster @ 2011-01-19 13:11 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: Anthony Liguori, kvm, Jan Kiszka, Glauber Costa, Marcelo Tosatti,
	qemu-devel, Avi Kivity

Gerd Hoffmann <kraxel@redhat.com> writes:

> On 01/18/11 18:09, Anthony Liguori wrote:
>> On 01/18/2011 10:56 AM, Jan Kiszka wrote:
>>>
>>>> The device model topology is 100% a hidden architectural detail.
>>> This is true for the sysbus, it is obviously not the case for PCI and
>>> similarly discoverable buses. There we have a guest-explorable topology
>>> that is currently equivalent to the qdev layout.
>>
>> But we also don't do PCI passthrough so we really haven't even explored
>> how that maps in qdev. I don't know if qemu-kvm has attempted to
>> qdev-ify it.
>
> It is qdev-ified.  It is a normal pci device from qdev's point of view.
>
> BTW: is there any reason why (vfio-based) pci passthrough couldn't
> work with tcg?
>
>> The -device interface is a stable interface. Right now, you don't
>> specify any type of identifier of the pci bus when you create a PCI
>> device. It's implied in the interface.
>
> Wrong.  You can specify the bus you want to attach the device to via
> bus=<name>.  This is true for *every* device, including all pci
> devices. If unspecified, qdev uses the first bus it finds.
>
> As long as there is a single pci bus only there is simply no need to
> specify it, that's why nobody does that today.  Once q35 finally
> arrives this will change of course.

As far as I know, libvirt does it already.
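
The lookup rule quoted above (an explicit bus=<name> wins, otherwise the
first suitable bus is used) could be sketched roughly as follows.  This is
only an illustration: ToyBus and toy_find_bus are made-up names, not
QEMU's real qdev types or functions, and the real code may well differ.

```c
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a qdev bus; NOT QEMU's real BusState. */
typedef struct ToyBus {
    const char *name;   /* e.g. "pci.0" */
    const char *type;   /* e.g. "PCI" */
} ToyBus;

/*
 * Pick the bus for a new device that plugs into buses of type `type`.
 * If `requested` is non-NULL, only a bus of that type with exactly that
 * name matches (the bus=<name> case).  If it is NULL, the first bus of
 * the right type wins (the "first bus it finds" case).
 * Returns NULL when nothing matches.
 */
static const ToyBus *toy_find_bus(const ToyBus *buses, size_t n,
                                  const char *type, const char *requested)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(buses[i].type, type) != 0) {
            continue;   /* wrong kind of bus for this device */
        }
        if (requested == NULL || strcmp(buses[i].name, requested) == 0) {
            return &buses[i];
        }
    }
    return NULL;
}
```

With a single PCI bus both forms pick the same bus; they only diverge once
a second bus of the same type exists, which is the q35 concern.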

^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
  2011-01-18 15:50                         ` Anthony Liguori
@ 2011-01-19 13:15                           ` Markus Armbruster
  -1 siblings, 0 replies; 300+ messages in thread
From: Markus Armbruster @ 2011-01-19 13:15 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Jan Kiszka, kvm, Glauber Costa, Marcelo Tosatti, qemu-devel, Avi Kivity

Anthony Liguori <aliguori@linux.vnet.ibm.com> writes:

> On 01/18/2011 09:43 AM, Jan Kiszka wrote:
>> On 2011-01-18 16:04, Anthony Liguori wrote:
>>    
>>> On 01/18/2011 08:28 AM, Jan Kiszka wrote:
>>>      
>>>> On 2011-01-12 11:31, Jan Kiszka wrote:
>>>>
>>>>        
>>>>> Am 12.01.2011 11:22, Avi Kivity wrote:
>>>>>
>>>>>          
>>>>>> On 01/11/2011 03:54 PM, Anthony Liguori wrote:
>>>>>>
>>>>>>            
>>>>>>> Right, we should introduce a KVMBus that KVM devices are created on.
>>>>>>> The devices can get at KVMState through the BusState.
>>>>>>>
>>>>>>>              
>>>>>> There is no kvm bus in a PC (I looked).  We're bending the device model
>>>>>> here because a device is implemented in the kernel and not in
>>>>>> userspace.  An implementation detail is magnified beyond all proportions.
>>>>>>
>>>>>> An ioapic that is implemented by kvm lives in exactly the same place
>>>>>> that the qemu ioapic lives in.  An assigned pci device lives on the PCI
>>>>>> bus, not a KVMBus.  If we need a pointer to KVMState, then we must find
>>>>>> it elsewhere, not through creating imaginary buses that don't exist.
>>>>>>
>>>>>>
>>>>>>            
>>>>> Exactly.
>>>>>
>>>>> So we can either "infect" the whole device tree with kvm (or maybe a
>>>>> more generic accelerator structure that also deals with Xen) or we need
>>>>> to pull the reference inside the device's init function from some global
>>>>> service (kvm_get_state).
>>>>>
>>>>>          
>>>> Note that this topic is still waiting for good suggestions, specifically
>>>> from those who believe in kvm_state references :). This is not only
>>>> blocking kvmstate merge but will affect KVM irqchips as well.
>>>>
>>>> It boils down to how we reasonably pass a kvm_state reference from
>>>> machine init code to a sysbus device. I'm probably biased, but I don't
>>>> see any way that does not work against the idea of confining access to
>>>> kvm_state or breaks device instantiation from the command line or a
>>>> config file.
>>>>
>>>>        
>>> A KVM device should sit on a KVM specific bus that hangs off of sysbus.
>>> It can get to kvm_state through that bus.
>>>
>>> That bus doesn't get instantiated through qdev so requiring a pointer
>>> argument should not be an issue.
>>>
>>>      
>> This design is in conflict with the requirement to attach KVM-assisted
>> devices also to their home bus, e.g. an assigned PCI device to the PCI
>> bus. We don't support multi-homed qdev devices.
>>    
>
> The bus topology reflects how I/O flows in and out of a device.  We do
> not model a perfect PC bus architecture and I don't think we ever
> intend to.  Instead, we model a functional architecture.
>
> I/O from an assigned device does not flow through the emulated PCI
> bus.  Therefore, it does not belong on the emulated PCI bus.
>
> Assigned devices need to interact with the emulated PCI bus, but they
> shouldn't be children of it.

So they interact with KVM (need kvm_state), and they interact with the
emulated PCI bus.  Could you elaborate on the fundamental difference
between the two interactions that makes you choose the (hypothetical)
KVM bus over the PCI bus as device parent?
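
To make the two options under discussion concrete, here is a rough sketch
with toy types.  None of this is QEMU's real code: ToyKVMState, ToyBus,
ToyDevice and both helpers are hypothetical, and only the idea of a
kvm_get_state-style global service comes from the quoted mails.

```c
#include <stddef.h>

/* Toy types for illustration only; NOT QEMU's real definitions. */
typedef struct ToyKVMState {
    int vmfd;
} ToyKVMState;

typedef struct ToyBus {
    ToyKVMState *kvm;   /* non-NULL only on the hypothetical KVM bus */
} ToyBus;

typedef struct ToyDevice {
    ToyBus *parent_bus; /* a device has exactly one parent bus */
} ToyDevice;

/* Option 1: a global service the device's init function can call. */
static ToyKVMState *toy_global_kvm_state;

static ToyKVMState *toy_kvm_get_state(void)
{
    return toy_global_kvm_state;
}

/* Option 2: reach the state through the device's parent bus. */
static ToyKVMState *toy_kvm_state_from_bus(const ToyDevice *dev)
{
    return dev->parent_bus != NULL ? dev->parent_bus->kvm : NULL;
}
```

In this sketch a device parented on the PCI bus gets NULL from option 2,
because its single parent is not the KVM bus, while option 1 works
wherever the device sits.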

^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
  2011-01-19  9:48                                       ` Gerd Hoffmann
@ 2011-01-19 16:53                                         ` Anthony Liguori
  -1 siblings, 0 replies; 300+ messages in thread
From: Anthony Liguori @ 2011-01-19 16:53 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: Jan Kiszka, Avi Kivity, Markus Armbruster, Marcelo Tosatti,
	Glauber Costa, kvm, qemu-devel

On 01/19/2011 03:48 AM, Gerd Hoffmann wrote:
> On 01/18/11 18:09, Anthony Liguori wrote:
>> On 01/18/2011 10:56 AM, Jan Kiszka wrote:
>>>
>>>> The device model topology is 100% a hidden architectural detail.
>>> This is true for the sysbus, it is obviously not the case for PCI and
>>> similarly discoverable buses. There we have a guest-explorable topology
>>> that is currently equivalent to the qdev layout.
>>
>> But we also don't do PCI passthrough so we really haven't even explored
>> how that maps in qdev. I don't know if qemu-kvm has attempted to
>> qdev-ify it.
>
> It is qdev-ified.  It is a normal pci device from qdev's point of view.
>
> BTW: is there any reason why (vfio-based) pci passthrough couldn't 
> work with tcg?
>
>> The -device interface is a stable interface. Right now, you don't
>> specify any type of identifier of the pci bus when you create a PCI
>> device. It's implied in the interface.
>
> Wrong.  You can specify the bus you want to attach the device to via
> bus=<name>.  This is true for *every* device, including all pci
> devices.  If unspecified, qdev uses the first bus it finds.
>
> As long as there is a single pci bus only there is simply no need to
> specify it, that's why nobody does that today.

Right.  In terms of specifying bus=, what are we promising re: 
compatibility?  Will there always be a pci.0?  If we add some PCI-to-PCI 
bridges in order to support more devices, is libvirt supposed to parse
the hierarchy and figure out which bus to put the device on?

Regards,

Anthony Liguori

>   Once q35 finally arrives this will change of course.
>
> cheers,
>   Gerd


^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
  2011-01-19 13:11                                         ` Markus Armbruster
@ 2011-01-19 16:54                                           ` Anthony Liguori
  -1 siblings, 0 replies; 300+ messages in thread
From: Anthony Liguori @ 2011-01-19 16:54 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Gerd Hoffmann, kvm, Jan Kiszka, Glauber Costa, Marcelo Tosatti,
	qemu-devel, Avi Kivity

On 01/19/2011 07:11 AM, Markus Armbruster wrote:
> Gerd Hoffmann<kraxel@redhat.com>  writes:
>
>    
>> On 01/18/11 18:09, Anthony Liguori wrote:
>>      
>>> On 01/18/2011 10:56 AM, Jan Kiszka wrote:
>>>        
>>>>          
>>>>> The device model topology is 100% a hidden architectural detail.
>>>>>            
>>>> This is true for the sysbus, it is obviously not the case for PCI and
>>>> similarly discoverable buses. There we have a guest-explorable topology
>>>> that is currently equivalent to the qdev layout.
>>>>          
>>> But we also don't do PCI passthrough so we really haven't even explored
>>> how that maps in qdev. I don't know if qemu-kvm has attempted to
>>> qdev-ify it.
>>>        
>> It is qdev-ified.  It is a normal pci device from qdev's point of view.
>>
>> BTW: is there any reason why (vfio-based) pci passthrough couldn't
>> work with tcg?
>>
>>      
>>> The -device interface is a stable interface. Right now, you don't
>>> specify any type of identifier of the pci bus when you create a PCI
>>> device. It's implied in the interface.
>>>        
>> Wrong.  You can specify the bus you want to attach the device to via
>> bus=<name>.  This is true for *every* device, including all pci
>> devices. If unspecified, qdev uses the first bus it finds.
>>
>> As long as there is a single pci bus only there is simply no need to
>> specify it, that's why nobody does that today.  Once q35 finally
>> arrives this will change of course.
>>      
> As far as I know, libvirt does it already.
>    

I think that's a bad idea from a forward compatibility perspective.

Regards,

Anthony Liguori



^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
  2011-01-19 13:15                           ` Markus Armbruster
@ 2011-01-19 16:57                             ` Anthony Liguori
  -1 siblings, 0 replies; 300+ messages in thread
From: Anthony Liguori @ 2011-01-19 16:57 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: kvm, Jan Kiszka, Glauber Costa, Marcelo Tosatti, qemu-devel, Avi Kivity

On 01/19/2011 07:15 AM, Markus Armbruster wrote:
> So they interact with KVM (need kvm_state), and they interact with the
> emulated PCI bus.  Could you elaborate on the fundamental difference
> between the two interactions that makes you choose the (hypothetical)
> KVM bus over the PCI bus as device parent?
>    

It's almost arbitrary, but I would say it's the direction that I/Os flow.

But if the underlying observation is that the device tree is not really 
a tree, you're 100% correct.  This is part of why a factory interface 
that just takes a parent bus is too simplistic.

I think we ought to introduce a -pci-device option that is specifically 
for creating PCI devices that doesn't require a parent bus argument but 
provides a way to specify stable addressing (for instance, using a
linear index).
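
One possible reading of "linear index", sketched in C.  The mapping below
is purely an assumption made for illustration: the mail does not say how
an index would translate to a bus and slot, and ToyPciAddr and
toy_addr_from_index are hypothetical names.

```c
#include <stdint.h>

/* PCI device numbers are 5 bits wide, so a bus has at most 32 slots. */
enum { TOY_SLOTS_PER_BUS = 32 };

typedef struct ToyPciAddr {
    uint32_t bus;   /* which bus, counted from 0 */
    uint32_t slot;  /* device number on that bus */
} ToyPciAddr;

/*
 * Map a stable linear index to a (bus, slot) pair, filling bus 0 first
 * and spilling onto further buses once a bus is full.  Reserved slots
 * (host bridge and so on) are ignored here to keep the sketch short.
 */
static ToyPciAddr toy_addr_from_index(uint32_t index)
{
    ToyPciAddr addr = {
        index / TOY_SLOTS_PER_BUS,
        index % TOY_SLOTS_PER_BUS,
    };
    return addr;
}
```

The caller would only ever name the index; where the extra buses come
from stays an internal detail, which seems to be the point of the
proposal.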

Regards,

Anthony Liguori

^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
  2011-01-19 16:53                                         ` Anthony Liguori
@ 2011-01-19 17:01                                           ` Daniel P. Berrange
  -1 siblings, 0 replies; 300+ messages in thread
From: Daniel P. Berrange @ 2011-01-19 17:01 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Gerd Hoffmann, Jan Kiszka, Avi Kivity, Markus Armbruster,
	Marcelo Tosatti, Glauber Costa, kvm, qemu-devel

On Wed, Jan 19, 2011 at 10:53:30AM -0600, Anthony Liguori wrote:
> On 01/19/2011 03:48 AM, Gerd Hoffmann wrote:
> >On 01/18/11 18:09, Anthony Liguori wrote:
> >>On 01/18/2011 10:56 AM, Jan Kiszka wrote:
> >>>
> >>>>The device model topology is 100% a hidden architectural detail.
> >>>This is true for the sysbus, it is obviously not the case for PCI and
> >>>similarly discoverable buses. There we have a guest-explorable topology
> >>>that is currently equivalent to the qdev layout.
> >>
> >>But we also don't do PCI passthrough so we really haven't even explored
> >>how that maps in qdev. I don't know if qemu-kvm has attempted to
> >>qdev-ify it.
> >
> >It is qdev-ified.  It is a normal pci device from qdev's point of view.
> >
> >BTW: is there any reason why (vfio-based) pci passthrough couldn't
> >work with tcg?
> >
> >>The -device interface is a stable interface. Right now, you don't
> >>specify any type of identifier of the pci bus when you create a PCI
> >>device. It's implied in the interface.
> >
> >Wrong.  You can specify the bus you want to attach the device to via
> >bus=<name>.  This is true for *every* device, including all pci
> >devices.  If unspecified, qdev uses the first bus it finds.
> >
> >As long as there is a single pci bus only there is simply no need
> >to specify it, that's why nobody does that today.
> 
> Right.  In terms of specifying bus=, what are we promising re:
> compatibility?  Will there always be a pci.0?  If we add some
> PCI-to-PCI bridges in order to support more devices, is libvirt
> supposed to parse the hierarchy and figure out which bus to put the
> device on?

The reason we specify 'bus' is that we wanted to be flexible wrt
upgrades of libvirt, without needing restarts of QEMU instances
it manages. That way we can introduce new functionality into
libvirt that relies on it having previously set 'bus' on all
active QEMUs.

If QEMU adds PCI-to-PCI bridges, then I wouldn't expect QEMU to
be adding the extra bridges. I'd expect that QEMU provided just
the first bridge and then libvirt would specify how many more
bridges to create at boot or hotplug them later. So it wouldn't
ever need to parse topology.

Regards,
Daniel

^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
  2011-01-19 16:54                                           ` Anthony Liguori
@ 2011-01-19 17:19                                             ` Daniel P. Berrange
  -1 siblings, 0 replies; 300+ messages in thread
From: Daniel P. Berrange @ 2011-01-19 17:19 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Markus Armbruster, kvm, Jan Kiszka, Glauber Costa,
	Marcelo Tosatti, qemu-devel, Gerd Hoffmann, Avi Kivity

On Wed, Jan 19, 2011 at 10:54:10AM -0600, Anthony Liguori wrote:
> On 01/19/2011 07:11 AM, Markus Armbruster wrote:
> >Gerd Hoffmann<kraxel@redhat.com>  writes:
> >
> >>On 01/18/11 18:09, Anthony Liguori wrote:
> >>>On 01/18/2011 10:56 AM, Jan Kiszka wrote:
> >>>>>The device model topology is 100% a hidden architectural detail.
> >>>>This is true for the sysbus, it is obviously not the case for PCI and
> >>>>similarly discoverable buses. There we have a guest-explorable topology
> >>>>that is currently equivalent to the qdev layout.
> >>>But we also don't do PCI passthrough so we really haven't even explored
> >>>how that maps in qdev. I don't know if qemu-kvm has attempted to
> >>>qdev-ify it.
> >>It is qdev-ified.  It is a normal pci device from qdev's point of view.
> >>
> >>BTW: is there any reason why (vfio-based) pci passthrough couldn't
> >>work with tcg?
> >>
> >>>The -device interface is a stable interface. Right now, you don't
> >>>specify any type of identifier of the pci bus when you create a PCI
> >>>device. It's implied in the interface.
> >>Wrong.  You can specify the bus you want to attach the device to via
> >>bus=<name>.  This is true for *every* device, including all pci
> >>devices. If unspecified, qdev uses the first bus it finds.
> >>
> >>As long as there is a single pci bus only there is simply no need to
> >>specify it, that's why nobody does that today.  Once q35 finally
> >>arrives this will change of course.
> >As far as I know, libvirt does it already.
> 
> I think that's a bad idea from a forward compatibility perspective.

In our past experience though, *not* specifying attributes like
these has also been pretty bad from a forward compatibility
perspective. We're kind of damned either way, so on balance
we decided we'd specify every attribute in qdev that's related
to unique identification of devices & their inter-relationships.
By strictly locking down the topology we were defining, we ought
to have a more stable ABI in the face of future changes. I accept
this might not always work out, so we may have to adjust things
over time still. Predicting the future is hard :-)
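
As a rough illustration of "specify every attribute", a management tool
might build the -device argument along these lines.  This is a sketch
only, not libvirt's actual code: toy_format_device_arg is a hypothetical
helper, and the driver, id and address values used with it are examples.

```c
#include <stdio.h>

/*
 * Format a fully specified -device argument into buf, naming the device
 * id, its parent bus and its slot explicitly instead of relying on
 * defaults.  Returns 0 on success, -1 if buf is too small.
 */
static int toy_format_device_arg(char *buf, size_t len, const char *driver,
                                 const char *id, const char *bus,
                                 unsigned slot)
{
    int n = snprintf(buf, len, "%s,id=%s,bus=%s,addr=0x%x",
                     driver, id, bus, slot);

    /* snprintf reports the length it wanted; treat truncation as failure. */
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Called with "e1000", "net0", "pci.0" and slot 3, this yields
e1000,id=net0,bus=pci.0,addr=0x3, so the placement no longer depends on
which bus happens to be found first.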

Daniel

^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
  2011-01-19 16:57                             ` [Qemu-devel] " Anthony Liguori
@ 2011-01-19 17:25                               ` Jan Kiszka
  -1 siblings, 0 replies; 300+ messages in thread
From: Jan Kiszka @ 2011-01-19 17:25 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Markus Armbruster, kvm, Glauber Costa, Marcelo Tosatti,
	qemu-devel, Avi Kivity

On 2011-01-19 17:57, Anthony Liguori wrote:
> On 01/19/2011 07:15 AM, Markus Armbruster wrote:
>> So they interact with KVM (need kvm_state), and they interact with the
>> emulated PCI bus.  Could you elaborate on the fundamental difference
>> between the two interactions that makes you choose the (hypothetical)
>> KVM bus over the PCI bus as device parent?
>>    
> 
> It's almost arbitrary, but I would say it's the direction that I/Os flow.

We need both if we want KVM buses. They are useless for enumerating
a device on the bus where the guest actually sees it.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
  2011-01-19 16:53                                         ` Anthony Liguori
@ 2011-01-19 17:35                                           ` Daniel P. Berrange
  -1 siblings, 0 replies; 300+ messages in thread
From: Daniel P. Berrange @ 2011-01-19 17:35 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Gerd Hoffmann, Jan Kiszka, Avi Kivity, Markus Armbruster,
	Marcelo Tosatti, Glauber Costa, kvm, qemu-devel

On Wed, Jan 19, 2011 at 10:53:30AM -0600, Anthony Liguori wrote:
> On 01/19/2011 03:48 AM, Gerd Hoffmann wrote:
> >On 01/18/11 18:09, Anthony Liguori wrote:
> >>On 01/18/2011 10:56 AM, Jan Kiszka wrote:
> >>>
> >>>>The device model topology is 100% a hidden architectural detail.
> >>>This is true for the sysbus, it is obviously not the case for PCI and
> >>>similarly discoverable buses. There we have a guest-explorable topology
> >>>that is currently equivalent to the the qdev layout.
> >>
> >>But we also don't do PCI passthrough so we really haven't even explored
> >>how that maps in qdev. I don't know if qemu-kvm has attempted to
> >>qdev-ify it.
> >
> >It is qdev-ified.  It is a normal pci device from qdev's point of view.
> >
> >BTW: is there any reason why (vfio-based) pci passthrough couldn't
> >work with tcg?
> >
> >>The -device interface is a stable interface. Right now, you don't
> >>specify any type of identifier of the pci bus when you create a PCI
> >>device. It's implied in the interface.
> >
> >Wrong.  You can specify the bus you want attach the device to via
> >bus=<name>.  This is true for *every* device, including all pci
> >devices.  If unspecified qdev uses the first bus it finds.
> >
> >As long as there is a single pci bus only there is simply no need
> >to specify it, thats why nobody does that today.
> 
> Right.  In terms of specifying bus=, what are we promising re:
> compatibility?  Will there always be a pci.0?  If we add some
> PCI-to-PCI bridges in order to support more devices, is libvirt
> support to parse the hierarchy and figure out which bus to put the
> device on?

The answer to your questions probably differs depending on
whether '-nodefconfig' and '-nodefaults' are set on the
command line.  If they are set, then I'd expect to only
ever see one PCI bus with name pci.0 forever more, unless
I explicitly ask for more. If they are not set, then you
might expect to see multiple PCI buses appear by magic.

Daniel

^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
  2011-01-19 17:35                                           ` Daniel P. Berrange
@ 2011-01-19 17:42                                             ` Anthony Liguori
  -1 siblings, 0 replies; 300+ messages in thread
From: Anthony Liguori @ 2011-01-19 17:42 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: Gerd Hoffmann, Jan Kiszka, Avi Kivity, Markus Armbruster,
	Marcelo Tosatti, Glauber Costa, kvm, qemu-devel

On 01/19/2011 11:35 AM, Daniel P. Berrange wrote:
> On Wed, Jan 19, 2011 at 10:53:30AM -0600, Anthony Liguori wrote:
>    
>> On 01/19/2011 03:48 AM, Gerd Hoffmann wrote:
>>      
>>> On 01/18/11 18:09, Anthony Liguori wrote:
>>>        
>>>> On 01/18/2011 10:56 AM, Jan Kiszka wrote:
>>>>          
>>>>>            
>>>>>> The device model topology is 100% a hidden architectural detail.
>>>>>>              
>>>>> This is true for the sysbus, it is obviously not the case for PCI and
>>>>> similarly discoverable buses. There we have a guest-explorable topology
>>>>> that is currently equivalent to the the qdev layout.
>>>>>            
>>>> But we also don't do PCI passthrough so we really haven't even explored
>>>> how that maps in qdev. I don't know if qemu-kvm has attempted to
>>>> qdev-ify it.
>>>>          
>>> It is qdev-ified.  It is a normal pci device from qdev's point of view.
>>>
>>> BTW: is there any reason why (vfio-based) pci passthrough couldn't
>>> work with tcg?
>>>
>>>        
>>>> The -device interface is a stable interface. Right now, you don't
>>>> specify any type of identifier of the pci bus when you create a PCI
>>>> device. It's implied in the interface.
>>>>          
>>> Wrong.  You can specify the bus you want attach the device to via
>>> bus=<name>.  This is true for *every* device, including all pci
>>> devices.  If unspecified qdev uses the first bus it finds.
>>>
>>> As long as there is a single pci bus only there is simply no need
>>> to specify it, thats why nobody does that today.
>>>        
>> Right.  In terms of specifying bus=, what are we promising re:
>> compatibility?  Will there always be a pci.0?  If we add some
>> PCI-to-PCI bridges in order to support more devices, is libvirt
>> support to parse the hierarchy and figure out which bus to put the
>> device on?
>>      
> The answer to your questions probably differ depending on
> whether '-nodefconfig' and '-nodefaults' are set on the
> command line.  If they are set, then I'd expect to only
> ever see one PCI bus with name pci.0 forever more, unless
> i explicitly ask for more. If they are not set, then you
> might expect to see multiple PCI buses by appear by magic
>    

Yeah, we can't promise that.  If you use -M pc, you aren't guaranteed a 
stable PCI bus topology even with -nodefconfig/-nodefaults.

Regards,

Anthony Liguori

> Daniel
>    


^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
  2011-01-19 17:19                                             ` Daniel P. Berrange
@ 2011-01-19 17:43                                               ` Anthony Liguori
  -1 siblings, 0 replies; 300+ messages in thread
From: Anthony Liguori @ 2011-01-19 17:43 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: Markus Armbruster, kvm, Jan Kiszka, Glauber Costa,
	Marcelo Tosatti, qemu-devel, Gerd Hoffmann, Avi Kivity

On 01/19/2011 11:19 AM, Daniel P. Berrange wrote:
>
> In our past experience though, *not* specifying attributes like
> these has also been pretty bad from a forward compatibility
> perspective too. We're kind of damned either way, so on balance
> we decided we'd specify every attribute in qdev that's related
> to unique identification of devices & their inter-relationships.
> By strictly locking down the topology we were defining, we ought
> to have a more stable ABI in face of future changes. I accept
> this might not always work out, so we may have to adjust things
> over time still. Predicting the future is hard :-)
>    

There are two distinct things here:

1) creating exactly the same virtual machine (like for migration) given 
a newer version of QEMU

2) creating a reasonably similar virtual machine given a newer version 
of QEMU

For (1), you cannot use -M pc.  You should use things like bus=X,addr=Y,
but much better is for QEMU to dump a device file and to just reuse that
instead of guessing what you need.

For (2), you cannot use bus=X,addr=Y because it makes assumptions about 
the PCI topology which may change in newer -M pc's.

I think libvirt needs to treat these two scenarios differently to support
forward compatibility.
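
The two scenarios above can be sketched as follows. This is only a
minimal illustration, not how libvirt or QEMU actually build command
lines: the helper names, the dict layout and the `exact` flag are all
hypothetical, and the check that exact reproduction needs a versioned
machine type is an assumption drawn from this discussion. Only the
`-M` and `-device` options and the `id=`, `bus=` and `addr=`
properties are taken from QEMU itself.

```python
def device_arg(driver, dev_id, bus=None, addr=None):
    """Build the value of one -device option (hypothetical helper)."""
    parts = [driver, f"id={dev_id}"]
    if bus is not None:
        parts.append(f"bus={bus}")
    if addr is not None:
        parts.append(f"addr={addr}")
    return ",".join(parts)


def build_args(machine, devices, exact):
    """Return a QEMU argument list for a set of devices.

    exact=True  -> scenario (1): pin every device to a bus and address.
    exact=False -> scenario (2): let QEMU choose the placement.
    """
    # Assumption: an unversioned alias cannot describe an exact machine.
    if exact and machine == "pc":
        raise ValueError("exact reproduction needs a versioned machine type")
    args = ["-M", machine]
    for dev in devices:
        if exact:
            arg = device_arg(dev["driver"], dev["id"], dev["bus"], dev["addr"])
        else:
            arg = device_arg(dev["driver"], dev["id"])
        args += ["-device", arg]
    return args
```

With a single NIC described as driver `virtio-net-pci`, id `net0`,
bus `pci.0` and addr `0x3`, the exact form carries the placement and
the similar-machine form leaves it out.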

Regards,

Anthony Liguori

> Daniel
>    


^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
  2011-01-19 17:01                                           ` Daniel P. Berrange
@ 2011-01-19 17:51                                             ` Anthony Liguori
  -1 siblings, 0 replies; 300+ messages in thread
From: Anthony Liguori @ 2011-01-19 17:51 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: Gerd Hoffmann, Jan Kiszka, Avi Kivity, Markus Armbruster,
	Marcelo Tosatti, Glauber Costa, kvm, qemu-devel

On 01/19/2011 11:01 AM, Daniel P. Berrange wrote:
>
> The reason we specify 'bus' is that we wanted to be flexible wrt
> upgrades of libvirt, without needing restarts of QEMU instances
> it manages. That way we can introduce new functionality into
> libvirt that relies on it having previously set 'bus' on all
> active QEMUs.
>
> If QEMU adds PCI-to-PCI bridges, then I wouldn't expect QEMU to
> be adding the extra bridges. I'd expect that QEMU provided just
> the first bridge and then libvirt would specify how many more
> bridges to create at boot or hotplug them later. So it wouldn't
> ever need to parse topology.
>    

Yeah, but replacing the main chipset will certainly change the PCI 
topology such that if you're specifying bus=X and addr=X and then also 
using -M pc, unless you're parsing the default topology to come up with 
the addressing, it will break in the future.

That's why I think something simpler like a linear index that QEMU maps
to a static location in the topology is probably the best future-proof
interface.
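
A rough sketch of what such a linear index could look like. Nothing
like this exists in QEMU as far as this thread shows: the function
name, the number of reserved slots and the `pci.N` naming for further
buses are all assumptions made for illustration. The only fixed fact
used is that a PCI bus addresses 32 device slots.

```python
SLOTS_PER_BUS = 32   # a PCI bus addresses 32 device slots
RESERVED_SLOTS = 2   # assumption: lowest slots kept for chipset devices


def slot_for_index(index):
    """Map a linear device index to a (bus name, slot) pair.

    The caller only ever names the index; the mapping to a concrete
    location stays private to this function and can change together
    with the machine type.
    """
    if index < 0:
        raise ValueError("device index must be non-negative")
    usable = SLOTS_PER_BUS - RESERVED_SLOTS
    bus, offset = divmod(index, usable)
    return f"pci.{bus}", RESERVED_SLOTS + offset
```

The design point is that a management tool would store only the
index, so a change in the default topology would not invalidate its
configuration.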

Regards,

Anthony Liguori

> Regards,
> Daniel
>    


^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
  2011-01-19 17:51                                             ` Anthony Liguori
@ 2011-01-19 18:52                                               ` Daniel P. Berrange
  -1 siblings, 0 replies; 300+ messages in thread
From: Daniel P. Berrange @ 2011-01-19 18:52 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Gerd Hoffmann, Jan Kiszka, Avi Kivity, Markus Armbruster,
	Marcelo Tosatti, Glauber Costa, kvm, qemu-devel

On Wed, Jan 19, 2011 at 11:51:58AM -0600, Anthony Liguori wrote:
> On 01/19/2011 11:01 AM, Daniel P. Berrange wrote:
> >
> >The reason we specify 'bus' is that we wanted to be flexible wrt
> >upgrades of libvirt, without needing restarts of QEMU instances
> >it manages. That way we can introduce new functionality into
> >libvirt that relies on it having previously set 'bus' on all
> >active QEMUs.
> >
> >If QEMU adds PCI-to-PCI bridges, then I wouldn't expect QEMU to
> >be adding the extra bridges. I'd expect that QEMU provided just
> >the first bridge and then libvirt would specify how many more
> >bridges to create at boot or hotplug them later. So it wouldn't
> >ever need to parse topology.
> 
> Yeah, but replacing the main chipset will certainly change the PCI
> topology such that if you're specifying bus=X and addr=X and then
> also using -M pc, unless you're parsing the default topology to come
> up with the addressing, it will break in the future.

We never use a bare '-M pc' though, we always canonicalize to
one of the versioned forms.  So if we run '-M pc-0.12', then
neither the main PCI chipset nor topology would have changed
in newer QEMU.  Of course if we deployed a new VM with
'-M pc-0.20' that might have a new PCI chipset, so bus=pci.0
might have a different meaning than it did when used with
'-M pc-0.12', but I don't think that's an immediate problem.
Regards,
Daniel

^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
  2011-01-19 17:42                                             ` Anthony Liguori
@ 2011-01-19 18:53                                               ` Daniel P. Berrange
  -1 siblings, 0 replies; 300+ messages in thread
From: Daniel P. Berrange @ 2011-01-19 18:53 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Gerd Hoffmann, Jan Kiszka, Avi Kivity, Markus Armbruster,
	Marcelo Tosatti, Glauber Costa, kvm, qemu-devel

On Wed, Jan 19, 2011 at 11:42:18AM -0600, Anthony Liguori wrote:
> On 01/19/2011 11:35 AM, Daniel P. Berrange wrote:
> >On Wed, Jan 19, 2011 at 10:53:30AM -0600, Anthony Liguori wrote:
> >>On 01/19/2011 03:48 AM, Gerd Hoffmann wrote:
> >>>On 01/18/11 18:09, Anthony Liguori wrote:
> >>>>On 01/18/2011 10:56 AM, Jan Kiszka wrote:
> >>>>>>The device model topology is 100% a hidden architectural detail.
> >>>>>This is true for the sysbus, it is obviously not the case for PCI and
> >>>>>similarly discoverable buses. There we have a guest-explorable topology
> >>>>>that is currently equivalent to the the qdev layout.
> >>>>But we also don't do PCI passthrough so we really haven't even explored
> >>>>how that maps in qdev. I don't know if qemu-kvm has attempted to
> >>>>qdev-ify it.
> >>>It is qdev-ified.  It is a normal pci device from qdev's point of view.
> >>>
> >>>BTW: is there any reason why (vfio-based) pci passthrough couldn't
> >>>work with tcg?
> >>>
> >>>>The -device interface is a stable interface. Right now, you don't
> >>>>specify any type of identifier of the pci bus when you create a PCI
> >>>>device. It's implied in the interface.
> >>>Wrong.  You can specify the bus you want attach the device to via
> >>>bus=<name>.  This is true for *every* device, including all pci
> >>>devices.  If unspecified qdev uses the first bus it finds.
> >>>
> >>>As long as there is a single pci bus only there is simply no need
> >>>to specify it, thats why nobody does that today.
> >>Right.  In terms of specifying bus=, what are we promising re:
> >>compatibility?  Will there always be a pci.0?  If we add some
> >>PCI-to-PCI bridges in order to support more devices, is libvirt
> >>support to parse the hierarchy and figure out which bus to put the
> >>device on?
> >The answer to your questions probably differ depending on
> >whether '-nodefconfig' and '-nodefaults' are set on the
> >command line.  If they are set, then I'd expect to only
> >ever see one PCI bus with name pci.0 forever more, unless
> >i explicitly ask for more. If they are not set, then you
> >might expect to see multiple PCI buses by appear by magic
> 
> Yeah, we can't promise that.  If you use -M pc, you aren't
> guaranteed a stable PCI bus topology even with
> -nodefconfig/-nodefaults.

That's why we never use '-M pc' when actually invoking QEMU.
If the user specifies 'pc' in the XML, we canonicalize that
to the versioned alternative like 'pc-0.12' before invoking
QEMU. We also expose the list of versioned machines to apps
so they can do canonicalization themselves if desired.
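
The canonicalization step described above might look roughly like
this. It is a sketch only, not libvirt's code: the alias table and
the set of known machines are hard-coded, hypothetical stand-ins for
whatever the installed QEMU binary actually reports, and the function
names are made up for the example.

```python
# Hypothetical data; a real tool would query the QEMU binary instead.
MACHINE_ALIASES = {"pc": "pc-0.12"}
KNOWN_MACHINES = {"pc-0.11", "pc-0.12", "isapc"}


def canonicalize_machine(name):
    """Resolve an alias such as 'pc' to a versioned machine type."""
    resolved = MACHINE_ALIASES.get(name, name)
    if resolved not in KNOWN_MACHINES:
        raise ValueError(f"unknown machine type: {name}")
    return resolved


def machine_args(name):
    """Return the -M option with the alias already resolved."""
    return ["-M", canonicalize_machine(name)]
```

Because the resolved name is what gets stored and passed to QEMU, a
later QEMU that points 'pc' at a newer machine type does not change
the guest that was originally defined.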

Regards,
Daniel

^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
  2011-01-19 18:52                                               ` Daniel P. Berrange
@ 2011-01-19 18:58                                                 ` Anthony Liguori
  -1 siblings, 0 replies; 300+ messages in thread
From: Anthony Liguori @ 2011-01-19 18:58 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: Gerd Hoffmann, Jan Kiszka, Avi Kivity, Markus Armbruster,
	Marcelo Tosatti, Glauber Costa, kvm, qemu-devel

On 01/19/2011 12:52 PM, Daniel P. Berrange wrote:
> On Wed, Jan 19, 2011 at 11:51:58AM -0600, Anthony Liguori wrote:
>    
>> On 01/19/2011 11:01 AM, Daniel P. Berrange wrote:
>>      
>>> The reason we specify 'bus' is that we wanted to be flexible wrt
>>> upgrades of libvirt, without needing restarts of QEMU instances
>>> it manages. That way we can introduce new functionality into
>>> libvirt that relies on it having previously set 'bus' on all
>>> active QEMUs.
>>>
>>> If QEMU adds PCI-to-PCI bridges, then I wouldn't expect QEMU to
>>> be adding the extra bridges. I'd expect that QEMU provided just
>>> the first bridge and then libvirt would specify how many more
>>> bridges to create at boot or hotplug them later. So it wouldn't
>>> ever need to parse topology.
>>>        
>> Yeah, but replacing the main chipset will certainly change the PCI
>> topology such that if you're specifying bus=X and addr=X and then
>> also using -M pc, unless you're parsing the default topology to come
>> up with the addressing, it will break in the future.
>>      
> We never use a bare '-M pc' though, we always canonicalize to
> one of the versioned forms.  So if we run '-M pc-0.12', then
> neither the main PCI chipset nor topology would have changed
> in newer QEMU.  Of course if we deployed a new VM with
> '-M pc-0.20' that might have a new PCI chipset, so bus=pci.0
> might have a different meaning than it did when used with
> '-M pc-0.12', but I don't think that's an immediate problem
>    

Right, but you expose bus addressing via the XML, no?  That means that 
if a user specifies something like '1.0', and you translate that to 
bus='pci.0',addr='1.0', then when pc-0.50 comes out and slot 1.0 is used 
for the integrated 3D VGA graphics adapter, the guest creation will fail.

If you expose topological configuration to the user, the guest will not 
continue working down the road unless you come up with a scheme where 
you map addresses to a different address range for newer pcs.

Regards,

Anthony Liguori

> Regards,
> Daniel
>    


^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
  2011-01-19 16:57                             ` [Qemu-devel] " Anthony Liguori
@ 2011-01-19 19:32                               ` Blue Swirl
  -1 siblings, 0 replies; 300+ messages in thread
From: Blue Swirl @ 2011-01-19 19:32 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Markus Armbruster, kvm, Jan Kiszka, Glauber Costa,
	Marcelo Tosatti, qemu-devel, Avi Kivity

On Wed, Jan 19, 2011 at 4:57 PM, Anthony Liguori
<aliguori@linux.vnet.ibm.com> wrote:
> On 01/19/2011 07:15 AM, Markus Armbruster wrote:
>>
>> So they interact with KVM (need kvm_state), and they interact with the
>> emulated PCI bus.  Could you elaborate on the fundamental difference
>> between the two interactions that makes you choose the (hypothetical)
>> KVM bus over the PCI bus as device parent?
>>
>
> It's almost arbitrary, but I would say it's the direction that I/Os flow.
>
> But if the underlying observation is that the device tree is not really a
> tree, you're 100% correct.  This is part of why a factory interface that
> just takes a parent bus is too simplistic.
>
> I think we ought to introduce a -pci-device option that is specifically for
> creating PCI devices that doesn't require a parent bus argument but provides
> a way to specify stable addressing (for instance, using a linear index).

I think kvm_state should not be a property of any device or bus. It
should be split into more logical pieces.

Some parts of it could remain in CPUState, because they are associated
with a VCPU.

Also, for example, irqfd could be considered an object similar to the
char or block devices provided by QEMU to devices. Would it make sense
to introduce new host types for passing parts of kvm_state to devices?

I'd also make the coalesced MMIO stuff part of the memory object. We are
not passing any state references when using cpu_physical_memory_rw(), but
that could be changed.

^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
  2011-01-19 17:43                                               ` Anthony Liguori
@ 2011-01-20  8:44                                                 ` Gerd Hoffmann
  -1 siblings, 0 replies; 300+ messages in thread
From: Gerd Hoffmann @ 2011-01-20  8:44 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Daniel P. Berrange, Markus Armbruster, kvm, Jan Kiszka,
	Glauber Costa, Marcelo Tosatti, qemu-devel, Avi Kivity

   Hi,

> For (2), you cannot use bus=X,addr=Y because it makes assumptions about
> the PCI topology which may change in newer -M pc's.

Why should the PCI topology for 'pc' ever change?

We'll probably get q35 support some day, but when this lands I expect 
we'll see a new machine type 'q35', so '-M q35' will pick the ich9
chipset (which will have a different pci topology of course) and '-M pc'
will pick the existing piix chipset (which will continue to look like it 
looks today).

cheers,
   Gerd

^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
  2011-01-19 19:32                               ` Blue Swirl
@ 2011-01-20  9:33                                 ` Jan Kiszka
  -1 siblings, 0 replies; 300+ messages in thread
From: Jan Kiszka @ 2011-01-20  9:33 UTC (permalink / raw)
  To: Blue Swirl
  Cc: Anthony Liguori, Markus Armbruster, kvm, Glauber Costa,
	Marcelo Tosatti, qemu-devel, Avi Kivity

On 2011-01-19 20:32, Blue Swirl wrote:
> On Wed, Jan 19, 2011 at 4:57 PM, Anthony Liguori
> <aliguori@linux.vnet.ibm.com> wrote:
>> On 01/19/2011 07:15 AM, Markus Armbruster wrote:
>>>
>>> So they interact with KVM (need kvm_state), and they interact with the
>>> emulated PCI bus.  Could you elaborate on the fundamental difference
>>> between the two interactions that makes you choose the (hypothetical)
>>> KVM bus over the PCI bus as device parent?
>>>
>>
>> It's almost arbitrary, but I would say it's the direction that I/Os flow.
>>
>> But if the underlying observation is that the device tree is not really a
>> tree, you're 100% correct.  This is part of why a factory interface that
>> just takes a parent bus is too simplistic.
>>
>> I think we ought to introduce a -pci-device option that is specifically for
>> creating PCI devices that doesn't require a parent bus argument but provides
>> a way to specify stable addressing (for instance, using a linear index).
> 
> I think kvm_state should not be a property of any device or bus. It
> should be split into more logical pieces.
> 
> Some parts of it could remain in CPUState, because they are associated
> with a VCPU.
> 
> Also, for example, irqfd could be considered an object similar to the
> char or block devices provided by QEMU to devices. Would it make sense
> to introduce new host types for passing parts of kvm_state to devices?
> 
> I'd also make the coalesced MMIO stuff part of the memory object. We are
> not passing any state references when using cpu_physical_memory_rw(), but
> that could be changed.

There are currently no VCPU-specific bits remaining in kvm_state. It may
be a good idea to introduce an arch-specific kvm_state and move related
bits over. It may also one day be feasible to carve out memory management
related fields if we have proper abstractions for that, but I'm not
completely sure here.

Anyway, all these things are secondary. The primary topic here is how to
deal with kvm_state and its fields that have VM-global scope.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
  2011-01-20  8:44                                                 ` Gerd Hoffmann
@ 2011-01-20 10:33                                                   ` Daniel P. Berrange
  -1 siblings, 0 replies; 300+ messages in thread
From: Daniel P. Berrange @ 2011-01-20 10:33 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: Anthony Liguori, Markus Armbruster, kvm, Jan Kiszka,
	Glauber Costa, Marcelo Tosatti, qemu-devel, Avi Kivity

On Thu, Jan 20, 2011 at 09:44:05AM +0100, Gerd Hoffmann wrote:
>   Hi,
> 
> >For (2), you cannot use bus=X,addr=Y because it makes assumptions about
> >the PCI topology which may change in newer -M pc's.
> 
> Why should the PCI topology for 'pc' ever change?
> 
> We'll probably get q35 support some day, but when this lands I
> expect we'll see a new machine type 'q35', so '-M q35' will pick the
> ich9 chipset (which will have a different pci topology of course)
> and '-M pc' will pick the existing piix chipset (which will continue
> to look like it looks today).

If the topology does ever change (e.g. in the way Anthony suggests,
where the first bus only has the graphics card), I think libvirt
is going to need a little work to adapt to the new topology,
regardless of whether we currently specify a bus= arg to -device
or not. I'm not sure there's anything we could do to future-proof
us against that kind of change.

Regards,
Daniel

^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
  2011-01-20  9:33                                 ` Jan Kiszka
@ 2011-01-20 19:27                                   ` Blue Swirl
  -1 siblings, 0 replies; 300+ messages in thread
From: Blue Swirl @ 2011-01-20 19:27 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Anthony Liguori, Markus Armbruster, kvm, Glauber Costa,
	Marcelo Tosatti, qemu-devel, Avi Kivity

On Thu, Jan 20, 2011 at 9:33 AM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
> On 2011-01-19 20:32, Blue Swirl wrote:
>> On Wed, Jan 19, 2011 at 4:57 PM, Anthony Liguori
>> <aliguori@linux.vnet.ibm.com> wrote:
>>> On 01/19/2011 07:15 AM, Markus Armbruster wrote:
>>>>
>>>> So they interact with KVM (need kvm_state), and they interact with the
>>>> emulated PCI bus.  Could you elaborate on the fundamental difference
>>>> between the two interactions that makes you choose the (hypothetical)
>>>> KVM bus over the PCI bus as device parent?
>>>>
>>>
>>> It's almost arbitrary, but I would say it's the direction that I/Os flow.
>>>
>>> But if the underlying observation is that the device tree is not really a
>>> tree, you're 100% correct.  This is part of why a factory interface that
>>> just takes a parent bus is too simplistic.
>>>
>>> I think we ought to introduce a -pci-device option that is specifically for
>>> creating PCI devices that doesn't require a parent bus argument but provides
>>> a way to specify stable addressing (for instance, using a linear index).
>>
>> I think kvm_state should not be a property of any device or bus. It
>> should be split into more logical pieces.
>>
>> Some parts of it could remain in CPUState, because they are associated
>> with a VCPU.
>>
>> Also, for example, irqfd could be considered an object similar to the
>> char or block devices provided by QEMU to devices. Would it make sense
>> to introduce new host types for passing parts of kvm_state to devices?
>>
>> I'd also make the coalesced MMIO stuff part of the memory object. We are
>> not passing any state references when using cpu_physical_memory_rw(), but
>> that could be changed.
>
> There are currently no VCPU-specific bits remaining in kvm_state.

I think the fields vcpu_events, robust_singlestep, debugregs,
kvm_sw_breakpoints, xsave, xcrs belong in CPUX86State. They may be the
same for all VCPUs, but they are still CPU properties of a sort. I'm not
sure about the fd field.

> It may
> be a good idea to introduce an arch-specific kvm_state and move related
> bits over.

This should probably contain only irqchip_in_kernel, pit_in_kernel and
many_ioeventfds, maybe fd.

> It may also one day be feasible to carve out memory management
> related fields if we have proper abstractions for that, but I'm not
> completely sure here.

I'd put slots, vmfd, coalesced_mmio, broken_set_mem_region,
migration_log into the memory object.

> Anyway, all these things are secondary. The primary topic here is how to
> deal with kvm_state and its fields that have VM-global scope.

If it is an opaque blob which contains various unrelated stuff, no
clear place will be found.

By the way, we don't have a QEMUState but instead use globals. Perhaps
this should be reorganized as well. For the fd field, maybe even using a
global variable could be justified, since it is used for direct access
to kernel, not unlike a system call.
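
The split proposed above could be sketched roughly as follows. This is
an illustration only: the struct names are hypothetical, they are not
existing QEMU types, and the field types are placeholders chosen to keep
the sketch self-contained. Only the field names and their grouping come
from this message.

```c
#include <stdbool.h>

/* CPU-related capability bits that could move towards CPUX86State. */
struct kvm_cpu_caps_sketch {
    bool vcpu_events;
    bool robust_singlestep;
    bool debugregs;
    bool xsave;
    bool xcrs;
    void *kvm_sw_breakpoints;   /* placeholder for the breakpoint list */
};

/* What would remain in an arch-specific kvm_state. */
struct kvm_arch_state_sketch {
    bool irqchip_in_kernel;
    bool pit_in_kernel;
    bool many_ioeventfds;
    int fd;                     /* "maybe fd": placement is undecided */
};

/* Memory-related bits that could hang off the memory object. */
struct kvm_memory_state_sketch {
    void *slots;                /* placeholder for the memory slot array */
    int vmfd;
    void *coalesced_mmio;       /* placeholder for the coalesced MMIO ring */
    bool broken_set_mem_region;
    bool migration_log;
};
```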

^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
  2011-01-20  9:33                                 ` Jan Kiszka
@ 2011-01-20 19:37                                   ` Anthony Liguori
  -1 siblings, 0 replies; 300+ messages in thread
From: Anthony Liguori @ 2011-01-20 19:37 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Blue Swirl, Markus Armbruster, kvm, Glauber Costa,
	Marcelo Tosatti, qemu-devel, Avi Kivity

On 01/20/2011 03:33 AM, Jan Kiszka wrote:
> On 2011-01-19 20:32, Blue Swirl wrote:
>    
>> On Wed, Jan 19, 2011 at 4:57 PM, Anthony Liguori
>> <aliguori@linux.vnet.ibm.com>  wrote:
>>      
>>> On 01/19/2011 07:15 AM, Markus Armbruster wrote:
>>>        
>>>> So they interact with KVM (need kvm_state), and they interact with the
>>>> emulated PCI bus.  Could you elaborate on the fundamental difference
>>>> between the two interactions that makes you choose the (hypothetical)
>>>> KVM bus over the PCI bus as device parent?
>>>>
>>>>          
>>> It's almost arbitrary, but I would say it's the direction that I/Os flow.
>>>
>>> But if the underlying observation is that the device tree is not really a
>>> tree, you're 100% correct.  This is part of why a factory interface that
>>> just takes a parent bus is too simplistic.
>>>
>>> I think we ought to introduce a -pci-device option that is specifically for
>>> creating PCI devices that doesn't require a parent bus argument but provides
>>> a way to specify stable addressing (for instance, using a linear index).
>>>
>> I think kvm_state should not be a property of any device or bus. It
>> should be split into more logical pieces.
>>
>> Some parts of it could remain in CPUState, because they are associated
>> with a VCPU.
>>
>> Also, for example, irqfd could be considered an object similar to the
>> char or block devices provided by QEMU to devices. Would it make sense
>> to introduce new host types for passing parts of kvm_state to devices?
>>
>> I'd also make the coalesced MMIO stuff part of the memory object. We are
>> not passing any state references when using cpu_physical_memory_rw(), but
>> that could be changed.
>>      
> There are currently no VCPU-specific bits remaining in kvm_state. It may
> be a good idea to introduce an arch-specific kvm_state and move related
> bits over. It may also one day be feasible to carve out memory management
> related fields if we have proper abstractions for that, but I'm not
> completely sure here.
>
> Anyway, all these things are secondary. The primary topic here is how to
> deal with kvm_state and its fields that have VM-global scope.
>    

The debate is really:

1) should we remove all passing of kvm_state and just assume it's static

2) deal with a couple places in the code where we need to figure out how 
to get at kvm_state

I think we've only identified 1 real instance of (2) and it's resulted 
in some good discussions about how to model KVM devices vs. emulated 
devices.  Honestly, (1) just stinks.  I see absolutely no advantage to 
it at all.   In the very worst case scenario, the thing we need to do is 
just reference an extern variable in a few places.  That completely 
avoids all of the modelling discussions for now (while leaving
placeholder FIXMEs so the problem can be tackled later).
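
A minimal sketch of that worst case, for illustration only. The KVMState
stand-in below has a single made-up member so the example compiles on
its own, and kvmclock_get_vmfd is a hypothetical helper, not an existing
function.

```c
#include <stddef.h>

/* Stand-in for the real KVMState, reduced to one field for the sketch. */
typedef struct KVMState {
    int vmfd;
} KVMState;

/* Declaration that the few affected places would pull in. */
extern KVMState *kvm_state;

/* Single definition, living in one translation unit. */
KVMState *kvm_state;

/* Hypothetical device helper that needs the VM-global state. */
static int kvmclock_get_vmfd(void)
{
    /* FIXME: obtain kvm_state through the device model once it is clear
     * how KVM devices relate to emulated devices. */
    return kvm_state ? kvm_state->vmfd : -1;
}
```

Everything else keeps passing kvm_state explicitly; only the places that
have no sensible way to receive it would reference the extern.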

I don't understand the resistance here.

Regards,

Anthony Liguori

> Jan
>
>    


^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
@ 2011-01-20 19:37                                   ` Anthony Liguori
  0 siblings, 0 replies; 300+ messages in thread
From: Anthony Liguori @ 2011-01-20 19:37 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: kvm, Glauber Costa, Marcelo Tosatti, Markus Armbruster,
	qemu-devel, Blue Swirl, Avi Kivity

On 01/20/2011 03:33 AM, Jan Kiszka wrote:
> On 2011-01-19 20:32, Blue Swirl wrote:
>    
>> On Wed, Jan 19, 2011 at 4:57 PM, Anthony Liguori
>> <aliguori@linux.vnet.ibm.com>  wrote:
>>      
>>> On 01/19/2011 07:15 AM, Markus Armbruster wrote:
>>>        
>>>> So they interact with KVM (need kvm_state), and they interact with the
>>>> emulated PCI bus.  Could you elaborate on the fundamental difference
>>>> between the two interactions that makes you choose the (hypothetical)
>>>> KVM bus over the PCI bus as device parent?
>>>>
>>>>          
>>> It's almost arbitrary, but I would say it's the direction that I/Os flow.
>>>
>>> But if the underlying observation is that the device tree is not really a
>>> tree, you're 100% correct.  This is part of why a factory interface that
>>> just takes a parent bus is too simplistic.
>>>
>>> I think we ought to introduce a -pci-device option that is specifically for
>>> creating PCI devices that doesn't require a parent bus argument but provides
>>> a way to specify stable addressing (for instancing, using a linear index).
>>>        
>> I think kvm_state should not be a property of any device or bus. It
>> should be split to more logical pieces.
>>
>> Some parts of it could remain in CPUState, because they are associated
>> with a VCPU.
>>
>> Also, for example irqfd could be considered to be similar object to
>> char or block devices provided by QEMU to devices. Would it make sense
>> to introduce new host types for passing parts of kvm_state to devices?
>>
>> I'd also make coalesced MMIO stuff part of memory object. We are not
>> passing any state references when using cpu_physical_memory_rw(), but
>> that could be changed.
>>      
> There are currently no VCPU-specific bits remaining in kvm_state. It may
> be a good idea to introduce an arch-specific kvm_state and move related
> bits over. It may also once be feasible to carve out memory management
> related fields if we have proper abstractions for that, but I'm not
> completely sure here.
>
> Anyway, all these things are secondary. The primary topic here is how to
> deal with kvm_state and its fields that have VM-global scope.
>    

The debate is really:

1) should we remove all passing of kvm_state and just assume it's static

2) deal with a couple places in the code where we need to figure out how 
to get at kvm_state

I think we've only identified one real instance of (2), and it has resulted 
in some good discussions about how to model KVM devices vs. emulated 
devices.  Honestly, (1) just stinks.  I see absolutely no advantage to 
it at all.  In the very worst case scenario, all we need to do is 
reference an extern variable in a few places.  That completely avoids 
all of the modelling discussions for now (while leaving placeholder 
FIXMEs so the problem can be tackled later).
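
To make the two options concrete, here is a minimal sketch in C. It is 
not QEMU code: the reduced KVMState layout and every function name in it 
are hypothetical and exist only to show the difference in calling 
convention.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-in for QEMU's KVMState. */
typedef struct KVMState {
    int fd;                 /* root KVM descriptor */
    int vmfd;               /* per-VM descriptor */
    bool irqchip_in_kernel;
} KVMState;

/* Option (1): a single static instance that everybody reaches implicitly. */
static KVMState kvm_state_global = { .fd = -1, .vmfd = -1 };

bool irqchip_in_kernel_static(void)
{
    return kvm_state_global.irqchip_in_kernel;
}

/* Option (2): the caller holds a handle and passes it explicitly. */
bool irqchip_in_kernel_handle(const KVMState *s)
{
    assert(s != NULL);      /* a handle is mandatory in this style */
    return s->irqchip_in_kernel;
}

/*
 * Worst case under option (2): a caller that cannot obtain a handle yet
 * falls back to the global.
 * FIXME: replace once the device model can hand out the handle.
 */
KVMState *kvm_state_fixme(void)
{
    return &kvm_state_global;
}
```

With option (2) the fallback is confined to the few call sites that use 
kvm_state_fixme(), so they are easy to find later.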

I don't understand the resistance here.

Regards,

Anthony Liguori

> Jan
>
>    

^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
  2011-01-20  8:44                                                 ` Gerd Hoffmann
@ 2011-01-20 19:39                                                   ` Anthony Liguori
  -1 siblings, 0 replies; 300+ messages in thread
From: Anthony Liguori @ 2011-01-20 19:39 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: Daniel P. Berrange, Markus Armbruster, kvm, Jan Kiszka,
	Glauber Costa, Marcelo Tosatti, qemu-devel, Avi Kivity

On 01/20/2011 02:44 AM, Gerd Hoffmann wrote:
>   Hi,
>
>> For (2), you cannot use bus=X,addr=Y because it makes assumptions about
>> the PCI topology which may change in newer -M pc's.
>
> Why should the PCI topology for 'pc' ever change?
>
> We'll probably get q35 support some day, but when this lands I expect 
> we'll see a new machine type 'q35', so '-M q35' will pick the ich9 
> chipset (which will have a different pci topology of course) and '-M 
> pc' will pick the existing piix chipset (which will continue to look 
> like it looks today).

But then what's the default machine type?  When I say -M pc, I really 
mean the default machine.

At some point, "qemu-system-x86_64 -device virtio-net-pci,addr=2.0" is 
not going to be a reliable way to invoke qemu, because there's no way 
we can guarantee that slot 2 isn't occupied by a chipset device or some 
other default device.
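
A small sketch in C of the failure mode, under stated assumptions: the 
PciBus type and both helpers are hypothetical, and the index mapping is 
only one possible reading of the "linear index" idea raised earlier in 
this thread, not an existing QEMU interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PCI_SLOTS 32

/* Hypothetical model of one PCI bus: which slots are already taken. */
typedef struct PciBus {
    bool used[PCI_SLOTS];
} PciBus;

/* Claim an explicit slot, as addr=N.0 would. Fails if the slot is taken. */
bool pci_claim_slot(PciBus *bus, int slot)
{
    assert(bus != NULL);
    if (slot < 0 || slot >= PCI_SLOTS || bus->used[slot]) {
        return false;
    }
    bus->used[slot] = true;
    return true;
}

/*
 * Map a linear device index onto the index-th slot that the machine's
 * default devices leave free. Returns -1 if there is no such slot.
 */
int pci_slot_for_index(const PciBus *defaults, int index)
{
    assert(defaults != NULL);
    for (int slot = 0; slot < PCI_SLOTS; slot++) {
        if (!defaults->used[slot] && index-- == 0) {
            return slot;
        }
    }
    return -1;
}
```

The same explicit slot number succeeds on a machine whose defaults stop 
at slot 1 and fails on one that also occupies slot 2, while the index 
lookup simply moves on to the next free slot.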

Regards,

Anthony Liguori

> cheers,
>   Gerd


^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
  2011-01-20 10:33                                                   ` Daniel P. Berrange
@ 2011-01-20 19:42                                                     ` Anthony Liguori
  -1 siblings, 0 replies; 300+ messages in thread
From: Anthony Liguori @ 2011-01-20 19:42 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: Gerd Hoffmann, Markus Armbruster, kvm, Jan Kiszka, Glauber Costa,
	Marcelo Tosatti, qemu-devel, Avi Kivity

On 01/20/2011 04:33 AM, Daniel P. Berrange wrote:
> On Thu, Jan 20, 2011 at 09:44:05AM +0100, Gerd Hoffmann wrote:
>    
>>    Hi,
>>
>>      
>>> For (2), you cannot use bus=X,addr=Y because it makes assumptions about
>>> the PCI topology which may change in newer -M pc's.
>>>        
>> Why should the PCI topology for 'pc' ever change?
>>
>> We'll probably get q35 support some day, but when this lands I
>> expect we'll see a new machine type 'q35', so '-M q35' will pick the
>> ich9 chipset (which will have a different pci topology of course)
>> and '-M pc' will pick the existing piix chipset (which will continue
>> to look like it looks today).
>>      
> If the topology does ever change (eg in the kind of way anthony
> suggests, first bus only has the graphics card), I think libvirt
> is going to need a little work to adapt to the new topology,
> regardless of whether we currently specify a bus= arg to -device
> or not. I'm not sure there's anything we could do to future proof
> us to that kind of change.
>    

I assume that libvirt today assumes that it can use a set of PCI slots 
in bus 0, correct?  Probably in the range 3-31?  Such assumptions are 
very likely to break.

Regards,

Anthony Liguori

> Regards,
> Daniel
>    


^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
  2011-01-20 19:37                                   ` Anthony Liguori
@ 2011-01-20 20:02                                     ` Blue Swirl
  -1 siblings, 0 replies; 300+ messages in thread
From: Blue Swirl @ 2011-01-20 20:02 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Jan Kiszka, Markus Armbruster, kvm, Glauber Costa,
	Marcelo Tosatti, qemu-devel, Avi Kivity

On Thu, Jan 20, 2011 at 7:37 PM, Anthony Liguori
<aliguori@linux.vnet.ibm.com> wrote:
> On 01/20/2011 03:33 AM, Jan Kiszka wrote:
>>
>> On 2011-01-19 20:32, Blue Swirl wrote:
>>
>>>
>>> On Wed, Jan 19, 2011 at 4:57 PM, Anthony Liguori
>>> <aliguori@linux.vnet.ibm.com>  wrote:
>>>
>>>>
>>>> On 01/19/2011 07:15 AM, Markus Armbruster wrote:
>>>>
>>>>>
>>>>> So they interact with KVM (need kvm_state), and they interact with the
>>>>> emulated PCI bus.  Could you elaborate on the fundamental difference
>>>>> between the two interactions that makes you choose the (hypothetical)
>>>>> KVM bus over the PCI bus as device parent?
>>>>>
>>>>>
>>>>
>>>> It's almost arbitrary, but I would say it's the direction that I/Os
>>>> flow.
>>>>
>>>> But if the underlying observation is that the device tree is not
>>>> really a tree, you're 100% correct.  This is part of why a factory
>>>> interface that just takes a parent bus is too simplistic.
>>>>
>>>> I think we ought to introduce a -pci-device option that is
>>>> specifically for creating PCI devices that doesn't require a parent
>>>> bus argument but provides a way to specify stable addressing (for
>>>> instance, using a linear index).
>>>>
>>>
>>> I think kvm_state should not be a property of any device or bus. It
>>> should be split to more logical pieces.
>>>
>>> Some parts of it could remain in CPUState, because they are associated
>>> with a VCPU.
>>>
>>> Also, for example irqfd could be considered to be similar object to
>>> char or block devices provided by QEMU to devices. Would it make sense
>>> to introduce new host types for passing parts of kvm_state to devices?
>>>
>>> I'd also make coalesced MMIO stuff part of memory object. We are not
>>> passing any state references when using cpu_physical_memory_rw(), but
>>> that could be changed.
>>>
>>
>> There are currently no VCPU-specific bits remaining in kvm_state. It may
>> be a good idea to introduce an arch-specific kvm_state and move related
>> bits over. It may also once be feasible to carve out memory management
>> related fields if we have proper abstractions for that, but I'm not
>> completely sure here.
>>
>> Anyway, all these things are secondary. The primary topic here is how to
>> deal with kvm_state and its fields that have VM-global scope.
>>
>
> The debate is really:
>
> 1) should we remove all passing of kvm_state and just assume it's static
>
> 2) deal with a couple places in the code where we need to figure out how to
> get at kvm_state
>
> I think we've only identified 1 real instance of (2) and it's resulted in
> some good discussions about how to model KVM devices vs. emulated devices.
>  Honestly, (1) just stinks.  I see absolutely no advantage to it at all.

Fully agree.

> In the very worst case scenario, the thing we need to do is just reference
> an extern variable in a few places.  That completely avoids all of the
> modelling discussions for now (while leaving placeholder FIXMEs so the
> problem can be tackled later).

I think KVMState was designed to match the KVM ioctl interface:
everything that is needed for talking to KVM or is received from KVM is
there. But I think this shouldn't be a design driver.

If the only pieces of kvm_state that are needed by the devices are
irqchip_in_kernel, pit_in_kernel and many_ioeventfds, the problem of
passing kvm_state to devices becomes very different. Each of these is
just a single bit, affecting only a few devices. Perhaps they could be
device properties which the board level sets when KVM is used?
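
As a rough illustration of that idea, here is a sketch in C. All types
and functions are hypothetical (this is not the qdev property API), and
the choice of which device consumes which bit is an assumption made for
the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* The three bits named above, gathered once by the board code. */
typedef struct KvmFeatures {
    bool irqchip_in_kernel;
    bool pit_in_kernel;
    bool many_ioeventfds;
} KvmFeatures;

/* Hypothetical devices: each carries only the property it cares about. */
typedef struct ApicDevice { bool in_kernel; } ApicDevice;
typedef struct PitDevice { bool in_kernel; } PitDevice;
typedef struct VirtioPciDevice { bool use_ioeventfd; } VirtioPciDevice;

typedef struct Board {
    ApicDevice apic;
    PitDevice pit;
    VirtioPciDevice virtio;
} Board;

/* Board-level wiring. kvm == NULL means KVM is not in use. */
void board_set_kvm_properties(Board *b, const KvmFeatures *kvm)
{
    assert(b != NULL);
    b->apic.in_kernel = kvm != NULL && kvm->irqchip_in_kernel;
    b->pit.in_kernel = kvm != NULL && kvm->pit_in_kernel;
    b->virtio.use_ioeventfd = kvm != NULL && kvm->many_ioeventfds;
}

/* Device code consults its own property and never sees kvm_state. */
const char *apic_backend(const ApicDevice *apic)
{
    return apic->in_kernel ? "kernel" : "userspace";
}
```

The devices then have no dependency on KVMState at all; only the board
code needs to know whether KVM is in use.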

^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
  2011-01-20 19:27                                   ` Blue Swirl
@ 2011-01-20 21:22                                     ` Jan Kiszka
  -1 siblings, 0 replies; 300+ messages in thread
From: Jan Kiszka @ 2011-01-20 21:22 UTC (permalink / raw)
  To: Blue Swirl
  Cc: Anthony Liguori, Markus Armbruster, kvm, Glauber Costa,
	Marcelo Tosatti, qemu-devel, Avi Kivity

[-- Attachment #1: Type: text/plain, Size: 4753 bytes --]

On 2011-01-20 20:27, Blue Swirl wrote:
> On Thu, Jan 20, 2011 at 9:33 AM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>> On 2011-01-19 20:32, Blue Swirl wrote:
>>> On Wed, Jan 19, 2011 at 4:57 PM, Anthony Liguori
>>> <aliguori@linux.vnet.ibm.com> wrote:
>>>> On 01/19/2011 07:15 AM, Markus Armbruster wrote:
>>>>>
>>>>> So they interact with KVM (need kvm_state), and they interact with the
>>>>> emulated PCI bus.  Could you elaborate on the fundamental difference
>>>>> between the two interactions that makes you choose the (hypothetical)
>>>>> KVM bus over the PCI bus as device parent?
>>>>>
>>>>
>>>> It's almost arbitrary, but I would say it's the direction that I/Os flow.
>>>>
>>>> But if the underlying observation is that the device tree is not really a
>>>> tree, you're 100% correct.  This is part of why a factory interface that
>>>> just takes a parent bus is too simplistic.
>>>>
>>>> I think we ought to introduce a -pci-device option that is specifically for
>>>> creating PCI devices that doesn't require a parent bus argument but provides
>>>> a way to specify stable addressing (for instance, using a linear index).
>>>
>>> I think kvm_state should not be a property of any device or bus. It
>>> should be split to more logical pieces.
>>>
>>> Some parts of it could remain in CPUState, because they are associated
>>> with a VCPU.
>>>
>>> Also, for example irqfd could be considered to be similar object to
>>> char or block devices provided by QEMU to devices. Would it make sense
>>> to introduce new host types for passing parts of kvm_state to devices?
>>>
>>> I'd also make coalesced MMIO stuff part of memory object. We are not
>>> passing any state references when using cpu_physical_memory_rw(), but
>>> that could be changed.
>>
>> There are currently no VCPU-specific bits remaining in kvm_state.
> 
> I think fields vcpu_events, robust_singlestep, debugregs,
> kvm_sw_breakpoints, xsave, xcrs belong to CPUX86State. They may be the
> same for all VCPUs but still they are sort of CPU properties. I'm not
> sure about fd field.

They are all properties of the currently loaded KVM subsystem in the
host kernel. They can't change while KVM's root fd is opened.
Replicating this static information into each and every VCPU state would
be crazy.

In fact, services like kvm_has_vcpu_events() already encode this: they
are static functions without any kvm_state reference that simply return
the content of those fields. Totally inconsistent with this, we force the
caller of kvm_check_extension() to pass a handle. This is part of my
problem with the current situation and any halfhearted steps in this
context. Either we work towards eliminating "static KVMState *kvm_state"
in kvm-all.c or eliminating KVMState.
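
The inconsistency can be shown with a reduced model in C. The function
names below are deliberately not the real ones, the extension IDs are
made up, and the kernel query is replaced by a table lookup; only the
two calling conventions are meant to match the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical extension IDs standing in for capability constants. */
enum { EXT_VCPU_EVENTS, EXT_XSAVE, EXT_COUNT };

typedef struct KVMState {
    bool ext[EXT_COUNT];    /* stand-in for querying the kernel */
    bool vcpu_events;       /* cached once at init time */
} KVMState;

/* File-scope pointer, in the spirit of "static KVMState *kvm_state". */
static KVMState *kvm_state;

/* Style A: the caller must supply a handle. */
bool check_extension(const KVMState *s, int ext)
{
    assert(s != NULL && ext >= 0 && ext < EXT_COUNT);
    return s->ext[ext];
}

/* Init caches the feature bit and publishes the file-scope pointer. */
void kvm_sketch_init(KVMState *s)
{
    s->vcpu_events = check_extension(s, EXT_VCPU_EVENTS);
    kvm_state = s;
}

/* Style B: no handle; reads the cached bit via the file-scope pointer. */
bool has_vcpu_events(void)
{
    return kvm_state != NULL && kvm_state->vcpu_events;
}
```

Both styles end up reading the same object, which is why mixing them
buys nothing: the handle in style A is only ever the one instance that
style B reaches implicitly.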

> 
>> It may
>> be a good idea to introduce an arch-specific kvm_state and move related
>> bits over.
> 
> This should probably contain only irqchip_in_kernel, pit_in_kernel and
> many_ioeventfds, maybe fd.

fd is the root file descriptor you need for a few KVM services that are
not bound to a specific VM - e.g. feature queries. It's not arch
specific. Arch-specific fields are, e.g., robust_singlestep or the xsave
feature state.

> 
>> It may also once be feasible to carve out memory management
>> related fields if we have proper abstractions for that, but I'm not
>> completely sure here.
> 
> I'd put slots, vmfd, coalesced_mmio, broken_set_mem_region,
> migration_log into the memory object.

vmfd is the VM-scope file descriptor you need at machine-level. The rest
logically belongs to a memory object, but I haven't looked at technical
details yet.
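
Purely as an illustration of the scopes discussed so far, the fields
could be grouped as in the following C sketch. The struct names, the
field types and the helper are hypothetical; nothing here proposes
actually moving the fields.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Host-wide: the root fd, used for services not bound to one VM. */
typedef struct KvmRoot {
    int fd;
} KvmRoot;

/* Arch-specific feature state (x86 examples from the discussion). */
typedef struct KvmArchX86 {
    bool robust_singlestep;
    bool xsave;
} KvmArchX86;

/* Machine level: the VM-scope file descriptor. */
typedef struct KvmMachine {
    int vmfd;
} KvmMachine;

/* Memory object candidates; the types are placeholders. */
typedef struct KvmMemory {
    int nr_slots;               /* placeholder for the slots array */
    bool coalesced_mmio;
    bool broken_set_mem_region;
    bool migration_log;
} KvmMemory;

/* A VM descriptor is only meaningful once the root fd is open. */
bool kvm_machine_ready(const KvmRoot *root, const KvmMachine *m)
{
    assert(root != NULL && m != NULL);
    return root->fd >= 0 && m->vmfd >= 0;
}
```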

> 
>> Anyway, all these things are secondary. The primary topic here is how to
>> deal with kvm_state and its fields that have VM-global scope.
> 
> If it is an opaque blob which contains various unrelated stuff, no
> clear place will be found.

We aren't moving fields yet (and we shouldn't). We first of all need to
establish the handle distribution (which apparently requires quite some
work in areas beyond KVM).

> 
> By the way, we don't have a QEMUState but instead use globals. Perhaps
> this should be reorganized as well. For fd field, maybe even using a
> global variable could be justified, since it is used for direct access
> to kernel, not unlike a system call.

The fd field is part of this discussion. Making it global (but local to
the kvm subsystem) was an implicit part of my original suggestion.

I've no problem with something like a QEMUState, or better a
MachineState that would also include a few KVM-specific fields like the
vmfd - just like we already do for CPUstate (or should we better
introduce a KVM CPU bus... ;) ).

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 259 bytes --]

^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
  2011-01-20 19:37                                   ` Anthony Liguori
@ 2011-01-20 21:27                                     ` Jan Kiszka
  -1 siblings, 0 replies; 300+ messages in thread
From: Jan Kiszka @ 2011-01-20 21:27 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Blue Swirl, Markus Armbruster, kvm, Glauber Costa,
	Marcelo Tosatti, qemu-devel, Avi Kivity

[-- Attachment #1: Type: text/plain, Size: 3759 bytes --]

On 2011-01-20 20:37, Anthony Liguori wrote:
> On 01/20/2011 03:33 AM, Jan Kiszka wrote:
>> On 2011-01-19 20:32, Blue Swirl wrote:
>>   
>>> On Wed, Jan 19, 2011 at 4:57 PM, Anthony Liguori
>>> <aliguori@linux.vnet.ibm.com>  wrote:
>>>     
>>>> On 01/19/2011 07:15 AM, Markus Armbruster wrote:
>>>>       
>>>>> So they interact with KVM (need kvm_state), and they interact with the
>>>>> emulated PCI bus.  Could you elaborate on the fundamental difference
>>>>> between the two interactions that makes you choose the (hypothetical)
>>>>> KVM bus over the PCI bus as device parent?
>>>>>
>>>>>          
>>>> It's almost arbitrary, but I would say it's the direction that I/Os
>>>> flow.
>>>>
>>>> But if the underlying observation is that the device tree is not
>>>> really a tree, you're 100% correct.  This is part of why a factory
>>>> interface that just takes a parent bus is too simplistic.
>>>>
>>>> I think we ought to introduce a -pci-device option that is
>>>> specifically for creating PCI devices that doesn't require a parent
>>>> bus argument but provides a way to specify stable addressing (for
>>>> instance, using a linear index).
>>>>        
>>> I think kvm_state should not be a property of any device or bus. It
>>> should be split to more logical pieces.
>>>
>>> Some parts of it could remain in CPUState, because they are associated
>>> with a VCPU.
>>>
>>> Also, for example irqfd could be considered to be similar object to
>>> char or block devices provided by QEMU to devices. Would it make sense
>>> to introduce new host types for passing parts of kvm_state to devices?
>>>
>>> I'd also make coalesced MMIO stuff part of memory object. We are not
>>> passing any state references when using cpu_physical_memory_rw(), but
>>> that could be changed.
>>>      
>> There are currently no VCPU-specific bits remaining in kvm_state. It may
>> be a good idea to introduce an arch-specific kvm_state and move related
>> bits over. It may also once be feasible to carve out memory management
>> related fields if we have proper abstractions for that, but I'm not
>> completely sure here.
>>
>> Anyway, all these things are secondary. The primary topic here is how to
>> deal with kvm_state and its fields that have VM-global scope.
>>    
> 
> The debate is really:
> 
> 1) should we remove all passing of kvm_state and just assume it's static
> 
> 2) deal with a couple places in the code where we need to figure out how
> to get at kvm_state
> 
> I think we've only identified 1 real instance of (2) and it's resulted
> in some good discussions about how to model KVM devices vs. emulated
> devices.  Honestly, (1) just stinks.  I see absolutely no advantage to
> it at all.   In the very worst case scenario, the thing we need to do is
> just reference an extern variable in a few places.  That completely
> avoids all of the modelling discussions for now (while leaving
> placeholder FIXMEs so the problem can be tackled later).

The PCI bus discussion is surely an interesting outcome, but now almost
completely off-topic to the original, way less critical issue (as we
were discussing internals).

> 
> I don't understand the resistance here.

IMHO, most suggestions on the table are still over-designed (like a
KVMBus that only passes a kvm_state - or do you have more features for
it in mind?). The idea I love most so far is establishing a machine
state that also carries those few KVM bits which correspond to the KVM
extension of CPUState.

But in the end I want an implementable consensus that helps us move
forward with the main topic: the overdue KVM upstream merge. I just do
not have a clear picture yet.

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 259 bytes --]

^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
@ 2011-01-20 21:27                                     ` Jan Kiszka
  0 siblings, 0 replies; 300+ messages in thread
From: Jan Kiszka @ 2011-01-20 21:27 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: kvm, Glauber Costa, Marcelo Tosatti, Markus Armbruster,
	qemu-devel, Blue Swirl, Avi Kivity

[-- Attachment #1: Type: text/plain, Size: 3759 bytes --]

On 2011-01-20 20:37, Anthony Liguori wrote:
> On 01/20/2011 03:33 AM, Jan Kiszka wrote:
>> On 2011-01-19 20:32, Blue Swirl wrote:
>>   
>>> On Wed, Jan 19, 2011 at 4:57 PM, Anthony Liguori
>>> <aliguori@linux.vnet.ibm.com>  wrote:
>>>     
>>>> On 01/19/2011 07:15 AM, Markus Armbruster wrote:
>>>>       
>>>>> So they interact with KVM (need kvm_state), and they interact with the
>>>>> emulated PCI bus.  Could you elaborate on the fundamental difference
>>>>> between the two interactions that makes you choose the (hypothetical)
>>>>> KVM bus over the PCI bus as device parent?
>>>>>
>>>>>          
>>>> It's almost arbitrary, but I would say it's the direction that I/Os
>>>> flow.
>>>>
>>>> But if the underlying observation is that the device tree is not
>>>> really a tree, you're 100% correct.  This is part of why a factory
>>>> interface that just takes a parent bus is too simplistic.
>>>>
>>>> I think we ought to introduce a -pci-device option that is
>>>> specifically for creating PCI devices that doesn't require a parent
>>>> bus argument but provides a way to specify stable addressing (for
>>>> instance, using a linear index).
>>>>        
>>> I think kvm_state should not be a property of any device or bus. It
>>> should be split to more logical pieces.
>>>
>>> Some parts of it could remain in CPUState, because they are associated
>>> with a VCPU.
>>>
>>> Also, for example irqfd could be considered to be similar object to
>>> char or block devices provided by QEMU to devices. Would it make sense
>>> to introduce new host types for passing parts of kvm_state to devices?
>>>
>>> I'd also make coalesced MMIO stuff part of memory object. We are not
>>> passing any state references when using cpu_physical_memory_rw(), but
>>> that could be changed.
>>>      
>> There are currently no VCPU-specific bits remaining in kvm_state. It may
>> be a good idea to introduce an arch-specific kvm_state and move related
>> bits over. It may also once be feasible to carve out memory management
>> related fields if we have proper abstractions for that, but I'm not
>> completely sure here.
>>
>> Anyway, all these things are secondary. The primary topic here is how to
>> deal with kvm_state and its fields that have VM-global scope.
>>    
> 
> The debate is really:
> 
> 1) should we remove all passing of kvm_state and just assume it's static
> 
> 2) deal with a couple places in the code where we need to figure out how
> to get at kvm_state
> 
> I think we've only identified 1 real instance of (2) and it's resulted
> in some good discussions about how to model KVM devices vs. emulated
> devices.  Honestly, (1) just stinks.  I see absolutely no advantage to
> it at all.   In the very worst case scenario, the thing we need to do is
> just reference an extern variable in a few places.  That completely
> avoids all of the modelling discussions for now (while leaving for
> placeholder FIXMEs so the problem can be tackled later).

The PCI bus discussion is surely an interesting outcome, but now almost
completely off-topic to the original, way less critical issue (as we
were discussing internals).

> 
> I don't understand the resistance here.

IMHO, most suggestions on the table are still over-designed (like a
KVMBus that only passes a kvm_state - or do you have more features for
it in mind?). The idea I love most so far is establishing a machine
state that also carries those few KVM bits which correspond to the KVM
extension of CPUState.

But in the end I want an implementable consensus that helps moving
forward with main topic: the overdue KVM upstream merge. I just do not
have a clear picture yet.

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 259 bytes --]

^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
  2011-01-20 21:22                                     ` Jan Kiszka
@ 2011-01-20 21:40                                       ` Blue Swirl
  -1 siblings, 0 replies; 300+ messages in thread
From: Blue Swirl @ 2011-01-20 21:40 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Anthony Liguori, Markus Armbruster, kvm, Glauber Costa,
	Marcelo Tosatti, qemu-devel, Avi Kivity

On Thu, Jan 20, 2011 at 9:22 PM, Jan Kiszka <jan.kiszka@web.de> wrote:
> On 2011-01-20 20:27, Blue Swirl wrote:
>> On Thu, Jan 20, 2011 at 9:33 AM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>>> On 2011-01-19 20:32, Blue Swirl wrote:
>>>> On Wed, Jan 19, 2011 at 4:57 PM, Anthony Liguori
>>>> <aliguori@linux.vnet.ibm.com> wrote:
>>>>> On 01/19/2011 07:15 AM, Markus Armbruster wrote:
>>>>>>
>>>>>> So they interact with KVM (need kvm_state), and they interact with the
>>>>>> emulated PCI bus.  Could you elaborate on the fundamental difference
>>>>>> between the two interactions that makes you choose the (hypothetical)
>>>>>> KVM bus over the PCI bus as device parent?
>>>>>>
>>>>>
>>>>> It's almost arbitrary, but I would say it's the direction that I/Os flow.
>>>>>
>>>>> But if the underlying observation is that the device tree is not really a
>>>>> tree, you're 100% correct.  This is part of why a factory interface that
>>>>> just takes a parent bus is too simplistic.
>>>>>
>>>>> I think we ought to introduce a -pci-device option that is specifically for
>>>>> creating PCI devices that doesn't require a parent bus argument but provides
>>>>> a way to specify stable addressing (for instancing, using a linear index).
>>>>
>>>> I think kvm_state should not be a property of any device or bus. It
>>>> should be split to more logical pieces.
>>>>
>>>> Some parts of it could remain in CPUState, because they are associated
>>>> with a VCPU.
>>>>
>>>> Also, for example irqfd could be considered to be similar object to
>>>> char or block devices provided by QEMU to devices. Would it make sense
>>>> to introduce new host types for passing parts of kvm_state to devices?
>>>>
>>>> I'd also make coalesced MMIO stuff part of memory object. We are not
>>>> passing any state references when using cpu_physical_memory_rw(), but
>>>> that could be changed.
>>>
>>> There are currently no VCPU-specific bits remaining in kvm_state.
>>
>> I think fields vcpu_events, robust_singlestep, debugregs,
>> kvm_sw_breakpoints, xsave, xcrs belong to CPUX86State. They may be the
>> same for all VCPUs but still they are sort of CPU properties. I'm not
>> sure about fd field.
>
> They are all properties of the currently loaded KVM subsystem in the
> host kernel. They can't change while KVM's root fd is opened.
> Replicating this static information into each and every VCPU state would
> be crazy.

Then each CPUX86State could have a pointer to a common structure.

> In fact, services like kvm_has_vcpu_events() already encode this: they
> are static functions without any kvm_state reference that simply return
> the content of those fields. Totally inconsistent to this, we force the
> caller of kvm_check_extension to pass a handle. This is part of my
> problem with the current situation and any halfhearted steps in this
> context. Either we work towards eliminating "static KVMState *kvm_state"
> in kvm-all.c or eliminating KVMState.

If the CPU related fields are accessible through CPUState, the handle
should be available.
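
To make the two positions concrete, here is a minimal C sketch. It is not
QEMU code: `SharedKVMState`, `VCPU`, `set_global_kvm()`,
`global_has_vcpu_events()` and `vcpu_has_vcpu_events()` are hypothetical
names made up for illustration, and only the field names `vcpu_events` and
`xsave` are taken from the discussion. It contrasts an accessor that reads
a file-static pointer with one that reaches the same shared structure
through a per-CPU pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the host-wide capability bits. */
typedef struct SharedKVMState {
    bool vcpu_events;
    bool xsave;
} SharedKVMState;

/* Hypothetical per-VCPU state that points at the shared data. */
typedef struct VCPU {
    int index;
    const SharedKVMState *kvm;   /* shared by all VCPUs, never copied */
} VCPU;

/* Style 1: file-static pointer, the accessor takes no handle. */
static const SharedKVMState *global_kvm;

void set_global_kvm(const SharedKVMState *s)
{
    global_kvm = s;
}

bool global_has_vcpu_events(void)
{
    return global_kvm != NULL && global_kvm->vcpu_events;
}

/* Style 2: the caller already holds a VCPU, so the handle comes with it. */
bool vcpu_has_vcpu_events(const VCPU *cpu)
{
    assert(cpu != NULL);
    return cpu->kvm != NULL && cpu->kvm->vcpu_events;
}
```

In this sketch both styles answer the same question from the same data;
the difference is only whether the caller has to be able to produce a
handle.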

>>> It may
>>> be a good idea to introduce an arch-specific kvm_state and move related
>>> bits over.
>>
>> This should probably contain only irqchip_in_kernel, pit_in_kernel and
>> many_ioeventfds, maybe fd.
>
> fd is that root file descriptor you need for a few KVM services that are
> not bound to a specific VM - e.g. feature queries. It's not arch
> specific. Arch specific are e.g. robust_singlestep or xsave feature states.

By arch you mean guest CPU architecture? They are not machine features.

>>
>>> It may also once be feasible to carve out memory management
>>> related fields if we have proper abstractions for that, but I'm not
>>> completely sure here.
>>
>> I'd put slots, vmfd, coalesced_mmio, broken_set_mem_region,
>> migration_log into the memory object.
>
> vmfd is the VM-scope file descriptor you need at machine-level. The rest
> logically belongs to a memory object, but I haven't looked at technical
> details yet.
>
>>
>>> Anyway, all these things are secondary. The primary topic here is how to
>>> deal with kvm_state and its fields that have VM-global scope.
>>
>> If it is an opaque blob which contains various unrelated stuff, no
>> clear place will be found.
>
> We aren't moving fields yet (and we shouldn't). We first of all need to
> establish the handle distribution (which apparently requires quite some
> work in areas beyond KVM).

But I think this is exactly the problem. If the handle is for the
current KVMState, you'll indeed need it in various places and passing
it around will be cumbersome. By moving the fields around, the
information should be available more naturally.
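
As a rough illustration of what moving the fields around could look like,
the following C sketch groups the field names mentioned in this thread
into two smaller structures. The type names `KVMMemoryInfo` and
`KVMBoardInfo` and both helper functions are hypothetical and exist only
for this example; the real layout is exactly what is under debate here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical grouping of the memory-related fields named in the thread. */
typedef struct KVMMemoryInfo {
    int  nr_slots;
    bool coalesced_mmio;
    bool broken_set_mem_region;
    bool migration_log;
} KVMMemoryInfo;

/* Hypothetical grouping of the board-level bits named in the thread. */
typedef struct KVMBoardInfo {
    bool irqchip_in_kernel;
    bool pit_in_kernel;
    bool many_ioeventfds;
} KVMBoardInfo;

/* A memory-layer consumer needs only the memory piece. */
bool memory_can_coalesce_mmio(const KVMMemoryInfo *mem)
{
    assert(mem != NULL);
    return mem->coalesced_mmio;
}

/* A timer device needs only the board piece. */
bool pit_needs_userspace_model(const KVMBoardInfo *board)
{
    assert(board != NULL);
    return !board->pit_in_kernel;
}
```

Each consumer then takes the small structure it cares about, so no single
catch-all handle has to travel through unrelated code.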

>> By the way, we don't have a QEMUState but instead use globals. Perhaps
>> this should be reorganized as well. For fd field, maybe even using a
>> global variable could be justified, since it is used for direct access
>> to kernel, not unlike a system call.
>
> The fd field is part of this discussion. Making it global (but local to
> the kvm subsystem) was an implicit part of my original suggestion.
>
> I've no problem with something like a QEMUState, or better a
> MachineState that would also include a few KVM-specific fields like the
> vmfd - just like we already do for CPUstate (or should we better
> introduce a KVM CPU bus... ;) ).
>
> Jan
>
>

^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
  2011-01-20 20:02                                     ` Blue Swirl
@ 2011-01-20 21:42                                       ` Jan Kiszka
  -1 siblings, 0 replies; 300+ messages in thread
From: Jan Kiszka @ 2011-01-20 21:42 UTC (permalink / raw)
  To: Blue Swirl
  Cc: Anthony Liguori, Markus Armbruster, kvm, Glauber Costa,
	Marcelo Tosatti, qemu-devel, Avi Kivity

On 2011-01-20 21:02, Blue Swirl wrote:
> I think KVMState was designed to match KVM ioctl interface: all stuff
> that is needed for talking to KVM or received from KVM are there. But
> I think this shouldn't be a design driver.

Agreed. The nice cleanup would probably be the complete assimilation of
KVMState by something bigger of comparable scope.

If a machine was brought up with KVM support, every device that refers
to this machine (as it is supposed to become part of it) should be able
to use KVM services in order to accelerate its model.

> 
> If the only pieces of kvm_state that are needed by the devices are
> irqchip_in_kernel, pit_in_kernel and many_ioeventfds, the problem of
> passing kvm_state to devices becomes very different. Each of these are
> just single bits, affecting only a few devices. Perhaps they could be
> device properties which the board level sets when KVM is used?

Forget about the static capabilities for now. The core of kvm_state
consists of handles that enable you to use KVM services, and maybe state
fields that have machine scope (unless we find more local homes, like a
memory object). Those need to be accessible to the kvm layer when
servicing requests from components that are related to that very same
machine.
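
A minimal C sketch of that idea, under the assumption that a machine-wide
structure exists: `MachineState`, `Device`, `machine_kvm_enabled()` and
`device_vm_fd()` are hypothetical names for this example and not existing
QEMU interfaces. A device only records which machine it belongs to, and
the KVM layer resolves the handle from there.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical machine-wide state; -1 means "no KVM handle". */
typedef struct MachineState {
    const char *name;
    int kvm_fd;   /* root handle, e.g. for feature queries */
    int vm_fd;    /* VM-scope handle */
} MachineState;

/* Hypothetical device that records the machine it is part of. */
typedef struct Device {
    const char *id;
    MachineState *machine;
} Device;

bool machine_kvm_enabled(const MachineState *m)
{
    return m != NULL && m->kvm_fd >= 0 && m->vm_fd >= 0;
}

/* The KVM layer resolves the VM handle from the device's machine. */
int device_vm_fd(const Device *dev)
{
    assert(dev != NULL);
    return machine_kvm_enabled(dev->machine) ? dev->machine->vm_fd : -1;
}
```

A device attached to a machine without KVM simply gets -1 back and falls
back to its emulated model; no kvm_state pointer appears in the device
itself.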

Jan


^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
  2011-01-20 21:40                                       ` Blue Swirl
@ 2011-01-20 21:53                                         ` Jan Kiszka
  -1 siblings, 0 replies; 300+ messages in thread
From: Jan Kiszka @ 2011-01-20 21:53 UTC (permalink / raw)
  To: Blue Swirl
  Cc: Anthony Liguori, Markus Armbruster, kvm, Glauber Costa,
	Marcelo Tosatti, qemu-devel, Avi Kivity

On 2011-01-20 22:40, Blue Swirl wrote:
> On Thu, Jan 20, 2011 at 9:22 PM, Jan Kiszka <jan.kiszka@web.de> wrote:
>> On 2011-01-20 20:27, Blue Swirl wrote:
>>> On Thu, Jan 20, 2011 at 9:33 AM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>>>> On 2011-01-19 20:32, Blue Swirl wrote:
>>>>> On Wed, Jan 19, 2011 at 4:57 PM, Anthony Liguori
>>>>> <aliguori@linux.vnet.ibm.com> wrote:
>>>>>> On 01/19/2011 07:15 AM, Markus Armbruster wrote:
>>>>>>>
>>>>>>> So they interact with KVM (need kvm_state), and they interact with the
>>>>>>> emulated PCI bus.  Could you elaborate on the fundamental difference
>>>>>>> between the two interactions that makes you choose the (hypothetical)
>>>>>>> KVM bus over the PCI bus as device parent?
>>>>>>>
>>>>>>
>>>>>> It's almost arbitrary, but I would say it's the direction that I/Os flow.
>>>>>>
>>>>>> But if the underlying observation is that the device tree is not really a
>>>>>> tree, you're 100% correct.  This is part of why a factory interface that
>>>>>> just takes a parent bus is too simplistic.
>>>>>>
>>>>>> I think we ought to introduce a -pci-device option that is specifically for
>>>>>> creating PCI devices that doesn't require a parent bus argument but provides
>>>>>> a way to specify stable addressing (for instancing, using a linear index).
>>>>>
>>>>> I think kvm_state should not be a property of any device or bus. It
>>>>> should be split to more logical pieces.
>>>>>
>>>>> Some parts of it could remain in CPUState, because they are associated
>>>>> with a VCPU.
>>>>>
>>>>> Also, for example irqfd could be considered to be similar object to
>>>>> char or block devices provided by QEMU to devices. Would it make sense
>>>>> to introduce new host types for passing parts of kvm_state to devices?
>>>>>
>>>>> I'd also make coalesced MMIO stuff part of memory object. We are not
>>>>> passing any state references when using cpu_physical_memory_rw(), but
>>>>> that could be changed.
>>>>
>>>> There are currently no VCPU-specific bits remaining in kvm_state.
>>>
>>> I think fields vcpu_events, robust_singlestep, debugregs,
>>> kvm_sw_breakpoints, xsave, xcrs belong to CPUX86State. They may be the
>>> same for all VCPUs but still they are sort of CPU properties. I'm not
>>> sure about fd field.
>>
>> They are all properties of the currently loaded KVM subsystem in the
>> host kernel. They can't change while KVM's root fd is opened.
>> Replicating this static information into each and every VCPU state would
>> be crazy.
> 
> Then each CPUX86State could have a pointer to a common structure.

That already exists.

> 
>> In fact, services like kvm_has_vcpu_events() already encode this: they
>> are static functions without any kvm_state reference that simply return
>> the content of those fields. Totally inconsistent to this, we force the
>> caller of kvm_check_extension to pass a handle. This is part of my
>> problem with the current situation and any halfhearted steps in this
>> context. Either we work towards eliminating "static KVMState *kvm_state"
>> in kvm-all.c or eliminating KVMState.
> 
> If the CPU related fields are accessible through CPUState, the handle
> should be available.
> 
>>>> It may
>>>> be a good idea to introduce an arch-specific kvm_state and move related
>>>> bits over.
>>>
>>> This should probably contain only irqchip_in_kernel, pit_in_kernel and
>>> many_ioeventfds, maybe fd.
>>
>> fd is that root file descriptor you need for a few KVM services that are
>> not bound to a specific VM - e.g. feature queries. It's not arch
>> specific. Arch specific are e.g. robust_singlestep or xsave feature states.
> 
> By arch you mean guest CPU architecture? They are not machine features.

No, they are practically static host features.

> 
>>>
>>>> It may also once be feasible to carve out memory management
>>>> related fields if we have proper abstractions for that, but I'm not
>>>> completely sure here.
>>>
>>> I'd put slots, vmfd, coalesced_mmio, broken_set_mem_region,
>>> migration_log into the memory object.
>>
>> vmfd is the VM-scope file descriptor you need at machine-level. The rest
>> logically belongs to a memory object, but I haven't looked at technical
>> details yet.
>>
>>>
>>>> Anyway, all these things are secondary. The primary topic here is how to
>>>> deal with kvm_state and its fields that have VM-global scope.
>>>
>>> If it is an opaque blob which contains various unrelated stuff, no
>>> clear place will be found.
>>
>> We aren't moving fields yet (and we shouldn't). We first of all need to
>> establish the handle distribution (which apparently requires quite some
>> work in areas beyond KVM).
> 
> But I think this is exactly the problem. If the handle is for the
> current KVMState, you'll indeed need it in various places and passing
> it around will be cumbersome. By moving the fields around, the
> information should be available more naturally.

Yeah, if we had a MachineState or if we could agree on introducing it,
I'm with you again. Improving the currently cumbersome KVM API
interaction was the main motivation for my original patch.

Jan


^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
  2011-01-20 19:39                                                   ` Anthony Liguori
@ 2011-01-21  8:35                                                     ` Gerd Hoffmann
  -1 siblings, 0 replies; 300+ messages in thread
From: Gerd Hoffmann @ 2011-01-21  8:35 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Daniel P. Berrange, Markus Armbruster, kvm, Jan Kiszka,
	Glauber Costa, Marcelo Tosatti, qemu-devel, Avi Kivity

On 01/20/11 20:39, Anthony Liguori wrote:
> On 01/20/2011 02:44 AM, Gerd Hoffmann wrote:
>> Hi,
>>
>>> For (2), you cannot use bus=X,addr=Y because it makes assumptions about
>>> the PCI topology which may change in newer -M pc's.
>>
>> Why should the PCI topology for 'pc' ever change?
>>
>> We'll probably get q35 support some day, but when this lands I expect
>> we'll see a new machine type 'q35', so '-M q35' will pick the ich9
>> chipset (which will have a different pci topology of course) and '-M
>> pc' will pick the existing piix chipset (which will continue to look
>> like it looks today).
>
> But then what's the default machine type? When I say -M pc, I really
> mean the default machine.

I'd tend to leave pc as the default for a release cycle or two so we can
hash out issues with q35, then flip the default once it has had broader
testing and runs stably.

> At some point, "qemu-system-x86_64 -device virtio-net-pci,addr=2.0"
>
> Is not going to be a reliable way to invoke qemu because there's no way
> we can guarantee that slot 2 isn't occupied by a chipset device or some
> other default device.

Indeed.  But qemu -M pc should continue to work.  'pc' would be
better named 'piix3', but renaming it now is probably not worth the trouble.
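
The distinction between a named machine type and the default can be
sketched in a few lines of C. This is only an illustration: the
`MachineType` structure and `find_machine()` are hypothetical and do not
claim to match QEMU's machine registration code. The point is that
flipping the default changes what you get without -M, while an explicit
'pc' keeps resolving to the piix chipset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical machine-type descriptor; not QEMU's actual structure. */
typedef struct MachineType {
    const char *name;
    const char *chipset;
    bool is_default;
} MachineType;

/* Look up by name; a NULL name means "no -M given", so pick the default. */
const MachineType *find_machine(const MachineType *types, size_t n,
                                const char *name)
{
    assert(types != NULL);
    for (size_t i = 0; i < n; i++) {
        if (name ? strcmp(types[i].name, name) == 0 : types[i].is_default) {
            return &types[i];
        }
    }
    return NULL;
}
```

With a table holding 'pc' (piix, default) and 'q35' (ich9), moving the
`is_default` flag to 'q35' changes only the result of the NULL lookup.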

cheers,
   Gerd


^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
  2011-01-20 19:27                                   ` Blue Swirl
@ 2011-01-21  8:46                                     ` Gerd Hoffmann
  -1 siblings, 0 replies; 300+ messages in thread
From: Gerd Hoffmann @ 2011-01-21  8:46 UTC (permalink / raw)
  To: Blue Swirl
  Cc: Jan Kiszka, Anthony Liguori, Markus Armbruster, kvm,
	Glauber Costa, Marcelo Tosatti, qemu-devel, Avi Kivity

   Hi,

> By the way, we don't have a QEMUState but instead use globals.

/me wants to underline this.

IMO it is absolutely pointless to worry about ways to pass around 
kvm_state.  There never ever will be a serious need for that.

We can stick with the current model of keeping global state in global 
variables.  And just do the same with kvm_state.

Or we can move to have all state in a QEMUState struct which we'll pass 
around basically everywhere.  Then we can simply embed or reference 
kvm_state there.

I'd tend to stick with the global variables as I don't see the point in
having a QEMUState.  I doubt we'll ever see two virtual machines driven
by a single qemu process.  YMMV.
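
A minimal sketch of the two options, in case it helps to see them side
by side.  The KVMState here is a cut-down stand-in with two illustrative
fields, and QEMUState plus both accessor functions are hypothetical
names that do not exist in the tree:

```c
/* Cut-down stand-in for KVMState; the real structure has many more fields. */
typedef struct KVMState {
    int fd;     /* file descriptor for /dev/kvm */
    int vmfd;   /* file descriptor for the VM */
} KVMState;

/* Option 1: keep global state in a global variable. */
static KVMState kvm_state_storage = { .fd = -1, .vmfd = -1 };
KVMState *kvm_state = &kvm_state_storage;

/* Callers need no extra argument; they reach the state implicitly. */
int kvm_vm_fd_global(void)
{
    return kvm_state->vmfd;
}

/* Option 2: a (hypothetical) QEMUState that is passed around everywhere. */
typedef struct QEMUState {
    KVMState kvm;   /* embedded here; a pointer would work as well */
} QEMUState;

/* Every caller now has to carry a QEMUState pointer down the call chain. */
int kvm_vm_fd_explicit(const QEMUState *qs)
{
    return qs->kvm.vmfd;
}
```

With a single VM per process, both accessors return the same thing; the
second form only buys something once there is more than one QEMUState.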

cheers,
   Gerd


^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
  2011-01-21  8:35                                                     ` Gerd Hoffmann
@ 2011-01-21 10:03                                                       ` Markus Armbruster
  -1 siblings, 0 replies; 300+ messages in thread
From: Markus Armbruster @ 2011-01-21 10:03 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: Anthony Liguori, kvm, Jan Kiszka, Glauber Costa, Marcelo Tosatti,
	qemu-devel, Avi Kivity

Gerd Hoffmann <kraxel@redhat.com> writes:

> On 01/20/11 20:39, Anthony Liguori wrote:
>> On 01/20/2011 02:44 AM, Gerd Hoffmann wrote:
>>> Hi,
>>>
>>>> For (2), you cannot use bus=X,addr=Y because it makes assumptions about
>>>> the PCI topology which may change in newer -M pc's.
>>>
>>> Why should the PCI topology for 'pc' ever change?
>>>
>>> We'll probably get q35 support some day, but when this lands I expect
>>> we'll see a new machine type 'q35', so '-M q35' will pick the ich9
>>> chipset (which will have a different pci topology of course) and '-M
>>> pc' will pick the existing piix chipset (which will continue to look
>>> like it looks today).
>>
>> But then what's the default machine type? When I say -M pc, I really
>> mean the default machine.
>
> I'd tend to leave pc as default for a release cycle or two so we can
> hash out issues with q35, then flip the default once it got broader
> testing and runs stable.
>
>> At some point, "qemu-system-x86_64 -device virtio-net-pci,addr=2.0"
>>
>> Is not going to be a reliable way to invoke qemu because there's no way
>> we can guarantee that slot 2 isn't occupied by a chipset device or some
>> other default device.
>
> Indeed.  But qemu -M pc should continue to work though.  'pc' would
> better named 'piix3', but renaming it now is probably not worth the
> trouble.

We mustn't change pc-0.14 & friends.  We routinely change pc, but
whether an upgrade to q35 qualifies as a routine change is debatable.

If you don't want PCI topology (and more) to change across QEMU updates,
consider using the versioned machine types.
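
A minimal sketch of why the versioned names are stable while 'pc' is
not, assuming 'pc' is merely an alias that moves to the newest versioned
type.  The MachineType struct, the table contents and find_machine() are
hypothetical and only illustrate the lookup; they are not QEMU's actual
machine registration code:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical machine description: a fixed name plus an optional alias. */
typedef struct MachineType {
    const char *name;     /* versioned name, never changes once released */
    const char *alias;    /* moving alias such as "pc", or NULL */
    const char *chipset;  /* what the machine is built around */
} MachineType;

/* Only the newest entry carries the "pc" alias; older ones stay frozen. */
static const MachineType machine_types[] = {
    { "pc-0.14", "pc", "piix" },
    { "pc-0.13", NULL, "piix" },
    { "pc-0.12", NULL, "piix" },
};

/* Resolve a -M argument by exact name first, then by alias. */
const MachineType *find_machine(const char *name)
{
    size_t i;
    size_t n = sizeof(machine_types) / sizeof(machine_types[0]);

    for (i = 0; i < n; i++) {
        const MachineType *m = &machine_types[i];

        if (strcmp(m->name, name) == 0) {
            return m;
        }
        if (m->alias != NULL && strcmp(m->alias, name) == 0) {
            return m;
        }
    }
    return NULL;   /* unknown machine type */
}
```

In this model a later release would add a new first entry and move the
alias there, so 'pc' resolves differently while 'pc-0.14' keeps
resolving to the same frozen entry.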

^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
  2011-01-21  8:46                                     ` Gerd Hoffmann
@ 2011-01-21 10:05                                       ` Markus Armbruster
  -1 siblings, 0 replies; 300+ messages in thread
From: Markus Armbruster @ 2011-01-21 10:05 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: Blue Swirl, kvm, Jan Kiszka, Glauber Costa, Marcelo Tosatti,
	qemu-devel, Anthony Liguori, Avi Kivity

Gerd Hoffmann <kraxel@redhat.com> writes:

>   Hi,
>
>> By the way, we don't have a QEMUState but instead use globals.
>
> /me wants to underline this.
>
> IMO it is absolutely pointless to worry about ways to pass around
> kvm_state.  There never ever will be a serious need for that.
>
> We can stick with the current model of keeping global state in global
> variables.  And just do the same with kvm_state.
>
> Or we can move to have all state in a QEMUState struct which we'll
> pass around basically everywhere.  Then we can simply embed or
> reference kvm_state there.
>
> I'd tend to stick with the global variables as I don't see the point
> in having a QEMUstate.  I doubt we'll ever see two virtual machines
> driven by a single qemu process.  YMMV.

/me grabs the fat magic marker and underlines some more.

^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
  2011-01-21  8:46                                     ` Gerd Hoffmann
@ 2011-01-21 16:37                                       ` Blue Swirl
  -1 siblings, 0 replies; 300+ messages in thread
From: Blue Swirl @ 2011-01-21 16:37 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: Jan Kiszka, Anthony Liguori, Markus Armbruster, kvm,
	Glauber Costa, Marcelo Tosatti, qemu-devel, Avi Kivity

On Fri, Jan 21, 2011 at 8:46 AM, Gerd Hoffmann <kraxel@redhat.com> wrote:
>  Hi,
>
>> By the way, we don't have a QEMUState but instead use globals.
>
> /me wants to underline this.
>
> IMO it is absolutely pointless to worry about ways to pass around kvm_state.
>  There never ever will be a serious need for that.
>
> We can stick with the current model of keeping global state in global
> variables.  And just do the same with kvm_state.
>
> Or we can move to have all state in a QEMUState struct which we'll pass
> around basically everywhere.  Then we can simply embed or reference
> kvm_state there.
>
> I'd tend to stick with the global variables as I don't see the point in
> having a QEMUstate.  I doubt we'll ever see two virtual machines driven by a
> single qemu process.  YMMV.

Global variables are signs of a poor design. QEMUState would not help
with that; instead, more specific structures should be designed, much
like what I've proposed for KVMState. Some of these new structures
should even be passed around when it makes sense.

But I would not base a kvm_state redesign on global variables or QEMUState.

^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
  2011-01-21 16:37                                       ` Blue Swirl
@ 2011-01-21 17:21                                         ` Jan Kiszka
  -1 siblings, 0 replies; 300+ messages in thread
From: Jan Kiszka @ 2011-01-21 17:21 UTC (permalink / raw)
  To: Blue Swirl
  Cc: Gerd Hoffmann, Anthony Liguori, Markus Armbruster, kvm,
	Glauber Costa, Marcelo Tosatti, qemu-devel, Avi Kivity

On 2011-01-21 17:37, Blue Swirl wrote:
> On Fri, Jan 21, 2011 at 8:46 AM, Gerd Hoffmann <kraxel@redhat.com> wrote:
>>  Hi,
>>
>>> By the way, we don't have a QEMUState but instead use globals.
>>
>> /me wants to underline this.
>>
>> IMO it is absolutely pointless to worry about ways to pass around kvm_state.
>>  There never ever will be a serious need for that.
>>
>> We can stick with the current model of keeping global state in global
>> variables.  And just do the same with kvm_state.
>>
>> Or we can move to have all state in a QEMUState struct which we'll pass
>> around basically everywhere.  Then we can simply embed or reference
>> kvm_state there.
>>
>> I'd tend to stick with the global variables as I don't see the point in
>> having a QEMUstate.  I doubt we'll ever see two virtual machines driven by a
>> single qemu process.  YMMV.
> 
> Global variables are signs of a poor design.

s/are/can be/.

> QEMUState would not help
> that, instead more specific structures should be designed, much like
> what I've proposed for KVMState. Some of these new structures should
> be even passed around when it makes sense.
> 
> But I'd not start kvm_state redesign around global variables or QEMUState.

We do not need to move individual fields yet, but we need to define
classes of fields and strategies for dealing with them long-term. Then
we can move forward, and in the right direction from the start.

Obvious classes are
 - static host capabilities and means for the KVM core to query them
 - per-VM fields
 - fields related to memory management

And we now need at least a plan for the second class to proceed with the
actual job.
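
A minimal sketch of such a split, assuming one structure per class.  The
field names are loosely modeled on what KVMState carries today, but the
three struct names, SKETCH_MAX_SLOTS and the helper functions are
hypothetical and only illustrate the classification:

```c
#include <stdbool.h>
#include <stdint.h>

/* Class 1: static host capabilities, probed once at init time. */
typedef struct KVMHostCaps {
    bool coalesced_mmio;
    bool vcpu_events;
    bool robust_singlestep;
} KVMHostCaps;

/* Class 2: per-VM fields. */
typedef struct KVMVMState {
    int vmfd;
    bool irqchip_in_kernel;
    bool pit_in_kernel;
} KVMVMState;

/* Class 3: fields related to memory management. */
#define SKETCH_MAX_SLOTS 32

typedef struct KVMMemSlot {
    uint64_t start_addr;
    uint64_t memory_size;   /* 0 means the slot is unused */
} KVMMemSlot;

typedef struct KVMMemState {
    KVMMemSlot slots[SKETCH_MAX_SLOTS];
    bool migration_log;
} KVMMemState;

/* The KVM core owns the capabilities; everyone else asks via accessors. */
static KVMHostCaps host_caps;

void kvm_core_set_host_caps(const KVMHostCaps *probed)
{
    host_caps = *probed;
}

bool kvm_core_has_vcpu_events(void)
{
    return host_caps.vcpu_events;
}

/* Memory management only ever touches its own structure. */
int kvm_mem_used_slots(const KVMMemState *mem)
{
    int i, used = 0;

    for (i = 0; i < SKETCH_MAX_SLOTS; i++) {
        if (mem->slots[i].memory_size != 0) {
            used++;
        }
    }
    return used;
}
```

The open question is then only who holds the KVMVMState instance and how
devices get at it.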

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
  2011-01-21 17:21                                         ` Jan Kiszka
@ 2011-01-21 18:04                                           ` Blue Swirl
  -1 siblings, 0 replies; 300+ messages in thread
From: Blue Swirl @ 2011-01-21 18:04 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Gerd Hoffmann, Anthony Liguori, Markus Armbruster, kvm,
	Glauber Costa, Marcelo Tosatti, qemu-devel, Avi Kivity

On Fri, Jan 21, 2011 at 5:21 PM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
> On 2011-01-21 17:37, Blue Swirl wrote:
>> On Fri, Jan 21, 2011 at 8:46 AM, Gerd Hoffmann <kraxel@redhat.com> wrote:
>>>  Hi,
>>>
>>>> By the way, we don't have a QEMUState but instead use globals.
>>>
>>> /me wants to underline this.
>>>
>>> IMO it is absolutely pointless to worry about ways to pass around kvm_state.
>>>  There never ever will be a serious need for that.
>>>
>>> We can stick with the current model of keeping global state in global
>>> variables.  And just do the same with kvm_state.
>>>
>>> Or we can move to have all state in a QEMUState struct which we'll pass
>>> around basically everywhere.  Then we can simply embed or reference
>>> kvm_state there.
>>>
>>> I'd tend to stick with the global variables as I don't see the point in
>>> having a QEMUstate.  I doubt we'll ever see two virtual machines driven by a
>>> single qemu process.  YMMV.
>>
>> Global variables are signs of a poor design.
>
> s/are/can be/.
>
>> QEMUState would not help
>> that, instead more specific structures should be designed, much like
>> what I've proposed for KVMState. Some of these new structures should
>> be even passed around when it makes sense.
>>
>> But I'd not start kvm_state redesign around global variables or QEMUState.
>
> We do not need to move individual fields yet, but we need to define
> classes of fields and strategies how to deal with them long-term. Then
> we can move forward, and that already in the right direction.

Excellent plan.

> Obvious classes are
>  - static host capabilities and means for the KVM core to query them

OK. There could be other host capabilities here in the future too,
like Xen. I don't think there are any Xen capabilities ATM, but IIRC
some recently sent patches had something like that.

>  - per-VM fields

What is per-VM which is not machine or CPU architecture specific?

>  - fields related to memory management

OK.

I'd add a fourth possible class:
 - device, CPU and machine configuration, like nographic,
   win2k_install_hack, no_hpet, smp_cpus etc. Maybe irqchip_in_kernel
   could also fit here, though it obviously depends on a host
   capability too.
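
A minimal sketch of that class, assuming the existing globals were
gathered into one configuration structure.  MachineConfig,
want_kernel_irqchip and effective_irqchip_in_kernel() are hypothetical
names; the point is only that the irqchip setting is the user's request
combined with what the host can do:

```c
#include <stdbool.h>

/* Hypothetical home for settings that are plain globals today. */
typedef struct MachineConfig {
    bool nographic;
    bool win2k_install_hack;
    bool no_hpet;
    int smp_cpus;
    bool want_kernel_irqchip;   /* what the user asked for */
} MachineConfig;

/*
 * The effective setting is configuration AND host capability:
 * a request for the in-kernel irqchip is honored only if the host has it.
 */
bool effective_irqchip_in_kernel(const MachineConfig *cfg,
                                 bool host_has_irqchip)
{
    return cfg->want_kernel_irqchip && host_has_irqchip;
}
```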

^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
  2011-01-21 18:04                                           ` Blue Swirl
@ 2011-01-21 18:17                                             ` Jan Kiszka
  -1 siblings, 0 replies; 300+ messages in thread
From: Jan Kiszka @ 2011-01-21 18:17 UTC (permalink / raw)
  To: Blue Swirl
  Cc: Gerd Hoffmann, Anthony Liguori, Markus Armbruster, kvm,
	Glauber Costa, Marcelo Tosatti, qemu-devel, Avi Kivity

On 2011-01-21 19:04, Blue Swirl wrote:
> On Fri, Jan 21, 2011 at 5:21 PM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>> On 2011-01-21 17:37, Blue Swirl wrote:
>>> On Fri, Jan 21, 2011 at 8:46 AM, Gerd Hoffmann <kraxel@redhat.com> wrote:
>>>>  Hi,
>>>>
>>>>> By the way, we don't have a QEMUState but instead use globals.
>>>>
>>>> /me wants to underline this.
>>>>
>>>> IMO it is absolutely pointless to worry about ways to pass around kvm_state.
>>>>  There never ever will be a serious need for that.
>>>>
>>>> We can stick with the current model of keeping global state in global
>>>> variables.  And just do the same with kvm_state.
>>>>
>>>> Or we can move to have all state in a QEMUState struct which we'll pass
>>>> around basically everywhere.  Then we can simply embed or reference
>>>> kvm_state there.
>>>>
>>>> I'd tend to stick with the global variables as I don't see the point in
>>>> having a QEMUstate.  I doubt we'll ever see two virtual machines driven by a
>>>> single qemu process.  YMMV.
>>>
>>> Global variables are signs of a poor design.
>>
>> s/are/can be/.
>>
>>> QEMUState would not help
>>> that, instead more specific structures should be designed, much like
>>> what I've proposed for KVMState. Some of these new structures should
>>> be even passed around when it makes sense.
>>>
>>> But I'd not start kvm_state redesign around global variables or QEMUState.
>>
>> We do not need to move individual fields yet, but we need to define
>> classes of fields and strategies how to deal with them long-term. Then
>> we can move forward, and that already in the right direction.
> 
> Excellent plan.
> 
>> Obvious classes are
>>  - static host capabilities and means for the KVM core to query them
> 
> OK. There could be other host capabilities here in the future too,
> like Xen. I don't think there are any Xen capabilities ATM though but
> IIRC some recently sent patches had something like those.
> 
>>  - per-VM fields
> 
> What is per-VM which is not machine or CPU architecture specific?

I think it would suffice for a first step to consider all per-VM fields
as independent of CPU architecture or machine type.

> 
>>  - fields related to memory management
> 
> OK.
> 
> I'd add fourth possible class:
>  - device, CPU and machine configuration, like nographic,
> win2k_install_hack, no_hpet, smp_cpus etc. Maybe also
> irqchip_in_kernel could fit here, though it obviously depends on a
> host capability too.

I would count everything that cannot be assigned to a concrete device
up front as part of the dynamic state of a machine, thus class 2. The
point is that (potentially) every device of that machine requires access
to it, just as it does (indirectly, via the KVM core services) to some
KVM VM state bits.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
  2011-01-21 18:17                                             ` Jan Kiszka
@ 2011-01-21 18:49                                               ` Blue Swirl
  -1 siblings, 0 replies; 300+ messages in thread
From: Blue Swirl @ 2011-01-21 18:49 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Gerd Hoffmann, Anthony Liguori, Markus Armbruster, kvm,
	Glauber Costa, Marcelo Tosatti, qemu-devel, Avi Kivity

On Fri, Jan 21, 2011 at 6:17 PM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
> On 2011-01-21 19:04, Blue Swirl wrote:
>> On Fri, Jan 21, 2011 at 5:21 PM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>>> On 2011-01-21 17:37, Blue Swirl wrote:
>>>> On Fri, Jan 21, 2011 at 8:46 AM, Gerd Hoffmann <kraxel@redhat.com> wrote:
>>>>>  Hi,
>>>>>
>>>>>> By the way, we don't have a QEMUState but instead use globals.
>>>>>
>>>>> /me wants to underline this.
>>>>>
>>>>> IMO it is absolutely pointless to worry about ways to pass around kvm_state.
>>>>>  There never ever will be a serious need for that.
>>>>>
>>>>> We can stick with the current model of keeping global state in global
>>>>> variables.  And just do the same with kvm_state.
>>>>>
>>>>> Or we can move to have all state in a QEMUState struct which we'll pass
>>>>> around basically everywhere.  Then we can simply embed or reference
>>>>> kvm_state there.
>>>>>
>>>>> I'd tend to stick with the global variables as I don't see the point in
>>>>> having a QEMUstate.  I doubt we'll ever see two virtual machines driven by a
>>>>> single qemu process.  YMMV.
>>>>
>>>> Global variables are signs of a poor design.
>>>
>>> s/are/can be/.
>>>
>>>> QEMUState would not help
>>>> that, instead more specific structures should be designed, much like
>>>> what I've proposed for KVMState. Some of these new structures should
>>>> be even passed around when it makes sense.
>>>>
>>>> But I'd not start kvm_state redesign around global variables or QEMUState.
>>>
>>> We do not need to move individual fields yet, but we need to define
>>> classes of fields and strategies how to deal with them long-term. Then
>>> we can move forward, and that already in the right direction.
>>
>> Excellent plan.
>>
>>> Obvious classes are
>>>  - static host capabilities and means for the KVM core to query them
>>
>> OK. There could be other host capabilities here in the future too,
>> like Xen. I don't think there are any Xen capabilities ATM though but
>> IIRC some recently sent patches had something like those.
>>
>>>  - per-VM fields
>>
>> What is per-VM which is not machine or CPU architecture specific?
>
> I think it would suffice for a first step to consider all per-VM fields
> as independent of CPU architecture or machine type.

I'm afraid that would not be progress.

>>>  - fields related to memory management
>>
>> OK.
>>
>> I'd add fourth possible class:
>>  - device, CPU and machine configuration, like nographic,
>> win2k_install_hack, no_hpet, smp_cpus etc. Maybe also
>> irqchip_in_kernel could fit here, though it obviously depends on a
>> host capability too.
>
> I would count everything that cannot be assigned to a concrete device
> upfront to the dynamic state of a machine, thus class 2. The point is,
> (potentially) every device of that machine requires access to it, just
> like (indirectly, via the KVM core services) to some KVM VM state bits.

The machine class should not be a catch-all; it would then be just like
QEMUState or KVMState. Perhaps each field or variable should be
listed and given more thought.

^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
  2011-01-18 17:09                                     ` Anthony Liguori
@ 2011-01-24  8:45                                       ` Gleb Natapov
  -1 siblings, 0 replies; 300+ messages in thread
From: Gleb Natapov @ 2011-01-24  8:45 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Jan Kiszka, Avi Kivity, Markus Armbruster, Marcelo Tosatti,
	Glauber Costa, kvm, qemu-devel

On Tue, Jan 18, 2011 at 11:09:01AM -0600, Anthony Liguori wrote:
> >>But we also need to provide a compatible interface to management tools.
> >>Exposing the device model topology as a compatible interface
> >>artificially limits us.  It's far better to provide higher level
> >>supported interfaces to give us the flexibility to change the device
> >>model as we need to.
> >How do you want to change qdev to keep the guest and management tool
> >view stable while branching off kvm sub-buses?
> 
> The qdev device model is not a stable interface.  I think that's
> been clear from the very beginning.
> 

And what was the reason it was declared not stable? Maybe because we
were not sure we would get it right from the start, so changes would be
needed later. But those changes should bring qdev closer to reflecting
what the guest expects the device topology to look like. That will bring
us as close as possible to a stable state.  We need this knowledge and
stability in qdev for device path creation, for both kinds of device
paths: Open Firmware (OF) and the one we use for migration. To create an
OF device path we need to know the topology as seen by the guest (and the
guest does not care how the ISA bus is implemented internally inside the
south bridge); to create the device path used for migration we need
stability, otherwise a change in qdev topology will break migration. All
these artificial buses you propose to add move us in the opposite
direction and make qdev useless for anything but .... well for anything.
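
To make the dependency on topology concrete, here is a minimal sketch of
deriving a device path by walking parent links. The DemoDev type, the
demo_dev_path() helper and the node names are assumptions made up for
illustration; they are not the qdev API and not the real OF path syntax.
The point is only that any node added to or removed from the chain
changes the resulting string.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical device node: a name plus a link to its parent. */
typedef struct DemoDev {
    const char *name;
    const struct DemoDev *parent;
} DemoDev;

/* Writes "/root/.../leaf" into buf.
 * Returns the path length, or -1 if the path does not fit. */
static int demo_dev_path(const DemoDev *dev, char *buf, size_t size)
{
    int len = 0;
    int n;

    assert(dev != NULL && buf != NULL);
    if (dev->parent) {
        /* Emit the ancestors first so the root ends up leftmost. */
        len = demo_dev_path(dev->parent, buf, size);
        if (len < 0) {
            return -1;
        }
    }
    n = snprintf(buf + len, size - (size_t)len, "/%s", dev->name);
    if (n < 0 || (size_t)n >= size - (size_t)len) {
        return -1;  /* output was truncated */
    }
    return len + n;
}
```

With a chain pci -> isa -> rtc this yields "/pci/isa/rtc"; inserting an
artificial bus between pci and isa would change the path even though the
guest-visible hardware stays the same.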

--
			Gleb.

^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
  2011-01-21 18:49                                               ` Blue Swirl
@ 2011-01-24 14:08                                                 ` Jan Kiszka
  -1 siblings, 0 replies; 300+ messages in thread
From: Jan Kiszka @ 2011-01-24 14:08 UTC (permalink / raw)
  To: Blue Swirl
  Cc: Gerd Hoffmann, Anthony Liguori, Markus Armbruster, kvm,
	Glauber Costa, Marcelo Tosatti, qemu-devel, Avi Kivity

On 2011-01-21 19:49, Blue Swirl wrote:
>>> I'd add fourth possible class:
>>>  - device, CPU and machine configuration, like nographic,
>>> win2k_install_hack, no_hpet, smp_cpus etc. Maybe also
>>> irqchip_in_kernel could fit here, though it obviously depends on a
>>> host capability too.
>>
>> I would count everything that cannot be assigned to a concrete device
>> upfront to the dynamic state of a machine, thus class 2. The point is,
>> (potentially) every device of that machine requires access to it, just
>> like (indirectly, via the KVM core services) to some KVM VM state bits.
> 
> The machine class should not be a catch-all, it would be like
> QEMUState or KVMState then. Perhaps each field or variable should be
> listed and given more thought.

Let's start with what is most urgent:

 - vmfd: file descriptor required for any KVM request that has VM scope
   (in-kernel device creation, device state synchronizations, IRQ
   routing etc.)
 - irqchip_in_kernel: VM uses in-kernel irqchip acceleration
   (some devices will have to adjust their behavior depending on this)

pit_in_kernel would be analogous to irqchip_in_kernel, but it is also
conceptually x86-only (irqchip_in_kernel is only used by x86 so far, but
not tied to it), and it is not mandatory for a first round of KVM devices
for upstream.
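
As a rough illustration of the two fields listed above, they could be
grouped as follows. This is only a sketch under assumptions: the
KVMVMState name and both helpers are hypothetical and do not come from
the QEMU or qemu-kvm sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-VM state holding only the two "urgent" fields. */
typedef struct KVMVMState {
    int vmfd;                /* fd for VM-scope KVM requests, -1 if unset */
    bool irqchip_in_kernel;  /* VM uses in-kernel irqchip acceleration */
} KVMVMState;

static void kvm_vm_state_init(KVMVMState *s, int vmfd, bool irqchip_in_kernel)
{
    assert(s != NULL);
    s->vmfd = vmfd;
    s->irqchip_in_kernel = irqchip_in_kernel;
}

/* A device would branch on this helper instead of reading globals. */
static bool kvm_vm_uses_kernel_irqchip(const KVMVMState *s)
{
    return s != NULL && s->vmfd >= 0 && s->irqchip_in_kernel;
}
```

A device that holds a pointer to such a struct can adjust its behavior
without knowing anything else about the KVM core.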

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
  2011-01-24 14:08                                                 ` Jan Kiszka
@ 2011-01-24 21:35                                                   ` Blue Swirl
  -1 siblings, 0 replies; 300+ messages in thread
From: Blue Swirl @ 2011-01-24 21:35 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Gerd Hoffmann, Anthony Liguori, Markus Armbruster, kvm,
	Glauber Costa, Marcelo Tosatti, qemu-devel, Avi Kivity

On Mon, Jan 24, 2011 at 2:08 PM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
> On 2011-01-21 19:49, Blue Swirl wrote:
>>>> I'd add fourth possible class:
>>>>  - device, CPU and machine configuration, like nographic,
>>>> win2k_install_hack, no_hpet, smp_cpus etc. Maybe also
>>>> irqchip_in_kernel could fit here, though it obviously depends on a
>>>> host capability too.
>>>
>>> I would count everything that cannot be assigned to a concrete device
>>> upfront to the dynamic state of a machine, thus class 2. The point is,
>>> (potentially) every device of that machine requires access to it, just
>>> like (indirectly, via the KVM core services) to some KVM VM state bits.
>>
>> The machine class should not be a catch-all, it would be like
>> QEMUState or KVMState then. Perhaps each field or variable should be
>> listed and given more thought.
>
> Let's start with what is most urgent:
>
>  - vmfd: file descriptor required for any KVM request that has VM scope
>   (in-kernel device creation, device state synchronizations, IRQ
>   routing etc.)

I'd say VM state.

>  - irqchip_in_kernel: VM uses in-kernel irqchip acceleration
>   (some devices will have to adjust their behavior depending on this)

Since the QEMU version is useless here, I peeked at the qemu-kvm version.

There are a lot of lines like:
if (kvm_enabled() && !kvm_irqchip_in_kernel())
    kvm_just_do_it();

Perhaps these would be cleaner with stub functions.

The device cases are obvious: the devices need a flag, passed to them
by pc.c, which combines kvm_enabled() && kvm_irqchip_in_kernel(). This
gets stored in device state.

But the exec.c case, where kvm_update_interrupt_request() is called, is
more interesting. CPU init could set up a function pointer to either a
stub/NULL or kvm_update_interrupt_request().

I didn't look at kvm*.c, qemu-kvm*.c or stuff in kvm/.

So I'd eliminate kvm_irqchip_in_kernel() from outside of KVM and pc.c.
The information could be stored in a MachineState, where pc.c could
grab it for device and CPU setup.
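
The device case described above could look roughly like this. It is a
sketch under assumptions: the MachineState layout, DemoPICState and both
functions are hypothetical and only illustrate latching the combined flag
in device state at creation time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical machine-wide state, filled in by the KVM core. */
typedef struct MachineState {
    bool kvm_enabled;
    bool irqchip_in_kernel;
} MachineState;

/* Hypothetical device state: the combined flag is stored once. */
typedef struct DemoPICState {
    bool kernel_irqchip;
    int irq_level;
} DemoPICState;

/* Machine init code (pc.c in this discussion) combines both conditions
 * and hands the result to the device. */
static void demo_pic_init(DemoPICState *dev, const MachineState *ms)
{
    assert(dev != NULL && ms != NULL);
    dev->kernel_irqchip = ms->kvm_enabled && ms->irqchip_in_kernel;
    dev->irq_level = 0;
}

/* The device consults its own state; no global KVM query is needed.
 * Returns true if the user-space model handled the line change. */
static bool demo_pic_set_irq(DemoPICState *dev, int level)
{
    assert(dev != NULL);
    if (dev->kernel_irqchip) {
        return false;  /* the in-kernel model owns the line */
    }
    dev->irq_level = level;
    return true;
}
```

With this shape, kvm_irqchip_in_kernel() is evaluated in one place and
the device code no longer needs to include any KVM header.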

^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
  2011-01-24 21:35                                                   ` Blue Swirl
@ 2011-01-24 21:57                                                     ` Jan Kiszka
  -1 siblings, 0 replies; 300+ messages in thread
From: Jan Kiszka @ 2011-01-24 21:57 UTC (permalink / raw)
  To: Blue Swirl
  Cc: Gerd Hoffmann, Anthony Liguori, Markus Armbruster, kvm,
	Glauber Costa, Marcelo Tosatti, qemu-devel, Avi Kivity

On 2011-01-24 22:35, Blue Swirl wrote:
> On Mon, Jan 24, 2011 at 2:08 PM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>> On 2011-01-21 19:49, Blue Swirl wrote:
>>>>> I'd add fourth possible class:
>>>>>  - device, CPU and machine configuration, like nographic,
>>>>> win2k_install_hack, no_hpet, smp_cpus etc. Maybe also
>>>>> irqchip_in_kernel could fit here, though it obviously depends on a
>>>>> host capability too.
>>>>
>>>> I would count everything that cannot be assigned to a concrete device
>>>> upfront to the dynamic state of a machine, thus class 2. The point is,
>>>> (potentially) every device of that machine requires access to it, just
>>>> like (indirectly, via the KVM core services) to some KVM VM state bits.
>>>
>>> The machine class should not be a catch-all, it would be like
>>> QEMUState or KVMState then. Perhaps each field or variable should be
>>> listed and given more thought.
>>
>> Let's start with what is most urgent:
>>
>>  - vmfd: file descriptor required for any KVM request that has VM scope
>>   (in-kernel device creation, device state synchronizations, IRQ
>>   routing etc.)
> 
> I'd say VM state.

Good. That's +1 for introducing and distributing it.

> 
>>  - irqchip_in_kernel: VM uses in-kernel irqchip acceleration
>>   (some devices will have to adjust their behavior depending on this)
> 
> Since QEMU version is useless, I peeked at qemu-kvm version.
> 
> There are a lot of lines like:
> if (kvm_enabled() && !kvm_irqchip_in_kernel())
>     kvm_just_do_it();
> 
> Perhaps these would be cleaner with stub functions.

Probably. I guess there is quite some room left for cleanups in this area.

> 
> The device cases are obvious: the devices need a flag, passed to them
> by pc.c, which combines kvm_enabled && kvm_irqchip_in_kernel(). This
> gets stored in device state.

Not all devices are instantiated only by the machine init code. Even if
we are lucky that all those we need on x86 are created that way, we
shouldn't rely on this for future use cases, including other KVM archs.

> 
> But exec.c case, where kvm_update_interrupt_request() is called, is
> more interesting. CPU init could set up function pointer to either
> stub/NULL or kvm_update_interrupt_request().
> 

Yes, callbacks are the way to go long term. Here we could also define
one for VCPU interrupt handling and set it according to the VCPU mode.
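
A minimal sketch of such a callback, chosen once at CPU init, might look
like this. All names (DemoCPU, demo_cpu_init and so on) are hypothetical;
demo_kvm_update() merely stands in for kvm_update_interrupt_request() and
only counts calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct DemoCPU DemoCPU;
typedef void (*DemoIrqRequestFn)(DemoCPU *cpu);

struct DemoCPU {
    int kvm_updates;  /* number of calls that reached the KVM hook */
    DemoIrqRequestFn update_interrupt_request;
};

/* Stub used when no KVM-side work is needed. */
static void demo_stub_update(DemoCPU *cpu)
{
    (void)cpu;
}

/* Stand-in for kvm_update_interrupt_request(). */
static void demo_kvm_update(DemoCPU *cpu)
{
    cpu->kvm_updates++;
}

/* CPU init evaluates "kvm_enabled() && !kvm_irqchip_in_kernel()" once
 * and installs the matching callback. */
static void demo_cpu_init(DemoCPU *cpu, bool kvm_enabled,
                          bool irqchip_in_kernel)
{
    assert(cpu != NULL);
    cpu->kvm_updates = 0;
    cpu->update_interrupt_request =
        (kvm_enabled && !irqchip_in_kernel) ? demo_kvm_update
                                            : demo_stub_update;
}

/* Caller side (exec.c in this discussion): just an indirect call. */
static void demo_cpu_interrupt(DemoCPU *cpu)
{
    cpu->update_interrupt_request(cpu);
}
```

Using a stub instead of NULL keeps the caller free of any check, at the
cost of one indirect call in the non-KVM case.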

> I didn't look at kvm*.c, qemu-kvm*.c or stuff in kvm/.
> 
> So I'd eliminate kvm_irqchip_in_kernel() from outside of KVM and pc.c.
> The information could be stored in a MachineState, where pc.c could
> grab it for device and CPU setup.

I still don't see how we can distribute the information to all
interested devices. It's basically the same issue as with current kvm_state.

Jan


^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
  2011-01-18 14:28                   ` Jan Kiszka
@ 2011-01-25 10:27                     ` Avi Kivity
  -1 siblings, 0 replies; 300+ messages in thread
From: Avi Kivity @ 2011-01-25 10:27 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Anthony Liguori, Markus Armbruster, Marcelo Tosatti,
	Glauber Costa, kvm, qemu-devel

On 01/18/2011 04:28 PM, Jan Kiszka wrote:
> >
> >  So we can either "infect" the whole device tree with kvm (or maybe a
> >  more generic accelerator structure that also deals with Xen) or we need
> >  to pull the reference inside the device's init function from some global
> >  service (kvm_get_state).
>
> Note that this topic is still waiting for good suggestions, specifically
> from those who believe in kvm_state references :). This is not only
> blocking kvmstate merge but will affect KVM irqchips as well.

I'm one of them, but I don't have anything better to suggest than adding
a "kvm_state" attribute to qdev, which seems mighty artificial.  So I'm in
favour of eliminating it now.

> It boils down to how we reasonably pass a kvm_state reference from
> machine init code to a sysbus device. I'm probably biased, but I don't
> see any way that does not work against the idea of confining access to
> kvm_state or breaks device instantiation from the command line or a
> config file.

I'm biased in the other direction, but I agree.
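
For reference, the "global service" alternative from the quoted text can
be sketched as below. Every name here is hypothetical (the thread only
mentions kvm_get_state as an idea); the sketch shows a device init
function pulling the reference itself, so that command-line or
config-file instantiation needs no extra property.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the KVM state object. */
typedef struct DemoKVMState {
    int vmfd;
} DemoKVMState;

/* Single global reference, set once by the KVM core during startup. */
static DemoKVMState *demo_global_kvm_state;

static void demo_kvm_set_state(DemoKVMState *s)
{
    demo_global_kvm_state = s;
}

static DemoKVMState *demo_kvm_get_state(void)
{
    return demo_global_kvm_state;
}

/* Hypothetical KVM-assisted device. */
typedef struct DemoClockDevice {
    DemoKVMState *kvm;
} DemoClockDevice;

/* Device init pulls the reference from the global service.
 * Returns 0 on success, -1 if KVM is not available. */
static int demo_clock_init(DemoClockDevice *dev)
{
    assert(dev != NULL);
    dev->kvm = demo_kvm_get_state();
    return dev->kvm ? 0 : -1;
}
```

The trade-off is the one discussed in this thread: access to the state is
no longer confined, but nothing has to be threaded through the device
tree.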

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
  2011-01-18 15:50                         ` Anthony Liguori
@ 2011-01-25 10:34                           ` Avi Kivity
  -1 siblings, 0 replies; 300+ messages in thread
From: Avi Kivity @ 2011-01-25 10:34 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Jan Kiszka, Markus Armbruster, Marcelo Tosatti, Glauber Costa,
	kvm, qemu-devel

On 01/18/2011 05:50 PM, Anthony Liguori wrote:
>> This design is in conflict with the requirement to attach KVM-assisted
>> devices also to their home bus, e.g. an assigned PCI device to the PCI
>> bus. We don't support multi-homed qdev devices.
>
> The bus topology reflects how I/O flows in and out of a device.  We do 
> not model a perfect PC bus architecture and I don't think we ever 
> intend to.  Instead, we model a functional architecture.

A KVM bus is far from a functional architecture.  It simply departs from
both what a real PC looks like and what a KVM PC looks like to a guest,
for no reason except to force a particular object model.

It's completely artificial.  If kvm were not behind the kernel/user 
interface, and an unchangeable ABI, we'd refactor it.  However, we can't 
refactor it, and you're trying to warp the device model to adapt to this 
design problem instead of working around it.  You're elevating a kink 
into an architectural feature.

>
> I/O from an assigned device does not flow through the emulated PCI 
> bus.  Therefore, it does not belong on the emulated PCI bus.

Yes it does.  Config space and some MMIO flow through qemu and the
emulated PCI bus.  So do things like hotplug/hotunplug.


>
> Assigned devices need to interact with the emulated PCI bus, but they 
> shouldn't be children of it.
>

What's the difference, from the guest's point of view, between an
assigned RTL8139 card and an emulated RTL8139 card?

If you believe there is no difference, what better way to model this
than to implement them the same way using the same interfaces?

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
@ 2011-01-25 10:34                           ` Avi Kivity
  0 siblings, 0 replies; 300+ messages in thread
From: Avi Kivity @ 2011-01-25 10:34 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: kvm, Jan Kiszka, Glauber Costa, Marcelo Tosatti,
	Markus Armbruster, qemu-devel

On 01/18/2011 05:50 PM, Anthony Liguori wrote:
>> This design is in conflict with the requirement to attach KVM-assisted
>> devices also to their home bus, e.g. an assigned PCI device to the PCI
>> bus. We don't support multi-homed qdev devices.
>
> The bus topology reflects how I/O flows in and out of a device.  We do 
> not model a perfect PC bus architecture and I don't think we ever 
> intend to.  Instead, we model a functional architecture.

A KVM bus is far from a function architecture.  It simply departs from 
both what a real PC looks like, and what a KVM PC looks like to a guest, 
for no reason except to force a particular object model.

It's completely artificial.  If kvm were not behind the kernel/user 
interface, and an unchangeable ABI, we'd refactor it.  However, we can't 
refactor it, and you're trying to warp the device model to adapt to this 
design problem instead of working around it.  You're elevating a kink 
into an architectural feature.

>
> I/O from an assigned device does not flow through the emulated PCI 
> bus.  Therefore, it does not belong on the emulated PCI bus.

Yes it does.  Config space and some mmio accesses flow through qemu and
the emulated PCI bus.  So do things like hotplug/hotunplug.


>
> Assigned devices need to interact with the emulated PCI bus, but they 
> shouldn't be children of it.
>

What's the difference, from the guest's point of view, between an
assigned RTL8139 card and an emulated RTL8139 card?

If you believe there is no difference, what better way to model this
than to implement them the same way, using the same interfaces?

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
  2011-01-19 16:57                             ` [Qemu-devel] " Anthony Liguori
@ 2011-01-25 11:06                               ` Avi Kivity
  -1 siblings, 0 replies; 300+ messages in thread
From: Avi Kivity @ 2011-01-25 11:06 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Markus Armbruster, Jan Kiszka, kvm, Glauber Costa,
	Marcelo Tosatti, qemu-devel

On 01/19/2011 06:57 PM, Anthony Liguori wrote:
> On 01/19/2011 07:15 AM, Markus Armbruster wrote:
>> So they interact with KVM (need kvm_state), and they interact with the
>> emulated PCI bus.  Could you elaborate on the fundamental difference
>> between the two interactions that makes you choose the (hypothetical)
>> KVM bus over the PCI bus as device parent?
>
> It's almost arbitrary, but I would say it's the direction that I/Os flow.
>

In the case of kvm, things are somewhat misleading.  I/O still flows
through the (virtual) PCI bus; it's just short-circuited to a real
device.  Similarly, when attaching an ioeventfd to a virtio kick
register, things still logically flow the same way as without ioeventfd;
we simply add a fast path for the operation.  But it doesn't change the
logical view of things.
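
As a rough illustration of that point, here is a minimal sketch in plain C.
It is only a model: KickRegister, bus_write and count_kick are hypothetical
names, and the fast_path flag merely stands in for "an ioeventfd is
attached" (no real eventfd or KVM call is made).  Both paths end in the same
handler, so the device's logical place on the bus does not change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one device register on an emulated bus. */
typedef struct KickRegister {
    void (*handler)(void *opaque, uint32_t val); /* the device's kick handler */
    void *opaque;
    bool fast_path;      /* stand-in for "an ioeventfd is attached" */
    unsigned slow_exits; /* writes that took the full emulation path */
} KickRegister;

/*
 * Guest write to the register.  The slow path models the exit to userspace
 * and the bus dispatch; the fast path skips that cost.  Either way the same
 * handler runs, so the logical view of the device is unchanged.
 */
static void bus_write(KickRegister *reg, uint32_t val)
{
    assert(reg != NULL && reg->handler != NULL);
    if (!reg->fast_path) {
        reg->slow_exits++;
    }
    reg->handler(reg->opaque, val);
}

/* Example handler: counts how many kicks the device has seen. */
static void count_kick(void *opaque, uint32_t val)
{
    (void)val;
    ++*(unsigned *)opaque;
}
```

With one write on each path, the handler sees two kicks while only one of
them pays for the slow exit.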

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
  2011-01-20 21:22                                     ` Jan Kiszka
@ 2011-01-25 11:10                                       ` Avi Kivity
  -1 siblings, 0 replies; 300+ messages in thread
From: Avi Kivity @ 2011-01-25 11:10 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Blue Swirl, Anthony Liguori, Markus Armbruster, kvm,
	Glauber Costa, Marcelo Tosatti, qemu-devel

On 01/20/2011 11:22 PM, Jan Kiszka wrote:
> On 2011-01-20 20:27, Blue Swirl wrote:
> >  On Thu, Jan 20, 2011 at 9:33 AM, Jan Kiszka<jan.kiszka@siemens.com>  wrote:
> >>  On 2011-01-19 20:32, Blue Swirl wrote:
> >>>  On Wed, Jan 19, 2011 at 4:57 PM, Anthony Liguori
> >>>  <aliguori@linux.vnet.ibm.com>  wrote:
> >>>>  On 01/19/2011 07:15 AM, Markus Armbruster wrote:
> >>>>>
> >>>>>  So they interact with KVM (need kvm_state), and they interact with the
> >>>>>  emulated PCI bus.  Could you elaborate on the fundamental difference
> >>>>>  between the two interactions that makes you choose the (hypothetical)
> >>>>>  KVM bus over the PCI bus as device parent?
> >>>>>
> >>>>
> >>>>  It's almost arbitrary, but I would say it's the direction that I/Os flow.
> >>>>
> >>>>  But if the underlying observation is that the device tree is not really a
> >>>>  tree, you're 100% correct.  This is part of why a factory interface that
> >>>>  just takes a parent bus is too simplistic.
> >>>>
> >>>>  I think we ought to introduce a -pci-device option that is specifically for
> >>>>  creating PCI devices that doesn't require a parent bus argument but provides
> >>>>  a way to specify stable addressing (for instance, using a linear index).
> >>>
> >>>  I think kvm_state should not be a property of any device or bus. It
> >>>  should be split to more logical pieces.
> >>>
> >>>  Some parts of it could remain in CPUState, because they are associated
> >>>  with a VCPU.
> >>>
> >>>  Also, for example irqfd could be considered to be similar object to
> >>>  char or block devices provided by QEMU to devices. Would it make sense
> >>>  to introduce new host types for passing parts of kvm_state to devices?
> >>>
> >>>  I'd also make coalesced MMIO stuff part of memory object. We are not
> >>>  passing any state references when using cpu_physical_memory_rw(), but
> >>>  that could be changed.
> >>
> >>  There are currently no VCPU-specific bits remaining in kvm_state.
> >
> >  I think fields vcpu_events, robust_singlestep, debugregs,
> >  kvm_sw_breakpoints, xsave, xcrs belong to CPUX86State. They may be the
> >  same for all VCPUs but still they are sort of CPU properties. I'm not
> >  sure about fd field.
>
> They are all properties of the currently loaded KVM subsystem in the
> host kernel. They can't change while KVM's root fd is opened.
> Replicating this static information into each and every VCPU state would
> be crazy.

Perhaps they should be renamed to have_xsave or features.xsave, and be 
made bools, to improve readability.
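
A minimal sketch of what that could look like, assuming a features
sub-struct that is probed once at init.  The struct name, the CAP_* numbers
and check_extension() are placeholders for illustration only; in QEMU the
probe would go through kvm_check_extension() with the real KVM_CAP_*
constants.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder capability numbers; real values come from <linux/kvm.h>. */
enum { CAP_XSAVE = 1, CAP_XCRS = 2, CAP_DEBUGREGS = 3 };

/* Host-wide feature flags: static for as long as KVM's root fd is open. */
struct kvm_features {
    bool xsave;
    bool xcrs;
    bool debugregs;
};

/* Stand-in for kvm_check_extension(): returns 1 if cap is in the list. */
static int check_extension(const int *caps, size_t n, int cap)
{
    assert(caps != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (caps[i] == cap) {
            return 1;
        }
    }
    return 0;
}

/* Probe once and store plain bools instead of raw ioctl return values. */
static struct kvm_features probe_features(const int *caps, size_t n)
{
    struct kvm_features f = {
        .xsave     = check_extension(caps, n, CAP_XSAVE) > 0,
        .xcrs      = check_extension(caps, n, CAP_XCRS) > 0,
        .debugregs = check_extension(caps, n, CAP_DEBUGREGS) > 0,
    };
    return f;
}
```

A caller would then test features.xsave directly, which reads as a yes/no
question about the host rather than as per-VCPU state.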

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
  2011-01-25 10:27                     ` Avi Kivity
@ 2011-01-25 13:58                       ` Anthony Liguori
  -1 siblings, 0 replies; 300+ messages in thread
From: Anthony Liguori @ 2011-01-25 13:58 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Jan Kiszka, kvm, Glauber Costa, Marcelo Tosatti,
	Markus Armbruster, qemu-devel, Anthony Liguori

On 01/25/2011 04:27 AM, Avi Kivity wrote:
>> It boils down to how we reasonably pass a kvm_state reference from
>> machine init code to a sysbus device. I'm probably biased, but I don't
>> see any way that does not work against the idea of confining access to
>> kvm_state or breaks device instantiation from the command line or a
>> config file.
>
> I'm biased in the other direction, but I agree.

Just #include "kvm.h" and reference the global kvm_state once in the
initfn.  We don't have to solve this problem yet.  References to the
global kvm_state become placeholders for where things need to be fixed up.
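
Roughly, that pattern might look like the sketch below.  Everything here is
a stand-in so the example compiles on its own: the KVMState layout, the
KVMClockState struct and kvmclock_init() are hypothetical, and the global is
defined locally where QEMU would get its declaration from kvm.h.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for QEMU's KVMState. */
typedef struct KVMState {
    int fd;
} KVMState;

/* Declared in kvm.h in QEMU; defined here so the sketch is self-contained. */
KVMState *kvm_state;

/* Hypothetical device state that caches the reference. */
typedef struct KVMClockState {
    KVMState *kvm;
    unsigned long long clock;
} KVMClockState;

/* Hypothetical initfn: the single place that touches the global. */
static int kvmclock_init(KVMClockState *s)
{
    if (kvm_state == NULL) {
        return -1; /* KVM not initialized */
    }
    /* Placeholder: replace once a way to pass the reference in exists. */
    s->kvm = kvm_state;
    s->clock = 0;
    return 0;
}

/* All later code uses the cached pointer, never the global. */
static int kvmclock_vm_fd(const KVMClockState *s)
{
    assert(s != NULL && s->kvm != NULL);
    return s->kvm->fd;
}
```

Keeping the global access confined to the initfn means a later fix-up only
has to change that one assignment.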

Regards,

Anthony Liguori



^ permalink raw reply	[flat|nested] 300+ messages in thread

* Re: [Qemu-devel] [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state
  2011-01-25 11:06                               ` Avi Kivity
@ 2011-01-25 14:30                                 ` Anthony Liguori
  -1 siblings, 0 replies; 300+ messages in thread
From: Anthony Liguori @ 2011-01-25 14:30 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Markus Armbruster, Jan Kiszka, kvm, Glauber Costa,
	Marcelo Tosatti, qemu-devel

On 01/25/2011 05:06 AM, Avi Kivity wrote:
> On 01/19/2011 06:57 PM, Anthony Liguori wrote:
>> On 01/19/2011 07:15 AM, Markus Armbruster wrote:
>>> So they interact with KVM (need kvm_state), and they interact with the
>>> emulated PCI bus.  Could you elaborate on the fundamental difference
>>> between the two interactions that makes you choose the (hypothetical)
>>> KVM bus over the PCI bus as device parent?
>>
>> It's almost arbitrary, but I would say it's the direction that I/Os 
>> flow.
>>
>
> In the case of kvm, things are somewhat misleading.  I/O still flows 
> through the (virtual) PCI bus, it's just short-circuited to a real device.

It doesn't.  If we have a PCI bus that transforms I/O or remaps I/O via
an IOMMU, the assigned device doesn't participate in it.

But this whole discussion is way off track.

We don't have to solve any of these problems today.  Just don't remove
kvm_state and grab a global reference to it when we need to (which is
*at most* one place in the code today) and let's move on with our lives.

Regards,

Anthony Liguori

>   Similarly when attaching an ioeventfd to a virtio kick register,
> things still logically flow the same way as without ioeventfd; we
> simply add a fast path for the operation.  But it doesn't change the
> logical view of things.
>


^ permalink raw reply	[flat|nested] 300+ messages in thread

* [PATCH] kvm: x86: Fix build in absence of KVM_CAP_ASYNC_PF
  2011-01-06 17:56 ` [Qemu-devel] " Marcelo Tosatti
@ 2011-01-27 14:39   ` Jan Kiszka
  -1 siblings, 0 replies; 300+ messages in thread
From: Jan Kiszka @ 2011-01-27 14:39 UTC (permalink / raw)
  To: Marcelo Tosatti, Anthony Liguori; +Cc: qemu-devel, kvm

Reported by Stefan Hajnoczi.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---

Build regression of "Only read/write MSR_KVM_ASYNC_PF_EN if supported".
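
For illustration, the guard pattern used by this fix can be sketched in
isolation as follows.  This is a standalone model, not the QEMU code:
record_para_features() is a hypothetical helper, and the macros are defined
locally (with a placeholder bit number) where the real ones come from the
kernel headers.  With the capability macro absent, the feature constant is
never referenced and the flag simply stays false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Uncomment to model kernel headers that provide async page faults. */
/* #define KVM_CAP_ASYNC_PF 1 */

#ifdef KVM_CAP_ASYNC_PF
/* Placeholder bit number; the real one is defined by the kernel headers. */
#define KVM_FEATURE_ASYNC_PF 4
#endif

static bool has_msr_async_pf_en;

/* Hypothetical helper: records the async PF flag only if headers know it. */
static uint32_t record_para_features(uint32_t features)
{
#ifdef KVM_CAP_ASYNC_PF
    assert(KVM_FEATURE_ASYNC_PF < 32); /* shift must stay within 32 bits */
    has_msr_async_pf_en = features & (1u << KVM_FEATURE_ASYNC_PF);
#endif
    return features;
}
```

Built against old headers, the function compiles and leaves
has_msr_async_pf_en at its default of false, so the MSR is never touched.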

 target-i386/kvm.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 8e8880a..05010bb 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -167,7 +167,9 @@ static int get_para_features(CPUState *env)
             features |= (1 << para_features[i].feature);
         }
     }
+#ifdef KVM_CAP_ASYNC_PF
     has_msr_async_pf_en = features & (1 << KVM_FEATURE_ASYNC_PF);
+#endif
     return features;
 }
 #endif
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 300+ messages in thread

end of thread, other threads:[~2011-01-27 14:40 UTC | newest]

Thread overview: 300+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-01-06 17:56 [PATCH 00/35] [PULL] qemu-kvm.git uq/master queue Marcelo Tosatti
2011-01-06 17:56 ` [Qemu-devel] " Marcelo Tosatti
2011-01-06 17:56 ` [PATCH 01/35] kvm: Enable user space NMI injection for kvm guest Marcelo Tosatti
2011-01-06 17:56   ` [Qemu-devel] " Marcelo Tosatti
2011-01-06 17:56 ` [PATCH 02/35] kvm: convert kvm_ioctl(KVM_CHECK_EXTENSION) to kvm_check_extension() Marcelo Tosatti
2011-01-06 17:56   ` [Qemu-devel] " Marcelo Tosatti
2011-01-06 17:56 ` [PATCH 03/35] Clean up cpu_inject_x86_mce() Marcelo Tosatti
2011-01-06 17:56   ` [Qemu-devel] " Marcelo Tosatti
2011-01-06 17:56 ` [PATCH 04/35] Add "broadcast" option for mce command Marcelo Tosatti
2011-01-06 17:56   ` [Qemu-devel] " Marcelo Tosatti
2011-01-09 18:51   ` Jan Kiszka
2011-01-09 18:51     ` [Qemu-devel] " Jan Kiszka
2011-01-15 16:24     ` Jan Kiszka
2011-01-15 16:24       ` [Qemu-devel] " Jan Kiszka
2011-01-06 17:56 ` [PATCH 05/35] Add function for checking mca broadcast of CPU Marcelo Tosatti
2011-01-06 17:56   ` [Qemu-devel] " Marcelo Tosatti
2011-01-06 17:56 ` [PATCH 06/35] kvm: introduce kvm_mce_in_progress Marcelo Tosatti
2011-01-06 17:56   ` [Qemu-devel] " Marcelo Tosatti
2011-01-06 17:56 ` [PATCH 07/35] kvm: kvm_mce_inj_* subroutines for templated error injections Marcelo Tosatti
2011-01-06 17:56   ` [Qemu-devel] " Marcelo Tosatti
2011-01-06 17:56 ` [PATCH 08/35] kvm: introduce kvm_inject_x86_mce_on Marcelo Tosatti
2011-01-06 17:56   ` [Qemu-devel] " Marcelo Tosatti
2011-01-06 17:56 ` [PATCH 09/35] kvm: x86: Fix DPL write back of segment registers Marcelo Tosatti
2011-01-06 17:56   ` [Qemu-devel] " Marcelo Tosatti
2011-01-06 17:56 ` [PATCH 10/35] kvm: x86: Remove obsolete SS.RPL/DPL aligment Marcelo Tosatti
2011-01-06 17:56   ` [Qemu-devel] " Marcelo Tosatti
2011-01-06 17:56 ` [PATCH 11/35] kvm: x86: Prevent sign extension of DR7 in guest debugging mode Marcelo Tosatti
2011-01-06 17:56   ` [Qemu-devel] " Marcelo Tosatti
2011-01-06 17:56 ` [PATCH 12/35] kvm: x86: Fix a few coding style violations Marcelo Tosatti
2011-01-06 17:56   ` [Qemu-devel] " Marcelo Tosatti
2011-01-06 17:56 ` [PATCH 13/35] kvm: Fix " Marcelo Tosatti
2011-01-06 17:56   ` [Qemu-devel] " Marcelo Tosatti
2011-01-06 17:56 ` [PATCH 14/35] kvm: Drop return value of kvm_cpu_exec Marcelo Tosatti
2011-01-06 17:56   ` [Qemu-devel] " Marcelo Tosatti
2011-01-08 13:09   ` Jan Kiszka
2011-01-08 13:09     ` [Qemu-devel] " Jan Kiszka
2011-01-06 17:56 ` [PATCH 15/35] kvm: Stop on all fatal exit reasons Marcelo Tosatti
2011-01-06 17:56   ` [Qemu-devel] " Marcelo Tosatti
2011-01-06 17:56 ` [PATCH 16/35] kvm: Improve reporting of fatal errors Marcelo Tosatti
2011-01-06 17:56   ` [Qemu-devel] " Marcelo Tosatti
2011-01-06 17:56 ` [PATCH 17/35] x86: Optionally dump code bytes on cpu_dump_state Marcelo Tosatti
2011-01-06 17:56   ` [Qemu-devel] " Marcelo Tosatti
2011-01-06 17:56 ` [PATCH 18/35] kvm: x86: Align kvm_arch_put_registers code with comment Marcelo Tosatti
2011-01-06 17:56   ` [Qemu-devel] " Marcelo Tosatti
2011-01-06 17:56 ` [PATCH 19/35] kvm: x86: Prepare kvm_get_mp_state for in-kernel irqchip Marcelo Tosatti
2011-01-06 17:56   ` [Qemu-devel] " Marcelo Tosatti
2011-01-06 17:56 ` [PATCH 20/35] kvm: x86: Remove redundant mp_state initialization Marcelo Tosatti
2011-01-06 17:56   ` [Qemu-devel] " Marcelo Tosatti
2011-01-06 17:56 ` [PATCH 21/35] kvm: x86: Fix xcr0 reset mismerge Marcelo Tosatti
2011-01-06 17:56   ` [Qemu-devel] " Marcelo Tosatti
2011-01-06 17:56 ` [PATCH 22/35] kvm: x86: Refactor msr_star/hsave_pa setup and checks Marcelo Tosatti
2011-01-06 17:56   ` [Qemu-devel] " Marcelo Tosatti
2011-01-06 17:56 ` [PATCH 23/35] kvm: x86: Reset paravirtual MSRs Marcelo Tosatti
2011-01-06 17:56   ` [Qemu-devel] " Marcelo Tosatti
2011-01-06 17:56 ` [PATCH 24/35] Synchronize VCPU states before reset Marcelo Tosatti
2011-01-06 17:56   ` [Qemu-devel] " Marcelo Tosatti
2011-01-06 17:56 ` [PATCH 25/35] kvm: x86: Drop MCE MSRs write back restrictions Marcelo Tosatti
2011-01-06 17:56   ` [Qemu-devel] " Marcelo Tosatti
2011-01-06 17:56 ` [PATCH 26/35] kvm: Eliminate KVMState arguments Marcelo Tosatti
2011-01-06 17:56   ` [Qemu-devel] " Marcelo Tosatti
2011-01-06 19:24   ` Anthony Liguori
2011-01-06 19:24     ` [Qemu-devel] " Anthony Liguori
2011-01-07  9:03     ` Jan Kiszka
2011-01-07  9:03       ` [Qemu-devel] " Jan Kiszka
2011-01-07 23:27       ` Anthony Liguori
2011-01-07 23:27         ` [Qemu-devel] " Anthony Liguori
2011-01-08  8:47         ` Jan Kiszka
2011-01-08  8:47           ` [Qemu-devel] " Jan Kiszka
2011-01-10 19:59           ` Anthony Liguori
2011-01-10 19:59             ` [Qemu-devel] " Anthony Liguori
2011-01-10 20:12             ` Jan Kiszka
2011-01-10 20:12               ` [Qemu-devel] " Jan Kiszka
2011-01-10 20:23               ` Anthony Liguori
2011-01-10 20:34                 ` Jan Kiszka
2011-01-11  9:01                 ` Avi Kivity
2011-01-11  9:01                   ` Avi Kivity
2011-01-11 14:00                   ` Anthony Liguori
2011-01-11 14:00                     ` Anthony Liguori
2011-01-11 14:06                     ` Alexander Graf
2011-01-11 14:06                       ` Alexander Graf
2011-01-11 14:09                       ` Anthony Liguori
2011-01-11 14:09                         ` Anthony Liguori
2011-01-11 14:22                         ` Avi Kivity
2011-01-11 14:22                           ` Avi Kivity
2011-01-11 14:36                           ` Anthony Liguori
2011-01-11 14:36                             ` Anthony Liguori
2011-01-11 14:56                             ` Avi Kivity
2011-01-11 14:56                               ` Avi Kivity
2011-01-11 15:12                               ` Anthony Liguori
2011-01-11 15:12                                 ` Anthony Liguori
2011-01-11 15:17                                 ` Alexander Graf
2011-01-11 15:17                                   ` Alexander Graf
2011-01-11 15:37                                 ` Avi Kivity
2011-01-11 15:37                                   ` Avi Kivity
2011-01-11 15:55                                   ` Anthony Liguori
2011-01-11 15:55                                     ` Anthony Liguori
2011-01-11 16:03                                     ` Avi Kivity
2011-01-11 16:03                                       ` Avi Kivity
2011-01-11 16:26                                       ` Anthony Liguori
2011-01-11 16:26                                         ` Anthony Liguori
2011-01-11 17:05                                         ` Avi Kivity
2011-01-11 17:05                                           ` Avi Kivity
2011-01-11 14:24                         ` Alexander Graf
2011-01-11 14:24                           ` Alexander Graf
2011-01-11 14:18                     ` Avi Kivity
2011-01-11 14:18                       ` Avi Kivity
2011-01-11 14:28                       ` Anthony Liguori
2011-01-11 14:28                         ` Anthony Liguori
2011-01-11 14:52                         ` Avi Kivity
2011-01-11 14:52                           ` Avi Kivity
2011-01-10 20:11           ` Anthony Liguori
2011-01-10 20:15             ` Jan Kiszka
2011-01-11  9:17             ` Avi Kivity
2011-01-11  9:17               ` Avi Kivity
2011-01-06 17:56 ` [PATCH 27/35] kvm: x86: Fix !CONFIG_KVM_PARA build Marcelo Tosatti
2011-01-06 17:56   ` [Qemu-devel] " Marcelo Tosatti
2011-01-06 17:56 ` [PATCH 28/35] kvm: x86: Introduce kvmclock device to save/restore its state Marcelo Tosatti
2011-01-06 17:56   ` [Qemu-devel] " Marcelo Tosatti
2011-01-10 20:31   ` Anthony Liguori
2011-01-10 20:31     ` Anthony Liguori
2011-01-10 21:06     ` Jan Kiszka
2011-01-10 21:06       ` Jan Kiszka
2011-01-10 22:21       ` Jan Kiszka
2011-01-10 22:21         ` Jan Kiszka
2011-01-10 23:02         ` Anthony Liguori
2011-01-10 23:02           ` Anthony Liguori
2011-01-11  5:54           ` Jan Kiszka
2011-01-11  5:54             ` Jan Kiszka
2011-01-11  8:00         ` Paolo Bonzini
2011-01-11  8:00           ` Paolo Bonzini
2011-01-11  8:53         ` Gerd Hoffmann
2011-01-11  8:53           ` Gerd Hoffmann
2011-01-11 17:13           ` Jan Kiszka
2011-01-11 17:13             ` Jan Kiszka
2011-01-11  9:31         ` Markus Armbruster
2011-01-11  9:31           ` Markus Armbruster
2011-01-11 13:54           ` Anthony Liguori
2011-01-11 13:54             ` Anthony Liguori
2011-01-12 10:22             ` Avi Kivity
2011-01-12 10:22               ` Avi Kivity
2011-01-12 10:31               ` Jan Kiszka
2011-01-12 10:31                 ` Jan Kiszka
2011-01-18 14:28                 ` Jan Kiszka
2011-01-18 14:28                   ` Jan Kiszka
2011-01-18 15:04                   ` Anthony Liguori
2011-01-18 15:04                     ` Anthony Liguori
2011-01-18 15:43                     ` Jan Kiszka
2011-01-18 15:43                       ` Jan Kiszka
2011-01-18 15:48                       ` Anthony Liguori
2011-01-18 15:48                         ` Anthony Liguori
2011-01-18 15:54                         ` Jan Kiszka
2011-01-18 15:54                           ` Jan Kiszka
2011-01-18 17:02                           ` Alex Williamson
2011-01-18 17:02                             ` Alex Williamson
2011-01-18 17:08                             ` Jan Kiszka
2011-01-18 17:08                               ` Jan Kiszka
2011-01-18 17:39                               ` Alex Williamson
2011-01-18 17:39                                 ` Alex Williamson
2011-01-18 15:50                       ` Anthony Liguori
2011-01-18 15:50                         ` Anthony Liguori
2011-01-18 16:01                         ` Jan Kiszka
2011-01-18 16:01                           ` Jan Kiszka
2011-01-18 16:04                           ` Anthony Liguori
2011-01-18 16:04                             ` Anthony Liguori
2011-01-18 16:17                             ` Jan Kiszka
2011-01-18 16:17                               ` Jan Kiszka
2011-01-18 16:37                               ` Anthony Liguori
2011-01-18 16:37                                 ` Anthony Liguori
2011-01-18 16:56                                 ` Jan Kiszka
2011-01-18 16:56                                   ` Jan Kiszka
2011-01-18 17:09                                   ` Anthony Liguori
2011-01-18 17:09                                     ` Anthony Liguori
2011-01-18 17:20                                     ` Jan Kiszka
2011-01-18 17:20                                       ` Jan Kiszka
2011-01-18 17:31                                       ` Anthony Liguori
2011-01-18 17:31                                         ` Anthony Liguori
2011-01-18 17:45                                         ` Jan Kiszka
2011-01-18 17:45                                           ` Jan Kiszka
2011-01-19  9:48                                     ` Gerd Hoffmann
2011-01-19  9:48                                       ` Gerd Hoffmann
2011-01-19 13:11                                       ` Markus Armbruster
2011-01-19 13:11                                         ` Markus Armbruster
2011-01-19 16:54                                         ` Anthony Liguori
2011-01-19 16:54                                           ` Anthony Liguori
2011-01-19 17:19                                           ` Daniel P. Berrange
2011-01-19 17:19                                             ` Daniel P. Berrange
2011-01-19 17:43                                             ` Anthony Liguori
2011-01-19 17:43                                               ` Anthony Liguori
2011-01-20  8:44                                               ` Gerd Hoffmann
2011-01-20  8:44                                                 ` Gerd Hoffmann
2011-01-20 10:33                                                 ` Daniel P. Berrange
2011-01-20 10:33                                                   ` Daniel P. Berrange
2011-01-20 19:42                                                   ` Anthony Liguori
2011-01-20 19:42                                                     ` Anthony Liguori
2011-01-20 19:39                                                 ` Anthony Liguori
2011-01-20 19:39                                                   ` Anthony Liguori
2011-01-21  8:35                                                   ` Gerd Hoffmann
2011-01-21  8:35                                                     ` Gerd Hoffmann
2011-01-21 10:03                                                     ` Markus Armbruster
2011-01-21 10:03                                                       ` Markus Armbruster
2011-01-19 16:53                                       ` Anthony Liguori
2011-01-19 16:53                                         ` Anthony Liguori
2011-01-19 17:01                                         ` Daniel P. Berrange
2011-01-19 17:01                                           ` Daniel P. Berrange
2011-01-19 17:51                                           ` Anthony Liguori
2011-01-19 17:51                                             ` Anthony Liguori
2011-01-19 18:52                                             ` Daniel P. Berrange
2011-01-19 18:52                                               ` Daniel P. Berrange
2011-01-19 18:58                                               ` Anthony Liguori
2011-01-19 18:58                                                 ` Anthony Liguori
2011-01-19 17:35                                         ` Daniel P. Berrange
2011-01-19 17:35                                           ` Daniel P. Berrange
2011-01-19 17:42                                           ` Anthony Liguori
2011-01-19 17:42                                             ` Anthony Liguori
2011-01-19 18:53                                             ` Daniel P. Berrange
2011-01-19 18:53                                               ` Daniel P. Berrange
2011-01-19 13:09                                     ` Markus Armbruster
2011-01-19 13:09                                       ` Markus Armbruster
2011-01-24  8:45                                     ` Gleb Natapov
2011-01-24  8:45                                       ` Gleb Natapov
2011-01-19 13:15                         ` Markus Armbruster
2011-01-19 13:15                           ` Markus Armbruster
2011-01-19 16:57                           ` Anthony Liguori
2011-01-19 16:57                             ` [Qemu-devel] " Anthony Liguori
2011-01-19 17:25                             ` Jan Kiszka
2011-01-19 17:25                               ` Jan Kiszka
2011-01-19 19:32                             ` Blue Swirl
2011-01-19 19:32                               ` Blue Swirl
2011-01-20  9:33                               ` Jan Kiszka
2011-01-20  9:33                                 ` Jan Kiszka
2011-01-20 19:27                                 ` Blue Swirl
2011-01-20 19:27                                   ` Blue Swirl
2011-01-20 21:22                                   ` Jan Kiszka
2011-01-20 21:22                                     ` Jan Kiszka
2011-01-20 21:40                                     ` Blue Swirl
2011-01-20 21:40                                       ` Blue Swirl
2011-01-20 21:53                                       ` Jan Kiszka
2011-01-20 21:53                                         ` Jan Kiszka
2011-01-25 11:10                                     ` Avi Kivity
2011-01-25 11:10                                       ` Avi Kivity
2011-01-21  8:46                                   ` Gerd Hoffmann
2011-01-21  8:46                                     ` Gerd Hoffmann
2011-01-21 10:05                                     ` Markus Armbruster
2011-01-21 10:05                                       ` Markus Armbruster
2011-01-21 16:37                                     ` Blue Swirl
2011-01-21 16:37                                       ` Blue Swirl
2011-01-21 17:21                                       ` Jan Kiszka
2011-01-21 17:21                                         ` Jan Kiszka
2011-01-21 18:04                                         ` Blue Swirl
2011-01-21 18:04                                           ` Blue Swirl
2011-01-21 18:17                                           ` Jan Kiszka
2011-01-21 18:17                                             ` Jan Kiszka
2011-01-21 18:49                                             ` Blue Swirl
2011-01-21 18:49                                               ` Blue Swirl
2011-01-24 14:08                                               ` Jan Kiszka
2011-01-24 14:08                                                 ` Jan Kiszka
2011-01-24 21:35                                                 ` Blue Swirl
2011-01-24 21:35                                                   ` Blue Swirl
2011-01-24 21:57                                                   ` Jan Kiszka
2011-01-24 21:57                                                     ` Jan Kiszka
2011-01-20 19:37                                 ` Anthony Liguori
2011-01-20 19:37                                   ` Anthony Liguori
2011-01-20 20:02                                   ` Blue Swirl
2011-01-20 20:02                                     ` Blue Swirl
2011-01-20 21:42                                     ` Jan Kiszka
2011-01-20 21:42                                       ` Jan Kiszka
2011-01-20 21:27                                   ` Jan Kiszka
2011-01-20 21:27                                     ` Jan Kiszka
2011-01-25 11:06                             ` Avi Kivity
2011-01-25 11:06                               ` Avi Kivity
2011-01-25 14:30                               ` Anthony Liguori
2011-01-25 14:30                                 ` Anthony Liguori
2011-01-25 10:34                         ` Avi Kivity
2011-01-25 10:34                           ` Avi Kivity
2011-01-25 10:27                   ` Avi Kivity
2011-01-25 10:27                     ` Avi Kivity
2011-01-25 13:58                     ` Anthony Liguori
2011-01-25 13:58                       ` Anthony Liguori
2011-01-12 12:04               ` Markus Armbruster
2011-01-12 12:04                 ` Markus Armbruster
2011-01-10 23:04       ` Anthony Liguori
2011-01-10 23:04         ` Anthony Liguori
2011-01-11  5:55         ` Jan Kiszka
2011-01-11  5:55           ` Jan Kiszka
2011-01-06 17:56 ` [PATCH 29/35] kvm: Drop smp_cpus argument from init functions Marcelo Tosatti
2011-01-06 17:56   ` [Qemu-devel] " Marcelo Tosatti
2011-01-06 17:56 ` [PATCH 30/35] kvm: Consolidate must-have capability checks Marcelo Tosatti
2011-01-06 17:56   ` [Qemu-devel] " Marcelo Tosatti
2011-01-06 17:56 ` [PATCH 31/35] kvm: x86: Rework identity map and TSS setup for larger BIOS sizes Marcelo Tosatti
2011-01-06 17:56   ` [Qemu-devel] " Marcelo Tosatti
2011-01-06 17:56 ` [PATCH 32/35] kvm: Flush coalesced mmio buffer on IO window exits Marcelo Tosatti
2011-01-06 17:56   ` [Qemu-devel] " Marcelo Tosatti
2011-01-06 17:56 ` [PATCH 33/35] kvm: Do not use qemu_fair_mutex Marcelo Tosatti
2011-01-06 17:56   ` [Qemu-devel] " Marcelo Tosatti
2011-01-06 17:56 ` [PATCH 34/35] kvm: x86: Implicitly clear nmi_injected/pending on reset Marcelo Tosatti
2011-01-06 17:56   ` [Qemu-devel] " Marcelo Tosatti
2011-01-06 17:56 ` [PATCH 35/35] kvm: x86: Only read/write MSR_KVM_ASYNC_PF_EN if supported Marcelo Tosatti
2011-01-06 17:56   ` [Qemu-devel] " Marcelo Tosatti
2011-01-27 14:39 ` [PATCH] kvm: x86: Fix build in absence of KVM_CAP_ASYNC_PF Jan Kiszka
2011-01-27 14:39   ` [Qemu-devel] " Jan Kiszka
