[PATCH v2 0/8] uq/master: TPR access optimization for Windows guests

* [PATCH v2 0/8] uq/master: TPR access optimization for Windows guests
@ 2012-02-10 18:31 ` Jan Kiszka
  0 siblings, 0 replies; 90+ messages in thread
From: Jan Kiszka @ 2012-02-10 18:31 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: Paolo Bonzini, Anthony Liguori, qemu-devel, kvm, Gleb Natapov

Here is v2 of the TPR access optimization. Changes:
 - plug race between patching and running VCPUs accessing the same TPR
   instruction by stopping VCPUs during patch process
 - implemented the forward/backward check in evaluate_tpr_instruction via
   a table but kept patch_instruction as is (too many variations for a
   table-driven approach)
 - dropped the smp_cpus == 1 special case from get_kpcr_number
 - fixed the comment on why the R/W ROM alias has to be page-aligned

The series is also available at

    git://git.kiszka.org/qemu-kvm.git queues/kvm-tpr

Please review/apply.

CC: Paolo Bonzini <pbonzini@redhat.com>

Jan Kiszka (8):
  kvm: Set cpu_single_env only once
  Allow to use pause_all_vcpus from VCPU context
  target-i386: Add infrastructure for reporting TPR MMIO accesses
  kvmvapic: Add option ROM
  kvmvapic: Introduce TPR access optimization for Windows guests
  kvmvapic: Simplify mp/up_set_tpr
  optionsrom: Reserve space for checksum
  kvmvapic: Use optionrom helpers

 .gitignore                    |    1 +
 Makefile                      |    2 +-
 Makefile.target               |    3 +-
 cpu-all.h                     |    3 +-
 cpus.c                        |   13 +
 hw/apic.c                     |  126 ++++++-
 hw/apic.h                     |    2 +
 hw/apic_common.c              |   68 ++++-
 hw/apic_internal.h            |   27 ++
 hw/kvm/apic.c                 |   32 ++
 hw/kvmvapic.c                 |  774 +++++++++++++++++++++++++++++++++++++++++
 kvm-all.c                     |    5 -
 pc-bios/optionrom/Makefile    |    2 +-
 pc-bios/optionrom/kvmvapic.S  |  335 ++++++++++++++++++
 pc-bios/optionrom/optionrom.h |    3 +-
 target-i386/cpu.h             |    9 +
 target-i386/helper.c          |   19 +
 target-i386/kvm.c             |   24 ++-
 18 files changed, 1423 insertions(+), 25 deletions(-)
 create mode 100644 hw/kvmvapic.c
 create mode 100644 pc-bios/optionrom/kvmvapic.S

-- 
1.7.3.4

* [PATCH v2 1/8] kvm: Set cpu_single_env only once
  2012-02-10 18:31 ` [Qemu-devel] " Jan Kiszka
@ 2012-02-10 18:31   ` Jan Kiszka
  -1 siblings, 0 replies; 90+ messages in thread
From: Jan Kiszka @ 2012-02-10 18:31 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: kvm, qemu-devel, Anthony Liguori, Gleb Natapov

As we have thread-local cpu_single_env now and KVM uses exactly one
thread per VCPU, we can drop the cpu_single_env updates from the loop
and initialize this variable only once during setup.
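
As a rough illustration of why a single assignment is enough, here is a minimal sketch that assumes one POSIX thread per modeled VCPU. VcpuModel, current_vcpu and run_vcpu_threads are hypothetical names for this example only, and __thread is the GCC/Clang thread-local extension, not necessarily what DECLARE_TLS expands to.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_VCPUS 8

/* Hypothetical stand-in for CPUState; not QEMU's real structure. */
typedef struct VcpuModel {
    int index;
    int saw_only_self;  /* 1 if the thread-local pointer never changed */
} VcpuModel;

/* One copy per thread, analogous to a thread-local cpu_single_env.
 * __thread is a GCC/Clang extension. */
static __thread VcpuModel *current_vcpu;

static void *vcpu_thread_fn(void *arg)
{
    VcpuModel *env = arg;
    int i;

    current_vcpu = env;  /* set once, during thread setup */

    env->saw_only_self = 1;
    for (i = 0; i < 1000; i++) {  /* stands in for the VCPU run loop */
        if (current_vcpu != env) {
            env->saw_only_self = 0;
        }
    }
    return NULL;
}

/* Starts n modeled VCPU threads and returns how many of them observed
 * only their own pointer for the whole loop. */
static int run_vcpu_threads(int n)
{
    pthread_t threads[MAX_VCPUS];
    VcpuModel cpus[MAX_VCPUS];
    int started = 0, ok = 0, i;

    assert(n >= 0 && n <= MAX_VCPUS);
    for (i = 0; i < n; i++) {
        cpus[i].index = i;
        cpus[i].saw_only_self = 0;
        if (pthread_create(&threads[i], NULL, vcpu_thread_fn, &cpus[i]) != 0) {
            break;
        }
        started++;
    }
    for (i = 0; i < started; i++) {
        pthread_join(threads[i], NULL);
        ok += cpus[i].saw_only_self;
    }
    return ok;
}
```

Because every thread owns its copy of the pointer, no other VCPU thread can overwrite it, so there is nothing to restore around the run loop. The thread that calls run_vcpu_threads never assigns its own copy, which therefore stays NULL.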

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 cpus.c    |    1 +
 kvm-all.c |    5 -----
 2 files changed, 1 insertions(+), 5 deletions(-)

diff --git a/cpus.c b/cpus.c
index f45a438..d0c8340 100644
--- a/cpus.c
+++ b/cpus.c
@@ -714,6 +714,7 @@ static void *qemu_kvm_cpu_thread_fn(void *arg)
     qemu_mutex_lock(&qemu_global_mutex);
     qemu_thread_get_self(env->thread);
     env->thread_id = qemu_get_thread_id();
+    cpu_single_env = env;
 
     r = kvm_init_vcpu(env);
     if (r < 0) {
diff --git a/kvm-all.c b/kvm-all.c
index c4babda..e2cbc03 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1118,8 +1118,6 @@ int kvm_cpu_exec(CPUState *env)
         return EXCP_HLT;
     }
 
-    cpu_single_env = env;
-
     do {
         if (env->kvm_vcpu_dirty) {
             kvm_arch_put_registers(env, KVM_PUT_RUNTIME_STATE);
@@ -1136,13 +1134,11 @@ int kvm_cpu_exec(CPUState *env)
              */
             qemu_cpu_kick_self();
         }
-        cpu_single_env = NULL;
         qemu_mutex_unlock_iothread();
 
         run_ret = kvm_vcpu_ioctl(env, KVM_RUN, 0);
 
         qemu_mutex_lock_iothread();
-        cpu_single_env = env;
         kvm_arch_post_run(env, run);
 
         kvm_flush_coalesced_mmio_buffer();
@@ -1206,7 +1202,6 @@ int kvm_cpu_exec(CPUState *env)
     }
 
     env->exit_request = 0;
-    cpu_single_env = NULL;
     return ret;
 }
 
-- 
1.7.3.4


* [PATCH v2 2/8] Allow to use pause_all_vcpus from VCPU context
  2012-02-10 18:31 ` [Qemu-devel] " Jan Kiszka
@ 2012-02-10 18:31   ` Jan Kiszka
  -1 siblings, 0 replies; 90+ messages in thread
From: Jan Kiszka @ 2012-02-10 18:31 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: kvm, qemu-devel, Anthony Liguori, Gleb Natapov, Paolo Bonzini

In order to perform critical manipulations on the VM state in the
context of a VCPU, specifically code patching, stopping and resuming of
all VCPUs may be necessary. resume_all_vcpus is already compatible; now
enable pause_all_vcpus for this use case by stopping the calling context
before starting to wait for the whole gang.
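
The idea can be modeled without any threading, as a sketch only. VcpuModel, request_pause_all and acknowledge_pause are hypothetical names, and the real code additionally waits on a condition variable under the global mutex.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the per-VCPU pause flags. */
typedef struct VcpuModel {
    bool stop;     /* a pause has been requested */
    bool stopped;  /* the pause has been acknowledged */
} VcpuModel;

static bool all_vcpus_paused_model(const VcpuModel *cpus, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (!cpus[i].stopped) {
            return false;
        }
    }
    return true;
}

/* Requests a pause of all VCPUs. caller is the index of the requesting
 * VCPU, or -1 when the request comes from a non-VCPU thread. Returns the
 * number of VCPUs the caller still has to wait for. */
static size_t request_pause_all(VcpuModel *cpus, size_t n, int caller)
{
    size_t i, pending = 0;

    assert(caller < 0 || (size_t)caller < n);
    for (i = 0; i < n; i++) {
        cpus[i].stop = true;
    }
    if (caller >= 0) {
        /* The caller cannot acknowledge its own request while it is busy
         * waiting for the others, so it is marked stopped up front. */
        cpus[caller].stop = false;
        cpus[caller].stopped = true;
    }
    for (i = 0; i < n; i++) {
        if (!cpus[i].stopped) {
            pending++;
        }
    }
    return pending;
}

/* Models a VCPU reaching its stop check and acknowledging the request. */
static void acknowledge_pause(VcpuModel *cpu)
{
    if (cpu->stop) {
        cpu->stop = false;
        cpu->stopped = true;
    }
}
```

Without the up-front self-stop, a requesting VCPU would be counted among the VCPUs it is waiting for, and the wait could never finish.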

CC: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 cpus.c |   12 ++++++++++++
 1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/cpus.c b/cpus.c
index d0c8340..5adfc6b 100644
--- a/cpus.c
+++ b/cpus.c
@@ -870,6 +870,18 @@ void pause_all_vcpus(void)
         penv = (CPUState *)penv->next_cpu;
     }
 
+    if (!qemu_thread_is_self(&io_thread)) {
+        cpu_stop_current();
+        if (!kvm_enabled()) {
+            while (penv) {
+                penv->stop = 0;
+                penv->stopped = 1;
+                penv = (CPUState *)penv->next_cpu;
+            }
+            return;
+        }
+    }
+
     while (!all_vcpus_paused()) {
         qemu_cond_wait(&qemu_pause_cond, &qemu_global_mutex);
         penv = first_cpu;
-- 
1.7.3.4


* [PATCH v2 3/8] target-i386: Add infrastructure for reporting TPR MMIO accesses
  2012-02-10 18:31 ` [Qemu-devel] " Jan Kiszka
@ 2012-02-10 18:31   ` Jan Kiszka
  -1 siblings, 0 replies; 90+ messages in thread
From: Jan Kiszka @ 2012-02-10 18:31 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: kvm, qemu-devel, Anthony Liguori, Gleb Natapov

This will allow the APIC core to file a TPR access report. Depending on
the accelerator and kernel irqchip mode, it will either be delivered
right away or queued for later reporting.

In TCG mode, we can restart the triggering instruction and can therefore
forward the event directly. KVM does not allow us to restart, so we
postpone the delivery of events recorded in the user space APIC until
the current instruction is completed.

Note that KVM without in-kernel irqchip will report the address after
the instruction that triggered a write access. In contrast, read
accesses will return the precise information.
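
The two delivery paths can be sketched roughly as follows. This is a simplified model, not the patch itself: VcpuModel, deliver_report and the can_restart flag are hypothetical, and only the 0x2000 bit value is taken from the CPU_INTERRUPT_TGT_INT_3 definition in this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same bit value this patch assigns to CPU_INTERRUPT_TGT_INT_3. */
#define PENDING_TPR_REPORT 0x2000u

enum { ACCESS_READ = 0, ACCESS_WRITE = 1 };

/* Hypothetical, reduced VCPU state for illustration. */
typedef struct VcpuModel {
    uint32_t interrupt_request;
    uint64_t ip;               /* current instruction pointer */
    uint64_t tpr_access_ip;    /* recorded for deferred delivery */
    int tpr_access_type;
    int reports_delivered;     /* bookkeeping for the example only */
    uint64_t last_report_ip;
    int last_report_type;
} VcpuModel;

/* Stands in for the APIC-side report handler. */
static void deliver_report(VcpuModel *cpu, uint64_t ip, int access)
{
    cpu->reports_delivered++;
    cpu->last_report_ip = ip;
    cpu->last_report_type = access;
}

/* Called while the TPR access is being emulated. */
static void report_tpr_access(VcpuModel *cpu, int access, bool can_restart)
{
    assert(access == ACCESS_READ || access == ACCESS_WRITE);
    if (can_restart) {
        /* The instruction can be restarted: forward the event directly. */
        deliver_report(cpu, cpu->ip, access);
    } else {
        /* Record the event and flag it for later processing. */
        cpu->tpr_access_ip = cpu->ip;
        cpu->tpr_access_type = access;
        cpu->interrupt_request |= PENDING_TPR_REPORT;
    }
}

/* Called once the current instruction has completed. */
static void process_async_events(VcpuModel *cpu)
{
    if (cpu->interrupt_request & PENDING_TPR_REPORT) {
        cpu->interrupt_request &= ~PENDING_TPR_REPORT;
        deliver_report(cpu, cpu->tpr_access_ip, cpu->tpr_access_type);
    }
}
```

In the deferred case the handler sees the instruction pointer and access type captured at recording time, even if the VCPU has moved on by the time the pending bit is processed.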

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 cpu-all.h            |    3 ++-
 hw/apic.h            |    2 ++
 hw/apic_common.c     |    4 ++++
 target-i386/cpu.h    |    9 +++++++++
 target-i386/helper.c |   19 +++++++++++++++++++
 target-i386/kvm.c    |   24 ++++++++++++++++++++++--
 6 files changed, 58 insertions(+), 3 deletions(-)

diff --git a/cpu-all.h b/cpu-all.h
index e2c3c49..80e6d42 100644
--- a/cpu-all.h
+++ b/cpu-all.h
@@ -375,8 +375,9 @@ DECLARE_TLS(CPUState *,cpu_single_env);
 #define CPU_INTERRUPT_TGT_INT_0   0x0100
 #define CPU_INTERRUPT_TGT_INT_1   0x0400
 #define CPU_INTERRUPT_TGT_INT_2   0x0800
+#define CPU_INTERRUPT_TGT_INT_3   0x2000
 
-/* First unused bit: 0x2000.  */
+/* First unused bit: 0x4000.  */
 
 /* The set of all bits that should be masked when single-stepping.  */
 #define CPU_INTERRUPT_SSTEP_MASK \
diff --git a/hw/apic.h b/hw/apic.h
index a62d83b..45598bd 100644
--- a/hw/apic.h
+++ b/hw/apic.h
@@ -18,6 +18,8 @@ void cpu_set_apic_tpr(DeviceState *s, uint8_t val);
 uint8_t cpu_get_apic_tpr(DeviceState *s);
 void apic_init_reset(DeviceState *s);
 void apic_sipi(DeviceState *s);
+void apic_handle_tpr_access_report(DeviceState *d, target_ulong ip,
+                                   int access);
 
 /* pc.c */
 int cpu_is_bsp(CPUState *env);
diff --git a/hw/apic_common.c b/hw/apic_common.c
index 8373d79..588531b 100644
--- a/hw/apic_common.c
+++ b/hw/apic_common.c
@@ -68,6 +68,10 @@ uint8_t cpu_get_apic_tpr(DeviceState *d)
     return s ? s->tpr >> 4 : 0;
 }
 
+void apic_handle_tpr_access_report(DeviceState *d, target_ulong ip, int access)
+{
+}
+
 void apic_report_irq_delivered(int delivered)
 {
     apic_irq_delivered += delivered;
diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index 37dde79..92e9c87 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -482,6 +482,7 @@
 #define CPU_INTERRUPT_VIRQ      CPU_INTERRUPT_TGT_INT_0
 #define CPU_INTERRUPT_INIT      CPU_INTERRUPT_TGT_INT_1
 #define CPU_INTERRUPT_SIPI      CPU_INTERRUPT_TGT_INT_2
+#define CPU_INTERRUPT_TPR       CPU_INTERRUPT_TGT_INT_3
 
 
 enum {
@@ -772,6 +773,9 @@ typedef struct CPUX86State {
     XMMReg ymmh_regs[CPU_NB_REGS];
 
     uint64_t xcr0;
+
+    target_ulong tpr_access_ip;
+    int tpr_access_type;
 } CPUX86State;
 
 CPUX86State *cpu_x86_init(const char *cpu_model);
@@ -1064,4 +1068,9 @@ void svm_check_intercept(CPUState *env1, uint32_t type);
 
 uint32_t cpu_cc_compute_all(CPUState *env1, int op);
 
+#define TPR_ACCESS_READ     0
+#define TPR_ACCESS_WRITE    1
+
+void cpu_report_tpr_access(CPUState *env, int access);
+
 #endif /* CPU_I386_H */
diff --git a/target-i386/helper.c b/target-i386/helper.c
index 2586aff..eca20cd 100644
--- a/target-i386/helper.c
+++ b/target-i386/helper.c
@@ -1189,6 +1189,25 @@ void cpu_x86_inject_mce(Monitor *mon, CPUState *cenv, int bank,
         }
     }
 }
+
+void cpu_report_tpr_access(CPUState *env, int access)
+{
+    TranslationBlock *tb;
+
+    if (kvm_enabled()) {
+        cpu_synchronize_state(env);
+
+        env->tpr_access_ip = env->eip;
+        env->tpr_access_type = access;
+
+        cpu_interrupt(env, CPU_INTERRUPT_TPR);
+    } else {
+        tb = tb_find_pc(env->mem_io_pc);
+        cpu_restore_state(tb, env, env->mem_io_pc);
+
+        apic_handle_tpr_access_report(env->apic_state, env->eip, access);
+    }
+}
 #endif /* !CONFIG_USER_ONLY */
 
 static void mce_init(CPUX86State *cenv)
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 981192d..fa77f9d 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -1635,8 +1635,10 @@ void kvm_arch_pre_run(CPUState *env, struct kvm_run *run)
     }
 
     if (!kvm_irqchip_in_kernel()) {
-        /* Force the VCPU out of its inner loop to process the INIT request */
-        if (env->interrupt_request & CPU_INTERRUPT_INIT) {
+        /* Force the VCPU out of its inner loop to process any INIT requests
+         * or pending TPR access reports. */
+        if (env->interrupt_request &
+            (CPU_INTERRUPT_INIT | CPU_INTERRUPT_TPR)) {
             env->exit_request = 1;
         }
 
@@ -1730,6 +1732,11 @@ int kvm_arch_process_async_events(CPUState *env)
         kvm_cpu_synchronize_state(env);
         do_cpu_sipi(env);
     }
+    if (env->interrupt_request & CPU_INTERRUPT_TPR) {
+        env->interrupt_request &= ~CPU_INTERRUPT_TPR;
+        apic_handle_tpr_access_report(env->apic_state, env->tpr_access_ip,
+                                      env->tpr_access_type);
+    }
 
     return env->halted;
 }
@@ -1746,6 +1753,16 @@ static int kvm_handle_halt(CPUState *env)
     return 0;
 }
 
+static int kvm_handle_tpr_access(CPUState *env)
+{
+    struct kvm_run *run = env->kvm_run;
+
+    apic_handle_tpr_access_report(env->apic_state, run->tpr_access.rip,
+                                  run->tpr_access.is_write ? TPR_ACCESS_WRITE
+                                                           : TPR_ACCESS_READ);
+    return 1;
+}
+
 int kvm_arch_insert_sw_breakpoint(CPUState *env, struct kvm_sw_breakpoint *bp)
 {
     static const uint8_t int3 = 0xcc;
@@ -1950,6 +1967,9 @@ int kvm_arch_handle_exit(CPUState *env, struct kvm_run *run)
     case KVM_EXIT_SET_TPR:
         ret = 0;
         break;
+    case KVM_EXIT_TPR_ACCESS:
+        ret = kvm_handle_tpr_access(env);
+        break;
     case KVM_EXIT_FAIL_ENTRY:
         code = run->fail_entry.hardware_entry_failure_reason;
         fprintf(stderr, "KVM: entry failed, hardware error 0x%" PRIx64 "\n",
-- 
1.7.3.4


* [PATCH v2 4/8] kvmvapic: Add option ROM
  2012-02-10 18:31 ` [Qemu-devel] " Jan Kiszka
@ 2012-02-10 18:31   ` Jan Kiszka
  -1 siblings, 0 replies; 90+ messages in thread
From: Jan Kiszka @ 2012-02-10 18:31 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: kvm, qemu-devel, Anthony Liguori, Gleb Natapov

This imports and builds the original VAPIC option ROM of qemu-kvm.
Its interaction with QEMU is described in the commit that introduces the
corresponding device model.
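
For readers who prefer C over assembly, the set-TPR decision in the ROM can be approximated as below. This is a sketch under assumptions: VapicRecord, vapic_ppr and vapic_set_tpr are hypothetical names, the field layout follows the "vapic format" comment in kvmvapic.S, and the atomic update that the ROM performs with lock cmpxchg is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Per-VCPU record, following the "vapic format" comment in the ROM. */
typedef struct VapicRecord {
    uint8_t tpr;   /* byte 0: task priority (r/w) */
    uint8_t isr;   /* byte 1: highest in-service interrupt (r/o) */
    uint8_t zero;  /* byte 2: zero (r/o) */
    uint8_t irr;   /* byte 3: highest pending interrupt (r/o) */
} VapicRecord;

/* Processor priority as the ROM computes it: the larger of TPR and ISR. */
static uint8_t vapic_ppr(const VapicRecord *rec)
{
    assert(rec != NULL);
    return rec->tpr >= rec->isr ? rec->tpr : rec->isr;
}

/* Stores a new TPR (non-atomically in this sketch) and returns true when
 * the highest pending interrupt outranks the resulting PPR, which is the
 * case in which the ROM issues its poll-IRQ hypercall. */
static bool vapic_set_tpr(VapicRecord *rec, uint8_t new_tpr)
{
    assert(rec != NULL);
    rec->tpr = new_tpr;
    return rec->irr > vapic_ppr(rec);
}
```

Lowering the TPR below a pending interrupt's priority is the only case that needs to leave the guest; every other write stays within the shared record.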

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 .gitignore                   |    1 +
 Makefile                     |    2 +-
 pc-bios/optionrom/Makefile   |    2 +-
 pc-bios/optionrom/kvmvapic.S |  341 ++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 344 insertions(+), 2 deletions(-)
 create mode 100644 pc-bios/optionrom/kvmvapic.S

diff --git a/.gitignore b/.gitignore
index f5aab2c..d3b78c3 100644
--- a/.gitignore
+++ b/.gitignore
@@ -75,6 +75,7 @@ pc-bios/vgabios-pq/status
 pc-bios/optionrom/linuxboot.bin
 pc-bios/optionrom/multiboot.bin
 pc-bios/optionrom/multiboot.raw
+pc-bios/optionrom/kvmvapic.bin
 .stgit-*
 cscope.*
 tags
diff --git a/Makefile b/Makefile
index 47acf3d..c2ef135 100644
--- a/Makefile
+++ b/Makefile
@@ -255,7 +255,7 @@ pxe-e1000.rom pxe-eepro100.rom pxe-ne2k_pci.rom \
 pxe-pcnet.rom pxe-rtl8139.rom pxe-virtio.rom \
 bamboo.dtb petalogix-s3adsp1800.dtb petalogix-ml605.dtb \
 mpc8544ds.dtb \
-multiboot.bin linuxboot.bin \
+multiboot.bin linuxboot.bin kvmvapic.bin \
 s390-zipl.rom \
 spapr-rtas.bin slof.bin \
 palcode-clipper
diff --git a/pc-bios/optionrom/Makefile b/pc-bios/optionrom/Makefile
index 2caf7e6..f6b4027 100644
--- a/pc-bios/optionrom/Makefile
+++ b/pc-bios/optionrom/Makefile
@@ -14,7 +14,7 @@ CFLAGS += -I$(SRC_PATH)
 CFLAGS += $(call cc-option, $(CFLAGS), -fno-stack-protector)
 QEMU_CFLAGS = $(CFLAGS)
 
-build-all: multiboot.bin linuxboot.bin
+build-all: multiboot.bin linuxboot.bin kvmvapic.bin
 
 # suppress auto-removal of intermediate files
 .SECONDARY:
diff --git a/pc-bios/optionrom/kvmvapic.S b/pc-bios/optionrom/kvmvapic.S
new file mode 100644
index 0000000..e1d8f18
--- /dev/null
+++ b/pc-bios/optionrom/kvmvapic.S
@@ -0,0 +1,341 @@
+#
+# Local APIC acceleration for Windows XP and related guests
+#
+# Copyright 2011 Red Hat, Inc. and/or its affiliates
+#
+# Author: Avi Kivity <avi@redhat.com>
+#
+# This work is licensed under the terms of the GNU GPL, version 2, or (at your
+# option) any later version.  See the COPYING file in the top-level directory.
+#
+
+	.text 0
+	.code16
+.global _start
+_start:
+	.short 0xaa55
+	.byte (_end - _start) / 512
+	# clear vapic area: firmware load using rep insb may cause
+	# stale tpr/isr/irr data to corrupt the vapic area.
+	push %es
+	push %cs
+	pop %es
+	xor %ax, %ax
+	mov $vapic_size/2, %cx
+	lea vapic, %di
+	cld
+	rep stosw
+	pop %es
+	mov $vapic_base, %ax
+	out %ax, $0x7e
+	lret
+
+	.code32
+vapic_size = 2*4096
+
+.macro fixup delta=-4
+777:
+	.text 1
+	.long 777b + \delta  - vapic_base
+	.text 0
+.endm
+
+.macro reenable_vtpr
+	out %al, $0x7e
+.endm
+
+.text 1
+	fixup_start = .
+.text 0
+
+.align 16
+
+vapic_base:
+	.ascii "kvm aPiC"
+
+	/* relocation data */
+	.long vapic_base	; fixup
+	.long fixup_start	; fixup
+	.long fixup_end		; fixup
+
+	.long vapic		; fixup
+	.long vapic_size
+vcpu_shift:
+	.long 0
+real_tpr:
+	.long 0
+	.long up_set_tpr	; fixup
+	.long up_set_tpr_eax	; fixup
+	.long up_get_tpr_eax	; fixup
+	.long up_get_tpr_ecx	; fixup
+	.long up_get_tpr_edx	; fixup
+	.long up_get_tpr_ebx	; fixup
+	.long 0 /* esp. won't work. */
+	.long up_get_tpr_ebp	; fixup
+	.long up_get_tpr_esi	; fixup
+	.long up_get_tpr_edi	; fixup
+	.long up_get_tpr_stack  ; fixup
+	.long mp_set_tpr	; fixup
+	.long mp_set_tpr_eax	; fixup
+	.long mp_get_tpr_eax	; fixup
+	.long mp_get_tpr_ecx	; fixup
+	.long mp_get_tpr_edx	; fixup
+	.long mp_get_tpr_ebx	; fixup
+	.long 0 /* esp. won't work. */
+	.long mp_get_tpr_ebp	; fixup
+	.long mp_get_tpr_esi	; fixup
+	.long mp_get_tpr_edi	; fixup
+	.long mp_get_tpr_stack  ; fixup
+
+.macro kvm_hypercall
+	.byte 0x0f, 0x01, 0xc1
+.endm
+
+kvm_hypercall_vapic_poll_irq = 1
+
+pcr_cpu = 0x51
+
+.align 64
+
+mp_get_tpr_eax:
+	pushf
+	cli
+	reenable_vtpr
+	push %ecx
+
+	fs/movzbl pcr_cpu, %eax
+
+	mov vcpu_shift, %ecx	; fixup
+	shl %cl, %eax
+	testb $1, vapic+4(%eax)	; fixup delta=-5
+	jz mp_get_tpr_bad
+	movzbl vapic(%eax), %eax ; fixup
+
+mp_get_tpr_out:
+	pop %ecx
+	popf
+	ret
+
+mp_get_tpr_bad:
+	mov real_tpr, %eax	; fixup
+	mov (%eax), %eax
+	jmp mp_get_tpr_out
+
+mp_get_tpr_ebx:
+	mov %eax, %ebx
+	call mp_get_tpr_eax
+	xchg %eax, %ebx
+	ret
+
+mp_get_tpr_ecx:
+	mov %eax, %ecx
+	call mp_get_tpr_eax
+	xchg %eax, %ecx
+	ret
+
+mp_get_tpr_edx:
+	mov %eax, %edx
+	call mp_get_tpr_eax
+	xchg %eax, %edx
+	ret
+
+mp_get_tpr_esi:
+	mov %eax, %esi
+	call mp_get_tpr_eax
+	xchg %eax, %esi
+	ret
+
+mp_get_tpr_edi:
+	mov %eax, %edi
+	call mp_get_tpr_eax
+	xchg %eax, %edi
+	ret
+
+mp_get_tpr_ebp:
+	mov %eax, %ebp
+	call mp_get_tpr_eax
+	xchg %eax, %ebp
+	ret
+
+mp_get_tpr_stack:
+	call mp_get_tpr_eax
+	xchg %eax, 4(%esp)
+	ret
+
+mp_set_tpr_eax:
+	push %eax
+	call mp_set_tpr
+	ret
+
+mp_set_tpr:
+	pushf
+	push %eax
+	push %ecx
+	push %edx
+	push %ebx
+	cli
+	reenable_vtpr
+
+mp_set_tpr_failed:
+	fs/movzbl pcr_cpu, %edx
+
+	mov vcpu_shift, %ecx	; fixup
+	shl %cl, %edx
+
+	testb $1, vapic+4(%edx)	; fixup delta=-5
+	jz mp_set_tpr_bad
+
+	mov vapic(%edx), %eax	; fixup
+
+	mov %eax, %ebx
+	mov 24(%esp), %bl
+
+	/* %ebx = new vapic (%bl = tpr, %bh = isr, %b3 = irr) */
+
+	lock cmpxchg %ebx, vapic(%edx) ; fixup
+	jnz mp_set_tpr_failed
+
+	/* compute ppr */
+	cmp %bh, %bl
+	jae mp_tpr_is_bigger
+mp_isr_is_bigger:
+	mov %bh, %bl
+mp_tpr_is_bigger:
+	/* %bl = ppr */
+	mov %bl, %ch   /* ch = ppr */
+	rol $8, %ebx
+	/* now: %bl = irr, %bh = ppr */
+	cmp %bh, %bl
+	ja mp_set_tpr_poll_irq
+
+mp_set_tpr_out:
+	pop %ebx
+	pop %edx
+	pop %ecx
+	pop %eax
+	popf
+	ret $4
+
+mp_set_tpr_poll_irq:
+	mov $kvm_hypercall_vapic_poll_irq, %eax
+	kvm_hypercall
+	jmp mp_set_tpr_out
+
+mp_set_tpr_bad:
+	mov 24(%esp), %ecx
+	mov real_tpr, %eax	; fixup
+	mov %ecx, (%eax)
+	jmp mp_set_tpr_out
+
+up_get_tpr_eax:
+	reenable_vtpr
+	movzbl vapic, %eax ; fixup
+	ret
+
+up_get_tpr_ebx:
+	reenable_vtpr
+	movzbl vapic, %ebx ; fixup
+	ret
+
+up_get_tpr_ecx:
+	reenable_vtpr
+	movzbl vapic, %ecx ; fixup
+	ret
+
+up_get_tpr_edx:
+	reenable_vtpr
+	movzbl vapic, %edx ; fixup
+	ret
+
+up_get_tpr_esi:
+	reenable_vtpr
+	movzbl vapic, %esi ; fixup
+	ret
+
+up_get_tpr_edi:
+	reenable_vtpr
+	movzbl vapic, %edi ; fixup
+	ret
+
+up_get_tpr_ebp:
+	reenable_vtpr
+	movzbl vapic, %ebp ; fixup
+	ret
+
+up_get_tpr_stack:
+	reenable_vtpr
+	movzbl vapic, %eax ; fixup
+	xchg %eax, 4(%esp)
+	ret
+
+up_set_tpr_eax:
+	push %eax
+	call up_set_tpr
+	ret
+
+up_set_tpr:
+	pushf
+	push %eax
+	push %ecx
+	push %ebx
+	reenable_vtpr
+
+up_set_tpr_failed:
+	mov vapic, %eax	; fixup
+
+	mov %eax, %ebx
+	mov 20(%esp), %bl
+
+	/* %ebx = new vapic (%bl = tpr, %bh = isr, %b3 = irr) */
+
+	lock cmpxchg %ebx, vapic ; fixup
+	jnz up_set_tpr_failed
+
+	/* compute ppr */
+	cmp %bh, %bl
+	jae up_tpr_is_bigger
+up_isr_is_bigger:
+	mov %bh, %bl
+up_tpr_is_bigger:
+	/* %bl = ppr */
+	mov %bl, %ch   /* ch = ppr */
+	rol $8, %ebx
+	/* now: %bl = irr, %bh = ppr */
+	cmp %bh, %bl
+	ja up_set_tpr_poll_irq
+
+up_set_tpr_out:
+	pop %ebx
+	pop %ecx
+	pop %eax
+	popf
+	ret $4
+
+up_set_tpr_poll_irq:
+	mov $kvm_hypercall_vapic_poll_irq, %eax
+	kvm_hypercall
+	jmp up_set_tpr_out
+
+.text 1
+	fixup_end = .
+.text 0
+
+/*
+ * vapic format:
+ *  per-vcpu records of size 2^vcpu_shift.
+ *     byte 0: tpr (r/w)
+ *     byte 1: highest in-service interrupt (isr) (r/o); bits 3:0 are zero
+ *     byte 2: zero (r/o)
+ *     byte 3: highest pending interrupt (irr) (r/o)
+ */
+.text 2
+
+.align 128
+
+vapic:
+. = . + vapic_size
+
+.byte 0  # reserve space for signature
+.align 512, 0
+
+_end:
-- 
1.7.3.4



* [Qemu-devel] [PATCH v2 4/8] kvmvapic: Add option ROM
@ 2012-02-10 18:31   ` Jan Kiszka
  0 siblings, 0 replies; 90+ messages in thread
From: Jan Kiszka @ 2012-02-10 18:31 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: Anthony Liguori, qemu-devel, kvm, Gleb Natapov

This imports and builds the original VAPIC option ROM of qemu-kvm.
Its interaction with QEMU is described in the commit that introduces the
corresponding device model.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
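Note for readers of the ROM code: the set_tpr helpers in kvmvapic.S store
the new TPR into byte 0 of the vapic record with a compare-and-swap loop,
take the larger of TPR and ISR as PPR, and request an IRQ poll when IRR
exceeds that PPR. The following is only a minimal C sketch of that logic,
assuming C11 atomics in place of lock cmpxchg. The names vapic_byte and
vapic_set_tpr are hypothetical and do not exist in the ROM.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * One vapic record packed into 32 bits, following the ROM's format comment:
 * byte 0 = tpr, byte 1 = isr, byte 2 = zero, byte 3 = irr.
 */
static inline uint8_t vapic_byte(uint32_t rec, unsigned n)
{
    return (uint8_t)(rec >> (8 * n));
}

/*
 * Hypothetical helper: store a new TPR and report whether the caller
 * should issue the poll-IRQ hypercall afterwards.
 */
static bool vapic_set_tpr(_Atomic uint32_t *rec, uint8_t new_tpr)
{
    uint32_t old = atomic_load(rec);
    uint32_t new;

    /* Replace only byte 0; retry if isr/irr changed underneath us. */
    do {
        new = (old & ~UINT32_C(0xff)) | new_tpr;
    } while (!atomic_compare_exchange_weak(rec, &old, new));

    uint8_t isr = vapic_byte(new, 1);
    uint8_t irr = vapic_byte(new, 3);
    uint8_t ppr = new_tpr >= isr ? new_tpr : isr;   /* ppr = max(tpr, isr) */

    return irr > ppr;
}
```

In the ROM, a true result corresponds to the jump to *_set_tpr_poll_irq,
which issues the kvm_hypercall_vapic_poll_irq hypercall.
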
 .gitignore                   |    1 +
 Makefile                     |    2 +-
 pc-bios/optionrom/Makefile   |    2 +-
 pc-bios/optionrom/kvmvapic.S |  341 ++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 344 insertions(+), 2 deletions(-)
 create mode 100644 pc-bios/optionrom/kvmvapic.S

diff --git a/.gitignore b/.gitignore
index f5aab2c..d3b78c3 100644
--- a/.gitignore
+++ b/.gitignore
@@ -75,6 +75,7 @@ pc-bios/vgabios-pq/status
 pc-bios/optionrom/linuxboot.bin
 pc-bios/optionrom/multiboot.bin
 pc-bios/optionrom/multiboot.raw
+pc-bios/optionrom/kvmvapic.bin
 .stgit-*
 cscope.*
 tags
diff --git a/Makefile b/Makefile
index 47acf3d..c2ef135 100644
--- a/Makefile
+++ b/Makefile
@@ -255,7 +255,7 @@ pxe-e1000.rom pxe-eepro100.rom pxe-ne2k_pci.rom \
 pxe-pcnet.rom pxe-rtl8139.rom pxe-virtio.rom \
 bamboo.dtb petalogix-s3adsp1800.dtb petalogix-ml605.dtb \
 mpc8544ds.dtb \
-multiboot.bin linuxboot.bin \
+multiboot.bin linuxboot.bin kvmvapic.bin \
 s390-zipl.rom \
 spapr-rtas.bin slof.bin \
 palcode-clipper
diff --git a/pc-bios/optionrom/Makefile b/pc-bios/optionrom/Makefile
index 2caf7e6..f6b4027 100644
--- a/pc-bios/optionrom/Makefile
+++ b/pc-bios/optionrom/Makefile
@@ -14,7 +14,7 @@ CFLAGS += -I$(SRC_PATH)
 CFLAGS += $(call cc-option, $(CFLAGS), -fno-stack-protector)
 QEMU_CFLAGS = $(CFLAGS)
 
-build-all: multiboot.bin linuxboot.bin
+build-all: multiboot.bin linuxboot.bin kvmvapic.bin
 
 # suppress auto-removal of intermediate files
 .SECONDARY:
diff --git a/pc-bios/optionrom/kvmvapic.S b/pc-bios/optionrom/kvmvapic.S
new file mode 100644
index 0000000..e1d8f18
--- /dev/null
+++ b/pc-bios/optionrom/kvmvapic.S
@@ -0,0 +1,341 @@
+#
+# Local APIC acceleration for Windows XP and related guests
+#
+# Copyright 2011 Red Hat, Inc. and/or its affiliates
+#
+# Author: Avi Kivity <avi@redhat.com>
+#
+# This work is licensed under the terms of the GNU GPL, version 2, or (at your
+# option) any later version.  See the COPYING file in the top-level directory.
+#
+
+	.text 0
+	.code16
+.global _start
+_start:
+	.short 0xaa55
+	.byte (_end - _start) / 512
+	# clear vapic area: firmware load using rep insb may cause
+	# stale tpr/isr/irr data to corrupt the vapic area.
+	push %es
+	push %cs
+	pop %es
+	xor %ax, %ax
+	mov $vapic_size/2, %cx
+	lea vapic, %di
+	cld
+	rep stosw
+	pop %es
+	mov $vapic_base, %ax
+	out %ax, $0x7e
+	lret
+
+	.code32
+vapic_size = 2*4096
+
+.macro fixup delta=-4
+777:
+	.text 1
+	.long 777b + \delta  - vapic_base
+	.text 0
+.endm
+
+.macro reenable_vtpr
+	out %al, $0x7e
+.endm
+
+.text 1
+	fixup_start = .
+.text 0
+
+.align 16
+
+vapic_base:
+	.ascii "kvm aPiC"
+
+	/* relocation data */
+	.long vapic_base	; fixup
+	.long fixup_start	; fixup
+	.long fixup_end		; fixup
+
+	.long vapic		; fixup
+	.long vapic_size
+vcpu_shift:
+	.long 0
+real_tpr:
+	.long 0
+	.long up_set_tpr	; fixup
+	.long up_set_tpr_eax	; fixup
+	.long up_get_tpr_eax	; fixup
+	.long up_get_tpr_ecx	; fixup
+	.long up_get_tpr_edx	; fixup
+	.long up_get_tpr_ebx	; fixup
+	.long 0 /* esp. won't work. */
+	.long up_get_tpr_ebp	; fixup
+	.long up_get_tpr_esi	; fixup
+	.long up_get_tpr_edi	; fixup
+	.long up_get_tpr_stack  ; fixup
+	.long mp_set_tpr	; fixup
+	.long mp_set_tpr_eax	; fixup
+	.long mp_get_tpr_eax	; fixup
+	.long mp_get_tpr_ecx	; fixup
+	.long mp_get_tpr_edx	; fixup
+	.long mp_get_tpr_ebx	; fixup
+	.long 0 /* esp. won't work. */
+	.long mp_get_tpr_ebp	; fixup
+	.long mp_get_tpr_esi	; fixup
+	.long mp_get_tpr_edi	; fixup
+	.long mp_get_tpr_stack  ; fixup
+
+.macro kvm_hypercall
+	.byte 0x0f, 0x01, 0xc1
+.endm
+
+kvm_hypercall_vapic_poll_irq = 1
+
+pcr_cpu = 0x51
+
+.align 64
+
+mp_get_tpr_eax:
+	pushf
+	cli
+	reenable_vtpr
+	push %ecx
+
+	fs/movzbl pcr_cpu, %eax
+
+	mov vcpu_shift, %ecx	; fixup
+	shl %cl, %eax
+	testb $1, vapic+4(%eax)	; fixup delta=-5
+	jz mp_get_tpr_bad
+	movzbl vapic(%eax), %eax ; fixup
+
+mp_get_tpr_out:
+	pop %ecx
+	popf
+	ret
+
+mp_get_tpr_bad:
+	mov real_tpr, %eax	; fixup
+	mov (%eax), %eax
+	jmp mp_get_tpr_out
+
+mp_get_tpr_ebx:
+	mov %eax, %ebx
+	call mp_get_tpr_eax
+	xchg %eax, %ebx
+	ret
+
+mp_get_tpr_ecx:
+	mov %eax, %ecx
+	call mp_get_tpr_eax
+	xchg %eax, %ecx
+	ret
+
+mp_get_tpr_edx:
+	mov %eax, %edx
+	call mp_get_tpr_eax
+	xchg %eax, %edx
+	ret
+
+mp_get_tpr_esi:
+	mov %eax, %esi
+	call mp_get_tpr_eax
+	xchg %eax, %esi
+	ret
+
+mp_get_tpr_edi:
+	mov %eax, %edi
+	call mp_get_tpr_eax
+	xchg %eax, %edi
+	ret
+
+mp_get_tpr_ebp:
+	mov %eax, %ebp
+	call mp_get_tpr_eax
+	xchg %eax, %ebp
+	ret
+
+mp_get_tpr_stack:
+	call mp_get_tpr_eax
+	xchg %eax, 4(%esp)
+	ret
+
+mp_set_tpr_eax:
+	push %eax
+	call mp_set_tpr
+	ret
+
+mp_set_tpr:
+	pushf
+	push %eax
+	push %ecx
+	push %edx
+	push %ebx
+	cli
+	reenable_vtpr
+
+mp_set_tpr_failed:
+	fs/movzbl pcr_cpu, %edx
+
+	mov vcpu_shift, %ecx	; fixup
+	shl %cl, %edx
+
+	testb $1, vapic+4(%edx)	; fixup delta=-5
+	jz mp_set_tpr_bad
+
+	mov vapic(%edx), %eax	; fixup
+
+	mov %eax, %ebx
+	mov 24(%esp), %bl
+
+	/* %ebx = new vapic (%bl = tpr, %bh = isr, %b3 = irr) */
+
+	lock cmpxchg %ebx, vapic(%edx) ; fixup
+	jnz mp_set_tpr_failed
+
+	/* compute ppr */
+	cmp %bh, %bl
+	jae mp_tpr_is_bigger
+mp_isr_is_bigger:
+	mov %bh, %bl
+mp_tpr_is_bigger:
+	/* %bl = ppr */
+	mov %bl, %ch   /* ch = ppr */
+	rol $8, %ebx
+	/* now: %bl = irr, %bh = ppr */
+	cmp %bh, %bl
+	ja mp_set_tpr_poll_irq
+
+mp_set_tpr_out:
+	pop %ebx
+	pop %edx
+	pop %ecx
+	pop %eax
+	popf
+	ret $4
+
+mp_set_tpr_poll_irq:
+	mov $kvm_hypercall_vapic_poll_irq, %eax
+	kvm_hypercall
+	jmp mp_set_tpr_out
+
+mp_set_tpr_bad:
+	mov 24(%esp), %ecx
+	mov real_tpr, %eax	; fixup
+	mov %ecx, (%eax)
+	jmp mp_set_tpr_out
+
+up_get_tpr_eax:
+	reenable_vtpr
+	movzbl vapic, %eax ; fixup
+	ret
+
+up_get_tpr_ebx:
+	reenable_vtpr
+	movzbl vapic, %ebx ; fixup
+	ret
+
+up_get_tpr_ecx:
+	reenable_vtpr
+	movzbl vapic, %ecx ; fixup
+	ret
+
+up_get_tpr_edx:
+	reenable_vtpr
+	movzbl vapic, %edx ; fixup
+	ret
+
+up_get_tpr_esi:
+	reenable_vtpr
+	movzbl vapic, %esi ; fixup
+	ret
+
+up_get_tpr_edi:
+	reenable_vtpr
+	movzbl vapic, %edi ; fixup
+	ret
+
+up_get_tpr_ebp:
+	reenable_vtpr
+	movzbl vapic, %ebp ; fixup
+	ret
+
+up_get_tpr_stack:
+	reenable_vtpr
+	movzbl vapic, %eax ; fixup
+	xchg %eax, 4(%esp)
+	ret
+
+up_set_tpr_eax:
+	push %eax
+	call up_set_tpr
+	ret
+
+up_set_tpr:
+	pushf
+	push %eax
+	push %ecx
+	push %ebx
+	reenable_vtpr
+
+up_set_tpr_failed:
+	mov vapic, %eax	; fixup
+
+	mov %eax, %ebx
+	mov 20(%esp), %bl
+
+	/* %ebx = new vapic (%bl = tpr, %bh = isr, %b3 = irr) */
+
+	lock cmpxchg %ebx, vapic ; fixup
+	jnz up_set_tpr_failed
+
+	/* compute ppr */
+	cmp %bh, %bl
+	jae up_tpr_is_bigger
+up_isr_is_bigger:
+	mov %bh, %bl
+up_tpr_is_bigger:
+	/* %bl = ppr */
+	mov %bl, %ch   /* ch = ppr */
+	rol $8, %ebx
+	/* now: %bl = irr, %bh = ppr */
+	cmp %bh, %bl
+	ja up_set_tpr_poll_irq
+
+up_set_tpr_out:
+	pop %ebx
+	pop %ecx
+	pop %eax
+	popf
+	ret $4
+
+up_set_tpr_poll_irq:
+	mov $kvm_hypercall_vapic_poll_irq, %eax
+	kvm_hypercall
+	jmp up_set_tpr_out
+
+.text 1
+	fixup_end = .
+.text 0
+
+/*
+ * vapic format:
+ *  per-vcpu records of size 2^vcpu_shift.
+ *     byte 0: tpr (r/w)
+ *     byte 1: highest in-service interrupt (isr) (r/o); bits 3:0 are zero
+ *     byte 2: zero (r/o)
+ *     byte 3: highest pending interrupt (irr) (r/o)
+ */
+.text 2
+
+.align 128
+
+vapic:
+. = . + vapic_size
+
+.byte 0  # reserve space for signature
+.align 512, 0
+
+_end:
-- 
1.7.3.4


* [PATCH v2 5/8] kvmvapic: Introduce TPR access optimization for Windows guests
  2012-02-10 18:31 ` [Qemu-devel] " Jan Kiszka
@ 2012-02-10 18:31   ` Jan Kiszka
  -1 siblings, 0 replies; 90+ messages in thread
From: Jan Kiszka @ 2012-02-10 18:31 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: kvm, qemu-devel, Anthony Liguori, Gleb Natapov

This enables acceleration of MMIO-based TPR register accesses by 32-bit
Windows guests. It is mostly useful with KVM enabled, either on older
Intel CPUs (without the flexpriority feature, which can also be disabled
manually for testing) or on any current AMD processor.

The approach introduced here is derived from the original version in
qemu-kvm. It was refactored, documented, and extended with support for
user space APIC emulation, both with and without KVM acceleration. The
VMState format was kept compatible, as was the ABI to the option ROM
that implements the guest-side para-virtualized driver service. This
enables seamless migration from qemu-kvm to upstream or, one day,
between KVM and TCG mode.

The basic concept goes like this:
 - VAPIC PV interface consisting of I/O port 0x7e and (for KVM in-kernel
   irqchip) a vmcall hypercall is registered
 - VAPIC option ROM is loaded into guest
 - option ROM activates TPR MMIO access reporting via port 0x7e
 - TPR accesses are trapped and patched in the guest to call into option
   ROM instead, VAPIC support is enabled
 - option ROM TPR helpers track state in memory and invoke hypercall to
   poll for pending IRQs if required

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
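Note on the "trapped and patched" step in the commit message: before an
instruction is patched, it has to be recognized as an absolute-address
access to the TPR, which sits at offset 0x80 of the APIC page. The
following is only a minimal C sketch of that check. modrm_reg and
is_abs_modrm mirror the helpers added in hw/kvmvapic.c, while abs_operand
and targets_tpr are hypothetical names introduced for illustration, and
the guest address 0xfffe0080 is merely an example value.

```c
#include <stdbool.h>
#include <stdint.h>

/* ModRM layout: mod (bits 7:6), reg (bits 5:3), rm (bits 2:0). */
static uint8_t modrm_reg(uint8_t modrm)
{
    return (modrm >> 3) & 7;
}

/* mod == 00 and rm == 101 select a bare 32-bit displacement. */
static bool is_abs_modrm(uint8_t modrm)
{
    return (modrm & 0xc7) == 0x05;
}

/* Hypothetical: read the little-endian disp32 after opcode and ModRM. */
static uint32_t abs_operand(const uint8_t *insn)
{
    return (uint32_t)insn[2] | (uint32_t)insn[3] << 8 |
           (uint32_t)insn[4] << 16 | (uint32_t)insn[5] << 24;
}

/* Hypothetical: does a 6-byte opcode/ModRM/disp32 form address the TPR? */
static bool targets_tpr(const uint8_t *insn)
{
    return is_abs_modrm(insn[1]) && (abs_operand(insn) & 0xfff) == 0x80;
}
```

For example, the byte sequence 89 0d 80 00 fe ff (mov %ecx, 0xfffe0080)
has ModRM 0x0d, that is mod 00, reg 1 (%ecx), rm 5, so it passes both
checks. A register-to-register form such as ModRM 0xc8 does not.
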
 Makefile.target    |    3 +-
 hw/apic.c          |  126 ++++++++-
 hw/apic_common.c   |   64 +++++-
 hw/apic_internal.h |   27 ++
 hw/kvm/apic.c      |   32 +++
 hw/kvmvapic.c      |  774 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 1012 insertions(+), 14 deletions(-)
 create mode 100644 hw/kvmvapic.c

diff --git a/Makefile.target b/Makefile.target
index 68481a3..ec7eff8 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -230,7 +230,8 @@ obj-y += device-hotplug.o
 
 # Hardware support
 obj-i386-y += mc146818rtc.o pc.o
-obj-i386-y += sga.o apic_common.o apic.o ioapic_common.o ioapic.o piix_pci.o
+obj-i386-y += apic_common.o apic.o kvmvapic.o
+obj-i386-y += sga.o ioapic_common.o ioapic.o piix_pci.o
 obj-i386-y += vmport.o
 obj-i386-y += pci-hotplug.o smbios.o wdt_ib700.o
 obj-i386-y += debugcon.o multiboot.o
diff --git a/hw/apic.c b/hw/apic.c
index 086c544..2ebf3ca 100644
--- a/hw/apic.c
+++ b/hw/apic.c
@@ -35,6 +35,10 @@
 #define MSI_ADDR_DEST_ID_SHIFT		12
 #define	MSI_ADDR_DEST_ID_MASK		0x00ffff0
 
+#define SYNC_FROM_VAPIC                 0x1
+#define SYNC_TO_VAPIC                   0x2
+#define SYNC_ISR_IRR_TO_VAPIC           0x4
+
 static APICCommonState *local_apics[MAX_APICS + 1];
 
 static void apic_set_irq(APICCommonState *s, int vector_num, int trigger_mode);
@@ -78,6 +82,70 @@ static inline int get_bit(uint32_t *tab, int index)
     return !!(tab[i] & mask);
 }
 
+/* return -1 if no bit is set */
+static int get_highest_priority_int(uint32_t *tab)
+{
+    int i;
+    for (i = 7; i >= 0; i--) {
+        if (tab[i] != 0) {
+            return i * 32 + fls_bit(tab[i]);
+        }
+    }
+    return -1;
+}
+
+static void apic_sync_vapic(APICCommonState *s, int sync_type)
+{
+    VAPICState vapic_state;
+    size_t length;
+    off_t start;
+    int vector;
+
+    if (!s->vapic_paddr) {
+        return;
+    }
+    if (sync_type & SYNC_FROM_VAPIC) {
+        cpu_physical_memory_rw(s->vapic_paddr, (void *)&vapic_state,
+                               sizeof(vapic_state), 0);
+        s->tpr = vapic_state.tpr;
+    }
+    if (sync_type & (SYNC_TO_VAPIC | SYNC_ISR_IRR_TO_VAPIC)) {
+        start = offsetof(VAPICState, isr);
+        length = offsetof(VAPICState, enabled) - offsetof(VAPICState, isr);
+
+        if (sync_type & SYNC_TO_VAPIC) {
+            assert(qemu_cpu_is_self(s->cpu_env));
+
+            vapic_state.tpr = s->tpr;
+            vapic_state.enabled = 1;
+            start = 0;
+            length = sizeof(VAPICState);
+        }
+
+        vector = get_highest_priority_int(s->isr);
+        if (vector < 0) {
+            vector = 0;
+        }
+        vapic_state.isr = vector & 0xf0;
+
+        vapic_state.zero = 0;
+
+        vector = get_highest_priority_int(s->irr);
+        if (vector < 0) {
+            vector = 0;
+        }
+        vapic_state.irr = vector & 0xff;
+
+        cpu_physical_memory_write_rom(s->vapic_paddr + start,
+                                      ((void *)&vapic_state) + start, length);
+    }
+}
+
+static void apic_vapic_base_update(APICCommonState *s)
+{
+    apic_sync_vapic(s, SYNC_TO_VAPIC);
+}
+
 static void apic_local_deliver(APICCommonState *s, int vector)
 {
     uint32_t lvt = s->lvt[vector];
@@ -239,20 +307,17 @@ static void apic_set_base(APICCommonState *s, uint64_t val)
 
 static void apic_set_tpr(APICCommonState *s, uint8_t val)
 {
-    s->tpr = (val & 0x0f) << 4;
-    apic_update_irq(s);
+    /* Updates from cr8 are ignored while the VAPIC is active */
+    if (!s->vapic_paddr) {
+        s->tpr = val << 4;
+        apic_update_irq(s);
+    }
 }
 
-/* return -1 if no bit is set */
-static int get_highest_priority_int(uint32_t *tab)
+static uint8_t apic_get_tpr(APICCommonState *s)
 {
-    int i;
-    for(i = 7; i >= 0; i--) {
-        if (tab[i] != 0) {
-            return i * 32 + fls_bit(tab[i]);
-        }
-    }
-    return -1;
+    apic_sync_vapic(s, SYNC_FROM_VAPIC);
+    return s->tpr >> 4;
 }
 
 static int apic_get_ppr(APICCommonState *s)
@@ -312,6 +377,14 @@ static void apic_update_irq(APICCommonState *s)
     }
 }
 
+void apic_poll_irq(DeviceState *d)
+{
+    APICCommonState *s = APIC_COMMON(d);
+
+    apic_sync_vapic(s, SYNC_FROM_VAPIC);
+    apic_update_irq(s);
+}
+
 static void apic_set_irq(APICCommonState *s, int vector_num, int trigger_mode)
 {
     apic_report_irq_delivered(!get_bit(s->irr, vector_num));
@@ -321,6 +394,16 @@ static void apic_set_irq(APICCommonState *s, int vector_num, int trigger_mode)
         set_bit(s->tmr, vector_num);
     else
         reset_bit(s->tmr, vector_num);
+    if (s->vapic_paddr) {
+        apic_sync_vapic(s, SYNC_ISR_IRR_TO_VAPIC);
+        /*
+         * The vcpu thread needs to see the new IRR before we pull its current
+         * TPR value. That way, if we miss a lowering of the TPR, the guest
+         * has the chance to notice the new IRR and poll for IRQs on its own.
+         */
+        smp_wmb();
+        apic_sync_vapic(s, SYNC_FROM_VAPIC);
+    }
     apic_update_irq(s);
 }
 
@@ -334,6 +417,7 @@ static void apic_eoi(APICCommonState *s)
     if (!(s->spurious_vec & APIC_SV_DIRECTED_IO) && get_bit(s->tmr, isrv)) {
         ioapic_eoi_broadcast(isrv);
     }
+    apic_sync_vapic(s, SYNC_FROM_VAPIC | SYNC_TO_VAPIC);
     apic_update_irq(s);
 }
 
@@ -471,15 +555,19 @@ int apic_get_interrupt(DeviceState *d)
     if (!(s->spurious_vec & APIC_SV_ENABLE))
         return -1;
 
+    apic_sync_vapic(s, SYNC_FROM_VAPIC);
     intno = apic_irq_pending(s);
 
     if (intno == 0) {
+        apic_sync_vapic(s, SYNC_TO_VAPIC);
         return -1;
     } else if (intno < 0) {
+        apic_sync_vapic(s, SYNC_TO_VAPIC);
         return s->spurious_vec & 0xff;
     }
     reset_bit(s->irr, intno);
     set_bit(s->isr, intno);
+    apic_sync_vapic(s, SYNC_TO_VAPIC);
     apic_update_irq(s);
     return intno;
 }
@@ -576,6 +664,10 @@ static uint32_t apic_mem_readl(void *opaque, target_phys_addr_t addr)
         val = 0x11 | ((APIC_LVT_NB - 1) << 16); /* version 0x11 */
         break;
     case 0x08:
+        apic_sync_vapic(s, SYNC_FROM_VAPIC);
+        if (apic_report_tpr_access) {
+            cpu_report_tpr_access(s->cpu_env, TPR_ACCESS_READ);
+        }
         val = s->tpr;
         break;
     case 0x09:
@@ -675,7 +767,11 @@ static void apic_mem_writel(void *opaque, target_phys_addr_t addr, uint32_t val)
     case 0x03:
         break;
     case 0x08:
+        if (apic_report_tpr_access) {
+            cpu_report_tpr_access(s->cpu_env, TPR_ACCESS_WRITE);
+        }
         s->tpr = val;
+        apic_sync_vapic(s, SYNC_TO_VAPIC);
         apic_update_irq(s);
         break;
     case 0x09:
@@ -737,6 +833,11 @@ static void apic_mem_writel(void *opaque, target_phys_addr_t addr, uint32_t val)
     }
 }
 
+static void apic_pre_save(APICCommonState *s)
+{
+    apic_sync_vapic(s, SYNC_FROM_VAPIC);
+}
+
 static void apic_post_load(APICCommonState *s)
 {
     if (s->timer_expiry != -1) {
@@ -770,7 +871,10 @@ static void apic_class_init(ObjectClass *klass, void *data)
     k->init = apic_init;
     k->set_base = apic_set_base;
     k->set_tpr = apic_set_tpr;
+    k->get_tpr = apic_get_tpr;
+    k->vapic_base_update = apic_vapic_base_update;
     k->external_nmi = apic_external_nmi;
+    k->pre_save = apic_pre_save;
     k->post_load = apic_post_load;
 }
 
diff --git a/hw/apic_common.c b/hw/apic_common.c
index 588531b..1977da7 100644
--- a/hw/apic_common.c
+++ b/hw/apic_common.c
@@ -20,8 +20,10 @@
 #include "apic.h"
 #include "apic_internal.h"
 #include "trace.h"
+#include "kvm.h"
 
 static int apic_irq_delivered;
+bool apic_report_tpr_access;
 
 void cpu_set_apic_base(DeviceState *d, uint64_t val)
 {
@@ -63,13 +65,44 @@ void cpu_set_apic_tpr(DeviceState *d, uint8_t val)
 
 uint8_t cpu_get_apic_tpr(DeviceState *d)
 {
+    APICCommonState *s;
+    APICCommonClass *info;
+
+    if (!d) {
+        return 0;
+    }
+
+    s = APIC_COMMON(d);
+    info = APIC_COMMON_GET_CLASS(s);
+
+    return info->get_tpr(s);
+}
+
+void apic_enable_tpr_access_reporting(DeviceState *d)
+{
+    APICCommonState *s = DO_UPCAST(APICCommonState, busdev.qdev, d);
+    APICCommonClass *info = APIC_COMMON_GET_CLASS(s);
+
+    apic_report_tpr_access = true;
+    if (info->enable_tpr_reporting) {
+        info->enable_tpr_reporting(s);
+    }
+}
+
+void apic_enable_vapic(DeviceState *d, target_phys_addr_t paddr)
+{
     APICCommonState *s = DO_UPCAST(APICCommonState, busdev.qdev, d);
+    APICCommonClass *info = APIC_COMMON_GET_CLASS(s);
 
-    return s ? s->tpr >> 4 : 0;
+    s->vapic_paddr = paddr;
+    info->vapic_base_update(s);
 }
 
 void apic_handle_tpr_access_report(DeviceState *d, target_ulong ip, int access)
 {
+    APICCommonState *s = DO_UPCAST(APICCommonState, busdev.qdev, d);
+
+    vapic_report_tpr_access(s->vapic, s->cpu_env, ip, access);
 }
 
 void apic_report_irq_delivered(int delivered)
@@ -170,12 +203,16 @@ void apic_init_reset(DeviceState *d)
 static void apic_reset_common(DeviceState *d)
 {
     APICCommonState *s = DO_UPCAST(APICCommonState, busdev.qdev, d);
+    APICCommonClass *info = APIC_COMMON_GET_CLASS(s);
     bool bsp;
 
     bsp = cpu_is_bsp(s->cpu_env);
     s->apicbase = 0xfee00000 |
         (bsp ? MSR_IA32_APICBASE_BSP : 0) | MSR_IA32_APICBASE_ENABLE;
 
+    s->vapic_paddr = 0;
+    info->vapic_base_update(s);
+
     apic_init_reset(d);
 
     if (bsp) {
@@ -238,6 +275,7 @@ static int apic_init_common(SysBusDevice *dev)
 {
     APICCommonState *s = APIC_COMMON(dev);
     APICCommonClass *info;
+    static DeviceState *vapic;
     static int apic_no;
 
     if (apic_no >= MAX_APICS) {
@@ -248,10 +286,29 @@ static int apic_init_common(SysBusDevice *dev)
     info = APIC_COMMON_GET_CLASS(s);
     info->init(s);
 
-    sysbus_init_mmio(&s->busdev, &s->io_memory);
+    sysbus_init_mmio(dev, &s->io_memory);
+
+    if (!vapic && s->vapic_control & VAPIC_ENABLE_MASK) {
+        vapic = sysbus_create_simple("kvmvapic", -1, NULL);
+    }
+    s->vapic = vapic;
+    if (apic_report_tpr_access && info->enable_tpr_reporting) {
+        info->enable_tpr_reporting(s);
+    }
+
     return 0;
 }
 
+static void apic_dispatch_pre_save(void *opaque)
+{
+    APICCommonState *s = APIC_COMMON(opaque);
+    APICCommonClass *info = APIC_COMMON_GET_CLASS(s);
+
+    if (info->pre_save) {
+        info->pre_save(s);
+    }
+}
+
 static int apic_dispatch_post_load(void *opaque, int version_id)
 {
     APICCommonState *s = APIC_COMMON(opaque);
@@ -269,6 +326,7 @@ static const VMStateDescription vmstate_apic_common = {
     .minimum_version_id = 3,
     .minimum_version_id_old = 1,
     .load_state_old = apic_load_old,
+    .pre_save = apic_dispatch_pre_save,
     .post_load = apic_dispatch_post_load,
     .fields = (VMStateField[]) {
         VMSTATE_UINT32(apicbase, APICCommonState),
@@ -298,6 +356,8 @@ static const VMStateDescription vmstate_apic_common = {
 static Property apic_properties_common[] = {
     DEFINE_PROP_UINT8("id", APICCommonState, id, -1),
     DEFINE_PROP_PTR("cpu_env", APICCommonState, cpu_env),
+    DEFINE_PROP_BIT("vapic", APICCommonState, vapic_control, VAPIC_ENABLE_BIT,
+                    true),
     DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/apic_internal.h b/hw/apic_internal.h
index 0cab010..95cc7cf 100644
--- a/hw/apic_internal.h
+++ b/hw/apic_internal.h
@@ -61,6 +61,9 @@
 #define APIC_SV_DIRECTED_IO             (1<<12)
 #define APIC_SV_ENABLE                  (1<<8)
 
+#define VAPIC_ENABLE_BIT                0
+#define VAPIC_ENABLE_MASK               (1 << VAPIC_ENABLE_BIT)
+
 #define MAX_APICS 255
 
 #define MSI_SPACE_SIZE                  0x100000
@@ -82,7 +85,11 @@ typedef struct APICCommonClass
     void (*init)(APICCommonState *s);
     void (*set_base)(APICCommonState *s, uint64_t val);
     void (*set_tpr)(APICCommonState *s, uint8_t val);
+    uint8_t (*get_tpr)(APICCommonState *s);
+    void (*enable_tpr_reporting)(APICCommonState *s);
+    void (*vapic_base_update)(APICCommonState *s);
     void (*external_nmi)(APICCommonState *s);
+    void (*pre_save)(APICCommonState *s);
     void (*post_load)(APICCommonState *s);
 } APICCommonClass;
 
@@ -114,9 +121,29 @@ struct APICCommonState {
     int64_t timer_expiry;
     int sipi_vector;
     int wait_for_sipi;
+
+    uint32_t vapic_control;
+    DeviceState *vapic;
+    target_phys_addr_t vapic_paddr; /* note: persistence via kvmvapic */
 };
 
+typedef struct VAPICState {
+    uint8_t tpr;
+    uint8_t isr;
+    uint8_t zero;
+    uint8_t irr;
+    uint8_t enabled;
+} QEMU_PACKED VAPICState;
+
+extern bool apic_report_tpr_access;
+
 void apic_report_irq_delivered(int delivered);
 bool apic_next_timer(APICCommonState *s, int64_t current_time);
+void apic_enable_tpr_access_reporting(DeviceState *d);
+void apic_enable_vapic(DeviceState *d, target_phys_addr_t paddr);
+void apic_poll_irq(DeviceState *d);
+
+void vapic_report_tpr_access(DeviceState *dev, void *cpu, target_ulong ip,
+                             int access);
 
 #endif /* !QEMU_APIC_INTERNAL_H */
diff --git a/hw/kvm/apic.c b/hw/kvm/apic.c
index dfc2ab3..326eb37 100644
--- a/hw/kvm/apic.c
+++ b/hw/kvm/apic.c
@@ -92,6 +92,35 @@ static void kvm_apic_set_tpr(APICCommonState *s, uint8_t val)
     s->tpr = (val & 0x0f) << 4;
 }
 
+static uint8_t kvm_apic_get_tpr(APICCommonState *s)
+{
+    return s->tpr >> 4;
+}
+
+static void kvm_apic_enable_tpr_reporting(APICCommonState *s)
+{
+    struct kvm_tpr_access_ctl ctl = {
+        .enabled = 1
+    };
+
+    kvm_vcpu_ioctl(s->cpu_env, KVM_TPR_ACCESS_REPORTING, &ctl);
+}
+
+static void kvm_apic_vapic_base_update(APICCommonState *s)
+{
+    struct kvm_vapic_addr vapic_addr = {
+        .vapic_addr = s->vapic_paddr,
+    };
+    int ret;
+
+    ret = kvm_vcpu_ioctl(s->cpu_env, KVM_SET_VAPIC_ADDR, &vapic_addr);
+    if (ret < 0) {
+        fprintf(stderr, "KVM: setting VAPIC address failed (%s)\n",
+                strerror(-ret));
+        abort();
+    }
+}
+
 static void do_inject_external_nmi(void *data)
 {
     APICCommonState *s = data;
@@ -129,6 +158,9 @@ static void kvm_apic_class_init(ObjectClass *klass, void *data)
     k->init = kvm_apic_init;
     k->set_base = kvm_apic_set_base;
     k->set_tpr = kvm_apic_set_tpr;
+    k->get_tpr = kvm_apic_get_tpr;
+    k->enable_tpr_reporting = kvm_apic_enable_tpr_reporting;
+    k->vapic_base_update = kvm_apic_vapic_base_update;
     k->external_nmi = kvm_apic_external_nmi;
 }
 
diff --git a/hw/kvmvapic.c b/hw/kvmvapic.c
new file mode 100644
index 0000000..0c4d304
--- /dev/null
+++ b/hw/kvmvapic.c
@@ -0,0 +1,774 @@
+/*
+ * TPR optimization for 32-bit Windows guests
+ *
+ * Copyright (C) 2007-2008 Qumranet Technologies
+ * Copyright (C) 2012      Jan Kiszka, Siemens AG
+ *
+ * This work is licensed under the terms of the GNU GPL version 2, or
+ * (at your option) any later version. See the COPYING file in the
+ * top-level directory.
+ */
+#include "sysemu.h"
+#include "cpus.h"
+#include "kvm.h"
+#include "apic_internal.h"
+
+#define APIC_DEFAULT_ADDRESS    0xfee00000
+
+#define VAPIC_IO_PORT           0x7e
+
+#define VAPIC_INACTIVE          0
+#define VAPIC_ACTIVE            1
+#define VAPIC_STANDBY           2
+
+#define VAPIC_CPU_SHIFT         7
+
+#define ROM_BLOCK_SIZE          512
+#define ROM_BLOCK_MASK          (~(ROM_BLOCK_SIZE - 1))
+
+typedef struct VAPICHandlers {
+    uint32_t set_tpr;
+    uint32_t set_tpr_eax;
+    uint32_t get_tpr[8];
+    uint32_t get_tpr_stack;
+} QEMU_PACKED VAPICHandlers;
+
+typedef struct GuestROMState {
+    char signature[8];
+    uint32_t vaddr;
+    uint32_t fixup_start;
+    uint32_t fixup_end;
+    uint32_t vapic_vaddr;
+    uint32_t vapic_size;
+    uint32_t vcpu_shift;
+    uint32_t real_tpr_addr;
+    VAPICHandlers up;
+    VAPICHandlers mp;
+} QEMU_PACKED GuestROMState;
+
+typedef struct VAPICROMState {
+    SysBusDevice busdev;
+    MemoryRegion io;
+    MemoryRegion rom;
+    bool rom_mapped_writable;
+    uint32_t state;
+    uint32_t rom_state_paddr;
+    uint32_t rom_state_vaddr;
+    uint32_t vapic_paddr;
+    uint32_t real_tpr_addr;
+    GuestROMState rom_state;
+    size_t rom_size;
+} VAPICROMState;
+
+#define TPR_INSTR_IS_WRITE              0x1
+#define TPR_INSTR_ABS_MODRM             0x2
+#define TPR_INSTR_MATCH_MODRM_REG       0x4
+
+typedef struct TPRInstruction {
+    uint8_t opcode;
+    uint8_t modrm_reg;
+    unsigned int flags;
+    size_t length;
+    off_t addr_offset;
+} TPRInstruction;
+
+/* must be sorted by length, shortest first */
+static const TPRInstruction tpr_instr[] = {
+    { /* mov abs to eax */
+        .opcode = 0xa1,
+        .length = 5,
+        .addr_offset = 1,
+    },
+    { /* mov eax to abs */
+        .opcode = 0xa3,
+        .flags = TPR_INSTR_IS_WRITE,
+        .length = 5,
+        .addr_offset = 1,
+    },
+    { /* mov r32 to r/m32 */
+        .opcode = 0x89,
+        .flags = TPR_INSTR_IS_WRITE | TPR_INSTR_ABS_MODRM,
+        .length = 6,
+        .addr_offset = 2,
+    },
+    { /* mov r/m32 to r32 */
+        .opcode = 0x8b,
+        .flags = TPR_INSTR_ABS_MODRM,
+        .length = 6,
+        .addr_offset = 2,
+    },
+    { /* push r/m32 */
+        .opcode = 0xff,
+        .modrm_reg = 6,
+        .flags = TPR_INSTR_ABS_MODRM | TPR_INSTR_MATCH_MODRM_REG,
+        .length = 6,
+        .addr_offset = 2,
+    },
+    { /* mov imm32, r/m32 (c7/0) */
+        .opcode = 0xc7,
+        .modrm_reg = 0,
+        .flags = TPR_INSTR_IS_WRITE | TPR_INSTR_ABS_MODRM |
+                 TPR_INSTR_MATCH_MODRM_REG,
+        .length = 10,
+        .addr_offset = 2,
+    },
+};
+
+static void read_guest_rom_state(VAPICROMState *s)
+{
+    cpu_physical_memory_rw(s->rom_state_paddr, (void *)&s->rom_state,
+                           sizeof(GuestROMState), 0);
+}
+
+static void write_guest_rom_state(VAPICROMState *s)
+{
+    cpu_physical_memory_rw(s->rom_state_paddr, (void *)&s->rom_state,
+                           sizeof(GuestROMState), 1);
+}
+
+static void update_guest_rom_state(VAPICROMState *s)
+{
+    read_guest_rom_state(s);
+
+    s->rom_state.real_tpr_addr = cpu_to_le32(s->real_tpr_addr);
+    s->rom_state.vcpu_shift = cpu_to_le32(VAPIC_CPU_SHIFT);
+
+    write_guest_rom_state(s);
+}
+
+static int find_real_tpr_addr(VAPICROMState *s, CPUState *env)
+{
+    target_phys_addr_t paddr;
+    target_ulong addr;
+
+    if (s->state == VAPIC_ACTIVE) {
+        return 0;
+    }
+    for (addr = 0xfffff000; addr >= 0x80000000; addr -= TARGET_PAGE_SIZE) {
+        paddr = cpu_get_phys_page_debug(env, addr);
+        if (paddr != APIC_DEFAULT_ADDRESS) {
+            continue;
+        }
+        s->real_tpr_addr = addr + 0x80;
+        update_guest_rom_state(s);
+        return 0;
+    }
+    return -1;
+}
+
+static uint8_t modrm_reg(uint8_t modrm)
+{
+    return (modrm >> 3) & 7;
+}
+
+static bool is_abs_modrm(uint8_t modrm)
+{
+    return (modrm & 0xc7) == 0x05;
+}
+
+static bool opcode_matches(uint8_t *opcode, const TPRInstruction *instr)
+{
+    return opcode[0] == instr->opcode &&
+        (!(instr->flags & TPR_INSTR_ABS_MODRM) || is_abs_modrm(opcode[1])) &&
+        (!(instr->flags & TPR_INSTR_MATCH_MODRM_REG) ||
+         modrm_reg(opcode[1]) == instr->modrm_reg);
+}
+
+static int evaluate_tpr_instruction(VAPICROMState *s, CPUState *env,
+                                    target_ulong *pip, int access)
+{
+    const TPRInstruction *instr;
+    target_ulong ip = *pip;
+    uint8_t opcode[2];
+    uint32_t real_tpr_addr;
+    int i;
+
+    if ((ip & 0xf0000000) != 0x80000000 && (ip & 0xf0000000) != 0xe0000000) {
+        return -1;
+    }
+
+    /*
+     * Early Windows 2003 SMP initialization contains a
+     *
+     *   mov imm32, r/m32
+     *
+     * instruction that is patched by TPR optimization. The problem is that
+     * RSP, used by the patched instruction, is zero, so the guest gets a
+     * double fault and dies.
+     */
+    if (env->regs[R_ESP] == 0) {
+        return -1;
+    }
+
+    if (access == TPR_ACCESS_WRITE && kvm_enabled() &&
+        !kvm_irqchip_in_kernel()) {
+        /*
+         * KVM without TPR access reporting calls into the user space APIC on
+         * write with IP pointing after the accessing instruction. So we need
+         * to look backward to find the reason.
+         */
+        for (i = 0; i < ARRAY_SIZE(tpr_instr); i++) {
+            instr = &tpr_instr[i];
+            if (!(instr->flags & TPR_INSTR_IS_WRITE)) {
+                continue;
+            }
+            if (cpu_memory_rw_debug(env, ip - instr->length, opcode,
+                                    sizeof(opcode), 0) < 0) {
+                return -1;
+            }
+            if (opcode_matches(opcode, instr)) {
+                ip -= instr->length;
+                goto instruction_ok;
+            }
+        }
+        return -1;
+    } else {
+        if (cpu_memory_rw_debug(env, ip, opcode, sizeof(opcode), 0) < 0) {
+            return -1;
+        }
+        for (i = 0; i < ARRAY_SIZE(tpr_instr); i++) {
+            instr = &tpr_instr[i];
+            if (opcode_matches(opcode, instr)) {
+                goto instruction_ok;
+            }
+        }
+        return -1;
+    }
+
+instruction_ok:
+    /*
+     * Grab the virtual TPR address from the instruction
+     * and update the cached values.
+     */
+    if (cpu_memory_rw_debug(env, ip + instr->addr_offset,
+                            (void *)&real_tpr_addr,
+                            sizeof(real_tpr_addr), 0) < 0) {
+        return -1;
+    }
+    real_tpr_addr = le32_to_cpu(real_tpr_addr);
+    if ((real_tpr_addr & 0xfff) != 0x80) {
+        return -1;
+    }
+    s->real_tpr_addr = real_tpr_addr;
+    update_guest_rom_state(s);
+
+    *pip = ip;
+    return 0;
+}
+
+static int update_rom_mapping(VAPICROMState *s, CPUState *env, target_ulong ip)
+{
+    target_phys_addr_t paddr;
+    uint32_t rom_state_vaddr;
+    uint32_t pos, patch, offset;
+
+    /* nothing to do if already activated */
+    if (s->state == VAPIC_ACTIVE) {
+        return 0;
+    }
+
+    /* bail out if ROM init code was not executed (missing ROM?) */
+    if (s->state == VAPIC_INACTIVE) {
+        return -1;
+    }
+
+    /* find out virtual address of the ROM */
+    rom_state_vaddr = s->rom_state_paddr + (ip & 0xf0000000);
+    paddr = cpu_get_phys_page_debug(env, rom_state_vaddr);
+    if (paddr == -1) {
+        return -1;
+    }
+    paddr += rom_state_vaddr & ~TARGET_PAGE_MASK;
+    if (paddr != s->rom_state_paddr) {
+        return -1;
+    }
+    read_guest_rom_state(s);
+    if (memcmp(s->rom_state.signature, "kvm aPiC", 8) != 0) {
+        return -1;
+    }
+    s->rom_state_vaddr = rom_state_vaddr;
+
+    /* fixup addresses in ROM if needed */
+    if (rom_state_vaddr == le32_to_cpu(s->rom_state.vaddr)) {
+        return 0;
+    }
+    for (pos = le32_to_cpu(s->rom_state.fixup_start);
+         pos < le32_to_cpu(s->rom_state.fixup_end);
+         pos += 4) {
+        cpu_physical_memory_rw(paddr + pos - le32_to_cpu(s->rom_state.vaddr),
+                               (void *)&offset, sizeof(offset), 0);
+        offset = le32_to_cpu(offset);
+        cpu_physical_memory_rw(paddr + offset, (void *)&patch,
+                               sizeof(patch), 0);
+        patch = le32_to_cpu(patch);
+        patch += rom_state_vaddr - le32_to_cpu(s->rom_state.vaddr);
+        patch = cpu_to_le32(patch);
+        cpu_physical_memory_rw(paddr + offset, (void *)&patch,
+                               sizeof(patch), 1);
+    }
+    read_guest_rom_state(s);
+    s->vapic_paddr = paddr + le32_to_cpu(s->rom_state.vapic_vaddr) -
+        le32_to_cpu(s->rom_state.vaddr);
+
+    return 0;
+}
+
+/*
+ * Tries to read the unique processor number from the Kernel Processor Control
+ * Region (KPCR) of 32-bit Windows. Returns -1 if the KPCR cannot be accessed
+ * or is considered invalid.
+ */
+static int get_kpcr_number(CPUState *env)
+{
+    struct kpcr {
+        uint8_t  fill1[0x1c];
+        uint32_t self;
+        uint8_t  fill2[0x31];
+        uint8_t  number;
+    } QEMU_PACKED kpcr;
+
+    if (cpu_memory_rw_debug(env, env->segs[R_FS].base,
+                            (void *)&kpcr, sizeof(kpcr), 0) < 0 ||
+        kpcr.self != env->segs[R_FS].base) {
+        return -1;
+    }
+    return kpcr.number;
+}
+
+static int vapic_enable(VAPICROMState *s, CPUState *env)
+{
+    int cpu_number = get_kpcr_number(env);
+    target_phys_addr_t vapic_paddr;
+    static const uint8_t enabled = 1;
+
+    if (cpu_number < 0) {
+        return -1;
+    }
+    vapic_paddr = s->vapic_paddr +
+        (((target_phys_addr_t)cpu_number) << VAPIC_CPU_SHIFT);
+    cpu_physical_memory_rw(vapic_paddr + offsetof(VAPICState, enabled),
+                           (void *)&enabled, sizeof(enabled), 1);
+    apic_enable_vapic(env->apic_state, vapic_paddr);
+
+    s->state = VAPIC_ACTIVE;
+
+    return 0;
+}
+
+static void patch_byte(CPUState *env, target_ulong addr, uint8_t byte)
+{
+    cpu_memory_rw_debug(env, addr, &byte, 1, 1);
+}
+
+static void patch_call(VAPICROMState *s, CPUState *env, target_ulong ip,
+                       uint32_t target)
+{
+    uint32_t offset;
+
+    offset = cpu_to_le32(target - ip - 5);
+    patch_byte(env, ip, 0xe8); /* call near */
+    cpu_memory_rw_debug(env, ip + 1, (void *)&offset, sizeof(offset), 1);
+}
+
+static void patch_instruction(VAPICROMState *s, CPUState *env, target_ulong ip)
+{
+    target_phys_addr_t paddr;
+    VAPICHandlers *handlers;
+    uint8_t opcode[2];
+    uint32_t imm32;
+
+    if (smp_cpus == 1) {
+        handlers = &s->rom_state.up;
+    } else {
+        handlers = &s->rom_state.mp;
+    }
+
+    pause_all_vcpus();
+
+    cpu_memory_rw_debug(env, ip, opcode, sizeof(opcode), 0);
+
+    switch (opcode[0]) {
+    case 0x89: /* mov r32 to r/m32 */
+        patch_byte(env, ip, 0x50 + modrm_reg(opcode[1]));  /* push reg */
+        patch_call(s, env, ip + 1, handlers->set_tpr);
+        break;
+    case 0x8b: /* mov r/m32 to r32 */
+        patch_byte(env, ip, 0x90);
+        patch_call(s, env, ip + 1, handlers->get_tpr[modrm_reg(opcode[1])]);
+        break;
+    case 0xa1: /* mov abs to eax */
+        patch_call(s, env, ip, handlers->get_tpr[0]);
+        break;
+    case 0xa3: /* mov eax to abs */
+        patch_call(s, env, ip, handlers->set_tpr_eax);
+        break;
+    case 0xc7: /* mov imm32, r/m32 (c7/0) */
+        patch_byte(env, ip, 0x68);  /* push imm32 */
+        cpu_memory_rw_debug(env, ip + 6, (void *)&imm32, sizeof(imm32), 0);
+        cpu_memory_rw_debug(env, ip + 1, (void *)&imm32, sizeof(imm32), 1);
+        patch_call(s, env, ip + 5, handlers->set_tpr);
+        break;
+    case 0xff: /* push r/m32 */
+        patch_byte(env, ip, 0x50); /* push eax */
+        patch_call(s, env, ip + 1, handlers->get_tpr_stack);
+        break;
+    default:
+        abort();
+    }
+
+    resume_all_vcpus();
+
+    paddr = cpu_get_phys_page_debug(env, ip);
+    paddr += ip & ~TARGET_PAGE_MASK;
+    tb_invalidate_phys_page_range(paddr, paddr + 1, 1);
+}
+
+void vapic_report_tpr_access(DeviceState *dev, void *cpu, target_ulong ip,
+                             int access)
+{
+    VAPICROMState *s = DO_UPCAST(VAPICROMState, busdev.qdev, dev);
+    CPUState *env = cpu;
+
+    cpu_synchronize_state(env);
+
+    if (evaluate_tpr_instruction(s, env, &ip, access) < 0) {
+        if (s->state == VAPIC_ACTIVE) {
+            vapic_enable(s, env);
+        }
+        return;
+    }
+    if (update_rom_mapping(s, env, ip) < 0) {
+        return;
+    }
+    if (vapic_enable(s, env) < 0) {
+        return;
+    }
+    patch_instruction(s, env, ip);
+}
+
+static void vapic_reset(DeviceState *dev)
+{
+    VAPICROMState *s = DO_UPCAST(VAPICROMState, busdev.qdev, dev);
+
+    if (s->state == VAPIC_ACTIVE) {
+        s->state = VAPIC_STANDBY;
+    }
+}
+
+static int patch_hypercalls(VAPICROMState *s)
+{
+    target_phys_addr_t rom_paddr = s->rom_state_paddr & ROM_BLOCK_MASK;
+    static uint8_t vmcall_pattern[] = {
+        0xb8, 0x1, 0, 0, 0, 0xf, 0x1, 0xc1
+    };
+    static uint8_t outl_pattern[] = {
+        0xb8, 0x1, 0, 0, 0, 0x90, 0xe7, 0x7e
+    };
+    uint8_t alternates[2];
+    uint8_t *pattern;
+    uint8_t *patch;
+    int patches = 0;
+    off_t pos;
+    uint8_t *rom;
+
+    rom = g_malloc(s->rom_size);
+    cpu_physical_memory_rw(rom_paddr, rom, s->rom_size, 0);
+
+    for (pos = 0; pos < s->rom_size - sizeof(vmcall_pattern); pos++) {
+        if (kvm_irqchip_in_kernel()) {
+            pattern = outl_pattern;
+            alternates[0] = outl_pattern[7];
+            alternates[1] = outl_pattern[7];
+            patch = &vmcall_pattern[5];
+        } else {
+            pattern = vmcall_pattern;
+            alternates[0] = vmcall_pattern[7];
+            alternates[1] = 0xd9; /* AMD's VMMCALL */
+            patch = &outl_pattern[5];
+        }
+        if (memcmp(rom + pos, pattern, 7) == 0 &&
+            (rom[pos + 7] == alternates[0] || rom[pos + 7] == alternates[1])) {
+            cpu_physical_memory_rw(rom_paddr + pos + 5, patch, 3, 1);
+            /*
+             * Don't flush the tb here. Under ordinary conditions, the patched
+             * calls are miles away from the current IP. Under malicious
+             * conditions, the guest could trick us into crashing.
+             */
+        }
+    }
+
+    g_free(rom);
+
+    if (patches != 0 && patches != 2) {
+        return -1;
+    }
+
+    return 0;
+}
+
+static void vapic_map_rom_writable(VAPICROMState *s)
+{
+    target_phys_addr_t rom_paddr = s->rom_state_paddr & ROM_BLOCK_MASK;
+    MemoryRegionSection section;
+    MemoryRegion *as;
+    size_t rom_size;
+    uint8_t *ram;
+
+    as = sysbus_address_space(&s->busdev);
+
+    if (s->rom_mapped_writable) {
+        memory_region_del_subregion(as, &s->rom);
+        memory_region_destroy(&s->rom);
+    }
+
+    /* grab RAM memory region (region @rom_paddr may still be pc.rom) */
+    section = memory_region_find(as, 0, 1);
+
+    /* read ROM size from RAM region */
+    ram = memory_region_get_ram_ptr(section.mr);
+    rom_size = ram[rom_paddr + 2] * ROM_BLOCK_SIZE;
+    s->rom_size = rom_size;
+
+    /* We need to round up to avoid creating subpages
+     * from which we cannot run code. */
+    rom_size = TARGET_PAGE_ALIGN(rom_size);
+
+    memory_region_init_alias(&s->rom, "kvmvapic-rom", section.mr, rom_paddr,
+                             rom_size);
+    memory_region_add_subregion_overlap(as, rom_paddr, &s->rom, 1000);
+    s->rom_mapped_writable = true;
+}
+
+static void do_enable_tpr_reporting(void *data)
+{
+    CPUState *env = data;
+
+    apic_enable_tpr_access_reporting(env->apic_state);
+}
+
+static void vapic_enable_tpr_reporting(void)
+{
+    CPUState *env = cpu_single_env;
+
+    for (env = first_cpu; env != NULL; env = env->next_cpu) {
+        run_on_cpu(env, do_enable_tpr_reporting, env);
+    }
+}
+
+static int vapic_prepare(VAPICROMState *s)
+{
+    vapic_map_rom_writable(s);
+
+    if (patch_hypercalls(s) < 0) {
+        return -1;
+    }
+
+    vapic_enable_tpr_reporting();
+
+    return 0;
+}
+
+static void vapic_write(void *opaque, target_phys_addr_t addr, uint64_t data,
+                        unsigned int size)
+{
+    CPUState *env = cpu_single_env;
+    target_phys_addr_t rom_paddr;
+    VAPICROMState *s = opaque;
+
+    cpu_synchronize_state(env);
+
+    /*
+     * The VAPIC supports three PIO-based hypercalls, all via port 0x7E.
+     *  o 16-bit write access:
+     *    Reports the option ROM initialization to the hypervisor. Written
+     *    value is the offset of the state structure in the ROM.
+     *  o 8-bit write access:
+     *    Reactivates the VAPIC after a guest hibernation, i.e. after the
+     *    option ROM content has been re-initialized by a guest power cycle.
+     *  o 32-bit write access:
+     *    Poll for pending IRQs, considering the current VAPIC state.
+     */
+    switch (size) {
+    case 2:
+        if (s->state != VAPIC_INACTIVE) {
+            patch_hypercalls(s);
+            break;
+        }
+
+        rom_paddr = (env->segs[R_CS].base + env->eip) & ROM_BLOCK_MASK;
+        s->rom_state_paddr = rom_paddr + data;
+
+        if (vapic_prepare(s) < 0) {
+            break;
+        }
+        s->state = VAPIC_STANDBY;
+        break;
+    case 1:
+        if (kvm_enabled()) {
+            /*
+             * Disable triggering instruction in ROM by writing a NOP.
+             *
+             * We cannot do this in TCG mode as the reported IP is not
+             * reliable.
+             */
+            pause_all_vcpus();
+            patch_byte(env, env->eip - 2, 0x66);
+            patch_byte(env, env->eip - 1, 0x90);
+            resume_all_vcpus();
+        }
+
+        if (s->state == VAPIC_ACTIVE) {
+            break;
+        }
+        if (update_rom_mapping(s, env, env->eip) < 0) {
+            break;
+        }
+        if (find_real_tpr_addr(s, env) < 0) {
+            break;
+        }
+        vapic_enable(s, env);
+        break;
+    default:
+    case 4:
+        if (!kvm_irqchip_in_kernel()) {
+            apic_poll_irq(env->apic_state);
+        }
+        break;
+    }
+}
+
+static const MemoryRegionOps vapic_ops = {
+    .write = vapic_write,
+    .endianness = DEVICE_NATIVE_ENDIAN,
+};
+
+static int vapic_init(SysBusDevice *dev)
+{
+    VAPICROMState *s = FROM_SYSBUS(VAPICROMState, dev);
+
+    memory_region_init_io(&s->io, &vapic_ops, s, "kvmvapic", 2);
+    sysbus_add_io(dev, VAPIC_IO_PORT, &s->io);
+    sysbus_init_ioports(dev, VAPIC_IO_PORT, 2);
+
+    option_rom[nb_option_roms].name = "kvmvapic.bin";
+    option_rom[nb_option_roms].bootindex = -1;
+    nb_option_roms++;
+
+    return 0;
+}
+
+static void do_vapic_enable(void *data)
+{
+    VAPICROMState *s = data;
+
+    vapic_enable(s, first_cpu);
+}
+
+static int vapic_post_load(void *opaque, int version_id)
+{
+    VAPICROMState *s = opaque;
+    uint8_t *zero;
+
+    /*
+     * The old implementation of qemu-kvm did not provide the state
+     * VAPIC_STANDBY. Reconstruct it.
+     */
+    if (s->state == VAPIC_INACTIVE && s->rom_state_paddr != 0) {
+        s->state = VAPIC_STANDBY;
+    }
+
+    if (s->state != VAPIC_INACTIVE) {
+        if (vapic_prepare(s) < 0) {
+            return -1;
+        }
+    }
+    if (s->state == VAPIC_ACTIVE) {
+        if (smp_cpus == 1) {
+            run_on_cpu(first_cpu, do_vapic_enable, s);
+        } else {
+            zero = g_malloc0(s->rom_state.vapic_size);
+            cpu_physical_memory_rw(s->vapic_paddr, zero,
+                                   s->rom_state.vapic_size, 1);
+            g_free(zero);
+        }
+    }
+
+    return 0;
+}
+
+static const VMStateDescription vmstate_handlers = {
+    .name = "kvmvapic-handlers",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
+    .fields = (VMStateField[]) {
+        VMSTATE_UINT32(set_tpr, VAPICHandlers),
+        VMSTATE_UINT32(set_tpr_eax, VAPICHandlers),
+        VMSTATE_UINT32_ARRAY(get_tpr, VAPICHandlers, 8),
+        VMSTATE_UINT32(get_tpr_stack, VAPICHandlers),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+static const VMStateDescription vmstate_guest_rom = {
+    .name = "kvmvapic-guest-rom",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
+    .fields = (VMStateField[]) {
+        VMSTATE_UNUSED(8),     /* signature */
+        VMSTATE_UINT32(vaddr, GuestROMState),
+        VMSTATE_UINT32(fixup_start, GuestROMState),
+        VMSTATE_UINT32(fixup_end, GuestROMState),
+        VMSTATE_UINT32(vapic_vaddr, GuestROMState),
+        VMSTATE_UINT32(vapic_size, GuestROMState),
+        VMSTATE_UINT32(vcpu_shift, GuestROMState),
+        VMSTATE_UINT32(real_tpr_addr, GuestROMState),
+        VMSTATE_STRUCT(up, GuestROMState, 0, vmstate_handlers, VAPICHandlers),
+        VMSTATE_STRUCT(mp, GuestROMState, 0, vmstate_handlers, VAPICHandlers),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+static const VMStateDescription vmstate_vapic = {
+    .name = "kvm-tpr-opt",      /* compatible with qemu-kvm VAPIC */
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
+    .post_load = vapic_post_load,
+    .fields = (VMStateField[]) {
+        VMSTATE_STRUCT(rom_state, VAPICROMState, 0, vmstate_guest_rom,
+                       GuestROMState),
+        VMSTATE_UINT32(state, VAPICROMState),
+        VMSTATE_UINT32(real_tpr_addr, VAPICROMState),
+        VMSTATE_UINT32(rom_state_vaddr, VAPICROMState),
+        VMSTATE_UINT32(vapic_paddr, VAPICROMState),
+        VMSTATE_UINT32(rom_state_paddr, VAPICROMState),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+static void vapic_class_init(ObjectClass *klass, void *data)
+{
+    SysBusDeviceClass *sc = SYS_BUS_DEVICE_CLASS(klass);
+    DeviceClass *dc = DEVICE_CLASS(klass);
+
+    dc->no_user = 1;
+    dc->reset   = vapic_reset;
+    dc->vmsd    = &vmstate_vapic;
+    sc->init    = vapic_init;
+}
+
+static TypeInfo vapic_type = {
+    .name          = "kvmvapic",
+    .parent        = TYPE_SYS_BUS_DEVICE,
+    .instance_size = sizeof(VAPICROMState),
+    .class_init    = vapic_class_init,
+};
+
+static void vapic_register(void)
+{
+    type_register_static(&vapic_type);
+}
+
+device_init(vapic_register);
-- 
1.7.3.4



* [Qemu-devel] [PATCH v2 5/8] kvmvapic: Introduce TPR access optimization for Windows guests
@ 2012-02-10 18:31   ` Jan Kiszka
From: Jan Kiszka @ 2012-02-10 18:31 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: Anthony Liguori, qemu-devel, kvm, Gleb Natapov

This enables acceleration for MMIO-based TPR register accesses of
32-bit Windows guest systems. It is mostly useful with KVM enabled,
either on older Intel CPUs (those without the flexpriority feature, which
can also be disabled manually for testing) or on any current AMD processor.

The approach introduced here is derived from the original version of
qemu-kvm. It was refactored, documented, and extended with support for
user space APIC emulation, both with and without KVM acceleration. The
VMState format was kept compatible, as was the ABI to the option ROM
that implements the guest-side para-virtualized driver service. This
enables seamless migration from qemu-kvm to upstream or, one day,
between KVM and TCG mode.

The basic concept goes like this:
 - VAPIC PV interface consisting of I/O port 0x7e and (for KVM in-kernel
   irqchip) a vmcall hypercall is registered
 - VAPIC option ROM is loaded into guest
 - option ROM activates TPR MMIO access reporting via port 0x7e
 - TPR accesses are trapped and patched in the guest to call into option
   ROM instead, VAPIC support is enabled
 - option ROM TPR helpers track state in memory and invoke hypercall to
   poll for pending IRQs if required
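
As a rough illustration of the "trapped and patched" step (a standalone
sketch, not code from this patch): a 6-byte "mov r32 to r/m32" store to the
TPR can be rewritten in place as a 1-byte "push r32" followed by a 5-byte
"call rel32" into the ROM handler. The helper names below (put_le32,
sketch_patch_call, sketch_patch_mov_store) are hypothetical, and the code
works on a plain byte buffer instead of guest memory.

```c
#include <stdint.h>

/* Hypothetical helper: store a 32-bit value in little-endian byte order. */
static void put_le32(uint8_t *p, uint32_t v)
{
    p[0] = v & 0xff;
    p[1] = (v >> 8) & 0xff;
    p[2] = (v >> 16) & 0xff;
    p[3] = (v >> 24) & 0xff;
}

/*
 * Overwrite 5 bytes at code[] with "call rel32" (opcode 0xe8).
 * ip is the guest virtual address of code[0], target the handler address.
 * The displacement is relative to the end of the call instruction.
 */
static void sketch_patch_call(uint8_t *code, uint32_t ip, uint32_t target)
{
    code[0] = 0xe8;
    put_le32(code + 1, target - ip - 5);
}

/*
 * Rewrite "mov r32 to r/m32" (0x89, ModRM, disp32; 6 bytes) into
 * "push r32; call set_tpr" (1 + 5 bytes), keeping the total length.
 */
static void sketch_patch_mov_store(uint8_t *code, uint32_t ip,
                                   uint32_t set_tpr)
{
    unsigned reg = (code[1] >> 3) & 7;  /* ModRM.reg is the source register */

    code[0] = 0x50 + reg;               /* push r32 */
    sketch_patch_call(code + 1, ip + 1, set_tpr);
}
```

Because the replacement has the same length as the original instruction,
the code following it is left untouched; the handler is then expected to
consume the pushed value.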

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 Makefile.target    |    3 +-
 hw/apic.c          |  126 ++++++++-
 hw/apic_common.c   |   64 +++++-
 hw/apic_internal.h |   27 ++
 hw/kvm/apic.c      |   32 +++
 hw/kvmvapic.c      |  774 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 1012 insertions(+), 14 deletions(-)
 create mode 100644 hw/kvmvapic.c

diff --git a/Makefile.target b/Makefile.target
index 68481a3..ec7eff8 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -230,7 +230,8 @@ obj-y += device-hotplug.o
 
 # Hardware support
 obj-i386-y += mc146818rtc.o pc.o
-obj-i386-y += sga.o apic_common.o apic.o ioapic_common.o ioapic.o piix_pci.o
+obj-i386-y += apic_common.o apic.o kvmvapic.o
+obj-i386-y += sga.o ioapic_common.o ioapic.o piix_pci.o
 obj-i386-y += vmport.o
 obj-i386-y += pci-hotplug.o smbios.o wdt_ib700.o
 obj-i386-y += debugcon.o multiboot.o
diff --git a/hw/apic.c b/hw/apic.c
index 086c544..2ebf3ca 100644
--- a/hw/apic.c
+++ b/hw/apic.c
@@ -35,6 +35,10 @@
 #define MSI_ADDR_DEST_ID_SHIFT		12
 #define	MSI_ADDR_DEST_ID_MASK		0x00ffff0
 
+#define SYNC_FROM_VAPIC                 0x1
+#define SYNC_TO_VAPIC                   0x2
+#define SYNC_ISR_IRR_TO_VAPIC           0x4
+
 static APICCommonState *local_apics[MAX_APICS + 1];
 
 static void apic_set_irq(APICCommonState *s, int vector_num, int trigger_mode);
@@ -78,6 +82,70 @@ static inline int get_bit(uint32_t *tab, int index)
     return !!(tab[i] & mask);
 }
 
+/* return -1 if no bit is set */
+static int get_highest_priority_int(uint32_t *tab)
+{
+    int i;
+    for (i = 7; i >= 0; i--) {
+        if (tab[i] != 0) {
+            return i * 32 + fls_bit(tab[i]);
+        }
+    }
+    return -1;
+}
+
+static void apic_sync_vapic(APICCommonState *s, int sync_type)
+{
+    VAPICState vapic_state;
+    size_t length;
+    off_t start;
+    int vector;
+
+    if (!s->vapic_paddr) {
+        return;
+    }
+    if (sync_type & SYNC_FROM_VAPIC) {
+        cpu_physical_memory_rw(s->vapic_paddr, (void *)&vapic_state,
+                               sizeof(vapic_state), 0);
+        s->tpr = vapic_state.tpr;
+    }
+    if (sync_type & (SYNC_TO_VAPIC | SYNC_ISR_IRR_TO_VAPIC)) {
+        start = offsetof(VAPICState, isr);
+        length = offsetof(VAPICState, enabled) - offsetof(VAPICState, isr);
+
+        if (sync_type & SYNC_TO_VAPIC) {
+            assert(qemu_cpu_is_self(s->cpu_env));
+
+            vapic_state.tpr = s->tpr;
+            vapic_state.enabled = 1;
+            start = 0;
+            length = sizeof(VAPICState);
+        }
+
+        vector = get_highest_priority_int(s->isr);
+        if (vector < 0) {
+            vector = 0;
+        }
+        vapic_state.isr = vector & 0xf0;
+
+        vapic_state.zero = 0;
+
+        vector = get_highest_priority_int(s->irr);
+        if (vector < 0) {
+            vector = 0;
+        }
+        vapic_state.irr = vector & 0xff;
+
+        cpu_physical_memory_write_rom(s->vapic_paddr + start,
+                                      ((void *)&vapic_state) + start, length);
+    }
+}
+
+static void apic_vapic_base_update(APICCommonState *s)
+{
+    apic_sync_vapic(s, SYNC_TO_VAPIC);
+}
+
 static void apic_local_deliver(APICCommonState *s, int vector)
 {
     uint32_t lvt = s->lvt[vector];
@@ -239,20 +307,17 @@ static void apic_set_base(APICCommonState *s, uint64_t val)
 
 static void apic_set_tpr(APICCommonState *s, uint8_t val)
 {
-    s->tpr = (val & 0x0f) << 4;
-    apic_update_irq(s);
+    /* Updates from cr8 are ignored while the VAPIC is active */
+    if (!s->vapic_paddr) {
+        s->tpr = val << 4;
+        apic_update_irq(s);
+    }
 }
 
-/* return -1 if no bit is set */
-static int get_highest_priority_int(uint32_t *tab)
+static uint8_t apic_get_tpr(APICCommonState *s)
 {
-    int i;
-    for(i = 7; i >= 0; i--) {
-        if (tab[i] != 0) {
-            return i * 32 + fls_bit(tab[i]);
-        }
-    }
-    return -1;
+    apic_sync_vapic(s, SYNC_FROM_VAPIC);
+    return s->tpr >> 4;
 }
 
 static int apic_get_ppr(APICCommonState *s)
@@ -312,6 +377,14 @@ static void apic_update_irq(APICCommonState *s)
     }
 }
 
+void apic_poll_irq(DeviceState *d)
+{
+    APICCommonState *s = APIC_COMMON(d);
+
+    apic_sync_vapic(s, SYNC_FROM_VAPIC);
+    apic_update_irq(s);
+}
+
 static void apic_set_irq(APICCommonState *s, int vector_num, int trigger_mode)
 {
     apic_report_irq_delivered(!get_bit(s->irr, vector_num));
@@ -321,6 +394,16 @@ static void apic_set_irq(APICCommonState *s, int vector_num, int trigger_mode)
         set_bit(s->tmr, vector_num);
     else
         reset_bit(s->tmr, vector_num);
+    if (s->vapic_paddr) {
+        apic_sync_vapic(s, SYNC_ISR_IRR_TO_VAPIC);
+        /*
+         * The vcpu thread needs to see the new IRR before we pull its current
+         * TPR value. That way, if we miss a lowering of the TPR, the guest
+         * has the chance to notice the new IRR and poll for IRQs on its own.
+         */
+        smp_wmb();
+        apic_sync_vapic(s, SYNC_FROM_VAPIC);
+    }
     apic_update_irq(s);
 }
 
@@ -334,6 +417,7 @@ static void apic_eoi(APICCommonState *s)
     if (!(s->spurious_vec & APIC_SV_DIRECTED_IO) && get_bit(s->tmr, isrv)) {
         ioapic_eoi_broadcast(isrv);
     }
+    apic_sync_vapic(s, SYNC_FROM_VAPIC | SYNC_TO_VAPIC);
     apic_update_irq(s);
 }
 
@@ -471,15 +555,19 @@ int apic_get_interrupt(DeviceState *d)
     if (!(s->spurious_vec & APIC_SV_ENABLE))
         return -1;
 
+    apic_sync_vapic(s, SYNC_FROM_VAPIC);
     intno = apic_irq_pending(s);
 
     if (intno == 0) {
+        apic_sync_vapic(s, SYNC_TO_VAPIC);
         return -1;
     } else if (intno < 0) {
+        apic_sync_vapic(s, SYNC_TO_VAPIC);
         return s->spurious_vec & 0xff;
     }
     reset_bit(s->irr, intno);
     set_bit(s->isr, intno);
+    apic_sync_vapic(s, SYNC_TO_VAPIC);
     apic_update_irq(s);
     return intno;
 }
@@ -576,6 +664,10 @@ static uint32_t apic_mem_readl(void *opaque, target_phys_addr_t addr)
         val = 0x11 | ((APIC_LVT_NB - 1) << 16); /* version 0x11 */
         break;
     case 0x08:
+        apic_sync_vapic(s, SYNC_FROM_VAPIC);
+        if (apic_report_tpr_access) {
+            cpu_report_tpr_access(s->cpu_env, TPR_ACCESS_READ);
+        }
         val = s->tpr;
         break;
     case 0x09:
@@ -675,7 +767,11 @@ static void apic_mem_writel(void *opaque, target_phys_addr_t addr, uint32_t val)
     case 0x03:
         break;
     case 0x08:
+        if (apic_report_tpr_access) {
+            cpu_report_tpr_access(s->cpu_env, TPR_ACCESS_WRITE);
+        }
         s->tpr = val;
+        apic_sync_vapic(s, SYNC_TO_VAPIC);
         apic_update_irq(s);
         break;
     case 0x09:
@@ -737,6 +833,11 @@ static void apic_mem_writel(void *opaque, target_phys_addr_t addr, uint32_t val)
     }
 }
 
+static void apic_pre_save(APICCommonState *s)
+{
+    apic_sync_vapic(s, SYNC_FROM_VAPIC);
+}
+
 static void apic_post_load(APICCommonState *s)
 {
     if (s->timer_expiry != -1) {
@@ -770,7 +871,10 @@ static void apic_class_init(ObjectClass *klass, void *data)
     k->init = apic_init;
     k->set_base = apic_set_base;
     k->set_tpr = apic_set_tpr;
+    k->get_tpr = apic_get_tpr;
+    k->vapic_base_update = apic_vapic_base_update;
     k->external_nmi = apic_external_nmi;
+    k->pre_save = apic_pre_save;
     k->post_load = apic_post_load;
 }
 
diff --git a/hw/apic_common.c b/hw/apic_common.c
index 588531b..1977da7 100644
--- a/hw/apic_common.c
+++ b/hw/apic_common.c
@@ -20,8 +20,10 @@
 #include "apic.h"
 #include "apic_internal.h"
 #include "trace.h"
+#include "kvm.h"
 
 static int apic_irq_delivered;
+bool apic_report_tpr_access;
 
 void cpu_set_apic_base(DeviceState *d, uint64_t val)
 {
@@ -63,13 +65,44 @@ void cpu_set_apic_tpr(DeviceState *d, uint8_t val)
 
 uint8_t cpu_get_apic_tpr(DeviceState *d)
 {
+    APICCommonState *s;
+    APICCommonClass *info;
+
+    if (!d) {
+        return 0;
+    }
+
+    s = APIC_COMMON(d);
+    info = APIC_COMMON_GET_CLASS(s);
+
+    return info->get_tpr(s);
+}
+
+void apic_enable_tpr_access_reporting(DeviceState *d)
+{
+    APICCommonState *s = DO_UPCAST(APICCommonState, busdev.qdev, d);
+    APICCommonClass *info = APIC_COMMON_GET_CLASS(s);
+
+    apic_report_tpr_access = true;
+    if (info->enable_tpr_reporting) {
+        info->enable_tpr_reporting(s);
+    }
+}
+
+void apic_enable_vapic(DeviceState *d, target_phys_addr_t paddr)
+{
     APICCommonState *s = DO_UPCAST(APICCommonState, busdev.qdev, d);
+    APICCommonClass *info = APIC_COMMON_GET_CLASS(s);
 
-    return s ? s->tpr >> 4 : 0;
+    s->vapic_paddr = paddr;
+    info->vapic_base_update(s);
 }
 
 void apic_handle_tpr_access_report(DeviceState *d, target_ulong ip, int access)
 {
+    APICCommonState *s = DO_UPCAST(APICCommonState, busdev.qdev, d);
+
+    vapic_report_tpr_access(s->vapic, s->cpu_env, ip, access);
 }
 
 void apic_report_irq_delivered(int delivered)
@@ -170,12 +203,16 @@ void apic_init_reset(DeviceState *d)
 static void apic_reset_common(DeviceState *d)
 {
     APICCommonState *s = DO_UPCAST(APICCommonState, busdev.qdev, d);
+    APICCommonClass *info = APIC_COMMON_GET_CLASS(s);
     bool bsp;
 
     bsp = cpu_is_bsp(s->cpu_env);
     s->apicbase = 0xfee00000 |
         (bsp ? MSR_IA32_APICBASE_BSP : 0) | MSR_IA32_APICBASE_ENABLE;
 
+    s->vapic_paddr = 0;
+    info->vapic_base_update(s);
+
     apic_init_reset(d);
 
     if (bsp) {
@@ -238,6 +275,7 @@ static int apic_init_common(SysBusDevice *dev)
 {
     APICCommonState *s = APIC_COMMON(dev);
     APICCommonClass *info;
+    static DeviceState *vapic;
     static int apic_no;
 
     if (apic_no >= MAX_APICS) {
@@ -248,10 +286,29 @@ static int apic_init_common(SysBusDevice *dev)
     info = APIC_COMMON_GET_CLASS(s);
     info->init(s);
 
-    sysbus_init_mmio(&s->busdev, &s->io_memory);
+    sysbus_init_mmio(dev, &s->io_memory);
+
+    if (!vapic && s->vapic_control & VAPIC_ENABLE_MASK) {
+        vapic = sysbus_create_simple("kvmvapic", -1, NULL);
+    }
+    s->vapic = vapic;
+    if (apic_report_tpr_access && info->enable_tpr_reporting) {
+        info->enable_tpr_reporting(s);
+    }
+
     return 0;
 }
 
+static void apic_dispatch_pre_save(void *opaque)
+{
+    APICCommonState *s = APIC_COMMON(opaque);
+    APICCommonClass *info = APIC_COMMON_GET_CLASS(s);
+
+    if (info->pre_save) {
+        info->pre_save(s);
+    }
+}
+
 static int apic_dispatch_post_load(void *opaque, int version_id)
 {
     APICCommonState *s = APIC_COMMON(opaque);
@@ -269,6 +326,7 @@ static const VMStateDescription vmstate_apic_common = {
     .minimum_version_id = 3,
     .minimum_version_id_old = 1,
     .load_state_old = apic_load_old,
+    .pre_save = apic_dispatch_pre_save,
     .post_load = apic_dispatch_post_load,
     .fields = (VMStateField[]) {
         VMSTATE_UINT32(apicbase, APICCommonState),
@@ -298,6 +356,8 @@ static const VMStateDescription vmstate_apic_common = {
 static Property apic_properties_common[] = {
     DEFINE_PROP_UINT8("id", APICCommonState, id, -1),
     DEFINE_PROP_PTR("cpu_env", APICCommonState, cpu_env),
+    DEFINE_PROP_BIT("vapic", APICCommonState, vapic_control, VAPIC_ENABLE_BIT,
+                    true),
     DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/apic_internal.h b/hw/apic_internal.h
index 0cab010..95cc7cf 100644
--- a/hw/apic_internal.h
+++ b/hw/apic_internal.h
@@ -61,6 +61,9 @@
 #define APIC_SV_DIRECTED_IO             (1<<12)
 #define APIC_SV_ENABLE                  (1<<8)
 
+#define VAPIC_ENABLE_BIT                0
+#define VAPIC_ENABLE_MASK               (1 << VAPIC_ENABLE_BIT)
+
 #define MAX_APICS 255
 
 #define MSI_SPACE_SIZE                  0x100000
@@ -82,7 +85,11 @@ typedef struct APICCommonClass
     void (*init)(APICCommonState *s);
     void (*set_base)(APICCommonState *s, uint64_t val);
     void (*set_tpr)(APICCommonState *s, uint8_t val);
+    uint8_t (*get_tpr)(APICCommonState *s);
+    void (*enable_tpr_reporting)(APICCommonState *s);
+    void (*vapic_base_update)(APICCommonState *s);
     void (*external_nmi)(APICCommonState *s);
+    void (*pre_save)(APICCommonState *s);
     void (*post_load)(APICCommonState *s);
 } APICCommonClass;
 
@@ -114,9 +121,29 @@ struct APICCommonState {
     int64_t timer_expiry;
     int sipi_vector;
     int wait_for_sipi;
+
+    uint32_t vapic_control;
+    DeviceState *vapic;
+    target_phys_addr_t vapic_paddr; /* note: persistence via kvmvapic */
 };
 
+typedef struct VAPICState {
+    uint8_t tpr;
+    uint8_t isr;
+    uint8_t zero;
+    uint8_t irr;
+    uint8_t enabled;
+} QEMU_PACKED VAPICState;
+
+extern bool apic_report_tpr_access;
+
 void apic_report_irq_delivered(int delivered);
 bool apic_next_timer(APICCommonState *s, int64_t current_time);
+void apic_enable_tpr_access_reporting(DeviceState *d);
+void apic_enable_vapic(DeviceState *d, target_phys_addr_t paddr);
+void apic_poll_irq(DeviceState *d);
+
+void vapic_report_tpr_access(DeviceState *dev, void *cpu, target_ulong ip,
+                             int access);
 
 #endif /* !QEMU_APIC_INTERNAL_H */
diff --git a/hw/kvm/apic.c b/hw/kvm/apic.c
index dfc2ab3..326eb37 100644
--- a/hw/kvm/apic.c
+++ b/hw/kvm/apic.c
@@ -92,6 +92,35 @@ static void kvm_apic_set_tpr(APICCommonState *s, uint8_t val)
     s->tpr = (val & 0x0f) << 4;
 }
 
+static uint8_t kvm_apic_get_tpr(APICCommonState *s)
+{
+    return s->tpr >> 4;
+}
+
+static void kvm_apic_enable_tpr_reporting(APICCommonState *s)
+{
+    struct kvm_tpr_access_ctl ctl = {
+        .enabled = 1
+    };
+
+    kvm_vcpu_ioctl(s->cpu_env, KVM_TPR_ACCESS_REPORTING, &ctl);
+}
+
+static void kvm_apic_vapic_base_update(APICCommonState *s)
+{
+    struct kvm_vapic_addr vapid_addr = {
+        .vapic_addr = s->vapic_paddr,
+    };
+    int ret;
+
+    ret = kvm_vcpu_ioctl(s->cpu_env, KVM_SET_VAPIC_ADDR, &vapid_addr);
+    if (ret < 0) {
+        fprintf(stderr, "KVM: setting VAPIC address failed (%s)\n",
+                strerror(-ret));
+        abort();
+    }
+}
+
 static void do_inject_external_nmi(void *data)
 {
     APICCommonState *s = data;
@@ -129,6 +158,9 @@ static void kvm_apic_class_init(ObjectClass *klass, void *data)
     k->init = kvm_apic_init;
     k->set_base = kvm_apic_set_base;
     k->set_tpr = kvm_apic_set_tpr;
+    k->get_tpr = kvm_apic_get_tpr;
+    k->enable_tpr_reporting = kvm_apic_enable_tpr_reporting;
+    k->vapic_base_update = kvm_apic_vapic_base_update;
     k->external_nmi = kvm_apic_external_nmi;
 }
 
diff --git a/hw/kvmvapic.c b/hw/kvmvapic.c
new file mode 100644
index 0000000..0c4d304
--- /dev/null
+++ b/hw/kvmvapic.c
@@ -0,0 +1,774 @@
+/*
+ * TPR optimization for 32-bit Windows guests
+ *
+ * Copyright (C) 2007-2008 Qumranet Technologies
+ * Copyright (C) 2012      Jan Kiszka, Siemens AG
+ *
+ * This work is licensed under the terms of the GNU GPL version 2, or
+ * (at your option) any later version. See the COPYING file in the
+ * top-level directory.
+ */
+#include "sysemu.h"
+#include "cpus.h"
+#include "kvm.h"
+#include "apic_internal.h"
+
+#define APIC_DEFAULT_ADDRESS    0xfee00000
+
+#define VAPIC_IO_PORT           0x7e
+
+#define VAPIC_INACTIVE          0
+#define VAPIC_ACTIVE            1
+#define VAPIC_STANDBY           2
+
+#define VAPIC_CPU_SHIFT         7
+
+#define ROM_BLOCK_SIZE          512
+#define ROM_BLOCK_MASK          (~(ROM_BLOCK_SIZE - 1))
+
+typedef struct VAPICHandlers {
+    uint32_t set_tpr;
+    uint32_t set_tpr_eax;
+    uint32_t get_tpr[8];
+    uint32_t get_tpr_stack;
+} QEMU_PACKED VAPICHandlers;
+
+typedef struct GuestROMState {
+    char signature[8];
+    uint32_t vaddr;
+    uint32_t fixup_start;
+    uint32_t fixup_end;
+    uint32_t vapic_vaddr;
+    uint32_t vapic_size;
+    uint32_t vcpu_shift;
+    uint32_t real_tpr_addr;
+    VAPICHandlers up;
+    VAPICHandlers mp;
+} QEMU_PACKED GuestROMState;
+
+typedef struct VAPICROMState {
+    SysBusDevice busdev;
+    MemoryRegion io;
+    MemoryRegion rom;
+    bool rom_mapped_writable;
+    uint32_t state;
+    uint32_t rom_state_paddr;
+    uint32_t rom_state_vaddr;
+    uint32_t vapic_paddr;
+    uint32_t real_tpr_addr;
+    GuestROMState rom_state;
+    size_t rom_size;
+} VAPICROMState;
+
+#define TPR_INSTR_IS_WRITE              0x1
+#define TPR_INSTR_ABS_MODRM             0x2
+#define TPR_INSTR_MATCH_MODRM_REG       0x4
+
+typedef struct TPRInstruction {
+    uint8_t opcode;
+    uint8_t modrm_reg;
+    unsigned int flags;
+    size_t length;
+    off_t addr_offset;
+} TPRInstruction;
+
+/* must be sorted by length, shortest first */
+static const TPRInstruction tpr_instr[] = {
+    { /* mov abs to eax */
+        .opcode = 0xa1,
+        .length = 5,
+        .addr_offset = 1,
+    },
+    { /* mov eax to abs */
+        .opcode = 0xa3,
+        .flags = TPR_INSTR_IS_WRITE,
+        .length = 5,
+        .addr_offset = 1,
+    },
+    { /* mov r32 to r/m32 */
+        .opcode = 0x89,
+        .flags = TPR_INSTR_IS_WRITE | TPR_INSTR_ABS_MODRM,
+        .length = 6,
+        .addr_offset = 2,
+    },
+    { /* mov r/m32 to r32 */
+        .opcode = 0x8b,
+        .flags = TPR_INSTR_ABS_MODRM,
+        .length = 6,
+        .addr_offset = 2,
+    },
+    { /* push r/m32 */
+        .opcode = 0xff,
+        .modrm_reg = 6,
+        .flags = TPR_INSTR_ABS_MODRM | TPR_INSTR_MATCH_MODRM_REG,
+        .length = 6,
+        .addr_offset = 2,
+    },
+    { /* mov imm32, r/m32 (c7/0) */
+        .opcode = 0xc7,
+        .modrm_reg = 0,
+        .flags = TPR_INSTR_IS_WRITE | TPR_INSTR_ABS_MODRM |
+                 TPR_INSTR_MATCH_MODRM_REG,
+        .length = 10,
+        .addr_offset = 2,
+    },
+};
+
+static void read_guest_rom_state(VAPICROMState *s)
+{
+    cpu_physical_memory_rw(s->rom_state_paddr, (void *)&s->rom_state,
+                           sizeof(GuestROMState), 0);
+}
+
+static void write_guest_rom_state(VAPICROMState *s)
+{
+    cpu_physical_memory_rw(s->rom_state_paddr, (void *)&s->rom_state,
+                           sizeof(GuestROMState), 1);
+}
+
+static void update_guest_rom_state(VAPICROMState *s)
+{
+    read_guest_rom_state(s);
+
+    s->rom_state.real_tpr_addr = cpu_to_le32(s->real_tpr_addr);
+    s->rom_state.vcpu_shift = cpu_to_le32(VAPIC_CPU_SHIFT);
+
+    write_guest_rom_state(s);
+}
+
+static int find_real_tpr_addr(VAPICROMState *s, CPUState *env)
+{
+    target_phys_addr_t paddr;
+    target_ulong addr;
+
+    if (s->state == VAPIC_ACTIVE) {
+        return 0;
+    }
+    for (addr = 0xfffff000; addr >= 0x80000000; addr -= TARGET_PAGE_SIZE) {
+        paddr = cpu_get_phys_page_debug(env, addr);
+        if (paddr != APIC_DEFAULT_ADDRESS) {
+            continue;
+        }
+        s->real_tpr_addr = addr + 0x80;
+        update_guest_rom_state(s);
+        return 0;
+    }
+    return -1;
+}
+
+static uint8_t modrm_reg(uint8_t modrm)
+{
+    return (modrm >> 3) & 7;
+}
+
+static bool is_abs_modrm(uint8_t modrm)
+{
+    return (modrm & 0xc7) == 0x05;
+}
+
+static bool opcode_matches(uint8_t *opcode, const TPRInstruction *instr)
+{
+    return opcode[0] == instr->opcode &&
+        (!(instr->flags & TPR_INSTR_ABS_MODRM) || is_abs_modrm(opcode[1])) &&
+        (!(instr->flags & TPR_INSTR_MATCH_MODRM_REG) ||
+         modrm_reg(opcode[1]) == instr->modrm_reg);
+}
+
+static int evaluate_tpr_instruction(VAPICROMState *s, CPUState *env,
+                                    target_ulong *pip, int access)
+{
+    const TPRInstruction *instr;
+    target_ulong ip = *pip;
+    uint8_t opcode[2];
+    uint32_t real_tpr_addr;
+    int i;
+
+    if ((ip & 0xf0000000) != 0x80000000 && (ip & 0xf0000000) != 0xe0000000) {
+        return -1;
+    }
+
+    /*
+     * Early Windows 2003 SMP initialization contains a
+     *
+     *   mov imm32, r/m32
+     *
+     * instruction that is patched by TPR optimization. The problem is that
+     * RSP, used by the patched instruction, is zero, so the guest gets a
+     * double fault and dies.
+     */
+    if (env->regs[R_ESP] == 0) {
+        return -1;
+    }
+
+    if (access == TPR_ACCESS_WRITE && kvm_enabled() &&
+        !kvm_irqchip_in_kernel()) {
+        /*
+         * KVM without TPR access reporting calls into the user space APIC on
+         * write with IP pointing after the accessing instruction. So we need
+         * to look backward to find the reason.
+         */
+        for (i = 0; i < ARRAY_SIZE(tpr_instr); i++) {
+            instr = &tpr_instr[i];
+            if (!(instr->flags & TPR_INSTR_IS_WRITE)) {
+                continue;
+            }
+            if (cpu_memory_rw_debug(env, ip - instr->length, opcode,
+                                    sizeof(opcode), 0) < 0) {
+                return -1;
+            }
+            if (opcode_matches(opcode, instr)) {
+                ip -= instr->length;
+                goto instruction_ok;
+            }
+        }
+        return -1;
+    } else {
+        if (cpu_memory_rw_debug(env, ip, opcode, sizeof(opcode), 0) < 0) {
+            return -1;
+        }
+        for (i = 0; i < ARRAY_SIZE(tpr_instr); i++) {
+            instr = &tpr_instr[i];
+            if (opcode_matches(opcode, instr)) {
+                goto instruction_ok;
+            }
+        }
+        return -1;
+    }
+
+instruction_ok:
+    /*
+     * Grab the virtual TPR address from the instruction
+     * and update the cached values.
+     */
+    if (cpu_memory_rw_debug(env, ip + instr->addr_offset,
+                            (void *)&real_tpr_addr,
+                            sizeof(real_tpr_addr), 0) < 0) {
+        return -1;
+    }
+    real_tpr_addr = le32_to_cpu(real_tpr_addr);
+    if ((real_tpr_addr & 0xfff) != 0x80) {
+        return -1;
+    }
+    s->real_tpr_addr = real_tpr_addr;
+    update_guest_rom_state(s);
+
+    *pip = ip;
+    return 0;
+}
+
+static int update_rom_mapping(VAPICROMState *s, CPUState *env, target_ulong ip)
+{
+    target_phys_addr_t paddr;
+    uint32_t rom_state_vaddr;
+    uint32_t pos, patch, offset;
+
+    /* nothing to do if already activated */
+    if (s->state == VAPIC_ACTIVE) {
+        return 0;
+    }
+
+    /* bail out if ROM init code was not executed (missing ROM?) */
+    if (s->state == VAPIC_INACTIVE) {
+        return -1;
+    }
+
+    /* find out virtual address of the ROM */
+    rom_state_vaddr = s->rom_state_paddr + (ip & 0xf0000000);
+    paddr = cpu_get_phys_page_debug(env, rom_state_vaddr);
+    if (paddr == -1) {
+        return -1;
+    }
+    paddr += rom_state_vaddr & ~TARGET_PAGE_MASK;
+    if (paddr != s->rom_state_paddr) {
+        return -1;
+    }
+    read_guest_rom_state(s);
+    if (memcmp(s->rom_state.signature, "kvm aPiC", 8) != 0) {
+        return -1;
+    }
+    s->rom_state_vaddr = rom_state_vaddr;
+
+    /* fixup addresses in ROM if needed */
+    if (rom_state_vaddr == le32_to_cpu(s->rom_state.vaddr)) {
+        return 0;
+    }
+    for (pos = le32_to_cpu(s->rom_state.fixup_start);
+         pos < le32_to_cpu(s->rom_state.fixup_end);
+         pos += 4) {
+        cpu_physical_memory_rw(paddr + pos - s->rom_state.vaddr,
+                               (void *)&offset, sizeof(offset), 0);
+        offset = le32_to_cpu(offset);
+        cpu_physical_memory_rw(paddr + offset, (void *)&patch,
+                               sizeof(patch), 0);
+        patch = le32_to_cpu(patch);
+        patch += rom_state_vaddr - le32_to_cpu(s->rom_state.vaddr);
+        patch = cpu_to_le32(patch);
+        cpu_physical_memory_rw(paddr + offset, (void *)&patch,
+                               sizeof(patch), 1);
+    }
+    read_guest_rom_state(s);
+    s->vapic_paddr = paddr + le32_to_cpu(s->rom_state.vapic_vaddr) -
+        le32_to_cpu(s->rom_state.vaddr);
+
+    return 0;
+}
+
+/*
+ * Tries to read the unique processor number from the Kernel Processor Control
+ * Region (KPCR) of 32-bit Windows. Returns -1 if the KPCR cannot be accessed
+ * or is considered invalid.
+ */
+static int get_kpcr_number(CPUState *env)
+{
+    struct kpcr {
+        uint8_t  fill1[0x1c];
+        uint32_t self;
+        uint8_t  fill2[0x31];
+        uint8_t  number;
+    } QEMU_PACKED kpcr;
+
+    if (cpu_memory_rw_debug(env, env->segs[R_FS].base,
+                            (void *)&kpcr, sizeof(kpcr), 0) < 0 ||
+        kpcr.self != env->segs[R_FS].base) {
+        return -1;
+    }
+    return kpcr.number;
+}
+
+static int vapic_enable(VAPICROMState *s, CPUState *env)
+{
+    int cpu_number = get_kpcr_number(env);
+    target_phys_addr_t vapic_paddr;
+    static const uint8_t enabled = 1;
+
+    if (cpu_number < 0) {
+        return -1;
+    }
+    vapic_paddr = s->vapic_paddr +
+        (((target_phys_addr_t)cpu_number) << VAPIC_CPU_SHIFT);
+    cpu_physical_memory_rw(vapic_paddr + offsetof(VAPICState, enabled),
+                           (void *)&enabled, sizeof(enabled), 1);
+    apic_enable_vapic(env->apic_state, vapic_paddr);
+
+    s->state = VAPIC_ACTIVE;
+
+    return 0;
+}
+
+static void patch_byte(CPUState *env, target_ulong addr, uint8_t byte)
+{
+    cpu_memory_rw_debug(env, addr, &byte, 1, 1);
+}
+
+static void patch_call(VAPICROMState *s, CPUState *env, target_ulong ip,
+                       uint32_t target)
+{
+    uint32_t offset;
+
+    offset = cpu_to_le32(target - ip - 5);
+    patch_byte(env, ip, 0xe8); /* call near */
+    cpu_memory_rw_debug(env, ip + 1, (void *)&offset, sizeof(offset), 1);
+}
+
+static void patch_instruction(VAPICROMState *s, CPUState *env, target_ulong ip)
+{
+    target_phys_addr_t paddr;
+    VAPICHandlers *handlers;
+    uint8_t opcode[2];
+    uint32_t imm32;
+
+    if (smp_cpus == 1) {
+        handlers = &s->rom_state.up;
+    } else {
+        handlers = &s->rom_state.mp;
+    }
+
+    pause_all_vcpus();
+
+    cpu_memory_rw_debug(env, ip, opcode, sizeof(opcode), 0);
+
+    switch (opcode[0]) {
+    case 0x89: /* mov r32 to r/m32 */
+        patch_byte(env, ip, 0x50 + modrm_reg(opcode[1]));  /* push reg */
+        patch_call(s, env, ip + 1, handlers->set_tpr);
+        break;
+    case 0x8b: /* mov r/m32 to r32 */
+        patch_byte(env, ip, 0x90);
+        patch_call(s, env, ip + 1, handlers->get_tpr[modrm_reg(opcode[1])]);
+        break;
+    case 0xa1: /* mov abs to eax */
+        patch_call(s, env, ip, handlers->get_tpr[0]);
+        break;
+    case 0xa3: /* mov eax to abs */
+        patch_call(s, env, ip, handlers->set_tpr_eax);
+        break;
+    case 0xc7: /* mov imm32, r/m32 (c7/0) */
+        patch_byte(env, ip, 0x68);  /* push imm32 */
+        cpu_memory_rw_debug(env, ip + 6, (void *)&imm32, sizeof(imm32), 0);
+        cpu_memory_rw_debug(env, ip + 1, (void *)&imm32, sizeof(imm32), 1);
+        patch_call(s, env, ip + 5, handlers->set_tpr);
+        break;
+    case 0xff: /* push r/m32 */
+        patch_byte(env, ip, 0x50); /* push eax */
+        patch_call(s, env, ip + 1, handlers->get_tpr_stack);
+        break;
+    default:
+        abort();
+    }
+
+    resume_all_vcpus();
+
+    paddr = cpu_get_phys_page_debug(env, ip);
+    paddr += ip & ~TARGET_PAGE_MASK;
+    tb_invalidate_phys_page_range(paddr, paddr + 1, 1);
+}
+
+void vapic_report_tpr_access(DeviceState *dev, void *cpu, target_ulong ip,
+                             int access)
+{
+    VAPICROMState *s = DO_UPCAST(VAPICROMState, busdev.qdev, dev);
+    CPUState *env = cpu;
+
+    cpu_synchronize_state(env);
+
+    if (evaluate_tpr_instruction(s, env, &ip, access) < 0) {
+        if (s->state == VAPIC_ACTIVE) {
+            vapic_enable(s, env);
+        }
+        return;
+    }
+    if (update_rom_mapping(s, env, ip) < 0) {
+        return;
+    }
+    if (vapic_enable(s, env) < 0) {
+        return;
+    }
+    patch_instruction(s, env, ip);
+}
+
+static void vapic_reset(DeviceState *dev)
+{
+    VAPICROMState *s = DO_UPCAST(VAPICROMState, busdev.qdev, dev);
+
+    if (s->state == VAPIC_ACTIVE) {
+        s->state = VAPIC_STANDBY;
+    }
+}
+
+static int patch_hypercalls(VAPICROMState *s)
+{
+    target_phys_addr_t rom_paddr = s->rom_state_paddr & ROM_BLOCK_MASK;
+    static uint8_t vmcall_pattern[] = {
+        0xb8, 0x1, 0, 0, 0, 0xf, 0x1, 0xc1
+    };
+    static uint8_t outl_pattern[] = {
+        0xb8, 0x1, 0, 0, 0, 0x90, 0xe7, 0x7e
+    };
+    uint8_t alternates[2];
+    uint8_t *pattern;
+    uint8_t *patch;
+    int patches = 0;
+    off_t pos;
+    uint8_t *rom;
+
+    rom = g_malloc(s->rom_size);
+    cpu_physical_memory_rw(rom_paddr, rom, s->rom_size, 0);
+
+    for (pos = 0; pos < s->rom_size - sizeof(vmcall_pattern); pos++) {
+        if (kvm_irqchip_in_kernel()) {
+            pattern = outl_pattern;
+            alternates[0] = outl_pattern[7];
+            alternates[1] = outl_pattern[7];
+            patch = &vmcall_pattern[5];
+        } else {
+            pattern = vmcall_pattern;
+            alternates[0] = vmcall_pattern[7];
+            alternates[1] = 0xd9; /* AMD's VMMCALL */
+            patch = &outl_pattern[5];
+        }
+        if (memcmp(rom + pos, pattern, 7) == 0 &&
+            (rom[pos + 7] == alternates[0] || rom[pos + 7] == alternates[1])) {
+            cpu_physical_memory_rw(rom_paddr + pos + 5, patch, 3, 1);
+            /*
+             * Don't flush the tb here. Under ordinary conditions, the patched
+             * calls are miles away from the current IP. Under malicious
+             * conditions, the guest could trick us to crash.
+             */
+        }
+    }
+
+    g_free(rom);
+
+    if (patches != 0 && patches != 2) {
+        return -1;
+    }
+
+    return 0;
+}
+
+static void vapic_map_rom_writable(VAPICROMState *s)
+{
+    target_phys_addr_t rom_paddr = s->rom_state_paddr & ROM_BLOCK_MASK;
+    MemoryRegionSection section;
+    MemoryRegion *as;
+    size_t rom_size;
+    uint8_t *ram;
+
+    as = sysbus_address_space(&s->busdev);
+
+    if (s->rom_mapped_writable) {
+        memory_region_del_subregion(as, &s->rom);
+        memory_region_destroy(&s->rom);
+    }
+
+    /* grab RAM memory region (region @rom_paddr may still be pc.rom) */
+    section = memory_region_find(as, 0, 1);
+
+    /* read ROM size from RAM region */
+    ram = memory_region_get_ram_ptr(section.mr);
+    rom_size = ram[rom_paddr + 2] * ROM_BLOCK_SIZE;
+    s->rom_size = rom_size;
+
+    /* We need to round up to avoid creating subpages
+     * from which we cannot run code. */
+    rom_size = TARGET_PAGE_ALIGN(rom_size);
+
+    memory_region_init_alias(&s->rom, "kvmvapic-rom", section.mr, rom_paddr,
+                             rom_size);
+    memory_region_add_subregion_overlap(as, rom_paddr, &s->rom, 1000);
+    s->rom_mapped_writable = true;
+}
+
+static void do_enable_tpr_reporting(void *data)
+{
+    CPUState *env = data;
+
+    apic_enable_tpr_access_reporting(env->apic_state);
+}
+
+static void vapic_enable_tpr_reporting(void)
+{
+    CPUState *env = cpu_single_env;
+
+    for (env = first_cpu; env != NULL; env = env->next_cpu) {
+        run_on_cpu(env, do_enable_tpr_reporting, env);
+    }
+}
+
+static int vapic_prepare(VAPICROMState *s)
+{
+    vapic_map_rom_writable(s);
+
+    if (patch_hypercalls(s) < 0) {
+        return -1;
+    }
+
+    vapic_enable_tpr_reporting();
+
+    return 0;
+}
+
+static void vapic_write(void *opaque, target_phys_addr_t addr, uint64_t data,
+                        unsigned int size)
+{
+    CPUState *env = cpu_single_env;
+    target_phys_addr_t rom_paddr;
+    VAPICROMState *s = opaque;
+
+    cpu_synchronize_state(env);
+
+    /*
+     * The VAPIC supports three PIO-based hypercalls, all via port 0x7E.
+     *  o 16-bit write access:
+     *    Reports the option ROM initialization to the hypervisor. Written
+     *    value is the offset of the state structure in the ROM.
+     *  o 8-bit write access:
+     *    Reactivates the VAPIC after a guest hibernation, i.e. after the
+     *    option ROM content has been re-initialized by a guest power cycle.
+     *  o 32-bit write access:
+     *    Poll for pending IRQs, considering the current VAPIC state.
+     */
+    switch (size) {
+    case 2:
+        if (s->state != VAPIC_INACTIVE) {
+            patch_hypercalls(s);
+            break;
+        }
+
+        rom_paddr = (env->segs[R_CS].base + env->eip) & ROM_BLOCK_MASK;
+        s->rom_state_paddr = rom_paddr + data;
+
+        if (vapic_prepare(s) < 0) {
+            break;
+        }
+        s->state = VAPIC_STANDBY;
+        break;
+    case 1:
+        if (kvm_enabled()) {
+            /*
+             * Disable triggering instruction in ROM by writing a NOP.
+             *
+             * We cannot do this in TCG mode as the reported IP is not
+             * reliable.
+             */
+            pause_all_vcpus();
+            patch_byte(env, env->eip - 2, 0x66);
+            patch_byte(env, env->eip - 1, 0x90);
+            resume_all_vcpus();
+        }
+
+        if (s->state == VAPIC_ACTIVE) {
+            break;
+        }
+        if (update_rom_mapping(s, env, env->eip) < 0) {
+            break;
+        }
+        if (find_real_tpr_addr(s, env) < 0) {
+            break;
+        }
+        vapic_enable(s, env);
+        break;
+    default:
+    case 4:
+        if (!kvm_irqchip_in_kernel()) {
+            apic_poll_irq(env->apic_state);
+        }
+        break;
+    }
+}
+
+static const MemoryRegionOps vapic_ops = {
+    .write = vapic_write,
+    .endianness = DEVICE_NATIVE_ENDIAN,
+};
+
+static int vapic_init(SysBusDevice *dev)
+{
+    VAPICROMState *s = FROM_SYSBUS(VAPICROMState, dev);
+
+    memory_region_init_io(&s->io, &vapic_ops, s, "kvmvapic", 2);
+    sysbus_add_io(dev, VAPIC_IO_PORT, &s->io);
+    sysbus_init_ioports(dev, VAPIC_IO_PORT, 2);
+
+    option_rom[nb_option_roms].name = "kvmvapic.bin";
+    option_rom[nb_option_roms].bootindex = -1;
+    nb_option_roms++;
+
+    return 0;
+}
+
+static void do_vapic_enable(void *data)
+{
+    VAPICROMState *s = data;
+
+    vapic_enable(s, first_cpu);
+}
+
+static int vapic_post_load(void *opaque, int version_id)
+{
+    VAPICROMState *s = opaque;
+    uint8_t *zero;
+
+    /*
+     * The old implementation of qemu-kvm did not provide the state
+     * VAPIC_STANDBY. Reconstruct it.
+     */
+    if (s->state == VAPIC_INACTIVE && s->rom_state_paddr != 0) {
+        s->state = VAPIC_STANDBY;
+    }
+
+    if (s->state != VAPIC_INACTIVE) {
+        if (vapic_prepare(s) < 0) {
+            return -1;
+        }
+    }
+    if (s->state == VAPIC_ACTIVE) {
+        if (smp_cpus == 1) {
+            run_on_cpu(first_cpu, do_vapic_enable, s);
+        } else {
+            zero = g_malloc0(s->rom_state.vapic_size);
+            cpu_physical_memory_rw(s->vapic_paddr, zero,
+                                   s->rom_state.vapic_size, 1);
+            g_free(zero);
+        }
+    }
+
+    return 0;
+}
+
+static const VMStateDescription vmstate_handlers = {
+    .name = "kvmvapic-handlers",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
+    .fields = (VMStateField[]) {
+        VMSTATE_UINT32(set_tpr, VAPICHandlers),
+        VMSTATE_UINT32(set_tpr_eax, VAPICHandlers),
+        VMSTATE_UINT32_ARRAY(get_tpr, VAPICHandlers, 8),
+        VMSTATE_UINT32(get_tpr_stack, VAPICHandlers),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+static const VMStateDescription vmstate_guest_rom = {
+    .name = "kvmvapic-guest-rom",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
+    .fields = (VMStateField[]) {
+        VMSTATE_UNUSED(8),     /* signature */
+        VMSTATE_UINT32(vaddr, GuestROMState),
+        VMSTATE_UINT32(fixup_start, GuestROMState),
+        VMSTATE_UINT32(fixup_end, GuestROMState),
+        VMSTATE_UINT32(vapic_vaddr, GuestROMState),
+        VMSTATE_UINT32(vapic_size, GuestROMState),
+        VMSTATE_UINT32(vcpu_shift, GuestROMState),
+        VMSTATE_UINT32(real_tpr_addr, GuestROMState),
+        VMSTATE_STRUCT(up, GuestROMState, 0, vmstate_handlers, VAPICHandlers),
+        VMSTATE_STRUCT(mp, GuestROMState, 0, vmstate_handlers, VAPICHandlers),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+static const VMStateDescription vmstate_vapic = {
+    .name = "kvm-tpr-opt",      /* compatible with qemu-kvm VAPIC */
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
+    .post_load = vapic_post_load,
+    .fields = (VMStateField[]) {
+        VMSTATE_STRUCT(rom_state, VAPICROMState, 0, vmstate_guest_rom,
+                       GuestROMState),
+        VMSTATE_UINT32(state, VAPICROMState),
+        VMSTATE_UINT32(real_tpr_addr, VAPICROMState),
+        VMSTATE_UINT32(rom_state_vaddr, VAPICROMState),
+        VMSTATE_UINT32(vapic_paddr, VAPICROMState),
+        VMSTATE_UINT32(rom_state_paddr, VAPICROMState),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+static void vapic_class_init(ObjectClass *klass, void *data)
+{
+    SysBusDeviceClass *sc = SYS_BUS_DEVICE_CLASS(klass);
+    DeviceClass *dc = DEVICE_CLASS(klass);
+
+    dc->no_user = 1;
+    dc->reset   = vapic_reset;
+    dc->vmsd    = &vmstate_vapic;
+    sc->init    = vapic_init;
+}
+
+static TypeInfo vapic_type = {
+    .name          = "kvmvapic",
+    .parent        = TYPE_SYS_BUS_DEVICE,
+    .instance_size = sizeof(VAPICROMState),
+    .class_init    = vapic_class_init,
+};
+
+static void vapic_register(void)
+{
+    type_register_static(&vapic_type);
+}
+
+device_init(vapic_register);
-- 
1.7.3.4

^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [PATCH v2 6/8] kvmvapic: Simplify mp/up_set_tpr
  2012-02-10 18:31 ` [Qemu-devel] " Jan Kiszka
@ 2012-02-10 18:31   ` Jan Kiszka
  -1 siblings, 0 replies; 90+ messages in thread
From: Jan Kiszka @ 2012-02-10 18:31 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: kvm, qemu-devel, Anthony Liguori, Gleb Natapov

The CH register is only written, never read. So we can remove these
operations and, in the case of up_set_tpr, also the ECX push/pop.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 pc-bios/optionrom/kvmvapic.S |    6 +-----
 1 files changed, 1 insertions(+), 5 deletions(-)

diff --git a/pc-bios/optionrom/kvmvapic.S b/pc-bios/optionrom/kvmvapic.S
index e1d8f18..856c1e5 100644
--- a/pc-bios/optionrom/kvmvapic.S
+++ b/pc-bios/optionrom/kvmvapic.S
@@ -202,7 +202,6 @@ mp_isr_is_bigger:
 	mov %bh, %bl
 mp_tpr_is_bigger:
 	/* %bl = ppr */
-	mov %bl, %ch   /* ch = ppr */
 	rol $8, %ebx
 	/* now: %bl = irr, %bh = ppr */
 	cmp %bh, %bl
@@ -276,7 +275,6 @@ up_set_tpr_eax:
 up_set_tpr:
 	pushf
 	push %eax
-	push %ecx
 	push %ebx
 	reenable_vtpr
 
@@ -284,7 +282,7 @@ up_set_tpr_failed:
 	mov vapic, %eax	; fixup
 
 	mov %eax, %ebx
-	mov 20(%esp), %bl
+	mov 16(%esp), %bl
 
 	/* %ebx = new vapic (%bl = tpr, %bh = isr, %b3 = irr) */
 
@@ -298,7 +296,6 @@ up_isr_is_bigger:
 	mov %bh, %bl
 up_tpr_is_bigger:
 	/* %bl = ppr */
-	mov %bl, %ch   /* ch = ppr */
 	rol $8, %ebx
 	/* now: %bl = irr, %bh = ppr */
 	cmp %bh, %bl
@@ -306,7 +303,6 @@ up_tpr_is_bigger:
 
 up_set_tpr_out:
 	pop %ebx
-	pop %ecx
 	pop %eax
 	popf
 	ret $4
-- 
1.7.3.4
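
The change from 20(%esp) to 16(%esp) in up_set_tpr follows directly
from dropping one 4-byte push. As a small worked check, assuming the
handler is entered via a near call with the new TPR value pushed by the
caller (as the patched instructions do), the offset can be computed
like this; first_arg_offset() is an illustrative helper, not code from
the ROM.

```c
#include <assert.h>
#include <stddef.h>

#define SLOT_SIZE 4  /* bytes per push in 32-bit protected mode */

/*
 * Offset from %esp to the first stack argument after the callee has
 * pushed saved_slots 4-byte values on top of the return address.
 */
static size_t first_arg_offset(size_t saved_slots)
{
    /* Guard against overflow in the multiplication below. */
    assert(saved_slots < ((size_t)-1) / SLOT_SIZE - 1);
    /* +1 accounts for the return address pushed by the call. */
    return (saved_slots + 1) * SLOT_SIZE;
}
```

Before this patch the handler saved four slots (flags, %eax, %ecx,
%ebx), giving offset 20; with %ecx gone there are three, giving 16.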


^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [PATCH v2 7/8] optionsrom: Reserve space for checksum
  2012-02-10 18:31 ` [Qemu-devel] " Jan Kiszka
@ 2012-02-10 18:31   ` Jan Kiszka
  -1 siblings, 0 replies; 90+ messages in thread
From: Jan Kiszka @ 2012-02-10 18:31 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: Anthony Liguori, qemu-devel, kvm, Gleb Natapov

Always add a byte before the final 512-byte alignment to reserve
space for the ROM checksum.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 pc-bios/optionrom/optionrom.h |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/pc-bios/optionrom/optionrom.h b/pc-bios/optionrom/optionrom.h
index aa783de..3daf7da 100644
--- a/pc-bios/optionrom/optionrom.h
+++ b/pc-bios/optionrom/optionrom.h
@@ -124,7 +124,8 @@
 	movw		%ax, %ds;
 
 #define OPTION_ROM_END					\
-    .align 512, 0;					\
+	.byte		0;				\
+	.align		512, 0;				\
     _end:
 
 #define BOOT_ROM_END					\
-- 
1.7.3.4
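
To illustrate why the spare byte matters: a legacy option ROM image is
expected to sum to zero modulo 256, so a signing step needs one byte it
can overwrite freely. The sketch below assumes the checksum is stored
in the last byte of the image; that placement is an assumption about
the signing tool and is not stated in this patch. rom_sign() and
rom_checksum_ok() are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Store a checksum in the last byte of rom so that the sum of all
 * bytes is 0 modulo 256. Returns the byte that was written.
 */
static uint8_t rom_sign(uint8_t *rom, size_t len)
{
    uint8_t sum = 0;
    size_t i;

    assert(rom != NULL && len > 0);

    rom[len - 1] = 0;                   /* the reserved byte */
    for (i = 0; i < len; i++) {
        sum = (uint8_t)(sum + rom[i]);  /* wraps modulo 256 */
    }
    rom[len - 1] = (uint8_t)(0x100 - sum);
    return rom[len - 1];
}

/* Returns nonzero if all bytes of the image sum to 0 modulo 256. */
static int rom_checksum_ok(const uint8_t *rom, size_t len)
{
    uint8_t sum = 0;
    size_t i;

    for (i = 0; i < len; i++) {
        sum = (uint8_t)(sum + rom[i]);
    }
    return sum == 0;
}
```

If the ROM body ended exactly on a 512-byte boundary and no byte were
reserved, the last byte would belong to code or data and could not be
overwritten this way.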

^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [PATCH v2 8/8] kvmvapic: Use optionrom helpers
  2012-02-10 18:31 ` [Qemu-devel] " Jan Kiszka
@ 2012-02-10 18:31   ` Jan Kiszka
  -1 siblings, 0 replies; 90+ messages in thread
From: Jan Kiszka @ 2012-02-10 18:31 UTC (permalink / raw)
  To: Avi Kivity, Marcelo Tosatti
  Cc: kvm, qemu-devel, Anthony Liguori, Gleb Natapov

Use OPTION_ROM_START/END from the common header file, add comment to
init code.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 pc-bios/optionrom/kvmvapic.S |   18 ++++++++----------
 1 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/pc-bios/optionrom/kvmvapic.S b/pc-bios/optionrom/kvmvapic.S
index 856c1e5..aa17a40 100644
--- a/pc-bios/optionrom/kvmvapic.S
+++ b/pc-bios/optionrom/kvmvapic.S
@@ -9,12 +9,10 @@
 # option) any later version.  See the COPYING file in the top-level directory.
 #
 
-	.text 0
-	.code16
-.global _start
-_start:
-	.short 0xaa55
-	.byte (_end - _start) / 512
+#include "optionrom.h"
+
+OPTION_ROM_START
+
 	# clear vapic area: firmware load using rep insb may cause
 	# stale tpr/isr/irr data to corrupt the vapic area.
 	push %es
@@ -26,8 +24,11 @@ _start:
 	cld
 	rep stosw
 	pop %es
+
+	# announce presence to the hypervisor
 	mov $vapic_base, %ax
 	out %ax, $0x7e
+
 	lret
 
 	.code32
@@ -331,7 +332,4 @@ up_set_tpr_poll_irq:
 vapic:
 . = . + vapic_size
 
-.byte 0  # reserve space for signature
-.align 512, 0
-
-_end:
+OPTION_ROM_END
-- 
1.7.3.4
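
The lines removed here (.short 0xaa55 and the size byte) are the
standard option ROM header, which the common macro is expected to emit
instead. As a sketch of how a consumer reads that header, under the
assumption that only these first three bytes are inspected,
option_rom_size() below is a hypothetical helper; it mirrors the way
hw/kvmvapic.c derives the ROM size from the byte at offset 2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ROM_BLOCK_SIZE 512

/*
 * Returns the ROM size in bytes as declared by its header, or -1 if
 * the 0xaa55 signature (stored little-endian as 0x55, 0xaa) is absent
 * or fewer than three bytes are available.
 */
static long option_rom_size(const uint8_t *rom, size_t avail)
{
    assert(rom != NULL);

    if (avail < 3 || rom[0] != 0x55 || rom[1] != 0xaa) {
        return -1;
    }
    /* Byte 2 holds the length in 512-byte blocks. */
    return (long)rom[2] * ROM_BLOCK_SIZE;
}
```

A header of 55 aa 02 therefore declares a 1024-byte ROM.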


^ permalink raw reply related	[flat|nested] 90+ messages in thread

* Re: [PATCH v2 1/8] kvm: Set cpu_single_env only once
  2012-02-10 18:31   ` [Qemu-devel] " Jan Kiszka
@ 2012-02-11 10:02     ` Blue Swirl
  -1 siblings, 0 replies; 90+ messages in thread
From: Blue Swirl @ 2012-02-11 10:02 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Anthony Liguori, kvm, Gleb Natapov, Marcelo Tosatti, qemu-devel,
	Avi Kivity

On Fri, Feb 10, 2012 at 18:31, Jan Kiszka <jan.kiszka@siemens.com> wrote:
> As we have thread-local cpu_single_env now and KVM uses exactly one
> thread per VCPU, we can drop the cpu_single_env updates from the loop
> and initialize this variable only once during setup.

I don't think this is correct. Maybe you missed the part that sets
cpu_single_env to NULL, which I think is to annoy broken code that
assumes that some CPU state is always globally available. This is not
true for monitor context.

> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> ---
>  cpus.c    |    1 +
>  kvm-all.c |    5 -----
>  2 files changed, 1 insertions(+), 5 deletions(-)
>
> diff --git a/cpus.c b/cpus.c
> index f45a438..d0c8340 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -714,6 +714,7 @@ static void *qemu_kvm_cpu_thread_fn(void *arg)
>     qemu_mutex_lock(&qemu_global_mutex);
>     qemu_thread_get_self(env->thread);
>     env->thread_id = qemu_get_thread_id();
> +    cpu_single_env = env;
>
>     r = kvm_init_vcpu(env);
>     if (r < 0) {
> diff --git a/kvm-all.c b/kvm-all.c
> index c4babda..e2cbc03 100644
> --- a/kvm-all.c
> +++ b/kvm-all.c
> @@ -1118,8 +1118,6 @@ int kvm_cpu_exec(CPUState *env)
>         return EXCP_HLT;
>     }
>
> -    cpu_single_env = env;
> -
>     do {
>         if (env->kvm_vcpu_dirty) {
>             kvm_arch_put_registers(env, KVM_PUT_RUNTIME_STATE);
> @@ -1136,13 +1134,11 @@ int kvm_cpu_exec(CPUState *env)
>              */
>             qemu_cpu_kick_self();
>         }
> -        cpu_single_env = NULL;
>         qemu_mutex_unlock_iothread();
>
>         run_ret = kvm_vcpu_ioctl(env, KVM_RUN, 0);
>
>         qemu_mutex_lock_iothread();
> -        cpu_single_env = env;
>         kvm_arch_post_run(env, run);
>
>         kvm_flush_coalesced_mmio_buffer();
> @@ -1206,7 +1202,6 @@ int kvm_cpu_exec(CPUState *env)
>     }
>
>     env->exit_request = 0;
> -    cpu_single_env = NULL;
>     return ret;
>  }
>
> --
> 1.7.3.4
>
>

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v2 1/8] kvm: Set cpu_single_env only once
  2012-02-11 10:02     ` [Qemu-devel] " Blue Swirl
@ 2012-02-11 10:06       ` Jan Kiszka
  -1 siblings, 0 replies; 90+ messages in thread
From: Jan Kiszka @ 2012-02-11 10:06 UTC (permalink / raw)
  To: Blue Swirl
  Cc: Anthony Liguori, kvm, Gleb Natapov, Marcelo Tosatti, qemu-devel,
	Avi Kivity

[-- Attachment #1: Type: text/plain, Size: 2290 bytes --]

On 2012-02-11 11:02, Blue Swirl wrote:
> On Fri, Feb 10, 2012 at 18:31, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>> As we have thread-local cpu_single_env now and KVM uses exactly one
>> thread per VCPU, we can drop the cpu_single_env updates from the loop
>> and initialize this variable only once during setup.
> 
> I don't think this is correct. Maybe you missed the part that sets
> cpu_single_env to NULL, which I think is to annoy broken code that
> assumes that some CPU state is always globally available. This is not
> true for monitor context.

I did check this before changing, and I see no such need, particularly
as this old debugging aid prevents a valid use case.

Jan

> 
>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>> ---
>>  cpus.c    |    1 +
>>  kvm-all.c |    5 -----
>>  2 files changed, 1 insertions(+), 5 deletions(-)
>>
>> diff --git a/cpus.c b/cpus.c
>> index f45a438..d0c8340 100644
>> --- a/cpus.c
>> +++ b/cpus.c
>> @@ -714,6 +714,7 @@ static void *qemu_kvm_cpu_thread_fn(void *arg)
>>     qemu_mutex_lock(&qemu_global_mutex);
>>     qemu_thread_get_self(env->thread);
>>     env->thread_id = qemu_get_thread_id();
>> +    cpu_single_env = env;
>>
>>     r = kvm_init_vcpu(env);
>>     if (r < 0) {
>> diff --git a/kvm-all.c b/kvm-all.c
>> index c4babda..e2cbc03 100644
>> --- a/kvm-all.c
>> +++ b/kvm-all.c
>> @@ -1118,8 +1118,6 @@ int kvm_cpu_exec(CPUState *env)
>>         return EXCP_HLT;
>>     }
>>
>> -    cpu_single_env = env;
>> -
>>     do {
>>         if (env->kvm_vcpu_dirty) {
>>             kvm_arch_put_registers(env, KVM_PUT_RUNTIME_STATE);
>> @@ -1136,13 +1134,11 @@ int kvm_cpu_exec(CPUState *env)
>>              */
>>             qemu_cpu_kick_self();
>>         }
>> -        cpu_single_env = NULL;
>>         qemu_mutex_unlock_iothread();
>>
>>         run_ret = kvm_vcpu_ioctl(env, KVM_RUN, 0);
>>
>>         qemu_mutex_lock_iothread();
>> -        cpu_single_env = env;
>>         kvm_arch_post_run(env, run);
>>
>>         kvm_flush_coalesced_mmio_buffer();
>> @@ -1206,7 +1202,6 @@ int kvm_cpu_exec(CPUState *env)
>>     }
>>
>>     env->exit_request = 0;
>> -    cpu_single_env = NULL;
>>     return ret;
>>  }
>>
>> --
>> 1.7.3.4
>>
>>
> 
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v2 1/8] kvm: Set cpu_single_env only once
  2012-02-11 10:06       ` [Qemu-devel] " Jan Kiszka
@ 2012-02-11 11:25         ` Blue Swirl
  -1 siblings, 0 replies; 90+ messages in thread
From: Blue Swirl @ 2012-02-11 11:25 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Anthony Liguori, kvm, Gleb Natapov, Marcelo Tosatti, qemu-devel,
	Avi Kivity

On Sat, Feb 11, 2012 at 10:06, Jan Kiszka <jan.kiszka@web.de> wrote:
> On 2012-02-11 11:02, Blue Swirl wrote:
>> On Fri, Feb 10, 2012 at 18:31, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>>> As we have thread-local cpu_single_env now and KVM uses exactly one
>>> thread per VCPU, we can drop the cpu_single_env updates from the loop
>>> and initialize this variable only once during setup.
>>
>> I don't think this is correct. Maybe you missed the part that sets
>> cpu_single_env to NULL, which I think is to annoy broken code that
>> assumes that some CPU state is always globally available. This is not
>> true for monitor context.
>
> I did check this before changing, and I see no such need, particularly
> as this old debugging aid prevents a valid use case.

It looks like monitor code is safe now. But in several places there
are checks like this in pc.c:
DeviceState *cpu_get_current_apic(void)
{
    if (cpu_single_env) {
        return cpu_single_env->apic_state;
    } else {
        return NULL;
    }
}

In cpu-exec.c, there are these lines:
    /* fail safe : never use cpu_single_env outside cpu_exec() */
    cpu_single_env = NULL;

I think using cpu_single_env is an indication of a problem, like poor
code, layering violation or poor API (vmport). What is your use case?

>
> Jan
>
>>
>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>>> ---
>>>  cpus.c    |    1 +
>>>  kvm-all.c |    5 -----
>>>  2 files changed, 1 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/cpus.c b/cpus.c
>>> index f45a438..d0c8340 100644
>>> --- a/cpus.c
>>> +++ b/cpus.c
>>> @@ -714,6 +714,7 @@ static void *qemu_kvm_cpu_thread_fn(void *arg)
>>>     qemu_mutex_lock(&qemu_global_mutex);
>>>     qemu_thread_get_self(env->thread);
>>>     env->thread_id = qemu_get_thread_id();
>>> +    cpu_single_env = env;
>>>
>>>     r = kvm_init_vcpu(env);
>>>     if (r < 0) {
>>> diff --git a/kvm-all.c b/kvm-all.c
>>> index c4babda..e2cbc03 100644
>>> --- a/kvm-all.c
>>> +++ b/kvm-all.c
>>> @@ -1118,8 +1118,6 @@ int kvm_cpu_exec(CPUState *env)
>>>         return EXCP_HLT;
>>>     }
>>>
>>> -    cpu_single_env = env;
>>> -
>>>     do {
>>>         if (env->kvm_vcpu_dirty) {
>>>             kvm_arch_put_registers(env, KVM_PUT_RUNTIME_STATE);
>>> @@ -1136,13 +1134,11 @@ int kvm_cpu_exec(CPUState *env)
>>>              */
>>>             qemu_cpu_kick_self();
>>>         }
>>> -        cpu_single_env = NULL;
>>>         qemu_mutex_unlock_iothread();
>>>
>>>         run_ret = kvm_vcpu_ioctl(env, KVM_RUN, 0);
>>>
>>>         qemu_mutex_lock_iothread();
>>> -        cpu_single_env = env;
>>>         kvm_arch_post_run(env, run);
>>>
>>>         kvm_flush_coalesced_mmio_buffer();
>>> @@ -1206,7 +1202,6 @@ int kvm_cpu_exec(CPUState *env)
>>>     }
>>>
>>>     env->exit_request = 0;
>>> -    cpu_single_env = NULL;
>>>     return ret;
>>>  }
>>>
>>> --
>>> 1.7.3.4
>>>
>>>
>>
>>
>
>

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v2 7/8] optionsrom: Reserve space for checksum
  2012-02-10 18:31   ` [Qemu-devel] " Jan Kiszka
@ 2012-02-11 11:46     ` Andreas Färber
  -1 siblings, 0 replies; 90+ messages in thread
From: Andreas Färber @ 2012-02-11 11:46 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Anthony Liguori, kvm, Gleb Natapov, Marcelo Tosatti, qemu-devel,
	Avi Kivity

Am 10.02.2012 19:31, schrieb Jan Kiszka:
> Always add a byte before the final 512-bytes alignment to reserve the
> space for the ROM checksum.
> 
> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> ---
>  pc-bios/optionrom/optionrom.h |    3 ++-
>  1 files changed, 2 insertions(+), 1 deletions(-)
> 
> diff --git a/pc-bios/optionrom/optionrom.h b/pc-bios/optionrom/optionrom.h
> index aa783de..3daf7da 100644
> --- a/pc-bios/optionrom/optionrom.h
> +++ b/pc-bios/optionrom/optionrom.h
> @@ -124,7 +124,8 @@
>  	movw		%ax, %ds;
>  
>  #define OPTION_ROM_END					\
> -    .align 512, 0;					\
> +	.byte		0;				\
> +	.align		512, 0;				\

Tabs.

Andreas

>      _end:
>  
>  #define BOOT_ROM_END					\
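
For context on why the spare byte matters: a PC option ROM is only accepted
if all of its bytes sum to zero modulo 256, so one byte has to be free to
hold the balancing value. The following is a minimal sketch, assuming the
signing step stores the checksum in the last byte of the 512-byte-aligned
image; demo_rom_sum and demo_rom_sign are hypothetical helper names, not
QEMU functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum of all bytes of the image, modulo 256. */
static uint8_t demo_rom_sum(const uint8_t *rom, size_t size)
{
    uint8_t sum = 0;

    for (size_t i = 0; i < size; i++) {
        sum = (uint8_t)(sum + rom[i]);
    }
    return sum;
}

/*
 * Store the checksum in the last byte so that the whole image sums to 0.
 * Assumption: the last byte is free, which is what a reserved byte placed
 * before the final 512-byte alignment guarantees.
 * Returns 0 on success, -1 if size is zero or not a multiple of 512.
 */
static int demo_rom_sign(uint8_t *rom, size_t size)
{
    if (size == 0 || size % 512 != 0) {
        return -1;
    }
    rom[size - 1] = (uint8_t)(0 - demo_rom_sum(rom, size - 1));
    return 0;
}
```

Without the reserved byte, code that happens to end exactly on a 512-byte
boundary would leave no padding, and the checksum would overwrite the last
byte of code.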

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v2 1/8] kvm: Set cpu_single_env only once
  2012-02-11 11:25         ` [Qemu-devel] " Blue Swirl
@ 2012-02-11 11:49           ` Andreas Färber
  -1 siblings, 0 replies; 90+ messages in thread
From: Andreas Färber @ 2012-02-11 11:49 UTC (permalink / raw)
  To: Blue Swirl, Jan Kiszka
  Cc: Anthony Liguori, kvm, Gleb Natapov, Marcelo Tosatti, qemu-devel,
	Avi Kivity

Am 11.02.2012 12:25, schrieb Blue Swirl:
> I think using cpu_single_env is an indication of a problem, like poor
> code, layering violation or poor API (vmport). What is your use case?

I couldn't spot any in this series. Jan, note that any new use of env or
cpu_single_env will need to be redone when we convert to QOM CPU.

Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v2 1/8] kvm: Set cpu_single_env only once
  2012-02-11 11:25         ` [Qemu-devel] " Blue Swirl
@ 2012-02-11 12:41           ` Jan Kiszka
  -1 siblings, 0 replies; 90+ messages in thread
From: Jan Kiszka @ 2012-02-11 12:41 UTC (permalink / raw)
  To: Blue Swirl
  Cc: Anthony Liguori, kvm, Gleb Natapov, Marcelo Tosatti, qemu-devel,
	Avi Kivity

[-- Attachment #1: Type: text/plain, Size: 1801 bytes --]

On 2012-02-11 12:25, Blue Swirl wrote:
> On Sat, Feb 11, 2012 at 10:06, Jan Kiszka <jan.kiszka@web.de> wrote:
>> On 2012-02-11 11:02, Blue Swirl wrote:
>>> On Fri, Feb 10, 2012 at 18:31, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>>>> As we have thread-local cpu_single_env now and KVM uses exactly one
>>>> thread per VCPU, we can drop the cpu_single_env updates from the loop
>>>> and initialize this variable only once during setup.
>>>
>>> I don't think this is correct. Maybe you missed the part that sets
>>> cpu_single_env to NULL, which I think is to annoy broken code that
>>> assumes that some CPU state is always globally available. This is not
>>> true for monitor context.
>>
>> I did check this before changing, and I see no such need, particularly
>> as this old debugging aid prevents a valid use case.
> 
> It looks like monitor code is safe now. But in several places there
> are checks like this in pc.c:
> DeviceState *cpu_get_current_apic(void)
> {
>     if (cpu_single_env) {
>         return cpu_single_env->apic_state;
>     } else {
>         return NULL;
>     }
> }
> 
> In cpu-exec.c, there are these lines:
>     /* fail safe : never use cpu_single_env outside cpu_exec() */
>     cpu_single_env = NULL;

That's legacy stuff from the pre-io-thread times. Nowadays,
cpu_single_env is logically either always valid (KVM) or switching
seamlessly between the VCPUs (TCG).
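
To illustrate the KVM case: with one thread per VCPU and a thread-local
pointer, the pointer can be set once at thread start and stays valid for the
lifetime of that thread. This is only a sketch. DemoCPUState, demo_single_env
and the other demo_* names are hypothetical stand-ins rather than QEMU code,
and __thread is the GCC/Clang thread-local storage class.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

enum { DEMO_MAX_CPUS = 8 };

/* Hypothetical stand-in for CPUState; only what the sketch needs. */
typedef struct DemoCPUState {
    int cpu_index;
    int seen_index;   /* index observed through the thread-local pointer */
} DemoCPUState;

/* Thread-local "current CPU" pointer, analogous to cpu_single_env. */
static __thread DemoCPUState *demo_single_env;

/* Returns the index of the calling VCPU, or -1 outside VCPU context. */
static int demo_current_cpu_index(void)
{
    return demo_single_env ? demo_single_env->cpu_index : -1;
}

static void *demo_vcpu_thread_fn(void *arg)
{
    DemoCPUState *env = arg;

    demo_single_env = env;   /* set once, never reset to NULL */

    /* Stand-in for the run loop: the pointer stays valid throughout. */
    for (int i = 0; i < 1000; i++) {
        if (demo_current_cpu_index() != env->cpu_index) {
            return NULL;     /* leaves seen_index at -1 */
        }
    }
    env->seen_index = demo_current_cpu_index();
    return NULL;
}

/* Runs n VCPU threads; returns 0 if each one saw its own state. */
static int demo_run_vcpus(int n)
{
    pthread_t threads[DEMO_MAX_CPUS];
    DemoCPUState cpus[DEMO_MAX_CPUS];
    int started = 0;
    int ok = 1;

    if (n < 1 || n > DEMO_MAX_CPUS) {
        return -1;
    }
    for (int i = 0; i < n; i++) {
        cpus[i].cpu_index = i;
        cpus[i].seen_index = -1;
        if (pthread_create(&threads[i], NULL, demo_vcpu_thread_fn,
                           &cpus[i]) != 0) {
            ok = 0;
            break;
        }
        started++;
    }
    for (int i = 0; i < started; i++) {
        pthread_join(threads[i], NULL);
        if (cpus[i].seen_index != i) {
            ok = 0;
        }
    }
    return ok ? 0 : -1;
}
```

A thread that never sets the pointer, such as the main thread here, still
sees NULL, so a check like the one in cpu_get_current_apic() keeps working
for non-VCPU callers.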

> 
> I think using cpu_single_env is an indication of a problem, like poor
> code, layering violation or poor API (vmport). What is your use case?

We have a few poor ABIs like the VMware stuff or the KVM VAPIC, so we
already have to live with it for those use cases. Moreover, cpus.c uses it
internally, e.g. in vm_stop, to find out the caller context.

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/8] kvm: Set cpu_single_env only once
  2012-02-11 11:49           ` [Qemu-devel] " Andreas Färber
@ 2012-02-11 12:43             ` Jan Kiszka
  -1 siblings, 0 replies; 90+ messages in thread
From: Jan Kiszka @ 2012-02-11 12:43 UTC (permalink / raw)
  To: Andreas Färber
  Cc: Blue Swirl, Anthony Liguori, kvm, Gleb Natapov, Marcelo Tosatti,
	qemu-devel, Avi Kivity

[-- Attachment #1: Type: text/plain, Size: 579 bytes --]

On 2012-02-11 12:49, Andreas Färber wrote:
> Am 11.02.2012 12:25, schrieb Blue Swirl:
>> I think using cpu_single_env is an indication of a problem, like poor
>> code, layering violation or poor API (vmport). What is your use case?
> 
> I couldn't spot any in this series. Jan, note that any new use of env or
> cpu_single_env will need to be redone when we convert to QOM CPU.

cpu_single_env should have nothing to do with QOM.

The ABIs of vmport and the KVM VAPIC require a reference to the calling
VCPU, and that's why you find tons of such uses in patch 5.

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v2 7/8] optionsrom: Reserve space for checksum
  2012-02-11 11:46     ` [Qemu-devel] " Andreas Färber
@ 2012-02-11 12:45       ` Jan Kiszka
  -1 siblings, 0 replies; 90+ messages in thread
From: Jan Kiszka @ 2012-02-11 12:45 UTC (permalink / raw)
  To: Andreas Färber
  Cc: Anthony Liguori, kvm, Gleb Natapov, Marcelo Tosatti, qemu-devel,
	Avi Kivity

[-- Attachment #1: Type: text/plain, Size: 909 bytes --]

On 2012-02-11 12:46, Andreas Färber wrote:
> Am 10.02.2012 19:31, schrieb Jan Kiszka:
>> Always add a byte before the final 512-bytes alignment to reserve the
>> space for the ROM checksum.
>>
>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>> ---
>>  pc-bios/optionrom/optionrom.h |    3 ++-
>>  1 files changed, 2 insertions(+), 1 deletions(-)
>>
>> diff --git a/pc-bios/optionrom/optionrom.h b/pc-bios/optionrom/optionrom.h
>> index aa783de..3daf7da 100644
>> --- a/pc-bios/optionrom/optionrom.h
>> +++ b/pc-bios/optionrom/optionrom.h
>> @@ -124,7 +124,8 @@
>>  	movw		%ax, %ds;
>>  
>>  #define OPTION_ROM_END					\
>> -    .align 512, 0;					\
>> +	.byte		0;				\
>> +	.align		512, 0;				\
> 
> Tabs.

For sure, like in the whole file.

If a codestyle fix is desired, I'll post one for all assembly files. But
I guess there are different views on such changes.

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v2 7/8] optionsrom: Reserve space for checksum
  2012-02-11 12:45       ` [Qemu-devel] " Jan Kiszka
@ 2012-02-11 12:51         ` Andreas Färber
  -1 siblings, 0 replies; 90+ messages in thread
From: Andreas Färber @ 2012-02-11 12:51 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Anthony Liguori, kvm, Gleb Natapov, Marcelo Tosatti, qemu-devel,
	Avi Kivity

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Am 11.02.2012 13:45, schrieb Jan Kiszka:
> On 2012-02-11 12:46, Andreas Färber wrote:
>> Am 10.02.2012 19:31, schrieb Jan Kiszka:
>>> Always add a byte before the final 512-bytes alignment to reserve the
>>> space for the ROM checksum.
>>>
>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>>> ---
>>>  pc-bios/optionrom/optionrom.h |    3 ++-
>>>  1 files changed, 2 insertions(+), 1 deletions(-)
>>>
>>> diff --git a/pc-bios/optionrom/optionrom.h b/pc-bios/optionrom/optionrom.h
>>> index aa783de..3daf7da 100644
>>> --- a/pc-bios/optionrom/optionrom.h
>>> +++ b/pc-bios/optionrom/optionrom.h
>>> @@ -124,7 +124,8 @@
>>>  	movw		%ax, %ds;
>>>
>>>  #define OPTION_ROM_END					\
>>> -    .align 512, 0;					\
>>> +	.byte		0;				\
>>> +	.align		512, 0;				\
>> 
>> Tabs.
> 
> For sure, like in the whole file.

No, as we can see in this patch, .align above and _end: below use 4
spaces, so this looks inconsistent in Thunderbird.

I don't really mind, just noticed and thought you might want to fix.

Andreas

> 
> If a codestyle fix is desired, I'll post one for all assembly
> files. But I guess there are different views on such changes.
> 
> Jan

- -- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.18 (GNU/Linux)

iQIcBAEBAgAGBQJPNmQ2AAoJEPou0S0+fgE/nnEQALknNqkUrCgdXTiRwaKCZehU
U+lnRwAY2DoFKaq8IqmvmiwNW2g4U0qPy0Stc7qvsOeTHSk/MX3R1Ym1+a03Hk22
tIzyJjvQPgVkbQtAljC7BObGzxGq9mUlV4+/TuTQgTsDXig121CmVVmrKXDRdP6o
vXmtLgkXQB75ArOQSr0sByCFuBbA+AlIcmtIpvlR6znMZcfp0QFYbAzfAINzad+A
Qg40GXUsFyyHlstM3Njt9H0ArtvE+NIPH9ZABk+58aV92CAk31rOigiiBAE1sPja
qR0AIEMmwnASEWsn/yX45b+lhxXfVadZNzobhnJ/KqG/K7hrfzcLIeD5pKMt8USu
TPFfdswt4utm/JDGhP9moRpe3cLGITAN93vybyOv/7W0CK18etFgevQvX0C98V9O
+FUjwqsOwfXkswXcglwYmUVqJarUooh1MlXF/QnS+/d7aefIl59Y/YaGzRryjZNm
Nd1hN7L5J34LfBCgltWNXA9KQB+0LyCYCX0lzE1+428WwqbZ8nGa9ZiUY0w4N2O7
maWrWnmI1VsiUL8RdsQpx7+nRfUEZW6JDfm1bA5nbd+uLuE+TwpAWWkC2kVOhnfi
UQggicoM6J88zcbiGT74whjfZgx7CXcE/F/Tncd5ebi5qi++jXB6C0f3LSFcx2KP
+Zv19XpWHzZXNNYiGFW0
=rQEQ
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 90+ messages in thread
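
For readers following the patch description quoted above: the extra byte matters when the ROM payload already ends exactly on a 512-byte boundary, because `.align 512, 0` then adds no padding and leaves no spare byte for a checksum. The sketch below only illustrates that arithmetic. It assumes the usual PC option ROM convention that all bytes of the image sum to zero modulo 256, and that the signing step stores the checksum in the final byte of the image; the function names are hypothetical and are not taken from the QEMU tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ROM_BLOCK 512u

/* Image size after ".byte 0" followed by ".align 512, 0". */
static size_t rom_padded_size(size_t payload_len)
{
    size_t with_spare = payload_len + 1;   /* the reserved byte */
    return (with_spare + ROM_BLOCK - 1) / ROM_BLOCK * ROM_BLOCK;
}

/* Store a checksum in the last byte so that all bytes sum to 0 mod 256.
 * Assumption: the last byte is free, which the reserved byte guarantees. */
static void rom_sign(uint8_t *rom, size_t size)
{
    uint8_t sum = 0;
    for (size_t i = 0; i + 1 < size; i++) {
        sum = (uint8_t)(sum + rom[i]);
    }
    rom[size - 1] = (uint8_t)(0u - sum);
}

/* Return 1 if the byte-wise sum of the whole image is 0 mod 256. */
static int rom_checksum_ok(const uint8_t *rom, size_t size)
{
    uint8_t sum = 0;
    for (size_t i = 0; i < size; i++) {
        sum = (uint8_t)(sum + rom[i]);
    }
    return sum == 0;
}
```

With this model, a 512-byte payload grows to 1024 bytes, while a 511-byte payload still fits in a single 512-byte block.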

* Re: [PATCH v2 7/8] optionsrom: Reserve space for checksum
  2012-02-11 12:51         ` [Qemu-devel] " Andreas Färber
@ 2012-02-11 12:57           ` Jan Kiszka
  -1 siblings, 0 replies; 90+ messages in thread
From: Jan Kiszka @ 2012-02-11 12:57 UTC (permalink / raw)
  To: Andreas Färber
  Cc: Anthony Liguori, kvm, Gleb Natapov, Marcelo Tosatti, qemu-devel,
	Avi Kivity

[-- Attachment #1: Type: text/plain, Size: 1455 bytes --]

On 2012-02-11 13:51, Andreas Färber wrote:
> On 11.02.2012 13:45, Jan Kiszka wrote:
>> On 2012-02-11 12:46, Andreas Färber wrote:
>>> On 10.02.2012 19:31, Jan Kiszka wrote:
>>>> Always add a byte before the final 512-bytes alignment to
>>>> reserve the space for the ROM checksum.
>>>>
>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>>>> ---
>>>>  pc-bios/optionrom/optionrom.h |    3 ++-
>>>>  1 files changed, 2 insertions(+), 1 deletions(-)
>>>>
>>>> diff --git a/pc-bios/optionrom/optionrom.h b/pc-bios/optionrom/optionrom.h
>>>> index aa783de..3daf7da 100644
>>>> --- a/pc-bios/optionrom/optionrom.h
>>>> +++ b/pc-bios/optionrom/optionrom.h
>>>> @@ -124,7 +124,8 @@
>>>>  	movw		%ax, %ds;
>>>>
>>>>  #define OPTION_ROM_END					\
>>>> -    .align 512, 0;					\
>>>> +	.byte		0;				\
>>>> +	.align		512, 0;				\
>>>
>>> Tabs.
> 
>> For sure, like in the whole file.
> 
> No, as we can see in this patch, .align above and _align below use 4
> spaces, so this looks inconsistent in Thunderbird.

Right, but that is consistent with other labels in this file. OTOH,
label indentation is different again in the option ROM source files. Well,
no optimal solution here... ;)

Thanks,
Jan

> 
> I don't really mind, just noticed and thought you might want to fix.
> 
> Andreas
> 
> 
>> If a codestyle fix is desired, I'll post one for all assembly
>> files. But I guess there are different views on such changes.
> 
>> Jan
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v2 1/8] kvm: Set cpu_single_env only once
  2012-02-11 12:43             ` Jan Kiszka
@ 2012-02-11 13:06               ` Andreas Färber
  -1 siblings, 0 replies; 90+ messages in thread
From: Andreas Färber @ 2012-02-11 13:06 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Anthony Liguori, kvm, Gleb Natapov, Marcelo Tosatti, qemu-devel,
	Blue Swirl, Avi Kivity

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 11.02.2012 13:43, Jan Kiszka wrote:
> On 2012-02-11 12:49, Andreas Färber wrote:
>> On 11.02.2012 12:25, Blue Swirl wrote:
>>> I think using cpu_single_env is an indication of a problem,
>>> like poor code, layering violation or poor API (vmport). What
>>> is your use case?
>> 
>> I couldn't spot any in this series. Jan, note that any new use of
>> env or cpu_single_env will need to be redone when we convert to
>> QOM CPU.
> 
> cpu_single_env should have nothing to do with QOM.

It does, cf. my patch series: Current CPU*State is being embedded in
the QOM object and most future code outside TCG will use a CPU rather
than CPUState pointer. The reason is that CPUState is totally
target-specific and does not belong in common code.

Andreas

- -- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.18 (GNU/Linux)

iQIcBAEBAgAGBQJPNme8AAoJEPou0S0+fgE/bPcP/RRc85K6aJZEqRyw/lvN8+FB
2FtwOqCm6zTBiEfBOfs816YzBDl75F5BVRbNapMLi1Yp4y/BFwQF1lbpu7INF90R
ZvY5BjjW8+xjBbGN0BhmkbjKdXZS1spjYNXDjIcUTvfj/GXW8Aamfj4IQVTpd+0D
l1s6A/X4BgGoxEqLtnHi8mZojafFFW6Dy0tX7BOmAPwBJle+IK91huO/cmL3Ou3v
0X1Rl4UJlq7j5AxFZlbBkkMrB9vozMPZi983SpAyhieQTVqTB+XuRobwZZVWww0m
ff2cBPBckFSF+i5L7eWvL+HfCD2aeYgwTCmfxtxOxjThwvM7gkyz59gQznUmb3yZ
0SLi9aj0dYQkuidoLxORZaAG20pqfvGCMezJQ6p45jhGmq7W3RzMqJX5Hh7GN0bY
J+Yp1W/Svop9XS1MumERufO6E1+2TNpbtwGDizKV52DpT2dTtwQZJ9UjHUvLz52c
avM5DvuuYLGDIyMteURoAh1eo27kfHFZs9vI6HFK3uXrmihgGihtzlVvFxf887kR
LWt/QO8K/VzuktRKj9NutiMqJOUxIzddikxpkEU/80FOtMedy1Ne1cVpMWOTqXRh
U0iayaZ3FKK+NfSYgHjSGCHubJG3JwV/Hawu01nWuUR1aGOsbQuxm1sNcQ+VV1zJ
iGvD5Fdn+9+o+UTJSkiQ
=zss6
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 90+ messages in thread
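
To illustrate the "CPU*State embedded in the QOM object" idea from the message above, here is a minimal sketch of how code holding only the legacy state pointer could get back to the enclosing object. The type and function names (`DemoEnv`, `DemoCPU`, `demo_env_get_cpu`) are hypothetical stand-ins and not the definitions from the actual patch series; the technique shown is plain `offsetof`-based pointer arithmetic, often called container_of.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the target-specific CPU*State. */
typedef struct DemoEnv {
    unsigned long regs[8];   /* target-specific register state */
} DemoEnv;

/* Hypothetical stand-in for the object that embeds the legacy state. */
typedef struct DemoCPU {
    int cpu_index;           /* state that common code may use */
    DemoEnv env;             /* legacy state embedded in the object */
} DemoCPU;

/* Recover the enclosing object from a pointer to its embedded env.
 * Only valid if 'env' really is the 'env' member of a DemoCPU. */
static DemoCPU *demo_env_get_cpu(DemoEnv *env)
{
    return (DemoCPU *)((char *)env - offsetof(DemoCPU, env));
}
```

Common code would then take a `DemoCPU *`, while target-specific code that still receives a `DemoEnv *` can convert at the boundary during a gradual migration.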

* Re: [Qemu-devel] [PATCH v2 1/8] kvm: Set cpu_single_env only once
  2012-02-11 13:06               ` [Qemu-devel] " Andreas Färber
@ 2012-02-11 13:07                 ` Jan Kiszka
  -1 siblings, 0 replies; 90+ messages in thread
From: Jan Kiszka @ 2012-02-11 13:07 UTC (permalink / raw)
  To: Andreas Färber
  Cc: Blue Swirl, Anthony Liguori, kvm, Gleb Natapov, Marcelo Tosatti,
	qemu-devel, Avi Kivity

[-- Attachment #1: Type: text/plain, Size: 945 bytes --]

On 2012-02-11 14:06, Andreas Färber wrote:
> On 11.02.2012 13:43, Jan Kiszka wrote:
>> On 2012-02-11 12:49, Andreas Färber wrote:
>>> On 11.02.2012 12:25, Blue Swirl wrote:
>>>> I think using cpu_single_env is an indication of a problem,
>>>> like poor code, layering violation or poor API (vmport). What
>>>> is your use case?
>>>
>>> I couldn't spot any in this series. Jan, note that any new use of
>>> env or cpu_single_env will need to be redone when we convert to
>>> QOM CPU.
> 
>> cpu_single_env should have nothing to do with QOM.
> 
> It does, cf. my patch series: Current CPU*State is being embedded in
> the QOM object and most future code outside TCG will use a CPU rather
> than CPUState pointer. The reason is that CPUState is totally
> target-specific and does not belong in common code.

So are the devices that depend on a current CPU pointer. You will have
to provide something equivalent.

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/8] kvm: Set cpu_single_env only once
  2012-02-11 13:07                 ` Jan Kiszka
@ 2012-02-11 13:21                   ` Andreas Färber
  -1 siblings, 0 replies; 90+ messages in thread
From: Andreas Färber @ 2012-02-11 13:21 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Blue Swirl, Anthony Liguori, kvm, Gleb Natapov, Marcelo Tosatti,
	qemu-devel, Avi Kivity, Paolo Bonzini

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 11.02.2012 14:07, Jan Kiszka wrote:
> On 2012-02-11 14:06, Andreas Färber wrote:
>> On 11.02.2012 13:43, Jan Kiszka wrote:
>>> On 2012-02-11 12:49, Andreas Färber wrote:
>>>> On 11.02.2012 12:25, Blue Swirl wrote:
>>>>> I think using cpu_single_env is an indication of a
>>>>> problem, like poor code, layering violation or poor API
>>>>> (vmport). What is your use case?
>>>> 
>>>> I couldn't spot any in this series. Jan, note that any new
>>>> use of env or cpu_single_env will need to be redone when we
>>>> convert to QOM CPU.
>> 
>>> cpu_single_env should have nothing to do with QOM.
>> 
>> It does, cf. my patch series: Current CPU*State is being embedded
>> in the QOM object and most future code outside TCG will use a

Let me stress this:

>> CPU rather than CPUState pointer.

>> The reason is that CPUState is totally target-specific and does
>> not belong in common code.
> 
> So are the devices that depend on a current CPU pointer. You will
> have to provide something equivalent.

CPU base class v3:
http://patchwork.ozlabs.org/patch/139284/ (v4 coming up)

That doesn't prevent target-specific devices, although Paolo does want
that to change with respect to properties.

Andreas

- -- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.18 (GNU/Linux)

iQIcBAEBAgAGBQJPNmtzAAoJEPou0S0+fgE/SRQP+gLK/FvwIOXZqvSn+i+ooxin
jXOvH3oBtfiIQp5+59KGlOd7dSjILFwoPtH3U5tGDpI5HHLFpQQOsuppsiBwVOC9
9QUgqFt9d/xodvPJ0gv5ShghoEmCZNdFwNnBYeqB69mEDm5sZwYlvWgXaOgRti2+
0lhGFVISetImmQbiy5l7ubMONwcGUCVuT7pjiZ+S/Cew7wvGW5O7fpo3P8b4Xw4E
P7qX6y785Sm4Wn8iEangFOUqer5ALAS0fL2xHo5NYUUZ8jgn2xwDIT8TP9t8Pkei
5U0kWm+mNyvJ4VLxsN449LNGDV+c3AMyzPodRmV2KJBYISDRIFYlar/SkJGiBkvo
cNKdJLrkm4KIEt6eomyhYgSHJi5nUeoT60lAaZkHIDNonKoFw8swhf85wSi7sQmq
38nIY+F5YAHZ3TQCfTfxTDHy2Wbc6G7bn792FWKOxCVLWtD2Bp3iQv8J3MlYEhMJ
fnJv+/nKUQuPlti4LNwrhJyRLPUNrc6PKgzC8He4dupLMASFPuSMh4mKRlWj41+/
SYKvXz42elSqv2Z798eA8VNCbs7e+0EH67BJQLIL3QEuD4vY/Yfeulr3CGsfkLEL
m+UIAAntzloSkZvxuKmI5MP5XrjHTAbWuab+Gh9kYVyEsWZj4TAvn1hobKtU7sv9
lMd32AbfCskRN8jAw8So
=0Rvo
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v2 1/8] kvm: Set cpu_single_env only once
  2012-02-11 13:21                   ` Andreas Färber
@ 2012-02-11 13:35                     ` Jan Kiszka
  -1 siblings, 0 replies; 90+ messages in thread
From: Jan Kiszka @ 2012-02-11 13:35 UTC (permalink / raw)
  To: Andreas Färber
  Cc: Anthony Liguori, kvm, Gleb Natapov, Marcelo Tosatti, qemu-devel,
	Blue Swirl, Avi Kivity, Paolo Bonzini

[-- Attachment #1: Type: text/plain, Size: 1555 bytes --]

On 2012-02-11 14:21, Andreas Färber wrote:
> On 11.02.2012 14:07, Jan Kiszka wrote:
>> On 2012-02-11 14:06, Andreas Färber wrote:
>>> On 11.02.2012 13:43, Jan Kiszka wrote:
>>>> On 2012-02-11 12:49, Andreas Färber wrote:
>>>>> On 11.02.2012 12:25, Blue Swirl wrote:
>>>>>> I think using cpu_single_env is an indication of a
>>>>>> problem, like poor code, layering violation or poor API
>>>>>> (vmport). What is your use case?
>>>>>
>>>>> I couldn't spot any in this series. Jan, note that any new
>>>>> use of env or cpu_single_env will need to be redone when we
>>>>> convert to QOM CPU.
>>>
>>>> cpu_single_env should have nothing to do with QOM.
>>>
>>> It does, cf. my patch series: Current CPU*State is being embedded
>>> in the QOM object and most future code outside TCG will use a
> 
> Let me stress this:
> 
>>> CPU rather than CPUState pointer.
> 
>>> The reason is that CPUState is totally target-specific and does
>>> not belong in common code.
> 
>> So are the devices that depend on a current CPU pointer. You will
>> have to provide something equivalent.
> 
> CPU base class v3:
> http://patchwork.ozlabs.org/patch/139284/ (v4 coming up)
> 
> That doesn't prevent target-specific devices. Although Paolo does want
> that to change wrt properties.

This patch doesn't explain yet what shall happen to cpu_single_env and
CPUState accesses from target-specific devices. Do you plan accessors
for registers? And a service to return the CPU object associated with
the execution context?

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/8] kvm: Set cpu_single_env only once
  2012-02-11 12:43             ` Jan Kiszka
@ 2012-02-11 13:54               ` Blue Swirl
  -1 siblings, 0 replies; 90+ messages in thread
From: Blue Swirl @ 2012-02-11 13:54 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Andreas Färber, Anthony Liguori, kvm, Gleb Natapov,
	Marcelo Tosatti, qemu-devel, Avi Kivity

On Sat, Feb 11, 2012 at 12:43, Jan Kiszka <jan.kiszka@web.de> wrote:
> On 2012-02-11 12:49, Andreas Färber wrote:
>> On 11.02.2012 12:25, Blue Swirl wrote:
>>> I think using cpu_single_env is an indication of a problem, like poor
>>> code, layering violation or poor API (vmport). What is your use case?
>>
>> I couldn't spot any in this series. Jan, note that any new use of env or
>> cpu_single_env will need to be redone when we convert to QOM CPU.
>
> cpu_single_env should have nothing to do with QOM.
>
> The ABIs of vmport and the KVM VAPI require a reference to the calling
> VCPU, and that's why you find tons of them in patch 5.

Yes, this seems to be another case of a badly designed ABI. I guess
there is no way to change that anymore, just like vmport?

Some of the cpu_single_env accesses in patch 5 could be avoided once the
APIC is moved closer to the CPU. The VAPIC should also be close to the
APIC, so it should be able to access the CPU directly. In some other cases
the current state could be passed around instead once it is known.

>
> Jan
>

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v2 1/8] kvm: Set cpu_single_env only once
  2012-02-11 13:35                     ` [Qemu-devel] " Jan Kiszka
@ 2012-02-11 13:59                       ` Andreas Färber
  -1 siblings, 0 replies; 90+ messages in thread
From: Andreas Färber @ 2012-02-11 13:59 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Anthony Liguori, kvm, Gleb Natapov, Marcelo Tosatti, qemu-devel,
	Blue Swirl, Avi Kivity, Paolo Bonzini

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 11.02.2012 14:35, Jan Kiszka wrote:
> On 2012-02-11 14:21, Andreas Färber wrote:
>> CPU base class v3: http://patchwork.ozlabs.org/patch/139284/ (v4
>> coming up)
>> 
>> That doesn't prevent target-specific devices. Although Paolo does
>> want that to change wrt properties.
> 
> This patch doesn't explain yet what shall happen to cpu_single_env
> and CPUState accesses from target-specific devices.

True. We can't change them before all targets are converted. So far I
have 3/14 and still some review comments to work in.

Another patch in that series uses a macro
s/ENV_GET_OBJECT/ENV_GET_CPU/ to go from CPUState -> CPU while we
convert targets.

Depending on our taste, cpu_single_env might become cpu_single_cpu,
single_cpu or cpu_single.

> Do you plan accessors for registers?

No, registers are in target-specific ARMCPU, S390CPU, MIPSCPU, etc.
and their CPU*State. It would be possible to have static inline
accessors but so far I've seen no need.

> And a service to return the CPU object associated with the
> execution context?

What do you mean by execution context? TLS? The modelling of the state
is pretty orthogonal to how/where we store it.

HTH,

Andreas

- -- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.18 (GNU/Linux)

iQIcBAEBAgAGBQJPNnQvAAoJEPou0S0+fgE/yYAP/j4IUTrrP1invYoVLOiea7wn
9yJ3TgssSUPnQyQRUmkXFvx1hj0Z6umlyxTK+dc1DQF8jJtoouo3CS0D+tyIZEhd
GHN9J0qtgiDzvl1c+3b092VTw47gtv+rXjGUcyKSiTqyGl3OCdbIgt21HK8cT6CN
U7n2pFGBeZiX8GEYiZmhAglyJ45jGpWmVulGYiqzOlBYPLaYi0CQ25NVUalBpq4I
MIEqdW/W8lx0+h5Onl+qUo2btHNQCnG1oPH/BmdVf7Pe1G99VynOXwFVNXeJ2gR4
Lb7wnzKFqdybktkNLtkAIuC0gFj+Ph9+wfVw/QGsCBc0r6gotE8O9uDTyxs1ro+2
in6Am/A+o4M02sOf9uhaYx0l3uryOyXifiIAVzj4Y8s0QhyeDfPa6f8p4iQKh+gE
m/bvbDTb5hU9nW68IuiFXK9dfQMmU2ub5Gx7UAHuyOgEzV0gOPAf4nugYux3owIw
kYJy6sWFUMH/+l94nKAI0FanmL6JSOmA8hfaLXCXOfvfX9CfJEN+KEotj7Ma0Tcz
+YFAlGkwZYmnJmvFakxFlecRUYY/lpwlIusqRJsw1KP40pHuT8GZUDzv6Wn1UD2R
/6xeL7007iUmD+mafc+3xFbKMXS+kyF6+syM3xh/7r1SRyAIZQeJnZvLm8pZXS0T
tsKlb6nYaV+NLwD/rKy0
=09lZ
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 90+ messages in thread
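
On the question of a "service to return the CPU object associated with the execution context" and the mention of TLS above, one possible shape is a thread-local pointer that each VCPU thread sets exactly once at startup, in the spirit of the patch title "Set cpu_single_env only once". This is a sketch under stated assumptions, not what the series implements: it assumes one host thread per VCPU and a C11 compiler, and `DemoVCPU`, `demo_vcpu_thread_init` and `demo_get_current_vcpu` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-VCPU object. */
typedef struct DemoVCPU {
    int index;
} DemoVCPU;

/* One slot per host thread; each VCPU thread sees only its own value. */
static _Thread_local DemoVCPU *demo_current_vcpu;

/* Called once at the start of a VCPU thread (assumption: one thread per VCPU). */
static void demo_vcpu_thread_init(DemoVCPU *cpu)
{
    assert(demo_current_vcpu == NULL);   /* set only once per thread */
    demo_current_vcpu = cpu;
}

/* Returns the VCPU bound to the calling thread, or NULL outside VCPU context. */
static DemoVCPU *demo_get_current_vcpu(void)
{
    return demo_current_vcpu;
}
```

How the state is modelled (embedded struct, object, accessors) stays independent of this lookup, which matches the point made in the message that modelling and storage are orthogonal.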

* Re: [PATCH v2 1/8] kvm: Set cpu_single_env only once
  2012-02-11 13:54               ` Blue Swirl
@ 2012-02-11 14:00                 ` Jan Kiszka
  -1 siblings, 0 replies; 90+ messages in thread
From: Jan Kiszka @ 2012-02-11 14:00 UTC (permalink / raw)
  To: Blue Swirl
  Cc: Anthony Liguori, Gleb Natapov, kvm, Marcelo Tosatti, qemu-devel,
	Avi Kivity, Andreas Färber

[-- Attachment #1: Type: text/plain, Size: 1557 bytes --]

On 2012-02-11 14:54, Blue Swirl wrote:
> On Sat, Feb 11, 2012 at 12:43, Jan Kiszka <jan.kiszka@web.de> wrote:
>> On 2012-02-11 12:49, Andreas Färber wrote:
>>> Am 11.02.2012 12:25, schrieb Blue Swirl:
>>>> I think using cpu_single_env is an indication of a problem, like poor
>>>> code, layering violation or poor API (vmport). What is your use case?
>>>
>>> I couldn't spot any in this series. Jan, note that any new use of env or
>>> cpu_single_env will need to be redone when we convert to QOM CPU.
>>
>> cpu_single_env should have nothing to do with QOM.
>>
>> The ABIs of vmport and the KVM VAPI require a reference to the calling
>> VCPU, and that's why you find tons of them in patch 5.
> 
> Yes, this seems to be another case of a badly designed ABI. I guess
> there is no way to change that anymore, just like vmport?

Believe me, I grumbled over it more than once while porting it from
qemu-kvm. The point is that some (Windows) VMs out there are already
running with this option ROM loaded and using this unfortunate ABI.

> 
> Some of the cpu_single_env accesses in patch 5 could be avoided when
> APIC is moved closer to CPU. VAPIC should be also close to APIC so it
> should be able to access the CPU directly. In some other cases the
> current state could be passed around instead once it is known.

Some callbacks originate from I/O ports, i.e. they are not associated with
the per-CPU MMIO area or some MSR. So we would have to pass the causing
CPU down to every I/O handler - not sure if that is desired...
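
A minimal sketch of the trade-off discussed here, in plain C rather than actual QEMU code. All names (DemoCPU, demo_current_cpu, demo_ioport_write_implicit, demo_ioport_write_explicit) are hypothetical and exist only for illustration. The first handler finds the calling VCPU through a global "current CPU" pointer, similar in spirit to cpu_single_env; the second receives the causing CPU as an explicit argument, which is what every I/O handler would need if the global went away.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a per-VCPU state structure. */
typedef struct DemoCPU {
    int index;          /* VCPU number */
    uint32_t last_port; /* last I/O port this VCPU wrote to */
} DemoCPU;

/* Variant 1: implicit context. The dispatcher sets this pointer before
 * running guest code, and handlers read it to learn who is calling. */
static DemoCPU *demo_current_cpu;

static int demo_ioport_write_implicit(uint32_t port)
{
    assert(demo_current_cpu != NULL); /* only valid in VCPU context */
    demo_current_cpu->last_port = port;
    return demo_current_cpu->index;
}

/* Variant 2: explicit context. The causing CPU is handed down the call
 * chain, so the handler needs no global state at all. */
static int demo_ioport_write_explicit(DemoCPU *cpu, uint32_t port)
{
    assert(cpu != NULL);
    cpu->last_port = port;
    return cpu->index;
}
```

The explicit variant is easier to reason about, but it changes the signature of every handler, including those that never care which CPU issued the access.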

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v2 1/8] kvm: Set cpu_single_env only once
  2012-02-11 13:59                       ` [Qemu-devel] " Andreas Färber
@ 2012-02-11 14:02                         ` Jan Kiszka
  -1 siblings, 0 replies; 90+ messages in thread
From: Jan Kiszka @ 2012-02-11 14:02 UTC (permalink / raw)
  To: Andreas Färber
  Cc: Anthony Liguori, kvm, Gleb Natapov, Marcelo Tosatti, qemu-devel,
	Blue Swirl, Avi Kivity, Paolo Bonzini

[-- Attachment #1: Type: text/plain, Size: 1615 bytes --]

On 2012-02-11 14:59, Andreas Färber wrote:
> Am 11.02.2012 14:35, schrieb Jan Kiszka:
>> On 2012-02-11 14:21, Andreas Färber wrote:
>>> CPU base class v3: http://patchwork.ozlabs.org/patch/139284/ (v4
>>> coming up)
>>>
>>> That doesn't prevent target-specific devices. Although Paolo does
>>> want that to change wrt properties.
> 
>> This patch doesn't explain yet what shall happen to cpu_single_env
>> and CPUState accesses from target-specific devices.
> 
> True. We can't change them before all targets are converted. So far I
> have 3/14 and still some review comments to work in.
> 
> Another patch in that series uses a macro
> s/ENV_GET_OBJECT/ENV_GET_CPU/ to go from CPUState -> CPU while we
> convert targets.
> 
> Depending on our taste, cpu_single_env might become cpu_single_cpu,
> single_cpu or cpu_single.
> 
>> Do you plan accessors for registers?
> 
> No, registers are in target-specific ARMCPU, S390CPU, MIPSCPU, etc.
> and their CPU*State. It would be possible to have static inline
> accessors but so far I've seen no need.

Then the devices still need access to a CPUState pointer, just as they have so far.

> 
>> And a service to return the CPU object associated with the
>> execution context?
> 
> What do you mean by execution context? TLS? The modelling of the state
> is pretty orthogonal to how/where we store it.

I mean "Where does this I/O access come from?" and "Am I running on some
VCPU thread?". These questions need to be answered in target-specific
device models and some parts of cpus.c (the latter is one motivation for
this patch).
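
To illustrate the two questions, here is a small sketch that assumes a C11 compiler with _Thread_local support. It is not the QEMU implementation; DemoVCPU, demo_thread_cpu and the demo_* helpers are made-up names. Each host thread owns one pointer that is set once when a VCPU thread starts and stays NULL on every other thread, so both questions reduce to reading that pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-VCPU object. */
typedef struct DemoVCPU {
    int index;
} DemoVCPU;

/* One pointer per host thread; remains NULL on threads that run no VCPU
 * (for example an I/O thread). */
static _Thread_local DemoVCPU *demo_thread_cpu;

/* Called once at the start of a VCPU thread. */
static void demo_vcpu_thread_enter(DemoVCPU *cpu)
{
    assert(cpu != NULL);
    demo_thread_cpu = cpu;
}

/* "Am I running on some VCPU thread?" */
static bool demo_in_vcpu_thread(void)
{
    return demo_thread_cpu != NULL;
}

/* "Where does this I/O access come from?" Returns NULL outside VCPU context. */
static DemoVCPU *demo_io_access_origin(void)
{
    return demo_thread_cpu;
}
```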

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/8] kvm: Set cpu_single_env only once
  2012-02-11 14:00                 ` [Qemu-devel] " Jan Kiszka
@ 2012-02-11 14:11                   ` Blue Swirl
  -1 siblings, 0 replies; 90+ messages in thread
From: Blue Swirl @ 2012-02-11 14:11 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Andreas Färber, Anthony Liguori, kvm, Gleb Natapov,
	Marcelo Tosatti, qemu-devel, Avi Kivity

On Sat, Feb 11, 2012 at 14:00, Jan Kiszka <jan.kiszka@web.de> wrote:
> On 2012-02-11 14:54, Blue Swirl wrote:
>> On Sat, Feb 11, 2012 at 12:43, Jan Kiszka <jan.kiszka@web.de> wrote:
>>> On 2012-02-11 12:49, Andreas Färber wrote:
>>>> Am 11.02.2012 12:25, schrieb Blue Swirl:
>>>>> I think using cpu_single_env is an indication of a problem, like poor
>>>>> code, layering violation or poor API (vmport). What is your use case?
>>>>
>>>> I couldn't spot any in this series. Jan, note that any new use of env or
>>>> cpu_single_env will need to be redone when we convert to QOM CPU.
>>>
>>> cpu_single_env should have nothing to do with QOM.
>>>
>>> The ABIs of vmport and the KVM VAPI require a reference to the calling
>>> VCPU, and that's why you find tons of them in patch 5.
>>
>> Yes, this seems to be another case of a badly designed ABI. I guess
>> there is no way to change that anymore, just like vmport?
>
> Believe me, I grumbled over it more than once while porting it from
> qemu-kvm. The point is that some (Windows) VMs out there are running
> already with this option ROM loaded and working this unfortunate ABI.

Maybe in time those could be deprecated and a ROM using a sane ABI
introduced instead. After a grace period the old ABI could finally be
removed.

>>
>> Some of the cpu_single_env accesses in patch 5 could be avoided when
>> APIC is moved closer to CPU. VAPIC should be also close to APIC so it
>> should be able to access the CPU directly. In some other cases the
>> current state could be passed around instead once it is known.
>
> Some callbacks are I/O-port originated, ie. not associated with the
> per-CPU MMIO area or some MSR. So we would have to pass down the causing
> CPU to every I/O handler - not sure if that is desired...

I meant things like vapic_enable_tpr_reporting(): the current CPUState
could easily be passed via vapic_prepare().

> Jan
>

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/8] kvm: Set cpu_single_env only once
  2012-02-11 14:02                         ` [Qemu-devel] " Jan Kiszka
@ 2012-02-11 14:12                           ` Andreas Färber
  -1 siblings, 0 replies; 90+ messages in thread
From: Andreas Färber @ 2012-02-11 14:12 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Blue Swirl, Anthony Liguori, kvm, Gleb Natapov, Marcelo Tosatti,
	qemu-devel, Avi Kivity, Paolo Bonzini

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Am 11.02.2012 15:02, schrieb Jan Kiszka:
> On 2012-02-11 14:59, Andreas Färber wrote:
>> Am 11.02.2012 14:35, schrieb Jan Kiszka:
>>> On 2012-02-11 14:21, Andreas Färber wrote:
>>>> CPU base class v3: http://patchwork.ozlabs.org/patch/139284/
>>>> (v4 coming up)
>>>> 
>>>> That doesn't prevent target-specific devices. Although Paolo
>>>> does want that to change wrt properties.
>> 
>>> This patch doesn't explain yet what shall happen to
>>> cpu_single_env and CPUState accesses from target-specific
>>> devices.
>> 
>> True. We can't change them before all targets are converted. So
>> far I have 3/14 and still some review comments to work in.
>> 
>> Another patch in that series uses a macro 
>> s/ENV_GET_OBJECT/ENV_GET_CPU/ to go from CPUState -> CPU while
>> we convert targets.
>> 
>> Depending on our taste, cpu_single_env might become
>> cpu_single_cpu, single_cpu or cpu_single.
>> 
>>> Do you plan accessors for registers?
>> 
>> No, registers are in target-specific ARMCPU, S390CPU, MIPSCPU,
>> etc. and their CPU*State. It would be possible to have static
>> inline accessors but so far I've seen no need.
> 
> Then the devices need to have access to a CPUState pointer, just as
> so far.

Yes and no. They can have any target-specific pointer they want, just
as before. But no global first_cpu / cpu_single_env pointer - that's
replaced by CPU pointers, through which members of derived classes can
be accessed (which did not work for CPUState due to CPU_COMMON members
being at target-specific offset in the middle).

There's nothing wrong with your patch per se, just that it may need to
get refactored some time soonish. We need to be aware of it so that we
don't create merge conflicts for Anthony.
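
The layout argument can be sketched in plain C. This is an illustration only, not the actual QOM CPU code, and every type name below (DemoCPU, DemoArmCPU, DemoS390CPU, DemoOldArmState, DemoOldS390State) is hypothetical. When the base object is the first member of each derived type, a base pointer and a derived pointer refer to the same address, so generic code can hold a DemoCPU * and target code can cast back. When the common fields sit after target-specific ones, their offset differs per target and no such generic pointer exists.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical base "class"; it must be the first member of every
 * derived type so that it lives at offset 0. */
typedef struct DemoCPU {
    int cpu_index;
} DemoCPU;

typedef struct DemoArmCPU {
    DemoCPU parent;          /* offset 0 */
    unsigned int regs[16];   /* target-specific registers */
} DemoArmCPU;

typedef struct DemoS390CPU {
    DemoCPU parent;          /* offset 0 */
    unsigned long psw_addr;  /* target-specific state */
} DemoS390CPU;

/* Generic code only needs the base type. */
static int demo_cpu_index(const DemoCPU *cpu)
{
    assert(cpu != NULL);
    return cpu->cpu_index;
}

/* Target-specific code casts back to reach derived members. The caller
 * must know that cpu really is embedded in a DemoArmCPU. */
static DemoArmCPU *demo_arm_cpu(DemoCPU *cpu)
{
    return (DemoArmCPU *)cpu;
}

/* Old-style layout for contrast: the common field follows target-specific
 * fields, so its offset depends on the target. */
typedef struct DemoOldArmState {
    unsigned int regs[16];
    int cpu_index;
} DemoOldArmState;

typedef struct DemoOldS390State {
    unsigned long psw_addr;
    int cpu_index;
} DemoOldS390State;
```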

Andreas

- -- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.18 (GNU/Linux)

iQIcBAEBAgAGBQJPNndNAAoJEPou0S0+fgE/+EoQAJFau/xsn2CYcuKPEsJmAMRk
yhOPBT6EJ2Q+34h31uLr3iftxQO9JpnLfEhB7ekTs36i0GklUsCQKgn4rg6vmPKj
tLvUk/hF4zuqJzUJOwxnYxYjuzdEGHuEbkCYgclUtNnywHCo3GLXhqP0izSds9mF
MhmqD45GblecjUpH7zdM/WTvulQm824hbDFPTCQaH8IQsw0QxT1Y4B71gpQtFJvJ
pVk2+qfc488ClhOhPISC5IiQFPnR7DVju82FuDgn6JFq/db9o3KXqIRlQg7pqkPc
h4K+Nz/rhzWpR6jtbTKqJV3yWBV9vxs6YDMSICZnGBabTlHh+tKoabg25Aj5zbcM
6Dmw10uFybi+jKlygKiSSxExParaRC9B3EFCk4dUhMC28B+qFSEkRA62Qpjndxwg
HCmzg2kSQpufyrWNdWj8W+mNygU/0rm8xcB7fX1vhSOmdu3DNTPIH7P4C9hOfC1g
hdIo0DpSd4AFfEIjZ0Loq0XOWKO9V05pOlcVsGmnCmGmfPXFPHWCFq3LGPz9Bj/7
rK1YtReDMXFOhq+QsOuRDuz1pCpPEfT4YhiXRuPsLlIaSszjFx3i6WAxBmh/tTtA
oxoGZQPUI3SRZYZPN5W+J5HqRyNkB16ffsrbcHVTmCrUm33yT+7a6S/vPE9NlZpm
zy92ShUp7JDvFjtnyOLK
=uTCH
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v2 2/8] Allow to use pause_all_vcpus from VCPU context
  2012-02-10 18:31   ` [Qemu-devel] " Jan Kiszka
@ 2012-02-11 14:16     ` Blue Swirl
  -1 siblings, 0 replies; 90+ messages in thread
From: Blue Swirl @ 2012-02-11 14:16 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Anthony Liguori, Gleb Natapov, kvm, Marcelo Tosatti, qemu-devel,
	Avi Kivity, Paolo Bonzini

On Fri, Feb 10, 2012 at 18:31, Jan Kiszka <jan.kiszka@siemens.com> wrote:
> In order to perform critical manipulations on the VM state in the
> context of a VCPU, specifically code patching, stopping and resuming of
> all VCPUs may be necessary. resume_all_vcpus is already compatible, now
> enable pause_all_vcpus for this use case by stopping the calling context
> before starting to wait for the whole gang.
>
> CC: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> ---
>  cpus.c |   12 ++++++++++++
>  1 files changed, 12 insertions(+), 0 deletions(-)
>
> diff --git a/cpus.c b/cpus.c
> index d0c8340..5adfc6b 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -870,6 +870,18 @@ void pause_all_vcpus(void)
>         penv = (CPUState *)penv->next_cpu;
>     }
>
> +    if (!qemu_thread_is_self(&io_thread)) {
> +        cpu_stop_current();
> +        if (!kvm_enabled()) {
> +            while (penv) {
> +                penv->stop = 0;
> +                penv->stopped = 1;
> +                penv = (CPUState *)penv->next_cpu;

The cast is useless: next_cpu is already a CPUState *. I wonder why it
is used in other cases too.

> +            }
> +            return;
> +        }
> +    }
> +
>     while (!all_vcpus_paused()) {
>         qemu_cond_wait(&qemu_pause_cond, &qemu_global_mutex);
>         penv = first_cpu;
> --
> 1.7.3.4
>
>
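
A simplified model of the non-KVM branch in the quoted hunk, written as standalone C. DemoCPUState and the demo_* functions are hypothetical stand-ins, not the real CPUState or cpus.c code. Because next_cpu is declared with the list's own struct type, the traversal needs no cast. The sketch also starts each walk from the list head explicitly, since the quoted hunk does not show where penv points when the new loop begins.

```c
#include <stddef.h>

/* Hypothetical, reduced stand-in for CPUState. */
typedef struct DemoCPUState {
    int stop;                       /* a stop has been requested */
    int stopped;                    /* the CPU counts as stopped */
    struct DemoCPUState *next_cpu;  /* already the right type: no cast needed */
} DemoCPUState;

/* Clear pending stop requests and mark every CPU in the list as stopped.
 * Returns the number of CPUs visited. */
static int demo_mark_all_stopped(DemoCPUState *first_cpu)
{
    int count = 0;

    for (DemoCPUState *penv = first_cpu; penv != NULL; penv = penv->next_cpu) {
        penv->stop = 0;
        penv->stopped = 1;
        count++;
    }
    return count;
}

/* Returns 1 if every CPU in the list is marked stopped, 0 otherwise. */
static int demo_all_vcpus_paused(const DemoCPUState *first_cpu)
{
    for (const DemoCPUState *penv = first_cpu; penv != NULL; penv = penv->next_cpu) {
        if (!penv->stopped) {
            return 0;
        }
    }
    return 1;
}
```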

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/8] kvm: Set cpu_single_env only once
  2012-02-11 14:11                   ` Blue Swirl
@ 2012-02-11 14:18                     ` Jan Kiszka
  -1 siblings, 0 replies; 90+ messages in thread
From: Jan Kiszka @ 2012-02-11 14:18 UTC (permalink / raw)
  To: Blue Swirl
  Cc: Andreas Färber, Anthony Liguori, kvm, Gleb Natapov,
	Marcelo Tosatti, qemu-devel, Avi Kivity

[-- Attachment #1: Type: text/plain, Size: 2412 bytes --]

On 2012-02-11 15:11, Blue Swirl wrote:
> On Sat, Feb 11, 2012 at 14:00, Jan Kiszka <jan.kiszka@web.de> wrote:
>> On 2012-02-11 14:54, Blue Swirl wrote:
>>> On Sat, Feb 11, 2012 at 12:43, Jan Kiszka <jan.kiszka@web.de> wrote:
>>>> On 2012-02-11 12:49, Andreas Färber wrote:
>>>>> Am 11.02.2012 12:25, schrieb Blue Swirl:
>>>>>> I think using cpu_single_env is an indication of a problem, like poor
>>>>>> code, layering violation or poor API (vmport). What is your use case?
>>>>>
>>>>> I couldn't spot any in this series. Jan, note that any new use of env or
>>>>> cpu_single_env will need to be redone when we convert to QOM CPU.
>>>>
>>>> cpu_single_env should have nothing to do with QOM.
>>>>
>>>> The ABIs of vmport and the KVM VAPI require a reference to the calling
>>>> VCPU, and that's why you find tons of them in patch 5.
>>>
>>> Yes, this seems to be another case of a badly designed ABI. I guess
>>> there is no way to change that anymore, just like vmport?
>>
>> Believe me, I grumbled over it more than once while porting it from
>> qemu-kvm. The point is that some (Windows) VMs out there are running
>> already with this option ROM loaded and working this unfortunate ABI.
> 
> Maybe in time those could be deprecated and a ROM using a sane ABI
> introduced instead. After some grace time the old ABI could be finally
> removed.

At some point. But right now we have this interface, and no alternative
has even been thought out. Given that we want to provide a migration path
from qemu-kvm to upstream sooner rather than later, I think there is no
way around this model for a certain, not too short, period.

> 
>>>
>>> Some of the cpu_single_env accesses in patch 5 could be avoided when
>>> APIC is moved closer to CPU. VAPIC should be also close to APIC so it
>>> should be able to access the CPU directly. In some other cases the
>>> current state could be passed around instead once it is known.
>>
>> Some callbacks are I/O-port originated, ie. not associated with the
>> per-CPU MMIO area or some MSR. So we would have to pass down the causing
>> CPU to every I/O handler - not sure if that is desired...
> 
> I meant things like vapic_enable_tpr_reporting(), current CPUState
> could be passed via vapic_prepare() easily.

Oh, there is in fact dead code in vapic_enable_tpr_reporting. It just
iterates over all VCPUs, so there is no need to know the current one.

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/8] kvm: Set cpu_single_env only once
  2012-02-11 14:18                     ` Jan Kiszka
@ 2012-02-11 14:23                       ` Blue Swirl
  -1 siblings, 0 replies; 90+ messages in thread
From: Blue Swirl @ 2012-02-11 14:23 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Andreas Färber, Anthony Liguori, kvm, Gleb Natapov,
	Marcelo Tosatti, qemu-devel, Avi Kivity

On Sat, Feb 11, 2012 at 14:18, Jan Kiszka <jan.kiszka@web.de> wrote:
> On 2012-02-11 15:11, Blue Swirl wrote:
>> On Sat, Feb 11, 2012 at 14:00, Jan Kiszka <jan.kiszka@web.de> wrote:
>>> On 2012-02-11 14:54, Blue Swirl wrote:
>>>> On Sat, Feb 11, 2012 at 12:43, Jan Kiszka <jan.kiszka@web.de> wrote:
>>>>> On 2012-02-11 12:49, Andreas Färber wrote:
>>>>>> Am 11.02.2012 12:25, schrieb Blue Swirl:
>>>>>>> I think using cpu_single_env is an indication of a problem, like poor
>>>>>>> code, layering violation or poor API (vmport). What is your use case?
>>>>>>
>>>>>> I couldn't spot any in this series. Jan, note that any new use of env or
>>>>>> cpu_single_env will need to be redone when we convert to QOM CPU.
>>>>>
>>>>> cpu_single_env should have nothing to do with QOM.
>>>>>
>>>>> The ABIs of vmport and the KVM VAPI require a reference to the calling
>>>>> VCPU, and that's why you find tons of them in patch 5.
>>>>
>>>> Yes, this seems to be another case of a badly designed ABI. I guess
>>>> there is no way to change that anymore, just like vmport?
>>>
>>> Believe me, I grumbled over it more than once while porting it from
>>> qemu-kvm. The point is that some (Windows) VMs out there are running
>>> already with this option ROM loaded and working this unfortunate ABI.
>>
>> Maybe in time those could be deprecated and a ROM using a sane ABI
>> introduced instead. After some grace time the old ABI could be finally
>> removed.
>
> At some point. But now we have this interface and no other even thought
> out. Given that we want to provide a migration path from qemu-kvm to
> upstream rather sooner than later, I think there is no way around this
> model for a certain, not too short period.

Yes, that is inevitable.

>>
>>>>
>>>> Some of the cpu_single_env accesses in patch 5 could be avoided when
>>>> APIC is moved closer to CPU. VAPIC should be also close to APIC so it
>>>> should be able to access the CPU directly. In some other cases the
>>>> current state could be passed around instead once it is known.
>>>
>>> Some callbacks are I/O-port originated, ie. not associated with the
>>> per-CPU MMIO area or some MSR. So we would have to pass down the causing
>>> CPU to every I/O handler - not sure if that is desired...
>>
>> I meant things like vapic_enable_tpr_reporting(), current CPUState
>> could be passed via vapic_prepare() easily.
>
> Oh, there is in fact dead code in vapic_enable_tpr_reporting. It just
> iterates over all VCPUs, no need to know the current one.

Ok.

> Jan
>

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/8] kvm: Set cpu_single_env only once
  2012-02-11 14:12                           ` Andreas Färber
@ 2012-02-11 14:24                             ` Jan Kiszka
  -1 siblings, 0 replies; 90+ messages in thread
From: Jan Kiszka @ 2012-02-11 14:24 UTC (permalink / raw)
  To: Andreas Färber
  Cc: Blue Swirl, Anthony Liguori, kvm, Gleb Natapov, Marcelo Tosatti,
	qemu-devel, Avi Kivity, Paolo Bonzini

[-- Attachment #1: Type: text/plain, Size: 2275 bytes --]

On 2012-02-11 15:12, Andreas Färber wrote:
> Am 11.02.2012 15:02, schrieb Jan Kiszka:
>> On 2012-02-11 14:59, Andreas Färber wrote:
>>> Am 11.02.2012 14:35, schrieb Jan Kiszka:
>>>> On 2012-02-11 14:21, Andreas Färber wrote:
>>>>> CPU base class v3: http://patchwork.ozlabs.org/patch/139284/
>>>>> (v4 coming up)
>>>>>
>>>>> That doesn't prevent target-specific devices. Although Paolo
>>>>> does want that to change wrt properties.
>>>
>>>> This patch doesn't explain yet what shall happen to
>>>> cpu_single_env and CPUState accesses from target-specific
>>>> devices.
>>>
>>> True. We can't change them before all targets are converted. So
>>> far I have 3/14 and still some review comments to work in.
>>>
>>> Another patch in that series uses a macro 
>>> s/ENV_GET_OBJECT/ENV_GET_CPU/ to go from CPUState -> CPU while
>>> we convert targets.
>>>
>>> Depending on our taste, cpu_single_env might become
>>> cpu_single_cpu, single_cpu or cpu_single.
>>>
>>>> Do you plan accessors for registers?
>>>
>>> No, registers are in target-specific ARMCPU, S390CPU, MIPSCPU,
>>> etc. and their CPU*State. It would be possible to have static
>>> inline accessors but so far I've seen no need.
> 
>> Then the devices need to have access to a CPUState pointer, just as
>> so far.
> 
> Yes and no. They can have any target-specific pointer they want, just
> as before. But no global first_cpu / cpu_single_env pointer - that's

If you want to drop first_cpu, you need to provide a for_each_cpu
iterating service instead. And cpu_single_env can only be obsoleted if
I/O access handlers can otherwise query the triggering CPU.
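
As a rough illustration of such an iteration service (a minimal sketch,
not taken from this series or from QEMU; DemoCPU, demo_cpu_register and
demo_for_each_cpu are hypothetical names), the list head could be kept
private and only a callback-based walker exposed:

```c
#include <stddef.h>

/* Hypothetical stand-in for a CPU object; this is NOT QEMU's CPUState. */
typedef struct DemoCPU {
    int index;
    struct DemoCPU *next_cpu;   /* typed link, so no cast is needed */
} DemoCPU;

/* List head is private to this file; callers cannot reach it directly. */
static DemoCPU *demo_cpu_list;

/* Append a CPU to the end of the private list. */
static void demo_cpu_register(DemoCPU *cpu)
{
    DemoCPU **p = &demo_cpu_list;

    while (*p) {
        p = &(*p)->next_cpu;
    }
    cpu->next_cpu = NULL;
    *p = cpu;
}

/* Iteration service: call fn(cpu, opaque) once per registered CPU. */
static void demo_for_each_cpu(void (*fn)(DemoCPU *cpu, void *opaque),
                              void *opaque)
{
    DemoCPU *cpu;

    for (cpu = demo_cpu_list; cpu != NULL; cpu = cpu->next_cpu) {
        fn(cpu, opaque);
    }
}

/* Example callback: count the CPUs that were visited. */
static void demo_count_cpu(DemoCPU *cpu, void *opaque)
{
    (void)cpu;
    ++*(int *)opaque;
}
```

A device that only needs to touch every CPU would then pass a callback
instead of dereferencing a global pointer. This does not cover the
second requirement, querying the CPU that triggered an I/O access.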

> replaced by CPU pointers, through which members of derived classes can
> be accessed (which did not work for CPUState due to CPU_COMMON members
> being at target-specific offset in the middle).
> 
> There's nothing wrong with your patch per se, just that it may need to
> get refactored some time soonish. We need to be aware of it so that we
> don't create merge conflicts for Anthony.

There can't be logical merge conflicts as no fundamentally new
requirements are introduced with this series. And we have seen no code
proposal yet that addresses them for the existing use cases.

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v2 2/8] Allow to use pause_all_vcpus from VCPU context
  2012-02-11 14:16     ` [Qemu-devel] " Blue Swirl
@ 2012-02-11 14:31       ` Jan Kiszka
  -1 siblings, 0 replies; 90+ messages in thread
From: Jan Kiszka @ 2012-02-11 14:31 UTC (permalink / raw)
  To: Blue Swirl
  Cc: Anthony Liguori, Gleb Natapov, kvm, Marcelo Tosatti, qemu-devel,
	Avi Kivity, Paolo Bonzini

[-- Attachment #1: Type: text/plain, Size: 1335 bytes --]

On 2012-02-11 15:16, Blue Swirl wrote:
> On Fri, Feb 10, 2012 at 18:31, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>> In order to perform critical manipulations on the VM state in the
>> context of a VCPU, specifically code patching, stopping and resuming of
>> all VCPUs may be necessary. resume_all_vcpus is already compatible, now
>> enable pause_all_vcpus for this use case by stopping the calling context
>> before starting to wait for the whole gang.
>>
>> CC: Paolo Bonzini <pbonzini@redhat.com>
>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>> ---
>>  cpus.c |   12 ++++++++++++
>>  1 files changed, 12 insertions(+), 0 deletions(-)
>>
>> diff --git a/cpus.c b/cpus.c
>> index d0c8340..5adfc6b 100644
>> --- a/cpus.c
>> +++ b/cpus.c
>> @@ -870,6 +870,18 @@ void pause_all_vcpus(void)
>>         penv = (CPUState *)penv->next_cpu;
>>     }
>>
>> +    if (!qemu_thread_is_self(&io_thread)) {
>> +        cpu_stop_current();
>> +        if (!kvm_enabled()) {
>> +            while (penv) {
>> +                penv->stop = 0;
>> +                penv->stopped = 1;
>> +                penv = (CPUState *)penv->next_cpu;
> 
> The cast is useless, next_cpu is already CPUState *. I wonder why it
> is used in other cases too.

Indeed, weird. We can clean the others up separately.

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [Qemu-devel] [PATCH v2 3/8] target-i386: Add infrastructure for reporting TPR MMIO accesses
  2012-02-10 18:31   ` [Qemu-devel] " Jan Kiszka
@ 2012-02-11 14:32     ` Blue Swirl
  -1 siblings, 0 replies; 90+ messages in thread
From: Blue Swirl @ 2012-02-11 14:32 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Avi Kivity, Marcelo Tosatti, Anthony Liguori, qemu-devel, kvm,
	Gleb Natapov

On Fri, Feb 10, 2012 at 18:31, Jan Kiszka <jan.kiszka@siemens.com> wrote:
> This will allow the APIC core to file a TPR access report. Depending on
> the accelerator and kernel irqchip mode, it will either be delivered
> right away or queued for later reporting.
>
> In TCG mode, we can restart the triggering instruction and can therefore
> forward the event directly. KVM does not allow us to restart, so we
> postpone the delivery of events recorded in the user space APIC until
> the current instruction is completed.
>
> Note that KVM without in-kernel irqchip will report the address after
> the instruction that triggered a write access. In contrast, read
> accesses will return the precise information.
>
> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> ---
>  cpu-all.h            |    3 ++-
>  hw/apic.h            |    2 ++
>  hw/apic_common.c     |    4 ++++
>  target-i386/cpu.h    |    9 +++++++++
>  target-i386/helper.c |   19 +++++++++++++++++++
>  target-i386/kvm.c    |   24 ++++++++++++++++++++++--
>  6 files changed, 58 insertions(+), 3 deletions(-)
>
> diff --git a/cpu-all.h b/cpu-all.h
> index e2c3c49..80e6d42 100644
> --- a/cpu-all.h
> +++ b/cpu-all.h
> @@ -375,8 +375,9 @@ DECLARE_TLS(CPUState *,cpu_single_env);
>  #define CPU_INTERRUPT_TGT_INT_0   0x0100
>  #define CPU_INTERRUPT_TGT_INT_1   0x0400
>  #define CPU_INTERRUPT_TGT_INT_2   0x0800
> +#define CPU_INTERRUPT_TGT_INT_3   0x2000
>
> -/* First unused bit: 0x2000.  */
> +/* First unused bit: 0x4000.  */
>
>  /* The set of all bits that should be masked when single-stepping.  */
>  #define CPU_INTERRUPT_SSTEP_MASK \
> diff --git a/hw/apic.h b/hw/apic.h
> index a62d83b..45598bd 100644
> --- a/hw/apic.h
> +++ b/hw/apic.h
> @@ -18,6 +18,8 @@ void cpu_set_apic_tpr(DeviceState *s, uint8_t val);
>  uint8_t cpu_get_apic_tpr(DeviceState *s);
>  void apic_init_reset(DeviceState *s);
>  void apic_sipi(DeviceState *s);
> +void apic_handle_tpr_access_report(DeviceState *d, target_ulong ip,
> +                                   int access);
>
>  /* pc.c */
>  int cpu_is_bsp(CPUState *env);
> diff --git a/hw/apic_common.c b/hw/apic_common.c
> index 8373d79..588531b 100644
> --- a/hw/apic_common.c
> +++ b/hw/apic_common.c
> @@ -68,6 +68,10 @@ uint8_t cpu_get_apic_tpr(DeviceState *d)
>     return s ? s->tpr >> 4 : 0;
>  }
>
> +void apic_handle_tpr_access_report(DeviceState *d, target_ulong ip, int access)
> +{
> +}
> +
>  void apic_report_irq_delivered(int delivered)
>  {
>     apic_irq_delivered += delivered;
> diff --git a/target-i386/cpu.h b/target-i386/cpu.h
> index 37dde79..92e9c87 100644
> --- a/target-i386/cpu.h
> +++ b/target-i386/cpu.h
> @@ -482,6 +482,7 @@
>  #define CPU_INTERRUPT_VIRQ      CPU_INTERRUPT_TGT_INT_0
>  #define CPU_INTERRUPT_INIT      CPU_INTERRUPT_TGT_INT_1
>  #define CPU_INTERRUPT_SIPI      CPU_INTERRUPT_TGT_INT_2
> +#define CPU_INTERRUPT_TPR       CPU_INTERRUPT_TGT_INT_3
>
>
>  enum {
> @@ -772,6 +773,9 @@ typedef struct CPUX86State {
>     XMMReg ymmh_regs[CPU_NB_REGS];
>
>     uint64_t xcr0;
> +
> +    target_ulong tpr_access_ip;
> +    int tpr_access_type;
>  } CPUX86State;
>
>  CPUX86State *cpu_x86_init(const char *cpu_model);
> @@ -1064,4 +1068,9 @@ void svm_check_intercept(CPUState *env1, uint32_t type);
>
>  uint32_t cpu_cc_compute_all(CPUState *env1, int op);
>
> +#define TPR_ACCESS_READ     0
> +#define TPR_ACCESS_WRITE    1

enum would be nicer.
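
Something along these lines, as a minimal sketch only (the TPRAccess
type name and the tpr_access_from_is_write helper are hypothetical and
not part of the patch; the enumerator values match the two defines):

```c
/* Hypothetical enum replacing the two TPR_ACCESS_* defines. */
typedef enum TPRAccess {
    TPR_ACCESS_READ,    /* 0, same value as the original define */
    TPR_ACCESS_WRITE,   /* 1, same value as the original define */
} TPRAccess;

/* Map a boolean-style is_write flag to the access type. */
static TPRAccess tpr_access_from_is_write(int is_write)
{
    return is_write ? TPR_ACCESS_WRITE : TPR_ACCESS_READ;
}
```

The prototypes taking "int access" could then take TPRAccess instead,
which documents the expected values at the call sites.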

> +
> +void cpu_report_tpr_access(CPUState *env, int access);
> +
>  #endif /* CPU_I386_H */
> diff --git a/target-i386/helper.c b/target-i386/helper.c
> index 2586aff..eca20cd 100644
> --- a/target-i386/helper.c
> +++ b/target-i386/helper.c
> @@ -1189,6 +1189,25 @@ void cpu_x86_inject_mce(Monitor *mon, CPUState *cenv, int bank,
>         }
>     }
>  }
> +
> +void cpu_report_tpr_access(CPUState *env, int access)
> +{
> +    TranslationBlock *tb;
> +
> +    if (kvm_enabled()) {
> +        cpu_synchronize_state(env);
> +
> +        env->tpr_access_ip = env->eip;
> +        env->tpr_access_type = access;
> +
> +        cpu_interrupt(env, CPU_INTERRUPT_TPR);
> +    } else {
> +        tb = tb_find_pc(env->mem_io_pc);
> +        cpu_restore_state(tb, env, env->mem_io_pc);
> +
> +        apic_handle_tpr_access_report(env->apic_state, env->eip, access);
> +    }
> +}
>  #endif /* !CONFIG_USER_ONLY */
>
>  static void mce_init(CPUX86State *cenv)
> diff --git a/target-i386/kvm.c b/target-i386/kvm.c
> index 981192d..fa77f9d 100644
> --- a/target-i386/kvm.c
> +++ b/target-i386/kvm.c
> @@ -1635,8 +1635,10 @@ void kvm_arch_pre_run(CPUState *env, struct kvm_run *run)
>     }
>
>     if (!kvm_irqchip_in_kernel()) {
> -        /* Force the VCPU out of its inner loop to process the INIT request */
> -        if (env->interrupt_request & CPU_INTERRUPT_INIT) {
> +        /* Force the VCPU out of its inner loop to process any INIT requests
> +         * or pending TPR access reports. */
> +        if (env->interrupt_request &
> +            (CPU_INTERRUPT_INIT | CPU_INTERRUPT_TPR)) {
>             env->exit_request = 1;
>         }
>
> @@ -1730,6 +1732,11 @@ int kvm_arch_process_async_events(CPUState *env)
>         kvm_cpu_synchronize_state(env);
>         do_cpu_sipi(env);
>     }
> +    if (env->interrupt_request & CPU_INTERRUPT_TPR) {
> +        env->interrupt_request &= ~CPU_INTERRUPT_TPR;
> +        apic_handle_tpr_access_report(env->apic_state, env->tpr_access_ip,
> +                                      env->tpr_access_type);
> +    }
>
>     return env->halted;
>  }
> @@ -1746,6 +1753,16 @@ static int kvm_handle_halt(CPUState *env)
>     return 0;
>  }
>
> +static int kvm_handle_tpr_access(CPUState *env)
> +{
> +    struct kvm_run *run = env->kvm_run;
> +
> +    apic_handle_tpr_access_report(env->apic_state, run->tpr_access.rip,
> +                                  run->tpr_access.is_write ? TPR_ACCESS_WRITE
> +                                                           : TPR_ACCESS_READ);
> +    return 1;
> +}
> +
>  int kvm_arch_insert_sw_breakpoint(CPUState *env, struct kvm_sw_breakpoint *bp)
>  {
>     static const uint8_t int3 = 0xcc;
> @@ -1950,6 +1967,9 @@ int kvm_arch_handle_exit(CPUState *env, struct kvm_run *run)
>     case KVM_EXIT_SET_TPR:
>         ret = 0;
>         break;
> +    case KVM_EXIT_TPR_ACCESS:
> +        ret = kvm_handle_tpr_access(env);
> +        break;
>     case KVM_EXIT_FAIL_ENTRY:
>         code = run->fail_entry.hardware_entry_failure_reason;
>         fprintf(stderr, "KVM: entry failed, hardware error 0x%" PRIx64 "\n",
> --
> 1.7.3.4
>
>
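
For readers following the two delivery paths described in the commit
message, here is a reduced model of them. It is a sketch under stated
assumptions, not the patch itself: DemoVCPU and all demo_* functions are
hypothetical, the "APIC" is just a record of the last report, and the
imprecise address for write accesses under KVM is not modelled.

```c
#include <stdint.h>

/* Same bit value as CPU_INTERRUPT_TGT_INT_3 in the patch. */
#define DEMO_INTERRUPT_TPR 0x2000u

/* Hypothetical, reduced VCPU state: only what the sketch needs. */
typedef struct DemoVCPU {
    uint32_t interrupt_request;
    uint64_t eip;
    uint64_t tpr_access_ip;     /* latched address for deferred reports */
    int tpr_access_type;        /* latched access type (0 read, 1 write) */
    uint64_t reported_ip;       /* what the "APIC" was last told */
    int reported_type;
    int reports;                /* number of reports delivered so far */
} DemoVCPU;

/* Stand-in for apic_handle_tpr_access_report(): just record the call. */
static void demo_apic_report(DemoVCPU *cpu, uint64_t ip, int access)
{
    cpu->reported_ip = ip;
    cpu->reported_type = access;
    cpu->reports++;
}

/* Called when the TPR access is detected. */
static void demo_report_tpr_access(DemoVCPU *cpu, int access, int can_restart)
{
    if (can_restart) {
        /* TCG-like path: the instruction can be restarted, report now. */
        demo_apic_report(cpu, cpu->eip, access);
    } else {
        /* KVM-like path: latch the details and flag a pending report. */
        cpu->tpr_access_ip = cpu->eip;
        cpu->tpr_access_type = access;
        cpu->interrupt_request |= DEMO_INTERRUPT_TPR;
    }
}

/* Called once the current instruction has completed. */
static void demo_process_async_events(DemoVCPU *cpu)
{
    if (cpu->interrupt_request & DEMO_INTERRUPT_TPR) {
        cpu->interrupt_request &= ~DEMO_INTERRUPT_TPR;
        demo_apic_report(cpu, cpu->tpr_access_ip, cpu->tpr_access_type);
    }
}
```

In the deferred case nothing reaches the report handler until
demo_process_async_events() runs, which mirrors the role of the
CPU_INTERRUPT_TPR check in kvm_arch_process_async_events().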

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v2 1/8] kvm: Set cpu_single_env only once
  2012-02-11 14:24                             ` Jan Kiszka
@ 2012-02-11 14:49                               ` Andreas Färber
  -1 siblings, 0 replies; 90+ messages in thread
From: Andreas Färber @ 2012-02-11 14:49 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Anthony Liguori, kvm, Gleb Natapov, Marcelo Tosatti, qemu-devel,
	Blue Swirl, Avi Kivity, Paolo Bonzini

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Am 11.02.2012 15:24, schrieb Jan Kiszka:
> On 2012-02-11 15:12, Andreas Färber wrote:
>> Am 11.02.2012 15:02, schrieb Jan Kiszka:
>>> On 2012-02-11 14:59, Andreas Färber wrote:
>>>> Am 11.02.2012 14:35, schrieb Jan Kiszka:
>>>>> On 2012-02-11 14:21, Andreas Färber wrote:
>>>>>> CPU base class v3:
>>>>>> http://patchwork.ozlabs.org/patch/139284/ (v4 coming up)
>>>>>> 
>>>>>> That doesn't prevent target-specific devices. Although
>>>>>> Paolo does want that to change wrt properties.
>>>> 
>>>>> This patch doesn't explain yet what shall happen to 
>>>>> cpu_single_env and CPUState accesses from target-specific 
>>>>> devices.
>>>> 
>>>> True. We can't change them before all targets are converted.
>>>> So far I have 3/14 and still some review comments to work
>>>> in.
>>>> 
>>>> Another patch in that series uses a macro 
>>>> s/ENV_GET_OBJECT/ENV_GET_CPU/ to go from CPUState -> CPU
>>>> while we convert targets.
>>>> 
>>>> Depending on our taste, cpu_single_env might become 
>>>> cpu_single_cpu, single_cpu or cpu_single.
>>>> 
>>>>> Do you plan accessors for registers?
>>>> 
>>>> No, registers are in target-specific ARMCPU, S390CPU,
>>>> MIPSCPU, etc. and their CPU*State. It would be possible to
>>>> have static inline accessors but so far I've seen no need.
>> 
>>> Then the devices need to have access to a CPUState pointer,
>>> just as so far.
>> 
>> Yes and no. They can have any target-specific pointer they want,
>> just as before. But no global first_cpu / cpu_single_env pointer
>> - that's
> 
> If you want to drop first_cpu, you need to provide a for_each_cpu 
> iterating service instead. And cpu_single_env can only be obsoleted
> if I/O access handlers can otherwise query the triggering CPU.

I already answered that above. Please join #qemu if you want to discuss
that further, as this thread seems to lead nowhere.

Andreas

> 
>> replaced by CPU pointers, through which members of derived
>> classes can be accessed (which did not work for CPUState due to
>> CPU_COMMON members being at target-specific offset in the
>> middle).
>> 
>> There's nothing wrong with your patch per se, just that it may
>> need to get refactored some time soonish. We need to be aware of
>> it so that we don't create merge conflicts for Anthony.
> 
> There can't be logical merge conflicts as no fundamentally new
> requirements are introduced with this series. And we have seen no
> code proposal yet that addresses them for the existing use cases.
> 
> Jan
> 


- -- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.18 (GNU/Linux)

iQIcBAEBAgAGBQJPNoALAAoJEPou0S0+fgE/OHAQALK7X6v9nA0A4tZ8umD4Ak8p
DkyHX9N0pkv8Jc9y06WWLCzsgCQJFxPKp0n0mpWHhG96mWryez+Cd8x00W9wJJWx
A15beRV80jwpDWlkNMtnQ+T9kvVamUsL3a090Bgcb662EkCpfR5UtjDlrYBM7X7f
C/ejV31NYnFIXM5F2TcsURrXZ7GRXNeSRsoXrt2WoCBhFkf+DBek8ejEsYcFS6q0
lrqoggHTVKRZuGbBIJ9yS3/L/pf6gWDOv1ZyUAHfAUeWt2rD3NxNFHHFLbrl3d47
k5Yev4acZOTe6ozgvK3qWcrvA2t42BmKTCA7FqLKg2057szll277wKHf0K2xqlvs
oYTbSk4t9IWI4StBFevgVDM0kaXg6OAGKiDDP8PRrBI3YzJajLL6zkDVitA5Hp0N
LPryOYwhI+KtO3Too7R919UDZIoZ+pg2Mm+L1/1IuneB8Ar1MeiPwU0zXLYGiDVx
ZrMzjhhbYJn+PPC8FxI9gnaPkLVkZzSfcXkpA1RXLZdpkjdmt4rwA0KfFNB000DU
fag3rAGTdcvT8O58K2FXYAWe8VFqA979VHWxsTOLrVX3rL9Xbi63SUfvz/joMryO
mZMYsJ/NGHd5IVYdWP0kBdxuXRtFUaqHnp7PFnwj0IXtnV13csgB2nN+HN5wJULs
A855i5ibqUahcKGej48W
=jxwX
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/8] kvm: Set cpu_single_env only once
@ 2012-02-11 14:49                               ` Andreas Färber
  0 siblings, 0 replies; 90+ messages in thread
From: Andreas Färber @ 2012-02-11 14:49 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Anthony Liguori, kvm, Gleb Natapov, Marcelo Tosatti, qemu-devel,
	Blue Swirl, Avi Kivity, Paolo Bonzini

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Am 11.02.2012 15:24, schrieb Jan Kiszka:
> On 2012-02-11 15:12, Andreas Färber wrote:
>> Am 11.02.2012 15:02, schrieb Jan Kiszka:
>>> On 2012-02-11 14:59, Andreas Färber wrote:
>>>> Am 11.02.2012 14:35, schrieb Jan Kiszka:
>>>>> On 2012-02-11 14:21, Andreas Färber wrote:
>>>>>> CPU base class v3:
>>>>>> http://patchwork.ozlabs.org/patch/139284/ (v4 coming up)
>>>>>> 
>>>>>> That doesn't prevent target-specific devices. Although
>>>>>> Paolo does want that to change wrt properties.
>>>> 
>>>>> This patch doesn't explain yet what shall happen to 
>>>>> cpu_single_env and CPUState accesses from target-specific 
>>>>> devices.
>>>> 
>>>> True. We can't change them before all targets are converted.
>>>> So far I have 3/14 and still some review comments to work
>>>> in.
>>>> 
>>>> Another patch in that series uses a macro 
>>>> s/ENV_GET_OBJECT/ENV_GET_CPU/ to go from CPUState -> CPU
>>>> while we convert targets.
>>>> 
>>>> Depending on our taste, cpu_single_env might become 
>>>> cpu_single_cpu, single_cpu or cpu_single.
>>>> 
>>>>> Do you plan accessors for registers?
>>>> 
>>>> No, registers are in target-specific ARMCPU, S390CPU,
>>>> MIPSCPU, etc. and their CPU*State. It would be possible to
>>>> have static inline accessors but so far I've seen no need.
>> 
>>> Then the devices need to have access to a CPUState pointer,
>>> just as so far.
>> 
>> Yes and no. They can have any target-specific pointer they want,
>> just as before. But no global first_cpu / cpu_single_env pointer
>> - that's
> 
> If you want to drop first_cpu, you need to provide a for_each_cpu 
> iterating service instead. And cpu_single_env can only be obsoleted
> if I/O access handlers can otherwise query the triggering CPU.

I already answered that above. Please join #qemu if you want to discuss
that further, as this thread seems to lead nowhere.

Andreas

> 
>> replaced by CPU pointers, through which members of derived
>> classes can be accessed (which did not work for CPUState due to
>> CPU_COMMON members being at target-specific offset in the
>> middle).
>> 
>> There's nothing wrong with your patch per se, just that it may
>> need to get refactored some time soonish. We need to be aware of
>> it so that we don't create merge conflicts for Anthony.
> 
> There can't be logical merge conflicts, as no fundamentally new
> requirements are introduced with this series. And we have not yet seen
> a code proposal that addresses them for the existing use cases.
> 
> Jan
> 


- -- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.18 (GNU/Linux)

iQIcBAEBAgAGBQJPNoALAAoJEPou0S0+fgE/OHAQALK7X6v9nA0A4tZ8umD4Ak8p
DkyHX9N0pkv8Jc9y06WWLCzsgCQJFxPKp0n0mpWHhG96mWryez+Cd8x00W9wJJWx
A15beRV80jwpDWlkNMtnQ+T9kvVamUsL3a090Bgcb662EkCpfR5UtjDlrYBM7X7f
C/ejV31NYnFIXM5F2TcsURrXZ7GRXNeSRsoXrt2WoCBhFkf+DBek8ejEsYcFS6q0
lrqoggHTVKRZuGbBIJ9yS3/L/pf6gWDOv1ZyUAHfAUeWt2rD3NxNFHHFLbrl3d47
k5Yev4acZOTe6ozgvK3qWcrvA2t42BmKTCA7FqLKg2057szll277wKHf0K2xqlvs
oYTbSk4t9IWI4StBFevgVDM0kaXg6OAGKiDDP8PRrBI3YzJajLL6zkDVitA5Hp0N
LPryOYwhI+KtO3Too7R919UDZIoZ+pg2Mm+L1/1IuneB8Ar1MeiPwU0zXLYGiDVx
ZrMzjhhbYJn+PPC8FxI9gnaPkLVkZzSfcXkpA1RXLZdpkjdmt4rwA0KfFNB000DU
fag3rAGTdcvT8O58K2FXYAWe8VFqA979VHWxsTOLrVX3rL9Xbi63SUfvz/joMryO
mZMYsJ/NGHd5IVYdWP0kBdxuXRtFUaqHnp7PFnwj0IXtnV13csgB2nN+HN5wJULs
A855i5ibqUahcKGej48W
=jxwX
-----END PGP SIGNATURE-----


* Re: [PATCH v2 5/8] kvmvapic: Introduce TPR access optimization for Windows guests
  2012-02-10 18:31   ` [Qemu-devel] " Jan Kiszka
@ 2012-02-11 15:25     ` Blue Swirl
  -1 siblings, 0 replies; 90+ messages in thread
From: Blue Swirl @ 2012-02-11 15:25 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Anthony Liguori, kvm, Gleb Natapov, Marcelo Tosatti, qemu-devel,
	Avi Kivity

On Fri, Feb 10, 2012 at 18:31, Jan Kiszka <jan.kiszka@siemens.com> wrote:
> This enables acceleration for MMIO-based TPR register accesses of
> 32-bit Windows guest systems. It is mostly useful with KVM enabled,
> either on older Intel CPUs (without flexpriority feature, can also be
> manually disabled for testing) or any current AMD processor.
>
> The approach introduced here is derived from the original version of
> qemu-kvm. It was refactored, documented, and extended by support for
> user space APIC emulation, both with and without KVM acceleration. The
> VMState format was kept compatible, so was the ABI to the option ROM
> that implements the guest-side para-virtualized driver service. This
> enables seamless migration from qemu-kvm to upstream or, one day,
> between KVM and TCG mode.
>
> The basic concept goes like this:
>  - VAPIC PV interface consisting of I/O port 0x7e and (for KVM in-kernel
>   irqchip) a vmcall hypercall is registered
>  - VAPIC option ROM is loaded into guest
>  - option ROM activates TPR MMIO access reporting via port 0x7e
>  - TPR accesses are trapped and patched in the guest to call into option
>   ROM instead, VAPIC support is enabled
>  - option ROM TPR helpers track state in memory and invoke hypercall to
>   poll for pending IRQs if required
>
> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>

I must say that I find the approach horrible: patching guests and ROMs
and looking up Windows internals. Taking the same approach to the extreme,
we could for example patch a Xen guest to become a KVM guest. Not that I
object to merging.

> ---
>  Makefile.target    |    3 +-
>  hw/apic.c          |  126 ++++++++-
>  hw/apic_common.c   |   64 +++++-
>  hw/apic_internal.h |   27 ++
>  hw/kvm/apic.c      |   32 +++
>  hw/kvmvapic.c      |  774 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  6 files changed, 1012 insertions(+), 14 deletions(-)
>  create mode 100644 hw/kvmvapic.c
>
> diff --git a/Makefile.target b/Makefile.target
> index 68481a3..ec7eff8 100644
> --- a/Makefile.target
> +++ b/Makefile.target
> @@ -230,7 +230,8 @@ obj-y += device-hotplug.o
>
>  # Hardware support
>  obj-i386-y += mc146818rtc.o pc.o
> -obj-i386-y += sga.o apic_common.o apic.o ioapic_common.o ioapic.o piix_pci.o
> +obj-i386-y += apic_common.o apic.o kvmvapic.o
> +obj-i386-y += sga.o ioapic_common.o ioapic.o piix_pci.o
>  obj-i386-y += vmport.o
>  obj-i386-y += pci-hotplug.o smbios.o wdt_ib700.o
>  obj-i386-y += debugcon.o multiboot.o
> diff --git a/hw/apic.c b/hw/apic.c
> index 086c544..2ebf3ca 100644
> --- a/hw/apic.c
> +++ b/hw/apic.c
> @@ -35,6 +35,10 @@
>  #define MSI_ADDR_DEST_ID_SHIFT         12
>  #define        MSI_ADDR_DEST_ID_MASK           0x00ffff0
>
> +#define SYNC_FROM_VAPIC                 0x1
> +#define SYNC_TO_VAPIC                   0x2
> +#define SYNC_ISR_IRR_TO_VAPIC           0x4

Enum, please.
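
A minimal sketch of what the enum form could look like. The type name
VAPICSyncType and the helper sync_wants_write() are made up for
illustration and are not part of the patch; the values are the ones from
the three defines above:

```c
#include <stdbool.h>

/* Hypothetical enum replacing the three SYNC_* defines; values unchanged. */
typedef enum {
    SYNC_FROM_VAPIC       = 0x1,
    SYNC_TO_VAPIC         = 0x2,
    SYNC_ISR_IRR_TO_VAPIC = 0x4,
} VAPICSyncType;

/* Illustrative helper: true if the request writes anything to the VAPIC
 * page. The argument stays an int because callers OR several flags. */
static bool sync_wants_write(int sync_type)
{
    return (sync_type & (SYNC_TO_VAPIC | SYNC_ISR_IRR_TO_VAPIC)) != 0;
}
```

Since the flags are combined with |, the parameter of apic_sync_vapic()
would presumably remain a plain int even with the enum in place.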

> +
>  static APICCommonState *local_apics[MAX_APICS + 1];
>
>  static void apic_set_irq(APICCommonState *s, int vector_num, int trigger_mode);
> @@ -78,6 +82,70 @@ static inline int get_bit(uint32_t *tab, int index)
>     return !!(tab[i] & mask);
>  }
>
> +/* return -1 if no bit is set */
> +static int get_highest_priority_int(uint32_t *tab)
> +{
> +    int i;
> +    for (i = 7; i >= 0; i--) {
> +        if (tab[i] != 0) {
> +            return i * 32 + fls_bit(tab[i]);
> +        }
> +    }
> +    return -1;
> +}
> +
> +static void apic_sync_vapic(APICCommonState *s, int sync_type)
> +{
> +    VAPICState vapic_state;
> +    size_t length;
> +    off_t start;
> +    int vector;
> +
> +    if (!s->vapic_paddr) {
> +        return;
> +    }
> +    if (sync_type & SYNC_FROM_VAPIC) {
> +        cpu_physical_memory_rw(s->vapic_paddr, (void *)&vapic_state,
> +                               sizeof(vapic_state), 0);
> +        s->tpr = vapic_state.tpr;
> +    }
> +    if (sync_type & (SYNC_TO_VAPIC | SYNC_ISR_IRR_TO_VAPIC)) {
> +        start = offsetof(VAPICState, isr);
> +        length = offsetof(VAPICState, enabled) - offsetof(VAPICState, isr);
> +
> +        if (sync_type & SYNC_TO_VAPIC) {
> +            assert(qemu_cpu_is_self(s->cpu_env));
> +
> +            vapic_state.tpr = s->tpr;
> +            vapic_state.enabled = 1;
> +            start = 0;
> +            length = sizeof(VAPICState);
> +        }
> +
> +        vector = get_highest_priority_int(s->isr);
> +        if (vector < 0) {
> +            vector = 0;
> +        }
> +        vapic_state.isr = vector & 0xf0;
> +
> +        vapic_state.zero = 0;
> +
> +        vector = get_highest_priority_int(s->irr);
> +        if (vector < 0) {
> +            vector = 0;
> +        }
> +        vapic_state.irr = vector & 0xff;
> +
> +        cpu_physical_memory_write_rom(s->vapic_paddr + start,
> +                                      ((void *)&vapic_state) + start, length);

This assumes that the vapic_state structure matches what the guest
expects without conversion. Is this true for i386 on x86_64? I didn't
check the structure in question.
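
For reference, a sketch that checks the layout of VAPICState as it is
declared further down in this patch (hw/apic_internal.h). It assumes that
QEMU_PACKED expands to the GCC/Clang packed attribute; the helper
vapic_isr_irr_sync_length() is hypothetical and only mirrors the length
computation in apic_sync_vapic():

```c
#include <stddef.h>
#include <stdint.h>

/* Copy of the VAPICState layout from the patch; QEMU_PACKED is assumed
 * to be equivalent to the attribute used here. */
typedef struct VAPICState {
    uint8_t tpr;
    uint8_t isr;
    uint8_t zero;
    uint8_t irr;
    uint8_t enabled;
} __attribute__((packed)) VAPICState;

/* Every member is a single byte, so there is no padding and no byte
 * order to convert, regardless of host word size. */
_Static_assert(sizeof(VAPICState) == 5, "no padding expected");
_Static_assert(offsetof(VAPICState, isr) == 1, "isr at byte 1");
_Static_assert(offsetof(VAPICState, enabled) == 4, "enabled at byte 4");

/* Length of the partial ISR/IRR sync, computed as in apic_sync_vapic(). */
static size_t vapic_isr_irr_sync_length(void)
{
    return offsetof(VAPICState, enabled) - offsetof(VAPICState, isr);
}
```

If that reading is right, the structure itself needs no conversion; the
pointer arithmetic on void * in the write call is a GNU extension, though.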

> +    }
> +}
> +
> +static void apic_vapic_base_update(APICCommonState *s)
> +{
> +    apic_sync_vapic(s, SYNC_TO_VAPIC);
> +}
> +
>  static void apic_local_deliver(APICCommonState *s, int vector)
>  {
>     uint32_t lvt = s->lvt[vector];
> @@ -239,20 +307,17 @@ static void apic_set_base(APICCommonState *s, uint64_t val)
>
>  static void apic_set_tpr(APICCommonState *s, uint8_t val)
>  {
> -    s->tpr = (val & 0x0f) << 4;
> -    apic_update_irq(s);
> +    /* Updates from cr8 are ignored while the VAPIC is active */
> +    if (!s->vapic_paddr) {
> +        s->tpr = val << 4;
> +        apic_update_irq(s);
> +    }
>  }
>
> -/* return -1 if no bit is set */
> -static int get_highest_priority_int(uint32_t *tab)
> +static uint8_t apic_get_tpr(APICCommonState *s)
>  {
> -    int i;
> -    for(i = 7; i >= 0; i--) {
> -        if (tab[i] != 0) {
> -            return i * 32 + fls_bit(tab[i]);
> -        }
> -    }
> -    return -1;
> +    apic_sync_vapic(s, SYNC_FROM_VAPIC);
> +    return s->tpr >> 4;
>  }
>
>  static int apic_get_ppr(APICCommonState *s)
> @@ -312,6 +377,14 @@ static void apic_update_irq(APICCommonState *s)
>     }
>  }
>
> +void apic_poll_irq(DeviceState *d)
> +{
> +    APICCommonState *s = APIC_COMMON(d);
> +
> +    apic_sync_vapic(s, SYNC_FROM_VAPIC);
> +    apic_update_irq(s);
> +}
> +
>  static void apic_set_irq(APICCommonState *s, int vector_num, int trigger_mode)
>  {
>     apic_report_irq_delivered(!get_bit(s->irr, vector_num));
> @@ -321,6 +394,16 @@ static void apic_set_irq(APICCommonState *s, int vector_num, int trigger_mode)
>         set_bit(s->tmr, vector_num);
>     else
>         reset_bit(s->tmr, vector_num);
> +    if (s->vapic_paddr) {
> +        apic_sync_vapic(s, SYNC_ISR_IRR_TO_VAPIC);
> +        /*
> +         * The vcpu thread needs to see the new IRR before we pull its current
> +         * TPR value. That way, if we miss a lowering of the TPR, the guest
> +         * has the chance to notice the new IRR and poll for IRQs on its own.
> +         */
> +        smp_wmb();
> +        apic_sync_vapic(s, SYNC_FROM_VAPIC);
> +    }
>     apic_update_irq(s);
>  }
>
> @@ -334,6 +417,7 @@ static void apic_eoi(APICCommonState *s)
>     if (!(s->spurious_vec & APIC_SV_DIRECTED_IO) && get_bit(s->tmr, isrv)) {
>         ioapic_eoi_broadcast(isrv);
>     }
> +    apic_sync_vapic(s, SYNC_FROM_VAPIC | SYNC_TO_VAPIC);
>     apic_update_irq(s);
>  }
>
> @@ -471,15 +555,19 @@ int apic_get_interrupt(DeviceState *d)
>     if (!(s->spurious_vec & APIC_SV_ENABLE))
>         return -1;
>
> +    apic_sync_vapic(s, SYNC_FROM_VAPIC);
>     intno = apic_irq_pending(s);
>
>     if (intno == 0) {
> +        apic_sync_vapic(s, SYNC_TO_VAPIC);
>         return -1;
>     } else if (intno < 0) {
> +        apic_sync_vapic(s, SYNC_TO_VAPIC);
>         return s->spurious_vec & 0xff;
>     }
>     reset_bit(s->irr, intno);
>     set_bit(s->isr, intno);
> +    apic_sync_vapic(s, SYNC_TO_VAPIC);
>     apic_update_irq(s);
>     return intno;
>  }
> @@ -576,6 +664,10 @@ static uint32_t apic_mem_readl(void *opaque, target_phys_addr_t addr)
>         val = 0x11 | ((APIC_LVT_NB - 1) << 16); /* version 0x11 */
>         break;
>     case 0x08:
> +        apic_sync_vapic(s, SYNC_FROM_VAPIC);
> +        if (apic_report_tpr_access) {
> +            cpu_report_tpr_access(s->cpu_env, TPR_ACCESS_READ);
> +        }
>         val = s->tpr;
>         break;
>     case 0x09:
> @@ -675,7 +767,11 @@ static void apic_mem_writel(void *opaque, target_phys_addr_t addr, uint32_t val)
>     case 0x03:
>         break;
>     case 0x08:
> +        if (apic_report_tpr_access) {
> +            cpu_report_tpr_access(s->cpu_env, TPR_ACCESS_WRITE);
> +        }
>         s->tpr = val;
> +        apic_sync_vapic(s, SYNC_TO_VAPIC);
>         apic_update_irq(s);
>         break;
>     case 0x09:
> @@ -737,6 +833,11 @@ static void apic_mem_writel(void *opaque, target_phys_addr_t addr, uint32_t val)
>     }
>  }
>
> +static void apic_pre_save(APICCommonState *s)
> +{
> +    apic_sync_vapic(s, SYNC_FROM_VAPIC);
> +}
> +
>  static void apic_post_load(APICCommonState *s)
>  {
>     if (s->timer_expiry != -1) {
> @@ -770,7 +871,10 @@ static void apic_class_init(ObjectClass *klass, void *data)
>     k->init = apic_init;
>     k->set_base = apic_set_base;
>     k->set_tpr = apic_set_tpr;
> +    k->get_tpr = apic_get_tpr;
> +    k->vapic_base_update = apic_vapic_base_update;
>     k->external_nmi = apic_external_nmi;
> +    k->pre_save = apic_pre_save;
>     k->post_load = apic_post_load;
>  }
>
> diff --git a/hw/apic_common.c b/hw/apic_common.c
> index 588531b..1977da7 100644
> --- a/hw/apic_common.c
> +++ b/hw/apic_common.c
> @@ -20,8 +20,10 @@
>  #include "apic.h"
>  #include "apic_internal.h"
>  #include "trace.h"
> +#include "kvm.h"
>
>  static int apic_irq_delivered;
> +bool apic_report_tpr_access;

This should go to APICCommonState.

>
>  void cpu_set_apic_base(DeviceState *d, uint64_t val)
>  {
> @@ -63,13 +65,44 @@ void cpu_set_apic_tpr(DeviceState *d, uint8_t val)
>
>  uint8_t cpu_get_apic_tpr(DeviceState *d)
>  {
> +    APICCommonState *s;
> +    APICCommonClass *info;
> +
> +    if (!d) {
> +        return 0;
> +    }
> +
> +    s = APIC_COMMON(d);
> +    info = APIC_COMMON_GET_CLASS(s);
> +
> +    return info->get_tpr(s);
> +}
> +
> +void apic_enable_tpr_access_reporting(DeviceState *d)
> +{
> +    APICCommonState *s = DO_UPCAST(APICCommonState, busdev.qdev, d);
> +    APICCommonClass *info = APIC_COMMON_GET_CLASS(s);
> +
> +    apic_report_tpr_access = true;
> +    if (info->enable_tpr_reporting) {
> +        info->enable_tpr_reporting(s);
> +    }
> +}
> +
> +void apic_enable_vapic(DeviceState *d, target_phys_addr_t paddr)
> +{
>     APICCommonState *s = DO_UPCAST(APICCommonState, busdev.qdev, d);
> +    APICCommonClass *info = APIC_COMMON_GET_CLASS(s);
>
> -    return s ? s->tpr >> 4 : 0;
> +    s->vapic_paddr = paddr;
> +    info->vapic_base_update(s);
>  }
>
>  void apic_handle_tpr_access_report(DeviceState *d, target_ulong ip, int access)
>  {
> +    APICCommonState *s = DO_UPCAST(APICCommonState, busdev.qdev, d);
> +
> +    vapic_report_tpr_access(s->vapic, s->cpu_env, ip, access);
>  }
>
>  void apic_report_irq_delivered(int delivered)
> @@ -170,12 +203,16 @@ void apic_init_reset(DeviceState *d)
>  static void apic_reset_common(DeviceState *d)
>  {
>     APICCommonState *s = DO_UPCAST(APICCommonState, busdev.qdev, d);
> +    APICCommonClass *info = APIC_COMMON_GET_CLASS(s);
>     bool bsp;
>
>     bsp = cpu_is_bsp(s->cpu_env);
>     s->apicbase = 0xfee00000 |
>         (bsp ? MSR_IA32_APICBASE_BSP : 0) | MSR_IA32_APICBASE_ENABLE;
>
> +    s->vapic_paddr = 0;
> +    info->vapic_base_update(s);
> +
>     apic_init_reset(d);
>
>     if (bsp) {
> @@ -238,6 +275,7 @@ static int apic_init_common(SysBusDevice *dev)
>  {
>     APICCommonState *s = APIC_COMMON(dev);
>     APICCommonClass *info;
> +    static DeviceState *vapic;
>     static int apic_no;
>
>     if (apic_no >= MAX_APICS) {
> @@ -248,10 +286,29 @@ static int apic_init_common(SysBusDevice *dev)
>     info = APIC_COMMON_GET_CLASS(s);
>     info->init(s);
>
> -    sysbus_init_mmio(&s->busdev, &s->io_memory);
> +    sysbus_init_mmio(dev, &s->io_memory);
> +
> +    if (!vapic && s->vapic_control & VAPIC_ENABLE_MASK) {
> +        vapic = sysbus_create_simple("kvmvapic", -1, NULL);
> +    }
> +    s->vapic = vapic;
> +    if (apic_report_tpr_access && info->enable_tpr_reporting) {

I think you should not rely on apic_report_tpr_access being in a sane
state during class init.

> +        info->enable_tpr_reporting(s);
> +    }
> +
>     return 0;
>  }
>
> +static void apic_dispatch_pre_save(void *opaque)
> +{
> +    APICCommonState *s = APIC_COMMON(opaque);
> +    APICCommonClass *info = APIC_COMMON_GET_CLASS(s);
> +
> +    if (info->pre_save) {
> +        info->pre_save(s);
> +    }
> +}
> +
>  static int apic_dispatch_post_load(void *opaque, int version_id)
>  {
>     APICCommonState *s = APIC_COMMON(opaque);
> @@ -269,6 +326,7 @@ static const VMStateDescription vmstate_apic_common = {
>     .minimum_version_id = 3,
>     .minimum_version_id_old = 1,
>     .load_state_old = apic_load_old,
> +    .pre_save = apic_dispatch_pre_save,
>     .post_load = apic_dispatch_post_load,
>     .fields = (VMStateField[]) {
>         VMSTATE_UINT32(apicbase, APICCommonState),
> @@ -298,6 +356,8 @@ static const VMStateDescription vmstate_apic_common = {
>  static Property apic_properties_common[] = {
>     DEFINE_PROP_UINT8("id", APICCommonState, id, -1),
>     DEFINE_PROP_PTR("cpu_env", APICCommonState, cpu_env),
> +    DEFINE_PROP_BIT("vapic", APICCommonState, vapic_control, VAPIC_ENABLE_BIT,
> +                    true),
>     DEFINE_PROP_END_OF_LIST(),
>  };
>
> diff --git a/hw/apic_internal.h b/hw/apic_internal.h
> index 0cab010..95cc7cf 100644
> --- a/hw/apic_internal.h
> +++ b/hw/apic_internal.h
> @@ -61,6 +61,9 @@
>  #define APIC_SV_DIRECTED_IO             (1<<12)
>  #define APIC_SV_ENABLE                  (1<<8)
>
> +#define VAPIC_ENABLE_BIT                0
> +#define VAPIC_ENABLE_MASK               (1 << VAPIC_ENABLE_BIT)
> +
>  #define MAX_APICS 255
>
>  #define MSI_SPACE_SIZE                  0x100000
> @@ -82,7 +85,11 @@ typedef struct APICCommonClass
>     void (*init)(APICCommonState *s);
>     void (*set_base)(APICCommonState *s, uint64_t val);
>     void (*set_tpr)(APICCommonState *s, uint8_t val);
> +    uint8_t (*get_tpr)(APICCommonState *s);
> +    void (*enable_tpr_reporting)(APICCommonState *s);
> +    void (*vapic_base_update)(APICCommonState *s);
>     void (*external_nmi)(APICCommonState *s);
> +    void (*pre_save)(APICCommonState *s);
>     void (*post_load)(APICCommonState *s);
>  } APICCommonClass;
>
> @@ -114,9 +121,29 @@ struct APICCommonState {
>     int64_t timer_expiry;
>     int sipi_vector;
>     int wait_for_sipi;
> +
> +    uint32_t vapic_control;
> +    DeviceState *vapic;
> +    target_phys_addr_t vapic_paddr; /* note: persistence via kvmvapic */
>  };
>
> +typedef struct VAPICState {
> +    uint8_t tpr;
> +    uint8_t isr;
> +    uint8_t zero;
> +    uint8_t irr;
> +    uint8_t enabled;
> +} QEMU_PACKED VAPICState;
> +
> +extern bool apic_report_tpr_access;
> +
>  void apic_report_irq_delivered(int delivered);
>  bool apic_next_timer(APICCommonState *s, int64_t current_time);
> +void apic_enable_tpr_access_reporting(DeviceState *d);
> +void apic_enable_vapic(DeviceState *d, target_phys_addr_t paddr);
> +void apic_poll_irq(DeviceState *d);
> +
> +void vapic_report_tpr_access(DeviceState *dev, void *cpu, target_ulong ip,
> +                             int access);
>
>  #endif /* !QEMU_APIC_INTERNAL_H */
> diff --git a/hw/kvm/apic.c b/hw/kvm/apic.c
> index dfc2ab3..326eb37 100644
> --- a/hw/kvm/apic.c
> +++ b/hw/kvm/apic.c
> @@ -92,6 +92,35 @@ static void kvm_apic_set_tpr(APICCommonState *s, uint8_t val)
>     s->tpr = (val & 0x0f) << 4;
>  }
>
> +static uint8_t kvm_apic_get_tpr(APICCommonState *s)
> +{
> +    return s->tpr >> 4;
> +}
> +
> +static void kvm_apic_enable_tpr_reporting(APICCommonState *s)
> +{
> +    struct kvm_tpr_access_ctl ctl = {
> +        .enabled = 1
> +    };
> +
> +    kvm_vcpu_ioctl(s->cpu_env, KVM_TPR_ACCESS_REPORTING, &ctl);
> +}
> +
> +static void kvm_apic_vapic_base_update(APICCommonState *s)
> +{
> +    struct kvm_vapic_addr vapid_addr = {
> +        .vapic_addr = s->vapic_paddr,
> +    };
> +    int ret;
> +
> +    ret = kvm_vcpu_ioctl(s->cpu_env, KVM_SET_VAPIC_ADDR, &vapid_addr);
> +    if (ret < 0) {
> +        fprintf(stderr, "KVM: setting VAPIC address failed (%s)\n",
> +                strerror(-ret));
> +        abort();
> +    }
> +}
> +
>  static void do_inject_external_nmi(void *data)
>  {
>     APICCommonState *s = data;
> @@ -129,6 +158,9 @@ static void kvm_apic_class_init(ObjectClass *klass, void *data)
>     k->init = kvm_apic_init;
>     k->set_base = kvm_apic_set_base;
>     k->set_tpr = kvm_apic_set_tpr;
> +    k->get_tpr = kvm_apic_get_tpr;
> +    k->enable_tpr_reporting = kvm_apic_enable_tpr_reporting;
> +    k->vapic_base_update = kvm_apic_vapic_base_update;
>     k->external_nmi = kvm_apic_external_nmi;
>  }
>
> diff --git a/hw/kvmvapic.c b/hw/kvmvapic.c
> new file mode 100644
> index 0000000..0c4d304
> --- /dev/null
> +++ b/hw/kvmvapic.c
> @@ -0,0 +1,774 @@
> +/*
> + * TPR optimization for 32-bit Windows guests
> + *
> + * Copyright (C) 2007-2008 Qumranet Technologies
> + * Copyright (C) 2012      Jan Kiszka, Siemens AG
> + *
> + * This work is licensed under the terms of the GNU GPL version 2, or
> + * (at your option) any later version. See the COPYING file in the
> + * top-level directory.
> + */
> +#include "sysemu.h"
> +#include "cpus.h"
> +#include "kvm.h"
> +#include "apic_internal.h"
> +
> +#define APIC_DEFAULT_ADDRESS    0xfee00000
> +
> +#define VAPIC_IO_PORT           0x7e
> +
> +#define VAPIC_INACTIVE          0
> +#define VAPIC_ACTIVE            1
> +#define VAPIC_STANDBY           2

Enums, please.

> +
> +#define VAPIC_CPU_SHIFT         7
> +
> +#define ROM_BLOCK_SIZE          512
> +#define ROM_BLOCK_MASK          (~(ROM_BLOCK_SIZE - 1))
> +
> +typedef struct VAPICHandlers {
> +    uint32_t set_tpr;
> +    uint32_t set_tpr_eax;
> +    uint32_t get_tpr[8];
> +    uint32_t get_tpr_stack;
> +} QEMU_PACKED VAPICHandlers;
> +
> +typedef struct GuestROMState {
> +    char signature[8];
> +    uint32_t vaddr;

This does not look 64 bit clean.

> +    uint32_t fixup_start;
> +    uint32_t fixup_end;
> +    uint32_t vapic_vaddr;
> +    uint32_t vapic_size;
> +    uint32_t vcpu_shift;
> +    uint32_t real_tpr_addr;
> +    VAPICHandlers up;
> +    VAPICHandlers mp;
> +} QEMU_PACKED GuestROMState;

Why packed, is this passed to guest directly?

> +
> +typedef struct VAPICROMState {
> +    SysBusDevice busdev;
> +    MemoryRegion io;
> +    MemoryRegion rom;
> +    bool rom_mapped_writable;

I'd put this later to avoid a structure hole.
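
To illustrate the hole, a small sketch with two hypothetical structs
(flag_first and flag_last are stand-ins, not the real VAPICROMState). On
common x86 ABIs one would expect 16 versus 12 bytes, but the exact sizes
are ABI-dependent, so the code only compares them:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: each bool is followed by a 32-bit field, which
 * usually forces three bytes of padding after the bool. */
struct flag_first {
    bool     mapped_writable;
    uint32_t state;
    bool     other_flag;
    uint32_t paddr;
};

/* Same members with the one-byte fields grouped at the end. */
struct flag_last {
    uint32_t state;
    uint32_t paddr;
    bool     mapped_writable;
    bool     other_flag;
};

/* Returns true if grouping the small members did not grow the struct. */
static bool reorder_is_not_larger(void)
{
    return sizeof(struct flag_last) <= sizeof(struct flag_first);
}
```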

> +    uint32_t state;
> +    uint32_t rom_state_paddr;
> +    uint32_t rom_state_vaddr;
> +    uint32_t vapic_paddr;
> +    uint32_t real_tpr_addr;
> +    GuestROMState rom_state;
> +    size_t rom_size;
> +} VAPICROMState;
> +
> +#define TPR_INSTR_IS_WRITE              0x1
> +#define TPR_INSTR_ABS_MODRM             0x2
> +#define TPR_INSTR_MATCH_MODRM_REG       0x4
> +
> +typedef struct TPRInstruction {
> +    uint8_t opcode;
> +    uint8_t modrm_reg;
> +    unsigned int flags;
> +    size_t length;
> +    off_t addr_offset;
> +} TPRInstruction;

Also here the order is pessimized.

> +
> +/* must be sorted by length, shortest first */
> +static const TPRInstruction tpr_instr[] = {
> +    { /* mov abs to eax */
> +        .opcode = 0xa1,
> +        .length = 5,
> +        .addr_offset = 1,
> +    },
> +    { /* mov eax to abs */
> +        .opcode = 0xa3,
> +        .flags = TPR_INSTR_IS_WRITE,
> +        .length = 5,
> +        .addr_offset = 1,
> +    },
> +    { /* mov r32 to r/m32 */
> +        .opcode = 0x89,
> +        .flags = TPR_INSTR_IS_WRITE | TPR_INSTR_ABS_MODRM,
> +        .length = 6,
> +        .addr_offset = 2,
> +    },
> +    { /* mov r/m32 to r32 */
> +        .opcode = 0x8b,
> +        .flags = TPR_INSTR_ABS_MODRM,
> +        .length = 6,
> +        .addr_offset = 2,
> +    },
> +    { /* push r/m32 */
> +        .opcode = 0xff,
> +        .modrm_reg = 6,
> +        .flags = TPR_INSTR_ABS_MODRM | TPR_INSTR_MATCH_MODRM_REG,
> +        .length = 6,
> +        .addr_offset = 2,
> +    },
> +    { /* mov imm32, r/m32 (c7/0) */
> +        .opcode = 0xc7,
> +        .modrm_reg = 0,
> +        .flags = TPR_INSTR_IS_WRITE | TPR_INSTR_ABS_MODRM |
> +                 TPR_INSTR_MATCH_MODRM_REG,
> +        .length = 10,
> +        .addr_offset = 2,
> +    },
> +};
> +
> +static void read_guest_rom_state(VAPICROMState *s)
> +{
> +    cpu_physical_memory_rw(s->rom_state_paddr, (void *)&s->rom_state,
> +                           sizeof(GuestROMState), 0);
> +}
> +
> +static void write_guest_rom_state(VAPICROMState *s)
> +{
> +    cpu_physical_memory_rw(s->rom_state_paddr, (void *)&s->rom_state,
> +                           sizeof(GuestROMState), 1);
> +}
> +
> +static void update_guest_rom_state(VAPICROMState *s)
> +{
> +    read_guest_rom_state(s);
> +
> +    s->rom_state.real_tpr_addr = cpu_to_le32(s->real_tpr_addr);
> +    s->rom_state.vcpu_shift = cpu_to_le32(VAPIC_CPU_SHIFT);
> +
> +    write_guest_rom_state(s);
> +}
> +
> +static int find_real_tpr_addr(VAPICROMState *s, CPUState *env)
> +{
> +    target_phys_addr_t paddr;
> +    target_ulong addr;
> +
> +    if (s->state == VAPIC_ACTIVE) {
> +        return 0;
> +    }
> +    for (addr = 0xfffff000; addr >= 0x80000000; addr -= TARGET_PAGE_SIZE) {
> +        paddr = cpu_get_phys_page_debug(env, addr);
> +        if (paddr != APIC_DEFAULT_ADDRESS) {
> +            continue;
> +        }
> +        s->real_tpr_addr = addr + 0x80;
> +        update_guest_rom_state(s);
> +        return 0;
> +    }

This loop looks odd. What should it do, probe for an unused address?
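
Reading the loop as posted, it appears to walk the upper 2 GB of the guest
virtual address space from the top down, looking for the page that maps the
local APIC at its default physical address, and then takes offset 0x80 in
that page as the TPR address. A standalone sketch of that reading follows;
translate_fn, example_translate() and the 0xfffe0000 mapping are made-up
stand-ins for cpu_get_phys_page_debug() and a real guest page table:

```c
#include <stdint.h>

#define APIC_DEFAULT_ADDRESS 0xfee00000u
#define PAGE_SIZE_4K         0x1000u
#define TPR_OFFSET           0x80u

/* Hypothetical stand-in for cpu_get_phys_page_debug(): returns the
 * physical page for a virtual page, or UINT64_MAX if unmapped. */
typedef uint64_t (*translate_fn)(uint32_t vaddr);

/* Scan 0xfffff000 down to 0x80000000 for a page mapping the APIC.
 * Returns the virtual TPR address, or 0 if no mapping was found. */
static uint32_t find_tpr_vaddr(translate_fn translate)
{
    uint32_t addr;

    for (addr = 0xfffff000u; addr >= 0x80000000u; addr -= PAGE_SIZE_4K) {
        if (translate(addr) == APIC_DEFAULT_ADDRESS) {
            return addr + TPR_OFFSET;
        }
    }
    return 0;
}

/* Example page table: the APIC is mapped at 0xfffe0000 (made-up value). */
static uint64_t example_translate(uint32_t vaddr)
{
    return vaddr == 0xfffe0000u ? APIC_DEFAULT_ADDRESS : UINT64_MAX;
}

/* Example page table without any APIC mapping. */
static uint64_t unmapped_translate(uint32_t vaddr)
{
    (void)vaddr;
    return UINT64_MAX;
}
```

So it would be a search for an existing mapping rather than a probe for an
unused address, assuming this interpretation of the patch is correct.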

> +    return -1;
> +}
> +
> +static uint8_t modrm_reg(uint8_t modrm)
> +{
> +    return (modrm >> 3) & 7;
> +}
> +
> +static bool is_abs_modrm(uint8_t modrm)
> +{
> +    return (modrm & 0xc7) == 0x05;
> +}
> +
> +static bool opcode_matches(uint8_t *opcode, const TPRInstruction *instr)
> +{
> +    return opcode[0] == instr->opcode &&
> +        (!(instr->flags & TPR_INSTR_ABS_MODRM) || is_abs_modrm(opcode[1])) &&
> +        (!(instr->flags & TPR_INSTR_MATCH_MODRM_REG) ||
> +         modrm_reg(opcode[1]) == instr->modrm_reg);
> +}
> +
> +static int evaluate_tpr_instruction(VAPICROMState *s, CPUState *env,
> +                                    target_ulong *pip, int access)
> +{
> +    const TPRInstruction *instr;
> +    target_ulong ip = *pip;
> +    uint8_t opcode[2];
> +    uint32_t real_tpr_addr;
> +    int i;
> +
> +    if ((ip & 0xf0000000) != 0x80000000 && (ip & 0xf0000000) != 0xe0000000) {

The constants should use the ULL suffix because target_ulong could be
64 bit, though maybe the current form is more efficient.

> +        return -1;
> +    }
> +
> +    /*
> +     * Early Windows 2003 SMP initialization contains a
> +     *
> +     *   mov imm32, r/m32
> +     *
> +     * instruction that is patched by TPR optimization. The problem is that
> +     * RSP, used by the patched instruction, is zero, so the guest gets a
> +     * double fault and dies.
> +     */
> +    if (env->regs[R_ESP] == 0) {
> +        return -1;
> +    }
> +
> +    if (access == TPR_ACCESS_WRITE && kvm_enabled() &&
> +        !kvm_irqchip_in_kernel()) {
> +        /*
> +         * KVM without TPR access reporting calls into the user space APIC on
> +         * write with IP pointing after the accessing instruction. So we need
> +         * to look backward to find the reason.
> +         */
> +        for (i = 0; i < ARRAY_SIZE(tpr_instr); i++) {
> +            instr = &tpr_instr[i];
> +            if (!(instr->flags & TPR_INSTR_IS_WRITE)) {
> +                continue;
> +            }
> +            if (cpu_memory_rw_debug(env, ip - instr->length, opcode,
> +                                    sizeof(opcode), 0) < 0) {
> +                return -1;
> +            }
> +            if (opcode_matches(opcode, instr)) {
> +                ip -= instr->length;
> +                goto instruction_ok;
> +            }
> +        }
> +        return -1;
> +    } else {
> +        if (cpu_memory_rw_debug(env, ip, opcode, sizeof(opcode), 0) < 0) {
> +            return -1;
> +        }
> +        for (i = 0; i < ARRAY_SIZE(tpr_instr); i++) {
> +            instr = &tpr_instr[i];
> +            if (opcode_matches(opcode, instr)) {
> +                goto instruction_ok;
> +            }
> +        }
> +        return -1;
> +    }
> +
> +instruction_ok:
> +    /*
> +     * Grab the virtual TPR address from the instruction
> +     * and update the cached values.
> +     */
> +    if (cpu_memory_rw_debug(env, ip + instr->addr_offset,
> +                            (void *)&real_tpr_addr,
> +                            sizeof(real_tpr_addr), 0) < 0) {
> +        return -1;
> +    }
> +    real_tpr_addr = le32_to_cpu(real_tpr_addr);
> +    if ((real_tpr_addr & 0xfff) != 0x80) {
> +        return -1;
> +    }
> +    s->real_tpr_addr = real_tpr_addr;
> +    update_guest_rom_state(s);
> +
> +    *pip = ip;
> +    return 0;
> +}
> +
> +static int update_rom_mapping(VAPICROMState *s, CPUState *env, target_ulong ip)
> +{
> +    target_phys_addr_t paddr;
> +    uint32_t rom_state_vaddr;
> +    uint32_t pos, patch, offset;
> +
> +    /* nothing to do if already activated */
> +    if (s->state == VAPIC_ACTIVE) {
> +        return 0;
> +    }
> +
> +    /* bail out if ROM init code was not executed (missing ROM?) */
> +    if (s->state == VAPIC_INACTIVE) {
> +        return -1;
> +    }
> +
> +    /* find out virtual address of the ROM */
> +    rom_state_vaddr = s->rom_state_paddr + (ip & 0xf0000000);
> +    paddr = cpu_get_phys_page_debug(env, rom_state_vaddr);
> +    if (paddr == -1) {
> +        return -1;
> +    }
> +    paddr += rom_state_vaddr & ~TARGET_PAGE_MASK;
> +    if (paddr != s->rom_state_paddr) {
> +        return -1;
> +    }
> +    read_guest_rom_state(s);
> +    if (memcmp(s->rom_state.signature, "kvm aPiC", 8) != 0) {
> +        return -1;
> +    }
> +    s->rom_state_vaddr = rom_state_vaddr;
> +
> +    /* fixup addresses in ROM if needed */
> +    if (rom_state_vaddr == le32_to_cpu(s->rom_state.vaddr)) {
> +        return 0;
> +    }
> +    for (pos = le32_to_cpu(s->rom_state.fixup_start);
> +         pos < le32_to_cpu(s->rom_state.fixup_end);
> +         pos += 4) {
> +        cpu_physical_memory_rw(paddr + pos - s->rom_state.vaddr,
> +                               (void *)&offset, sizeof(offset), 0);
> +        offset = le32_to_cpu(offset);
> +        cpu_physical_memory_rw(paddr + offset, (void *)&patch,
> +                               sizeof(patch), 0);
> +        patch = le32_to_cpu(patch);
> +        patch += rom_state_vaddr - le32_to_cpu(s->rom_state.vaddr);
> +        patch = cpu_to_le32(patch);
> +        cpu_physical_memory_rw(paddr + offset, (void *)&patch,
> +                               sizeof(patch), 1);
> +    }
> +    read_guest_rom_state(s);
> +    s->vapic_paddr = paddr + le32_to_cpu(s->rom_state.vapic_vaddr) -
> +        le32_to_cpu(s->rom_state.vaddr);
> +
> +    return 0;
> +}
> +
> +/*
> + * Tries to read the unique processor number from the Kernel Processor Control
> + * Region (KPCR) of 32-bit Windows. Returns -1 if the KPCR cannot be accessed
> + * or is considered invalid.
> + */

Horrible hack. Is the guest OS type or version checked somewhere?

> +static int get_kpcr_number(CPUState *env)
> +{
> +    struct kpcr {
> +        uint8_t  fill1[0x1c];
> +        uint32_t self;
> +        uint8_t  fill2[0x31];
> +        uint8_t  number;
> +    } QEMU_PACKED kpcr;

KPCR. Pointers to Windows documentation would be nice.
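
Until such pointers are added, the offsets implied by the filler arrays can
at least be spelled out. The sketch below copies the struct from the patch,
assumes QEMU_PACKED is the GCC/Clang packed attribute, and uses a
hypothetical helper kpcr_number() that mirrors the self-pointer check:

```c
#include <stddef.h>
#include <stdint.h>

/* Same layout as the local struct in get_kpcr_number(). */
struct kpcr {
    uint8_t  fill1[0x1c];
    uint32_t self;
    uint8_t  fill2[0x31];
    uint8_t  number;
} __attribute__((packed));

/* Offsets that follow from the filler sizes: 0x1c and 0x1c + 4 + 0x31. */
_Static_assert(offsetof(struct kpcr, self) == 0x1c, "self pointer offset");
_Static_assert(offsetof(struct kpcr, number) == 0x51, "number offset");

/* Mirrors the validity check: the self pointer must equal the FS base,
 * otherwise the region is not treated as a KPCR and -1 is returned. */
static int kpcr_number(const struct kpcr *k, uint32_t fs_base)
{
    return k->self == fs_base ? k->number : -1;
}
```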

> +
> +    if (cpu_memory_rw_debug(env, env->segs[R_FS].base,
> +                            (void *)&kpcr, sizeof(kpcr), 0) < 0 ||
> +        kpcr.self != env->segs[R_FS].base) {
> +        return -1;
> +    }
> +    return kpcr.number;
> +}
> +
> +static int vapic_enable(VAPICROMState *s, CPUState *env)
> +{
> +    int cpu_number = get_kpcr_number(env);
> +    target_phys_addr_t vapic_paddr;
> +    static const uint8_t enabled = 1;
> +
> +    if (cpu_number < 0) {
> +        return -1;
> +    }
> +    vapic_paddr = s->vapic_paddr +
> +        (((target_phys_addr_t)cpu_number) << VAPIC_CPU_SHIFT);
> +    cpu_physical_memory_rw(vapic_paddr + offsetof(VAPICState, enabled),
> +                           (void *)&enabled, sizeof(enabled), 1);
> +    apic_enable_vapic(env->apic_state, vapic_paddr);
> +
> +    s->state = VAPIC_ACTIVE;
> +
> +    return 0;
> +}
> +
> +static void patch_byte(CPUState *env, target_ulong addr, uint8_t byte)
> +{
> +    cpu_memory_rw_debug(env, addr, &byte, 1, 1);
> +}
> +
> +static void patch_call(VAPICROMState *s, CPUState *env, target_ulong ip,
> +                       uint32_t target)
> +{
> +    uint32_t offset;
> +
> +    offset = cpu_to_le32(target - ip - 5);
> +    patch_byte(env, ip, 0xe8); /* call near */
> +    cpu_memory_rw_debug(env, ip + 1, (void *)&offset, sizeof(offset), 1);
> +}
> +
> +static void patch_instruction(VAPICROMState *s, CPUState *env, target_ulong ip)
> +{
> +    target_phys_addr_t paddr;
> +    VAPICHandlers *handlers;
> +    uint8_t opcode[2];
> +    uint32_t imm32;
> +
> +    if (smp_cpus == 1) {
> +        handlers = &s->rom_state.up;
> +    } else {
> +        handlers = &s->rom_state.mp;
> +    }
> +
> +    pause_all_vcpus();
> +
> +    cpu_memory_rw_debug(env, ip, opcode, sizeof(opcode), 0);
> +
> +    switch (opcode[0]) {
> +    case 0x89: /* mov r32 to r/m32 */
> +        patch_byte(env, ip, 0x50 + modrm_reg(opcode[1]));  /* push reg */
> +        patch_call(s, env, ip + 1, handlers->set_tpr);
> +        break;
> +    case 0x8b: /* mov r/m32 to r32 */
> +        patch_byte(env, ip, 0x90);
> +        patch_call(s, env, ip + 1, handlers->get_tpr[modrm_reg(opcode[1])]);
> +        break;
> +    case 0xa1: /* mov abs to eax */
> +        patch_call(s, env, ip, handlers->get_tpr[0]);
> +        break;
> +    case 0xa3: /* mov eax to abs */
> +        patch_call(s, env, ip, handlers->set_tpr_eax);
> +        break;
> +    case 0xc7: /* mov imm32, r/m32 (c7/0) */
> +        patch_byte(env, ip, 0x68);  /* push imm32 */
> +        cpu_memory_rw_debug(env, ip + 6, (void *)&imm32, sizeof(imm32), 0);
> +        cpu_memory_rw_debug(env, ip + 1, (void *)&imm32, sizeof(imm32), 1);
> +        patch_call(s, env, ip + 5, handlers->set_tpr);
> +        break;
> +    case 0xff: /* push r/m32 */
> +        patch_byte(env, ip, 0x50); /* push eax */
> +        patch_call(s, env, ip + 1, handlers->get_tpr_stack);
> +        break;
> +    default:
> +        abort();
> +    }
> +
> +    resume_all_vcpus();
> +
> +    paddr = cpu_get_phys_page_debug(env, ip);
> +    paddr += ip & ~TARGET_PAGE_MASK;
> +    tb_invalidate_phys_page_range(paddr, paddr + 1, 1);
> +}
> +
> +void vapic_report_tpr_access(DeviceState *dev, void *cpu, target_ulong ip,
> +                             int access)
> +{
> +    VAPICROMState *s = DO_UPCAST(VAPICROMState, busdev.qdev, dev);
> +    CPUState *env = cpu;
> +
> +    cpu_synchronize_state(env);
> +
> +    if (evaluate_tpr_instruction(s, env, &ip, access) < 0) {
> +        if (s->state == VAPIC_ACTIVE) {
> +            vapic_enable(s, env);
> +        }
> +        return;
> +    }
> +    if (update_rom_mapping(s, env, ip) < 0) {
> +        return;
> +    }
> +    if (vapic_enable(s, env) < 0) {
> +        return;
> +    }
> +    patch_instruction(s, env, ip);
> +}
> +
> +static void vapic_reset(DeviceState *dev)
> +{
> +    VAPICROMState *s = DO_UPCAST(VAPICROMState, busdev.qdev, dev);
> +
> +    if (s->state == VAPIC_ACTIVE) {
> +        s->state = VAPIC_STANDBY;
> +    }
> +}
> +
> +static int patch_hypercalls(VAPICROMState *s)
> +{
> +    target_phys_addr_t rom_paddr = s->rom_state_paddr & ROM_BLOCK_MASK;
> +    static uint8_t vmcall_pattern[] = {

const

> +        0xb8, 0x1, 0, 0, 0, 0xf, 0x1, 0xc1
> +    };
> +    static uint8_t outl_pattern[] = {

const

> +        0xb8, 0x1, 0, 0, 0, 0x90, 0xe7, 0x7e
> +    };
> +    uint8_t alternates[2];
> +    uint8_t *pattern;
> +    uint8_t *patch;
> +    int patches = 0;
> +    off_t pos;
> +    uint8_t *rom;
> +
> +    rom = g_malloc(s->rom_size);
> +    cpu_physical_memory_rw(rom_paddr, rom, s->rom_size, 0);
> +
> +    for (pos = 0; pos < s->rom_size - sizeof(vmcall_pattern); pos++) {
> +        if (kvm_irqchip_in_kernel()) {
> +            pattern = outl_pattern;
> +            alternates[0] = outl_pattern[7];
> +            alternates[1] = outl_pattern[7];
> +            patch = &vmcall_pattern[5];
> +        } else {
> +            pattern = vmcall_pattern;
> +            alternates[0] = vmcall_pattern[7];
> +            alternates[1] = 0xd9; /* AMD's VMMCALL */
> +            patch = &outl_pattern[5];
> +        }
> +        if (memcmp(rom + pos, pattern, 7) == 0 &&
> +            (rom[pos + 7] == alternates[0] || rom[pos + 7] == alternates[1])) {
> +            cpu_physical_memory_rw(rom_paddr + pos + 5, patch, 3, 1);
> +            /*
> +             * Don't flush the tb here. Under ordinary conditions, the patched
> +             * calls are miles away from the current IP. Under malicious
> +             * conditions, the guest could trick us to crash.
> +             */
> +        }
> +    }
> +
> +    g_free(rom);
> +
> +    if (patches != 0 && patches != 2) {
> +        return -1;
> +    }
> +
> +    return 0;
> +}
> +
> +static void vapic_map_rom_writable(VAPICROMState *s)
> +{
> +    target_phys_addr_t rom_paddr = s->rom_state_paddr & ROM_BLOCK_MASK;
> +    MemoryRegionSection section;
> +    MemoryRegion *as;
> +    size_t rom_size;
> +    uint8_t *ram;
> +
> +    as = sysbus_address_space(&s->busdev);
> +
> +    if (s->rom_mapped_writable) {
> +        memory_region_del_subregion(as, &s->rom);
> +        memory_region_destroy(&s->rom);
> +    }
> +
> +    /* grab RAM memory region (region @rom_paddr may still be pc.rom) */
> +    section = memory_region_find(as, 0, 1);
> +
> +    /* read ROM size from RAM region */
> +    ram = memory_region_get_ram_ptr(section.mr);
> +    rom_size = ram[rom_paddr + 2] * ROM_BLOCK_SIZE;
> +    s->rom_size = rom_size;
> +
> +    /* We need to round up to avoid creating subpages
> +     * from which we cannot run code. */
> +    rom_size = TARGET_PAGE_ALIGN(rom_size);
> +
> +    memory_region_init_alias(&s->rom, "kvmvapic-rom", section.mr, rom_paddr,
> +                             rom_size);
> +    memory_region_add_subregion_overlap(as, rom_paddr, &s->rom, 1000);
> +    s->rom_mapped_writable = true;
> +}
> +
> +static void do_enable_tpr_reporting(void *data)
> +{
> +    CPUState *env = data;
> +
> +    apic_enable_tpr_access_reporting(env->apic_state);
> +}
> +
> +static void vapic_enable_tpr_reporting(void)
> +{
> +    CPUState *env = cpu_single_env;
> +
> +    for (env = first_cpu; env != NULL; env = env->next_cpu) {
> +        run_on_cpu(env, do_enable_tpr_reporting, env);
> +    }
> +}
> +
> +static int vapic_prepare(VAPICROMState *s)
> +{
> +    vapic_map_rom_writable(s);
> +
> +    if (patch_hypercalls(s) < 0) {
> +        return -1;
> +    }
> +
> +    vapic_enable_tpr_reporting();
> +
> +    return 0;
> +}
> +
> +static void vapic_write(void *opaque, target_phys_addr_t addr, uint64_t data,
> +                        unsigned int size)
> +{
> +    CPUState *env = cpu_single_env;
> +    target_phys_addr_t rom_paddr;
> +    VAPICROMState *s = opaque;
> +
> +    cpu_synchronize_state(env);
> +
> +    /*
> +     * The VAPIC supports two PIO-based hypercalls, both via port 0x7E.
> +     *  o 16-bit write access:
> +     *    Reports the option ROM initialization to the hypervisor. Written
> +     *    value is the offset of the state structure in the ROM.
> +     *  o 8-bit write access:
> +     *    Reactivates the VAPIC after a guest hibernation, i.e. after the
> +     *    option ROM content has been re-initialized by a guest power cycle.
> +     *  o 32-bit write access:
> +     *    Poll for pending IRQs, considering the current VAPIC state.
> +     */

Different operation depending on size? Interesting.

> +    switch (size) {
> +    case 2:
> +        if (s->state != VAPIC_INACTIVE) {
> +            patch_hypercalls(s);
> +            break;
> +        }
> +
> +        rom_paddr = (env->segs[R_CS].base + env->eip) & ROM_BLOCK_MASK;
> +        s->rom_state_paddr = rom_paddr + data;
> +
> +        if (vapic_prepare(s) < 0) {
> +            break;
> +        }
> +        s->state = VAPIC_STANDBY;
> +        break;
> +    case 1:
> +        if (kvm_enabled()) {
> +            /*
> +             * Disable triggering instruction in ROM by writing a NOP.
> +             *
> +             * We cannot do this in TCG mode as the reported IP is not
> +             * reliable.

Given the hack level of the whole thing, it should be possible to find
the IP using search PC.

> +             */
> +            pause_all_vcpus();
> +            patch_byte(env, env->eip - 2, 0x66);
> +            patch_byte(env, env->eip - 1, 0x90);
> +            resume_all_vcpus();
> +        }
> +
> +        if (s->state == VAPIC_ACTIVE) {
> +            break;
> +        }
> +        if (update_rom_mapping(s, env, env->eip) < 0) {
> +            break;
> +        }
> +        if (find_real_tpr_addr(s, env) < 0) {
> +            break;
> +        }
> +        vapic_enable(s, env);
> +        break;
> +    default:
> +    case 4:
> +        if (!kvm_irqchip_in_kernel()) {
> +            apic_poll_irq(env->apic_state);
> +        }
> +        break;
> +    }
> +}
> +
> +static const MemoryRegionOps vapic_ops = {
> +    .write = vapic_write,
> +    .endianness = DEVICE_NATIVE_ENDIAN,
> +};
> +
> +static int vapic_init(SysBusDevice *dev)
> +{
> +    VAPICROMState *s = FROM_SYSBUS(VAPICROMState, dev);
> +
> +    memory_region_init_io(&s->io, &vapic_ops, s, "kvmvapic", 2);
> +    sysbus_add_io(dev, VAPIC_IO_PORT, &s->io);
> +    sysbus_init_ioports(dev, VAPIC_IO_PORT, 2);
> +
> +    option_rom[nb_option_roms].name = "kvmvapic.bin";
> +    option_rom[nb_option_roms].bootindex = -1;
> +    nb_option_roms++;
> +
> +    return 0;
> +}
> +
> +static void do_vapic_enable(void *data)
> +{
> +    VAPICROMState *s = data;
> +
> +    vapic_enable(s, first_cpu);
> +}
> +
> +static int vapic_post_load(void *opaque, int version_id)
> +{
> +    VAPICROMState *s = opaque;
> +    uint8_t *zero;
> +
> +    /*
> +     * The old implementation of qemu-kvm did not provide the state
> +     * VAPIC_STANDBY. Reconstruct it.
> +     */
> +    if (s->state == VAPIC_INACTIVE && s->rom_state_paddr != 0) {
> +        s->state = VAPIC_STANDBY;
> +    }
> +
> +    if (s->state != VAPIC_INACTIVE) {
> +        if (vapic_prepare(s) < 0) {
> +            return -1;
> +        }
> +    }
> +    if (s->state == VAPIC_ACTIVE) {
> +        if (smp_cpus == 1) {
> +            run_on_cpu(first_cpu, do_vapic_enable, s);
> +        } else {
> +            zero = g_malloc0(s->rom_state.vapic_size);
> +            cpu_physical_memory_rw(s->vapic_paddr, zero,
> +                                   s->rom_state.vapic_size, 1);
> +            g_free(zero);
> +        }
> +    }
> +
> +    return 0;
> +}
> +
> +static const VMStateDescription vmstate_handlers = {
> +    .name = "kvmvapic-handlers",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .minimum_version_id_old = 1,
> +    .fields = (VMStateField[]) {
> +        VMSTATE_UINT32(set_tpr, VAPICHandlers),
> +        VMSTATE_UINT32(set_tpr_eax, VAPICHandlers),
> +        VMSTATE_UINT32_ARRAY(get_tpr, VAPICHandlers, 8),
> +        VMSTATE_UINT32(get_tpr_stack, VAPICHandlers),
> +        VMSTATE_END_OF_LIST()
> +    }
> +};
> +
> +static const VMStateDescription vmstate_guest_rom = {
> +    .name = "kvmvapic-guest-rom",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .minimum_version_id_old = 1,
> +    .fields = (VMStateField[]) {
> +        VMSTATE_UNUSED(8),     /* signature */
> +        VMSTATE_UINT32(vaddr, GuestROMState),
> +        VMSTATE_UINT32(fixup_start, GuestROMState),
> +        VMSTATE_UINT32(fixup_end, GuestROMState),
> +        VMSTATE_UINT32(vapic_vaddr, GuestROMState),
> +        VMSTATE_UINT32(vapic_size, GuestROMState),
> +        VMSTATE_UINT32(vcpu_shift, GuestROMState),
> +        VMSTATE_UINT32(real_tpr_addr, GuestROMState),
> +        VMSTATE_STRUCT(up, GuestROMState, 0, vmstate_handlers, VAPICHandlers),
> +        VMSTATE_STRUCT(mp, GuestROMState, 0, vmstate_handlers, VAPICHandlers),
> +        VMSTATE_END_OF_LIST()
> +    }
> +};
> +
> +static const VMStateDescription vmstate_vapic = {
> +    .name = "kvm-tpr-opt",      /* compatible with qemu-kvm VAPIC */
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .minimum_version_id_old = 1,
> +    .post_load = vapic_post_load,
> +    .fields = (VMStateField[]) {
> +        VMSTATE_STRUCT(rom_state, VAPICROMState, 0, vmstate_guest_rom,
> +                       GuestROMState),
> +        VMSTATE_UINT32(state, VAPICROMState),
> +        VMSTATE_UINT32(real_tpr_addr, VAPICROMState),
> +        VMSTATE_UINT32(rom_state_vaddr, VAPICROMState),
> +        VMSTATE_UINT32(vapic_paddr, VAPICROMState),
> +        VMSTATE_UINT32(rom_state_paddr, VAPICROMState),
> +        VMSTATE_END_OF_LIST()
> +    }
> +};
> +
> +static void vapic_class_init(ObjectClass *klass, void *data)
> +{
> +    SysBusDeviceClass *sc = SYS_BUS_DEVICE_CLASS(klass);
> +    DeviceClass *dc = DEVICE_CLASS(klass);
> +
> +    dc->no_user = 1;
> +    dc->reset   = vapic_reset;
> +    dc->vmsd    = &vmstate_vapic;
> +    sc->init    = vapic_init;
> +}
> +
> +static TypeInfo vapic_type = {
> +    .name          = "kvmvapic",
> +    .parent        = TYPE_SYS_BUS_DEVICE,
> +    .instance_size = sizeof(VAPICROMState),
> +    .class_init    = vapic_class_init,
> +};
> +
> +static void vapic_register(void)
> +{
> +    type_register_static(&vapic_type);
> +}
> +
> +device_init(vapic_register);
> --
> 1.7.3.4
>
>

* Re: [Qemu-devel] [PATCH v2 5/8] kvmvapic: Introduce TPR access optimization for Windows guests
@ 2012-02-11 15:25     ` Blue Swirl
  0 siblings, 0 replies; 90+ messages in thread
From: Blue Swirl @ 2012-02-11 15:25 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Anthony Liguori, kvm, Gleb Natapov, Marcelo Tosatti, qemu-devel,
	Avi Kivity

On Fri, Feb 10, 2012 at 18:31, Jan Kiszka <jan.kiszka@siemens.com> wrote:
> This enables acceleration for MMIO-based TPR registers accesses of
> 32-bit Windows guest systems. It is mostly useful with KVM enabled,
> either on older Intel CPUs (without flexpriority feature, can also be
> manually disabled for testing) or any current AMD processor.
>
> The approach introduced here is derived from the original version of
> qemu-kvm. It was refactored, documented, and extended by support for
> user space APIC emulation, both with and without KVM acceleration. The
> VMState format was kept compatible, so was the ABI to the option ROM
> that implements the guest-side para-virtualized driver service. This
> enables seamless migration from qemu-kvm to upstream or, one day,
> between KVM and TCG mode.
>
> The basic concept goes like this:
>  - VAPIC PV interface consisting of I/O port 0x7e and (for KVM in-kernel
>   irqchip) a vmcall hypercall is registered
>  - VAPIC option ROM is loaded into guest
>  - option ROM activates TPR MMIO access reporting via port 0x7e
>  - TPR accesses are trapped and patched in the guest to call into option
>   ROM instead, VAPIC support is enabled
>  - option ROM TPR helpers track state in memory and invoke hypercall to
>   poll for pending IRQs if required
>
> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>

I must say that I find the approach horrible: patching guests and ROMs
and looking up Windows internals. Taking the same approach to the extreme,
we could, for example, patch a Xen guest to become a KVM guest. Not that I
object to merging.

> ---
>  Makefile.target    |    3 +-
>  hw/apic.c          |  126 ++++++++-
>  hw/apic_common.c   |   64 +++++-
>  hw/apic_internal.h |   27 ++
>  hw/kvm/apic.c      |   32 +++
>  hw/kvmvapic.c      |  774 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  6 files changed, 1012 insertions(+), 14 deletions(-)
>  create mode 100644 hw/kvmvapic.c
>
> diff --git a/Makefile.target b/Makefile.target
> index 68481a3..ec7eff8 100644
> --- a/Makefile.target
> +++ b/Makefile.target
> @@ -230,7 +230,8 @@ obj-y += device-hotplug.o
>
>  # Hardware support
>  obj-i386-y += mc146818rtc.o pc.o
> -obj-i386-y += sga.o apic_common.o apic.o ioapic_common.o ioapic.o piix_pci.o
> +obj-i386-y += apic_common.o apic.o kvmvapic.o
> +obj-i386-y += sga.o ioapic_common.o ioapic.o piix_pci.o
>  obj-i386-y += vmport.o
>  obj-i386-y += pci-hotplug.o smbios.o wdt_ib700.o
>  obj-i386-y += debugcon.o multiboot.o
> diff --git a/hw/apic.c b/hw/apic.c
> index 086c544..2ebf3ca 100644
> --- a/hw/apic.c
> +++ b/hw/apic.c
> @@ -35,6 +35,10 @@
>  #define MSI_ADDR_DEST_ID_SHIFT         12
>  #define        MSI_ADDR_DEST_ID_MASK           0x00ffff0
>
> +#define SYNC_FROM_VAPIC                 0x1
> +#define SYNC_TO_VAPIC                   0x2
> +#define SYNC_ISR_IRR_TO_VAPIC           0x4

Enum, please.
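
A minimal sketch of what that could look like, keeping the values from the
patch. The type name VAPICSyncFlags and the helper sync_writes_guest are
hypothetical and only serve the illustration.

```c
#include <assert.h>

/* Hypothetical type name; values are the ones #defined in the patch. */
typedef enum VAPICSyncFlags {
    SYNC_FROM_VAPIC       = 0x1,
    SYNC_TO_VAPIC         = 0x2,
    SYNC_ISR_IRR_TO_VAPIC = 0x4,
} VAPICSyncFlags;

/*
 * Hypothetical helper: returns 1 if the requested sync writes to the
 * guest's VAPIC page, mirroring the second condition in apic_sync_vapic.
 * The argument stays an int because flags are OR-ed together.
 */
static int sync_writes_guest(int sync_type)
{
    const int all = SYNC_FROM_VAPIC | SYNC_TO_VAPIC | SYNC_ISR_IRR_TO_VAPIC;

    assert((sync_type & ~all) == 0);    /* reject unknown bits */
    return (sync_type & (SYNC_TO_VAPIC | SYNC_ISR_IRR_TO_VAPIC)) != 0;
}
```
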

> +
>  static APICCommonState *local_apics[MAX_APICS + 1];
>
>  static void apic_set_irq(APICCommonState *s, int vector_num, int trigger_mode);
> @@ -78,6 +82,70 @@ static inline int get_bit(uint32_t *tab, int index)
>     return !!(tab[i] & mask);
>  }
>
> +/* return -1 if no bit is set */
> +static int get_highest_priority_int(uint32_t *tab)
> +{
> +    int i;
> +    for (i = 7; i >= 0; i--) {
> +        if (tab[i] != 0) {
> +            return i * 32 + fls_bit(tab[i]);
> +        }
> +    }
> +    return -1;
> +}
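
For readers following along, a standalone sketch of the lookup above. It
assumes that fls_bit returns the zero-based index of the highest set bit;
highest_set_bit is a hypothetical portable stand-in for it, and
highest_priority_vector is a renamed copy for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for fls_bit: zero-based index of the top set bit. */
static int highest_set_bit(uint32_t value)
{
    int bit = -1;

    while (value) {
        value >>= 1;
        bit++;
    }
    return bit;
}

/* Scan the 256-bit vector table from the top word down; -1 if empty. */
static int highest_priority_vector(const uint32_t tab[8])
{
    int i;

    assert(tab);
    for (i = 7; i >= 0; i--) {
        if (tab[i] != 0) {
            return i * 32 + highest_set_bit(tab[i]);
        }
    }
    return -1;
}
```

With bit 4 set in tab[1] and lower vectors pending in tab[0], the result is
36 (0x24). apic_sync_vapic then stores 0x24 & 0xf0 = 0x20 as the ISR priority
class, or the full vector when it reports the IRR.
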
> +
> +static void apic_sync_vapic(APICCommonState *s, int sync_type)
> +{
> +    VAPICState vapic_state;
> +    size_t length;
> +    off_t start;
> +    int vector;
> +
> +    if (!s->vapic_paddr) {
> +        return;
> +    }
> +    if (sync_type & SYNC_FROM_VAPIC) {
> +        cpu_physical_memory_rw(s->vapic_paddr, (void *)&vapic_state,
> +                               sizeof(vapic_state), 0);
> +        s->tpr = vapic_state.tpr;
> +    }
> +    if (sync_type & (SYNC_TO_VAPIC | SYNC_ISR_IRR_TO_VAPIC)) {
> +        start = offsetof(VAPICState, isr);
> +        length = offsetof(VAPICState, enabled) - offsetof(VAPICState, isr);
> +
> +        if (sync_type & SYNC_TO_VAPIC) {
> +            assert(qemu_cpu_is_self(s->cpu_env));
> +
> +            vapic_state.tpr = s->tpr;
> +            vapic_state.enabled = 1;
> +            start = 0;
> +            length = sizeof(VAPICState);
> +        }
> +
> +        vector = get_highest_priority_int(s->isr);
> +        if (vector < 0) {
> +            vector = 0;
> +        }
> +        vapic_state.isr = vector & 0xf0;
> +
> +        vapic_state.zero = 0;
> +
> +        vector = get_highest_priority_int(s->irr);
> +        if (vector < 0) {
> +            vector = 0;
> +        }
> +        vapic_state.irr = vector & 0xff;
> +
> +        cpu_physical_memory_write_rom(s->vapic_paddr + start,
> +                                      ((void *)&vapic_state) + start, length);

This assumes that the vapic_state structure matches what the guest
expects without conversion. Is this true for i386 on x86_64? I didn't
check the structure in question.
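
One way to make the layout assumption explicit is a compile-time check. The
sketch below copies the VAPICState definition from the hw/apic_internal.h
hunk of this patch and assumes a GCC-compatible compiler, spelling
QEMU_PACKED as __attribute__((packed)). The helper isr_irr_span is
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy of the structure from the patch, with QEMU_PACKED spelled out. */
typedef struct VAPICState {
    uint8_t tpr;
    uint8_t isr;
    uint8_t zero;
    uint8_t irr;
    uint8_t enabled;
} __attribute__((packed)) VAPICState;

/* Only byte-sized members: no padding and no byte-order dependence. */
static_assert(sizeof(VAPICState) == 5, "unexpected padding in VAPICState");
static_assert(offsetof(VAPICState, enabled) == 4, "unexpected field offset");

/* Hypothetical helper: bytes written for an ISR/IRR-only sync. */
static size_t isr_irr_span(void)
{
    return offsetof(VAPICState, enabled) - offsetof(VAPICState, isr);
}
```

If these assertions hold on the host, the raw copy in apic_sync_vapic writes
the same bytes regardless of host word size.
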

> +    }
> +}
> +
> +static void apic_vapic_base_update(APICCommonState *s)
> +{
> +    apic_sync_vapic(s, SYNC_TO_VAPIC);
> +}
> +
>  static void apic_local_deliver(APICCommonState *s, int vector)
>  {
>     uint32_t lvt = s->lvt[vector];
> @@ -239,20 +307,17 @@ static void apic_set_base(APICCommonState *s, uint64_t val)
>
>  static void apic_set_tpr(APICCommonState *s, uint8_t val)
>  {
> -    s->tpr = (val & 0x0f) << 4;
> -    apic_update_irq(s);
> +    /* Updates from cr8 are ignored while the VAPIC is active */
> +    if (!s->vapic_paddr) {
> +        s->tpr = val << 4;
> +        apic_update_irq(s);
> +    }
>  }
>
> -/* return -1 if no bit is set */
> -static int get_highest_priority_int(uint32_t *tab)
> +static uint8_t apic_get_tpr(APICCommonState *s)
>  {
> -    int i;
> -    for(i = 7; i >= 0; i--) {
> -        if (tab[i] != 0) {
> -            return i * 32 + fls_bit(tab[i]);
> -        }
> -    }
> -    return -1;
> +    apic_sync_vapic(s, SYNC_FROM_VAPIC);
> +    return s->tpr >> 4;
>  }
>
>  static int apic_get_ppr(APICCommonState *s)
> @@ -312,6 +377,14 @@ static void apic_update_irq(APICCommonState *s)
>     }
>  }
>
> +void apic_poll_irq(DeviceState *d)
> +{
> +    APICCommonState *s = APIC_COMMON(d);
> +
> +    apic_sync_vapic(s, SYNC_FROM_VAPIC);
> +    apic_update_irq(s);
> +}
> +
>  static void apic_set_irq(APICCommonState *s, int vector_num, int trigger_mode)
>  {
>     apic_report_irq_delivered(!get_bit(s->irr, vector_num));
> @@ -321,6 +394,16 @@ static void apic_set_irq(APICCommonState *s, int vector_num, int trigger_mode)
>         set_bit(s->tmr, vector_num);
>     else
>         reset_bit(s->tmr, vector_num);
> +    if (s->vapic_paddr) {
> +        apic_sync_vapic(s, SYNC_ISR_IRR_TO_VAPIC);
> +        /*
> +         * The vcpu thread needs to see the new IRR before we pull its current
> +         * TPR value. That way, if we miss a lowering of the TRP, the guest
> +         * has the chance to notice the new IRR and poll for IRQs on its own.
> +         */
> +        smp_wmb();
> +        apic_sync_vapic(s, SYNC_FROM_VAPIC);
> +    }
>     apic_update_irq(s);
>  }
>
> @@ -334,6 +417,7 @@ static void apic_eoi(APICCommonState *s)
>     if (!(s->spurious_vec & APIC_SV_DIRECTED_IO) && get_bit(s->tmr, isrv)) {
>         ioapic_eoi_broadcast(isrv);
>     }
> +    apic_sync_vapic(s, SYNC_FROM_VAPIC | SYNC_TO_VAPIC);
>     apic_update_irq(s);
>  }
>
> @@ -471,15 +555,19 @@ int apic_get_interrupt(DeviceState *d)
>     if (!(s->spurious_vec & APIC_SV_ENABLE))
>         return -1;
>
> +    apic_sync_vapic(s, SYNC_FROM_VAPIC);
>     intno = apic_irq_pending(s);
>
>     if (intno == 0) {
> +        apic_sync_vapic(s, SYNC_TO_VAPIC);
>         return -1;
>     } else if (intno < 0) {
> +        apic_sync_vapic(s, SYNC_TO_VAPIC);
>         return s->spurious_vec & 0xff;
>     }
>     reset_bit(s->irr, intno);
>     set_bit(s->isr, intno);
> +    apic_sync_vapic(s, SYNC_TO_VAPIC);
>     apic_update_irq(s);
>     return intno;
>  }
> @@ -576,6 +664,10 @@ static uint32_t apic_mem_readl(void *opaque, target_phys_addr_t addr)
>         val = 0x11 | ((APIC_LVT_NB - 1) << 16); /* version 0x11 */
>         break;
>     case 0x08:
> +        apic_sync_vapic(s, SYNC_FROM_VAPIC);
> +        if (apic_report_tpr_access) {
> +            cpu_report_tpr_access(s->cpu_env, TPR_ACCESS_READ);
> +        }
>         val = s->tpr;
>         break;
>     case 0x09:
> @@ -675,7 +767,11 @@ static void apic_mem_writel(void *opaque, target_phys_addr_t addr, uint32_t val)
>     case 0x03:
>         break;
>     case 0x08:
> +        if (apic_report_tpr_access) {
> +            cpu_report_tpr_access(s->cpu_env, TPR_ACCESS_WRITE);
> +        }
>         s->tpr = val;
> +        apic_sync_vapic(s, SYNC_TO_VAPIC);
>         apic_update_irq(s);
>         break;
>     case 0x09:
> @@ -737,6 +833,11 @@ static void apic_mem_writel(void *opaque, target_phys_addr_t addr, uint32_t val)
>     }
>  }
>
> +static void apic_pre_save(APICCommonState *s)
> +{
> +    apic_sync_vapic(s, SYNC_FROM_VAPIC);
> +}
> +
>  static void apic_post_load(APICCommonState *s)
>  {
>     if (s->timer_expiry != -1) {
> @@ -770,7 +871,10 @@ static void apic_class_init(ObjectClass *klass, void *data)
>     k->init = apic_init;
>     k->set_base = apic_set_base;
>     k->set_tpr = apic_set_tpr;
> +    k->get_tpr = apic_get_tpr;
> +    k->vapic_base_update = apic_vapic_base_update;
>     k->external_nmi = apic_external_nmi;
> +    k->pre_save = apic_pre_save;
>     k->post_load = apic_post_load;
>  }
>
> diff --git a/hw/apic_common.c b/hw/apic_common.c
> index 588531b..1977da7 100644
> --- a/hw/apic_common.c
> +++ b/hw/apic_common.c
> @@ -20,8 +20,10 @@
>  #include "apic.h"
>  #include "apic_internal.h"
>  #include "trace.h"
> +#include "kvm.h"
>
>  static int apic_irq_delivered;
> +bool apic_report_tpr_access;

This should go to APICCommonState.

>
>  void cpu_set_apic_base(DeviceState *d, uint64_t val)
>  {
> @@ -63,13 +65,44 @@ void cpu_set_apic_tpr(DeviceState *d, uint8_t val)
>
>  uint8_t cpu_get_apic_tpr(DeviceState *d)
>  {
> +    APICCommonState *s;
> +    APICCommonClass *info;
> +
> +    if (!d) {
> +        return 0;
> +    }
> +
> +    s = APIC_COMMON(d);
> +    info = APIC_COMMON_GET_CLASS(s);
> +
> +    return info->get_tpr(s);
> +}
> +
> +void apic_enable_tpr_access_reporting(DeviceState *d)
> +{
> +    APICCommonState *s = DO_UPCAST(APICCommonState, busdev.qdev, d);
> +    APICCommonClass *info = APIC_COMMON_GET_CLASS(s);
> +
> +    apic_report_tpr_access = true;
> +    if (info->enable_tpr_reporting) {
> +        info->enable_tpr_reporting(s);
> +    }
> +}
> +
> +void apic_enable_vapic(DeviceState *d, target_phys_addr_t paddr)
> +{
>     APICCommonState *s = DO_UPCAST(APICCommonState, busdev.qdev, d);
> +    APICCommonClass *info = APIC_COMMON_GET_CLASS(s);
>
> -    return s ? s->tpr >> 4 : 0;
> +    s->vapic_paddr = paddr;
> +    info->vapic_base_update(s);
>  }
>
>  void apic_handle_tpr_access_report(DeviceState *d, target_ulong ip, int access)
>  {
> +    APICCommonState *s = DO_UPCAST(APICCommonState, busdev.qdev, d);
> +
> +    vapic_report_tpr_access(s->vapic, s->cpu_env, ip, access);
>  }
>
>  void apic_report_irq_delivered(int delivered)
> @@ -170,12 +203,16 @@ void apic_init_reset(DeviceState *d)
>  static void apic_reset_common(DeviceState *d)
>  {
>     APICCommonState *s = DO_UPCAST(APICCommonState, busdev.qdev, d);
> +    APICCommonClass *info = APIC_COMMON_GET_CLASS(s);
>     bool bsp;
>
>     bsp = cpu_is_bsp(s->cpu_env);
>     s->apicbase = 0xfee00000 |
>         (bsp ? MSR_IA32_APICBASE_BSP : 0) | MSR_IA32_APICBASE_ENABLE;
>
> +    s->vapic_paddr = 0;
> +    info->vapic_base_update(s);
> +
>     apic_init_reset(d);
>
>     if (bsp) {
> @@ -238,6 +275,7 @@ static int apic_init_common(SysBusDevice *dev)
>  {
>     APICCommonState *s = APIC_COMMON(dev);
>     APICCommonClass *info;
> +    static DeviceState *vapic;
>     static int apic_no;
>
>     if (apic_no >= MAX_APICS) {
> @@ -248,10 +286,29 @@ static int apic_init_common(SysBusDevice *dev)
>     info = APIC_COMMON_GET_CLASS(s);
>     info->init(s);
>
> -    sysbus_init_mmio(&s->busdev, &s->io_memory);
> +    sysbus_init_mmio(dev, &s->io_memory);
> +
> +    if (!vapic && s->vapic_control & VAPIC_ENABLE_MASK) {
> +        vapic = sysbus_create_simple("kvmvapic", -1, NULL);
> +    }
> +    s->vapic = vapic;
> +    if (apic_report_tpr_access && info->enable_tpr_reporting) {

I think you should not rely on apic_report_tpr_access being in a sane
condition during class init.

> +        info->enable_tpr_reporting(s);
> +    }
> +
>     return 0;
>  }
>
> +static void apic_dispatch_pre_save(void *opaque)
> +{
> +    APICCommonState *s = APIC_COMMON(opaque);
> +    APICCommonClass *info = APIC_COMMON_GET_CLASS(s);
> +
> +    if (info->pre_save) {
> +        info->pre_save(s);
> +    }
> +}
> +
>  static int apic_dispatch_post_load(void *opaque, int version_id)
>  {
>     APICCommonState *s = APIC_COMMON(opaque);
> @@ -269,6 +326,7 @@ static const VMStateDescription vmstate_apic_common = {
>     .minimum_version_id = 3,
>     .minimum_version_id_old = 1,
>     .load_state_old = apic_load_old,
> +    .pre_save = apic_dispatch_pre_save,
>     .post_load = apic_dispatch_post_load,
>     .fields = (VMStateField[]) {
>         VMSTATE_UINT32(apicbase, APICCommonState),
> @@ -298,6 +356,8 @@ static const VMStateDescription vmstate_apic_common = {
>  static Property apic_properties_common[] = {
>     DEFINE_PROP_UINT8("id", APICCommonState, id, -1),
>     DEFINE_PROP_PTR("cpu_env", APICCommonState, cpu_env),
> +    DEFINE_PROP_BIT("vapic", APICCommonState, vapic_control, VAPIC_ENABLE_BIT,
> +                    true),
>     DEFINE_PROP_END_OF_LIST(),
>  };
>
> diff --git a/hw/apic_internal.h b/hw/apic_internal.h
> index 0cab010..95cc7cf 100644
> --- a/hw/apic_internal.h
> +++ b/hw/apic_internal.h
> @@ -61,6 +61,9 @@
>  #define APIC_SV_DIRECTED_IO             (1<<12)
>  #define APIC_SV_ENABLE                  (1<<8)
>
> +#define VAPIC_ENABLE_BIT                0
> +#define VAPIC_ENABLE_MASK               (1 << VAPIC_ENABLE_BIT)
> +
>  #define MAX_APICS 255
>
>  #define MSI_SPACE_SIZE                  0x100000
> @@ -82,7 +85,11 @@ typedef struct APICCommonClass
>     void (*init)(APICCommonState *s);
>     void (*set_base)(APICCommonState *s, uint64_t val);
>     void (*set_tpr)(APICCommonState *s, uint8_t val);
> +    uint8_t (*get_tpr)(APICCommonState *s);
> +    void (*enable_tpr_reporting)(APICCommonState *s);
> +    void (*vapic_base_update)(APICCommonState *s);
>     void (*external_nmi)(APICCommonState *s);
> +    void (*pre_save)(APICCommonState *s);
>     void (*post_load)(APICCommonState *s);
>  } APICCommonClass;
>
> @@ -114,9 +121,29 @@ struct APICCommonState {
>     int64_t timer_expiry;
>     int sipi_vector;
>     int wait_for_sipi;
> +
> +    uint32_t vapic_control;
> +    DeviceState *vapic;
> +    target_phys_addr_t vapic_paddr; /* note: persistence via kvmvapic */
>  };
>
> +typedef struct VAPICState {
> +    uint8_t tpr;
> +    uint8_t isr;
> +    uint8_t zero;
> +    uint8_t irr;
> +    uint8_t enabled;
> +} QEMU_PACKED VAPICState;
> +
> +extern bool apic_report_tpr_access;
> +
>  void apic_report_irq_delivered(int delivered);
>  bool apic_next_timer(APICCommonState *s, int64_t current_time);
> +void apic_enable_tpr_access_reporting(DeviceState *d);
> +void apic_enable_vapic(DeviceState *d, target_phys_addr_t paddr);
> +void apic_poll_irq(DeviceState *d);
> +
> +void vapic_report_tpr_access(DeviceState *dev, void *cpu, target_ulong ip,
> +                             int access);
>
>  #endif /* !QEMU_APIC_INTERNAL_H */
> diff --git a/hw/kvm/apic.c b/hw/kvm/apic.c
> index dfc2ab3..326eb37 100644
> --- a/hw/kvm/apic.c
> +++ b/hw/kvm/apic.c
> @@ -92,6 +92,35 @@ static void kvm_apic_set_tpr(APICCommonState *s, uint8_t val)
>     s->tpr = (val & 0x0f) << 4;
>  }
>
> +static uint8_t kvm_apic_get_tpr(APICCommonState *s)
> +{
> +    return s->tpr >> 4;
> +}
> +
> +static void kvm_apic_enable_tpr_reporting(APICCommonState *s)
> +{
> +    struct kvm_tpr_access_ctl ctl = {
> +        .enabled = 1
> +    };
> +
> +    kvm_vcpu_ioctl(s->cpu_env, KVM_TPR_ACCESS_REPORTING, &ctl);
> +}
> +
> +static void kvm_apic_vapic_base_update(APICCommonState *s)
> +{
> +    struct kvm_vapic_addr vapid_addr = {
> +        .vapic_addr = s->vapic_paddr,
> +    };
> +    int ret;
> +
> +    ret = kvm_vcpu_ioctl(s->cpu_env, KVM_SET_VAPIC_ADDR, &vapid_addr);
> +    if (ret < 0) {
> +        fprintf(stderr, "KVM: setting VAPIC address failed (%s)\n",
> +                strerror(-ret));
> +        abort();
> +    }
> +}
> +
>  static void do_inject_external_nmi(void *data)
>  {
>     APICCommonState *s = data;
> @@ -129,6 +158,9 @@ static void kvm_apic_class_init(ObjectClass *klass, void *data)
>     k->init = kvm_apic_init;
>     k->set_base = kvm_apic_set_base;
>     k->set_tpr = kvm_apic_set_tpr;
> +    k->get_tpr = kvm_apic_get_tpr;
> +    k->enable_tpr_reporting = kvm_apic_enable_tpr_reporting;
> +    k->vapic_base_update = kvm_apic_vapic_base_update;
>     k->external_nmi = kvm_apic_external_nmi;
>  }
>
> diff --git a/hw/kvmvapic.c b/hw/kvmvapic.c
> new file mode 100644
> index 0000000..0c4d304
> --- /dev/null
> +++ b/hw/kvmvapic.c
> @@ -0,0 +1,774 @@
> +/*
> + * TPR optimization for 32-bit Windows guests
> + *
> + * Copyright (C) 2007-2008 Qumranet Technologies
> + * Copyright (C) 2012      Jan Kiszka, Siemens AG
> + *
> + * This work is licensed under the terms of the GNU GPL version 2, or
> + * (at your option) any later version. See the COPYING file in the
> + * top-level directory.
> + */
> +#include "sysemu.h"
> +#include "cpus.h"
> +#include "kvm.h"
> +#include "apic_internal.h"
> +
> +#define APIC_DEFAULT_ADDRESS    0xfee00000
> +
> +#define VAPIC_IO_PORT           0x7e
> +
> +#define VAPIC_INACTIVE          0
> +#define VAPIC_ACTIVE            1
> +#define VAPIC_STANDBY           2

Enums, please.
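
A sketch under the assumption that the numeric values must stay as they are,
since the state field is saved via VMSTATE_UINT32 and the commit message says
the VMState format is kept compatible. The type name VAPICMode and the helper
state_after_reset are hypothetical. The field itself would remain a uint32_t.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type name; explicit values preserve the migrated encoding. */
typedef enum VAPICMode {
    VAPIC_INACTIVE = 0,
    VAPIC_ACTIVE   = 1,
    VAPIC_STANDBY  = 2,
} VAPICMode;

/* Hypothetical helper mirroring vapic_reset: ACTIVE drops to STANDBY. */
static uint32_t state_after_reset(uint32_t state)
{
    assert(state <= VAPIC_STANDBY);
    return state == VAPIC_ACTIVE ? (uint32_t)VAPIC_STANDBY : state;
}
```
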

> +
> +#define VAPIC_CPU_SHIFT         7
> +
> +#define ROM_BLOCK_SIZE          512
> +#define ROM_BLOCK_MASK          (~(ROM_BLOCK_SIZE - 1))
> +
> +typedef struct VAPICHandlers {
> +    uint32_t set_tpr;
> +    uint32_t set_tpr_eax;
> +    uint32_t get_tpr[8];
> +    uint32_t get_tpr_stack;
> +} QEMU_PACKED VAPICHandlers;
> +
> +typedef struct GuestROMState {
> +    char signature[8];
> +    uint32_t vaddr;

This does not look 64 bit clean.

> +    uint32_t fixup_start;
> +    uint32_t fixup_end;
> +    uint32_t vapic_vaddr;
> +    uint32_t vapic_size;
> +    uint32_t vcpu_shift;
> +    uint32_t real_tpr_addr;
> +    VAPICHandlers up;
> +    VAPICHandlers mp;
> +} QEMU_PACKED GuestROMState;

Why packed, is this passed to guest directly?

> +
> +typedef struct VAPICROMState {
> +    SysBusDevice busdev;
> +    MemoryRegion io;
> +    MemoryRegion rom;
> +    bool rom_mapped_writable;

I'd put this later to avoid a structure hole.
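
To illustrate the hole, a reduced sketch with stand-in types (Blob replaces
the MemoryRegion members, and only two of the 32-bit fields are kept). It
assumes a typical ABI where bool is one byte and uint32_t is 4-byte aligned.
The struct and helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the leading MemoryRegion members. */
typedef struct Blob {
    uint64_t opaque[4];
} Blob;

/* Flag before the 32-bit fields: padding is inserted right after it. */
typedef struct FlagEarly {
    Blob rom;
    bool rom_mapped_writable;
    uint32_t state;
    uint32_t rom_state_paddr;
} FlagEarly;

/* Flag after the 32-bit fields: no padding between members. */
typedef struct FlagLate {
    Blob rom;
    uint32_t state;
    uint32_t rom_state_paddr;
    bool rom_mapped_writable;
} FlagLate;

/* Bytes of padding between the flag and the field that follows it. */
static size_t hole_after_early_flag(void)
{
    assert(offsetof(FlagEarly, state) > offsetof(FlagEarly, rom_mapped_writable));
    return offsetof(FlagEarly, state) -
           (offsetof(FlagEarly, rom_mapped_writable) + sizeof(bool));
}

/* Bytes of padding between the last 32-bit field and the flag. */
static size_t hole_before_late_flag(void)
{
    return offsetof(FlagLate, rom_mapped_writable) -
           (offsetof(FlagLate, rom_state_paddr) + sizeof(uint32_t));
}
```

Moving the flag removes the interior hole, but the compiler may still add
tail padding, so the total structure size does not necessarily shrink.
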

> +    uint32_t state;
> +    uint32_t rom_state_paddr;
> +    uint32_t rom_state_vaddr;
> +    uint32_t vapic_paddr;
> +    uint32_t real_tpr_addr;
> +    GuestROMState rom_state;
> +    size_t rom_size;
> +} VAPICROMState;
> +
> +#define TPR_INSTR_IS_WRITE              0x1
> +#define TPR_INSTR_ABS_MODRM             0x2
> +#define TPR_INSTR_MATCH_MODRM_REG       0x4
> +
> +typedef struct TPRInstruction {
> +    uint8_t opcode;
> +    uint8_t modrm_reg;
> +    unsigned int flags;
> +    size_t length;
> +    off_t addr_offset;
> +} TPRInstruction;

Also here the order is pessimized.

> +
> +/* must be sorted by length, shortest first */
> +static const TPRInstruction tpr_instr[] = {
> +    { /* mov abs to eax */
> +        .opcode = 0xa1,
> +        .length = 5,
> +        .addr_offset = 1,
> +    },
> +    { /* mov eax to abs */
> +        .opcode = 0xa3,
> +        .flags = TPR_INSTR_IS_WRITE,
> +        .length = 5,
> +        .addr_offset = 1,
> +    },
> +    { /* mov r32 to r/m32 */
> +        .opcode = 0x89,
> +        .flags = TPR_INSTR_IS_WRITE | TPR_INSTR_ABS_MODRM,
> +        .length = 6,
> +        .addr_offset = 2,
> +    },
> +    { /* mov r/m32 to r32 */
> +        .opcode = 0x8b,
> +        .flags = TPR_INSTR_ABS_MODRM,
> +        .length = 6,
> +        .addr_offset = 2,
> +    },
> +    { /* push r/m32 */
> +        .opcode = 0xff,
> +        .modrm_reg = 6,
> +        .flags = TPR_INSTR_ABS_MODRM | TPR_INSTR_MATCH_MODRM_REG,
> +        .length = 6,
> +        .addr_offset = 2,
> +    },
> +    { /* mov imm32, r/m32 (c7/0) */
> +        .opcode = 0xc7,
> +        .modrm_reg = 0,
> +        .flags = TPR_INSTR_IS_WRITE | TPR_INSTR_ABS_MODRM |
> +                 TPR_INSTR_MATCH_MODRM_REG,
> +        .length = 10,
> +        .addr_offset = 2,
> +    },
> +};
> +
> +static void read_guest_rom_state(VAPICROMState *s)
> +{
> +    cpu_physical_memory_rw(s->rom_state_paddr, (void *)&s->rom_state,
> +                           sizeof(GuestROMState), 0);
> +}
> +
> +static void write_guest_rom_state(VAPICROMState *s)
> +{
> +    cpu_physical_memory_rw(s->rom_state_paddr, (void *)&s->rom_state,
> +                           sizeof(GuestROMState), 1);
> +}
> +
> +static void update_guest_rom_state(VAPICROMState *s)
> +{
> +    read_guest_rom_state(s);
> +
> +    s->rom_state.real_tpr_addr = cpu_to_le32(s->real_tpr_addr);
> +    s->rom_state.vcpu_shift = cpu_to_le32(VAPIC_CPU_SHIFT);
> +
> +    write_guest_rom_state(s);
> +}
> +
> +static int find_real_tpr_addr(VAPICROMState *s, CPUState *env)
> +{
> +    target_phys_addr_t paddr;
> +    target_ulong addr;
> +
> +    if (s->state == VAPIC_ACTIVE) {
> +        return 0;
> +    }
> +    for (addr = 0xfffff000; addr >= 0x80000000; addr -= TARGET_PAGE_SIZE) {
> +        paddr = cpu_get_phys_page_debug(env, addr);
> +        if (paddr != APIC_DEFAULT_ADDRESS) {
> +            continue;
> +        }
> +        s->real_tpr_addr = addr + 0x80;
> +        update_guest_rom_state(s);
> +        return 0;
> +    }

This loop looks odd, what should it do, probe for unused address?
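
As far as the quoted code shows, the loop does not probe for an unused address: it walks the upper 2 GB of the guest virtual address space from the top down and stops at the first page that translates to the APIC's physical page, then records that page plus 0x80 as the TPR address. A rough standalone sketch of that scan follows; the translation callbacks are hypothetical stand-ins for cpu_get_phys_page_debug(), and the 4 KB page size is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE            0x1000u      /* assumed 4 KB target pages */
#define APIC_DEFAULT_ADDRESS 0xfee00000u

/* Hypothetical translator: pretend the guest mapped the APIC page at one
 * fixed kernel virtual address and nothing else. */
static uint32_t mock_translate(uint32_t vaddr)
{
    return vaddr == 0xfffe0000u ? APIC_DEFAULT_ADDRESS : 0xffffffffu;
}

/* Hypothetical translator for a guest that has not mapped the APIC. */
static uint32_t mock_unmapped(uint32_t vaddr)
{
    (void)vaddr;
    return 0xffffffffu;
}

/* Scan 0xfffff000 down to 0x80000000, one page per step. Returns the
 * virtual address of the TPR (APIC page + 0x80), or 0 if no page in that
 * range maps the APIC. */
static uint32_t find_tpr_vaddr(uint32_t (*translate)(uint32_t))
{
    uint32_t addr;

    assert(translate != NULL);
    for (addr = 0xfffff000u; addr >= 0x80000000u; addr -= PAGE_SIZE) {
        if (translate(addr) == APIC_DEFAULT_ADDRESS) {
            return addr + 0x80;
        }
    }
    return 0;
}
```

The loop terminates because the last step takes addr from 0x80000000 to 0x7ffff000, which fails the lower-bound test before any wraparound can occur.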

> +    return -1;
> +}
> +
> +static uint8_t modrm_reg(uint8_t modrm)
> +{
> +    return (modrm >> 3) & 7;
> +}
> +
> +static bool is_abs_modrm(uint8_t modrm)
> +{
> +    return (modrm & 0xc7) == 0x05;
> +}
> +
> +static bool opcode_matches(uint8_t *opcode, const TPRInstruction *instr)
> +{
> +    return opcode[0] == instr->opcode &&
> +        (!(instr->flags & TPR_INSTR_ABS_MODRM) || is_abs_modrm(opcode[1])) &&
> +        (!(instr->flags & TPR_INSTR_MATCH_MODRM_REG) ||
> +         modrm_reg(opcode[1]) == instr->modrm_reg);
> +}
> +
> +static int evaluate_tpr_instruction(VAPICROMState *s, CPUState *env,
> +                                    target_ulong *pip, int access)
> +{
> +    const TPRInstruction *instr;
> +    target_ulong ip = *pip;
> +    uint8_t opcode[2];
> +    uint32_t real_tpr_addr;
> +    int i;
> +
> +    if ((ip & 0xf0000000) != 0x80000000 && (ip & 0xf0000000) != 0xe0000000) {

The constants should be using ULL suffix because target_ulong could be
64 bit, though maybe this is more optimal.

> +        return -1;
> +    }
> +
> +    /*
> +     * Early Windows 2003 SMP initialization contains a
> +     *
> +     *   mov imm32, r/m32
> +     *
> +     * instruction that is patched by TPR optimization. The problem is that
> +     * RSP, used by the patched instruction, is zero, so the guest gets a
> +     * double fault and dies.
> +     */
> +    if (env->regs[R_ESP] == 0) {
> +        return -1;
> +    }
> +
> +    if (access == TPR_ACCESS_WRITE && kvm_enabled() &&
> +        !kvm_irqchip_in_kernel()) {
> +        /*
> +         * KVM without TPR access reporting calls into the user space APIC on
> +         * write with IP pointing after the accessing instruction. So we need
> +         * to look backward to find the reason.
> +         */
> +        for (i = 0; i < ARRAY_SIZE(tpr_instr); i++) {
> +            instr = &tpr_instr[i];
> +            if (!(instr->flags & TPR_INSTR_IS_WRITE)) {
> +                continue;
> +            }
> +            if (cpu_memory_rw_debug(env, ip - instr->length, opcode,
> +                                    sizeof(opcode), 0) < 0) {
> +                return -1;
> +            }
> +            if (opcode_matches(opcode, instr)) {
> +                ip -= instr->length;
> +                goto instruction_ok;
> +            }
> +        }
> +        return -1;
> +    } else {
> +        if (cpu_memory_rw_debug(env, ip, opcode, sizeof(opcode), 0) < 0) {
> +            return -1;
> +        }
> +        for (i = 0; i < ARRAY_SIZE(tpr_instr); i++) {
> +            instr = &tpr_instr[i];
> +            if (opcode_matches(opcode, instr)) {
> +                goto instruction_ok;
> +            }
> +        }
> +        return -1;
> +    }
> +
> +instruction_ok:
> +    /*
> +     * Grab the virtual TPR address from the instruction
> +     * and update the cached values.
> +     */
> +    if (cpu_memory_rw_debug(env, ip + instr->addr_offset,
> +                            (void *)&real_tpr_addr,
> +                            sizeof(real_tpr_addr), 0) < 0) {
> +        return -1;
> +    }
> +    real_tpr_addr = le32_to_cpu(real_tpr_addr);
> +    if ((real_tpr_addr & 0xfff) != 0x80) {
> +        return -1;
> +    }
> +    s->real_tpr_addr = real_tpr_addr;
> +    update_guest_rom_state(s);
> +
> +    *pip = ip;
> +    return 0;
> +}
> +
> +static int update_rom_mapping(VAPICROMState *s, CPUState *env, target_ulong ip)
> +{
> +    target_phys_addr_t paddr;
> +    uint32_t rom_state_vaddr;
> +    uint32_t pos, patch, offset;
> +
> +    /* nothing to do if already activated */
> +    if (s->state == VAPIC_ACTIVE) {
> +        return 0;
> +    }
> +
> +    /* bail out if ROM init code was not executed (missing ROM?) */
> +    if (s->state == VAPIC_INACTIVE) {
> +        return -1;
> +    }
> +
> +    /* find out virtual address of the ROM */
> +    rom_state_vaddr = s->rom_state_paddr + (ip & 0xf0000000);
> +    paddr = cpu_get_phys_page_debug(env, rom_state_vaddr);
> +    if (paddr == -1) {
> +        return -1;
> +    }
> +    paddr += rom_state_vaddr & ~TARGET_PAGE_MASK;
> +    if (paddr != s->rom_state_paddr) {
> +        return -1;
> +    }
> +    read_guest_rom_state(s);
> +    if (memcmp(s->rom_state.signature, "kvm aPiC", 8) != 0) {
> +        return -1;
> +    }
> +    s->rom_state_vaddr = rom_state_vaddr;
> +
> +    /* fixup addresses in ROM if needed */
> +    if (rom_state_vaddr == le32_to_cpu(s->rom_state.vaddr)) {
> +        return 0;
> +    }
> +    for (pos = le32_to_cpu(s->rom_state.fixup_start);
> +         pos < le32_to_cpu(s->rom_state.fixup_end);
> +         pos += 4) {
> +        cpu_physical_memory_rw(paddr + pos - s->rom_state.vaddr,
> +                               (void *)&offset, sizeof(offset), 0);
> +        offset = le32_to_cpu(offset);
> +        cpu_physical_memory_rw(paddr + offset, (void *)&patch,
> +                               sizeof(patch), 0);
> +        patch = le32_to_cpu(patch);
> +        patch += rom_state_vaddr - le32_to_cpu(s->rom_state.vaddr);
> +        patch = cpu_to_le32(patch);
> +        cpu_physical_memory_rw(paddr + offset, (void *)&patch,
> +                               sizeof(patch), 1);
> +    }
> +    read_guest_rom_state(s);
> +    s->vapic_paddr = paddr + le32_to_cpu(s->rom_state.vapic_vaddr) -
> +        le32_to_cpu(s->rom_state.vaddr);
> +
> +    return 0;
> +}
> +
> +/*
> + * Tries to read the unique processor number from the Kernel Processor Control
> + * Region (KPCR) of 32-bit Windows. Returns -1 if the KPCR cannot be accessed
> + * or is considered invalid.
> + */

Horrible hack. Is guest OS type or version checked somewhere?

> +static int get_kpcr_number(CPUState *env)
> +{
> +    struct kpcr {
> +        uint8_t  fill1[0x1c];
> +        uint32_t self;
> +        uint8_t  fill2[0x31];
> +        uint8_t  number;
> +    } QEMU_PACKED kpcr;

KPCR. Pointers to Windows documentation would be nice.
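
Until such pointers are added, the offsets the patch relies on can at least be read off the struct itself: the self pointer sits at 0x1c and the processor number at 0x1c + 4 + 0x31 = 0x51. The sketch below spells that out with compile-time checks; it uses the GCC/Clang packed attribute in place of QEMU_PACKED, and kpcr_number is an illustrative helper rather than code from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same field layout as the struct in the quoted patch. */
struct kpcr {
    uint8_t  fill1[0x1c];
    uint32_t self;        /* compared against the FS segment base */
    uint8_t  fill2[0x31];
    uint8_t  number;      /* processor number */
} __attribute__((packed));

_Static_assert(offsetof(struct kpcr, self) == 0x1c, "self pointer offset");
_Static_assert(offsetof(struct kpcr, number) == 0x51, "number offset");

/* Mirrors the validity check in get_kpcr_number(): accept the block only
 * if its self pointer equals the FS base it was read from. */
static int kpcr_number(const struct kpcr *k, uint32_t fs_base)
{
    assert(k != NULL);
    return k->self == fs_base ? k->number : -1;
}
```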

> +
> +    if (cpu_memory_rw_debug(env, env->segs[R_FS].base,
> +                            (void *)&kpcr, sizeof(kpcr), 0) < 0 ||
> +        kpcr.self != env->segs[R_FS].base) {
> +        return -1;
> +    }
> +    return kpcr.number;
> +}
> +
> +static int vapic_enable(VAPICROMState *s, CPUState *env)
> +{
> +    int cpu_number = get_kpcr_number(env);
> +    target_phys_addr_t vapic_paddr;
> +    static const uint8_t enabled = 1;
> +
> +    if (cpu_number < 0) {
> +        return -1;
> +    }
> +    vapic_paddr = s->vapic_paddr +
> +        (((target_phys_addr_t)cpu_number) << VAPIC_CPU_SHIFT);
> +    cpu_physical_memory_rw(vapic_paddr + offsetof(VAPICState, enabled),
> +                           (void *)&enabled, sizeof(enabled), 1);
> +    apic_enable_vapic(env->apic_state, vapic_paddr);
> +
> +    s->state = VAPIC_ACTIVE;
> +
> +    return 0;
> +}
> +
> +static void patch_byte(CPUState *env, target_ulong addr, uint8_t byte)
> +{
> +    cpu_memory_rw_debug(env, addr, &byte, 1, 1);
> +}
> +
> +static void patch_call(VAPICROMState *s, CPUState *env, target_ulong ip,
> +                       uint32_t target)
> +{
> +    uint32_t offset;
> +
> +    offset = cpu_to_le32(target - ip - 5);
> +    patch_byte(env, ip, 0xe8); /* call near */
> +    cpu_memory_rw_debug(env, ip + 1, (void *)&offset, sizeof(offset), 1);
> +}
> +
> +static void patch_instruction(VAPICROMState *s, CPUState *env, target_ulong ip)
> +{
> +    target_phys_addr_t paddr;
> +    VAPICHandlers *handlers;
> +    uint8_t opcode[2];
> +    uint32_t imm32;
> +
> +    if (smp_cpus == 1) {
> +        handlers = &s->rom_state.up;
> +    } else {
> +        handlers = &s->rom_state.mp;
> +    }
> +
> +    pause_all_vcpus();
> +
> +    cpu_memory_rw_debug(env, ip, opcode, sizeof(opcode), 0);
> +
> +    switch (opcode[0]) {
> +    case 0x89: /* mov r32 to r/m32 */
> +        patch_byte(env, ip, 0x50 + modrm_reg(opcode[1]));  /* push reg */
> +        patch_call(s, env, ip + 1, handlers->set_tpr);
> +        break;
> +    case 0x8b: /* mov r/m32 to r32 */
> +        patch_byte(env, ip, 0x90);
> +        patch_call(s, env, ip + 1, handlers->get_tpr[modrm_reg(opcode[1])]);
> +        break;
> +    case 0xa1: /* mov abs to eax */
> +        patch_call(s, env, ip, handlers->get_tpr[0]);
> +        break;
> +    case 0xa3: /* mov eax to abs */
> +        patch_call(s, env, ip, handlers->set_tpr_eax);
> +        break;
> +    case 0xc7: /* mov imm32, r/m32 (c7/0) */
> +        patch_byte(env, ip, 0x68);  /* push imm32 */
> +        cpu_memory_rw_debug(env, ip + 6, (void *)&imm32, sizeof(imm32), 0);
> +        cpu_memory_rw_debug(env, ip + 1, (void *)&imm32, sizeof(imm32), 1);
> +        patch_call(s, env, ip + 5, handlers->set_tpr);
> +        break;
> +    case 0xff: /* push r/m32 */
> +        patch_byte(env, ip, 0x50); /* push eax */
> +        patch_call(s, env, ip + 1, handlers->get_tpr_stack);
> +        break;
> +    default:
> +        abort();
> +    }
> +
> +    resume_all_vcpus();
> +
> +    paddr = cpu_get_phys_page_debug(env, ip);
> +    paddr += ip & ~TARGET_PAGE_MASK;
> +    tb_invalidate_phys_page_range(paddr, paddr + 1, 1);
> +}
> +
> +void vapic_report_tpr_access(DeviceState *dev, void *cpu, target_ulong ip,
> +                             int access)
> +{
> +    VAPICROMState *s = DO_UPCAST(VAPICROMState, busdev.qdev, dev);
> +    CPUState *env = cpu;
> +
> +    cpu_synchronize_state(env);
> +
> +    if (evaluate_tpr_instruction(s, env, &ip, access) < 0) {
> +        if (s->state == VAPIC_ACTIVE) {
> +            vapic_enable(s, env);
> +        }
> +        return;
> +    }
> +    if (update_rom_mapping(s, env, ip) < 0) {
> +        return;
> +    }
> +    if (vapic_enable(s, env) < 0) {
> +        return;
> +    }
> +    patch_instruction(s, env, ip);
> +}
> +
> +static void vapic_reset(DeviceState *dev)
> +{
> +    VAPICROMState *s = DO_UPCAST(VAPICROMState, busdev.qdev, dev);
> +
> +    if (s->state == VAPIC_ACTIVE) {
> +        s->state = VAPIC_STANDBY;
> +    }
> +}
> +
> +static int patch_hypercalls(VAPICROMState *s)
> +{
> +    target_phys_addr_t rom_paddr = s->rom_state_paddr & ROM_BLOCK_MASK;
> +    static uint8_t vmcall_pattern[] = {

const

> +        0xb8, 0x1, 0, 0, 0, 0xf, 0x1, 0xc1
> +    };
> +    static uint8_t outl_pattern[] = {

const

> +        0xb8, 0x1, 0, 0, 0, 0x90, 0xe7, 0x7e
> +    };
> +    uint8_t alternates[2];
> +    uint8_t *pattern;
> +    uint8_t *patch;
> +    int patches = 0;
> +    off_t pos;
> +    uint8_t *rom;
> +
> +    rom = g_malloc(s->rom_size);
> +    cpu_physical_memory_rw(rom_paddr, rom, s->rom_size, 0);
> +
> +    for (pos = 0; pos < s->rom_size - sizeof(vmcall_pattern); pos++) {
> +        if (kvm_irqchip_in_kernel()) {
> +            pattern = outl_pattern;
> +            alternates[0] = outl_pattern[7];
> +            alternates[1] = outl_pattern[7];
> +            patch = &vmcall_pattern[5];
> +        } else {
> +            pattern = vmcall_pattern;
> +            alternates[0] = vmcall_pattern[7];
> +            alternates[1] = 0xd9; /* AMD's VMMCALL */
> +            patch = &outl_pattern[5];
> +        }
> +        if (memcmp(rom + pos, pattern, 7) == 0 &&
> +            (rom[pos + 7] == alternates[0] || rom[pos + 7] == alternates[1])) {
> +            cpu_physical_memory_rw(rom_paddr + pos + 5, patch, 3, 1);
> +            /*
> +             * Don't flush the tb here. Under ordinary conditions, the patched
> +             * calls are miles away from the current IP. Under malicious
> +             * conditions, the guest could trick us to crash.
> +             */
> +        }
> +    }
> +
> +    g_free(rom);
> +
> +    if (patches != 0 && patches != 2) {
> +        return -1;
> +    }
> +
> +    return 0;
> +}
> +
> +static void vapic_map_rom_writable(VAPICROMState *s)
> +{
> +    target_phys_addr_t rom_paddr = s->rom_state_paddr & ROM_BLOCK_MASK;
> +    MemoryRegionSection section;
> +    MemoryRegion *as;
> +    size_t rom_size;
> +    uint8_t *ram;
> +
> +    as = sysbus_address_space(&s->busdev);
> +
> +    if (s->rom_mapped_writable) {
> +        memory_region_del_subregion(as, &s->rom);
> +        memory_region_destroy(&s->rom);
> +    }
> +
> +    /* grab RAM memory region (region @rom_paddr may still be pc.rom) */
> +    section = memory_region_find(as, 0, 1);
> +
> +    /* read ROM size from RAM region */
> +    ram = memory_region_get_ram_ptr(section.mr);
> +    rom_size = ram[rom_paddr + 2] * ROM_BLOCK_SIZE;
> +    s->rom_size = rom_size;
> +
> +    /* We need to round up to avoid creating subpages
> +     * from which we cannot run code. */
> +    rom_size = TARGET_PAGE_ALIGN(rom_size);
> +
> +    memory_region_init_alias(&s->rom, "kvmvapic-rom", section.mr, rom_paddr,
> +                             rom_size);
> +    memory_region_add_subregion_overlap(as, rom_paddr, &s->rom, 1000);
> +    s->rom_mapped_writable = true;
> +}
> +
> +static void do_enable_tpr_reporting(void *data)
> +{
> +    CPUState *env = data;
> +
> +    apic_enable_tpr_access_reporting(env->apic_state);
> +}
> +
> +static void vapic_enable_tpr_reporting(void)
> +{
> +    CPUState *env = cpu_single_env;
> +
> +    for (env = first_cpu; env != NULL; env = env->next_cpu) {
> +        run_on_cpu(env, do_enable_tpr_reporting, env);
> +    }
> +}
> +
> +static int vapic_prepare(VAPICROMState *s)
> +{
> +    vapic_map_rom_writable(s);
> +
> +    if (patch_hypercalls(s) < 0) {
> +        return -1;
> +    }
> +
> +    vapic_enable_tpr_reporting();
> +
> +    return 0;
> +}
> +
> +static void vapic_write(void *opaque, target_phys_addr_t addr, uint64_t data,
> +                        unsigned int size)
> +{
> +    CPUState *env = cpu_single_env;
> +    target_phys_addr_t rom_paddr;
> +    VAPICROMState *s = opaque;
> +
> +    cpu_synchronize_state(env);
> +
> +    /*
> +     * The VAPIC supports two PIO-based hypercalls, both via port 0x7E.
> +     *  o 16-bit write access:
> +     *    Reports the option ROM initialization to the hypervisor. Written
> +     *    value is the offset of the state structure in the ROM.
> +     *  o 8-bit write access:
> +     *    Reactivates the VAPIC after a guest hibernation, i.e. after the
> +     *    option ROM content has been re-initialized by a guest power cycle.
> +     *  o 32-bit write access:
> +     *    Poll for pending IRQs, considering the current VAPIC state.
> +     */

Different operation depending on size? Interesting.
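
To make the width-based dispatch easier to see in isolation, here is a small sketch that maps the access size to the request it selects, following the switch in the quoted vapic_write(). The enum and function names are made up for illustration; only the size-to-operation mapping comes from the patch.

```c
#include <assert.h>

/* Hypothetical names for the three operations behind port 0x7e. */
typedef enum VAPICRequest {
    VAPIC_REQ_ROM_INIT,     /* 16-bit write: option ROM reports itself */
    VAPIC_REQ_REACTIVATE,   /* 8-bit write: re-enable after hibernation */
    VAPIC_REQ_POLL_IRQ,     /* 32-bit write: poll for pending IRQs */
} VAPICRequest;

/* Same mapping as the switch in vapic_write(); as in the patch, any other
 * width falls through to the IRQ poll. */
static VAPICRequest vapic_decode_request(unsigned int size)
{
    switch (size) {
    case 2:
        return VAPIC_REQ_ROM_INIT;
    case 1:
        return VAPIC_REQ_REACTIVATE;
    case 4:
    default:
        return VAPIC_REQ_POLL_IRQ;
    }
}

/* Usage: each access width selects exactly one request. */
static void vapic_decode_example(void)
{
    assert(vapic_decode_request(2) == VAPIC_REQ_ROM_INIT);
    assert(vapic_decode_request(1) == VAPIC_REQ_REACTIVATE);
    assert(vapic_decode_request(4) == VAPIC_REQ_POLL_IRQ);
}
```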

> +    switch (size) {
> +    case 2:
> +        if (s->state != VAPIC_INACTIVE) {
> +            patch_hypercalls(s);
> +            break;
> +        }
> +
> +        rom_paddr = (env->segs[R_CS].base + env->eip) & ROM_BLOCK_MASK;
> +        s->rom_state_paddr = rom_paddr + data;
> +
> +        if (vapic_prepare(s) < 0) {
> +            break;
> +        }
> +        s->state = VAPIC_STANDBY;
> +        break;
> +    case 1:
> +        if (kvm_enabled()) {
> +            /*
> +             * Disable triggering instruction in ROM by writing a NOP.
> +             *
> +             * We cannot do this in TCG mode as the reported IP is not
> +             * reliable.

Given the hack level of the whole, it would not be impossible to find
the IP using search PC.

> +             */
> +            pause_all_vcpus();
> +            patch_byte(env, env->eip - 2, 0x66);
> +            patch_byte(env, env->eip - 1, 0x90);
> +            resume_all_vcpus();
> +        }
> +
> +        if (s->state == VAPIC_ACTIVE) {
> +            break;
> +        }
> +        if (update_rom_mapping(s, env, env->eip) < 0) {
> +            break;
> +        }
> +        if (find_real_tpr_addr(s, env) < 0) {
> +            break;
> +        }
> +        vapic_enable(s, env);
> +        break;
> +    default:
> +    case 4:
> +        if (!kvm_irqchip_in_kernel()) {
> +            apic_poll_irq(env->apic_state);
> +        }
> +        break;
> +    }
> +}
> +
> +static const MemoryRegionOps vapic_ops = {
> +    .write = vapic_write,
> +    .endianness = DEVICE_NATIVE_ENDIAN,
> +};
> +
> +static int vapic_init(SysBusDevice *dev)
> +{
> +    VAPICROMState *s = FROM_SYSBUS(VAPICROMState, dev);
> +
> +    memory_region_init_io(&s->io, &vapic_ops, s, "kvmvapic", 2);
> +    sysbus_add_io(dev, VAPIC_IO_PORT, &s->io);
> +    sysbus_init_ioports(dev, VAPIC_IO_PORT, 2);
> +
> +    option_rom[nb_option_roms].name = "kvmvapic.bin";
> +    option_rom[nb_option_roms].bootindex = -1;
> +    nb_option_roms++;
> +
> +    return 0;
> +}
> +
> +static void do_vapic_enable(void *data)
> +{
> +    VAPICROMState *s = data;
> +
> +    vapic_enable(s, first_cpu);
> +}
> +
> +static int vapic_post_load(void *opaque, int version_id)
> +{
> +    VAPICROMState *s = opaque;
> +    uint8_t *zero;
> +
> +    /*
> +     * The old implementation of qemu-kvm did not provide the state
> +     * VAPIC_STANDBY. Reconstruct it.
> +     */
> +    if (s->state == VAPIC_INACTIVE && s->rom_state_paddr != 0) {
> +        s->state = VAPIC_STANDBY;
> +    }
> +
> +    if (s->state != VAPIC_INACTIVE) {
> +        if (vapic_prepare(s) < 0) {
> +            return -1;
> +        }
> +    }
> +    if (s->state == VAPIC_ACTIVE) {
> +        if (smp_cpus == 1) {
> +            run_on_cpu(first_cpu, do_vapic_enable, s);
> +        } else {
> +            zero = g_malloc0(s->rom_state.vapic_size);
> +            cpu_physical_memory_rw(s->vapic_paddr, zero,
> +                                   s->rom_state.vapic_size, 1);
> +            g_free(zero);
> +        }
> +    }
> +
> +    return 0;
> +}
> +
> +static const VMStateDescription vmstate_handlers = {
> +    .name = "kvmvapic-handlers",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .minimum_version_id_old = 1,
> +    .fields = (VMStateField[]) {
> +        VMSTATE_UINT32(set_tpr, VAPICHandlers),
> +        VMSTATE_UINT32(set_tpr_eax, VAPICHandlers),
> +        VMSTATE_UINT32_ARRAY(get_tpr, VAPICHandlers, 8),
> +        VMSTATE_UINT32(get_tpr_stack, VAPICHandlers),
> +        VMSTATE_END_OF_LIST()
> +    }
> +};
> +
> +static const VMStateDescription vmstate_guest_rom = {
> +    .name = "kvmvapic-guest-rom",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .minimum_version_id_old = 1,
> +    .fields = (VMStateField[]) {
> +        VMSTATE_UNUSED(8),     /* signature */
> +        VMSTATE_UINT32(vaddr, GuestROMState),
> +        VMSTATE_UINT32(fixup_start, GuestROMState),
> +        VMSTATE_UINT32(fixup_end, GuestROMState),
> +        VMSTATE_UINT32(vapic_vaddr, GuestROMState),
> +        VMSTATE_UINT32(vapic_size, GuestROMState),
> +        VMSTATE_UINT32(vcpu_shift, GuestROMState),
> +        VMSTATE_UINT32(real_tpr_addr, GuestROMState),
> +        VMSTATE_STRUCT(up, GuestROMState, 0, vmstate_handlers, VAPICHandlers),
> +        VMSTATE_STRUCT(mp, GuestROMState, 0, vmstate_handlers, VAPICHandlers),
> +        VMSTATE_END_OF_LIST()
> +    }
> +};
> +
> +static const VMStateDescription vmstate_vapic = {
> +    .name = "kvm-tpr-opt",      /* compatible with qemu-kvm VAPIC */
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .minimum_version_id_old = 1,
> +    .post_load = vapic_post_load,
> +    .fields = (VMStateField[]) {
> +        VMSTATE_STRUCT(rom_state, VAPICROMState, 0, vmstate_guest_rom,
> +                       GuestROMState),
> +        VMSTATE_UINT32(state, VAPICROMState),
> +        VMSTATE_UINT32(real_tpr_addr, VAPICROMState),
> +        VMSTATE_UINT32(rom_state_vaddr, VAPICROMState),
> +        VMSTATE_UINT32(vapic_paddr, VAPICROMState),
> +        VMSTATE_UINT32(rom_state_paddr, VAPICROMState),
> +        VMSTATE_END_OF_LIST()
> +    }
> +};
> +
> +static void vapic_class_init(ObjectClass *klass, void *data)
> +{
> +    SysBusDeviceClass *sc = SYS_BUS_DEVICE_CLASS(klass);
> +    DeviceClass *dc = DEVICE_CLASS(klass);
> +
> +    dc->no_user = 1;
> +    dc->reset   = vapic_reset;
> +    dc->vmsd    = &vmstate_vapic;
> +    sc->init    = vapic_init;
> +}
> +
> +static TypeInfo vapic_type = {
> +    .name          = "kvmvapic",
> +    .parent        = TYPE_SYS_BUS_DEVICE,
> +    .instance_size = sizeof(VAPICROMState),
> +    .class_init    = vapic_class_init,
> +};
> +
> +static void vapic_register(void)
> +{
> +    type_register_static(&vapic_type);
> +}
> +
> +device_init(vapic_register);
> --
> 1.7.3.4
>
>

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/8] kvm: Set cpu_single_env only once
  2012-02-11 14:12                           ` Andreas Färber
@ 2012-02-13  8:17                             ` Paolo Bonzini
  -1 siblings, 0 replies; 90+ messages in thread
From: Paolo Bonzini @ 2012-02-13  8:17 UTC (permalink / raw)
  To: Andreas Färber
  Cc: Jan Kiszka, Blue Swirl, Anthony Liguori, kvm, Gleb Natapov,
	Marcelo Tosatti, qemu-devel, Avi Kivity

On 02/11/2012 03:12 PM, Andreas Färber wrote:
> Yes and no. They can have any target-specific pointer they want, just
> as before. But no global first_cpu / cpu_single_env pointer - that's
> replaced by CPU pointers, through which members of derived classes can
> be accessed (which did not work for CPUState due to CPU_COMMON members
> being at target-specific offset in the middle).

Hmm, now I'm not even sure which of my wishes Andreas was referring to. :)

I definitely would like CPUState pointers to be changed into link 
properties, but that's not related to what Jan is doing here with 
cpu_single_env.  Each LAPIC refers to a CPU, and that would become a 
link property indeed.  But here we're using cpu_single_env to find out 
which LAPIC is being read.  It's the other direction.

Relying on thread-local cpu_single_env means that you restrict LAPIC
memory reads to run in VCPU thread context, and this makes sense anyway.
The only case of MMIO running in iothread context is Xen, but Xen
always keeps the LAPIC in the hypervisor.

Also, I think that having a view of CPUs in QOM is laudable, but I don't 
understand why that means you need to remove first_cpu / cpu_single_env.

Finally, CPU_COMMON members may be referenced from TCG-generated code.
How do you plan to move them and still keep the TLBs at small offsets
within CPUState?  Perhaps we need a drawing of the situation before and
after the QOMization of CPUs.

Paolo

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [Qemu-devel] [PATCH v2 5/8] kvmvapic: Introduce TPR access optimization for Windows guests
  2012-02-11 15:25     ` [Qemu-devel] " Blue Swirl
@ 2012-02-13 10:16       ` Jan Kiszka
  -1 siblings, 0 replies; 90+ messages in thread
From: Jan Kiszka @ 2012-02-13 10:16 UTC (permalink / raw)
  To: Blue Swirl
  Cc: Avi Kivity, Marcelo Tosatti, Anthony Liguori, qemu-devel, kvm,
	Gleb Natapov

On 2012-02-11 16:25, Blue Swirl wrote:
> On Fri, Feb 10, 2012 at 18:31, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>> This enables acceleration for MMIO-based TPR registers accesses of
>> 32-bit Windows guest systems. It is mostly useful with KVM enabled,
>> either on older Intel CPUs (without flexpriority feature, can also be
>> manually disabled for testing) or any current AMD processor.
>>
>> The approach introduced here is derived from the original version of
>> qemu-kvm. It was refactored, documented, and extended by support for
>> user space APIC emulation, both with and without KVM acceleration. The
>> VMState format was kept compatible, so was the ABI to the option ROM
>> that implements the guest-side para-virtualized driver service. This
>> enables seamless migration from qemu-kvm to upstream or, one day,
>> between KVM and TCG mode.
>>
>> The basic concept goes like this:
>>  - VAPIC PV interface consisting of I/O port 0x7e and (for KVM in-kernel
>>   irqchip) a vmcall hypercall is registered
>>  - VAPIC option ROM is loaded into guest
>>  - option ROM activates TPR MMIO access reporting via port 0x7e
>>  - TPR accesses are trapped and patched in the guest to call into option
>>   ROM instead, VAPIC support is enabled
>>  - option ROM TPR helpers track state in memory and invoke hypercall to
>>   poll for pending IRQs if required
>>
>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> 
> I must say that I find the approach horrible, patching guests and ROMs
> and looking up Windows internals. Taking the same approach to the extreme,
> we could for example patch a Xen guest to become a KVM guest. Not that I
> object to merging.

Yes, this is horrible. But there is no real better way in the absence of
hardware assisted virtualization of the TPR. I think MS is recommending
this patching approach as well.
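
For readers unfamiliar with the mechanism, the core of the patching step is that a 5-byte TPR access such as "mov abs to eax" (opcode 0xa1 plus a 32-bit address) is overwritten in place with a 5-byte near call (opcode 0xe8 plus a 32-bit displacement) into the option ROM. A minimal sketch of that encoding, modelled on patch_call() from the patch, is shown below; encode_call_rel32 and the buffer-based interface are illustrative, since the real code writes directly into guest memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CALL_REL32_LEN 5    /* opcode byte + 32-bit displacement */

/* Encode "call rel32" as it would appear at guest address ip. The
 * displacement is relative to the end of the call instruction, hence
 * the "- 5", and is stored little-endian. */
static void encode_call_rel32(uint8_t out[CALL_REL32_LEN], uint32_t ip,
                              uint32_t target)
{
    uint32_t rel = target - ip - CALL_REL32_LEN;

    assert(out != NULL);
    out[0] = 0xe8;                      /* call near */
    out[1] = (uint8_t)(rel & 0xff);
    out[2] = (uint8_t)((rel >> 8) & 0xff);
    out[3] = (uint8_t)((rel >> 16) & 0xff);
    out[4] = (uint8_t)((rel >> 24) & 0xff);
}
```

Because the replacement has the same length as the original instruction, the following instruction stays where it was. The longer forms in the patch are padded or prefixed with a push so that the total length is likewise preserved.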

>> diff --git a/hw/apic.c b/hw/apic.c
>> index 086c544..2ebf3ca 100644
>> --- a/hw/apic.c
>> +++ b/hw/apic.c
>> @@ -35,6 +35,10 @@
>>  #define MSI_ADDR_DEST_ID_SHIFT         12
>>  #define        MSI_ADDR_DEST_ID_MASK           0x00ffff0
>>
>> +#define SYNC_FROM_VAPIC                 0x1
>> +#define SYNC_TO_VAPIC                   0x2
>> +#define SYNC_ISR_IRR_TO_VAPIC           0x4
> 
> Enum, please.

OK.

> 
>> +
>>  static APICCommonState *local_apics[MAX_APICS + 1];
>>
>>  static void apic_set_irq(APICCommonState *s, int vector_num, int trigger_mode);
>> @@ -78,6 +82,70 @@ static inline int get_bit(uint32_t *tab, int index)
>>     return !!(tab[i] & mask);
>>  }
>>
>> +/* return -1 if no bit is set */
>> +static int get_highest_priority_int(uint32_t *tab)
>> +{
>> +    int i;
>> +    for (i = 7; i >= 0; i--) {
>> +        if (tab[i] != 0) {
>> +            return i * 32 + fls_bit(tab[i]);
>> +        }
>> +    }
>> +    return -1;
>> +}
>> +
>> +static void apic_sync_vapic(APICCommonState *s, int sync_type)
>> +{
>> +    VAPICState vapic_state;
>> +    size_t length;
>> +    off_t start;
>> +    int vector;
>> +
>> +    if (!s->vapic_paddr) {
>> +        return;
>> +    }
>> +    if (sync_type & SYNC_FROM_VAPIC) {
>> +        cpu_physical_memory_rw(s->vapic_paddr, (void *)&vapic_state,
>> +                               sizeof(vapic_state), 0);
>> +        s->tpr = vapic_state.tpr;
>> +    }
>> +    if (sync_type & (SYNC_TO_VAPIC | SYNC_ISR_IRR_TO_VAPIC)) {
>> +        start = offsetof(VAPICState, isr);
>> +        length = offsetof(VAPICState, enabled) - offsetof(VAPICState, isr);
>> +
>> +        if (sync_type & SYNC_TO_VAPIC) {
>> +            assert(qemu_cpu_is_self(s->cpu_env));
>> +
>> +            vapic_state.tpr = s->tpr;
>> +            vapic_state.enabled = 1;
>> +            start = 0;
>> +            length = sizeof(VAPICState);
>> +        }
>> +
>> +        vector = get_highest_priority_int(s->isr);
>> +        if (vector < 0) {
>> +            vector = 0;
>> +        }
>> +        vapic_state.isr = vector & 0xf0;
>> +
>> +        vapic_state.zero = 0;
>> +
>> +        vector = get_highest_priority_int(s->irr);
>> +        if (vector < 0) {
>> +            vector = 0;
>> +        }
>> +        vapic_state.irr = vector & 0xff;
>> +
>> +        cpu_physical_memory_write_rom(s->vapic_paddr + start,
>> +                                      ((void *)&vapic_state) + start, length);
> 
> This assumes that the vapic_state structure matches what the guest
> expects without conversion. Is this true for i386 on x86_64? I didn't
> check the structure in question.

Yes, the structure in question is a packed one, stable on both guest and
host side (the guest side is 32-bit only anyway).
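
A minimal sketch of why the layout is stable, assuming the structure
consists of five byte-sized fields. The field names (tpr, isr, zero,
irr, enabled) appear in the patch above, but their order here is a guess
derived from the offsetof() usage, and VAPICStateSketch is a
hypothetical stand-in for VAPICState:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for VAPICState; the field order is assumed. */
typedef struct VAPICStateSketch {
    uint8_t tpr;
    uint8_t isr;
    uint8_t zero;
    uint8_t irr;
    uint8_t enabled;
} __attribute__((packed)) VAPICStateSketch;

/* First byte written when only ISR/IRR are synced to the guest. */
size_t isr_irr_sync_start(void)
{
    return offsetof(VAPICStateSketch, isr);
}

/* Bytes written for an ISR/IRR-only sync: isr, zero and irr. */
size_t isr_irr_sync_length(void)
{
    return offsetof(VAPICStateSketch, enabled) -
           offsetof(VAPICStateSketch, isr);
}

/* With byte-sized members there is no padding on any host. */
void vapic_layout_selftest(void)
{
    assert(sizeof(VAPICStateSketch) == 5);
    assert(isr_irr_sync_start() == 1);
    assert(isr_irr_sync_length() == 3);
}
```

Under that assumption the partial write covers exactly the three bytes
between tpr and enabled, independent of host word size.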

>> diff --git a/hw/apic_common.c b/hw/apic_common.c
>> index 588531b..1977da7 100644
>> --- a/hw/apic_common.c
>> +++ b/hw/apic_common.c
>> @@ -20,8 +20,10 @@
>>  #include "apic.h"
>>  #include "apic_internal.h"
>>  #include "trace.h"
>> +#include "kvm.h"
>>
>>  static int apic_irq_delivered;
>> +bool apic_report_tpr_access;
> 
> This should go to APICCommonState.

Nope, it is global state. It is also checked where the APIC is set up,
and that code has no local clue about it yet, so it needs to pick up
the global view.

>> @@ -238,6 +275,7 @@ static int apic_init_common(SysBusDevice *dev)
>>  {
>>     APICCommonState *s = APIC_COMMON(dev);
>>     APICCommonClass *info;
>> +    static DeviceState *vapic;
>>     static int apic_no;
>>
>>     if (apic_no >= MAX_APICS) {
>> @@ -248,10 +286,29 @@ static int apic_init_common(SysBusDevice *dev)
>>     info = APIC_COMMON_GET_CLASS(s);
>>     info->init(s);
>>
>> -    sysbus_init_mmio(&s->busdev, &s->io_memory);
>> +    sysbus_init_mmio(dev, &s->io_memory);
>> +
>> +    if (!vapic && s->vapic_control & VAPIC_ENABLE_MASK) {
>> +        vapic = sysbus_create_simple("kvmvapic", -1, NULL);
>> +    }
>> +    s->vapic = vapic;
>> +    if (apic_report_tpr_access && info->enable_tpr_reporting) {
> 
> I think you should not rely on apic_report_tpr_access being in sane
> condition during class init.

It is mandatory, e.g. for CPU hotplug, as reporting needs to be
consistent across all VCPUs. Therefore it is a static global, set to
false initially. However, you are right, we lack proper clearing of the
access report feature on reset, not only in this variable.

>> diff --git a/hw/kvmvapic.c b/hw/kvmvapic.c
>> new file mode 100644
>> index 0000000..0c4d304
>> --- /dev/null
>> +++ b/hw/kvmvapic.c
>> @@ -0,0 +1,774 @@
>> +/*
>> + * TPR optimization for 32-bit Windows guests
>> + *
>> + * Copyright (C) 2007-2008 Qumranet Technologies
>> + * Copyright (C) 2012      Jan Kiszka, Siemens AG
>> + *
>> + * This work is licensed under the terms of the GNU GPL version 2, or
>> + * (at your option) any later version. See the COPYING file in the
>> + * top-level directory.
>> + */
>> +#include "sysemu.h"
>> +#include "cpus.h"
>> +#include "kvm.h"
>> +#include "apic_internal.h"
>> +
>> +#define APIC_DEFAULT_ADDRESS    0xfee00000
>> +
>> +#define VAPIC_IO_PORT           0x7e
>> +
>> +#define VAPIC_INACTIVE          0
>> +#define VAPIC_ACTIVE            1
>> +#define VAPIC_STANDBY           2
> 
> Enums, please.

OK.
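
A possible shape for that conversion, as a sketch only. The numeric
values are pinned explicitly on the assumption that the state is part of
the VMState format that has to stay compatible; VAPICMode and
vapic_mode_is_valid are hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

/* Explicit values so that a migrated state keeps its meaning. */
typedef enum VAPICMode {
    VAPIC_INACTIVE = 0,
    VAPIC_ACTIVE   = 1,
    VAPIC_STANDBY  = 2,
} VAPICMode;

/* Hypothetical helper: reject raw values outside the known modes. */
int vapic_mode_is_valid(uint32_t raw)
{
    return raw <= (uint32_t)VAPIC_STANDBY;
}

void vapic_mode_selftest(void)
{
    assert(vapic_mode_is_valid(VAPIC_INACTIVE));
    assert(vapic_mode_is_valid(VAPIC_STANDBY));
    assert(!vapic_mode_is_valid(3));
}
```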

> 
>> +
>> +#define VAPIC_CPU_SHIFT         7
>> +
>> +#define ROM_BLOCK_SIZE          512
>> +#define ROM_BLOCK_MASK          (~(ROM_BLOCK_SIZE - 1))
>> +
>> +typedef struct VAPICHandlers {
>> +    uint32_t set_tpr;
>> +    uint32_t set_tpr_eax;
>> +    uint32_t get_tpr[8];
>> +    uint32_t get_tpr_stack;
>> +} QEMU_PACKED VAPICHandlers;
>> +
>> +typedef struct GuestROMState {
>> +    char signature[8];
>> +    uint32_t vaddr;
> 
> This does not look 64 bit clean.

It's packed.

> 
>> +    uint32_t fixup_start;
>> +    uint32_t fixup_end;
>> +    uint32_t vapic_vaddr;
>> +    uint32_t vapic_size;
>> +    uint32_t vcpu_shift;
>> +    uint32_t real_tpr_addr;
>> +    VAPICHandlers up;
>> +    VAPICHandlers mp;
>> +} QEMU_PACKED GuestROMState;
> 
> Why packed, is this passed to guest directly?

It is a data field in the option ROM, see vapic_base in kvmvapic.S.

> 
>> +
>> +typedef struct VAPICROMState {
>> +    SysBusDevice busdev;
>> +    MemoryRegion io;
>> +    MemoryRegion rom;
>> +    bool rom_mapped_writable;
> 
> I'd put this later to avoid a structure hole.

Moving it after rom_state may save us a few precious bytes. Well, ok. :)

> 
>> +    uint32_t state;
>> +    uint32_t rom_state_paddr;
>> +    uint32_t rom_state_vaddr;
>> +    uint32_t vapic_paddr;
>> +    uint32_t real_tpr_addr;
>> +    GuestROMState rom_state;
>> +    size_t rom_size;
>> +} VAPICROMState;
>> +
>> +#define TPR_INSTR_IS_WRITE              0x1
>> +#define TPR_INSTR_ABS_MODRM             0x2
>> +#define TPR_INSTR_MATCH_MODRM_REG       0x4
>> +
>> +typedef struct TPRInstruction {
>> +    uint8_t opcode;
>> +    uint8_t modrm_reg;
>> +    unsigned int flags;
>> +    size_t length;
>> +    off_t addr_offset;
>> +} TPRInstruction;
> 
> Also here the order is pessimized.

Don't see the gain here, though.

>> +static int find_real_tpr_addr(VAPICROMState *s, CPUState *env)
>> +{
>> +    target_phys_addr_t paddr;
>> +    target_ulong addr;
>> +
>> +    if (s->state == VAPIC_ACTIVE) {
>> +        return 0;
>> +    }
>> +    for (addr = 0xfffff000; addr >= 0x80000000; addr -= TARGET_PAGE_SIZE) {
>> +        paddr = cpu_get_phys_page_debug(env, addr);
>> +        if (paddr != APIC_DEFAULT_ADDRESS) {
>> +            continue;
>> +        }
>> +        s->real_tpr_addr = addr + 0x80;
>> +        update_guest_rom_state(s);
>> +        return 0;
>> +    }
> 
> This loop looks odd, what should it do, probe for unused address?

Seems to deserve a comment: We have to scan for the guest's mapping of
the APIC as we enter here without a hint from a TPR-accessing
instruction. So we probe the potential range, trying to find the page
that maps to the known physical address (known in the sense that
Windows does not remap the APIC physically, nor does QEMU support that
so far).
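
The probing described above can be sketched in isolation as follows.
The translate callback is a hypothetical stand-in for
cpu_get_phys_page_debug(), and the example mapping at 0xfffe0000 is an
arbitrary value picked for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE   0x1000u
#define SKETCH_APIC_PADDR  0xfee00000u
#define SKETCH_TPR_OFFSET  0x80u

/* Hypothetical callback: physical page for vaddr, UINT64_MAX if unmapped. */
typedef uint64_t (*translate_fn)(uint32_t vaddr);

/*
 * Walk the upper half of the 32-bit address space from the top down and
 * return the virtual address of the TPR, or 0 if no page maps the APIC.
 */
uint32_t find_tpr_vaddr(translate_fn translate)
{
    uint32_t addr;

    assert(translate != NULL);
    for (addr = 0xfffff000u; addr >= 0x80000000u; addr -= SKETCH_PAGE_SIZE) {
        if (translate(addr) == SKETCH_APIC_PADDR) {
            return addr + SKETCH_TPR_OFFSET;
        }
    }
    return 0;
}

/* Example translation: a single page maps the APIC (illustrative value). */
uint64_t example_translate(uint32_t vaddr)
{
    return vaddr == 0xfffe0000u ? SKETCH_APIC_PADDR : UINT64_MAX;
}

/* Example translation with no APIC mapping at all. */
uint64_t unmapped_translate(uint32_t vaddr)
{
    (void)vaddr;
    return UINT64_MAX;
}
```

The loop ends once addr drops below 0x80000000, so a guest without such
a mapping costs one full scan of the upper 2 GB in page steps.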

>> +static int evaluate_tpr_instruction(VAPICROMState *s, CPUState *env,
>> +                                    target_ulong *pip, int access)
>> +{
>> +    const TPRInstruction *instr;
>> +    target_ulong ip = *pip;
>> +    uint8_t opcode[2];
>> +    uint32_t real_tpr_addr;
>> +    int i;
>> +
>> +    if ((ip & 0xf0000000) != 0x80000000 && (ip & 0xf0000000) != 0xe0000000) {
> 
> The constants should be using ULL suffix because target_ulong could be
> 64 bit, though maybe this is more optimal.

target_ulong is 64-bit unconditionally on x86. I'll add this.
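
For reference, a small check of what the suffix changes, assuming
uint64_t stands in for a 64-bit target_ulong and a platform with 32-bit
int. The function names are hypothetical, and the predicate is the
negation of the bail-out condition quoted above:

```c
#include <assert.h>
#include <stdint.h>

/* Unsuffixed constants: 0xf0000000 has type unsigned int here. */
int ip_in_expected_range_plain(uint64_t ip)
{
    return (ip & 0xf0000000) == 0x80000000 ||
           (ip & 0xf0000000) == 0xe0000000;
}

/* Same test with explicit 64-bit constants. */
int ip_in_expected_range_ull(uint64_t ip)
{
    return (ip & 0xf0000000ULL) == 0x80000000ULL ||
           (ip & 0xf0000000ULL) == 0xe0000000ULL;
}

void ip_range_selftest(void)
{
    /* Both variants only look at bits 28..31 of the address. */
    assert(ip_in_expected_range_plain(0x80001234u) == 1);
    assert(ip_in_expected_range_ull(0x80001234u) == 1);
    assert(ip_in_expected_range_plain(0x12345678u) == 0);
    assert(ip_in_expected_range_ull(0x12345678u) == 0);
}
```

Since the unsuffixed constants are unsigned and get zero-extended, both
variants should agree for every input; the suffix then mainly documents
the intent.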

>> +
>> +/*
>> + * Tries to read the unique processor number from the Kernel Processor Control
>> + * Region (KPCR) of 32-bit Windows. Returns -1 if the KPCR cannot be accessed
>> + * or is considered invalid.
>> + */
> 
> Horrible hack. Is guest OS type or version checked somewhere?

This is all about hacking 32-bit Windows, and this check encodes that
even more strongly. The other important binding is the expected virtual
address of the ROM mapping under Windows. I would have preferred
checking the version directly, but no one has a complete list of
supported guests and their codes.

> 
>> +static int get_kpcr_number(CPUState *env)
>> +{
>> +    struct kpcr {
>> +        uint8_t  fill1[0x1c];
>> +        uint32_t self;
>> +        uint8_t  fill2[0x31];
>> +        uint8_t  number;
>> +    } QEMU_PACKED kpcr;
> 
> KPCR. Pointers to Windows documentation would be nice.

Oops, yes.

Unfortunately, this is only an internal structure, not officially
documented by MS. However, all supported OS versions are legacy by now,
so its layout no longer changes.
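
In lieu of documentation, here is a sketch that spells out the offsets
the quoted struct implies. The rule that the self pointer has to match
the address the block was read from is an assumption about what
"considered invalid" means, and all names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same shape as the struct quoted above. */
struct kpcr_sketch {
    uint8_t  fill1[0x1c];
    uint32_t self;          /* offset 0x1c */
    uint8_t  fill2[0x31];
    uint8_t  number;        /* offset 0x51 */
} __attribute__((packed));

/*
 * Return the processor number, or -1 if the self pointer does not match
 * the virtual address the block was read from (assumed validity rule).
 */
int kpcr_number(const struct kpcr_sketch *kpcr, uint32_t kpcr_vaddr)
{
    assert(kpcr != NULL);
    if (kpcr->self != kpcr_vaddr) {
        return -1;
    }
    return kpcr->number;
}

/* Build a fake KPCR image and evaluate it against a base address. */
int kpcr_demo(uint32_t self, uint8_t number, uint32_t base)
{
    struct kpcr_sketch k = { .self = self, .number = number };

    return kpcr_number(&k, base);
}
```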

>> +
>> +static int patch_hypercalls(VAPICROMState *s)
>> +{
>> +    target_phys_addr_t rom_paddr = s->rom_state_paddr & ROM_BLOCK_MASK;
>> +    static uint8_t vmcall_pattern[] = {
> 
> const
> 
>> +        0xb8, 0x1, 0, 0, 0, 0xf, 0x1, 0xc1
>> +    };
>> +    static uint8_t outl_pattern[] = {
> 
> const

Yep.

>> +static void vapic_write(void *opaque, target_phys_addr_t addr, uint64_t data,
>> +                        unsigned int size)
>> +{
>> +    CPUState *env = cpu_single_env;
>> +    target_phys_addr_t rom_paddr;
>> +    VAPICROMState *s = opaque;
>> +
>> +    cpu_synchronize_state(env);
>> +
>> +    /*
>> +     * The VAPIC supports three PIO-based hypercalls, all via port 0x7E.
>> +     *  o 16-bit write access:
>> +     *    Reports the option ROM initialization to the hypervisor. Written
>> +     *    value is the offset of the state structure in the ROM.
>> +     *  o 8-bit write access:
>> +     *    Reactivates the VAPIC after a guest hibernation, i.e. after the
>> +     *    option ROM content has been re-initialized by a guest power cycle.
>> +     *  o 32-bit write access:
>> +     *    Poll for pending IRQs, considering the current VAPIC state.
>> +     */
> 
> Different operation depending on size? Interesting.

Originally not my idea, just added the third case. :)
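
To make the size-based dispatch explicit, a sketch of the mapping that
the comment above describes. VAPICPortOp and vapic_classify_write are
hypothetical names, not part of the patch:

```c
#include <assert.h>

/* The three operations multiplexed on port 0x7e, keyed by access size. */
typedef enum VAPICPortOp {
    VAPIC_OP_ROM_INIT,      /* 16-bit write: option ROM reports init   */
    VAPIC_OP_REACTIVATE,    /*  8-bit write: reactivate after resume   */
    VAPIC_OP_POLL_IRQS,     /* 32-bit write: poll for pending IRQs     */
    VAPIC_OP_INVALID,       /* any other access size                   */
} VAPICPortOp;

/* Map the access size in bytes to the requested operation. */
VAPICPortOp vapic_classify_write(unsigned int size)
{
    switch (size) {
    case 2:
        return VAPIC_OP_ROM_INIT;
    case 1:
        return VAPIC_OP_REACTIVATE;
    case 4:
        return VAPIC_OP_POLL_IRQS;
    default:
        return VAPIC_OP_INVALID;
    }
}

void vapic_classify_selftest(void)
{
    assert(vapic_classify_write(2) == VAPIC_OP_ROM_INIT);
    assert(vapic_classify_write(8) == VAPIC_OP_INVALID);
}
```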

> 
>> +    switch (size) {
>> +    case 2:
>> +        if (s->state != VAPIC_INACTIVE) {
>> +            patch_hypercalls(s);
>> +            break;
>> +        }
>> +
>> +        rom_paddr = (env->segs[R_CS].base + env->eip) & ROM_BLOCK_MASK;
>> +        s->rom_state_paddr = rom_paddr + data;
>> +
>> +        if (vapic_prepare(s) < 0) {
>> +            break;
>> +        }
>> +        s->state = VAPIC_STANDBY;
>> +        break;
>> +    case 1:
>> +        if (kvm_enabled()) {
>> +            /*
>> +             * Disable triggering instruction in ROM by writing a NOP.
>> +             *
>> +             * We cannot do this in TCG mode as the reported IP is not
>> +             * reliable.
> 
> Given the hack level of the whole, it would not be impossible to find
> the IP using search PC.

Is there a specific pre-existing service you have in mind? Otherwise,
the complexity might not be worth the gain.

Thanks for having a look,
Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [Qemu-devel] [PATCH v2 5/8] kvmvapic: Introduce TPR access optimization for Windows guests
  2012-02-13 10:16       ` Jan Kiszka
@ 2012-02-13 18:50         ` Blue Swirl
  -1 siblings, 0 replies; 90+ messages in thread
From: Blue Swirl @ 2012-02-13 18:50 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Avi Kivity, Marcelo Tosatti, Anthony Liguori, qemu-devel, kvm,
	Gleb Natapov

On Mon, Feb 13, 2012 at 10:16, Jan Kiszka <jan.kiszka@siemens.com> wrote:
> On 2012-02-11 16:25, Blue Swirl wrote:
>> On Fri, Feb 10, 2012 at 18:31, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>>> This enables acceleration for MMIO-based TPR register accesses of
>>> 32-bit Windows guest systems. It is mostly useful with KVM enabled,
>>> either on older Intel CPUs (without flexpriority feature, can also be
>>> manually disabled for testing) or any current AMD processor.
>>>
>>> The approach introduced here is derived from the original version of
>>> qemu-kvm. It was refactored, documented, and extended by support for
>>> user space APIC emulation, both with and without KVM acceleration. The
>>> VMState format was kept compatible, so was the ABI to the option ROM
>>> that implements the guest-side para-virtualized driver service. This
>>> enables seamless migration from qemu-kvm to upstream or, one day,
>>> between KVM and TCG mode.
>>>
>>> The basic concept goes like this:
>>>  - VAPIC PV interface consisting of I/O port 0x7e and (for KVM in-kernel
>>>   irqchip) a vmcall hypercall is registered
>>>  - VAPIC option ROM is loaded into guest
>>>  - option ROM activates TPR MMIO access reporting via port 0x7e
>>>  - TPR accesses are trapped and patched in the guest to call into option
>>>   ROM instead, VAPIC support is enabled
>>>  - option ROM TPR helpers track state in memory and invoke hypercall to
>>>   poll for pending IRQs if required
>>>
>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>>
>> I must say that I find the approach horrible, patching guests and ROMs
>> and looking up Windows internals. Taking the same approach to the
>> extreme, we could for example patch a Xen guest to become a KVM guest.
>> Not that I object to merging.
>
> Yes, this is horrible. But there is no real better way in the absence of
> hardware assisted virtualization of the TPR. I think MS is recommending
> this patching approach as well.

Maybe instead of routing via ROM and the hypercall, the TPR accesses
could be handled directly with guest-invisible breakpoints (like GDB
breakpoints, but for QEMU internal use), much like other
instrumentation could be handled.

>>> diff --git a/hw/apic.c b/hw/apic.c
>>> index 086c544..2ebf3ca 100644
>>> --- a/hw/apic.c
>>> +++ b/hw/apic.c
>>> @@ -35,6 +35,10 @@
>>>  #define MSI_ADDR_DEST_ID_SHIFT         12
>>>  #define        MSI_ADDR_DEST_ID_MASK           0x00ffff0
>>>
>>> +#define SYNC_FROM_VAPIC                 0x1
>>> +#define SYNC_TO_VAPIC                   0x2
>>> +#define SYNC_ISR_IRR_TO_VAPIC           0x4
>>
>> Enum, please.
>
> OK.
>
>>
>>> +
>>>  static APICCommonState *local_apics[MAX_APICS + 1];
>>>
>>>  static void apic_set_irq(APICCommonState *s, int vector_num, int trigger_mode);
>>> @@ -78,6 +82,70 @@ static inline int get_bit(uint32_t *tab, int index)
>>>     return !!(tab[i] & mask);
>>>  }
>>>
>>> +/* return -1 if no bit is set */
>>> +static int get_highest_priority_int(uint32_t *tab)
>>> +{
>>> +    int i;
>>> +    for (i = 7; i >= 0; i--) {
>>> +        if (tab[i] != 0) {
>>> +            return i * 32 + fls_bit(tab[i]);
>>> +        }
>>> +    }
>>> +    return -1;
>>> +}
>>> +
>>> +static void apic_sync_vapic(APICCommonState *s, int sync_type)
>>> +{
>>> +    VAPICState vapic_state;
>>> +    size_t length;
>>> +    off_t start;
>>> +    int vector;
>>> +
>>> +    if (!s->vapic_paddr) {
>>> +        return;
>>> +    }
>>> +    if (sync_type & SYNC_FROM_VAPIC) {
>>> +        cpu_physical_memory_rw(s->vapic_paddr, (void *)&vapic_state,
>>> +                               sizeof(vapic_state), 0);
>>> +        s->tpr = vapic_state.tpr;
>>> +    }
>>> +    if (sync_type & (SYNC_TO_VAPIC | SYNC_ISR_IRR_TO_VAPIC)) {
>>> +        start = offsetof(VAPICState, isr);
>>> +        length = offsetof(VAPICState, enabled) - offsetof(VAPICState, isr);
>>> +
>>> +        if (sync_type & SYNC_TO_VAPIC) {
>>> +            assert(qemu_cpu_is_self(s->cpu_env));
>>> +
>>> +            vapic_state.tpr = s->tpr;
>>> +            vapic_state.enabled = 1;
>>> +            start = 0;
>>> +            length = sizeof(VAPICState);
>>> +        }
>>> +
>>> +        vector = get_highest_priority_int(s->isr);
>>> +        if (vector < 0) {
>>> +            vector = 0;
>>> +        }
>>> +        vapic_state.isr = vector & 0xf0;
>>> +
>>> +        vapic_state.zero = 0;
>>> +
>>> +        vector = get_highest_priority_int(s->irr);
>>> +        if (vector < 0) {
>>> +            vector = 0;
>>> +        }
>>> +        vapic_state.irr = vector & 0xff;
>>> +
>>> +        cpu_physical_memory_write_rom(s->vapic_paddr + start,
>>> +                                      ((void *)&vapic_state) + start, length);
>>
>> This assumes that the vapic_state structure matches what the guest
>> expects without conversion. Is this true for i386 on x86_64? I didn't
>> check the structure in question.
>
> Yes, the structure in question is a packed one, stable on both guest and
> host side (the guest side is 32-bit only anyway).
>
>>> diff --git a/hw/apic_common.c b/hw/apic_common.c
>>> index 588531b..1977da7 100644
>>> --- a/hw/apic_common.c
>>> +++ b/hw/apic_common.c
>>> @@ -20,8 +20,10 @@
>>>  #include "apic.h"
>>>  #include "apic_internal.h"
>>>  #include "trace.h"
>>> +#include "kvm.h"
>>>
>>>  static int apic_irq_delivered;
>>> +bool apic_report_tpr_access;
>>
>> This should go to APICCommonState.
>
> Nope, it is global state. It is also checked where the APIC is set up,
> and that code has no local clue about it yet, so it needs to pick up
> the global view.
>
>>> @@ -238,6 +275,7 @@ static int apic_init_common(SysBusDevice *dev)
>>>  {
>>>     APICCommonState *s = APIC_COMMON(dev);
>>>     APICCommonClass *info;
>>> +    static DeviceState *vapic;
>>>     static int apic_no;
>>>
>>>     if (apic_no >= MAX_APICS) {
>>> @@ -248,10 +286,29 @@ static int apic_init_common(SysBusDevice *dev)
>>>     info = APIC_COMMON_GET_CLASS(s);
>>>     info->init(s);
>>>
>>> -    sysbus_init_mmio(&s->busdev, &s->io_memory);
>>> +    sysbus_init_mmio(dev, &s->io_memory);
>>> +
>>> +    if (!vapic && s->vapic_control & VAPIC_ENABLE_MASK) {
>>> +        vapic = sysbus_create_simple("kvmvapic", -1, NULL);
>>> +    }
>>> +    s->vapic = vapic;
>>> +    if (apic_report_tpr_access && info->enable_tpr_reporting) {
>>
>> I think you should not rely on apic_report_tpr_access being in sane
>> condition during class init.
>
> It is mandatory, e.g. for CPU hotplug, as reporting needs to be
> consistent across all VCPUs. Therefore it is a static global, set to
> false initially. However, you are right, we lack proper clearing of the
> access report feature on reset, not only in this variable.

I'd also set it to false initially.

>>> diff --git a/hw/kvmvapic.c b/hw/kvmvapic.c
>>> new file mode 100644
>>> index 0000000..0c4d304
>>> --- /dev/null
>>> +++ b/hw/kvmvapic.c
>>> @@ -0,0 +1,774 @@
>>> +/*
>>> + * TPR optimization for 32-bit Windows guests
>>> + *
>>> + * Copyright (C) 2007-2008 Qumranet Technologies
>>> + * Copyright (C) 2012      Jan Kiszka, Siemens AG
>>> + *
>>> + * This work is licensed under the terms of the GNU GPL version 2, or
>>> + * (at your option) any later version. See the COPYING file in the
>>> + * top-level directory.
>>> + */
>>> +#include "sysemu.h"
>>> +#include "cpus.h"
>>> +#include "kvm.h"
>>> +#include "apic_internal.h"
>>> +
>>> +#define APIC_DEFAULT_ADDRESS    0xfee00000
>>> +
>>> +#define VAPIC_IO_PORT           0x7e
>>> +
>>> +#define VAPIC_INACTIVE          0
>>> +#define VAPIC_ACTIVE            1
>>> +#define VAPIC_STANDBY           2
>>
>> Enums, please.
>
> OK.
>
>>
>>> +
>>> +#define VAPIC_CPU_SHIFT         7
>>> +
>>> +#define ROM_BLOCK_SIZE          512
>>> +#define ROM_BLOCK_MASK          (~(ROM_BLOCK_SIZE - 1))
>>> +
>>> +typedef struct VAPICHandlers {
>>> +    uint32_t set_tpr;
>>> +    uint32_t set_tpr_eax;
>>> +    uint32_t get_tpr[8];
>>> +    uint32_t get_tpr_stack;
>>> +} QEMU_PACKED VAPICHandlers;
>>> +
>>> +typedef struct GuestROMState {
>>> +    char signature[8];
>>> +    uint32_t vaddr;
>>
>> This does not look 64 bit clean.
>
> It's packed.

I meant "virtual address could be 64 bits on a 64 bit host", not
structure packing.

>>
>>> +    uint32_t fixup_start;
>>> +    uint32_t fixup_end;
>>> +    uint32_t vapic_vaddr;
>>> +    uint32_t vapic_size;
>>> +    uint32_t vcpu_shift;
>>> +    uint32_t real_tpr_addr;
>>> +    VAPICHandlers up;
>>> +    VAPICHandlers mp;
>>> +} QEMU_PACKED GuestROMState;
>>
>> Why packed, is this passed to guest directly?
>
> It is a data field in the option ROM, see vapic_base in kvmvapic.S.
>
>>
>>> +
>>> +typedef struct VAPICROMState {
>>> +    SysBusDevice busdev;
>>> +    MemoryRegion io;
>>> +    MemoryRegion rom;
>>> +    bool rom_mapped_writable;
>>
>> I'd put this later to avoid a structure hole.
>
> Moving it after rom_state may save us a few precious bytes. Well, ok. :)
>
>>
>>> +    uint32_t state;
>>> +    uint32_t rom_state_paddr;
>>> +    uint32_t rom_state_vaddr;
>>> +    uint32_t vapic_paddr;
>>> +    uint32_t real_tpr_addr;
>>> +    GuestROMState rom_state;
>>> +    size_t rom_size;
>>> +} VAPICROMState;
>>> +
>>> +#define TPR_INSTR_IS_WRITE              0x1
>>> +#define TPR_INSTR_ABS_MODRM             0x2
>>> +#define TPR_INSTR_MATCH_MODRM_REG       0x4
>>> +
>>> +typedef struct TPRInstruction {
>>> +    uint8_t opcode;
>>> +    uint8_t modrm_reg;
>>> +    unsigned int flags;
>>> +    size_t length;
>>> +    off_t addr_offset;
>>> +} TPRInstruction;
>>
>> Also here the order is pessimized.
>
> Don't see the gain here, though.

There is a two-byte hole between modrm_reg and flags, and possibly a
4-byte hole between length and addr_offset (if size_t is 32 bits but
off_t is 64 bits). I'd reverse the order so that the members with the
largest alignment requirements come first.
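
To make the padding argument concrete, here is a minimal sketch that
compares the two member orders. Only the member list comes from the
patch; the struct names TPRInstructionOrig and TPRInstructionSorted and
the tpr_* helpers are made up for illustration. Sorting by alignment
removes the interior holes, but it need not shrink sizeof, because the
saved bytes can reappear as tail padding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>   /* off_t (POSIX) */

/* Member order as posted in the patch. */
typedef struct TPRInstructionOrig {
    uint8_t opcode;
    uint8_t modrm_reg;
    unsigned int flags;
    size_t length;
    off_t addr_offset;
} TPRInstructionOrig;

/* Hypothetical reordering: members with the largest alignment first. */
typedef struct TPRInstructionSorted {
    off_t addr_offset;
    size_t length;
    unsigned int flags;
    uint8_t opcode;
    uint8_t modrm_reg;
} TPRInstructionSorted;

/* Sum of the member sizes, i.e. the struct size without any padding. */
static size_t tpr_payload_size(void)
{
    return sizeof(off_t) + sizeof(size_t) + sizeof(unsigned int) +
           2 * sizeof(uint8_t);
}

/* Padding bytes (interior holes plus tail padding) of each layout. */
static size_t tpr_padding_orig(void)
{
    return sizeof(TPRInstructionOrig) - tpr_payload_size();
}

static size_t tpr_padding_sorted(void)
{
    return sizeof(TPRInstructionSorted) - tpr_payload_size();
}

/* Sanity check: the sorted layout never needs more padding. */
static void tpr_layout_check(void)
{
    assert(tpr_padding_sorted() <= tpr_padding_orig());
}
```

On x86-64 Linux both layouts come out at 24 bytes: the two-byte hole
after modrm_reg simply moves to the end of the struct. That is
consistent with the remark above that there is little to gain for this
particular struct; the reordering matters more as a habit than as a
saving here.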

>>> +static int find_real_tpr_addr(VAPICROMState *s, CPUState *env)
>>> +{
>>> +    target_phys_addr_t paddr;
>>> +    target_ulong addr;
>>> +
>>> +    if (s->state == VAPIC_ACTIVE) {
>>> +        return 0;
>>> +    }
>>> +    for (addr = 0xfffff000; addr >= 0x80000000; addr -= TARGET_PAGE_SIZE) {
>>> +        paddr = cpu_get_phys_page_debug(env, addr);
>>> +        if (paddr != APIC_DEFAULT_ADDRESS) {
>>> +            continue;
>>> +        }
>>> +        s->real_tpr_addr = addr + 0x80;
>>> +        update_guest_rom_state(s);
>>> +        return 0;
>>> +    }
>>
>> This loop looks odd, what should it do, probe for unused address?
>
> Seems to deserve a comment: We have to scan for the guest's mapping of
> the APIC as we enter here without a hint from a TPR-accessing
> instruction. So we probe the potential range, trying to find the page
> that maps to that known physical address (known in the sense that
> Windows does not remap the APIC physically - nor does QEMU support that
> so far).

Yes, more comments would be nice, especially on theory of operation.
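
As a starting point for such a comment, the scan can be sketched in
isolation. This is only an illustration under assumptions: the
sketch_* names are hypothetical, the translate callback stands in for
cpu_get_phys_page_debug(), and the example mapping at 0xfffe0000 is
invented for the demonstration, not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE   0x1000u
#define SKETCH_APIC_PADDR  0xfee00000u  /* APIC_DEFAULT_ADDRESS in the patch */
#define SKETCH_TPR_OFFSET  0x80u        /* TPR offset within the APIC page */

/* Hypothetical stand-in for the guest page table walk. */
typedef uint64_t (*sketch_translate_fn)(uint32_t vaddr);

/*
 * Walk the upper 2 GB of the 32-bit guest address space page by page,
 * from the top down, and return the virtual address of the TPR in the
 * first page that maps to the known APIC physical address.
 * Returns 0 if no such page is found.
 */
static uint32_t sketch_find_tpr_vaddr(sketch_translate_fn translate)
{
    uint32_t addr;

    for (addr = 0xfffff000u; addr >= 0x80000000u; addr -= SKETCH_PAGE_SIZE) {
        if (translate(addr) == SKETCH_APIC_PADDR) {
            return addr + SKETCH_TPR_OFFSET;
        }
    }
    return 0;
}

/* Example translation: pretend the guest maps the APIC at 0xfffe0000. */
static uint64_t sketch_translate_example(uint32_t vaddr)
{
    return vaddr == 0xfffe0000u ? SKETCH_APIC_PADDR : UINT64_MAX;
}

/* Example translation for a guest that does not map the APIC at all. */
static uint64_t sketch_translate_unmapped(uint32_t vaddr)
{
    (void)vaddr;
    return UINT64_MAX;
}
```

With the example translation, sketch_find_tpr_vaddr() returns
0xfffe0080; with the unmapped one it walks the whole range and
returns 0. The loop terminates because addr is 32 bits wide and the
step below 0x80000000 fails the range test.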

>>> +static int evaluate_tpr_instruction(VAPICROMState *s, CPUState *env,
>>> +                                    target_ulong *pip, int access)
>>> +{
>>> +    const TPRInstruction *instr;
>>> +    target_ulong ip = *pip;
>>> +    uint8_t opcode[2];
>>> +    uint32_t real_tpr_addr;
>>> +    int i;
>>> +
>>> +    if ((ip & 0xf0000000) != 0x80000000 && (ip & 0xf0000000) != 0xe0000000) {
>>
>> The constants should be using ULL suffix because target_ulong could be
>> 64 bit, though maybe this is more optimal.
>
> target_ulong is 64-bit unconditionally on x86. I'll add this.

No, target_phys_addr_t is now 64 bits, but target_ulong (register
size) is 32 bits for i386-softmmu.
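
For what it's worth, a small sketch of how the unsuffixed constants
behave for both widths. The sketch_ip_ok_* helpers are hypothetical
and express the check positively (the patch bails out on the negated
condition). On targets where int is 32 bits, the literal 0xf0000000
has type unsigned int, so with a 64-bit operand it is zero-extended
and the upper 32 bits are simply ignored; a ULL suffix would not
change the result.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The range test with a 32-bit instruction pointer (i386-softmmu). */
static bool sketch_ip_ok_32(uint32_t ip)
{
    return (ip & 0xf0000000) == 0x80000000 ||
           (ip & 0xf0000000) == 0xe0000000;
}

/*
 * Same constants with a 64-bit instruction pointer: the mask covers
 * bits 28..31 only, so anything in the upper 32 bits is not examined.
 */
static bool sketch_ip_ok_64(uint64_t ip)
{
    return (ip & 0xf0000000) == 0x80000000 ||
           (ip & 0xf0000000) == 0xe0000000;
}

/* Hypothetical stricter variant that also rejects addresses >= 4 GB. */
static bool sketch_ip_ok_64_strict(uint64_t ip)
{
    return (ip >> 32) == 0 && sketch_ip_ok_64(ip);
}
```

So the suffix is not what decides correctness here; if 64-bit
instruction pointers can ever reach this code, an explicit check of
the upper half like the strict variant would be the thing to add.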

>>> +
>>> +/*
>>> + * Tries to read the unique processor number from the Kernel Processor Control
>>> + * Region (KPCR) of 32-bit Windows. Returns -1 if the KPCR cannot be accessed
>>> + * or is considered invalid.
>>> + */
>>
>> Horrible hack. Is guest OS type or version checked somewhere?
>
> This is all about hacking Windows 32-bit. And this check encodes that
> even more strongly. The other important binding is the expected virtual
> address of the ROM mapping under Windows. I would have preferred
> checking the version directly, but no one has a complete list of
> supported guests and their codes.

Then it would be nice to only enable this with a command line switch,
so that some random poor non-Windows OS is not patched incorrectly.

>>
>>> +static int get_kpcr_number(CPUState *env)
>>> +{
>>> +    struct kpcr {
>>> +        uint8_t  fill1[0x1c];
>>> +        uint32_t self;
>>> +        uint8_t  fill2[0x31];
>>> +        uint8_t  number;
>>> +    } QEMU_PACKED kpcr;
>>
>> KPCR. Pointers to Windows documentation would be nice.
>
> Oops, yes.
>
> Unfortunately, this is only an internal structure, not officially
> documented by MS. However, all supported OS versions are legacy by now,
> so the structure is no longer changing.

This and a note about the supported OS versions could be added as comment.

>>> +
>>> +static int patch_hypercalls(VAPICROMState *s)
>>> +{
>>> +    target_phys_addr_t rom_paddr = s->rom_state_paddr & ROM_BLOCK_MASK;
>>> +    static uint8_t vmcall_pattern[] = {
>>
>> const
>>
>>> +        0xb8, 0x1, 0, 0, 0, 0xf, 0x1, 0xc1
>>> +    };
>>> +    static uint8_t outl_pattern[] = {
>>
>> const
>
> Yep.
>
>>> +static void vapic_write(void *opaque, target_phys_addr_t addr, uint64_t data,
>>> +                        unsigned int size)
>>> +{
>>> +    CPUState *env = cpu_single_env;
>>> +    target_phys_addr_t rom_paddr;
>>> +    VAPICROMState *s = opaque;
>>> +
>>> +    cpu_synchronize_state(env);
>>> +
>>> +    /*
>>> +     * The VAPIC supports three PIO-based hypercalls, all via port 0x7E.
>>> +     *  o 16-bit write access:
>>> +     *    Reports the option ROM initialization to the hypervisor. Written
>>> +     *    value is the offset of the state structure in the ROM.
>>> +     *  o 8-bit write access:
>>> +     *    Reactivates the VAPIC after a guest hibernation, i.e. after the
>>> +     *    option ROM content has been re-initialized by a guest power cycle.
>>> +     *  o 32-bit write access:
>>> +     *    Poll for pending IRQs, considering the current VAPIC state.
>>> +     */
>>
>> Different operation depending on size? Interesting.
>
> Originally not my idea, just added the third case. :)
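
The size-keyed protocol from the comment above can be written down as
a tiny classifier. This is just a sketch: the SketchVapicRequest enum
and sketch_classify_write() are hypothetical names, and the real
handler performs the operations in place instead of classifying them.

```c
#include <assert.h>

/* Hypothetical names for the three operations selected by access size. */
typedef enum SketchVapicRequest {
    SKETCH_VAPIC_REACTIVATE,   /* 8-bit write: resume after hibernation */
    SKETCH_VAPIC_REPORT_ROM,   /* 16-bit write: option ROM initialized */
    SKETCH_VAPIC_POLL_IRQS,    /* 32-bit write: poll for pending IRQs */
    SKETCH_VAPIC_IGNORED       /* any other access size */
} SketchVapicRequest;

/* Map the size of a write to port 0x7e to the requested operation. */
static SketchVapicRequest sketch_classify_write(unsigned int size)
{
    switch (size) {
    case 1:
        return SKETCH_VAPIC_REACTIVATE;
    case 2:
        return SKETCH_VAPIC_REPORT_ROM;
    case 4:
        return SKETCH_VAPIC_POLL_IRQS;
    default:
        return SKETCH_VAPIC_IGNORED;
    }
}
```

Only the 16-bit case interprets the written value (the offset of the
state structure in the ROM); for the other two the access size alone
carries the request.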
>
>>
>>> +    switch (size) {
>>> +    case 2:
>>> +        if (s->state != VAPIC_INACTIVE) {
>>> +            patch_hypercalls(s);
>>> +            break;
>>> +        }
>>> +
>>> +        rom_paddr = (env->segs[R_CS].base + env->eip) & ROM_BLOCK_MASK;
>>> +        s->rom_state_paddr = rom_paddr + data;
>>> +
>>> +        if (vapic_prepare(s) < 0) {
>>> +            break;
>>> +        }
>>> +        s->state = VAPIC_STANDBY;
>>> +        break;
>>> +    case 1:
>>> +        if (kvm_enabled()) {
>>> +            /*
>>> +             * Disable triggering instruction in ROM by writing a NOP.
>>> +             *
>>> +             * We cannot do this in TCG mode as the reported IP is not
>>> +             * reliable.
>>
>> Given the hack level of the whole, it would not be impossible to find
>> the IP using search PC.
>
> Is there a specific pre-existing service you have in mind? Otherwise,
> the complexity might not be worth the gain.

There's gen_intermediate_code() vs. gen_intermediate_code_pc().
Probably not worth it, but it would increase the hack level nicely.

> Thanks for having a look,
> Jan
>
> --
> Siemens AG, Corporate Technology, CT T DE IT 1
> Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [Qemu-devel] [PATCH v2 5/8] kvmvapic: Introduce TPR access optimization for Windows guests
  2012-02-13 18:50         ` Blue Swirl
@ 2012-02-13 19:11           ` Gleb Natapov
  -1 siblings, 0 replies; 90+ messages in thread
From: Gleb Natapov @ 2012-02-13 19:11 UTC (permalink / raw)
  To: Blue Swirl
  Cc: Jan Kiszka, Avi Kivity, Marcelo Tosatti, Anthony Liguori,
	qemu-devel, kvm

On Mon, Feb 13, 2012 at 06:50:08PM +0000, Blue Swirl wrote:
> On Mon, Feb 13, 2012 at 10:16, Jan Kiszka <jan.kiszka@siemens.com> wrote:
> > On 2012-02-11 16:25, Blue Swirl wrote:
> >> On Fri, Feb 10, 2012 at 18:31, Jan Kiszka <jan.kiszka@siemens.com> wrote:
> >>> This enables acceleration for MMIO-based TPR register accesses of
> >>> 32-bit Windows guest systems. It is mostly useful with KVM enabled,
> >>> either on older Intel CPUs (without flexpriority feature, can also be
> >>> manually disabled for testing) or any current AMD processor.
> >>>
> >>> The approach introduced here is derived from the original version of
> >>> qemu-kvm. It was refactored, documented, and extended by support for
> >>> user space APIC emulation, both with and without KVM acceleration. The
> >>> VMState format was kept compatible, so was the ABI to the option ROM
> >>> that implements the guest-side para-virtualized driver service. This
> >>> enables seamless migration from qemu-kvm to upstream or, one day,
> >>> between KVM and TCG mode.
> >>>
> >>> The basic concept goes like this:
> >>>  - VAPIC PV interface consisting of I/O port 0x7e and (for KVM in-kernel
> >>>   irqchip) a vmcall hypercall is registered
> >>>  - VAPIC option ROM is loaded into guest
> >>>  - option ROM activates TPR MMIO access reporting via port 0x7e
> >>>  - TPR accesses are trapped and patched in the guest to call into option
> >>>   ROM instead, VAPIC support is enabled
> >>>  - option ROM TPR helpers track state in memory and invoke hypercall to
> >>>   poll for pending IRQs if required
> >>>
> >>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> >>
> >> I must say that I find the approach horrible, patching guests and ROMs
> >> and looking up Windows internals. Taking the same approach to extreme,
> >> we could for example patch a Xen guest to become a KVM guest. Not that I
> >> object to merging.
> >
> > Yes, this is horrible. But there is no real better way in the absence of
> > hardware assisted virtualization of the TPR. I think MS is recommending
> > this patching approach as well.
> 
> Maybe instead of routing via ROM and the hypercall, the TPR accesses
> could be handled directly with guest invisible breakpoints (like GDB
> breakpoints, but for QEMU internal use), much like other
> instrumentation could be handled.
> 
The hypercall is rarely called. The idea behind patching is to avoid an
exit on each TPR update. A breakpoint would cause an exit on every
access, making the whole exercise pointless.
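
A toy model of the difference, to put numbers on it. Everything here
is an assumption made for illustration: the sketch_* names are
hypothetical, and the rule "call out only when the pending interrupt's
priority class exceeds the new TPR class" is a simplification of what
the ROM helpers do (it ignores the ISR and does not model the pending
interrupt being delivered).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Trapping or breakpointing every access: one exit per TPR write. */
static size_t sketch_exits_trapping(const uint8_t *tpr_writes, size_t n)
{
    (void)tpr_writes;
    return n;
}

/*
 * Patched guest: the TPR is tracked in memory, and an exit (hypercall)
 * happens only when the write lowers the TPR below the priority class
 * of a pending interrupt.
 */
static size_t sketch_exits_patched(const uint8_t *tpr_writes, size_t n,
                                   uint8_t pending_vector)
{
    size_t i, exits = 0;

    for (i = 0; i < n; i++) {
        if ((pending_vector & 0xf0) > (tpr_writes[i] & 0xf0)) {
            exits++;
        }
    }
    return exits;
}
```

For the write sequence 0xd0, 0xf0, 0x20, 0xf0, 0x00 with vector 0x31
pending, the trapping model exits five times and the patched model
twice (on the writes of 0x20 and 0x00).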

--
			Gleb.

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [PATCH v2 5/8] kvmvapic: Introduce TPR access optimization for Windows guests
  2012-02-13 18:50         ` Blue Swirl
@ 2012-02-13 19:22           ` Jan Kiszka
  -1 siblings, 0 replies; 90+ messages in thread
From: Jan Kiszka @ 2012-02-13 19:22 UTC (permalink / raw)
  To: Blue Swirl
  Cc: Anthony Liguori, kvm, Gleb Natapov, Marcelo Tosatti, qemu-devel,
	Avi Kivity

On 2012-02-13 19:50, Blue Swirl wrote:
> On Mon, Feb 13, 2012 at 10:16, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>> On 2012-02-11 16:25, Blue Swirl wrote:
>>> On Fri, Feb 10, 2012 at 18:31, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>>>> This enables acceleration for MMIO-based TPR register accesses of
>>>> 32-bit Windows guest systems. It is mostly useful with KVM enabled,
>>>> either on older Intel CPUs (without flexpriority feature, can also be
>>>> manually disabled for testing) or any current AMD processor.
>>>>
>>>> The approach introduced here is derived from the original version of
>>>> qemu-kvm. It was refactored, documented, and extended by support for
>>>> user space APIC emulation, both with and without KVM acceleration. The
>>>> VMState format was kept compatible, so was the ABI to the option ROM
>>>> that implements the guest-side para-virtualized driver service. This
>>>> enables seamless migration from qemu-kvm to upstream or, one day,
>>>> between KVM and TCG mode.
>>>>
>>>> The basic concept goes like this:
>>>>  - VAPIC PV interface consisting of I/O port 0x7e and (for KVM in-kernel
>>>>   irqchip) a vmcall hypercall is registered
>>>>  - VAPIC option ROM is loaded into guest
>>>>  - option ROM activates TPR MMIO access reporting via port 0x7e
>>>>  - TPR accesses are trapped and patched in the guest to call into option
>>>>   ROM instead, VAPIC support is enabled
>>>>  - option ROM TPR helpers track state in memory and invoke hypercall to
>>>>   poll for pending IRQs if required
>>>>
>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>>>
>>> I must say that I find the approach horrible, patching guests and ROMs
>>> and looking up Windows internals. Taking the same approach to extreme,
>>> we could for example patch a Xen guest to become a KVM guest. Not that I
>>> object to merging.
>>
>> Yes, this is horrible. But there is no real better way in the absence of
>> hardware assisted virtualization of the TPR. I think MS is recommending
>> this patching approach as well.
> 
> Maybe instead of routing via ROM and the hypercall, the TPR accesses
> could be handled directly with guest invisible breakpoints (like GDB
> breakpoints, but for QEMU internal use), much like other
> instrumentation could be handled.

Gleb answered it already.

>>>> @@ -238,6 +275,7 @@ static int apic_init_common(SysBusDevice *dev)
>>>>  {
>>>>     APICCommonState *s = APIC_COMMON(dev);
>>>>     APICCommonClass *info;
>>>> +    static DeviceState *vapic;
>>>>     static int apic_no;
>>>>
>>>>     if (apic_no >= MAX_APICS) {
>>>> @@ -248,10 +286,29 @@ static int apic_init_common(SysBusDevice *dev)
>>>>     info = APIC_COMMON_GET_CLASS(s);
>>>>     info->init(s);
>>>>
>>>> -    sysbus_init_mmio(&s->busdev, &s->io_memory);
>>>> +    sysbus_init_mmio(dev, &s->io_memory);
>>>> +
>>>> +    if (!vapic && s->vapic_control & VAPIC_ENABLE_MASK) {
>>>> +        vapic = sysbus_create_simple("kvmvapic", -1, NULL);
>>>> +    }
>>>> +    s->vapic = vapic;
>>>> +    if (apic_report_tpr_access && info->enable_tpr_reporting) {
>>>
>>> I think you should not rely on apic_report_tpr_access being in sane
>>> condition during class init.
>>
>> It is mandatory, e.g. for CPU hotplug, as reporting needs to be
>> consistent across all VCPUs. Therefore it is a static global, set to
>> false initially. However, you are right, we lack proper clearing of the
>> access report feature on reset, not only in this variable.
> 
> I'd also set it to false initially.

It's a global variable, thus initialized to false by definition.
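
For reference, a minimal sketch of the C rule relied on here: an object
with static storage duration and no initializer is zero-initialized, so a
bool starts out false. The variable name mirrors the patch; the three
helper functions are hypothetical and only there to make the sketch
testable. The reset helper illustrates the explicit clearing that is
noted above as still missing.

```c
#include <assert.h>
#include <stdbool.h>

/* Static storage duration and no initializer: C zero-initializes this
 * object, so it is false before any code runs. */
static bool apic_report_tpr_access;

/* Hypothetical accessor, for illustration only. */
bool tpr_reporting_enabled(void)
{
    return apic_report_tpr_access;
}

/* Hypothetical: what activating reporting would do to the flag. */
void enable_tpr_reporting(void)
{
    apic_report_tpr_access = true;
}

/* Hypothetical: zero-initialization only covers program start, so a
 * reset path has to clear the flag explicitly. */
void reset_tpr_reporting(void)
{
    apic_report_tpr_access = false;
}
```
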

>>>> +
>>>> +#define VAPIC_CPU_SHIFT         7
>>>> +
>>>> +#define ROM_BLOCK_SIZE          512
>>>> +#define ROM_BLOCK_MASK          (~(ROM_BLOCK_SIZE - 1))
>>>> +
>>>> +typedef struct VAPICHandlers {
>>>> +    uint32_t set_tpr;
>>>> +    uint32_t set_tpr_eax;
>>>> +    uint32_t get_tpr[8];
>>>> +    uint32_t get_tpr_stack;
>>>> +} QEMU_PACKED VAPICHandlers;
>>>> +
>>>> +typedef struct GuestROMState {
>>>> +    char signature[8];
>>>> +    uint32_t vaddr;
>>>
>>> This does not look 64 bit clean.
>>
>> It's packed.
> 
> I meant "virtual address could be 64 bits on a 64 bit host", not
> structure packing.

This is for 32-bit guests only. 64-bit Windows doesn't access the TPR
via MMIO, thus is not activating the VAPIC.

>>>> +    uint32_t state;
>>>> +    uint32_t rom_state_paddr;
>>>> +    uint32_t rom_state_vaddr;
>>>> +    uint32_t vapic_paddr;
>>>> +    uint32_t real_tpr_addr;
>>>> +    GuestROMState rom_state;
>>>> +    size_t rom_size;
>>>> +} VAPICROMState;
>>>> +
>>>> +#define TPR_INSTR_IS_WRITE              0x1
>>>> +#define TPR_INSTR_ABS_MODRM             0x2
>>>> +#define TPR_INSTR_MATCH_MODRM_REG       0x4
>>>> +
>>>> +typedef struct TPRInstruction {
>>>> +    uint8_t opcode;
>>>> +    uint8_t modrm_reg;
>>>> +    unsigned int flags;
>>>> +    size_t length;
>>>> +    off_t addr_offset;
>>>> +} TPRInstruction;
>>>
>>> Also here the order is pessimized.
>>
>> Don't see the gain here, though.
> 
> There are two bytes' hole between modrm_reg and flags, maybe also 4
> bytes between length and addr_offset (if size_t is 32 bits but off_t
> 64 bits). I'd reverse the order so that members with largest alignment
> needs come first.

Well, but this won't make the struct smaller. I prefer to keep the
ordering in which we also initialize it.
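
To make the size argument concrete, here is a small sketch comparing the
two member orders. It is not the patch's struct: length and addr_offset
are fixed at 64 bits (an assumption, so the result does not depend on the
host's size_t and off_t), and the struct and function names are made up
for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Member order as in the patch (also the initialization order). */
struct tpr_instr_patch_order {
    uint8_t  opcode;
    uint8_t  modrm_reg;
    uint32_t flags;        /* preceded by a 2-byte hole */
    uint64_t length;
    int64_t  addr_offset;
};

/* Suggested order: members with the largest alignment first. */
struct tpr_instr_sorted {
    int64_t  addr_offset;
    uint64_t length;
    uint32_t flags;
    uint8_t  opcode;
    uint8_t  modrm_reg;    /* followed by 2 bytes of tail padding */
};

size_t patch_order_size(void)
{
    return sizeof(struct tpr_instr_patch_order);
}

size_t sorted_size(void)
{
    return sizeof(struct tpr_instr_sorted);
}

size_t patch_order_flags_offset(void)
{
    return offsetof(struct tpr_instr_patch_order, flags);
}
```

With these types the hole after modrm_reg just moves to the end of the
struct as tail padding, so on common ABIs both orders should end up with
the same sizeof.
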

> 
>>>> +static int find_real_tpr_addr(VAPICROMState *s, CPUState *env)
>>>> +{
>>>> +    target_phys_addr_t paddr;
>>>> +    target_ulong addr;
>>>> +
>>>> +    if (s->state == VAPIC_ACTIVE) {
>>>> +        return 0;
>>>> +    }
>>>> +    for (addr = 0xfffff000; addr >= 0x80000000; addr -= TARGET_PAGE_SIZE) {
>>>> +        paddr = cpu_get_phys_page_debug(env, addr);
>>>> +        if (paddr != APIC_DEFAULT_ADDRESS) {
>>>> +            continue;
>>>> +        }
>>>> +        s->real_tpr_addr = addr + 0x80;
>>>> +        update_guest_rom_state(s);
>>>> +        return 0;
>>>> +    }
>>>
>>> This loop looks odd, what should it do, probe for unused address?
>>
>> Seems to deserve a comment: We have to scan for the guest's mapping of
>> the APIC as we enter here without a hint from a TPR-accessing
>> instruction. So we probe the potential range, trying to find the page
>> that maps to that known physical address (known in the sense that
>> Windows does not remap the APIC physically - nor does QEMU support that
>> so far).
> 
> Yes, more comments would be nice, especially on theory of operation.
> 
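
The scan described above can be sketched outside of QEMU as follows. This
is an illustration under assumptions, not the patch code: the translation
callback, the toy single-page mapping and the "return 0 when nothing is
found" convention are hypothetical, while the scan range, the 0x80 TPR
offset and the default APIC base 0xfee00000 follow the patch and the x86
architecture.

```c
#include <assert.h>
#include <stdint.h>

#define SCAN_PAGE_SIZE  0x1000u
#define SCAN_TOP        0xfffff000u
#define SCAN_BOTTOM     0x80000000u
#define APIC_PHYS_ADDR  0xfee00000u   /* default APIC base */
#define TPR_OFFSET      0x80u         /* TPR register within the APIC page */

/* Hypothetical translation hook: returns the physical page address that
 * backs vaddr, or UINT64_MAX if the page is not mapped. */
typedef uint64_t (*translate_fn)(uint32_t vaddr, void *opaque);

/* Walk the upper 2 GiB of the guest address space from the top down and
 * return the virtual address of the TPR in the first page that maps to
 * the APIC. Returns 0 if no such page exists. */
uint32_t find_tpr_vaddr(translate_fn translate, void *opaque)
{
    uint32_t addr;

    /* The last iteration handles SCAN_BOTTOM; the following decrement
     * drops below it and ends the loop without wrapping. */
    for (addr = SCAN_TOP; addr >= SCAN_BOTTOM; addr -= SCAN_PAGE_SIZE) {
        if (translate(addr, opaque) == APIC_PHYS_ADDR) {
            return addr + TPR_OFFSET;
        }
    }
    return 0;
}

/* Toy "page table" with a single mapping, for demonstration only. */
struct toy_mapping {
    uint32_t vaddr;
    uint64_t paddr;
};

uint64_t toy_translate(uint32_t vaddr, void *opaque)
{
    const struct toy_mapping *m = opaque;

    return vaddr == m->vaddr ? m->paddr : UINT64_MAX;
}
```
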
>>>> +static int evaluate_tpr_instruction(VAPICROMState *s, CPUState *env,
>>>> +                                    target_ulong *pip, int access)
>>>> +{
>>>> +    const TPRInstruction *instr;
>>>> +    target_ulong ip = *pip;
>>>> +    uint8_t opcode[2];
>>>> +    uint32_t real_tpr_addr;
>>>> +    int i;
>>>> +
>>>> +    if ((ip & 0xf0000000) != 0x80000000 && (ip & 0xf0000000) != 0xe0000000) {
>>>
>>> The constants should be using ULL suffix because target_ulong could be
>>> 64 bit, though maybe this is more optimal.
>>
>> target_ulong is 64-bit unconditionally on x86. I'll add this.
> 
> No, target_phys_addr_t is now 64 bits, but target_ulong (register
> size) is 32 bits for i386-softmmu.

Ah, right.
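
A small sketch of how the range check behaves at both register widths.
The function names are made up, and the check is written in its positive
form (the patch tests the negation). One C detail worth keeping in mind:
an unsuffixed 0xf0000000 does not fit a 32-bit int and is therefore typed
unsigned int, so it zero-extends when combined with a 64-bit value. The
ULL suffix mainly documents intent; in both spellings the mask ignores
bits 32 and up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 32-bit register width, as with target_ulong on i386-softmmu. */
bool ip_in_patch_range32(uint32_t ip)
{
    return (ip & 0xf0000000) == 0x80000000 ||
           (ip & 0xf0000000) == 0xe0000000;
}

/* 64-bit register width with explicitly suffixed constants. */
bool ip_in_patch_range64(uint64_t ip)
{
    return (ip & 0xf0000000ULL) == 0x80000000ULL ||
           (ip & 0xf0000000ULL) == 0xe0000000ULL;
}

/* Hypothetical stricter variant that also rejects addresses at or above
 * 4 GiB, which the plain mask cannot see. */
bool ip_in_patch_range64_strict(uint64_t ip)
{
    return (ip >> 32) == 0 && ip_in_patch_range64(ip);
}
```
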

> 
>>>> +
>>>> +/*
>>>> + * Tries to read the unique processor number from the Kernel Processor Control
>>>> + * Region (KPCR) of 32-bit Windows. Returns -1 if the KPCR cannot be accessed
>>>> + * or is considered invalid.
>>>> + */
>>>
>>> Horrible hack. Is guest OS type or version checked somewhere?
>>
>> This is all about hacking Windows 32-bit. And this check encodes that
>> even more strongly. The other important binding is the expected virtual
>> address of the ROM mapping under Windows. I would have preferred
>> checking the version directly, but no one has a complete list of
>> supported guests and their codes.
> 
> Then it would be nice to only enable this with a command line switch,
> so that some random poor non-Windows OS is not patched incorrectly.

I had the same concern, but there is no need to worry, we have
sufficient checks that no other guest is affected. And we do have a
switch as well, the APIC property. But this is better left on by default
to please our guests with optimal performance.

> 
>>>
>>>> +static int get_kpcr_number(CPUState *env)
>>>> +{
>>>> +    struct kpcr {
>>>> +        uint8_t  fill1[0x1c];
>>>> +        uint32_t self;
>>>> +        uint8_t  fill2[0x31];
>>>> +        uint8_t  number;
>>>> +    } QEMU_PACKED kpcr;
>>>
>>> KPCR. Pointers to Windows documentation would be nice.
>>
>> Oops, yes.
>>
>> Unfortunately, this is only an internal structure, not officially
>> documented by MS. However, all supported OS versions are legacy by now,
>> no longer changing their structure.
> 
> This and a note about the supported OS versions could be added as comment.

OK.
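
As a starting point for such a comment, the offsets implied by the struct
can be checked with a standalone sketch. The layout is copied from the
patch, with the GCC/Clang packed attribute standing in for QEMU_PACKED.
The parsing helper is hypothetical, and treating "invalid" as "the self
pointer does not point back at the KPCR" is an assumption made for this
illustration. Byte order conversion is left out, so the sketch assumes
the buffer is in host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Layout from the patch: self ends up at offset 0x1c, number at
 * 0x1c + 4 + 0x31 = 0x51. */
struct kpcr {
    uint8_t  fill1[0x1c];
    uint32_t self;
    uint8_t  fill2[0x31];
    uint8_t  number;
} __attribute__((packed));

/* Hypothetical helper: buf holds len bytes read from guest virtual
 * address kpcr_vaddr. Returns the processor number, or -1 if the buffer
 * is too short or the self pointer does not match (assumed validity
 * check). */
int kpcr_number_from_bytes(const uint8_t *buf, size_t len,
                           uint32_t kpcr_vaddr)
{
    struct kpcr kpcr;

    if (len < sizeof(kpcr)) {
        return -1;
    }
    memcpy(&kpcr, buf, sizeof(kpcr));
    if (kpcr.self != kpcr_vaddr) {
        return -1;
    }
    return kpcr.number;
}
```
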

For the folks that developed it in qemu-kvm: This targets Windows XP,
Vista and Server 2003, all 32-bit, right?

Jan


-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


* Re: [PATCH v2 5/8] kvmvapic: Introduce TPR access optimization for Windows guests
  2012-02-13 19:22           ` [Qemu-devel] " Jan Kiszka
@ 2012-02-14  7:54             ` Gleb Natapov
  -1 siblings, 0 replies; 90+ messages in thread
From: Gleb Natapov @ 2012-02-14  7:54 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Anthony Liguori, kvm, Marcelo Tosatti, qemu-devel, Blue Swirl,
	Avi Kivity

On Mon, Feb 13, 2012 at 08:22:21PM +0100, Jan Kiszka wrote:
> >> Unfortunately, this is only an internal structure, not officially
> >> documented by MS. However, all supported OS versions are legacy by now,
> >> no longer changing their structure.
> > 
> > This and a note about the supported OS versions could be added as comment.
> 
> OK.
> 
> For the folks that developed it in qemu-kvm: This targets Windows XP,
> Vista and Server 2003, all 32-bit, right?
> 
Not Vista. Not sure about Server 2003.

--
			Gleb.


* Re: [PATCH v2 5/8] kvmvapic: Introduce TPR access optimization for Windows guests
  2012-02-14  7:54             ` [Qemu-devel] " Gleb Natapov
@ 2012-02-14  8:55               ` Jan Kiszka
  -1 siblings, 0 replies; 90+ messages in thread
From: Jan Kiszka @ 2012-02-14  8:55 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: Anthony Liguori, kvm, Marcelo Tosatti, qemu-devel, Blue Swirl,
	Avi Kivity

On 2012-02-14 08:54, Gleb Natapov wrote:
> On Mon, Feb 13, 2012 at 08:22:21PM +0100, Jan Kiszka wrote:
>>>> Unfortunately, this is only an internal structure, not officially
>>>> documented by MS. However, all supported OS versions are legacy by now,
>>>> no longer changing their structure.
>>>
>>> This and a note about the supported OS versions could be added as comment.
>>
>> OK.
>>
>> For the folks that developed it in qemu-kvm: This targets Windows XP,
>> Vista and Server 2003, all 32-bit, right?
>>
> Not Vista. Not sure about Server 2003.

I think I saw some 2003 reference in the qemu-kvm git logs.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


* Re: [PATCH v2 5/8] kvmvapic: Introduce TPR access optimization for Windows guests
  2012-02-14  8:55               ` [Qemu-devel] " Jan Kiszka
@ 2012-02-14  8:59                 ` Gleb Natapov
  -1 siblings, 0 replies; 90+ messages in thread
From: Gleb Natapov @ 2012-02-14  8:59 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Anthony Liguori, kvm, Marcelo Tosatti, qemu-devel, Blue Swirl,
	Avi Kivity

On Tue, Feb 14, 2012 at 09:55:46AM +0100, Jan Kiszka wrote:
> On 2012-02-14 08:54, Gleb Natapov wrote:
> > On Mon, Feb 13, 2012 at 08:22:21PM +0100, Jan Kiszka wrote:
> >>>> Unfortunately, this is only an internal structure, not officially
> >>>> documented by MS. However, all supported OS versions are legacy by now,
> >>>> no longer changing their structure.
> >>>
> >>> This and a note about the supported OS versions could be added as comment.
> >>
> >> OK.
> >>
> >> For the folks that developed it in qemu-kvm: This targets Windows XP,
> >> Vista and Server 2003, all 32-bit, right?
> >>
> > Not Vista. Not sure about Server 2003.
> 
> I think I saw some 2003 reference in the qemu-kvm git logs.
> 
Very likely. AFAIK it uses the same kernel as XP.

--
			Gleb.


end of thread, other threads:[~2012-02-14  8:59 UTC | newest]

Thread overview: 90+ messages
2012-02-10 18:31 [PATCH v2 0/8] uq/master: TPR access optimization for Windows guests Jan Kiszka
2012-02-10 18:31 ` [Qemu-devel] " Jan Kiszka
2012-02-10 18:31 ` [PATCH v2 1/8] kvm: Set cpu_single_env only once Jan Kiszka
2012-02-10 18:31   ` [Qemu-devel] " Jan Kiszka
2012-02-11 10:02   ` Blue Swirl
2012-02-11 10:02     ` [Qemu-devel] " Blue Swirl
2012-02-11 10:06     ` Jan Kiszka
2012-02-11 10:06       ` [Qemu-devel] " Jan Kiszka
2012-02-11 11:25       ` Blue Swirl
2012-02-11 11:25         ` [Qemu-devel] " Blue Swirl
2012-02-11 11:49         ` Andreas Färber
2012-02-11 11:49           ` [Qemu-devel] " Andreas Färber
2012-02-11 12:43           ` Jan Kiszka
2012-02-11 12:43             ` Jan Kiszka
2012-02-11 13:06             ` Andreas Färber
2012-02-11 13:06               ` [Qemu-devel] " Andreas Färber
2012-02-11 13:07               ` Jan Kiszka
2012-02-11 13:07                 ` Jan Kiszka
2012-02-11 13:21                 ` Andreas Färber
2012-02-11 13:21                   ` Andreas Färber
2012-02-11 13:35                   ` Jan Kiszka
2012-02-11 13:35                     ` [Qemu-devel] " Jan Kiszka
2012-02-11 13:59                     ` Andreas Färber
2012-02-11 13:59                       ` [Qemu-devel] " Andreas Färber
2012-02-11 14:02                       ` Jan Kiszka
2012-02-11 14:02                         ` [Qemu-devel] " Jan Kiszka
2012-02-11 14:12                         ` Andreas Färber
2012-02-11 14:12                           ` Andreas Färber
2012-02-11 14:24                           ` Jan Kiszka
2012-02-11 14:24                             ` Jan Kiszka
2012-02-11 14:49                             ` Andreas Färber
2012-02-11 14:49                               ` [Qemu-devel] " Andreas Färber
2012-02-13  8:17                           ` Paolo Bonzini
2012-02-13  8:17                             ` Paolo Bonzini
2012-02-11 13:54             ` Blue Swirl
2012-02-11 13:54               ` Blue Swirl
2012-02-11 14:00               ` Jan Kiszka
2012-02-11 14:00                 ` [Qemu-devel] " Jan Kiszka
2012-02-11 14:11                 ` Blue Swirl
2012-02-11 14:11                   ` Blue Swirl
2012-02-11 14:18                   ` Jan Kiszka
2012-02-11 14:18                     ` Jan Kiszka
2012-02-11 14:23                     ` Blue Swirl
2012-02-11 14:23                       ` Blue Swirl
2012-02-11 12:41         ` Jan Kiszka
2012-02-11 12:41           ` [Qemu-devel] " Jan Kiszka
2012-02-10 18:31 ` [PATCH v2 2/8] Allow to use pause_all_vcpus from VCPU context Jan Kiszka
2012-02-10 18:31   ` [Qemu-devel] " Jan Kiszka
2012-02-11 14:16   ` Blue Swirl
2012-02-11 14:16     ` [Qemu-devel] " Blue Swirl
2012-02-11 14:31     ` Jan Kiszka
2012-02-11 14:31       ` [Qemu-devel] " Jan Kiszka
2012-02-10 18:31 ` [PATCH v2 3/8] target-i386: Add infrastructure for reporting TPR MMIO accesses Jan Kiszka
2012-02-10 18:31   ` [Qemu-devel] " Jan Kiszka
2012-02-11 14:32   ` Blue Swirl
2012-02-11 14:32     ` Blue Swirl
2012-02-10 18:31 ` [PATCH v2 4/8] kvmvapic: Add option ROM Jan Kiszka
2012-02-10 18:31   ` [Qemu-devel] " Jan Kiszka
2012-02-10 18:31 ` [PATCH v2 5/8] kvmvapic: Introduce TPR access optimization for Windows guests Jan Kiszka
2012-02-10 18:31   ` [Qemu-devel] " Jan Kiszka
2012-02-11 15:25   ` Blue Swirl
2012-02-11 15:25     ` [Qemu-devel] " Blue Swirl
2012-02-13 10:16     ` Jan Kiszka
2012-02-13 10:16       ` Jan Kiszka
2012-02-13 18:50       ` Blue Swirl
2012-02-13 18:50         ` Blue Swirl
2012-02-13 19:11         ` Gleb Natapov
2012-02-13 19:11           ` Gleb Natapov
2012-02-13 19:22         ` Jan Kiszka
2012-02-13 19:22           ` [Qemu-devel] " Jan Kiszka
2012-02-14  7:54           ` Gleb Natapov
2012-02-14  7:54             ` [Qemu-devel] " Gleb Natapov
2012-02-14  8:55             ` Jan Kiszka
2012-02-14  8:55               ` [Qemu-devel] " Jan Kiszka
2012-02-14  8:59               ` Gleb Natapov
2012-02-14  8:59                 ` [Qemu-devel] " Gleb Natapov
2012-02-10 18:31 ` [PATCH v2 6/8] kvmvapic: Simplify mp/up_set_tpr Jan Kiszka
2012-02-10 18:31   ` [Qemu-devel] " Jan Kiszka
2012-02-10 18:31 ` [PATCH v2 7/8] optionsrom: Reserve space for checksum Jan Kiszka
2012-02-10 18:31   ` [Qemu-devel] " Jan Kiszka
2012-02-11 11:46   ` Andreas Färber
2012-02-11 11:46     ` [Qemu-devel] " Andreas Färber
2012-02-11 12:45     ` Jan Kiszka
2012-02-11 12:45       ` [Qemu-devel] " Jan Kiszka
2012-02-11 12:51       ` Andreas Färber
2012-02-11 12:51         ` [Qemu-devel] " Andreas Färber
2012-02-11 12:57         ` Jan Kiszka
2012-02-11 12:57           ` [Qemu-devel] " Jan Kiszka
2012-02-10 18:31 ` [PATCH v2 8/8] kvmvapic: Use optionrom helpers Jan Kiszka
2012-02-10 18:31   ` [Qemu-devel] " Jan Kiszka
