* [PATCH 00/20] KVM updates for the 2.6.34 merge window (batch 4/4)
@ 2010-02-17 13:45 Avi Kivity
  2010-02-17 13:45 ` [PATCH 01/20] KVM: Fix Codestyle in virt/kvm/coalesced_mmio.c Avi Kivity
                   ` (19 more replies)
  0 siblings, 20 replies; 32+ messages in thread
From: Avi Kivity @ 2010-02-17 13:45 UTC (permalink / raw)
  To: kvm; +Cc: linux-kernel

This is the last of four batches of patches for the 2.6.34 merge window.  KVM
changes for this cycle include:

 - rdtscp support
 - powerpc server-class updates
 - much improved large-guest scaling (now up to 64 vcpus)
 - improved guest fpu handling
 - initial Hyper-V emulation
 - better swapping with EPT
 - 1GB pages on Intel
 - x86 emulator fixes

as well as the usual assortment of random fixes and improvements.

Avi Kivity (2):
  KVM: MMU: Add tracepoint for guest page aging
  KVM: Plan obsolescence of kernel allocated slots, paravirt mmu

Gleb Natapov (9):
  KVM: x86 emulator: Add group8 instruction decoding
  KVM: x86 emulator: Add group9 instruction decoding
  KVM: x86 emulator: Add Virtual-8086 mode of emulation
  KVM: x86 emulator: fix memory access during x86 emulation
  KVM: x86 emulator: Check IOPL level during io instruction emulation
  KVM: x86 emulator: Fix popf emulation
  KVM: x86 emulator: Check CPL level during privilege instruction
    emulation
  KVM: x86 emulator: Add LOCK prefix validity checking
  KVM: x86 emulator: disallow opcode 82 in 64-bit mode

Jochen Maes (1):
  KVM: Fix Codestyle in virt/kvm/coalesced_mmio.c

Liu Yu (1):
  KVM: ppc/booke: Set ESR and DEAR when inject interrupt to guest

Michael S. Tsirkin (1):
  KVM: do not store wqh in irqfd

Sheng Yang (1):
  KVM: VMX: Rename VMX_EPT_IGMT_BIT to VMX_EPT_IPAT_BIT

Wei Yongjun (5):
  KVM: PIT: unregister kvm irq notifier if fail to create pit
  KVM: kvm->arch.vioapic should be NULL if kvm_ioapic_init() failure
  KVM: cleanup the failure path of KVM_CREATE_IRQCHIP ioctrl
  KVM: ia64: destroy ioapic device if fail to setup default irq routing
  KVM: x86 emulator: code style cleanup

 Documentation/feature-removal-schedule.txt |   30 +++
 arch/ia64/kvm/kvm-ia64.c                   |    2 +-
 arch/powerpc/include/asm/kvm_host.h        |    2 +
 arch/powerpc/kvm/booke.c                   |   59 ++++--
 arch/powerpc/kvm/emulate.c                 |    4 +-
 arch/x86/include/asm/kvm_emulate.h         |   15 ++-
 arch/x86/include/asm/kvm_host.h            |    8 +-
 arch/x86/include/asm/vmx.h                 |    2 +-
 arch/x86/kvm/emulate.c                     |  300 +++++++++++++++++++++-------
 arch/x86/kvm/i8254.c                       |    5 +-
 arch/x86/kvm/i8259.c                       |   11 +
 arch/x86/kvm/irq.h                         |    1 +
 arch/x86/kvm/mmu.c                         |   28 ++--
 arch/x86/kvm/mmu.h                         |    6 +
 arch/x86/kvm/paging_tmpl.h                 |   11 +-
 arch/x86/kvm/vmx.c                         |    4 +-
 arch/x86/kvm/x86.c                         |  152 ++++++++++----
 include/trace/events/kvm.h                 |   22 ++
 virt/kvm/coalesced_mmio.c                  |    4 +-
 virt/kvm/eventfd.c                         |    3 -
 virt/kvm/ioapic.c                          |   15 ++-
 virt/kvm/ioapic.h                          |    1 +
 22 files changed, 525 insertions(+), 160 deletions(-)



* [PATCH 01/20] KVM: Fix Codestyle in virt/kvm/coalesced_mmio.c
  2010-02-17 13:45 [PATCH 00/20] KVM updates for the 2.6.34 merge window (batch 4/4) Avi Kivity
@ 2010-02-17 13:45 ` Avi Kivity
  2010-02-17 13:45 ` [PATCH 02/20] KVM: MMU: Add tracepoint for guest page aging Avi Kivity
                   ` (18 subsequent siblings)
  19 siblings, 0 replies; 32+ messages in thread
From: Avi Kivity @ 2010-02-17 13:45 UTC (permalink / raw)
  To: kvm; +Cc: linux-kernel

From: Jochen Maes <jochen.maes@sejo.be>

Fix two coding-style issues in virt/kvm/coalesced_mmio.c.

Signed-off-by: Jochen Maes <jochen.maes@sejo.be>
Signed-off-by: Avi Kivity <avi@redhat.com>
---
 virt/kvm/coalesced_mmio.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/virt/kvm/coalesced_mmio.c b/virt/kvm/coalesced_mmio.c
index 5de6594..5169736 100644
--- a/virt/kvm/coalesced_mmio.c
+++ b/virt/kvm/coalesced_mmio.c
@@ -133,7 +133,7 @@ void kvm_coalesced_mmio_free(struct kvm *kvm)
 }
 
 int kvm_vm_ioctl_register_coalesced_mmio(struct kvm *kvm,
-				         struct kvm_coalesced_mmio_zone *zone)
+					 struct kvm_coalesced_mmio_zone *zone)
 {
 	struct kvm_coalesced_mmio_dev *dev = kvm->coalesced_mmio_dev;
 
@@ -166,7 +166,7 @@ int kvm_vm_ioctl_unregister_coalesced_mmio(struct kvm *kvm,
 	mutex_lock(&kvm->slots_lock);
 
 	i = dev->nb_zones;
-	while(i) {
+	while (i) {
 		z = &dev->zone[i - 1];
 
 		/* unregister all zones
-- 
1.6.5.3



* [PATCH 02/20] KVM: MMU: Add tracepoint for guest page aging
  2010-02-17 13:45 [PATCH 00/20] KVM updates for the 2.6.34 merge window (batch 4/4) Avi Kivity
  2010-02-17 13:45 ` [PATCH 01/20] KVM: Fix Codestyle in virt/kvm/coalesced_mmio.c Avi Kivity
@ 2010-02-17 13:45 ` Avi Kivity
  2010-02-17 13:45 ` [PATCH 03/20] KVM: VMX: Rename VMX_EPT_IGMT_BIT to VMX_EPT_IPAT_BIT Avi Kivity
                   ` (17 subsequent siblings)
  19 siblings, 0 replies; 32+ messages in thread
From: Avi Kivity @ 2010-02-17 13:45 UTC (permalink / raw)
  To: kvm; +Cc: linux-kernel

Signed-off-by: Avi Kivity <avi@redhat.com>
---
 arch/x86/kvm/mmu.c         |   11 ++++++++---
 include/trace/events/kvm.h |   22 ++++++++++++++++++++++
 2 files changed, 30 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index b8da671..7397932 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -151,6 +151,9 @@ module_param(oos_shadow, bool, 0644);
 #define ACC_USER_MASK    PT_USER_MASK
 #define ACC_ALL          (ACC_EXEC_MASK | ACC_WRITE_MASK | ACC_USER_MASK)
 
+#include <trace/events/kvm.h>
+
+#undef TRACE_INCLUDE_FILE
 #define CREATE_TRACE_POINTS
 #include "mmutrace.h"
 
@@ -792,6 +795,7 @@ static int kvm_handle_hva(struct kvm *kvm, unsigned long hva,
 					 unsigned long data))
 {
 	int i, j;
+	int ret;
 	int retval = 0;
 	struct kvm_memslots *slots;
 
@@ -806,16 +810,17 @@ static int kvm_handle_hva(struct kvm *kvm, unsigned long hva,
 		if (hva >= start && hva < end) {
 			gfn_t gfn_offset = (hva - start) >> PAGE_SHIFT;
 
-			retval |= handler(kvm, &memslot->rmap[gfn_offset],
-					  data);
+			ret = handler(kvm, &memslot->rmap[gfn_offset], data);
 
 			for (j = 0; j < KVM_NR_PAGE_SIZES - 1; ++j) {
 				int idx = gfn_offset;
 				idx /= KVM_PAGES_PER_HPAGE(PT_DIRECTORY_LEVEL + j);
-				retval |= handler(kvm,
+				ret |= handler(kvm,
 					&memslot->lpage_info[j][idx].rmap_pde,
 					data);
 			}
+			trace_kvm_age_page(hva, memslot, ret);
+			retval |= ret;
 		}
 	}
 
diff --git a/include/trace/events/kvm.h b/include/trace/events/kvm.h
index 8abdc12..b17d49d 100644
--- a/include/trace/events/kvm.h
+++ b/include/trace/events/kvm.h
@@ -164,6 +164,28 @@ TRACE_EVENT(kvm_fpu,
 	TP_printk("%s", __print_symbolic(__entry->load, kvm_fpu_load_symbol))
 );
 
+TRACE_EVENT(kvm_age_page,
+	TP_PROTO(ulong hva, struct kvm_memory_slot *slot, int ref),
+	TP_ARGS(hva, slot, ref),
+
+	TP_STRUCT__entry(
+		__field(	u64,	hva		)
+		__field(	u64,	gfn		)
+		__field(	u8,	referenced	)
+	),
+
+	TP_fast_assign(
+		__entry->hva		= hva;
+		__entry->gfn		=
+		  slot->base_gfn + ((hva - slot->userspace_addr) >> PAGE_SHIFT);
+		__entry->referenced	= ref;
+	),
+
+	TP_printk("hva %llx gfn %llx %s",
+		  __entry->hva, __entry->gfn,
+		  __entry->referenced ? "YOUNG" : "OLD")
+);
+
 #endif /* _TRACE_KVM_MAIN_H */
 
 /* This part must be outside protection */
-- 
1.6.5.3
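
The gfn logged by the kvm_age_page tracepoint is the slot's base gfn plus the page offset of the hva within the slot, as in TP_fast_assign() above. A minimal standalone sketch of that arithmetic follows. The struct memslot type, the hva_to_gfn() name and the 4 KiB PAGE_SHIFT are assumptions made for illustration; they are not the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12 /* assumption: 4 KiB pages, as on x86 */

/* Hypothetical, trimmed-down stand-in for struct kvm_memory_slot. */
struct memslot {
	uint64_t base_gfn;       /* first guest frame number covered by the slot */
	uint64_t npages;         /* slot length in pages */
	uint64_t userspace_addr; /* host virtual address that backs base_gfn */
};

/* Translate a host virtual address inside the slot to a guest frame number. */
uint64_t hva_to_gfn(const struct memslot *slot, uint64_t hva)
{
	uint64_t end = slot->userspace_addr + (slot->npages << PAGE_SHIFT);

	/* The caller is expected to have matched hva against this slot already. */
	assert(hva >= slot->userspace_addr && hva < end);

	return slot->base_gfn + ((hva - slot->userspace_addr) >> PAGE_SHIFT);
}
```

Any hva within the same 4 KiB page maps to the same gfn, which is why the tracepoint reports one YOUNG/OLD result per page.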



* [PATCH 03/20] KVM: VMX: Rename VMX_EPT_IGMT_BIT to VMX_EPT_IPAT_BIT
  2010-02-17 13:45 [PATCH 00/20] KVM updates for the 2.6.34 merge window (batch 4/4) Avi Kivity
  2010-02-17 13:45 ` [PATCH 01/20] KVM: Fix Codestyle in virt/kvm/coalesced_mmio.c Avi Kivity
  2010-02-17 13:45 ` [PATCH 02/20] KVM: MMU: Add tracepoint for guest page aging Avi Kivity
@ 2010-02-17 13:45 ` Avi Kivity
  2010-02-17 13:45 ` [PATCH 04/20] KVM: PIT: unregister kvm irq notifier if fail to create pit Avi Kivity
                   ` (16 subsequent siblings)
  19 siblings, 0 replies; 32+ messages in thread
From: Avi Kivity @ 2010-02-17 13:45 UTC (permalink / raw)
  To: kvm; +Cc: linux-kernel

From: Sheng Yang <sheng@linux.intel.com>

Follow the new SDM, which now names this bit "Ignore PAT memory type".

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
---
 arch/x86/include/asm/vmx.h |    2 +-
 arch/x86/kvm/vmx.c         |    4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 43f1e9b..fb9a080 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -377,7 +377,7 @@ enum vmcs_field {
 #define VMX_EPT_READABLE_MASK			0x1ull
 #define VMX_EPT_WRITABLE_MASK			0x2ull
 #define VMX_EPT_EXECUTABLE_MASK			0x4ull
-#define VMX_EPT_IGMT_BIT    			(1ull << 6)
+#define VMX_EPT_IPAT_BIT    			(1ull << 6)
 
 #define VMX_EPT_IDENTITY_PAGETABLE_ADDR		0xfffbc000ul
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index b400be0..f82b072 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4001,7 +4001,7 @@ static u64 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
 	 *   b. VT-d with snooping control feature: snooping control feature of
 	 *	VT-d engine can guarantee the cache correctness. Just set it
 	 *	to WB to keep consistent with host. So the same as item 3.
-	 * 3. EPT without VT-d: always map as WB and set IGMT=1 to keep
+	 * 3. EPT without VT-d: always map as WB and set IPAT=1 to keep
 	 *    consistent with host MTRR
 	 */
 	if (is_mmio)
@@ -4012,7 +4012,7 @@ static u64 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
 		      VMX_EPT_MT_EPTE_SHIFT;
 	else
 		ret = (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT)
-			| VMX_EPT_IGMT_BIT;
+			| VMX_EPT_IPAT_BIT;
 
 	return ret;
 }
-- 
1.6.5.3
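
As a rough illustration of what the renamed bit does in the final branch of vmx_get_mt_mask(): the memory type occupies a 3-bit field of the EPT entry and IPAT is the bit directly above it. The sketch below covers only the MMIO and "EPT without VT-d" cases. The numeric values of MTRR_TYPE_UNCACHABLE, MTRR_TYPE_WRBACK and VMX_EPT_MT_EPTE_SHIFT are assumed from kernel headers of that era and should be checked against your tree; the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values; only VMX_EPT_IPAT_BIT appears in the patch itself. */
#define MTRR_TYPE_UNCACHABLE  0
#define MTRR_TYPE_WRBACK      6
#define VMX_EPT_MT_EPTE_SHIFT 3
#define VMX_EPT_IPAT_BIT      (1ull << 6)

/* IPAT sits directly above the 3-bit memory-type field. */
static_assert(VMX_EPT_IPAT_BIT == 1ull << (VMX_EPT_MT_EPTE_SHIFT + 3),
	      "IPAT bit must follow the memory-type field");

/* Simplified: MMIO is uncached, everything else is WB with the guest PAT ignored. */
uint64_t ept_mt_mask_no_vtd(bool is_mmio)
{
	if (is_mmio)
		return (uint64_t)MTRR_TYPE_UNCACHABLE << VMX_EPT_MT_EPTE_SHIFT;

	return ((uint64_t)MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT) |
	       VMX_EPT_IPAT_BIT;
}

/* Extract the memory-type field from an EPT entry value. */
unsigned int ept_memory_type(uint64_t epte)
{
	return (epte >> VMX_EPT_MT_EPTE_SHIFT) & 7;
}

/* True if the entry tells the CPU to ignore the guest's PAT setting. */
bool ept_ignores_pat(uint64_t epte)
{
	return (epte & VMX_EPT_IPAT_BIT) != 0;
}
```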



* [PATCH 04/20] KVM: PIT: unregister kvm irq notifier if fail to create pit
  2010-02-17 13:45 [PATCH 00/20] KVM updates for the 2.6.34 merge window (batch 4/4) Avi Kivity
                   ` (2 preceding siblings ...)
  2010-02-17 13:45 ` [PATCH 03/20] KVM: VMX: Rename VMX_EPT_IGMT_BIT to VMX_EPT_IPAT_BIT Avi Kivity
@ 2010-02-17 13:45 ` Avi Kivity
  2010-02-17 13:45 ` [PATCH 05/20] KVM: kvm->arch.vioapic should be NULL if kvm_ioapic_init() failure Avi Kivity
                   ` (15 subsequent siblings)
  19 siblings, 0 replies; 32+ messages in thread
From: Avi Kivity @ 2010-02-17 13:45 UTC (permalink / raw)
  To: kvm; +Cc: linux-kernel

From: Wei Yongjun <yjwei@cn.fujitsu.com>

If creating the PIT fails, unregister the KVM irq notifiers that were
registered in kvm_create_pit().

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Acked-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
---
 arch/x86/kvm/i8254.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index 6a74246..c9569f2 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -663,8 +663,9 @@ fail_unregister:
 	kvm_io_bus_unregister_dev(kvm, KVM_PIO_BUS, &pit->dev);
 
 fail:
-	if (pit->irq_source_id >= 0)
-		kvm_free_irq_source_id(kvm, pit->irq_source_id);
+	kvm_unregister_irq_mask_notifier(kvm, 0, &pit->mask_notifier);
+	kvm_unregister_irq_ack_notifier(kvm, &pit_state->irq_ack_notifier);
+	kvm_free_irq_source_id(kvm, pit->irq_source_id);
 
 	kfree(pit);
 	return NULL;
-- 
1.6.5.3



* [PATCH 05/20] KVM: kvm->arch.vioapic should be NULL if kvm_ioapic_init() failure
  2010-02-17 13:45 [PATCH 00/20] KVM updates for the 2.6.34 merge window (batch 4/4) Avi Kivity
                   ` (3 preceding siblings ...)
  2010-02-17 13:45 ` [PATCH 04/20] KVM: PIT: unregister kvm irq notifier if fail to create pit Avi Kivity
@ 2010-02-17 13:45 ` Avi Kivity
  2010-02-17 13:45 ` [PATCH 06/20] KVM: cleanup the failure path of KVM_CREATE_IRQCHIP ioctrl Avi Kivity
                   ` (14 subsequent siblings)
  19 siblings, 0 replies; 32+ messages in thread
From: Avi Kivity @ 2010-02-17 13:45 UTC (permalink / raw)
  To: kvm; +Cc: linux-kernel

From: Wei Yongjun <yjwei@cn.fujitsu.com>

kvm->arch.vioapic should be reset to NULL when kvm_ioapic_init() fails
because the io device cannot be registered.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
---
 virt/kvm/ioapic.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index a2edfd1..f3d0693 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -393,8 +393,10 @@ int kvm_ioapic_init(struct kvm *kvm)
 	mutex_lock(&kvm->slots_lock);
 	ret = kvm_io_bus_register_dev(kvm, KVM_MMIO_BUS, &ioapic->dev);
 	mutex_unlock(&kvm->slots_lock);
-	if (ret < 0)
+	if (ret < 0) {
+		kvm->arch.vioapic = NULL;
 		kfree(ioapic);
+	}
 
 	return ret;
 }
-- 
1.6.5.3



* [PATCH 06/20] KVM: cleanup the failure path of KVM_CREATE_IRQCHIP ioctrl
  2010-02-17 13:45 [PATCH 00/20] KVM updates for the 2.6.34 merge window (batch 4/4) Avi Kivity
                   ` (4 preceding siblings ...)
  2010-02-17 13:45 ` [PATCH 05/20] KVM: kvm->arch.vioapic should be NULL if kvm_ioapic_init() failure Avi Kivity
@ 2010-02-17 13:45 ` Avi Kivity
  2010-02-17 13:45 ` [PATCH 07/20] KVM: ia64: destroy ioapic device if fail to setup default irq routing Avi Kivity
                   ` (13 subsequent siblings)
  19 siblings, 0 replies; 32+ messages in thread
From: Avi Kivity @ 2010-02-17 13:45 UTC (permalink / raw)
  To: kvm; +Cc: linux-kernel

From: Wei Yongjun <yjwei@cn.fujitsu.com>

If we fail to init the ioapic device or fail to set up the default irq
routing, the devices registered by kvm_create_pic() and kvm_ioapic_init()
remain registered. This patch unregisters them on those failure paths.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
---
 arch/x86/kvm/i8259.c |   11 +++++++++++
 arch/x86/kvm/irq.h   |    1 +
 arch/x86/kvm/x86.c   |    8 ++++----
 virt/kvm/ioapic.c    |   11 +++++++++++
 virt/kvm/ioapic.h    |    1 +
 5 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/i8259.c b/arch/x86/kvm/i8259.c
index d5753a7..a3711f9 100644
--- a/arch/x86/kvm/i8259.c
+++ b/arch/x86/kvm/i8259.c
@@ -543,3 +543,14 @@ struct kvm_pic *kvm_create_pic(struct kvm *kvm)
 
 	return s;
 }
+
+void kvm_destroy_pic(struct kvm *kvm)
+{
+	struct kvm_pic *vpic = kvm->arch.vpic;
+
+	if (vpic) {
+		kvm_io_bus_unregister_dev(kvm, KVM_PIO_BUS, &vpic->dev);
+		kvm->arch.vpic = NULL;
+		kfree(vpic);
+	}
+}
diff --git a/arch/x86/kvm/irq.h b/arch/x86/kvm/irq.h
index be399e2..0b71d48 100644
--- a/arch/x86/kvm/irq.h
+++ b/arch/x86/kvm/irq.h
@@ -75,6 +75,7 @@ struct kvm_pic {
 };
 
 struct kvm_pic *kvm_create_pic(struct kvm *kvm);
+void kvm_destroy_pic(struct kvm *kvm);
 int kvm_pic_read_irq(struct kvm *kvm);
 void kvm_pic_update_irq(struct kvm_pic *s);
 void kvm_pic_clear_isr_ack(struct kvm *kvm);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index bd3161c..b2f91b9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2771,6 +2771,8 @@ long kvm_arch_vm_ioctl(struct file *filp,
 		if (vpic) {
 			r = kvm_ioapic_init(kvm);
 			if (r) {
+				kvm_io_bus_unregister_dev(kvm, KVM_PIO_BUS,
+							  &vpic->dev);
 				kfree(vpic);
 				goto create_irqchip_unlock;
 			}
@@ -2782,10 +2784,8 @@ long kvm_arch_vm_ioctl(struct file *filp,
 		r = kvm_setup_default_irq_routing(kvm);
 		if (r) {
 			mutex_lock(&kvm->irq_lock);
-			kfree(kvm->arch.vpic);
-			kfree(kvm->arch.vioapic);
-			kvm->arch.vpic = NULL;
-			kvm->arch.vioapic = NULL;
+			kvm_ioapic_destroy(kvm);
+			kvm_destroy_pic(kvm);
 			mutex_unlock(&kvm->irq_lock);
 		}
 	create_irqchip_unlock:
diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index f3d0693..3db15a8 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -401,6 +401,17 @@ int kvm_ioapic_init(struct kvm *kvm)
 	return ret;
 }
 
+void kvm_ioapic_destroy(struct kvm *kvm)
+{
+	struct kvm_ioapic *ioapic = kvm->arch.vioapic;
+
+	if (ioapic) {
+		kvm_io_bus_unregister_dev(kvm, KVM_MMIO_BUS, &ioapic->dev);
+		kvm->arch.vioapic = NULL;
+		kfree(ioapic);
+	}
+}
+
 int kvm_get_ioapic(struct kvm *kvm, struct kvm_ioapic_state *state)
 {
 	struct kvm_ioapic *ioapic = ioapic_irqchip(kvm);
diff --git a/virt/kvm/ioapic.h b/virt/kvm/ioapic.h
index a505ce9..8a751b7 100644
--- a/virt/kvm/ioapic.h
+++ b/virt/kvm/ioapic.h
@@ -72,6 +72,7 @@ int kvm_apic_match_dest(struct kvm_vcpu *vcpu, struct kvm_lapic *source,
 int kvm_apic_compare_prio(struct kvm_vcpu *vcpu1, struct kvm_vcpu *vcpu2);
 void kvm_ioapic_update_eoi(struct kvm *kvm, int vector, int trigger_mode);
 int kvm_ioapic_init(struct kvm *kvm);
+void kvm_ioapic_destroy(struct kvm *kvm);
 int kvm_ioapic_set_irq(struct kvm_ioapic *ioapic, int irq, int level);
 void kvm_ioapic_reset(struct kvm_ioapic *ioapic);
 int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
-- 
1.6.5.3
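
The destroy helpers added above share one shape: unregister the device from its bus, clear the pointer in the VM, then free the object, so that a later lookup sees NULL rather than a stale pointer. A userspace sketch of that shape follows. All types and function names in it (struct vm, bus_unregister(), ioapic_init(), ioapic_destroy()) are hypothetical stand-ins, not the KVM API.

```c
#include <assert.h>
#include <stdlib.h>

struct io_dev { int registered; };
struct ioapic { struct io_dev dev; };
struct vm     { struct ioapic *vioapic; };

/* Hypothetical stand-in for kvm_io_bus_unregister_dev(). */
void bus_unregister(struct io_dev *dev)
{
	dev->registered = 0;
}

/* register_fails simulates a bus registration error. */
int ioapic_init(struct vm *vm, int register_fails)
{
	struct ioapic *ioapic;

	assert(vm->vioapic == NULL); /* must not be initialised twice */

	ioapic = calloc(1, sizeof(*ioapic));
	if (!ioapic)
		return -1;
	vm->vioapic = ioapic;

	if (register_fails) {
		/* Clear the pointer before freeing, as in patch 05. */
		vm->vioapic = NULL;
		free(ioapic);
		return -1;
	}
	ioapic->dev.registered = 1;
	return 0;
}

/* Safe to call when nothing was created, and safe to call twice. */
void ioapic_destroy(struct vm *vm)
{
	struct ioapic *ioapic = vm->vioapic;

	if (ioapic) {
		bus_unregister(&ioapic->dev);
		vm->vioapic = NULL;
		free(ioapic);
	}
}
```

Because the helper checks for NULL itself, a caller's error path can invoke it unconditionally, which is what lets the x86 failure path shrink to two calls.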



* [PATCH 07/20] KVM: ia64: destroy ioapic device if fail to setup default irq routing
  2010-02-17 13:45 [PATCH 00/20] KVM updates for the 2.6.34 merge window (batch 4/4) Avi Kivity
                   ` (5 preceding siblings ...)
  2010-02-17 13:45 ` [PATCH 06/20] KVM: cleanup the failure path of KVM_CREATE_IRQCHIP ioctrl Avi Kivity
@ 2010-02-17 13:45 ` Avi Kivity
  2010-02-17 13:45 ` [PATCH 08/20] KVM: ppc/booke: Set ESR and DEAR when inject interrupt to guest Avi Kivity
                   ` (12 subsequent siblings)
  19 siblings, 0 replies; 32+ messages in thread
From: Avi Kivity @ 2010-02-17 13:45 UTC (permalink / raw)
  To: kvm; +Cc: linux-kernel

From: Wei Yongjun <yjwei@cn.fujitsu.com>

If KVM_CREATE_IRQCHIP fails in kvm_setup_default_irq_routing(), the
ioapic device is not destroyed and kvm->arch.vioapic is not set to
NULL. This may cause KVM_GET_IRQCHIP and KVM_SET_IRQCHIP to access
unexpected memory.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
---
 arch/ia64/kvm/kvm-ia64.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
index 0618898..26e0e08 100644
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -968,7 +968,7 @@ long kvm_arch_vm_ioctl(struct file *filp,
 			goto out;
 		r = kvm_setup_default_irq_routing(kvm);
 		if (r) {
-			kfree(kvm->arch.vioapic);
+			kvm_ioapic_destroy(kvm);
 			goto out;
 		}
 		break;
-- 
1.6.5.3



* [PATCH 08/20] KVM: ppc/booke: Set ESR and DEAR when inject interrupt to guest
  2010-02-17 13:45 [PATCH 00/20] KVM updates for the 2.6.34 merge window (batch 4/4) Avi Kivity
                   ` (6 preceding siblings ...)
  2010-02-17 13:45 ` [PATCH 07/20] KVM: ia64: destroy ioapic device if fail to setup default irq routing Avi Kivity
@ 2010-02-17 13:45 ` Avi Kivity
  2010-02-17 13:45 ` [PATCH 09/20] KVM: do not store wqh in irqfd Avi Kivity
                   ` (11 subsequent siblings)
  19 siblings, 0 replies; 32+ messages in thread
From: Avi Kivity @ 2010-02-17 13:45 UTC (permalink / raw)
  To: kvm; +Cc: linux-kernel

From: Liu Yu <yu.liu@freescale.com>

The old method sets ESR and DEAR prematurely.
Move this step to after we decide to inject the interrupt,
which is closer to how the hardware behaves.

Signed-off-by: Liu Yu <yu.liu@freescale.com>
Acked-by: Hollis Blanchard <hollis@penguinppc.org>
Acked-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
---
 arch/powerpc/include/asm/kvm_host.h |    2 +
 arch/powerpc/kvm/booke.c            |   59 ++++++++++++++++++++++++++---------
 arch/powerpc/kvm/emulate.c          |    4 +-
 3 files changed, 48 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 715aa6b..5e5bae7 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -259,6 +259,8 @@ struct kvm_vcpu_arch {
 #endif
 	ulong fault_dear;
 	ulong fault_esr;
+	ulong queued_dear;
+	ulong queued_esr;
 	gpa_t paddr_accessed;
 
 	u8 io_gpr; /* GPR used as IO source/target */
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index e283e44..4d686cc 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -82,9 +82,32 @@ static void kvmppc_booke_queue_irqprio(struct kvm_vcpu *vcpu,
 	set_bit(priority, &vcpu->arch.pending_exceptions);
 }
 
-void kvmppc_core_queue_program(struct kvm_vcpu *vcpu, ulong flags)
+static void kvmppc_core_queue_dtlb_miss(struct kvm_vcpu *vcpu,
+                                        ulong dear_flags, ulong esr_flags)
 {
-	/* BookE does flags in ESR, so ignore those we get here */
+	vcpu->arch.queued_dear = dear_flags;
+	vcpu->arch.queued_esr = esr_flags;
+	kvmppc_booke_queue_irqprio(vcpu, BOOKE_IRQPRIO_DTLB_MISS);
+}
+
+static void kvmppc_core_queue_data_storage(struct kvm_vcpu *vcpu,
+                                           ulong dear_flags, ulong esr_flags)
+{
+	vcpu->arch.queued_dear = dear_flags;
+	vcpu->arch.queued_esr = esr_flags;
+	kvmppc_booke_queue_irqprio(vcpu, BOOKE_IRQPRIO_DATA_STORAGE);
+}
+
+static void kvmppc_core_queue_inst_storage(struct kvm_vcpu *vcpu,
+                                           ulong esr_flags)
+{
+	vcpu->arch.queued_esr = esr_flags;
+	kvmppc_booke_queue_irqprio(vcpu, BOOKE_IRQPRIO_INST_STORAGE);
+}
+
+void kvmppc_core_queue_program(struct kvm_vcpu *vcpu, ulong esr_flags)
+{
+	vcpu->arch.queued_esr = esr_flags;
 	kvmppc_booke_queue_irqprio(vcpu, BOOKE_IRQPRIO_PROGRAM);
 }
 
@@ -115,14 +138,19 @@ static int kvmppc_booke_irqprio_deliver(struct kvm_vcpu *vcpu,
 {
 	int allowed = 0;
 	ulong msr_mask;
+	bool update_esr = false, update_dear = false;
 
 	switch (priority) {
-	case BOOKE_IRQPRIO_PROGRAM:
 	case BOOKE_IRQPRIO_DTLB_MISS:
-	case BOOKE_IRQPRIO_ITLB_MISS:
-	case BOOKE_IRQPRIO_SYSCALL:
 	case BOOKE_IRQPRIO_DATA_STORAGE:
+		update_dear = true;
+		/* fall through */
 	case BOOKE_IRQPRIO_INST_STORAGE:
+	case BOOKE_IRQPRIO_PROGRAM:
+		update_esr = true;
+		/* fall through */
+	case BOOKE_IRQPRIO_ITLB_MISS:
+	case BOOKE_IRQPRIO_SYSCALL:
 	case BOOKE_IRQPRIO_FP_UNAVAIL:
 	case BOOKE_IRQPRIO_SPE_UNAVAIL:
 	case BOOKE_IRQPRIO_SPE_FP_DATA:
@@ -157,6 +185,10 @@ static int kvmppc_booke_irqprio_deliver(struct kvm_vcpu *vcpu,
 		vcpu->arch.srr0 = vcpu->arch.pc;
 		vcpu->arch.srr1 = vcpu->arch.msr;
 		vcpu->arch.pc = vcpu->arch.ivpr | vcpu->arch.ivor[priority];
+		if (update_esr == true)
+			vcpu->arch.esr = vcpu->arch.queued_esr;
+		if (update_dear == true)
+			vcpu->arch.dear = vcpu->arch.queued_dear;
 		kvmppc_set_msr(vcpu, vcpu->arch.msr & msr_mask);
 
 		clear_bit(priority, &vcpu->arch.pending_exceptions);
@@ -229,8 +261,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
 		if (vcpu->arch.msr & MSR_PR) {
 			/* Program traps generated by user-level software must be handled
 			 * by the guest kernel. */
-			vcpu->arch.esr = vcpu->arch.fault_esr;
-			kvmppc_booke_queue_irqprio(vcpu, BOOKE_IRQPRIO_PROGRAM);
+			kvmppc_core_queue_program(vcpu, vcpu->arch.fault_esr);
 			r = RESUME_GUEST;
 			kvmppc_account_exit(vcpu, USR_PR_INST);
 			break;
@@ -286,16 +317,14 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
 		break;
 
 	case BOOKE_INTERRUPT_DATA_STORAGE:
-		vcpu->arch.dear = vcpu->arch.fault_dear;
-		vcpu->arch.esr = vcpu->arch.fault_esr;
-		kvmppc_booke_queue_irqprio(vcpu, BOOKE_IRQPRIO_DATA_STORAGE);
+		kvmppc_core_queue_data_storage(vcpu, vcpu->arch.fault_dear,
+		                               vcpu->arch.fault_esr);
 		kvmppc_account_exit(vcpu, DSI_EXITS);
 		r = RESUME_GUEST;
 		break;
 
 	case BOOKE_INTERRUPT_INST_STORAGE:
-		vcpu->arch.esr = vcpu->arch.fault_esr;
-		kvmppc_booke_queue_irqprio(vcpu, BOOKE_IRQPRIO_INST_STORAGE);
+		kvmppc_core_queue_inst_storage(vcpu, vcpu->arch.fault_esr);
 		kvmppc_account_exit(vcpu, ISI_EXITS);
 		r = RESUME_GUEST;
 		break;
@@ -316,9 +345,9 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
 		gtlb_index = kvmppc_mmu_dtlb_index(vcpu, eaddr);
 		if (gtlb_index < 0) {
 			/* The guest didn't have a mapping for it. */
-			kvmppc_booke_queue_irqprio(vcpu, BOOKE_IRQPRIO_DTLB_MISS);
-			vcpu->arch.dear = vcpu->arch.fault_dear;
-			vcpu->arch.esr = vcpu->arch.fault_esr;
+			kvmppc_core_queue_dtlb_miss(vcpu,
+			                            vcpu->arch.fault_dear,
+			                            vcpu->arch.fault_esr);
 			kvmppc_mmu_dtlb_miss(vcpu);
 			kvmppc_account_exit(vcpu, DTLB_REAL_MISS_EXITS);
 			r = RESUME_GUEST;
diff --git a/arch/powerpc/kvm/emulate.c b/arch/powerpc/kvm/emulate.c
index b905623..cb72a65 100644
--- a/arch/powerpc/kvm/emulate.c
+++ b/arch/powerpc/kvm/emulate.c
@@ -151,10 +151,10 @@ int kvmppc_emulate_instruction(struct kvm_run *run, struct kvm_vcpu *vcpu)
 	case OP_TRAP:
 #ifdef CONFIG_PPC64
 	case OP_TRAP_64:
+		kvmppc_core_queue_program(vcpu, SRR1_PROGTRAP);
 #else
-		vcpu->arch.esr |= ESR_PTR;
+		kvmppc_core_queue_program(vcpu, vcpu->arch.esr | ESR_PTR);
 #endif
-		kvmppc_core_queue_program(vcpu, SRR1_PROGTRAP);
 		advance = 0;
 		break;
 
-- 
1.6.5.3



* [PATCH 09/20] KVM: do not store wqh in irqfd
  2010-02-17 13:45 [PATCH 00/20] KVM updates for the 2.6.34 merge window (batch 4/4) Avi Kivity
                   ` (7 preceding siblings ...)
  2010-02-17 13:45 ` [PATCH 08/20] KVM: ppc/booke: Set ESR and DEAR when inject interrupt to guest Avi Kivity
@ 2010-02-17 13:45 ` Avi Kivity
  2010-02-17 13:45 ` [PATCH 10/20] KVM: x86 emulator: Add group8 instruction decoding Avi Kivity
                   ` (10 subsequent siblings)
  19 siblings, 0 replies; 32+ messages in thread
From: Avi Kivity @ 2010-02-17 13:45 UTC (permalink / raw)
  To: kvm; +Cc: linux-kernel

From: Michael S. Tsirkin <mst@redhat.com>

wqh is unused, so we do not need to store it in irqfd anymore.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
---
 virt/kvm/eventfd.c |    3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index 486c604..7016319 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -47,7 +47,6 @@ struct _irqfd {
 	int                       gsi;
 	struct list_head          list;
 	poll_table                pt;
-	wait_queue_head_t        *wqh;
 	wait_queue_t              wait;
 	struct work_struct        inject;
 	struct work_struct        shutdown;
@@ -159,8 +158,6 @@ irqfd_ptable_queue_proc(struct file *file, wait_queue_head_t *wqh,
 			poll_table *pt)
 {
 	struct _irqfd *irqfd = container_of(pt, struct _irqfd, pt);
-
-	irqfd->wqh = wqh;
 	add_wait_queue(wqh, &irqfd->wait);
 }
 
-- 
1.6.5.3



* [PATCH 10/20] KVM: x86 emulator: Add group8 instruction decoding
  2010-02-17 13:45 [PATCH 00/20] KVM updates for the 2.6.34 merge window (batch 4/4) Avi Kivity
                   ` (8 preceding siblings ...)
  2010-02-17 13:45 ` [PATCH 09/20] KVM: do not store wqh in irqfd Avi Kivity
@ 2010-02-17 13:45 ` Avi Kivity
  2010-02-17 13:45 ` [PATCH 11/20] KVM: x86 emulator: Add group9 " Avi Kivity
                   ` (9 subsequent siblings)
  19 siblings, 0 replies; 32+ messages in thread
From: Avi Kivity @ 2010-02-17 13:45 UTC (permalink / raw)
  To: kvm; +Cc: linux-kernel

From: Gleb Natapov <gleb@redhat.com>

Use groups mechanism to decode 0F BA instructions.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Avi Kivity <avi@redhat.com>
---
 arch/x86/kvm/emulate.c |    7 ++++++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 645b245..435b1e4 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -88,6 +88,7 @@
 enum {
 	Group1_80, Group1_81, Group1_82, Group1_83,
 	Group1A, Group3_Byte, Group3, Group4, Group5, Group7,
+	Group8,
 };
 
 static u32 opcode_table[256] = {
@@ -267,7 +268,7 @@ static u32 twobyte_table[256] = {
 	0, 0, ByteOp | DstReg | SrcMem | ModRM | Mov,
 	    DstReg | SrcMem16 | ModRM | Mov,
 	/* 0xB8 - 0xBF */
-	0, 0, DstMem | SrcImmByte | ModRM, DstMem | SrcReg | ModRM | BitOp,
+	0, 0, Group | Group8, DstMem | SrcReg | ModRM | BitOp,
 	0, 0, ByteOp | DstReg | SrcMem | ModRM | Mov,
 	    DstReg | SrcMem16 | ModRM | Mov,
 	/* 0xC0 - 0xCF */
@@ -323,6 +324,10 @@ static u32 group_table[] = {
 	0, 0, ModRM | SrcMem, ModRM | SrcMem,
 	SrcNone | ModRM | DstMem | Mov, 0,
 	SrcMem16 | ModRM | Mov, SrcMem | ModRM | ByteOp,
+	[Group8*8] =
+	0, 0, 0, 0,
+	DstMem | SrcImmByte | ModRM, DstMem | SrcImmByte | ModRM,
+	DstMem | SrcImmByte | ModRM, DstMem | SrcImmByte | ModRM,
 };
 
 static u32 group2_table[] = {
-- 
1.6.5.3



* [PATCH 11/20] KVM: x86 emulator: Add group9 instruction decoding
  2010-02-17 13:45 [PATCH 00/20] KVM updates for the 2.6.34 merge window (batch 4/4) Avi Kivity
                   ` (9 preceding siblings ...)
  2010-02-17 13:45 ` [PATCH 10/20] KVM: x86 emulator: Add group8 instruction decoding Avi Kivity
@ 2010-02-17 13:45 ` Avi Kivity
  2010-02-17 13:45 ` [PATCH 12/20] KVM: x86 emulator: Add Virtual-8086 mode of emulation Avi Kivity
                   ` (8 subsequent siblings)
  19 siblings, 0 replies; 32+ messages in thread
From: Avi Kivity @ 2010-02-17 13:45 UTC (permalink / raw)
  To: kvm; +Cc: linux-kernel

From: Gleb Natapov <gleb@redhat.com>

Use groups mechanism to decode 0F C7 instructions.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Avi Kivity <avi@redhat.com>
---
 arch/x86/kvm/emulate.c |    9 +++++++--
 1 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 435b1e4..45a4f7c 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -88,7 +88,7 @@
 enum {
 	Group1_80, Group1_81, Group1_82, Group1_83,
 	Group1A, Group3_Byte, Group3, Group4, Group5, Group7,
-	Group8,
+	Group8, Group9,
 };
 
 static u32 opcode_table[256] = {
@@ -272,7 +272,8 @@ static u32 twobyte_table[256] = {
 	0, 0, ByteOp | DstReg | SrcMem | ModRM | Mov,
 	    DstReg | SrcMem16 | ModRM | Mov,
 	/* 0xC0 - 0xCF */
-	0, 0, 0, DstMem | SrcReg | ModRM | Mov, 0, 0, 0, ImplicitOps | ModRM,
+	0, 0, 0, DstMem | SrcReg | ModRM | Mov,
+	0, 0, 0, Group | GroupDual | Group9,
 	0, 0, 0, 0, 0, 0, 0, 0,
 	/* 0xD0 - 0xDF */
 	0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
@@ -328,6 +329,8 @@ static u32 group_table[] = {
 	0, 0, 0, 0,
 	DstMem | SrcImmByte | ModRM, DstMem | SrcImmByte | ModRM,
 	DstMem | SrcImmByte | ModRM, DstMem | SrcImmByte | ModRM,
+	[Group9*8] =
+	0, ImplicitOps | ModRM, 0, 0, 0, 0, 0, 0,
 };
 
 static u32 group2_table[] = {
@@ -335,6 +338,8 @@ static u32 group2_table[] = {
 	SrcNone | ModRM, 0, 0, SrcNone | ModRM,
 	SrcNone | ModRM | DstMem | Mov, 0,
 	SrcMem16 | ModRM | Mov, 0,
+	[Group9*8] =
+	0, 0, 0, 0, 0, 0, 0, 0,
 };
 
 /* EFLAGS bit definitions. */
-- 
1.6.5.3



* [PATCH 12/20] KVM: x86 emulator: Add Virtual-8086 mode of emulation
  2010-02-17 13:45 [PATCH 00/20] KVM updates for the 2.6.34 merge window (batch 4/4) Avi Kivity
                   ` (10 preceding siblings ...)
  2010-02-17 13:45 ` [PATCH 11/20] KVM: x86 emulator: Add group9 " Avi Kivity
@ 2010-02-17 13:45 ` Avi Kivity
  2010-02-17 13:45 ` [PATCH 13/20] KVM: x86 emulator: fix memory access during x86 emulation Avi Kivity
                   ` (7 subsequent siblings)
  19 siblings, 0 replies; 32+ messages in thread
From: Avi Kivity @ 2010-02-17 13:45 UTC
  To: kvm; +Cc: linux-kernel

From: Gleb Natapov <gleb@redhat.com>

For some instructions the CPU behaves differently in real mode and in
virtual-8086 mode. Tell the emulator which mode the CPU is in, so that it
does not need to poke into vcpu state directly.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Avi Kivity <avi@redhat.com>
---
 arch/x86/include/asm/kvm_emulate.h |    1 +
 arch/x86/kvm/emulate.c             |   12 +++++++-----
 arch/x86/kvm/x86.c                 |    3 ++-
 3 files changed, 10 insertions(+), 6 deletions(-)
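
The mode selection that the x86.c hunk introduces can be read as a priority
chain. A minimal sketch of it as a standalone function; pick_emul_mode() and
its int parameters are hypothetical stand-ins for is_protmode(), the
EFLAGS.VM test, and the cs_l / cs_db values:

```c
/* Mode constants as defined in kvm_emulate.h by this patch. */
#define X86EMUL_MODE_REAL     0	/* Real mode.             */
#define X86EMUL_MODE_VM86     1	/* Virtual 8086 mode.     */
#define X86EMUL_MODE_PROT16   2	/* 16-bit protected mode. */
#define X86EMUL_MODE_PROT32   4	/* 32-bit protected mode. */
#define X86EMUL_MODE_PROT64   8	/* 64-bit (long) mode.    */

/*
 * Hypothetical helper mirroring the ternary chain in emulate_instruction():
 * real mode wins first, then EFLAGS.VM, then CS.L, then CS.D/B.
 */
static int pick_emul_mode(int protmode, int eflags_vm, int cs_l, int cs_db)
{
	if (!protmode)
		return X86EMUL_MODE_REAL;
	if (eflags_vm)
		return X86EMUL_MODE_VM86;
	if (cs_l)
		return X86EMUL_MODE_PROT64;
	if (cs_db)
		return X86EMUL_MODE_PROT32;
	return X86EMUL_MODE_PROT16;
}
```

Checking protected mode first means a stale EFLAGS.VM bit cannot turn a
real-mode guest into a VM86 one.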

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index 9b697c2..784d7c5 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -168,6 +168,7 @@ struct x86_emulate_ctxt {
 
 /* Execution mode, passed to the emulator. */
 #define X86EMUL_MODE_REAL     0	/* Real mode.             */
+#define X86EMUL_MODE_VM86     1	/* Virtual 8086 mode.     */
 #define X86EMUL_MODE_PROT16   2	/* 16-bit protected mode. */
 #define X86EMUL_MODE_PROT32   4	/* 32-bit protected mode. */
 #define X86EMUL_MODE_PROT64   8	/* 64-bit (long) mode.    */
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 45a4f7c..e4e2df3 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -899,6 +899,7 @@ x86_decode_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
 
 	switch (mode) {
 	case X86EMUL_MODE_REAL:
+	case X86EMUL_MODE_VM86:
 	case X86EMUL_MODE_PROT16:
 		def_op_bytes = def_ad_bytes = 2;
 		break;
@@ -1525,7 +1526,7 @@ emulate_syscall(struct x86_emulate_ctxt *ctxt)
 
 	/* syscall is not available in real mode */
 	if (c->lock_prefix || ctxt->mode == X86EMUL_MODE_REAL
-	    || !is_protmode(ctxt->vcpu))
+	    || ctxt->mode == X86EMUL_MODE_VM86)
 		return -1;
 
 	setup_syscalls_segments(ctxt, &cs, &ss);
@@ -1577,8 +1578,8 @@ emulate_sysenter(struct x86_emulate_ctxt *ctxt)
 	if (c->lock_prefix)
 		return -1;
 
-	/* inject #GP if in real mode or paging is disabled */
-	if (ctxt->mode == X86EMUL_MODE_REAL || !is_protmode(ctxt->vcpu)) {
+	/* inject #GP if in real mode */
+	if (ctxt->mode == X86EMUL_MODE_REAL) {
 		kvm_inject_gp(ctxt->vcpu, 0);
 		return -1;
 	}
@@ -1642,8 +1643,9 @@ emulate_sysexit(struct x86_emulate_ctxt *ctxt)
 	if (c->lock_prefix)
 		return -1;
 
-	/* inject #GP if in real mode or paging is disabled */
-	if (ctxt->mode == X86EMUL_MODE_REAL || !is_protmode(ctxt->vcpu)) {
+	/* inject #GP if in real mode or Virtual 8086 mode */
+	if (ctxt->mode == X86EMUL_MODE_REAL ||
+	    ctxt->mode == X86EMUL_MODE_VM86) {
 		kvm_inject_gp(ctxt->vcpu, 0);
 		return -1;
 	}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b2f91b9..a283795 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3348,8 +3348,9 @@ int emulate_instruction(struct kvm_vcpu *vcpu,
 		vcpu->arch.emulate_ctxt.vcpu = vcpu;
 		vcpu->arch.emulate_ctxt.eflags = kvm_get_rflags(vcpu);
 		vcpu->arch.emulate_ctxt.mode =
+			(!is_protmode(vcpu)) ? X86EMUL_MODE_REAL :
 			(vcpu->arch.emulate_ctxt.eflags & X86_EFLAGS_VM)
-			? X86EMUL_MODE_REAL : cs_l
+			? X86EMUL_MODE_VM86 : cs_l
 			? X86EMUL_MODE_PROT64 :	cs_db
 			? X86EMUL_MODE_PROT32 : X86EMUL_MODE_PROT16;
 
-- 
1.6.5.3



* [PATCH 13/20] KVM: x86 emulator: fix memory access during x86 emulation
  2010-02-17 13:45 [PATCH 00/20] KVM updates for the 2.6.34 merge window (batch 4/4) Avi Kivity
                   ` (11 preceding siblings ...)
  2010-02-17 13:45 ` [PATCH 12/20] KVM: x86 emulator: Add Virtual-8086 mode of emulation Avi Kivity
@ 2010-02-17 13:45 ` Avi Kivity
  2010-03-06 13:53   ` Stefan Bader
  2010-02-17 13:45 ` [PATCH 14/20] KVM: x86 emulator: Check IOPL level during io instruction emulation Avi Kivity
                   ` (6 subsequent siblings)
  19 siblings, 1 reply; 32+ messages in thread
From: Avi Kivity @ 2010-02-17 13:45 UTC
  To: kvm; +Cc: linux-kernel

From: Gleb Natapov <gleb@redhat.com>

Currently, when the x86 emulator needs to access memory, the page walk is
done with the broadest permissions possible, so an emulated instruction
that was executed by a userspace process can still access kernel memory.
Fix that by passing the correct access type to the page walker during
emulation.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Avi Kivity <avi@redhat.com>
---
 arch/x86/include/asm/kvm_emulate.h |   14 +++-
 arch/x86/include/asm/kvm_host.h    |    7 ++-
 arch/x86/kvm/emulate.c             |    6 +-
 arch/x86/kvm/mmu.c                 |   17 ++---
 arch/x86/kvm/mmu.h                 |    6 ++
 arch/x86/kvm/paging_tmpl.h         |   11 ++-
 arch/x86/kvm/x86.c                 |  131 +++++++++++++++++++++++++++---------
 7 files changed, 142 insertions(+), 50 deletions(-)
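
The new kvm_mmu_gva_to_gpa_{read,fetch,write,system}() helpers differ only in
the access mask they hand to the walker. A minimal sketch of that mask
computation, plus a toy leaf check to show why it matters. access_mask(),
pte_allows(), enum access_kind and the TOY_PTE_* names are hypothetical; the
toy check ignores CR0.WP, NX and multi-level walks:

```c
#include <stdint.h>

/* Page-fault error code bits, as moved to mmu.h by this patch. */
#define PFERR_PRESENT_MASK (1U << 0)
#define PFERR_WRITE_MASK   (1U << 1)
#define PFERR_USER_MASK    (1U << 2)
#define PFERR_FETCH_MASK   (1U << 4)

/* Hypothetical tag for the four helper flavours. */
enum access_kind { ACCESS_READ, ACCESS_WRITE, ACCESS_FETCH, ACCESS_SYSTEM };

/* Build the access mask the way the four helpers do. */
static uint32_t access_mask(int cpl, enum access_kind kind)
{
	uint32_t access;

	if (kind == ACCESS_SYSTEM)	/* implicit access, CPL is ignored */
		return 0;

	access = (cpl == 3) ? PFERR_USER_MASK : 0;
	if (kind == ACCESS_WRITE)
		access |= PFERR_WRITE_MASK;
	if (kind == ACCESS_FETCH)
		access |= PFERR_FETCH_MASK;
	return access;
}

/* Toy leaf-PTE permission bits (assumed layout, low three bits). */
#define TOY_PTE_PRESENT  (1U << 0)
#define TOY_PTE_WRITABLE (1U << 1)
#define TOY_PTE_USER     (1U << 2)

/* Toy check: does a single PTE permit the requested access? */
static int pte_allows(uint32_t pte, uint32_t access)
{
	if (!(pte & TOY_PTE_PRESENT))
		return 0;
	if ((access & PFERR_USER_MASK) && !(pte & TOY_PTE_USER))
		return 0;
	if ((access & PFERR_WRITE_MASK) && !(pte & TOY_PTE_WRITABLE))
		return 0;
	return 1;
}
```

Before this patch every walk behaved like ACCESS_SYSTEM, so a supervisor-only
page passed the check even when the emulated instruction ran at CPL 3.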

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index 784d7c5..7a6f54f 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -54,13 +54,23 @@ struct x86_emulate_ctxt;
 struct x86_emulate_ops {
 	/*
 	 * read_std: Read bytes of standard (non-emulated/special) memory.
-	 *           Used for instruction fetch, stack operations, and others.
+	 *           Used for descriptor reading.
 	 *  @addr:  [IN ] Linear address from which to read.
 	 *  @val:   [OUT] Value read from memory, zero-extended to 'u_long'.
 	 *  @bytes: [IN ] Number of bytes to read from memory.
 	 */
 	int (*read_std)(unsigned long addr, void *val,
-			unsigned int bytes, struct kvm_vcpu *vcpu);
+			unsigned int bytes, struct kvm_vcpu *vcpu, u32 *error);
+
+	/*
+	 * fetch: Read bytes of standard (non-emulated/special) memory.
+	 *        Used for instruction fetch.
+	 *  @addr:  [IN ] Linear address from which to read.
+	 *  @val:   [OUT] Value read from memory, zero-extended to 'u_long'.
+	 *  @bytes: [IN ] Number of bytes to read from memory.
+	 */
+	int (*fetch)(unsigned long addr, void *val,
+			unsigned int bytes, struct kvm_vcpu *vcpu, u32 *error);
 
 	/*
 	 * read_emulated: Read bytes from emulated/special memory area.
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 1522337..c07c16f 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -243,7 +243,8 @@ struct kvm_mmu {
 	void (*new_cr3)(struct kvm_vcpu *vcpu);
 	int (*page_fault)(struct kvm_vcpu *vcpu, gva_t gva, u32 err);
 	void (*free)(struct kvm_vcpu *vcpu);
-	gpa_t (*gva_to_gpa)(struct kvm_vcpu *vcpu, gva_t gva);
+	gpa_t (*gva_to_gpa)(struct kvm_vcpu *vcpu, gva_t gva, u32 access,
+			    u32 *error);
 	void (*prefetch_page)(struct kvm_vcpu *vcpu,
 			      struct kvm_mmu_page *page);
 	int (*sync_page)(struct kvm_vcpu *vcpu,
@@ -660,6 +661,10 @@ void __kvm_mmu_free_some_pages(struct kvm_vcpu *vcpu);
 int kvm_mmu_load(struct kvm_vcpu *vcpu);
 void kvm_mmu_unload(struct kvm_vcpu *vcpu);
 void kvm_mmu_sync_roots(struct kvm_vcpu *vcpu);
+gpa_t kvm_mmu_gva_to_gpa_read(struct kvm_vcpu *vcpu, gva_t gva, u32 *error);
+gpa_t kvm_mmu_gva_to_gpa_fetch(struct kvm_vcpu *vcpu, gva_t gva, u32 *error);
+gpa_t kvm_mmu_gva_to_gpa_write(struct kvm_vcpu *vcpu, gva_t gva, u32 *error);
+gpa_t kvm_mmu_gva_to_gpa_system(struct kvm_vcpu *vcpu, gva_t gva, u32 *error);
 
 int kvm_emulate_hypercall(struct kvm_vcpu *vcpu);
 
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index e4e2df3..c44b460 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -616,7 +616,7 @@ static int do_fetch_insn_byte(struct x86_emulate_ctxt *ctxt,
 
 	if (linear < fc->start || linear >= fc->end) {
 		size = min(15UL, PAGE_SIZE - offset_in_page(linear));
-		rc = ops->read_std(linear, fc->data, size, ctxt->vcpu);
+		rc = ops->fetch(linear, fc->data, size, ctxt->vcpu, NULL);
 		if (rc)
 			return rc;
 		fc->start = linear;
@@ -671,11 +671,11 @@ static int read_descriptor(struct x86_emulate_ctxt *ctxt,
 		op_bytes = 3;
 	*address = 0;
 	rc = ops->read_std((unsigned long)ptr, (unsigned long *)size, 2,
-			   ctxt->vcpu);
+			   ctxt->vcpu, NULL);
 	if (rc)
 		return rc;
 	rc = ops->read_std((unsigned long)ptr + 2, address, op_bytes,
-			   ctxt->vcpu);
+			   ctxt->vcpu, NULL);
 	return rc;
 }
 
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 7397932..741373e 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -138,12 +138,6 @@ module_param(oos_shadow, bool, 0644);
 #define PT64_PERM_MASK (PT_PRESENT_MASK | PT_WRITABLE_MASK | PT_USER_MASK \
 			| PT64_NX_MASK)
 
-#define PFERR_PRESENT_MASK (1U << 0)
-#define PFERR_WRITE_MASK (1U << 1)
-#define PFERR_USER_MASK (1U << 2)
-#define PFERR_RSVD_MASK (1U << 3)
-#define PFERR_FETCH_MASK (1U << 4)
-
 #define RMAP_EXT 4
 
 #define ACC_EXEC_MASK    1
@@ -1632,7 +1626,7 @@ struct page *gva_to_page(struct kvm_vcpu *vcpu, gva_t gva)
 {
 	struct page *page;
 
-	gpa_t gpa = vcpu->arch.mmu.gva_to_gpa(vcpu, gva);
+	gpa_t gpa = kvm_mmu_gva_to_gpa_read(vcpu, gva, NULL);
 
 	if (gpa == UNMAPPED_GVA)
 		return NULL;
@@ -2155,8 +2149,11 @@ void kvm_mmu_sync_roots(struct kvm_vcpu *vcpu)
 	spin_unlock(&vcpu->kvm->mmu_lock);
 }
 
-static gpa_t nonpaging_gva_to_gpa(struct kvm_vcpu *vcpu, gva_t vaddr)
+static gpa_t nonpaging_gva_to_gpa(struct kvm_vcpu *vcpu, gva_t vaddr,
+				  u32 access, u32 *error)
 {
+	if (error)
+		*error = 0;
 	return vaddr;
 }
 
@@ -2740,7 +2737,7 @@ int kvm_mmu_unprotect_page_virt(struct kvm_vcpu *vcpu, gva_t gva)
 	if (tdp_enabled)
 		return 0;
 
-	gpa = vcpu->arch.mmu.gva_to_gpa(vcpu, gva);
+	gpa = kvm_mmu_gva_to_gpa_read(vcpu, gva, NULL);
 
 	spin_lock(&vcpu->kvm->mmu_lock);
 	r = kvm_mmu_unprotect_page(vcpu->kvm, gpa >> PAGE_SHIFT);
@@ -3237,7 +3234,7 @@ static void audit_mappings_page(struct kvm_vcpu *vcpu, u64 page_pte,
 		if (is_shadow_present_pte(ent) && !is_last_spte(ent, level))
 			audit_mappings_page(vcpu, ent, va, level - 1);
 		else {
-			gpa_t gpa = vcpu->arch.mmu.gva_to_gpa(vcpu, va);
+			gpa_t gpa = kvm_mmu_gva_to_gpa_read(vcpu, va, NULL);
 			gfn_t gfn = gpa >> PAGE_SHIFT;
 			pfn_t pfn = gfn_to_pfn(vcpu->kvm, gfn);
 			hpa_t hpa = (hpa_t)pfn << PAGE_SHIFT;
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 61ef5a6..be66759 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -42,6 +42,12 @@
 #define PT_DIRECTORY_LEVEL 2
 #define PT_PAGE_TABLE_LEVEL 1
 
+#define PFERR_PRESENT_MASK (1U << 0)
+#define PFERR_WRITE_MASK (1U << 1)
+#define PFERR_USER_MASK (1U << 2)
+#define PFERR_RSVD_MASK (1U << 3)
+#define PFERR_FETCH_MASK (1U << 4)
+
 int kvm_mmu_get_spte_hierarchy(struct kvm_vcpu *vcpu, u64 addr, u64 sptes[4]);
 
 static inline void kvm_mmu_free_some_pages(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index df15a53..81eab9a 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -490,18 +490,23 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva)
 	spin_unlock(&vcpu->kvm->mmu_lock);
 }
 
-static gpa_t FNAME(gva_to_gpa)(struct kvm_vcpu *vcpu, gva_t vaddr)
+static gpa_t FNAME(gva_to_gpa)(struct kvm_vcpu *vcpu, gva_t vaddr, u32 access,
+			       u32 *error)
 {
 	struct guest_walker walker;
 	gpa_t gpa = UNMAPPED_GVA;
 	int r;
 
-	r = FNAME(walk_addr)(&walker, vcpu, vaddr, 0, 0, 0);
+	r = FNAME(walk_addr)(&walker, vcpu, vaddr,
+			     !!(access & PFERR_WRITE_MASK),
+			     !!(access & PFERR_USER_MASK),
+			     !!(access & PFERR_FETCH_MASK));
 
 	if (r) {
 		gpa = gfn_to_gpa(walker.gfn);
 		gpa |= vaddr & ~PAGE_MASK;
-	}
+	} else if (error)
+		*error = walker.error_code;
 
 	return gpa;
 }
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a283795..ea3a8af 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3039,14 +3039,41 @@ static int vcpu_mmio_read(struct kvm_vcpu *vcpu, gpa_t addr, int len, void *v)
 	return kvm_io_bus_read(vcpu->kvm, KVM_MMIO_BUS, addr, len, v);
 }
 
-static int kvm_read_guest_virt(gva_t addr, void *val, unsigned int bytes,
-			       struct kvm_vcpu *vcpu)
+gpa_t kvm_mmu_gva_to_gpa_read(struct kvm_vcpu *vcpu, gva_t gva, u32 *error)
+{
+	u32 access = (kvm_x86_ops->get_cpl(vcpu) == 3) ? PFERR_USER_MASK : 0;
+	return vcpu->arch.mmu.gva_to_gpa(vcpu, gva, access, error);
+}
+
+ gpa_t kvm_mmu_gva_to_gpa_fetch(struct kvm_vcpu *vcpu, gva_t gva, u32 *error)
+{
+	u32 access = (kvm_x86_ops->get_cpl(vcpu) == 3) ? PFERR_USER_MASK : 0;
+	access |= PFERR_FETCH_MASK;
+	return vcpu->arch.mmu.gva_to_gpa(vcpu, gva, access, error);
+}
+
+gpa_t kvm_mmu_gva_to_gpa_write(struct kvm_vcpu *vcpu, gva_t gva, u32 *error)
+{
+	u32 access = (kvm_x86_ops->get_cpl(vcpu) == 3) ? PFERR_USER_MASK : 0;
+	access |= PFERR_WRITE_MASK;
+	return vcpu->arch.mmu.gva_to_gpa(vcpu, gva, access, error);
+}
+
+/* uses this to access any guest's mapped memory without checking CPL */
+gpa_t kvm_mmu_gva_to_gpa_system(struct kvm_vcpu *vcpu, gva_t gva, u32 *error)
+{
+	return vcpu->arch.mmu.gva_to_gpa(vcpu, gva, 0, error);
+}
+
+static int kvm_read_guest_virt_helper(gva_t addr, void *val, unsigned int bytes,
+				      struct kvm_vcpu *vcpu, u32 access,
+				      u32 *error)
 {
 	void *data = val;
 	int r = X86EMUL_CONTINUE;
 
 	while (bytes) {
-		gpa_t gpa = vcpu->arch.mmu.gva_to_gpa(vcpu, addr);
+		gpa_t gpa = vcpu->arch.mmu.gva_to_gpa(vcpu, addr, access, error);
 		unsigned offset = addr & (PAGE_SIZE-1);
 		unsigned toread = min(bytes, (unsigned)PAGE_SIZE - offset);
 		int ret;
@@ -3069,14 +3096,37 @@ out:
 	return r;
 }
 
+/* used for instruction fetching */
+static int kvm_fetch_guest_virt(gva_t addr, void *val, unsigned int bytes,
+				struct kvm_vcpu *vcpu, u32 *error)
+{
+	u32 access = (kvm_x86_ops->get_cpl(vcpu) == 3) ? PFERR_USER_MASK : 0;
+	return kvm_read_guest_virt_helper(addr, val, bytes, vcpu,
+					  access | PFERR_FETCH_MASK, error);
+}
+
+static int kvm_read_guest_virt(gva_t addr, void *val, unsigned int bytes,
+			       struct kvm_vcpu *vcpu, u32 *error)
+{
+	u32 access = (kvm_x86_ops->get_cpl(vcpu) == 3) ? PFERR_USER_MASK : 0;
+	return kvm_read_guest_virt_helper(addr, val, bytes, vcpu, access,
+					  error);
+}
+
+static int kvm_read_guest_virt_system(gva_t addr, void *val, unsigned int bytes,
+			       struct kvm_vcpu *vcpu, u32 *error)
+{
+	return kvm_read_guest_virt_helper(addr, val, bytes, vcpu, 0, error);
+}
+
 static int kvm_write_guest_virt(gva_t addr, void *val, unsigned int bytes,
-				struct kvm_vcpu *vcpu)
+				struct kvm_vcpu *vcpu, u32 *error)
 {
 	void *data = val;
 	int r = X86EMUL_CONTINUE;
 
 	while (bytes) {
-		gpa_t gpa = vcpu->arch.mmu.gva_to_gpa(vcpu, addr);
+		gpa_t gpa = kvm_mmu_gva_to_gpa_write(vcpu, addr, error);
 		unsigned offset = addr & (PAGE_SIZE-1);
 		unsigned towrite = min(bytes, (unsigned)PAGE_SIZE - offset);
 		int ret;
@@ -3106,6 +3156,7 @@ static int emulator_read_emulated(unsigned long addr,
 				  struct kvm_vcpu *vcpu)
 {
 	gpa_t                 gpa;
+	u32 error_code;
 
 	if (vcpu->mmio_read_completed) {
 		memcpy(val, vcpu->mmio_data, bytes);
@@ -3115,17 +3166,20 @@ static int emulator_read_emulated(unsigned long addr,
 		return X86EMUL_CONTINUE;
 	}
 
-	gpa = vcpu->arch.mmu.gva_to_gpa(vcpu, addr);
+	gpa = kvm_mmu_gva_to_gpa_read(vcpu, addr, &error_code);
+
+	if (gpa == UNMAPPED_GVA) {
+		kvm_inject_page_fault(vcpu, addr, error_code);
+		return X86EMUL_PROPAGATE_FAULT;
+	}
 
 	/* For APIC access vmexit */
 	if ((gpa & PAGE_MASK) == APIC_DEFAULT_PHYS_BASE)
 		goto mmio;
 
-	if (kvm_read_guest_virt(addr, val, bytes, vcpu)
+	if (kvm_read_guest_virt(addr, val, bytes, vcpu, NULL)
 				== X86EMUL_CONTINUE)
 		return X86EMUL_CONTINUE;
-	if (gpa == UNMAPPED_GVA)
-		return X86EMUL_PROPAGATE_FAULT;
 
 mmio:
 	/*
@@ -3164,11 +3218,12 @@ static int emulator_write_emulated_onepage(unsigned long addr,
 					   struct kvm_vcpu *vcpu)
 {
 	gpa_t                 gpa;
+	u32 error_code;
 
-	gpa = vcpu->arch.mmu.gva_to_gpa(vcpu, addr);
+	gpa = kvm_mmu_gva_to_gpa_write(vcpu, addr, &error_code);
 
 	if (gpa == UNMAPPED_GVA) {
-		kvm_inject_page_fault(vcpu, addr, 2);
+		kvm_inject_page_fault(vcpu, addr, error_code);
 		return X86EMUL_PROPAGATE_FAULT;
 	}
 
@@ -3232,7 +3287,7 @@ static int emulator_cmpxchg_emulated(unsigned long addr,
 		char *kaddr;
 		u64 val;
 
-		gpa = vcpu->arch.mmu.gva_to_gpa(vcpu, addr);
+		gpa = kvm_mmu_gva_to_gpa_write(vcpu, addr, NULL);
 
 		if (gpa == UNMAPPED_GVA ||
 		   (gpa & PAGE_MASK) == APIC_DEFAULT_PHYS_BASE)
@@ -3297,7 +3352,7 @@ void kvm_report_emulation_failure(struct kvm_vcpu *vcpu, const char *context)
 
 	rip_linear = rip + get_segment_base(vcpu, VCPU_SREG_CS);
 
-	kvm_read_guest_virt(rip_linear, (void *)opcodes, 4, vcpu);
+	kvm_read_guest_virt(rip_linear, (void *)opcodes, 4, vcpu, NULL);
 
 	printk(KERN_ERR "emulation failed (%s) rip %lx %02x %02x %02x %02x\n",
 	       context, rip, opcodes[0], opcodes[1], opcodes[2], opcodes[3]);
@@ -3305,7 +3360,8 @@ void kvm_report_emulation_failure(struct kvm_vcpu *vcpu, const char *context)
 EXPORT_SYMBOL_GPL(kvm_report_emulation_failure);
 
 static struct x86_emulate_ops emulate_ops = {
-	.read_std            = kvm_read_guest_virt,
+	.read_std            = kvm_read_guest_virt_system,
+	.fetch               = kvm_fetch_guest_virt,
 	.read_emulated       = emulator_read_emulated,
 	.write_emulated      = emulator_write_emulated,
 	.cmpxchg_emulated    = emulator_cmpxchg_emulated,
@@ -3442,12 +3498,17 @@ static int pio_copy_data(struct kvm_vcpu *vcpu)
 	gva_t q = vcpu->arch.pio.guest_gva;
 	unsigned bytes;
 	int ret;
+	u32 error_code;
 
 	bytes = vcpu->arch.pio.size * vcpu->arch.pio.cur_count;
 	if (vcpu->arch.pio.in)
-		ret = kvm_write_guest_virt(q, p, bytes, vcpu);
+		ret = kvm_write_guest_virt(q, p, bytes, vcpu, &error_code);
 	else
-		ret = kvm_read_guest_virt(q, p, bytes, vcpu);
+		ret = kvm_read_guest_virt(q, p, bytes, vcpu, &error_code);
+
+	if (ret == X86EMUL_PROPAGATE_FAULT)
+		kvm_inject_page_fault(vcpu, q, error_code);
+
 	return ret;
 }
 
@@ -3468,7 +3529,7 @@ int complete_pio(struct kvm_vcpu *vcpu)
 		if (io->in) {
 			r = pio_copy_data(vcpu);
 			if (r)
-				return r;
+				goto out;
 		}
 
 		delta = 1;
@@ -3495,7 +3556,7 @@ int complete_pio(struct kvm_vcpu *vcpu)
 			kvm_register_write(vcpu, VCPU_REGS_RSI, val);
 		}
 	}
-
+out:
 	io->count -= io->cur_count;
 	io->cur_count = 0;
 
@@ -3617,10 +3678,8 @@ int kvm_emulate_pio_string(struct kvm_vcpu *vcpu, int in,
 	if (!vcpu->arch.pio.in) {
 		/* string PIO write */
 		ret = pio_copy_data(vcpu);
-		if (ret == X86EMUL_PROPAGATE_FAULT) {
-			kvm_inject_gp(vcpu, 0);
+		if (ret == X86EMUL_PROPAGATE_FAULT)
 			return 1;
-		}
 		if (ret == 0 && !pio_string_write(vcpu)) {
 			complete_pio(vcpu);
 			if (vcpu->arch.pio.count == 0)
@@ -4663,7 +4722,9 @@ static int load_guest_segment_descriptor(struct kvm_vcpu *vcpu, u16 selector,
 		kvm_queue_exception_e(vcpu, GP_VECTOR, selector & 0xfffc);
 		return X86EMUL_PROPAGATE_FAULT;
 	}
-	return kvm_read_guest_virt(dtable.base + index*8, seg_desc, sizeof(*seg_desc), vcpu);
+	return kvm_read_guest_virt_system(dtable.base + index*8,
+					  seg_desc, sizeof(*seg_desc),
+					  vcpu, NULL);
 }
 
 /* allowed just for 8 bytes segments */
@@ -4677,15 +4738,23 @@ static int save_guest_segment_descriptor(struct kvm_vcpu *vcpu, u16 selector,
 
 	if (dtable.limit < index * 8 + 7)
 		return 1;
-	return kvm_write_guest_virt(dtable.base + index*8, seg_desc, sizeof(*seg_desc), vcpu);
+	return kvm_write_guest_virt(dtable.base + index*8, seg_desc, sizeof(*seg_desc), vcpu, NULL);
+}
+
+static gpa_t get_tss_base_addr_write(struct kvm_vcpu *vcpu,
+			       struct desc_struct *seg_desc)
+{
+	u32 base_addr = get_desc_base(seg_desc);
+
+	return kvm_mmu_gva_to_gpa_write(vcpu, base_addr, NULL);
 }
 
-static gpa_t get_tss_base_addr(struct kvm_vcpu *vcpu,
+static gpa_t get_tss_base_addr_read(struct kvm_vcpu *vcpu,
 			     struct desc_struct *seg_desc)
 {
 	u32 base_addr = get_desc_base(seg_desc);
 
-	return vcpu->arch.mmu.gva_to_gpa(vcpu, base_addr);
+	return kvm_mmu_gva_to_gpa_read(vcpu, base_addr, NULL);
 }
 
 static u16 get_segment_selector(struct kvm_vcpu *vcpu, int seg)
@@ -4894,7 +4963,7 @@ static int kvm_task_switch_16(struct kvm_vcpu *vcpu, u16 tss_selector,
 			    sizeof tss_segment_16))
 		goto out;
 
-	if (kvm_read_guest(vcpu->kvm, get_tss_base_addr(vcpu, nseg_desc),
+	if (kvm_read_guest(vcpu->kvm, get_tss_base_addr_read(vcpu, nseg_desc),
 			   &tss_segment_16, sizeof tss_segment_16))
 		goto out;
 
@@ -4902,7 +4971,7 @@ static int kvm_task_switch_16(struct kvm_vcpu *vcpu, u16 tss_selector,
 		tss_segment_16.prev_task_link = old_tss_sel;
 
 		if (kvm_write_guest(vcpu->kvm,
-				    get_tss_base_addr(vcpu, nseg_desc),
+				    get_tss_base_addr_write(vcpu, nseg_desc),
 				    &tss_segment_16.prev_task_link,
 				    sizeof tss_segment_16.prev_task_link))
 			goto out;
@@ -4933,7 +5002,7 @@ static int kvm_task_switch_32(struct kvm_vcpu *vcpu, u16 tss_selector,
 			    sizeof tss_segment_32))
 		goto out;
 
-	if (kvm_read_guest(vcpu->kvm, get_tss_base_addr(vcpu, nseg_desc),
+	if (kvm_read_guest(vcpu->kvm, get_tss_base_addr_read(vcpu, nseg_desc),
 			   &tss_segment_32, sizeof tss_segment_32))
 		goto out;
 
@@ -4941,7 +5010,7 @@ static int kvm_task_switch_32(struct kvm_vcpu *vcpu, u16 tss_selector,
 		tss_segment_32.prev_task_link = old_tss_sel;
 
 		if (kvm_write_guest(vcpu->kvm,
-				    get_tss_base_addr(vcpu, nseg_desc),
+				    get_tss_base_addr_write(vcpu, nseg_desc),
 				    &tss_segment_32.prev_task_link,
 				    sizeof tss_segment_32.prev_task_link))
 			goto out;
@@ -4964,7 +5033,7 @@ int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int reason)
 	u32 old_tss_base = get_segment_base(vcpu, VCPU_SREG_TR);
 	u16 old_tss_sel = get_segment_selector(vcpu, VCPU_SREG_TR);
 
-	old_tss_base = vcpu->arch.mmu.gva_to_gpa(vcpu, old_tss_base);
+	old_tss_base = kvm_mmu_gva_to_gpa_write(vcpu, old_tss_base, NULL);
 
 	/* FIXME: Handle errors. Failure to read either TSS or their
 	 * descriptors should generate a pagefault.
@@ -5199,7 +5268,7 @@ int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu,
 
 	vcpu_load(vcpu);
 	idx = srcu_read_lock(&vcpu->kvm->srcu);
-	gpa = vcpu->arch.mmu.gva_to_gpa(vcpu, vaddr);
+	gpa = kvm_mmu_gva_to_gpa_system(vcpu, vaddr, NULL);
 	srcu_read_unlock(&vcpu->kvm->srcu, idx);
 	tr->physical_address = gpa;
 	tr->valid = gpa != UNMAPPED_GVA;
-- 
1.6.5.3



* [PATCH 14/20] KVM: x86 emulator: Check IOPL level during io instruction emulation
  2010-02-17 13:45 [PATCH 00/20] KVM updates for the 2.6.34 merge window (batch 4/4) Avi Kivity
                   ` (12 preceding siblings ...)
  2010-02-17 13:45 ` [PATCH 13/20] KVM: x86 emulator: fix memory access during x86 emulation Avi Kivity
@ 2010-02-17 13:45 ` Avi Kivity
  2010-02-17 13:45 ` [PATCH 15/20] KVM: x86 emulator: Fix popf emulation Avi Kivity
                   ` (5 subsequent siblings)
  19 siblings, 0 replies; 32+ messages in thread
From: Avi Kivity @ 2010-02-17 13:45 UTC
  To: kvm; +Cc: linux-kernel

From: Gleb Natapov <gleb@redhat.com>

Make the emulator check that the vcpu is allowed to execute IN, INS, OUT,
OUTS, CLI and STI.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Avi Kivity <avi@redhat.com>
---
 arch/x86/include/asm/kvm_host.h |    1 +
 arch/x86/kvm/emulate.c          |   89 +++++++++++++++++++++++++++++++++++---
 arch/x86/kvm/x86.c              |   10 ++---
 3 files changed, 87 insertions(+), 13 deletions(-)
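
The check has two stages: an IOPL test, and, when that fails, a lookup in the
TSS I/O permission bitmap. A minimal sketch of both stages over a plain byte
array. bad_iopl(), io_bitmap_allows() and io_permitted() are hypothetical
names; unlike the patch, which reads a single bitmap byte, this sketch reads
two so that an access whose bits straddle a byte boundary is fully covered:

```c
#include <stddef.h>
#include <stdint.h>

#define X86EMUL_MODE_REAL 0
#define X86EMUL_MODE_VM86 1
#define X86_EFLAGS_IOPL   0x00003000
#define IOPL_SHIFT        12

/* Stage 1: is the IOPL insufficient for a direct grant? */
static int bad_iopl(int mode, uint32_t eflags, int cpl)
{
	if (mode == X86EMUL_MODE_REAL)
		return 0;		/* real mode: never restricted */
	if (mode == X86EMUL_MODE_VM86)
		return 1;		/* VM86: always consult the bitmap */
	return cpl > (int)((eflags & X86_EFLAGS_IOPL) >> IOPL_SHIFT);
}

/*
 * Stage 2: one bit per port, a set bit denies access. 'len' is the
 * access width in bytes (1, 2 or 4), so 'len' consecutive bits must
 * all be clear.
 */
static int io_bitmap_allows(const uint8_t *bitmap, size_t bitmap_len,
			    uint16_t port, unsigned len)
{
	size_t byte = port / 8;
	unsigned bit = port & 7;
	unsigned mask = (1u << len) - 1;
	uint16_t perm;

	if (byte + 1 >= bitmap_len)	/* both bytes must be in range */
		return 0;
	perm = (uint16_t)(bitmap[byte] | (bitmap[byte + 1] << 8));
	return ((perm >> bit) & mask) == 0;
}

/* Both stages combined, in the same order as the patch. */
static int io_permitted(int mode, uint32_t eflags, int cpl,
			const uint8_t *bitmap, size_t bitmap_len,
			uint16_t port, unsigned len)
{
	if (!bad_iopl(mode, eflags, cpl))
		return 1;
	return io_bitmap_allows(bitmap, bitmap_len, port, len);
}
```

When io_permitted() returns 0 the caller is expected to inject #GP(0), as the
emulate.c hunks below do for IN, INS, OUT and OUTS.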

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index c07c16f..f9a2f66 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -678,6 +678,7 @@ void kvm_disable_tdp(void);
 
 int load_pdptrs(struct kvm_vcpu *vcpu, unsigned long cr3);
 int complete_pio(struct kvm_vcpu *vcpu);
+bool kvm_check_iopl(struct kvm_vcpu *vcpu);
 
 struct kvm_memory_slot *gfn_to_memslot_unaliased(struct kvm *kvm, gfn_t gfn);
 
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index c44b460..296e851 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1698,6 +1698,57 @@ emulate_sysexit(struct x86_emulate_ctxt *ctxt)
 	return 0;
 }
 
+static bool emulator_bad_iopl(struct x86_emulate_ctxt *ctxt)
+{
+	int iopl;
+	if (ctxt->mode == X86EMUL_MODE_REAL)
+		return false;
+	if (ctxt->mode == X86EMUL_MODE_VM86)
+		return true;
+	iopl = (ctxt->eflags & X86_EFLAGS_IOPL) >> IOPL_SHIFT;
+	return kvm_x86_ops->get_cpl(ctxt->vcpu) > iopl;
+}
+
+static bool emulator_io_port_access_allowed(struct x86_emulate_ctxt *ctxt,
+					    struct x86_emulate_ops *ops,
+					    u16 port, u16 len)
+{
+	struct kvm_segment tr_seg;
+	int r;
+	u16 io_bitmap_ptr;
+	u8 perm, bit_idx = port & 0x7;
+	unsigned mask = (1 << len) - 1;
+
+	kvm_get_segment(ctxt->vcpu, &tr_seg, VCPU_SREG_TR);
+	if (tr_seg.unusable)
+		return false;
+	if (tr_seg.limit < 103)
+		return false;
+	r = ops->read_std(tr_seg.base + 102, &io_bitmap_ptr, 2, ctxt->vcpu,
+			  NULL);
+	if (r != X86EMUL_CONTINUE)
+		return false;
+	if (io_bitmap_ptr + port/8 > tr_seg.limit)
+		return false;
+	r = ops->read_std(tr_seg.base + io_bitmap_ptr + port/8, &perm, 1,
+			  ctxt->vcpu, NULL);
+	if (r != X86EMUL_CONTINUE)
+		return false;
+	if ((perm >> bit_idx) & mask)
+		return false;
+	return true;
+}
+
+static bool emulator_io_permited(struct x86_emulate_ctxt *ctxt,
+				 struct x86_emulate_ops *ops,
+				 u16 port, u16 len)
+{
+	if (emulator_bad_iopl(ctxt))
+		if (!emulator_io_port_access_allowed(ctxt, ops, port, len))
+			return false;
+	return true;
+}
+
 int
 x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
 {
@@ -1889,7 +1940,12 @@ special_insn:
 		break;
 	case 0x6c:		/* insb */
 	case 0x6d:		/* insw/insd */
-		 if (kvm_emulate_pio_string(ctxt->vcpu,
+		if (!emulator_io_permited(ctxt, ops, c->regs[VCPU_REGS_RDX],
+					  (c->d & ByteOp) ? 1 : c->op_bytes)) {
+			kvm_inject_gp(ctxt->vcpu, 0);
+			goto done;
+		}
+		if (kvm_emulate_pio_string(ctxt->vcpu,
 				1,
 				(c->d & ByteOp) ? 1 : c->op_bytes,
 				c->rep_prefix ?
@@ -1905,6 +1961,11 @@ special_insn:
 		return 0;
 	case 0x6e:		/* outsb */
 	case 0x6f:		/* outsw/outsd */
+		if (!emulator_io_permited(ctxt, ops, c->regs[VCPU_REGS_RDX],
+					  (c->d & ByteOp) ? 1 : c->op_bytes)) {
+			kvm_inject_gp(ctxt->vcpu, 0);
+			goto done;
+		}
 		if (kvm_emulate_pio_string(ctxt->vcpu,
 				0,
 				(c->d & ByteOp) ? 1 : c->op_bytes,
@@ -2202,7 +2263,13 @@ special_insn:
 	case 0xef: /* out (e/r)ax,dx */
 		port = c->regs[VCPU_REGS_RDX];
 		io_dir_in = 0;
-	do_io:	if (kvm_emulate_pio(ctxt->vcpu, io_dir_in,
+	do_io:
+		if (!emulator_io_permited(ctxt, ops, port,
+					  (c->d & ByteOp) ? 1 : c->op_bytes)) {
+			kvm_inject_gp(ctxt->vcpu, 0);
+			goto done;
+		}
+		if (kvm_emulate_pio(ctxt->vcpu, io_dir_in,
 				   (c->d & ByteOp) ? 1 : c->op_bytes,
 				   port) != 0) {
 			c->eip = saved_eip;
@@ -2227,13 +2294,21 @@ special_insn:
 		c->dst.type = OP_NONE;	/* Disable writeback. */
 		break;
 	case 0xfa: /* cli */
-		ctxt->eflags &= ~X86_EFLAGS_IF;
-		c->dst.type = OP_NONE;	/* Disable writeback. */
+		if (emulator_bad_iopl(ctxt))
+			kvm_inject_gp(ctxt->vcpu, 0);
+		else {
+			ctxt->eflags &= ~X86_EFLAGS_IF;
+			c->dst.type = OP_NONE;	/* Disable writeback. */
+		}
 		break;
 	case 0xfb: /* sti */
-		toggle_interruptibility(ctxt, X86_SHADOW_INT_STI);
-		ctxt->eflags |= X86_EFLAGS_IF;
-		c->dst.type = OP_NONE;	/* Disable writeback. */
+		if (emulator_bad_iopl(ctxt))
+			kvm_inject_gp(ctxt->vcpu, 0);
+		else {
+			toggle_interruptibility(ctxt, X86_SHADOW_INT_STI);
+			ctxt->eflags |= X86_EFLAGS_IF;
+			c->dst.type = OP_NONE;	/* Disable writeback. */
+		}
 		break;
 	case 0xfc: /* cld */
 		ctxt->eflags &= ~EFLG_DF;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ea3a8af..86b739f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3599,6 +3599,8 @@ int kvm_emulate_pio(struct kvm_vcpu *vcpu, int in, int size, unsigned port)
 {
 	unsigned long val;
 
+	trace_kvm_pio(!in, port, size, 1);
+
 	vcpu->run->exit_reason = KVM_EXIT_IO;
 	vcpu->run->io.direction = in ? KVM_EXIT_IO_IN : KVM_EXIT_IO_OUT;
 	vcpu->run->io.size = vcpu->arch.pio.size = size;
@@ -3610,9 +3612,6 @@ int kvm_emulate_pio(struct kvm_vcpu *vcpu, int in, int size, unsigned port)
 	vcpu->arch.pio.down = 0;
 	vcpu->arch.pio.rep = 0;
 
-	trace_kvm_pio(vcpu->run->io.direction == KVM_EXIT_IO_OUT, port,
-		      size, 1);
-
 	if (!vcpu->arch.pio.in) {
 		val = kvm_register_read(vcpu, VCPU_REGS_RAX);
 		memcpy(vcpu->arch.pio_data, &val, 4);
@@ -3633,6 +3632,8 @@ int kvm_emulate_pio_string(struct kvm_vcpu *vcpu, int in,
 	unsigned now, in_page;
 	int ret = 0;
 
+	trace_kvm_pio(!in, port, size, count);
+
 	vcpu->run->exit_reason = KVM_EXIT_IO;
 	vcpu->run->io.direction = in ? KVM_EXIT_IO_IN : KVM_EXIT_IO_OUT;
 	vcpu->run->io.size = vcpu->arch.pio.size = size;
@@ -3644,9 +3645,6 @@ int kvm_emulate_pio_string(struct kvm_vcpu *vcpu, int in,
 	vcpu->arch.pio.down = down;
 	vcpu->arch.pio.rep = rep;
 
-	trace_kvm_pio(vcpu->run->io.direction == KVM_EXIT_IO_OUT, port,
-		      size, count);
-
 	if (!count) {
 		kvm_x86_ops->skip_emulated_instruction(vcpu);
 		return 1;
-- 
1.6.5.3



* [PATCH 15/20] KVM: x86 emulator: Fix popf emulation
  2010-02-17 13:45 [PATCH 00/20] KVM updates for the 2.6.34 merge window (batch 4/4) Avi Kivity
                   ` (13 preceding siblings ...)
  2010-02-17 13:45 ` [PATCH 14/20] KVM: x86 emulator: Check IOPL level during io instruction emulation Avi Kivity
@ 2010-02-17 13:45 ` Avi Kivity
  2010-02-17 13:45 ` [PATCH 16/20] KVM: x86 emulator: Check CPL level during privilege instruction emulation Avi Kivity
                   ` (4 subsequent siblings)
  19 siblings, 0 replies; 32+ messages in thread
From: Avi Kivity @ 2010-02-17 13:45 UTC
  To: kvm; +Cc: linux-kernel

From: Gleb Natapov <gleb@redhat.com>

POPF behaves differently depending on the current CPU mode. Emulate the
correct logic to prevent the guest from changing flags that it could not
change otherwise.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Avi Kivity <avi@redhat.com>
---
 arch/x86/kvm/emulate.c |   55 +++++++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 54 insertions(+), 1 deletions(-)
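
The core of the fix is a per-mode mask of flags that POPF may change. A
minimal sketch of the mask and the merge, detached from the emulator context.
popf_change_mask(), popf_faults() and popf_merge() are hypothetical names,
and cpl and mode are passed in instead of being read from the vcpu:

```c
#define X86EMUL_MODE_REAL   0
#define X86EMUL_MODE_VM86   1
#define X86EMUL_MODE_PROT16 2
#define X86EMUL_MODE_PROT32 4
#define X86EMUL_MODE_PROT64 8

/* EFLAGS bits used by the mask, same positions as in the patch. */
#define EFLG_ID   (1UL << 21)
#define EFLG_AC   (1UL << 18)
#define EFLG_RF   (1UL << 16)
#define EFLG_NT   (1UL << 14)
#define EFLG_IOPL (3UL << 12)
#define EFLG_OF   (1UL << 11)
#define EFLG_DF   (1UL << 10)
#define EFLG_IF   (1UL << 9)
#define EFLG_TF   (1UL << 8)
#define EFLG_SF   (1UL << 7)
#define EFLG_ZF   (1UL << 6)
#define EFLG_AF   (1UL << 4)
#define EFLG_PF   (1UL << 2)
#define EFLG_CF   (1UL << 0)

/* Which flags may POPF change in this mode at this privilege level? */
static unsigned long popf_change_mask(int mode, int cpl, int iopl)
{
	unsigned long mask = EFLG_CF | EFLG_PF | EFLG_AF | EFLG_ZF | EFLG_SF
		| EFLG_OF | EFLG_TF | EFLG_DF | EFLG_NT | EFLG_RF | EFLG_AC
		| EFLG_ID;

	switch (mode) {
	case X86EMUL_MODE_PROT64:
	case X86EMUL_MODE_PROT32:
	case X86EMUL_MODE_PROT16:
		if (cpl == 0)
			mask |= EFLG_IOPL;
		if (cpl <= iopl)
			mask |= EFLG_IF;
		break;
	case X86EMUL_MODE_VM86:
		mask |= EFLG_IF;	/* caller must check popf_faults() */
		break;
	default:			/* real mode */
		mask |= EFLG_IOPL | EFLG_IF;
		break;
	}
	return mask;
}

/* The case where the patch injects #GP(0) instead of popping. */
static int popf_faults(int mode, int iopl)
{
	return mode == X86EMUL_MODE_VM86 && iopl < 3;
}

/* Merge the popped value into the old flags under the mask. */
static unsigned long popf_merge(int mode, int cpl,
				unsigned long old, unsigned long popped)
{
	int iopl = (int)((old & EFLG_IOPL) >> 12);
	unsigned long mask = popf_change_mask(mode, cpl, iopl);

	return (old & ~mask) | (popped & mask);
}
```

A CPL 3 guest with IOPL 0 that pops a value with IOPL set to 3 and IF cleared
keeps its old IOPL and IF; only the unprivileged bits follow the popped value.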

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 296e851..1782387 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -343,11 +343,18 @@ static u32 group2_table[] = {
 };
 
 /* EFLAGS bit definitions. */
+#define EFLG_ID (1<<21)
+#define EFLG_VIP (1<<20)
+#define EFLG_VIF (1<<19)
+#define EFLG_AC (1<<18)
 #define EFLG_VM (1<<17)
 #define EFLG_RF (1<<16)
+#define EFLG_IOPL (3<<12)
+#define EFLG_NT (1<<14)
 #define EFLG_OF (1<<11)
 #define EFLG_DF (1<<10)
 #define EFLG_IF (1<<9)
+#define EFLG_TF (1<<8)
 #define EFLG_SF (1<<7)
 #define EFLG_ZF (1<<6)
 #define EFLG_AF (1<<4)
@@ -1214,6 +1221,49 @@ static int emulate_pop(struct x86_emulate_ctxt *ctxt,
 	return rc;
 }
 
+static int emulate_popf(struct x86_emulate_ctxt *ctxt,
+		       struct x86_emulate_ops *ops,
+		       void *dest, int len)
+{
+	int rc;
+	unsigned long val, change_mask;
+	int iopl = (ctxt->eflags & X86_EFLAGS_IOPL) >> IOPL_SHIFT;
+	int cpl = kvm_x86_ops->get_cpl(ctxt->vcpu);
+
+	rc = emulate_pop(ctxt, ops, &val, len);
+	if (rc != X86EMUL_CONTINUE)
+		return rc;
+
+	change_mask = EFLG_CF | EFLG_PF | EFLG_AF | EFLG_ZF | EFLG_SF | EFLG_OF
+		| EFLG_TF | EFLG_DF | EFLG_NT | EFLG_RF | EFLG_AC | EFLG_ID;
+
+	switch(ctxt->mode) {
+	case X86EMUL_MODE_PROT64:
+	case X86EMUL_MODE_PROT32:
+	case X86EMUL_MODE_PROT16:
+		if (cpl == 0)
+			change_mask |= EFLG_IOPL;
+		if (cpl <= iopl)
+			change_mask |= EFLG_IF;
+		break;
+	case X86EMUL_MODE_VM86:
+		if (iopl < 3) {
+			kvm_inject_gp(ctxt->vcpu, 0);
+			return X86EMUL_PROPAGATE_FAULT;
+		}
+		change_mask |= EFLG_IF;
+		break;
+	default: /* real mode */
+		change_mask |= (EFLG_IOPL | EFLG_IF);
+		break;
+	}
+
+	*(unsigned long *)dest =
+		(ctxt->eflags & ~change_mask) | (val & change_mask);
+
+	return rc;
+}
+
 static void emulate_push_sreg(struct x86_emulate_ctxt *ctxt, int seg)
 {
 	struct decode_cache *c = &ctxt->decode;
@@ -2099,7 +2149,10 @@ special_insn:
 		c->dst.type = OP_REG;
 		c->dst.ptr = (unsigned long *) &ctxt->eflags;
 		c->dst.bytes = c->op_bytes;
-		goto pop_instruction;
+		rc = emulate_popf(ctxt, ops, &c->dst.val, c->op_bytes);
+		if (rc != X86EMUL_CONTINUE)
+			goto done;
+		break;
 	case 0xa0 ... 0xa1:	/* mov */
 		c->dst.ptr = (unsigned long *)&c->regs[VCPU_REGS_RAX];
 		c->dst.val = c->src.val;
-- 
1.6.5.3
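
The mode-dependent rule described in the commit message can be tried outside the kernel. What follows is a minimal standalone sketch of the same change-mask logic, not the kernel code itself: the FL_* names, the cpu_mode enum, and both helper functions are hypothetical stand-ins for EFLG_*, X86EMUL_MODE_*, and emulate_popf(), and the three protected modes are folded into one case.

```c
#include <assert.h>

/* EFLAGS bit positions, matching the EFLG_* values used in the patch. */
#define FL_CF   (1UL << 0)
#define FL_PF   (1UL << 2)
#define FL_AF   (1UL << 4)
#define FL_ZF   (1UL << 6)
#define FL_SF   (1UL << 7)
#define FL_TF   (1UL << 8)
#define FL_IF   (1UL << 9)
#define FL_DF   (1UL << 10)
#define FL_OF   (1UL << 11)
#define FL_IOPL (3UL << 12)
#define FL_NT   (1UL << 14)
#define FL_RF   (1UL << 16)
#define FL_AC   (1UL << 18)
#define FL_ID   (1UL << 21)

/* Hypothetical mode enum; the kernel uses X86EMUL_MODE_* instead. */
enum cpu_mode { MODE_REAL, MODE_VM86, MODE_PROT };

/*
 * Compute which EFLAGS bits POPF may change.
 * Returns 0 and stores the mask in *mask, or -1 when the instruction
 * should fault instead (VM86 mode with IOPL < 3).
 */
static int popf_change_mask(enum cpu_mode mode, int cpl, int iopl,
			    unsigned long *mask)
{
	/* Bits that POPF may change in every mode. */
	unsigned long m = FL_CF | FL_PF | FL_AF | FL_ZF | FL_SF | FL_OF |
			  FL_TF | FL_DF | FL_NT | FL_RF | FL_AC | FL_ID;

	assert(cpl >= 0 && cpl <= 3 && iopl >= 0 && iopl <= 3);

	switch (mode) {
	case MODE_PROT:
		if (cpl == 0)		/* only ring 0 may change IOPL */
			m |= FL_IOPL;
		if (cpl <= iopl)	/* IF needs CPL <= IOPL */
			m |= FL_IF;
		break;
	case MODE_VM86:
		if (iopl < 3)		/* the patch injects #GP here */
			return -1;
		m |= FL_IF;
		break;
	default:			/* real mode: no restrictions */
		m |= FL_IOPL | FL_IF;
		break;
	}

	*mask = m;
	return 0;
}

/* Keep the protected bits from the old value, take the rest from the stack. */
static unsigned long popf_merge(unsigned long old, unsigned long popped,
				unsigned long mask)
{
	return (old & ~mask) | (popped & mask);
}
```

With CPL 3 and IOPL 0 in protected mode, the mask contains neither IF nor IOPL, so a popped value with IF cleared leaves the interrupt flag of the old EFLAGS untouched.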



* [PATCH 16/20] KVM: x86 emulator: Check CPL level during privilege instruction emulation
  2010-02-17 13:45 [PATCH 00/20] KVM updates for the 2.6.34 merge window (batch 4/4) Avi Kivity
                   ` (14 preceding siblings ...)
  2010-02-17 13:45 ` [PATCH 15/20] KVM: x86 emulator: Fix popf emulation Avi Kivity
@ 2010-02-17 13:45 ` Avi Kivity
  2010-02-17 13:45 ` [PATCH 17/20] KVM: x86 emulator: Add LOCK prefix validity checking Avi Kivity
                   ` (3 subsequent siblings)
  19 siblings, 0 replies; 32+ messages in thread
From: Avi Kivity @ 2010-02-17 13:45 UTC (permalink / raw)
  To: kvm; +Cc: linux-kernel

From: Gleb Natapov <gleb@redhat.com>

Add CPL checking in case the emulator is tricked into emulating a
privileged instruction from userspace.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Avi Kivity <avi@redhat.com>
---
 arch/x86/kvm/emulate.c |   35 ++++++++++++++++++++---------------
 1 files changed, 20 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 1782387..d632111 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -76,6 +76,7 @@
 #define GroupDual   (1<<15)     /* Alternate decoding of mod == 3 */
 #define GroupMask   0xff        /* Group number stored in bits 0:7 */
 /* Misc flags */
+#define Priv        (1<<27) /* instruction generates #GP if current CPL != 0 */
 #define No64	    (1<<28)
 /* Source 2 operand type */
 #define Src2None    (0<<29)
@@ -211,7 +212,7 @@ static u32 opcode_table[256] = {
 	SrcNone | ByteOp | ImplicitOps, SrcNone | ImplicitOps,
 	/* 0xF0 - 0xF7 */
 	0, 0, 0, 0,
-	ImplicitOps, ImplicitOps, Group | Group3_Byte, Group | Group3,
+	ImplicitOps | Priv, ImplicitOps, Group | Group3_Byte, Group | Group3,
 	/* 0xF8 - 0xFF */
 	ImplicitOps, 0, ImplicitOps, ImplicitOps,
 	ImplicitOps, ImplicitOps, Group | Group4, Group | Group5,
@@ -219,16 +220,20 @@ static u32 opcode_table[256] = {
 
 static u32 twobyte_table[256] = {
 	/* 0x00 - 0x0F */
-	0, Group | GroupDual | Group7, 0, 0, 0, ImplicitOps, ImplicitOps, 0,
-	ImplicitOps, ImplicitOps, 0, 0, 0, ImplicitOps | ModRM, 0, 0,
+	0, Group | GroupDual | Group7, 0, 0,
+	0, ImplicitOps, ImplicitOps | Priv, 0,
+	ImplicitOps | Priv, ImplicitOps | Priv, 0, 0,
+	0, ImplicitOps | ModRM, 0, 0,
 	/* 0x10 - 0x1F */
 	0, 0, 0, 0, 0, 0, 0, 0, ImplicitOps | ModRM, 0, 0, 0, 0, 0, 0, 0,
 	/* 0x20 - 0x2F */
-	ModRM | ImplicitOps, ModRM, ModRM | ImplicitOps, ModRM, 0, 0, 0, 0,
+	ModRM | ImplicitOps | Priv, ModRM | Priv,
+	ModRM | ImplicitOps | Priv, ModRM | Priv,
+	0, 0, 0, 0,
 	0, 0, 0, 0, 0, 0, 0, 0,
 	/* 0x30 - 0x3F */
-	ImplicitOps, 0, ImplicitOps, 0,
-	ImplicitOps, ImplicitOps, 0, 0,
+	ImplicitOps | Priv, 0, ImplicitOps | Priv, 0,
+	ImplicitOps, ImplicitOps | Priv, 0, 0,
 	0, 0, 0, 0, 0, 0, 0, 0,
 	/* 0x40 - 0x47 */
 	DstReg | SrcMem | ModRM | Mov, DstReg | SrcMem | ModRM | Mov,
@@ -322,9 +327,9 @@ static u32 group_table[] = {
 	SrcMem | ModRM | Stack, 0,
 	SrcMem | ModRM | Stack, 0, SrcMem | ModRM | Stack, 0,
 	[Group7*8] =
-	0, 0, ModRM | SrcMem, ModRM | SrcMem,
+	0, 0, ModRM | SrcMem | Priv, ModRM | SrcMem | Priv,
 	SrcNone | ModRM | DstMem | Mov, 0,
-	SrcMem16 | ModRM | Mov, SrcMem | ModRM | ByteOp,
+	SrcMem16 | ModRM | Mov | Priv, SrcMem | ModRM | ByteOp | Priv,
 	[Group8*8] =
 	0, 0, 0, 0,
 	DstMem | SrcImmByte | ModRM, DstMem | SrcImmByte | ModRM,
@@ -335,7 +340,7 @@ static u32 group_table[] = {
 
 static u32 group2_table[] = {
 	[Group7*8] =
-	SrcNone | ModRM, 0, 0, SrcNone | ModRM,
+	SrcNone | ModRM | Priv, 0, 0, SrcNone | ModRM,
 	SrcNone | ModRM | DstMem | Mov, 0,
 	SrcMem16 | ModRM | Mov, 0,
 	[Group9*8] =
@@ -1700,12 +1705,6 @@ emulate_sysexit(struct x86_emulate_ctxt *ctxt)
 		return -1;
 	}
 
-	/* sysexit must be called from CPL 0 */
-	if (kvm_x86_ops->get_cpl(ctxt->vcpu) != 0) {
-		kvm_inject_gp(ctxt->vcpu, 0);
-		return -1;
-	}
-
 	setup_syscalls_segments(ctxt, &cs, &ss);
 
 	if ((c->rex_prefix & 0x8) != 0x0)
@@ -1820,6 +1819,12 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
 	memcpy(c->regs, ctxt->vcpu->arch.regs, sizeof c->regs);
 	saved_eip = c->eip;
 
+	/* Privileged instruction can be executed only in CPL=0 */
+	if ((c->d & Priv) && kvm_x86_ops->get_cpl(ctxt->vcpu)) {
+		kvm_inject_gp(ctxt->vcpu, 0);
+		goto done;
+	}
+
 	if (((c->d & ModRM) && (c->modrm_mod != 3)) || (c->d & MemAbs))
 		memop = c->modrm_ea;
 
-- 
1.6.5.3
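
The approach in this patch is table-driven: each opcode carries a flags word, and a single check before execution replaces the per-instruction CPL tests. A minimal standalone sketch of that idea follows, assuming a tiny two-entry table. F_IMPLICIT, demo_table, and check_privilege() are hypothetical names for illustration; only the bit position of the Priv flag and the hlt entry at opcode 0xF4 are taken from the patch.

```c
#include <stdint.h>

#define F_IMPLICIT (1u << 0)	/* hypothetical stand-in for ImplicitOps */
#define F_PRIV     (1u << 27)	/* same bit as Priv in the patch */

enum check_result { CHECK_OK, CHECK_GP };

/* Two entries of a one-byte opcode table; all others default to 0. */
static const uint32_t demo_table[256] = {
	[0xf4] = F_IMPLICIT | F_PRIV,	/* hlt: privileged */
	[0xf5] = F_IMPLICIT,		/* cmc: allowed at any CPL */
};

/* Report #GP when a privileged opcode is decoded at CPL != 0. */
static enum check_result check_privilege(uint8_t opcode, int cpl)
{
	uint32_t d = demo_table[opcode];

	if ((d & F_PRIV) && cpl != 0)
		return CHECK_GP;
	return CHECK_OK;
}
```

Keeping the privilege requirement in the decode table means a newly added privileged opcode only needs the flag set in its table entry, rather than its own CPL test in the execution path.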



* [PATCH 17/20] KVM: x86 emulator: Add LOCK prefix validity checking
  2010-02-17 13:45 [PATCH 00/20] KVM updates for the 2.6.34 merge window (batch 4/4) Avi Kivity
                   ` (15 preceding siblings ...)
  2010-02-17 13:45 ` [PATCH 16/20] KVM: x86 emulator: Check CPL level during privilege instruction emulation Avi Kivity
@ 2010-02-17 13:45 ` Avi Kivity
  2010-02-17 13:45 ` [PATCH 18/20] KVM: Plan obsolescence of kernel allocated slots, paravirt mmu Avi Kivity
                   ` (2 subsequent siblings)
  19 siblings, 0 replies; 32+ messages in thread
From: Avi Kivity @ 2010-02-17 13:45 UTC (permalink / raw)
  To: kvm; +Cc: linux-kernel

From: Gleb Natapov <gleb@redhat.com>

Instructions which are not allowed to have LOCK prefix should
generate #UD if one is used.

[avi: fold opcode 82 fix from another patch]

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
---
 arch/x86/kvm/emulate.c |   97 +++++++++++++++++++++++++++--------------------
 1 files changed, 56 insertions(+), 41 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index d632111..c2de9f0 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -76,6 +76,7 @@
 #define GroupDual   (1<<15)     /* Alternate decoding of mod == 3 */
 #define GroupMask   0xff        /* Group number stored in bits 0:7 */
 /* Misc flags */
+#define Lock        (1<<26) /* lock prefix is allowed for the instruction */
 #define Priv        (1<<27) /* instruction generates #GP if current CPL != 0 */
 #define No64	    (1<<28)
 /* Source 2 operand type */
@@ -94,35 +95,35 @@ enum {
 
 static u32 opcode_table[256] = {
 	/* 0x00 - 0x07 */
-	ByteOp | DstMem | SrcReg | ModRM, DstMem | SrcReg | ModRM,
+	ByteOp | DstMem | SrcReg | ModRM | Lock, DstMem | SrcReg | ModRM | Lock,
 	ByteOp | DstReg | SrcMem | ModRM, DstReg | SrcMem | ModRM,
 	ByteOp | DstAcc | SrcImm, DstAcc | SrcImm,
 	ImplicitOps | Stack | No64, ImplicitOps | Stack | No64,
 	/* 0x08 - 0x0F */
-	ByteOp | DstMem | SrcReg | ModRM, DstMem | SrcReg | ModRM,
+	ByteOp | DstMem | SrcReg | ModRM | Lock, DstMem | SrcReg | ModRM | Lock,
 	ByteOp | DstReg | SrcMem | ModRM, DstReg | SrcMem | ModRM,
 	ByteOp | DstAcc | SrcImm, DstAcc | SrcImm,
 	ImplicitOps | Stack | No64, 0,
 	/* 0x10 - 0x17 */
-	ByteOp | DstMem | SrcReg | ModRM, DstMem | SrcReg | ModRM,
+	ByteOp | DstMem | SrcReg | ModRM | Lock, DstMem | SrcReg | ModRM | Lock,
 	ByteOp | DstReg | SrcMem | ModRM, DstReg | SrcMem | ModRM,
 	ByteOp | DstAcc | SrcImm, DstAcc | SrcImm,
 	ImplicitOps | Stack | No64, ImplicitOps | Stack | No64,
 	/* 0x18 - 0x1F */
-	ByteOp | DstMem | SrcReg | ModRM, DstMem | SrcReg | ModRM,
+	ByteOp | DstMem | SrcReg | ModRM | Lock, DstMem | SrcReg | ModRM | Lock,
 	ByteOp | DstReg | SrcMem | ModRM, DstReg | SrcMem | ModRM,
 	ByteOp | DstAcc | SrcImm, DstAcc | SrcImm,
 	ImplicitOps | Stack | No64, ImplicitOps | Stack | No64,
 	/* 0x20 - 0x27 */
-	ByteOp | DstMem | SrcReg | ModRM, DstMem | SrcReg | ModRM,
+	ByteOp | DstMem | SrcReg | ModRM | Lock, DstMem | SrcReg | ModRM | Lock,
 	ByteOp | DstReg | SrcMem | ModRM, DstReg | SrcMem | ModRM,
 	DstAcc | SrcImmByte, DstAcc | SrcImm, 0, 0,
 	/* 0x28 - 0x2F */
-	ByteOp | DstMem | SrcReg | ModRM, DstMem | SrcReg | ModRM,
+	ByteOp | DstMem | SrcReg | ModRM | Lock, DstMem | SrcReg | ModRM | Lock,
 	ByteOp | DstReg | SrcMem | ModRM, DstReg | SrcMem | ModRM,
 	0, 0, 0, 0,
 	/* 0x30 - 0x37 */
-	ByteOp | DstMem | SrcReg | ModRM, DstMem | SrcReg | ModRM,
+	ByteOp | DstMem | SrcReg | ModRM | Lock, DstMem | SrcReg | ModRM | Lock,
 	ByteOp | DstReg | SrcMem | ModRM, DstReg | SrcMem | ModRM,
 	0, 0, 0, 0,
 	/* 0x38 - 0x3F */
@@ -158,7 +159,7 @@ static u32 opcode_table[256] = {
 	Group | Group1_80, Group | Group1_81,
 	Group | Group1_82, Group | Group1_83,
 	ByteOp | DstMem | SrcReg | ModRM, DstMem | SrcReg | ModRM,
-	ByteOp | DstMem | SrcReg | ModRM, DstMem | SrcReg | ModRM,
+	ByteOp | DstMem | SrcReg | ModRM | Lock, DstMem | SrcReg | ModRM | Lock,
 	/* 0x88 - 0x8F */
 	ByteOp | DstMem | SrcReg | ModRM | Mov, DstMem | SrcReg | ModRM | Mov,
 	ByteOp | DstReg | SrcMem | ModRM | Mov, DstReg | SrcMem | ModRM | Mov,
@@ -263,17 +264,18 @@ static u32 twobyte_table[256] = {
 	DstMem | SrcReg | Src2CL | ModRM, 0, 0,
 	/* 0xA8 - 0xAF */
 	ImplicitOps | Stack, ImplicitOps | Stack,
-	0, DstMem | SrcReg | ModRM | BitOp,
+	0, DstMem | SrcReg | ModRM | BitOp | Lock,
 	DstMem | SrcReg | Src2ImmByte | ModRM,
 	DstMem | SrcReg | Src2CL | ModRM,
 	ModRM, 0,
 	/* 0xB0 - 0xB7 */
-	ByteOp | DstMem | SrcReg | ModRM, DstMem | SrcReg | ModRM, 0,
-	    DstMem | SrcReg | ModRM | BitOp,
+	ByteOp | DstMem | SrcReg | ModRM | Lock, DstMem | SrcReg | ModRM | Lock,
+	0, DstMem | SrcReg | ModRM | BitOp | Lock,
 	0, 0, ByteOp | DstReg | SrcMem | ModRM | Mov,
 	    DstReg | SrcMem16 | ModRM | Mov,
 	/* 0xB8 - 0xBF */
-	0, 0, Group | Group8, DstMem | SrcReg | ModRM | BitOp,
+	0, 0,
+	Group | Group8, DstMem | SrcReg | ModRM | BitOp | Lock,
 	0, 0, ByteOp | DstReg | SrcMem | ModRM | Mov,
 	    DstReg | SrcMem16 | ModRM | Mov,
 	/* 0xC0 - 0xCF */
@@ -290,25 +292,41 @@ static u32 twobyte_table[256] = {
 
 static u32 group_table[] = {
 	[Group1_80*8] =
-	ByteOp | DstMem | SrcImm | ModRM, ByteOp | DstMem | SrcImm | ModRM,
-	ByteOp | DstMem | SrcImm | ModRM, ByteOp | DstMem | SrcImm | ModRM,
-	ByteOp | DstMem | SrcImm | ModRM, ByteOp | DstMem | SrcImm | ModRM,
-	ByteOp | DstMem | SrcImm | ModRM, ByteOp | DstMem | SrcImm | ModRM,
+	ByteOp | DstMem | SrcImm | ModRM | Lock,
+	ByteOp | DstMem | SrcImm | ModRM | Lock,
+	ByteOp | DstMem | SrcImm | ModRM | Lock,
+	ByteOp | DstMem | SrcImm | ModRM | Lock,
+	ByteOp | DstMem | SrcImm | ModRM | Lock,
+	ByteOp | DstMem | SrcImm | ModRM | Lock,
+	ByteOp | DstMem | SrcImm | ModRM | Lock,
+	ByteOp | DstMem | SrcImm | ModRM,
 	[Group1_81*8] =
-	DstMem | SrcImm | ModRM, DstMem | SrcImm | ModRM,
-	DstMem | SrcImm | ModRM, DstMem | SrcImm | ModRM,
-	DstMem | SrcImm | ModRM, DstMem | SrcImm | ModRM,
-	DstMem | SrcImm | ModRM, DstMem | SrcImm | ModRM,
+	DstMem | SrcImm | ModRM | Lock,
+	DstMem | SrcImm | ModRM | Lock,
+	DstMem | SrcImm | ModRM | Lock,
+	DstMem | SrcImm | ModRM | Lock,
+	DstMem | SrcImm | ModRM | Lock,
+	DstMem | SrcImm | ModRM | Lock,
+	DstMem | SrcImm | ModRM | Lock,
+	DstMem | SrcImm | ModRM,
 	[Group1_82*8] =
-	ByteOp | DstMem | SrcImm | ModRM, ByteOp | DstMem | SrcImm | ModRM,
-	ByteOp | DstMem | SrcImm | ModRM, ByteOp | DstMem | SrcImm | ModRM,
-	ByteOp | DstMem | SrcImm | ModRM, ByteOp | DstMem | SrcImm | ModRM,
-	ByteOp | DstMem | SrcImm | ModRM, ByteOp | DstMem | SrcImm | ModRM,
+	ByteOp | DstMem | SrcImm | ModRM | Lock,
+	ByteOp | DstMem | SrcImm | ModRM | Lock,
+	ByteOp | DstMem | SrcImm | ModRM | Lock,
+	ByteOp | DstMem | SrcImm | ModRM | Lock,
+	ByteOp | DstMem | SrcImm | ModRM | Lock,
+	ByteOp | DstMem | SrcImm | ModRM | Lock,
+	ByteOp | DstMem | SrcImm | ModRM | Lock,
+	ByteOp | DstMem | SrcImm | ModRM,
 	[Group1_83*8] =
-	DstMem | SrcImmByte | ModRM, DstMem | SrcImmByte | ModRM,
-	DstMem | SrcImmByte | ModRM, DstMem | SrcImmByte | ModRM,
-	DstMem | SrcImmByte | ModRM, DstMem | SrcImmByte | ModRM,
-	DstMem | SrcImmByte | ModRM, DstMem | SrcImmByte | ModRM,
+	DstMem | SrcImmByte | ModRM | Lock,
+	DstMem | SrcImmByte | ModRM | Lock,
+	DstMem | SrcImmByte | ModRM | Lock,
+	DstMem | SrcImmByte | ModRM | Lock,
+	DstMem | SrcImmByte | ModRM | Lock,
+	DstMem | SrcImmByte | ModRM | Lock,
+	DstMem | SrcImmByte | ModRM | Lock,
+	DstMem | SrcImmByte | ModRM,
 	[Group1A*8] =
 	DstMem | SrcNone | ModRM | Mov | Stack, 0, 0, 0, 0, 0, 0, 0,
 	[Group3_Byte*8] =
@@ -332,10 +350,10 @@ static u32 group_table[] = {
 	SrcMem16 | ModRM | Mov | Priv, SrcMem | ModRM | ByteOp | Priv,
 	[Group8*8] =
 	0, 0, 0, 0,
-	DstMem | SrcImmByte | ModRM, DstMem | SrcImmByte | ModRM,
-	DstMem | SrcImmByte | ModRM, DstMem | SrcImmByte | ModRM,
+	DstMem | SrcImmByte | ModRM, DstMem | SrcImmByte | ModRM | Lock,
+	DstMem | SrcImmByte | ModRM | Lock, DstMem | SrcImmByte | ModRM | Lock,
 	[Group9*8] =
-	0, ImplicitOps | ModRM, 0, 0, 0, 0, 0, 0,
+	0, ImplicitOps | ModRM | Lock, 0, 0, 0, 0, 0, 0,
 };
 
 static u32 group2_table[] = {
@@ -1580,8 +1598,7 @@ emulate_syscall(struct x86_emulate_ctxt *ctxt)
 	u64 msr_data;
 
 	/* syscall is not available in real mode */
-	if (c->lock_prefix || ctxt->mode == X86EMUL_MODE_REAL
-	    || ctxt->mode == X86EMUL_MODE_VM86)
+	if (ctxt->mode == X86EMUL_MODE_REAL || ctxt->mode == X86EMUL_MODE_VM86)
 		return -1;
 
 	setup_syscalls_segments(ctxt, &cs, &ss);
@@ -1629,10 +1646,6 @@ emulate_sysenter(struct x86_emulate_ctxt *ctxt)
 	struct kvm_segment cs, ss;
 	u64 msr_data;
 
-	/* inject #UD if LOCK prefix is used */
-	if (c->lock_prefix)
-		return -1;
-
 	/* inject #GP if in real mode */
 	if (ctxt->mode == X86EMUL_MODE_REAL) {
 		kvm_inject_gp(ctxt->vcpu, 0);
@@ -1694,10 +1707,6 @@ emulate_sysexit(struct x86_emulate_ctxt *ctxt)
 	u64 msr_data;
 	int usermode;
 
-	/* inject #UD if LOCK prefix is used */
-	if (c->lock_prefix)
-		return -1;
-
 	/* inject #GP if in real mode or Virtual 8086 mode */
 	if (ctxt->mode == X86EMUL_MODE_REAL ||
 	    ctxt->mode == X86EMUL_MODE_VM86) {
@@ -1819,6 +1828,12 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
 	memcpy(c->regs, ctxt->vcpu->arch.regs, sizeof c->regs);
 	saved_eip = c->eip;
 
+	/* LOCK prefix is allowed only with some instructions */
+	if (c->lock_prefix && !(c->d & Lock)) {
+		kvm_queue_exception(ctxt->vcpu, UD_VECTOR);
+		goto done;
+	}
+
 	/* Privileged instruction can be executed only in CPL=0 */
 	if ((c->d & Priv) && kvm_x86_ops->get_cpl(ctxt->vcpu)) {
 		kvm_inject_gp(ctxt->vcpu, 0);
-- 
1.6.5.3
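
After this patch, x86_emulate_insn() performs two checks in a fixed order: the LOCK prefix check (#UD) comes first, then the privilege check (#GP). The ordering can be illustrated with a small standalone sketch. The fault enum and pre_execute_check() are hypothetical names; only the two flag bit positions come from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define F_LOCK (1u << 26)	/* same bit as Lock in the patch */
#define F_PRIV (1u << 27)	/* same bit as Priv in the patch */

enum fault { FAULT_NONE, FAULT_UD, FAULT_GP };

/*
 * Mirror the order of the checks in the patch: an illegal LOCK prefix
 * is reported as #UD before the privilege level is looked at.
 */
static enum fault pre_execute_check(uint32_t d, bool lock_prefix, int cpl)
{
	if (lock_prefix && !(d & F_LOCK))
		return FAULT_UD;
	if ((d & F_PRIV) && cpl != 0)
		return FAULT_GP;
	return FAULT_NONE;
}
```

A privileged instruction that does not accept LOCK, executed with the prefix at CPL 3, therefore yields #UD rather than #GP in this sketch.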



* [PATCH 18/20] KVM: Plan obsolescence of kernel allocated slots, paravirt mmu
  2010-02-17 13:45 [PATCH 00/20] KVM updates for the 2.6.34 merge window (batch 4/4) Avi Kivity
                   ` (16 preceding siblings ...)
  2010-02-17 13:45 ` [PATCH 17/20] KVM: x86 emulator: Add LOCK prefix validity checking Avi Kivity
@ 2010-02-17 13:45 ` Avi Kivity
  2010-02-17 13:45 ` [PATCH 19/20] KVM: x86 emulator: code style cleanup Avi Kivity
  2010-02-17 13:45 ` [PATCH 20/20] KVM: x86 emulator: disallow opcode 82 in 64-bit mode Avi Kivity
  19 siblings, 0 replies; 32+ messages in thread
From: Avi Kivity @ 2010-02-17 13:45 UTC (permalink / raw)
  To: kvm; +Cc: linux-kernel

These features are unused by modern userspace and can go away.  Paravirt
mmu needs to stay a little longer for live migration.

Signed-off-by: Avi Kivity <avi@redhat.com>
---
 Documentation/feature-removal-schedule.txt |   30 ++++++++++++++++++++++++++++
 1 files changed, 30 insertions(+), 0 deletions(-)

diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt
index 0a46833..47a6554 100644
--- a/Documentation/feature-removal-schedule.txt
+++ b/Documentation/feature-removal-schedule.txt
@@ -542,3 +542,33 @@ Why:	Duplicate functionality with the gspca_zc3xx driver, zc0301 only
 	sensors) wich are also supported by the gspca_zc3xx driver
 	(which supports 53 USB-ID's in total)
 Who:	Hans de Goede <hdegoede@redhat.com>
+
+----------------------------
+
+What:	KVM memory aliases support
+When:	July 2010
+Why:	Memory aliasing support is used for speeding up guest vga access
+	through the vga windows.
+
+	Modern userspace no longer uses this feature, so it's just bitrotted
+	code and can be removed with no impact.
+Who:	Avi Kivity <avi@redhat.com>
+
+----------------------------
+
+What:	KVM kernel-allocated memory slots
+When:	July 2010
+Why:	Since 2.6.25, kvm supports user-allocated memory slots, which are
+	much more flexible than kernel-allocated slots.  All current userspace
+	supports the newer interface and this code can be removed with no
+	impact.
+Who:	Avi Kivity <avi@redhat.com>
+
+----------------------------
+
+What:	KVM paravirt mmu host support
+When:	January 2011
+Why:	The paravirt mmu host support is slower than non-paravirt mmu, both
+	on newer and older hardware.  It is already not exposed to the guest,
+	and kept only for live migration purposes.
+Who:	Avi Kivity <avi@redhat.com>
-- 
1.6.5.3



* [PATCH 19/20] KVM: x86 emulator: code style cleanup
  2010-02-17 13:45 [PATCH 00/20] KVM updates for the 2.6.34 merge window (batch 4/4) Avi Kivity
                   ` (17 preceding siblings ...)
  2010-02-17 13:45 ` [PATCH 18/20] KVM: Plan obsolescence of kernel allocated slots, paravirt mmu Avi Kivity
@ 2010-02-17 13:45 ` Avi Kivity
  2010-02-17 13:45 ` [PATCH 20/20] KVM: x86 emulator: disallow opcode 82 in 64-bit mode Avi Kivity
  19 siblings, 0 replies; 32+ messages in thread
From: Avi Kivity @ 2010-02-17 13:45 UTC (permalink / raw)
  To: kvm; +Cc: linux-kernel

From: Wei Yongjun <yjwei@cn.fujitsu.com>

Just remove redundant semicolon.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
---
 arch/x86/kvm/emulate.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index c2de9f0..dd1b935 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1016,7 +1016,7 @@ done_prefixes:
 	}
 
 	if (mode == X86EMUL_MODE_PROT64 && (c->d & No64)) {
-		kvm_report_emulation_failure(ctxt->vcpu, "invalid x86/64 instruction");;
+		kvm_report_emulation_failure(ctxt->vcpu, "invalid x86/64 instruction");
 		return -1;
 	}
 
-- 
1.6.5.3



* [PATCH 20/20] KVM: x86 emulator: disallow opcode 82 in 64-bit mode
  2010-02-17 13:45 [PATCH 00/20] KVM updates for the 2.6.34 merge window (batch 4/4) Avi Kivity
                   ` (18 preceding siblings ...)
  2010-02-17 13:45 ` [PATCH 19/20] KVM: x86 emulator: code style cleanup Avi Kivity
@ 2010-02-17 13:45 ` Avi Kivity
  19 siblings, 0 replies; 32+ messages in thread
From: Avi Kivity @ 2010-02-17 13:45 UTC (permalink / raw)
  To: kvm; +Cc: linux-kernel

From: Gleb Natapov <gleb@redhat.com>

Instructions with opcode 82 are not valid in 64 bit mode.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
---
 arch/x86/kvm/emulate.c |   16 ++++++++--------
 1 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index dd1b935..c280c23 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -310,14 +310,14 @@ static u32 group_table[] = {
 	DstMem | SrcImm | ModRM | Lock,
 	DstMem | SrcImm | ModRM,
 	[Group1_82*8] =
-	ByteOp | DstMem | SrcImm | ModRM | Lock,
-	ByteOp | DstMem | SrcImm | ModRM | Lock,
-	ByteOp | DstMem | SrcImm | ModRM | Lock,
-	ByteOp | DstMem | SrcImm | ModRM | Lock,
-	ByteOp | DstMem | SrcImm | ModRM | Lock,
-	ByteOp | DstMem | SrcImm | ModRM | Lock,
-	ByteOp | DstMem | SrcImm | ModRM | Lock,
-	ByteOp | DstMem | SrcImm | ModRM,
+	ByteOp | DstMem | SrcImm | ModRM | No64 | Lock,
+	ByteOp | DstMem | SrcImm | ModRM | No64 | Lock,
+	ByteOp | DstMem | SrcImm | ModRM | No64 | Lock,
+	ByteOp | DstMem | SrcImm | ModRM | No64 | Lock,
+	ByteOp | DstMem | SrcImm | ModRM | No64 | Lock,
+	ByteOp | DstMem | SrcImm | ModRM | No64 | Lock,
+	ByteOp | DstMem | SrcImm | ModRM | No64 | Lock,
+	ByteOp | DstMem | SrcImm | ModRM | No64,
 	[Group1_83*8] =
 	DstMem | SrcImmByte | ModRM | Lock,
 	DstMem | SrcImmByte | ModRM | Lock,
-- 
1.6.5.3
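
For group opcodes such as 0x80 to 0x83, the flags depend on the reg field of the ModRM byte, which is why the patch touches eight table rows. The sketch below shows one way such a lookup can work, assuming an index of group * 8 + reg. The group numbering, demo_group_table, and both helpers are hypothetical; the flag bits and the per-row Lock/No64 pattern follow the patch. The eighth row (reg == 7, cmp) has no Lock because cmp does not write its destination.

```c
#include <stdbool.h>
#include <stdint.h>

#define F_LOCK (1u << 26)	/* same bit as Lock in the patch */
#define F_NO64 (1u << 28)	/* same bit as No64 in the patch */

/* Hypothetical group numbering, for illustration only. */
enum { GRP1_80, GRP1_82, NR_GROUPS };

/* Eight entries per group, one for each value of the ModRM reg field. */
static const uint32_t demo_group_table[NR_GROUPS * 8] = {
	[GRP1_80 * 8] =
	F_LOCK, F_LOCK, F_LOCK, F_LOCK,
	F_LOCK, F_LOCK, F_LOCK, 0,
	[GRP1_82 * 8] =
	F_NO64 | F_LOCK, F_NO64 | F_LOCK, F_NO64 | F_LOCK, F_NO64 | F_LOCK,
	F_NO64 | F_LOCK, F_NO64 | F_LOCK, F_NO64 | F_LOCK, F_NO64,
};

/* Select the flags word using bits 5:3 (reg) of the ModRM byte. */
static uint32_t group_flags(int group, uint8_t modrm)
{
	return demo_group_table[group * 8 + ((modrm >> 3) & 7)];
}

/* An encoding is rejected in 64-bit mode when its entry has F_NO64. */
static bool valid_in_long_mode(int group, uint8_t modrm)
{
	return !(group_flags(group, modrm) & F_NO64);
}
```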



* Re: [PATCH 13/20] KVM: x86 emulator: fix memory access during x86 emulation
  2010-02-17 13:45 ` [PATCH 13/20] KVM: x86 emulator: fix memory access during x86 emulation Avi Kivity
@ 2010-03-06 13:53   ` Stefan Bader
  2010-03-07 10:07     ` Avi Kivity
  0 siblings, 1 reply; 32+ messages in thread
From: Stefan Bader @ 2010-03-06 13:53 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm, linux-kernel

Hi Avi,

We are currently trying to integrate this patch into an update for a
2.6.32-based system (among other kvm updates). But as soon as this patch is
added, kvm dies on startup in kvm_leave_lazy_mmu. This is documented here:

https://bugs.edge.launchpad.net/ubuntu/+source/linux/+bug/531823

I have placed the backports of your patches, which are currently in linux-next
and marked for stable here:

git://kernel.ubuntu.com/smb/linux-2.6.32.y kvm

I have tested the failure with a version that got only the following patches in:
KVM: x86 emulator: Add Virtual-8086 mode of emulation
KVM: x86 emulator: fix memory access during x86 emulation
KVM: x86 emulator: Check IOPL level during io instruction emulation
KVM: x86 emulator: Fix popf emulation
KVM: x86 emulator: Check CPL level during privilege instruction emulation

and also with a version that takes all stable patches up to the bad one:
KVM: VMX: Trap and invalid MWAIT/MONITOR instruction
KVM: x86 emulator: Add group8 instruction decoding
KVM: x86 emulator: Add group9 instruction decoding
KVM: x86 emulator: Add Virtual-8086 mode of emulation
KVM: x86 emulator: fix memory access during x86 emulation

But as soon as the fix for memory access is added, the bug occurs. Do you
have any idea what might be causing this?

Thanks,
Stefan

Avi Kivity wrote:
> From: Gleb Natapov <gleb@redhat.com>
> 
> Currently when x86 emulator needs to access memory, page walk is done with
> broadest permission possible, so if emulated instruction was executed
> by userspace process it can still access kernel memory. Fix that by
> providing correct memory access to page walker during emulation.
> 
> Signed-off-by: Gleb Natapov <gleb@redhat.com>
> Cc: stable@kernel.org
> Signed-off-by: Avi Kivity <avi@redhat.com>
> ---
>  arch/x86/include/asm/kvm_emulate.h |   14 +++-
>  arch/x86/include/asm/kvm_host.h    |    7 ++-
>  arch/x86/kvm/emulate.c             |    6 +-
>  arch/x86/kvm/mmu.c                 |   17 ++---
>  arch/x86/kvm/mmu.h                 |    6 ++
>  arch/x86/kvm/paging_tmpl.h         |   11 ++-
>  arch/x86/kvm/x86.c                 |  131 +++++++++++++++++++++++++++---------
>  7 files changed, 142 insertions(+), 50 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
> index 784d7c5..7a6f54f 100644
> --- a/arch/x86/include/asm/kvm_emulate.h
> +++ b/arch/x86/include/asm/kvm_emulate.h
> @@ -54,13 +54,23 @@ struct x86_emulate_ctxt;
>  struct x86_emulate_ops {
>  	/*
>  	 * read_std: Read bytes of standard (non-emulated/special) memory.
> -	 *           Used for instruction fetch, stack operations, and others.
> +	 *           Used for descriptor reading.
>  	 *  @addr:  [IN ] Linear address from which to read.
>  	 *  @val:   [OUT] Value read from memory, zero-extended to 'u_long'.
>  	 *  @bytes: [IN ] Number of bytes to read from memory.
>  	 */
>  	int (*read_std)(unsigned long addr, void *val,
> -			unsigned int bytes, struct kvm_vcpu *vcpu);
> +			unsigned int bytes, struct kvm_vcpu *vcpu, u32 *error);
> +
> +	/*
> +	 * fetch: Read bytes of standard (non-emulated/special) memory.
> +	 *        Used for instruction fetch.
> +	 *  @addr:  [IN ] Linear address from which to read.
> +	 *  @val:   [OUT] Value read from memory, zero-extended to 'u_long'.
> +	 *  @bytes: [IN ] Number of bytes to read from memory.
> +	 */
> +	int (*fetch)(unsigned long addr, void *val,
> +			unsigned int bytes, struct kvm_vcpu *vcpu, u32 *error);
>  
>  	/*
>  	 * read_emulated: Read bytes from emulated/special memory area.
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 1522337..c07c16f 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -243,7 +243,8 @@ struct kvm_mmu {
>  	void (*new_cr3)(struct kvm_vcpu *vcpu);
>  	int (*page_fault)(struct kvm_vcpu *vcpu, gva_t gva, u32 err);
>  	void (*free)(struct kvm_vcpu *vcpu);
> -	gpa_t (*gva_to_gpa)(struct kvm_vcpu *vcpu, gva_t gva);
> +	gpa_t (*gva_to_gpa)(struct kvm_vcpu *vcpu, gva_t gva, u32 access,
> +			    u32 *error);
>  	void (*prefetch_page)(struct kvm_vcpu *vcpu,
>  			      struct kvm_mmu_page *page);
>  	int (*sync_page)(struct kvm_vcpu *vcpu,
> @@ -660,6 +661,10 @@ void __kvm_mmu_free_some_pages(struct kvm_vcpu *vcpu);
>  int kvm_mmu_load(struct kvm_vcpu *vcpu);
>  void kvm_mmu_unload(struct kvm_vcpu *vcpu);
>  void kvm_mmu_sync_roots(struct kvm_vcpu *vcpu);
> +gpa_t kvm_mmu_gva_to_gpa_read(struct kvm_vcpu *vcpu, gva_t gva, u32 *error);
> +gpa_t kvm_mmu_gva_to_gpa_fetch(struct kvm_vcpu *vcpu, gva_t gva, u32 *error);
> +gpa_t kvm_mmu_gva_to_gpa_write(struct kvm_vcpu *vcpu, gva_t gva, u32 *error);
> +gpa_t kvm_mmu_gva_to_gpa_system(struct kvm_vcpu *vcpu, gva_t gva, u32 *error);
>  
>  int kvm_emulate_hypercall(struct kvm_vcpu *vcpu);
>  
> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> index e4e2df3..c44b460 100644
> --- a/arch/x86/kvm/emulate.c
> +++ b/arch/x86/kvm/emulate.c
> @@ -616,7 +616,7 @@ static int do_fetch_insn_byte(struct x86_emulate_ctxt *ctxt,
>  
>  	if (linear < fc->start || linear >= fc->end) {
>  		size = min(15UL, PAGE_SIZE - offset_in_page(linear));
> -		rc = ops->read_std(linear, fc->data, size, ctxt->vcpu);
> +		rc = ops->fetch(linear, fc->data, size, ctxt->vcpu, NULL);
>  		if (rc)
>  			return rc;
>  		fc->start = linear;
> @@ -671,11 +671,11 @@ static int read_descriptor(struct x86_emulate_ctxt *ctxt,
>  		op_bytes = 3;
>  	*address = 0;
>  	rc = ops->read_std((unsigned long)ptr, (unsigned long *)size, 2,
> -			   ctxt->vcpu);
> +			   ctxt->vcpu, NULL);
>  	if (rc)
>  		return rc;
>  	rc = ops->read_std((unsigned long)ptr + 2, address, op_bytes,
> -			   ctxt->vcpu);
> +			   ctxt->vcpu, NULL);
>  	return rc;
>  }
>  
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 7397932..741373e 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -138,12 +138,6 @@ module_param(oos_shadow, bool, 0644);
>  #define PT64_PERM_MASK (PT_PRESENT_MASK | PT_WRITABLE_MASK | PT_USER_MASK \
>  			| PT64_NX_MASK)
>  
> -#define PFERR_PRESENT_MASK (1U << 0)
> -#define PFERR_WRITE_MASK (1U << 1)
> -#define PFERR_USER_MASK (1U << 2)
> -#define PFERR_RSVD_MASK (1U << 3)
> -#define PFERR_FETCH_MASK (1U << 4)
> -
>  #define RMAP_EXT 4
>  
>  #define ACC_EXEC_MASK    1
> @@ -1632,7 +1626,7 @@ struct page *gva_to_page(struct kvm_vcpu *vcpu, gva_t gva)
>  {
>  	struct page *page;
>  
> -	gpa_t gpa = vcpu->arch.mmu.gva_to_gpa(vcpu, gva);
> +	gpa_t gpa = kvm_mmu_gva_to_gpa_read(vcpu, gva, NULL);
>  
>  	if (gpa == UNMAPPED_GVA)
>  		return NULL;
> @@ -2155,8 +2149,11 @@ void kvm_mmu_sync_roots(struct kvm_vcpu *vcpu)
>  	spin_unlock(&vcpu->kvm->mmu_lock);
>  }
>  
> -static gpa_t nonpaging_gva_to_gpa(struct kvm_vcpu *vcpu, gva_t vaddr)
> +static gpa_t nonpaging_gva_to_gpa(struct kvm_vcpu *vcpu, gva_t vaddr,
> +				  u32 access, u32 *error)
>  {
> +	if (error)
> +		*error = 0;
>  	return vaddr;
>  }
>  
> @@ -2740,7 +2737,7 @@ int kvm_mmu_unprotect_page_virt(struct kvm_vcpu *vcpu, gva_t gva)
>  	if (tdp_enabled)
>  		return 0;
>  
> -	gpa = vcpu->arch.mmu.gva_to_gpa(vcpu, gva);
> +	gpa = kvm_mmu_gva_to_gpa_read(vcpu, gva, NULL);
>  
>  	spin_lock(&vcpu->kvm->mmu_lock);
>  	r = kvm_mmu_unprotect_page(vcpu->kvm, gpa >> PAGE_SHIFT);
> @@ -3237,7 +3234,7 @@ static void audit_mappings_page(struct kvm_vcpu *vcpu, u64 page_pte,
>  		if (is_shadow_present_pte(ent) && !is_last_spte(ent, level))
>  			audit_mappings_page(vcpu, ent, va, level - 1);
>  		else {
> -			gpa_t gpa = vcpu->arch.mmu.gva_to_gpa(vcpu, va);
> +			gpa_t gpa = kvm_mmu_gva_to_gpa_read(vcpu, va, NULL);
>  			gfn_t gfn = gpa >> PAGE_SHIFT;
>  			pfn_t pfn = gfn_to_pfn(vcpu->kvm, gfn);
>  			hpa_t hpa = (hpa_t)pfn << PAGE_SHIFT;
> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> index 61ef5a6..be66759 100644
> --- a/arch/x86/kvm/mmu.h
> +++ b/arch/x86/kvm/mmu.h
> @@ -42,6 +42,12 @@
>  #define PT_DIRECTORY_LEVEL 2
>  #define PT_PAGE_TABLE_LEVEL 1
>  
> +#define PFERR_PRESENT_MASK (1U << 0)
> +#define PFERR_WRITE_MASK (1U << 1)
> +#define PFERR_USER_MASK (1U << 2)
> +#define PFERR_RSVD_MASK (1U << 3)
> +#define PFERR_FETCH_MASK (1U << 4)
> +
>  int kvm_mmu_get_spte_hierarchy(struct kvm_vcpu *vcpu, u64 addr, u64 sptes[4]);
>  
>  static inline void kvm_mmu_free_some_pages(struct kvm_vcpu *vcpu)
> diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
> index df15a53..81eab9a 100644
> --- a/arch/x86/kvm/paging_tmpl.h
> +++ b/arch/x86/kvm/paging_tmpl.h
> @@ -490,18 +490,23 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva)
>  	spin_unlock(&vcpu->kvm->mmu_lock);
>  }
>  
> -static gpa_t FNAME(gva_to_gpa)(struct kvm_vcpu *vcpu, gva_t vaddr)
> +static gpa_t FNAME(gva_to_gpa)(struct kvm_vcpu *vcpu, gva_t vaddr, u32 access,
> +			       u32 *error)
>  {
>  	struct guest_walker walker;
>  	gpa_t gpa = UNMAPPED_GVA;
>  	int r;
>  
> -	r = FNAME(walk_addr)(&walker, vcpu, vaddr, 0, 0, 0);
> +	r = FNAME(walk_addr)(&walker, vcpu, vaddr,
> +			     !!(access & PFERR_WRITE_MASK),
> +			     !!(access & PFERR_USER_MASK),
> +			     !!(access & PFERR_FETCH_MASK));
>  
>  	if (r) {
>  		gpa = gfn_to_gpa(walker.gfn);
>  		gpa |= vaddr & ~PAGE_MASK;
> -	}
> +	} else if (error)
> +		*error = walker.error_code;
>  
>  	return gpa;
>  }
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index a283795..ea3a8af 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3039,14 +3039,41 @@ static int vcpu_mmio_read(struct kvm_vcpu *vcpu, gpa_t addr, int len, void *v)
>  	return kvm_io_bus_read(vcpu->kvm, KVM_MMIO_BUS, addr, len, v);
>  }
>  
> -static int kvm_read_guest_virt(gva_t addr, void *val, unsigned int bytes,
> -			       struct kvm_vcpu *vcpu)
> +gpa_t kvm_mmu_gva_to_gpa_read(struct kvm_vcpu *vcpu, gva_t gva, u32 *error)
> +{
> +	u32 access = (kvm_x86_ops->get_cpl(vcpu) == 3) ? PFERR_USER_MASK : 0;
> +	return vcpu->arch.mmu.gva_to_gpa(vcpu, gva, access, error);
> +}
> +
> + gpa_t kvm_mmu_gva_to_gpa_fetch(struct kvm_vcpu *vcpu, gva_t gva, u32 *error)
> +{
> +	u32 access = (kvm_x86_ops->get_cpl(vcpu) == 3) ? PFERR_USER_MASK : 0;
> +	access |= PFERR_FETCH_MASK;
> +	return vcpu->arch.mmu.gva_to_gpa(vcpu, gva, access, error);
> +}
> +
> +gpa_t kvm_mmu_gva_to_gpa_write(struct kvm_vcpu *vcpu, gva_t gva, u32 *error)
> +{
> +	u32 access = (kvm_x86_ops->get_cpl(vcpu) == 3) ? PFERR_USER_MASK : 0;
> +	access |= PFERR_WRITE_MASK;
> +	return vcpu->arch.mmu.gva_to_gpa(vcpu, gva, access, error);
> +}
> +
> +/* uses this to access any guest's mapped memory without checking CPL */
> +gpa_t kvm_mmu_gva_to_gpa_system(struct kvm_vcpu *vcpu, gva_t gva, u32 *error)
> +{
> +	return vcpu->arch.mmu.gva_to_gpa(vcpu, gva, 0, error);
> +}
> +
> +static int kvm_read_guest_virt_helper(gva_t addr, void *val, unsigned int bytes,
> +				      struct kvm_vcpu *vcpu, u32 access,
> +				      u32 *error)
>  {
>  	void *data = val;
>  	int r = X86EMUL_CONTINUE;
>  
>  	while (bytes) {
> -		gpa_t gpa = vcpu->arch.mmu.gva_to_gpa(vcpu, addr);
> +		gpa_t gpa = vcpu->arch.mmu.gva_to_gpa(vcpu, addr, access, error);
>  		unsigned offset = addr & (PAGE_SIZE-1);
>  		unsigned toread = min(bytes, (unsigned)PAGE_SIZE - offset);
>  		int ret;
> @@ -3069,14 +3096,37 @@ out:
>  	return r;
>  }
>  
> +/* used for instruction fetching */
> +static int kvm_fetch_guest_virt(gva_t addr, void *val, unsigned int bytes,
> +				struct kvm_vcpu *vcpu, u32 *error)
> +{
> +	u32 access = (kvm_x86_ops->get_cpl(vcpu) == 3) ? PFERR_USER_MASK : 0;
> +	return kvm_read_guest_virt_helper(addr, val, bytes, vcpu,
> +					  access | PFERR_FETCH_MASK, error);
> +}
> +
> +static int kvm_read_guest_virt(gva_t addr, void *val, unsigned int bytes,
> +			       struct kvm_vcpu *vcpu, u32 *error)
> +{
> +	u32 access = (kvm_x86_ops->get_cpl(vcpu) == 3) ? PFERR_USER_MASK : 0;
> +	return kvm_read_guest_virt_helper(addr, val, bytes, vcpu, access,
> +					  error);
> +}
> +
> +static int kvm_read_guest_virt_system(gva_t addr, void *val, unsigned int bytes,
> +			       struct kvm_vcpu *vcpu, u32 *error)
> +{
> +	return kvm_read_guest_virt_helper(addr, val, bytes, vcpu, 0, error);
> +}
> +
>  static int kvm_write_guest_virt(gva_t addr, void *val, unsigned int bytes,
> -				struct kvm_vcpu *vcpu)
> +				struct kvm_vcpu *vcpu, u32 *error)
>  {
>  	void *data = val;
>  	int r = X86EMUL_CONTINUE;
>  
>  	while (bytes) {
> -		gpa_t gpa = vcpu->arch.mmu.gva_to_gpa(vcpu, addr);
> +		gpa_t gpa = kvm_mmu_gva_to_gpa_write(vcpu, addr, error);
>  		unsigned offset = addr & (PAGE_SIZE-1);
>  		unsigned towrite = min(bytes, (unsigned)PAGE_SIZE - offset);
>  		int ret;
> @@ -3106,6 +3156,7 @@ static int emulator_read_emulated(unsigned long addr,
>  				  struct kvm_vcpu *vcpu)
>  {
>  	gpa_t                 gpa;
> +	u32 error_code;
>  
>  	if (vcpu->mmio_read_completed) {
>  		memcpy(val, vcpu->mmio_data, bytes);
> @@ -3115,17 +3166,20 @@ static int emulator_read_emulated(unsigned long addr,
>  		return X86EMUL_CONTINUE;
>  	}
>  
> -	gpa = vcpu->arch.mmu.gva_to_gpa(vcpu, addr);
> +	gpa = kvm_mmu_gva_to_gpa_read(vcpu, addr, &error_code);
> +
> +	if (gpa == UNMAPPED_GVA) {
> +		kvm_inject_page_fault(vcpu, addr, error_code);
> +		return X86EMUL_PROPAGATE_FAULT;
> +	}
>  
>  	/* For APIC access vmexit */
>  	if ((gpa & PAGE_MASK) == APIC_DEFAULT_PHYS_BASE)
>  		goto mmio;
>  
> -	if (kvm_read_guest_virt(addr, val, bytes, vcpu)
> +	if (kvm_read_guest_virt(addr, val, bytes, vcpu, NULL)
>  				== X86EMUL_CONTINUE)
>  		return X86EMUL_CONTINUE;
> -	if (gpa == UNMAPPED_GVA)
> -		return X86EMUL_PROPAGATE_FAULT;
>  
>  mmio:
>  	/*
> @@ -3164,11 +3218,12 @@ static int emulator_write_emulated_onepage(unsigned long addr,
>  					   struct kvm_vcpu *vcpu)
>  {
>  	gpa_t                 gpa;
> +	u32 error_code;
>  
> -	gpa = vcpu->arch.mmu.gva_to_gpa(vcpu, addr);
> +	gpa = kvm_mmu_gva_to_gpa_write(vcpu, addr, &error_code);
>  
>  	if (gpa == UNMAPPED_GVA) {
> -		kvm_inject_page_fault(vcpu, addr, 2);
> +		kvm_inject_page_fault(vcpu, addr, error_code);
>  		return X86EMUL_PROPAGATE_FAULT;
>  	}
>  
> @@ -3232,7 +3287,7 @@ static int emulator_cmpxchg_emulated(unsigned long addr,
>  		char *kaddr;
>  		u64 val;
>  
> -		gpa = vcpu->arch.mmu.gva_to_gpa(vcpu, addr);
> +		gpa = kvm_mmu_gva_to_gpa_write(vcpu, addr, NULL);
>  
>  		if (gpa == UNMAPPED_GVA ||
>  		   (gpa & PAGE_MASK) == APIC_DEFAULT_PHYS_BASE)
> @@ -3297,7 +3352,7 @@ void kvm_report_emulation_failure(struct kvm_vcpu *vcpu, const char *context)
>  
>  	rip_linear = rip + get_segment_base(vcpu, VCPU_SREG_CS);
>  
> -	kvm_read_guest_virt(rip_linear, (void *)opcodes, 4, vcpu);
> +	kvm_read_guest_virt(rip_linear, (void *)opcodes, 4, vcpu, NULL);
>  
>  	printk(KERN_ERR "emulation failed (%s) rip %lx %02x %02x %02x %02x\n",
>  	       context, rip, opcodes[0], opcodes[1], opcodes[2], opcodes[3]);
> @@ -3305,7 +3360,8 @@ void kvm_report_emulation_failure(struct kvm_vcpu *vcpu, const char *context)
>  EXPORT_SYMBOL_GPL(kvm_report_emulation_failure);
>  
>  static struct x86_emulate_ops emulate_ops = {
> -	.read_std            = kvm_read_guest_virt,
> +	.read_std            = kvm_read_guest_virt_system,
> +	.fetch               = kvm_fetch_guest_virt,
>  	.read_emulated       = emulator_read_emulated,
>  	.write_emulated      = emulator_write_emulated,
>  	.cmpxchg_emulated    = emulator_cmpxchg_emulated,
> @@ -3442,12 +3498,17 @@ static int pio_copy_data(struct kvm_vcpu *vcpu)
>  	gva_t q = vcpu->arch.pio.guest_gva;
>  	unsigned bytes;
>  	int ret;
> +	u32 error_code;
>  
>  	bytes = vcpu->arch.pio.size * vcpu->arch.pio.cur_count;
>  	if (vcpu->arch.pio.in)
> -		ret = kvm_write_guest_virt(q, p, bytes, vcpu);
> +		ret = kvm_write_guest_virt(q, p, bytes, vcpu, &error_code);
>  	else
> -		ret = kvm_read_guest_virt(q, p, bytes, vcpu);
> +		ret = kvm_read_guest_virt(q, p, bytes, vcpu, &error_code);
> +
> +	if (ret == X86EMUL_PROPAGATE_FAULT)
> +		kvm_inject_page_fault(vcpu, q, error_code);
> +
>  	return ret;
>  }
>  
> @@ -3468,7 +3529,7 @@ int complete_pio(struct kvm_vcpu *vcpu)
>  		if (io->in) {
>  			r = pio_copy_data(vcpu);
>  			if (r)
> -				return r;
> +				goto out;
>  		}
>  
>  		delta = 1;
> @@ -3495,7 +3556,7 @@ int complete_pio(struct kvm_vcpu *vcpu)
>  			kvm_register_write(vcpu, VCPU_REGS_RSI, val);
>  		}
>  	}
> -
> +out:
>  	io->count -= io->cur_count;
>  	io->cur_count = 0;
>  
> @@ -3617,10 +3678,8 @@ int kvm_emulate_pio_string(struct kvm_vcpu *vcpu, int in,
>  	if (!vcpu->arch.pio.in) {
>  		/* string PIO write */
>  		ret = pio_copy_data(vcpu);
> -		if (ret == X86EMUL_PROPAGATE_FAULT) {
> -			kvm_inject_gp(vcpu, 0);
> +		if (ret == X86EMUL_PROPAGATE_FAULT)
>  			return 1;
> -		}
>  		if (ret == 0 && !pio_string_write(vcpu)) {
>  			complete_pio(vcpu);
>  			if (vcpu->arch.pio.count == 0)
> @@ -4663,7 +4722,9 @@ static int load_guest_segment_descriptor(struct kvm_vcpu *vcpu, u16 selector,
>  		kvm_queue_exception_e(vcpu, GP_VECTOR, selector & 0xfffc);
>  		return X86EMUL_PROPAGATE_FAULT;
>  	}
> -	return kvm_read_guest_virt(dtable.base + index*8, seg_desc, sizeof(*seg_desc), vcpu);
> +	return kvm_read_guest_virt_system(dtable.base + index*8,
> +					  seg_desc, sizeof(*seg_desc),
> +					  vcpu, NULL);
>  }
>  
>  /* allowed just for 8 bytes segments */
> @@ -4677,15 +4738,23 @@ static int save_guest_segment_descriptor(struct kvm_vcpu *vcpu, u16 selector,
>  
>  	if (dtable.limit < index * 8 + 7)
>  		return 1;
> -	return kvm_write_guest_virt(dtable.base + index*8, seg_desc, sizeof(*seg_desc), vcpu);
> +	return kvm_write_guest_virt(dtable.base + index*8, seg_desc, sizeof(*seg_desc), vcpu, NULL);
> +}
> +
> +static gpa_t get_tss_base_addr_write(struct kvm_vcpu *vcpu,
> +			       struct desc_struct *seg_desc)
> +{
> +	u32 base_addr = get_desc_base(seg_desc);
> +
> +	return kvm_mmu_gva_to_gpa_write(vcpu, base_addr, NULL);
>  }
>  
> -static gpa_t get_tss_base_addr(struct kvm_vcpu *vcpu,
> +static gpa_t get_tss_base_addr_read(struct kvm_vcpu *vcpu,
>  			     struct desc_struct *seg_desc)
>  {
>  	u32 base_addr = get_desc_base(seg_desc);
>  
> -	return vcpu->arch.mmu.gva_to_gpa(vcpu, base_addr);
> +	return kvm_mmu_gva_to_gpa_read(vcpu, base_addr, NULL);
>  }
>  
>  static u16 get_segment_selector(struct kvm_vcpu *vcpu, int seg)
> @@ -4894,7 +4963,7 @@ static int kvm_task_switch_16(struct kvm_vcpu *vcpu, u16 tss_selector,
>  			    sizeof tss_segment_16))
>  		goto out;
>  
> -	if (kvm_read_guest(vcpu->kvm, get_tss_base_addr(vcpu, nseg_desc),
> +	if (kvm_read_guest(vcpu->kvm, get_tss_base_addr_read(vcpu, nseg_desc),
>  			   &tss_segment_16, sizeof tss_segment_16))
>  		goto out;
>  
> @@ -4902,7 +4971,7 @@ static int kvm_task_switch_16(struct kvm_vcpu *vcpu, u16 tss_selector,
>  		tss_segment_16.prev_task_link = old_tss_sel;
>  
>  		if (kvm_write_guest(vcpu->kvm,
> -				    get_tss_base_addr(vcpu, nseg_desc),
> +				    get_tss_base_addr_write(vcpu, nseg_desc),
>  				    &tss_segment_16.prev_task_link,
>  				    sizeof tss_segment_16.prev_task_link))
>  			goto out;
> @@ -4933,7 +5002,7 @@ static int kvm_task_switch_32(struct kvm_vcpu *vcpu, u16 tss_selector,
>  			    sizeof tss_segment_32))
>  		goto out;
>  
> -	if (kvm_read_guest(vcpu->kvm, get_tss_base_addr(vcpu, nseg_desc),
> +	if (kvm_read_guest(vcpu->kvm, get_tss_base_addr_read(vcpu, nseg_desc),
>  			   &tss_segment_32, sizeof tss_segment_32))
>  		goto out;
>  
> @@ -4941,7 +5010,7 @@ static int kvm_task_switch_32(struct kvm_vcpu *vcpu, u16 tss_selector,
>  		tss_segment_32.prev_task_link = old_tss_sel;
>  
>  		if (kvm_write_guest(vcpu->kvm,
> -				    get_tss_base_addr(vcpu, nseg_desc),
> +				    get_tss_base_addr_write(vcpu, nseg_desc),
>  				    &tss_segment_32.prev_task_link,
>  				    sizeof tss_segment_32.prev_task_link))
>  			goto out;
> @@ -4964,7 +5033,7 @@ int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int reason)
>  	u32 old_tss_base = get_segment_base(vcpu, VCPU_SREG_TR);
>  	u16 old_tss_sel = get_segment_selector(vcpu, VCPU_SREG_TR);
>  
> -	old_tss_base = vcpu->arch.mmu.gva_to_gpa(vcpu, old_tss_base);
> +	old_tss_base = kvm_mmu_gva_to_gpa_write(vcpu, old_tss_base, NULL);
>  
>  	/* FIXME: Handle errors. Failure to read either TSS or their
>  	 * descriptors should generate a pagefault.
> @@ -5199,7 +5268,7 @@ int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu,
>  
>  	vcpu_load(vcpu);
>  	idx = srcu_read_lock(&vcpu->kvm->srcu);
> -	gpa = vcpu->arch.mmu.gva_to_gpa(vcpu, vaddr);
> +	gpa = kvm_mmu_gva_to_gpa_system(vcpu, vaddr, NULL);
>  	srcu_read_unlock(&vcpu->kvm->srcu, idx);
>  	tr->physical_address = gpa;
>  	tr->valid = gpa != UNMAPPED_GVA;
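
The access-mask convention used by the patch quoted above can be illustrated with a small userspace sketch. This is not KVM code: `struct toy_pte`, `toy_access_mask()` and `toy_check_access()` are hypothetical names made up for illustration, and only the PFERR_* bit positions follow the architectural x86 page-fault error code. The sketch assumes CR0.WP is set (so supervisor writes honour read-only pages) and ignores NX, reserved bits and multi-level walks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural x86 page-fault error code bits. */
#define PFERR_PRESENT_MASK (1U << 0)
#define PFERR_WRITE_MASK   (1U << 1)
#define PFERR_USER_MASK    (1U << 2)
#define PFERR_FETCH_MASK   (1U << 4)

/* Hypothetical, heavily simplified view of one leaf page-table entry. */
struct toy_pte {
	bool present;
	bool writable;
	bool user;
};

/* Build the access mask the way the new gva_to_gpa wrappers do:
 * the user bit comes from CPL, write/fetch from the kind of access. */
static uint32_t toy_access_mask(int cpl, bool write, bool fetch)
{
	uint32_t access;

	assert(cpl >= 0 && cpl <= 3);
	access = (cpl == 3) ? PFERR_USER_MASK : 0;
	if (write)
		access |= PFERR_WRITE_MASK;
	if (fetch)
		access |= PFERR_FETCH_MASK;
	return access;
}

/* Return true if the access is allowed. On failure, store an error code
 * in *error when the caller passed a non-NULL pointer (callers that do
 * not inject a fault may pass NULL). The fetch bit is not reported here
 * because this toy model has no notion of NX. */
static bool toy_check_access(struct toy_pte pte, uint32_t access,
			     uint32_t *error)
{
	bool ok = pte.present;

	if (ok && (access & PFERR_WRITE_MASK) && !pte.writable)
		ok = false;
	if (ok && (access & PFERR_USER_MASK) && !pte.user)
		ok = false;

	if (!ok && error)
		*error = (access & (PFERR_WRITE_MASK | PFERR_USER_MASK)) |
			 (pte.present ? PFERR_PRESENT_MASK : 0);
	return ok;
}
```

In this model, a CPL 0 write to a present, read-only, supervisor page fails with error code 0x3, while an access built with mask 0 (as the `_system` variants do) succeeds on the same page.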


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 13/20] KVM: x86 emulator: fix memory access during x86 emulation
  2010-03-06 13:53   ` Stefan Bader
@ 2010-03-07 10:07     ` Avi Kivity
  2010-03-08 14:10       ` Stefan Bader
  0 siblings, 1 reply; 32+ messages in thread
From: Avi Kivity @ 2010-03-07 10:07 UTC (permalink / raw)
  To: Stefan Bader; +Cc: kvm, linux-kernel

On 03/06/2010 03:53 PM, Stefan Bader wrote:
> Hi Avi,
>
> we currently try to integrate this patch for an update into a 2.6.32 based
> system (amongst other kvm updates). But as soon as this patch gets added kvm
> will die on startup in kvm_leave_lazy_mmu. This has been documented here:
>
> https://bugs.edge.launchpad.net/ubuntu/+source/linux/+bug/531823
>
> I have placed the backports of your patches, which are currently in linux-next
> and marked for stable here:
>
> git://kernel.ubuntu.com/smb/linux-2.6.32.y kvm
>
> I have tested the failure with a version that got only the following patches in:
> KVM: x86 emulator: Add Virtual-8086 mode of emulation
> KVM: x86 emulator: fix memory access during x86 emulation
> KVM: x86 emulator: Check IOPL level during io instruction emulation
> KVM: x86 emulator: Fix popf emulation
> KVM: x86 emulator: Check CPL level during privilege instruction emulation
>
> and also with a version that takes all stable patches up to the bad one:
> KVM: VMX: Trap and invalid MWAIT/MONITOR instruction
> KVM: x86 emulator: Add group8 instruction decoding
> KVM: x86 emulator: Add group9 instruction decoding
> KVM: x86 emulator: Add Virtual-8086 mode of emulation
> KVM: x86 emulator: fix memory access during x86 emulation
>
> But as soon as the fix for memory access gets added, the bug will occur. Would
> you have an idea what might be causing this?
>    

Does the same guest, using the same qemu-kvm, work on kvm.git or upstream?

-- 
error compiling committee.c: too many arguments to function



* Re: [PATCH 13/20] KVM: x86 emulator: fix memory access during x86 emulation
  2010-03-07 10:07     ` Avi Kivity
@ 2010-03-08 14:10       ` Stefan Bader
  2010-03-08 14:12         ` Avi Kivity
  0 siblings, 1 reply; 32+ messages in thread
From: Stefan Bader @ 2010-03-08 14:10 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm, linux-kernel

Avi Kivity wrote:
> On 03/06/2010 03:53 PM, Stefan Bader wrote:
>> Hi Avi,
>>
>> we currently try to integrate this patch for an update into a 2.6.32
>> based
>> system (amongst other kvm updates). But as soon as this patch gets
>> added kvm
>> will die on startup in kvm_leave_lazy_mmu. This has been documented here:
>>
>> https://bugs.edge.launchpad.net/ubuntu/+source/linux/+bug/531823
>>
>> I have placed the backports of your patches, which are currently in
>> linux-next
>> and marked for stable here:
>>
>> git://kernel.ubuntu.com/smb/linux-2.6.32.y kvm
>>
>> I have tested the failure with a version that got only the following
>> patches in:
>> KVM: x86 emulator: Add Virtual-8086 mode of emulation
>> KVM: x86 emulator: fix memory access during x86 emulation
>> KVM: x86 emulator: Check IOPL level during io instruction emulation
>> KVM: x86 emulator: Fix popf emulation
>> KVM: x86 emulator: Check CPL level during privilege instruction emulation
>>
>> and also with a version that takes all stable patches up to the bad one:
>> KVM: VMX: Trap and invalid MWAIT/MONITOR instruction
>> KVM: x86 emulator: Add group8 instruction decoding
>> KVM: x86 emulator: Add group9 instruction decoding
>> KVM: x86 emulator: Add Virtual-8086 mode of emulation
>> KVM: x86 emulator: fix memory access during x86 emulation
>>
>> But as soon as the fix for memory access gets added, the bug will
>> occur. Would
>> you have an idea what might be causing this?
>>    
> 
> Does the same guest, using the same qemu-kvm, work on kvm.git or upstream?
> 
The test was done with a kvm user-space package based on 0.12.3 (which seems to
be the current upstream version). I will try to run a test on the git version.

Stefan


* Re: [PATCH 13/20] KVM: x86 emulator: fix memory access during x86 emulation
  2010-03-08 14:10       ` Stefan Bader
@ 2010-03-08 14:12         ` Avi Kivity
  2010-03-08 14:17           ` Stefan Bader
  2010-03-08 20:48           ` Stefan Bader
  0 siblings, 2 replies; 32+ messages in thread
From: Avi Kivity @ 2010-03-08 14:12 UTC (permalink / raw)
  To: Stefan Bader; +Cc: kvm, linux-kernel

On 03/08/2010 04:10 PM, Stefan Bader wrote:
> Avi Kivity wrote:
>    
>> On 03/06/2010 03:53 PM, Stefan Bader wrote:
>>      
>>> Hi Avi,
>>>
>>> we currently try to integrate this patch for an update into a 2.6.32
>>> based
>>> system (amongst other kvm updates). But as soon as this patch gets
>>> added kvm
>>> will die on startup in kvm_leave_lazy_mmu. This has been documented here:
>>>
>>> https://bugs.edge.launchpad.net/ubuntu/+source/linux/+bug/531823
>>>
>>> I have placed the backports of your patches, which are currently in
>>> linux-next
>>> and marked for stable here:
>>>
>>> git://kernel.ubuntu.com/smb/linux-2.6.32.y kvm
>>>
>>> I have tested the failure with a version that got only the following
>>> patches in:
>>> KVM: x86 emulator: Add Virtual-8086 mode of emulation
>>> KVM: x86 emulator: fix memory access during x86 emulation
>>> KVM: x86 emulator: Check IOPL level during io instruction emulation
>>> KVM: x86 emulator: Fix popf emulation
>>> KVM: x86 emulator: Check CPL level during privilege instruction emulation
>>>
>>> and also with a version that takes all stable patches up to the bad one:
>>> KVM: VMX: Trap and invalid MWAIT/MONITOR instruction
>>> KVM: x86 emulator: Add group8 instruction decoding
>>> KVM: x86 emulator: Add group9 instruction decoding
>>> KVM: x86 emulator: Add Virtual-8086 mode of emulation
>>> KVM: x86 emulator: fix memory access during x86 emulation
>>>
>>> But as soon as the fix for memory access gets added, the bug will
>>> occur. Would
>>> you have an idea what might be causing this?
>>>
>>>        
>> Does the same guest, using the same qemu-kvm, work on kvm.git or upstream?
>>
>>      
> The test was done with a kvm user-space package based on 0.12.3 (which seems to
> be the current upstream version). I try to do a test on the git version.
>    

I meant keep the same userspace without change, and try it on a Linus 
kernel or kvm.git master 
(http://git.kernel.org/?p=virt/kvm/kvm.git;a=summary).

-- 
error compiling committee.c: too many arguments to function



* Re: [PATCH 13/20] KVM: x86 emulator: fix memory access during x86 emulation
  2010-03-08 14:12         ` Avi Kivity
@ 2010-03-08 14:17           ` Stefan Bader
  2010-03-08 20:48           ` Stefan Bader
  1 sibling, 0 replies; 32+ messages in thread
From: Stefan Bader @ 2010-03-08 14:17 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm, linux-kernel

Avi Kivity wrote:
> On 03/08/2010 04:10 PM, Stefan Bader wrote:
>> Avi Kivity wrote:
>>   
>>> On 03/06/2010 03:53 PM, Stefan Bader wrote:
>>>     
>>>> Hi Avi,
>>>>
>>>> we currently try to integrate this patch for an update into a 2.6.32
>>>> based
>>>> system (amongst other kvm updates). But as soon as this patch gets
>>>> added kvm
>>>> will die on startup in kvm_leave_lazy_mmu. This has been documented
>>>> here:
>>>>
>>>> https://bugs.edge.launchpad.net/ubuntu/+source/linux/+bug/531823
>>>>
>>>> I have placed the backports of your patches, which are currently in
>>>> linux-next
>>>> and marked for stable here:
>>>>
>>>> git://kernel.ubuntu.com/smb/linux-2.6.32.y kvm
>>>>
>>>> I have tested the failure with a version that got only the following
>>>> patches in:
>>>> KVM: x86 emulator: Add Virtual-8086 mode of emulation
>>>> KVM: x86 emulator: fix memory access during x86 emulation
>>>> KVM: x86 emulator: Check IOPL level during io instruction emulation
>>>> KVM: x86 emulator: Fix popf emulation
>>>> KVM: x86 emulator: Check CPL level during privilege instruction
>>>> emulation
>>>>
>>>> and also with a version that takes all stable patches up to the bad
>>>> one:
>>>> KVM: VMX: Trap and invalid MWAIT/MONITOR instruction
>>>> KVM: x86 emulator: Add group8 instruction decoding
>>>> KVM: x86 emulator: Add group9 instruction decoding
>>>> KVM: x86 emulator: Add Virtual-8086 mode of emulation
>>>> KVM: x86 emulator: fix memory access during x86 emulation
>>>>
>>>> But as soon as the fix for memory access gets added, the bug will
>>>> occur. Would
>>>> you have an idea what might be causing this?
>>>>
>>>>        
>>> Does the same guest, using the same qemu-kvm, work on kvm.git or
>>> upstream?
>>>
>>>      
>> The test was done with a kvm user-space package based on 0.12.3 (which
>> seems to
>> be the current upstream version). I try to do a test on the git version.
>>    
> 
> I meant keep the same userspace without change, and try it on a Linus
> kernel or kvm.git master
> (http://git.kernel.org/?p=virt/kvm/kvm.git;a=summary).
> 

OK, sorry, I misunderstood that. Since Linus has just pulled your patches in, I
will get that compiled and tested.



* Re: [PATCH 13/20] KVM: x86 emulator: fix memory access during x86 emulation
  2010-03-08 14:12         ` Avi Kivity
  2010-03-08 14:17           ` Stefan Bader
@ 2010-03-08 20:48           ` Stefan Bader
  2010-03-09 15:49             ` Stefan Bader
  2010-03-11 21:16             ` KVM: x86: ignore access permissions for hypercall patching Marcelo Tosatti
  1 sibling, 2 replies; 32+ messages in thread
From: Stefan Bader @ 2010-03-08 20:48 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm, linux-kernel

Avi Kivity wrote:
> On 03/08/2010 04:10 PM, Stefan Bader wrote:
>> Avi Kivity wrote:
>>   
>>> On 03/06/2010 03:53 PM, Stefan Bader wrote:
>>>     
>>>> Hi Avi,
>>>>
>>>> we currently try to integrate this patch for an update into a 2.6.32
>>>> based
>>>> system (amongst other kvm updates). But as soon as this patch gets
>>>> added kvm
>>>> will die on startup in kvm_leave_lazy_mmu. This has been documented
>>>> here:
>>>>
>>>> https://bugs.edge.launchpad.net/ubuntu/+source/linux/+bug/531823
>>>>
>>>> I have placed the backports of your patches, which are currently in
>>>> linux-next
>>>> and marked for stable here:
>>>>
>>>> git://kernel.ubuntu.com/smb/linux-2.6.32.y kvm
>>>>
>>>> I have tested the failure with a version that got only the following
>>>> patches in:
>>>> KVM: x86 emulator: Add Virtual-8086 mode of emulation
>>>> KVM: x86 emulator: fix memory access during x86 emulation
>>>> KVM: x86 emulator: Check IOPL level during io instruction emulation
>>>> KVM: x86 emulator: Fix popf emulation
>>>> KVM: x86 emulator: Check CPL level during privilege instruction
>>>> emulation
>>>>
>>>> and also with a version that takes all stable patches up to the bad
>>>> one:
>>>> KVM: VMX: Trap and invalid MWAIT/MONITOR instruction
>>>> KVM: x86 emulator: Add group8 instruction decoding
>>>> KVM: x86 emulator: Add group9 instruction decoding
>>>> KVM: x86 emulator: Add Virtual-8086 mode of emulation
>>>> KVM: x86 emulator: fix memory access during x86 emulation
>>>>
>>>> But as soon as the fix for memory access gets added, the bug will
>>>> occur. Would
>>>> you have an idea what might be causing this?
>>>>
>>>>        
>>> Does the same guest, using the same qemu-kvm, work on kvm.git or
>>> upstream?
>>>
>>>      
>> The test was done with a kvm user-space package based on 0.12.3 (which
>> seems to
>> be the current upstream version). I try to do a test on the git version.
>>    
> 
> I meant keep the same userspace without change, and try it on a Linus
> kernel or kvm.git master
> (http://git.kernel.org/?p=virt/kvm/kvm.git;a=summary).
> 
HEAD of the kvm.git tree works (with the same guest and userspace).
The stable 2.6.32.y tree plus all patches marked cc: stable fails.

(32bit host/guest)
Host dmesg:
kvm: emulating exchange as write

Guest dmesg:
...
[    3.053503] Freeing initrd memory: 8843k freed
[    3.059863] Freeing unused kernel memory: 660k freed
[    3.076657] Write protecting the kernel text: 4780k
[    3.082863] Write protecting the kernel read-only data: 1912k
[    3.086666] BUG: unable to handle kernel paging request at c01292e3
[    3.088025] IP: [<c01292e3>] kvm_leave_lazy_mmu+0x43/0x70
[    3.088025] *pde = 00910067 *pte = 00129161
[    3.088025] Oops: 0003 [#1] SMP
[    3.088025] last sysfs file:
[    3.088025] Modules linked in:
[    3.088025]
[    3.088025] Pid: 1, comm: init Not tainted (2.6.32-15-generic #22-Ubuntu) Bochs
[    3.088025] EIP: 0060:[<c01292e3>] EFLAGS: 00010246 CPU: 0
[    3.088025] EIP is at kvm_leave_lazy_mmu+0x43/0x70
[    3.088025] EAX: 00000002 EBX: 00000018 ECX: 01802c20 EDX: 00000000
[    3.088025] ESI: c1802c20 EDI: c1802c20 EBP: df071cb4 ESP: df071ca8
[    3.088025]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[    3.088025] Process init (pid: 1, ti=df070000 task=df068000 task.ti=df070000)
[    3.088025] Stack:
[    3.088025]  c0000000 dce2b000 dce2a844 df071cf0 c01e8b6d 00000000 00000001 bffff000
[    3.088025] <0> 00000000 db7ed000 c139d54c c139d54c df133000 db7ed000 1ffef067 bffff000
[    3.088025] <0> bfe10000 db44bbfc df071d2c c01e8ce0 c0000000 df133000 db44bbfc bfe10000
[    3.088025] Call Trace:
[    3.088025]  [<c01e8b6d>] ? move_ptes+0x1ad/0x270
[    3.088025]  [<c01e8ce0>] ? move_page_tables+0xb0/0x130
[    3.088025]  [<c020b614>] ? shift_arg_pages+0x94/0x180
[    3.088025]  [<c020b885>] ? setup_arg_pages+0x185/0x1b0
[    3.088025]  [<c0241243>] ? load_elf_binary+0x3c3/0xac0
[    3.088025]  [<c02f1654>] ? security_file_permission+0x14/0x20
[    3.088025]  [<c02052f4>] ? rw_verify_area+0x64/0xe0
[    3.088025]  [<c0240e80>] ? load_elf_binary+0x0/0xac0
[    3.088025]  [<c020bd9f>] ? search_binary_handler+0xef/0x2f0
[    3.088025]  [<c020b465>] ? kernel_read+0x35/0x50
[    3.088025]  [<c023f7b2>] ? load_script+0x1e2/0x270
[    3.088025]  [<c01e4160>] ? get_user_pages+0x50/0x60
[    3.088025]  [<c020a662>] ? get_arg_page+0x52/0xb0
[    3.088025]  [<c023f5d0>] ? load_script+0x0/0x270
[    3.088025]  [<c020bd9f>] ? search_binary_handler+0xef/0x2f0
[    3.088025]  [<c020a834>] ? copy_strings+0x174/0x190
[    3.088025]  [<c020c2c7>] ? do_execve+0x1f7/0x2c0
[    3.088025]  [<c034ed6a>] ? strncpy_from_user+0x3a/0x70
[    3.088025]  [<c0101a1d>] ? sys_execve+0x2d/0x60
[    3.088025]  [<c01033ec>] ? syscall_call+0x7/0xb
[    3.088025]  [<c01070a4>] ? kernel_execve+0x24/0x30
[    3.088025]  [<c01012ac>] ? run_init_process+0x1c/0x20
[    3.088025]  [<c0101396>] ? init_post+0xe6/0x100
[    3.088025]  [<c07d83d0>] ? kernel_init+0xb8/0xbf
[    3.088025]  [<c07d8318>] ? kernel_init+0x0/0xbf
[    3.088025]  [<c0104087>] ? kernel_thread_helper+0x7/0x10
[    3.088025] Code: 6c 87 c0 64 a1 40 6a 87 c0 03 3c 85 80 4a 7d c0 8b 9f 00 04 00 00 85 db 74 24 89 fe 31 d2 66 90 8d 8e 00 00 00 40 b8 02 00 00 00 <0f> 01 c1 01 c6 29 c3 75 ec c7 87 00 04 00 00 00 00 00 00 e8 e5
[    3.088025] EIP: [<c01292e3>] kvm_leave_lazy_mmu+0x43/0x70 SS:ESP 0068:df071ca8
[    3.088025] CR2: 00000000c01292e3
[    3.088025] ---[ end trace 85e247d11bf9c7e0 ]---
[    3.088025] note: init[1] exited with preempt_count 2
[    3.141968] BUG: scheduling while atomic: init/1/0x00000002
[    3.143101] Modules linked in:
[    3.143723] Pid: 1, comm: init Tainted: G      D    2.6.32-15-generic #22-Ubuntu
[    3.145183] Call Trace:
[    3.145674]  [<c013d562>] __schedule_bug+0x62/0x70
[    3.146646]  [<c05a37d4>] schedule+0x614/0x840
[    3.147497]  [<c05a9bcc>] ? smp_apic_timer_interrupt+0x5c/0x8b
[    3.148636]  [<c0103df1>] ? apic_timer_interrupt+0x31/0x40
[    3.149690]  [<c05a53b5>] rwsem_down_failed_common+0x75/0x1a0
[    3.150977]  [<c05a552d>] rwsem_down_read_failed+0x1d/0x30
[    3.152040]  [<c05a5587>] call_rwsem_down_read_failed+0x7/0x10
[    3.153149]  [<c05a4aec>] ? down_read+0x1c/0x20
[    3.154017]  [<c01878ef>] acct_collect+0x3f/0x170
[    3.154976]  [<c014ec12>] do_exit+0x262/0x310
[    3.155808]  [<c05a6595>] oops_end+0x95/0xd0
[    3.156642]  [<c01292e3>] ? kvm_leave_lazy_mmu+0x43/0x70
[    3.157660]  [<c012b2cc>] no_context+0xbc/0xe0
[    3.158545]  [<c01292e3>] ? kvm_leave_lazy_mmu+0x43/0x70
[    3.159553]  [<c01292e3>] ? kvm_leave_lazy_mmu+0x43/0x70
[    3.160627]  [<c012b32c>] __bad_area_nosemaphore+0x3c/0x160
[    3.161838]  [<c01c89ba>] ? T.903+0x3da/0x480
[    3.162741]  [<c01292e3>] ? kvm_leave_lazy_mmu+0x43/0x70
[    3.163772]  [<c012b467>] bad_area_nosemaphore+0x17/0x20
[    3.164809]  [<c05a7d56>] do_page_fault+0x2f6/0x380
[    3.165744]  [<c05a7a60>] ? do_page_fault+0x0/0x380
[    3.166737]  [<c05a5a63>] error_code+0x73/0x80
[    3.167595]  [<c01292e3>] ? kvm_leave_lazy_mmu+0x43/0x70
[    3.168629]  [<c01e8b6d>] move_ptes+0x1ad/0x270
[    3.169495]  [<c01e8ce0>] move_page_tables+0xb0/0x130
[    3.170525]  [<c020b614>] shift_arg_pages+0x94/0x180
[    3.171476]  [<c020b885>] setup_arg_pages+0x185/0x1b0
[    3.172461]  [<c0241243>] load_elf_binary+0x3c3/0xac0
[    3.173429]  [<c02f1654>] ? security_file_permission+0x14/0x20
[    3.174609]  [<c02052f4>] ? rw_verify_area+0x64/0xe0
[    3.175555]  [<c0240e80>] ? load_elf_binary+0x0/0xac0
[    3.176533]  [<c020bd9f>] search_binary_handler+0xef/0x2f0
[    3.177588]  [<c020b465>] ? kernel_read+0x35/0x50
[    3.178551]  [<c023f7b2>] load_script+0x1e2/0x270
[    3.179465]  [<c01e4160>] ? get_user_pages+0x50/0x60
[    3.180430]  [<c020a662>] ? get_arg_page+0x52/0xb0
[    3.181346]  [<c023f5d0>] ? load_script+0x0/0x270
[    3.182244]  [<c020bd9f>] search_binary_handler+0xef/0x2f0
[    3.183371]  [<c020a834>] ? copy_strings+0x174/0x190
[    3.184341]  [<c020c2c7>] do_execve+0x1f7/0x2c0
[    3.185210]  [<c034ed6a>] ? strncpy_from_user+0x3a/0x70
[    3.186203]  [<c0101a1d>] sys_execve+0x2d/0x60
[    3.187101]  [<c01033ec>] syscall_call+0x7/0xb
[    3.187945]  [<c01070a4>] ? kernel_execve+0x24/0x30
[    3.188890]  [<c01012ac>] ? run_init_process+0x1c/0x20
[    3.189874]  [<c0101396>] ? init_post+0xe6/0x100
[    3.190828]  [<c07d83d0>] ? kernel_init+0xb8/0xbf
[    3.191873]  [<c07d8318>] ? kernel_init+0x0/0xbf
[    3.192777]  [<c0104087>] ? kernel_thread_helper+0x7/0x10
[    3.524180] Clocksource tsc unstable (delta = -140394173 ns)
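
For readers following the oops above: the value in `Oops: 0003` is the x86 page-fault error code, and `*pte = 00129161` is the raw page-table entry for the faulting address. A minimal sketch for decoding both is given below. The helper names are hypothetical; the bit positions are the architectural ones, and the PTE layout assumes 32-bit non-PAE paging, which matches the 32-bit pde/pte values printed in the trace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Page-fault error code bits (architectural). */
#define PF_PRESENT (1U << 0)	/* 1 = protection violation, 0 = not present */
#define PF_WRITE   (1U << 1)	/* 1 = write access, 0 = read access */
#define PF_USER    (1U << 2)	/* 1 = fault taken in user mode */

/* Low flag bits of a 32-bit non-PAE page-table entry. */
#define PTE_PRESENT (1U << 0)
#define PTE_RW      (1U << 1)
#define PTE_USER    (1U << 2)

static bool pf_was_present(uint32_t ec) { return (ec & PF_PRESENT) != 0; }
static bool pf_was_write(uint32_t ec)   { return (ec & PF_WRITE) != 0; }
static bool pf_was_user(uint32_t ec)    { return (ec & PF_USER) != 0; }

static bool pte_is_present(uint32_t pte)  { return (pte & PTE_PRESENT) != 0; }
static bool pte_is_writable(uint32_t pte) { return (pte & PTE_RW) != 0; }

/* Format a one-line description of the error code into buf. */
static int describe_pf_error(uint32_t ec, char *buf, size_t len)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "%s, %s, %s",
			pf_was_present(ec) ? "protection violation"
					   : "page not present",
			pf_was_write(ec) ? "write" : "read",
			pf_was_user(ec) ? "user mode" : "supervisor mode");
}
```

With the values from the trace, 0x0003 decodes as a supervisor-mode write to a present page, and 0x00129161 has the present bit set and the writable bit clear, which is consistent with the "Write protecting the kernel text" message printed just before the oops. The faulting address equals EIP, and the bytes marked in the Code: line (`<0f> 01 c1`) are a VMCALL instruction, so one plausible reading is that a write to the hypercall instruction itself was refused once write permissions started being checked. That would fit the subject of the follow-up in this thread, "KVM: x86: ignore access permissions for hypercall patching", but it is an interpretation rather than something the trace proves.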


* Re: [PATCH 13/20] KVM: x86 emulator: fix memory access during x86 emulation
  2010-03-08 20:48           ` Stefan Bader
@ 2010-03-09 15:49             ` Stefan Bader
  2010-03-11 21:16             ` KVM: x86: ignore access permissions for hypercall patching Marcelo Tosatti
  1 sibling, 0 replies; 32+ messages in thread
From: Stefan Bader @ 2010-03-09 15:49 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm, linux-kernel

Stefan Bader wrote:
> Avi Kivity wrote:
>> On 03/08/2010 04:10 PM, Stefan Bader wrote:
>>> Avi Kivity wrote:
>>>   
>>>> On 03/06/2010 03:53 PM, Stefan Bader wrote:
>>>>     
>>>>> Hi Avi,
>>>>>
>>>>> we currently try to integrate this patch for an update into a 2.6.32
>>>>> based
>>>>> system (amongst other kvm updates). But as soon as this patch gets
>>>>> added kvm
>>>>> will die on startup in kvm_leave_lazy_mmu. This has been documented
>>>>> here:
>>>>>
>>>>> https://bugs.edge.launchpad.net/ubuntu/+source/linux/+bug/531823
>>>>>
>>>>> I have placed the backports of your patches, which are currently in
>>>>> linux-next
>>>>> and marked for stable here:
>>>>>
>>>>> git://kernel.ubuntu.com/smb/linux-2.6.32.y kvm
>>>>>
>>>>> I have tested the failure with a version that got only the following
>>>>> patches in:
>>>>> KVM: x86 emulator: Add Virtual-8086 mode of emulation
>>>>> KVM: x86 emulator: fix memory access during x86 emulation
>>>>> KVM: x86 emulator: Check IOPL level during io instruction emulation
>>>>> KVM: x86 emulator: Fix popf emulation
>>>>> KVM: x86 emulator: Check CPL level during privilege instruction
>>>>> emulation
>>>>>
>>>>> and also with a version that takes all stable patches up to the bad
>>>>> one:
>>>>> KVM: VMX: Trap and invalid MWAIT/MONITOR instruction
>>>>> KVM: x86 emulator: Add group8 instruction decoding
>>>>> KVM: x86 emulator: Add group9 instruction decoding
>>>>> KVM: x86 emulator: Add Virtual-8086 mode of emulation
>>>>> KVM: x86 emulator: fix memory access during x86 emulation
>>>>>
>>>>> But as soon as the fix for memory access gets added, the bug will
>>>>> occur. Would
>>>>> you have an idea what might be causing this?
>>>>>
>>>>>        
>>>> Does the same guest, using the same qemu-kvm, work on kvm.git or
>>>> upstream?
>>>>
>>>>      
>>> The test was done with a kvm user-space package based on 0.12.3 (which
>>> seems to
>>> be the current upstream version). I try to do a test on the git version.
>>>    
>> I meant keep the same userspace without change, and try it on a Linus
>> kernel or kvm.git master
>> (http://git.kernel.org/?p=virt/kvm/kvm.git;a=summary).
>>
> HEAD of kvm.git tree works (with same client and userspace)
> Stable 2.6.32.y tree plus all patches marked cc: stable fails.

I did some more experiments:
- Reverting the kvm git tree back to "KVM: x86 emulator: fix memory
  access during x86 emulation" will also produce a working kernel.
- I tried to add changes to arch/x86/kvm between the last change to
  2.6.32.y and the memory access fix but still get the failure. (some
  are left out as they depend on larger/earlier changes)

54532a54d07cafb22076ef24346bd8b9f3b31008 KVM: Introduce kvm_host_page_size
79619a0b8ae87a1049cf6c2936205e2d2bb26ce8 KVM: Activate fpu on clts
d7008a4bec7ca24144eff555254ed1ec26fe330b KVM: fix load_guest_segment_descriptor(
8d067487fab8f00d9eb46beb1b54c0080824cd01 KVM: fix kvm_fix_hypercall() to return
a7c469e9abb33e63e098d4ea72d0291fd74bbc9b KVM: VMX: Wire up .fpu_activate() callb
bd148f5b1cf8e787264b7d8a09a9cc2a328eb987 KVM: VMX: Remove redundant test in vmx_
457132cfe7942ea9c0be8a37e9c822263eb67286 KVM: VMX: emulate accessed bit for EPT
9fe8302b20efa50423fd84efcc4a39b516980c90 KVM: Remove redundant reading of rax on
71c586b8a531000dad1b3a655dbcda1496a9bb8f KVM: Fix cr4 possible guest owned bits
d568ed45eac26170acfbd0f3eb71e53a9909b52b KVM: MMU: Add tracepoint for guest page
d041987339e09f0cf3e0d2ad76ba2190dd82f047 KVM: VMX: Rename VMX_EPT_IGMT_BIT to VM
482b8e268261f8e21f2bec74c7297ab91bba6d17 KVM: PIT: unregister kvm irq notifier i

But this is all just stabbing in the dark at the moment. Is there a way I can
get more debug information?

> (32bit host/guest)
> Host dmesg:
> kvm: emulating exchange as write
> 
> Guest dmesg:
> ...
> [    3.053503] Freeing initrd memory: 8843k freed
> [    3.059863] Freeing unused kernel memory: 660k freed
> [    3.076657] Write protecting the kernel text: 4780k
> [    3.082863] Write protecting the kernel read-only data: 1912k
> [    3.086666] BUG: unable to handle kernel paging request at c01292e3
> [    3.088025] IP: [<c01292e3>] kvm_leave_lazy_mmu+0x43/0x70
> [    3.088025] *pde = 00910067 *pte = 00129161
> [    3.088025] Oops: 0003 [#1] SMP
> [    3.088025] last sysfs file:
> [    3.088025] Modules linked in:
> [    3.088025]
> [    3.088025] Pid: 1, comm: init Not tainted (2.6.32-15-generic #22-Ubuntu) Bochs
> [    3.088025] EIP: 0060:[<c01292e3>] EFLAGS: 00010246 CPU: 0
> [    3.088025] EIP is at kvm_leave_lazy_mmu+0x43/0x70
> [    3.088025] EAX: 00000002 EBX: 00000018 ECX: 01802c20 EDX: 00000000
> [    3.088025] ESI: c1802c20 EDI: c1802c20 EBP: df071cb4 ESP: df071ca8
> [    3.088025]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
> [    3.088025] Process init (pid: 1, ti=df070000 task=df068000 task.ti=df070000)
> [    3.088025] Stack:
> [    3.088025]  c0000000 dce2b000 dce2a844 df071cf0 c01e8b6d 00000000 00000001
> bffff000
> [    3.088025] <0> 00000000 db7ed000 c139d54c c139d54c df133000 db7ed000
> 1ffef067 bffff000
> [    3.088025] <0> bfe10000 db44bbfc df071d2c c01e8ce0 c0000000 df133000
> db44bbfc bfe10000
> [    3.088025] Call Trace:
> [    3.088025]  [<c01e8b6d>] ? move_ptes+0x1ad/0x270
> [    3.088025]  [<c01e8ce0>] ? move_page_tables+0xb0/0x130
> [    3.088025]  [<c020b614>] ? shift_arg_pages+0x94/0x180
> [    3.088025]  [<c020b885>] ? setup_arg_pages+0x185/0x1b0
> [    3.088025]  [<c0241243>] ? load_elf_binary+0x3c3/0xac0
> [    3.088025]  [<c02f1654>] ? security_file_permission+0x14/0x20
> [    3.088025]  [<c02052f4>] ? rw_verify_area+0x64/0xe0
> [    3.088025]  [<c0240e80>] ? load_elf_binary+0x0/0xac0
> [    3.088025]  [<c020bd9f>] ? search_binary_handler+0xef/0x2f0
> [    3.088025]  [<c020b465>] ? kernel_read+0x35/0x50
> [    3.088025]  [<c023f7b2>] ? load_script+0x1e2/0x270
> [    3.088025]  [<c01e4160>] ? get_user_pages+0x50/0x60
> [    3.088025]  [<c020a662>] ? get_arg_page+0x52/0xb0
> [    3.088025]  [<c023f5d0>] ? load_script+0x0/0x270
> [    3.088025]  [<c020bd9f>] ? search_binary_handler+0xef/0x2f0
> [    3.088025]  [<c020a834>] ? copy_strings+0x174/0x190
> [    3.088025]  [<c020c2c7>] ? do_execve+0x1f7/0x2c0
> [    3.088025]  [<c034ed6a>] ? strncpy_from_user+0x3a/0x70
> [    3.088025]  [<c0101a1d>] ? sys_execve+0x2d/0x60
> [    3.088025]  [<c01033ec>] ? syscall_call+0x7/0xb
> [    3.088025]  [<c01070a4>] ? kernel_execve+0x24/0x30
> [    3.088025]  [<c01012ac>] ? run_init_process+0x1c/0x20
> [    3.088025]  [<c0101396>] ? init_post+0xe6/0x100
> [    3.088025]  [<c07d83d0>] ? kernel_init+0xb8/0xbf
> [    3.088025]  [<c07d8318>] ? kernel_init+0x0/0xbf
> [    3.088025]  [<c0104087>] ? kernel_thread_helper+0x7/0x10
> [    3.088025] Code: 6c 87 c0 64 a1 40 6a 87 c0 03 3c 85 80 4a 7d c0 8b 9f 00 04
> 00 00 85 db 74 24 89 fe 31 d2 66 90 8d 8e 00 00 00 40 b8 02 00 00 00 <0f> 01 c1
> 01 c6 29 c3 75 ec c7 87 00 04 00 00 00 00 00 00 e8 e5
> [    3.088025] EIP: [<c01292e3>] kvm_leave_lazy_mmu+0x43/0x70 SS:ESP 0068:df071ca8
> [    3.088025] CR2: 00000000c01292e3
> [    3.088025] ---[ end trace 85e247d11bf9c7e0 ]---
> [    3.088025] note: init[1] exited with preempt_count 2
> [    3.141968] BUG: scheduling while atomic: init/1/0x00000002
> [    3.143101] Modules linked in:
> [    3.143723] Pid: 1, comm: init Tainted: G      D    2.6.32-15-generic #22-Ubuntu
> [    3.145183] Call Trace:
> [    3.145674]  [<c013d562>] __schedule_bug+0x62/0x70
> [    3.146646]  [<c05a37d4>] schedule+0x614/0x840
> [    3.147497]  [<c05a9bcc>] ? smp_apic_timer_interrupt+0x5c/0x8b
> [    3.148636]  [<c0103df1>] ? apic_timer_interrupt+0x31/0x40
> [    3.149690]  [<c05a53b5>] rwsem_down_failed_common+0x75/0x1a0
> [    3.150977]  [<c05a552d>] rwsem_down_read_failed+0x1d/0x30
> [    3.152040]  [<c05a5587>] call_rwsem_down_read_failed+0x7/0x10
> [    3.153149]  [<c05a4aec>] ? down_read+0x1c/0x20
> [    3.154017]  [<c01878ef>] acct_collect+0x3f/0x170
> [    3.154976]  [<c014ec12>] do_exit+0x262/0x310
> [    3.155808]  [<c05a6595>] oops_end+0x95/0xd0
> [    3.156642]  [<c01292e3>] ? kvm_leave_lazy_mmu+0x43/0x70
> [    3.157660]  [<c012b2cc>] no_context+0xbc/0xe0
> [    3.158545]  [<c01292e3>] ? kvm_leave_lazy_mmu+0x43/0x70
> [    3.159553]  [<c01292e3>] ? kvm_leave_lazy_mmu+0x43/0x70
> [    3.160627]  [<c012b32c>] __bad_area_nosemaphore+0x3c/0x160
> [    3.161838]  [<c01c89ba>] ? T.903+0x3da/0x480
> [    3.162741]  [<c01292e3>] ? kvm_leave_lazy_mmu+0x43/0x70
> [    3.163772]  [<c012b467>] bad_area_nosemaphore+0x17/0x20
> [    3.164809]  [<c05a7d56>] do_page_fault+0x2f6/0x380
> [    3.165744]  [<c05a7a60>] ? do_page_fault+0x0/0x380
> [    3.166737]  [<c05a5a63>] error_code+0x73/0x80
> [    3.167595]  [<c01292e3>] ? kvm_leave_lazy_mmu+0x43/0x70
> [    3.168629]  [<c01e8b6d>] move_ptes+0x1ad/0x270
> [    3.169495]  [<c01e8ce0>] move_page_tables+0xb0/0x130
> [    3.170525]  [<c020b614>] shift_arg_pages+0x94/0x180
> [    3.171476]  [<c020b885>] setup_arg_pages+0x185/0x1b0
> [    3.172461]  [<c0241243>] load_elf_binary+0x3c3/0xac0
> [    3.173429]  [<c02f1654>] ? security_file_permission+0x14/0x20
> [    3.174609]  [<c02052f4>] ? rw_verify_area+0x64/0xe0
> [    3.175555]  [<c0240e80>] ? load_elf_binary+0x0/0xac0
> [    3.176533]  [<c020bd9f>] search_binary_handler+0xef/0x2f0
> [    3.177588]  [<c020b465>] ? kernel_read+0x35/0x50
> [    3.178551]  [<c023f7b2>] load_script+0x1e2/0x270
> [    3.179465]  [<c01e4160>] ? get_user_pages+0x50/0x60
> [    3.180430]  [<c020a662>] ? get_arg_page+0x52/0xb0
> [    3.181346]  [<c023f5d0>] ? load_script+0x0/0x270
> [    3.182244]  [<c020bd9f>] search_binary_handler+0xef/0x2f0
> [    3.183371]  [<c020a834>] ? copy_strings+0x174/0x190
> [    3.184341]  [<c020c2c7>] do_execve+0x1f7/0x2c0
> [    3.185210]  [<c034ed6a>] ? strncpy_from_user+0x3a/0x70
> [    3.186203]  [<c0101a1d>] sys_execve+0x2d/0x60
> [    3.187101]  [<c01033ec>] syscall_call+0x7/0xb
> [    3.187945]  [<c01070a4>] ? kernel_execve+0x24/0x30
> [    3.188890]  [<c01012ac>] ? run_init_process+0x1c/0x20
> [    3.189874]  [<c0101396>] ? init_post+0xe6/0x100
> [    3.190828]  [<c07d83d0>] ? kernel_init+0xb8/0xbf
> [    3.191873]  [<c07d8318>] ? kernel_init+0x0/0xbf
> [    3.192777]  [<c0104087>] ? kernel_thread_helper+0x7/0x10
> [    3.524180] Clocksource tsc unstable (delta = -140394173 ns)
> 



* KVM: x86: ignore access permissions for hypercall patching
  2010-03-08 20:48           ` Stefan Bader
  2010-03-09 15:49             ` Stefan Bader
@ 2010-03-11 21:16             ` Marcelo Tosatti
  2010-03-11 21:22               ` Stefan Bader
  2010-03-12  5:56               ` Gleb Natapov
  1 sibling, 2 replies; 32+ messages in thread
From: Marcelo Tosatti @ 2010-03-11 21:16 UTC (permalink / raw)
  To: Stefan Bader, Gleb Natapov; +Cc: kvm, Avi Kivity


Ignore access permissions while patching hypercall instructions. 
Otherwise KVM injects a page fault when trying to patch vmcall 
on read-only text regions:

Freeing initrd memory: 8843k freed
Freeing unused kernel memory: 660k freed
Write protecting the kernel text: 4780k
Write protecting the kernel read-only data: 1912k
BUG: unable to handle kernel paging request at c01292e3
IP: [<c01292e3>] kvm_leave_lazy_mmu+0x43/0x70
*pde = 00910067 *pte = 00129161
Oops: 0003 [#1] SMP

CC: stable@kernel.org
Reported-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 703f637..bf5c83f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3253,12 +3253,17 @@ int emulator_write_phys(struct kvm_vcpu *vcpu, gpa_t gpa,
 static int emulator_write_emulated_onepage(unsigned long addr,
 					   const void *val,
 					   unsigned int bytes,
-					   struct kvm_vcpu *vcpu)
+					   struct kvm_vcpu *vcpu,
+					   bool guest_initiated)
 {
 	gpa_t                 gpa;
 	u32 error_code;
 
-	gpa = kvm_mmu_gva_to_gpa_write(vcpu, addr, &error_code);
+
+	if (guest_initiated)
+		gpa = kvm_mmu_gva_to_gpa_write(vcpu, addr, &error_code);
+	else
+		gpa = kvm_mmu_gva_to_gpa_system(vcpu, addr, &error_code);
 
 	if (gpa == UNMAPPED_GVA) {
 		kvm_inject_page_fault(vcpu, addr, error_code);
@@ -3289,24 +3294,35 @@ mmio:
 	return X86EMUL_CONTINUE;
 }
 
-int emulator_write_emulated(unsigned long addr,
+int __emulator_write_emulated(unsigned long addr,
 				   const void *val,
 				   unsigned int bytes,
-				   struct kvm_vcpu *vcpu)
+				   struct kvm_vcpu *vcpu,
+				   bool guest_initiated)
 {
 	/* Crossing a page boundary? */
 	if (((addr + bytes - 1) ^ addr) & PAGE_MASK) {
 		int rc, now;
 
 		now = -addr & ~PAGE_MASK;
-		rc = emulator_write_emulated_onepage(addr, val, now, vcpu);
+		rc = emulator_write_emulated_onepage(addr, val, now, vcpu,
+						     guest_initiated);
 		if (rc != X86EMUL_CONTINUE)
 			return rc;
 		addr += now;
 		val += now;
 		bytes -= now;
 	}
-	return emulator_write_emulated_onepage(addr, val, bytes, vcpu);
+	return emulator_write_emulated_onepage(addr, val, bytes, vcpu,
+					       guest_initiated);
+}
+
+int emulator_write_emulated(unsigned long addr,
+				   const void *val,
+				   unsigned int bytes,
+				   struct kvm_vcpu *vcpu)
+{
+	return __emulator_write_emulated(addr, val, bytes, vcpu, true);
 }
 EXPORT_SYMBOL_GPL(emulator_write_emulated);
 
@@ -3997,7 +4013,7 @@ int kvm_fix_hypercall(struct kvm_vcpu *vcpu)
 
 	kvm_x86_ops->patch_hypercall(vcpu, instruction);
 
-	return emulator_write_emulated(rip, instruction, 3, vcpu);
+	return __emulator_write_emulated(rip, instruction, 3, vcpu, false);
 }
 
 static u64 mk_cr_64(u64 curr_cr, u32 new_val)
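
For readers following the page-crossing logic that the patch above carries
over into __emulator_write_emulated(), here is a minimal userspace sketch of
the same arithmetic. It assumes 4 KiB pages and writes no larger than one
page; the helper names crosses_page() and first_chunk() are hypothetical and
do not exist in the kernel.

```c
#include <assert.h>

#define PAGE_SHIFT 12                      /* assumption: 4 KiB pages */
#define PAGE_SIZE  (1UL << PAGE_SHIFT)
#define PAGE_MASK  (~(PAGE_SIZE - 1))

/*
 * Nonzero if the byte range [addr, addr + bytes) touches two pages.
 * XOR of the first and last byte address has a bit set above the page
 * offset only when the two addresses lie in different pages.
 */
static int crosses_page(unsigned long addr, unsigned int bytes)
{
	assert(bytes > 0 && bytes <= PAGE_SIZE);
	return (((addr + bytes - 1) ^ addr) & PAGE_MASK) != 0;
}

/*
 * Number of bytes to write in the first call: everything if the range
 * stays within one page, otherwise only the bytes left in the page that
 * contains addr (the "now" value in the patch).
 */
static unsigned int first_chunk(unsigned long addr, unsigned int bytes)
{
	if (!crosses_page(addr, bytes))
		return bytes;
	return (unsigned int)(-addr & ~PAGE_MASK);
}
```

With these definitions, a 3-byte write at 0x1ffe is split into 2 bytes in the
first page and 1 byte in the next, while the same write at 0x1000 goes out in
a single call.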


* Re: KVM: x86: ignore access permissions for hypercall patching
  2010-03-11 21:16             ` KVM: x86: ignore access permissions for hypercall patching Marcelo Tosatti
@ 2010-03-11 21:22               ` Stefan Bader
  2010-03-12  5:56               ` Gleb Natapov
  1 sibling, 0 replies; 32+ messages in thread
From: Stefan Bader @ 2010-03-11 21:22 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Gleb Natapov, kvm, Avi Kivity

With this patch applied on top, I was able to boot my guest on an AMD host system.

Marcelo Tosatti wrote:
> Ignore access permissions while patching hypercall instructions. 
> Otherwise KVM injects a page fault when trying to patch vmcall 
> on read-only text regions:
> 
> Freeing initrd memory: 8843k freed
> Freeing unused kernel memory: 660k freed
> Write protecting the kernel text: 4780k
> Write protecting the kernel read-only data: 1912k
> BUG: unable to handle kernel paging request at c01292e3
> IP: [<c01292e3>] kvm_leave_lazy_mmu+0x43/0x70
> *pde = 00910067 *pte = 00129161
> Oops: 0003 [#1] SMP
> 
> CC: stable@kernel.org
> Reported-by: Stefan Bader <stefan.bader@canonical.com>
> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Tested-by: Stefan Bader <stefan.bader@canonical.com>
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 703f637..bf5c83f 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3253,12 +3253,17 @@ int emulator_write_phys(struct kvm_vcpu *vcpu, gpa_t gpa,
>  static int emulator_write_emulated_onepage(unsigned long addr,
>  					   const void *val,
>  					   unsigned int bytes,
> -					   struct kvm_vcpu *vcpu)
> +					   struct kvm_vcpu *vcpu,
> +					   bool guest_initiated)
>  {
>  	gpa_t                 gpa;
>  	u32 error_code;
>  
> -	gpa = kvm_mmu_gva_to_gpa_write(vcpu, addr, &error_code);
> +
> +	if (guest_initiated)
> +		gpa = kvm_mmu_gva_to_gpa_write(vcpu, addr, &error_code);
> +	else
> +		gpa = kvm_mmu_gva_to_gpa_system(vcpu, addr, &error_code);
>  
>  	if (gpa == UNMAPPED_GVA) {
>  		kvm_inject_page_fault(vcpu, addr, error_code);
> @@ -3289,24 +3294,35 @@ mmio:
>  	return X86EMUL_CONTINUE;
>  }
>  
> -int emulator_write_emulated(unsigned long addr,
> +int __emulator_write_emulated(unsigned long addr,
>  				   const void *val,
>  				   unsigned int bytes,
> -				   struct kvm_vcpu *vcpu)
> +				   struct kvm_vcpu *vcpu,
> +				   bool guest_initiated)
>  {
>  	/* Crossing a page boundary? */
>  	if (((addr + bytes - 1) ^ addr) & PAGE_MASK) {
>  		int rc, now;
>  
>  		now = -addr & ~PAGE_MASK;
> -		rc = emulator_write_emulated_onepage(addr, val, now, vcpu);
> +		rc = emulator_write_emulated_onepage(addr, val, now, vcpu,
> +						     guest_initiated);
>  		if (rc != X86EMUL_CONTINUE)
>  			return rc;
>  		addr += now;
>  		val += now;
>  		bytes -= now;
>  	}
> -	return emulator_write_emulated_onepage(addr, val, bytes, vcpu);
> +	return emulator_write_emulated_onepage(addr, val, bytes, vcpu,
> +					       guest_initiated);
> +}
> +
> +int emulator_write_emulated(unsigned long addr,
> +				   const void *val,
> +				   unsigned int bytes,
> +				   struct kvm_vcpu *vcpu)
> +{
> +	return __emulator_write_emulated(addr, val, bytes, vcpu, true);
>  }
>  EXPORT_SYMBOL_GPL(emulator_write_emulated);
>  
> @@ -3997,7 +4013,7 @@ int kvm_fix_hypercall(struct kvm_vcpu *vcpu)
>  
>  	kvm_x86_ops->patch_hypercall(vcpu, instruction);
>  
> -	return emulator_write_emulated(rip, instruction, 3, vcpu);
> +	return __emulator_write_emulated(rip, instruction, 3, vcpu, false);
>  }
>  
>  static u64 mk_cr_64(u64 curr_cr, u32 new_val)



* Re: KVM: x86: ignore access permissions for hypercall patching
  2010-03-11 21:16             ` KVM: x86: ignore access permissions for hypercall patching Marcelo Tosatti
  2010-03-11 21:22               ` Stefan Bader
@ 2010-03-12  5:56               ` Gleb Natapov
  2010-03-12  6:07                 ` Gleb Natapov
  1 sibling, 1 reply; 32+ messages in thread
From: Gleb Natapov @ 2010-03-12  5:56 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Stefan Bader, kvm, Avi Kivity

On Thu, Mar 11, 2010 at 06:16:05PM -0300, Marcelo Tosatti wrote:
> 
> Ignore access permissions while patching hypercall instructions. 
> Otherwise KVM injects a page fault when trying to patch vmcall 
> on read-only text regions:
> 
> Freeing initrd memory: 8843k freed
> Freeing unused kernel memory: 660k freed
> Write protecting the kernel text: 4780k
> Write protecting the kernel read-only data: 1912k
> BUG: unable to handle kernel paging request at c01292e3
> IP: [<c01292e3>] kvm_leave_lazy_mmu+0x43/0x70
> *pde = 00910067 *pte = 00129161
> Oops: 0003 [#1] SMP
> 
> CC: stable@kernel.org
> Reported-by: Stefan Bader <stefan.bader@canonical.com>
> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
> 
My emulator patch series introduces kvm_write_guest_virt_system(). Maybe
use it here (only compile tested).


diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3753c11..9833c25 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3157,14 +3157,18 @@ static int kvm_read_guest_virt_system(gva_t addr, void *val, unsigned int bytes,
 	return kvm_read_guest_virt_helper(addr, val, bytes, vcpu, 0, error);
 }
 
-static int kvm_write_guest_virt(gva_t addr, void *val, unsigned int bytes,
-				struct kvm_vcpu *vcpu, u32 *error)
+static int kvm_write_guest_virt_helper(gva_t addr, void *val,
+				       unsigned int bytes,
+				       struct kvm_vcpu *vcpu, u32 access,
+				       u32 *error)
 {
 	void *data = val;
 	int r = X86EMUL_CONTINUE;
 
+	access |= PFERR_WRITE_MASK;
+
 	while (bytes) {
-		gpa_t gpa = kvm_mmu_gva_to_gpa_write(vcpu, addr, error);
+		gpa_t gpa =  vcpu->arch.mmu.gva_to_gpa(vcpu, addr, access, error);
 		unsigned offset = addr & (PAGE_SIZE-1);
 		unsigned towrite = min(bytes, (unsigned)PAGE_SIZE - offset);
 		int ret;
@@ -3187,6 +3191,19 @@ out:
 	return r;
 }
 
+static int kvm_write_guest_virt(gva_t addr, void *val, unsigned int bytes,
+				struct kvm_vcpu *vcpu, u32 *error)
+{
+	u32 access = (kvm_x86_ops->get_cpl(vcpu) == 3) ? PFERR_USER_MASK : 0;
+	return kvm_write_guest_virt_helper(addr, val, bytes, vcpu, access, error);
+}
+
+static int kvm_write_guest_virt_system(gva_t addr, void *val,
+				       unsigned int bytes,
+				       struct kvm_vcpu *vcpu, u32 *error)
+{
+	return kvm_write_guest_virt_helper(addr, val, bytes, vcpu, 0, error);
+}
 
 static int emulator_read_emulated(unsigned long addr,
 				  void *val,
@@ -3997,7 +4014,7 @@ int kvm_fix_hypercall(struct kvm_vcpu *vcpu)
 
 	kvm_x86_ops->patch_hypercall(vcpu, instruction);
 
-	return emulator_write_emulated(rip, instruction, 3, vcpu);
+	return kvm_write_guest_virt_system(rip, instruction, 3, vcpu, NULL);
 }
 
 static u64 mk_cr_64(u64 curr_cr, u32 new_val)
--
			Gleb.
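
The PFERR_WRITE_MASK and PFERR_USER_MASK bits used in the diff above follow
the layout of the x86 page-fault error code, which is also what the guest
reports as "Oops: 0003". As a rough illustration, the userspace sketch below
decodes the error code and the pde/pte values quoted from the oops. The macro
and function names are hypothetical and are not taken from the kernel; only
the bit positions are architectural.

```c
#include <assert.h>
#include <stdint.h>

/* x86 page-fault error code bits (low three bits only). */
#define PF_PRESENT (1u << 0)   /* fault on a present page (protection) */
#define PF_WRITE   (1u << 1)   /* access was a write */
#define PF_USER    (1u << 2)   /* access came from user mode */

/* x86 page-table entry bits used here. */
#define PTE_PRESENT (1u << 0)
#define PTE_RW      (1u << 1)

/* True for a supervisor-mode write that hit a present, protected page. */
static int fault_is_kernel_write_protect(uint32_t error_code)
{
	return (error_code & (PF_PRESENT | PF_WRITE | PF_USER)) ==
	       (PF_PRESENT | PF_WRITE);
}

/* True if the entry maps a page but does not allow writes. */
static int pte_is_read_only(uint32_t pte)
{
	return (pte & PTE_PRESENT) && !(pte & PTE_RW);
}

/* Checks the values quoted in the oops earlier in this thread. */
static void check_oops_values(void)
{
	assert(fault_is_kernel_write_protect(0x0003)); /* Oops: 0003 */
	assert(pte_is_read_only(0x00129161));          /* *pte */
	assert(!pte_is_read_only(0x00910067));         /* *pde is writable */
}
```

Read this way, the oops matches the description in the commit message: a
kernel-mode write to a present page whose pte has the writable bit clear.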


* Re: KVM: x86: ignore access permissions for hypercall patching
  2010-03-12  5:56               ` Gleb Natapov
@ 2010-03-12  6:07                 ` Gleb Natapov
  0 siblings, 0 replies; 32+ messages in thread
From: Gleb Natapov @ 2010-03-12  6:07 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Stefan Bader, kvm, Avi Kivity

On Fri, Mar 12, 2010 at 07:56:00AM +0200, Gleb Natapov wrote:
> On Thu, Mar 11, 2010 at 06:16:05PM -0300, Marcelo Tosatti wrote:
> > 
> > Ignore access permissions while patching hypercall instructions. 
> > Otherwise KVM injects a page fault when trying to patch vmcall 
> > on read-only text regions:
> > 
> > Freeing initrd memory: 8843k freed
> > Freeing unused kernel memory: 660k freed
> > Write protecting the kernel text: 4780k
> > Write protecting the kernel read-only data: 1912k
> > BUG: unable to handle kernel paging request at c01292e3
> > IP: [<c01292e3>] kvm_leave_lazy_mmu+0x43/0x70
> > *pde = 00910067 *pte = 00129161
> > Oops: 0003 [#1] SMP
> > 
> > CC: stable@kernel.org
> > Reported-by: Stefan Bader <stefan.bader@canonical.com>
> > Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
> > 
> My emulator patch series introduce kvm_write_guest_virt_system(). May be
> used it here (only compile tested).
> 
Ignore that, it will not work.

--
			Gleb.


Thread overview: 32+ messages
2010-02-17 13:45 [PATCH 00/20] KVM updates for the 2.6.34 merge window (batch 4/4) Avi Kivity
2010-02-17 13:45 ` [PATCH 01/20] KVM: Fix Codestyle in virt/kvm/coalesced_mmio.c Avi Kivity
2010-02-17 13:45 ` [PATCH 02/20] KVM: MMU: Add tracepoint for guest page aging Avi Kivity
2010-02-17 13:45 ` [PATCH 03/20] KVM: VMX: Rename VMX_EPT_IGMT_BIT to VMX_EPT_IPAT_BIT Avi Kivity
2010-02-17 13:45 ` [PATCH 04/20] KVM: PIT: unregister kvm irq notifier if fail to create pit Avi Kivity
2010-02-17 13:45 ` [PATCH 05/20] KVM: kvm->arch.vioapic should be NULL if kvm_ioapic_init() failure Avi Kivity
2010-02-17 13:45 ` [PATCH 06/20] KVM: cleanup the failure path of KVM_CREATE_IRQCHIP ioctrl Avi Kivity
2010-02-17 13:45 ` [PATCH 07/20] KVM: ia64: destroy ioapic device if fail to setup default irq routing Avi Kivity
2010-02-17 13:45 ` [PATCH 08/20] KVM: ppc/booke: Set ESR and DEAR when inject interrupt to guest Avi Kivity
2010-02-17 13:45 ` [PATCH 09/20] KVM: do not store wqh in irqfd Avi Kivity
2010-02-17 13:45 ` [PATCH 10/20] KVM: x86 emulator: Add group8 instruction decoding Avi Kivity
2010-02-17 13:45 ` [PATCH 11/20] KVM: x86 emulator: Add group9 " Avi Kivity
2010-02-17 13:45 ` [PATCH 12/20] KVM: x86 emulator: Add Virtual-8086 mode of emulation Avi Kivity
2010-02-17 13:45 ` [PATCH 13/20] KVM: x86 emulator: fix memory access during x86 emulation Avi Kivity
2010-03-06 13:53   ` Stefan Bader
2010-03-07 10:07     ` Avi Kivity
2010-03-08 14:10       ` Stefan Bader
2010-03-08 14:12         ` Avi Kivity
2010-03-08 14:17           ` Stefan Bader
2010-03-08 20:48           ` Stefan Bader
2010-03-09 15:49             ` Stefan Bader
2010-03-11 21:16             ` KVM: x86: ignore access permissions for hypercall patching Marcelo Tosatti
2010-03-11 21:22               ` Stefan Bader
2010-03-12  5:56               ` Gleb Natapov
2010-03-12  6:07                 ` Gleb Natapov
2010-02-17 13:45 ` [PATCH 14/20] KVM: x86 emulator: Check IOPL level during io instruction emulation Avi Kivity
2010-02-17 13:45 ` [PATCH 15/20] KVM: x86 emulator: Fix popf emulation Avi Kivity
2010-02-17 13:45 ` [PATCH 16/20] KVM: x86 emulator: Check CPL level during privilege instruction emulation Avi Kivity
2010-02-17 13:45 ` [PATCH 17/20] KVM: x86 emulator: Add LOCK prefix validity checking Avi Kivity
2010-02-17 13:45 ` [PATCH 18/20] KVM: Plan obsolescence of kernel allocated slots, paravirt mmu Avi Kivity
2010-02-17 13:45 ` [PATCH 19/20] KVM: x86 emulator: code style cleanup Avi Kivity
2010-02-17 13:45 ` [PATCH 20/20] KVM: x86 emulator: disallow opcode 82 in 64-bit mode Avi Kivity
