linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/5] KVM updates for 2.6.20
@ 2007-01-11 10:02 Avi Kivity
  2007-01-11 10:03 ` [PATCH 1/5] KVM: Make sure there is a vcpu context loaded when destroying the mmu Avi Kivity
                   ` (4 more replies)
  0 siblings, 5 replies; 7+ messages in thread
From: Avi Kivity @ 2007-01-11 10:02 UTC (permalink / raw)
  To: kvm-devel; +Cc: linux-kernel, Andrew Morton, Ingo Molnar

The following patchset contains important kvm stability fixes for Linux 
2.6.20.  Most of the patches fix regressions introduced (or exposed) by 
the mmu optimization work; there's also a fix for an ancient (but 
serious) bug from the early days of kvm.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/5] KVM: Make sure there is a vcpu context loaded when destroying the mmu
  2007-01-11 10:02 [PATCH 0/5] KVM updates for 2.6.20 Avi Kivity
@ 2007-01-11 10:03 ` Avi Kivity
  2007-01-11 10:04 ` [PATCH 2/5] KVM: Fix race between mmio reads and injected interrupts Avi Kivity
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 7+ messages in thread
From: Avi Kivity @ 2007-01-11 10:03 UTC (permalink / raw)
  To: kvm-devel; +Cc: linux-kernel, akpm, mingo

This makes the vmwrite errors on vm shutdown go away.

Signed-off-by: Avi Kivity <avi@qumranet.com>

Index: linux-2.6/drivers/kvm/kvm_main.c
===================================================================
--- linux-2.6.orig/drivers/kvm/kvm_main.c
+++ linux-2.6/drivers/kvm/kvm_main.c
@@ -272,7 +272,9 @@ static void kvm_free_physmem(struct kvm 
 
 static void kvm_free_vcpu(struct kvm_vcpu *vcpu)
 {
+	vcpu_load(vcpu->kvm, vcpu_slot(vcpu));
 	kvm_mmu_destroy(vcpu);
+	vcpu_put(vcpu);
 	kvm_arch_ops->vcpu_free(vcpu);
 }
 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 2/5] KVM: Fix race between mmio reads and injected interrupts
  2007-01-11 10:02 [PATCH 0/5] KVM updates for 2.6.20 Avi Kivity
  2007-01-11 10:03 ` [PATCH 1/5] KVM: Make sure there is a vcpu context loaded when destroying the mmu Avi Kivity
@ 2007-01-11 10:04 ` Avi Kivity
  2007-01-11 10:05 ` [PATCH 3/5] KVM: x86 emulator: fix bit string instructions Avi Kivity
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 7+ messages in thread
From: Avi Kivity @ 2007-01-11 10:04 UTC (permalink / raw)
  To: kvm-devel; +Cc: linux-kernel, akpm, mingo

The kvm mmio read path looks like:

 1. guest read faults
 2. kvm emulates read, calls emulator_read_emulated()
 3. fails as a read requires userspace help
 4. exit to userspace
 5. userspace emulates read, kvm sets vcpu->mmio_read_completed
 6. re-enter guest, fault again
 7. kvm emulates read, calls emulator_read_emulated()
 8. succeeds as vcpu->mmio_read_emulated is set
 9. instruction completes and guest is resumed

A problem surfaces if the userspace exit (step 5) also requests an interrupt
injection.  In that case, the guest does not re-execute the original
instruction, but the interrupt handler.  The next time an mmio read is
exectued (likely for a different address), step 3 will find
vcpu->mmio_read_completed set and return the value read for the original
instruction.

The problem manifested itself in a few annoying ways:
- little squares appear randomly on console when switching virtual terminals
- ne2000 fails under nfs read load
- rtl8139 complains about "pci errors" even though the device model is
  incapable of issuing them.

Fix by skipping interrupt injection if an mmio read is pending.

A better fix is to avoid re-entry into the guest, and re-emulating immediately
instead.  However that's a bit more complex.

Signed-off-by: Avi Kivity <avi@qumranet.com>

Index: linux-2.6/drivers/kvm/svm.c
===================================================================
--- linux-2.6.orig/drivers/kvm/svm.c
+++ linux-2.6/drivers/kvm/svm.c
@@ -1407,7 +1407,8 @@ static int svm_vcpu_run(struct kvm_vcpu 
 	int r;
 
 again:
-	do_interrupt_requests(vcpu, kvm_run);
+	if (!vcpu->mmio_read_completed)
+		do_interrupt_requests(vcpu, kvm_run);
 
 	clgi();
 
Index: linux-2.6/drivers/kvm/vmx.c
===================================================================
--- linux-2.6.orig/drivers/kvm/vmx.c
+++ linux-2.6/drivers/kvm/vmx.c
@@ -1717,7 +1717,8 @@ again:
 	vmcs_writel(HOST_GS_BASE, segment_base(gs_sel));
 #endif
 
-	do_interrupt_requests(vcpu, kvm_run);
+	if (!vcpu->mmio_read_completed)
+		do_interrupt_requests(vcpu, kvm_run);
 
 	if (vcpu->guest_debug.enabled)
 		kvm_guest_debug_pre(vcpu);

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 3/5] KVM: x86 emulator: fix bit string instructions
  2007-01-11 10:02 [PATCH 0/5] KVM updates for 2.6.20 Avi Kivity
  2007-01-11 10:03 ` [PATCH 1/5] KVM: Make sure there is a vcpu context loaded when destroying the mmu Avi Kivity
  2007-01-11 10:04 ` [PATCH 2/5] KVM: Fix race between mmio reads and injected interrupts Avi Kivity
@ 2007-01-11 10:05 ` Avi Kivity
  2007-01-11 10:06 ` [PATCH 4/5] KVM: Fix asm constraints with CONFIG_FRAME_POINTER=n Avi Kivity
  2007-01-11 10:07 ` [PATCH 5/5] KVM: Fix bogus pagefault on writable pages Avi Kivity
  4 siblings, 0 replies; 7+ messages in thread
From: Avi Kivity @ 2007-01-11 10:05 UTC (permalink / raw)
  To: kvm-devel; +Cc: linux-kernel, akpm, mingo

The various bit string instructions (bts, btc, etc.) fail to adjust the
address correctly if the bit address is beyond BITS_PER_LONG.

This bug creeped in as the emulator originally relied on cr2 to contain
the memory address; however we now decode it from the mod r/m bits, and
must adjust the offset to account for large bit indices.

The patch is rather large because it switches src and dst decoding around,
so that the bit index is available when decoding the memory address.

This fixes workloads like the FC5 installer.

Signed-off-by: Avi Kivity <avi@qumranet.com>

Index: linux-2.6/drivers/kvm/x86_emulate.c
===================================================================
--- linux-2.6.orig/drivers/kvm/x86_emulate.c
+++ linux-2.6/drivers/kvm/x86_emulate.c
@@ -61,6 +61,7 @@
 #define ModRM       (1<<6)
 /* Destination is only written; never read. */
 #define Mov         (1<<7)
+#define BitOp       (1<<8)
 
 static u8 opcode_table[256] = {
 	/* 0x00 - 0x07 */
@@ -148,7 +149,7 @@ static u8 opcode_table[256] = {
 	0, 0, ByteOp | DstMem | SrcNone | ModRM, DstMem | SrcNone | ModRM
 };
 
-static u8 twobyte_table[256] = {
+static u16 twobyte_table[256] = {
 	/* 0x00 - 0x0F */
 	0, SrcMem | ModRM | DstReg, 0, 0, 0, 0, ImplicitOps, 0,
 	0, 0, 0, 0, 0, ImplicitOps | ModRM, 0, 0,
@@ -180,16 +181,16 @@ static u8 twobyte_table[256] = {
 	/* 0x90 - 0x9F */
 	0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 	/* 0xA0 - 0xA7 */
-	0, 0, 0, DstMem | SrcReg | ModRM, 0, 0, 0, 0,
+	0, 0, 0, DstMem | SrcReg | ModRM | BitOp, 0, 0, 0, 0,
 	/* 0xA8 - 0xAF */
-	0, 0, 0, DstMem | SrcReg | ModRM, 0, 0, 0, 0,
+	0, 0, 0, DstMem | SrcReg | ModRM | BitOp, 0, 0, 0, 0,
 	/* 0xB0 - 0xB7 */
 	ByteOp | DstMem | SrcReg | ModRM, DstMem | SrcReg | ModRM, 0,
-	    DstMem | SrcReg | ModRM,
+	    DstMem | SrcReg | ModRM | BitOp,
 	0, 0, ByteOp | DstReg | SrcMem | ModRM | Mov,
 	    DstReg | SrcMem16 | ModRM | Mov,
 	/* 0xB8 - 0xBF */
-	0, 0, DstMem | SrcImmByte | ModRM, DstMem | SrcReg | ModRM,
+	0, 0, DstMem | SrcImmByte | ModRM, DstMem | SrcReg | ModRM | BitOp,
 	0, 0, ByteOp | DstReg | SrcMem | ModRM | Mov,
 	    DstReg | SrcMem16 | ModRM | Mov,
 	/* 0xC0 - 0xCF */
@@ -469,7 +470,8 @@ static int read_descriptor(struct x86_em
 int
 x86_emulate_memop(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
 {
-	u8 b, d, sib, twobyte = 0, rex_prefix = 0;
+	unsigned d;
+	u8 b, sib, twobyte = 0, rex_prefix = 0;
 	u8 modrm, modrm_mod = 0, modrm_reg = 0, modrm_rm = 0;
 	unsigned long *override_base = NULL;
 	unsigned int op_bytes, ad_bytes, lock_prefix = 0, rep_prefix = 0, i;
@@ -726,46 +728,6 @@ done_prefixes:
 		;
 	}
 
-	/* Decode and fetch the destination operand: register or memory. */
-	switch (d & DstMask) {
-	case ImplicitOps:
-		/* Special instructions do their own operand decoding. */
-		goto special_insn;
-	case DstReg:
-		dst.type = OP_REG;
-		if ((d & ByteOp)
-		    && !(twobyte_table && (b == 0xb6 || b == 0xb7))) {
-			dst.ptr = decode_register(modrm_reg, _regs,
-						  (rex_prefix == 0));
-			dst.val = *(u8 *) dst.ptr;
-			dst.bytes = 1;
-		} else {
-			dst.ptr = decode_register(modrm_reg, _regs, 0);
-			switch ((dst.bytes = op_bytes)) {
-			case 2:
-				dst.val = *(u16 *)dst.ptr;
-				break;
-			case 4:
-				dst.val = *(u32 *)dst.ptr;
-				break;
-			case 8:
-				dst.val = *(u64 *)dst.ptr;
-				break;
-			}
-		}
-		break;
-	case DstMem:
-		dst.type = OP_MEM;
-		dst.ptr = (unsigned long *)cr2;
-		dst.bytes = (d & ByteOp) ? 1 : op_bytes;
-		if (!(d & Mov) && /* optimisation - avoid slow emulated read */
-		    ((rc = ops->read_emulated((unsigned long)dst.ptr,
-					      &dst.val, dst.bytes, ctxt)) != 0))
-			goto done;
-		break;
-	}
-	dst.orig_val = dst.val;
-
 	/*
 	 * Decode and fetch the source operand: register, memory
 	 * or immediate.
@@ -838,6 +800,50 @@ done_prefixes:
 		break;
 	}
 
+	/* Decode and fetch the destination operand: register or memory. */
+	switch (d & DstMask) {
+	case ImplicitOps:
+		/* Special instructions do their own operand decoding. */
+		goto special_insn;
+	case DstReg:
+		dst.type = OP_REG;
+		if ((d & ByteOp)
+		    && !(twobyte_table && (b == 0xb6 || b == 0xb7))) {
+			dst.ptr = decode_register(modrm_reg, _regs,
+						  (rex_prefix == 0));
+			dst.val = *(u8 *) dst.ptr;
+			dst.bytes = 1;
+		} else {
+			dst.ptr = decode_register(modrm_reg, _regs, 0);
+			switch ((dst.bytes = op_bytes)) {
+			case 2:
+				dst.val = *(u16 *)dst.ptr;
+				break;
+			case 4:
+				dst.val = *(u32 *)dst.ptr;
+				break;
+			case 8:
+				dst.val = *(u64 *)dst.ptr;
+				break;
+			}
+		}
+		break;
+	case DstMem:
+		dst.type = OP_MEM;
+		dst.ptr = (unsigned long *)cr2;
+		dst.bytes = (d & ByteOp) ? 1 : op_bytes;
+		if (d & BitOp) {
+			dst.ptr += src.val / BITS_PER_LONG;
+			dst.bytes = sizeof(long);
+		}
+		if (!(d & Mov) && /* optimisation - avoid slow emulated read */
+		    ((rc = ops->read_emulated((unsigned long)dst.ptr,
+					      &dst.val, dst.bytes, ctxt)) != 0))
+			goto done;
+		break;
+	}
+	dst.orig_val = dst.val;
+
 	if (twobyte)
 		goto twobyte_insn;
 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 4/5] KVM: Fix asm constraints with CONFIG_FRAME_POINTER=n
  2007-01-11 10:02 [PATCH 0/5] KVM updates for 2.6.20 Avi Kivity
                   ` (2 preceding siblings ...)
  2007-01-11 10:05 ` [PATCH 3/5] KVM: x86 emulator: fix bit string instructions Avi Kivity
@ 2007-01-11 10:06 ` Avi Kivity
  2007-01-23  3:10   ` Herbert Xu
  2007-01-11 10:07 ` [PATCH 5/5] KVM: Fix bogus pagefault on writable pages Avi Kivity
  4 siblings, 1 reply; 7+ messages in thread
From: Avi Kivity @ 2007-01-11 10:06 UTC (permalink / raw)
  To: kvm-devel; +Cc: linux-kernel, akpm, mingo

A "g" constraint may place a local variable in an %rsp-relative memory operand.
but if your assembly changes %rsp, the operand points to the wrong location.

An "r" constraint fixes that.

Thanks to Ingo Molnar for neatly bisecting the problem.

Signed-off-by: Avi Kivity <avi@qumranet.com>

Index: linux-2.6/drivers/kvm/vmx.c
===================================================================
--- linux-2.6.orig/drivers/kvm/vmx.c
+++ linux-2.6/drivers/kvm/vmx.c
@@ -1825,7 +1825,7 @@ again:
 #endif
 		"setbe %0 \n\t"
 		"popf \n\t"
-	      : "=g" (fail)
+	      : "=r" (fail)
 	      : "r"(vcpu->launched), "d"((unsigned long)HOST_RSP),
 		"c"(vcpu),
 		[rax]"i"(offsetof(struct kvm_vcpu, regs[VCPU_REGS_RAX])),

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 5/5] KVM: Fix bogus pagefault on writable pages
  2007-01-11 10:02 [PATCH 0/5] KVM updates for 2.6.20 Avi Kivity
                   ` (3 preceding siblings ...)
  2007-01-11 10:06 ` [PATCH 4/5] KVM: Fix asm constraints with CONFIG_FRAME_POINTER=n Avi Kivity
@ 2007-01-11 10:07 ` Avi Kivity
  4 siblings, 0 replies; 7+ messages in thread
From: Avi Kivity @ 2007-01-11 10:07 UTC (permalink / raw)
  To: kvm-devel; +Cc: linux-kernel, akpm, mingo

If a page is marked as dirty in the guest pte, set_pte_common() can set the
writable bit on newly-instantiated shadow pte.  This optimization avoids
a write fault after the initial read fault.

However, if a write fault instantiates the pte, fix_write_pf() incorrectly
reports the fault as a guest page fault, and the guest oopses on what appears
to be a correctly-mapped page.

Fix is to detect the condition and only report a guest page fault on a user
access to a kernel page.

With the fix, a kvm guest can survive a whole night of running the kernel
hacker's screensaver (make -j9 in a loop).

Signed-off-by: Avi Kivity <avi@qumranet.com>

Index: linux-2.6/drivers/kvm/paging_tmpl.h
===================================================================
--- linux-2.6.orig/drivers/kvm/paging_tmpl.h
+++ linux-2.6/drivers/kvm/paging_tmpl.h
@@ -274,7 +274,7 @@ static int FNAME(fix_write_pf)(struct kv
 	struct kvm_mmu_page *page;
 
 	if (is_writeble_pte(*shadow_ent))
-		return 0;
+		return !user || (*shadow_ent & PT_USER_MASK);
 
 	writable_shadow = *shadow_ent & PT_SHADOW_WRITABLE_MASK;
 	if (user) {

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 4/5] KVM: Fix asm constraints with CONFIG_FRAME_POINTER=n
  2007-01-11 10:06 ` [PATCH 4/5] KVM: Fix asm constraints with CONFIG_FRAME_POINTER=n Avi Kivity
@ 2007-01-23  3:10   ` Herbert Xu
  0 siblings, 0 replies; 7+ messages in thread
From: Herbert Xu @ 2007-01-23  3:10 UTC (permalink / raw)
  To: Avi Kivity, torvalds; +Cc: kvm-devel, linux-kernel, akpm, mingo

Avi Kivity <avi@qumranet.com> wrote:
> A "g" constraint may place a local variable in an %rsp-relative memory operand.
> but if your assembly changes %rsp, the operand points to the wrong location.
> 
> An "r" constraint fixes that.
> 
> Thanks to Ingo Molnar for neatly bisecting the problem.
> 
> Signed-off-by: Avi Kivity <avi@qumranet.com>
> 
> Index: linux-2.6/drivers/kvm/vmx.c
> ===================================================================
> --- linux-2.6.orig/drivers/kvm/vmx.c
> +++ linux-2.6/drivers/kvm/vmx.c
> @@ -1825,7 +1825,7 @@ again:
> #endif
>                "setbe %0 \n\t"
>                "popf \n\t"
> -             : "=g" (fail)
> +             : "=r" (fail)
>              : "r"(vcpu->launched), "d"((unsigned long)HOST_RSP),
>                "c"(vcpu),
>                [rax]"i"(offsetof(struct kvm_vcpu, regs[VCPU_REGS_RAX])),

We need the following fix for 2.6.20.

[KVM] vmx: Fix register constraint in launch code

Both "=r" and "=g" breaks my build on i386:

$ make
  CC [M]  drivers/kvm/vmx.o
{standard input}: Assembler messages:
{standard input}:3318: Error: bad register name `%sil'
make[1]: *** [drivers/kvm/vmx.o] Error 1
make: *** [_module_drivers/kvm] Error 2

The reason is that setbe requires an 8-bit register but "=r" does not
constrain the target register to be one that has an 8-bit version on
i386.

According to

	http://gcc.gnu.org/bugzilla/show_bug.cgi?id=10153

the correct constraint is "=q".

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
diff --git a/drivers/kvm/vmx.c b/drivers/kvm/vmx.c
index ce219e3..0aa2659 100644
--- a/drivers/kvm/vmx.c
+++ b/drivers/kvm/vmx.c
@@ -1824,7 +1824,7 @@ again:
 #endif
 		"setbe %0 \n\t"
 		"popf \n\t"
-	      : "=g" (fail)
+	      : "=q" (fail)
 	      : "r"(vcpu->launched), "d"((unsigned long)HOST_RSP),
 		"c"(vcpu),
 		[rax]"i"(offsetof(struct kvm_vcpu, regs[VCPU_REGS_RAX])),

^ permalink raw reply related	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2007-01-23  3:11 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-01-11 10:02 [PATCH 0/5] KVM updates for 2.6.20 Avi Kivity
2007-01-11 10:03 ` [PATCH 1/5] KVM: Make sure there is a vcpu context loaded when destroying the mmu Avi Kivity
2007-01-11 10:04 ` [PATCH 2/5] KVM: Fix race between mmio reads and injected interrupts Avi Kivity
2007-01-11 10:05 ` [PATCH 3/5] KVM: x86 emulator: fix bit string instructions Avi Kivity
2007-01-11 10:06 ` [PATCH 4/5] KVM: Fix asm constraints with CONFIG_FRAME_POINTER=n Avi Kivity
2007-01-23  3:10   ` Herbert Xu
2007-01-11 10:07 ` [PATCH 5/5] KVM: Fix bogus pagefault on writable pages Avi Kivity

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).