* [PATCH 1/2] KVM: PPC: e500mc: Revert "add load inst fixup"
@ 2013-06-28  9:20 Mihai Caraman
  2013-06-28  9:20 ` [PATCH 2/2] KVM: PPC: Book3E: Get vcpu's last instruction for emulation Mihai Caraman
From: Mihai Caraman @ 2013-06-28  9:20 UTC (permalink / raw)
  To: kvm-ppc; +Cc: Mihai Caraman, linuxppc-dev, kvm

lwepx faults need to be handled by KVM. With the current solution,
the host kernel searches for the faulting address using its own LPID context.
If a host translation is found, we return to the lwepx instruction instead of
the fixup handler, ending up in an infinite loop.

Revert commit 1d628af7 "add load inst fixup". We will address the lwepx
issue in a subsequent patch without the need for fixups.
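
For context, the fixup being reverted relies on the host exception-table
search; the following minimal user-space model (hypothetical names and
addresses, not kernel code) shows why the fixup is skipped whenever the
host can satisfy the fault itself:

#include <stdio.h>

struct ex_entry { unsigned long insn, fixup; };

/* One fixup registered for the lwepx at "address" 0x100. */
static const struct ex_entry ex_table[] = { { 0x100, 0x200 } };

static unsigned long handle_fault(unsigned long pc, int host_has_mapping)
{
	/* The host miss handler fills the TLB from its own LPID context
	 * and retries the instruction... */
	if (host_has_mapping)
		return pc;	/* lwepx re-executes, faults again: loop */

	/* ...and consults the exception table only otherwise. */
	for (unsigned int i = 0; i < sizeof(ex_table) / sizeof(ex_table[0]); i++)
		if (ex_table[i].insn == pc)
			return ex_table[i].fixup;
	return 0;
}

int main(void)
{
	/* The guest EA is also mapped in the host: fixup 0x200 is never reached. */
	printf("resume at %#lx\n", handle_fault(0x100, 1));
	return 0;
}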

Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
---
Resending this patch for Alex G.; he was unsubscribed from the kvm-ppc
mailing list for a while.

 arch/powerpc/kvm/bookehv_interrupts.S |   26 +-------------------------
 1 files changed, 1 insertions(+), 25 deletions(-)

diff --git a/arch/powerpc/kvm/bookehv_interrupts.S b/arch/powerpc/kvm/bookehv_interrupts.S
index e8ed7d6..20c7a54 100644
--- a/arch/powerpc/kvm/bookehv_interrupts.S
+++ b/arch/powerpc/kvm/bookehv_interrupts.S
@@ -29,7 +29,6 @@
 #include <asm/asm-compat.h>
 #include <asm/asm-offsets.h>
 #include <asm/bitsperlong.h>
-#include <asm/thread_info.h>
 
 #ifdef CONFIG_64BIT
 #include <asm/exception-64e.h>
@@ -162,32 +161,9 @@
 	PPC_STL	r30, VCPU_GPR(R30)(r4)
 	PPC_STL	r31, VCPU_GPR(R31)(r4)
 	mtspr	SPRN_EPLC, r8
-
-	/* disable preemption, so we are sure we hit the fixup handler */
-	CURRENT_THREAD_INFO(r8, r1)
-	li	r7, 1
-	stw	r7, TI_PREEMPT(r8)
-
 	isync
-
-	/*
-	 * In case the read goes wrong, we catch it and write an invalid value
-	 * in LAST_INST instead.
-	 */
-1:	lwepx	r9, 0, r5
-2:
-.section .fixup, "ax"
-3:	li	r9, KVM_INST_FETCH_FAILED
-	b	2b
-.previous
-.section __ex_table,"a"
-	PPC_LONG_ALIGN
-	PPC_LONG 1b,3b
-.previous
-
+	lwepx   r9, 0, r5
 	mtspr	SPRN_EPLC, r3
-	li	r7, 0
-	stw	r7, TI_PREEMPT(r8)
 	stw	r9, VCPU_LAST_INST(r4)
 	.endif
 
-- 
1.7.4.1


* [PATCH 2/2] KVM: PPC: Book3E: Get vcpu's last instruction for emulation
  2013-06-28  9:20 [PATCH 1/2] KVM: PPC: e500mc: Revert "add load inst fixup" Mihai Caraman
@ 2013-06-28  9:20 ` Mihai Caraman
  2013-07-08 13:39   ` Alexander Graf
  2013-07-09 21:45   ` Alexander Graf
From: Mihai Caraman @ 2013-06-28  9:20 UTC (permalink / raw)
  To: kvm-ppc; +Cc: Mihai Caraman, linuxppc-dev, kvm

lwepx faults need to be handled by KVM, and this implies additional code
in the DO_KVM macro to identify exceptions originating from host context.
This requires checking the Exception Syndrome Register (ESR[EPID]) and the
External PID Load Context Register (EPLC[EGS]) for DTB_MISS, DSI and LRAT
exceptions, which is too intrusive for the host.

Get rid of lwepx and acquire the last instruction in kvmppc_handle_exit()
by searching for the physical address and kmapping it. This fixes an
infinite loop caused by lwepx's data TLB miss being handled in the host,
and addresses the TODO for TLB eviction and execute-but-not-read entries.
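
For reference, the real-address computation performed below reduces to
plain mask arithmetic; a standalone illustration with made-up register
values (the MAS contents here are examples only, not from real hardware):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	/* 4 KiB page (psize_shift = 12), guest EA 0x10001234, and a
	 * TLB entry whose MAS7||MAS3 real page number is 0x27f456000. */
	unsigned int psize_shift = 12;
	uint64_t mas7_mas3 = 0x27f456000ULL;
	uint64_t geaddr = 0x10001234ULL;

	uint64_t addr = (mas7_mas3 & (~0ULL << psize_shift)) |
			(geaddr & ((1ULL << psize_shift) - 1ULL));

	/* Prints phys = 0x27f456234: page frame from the TLB entry,
	 * offset from the guest effective address. */
	printf("phys = %#llx\n", (unsigned long long)addr);
	return 0;
}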

Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
---
Resending this patch for Alex G.; he was unsubscribed from the kvm-ppc
mailing list for a while.

 arch/powerpc/include/asm/mmu-book3e.h |    6 ++-
 arch/powerpc/kvm/booke.c              |    6 +++
 arch/powerpc/kvm/booke.h              |    2 +
 arch/powerpc/kvm/bookehv_interrupts.S |   32 ++-------------
 arch/powerpc/kvm/e500.c               |    4 ++
 arch/powerpc/kvm/e500mc.c             |   69 +++++++++++++++++++++++++++++++++
 6 files changed, 91 insertions(+), 28 deletions(-)

diff --git a/arch/powerpc/include/asm/mmu-book3e.h b/arch/powerpc/include/asm/mmu-book3e.h
index 99d43e0..32e470e 100644
--- a/arch/powerpc/include/asm/mmu-book3e.h
+++ b/arch/powerpc/include/asm/mmu-book3e.h
@@ -40,7 +40,10 @@
 
 /* MAS registers bit definitions */
 
-#define MAS0_TLBSEL(x)		(((x) << 28) & 0x30000000)
+#define MAS0_TLBSEL_MASK	0x30000000
+#define MAS0_TLBSEL_SHIFT	28
+#define MAS0_TLBSEL(x)		(((x) << MAS0_TLBSEL_SHIFT) & MAS0_TLBSEL_MASK)
+#define MAS0_GET_TLBSEL(mas0)	(((mas0) & MAS0_TLBSEL_MASK) >> MAS0_TLBSEL_SHIFT)
 #define MAS0_ESEL_MASK		0x0FFF0000
 #define MAS0_ESEL_SHIFT		16
 #define MAS0_ESEL(x)		(((x) << MAS0_ESEL_SHIFT) & MAS0_ESEL_MASK)
@@ -58,6 +61,7 @@
 #define MAS1_TSIZE_MASK		0x00000f80
 #define MAS1_TSIZE_SHIFT	7
 #define MAS1_TSIZE(x)		(((x) << MAS1_TSIZE_SHIFT) & MAS1_TSIZE_MASK)
+#define MAS1_GET_TSIZE(mas1)	(((mas1) & MAS1_TSIZE_MASK) >> MAS1_TSIZE_SHIFT)
 
 #define MAS2_EPN		(~0xFFFUL)
 #define MAS2_X0			0x00000040
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 1020119..6764a8e 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -836,6 +836,12 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	/* update before a new last_exit_type is rewritten */
 	kvmppc_update_timing_stats(vcpu);
 
+	/*
+	 * The exception type can change at this point, such as if the TLB entry
+	 * for the emulated instruction has been evicted.
+	 */
+	kvmppc_prepare_for_emulation(vcpu, &exit_nr);
+
 	/* restart interrupts if they were meant for the host */
 	kvmppc_restart_interrupt(vcpu, exit_nr);
 
diff --git a/arch/powerpc/kvm/booke.h b/arch/powerpc/kvm/booke.h
index 5fd1ba6..a0d0fea 100644
--- a/arch/powerpc/kvm/booke.h
+++ b/arch/powerpc/kvm/booke.h
@@ -90,6 +90,8 @@ void kvmppc_vcpu_disable_spe(struct kvm_vcpu *vcpu);
 void kvmppc_booke_vcpu_load(struct kvm_vcpu *vcpu, int cpu);
 void kvmppc_booke_vcpu_put(struct kvm_vcpu *vcpu);
 
+void kvmppc_prepare_for_emulation(struct kvm_vcpu *vcpu, unsigned int *exit_nr);
+
 enum int_class {
 	INT_CLASS_NONCRIT,
 	INT_CLASS_CRIT,
diff --git a/arch/powerpc/kvm/bookehv_interrupts.S b/arch/powerpc/kvm/bookehv_interrupts.S
index 20c7a54..0538ab9 100644
--- a/arch/powerpc/kvm/bookehv_interrupts.S
+++ b/arch/powerpc/kvm/bookehv_interrupts.S
@@ -120,37 +120,20 @@
 
 	.if	\flags & NEED_EMU
 	/*
-	 * This assumes you have external PID support.
-	 * To support a bookehv CPU without external PID, you'll
-	 * need to look up the TLB entry and create a temporary mapping.
-	 *
-	 * FIXME: we don't currently handle if the lwepx faults.  PR-mode
-	 * booke doesn't handle it either.  Since Linux doesn't use
-	 * broadcast tlbivax anymore, the only way this should happen is
-	 * if the guest maps its memory execute-but-not-read, or if we
-	 * somehow take a TLB miss in the middle of this entry code and
-	 * evict the relevant entry.  On e500mc, all kernel lowmem is
-	 * bolted into TLB1 large page mappings, and we don't use
-	 * broadcast invalidates, so we should not take a TLB miss here.
-	 *
-	 * Later we'll need to deal with faults here.  Disallowing guest
-	 * mappings that are execute-but-not-read could be an option on
-	 * e500mc, but not on chips with an LRAT if it is used.
+	 * We don't use external PID support. lwepx faults would need to be
+	 * handled by KVM and this implies additional code in DO_KVM (for
+	 * DTB_MISS, DSI and LRAT) to check ESR[EPID] and EPLC[EGS], which
+	 * is too intrusive for the host. Get the last instruction in
+	 * kvmppc_handle_exit().
 	 */
-
-	mfspr	r3, SPRN_EPLC	/* will already have correct ELPID and EGS */
 	PPC_STL	r15, VCPU_GPR(R15)(r4)
 	PPC_STL	r16, VCPU_GPR(R16)(r4)
 	PPC_STL	r17, VCPU_GPR(R17)(r4)
 	PPC_STL	r18, VCPU_GPR(R18)(r4)
 	PPC_STL	r19, VCPU_GPR(R19)(r4)
-	mr	r8, r3
 	PPC_STL	r20, VCPU_GPR(R20)(r4)
-	rlwimi	r8, r6, EPC_EAS_SHIFT - MSR_IR_LG, EPC_EAS
 	PPC_STL	r21, VCPU_GPR(R21)(r4)
-	rlwimi	r8, r6, EPC_EPR_SHIFT - MSR_PR_LG, EPC_EPR
 	PPC_STL	r22, VCPU_GPR(R22)(r4)
-	rlwimi	r8, r10, EPC_EPID_SHIFT, EPC_EPID
 	PPC_STL	r23, VCPU_GPR(R23)(r4)
 	PPC_STL	r24, VCPU_GPR(R24)(r4)
 	PPC_STL	r25, VCPU_GPR(R25)(r4)
@@ -160,11 +143,6 @@
 	PPC_STL	r29, VCPU_GPR(R29)(r4)
 	PPC_STL	r30, VCPU_GPR(R30)(r4)
 	PPC_STL	r31, VCPU_GPR(R31)(r4)
-	mtspr	SPRN_EPLC, r8
-	isync
-	lwepx   r9, 0, r5
-	mtspr	SPRN_EPLC, r3
-	stw	r9, VCPU_LAST_INST(r4)
 	.endif
 
 	.if	\flags & NEED_ESR
diff --git a/arch/powerpc/kvm/e500.c b/arch/powerpc/kvm/e500.c
index ce6b73c..c82a89f 100644
--- a/arch/powerpc/kvm/e500.c
+++ b/arch/powerpc/kvm/e500.c
@@ -439,6 +439,10 @@ int kvmppc_set_one_reg(struct kvm_vcpu *vcpu, u64 id,
 	return r;
 }
 
+void kvmppc_prepare_for_emulation(struct kvm_vcpu *vcpu, unsigned int *exit_nr)
+{
+}
+
 struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm *kvm, unsigned int id)
 {
 	struct kvmppc_vcpu_e500 *vcpu_e500;
diff --git a/arch/powerpc/kvm/e500mc.c b/arch/powerpc/kvm/e500mc.c
index c3bdc0a..3641df7 100644
--- a/arch/powerpc/kvm/e500mc.c
+++ b/arch/powerpc/kvm/e500mc.c
@@ -271,6 +271,75 @@ int kvmppc_set_one_reg(struct kvm_vcpu *vcpu, u64 id,
 	return r;
 }
 
+void kvmppc_prepare_for_emulation(struct kvm_vcpu *vcpu, unsigned int *exit_nr)
+{
+	gva_t geaddr;
+	hpa_t addr;
+	u64 mas7_mas3;
+	hva_t eaddr;
+	u32 mas1, mas3;
+	struct page *page;
+	unsigned int addr_space, psize_shift;
+	bool pr;
+
+	if ((*exit_nr != BOOKE_INTERRUPT_DATA_STORAGE) &&
+	    (*exit_nr != BOOKE_INTERRUPT_DTLB_MISS) &&
+	    (*exit_nr != BOOKE_INTERRUPT_HV_PRIV))
+		return;
+
+	/* Search guest translation to find the real address */
+	geaddr = vcpu->arch.pc;
+	addr_space = (vcpu->arch.shared->msr & MSR_IS) >> MSR_IR_LG;
+	mtspr(SPRN_MAS6, (vcpu->arch.pid << MAS6_SPID_SHIFT) | addr_space);
+	mtspr(SPRN_MAS5, MAS5_SGS | vcpu->kvm->arch.lpid);
+	isync();
+	asm volatile("tlbsx 0, %[geaddr]\n" : : [geaddr] "r" (geaddr));
+	mtspr(SPRN_MAS5, 0);
+	mtspr(SPRN_MAS8, 0);
+
+	mas1 = mfspr(SPRN_MAS1);
+	if (!(mas1 & MAS1_VALID)) {
+		/*
+		 * There is no translation for the emulated instruction.
+		 * Simulate an instruction TLB miss. This should force the host
+		 * or ultimately the guest to add the translation and then
+		 * reexecute the instruction.
+		 */
+		*exit_nr = BOOKE_INTERRUPT_ITLB_MISS;
+		return;
+	}
+
+	mas3 = mfspr(SPRN_MAS3);
+	pr = vcpu->arch.shared->msr & MSR_PR;
+	if ((pr && (!(mas3 & MAS3_UX))) || ((!pr) && (!(mas3 & MAS3_SX)))) {
+		/*
+		 * Another thread may rewrite the TLB entry in parallel; don't
+		 * execute from the address if the execute permission is not set.
+		 */
+		vcpu->arch.fault_esr = 0;
+		*exit_nr = BOOKE_INTERRUPT_INST_STORAGE;
+		return;
+	}
+
+	/* Get page size */
+	if (MAS0_GET_TLBSEL(mfspr(SPRN_MAS0)) == 0)
+		psize_shift = PAGE_SHIFT;
+	else
+		psize_shift = MAS1_GET_TSIZE(mas1) + 10;
+
+	mas7_mas3 = (((u64) mfspr(SPRN_MAS7)) << 32) |
+		    mfspr(SPRN_MAS3);
+	addr = (mas7_mas3 & (~0ULL << psize_shift)) |
+	       (geaddr & ((1ULL << psize_shift) - 1ULL));
+
+	/* Map a page and get guest's instruction */
+	page = pfn_to_page(addr >> PAGE_SHIFT);
+	eaddr = (unsigned long)kmap_atomic(page);
+	eaddr |= addr & ~PAGE_MASK;
+	vcpu->arch.last_inst = *(u32 *)eaddr;
+	kunmap_atomic((u32 *)eaddr);
+}
+
 struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm *kvm, unsigned int id)
 {
 	struct kvmppc_vcpu_e500 *vcpu_e500;
-- 
1.7.4.1


* Re: [PATCH 2/2] KVM: PPC: Book3E: Get vcpu's last instruction for emulation
  2013-06-28  9:20 ` [PATCH 2/2] KVM: PPC: Book3E: Get vcpu's last instruction for emulation Mihai Caraman
@ 2013-07-08 13:39   ` Alexander Graf
  2013-07-09 17:13     ` Scott Wood
  2013-07-09 21:45   ` Alexander Graf
From: Alexander Graf @ 2013-07-08 13:39 UTC (permalink / raw)
  To: Mihai Caraman; +Cc: linuxppc-dev, kvm, kvm-ppc


On 28.06.2013, at 11:20, Mihai Caraman wrote:

> lwepx faults need to be handled by KVM, and this implies additional code
> in the DO_KVM macro to identify exceptions originating from host context.
> This requires checking the Exception Syndrome Register (ESR[EPID]) and the
> External PID Load Context Register (EPLC[EGS]) for DTB_MISS, DSI and LRAT
> exceptions, which is too intrusive for the host.
>
> Get rid of lwepx and acquire the last instruction in kvmppc_handle_exit()
> by searching for the physical address and kmapping it. This fixes an
> infinite loop

What's the difference in speed for this?

Also, could we call lwepx later in host code, when
kvmppc_get_last_inst() gets invoked?

> caused by lwepx's data TLB miss being handled in the host, and addresses
> the TODO for TLB eviction and execute-but-not-read entries.
>
> Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>

[...]

> diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
> index 1020119..6764a8e 100644
> --- a/arch/powerpc/kvm/booke.c
> +++ b/arch/powerpc/kvm/booke.c
> @@ -836,6 +836,12 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
> 	/* update before a new last_exit_type is rewritten */
> 	kvmppc_update_timing_stats(vcpu);
>
> +	/*
> +	 * The exception type can change at this point, such as if the TLB entry
> +	 * for the emulated instruction has been evicted.
> +	 */
> +	kvmppc_prepare_for_emulation(vcpu, &exit_nr);

Please model this the same way as book3s. Check out
kvmppc_get_last_inst() as a starting point.

> +
> 	/* restart interrupts if they were meant for the host */
> 	kvmppc_restart_interrupt(vcpu, exit_nr);

[...]

> +void kvmppc_prepare_for_emulation(struct kvm_vcpu *vcpu, unsigned int *exit_nr)

This really should be a generic "read data from guest address space"
operation for the guest mmu handling code. Please implement that in
e500_mmu.c.

> +{
> +	gva_t geaddr;
> +	hpa_t addr;
> +	u64 mas7_mas3;
> +	hva_t eaddr;
> +	u32 mas1, mas3;
> +	struct page *page;
> +	unsigned int addr_space, psize_shift;
> +	bool pr;
> +
> +	if ((*exit_nr != BOOKE_INTERRUPT_DATA_STORAGE) &&
> +	    (*exit_nr != BOOKE_INTERRUPT_DTLB_MISS) &&
> +	    (*exit_nr != BOOKE_INTERRUPT_HV_PRIV))
> +		return;
> +
> +	/* Search guest translation to find the real address */
> +	geaddr = vcpu->arch.pc;
> +	addr_space = (vcpu->arch.shared->msr & MSR_IS) >> MSR_IR_LG;
> +	mtspr(SPRN_MAS6, (vcpu->arch.pid << MAS6_SPID_SHIFT) | addr_space);
> +	mtspr(SPRN_MAS5, MAS5_SGS | vcpu->kvm->arch.lpid);
> +	isync();
> +	asm volatile("tlbsx 0, %[geaddr]\n" : : [geaddr] "r" (geaddr));
> +	mtspr(SPRN_MAS5, 0);
> +	mtspr(SPRN_MAS8, 0);

Doesn't this break when a secondary thread eliminated that TLB entry by
accident?

> +
> +	mas1 = mfspr(SPRN_MAS1);
> +	if (!(mas1 & MAS1_VALID)) {
> +		/*
> +		 * There is no translation for the emulated instruction.
> +		 * Simulate an instruction TLB miss. This should force the host
> +		 * or ultimately the guest to add the translation and then
> +		 * reexecute the instruction.
> +		 */
> +		*exit_nr = BOOKE_INTERRUPT_ITLB_MISS;

Ah, so that's how you handle that case. Please don't swizzle around exit
information like that. It makes the code hard to follow.

> +		return;
> +	}
> +
> +	mas3 = mfspr(SPRN_MAS3);
> +	pr = vcpu->arch.shared->msr & MSR_PR;
> +	if ((pr && (!(mas3 & MAS3_UX))) || ((!pr) && (!(mas3 & MAS3_SX)))) {
> +		/*
> +		 * Another thread may rewrite the TLB entry in parallel; don't
> +		 * execute from the address if the execute permission is not set.

Isn't this racy?

> +		 */
> +		vcpu->arch.fault_esr = 0;
> +		*exit_nr = BOOKE_INTERRUPT_INST_STORAGE;
> +		return;
> +	}
> +
> +	/* Get page size */
> +	if (MAS0_GET_TLBSEL(mfspr(SPRN_MAS0)) == 0)
> +		psize_shift = PAGE_SHIFT;
> +	else
> +		psize_shift = MAS1_GET_TSIZE(mas1) + 10;
> +
> +	mas7_mas3 = (((u64) mfspr(SPRN_MAS7)) << 32) |
> +		    mfspr(SPRN_MAS3);

You're non-atomically reading MAS3/MAS7 after you've checked for
permissions on MAS3. I'm surprised there's no handler that allows MAS3/7
access through the new, combined SPR for 64bit systems.

> +	addr = (mas7_mas3 & (~0ULL << psize_shift)) |
> +	       (geaddr & ((1ULL << psize_shift) - 1ULL));
> +
> +	/* Map a page and get guest's instruction */
> +	page = pfn_to_page(addr >> PAGE_SHIFT);

So it seems to me like you're jumping through a lot of hoops to make sure
this works for LRAT and non-LRAT at the same time. Can't we just treat
them as the different things they are?

What if we have different MMU backends for LRAT and non-LRAT? The
non-LRAT case could then try lwepx, if that fails, fall back to read the
shadow TLB. For the LRAT case, we'd do lwepx, if that fails fall back to
this logic.
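
In code, the proposed split might look like this sketch (every helper
name here is hypothetical; only the tlbsx-based slow path exists in the
patch as posted):

static bool try_lwepx(gva_t geaddr, u32 *inst);		/* fast path */
static bool mmu_has_lrat(void);
static u32 fetch_via_tlbsx(struct kvm_vcpu *vcpu, gva_t geaddr);
static u32 fetch_via_shadow_tlb(struct kvm_vcpu *vcpu, gva_t geaddr);

static u32 fetch_guest_inst(struct kvm_vcpu *vcpu, gva_t geaddr)
{
	u32 inst;

	if (try_lwepx(geaddr, &inst))
		return inst;

	if (mmu_has_lrat())
		/* LRAT backend: the tlbsx + kmap logic from this patch */
		return fetch_via_tlbsx(vcpu, geaddr);

	/* non-LRAT backend: walk KVM's shadow copy of the guest TLB */
	return fetch_via_shadow_tlb(vcpu, geaddr);
}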


Alex

> +	eaddr = (unsigned long)kmap_atomic(page);
> +	eaddr |= addr & ~PAGE_MASK;
> +	vcpu->arch.last_inst = *(u32 *)eaddr;
> +	kunmap_atomic((u32 *)eaddr);
> +}


* Re: [PATCH 2/2] KVM: PPC: Book3E: Get vcpu's last instruction for emulation
  2013-07-08 13:39   ` Alexander Graf
@ 2013-07-09 17:13     ` Scott Wood
  2013-07-09 17:44       ` Alexander Graf
From: Scott Wood @ 2013-07-09 17:13 UTC (permalink / raw)
  To: Alexander Graf; +Cc: Mihai Caraman, linuxppc-dev, kvm, kvm-ppc

On 07/08/2013 08:39:05 AM, Alexander Graf wrote:
>
> On 28.06.2013, at 11:20, Mihai Caraman wrote:
>
> > lwepx faults need to be handled by KVM and this implies additional
> > code in the DO_KVM macro to identify exceptions originating from
> > host context. This requires checking the Exception Syndrome Register
> > (ESR[EPID]) and External PID Load Context Register (EPLC[EGS]) for
> > DTB_MISS, DSI and LRAT exceptions, which is too intrusive for the host.
> >
> > Get rid of lwepx and acquire the last instruction in
> > kvmppc_handle_exit() by searching for the physical address and
> > kmapping it. This fixes an infinite loop
>
> What's the difference in speed for this?
>
> Also, could we call lwepx later in host code, when
> kvmppc_get_last_inst() gets invoked?

Any use of lwepx is problematic unless we want to add overhead to the
main Linux TLB miss handler.

> > +		return;
> > +	}
> > +
> > +	mas3 = mfspr(SPRN_MAS3);
> > +	pr = vcpu->arch.shared->msr & MSR_PR;
> > +	if ((pr && (!(mas3 & MAS3_UX))) || ((!pr) && (!(mas3 & MAS3_SX)))) {
> > +		/*
> > +		 * Another thread may rewrite the TLB entry in parallel; don't
> > +		 * execute from the address if the execute permission is not set.
>
> Isn't this racy?

Yes, that's the point.  We want to access permissions atomically with
the address.  If the guest races here, the unpredictable behavior is
its own fault, but we don't want to make it worse by assuming that the
new TLB entry is executable just because the old TLB entry was.

There's still a potential problem if the instruction at the new TLB
entry is valid but not something that KVM emulates (because it wouldn't
have trapped).  Given that the guest is already engaging in
unpredictable behavior, though, and that it's no longer a security
issue (it'll just cause the guest to exit), I don't think we need to
worry too much about it.

> > +		 */
> > +		vcpu->arch.fault_esr = 0;
> > +		*exit_nr = BOOKE_INTERRUPT_INST_STORAGE;
> > +		return;
> > +	}
> > +
> > +	/* Get page size */
> > +	if (MAS0_GET_TLBSEL(mfspr(SPRN_MAS0)) == 0)
> > +		psize_shift = PAGE_SHIFT;
> > +	else
> > +		psize_shift = MAS1_GET_TSIZE(mas1) + 10;
> > +
> > +	mas7_mas3 = (((u64) mfspr(SPRN_MAS7)) << 32) |
> > +		    mfspr(SPRN_MAS3);
>
> You're non-atomically reading MAS3/MAS7 after you've checked for
> permissions on MAS3. I'm surprised there's no handler that allows
> MAS3/7 access through the new, combined SPR for 64bit systems.

There is, but then we'd need to special-case 64-bit systems.  Why does
atomicity matter here?  The MAS registers were filled in when we did
the tlbsx.  They are thread-local.  They don't magically change just
because the other thread rewrites the TLB entry that was used to fill
them.

> > +	addr = (mas7_mas3 & (~0ULL << psize_shift)) |
> > +	       (geaddr & ((1ULL << psize_shift) - 1ULL));
> > +
> > +	/* Map a page and get guest's instruction */
> > +	page = pfn_to_page(addr >> PAGE_SHIFT);
>
> So it seems to me like you're jumping through a lot of hoops to make
> sure this works for LRAT and non-LRAT at the same time. Can't we just
> treat them as the different things they are?
>
> What if we have different MMU backends for LRAT and non-LRAT? The
> non-LRAT case could then try lwepx, if that fails, fall back to read
> the shadow TLB. For the LRAT case, we'd do lwepx, if that fails fall
> back to this logic.

This isn't about LRAT; it's about hardware threads.  It also fixes the
handling of execute-only pages on current chips.

-Scott


* Re: [PATCH 2/2] KVM: PPC: Book3E: Get vcpu's last instruction for emulation
  2013-07-09 17:13     ` Scott Wood
@ 2013-07-09 17:44       ` Alexander Graf
  2013-07-09 18:46         ` Scott Wood
From: Alexander Graf @ 2013-07-09 17:44 UTC (permalink / raw)
  To: Scott Wood; +Cc: Mihai Caraman, linuxppc-dev, kvm, kvm-ppc

On 07/09/2013 07:13 PM, Scott Wood wrote:
> On 07/08/2013 08:39:05 AM, Alexander Graf wrote:
>>
>> On 28.06.2013, at 11:20, Mihai Caraman wrote:
>>
>> > lwepx faults need to be handled by KVM and this implies additional
>> > code in the DO_KVM macro to identify exceptions originating from
>> > host context. This requires checking the Exception Syndrome Register
>> > (ESR[EPID]) and External PID Load Context Register (EPLC[EGS]) for
>> > DTB_MISS, DSI and LRAT exceptions, which is too intrusive for the host.
>> >
>> > Get rid of lwepx and acquire the last instruction in
>> > kvmppc_handle_exit() by searching for the physical address and
>> > kmapping it. This fixes an infinite loop
>>
>> What's the difference in speed for this?
>>
>> Also, could we call lwepx later in host code, when 
>> kvmppc_get_last_inst() gets invoked?
>
> Any use of lwepx is problematic unless we want to add overhead to the 
> main Linux TLB miss handler.

What exactly would be missing?

I'd also still like to see some performance benchmarks on this to make 
sure we're not walking into a bad direction.

>
>> > +        return;
>> > +    }
>> > +
>> > +    mas3 = mfspr(SPRN_MAS3);
>> > +    pr = vcpu->arch.shared->msr & MSR_PR;
>> > +    if ((pr && (!(mas3 & MAS3_UX))) || ((!pr) && (!(mas3 & MAS3_SX)))) {
>> > +        /*
>> > +         * Another thread may rewrite the TLB entry in parallel; don't
>> > +         * execute from the address if the execute permission is not set.
>>
>> Isn't this racy?
>
> Yes, that's the point.  We want to access permissions atomically with 
> the address.  If the guest races here, the unpredictable behavior is 
> its own fault, but we don't want to make it worse by assuming that the 
> new TLB entry is executable just because the old TLB entry was.

I see.

>
> There's still a potential problem if the instruction at the new TLB 
> entry is valid but not something that KVM emulates (because it 
> wouldn't have trapped).  Given that the guest is already engaging in 
> unpredictable behavior, though, and that it's no longer a security 
> issue (it'll just cause the guest to exit), I don't think we need to 
> worry too much about it.

No, that case is fine. It's the same as book3s pr.

>
>> > +         */
>> > +        vcpu->arch.fault_esr = 0;
>> > +        *exit_nr = BOOKE_INTERRUPT_INST_STORAGE;
>> > +        return;
>> > +    }
>> > +
>> > +    /* Get page size */
>> > +    if (MAS0_GET_TLBSEL(mfspr(SPRN_MAS0)) == 0)
>> > +        psize_shift = PAGE_SHIFT;
>> > +    else
>> > +        psize_shift = MAS1_GET_TSIZE(mas1) + 10;
>> > +
>> > +    mas7_mas3 = (((u64) mfspr(SPRN_MAS7)) << 32) |
>> > +            mfspr(SPRN_MAS3);
>>
>> You're non-atomically reading MAS3/MAS7 after you've checked for 
>> permissions on MAS3. I'm surprised there's no handler that allows 
>> MAS3/7 access through the new, combined SPR for 64bit systems.
>
> There is, but then we'd need to special-case 64-bit systems.

Oh, what I was trying to say is that I'm surprised there's nothing in 
Linux already like

static inline u64 get_mas73(void) {
#ifdef CONFIG_PPC64
     return mfspr(SPRN_MAS73);
#else
     return ((u64)mfspr(SPRN_MAS7) << 32) | mfspr(SPRN_MAS3);
#endif
}

>   Why does atomicity matter here?  The MAS registers were filled in 
> when we did the tlbsx.  They are thread-local.  They don't magically 
> change just because the other thread rewrites the TLB entry that was 
> used to fill them.

Yeah, it doesn't matter.

>
>> > +    addr = (mas7_mas3 & (~0ULL << psize_shift)) |
>> > +           (geaddr & ((1ULL << psize_shift) - 1ULL));
>> > +
>> > +    /* Map a page and get guest's instruction */
>> > +    page = pfn_to_page(addr >> PAGE_SHIFT);
>>
>> So it seems to me like you're jumping through a lot of hoops to make 
>> sure this works for LRAT and non-LRAT at the same time. Can't we just 
>> treat them as the different things they are?
>>
>> What if we have different MMU backends for LRAT and non-LRAT? The 
>> non-LRAT case could then try lwepx, if that fails, fall back to read 
>> the shadow TLB. For the LRAT case, we'd do lwepx, if that fails fall 
>> back to this logic.
>
> This isn't about LRAT; it's about hardware threads.  It also fixes the 
> handling of execute-only pages on current chips.

On non-LRAT systems we could always check our shadow copy of the guest's 
TLB, no? I'd really like to know what the performance difference would 
be for the 2 approaches.


Alex


* Re: [PATCH 2/2] KVM: PPC: Book3E: Get vcpu's last instruction for emulation
  2013-07-09 17:44       ` Alexander Graf
@ 2013-07-09 18:46         ` Scott Wood
  2013-07-09 21:44           ` Alexander Graf
From: Scott Wood @ 2013-07-09 18:46 UTC (permalink / raw)
  To: Alexander Graf; +Cc: Mihai Caraman, linuxppc-dev, kvm, kvm-ppc

On 07/09/2013 12:44:32 PM, Alexander Graf wrote:
> On 07/09/2013 07:13 PM, Scott Wood wrote:
>> On 07/08/2013 08:39:05 AM, Alexander Graf wrote:
>>>
>>> On 28.06.2013, at 11:20, Mihai Caraman wrote:
>>>
>>> > lwepx faults need to be handled by KVM and this implies
>>> > additional code in the DO_KVM macro to identify exceptions
>>> > originating from host context. This requires checking the
>>> > Exception Syndrome Register (ESR[EPID]) and External PID Load
>>> > Context Register (EPLC[EGS]) for DTB_MISS, DSI and LRAT
>>> > exceptions, which is too intrusive for the host.
>>> >
>>> > Get rid of lwepx and acquire the last instruction in
>>> > kvmppc_handle_exit() by searching for the physical address and
>>> > kmapping it. This fixes an infinite loop
>>>
>>> What's the difference in speed for this?
>>>
>>> Also, could we call lwepx later in host code, when
>>> kvmppc_get_last_inst() gets invoked?
>>
>> Any use of lwepx is problematic unless we want to add overhead to
>> the main Linux TLB miss handler.
>
> What exactly would be missing?

If lwepx faults, it goes to the normal host TLB miss handler.  Without
adding code to it to recognize that it's an external-PID fault, it will
try to search the normal Linux page tables and insert a normal host
entry.  If it thinks it has succeeded, it will retry the instruction
rather than search for an exception handler.  The instruction will
fault again, and you get a hang.
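
The check that would have to be added can be modeled in a few lines of C
(the struct and helper below are hypothetical; the real test would live
in the DTLB-miss assembly and read ESR and EPLC directly):

#include <stdbool.h>

struct epid_fault {
	bool esr_epid;	/* ESR[EPID]: fault came from an external-PID access */
	bool eplc_egs;	/* EPLC[EGS]: the access ran in guest state */
};

/* True if the fault must be reflected to KVM's fixup handler rather
 * than filled from the host page tables. */
static bool is_kvm_epid_fault(const struct epid_fault *f)
{
	return f->esr_epid && f->eplc_egs;
}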

> I'd also still like to see some performance benchmarks on this to
> make sure we're not walking into a bad direction.

I doubt it'll be significantly different.  There's overhead involved in
setting up for lwepx as well.  It doesn't hurt to test, though this is
a functional correctness issue, so I'm not sure what better
alternatives we have.  I don't want to slow down non-KVM TLB misses for
this.

>>> > +    addr = (mas7_mas3 & (~0ULL << psize_shift)) |
>>> > +           (geaddr & ((1ULL << psize_shift) - 1ULL));
>>> > +
>>> > +    /* Map a page and get guest's instruction */
>>> > +    page = pfn_to_page(addr >> PAGE_SHIFT);
>>>
>>> So it seems to me like you're jumping through a lot of hoops to
>>> make sure this works for LRAT and non-LRAT at the same time. Can't
>>> we just treat them as the different things they are?
>>>
>>> What if we have different MMU backends for LRAT and non-LRAT? The
>>> non-LRAT case could then try lwepx, if that fails, fall back to
>>> read the shadow TLB. For the LRAT case, we'd do lwepx, if that
>>> fails fall back to this logic.
>>
>> This isn't about LRAT; it's about hardware threads.  It also fixes
>> the handling of execute-only pages on current chips.
>
> On non-LRAT systems we could always check our shadow copy of the
> guest's TLB, no? I'd really like to know what the performance
> difference would be for the 2 approaches.

I suspect that tlbsx is faster, or at worst similar.  And unlike
comparing tlbsx to lwepx (not counting a fix for the threading
problem), we don't already have code to search the guest TLB, so
testing would be more work.

-Scott


* Re: [PATCH 2/2] KVM: PPC: Book3E: Get vcpu's last instruction for emulation
  2013-07-09 18:46         ` Scott Wood
@ 2013-07-09 21:44           ` Alexander Graf
  2013-07-10  0:06             ` Scott Wood
From: Alexander Graf @ 2013-07-09 21:44 UTC (permalink / raw)
  To: Scott Wood; +Cc: Mihai Caraman, linuxppc-dev, kvm, kvm-ppc


On 09.07.2013, at 20:46, Scott Wood wrote:

> On 07/09/2013 12:44:32 PM, Alexander Graf wrote:
>> On 07/09/2013 07:13 PM, Scott Wood wrote:
>>> On 07/08/2013 08:39:05 AM, Alexander Graf wrote:
>>>> On 28.06.2013, at 11:20, Mihai Caraman wrote:
>>>> > lwepx faults need to be handled by KVM and this implies
>>>> > additional code in the DO_KVM macro to identify exceptions
>>>> > originating from host context. This requires checking the
>>>> > Exception Syndrome Register (ESR[EPID]) and External PID Load
>>>> > Context Register (EPLC[EGS]) for DTB_MISS, DSI and LRAT
>>>> > exceptions, which is too intrusive for the host.
>>>> >
>>>> > Get rid of lwepx and acquire the last instruction in
>>>> > kvmppc_handle_exit() by searching for the physical address and
>>>> > kmapping it. This fixes an infinite loop
>>>> What's the difference in speed for this?
>>>> Also, could we call lwepx later in host code, when
>>>> kvmppc_get_last_inst() gets invoked?
>>> Any use of lwepx is problematic unless we want to add overhead to
>>> the main Linux TLB miss handler.
>> What exactly would be missing?
>
> If lwepx faults, it goes to the normal host TLB miss handler.  Without
> adding code to it to recognize that it's an external-PID fault, it
> will try to search the normal Linux page tables and insert a normal
> host entry.  If it thinks it has succeeded, it will retry the
> instruction rather than search for an exception handler.  The
> instruction will fault again, and you get a hang.

:(

So we either have to rewrite IVOR / IVPR or add a branch in the hot TLB
miss interrupt handler. Both alternatives suck.

>
>> I'd also still like to see some performance benchmarks on this to
>> make sure we're not walking into a bad direction.
>
> I doubt it'll be significantly different.  There's overhead involved
> in setting up for lwepx as well.  It doesn't hurt to test, though this
> is a functional correctness issue, so I'm not sure what better
> alternatives we have.  I don't want to slow down non-KVM TLB misses
> for this.

Yeah, I concur on that part. It probably won't get better. Sigh.

>
>>>> > +    addr = (mas7_mas3 & (~0ULL << psize_shift)) |
>>>> > +           (geaddr & ((1ULL << psize_shift) - 1ULL));
>>>> > +
>>>> > +    /* Map a page and get guest's instruction */
>>>> > +    page = pfn_to_page(addr >> PAGE_SHIFT);
>>>> So it seems to me like you're jumping through a lot of hoops to
>>>> make sure this works for LRAT and non-LRAT at the same time. Can't
>>>> we just treat them as the different things they are?
>>>> What if we have different MMU backends for LRAT and non-LRAT? The
>>>> non-LRAT case could then try lwepx, if that fails, fall back to
>>>> read the shadow TLB. For the LRAT case, we'd do lwepx, if that
>>>> fails fall back to this logic.
>>> This isn't about LRAT; it's about hardware threads.  It also fixes
>>> the handling of execute-only pages on current chips.
>> On non-LRAT systems we could always check our shadow copy of the
>> guest's TLB, no? I'd really like to know what the performance
>> difference would be for the 2 approaches.

We have code to walk the guest TLB for TLB misses. This really is just
the TLB miss search without host TLB injection.

So let's say we're using the shadow TLB. The guest always has its say
64 TLB entries that it can count on - we never evict anything by
accident, because we store all of the 64 entries in our guest TLB
cache. When the guest faults at an address, the first thing we do is we
check the cache whether we have that page already mapped.

However, with this method we now have 2 enumeration methods for guest
TLB searches. We have the tlbsx one which searches the host TLB and we
have our guest TLB cache. The guest TLB cache might still contain an
entry for an address that we already invalidated on the host. Would
that impose a problem?

I guess not because we're swizzling the exit code around to instead be
an instruction miss which means we restore the TLB entry into our
host's TLB so that when we resume, we land here and the tlbsx hits. But
it feels backwards.

At least this code has to become something more generic, such as
kvmppc_read_guest(vcpu, addr, TYPE_INSN) and move into the host mmu
implementation, as it's 100% host mmu specific.
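
A plausible shape for that helper, following the naming suggested above
(a sketch only; this signature is not an existing KVM API):

enum xlate_kind {
	XLATE_INST,	/* instruction fetch: check UX/SX permissions */
	XLATE_DATA,	/* data access: check UR/SR permissions */
};

int kvmppc_read_guest(struct kvm_vcpu *vcpu, gva_t eaddr, void *buf,
		      int len, enum xlate_kind kind);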


Alex


* Re: [PATCH 2/2] KVM: PPC: Book3E: Get vcpu's last instruction for emulation
  2013-06-28  9:20 ` [PATCH 2/2] KVM: PPC: Book3E: Get vcpu's last instruction for emulation Mihai Caraman
  2013-07-08 13:39   ` Alexander Graf
@ 2013-07-09 21:45   ` Alexander Graf
  2013-07-10  0:12     ` Scott Wood
From: Alexander Graf @ 2013-07-09 21:45 UTC (permalink / raw)
  To: Mihai Caraman; +Cc: linuxppc-dev, kvm, kvm-ppc


On 28.06.2013, at 11:20, Mihai Caraman wrote:

> lwepx faults need to be handled by KVM and this implies additional code
> in the DO_KVM macro to identify exceptions originating from host context.
> This requires checking the Exception Syndrome Register (ESR[EPID]) and
> External PID Load Context Register (EPLC[EGS]) for DTB_MISS, DSI and LRAT
> exceptions, which is too intrusive for the host.
>
> Get rid of lwepx and acquire the last instruction in kvmppc_handle_exit()
> by searching for the physical address and kmapping it. This fixes an
> infinite loop caused by lwepx's data TLB miss being handled in the host,
> and addresses the TODO for TLB eviction and execute-but-not-read entries.
>
> Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>

[...]

> +	/* Get page size */
> +	if (MAS0_GET_TLBSEL(mfspr(SPRN_MAS0)) == 0)
> +		psize_shift = PAGE_SHIFT;
> +	else
> +		psize_shift = MAS1_GET_TSIZE(mas1) + 10;
> +
> +	mas7_mas3 = (((u64) mfspr(SPRN_MAS7)) << 32) |
> +		    mfspr(SPRN_MAS3);
> +	addr = (mas7_mas3 & (~0ULL << psize_shift)) |
> +	       (geaddr & ((1ULL << psize_shift) - 1ULL));
> +
> +	/* Map a page and get guest's instruction */
> +	page = pfn_to_page(addr >> PAGE_SHIFT);

While looking at this I just realized that you're missing a check here.
What if our IP is in some PCI BAR? Or can't we execute from those?


Alex

> +	eaddr = (unsigned long)kmap_atomic(page);
> +	eaddr |= addr & ~PAGE_MASK;
> +	vcpu->arch.last_inst = *(u32 *)eaddr;
> +	kunmap_atomic((u32 *)eaddr);
> +}


* Re: [PATCH 2/2] KVM: PPC: Book3E: Get vcpu's last instruction for emulation
  2013-07-09 21:44           ` Alexander Graf
@ 2013-07-10  0:06             ` Scott Wood
  2013-07-10 10:15               ` Alexander Graf
From: Scott Wood @ 2013-07-10  0:06 UTC (permalink / raw)
  To: Alexander Graf; +Cc: Mihai Caraman, linuxppc-dev, kvm, kvm-ppc

On 07/09/2013 04:44:24 PM, Alexander Graf wrote:
>
> On 09.07.2013, at 20:46, Scott Wood wrote:
> > I suspect that tlbsx is faster, or at worst similar.  And unlike
> > comparing tlbsx to lwepx (not counting a fix for the threading
> > problem), we don't already have code to search the guest TLB, so
> > testing would be more work.
>
> We have code to walk the guest TLB for TLB misses. This really is
> just the TLB miss search without host TLB injection.
>
> So let's say we're using the shadow TLB. The guest always has its say
> 64 TLB entries that it can count on - we never evict anything by
> accident, because we store all of the 64 entries in our guest TLB
> cache. When the guest faults at an address, the first thing we do is
> we check the cache whether we have that page already mapped.
>
> However, with this method we now have 2 enumeration methods for guest
> TLB searches. We have the tlbsx one which searches the host TLB and
> we have our guest TLB cache. The guest TLB cache might still contain
> an entry for an address that we already invalidated on the host.
> Would that impose a problem?
>
> I guess not because we're swizzling the exit code around to instead
> be an instruction miss which means we restore the TLB entry into our
> host's TLB so that when we resume, we land here and the tlbsx hits.
> But it feels backwards.

Any better way?  Searching the guest TLB won't work for the LRAT case,
so we'd need to have this logic around anyway.  We shouldn't add a
second codepath unless it's a clear performance gain -- and again, I
suspect it would be the opposite, especially if the entry is not in
TLB0 or in one of the first few entries searched in TLB1.  The tlbsx
miss case is not what we should optimize for.

> At least this code has to become something more generic, such as
> kvmppc_read_guest(vcpu, addr, TYPE_INSN) and move into the host mmu
> implementation, as it's 100% host mmu specific.

I agree that e500_mmu_host.c is a better place for it (with an ifdef
for BOOKEHV), but supporting anything other than instruction fetches
could wait until we have a user for it (it means extra code to figure
out if permissions are correct).

-Scott


* Re: [PATCH 2/2] KVM: PPC: Book3E: Get vcpu's last instruction for emulation
  2013-07-09 21:45   ` Alexander Graf
@ 2013-07-10  0:12     ` Scott Wood
  2013-07-10 10:18       ` Alexander Graf
From: Scott Wood @ 2013-07-10  0:12 UTC (permalink / raw)
  To: Alexander Graf; +Cc: Mihai Caraman, linuxppc-dev, kvm, kvm-ppc

On 07/09/2013 04:45:10 PM, Alexander Graf wrote:
>
> On 28.06.2013, at 11:20, Mihai Caraman wrote:
>
> > +	/* Get page size */
> > +	if (MAS0_GET_TLBSEL(mfspr(SPRN_MAS0)) == 0)
> > +		psize_shift = PAGE_SHIFT;
> > +	else
> > +		psize_shift = MAS1_GET_TSIZE(mas1) + 10;
> > +
> > +	mas7_mas3 = (((u64) mfspr(SPRN_MAS7)) << 32) |
> > +		    mfspr(SPRN_MAS3);
> > +	addr = (mas7_mas3 & (~0ULL << psize_shift)) |
> > +	       (geaddr & ((1ULL << psize_shift) - 1ULL));
> > +
> > +	/* Map a page and get guest's instruction */
> > +	page = pfn_to_page(addr >> PAGE_SHIFT);
>
> While looking at this I just realized that you're missing a check
> here. What if our IP is in some PCI BAR? Or can't we execute from
> those?

We at least need to check pfn_valid() first.  That'll just keep us from
accessing a bad pointer in the host kernel, though -- it won't make the
emulation actually work.  If we need that, we'll probably need to
create a temporary TLB entry manually.
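
Concretely, the guard would slot in just before the pfn_to_page() call in
kvmppc_prepare_for_emulation(); a sketch (the bail-out value reuses the
KVM_INST_FETCH_FAILED marker that patch 1 removed from the asm path):

	if (!pfn_valid(addr >> PAGE_SHIFT)) {
		/* Not ordinary, kmap-able RAM (e.g. an I/O BAR): give up
		 * on emulation instead of dereferencing a bogus page. */
		vcpu->arch.last_inst = KVM_INST_FETCH_FAILED;
		return;
	}
	page = pfn_to_page(addr >> PAGE_SHIFT);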

-Scott


* Re: [PATCH 2/2] KVM: PPC: Book3E: Get vcpu's last instruction for emulation
  2013-07-10  0:06             ` Scott Wood
@ 2013-07-10 10:15               ` Alexander Graf
  2013-07-10 18:42                 ` Scott Wood
From: Alexander Graf @ 2013-07-10 10:15 UTC (permalink / raw)
  To: Scott Wood; +Cc: Mihai Caraman, linuxppc-dev, kvm, kvm-ppc


On 10.07.2013, at 02:06, Scott Wood wrote:

> On 07/09/2013 04:44:24 PM, Alexander Graf wrote:
>> On 09.07.2013, at 20:46, Scott Wood wrote:
>> > I suspect that tlbsx is faster, or at worst similar.  And unlike =
comparing tlbsx to lwepx (not counting a fix for the threading problem), =
we don't already have code to search the guest TLB, so testing would be =
more work.
>> We have code to walk the guest TLB for TLB misses. This really is =
just the TLB miss search without host TLB injection.
>> So let's say we're using the shadow TLB. The guest always has its say =
64 TLB entries that it can count on - we never evict anything by =
accident, because we store all of the 64 entries in our guest TLB cache. =
When the guest faults at an address, the first thing we do is we check =
the cache whether we have that page already mapped.
>> However, with this method we now have 2 enumeration methods for guest =
TLB searches. We have the tlbsx one which searches the host TLB and we =
have our guest TLB cache. The guest TLB cache might still contain an =
entry for an address that we already invalidated on the host. Would that =
impose a problem?
>> I guess not because we're swizzling the exit code around to instead =
be an instruction miss which means we restore the TLB entry into our =
host's TLB so that when we resume, we land here and the tlbsx hits. But =
it feels backwards.
>=20
> Any better way?  Searching the guest TLB won't work for the LRAT case, =
so we'd need to have this logic around anyway.  We shouldn't add a =
second codepath unless it's a clear performance gain -- and again, I =
suspect it would be the opposite, especially if the entry is not in TLB0 =
or in one of the first few entries searched in TLB1.  The tlbsx miss =
case is not what we should optimize for.

Hrm.

So let's redesign this thing theoretically. We would have an exit that
requires an instruction fetch. We would override kvmppc_get_last_inst()
to always do kvmppc_ld_inst(). That one can fail because it can't find
the TLB entry in the host TLB. When it fails, we have to abort the
emulation and resume the guest at the same IP.

Now the guest gets the TLB miss, we populate, go back into the guest.
The guest hits the emulation failure again. We go back to
kvmppc_ld_inst() which succeeds this time and we can emulate the
instruction.

I think this works. Just make sure that the gateway to the instruction
fetch is kvmppc_get_last_inst() and make that failable. Then the
difference between looking for the TLB entry in the host's TLB or in
the guest's TLB cache is hopefully negligible.
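
In caller terms the flow would be roughly (a sketch, not tested;
reusing EMULATE_AGAIN as the failure code is my assumption):

	u32 inst = KVM_INST_FETCH_FAILED;

	if (kvmppc_get_last_inst(vcpu, &inst) == EMULATE_AGAIN) {
		/*
		 * Fetch failed: abort emulation and resume at the same
		 * IP. The guest takes the TLB miss, the mapping gets
		 * repopulated, and we come back here with a fetch that
		 * succeeds.
		 */
		return RESUME_GUEST;
	}
	/* otherwise emulate inst as usual */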

>
>> At least this code has to become something more generic, such as
>> kvmppc_read_guest(vcpu, addr, TYPE_INSN) and move into the host mmu
>> implementation, as it's 100% host mmu specific.
>
> I agree that e500_mmu_host.c is a better place for it (with an ifdef
> for BOOKEHV), but supporting anything other than instruction fetches
> could wait until we have a user for it (it means extra code to figure
> out if permissions are correct).

Works for me, as long as it's either documented through BUG_ON/WARN_ONs
or an explicit naming convention.
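
(For illustration, the guard could be as small as this; the function
and parameter names are made up for the example, they're not from the
patch:)

	/* Sketch: refuse anything but instruction fetches until data
	 * accesses grow a real user and a permissions check. */
	static int kvmppc_load_guest(struct kvm_vcpu *vcpu, gva_t ea,
				     u32 *val, bool inst_fetch)
	{
		if (WARN_ON_ONCE(!inst_fetch))
			return -EINVAL;

		return kvmppc_load_guest_inst(vcpu, ea, val); /* assumed helper */
	}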


Alex

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/2] KVM: PPC: Book3E: Get vcpu's last instruction for emulation
  2013-07-10  0:12     ` Scott Wood
@ 2013-07-10 10:18       ` Alexander Graf
  2013-07-10 18:37         ` Scott Wood
  0 siblings, 1 reply; 19+ messages in thread
From: Alexander Graf @ 2013-07-10 10:18 UTC (permalink / raw)
  To: Scott Wood; +Cc: Mihai Caraman, linuxppc-dev, kvm, kvm-ppc


On 10.07.2013, at 02:12, Scott Wood wrote:

> On 07/09/2013 04:45:10 PM, Alexander Graf wrote:
>> On 28.06.2013, at 11:20, Mihai Caraman wrote:
>> > +	/* Get page size */
>> > +	if (MAS0_GET_TLBSEL(mfspr(SPRN_MAS0)) == 0)
>> > +		psize_shift = PAGE_SHIFT;
>> > +	else
>> > +		psize_shift = MAS1_GET_TSIZE(mas1) + 10;
>> > +
>> > +	mas7_mas3 = (((u64) mfspr(SPRN_MAS7)) << 32) |
>> > +		    mfspr(SPRN_MAS3);
>> > +	addr = (mas7_mas3 & (~0ULL << psize_shift)) |
>> > +	       (geaddr & ((1ULL << psize_shift) - 1ULL));
>> > +
>> > +	/* Map a page and get guest's instruction */
>> > +	page = pfn_to_page(addr >> PAGE_SHIFT);
>> While looking at this I just realized that you're missing a check
>> here. What if our IP is in some PCI BAR? Or can't we execute from
>> those?
>
> We at least need to check pfn_valid() first.  That'll just keep us
> from accessing a bad pointer in the host kernel, though -- it won't
> make the emulation actually work.  If we need that, we'll probably
> need to create a temporary TLB entry manually.

ioremap()?

However, if we were walking the guest TLB cache instead we would get a
guest physical address which we can always resolve to a host virtual
address.

I'm not sure how important that whole use case is though. Maybe we
should just error out to the guest for now.
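
Erroring out could be as small as this (a sketch; whether ESR_PIL is
the right syndrome to report here is an open question):

	/* e.g. when pfn_valid() fails on the translated address:
	 * make the guest see a program check instead of letting the
	 * host read garbage */
	kvmppc_core_queue_program(vcpu, ESR_PIL);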


Alex

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/2] KVM: PPC: Book3E: Get vcpu's last instruction for emulation
  2013-07-10 10:18       ` Alexander Graf
@ 2013-07-10 18:37         ` Scott Wood
  2013-07-10 22:48           ` Alexander Graf
  0 siblings, 1 reply; 19+ messages in thread
From: Scott Wood @ 2013-07-10 18:37 UTC (permalink / raw)
  To: Alexander Graf; +Cc: Mihai Caraman, linuxppc-dev, kvm, kvm-ppc

On 07/10/2013 05:18:10 AM, Alexander Graf wrote:
>
> On 10.07.2013, at 02:12, Scott Wood wrote:
>
> > On 07/09/2013 04:45:10 PM, Alexander Graf wrote:
> >> On 28.06.2013, at 11:20, Mihai Caraman wrote:
> >> > +	/* Get page size */
> >> > +	if (MAS0_GET_TLBSEL(mfspr(SPRN_MAS0)) == 0)
> >> > +		psize_shift = PAGE_SHIFT;
> >> > +	else
> >> > +		psize_shift = MAS1_GET_TSIZE(mas1) + 10;
> >> > +
> >> > +	mas7_mas3 = (((u64) mfspr(SPRN_MAS7)) << 32) |
> >> > +		    mfspr(SPRN_MAS3);
> >> > +	addr = (mas7_mas3 & (~0ULL << psize_shift)) |
> >> > +	       (geaddr & ((1ULL << psize_shift) - 1ULL));
> >> > +
> >> > +	/* Map a page and get guest's instruction */
> >> > +	page = pfn_to_page(addr >> PAGE_SHIFT);
> >> While looking at this I just realized that you're missing a check
> >> here. What if our IP is in some PCI BAR? Or can't we execute from
> >> those?
> >
> > We at least need to check pfn_valid() first.  That'll just keep us
> > from accessing a bad pointer in the host kernel, though -- it won't
> > make the emulation actually work.  If we need that, we'll probably
> > need to create a temporary TLB entry manually.
>
> ioremap()?

That's a bit heavy... also we'd need to deal with cacheability.  This
code is already engaged in directly creating TLB entries, so it doesn't
seem like much of a stretch to create one for this.  It should be
faster than ioremap() or kmap_atomic().

The one complication is allocating the virtual address space, but maybe
we could just use the page that kmap_atomic would have used?  Of
course, if we want to handle execution from other than normal kernel
memory, we'll need to make sure that the virtual address space is
allocated even when highmem is not present (e.g. 64-bit).
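
Hand-rolling the temporary mapping would look something like this (very
much a sketch: the reserved virtual address, the TLB1 slot, and the
WIMG attributes are all assumptions, and a real version would have to
guard against the entry being evicted underneath us):

	unsigned long vaddr = TMP_MAP_VADDR;	/* assumed reserved VA */

	mtspr(SPRN_MAS0, MAS0_TLBSEL(1) | MAS0_ESEL(TMP_MAP_ESEL));
	mtspr(SPRN_MAS1, MAS1_VALID | MAS1_TSIZE(BOOK3E_PAGESZ_4K));
	/* cache-inhibited + guarded so device memory is safe too */
	mtspr(SPRN_MAS2, (vaddr & MAS2_EPN) | MAS2_I | MAS2_G);
	mtspr(SPRN_MAS3, (u32)(addr & ~0xfffULL) | MAS3_SR);
	mtspr(SPRN_MAS7, (u32)(addr >> 32));
	asm volatile("isync; tlbwe; isync" : : : "memory");

	/* fetch through vaddr, then invalidate the entry again */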

> However, if we were walking the guest TLB cache instead we would get
> a guest physical address which we can always resolve to a host
> virtual address.
>
> I'm not sure how important that whole use case is though. Maybe we
> should just error out to the guest for now.

It's not that important, now that we are using hugetlb rather than
directly mapping a large hunk of reserved memory.  It would be nice to
handle it though, if we can do without too much hassle.  And I think
manually creating a TLB entry could be faster than kmap_atomic(), or
searching the guest TLB and then doing a reverse memslot lookup.

-Scott

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/2] KVM: PPC: Book3E: Get vcpu's last instruction for emulation
  2013-07-10 10:15               ` Alexander Graf
@ 2013-07-10 18:42                 ` Scott Wood
  2013-07-10 22:50                   ` Alexander Graf
  0 siblings, 1 reply; 19+ messages in thread
From: Scott Wood @ 2013-07-10 18:42 UTC (permalink / raw)
  To: Alexander Graf; +Cc: Mihai Caraman, linuxppc-dev, kvm, kvm-ppc

On 07/10/2013 05:15:09 AM, Alexander Graf wrote:
>
> On 10.07.2013, at 02:06, Scott Wood wrote:
>
> > On 07/09/2013 04:44:24 PM, Alexander Graf wrote:
> >> On 09.07.2013, at 20:46, Scott Wood wrote:
> >> > I suspect that tlbsx is faster, or at worst similar.  And unlike
> >> > comparing tlbsx to lwepx (not counting a fix for the threading
> >> > problem), we don't already have code to search the guest TLB, so
> >> > testing would be more work.
> >> We have code to walk the guest TLB for TLB misses. This really is
> >> just the TLB miss search without host TLB injection.
> >> So let's say we're using the shadow TLB. The guest always has its
> >> say 64 TLB entries that it can count on - we never evict anything
> >> by accident, because we store all of the 64 entries in our guest
> >> TLB cache. When the guest faults at an address, the first thing we
> >> do is we check the cache whether we have that page already mapped.
> >> However, with this method we now have 2 enumeration methods for
> >> guest TLB searches. We have the tlbsx one which searches the host
> >> TLB and we have our guest TLB cache. The guest TLB cache might
> >> still contain an entry for an address that we already invalidated
> >> on the host. Would that impose a problem?
> >> I guess not because we're swizzling the exit code around to
> >> instead be an instruction miss which means we restore the TLB
> >> entry into our host's TLB so that when we resume, we land here and
> >> the tlbsx hits. But it feels backwards.
> >
> > Any better way?  Searching the guest TLB won't work for the LRAT
> > case, so we'd need to have this logic around anyway.  We shouldn't
> > add a second codepath unless it's a clear performance gain -- and
> > again, I suspect it would be the opposite, especially if the entry
> > is not in TLB0 or in one of the first few entries searched in TLB1.
> > The tlbsx miss case is not what we should optimize for.
>
> Hrm.
>
> So let's redesign this thing theoretically. We would have an exit
> that requires an instruction fetch. We would override
> kvmppc_get_last_inst() to always do kvmppc_ld_inst(). That one can
> fail because it can't find the TLB entry in the host TLB. When it
> fails, we have to abort the emulation and resume the guest at the
> same IP.
>
> Now the guest gets the TLB miss, we populate, go back into the guest.
> The guest hits the emulation failure again. We go back to
> kvmppc_ld_inst() which succeeds this time and we can emulate the
> instruction.

That's pretty much what this patch does, except that it goes
immediately to the TLB miss code rather than having the extra
round-trip back to the guest.  Is there any benefit from adding that
extra round-trip?  Rewriting the exit type instead doesn't seem that
bad...

> I think this works. Just make sure that the gateway to the
> instruction fetch is kvmppc_get_last_inst() and make that failable.
> Then the difference between looking for the TLB entry in the host's
> TLB or in the guest's TLB cache is hopefully negligible.

I don't follow here.  What does this have to do with looking in the
guest TLB?

-Scott

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/2] KVM: PPC: Book3E: Get vcpu's last instruction for emulation
  2013-07-10 18:37         ` Scott Wood
@ 2013-07-10 22:48           ` Alexander Graf
  0 siblings, 0 replies; 19+ messages in thread
From: Alexander Graf @ 2013-07-10 22:48 UTC (permalink / raw)
  To: Scott Wood; +Cc: Mihai Caraman, linuxppc-dev, kvm, kvm-ppc


On 10.07.2013, at 20:37, Scott Wood wrote:

> On 07/10/2013 05:18:10 AM, Alexander Graf wrote:
>> On 10.07.2013, at 02:12, Scott Wood wrote:
>> > On 07/09/2013 04:45:10 PM, Alexander Graf wrote:
>> >> On 28.06.2013, at 11:20, Mihai Caraman wrote:
>> >> > +	/* Get page size */
>> >> > +	if (MAS0_GET_TLBSEL(mfspr(SPRN_MAS0)) == 0)
>> >> > +		psize_shift = PAGE_SHIFT;
>> >> > +	else
>> >> > +		psize_shift = MAS1_GET_TSIZE(mas1) + 10;
>> >> > +
>> >> > +	mas7_mas3 = (((u64) mfspr(SPRN_MAS7)) << 32) |
>> >> > +		    mfspr(SPRN_MAS3);
>> >> > +	addr = (mas7_mas3 & (~0ULL << psize_shift)) |
>> >> > +	       (geaddr & ((1ULL << psize_shift) - 1ULL));
>> >> > +
>> >> > +	/* Map a page and get guest's instruction */
>> >> > +	page = pfn_to_page(addr >> PAGE_SHIFT);
>> >> While looking at this I just realized that you're missing a check
>> >> here. What if our IP is in some PCI BAR? Or can't we execute from
>> >> those?
>> >
>> > We at least need to check pfn_valid() first.  That'll just keep us
>> > from accessing a bad pointer in the host kernel, though -- it
>> > won't make the emulation actually work.  If we need that, we'll
>> > probably need to create a temporary TLB entry manually.
>> ioremap()?
>
> That's a bit heavy... also we'd need to deal with cacheability.  This
> code is already engaged in directly creating TLB entries, so it
> doesn't seem like much of a stretch to create one for this.  It
> should be faster than ioremap() or kmap_atomic().

Do we have any guarantees that the TLB entry we create doesn't get
evicted by another thread by the time we want to use it?


Alex

> The one complication is allocating the virtual address space, but
> maybe we could just use the page that kmap_atomic would have used?
> Of course, if we want to handle execution from other than normal
> kernel memory, we'll need to make sure that the virtual address space
> is allocated even when highmem is not present (e.g. 64-bit).
>
>> However, if we were walking the guest TLB cache instead we would get
>> a guest physical address which we can always resolve to a host
>> virtual address.
>> I'm not sure how important that whole use case is though. Maybe we
>> should just error out to the guest for now.
>
> It's not that important, now that we are using hugetlb rather than
> directly mapping a large hunk of reserved memory.  It would be nice
> to handle it though, if we can do without too much hassle.  And I
> think manually creating a TLB entry could be faster than
> kmap_atomic(), or searching the guest TLB and then doing a reverse
> memslot lookup.
>
> -Scott

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/2] KVM: PPC: Book3E: Get vcpu's last instruction for emulation
  2013-07-10 18:42                 ` Scott Wood
@ 2013-07-10 22:50                   ` Alexander Graf
  2013-07-11  0:15                     ` Scott Wood
  0 siblings, 1 reply; 19+ messages in thread
From: Alexander Graf @ 2013-07-10 22:50 UTC (permalink / raw)
  To: Scott Wood; +Cc: Mihai Caraman, linuxppc-dev, kvm, kvm-ppc


On 10.07.2013, at 20:42, Scott Wood wrote:

> On 07/10/2013 05:15:09 AM, Alexander Graf wrote:
>> On 10.07.2013, at 02:06, Scott Wood wrote:
>> > On 07/09/2013 04:44:24 PM, Alexander Graf wrote:
>> >> On 09.07.2013, at 20:46, Scott Wood wrote:
>> >> > I suspect that tlbsx is faster, or at worst similar.  And
>> >> > unlike comparing tlbsx to lwepx (not counting a fix for the
>> >> > threading problem), we don't already have code to search the
>> >> > guest TLB, so testing would be more work.
>> >> We have code to walk the guest TLB for TLB misses. This really is
>> >> just the TLB miss search without host TLB injection.
>> >> So let's say we're using the shadow TLB. The guest always has its
>> >> say 64 TLB entries that it can count on - we never evict anything
>> >> by accident, because we store all of the 64 entries in our guest
>> >> TLB cache. When the guest faults at an address, the first thing
>> >> we do is we check the cache whether we have that page already
>> >> mapped.
>> >> However, with this method we now have 2 enumeration methods for
>> >> guest TLB searches. We have the tlbsx one which searches the host
>> >> TLB and we have our guest TLB cache. The guest TLB cache might
>> >> still contain an entry for an address that we already invalidated
>> >> on the host. Would that impose a problem?
>> >> I guess not because we're swizzling the exit code around to
>> >> instead be an instruction miss which means we restore the TLB
>> >> entry into our host's TLB so that when we resume, we land here
>> >> and the tlbsx hits. But it feels backwards.
>> >
>> > Any better way?  Searching the guest TLB won't work for the LRAT
>> > case, so we'd need to have this logic around anyway.  We shouldn't
>> > add a second codepath unless it's a clear performance gain -- and
>> > again, I suspect it would be the opposite, especially if the entry
>> > is not in TLB0 or in one of the first few entries searched in
>> > TLB1.  The tlbsx miss case is not what we should optimize for.
>> Hrm.
>> So let's redesign this thing theoretically. We would have an exit
>> that requires an instruction fetch. We would override
>> kvmppc_get_last_inst() to always do kvmppc_ld_inst(). That one can
>> fail because it can't find the TLB entry in the host TLB. When it
>> fails, we have to abort the emulation and resume the guest at the
>> same IP.
>> Now the guest gets the TLB miss, we populate, go back into the
>> guest. The guest hits the emulation failure again. We go back to
>> kvmppc_ld_inst() which succeeds this time and we can emulate the
>> instruction.
>
> That's pretty much what this patch does, except that it goes
> immediately to the TLB miss code rather than having the extra
> round-trip back to the guest.  Is there any benefit from adding that
> extra round-trip?  Rewriting the exit type instead doesn't seem that
> bad...

It's pretty bad. I want to have code that is easy to follow - and I
don't care whether the very rare case of a TLB entry getting evicted by
a random other thread right when we execute the exit path is slower by
a few percent if we get cleaner code for that.

>
>> I think this works. Just make sure that the gateway to the
>> instruction fetch is kvmppc_get_last_inst() and make that failable.
>> Then the difference between looking for the TLB entry in the host's
>> TLB or in the guest's TLB cache is hopefully negligible.
>
> I don't follow here.  What does this have to do with looking in the
> guest TLB?

I want to hide the fact that we're cheating as much as possible, that's
it.


Alex

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/2] KVM: PPC: Book3E: Get vcpu's last instruction for emulation
  2013-07-10 22:50                   ` Alexander Graf
@ 2013-07-11  0:15                     ` Scott Wood
  2013-07-11  0:17                       ` Alexander Graf
  0 siblings, 1 reply; 19+ messages in thread
From: Scott Wood @ 2013-07-11  0:15 UTC (permalink / raw)
  To: Alexander Graf; +Cc: Mihai Caraman, linuxppc-dev, kvm, kvm-ppc

On 07/10/2013 05:50:01 PM, Alexander Graf wrote:
>
> On 10.07.2013, at 20:42, Scott Wood wrote:
>
> > On 07/10/2013 05:15:09 AM, Alexander Graf wrote:
> >> On 10.07.2013, at 02:06, Scott Wood wrote:
> >> > On 07/09/2013 04:44:24 PM, Alexander Graf wrote:
> >> >> On 09.07.2013, at 20:46, Scott Wood wrote:
> >> >> > I suspect that tlbsx is faster, or at worst similar.  And
> >> >> > unlike comparing tlbsx to lwepx (not counting a fix for the
> >> >> > threading problem), we don't already have code to search the
> >> >> > guest TLB, so testing would be more work.
> >> >> We have code to walk the guest TLB for TLB misses. This really
> >> >> is just the TLB miss search without host TLB injection.
> >> >> So let's say we're using the shadow TLB. The guest always has
> >> >> its say 64 TLB entries that it can count on - we never evict
> >> >> anything by accident, because we store all of the 64 entries in
> >> >> our guest TLB cache. When the guest faults at an address, the
> >> >> first thing we do is we check the cache whether we have that
> >> >> page already mapped.
> >> >> However, with this method we now have 2 enumeration methods for
> >> >> guest TLB searches. We have the tlbsx one which searches the
> >> >> host TLB and we have our guest TLB cache. The guest TLB cache
> >> >> might still contain an entry for an address that we already
> >> >> invalidated on the host. Would that impose a problem?
> >> >> I guess not because we're swizzling the exit code around to
> >> >> instead be an instruction miss which means we restore the TLB
> >> >> entry into our host's TLB so that when we resume, we land here
> >> >> and the tlbsx hits. But it feels backwards.
> >> >
> >> > Any better way?  Searching the guest TLB won't work for the LRAT
> >> > case, so we'd need to have this logic around anyway.  We
> >> > shouldn't add a second codepath unless it's a clear performance
> >> > gain -- and again, I suspect it would be the opposite, especially
> >> > if the entry is not in TLB0 or in one of the first few entries
> >> > searched in TLB1.  The tlbsx miss case is not what we should
> >> > optimize for.
> >> Hrm.
> >> So let's redesign this thing theoretically. We would have an exit
> >> that requires an instruction fetch. We would override
> >> kvmppc_get_last_inst() to always do kvmppc_ld_inst(). That one can
> >> fail because it can't find the TLB entry in the host TLB. When it
> >> fails, we have to abort the emulation and resume the guest at the
> >> same IP.
> >> Now the guest gets the TLB miss, we populate, go back into the
> >> guest. The guest hits the emulation failure again. We go back to
> >> kvmppc_ld_inst() which succeeds this time and we can emulate the
> >> instruction.
> >
> > That's pretty much what this patch does, except that it goes
> > immediately to the TLB miss code rather than having the extra
> > round-trip back to the guest.  Is there any benefit from adding
> > that extra round-trip?  Rewriting the exit type instead doesn't
> > seem that bad...
>
> It's pretty bad. I want to have code that is easy to follow - and I
> don't care whether the very rare case of a TLB entry getting evicted
> by a random other thread right when we execute the exit path is
> slower by a few percent if we get cleaner code for that.

I guess I just don't see how this is so much harder to follow than
returning to guest.  I find it harder to follow the flow when there
are more round trips to the guest involved.  "Treat this as an ITLB
miss" is simpler than, "Let this fail, and make sure we retry the
trapping instruction on failure.  Then, an ITLB miss will happen."

Also note that making kvmppc_get_last_inst() able to fail means
updating several existing callsites, both for the change in function
signature and to actually handle failures.
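
Concretely, each callsite would go from the first form to the second
(a sketch; the exact return convention is an assumption):

	/* today -- the fetch cannot fail: */
	u32 inst = kvmppc_get_last_inst(vcpu);

	/* with a failable fetch -- every caller grows an error branch: */
	u32 inst;
	if (kvmppc_get_last_inst(vcpu, &inst) != EMULATE_DONE)
		return RESUME_GUEST;	/* retry the trapping instruction */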

I don't care that deeply either way, it just doesn't seem obviously
better.

> >> I think this works. Just make sure that the gateway to the
> >> instruction fetch is kvmppc_get_last_inst() and make that
> >> failable. Then the difference between looking for the TLB entry in
> >> the host's TLB or in the guest's TLB cache is hopefully negligible.
> >
> > I don't follow here.  What does this have to do with looking in the
> > guest TLB?
>
> I want to hide the fact that we're cheating as much as possible,
> that's it.

How are we cheating, and what specifically are you proposing to do to
hide that?  How is the guest TLB involved at all in the change you're
asking for?

-Scott

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/2] KVM: PPC: Book3E: Get vcpu's last instruction for emulation
  2013-07-11  0:15                     ` Scott Wood
@ 2013-07-11  0:17                       ` Alexander Graf
  0 siblings, 0 replies; 19+ messages in thread
From: Alexander Graf @ 2013-07-11  0:17 UTC (permalink / raw)
  To: Scott Wood; +Cc: Mihai Caraman, linuxppc-dev, kvm, kvm-ppc


On 11.07.2013, at 02:15, Scott Wood wrote:

> On 07/10/2013 05:50:01 PM, Alexander Graf wrote:
>> On 10.07.2013, at 20:42, Scott Wood wrote:
>> > On 07/10/2013 05:15:09 AM, Alexander Graf wrote:
>> >> On 10.07.2013, at 02:06, Scott Wood wrote:
>> >> > On 07/09/2013 04:44:24 PM, Alexander Graf wrote:
>> >> >> On 09.07.2013, at 20:46, Scott Wood wrote:
>> >> >> > I suspect that tlbsx is faster, or at worst similar.  And
>> >> >> > unlike comparing tlbsx to lwepx (not counting a fix for the
>> >> >> > threading problem), we don't already have code to search the
>> >> >> > guest TLB, so testing would be more work.
>> >> >> We have code to walk the guest TLB for TLB misses. This really
>> >> >> is just the TLB miss search without host TLB injection.
>> >> >> So let's say we're using the shadow TLB. The guest always has
>> >> >> its say 64 TLB entries that it can count on - we never evict
>> >> >> anything by accident, because we store all of the 64 entries
>> >> >> in our guest TLB cache. When the guest faults at an address,
>> >> >> the first thing we do is we check the cache whether we have
>> >> >> that page already mapped.
>> >> >> However, with this method we now have 2 enumeration methods
>> >> >> for guest TLB searches. We have the tlbsx one which searches
>> >> >> the host TLB and we have our guest TLB cache. The guest TLB
>> >> >> cache might still contain an entry for an address that we
>> >> >> already invalidated on the host. Would that impose a problem?
>> >> >> I guess not because we're swizzling the exit code around to
>> >> >> instead be an instruction miss which means we restore the TLB
>> >> >> entry into our host's TLB so that when we resume, we land here
>> >> >> and the tlbsx hits. But it feels backwards.
>> >> >
>> >> > Any better way?  Searching the guest TLB won't work for the
>> >> > LRAT case, so we'd need to have this logic around anyway.  We
>> >> > shouldn't add a second codepath unless it's a clear performance
>> >> > gain -- and again, I suspect it would be the opposite,
>> >> > especially if the entry is not in TLB0 or in one of the first
>> >> > few entries searched in TLB1.  The tlbsx miss case is not what
>> >> > we should optimize for.
>> >> Hrm.
>> >> So let's redesign this thing theoretically. We would have an exit
>> >> that requires an instruction fetch. We would override
>> >> kvmppc_get_last_inst() to always do kvmppc_ld_inst(). That one
>> >> can fail because it can't find the TLB entry in the host TLB.
>> >> When it fails, we have to abort the emulation and resume the
>> >> guest at the same IP.
>> >> Now the guest gets the TLB miss, we populate, go back into the
>> >> guest. The guest hits the emulation failure again. We go back to
>> >> kvmppc_ld_inst() which succeeds this time and we can emulate the
>> >> instruction.
>> >
>> > That's pretty much what this patch does, except that it goes
>> > immediately to the TLB miss code rather than having the extra
>> > round-trip back to the guest.  Is there any benefit from adding
>> > that extra round-trip?  Rewriting the exit type instead doesn't
>> > seem that bad...
>> It's pretty bad. I want to have code that is easy to follow - and I
>> don't care whether the very rare case of a TLB entry getting evicted
>> by a random other thread right when we execute the exit path is
>> slower by a few percent if we get cleaner code for that.
>
> I guess I just don't see how this is so much harder to follow than
> returning to guest.  I find it harder to follow the flow when there
> are more round trips to the guest involved.  "Treat this as an ITLB
> miss" is simpler than, "Let this fail, and make sure we retry the
> trapping instruction on failure.  Then, an ITLB miss will happen."
>
> Also note that making kvmppc_get_last_inst() able to fail means
> updating several existing callsites, both for the change in function
> signature and to actually handle failures.
>
> I don't care that deeply either way, it just doesn't seem obviously
> better.
>
>> >> I think this works. Just make sure that the gateway to the
>> >> instruction fetch is kvmppc_get_last_inst() and make that
>> >> failable. Then the difference between looking for the TLB entry
>> >> in the host's TLB or in the guest's TLB cache is hopefully
>> >> negligible.
>> >
>> > I don't follow here.  What does this have to do with looking in
>> > the guest TLB?
>> I want to hide the fact that we're cheating as much as possible,
>> that's it.
>
> How are we cheating, and what specifically are you proposing to do to
> hide that?  How is the guest TLB involved at all in the change you're
> asking for?

It's not involved, but it's basically what we do on book3s pr kvm.
There kvmppc_ld reads the guest htab, not the host htab. I think it's
fine to expose both cases as the same thing to the rest of kvm.


Alex

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 2/2] KVM: PPC: Book3E: Get vcpu's last instruction for emulation
  2013-06-06 16:11 [PATCH 1/2] KVM: PPC: e500mc: Revert "add load inst fixup" Mihai Caraman
@ 2013-06-06 16:11 ` Mihai Caraman
  0 siblings, 0 replies; 19+ messages in thread
From: Mihai Caraman @ 2013-06-06 16:11 UTC (permalink / raw)
  To: kvm-ppc; +Cc: Mihai Caraman, linuxppc-dev, kvm

lwepx faults need to be handled by KVM, and this implies additional code
in the DO_KVM macro to identify exceptions that originated in host
context. This requires checking the Exception Syndrome Register
(ESR[EPID]) and the External PID Load Context Register (EPLC[EGS]) for
DTB_MISS, DSI and LRAT exceptions, which is too intrusive for the host.

Get rid of lwepx and acquire the last instruction in kvmppc_handle_exit()
by searching for the physical address and kmapping it. This fixes an
infinite loop caused by lwepx's data TLB miss being handled in the host,
and the TODO for TLB eviction and execute-but-not-read entries.

Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
---
 arch/powerpc/include/asm/mmu-book3e.h |    6 ++-
 arch/powerpc/kvm/booke.c              |    6 +++
 arch/powerpc/kvm/booke.h              |    2 +
 arch/powerpc/kvm/bookehv_interrupts.S |   32 ++-------------
 arch/powerpc/kvm/e500.c               |    4 ++
 arch/powerpc/kvm/e500mc.c             |   69 +++++++++++++++++++++++++++++++++
 6 files changed, 91 insertions(+), 28 deletions(-)

diff --git a/arch/powerpc/include/asm/mmu-book3e.h b/arch/powerpc/include/asm/mmu-book3e.h
index 99d43e0..32e470e 100644
--- a/arch/powerpc/include/asm/mmu-book3e.h
+++ b/arch/powerpc/include/asm/mmu-book3e.h
@@ -40,7 +40,10 @@
 
 /* MAS registers bit definitions */
 
-#define MAS0_TLBSEL(x)		(((x) << 28) & 0x30000000)
+#define MAS0_TLBSEL_MASK	0x30000000
+#define MAS0_TLBSEL_SHIFT	28
+#define MAS0_TLBSEL(x)		(((x) << MAS0_TLBSEL_SHIFT) & MAS0_TLBSEL_MASK)
+#define MAS0_GET_TLBSEL(mas0)	(((mas0) & MAS0_TLBSEL_MASK) >> MAS0_TLBSEL_SHIFT)
 #define MAS0_ESEL_MASK		0x0FFF0000
 #define MAS0_ESEL_SHIFT		16
 #define MAS0_ESEL(x)		(((x) << MAS0_ESEL_SHIFT) & MAS0_ESEL_MASK)
@@ -58,6 +61,7 @@
 #define MAS1_TSIZE_MASK		0x00000f80
 #define MAS1_TSIZE_SHIFT	7
 #define MAS1_TSIZE(x)		(((x) << MAS1_TSIZE_SHIFT) & MAS1_TSIZE_MASK)
+#define MAS1_GET_TSIZE(mas1)	(((mas1) & MAS1_TSIZE_MASK) >> MAS1_TSIZE_SHIFT)
 
 #define MAS2_EPN		(~0xFFFUL)
 #define MAS2_X0			0x00000040
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 1020119..6764a8e 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -836,6 +836,12 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	/* update before a new last_exit_type is rewritten */
 	kvmppc_update_timing_stats(vcpu);
 
+	/*
+	 * The exception type can change at this point, such as if the TLB entry
+	 * for the emulated instruction has been evicted.
+	 */
+	kvmppc_prepare_for_emulation(vcpu, &exit_nr);
+
 	/* restart interrupts if they were meant for the host */
 	kvmppc_restart_interrupt(vcpu, exit_nr);
 
diff --git a/arch/powerpc/kvm/booke.h b/arch/powerpc/kvm/booke.h
index 5fd1ba6..a0d0fea 100644
--- a/arch/powerpc/kvm/booke.h
+++ b/arch/powerpc/kvm/booke.h
@@ -90,6 +90,8 @@ void kvmppc_vcpu_disable_spe(struct kvm_vcpu *vcpu);
 void kvmppc_booke_vcpu_load(struct kvm_vcpu *vcpu, int cpu);
 void kvmppc_booke_vcpu_put(struct kvm_vcpu *vcpu);
 
+void kvmppc_prepare_for_emulation(struct kvm_vcpu *vcpu, unsigned int *exit_nr);
+
 enum int_class {
 	INT_CLASS_NONCRIT,
 	INT_CLASS_CRIT,
diff --git a/arch/powerpc/kvm/bookehv_interrupts.S b/arch/powerpc/kvm/bookehv_interrupts.S
index 20c7a54..0538ab9 100644
--- a/arch/powerpc/kvm/bookehv_interrupts.S
+++ b/arch/powerpc/kvm/bookehv_interrupts.S
@@ -120,37 +120,20 @@
 
 	.if	\flags & NEED_EMU
 	/*
-	 * This assumes you have external PID support.
-	 * To support a bookehv CPU without external PID, you'll
-	 * need to look up the TLB entry and create a temporary mapping.
-	 *
-	 * FIXME: we don't currently handle if the lwepx faults.  PR-mode
-	 * booke doesn't handle it either.  Since Linux doesn't use
-	 * broadcast tlbivax anymore, the only way this should happen is
-	 * if the guest maps its memory execute-but-not-read, or if we
-	 * somehow take a TLB miss in the middle of this entry code and
-	 * evict the relevant entry.  On e500mc, all kernel lowmem is
-	 * bolted into TLB1 large page mappings, and we don't use
-	 * broadcast invalidates, so we should not take a TLB miss here.
-	 *
-	 * Later we'll need to deal with faults here.  Disallowing guest
-	 * mappings that are execute-but-not-read could be an option on
-	 * e500mc, but not on chips with an LRAT if it is used.
+	 * We don't use external PID support. lwepx faults would need to be
+	 * handled by KVM and this implies additional code in DO_KVM (for
+	 * DTB_MISS, DSI and LRAT) to check ESR[EPID] and EPLC[EGS], which
+	 * is too intrusive for the host. Get the last instruction in
+	 * kvmppc_handle_exit().
 	 */
-
-	mfspr	r3, SPRN_EPLC	/* will already have correct ELPID and EGS */
 	PPC_STL	r15, VCPU_GPR(R15)(r4)
 	PPC_STL	r16, VCPU_GPR(R16)(r4)
 	PPC_STL	r17, VCPU_GPR(R17)(r4)
 	PPC_STL	r18, VCPU_GPR(R18)(r4)
 	PPC_STL	r19, VCPU_GPR(R19)(r4)
-	mr	r8, r3
 	PPC_STL	r20, VCPU_GPR(R20)(r4)
-	rlwimi	r8, r6, EPC_EAS_SHIFT - MSR_IR_LG, EPC_EAS
 	PPC_STL	r21, VCPU_GPR(R21)(r4)
-	rlwimi	r8, r6, EPC_EPR_SHIFT - MSR_PR_LG, EPC_EPR
 	PPC_STL	r22, VCPU_GPR(R22)(r4)
-	rlwimi	r8, r10, EPC_EPID_SHIFT, EPC_EPID
 	PPC_STL	r23, VCPU_GPR(R23)(r4)
 	PPC_STL	r24, VCPU_GPR(R24)(r4)
 	PPC_STL	r25, VCPU_GPR(R25)(r4)
@@ -160,11 +143,6 @@
 	PPC_STL	r29, VCPU_GPR(R29)(r4)
 	PPC_STL	r30, VCPU_GPR(R30)(r4)
 	PPC_STL	r31, VCPU_GPR(R31)(r4)
-	mtspr	SPRN_EPLC, r8
-	isync
-	lwepx   r9, 0, r5
-	mtspr	SPRN_EPLC, r3
-	stw	r9, VCPU_LAST_INST(r4)
 	.endif
 
 	.if	\flags & NEED_ESR
diff --git a/arch/powerpc/kvm/e500.c b/arch/powerpc/kvm/e500.c
index ce6b73c..c82a89f 100644
--- a/arch/powerpc/kvm/e500.c
+++ b/arch/powerpc/kvm/e500.c
@@ -439,6 +439,10 @@ int kvmppc_set_one_reg(struct kvm_vcpu *vcpu, u64 id,
 	return r;
 }
 
+void kvmppc_prepare_for_emulation(struct kvm_vcpu *vcpu, unsigned int *exit_nr)
+{
+}
+
 struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm *kvm, unsigned int id)
 {
 	struct kvmppc_vcpu_e500 *vcpu_e500;
diff --git a/arch/powerpc/kvm/e500mc.c b/arch/powerpc/kvm/e500mc.c
index c3bdc0a..3641df7 100644
--- a/arch/powerpc/kvm/e500mc.c
+++ b/arch/powerpc/kvm/e500mc.c
@@ -271,6 +271,75 @@ int kvmppc_set_one_reg(struct kvm_vcpu *vcpu, u64 id,
 	return r;
 }
 
+void kvmppc_prepare_for_emulation(struct kvm_vcpu *vcpu, unsigned int *exit_nr)
+{
+	gva_t geaddr;
+	hpa_t addr;
+	u64 mas7_mas3;
+	hva_t eaddr;
+	u32 mas1, mas3;
+	struct page *page;
+	unsigned int addr_space, psize_shift;
+	bool pr;
+
+	if ((*exit_nr != BOOKE_INTERRUPT_DATA_STORAGE) &&
+	    (*exit_nr != BOOKE_INTERRUPT_DTLB_MISS) &&
+	    (*exit_nr != BOOKE_INTERRUPT_HV_PRIV))
+		return;
+
+	/* Search the guest translation to find the real address */
+	geaddr = vcpu->arch.pc;
+	addr_space = (vcpu->arch.shared->msr & MSR_IS) >> MSR_IR_LG;
+	mtspr(SPRN_MAS6, (vcpu->arch.pid << MAS6_SPID_SHIFT) | addr_space);
+	mtspr(SPRN_MAS5, MAS5_SGS | vcpu->kvm->arch.lpid);
+	isync();
+	asm volatile("tlbsx 0, %[geaddr]\n" : : [geaddr] "r" (geaddr));
+	mtspr(SPRN_MAS5, 0);
+	mtspr(SPRN_MAS8, 0);
+
+	mas1 = mfspr(SPRN_MAS1);
+	if (!(mas1 & MAS1_VALID)) {
+		/*
+		 * There is no translation for the emulated instruction.
+		 * Simulate an instruction TLB miss. This should force the host
+		 * or ultimately the guest to add the translation and then
+		 * re-execute the instruction.
+		 */
+		*exit_nr = BOOKE_INTERRUPT_ITLB_MISS;
+		return;
+	}
+
+	mas3 = mfspr(SPRN_MAS3);
+	pr = vcpu->arch.shared->msr & MSR_PR;
+	if ((pr && (!(mas3 & MAS3_UX))) || ((!pr) && (!(mas3 & MAS3_SX)))) {
+		/*
+		 * Another thread may rewrite the TLB entry in parallel; don't
+		 * execute from the address if the execute permission is not set.
+		 */
+		vcpu->arch.fault_esr = 0;
+		*exit_nr = BOOKE_INTERRUPT_INST_STORAGE;
+		return;
+	}
+
+	/* Get page size */
+	if (MAS0_GET_TLBSEL(mfspr(SPRN_MAS0)) == 0)
+		psize_shift = PAGE_SHIFT;
+	else
+		psize_shift = MAS1_GET_TSIZE(mas1) + 10;
+
+	mas7_mas3 = (((u64) mfspr(SPRN_MAS7)) << 32) |
+		    mfspr(SPRN_MAS3);
+	addr = (mas7_mas3 & (~0ULL << psize_shift)) |
+	       (geaddr & ((1ULL << psize_shift) - 1ULL));
+
+	/* Map a page and get guest's instruction */
+	page = pfn_to_page(addr >> PAGE_SHIFT);
+	eaddr = (unsigned long)kmap_atomic(page);
+	eaddr |= addr & ~PAGE_MASK;
+	vcpu->arch.last_inst = *(u32 *)eaddr;
+	kunmap_atomic((u32 *)eaddr);
+}
+
 struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm *kvm, unsigned int id)
 {
 	struct kvmppc_vcpu_e500 *vcpu_e500;
-- 
1.7.4.1

^ permalink raw reply related	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2013-07-11  0:17 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-06-28  9:20 [PATCH 1/2] KVM: PPC: e500mc: Revert "add load inst fixup" Mihai Caraman
2013-06-28  9:20 ` [PATCH 2/2] KVM: PPC: Book3E: Get vcpu's last instruction for emulation Mihai Caraman
2013-07-08 13:39   ` Alexander Graf
2013-07-09 17:13     ` Scott Wood
2013-07-09 17:44       ` Alexander Graf
2013-07-09 18:46         ` Scott Wood
2013-07-09 21:44           ` Alexander Graf
2013-07-10  0:06             ` Scott Wood
2013-07-10 10:15               ` Alexander Graf
2013-07-10 18:42                 ` Scott Wood
2013-07-10 22:50                   ` Alexander Graf
2013-07-11  0:15                     ` Scott Wood
2013-07-11  0:17                       ` Alexander Graf
2013-07-09 21:45   ` Alexander Graf
2013-07-10  0:12     ` Scott Wood
2013-07-10 10:18       ` Alexander Graf
2013-07-10 18:37         ` Scott Wood
2013-07-10 22:48           ` Alexander Graf
  -- strict thread matches above, loose matches on Subject: below --
2013-06-06 16:11 [PATCH 1/2] KVM: PPC: e500mc: Revert "add load inst fixup" Mihai Caraman
2013-06-06 16:11 ` [PATCH 2/2] KVM: PPC: Book3E: Get vcpu's last instruction for emulation Mihai Caraman
