* [PATCH v3 0/3] arm/arm64: KVM: Random selection of cache related fixes
@ 2015-01-21 18:39 Marc Zyngier
  2015-01-21 18:39 ` [PATCH v3 1/3] arm/arm64: KVM: Use set/way op trapping to track the state of the caches Marc Zyngier
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Marc Zyngier @ 2015-01-21 18:39 UTC (permalink / raw)
  To: Christoffer Dall, kvm, kvmarm; +Cc: Peter Maydell, Steve Capper

This small series fixes a number of issues that Christoffer and I have
been trying to nail down for a while, having to do with the host dying
under load (swapping), and also with the way we deal with caches in
general (and with set/way operations in particular):

- The first one changes the way we handle cache ops by set/way,
  basically turning them into VA ops for the whole memory. This allows
  platforms with system caches to boot a 32bit zImage, for example.

- The second one fixes a corner case that could happen if the guest
  used an uncached mapping (or had its caches off) while the host was
  swapping it out (and using a cache-coherent IO subsystem).

- Finally, the last one fixes the stability issue seen when the host is
  swapping, by using a kernel mapping for cache maintenance instead of
  the userspace one.

With these patches (and both the TLB invalidation and HCR fixes that
are on their way to mainline), the APM platform seems much more robust
than it previously was. Fingers crossed.

The first round of review has generated a lot of traffic about
ASID-tagged icache management for guests, but I've decided not to
address this issue as part of this series. The code is broken already,
and there isn't any virtualization capable, ASID-tagged icache core in
the wild, AFAIK. I'll try to revisit this in another series, once I
have wrapped my head around it (or someone beats me to it).

Based on 3.19-rc5, tested on Juno, X-Gene, TC-2 and Cubietruck.

Also at git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git kvm-arm64/mm-fixes-3.19

* From v2: [2]
- Reworked the algorithm that tracks the state of the guest's caches,
  as there were some cases I didn't anticipate. In the end, the
  algorithm is simpler.

* From v1: [1]
- Dropped Steve's patch after discussion with Andrea
- Refactor set/way support to avoid code duplication, better comments
- Much improved comments in patch #2, courtesy of Christoffer

[1]: http://www.spinics.net/lists/kvm-arm/msg13008.html
[2]: http://www.spinics.net/lists/kvm-arm/msg13161.html

Marc Zyngier (3):
  arm/arm64: KVM: Use set/way op trapping to track the state of the
    caches
  arm/arm64: KVM: Invalidate data cache on unmap
  arm/arm64: KVM: Use kernel mapping to perform invalidation on page
    fault

 arch/arm/include/asm/kvm_emulate.h   |  10 +++
 arch/arm/include/asm/kvm_host.h      |   3 -
 arch/arm/include/asm/kvm_mmu.h       |  77 +++++++++++++---
 arch/arm/kvm/arm.c                   |  10 ---
 arch/arm/kvm/coproc.c                |  64 +++-----------
 arch/arm/kvm/coproc_a15.c            |   2 +-
 arch/arm/kvm/coproc_a7.c             |   2 +-
 arch/arm/kvm/mmu.c                   | 164 ++++++++++++++++++++++++++++++-----
 arch/arm/kvm/trace.h                 |  39 +++++++++
 arch/arm64/include/asm/kvm_emulate.h |  10 +++
 arch/arm64/include/asm/kvm_host.h    |   3 -
 arch/arm64/include/asm/kvm_mmu.h     |  34 ++++++--
 arch/arm64/kvm/sys_regs.c            |  75 +++-------------
 13 files changed, 321 insertions(+), 172 deletions(-)

-- 
2.1.4



* [PATCH v3 1/3] arm/arm64: KVM: Use set/way op trapping to track the state of the caches
  2015-01-21 18:39 [PATCH v3 0/3] arm/arm64: KVM: Random selection of cache related fixes Marc Zyngier
@ 2015-01-21 18:39 ` Marc Zyngier
  2015-01-26 22:58   ` Christoffer Dall
  2015-01-21 18:39 ` [PATCH v3 2/3] arm/arm64: KVM: Invalidate data cache on unmap Marc Zyngier
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 12+ messages in thread
From: Marc Zyngier @ 2015-01-21 18:39 UTC (permalink / raw)
  To: Christoffer Dall, kvm, kvmarm; +Cc: Peter Maydell, Steve Capper

Trying to emulate the behaviour of set/way cache ops is fairly
pointless, as there are too many ways we can end up missing stuff.
Also, there are some system caches out there that simply ignore
set/way operations.

So instead of trying to implement them, let's convert them to VA ops,
and use them as a way to re-enable the trapping of VM ops. That way,
we can detect the point when the MMU/caches are turned off, and do
a full VM flush (which is what the guest was trying to do anyway).

This allows a 32bit zImage to boot on the APM thingy, and will
probably help bootloaders in general.
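
For reference, the policy boils down to the following (a condensed sketch
of the kvm_set_way_flush()/kvm_toggle_cache() hunk added to mmu.c below,
with the tracepoints left out):

void kvm_set_way_flush(struct kvm_vcpu *vcpu)
{
	unsigned long hcr = vcpu_get_hcr(vcpu);

	/*
	 * First S/W op since VM trapping was dropped: clean+invalidate
	 * the whole VM and start trapping the VM registers again.
	 */
	if (!(hcr & HCR_TVM)) {
		stage2_flush_vm(vcpu->kvm);
		vcpu_set_hcr(vcpu, hcr | HCR_TVM);
	}
}

void kvm_toggle_cache(struct kvm_vcpu *vcpu, bool was_enabled)
{
	bool now_enabled = vcpu_has_cache_enabled(vcpu);

	/* MMU+caches toggled either way: clean+invalidate does the trick */
	if (now_enabled != was_enabled)
		stage2_flush_vm(vcpu->kvm);

	/* Caches are back on: stop trapping VM ops until the next S/W op */
	if (now_enabled)
		vcpu_set_hcr(vcpu, vcpu_get_hcr(vcpu) & ~HCR_TVM);
}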

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm/include/asm/kvm_emulate.h   | 10 +++++
 arch/arm/include/asm/kvm_host.h      |  3 --
 arch/arm/include/asm/kvm_mmu.h       |  3 +-
 arch/arm/kvm/arm.c                   | 10 -----
 arch/arm/kvm/coproc.c                | 64 ++++++------------------------
 arch/arm/kvm/coproc_a15.c            |  2 +-
 arch/arm/kvm/coproc_a7.c             |  2 +-
 arch/arm/kvm/mmu.c                   | 70 ++++++++++++++++++++++++++++++++-
 arch/arm/kvm/trace.h                 | 39 +++++++++++++++++++
 arch/arm64/include/asm/kvm_emulate.h | 10 +++++
 arch/arm64/include/asm/kvm_host.h    |  3 --
 arch/arm64/include/asm/kvm_mmu.h     |  3 +-
 arch/arm64/kvm/sys_regs.c            | 75 +++++-------------------------------
 13 files changed, 155 insertions(+), 139 deletions(-)

diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
index 66ce176..7b01523 100644
--- a/arch/arm/include/asm/kvm_emulate.h
+++ b/arch/arm/include/asm/kvm_emulate.h
@@ -38,6 +38,16 @@ static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
 	vcpu->arch.hcr = HCR_GUEST_MASK;
 }
 
+static inline unsigned long vcpu_get_hcr(struct kvm_vcpu *vcpu)
+{
+	return vcpu->arch.hcr;
+}
+
+static inline void vcpu_set_hcr(struct kvm_vcpu *vcpu, unsigned long hcr)
+{
+	vcpu->arch.hcr = hcr;
+}
+
 static inline bool vcpu_mode_is_32bit(struct kvm_vcpu *vcpu)
 {
 	return 1;
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 254e065..04b4ea0 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -125,9 +125,6 @@ struct kvm_vcpu_arch {
 	 * Anything that is not used directly from assembly code goes
 	 * here.
 	 */
-	/* dcache set/way operation pending */
-	int last_pcpu;
-	cpumask_t require_dcache_flush;
 
 	/* Don't run the guest on this vcpu */
 	bool pause;
diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index 63e0ecc..286644c 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -190,7 +190,8 @@ static inline void coherent_cache_guest_page(struct kvm_vcpu *vcpu, hva_t hva,
 
 #define kvm_virt_to_phys(x)		virt_to_idmap((unsigned long)(x))
 
-void stage2_flush_vm(struct kvm *kvm);
+void kvm_set_way_flush(struct kvm_vcpu *vcpu);
+void kvm_toggle_cache(struct kvm_vcpu *vcpu, bool was_enabled);
 
 #endif	/* !__ASSEMBLY__ */
 
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 2d6d910..0b0d58a 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -281,15 +281,6 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	vcpu->cpu = cpu;
 	vcpu->arch.host_cpu_context = this_cpu_ptr(kvm_host_cpu_state);
 
-	/*
-	 * Check whether this vcpu requires the cache to be flushed on
-	 * this physical CPU. This is a consequence of doing dcache
-	 * operations by set/way on this vcpu. We do it here to be in
-	 * a non-preemptible section.
-	 */
-	if (cpumask_test_and_clear_cpu(cpu, &vcpu->arch.require_dcache_flush))
-		flush_cache_all(); /* We'd really want v7_flush_dcache_all() */
-
 	kvm_arm_set_running_vcpu(vcpu);
 }
 
@@ -541,7 +532,6 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
 
 		vcpu->mode = OUTSIDE_GUEST_MODE;
-		vcpu->arch.last_pcpu = smp_processor_id();
 		kvm_guest_exit();
 		trace_kvm_exit(*vcpu_pc(vcpu));
 		/*
diff --git a/arch/arm/kvm/coproc.c b/arch/arm/kvm/coproc.c
index 7928dbd..0afcc00 100644
--- a/arch/arm/kvm/coproc.c
+++ b/arch/arm/kvm/coproc.c
@@ -189,82 +189,40 @@ static bool access_l2ectlr(struct kvm_vcpu *vcpu,
 	return true;
 }
 
-/* See note at ARM ARM B1.14.4 */
+/*
+ * See note at ARMv7 ARM B1.14.4 (TL;DR: S/W ops are not easily virtualized).
+ */
 static bool access_dcsw(struct kvm_vcpu *vcpu,
 			const struct coproc_params *p,
 			const struct coproc_reg *r)
 {
-	unsigned long val;
-	int cpu;
-
 	if (!p->is_write)
 		return read_from_write_only(vcpu, p);
 
-	cpu = get_cpu();
-
-	cpumask_setall(&vcpu->arch.require_dcache_flush);
-	cpumask_clear_cpu(cpu, &vcpu->arch.require_dcache_flush);
-
-	/* If we were already preempted, take the long way around */
-	if (cpu != vcpu->arch.last_pcpu) {
-		flush_cache_all();
-		goto done;
-	}
-
-	val = *vcpu_reg(vcpu, p->Rt1);
-
-	switch (p->CRm) {
-	case 6:			/* Upgrade DCISW to DCCISW, as per HCR.SWIO */
-	case 14:		/* DCCISW */
-		asm volatile("mcr p15, 0, %0, c7, c14, 2" : : "r" (val));
-		break;
-
-	case 10:		/* DCCSW */
-		asm volatile("mcr p15, 0, %0, c7, c10, 2" : : "r" (val));
-		break;
-	}
-
-done:
-	put_cpu();
-
+	kvm_set_way_flush(vcpu);
 	return true;
 }
 
 /*
  * Generic accessor for VM registers. Only called as long as HCR_TVM
- * is set.
+ * is set.  If the guest enables the MMU, we stop trapping the VM
+ * sys_regs and leave it in complete control of the caches.
+ *
+ * Used by the cpu-specific code.
  */
 static bool access_vm_reg(struct kvm_vcpu *vcpu,
 			  const struct coproc_params *p,
 			  const struct coproc_reg *r)
 {
+	bool was_enabled = vcpu_has_cache_enabled(vcpu);
+
 	BUG_ON(!p->is_write);
 
 	vcpu->arch.cp15[r->reg] = *vcpu_reg(vcpu, p->Rt1);
 	if (p->is_64bit)
 		vcpu->arch.cp15[r->reg + 1] = *vcpu_reg(vcpu, p->Rt2);
 
-	return true;
-}
-
-/*
- * SCTLR accessor. Only called as long as HCR_TVM is set.  If the
- * guest enables the MMU, we stop trapping the VM sys_regs and leave
- * it in complete control of the caches.
- *
- * Used by the cpu-specific code.
- */
-bool access_sctlr(struct kvm_vcpu *vcpu,
-		  const struct coproc_params *p,
-		  const struct coproc_reg *r)
-{
-	access_vm_reg(vcpu, p, r);
-
-	if (vcpu_has_cache_enabled(vcpu)) {	/* MMU+Caches enabled? */
-		vcpu->arch.hcr &= ~HCR_TVM;
-		stage2_flush_vm(vcpu->kvm);
-	}
-
+	kvm_toggle_cache(vcpu, was_enabled);
 	return true;
 }
 
diff --git a/arch/arm/kvm/coproc_a15.c b/arch/arm/kvm/coproc_a15.c
index e6f4ae4..a713675 100644
--- a/arch/arm/kvm/coproc_a15.c
+++ b/arch/arm/kvm/coproc_a15.c
@@ -34,7 +34,7 @@
 static const struct coproc_reg a15_regs[] = {
 	/* SCTLR: swapped by interrupt.S. */
 	{ CRn( 1), CRm( 0), Op1( 0), Op2( 0), is32,
-			access_sctlr, reset_val, c1_SCTLR, 0x00C50078 },
+			access_vm_reg, reset_val, c1_SCTLR, 0x00C50078 },
 };
 
 static struct kvm_coproc_target_table a15_target_table = {
diff --git a/arch/arm/kvm/coproc_a7.c b/arch/arm/kvm/coproc_a7.c
index 17fc7cd..b19e46d 100644
--- a/arch/arm/kvm/coproc_a7.c
+++ b/arch/arm/kvm/coproc_a7.c
@@ -37,7 +37,7 @@
 static const struct coproc_reg a7_regs[] = {
 	/* SCTLR: swapped by interrupt.S. */
 	{ CRn( 1), CRm( 0), Op1( 0), Op2( 0), is32,
-			access_sctlr, reset_val, c1_SCTLR, 0x00C50878 },
+			access_vm_reg, reset_val, c1_SCTLR, 0x00C50878 },
 };
 
 static struct kvm_coproc_target_table a7_target_table = {
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 1dc9778..106737e 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -278,7 +278,7 @@ static void stage2_flush_memslot(struct kvm *kvm,
  * Go through the stage 2 page tables and invalidate any cache lines
  * backing memory already mapped to the VM.
  */
-void stage2_flush_vm(struct kvm *kvm)
+static void stage2_flush_vm(struct kvm *kvm)
 {
 	struct kvm_memslots *slots;
 	struct kvm_memory_slot *memslot;
@@ -1411,3 +1411,71 @@ void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
 	unmap_stage2_range(kvm, gpa, size);
 	spin_unlock(&kvm->mmu_lock);
 }
+
+/*
+ * See note at ARMv7 ARM B1.14.4 (TL;DR: S/W ops are not easily virtualized).
+ *
+ * Main problems:
+ * - S/W ops are local to a CPU (not broadcast)
+ * - We have line migration behind our back (speculation)
+ * - System caches don't support S/W at all (damn!)
+ *
+ * In the face of the above, the best we can do is to try and convert
+ * S/W ops to VA ops. Because the guest is not allowed to infer the
+ * S/W to PA mapping, it can only use S/W to nuke the whole cache,
+ * which is a rather good thing for us.
+ *
+ * Also, it is only used when turning caches on/off ("The expected
+ * usage of the cache maintenance instructions that operate by set/way
+ * is associated with the cache maintenance instructions associated
+ * with the powerdown and powerup of caches, if this is required by
+ * the implementation.").
+ *
+ * We use the following policy:
+ *
+ * - If we trap a S/W operation, we enable VM trapping to detect
+ *   caches being turned on/off, and do a full clean.
+ *
+ * - We flush the caches on both caches being turned on and off.
+ *
+ * - Once the caches are enabled, we stop trapping VM ops.
+ */
+void kvm_set_way_flush(struct kvm_vcpu *vcpu)
+{
+	unsigned long hcr = vcpu_get_hcr(vcpu);
+
+	/*
+	 * If this is the first time we do a S/W operation
+	 * (i.e. HCR_TVM not set) flush the whole memory, and set the
+	 * VM trapping.
+	 *
+	 * Otherwise, rely on the VM trapping to wait for the MMU +
+	 * Caches to be turned off. At that point, we'll be able to
+	 * clean the caches again.
+	 */
+	if (!(hcr & HCR_TVM)) {
+		trace_kvm_set_way_flush(*vcpu_pc(vcpu),
+					vcpu_has_cache_enabled(vcpu));
+		stage2_flush_vm(vcpu->kvm);
+		vcpu_set_hcr(vcpu, hcr | HCR_TVM);
+	}
+}
+
+void kvm_toggle_cache(struct kvm_vcpu *vcpu, bool was_enabled)
+{
+	bool now_enabled = vcpu_has_cache_enabled(vcpu);
+
+	/*
+	 * If switching the MMU+caches on, need to invalidate the caches.
+	 * If switching it off, need to clean the caches.
+	 * Clean + invalidate does the trick always.
+	 */
+	if (now_enabled != was_enabled)
+		stage2_flush_vm(vcpu->kvm);
+
+	/* Caches are now on, stop trapping VM ops (until a S/W op) */
+	if (now_enabled)
+		vcpu_set_hcr(vcpu, vcpu_get_hcr(vcpu) & ~HCR_TVM);
+
+	trace_kvm_toggle_cache(*vcpu_pc(vcpu), was_enabled, now_enabled);
+}
diff --git a/arch/arm/kvm/trace.h b/arch/arm/kvm/trace.h
index b1d640f..b6a6e71 100644
--- a/arch/arm/kvm/trace.h
+++ b/arch/arm/kvm/trace.h
@@ -223,6 +223,45 @@ TRACE_EVENT(kvm_hvc,
 		  __entry->vcpu_pc, __entry->r0, __entry->imm)
 );
 
+TRACE_EVENT(kvm_set_way_flush,
+	    TP_PROTO(unsigned long vcpu_pc, bool cache),
+	    TP_ARGS(vcpu_pc, cache),
+
+	    TP_STRUCT__entry(
+		    __field(	unsigned long,	vcpu_pc		)
+		    __field(	bool,		cache		)
+	    ),
+
+	    TP_fast_assign(
+		    __entry->vcpu_pc		= vcpu_pc;
+		    __entry->cache		= cache;
+	    ),
+
+	    TP_printk("S/W flush at 0x%016lx (cache %s)",
+		      __entry->vcpu_pc, __entry->cache ? "on" : "off")
+);
+
+TRACE_EVENT(kvm_toggle_cache,
+	    TP_PROTO(unsigned long vcpu_pc, bool was, bool now),
+	    TP_ARGS(vcpu_pc, was, now),
+
+	    TP_STRUCT__entry(
+		    __field(	unsigned long,	vcpu_pc		)
+		    __field(	bool,		was		)
+		    __field(	bool,		now		)
+	    ),
+
+	    TP_fast_assign(
+		    __entry->vcpu_pc		= vcpu_pc;
+		    __entry->was		= was;
+		    __entry->now		= now;
+	    ),
+
+	    TP_printk("VM op at 0x%016lx (cache was %s, now %s)",
+		      __entry->vcpu_pc, __entry->was ? "on" : "off",
+		      __entry->now ? "on" : "off")
+);
+
 #endif /* _TRACE_KVM_H */
 
 #undef TRACE_INCLUDE_PATH
diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 865a7e2..3cb4c85 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -45,6 +45,16 @@ static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
 		vcpu->arch.hcr_el2 &= ~HCR_RW;
 }
 
+static inline unsigned long vcpu_get_hcr(struct kvm_vcpu *vcpu)
+{
+	return vcpu->arch.hcr_el2;
+}
+
+static inline void vcpu_set_hcr(struct kvm_vcpu *vcpu, unsigned long hcr)
+{
+	vcpu->arch.hcr_el2 = hcr;
+}
+
 static inline unsigned long *vcpu_pc(const struct kvm_vcpu *vcpu)
 {
 	return (unsigned long *)&vcpu_gp_regs(vcpu)->regs.pc;
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 0b7dfdb..acd101a 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -116,9 +116,6 @@ struct kvm_vcpu_arch {
 	 * Anything that is not used directly from assembly code goes
 	 * here.
 	 */
-	/* dcache set/way operation pending */
-	int last_pcpu;
-	cpumask_t require_dcache_flush;
 
 	/* Don't run the guest */
 	bool pause;
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 14a74f1..92d22e9 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -260,7 +260,8 @@ static inline void coherent_cache_guest_page(struct kvm_vcpu *vcpu, hva_t hva,
 
 #define kvm_virt_to_phys(x)		__virt_to_phys((unsigned long)(x))
 
-void stage2_flush_vm(struct kvm *kvm);
+void kvm_set_way_flush(struct kvm_vcpu *vcpu);
+void kvm_toggle_cache(struct kvm_vcpu *vcpu, bool was_enabled);
 
 #endif /* __ASSEMBLY__ */
 #endif /* __ARM64_KVM_MMU_H__ */
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 3d7c2df..95daa73 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -69,68 +69,31 @@ static u32 get_ccsidr(u32 csselr)
 	return ccsidr;
 }
 
-static void do_dc_cisw(u32 val)
-{
-	asm volatile("dc cisw, %x0" : : "r" (val));
-	dsb(ish);
-}
-
-static void do_dc_csw(u32 val)
-{
-	asm volatile("dc csw, %x0" : : "r" (val));
-	dsb(ish);
-}
-
-/* See note at ARM ARM B1.14.4 */
+/*
+ * See note at ARMv7 ARM B1.14.4 (TL;DR: S/W ops are not easily virtualized).
+ */
 static bool access_dcsw(struct kvm_vcpu *vcpu,
 			const struct sys_reg_params *p,
 			const struct sys_reg_desc *r)
 {
-	unsigned long val;
-	int cpu;
-
 	if (!p->is_write)
 		return read_from_write_only(vcpu, p);
 
-	cpu = get_cpu();
-
-	cpumask_setall(&vcpu->arch.require_dcache_flush);
-	cpumask_clear_cpu(cpu, &vcpu->arch.require_dcache_flush);
-
-	/* If we were already preempted, take the long way around */
-	if (cpu != vcpu->arch.last_pcpu) {
-		flush_cache_all();
-		goto done;
-	}
-
-	val = *vcpu_reg(vcpu, p->Rt);
-
-	switch (p->CRm) {
-	case 6:			/* Upgrade DCISW to DCCISW, as per HCR.SWIO */
-	case 14:		/* DCCISW */
-		do_dc_cisw(val);
-		break;
-
-	case 10:		/* DCCSW */
-		do_dc_csw(val);
-		break;
-	}
-
-done:
-	put_cpu();
-
+	kvm_set_way_flush(vcpu);	
 	return true;
 }
 
 /*
  * Generic accessor for VM registers. Only called as long as HCR_TVM
- * is set.
+ * is set. If the guest enables the MMU, we stop trapping the VM
+ * sys_regs and leave it in complete control of the caches.
  */
 static bool access_vm_reg(struct kvm_vcpu *vcpu,
 			  const struct sys_reg_params *p,
 			  const struct sys_reg_desc *r)
 {
 	unsigned long val;
+	bool was_enabled = vcpu_has_cache_enabled(vcpu);
 
 	BUG_ON(!p->is_write);
 
@@ -143,25 +106,7 @@ static bool access_vm_reg(struct kvm_vcpu *vcpu,
 		vcpu_cp15_64_low(vcpu, r->reg) = val & 0xffffffffUL;
 	}
 
-	return true;
-}
-
-/*
- * SCTLR_EL1 accessor. Only called as long as HCR_TVM is set.  If the
- * guest enables the MMU, we stop trapping the VM sys_regs and leave
- * it in complete control of the caches.
- */
-static bool access_sctlr(struct kvm_vcpu *vcpu,
-			 const struct sys_reg_params *p,
-			 const struct sys_reg_desc *r)
-{
-	access_vm_reg(vcpu, p, r);
-
-	if (vcpu_has_cache_enabled(vcpu)) {	/* MMU+Caches enabled? */
-		vcpu->arch.hcr_el2 &= ~HCR_TVM;
-		stage2_flush_vm(vcpu->kvm);
-	}
-
+	kvm_toggle_cache(vcpu, was_enabled);
 	return true;
 }
 
@@ -377,7 +322,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 	  NULL, reset_mpidr, MPIDR_EL1 },
 	/* SCTLR_EL1 */
 	{ Op0(0b11), Op1(0b000), CRn(0b0001), CRm(0b0000), Op2(0b000),
-	  access_sctlr, reset_val, SCTLR_EL1, 0x00C50078 },
+	  access_vm_reg, reset_val, SCTLR_EL1, 0x00C50078 },
 	/* CPACR_EL1 */
 	{ Op0(0b11), Op1(0b000), CRn(0b0001), CRm(0b0000), Op2(0b010),
 	  NULL, reset_val, CPACR_EL1, 0 },
@@ -657,7 +602,7 @@ static const struct sys_reg_desc cp14_64_regs[] = {
  * register).
  */
 static const struct sys_reg_desc cp15_regs[] = {
-	{ Op1( 0), CRn( 1), CRm( 0), Op2( 0), access_sctlr, NULL, c1_SCTLR },
+	{ Op1( 0), CRn( 1), CRm( 0), Op2( 0), access_vm_reg, NULL, c1_SCTLR },
 	{ Op1( 0), CRn( 2), CRm( 0), Op2( 0), access_vm_reg, NULL, c2_TTBR0 },
 	{ Op1( 0), CRn( 2), CRm( 0), Op2( 1), access_vm_reg, NULL, c2_TTBR1 },
 	{ Op1( 0), CRn( 2), CRm( 0), Op2( 2), access_vm_reg, NULL, c2_TTBCR },
-- 
2.1.4



* [PATCH v3 2/3] arm/arm64: KVM: Invalidate data cache on unmap
  2015-01-21 18:39 [PATCH v3 0/3] arm/arm64: KVM: Random selection of cache related fixes Marc Zyngier
  2015-01-21 18:39 ` [PATCH v3 1/3] arm/arm64: KVM: Use set/way op trapping to track the state of the caches Marc Zyngier
@ 2015-01-21 18:39 ` Marc Zyngier
  2015-01-29 14:02   ` Christoffer Dall
  2015-01-21 18:39 ` [PATCH v3 3/3] arm/arm64: KVM: Use kernel mapping to perform invalidation on page fault Marc Zyngier
  2015-01-26 17:09 ` [PATCH v3 0/3] arm/arm64: KVM: Random selection of cache related fixes Andrew Jones
  3 siblings, 1 reply; 12+ messages in thread
From: Marc Zyngier @ 2015-01-21 18:39 UTC (permalink / raw)
  To: Christoffer Dall, kvm, kvmarm; +Cc: Peter Maydell, Steve Capper

Let's assume a guest has created an uncached mapping, and written
to that page. Let's also assume that the host uses a cache-coherent
IO subsystem. Let's finally assume that the host is under memory
pressure and starts to swap things out.

Before this "uncached" page is evicted, we need to make sure
we invalidate potential speculated, clean cache lines that are
sitting there, or the IO subsystem is going to swap out the
cached view, loosing the data that has been written directly
into memory.
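
The fix is to pick up the old entry before zapping it, and to
clean+invalidate the page it covered through the kernel mapping, skipping
device mappings (a condensed view of the unmap_ptes() hunk below; the
huge PMD/PUD paths do the equivalent at their respective sizes):

	if (!pte_none(*pte)) {
		pte_t old_pte = *pte;

		kvm_set_pte(pte, __pte(0));
		kvm_tlb_flush_vmid_ipa(kvm, addr);

		/* No need to invalidate the cache for device mappings */
		if ((pte_val(old_pte) & PAGE_S2_DEVICE) != PAGE_S2_DEVICE)
			kvm_flush_dcache_pte(old_pte);

		put_page(virt_to_page(pte));
	}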

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm/include/asm/kvm_mmu.h   | 31 +++++++++++++++
 arch/arm/kvm/mmu.c               | 82 ++++++++++++++++++++++++++++++++--------
 arch/arm64/include/asm/kvm_mmu.h | 18 +++++++++
 3 files changed, 116 insertions(+), 15 deletions(-)

diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index 286644c..552c31f 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -44,6 +44,7 @@
 
 #ifndef __ASSEMBLY__
 
+#include <linux/highmem.h>
 #include <asm/cacheflush.h>
 #include <asm/pgalloc.h>
 
@@ -188,6 +189,36 @@ static inline void coherent_cache_guest_page(struct kvm_vcpu *vcpu, hva_t hva,
 	}
 }
 
+static inline void __kvm_flush_dcache_pte(pte_t pte)
+{
+	void *va = kmap_atomic(pte_page(pte));
+
+	kvm_flush_dcache_to_poc(va, PAGE_SIZE);
+
+	kunmap_atomic(va);
+}
+
+static inline void __kvm_flush_dcache_pmd(pmd_t pmd)
+{
+	unsigned long size = PMD_SIZE;
+	pfn_t pfn = pmd_pfn(pmd);
+
+	while (size) {
+		void *va = kmap_atomic_pfn(pfn);
+
+		kvm_flush_dcache_to_poc(va, PAGE_SIZE);
+
+		pfn++;
+		size -= PAGE_SIZE;
+
+		kunmap_atomic(va);
+	}
+}
+
+static inline void __kvm_flush_dcache_pud(pud_t pud)
+{
+}
+
 #define kvm_virt_to_phys(x)		virt_to_idmap((unsigned long)(x))
 
 void kvm_set_way_flush(struct kvm_vcpu *vcpu);
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 106737e..78e68ab 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -58,6 +58,26 @@ static void kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa)
 		kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, kvm, ipa);
 }
 
+/*
+ * D-Cache management functions. They take the page table entries by
+ * value, as they are flushing the cache using the kernel mapping (or
+ * kmap on 32bit).
+ */
+static void kvm_flush_dcache_pte(pte_t pte)
+{
+	__kvm_flush_dcache_pte(pte);
+}
+
+static void kvm_flush_dcache_pmd(pmd_t pmd)
+{
+	__kvm_flush_dcache_pmd(pmd);
+}
+
+static void kvm_flush_dcache_pud(pud_t pud)
+{
+	__kvm_flush_dcache_pud(pud);
+}
+
 static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache,
 				  int min, int max)
 {
@@ -119,6 +139,26 @@ static void clear_pmd_entry(struct kvm *kvm, pmd_t *pmd, phys_addr_t addr)
 	put_page(virt_to_page(pmd));
 }
 
+/*
+ * Unmapping vs dcache management:
+ *
+ * If a guest maps certain memory pages as uncached, all writes will
+ * bypass the data cache and go directly to RAM.  However, the CPUs
+ * can still speculate reads (not writes) and fill cache lines with
+ * data.
+ *
+ * Those cache lines will be *clean* cache lines though, so a
+ * clean+invalidate operation is equivalent to an invalidate
+ * operation, because no cache lines are marked dirty.
+ *
+ * Those clean cache lines could be filled prior to an uncached write
+ * by the guest, and the cache coherent IO subsystem would therefore
+ * end up writing old data to disk.
+ *
+ * This is why right after unmapping a page/section and invalidating
+ * the corresponding TLBs, we call kvm_flush_dcache_p*() to make sure
+ * the IO subsystem will never hit in the cache.
+ */
 static void unmap_ptes(struct kvm *kvm, pmd_t *pmd,
 		       phys_addr_t addr, phys_addr_t end)
 {
@@ -128,9 +168,16 @@ static void unmap_ptes(struct kvm *kvm, pmd_t *pmd,
 	start_pte = pte = pte_offset_kernel(pmd, addr);
 	do {
 		if (!pte_none(*pte)) {
+			pte_t old_pte = *pte;
+
 			kvm_set_pte(pte, __pte(0));
-			put_page(virt_to_page(pte));
 			kvm_tlb_flush_vmid_ipa(kvm, addr);
+
+			/* No need to invalidate the cache for device mappings */
+			if ((pte_val(old_pte) & PAGE_S2_DEVICE) != PAGE_S2_DEVICE)
+				kvm_flush_dcache_pte(old_pte);
+
+			put_page(virt_to_page(pte));
 		}
 	} while (pte++, addr += PAGE_SIZE, addr != end);
 
@@ -149,8 +196,13 @@ static void unmap_pmds(struct kvm *kvm, pud_t *pud,
 		next = kvm_pmd_addr_end(addr, end);
 		if (!pmd_none(*pmd)) {
 			if (kvm_pmd_huge(*pmd)) {
+				pmd_t old_pmd = *pmd;
+
 				pmd_clear(pmd);
 				kvm_tlb_flush_vmid_ipa(kvm, addr);
+
+				kvm_flush_dcache_pmd(old_pmd);
+
 				put_page(virt_to_page(pmd));
 			} else {
 				unmap_ptes(kvm, pmd, addr, next);
@@ -173,8 +225,13 @@ static void unmap_puds(struct kvm *kvm, pgd_t *pgd,
 		next = kvm_pud_addr_end(addr, end);
 		if (!pud_none(*pud)) {
 			if (pud_huge(*pud)) {
+				pud_t old_pud = *pud;
+
 				pud_clear(pud);
 				kvm_tlb_flush_vmid_ipa(kvm, addr);
+
+				kvm_flush_dcache_pud(old_pud);
+
 				put_page(virt_to_page(pud));
 			} else {
 				unmap_pmds(kvm, pud, addr, next);
@@ -209,10 +266,9 @@ static void stage2_flush_ptes(struct kvm *kvm, pmd_t *pmd,
 
 	pte = pte_offset_kernel(pmd, addr);
 	do {
-		if (!pte_none(*pte)) {
-			hva_t hva = gfn_to_hva(kvm, addr >> PAGE_SHIFT);
-			kvm_flush_dcache_to_poc((void*)hva, PAGE_SIZE);
-		}
+		if (!pte_none(*pte) &&
+		    (pte_val(*pte) & PAGE_S2_DEVICE) != PAGE_S2_DEVICE)
+			kvm_flush_dcache_pte(*pte);
 	} while (pte++, addr += PAGE_SIZE, addr != end);
 }
 
@@ -226,12 +282,10 @@ static void stage2_flush_pmds(struct kvm *kvm, pud_t *pud,
 	do {
 		next = kvm_pmd_addr_end(addr, end);
 		if (!pmd_none(*pmd)) {
-			if (kvm_pmd_huge(*pmd)) {
-				hva_t hva = gfn_to_hva(kvm, addr >> PAGE_SHIFT);
-				kvm_flush_dcache_to_poc((void*)hva, PMD_SIZE);
-			} else {
+			if (kvm_pmd_huge(*pmd))
+				kvm_flush_dcache_pmd(*pmd);
+			else
 				stage2_flush_ptes(kvm, pmd, addr, next);
-			}
 		}
 	} while (pmd++, addr = next, addr != end);
 }
@@ -246,12 +300,10 @@ static void stage2_flush_puds(struct kvm *kvm, pgd_t *pgd,
 	do {
 		next = kvm_pud_addr_end(addr, end);
 		if (!pud_none(*pud)) {
-			if (pud_huge(*pud)) {
-				hva_t hva = gfn_to_hva(kvm, addr >> PAGE_SHIFT);
-				kvm_flush_dcache_to_poc((void*)hva, PUD_SIZE);
-			} else {
+			if (pud_huge(*pud))
+				kvm_flush_dcache_pud(*pud);
+			else
 				stage2_flush_pmds(kvm, pud, addr, next);
-			}
 		}
 	} while (pud++, addr = next, addr != end);
 }
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 92d22e9..cbdc236 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -258,6 +258,24 @@ static inline void coherent_cache_guest_page(struct kvm_vcpu *vcpu, hva_t hva,
 	}
 }
 
+static inline void __kvm_flush_dcache_pte(pte_t pte)
+{
+	struct page *page = pte_page(pte);
+	kvm_flush_dcache_to_poc(page_address(page), PAGE_SIZE);
+}
+
+static inline void __kvm_flush_dcache_pmd(pmd_t pmd)
+{
+	struct page *page = pmd_page(pmd);
+	kvm_flush_dcache_to_poc(page_address(page), PMD_SIZE);
+}
+
+static inline void __kvm_flush_dcache_pud(pud_t pud)
+{
+	struct page *page = pud_page(pud);
+	kvm_flush_dcache_to_poc(page_address(page), PUD_SIZE);
+}
+
 #define kvm_virt_to_phys(x)		__virt_to_phys((unsigned long)(x))
 
 void kvm_set_way_flush(struct kvm_vcpu *vcpu);
-- 
2.1.4



* [PATCH v3 3/3] arm/arm64: KVM: Use kernel mapping to perform invalidation on page fault
  2015-01-21 18:39 [PATCH v3 0/3] arm/arm64: KVM: Random selection of cache related fixes Marc Zyngier
  2015-01-21 18:39 ` [PATCH v3 1/3] arm/arm64: KVM: Use set/way op trapping to track the state of the caches Marc Zyngier
  2015-01-21 18:39 ` [PATCH v3 2/3] arm/arm64: KVM: Invalidate data cache on unmap Marc Zyngier
@ 2015-01-21 18:39 ` Marc Zyngier
  2015-01-29 14:28   ` Christoffer Dall
  2015-01-26 17:09 ` [PATCH v3 0/3] arm/arm64: KVM: Random selection of cache related fixes Andrew Jones
  3 siblings, 1 reply; 12+ messages in thread
From: Marc Zyngier @ 2015-01-21 18:39 UTC (permalink / raw)
  To: Christoffer Dall, kvm, kvmarm; +Cc: Peter Maydell, Steve Capper

When handling a fault in stage-2, we need to resync I$ and D$, just
to be sure we don't leave any old cache line behind.

That's very good, except that we do so using the *user* address.
Under heavy load (swapping like crazy), we may end up in a situation
where the page gets mapped in stage-2 while being unmapped from
userspace by another CPU.

At that point, the DC/IC instructions can generate a fault, which
we handle with kvm->mmu_lock held. The box quickly deadlocks, user
is unhappy.

Instead, perform this invalidation through the kernel mapping,
which is guaranteed to be present. The box is much happier, and so
am I.
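
On arm64 this amounts to something like the sketch below (condensed from
the __coherent_cache_guest_page() hunk in kvm_mmu.h further down); the
32bit version has to kmap_atomic each page in turn, as there is no
guaranteed linear mapping for highmem pages:

static inline void __coherent_cache_guest_page(struct kvm_vcpu *vcpu, pfn_t pfn,
					       unsigned long size,
					       bool ipa_uncached)
{
	/* Kernel alias of the page: always mapped, unlike the hva */
	void *va = page_address(pfn_to_page(pfn));

	if (!vcpu_has_cache_enabled(vcpu) || ipa_uncached)
		kvm_flush_dcache_to_poc(va, size);

	if (!icache_is_aliasing())		/* PIPT */
		flush_icache_range((unsigned long)va, (unsigned long)va + size);
	else if (!icache_is_aivivt())		/* non ASID-tagged VIVT */
		__flush_icache_all();
}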

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm/include/asm/kvm_mmu.h   | 43 +++++++++++++++++++++++++++++++---------
 arch/arm/kvm/mmu.c               | 12 +++++++----
 arch/arm64/include/asm/kvm_mmu.h | 13 +++++++-----
 3 files changed, 50 insertions(+), 18 deletions(-)

diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index 552c31f..e5614c9 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -162,13 +162,10 @@ static inline bool vcpu_has_cache_enabled(struct kvm_vcpu *vcpu)
 	return (vcpu->arch.cp15[c1_SCTLR] & 0b101) == 0b101;
 }
 
-static inline void coherent_cache_guest_page(struct kvm_vcpu *vcpu, hva_t hva,
-					     unsigned long size,
-					     bool ipa_uncached)
+static inline void __coherent_cache_guest_page(struct kvm_vcpu *vcpu, pfn_t pfn,
+				 	       unsigned long size,
+					       bool ipa_uncached)
 {
-	if (!vcpu_has_cache_enabled(vcpu) || ipa_uncached)
-		kvm_flush_dcache_to_poc((void *)hva, size);
-	
 	/*
 	 * If we are going to insert an instruction page and the icache is
 	 * either VIPT or PIPT, there is a potential problem where the host
@@ -180,10 +177,38 @@ static inline void coherent_cache_guest_page(struct kvm_vcpu *vcpu, hva_t hva,
 	 *
 	 * VIVT caches are tagged using both the ASID and the VMID and doesn't
 	 * need any kind of flushing (DDI 0406C.b - Page B3-1392).
+	 *
+	 * We need to do this through a kernel mapping (using the
+	 * user-space mapping has proved to be the wrong
+	 * solution). For that, we need to kmap one page at a time,
+	 * and iterate over the range.
 	 */
-	if (icache_is_pipt()) {
-		__cpuc_coherent_user_range(hva, hva + size);
-	} else if (!icache_is_vivt_asid_tagged()) {
+
+	bool need_flush = !vcpu_has_cache_enabled(vcpu) || ipa_uncached;
+
+	VM_BUG_ON(size & PAGE_MASK);
+
+	if (!need_flush && !icache_is_pipt())
+		goto vipt_cache;
+
+	while (size) {
+		void *va = kmap_atomic_pfn(pfn);
+
+		if (need_flush)
+			kvm_flush_dcache_to_poc(va, PAGE_SIZE);
+	
+		if (icache_is_pipt())
+			__cpuc_coherent_user_range((unsigned long)va,
+						   (unsigned long)va + PAGE_SIZE);
+
+		size -= PAGE_SIZE;
+		pfn++;
+
+		kunmap_atomic(va);
+	}
+
+vipt_cache:
+	if (!icache_is_pipt() && !icache_is_vivt_asid_tagged()) {
 		/* any kind of VIPT cache */
 		__flush_icache_all();
 	}
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 78e68ab..1366625 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -957,6 +957,12 @@ static bool kvm_is_device_pfn(unsigned long pfn)
 	return !pfn_valid(pfn);
 }
 
+static void coherent_cache_guest_page(struct kvm_vcpu *vcpu, pfn_t pfn,
+				      unsigned long size, bool uncached)
+{
+	__coherent_cache_guest_page(vcpu, pfn, size, uncached);
+}
+
 static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 			  struct kvm_memory_slot *memslot, unsigned long hva,
 			  unsigned long fault_status)
@@ -1046,8 +1052,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 			kvm_set_s2pmd_writable(&new_pmd);
 			kvm_set_pfn_dirty(pfn);
 		}
-		coherent_cache_guest_page(vcpu, hva & PMD_MASK, PMD_SIZE,
-					  fault_ipa_uncached);
+		coherent_cache_guest_page(vcpu, pfn, PMD_SIZE, fault_ipa_uncached);
 		ret = stage2_set_pmd_huge(kvm, memcache, fault_ipa, &new_pmd);
 	} else {
 		pte_t new_pte = pfn_pte(pfn, mem_type);
@@ -1055,8 +1060,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 			kvm_set_s2pte_writable(&new_pte);
 			kvm_set_pfn_dirty(pfn);
 		}
-		coherent_cache_guest_page(vcpu, hva, PAGE_SIZE,
-					  fault_ipa_uncached);
+		coherent_cache_guest_page(vcpu, pfn, PAGE_SIZE, fault_ipa_uncached);
 		ret = stage2_set_pte(kvm, memcache, fault_ipa, &new_pte,
 			pgprot_val(mem_type) == pgprot_val(PAGE_S2_DEVICE));
 	}
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index cbdc236..adcf495 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -243,15 +243,18 @@ static inline bool vcpu_has_cache_enabled(struct kvm_vcpu *vcpu)
 	return (vcpu_sys_reg(vcpu, SCTLR_EL1) & 0b101) == 0b101;
 }
 
-static inline void coherent_cache_guest_page(struct kvm_vcpu *vcpu, hva_t hva,
-					     unsigned long size,
-					     bool ipa_uncached)
+static inline void __coherent_cache_guest_page(struct kvm_vcpu *vcpu, pfn_t pfn,
+					       unsigned long size,
+					       bool ipa_uncached)
 {
+	void *va = page_address(pfn_to_page(pfn));
+
 	if (!vcpu_has_cache_enabled(vcpu) || ipa_uncached)
-		kvm_flush_dcache_to_poc((void *)hva, size);
+		kvm_flush_dcache_to_poc(va, size);
 
 	if (!icache_is_aliasing()) {		/* PIPT */
-		flush_icache_range(hva, hva + size);
+		flush_icache_range((unsigned long)va,
+				   (unsigned long)va + size);
 	} else if (!icache_is_aivivt()) {	/* non ASID-tagged VIVT */
 		/* any kind of VIPT cache */
 		__flush_icache_all();
-- 
2.1.4



* Re: [PATCH v3 0/3] arm/arm64: KVM: Random selection of cache related fixes
  2015-01-21 18:39 [PATCH v3 0/3] arm/arm64: KVM: Random selection of cache related fixes Marc Zyngier
                   ` (2 preceding siblings ...)
  2015-01-21 18:39 ` [PATCH v3 3/3] arm/arm64: KVM: Use kernel mapping to perform invalidation on page fault Marc Zyngier
@ 2015-01-26 17:09 ` Andrew Jones
  3 siblings, 0 replies; 12+ messages in thread
From: Andrew Jones @ 2015-01-26 17:09 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: Christoffer Dall, kvm, kvmarm

On Wed, Jan 21, 2015 at 06:39:45PM +0000, Marc Zyngier wrote:
> This small series fixes a number of issues that Christoffer and I have
> been trying to nail down for a while, having to do with the host dying
> under load (swapping), and also with the way we deal with caches in
> general (and with set/way operations in particular):
> 
> - The first one changes the way we handle cache ops by set/way,
>   basically turning them into VA ops for the whole memory. This allows
>   platforms with system caches to boot a 32bit zImage, for example.
> 
> - The second one fixes a corner case that could happen if the guest
>   used an uncached mapping (or had its caches off) while the host was
>   swapping it out (and using a cache-coherent IO subsystem).
> 
> - Finally, the last one fixes the stability issue seen when the host is
>   swapping, by using a kernel mapping for cache maintenance instead of
>   the userspace one.
> 
> With these patches (and both the TLB invalidation and HCR fixes that
> are on their way to mainline), the APM platform seems much more robust
> than it previously was. Fingers crossed.
> 
> The first round of review has generated a lot of traffic about
> ASID-tagged icache management for guests, but I've decided not to
> address this issue as part of this series. The code is broken already,
> and there isn't any virtualization capable, ASID-tagged icache core in
> the wild, AFAIK. I'll try to revisit this in another series, once I
> have wrapped my head around it (or someone beats me to it).
> 
> Based on 3.19-rc5, tested on Juno, X-Gene, TC-2 and Cubietruck.
> 
> Also at git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git kvm-arm64/mm-fixes-3.19
> 
> * From v2: [2]
> - Reworked the algorithm that tracks the state of the guest's caches,
>   as there were some cases I didn't anticipate. In the end, the
>   algorithm is simpler.
> 
> * From v1: [1]
> - Dropped Steve's patch after discussion with Andrea
> - Refactor set/way support to avoid code duplication, better comments
> - Much improved comments in patch #2, courtesy of Christoffer
> 
> [1]: http://www.spinics.net/lists/kvm-arm/msg13008.html
> [2]: http://www.spinics.net/lists/kvm-arm/msg13161.html
> 
> Marc Zyngier (3):
>   arm/arm64: KVM: Use set/way op trapping to track the state of the
>     caches
>   arm/arm64: KVM: Invalidate data cache on unmap
>   arm/arm64: KVM: Use kernel mapping to perform invalidation on page
>     fault
> 
>  arch/arm/include/asm/kvm_emulate.h   |  10 +++
>  arch/arm/include/asm/kvm_host.h      |   3 -
>  arch/arm/include/asm/kvm_mmu.h       |  77 +++++++++++++---
>  arch/arm/kvm/arm.c                   |  10 ---
>  arch/arm/kvm/coproc.c                |  64 +++-----------
>  arch/arm/kvm/coproc_a15.c            |   2 +-
>  arch/arm/kvm/coproc_a7.c             |   2 +-
>  arch/arm/kvm/mmu.c                   | 164 ++++++++++++++++++++++++++++++-----
>  arch/arm/kvm/trace.h                 |  39 +++++++++
>  arch/arm64/include/asm/kvm_emulate.h |  10 +++
>  arch/arm64/include/asm/kvm_host.h    |   3 -
>  arch/arm64/include/asm/kvm_mmu.h     |  34 ++++++--
>  arch/arm64/kvm/sys_regs.c            |  75 +++-------------
>  13 files changed, 321 insertions(+), 172 deletions(-)
> 
> -- 
> 2.1.4
> 

Hi Marc,

checkpatch found some whitespace issues (not just the false alarms
that trace.h files generate). Also a loosing vs. losing typo in 2/3's
commit message.

Thanks,
Drew (trivial comments) Jones


* Re: [PATCH v3 1/3] arm/arm64: KVM: Use set/way op trapping to track the state of the caches
  2015-01-21 18:39 ` [PATCH v3 1/3] arm/arm64: KVM: Use set/way op trapping to track the state of the caches Marc Zyngier
@ 2015-01-26 22:58   ` Christoffer Dall
  2015-01-27 11:21     ` Marc Zyngier
  0 siblings, 1 reply; 12+ messages in thread
From: Christoffer Dall @ 2015-01-26 22:58 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: kvm, kvmarm, Peter Maydell, Steve Capper

On Wed, Jan 21, 2015 at 06:39:46PM +0000, Marc Zyngier wrote:
> Trying to emulate the behaviour of set/way cache ops is fairly
> pointless, as there are too many ways we can end up missing stuff.
> Also, there are some system caches out there that simply ignore
> set/way operations.
> 
> So instead of trying to implement them, let's convert them to VA ops,
> and use them as a way to re-enable the trapping of VM ops. That way,
> we can detect the point when the MMU/caches are turned off, and do
> a full VM flush (which is what the guest was trying to do anyway).
> 
> This allows a 32bit zImage to boot on the APM thingy, and will
> probably help bootloaders in general.
> 
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

This had some conflicts with dirty page logging.  I fixed it up here,
and also removed some trailing white space and mixed spaces/tabs that
patch complained about:

http://git.linaro.org/people/christoffer.dall/linux-kvm-arm.git mm-fixes

> ---
>  arch/arm/include/asm/kvm_emulate.h   | 10 +++++
>  arch/arm/include/asm/kvm_host.h      |  3 --
>  arch/arm/include/asm/kvm_mmu.h       |  3 +-
>  arch/arm/kvm/arm.c                   | 10 -----
>  arch/arm/kvm/coproc.c                | 64 ++++++------------------------
>  arch/arm/kvm/coproc_a15.c            |  2 +-
>  arch/arm/kvm/coproc_a7.c             |  2 +-
>  arch/arm/kvm/mmu.c                   | 70 ++++++++++++++++++++++++++++++++-
>  arch/arm/kvm/trace.h                 | 39 +++++++++++++++++++
>  arch/arm64/include/asm/kvm_emulate.h | 10 +++++
>  arch/arm64/include/asm/kvm_host.h    |  3 --
>  arch/arm64/include/asm/kvm_mmu.h     |  3 +-
>  arch/arm64/kvm/sys_regs.c            | 75 +++++-------------------------------
>  13 files changed, 155 insertions(+), 139 deletions(-)
> 
> diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
> index 66ce176..7b01523 100644
> --- a/arch/arm/include/asm/kvm_emulate.h
> +++ b/arch/arm/include/asm/kvm_emulate.h
> @@ -38,6 +38,16 @@ static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
>  	vcpu->arch.hcr = HCR_GUEST_MASK;
>  }
>  
> +static inline unsigned long vcpu_get_hcr(struct kvm_vcpu *vcpu)
> +{
> +	return vcpu->arch.hcr;
> +}
> +
> +static inline void vcpu_set_hcr(struct kvm_vcpu *vcpu, unsigned long hcr)
> +{
> +	vcpu->arch.hcr = hcr;
> +}
> +
>  static inline bool vcpu_mode_is_32bit(struct kvm_vcpu *vcpu)
>  {
>  	return 1;
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index 254e065..04b4ea0 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -125,9 +125,6 @@ struct kvm_vcpu_arch {
>  	 * Anything that is not used directly from assembly code goes
>  	 * here.
>  	 */
> -	/* dcache set/way operation pending */
> -	int last_pcpu;
> -	cpumask_t require_dcache_flush;
>  
>  	/* Don't run the guest on this vcpu */
>  	bool pause;
> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
> index 63e0ecc..286644c 100644
> --- a/arch/arm/include/asm/kvm_mmu.h
> +++ b/arch/arm/include/asm/kvm_mmu.h
> @@ -190,7 +190,8 @@ static inline void coherent_cache_guest_page(struct kvm_vcpu *vcpu, hva_t hva,
>  
>  #define kvm_virt_to_phys(x)		virt_to_idmap((unsigned long)(x))
>  
> -void stage2_flush_vm(struct kvm *kvm);
> +void kvm_set_way_flush(struct kvm_vcpu *vcpu);
> +void kvm_toggle_cache(struct kvm_vcpu *vcpu, bool was_enabled);
>  
>  #endif	/* !__ASSEMBLY__ */
>  
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index 2d6d910..0b0d58a 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -281,15 +281,6 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>  	vcpu->cpu = cpu;
>  	vcpu->arch.host_cpu_context = this_cpu_ptr(kvm_host_cpu_state);
>  
> -	/*
> -	 * Check whether this vcpu requires the cache to be flushed on
> -	 * this physical CPU. This is a consequence of doing dcache
> -	 * operations by set/way on this vcpu. We do it here to be in
> -	 * a non-preemptible section.
> -	 */
> -	if (cpumask_test_and_clear_cpu(cpu, &vcpu->arch.require_dcache_flush))
> -		flush_cache_all(); /* We'd really want v7_flush_dcache_all() */
> -
>  	kvm_arm_set_running_vcpu(vcpu);
>  }
>  
> @@ -541,7 +532,6 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  		ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
>  
>  		vcpu->mode = OUTSIDE_GUEST_MODE;
> -		vcpu->arch.last_pcpu = smp_processor_id();
>  		kvm_guest_exit();
>  		trace_kvm_exit(*vcpu_pc(vcpu));
>  		/*
> diff --git a/arch/arm/kvm/coproc.c b/arch/arm/kvm/coproc.c
> index 7928dbd..0afcc00 100644
> --- a/arch/arm/kvm/coproc.c
> +++ b/arch/arm/kvm/coproc.c
> @@ -189,82 +189,40 @@ static bool access_l2ectlr(struct kvm_vcpu *vcpu,
>  	return true;
>  }
>  
> -/* See note at ARM ARM B1.14.4 */
> +/*
> + * See note at ARMv7 ARM B1.14.4 (TL;DR: S/W ops are not easily virtualized).
> + */
>  static bool access_dcsw(struct kvm_vcpu *vcpu,
>  			const struct coproc_params *p,
>  			const struct coproc_reg *r)
>  {
> -	unsigned long val;
> -	int cpu;
> -
>  	if (!p->is_write)
>  		return read_from_write_only(vcpu, p);
>  
> -	cpu = get_cpu();
> -
> -	cpumask_setall(&vcpu->arch.require_dcache_flush);
> -	cpumask_clear_cpu(cpu, &vcpu->arch.require_dcache_flush);
> -
> -	/* If we were already preempted, take the long way around */
> -	if (cpu != vcpu->arch.last_pcpu) {
> -		flush_cache_all();
> -		goto done;
> -	}
> -
> -	val = *vcpu_reg(vcpu, p->Rt1);
> -
> -	switch (p->CRm) {
> -	case 6:			/* Upgrade DCISW to DCCISW, as per HCR.SWIO */
> -	case 14:		/* DCCISW */
> -		asm volatile("mcr p15, 0, %0, c7, c14, 2" : : "r" (val));
> -		break;
> -
> -	case 10:		/* DCCSW */
> -		asm volatile("mcr p15, 0, %0, c7, c10, 2" : : "r" (val));
> -		break;
> -	}
> -
> -done:
> -	put_cpu();
> -
> +	kvm_set_way_flush(vcpu);
>  	return true;
>  }
>  
>  /*
>   * Generic accessor for VM registers. Only called as long as HCR_TVM
> - * is set.
> + * is set.  If the guest enables the MMU, we stop trapping the VM
> + * sys_regs and leave it in complete control of the caches.
> + *
> + * Used by the cpu-specific code.
>   */
>  static bool access_vm_reg(struct kvm_vcpu *vcpu,
>  			  const struct coproc_params *p,
>  			  const struct coproc_reg *r)
>  {
> +	bool was_enabled = vcpu_has_cache_enabled(vcpu);
> +
>  	BUG_ON(!p->is_write);
>  
>  	vcpu->arch.cp15[r->reg] = *vcpu_reg(vcpu, p->Rt1);
>  	if (p->is_64bit)
>  		vcpu->arch.cp15[r->reg + 1] = *vcpu_reg(vcpu, p->Rt2);
>  
> -	return true;
> -}
> -
> -/*
> - * SCTLR accessor. Only called as long as HCR_TVM is set.  If the
> - * guest enables the MMU, we stop trapping the VM sys_regs and leave
> - * it in complete control of the caches.
> - *
> - * Used by the cpu-specific code.
> - */
> -bool access_sctlr(struct kvm_vcpu *vcpu,
> -		  const struct coproc_params *p,
> -		  const struct coproc_reg *r)
> -{

I think you left in a prototype in arch/arm/kvm/coproc.h

> -	access_vm_reg(vcpu, p, r);
> -
> -	if (vcpu_has_cache_enabled(vcpu)) {	/* MMU+Caches enabled? */
> -		vcpu->arch.hcr &= ~HCR_TVM;
> -		stage2_flush_vm(vcpu->kvm);
> -	}
> -
> +	kvm_toggle_cache(vcpu, was_enabled);
>  	return true;
>  }
>  
> diff --git a/arch/arm/kvm/coproc_a15.c b/arch/arm/kvm/coproc_a15.c
> index e6f4ae4..a713675 100644
> --- a/arch/arm/kvm/coproc_a15.c
> +++ b/arch/arm/kvm/coproc_a15.c
> @@ -34,7 +34,7 @@
>  static const struct coproc_reg a15_regs[] = {
>  	/* SCTLR: swapped by interrupt.S. */
>  	{ CRn( 1), CRm( 0), Op1( 0), Op2( 0), is32,
> -			access_sctlr, reset_val, c1_SCTLR, 0x00C50078 },
> +			access_vm_reg, reset_val, c1_SCTLR, 0x00C50078 },
>  };
>  
>  static struct kvm_coproc_target_table a15_target_table = {
> diff --git a/arch/arm/kvm/coproc_a7.c b/arch/arm/kvm/coproc_a7.c
> index 17fc7cd..b19e46d 100644
> --- a/arch/arm/kvm/coproc_a7.c
> +++ b/arch/arm/kvm/coproc_a7.c
> @@ -37,7 +37,7 @@
>  static const struct coproc_reg a7_regs[] = {
>  	/* SCTLR: swapped by interrupt.S. */
>  	{ CRn( 1), CRm( 0), Op1( 0), Op2( 0), is32,
> -			access_sctlr, reset_val, c1_SCTLR, 0x00C50878 },
> +			access_vm_reg, reset_val, c1_SCTLR, 0x00C50878 },

I'm confused how you tested this; it doesn't seem to compile for me
(access_vm_reg is a static function).

>  };
>  
>  static struct kvm_coproc_target_table a7_target_table = {
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index 1dc9778..106737e 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -278,7 +278,7 @@ static void stage2_flush_memslot(struct kvm *kvm,
>   * Go through the stage 2 page tables and invalidate any cache lines
>   * backing memory already mapped to the VM.
>   */
> -void stage2_flush_vm(struct kvm *kvm)
> +static void stage2_flush_vm(struct kvm *kvm)
>  {
>  	struct kvm_memslots *slots;
>  	struct kvm_memory_slot *memslot;
> @@ -1411,3 +1411,71 @@ void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
>  	unmap_stage2_range(kvm, gpa, size);
>  	spin_unlock(&kvm->mmu_lock);
>  }
> +
> +/*
> + * See note at ARMv7 ARM B1.14.4 (TL;DR: S/W ops are not easily virtualized).
> + *
> + * Main problems:
> + * - S/W ops are local to a CPU (not broadcast)
> + * - We have line migration behind our back (speculation)
> + * - System caches don't support S/W at all (damn!)
> + *
> + * In the face of the above, the best we can do is to try and convert
> + * S/W ops to VA ops. Because the guest is not allowed to infer the
> + * S/W to PA mapping, it can only use S/W to nuke the whole cache,
> + * which is a rather good thing for us.
> + *
> + * Also, it is only used when turning caches on/off ("The expected
> + * usage of the cache maintenance instructions that operate by set/way
> + * is associated with the cache maintenance instructions associated
> + * with the powerdown and powerup of caches, if this is required by
> + * the implementation.").
> + *
> + * We use the following policy:
> + *
> + * - If we trap a S/W operation, we enable VM trapping to detect
> + *   caches being turned on/off, and do a full clean.
> + *
> + * - We flush the caches on both caches being turned on and off.
> + *
> + * - Once the caches are enabled, we stop trapping VM ops.
> + */
> +void kvm_set_way_flush(struct kvm_vcpu *vcpu)
> +{
> +	unsigned long hcr = vcpu_get_hcr(vcpu);
> +
> +	/*
> +	 * If this is the first time we do a S/W operation
> +	 * (i.e. HCR_TVM not set) flush the whole memory, and set the
> +	 * VM trapping.

What about when the guest boots? Wouldn't it be using S/W operations to
flush the whole cache before turning it on?  But we ignore this case now
because we're already trapping VM regs, so this is deferred until the
caches are turned on.

However, when we are turning off caches, we both flush the cache when
doing the actual S/W operation and when the caches are then turned off.
Why?

> +	 *
> +	 * Otherwise, rely on the VM trapping to wait for the MMU +
> +	 * Caches to be turned off. At that point, we'll be able to
> +	 * clean the caches again.
> +	 */
> +	if (!(hcr & HCR_TVM)) {
> +		trace_kvm_set_way_flush(*vcpu_pc(vcpu),
> +					vcpu_has_cache_enabled(vcpu));
> +		stage2_flush_vm(vcpu->kvm);
> +		vcpu_set_hcr(vcpu, hcr | HCR_TVM);
> +	}
> +}
> +
> +void kvm_toggle_cache(struct kvm_vcpu *vcpu, bool was_enabled)
> +{
> +	bool now_enabled = vcpu_has_cache_enabled(vcpu);
> +
> +	/*
> +	 * If switching the MMU+caches on, need to invalidate the caches.
> +	 * If switching it off, need to clean the caches.
> +	 * Clean + invalidate does the trick always.
> +	 */
> +	if (now_enabled != was_enabled)
> +		stage2_flush_vm(vcpu->kvm);
> +
> +	/* Caches are now on, stop trapping VM ops (until a S/W op) */
> +	if (now_enabled)
> +		vcpu_set_hcr(vcpu, vcpu_get_hcr(vcpu) & ~HCR_TVM);
> +
> +	trace_kvm_toggle_cache(*vcpu_pc(vcpu), was_enabled, now_enabled);
> +}
> diff --git a/arch/arm/kvm/trace.h b/arch/arm/kvm/trace.h
> index b1d640f..b6a6e71 100644
> --- a/arch/arm/kvm/trace.h
> +++ b/arch/arm/kvm/trace.h
> @@ -223,6 +223,45 @@ TRACE_EVENT(kvm_hvc,
>  		  __entry->vcpu_pc, __entry->r0, __entry->imm)
>  );
>  
> +TRACE_EVENT(kvm_set_way_flush,
> +	    TP_PROTO(unsigned long vcpu_pc, bool cache),
> +	    TP_ARGS(vcpu_pc, cache),
> +
> +	    TP_STRUCT__entry(
> +		    __field(	unsigned long,	vcpu_pc		)
> +		    __field(	bool,		cache		)
> +	    ),
> +
> +	    TP_fast_assign(
> +		    __entry->vcpu_pc		= vcpu_pc;
> +		    __entry->cache		= cache;
> +	    ),
> +
> +	    TP_printk("S/W flush at 0x%016lx (cache %s)",
> +		      __entry->vcpu_pc, __entry->cache ? "on" : "off")
> +);
> +
> +TRACE_EVENT(kvm_toggle_cache,
> +	    TP_PROTO(unsigned long vcpu_pc, bool was, bool now),
> +	    TP_ARGS(vcpu_pc, was, now),
> +
> +	    TP_STRUCT__entry(
> +		    __field(	unsigned long,	vcpu_pc		)
> +		    __field(	bool,		was		)
> +		    __field(	bool,		now		)
> +	    ),
> +
> +	    TP_fast_assign(
> +		    __entry->vcpu_pc		= vcpu_pc;
> +		    __entry->was		= was;
> +		    __entry->now		= now;
> +	    ),
> +
> +	    TP_printk("VM op at 0x%016lx (cache was %s, now %s)",
> +		      __entry->vcpu_pc, __entry->was ? "on" : "off",
> +		      __entry->now ? "on" : "off")
> +);
> +
>  #endif /* _TRACE_KVM_H */
>  
>  #undef TRACE_INCLUDE_PATH
> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> index 865a7e2..3cb4c85 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -45,6 +45,16 @@ static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
>  		vcpu->arch.hcr_el2 &= ~HCR_RW;
>  }
>  
> +static inline unsigned long vcpu_get_hcr(struct kvm_vcpu *vcpu)
> +{
> +	return vcpu->arch.hcr_el2;
> +}
> +
> +static inline void vcpu_set_hcr(struct kvm_vcpu *vcpu, unsigned long hcr)
> +{
> +	vcpu->arch.hcr_el2 = hcr;
> +}
> +
>  static inline unsigned long *vcpu_pc(const struct kvm_vcpu *vcpu)
>  {
>  	return (unsigned long *)&vcpu_gp_regs(vcpu)->regs.pc;
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 0b7dfdb..acd101a 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -116,9 +116,6 @@ struct kvm_vcpu_arch {
>  	 * Anything that is not used directly from assembly code goes
>  	 * here.
>  	 */
> -	/* dcache set/way operation pending */
> -	int last_pcpu;
> -	cpumask_t require_dcache_flush;
>  
>  	/* Don't run the guest */
>  	bool pause;
> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> index 14a74f1..92d22e9 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -260,7 +260,8 @@ static inline void coherent_cache_guest_page(struct kvm_vcpu *vcpu, hva_t hva,
>  
>  #define kvm_virt_to_phys(x)		__virt_to_phys((unsigned long)(x))
>  
> -void stage2_flush_vm(struct kvm *kvm);
> +void kvm_set_way_flush(struct kvm_vcpu *vcpu);
> +void kvm_toggle_cache(struct kvm_vcpu *vcpu, bool was_enabled);
>  
>  #endif /* __ASSEMBLY__ */
>  #endif /* __ARM64_KVM_MMU_H__ */
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 3d7c2df..95daa73 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -69,68 +69,31 @@ static u32 get_ccsidr(u32 csselr)
>  	return ccsidr;
>  }
>  
> -static void do_dc_cisw(u32 val)
> -{
> -	asm volatile("dc cisw, %x0" : : "r" (val));
> -	dsb(ish);
> -}
> -
> -static void do_dc_csw(u32 val)
> -{
> -	asm volatile("dc csw, %x0" : : "r" (val));
> -	dsb(ish);
> -}
> -
> -/* See note at ARM ARM B1.14.4 */
> +/*
> + * See note at ARMv7 ARM B1.14.4 (TL;DR: S/W ops are not easily virtualized).
> + */
>  static bool access_dcsw(struct kvm_vcpu *vcpu,
>  			const struct sys_reg_params *p,
>  			const struct sys_reg_desc *r)
>  {
> -	unsigned long val;
> -	int cpu;
> -
>  	if (!p->is_write)
>  		return read_from_write_only(vcpu, p);
>  
> -	cpu = get_cpu();
> -
> -	cpumask_setall(&vcpu->arch.require_dcache_flush);
> -	cpumask_clear_cpu(cpu, &vcpu->arch.require_dcache_flush);
> -
> -	/* If we were already preempted, take the long way around */
> -	if (cpu != vcpu->arch.last_pcpu) {
> -		flush_cache_all();
> -		goto done;
> -	}
> -
> -	val = *vcpu_reg(vcpu, p->Rt);
> -
> -	switch (p->CRm) {
> -	case 6:			/* Upgrade DCISW to DCCISW, as per HCR.SWIO */
> -	case 14:		/* DCCISW */
> -		do_dc_cisw(val);
> -		break;
> -
> -	case 10:		/* DCCSW */
> -		do_dc_csw(val);
> -		break;
> -	}
> -
> -done:
> -	put_cpu();
> -
> +	kvm_set_way_flush(vcpu);	
>  	return true;
>  }
>  
>  /*
>   * Generic accessor for VM registers. Only called as long as HCR_TVM
> - * is set.
> + * is set. If the guest enables the MMU, we stop trapping the VM
> + * sys_regs and leave it in complete control of the caches.
>   */
>  static bool access_vm_reg(struct kvm_vcpu *vcpu,
>  			  const struct sys_reg_params *p,
>  			  const struct sys_reg_desc *r)
>  {
>  	unsigned long val;
> +	bool was_enabled = vcpu_has_cache_enabled(vcpu);
>  
>  	BUG_ON(!p->is_write);
>  
> @@ -143,25 +106,7 @@ static bool access_vm_reg(struct kvm_vcpu *vcpu,
>  		vcpu_cp15_64_low(vcpu, r->reg) = val & 0xffffffffUL;
>  	}
>  
> -	return true;
> -}
> -
> -/*
> - * SCTLR_EL1 accessor. Only called as long as HCR_TVM is set.  If the
> - * guest enables the MMU, we stop trapping the VM sys_regs and leave
> - * it in complete control of the caches.
> - */
> -static bool access_sctlr(struct kvm_vcpu *vcpu,
> -			 const struct sys_reg_params *p,
> -			 const struct sys_reg_desc *r)
> -{
> -	access_vm_reg(vcpu, p, r);
> -
> -	if (vcpu_has_cache_enabled(vcpu)) {	/* MMU+Caches enabled? */
> -		vcpu->arch.hcr_el2 &= ~HCR_TVM;
> -		stage2_flush_vm(vcpu->kvm);
> -	}
> -
> +	kvm_toggle_cache(vcpu, was_enabled);
>  	return true;
>  }
>  
> @@ -377,7 +322,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
>  	  NULL, reset_mpidr, MPIDR_EL1 },
>  	/* SCTLR_EL1 */
>  	{ Op0(0b11), Op1(0b000), CRn(0b0001), CRm(0b0000), Op2(0b000),
> -	  access_sctlr, reset_val, SCTLR_EL1, 0x00C50078 },
> +	  access_vm_reg, reset_val, SCTLR_EL1, 0x00C50078 },
>  	/* CPACR_EL1 */
>  	{ Op0(0b11), Op1(0b000), CRn(0b0001), CRm(0b0000), Op2(0b010),
>  	  NULL, reset_val, CPACR_EL1, 0 },
> @@ -657,7 +602,7 @@ static const struct sys_reg_desc cp14_64_regs[] = {
>   * register).
>   */
>  static const struct sys_reg_desc cp15_regs[] = {
> -	{ Op1( 0), CRn( 1), CRm( 0), Op2( 0), access_sctlr, NULL, c1_SCTLR },
> +	{ Op1( 0), CRn( 1), CRm( 0), Op2( 0), access_vm_reg, NULL, c1_SCTLR },
>  	{ Op1( 0), CRn( 2), CRm( 0), Op2( 0), access_vm_reg, NULL, c2_TTBR0 },
>  	{ Op1( 0), CRn( 2), CRm( 0), Op2( 1), access_vm_reg, NULL, c2_TTBR1 },
>  	{ Op1( 0), CRn( 2), CRm( 0), Op2( 2), access_vm_reg, NULL, c2_TTBCR },
> -- 
> 2.1.4
> 


* Re: [PATCH v3 1/3] arm/arm64: KVM: Use set/way op trapping to track the state of the caches
  2015-01-26 22:58   ` Christoffer Dall
@ 2015-01-27 11:21     ` Marc Zyngier
  2015-01-27 13:17       ` Christoffer Dall
  0 siblings, 1 reply; 12+ messages in thread
From: Marc Zyngier @ 2015-01-27 11:21 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: kvm, kvmarm, Peter Maydell, Steve Capper

On 26/01/15 22:58, Christoffer Dall wrote:
> On Wed, Jan 21, 2015 at 06:39:46PM +0000, Marc Zyngier wrote:
>> Trying to emulate the behaviour of set/way cache ops is fairly
>> pointless, as there are too many ways we can end up missing stuff.
>> Also, there are some system caches out there that simply ignore
>> set/way operations.
>>
>> So instead of trying to implement them, let's convert them to VA ops,
>> and use them as a way to re-enable the trapping of VM ops. That way,
>> we can detect the point when the MMU/caches are turned off, and do
>> a full VM flush (which is what the guest was trying to do anyway).
>>
>> This allows a 32bit zImage to boot on the APM thingy, and will
>> probably help bootloaders in general.
>>
>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> 
> This had some conflicts with dirty page logging.  I fixed it up here,
> and also removed some trailing white space and mixed spaces/tabs that
> patch complained about:
> 
> http://git.linaro.org/people/christoffer.dall/linux-kvm-arm.git mm-fixes

Thanks for doing so.

>> ---
>>  arch/arm/include/asm/kvm_emulate.h   | 10 +++++
>>  arch/arm/include/asm/kvm_host.h      |  3 --
>>  arch/arm/include/asm/kvm_mmu.h       |  3 +-
>>  arch/arm/kvm/arm.c                   | 10 -----
>>  arch/arm/kvm/coproc.c                | 64 ++++++------------------------
>>  arch/arm/kvm/coproc_a15.c            |  2 +-
>>  arch/arm/kvm/coproc_a7.c             |  2 +-
>>  arch/arm/kvm/mmu.c                   | 70 ++++++++++++++++++++++++++++++++-
>>  arch/arm/kvm/trace.h                 | 39 +++++++++++++++++++
>>  arch/arm64/include/asm/kvm_emulate.h | 10 +++++
>>  arch/arm64/include/asm/kvm_host.h    |  3 --
>>  arch/arm64/include/asm/kvm_mmu.h     |  3 +-
>>  arch/arm64/kvm/sys_regs.c            | 75 +++++-------------------------------
>>  13 files changed, 155 insertions(+), 139 deletions(-)
>>
>> diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
>> index 66ce176..7b01523 100644
>> --- a/arch/arm/include/asm/kvm_emulate.h
>> +++ b/arch/arm/include/asm/kvm_emulate.h
>> @@ -38,6 +38,16 @@ static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
>>       vcpu->arch.hcr = HCR_GUEST_MASK;
>>  }
>>
>> +static inline unsigned long vcpu_get_hcr(struct kvm_vcpu *vcpu)
>> +{
>> +     return vcpu->arch.hcr;
>> +}
>> +
>> +static inline void vcpu_set_hcr(struct kvm_vcpu *vcpu, unsigned long hcr)
>> +{
>> +     vcpu->arch.hcr = hcr;
>> +}
>> +
>>  static inline bool vcpu_mode_is_32bit(struct kvm_vcpu *vcpu)
>>  {
>>       return 1;
>> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
>> index 254e065..04b4ea0 100644
>> --- a/arch/arm/include/asm/kvm_host.h
>> +++ b/arch/arm/include/asm/kvm_host.h
>> @@ -125,9 +125,6 @@ struct kvm_vcpu_arch {
>>        * Anything that is not used directly from assembly code goes
>>        * here.
>>        */
>> -     /* dcache set/way operation pending */
>> -     int last_pcpu;
>> -     cpumask_t require_dcache_flush;
>>
>>       /* Don't run the guest on this vcpu */
>>       bool pause;
>> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
>> index 63e0ecc..286644c 100644
>> --- a/arch/arm/include/asm/kvm_mmu.h
>> +++ b/arch/arm/include/asm/kvm_mmu.h
>> @@ -190,7 +190,8 @@ static inline void coherent_cache_guest_page(struct kvm_vcpu *vcpu, hva_t hva,
>>
>>  #define kvm_virt_to_phys(x)          virt_to_idmap((unsigned long)(x))
>>
>> -void stage2_flush_vm(struct kvm *kvm);
>> +void kvm_set_way_flush(struct kvm_vcpu *vcpu);
>> +void kvm_toggle_cache(struct kvm_vcpu *vcpu, bool was_enabled);
>>
>>  #endif       /* !__ASSEMBLY__ */
>>
>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>> index 2d6d910..0b0d58a 100644
>> --- a/arch/arm/kvm/arm.c
>> +++ b/arch/arm/kvm/arm.c
>> @@ -281,15 +281,6 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>>       vcpu->cpu = cpu;
>>       vcpu->arch.host_cpu_context = this_cpu_ptr(kvm_host_cpu_state);
>>
>> -     /*
>> -      * Check whether this vcpu requires the cache to be flushed on
>> -      * this physical CPU. This is a consequence of doing dcache
>> -      * operations by set/way on this vcpu. We do it here to be in
>> -      * a non-preemptible section.
>> -      */
>> -     if (cpumask_test_and_clear_cpu(cpu, &vcpu->arch.require_dcache_flush))
>> -             flush_cache_all(); /* We'd really want v7_flush_dcache_all() */
>> -
>>       kvm_arm_set_running_vcpu(vcpu);
>>  }
>>
>> @@ -541,7 +532,6 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>>               ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
>>
>>               vcpu->mode = OUTSIDE_GUEST_MODE;
>> -             vcpu->arch.last_pcpu = smp_processor_id();
>>               kvm_guest_exit();
>>               trace_kvm_exit(*vcpu_pc(vcpu));
>>               /*
>> diff --git a/arch/arm/kvm/coproc.c b/arch/arm/kvm/coproc.c
>> index 7928dbd..0afcc00 100644
>> --- a/arch/arm/kvm/coproc.c
>> +++ b/arch/arm/kvm/coproc.c
>> @@ -189,82 +189,40 @@ static bool access_l2ectlr(struct kvm_vcpu *vcpu,
>>       return true;
>>  }
>>
>> -/* See note at ARM ARM B1.14.4 */
>> +/*
>> + * See note at ARMv7 ARM B1.14.4 (TL;DR: S/W ops are not easily virtualized).
>> + */
>>  static bool access_dcsw(struct kvm_vcpu *vcpu,
>>                       const struct coproc_params *p,
>>                       const struct coproc_reg *r)
>>  {
>> -     unsigned long val;
>> -     int cpu;
>> -
>>       if (!p->is_write)
>>               return read_from_write_only(vcpu, p);
>>
>> -     cpu = get_cpu();
>> -
>> -     cpumask_setall(&vcpu->arch.require_dcache_flush);
>> -     cpumask_clear_cpu(cpu, &vcpu->arch.require_dcache_flush);
>> -
>> -     /* If we were already preempted, take the long way around */
>> -     if (cpu != vcpu->arch.last_pcpu) {
>> -             flush_cache_all();
>> -             goto done;
>> -     }
>> -
>> -     val = *vcpu_reg(vcpu, p->Rt1);
>> -
>> -     switch (p->CRm) {
>> -     case 6:                 /* Upgrade DCISW to DCCISW, as per HCR.SWIO */
>> -     case 14:                /* DCCISW */
>> -             asm volatile("mcr p15, 0, %0, c7, c14, 2" : : "r" (val));
>> -             break;
>> -
>> -     case 10:                /* DCCSW */
>> -             asm volatile("mcr p15, 0, %0, c7, c10, 2" : : "r" (val));
>> -             break;
>> -     }
>> -
>> -done:
>> -     put_cpu();
>> -
>> +     kvm_set_way_flush(vcpu);
>>       return true;
>>  }
>>
>>  /*
>>   * Generic accessor for VM registers. Only called as long as HCR_TVM
>> - * is set.
>> + * is set.  If the guest enables the MMU, we stop trapping the VM
>> + * sys_regs and leave it in complete control of the caches.
>> + *
>> + * Used by the cpu-specific code.
>>   */
>>  static bool access_vm_reg(struct kvm_vcpu *vcpu,
>>                         const struct coproc_params *p,
>>                         const struct coproc_reg *r)
>>  {
>> +     bool was_enabled = vcpu_has_cache_enabled(vcpu);
>> +
>>       BUG_ON(!p->is_write);
>>
>>       vcpu->arch.cp15[r->reg] = *vcpu_reg(vcpu, p->Rt1);
>>       if (p->is_64bit)
>>               vcpu->arch.cp15[r->reg + 1] = *vcpu_reg(vcpu, p->Rt2);
>>
>> -     return true;
>> -}
>> -
>> -/*
>> - * SCTLR accessor. Only called as long as HCR_TVM is set.  If the
>> - * guest enables the MMU, we stop trapping the VM sys_regs and leave
>> - * it in complete control of the caches.
>> - *
>> - * Used by the cpu-specific code.
>> - */
>> -bool access_sctlr(struct kvm_vcpu *vcpu,
>> -               const struct coproc_params *p,
>> -               const struct coproc_reg *r)
>> -{
> 
> I think you left in a prototype in arch/arm/kvm/coproc.h
> 
>> -     access_vm_reg(vcpu, p, r);
>> -
>> -     if (vcpu_has_cache_enabled(vcpu)) {     /* MMU+Caches enabled? */
>> -             vcpu->arch.hcr &= ~HCR_TVM;
>> -             stage2_flush_vm(vcpu->kvm);
>> -     }
>> -
>> +     kvm_toggle_cache(vcpu, was_enabled);
>>       return true;
>>  }
>>
>> diff --git a/arch/arm/kvm/coproc_a15.c b/arch/arm/kvm/coproc_a15.c
>> index e6f4ae4..a713675 100644
>> --- a/arch/arm/kvm/coproc_a15.c
>> +++ b/arch/arm/kvm/coproc_a15.c
>> @@ -34,7 +34,7 @@
>>  static const struct coproc_reg a15_regs[] = {
>>       /* SCTLR: swapped by interrupt.S. */
>>       { CRn( 1), CRm( 0), Op1( 0), Op2( 0), is32,
>> -                     access_sctlr, reset_val, c1_SCTLR, 0x00C50078 },
>> +                     access_vm_reg, reset_val, c1_SCTLR, 0x00C50078 },
>>  };
>>
>>  static struct kvm_coproc_target_table a15_target_table = {
>> diff --git a/arch/arm/kvm/coproc_a7.c b/arch/arm/kvm/coproc_a7.c
>> index 17fc7cd..b19e46d 100644
>> --- a/arch/arm/kvm/coproc_a7.c
>> +++ b/arch/arm/kvm/coproc_a7.c
>> @@ -37,7 +37,7 @@
>>  static const struct coproc_reg a7_regs[] = {
>>       /* SCTLR: swapped by interrupt.S. */
>>       { CRn( 1), CRm( 0), Op1( 0), Op2( 0), is32,
>> -                     access_sctlr, reset_val, c1_SCTLR, 0x00C50878 },
>> +                     access_vm_reg, reset_val, c1_SCTLR, 0x00C50878 },
> 
> I'm confused how you tested this, this doesn't seem to compile for me
> (access_vm_reg is a static function).

That's worrying. It looks like I dropped a fixup while rebasing the
series. Nothing major, but still... I'll fix that.
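
The missing fixup is presumably just making the accessor visible to the
CPU-specific files again, i.e. dropping the 'static' on access_vm_reg in
coproc.c and replacing the stale access_sctlr prototype in coproc.h with
something along these lines (a sketch based on the compile error, not the
actual respin):

bool access_vm_reg(struct kvm_vcpu *vcpu,
		   const struct coproc_params *p,
		   const struct coproc_reg *r);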

>>  };
>>
>>  static struct kvm_coproc_target_table a7_target_table = {
>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>> index 1dc9778..106737e 100644
>> --- a/arch/arm/kvm/mmu.c
>> +++ b/arch/arm/kvm/mmu.c
>> @@ -278,7 +278,7 @@ static void stage2_flush_memslot(struct kvm *kvm,
>>   * Go through the stage 2 page tables and invalidate any cache lines
>>   * backing memory already mapped to the VM.
>>   */
>> -void stage2_flush_vm(struct kvm *kvm)
>> +static void stage2_flush_vm(struct kvm *kvm)
>>  {
>>       struct kvm_memslots *slots;
>>       struct kvm_memory_slot *memslot;
>> @@ -1411,3 +1411,71 @@ void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
>>       unmap_stage2_range(kvm, gpa, size);
>>       spin_unlock(&kvm->mmu_lock);
>>  }
>> +
>> +/*
>> + * See note at ARMv7 ARM B1.14.4 (TL;DR: S/W ops are not easily virtualized).
>> + *
>> + * Main problems:
>> + * - S/W ops are local to a CPU (not broadcast)
>> + * - We have line migration behind our back (speculation)
>> + * - System caches don't support S/W at all (damn!)
>> + *
>> + * In the face of the above, the best we can do is to try and convert
>> + * S/W ops to VA ops. Because the guest is not allowed to infer the
>> + * S/W to PA mapping, it can only use S/W to nuke the whole cache,
>> + * which is a rather good thing for us.
>> + *
>> + * Also, it is only used when turning caches on/off ("The expected
>> + * usage of the cache maintenance instructions that operate by set/way
>> + * is associated with the cache maintenance instructions associated
>> + * with the powerdown and powerup of caches, if this is required by
>> + * the implementation.").
>> + *
>> + * We use the following policy:
>> + *
>> + * - If we trap a S/W operation, we enable VM trapping to detect
>> + *   caches being turned on/off, and do a full clean.
>> + *
>> + * - We flush the caches on both caches being turned on and off.
>> + *
>> + * - Once the caches are enabled, we stop trapping VM ops.
>> + */
>> +void kvm_set_way_flush(struct kvm_vcpu *vcpu)
>> +{
>> +     unsigned long hcr = vcpu_get_hcr(vcpu);
>> +
>> +     /*
>> +      * If this is the first time we do a S/W operation
>> +      * (i.e. HCR_TVM not set) flush the whole memory, and set the
>> +      * VM trapping.
> 
> what about when the guest boots, wouldn't it be using S/W operations to
> flush the whole cache before turning it on?  But we ignore this case now
> because we're already trapping VM regs so this is deferred until caches
> are turned on.

Exactly. Doing S/W ops with caches off can always be postponed until we
turn it on (this is the point where it actually matters).

> However, when we are turning off caches, we both flush the cache when
> doing the actual S/W operation and when the caches are then turned off.
> Why?

Think of it as a "belt and braces" measure. The guest tries to do
something *at that point*, and we should respect that. It doesn't mean
it makes sense from our point of view though. This is what the 32bit
kernel does, BTW (have a look at early_cachepolicy -- Mark Rutland is
trying to get rid of this).

So you can think of these as two different, rather unrelated events:
- Guest does S/W ops with caches on: we flush, knowing that this is a
best effort thing, probably doomed, but hey...
- Guest turns its caches off: we flush, knowing that this is the right
time to do it.
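
In code, that policy boils down to roughly the following (a sketch of the
two helpers from this patch, minus tracing; see the mmu.c hunks for the
real thing):

void kvm_set_way_flush(struct kvm_vcpu *vcpu)
{
	unsigned long hcr = vcpu_get_hcr(vcpu);

	/*
	 * First S/W op while VM trapping is still off: flush the whole
	 * guest and start trapping the VM registers so we can catch
	 * the caches being turned on/off later on.
	 */
	if (!(hcr & HCR_TVM)) {
		stage2_flush_vm(vcpu->kvm);
		vcpu_set_hcr(vcpu, hcr | HCR_TVM);
	}
}

void kvm_toggle_cache(struct kvm_vcpu *vcpu, bool was_enabled)
{
	bool now_enabled = vcpu_has_cache_enabled(vcpu);

	/* Caches flipped on or off: clean+invalidate the whole guest */
	if (now_enabled != was_enabled)
		stage2_flush_vm(vcpu->kvm);

	/* Caches back on: stop trapping VM ops until the next S/W op */
	if (now_enabled)
		vcpu_set_hcr(vcpu, vcpu_get_hcr(vcpu) & ~HCR_TVM);
}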

Hope this helps,

	M.
-- 
Jazz is not dead. It just smells funny...


* Re: [PATCH v3 1/3] arm/arm64: KVM: Use set/way op trapping to track the state of the caches
  2015-01-27 11:21     ` Marc Zyngier
@ 2015-01-27 13:17       ` Christoffer Dall
  2015-01-27 13:44         ` Marc Zyngier
  0 siblings, 1 reply; 12+ messages in thread
From: Christoffer Dall @ 2015-01-27 13:17 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: kvm, kvmarm, Peter Maydell, Steve Capper

On Tue, Jan 27, 2015 at 11:21:38AM +0000, Marc Zyngier wrote:
> On 26/01/15 22:58, Christoffer Dall wrote:
> > On Wed, Jan 21, 2015 at 06:39:46PM +0000, Marc Zyngier wrote:
> >> Trying to emulate the behaviour of set/way cache ops is fairly
> >> pointless, as there are too many ways we can end up missing stuff.
> >> Also, there are some system caches out there that simply ignore
> >> set/way operations.
> >>
> >> So instead of trying to implement them, let's convert them to VA ops,
> >> and use them as a way to re-enable the trapping of VM ops. That way,
> >> we can detect the point when the MMU/caches are turned off, and do
> >> a full VM flush (which is what the guest was trying to do anyway).
> >>
> >> This allows a 32bit zImage to boot on the APM thingy, and will
> >> probably help bootloaders in general.
> >>
> >> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> > 
> > This had some conflicts with dirty page logging.  I fixed it up here,
> > and also removed some trailing white space and mixed spaces/tabs that
> > patch complained about:
> > 
> > http://git.linaro.org/people/christoffer.dall/linux-kvm-arm.git mm-fixes
> 
> Thanks for doing so.
> 
> >> ---
> >>  arch/arm/include/asm/kvm_emulate.h   | 10 +++++
> >>  arch/arm/include/asm/kvm_host.h      |  3 --
> >>  arch/arm/include/asm/kvm_mmu.h       |  3 +-
> >>  arch/arm/kvm/arm.c                   | 10 -----
> >>  arch/arm/kvm/coproc.c                | 64 ++++++------------------------
> >>  arch/arm/kvm/coproc_a15.c            |  2 +-
> >>  arch/arm/kvm/coproc_a7.c             |  2 +-
> >>  arch/arm/kvm/mmu.c                   | 70 ++++++++++++++++++++++++++++++++-
> >>  arch/arm/kvm/trace.h                 | 39 +++++++++++++++++++
> >>  arch/arm64/include/asm/kvm_emulate.h | 10 +++++
> >>  arch/arm64/include/asm/kvm_host.h    |  3 --
> >>  arch/arm64/include/asm/kvm_mmu.h     |  3 +-
> >>  arch/arm64/kvm/sys_regs.c            | 75 +++++-------------------------------
> >>  13 files changed, 155 insertions(+), 139 deletions(-)
> >>
> >> diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
> >> index 66ce176..7b01523 100644
> >> --- a/arch/arm/include/asm/kvm_emulate.h
> >> +++ b/arch/arm/include/asm/kvm_emulate.h
> >> @@ -38,6 +38,16 @@ static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
> >>       vcpu->arch.hcr = HCR_GUEST_MASK;
> >>  }
> >>
> >> +static inline unsigned long vcpu_get_hcr(struct kvm_vcpu *vcpu)
> >> +{
> >> +     return vcpu->arch.hcr;
> >> +}
> >> +
> >> +static inline void vcpu_set_hcr(struct kvm_vcpu *vcpu, unsigned long hcr)
> >> +{
> >> +     vcpu->arch.hcr = hcr;
> >> +}
> >> +
> >>  static inline bool vcpu_mode_is_32bit(struct kvm_vcpu *vcpu)
> >>  {
> >>       return 1;
> >> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> >> index 254e065..04b4ea0 100644
> >> --- a/arch/arm/include/asm/kvm_host.h
> >> +++ b/arch/arm/include/asm/kvm_host.h
> >> @@ -125,9 +125,6 @@ struct kvm_vcpu_arch {
> >>        * Anything that is not used directly from assembly code goes
> >>        * here.
> >>        */
> >> -     /* dcache set/way operation pending */
> >> -     int last_pcpu;
> >> -     cpumask_t require_dcache_flush;
> >>
> >>       /* Don't run the guest on this vcpu */
> >>       bool pause;
> >> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
> >> index 63e0ecc..286644c 100644
> >> --- a/arch/arm/include/asm/kvm_mmu.h
> >> +++ b/arch/arm/include/asm/kvm_mmu.h
> >> @@ -190,7 +190,8 @@ static inline void coherent_cache_guest_page(struct kvm_vcpu *vcpu, hva_t hva,
> >>
> >>  #define kvm_virt_to_phys(x)          virt_to_idmap((unsigned long)(x))
> >>
> >> -void stage2_flush_vm(struct kvm *kvm);
> >> +void kvm_set_way_flush(struct kvm_vcpu *vcpu);
> >> +void kvm_toggle_cache(struct kvm_vcpu *vcpu, bool was_enabled);
> >>
> >>  #endif       /* !__ASSEMBLY__ */
> >>
> >> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> >> index 2d6d910..0b0d58a 100644
> >> --- a/arch/arm/kvm/arm.c
> >> +++ b/arch/arm/kvm/arm.c
> >> @@ -281,15 +281,6 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
> >>       vcpu->cpu = cpu;
> >>       vcpu->arch.host_cpu_context = this_cpu_ptr(kvm_host_cpu_state);
> >>
> >> -     /*
> >> -      * Check whether this vcpu requires the cache to be flushed on
> >> -      * this physical CPU. This is a consequence of doing dcache
> >> -      * operations by set/way on this vcpu. We do it here to be in
> >> -      * a non-preemptible section.
> >> -      */
> >> -     if (cpumask_test_and_clear_cpu(cpu, &vcpu->arch.require_dcache_flush))
> >> -             flush_cache_all(); /* We'd really want v7_flush_dcache_all() */
> >> -
> >>       kvm_arm_set_running_vcpu(vcpu);
> >>  }
> >>
> >> @@ -541,7 +532,6 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> >>               ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
> >>
> >>               vcpu->mode = OUTSIDE_GUEST_MODE;
> >> -             vcpu->arch.last_pcpu = smp_processor_id();
> >>               kvm_guest_exit();
> >>               trace_kvm_exit(*vcpu_pc(vcpu));
> >>               /*
> >> diff --git a/arch/arm/kvm/coproc.c b/arch/arm/kvm/coproc.c
> >> index 7928dbd..0afcc00 100644
> >> --- a/arch/arm/kvm/coproc.c
> >> +++ b/arch/arm/kvm/coproc.c
> >> @@ -189,82 +189,40 @@ static bool access_l2ectlr(struct kvm_vcpu *vcpu,
> >>       return true;
> >>  }
> >>
> >> -/* See note at ARM ARM B1.14.4 */
> >> +/*
> >> + * See note at ARMv7 ARM B1.14.4 (TL;DR: S/W ops are not easily virtualized).
> >> + */
> >>  static bool access_dcsw(struct kvm_vcpu *vcpu,
> >>                       const struct coproc_params *p,
> >>                       const struct coproc_reg *r)
> >>  {
> >> -     unsigned long val;
> >> -     int cpu;
> >> -
> >>       if (!p->is_write)
> >>               return read_from_write_only(vcpu, p);
> >>
> >> -     cpu = get_cpu();
> >> -
> >> -     cpumask_setall(&vcpu->arch.require_dcache_flush);
> >> -     cpumask_clear_cpu(cpu, &vcpu->arch.require_dcache_flush);
> >> -
> >> -     /* If we were already preempted, take the long way around */
> >> -     if (cpu != vcpu->arch.last_pcpu) {
> >> -             flush_cache_all();
> >> -             goto done;
> >> -     }
> >> -
> >> -     val = *vcpu_reg(vcpu, p->Rt1);
> >> -
> >> -     switch (p->CRm) {
> >> -     case 6:                 /* Upgrade DCISW to DCCISW, as per HCR.SWIO */
> >> -     case 14:                /* DCCISW */
> >> -             asm volatile("mcr p15, 0, %0, c7, c14, 2" : : "r" (val));
> >> -             break;
> >> -
> >> -     case 10:                /* DCCSW */
> >> -             asm volatile("mcr p15, 0, %0, c7, c10, 2" : : "r" (val));
> >> -             break;
> >> -     }
> >> -
> >> -done:
> >> -     put_cpu();
> >> -
> >> +     kvm_set_way_flush(vcpu);
> >>       return true;
> >>  }
> >>
> >>  /*
> >>   * Generic accessor for VM registers. Only called as long as HCR_TVM
> >> - * is set.
> >> + * is set.  If the guest enables the MMU, we stop trapping the VM
> >> + * sys_regs and leave it in complete control of the caches.
> >> + *
> >> + * Used by the cpu-specific code.
> >>   */
> >>  static bool access_vm_reg(struct kvm_vcpu *vcpu,
> >>                         const struct coproc_params *p,
> >>                         const struct coproc_reg *r)
> >>  {
> >> +     bool was_enabled = vcpu_has_cache_enabled(vcpu);
> >> +
> >>       BUG_ON(!p->is_write);
> >>
> >>       vcpu->arch.cp15[r->reg] = *vcpu_reg(vcpu, p->Rt1);
> >>       if (p->is_64bit)
> >>               vcpu->arch.cp15[r->reg + 1] = *vcpu_reg(vcpu, p->Rt2);
> >>
> >> -     return true;
> >> -}
> >> -
> >> -/*
> >> - * SCTLR accessor. Only called as long as HCR_TVM is set.  If the
> >> - * guest enables the MMU, we stop trapping the VM sys_regs and leave
> >> - * it in complete control of the caches.
> >> - *
> >> - * Used by the cpu-specific code.
> >> - */
> >> -bool access_sctlr(struct kvm_vcpu *vcpu,
> >> -               const struct coproc_params *p,
> >> -               const struct coproc_reg *r)
> >> -{
> > 
> > I think you left in a prototype in arch/arm/kvm/coproc.h
> > 
> >> -     access_vm_reg(vcpu, p, r);
> >> -
> >> -     if (vcpu_has_cache_enabled(vcpu)) {     /* MMU+Caches enabled? */
> >> -             vcpu->arch.hcr &= ~HCR_TVM;
> >> -             stage2_flush_vm(vcpu->kvm);
> >> -     }
> >> -
> >> +     kvm_toggle_cache(vcpu, was_enabled);
> >>       return true;
> >>  }
> >>
> >> diff --git a/arch/arm/kvm/coproc_a15.c b/arch/arm/kvm/coproc_a15.c
> >> index e6f4ae4..a713675 100644
> >> --- a/arch/arm/kvm/coproc_a15.c
> >> +++ b/arch/arm/kvm/coproc_a15.c
> >> @@ -34,7 +34,7 @@
> >>  static const struct coproc_reg a15_regs[] = {
> >>       /* SCTLR: swapped by interrupt.S. */
> >>       { CRn( 1), CRm( 0), Op1( 0), Op2( 0), is32,
> >> -                     access_sctlr, reset_val, c1_SCTLR, 0x00C50078 },
> >> +                     access_vm_reg, reset_val, c1_SCTLR, 0x00C50078 },
> >>  };
> >>
> >>  static struct kvm_coproc_target_table a15_target_table = {
> >> diff --git a/arch/arm/kvm/coproc_a7.c b/arch/arm/kvm/coproc_a7.c
> >> index 17fc7cd..b19e46d 100644
> >> --- a/arch/arm/kvm/coproc_a7.c
> >> +++ b/arch/arm/kvm/coproc_a7.c
> >> @@ -37,7 +37,7 @@
> >>  static const struct coproc_reg a7_regs[] = {
> >>       /* SCTLR: swapped by interrupt.S. */
> >>       { CRn( 1), CRm( 0), Op1( 0), Op2( 0), is32,
> >> -                     access_sctlr, reset_val, c1_SCTLR, 0x00C50878 },
> >> +                     access_vm_reg, reset_val, c1_SCTLR, 0x00C50878 },
> > 
> > I'm confused how you tested this, this doesn't seem to compile for me
> > (access_vm_reg is a static function).
> 
> That's worrying. It looks like I dropped a fixup while rebasing the
> series. Nothing major, but still... I'll fix that.
> 

no worries, just wanted to make sure we have tested the right thing.

> >>  };
> >>
> >>  static struct kvm_coproc_target_table a7_target_table = {
> >> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> >> index 1dc9778..106737e 100644
> >> --- a/arch/arm/kvm/mmu.c
> >> +++ b/arch/arm/kvm/mmu.c
> >> @@ -278,7 +278,7 @@ static void stage2_flush_memslot(struct kvm *kvm,
> >>   * Go through the stage 2 page tables and invalidate any cache lines
> >>   * backing memory already mapped to the VM.
> >>   */
> >> -void stage2_flush_vm(struct kvm *kvm)
> >> +static void stage2_flush_vm(struct kvm *kvm)
> >>  {
> >>       struct kvm_memslots *slots;
> >>       struct kvm_memory_slot *memslot;
> >> @@ -1411,3 +1411,71 @@ void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
> >>       unmap_stage2_range(kvm, gpa, size);
> >>       spin_unlock(&kvm->mmu_lock);
> >>  }
> >> +
> >> +/*
> >> + * See note at ARMv7 ARM B1.14.4 (TL;DR: S/W ops are not easily virtualized).
> >> + *
> >> + * Main problems:
> >> + * - S/W ops are local to a CPU (not broadcast)
> >> + * - We have line migration behind our back (speculation)
> >> + * - System caches don't support S/W at all (damn!)
> >> + *
> >> + * In the face of the above, the best we can do is to try and convert
> >> + * S/W ops to VA ops. Because the guest is not allowed to infer the
> >> + * S/W to PA mapping, it can only use S/W to nuke the whole cache,
> >> + * which is a rather good thing for us.
> >> + *
> >> + * Also, it is only used when turning caches on/off ("The expected
> >> + * usage of the cache maintenance instructions that operate by set/way
> >> + * is associated with the cache maintenance instructions associated
> >> + * with the powerdown and powerup of caches, if this is required by
> >> + * the implementation.").
> >> + *
> >> + * We use the following policy:
> >> + *
> >> + * - If we trap a S/W operation, we enable VM trapping to detect
> >> + *   caches being turned on/off, and do a full clean.
> >> + *
> >> + * - We flush the caches on both caches being turned on and off.
> >> + *
> >> + * - Once the caches are enabled, we stop trapping VM ops.
> >> + */
> >> +void kvm_set_way_flush(struct kvm_vcpu *vcpu)
> >> +{
> >> +     unsigned long hcr = vcpu_get_hcr(vcpu);
> >> +
> >> +     /*
> >> +      * If this is the first time we do a S/W operation
> >> +      * (i.e. HCR_TVM not set) flush the whole memory, and set the
> >> +      * VM trapping.
> > 
> > what about when the guest boots, wouldn't it be using S/W operations to
> > flush the whole cache before turning it on?  But we ignore this case now
> > because we're already trapping VM regs so this is deferred until caches
> > are turned on.
> 
> Exactly. Doing S/W ops with caches off can always be postponed until we
> turn it on (this is the point where it actually matters).
> 
> > However, when we are turning off caches, we both flush the cache when
> > doing the actual S/W operation and when the caches are then turned off.
> > Why?
> 
> Think of it as a "belt and braces" measure. The guest tries to do
> something *at that point*, and we should respect that. It doesn't mean
> it makes sense from our point of view though. This is what the 32bit
> kernel does, BTW (have a look at early_cachepolicy -- Mark Rutland is
> trying to get rid of this).

ok, and we don't need the "belt and braces" when the MMU is off because
the guest is not accessing the caches anyhow?

So shouldn't we be detecting the guest MMU state instead? But we found
this to be unstable before (for a still unknown reason), so we instead
rely on always having TVM trapping set when the MMU is off?

So is there any conceivable guest (UEFI or something else) that could
correctly be turning off the MMU without calling invalidate by set/way
and creating a situation that we don't catch now?  Or is that not a
problem because in the worst case we'll just be flushing too often?

What about the opposite case, if we have trapping set, but we actually
do need to flush the caches, how are we sure this never happens?

> 
> So you can think of these as two different, rather unrelated events:
> - Guest does S/W ops with caches on: we flush, knowing that this is a
> best effort thing, probably doomed, but hey...

ok, so Linux is basically doing something architecturally invalid?

> - Guest turns its caches off: we flush, knowing that this is the right
> time to do it.
> 
> Hope this helps,
> 

Yes, a bit, but this does feel incredibly brittle, though I know it may
be the best option we have...

Thanks,
-Christoffer


* Re: [PATCH v3 1/3] arm/arm64: KVM: Use set/way op trapping to track the state of the caches
  2015-01-27 13:17       ` Christoffer Dall
@ 2015-01-27 13:44         ` Marc Zyngier
  2015-01-29 13:40           ` Christoffer Dall
  0 siblings, 1 reply; 12+ messages in thread
From: Marc Zyngier @ 2015-01-27 13:44 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: kvm, kvmarm, Peter Maydell, Steve Capper

On 27/01/15 13:17, Christoffer Dall wrote:
> On Tue, Jan 27, 2015 at 11:21:38AM +0000, Marc Zyngier wrote:
>> On 26/01/15 22:58, Christoffer Dall wrote:
>>> On Wed, Jan 21, 2015 at 06:39:46PM +0000, Marc Zyngier wrote:
>>>> Trying to emulate the behaviour of set/way cache ops is fairly
>>>> pointless, as there are too many ways we can end up missing stuff.
>>>> Also, there are some system caches out there that simply ignore
>>>> set/way operations.
>>>>
>>>> So instead of trying to implement them, let's convert them to VA ops,
>>>> and use them as a way to re-enable the trapping of VM ops. That way,
>>>> we can detect the point when the MMU/caches are turned off, and do
>>>> a full VM flush (which is what the guest was trying to do anyway).
>>>>
>>>> This allows a 32bit zImage to boot on the APM thingy, and will
>>>> probably help bootloaders in general.
>>>>
>>>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>>>
>>> This had some conflicts with dirty page logging.  I fixed it up here,
>>> and also removed some trailing white space and mixed spaces/tabs that
>>> patch complained about:
>>>
>>> http://git.linaro.org/people/christoffer.dall/linux-kvm-arm.git mm-fixes
>>
>> Thanks for doing so.
>>
>>>> ---
>>>>  arch/arm/include/asm/kvm_emulate.h   | 10 +++++
>>>>  arch/arm/include/asm/kvm_host.h      |  3 --
>>>>  arch/arm/include/asm/kvm_mmu.h       |  3 +-
>>>>  arch/arm/kvm/arm.c                   | 10 -----
>>>>  arch/arm/kvm/coproc.c                | 64 ++++++------------------------
>>>>  arch/arm/kvm/coproc_a15.c            |  2 +-
>>>>  arch/arm/kvm/coproc_a7.c             |  2 +-
>>>>  arch/arm/kvm/mmu.c                   | 70 ++++++++++++++++++++++++++++++++-
>>>>  arch/arm/kvm/trace.h                 | 39 +++++++++++++++++++
>>>>  arch/arm64/include/asm/kvm_emulate.h | 10 +++++
>>>>  arch/arm64/include/asm/kvm_host.h    |  3 --
>>>>  arch/arm64/include/asm/kvm_mmu.h     |  3 +-
>>>>  arch/arm64/kvm/sys_regs.c            | 75 +++++-------------------------------
>>>>  13 files changed, 155 insertions(+), 139 deletions(-)
>>>>
>>>> diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
>>>> index 66ce176..7b01523 100644
>>>> --- a/arch/arm/include/asm/kvm_emulate.h
>>>> +++ b/arch/arm/include/asm/kvm_emulate.h
>>>> @@ -38,6 +38,16 @@ static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
>>>>       vcpu->arch.hcr = HCR_GUEST_MASK;
>>>>  }
>>>>
>>>> +static inline unsigned long vcpu_get_hcr(struct kvm_vcpu *vcpu)
>>>> +{
>>>> +     return vcpu->arch.hcr;
>>>> +}
>>>> +
>>>> +static inline void vcpu_set_hcr(struct kvm_vcpu *vcpu, unsigned long hcr)
>>>> +{
>>>> +     vcpu->arch.hcr = hcr;
>>>> +}
>>>> +
>>>>  static inline bool vcpu_mode_is_32bit(struct kvm_vcpu *vcpu)
>>>>  {
>>>>       return 1;
>>>> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
>>>> index 254e065..04b4ea0 100644
>>>> --- a/arch/arm/include/asm/kvm_host.h
>>>> +++ b/arch/arm/include/asm/kvm_host.h
>>>> @@ -125,9 +125,6 @@ struct kvm_vcpu_arch {
>>>>        * Anything that is not used directly from assembly code goes
>>>>        * here.
>>>>        */
>>>> -     /* dcache set/way operation pending */
>>>> -     int last_pcpu;
>>>> -     cpumask_t require_dcache_flush;
>>>>
>>>>       /* Don't run the guest on this vcpu */
>>>>       bool pause;
>>>> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
>>>> index 63e0ecc..286644c 100644
>>>> --- a/arch/arm/include/asm/kvm_mmu.h
>>>> +++ b/arch/arm/include/asm/kvm_mmu.h
>>>> @@ -190,7 +190,8 @@ static inline void coherent_cache_guest_page(struct kvm_vcpu *vcpu, hva_t hva,
>>>>
>>>>  #define kvm_virt_to_phys(x)          virt_to_idmap((unsigned long)(x))
>>>>
>>>> -void stage2_flush_vm(struct kvm *kvm);
>>>> +void kvm_set_way_flush(struct kvm_vcpu *vcpu);
>>>> +void kvm_toggle_cache(struct kvm_vcpu *vcpu, bool was_enabled);
>>>>
>>>>  #endif       /* !__ASSEMBLY__ */
>>>>
>>>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>>>> index 2d6d910..0b0d58a 100644
>>>> --- a/arch/arm/kvm/arm.c
>>>> +++ b/arch/arm/kvm/arm.c
>>>> @@ -281,15 +281,6 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>>>>       vcpu->cpu = cpu;
>>>>       vcpu->arch.host_cpu_context = this_cpu_ptr(kvm_host_cpu_state);
>>>>
>>>> -     /*
>>>> -      * Check whether this vcpu requires the cache to be flushed on
>>>> -      * this physical CPU. This is a consequence of doing dcache
>>>> -      * operations by set/way on this vcpu. We do it here to be in
>>>> -      * a non-preemptible section.
>>>> -      */
>>>> -     if (cpumask_test_and_clear_cpu(cpu, &vcpu->arch.require_dcache_flush))
>>>> -             flush_cache_all(); /* We'd really want v7_flush_dcache_all() */
>>>> -
>>>>       kvm_arm_set_running_vcpu(vcpu);
>>>>  }
>>>>
>>>> @@ -541,7 +532,6 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>>>>               ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
>>>>
>>>>               vcpu->mode = OUTSIDE_GUEST_MODE;
>>>> -             vcpu->arch.last_pcpu = smp_processor_id();
>>>>               kvm_guest_exit();
>>>>               trace_kvm_exit(*vcpu_pc(vcpu));
>>>>               /*
>>>> diff --git a/arch/arm/kvm/coproc.c b/arch/arm/kvm/coproc.c
>>>> index 7928dbd..0afcc00 100644
>>>> --- a/arch/arm/kvm/coproc.c
>>>> +++ b/arch/arm/kvm/coproc.c
>>>> @@ -189,82 +189,40 @@ static bool access_l2ectlr(struct kvm_vcpu *vcpu,
>>>>       return true;
>>>>  }
>>>>
>>>> -/* See note at ARM ARM B1.14.4 */
>>>> +/*
>>>> + * See note at ARMv7 ARM B1.14.4 (TL;DR: S/W ops are not easily virtualized).
>>>> + */
>>>>  static bool access_dcsw(struct kvm_vcpu *vcpu,
>>>>                       const struct coproc_params *p,
>>>>                       const struct coproc_reg *r)
>>>>  {
>>>> -     unsigned long val;
>>>> -     int cpu;
>>>> -
>>>>       if (!p->is_write)
>>>>               return read_from_write_only(vcpu, p);
>>>>
>>>> -     cpu = get_cpu();
>>>> -
>>>> -     cpumask_setall(&vcpu->arch.require_dcache_flush);
>>>> -     cpumask_clear_cpu(cpu, &vcpu->arch.require_dcache_flush);
>>>> -
>>>> -     /* If we were already preempted, take the long way around */
>>>> -     if (cpu != vcpu->arch.last_pcpu) {
>>>> -             flush_cache_all();
>>>> -             goto done;
>>>> -     }
>>>> -
>>>> -     val = *vcpu_reg(vcpu, p->Rt1);
>>>> -
>>>> -     switch (p->CRm) {
>>>> -     case 6:                 /* Upgrade DCISW to DCCISW, as per HCR.SWIO */
>>>> -     case 14:                /* DCCISW */
>>>> -             asm volatile("mcr p15, 0, %0, c7, c14, 2" : : "r" (val));
>>>> -             break;
>>>> -
>>>> -     case 10:                /* DCCSW */
>>>> -             asm volatile("mcr p15, 0, %0, c7, c10, 2" : : "r" (val));
>>>> -             break;
>>>> -     }
>>>> -
>>>> -done:
>>>> -     put_cpu();
>>>> -
>>>> +     kvm_set_way_flush(vcpu);
>>>>       return true;
>>>>  }
>>>>
>>>>  /*
>>>>   * Generic accessor for VM registers. Only called as long as HCR_TVM
>>>> - * is set.
>>>> + * is set.  If the guest enables the MMU, we stop trapping the VM
>>>> + * sys_regs and leave it in complete control of the caches.
>>>> + *
>>>> + * Used by the cpu-specific code.
>>>>   */
>>>>  static bool access_vm_reg(struct kvm_vcpu *vcpu,
>>>>                         const struct coproc_params *p,
>>>>                         const struct coproc_reg *r)
>>>>  {
>>>> +     bool was_enabled = vcpu_has_cache_enabled(vcpu);
>>>> +
>>>>       BUG_ON(!p->is_write);
>>>>
>>>>       vcpu->arch.cp15[r->reg] = *vcpu_reg(vcpu, p->Rt1);
>>>>       if (p->is_64bit)
>>>>               vcpu->arch.cp15[r->reg + 1] = *vcpu_reg(vcpu, p->Rt2);
>>>>
>>>> -     return true;
>>>> -}
>>>> -
>>>> -/*
>>>> - * SCTLR accessor. Only called as long as HCR_TVM is set.  If the
>>>> - * guest enables the MMU, we stop trapping the VM sys_regs and leave
>>>> - * it in complete control of the caches.
>>>> - *
>>>> - * Used by the cpu-specific code.
>>>> - */
>>>> -bool access_sctlr(struct kvm_vcpu *vcpu,
>>>> -               const struct coproc_params *p,
>>>> -               const struct coproc_reg *r)
>>>> -{
>>>
>>> I think you left in a prototype in arch/arm/kvm/coproc.h
>>>
>>>> -     access_vm_reg(vcpu, p, r);
>>>> -
>>>> -     if (vcpu_has_cache_enabled(vcpu)) {     /* MMU+Caches enabled? */
>>>> -             vcpu->arch.hcr &= ~HCR_TVM;
>>>> -             stage2_flush_vm(vcpu->kvm);
>>>> -     }
>>>> -
>>>> +     kvm_toggle_cache(vcpu, was_enabled);
>>>>       return true;
>>>>  }
>>>>
>>>> diff --git a/arch/arm/kvm/coproc_a15.c b/arch/arm/kvm/coproc_a15.c
>>>> index e6f4ae4..a713675 100644
>>>> --- a/arch/arm/kvm/coproc_a15.c
>>>> +++ b/arch/arm/kvm/coproc_a15.c
>>>> @@ -34,7 +34,7 @@
>>>>  static const struct coproc_reg a15_regs[] = {
>>>>       /* SCTLR: swapped by interrupt.S. */
>>>>       { CRn( 1), CRm( 0), Op1( 0), Op2( 0), is32,
>>>> -                     access_sctlr, reset_val, c1_SCTLR, 0x00C50078 },
>>>> +                     access_vm_reg, reset_val, c1_SCTLR, 0x00C50078 },
>>>>  };
>>>>
>>>>  static struct kvm_coproc_target_table a15_target_table = {
>>>> diff --git a/arch/arm/kvm/coproc_a7.c b/arch/arm/kvm/coproc_a7.c
>>>> index 17fc7cd..b19e46d 100644
>>>> --- a/arch/arm/kvm/coproc_a7.c
>>>> +++ b/arch/arm/kvm/coproc_a7.c
>>>> @@ -37,7 +37,7 @@
>>>>  static const struct coproc_reg a7_regs[] = {
>>>>       /* SCTLR: swapped by interrupt.S. */
>>>>       { CRn( 1), CRm( 0), Op1( 0), Op2( 0), is32,
>>>> -                     access_sctlr, reset_val, c1_SCTLR, 0x00C50878 },
>>>> +                     access_vm_reg, reset_val, c1_SCTLR, 0x00C50878 },
>>>
>>> I'm confused how you tested this, this doesn't seem to compile for me
>>> (access_vm_reg is a static function).
>>
>> That's worrying. It looks like I dropped a fixup while rebasing the
>> series. Nothing major, but still... I'll fix that.
>>
> 
> no worries, just wanted to make sure we have tested the right thing.
> 
>>>>  };
>>>>
>>>>  static struct kvm_coproc_target_table a7_target_table = {
>>>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>>>> index 1dc9778..106737e 100644
>>>> --- a/arch/arm/kvm/mmu.c
>>>> +++ b/arch/arm/kvm/mmu.c
>>>> @@ -278,7 +278,7 @@ static void stage2_flush_memslot(struct kvm *kvm,
>>>>   * Go through the stage 2 page tables and invalidate any cache lines
>>>>   * backing memory already mapped to the VM.
>>>>   */
>>>> -void stage2_flush_vm(struct kvm *kvm)
>>>> +static void stage2_flush_vm(struct kvm *kvm)
>>>>  {
>>>>       struct kvm_memslots *slots;
>>>>       struct kvm_memory_slot *memslot;
>>>> @@ -1411,3 +1411,71 @@ void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
>>>>       unmap_stage2_range(kvm, gpa, size);
>>>>       spin_unlock(&kvm->mmu_lock);
>>>>  }
>>>> +
>>>> +/*
>>>> + * See note at ARMv7 ARM B1.14.4 (TL;DR: S/W ops are not easily virtualized).
>>>> + *
>>>> + * Main problems:
>>>> + * - S/W ops are local to a CPU (not broadcast)
>>>> + * - We have line migration behind our back (speculation)
>>>> + * - System caches don't support S/W at all (damn!)
>>>> + *
>>>> + * In the face of the above, the best we can do is to try and convert
>>>> + * S/W ops to VA ops. Because the guest is not allowed to infer the
>>>> + * S/W to PA mapping, it can only use S/W to nuke the whole cache,
>>>> + * which is a rather good thing for us.
>>>> + *
>>>> + * Also, it is only used when turning caches on/off ("The expected
>>>> + * usage of the cache maintenance instructions that operate by set/way
>>>> + * is associated with the cache maintenance instructions associated
>>>> + * with the powerdown and powerup of caches, if this is required by
>>>> + * the implementation.").
>>>> + *
>>>> + * We use the following policy:
>>>> + *
>>>> + * - If we trap a S/W operation, we enable VM trapping to detect
>>>> + *   caches being turned on/off, and do a full clean.
>>>> + *
>>>> + * - We flush the caches on both caches being turned on and off.
>>>> + *
>>>> + * - Once the caches are enabled, we stop trapping VM ops.
>>>> + */
>>>> +void kvm_set_way_flush(struct kvm_vcpu *vcpu)
>>>> +{
>>>> +     unsigned long hcr = vcpu_get_hcr(vcpu);
>>>> +
>>>> +     /*
>>>> +      * If this is the first time we do a S/W operation
>>>> +      * (i.e. HCR_TVM not set) flush the whole memory, and set the
>>>> +      * VM trapping.
>>>
>>> what about when the guest boots, wouldn't it be using S/W operations to
>>> flush the whole cache before turning it on?  But we ignore this case now
>>> because we're already trapping VM regs so this is deferred until caches
>>> are turned on.
>>
>> Exactly. Doing S/W ops with caches off can always be postponed until we
>> turn it on (this is the point where it actually matters).
>>
>>> However, when we are turning off caches, we both flush the cache when
>>> doing the actual S/W operation and when the caches are then turned off.
>>> Why?
>>
>> Think of it as a "belt and braces" measure. The guest tries to do
>> something *at that point*, and we should respect that. It doesn't mean
>> it makes sense from our point of view though. This is what the 32bit
>> kernel does, BTW (have a look at early_cachepolicy -- Mark Rutland is
>> trying to get rid of this).
> 
> ok, and we don't need the "belt and braces" when the MMU is off because
> the guest is not accessing the caches anyhow?

Yes. We don't hit in the caches (since they are off), so we can safely
defer this until the caches are turned on.
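
(For reference, "caches on" here simply means the M and C bits in the
guest's trapped SCTLR copy, roughly the following on the arm64 side, with
vcpu_sys_reg being the usual guest sysreg accessor:

static inline bool vcpu_has_cache_enabled(struct kvm_vcpu *vcpu)
{
	/* SCTLR.M (bit 0) and SCTLR.C (bit 2) both set */
	return (vcpu_sys_reg(vcpu, SCTLR_EL1) & 0b101) == 0b101;
}

so until the guest flips those bits, there is nothing in the cache worth
flushing.)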

> So shouldn't we be detecting the guest MMU state instead? But we found
> this to be unstable before (for a still unknown reason), so we instead
> rely on always having TVM trapping set when the MMU is off?

Tracking the MMU state would be the ideal thing to do. It is not so much
that it is unreliable, but that detecting the MMU being turned off is
extremely expensive (TVM traps all accesses to the VM regs).

Also, as you mentioned, there is still a code path I don't really
understand in the 32bit decompressor, where we somehow failed to notice
that the MMU had been turned off (it was with another version of the
patch, and things have changed quite a bit since).

> So is there any conceivable guest (UEFI or something else) that could
> correctly be turning off the MMU without calling invalidate by set/way
> and creating a situation that we don't catch now?  Or is that not a
> problem because in the worst case we'll just be flushing too often?

If the guest flushes by VA, then we're safe. If it doesn't flush at all,
it is a blatant bug. In both cases, trapping the MMU access doesn't buy
us much.

> What about the opposite case, if we have trapping set, but we actually
> do need to flush the caches, how are we sure this never happens?

You lost me here. What is the exact case?

>>
>> So you can think of these as two different, rather unrelated events:
>> - Guest does S/W ops with caches on: we flush, knowing that this is a
>> best effort thing, probably doomed, but hey...
> 
> ok, so Linux is basically doing something architecturally invalid?

Indeed. That area of the code is fairly old and covers all versions of
the architecture, without much distinction as to what is valid where...

>> - Guest turns its caches off: we flush, knowing that this is the right
>> time to do it.
>>
>> Hope this helps,
>>
> 
> Yes, a bit, but this does feel incredibly brittle, though I know it may
> be the best option we have...

Yup, this is probably the worst part of the architecture. We have to
work our way through both architectural issues (inherited from previous
revisions) and legacy code that has hardly evolved since ARMv4 (and
basically works by luck and by the lack of a system cache).

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...


* Re: [PATCH v3 1/3] arm/arm64: KVM: Use set/way op trapping to track the state of the caches
  2015-01-27 13:44         ` Marc Zyngier
@ 2015-01-29 13:40           ` Christoffer Dall
  0 siblings, 0 replies; 12+ messages in thread
From: Christoffer Dall @ 2015-01-29 13:40 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: kvm, kvmarm, Peter Maydell, Steve Capper

On Tue, Jan 27, 2015 at 01:44:05PM +0000, Marc Zyngier wrote:
> On 27/01/15 13:17, Christoffer Dall wrote:
> > On Tue, Jan 27, 2015 at 11:21:38AM +0000, Marc Zyngier wrote:
> >> On 26/01/15 22:58, Christoffer Dall wrote:
> >>> On Wed, Jan 21, 2015 at 06:39:46PM +0000, Marc Zyngier wrote:
> >>>> Trying to emulate the behaviour of set/way cache ops is fairly
> >>>> pointless, as there are too many ways we can end up missing stuff.
> >>>> Also, there are some system caches out there that simply ignore
> >>>> set/way operations.
> >>>>
> >>>> So instead of trying to implement them, let's convert them to VA ops,
> >>>> and use them as a way to re-enable the trapping of VM ops. That way,
> >>>> we can detect the point when the MMU/caches are turned off, and do
> >>>> a full VM flush (which is what the guest was trying to do anyway).
> >>>>
> >>>> This allows a 32bit zImage to boot on the APM thingy, and will
> >>>> probably help bootloaders in general.
> >>>>
> >>>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> >>>
> >>> This had some conflicts with dirty page logging.  I fixed it up here,
> >>> and also removed some trailing white space and mixed spaces/tabs that
> >>> patch complained about:
> >>>
> >>> http://git.linaro.org/people/christoffer.dall/linux-kvm-arm.git mm-fixes
> >>
> >> Thanks for doing so.
> >>
> >>>> ---
> >>>>  arch/arm/include/asm/kvm_emulate.h   | 10 +++++
> >>>>  arch/arm/include/asm/kvm_host.h      |  3 --
> >>>>  arch/arm/include/asm/kvm_mmu.h       |  3 +-
> >>>>  arch/arm/kvm/arm.c                   | 10 -----
> >>>>  arch/arm/kvm/coproc.c                | 64 ++++++------------------------
> >>>>  arch/arm/kvm/coproc_a15.c            |  2 +-
> >>>>  arch/arm/kvm/coproc_a7.c             |  2 +-
> >>>>  arch/arm/kvm/mmu.c                   | 70 ++++++++++++++++++++++++++++++++-
> >>>>  arch/arm/kvm/trace.h                 | 39 +++++++++++++++++++
> >>>>  arch/arm64/include/asm/kvm_emulate.h | 10 +++++
> >>>>  arch/arm64/include/asm/kvm_host.h    |  3 --
> >>>>  arch/arm64/include/asm/kvm_mmu.h     |  3 +-
> >>>>  arch/arm64/kvm/sys_regs.c            | 75 +++++-------------------------------
> >>>>  13 files changed, 155 insertions(+), 139 deletions(-)
> >>>>
> >>>> diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
> >>>> index 66ce176..7b01523 100644
> >>>> --- a/arch/arm/include/asm/kvm_emulate.h
> >>>> +++ b/arch/arm/include/asm/kvm_emulate.h
> >>>> @@ -38,6 +38,16 @@ static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
> >>>>       vcpu->arch.hcr = HCR_GUEST_MASK;
> >>>>  }
> >>>>
> >>>> +static inline unsigned long vcpu_get_hcr(struct kvm_vcpu *vcpu)
> >>>> +{
> >>>> +     return vcpu->arch.hcr;
> >>>> +}
> >>>> +
> >>>> +static inline void vcpu_set_hcr(struct kvm_vcpu *vcpu, unsigned long hcr)
> >>>> +{
> >>>> +     vcpu->arch.hcr = hcr;
> >>>> +}
> >>>> +
> >>>>  static inline bool vcpu_mode_is_32bit(struct kvm_vcpu *vcpu)
> >>>>  {
> >>>>       return 1;
> >>>> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> >>>> index 254e065..04b4ea0 100644
> >>>> --- a/arch/arm/include/asm/kvm_host.h
> >>>> +++ b/arch/arm/include/asm/kvm_host.h
> >>>> @@ -125,9 +125,6 @@ struct kvm_vcpu_arch {
> >>>>        * Anything that is not used directly from assembly code goes
> >>>>        * here.
> >>>>        */
> >>>> -     /* dcache set/way operation pending */
> >>>> -     int last_pcpu;
> >>>> -     cpumask_t require_dcache_flush;
> >>>>
> >>>>       /* Don't run the guest on this vcpu */
> >>>>       bool pause;
> >>>> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
> >>>> index 63e0ecc..286644c 100644
> >>>> --- a/arch/arm/include/asm/kvm_mmu.h
> >>>> +++ b/arch/arm/include/asm/kvm_mmu.h
> >>>> @@ -190,7 +190,8 @@ static inline void coherent_cache_guest_page(struct kvm_vcpu *vcpu, hva_t hva,
> >>>>
> >>>>  #define kvm_virt_to_phys(x)          virt_to_idmap((unsigned long)(x))
> >>>>
> >>>> -void stage2_flush_vm(struct kvm *kvm);
> >>>> +void kvm_set_way_flush(struct kvm_vcpu *vcpu);
> >>>> +void kvm_toggle_cache(struct kvm_vcpu *vcpu, bool was_enabled);
> >>>>
> >>>>  #endif       /* !__ASSEMBLY__ */
> >>>>
> >>>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> >>>> index 2d6d910..0b0d58a 100644
> >>>> --- a/arch/arm/kvm/arm.c
> >>>> +++ b/arch/arm/kvm/arm.c
> >>>> @@ -281,15 +281,6 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
> >>>>       vcpu->cpu = cpu;
> >>>>       vcpu->arch.host_cpu_context = this_cpu_ptr(kvm_host_cpu_state);
> >>>>
> >>>> -     /*
> >>>> -      * Check whether this vcpu requires the cache to be flushed on
> >>>> -      * this physical CPU. This is a consequence of doing dcache
> >>>> -      * operations by set/way on this vcpu. We do it here to be in
> >>>> -      * a non-preemptible section.
> >>>> -      */
> >>>> -     if (cpumask_test_and_clear_cpu(cpu, &vcpu->arch.require_dcache_flush))
> >>>> -             flush_cache_all(); /* We'd really want v7_flush_dcache_all() */
> >>>> -
> >>>>       kvm_arm_set_running_vcpu(vcpu);
> >>>>  }
> >>>>
> >>>> @@ -541,7 +532,6 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> >>>>               ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
> >>>>
> >>>>               vcpu->mode = OUTSIDE_GUEST_MODE;
> >>>> -             vcpu->arch.last_pcpu = smp_processor_id();
> >>>>               kvm_guest_exit();
> >>>>               trace_kvm_exit(*vcpu_pc(vcpu));
> >>>>               /*
> >>>> diff --git a/arch/arm/kvm/coproc.c b/arch/arm/kvm/coproc.c
> >>>> index 7928dbd..0afcc00 100644
> >>>> --- a/arch/arm/kvm/coproc.c
> >>>> +++ b/arch/arm/kvm/coproc.c
> >>>> @@ -189,82 +189,40 @@ static bool access_l2ectlr(struct kvm_vcpu *vcpu,
> >>>>       return true;
> >>>>  }
> >>>>
> >>>> -/* See note at ARM ARM B1.14.4 */
> >>>> +/*
> >>>> + * See note at ARMv7 ARM B1.14.4 (TL;DR: S/W ops are not easily virtualized).
> >>>> + */
> >>>>  static bool access_dcsw(struct kvm_vcpu *vcpu,
> >>>>                       const struct coproc_params *p,
> >>>>                       const struct coproc_reg *r)
> >>>>  {
> >>>> -     unsigned long val;
> >>>> -     int cpu;
> >>>> -
> >>>>       if (!p->is_write)
> >>>>               return read_from_write_only(vcpu, p);
> >>>>
> >>>> -     cpu = get_cpu();
> >>>> -
> >>>> -     cpumask_setall(&vcpu->arch.require_dcache_flush);
> >>>> -     cpumask_clear_cpu(cpu, &vcpu->arch.require_dcache_flush);
> >>>> -
> >>>> -     /* If we were already preempted, take the long way around */
> >>>> -     if (cpu != vcpu->arch.last_pcpu) {
> >>>> -             flush_cache_all();
> >>>> -             goto done;
> >>>> -     }
> >>>> -
> >>>> -     val = *vcpu_reg(vcpu, p->Rt1);
> >>>> -
> >>>> -     switch (p->CRm) {
> >>>> -     case 6:                 /* Upgrade DCISW to DCCISW, as per HCR.SWIO */
> >>>> -     case 14:                /* DCCISW */
> >>>> -             asm volatile("mcr p15, 0, %0, c7, c14, 2" : : "r" (val));
> >>>> -             break;
> >>>> -
> >>>> -     case 10:                /* DCCSW */
> >>>> -             asm volatile("mcr p15, 0, %0, c7, c10, 2" : : "r" (val));
> >>>> -             break;
> >>>> -     }
> >>>> -
> >>>> -done:
> >>>> -     put_cpu();
> >>>> -
> >>>> +     kvm_set_way_flush(vcpu);
> >>>>       return true;
> >>>>  }
> >>>>
> >>>>  /*
> >>>>   * Generic accessor for VM registers. Only called as long as HCR_TVM
> >>>> - * is set.
> >>>> + * is set.  If the guest enables the MMU, we stop trapping the VM
> >>>> + * sys_regs and leave it in complete control of the caches.
> >>>> + *
> >>>> + * Used by the cpu-specific code.
> >>>>   */
> >>>>  static bool access_vm_reg(struct kvm_vcpu *vcpu,
> >>>>                         const struct coproc_params *p,
> >>>>                         const struct coproc_reg *r)
> >>>>  {
> >>>> +     bool was_enabled = vcpu_has_cache_enabled(vcpu);
> >>>> +
> >>>>       BUG_ON(!p->is_write);
> >>>>
> >>>>       vcpu->arch.cp15[r->reg] = *vcpu_reg(vcpu, p->Rt1);
> >>>>       if (p->is_64bit)
> >>>>               vcpu->arch.cp15[r->reg + 1] = *vcpu_reg(vcpu, p->Rt2);
> >>>>
> >>>> -     return true;
> >>>> -}
> >>>> -
> >>>> -/*
> >>>> - * SCTLR accessor. Only called as long as HCR_TVM is set.  If the
> >>>> - * guest enables the MMU, we stop trapping the VM sys_regs and leave
> >>>> - * it in complete control of the caches.
> >>>> - *
> >>>> - * Used by the cpu-specific code.
> >>>> - */
> >>>> -bool access_sctlr(struct kvm_vcpu *vcpu,
> >>>> -               const struct coproc_params *p,
> >>>> -               const struct coproc_reg *r)
> >>>> -{
> >>>
> >>> I think you left in a prototype in arch/arm/kvm/coproc.h
> >>>
> >>>> -     access_vm_reg(vcpu, p, r);
> >>>> -
> >>>> -     if (vcpu_has_cache_enabled(vcpu)) {     /* MMU+Caches enabled? */
> >>>> -             vcpu->arch.hcr &= ~HCR_TVM;
> >>>> -             stage2_flush_vm(vcpu->kvm);
> >>>> -     }
> >>>> -
> >>>> +     kvm_toggle_cache(vcpu, was_enabled);
> >>>>       return true;
> >>>>  }
> >>>>
> >>>> diff --git a/arch/arm/kvm/coproc_a15.c b/arch/arm/kvm/coproc_a15.c
> >>>> index e6f4ae4..a713675 100644
> >>>> --- a/arch/arm/kvm/coproc_a15.c
> >>>> +++ b/arch/arm/kvm/coproc_a15.c
> >>>> @@ -34,7 +34,7 @@
> >>>>  static const struct coproc_reg a15_regs[] = {
> >>>>       /* SCTLR: swapped by interrupt.S. */
> >>>>       { CRn( 1), CRm( 0), Op1( 0), Op2( 0), is32,
> >>>> -                     access_sctlr, reset_val, c1_SCTLR, 0x00C50078 },
> >>>> +                     access_vm_reg, reset_val, c1_SCTLR, 0x00C50078 },
> >>>>  };
> >>>>
> >>>>  static struct kvm_coproc_target_table a15_target_table = {
> >>>> diff --git a/arch/arm/kvm/coproc_a7.c b/arch/arm/kvm/coproc_a7.c
> >>>> index 17fc7cd..b19e46d 100644
> >>>> --- a/arch/arm/kvm/coproc_a7.c
> >>>> +++ b/arch/arm/kvm/coproc_a7.c
> >>>> @@ -37,7 +37,7 @@
> >>>>  static const struct coproc_reg a7_regs[] = {
> >>>>       /* SCTLR: swapped by interrupt.S. */
> >>>>       { CRn( 1), CRm( 0), Op1( 0), Op2( 0), is32,
> >>>> -                     access_sctlr, reset_val, c1_SCTLR, 0x00C50878 },
> >>>> +                     access_vm_reg, reset_val, c1_SCTLR, 0x00C50878 },
> >>>
> >>> I'm confused how you tested this, this doesn't seem to compile for me
> >>> (access_vm_reg is a static function).
> >>
> >> That's worrying. It looks like I dropped a fixup while rebasing the
> >> series. Nothing major, but still... I'll fix that.
> >>
> > 
> > no worries, just wanted to make sure we have tested the right thing.
> > 
> >>>>  };
> >>>>
> >>>>  static struct kvm_coproc_target_table a7_target_table = {
> >>>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> >>>> index 1dc9778..106737e 100644
> >>>> --- a/arch/arm/kvm/mmu.c
> >>>> +++ b/arch/arm/kvm/mmu.c
> >>>> @@ -278,7 +278,7 @@ static void stage2_flush_memslot(struct kvm *kvm,
> >>>>   * Go through the stage 2 page tables and invalidate any cache lines
> >>>>   * backing memory already mapped to the VM.
> >>>>   */
> >>>> -void stage2_flush_vm(struct kvm *kvm)
> >>>> +static void stage2_flush_vm(struct kvm *kvm)
> >>>>  {
> >>>>       struct kvm_memslots *slots;
> >>>>       struct kvm_memory_slot *memslot;
> >>>> @@ -1411,3 +1411,71 @@ void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
> >>>>       unmap_stage2_range(kvm, gpa, size);
> >>>>       spin_unlock(&kvm->mmu_lock);
> >>>>  }
> >>>> +
> >>>> +/*
> >>>> + * See note at ARMv7 ARM B1.14.4 (TL;DR: S/W ops are not easily virtualized).
> >>>> + *
> >>>> + * Main problems:
> >>>> + * - S/W ops are local to a CPU (not broadcast)
> >>>> + * - We have line migration behind our back (speculation)
> >>>> + * - System caches don't support S/W at all (damn!)
> >>>> + *
> >>>> + * In the face of the above, the best we can do is to try and convert
> >>>> + * S/W ops to VA ops. Because the guest is not allowed to infer the
> >>>> + * S/W to PA mapping, it can only use S/W to nuke the whole cache,
> >>>> + * which is a rather good thing for us.
> >>>> + *
> >>>> + * Also, it is only used when turning caches on/off ("The expected
> >>>> + * usage of the cache maintenance instructions that operate by set/way
> >>>> + * is associated with the cache maintenance instructions associated
> >>>> + * with the powerdown and powerup of caches, if this is required by
> >>>> + * the implementation.").
> >>>> + *
> >>>> + * We use the following policy:
> >>>> + *
> >>>> + * - If we trap a S/W operation, we enable VM trapping to detect
> >>>> + *   caches being turned on/off, and do a full clean.
> >>>> + *
> >>>> + * - We flush the caches on both caches being turned on and off.
> >>>> + *
> >>>> + * - Once the caches are enabled, we stop trapping VM ops.
> >>>> + */
> >>>> +void kvm_set_way_flush(struct kvm_vcpu *vcpu)
> >>>> +{
> >>>> +     unsigned long hcr = vcpu_get_hcr(vcpu);
> >>>> +
> >>>> +     /*
> >>>> +      * If this is the first time we do a S/W operation
> >>>> +      * (i.e. HCR_TVM not set) flush the whole memory, and set the
> >>>> +      * VM trapping.
> >>>
> >>> what about when the guest boots, wouldn't it be using S/W operations to
> >>> flush the whole cache before turning it on?  But we ignore this case now
> >>> because we're already trapping VM regs so this is deferred until caches
> >>> are turned on.
> >>
> >> Exactly. Doing S/W ops with the caches off can always be postponed
> >> until we turn them on (this is the point where it actually matters).
> >>
> >>> However, when we are turning off caches, we both flush the cache when
> >>> doing the actual S/W operation and when the caches are then turned off.
> >>> Why?
> >>
> >> Think of it as a "belt and braces" measure. The guest tries to do
> >> something *at that point*, and we should respect that. It doesn't mean
> >> it makes sense from our point of view though. This is what the 32bit
> >> kernel does, BTW (have a look at early_cachepolicy -- Mark Rutland is
> >> trying to get rid of this).
> > 
> > ok, and we don't need the "belt and braces" when the MMU is off because
> > the guest is not accessing the caches anyhow?
> 
> Yes. We don't hit in the cache (since they are off), so we can safely
> defer this until caches are turned on.
> 
> > So shouldn't we be detecting the guest MMU state instead? But we found
> > this to be unstable before (for a still unknown reason), so we instead
> > rely on always having trapping TVM set when the MMU is off?
> 
> Tracking the MMU state would be the ideal thing to do. It is not so much
> that it is unreliable, but that detecting the MMU being turned off is
> extremely expensive (TVM traps all accesses to the VM regs).

I meant that when we trap on S/W, we should decide whether to flush
based on whether the guest has turned the MMU off, rather than trying
to infer this from our setting of HCR_TVM.  The only reason I can see
for the HCR_TVM approach is that we are somehow not properly detecting
the guest MMU state.

> 
> Also, as you mentioned, there is still a code path I don't really
> understand in the 32bit decompressor, where we somehow failed to notice
> that the MMU had been turned off (it was with another version of the
> patch, and things have changed quite a bit since).
> 

Hmm, the thing I don't really get from looking at the code now is why
we can avoid doing the flush when TVM is already set; that feels very
unintuitive, to say the least.
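
To make the question concrete, my reading of the trapped-S/W path in
this version is roughly the following (a sketch reconstructed from the
hunk and comment quoted above, not a verbatim copy; vcpu_set_hcr() is
my guess at the setter's name):

void kvm_set_way_flush(struct kvm_vcpu *vcpu)
{
        unsigned long hcr = vcpu_get_hcr(vcpu);

        /*
         * First trapped S/W op since the caches were last toggled:
         * flush the whole guest memory and start trapping the VM
         * registers so we notice the caches being turned on/off.
         */
        if (!(hcr & HCR_TVM)) {
                stage2_flush_vm(vcpu->kvm);
                vcpu_set_hcr(vcpu, hcr | HCR_TVM);
        }

        /*
         * Otherwise (TVM already set) nothing is done here; we rely
         * on trapping the cache enable/disable to flush at the point
         * where it actually matters.
         */
}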

> > So is there any conceivable guest (UEFI or something else) that could
> > correctly be turning off the MMU without calling invalidate by set/way
> > and creating a situation that we don't catch now?  Or is that not a
> > problem because in the worst case we'll just be flushing too often?
> 
> If the guest flushes by VA, then we're safe. If it doesn't flush at all,
> it is a blatant bug. In both cases, trapping the MMU access doesn't buy
> us much.
> 
> > What about the opposite case, if we have trapping set, but we actually
> > do need to flush the caches, how are we sure this never happens?
> 
> You lost me here. What is the exact case?
> 

If we have HCR_TVM set and the guest does a flush by set/way, we
currently do nothing.

So for example:

1. Guest calls flush by set/way
2. We flush and enable trapping
3. Guest manipulates memory, and flushes again
4. We do nothing because trapping is enabled.
5. The guest touches some VM register (e.g. TTBR); we don't flush because
   now_enabled == was_enabled == true.  We stop trapping.

does this mean the guest is so broken that we don't care?
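
For concreteness, the VM-register write path I'm referring to in step 5
is presumably something like this (again a reconstruction from the
access_vm_reg() hunk above; was_enabled comes from the caller, and
vcpu_set_hcr() is assumed):

void kvm_toggle_cache(struct kvm_vcpu *vcpu, bool was_enabled)
{
        bool now_enabled = vcpu_has_cache_enabled(vcpu);

        /* Flush on any transition: caches turned on, or turned off */
        if (now_enabled != was_enabled)
                stage2_flush_vm(vcpu->kvm);

        /* Caches are on: stop trapping VM ops until the next S/W op */
        if (now_enabled)
                vcpu_set_hcr(vcpu, vcpu_get_hcr(vcpu) & ~HCR_TVM);
}

With that, step 4 skips the flush (TVM is already set) and step 5 skips
it too, since the cache-enable state never changed.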

> >>
> >> So you can think of these as two different, rather unrelated events:
> >> - Guest does S/W ops with caches on: we flush, knowing that this is a
> >> best effort thing, probably doomed, but hey...
> > 
> > ok, so Linux is basically doing something architecturally invalid?
> 
> Indeed. That area of the code is fairly old and covers all versions of
> the architecture, without much distinction as to what is valid where...
> 
> >> - Guest turns its caches off: we flush, knowing that this is the right
> >> time to do it.
> >>
> >> Hope this helps,
> >>
> > 
> > Yes, a bit, but this does feel incredibly brittle, even though I know
> > it may be the best option we have....
> 
> Yup, this is probably the worst part of the architecture. We have to
> work our way through both architectural issues (inherited from previous
> revisions) and legacy code that has hardly evolved since ARMv4 (and
> basically works thanks to luck and the lack of a system cache).
> 
Yikes!

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 2/3] arm/arm64: KVM: Invalidate data cache on unmap
  2015-01-21 18:39 ` [PATCH v3 2/3] arm/arm64: KVM: Invalidate data cache on unmap Marc Zyngier
@ 2015-01-29 14:02   ` Christoffer Dall
  0 siblings, 0 replies; 12+ messages in thread
From: Christoffer Dall @ 2015-01-29 14:02 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: kvm, kvmarm, Peter Maydell, Steve Capper

On Wed, Jan 21, 2015 at 06:39:47PM +0000, Marc Zyngier wrote:
> Let's assume a guest has created an uncached mapping, and written
> to that page. Let's also assume that the host uses a cache-coherent
> IO subsystem. Let's finally assume that the host is under memory
> pressure and starts to swap things out.
> 
> Before this "uncached" page is evicted, we need to make sure we
> invalidate any potentially speculated, clean cache lines that are
> sitting there, or the IO subsystem is going to swap out the cached
> view, losing the data that has been written directly into memory.
> 
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
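
The gist of the fix, as I read it: when a stage-2 mapping is torn down,
clean+invalidate the data cache for the backing page by VA before the
page goes back to the host. A minimal sketch of the intended ordering
(not the literal hunk; the helper name is mine, while kvm_set_pte(),
kvm_tlb_flush_vmid_ipa() and kvm_flush_dcache_to_poc() are taken from
the existing code):

static void unmap_stage2_pte(struct kvm *kvm, pte_t *pte, phys_addr_t addr)
{
        pte_t old_pte = *pte;

        /* Clear the stage-2 entry, then invalidate the TLB for this IPA */
        kvm_set_pte(pte, __pte(0));
        kvm_tlb_flush_vmid_ipa(kvm, addr);

        /*
         * Clean+invalidate the old page by VA, so a cache-coherent IO
         * subsystem cannot write back a stale cached view over what
         * the guest wrote with its caches off.
         */
        kvm_flush_dcache_to_poc(page_address(pte_page(old_pte)),
                                PAGE_SIZE);
}

In the series this presumably lands in the stage-2 unmap walkers below
unmap_stage2_range(); the sketch only shows the ordering of the
teardown and the cache maintenance.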

Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3 3/3] arm/arm64: KVM: Use kernel mapping to perform invalidation on page fault
  2015-01-21 18:39 ` [PATCH v3 3/3] arm/arm64: KVM: Use kernel mapping to perform invalidation on page fault Marc Zyngier
@ 2015-01-29 14:28   ` Christoffer Dall
  0 siblings, 0 replies; 12+ messages in thread
From: Christoffer Dall @ 2015-01-29 14:28 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: kvm, kvmarm, Peter Maydell, Steve Capper

On Wed, Jan 21, 2015 at 06:39:48PM +0000, Marc Zyngier wrote:
> When handling a fault in stage-2, we need to resync I$ and D$, just
> to be sure we don't leave any old cache line behind.
> 
> That's very good, except that we do so using the *user* address.
> Under heavy load (swapping like crazy), we may end up in a situation
> where the page gets mapped in stage-2 while being unmapped from
> userspace by another CPU.
> 
> At that point, the DC/IC instructions can generate a fault, which
> we handle with kvm->mmu_lock held. The box quickly deadlocks, user
> is unhappy.
> 
> Instead, perform this invalidation through the kernel mapping,
> which is guaranteed to be present. The box is much happier, and so
> am I.
> 
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  arch/arm/include/asm/kvm_mmu.h   | 43 +++++++++++++++++++++++++++++++---------
>  arch/arm/kvm/mmu.c               | 12 +++++++----
>  arch/arm64/include/asm/kvm_mmu.h | 13 +++++++-----
>  3 files changed, 50 insertions(+), 18 deletions(-)
> 
> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
> index 552c31f..e5614c9 100644
> --- a/arch/arm/include/asm/kvm_mmu.h
> +++ b/arch/arm/include/asm/kvm_mmu.h
> @@ -162,13 +162,10 @@ static inline bool vcpu_has_cache_enabled(struct kvm_vcpu *vcpu)
>  	return (vcpu->arch.cp15[c1_SCTLR] & 0b101) == 0b101;
>  }
>  
> -static inline void coherent_cache_guest_page(struct kvm_vcpu *vcpu, hva_t hva,
> -					     unsigned long size,
> -					     bool ipa_uncached)
> +static inline void __coherent_cache_guest_page(struct kvm_vcpu *vcpu, pfn_t pfn,
> +				 	       unsigned long size,
> +					       bool ipa_uncached)
>  {
> -	if (!vcpu_has_cache_enabled(vcpu) || ipa_uncached)
> -		kvm_flush_dcache_to_poc((void *)hva, size);
> -	
>  	/*
>  	 * If we are going to insert an instruction page and the icache is
>  	 * either VIPT or PIPT, there is a potential problem where the host
> @@ -180,10 +177,38 @@ static inline void coherent_cache_guest_page(struct kvm_vcpu *vcpu, hva_t hva,
>  	 *
>  	 * VIVT caches are tagged using both the ASID and the VMID and doesn't
>  	 * need any kind of flushing (DDI 0406C.b - Page B3-1392).
> +	 *
> +	 * We need to do this through a kernel mapping (using the
> +	 * user-space mapping has proved to be the wrong
> +	 * solution). For that, we need to kmap one page at a time,
> +	 * and iterate over the range.
>  	 */
> -	if (icache_is_pipt()) {
> -		__cpuc_coherent_user_range(hva, hva + size);
> -	} else if (!icache_is_vivt_asid_tagged()) {
> +
> +	bool need_flush = !vcpu_has_cache_enabled(vcpu) || ipa_uncached;
> +
> +	VM_BUG_ON(size & PAGE_MASK);
> +
> +	if (!need_flush && !icache_is_pipt())
> +		goto vipt_cache;
> +
> +	while (size) {
> +		void *va = kmap_atomic_pfn(pfn);
> +
> +		if (need_flush)
> +			kvm_flush_dcache_to_poc(va, PAGE_SIZE);
> +	
> +		if (icache_is_pipt())
> +			__cpuc_coherent_user_range((unsigned long)va,
> +						   (unsigned long)va + PAGE_SIZE);
> +
> +		size -= PAGE_SIZE;
> +		pfn++;
> +
> +		kunmap_atomic(va);
> +	}
> +
> +vipt_cache:
> +	if (!icache_is_pipt() && !icache_is_vivt_asid_tagged()) {
>  		/* any kind of VIPT cache */
>  		__flush_icache_all();
>  	}
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index 78e68ab..1366625 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -957,6 +957,12 @@ static bool kvm_is_device_pfn(unsigned long pfn)
>  	return !pfn_valid(pfn);
>  }
>  
> +static void coherent_cache_guest_page(struct kvm_vcpu *vcpu, pfn_t pfn,
> +				      unsigned long size, bool uncached)
> +{
> +	__coherent_cache_guest_page(vcpu, pfn, size, uncached);
> +}
> +

why the indirection?

>  static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  			  struct kvm_memory_slot *memslot, unsigned long hva,
>  			  unsigned long fault_status)
> @@ -1046,8 +1052,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  			kvm_set_s2pmd_writable(&new_pmd);
>  			kvm_set_pfn_dirty(pfn);
>  		}
> -		coherent_cache_guest_page(vcpu, hva & PMD_MASK, PMD_SIZE,
> -					  fault_ipa_uncached);
> +		coherent_cache_guest_page(vcpu, pfn, PMD_SIZE, fault_ipa_uncached);
>  		ret = stage2_set_pmd_huge(kvm, memcache, fault_ipa, &new_pmd);
>  	} else {
>  		pte_t new_pte = pfn_pte(pfn, mem_type);
> @@ -1055,8 +1060,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  			kvm_set_s2pte_writable(&new_pte);
>  			kvm_set_pfn_dirty(pfn);
>  		}
> -		coherent_cache_guest_page(vcpu, hva, PAGE_SIZE,
> -					  fault_ipa_uncached);
> +		coherent_cache_guest_page(vcpu, pfn, PAGE_SIZE, fault_ipa_uncached);
>  		ret = stage2_set_pte(kvm, memcache, fault_ipa, &new_pte,
>  			pgprot_val(mem_type) == pgprot_val(PAGE_S2_DEVICE));
>  	}
> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> index cbdc236..adcf495 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -243,15 +243,18 @@ static inline bool vcpu_has_cache_enabled(struct kvm_vcpu *vcpu)
>  	return (vcpu_sys_reg(vcpu, SCTLR_EL1) & 0b101) == 0b101;
>  }
>  
> -static inline void coherent_cache_guest_page(struct kvm_vcpu *vcpu, hva_t hva,
> -					     unsigned long size,
> -					     bool ipa_uncached)
> +static inline void __coherent_cache_guest_page(struct kvm_vcpu *vcpu, pfn_t pfn,
> +					       unsigned long size,
> +					       bool ipa_uncached)
>  {
> +	void *va = page_address(pfn_to_page(pfn));
> +
>  	if (!vcpu_has_cache_enabled(vcpu) || ipa_uncached)
> -		kvm_flush_dcache_to_poc((void *)hva, size);
> +		kvm_flush_dcache_to_poc(va, size);
>  
>  	if (!icache_is_aliasing()) {		/* PIPT */
> -		flush_icache_range(hva, hva + size);
> +		flush_icache_range((unsigned long)va,
> +				   (unsigned long)va + size);
>  	} else if (!icache_is_aivivt()) {	/* non ASID-tagged VIVT */
>  		/* any kind of VIPT cache */
>  		__flush_icache_all();
> -- 
> 2.1.4
> 

Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2015-01-29 14:28 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
2015-01-21 18:39 [PATCH v3 0/3] arm/arm64: KVM: Random selection of cache related fixes Marc Zyngier
2015-01-21 18:39 ` [PATCH v3 1/3] arm/arm64: KVM: Use set/way op trapping to track the state of the caches Marc Zyngier
2015-01-26 22:58   ` Christoffer Dall
2015-01-27 11:21     ` Marc Zyngier
2015-01-27 13:17       ` Christoffer Dall
2015-01-27 13:44         ` Marc Zyngier
2015-01-29 13:40           ` Christoffer Dall
2015-01-21 18:39 ` [PATCH v3 2/3] arm/arm64: KVM: Invalidate data cache on unmap Marc Zyngier
2015-01-29 14:02   ` Christoffer Dall
2015-01-21 18:39 ` [PATCH v3 3/3] arm/arm64: KVM: Use kernel mapping to perform invalidation on page fault Marc Zyngier
2015-01-29 14:28   ` Christoffer Dall
2015-01-26 17:09 ` [PATCH v3 0/3] arm/arm64: KVM: Random selection of cache related fixes Andrew Jones
