* [PATCH 00/10] arm/arm64: KVM: limit icache invalidation to prefetch aborts
@ 2017-10-09 15:20 ` Marc Zyngier
  0 siblings, 0 replies; 78+ messages in thread
From: Marc Zyngier @ 2017-10-09 15:20 UTC (permalink / raw)
  To: Christoffer Dall, Catalin Marinas, Will Deacon
  Cc: linux-arm-kernel, kvm, kvmarm

It was recently reported that on a VM restore, we seem to spend a
disproportionate amount of time invalidating the icache. This is
partially due to some HW behaviour, but also because we're being a bit
dumb and are invalidating the icache for every page we map at S2, even
if the mapping was triggered by a data access.

A slightly better way of doing this is to mark the pages XN at S2,
and wait for the guest to execute something in that page, at which
point we perform the invalidation. As the guest is likely to execute
from far fewer pages than it touches as data, we win (or so we hope).
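
To illustrate (this is only a sketch with made-up names, not the actual
KVM code), the fault handling conceptually becomes:

/* Illustrative model: the page starts non-executable at S2, and the
 * icache is only invalidated the first time the guest executes from it. */
#include <stdbool.h>
#include <stdio.h>

struct s2_page {
	bool exec;			/* has XN been cleared yet? */
};

static void invalidate_icache_page(unsigned long ipa)
{
	/* stands in for the real (broadcast) icache invalidation by VA */
	printf("icache invalidate for IPA 0x%lx\n", ipa);
}

/* Data abort: the page gets mapped, but stays XN in this model */
static void handle_data_abort(struct s2_page *pg)
{
	pg->exec = false;
}

/* Prefetch abort: the guest really executes from the page, pay now */
static void handle_prefetch_abort(struct s2_page *pg, unsigned long ipa)
{
	if (!pg->exec) {
		invalidate_icache_page(ipa);
		pg->exec = true;	/* clear XN */
	}
}

int main(void)
{
	struct s2_page pg;

	handle_data_abort(&pg);			/* restore only touches data */
	handle_prefetch_abort(&pg, 0x8000);	/* first execution pays */
	handle_prefetch_abort(&pg, 0x8000);	/* later ones don't */
	return 0;
}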

We also take this opportunity to drop the extra dcache clean to the
PoU, which is pretty useless, as we already clean all the way to the
PoC...

Running a bare metal test that touches 1GB of memory (using a 4kB
stride) leads to the following results on Seattle:

4.13:
do_fault_read.bin:       0.565885992 seconds time elapsed
do_fault_write.bin:       0.738296337 seconds time elapsed
do_fault_read_write.bin:       1.241812231 seconds time elapsed

4.14-rc3+patches:
do_fault_read.bin:       0.244961803 seconds time elapsed
do_fault_write.bin:       0.422740092 seconds time elapsed
do_fault_read_write.bin:       0.643402470 seconds time elapsed

We're almost halving the time of something that more or less looks
like a restore operation. Some larger systems will show much bigger
benefits as they become less impacted by the icache invalidation
(which is broadcast in the inner shareable domain).

I've also given it a test run on both Cubietruck and Jetson-TK1.

Tests are archived here:
https://git.kernel.org/pub/scm/linux/kernel/git/maz/kvm-ws-tests.git/
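
(The archived tests above are the reference; as a rough userspace
illustration only, the measured access pattern is simply a page-stride
walk along these lines:)

#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

#define SIZE	(1UL << 30)		/* 1GB */
#define STRIDE	4096UL			/* 4kB, i.e. one access per page */

int main(void)
{
	volatile uint8_t *buf;
	uint8_t sink = 0;
	size_t off;

	buf = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return 1;

	for (off = 0; off < SIZE; off += STRIDE)	/* "read" variant */
		sink += buf[off];

	for (off = 0; off < SIZE; off += STRIDE)	/* "write" variant */
		buf[off] = 1;

	(void)sink;
	return 0;
}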

I'd value some additional test results on HW I don't have access to.

Thanks,

	M.

Marc Zyngier (10):
  KVM: arm/arm64: Split dcache/icache flushing
  arm64: KVM: Add invalidate_icache_range helper
  arm: KVM: Add optimized PIPT icache flushing
  arm64: KVM: PTE/PMD S2 XN bit definition
  KVM: arm/arm64: Limit icache invalidation to prefetch aborts
  KVM: arm/arm64: Only clean the dcache on translation fault
  KVM: arm/arm64: Preserve Exec permission across R/W permission faults
  KVM: arm/arm64: Drop vcpu parameter from
    coherent_{d,i}cache_guest_page
  KVM: arm/arm64: Detangle kvm_mmu.h from kvm_hyp.h
  arm: KVM: Use common implementation for all flushes to PoC

 arch/arm/include/asm/kvm_hyp.h         |   3 +-
 arch/arm/include/asm/kvm_mmu.h         | 110 +++++++++++++++++++++++----------
 arch/arm/include/asm/pgtable.h         |   4 +-
 arch/arm/kvm/hyp/switch.c              |   1 +
 arch/arm/kvm/hyp/tlb.c                 |   1 +
 arch/arm64/include/asm/cacheflush.h    |   8 +++
 arch/arm64/include/asm/kvm_hyp.h       |   1 -
 arch/arm64/include/asm/kvm_mmu.h       |  37 +++++++++--
 arch/arm64/include/asm/pgtable-hwdef.h |   2 +
 arch/arm64/include/asm/pgtable-prot.h  |   4 +-
 arch/arm64/kvm/hyp/debug-sr.c          |   1 +
 arch/arm64/kvm/hyp/switch.c            |   1 +
 arch/arm64/kvm/hyp/tlb.c               |   1 +
 arch/arm64/mm/cache.S                  |  24 +++++++
 virt/kvm/arm/hyp/vgic-v2-sr.c          |   1 +
 virt/kvm/arm/mmu.c                     |  68 +++++++++++++++++---
 16 files changed, 213 insertions(+), 54 deletions(-)

-- 
2.14.1

* [PATCH 01/10] KVM: arm/arm64: Split dcache/icache flushing
  2017-10-09 15:20 ` Marc Zyngier
@ 2017-10-09 15:20   ` Marc Zyngier
  -1 siblings, 0 replies; 78+ messages in thread
From: Marc Zyngier @ 2017-10-09 15:20 UTC (permalink / raw)
  To: Christoffer Dall, Catalin Marinas, Will Deacon
  Cc: linux-arm-kernel, kvm, kvmarm

As we're about to introduce opportunistic invalidation of the icache,
let's split dcache and icache flushing.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm/include/asm/kvm_mmu.h   | 60 ++++++++++++++++++++++++++++------------
 arch/arm64/include/asm/kvm_mmu.h | 13 +++++++--
 virt/kvm/arm/mmu.c               | 20 ++++++++++----
 3 files changed, 67 insertions(+), 26 deletions(-)

diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index fa6f2174276b..f553aa62d0c3 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -126,21 +126,12 @@ static inline bool vcpu_has_cache_enabled(struct kvm_vcpu *vcpu)
 	return (vcpu_cp15(vcpu, c1_SCTLR) & 0b101) == 0b101;
 }
 
-static inline void __coherent_cache_guest_page(struct kvm_vcpu *vcpu,
-					       kvm_pfn_t pfn,
-					       unsigned long size)
+static inline void __coherent_dcache_guest_page(struct kvm_vcpu *vcpu,
+						kvm_pfn_t pfn,
+						unsigned long size)
 {
 	/*
-	 * If we are going to insert an instruction page and the icache is
-	 * either VIPT or PIPT, there is a potential problem where the host
-	 * (or another VM) may have used the same page as this guest, and we
-	 * read incorrect data from the icache.  If we're using a PIPT cache,
-	 * we can invalidate just that page, but if we are using a VIPT cache
-	 * we need to invalidate the entire icache - damn shame - as written
-	 * in the ARM ARM (DDI 0406C.b - Page B3-1393).
-	 *
-	 * VIVT caches are tagged using both the ASID and the VMID and doesn't
-	 * need any kind of flushing (DDI 0406C.b - Page B3-1392).
+	 * Clean the dcache to the Point of Coherency.
 	 *
 	 * We need to do this through a kernel mapping (using the
 	 * user-space mapping has proved to be the wrong
@@ -155,19 +146,52 @@ static inline void __coherent_cache_guest_page(struct kvm_vcpu *vcpu,
 
 		kvm_flush_dcache_to_poc(va, PAGE_SIZE);
 
-		if (icache_is_pipt())
-			__cpuc_coherent_user_range((unsigned long)va,
-						   (unsigned long)va + PAGE_SIZE);
-
 		size -= PAGE_SIZE;
 		pfn++;
 
 		kunmap_atomic(va);
 	}
+}
 
-	if (!icache_is_pipt() && !icache_is_vivt_asid_tagged()) {
+static inline void __coherent_icache_guest_page(struct kvm_vcpu *vcpu,
+						kvm_pfn_t pfn,
+						unsigned long size)
+{
+	/*
+	 * If we are going to insert an instruction page and the icache is
+	 * either VIPT or PIPT, there is a potential problem where the host
+	 * (or another VM) may have used the same page as this guest, and we
+	 * read incorrect data from the icache.  If we're using a PIPT cache,
+	 * we can invalidate just that page, but if we are using a VIPT cache
+	 * we need to invalidate the entire icache - damn shame - as written
+	 * in the ARM ARM (DDI 0406C.b - Page B3-1393).
+	 *
+	 * VIVT caches are tagged using both the ASID and the VMID and doesn't
+	 * need any kind of flushing (DDI 0406C.b - Page B3-1392).
+	 */
+
+	VM_BUG_ON(size & ~PAGE_MASK);
+
+	if (icache_is_vivt_asid_tagged())
+		return;
+
+	if (!icache_is_pipt()) {
 		/* any kind of VIPT cache */
 		__flush_icache_all();
+		return;
+	}
+
+	/* PIPT cache. As for the d-side, use a temporary kernel mapping. */
+	while (size) {
+		void *va = kmap_atomic_pfn(pfn);
+
+		__cpuc_coherent_user_range((unsigned long)va,
+					   (unsigned long)va + PAGE_SIZE);
+
+		size -= PAGE_SIZE;
+		pfn++;
+
+		kunmap_atomic(va);
 	}
 }
 
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 672c8684d5c2..4c4cb4f0e34f 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -230,19 +230,26 @@ static inline bool vcpu_has_cache_enabled(struct kvm_vcpu *vcpu)
 	return (vcpu_sys_reg(vcpu, SCTLR_EL1) & 0b101) == 0b101;
 }
 
-static inline void __coherent_cache_guest_page(struct kvm_vcpu *vcpu,
-					       kvm_pfn_t pfn,
-					       unsigned long size)
+static inline void __coherent_dcache_guest_page(struct kvm_vcpu *vcpu,
+						kvm_pfn_t pfn,
+						unsigned long size)
 {
 	void *va = page_address(pfn_to_page(pfn));
 
 	kvm_flush_dcache_to_poc(va, size);
+}
 
+static inline void __coherent_icache_guest_page(struct kvm_vcpu *vcpu,
+						kvm_pfn_t pfn,
+						unsigned long size)
+{
 	if (icache_is_aliasing()) {
 		/* any kind of VIPT cache */
 		__flush_icache_all();
 	} else if (is_kernel_in_hyp_mode() || !icache_is_vpipt()) {
 		/* PIPT or VPIPT at EL2 (see comment in __kvm_tlb_flush_vmid_ipa) */
+		void *va = page_address(pfn_to_page(pfn));
+
 		flush_icache_range((unsigned long)va,
 				   (unsigned long)va + size);
 	}
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index b36945d49986..9e5628388af8 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -1257,10 +1257,16 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
 	kvm_mmu_write_protect_pt_masked(kvm, slot, gfn_offset, mask);
 }
 
-static void coherent_cache_guest_page(struct kvm_vcpu *vcpu, kvm_pfn_t pfn,
-				      unsigned long size)
+static void coherent_dcache_guest_page(struct kvm_vcpu *vcpu, kvm_pfn_t pfn,
+				       unsigned long size)
 {
-	__coherent_cache_guest_page(vcpu, pfn, size);
+	__coherent_dcache_guest_page(vcpu, pfn, size);
+}
+
+static void coherent_icache_guest_page(struct kvm_vcpu *vcpu, kvm_pfn_t pfn,
+				       unsigned long size)
+{
+	__coherent_icache_guest_page(vcpu, pfn, size);
 }
 
 static void kvm_send_hwpoison_signal(unsigned long address,
@@ -1391,7 +1397,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 			new_pmd = kvm_s2pmd_mkwrite(new_pmd);
 			kvm_set_pfn_dirty(pfn);
 		}
-		coherent_cache_guest_page(vcpu, pfn, PMD_SIZE);
+		coherent_dcache_guest_page(vcpu, pfn, PMD_SIZE);
+		coherent_icache_guest_page(vcpu, pfn, PMD_SIZE);
+
 		ret = stage2_set_pmd_huge(kvm, memcache, fault_ipa, &new_pmd);
 	} else {
 		pte_t new_pte = pfn_pte(pfn, mem_type);
@@ -1401,7 +1409,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 			kvm_set_pfn_dirty(pfn);
 			mark_page_dirty(kvm, gfn);
 		}
-		coherent_cache_guest_page(vcpu, pfn, PAGE_SIZE);
+		coherent_dcache_guest_page(vcpu, pfn, PAGE_SIZE);
+		coherent_icache_guest_page(vcpu, pfn, PAGE_SIZE);
+
 		ret = stage2_set_pte(kvm, memcache, fault_ipa, &new_pte, flags);
 	}
 
-- 
2.14.1

* [PATCH 02/10] arm64: KVM: Add invalidate_icache_range helper
  2017-10-09 15:20 ` Marc Zyngier
@ 2017-10-09 15:20   ` Marc Zyngier
  -1 siblings, 0 replies; 78+ messages in thread
From: Marc Zyngier @ 2017-10-09 15:20 UTC (permalink / raw)
  To: Christoffer Dall, Catalin Marinas, Will Deacon
  Cc: linux-arm-kernel, kvm, kvmarm

We currently tightly couple dcache clean with icache invalidation,
but KVM could do without the initial flush to PoU, as we've
already flushed things to PoC.

Let's introduce invalidate_icache_range which is limited to
invalidating the icache from the linear mapping (and thus
has none of the userspace fault handling complexity), and
wire it up in KVM instead of flush_icache_range.
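
For reference, the new helper simply walks the range one icache line at
a time from the line-aligned start; a C model of that walk (illustrative
only, with an assumed 64-byte line size -- the real code reads CTR_EL0):

#include <stdio.h>

#define ICACHE_LINE	64UL	/* assumed for the example */

static void invalidate_icache_range_model(unsigned long start, unsigned long end)
{
	/* align the start down to a line boundary (the "bic" in the asm) */
	unsigned long addr = start & ~(ICACHE_LINE - 1);

	do {
		printf("ic ivau, %#lx\n", addr);	/* invalidate one line to PoU */
		addr += ICACHE_LINE;
	} while (addr < end);
	/* the real routine then completes with "dsb ish; isb" */
}

int main(void)
{
	invalidate_icache_range_model(0x100030UL, 0x100200UL);
	return 0;
}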

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm64/include/asm/cacheflush.h |  8 ++++++++
 arch/arm64/include/asm/kvm_mmu.h    |  4 ++--
 arch/arm64/mm/cache.S               | 24 ++++++++++++++++++++++++
 3 files changed, 34 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/cacheflush.h b/arch/arm64/include/asm/cacheflush.h
index 76d1cc85d5b1..ad56406944c6 100644
--- a/arch/arm64/include/asm/cacheflush.h
+++ b/arch/arm64/include/asm/cacheflush.h
@@ -52,6 +52,13 @@
  *		- start  - virtual start address
  *		- end    - virtual end address
  *
+ *	invalidate_icache_range(start, end)
+ *
+ *		Invalidate the I-cache in the region described by start, end.
+ *		Linear mapping only!
+ *		- start  - virtual start address
+ *		- end    - virtual end address
+ *
  *	__flush_cache_user_range(start, end)
  *
  *		Ensure coherency between the I-cache and the D-cache in the
@@ -66,6 +73,7 @@
  *		- size   - region size
  */
 extern void flush_icache_range(unsigned long start, unsigned long end);
+extern void invalidate_icache_range(unsigned long start, unsigned long end);
 extern void __flush_dcache_area(void *addr, size_t len);
 extern void __inval_dcache_area(void *addr, size_t len);
 extern void __clean_dcache_area_poc(void *addr, size_t len);
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 4c4cb4f0e34f..48d31ca2ce9c 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -250,8 +250,8 @@ static inline void __coherent_icache_guest_page(struct kvm_vcpu *vcpu,
 		/* PIPT or VPIPT at EL2 (see comment in __kvm_tlb_flush_vmid_ipa) */
 		void *va = page_address(pfn_to_page(pfn));
 
-		flush_icache_range((unsigned long)va,
-				   (unsigned long)va + size);
+		invalidate_icache_range((unsigned long)va,
+					(unsigned long)va + size);
 	}
 }
 
diff --git a/arch/arm64/mm/cache.S b/arch/arm64/mm/cache.S
index 7f1dbe962cf5..0c330666a8c9 100644
--- a/arch/arm64/mm/cache.S
+++ b/arch/arm64/mm/cache.S
@@ -80,6 +80,30 @@ USER(9f, ic	ivau, x4	)		// invalidate I line PoU
 ENDPROC(flush_icache_range)
 ENDPROC(__flush_cache_user_range)
 
+/*
+ *	invalidate_icache_range(start,end)
+ *
+ *	Ensure that the I cache is invalid within specified region. This
+ *	assumes that this is done on the linear mapping. Do not use it
+ *	on a userspace range, as this may fault horribly.
+ *
+ *	- start   - virtual start address of region
+ *	- end     - virtual end address of region
+ */
+ENTRY(invalidate_icache_range)
+	icache_line_size x2, x3
+	sub	x3, x2, #1
+	bic	x4, x0, x3
+1:
+	ic	ivau, x4			// invalidate I line PoU
+	add	x4, x4, x2
+	cmp	x4, x1
+	b.lo	1b
+	dsb	ish
+	isb
+	ret
+ENDPROC(invalidate_icache_range)
+
 /*
  *	__flush_dcache_area(kaddr, size)
  *
-- 
2.14.1

* [PATCH 03/10] arm: KVM: Add optimized PIPT icache flushing
  2017-10-09 15:20 ` Marc Zyngier
@ 2017-10-09 15:20   ` Marc Zyngier
  -1 siblings, 0 replies; 78+ messages in thread
From: Marc Zyngier @ 2017-10-09 15:20 UTC (permalink / raw)
  To: Christoffer Dall, Catalin Marinas, Will Deacon
  Cc: linux-arm-kernel, kvm, kvmarm

Calling __cpuc_coherent_user_range to invalidate the icache on
a PIPT icache machine has some pointless overhead, as it starts
by cleaning the dcache to the PoU, while we're guaranteed to
have already cleaned it to the PoC.

As KVM is the only user of such a feature, let's implement some
ad-hoc cache flushing in kvm_mmu.h. Should it become useful to
other subsystems, it can be moved to a more global location.
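
For reference, the icache line size used in the loop below comes from
CTR.IminLine (CTR[3:0], the log2 of the number of 4-byte words in the
smallest icache line); a small worked example, with an assumed CTR value:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint32_t ctr = 0x84448003;	/* sample value; the kernel uses
					 * read_cpuid(CPUID_CACHETYPE) */
	uint32_t iminline = ctr & 0xf;	/* log2(words per icache line) */
	uint32_t iclsz = 4 << iminline;	/* 4 bytes per word */

	printf("IminLine=%u -> %u-byte icache lines\n", iminline, iclsz);
	return 0;
}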

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm/include/asm/kvm_hyp.h |  2 ++
 arch/arm/include/asm/kvm_mmu.h | 24 ++++++++++++++++++++++--
 2 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/arch/arm/include/asm/kvm_hyp.h b/arch/arm/include/asm/kvm_hyp.h
index 14b5903f0224..ad541f9ecc78 100644
--- a/arch/arm/include/asm/kvm_hyp.h
+++ b/arch/arm/include/asm/kvm_hyp.h
@@ -69,6 +69,8 @@
 #define HIFAR		__ACCESS_CP15(c6, 4, c0, 2)
 #define HPFAR		__ACCESS_CP15(c6, 4, c0, 4)
 #define ICIALLUIS	__ACCESS_CP15(c7, 0, c1, 0)
+#define BPIALLIS	__ACCESS_CP15(c7, 0, c1, 6)
+#define ICIMVAU		__ACCESS_CP15(c7, 0, c5, 1)
 #define ATS1CPR		__ACCESS_CP15(c7, 0, c8, 0)
 #define TLBIALLIS	__ACCESS_CP15(c8, 0, c3, 0)
 #define TLBIALL		__ACCESS_CP15(c8, 0, c7, 0)
diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index f553aa62d0c3..6773dcf21bff 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -37,6 +37,8 @@
 
 #include <linux/highmem.h>
 #include <asm/cacheflush.h>
+#include <asm/cputype.h>
+#include <asm/kvm_hyp.h>
 #include <asm/pgalloc.h>
 #include <asm/stage2_pgtable.h>
 
@@ -157,6 +159,8 @@ static inline void __coherent_icache_guest_page(struct kvm_vcpu *vcpu,
 						kvm_pfn_t pfn,
 						unsigned long size)
 {
+	u32 iclsz;
+
 	/*
 	 * If we are going to insert an instruction page and the icache is
 	 * either VIPT or PIPT, there is a potential problem where the host
@@ -182,17 +186,33 @@ static inline void __coherent_icache_guest_page(struct kvm_vcpu *vcpu,
 	}
 
 	/* PIPT cache. As for the d-side, use a temporary kernel mapping. */
+	iclsz = 4 << (read_cpuid(CPUID_CACHETYPE) & 0xf);
+
 	while (size) {
 		void *va = kmap_atomic_pfn(pfn);
+		void *end = va + PAGE_SIZE;
+		void *addr = va;
+
+		do {
+			write_sysreg(addr, ICIMVAU);
+			addr += iclsz;
+		} while (addr < end);
 
-		__cpuc_coherent_user_range((unsigned long)va,
-					   (unsigned long)va + PAGE_SIZE);
+		dsb(ishst);
+		isb();
 
 		size -= PAGE_SIZE;
 		pfn++;
 
 		kunmap_atomic(va);
 	}
+
+	/* Check if we need to invalidate the BTB */
+	if ((read_cpuid_ext(CPUID_EXT_MMFR1) >> 24) != 4) {
+		write_sysreg(0, BPIALLIS);
+		dsb(ishst);
+		isb();
+	}
 }
 
 static inline void __kvm_flush_dcache_pte(pte_t pte)
-- 
2.14.1

* [PATCH 04/10] arm64: KVM: PTE/PMD S2 XN bit definition
  2017-10-09 15:20 ` Marc Zyngier
@ 2017-10-09 15:20   ` Marc Zyngier
  -1 siblings, 0 replies; 78+ messages in thread
From: Marc Zyngier @ 2017-10-09 15:20 UTC (permalink / raw)
  To: Christoffer Dall, Catalin Marinas, Will Deacon
  Cc: kvm, linux-arm-kernel, kvmarm

As we're about to make S2 page-tables eXecute Never by default,
add the required bits for both PMDs and PTEs.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm64/include/asm/pgtable-hwdef.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm64/include/asm/pgtable-hwdef.h b/arch/arm64/include/asm/pgtable-hwdef.h
index eb0c2bd90de9..af035331fb09 100644
--- a/arch/arm64/include/asm/pgtable-hwdef.h
+++ b/arch/arm64/include/asm/pgtable-hwdef.h
@@ -177,9 +177,11 @@
  */
 #define PTE_S2_RDONLY		(_AT(pteval_t, 1) << 6)   /* HAP[2:1] */
 #define PTE_S2_RDWR		(_AT(pteval_t, 3) << 6)   /* HAP[2:1] */
+#define PTE_S2_XN		(_AT(pteval_t, 2) << 53)  /* XN[1:0] */
 
 #define PMD_S2_RDONLY		(_AT(pmdval_t, 1) << 6)   /* HAP[2:1] */
 #define PMD_S2_RDWR		(_AT(pmdval_t, 3) << 6)   /* HAP[2:1] */
+#define PMD_S2_XN		(_AT(pmdval_t, 2) << 53)  /* XN[1:0] */
 
 /*
  * Memory Attribute override for Stage-2 (MemAttr[3:0])
-- 
2.14.1

* [PATCH 05/10] KVM: arm/arm64: Limit icache invalidation to prefetch aborts
  2017-10-09 15:20 ` Marc Zyngier
@ 2017-10-09 15:20   ` Marc Zyngier
  -1 siblings, 0 replies; 78+ messages in thread
From: Marc Zyngier @ 2017-10-09 15:20 UTC (permalink / raw)
  To: Christoffer Dall, Catalin Marinas, Will Deacon
  Cc: kvm, linux-arm-kernel, kvmarm

We've so far eagerly invalidated the icache, no matter how
the page was faulted in (data or prefetch abort).

But we can easily track execution by setting the XN bits
in the S2 page tables, get the prefetch abort at HYP and
perform the icache invalidation at that time only.

As, for most VMs, the instruction working set is pretty
small compared to the data set, this is likely to save
some traffic (especially as the invalidation is broadcast).

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm/include/asm/kvm_mmu.h        | 12 ++++++++++++
 arch/arm/include/asm/pgtable.h        |  4 ++--
 arch/arm64/include/asm/kvm_mmu.h      | 12 ++++++++++++
 arch/arm64/include/asm/pgtable-prot.h |  4 ++--
 virt/kvm/arm/mmu.c                    | 19 +++++++++++++++----
 5 files changed, 43 insertions(+), 8 deletions(-)

diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index 6773dcf21bff..bf76150aad5f 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -85,6 +85,18 @@ static inline pmd_t kvm_s2pmd_mkwrite(pmd_t pmd)
 	return pmd;
 }
 
+static inline pte_t kvm_s2pte_mkexec(pte_t pte)
+{
+	pte_val(pte) &= ~L_PTE_XN;
+	return pte;
+}
+
+static inline pmd_t kvm_s2pmd_mkexec(pmd_t pmd)
+{
+	pmd_val(pmd) &= ~PMD_SECT_XN;
+	return pmd;
+}
+
 static inline void kvm_set_s2pte_readonly(pte_t *pte)
 {
 	pte_val(*pte) = (pte_val(*pte) & ~L_PTE_S2_RDWR) | L_PTE_S2_RDONLY;
diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
index 1c462381c225..9b6e77b9ab7e 100644
--- a/arch/arm/include/asm/pgtable.h
+++ b/arch/arm/include/asm/pgtable.h
@@ -102,8 +102,8 @@ extern pgprot_t		pgprot_s2_device;
 #define PAGE_HYP_EXEC		_MOD_PROT(pgprot_kernel, L_PTE_HYP | L_PTE_RDONLY)
 #define PAGE_HYP_RO		_MOD_PROT(pgprot_kernel, L_PTE_HYP | L_PTE_RDONLY | L_PTE_XN)
 #define PAGE_HYP_DEVICE		_MOD_PROT(pgprot_hyp_device, L_PTE_HYP)
-#define PAGE_S2			_MOD_PROT(pgprot_s2, L_PTE_S2_RDONLY)
-#define PAGE_S2_DEVICE		_MOD_PROT(pgprot_s2_device, L_PTE_S2_RDONLY)
+#define PAGE_S2			_MOD_PROT(pgprot_s2, L_PTE_S2_RDONLY | L_PTE_XN)
+#define PAGE_S2_DEVICE		_MOD_PROT(pgprot_s2_device, L_PTE_S2_RDONLY | L_PTE_XN)
 
 #define __PAGE_NONE		__pgprot(_L_PTE_DEFAULT | L_PTE_RDONLY | L_PTE_XN | L_PTE_NONE)
 #define __PAGE_SHARED		__pgprot(_L_PTE_DEFAULT | L_PTE_USER | L_PTE_XN)
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 48d31ca2ce9c..60c420a5ac0d 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -173,6 +173,18 @@ static inline pmd_t kvm_s2pmd_mkwrite(pmd_t pmd)
 	return pmd;
 }
 
+static inline pte_t kvm_s2pte_mkexec(pte_t pte)
+{
+	pte_val(pte) &= ~PTE_S2_XN;
+	return pte;
+}
+
+static inline pmd_t kvm_s2pmd_mkexec(pmd_t pmd)
+{
+	pmd_val(pmd) &= ~PMD_S2_XN;
+	return pmd;
+}
+
 static inline void kvm_set_s2pte_readonly(pte_t *pte)
 {
 	pteval_t old_pteval, pteval;
diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h
index 0a5635fb0ef9..4e12dabd342b 100644
--- a/arch/arm64/include/asm/pgtable-prot.h
+++ b/arch/arm64/include/asm/pgtable-prot.h
@@ -60,8 +60,8 @@
 #define PAGE_HYP_RO		__pgprot(_PAGE_DEFAULT | PTE_HYP | PTE_RDONLY | PTE_HYP_XN)
 #define PAGE_HYP_DEVICE		__pgprot(PROT_DEVICE_nGnRE | PTE_HYP)
 
-#define PAGE_S2			__pgprot(PROT_DEFAULT | PTE_S2_MEMATTR(MT_S2_NORMAL) | PTE_S2_RDONLY)
-#define PAGE_S2_DEVICE		__pgprot(PROT_DEFAULT | PTE_S2_MEMATTR(MT_S2_DEVICE_nGnRE) | PTE_S2_RDONLY | PTE_UXN)
+#define PAGE_S2			__pgprot(PROT_DEFAULT | PTE_S2_MEMATTR(MT_S2_NORMAL) | PTE_S2_RDONLY | PTE_S2_XN)
+#define PAGE_S2_DEVICE		__pgprot(PROT_DEFAULT | PTE_S2_MEMATTR(MT_S2_DEVICE_nGnRE) | PTE_S2_RDONLY | PTE_S2_XN)
 
 #define PAGE_NONE		__pgprot(((_PAGE_DEFAULT) & ~PTE_VALID) | PTE_PROT_NONE | PTE_RDONLY | PTE_PXN | PTE_UXN)
 #define PAGE_SHARED		__pgprot(_PAGE_DEFAULT | PTE_USER | PTE_NG | PTE_PXN | PTE_UXN | PTE_WRITE)
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index 9e5628388af8..1d47da22f75c 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -1292,7 +1292,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 			  unsigned long fault_status)
 {
 	int ret;
-	bool write_fault, writable, hugetlb = false, force_pte = false;
+	bool write_fault, exec_fault, writable, hugetlb = false, force_pte = false;
 	unsigned long mmu_seq;
 	gfn_t gfn = fault_ipa >> PAGE_SHIFT;
 	struct kvm *kvm = vcpu->kvm;
@@ -1304,7 +1304,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	unsigned long flags = 0;
 
 	write_fault = kvm_is_write_fault(vcpu);
-	if (fault_status == FSC_PERM && !write_fault) {
+	exec_fault = kvm_vcpu_trap_is_iabt(vcpu);
+	VM_BUG_ON(write_fault && exec_fault);
+
+	if (fault_status == FSC_PERM && !write_fault && !exec_fault) {
 		kvm_err("Unexpected L2 read permission error\n");
 		return -EFAULT;
 	}
@@ -1398,7 +1401,11 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 			kvm_set_pfn_dirty(pfn);
 		}
 		coherent_dcache_guest_page(vcpu, pfn, PMD_SIZE);
-		coherent_icache_guest_page(vcpu, pfn, PMD_SIZE);
+
+		if (exec_fault) {
+			new_pmd = kvm_s2pmd_mkexec(new_pmd);
+			coherent_icache_guest_page(vcpu, pfn, PMD_SIZE);
+		}
 
 		ret = stage2_set_pmd_huge(kvm, memcache, fault_ipa, &new_pmd);
 	} else {
@@ -1410,7 +1417,11 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 			mark_page_dirty(kvm, gfn);
 		}
 		coherent_dcache_guest_page(vcpu, pfn, PAGE_SIZE);
-		coherent_icache_guest_page(vcpu, pfn, PAGE_SIZE);
+
+		if (exec_fault) {
+			new_pte = kvm_s2pte_mkexec(new_pte);
+			coherent_icache_guest_page(vcpu, pfn, PAGE_SIZE);
+		}
 
 		ret = stage2_set_pte(kvm, memcache, fault_ipa, &new_pte, flags);
 	}
-- 
2.14.1

* [PATCH 06/10] KVM: arm/arm64: Only clean the dcache on translation fault
  2017-10-09 15:20 ` Marc Zyngier
@ 2017-10-09 15:20   ` Marc Zyngier
  -1 siblings, 0 replies; 78+ messages in thread
From: Marc Zyngier @ 2017-10-09 15:20 UTC (permalink / raw)
  To: Christoffer Dall, Catalin Marinas, Will Deacon
  Cc: kvm, linux-arm-kernel, kvmarm

The only case where we actually need to perform dcache maintenance
is when we map the page for the first time; subsequent permission
faults do not require cache maintenance. Let's make the clean conditional
on the fault not being a permission fault (and thus being a translation fault).
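
Combined with the previous patch, the resulting policy can be summarised
as below (illustrative stand-ins, not the actual helpers):

#include <stdbool.h>
#include <stdio.h>

enum fault_kind { TRANSLATION_FAULT, PERMISSION_FAULT };

/* stand-ins for the real maintenance helpers */
static void clean_dcache_to_poc(void) { puts("dcache clean to PoC"); }
static void invalidate_icache(void)   { puts("icache invalidate"); }

static void maintain_caches(enum fault_kind kind, bool exec_fault)
{
	if (kind == TRANSLATION_FAULT)
		clean_dcache_to_poc();	/* only when the page is first mapped */
	if (exec_fault)
		invalidate_icache();	/* only when the guest executes from it */
}

int main(void)
{
	maintain_caches(TRANSLATION_FAULT, false);	/* data abort, new page */
	maintain_caches(TRANSLATION_FAULT, true);	/* prefetch abort, new page */
	maintain_caches(PERMISSION_FAULT, true);	/* exec permission fault */
	maintain_caches(PERMISSION_FAULT, false);	/* write permission fault */
	return 0;
}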

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 virt/kvm/arm/mmu.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index 1d47da22f75c..1911fadde88b 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -1400,7 +1400,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 			new_pmd = kvm_s2pmd_mkwrite(new_pmd);
 			kvm_set_pfn_dirty(pfn);
 		}
-		coherent_dcache_guest_page(vcpu, pfn, PMD_SIZE);
+
+		if (fault_status != FSC_PERM)
+			coherent_dcache_guest_page(vcpu, pfn, PMD_SIZE);
 
 		if (exec_fault) {
 			new_pmd = kvm_s2pmd_mkexec(new_pmd);
@@ -1416,7 +1418,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 			kvm_set_pfn_dirty(pfn);
 			mark_page_dirty(kvm, gfn);
 		}
-		coherent_dcache_guest_page(vcpu, pfn, PAGE_SIZE);
+
+		if (fault_status != FSC_PERM)
+			coherent_dcache_guest_page(vcpu, pfn, PAGE_SIZE);
 
 		if (exec_fault) {
 			new_pte = kvm_s2pte_mkexec(new_pte);
-- 
2.14.1

* [PATCH 07/10] KVM: arm/arm64: Preserve Exec permission across R/W permission faults
  2017-10-09 15:20 ` Marc Zyngier
@ 2017-10-09 15:20   ` Marc Zyngier
  -1 siblings, 0 replies; 78+ messages in thread
From: Marc Zyngier @ 2017-10-09 15:20 UTC (permalink / raw)
  To: Christoffer Dall, Catalin Marinas, Will Deacon
  Cc: linux-arm-kernel, kvm, kvmarm

So far, we lose the Exec property whenever we take a permission
fault, as we always reconstruct the PTE/PMD from scratch. This
can be counterproductive, as we can end up with the following
fault sequence:

	X -> RO -> ROX -> RW -> RWX

Instead, we can look up the existing PTE/PMD and clear the XN bit in the
new entry if it was already cleared in the old one, leading to a much
nicer fault sequence:

	X -> ROX -> RWX
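
A sketch of the difference (illustrative types only, not the kernel
code): on a write permission fault, the old code rebuilds the entry and
loses Exec, while the new code carries it over from the existing entry:

#include <stdbool.h>
#include <stdio.h>

struct s2_entry { bool write; bool exec; };

/* old behaviour: rebuild from scratch; exec only set on an exec fault */
static struct s2_entry rebuild_old(bool write_fault, bool exec_fault)
{
	return (struct s2_entry){ .write = write_fault, .exec = exec_fault };
}

/* new behaviour: additionally preserve exec if the old entry had XN cleared */
static struct s2_entry rebuild_new(struct s2_entry old,
				   bool write_fault, bool exec_fault)
{
	struct s2_entry e = rebuild_old(write_fault, exec_fault);

	if (old.exec)
		e.exec = true;
	return e;
}

int main(void)
{
	struct s2_entry cur = { .write = false, .exec = true };  /* ROX mapping */

	/* the guest now writes to the page: write permission fault */
	struct s2_entry old_way = rebuild_old(true, false);	  /* RW, XN again */
	struct s2_entry new_way = rebuild_new(cur, true, false); /* RWX */

	printf("old: exec=%d, new: exec=%d\n", old_way.exec, new_way.exec);
	return 0;
}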

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm/include/asm/kvm_mmu.h   | 10 ++++++++++
 arch/arm64/include/asm/kvm_mmu.h | 10 ++++++++++
 virt/kvm/arm/mmu.c               | 25 +++++++++++++++++++++++++
 3 files changed, 45 insertions(+)

diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index bf76150aad5f..ad442d86c23e 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -107,6 +107,11 @@ static inline bool kvm_s2pte_readonly(pte_t *pte)
 	return (pte_val(*pte) & L_PTE_S2_RDWR) == L_PTE_S2_RDONLY;
 }
 
+static inline bool kvm_s2pte_exec(pte_t *pte)
+{
+	return !(pte_val(*pte) & L_PTE_XN);
+}
+
 static inline void kvm_set_s2pmd_readonly(pmd_t *pmd)
 {
 	pmd_val(*pmd) = (pmd_val(*pmd) & ~L_PMD_S2_RDWR) | L_PMD_S2_RDONLY;
@@ -117,6 +122,11 @@ static inline bool kvm_s2pmd_readonly(pmd_t *pmd)
 	return (pmd_val(*pmd) & L_PMD_S2_RDWR) == L_PMD_S2_RDONLY;
 }
 
+static inline bool kvm_s2pmd_exec(pmd_t *pmd)
+{
+	return !(pmd_val(*pmd) & PMD_SECT_XN);
+}
+
 static inline bool kvm_page_empty(void *ptr)
 {
 	struct page *ptr_page = virt_to_page(ptr);
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 60c420a5ac0d..e7af74b8b51a 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -203,6 +203,11 @@ static inline bool kvm_s2pte_readonly(pte_t *pte)
 	return (pte_val(*pte) & PTE_S2_RDWR) == PTE_S2_RDONLY;
 }
 
+static inline bool kvm_s2pte_exec(pte_t *pte)
+{
+	return !(pte_val(*pte) & PTE_S2_XN);
+}
+
 static inline void kvm_set_s2pmd_readonly(pmd_t *pmd)
 {
 	kvm_set_s2pte_readonly((pte_t *)pmd);
@@ -213,6 +218,11 @@ static inline bool kvm_s2pmd_readonly(pmd_t *pmd)
 	return kvm_s2pte_readonly((pte_t *)pmd);
 }
 
+static inline bool kvm_s2pmd_exec(pmd_t *pmd)
+{
+	return !(pmd_val(*pmd) & PMD_S2_XN);
+}
+
 static inline bool kvm_page_empty(void *ptr)
 {
 	struct page *ptr_page = virt_to_page(ptr);
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index 1911fadde88b..ccc6106764a6 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -926,6 +926,17 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache
 	return 0;
 }
 
+static pte_t *stage2_get_pte(struct kvm *kvm, phys_addr_t addr)
+{
+	pmd_t *pmdp;
+
+	pmdp = stage2_get_pmd(kvm, NULL, addr);
+	if (!pmdp || pmd_none(*pmdp))
+		return NULL;
+
+	return pte_offset_kernel(pmdp, addr);
+}
+
 static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
 			  phys_addr_t addr, const pte_t *new_pte,
 			  unsigned long flags)
@@ -1407,6 +1418,13 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		if (exec_fault) {
 			new_pmd = kvm_s2pmd_mkexec(new_pmd);
 			coherent_icache_guest_page(vcpu, pfn, PMD_SIZE);
+		} else if (fault_status == FSC_PERM) {
+			/* Preserve execute if XN was already cleared */
+			pmd_t *old_pmdp = stage2_get_pmd(kvm, NULL, fault_ipa);
+
+			if (old_pmdp && pmd_present(*old_pmdp) &&
+			    kvm_s2pmd_exec(old_pmdp))
+				new_pmd = kvm_s2pmd_mkexec(new_pmd);
 		}
 
 		ret = stage2_set_pmd_huge(kvm, memcache, fault_ipa, &new_pmd);
@@ -1425,6 +1443,13 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		if (exec_fault) {
 			new_pte = kvm_s2pte_mkexec(new_pte);
 			coherent_icache_guest_page(vcpu, pfn, PAGE_SIZE);
+		} else if (fault_status == FSC_PERM) {
+			/* Preserve execute if XN was already cleared */
+			pte_t *old_ptep = stage2_get_pte(kvm, fault_ipa);
+
+			if (old_ptep && pte_present(*old_ptep) &&
+			    kvm_s2pte_exec(old_ptep))
+				new_pte = kvm_s2pte_mkexec(new_pte);
 		}
 
 		ret = stage2_set_pte(kvm, memcache, fault_ipa, &new_pte, flags);
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH 08/10] KVM: arm/arm64: Drop vcpu parameter from coherent_{d,i}cache_guest_page
  2017-10-09 15:20 ` Marc Zyngier
@ 2017-10-09 15:20   ` Marc Zyngier
  -1 siblings, 0 replies; 78+ messages in thread
From: Marc Zyngier @ 2017-10-09 15:20 UTC (permalink / raw)
  To: Christoffer Dall, Catalin Marinas, Will Deacon
  Cc: linux-arm-kernel, kvm, kvmarm

The vcpu parameter isn't used for anything, and gets in the way of
further cleanups. Let's get rid of it.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm/include/asm/kvm_mmu.h   |  6 ++----
 arch/arm64/include/asm/kvm_mmu.h |  6 ++----
 virt/kvm/arm/mmu.c               | 18 ++++++++----------
 3 files changed, 12 insertions(+), 18 deletions(-)

diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index ad442d86c23e..5f1ac88a5951 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -150,8 +150,7 @@ static inline bool vcpu_has_cache_enabled(struct kvm_vcpu *vcpu)
 	return (vcpu_cp15(vcpu, c1_SCTLR) & 0b101) == 0b101;
 }
 
-static inline void __coherent_dcache_guest_page(struct kvm_vcpu *vcpu,
-						kvm_pfn_t pfn,
+static inline void __coherent_dcache_guest_page(kvm_pfn_t pfn,
 						unsigned long size)
 {
 	/*
@@ -177,8 +176,7 @@ static inline void __coherent_dcache_guest_page(struct kvm_vcpu *vcpu,
 	}
 }
 
-static inline void __coherent_icache_guest_page(struct kvm_vcpu *vcpu,
-						kvm_pfn_t pfn,
+static inline void __coherent_icache_guest_page(kvm_pfn_t pfn,
 						unsigned long size)
 {
 	u32 iclsz;
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index e7af74b8b51a..33dcc3c79574 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -252,8 +252,7 @@ static inline bool vcpu_has_cache_enabled(struct kvm_vcpu *vcpu)
 	return (vcpu_sys_reg(vcpu, SCTLR_EL1) & 0b101) == 0b101;
 }
 
-static inline void __coherent_dcache_guest_page(struct kvm_vcpu *vcpu,
-						kvm_pfn_t pfn,
+static inline void __coherent_dcache_guest_page(kvm_pfn_t pfn,
 						unsigned long size)
 {
 	void *va = page_address(pfn_to_page(pfn));
@@ -261,8 +260,7 @@ static inline void __coherent_dcache_guest_page(struct kvm_vcpu *vcpu,
 	kvm_flush_dcache_to_poc(va, size);
 }
 
-static inline void __coherent_icache_guest_page(struct kvm_vcpu *vcpu,
-						kvm_pfn_t pfn,
+static inline void __coherent_icache_guest_page(kvm_pfn_t pfn,
 						unsigned long size)
 {
 	if (icache_is_aliasing()) {
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index ccc6106764a6..5b495450e92f 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -1268,16 +1268,14 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
 	kvm_mmu_write_protect_pt_masked(kvm, slot, gfn_offset, mask);
 }
 
-static void coherent_dcache_guest_page(struct kvm_vcpu *vcpu, kvm_pfn_t pfn,
-				       unsigned long size)
+static void coherent_dcache_guest_page(kvm_pfn_t pfn, unsigned long size)
 {
-	__coherent_dcache_guest_page(vcpu, pfn, size);
+	__coherent_dcache_guest_page(pfn, size);
 }
 
-static void coherent_icache_guest_page(struct kvm_vcpu *vcpu, kvm_pfn_t pfn,
-				       unsigned long size)
+static void coherent_icache_guest_page(kvm_pfn_t pfn, unsigned long size)
 {
-	__coherent_icache_guest_page(vcpu, pfn, size);
+	__coherent_icache_guest_page(pfn, size);
 }
 
 static void kvm_send_hwpoison_signal(unsigned long address,
@@ -1413,11 +1411,11 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		}
 
 		if (fault_status != FSC_PERM)
-			coherent_dcache_guest_page(vcpu, pfn, PMD_SIZE);
+			coherent_dcache_guest_page(pfn, PMD_SIZE);
 
 		if (exec_fault) {
 			new_pmd = kvm_s2pmd_mkexec(new_pmd);
-			coherent_icache_guest_page(vcpu, pfn, PMD_SIZE);
+			coherent_icache_guest_page(pfn, PMD_SIZE);
 		} else if (fault_status == FSC_PERM) {
 			/* Preserve execute if XN was already cleared */
 			pmd_t *old_pmdp = stage2_get_pmd(kvm, NULL, fault_ipa);
@@ -1438,11 +1436,11 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		}
 
 		if (fault_status != FSC_PERM)
-			coherent_dcache_guest_page(vcpu, pfn, PAGE_SIZE);
+			coherent_dcache_guest_page(pfn, PAGE_SIZE);
 
 		if (exec_fault) {
 			new_pte = kvm_s2pte_mkexec(new_pte);
-			coherent_icache_guest_page(vcpu, pfn, PAGE_SIZE);
+			coherent_icache_guest_page(pfn, PAGE_SIZE);
 		} else if (fault_status == FSC_PERM) {
 			/* Preserve execute if XN was already cleared */
 			pte_t *old_ptep = stage2_get_pte(kvm, fault_ipa);
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH 09/10] KVM: arm/arm64: Detangle kvm_mmu.h from kvm_hyp.h
  2017-10-09 15:20 ` Marc Zyngier
@ 2017-10-09 15:20   ` Marc Zyngier
  -1 siblings, 0 replies; 78+ messages in thread
From: Marc Zyngier @ 2017-10-09 15:20 UTC (permalink / raw)
  To: Christoffer Dall, Catalin Marinas, Will Deacon
  Cc: linux-arm-kernel, kvm, kvmarm

kvm_hyp.h has an odd dependency on kvm_mmu.h, which makes the
opposite inclusion impossible. Let's start by breaking that
useless dependency.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm/include/asm/kvm_hyp.h   | 1 -
 arch/arm/kvm/hyp/switch.c        | 1 +
 arch/arm/kvm/hyp/tlb.c           | 1 +
 arch/arm64/include/asm/kvm_hyp.h | 1 -
 arch/arm64/kvm/hyp/debug-sr.c    | 1 +
 arch/arm64/kvm/hyp/switch.c      | 1 +
 arch/arm64/kvm/hyp/tlb.c         | 1 +
 virt/kvm/arm/hyp/vgic-v2-sr.c    | 1 +
 8 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/arm/include/asm/kvm_hyp.h b/arch/arm/include/asm/kvm_hyp.h
index ad541f9ecc78..8b29faa119ba 100644
--- a/arch/arm/include/asm/kvm_hyp.h
+++ b/arch/arm/include/asm/kvm_hyp.h
@@ -21,7 +21,6 @@
 #include <linux/compiler.h>
 #include <linux/kvm_host.h>
 #include <asm/cp15.h>
-#include <asm/kvm_mmu.h>
 #include <asm/vfp.h>
 
 #define __hyp_text __section(.hyp.text) notrace
diff --git a/arch/arm/kvm/hyp/switch.c b/arch/arm/kvm/hyp/switch.c
index ebd2dd46adf7..67e0a689c4b5 100644
--- a/arch/arm/kvm/hyp/switch.c
+++ b/arch/arm/kvm/hyp/switch.c
@@ -18,6 +18,7 @@
 
 #include <asm/kvm_asm.h>
 #include <asm/kvm_hyp.h>
+#include <asm/kvm_mmu.h>
 
 __asm__(".arch_extension     virt");
 
diff --git a/arch/arm/kvm/hyp/tlb.c b/arch/arm/kvm/hyp/tlb.c
index 6d810af2d9fd..c0edd450e104 100644
--- a/arch/arm/kvm/hyp/tlb.c
+++ b/arch/arm/kvm/hyp/tlb.c
@@ -19,6 +19,7 @@
  */
 
 #include <asm/kvm_hyp.h>
+#include <asm/kvm_mmu.h>
 
 /**
  * Flush per-VMID TLBs
diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
index 4572a9b560fa..afbfbe0c12c5 100644
--- a/arch/arm64/include/asm/kvm_hyp.h
+++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -20,7 +20,6 @@
 
 #include <linux/compiler.h>
 #include <linux/kvm_host.h>
-#include <asm/kvm_mmu.h>
 #include <asm/sysreg.h>
 
 #define __hyp_text __section(.hyp.text) notrace
diff --git a/arch/arm64/kvm/hyp/debug-sr.c b/arch/arm64/kvm/hyp/debug-sr.c
index f5154ed3da6c..d3a13d57f2c5 100644
--- a/arch/arm64/kvm/hyp/debug-sr.c
+++ b/arch/arm64/kvm/hyp/debug-sr.c
@@ -21,6 +21,7 @@
 #include <asm/debug-monitors.h>
 #include <asm/kvm_asm.h>
 #include <asm/kvm_hyp.h>
+#include <asm/kvm_mmu.h>
 
 #define read_debug(r,n)		read_sysreg(r##n##_el1)
 #define write_debug(v,r,n)	write_sysreg(v, r##n##_el1)
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 945e79c641c4..c52f7094122f 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -21,6 +21,7 @@
 #include <asm/kvm_asm.h>
 #include <asm/kvm_emulate.h>
 #include <asm/kvm_hyp.h>
+#include <asm/kvm_mmu.h>
 #include <asm/fpsimd.h>
 
 static bool __hyp_text __fpsimd_enabled_nvhe(void)
diff --git a/arch/arm64/kvm/hyp/tlb.c b/arch/arm64/kvm/hyp/tlb.c
index 73464a96c365..131c7772703c 100644
--- a/arch/arm64/kvm/hyp/tlb.c
+++ b/arch/arm64/kvm/hyp/tlb.c
@@ -16,6 +16,7 @@
  */
 
 #include <asm/kvm_hyp.h>
+#include <asm/kvm_mmu.h>
 #include <asm/tlbflush.h>
 
 static void __hyp_text __tlb_switch_to_guest_vhe(struct kvm *kvm)
diff --git a/virt/kvm/arm/hyp/vgic-v2-sr.c b/virt/kvm/arm/hyp/vgic-v2-sr.c
index a3f18d362366..77ccd8e2090b 100644
--- a/virt/kvm/arm/hyp/vgic-v2-sr.c
+++ b/virt/kvm/arm/hyp/vgic-v2-sr.c
@@ -21,6 +21,7 @@
 
 #include <asm/kvm_emulate.h>
 #include <asm/kvm_hyp.h>
+#include <asm/kvm_mmu.h>
 
 static void __hyp_text save_elrsr(struct kvm_vcpu *vcpu, void __iomem *base)
 {
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH 10/10] arm: KVM: Use common implementation for all flushes to PoC
  2017-10-09 15:20 ` Marc Zyngier
@ 2017-10-09 15:20   ` Marc Zyngier
  -1 siblings, 0 replies; 78+ messages in thread
From: Marc Zyngier @ 2017-10-09 15:20 UTC (permalink / raw)
  To: Christoffer Dall, Catalin Marinas, Will Deacon
  Cc: linux-arm-kernel, kvm, kvmarm

We currently have no fewer than three implementations of the
"flush to PoC" code. Let's standardize on a single one. This
requires a bit of unpleasant moving around, and relies on
__kvm_flush_dcache_pte and co being #defines so that they can
call into coherent_dcache_guest_page...

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm/include/asm/kvm_mmu.h | 28 ++++------------------------
 virt/kvm/arm/mmu.c             | 20 ++++++++++----------
 2 files changed, 14 insertions(+), 34 deletions(-)

diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index 5f1ac88a5951..011b0db85c02 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -235,31 +235,11 @@ static inline void __coherent_icache_guest_page(kvm_pfn_t pfn,
 	}
 }
 
-static inline void __kvm_flush_dcache_pte(pte_t pte)
-{
-	void *va = kmap_atomic(pte_page(pte));
-
-	kvm_flush_dcache_to_poc(va, PAGE_SIZE);
-
-	kunmap_atomic(va);
-}
-
-static inline void __kvm_flush_dcache_pmd(pmd_t pmd)
-{
-	unsigned long size = PMD_SIZE;
-	kvm_pfn_t pfn = pmd_pfn(pmd);
-
-	while (size) {
-		void *va = kmap_atomic_pfn(pfn);
+#define __kvm_flush_dcache_pte(p)				\
+	coherent_dcache_guest_page(pte_pfn((p)), PAGE_SIZE)
 
-		kvm_flush_dcache_to_poc(va, PAGE_SIZE);
-
-		pfn++;
-		size -= PAGE_SIZE;
-
-		kunmap_atomic(va);
-	}
-}
+#define __kvm_flush_dcache_pmd(p)				\
+	coherent_dcache_guest_page(pmd_pfn((p)), PMD_SIZE)
 
 static inline void __kvm_flush_dcache_pud(pud_t pud)
 {
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index 5b495450e92f..ab027fdf76e7 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -70,6 +70,16 @@ static void kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa)
 	kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, kvm, ipa);
 }
 
+static void coherent_dcache_guest_page(kvm_pfn_t pfn, unsigned long size)
+{
+	__coherent_dcache_guest_page(pfn, size);
+}
+
+static void coherent_icache_guest_page(kvm_pfn_t pfn, unsigned long size)
+{
+	__coherent_icache_guest_page(pfn, size);
+}
+
 /*
  * D-Cache management functions. They take the page table entries by
  * value, as they are flushing the cache using the kernel mapping (or
@@ -1268,16 +1278,6 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
 	kvm_mmu_write_protect_pt_masked(kvm, slot, gfn_offset, mask);
 }
 
-static void coherent_dcache_guest_page(kvm_pfn_t pfn, unsigned long size)
-{
-	__coherent_dcache_guest_page(pfn, size);
-}
-
-static void coherent_icache_guest_page(kvm_pfn_t pfn, unsigned long size)
-{
-	__coherent_icache_guest_page(pfn, size);
-}
-
 static void kvm_send_hwpoison_signal(unsigned long address,
 				     struct vm_area_struct *vma)
 {
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* Re: [PATCH 10/10] arm: KVM: Use common implementation for all flushes to PoC
  2017-10-09 15:20   ` Marc Zyngier
@ 2017-10-16 20:06     ` Christoffer Dall
  -1 siblings, 0 replies; 78+ messages in thread
From: Christoffer Dall @ 2017-10-16 20:06 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: kvm, Catalin Marinas, Will Deacon, kvmarm, linux-arm-kernel

On Mon, Oct 09, 2017 at 04:20:32PM +0100, Marc Zyngier wrote:
> We currently have no less than three implementations for the
> "flush to PoC" code. Let standardize on a single one. This
> requires a bit of unpleasant moving around, and relies on
> __kvm_flush_dcache_pte and co being #defines so that they can
> call into coherent_dcache_guest_page...
> 
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  arch/arm/include/asm/kvm_mmu.h | 28 ++++------------------------
>  virt/kvm/arm/mmu.c             | 20 ++++++++++----------
>  2 files changed, 14 insertions(+), 34 deletions(-)
> 
> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
> index 5f1ac88a5951..011b0db85c02 100644
> --- a/arch/arm/include/asm/kvm_mmu.h
> +++ b/arch/arm/include/asm/kvm_mmu.h
> @@ -235,31 +235,11 @@ static inline void __coherent_icache_guest_page(kvm_pfn_t pfn,
>  	}
>  }
>  
> -static inline void __kvm_flush_dcache_pte(pte_t pte)
> -{
> -	void *va = kmap_atomic(pte_page(pte));
> -
> -	kvm_flush_dcache_to_poc(va, PAGE_SIZE);
> -
> -	kunmap_atomic(va);
> -}
> -
> -static inline void __kvm_flush_dcache_pmd(pmd_t pmd)
> -{
> -	unsigned long size = PMD_SIZE;
> -	kvm_pfn_t pfn = pmd_pfn(pmd);
> -
> -	while (size) {
> -		void *va = kmap_atomic_pfn(pfn);
> +#define __kvm_flush_dcache_pte(p)				\
> +	coherent_dcache_guest_page(pte_pfn((p)), PAGE_SIZE)
>  
> -		kvm_flush_dcache_to_poc(va, PAGE_SIZE);
> -
> -		pfn++;
> -		size -= PAGE_SIZE;
> -
> -		kunmap_atomic(va);
> -	}
> -}
> +#define __kvm_flush_dcache_pmd(p)				\
> +	coherent_dcache_guest_page(pmd_pfn((p)), PMD_SIZE)

Why can't these just be static inlines in the header file which call
__coherent_dcache_guest_page directly?

I'm really not too crazy about these #defines.

In fact, why do we need the coherent_Xcache_guest_page static
indirection functions in mmu.c in the first place?
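
For illustration, the static-inline variant being suggested would presumably
look something like this (only a sketch, assuming pte_pfn()/pmd_pfn() and
__coherent_dcache_guest_page() are all visible at this point in kvm_mmu.h;
it is not taken from the series itself):

static inline void __kvm_flush_dcache_pte(pte_t pte)
{
	/* clean the data cache to the PoC for this single page */
	__coherent_dcache_guest_page(pte_pfn(pte), PAGE_SIZE);
}

static inline void __kvm_flush_dcache_pmd(pmd_t pmd)
{
	/* same thing, but covering the whole huge mapping */
	__coherent_dcache_guest_page(pmd_pfn(pmd), PMD_SIZE);
}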


Thanks,
-Christoffer
>  
>  static inline void __kvm_flush_dcache_pud(pud_t pud)
>  {
> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> index 5b495450e92f..ab027fdf76e7 100644
> --- a/virt/kvm/arm/mmu.c
> +++ b/virt/kvm/arm/mmu.c
> @@ -70,6 +70,16 @@ static void kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa)
>  	kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, kvm, ipa);
>  }
>  
> +static void coherent_dcache_guest_page(kvm_pfn_t pfn, unsigned long size)
> +{
> +	__coherent_dcache_guest_page(pfn, size);
> +}
> +
> +static void coherent_icache_guest_page(kvm_pfn_t pfn, unsigned long size)
> +{
> +	__coherent_icache_guest_page(pfn, size);
> +}
> +
>  /*
>   * D-Cache management functions. They take the page table entries by
>   * value, as they are flushing the cache using the kernel mapping (or
> @@ -1268,16 +1278,6 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
>  	kvm_mmu_write_protect_pt_masked(kvm, slot, gfn_offset, mask);
>  }
>  
> -static void coherent_dcache_guest_page(kvm_pfn_t pfn, unsigned long size)
> -{
> -	__coherent_dcache_guest_page(pfn, size);
> -}
> -
> -static void coherent_icache_guest_page(kvm_pfn_t pfn, unsigned long size)
> -{
> -	__coherent_icache_guest_page(pfn, size);
> -}
> -
>  static void kvm_send_hwpoison_signal(unsigned long address,
>  				     struct vm_area_struct *vma)
>  {
> -- 
> 2.14.1
> 

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 01/10] KVM: arm/arm64: Split dcache/icache flushing
  2017-10-09 15:20   ` Marc Zyngier
@ 2017-10-16 20:07     ` Christoffer Dall
  -1 siblings, 0 replies; 78+ messages in thread
From: Christoffer Dall @ 2017-10-16 20:07 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: kvm, Catalin Marinas, Will Deacon, kvmarm, linux-arm-kernel

On Mon, Oct 09, 2017 at 04:20:23PM +0100, Marc Zyngier wrote:
> As we're about to introduce opportunistic invalidation of the icache,
> let's split dcache and icache flushing.

I'm a little confused about the naming of these functions now:
I believe the current function ensures coherency between the I-cache and
D-cache (and overly so), but if you just call one or the other function
after this change, what exactly is the coherency you get?


> 
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  arch/arm/include/asm/kvm_mmu.h   | 60 ++++++++++++++++++++++++++++------------
>  arch/arm64/include/asm/kvm_mmu.h | 13 +++++++--
>  virt/kvm/arm/mmu.c               | 20 ++++++++++----
>  3 files changed, 67 insertions(+), 26 deletions(-)
> 
> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
> index fa6f2174276b..f553aa62d0c3 100644
> --- a/arch/arm/include/asm/kvm_mmu.h
> +++ b/arch/arm/include/asm/kvm_mmu.h
> @@ -126,21 +126,12 @@ static inline bool vcpu_has_cache_enabled(struct kvm_vcpu *vcpu)
>  	return (vcpu_cp15(vcpu, c1_SCTLR) & 0b101) == 0b101;
>  }
>  
> -static inline void __coherent_cache_guest_page(struct kvm_vcpu *vcpu,
> -					       kvm_pfn_t pfn,
> -					       unsigned long size)
> +static inline void __coherent_dcache_guest_page(struct kvm_vcpu *vcpu,
> +						kvm_pfn_t pfn,
> +						unsigned long size)
>  {
>  	/*
> -	 * If we are going to insert an instruction page and the icache is
> -	 * either VIPT or PIPT, there is a potential problem where the host
> -	 * (or another VM) may have used the same page as this guest, and we
> -	 * read incorrect data from the icache.  If we're using a PIPT cache,
> -	 * we can invalidate just that page, but if we are using a VIPT cache
> -	 * we need to invalidate the entire icache - damn shame - as written
> -	 * in the ARM ARM (DDI 0406C.b - Page B3-1393).
> -	 *
> -	 * VIVT caches are tagged using both the ASID and the VMID and doesn't
> -	 * need any kind of flushing (DDI 0406C.b - Page B3-1392).
> +	 * Clean the dcache to the Point of Coherency.
>  	 *
>  	 * We need to do this through a kernel mapping (using the
>  	 * user-space mapping has proved to be the wrong
> @@ -155,19 +146,52 @@ static inline void __coherent_cache_guest_page(struct kvm_vcpu *vcpu,
>  
>  		kvm_flush_dcache_to_poc(va, PAGE_SIZE);
>  
> -		if (icache_is_pipt())
> -			__cpuc_coherent_user_range((unsigned long)va,
> -						   (unsigned long)va + PAGE_SIZE);
> -
>  		size -= PAGE_SIZE;
>  		pfn++;
>  
>  		kunmap_atomic(va);
>  	}
> +}
>  
> -	if (!icache_is_pipt() && !icache_is_vivt_asid_tagged()) {
> +static inline void __coherent_icache_guest_page(struct kvm_vcpu *vcpu,
> +						kvm_pfn_t pfn,
> +						unsigned long size)
> +{
> +	/*
> +	 * If we are going to insert an instruction page and the icache is
> +	 * either VIPT or PIPT, there is a potential problem where the host
> +	 * (or another VM) may have used the same page as this guest, and we
> +	 * read incorrect data from the icache.  If we're using a PIPT cache,
> +	 * we can invalidate just that page, but if we are using a VIPT cache
> +	 * we need to invalidate the entire icache - damn shame - as written
> +	 * in the ARM ARM (DDI 0406C.b - Page B3-1393).
> +	 *
> +	 * VIVT caches are tagged using both the ASID and the VMID and doesn't
> +	 * need any kind of flushing (DDI 0406C.b - Page B3-1392).
> +	 */
> +
> +	VM_BUG_ON(size & ~PAGE_MASK);
> +
> +	if (icache_is_vivt_asid_tagged())
> +		return;
> +
> +	if (!icache_is_pipt()) {
>  		/* any kind of VIPT cache */
>  		__flush_icache_all();
> +		return;
> +	}
> +
> +	/* PIPT cache. As for the d-side, use a temporary kernel mapping. */
> +	while (size) {
> +		void *va = kmap_atomic_pfn(pfn);
> +
> +		__cpuc_coherent_user_range((unsigned long)va,
> +					   (unsigned long)va + PAGE_SIZE);
> +
> +		size -= PAGE_SIZE;
> +		pfn++;
> +
> +		kunmap_atomic(va);
>  	}
>  }
>  
> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> index 672c8684d5c2..4c4cb4f0e34f 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -230,19 +230,26 @@ static inline bool vcpu_has_cache_enabled(struct kvm_vcpu *vcpu)
>  	return (vcpu_sys_reg(vcpu, SCTLR_EL1) & 0b101) == 0b101;
>  }
>  
> -static inline void __coherent_cache_guest_page(struct kvm_vcpu *vcpu,
> -					       kvm_pfn_t pfn,
> -					       unsigned long size)
> +static inline void __coherent_dcache_guest_page(struct kvm_vcpu *vcpu,
> +						kvm_pfn_t pfn,
> +						unsigned long size)
>  {
>  	void *va = page_address(pfn_to_page(pfn));
>  
>  	kvm_flush_dcache_to_poc(va, size);
> +}
>  
> +static inline void __coherent_icache_guest_page(struct kvm_vcpu *vcpu,
> +						kvm_pfn_t pfn,
> +						unsigned long size)
> +{
>  	if (icache_is_aliasing()) {
>  		/* any kind of VIPT cache */
>  		__flush_icache_all();
>  	} else if (is_kernel_in_hyp_mode() || !icache_is_vpipt()) {
>  		/* PIPT or VPIPT at EL2 (see comment in __kvm_tlb_flush_vmid_ipa) */

unrelated: I went and read the comment in __kvm_tlb_flush_vmid_ipa, and
I don't really understand why there is only a need to flush the icache
if the host is running at EL1.

The text seems to describe the problem of remapping executable pages
within the guest.  That, it seems to me, would require icache maintenance of
the page that gets overwritten with new code, regardless of whether the
host runs at EL1 or EL2.

Of course it's easier done on VHE because we don't have to take a trap,
but the code seems to not invalidate the icache at all for VHE systems
that have VPIPT.  I'm confused.  Can you help?

> +		void *va = page_address(pfn_to_page(pfn));
> +
>  		flush_icache_range((unsigned long)va,
>  				   (unsigned long)va + size);
>  	}
> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> index b36945d49986..9e5628388af8 100644
> --- a/virt/kvm/arm/mmu.c
> +++ b/virt/kvm/arm/mmu.c
> @@ -1257,10 +1257,16 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
>  	kvm_mmu_write_protect_pt_masked(kvm, slot, gfn_offset, mask);
>  }
>  
> -static void coherent_cache_guest_page(struct kvm_vcpu *vcpu, kvm_pfn_t pfn,
> -				      unsigned long size)
> +static void coherent_dcache_guest_page(struct kvm_vcpu *vcpu, kvm_pfn_t pfn,
> +				       unsigned long size)
>  {
> -	__coherent_cache_guest_page(vcpu, pfn, size);
> +	__coherent_dcache_guest_page(vcpu, pfn, size);
> +}
> +
> +static void coherent_icache_guest_page(struct kvm_vcpu *vcpu, kvm_pfn_t pfn,
> +				       unsigned long size)
> +{
> +	__coherent_icache_guest_page(vcpu, pfn, size);
>  }
>  
>  static void kvm_send_hwpoison_signal(unsigned long address,
> @@ -1391,7 +1397,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  			new_pmd = kvm_s2pmd_mkwrite(new_pmd);
>  			kvm_set_pfn_dirty(pfn);
>  		}
> -		coherent_cache_guest_page(vcpu, pfn, PMD_SIZE);
> +		coherent_dcache_guest_page(vcpu, pfn, PMD_SIZE);
> +		coherent_icache_guest_page(vcpu, pfn, PMD_SIZE);
> +
>  		ret = stage2_set_pmd_huge(kvm, memcache, fault_ipa, &new_pmd);
>  	} else {
>  		pte_t new_pte = pfn_pte(pfn, mem_type);
> @@ -1401,7 +1409,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  			kvm_set_pfn_dirty(pfn);
>  			mark_page_dirty(kvm, gfn);
>  		}
> -		coherent_cache_guest_page(vcpu, pfn, PAGE_SIZE);
> +		coherent_dcache_guest_page(vcpu, pfn, PAGE_SIZE);
> +		coherent_icache_guest_page(vcpu, pfn, PAGE_SIZE);
> +
>  		ret = stage2_set_pte(kvm, memcache, fault_ipa, &new_pte, flags);
>  	}
>  
> -- 
> 2.14.1
> 

Otherwise this looks fine to me:

Acked-by: Christoffer Dall <cdall@linaro.org>

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 03/10] arm: KVM: Add optimized PIPT icache flushing
  2017-10-09 15:20   ` Marc Zyngier
@ 2017-10-16 20:07     ` Christoffer Dall
  -1 siblings, 0 replies; 78+ messages in thread
From: Christoffer Dall @ 2017-10-16 20:07 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Christoffer Dall, Catalin Marinas, Will Deacon, linux-arm-kernel,
	kvm, kvmarm

On Mon, Oct 09, 2017 at 04:20:25PM +0100, Marc Zyngier wrote:
> Calling __cpuc_coherent_user_range to invalidate the icache on
> a PIPT icache machine has some pointless overhead, as it starts
> by cleaning the dcache to the PoU, while we're guaranteed to
> have already cleaned it to the PoC.
> 
> As KVM is the only user of such a feature, let's implement some
> ad-hoc cache flushing in kvm_mmu.h. Should it become useful to
> other subsystems, it can be moved to a more global location.
> 
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  arch/arm/include/asm/kvm_hyp.h |  2 ++
>  arch/arm/include/asm/kvm_mmu.h | 24 ++++++++++++++++++++++--
>  2 files changed, 24 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm/include/asm/kvm_hyp.h b/arch/arm/include/asm/kvm_hyp.h
> index 14b5903f0224..ad541f9ecc78 100644
> --- a/arch/arm/include/asm/kvm_hyp.h
> +++ b/arch/arm/include/asm/kvm_hyp.h
> @@ -69,6 +69,8 @@
>  #define HIFAR		__ACCESS_CP15(c6, 4, c0, 2)
>  #define HPFAR		__ACCESS_CP15(c6, 4, c0, 4)
>  #define ICIALLUIS	__ACCESS_CP15(c7, 0, c1, 0)
> +#define BPIALLIS	__ACCESS_CP15(c7, 0, c1, 6)
> +#define ICIMVAU		__ACCESS_CP15(c7, 0, c5, 1)
>  #define ATS1CPR		__ACCESS_CP15(c7, 0, c8, 0)
>  #define TLBIALLIS	__ACCESS_CP15(c8, 0, c3, 0)
>  #define TLBIALL		__ACCESS_CP15(c8, 0, c7, 0)
> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
> index f553aa62d0c3..6773dcf21bff 100644
> --- a/arch/arm/include/asm/kvm_mmu.h
> +++ b/arch/arm/include/asm/kvm_mmu.h
> @@ -37,6 +37,8 @@
>  
>  #include <linux/highmem.h>
>  #include <asm/cacheflush.h>
> +#include <asm/cputype.h>
> +#include <asm/kvm_hyp.h>
>  #include <asm/pgalloc.h>
>  #include <asm/stage2_pgtable.h>
>  
> @@ -157,6 +159,8 @@ static inline void __coherent_icache_guest_page(struct kvm_vcpu *vcpu,
>  						kvm_pfn_t pfn,
>  						unsigned long size)
>  {
> +	u32 iclsz;
> +
>  	/*
>  	 * If we are going to insert an instruction page and the icache is
>  	 * either VIPT or PIPT, there is a potential problem where the host
> @@ -182,17 +186,33 @@ static inline void __coherent_icache_guest_page(struct kvm_vcpu *vcpu,
>  	}
>  
>  	/* PIPT cache. As for the d-side, use a temporary kernel mapping. */
> +	iclsz = 4 << (read_cpuid(CPUID_CACHETYPE) & 0xf);
> +

nit: the 4 here is a bit cryptic; could we say something like (perhaps
slightly over-explained):
/*
 * CTR IminLine contains Log2 of the number of words in the cache line,
 * so we can get the number of words as 2 << (IminLine - 1).  To get the
 * number of bytes, we multiply by 4 (the number of bytes in a 32-bit
 * word), and get 4 << (IminLine).
 */
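
For example, an IminLine value of 3 describes an 8-word (32-byte) cache
line, so iclsz = 4 << 3 = 32 and the loop below steps through each page in
32-byte increments.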

>  	while (size) {
>  		void *va = kmap_atomic_pfn(pfn);
> +		void *end = va + PAGE_SIZE;
> +		void *addr = va;
> +
> +		do {
> +			write_sysreg(addr, ICIMVAU);

Maybe an oddball place to ask this, but I don't recall why we need PoU
everywhere; would PoC potentially be enough?

> +			addr += iclsz;
> +		} while (addr < end);
>  
> -		__cpuc_coherent_user_range((unsigned long)va,
> -					   (unsigned long)va + PAGE_SIZE);
> +		dsb(ishst);
> +		isb();

Do we really need this in every iteration of the loop?

>  
>  		size -= PAGE_SIZE;
>  		pfn++;
>  
>  		kunmap_atomic(va);
>  	}
> +
> +	/* Check if we need to invalidate the BTB */
> +	if ((read_cpuid_ext(CPUID_EXT_MMFR1) >> 24) != 4) {

Either I'm having a bad day or you meant to shift this 28, not 24?
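
(For reference, the branch predictor field of ID_MMFR1 lives in the top
nibble, bits [31:28], so 28 does look like the intended shift.)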

> +		write_sysreg(0, BPIALLIS);
> +		dsb(ishst);
> +		isb();
> +	}
>  }
>  
>  static inline void __kvm_flush_dcache_pte(pte_t pte)
> -- 
> 2.14.1
> 

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 04/10] arm64: KVM: PTE/PMD S2 XN bit definition
  2017-10-09 15:20   ` Marc Zyngier
@ 2017-10-16 20:07     ` Christoffer Dall
  -1 siblings, 0 replies; 78+ messages in thread
From: Christoffer Dall @ 2017-10-16 20:07 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Christoffer Dall, Catalin Marinas, Will Deacon, linux-arm-kernel,
	kvm, kvmarm

On Mon, Oct 09, 2017 at 04:20:26PM +0100, Marc Zyngier wrote:
> As we're about to make S2 page-tables eXecute Never by default,
> add the required bits for both PMDs and PTEs.
> 
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>

> ---
>  arch/arm64/include/asm/pgtable-hwdef.h | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/pgtable-hwdef.h b/arch/arm64/include/asm/pgtable-hwdef.h
> index eb0c2bd90de9..af035331fb09 100644
> --- a/arch/arm64/include/asm/pgtable-hwdef.h
> +++ b/arch/arm64/include/asm/pgtable-hwdef.h
> @@ -177,9 +177,11 @@
>   */
>  #define PTE_S2_RDONLY		(_AT(pteval_t, 1) << 6)   /* HAP[2:1] */
>  #define PTE_S2_RDWR		(_AT(pteval_t, 3) << 6)   /* HAP[2:1] */
> +#define PTE_S2_XN		(_AT(pteval_t, 2) << 53)  /* XN[1:0] */
>  
>  #define PMD_S2_RDONLY		(_AT(pmdval_t, 1) << 6)   /* HAP[2:1] */
>  #define PMD_S2_RDWR		(_AT(pmdval_t, 3) << 6)   /* HAP[2:1] */
> +#define PMD_S2_XN		(_AT(pmdval_t, 2) << 53)  /* XN[1:0] */
>  
>  /*
>   * Memory Attribute override for Stage-2 (MemAttr[3:0])
> -- 
> 2.14.1
> 

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 06/10] KVM: arm/arm64: Only clean the dcache on translation fault
  2017-10-09 15:20   ` Marc Zyngier
@ 2017-10-16 20:08     ` Christoffer Dall
  -1 siblings, 0 replies; 78+ messages in thread
From: Christoffer Dall @ 2017-10-16 20:08 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: kvm, Catalin Marinas, Will Deacon, kvmarm, linux-arm-kernel

On Mon, Oct 09, 2017 at 04:20:28PM +0100, Marc Zyngier wrote:
> The only case where we actually need to perform dcache maintenance
> is when we map the page for the first time; subsequent permission
> faults do not require cache maintenance. Let's make it conditional
> on not being a permission fault (and thus being a translation fault).

Why do we actually need to do any dcache maintenance when faulting in a
page?

Is this for the case when the stage 1 MMU is disabled, or to support
guest mappings using uncached attributes?  Can we do better, for example
by only flushing the cache if the guest MMU is disabled?
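
To illustrate that last point, I was thinking of something along these
lines -- rough sketch only, reusing the existing vcpu_has_cache_enabled()
helper, and probably too naive if we also have to care about guest
mappings with uncached attributes:

	if (fault_status != FSC_PERM && !vcpu_has_cache_enabled(vcpu))
		coherent_dcache_guest_page(vcpu, pfn, PMD_SIZE);

(and the same with PAGE_SIZE on the pte path.)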

Beyond that:

Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>

> 
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  virt/kvm/arm/mmu.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> index 1d47da22f75c..1911fadde88b 100644
> --- a/virt/kvm/arm/mmu.c
> +++ b/virt/kvm/arm/mmu.c
> @@ -1400,7 +1400,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  			new_pmd = kvm_s2pmd_mkwrite(new_pmd);
>  			kvm_set_pfn_dirty(pfn);
>  		}
> -		coherent_dcache_guest_page(vcpu, pfn, PMD_SIZE);
> +
> +		if (fault_status != FSC_PERM)
> +			coherent_dcache_guest_page(vcpu, pfn, PMD_SIZE);
>  
>  		if (exec_fault) {
>  			new_pmd = kvm_s2pmd_mkexec(new_pmd);
> @@ -1416,7 +1418,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  			kvm_set_pfn_dirty(pfn);
>  			mark_page_dirty(kvm, gfn);
>  		}
> -		coherent_dcache_guest_page(vcpu, pfn, PAGE_SIZE);
> +
> +		if (fault_status != FSC_PERM)
> +			coherent_dcache_guest_page(vcpu, pfn, PAGE_SIZE);
>  
>  		if (exec_fault) {
>  			new_pte = kvm_s2pte_mkexec(new_pte);
> -- 
> 2.14.1
> 

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 07/10] KVM: arm/arm64: Preserve Exec permission across R/W permission faults
  2017-10-09 15:20   ` Marc Zyngier
@ 2017-10-16 20:08     ` Christoffer Dall
  -1 siblings, 0 replies; 78+ messages in thread
From: Christoffer Dall @ 2017-10-16 20:08 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Christoffer Dall, Catalin Marinas, Will Deacon, linux-arm-kernel,
	kvm, kvmarm

On Mon, Oct 09, 2017 at 04:20:29PM +0100, Marc Zyngier wrote:
> So far, we lose the Exec property whenever we take permission
> faults, as we always reconstruct the PTE/PMD from scratch. This
> can be counterproductive, as we can end up with the following
> fault sequence:
> 
> 	X -> RO -> ROX -> RW -> RWX
> 
> Instead, we can look up the existing PTE/PMD and clear the XN bit in the
> new entry if it was already cleared in the old one, leading to a much
> nicer fault sequence:
> 
> 	X -> ROX -> RWX
> 
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  arch/arm/include/asm/kvm_mmu.h   | 10 ++++++++++
>  arch/arm64/include/asm/kvm_mmu.h | 10 ++++++++++
>  virt/kvm/arm/mmu.c               | 25 +++++++++++++++++++++++++
>  3 files changed, 45 insertions(+)
> 
> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
> index bf76150aad5f..ad442d86c23e 100644
> --- a/arch/arm/include/asm/kvm_mmu.h
> +++ b/arch/arm/include/asm/kvm_mmu.h
> @@ -107,6 +107,11 @@ static inline bool kvm_s2pte_readonly(pte_t *pte)
>  	return (pte_val(*pte) & L_PTE_S2_RDWR) == L_PTE_S2_RDONLY;
>  }
>  
> +static inline bool kvm_s2pte_exec(pte_t *pte)
> +{
> +	return !(pte_val(*pte) & L_PTE_XN);
> +}
> +
>  static inline void kvm_set_s2pmd_readonly(pmd_t *pmd)
>  {
>  	pmd_val(*pmd) = (pmd_val(*pmd) & ~L_PMD_S2_RDWR) | L_PMD_S2_RDONLY;
> @@ -117,6 +122,11 @@ static inline bool kvm_s2pmd_readonly(pmd_t *pmd)
>  	return (pmd_val(*pmd) & L_PMD_S2_RDWR) == L_PMD_S2_RDONLY;
>  }
>  
> +static inline bool kvm_s2pmd_exec(pmd_t *pmd)
> +{
> +	return !(pmd_val(*pmd) & PMD_SECT_XN);
> +}
> +
>  static inline bool kvm_page_empty(void *ptr)
>  {
>  	struct page *ptr_page = virt_to_page(ptr);
> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> index 60c420a5ac0d..e7af74b8b51a 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -203,6 +203,11 @@ static inline bool kvm_s2pte_readonly(pte_t *pte)
>  	return (pte_val(*pte) & PTE_S2_RDWR) == PTE_S2_RDONLY;
>  }
>  
> +static inline bool kvm_s2pte_exec(pte_t *pte)
> +{
> +	return !(pte_val(*pte) & PTE_S2_XN);
> +}
> +
>  static inline void kvm_set_s2pmd_readonly(pmd_t *pmd)
>  {
>  	kvm_set_s2pte_readonly((pte_t *)pmd);
> @@ -213,6 +218,11 @@ static inline bool kvm_s2pmd_readonly(pmd_t *pmd)
>  	return kvm_s2pte_readonly((pte_t *)pmd);
>  }
>  
> +static inline bool kvm_s2pmd_exec(pmd_t *pmd)
> +{
> +	return !(pmd_val(*pmd) & PMD_S2_XN);
> +}
> +
>  static inline bool kvm_page_empty(void *ptr)
>  {
>  	struct page *ptr_page = virt_to_page(ptr);
> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> index 1911fadde88b..ccc6106764a6 100644
> --- a/virt/kvm/arm/mmu.c
> +++ b/virt/kvm/arm/mmu.c
> @@ -926,6 +926,17 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache
>  	return 0;
>  }
>  
> +static pte_t *stage2_get_pte(struct kvm *kvm, phys_addr_t addr)
> +{
> +	pmd_t *pmdp;
> +
> +	pmdp = stage2_get_pmd(kvm, NULL, addr);
> +	if (!pmdp || pmd_none(*pmdp))
> +		return NULL;
> +
> +	return pte_offset_kernel(pmdp, addr);
> +}
> +

nit, couldn't you change this to be

    stage2_is_exec(struct kvm *kvm, phys_addr_t addr)

which, if the pmd is a section mapping, just checks that, and if we find
a pte, checks that instead? Then we could have a simpler one-line call and
check from both the pte and pmd paths below.
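
i.e. something along these lines (completely untested, using
pmd_thp_or_huge() to catch the section case):

static bool stage2_is_exec(struct kvm *kvm, phys_addr_t addr)
{
	pmd_t *pmdp;
	pte_t *ptep;

	pmdp = stage2_get_pmd(kvm, NULL, addr);
	if (!pmdp || pmd_none(*pmdp) || !pmd_present(*pmdp))
		return false;

	/* Section mapping: the pmd itself carries the XN bit */
	if (pmd_thp_or_huge(*pmdp))
		return kvm_s2pmd_exec(pmdp);

	ptep = pte_offset_kernel(pmdp, addr);
	if (!ptep || pte_none(*ptep) || !pte_present(*ptep))
		return false;

	return kvm_s2pte_exec(ptep);
}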

>  static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
>  			  phys_addr_t addr, const pte_t *new_pte,
>  			  unsigned long flags)
> @@ -1407,6 +1418,13 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  		if (exec_fault) {
>  			new_pmd = kvm_s2pmd_mkexec(new_pmd);
>  			coherent_icache_guest_page(vcpu, pfn, PMD_SIZE);
> +		} else if (fault_status == FSC_PERM) {
> +			/* Preserve execute if XN was already cleared */
> +			pmd_t *old_pmdp = stage2_get_pmd(kvm, NULL, fault_ipa);
> +
> +			if (old_pmdp && pmd_present(*old_pmdp) &&
> +			    kvm_s2pmd_exec(old_pmdp))
> +				new_pmd = kvm_s2pmd_mkexec(new_pmd);

Is the reverse case not also possible then?  That is, if we have an
exec_fault, we could check if the entry is already writable and maintain
the property as well.  Not sure how often that would get hit though, as
a VM would only execute instructions on a page that has been written to,
but is somehow read-only at stage2, meaning the host must have marked
the page as read-only since content was written.  I think this could be
a somewhat common pattern with something like KSM though?


>  		}
>  
>  		ret = stage2_set_pmd_huge(kvm, memcache, fault_ipa, &new_pmd);
> @@ -1425,6 +1443,13 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  		if (exec_fault) {
>  			new_pte = kvm_s2pte_mkexec(new_pte);
>  			coherent_icache_guest_page(vcpu, pfn, PAGE_SIZE);
> +		} else if (fault_status == FSC_PERM) {
> +			/* Preserve execute if XN was already cleared */
> +			pte_t *old_ptep = stage2_get_pte(kvm, fault_ipa);
> +
> +			if (old_ptep && pte_present(*old_ptep) &&
> +			    kvm_s2pte_exec(old_ptep))
> +				new_pte = kvm_s2pte_mkexec(new_pte);
>  		}
>  
>  		ret = stage2_set_pte(kvm, memcache, fault_ipa, &new_pte, flags);
> -- 
> 2.14.1
> 

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 05/10] KVM: arm/arm64: Limit icache invalidation to prefetch aborts
  2017-10-09 15:20   ` Marc Zyngier
@ 2017-10-16 20:08     ` Christoffer Dall
  -1 siblings, 0 replies; 78+ messages in thread
From: Christoffer Dall @ 2017-10-16 20:08 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: kvm, Catalin Marinas, Will Deacon, kvmarm, linux-arm-kernel

On Mon, Oct 09, 2017 at 04:20:27PM +0100, Marc Zyngier wrote:
> We've so far eagerly invalidated the icache, no matter how
> the page was faulted in (data or prefetch abort).
> 
> But we can easily track execution by setting the XN bits
> in the S2 page tables, get the prefetch abort at HYP and
> perform the icache invalidation at that time only.
> 
> As the instruction working set is pretty small compared to the
> data set for most VMs, this is likely to save some traffic
> (especially as the invalidation is broadcast).
> 
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  arch/arm/include/asm/kvm_mmu.h        | 12 ++++++++++++
>  arch/arm/include/asm/pgtable.h        |  4 ++--
>  arch/arm64/include/asm/kvm_mmu.h      | 12 ++++++++++++
>  arch/arm64/include/asm/pgtable-prot.h |  4 ++--
>  virt/kvm/arm/mmu.c                    | 19 +++++++++++++++----
>  5 files changed, 43 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
> index 6773dcf21bff..bf76150aad5f 100644
> --- a/arch/arm/include/asm/kvm_mmu.h
> +++ b/arch/arm/include/asm/kvm_mmu.h
> @@ -85,6 +85,18 @@ static inline pmd_t kvm_s2pmd_mkwrite(pmd_t pmd)
>  	return pmd;
>  }
>  
> +static inline pte_t kvm_s2pte_mkexec(pte_t pte)
> +{
> +	pte_val(pte) &= ~L_PTE_XN;
> +	return pte;
> +}
> +
> +static inline pmd_t kvm_s2pmd_mkexec(pmd_t pmd)
> +{
> +	pmd_val(pmd) &= ~PMD_SECT_XN;
> +	return pmd;
> +}
> +
>  static inline void kvm_set_s2pte_readonly(pte_t *pte)
>  {
>  	pte_val(*pte) = (pte_val(*pte) & ~L_PTE_S2_RDWR) | L_PTE_S2_RDONLY;
> diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
> index 1c462381c225..9b6e77b9ab7e 100644
> --- a/arch/arm/include/asm/pgtable.h
> +++ b/arch/arm/include/asm/pgtable.h
> @@ -102,8 +102,8 @@ extern pgprot_t		pgprot_s2_device;
>  #define PAGE_HYP_EXEC		_MOD_PROT(pgprot_kernel, L_PTE_HYP | L_PTE_RDONLY)
>  #define PAGE_HYP_RO		_MOD_PROT(pgprot_kernel, L_PTE_HYP | L_PTE_RDONLY | L_PTE_XN)
>  #define PAGE_HYP_DEVICE		_MOD_PROT(pgprot_hyp_device, L_PTE_HYP)
> -#define PAGE_S2			_MOD_PROT(pgprot_s2, L_PTE_S2_RDONLY)
> -#define PAGE_S2_DEVICE		_MOD_PROT(pgprot_s2_device, L_PTE_S2_RDONLY)
> +#define PAGE_S2			_MOD_PROT(pgprot_s2, L_PTE_S2_RDONLY | L_PTE_XN)
> +#define PAGE_S2_DEVICE		_MOD_PROT(pgprot_s2_device, L_PTE_S2_RDONLY | L_PTE_XN)
>  
>  #define __PAGE_NONE		__pgprot(_L_PTE_DEFAULT | L_PTE_RDONLY | L_PTE_XN | L_PTE_NONE)
>  #define __PAGE_SHARED		__pgprot(_L_PTE_DEFAULT | L_PTE_USER | L_PTE_XN)
> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> index 48d31ca2ce9c..60c420a5ac0d 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -173,6 +173,18 @@ static inline pmd_t kvm_s2pmd_mkwrite(pmd_t pmd)
>  	return pmd;
>  }
>  
> +static inline pte_t kvm_s2pte_mkexec(pte_t pte)
> +{
> +	pte_val(pte) &= ~PTE_S2_XN;
> +	return pte;
> +}
> +
> +static inline pmd_t kvm_s2pmd_mkexec(pmd_t pmd)
> +{
> +	pmd_val(pmd) &= ~PMD_S2_XN;
> +	return pmd;
> +}
> +
>  static inline void kvm_set_s2pte_readonly(pte_t *pte)
>  {
>  	pteval_t old_pteval, pteval;
> diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h
> index 0a5635fb0ef9..4e12dabd342b 100644
> --- a/arch/arm64/include/asm/pgtable-prot.h
> +++ b/arch/arm64/include/asm/pgtable-prot.h
> @@ -60,8 +60,8 @@
>  #define PAGE_HYP_RO		__pgprot(_PAGE_DEFAULT | PTE_HYP | PTE_RDONLY | PTE_HYP_XN)
>  #define PAGE_HYP_DEVICE		__pgprot(PROT_DEVICE_nGnRE | PTE_HYP)
>  
> -#define PAGE_S2			__pgprot(PROT_DEFAULT | PTE_S2_MEMATTR(MT_S2_NORMAL) | PTE_S2_RDONLY)
> -#define PAGE_S2_DEVICE		__pgprot(PROT_DEFAULT | PTE_S2_MEMATTR(MT_S2_DEVICE_nGnRE) | PTE_S2_RDONLY | PTE_UXN)
> +#define PAGE_S2			__pgprot(PROT_DEFAULT | PTE_S2_MEMATTR(MT_S2_NORMAL) | PTE_S2_RDONLY | PTE_S2_XN)
> +#define PAGE_S2_DEVICE		__pgprot(PROT_DEFAULT | PTE_S2_MEMATTR(MT_S2_DEVICE_nGnRE) | PTE_S2_RDONLY | PTE_S2_XN)
>  
>  #define PAGE_NONE		__pgprot(((_PAGE_DEFAULT) & ~PTE_VALID) | PTE_PROT_NONE | PTE_RDONLY | PTE_PXN | PTE_UXN)
>  #define PAGE_SHARED		__pgprot(_PAGE_DEFAULT | PTE_USER | PTE_NG | PTE_PXN | PTE_UXN | PTE_WRITE)
> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> index 9e5628388af8..1d47da22f75c 100644
> --- a/virt/kvm/arm/mmu.c
> +++ b/virt/kvm/arm/mmu.c
> @@ -1292,7 +1292,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  			  unsigned long fault_status)
>  {
>  	int ret;
> -	bool write_fault, writable, hugetlb = false, force_pte = false;
> +	bool write_fault, exec_fault, writable, hugetlb = false, force_pte = false;
>  	unsigned long mmu_seq;
>  	gfn_t gfn = fault_ipa >> PAGE_SHIFT;
>  	struct kvm *kvm = vcpu->kvm;
> @@ -1304,7 +1304,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	unsigned long flags = 0;
>  
>  	write_fault = kvm_is_write_fault(vcpu);
> -	if (fault_status == FSC_PERM && !write_fault) {
> +	exec_fault = kvm_vcpu_trap_is_iabt(vcpu);
> +	VM_BUG_ON(write_fault && exec_fault);
> +
> +	if (fault_status == FSC_PERM && !write_fault && !exec_fault) {
>  		kvm_err("Unexpected L2 read permission error\n");
>  		return -EFAULT;
>  	}
> @@ -1398,7 +1401,11 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  			kvm_set_pfn_dirty(pfn);
>  		}
>  		coherent_dcache_guest_page(vcpu, pfn, PMD_SIZE);
> -		coherent_icache_guest_page(vcpu, pfn, PMD_SIZE);
> +
> +		if (exec_fault) {
> +			new_pmd = kvm_s2pmd_mkexec(new_pmd);
> +			coherent_icache_guest_page(vcpu, pfn, PMD_SIZE);
> +		}
>  
>  		ret = stage2_set_pmd_huge(kvm, memcache, fault_ipa, &new_pmd);
>  	} else {
> @@ -1410,7 +1417,11 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  			mark_page_dirty(kvm, gfn);
>  		}
>  		coherent_dcache_guest_page(vcpu, pfn, PAGE_SIZE);
> -		coherent_icache_guest_page(vcpu, pfn, PAGE_SIZE);
> +
> +		if (exec_fault) {
> +			new_pte = kvm_s2pte_mkexec(new_pte);
> +			coherent_icache_guest_page(vcpu, pfn, PAGE_SIZE);
> +		}
>  
>  		ret = stage2_set_pte(kvm, memcache, fault_ipa, &new_pte, flags);
>  	}
> -- 
> 2.14.1
> 

Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 08/10] KVM: arm/arm64: Drop vcpu parameter from coherent_{d,i}cache_guest_page
  2017-10-09 15:20   ` [PATCH 08/10] KVM: arm/arm64: Drop vcpu parameter from coherent_{d,i}cache_guest_page Marc Zyngier
@ 2017-10-16 20:08     ` Christoffer Dall
  -1 siblings, 0 replies; 78+ messages in thread
From: Christoffer Dall @ 2017-10-16 20:08 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Christoffer Dall, Catalin Marinas, Will Deacon, linux-arm-kernel,
	kvm, kvmarm

On Mon, Oct 09, 2017 at 04:20:30PM +0100, Marc Zyngier wrote:
> The vcpu parameter isn't used for anything, and gets in the way of
> further cleanups. Let's get rid of it.
> 
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

Acked-by: Christoffer Dall <christoffer.dall@linaro.org>

> ---
>  arch/arm/include/asm/kvm_mmu.h   |  6 ++----
>  arch/arm64/include/asm/kvm_mmu.h |  6 ++----
>  virt/kvm/arm/mmu.c               | 18 ++++++++----------
>  3 files changed, 12 insertions(+), 18 deletions(-)
> 
> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
> index ad442d86c23e..5f1ac88a5951 100644
> --- a/arch/arm/include/asm/kvm_mmu.h
> +++ b/arch/arm/include/asm/kvm_mmu.h
> @@ -150,8 +150,7 @@ static inline bool vcpu_has_cache_enabled(struct kvm_vcpu *vcpu)
>  	return (vcpu_cp15(vcpu, c1_SCTLR) & 0b101) == 0b101;
>  }
>  
> -static inline void __coherent_dcache_guest_page(struct kvm_vcpu *vcpu,
> -						kvm_pfn_t pfn,
> +static inline void __coherent_dcache_guest_page(kvm_pfn_t pfn,
>  						unsigned long size)
>  {
>  	/*
> @@ -177,8 +176,7 @@ static inline void __coherent_dcache_guest_page(struct kvm_vcpu *vcpu,
>  	}
>  }
>  
> -static inline void __coherent_icache_guest_page(struct kvm_vcpu *vcpu,
> -						kvm_pfn_t pfn,
> +static inline void __coherent_icache_guest_page(kvm_pfn_t pfn,
>  						unsigned long size)
>  {
>  	u32 iclsz;
> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> index e7af74b8b51a..33dcc3c79574 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -252,8 +252,7 @@ static inline bool vcpu_has_cache_enabled(struct kvm_vcpu *vcpu)
>  	return (vcpu_sys_reg(vcpu, SCTLR_EL1) & 0b101) == 0b101;
>  }
>  
> -static inline void __coherent_dcache_guest_page(struct kvm_vcpu *vcpu,
> -						kvm_pfn_t pfn,
> +static inline void __coherent_dcache_guest_page(kvm_pfn_t pfn,
>  						unsigned long size)
>  {
>  	void *va = page_address(pfn_to_page(pfn));
> @@ -261,8 +260,7 @@ static inline void __coherent_dcache_guest_page(struct kvm_vcpu *vcpu,
>  	kvm_flush_dcache_to_poc(va, size);
>  }
>  
> -static inline void __coherent_icache_guest_page(struct kvm_vcpu *vcpu,
> -						kvm_pfn_t pfn,
> +static inline void __coherent_icache_guest_page(kvm_pfn_t pfn,
>  						unsigned long size)
>  {
>  	if (icache_is_aliasing()) {
> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> index ccc6106764a6..5b495450e92f 100644
> --- a/virt/kvm/arm/mmu.c
> +++ b/virt/kvm/arm/mmu.c
> @@ -1268,16 +1268,14 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
>  	kvm_mmu_write_protect_pt_masked(kvm, slot, gfn_offset, mask);
>  }
>  
> -static void coherent_dcache_guest_page(struct kvm_vcpu *vcpu, kvm_pfn_t pfn,
> -				       unsigned long size)
> +static void coherent_dcache_guest_page(kvm_pfn_t pfn, unsigned long size)
>  {
> -	__coherent_dcache_guest_page(vcpu, pfn, size);
> +	__coherent_dcache_guest_page(pfn, size);
>  }
>  
> -static void coherent_icache_guest_page(struct kvm_vcpu *vcpu, kvm_pfn_t pfn,
> -				       unsigned long size)
> +static void coherent_icache_guest_page(kvm_pfn_t pfn, unsigned long size)
>  {
> -	__coherent_icache_guest_page(vcpu, pfn, size);
> +	__coherent_icache_guest_page(pfn, size);
>  }
>  
>  static void kvm_send_hwpoison_signal(unsigned long address,
> @@ -1413,11 +1411,11 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  		}
>  
>  		if (fault_status != FSC_PERM)
> -			coherent_dcache_guest_page(vcpu, pfn, PMD_SIZE);
> +			coherent_dcache_guest_page(pfn, PMD_SIZE);
>  
>  		if (exec_fault) {
>  			new_pmd = kvm_s2pmd_mkexec(new_pmd);
> -			coherent_icache_guest_page(vcpu, pfn, PMD_SIZE);
> +			coherent_icache_guest_page(pfn, PMD_SIZE);
>  		} else if (fault_status == FSC_PERM) {
>  			/* Preserve execute if XN was already cleared */
>  			pmd_t *old_pmdp = stage2_get_pmd(kvm, NULL, fault_ipa);
> @@ -1438,11 +1436,11 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  		}
>  
>  		if (fault_status != FSC_PERM)
> -			coherent_dcache_guest_page(vcpu, pfn, PAGE_SIZE);
> +			coherent_dcache_guest_page(pfn, PAGE_SIZE);
>  
>  		if (exec_fault) {
>  			new_pte = kvm_s2pte_mkexec(new_pte);
> -			coherent_icache_guest_page(vcpu, pfn, PAGE_SIZE);
> +			coherent_icache_guest_page(pfn, PAGE_SIZE);
>  		} else if (fault_status == FSC_PERM) {
>  			/* Preserve execute if XN was already cleared */
>  			pte_t *old_ptep = stage2_get_pte(kvm, fault_ipa);
> -- 
> 2.14.1
> 

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 09/10] KVM: arm/arm64: Detangle kvm_mmu.h from kvm_hyp.h
  2017-10-09 15:20   ` Marc Zyngier
@ 2017-10-16 20:08     ` Christoffer Dall
  -1 siblings, 0 replies; 78+ messages in thread
From: Christoffer Dall @ 2017-10-16 20:08 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: kvm, Catalin Marinas, Will Deacon, kvmarm, linux-arm-kernel

On Mon, Oct 09, 2017 at 04:20:31PM +0100, Marc Zyngier wrote:
> kvm_hyp.h has an odd dependency on kvm_mmu.h, which makes the
> opposite inclusion impossible. Let's start with breaking that
> useless dependency.
> 
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

Acked-by: Christoffer Dall <christoffer.dall@linaro.org>

> ---
>  arch/arm/include/asm/kvm_hyp.h   | 1 -
>  arch/arm/kvm/hyp/switch.c        | 1 +
>  arch/arm/kvm/hyp/tlb.c           | 1 +
>  arch/arm64/include/asm/kvm_hyp.h | 1 -
>  arch/arm64/kvm/hyp/debug-sr.c    | 1 +
>  arch/arm64/kvm/hyp/switch.c      | 1 +
>  arch/arm64/kvm/hyp/tlb.c         | 1 +
>  virt/kvm/arm/hyp/vgic-v2-sr.c    | 1 +
>  8 files changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm/include/asm/kvm_hyp.h b/arch/arm/include/asm/kvm_hyp.h
> index ad541f9ecc78..8b29faa119ba 100644
> --- a/arch/arm/include/asm/kvm_hyp.h
> +++ b/arch/arm/include/asm/kvm_hyp.h
> @@ -21,7 +21,6 @@
>  #include <linux/compiler.h>
>  #include <linux/kvm_host.h>
>  #include <asm/cp15.h>
> -#include <asm/kvm_mmu.h>
>  #include <asm/vfp.h>
>  
>  #define __hyp_text __section(.hyp.text) notrace
> diff --git a/arch/arm/kvm/hyp/switch.c b/arch/arm/kvm/hyp/switch.c
> index ebd2dd46adf7..67e0a689c4b5 100644
> --- a/arch/arm/kvm/hyp/switch.c
> +++ b/arch/arm/kvm/hyp/switch.c
> @@ -18,6 +18,7 @@
>  
>  #include <asm/kvm_asm.h>
>  #include <asm/kvm_hyp.h>
> +#include <asm/kvm_mmu.h>
>  
>  __asm__(".arch_extension     virt");
>  
> diff --git a/arch/arm/kvm/hyp/tlb.c b/arch/arm/kvm/hyp/tlb.c
> index 6d810af2d9fd..c0edd450e104 100644
> --- a/arch/arm/kvm/hyp/tlb.c
> +++ b/arch/arm/kvm/hyp/tlb.c
> @@ -19,6 +19,7 @@
>   */
>  
>  #include <asm/kvm_hyp.h>
> +#include <asm/kvm_mmu.h>
>  
>  /**
>   * Flush per-VMID TLBs
> diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
> index 4572a9b560fa..afbfbe0c12c5 100644
> --- a/arch/arm64/include/asm/kvm_hyp.h
> +++ b/arch/arm64/include/asm/kvm_hyp.h
> @@ -20,7 +20,6 @@
>  
>  #include <linux/compiler.h>
>  #include <linux/kvm_host.h>
> -#include <asm/kvm_mmu.h>
>  #include <asm/sysreg.h>
>  
>  #define __hyp_text __section(.hyp.text) notrace
> diff --git a/arch/arm64/kvm/hyp/debug-sr.c b/arch/arm64/kvm/hyp/debug-sr.c
> index f5154ed3da6c..d3a13d57f2c5 100644
> --- a/arch/arm64/kvm/hyp/debug-sr.c
> +++ b/arch/arm64/kvm/hyp/debug-sr.c
> @@ -21,6 +21,7 @@
>  #include <asm/debug-monitors.h>
>  #include <asm/kvm_asm.h>
>  #include <asm/kvm_hyp.h>
> +#include <asm/kvm_mmu.h>
>  
>  #define read_debug(r,n)		read_sysreg(r##n##_el1)
>  #define write_debug(v,r,n)	write_sysreg(v, r##n##_el1)
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index 945e79c641c4..c52f7094122f 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -21,6 +21,7 @@
>  #include <asm/kvm_asm.h>
>  #include <asm/kvm_emulate.h>
>  #include <asm/kvm_hyp.h>
> +#include <asm/kvm_mmu.h>
>  #include <asm/fpsimd.h>
>  
>  static bool __hyp_text __fpsimd_enabled_nvhe(void)
> diff --git a/arch/arm64/kvm/hyp/tlb.c b/arch/arm64/kvm/hyp/tlb.c
> index 73464a96c365..131c7772703c 100644
> --- a/arch/arm64/kvm/hyp/tlb.c
> +++ b/arch/arm64/kvm/hyp/tlb.c
> @@ -16,6 +16,7 @@
>   */
>  
>  #include <asm/kvm_hyp.h>
> +#include <asm/kvm_mmu.h>
>  #include <asm/tlbflush.h>
>  
>  static void __hyp_text __tlb_switch_to_guest_vhe(struct kvm *kvm)
> diff --git a/virt/kvm/arm/hyp/vgic-v2-sr.c b/virt/kvm/arm/hyp/vgic-v2-sr.c
> index a3f18d362366..77ccd8e2090b 100644
> --- a/virt/kvm/arm/hyp/vgic-v2-sr.c
> +++ b/virt/kvm/arm/hyp/vgic-v2-sr.c
> @@ -21,6 +21,7 @@
>  
>  #include <asm/kvm_emulate.h>
>  #include <asm/kvm_hyp.h>
> +#include <asm/kvm_mmu.h>
>  
>  static void __hyp_text save_elrsr(struct kvm_vcpu *vcpu, void __iomem *base)
>  {
> -- 
> 2.14.1
> 

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 02/10] arm64: KVM: Add invalidate_icache_range helper
  2017-10-09 15:20   ` Marc Zyngier
@ 2017-10-16 20:08     ` Christoffer Dall
  -1 siblings, 0 replies; 78+ messages in thread
From: Christoffer Dall @ 2017-10-16 20:08 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: kvm, Catalin Marinas, Will Deacon, kvmarm, linux-arm-kernel

On Mon, Oct 09, 2017 at 04:20:24PM +0100, Marc Zyngier wrote:
> We currently tightly couple dcache clean with icache invalidation,
> but KVM could do without the initial flush to PoU, as we've
> already flushed things to PoC.
> 
> Let's introduce invalidate_icache_range which is limited to
> invalidating the icache from the linear mapping (and thus
> has none of the userspace fault handling complexity), and
> wire it in KVM instead of flush_icache_range.
> 
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  arch/arm64/include/asm/cacheflush.h |  8 ++++++++
>  arch/arm64/include/asm/kvm_mmu.h    |  4 ++--
>  arch/arm64/mm/cache.S               | 24 ++++++++++++++++++++++++
>  3 files changed, 34 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/cacheflush.h b/arch/arm64/include/asm/cacheflush.h
> index 76d1cc85d5b1..ad56406944c6 100644
> --- a/arch/arm64/include/asm/cacheflush.h
> +++ b/arch/arm64/include/asm/cacheflush.h
> @@ -52,6 +52,13 @@
>   *		- start  - virtual start address
>   *		- end    - virtual end address
>   *
> + *	invalidate_icache_range(start, end)
> + *
> + *		Invalidate the I-cache in the region described by start, end.
> + *		Linear mapping only!
> + *		- start  - virtual start address
> + *		- end    - virtual end address
> + *
>   *	__flush_cache_user_range(start, end)
>   *
>   *		Ensure coherency between the I-cache and the D-cache in the
> @@ -66,6 +73,7 @@
>   *		- size   - region size
>   */
>  extern void flush_icache_range(unsigned long start, unsigned long end);
> +extern void invalidate_icache_range(unsigned long start, unsigned long end);
>  extern void __flush_dcache_area(void *addr, size_t len);
>  extern void __inval_dcache_area(void *addr, size_t len);
>  extern void __clean_dcache_area_poc(void *addr, size_t len);
> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> index 4c4cb4f0e34f..48d31ca2ce9c 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -250,8 +250,8 @@ static inline void __coherent_icache_guest_page(struct kvm_vcpu *vcpu,
>  		/* PIPT or VPIPT at EL2 (see comment in __kvm_tlb_flush_vmid_ipa) */
>  		void *va = page_address(pfn_to_page(pfn));
>  
> -		flush_icache_range((unsigned long)va,
> -				   (unsigned long)va + size);
> +		invalidate_icache_range((unsigned long)va,
> +					(unsigned long)va + size);
>  	}
>  }
>  
> diff --git a/arch/arm64/mm/cache.S b/arch/arm64/mm/cache.S
> index 7f1dbe962cf5..0c330666a8c9 100644
> --- a/arch/arm64/mm/cache.S
> +++ b/arch/arm64/mm/cache.S
> @@ -80,6 +80,30 @@ USER(9f, ic	ivau, x4	)		// invalidate I line PoU
>  ENDPROC(flush_icache_range)
>  ENDPROC(__flush_cache_user_range)
>  
> +/*
> + *	invalidate_icache_range(start,end)
> + *
> + *	Ensure that the I cache is invalid within specified region. This
> + *	assumes that this is done on the linear mapping. Do not use it
> + *	on a userspace range, as this may fault horribly.
> + *
> + *	- start   - virtual start address of region
> + *	- end     - virtual end address of region
> + */
> +ENTRY(invalidate_icache_range)
> +	icache_line_size x2, x3
> +	sub	x3, x2, #1
> +	bic	x4, x0, x3
> +1:
> +	ic	ivau, x4			// invalidate I line PoU
> +	add	x4, x4, x2
> +	cmp	x4, x1
> +	b.lo	1b
> +	dsb	ish
> +	isb
> +	ret
> +ENDPROC(invalidate_icache_range)
> +
>  /*
>   *	__flush_dcache_area(kaddr, size)
>   *
> -- 
> 2.14.1
> 

Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 00/10] arm/arm64: KVM: limit icache invalidation to prefetch aborts
  2017-10-09 15:20 ` Marc Zyngier
@ 2017-10-16 20:59   ` Christoffer Dall
  -1 siblings, 0 replies; 78+ messages in thread
From: Christoffer Dall @ 2017-10-16 20:59 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: kvm, Catalin Marinas, Will Deacon, kvmarm, linux-arm-kernel

On Mon, Oct 09, 2017 at 04:20:22PM +0100, Marc Zyngier wrote:
> It was recently reported that on a VM restore, we seem to spend a
> disproportionate amount of time invalidation the icache. This is
> partially due to some HW behaviour, but also because we're being a bit
> dumb and are invalidating the icache for every page we map at S2, even
> if that on a data access.
> 
> The slightly better way of doing this is to mark the pages XN at S2,
> and wait for the the guest to execute something in that page, at which
> point we perform the invalidation. As it is likely that there is a lot
> less instruction than data, we win (or so we hope).
> 
> We also take this opportunity to drop the extra dcache clean to the
> PoU which is pretty useless, as we already clean all the way to the
> PoC...
> 
> Running a bare metal test that touches 1GB of memory (using a 4kB
> stride) leads to the following results on Seattle:
> 
> 4.13:
> do_fault_read.bin:       0.565885992 seconds time elapsed
> do_fault_write.bin:       0.738296337 seconds time elapsed
> do_fault_read_write.bin:       1.241812231 seconds time elapsed
> 
> 4.14-rc3+patches:
> do_fault_read.bin:       0.244961803 seconds time elapsed
> do_fault_write.bin:       0.422740092 seconds time elapsed
> do_fault_read_write.bin:       0.643402470 seconds time elapsed
> 
> We're almost halving the time of something that more or less looks
> like a restore operation. Some larger systems will show much bigger
> benefits as they become less impacted by the icache invalidation
> (which is broadcast in the inner shareable domain).
> 
> I've also given it a test run on both Cubietruck and Jetson-TK1.
> 
> Tests are archived here:
> https://git.kernel.org/pub/scm/linux/kernel/git/maz/kvm-ws-tests.git/
> 
> I'd value some additional test results on HW I don't have access to.
> 

What would also be interesting is some insight into how big the hit then
is on first execution, but that should in no way gate merging these
patches.

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 01/10] KVM: arm/arm64: Split dcache/icache flushing
  2017-10-09 15:20   ` Marc Zyngier
@ 2017-10-16 21:35     ` Roy Franz (Cavium)
  -1 siblings, 0 replies; 78+ messages in thread
From: Roy Franz (Cavium) @ 2017-10-16 21:35 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Christoffer Dall, Catalin Marinas, Will Deacon, kvm,
	linux-arm-kernel, kvmarm

On Mon, Oct 9, 2017 at 8:20 AM, Marc Zyngier <marc.zyngier@arm.com> wrote:
> As we're about to introduce opportunistic invalidation of the icache,
> let's split dcache and icache flushing.
>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  arch/arm/include/asm/kvm_mmu.h   | 60 ++++++++++++++++++++++++++++------------
>  arch/arm64/include/asm/kvm_mmu.h | 13 +++++++--
>  virt/kvm/arm/mmu.c               | 20 ++++++++++----
>  3 files changed, 67 insertions(+), 26 deletions(-)
>
> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
> index fa6f2174276b..f553aa62d0c3 100644
> --- a/arch/arm/include/asm/kvm_mmu.h
> +++ b/arch/arm/include/asm/kvm_mmu.h
> @@ -126,21 +126,12 @@ static inline bool vcpu_has_cache_enabled(struct kvm_vcpu *vcpu)
>         return (vcpu_cp15(vcpu, c1_SCTLR) & 0b101) == 0b101;
>  }
>
> -static inline void __coherent_cache_guest_page(struct kvm_vcpu *vcpu,
> -                                              kvm_pfn_t pfn,
> -                                              unsigned long size)
> +static inline void __coherent_dcache_guest_page(struct kvm_vcpu *vcpu,
> +                                               kvm_pfn_t pfn,
> +                                               unsigned long size)
>  {
>         /*
> -        * If we are going to insert an instruction page and the icache is
> -        * either VIPT or PIPT, there is a potential problem where the host
> -        * (or another VM) may have used the same page as this guest, and we
> -        * read incorrect data from the icache.  If we're using a PIPT cache,
> -        * we can invalidate just that page, but if we are using a VIPT cache
> -        * we need to invalidate the entire icache - damn shame - as written
> -        * in the ARM ARM (DDI 0406C.b - Page B3-1393).
> -        *
> -        * VIVT caches are tagged using both the ASID and the VMID and doesn't
> -        * need any kind of flushing (DDI 0406C.b - Page B3-1392).
> +        * Clean the dcache to the Point of Coherency.
>          *
>          * We need to do this through a kernel mapping (using the
>          * user-space mapping has proved to be the wrong
> @@ -155,19 +146,52 @@ static inline void __coherent_cache_guest_page(struct kvm_vcpu *vcpu,
>
>                 kvm_flush_dcache_to_poc(va, PAGE_SIZE);
>
> -               if (icache_is_pipt())
> -                       __cpuc_coherent_user_range((unsigned long)va,
> -                                                  (unsigned long)va + PAGE_SIZE);
> -
>                 size -= PAGE_SIZE;
>                 pfn++;
>
>                 kunmap_atomic(va);
>         }
> +}
>
> -       if (!icache_is_pipt() && !icache_is_vivt_asid_tagged()) {
> +static inline void __coherent_icache_guest_page(struct kvm_vcpu *vcpu,
> +                                               kvm_pfn_t pfn,
> +                                               unsigned long size)
> +{
> +       /*
> +        * If we are going to insert an instruction page and the icache is
> +        * either VIPT or PIPT, there is a potential problem where the host
> +        * (or another VM) may have used the same page as this guest, and we
> +        * read incorrect data from the icache.  If we're using a PIPT cache,
> +        * we can invalidate just that page, but if we are using a VIPT cache
> +        * we need to invalidate the entire icache - damn shame - as written
> +        * in the ARM ARM (DDI 0406C.b - Page B3-1393).
> +        *
> +        * VIVT caches are tagged using both the ASID and the VMID and doesn't
> +        * need any kind of flushing (DDI 0406C.b - Page B3-1392).
> +        */
> +
> +       VM_BUG_ON(size & ~PAGE_MASK);
> +
> +       if (icache_is_vivt_asid_tagged())
> +               return;
> +
> +       if (!icache_is_pipt()) {
>                 /* any kind of VIPT cache */
>                 __flush_icache_all();
> +               return;
> +       }
How does cache_is_vivt() fit into these checks?   From my digging it looks like
that is ARMv5 and earlier only, so am I right in thinking those don't support
virtualization?  It looks like this code properly handles all the cache types
described in the ARM ARM that you referenced, and that the 'extra' cache
types in Linux are for older spec chips.


> +
> +       /* PIPT cache. As for the d-side, use a temporary kernel mapping. */
> +       while (size) {
> +               void *va = kmap_atomic_pfn(pfn);
> +
> +               __cpuc_coherent_user_range((unsigned long)va,
> +                                          (unsigned long)va + PAGE_SIZE);
> +
> +               size -= PAGE_SIZE;
> +               pfn++;
> +
> +               kunmap_atomic(va);
>         }
>  }
>
> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> index 672c8684d5c2..4c4cb4f0e34f 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -230,19 +230,26 @@ static inline bool vcpu_has_cache_enabled(struct kvm_vcpu *vcpu)
>         return (vcpu_sys_reg(vcpu, SCTLR_EL1) & 0b101) == 0b101;
>  }
>
> -static inline void __coherent_cache_guest_page(struct kvm_vcpu *vcpu,
> -                                              kvm_pfn_t pfn,
> -                                              unsigned long size)
> +static inline void __coherent_dcache_guest_page(struct kvm_vcpu *vcpu,
> +                                               kvm_pfn_t pfn,
> +                                               unsigned long size)
>  {
>         void *va = page_address(pfn_to_page(pfn));
>
>         kvm_flush_dcache_to_poc(va, size);
> +}
>
> +static inline void __coherent_icache_guest_page(struct kvm_vcpu *vcpu,
> +                                               kvm_pfn_t pfn,
> +                                               unsigned long size)
> +{
>         if (icache_is_aliasing()) {
>                 /* any kind of VIPT cache */
>                 __flush_icache_all();
>         } else if (is_kernel_in_hyp_mode() || !icache_is_vpipt()) {
>                 /* PIPT or VPIPT at EL2 (see comment in __kvm_tlb_flush_vmid_ipa) */
> +               void *va = page_address(pfn_to_page(pfn));
> +
>                 flush_icache_range((unsigned long)va,
>                                    (unsigned long)va + size);
>         }
> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> index b36945d49986..9e5628388af8 100644
> --- a/virt/kvm/arm/mmu.c
> +++ b/virt/kvm/arm/mmu.c
> @@ -1257,10 +1257,16 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
>         kvm_mmu_write_protect_pt_masked(kvm, slot, gfn_offset, mask);
>  }
>
> -static void coherent_cache_guest_page(struct kvm_vcpu *vcpu, kvm_pfn_t pfn,
> -                                     unsigned long size)
> +static void coherent_dcache_guest_page(struct kvm_vcpu *vcpu, kvm_pfn_t pfn,
> +                                      unsigned long size)
>  {
> -       __coherent_cache_guest_page(vcpu, pfn, size);
> +       __coherent_dcache_guest_page(vcpu, pfn, size);
> +}
> +
> +static void coherent_icache_guest_page(struct kvm_vcpu *vcpu, kvm_pfn_t pfn,
> +                                      unsigned long size)
> +{
> +       __coherent_icache_guest_page(vcpu, pfn, size);
>  }
>
>  static void kvm_send_hwpoison_signal(unsigned long address,
> @@ -1391,7 +1397,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>                         new_pmd = kvm_s2pmd_mkwrite(new_pmd);
>                         kvm_set_pfn_dirty(pfn);
>                 }
> -               coherent_cache_guest_page(vcpu, pfn, PMD_SIZE);
> +               coherent_dcache_guest_page(vcpu, pfn, PMD_SIZE);
> +               coherent_icache_guest_page(vcpu, pfn, PMD_SIZE);
> +
>                 ret = stage2_set_pmd_huge(kvm, memcache, fault_ipa, &new_pmd);
>         } else {
>                 pte_t new_pte = pfn_pte(pfn, mem_type);
> @@ -1401,7 +1409,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>                         kvm_set_pfn_dirty(pfn);
>                         mark_page_dirty(kvm, gfn);
>                 }
> -               coherent_cache_guest_page(vcpu, pfn, PAGE_SIZE);
> +               coherent_dcache_guest_page(vcpu, pfn, PAGE_SIZE);
> +               coherent_icache_guest_page(vcpu, pfn, PAGE_SIZE);
> +
>                 ret = stage2_set_pte(kvm, memcache, fault_ipa, &new_pte, flags);
>         }
>
> --
> 2.14.1
>
> _______________________________________________
> kvmarm mailing list
> kvmarm@lists.cs.columbia.edu
> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 01/10] KVM: arm/arm64: Split dcache/icache flushing
  2017-10-16 21:35     ` Roy Franz (Cavium)
@ 2017-10-17  6:44       ` Christoffer Dall
  -1 siblings, 0 replies; 78+ messages in thread
From: Christoffer Dall @ 2017-10-17  6:44 UTC (permalink / raw)
  To: roy.franz
  Cc: Marc Zyngier, Christoffer Dall, Catalin Marinas, Will Deacon,
	kvm, linux-arm-kernel, kvmarm

On Mon, Oct 16, 2017 at 02:35:47PM -0700, Roy Franz (Cavium) wrote:
> On Mon, Oct 9, 2017 at 8:20 AM, Marc Zyngier <marc.zyngier@arm.com> wrote:
> > As we're about to introduce opportunistic invalidation of the icache,
> > let's split dcache and icache flushing.
> >
> > Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> > ---
> >  arch/arm/include/asm/kvm_mmu.h   | 60 ++++++++++++++++++++++++++++------------
> >  arch/arm64/include/asm/kvm_mmu.h | 13 +++++++--
> >  virt/kvm/arm/mmu.c               | 20 ++++++++++----
> >  3 files changed, 67 insertions(+), 26 deletions(-)
> >
> > diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
> > index fa6f2174276b..f553aa62d0c3 100644
> > --- a/arch/arm/include/asm/kvm_mmu.h
> > +++ b/arch/arm/include/asm/kvm_mmu.h
> > @@ -126,21 +126,12 @@ static inline bool vcpu_has_cache_enabled(struct kvm_vcpu *vcpu)
> >         return (vcpu_cp15(vcpu, c1_SCTLR) & 0b101) == 0b101;
> >  }
> >
> > -static inline void __coherent_cache_guest_page(struct kvm_vcpu *vcpu,
> > -                                              kvm_pfn_t pfn,
> > -                                              unsigned long size)
> > +static inline void __coherent_dcache_guest_page(struct kvm_vcpu *vcpu,
> > +                                               kvm_pfn_t pfn,
> > +                                               unsigned long size)
> >  {
> >         /*
> > -        * If we are going to insert an instruction page and the icache is
> > -        * either VIPT or PIPT, there is a potential problem where the host
> > -        * (or another VM) may have used the same page as this guest, and we
> > -        * read incorrect data from the icache.  If we're using a PIPT cache,
> > -        * we can invalidate just that page, but if we are using a VIPT cache
> > -        * we need to invalidate the entire icache - damn shame - as written
> > -        * in the ARM ARM (DDI 0406C.b - Page B3-1393).
> > -        *
> > -        * VIVT caches are tagged using both the ASID and the VMID and doesn't
> > -        * need any kind of flushing (DDI 0406C.b - Page B3-1392).
> > +        * Clean the dcache to the Point of Coherency.
> >          *
> >          * We need to do this through a kernel mapping (using the
> >          * user-space mapping has proved to be the wrong
> > @@ -155,19 +146,52 @@ static inline void __coherent_cache_guest_page(struct kvm_vcpu *vcpu,
> >
> >                 kvm_flush_dcache_to_poc(va, PAGE_SIZE);
> >
> > -               if (icache_is_pipt())
> > -                       __cpuc_coherent_user_range((unsigned long)va,
> > -                                                  (unsigned long)va + PAGE_SIZE);
> > -
> >                 size -= PAGE_SIZE;
> >                 pfn++;
> >
> >                 kunmap_atomic(va);
> >         }
> > +}
> >
> > -       if (!icache_is_pipt() && !icache_is_vivt_asid_tagged()) {
> > +static inline void __coherent_icache_guest_page(struct kvm_vcpu *vcpu,
> > +                                               kvm_pfn_t pfn,
> > +                                               unsigned long size)
> > +{
> > +       /*
> > +        * If we are going to insert an instruction page and the icache is
> > +        * either VIPT or PIPT, there is a potential problem where the host
> > +        * (or another VM) may have used the same page as this guest, and we
> > +        * read incorrect data from the icache.  If we're using a PIPT cache,
> > +        * we can invalidate just that page, but if we are using a VIPT cache
> > +        * we need to invalidate the entire icache - damn shame - as written
> > +        * in the ARM ARM (DDI 0406C.b - Page B3-1393).
> > +        *
> > +        * VIVT caches are tagged using both the ASID and the VMID and doesn't
> > +        * need any kind of flushing (DDI 0406C.b - Page B3-1392).
> > +        */
> > +
> > +       VM_BUG_ON(size & ~PAGE_MASK);
> > +
> > +       if (icache_is_vivt_asid_tagged())
> > +               return;
> > +
> > +       if (!icache_is_pipt()) {
> >                 /* any kind of VIPT cache */
> >                 __flush_icache_all();
> > +               return;
> > +       }
> How does cache_is_vivt() fit into these checks?   From my digging it looks like
> that is ARMv5 and earlier only, so am I right in thinking those don't support
> virtualization?  It looks like this code properly handles all the cache types
> described in the ARM ARM that you referenced, and that the 'extra' cache
> types in Linux are for older spec chips.
> 
> 
That's certainly my understanding.  From the ARMv7 ARM the only types of
instruction caches we should worry about are:

 - PIPT instruction caches
 - Virtually-indexed, physically-tagged (VIPT) instruction caches
 - ASID and VMID tagged Virtually-indexed, virtually-tagged (VIVT)
   instruction caches.

And I think that's covered here.
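
Spelled out, this is roughly the mapping (an illustrative sketch only; the enum values and helper name below are invented, and the decisions mirror the logic in the hunk quoted above):

#include <stdio.h>

/* Sketch only: which maintenance the 32-bit map path needs per icache type. */
enum icache_type   { ICACHE_VIVT_ASID_TAGGED, ICACHE_VIPT, ICACHE_PIPT };
enum icache_action { ICACHE_NOP, ICACHE_INVAL_ALL, ICACHE_INVAL_RANGE };

static enum icache_action icache_action_on_map(enum icache_type type)
{
    switch (type) {
    case ICACHE_VIVT_ASID_TAGGED:
        return ICACHE_NOP;          /* ASID+VMID tagged: no maintenance needed */
    case ICACHE_VIPT:
        return ICACHE_INVAL_ALL;    /* aliasing: invalidate the whole icache */
    case ICACHE_PIPT:
    default:
        return ICACHE_INVAL_RANGE;  /* invalidate just the page(s) being mapped */
    }
}

int main(void)
{
    printf("PIPT -> %d\n", icache_action_on_map(ICACHE_PIPT)); /* 2: by range */
    return 0;
}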

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 01/10] KVM: arm/arm64: Split dcache/icache flushing
  2017-10-16 20:07     ` Christoffer Dall
@ 2017-10-17  8:57       ` Marc Zyngier
  -1 siblings, 0 replies; 78+ messages in thread
From: Marc Zyngier @ 2017-10-17  8:57 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: kvm, Catalin Marinas, Will Deacon, kvmarm, linux-arm-kernel

On 16/10/17 21:07, Christoffer Dall wrote:
> On Mon, Oct 09, 2017 at 04:20:23PM +0100, Marc Zyngier wrote:
>> As we're about to introduce opportunistic invalidation of the icache,
>> let's split dcache and icache flushing.
> 
> I'm a little confused about the naming of these functions now, because
> while I believe the current function ensures coherency between the
> I-cache and D-cache (and overly so), if you just call one or the other
> function after this change, what exactly is the coherency you get?

Yeah, in retrospect, this is a pretty stupid naming scheme. I guess I'll
call them clean/invalidate, with the overarching caller still being
called coherent_cache_guest for the time being.

> 
> 
>>
>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>> ---
>>  arch/arm/include/asm/kvm_mmu.h   | 60 ++++++++++++++++++++++++++++------------
>>  arch/arm64/include/asm/kvm_mmu.h | 13 +++++++--
>>  virt/kvm/arm/mmu.c               | 20 ++++++++++----
>>  3 files changed, 67 insertions(+), 26 deletions(-)
>>
>> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
>> index fa6f2174276b..f553aa62d0c3 100644
>> --- a/arch/arm/include/asm/kvm_mmu.h
>> +++ b/arch/arm/include/asm/kvm_mmu.h
>> @@ -126,21 +126,12 @@ static inline bool vcpu_has_cache_enabled(struct kvm_vcpu *vcpu)
>>  	return (vcpu_cp15(vcpu, c1_SCTLR) & 0b101) == 0b101;
>>  }
>>  
>> -static inline void __coherent_cache_guest_page(struct kvm_vcpu *vcpu,
>> -					       kvm_pfn_t pfn,
>> -					       unsigned long size)
>> +static inline void __coherent_dcache_guest_page(struct kvm_vcpu *vcpu,
>> +						kvm_pfn_t pfn,
>> +						unsigned long size)
>>  {
>>  	/*
>> -	 * If we are going to insert an instruction page and the icache is
>> -	 * either VIPT or PIPT, there is a potential problem where the host
>> -	 * (or another VM) may have used the same page as this guest, and we
>> -	 * read incorrect data from the icache.  If we're using a PIPT cache,
>> -	 * we can invalidate just that page, but if we are using a VIPT cache
>> -	 * we need to invalidate the entire icache - damn shame - as written
>> -	 * in the ARM ARM (DDI 0406C.b - Page B3-1393).
>> -	 *
>> -	 * VIVT caches are tagged using both the ASID and the VMID and doesn't
>> -	 * need any kind of flushing (DDI 0406C.b - Page B3-1392).
>> +	 * Clean the dcache to the Point of Coherency.
>>  	 *
>>  	 * We need to do this through a kernel mapping (using the
>>  	 * user-space mapping has proved to be the wrong
>> @@ -155,19 +146,52 @@ static inline void __coherent_cache_guest_page(struct kvm_vcpu *vcpu,
>>  
>>  		kvm_flush_dcache_to_poc(va, PAGE_SIZE);
>>  
>> -		if (icache_is_pipt())
>> -			__cpuc_coherent_user_range((unsigned long)va,
>> -						   (unsigned long)va + PAGE_SIZE);
>> -
>>  		size -= PAGE_SIZE;
>>  		pfn++;
>>  
>>  		kunmap_atomic(va);
>>  	}
>> +}
>>  
>> -	if (!icache_is_pipt() && !icache_is_vivt_asid_tagged()) {
>> +static inline void __coherent_icache_guest_page(struct kvm_vcpu *vcpu,
>> +						kvm_pfn_t pfn,
>> +						unsigned long size)
>> +{
>> +	/*
>> +	 * If we are going to insert an instruction page and the icache is
>> +	 * either VIPT or PIPT, there is a potential problem where the host
>> +	 * (or another VM) may have used the same page as this guest, and we
>> +	 * read incorrect data from the icache.  If we're using a PIPT cache,
>> +	 * we can invalidate just that page, but if we are using a VIPT cache
>> +	 * we need to invalidate the entire icache - damn shame - as written
>> +	 * in the ARM ARM (DDI 0406C.b - Page B3-1393).
>> +	 *
>> +	 * VIVT caches are tagged using both the ASID and the VMID and doesn't
>> +	 * need any kind of flushing (DDI 0406C.b - Page B3-1392).
>> +	 */
>> +
>> +	VM_BUG_ON(size & ~PAGE_MASK);
>> +
>> +	if (icache_is_vivt_asid_tagged())
>> +		return;
>> +
>> +	if (!icache_is_pipt()) {
>>  		/* any kind of VIPT cache */
>>  		__flush_icache_all();
>> +		return;
>> +	}
>> +
>> +	/* PIPT cache. As for the d-side, use a temporary kernel mapping. */
>> +	while (size) {
>> +		void *va = kmap_atomic_pfn(pfn);
>> +
>> +		__cpuc_coherent_user_range((unsigned long)va,
>> +					   (unsigned long)va + PAGE_SIZE);
>> +
>> +		size -= PAGE_SIZE;
>> +		pfn++;
>> +
>> +		kunmap_atomic(va);
>>  	}
>>  }
>>  
>> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
>> index 672c8684d5c2..4c4cb4f0e34f 100644
>> --- a/arch/arm64/include/asm/kvm_mmu.h
>> +++ b/arch/arm64/include/asm/kvm_mmu.h
>> @@ -230,19 +230,26 @@ static inline bool vcpu_has_cache_enabled(struct kvm_vcpu *vcpu)
>>  	return (vcpu_sys_reg(vcpu, SCTLR_EL1) & 0b101) == 0b101;
>>  }
>>  
>> -static inline void __coherent_cache_guest_page(struct kvm_vcpu *vcpu,
>> -					       kvm_pfn_t pfn,
>> -					       unsigned long size)
>> +static inline void __coherent_dcache_guest_page(struct kvm_vcpu *vcpu,
>> +						kvm_pfn_t pfn,
>> +						unsigned long size)
>>  {
>>  	void *va = page_address(pfn_to_page(pfn));
>>  
>>  	kvm_flush_dcache_to_poc(va, size);
>> +}
>>  
>> +static inline void __coherent_icache_guest_page(struct kvm_vcpu *vcpu,
>> +						kvm_pfn_t pfn,
>> +						unsigned long size)
>> +{
>>  	if (icache_is_aliasing()) {
>>  		/* any kind of VIPT cache */
>>  		__flush_icache_all();
>>  	} else if (is_kernel_in_hyp_mode() || !icache_is_vpipt()) {
>>  		/* PIPT or VPIPT at EL2 (see comment in __kvm_tlb_flush_vmid_ipa) */
> 
> unrelated: I went and read the comment in __kvm_tlb_flush_vmid_ipa, and
> I don't really understand why there is only a need to flush the icache
> if the host is running at EL1.
> 
> The text seems to describe the problem of remapping executable pages
> within the guest.  That seems to me would require icache maintenance of
> the page that gets overwritten with new code, regardless of whether the
> host runs at EL1 or EL2.
> 
> Of course it's easier done on VHE because we don't have to take a trap,
> but the code seems to not invalidate the icache at all for VHE systems
> that have VPIPT.  I'm confused.  Can you help?

[+ Will, as he wrote that code and can reply if I say something stupid]

Here's the trick: The VMID-tagged aspect of VPIPT only applies if the
CMO is used at EL0 or EL1. When used at EL2, it behaves exactly like a
PIPT operation (see D4.10.2 in the ARMv8 ARM version B_b).

So in the end, we deal with VPIPT the following way:

- Without VHE, we perform the icache invalidation on unmap, blatting the
whole icache.

- With VHE, we do it the usual way (at map time), using the PIPT
flavour, as the invalidation is done from EL2.
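
Putting the three cases together, a small standalone sketch (the enum and helper are invented for illustration; the decisions mirror the map-time hunk above plus the unmap-time handling described here):

#include <stdbool.h>
#include <stdio.h>

/* Sketch only: what the i-side needs when a guest page is mapped executable. */
enum imaint { IMAINT_INVAL_ALL, IMAINT_INVAL_RANGE, IMAINT_DEFER_TO_UNMAP };

static enum imaint imaint_on_map(bool icache_aliasing, bool icache_vpipt, bool vhe)
{
    if (icache_aliasing)
        return IMAINT_INVAL_ALL;       /* aliasing VIPT: invalidate everything */
    if (vhe || !icache_vpipt)
        return IMAINT_INVAL_RANGE;     /* PIPT, or VPIPT with the CMO issued at EL2 */
    return IMAINT_DEFER_TO_UNMAP;      /* VPIPT without VHE: nuke the icache on unmap */
}

int main(void)
{
    printf("VPIPT, !VHE -> %d\n", imaint_on_map(false, true, false)); /* 2: defer */
    printf("VPIPT,  VHE -> %d\n", imaint_on_map(false, true, true));  /* 1: by range */
    return 0;
}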

>> +		void *va = page_address(pfn_to_page(pfn));
>> +
>>  		flush_icache_range((unsigned long)va,
>>  				   (unsigned long)va + size);
>>  	}
>> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
>> index b36945d49986..9e5628388af8 100644
>> --- a/virt/kvm/arm/mmu.c
>> +++ b/virt/kvm/arm/mmu.c
>> @@ -1257,10 +1257,16 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
>>  	kvm_mmu_write_protect_pt_masked(kvm, slot, gfn_offset, mask);
>>  }
>>  
>> -static void coherent_cache_guest_page(struct kvm_vcpu *vcpu, kvm_pfn_t pfn,
>> -				      unsigned long size)
>> +static void coherent_dcache_guest_page(struct kvm_vcpu *vcpu, kvm_pfn_t pfn,
>> +				       unsigned long size)
>>  {
>> -	__coherent_cache_guest_page(vcpu, pfn, size);
>> +	__coherent_dcache_guest_page(vcpu, pfn, size);
>> +}
>> +
>> +static void coherent_icache_guest_page(struct kvm_vcpu *vcpu, kvm_pfn_t pfn,
>> +				       unsigned long size)
>> +{
>> +	__coherent_icache_guest_page(vcpu, pfn, size);
>>  }
>>  
>>  static void kvm_send_hwpoison_signal(unsigned long address,
>> @@ -1391,7 +1397,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>  			new_pmd = kvm_s2pmd_mkwrite(new_pmd);
>>  			kvm_set_pfn_dirty(pfn);
>>  		}
>> -		coherent_cache_guest_page(vcpu, pfn, PMD_SIZE);
>> +		coherent_dcache_guest_page(vcpu, pfn, PMD_SIZE);
>> +		coherent_icache_guest_page(vcpu, pfn, PMD_SIZE);
>> +
>>  		ret = stage2_set_pmd_huge(kvm, memcache, fault_ipa, &new_pmd);
>>  	} else {
>>  		pte_t new_pte = pfn_pte(pfn, mem_type);
>> @@ -1401,7 +1409,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>  			kvm_set_pfn_dirty(pfn);
>>  			mark_page_dirty(kvm, gfn);
>>  		}
>> -		coherent_cache_guest_page(vcpu, pfn, PAGE_SIZE);
>> +		coherent_dcache_guest_page(vcpu, pfn, PAGE_SIZE);
>> +		coherent_icache_guest_page(vcpu, pfn, PAGE_SIZE);
>> +
>>  		ret = stage2_set_pte(kvm, memcache, fault_ipa, &new_pte, flags);
>>  	}
>>  
>> -- 
>> 2.14.1
>>
> 
> Otherwise this looks fine to me:
> 
> Acked-by: Christoffer Dall <cdall@linaro.org>

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 03/10] arm: KVM: Add optimized PIPT icache flushing
  2017-10-16 20:07     ` Christoffer Dall
@ 2017-10-17  9:26       ` Marc Zyngier
  -1 siblings, 0 replies; 78+ messages in thread
From: Marc Zyngier @ 2017-10-17  9:26 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: kvm, Catalin Marinas, Will Deacon, kvmarm, linux-arm-kernel

On 16/10/17 21:07, Christoffer Dall wrote:
> On Mon, Oct 09, 2017 at 04:20:25PM +0100, Marc Zyngier wrote:
>> Calling __cpuc_coherent_user_range to invalidate the icache on
>> a PIPT icache machine has some pointless overhead, as it starts
>> by cleaning the dcache to the PoU, while we're guaranteed to
>> have already cleaned it to the PoC.
>>
>> As KVM is the only user of such a feature, let's implement some
>> ad-hoc cache flushing in kvm_mmu.h. Should it become useful to
>> other subsystems, it can be moved to a more global location.
>>
>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>> ---
>>  arch/arm/include/asm/kvm_hyp.h |  2 ++
>>  arch/arm/include/asm/kvm_mmu.h | 24 ++++++++++++++++++++++--
>>  2 files changed, 24 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/arm/include/asm/kvm_hyp.h b/arch/arm/include/asm/kvm_hyp.h
>> index 14b5903f0224..ad541f9ecc78 100644
>> --- a/arch/arm/include/asm/kvm_hyp.h
>> +++ b/arch/arm/include/asm/kvm_hyp.h
>> @@ -69,6 +69,8 @@
>>  #define HIFAR		__ACCESS_CP15(c6, 4, c0, 2)
>>  #define HPFAR		__ACCESS_CP15(c6, 4, c0, 4)
>>  #define ICIALLUIS	__ACCESS_CP15(c7, 0, c1, 0)
>> +#define BPIALLIS	__ACCESS_CP15(c7, 0, c1, 6)
>> +#define ICIMVAU		__ACCESS_CP15(c7, 0, c5, 1)
>>  #define ATS1CPR		__ACCESS_CP15(c7, 0, c8, 0)
>>  #define TLBIALLIS	__ACCESS_CP15(c8, 0, c3, 0)
>>  #define TLBIALL		__ACCESS_CP15(c8, 0, c7, 0)
>> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
>> index f553aa62d0c3..6773dcf21bff 100644
>> --- a/arch/arm/include/asm/kvm_mmu.h
>> +++ b/arch/arm/include/asm/kvm_mmu.h
>> @@ -37,6 +37,8 @@
>>  
>>  #include <linux/highmem.h>
>>  #include <asm/cacheflush.h>
>> +#include <asm/cputype.h>
>> +#include <asm/kvm_hyp.h>
>>  #include <asm/pgalloc.h>
>>  #include <asm/stage2_pgtable.h>
>>  
>> @@ -157,6 +159,8 @@ static inline void __coherent_icache_guest_page(struct kvm_vcpu *vcpu,
>>  						kvm_pfn_t pfn,
>>  						unsigned long size)
>>  {
>> +	u32 iclsz;
>> +
>>  	/*
>>  	 * If we are going to insert an instruction page and the icache is
>>  	 * either VIPT or PIPT, there is a potential problem where the host
>> @@ -182,17 +186,33 @@ static inline void __coherent_icache_guest_page(struct kvm_vcpu *vcpu,
>>  	}
>>  
>>  	/* PIPT cache. As for the d-side, use a temporary kernel mapping. */
>> +	iclsz = 4 << (read_cpuid(CPUID_CACHETYPE) & 0xf);
>> +
> 
> nit: the 4 here is a bit cryptic, could we say something like (perhaps
> slightly over-explained):
> /*
>  * CTR IminLine contains Log2 of the number of words in the cache line,
>  * so we can get the number of words as 2 << (IminLine - 1).  To get the
>  * number of bytes, we multiply by 4 (the number of bytes in a 32-bit
>  * word), and get 4 << (IminLine).
>  */

Absolutely. I'll fold that in. Thanks.
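
As a quick sanity check of that arithmetic, a standalone example with a made-up CTR value:

#include <stdio.h>

int main(void)
{
    unsigned int ctr      = 0x8444c004;     /* made-up CTR value; IminLine = CTR[3:0] */
    unsigned int iminline = ctr & 0xf;      /* log2(words per smallest I-cache line) */
    unsigned int words    = 1u << iminline;
    unsigned int bytes    = 4u << iminline; /* the "4 <<" in the patch: 4 bytes/word */

    printf("IminLine=%u -> %u words -> %u bytes per line\n", iminline, words, bytes);
    /* prints: IminLine=4 -> 16 words -> 64 bytes per line */
    return 0;
}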

>>  	while (size) {
>>  		void *va = kmap_atomic_pfn(pfn);
>> +		void *end = va + PAGE_SIZE;
>> +		void *addr = va;
>> +
>> +		do {
>> +			write_sysreg(addr, ICIMVAU);
> 
> Maybe an oddball place to ask this, but I don't recall why we need PoU
> everywhere, would PoC potentially be enough?

PoC is in general stronger than PoU. All we care about is that
instructions are fetched from the point where the icache cannot be
distinguished from the dcache - the definition of the PoU. Also, I
don't think there is a
way to invalidate the icache to the PoC.

> 
>> +			addr += iclsz;
>> +		} while (addr < end);
>>  
>> -		__cpuc_coherent_user_range((unsigned long)va,
>> -					   (unsigned long)va + PAGE_SIZE);
>> +		dsb(ishst);
>> +		isb();
> 
> Do we really need this in every iteration of the loop?

The problem is that we need to factor in the interaction with the unmap
below. If we don't enforce the invalidation now, we may unmap the page
before the invalidations are finished, which could lead to a page fault.
If we didn't have to deal with highmem, that would indeed be a good
optimization.
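
(In other words, the per-page ordering we rely on is:

	write_sysreg(addr, ICIMVAU);	/* for each cache line in the page */
	dsb(ishst);
	isb();
	kunmap_atomic(va);		/* only now is it safe to drop the mapping */

so the barriers have to stay inside the per-page loop.)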

> 
>>  
>>  		size -= PAGE_SIZE;
>>  		pfn++;
>>  
>>  		kunmap_atomic(va);
>>  	}
>> +
>> +	/* Check if we need to invalidate the BTB */
>> +	if ((read_cpuid_ext(CPUID_EXT_MMFR1) >> 24) != 4) {
> 
> Either I'm having a bad day or you meant to shift this 28, not 24?

Oops... That's indeed totally broken. Thanks for noticing it!
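
For the record, the intent is to test ID_MMFR1.BPred, which lives in
bits [31:28], so the fixed version should look something like this
(sketch only, the exact body may differ in the respin):

	/* Check if we need to invalidate the BTB */
	if ((read_cpuid_ext(CPUID_EXT_MMFR1) >> 28) != 4) {
		write_sysreg(0, BPIALLIS);
		dsb(ishst);
		isb();
	}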

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 06/10] KVM: arm/arm64: Only clean the dcache on translation fault
  2017-10-16 20:08     ` Christoffer Dall
@ 2017-10-17  9:34       ` Marc Zyngier
  -1 siblings, 0 replies; 78+ messages in thread
From: Marc Zyngier @ 2017-10-17  9:34 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: kvm, Catalin Marinas, Will Deacon, kvmarm, linux-arm-kernel

On 16/10/17 21:08, Christoffer Dall wrote:
> On Mon, Oct 09, 2017 at 04:20:28PM +0100, Marc Zyngier wrote:
>> The only case where we actually need to perform dcache maintenance
>> is when we map the page for the first time, and subsequent permission
>> faults do not require cache maintenance. Let's make it conditional
>> on not being a permission fault (and thus a translation fault).
> 
> Why do we actually need to do any dcache maintenance when faulting in a
> page?
> 
> Is this for the case when the stage 1 MMU is disabled, or to support
> guest mappings using uncached attributes?

These are indeed the two cases that require cleaning the dcache to PoC.

> Can we do better, for example
> by only flushing the cache if the guest MMU is disabled?

The guest MMU being disabled is easy. But the uncached mapping is much
trickier, and would involve parsing the guest page tables. Not something
I'm really eager to implement.
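
(For reference, the change itself boils down to guarding the clean in
user_mem_abort(), roughly:

	if (fault_status != FSC_PERM)
		coherent_dcache_guest_page(vcpu, pfn, PMD_SIZE);

with the equivalent check on the PTE path using PAGE_SIZE.)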

> 
> Beyond that:
> 
> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Thanks!

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 07/10] KVM: arm/arm64: Preserve Exec permission across R/W permission faults
  2017-10-16 20:08     ` Christoffer Dall
@ 2017-10-17 11:22       ` Marc Zyngier
  -1 siblings, 0 replies; 78+ messages in thread
From: Marc Zyngier @ 2017-10-17 11:22 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: Christoffer Dall, Catalin Marinas, Will Deacon, linux-arm-kernel,
	kvm, kvmarm

On 16/10/17 21:08, Christoffer Dall wrote:
> On Mon, Oct 09, 2017 at 04:20:29PM +0100, Marc Zyngier wrote:
>> So far, we lose the Exec property whenever we take permission
>> faults, as we always reconstruct the PTE/PMD from scratch. This
>> can be counterproductive as we can end up with the following
>> fault sequence:
>>
>> 	X -> RO -> ROX -> RW -> RWX
>>
>> Instead, we can lookup the existing PTE/PMD and clear the XN bit in the
>> new entry if it was already cleared in the old one, leading to a much
>> nicer fault sequence:
>>
>> 	X -> ROX -> RWX
>>
>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>> ---
>>  arch/arm/include/asm/kvm_mmu.h   | 10 ++++++++++
>>  arch/arm64/include/asm/kvm_mmu.h | 10 ++++++++++
>>  virt/kvm/arm/mmu.c               | 25 +++++++++++++++++++++++++
>>  3 files changed, 45 insertions(+)
>>
>> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
>> index bf76150aad5f..ad442d86c23e 100644
>> --- a/arch/arm/include/asm/kvm_mmu.h
>> +++ b/arch/arm/include/asm/kvm_mmu.h
>> @@ -107,6 +107,11 @@ static inline bool kvm_s2pte_readonly(pte_t *pte)
>>  	return (pte_val(*pte) & L_PTE_S2_RDWR) == L_PTE_S2_RDONLY;
>>  }
>>  
>> +static inline bool kvm_s2pte_exec(pte_t *pte)
>> +{
>> +	return !(pte_val(*pte) & L_PTE_XN);
>> +}
>> +
>>  static inline void kvm_set_s2pmd_readonly(pmd_t *pmd)
>>  {
>>  	pmd_val(*pmd) = (pmd_val(*pmd) & ~L_PMD_S2_RDWR) | L_PMD_S2_RDONLY;
>> @@ -117,6 +122,11 @@ static inline bool kvm_s2pmd_readonly(pmd_t *pmd)
>>  	return (pmd_val(*pmd) & L_PMD_S2_RDWR) == L_PMD_S2_RDONLY;
>>  }
>>  
>> +static inline bool kvm_s2pmd_exec(pmd_t *pmd)
>> +{
>> +	return !(pmd_val(*pmd) & PMD_SECT_XN);
>> +}
>> +
>>  static inline bool kvm_page_empty(void *ptr)
>>  {
>>  	struct page *ptr_page = virt_to_page(ptr);
>> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
>> index 60c420a5ac0d..e7af74b8b51a 100644
>> --- a/arch/arm64/include/asm/kvm_mmu.h
>> +++ b/arch/arm64/include/asm/kvm_mmu.h
>> @@ -203,6 +203,11 @@ static inline bool kvm_s2pte_readonly(pte_t *pte)
>>  	return (pte_val(*pte) & PTE_S2_RDWR) == PTE_S2_RDONLY;
>>  }
>>  
>> +static inline bool kvm_s2pte_exec(pte_t *pte)
>> +{
>> +	return !(pte_val(*pte) & PTE_S2_XN);
>> +}
>> +
>>  static inline void kvm_set_s2pmd_readonly(pmd_t *pmd)
>>  {
>>  	kvm_set_s2pte_readonly((pte_t *)pmd);
>> @@ -213,6 +218,11 @@ static inline bool kvm_s2pmd_readonly(pmd_t *pmd)
>>  	return kvm_s2pte_readonly((pte_t *)pmd);
>>  }
>>  
>> +static inline bool kvm_s2pmd_exec(pmd_t *pmd)
>> +{
>> +	return !(pmd_val(*pmd) & PMD_S2_XN);
>> +}
>> +
>>  static inline bool kvm_page_empty(void *ptr)
>>  {
>>  	struct page *ptr_page = virt_to_page(ptr);
>> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
>> index 1911fadde88b..ccc6106764a6 100644
>> --- a/virt/kvm/arm/mmu.c
>> +++ b/virt/kvm/arm/mmu.c
>> @@ -926,6 +926,17 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache
>>  	return 0;
>>  }
>>  
>> +static pte_t *stage2_get_pte(struct kvm *kvm, phys_addr_t addr)
>> +{
>> +	pmd_t *pmdp;
>> +
>> +	pmdp = stage2_get_pmd(kvm, NULL, addr);
>> +	if (!pmdp || pmd_none(*pmdp))
>> +		return NULL;
>> +
>> +	return pte_offset_kernel(pmdp, addr);
>> +}
>> +
> 
> nit, couldn't you change this to be
> 
>     stage2_is_exec(struct kvm *kvm, phys_addr_t addr)
> 
> Which, if the pmd is a section mapping just checks that, and if we find
> a pte, we check that, and then we can have a simpler one-line call and
> check from both the pte and pmd paths below?

Yes, that's pretty neat. I've folded that in.
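
Something along these lines, presumably (a sketch of the folded-in
helper, exact details may differ in the respin):

	static bool stage2_is_exec(struct kvm *kvm, phys_addr_t addr)
	{
		pmd_t *pmdp;
		pte_t *ptep;

		pmdp = stage2_get_pmd(kvm, NULL, addr);
		if (!pmdp || pmd_none(*pmdp) || !pmd_present(*pmdp))
			return false;

		/* Section mapping: the PMD itself carries the XN bit */
		if (pmd_thp_or_huge(*pmdp))
			return kvm_s2pmd_exec(pmdp);

		ptep = pte_offset_kernel(pmdp, addr);
		if (!ptep || pte_none(*ptep) || !pte_present(*ptep))
			return false;

		return kvm_s2pte_exec(ptep);
	}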

> 
>>  static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
>>  			  phys_addr_t addr, const pte_t *new_pte,
>>  			  unsigned long flags)
>> @@ -1407,6 +1418,13 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>  		if (exec_fault) {
>>  			new_pmd = kvm_s2pmd_mkexec(new_pmd);
>>  			coherent_icache_guest_page(vcpu, pfn, PMD_SIZE);
>> +		} else if (fault_status == FSC_PERM) {
>> +			/* Preserve execute if XN was already cleared */
>> +			pmd_t *old_pmdp = stage2_get_pmd(kvm, NULL, fault_ipa);
>> +
>> +			if (old_pmdp && pmd_present(*old_pmdp) &&
>> +			    kvm_s2pmd_exec(old_pmdp))
>> +				new_pmd = kvm_s2pmd_mkexec(new_pmd);
> 
> Is the reverse case not also possible then?  That is, if we have an
> exec_fault, we could check if the entry is already writable and maintain
> the property as well.  Not sure how often that would get hit though, as
> a VM would only execute instructions on a page that has been written to,
> but is somehow read-only at stage2, meaning the host must have marked
> the page as read-only since content was written.  I think this could be
> a somewhat common pattern with something like KSM though?

I think this is already the case, because we always build the PTE/PMD as
either ROXN or RWXN, and only later clear the XN bit (see the
unconditional call to gfn_to_pfn_prot which should tell us whether to
map the page as writable or not). Or am I missing your point entirely?
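
(i.e. the construction in user_mem_abort() is roughly:

	if (writable)
		new_pmd = kvm_s2pmd_mkwrite(new_pmd);
	if (exec_fault)
		new_pmd = kvm_s2pmd_mkexec(new_pmd);

so write permission always comes from what gfn_to_pfn_prot() reported,
and only the XN bit is carried over from the previous mapping.)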

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 10/10] arm: KVM: Use common implementation for all flushes to PoC
  2017-10-16 20:06     ` Christoffer Dall
@ 2017-10-17 12:40       ` Marc Zyngier
  -1 siblings, 0 replies; 78+ messages in thread
From: Marc Zyngier @ 2017-10-17 12:40 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: Christoffer Dall, Catalin Marinas, Will Deacon, linux-arm-kernel,
	kvm, kvmarm

On 16/10/17 21:06, Christoffer Dall wrote:
> On Mon, Oct 09, 2017 at 04:20:32PM +0100, Marc Zyngier wrote:
>> We currently have no less than three implementations for the
>> "flush to PoC" code. Let standardize on a single one. This
>> requires a bit of unpleasant moving around, and relies on
>> __kvm_flush_dcache_pte and co being #defines so that they can
>> call into coherent_dcache_guest_page...
>>
>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>> ---
>>  arch/arm/include/asm/kvm_mmu.h | 28 ++++------------------------
>>  virt/kvm/arm/mmu.c             | 20 ++++++++++----------
>>  2 files changed, 14 insertions(+), 34 deletions(-)
>>
>> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
>> index 5f1ac88a5951..011b0db85c02 100644
>> --- a/arch/arm/include/asm/kvm_mmu.h
>> +++ b/arch/arm/include/asm/kvm_mmu.h
>> @@ -235,31 +235,11 @@ static inline void __coherent_icache_guest_page(kvm_pfn_t pfn,
>>  	}
>>  }
>>  
>> -static inline void __kvm_flush_dcache_pte(pte_t pte)
>> -{
>> -	void *va = kmap_atomic(pte_page(pte));
>> -
>> -	kvm_flush_dcache_to_poc(va, PAGE_SIZE);
>> -
>> -	kunmap_atomic(va);
>> -}
>> -
>> -static inline void __kvm_flush_dcache_pmd(pmd_t pmd)
>> -{
>> -	unsigned long size = PMD_SIZE;
>> -	kvm_pfn_t pfn = pmd_pfn(pmd);
>> -
>> -	while (size) {
>> -		void *va = kmap_atomic_pfn(pfn);
>> +#define __kvm_flush_dcache_pte(p)				\
>> +	coherent_dcache_guest_page(pte_pfn((p)), PAGE_SIZE)
>>  
>> -		kvm_flush_dcache_to_poc(va, PAGE_SIZE);
>> -
>> -		pfn++;
>> -		size -= PAGE_SIZE;
>> -
>> -		kunmap_atomic(va);
>> -	}
>> -}
>> +#define __kvm_flush_dcache_pmd(p)				\
>> +	coherent_dcache_guest_page(pmd_pfn((p)), PMD_SIZE)
> 
> Why can't these just be static inlines which call
> __coherent_dcache_guest_page already in the header file directly?

Because if we do that, we get a significant code expansion in the
resulting binary (all the call sites end up having a copy of that function).

> I'm really not too crazy about these #defines.

Neither am I. But actually, this patch is completely wrong. Using the
same functions as the guest cleaning doesn't provide the guarantees
documented next to unmap_stage2_ptes, as we need a clean+invalidate, not
just a clean.

I'll rework this patch (or just drop it).

> In fact, why do we need the coherent_Xcache_guest_page static
> indirection functions in mmu.c in the first place?

Code expansion. That's the only reason.
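
(For context, the mmu.c wrappers are just out-of-line trampolines,
something like:

	static void coherent_dcache_guest_page(struct kvm_vcpu *vcpu,
					       kvm_pfn_t pfn,
					       unsigned long size)
	{
		__coherent_dcache_guest_page(vcpu, pfn, size);
	}

so each call site emits a plain function call rather than a copy of the
inlined cache maintenance loop.)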

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 01/10] KVM: arm/arm64: Split dcache/icache flushing
  2017-10-17  8:57       ` Marc Zyngier
@ 2017-10-17 14:28         ` Christoffer Dall
  -1 siblings, 0 replies; 78+ messages in thread
From: Christoffer Dall @ 2017-10-17 14:28 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Christoffer Dall, Catalin Marinas, Will Deacon, linux-arm-kernel,
	kvm, kvmarm

On Tue, Oct 17, 2017 at 09:57:34AM +0100, Marc Zyngier wrote:
> On 16/10/17 21:07, Christoffer Dall wrote:
> > On Mon, Oct 09, 2017 at 04:20:23PM +0100, Marc Zyngier wrote:
> >> As we're about to introduce opportunistic invalidation of the icache,
> >> let's split dcache and icache flushing.
> > 
> > I'm a little confused about the naming of these functions now,
> > because whereas I believe the current function ensures coherency between
> > the I-cache and D-cache (and overly so), if you just call one or the
> > other function after this change, what exactly is the coherency you get?
> 
> Yeah, in retrospect, this is a pretty stupid naming scheme. I guess I'll
> call them clean/invalidate, with the overarching caller still being
> called coherent_cache_guest for the time being.
> 

Sounds good.

> > 
> > 
> >>
> >> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> >> ---
> >>  arch/arm/include/asm/kvm_mmu.h   | 60 ++++++++++++++++++++++++++++------------
> >>  arch/arm64/include/asm/kvm_mmu.h | 13 +++++++--
> >>  virt/kvm/arm/mmu.c               | 20 ++++++++++----
> >>  3 files changed, 67 insertions(+), 26 deletions(-)
> >>
> >> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
> >> index fa6f2174276b..f553aa62d0c3 100644
> >> --- a/arch/arm/include/asm/kvm_mmu.h
> >> +++ b/arch/arm/include/asm/kvm_mmu.h
> >> @@ -126,21 +126,12 @@ static inline bool vcpu_has_cache_enabled(struct kvm_vcpu *vcpu)
> >>  	return (vcpu_cp15(vcpu, c1_SCTLR) & 0b101) == 0b101;
> >>  }
> >>  
> >> -static inline void __coherent_cache_guest_page(struct kvm_vcpu *vcpu,
> >> -					       kvm_pfn_t pfn,
> >> -					       unsigned long size)
> >> +static inline void __coherent_dcache_guest_page(struct kvm_vcpu *vcpu,
> >> +						kvm_pfn_t pfn,
> >> +						unsigned long size)
> >>  {
> >>  	/*
> >> -	 * If we are going to insert an instruction page and the icache is
> >> -	 * either VIPT or PIPT, there is a potential problem where the host
> >> -	 * (or another VM) may have used the same page as this guest, and we
> >> -	 * read incorrect data from the icache.  If we're using a PIPT cache,
> >> -	 * we can invalidate just that page, but if we are using a VIPT cache
> >> -	 * we need to invalidate the entire icache - damn shame - as written
> >> -	 * in the ARM ARM (DDI 0406C.b - Page B3-1393).
> >> -	 *
> >> -	 * VIVT caches are tagged using both the ASID and the VMID and doesn't
> >> -	 * need any kind of flushing (DDI 0406C.b - Page B3-1392).
> >> +	 * Clean the dcache to the Point of Coherency.
> >>  	 *
> >>  	 * We need to do this through a kernel mapping (using the
> >>  	 * user-space mapping has proved to be the wrong
> >> @@ -155,19 +146,52 @@ static inline void __coherent_cache_guest_page(struct kvm_vcpu *vcpu,
> >>  
> >>  		kvm_flush_dcache_to_poc(va, PAGE_SIZE);
> >>  
> >> -		if (icache_is_pipt())
> >> -			__cpuc_coherent_user_range((unsigned long)va,
> >> -						   (unsigned long)va + PAGE_SIZE);
> >> -
> >>  		size -= PAGE_SIZE;
> >>  		pfn++;
> >>  
> >>  		kunmap_atomic(va);
> >>  	}
> >> +}
> >>  
> >> -	if (!icache_is_pipt() && !icache_is_vivt_asid_tagged()) {
> >> +static inline void __coherent_icache_guest_page(struct kvm_vcpu *vcpu,
> >> +						kvm_pfn_t pfn,
> >> +						unsigned long size)
> >> +{
> >> +	/*
> >> +	 * If we are going to insert an instruction page and the icache is
> >> +	 * either VIPT or PIPT, there is a potential problem where the host
> >> +	 * (or another VM) may have used the same page as this guest, and we
> >> +	 * read incorrect data from the icache.  If we're using a PIPT cache,
> >> +	 * we can invalidate just that page, but if we are using a VIPT cache
> >> +	 * we need to invalidate the entire icache - damn shame - as written
> >> +	 * in the ARM ARM (DDI 0406C.b - Page B3-1393).
> >> +	 *
> >> +	 * VIVT caches are tagged using both the ASID and the VMID and doesn't
> >> +	 * need any kind of flushing (DDI 0406C.b - Page B3-1392).
> >> +	 */
> >> +
> >> +	VM_BUG_ON(size & ~PAGE_MASK);
> >> +
> >> +	if (icache_is_vivt_asid_tagged())
> >> +		return;
> >> +
> >> +	if (!icache_is_pipt()) {
> >>  		/* any kind of VIPT cache */
> >>  		__flush_icache_all();
> >> +		return;
> >> +	}
> >> +
> >> +	/* PIPT cache. As for the d-side, use a temporary kernel mapping. */
> >> +	while (size) {
> >> +		void *va = kmap_atomic_pfn(pfn);
> >> +
> >> +		__cpuc_coherent_user_range((unsigned long)va,
> >> +					   (unsigned long)va + PAGE_SIZE);
> >> +
> >> +		size -= PAGE_SIZE;
> >> +		pfn++;
> >> +
> >> +		kunmap_atomic(va);
> >>  	}
> >>  }
> >>  
> >> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> >> index 672c8684d5c2..4c4cb4f0e34f 100644
> >> --- a/arch/arm64/include/asm/kvm_mmu.h
> >> +++ b/arch/arm64/include/asm/kvm_mmu.h
> >> @@ -230,19 +230,26 @@ static inline bool vcpu_has_cache_enabled(struct kvm_vcpu *vcpu)
> >>  	return (vcpu_sys_reg(vcpu, SCTLR_EL1) & 0b101) == 0b101;
> >>  }
> >>  
> >> -static inline void __coherent_cache_guest_page(struct kvm_vcpu *vcpu,
> >> -					       kvm_pfn_t pfn,
> >> -					       unsigned long size)
> >> +static inline void __coherent_dcache_guest_page(struct kvm_vcpu *vcpu,
> >> +						kvm_pfn_t pfn,
> >> +						unsigned long size)
> >>  {
> >>  	void *va = page_address(pfn_to_page(pfn));
> >>  
> >>  	kvm_flush_dcache_to_poc(va, size);
> >> +}
> >>  
> >> +static inline void __coherent_icache_guest_page(struct kvm_vcpu *vcpu,
> >> +						kvm_pfn_t pfn,
> >> +						unsigned long size)
> >> +{
> >>  	if (icache_is_aliasing()) {
> >>  		/* any kind of VIPT cache */
> >>  		__flush_icache_all();
> >>  	} else if (is_kernel_in_hyp_mode() || !icache_is_vpipt()) {
> >>  		/* PIPT or VPIPT at EL2 (see comment in __kvm_tlb_flush_vmid_ipa) */
> > 
> > unrelated: I went and read the comment in __kvm_tlb_flush_vmid_ipa, and
> > I don't really understand why there is only a need to flush the icache
> > if the host is running at EL1.
> > 
> > The text seems to describe the problem of remapping executable pages
> > within the guest.  That seems to me would require icache maintenance of
> > the page that gets overwritten with new code, regardless of whether the
> > host runs at EL1 or EL2.
> > 
> > Of course it's easier done on VHE because we don't have to take a trap,
> > but the code seems to not invalidate the icache at all for VHE systems
> > that have VPIPT.  I'm confused.  Can you help?
> 
> [+ Will, as he wrote that code and can reply if I say something stupid]
> 
> Here's the trick: The VMID-tagged aspect of VPIPT only applies if the
> CMO is used at EL0 or EL1. When used at EL2, it behaves exactly like a
> PIPT operation (see D4.10.2 in the ARMv8 ARM version B_b).
> 
> So in the end, we deal with VPIPT the following way:
> 
> - Without VHE, we perform the icache invalidation on unmap, blatting the
> whole icache.

ok, but why can't we do the invalidation by jumping to EL2 like we do
for some of the other CMOs ?

> 
> - With VHE, we do it the usual way (at map time), using the PIPT
> flavour, as the invalidation is done from EL2
> 

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 03/10] arm: KVM: Add optimized PIPT icache flushing
  2017-10-17  9:26       ` Marc Zyngier
@ 2017-10-17 14:34         ` Christoffer Dall
  -1 siblings, 0 replies; 78+ messages in thread
From: Christoffer Dall @ 2017-10-17 14:34 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Christoffer Dall, Catalin Marinas, Will Deacon, linux-arm-kernel,
	kvm, kvmarm

On Tue, Oct 17, 2017 at 10:26:31AM +0100, Marc Zyngier wrote:
> On 16/10/17 21:07, Christoffer Dall wrote:
> > On Mon, Oct 09, 2017 at 04:20:25PM +0100, Marc Zyngier wrote:
> >> Calling __cpuc_coherent_user_range to invalidate the icache on
> >> a PIPT icache machine has some pointless overhead, as it starts
> >> by cleaning the dcache to the PoU, while we're guaranteed to
> >> have already cleaned it to the PoC.
> >>
> >> As KVM is the only user of such a feature, let's implement some
> >> ad-hoc cache flushing in kvm_mmu.h. Should it become useful to
> >> other subsystems, it can be moved to a more global location.
> >>
> >> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> >> ---
> >>  arch/arm/include/asm/kvm_hyp.h |  2 ++
> >>  arch/arm/include/asm/kvm_mmu.h | 24 ++++++++++++++++++++++--
> >>  2 files changed, 24 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/arch/arm/include/asm/kvm_hyp.h b/arch/arm/include/asm/kvm_hyp.h
> >> index 14b5903f0224..ad541f9ecc78 100644
> >> --- a/arch/arm/include/asm/kvm_hyp.h
> >> +++ b/arch/arm/include/asm/kvm_hyp.h
> >> @@ -69,6 +69,8 @@
> >>  #define HIFAR		__ACCESS_CP15(c6, 4, c0, 2)
> >>  #define HPFAR		__ACCESS_CP15(c6, 4, c0, 4)
> >>  #define ICIALLUIS	__ACCESS_CP15(c7, 0, c1, 0)
> >> +#define BPIALLIS	__ACCESS_CP15(c7, 0, c1, 6)
> >> +#define ICIMVAU		__ACCESS_CP15(c7, 0, c5, 1)
> >>  #define ATS1CPR		__ACCESS_CP15(c7, 0, c8, 0)
> >>  #define TLBIALLIS	__ACCESS_CP15(c8, 0, c3, 0)
> >>  #define TLBIALL		__ACCESS_CP15(c8, 0, c7, 0)
> >> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
> >> index f553aa62d0c3..6773dcf21bff 100644
> >> --- a/arch/arm/include/asm/kvm_mmu.h
> >> +++ b/arch/arm/include/asm/kvm_mmu.h
> >> @@ -37,6 +37,8 @@
> >>  
> >>  #include <linux/highmem.h>
> >>  #include <asm/cacheflush.h>
> >> +#include <asm/cputype.h>
> >> +#include <asm/kvm_hyp.h>
> >>  #include <asm/pgalloc.h>
> >>  #include <asm/stage2_pgtable.h>
> >>  
> >> @@ -157,6 +159,8 @@ static inline void __coherent_icache_guest_page(struct kvm_vcpu *vcpu,
> >>  						kvm_pfn_t pfn,
> >>  						unsigned long size)
> >>  {
> >> +	u32 iclsz;
> >> +
> >>  	/*
> >>  	 * If we are going to insert an instruction page and the icache is
> >>  	 * either VIPT or PIPT, there is a potential problem where the host
> >> @@ -182,17 +186,33 @@ static inline void __coherent_icache_guest_page(struct kvm_vcpu *vcpu,
> >>  	}
> >>  
> >>  	/* PIPT cache. As for the d-side, use a temporary kernel mapping. */
> >> +	iclsz = 4 << (read_cpuid(CPUID_CACHETYPE) & 0xf);
> >> +
> > 
> > nit: the 4 here is a bit cryptic, could we say something like (perhaps
> > slightly over-explained):
> > /*
> >  * CTR IminLine contains Log2 of the number of words in the cache line,
> >  * so we can get the number of words as 2 << (IminLine - 1).  To get the
> >  * number of bytes, we multiply by 4 (the number of bytes in a 32-bit
> >  * word), and get 4 << (IminLine).
> >  */
> 
> Absolutely. I'll fold that in. Thanks.
> 
> >>  	while (size) {
> >>  		void *va = kmap_atomic_pfn(pfn);
> >> +		void *end = va + PAGE_SIZE;
> >> +		void *addr = va;
> >> +
> >> +		do {
> >> +			write_sysreg(addr, ICIMVAU);
> > 
> > Maybe an oddball place to ask this, but I don't recall why we need PoU
> > everywhere, would PoC potentially be enough?
> 
> PoC is in general stronger than PoU. All we care about is that
> instructions are fetched from the point where the icache cannot be
> distinguished from the dcache - the definition of the PoU. Also, I
> don't think there is a way to invalidate the icache to the PoC.
> 

Doi, I switched the meaning around in my head.  Sorry about the noise.

> > 
> >> +			addr += iclsz;
> >> +		} while (addr < end);
> >>  
> >> -		__cpuc_coherent_user_range((unsigned long)va,
> >> -					   (unsigned long)va + PAGE_SIZE);
> >> +		dsb(ishst);
> >> +		isb();
> > 
> > Do we really need this in every iteration of the loop?
> 
> The problem is that we need to factor in the interaction with the unmap
> below. If we don't enforce the invalidation now, we may unmap the page
> before the invalidations are finished, which could lead to a page fault.
> If we didn't have to deal with highmem, that would indeed be a good
> optimization.
> 

Ah, I completely failed to see that.  Thanks for the explanation.

> > 
> >>  
> >>  		size -= PAGE_SIZE;
> >>  		pfn++;
> >>  
> >>  		kunmap_atomic(va);
> >>  	}
> >> +
> >> +	/* Check if we need to invalidate the BTB */
> >> +	if ((read_cpuid_ext(CPUID_EXT_MMFR1) >> 24) != 4) {
> > 
> > Either I'm having a bad day or you meant to shift this 28, not 24?
> 
> Oops... That's indeed totally broken. Thanks for noticing it!
> 
With that fixed:

Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 06/10] KVM: arm/arm64: Only clean the dcache on translation fault
  2017-10-17  9:34       ` Marc Zyngier
@ 2017-10-17 14:36         ` Christoffer Dall
  -1 siblings, 0 replies; 78+ messages in thread
From: Christoffer Dall @ 2017-10-17 14:36 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Christoffer Dall, Catalin Marinas, Will Deacon, linux-arm-kernel,
	kvm, kvmarm

On Tue, Oct 17, 2017 at 10:34:15AM +0100, Marc Zyngier wrote:
> On 16/10/17 21:08, Christoffer Dall wrote:
> > On Mon, Oct 09, 2017 at 04:20:28PM +0100, Marc Zyngier wrote:
> >> The only case where we actually need to perform dcache maintenance
> >> is when we map the page for the first time, and subsequent permission
> >> faults do not require cache maintenance. Let's make it conditional
> >> on not being a permission fault (and thus a translation fault).
> > 
> > Why do we actually need to do any dcache maintenance when faulting in a
> > page?
> > 
> > Is this for the case when the stage 1 MMU is disabled, or to support
> > guest mappings using uncached attributes?
> 
> These are indeed the two cases that require cleaning the dcache to PoC.
> 
> > Can we do better, for example
> > by only flushing the cache if the guest MMU is disabled?
> 
> The guest MMU being disabled is easy. But the uncached mapping is much
> trickier, and would involve parsing the guest page tables. Not something
> I'm really eager to implement.
> 

Hmm, if the guest actually maps memory uncached, wouldn't it have to
invalidate caches itself, or is this the annoying thing where disabling
the MMU on hardware that doesn't have stage 2 would in fact always
completely bypass the cache, and therefore we have to do this work?

Sorry, I have forgotten all the details here, but wanted to make sure
we're not being overly careful.

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 01/10] KVM: arm/arm64: Split dcache/icache flushing
  2017-10-17 14:28         ` Christoffer Dall
@ 2017-10-17 14:41           ` Marc Zyngier
  -1 siblings, 0 replies; 78+ messages in thread
From: Marc Zyngier @ 2017-10-17 14:41 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: Christoffer Dall, Catalin Marinas, Will Deacon, linux-arm-kernel,
	kvm, kvmarm

On 17/10/17 15:28, Christoffer Dall wrote:
> On Tue, Oct 17, 2017 at 09:57:34AM +0100, Marc Zyngier wrote:
>> On 16/10/17 21:07, Christoffer Dall wrote:
>>> unrelated: I went and read the comment in __kvm_tlb_flush_vmid_ipa, and
>>> I don't really understand why there is only a need to flush the icache
>>> if the host is running at EL1.
>>>
>>> The text seems to describe the problem of remapping executable pages
>>> within the guest.  That seems to me would require icache maintenance of
>>> the page that gets overwritten with new code, regardless of whether the
>>> host runs at EL1 or EL2.
>>>
>>> Of course it's easier done on VHE because we don't have to take a trap,
>>> but the code seems to not invalidate the icache at all for VHE systems
>>> that have VPIPT.  I'm confused.  Can you help?
>>
>> [+ Will, as he wrote that code and can reply if I say something stupid]
>>
>> Here's the trick: The VMID-tagged aspect of VPIPT only applies if the
>> CMO is used at EL0 or EL1. When used at EL2, it behaves exactly like a
>> PIPT operation (see D4.10.2 in the ARMv8 ARM version B_b).
>>
>> So in the end, we deal with VPIPT the following way:
>>
>> - Without VHE, we perform the icache invalidation on unmap, blatting the
>> whole icache.
> 
> ok, but why can't we do the invalidation by jumping to EL2 like we do
> for some of the other CMOs ?

I don't think we have any other CMO requiring jumping to EL2. VPIPT
handling is the only one so far. I think that's why Will coupled it to
the TLB invalidation (we're already at EL2 for that).

Now, we could easily change that to follow the same flow as the rest of
the code. It only takes a separate entry point.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 07/10] KVM: arm/arm64: Preserve Exec permission across R/W permission faults
  2017-10-17 11:22       ` Marc Zyngier
@ 2017-10-17 14:46         ` Christoffer Dall
  -1 siblings, 0 replies; 78+ messages in thread
From: Christoffer Dall @ 2017-10-17 14:46 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: kvm, Catalin Marinas, Will Deacon, kvmarm, linux-arm-kernel

On Tue, Oct 17, 2017 at 12:22:08PM +0100, Marc Zyngier wrote:
> On 16/10/17 21:08, Christoffer Dall wrote:
> > On Mon, Oct 09, 2017 at 04:20:29PM +0100, Marc Zyngier wrote:
> >> So far, we lose the Exec property whenever we take permission
> >> faults, as we always reconstruct the PTE/PMD from scratch. This
> >> can be counterproductive as we can end up with the following
> >> fault sequence:
> >>
> >> 	X -> RO -> ROX -> RW -> RWX
> >>
> >> Instead, we can look up the existing PTE/PMD and clear the XN bit in the
> >> new entry if it was already cleared in the old one, leading to a much
> >> nicer fault sequence:
> >>
> >> 	X -> ROX -> RWX
> >>
> >> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> >> ---
> >>  arch/arm/include/asm/kvm_mmu.h   | 10 ++++++++++
> >>  arch/arm64/include/asm/kvm_mmu.h | 10 ++++++++++
> >>  virt/kvm/arm/mmu.c               | 25 +++++++++++++++++++++++++
> >>  3 files changed, 45 insertions(+)
> >>
> >> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
> >> index bf76150aad5f..ad442d86c23e 100644
> >> --- a/arch/arm/include/asm/kvm_mmu.h
> >> +++ b/arch/arm/include/asm/kvm_mmu.h
> >> @@ -107,6 +107,11 @@ static inline bool kvm_s2pte_readonly(pte_t *pte)
> >>  	return (pte_val(*pte) & L_PTE_S2_RDWR) == L_PTE_S2_RDONLY;
> >>  }
> >>  
> >> +static inline bool kvm_s2pte_exec(pte_t *pte)
> >> +{
> >> +	return !(pte_val(*pte) & L_PTE_XN);
> >> +}
> >> +
> >>  static inline void kvm_set_s2pmd_readonly(pmd_t *pmd)
> >>  {
> >>  	pmd_val(*pmd) = (pmd_val(*pmd) & ~L_PMD_S2_RDWR) | L_PMD_S2_RDONLY;
> >> @@ -117,6 +122,11 @@ static inline bool kvm_s2pmd_readonly(pmd_t *pmd)
> >>  	return (pmd_val(*pmd) & L_PMD_S2_RDWR) == L_PMD_S2_RDONLY;
> >>  }
> >>  
> >> +static inline bool kvm_s2pmd_exec(pmd_t *pmd)
> >> +{
> >> +	return !(pmd_val(*pmd) & PMD_SECT_XN);
> >> +}
> >> +
> >>  static inline bool kvm_page_empty(void *ptr)
> >>  {
> >>  	struct page *ptr_page = virt_to_page(ptr);
> >> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> >> index 60c420a5ac0d..e7af74b8b51a 100644
> >> --- a/arch/arm64/include/asm/kvm_mmu.h
> >> +++ b/arch/arm64/include/asm/kvm_mmu.h
> >> @@ -203,6 +203,11 @@ static inline bool kvm_s2pte_readonly(pte_t *pte)
> >>  	return (pte_val(*pte) & PTE_S2_RDWR) == PTE_S2_RDONLY;
> >>  }
> >>  
> >> +static inline bool kvm_s2pte_exec(pte_t *pte)
> >> +{
> >> +	return !(pte_val(*pte) & PTE_S2_XN);
> >> +}
> >> +
> >>  static inline void kvm_set_s2pmd_readonly(pmd_t *pmd)
> >>  {
> >>  	kvm_set_s2pte_readonly((pte_t *)pmd);
> >> @@ -213,6 +218,11 @@ static inline bool kvm_s2pmd_readonly(pmd_t *pmd)
> >>  	return kvm_s2pte_readonly((pte_t *)pmd);
> >>  }
> >>  
> >> +static inline bool kvm_s2pmd_exec(pmd_t *pmd)
> >> +{
> >> +	return !(pmd_val(*pmd) & PMD_S2_XN);
> >> +}
> >> +
> >>  static inline bool kvm_page_empty(void *ptr)
> >>  {
> >>  	struct page *ptr_page = virt_to_page(ptr);
> >> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> >> index 1911fadde88b..ccc6106764a6 100644
> >> --- a/virt/kvm/arm/mmu.c
> >> +++ b/virt/kvm/arm/mmu.c
> >> @@ -926,6 +926,17 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache
> >>  	return 0;
> >>  }
> >>  
> >> +static pte_t *stage2_get_pte(struct kvm *kvm, phys_addr_t addr)
> >> +{
> >> +	pmd_t *pmdp;
> >> +
> >> +	pmdp = stage2_get_pmd(kvm, NULL, addr);
> >> +	if (!pmdp || pmd_none(*pmdp))
> >> +		return NULL;
> >> +
> >> +	return pte_offset_kernel(pmdp, addr);
> >> +}
> >> +
> > 
> > nit, couldn't you change this to be
> > 
> >     stage2_is_exec(struct kvm *kvm, phys_addr_t addr)
> > 
> > Which, if the pmd is a section mapping just checks that, and if we find
> > a pte, we check that, and then we can have a simpler one-line call and
> > check from both the pte and pmd paths below?
> 
> Yes, that's pretty neat. I've folded that in.
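
(For concreteness, a minimal sketch of such a helper, built from the
accessors this series introduces -- stage2_get_pmd(), kvm_s2pmd_exec(),
kvm_s2pte_exec(); illustrative only, not necessarily the exact code
that was folded in.)

static bool stage2_is_exec(struct kvm *kvm, phys_addr_t addr)
{
	pmd_t *pmdp;
	pte_t *ptep;

	/* Walk to the PMD; nothing mapped means nothing executable. */
	pmdp = stage2_get_pmd(kvm, NULL, addr);
	if (!pmdp || pmd_none(*pmdp) || !pmd_present(*pmdp))
		return false;

	/* Section/huge mapping: the PMD itself carries the XN bit. */
	if (pmd_thp_or_huge(*pmdp))
		return kvm_s2pmd_exec(pmdp);

	/* Otherwise check the PTE underneath. */
	ptep = pte_offset_kernel(pmdp, addr);
	if (!ptep || pte_none(*ptep) || !pte_present(*ptep))
		return false;

	return kvm_s2pte_exec(ptep);
}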
> 
> > 
> >>  static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
> >>  			  phys_addr_t addr, const pte_t *new_pte,
> >>  			  unsigned long flags)
> >> @@ -1407,6 +1418,13 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >>  		if (exec_fault) {
> >>  			new_pmd = kvm_s2pmd_mkexec(new_pmd);
> >>  			coherent_icache_guest_page(vcpu, pfn, PMD_SIZE);
> >> +		} else if (fault_status == FSC_PERM) {
> >> +			/* Preserve execute if XN was already cleared */
> >> +			pmd_t *old_pmdp = stage2_get_pmd(kvm, NULL, fault_ipa);
> >> +
> >> +			if (old_pmdp && pmd_present(*old_pmdp) &&
> >> +			    kvm_s2pmd_exec(old_pmdp))
> >> +				new_pmd = kvm_s2pmd_mkexec(new_pmd);
> > 
> > Is the reverse case not also possible then?  That is, if we have an
> > exec_fault, we could check if the entry is already writable and maintain
> > the property as well.  Not sure how often that would get hit though, as
> > a VM would only execute instructions on a page that has been written to,
> > but is somehow read-only at stage2, meaning the host must have marked
> > the page as read-only since content was written.  I think this could be
> > a somewhat common pattern with something like KSM though?
> 
> I think this is already the case, because we always build the PTE/PMD as
> either ROXN or RWXN, and only later clear the XN bit (see the
> unconditional call to gfn_to_pfn_prot which should tell us whether to
> map the page as writable or not). Or am I missing your point entirely?
> 

I am worried about the flow where we map the page as RWXN, then execute
some code on it, so it becomes RW, then make it read-only as ROXN.
Shouldn't we preserve the exec state here?

I'm guessing that something like KSM will want to make pages read-only
to support COW, but perhaps that's always done by copying the content to
a new page and redirecting the old mapping to a new mapping (something
that calls kvm_set_spte_hva()) and in that case we do probably really
want the XN bit to be set again so that we can do the necessary
maintenance.

However, we shouldn't try to optimize for something we don't know to be
a problem, so as long as it's functionally correct, which I think it is,
we should be fine.

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 10/10] arm: KVM: Use common implementation for all flushes to PoC
  2017-10-17 12:40       ` Marc Zyngier
@ 2017-10-17 14:48         ` Christoffer Dall
  -1 siblings, 0 replies; 78+ messages in thread
From: Christoffer Dall @ 2017-10-17 14:48 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: kvm, Catalin Marinas, Will Deacon, kvmarm, linux-arm-kernel

On Tue, Oct 17, 2017 at 01:40:00PM +0100, Marc Zyngier wrote:
> On 16/10/17 21:06, Christoffer Dall wrote:
> > On Mon, Oct 09, 2017 at 04:20:32PM +0100, Marc Zyngier wrote:
> >> We currently have no less than three implementations for the
> >> "flush to PoC" code. Let standardize on a single one. This
> >> requires a bit of unpleasant moving around, and relies on
> >> __kvm_flush_dcache_pte and co being #defines so that they can
> >> call into coherent_dcache_guest_page...
> >>
> >> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> >> ---
> >>  arch/arm/include/asm/kvm_mmu.h | 28 ++++------------------------
> >>  virt/kvm/arm/mmu.c             | 20 ++++++++++----------
> >>  2 files changed, 14 insertions(+), 34 deletions(-)
> >>
> >> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
> >> index 5f1ac88a5951..011b0db85c02 100644
> >> --- a/arch/arm/include/asm/kvm_mmu.h
> >> +++ b/arch/arm/include/asm/kvm_mmu.h
> >> @@ -235,31 +235,11 @@ static inline void __coherent_icache_guest_page(kvm_pfn_t pfn,
> >>  	}
> >>  }
> >>  
> >> -static inline void __kvm_flush_dcache_pte(pte_t pte)
> >> -{
> >> -	void *va = kmap_atomic(pte_page(pte));
> >> -
> >> -	kvm_flush_dcache_to_poc(va, PAGE_SIZE);
> >> -
> >> -	kunmap_atomic(va);
> >> -}
> >> -
> >> -static inline void __kvm_flush_dcache_pmd(pmd_t pmd)
> >> -{
> >> -	unsigned long size = PMD_SIZE;
> >> -	kvm_pfn_t pfn = pmd_pfn(pmd);
> >> -
> >> -	while (size) {
> >> -		void *va = kmap_atomic_pfn(pfn);
> >> +#define __kvm_flush_dcache_pte(p)				\
> >> +	coherent_dcache_guest_page(pte_pfn((p)), PAGE_SIZE)
> >>  
> >> -		kvm_flush_dcache_to_poc(va, PAGE_SIZE);
> >> -
> >> -		pfn++;
> >> -		size -= PAGE_SIZE;
> >> -
> >> -		kunmap_atomic(va);
> >> -	}
> >> -}
> >> +#define __kvm_flush_dcache_pmd(p)				\
> >> +	coherent_dcache_guest_page(pmd_pfn((p)), PMD_SIZE)
> > 
> > Why can't these just be static inlines which call
> > __coherent_dcache_guest_page already in the header file directly?
> 
> Because if we do that, we get a significant code expansion in the
> resulting binary (all the call sites end up having a copy of that function).
> 
> > I'm really not too crazy about these #defines.
> 
> Neither am I. But actually, this patch is completely wrong. Using the
> same functions as the guest cleaning doesn't provide the guarantees
> documented next to unmap_stage2_ptes, as we need a clean+invalidate, not
> just a clean.
> 
> I'll rework this patch (or just drop it).
> 
> > In fact, why do we need the coherent_Xcache_guest_page static
> > indirection functions in mmu.c in the first place?
> 
> Code expansion. That's the only reason.
> 
Then maybe a reworked patch needs a function defined in some
arch-specific object file that we can just call.  The functions don't
look that complicated to me, but I suppose if they inline the things
they call, it could become a bit hairy.
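
For concreteness, that could look something like the sketch below: a
single out-of-line copy in an arch object file, called from thin
wrappers in kvm_mmu.h. The name is hypothetical, and per Marc's point
above the unmap path would actually want a clean+invalidate rather
than just the clean shown here.

void __kvm_flush_dcache_guest_pages(kvm_pfn_t pfn, unsigned long size)
{
	/* One out-of-line copy: walk the range one page at a time. */
	while (size) {
		void *va = kmap_atomic_pfn(pfn);

		kvm_flush_dcache_to_poc(va, PAGE_SIZE);

		pfn++;
		size -= PAGE_SIZE;

		kunmap_atomic(va);
	}
}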

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 06/10] KVM: arm/arm64: Only clean the dcache on translation fault
  2017-10-17 14:36         ` Christoffer Dall
@ 2017-10-17 14:52           ` Marc Zyngier
  -1 siblings, 0 replies; 78+ messages in thread
From: Marc Zyngier @ 2017-10-17 14:52 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: Christoffer Dall, Catalin Marinas, Will Deacon, linux-arm-kernel,
	kvm, kvmarm

On 17/10/17 15:36, Christoffer Dall wrote:
> On Tue, Oct 17, 2017 at 10:34:15AM +0100, Marc Zyngier wrote:
>> On 16/10/17 21:08, Christoffer Dall wrote:
>>> On Mon, Oct 09, 2017 at 04:20:28PM +0100, Marc Zyngier wrote:
>>>> The only case where we actually need to perform dcache maintenance
>>>> is when we map the page for the first time, and subsequent permission
>>>> faults do not require cache maintenance. Let's make it conditional
>>>> on not being a permission fault (and thus a translation fault).
>>>
>>> Why do we actually need to do any dcache maintenance when faulting in a
>>> page?
>>>
>>> Is this for the case when the stage 1 MMU is disabled, or to support
>>> guest mappings using uncached attributes?
>>
>> These are indeed the two cases that require cleaning the dcache to PoC.
>>
>>> Can we do better, for example
>>> by only flushing the cache if the guest MMU is disabled?
>>
>> The guest MMU being disabled is easy. But the uncached mapping is much
>> trickier, and would involve parsing the guest page tables. Not something
>> I'm really eager to implement.
>>
> 
> Hmm, if the guest actually maps memory uncached, wouldn't it have to
> invalidate caches itself, or is this the annoying thing where disabling
> the MMU on hardware that doesn't have stage 2 would in fact always
> completely bypass the cache, and therefore we have to do this work?

The architecture is massively ambiguous about what is actually required
in terms of CMOs when using an uncached mapping (and whether you can hit
in the cache or not).

But even if the guest had done an invalidate in order not to hit in the
cache, the host could have evicted the page to disk, the guest faulted
the page back in, and it would still be the host's responsibility to
ensure that the guest sees something consistent.
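
(To make the change under discussion concrete: with patches 05 and 06
applied, the PMD path of user_mem_abort() boils down to roughly the
following -- a simplified sketch using the series' names, not the
literal hunk.)

	/* Clean to PoC only when the page is mapped for the first time. */
	if (fault_status != FSC_PERM)
		coherent_dcache_guest_page(vcpu, pfn, PMD_SIZE);

	/* Invalidate the icache only when the guest actually executes. */
	if (exec_fault) {
		new_pmd = kvm_s2pmd_mkexec(new_pmd);
		coherent_icache_guest_page(vcpu, pfn, PMD_SIZE);
	}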

> Sorry, I have forgotten all the details here, but wanted to make sure
> we're not being overly careful.

No worries, I'm always happy to revisit this particular rabbit hole... ;-)

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 07/10] KVM: arm/arm64: Preserve Exec permission across R/W permission faults
  2017-10-17 14:46         ` Christoffer Dall
@ 2017-10-17 15:04           ` Marc Zyngier
  -1 siblings, 0 replies; 78+ messages in thread
From: Marc Zyngier @ 2017-10-17 15:04 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: kvm, Catalin Marinas, Will Deacon, kvmarm, linux-arm-kernel

On 17/10/17 15:46, Christoffer Dall wrote:
> On Tue, Oct 17, 2017 at 12:22:08PM +0100, Marc Zyngier wrote:
>> On 16/10/17 21:08, Christoffer Dall wrote:
>>> On Mon, Oct 09, 2017 at 04:20:29PM +0100, Marc Zyngier wrote:
>>>> So far, we lose the Exec property whenever we take permission
>>>> faults, as we always reconstruct the PTE/PMD from scratch. This
>>>> can be counterproductive as we can end up with the following
>>>> fault sequence:
>>>>
>>>> 	X -> RO -> ROX -> RW -> RWX
>>>>
>>>> Instead, we can look up the existing PTE/PMD and clear the XN bit in the
>>>> new entry if it was already cleared in the old one, leading to a much
>>>> nicer fault sequence:
>>>>
>>>> 	X -> ROX -> RWX
>>>>
>>>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>>>> ---
>>>>  arch/arm/include/asm/kvm_mmu.h   | 10 ++++++++++
>>>>  arch/arm64/include/asm/kvm_mmu.h | 10 ++++++++++
>>>>  virt/kvm/arm/mmu.c               | 25 +++++++++++++++++++++++++
>>>>  3 files changed, 45 insertions(+)
>>>>
>>>> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
>>>> index bf76150aad5f..ad442d86c23e 100644
>>>> --- a/arch/arm/include/asm/kvm_mmu.h
>>>> +++ b/arch/arm/include/asm/kvm_mmu.h
>>>> @@ -107,6 +107,11 @@ static inline bool kvm_s2pte_readonly(pte_t *pte)
>>>>  	return (pte_val(*pte) & L_PTE_S2_RDWR) == L_PTE_S2_RDONLY;
>>>>  }
>>>>  
>>>> +static inline bool kvm_s2pte_exec(pte_t *pte)
>>>> +{
>>>> +	return !(pte_val(*pte) & L_PTE_XN);
>>>> +}
>>>> +
>>>>  static inline void kvm_set_s2pmd_readonly(pmd_t *pmd)
>>>>  {
>>>>  	pmd_val(*pmd) = (pmd_val(*pmd) & ~L_PMD_S2_RDWR) | L_PMD_S2_RDONLY;
>>>> @@ -117,6 +122,11 @@ static inline bool kvm_s2pmd_readonly(pmd_t *pmd)
>>>>  	return (pmd_val(*pmd) & L_PMD_S2_RDWR) == L_PMD_S2_RDONLY;
>>>>  }
>>>>  
>>>> +static inline bool kvm_s2pmd_exec(pmd_t *pmd)
>>>> +{
>>>> +	return !(pmd_val(*pmd) & PMD_SECT_XN);
>>>> +}
>>>> +
>>>>  static inline bool kvm_page_empty(void *ptr)
>>>>  {
>>>>  	struct page *ptr_page = virt_to_page(ptr);
>>>> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
>>>> index 60c420a5ac0d..e7af74b8b51a 100644
>>>> --- a/arch/arm64/include/asm/kvm_mmu.h
>>>> +++ b/arch/arm64/include/asm/kvm_mmu.h
>>>> @@ -203,6 +203,11 @@ static inline bool kvm_s2pte_readonly(pte_t *pte)
>>>>  	return (pte_val(*pte) & PTE_S2_RDWR) == PTE_S2_RDONLY;
>>>>  }
>>>>  
>>>> +static inline bool kvm_s2pte_exec(pte_t *pte)
>>>> +{
>>>> +	return !(pte_val(*pte) & PTE_S2_XN);
>>>> +}
>>>> +
>>>>  static inline void kvm_set_s2pmd_readonly(pmd_t *pmd)
>>>>  {
>>>>  	kvm_set_s2pte_readonly((pte_t *)pmd);
>>>> @@ -213,6 +218,11 @@ static inline bool kvm_s2pmd_readonly(pmd_t *pmd)
>>>>  	return kvm_s2pte_readonly((pte_t *)pmd);
>>>>  }
>>>>  
>>>> +static inline bool kvm_s2pmd_exec(pmd_t *pmd)
>>>> +{
>>>> +	return !(pmd_val(*pmd) & PMD_S2_XN);
>>>> +}
>>>> +
>>>>  static inline bool kvm_page_empty(void *ptr)
>>>>  {
>>>>  	struct page *ptr_page = virt_to_page(ptr);
>>>> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
>>>> index 1911fadde88b..ccc6106764a6 100644
>>>> --- a/virt/kvm/arm/mmu.c
>>>> +++ b/virt/kvm/arm/mmu.c
>>>> @@ -926,6 +926,17 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache
>>>>  	return 0;
>>>>  }
>>>>  
>>>> +static pte_t *stage2_get_pte(struct kvm *kvm, phys_addr_t addr)
>>>> +{
>>>> +	pmd_t *pmdp;
>>>> +
>>>> +	pmdp = stage2_get_pmd(kvm, NULL, addr);
>>>> +	if (!pmdp || pmd_none(*pmdp))
>>>> +		return NULL;
>>>> +
>>>> +	return pte_offset_kernel(pmdp, addr);
>>>> +}
>>>> +
>>>
>>> nit, couldn't you change this to be
>>>
>>>     stage2_is_exec(struct kvm *kvm, phys_addr_t addr)
>>>
>>> Which, if the pmd is a section mapping just checks that, and if we find
>>> a pte, we check that, and then we can have a simpler one-line call and
>>> check from both the pte and pmd paths below?
>>
>> Yes, that's pretty neat. I've folded that in.
>>
>>>
>>>>  static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
>>>>  			  phys_addr_t addr, const pte_t *new_pte,
>>>>  			  unsigned long flags)
>>>> @@ -1407,6 +1418,13 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>>>  		if (exec_fault) {
>>>>  			new_pmd = kvm_s2pmd_mkexec(new_pmd);
>>>>  			coherent_icache_guest_page(vcpu, pfn, PMD_SIZE);
>>>> +		} else if (fault_status == FSC_PERM) {
>>>> +			/* Preserve execute if XN was already cleared */
>>>> +			pmd_t *old_pmdp = stage2_get_pmd(kvm, NULL, fault_ipa);
>>>> +
>>>> +			if (old_pmdp && pmd_present(*old_pmdp) &&
>>>> +			    kvm_s2pmd_exec(old_pmdp))
>>>> +				new_pmd = kvm_s2pmd_mkexec(new_pmd);
>>>
>>> Is the reverse case not also possible then?  That is, if we have an
>>> exec_fault, we could check if the entry is already writable and maintain
>>> the property as well.  Not sure how often that would get hit though, as
>>> a VM would only execute instructions on a page that has been written to,
>>> but is somehow read-only at stage2, meaning the host must have marked
>>> the page as read-only since content was written.  I think this could be
>>> a somewhat common pattern with something like KSM though?
>>
>> I think this is already the case, because we always build the PTE/PMD as
>> either ROXN or RWXN, and only later clear the XN bit (see the
>> unconditional call to gfn_to_pfn_prot which should tell us whether to
>> map the page as writable or not). Or am I missing your point entirely?
>>
> 
> I am worried about the flow where we map the page as RWXN, then execute
> some code on it, so it becomes RW, then make it read-only as ROXN.
> Shouldn't we preserve the exec state here?

I think that for this situation to happen (RW -> ROXN), we must
transition via an unmap. At that stage, there is nothing to preserve.

> I'm guessing that something like KSM will want to make pages read-only
> to support COW, but perhaps that's always done by copying the content to
> a new page and redirecting the old mapping to a new mapping (something
> that calls kvm_set_spte_hva()) and in that case we do probably really
> want the XN bit to be set again so that we can do the necessary
> maintenance.

Exactly. New mapping, fresh caches.

> However, we shouldn't try to optimize for something we don't know to be
> a problem, so as long as it's functionally correct, which I think it is,
> we should be fine.

Agreed.

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 02/10] arm64: KVM: Add invalidate_icache_range helper
  2017-10-09 15:20   ` Marc Zyngier
@ 2017-10-19 16:47     ` Will Deacon
  -1 siblings, 0 replies; 78+ messages in thread
From: Will Deacon @ 2017-10-19 16:47 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: Catalin Marinas, kvmarm, linux-arm-kernel, kvm

On Mon, Oct 09, 2017 at 04:20:24PM +0100, Marc Zyngier wrote:
> We currently tightly couple dcache clean with icache invalidation,
> but KVM could do without the initial flush to PoU, as we've
> already flushed things to PoC.
> 
> Let's introduce invalidate_icache_range which is limited to
> invalidating the icache from the linear mapping (and thus
> has none of the userspace fault handling complexity), and
> wire it in KVM instead of flush_icache_range.
> 
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  arch/arm64/include/asm/cacheflush.h |  8 ++++++++
>  arch/arm64/include/asm/kvm_mmu.h    |  4 ++--
>  arch/arm64/mm/cache.S               | 24 ++++++++++++++++++++++++
>  3 files changed, 34 insertions(+), 2 deletions(-)

[...]

> diff --git a/arch/arm64/mm/cache.S b/arch/arm64/mm/cache.S
> index 7f1dbe962cf5..0c330666a8c9 100644
> --- a/arch/arm64/mm/cache.S
> +++ b/arch/arm64/mm/cache.S
> @@ -80,6 +80,30 @@ USER(9f, ic	ivau, x4	)		// invalidate I line PoU
>  ENDPROC(flush_icache_range)
>  ENDPROC(__flush_cache_user_range)
>  
> +/*
> + *	invalidate_icache_range(start,end)
> + *
> + *	Ensure that the I cache is invalid within specified region. This
> + *	assumes that this is done on the linear mapping. Do not use it
> + *	on a userspace range, as this may fault horribly.
> + *
> + *	- start   - virtual start address of region
> + *	- end     - virtual end address of region
> + */
> +ENTRY(invalidate_icache_range)
> +	icache_line_size x2, x3
> +	sub	x3, x2, #1
> +	bic	x4, x0, x3
> +1:
> +	ic	ivau, x4			// invalidate I line PoU
> +	add	x4, x4, x2
> +	cmp	x4, x1
> +	b.lo	1b
> +	dsb	ish
> +	isb
> +	ret
> +ENDPROC(invalidate_icache_range)

Is there a good reason not to make this work for user addresses? If it's as
simple as adding a USER annotation and a fallback, then we should wrap that
in a macro and reuse it for __flush_cache_user_range.

Will

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 02/10] arm64: KVM: Add invalidate_icache_range helper
  2017-10-19 16:47     ` Will Deacon
@ 2017-10-20 13:41       ` Marc Zyngier
  -1 siblings, 0 replies; 78+ messages in thread
From: Marc Zyngier @ 2017-10-20 13:41 UTC (permalink / raw)
  To: Will Deacon
  Cc: Christoffer Dall, Catalin Marinas, linux-arm-kernel, kvm, kvmarm

On 19/10/17 17:47, Will Deacon wrote:
> On Mon, Oct 09, 2017 at 04:20:24PM +0100, Marc Zyngier wrote:
>> We currently tightly couple dcache clean with icache invalidation,
>> but KVM could do without the initial flush to PoU, as we've
>> already flushed things to PoC.
>>
>> Let's introduce invalidate_icache_range which is limited to
>> invalidating the icache from the linear mapping (and thus
>> has none of the userspace fault handling complexity), and
>> wire it in KVM instead of flush_icache_range.
>>
>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>> ---
>>  arch/arm64/include/asm/cacheflush.h |  8 ++++++++
>>  arch/arm64/include/asm/kvm_mmu.h    |  4 ++--
>>  arch/arm64/mm/cache.S               | 24 ++++++++++++++++++++++++
>>  3 files changed, 34 insertions(+), 2 deletions(-)
> 
> [...]
> 
>> diff --git a/arch/arm64/mm/cache.S b/arch/arm64/mm/cache.S
>> index 7f1dbe962cf5..0c330666a8c9 100644
>> --- a/arch/arm64/mm/cache.S
>> +++ b/arch/arm64/mm/cache.S
>> @@ -80,6 +80,30 @@ USER(9f, ic	ivau, x4	)		// invalidate I line PoU
>>  ENDPROC(flush_icache_range)
>>  ENDPROC(__flush_cache_user_range)
>>  
>> +/*
>> + *	invalidate_icache_range(start,end)
>> + *
>> + *	Ensure that the I cache is invalid within specified region. This
>> + *	assumes that this is done on the linear mapping. Do not use it
>> + *	on a userspace range, as this may fault horribly.
>> + *
>> + *	- start   - virtual start address of region
>> + *	- end     - virtual end address of region
>> + */
>> +ENTRY(invalidate_icache_range)
>> +	icache_line_size x2, x3
>> +	sub	x3, x2, #1
>> +	bic	x4, x0, x3
>> +1:
>> +	ic	ivau, x4			// invalidate I line PoU
>> +	add	x4, x4, x2
>> +	cmp	x4, x1
>> +	b.lo	1b
>> +	dsb	ish
>> +	isb
>> +	ret
>> +ENDPROC(invalidate_icache_range)
> 
> Is there a good reason not to make this work for user addresses? If it's as
> simple as adding a USER annotation and a fallback, then we should wrap that
> in a macro and reuse it for __flush_cache_user_range.

Fair enough. I've done that now (with an optional label that triggers
the generation of a USER() annotation).

I'll post the revised series shortly.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 78+ messages in thread

end of thread, other threads:[~2017-10-20 13:41 UTC | newest]

Thread overview: 78+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-10-09 15:20 [PATCH 00/10] arm/arm64: KVM: limit icache invalidation to prefetch aborts Marc Zyngier
2017-10-09 15:20 ` Marc Zyngier
2017-10-09 15:20 ` [PATCH 01/10] KVM: arm/arm64: Split dcache/icache flushing Marc Zyngier
2017-10-09 15:20   ` Marc Zyngier
2017-10-16 20:07   ` Christoffer Dall
2017-10-16 20:07     ` Christoffer Dall
2017-10-17  8:57     ` Marc Zyngier
2017-10-17  8:57       ` Marc Zyngier
2017-10-17 14:28       ` Christoffer Dall
2017-10-17 14:28         ` Christoffer Dall
2017-10-17 14:41         ` Marc Zyngier
2017-10-17 14:41           ` Marc Zyngier
2017-10-16 21:35   ` Roy Franz (Cavium)
2017-10-16 21:35     ` Roy Franz (Cavium)
2017-10-17  6:44     ` Christoffer Dall
2017-10-17  6:44       ` Christoffer Dall
2017-10-09 15:20 ` [PATCH 02/10] arm64: KVM: Add invalidate_icache_range helper Marc Zyngier
2017-10-09 15:20   ` Marc Zyngier
2017-10-16 20:08   ` Christoffer Dall
2017-10-16 20:08     ` Christoffer Dall
2017-10-19 16:47   ` Will Deacon
2017-10-19 16:47     ` Will Deacon
2017-10-20 13:41     ` Marc Zyngier
2017-10-20 13:41       ` Marc Zyngier
2017-10-09 15:20 ` [PATCH 03/10] arm: KVM: Add optimized PIPT icache flushing Marc Zyngier
2017-10-09 15:20   ` Marc Zyngier
2017-10-16 20:07   ` Christoffer Dall
2017-10-16 20:07     ` Christoffer Dall
2017-10-17  9:26     ` Marc Zyngier
2017-10-17  9:26       ` Marc Zyngier
2017-10-17 14:34       ` Christoffer Dall
2017-10-17 14:34         ` Christoffer Dall
2017-10-09 15:20 ` [PATCH 04/10] arm64: KVM: PTE/PMD S2 XN bit definition Marc Zyngier
2017-10-09 15:20   ` Marc Zyngier
2017-10-16 20:07   ` Christoffer Dall
2017-10-16 20:07     ` Christoffer Dall
2017-10-09 15:20 ` [PATCH 05/10] KVM: arm/arm64: Limit icache invalidation to prefetch aborts Marc Zyngier
2017-10-09 15:20   ` Marc Zyngier
2017-10-16 20:08   ` Christoffer Dall
2017-10-16 20:08     ` Christoffer Dall
2017-10-09 15:20 ` [PATCH 06/10] KVM: arm/arm64: Only clean the dcache on translation fault Marc Zyngier
2017-10-09 15:20   ` Marc Zyngier
2017-10-16 20:08   ` Christoffer Dall
2017-10-16 20:08     ` Christoffer Dall
2017-10-17  9:34     ` Marc Zyngier
2017-10-17  9:34       ` Marc Zyngier
2017-10-17 14:36       ` Christoffer Dall
2017-10-17 14:36         ` Christoffer Dall
2017-10-17 14:52         ` Marc Zyngier
2017-10-17 14:52           ` Marc Zyngier
2017-10-09 15:20 ` [PATCH 07/10] KVM: arm/arm64: Preserve Exec permission across R/W permission faults Marc Zyngier
2017-10-09 15:20   ` Marc Zyngier
2017-10-16 20:08   ` Christoffer Dall
2017-10-16 20:08     ` Christoffer Dall
2017-10-17 11:22     ` Marc Zyngier
2017-10-17 11:22       ` Marc Zyngier
2017-10-17 14:46       ` Christoffer Dall
2017-10-17 14:46         ` Christoffer Dall
2017-10-17 15:04         ` Marc Zyngier
2017-10-17 15:04           ` Marc Zyngier
2017-10-09 15:20 ` [PATCH 08/10] KVM: arm/arm64: Drop vcpu parameter from coherent_{d,i}cache_guest_page Marc Zyngier
2017-10-09 15:20   ` [PATCH 08/10] KVM: arm/arm64: Drop vcpu parameter from coherent_{d, i}cache_guest_page Marc Zyngier
2017-10-16 20:08   ` [PATCH 08/10] KVM: arm/arm64: Drop vcpu parameter from coherent_{d,i}cache_guest_page Christoffer Dall
2017-10-16 20:08     ` Christoffer Dall
2017-10-09 15:20 ` [PATCH 09/10] KVM: arm/arm64: Detangle kvm_mmu.h from kvm_hyp.h Marc Zyngier
2017-10-09 15:20   ` Marc Zyngier
2017-10-16 20:08   ` Christoffer Dall
2017-10-16 20:08     ` Christoffer Dall
2017-10-09 15:20 ` [PATCH 10/10] arm: KVM: Use common implementation for all flushes to PoC Marc Zyngier
2017-10-09 15:20   ` Marc Zyngier
2017-10-16 20:06   ` Christoffer Dall
2017-10-16 20:06     ` Christoffer Dall
2017-10-17 12:40     ` Marc Zyngier
2017-10-17 12:40       ` Marc Zyngier
2017-10-17 14:48       ` Christoffer Dall
2017-10-17 14:48         ` Christoffer Dall
2017-10-16 20:59 ` [PATCH 00/10] arm/arm64: KVM: limit icache invalidation to prefetch aborts Christoffer Dall
2017-10-16 20:59   ` Christoffer Dall
