All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 0/9] arm/arm64: KVM: limit icache invalidation to prefetch aborts
@ 2017-10-23 16:11 ` Marc Zyngier
  0 siblings, 0 replies; 52+ messages in thread
From: Marc Zyngier @ 2017-10-23 16:11 UTC (permalink / raw)
  To: Christoffer Dall, Catalin Marinas, Will Deacon
  Cc: linux-arm-kernel, kvm, kvmarm

[now with the patches correctly following...]

It was recently reported that on a VM restore, we seem to spend a
disproportionate amount of time invalidation the icache. This is
partially due to some HW behaviour, but also because we're being a bit
dumb and are invalidating the icache for every page we map at S2, even
if that on a data access.

The slightly better way of doing this is to mark the pages XN at S2,
and wait for the the guest to execute something in that page, at which
point we perform the invalidation. As it is likely that there is a lot
less instruction than data, we win (or so we hope).

We also take this opportunity to drop the extra dcache clean to the
PoU which is pretty useless, as we already clean all the way to the
PoC...

Running a bare metal test that touches 1GB of memory (using a 4kB
stride) leads to the following results on Seattle:

4.13:
do_fault_read.bin:       0.565885992 seconds time elapsed
do_fault_write.bin:       0.738296337 seconds time elapsed
do_fault_read_write.bin:       1.241812231 seconds time elapsed

4.14-rc3+patches:
do_fault_read.bin:       0.244961803 seconds time elapsed
do_fault_write.bin:       0.422740092 seconds time elapsed
do_fault_read_write.bin:       0.643402470 seconds time elapsed

We're almost halving the time of something that more or less looks
like a restore operation. Some larger systems will show much bigger
benefits as they become less impacted by the icache invalidation
(which is broadcast in the inner shareable domain). I've tried to
measure the impact on a VM boot in order to assess the impact of
taking an extra permission fault, but found that any difference was
simply noise.                                                                   

I've also given it a test run on both Cubietruck and Jetson-TK1.

Tests are archived here:
https://git.kernel.org/pub/scm/linux/kernel/git/maz/kvm-ws-tests.git/

I'd value some additional test results on HW I don't have access to.

* From v2:
  - Brought back the "detangling" patch that allows 32bit ARM to still
    compile...
  - Let arm64 icache invalidation deal with userspace addresses

* From v1:
  - Some function renaming (coherent->clean/invalidate)
  - Made the arm64 icache invalidation a macro that's now used in
    two places
  - Fixed BTB flushing on 32bit
  - Added stage2_is_exec as a predicate for XN being absent from the
    entry
  - Dropped patch #10 which was both useless and broken, and patch #9
    that thus became useless
  - Tried to measure the impact on kernel boot time and failed to see
    any difference

Marc Zyngier (9):
  KVM: arm/arm64: Detangle kvm_mmu.h from kvm_hyp.h
  KVM: arm/arm64: Split dcache/icache flushing
  arm64: KVM: Add invalidate_icache_range helper
  arm: KVM: Add optimized PIPT icache flushing
  arm64: KVM: PTE/PMD S2 XN bit definition
  KVM: arm/arm64: Limit icache invalidation to prefetch aborts
  KVM: arm/arm64: Only clean the dcache on translation fault
  KVM: arm/arm64: Preserve Exec permission across R/W permission faults
  KVM: arm/arm64: Drop vcpu parameter from guest cache maintenance
    operartions

 arch/arm/include/asm/kvm_hyp.h         |  3 +-
 arch/arm/include/asm/kvm_mmu.h         | 99 ++++++++++++++++++++++++++++------
 arch/arm/include/asm/pgtable.h         |  4 +-
 arch/arm/kvm/hyp/switch.c              |  1 +
 arch/arm/kvm/hyp/tlb.c                 |  1 +
 arch/arm64/include/asm/assembler.h     | 21 ++++++++
 arch/arm64/include/asm/cacheflush.h    |  7 +++
 arch/arm64/include/asm/kvm_hyp.h       |  1 -
 arch/arm64/include/asm/kvm_mmu.h       | 36 +++++++++++--
 arch/arm64/include/asm/pgtable-hwdef.h |  2 +
 arch/arm64/include/asm/pgtable-prot.h  |  4 +-
 arch/arm64/kvm/hyp/debug-sr.c          |  1 +
 arch/arm64/kvm/hyp/switch.c            |  1 +
 arch/arm64/kvm/hyp/tlb.c               |  1 +
 arch/arm64/mm/cache.S                  | 32 +++++++----
 virt/kvm/arm/hyp/vgic-v2-sr.c          |  1 +
 virt/kvm/arm/mmu.c                     | 64 +++++++++++++++++++---
 17 files changed, 236 insertions(+), 43 deletions(-)

-- 
2.11.0

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v3 0/9] arm/arm64: KVM: limit icache invalidation to prefetch aborts
@ 2017-10-23 16:11 ` Marc Zyngier
  0 siblings, 0 replies; 52+ messages in thread
From: Marc Zyngier @ 2017-10-23 16:11 UTC (permalink / raw)
  To: linux-arm-kernel

[now with the patches correctly following...]

It was recently reported that on a VM restore, we seem to spend a
disproportionate amount of time invalidation the icache. This is
partially due to some HW behaviour, but also because we're being a bit
dumb and are invalidating the icache for every page we map at S2, even
if that on a data access.

The slightly better way of doing this is to mark the pages XN at S2,
and wait for the the guest to execute something in that page, at which
point we perform the invalidation. As it is likely that there is a lot
less instruction than data, we win (or so we hope).

We also take this opportunity to drop the extra dcache clean to the
PoU which is pretty useless, as we already clean all the way to the
PoC...

Running a bare metal test that touches 1GB of memory (using a 4kB
stride) leads to the following results on Seattle:

4.13:
do_fault_read.bin:       0.565885992 seconds time elapsed
do_fault_write.bin:       0.738296337 seconds time elapsed
do_fault_read_write.bin:       1.241812231 seconds time elapsed

4.14-rc3+patches:
do_fault_read.bin:       0.244961803 seconds time elapsed
do_fault_write.bin:       0.422740092 seconds time elapsed
do_fault_read_write.bin:       0.643402470 seconds time elapsed

We're almost halving the time of something that more or less looks
like a restore operation. Some larger systems will show much bigger
benefits as they become less impacted by the icache invalidation
(which is broadcast in the inner shareable domain). I've tried to
measure the impact on a VM boot in order to assess the impact of
taking an extra permission fault, but found that any difference was
simply noise.                                                                   

I've also given it a test run on both Cubietruck and Jetson-TK1.

Tests are archived here:
https://git.kernel.org/pub/scm/linux/kernel/git/maz/kvm-ws-tests.git/

I'd value some additional test results on HW I don't have access to.

* From v2:
  - Brought back the "detangling" patch that allows 32bit ARM to still
    compile...
  - Let arm64 icache invalidation deal with userspace addresses

* From v1:
  - Some function renaming (coherent->clean/invalidate)
  - Made the arm64 icache invalidation a macro that's now used in
    two places
  - Fixed BTB flushing on 32bit
  - Added stage2_is_exec as a predicate for XN being absent from the
    entry
  - Dropped patch #10 which was both useless and broken, and patch #9
    that thus became useless
  - Tried to measure the impact on kernel boot time and failed to see
    any difference

Marc Zyngier (9):
  KVM: arm/arm64: Detangle kvm_mmu.h from kvm_hyp.h
  KVM: arm/arm64: Split dcache/icache flushing
  arm64: KVM: Add invalidate_icache_range helper
  arm: KVM: Add optimized PIPT icache flushing
  arm64: KVM: PTE/PMD S2 XN bit definition
  KVM: arm/arm64: Limit icache invalidation to prefetch aborts
  KVM: arm/arm64: Only clean the dcache on translation fault
  KVM: arm/arm64: Preserve Exec permission across R/W permission faults
  KVM: arm/arm64: Drop vcpu parameter from guest cache maintenance
    operartions

 arch/arm/include/asm/kvm_hyp.h         |  3 +-
 arch/arm/include/asm/kvm_mmu.h         | 99 ++++++++++++++++++++++++++++------
 arch/arm/include/asm/pgtable.h         |  4 +-
 arch/arm/kvm/hyp/switch.c              |  1 +
 arch/arm/kvm/hyp/tlb.c                 |  1 +
 arch/arm64/include/asm/assembler.h     | 21 ++++++++
 arch/arm64/include/asm/cacheflush.h    |  7 +++
 arch/arm64/include/asm/kvm_hyp.h       |  1 -
 arch/arm64/include/asm/kvm_mmu.h       | 36 +++++++++++--
 arch/arm64/include/asm/pgtable-hwdef.h |  2 +
 arch/arm64/include/asm/pgtable-prot.h  |  4 +-
 arch/arm64/kvm/hyp/debug-sr.c          |  1 +
 arch/arm64/kvm/hyp/switch.c            |  1 +
 arch/arm64/kvm/hyp/tlb.c               |  1 +
 arch/arm64/mm/cache.S                  | 32 +++++++----
 virt/kvm/arm/hyp/vgic-v2-sr.c          |  1 +
 virt/kvm/arm/mmu.c                     | 64 +++++++++++++++++++---
 17 files changed, 236 insertions(+), 43 deletions(-)

-- 
2.11.0

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v3 1/9] KVM: arm/arm64: Detangle kvm_mmu.h from kvm_hyp.h
  2017-10-23 16:11 ` Marc Zyngier
@ 2017-10-23 16:11   ` Marc Zyngier
  -1 siblings, 0 replies; 52+ messages in thread
From: Marc Zyngier @ 2017-10-23 16:11 UTC (permalink / raw)
  To: Christoffer Dall, Catalin Marinas, Will Deacon
  Cc: kvm, linux-arm-kernel, kvmarm

kvm_hyp.h has an odd dependency on kvm_mmu.h, which makes the
opposite inclusion impossible. Let's start with breaking that
useless dependency.

Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm/include/asm/kvm_hyp.h   | 1 -
 arch/arm/kvm/hyp/switch.c        | 1 +
 arch/arm/kvm/hyp/tlb.c           | 1 +
 arch/arm64/include/asm/kvm_hyp.h | 1 -
 arch/arm64/kvm/hyp/debug-sr.c    | 1 +
 arch/arm64/kvm/hyp/switch.c      | 1 +
 arch/arm64/kvm/hyp/tlb.c         | 1 +
 virt/kvm/arm/hyp/vgic-v2-sr.c    | 1 +
 8 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/arm/include/asm/kvm_hyp.h b/arch/arm/include/asm/kvm_hyp.h
index 14b5903f0224..04e5077cbcb6 100644
--- a/arch/arm/include/asm/kvm_hyp.h
+++ b/arch/arm/include/asm/kvm_hyp.h
@@ -21,7 +21,6 @@
 #include <linux/compiler.h>
 #include <linux/kvm_host.h>
 #include <asm/cp15.h>
-#include <asm/kvm_mmu.h>
 #include <asm/vfp.h>
 
 #define __hyp_text __section(.hyp.text) notrace
diff --git a/arch/arm/kvm/hyp/switch.c b/arch/arm/kvm/hyp/switch.c
index ebd2dd46adf7..67e0a689c4b5 100644
--- a/arch/arm/kvm/hyp/switch.c
+++ b/arch/arm/kvm/hyp/switch.c
@@ -18,6 +18,7 @@
 
 #include <asm/kvm_asm.h>
 #include <asm/kvm_hyp.h>
+#include <asm/kvm_mmu.h>
 
 __asm__(".arch_extension     virt");
 
diff --git a/arch/arm/kvm/hyp/tlb.c b/arch/arm/kvm/hyp/tlb.c
index 6d810af2d9fd..c0edd450e104 100644
--- a/arch/arm/kvm/hyp/tlb.c
+++ b/arch/arm/kvm/hyp/tlb.c
@@ -19,6 +19,7 @@
  */
 
 #include <asm/kvm_hyp.h>
+#include <asm/kvm_mmu.h>
 
 /**
  * Flush per-VMID TLBs
diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
index 4572a9b560fa..afbfbe0c12c5 100644
--- a/arch/arm64/include/asm/kvm_hyp.h
+++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -20,7 +20,6 @@
 
 #include <linux/compiler.h>
 #include <linux/kvm_host.h>
-#include <asm/kvm_mmu.h>
 #include <asm/sysreg.h>
 
 #define __hyp_text __section(.hyp.text) notrace
diff --git a/arch/arm64/kvm/hyp/debug-sr.c b/arch/arm64/kvm/hyp/debug-sr.c
index f5154ed3da6c..d3a13d57f2c5 100644
--- a/arch/arm64/kvm/hyp/debug-sr.c
+++ b/arch/arm64/kvm/hyp/debug-sr.c
@@ -21,6 +21,7 @@
 #include <asm/debug-monitors.h>
 #include <asm/kvm_asm.h>
 #include <asm/kvm_hyp.h>
+#include <asm/kvm_mmu.h>
 
 #define read_debug(r,n)		read_sysreg(r##n##_el1)
 #define write_debug(v,r,n)	write_sysreg(v, r##n##_el1)
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 945e79c641c4..c52f7094122f 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -21,6 +21,7 @@
 #include <asm/kvm_asm.h>
 #include <asm/kvm_emulate.h>
 #include <asm/kvm_hyp.h>
+#include <asm/kvm_mmu.h>
 #include <asm/fpsimd.h>
 
 static bool __hyp_text __fpsimd_enabled_nvhe(void)
diff --git a/arch/arm64/kvm/hyp/tlb.c b/arch/arm64/kvm/hyp/tlb.c
index 73464a96c365..131c7772703c 100644
--- a/arch/arm64/kvm/hyp/tlb.c
+++ b/arch/arm64/kvm/hyp/tlb.c
@@ -16,6 +16,7 @@
  */
 
 #include <asm/kvm_hyp.h>
+#include <asm/kvm_mmu.h>
 #include <asm/tlbflush.h>
 
 static void __hyp_text __tlb_switch_to_guest_vhe(struct kvm *kvm)
diff --git a/virt/kvm/arm/hyp/vgic-v2-sr.c b/virt/kvm/arm/hyp/vgic-v2-sr.c
index a3f18d362366..77ccd8e2090b 100644
--- a/virt/kvm/arm/hyp/vgic-v2-sr.c
+++ b/virt/kvm/arm/hyp/vgic-v2-sr.c
@@ -21,6 +21,7 @@
 
 #include <asm/kvm_emulate.h>
 #include <asm/kvm_hyp.h>
+#include <asm/kvm_mmu.h>
 
 static void __hyp_text save_elrsr(struct kvm_vcpu *vcpu, void __iomem *base)
 {
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v3 1/9] KVM: arm/arm64: Detangle kvm_mmu.h from kvm_hyp.h
@ 2017-10-23 16:11   ` Marc Zyngier
  0 siblings, 0 replies; 52+ messages in thread
From: Marc Zyngier @ 2017-10-23 16:11 UTC (permalink / raw)
  To: linux-arm-kernel

kvm_hyp.h has an odd dependency on kvm_mmu.h, which makes the
opposite inclusion impossible. Let's start with breaking that
useless dependency.

Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm/include/asm/kvm_hyp.h   | 1 -
 arch/arm/kvm/hyp/switch.c        | 1 +
 arch/arm/kvm/hyp/tlb.c           | 1 +
 arch/arm64/include/asm/kvm_hyp.h | 1 -
 arch/arm64/kvm/hyp/debug-sr.c    | 1 +
 arch/arm64/kvm/hyp/switch.c      | 1 +
 arch/arm64/kvm/hyp/tlb.c         | 1 +
 virt/kvm/arm/hyp/vgic-v2-sr.c    | 1 +
 8 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/arm/include/asm/kvm_hyp.h b/arch/arm/include/asm/kvm_hyp.h
index 14b5903f0224..04e5077cbcb6 100644
--- a/arch/arm/include/asm/kvm_hyp.h
+++ b/arch/arm/include/asm/kvm_hyp.h
@@ -21,7 +21,6 @@
 #include <linux/compiler.h>
 #include <linux/kvm_host.h>
 #include <asm/cp15.h>
-#include <asm/kvm_mmu.h>
 #include <asm/vfp.h>
 
 #define __hyp_text __section(.hyp.text) notrace
diff --git a/arch/arm/kvm/hyp/switch.c b/arch/arm/kvm/hyp/switch.c
index ebd2dd46adf7..67e0a689c4b5 100644
--- a/arch/arm/kvm/hyp/switch.c
+++ b/arch/arm/kvm/hyp/switch.c
@@ -18,6 +18,7 @@
 
 #include <asm/kvm_asm.h>
 #include <asm/kvm_hyp.h>
+#include <asm/kvm_mmu.h>
 
 __asm__(".arch_extension     virt");
 
diff --git a/arch/arm/kvm/hyp/tlb.c b/arch/arm/kvm/hyp/tlb.c
index 6d810af2d9fd..c0edd450e104 100644
--- a/arch/arm/kvm/hyp/tlb.c
+++ b/arch/arm/kvm/hyp/tlb.c
@@ -19,6 +19,7 @@
  */
 
 #include <asm/kvm_hyp.h>
+#include <asm/kvm_mmu.h>
 
 /**
  * Flush per-VMID TLBs
diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
index 4572a9b560fa..afbfbe0c12c5 100644
--- a/arch/arm64/include/asm/kvm_hyp.h
+++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -20,7 +20,6 @@
 
 #include <linux/compiler.h>
 #include <linux/kvm_host.h>
-#include <asm/kvm_mmu.h>
 #include <asm/sysreg.h>
 
 #define __hyp_text __section(.hyp.text) notrace
diff --git a/arch/arm64/kvm/hyp/debug-sr.c b/arch/arm64/kvm/hyp/debug-sr.c
index f5154ed3da6c..d3a13d57f2c5 100644
--- a/arch/arm64/kvm/hyp/debug-sr.c
+++ b/arch/arm64/kvm/hyp/debug-sr.c
@@ -21,6 +21,7 @@
 #include <asm/debug-monitors.h>
 #include <asm/kvm_asm.h>
 #include <asm/kvm_hyp.h>
+#include <asm/kvm_mmu.h>
 
 #define read_debug(r,n)		read_sysreg(r##n##_el1)
 #define write_debug(v,r,n)	write_sysreg(v, r##n##_el1)
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 945e79c641c4..c52f7094122f 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -21,6 +21,7 @@
 #include <asm/kvm_asm.h>
 #include <asm/kvm_emulate.h>
 #include <asm/kvm_hyp.h>
+#include <asm/kvm_mmu.h>
 #include <asm/fpsimd.h>
 
 static bool __hyp_text __fpsimd_enabled_nvhe(void)
diff --git a/arch/arm64/kvm/hyp/tlb.c b/arch/arm64/kvm/hyp/tlb.c
index 73464a96c365..131c7772703c 100644
--- a/arch/arm64/kvm/hyp/tlb.c
+++ b/arch/arm64/kvm/hyp/tlb.c
@@ -16,6 +16,7 @@
  */
 
 #include <asm/kvm_hyp.h>
+#include <asm/kvm_mmu.h>
 #include <asm/tlbflush.h>
 
 static void __hyp_text __tlb_switch_to_guest_vhe(struct kvm *kvm)
diff --git a/virt/kvm/arm/hyp/vgic-v2-sr.c b/virt/kvm/arm/hyp/vgic-v2-sr.c
index a3f18d362366..77ccd8e2090b 100644
--- a/virt/kvm/arm/hyp/vgic-v2-sr.c
+++ b/virt/kvm/arm/hyp/vgic-v2-sr.c
@@ -21,6 +21,7 @@
 
 #include <asm/kvm_emulate.h>
 #include <asm/kvm_hyp.h>
+#include <asm/kvm_mmu.h>
 
 static void __hyp_text save_elrsr(struct kvm_vcpu *vcpu, void __iomem *base)
 {
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v3 2/9] KVM: arm/arm64: Split dcache/icache flushing
  2017-10-23 16:11 ` Marc Zyngier
@ 2017-10-23 16:11   ` Marc Zyngier
  -1 siblings, 0 replies; 52+ messages in thread
From: Marc Zyngier @ 2017-10-23 16:11 UTC (permalink / raw)
  To: Christoffer Dall, Catalin Marinas, Will Deacon
  Cc: Christoffer Dall, kvm, linux-arm-kernel, kvmarm

As we're about to introduce opportunistic invalidation of the icache,
let's split dcache and icache flushing.

Acked-by: Christoffer Dall <cdall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm/include/asm/kvm_mmu.h   | 60 ++++++++++++++++++++++++++++------------
 arch/arm64/include/asm/kvm_mmu.h | 13 +++++++--
 virt/kvm/arm/mmu.c               | 20 ++++++++++----
 3 files changed, 67 insertions(+), 26 deletions(-)

diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index fa6f2174276b..9fa4b2520974 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -126,21 +126,12 @@ static inline bool vcpu_has_cache_enabled(struct kvm_vcpu *vcpu)
 	return (vcpu_cp15(vcpu, c1_SCTLR) & 0b101) == 0b101;
 }
 
-static inline void __coherent_cache_guest_page(struct kvm_vcpu *vcpu,
-					       kvm_pfn_t pfn,
-					       unsigned long size)
+static inline void __clean_dcache_guest_page(struct kvm_vcpu *vcpu,
+					     kvm_pfn_t pfn,
+					     unsigned long size)
 {
 	/*
-	 * If we are going to insert an instruction page and the icache is
-	 * either VIPT or PIPT, there is a potential problem where the host
-	 * (or another VM) may have used the same page as this guest, and we
-	 * read incorrect data from the icache.  If we're using a PIPT cache,
-	 * we can invalidate just that page, but if we are using a VIPT cache
-	 * we need to invalidate the entire icache - damn shame - as written
-	 * in the ARM ARM (DDI 0406C.b - Page B3-1393).
-	 *
-	 * VIVT caches are tagged using both the ASID and the VMID and doesn't
-	 * need any kind of flushing (DDI 0406C.b - Page B3-1392).
+	 * Clean the dcache to the Point of Coherency.
 	 *
 	 * We need to do this through a kernel mapping (using the
 	 * user-space mapping has proved to be the wrong
@@ -155,19 +146,52 @@ static inline void __coherent_cache_guest_page(struct kvm_vcpu *vcpu,
 
 		kvm_flush_dcache_to_poc(va, PAGE_SIZE);
 
-		if (icache_is_pipt())
-			__cpuc_coherent_user_range((unsigned long)va,
-						   (unsigned long)va + PAGE_SIZE);
-
 		size -= PAGE_SIZE;
 		pfn++;
 
 		kunmap_atomic(va);
 	}
+}
 
-	if (!icache_is_pipt() && !icache_is_vivt_asid_tagged()) {
+static inline void __invalidate_icache_guest_page(struct kvm_vcpu *vcpu,
+						  kvm_pfn_t pfn,
+						  unsigned long size)
+{
+	/*
+	 * If we are going to insert an instruction page and the icache is
+	 * either VIPT or PIPT, there is a potential problem where the host
+	 * (or another VM) may have used the same page as this guest, and we
+	 * read incorrect data from the icache.  If we're using a PIPT cache,
+	 * we can invalidate just that page, but if we are using a VIPT cache
+	 * we need to invalidate the entire icache - damn shame - as written
+	 * in the ARM ARM (DDI 0406C.b - Page B3-1393).
+	 *
+	 * VIVT caches are tagged using both the ASID and the VMID and doesn't
+	 * need any kind of flushing (DDI 0406C.b - Page B3-1392).
+	 */
+
+	VM_BUG_ON(size & ~PAGE_MASK);
+
+	if (icache_is_vivt_asid_tagged())
+		return;
+
+	if (!icache_is_pipt()) {
 		/* any kind of VIPT cache */
 		__flush_icache_all();
+		return;
+	}
+
+	/* PIPT cache. As for the d-side, use a temporary kernel mapping. */
+	while (size) {
+		void *va = kmap_atomic_pfn(pfn);
+
+		__cpuc_coherent_user_range((unsigned long)va,
+					   (unsigned long)va + PAGE_SIZE);
+
+		size -= PAGE_SIZE;
+		pfn++;
+
+		kunmap_atomic(va);
 	}
 }
 
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 672c8684d5c2..8034b96fb3a4 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -230,19 +230,26 @@ static inline bool vcpu_has_cache_enabled(struct kvm_vcpu *vcpu)
 	return (vcpu_sys_reg(vcpu, SCTLR_EL1) & 0b101) == 0b101;
 }
 
-static inline void __coherent_cache_guest_page(struct kvm_vcpu *vcpu,
-					       kvm_pfn_t pfn,
-					       unsigned long size)
+static inline void __clean_dcache_guest_page(struct kvm_vcpu *vcpu,
+					     kvm_pfn_t pfn,
+					     unsigned long size)
 {
 	void *va = page_address(pfn_to_page(pfn));
 
 	kvm_flush_dcache_to_poc(va, size);
+}
 
+static inline void __invalidate_icache_guest_page(struct kvm_vcpu *vcpu,
+						  kvm_pfn_t pfn,
+						  unsigned long size)
+{
 	if (icache_is_aliasing()) {
 		/* any kind of VIPT cache */
 		__flush_icache_all();
 	} else if (is_kernel_in_hyp_mode() || !icache_is_vpipt()) {
 		/* PIPT or VPIPT at EL2 (see comment in __kvm_tlb_flush_vmid_ipa) */
+		void *va = page_address(pfn_to_page(pfn));
+
 		flush_icache_range((unsigned long)va,
 				   (unsigned long)va + size);
 	}
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index b36945d49986..2174244f6317 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -1257,10 +1257,16 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
 	kvm_mmu_write_protect_pt_masked(kvm, slot, gfn_offset, mask);
 }
 
-static void coherent_cache_guest_page(struct kvm_vcpu *vcpu, kvm_pfn_t pfn,
-				      unsigned long size)
+static void clean_dcache_guest_page(struct kvm_vcpu *vcpu, kvm_pfn_t pfn,
+				    unsigned long size)
 {
-	__coherent_cache_guest_page(vcpu, pfn, size);
+	__clean_dcache_guest_page(vcpu, pfn, size);
+}
+
+static void invalidate_icache_guest_page(struct kvm_vcpu *vcpu, kvm_pfn_t pfn,
+					 unsigned long size)
+{
+	__invalidate_icache_guest_page(vcpu, pfn, size);
 }
 
 static void kvm_send_hwpoison_signal(unsigned long address,
@@ -1391,7 +1397,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 			new_pmd = kvm_s2pmd_mkwrite(new_pmd);
 			kvm_set_pfn_dirty(pfn);
 		}
-		coherent_cache_guest_page(vcpu, pfn, PMD_SIZE);
+		clean_dcache_guest_page(vcpu, pfn, PMD_SIZE);
+		invalidate_icache_guest_page(vcpu, pfn, PMD_SIZE);
+
 		ret = stage2_set_pmd_huge(kvm, memcache, fault_ipa, &new_pmd);
 	} else {
 		pte_t new_pte = pfn_pte(pfn, mem_type);
@@ -1401,7 +1409,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 			kvm_set_pfn_dirty(pfn);
 			mark_page_dirty(kvm, gfn);
 		}
-		coherent_cache_guest_page(vcpu, pfn, PAGE_SIZE);
+		clean_dcache_guest_page(vcpu, pfn, PAGE_SIZE);
+		invalidate_icache_guest_page(vcpu, pfn, PAGE_SIZE);
+
 		ret = stage2_set_pte(kvm, memcache, fault_ipa, &new_pte, flags);
 	}
 
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v3 2/9] KVM: arm/arm64: Split dcache/icache flushing
@ 2017-10-23 16:11   ` Marc Zyngier
  0 siblings, 0 replies; 52+ messages in thread
From: Marc Zyngier @ 2017-10-23 16:11 UTC (permalink / raw)
  To: linux-arm-kernel

As we're about to introduce opportunistic invalidation of the icache,
let's split dcache and icache flushing.

Acked-by: Christoffer Dall <cdall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm/include/asm/kvm_mmu.h   | 60 ++++++++++++++++++++++++++++------------
 arch/arm64/include/asm/kvm_mmu.h | 13 +++++++--
 virt/kvm/arm/mmu.c               | 20 ++++++++++----
 3 files changed, 67 insertions(+), 26 deletions(-)

diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index fa6f2174276b..9fa4b2520974 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -126,21 +126,12 @@ static inline bool vcpu_has_cache_enabled(struct kvm_vcpu *vcpu)
 	return (vcpu_cp15(vcpu, c1_SCTLR) & 0b101) == 0b101;
 }
 
-static inline void __coherent_cache_guest_page(struct kvm_vcpu *vcpu,
-					       kvm_pfn_t pfn,
-					       unsigned long size)
+static inline void __clean_dcache_guest_page(struct kvm_vcpu *vcpu,
+					     kvm_pfn_t pfn,
+					     unsigned long size)
 {
 	/*
-	 * If we are going to insert an instruction page and the icache is
-	 * either VIPT or PIPT, there is a potential problem where the host
-	 * (or another VM) may have used the same page as this guest, and we
-	 * read incorrect data from the icache.  If we're using a PIPT cache,
-	 * we can invalidate just that page, but if we are using a VIPT cache
-	 * we need to invalidate the entire icache - damn shame - as written
-	 * in the ARM ARM (DDI 0406C.b - Page B3-1393).
-	 *
-	 * VIVT caches are tagged using both the ASID and the VMID and doesn't
-	 * need any kind of flushing (DDI 0406C.b - Page B3-1392).
+	 * Clean the dcache to the Point of Coherency.
 	 *
 	 * We need to do this through a kernel mapping (using the
 	 * user-space mapping has proved to be the wrong
@@ -155,19 +146,52 @@ static inline void __coherent_cache_guest_page(struct kvm_vcpu *vcpu,
 
 		kvm_flush_dcache_to_poc(va, PAGE_SIZE);
 
-		if (icache_is_pipt())
-			__cpuc_coherent_user_range((unsigned long)va,
-						   (unsigned long)va + PAGE_SIZE);
-
 		size -= PAGE_SIZE;
 		pfn++;
 
 		kunmap_atomic(va);
 	}
+}
 
-	if (!icache_is_pipt() && !icache_is_vivt_asid_tagged()) {
+static inline void __invalidate_icache_guest_page(struct kvm_vcpu *vcpu,
+						  kvm_pfn_t pfn,
+						  unsigned long size)
+{
+	/*
+	 * If we are going to insert an instruction page and the icache is
+	 * either VIPT or PIPT, there is a potential problem where the host
+	 * (or another VM) may have used the same page as this guest, and we
+	 * read incorrect data from the icache.  If we're using a PIPT cache,
+	 * we can invalidate just that page, but if we are using a VIPT cache
+	 * we need to invalidate the entire icache - damn shame - as written
+	 * in the ARM ARM (DDI 0406C.b - Page B3-1393).
+	 *
+	 * VIVT caches are tagged using both the ASID and the VMID and doesn't
+	 * need any kind of flushing (DDI 0406C.b - Page B3-1392).
+	 */
+
+	VM_BUG_ON(size & ~PAGE_MASK);
+
+	if (icache_is_vivt_asid_tagged())
+		return;
+
+	if (!icache_is_pipt()) {
 		/* any kind of VIPT cache */
 		__flush_icache_all();
+		return;
+	}
+
+	/* PIPT cache. As for the d-side, use a temporary kernel mapping. */
+	while (size) {
+		void *va = kmap_atomic_pfn(pfn);
+
+		__cpuc_coherent_user_range((unsigned long)va,
+					   (unsigned long)va + PAGE_SIZE);
+
+		size -= PAGE_SIZE;
+		pfn++;
+
+		kunmap_atomic(va);
 	}
 }
 
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 672c8684d5c2..8034b96fb3a4 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -230,19 +230,26 @@ static inline bool vcpu_has_cache_enabled(struct kvm_vcpu *vcpu)
 	return (vcpu_sys_reg(vcpu, SCTLR_EL1) & 0b101) == 0b101;
 }
 
-static inline void __coherent_cache_guest_page(struct kvm_vcpu *vcpu,
-					       kvm_pfn_t pfn,
-					       unsigned long size)
+static inline void __clean_dcache_guest_page(struct kvm_vcpu *vcpu,
+					     kvm_pfn_t pfn,
+					     unsigned long size)
 {
 	void *va = page_address(pfn_to_page(pfn));
 
 	kvm_flush_dcache_to_poc(va, size);
+}
 
+static inline void __invalidate_icache_guest_page(struct kvm_vcpu *vcpu,
+						  kvm_pfn_t pfn,
+						  unsigned long size)
+{
 	if (icache_is_aliasing()) {
 		/* any kind of VIPT cache */
 		__flush_icache_all();
 	} else if (is_kernel_in_hyp_mode() || !icache_is_vpipt()) {
 		/* PIPT or VPIPT at EL2 (see comment in __kvm_tlb_flush_vmid_ipa) */
+		void *va = page_address(pfn_to_page(pfn));
+
 		flush_icache_range((unsigned long)va,
 				   (unsigned long)va + size);
 	}
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index b36945d49986..2174244f6317 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -1257,10 +1257,16 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
 	kvm_mmu_write_protect_pt_masked(kvm, slot, gfn_offset, mask);
 }
 
-static void coherent_cache_guest_page(struct kvm_vcpu *vcpu, kvm_pfn_t pfn,
-				      unsigned long size)
+static void clean_dcache_guest_page(struct kvm_vcpu *vcpu, kvm_pfn_t pfn,
+				    unsigned long size)
 {
-	__coherent_cache_guest_page(vcpu, pfn, size);
+	__clean_dcache_guest_page(vcpu, pfn, size);
+}
+
+static void invalidate_icache_guest_page(struct kvm_vcpu *vcpu, kvm_pfn_t pfn,
+					 unsigned long size)
+{
+	__invalidate_icache_guest_page(vcpu, pfn, size);
 }
 
 static void kvm_send_hwpoison_signal(unsigned long address,
@@ -1391,7 +1397,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 			new_pmd = kvm_s2pmd_mkwrite(new_pmd);
 			kvm_set_pfn_dirty(pfn);
 		}
-		coherent_cache_guest_page(vcpu, pfn, PMD_SIZE);
+		clean_dcache_guest_page(vcpu, pfn, PMD_SIZE);
+		invalidate_icache_guest_page(vcpu, pfn, PMD_SIZE);
+
 		ret = stage2_set_pmd_huge(kvm, memcache, fault_ipa, &new_pmd);
 	} else {
 		pte_t new_pte = pfn_pte(pfn, mem_type);
@@ -1401,7 +1409,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 			kvm_set_pfn_dirty(pfn);
 			mark_page_dirty(kvm, gfn);
 		}
-		coherent_cache_guest_page(vcpu, pfn, PAGE_SIZE);
+		clean_dcache_guest_page(vcpu, pfn, PAGE_SIZE);
+		invalidate_icache_guest_page(vcpu, pfn, PAGE_SIZE);
+
 		ret = stage2_set_pte(kvm, memcache, fault_ipa, &new_pte, flags);
 	}
 
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v3 3/9] arm64: KVM: Add invalidate_icache_range helper
  2017-10-23 16:11 ` Marc Zyngier
@ 2017-10-23 16:11   ` Marc Zyngier
  -1 siblings, 0 replies; 52+ messages in thread
From: Marc Zyngier @ 2017-10-23 16:11 UTC (permalink / raw)
  To: Christoffer Dall, Catalin Marinas, Will Deacon
  Cc: linux-arm-kernel, kvm, kvmarm

We currently tightly couple dcache clean with icache invalidation,
but KVM could do without the initial flush to PoU, as we've
already flushed things to PoC.

Let's introduce invalidate_icache_range which is limited to
invalidating the icache from the linear mapping (and thus
has none of the userspace fault handling complexity), and
wire it in KVM instead of flush_icache_range.

Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm64/include/asm/assembler.h  | 21 +++++++++++++++++++++
 arch/arm64/include/asm/cacheflush.h |  7 +++++++
 arch/arm64/include/asm/kvm_mmu.h    |  4 ++--
 arch/arm64/mm/cache.S               | 32 ++++++++++++++++++++++----------
 4 files changed, 52 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index d58a6253c6ab..27093cf2b91f 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -375,6 +375,27 @@ alternative_endif
 	.endm
 
 /*
+ * Macro to perform an instruction cache maintenance for the interval
+ * [start, end)
+ *
+ * 	start, end:	virtual addresses describing the region
+ *	label:		A label to branch to on user fault.
+ * 	Corrupts:	tmp1, tmp2
+ */
+	.macro invalidate_icache_by_line start, end, tmp1, tmp2, label
+	icache_line_size \tmp1, \tmp2
+	sub	\tmp2, \tmp1, #1
+	bic	\tmp2, \start, \tmp2
+9997:
+USER(\label, ic	ivau, \tmp2)			// invalidate I line PoU
+	add	\tmp2, \tmp2, \tmp1
+	cmp	\tmp2, \end
+	b.lo	9997b
+	dsb	ish
+	isb
+	.endm
+
+/*
  * reset_pmuserenr_el0 - reset PMUSERENR_EL0 if PMUv3 present
  */
 	.macro	reset_pmuserenr_el0, tmpreg
diff --git a/arch/arm64/include/asm/cacheflush.h b/arch/arm64/include/asm/cacheflush.h
index 76d1cc85d5b1..e671e73c6453 100644
--- a/arch/arm64/include/asm/cacheflush.h
+++ b/arch/arm64/include/asm/cacheflush.h
@@ -52,6 +52,12 @@
  *		- start  - virtual start address
  *		- end    - virtual end address
  *
+ *	invalidate_icache_range(start, end)
+ *
+ *		Invalidate the I-cache in the region described by start, end.
+ *		- start  - virtual start address
+ *		- end    - virtual end address
+ *
  *	__flush_cache_user_range(start, end)
  *
  *		Ensure coherency between the I-cache and the D-cache in the
@@ -66,6 +72,7 @@
  *		- size   - region size
  */
 extern void flush_icache_range(unsigned long start, unsigned long end);
+extern int  invalidate_icache_range(unsigned long start, unsigned long end);
 extern void __flush_dcache_area(void *addr, size_t len);
 extern void __inval_dcache_area(void *addr, size_t len);
 extern void __clean_dcache_area_poc(void *addr, size_t len);
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 8034b96fb3a4..56b3e03c85e7 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -250,8 +250,8 @@ static inline void __invalidate_icache_guest_page(struct kvm_vcpu *vcpu,
 		/* PIPT or VPIPT at EL2 (see comment in __kvm_tlb_flush_vmid_ipa) */
 		void *va = page_address(pfn_to_page(pfn));
 
-		flush_icache_range((unsigned long)va,
-				   (unsigned long)va + size);
+		invalidate_icache_range((unsigned long)va,
+					(unsigned long)va + size);
 	}
 }
 
diff --git a/arch/arm64/mm/cache.S b/arch/arm64/mm/cache.S
index 7f1dbe962cf5..bedd23da83f4 100644
--- a/arch/arm64/mm/cache.S
+++ b/arch/arm64/mm/cache.S
@@ -60,16 +60,7 @@ user_alt 9f, "dc cvau, x4",  "dc civac, x4",  ARM64_WORKAROUND_CLEAN_CACHE
 	b.lo	1b
 	dsb	ish
 
-	icache_line_size x2, x3
-	sub	x3, x2, #1
-	bic	x4, x0, x3
-1:
-USER(9f, ic	ivau, x4	)		// invalidate I line PoU
-	add	x4, x4, x2
-	cmp	x4, x1
-	b.lo	1b
-	dsb	ish
-	isb
+	invalidate_icache_by_line x0, x1, x2, x3, 9f
 	mov	x0, #0
 1:
 	uaccess_ttbr0_disable x1
@@ -81,6 +72,27 @@ ENDPROC(flush_icache_range)
 ENDPROC(__flush_cache_user_range)
 
 /*
+ *	invalidate_icache_range(start,end)
+ *
+ *	Ensure that the I cache is invalid within specified region.
+ *
+ *	- start   - virtual start address of region
+ *	- end     - virtual end address of region
+ */
+ENTRY(invalidate_icache_range)
+	uaccess_ttbr0_enable x2, x3
+
+	invalidate_icache_by_line x0, x1, x2, x3, 2f
+	mov	x0, xzr
+1:
+	uaccess_ttbr0_disable x1
+	ret
+2:
+	mov	x0, #-EFAULT
+	b	1b
+ENDPROC(invalidate_icache_range)
+
+/*
  *	__flush_dcache_area(kaddr, size)
  *
  *	Ensure that any D-cache lines for the interval [kaddr, kaddr+size)
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v3 3/9] arm64: KVM: Add invalidate_icache_range helper
@ 2017-10-23 16:11   ` Marc Zyngier
  0 siblings, 0 replies; 52+ messages in thread
From: Marc Zyngier @ 2017-10-23 16:11 UTC (permalink / raw)
  To: linux-arm-kernel

We currently tightly couple dcache clean with icache invalidation,
but KVM could do without the initial flush to PoU, as we've
already flushed things to PoC.

Let's introduce invalidate_icache_range which is limited to
invalidating the icache from the linear mapping (and thus
has none of the userspace fault handling complexity), and
wire it in KVM instead of flush_icache_range.

Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm64/include/asm/assembler.h  | 21 +++++++++++++++++++++
 arch/arm64/include/asm/cacheflush.h |  7 +++++++
 arch/arm64/include/asm/kvm_mmu.h    |  4 ++--
 arch/arm64/mm/cache.S               | 32 ++++++++++++++++++++++----------
 4 files changed, 52 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index d58a6253c6ab..27093cf2b91f 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -375,6 +375,27 @@ alternative_endif
 	.endm
 
 /*
+ * Macro to perform an instruction cache maintenance for the interval
+ * [start, end)
+ *
+ * 	start, end:	virtual addresses describing the region
+ *	label:		A label to branch to on user fault.
+ * 	Corrupts:	tmp1, tmp2
+ */
+	.macro invalidate_icache_by_line start, end, tmp1, tmp2, label
+	icache_line_size \tmp1, \tmp2
+	sub	\tmp2, \tmp1, #1
+	bic	\tmp2, \start, \tmp2
+9997:
+USER(\label, ic	ivau, \tmp2)			// invalidate I line PoU
+	add	\tmp2, \tmp2, \tmp1
+	cmp	\tmp2, \end
+	b.lo	9997b
+	dsb	ish
+	isb
+	.endm
+
+/*
  * reset_pmuserenr_el0 - reset PMUSERENR_EL0 if PMUv3 present
  */
 	.macro	reset_pmuserenr_el0, tmpreg
diff --git a/arch/arm64/include/asm/cacheflush.h b/arch/arm64/include/asm/cacheflush.h
index 76d1cc85d5b1..e671e73c6453 100644
--- a/arch/arm64/include/asm/cacheflush.h
+++ b/arch/arm64/include/asm/cacheflush.h
@@ -52,6 +52,12 @@
  *		- start  - virtual start address
  *		- end    - virtual end address
  *
+ *	invalidate_icache_range(start, end)
+ *
+ *		Invalidate the I-cache in the region described by start, end.
+ *		- start  - virtual start address
+ *		- end    - virtual end address
+ *
  *	__flush_cache_user_range(start, end)
  *
  *		Ensure coherency between the I-cache and the D-cache in the
@@ -66,6 +72,7 @@
  *		- size   - region size
  */
 extern void flush_icache_range(unsigned long start, unsigned long end);
+extern int  invalidate_icache_range(unsigned long start, unsigned long end);
 extern void __flush_dcache_area(void *addr, size_t len);
 extern void __inval_dcache_area(void *addr, size_t len);
 extern void __clean_dcache_area_poc(void *addr, size_t len);
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 8034b96fb3a4..56b3e03c85e7 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -250,8 +250,8 @@ static inline void __invalidate_icache_guest_page(struct kvm_vcpu *vcpu,
 		/* PIPT or VPIPT at EL2 (see comment in __kvm_tlb_flush_vmid_ipa) */
 		void *va = page_address(pfn_to_page(pfn));
 
-		flush_icache_range((unsigned long)va,
-				   (unsigned long)va + size);
+		invalidate_icache_range((unsigned long)va,
+					(unsigned long)va + size);
 	}
 }
 
diff --git a/arch/arm64/mm/cache.S b/arch/arm64/mm/cache.S
index 7f1dbe962cf5..bedd23da83f4 100644
--- a/arch/arm64/mm/cache.S
+++ b/arch/arm64/mm/cache.S
@@ -60,16 +60,7 @@ user_alt 9f, "dc cvau, x4",  "dc civac, x4",  ARM64_WORKAROUND_CLEAN_CACHE
 	b.lo	1b
 	dsb	ish
 
-	icache_line_size x2, x3
-	sub	x3, x2, #1
-	bic	x4, x0, x3
-1:
-USER(9f, ic	ivau, x4	)		// invalidate I line PoU
-	add	x4, x4, x2
-	cmp	x4, x1
-	b.lo	1b
-	dsb	ish
-	isb
+	invalidate_icache_by_line x0, x1, x2, x3, 9f
 	mov	x0, #0
 1:
 	uaccess_ttbr0_disable x1
@@ -81,6 +72,27 @@ ENDPROC(flush_icache_range)
 ENDPROC(__flush_cache_user_range)
 
 /*
+ *	invalidate_icache_range(start,end)
+ *
+ *	Ensure that the I cache is invalid within specified region.
+ *
+ *	- start   - virtual start address of region
+ *	- end     - virtual end address of region
+ */
+ENTRY(invalidate_icache_range)
+	uaccess_ttbr0_enable x2, x3
+
+	invalidate_icache_by_line x0, x1, x2, x3, 2f
+	mov	x0, xzr
+1:
+	uaccess_ttbr0_disable x1
+	ret
+2:
+	mov	x0, #-EFAULT
+	b	1b
+ENDPROC(invalidate_icache_range)
+
+/*
  *	__flush_dcache_area(kaddr, size)
  *
  *	Ensure that any D-cache lines for the interval [kaddr, kaddr+size)
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v3 4/9] arm: KVM: Add optimized PIPT icache flushing
  2017-10-23 16:11 ` Marc Zyngier
@ 2017-10-23 16:11   ` Marc Zyngier
  -1 siblings, 0 replies; 52+ messages in thread
From: Marc Zyngier @ 2017-10-23 16:11 UTC (permalink / raw)
  To: Christoffer Dall, Catalin Marinas, Will Deacon
  Cc: linux-arm-kernel, kvm, kvmarm

Calling __cpuc_coherent_user_range to invalidate the icache on
a PIPT icache machine has some pointless overhead, as it starts
by cleaning the dcache to the PoU, while we're guaranteed to
have already cleaned it to the PoC.

As KVM is the only user of such a feature, let's implement some
ad-hoc cache flushing in kvm_mmu.h. Should it become useful to
other subsystems, it can be moved to a more global location.

Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm/include/asm/kvm_hyp.h |  2 ++
 arch/arm/include/asm/kvm_mmu.h | 32 +++++++++++++++++++++++++++++---
 2 files changed, 31 insertions(+), 3 deletions(-)

diff --git a/arch/arm/include/asm/kvm_hyp.h b/arch/arm/include/asm/kvm_hyp.h
index 04e5077cbcb6..8b29faa119ba 100644
--- a/arch/arm/include/asm/kvm_hyp.h
+++ b/arch/arm/include/asm/kvm_hyp.h
@@ -68,6 +68,8 @@
 #define HIFAR		__ACCESS_CP15(c6, 4, c0, 2)
 #define HPFAR		__ACCESS_CP15(c6, 4, c0, 4)
 #define ICIALLUIS	__ACCESS_CP15(c7, 0, c1, 0)
+#define BPIALLIS	__ACCESS_CP15(c7, 0, c1, 6)
+#define ICIMVAU		__ACCESS_CP15(c7, 0, c5, 1)
 #define ATS1CPR		__ACCESS_CP15(c7, 0, c8, 0)
 #define TLBIALLIS	__ACCESS_CP15(c8, 0, c3, 0)
 #define TLBIALL		__ACCESS_CP15(c8, 0, c7, 0)
diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index 9fa4b2520974..bc8d21e76637 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -37,6 +37,8 @@
 
 #include <linux/highmem.h>
 #include <asm/cacheflush.h>
+#include <asm/cputype.h>
+#include <asm/kvm_hyp.h>
 #include <asm/pgalloc.h>
 #include <asm/stage2_pgtable.h>
 
@@ -157,6 +159,8 @@ static inline void __invalidate_icache_guest_page(struct kvm_vcpu *vcpu,
 						  kvm_pfn_t pfn,
 						  unsigned long size)
 {
+	u32 iclsz;
+
 	/*
 	 * If we are going to insert an instruction page and the icache is
 	 * either VIPT or PIPT, there is a potential problem where the host
@@ -181,18 +185,40 @@ static inline void __invalidate_icache_guest_page(struct kvm_vcpu *vcpu,
 		return;
 	}
 
-	/* PIPT cache. As for the d-side, use a temporary kernel mapping. */
+	/*
+	 * CTR IminLine contains Log2 of the number of words in the
+	 * cache line, so we can get the number of words as
+	 * 2 << (IminLine - 1).  To get the number of bytes, we
+	 * multiply by 4 (the number of bytes in a 32-bit word), and
+	 * get 4 << (IminLine).
+	 */
+	iclsz = 4 << (read_cpuid(CPUID_CACHETYPE) & 0xf);
+
 	while (size) {
 		void *va = kmap_atomic_pfn(pfn);
+		void *end = va + PAGE_SIZE;
+		void *addr = va;
 
-		__cpuc_coherent_user_range((unsigned long)va,
-					   (unsigned long)va + PAGE_SIZE);
+		do {
+			write_sysreg(addr, ICIMVAU);
+			addr += iclsz;
+		} while (addr < end);
+
+		dsb(ishst);
+		isb();
 
 		size -= PAGE_SIZE;
 		pfn++;
 
 		kunmap_atomic(va);
 	}
+
+	/* Check if we need to invalidate the BTB */
+	if ((read_cpuid_ext(CPUID_EXT_MMFR1) >> 28) != 4) {
+		write_sysreg(0, BPIALLIS);
+		dsb(ishst);
+		isb();
+	}
 }
 
 static inline void __kvm_flush_dcache_pte(pte_t pte)
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v3 4/9] arm: KVM: Add optimized PIPT icache flushing
@ 2017-10-23 16:11   ` Marc Zyngier
  0 siblings, 0 replies; 52+ messages in thread
From: Marc Zyngier @ 2017-10-23 16:11 UTC (permalink / raw)
  To: linux-arm-kernel

Calling __cpuc_coherent_user_range to invalidate the icache on
a PIPT icache machine has some pointless overhead, as it starts
by cleaning the dcache to the PoU, while we're guaranteed to
have already cleaned it to the PoC.

As KVM is the only user of such a feature, let's implement some
ad-hoc cache flushing in kvm_mmu.h. Should it become useful to
other subsystems, it can be moved to a more global location.

Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm/include/asm/kvm_hyp.h |  2 ++
 arch/arm/include/asm/kvm_mmu.h | 32 +++++++++++++++++++++++++++++---
 2 files changed, 31 insertions(+), 3 deletions(-)

diff --git a/arch/arm/include/asm/kvm_hyp.h b/arch/arm/include/asm/kvm_hyp.h
index 04e5077cbcb6..8b29faa119ba 100644
--- a/arch/arm/include/asm/kvm_hyp.h
+++ b/arch/arm/include/asm/kvm_hyp.h
@@ -68,6 +68,8 @@
 #define HIFAR		__ACCESS_CP15(c6, 4, c0, 2)
 #define HPFAR		__ACCESS_CP15(c6, 4, c0, 4)
 #define ICIALLUIS	__ACCESS_CP15(c7, 0, c1, 0)
+#define BPIALLIS	__ACCESS_CP15(c7, 0, c1, 6)
+#define ICIMVAU		__ACCESS_CP15(c7, 0, c5, 1)
 #define ATS1CPR		__ACCESS_CP15(c7, 0, c8, 0)
 #define TLBIALLIS	__ACCESS_CP15(c8, 0, c3, 0)
 #define TLBIALL		__ACCESS_CP15(c8, 0, c7, 0)
diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index 9fa4b2520974..bc8d21e76637 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -37,6 +37,8 @@
 
 #include <linux/highmem.h>
 #include <asm/cacheflush.h>
+#include <asm/cputype.h>
+#include <asm/kvm_hyp.h>
 #include <asm/pgalloc.h>
 #include <asm/stage2_pgtable.h>
 
@@ -157,6 +159,8 @@ static inline void __invalidate_icache_guest_page(struct kvm_vcpu *vcpu,
 						  kvm_pfn_t pfn,
 						  unsigned long size)
 {
+	u32 iclsz;
+
 	/*
 	 * If we are going to insert an instruction page and the icache is
 	 * either VIPT or PIPT, there is a potential problem where the host
@@ -181,18 +185,40 @@ static inline void __invalidate_icache_guest_page(struct kvm_vcpu *vcpu,
 		return;
 	}
 
-	/* PIPT cache. As for the d-side, use a temporary kernel mapping. */
+	/*
+	 * CTR IminLine contains Log2 of the number of words in the
+	 * cache line, so we can get the number of words as
+	 * 2 << (IminLine - 1).  To get the number of bytes, we
+	 * multiply by 4 (the number of bytes in a 32-bit word), and
+	 * get 4 << (IminLine).
+	 */
+	iclsz = 4 << (read_cpuid(CPUID_CACHETYPE) & 0xf);
+
 	while (size) {
 		void *va = kmap_atomic_pfn(pfn);
+		void *end = va + PAGE_SIZE;
+		void *addr = va;
 
-		__cpuc_coherent_user_range((unsigned long)va,
-					   (unsigned long)va + PAGE_SIZE);
+		do {
+			write_sysreg(addr, ICIMVAU);
+			addr += iclsz;
+		} while (addr < end);
+
+		dsb(ishst);
+		isb();
 
 		size -= PAGE_SIZE;
 		pfn++;
 
 		kunmap_atomic(va);
 	}
+
+	/* Check if we need to invalidate the BTB */
+	if ((read_cpuid_ext(CPUID_EXT_MMFR1) >> 28) != 4) {
+		write_sysreg(0, BPIALLIS);
+		dsb(ishst);
+		isb();
+	}
 }
 
 static inline void __kvm_flush_dcache_pte(pte_t pte)
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v3 5/9] arm64: KVM: PTE/PMD S2 XN bit definition
  2017-10-23 16:11 ` Marc Zyngier
@ 2017-10-23 16:11   ` Marc Zyngier
  -1 siblings, 0 replies; 52+ messages in thread
From: Marc Zyngier @ 2017-10-23 16:11 UTC (permalink / raw)
  To: Christoffer Dall, Catalin Marinas, Will Deacon
  Cc: linux-arm-kernel, kvm, kvmarm

As we're about to make S2 page-tables eXecute Never by default,
add the required bits for both PMDs and PTEs.

Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm64/include/asm/pgtable-hwdef.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm64/include/asm/pgtable-hwdef.h b/arch/arm64/include/asm/pgtable-hwdef.h
index eb0c2bd90de9..af035331fb09 100644
--- a/arch/arm64/include/asm/pgtable-hwdef.h
+++ b/arch/arm64/include/asm/pgtable-hwdef.h
@@ -177,9 +177,11 @@
  */
 #define PTE_S2_RDONLY		(_AT(pteval_t, 1) << 6)   /* HAP[2:1] */
 #define PTE_S2_RDWR		(_AT(pteval_t, 3) << 6)   /* HAP[2:1] */
+#define PTE_S2_XN		(_AT(pteval_t, 2) << 53)  /* XN[1:0] */
 
 #define PMD_S2_RDONLY		(_AT(pmdval_t, 1) << 6)   /* HAP[2:1] */
 #define PMD_S2_RDWR		(_AT(pmdval_t, 3) << 6)   /* HAP[2:1] */
+#define PMD_S2_XN		(_AT(pmdval_t, 2) << 53)  /* XN[1:0] */
 
 /*
  * Memory Attribute override for Stage-2 (MemAttr[3:0])
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v3 5/9] arm64: KVM: PTE/PMD S2 XN bit definition
@ 2017-10-23 16:11   ` Marc Zyngier
  0 siblings, 0 replies; 52+ messages in thread
From: Marc Zyngier @ 2017-10-23 16:11 UTC (permalink / raw)
  To: linux-arm-kernel

As we're about to make S2 page-tables eXecute Never by default,
add the required bits for both PMDs and PTEs.

Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm64/include/asm/pgtable-hwdef.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm64/include/asm/pgtable-hwdef.h b/arch/arm64/include/asm/pgtable-hwdef.h
index eb0c2bd90de9..af035331fb09 100644
--- a/arch/arm64/include/asm/pgtable-hwdef.h
+++ b/arch/arm64/include/asm/pgtable-hwdef.h
@@ -177,9 +177,11 @@
  */
 #define PTE_S2_RDONLY		(_AT(pteval_t, 1) << 6)   /* HAP[2:1] */
 #define PTE_S2_RDWR		(_AT(pteval_t, 3) << 6)   /* HAP[2:1] */
+#define PTE_S2_XN		(_AT(pteval_t, 2) << 53)  /* XN[1:0] */
 
 #define PMD_S2_RDONLY		(_AT(pmdval_t, 1) << 6)   /* HAP[2:1] */
 #define PMD_S2_RDWR		(_AT(pmdval_t, 3) << 6)   /* HAP[2:1] */
+#define PMD_S2_XN		(_AT(pmdval_t, 2) << 53)  /* XN[1:0] */
 
 /*
  * Memory Attribute override for Stage-2 (MemAttr[3:0])
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v3 6/9] KVM: arm/arm64: Limit icache invalidation to prefetch aborts
  2017-10-23 16:11 ` Marc Zyngier
@ 2017-10-23 16:11   ` Marc Zyngier
  -1 siblings, 0 replies; 52+ messages in thread
From: Marc Zyngier @ 2017-10-23 16:11 UTC (permalink / raw)
  To: Christoffer Dall, Catalin Marinas, Will Deacon
  Cc: kvm, linux-arm-kernel, kvmarm

We've so far eagerly invalidated the icache, no matter how
the page was faulted in (data or prefetch abort).

But we can easily track execution by setting the XN bits
in the S2 page tables, get the prefetch abort at HYP and
perform the icache invalidation at that time only.

As for most VMs, the instruction working set is pretty
small compared to the data set, this is likely to save
some traffic (specially as the invalidation is broadcast).

Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm/include/asm/kvm_mmu.h        | 12 ++++++++++++
 arch/arm/include/asm/pgtable.h        |  4 ++--
 arch/arm64/include/asm/kvm_mmu.h      | 12 ++++++++++++
 arch/arm64/include/asm/pgtable-prot.h |  4 ++--
 virt/kvm/arm/mmu.c                    | 19 +++++++++++++++----
 5 files changed, 43 insertions(+), 8 deletions(-)

diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index bc8d21e76637..4d7a54cbb3ab 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -85,6 +85,18 @@ static inline pmd_t kvm_s2pmd_mkwrite(pmd_t pmd)
 	return pmd;
 }
 
+static inline pte_t kvm_s2pte_mkexec(pte_t pte)
+{
+	pte_val(pte) &= ~L_PTE_XN;
+	return pte;
+}
+
+static inline pmd_t kvm_s2pmd_mkexec(pmd_t pmd)
+{
+	pmd_val(pmd) &= ~PMD_SECT_XN;
+	return pmd;
+}
+
 static inline void kvm_set_s2pte_readonly(pte_t *pte)
 {
 	pte_val(*pte) = (pte_val(*pte) & ~L_PTE_S2_RDWR) | L_PTE_S2_RDONLY;
diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
index 1c462381c225..9b6e77b9ab7e 100644
--- a/arch/arm/include/asm/pgtable.h
+++ b/arch/arm/include/asm/pgtable.h
@@ -102,8 +102,8 @@ extern pgprot_t		pgprot_s2_device;
 #define PAGE_HYP_EXEC		_MOD_PROT(pgprot_kernel, L_PTE_HYP | L_PTE_RDONLY)
 #define PAGE_HYP_RO		_MOD_PROT(pgprot_kernel, L_PTE_HYP | L_PTE_RDONLY | L_PTE_XN)
 #define PAGE_HYP_DEVICE		_MOD_PROT(pgprot_hyp_device, L_PTE_HYP)
-#define PAGE_S2			_MOD_PROT(pgprot_s2, L_PTE_S2_RDONLY)
-#define PAGE_S2_DEVICE		_MOD_PROT(pgprot_s2_device, L_PTE_S2_RDONLY)
+#define PAGE_S2			_MOD_PROT(pgprot_s2, L_PTE_S2_RDONLY | L_PTE_XN)
+#define PAGE_S2_DEVICE		_MOD_PROT(pgprot_s2_device, L_PTE_S2_RDONLY | L_PTE_XN)
 
 #define __PAGE_NONE		__pgprot(_L_PTE_DEFAULT | L_PTE_RDONLY | L_PTE_XN | L_PTE_NONE)
 #define __PAGE_SHARED		__pgprot(_L_PTE_DEFAULT | L_PTE_USER | L_PTE_XN)
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 56b3e03c85e7..1e1b20cb348f 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -173,6 +173,18 @@ static inline pmd_t kvm_s2pmd_mkwrite(pmd_t pmd)
 	return pmd;
 }
 
+static inline pte_t kvm_s2pte_mkexec(pte_t pte)
+{
+	pte_val(pte) &= ~PTE_S2_XN;
+	return pte;
+}
+
+static inline pmd_t kvm_s2pmd_mkexec(pmd_t pmd)
+{
+	pmd_val(pmd) &= ~PMD_S2_XN;
+	return pmd;
+}
+
 static inline void kvm_set_s2pte_readonly(pte_t *pte)
 {
 	pteval_t old_pteval, pteval;
diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h
index 0a5635fb0ef9..4e12dabd342b 100644
--- a/arch/arm64/include/asm/pgtable-prot.h
+++ b/arch/arm64/include/asm/pgtable-prot.h
@@ -60,8 +60,8 @@
 #define PAGE_HYP_RO		__pgprot(_PAGE_DEFAULT | PTE_HYP | PTE_RDONLY | PTE_HYP_XN)
 #define PAGE_HYP_DEVICE		__pgprot(PROT_DEVICE_nGnRE | PTE_HYP)
 
-#define PAGE_S2			__pgprot(PROT_DEFAULT | PTE_S2_MEMATTR(MT_S2_NORMAL) | PTE_S2_RDONLY)
-#define PAGE_S2_DEVICE		__pgprot(PROT_DEFAULT | PTE_S2_MEMATTR(MT_S2_DEVICE_nGnRE) | PTE_S2_RDONLY | PTE_UXN)
+#define PAGE_S2			__pgprot(PROT_DEFAULT | PTE_S2_MEMATTR(MT_S2_NORMAL) | PTE_S2_RDONLY | PTE_S2_XN)
+#define PAGE_S2_DEVICE		__pgprot(PROT_DEFAULT | PTE_S2_MEMATTR(MT_S2_DEVICE_nGnRE) | PTE_S2_RDONLY | PTE_S2_XN)
 
 #define PAGE_NONE		__pgprot(((_PAGE_DEFAULT) & ~PTE_VALID) | PTE_PROT_NONE | PTE_RDONLY | PTE_PXN | PTE_UXN)
 #define PAGE_SHARED		__pgprot(_PAGE_DEFAULT | PTE_USER | PTE_NG | PTE_PXN | PTE_UXN | PTE_WRITE)
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index 2174244f6317..0417c8e2a81c 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -1292,7 +1292,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 			  unsigned long fault_status)
 {
 	int ret;
-	bool write_fault, writable, hugetlb = false, force_pte = false;
+	bool write_fault, exec_fault, writable, hugetlb = false, force_pte = false;
 	unsigned long mmu_seq;
 	gfn_t gfn = fault_ipa >> PAGE_SHIFT;
 	struct kvm *kvm = vcpu->kvm;
@@ -1304,7 +1304,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	unsigned long flags = 0;
 
 	write_fault = kvm_is_write_fault(vcpu);
-	if (fault_status == FSC_PERM && !write_fault) {
+	exec_fault = kvm_vcpu_trap_is_iabt(vcpu);
+	VM_BUG_ON(write_fault && exec_fault);
+
+	if (fault_status == FSC_PERM && !write_fault && !exec_fault) {
 		kvm_err("Unexpected L2 read permission error\n");
 		return -EFAULT;
 	}
@@ -1398,7 +1401,11 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 			kvm_set_pfn_dirty(pfn);
 		}
 		clean_dcache_guest_page(vcpu, pfn, PMD_SIZE);
-		invalidate_icache_guest_page(vcpu, pfn, PMD_SIZE);
+
+		if (exec_fault) {
+			new_pmd = kvm_s2pmd_mkexec(new_pmd);
+			invalidate_icache_guest_page(vcpu, pfn, PMD_SIZE);
+		}
 
 		ret = stage2_set_pmd_huge(kvm, memcache, fault_ipa, &new_pmd);
 	} else {
@@ -1410,7 +1417,11 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 			mark_page_dirty(kvm, gfn);
 		}
 		clean_dcache_guest_page(vcpu, pfn, PAGE_SIZE);
-		invalidate_icache_guest_page(vcpu, pfn, PAGE_SIZE);
+
+		if (exec_fault) {
+			new_pte = kvm_s2pte_mkexec(new_pte);
+			invalidate_icache_guest_page(vcpu, pfn, PAGE_SIZE);
+		}
 
 		ret = stage2_set_pte(kvm, memcache, fault_ipa, &new_pte, flags);
 	}
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v3 6/9] KVM: arm/arm64: Limit icache invalidation to prefetch aborts
@ 2017-10-23 16:11   ` Marc Zyngier
  0 siblings, 0 replies; 52+ messages in thread
From: Marc Zyngier @ 2017-10-23 16:11 UTC (permalink / raw)
  To: linux-arm-kernel

We've so far eagerly invalidated the icache, no matter how
the page was faulted in (data or prefetch abort).

But we can easily track execution by setting the XN bits
in the S2 page tables, get the prefetch abort at HYP and
perform the icache invalidation at that time only.

As for most VMs, the instruction working set is pretty
small compared to the data set, this is likely to save
some traffic (specially as the invalidation is broadcast).

Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm/include/asm/kvm_mmu.h        | 12 ++++++++++++
 arch/arm/include/asm/pgtable.h        |  4 ++--
 arch/arm64/include/asm/kvm_mmu.h      | 12 ++++++++++++
 arch/arm64/include/asm/pgtable-prot.h |  4 ++--
 virt/kvm/arm/mmu.c                    | 19 +++++++++++++++----
 5 files changed, 43 insertions(+), 8 deletions(-)

diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index bc8d21e76637..4d7a54cbb3ab 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -85,6 +85,18 @@ static inline pmd_t kvm_s2pmd_mkwrite(pmd_t pmd)
 	return pmd;
 }
 
+static inline pte_t kvm_s2pte_mkexec(pte_t pte)
+{
+	pte_val(pte) &= ~L_PTE_XN;
+	return pte;
+}
+
+static inline pmd_t kvm_s2pmd_mkexec(pmd_t pmd)
+{
+	pmd_val(pmd) &= ~PMD_SECT_XN;
+	return pmd;
+}
+
 static inline void kvm_set_s2pte_readonly(pte_t *pte)
 {
 	pte_val(*pte) = (pte_val(*pte) & ~L_PTE_S2_RDWR) | L_PTE_S2_RDONLY;
diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
index 1c462381c225..9b6e77b9ab7e 100644
--- a/arch/arm/include/asm/pgtable.h
+++ b/arch/arm/include/asm/pgtable.h
@@ -102,8 +102,8 @@ extern pgprot_t		pgprot_s2_device;
 #define PAGE_HYP_EXEC		_MOD_PROT(pgprot_kernel, L_PTE_HYP | L_PTE_RDONLY)
 #define PAGE_HYP_RO		_MOD_PROT(pgprot_kernel, L_PTE_HYP | L_PTE_RDONLY | L_PTE_XN)
 #define PAGE_HYP_DEVICE		_MOD_PROT(pgprot_hyp_device, L_PTE_HYP)
-#define PAGE_S2			_MOD_PROT(pgprot_s2, L_PTE_S2_RDONLY)
-#define PAGE_S2_DEVICE		_MOD_PROT(pgprot_s2_device, L_PTE_S2_RDONLY)
+#define PAGE_S2			_MOD_PROT(pgprot_s2, L_PTE_S2_RDONLY | L_PTE_XN)
+#define PAGE_S2_DEVICE		_MOD_PROT(pgprot_s2_device, L_PTE_S2_RDONLY | L_PTE_XN)
 
 #define __PAGE_NONE		__pgprot(_L_PTE_DEFAULT | L_PTE_RDONLY | L_PTE_XN | L_PTE_NONE)
 #define __PAGE_SHARED		__pgprot(_L_PTE_DEFAULT | L_PTE_USER | L_PTE_XN)
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 56b3e03c85e7..1e1b20cb348f 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -173,6 +173,18 @@ static inline pmd_t kvm_s2pmd_mkwrite(pmd_t pmd)
 	return pmd;
 }
 
+static inline pte_t kvm_s2pte_mkexec(pte_t pte)
+{
+	pte_val(pte) &= ~PTE_S2_XN;
+	return pte;
+}
+
+static inline pmd_t kvm_s2pmd_mkexec(pmd_t pmd)
+{
+	pmd_val(pmd) &= ~PMD_S2_XN;
+	return pmd;
+}
+
 static inline void kvm_set_s2pte_readonly(pte_t *pte)
 {
 	pteval_t old_pteval, pteval;
diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h
index 0a5635fb0ef9..4e12dabd342b 100644
--- a/arch/arm64/include/asm/pgtable-prot.h
+++ b/arch/arm64/include/asm/pgtable-prot.h
@@ -60,8 +60,8 @@
 #define PAGE_HYP_RO		__pgprot(_PAGE_DEFAULT | PTE_HYP | PTE_RDONLY | PTE_HYP_XN)
 #define PAGE_HYP_DEVICE		__pgprot(PROT_DEVICE_nGnRE | PTE_HYP)
 
-#define PAGE_S2			__pgprot(PROT_DEFAULT | PTE_S2_MEMATTR(MT_S2_NORMAL) | PTE_S2_RDONLY)
-#define PAGE_S2_DEVICE		__pgprot(PROT_DEFAULT | PTE_S2_MEMATTR(MT_S2_DEVICE_nGnRE) | PTE_S2_RDONLY | PTE_UXN)
+#define PAGE_S2			__pgprot(PROT_DEFAULT | PTE_S2_MEMATTR(MT_S2_NORMAL) | PTE_S2_RDONLY | PTE_S2_XN)
+#define PAGE_S2_DEVICE		__pgprot(PROT_DEFAULT | PTE_S2_MEMATTR(MT_S2_DEVICE_nGnRE) | PTE_S2_RDONLY | PTE_S2_XN)
 
 #define PAGE_NONE		__pgprot(((_PAGE_DEFAULT) & ~PTE_VALID) | PTE_PROT_NONE | PTE_RDONLY | PTE_PXN | PTE_UXN)
 #define PAGE_SHARED		__pgprot(_PAGE_DEFAULT | PTE_USER | PTE_NG | PTE_PXN | PTE_UXN | PTE_WRITE)
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index 2174244f6317..0417c8e2a81c 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -1292,7 +1292,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 			  unsigned long fault_status)
 {
 	int ret;
-	bool write_fault, writable, hugetlb = false, force_pte = false;
+	bool write_fault, exec_fault, writable, hugetlb = false, force_pte = false;
 	unsigned long mmu_seq;
 	gfn_t gfn = fault_ipa >> PAGE_SHIFT;
 	struct kvm *kvm = vcpu->kvm;
@@ -1304,7 +1304,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	unsigned long flags = 0;
 
 	write_fault = kvm_is_write_fault(vcpu);
-	if (fault_status == FSC_PERM && !write_fault) {
+	exec_fault = kvm_vcpu_trap_is_iabt(vcpu);
+	VM_BUG_ON(write_fault && exec_fault);
+
+	if (fault_status == FSC_PERM && !write_fault && !exec_fault) {
 		kvm_err("Unexpected L2 read permission error\n");
 		return -EFAULT;
 	}
@@ -1398,7 +1401,11 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 			kvm_set_pfn_dirty(pfn);
 		}
 		clean_dcache_guest_page(vcpu, pfn, PMD_SIZE);
-		invalidate_icache_guest_page(vcpu, pfn, PMD_SIZE);
+
+		if (exec_fault) {
+			new_pmd = kvm_s2pmd_mkexec(new_pmd);
+			invalidate_icache_guest_page(vcpu, pfn, PMD_SIZE);
+		}
 
 		ret = stage2_set_pmd_huge(kvm, memcache, fault_ipa, &new_pmd);
 	} else {
@@ -1410,7 +1417,11 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 			mark_page_dirty(kvm, gfn);
 		}
 		clean_dcache_guest_page(vcpu, pfn, PAGE_SIZE);
-		invalidate_icache_guest_page(vcpu, pfn, PAGE_SIZE);
+
+		if (exec_fault) {
+			new_pte = kvm_s2pte_mkexec(new_pte);
+			invalidate_icache_guest_page(vcpu, pfn, PAGE_SIZE);
+		}
 
 		ret = stage2_set_pte(kvm, memcache, fault_ipa, &new_pte, flags);
 	}
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v3 7/9] KVM: arm/arm64: Only clean the dcache on translation fault
  2017-10-23 16:11 ` Marc Zyngier
@ 2017-10-23 16:11   ` Marc Zyngier
  -1 siblings, 0 replies; 52+ messages in thread
From: Marc Zyngier @ 2017-10-23 16:11 UTC (permalink / raw)
  To: Christoffer Dall, Catalin Marinas, Will Deacon
  Cc: kvm, linux-arm-kernel, kvmarm

The only case where we actually need to perform a dcache maintenance
is when we map the page for the first time, and subsequent permission
faults do not require cache maintenance. Let's make it conditional
on not being a permission fault (and thus a translation fault).

Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 virt/kvm/arm/mmu.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index 0417c8e2a81c..f956efbd933d 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -1400,7 +1400,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 			new_pmd = kvm_s2pmd_mkwrite(new_pmd);
 			kvm_set_pfn_dirty(pfn);
 		}
-		clean_dcache_guest_page(vcpu, pfn, PMD_SIZE);
+
+		if (fault_status != FSC_PERM)
+			clean_dcache_guest_page(vcpu, pfn, PMD_SIZE);
 
 		if (exec_fault) {
 			new_pmd = kvm_s2pmd_mkexec(new_pmd);
@@ -1416,7 +1418,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 			kvm_set_pfn_dirty(pfn);
 			mark_page_dirty(kvm, gfn);
 		}
-		clean_dcache_guest_page(vcpu, pfn, PAGE_SIZE);
+
+		if (fault_status != FSC_PERM)
+			clean_dcache_guest_page(vcpu, pfn, PAGE_SIZE);
 
 		if (exec_fault) {
 			new_pte = kvm_s2pte_mkexec(new_pte);
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v3 7/9] KVM: arm/arm64: Only clean the dcache on translation fault
@ 2017-10-23 16:11   ` Marc Zyngier
  0 siblings, 0 replies; 52+ messages in thread
From: Marc Zyngier @ 2017-10-23 16:11 UTC (permalink / raw)
  To: linux-arm-kernel

The only case where we actually need to perform a dcache maintenance
is when we map the page for the first time, and subsequent permission
faults do not require cache maintenance. Let's make it conditional
on not being a permission fault (and thus a translation fault).

Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 virt/kvm/arm/mmu.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index 0417c8e2a81c..f956efbd933d 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -1400,7 +1400,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 			new_pmd = kvm_s2pmd_mkwrite(new_pmd);
 			kvm_set_pfn_dirty(pfn);
 		}
-		clean_dcache_guest_page(vcpu, pfn, PMD_SIZE);
+
+		if (fault_status != FSC_PERM)
+			clean_dcache_guest_page(vcpu, pfn, PMD_SIZE);
 
 		if (exec_fault) {
 			new_pmd = kvm_s2pmd_mkexec(new_pmd);
@@ -1416,7 +1418,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 			kvm_set_pfn_dirty(pfn);
 			mark_page_dirty(kvm, gfn);
 		}
-		clean_dcache_guest_page(vcpu, pfn, PAGE_SIZE);
+
+		if (fault_status != FSC_PERM)
+			clean_dcache_guest_page(vcpu, pfn, PAGE_SIZE);
 
 		if (exec_fault) {
 			new_pte = kvm_s2pte_mkexec(new_pte);
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v3 8/9] KVM: arm/arm64: Preserve Exec permission across R/W permission faults
  2017-10-23 16:11 ` Marc Zyngier
@ 2017-10-23 16:11   ` Marc Zyngier
  -1 siblings, 0 replies; 52+ messages in thread
From: Marc Zyngier @ 2017-10-23 16:11 UTC (permalink / raw)
  To: Christoffer Dall, Catalin Marinas, Will Deacon
  Cc: linux-arm-kernel, kvm, kvmarm

So far, we loose the Exec property whenever we take permission
faults, as we always reconstruct the PTE/PMD from scratch. This
can be counter productive as we can end-up with the following
fault sequence:

	X -> RO -> ROX -> RW -> RWX

Instead, we can lookup the existing PTE/PMD and clear the XN bit in the
new entry if it was already cleared in the old one, leadig to a much
nicer fault sequence:

	X -> ROX -> RWX

Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm/include/asm/kvm_mmu.h   | 10 ++++++++++
 arch/arm64/include/asm/kvm_mmu.h | 10 ++++++++++
 virt/kvm/arm/mmu.c               | 27 +++++++++++++++++++++++++++
 3 files changed, 47 insertions(+)

diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index 4d7a54cbb3ab..aab64fe52146 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -107,6 +107,11 @@ static inline bool kvm_s2pte_readonly(pte_t *pte)
 	return (pte_val(*pte) & L_PTE_S2_RDWR) == L_PTE_S2_RDONLY;
 }
 
+static inline bool kvm_s2pte_exec(pte_t *pte)
+{
+	return !(pte_val(*pte) & L_PTE_XN);
+}
+
 static inline void kvm_set_s2pmd_readonly(pmd_t *pmd)
 {
 	pmd_val(*pmd) = (pmd_val(*pmd) & ~L_PMD_S2_RDWR) | L_PMD_S2_RDONLY;
@@ -117,6 +122,11 @@ static inline bool kvm_s2pmd_readonly(pmd_t *pmd)
 	return (pmd_val(*pmd) & L_PMD_S2_RDWR) == L_PMD_S2_RDONLY;
 }
 
+static inline bool kvm_s2pmd_exec(pmd_t *pmd)
+{
+	return !(pmd_val(*pmd) & PMD_SECT_XN);
+}
+
 static inline bool kvm_page_empty(void *ptr)
 {
 	struct page *ptr_page = virt_to_page(ptr);
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 1e1b20cb348f..126abefffe7f 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -203,6 +203,11 @@ static inline bool kvm_s2pte_readonly(pte_t *pte)
 	return (pte_val(*pte) & PTE_S2_RDWR) == PTE_S2_RDONLY;
 }
 
+static inline bool kvm_s2pte_exec(pte_t *pte)
+{
+	return !(pte_val(*pte) & PTE_S2_XN);
+}
+
 static inline void kvm_set_s2pmd_readonly(pmd_t *pmd)
 {
 	kvm_set_s2pte_readonly((pte_t *)pmd);
@@ -213,6 +218,11 @@ static inline bool kvm_s2pmd_readonly(pmd_t *pmd)
 	return kvm_s2pte_readonly((pte_t *)pmd);
 }
 
+static inline bool kvm_s2pmd_exec(pmd_t *pmd)
+{
+	return !(pmd_val(*pmd) & PMD_S2_XN);
+}
+
 static inline bool kvm_page_empty(void *ptr)
 {
 	struct page *ptr_page = virt_to_page(ptr);
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index f956efbd933d..b83b5a8442bb 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -926,6 +926,25 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache
 	return 0;
 }
 
+static bool stage2_is_exec(struct kvm *kvm, phys_addr_t addr)
+{
+	pmd_t *pmdp;
+	pte_t *ptep;
+
+	pmdp = stage2_get_pmd(kvm, NULL, addr);
+	if (!pmdp || pmd_none(*pmdp) || !pmd_present(*pmdp))
+		return false;
+
+	if (pmd_thp_or_huge(*pmdp))
+		return kvm_s2pmd_exec(pmdp);
+
+	ptep = pte_offset_kernel(pmdp, addr);
+	if (!ptep || pte_none(*ptep) || !pte_present(*ptep))
+		return false;
+
+	return kvm_s2pte_exec(ptep);
+}
+
 static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
 			  phys_addr_t addr, const pte_t *new_pte,
 			  unsigned long flags)
@@ -1407,6 +1426,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		if (exec_fault) {
 			new_pmd = kvm_s2pmd_mkexec(new_pmd);
 			invalidate_icache_guest_page(vcpu, pfn, PMD_SIZE);
+		} else if (fault_status == FSC_PERM) {
+			/* Preserve execute if XN was already cleared */
+			if (stage2_is_exec(kvm, fault_ipa))
+				new_pmd = kvm_s2pmd_mkexec(new_pmd);
 		}
 
 		ret = stage2_set_pmd_huge(kvm, memcache, fault_ipa, &new_pmd);
@@ -1425,6 +1448,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		if (exec_fault) {
 			new_pte = kvm_s2pte_mkexec(new_pte);
 			invalidate_icache_guest_page(vcpu, pfn, PAGE_SIZE);
+		} else if (fault_status == FSC_PERM) {
+			/* Preserve execute if XN was already cleared */
+			if (stage2_is_exec(kvm, fault_ipa))
+				new_pte = kvm_s2pte_mkexec(new_pte);
 		}
 
 		ret = stage2_set_pte(kvm, memcache, fault_ipa, &new_pte, flags);
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v3 8/9] KVM: arm/arm64: Preserve Exec permission across R/W permission faults
@ 2017-10-23 16:11   ` Marc Zyngier
  0 siblings, 0 replies; 52+ messages in thread
From: Marc Zyngier @ 2017-10-23 16:11 UTC (permalink / raw)
  To: linux-arm-kernel

So far, we loose the Exec property whenever we take permission
faults, as we always reconstruct the PTE/PMD from scratch. This
can be counter productive as we can end-up with the following
fault sequence:

	X -> RO -> ROX -> RW -> RWX

Instead, we can lookup the existing PTE/PMD and clear the XN bit in the
new entry if it was already cleared in the old one, leadig to a much
nicer fault sequence:

	X -> ROX -> RWX

Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm/include/asm/kvm_mmu.h   | 10 ++++++++++
 arch/arm64/include/asm/kvm_mmu.h | 10 ++++++++++
 virt/kvm/arm/mmu.c               | 27 +++++++++++++++++++++++++++
 3 files changed, 47 insertions(+)

diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index 4d7a54cbb3ab..aab64fe52146 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -107,6 +107,11 @@ static inline bool kvm_s2pte_readonly(pte_t *pte)
 	return (pte_val(*pte) & L_PTE_S2_RDWR) == L_PTE_S2_RDONLY;
 }
 
+static inline bool kvm_s2pte_exec(pte_t *pte)
+{
+	return !(pte_val(*pte) & L_PTE_XN);
+}
+
 static inline void kvm_set_s2pmd_readonly(pmd_t *pmd)
 {
 	pmd_val(*pmd) = (pmd_val(*pmd) & ~L_PMD_S2_RDWR) | L_PMD_S2_RDONLY;
@@ -117,6 +122,11 @@ static inline bool kvm_s2pmd_readonly(pmd_t *pmd)
 	return (pmd_val(*pmd) & L_PMD_S2_RDWR) == L_PMD_S2_RDONLY;
 }
 
+static inline bool kvm_s2pmd_exec(pmd_t *pmd)
+{
+	return !(pmd_val(*pmd) & PMD_SECT_XN);
+}
+
 static inline bool kvm_page_empty(void *ptr)
 {
 	struct page *ptr_page = virt_to_page(ptr);
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 1e1b20cb348f..126abefffe7f 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -203,6 +203,11 @@ static inline bool kvm_s2pte_readonly(pte_t *pte)
 	return (pte_val(*pte) & PTE_S2_RDWR) == PTE_S2_RDONLY;
 }
 
+static inline bool kvm_s2pte_exec(pte_t *pte)
+{
+	return !(pte_val(*pte) & PTE_S2_XN);
+}
+
 static inline void kvm_set_s2pmd_readonly(pmd_t *pmd)
 {
 	kvm_set_s2pte_readonly((pte_t *)pmd);
@@ -213,6 +218,11 @@ static inline bool kvm_s2pmd_readonly(pmd_t *pmd)
 	return kvm_s2pte_readonly((pte_t *)pmd);
 }
 
+static inline bool kvm_s2pmd_exec(pmd_t *pmd)
+{
+	return !(pmd_val(*pmd) & PMD_S2_XN);
+}
+
 static inline bool kvm_page_empty(void *ptr)
 {
 	struct page *ptr_page = virt_to_page(ptr);
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index f956efbd933d..b83b5a8442bb 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -926,6 +926,25 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache
 	return 0;
 }
 
+static bool stage2_is_exec(struct kvm *kvm, phys_addr_t addr)
+{
+	pmd_t *pmdp;
+	pte_t *ptep;
+
+	pmdp = stage2_get_pmd(kvm, NULL, addr);
+	if (!pmdp || pmd_none(*pmdp) || !pmd_present(*pmdp))
+		return false;
+
+	if (pmd_thp_or_huge(*pmdp))
+		return kvm_s2pmd_exec(pmdp);
+
+	ptep = pte_offset_kernel(pmdp, addr);
+	if (!ptep || pte_none(*ptep) || !pte_present(*ptep))
+		return false;
+
+	return kvm_s2pte_exec(ptep);
+}
+
 static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
 			  phys_addr_t addr, const pte_t *new_pte,
 			  unsigned long flags)
@@ -1407,6 +1426,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		if (exec_fault) {
 			new_pmd = kvm_s2pmd_mkexec(new_pmd);
 			invalidate_icache_guest_page(vcpu, pfn, PMD_SIZE);
+		} else if (fault_status == FSC_PERM) {
+			/* Preserve execute if XN was already cleared */
+			if (stage2_is_exec(kvm, fault_ipa))
+				new_pmd = kvm_s2pmd_mkexec(new_pmd);
 		}
 
 		ret = stage2_set_pmd_huge(kvm, memcache, fault_ipa, &new_pmd);
@@ -1425,6 +1448,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		if (exec_fault) {
 			new_pte = kvm_s2pte_mkexec(new_pte);
 			invalidate_icache_guest_page(vcpu, pfn, PAGE_SIZE);
+		} else if (fault_status == FSC_PERM) {
+			/* Preserve execute if XN was already cleared */
+			if (stage2_is_exec(kvm, fault_ipa))
+				new_pte = kvm_s2pte_mkexec(new_pte);
 		}
 
 		ret = stage2_set_pte(kvm, memcache, fault_ipa, &new_pte, flags);
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v3 9/9] KVM: arm/arm64: Drop vcpu parameter from guest cache maintenance operartions
  2017-10-23 16:11 ` Marc Zyngier
@ 2017-10-23 16:11   ` Marc Zyngier
  -1 siblings, 0 replies; 52+ messages in thread
From: Marc Zyngier @ 2017-10-23 16:11 UTC (permalink / raw)
  To: Christoffer Dall, Catalin Marinas, Will Deacon
  Cc: linux-arm-kernel, kvm, kvmarm

The vcpu parameter isn't used for anything, and gets in the way of
further cleanups. Let's get rid of it.

Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm/include/asm/kvm_mmu.h   |  7 ++-----
 arch/arm64/include/asm/kvm_mmu.h |  7 ++-----
 virt/kvm/arm/mmu.c               | 18 ++++++++----------
 3 files changed, 12 insertions(+), 20 deletions(-)

diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index aab64fe52146..bc70a1f0f42d 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -150,9 +150,7 @@ static inline bool vcpu_has_cache_enabled(struct kvm_vcpu *vcpu)
 	return (vcpu_cp15(vcpu, c1_SCTLR) & 0b101) == 0b101;
 }
 
-static inline void __clean_dcache_guest_page(struct kvm_vcpu *vcpu,
-					     kvm_pfn_t pfn,
-					     unsigned long size)
+static inline void __clean_dcache_guest_page(kvm_pfn_t pfn, unsigned long size)
 {
 	/*
 	 * Clean the dcache to the Point of Coherency.
@@ -177,8 +175,7 @@ static inline void __clean_dcache_guest_page(struct kvm_vcpu *vcpu,
 	}
 }
 
-static inline void __invalidate_icache_guest_page(struct kvm_vcpu *vcpu,
-						  kvm_pfn_t pfn,
+static inline void __invalidate_icache_guest_page(kvm_pfn_t pfn,
 						  unsigned long size)
 {
 	u32 iclsz;
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 126abefffe7f..06f1f9794679 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -252,17 +252,14 @@ static inline bool vcpu_has_cache_enabled(struct kvm_vcpu *vcpu)
 	return (vcpu_sys_reg(vcpu, SCTLR_EL1) & 0b101) == 0b101;
 }
 
-static inline void __clean_dcache_guest_page(struct kvm_vcpu *vcpu,
-					     kvm_pfn_t pfn,
-					     unsigned long size)
+static inline void __clean_dcache_guest_page(kvm_pfn_t pfn, unsigned long size)
 {
 	void *va = page_address(pfn_to_page(pfn));
 
 	kvm_flush_dcache_to_poc(va, size);
 }
 
-static inline void __invalidate_icache_guest_page(struct kvm_vcpu *vcpu,
-						  kvm_pfn_t pfn,
+static inline void __invalidate_icache_guest_page(kvm_pfn_t pfn,
 						  unsigned long size)
 {
 	if (icache_is_aliasing()) {
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index b83b5a8442bb..a1ea43fa75cf 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -1276,16 +1276,14 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
 	kvm_mmu_write_protect_pt_masked(kvm, slot, gfn_offset, mask);
 }
 
-static void clean_dcache_guest_page(struct kvm_vcpu *vcpu, kvm_pfn_t pfn,
-				    unsigned long size)
+static void clean_dcache_guest_page(kvm_pfn_t pfn, unsigned long size)
 {
-	__clean_dcache_guest_page(vcpu, pfn, size);
+	__clean_dcache_guest_page(pfn, size);
 }
 
-static void invalidate_icache_guest_page(struct kvm_vcpu *vcpu, kvm_pfn_t pfn,
-					 unsigned long size)
+static void invalidate_icache_guest_page(kvm_pfn_t pfn, unsigned long size)
 {
-	__invalidate_icache_guest_page(vcpu, pfn, size);
+	__invalidate_icache_guest_page(pfn, size);
 }
 
 static void kvm_send_hwpoison_signal(unsigned long address,
@@ -1421,11 +1419,11 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		}
 
 		if (fault_status != FSC_PERM)
-			clean_dcache_guest_page(vcpu, pfn, PMD_SIZE);
+			clean_dcache_guest_page(pfn, PMD_SIZE);
 
 		if (exec_fault) {
 			new_pmd = kvm_s2pmd_mkexec(new_pmd);
-			invalidate_icache_guest_page(vcpu, pfn, PMD_SIZE);
+			invalidate_icache_guest_page(pfn, PMD_SIZE);
 		} else if (fault_status == FSC_PERM) {
 			/* Preserve execute if XN was already cleared */
 			if (stage2_is_exec(kvm, fault_ipa))
@@ -1443,11 +1441,11 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		}
 
 		if (fault_status != FSC_PERM)
-			clean_dcache_guest_page(vcpu, pfn, PAGE_SIZE);
+			clean_dcache_guest_page(pfn, PAGE_SIZE);
 
 		if (exec_fault) {
 			new_pte = kvm_s2pte_mkexec(new_pte);
-			invalidate_icache_guest_page(vcpu, pfn, PAGE_SIZE);
+			invalidate_icache_guest_page(pfn, PAGE_SIZE);
 		} else if (fault_status == FSC_PERM) {
 			/* Preserve execute if XN was already cleared */
 			if (stage2_is_exec(kvm, fault_ipa))
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v3 9/9] KVM: arm/arm64: Drop vcpu parameter from guest cache maintenance operartions
@ 2017-10-23 16:11   ` Marc Zyngier
  0 siblings, 0 replies; 52+ messages in thread
From: Marc Zyngier @ 2017-10-23 16:11 UTC (permalink / raw)
  To: linux-arm-kernel

The vcpu parameter isn't used for anything, and gets in the way of
further cleanups. Let's get rid of it.

Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm/include/asm/kvm_mmu.h   |  7 ++-----
 arch/arm64/include/asm/kvm_mmu.h |  7 ++-----
 virt/kvm/arm/mmu.c               | 18 ++++++++----------
 3 files changed, 12 insertions(+), 20 deletions(-)

diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index aab64fe52146..bc70a1f0f42d 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -150,9 +150,7 @@ static inline bool vcpu_has_cache_enabled(struct kvm_vcpu *vcpu)
 	return (vcpu_cp15(vcpu, c1_SCTLR) & 0b101) == 0b101;
 }
 
-static inline void __clean_dcache_guest_page(struct kvm_vcpu *vcpu,
-					     kvm_pfn_t pfn,
-					     unsigned long size)
+static inline void __clean_dcache_guest_page(kvm_pfn_t pfn, unsigned long size)
 {
 	/*
 	 * Clean the dcache to the Point of Coherency.
@@ -177,8 +175,7 @@ static inline void __clean_dcache_guest_page(struct kvm_vcpu *vcpu,
 	}
 }
 
-static inline void __invalidate_icache_guest_page(struct kvm_vcpu *vcpu,
-						  kvm_pfn_t pfn,
+static inline void __invalidate_icache_guest_page(kvm_pfn_t pfn,
 						  unsigned long size)
 {
 	u32 iclsz;
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 126abefffe7f..06f1f9794679 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -252,17 +252,14 @@ static inline bool vcpu_has_cache_enabled(struct kvm_vcpu *vcpu)
 	return (vcpu_sys_reg(vcpu, SCTLR_EL1) & 0b101) == 0b101;
 }
 
-static inline void __clean_dcache_guest_page(struct kvm_vcpu *vcpu,
-					     kvm_pfn_t pfn,
-					     unsigned long size)
+static inline void __clean_dcache_guest_page(kvm_pfn_t pfn, unsigned long size)
 {
 	void *va = page_address(pfn_to_page(pfn));
 
 	kvm_flush_dcache_to_poc(va, size);
 }
 
-static inline void __invalidate_icache_guest_page(struct kvm_vcpu *vcpu,
-						  kvm_pfn_t pfn,
+static inline void __invalidate_icache_guest_page(kvm_pfn_t pfn,
 						  unsigned long size)
 {
 	if (icache_is_aliasing()) {
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index b83b5a8442bb..a1ea43fa75cf 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -1276,16 +1276,14 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
 	kvm_mmu_write_protect_pt_masked(kvm, slot, gfn_offset, mask);
 }
 
-static void clean_dcache_guest_page(struct kvm_vcpu *vcpu, kvm_pfn_t pfn,
-				    unsigned long size)
+static void clean_dcache_guest_page(kvm_pfn_t pfn, unsigned long size)
 {
-	__clean_dcache_guest_page(vcpu, pfn, size);
+	__clean_dcache_guest_page(pfn, size);
 }
 
-static void invalidate_icache_guest_page(struct kvm_vcpu *vcpu, kvm_pfn_t pfn,
-					 unsigned long size)
+static void invalidate_icache_guest_page(kvm_pfn_t pfn, unsigned long size)
 {
-	__invalidate_icache_guest_page(vcpu, pfn, size);
+	__invalidate_icache_guest_page(pfn, size);
 }
 
 static void kvm_send_hwpoison_signal(unsigned long address,
@@ -1421,11 +1419,11 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		}
 
 		if (fault_status != FSC_PERM)
-			clean_dcache_guest_page(vcpu, pfn, PMD_SIZE);
+			clean_dcache_guest_page(pfn, PMD_SIZE);
 
 		if (exec_fault) {
 			new_pmd = kvm_s2pmd_mkexec(new_pmd);
-			invalidate_icache_guest_page(vcpu, pfn, PMD_SIZE);
+			invalidate_icache_guest_page(pfn, PMD_SIZE);
 		} else if (fault_status == FSC_PERM) {
 			/* Preserve execute if XN was already cleared */
 			if (stage2_is_exec(kvm, fault_ipa))
@@ -1443,11 +1441,11 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		}
 
 		if (fault_status != FSC_PERM)
-			clean_dcache_guest_page(vcpu, pfn, PAGE_SIZE);
+			clean_dcache_guest_page(pfn, PAGE_SIZE);
 
 		if (exec_fault) {
 			new_pte = kvm_s2pte_mkexec(new_pte);
-			invalidate_icache_guest_page(vcpu, pfn, PAGE_SIZE);
+			invalidate_icache_guest_page(pfn, PAGE_SIZE);
 		} else if (fault_status == FSC_PERM) {
 			/* Preserve execute if XN was already cleared */
 			if (stage2_is_exec(kvm, fault_ipa))
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 3/9] arm64: KVM: Add invalidate_icache_range helper
  2017-10-23 16:11   ` Marc Zyngier
@ 2017-10-23 16:19     ` Will Deacon
  -1 siblings, 0 replies; 52+ messages in thread
From: Will Deacon @ 2017-10-23 16:19 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Christoffer Dall, Catalin Marinas, linux-arm-kernel, kvm, kvmarm

On Mon, Oct 23, 2017 at 05:11:16PM +0100, Marc Zyngier wrote:
> We currently tightly couple dcache clean with icache invalidation,
> but KVM could do without the initial flush to PoU, as we've
> already flushed things to PoC.
> 
> Let's introduce invalidate_icache_range which is limited to
> invalidating the icache from the linear mapping (and thus
> has none of the userspace fault handling complexity), and
> wire it in KVM instead of flush_icache_range.
> 
> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  arch/arm64/include/asm/assembler.h  | 21 +++++++++++++++++++++
>  arch/arm64/include/asm/cacheflush.h |  7 +++++++
>  arch/arm64/include/asm/kvm_mmu.h    |  4 ++--
>  arch/arm64/mm/cache.S               | 32 ++++++++++++++++++++++----------
>  4 files changed, 52 insertions(+), 12 deletions(-)

Acked-by: Will Deacon <will.deacon@arm.com>

Thanks for respinning this.

Will

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v3 3/9] arm64: KVM: Add invalidate_icache_range helper
@ 2017-10-23 16:19     ` Will Deacon
  0 siblings, 0 replies; 52+ messages in thread
From: Will Deacon @ 2017-10-23 16:19 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Oct 23, 2017 at 05:11:16PM +0100, Marc Zyngier wrote:
> We currently tightly couple dcache clean with icache invalidation,
> but KVM could do without the initial flush to PoU, as we've
> already flushed things to PoC.
> 
> Let's introduce invalidate_icache_range which is limited to
> invalidating the icache from the linear mapping (and thus
> has none of the userspace fault handling complexity), and
> wire it in KVM instead of flush_icache_range.
> 
> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  arch/arm64/include/asm/assembler.h  | 21 +++++++++++++++++++++
>  arch/arm64/include/asm/cacheflush.h |  7 +++++++
>  arch/arm64/include/asm/kvm_mmu.h    |  4 ++--
>  arch/arm64/mm/cache.S               | 32 ++++++++++++++++++++++----------
>  4 files changed, 52 insertions(+), 12 deletions(-)

Acked-by: Will Deacon <will.deacon@arm.com>

Thanks for respinning this.

Will

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 6/9] KVM: arm/arm64: Limit icache invalidation to prefetch aborts
  2017-10-23 16:11   ` Marc Zyngier
@ 2017-11-01 10:17     ` Andrew Jones
  -1 siblings, 0 replies; 52+ messages in thread
From: Andrew Jones @ 2017-11-01 10:17 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Christoffer Dall, Catalin Marinas, Will Deacon, linux-arm-kernel,
	kvm, kvmarm

On Mon, Oct 23, 2017 at 05:11:19PM +0100, Marc Zyngier wrote:
> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> index 2174244f6317..0417c8e2a81c 100644
> --- a/virt/kvm/arm/mmu.c
> +++ b/virt/kvm/arm/mmu.c
> @@ -1292,7 +1292,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  			  unsigned long fault_status)
>  {
>  	int ret;
> -	bool write_fault, writable, hugetlb = false, force_pte = false;
> +	bool write_fault, exec_fault, writable, hugetlb = false, force_pte = false;
>  	unsigned long mmu_seq;
>  	gfn_t gfn = fault_ipa >> PAGE_SHIFT;
>  	struct kvm *kvm = vcpu->kvm;
> @@ -1304,7 +1304,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	unsigned long flags = 0;
>  
>  	write_fault = kvm_is_write_fault(vcpu);
> -	if (fault_status == FSC_PERM && !write_fault) {
> +	exec_fault = kvm_vcpu_trap_is_iabt(vcpu);
> +	VM_BUG_ON(write_fault && exec_fault);

This VM_BUG_ON can never fire as long as kvm_is_write_fault() is
defined as

 {
   if (kvm_vcpu_trap_is_iabt(vcpu))
       return false;
   return kvm_vcpu_dabt_iswrite(vcpu);
 }

Thanks,
drew

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v3 6/9] KVM: arm/arm64: Limit icache invalidation to prefetch aborts
@ 2017-11-01 10:17     ` Andrew Jones
  0 siblings, 0 replies; 52+ messages in thread
From: Andrew Jones @ 2017-11-01 10:17 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Oct 23, 2017 at 05:11:19PM +0100, Marc Zyngier wrote:
> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> index 2174244f6317..0417c8e2a81c 100644
> --- a/virt/kvm/arm/mmu.c
> +++ b/virt/kvm/arm/mmu.c
> @@ -1292,7 +1292,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  			  unsigned long fault_status)
>  {
>  	int ret;
> -	bool write_fault, writable, hugetlb = false, force_pte = false;
> +	bool write_fault, exec_fault, writable, hugetlb = false, force_pte = false;
>  	unsigned long mmu_seq;
>  	gfn_t gfn = fault_ipa >> PAGE_SHIFT;
>  	struct kvm *kvm = vcpu->kvm;
> @@ -1304,7 +1304,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	unsigned long flags = 0;
>  
>  	write_fault = kvm_is_write_fault(vcpu);
> -	if (fault_status == FSC_PERM && !write_fault) {
> +	exec_fault = kvm_vcpu_trap_is_iabt(vcpu);
> +	VM_BUG_ON(write_fault && exec_fault);

This VM_BUG_ON can never fire as long as kvm_is_write_fault() is
defined as

 {
   if (kvm_vcpu_trap_is_iabt(vcpu))
       return false;
   return kvm_vcpu_dabt_iswrite(vcpu);
 }

Thanks,
drew

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 6/9] KVM: arm/arm64: Limit icache invalidation to prefetch aborts
  2017-11-01 10:17     ` Andrew Jones
@ 2017-11-02 10:36       ` Marc Zyngier
  -1 siblings, 0 replies; 52+ messages in thread
From: Marc Zyngier @ 2017-11-02 10:36 UTC (permalink / raw)
  To: Andrew Jones; +Cc: kvm, Catalin Marinas, Will Deacon, kvmarm, linux-arm-kernel

On Wed, Nov 01 2017 at 11:17:27 am GMT, Andrew Jones <drjones@redhat.com> wrote:
> On Mon, Oct 23, 2017 at 05:11:19PM +0100, Marc Zyngier wrote:
>> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
>> index 2174244f6317..0417c8e2a81c 100644
>> --- a/virt/kvm/arm/mmu.c
>> +++ b/virt/kvm/arm/mmu.c
>> @@ -1292,7 +1292,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>  			  unsigned long fault_status)
>>  {
>>  	int ret;
>> -	bool write_fault, writable, hugetlb = false, force_pte = false;
>> +	bool write_fault, exec_fault, writable, hugetlb = false, force_pte = false;
>>  	unsigned long mmu_seq;
>>  	gfn_t gfn = fault_ipa >> PAGE_SHIFT;
>>  	struct kvm *kvm = vcpu->kvm;
>> @@ -1304,7 +1304,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>  	unsigned long flags = 0;
>>  
>>  	write_fault = kvm_is_write_fault(vcpu);
>> -	if (fault_status == FSC_PERM && !write_fault) {
>> +	exec_fault = kvm_vcpu_trap_is_iabt(vcpu);
>> +	VM_BUG_ON(write_fault && exec_fault);
>
> This VM_BUG_ON can never fire as long as kvm_is_write_fault() is
> defined as
>
>  {
>    if (kvm_vcpu_trap_is_iabt(vcpu))
>        return false;
>    return kvm_vcpu_dabt_iswrite(vcpu);
>  }

That's indeed what I expect. But given that the code now relies on this
property, I chose to make it explicit.

Or are you seeing a better way of making this an invariant?

Thanks,

	M.
-- 
Jazz is not dead, it just smell funny.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v3 6/9] KVM: arm/arm64: Limit icache invalidation to prefetch aborts
@ 2017-11-02 10:36       ` Marc Zyngier
  0 siblings, 0 replies; 52+ messages in thread
From: Marc Zyngier @ 2017-11-02 10:36 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Nov 01 2017 at 11:17:27 am GMT, Andrew Jones <drjones@redhat.com> wrote:
> On Mon, Oct 23, 2017 at 05:11:19PM +0100, Marc Zyngier wrote:
>> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
>> index 2174244f6317..0417c8e2a81c 100644
>> --- a/virt/kvm/arm/mmu.c
>> +++ b/virt/kvm/arm/mmu.c
>> @@ -1292,7 +1292,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>  			  unsigned long fault_status)
>>  {
>>  	int ret;
>> -	bool write_fault, writable, hugetlb = false, force_pte = false;
>> +	bool write_fault, exec_fault, writable, hugetlb = false, force_pte = false;
>>  	unsigned long mmu_seq;
>>  	gfn_t gfn = fault_ipa >> PAGE_SHIFT;
>>  	struct kvm *kvm = vcpu->kvm;
>> @@ -1304,7 +1304,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>  	unsigned long flags = 0;
>>  
>>  	write_fault = kvm_is_write_fault(vcpu);
>> -	if (fault_status == FSC_PERM && !write_fault) {
>> +	exec_fault = kvm_vcpu_trap_is_iabt(vcpu);
>> +	VM_BUG_ON(write_fault && exec_fault);
>
> This VM_BUG_ON can never fire as long as kvm_is_write_fault() is
> defined as
>
>  {
>    if (kvm_vcpu_trap_is_iabt(vcpu))
>        return false;
>    return kvm_vcpu_dabt_iswrite(vcpu);
>  }

That's indeed what I expect. But given that the code now relies on this
property, I chose to make it explicit.

Or are you seeing a better way of making this an invariant?

Thanks,

	M.
-- 
Jazz is not dead, it just smell funny.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 6/9] KVM: arm/arm64: Limit icache invalidation to prefetch aborts
  2017-11-02 10:36       ` Marc Zyngier
@ 2017-11-02 13:13         ` Andrew Jones
  -1 siblings, 0 replies; 52+ messages in thread
From: Andrew Jones @ 2017-11-02 13:13 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Christoffer Dall, Catalin Marinas, Will Deacon, linux-arm-kernel,
	kvm, kvmarm

On Thu, Nov 02, 2017 at 10:36:35AM +0000, Marc Zyngier wrote:
> On Wed, Nov 01 2017 at 11:17:27 am GMT, Andrew Jones <drjones@redhat.com> wrote:
> > On Mon, Oct 23, 2017 at 05:11:19PM +0100, Marc Zyngier wrote:
> >> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> >> index 2174244f6317..0417c8e2a81c 100644
> >> --- a/virt/kvm/arm/mmu.c
> >> +++ b/virt/kvm/arm/mmu.c
> >> @@ -1292,7 +1292,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >>  			  unsigned long fault_status)
> >>  {
> >>  	int ret;
> >> -	bool write_fault, writable, hugetlb = false, force_pte = false;
> >> +	bool write_fault, exec_fault, writable, hugetlb = false, force_pte = false;
> >>  	unsigned long mmu_seq;
> >>  	gfn_t gfn = fault_ipa >> PAGE_SHIFT;
> >>  	struct kvm *kvm = vcpu->kvm;
> >> @@ -1304,7 +1304,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >>  	unsigned long flags = 0;
> >>  
> >>  	write_fault = kvm_is_write_fault(vcpu);
> >> -	if (fault_status == FSC_PERM && !write_fault) {
> >> +	exec_fault = kvm_vcpu_trap_is_iabt(vcpu);
> >> +	VM_BUG_ON(write_fault && exec_fault);
> >
> > This VM_BUG_ON can never fire as long as kvm_is_write_fault() is
> > defined as
> >
> >  {
> >    if (kvm_vcpu_trap_is_iabt(vcpu))
> >        return false;
> >    return kvm_vcpu_dabt_iswrite(vcpu);
> >  }
> 
> That's indeed what I expect. But given that the code now relies on this
> property, I chose to make it explicit.
> 
> Or are you seeing a better way of making this an invariant?
>

No, I wish I did, because then I wouldn't have to apologize for the
noise :-) The VM_BUG_ON() does indeed improve the code by documenting/
enforcing the requirement.

Thanks,
drew

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v3 6/9] KVM: arm/arm64: Limit icache invalidation to prefetch aborts
@ 2017-11-02 13:13         ` Andrew Jones
  0 siblings, 0 replies; 52+ messages in thread
From: Andrew Jones @ 2017-11-02 13:13 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Nov 02, 2017 at 10:36:35AM +0000, Marc Zyngier wrote:
> On Wed, Nov 01 2017 at 11:17:27 am GMT, Andrew Jones <drjones@redhat.com> wrote:
> > On Mon, Oct 23, 2017 at 05:11:19PM +0100, Marc Zyngier wrote:
> >> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> >> index 2174244f6317..0417c8e2a81c 100644
> >> --- a/virt/kvm/arm/mmu.c
> >> +++ b/virt/kvm/arm/mmu.c
> >> @@ -1292,7 +1292,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >>  			  unsigned long fault_status)
> >>  {
> >>  	int ret;
> >> -	bool write_fault, writable, hugetlb = false, force_pte = false;
> >> +	bool write_fault, exec_fault, writable, hugetlb = false, force_pte = false;
> >>  	unsigned long mmu_seq;
> >>  	gfn_t gfn = fault_ipa >> PAGE_SHIFT;
> >>  	struct kvm *kvm = vcpu->kvm;
> >> @@ -1304,7 +1304,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >>  	unsigned long flags = 0;
> >>  
> >>  	write_fault = kvm_is_write_fault(vcpu);
> >> -	if (fault_status == FSC_PERM && !write_fault) {
> >> +	exec_fault = kvm_vcpu_trap_is_iabt(vcpu);
> >> +	VM_BUG_ON(write_fault && exec_fault);
> >
> > This VM_BUG_ON can never fire as long as kvm_is_write_fault() is
> > defined as
> >
> >  {
> >    if (kvm_vcpu_trap_is_iabt(vcpu))
> >        return false;
> >    return kvm_vcpu_dabt_iswrite(vcpu);
> >  }
> 
> That's indeed what I expect. But given that the code now relies on this
> property, I chose to make it explicit.
> 
> Or are you seeing a better way of making this an invariant?
>

No, I wish I did, because then I wouldn't have to apologize for the
noise :-) The VM_BUG_ON() does indeed improve the code by documenting/
enforcing the requirement.

Thanks,
drew

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 7/9] KVM: arm/arm64: Only clean the dcache on translation fault
  2017-10-23 16:11   ` Marc Zyngier
@ 2018-08-21 13:35     ` Alexander Graf
  -1 siblings, 0 replies; 52+ messages in thread
From: Alexander Graf @ 2018-08-21 13:35 UTC (permalink / raw)
  To: Marc Zyngier, Christoffer Dall, Catalin Marinas, Will Deacon
  Cc: linux-arm-kernel, kvm, kvmarm

On 10/23/2017 06:11 PM, Marc Zyngier wrote:
> The only case where we actually need to perform a dcache maintenance
> is when we map the page for the first time, and subsequent permission
> faults do not require cache maintenance. Let's make it conditional
> on not being a permission fault (and thus a translation fault).
>
> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

This patch unfortunately breaks something on Hi1616 SoCs when running 
32bit guests. With this patch applied (and thus with 4.18) I get random 
illegal instruction warnings from 32bit code inside VMs. I do not know 
at this point whether this affects other CPUs as well.

If anyone is interested in a reproducer, I have something handy. But for 
now I believe we should just revert this patch.


Alex

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v3 7/9] KVM: arm/arm64: Only clean the dcache on translation fault
@ 2018-08-21 13:35     ` Alexander Graf
  0 siblings, 0 replies; 52+ messages in thread
From: Alexander Graf @ 2018-08-21 13:35 UTC (permalink / raw)
  To: linux-arm-kernel

On 10/23/2017 06:11 PM, Marc Zyngier wrote:
> The only case where we actually need to perform a dcache maintenance
> is when we map the page for the first time, and subsequent permission
> faults do not require cache maintenance. Let's make it conditional
> on not being a permission fault (and thus a translation fault).
>
> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

This patch unfortunately breaks something on Hi1616 SoCs when running 
32bit guests. With this patch applied (and thus with 4.18) I get random 
illegal instruction warnings from 32bit code inside VMs. I do not know 
at this point whether this affects other CPUs as well.

If anyone is interested in a reproducer, I have something handy. But for 
now I believe we should just revert this patch.


Alex

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 7/9] KVM: arm/arm64: Only clean the dcache on translation fault
  2018-08-21 13:35     ` Alexander Graf
@ 2018-08-21 13:42       ` Alexander Graf
  -1 siblings, 0 replies; 52+ messages in thread
From: Alexander Graf @ 2018-08-21 13:42 UTC (permalink / raw)
  To: Marc Zyngier, Christoffer Dall, Catalin Marinas, Will Deacon
  Cc: Dirk Müller, kvm, linux-arm-kernel, kvmarm

On 08/21/2018 03:35 PM, Alexander Graf wrote:
> On 10/23/2017 06:11 PM, Marc Zyngier wrote:
>> The only case where we actually need to perform a dcache maintenance
>> is when we map the page for the first time, and subsequent permission
>> faults do not require cache maintenance. Let's make it conditional
>> on not being a permission fault (and thus a translation fault).
>>
>> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>
> This patch unfortunately breaks something on Hi1616 SoCs when running 
> 32bit guests. With this patch applied (and thus with 4.18) I get 
> random illegal instruction warnings from 32bit code inside VMs. I do 
> not know at this point whether this affects other CPUs as well.
>
> If anyone is interested in a reproducer, I have something handy. But 
> for now I believe we should just revert this patch.

Ok, I'm slightly confused. The patch in question is already upstream 
since 4.16, but the regression reportedly came with the switch from 4.17 
to 4.18. I'll try to bisect it down a bit further ...


Alex

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v3 7/9] KVM: arm/arm64: Only clean the dcache on translation fault
@ 2018-08-21 13:42       ` Alexander Graf
  0 siblings, 0 replies; 52+ messages in thread
From: Alexander Graf @ 2018-08-21 13:42 UTC (permalink / raw)
  To: linux-arm-kernel

On 08/21/2018 03:35 PM, Alexander Graf wrote:
> On 10/23/2017 06:11 PM, Marc Zyngier wrote:
>> The only case where we actually need to perform a dcache maintenance
>> is when we map the page for the first time, and subsequent permission
>> faults do not require cache maintenance. Let's make it conditional
>> on not being a permission fault (and thus a translation fault).
>>
>> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>
> This patch unfortunately breaks something on Hi1616 SoCs when running 
> 32bit guests. With this patch applied (and thus with 4.18) I get 
> random illegal instruction warnings from 32bit code inside VMs. I do 
> not know at this point whether this affects other CPUs as well.
>
> If anyone is interested in a reproducer, I have something handy. But 
> for now I believe we should just revert this patch.

Ok, I'm slightly confused. The patch in question is already upstream 
since 4.16, but the regression reportedly came with the switch from 4.17 
to 4.18. I'll try to bisect it down a bit further ...


Alex

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 7/9] KVM: arm/arm64: Only clean the dcache on translation fault
  2018-08-21 13:35     ` Alexander Graf
@ 2018-08-21 13:57       ` Marc Zyngier
  -1 siblings, 0 replies; 52+ messages in thread
From: Marc Zyngier @ 2018-08-21 13:57 UTC (permalink / raw)
  To: Alexander Graf, Christoffer Dall, Catalin Marinas, Will Deacon
  Cc: linux-arm-kernel, kvm, kvmarm

On 21/08/18 14:35, Alexander Graf wrote:
> On 10/23/2017 06:11 PM, Marc Zyngier wrote:
>> The only case where we actually need to perform a dcache maintenance
>> is when we map the page for the first time, and subsequent permission
>> faults do not require cache maintenance. Let's make it conditional
>> on not being a permission fault (and thus a translation fault).
>>
>> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> 
> This patch unfortunately breaks something on Hi1616 SoCs when running 
> 32bit guests. With this patch applied (and thus with 4.18) I get random 
> illegal instruction warnings from 32bit code inside VMs. I do not know 
> at this point whether this affects other CPUs as well.

Can you please give a few more details?

- what are the CPUs on this Hi1616? At least a /proc/cpuinfo would help

- an example of the crash? Is it within the decompressor? After? This
things do matter, given the number of crazy things the 32bit kernel does

- a host kernel configuration?

> If anyone is interested in a reproducer, I have something handy. But for 
> now I believe we should just revert this patch.

Before we revert anything, I'd like to understand what is happening.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v3 7/9] KVM: arm/arm64: Only clean the dcache on translation fault
@ 2018-08-21 13:57       ` Marc Zyngier
  0 siblings, 0 replies; 52+ messages in thread
From: Marc Zyngier @ 2018-08-21 13:57 UTC (permalink / raw)
  To: linux-arm-kernel

On 21/08/18 14:35, Alexander Graf wrote:
> On 10/23/2017 06:11 PM, Marc Zyngier wrote:
>> The only case where we actually need to perform a dcache maintenance
>> is when we map the page for the first time, and subsequent permission
>> faults do not require cache maintenance. Let's make it conditional
>> on not being a permission fault (and thus a translation fault).
>>
>> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> 
> This patch unfortunately breaks something on Hi1616 SoCs when running 
> 32bit guests. With this patch applied (and thus with 4.18) I get random 
> illegal instruction warnings from 32bit code inside VMs. I do not know 
> at this point whether this affects other CPUs as well.

Can you please give a few more details?

- what are the CPUs on this Hi1616? At least a /proc/cpuinfo would help

- an example of the crash? Is it within the decompressor? After? This
things do matter, given the number of crazy things the 32bit kernel does

- a host kernel configuration?

> If anyone is interested in a reproducer, I have something handy. But for 
> now I believe we should just revert this patch.

Before we revert anything, I'd like to understand what is happening.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 7/9] KVM: arm/arm64: Only clean the dcache on translation fault
  2018-08-21 13:57       ` Marc Zyngier
@ 2018-08-21 14:08         ` Alexander Graf
  -1 siblings, 0 replies; 52+ messages in thread
From: Alexander Graf @ 2018-08-21 14:08 UTC (permalink / raw)
  To: Marc Zyngier, Christoffer Dall, Catalin Marinas, Will Deacon
  Cc: linux-arm-kernel, kvm, kvmarm

On 08/21/2018 03:57 PM, Marc Zyngier wrote:
> On 21/08/18 14:35, Alexander Graf wrote:
>> On 10/23/2017 06:11 PM, Marc Zyngier wrote:
>>> The only case where we actually need to perform a dcache maintenance
>>> is when we map the page for the first time, and subsequent permission
>>> faults do not require cache maintenance. Let's make it conditional
>>> on not being a permission fault (and thus a translation fault).
>>>
>>> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
>>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>> This patch unfortunately breaks something on Hi1616 SoCs when running
>> 32bit guests. With this patch applied (and thus with 4.18) I get random
>> illegal instruction warnings from 32bit code inside VMs. I do not know
>> at this point whether this affects other CPUs as well.
> Can you please give a few more details?
>
> - what are the CPUs on this Hi1616? At least a /proc/cpuinfo would help

These are A72s:

processor    : 0
BogoMIPS    : 100.00
Features    : fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
CPU implementer    : 0x41
CPU architecture: 8
CPU variant    : 0x0
CPU part    : 0xd08
CPU revision    : 2

>
> - an example of the crash? Is it within the decompressor? After? This
> things do matter, given the number of crazy things the 32bit kernel does

They are always in user space. My current reproducer is this:

   $ while rpm -qa > /dev/null; do :; done

If I run this in parallel with something that just populates RAM (dd 
if=/dev/nvme0n1 of=/dev/null bs=10G) I get an illegal instruction fault 
within seconds:

sh-4.4# while rpm -qa > /dev/null; do true; done
Illegal instruction (core dumped)


> - a host kernel configuration?

Host kernel configuration is just the normal openSUSE one:

https://kernel.opensuse.org/cgit/kernel-source/plain/config/arm64/default?h=stable

>> If anyone is interested in a reproducer, I have something handy. But for
>> now I believe we should just revert this patch.
> Before we revert anything, I'd like to understand what is happening.

Yeah, I didn't realize the commit is already in since 4.16, so I agree. 
I'll bisect a bit, but it may take a while.


Alex

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v3 7/9] KVM: arm/arm64: Only clean the dcache on translation fault
@ 2018-08-21 14:08         ` Alexander Graf
  0 siblings, 0 replies; 52+ messages in thread
From: Alexander Graf @ 2018-08-21 14:08 UTC (permalink / raw)
  To: linux-arm-kernel

On 08/21/2018 03:57 PM, Marc Zyngier wrote:
> On 21/08/18 14:35, Alexander Graf wrote:
>> On 10/23/2017 06:11 PM, Marc Zyngier wrote:
>>> The only case where we actually need to perform a dcache maintenance
>>> is when we map the page for the first time, and subsequent permission
>>> faults do not require cache maintenance. Let's make it conditional
>>> on not being a permission fault (and thus a translation fault).
>>>
>>> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
>>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>> This patch unfortunately breaks something on Hi1616 SoCs when running
>> 32bit guests. With this patch applied (and thus with 4.18) I get random
>> illegal instruction warnings from 32bit code inside VMs. I do not know
>> at this point whether this affects other CPUs as well.
> Can you please give a few more details?
>
> - what are the CPUs on this Hi1616? At least a /proc/cpuinfo would help

These are A72s:

processor??? : 0
BogoMIPS??? : 100.00
Features??? : fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
CPU implementer??? : 0x41
CPU architecture: 8
CPU variant??? : 0x0
CPU part??? : 0xd08
CPU revision??? : 2

>
> - an example of the crash? Is it within the decompressor? After? This
> things do matter, given the number of crazy things the 32bit kernel does

They are always in user space. My current reproducer is this:

 ? $ while rpm -qa > /dev/null; do :; done

If I run this in parallel with something that just populates RAM (dd 
if=/dev/nvme0n1 of=/dev/null bs=10G) I get an illegal instruction fault 
within seconds:

sh-4.4# while rpm -qa > /dev/null; do true; done
Illegal instruction (core dumped)


> - a host kernel configuration?

Host kernel configuration is just the normal openSUSE one:

https://kernel.opensuse.org/cgit/kernel-source/plain/config/arm64/default?h=stable

>> If anyone is interested in a reproducer, I have something handy. But for
>> now I believe we should just revert this patch.
> Before we revert anything, I'd like to understand what is happening.

Yeah, I didn't realize the commit is already in since 4.16, so I agree. 
I'll bisect a bit, but it may take a while.


Alex

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 7/9] KVM: arm/arm64: Only clean the dcache on translation fault
  2018-08-21 14:08         ` Alexander Graf
@ 2018-08-21 15:08           ` Marc Zyngier
  -1 siblings, 0 replies; 52+ messages in thread
From: Marc Zyngier @ 2018-08-21 15:08 UTC (permalink / raw)
  To: Alexander Graf, Christoffer Dall, Catalin Marinas, Will Deacon
  Cc: linux-arm-kernel, kvm, kvmarm

On 21/08/18 15:08, Alexander Graf wrote:
> On 08/21/2018 03:57 PM, Marc Zyngier wrote:
>> On 21/08/18 14:35, Alexander Graf wrote:
>>> On 10/23/2017 06:11 PM, Marc Zyngier wrote:
>>>> The only case where we actually need to perform a dcache maintenance
>>>> is when we map the page for the first time, and subsequent permission
>>>> faults do not require cache maintenance. Let's make it conditional
>>>> on not being a permission fault (and thus a translation fault).
>>>>
>>>> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
>>>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>>> This patch unfortunately breaks something on Hi1616 SoCs when running
>>> 32bit guests. With this patch applied (and thus with 4.18) I get random
>>> illegal instruction warnings from 32bit code inside VMs. I do not know
>>> at this point whether this affects other CPUs as well.
>> Can you please give a few more details?
>>
>> - what are the CPUs on this Hi1616? At least a /proc/cpuinfo would help
> 
> These are A72s:
> 
> processor    : 0
> BogoMIPS    : 100.00
> Features    : fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
> CPU implementer    : 0x41
> CPU architecture: 8
> CPU variant    : 0x0
> CPU part    : 0xd08
> CPU revision    : 2
> 
>>
>> - an example of the crash? Is it within the decompressor? After? This
>> things do matter, given the number of crazy things the 32bit kernel does
> 
> They are always in user space. My current reproducer is this:
> 
>    $ while rpm -qa > /dev/null; do :; done
> 
> If I run this in parallel with something that just populates RAM (dd 
> if=/dev/nvme0n1 of=/dev/null bs=10G) I get an illegal instruction fault 
> within seconds:
> 
> sh-4.4# while rpm -qa > /dev/null; do true; done
> Illegal instruction (core dumped)
> 
> 
>> - a host kernel configuration?
> 
> Host kernel configuration is just the normal openSUSE one:
> 
> https://kernel.opensuse.org/cgit/kernel-source/plain/config/arm64/default?h=stable
> 
>>> If anyone is interested in a reproducer, I have something handy. But for
>>> now I believe we should just revert this patch.
>> Before we revert anything, I'd like to understand what is happening.
> 
> Yeah, I didn't realize the commit is already in since 4.16, so I agree. 
> I'll bisect a bit, but it may take a while.

Do you mind giving this a try?

diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index 1d90d79706bd..df8f3d5eaa22 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -1531,7 +1536,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 			kvm_set_pfn_dirty(pfn);
 		}
 
-		if (fault_status != FSC_PERM)
+		if (fault_status != FSC_PERM || write_fault)
 			clean_dcache_guest_page(pfn, PMD_SIZE);
 
 		if (exec_fault) {
@@ -1553,7 +1558,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 			mark_page_dirty(kvm, gfn);
 		}
 
-		if (fault_status != FSC_PERM)
+		if (fault_status != FSC_PERM || write_fault)
 			clean_dcache_guest_page(pfn, PAGE_SIZE);
 
 		if (exec_fault) {


The missing logic is that a write from the guest could have triggered
a CoW, meaning we definitely need to flush it in that case too. It
fixes a kvm-unit-test regression here.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v3 7/9] KVM: arm/arm64: Only clean the dcache on translation fault
@ 2018-08-21 15:08           ` Marc Zyngier
  0 siblings, 0 replies; 52+ messages in thread
From: Marc Zyngier @ 2018-08-21 15:08 UTC (permalink / raw)
  To: linux-arm-kernel

On 21/08/18 15:08, Alexander Graf wrote:
> On 08/21/2018 03:57 PM, Marc Zyngier wrote:
>> On 21/08/18 14:35, Alexander Graf wrote:
>>> On 10/23/2017 06:11 PM, Marc Zyngier wrote:
>>>> The only case where we actually need to perform a dcache maintenance
>>>> is when we map the page for the first time, and subsequent permission
>>>> faults do not require cache maintenance. Let's make it conditional
>>>> on not being a permission fault (and thus a translation fault).
>>>>
>>>> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
>>>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>>> This patch unfortunately breaks something on Hi1616 SoCs when running
>>> 32bit guests. With this patch applied (and thus with 4.18) I get random
>>> illegal instruction warnings from 32bit code inside VMs. I do not know
>>> at this point whether this affects other CPUs as well.
>> Can you please give a few more details?
>>
>> - what are the CPUs on this Hi1616? At least a /proc/cpuinfo would help
> 
> These are A72s:
> 
> processor??? : 0
> BogoMIPS??? : 100.00
> Features??? : fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
> CPU implementer??? : 0x41
> CPU architecture: 8
> CPU variant??? : 0x0
> CPU part??? : 0xd08
> CPU revision??? : 2
> 
>>
>> - an example of the crash? Is it within the decompressor? After? This
>> things do matter, given the number of crazy things the 32bit kernel does
> 
> They are always in user space. My current reproducer is this:
> 
>  ? $ while rpm -qa > /dev/null; do :; done
> 
> If I run this in parallel with something that just populates RAM (dd 
> if=/dev/nvme0n1 of=/dev/null bs=10G) I get an illegal instruction fault 
> within seconds:
> 
> sh-4.4# while rpm -qa > /dev/null; do true; done
> Illegal instruction (core dumped)
> 
> 
>> - a host kernel configuration?
> 
> Host kernel configuration is just the normal openSUSE one:
> 
> https://kernel.opensuse.org/cgit/kernel-source/plain/config/arm64/default?h=stable
> 
>>> If anyone is interested in a reproducer, I have something handy. But for
>>> now I believe we should just revert this patch.
>> Before we revert anything, I'd like to understand what is happening.
> 
> Yeah, I didn't realize the commit is already in since 4.16, so I agree. 
> I'll bisect a bit, but it may take a while.

Do you mind giving this a try?

diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index 1d90d79706bd..df8f3d5eaa22 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -1531,7 +1536,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 			kvm_set_pfn_dirty(pfn);
 		}
 
-		if (fault_status != FSC_PERM)
+		if (fault_status != FSC_PERM || write_fault)
 			clean_dcache_guest_page(pfn, PMD_SIZE);
 
 		if (exec_fault) {
@@ -1553,7 +1558,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 			mark_page_dirty(kvm, gfn);
 		}
 
-		if (fault_status != FSC_PERM)
+		if (fault_status != FSC_PERM || write_fault)
 			clean_dcache_guest_page(pfn, PAGE_SIZE);
 
 		if (exec_fault) {


The missing logic is that a write from the guest could have triggered
a CoW, meaning we definitely need to flush it in that case too. It
fixes a kvm-unit-test regression here.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 7/9] KVM: arm/arm64: Only clean the dcache on translation fault
  2018-08-21 14:08         ` Alexander Graf
@ 2018-08-21 16:45           ` Alexander Graf
  -1 siblings, 0 replies; 52+ messages in thread
From: Alexander Graf @ 2018-08-21 16:45 UTC (permalink / raw)
  To: Marc Zyngier, Christoffer Dall, Catalin Marinas, Will Deacon
  Cc: kvm, linux-arm-kernel, kvmarm

On 08/21/2018 04:08 PM, Alexander Graf wrote:
> On 08/21/2018 03:57 PM, Marc Zyngier wrote:
>> On 21/08/18 14:35, Alexander Graf wrote:
>>> On 10/23/2017 06:11 PM, Marc Zyngier wrote:
>>>> The only case where we actually need to perform a dcache maintenance
>>>> is when we map the page for the first time, and subsequent permission
>>>> faults do not require cache maintenance. Let's make it conditional
>>>> on not being a permission fault (and thus a translation fault).
>>>>
>>>> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
>>>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>>> This patch unfortunately breaks something on Hi1616 SoCs when running
>>> 32bit guests. With this patch applied (and thus with 4.18) I get random
>>> illegal instruction warnings from 32bit code inside VMs. I do not know
>>> at this point whether this affects other CPUs as well.
>> Can you please give a few more details?
>>
>> - what are the CPUs on this Hi1616? At least a /proc/cpuinfo would help
>
> These are A72s:
>
> processor    : 0
> BogoMIPS    : 100.00
> Features    : fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
> CPU implementer    : 0x41
> CPU architecture: 8
> CPU variant    : 0x0
> CPU part    : 0xd08
> CPU revision    : 2
>
>>
>> - an example of the crash? Is it within the decompressor? After? This
>> things do matter, given the number of crazy things the 32bit kernel does
>
> They are always in user space. My current reproducer is this:
>
>   $ while rpm -qa > /dev/null; do :; done
>
> If I run this in parallel with something that just populates RAM (dd 
> if=/dev/nvme0n1 of=/dev/null bs=10G) I get an illegal instruction 
> fault within seconds:
>
> sh-4.4# while rpm -qa > /dev/null; do true; done
> Illegal instruction (core dumped)
>
>
>> - a host kernel configuration?
>
> Host kernel configuration is just the normal openSUSE one:
>
> https://kernel.opensuse.org/cgit/kernel-source/plain/config/arm64/default?h=stable 
>
>
>>> If anyone is interested in a reproducer, I have something handy. But 
>>> for
>>> now I believe we should just revert this patch.
>> Before we revert anything, I'd like to understand what is happening.
>
> Yeah, I didn't realize the commit is already in since 4.16, so I 
> agree. I'll bisect a bit, but it may take a while.

I'm stuck on bisect. Somewhere in the merge window things broke so badly 
that udev just segfaults on boot. I got this far:

$ git bisect log
git bisect start
# good: [29dcea88779c856c7dc92040a0c01233263101d4] Linux 4.17
git bisect good 29dcea88779c856c7dc92040a0c01233263101d4
# bad: [529bea37411759c2b5b41a187b3020723c67c16d] Linux 4.18.1
git bisect bad 529bea37411759c2b5b41a187b3020723c67c16d
# good: [3036bc45364f98515a2c446d7fac2c34dcfbeff4] Merge tag 
'media/v4.18-2' of 
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
git bisect good 3036bc45364f98515a2c446d7fac2c34dcfbeff4
# good: [721afaa2aeb860067decdddadc84ed16f42f2048] Merge tag 'armsoc-dt' 
of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
git bisect good 721afaa2aeb860067decdddadc84ed16f42f2048
# good: [e7441c9274a6a5453e06f4c2b8b5f72eca0a3f17] mac80211: disable 
BHs/preemption in ieee80211_tx_control_port()
git bisect good e7441c9274a6a5453e06f4c2b8b5f72eca0a3f17
# bad: [05df204549c510c7c56e58d25098c448998a0cd5] Merge tag 
'devicetree-fixes-for-4.18' of 
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
git bisect bad 05df204549c510c7c56e58d25098c448998a0cd5
# good: [b19b9282093588e73401f9d4981310a8de975f7d] Merge tag 
'riscv-for-linus-4.18-rc4' of 
git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux
git bisect good b19b9282093588e73401f9d4981310a8de975f7d
# good: [bf642e3a1996f1ed8f083c5ecd4b51270a9e11bc] Merge tag 
'drm-intel-fixes-2018-07-12' of 
git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
git bisect good bf642e3a1996f1ed8f083c5ecd4b51270a9e11bc
# bad: [2a7e1211e130c51a2b5743ecf247645ac8e936ee] Merge tag 
'vfio-v4.18-rc5' of git://github.com/awilliam/linux-vfio
git bisect bad 2a7e1211e130c51a2b5743ecf247645ac8e936ee
# bad: [a74aa9676c0256eac05efb2c8e3127e0835e66e5] Merge tag 
'char-misc-4.18-rc5' of 
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
git bisect bad a74aa9676c0256eac05efb2c8e3127e0835e66e5
# bad: [f1454959ad89f9fe2b6862fa3c41070feaffeab9] Merge tag 
'mmc-v4.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
git bisect bad f1454959ad89f9fe2b6862fa3c41070feaffeab9
# bad: [1e09177acae32a61586af26d83ca5ef591cdcaf5] Merge tag 
'mips_fixes_4.18_3' of 
git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux
git bisect bad 1e09177acae32a61586af26d83ca5ef591cdcaf5
# good: [4f65245f2d178b9cba48350620d76faa4a098841] HID: hiddev: fix 
potential Spectre v1
git bisect good 4f65245f2d178b9cba48350620d76faa4a098841



Alex

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v3 7/9] KVM: arm/arm64: Only clean the dcache on translation fault
@ 2018-08-21 16:45           ` Alexander Graf
  0 siblings, 0 replies; 52+ messages in thread
From: Alexander Graf @ 2018-08-21 16:45 UTC (permalink / raw)
  To: linux-arm-kernel

On 08/21/2018 04:08 PM, Alexander Graf wrote:
> On 08/21/2018 03:57 PM, Marc Zyngier wrote:
>> On 21/08/18 14:35, Alexander Graf wrote:
>>> On 10/23/2017 06:11 PM, Marc Zyngier wrote:
>>>> The only case where we actually need to perform a dcache maintenance
>>>> is when we map the page for the first time, and subsequent permission
>>>> faults do not require cache maintenance. Let's make it conditional
>>>> on not being a permission fault (and thus a translation fault).
>>>>
>>>> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
>>>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>>> This patch unfortunately breaks something on Hi1616 SoCs when running
>>> 32bit guests. With this patch applied (and thus with 4.18) I get random
>>> illegal instruction warnings from 32bit code inside VMs. I do not know
>>> at this point whether this affects other CPUs as well.
>> Can you please give a few more details?
>>
>> - what are the CPUs on this Hi1616? At least a /proc/cpuinfo would help
>
> These are A72s:
>
> processor??? : 0
> BogoMIPS??? : 100.00
> Features??? : fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
> CPU implementer??? : 0x41
> CPU architecture: 8
> CPU variant??? : 0x0
> CPU part??? : 0xd08
> CPU revision??? : 2
>
>>
>> - an example of the crash? Is it within the decompressor? After? This
>> things do matter, given the number of crazy things the 32bit kernel does
>
> They are always in user space. My current reproducer is this:
>
> ? $ while rpm -qa > /dev/null; do :; done
>
> If I run this in parallel with something that just populates RAM (dd 
> if=/dev/nvme0n1 of=/dev/null bs=10G) I get an illegal instruction 
> fault within seconds:
>
> sh-4.4# while rpm -qa > /dev/null; do true; done
> Illegal instruction (core dumped)
>
>
>> - a host kernel configuration?
>
> Host kernel configuration is just the normal openSUSE one:
>
> https://kernel.opensuse.org/cgit/kernel-source/plain/config/arm64/default?h=stable 
>
>
>>> If anyone is interested in a reproducer, I have something handy. But 
>>> for
>>> now I believe we should just revert this patch.
>> Before we revert anything, I'd like to understand what is happening.
>
> Yeah, I didn't realize the commit is already in since 4.16, so I 
> agree. I'll bisect a bit, but it may take a while.

I'm stuck on bisect. Somewhere in the merge window things broke so badly 
that udev just segfaults on boot. I got this far:

$ git bisect log
git bisect start
# good: [29dcea88779c856c7dc92040a0c01233263101d4] Linux 4.17
git bisect good 29dcea88779c856c7dc92040a0c01233263101d4
# bad: [529bea37411759c2b5b41a187b3020723c67c16d] Linux 4.18.1
git bisect bad 529bea37411759c2b5b41a187b3020723c67c16d
# good: [3036bc45364f98515a2c446d7fac2c34dcfbeff4] Merge tag 
'media/v4.18-2' of 
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
git bisect good 3036bc45364f98515a2c446d7fac2c34dcfbeff4
# good: [721afaa2aeb860067decdddadc84ed16f42f2048] Merge tag 'armsoc-dt' 
of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
git bisect good 721afaa2aeb860067decdddadc84ed16f42f2048
# good: [e7441c9274a6a5453e06f4c2b8b5f72eca0a3f17] mac80211: disable 
BHs/preemption in ieee80211_tx_control_port()
git bisect good e7441c9274a6a5453e06f4c2b8b5f72eca0a3f17
# bad: [05df204549c510c7c56e58d25098c448998a0cd5] Merge tag 
'devicetree-fixes-for-4.18' of 
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
git bisect bad 05df204549c510c7c56e58d25098c448998a0cd5
# good: [b19b9282093588e73401f9d4981310a8de975f7d] Merge tag 
'riscv-for-linus-4.18-rc4' of 
git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux
git bisect good b19b9282093588e73401f9d4981310a8de975f7d
# good: [bf642e3a1996f1ed8f083c5ecd4b51270a9e11bc] Merge tag 
'drm-intel-fixes-2018-07-12' of 
git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
git bisect good bf642e3a1996f1ed8f083c5ecd4b51270a9e11bc
# bad: [2a7e1211e130c51a2b5743ecf247645ac8e936ee] Merge tag 
'vfio-v4.18-rc5' of git://github.com/awilliam/linux-vfio
git bisect bad 2a7e1211e130c51a2b5743ecf247645ac8e936ee
# bad: [a74aa9676c0256eac05efb2c8e3127e0835e66e5] Merge tag 
'char-misc-4.18-rc5' of 
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
git bisect bad a74aa9676c0256eac05efb2c8e3127e0835e66e5
# bad: [f1454959ad89f9fe2b6862fa3c41070feaffeab9] Merge tag 
'mmc-v4.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
git bisect bad f1454959ad89f9fe2b6862fa3c41070feaffeab9
# bad: [1e09177acae32a61586af26d83ca5ef591cdcaf5] Merge tag 
'mips_fixes_4.18_3' of 
git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux
git bisect bad 1e09177acae32a61586af26d83ca5ef591cdcaf5
# good: [4f65245f2d178b9cba48350620d76faa4a098841] HID: hiddev: fix 
potential Spectre v1
git bisect good 4f65245f2d178b9cba48350620d76faa4a098841



Alex

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 7/9] KVM: arm/arm64: Only clean the dcache on translation fault
  2018-08-21 15:08           ` Marc Zyngier
@ 2018-08-21 16:54             ` Alexander Graf
  -1 siblings, 0 replies; 52+ messages in thread
From: Alexander Graf @ 2018-08-21 16:54 UTC (permalink / raw)
  To: Marc Zyngier, Christoffer Dall, Catalin Marinas, Will Deacon
  Cc: linux-arm-kernel, kvm, kvmarm

On 08/21/2018 05:08 PM, Marc Zyngier wrote:
> On 21/08/18 15:08, Alexander Graf wrote:
>> On 08/21/2018 03:57 PM, Marc Zyngier wrote:
>>> On 21/08/18 14:35, Alexander Graf wrote:
>>>> On 10/23/2017 06:11 PM, Marc Zyngier wrote:
>>>>> The only case where we actually need to perform a dcache maintenance
>>>>> is when we map the page for the first time, and subsequent permission
>>>>> faults do not require cache maintenance. Let's make it conditional
>>>>> on not being a permission fault (and thus a translation fault).
>>>>>
>>>>> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
>>>>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>>>> This patch unfortunately breaks something on Hi1616 SoCs when running
>>>> 32bit guests. With this patch applied (and thus with 4.18) I get random
>>>> illegal instruction warnings from 32bit code inside VMs. I do not know
>>>> at this point whether this affects other CPUs as well.
>>> Can you please give a few more details?
>>>
>>> - what are the CPUs on this Hi1616? At least a /proc/cpuinfo would help
>> These are A72s:
>>
>> processor    : 0
>> BogoMIPS    : 100.00
>> Features    : fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
>> CPU implementer    : 0x41
>> CPU architecture: 8
>> CPU variant    : 0x0
>> CPU part    : 0xd08
>> CPU revision    : 2
>>
>>> - an example of the crash? Is it within the decompressor? After? This
>>> things do matter, given the number of crazy things the 32bit kernel does
>> They are always in user space. My current reproducer is this:
>>
>>     $ while rpm -qa > /dev/null; do :; done
>>
>> If I run this in parallel with something that just populates RAM (dd
>> if=/dev/nvme0n1 of=/dev/null bs=10G) I get an illegal instruction fault
>> within seconds:
>>
>> sh-4.4# while rpm -qa > /dev/null; do true; done
>> Illegal instruction (core dumped)
>>
>>
>>> - a host kernel configuration?
>> Host kernel configuration is just the normal openSUSE one:
>>
>> https://kernel.opensuse.org/cgit/kernel-source/plain/config/arm64/default?h=stable
>>
>>>> If anyone is interested in a reproducer, I have something handy. But for
>>>> now I believe we should just revert this patch.
>>> Before we revert anything, I'd like to understand what is happening.
>> Yeah, I didn't realize the commit is already in since 4.16, so I agree.
>> I'll bisect a bit, but it may take a while.
> Do you mind giving this a try?
>
> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> index 1d90d79706bd..df8f3d5eaa22 100644
> --- a/virt/kvm/arm/mmu.c
> +++ b/virt/kvm/arm/mmu.c
> @@ -1531,7 +1536,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>   			kvm_set_pfn_dirty(pfn);
>   		}
>   
> -		if (fault_status != FSC_PERM)
> +		if (fault_status != FSC_PERM || write_fault)
>   			clean_dcache_guest_page(pfn, PMD_SIZE);
>   
>   		if (exec_fault) {
> @@ -1553,7 +1558,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>   			mark_page_dirty(kvm, gfn);
>   		}
>   
> -		if (fault_status != FSC_PERM)
> +		if (fault_status != FSC_PERM || write_fault)
>   			clean_dcache_guest_page(pfn, PAGE_SIZE);
>   
>   		if (exec_fault) {
>
>
> The missing logic is that a write from the guest could have triggered
> a CoW, meaning we definitely need to flush it in that case too. It
> fixes a kvm-unit-test regression here.

This patch unfortunately does not fix the issue. I still see illegal 
instructions.


Alex

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v3 7/9] KVM: arm/arm64: Only clean the dcache on translation fault
@ 2018-08-21 16:54             ` Alexander Graf
  0 siblings, 0 replies; 52+ messages in thread
From: Alexander Graf @ 2018-08-21 16:54 UTC (permalink / raw)
  To: linux-arm-kernel

On 08/21/2018 05:08 PM, Marc Zyngier wrote:
> On 21/08/18 15:08, Alexander Graf wrote:
>> On 08/21/2018 03:57 PM, Marc Zyngier wrote:
>>> On 21/08/18 14:35, Alexander Graf wrote:
>>>> On 10/23/2017 06:11 PM, Marc Zyngier wrote:
>>>>> The only case where we actually need to perform a dcache maintenance
>>>>> is when we map the page for the first time, and subsequent permission
>>>>> faults do not require cache maintenance. Let's make it conditional
>>>>> on not being a permission fault (and thus a translation fault).
>>>>>
>>>>> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
>>>>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>>>> This patch unfortunately breaks something on Hi1616 SoCs when running
>>>> 32bit guests. With this patch applied (and thus with 4.18) I get random
>>>> illegal instruction warnings from 32bit code inside VMs. I do not know
>>>> at this point whether this affects other CPUs as well.
>>> Can you please give a few more details?
>>>
>>> - what are the CPUs on this Hi1616? At least a /proc/cpuinfo would help
>> These are A72s:
>>
>> processor??? : 0
>> BogoMIPS??? : 100.00
>> Features??? : fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
>> CPU implementer??? : 0x41
>> CPU architecture: 8
>> CPU variant??? : 0x0
>> CPU part??? : 0xd08
>> CPU revision??? : 2
>>
>>> - an example of the crash? Is it within the decompressor? After? This
>>> things do matter, given the number of crazy things the 32bit kernel does
>> They are always in user space. My current reproducer is this:
>>
>>   ? $ while rpm -qa > /dev/null; do :; done
>>
>> If I run this in parallel with something that just populates RAM (dd
>> if=/dev/nvme0n1 of=/dev/null bs=10G) I get an illegal instruction fault
>> within seconds:
>>
>> sh-4.4# while rpm -qa > /dev/null; do true; done
>> Illegal instruction (core dumped)
>>
>>
>>> - a host kernel configuration?
>> Host kernel configuration is just the normal openSUSE one:
>>
>> https://kernel.opensuse.org/cgit/kernel-source/plain/config/arm64/default?h=stable
>>
>>>> If anyone is interested in a reproducer, I have something handy. But for
>>>> now I believe we should just revert this patch.
>>> Before we revert anything, I'd like to understand what is happening.
>> Yeah, I didn't realize the commit is already in since 4.16, so I agree.
>> I'll bisect a bit, but it may take a while.
> Do you mind giving this a try?
>
> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> index 1d90d79706bd..df8f3d5eaa22 100644
> --- a/virt/kvm/arm/mmu.c
> +++ b/virt/kvm/arm/mmu.c
> @@ -1531,7 +1536,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>   			kvm_set_pfn_dirty(pfn);
>   		}
>   
> -		if (fault_status != FSC_PERM)
> +		if (fault_status != FSC_PERM || write_fault)
>   			clean_dcache_guest_page(pfn, PMD_SIZE);
>   
>   		if (exec_fault) {
> @@ -1553,7 +1558,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>   			mark_page_dirty(kvm, gfn);
>   		}
>   
> -		if (fault_status != FSC_PERM)
> +		if (fault_status != FSC_PERM || write_fault)
>   			clean_dcache_guest_page(pfn, PAGE_SIZE);
>   
>   		if (exec_fault) {
>
>
> The missing logic is that a write from the guest could have triggered
> a CoW, meaning we definitely need to flush it in that case too. It
> fixes a kvm-unit-test regression here.

This patch unfortunately does not fix the issue. I still see illegal 
instructions.


Alex

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 7/9] KVM: arm/arm64: Only clean the dcache on translation fault
  2018-08-21 16:54             ` Alexander Graf
@ 2018-08-23 11:16               ` Marc Zyngier
  -1 siblings, 0 replies; 52+ messages in thread
From: Marc Zyngier @ 2018-08-23 11:16 UTC (permalink / raw)
  To: Alexander Graf, Christoffer Dall, Catalin Marinas, Will Deacon
  Cc: Dave Martin, linux-arm-kernel, kvm, kvmarm

On 21/08/18 17:54, Alexander Graf wrote:
> On 08/21/2018 05:08 PM, Marc Zyngier wrote:
>> On 21/08/18 15:08, Alexander Graf wrote:
>>> On 08/21/2018 03:57 PM, Marc Zyngier wrote:
>>>> On 21/08/18 14:35, Alexander Graf wrote:
>>>>> On 10/23/2017 06:11 PM, Marc Zyngier wrote:
>>>>>> The only case where we actually need to perform a dcache maintenance
>>>>>> is when we map the page for the first time, and subsequent permission
>>>>>> faults do not require cache maintenance. Let's make it conditional
>>>>>> on not being a permission fault (and thus a translation fault).
>>>>>>
>>>>>> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
>>>>>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>>>>> This patch unfortunately breaks something on Hi1616 SoCs when running
>>>>> 32bit guests. With this patch applied (and thus with 4.18) I get random
>>>>> illegal instruction warnings from 32bit code inside VMs. I do not know
>>>>> at this point whether this affects other CPUs as well.
>>>> Can you please give a few more details?
>>>>
>>>> - what are the CPUs on this Hi1616? At least a /proc/cpuinfo would help
>>> These are A72s:
>>>
>>> processor    : 0
>>> BogoMIPS    : 100.00
>>> Features    : fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
>>> CPU implementer    : 0x41
>>> CPU architecture: 8
>>> CPU variant    : 0x0
>>> CPU part    : 0xd08
>>> CPU revision    : 2
>>>
>>>> - an example of the crash? Is it within the decompressor? After? This
>>>> things do matter, given the number of crazy things the 32bit kernel does
>>> They are always in user space. My current reproducer is this:
>>>
>>>     $ while rpm -qa > /dev/null; do :; done
>>>
>>> If I run this in parallel with something that just populates RAM (dd
>>> if=/dev/nvme0n1 of=/dev/null bs=10G) I get an illegal instruction fault
>>> within seconds:
>>>
>>> sh-4.4# while rpm -qa > /dev/null; do true; done
>>> Illegal instruction (core dumped)
>>>
>>>
>>>> - a host kernel configuration?
>>> Host kernel configuration is just the normal openSUSE one:
>>>
>>> https://kernel.opensuse.org/cgit/kernel-source/plain/config/arm64/default?h=stable
>>>
>>>>> If anyone is interested in a reproducer, I have something handy. But for
>>>>> now I believe we should just revert this patch.
>>>> Before we revert anything, I'd like to understand what is happening.
>>> Yeah, I didn't realize the commit is already in since 4.16, so I agree.
>>> I'll bisect a bit, but it may take a while.
>> Do you mind giving this a try?
>>
>> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
>> index 1d90d79706bd..df8f3d5eaa22 100644
>> --- a/virt/kvm/arm/mmu.c
>> +++ b/virt/kvm/arm/mmu.c
>> @@ -1531,7 +1536,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>   			kvm_set_pfn_dirty(pfn);
>>   		}
>>   
>> -		if (fault_status != FSC_PERM)
>> +		if (fault_status != FSC_PERM || write_fault)
>>   			clean_dcache_guest_page(pfn, PMD_SIZE);
>>   
>>   		if (exec_fault) {
>> @@ -1553,7 +1558,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>   			mark_page_dirty(kvm, gfn);
>>   		}
>>   
>> -		if (fault_status != FSC_PERM)
>> +		if (fault_status != FSC_PERM || write_fault)
>>   			clean_dcache_guest_page(pfn, PAGE_SIZE);
>>   
>>   		if (exec_fault) {
>>
>>
>> The missing logic is that a write from the guest could have triggered
>> a CoW, meaning we definitely need to flush it in that case too. It
>> fixes a kvm-unit-test regression here.
> 
> This patch unfortunately does not fix the issue. I still see illegal 
> instructions.

[+Dave]

That's because what you're observing has nothing to do with caching, but 
with FP/SIMD trapping instead. Thanks to the guest image you've provided,
I've been able to extract the following:

[    3.944273] systemd-udevd (266): undefined instruction: pc=(ptrval)
[    3.945396] CPU: 0 PID: 266 Comm: systemd-udevd Not tainted 4.17.14-2-lpae #1 openSUSE Tumbleweed (unreleased)
[    3.947130] Hardware name: Generic DT based system
[    3.947976] PC is at 0xb6b9397a
[    3.948547] LR is at 0xb6e9a1b0
[    3.958291] pc : [<b6b9397a>]    lr : [<b6e9a1b0>]    psr: 20070030
[    3.959664] sp : bebe36a0  ip : b6f6ad50  fp : 005e8784
[    3.960601] r10: bebe3814  r9 : bebe36c0  r8 : bebe36bc
[    3.961522] r7 : bebe3814  r6 : 00000074  r5 : 00000000  r4 : 00000000
[    3.962661] r3 : 005f2b60  r2 : 00000074  r1 : 00000000  r0 : bebe3814
[    3.963801] Flags: nzCv  IRQs on  FIQs on  Mode USER_32  ISA Thumb  Segment user
[    4.000606] Control: 30c5383d  Table: 6a8e7400  DAC: fffffffd
[    4.001659] Code: d1f9 f1a0 0001 4770 (eee0) 1b10 

maz@flakes:~/armv7-build-fail$ (echo .thumb; echo .inst.w 0xeee01b10) > x.S &&  arm-linux-gnueabihf-gcc -c x.S && arm-linux-gnueabihf-objdump -d x.o

x.o:     file format elf32-littlearm


Disassembly of section .text:

00000000 <.text>:
   0:	eee0 1b10 	vdup.8	q0, r1

A VFP instruction. Given that you've reported that things worked in
4.17 and broke in 4.18, I strongly suspect the new lazy FPSIMD code.
Upon inspection, the way we setup trapping for 32bit is a tiny bit
suspect.

Could you please give this patch a go? My Seattle has been running
with it for 30 minutes now, and it is still running (instead of
failing with seconds).

Thanks,

	M.

From f2d1d43e2429f9269ab6c565440e095ab3a9be89 Mon Sep 17 00:00:00 2001
From: Marc Zyngier <marc.zyngier@arm.com>
Date: Thu, 23 Aug 2018 11:51:43 +0100
Subject: [PATCH] arm64: KVM: Only force FPEXC32_EL2.EN if trapping FPSIMD

If trapping FPSIMD in the context of an AArch32 guest, it is critical
to set FPEXC32_EL2.EN to 1 so that the trapping is taken to EL2 and
not EL1.

Conversely, it is just as critical *not* to set FPEXC32_EL2.EN to 1
if we're not going to trap FPSIMD, as we then corrupt the existing
VFP state.

Moving the call to __activate_traps_fpsimd32 to the point where we
know for sure that we are going to trap ensures that we don't set that
bit spuriously.

Fixes: e6b673b741ea ("KVM: arm64: Optimise FPSIMD handling to reduce guest/host thrashing")
Cc: Dave Martin <dave.martin@arm.com>
Reported-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm64/kvm/hyp/switch.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index d496ef579859..ca46153d7915 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -98,8 +98,10 @@ static void activate_traps_vhe(struct kvm_vcpu *vcpu)
 	val = read_sysreg(cpacr_el1);
 	val |= CPACR_EL1_TTA;
 	val &= ~CPACR_EL1_ZEN;
-	if (!update_fp_enabled(vcpu))
+	if (!update_fp_enabled(vcpu)) {
 		val &= ~CPACR_EL1_FPEN;
+		__activate_traps_fpsimd32(vcpu);
+	}
 
 	write_sysreg(val, cpacr_el1);
 
@@ -114,8 +116,10 @@ static void __hyp_text __activate_traps_nvhe(struct kvm_vcpu *vcpu)
 
 	val = CPTR_EL2_DEFAULT;
 	val |= CPTR_EL2_TTA | CPTR_EL2_TZ;
-	if (!update_fp_enabled(vcpu))
+	if (!update_fp_enabled(vcpu)) {
 		val |= CPTR_EL2_TFP;
+		__activate_traps_fpsimd32(vcpu);
+	}
 
 	write_sysreg(val, cptr_el2);
 }
@@ -129,7 +133,6 @@ static void __hyp_text __activate_traps(struct kvm_vcpu *vcpu)
 	if (cpus_have_const_cap(ARM64_HAS_RAS_EXTN) && (hcr & HCR_VSE))
 		write_sysreg_s(vcpu->arch.vsesr_el2, SYS_VSESR_EL2);
 
-	__activate_traps_fpsimd32(vcpu);
 	if (has_vhe())
 		activate_traps_vhe(vcpu);
 	else
-- 
2.18.0


-- 
Jazz is not dead. It just smells funny...
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v3 7/9] KVM: arm/arm64: Only clean the dcache on translation fault
@ 2018-08-23 11:16               ` Marc Zyngier
  0 siblings, 0 replies; 52+ messages in thread
From: Marc Zyngier @ 2018-08-23 11:16 UTC (permalink / raw)
  To: linux-arm-kernel

On 21/08/18 17:54, Alexander Graf wrote:
> On 08/21/2018 05:08 PM, Marc Zyngier wrote:
>> On 21/08/18 15:08, Alexander Graf wrote:
>>> On 08/21/2018 03:57 PM, Marc Zyngier wrote:
>>>> On 21/08/18 14:35, Alexander Graf wrote:
>>>>> On 10/23/2017 06:11 PM, Marc Zyngier wrote:
>>>>>> The only case where we actually need to perform a dcache maintenance
>>>>>> is when we map the page for the first time, and subsequent permission
>>>>>> faults do not require cache maintenance. Let's make it conditional
>>>>>> on not being a permission fault (and thus a translation fault).
>>>>>>
>>>>>> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
>>>>>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>>>>> This patch unfortunately breaks something on Hi1616 SoCs when running
>>>>> 32bit guests. With this patch applied (and thus with 4.18) I get random
>>>>> illegal instruction warnings from 32bit code inside VMs. I do not know
>>>>> at this point whether this affects other CPUs as well.
>>>> Can you please give a few more details?
>>>>
>>>> - what are the CPUs on this Hi1616? At least a /proc/cpuinfo would help
>>> These are A72s:
>>>
>>> processor??? : 0
>>> BogoMIPS??? : 100.00
>>> Features??? : fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
>>> CPU implementer??? : 0x41
>>> CPU architecture: 8
>>> CPU variant??? : 0x0
>>> CPU part??? : 0xd08
>>> CPU revision??? : 2
>>>
>>>> - an example of the crash? Is it within the decompressor? After? This
>>>> things do matter, given the number of crazy things the 32bit kernel does
>>> They are always in user space. My current reproducer is this:
>>>
>>>   ? $ while rpm -qa > /dev/null; do :; done
>>>
>>> If I run this in parallel with something that just populates RAM (dd
>>> if=/dev/nvme0n1 of=/dev/null bs=10G) I get an illegal instruction fault
>>> within seconds:
>>>
>>> sh-4.4# while rpm -qa > /dev/null; do true; done
>>> Illegal instruction (core dumped)
>>>
>>>
>>>> - a host kernel configuration?
>>> Host kernel configuration is just the normal openSUSE one:
>>>
>>> https://kernel.opensuse.org/cgit/kernel-source/plain/config/arm64/default?h=stable
>>>
>>>>> If anyone is interested in a reproducer, I have something handy. But for
>>>>> now I believe we should just revert this patch.
>>>> Before we revert anything, I'd like to understand what is happening.
>>> Yeah, I didn't realize the commit is already in since 4.16, so I agree.
>>> I'll bisect a bit, but it may take a while.
>> Do you mind giving this a try?
>>
>> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
>> index 1d90d79706bd..df8f3d5eaa22 100644
>> --- a/virt/kvm/arm/mmu.c
>> +++ b/virt/kvm/arm/mmu.c
>> @@ -1531,7 +1536,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>   			kvm_set_pfn_dirty(pfn);
>>   		}
>>   
>> -		if (fault_status != FSC_PERM)
>> +		if (fault_status != FSC_PERM || write_fault)
>>   			clean_dcache_guest_page(pfn, PMD_SIZE);
>>   
>>   		if (exec_fault) {
>> @@ -1553,7 +1558,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>   			mark_page_dirty(kvm, gfn);
>>   		}
>>   
>> -		if (fault_status != FSC_PERM)
>> +		if (fault_status != FSC_PERM || write_fault)
>>   			clean_dcache_guest_page(pfn, PAGE_SIZE);
>>   
>>   		if (exec_fault) {
>>
>>
>> The missing logic is that a write from the guest could have triggered
>> a CoW, meaning we definitely need to flush it in that case too. It
>> fixes a kvm-unit-test regression here.
> 
> This patch unfortunately does not fix the issue. I still see illegal 
> instructions.

[+Dave]

That's because what you're observing has nothing to do with caching, but 
with FP/SIMD trapping instead. Thanks to the guest image you've provided,
I've been able to extract the following:

[    3.944273] systemd-udevd (266): undefined instruction: pc=(ptrval)
[    3.945396] CPU: 0 PID: 266 Comm: systemd-udevd Not tainted 4.17.14-2-lpae #1 openSUSE Tumbleweed (unreleased)
[    3.947130] Hardware name: Generic DT based system
[    3.947976] PC is at 0xb6b9397a
[    3.948547] LR is at 0xb6e9a1b0
[    3.958291] pc : [<b6b9397a>]    lr : [<b6e9a1b0>]    psr: 20070030
[    3.959664] sp : bebe36a0  ip : b6f6ad50  fp : 005e8784
[    3.960601] r10: bebe3814  r9 : bebe36c0  r8 : bebe36bc
[    3.961522] r7 : bebe3814  r6 : 00000074  r5 : 00000000  r4 : 00000000
[    3.962661] r3 : 005f2b60  r2 : 00000074  r1 : 00000000  r0 : bebe3814
[    3.963801] Flags: nzCv  IRQs on  FIQs on  Mode USER_32  ISA Thumb  Segment user
[    4.000606] Control: 30c5383d  Table: 6a8e7400  DAC: fffffffd
[    4.001659] Code: d1f9 f1a0 0001 4770 (eee0) 1b10 

maz at flakes:~/armv7-build-fail$ (echo .thumb; echo .inst.w 0xeee01b10) > x.S &&  arm-linux-gnueabihf-gcc -c x.S && arm-linux-gnueabihf-objdump -d x.o

x.o:     file format elf32-littlearm


Disassembly of section .text:

00000000 <.text>:
   0:	eee0 1b10 	vdup.8	q0, r1

A VFP instruction. Given that you've reported that things worked in
4.17 and broke in 4.18, I strongly suspect the new lazy FPSIMD code.
Upon inspection, the way we setup trapping for 32bit is a tiny bit
suspect.

Could you please give this patch a go? My Seattle has been running
with it for 30 minutes now, and it is still running (instead of
failing with seconds).

Thanks,

	M.

>From f2d1d43e2429f9269ab6c565440e095ab3a9be89 Mon Sep 17 00:00:00 2001
From: Marc Zyngier <marc.zyngier@arm.com>
Date: Thu, 23 Aug 2018 11:51:43 +0100
Subject: [PATCH] arm64: KVM: Only force FPEXC32_EL2.EN if trapping FPSIMD

If trapping FPSIMD in the context of an AArch32 guest, it is critical
to set FPEXC32_EL2.EN to 1 so that the trapping is taken to EL2 and
not EL1.

Conversely, it is just as critical *not* to set FPEXC32_EL2.EN to 1
if we're not going to trap FPSIMD, as we then corrupt the existing
VFP state.

Moving the call to __activate_traps_fpsimd32 to the point where we
know for sure that we are going to trap ensures that we don't set that
bit spuriously.

Fixes: e6b673b741ea ("KVM: arm64: Optimise FPSIMD handling to reduce guest/host thrashing")
Cc: Dave Martin <dave.martin@arm.com>
Reported-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm64/kvm/hyp/switch.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index d496ef579859..ca46153d7915 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -98,8 +98,10 @@ static void activate_traps_vhe(struct kvm_vcpu *vcpu)
 	val = read_sysreg(cpacr_el1);
 	val |= CPACR_EL1_TTA;
 	val &= ~CPACR_EL1_ZEN;
-	if (!update_fp_enabled(vcpu))
+	if (!update_fp_enabled(vcpu)) {
 		val &= ~CPACR_EL1_FPEN;
+		__activate_traps_fpsimd32(vcpu);
+	}
 
 	write_sysreg(val, cpacr_el1);
 
@@ -114,8 +116,10 @@ static void __hyp_text __activate_traps_nvhe(struct kvm_vcpu *vcpu)
 
 	val = CPTR_EL2_DEFAULT;
 	val |= CPTR_EL2_TTA | CPTR_EL2_TZ;
-	if (!update_fp_enabled(vcpu))
+	if (!update_fp_enabled(vcpu)) {
 		val |= CPTR_EL2_TFP;
+		__activate_traps_fpsimd32(vcpu);
+	}
 
 	write_sysreg(val, cptr_el2);
 }
@@ -129,7 +133,6 @@ static void __hyp_text __activate_traps(struct kvm_vcpu *vcpu)
 	if (cpus_have_const_cap(ARM64_HAS_RAS_EXTN) && (hcr & HCR_VSE))
 		write_sysreg_s(vcpu->arch.vsesr_el2, SYS_VSESR_EL2);
 
-	__activate_traps_fpsimd32(vcpu);
 	if (has_vhe())
 		activate_traps_vhe(vcpu);
 	else
-- 
2.18.0


-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 7/9] KVM: arm/arm64: Only clean the dcache on translation fault
  2018-08-23 11:16               ` Marc Zyngier
@ 2018-08-23 12:24                 ` Alexander Graf
  -1 siblings, 0 replies; 52+ messages in thread
From: Alexander Graf @ 2018-08-23 12:24 UTC (permalink / raw)
  To: Marc Zyngier, Christoffer Dall, Catalin Marinas, Will Deacon
  Cc: Dave Martin, linux-arm-kernel, kvm, kvmarm

On 08/23/2018 01:16 PM, Marc Zyngier wrote:
> On 21/08/18 17:54, Alexander Graf wrote:
>> On 08/21/2018 05:08 PM, Marc Zyngier wrote:
>>> On 21/08/18 15:08, Alexander Graf wrote:
>>>> On 08/21/2018 03:57 PM, Marc Zyngier wrote:
>>>>> On 21/08/18 14:35, Alexander Graf wrote:
>>>>>> On 10/23/2017 06:11 PM, Marc Zyngier wrote:
>>>>>>> The only case where we actually need to perform a dcache maintenance
>>>>>>> is when we map the page for the first time, and subsequent permission
>>>>>>> faults do not require cache maintenance. Let's make it conditional
>>>>>>> on not being a permission fault (and thus a translation fault).
>>>>>>>
>>>>>>> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
>>>>>>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>>>>>> This patch unfortunately breaks something on Hi1616 SoCs when running
>>>>>> 32bit guests. With this patch applied (and thus with 4.18) I get random
>>>>>> illegal instruction warnings from 32bit code inside VMs. I do not know
>>>>>> at this point whether this affects other CPUs as well.
>>>>> Can you please give a few more details?
>>>>>
>>>>> - what are the CPUs on this Hi1616? At least a /proc/cpuinfo would help
>>>> These are A72s:
>>>>
>>>> processor    : 0
>>>> BogoMIPS    : 100.00
>>>> Features    : fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
>>>> CPU implementer    : 0x41
>>>> CPU architecture: 8
>>>> CPU variant    : 0x0
>>>> CPU part    : 0xd08
>>>> CPU revision    : 2
>>>>
>>>>> - an example of the crash? Is it within the decompressor? After? This
>>>>> things do matter, given the number of crazy things the 32bit kernel does
>>>> They are always in user space. My current reproducer is this:
>>>>
>>>>      $ while rpm -qa > /dev/null; do :; done
>>>>
>>>> If I run this in parallel with something that just populates RAM (dd
>>>> if=/dev/nvme0n1 of=/dev/null bs=10G) I get an illegal instruction fault
>>>> within seconds:
>>>>
>>>> sh-4.4# while rpm -qa > /dev/null; do true; done
>>>> Illegal instruction (core dumped)
>>>>
>>>>
>>>>> - a host kernel configuration?
>>>> Host kernel configuration is just the normal openSUSE one:
>>>>
>>>> https://kernel.opensuse.org/cgit/kernel-source/plain/config/arm64/default?h=stable
>>>>
>>>>>> If anyone is interested in a reproducer, I have something handy. But for
>>>>>> now I believe we should just revert this patch.
>>>>> Before we revert anything, I'd like to understand what is happening.
>>>> Yeah, I didn't realize the commit is already in since 4.16, so I agree.
>>>> I'll bisect a bit, but it may take a while.
>>> Do you mind giving this a try?
>>>
>>> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
>>> index 1d90d79706bd..df8f3d5eaa22 100644
>>> --- a/virt/kvm/arm/mmu.c
>>> +++ b/virt/kvm/arm/mmu.c
>>> @@ -1531,7 +1536,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>>    			kvm_set_pfn_dirty(pfn);
>>>    		}
>>>    
>>> -		if (fault_status != FSC_PERM)
>>> +		if (fault_status != FSC_PERM || write_fault)
>>>    			clean_dcache_guest_page(pfn, PMD_SIZE);
>>>    
>>>    		if (exec_fault) {
>>> @@ -1553,7 +1558,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>>    			mark_page_dirty(kvm, gfn);
>>>    		}
>>>    
>>> -		if (fault_status != FSC_PERM)
>>> +		if (fault_status != FSC_PERM || write_fault)
>>>    			clean_dcache_guest_page(pfn, PAGE_SIZE);
>>>    
>>>    		if (exec_fault) {
>>>
>>>
>>> The missing logic is that a write from the guest could have triggered
>>> a CoW, meaning we definitely need to flush it in that case too. It
>>> fixes a kvm-unit-test regression here.
>> This patch unfortunately does not fix the issue. I still see illegal
>> instructions.
> [+Dave]
>
> That's because what you're observing has nothing to do with caching, but
> with FP/SIMD trapping instead. Thanks to the guest image you've provided,
> I've been able to extract the following:
>
> [    3.944273] systemd-udevd (266): undefined instruction: pc=(ptrval)
> [    3.945396] CPU: 0 PID: 266 Comm: systemd-udevd Not tainted 4.17.14-2-lpae #1 openSUSE Tumbleweed (unreleased)
> [    3.947130] Hardware name: Generic DT based system
> [    3.947976] PC is at 0xb6b9397a
> [    3.948547] LR is at 0xb6e9a1b0
> [    3.958291] pc : [<b6b9397a>]    lr : [<b6e9a1b0>]    psr: 20070030
> [    3.959664] sp : bebe36a0  ip : b6f6ad50  fp : 005e8784
> [    3.960601] r10: bebe3814  r9 : bebe36c0  r8 : bebe36bc
> [    3.961522] r7 : bebe3814  r6 : 00000074  r5 : 00000000  r4 : 00000000
> [    3.962661] r3 : 005f2b60  r2 : 00000074  r1 : 00000000  r0 : bebe3814
> [    3.963801] Flags: nzCv  IRQs on  FIQs on  Mode USER_32  ISA Thumb  Segment user
> [    4.000606] Control: 30c5383d  Table: 6a8e7400  DAC: fffffffd
> [    4.001659] Code: d1f9 f1a0 0001 4770 (eee0) 1b10
>
> maz@flakes:~/armv7-build-fail$ (echo .thumb; echo .inst.w 0xeee01b10) > x.S &&  arm-linux-gnueabihf-gcc -c x.S && arm-linux-gnueabihf-objdump -d x.o
>
> x.o:     file format elf32-littlearm
>
>
> Disassembly of section .text:
>
> 00000000 <.text>:
>     0:	eee0 1b10 	vdup.8	q0, r1
>
> A VFP instruction. Given that you've reported that things worked in
> 4.17 and broke in 4.18, I strongly suspect the new lazy FPSIMD code.
> Upon inspection, the way we setup trapping for 32bit is a tiny bit
> suspect.
>
> Could you please give this patch a go? My Seattle has been running
> with it for 30 minutes now, and it is still running (instead of
> failing with seconds).
>
> Thanks,
>
> 	M.
>
>  From f2d1d43e2429f9269ab6c565440e095ab3a9be89 Mon Sep 17 00:00:00 2001
> From: Marc Zyngier <marc.zyngier@arm.com>
> Date: Thu, 23 Aug 2018 11:51:43 +0100
> Subject: [PATCH] arm64: KVM: Only force FPEXC32_EL2.EN if trapping FPSIMD
>
> If trapping FPSIMD in the context of an AArch32 guest, it is critical
> to set FPEXC32_EL2.EN to 1 so that the trapping is taken to EL2 and
> not EL1.
>
> Conversely, it is just as critical *not* to set FPEXC32_EL2.EN to 1
> if we're not going to trap FPSIMD, as we then corrupt the existing
> VFP state.
>
> Moving the call to __activate_traps_fpsimd32 to the point where we
> know for sure that we are going to trap ensures that we don't set that
> bit spuriously.
>
> Fixes: e6b673b741ea ("KVM: arm64: Optimise FPSIMD handling to reduce guest/host thrashing")
> Cc: Dave Martin <dave.martin@arm.com>
> Reported-by: Alexander Graf <agraf@suse.de>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

Very good catch. This patch does indeed fix it. I'm still puzzled why 
reverting the dcache patch actually had any positive effect, but oh well 
... :)

Tested-by: Alexander Graf <agraf@suse.de>


Alex

> ---
>   arch/arm64/kvm/hyp/switch.c | 9 ++++++---
>   1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index d496ef579859..ca46153d7915 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -98,8 +98,10 @@ static void activate_traps_vhe(struct kvm_vcpu *vcpu)
>   	val = read_sysreg(cpacr_el1);
>   	val |= CPACR_EL1_TTA;
>   	val &= ~CPACR_EL1_ZEN;
> -	if (!update_fp_enabled(vcpu))
> +	if (!update_fp_enabled(vcpu)) {
>   		val &= ~CPACR_EL1_FPEN;
> +		__activate_traps_fpsimd32(vcpu);
> +	}
>   
>   	write_sysreg(val, cpacr_el1);
>   
> @@ -114,8 +116,10 @@ static void __hyp_text __activate_traps_nvhe(struct kvm_vcpu *vcpu)
>   
>   	val = CPTR_EL2_DEFAULT;
>   	val |= CPTR_EL2_TTA | CPTR_EL2_TZ;
> -	if (!update_fp_enabled(vcpu))
> +	if (!update_fp_enabled(vcpu)) {
>   		val |= CPTR_EL2_TFP;
> +		__activate_traps_fpsimd32(vcpu);
> +	}
>   
>   	write_sysreg(val, cptr_el2);
>   }
> @@ -129,7 +133,6 @@ static void __hyp_text __activate_traps(struct kvm_vcpu *vcpu)
>   	if (cpus_have_const_cap(ARM64_HAS_RAS_EXTN) && (hcr & HCR_VSE))
>   		write_sysreg_s(vcpu->arch.vsesr_el2, SYS_VSESR_EL2);
>   
> -	__activate_traps_fpsimd32(vcpu);
>   	if (has_vhe())
>   		activate_traps_vhe(vcpu);
>   	else


_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v3 7/9] KVM: arm/arm64: Only clean the dcache on translation fault
@ 2018-08-23 12:24                 ` Alexander Graf
  0 siblings, 0 replies; 52+ messages in thread
From: Alexander Graf @ 2018-08-23 12:24 UTC (permalink / raw)
  To: linux-arm-kernel

On 08/23/2018 01:16 PM, Marc Zyngier wrote:
> On 21/08/18 17:54, Alexander Graf wrote:
>> On 08/21/2018 05:08 PM, Marc Zyngier wrote:
>>> On 21/08/18 15:08, Alexander Graf wrote:
>>>> On 08/21/2018 03:57 PM, Marc Zyngier wrote:
>>>>> On 21/08/18 14:35, Alexander Graf wrote:
>>>>>> On 10/23/2017 06:11 PM, Marc Zyngier wrote:
>>>>>>> The only case where we actually need to perform a dcache maintenance
>>>>>>> is when we map the page for the first time, and subsequent permission
>>>>>>> faults do not require cache maintenance. Let's make it conditional
>>>>>>> on not being a permission fault (and thus a translation fault).
>>>>>>>
>>>>>>> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
>>>>>>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>>>>>> This patch unfortunately breaks something on Hi1616 SoCs when running
>>>>>> 32bit guests. With this patch applied (and thus with 4.18) I get random
>>>>>> illegal instruction warnings from 32bit code inside VMs. I do not know
>>>>>> at this point whether this affects other CPUs as well.
>>>>> Can you please give a few more details?
>>>>>
>>>>> - what are the CPUs on this Hi1616? At least a /proc/cpuinfo would help
>>>> These are A72s:
>>>>
>>>> processor??? : 0
>>>> BogoMIPS??? : 100.00
>>>> Features??? : fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
>>>> CPU implementer??? : 0x41
>>>> CPU architecture: 8
>>>> CPU variant??? : 0x0
>>>> CPU part??? : 0xd08
>>>> CPU revision??? : 2
>>>>
>>>>> - an example of the crash? Is it within the decompressor? After? This
>>>>> things do matter, given the number of crazy things the 32bit kernel does
>>>> They are always in user space. My current reproducer is this:
>>>>
>>>>    ? $ while rpm -qa > /dev/null; do :; done
>>>>
>>>> If I run this in parallel with something that just populates RAM (dd
>>>> if=/dev/nvme0n1 of=/dev/null bs=10G) I get an illegal instruction fault
>>>> within seconds:
>>>>
>>>> sh-4.4# while rpm -qa > /dev/null; do true; done
>>>> Illegal instruction (core dumped)
>>>>
>>>>
>>>>> - a host kernel configuration?
>>>> Host kernel configuration is just the normal openSUSE one:
>>>>
>>>> https://kernel.opensuse.org/cgit/kernel-source/plain/config/arm64/default?h=stable
>>>>
>>>>>> If anyone is interested in a reproducer, I have something handy. But for
>>>>>> now I believe we should just revert this patch.
>>>>> Before we revert anything, I'd like to understand what is happening.
>>>> Yeah, I didn't realize the commit is already in since 4.16, so I agree.
>>>> I'll bisect a bit, but it may take a while.
>>> Do you mind giving this a try?
>>>
>>> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
>>> index 1d90d79706bd..df8f3d5eaa22 100644
>>> --- a/virt/kvm/arm/mmu.c
>>> +++ b/virt/kvm/arm/mmu.c
>>> @@ -1531,7 +1536,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>>    			kvm_set_pfn_dirty(pfn);
>>>    		}
>>>    
>>> -		if (fault_status != FSC_PERM)
>>> +		if (fault_status != FSC_PERM || write_fault)
>>>    			clean_dcache_guest_page(pfn, PMD_SIZE);
>>>    
>>>    		if (exec_fault) {
>>> @@ -1553,7 +1558,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>>    			mark_page_dirty(kvm, gfn);
>>>    		}
>>>    
>>> -		if (fault_status != FSC_PERM)
>>> +		if (fault_status != FSC_PERM || write_fault)
>>>    			clean_dcache_guest_page(pfn, PAGE_SIZE);
>>>    
>>>    		if (exec_fault) {
>>>
>>>
>>> The missing logic is that a write from the guest could have triggered
>>> a CoW, meaning we definitely need to flush it in that case too. It
>>> fixes a kvm-unit-test regression here.
>> This patch unfortunately does not fix the issue. I still see illegal
>> instructions.
> [+Dave]
>
> That's because what you're observing has nothing to do with caching, but
> with FP/SIMD trapping instead. Thanks to the guest image you've provided,
> I've been able to extract the following:
>
> [    3.944273] systemd-udevd (266): undefined instruction: pc=(ptrval)
> [    3.945396] CPU: 0 PID: 266 Comm: systemd-udevd Not tainted 4.17.14-2-lpae #1 openSUSE Tumbleweed (unreleased)
> [    3.947130] Hardware name: Generic DT based system
> [    3.947976] PC is at 0xb6b9397a
> [    3.948547] LR is at 0xb6e9a1b0
> [    3.958291] pc : [<b6b9397a>]    lr : [<b6e9a1b0>]    psr: 20070030
> [    3.959664] sp : bebe36a0  ip : b6f6ad50  fp : 005e8784
> [    3.960601] r10: bebe3814  r9 : bebe36c0  r8 : bebe36bc
> [    3.961522] r7 : bebe3814  r6 : 00000074  r5 : 00000000  r4 : 00000000
> [    3.962661] r3 : 005f2b60  r2 : 00000074  r1 : 00000000  r0 : bebe3814
> [    3.963801] Flags: nzCv  IRQs on  FIQs on  Mode USER_32  ISA Thumb  Segment user
> [    4.000606] Control: 30c5383d  Table: 6a8e7400  DAC: fffffffd
> [    4.001659] Code: d1f9 f1a0 0001 4770 (eee0) 1b10
>
> maz at flakes:~/armv7-build-fail$ (echo .thumb; echo .inst.w 0xeee01b10) > x.S &&  arm-linux-gnueabihf-gcc -c x.S && arm-linux-gnueabihf-objdump -d x.o
>
> x.o:     file format elf32-littlearm
>
>
> Disassembly of section .text:
>
> 00000000 <.text>:
>     0:	eee0 1b10 	vdup.8	q0, r1
>
> A VFP instruction. Given that you've reported that things worked in
> 4.17 and broke in 4.18, I strongly suspect the new lazy FPSIMD code.
> Upon inspection, the way we setup trapping for 32bit is a tiny bit
> suspect.
>
> Could you please give this patch a go? My Seattle has been running
> with it for 30 minutes now, and it is still running (instead of
> failing with seconds).
>
> Thanks,
>
> 	M.
>
>  From f2d1d43e2429f9269ab6c565440e095ab3a9be89 Mon Sep 17 00:00:00 2001
> From: Marc Zyngier <marc.zyngier@arm.com>
> Date: Thu, 23 Aug 2018 11:51:43 +0100
> Subject: [PATCH] arm64: KVM: Only force FPEXC32_EL2.EN if trapping FPSIMD
>
> If trapping FPSIMD in the context of an AArch32 guest, it is critical
> to set FPEXC32_EL2.EN to 1 so that the trapping is taken to EL2 and
> not EL1.
>
> Conversely, it is just as critical *not* to set FPEXC32_EL2.EN to 1
> if we're not going to trap FPSIMD, as we then corrupt the existing
> VFP state.
>
> Moving the call to __activate_traps_fpsimd32 to the point where we
> know for sure that we are going to trap ensures that we don't set that
> bit spuriously.
>
> Fixes: e6b673b741ea ("KVM: arm64: Optimise FPSIMD handling to reduce guest/host thrashing")
> Cc: Dave Martin <dave.martin@arm.com>
> Reported-by: Alexander Graf <agraf@suse.de>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

Very good catch. This patch does indeed fix it. I'm still puzzled why 
reverting the dcache patch actually had any positive effect, but oh well 
... :)

Tested-by: Alexander Graf <agraf@suse.de>


Alex

> ---
>   arch/arm64/kvm/hyp/switch.c | 9 ++++++---
>   1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index d496ef579859..ca46153d7915 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -98,8 +98,10 @@ static void activate_traps_vhe(struct kvm_vcpu *vcpu)
>   	val = read_sysreg(cpacr_el1);
>   	val |= CPACR_EL1_TTA;
>   	val &= ~CPACR_EL1_ZEN;
> -	if (!update_fp_enabled(vcpu))
> +	if (!update_fp_enabled(vcpu)) {
>   		val &= ~CPACR_EL1_FPEN;
> +		__activate_traps_fpsimd32(vcpu);
> +	}
>   
>   	write_sysreg(val, cpacr_el1);
>   
> @@ -114,8 +116,10 @@ static void __hyp_text __activate_traps_nvhe(struct kvm_vcpu *vcpu)
>   
>   	val = CPTR_EL2_DEFAULT;
>   	val |= CPTR_EL2_TTA | CPTR_EL2_TZ;
> -	if (!update_fp_enabled(vcpu))
> +	if (!update_fp_enabled(vcpu)) {
>   		val |= CPTR_EL2_TFP;
> +		__activate_traps_fpsimd32(vcpu);
> +	}
>   
>   	write_sysreg(val, cptr_el2);
>   }
> @@ -129,7 +133,6 @@ static void __hyp_text __activate_traps(struct kvm_vcpu *vcpu)
>   	if (cpus_have_const_cap(ARM64_HAS_RAS_EXTN) && (hcr & HCR_VSE))
>   		write_sysreg_s(vcpu->arch.vsesr_el2, SYS_VSESR_EL2);
>   
> -	__activate_traps_fpsimd32(vcpu);
>   	if (has_vhe())
>   		activate_traps_vhe(vcpu);
>   	else

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 7/9] KVM: arm/arm64: Only clean the dcache on translation fault
  2018-08-23 12:24                 ` Alexander Graf
@ 2018-08-23 12:43                   ` Marc Zyngier
  -1 siblings, 0 replies; 52+ messages in thread
From: Marc Zyngier @ 2018-08-23 12:43 UTC (permalink / raw)
  To: Alexander Graf, Christoffer Dall, Catalin Marinas, Will Deacon
  Cc: Dave Martin, linux-arm-kernel, kvm, kvmarm

On 23/08/18 13:24, Alexander Graf wrote:
> On 08/23/2018 01:16 PM, Marc Zyngier wrote:
>> On 21/08/18 17:54, Alexander Graf wrote:
>>> On 08/21/2018 05:08 PM, Marc Zyngier wrote:
>>>> On 21/08/18 15:08, Alexander Graf wrote:
>>>>> On 08/21/2018 03:57 PM, Marc Zyngier wrote:
>>>>>> On 21/08/18 14:35, Alexander Graf wrote:
>>>>>>> On 10/23/2017 06:11 PM, Marc Zyngier wrote:
>>>>>>>> The only case where we actually need to perform a dcache maintenance
>>>>>>>> is when we map the page for the first time, and subsequent permission
>>>>>>>> faults do not require cache maintenance. Let's make it conditional
>>>>>>>> on not being a permission fault (and thus a translation fault).
>>>>>>>>
>>>>>>>> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
>>>>>>>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>>>>>>> This patch unfortunately breaks something on Hi1616 SoCs when running
>>>>>>> 32bit guests. With this patch applied (and thus with 4.18) I get random
>>>>>>> illegal instruction warnings from 32bit code inside VMs. I do not know
>>>>>>> at this point whether this affects other CPUs as well.
>>>>>> Can you please give a few more details?
>>>>>>
>>>>>> - what are the CPUs on this Hi1616? At least a /proc/cpuinfo would help
>>>>> These are A72s:
>>>>>
>>>>> processor    : 0
>>>>> BogoMIPS    : 100.00
>>>>> Features    : fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
>>>>> CPU implementer    : 0x41
>>>>> CPU architecture: 8
>>>>> CPU variant    : 0x0
>>>>> CPU part    : 0xd08
>>>>> CPU revision    : 2
>>>>>
>>>>>> - an example of the crash? Is it within the decompressor? After? This
>>>>>> things do matter, given the number of crazy things the 32bit kernel does
>>>>> They are always in user space. My current reproducer is this:
>>>>>
>>>>>      $ while rpm -qa > /dev/null; do :; done
>>>>>
>>>>> If I run this in parallel with something that just populates RAM (dd
>>>>> if=/dev/nvme0n1 of=/dev/null bs=10G) I get an illegal instruction fault
>>>>> within seconds:
>>>>>
>>>>> sh-4.4# while rpm -qa > /dev/null; do true; done
>>>>> Illegal instruction (core dumped)
>>>>>
>>>>>
>>>>>> - a host kernel configuration?
>>>>> Host kernel configuration is just the normal openSUSE one:
>>>>>
>>>>> https://kernel.opensuse.org/cgit/kernel-source/plain/config/arm64/default?h=stable
>>>>>
>>>>>>> If anyone is interested in a reproducer, I have something handy. But for
>>>>>>> now I believe we should just revert this patch.
>>>>>> Before we revert anything, I'd like to understand what is happening.
>>>>> Yeah, I didn't realize the commit is already in since 4.16, so I agree.
>>>>> I'll bisect a bit, but it may take a while.
>>>> Do you mind giving this a try?
>>>>
>>>> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
>>>> index 1d90d79706bd..df8f3d5eaa22 100644
>>>> --- a/virt/kvm/arm/mmu.c
>>>> +++ b/virt/kvm/arm/mmu.c
>>>> @@ -1531,7 +1536,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>>>    			kvm_set_pfn_dirty(pfn);
>>>>    		}
>>>>    
>>>> -		if (fault_status != FSC_PERM)
>>>> +		if (fault_status != FSC_PERM || write_fault)
>>>>    			clean_dcache_guest_page(pfn, PMD_SIZE);
>>>>    
>>>>    		if (exec_fault) {
>>>> @@ -1553,7 +1558,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>>>    			mark_page_dirty(kvm, gfn);
>>>>    		}
>>>>    
>>>> -		if (fault_status != FSC_PERM)
>>>> +		if (fault_status != FSC_PERM || write_fault)
>>>>    			clean_dcache_guest_page(pfn, PAGE_SIZE);
>>>>    
>>>>    		if (exec_fault) {
>>>>
>>>>
>>>> The missing logic is that a write from the guest could have triggered
>>>> a CoW, meaning we definitely need to flush it in that case too. It
>>>> fixes a kvm-unit-test regression here.
>>> This patch unfortunately does not fix the issue. I still see illegal
>>> instructions.
>> [+Dave]
>>
>> That's because what you're observing has nothing to do with caching, but
>> with FP/SIMD trapping instead. Thanks to the guest image you've provided,
>> I've been able to extract the following:
>>
>> [    3.944273] systemd-udevd (266): undefined instruction: pc=(ptrval)
>> [    3.945396] CPU: 0 PID: 266 Comm: systemd-udevd Not tainted 4.17.14-2-lpae #1 openSUSE Tumbleweed (unreleased)
>> [    3.947130] Hardware name: Generic DT based system
>> [    3.947976] PC is at 0xb6b9397a
>> [    3.948547] LR is at 0xb6e9a1b0
>> [    3.958291] pc : [<b6b9397a>]    lr : [<b6e9a1b0>]    psr: 20070030
>> [    3.959664] sp : bebe36a0  ip : b6f6ad50  fp : 005e8784
>> [    3.960601] r10: bebe3814  r9 : bebe36c0  r8 : bebe36bc
>> [    3.961522] r7 : bebe3814  r6 : 00000074  r5 : 00000000  r4 : 00000000
>> [    3.962661] r3 : 005f2b60  r2 : 00000074  r1 : 00000000  r0 : bebe3814
>> [    3.963801] Flags: nzCv  IRQs on  FIQs on  Mode USER_32  ISA Thumb  Segment user
>> [    4.000606] Control: 30c5383d  Table: 6a8e7400  DAC: fffffffd
>> [    4.001659] Code: d1f9 f1a0 0001 4770 (eee0) 1b10
>>
>> maz@flakes:~/armv7-build-fail$ (echo .thumb; echo .inst.w 0xeee01b10) > x.S &&  arm-linux-gnueabihf-gcc -c x.S && arm-linux-gnueabihf-objdump -d x.o
>>
>> x.o:     file format elf32-littlearm
>>
>>
>> Disassembly of section .text:
>>
>> 00000000 <.text>:
>>     0:	eee0 1b10 	vdup.8	q0, r1
>>
>> A VFP instruction. Given that you've reported that things worked in
>> 4.17 and broke in 4.18, I strongly suspect the new lazy FPSIMD code.
>> Upon inspection, the way we setup trapping for 32bit is a tiny bit
>> suspect.
>>
>> Could you please give this patch a go? My Seattle has been running
>> with it for 30 minutes now, and it is still running (instead of
>> failing with seconds).
>>
>> Thanks,
>>
>> 	M.
>>
>>  From f2d1d43e2429f9269ab6c565440e095ab3a9be89 Mon Sep 17 00:00:00 2001
>> From: Marc Zyngier <marc.zyngier@arm.com>
>> Date: Thu, 23 Aug 2018 11:51:43 +0100
>> Subject: [PATCH] arm64: KVM: Only force FPEXC32_EL2.EN if trapping FPSIMD
>>
>> If trapping FPSIMD in the context of an AArch32 guest, it is critical
>> to set FPEXC32_EL2.EN to 1 so that the trapping is taken to EL2 and
>> not EL1.
>>
>> Conversely, it is just as critical *not* to set FPEXC32_EL2.EN to 1
>> if we're not going to trap FPSIMD, as we then corrupt the existing
>> VFP state.
>>
>> Moving the call to __activate_traps_fpsimd32 to the point where we
>> know for sure that we are going to trap ensures that we don't set that
>> bit spuriously.
>>
>> Fixes: e6b673b741ea ("KVM: arm64: Optimise FPSIMD handling to reduce guest/host thrashing")
>> Cc: Dave Martin <dave.martin@arm.com>
>> Reported-by: Alexander Graf <agraf@suse.de>
>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> 
> Very good catch. This patch does indeed fix it. I'm still puzzled why 
> reverting the dcache patch actually had any positive effect, but oh well 
> ... :)

Well, there is still a bug in that dcache patch, which I've fixed
separately (in a better way than what I sent before), so you may have
hit that as well, but that's not the most visible bug.

> Tested-by: Alexander Graf <agraf@suse.de>

Thanks for that.

	M.
-- 
Jazz is not dead. It just smells funny...
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v3 7/9] KVM: arm/arm64: Only clean the dcache on translation fault
@ 2018-08-23 12:43                   ` Marc Zyngier
  0 siblings, 0 replies; 52+ messages in thread
From: Marc Zyngier @ 2018-08-23 12:43 UTC (permalink / raw)
  To: linux-arm-kernel

On 23/08/18 13:24, Alexander Graf wrote:
> On 08/23/2018 01:16 PM, Marc Zyngier wrote:
>> On 21/08/18 17:54, Alexander Graf wrote:
>>> On 08/21/2018 05:08 PM, Marc Zyngier wrote:
>>>> On 21/08/18 15:08, Alexander Graf wrote:
>>>>> On 08/21/2018 03:57 PM, Marc Zyngier wrote:
>>>>>> On 21/08/18 14:35, Alexander Graf wrote:
>>>>>>> On 10/23/2017 06:11 PM, Marc Zyngier wrote:
>>>>>>>> The only case where we actually need to perform a dcache maintenance
>>>>>>>> is when we map the page for the first time, and subsequent permission
>>>>>>>> faults do not require cache maintenance. Let's make it conditional
>>>>>>>> on not being a permission fault (and thus a translation fault).
>>>>>>>>
>>>>>>>> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
>>>>>>>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>>>>>>> This patch unfortunately breaks something on Hi1616 SoCs when running
>>>>>>> 32bit guests. With this patch applied (and thus with 4.18) I get random
>>>>>>> illegal instruction warnings from 32bit code inside VMs. I do not know
>>>>>>> at this point whether this affects other CPUs as well.
>>>>>> Can you please give a few more details?
>>>>>>
>>>>>> - what are the CPUs on this Hi1616? At least a /proc/cpuinfo would help
>>>>> These are A72s:
>>>>>
>>>>> processor??? : 0
>>>>> BogoMIPS??? : 100.00
>>>>> Features??? : fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
>>>>> CPU implementer??? : 0x41
>>>>> CPU architecture: 8
>>>>> CPU variant??? : 0x0
>>>>> CPU part??? : 0xd08
>>>>> CPU revision??? : 2
>>>>>
>>>>>> - an example of the crash? Is it within the decompressor? After? This
>>>>>> things do matter, given the number of crazy things the 32bit kernel does
>>>>> They are always in user space. My current reproducer is this:
>>>>>
>>>>>    ? $ while rpm -qa > /dev/null; do :; done
>>>>>
>>>>> If I run this in parallel with something that just populates RAM (dd
>>>>> if=/dev/nvme0n1 of=/dev/null bs=10G) I get an illegal instruction fault
>>>>> within seconds:
>>>>>
>>>>> sh-4.4# while rpm -qa > /dev/null; do true; done
>>>>> Illegal instruction (core dumped)
>>>>>
>>>>>
>>>>>> - a host kernel configuration?
>>>>> Host kernel configuration is just the normal openSUSE one:
>>>>>
>>>>> https://kernel.opensuse.org/cgit/kernel-source/plain/config/arm64/default?h=stable
>>>>>
>>>>>>> If anyone is interested in a reproducer, I have something handy. But for
>>>>>>> now I believe we should just revert this patch.
>>>>>> Before we revert anything, I'd like to understand what is happening.
>>>>> Yeah, I didn't realize the commit is already in since 4.16, so I agree.
>>>>> I'll bisect a bit, but it may take a while.
>>>> Do you mind giving this a try?
>>>>
>>>> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
>>>> index 1d90d79706bd..df8f3d5eaa22 100644
>>>> --- a/virt/kvm/arm/mmu.c
>>>> +++ b/virt/kvm/arm/mmu.c
>>>> @@ -1531,7 +1536,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>>>    			kvm_set_pfn_dirty(pfn);
>>>>    		}
>>>>    
>>>> -		if (fault_status != FSC_PERM)
>>>> +		if (fault_status != FSC_PERM || write_fault)
>>>>    			clean_dcache_guest_page(pfn, PMD_SIZE);
>>>>    
>>>>    		if (exec_fault) {
>>>> @@ -1553,7 +1558,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>>>    			mark_page_dirty(kvm, gfn);
>>>>    		}
>>>>    
>>>> -		if (fault_status != FSC_PERM)
>>>> +		if (fault_status != FSC_PERM || write_fault)
>>>>    			clean_dcache_guest_page(pfn, PAGE_SIZE);
>>>>    
>>>>    		if (exec_fault) {
>>>>
>>>>
>>>> The missing logic is that a write from the guest could have triggered
>>>> a CoW, meaning we definitely need to flush it in that case too. It
>>>> fixes a kvm-unit-test regression here.
>>> This patch unfortunately does not fix the issue. I still see illegal
>>> instructions.
>> [+Dave]
>>
>> That's because what you're observing has nothing to do with caching, but
>> with FP/SIMD trapping instead. Thanks to the guest image you've provided,
>> I've been able to extract the following:
>>
>> [    3.944273] systemd-udevd (266): undefined instruction: pc=(ptrval)
>> [    3.945396] CPU: 0 PID: 266 Comm: systemd-udevd Not tainted 4.17.14-2-lpae #1 openSUSE Tumbleweed (unreleased)
>> [    3.947130] Hardware name: Generic DT based system
>> [    3.947976] PC is at 0xb6b9397a
>> [    3.948547] LR is at 0xb6e9a1b0
>> [    3.958291] pc : [<b6b9397a>]    lr : [<b6e9a1b0>]    psr: 20070030
>> [    3.959664] sp : bebe36a0  ip : b6f6ad50  fp : 005e8784
>> [    3.960601] r10: bebe3814  r9 : bebe36c0  r8 : bebe36bc
>> [    3.961522] r7 : bebe3814  r6 : 00000074  r5 : 00000000  r4 : 00000000
>> [    3.962661] r3 : 005f2b60  r2 : 00000074  r1 : 00000000  r0 : bebe3814
>> [    3.963801] Flags: nzCv  IRQs on  FIQs on  Mode USER_32  ISA Thumb  Segment user
>> [    4.000606] Control: 30c5383d  Table: 6a8e7400  DAC: fffffffd
>> [    4.001659] Code: d1f9 f1a0 0001 4770 (eee0) 1b10
>>
>> maz at flakes:~/armv7-build-fail$ (echo .thumb; echo .inst.w 0xeee01b10) > x.S &&  arm-linux-gnueabihf-gcc -c x.S && arm-linux-gnueabihf-objdump -d x.o
>>
>> x.o:     file format elf32-littlearm
>>
>>
>> Disassembly of section .text:
>>
>> 00000000 <.text>:
>>     0:	eee0 1b10 	vdup.8	q0, r1
>>
>> A VFP instruction. Given that you've reported that things worked in
>> 4.17 and broke in 4.18, I strongly suspect the new lazy FPSIMD code.
>> Upon inspection, the way we setup trapping for 32bit is a tiny bit
>> suspect.
>>
>> Could you please give this patch a go? My Seattle has been running
>> with it for 30 minutes now, and it is still running (instead of
>> failing with seconds).
>>
>> Thanks,
>>
>> 	M.
>>
>>  From f2d1d43e2429f9269ab6c565440e095ab3a9be89 Mon Sep 17 00:00:00 2001
>> From: Marc Zyngier <marc.zyngier@arm.com>
>> Date: Thu, 23 Aug 2018 11:51:43 +0100
>> Subject: [PATCH] arm64: KVM: Only force FPEXC32_EL2.EN if trapping FPSIMD
>>
>> If trapping FPSIMD in the context of an AArch32 guest, it is critical
>> to set FPEXC32_EL2.EN to 1 so that the trapping is taken to EL2 and
>> not EL1.
>>
>> Conversely, it is just as critical *not* to set FPEXC32_EL2.EN to 1
>> if we're not going to trap FPSIMD, as we then corrupt the existing
>> VFP state.
>>
>> Moving the call to __activate_traps_fpsimd32 to the point where we
>> know for sure that we are going to trap ensures that we don't set that
>> bit spuriously.
>>
>> Fixes: e6b673b741ea ("KVM: arm64: Optimise FPSIMD handling to reduce guest/host thrashing")
>> Cc: Dave Martin <dave.martin@arm.com>
>> Reported-by: Alexander Graf <agraf@suse.de>
>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> 
> Very good catch. This patch does indeed fix it. I'm still puzzled why 
> reverting the dcache patch actually had any positive effect, but oh well 
> ... :)

Well, there is still a bug in that dcache patch, which I've fixed
separately (in a better way than what I sent before), so you may have
hit that as well, but that's not the most visible bug.

> Tested-by: Alexander Graf <agraf@suse.de>

Thanks for that.

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v3 7/9] KVM: arm/arm64: Only clean the dcache on translation fault
  2018-08-23 12:43                   ` Marc Zyngier
@ 2018-09-01 10:03                     ` Alexander Graf
  -1 siblings, 0 replies; 52+ messages in thread
From: Alexander Graf @ 2018-09-01 10:03 UTC (permalink / raw)
  To: Marc Zyngier, Christoffer Dall, Catalin Marinas, Will Deacon
  Cc: Dirk Mueller, Dave Martin, linux-arm-kernel, kvm, kvmarm



On 23.08.18 14:43, Marc Zyngier wrote:
> On 23/08/18 13:24, Alexander Graf wrote:
>> On 08/23/2018 01:16 PM, Marc Zyngier wrote:
>>> On 21/08/18 17:54, Alexander Graf wrote:
>>>> On 08/21/2018 05:08 PM, Marc Zyngier wrote:
>>>>> On 21/08/18 15:08, Alexander Graf wrote:
>>>>>> On 08/21/2018 03:57 PM, Marc Zyngier wrote:
>>>>>>> On 21/08/18 14:35, Alexander Graf wrote:
>>>>>>>> On 10/23/2017 06:11 PM, Marc Zyngier wrote:
>>>>>>>>> The only case where we actually need to perform a dcache maintenance
>>>>>>>>> is when we map the page for the first time, and subsequent permission
>>>>>>>>> faults do not require cache maintenance. Let's make it conditional
>>>>>>>>> on not being a permission fault (and thus a translation fault).
>>>>>>>>>
>>>>>>>>> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
>>>>>>>>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>>>>>>>> This patch unfortunately breaks something on Hi1616 SoCs when running
>>>>>>>> 32bit guests. With this patch applied (and thus with 4.18) I get random
>>>>>>>> illegal instruction warnings from 32bit code inside VMs. I do not know
>>>>>>>> at this point whether this affects other CPUs as well.
>>>>>>> Can you please give a few more details?
>>>>>>>
>>>>>>> - what are the CPUs on this Hi1616? At least a /proc/cpuinfo would help
>>>>>> These are A72s:
>>>>>>
>>>>>> processor    : 0
>>>>>> BogoMIPS    : 100.00
>>>>>> Features    : fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
>>>>>> CPU implementer    : 0x41
>>>>>> CPU architecture: 8
>>>>>> CPU variant    : 0x0
>>>>>> CPU part    : 0xd08
>>>>>> CPU revision    : 2
>>>>>>
>>>>>>> - an example of the crash? Is it within the decompressor? After? This
>>>>>>> things do matter, given the number of crazy things the 32bit kernel does
>>>>>> They are always in user space. My current reproducer is this:
>>>>>>
>>>>>>      $ while rpm -qa > /dev/null; do :; done
>>>>>>
>>>>>> If I run this in parallel with something that just populates RAM (dd
>>>>>> if=/dev/nvme0n1 of=/dev/null bs=10G) I get an illegal instruction fault
>>>>>> within seconds:
>>>>>>
>>>>>> sh-4.4# while rpm -qa > /dev/null; do true; done
>>>>>> Illegal instruction (core dumped)
>>>>>>
>>>>>>
>>>>>>> - a host kernel configuration?
>>>>>> Host kernel configuration is just the normal openSUSE one:
>>>>>>
>>>>>> https://kernel.opensuse.org/cgit/kernel-source/plain/config/arm64/default?h=stable
>>>>>>
>>>>>>>> If anyone is interested in a reproducer, I have something handy. But for
>>>>>>>> now I believe we should just revert this patch.
>>>>>>> Before we revert anything, I'd like to understand what is happening.
>>>>>> Yeah, I didn't realize the commit is already in since 4.16, so I agree.
>>>>>> I'll bisect a bit, but it may take a while.
>>>>> Do you mind giving this a try?
>>>>>
>>>>> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
>>>>> index 1d90d79706bd..df8f3d5eaa22 100644
>>>>> --- a/virt/kvm/arm/mmu.c
>>>>> +++ b/virt/kvm/arm/mmu.c
>>>>> @@ -1531,7 +1536,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>>>>    			kvm_set_pfn_dirty(pfn);
>>>>>    		}
>>>>>    
>>>>> -		if (fault_status != FSC_PERM)
>>>>> +		if (fault_status != FSC_PERM || write_fault)
>>>>>    			clean_dcache_guest_page(pfn, PMD_SIZE);
>>>>>    
>>>>>    		if (exec_fault) {
>>>>> @@ -1553,7 +1558,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>>>>    			mark_page_dirty(kvm, gfn);
>>>>>    		}
>>>>>    
>>>>> -		if (fault_status != FSC_PERM)
>>>>> +		if (fault_status != FSC_PERM || write_fault)
>>>>>    			clean_dcache_guest_page(pfn, PAGE_SIZE);
>>>>>    
>>>>>    		if (exec_fault) {
>>>>>
>>>>>
>>>>> The missing logic is that a write from the guest could have triggered
>>>>> a CoW, meaning we definitely need to flush it in that case too. It
>>>>> fixes a kvm-unit-test regression here.
>>>> This patch unfortunately does not fix the issue. I still see illegal
>>>> instructions.
>>> [+Dave]
>>>
>>> That's because what you're observing has nothing to do with caching, but
>>> with FP/SIMD trapping instead. Thanks to the guest image you've provided,
>>> I've been able to extract the following:
>>>
>>> [    3.944273] systemd-udevd (266): undefined instruction: pc=(ptrval)
>>> [    3.945396] CPU: 0 PID: 266 Comm: systemd-udevd Not tainted 4.17.14-2-lpae #1 openSUSE Tumbleweed (unreleased)
>>> [    3.947130] Hardware name: Generic DT based system
>>> [    3.947976] PC is at 0xb6b9397a
>>> [    3.948547] LR is at 0xb6e9a1b0
>>> [    3.958291] pc : [<b6b9397a>]    lr : [<b6e9a1b0>]    psr: 20070030
>>> [    3.959664] sp : bebe36a0  ip : b6f6ad50  fp : 005e8784
>>> [    3.960601] r10: bebe3814  r9 : bebe36c0  r8 : bebe36bc
>>> [    3.961522] r7 : bebe3814  r6 : 00000074  r5 : 00000000  r4 : 00000000
>>> [    3.962661] r3 : 005f2b60  r2 : 00000074  r1 : 00000000  r0 : bebe3814
>>> [    3.963801] Flags: nzCv  IRQs on  FIQs on  Mode USER_32  ISA Thumb  Segment user
>>> [    4.000606] Control: 30c5383d  Table: 6a8e7400  DAC: fffffffd
>>> [    4.001659] Code: d1f9 f1a0 0001 4770 (eee0) 1b10
>>>
>>> maz@flakes:~/armv7-build-fail$ (echo .thumb; echo .inst.w 0xeee01b10) > x.S &&  arm-linux-gnueabihf-gcc -c x.S && arm-linux-gnueabihf-objdump -d x.o
>>>
>>> x.o:     file format elf32-littlearm
>>>
>>>
>>> Disassembly of section .text:
>>>
>>> 00000000 <.text>:
>>>     0:	eee0 1b10 	vdup.8	q0, r1
>>>
>>> A VFP instruction. Given that you've reported that things worked in
>>> 4.17 and broke in 4.18, I strongly suspect the new lazy FPSIMD code.
>>> Upon inspection, the way we setup trapping for 32bit is a tiny bit
>>> suspect.
>>>
>>> Could you please give this patch a go? My Seattle has been running
>>> with it for 30 minutes now, and it is still running (instead of
>>> failing with seconds).
>>>
>>> Thanks,
>>>
>>> 	M.
>>>
>>>  From f2d1d43e2429f9269ab6c565440e095ab3a9be89 Mon Sep 17 00:00:00 2001
>>> From: Marc Zyngier <marc.zyngier@arm.com>
>>> Date: Thu, 23 Aug 2018 11:51:43 +0100
>>> Subject: [PATCH] arm64: KVM: Only force FPEXC32_EL2.EN if trapping FPSIMD
>>>
>>> If trapping FPSIMD in the context of an AArch32 guest, it is critical
>>> to set FPEXC32_EL2.EN to 1 so that the trapping is taken to EL2 and
>>> not EL1.
>>>
>>> Conversely, it is just as critical *not* to set FPEXC32_EL2.EN to 1
>>> if we're not going to trap FPSIMD, as we then corrupt the existing
>>> VFP state.
>>>
>>> Moving the call to __activate_traps_fpsimd32 to the point where we
>>> know for sure that we are going to trap ensures that we don't set that
>>> bit spuriously.
>>>
>>> Fixes: e6b673b741ea ("KVM: arm64: Optimise FPSIMD handling to reduce guest/host thrashing")
>>> Cc: Dave Martin <dave.martin@arm.com>
>>> Reported-by: Alexander Graf <agraf@suse.de>
>>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>>
>> Very good catch. This patch does indeed fix it. I'm still puzzled why 
>> reverting the dcache patch actually had any positive effect, but oh well 
>> ... :)
> 
> Well, there is still a bug in that dcache patch, which I've fixed
> separately (in a better way than what I sent before), so you may have
> hit that as well, but that's not the most visible bug.
> 
>> Tested-by: Alexander Graf <agraf@suse.de>
> 
> Thanks for that.

Can I expect to see your patch in 4.19 and 4.18-stable? :)


Alex
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v3 7/9] KVM: arm/arm64: Only clean the dcache on translation fault
@ 2018-09-01 10:03                     ` Alexander Graf
  0 siblings, 0 replies; 52+ messages in thread
From: Alexander Graf @ 2018-09-01 10:03 UTC (permalink / raw)
  To: linux-arm-kernel



On 23.08.18 14:43, Marc Zyngier wrote:
> On 23/08/18 13:24, Alexander Graf wrote:
>> On 08/23/2018 01:16 PM, Marc Zyngier wrote:
>>> On 21/08/18 17:54, Alexander Graf wrote:
>>>> On 08/21/2018 05:08 PM, Marc Zyngier wrote:
>>>>> On 21/08/18 15:08, Alexander Graf wrote:
>>>>>> On 08/21/2018 03:57 PM, Marc Zyngier wrote:
>>>>>>> On 21/08/18 14:35, Alexander Graf wrote:
>>>>>>>> On 10/23/2017 06:11 PM, Marc Zyngier wrote:
>>>>>>>>> The only case where we actually need to perform a dcache maintenance
>>>>>>>>> is when we map the page for the first time, and subsequent permission
>>>>>>>>> faults do not require cache maintenance. Let's make it conditional
>>>>>>>>> on not being a permission fault (and thus a translation fault).
>>>>>>>>>
>>>>>>>>> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
>>>>>>>>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>>>>>>>> This patch unfortunately breaks something on Hi1616 SoCs when running
>>>>>>>> 32bit guests. With this patch applied (and thus with 4.18) I get random
>>>>>>>> illegal instruction warnings from 32bit code inside VMs. I do not know
>>>>>>>> at this point whether this affects other CPUs as well.
>>>>>>> Can you please give a few more details?
>>>>>>>
>>>>>>> - what are the CPUs on this Hi1616? At least a /proc/cpuinfo would help
>>>>>> These are A72s:
>>>>>>
>>>>>> processor??? : 0
>>>>>> BogoMIPS??? : 100.00
>>>>>> Features??? : fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
>>>>>> CPU implementer??? : 0x41
>>>>>> CPU architecture: 8
>>>>>> CPU variant??? : 0x0
>>>>>> CPU part??? : 0xd08
>>>>>> CPU revision??? : 2
>>>>>>
>>>>>>> - an example of the crash? Is it within the decompressor? After? This
>>>>>>> things do matter, given the number of crazy things the 32bit kernel does
>>>>>> They are always in user space. My current reproducer is this:
>>>>>>
>>>>>>    ? $ while rpm -qa > /dev/null; do :; done
>>>>>>
>>>>>> If I run this in parallel with something that just populates RAM (dd
>>>>>> if=/dev/nvme0n1 of=/dev/null bs=10G) I get an illegal instruction fault
>>>>>> within seconds:
>>>>>>
>>>>>> sh-4.4# while rpm -qa > /dev/null; do true; done
>>>>>> Illegal instruction (core dumped)
>>>>>>
>>>>>>
>>>>>>> - a host kernel configuration?
>>>>>> Host kernel configuration is just the normal openSUSE one:
>>>>>>
>>>>>> https://kernel.opensuse.org/cgit/kernel-source/plain/config/arm64/default?h=stable
>>>>>>
>>>>>>>> If anyone is interested in a reproducer, I have something handy. But for
>>>>>>>> now I believe we should just revert this patch.
>>>>>>> Before we revert anything, I'd like to understand what is happening.
>>>>>> Yeah, I didn't realize the commit is already in since 4.16, so I agree.
>>>>>> I'll bisect a bit, but it may take a while.
>>>>> Do you mind giving this a try?
>>>>>
>>>>> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
>>>>> index 1d90d79706bd..df8f3d5eaa22 100644
>>>>> --- a/virt/kvm/arm/mmu.c
>>>>> +++ b/virt/kvm/arm/mmu.c
>>>>> @@ -1531,7 +1536,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>>>>    			kvm_set_pfn_dirty(pfn);
>>>>>    		}
>>>>>    
>>>>> -		if (fault_status != FSC_PERM)
>>>>> +		if (fault_status != FSC_PERM || write_fault)
>>>>>    			clean_dcache_guest_page(pfn, PMD_SIZE);
>>>>>    
>>>>>    		if (exec_fault) {
>>>>> @@ -1553,7 +1558,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>>>>    			mark_page_dirty(kvm, gfn);
>>>>>    		}
>>>>>    
>>>>> -		if (fault_status != FSC_PERM)
>>>>> +		if (fault_status != FSC_PERM || write_fault)
>>>>>    			clean_dcache_guest_page(pfn, PAGE_SIZE);
>>>>>    
>>>>>    		if (exec_fault) {
>>>>>
>>>>>
>>>>> The missing logic is that a write from the guest could have triggered
>>>>> a CoW, meaning we definitely need to flush it in that case too. It
>>>>> fixes a kvm-unit-test regression here.
>>>> This patch unfortunately does not fix the issue. I still see illegal
>>>> instructions.
>>> [+Dave]
>>>
>>> That's because what you're observing has nothing to do with caching, but
>>> with FP/SIMD trapping instead. Thanks to the guest image you've provided,
>>> I've been able to extract the following:
>>>
>>> [    3.944273] systemd-udevd (266): undefined instruction: pc=(ptrval)
>>> [    3.945396] CPU: 0 PID: 266 Comm: systemd-udevd Not tainted 4.17.14-2-lpae #1 openSUSE Tumbleweed (unreleased)
>>> [    3.947130] Hardware name: Generic DT based system
>>> [    3.947976] PC is at 0xb6b9397a
>>> [    3.948547] LR is at 0xb6e9a1b0
>>> [    3.958291] pc : [<b6b9397a>]    lr : [<b6e9a1b0>]    psr: 20070030
>>> [    3.959664] sp : bebe36a0  ip : b6f6ad50  fp : 005e8784
>>> [    3.960601] r10: bebe3814  r9 : bebe36c0  r8 : bebe36bc
>>> [    3.961522] r7 : bebe3814  r6 : 00000074  r5 : 00000000  r4 : 00000000
>>> [    3.962661] r3 : 005f2b60  r2 : 00000074  r1 : 00000000  r0 : bebe3814
>>> [    3.963801] Flags: nzCv  IRQs on  FIQs on  Mode USER_32  ISA Thumb  Segment user
>>> [    4.000606] Control: 30c5383d  Table: 6a8e7400  DAC: fffffffd
>>> [    4.001659] Code: d1f9 f1a0 0001 4770 (eee0) 1b10
>>>
>>> maz at flakes:~/armv7-build-fail$ (echo .thumb; echo .inst.w 0xeee01b10) > x.S &&  arm-linux-gnueabihf-gcc -c x.S && arm-linux-gnueabihf-objdump -d x.o
>>>
>>> x.o:     file format elf32-littlearm
>>>
>>>
>>> Disassembly of section .text:
>>>
>>> 00000000 <.text>:
>>>     0:	eee0 1b10 	vdup.8	q0, r1
>>>
>>> A VFP instruction. Given that you've reported that things worked in
>>> 4.17 and broke in 4.18, I strongly suspect the new lazy FPSIMD code.
>>> Upon inspection, the way we setup trapping for 32bit is a tiny bit
>>> suspect.
>>>
>>> Could you please give this patch a go? My Seattle has been running
>>> with it for 30 minutes now, and it is still running (instead of
>>> failing with seconds).
>>>
>>> Thanks,
>>>
>>> 	M.
>>>
>>>  From f2d1d43e2429f9269ab6c565440e095ab3a9be89 Mon Sep 17 00:00:00 2001
>>> From: Marc Zyngier <marc.zyngier@arm.com>
>>> Date: Thu, 23 Aug 2018 11:51:43 +0100
>>> Subject: [PATCH] arm64: KVM: Only force FPEXC32_EL2.EN if trapping FPSIMD
>>>
>>> If trapping FPSIMD in the context of an AArch32 guest, it is critical
>>> to set FPEXC32_EL2.EN to 1 so that the trapping is taken to EL2 and
>>> not EL1.
>>>
>>> Conversely, it is just as critical *not* to set FPEXC32_EL2.EN to 1
>>> if we're not going to trap FPSIMD, as we then corrupt the existing
>>> VFP state.
>>>
>>> Moving the call to __activate_traps_fpsimd32 to the point where we
>>> know for sure that we are going to trap ensures that we don't set that
>>> bit spuriously.
>>>
>>> Fixes: e6b673b741ea ("KVM: arm64: Optimise FPSIMD handling to reduce guest/host thrashing")
>>> Cc: Dave Martin <dave.martin@arm.com>
>>> Reported-by: Alexander Graf <agraf@suse.de>
>>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>>
>> Very good catch. This patch does indeed fix it. I'm still puzzled why 
>> reverting the dcache patch actually had any positive effect, but oh well 
>> ... :)
> 
> Well, there is still a bug in that dcache patch, which I've fixed
> separately (in a better way than what I sent before), so you may have
> hit that as well, but that's not the most visible bug.
> 
>> Tested-by: Alexander Graf <agraf@suse.de>
> 
> Thanks for that.

Can I expect to see your patch in 4.19 and 4.18-stable? :)


Alex

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v3 0/9] arm/arm64: KVM: limit icache invalidation to prefetch aborts
@ 2017-10-23 15:49 ` Marc Zyngier
  0 siblings, 0 replies; 52+ messages in thread
From: Marc Zyngier @ 2017-10-23 15:49 UTC (permalink / raw)
  To: Christoffer Dall, Catalin Marinas, Will Deacon
  Cc: kvm, linux-arm-kernel, kvmarm

It was recently reported that on a VM restore, we seem to spend a
disproportionate amount of time invalidation the icache. This is
partially due to some HW behaviour, but also because we're being a bit
dumb and are invalidating the icache for every page we map at S2, even
if that on a data access.

The slightly better way of doing this is to mark the pages XN at S2,
and wait for the the guest to execute something in that page, at which
point we perform the invalidation. As it is likely that there is a lot
less instruction than data, we win (or so we hope).

We also take this opportunity to drop the extra dcache clean to the
PoU which is pretty useless, as we already clean all the way to the
PoC...

Running a bare metal test that touches 1GB of memory (using a 4kB
stride) leads to the following results on Seattle:

4.13:
do_fault_read.bin:       0.565885992 seconds time elapsed
do_fault_write.bin:       0.738296337 seconds time elapsed
do_fault_read_write.bin:       1.241812231 seconds time elapsed

4.14-rc3+patches:
do_fault_read.bin:       0.244961803 seconds time elapsed
do_fault_write.bin:       0.422740092 seconds time elapsed
do_fault_read_write.bin:       0.643402470 seconds time elapsed

We're almost halving the time of something that more or less looks
like a restore operation. Some larger systems will show much bigger
benefits as they become less impacted by the icache invalidation
(which is broadcast in the inner shareable domain). I've tried to
measure the impact on a VM boot in order to assess the impact of
taking an extra permission fault, but found that any difference was
simply noise.                                                                   

I've also given it a test run on both Cubietruck and Jetson-TK1.

Tests are archived here:
https://git.kernel.org/pub/scm/linux/kernel/git/maz/kvm-ws-tests.git/

I'd value some additional test results on HW I don't have access to.

* From v2:
  - Brought back the "detangling" patch that allows 32bit ARM to still
    compile...
  - Let arm64 icache invalidation deal with userspace addresses

* From v1:
  - Some function renaming (coherent->clean/invalidate)
  - Made the arm64 icache invalidation a macro that's now used in
    two places
  - Fixed BTB flushing on 32bit
  - Added stage2_is_exec as a predicate for XN being absent from the
    entry
  - Dropped patch #10 which was both useless and broken, and patch #9
    that thus became useless
  - Tried to measure the impact on kernel boot time and failed to see
    any difference

Marc Zyngier (9):
  KVM: arm/arm64: Detangle kvm_mmu.h from kvm_hyp.h
  KVM: arm/arm64: Split dcache/icache flushing
  arm64: KVM: Add invalidate_icache_range helper
  arm: KVM: Add optimized PIPT icache flushing
  arm64: KVM: PTE/PMD S2 XN bit definition
  KVM: arm/arm64: Limit icache invalidation to prefetch aborts
  KVM: arm/arm64: Only clean the dcache on translation fault
  KVM: arm/arm64: Preserve Exec permission across R/W permission faults
  KVM: arm/arm64: Drop vcpu parameter from guest cache maintenance
    operartions

 arch/arm/include/asm/kvm_hyp.h         |  3 +-
 arch/arm/include/asm/kvm_mmu.h         | 99 ++++++++++++++++++++++++++++------
 arch/arm/include/asm/pgtable.h         |  4 +-
 arch/arm/kvm/hyp/switch.c              |  1 +
 arch/arm/kvm/hyp/tlb.c                 |  1 +
 arch/arm64/include/asm/assembler.h     | 21 ++++++++
 arch/arm64/include/asm/cacheflush.h    |  7 +++
 arch/arm64/include/asm/kvm_hyp.h       |  1 -
 arch/arm64/include/asm/kvm_mmu.h       | 36 +++++++++++--
 arch/arm64/include/asm/pgtable-hwdef.h |  2 +
 arch/arm64/include/asm/pgtable-prot.h  |  4 +-
 arch/arm64/kvm/hyp/debug-sr.c          |  1 +
 arch/arm64/kvm/hyp/switch.c            |  1 +
 arch/arm64/kvm/hyp/tlb.c               |  1 +
 arch/arm64/mm/cache.S                  | 32 +++++++----
 virt/kvm/arm/hyp/vgic-v2-sr.c          |  1 +
 virt/kvm/arm/mmu.c                     | 64 +++++++++++++++++++---
 17 files changed, 236 insertions(+), 43 deletions(-)

-- 
2.11.0

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH v3 0/9] arm/arm64: KVM: limit icache invalidation to prefetch aborts
@ 2017-10-23 15:49 ` Marc Zyngier
  0 siblings, 0 replies; 52+ messages in thread
From: Marc Zyngier @ 2017-10-23 15:49 UTC (permalink / raw)
  To: linux-arm-kernel

It was recently reported that on a VM restore, we seem to spend a
disproportionate amount of time invalidation the icache. This is
partially due to some HW behaviour, but also because we're being a bit
dumb and are invalidating the icache for every page we map at S2, even
if that on a data access.

The slightly better way of doing this is to mark the pages XN at S2,
and wait for the the guest to execute something in that page, at which
point we perform the invalidation. As it is likely that there is a lot
less instruction than data, we win (or so we hope).

We also take this opportunity to drop the extra dcache clean to the
PoU which is pretty useless, as we already clean all the way to the
PoC...

Running a bare metal test that touches 1GB of memory (using a 4kB
stride) leads to the following results on Seattle:

4.13:
do_fault_read.bin:       0.565885992 seconds time elapsed
do_fault_write.bin:       0.738296337 seconds time elapsed
do_fault_read_write.bin:       1.241812231 seconds time elapsed

4.14-rc3+patches:
do_fault_read.bin:       0.244961803 seconds time elapsed
do_fault_write.bin:       0.422740092 seconds time elapsed
do_fault_read_write.bin:       0.643402470 seconds time elapsed

We're almost halving the time of something that more or less looks
like a restore operation. Some larger systems will show much bigger
benefits as they become less impacted by the icache invalidation
(which is broadcast in the inner shareable domain). I've tried to
measure the impact on a VM boot in order to assess the impact of
taking an extra permission fault, but found that any difference was
simply noise.                                                                   

I've also given it a test run on both Cubietruck and Jetson-TK1.

Tests are archived here:
https://git.kernel.org/pub/scm/linux/kernel/git/maz/kvm-ws-tests.git/

I'd value some additional test results on HW I don't have access to.

* From v2:
  - Brought back the "detangling" patch that allows 32bit ARM to still
    compile...
  - Let arm64 icache invalidation deal with userspace addresses

* From v1:
  - Some function renaming (coherent->clean/invalidate)
  - Made the arm64 icache invalidation a macro that's now used in
    two places
  - Fixed BTB flushing on 32bit
  - Added stage2_is_exec as a predicate for XN being absent from the
    entry
  - Dropped patch #10 which was both useless and broken, and patch #9
    that thus became useless
  - Tried to measure the impact on kernel boot time and failed to see
    any difference

Marc Zyngier (9):
  KVM: arm/arm64: Detangle kvm_mmu.h from kvm_hyp.h
  KVM: arm/arm64: Split dcache/icache flushing
  arm64: KVM: Add invalidate_icache_range helper
  arm: KVM: Add optimized PIPT icache flushing
  arm64: KVM: PTE/PMD S2 XN bit definition
  KVM: arm/arm64: Limit icache invalidation to prefetch aborts
  KVM: arm/arm64: Only clean the dcache on translation fault
  KVM: arm/arm64: Preserve Exec permission across R/W permission faults
  KVM: arm/arm64: Drop vcpu parameter from guest cache maintenance
    operartions

 arch/arm/include/asm/kvm_hyp.h         |  3 +-
 arch/arm/include/asm/kvm_mmu.h         | 99 ++++++++++++++++++++++++++++------
 arch/arm/include/asm/pgtable.h         |  4 +-
 arch/arm/kvm/hyp/switch.c              |  1 +
 arch/arm/kvm/hyp/tlb.c                 |  1 +
 arch/arm64/include/asm/assembler.h     | 21 ++++++++
 arch/arm64/include/asm/cacheflush.h    |  7 +++
 arch/arm64/include/asm/kvm_hyp.h       |  1 -
 arch/arm64/include/asm/kvm_mmu.h       | 36 +++++++++++--
 arch/arm64/include/asm/pgtable-hwdef.h |  2 +
 arch/arm64/include/asm/pgtable-prot.h  |  4 +-
 arch/arm64/kvm/hyp/debug-sr.c          |  1 +
 arch/arm64/kvm/hyp/switch.c            |  1 +
 arch/arm64/kvm/hyp/tlb.c               |  1 +
 arch/arm64/mm/cache.S                  | 32 +++++++----
 virt/kvm/arm/hyp/vgic-v2-sr.c          |  1 +
 virt/kvm/arm/mmu.c                     | 64 +++++++++++++++++++---
 17 files changed, 236 insertions(+), 43 deletions(-)

-- 
2.11.0

^ permalink raw reply	[flat|nested] 52+ messages in thread

end of thread, other threads:[~2018-09-01 10:03 UTC | newest]

Thread overview: 52+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-10-23 16:11 [PATCH v3 0/9] arm/arm64: KVM: limit icache invalidation to prefetch aborts Marc Zyngier
2017-10-23 16:11 ` Marc Zyngier
2017-10-23 16:11 ` [PATCH v3 1/9] KVM: arm/arm64: Detangle kvm_mmu.h from kvm_hyp.h Marc Zyngier
2017-10-23 16:11   ` Marc Zyngier
2017-10-23 16:11 ` [PATCH v3 2/9] KVM: arm/arm64: Split dcache/icache flushing Marc Zyngier
2017-10-23 16:11   ` Marc Zyngier
2017-10-23 16:11 ` [PATCH v3 3/9] arm64: KVM: Add invalidate_icache_range helper Marc Zyngier
2017-10-23 16:11   ` Marc Zyngier
2017-10-23 16:19   ` Will Deacon
2017-10-23 16:19     ` Will Deacon
2017-10-23 16:11 ` [PATCH v3 4/9] arm: KVM: Add optimized PIPT icache flushing Marc Zyngier
2017-10-23 16:11   ` Marc Zyngier
2017-10-23 16:11 ` [PATCH v3 5/9] arm64: KVM: PTE/PMD S2 XN bit definition Marc Zyngier
2017-10-23 16:11   ` Marc Zyngier
2017-10-23 16:11 ` [PATCH v3 6/9] KVM: arm/arm64: Limit icache invalidation to prefetch aborts Marc Zyngier
2017-10-23 16:11   ` Marc Zyngier
2017-11-01 10:17   ` Andrew Jones
2017-11-01 10:17     ` Andrew Jones
2017-11-02 10:36     ` Marc Zyngier
2017-11-02 10:36       ` Marc Zyngier
2017-11-02 13:13       ` Andrew Jones
2017-11-02 13:13         ` Andrew Jones
2017-10-23 16:11 ` [PATCH v3 7/9] KVM: arm/arm64: Only clean the dcache on translation fault Marc Zyngier
2017-10-23 16:11   ` Marc Zyngier
2018-08-21 13:35   ` Alexander Graf
2018-08-21 13:35     ` Alexander Graf
2018-08-21 13:42     ` Alexander Graf
2018-08-21 13:42       ` Alexander Graf
2018-08-21 13:57     ` Marc Zyngier
2018-08-21 13:57       ` Marc Zyngier
2018-08-21 14:08       ` Alexander Graf
2018-08-21 14:08         ` Alexander Graf
2018-08-21 15:08         ` Marc Zyngier
2018-08-21 15:08           ` Marc Zyngier
2018-08-21 16:54           ` Alexander Graf
2018-08-21 16:54             ` Alexander Graf
2018-08-23 11:16             ` Marc Zyngier
2018-08-23 11:16               ` Marc Zyngier
2018-08-23 12:24               ` Alexander Graf
2018-08-23 12:24                 ` Alexander Graf
2018-08-23 12:43                 ` Marc Zyngier
2018-08-23 12:43                   ` Marc Zyngier
2018-09-01 10:03                   ` Alexander Graf
2018-09-01 10:03                     ` Alexander Graf
2018-08-21 16:45         ` Alexander Graf
2018-08-21 16:45           ` Alexander Graf
2017-10-23 16:11 ` [PATCH v3 8/9] KVM: arm/arm64: Preserve Exec permission across R/W permission faults Marc Zyngier
2017-10-23 16:11   ` Marc Zyngier
2017-10-23 16:11 ` [PATCH v3 9/9] KVM: arm/arm64: Drop vcpu parameter from guest cache maintenance operartions Marc Zyngier
2017-10-23 16:11   ` Marc Zyngier
  -- strict thread matches above, loose matches on Subject: below --
2017-10-23 15:49 [PATCH v3 0/9] arm/arm64: KVM: limit icache invalidation to prefetch aborts Marc Zyngier
2017-10-23 15:49 ` Marc Zyngier

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.