KVM ARM Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH 00/26] KVM: arm64: Preliminary NV patches
@ 2020-04-22 12:00 Marc Zyngier
  2020-04-22 12:00 ` [PATCH 01/26] KVM: arm64: Check advertised Stage-2 page size capability Marc Zyngier
                   ` (25 more replies)
  0 siblings, 26 replies; 78+ messages in thread
From: Marc Zyngier @ 2020-04-22 12:00 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Will Deacon, Andre Przywara, Dave Martin, George Cherian,
	Zengtao (B),
	Catalin Marinas

Hi all,

In order not to repeat the 90+ patch series that resulted in a
deafening silence last time, I've extracted a smaller set of patches
that form the required dependencies that allow the rest of the 65 NV
patches to be added on top. Yes, it is that bad.

The one real feature here is support for the ARMv8.4-TTL extension at
Stage-2 only. The reason to support it is that it helps the hypervisor
a lot when it comes to finding out how much to invalidate. It is thus
always "supported" with NV.

The rest doesn't contain any functionality change. Most of it reworks
existing data structures and adds new accessors for the things that
get moved around. The reason for this is that:

- With NV, we end-up with multiple Stage-2 MMU contexts per VM instead
  of a single one. This requires we divorce struct kvm from the S2 MMU
  configuration. Of course, we stick with a single MMU context for now.

- With ARMv8.4-NV, a number of system register accesses are turned
  into memory accesses into the so-called VNCR page. It is thus
  convenient to make this VNCR page part of the vcpu context and avoid
  copying data back and forth. For this to work, we need to make sure
  that all the VNCR-aware sysregs are moved into our per-vcpu sys_regs
  array instead of leaving in other data structures (the timers, for
  example). The VNCR page itself isn't introduced with these patches.

- As some of these data structures change, we need a way to isolate
  the userspace ABI from such change.

- The exception generation code is also reworked to prepare the
  addition of EL2 exceptions.

There is also a number of cleanups that were in the full fat series
that I decided to move early to get them out of the way.

The whole this is a bit of a mix of vaguely unrelated "stuff", but it
all comes together if you look at the final series[1]. This applies on
top of v5.7-rc1.

I haven't applied any of the Tested-by: tags, as the series keeps
changing. Please keep testing though!

[1] https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=kvm-arm64/nv-5.7-rc1-WIP

Christoffer Dall (2):
  KVM: arm64: Factor out stage 2 page table data from struct kvm
  KVM: arm64: vgic-v3: Take cpu_if pointer directly instead of vcpu

Marc Zyngier (24):
  KVM: arm64: Check advertised Stage-2 page size capability
  KVM: arm64: Move __load_guest_stage2 to kvm_mmu.h
  arm64: Detect the ARMv8.4 TTL feature
  arm64: Document SW reserved PTE/PMD bits in Stage-2 descriptors
  arm64: Add level-hinted TLB invalidation helper
  KVM: arm64: Add a level hint to __kvm_tlb_flush_vmid_ipa
  KVM: arm64: Use TTL hint in when invalidating stage-2 translations
  KVM: arm64: Refactor vcpu_{read,write}_sys_reg
  KVM: arm64: Add missing reset handlers for PMU emulation
  KVM: arm64: Move sysreg reset check to boot time
  KVM: arm64: Introduce accessor for ctxt->sys_reg
  KVM: arm64: hyp: Use ctxt_sys_reg/__vcpu_sys_reg instead of raw
    sys_regs access
  KVM: arm64: sve: Use __vcpu_sys_reg() instead of raw sys_regs access
  KVM: arm64: pauth: Use ctxt_sys_reg() instead of raw sys_regs access
  KVM: arm64: debug: Use ctxt_sys_reg() instead of raw sys_regs access
  KVM: arm64: Don't use empty structures as CPU reset state
  KVM: arm64: Make struct kvm_regs userspace-only
  KVM: arm64: Move ELR_EL1 to the system register array
  KVM: arm64: Move SP_EL1 to the system register array
  KVM: arm64: Disintegrate SPSR array
  KVM: arm64: Move SPSR_EL1 to the system register array
  KVM: arm64: timers: Rename kvm_timer_sync_hwstate to
    kvm_timer_sync_user
  KVM: arm64: timers: Move timer registers to the sys_regs file
  KVM: arm64: Parametrize exception entry with a target EL

 arch/arm64/include/asm/cpucaps.h        |   3 +-
 arch/arm64/include/asm/kvm_asm.h        |   6 +-
 arch/arm64/include/asm/kvm_emulate.h    |  37 +---
 arch/arm64/include/asm/kvm_host.h       |  71 +++++--
 arch/arm64/include/asm/kvm_hyp.h        |  30 +--
 arch/arm64/include/asm/kvm_mmu.h        |  27 ++-
 arch/arm64/include/asm/pgtable-hwdef.h  |   2 +
 arch/arm64/include/asm/stage2_pgtable.h |   9 +
 arch/arm64/include/asm/sysreg.h         |   4 +
 arch/arm64/include/asm/tlbflush.h       |  30 +++
 arch/arm64/kernel/asm-offsets.c         |   3 +-
 arch/arm64/kernel/cpufeature.c          |  19 ++
 arch/arm64/kvm/fpsimd.c                 |   6 +-
 arch/arm64/kvm/guest.c                  |  79 ++++++-
 arch/arm64/kvm/handle_exit.c            |  17 +-
 arch/arm64/kvm/hyp/debug-sr.c           |  18 +-
 arch/arm64/kvm/hyp/entry.S              |   3 +-
 arch/arm64/kvm/hyp/switch.c             |  31 ++-
 arch/arm64/kvm/hyp/sysreg-sr.c          | 160 +++++++-------
 arch/arm64/kvm/hyp/tlb.c                |  51 +++--
 arch/arm64/kvm/inject_fault.c           |  75 +++----
 arch/arm64/kvm/regmap.c                 |  37 +++-
 arch/arm64/kvm/reset.c                  |  60 ++++--
 arch/arm64/kvm/sys_regs.c               | 215 ++++++++++---------
 include/kvm/arm_arch_timer.h            |  13 +-
 include/kvm/arm_vgic.h                  |   5 +-
 virt/kvm/arm/arch_timer.c               | 157 +++++++++++---
 virt/kvm/arm/arm.c                      |  40 ++--
 virt/kvm/arm/hyp/vgic-v3-sr.c           |  33 +--
 virt/kvm/arm/mmu.c                      | 267 +++++++++++++-----------
 virt/kvm/arm/trace.h                    |   8 +-
 virt/kvm/arm/vgic/vgic-v2.c             |  10 +-
 virt/kvm/arm/vgic/vgic-v3.c             |  14 +-
 virt/kvm/arm/vgic/vgic.c                |  25 ++-
 34 files changed, 942 insertions(+), 623 deletions(-)

-- 
2.26.1

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH 01/26] KVM: arm64: Check advertised Stage-2 page size capability
  2020-04-22 12:00 [PATCH 00/26] KVM: arm64: Preliminary NV patches Marc Zyngier
@ 2020-04-22 12:00 ` Marc Zyngier
  2020-04-22 13:40   ` Suzuki K Poulose
  2020-05-07 11:42   ` Alexandru Elisei
  2020-04-22 12:00 ` [PATCH 02/26] KVM: arm64: Move __load_guest_stage2 to kvm_mmu.h Marc Zyngier
                   ` (24 subsequent siblings)
  25 siblings, 2 replies; 78+ messages in thread
From: Marc Zyngier @ 2020-04-22 12:00 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Will Deacon, Andre Przywara, Dave Martin, George Cherian,
	Zengtao (B),
	Catalin Marinas

With ARMv8.5-GTG, the hardware (or more likely a hypervisor) can
advertise the supported Stage-2 page sizes.

Let's check this at boot time.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_host.h |  2 +-
 arch/arm64/include/asm/sysreg.h   |  3 +++
 arch/arm64/kernel/cpufeature.c    |  8 +++++++
 arch/arm64/kvm/reset.c            | 40 ++++++++++++++++++++++++++++---
 virt/kvm/arm/arm.c                |  4 +---
 5 files changed, 50 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 32c8a675e5a4a..7dd8fefa6aecd 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -670,7 +670,7 @@ static inline int kvm_arm_have_ssbd(void)
 void kvm_vcpu_load_sysregs(struct kvm_vcpu *vcpu);
 void kvm_vcpu_put_sysregs(struct kvm_vcpu *vcpu);
 
-void kvm_set_ipa_limit(void);
+int kvm_set_ipa_limit(void);
 
 #define __KVM_HAVE_ARCH_VM_ALLOC
 struct kvm *kvm_arch_alloc_vm(void);
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index ebc6224328318..5d10c9148e844 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -686,6 +686,9 @@
 #define ID_AA64ZFR0_SVEVER_SVE2		0x1
 
 /* id_aa64mmfr0 */
+#define ID_AA64MMFR0_TGRAN4_2_SHIFT	40
+#define ID_AA64MMFR0_TGRAN64_2_SHIFT	36
+#define ID_AA64MMFR0_TGRAN16_2_SHIFT	32
 #define ID_AA64MMFR0_TGRAN4_SHIFT	28
 #define ID_AA64MMFR0_TGRAN64_SHIFT	24
 #define ID_AA64MMFR0_TGRAN16_SHIFT	20
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 9fac745aa7bb2..9892a845d06c9 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -208,6 +208,14 @@ static const struct arm64_ftr_bits ftr_id_aa64zfr0[] = {
 };
 
 static const struct arm64_ftr_bits ftr_id_aa64mmfr0[] = {
+	/*
+	 * Page size not being supported at Stage-2 are not fatal. You
+	 * just give up KVM if PAGE_SIZE isn't supported there. Go fix
+	 * your favourite nesting hypervisor.
+	 */
+	ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_EXACT, ID_AA64MMFR0_TGRAN4_2_SHIFT, 4, 1),
+	ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_EXACT, ID_AA64MMFR0_TGRAN64_2_SHIFT, 4, 1),
+	ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_EXACT, ID_AA64MMFR0_TGRAN16_2_SHIFT, 4, 1),
 	/*
 	 * We already refuse to boot CPUs that don't support our configured
 	 * page size, so we can only detect mismatches for a page size other
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index 30b7ea680f66c..241db35a7ef4f 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -9,6 +9,7 @@
  */
 
 #include <linux/errno.h>
+#include <linux/bitfield.h>
 #include <linux/kernel.h>
 #include <linux/kvm_host.h>
 #include <linux/kvm.h>
@@ -340,11 +341,42 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
 	return ret;
 }
 
-void kvm_set_ipa_limit(void)
+int kvm_set_ipa_limit(void)
 {
-	unsigned int ipa_max, pa_max, va_max, parange;
+	unsigned int ipa_max, pa_max, va_max, parange, tgran_2;
+	u64 mmfr0 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1);
 
-	parange = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1) & 0x7;
+	/*
+	 * Check with ARMv8.5-GTG that our PAGE_SIZE is supported at
+	 * Stage-2. If not, things will stop very quickly.
+	 */
+	switch (PAGE_SIZE) {
+	default:
+	case SZ_4K:
+		tgran_2 = ID_AA64MMFR0_TGRAN4_2_SHIFT;
+		break;
+	case SZ_16K:
+		tgran_2 = ID_AA64MMFR0_TGRAN16_2_SHIFT;
+		break;
+	case SZ_64K:
+		tgran_2 = ID_AA64MMFR0_TGRAN64_2_SHIFT;
+		break;
+	}
+
+	switch (FIELD_GET(0xFUL << tgran_2, mmfr0)) {
+	default:
+	case 1:
+		kvm_err("PAGE_SIZE not supported at Stage-2, giving up\n");
+		return -EINVAL;
+	case 0:
+		kvm_debug("PAGE_SIZE supported at Stage-2 (default)\n");
+		break;
+	case 2:
+		kvm_debug("PAGE_SIZE supported at Stage-2 (advertised)\n");
+		break;
+	}
+
+	parange = mmfr0 & 0x7;
 	pa_max = id_aa64mmfr0_parange_to_phys_shift(parange);
 
 	/* Clamp the IPA limit to the PA size supported by the kernel */
@@ -378,6 +410,8 @@ void kvm_set_ipa_limit(void)
 	     "KVM IPA limit (%d bit) is smaller than default size\n", ipa_max);
 	kvm_ipa_limit = ipa_max;
 	kvm_info("IPA Size Limit: %dbits\n", kvm_ipa_limit);
+
+	return 0;
 }
 
 /*
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index 48d0ec44ad77e..53b3ba9173ba7 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -1387,9 +1387,7 @@ static inline void hyp_cpu_pm_exit(void)
 
 static int init_common_resources(void)
 {
-	kvm_set_ipa_limit();
-
-	return 0;
+	return kvm_set_ipa_limit();
 }
 
 static int init_subsystems(void)
-- 
2.26.1

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH 02/26] KVM: arm64: Move __load_guest_stage2 to kvm_mmu.h
  2020-04-22 12:00 [PATCH 00/26] KVM: arm64: Preliminary NV patches Marc Zyngier
  2020-04-22 12:00 ` [PATCH 01/26] KVM: arm64: Check advertised Stage-2 page size capability Marc Zyngier
@ 2020-04-22 12:00 ` Marc Zyngier
  2020-04-22 13:51   ` Suzuki K Poulose
  2020-04-22 12:00 ` [PATCH 03/26] KVM: arm64: Factor out stage 2 page table data from struct kvm Marc Zyngier
                   ` (23 subsequent siblings)
  25 siblings, 1 reply; 78+ messages in thread
From: Marc Zyngier @ 2020-04-22 12:00 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Will Deacon, Andre Przywara, Dave Martin, George Cherian,
	Zengtao (B),
	Catalin Marinas

Having __load_guest_stage2 in kvm_hyp.h is quickly going to trigger
a circular include problem. In order to avoid this, let's move
it to kvm_mmu.h, where it will be a better fit anyway.

In the process, drop the __hyp_text annotation, which doesn't help
as the function is marked as __always_inline.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_hyp.h | 18 ------------------
 arch/arm64/include/asm/kvm_mmu.h | 17 +++++++++++++++++
 2 files changed, 17 insertions(+), 18 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
index fe57f60f06a89..dcb63bf941057 100644
--- a/arch/arm64/include/asm/kvm_hyp.h
+++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -10,7 +10,6 @@
 #include <linux/compiler.h>
 #include <linux/kvm_host.h>
 #include <asm/alternative.h>
-#include <asm/kvm_mmu.h>
 #include <asm/sysreg.h>
 
 #define __hyp_text __section(.hyp.text) notrace
@@ -88,22 +87,5 @@ void deactivate_traps_vhe_put(void);
 u64 __guest_enter(struct kvm_vcpu *vcpu, struct kvm_cpu_context *host_ctxt);
 void __noreturn __hyp_do_panic(unsigned long, ...);
 
-/*
- * Must be called from hyp code running at EL2 with an updated VTTBR
- * and interrupts disabled.
- */
-static __always_inline void __hyp_text __load_guest_stage2(struct kvm *kvm)
-{
-	write_sysreg(kvm->arch.vtcr, vtcr_el2);
-	write_sysreg(kvm_get_vttbr(kvm), vttbr_el2);
-
-	/*
-	 * ARM errata 1165522 and 1530923 require the actual execution of the
-	 * above before we can switch to the EL1/EL0 translation regime used by
-	 * the guest.
-	 */
-	asm(ALTERNATIVE("nop", "isb", ARM64_WORKAROUND_SPECULATIVE_AT_VHE));
-}
-
 #endif /* __ARM64_KVM_HYP_H__ */
 
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 30b0e8d6b8953..5ba1310639ec6 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -604,5 +604,22 @@ static __always_inline u64 kvm_get_vttbr(struct kvm *kvm)
 	return kvm_phys_to_vttbr(baddr) | vmid_field | cnp;
 }
 
+/*
+ * Must be called from hyp code running at EL2 with an updated VTTBR
+ * and interrupts disabled.
+ */
+static __always_inline void __load_guest_stage2(struct kvm *kvm)
+{
+	write_sysreg(kvm->arch.vtcr, vtcr_el2);
+	write_sysreg(kvm_get_vttbr(kvm), vttbr_el2);
+
+	/*
+	 * ARM erratum 1165522 requires the actual execution of the above
+	 * before we can switch to the EL1/EL0 translation regime used by
+	 * the guest.
+	 */
+	asm(ALTERNATIVE("nop", "isb", ARM64_WORKAROUND_SPECULATIVE_AT_VHE));
+}
+
 #endif /* __ASSEMBLY__ */
 #endif /* __ARM64_KVM_MMU_H__ */
-- 
2.26.1

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH 03/26] KVM: arm64: Factor out stage 2 page table data from struct kvm
  2020-04-22 12:00 [PATCH 00/26] KVM: arm64: Preliminary NV patches Marc Zyngier
  2020-04-22 12:00 ` [PATCH 01/26] KVM: arm64: Check advertised Stage-2 page size capability Marc Zyngier
  2020-04-22 12:00 ` [PATCH 02/26] KVM: arm64: Move __load_guest_stage2 to kvm_mmu.h Marc Zyngier
@ 2020-04-22 12:00 ` Marc Zyngier
  2020-05-05 15:26   ` Andrew Scull
                     ` (2 more replies)
  2020-04-22 12:00 ` [PATCH 04/26] arm64: Detect the ARMv8.4 TTL feature Marc Zyngier
                   ` (22 subsequent siblings)
  25 siblings, 3 replies; 78+ messages in thread
From: Marc Zyngier @ 2020-04-22 12:00 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Will Deacon, Andre Przywara, Dave Martin, George Cherian,
	Zengtao (B),
	Catalin Marinas

From: Christoffer Dall <christoffer.dall@arm.com>

As we are about to reuse our stage 2 page table manipulation code for
shadow stage 2 page tables in the context of nested virtualization, we
are going to manage multiple stage 2 page tables for a single VM.

This requires some pretty invasive changes to our data structures,
which moves the vmid and pgd pointers into a separate structure and
change pretty much all of our mmu code to operate on this structure
instead.

The new structure is called struct kvm_s2_mmu.

There is no intended functional change by this patch alone.

[Designed data structure layout in collaboration]
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Co-developed-by: Marc Zyngier <maz@kernel.org>
[maz: Moved the last_vcpu_ran down to the S2 MMU structure as well]
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_asm.h  |   5 +-
 arch/arm64/include/asm/kvm_host.h |  30 +++-
 arch/arm64/include/asm/kvm_mmu.h  |  16 +-
 arch/arm64/kvm/hyp/switch.c       |   8 +-
 arch/arm64/kvm/hyp/tlb.c          |  48 +++---
 virt/kvm/arm/arm.c                |  32 +---
 virt/kvm/arm/mmu.c                | 266 +++++++++++++++++-------------
 7 files changed, 219 insertions(+), 186 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 7c7eeeaab9faa..5adf4e1a4c2c9 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -53,6 +53,7 @@
 
 struct kvm;
 struct kvm_vcpu;
+struct kvm_s2_mmu;
 
 extern char __kvm_hyp_init[];
 extern char __kvm_hyp_init_end[];
@@ -60,8 +61,8 @@ extern char __kvm_hyp_init_end[];
 extern char __kvm_hyp_vector[];
 
 extern void __kvm_flush_vm_context(void);
-extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa);
-extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
+extern void __kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu, phys_addr_t ipa);
+extern void __kvm_tlb_flush_vmid(struct kvm_s2_mmu *mmu);
 extern void __kvm_tlb_flush_local_vmid(struct kvm_vcpu *vcpu);
 
 extern void __kvm_timer_set_cntvoff(u32 cntvoff_low, u32 cntvoff_high);
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 7dd8fefa6aecd..664a5d92ae9b8 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -63,19 +63,32 @@ struct kvm_vmid {
 	u32    vmid;
 };
 
-struct kvm_arch {
+struct kvm_s2_mmu {
 	struct kvm_vmid vmid;
 
-	/* stage2 entry level table */
-	pgd_t *pgd;
-	phys_addr_t pgd_phys;
-
-	/* VTCR_EL2 value for this VM */
-	u64    vtcr;
+	/*
+	 * stage2 entry level table
+	 *
+	 * Two kvm_s2_mmu structures in the same VM can point to the same pgd
+	 * here.  This happens when running a non-VHE guest hypervisor which
+	 * uses the canonical stage 2 page table for both vEL2 and for vEL1/0
+	 * with vHCR_EL2.VM == 0.
+	 */
+	pgd_t		*pgd;
+	phys_addr_t	pgd_phys;
 
 	/* The last vcpu id that ran on each physical CPU */
 	int __percpu *last_vcpu_ran;
 
+	struct kvm *kvm;
+};
+
+struct kvm_arch {
+	struct kvm_s2_mmu mmu;
+
+	/* VTCR_EL2 value for this VM */
+	u64    vtcr;
+
 	/* The maximum number of vCPUs depends on the used GIC model */
 	int max_vcpus;
 
@@ -255,6 +268,9 @@ struct kvm_vcpu_arch {
 	void *sve_state;
 	unsigned int sve_max_vl;
 
+	/* Stage 2 paging state used by the hardware on next switch */
+	struct kvm_s2_mmu *hw_mmu;
+
 	/* HYP configuration */
 	u64 hcr_el2;
 	u32 mdcr_el2;
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 5ba1310639ec6..c6c8eee008d66 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -154,8 +154,8 @@ int create_hyp_exec_mappings(phys_addr_t phys_addr, size_t size,
 void free_hyp_pgds(void);
 
 void stage2_unmap_vm(struct kvm *kvm);
-int kvm_alloc_stage2_pgd(struct kvm *kvm);
-void kvm_free_stage2_pgd(struct kvm *kvm);
+int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu);
+void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu);
 int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
 			  phys_addr_t pa, unsigned long size, bool writable);
 
@@ -593,13 +593,13 @@ static inline u64 kvm_vttbr_baddr_mask(struct kvm *kvm)
 	return vttbr_baddr_mask(kvm_phys_shift(kvm), kvm_stage2_levels(kvm));
 }
 
-static __always_inline u64 kvm_get_vttbr(struct kvm *kvm)
+static __always_inline u64 kvm_get_vttbr(struct kvm_s2_mmu *mmu)
 {
-	struct kvm_vmid *vmid = &kvm->arch.vmid;
+	struct kvm_vmid *vmid = &mmu->vmid;
 	u64 vmid_field, baddr;
 	u64 cnp = system_supports_cnp() ? VTTBR_CNP_BIT : 0;
 
-	baddr = kvm->arch.pgd_phys;
+	baddr = mmu->pgd_phys;
 	vmid_field = (u64)vmid->vmid << VTTBR_VMID_SHIFT;
 	return kvm_phys_to_vttbr(baddr) | vmid_field | cnp;
 }
@@ -608,10 +608,10 @@ static __always_inline u64 kvm_get_vttbr(struct kvm *kvm)
  * Must be called from hyp code running at EL2 with an updated VTTBR
  * and interrupts disabled.
  */
-static __always_inline void __load_guest_stage2(struct kvm *kvm)
+static __always_inline void __load_guest_stage2(struct kvm_s2_mmu *mmu)
 {
-	write_sysreg(kvm->arch.vtcr, vtcr_el2);
-	write_sysreg(kvm_get_vttbr(kvm), vttbr_el2);
+	write_sysreg(kern_hyp_va(mmu->kvm)->arch.vtcr, vtcr_el2);
+	write_sysreg(kvm_get_vttbr(mmu), vttbr_el2);
 
 	/*
 	 * ARM erratum 1165522 requires the actual execution of the above
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 8a1e81a400e0f..d79319038b119 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -256,9 +256,9 @@ void deactivate_traps_vhe_put(void)
 	__deactivate_traps_common();
 }
 
-static void __hyp_text __activate_vm(struct kvm *kvm)
+static void __hyp_text __activate_vm(struct kvm_s2_mmu *mmu)
 {
-	__load_guest_stage2(kvm);
+	__load_guest_stage2(mmu);
 }
 
 static void __hyp_text __deactivate_vm(struct kvm_vcpu *vcpu)
@@ -659,7 +659,7 @@ static int __kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
 	 * stage 2 translation, and __activate_traps clear HCR_EL2.TGE
 	 * (among other things).
 	 */
-	__activate_vm(vcpu->kvm);
+	__activate_vm(vcpu->arch.hw_mmu);
 	__activate_traps(vcpu);
 
 	sysreg_restore_guest_state_vhe(guest_ctxt);
@@ -766,7 +766,7 @@ int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu)
 	__sysreg32_restore_state(vcpu);
 	__sysreg_restore_state_nvhe(guest_ctxt);
 
-	__activate_vm(kern_hyp_va(vcpu->kvm));
+	__activate_vm(kern_hyp_va(vcpu->arch.hw_mmu));
 	__activate_traps(vcpu);
 
 	__hyp_vgic_restore_state(vcpu);
diff --git a/arch/arm64/kvm/hyp/tlb.c b/arch/arm64/kvm/hyp/tlb.c
index ceaddbe4279f9..b795891b0d091 100644
--- a/arch/arm64/kvm/hyp/tlb.c
+++ b/arch/arm64/kvm/hyp/tlb.c
@@ -16,7 +16,7 @@ struct tlb_inv_context {
 	u64		sctlr;
 };
 
-static void __hyp_text __tlb_switch_to_guest_vhe(struct kvm *kvm,
+static void __hyp_text __tlb_switch_to_guest_vhe(struct kvm_s2_mmu *mmu,
 						 struct tlb_inv_context *cxt)
 {
 	u64 val;
@@ -53,14 +53,14 @@ static void __hyp_text __tlb_switch_to_guest_vhe(struct kvm *kvm,
 	 * place before clearing TGE. __load_guest_stage2() already
 	 * has an ISB in order to deal with this.
 	 */
-	__load_guest_stage2(kvm);
+	__load_guest_stage2(mmu);
 	val = read_sysreg(hcr_el2);
 	val &= ~HCR_TGE;
 	write_sysreg(val, hcr_el2);
 	isb();
 }
 
-static void __hyp_text __tlb_switch_to_guest_nvhe(struct kvm *kvm,
+static void __hyp_text __tlb_switch_to_guest_nvhe(struct kvm_s2_mmu *mmu,
 						  struct tlb_inv_context *cxt)
 {
 	if (cpus_have_final_cap(ARM64_WORKAROUND_SPECULATIVE_AT_NVHE)) {
@@ -79,21 +79,20 @@ static void __hyp_text __tlb_switch_to_guest_nvhe(struct kvm *kvm,
 		isb();
 	}
 
-	__load_guest_stage2(kvm);
+	__load_guest_stage2(mmu);
 	isb();
 }
 
-static void __hyp_text __tlb_switch_to_guest(struct kvm *kvm,
+static void __hyp_text __tlb_switch_to_guest(struct kvm_s2_mmu *mmu,
 					     struct tlb_inv_context *cxt)
 {
 	if (has_vhe())
-		__tlb_switch_to_guest_vhe(kvm, cxt);
+		__tlb_switch_to_guest_vhe(mmu, cxt);
 	else
-		__tlb_switch_to_guest_nvhe(kvm, cxt);
+		__tlb_switch_to_guest_nvhe(mmu, cxt);
 }
 
-static void __hyp_text __tlb_switch_to_host_vhe(struct kvm *kvm,
-						struct tlb_inv_context *cxt)
+static void __hyp_text __tlb_switch_to_host_vhe(struct tlb_inv_context *cxt)
 {
 	/*
 	 * We're done with the TLB operation, let's restore the host's
@@ -112,8 +111,7 @@ static void __hyp_text __tlb_switch_to_host_vhe(struct kvm *kvm,
 	local_irq_restore(cxt->flags);
 }
 
-static void __hyp_text __tlb_switch_to_host_nvhe(struct kvm *kvm,
-						 struct tlb_inv_context *cxt)
+static void __hyp_text __tlb_switch_to_host_nvhe(struct tlb_inv_context *cxt)
 {
 	write_sysreg(0, vttbr_el2);
 
@@ -125,24 +123,24 @@ static void __hyp_text __tlb_switch_to_host_nvhe(struct kvm *kvm,
 	}
 }
 
-static void __hyp_text __tlb_switch_to_host(struct kvm *kvm,
+static void __hyp_text __tlb_switch_to_host(struct kvm_s2_mmu *mmu,
 					    struct tlb_inv_context *cxt)
 {
 	if (has_vhe())
-		__tlb_switch_to_host_vhe(kvm, cxt);
+		__tlb_switch_to_host_vhe(cxt);
 	else
-		__tlb_switch_to_host_nvhe(kvm, cxt);
+		__tlb_switch_to_host_nvhe(cxt);
 }
 
-void __hyp_text __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa)
+void __hyp_text __kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu, phys_addr_t ipa)
 {
 	struct tlb_inv_context cxt;
 
 	dsb(ishst);
 
 	/* Switch to requested VMID */
-	kvm = kern_hyp_va(kvm);
-	__tlb_switch_to_guest(kvm, &cxt);
+	mmu = kern_hyp_va(mmu);
+	__tlb_switch_to_guest(mmu, &cxt);
 
 	/*
 	 * We could do so much better if we had the VA as well.
@@ -185,39 +183,39 @@ void __hyp_text __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa)
 	if (!has_vhe() && icache_is_vpipt())
 		__flush_icache_all();
 
-	__tlb_switch_to_host(kvm, &cxt);
+	__tlb_switch_to_host(mmu, &cxt);
 }
 
-void __hyp_text __kvm_tlb_flush_vmid(struct kvm *kvm)
+void __hyp_text __kvm_tlb_flush_vmid(struct kvm_s2_mmu *mmu)
 {
 	struct tlb_inv_context cxt;
 
 	dsb(ishst);
 
 	/* Switch to requested VMID */
-	kvm = kern_hyp_va(kvm);
-	__tlb_switch_to_guest(kvm, &cxt);
+	mmu = kern_hyp_va(mmu);
+	__tlb_switch_to_guest(mmu, &cxt);
 
 	__tlbi(vmalls12e1is);
 	dsb(ish);
 	isb();
 
-	__tlb_switch_to_host(kvm, &cxt);
+	__tlb_switch_to_host(mmu, &cxt);
 }
 
 void __hyp_text __kvm_tlb_flush_local_vmid(struct kvm_vcpu *vcpu)
 {
-	struct kvm *kvm = kern_hyp_va(kern_hyp_va(vcpu)->kvm);
+	struct kvm_s2_mmu *mmu = kern_hyp_va(kern_hyp_va(vcpu)->arch.hw_mmu);
 	struct tlb_inv_context cxt;
 
 	/* Switch to requested VMID */
-	__tlb_switch_to_guest(kvm, &cxt);
+	__tlb_switch_to_guest(mmu, &cxt);
 
 	__tlbi(vmalle1);
 	dsb(nsh);
 	isb();
 
-	__tlb_switch_to_host(kvm, &cxt);
+	__tlb_switch_to_host(mmu, &cxt);
 }
 
 void __hyp_text __kvm_flush_vm_context(void)
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index 53b3ba9173ba7..03f01fcfa2bd5 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -101,22 +101,15 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
  */
 int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 {
-	int ret, cpu;
+	int ret;
 
 	ret = kvm_arm_setup_stage2(kvm, type);
 	if (ret)
 		return ret;
 
-	kvm->arch.last_vcpu_ran = alloc_percpu(typeof(*kvm->arch.last_vcpu_ran));
-	if (!kvm->arch.last_vcpu_ran)
-		return -ENOMEM;
-
-	for_each_possible_cpu(cpu)
-		*per_cpu_ptr(kvm->arch.last_vcpu_ran, cpu) = -1;
-
-	ret = kvm_alloc_stage2_pgd(kvm);
+	ret = kvm_init_stage2_mmu(kvm, &kvm->arch.mmu);
 	if (ret)
-		goto out_fail_alloc;
+		return ret;
 
 	ret = create_hyp_mappings(kvm, kvm + 1, PAGE_HYP);
 	if (ret)
@@ -124,19 +117,13 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 
 	kvm_vgic_early_init(kvm);
 
-	/* Mark the initial VMID generation invalid */
-	kvm->arch.vmid.vmid_gen = 0;
-
 	/* The maximum number of VCPUs is limited by the host's GIC model */
 	kvm->arch.max_vcpus = vgic_present ?
 				kvm_vgic_get_max_vcpus() : KVM_MAX_VCPUS;
 
 	return ret;
 out_free_stage2_pgd:
-	kvm_free_stage2_pgd(kvm);
-out_fail_alloc:
-	free_percpu(kvm->arch.last_vcpu_ran);
-	kvm->arch.last_vcpu_ran = NULL;
+	kvm_free_stage2_pgd(&kvm->arch.mmu);
 	return ret;
 }
 
@@ -161,9 +148,6 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
 
 	kvm_vgic_destroy(kvm);
 
-	free_percpu(kvm->arch.last_vcpu_ran);
-	kvm->arch.last_vcpu_ran = NULL;
-
 	for (i = 0; i < KVM_MAX_VCPUS; ++i) {
 		if (kvm->vcpus[i]) {
 			kvm_vcpu_destroy(kvm->vcpus[i]);
@@ -279,6 +263,8 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
 
 	kvm_arm_pvtime_vcpu_init(&vcpu->arch);
 
+	vcpu->arch.hw_mmu = &vcpu->kvm->arch.mmu;
+
 	err = kvm_vgic_vcpu_init(vcpu);
 	if (err)
 		return err;
@@ -337,7 +323,7 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	int *last_ran;
 	kvm_host_data_t *cpu_data;
 
-	last_ran = this_cpu_ptr(vcpu->kvm->arch.last_vcpu_ran);
+	last_ran = this_cpu_ptr(vcpu->arch.hw_mmu->last_vcpu_ran);
 	cpu_data = this_cpu_ptr(&kvm_host_data);
 
 	/*
@@ -679,7 +665,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		 */
 		cond_resched();
 
-		update_vmid(&vcpu->kvm->arch.vmid);
+		update_vmid(&vcpu->arch.hw_mmu->vmid);
 
 		check_vcpu_requests(vcpu);
 
@@ -728,7 +714,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		 */
 		smp_store_mb(vcpu->mode, IN_GUEST_MODE);
 
-		if (ret <= 0 || need_new_vmid_gen(&vcpu->kvm->arch.vmid) ||
+		if (ret <= 0 || need_new_vmid_gen(&vcpu->arch.hw_mmu->vmid) ||
 		    kvm_request_pending(vcpu)) {
 			vcpu->mode = OUTSIDE_GUEST_MODE;
 			isb(); /* Ensure work in x_flush_hwstate is committed */
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index e3b9ee268823b..2f99749048285 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -55,12 +55,12 @@ static bool memslot_is_logging(struct kvm_memory_slot *memslot)
  */
 void kvm_flush_remote_tlbs(struct kvm *kvm)
 {
-	kvm_call_hyp(__kvm_tlb_flush_vmid, kvm);
+	kvm_call_hyp(__kvm_tlb_flush_vmid, &kvm->arch.mmu);
 }
 
-static void kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa)
+static void kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu, phys_addr_t ipa)
 {
-	kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, kvm, ipa);
+	kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, ipa);
 }
 
 /*
@@ -96,31 +96,33 @@ static bool kvm_is_device_pfn(unsigned long pfn)
  *
  * Function clears a PMD entry, flushes addr 1st and 2nd stage TLBs.
  */
-static void stage2_dissolve_pmd(struct kvm *kvm, phys_addr_t addr, pmd_t *pmd)
+static void stage2_dissolve_pmd(struct kvm_s2_mmu *mmu, phys_addr_t addr, pmd_t *pmd)
 {
 	if (!pmd_thp_or_huge(*pmd))
 		return;
 
 	pmd_clear(pmd);
-	kvm_tlb_flush_vmid_ipa(kvm, addr);
+	kvm_tlb_flush_vmid_ipa(mmu, addr);
 	put_page(virt_to_page(pmd));
 }
 
 /**
  * stage2_dissolve_pud() - clear and flush huge PUD entry
- * @kvm:	pointer to kvm structure.
+ * @mmu:	pointer to mmu structure to operate on
  * @addr:	IPA
  * @pud:	pud pointer for IPA
  *
  * Function clears a PUD entry, flushes addr 1st and 2nd stage TLBs.
  */
-static void stage2_dissolve_pud(struct kvm *kvm, phys_addr_t addr, pud_t *pudp)
+static void stage2_dissolve_pud(struct kvm_s2_mmu *mmu, phys_addr_t addr, pud_t *pudp)
 {
+	struct kvm *kvm __maybe_unused = mmu->kvm;
+
 	if (!stage2_pud_huge(kvm, *pudp))
 		return;
 
 	stage2_pud_clear(kvm, pudp);
-	kvm_tlb_flush_vmid_ipa(kvm, addr);
+	kvm_tlb_flush_vmid_ipa(mmu, addr);
 	put_page(virt_to_page(pudp));
 }
 
@@ -156,31 +158,35 @@ static void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc)
 	return p;
 }
 
-static void clear_stage2_pgd_entry(struct kvm *kvm, pgd_t *pgd, phys_addr_t addr)
+static void clear_stage2_pgd_entry(struct kvm_s2_mmu *mmu, pgd_t *pgd, phys_addr_t addr)
 {
+	struct kvm *kvm __maybe_unused = mmu->kvm;
+
 	pud_t *pud_table __maybe_unused = stage2_pud_offset(kvm, pgd, 0UL);
 	stage2_pgd_clear(kvm, pgd);
-	kvm_tlb_flush_vmid_ipa(kvm, addr);
+	kvm_tlb_flush_vmid_ipa(mmu, addr);
 	stage2_pud_free(kvm, pud_table);
 	put_page(virt_to_page(pgd));
 }
 
-static void clear_stage2_pud_entry(struct kvm *kvm, pud_t *pud, phys_addr_t addr)
+static void clear_stage2_pud_entry(struct kvm_s2_mmu *mmu, pud_t *pud, phys_addr_t addr)
 {
+	struct kvm *kvm __maybe_unused = mmu->kvm;
+
 	pmd_t *pmd_table __maybe_unused = stage2_pmd_offset(kvm, pud, 0);
 	VM_BUG_ON(stage2_pud_huge(kvm, *pud));
 	stage2_pud_clear(kvm, pud);
-	kvm_tlb_flush_vmid_ipa(kvm, addr);
+	kvm_tlb_flush_vmid_ipa(mmu, addr);
 	stage2_pmd_free(kvm, pmd_table);
 	put_page(virt_to_page(pud));
 }
 
-static void clear_stage2_pmd_entry(struct kvm *kvm, pmd_t *pmd, phys_addr_t addr)
+static void clear_stage2_pmd_entry(struct kvm_s2_mmu *mmu, pmd_t *pmd, phys_addr_t addr)
 {
 	pte_t *pte_table = pte_offset_kernel(pmd, 0);
 	VM_BUG_ON(pmd_thp_or_huge(*pmd));
 	pmd_clear(pmd);
-	kvm_tlb_flush_vmid_ipa(kvm, addr);
+	kvm_tlb_flush_vmid_ipa(mmu, addr);
 	free_page((unsigned long)pte_table);
 	put_page(virt_to_page(pmd));
 }
@@ -238,7 +244,7 @@ static inline void kvm_pgd_populate(pgd_t *pgdp, pud_t *pudp)
  * we then fully enforce cacheability of RAM, no matter what the guest
  * does.
  */
-static void unmap_stage2_ptes(struct kvm *kvm, pmd_t *pmd,
+static void unmap_stage2_ptes(struct kvm_s2_mmu *mmu, pmd_t *pmd,
 		       phys_addr_t addr, phys_addr_t end)
 {
 	phys_addr_t start_addr = addr;
@@ -250,7 +256,7 @@ static void unmap_stage2_ptes(struct kvm *kvm, pmd_t *pmd,
 			pte_t old_pte = *pte;
 
 			kvm_set_pte(pte, __pte(0));
-			kvm_tlb_flush_vmid_ipa(kvm, addr);
+			kvm_tlb_flush_vmid_ipa(mmu, addr);
 
 			/* No need to invalidate the cache for device mappings */
 			if (!kvm_is_device_pfn(pte_pfn(old_pte)))
@@ -260,13 +266,14 @@ static void unmap_stage2_ptes(struct kvm *kvm, pmd_t *pmd,
 		}
 	} while (pte++, addr += PAGE_SIZE, addr != end);
 
-	if (stage2_pte_table_empty(kvm, start_pte))
-		clear_stage2_pmd_entry(kvm, pmd, start_addr);
+	if (stage2_pte_table_empty(mmu->kvm, start_pte))
+		clear_stage2_pmd_entry(mmu, pmd, start_addr);
 }
 
-static void unmap_stage2_pmds(struct kvm *kvm, pud_t *pud,
+static void unmap_stage2_pmds(struct kvm_s2_mmu *mmu, pud_t *pud,
 		       phys_addr_t addr, phys_addr_t end)
 {
+	struct kvm *kvm __maybe_unused = mmu->kvm;
 	phys_addr_t next, start_addr = addr;
 	pmd_t *pmd, *start_pmd;
 
@@ -278,24 +285,25 @@ static void unmap_stage2_pmds(struct kvm *kvm, pud_t *pud,
 				pmd_t old_pmd = *pmd;
 
 				pmd_clear(pmd);
-				kvm_tlb_flush_vmid_ipa(kvm, addr);
+				kvm_tlb_flush_vmid_ipa(mmu, addr);
 
 				kvm_flush_dcache_pmd(old_pmd);
 
 				put_page(virt_to_page(pmd));
 			} else {
-				unmap_stage2_ptes(kvm, pmd, addr, next);
+				unmap_stage2_ptes(mmu, pmd, addr, next);
 			}
 		}
 	} while (pmd++, addr = next, addr != end);
 
 	if (stage2_pmd_table_empty(kvm, start_pmd))
-		clear_stage2_pud_entry(kvm, pud, start_addr);
+		clear_stage2_pud_entry(mmu, pud, start_addr);
 }
 
-static void unmap_stage2_puds(struct kvm *kvm, pgd_t *pgd,
+static void unmap_stage2_puds(struct kvm_s2_mmu *mmu, pgd_t *pgd,
 		       phys_addr_t addr, phys_addr_t end)
 {
+	struct kvm *kvm __maybe_unused = mmu->kvm;
 	phys_addr_t next, start_addr = addr;
 	pud_t *pud, *start_pud;
 
@@ -307,17 +315,17 @@ static void unmap_stage2_puds(struct kvm *kvm, pgd_t *pgd,
 				pud_t old_pud = *pud;
 
 				stage2_pud_clear(kvm, pud);
-				kvm_tlb_flush_vmid_ipa(kvm, addr);
+				kvm_tlb_flush_vmid_ipa(mmu, addr);
 				kvm_flush_dcache_pud(old_pud);
 				put_page(virt_to_page(pud));
 			} else {
-				unmap_stage2_pmds(kvm, pud, addr, next);
+				unmap_stage2_pmds(mmu, pud, addr, next);
 			}
 		}
 	} while (pud++, addr = next, addr != end);
 
 	if (stage2_pud_table_empty(kvm, start_pud))
-		clear_stage2_pgd_entry(kvm, pgd, start_addr);
+		clear_stage2_pgd_entry(mmu, pgd, start_addr);
 }
 
 /**
@@ -331,8 +339,9 @@ static void unmap_stage2_puds(struct kvm *kvm, pgd_t *pgd,
  * destroying the VM), otherwise another faulting VCPU may come in and mess
  * with things behind our backs.
  */
-static void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size)
+static void unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size)
 {
+	struct kvm *kvm = mmu->kvm;
 	pgd_t *pgd;
 	phys_addr_t addr = start, end = start + size;
 	phys_addr_t next;
@@ -340,18 +349,18 @@ static void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size)
 	assert_spin_locked(&kvm->mmu_lock);
 	WARN_ON(size & ~PAGE_MASK);
 
-	pgd = kvm->arch.pgd + stage2_pgd_index(kvm, addr);
+	pgd = mmu->pgd + stage2_pgd_index(kvm, addr);
 	do {
 		/*
 		 * Make sure the page table is still active, as another thread
 		 * could have possibly freed the page table, while we released
 		 * the lock.
 		 */
-		if (!READ_ONCE(kvm->arch.pgd))
+		if (!READ_ONCE(mmu->pgd))
 			break;
 		next = stage2_pgd_addr_end(kvm, addr, end);
 		if (!stage2_pgd_none(kvm, *pgd))
-			unmap_stage2_puds(kvm, pgd, addr, next);
+			unmap_stage2_puds(mmu, pgd, addr, next);
 		/*
 		 * If the range is too large, release the kvm->mmu_lock
 		 * to prevent starvation and lockup detector warnings.
@@ -361,7 +370,7 @@ static void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size)
 	} while (pgd++, addr = next, addr != end);
 }
 
-static void stage2_flush_ptes(struct kvm *kvm, pmd_t *pmd,
+static void stage2_flush_ptes(struct kvm_s2_mmu *mmu, pmd_t *pmd,
 			      phys_addr_t addr, phys_addr_t end)
 {
 	pte_t *pte;
@@ -373,9 +382,10 @@ static void stage2_flush_ptes(struct kvm *kvm, pmd_t *pmd,
 	} while (pte++, addr += PAGE_SIZE, addr != end);
 }
 
-static void stage2_flush_pmds(struct kvm *kvm, pud_t *pud,
+static void stage2_flush_pmds(struct kvm_s2_mmu *mmu, pud_t *pud,
 			      phys_addr_t addr, phys_addr_t end)
 {
+	struct kvm *kvm = mmu->kvm;
 	pmd_t *pmd;
 	phys_addr_t next;
 
@@ -386,14 +396,15 @@ static void stage2_flush_pmds(struct kvm *kvm, pud_t *pud,
 			if (pmd_thp_or_huge(*pmd))
 				kvm_flush_dcache_pmd(*pmd);
 			else
-				stage2_flush_ptes(kvm, pmd, addr, next);
+				stage2_flush_ptes(mmu, pmd, addr, next);
 		}
 	} while (pmd++, addr = next, addr != end);
 }
 
-static void stage2_flush_puds(struct kvm *kvm, pgd_t *pgd,
+static void stage2_flush_puds(struct kvm_s2_mmu *mmu, pgd_t *pgd,
 			      phys_addr_t addr, phys_addr_t end)
 {
+	struct kvm *kvm __maybe_unused = mmu->kvm;
 	pud_t *pud;
 	phys_addr_t next;
 
@@ -404,24 +415,25 @@ static void stage2_flush_puds(struct kvm *kvm, pgd_t *pgd,
 			if (stage2_pud_huge(kvm, *pud))
 				kvm_flush_dcache_pud(*pud);
 			else
-				stage2_flush_pmds(kvm, pud, addr, next);
+				stage2_flush_pmds(mmu, pud, addr, next);
 		}
 	} while (pud++, addr = next, addr != end);
 }
 
-static void stage2_flush_memslot(struct kvm *kvm,
+static void stage2_flush_memslot(struct kvm_s2_mmu *mmu,
 				 struct kvm_memory_slot *memslot)
 {
+	struct kvm *kvm = mmu->kvm;
 	phys_addr_t addr = memslot->base_gfn << PAGE_SHIFT;
 	phys_addr_t end = addr + PAGE_SIZE * memslot->npages;
 	phys_addr_t next;
 	pgd_t *pgd;
 
-	pgd = kvm->arch.pgd + stage2_pgd_index(kvm, addr);
+	pgd = mmu->pgd + stage2_pgd_index(kvm, addr);
 	do {
 		next = stage2_pgd_addr_end(kvm, addr, end);
 		if (!stage2_pgd_none(kvm, *pgd))
-			stage2_flush_puds(kvm, pgd, addr, next);
+			stage2_flush_puds(mmu, pgd, addr, next);
 	} while (pgd++, addr = next, addr != end);
 }
 
@@ -443,7 +455,7 @@ static void stage2_flush_vm(struct kvm *kvm)
 
 	slots = kvm_memslots(kvm);
 	kvm_for_each_memslot(memslot, slots)
-		stage2_flush_memslot(kvm, memslot);
+		stage2_flush_memslot(&kvm->arch.mmu, memslot);
 
 	spin_unlock(&kvm->mmu_lock);
 	srcu_read_unlock(&kvm->srcu, idx);
@@ -886,21 +898,23 @@ int create_hyp_exec_mappings(phys_addr_t phys_addr, size_t size,
 }
 
 /**
- * kvm_alloc_stage2_pgd - allocate level-1 table for stage-2 translation.
- * @kvm:	The KVM struct pointer for the VM.
+ * kvm_init_stage2_mmu - Initialise a S2 MMU strucrure
+ * @kvm:	The pointer to the KVM structure
+ * @mmu:	The pointer to the s2 MMU structure
  *
  * Allocates only the stage-2 HW PGD level table(s) of size defined by
- * stage2_pgd_size(kvm).
+ * stage2_pgd_size(mmu->kvm).
  *
  * Note we don't need locking here as this is only called when the VM is
  * created, which can only be done once.
  */
-int kvm_alloc_stage2_pgd(struct kvm *kvm)
+int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu)
 {
 	phys_addr_t pgd_phys;
 	pgd_t *pgd;
+	int cpu;
 
-	if (kvm->arch.pgd != NULL) {
+	if (mmu->pgd != NULL) {
 		kvm_err("kvm_arch already initialized?\n");
 		return -EINVAL;
 	}
@@ -914,8 +928,20 @@ int kvm_alloc_stage2_pgd(struct kvm *kvm)
 	if (WARN_ON(pgd_phys & ~kvm_vttbr_baddr_mask(kvm)))
 		return -EINVAL;
 
-	kvm->arch.pgd = pgd;
-	kvm->arch.pgd_phys = pgd_phys;
+	mmu->last_vcpu_ran = alloc_percpu(typeof(*mmu->last_vcpu_ran));
+	if (!mmu->last_vcpu_ran) {
+		free_pages_exact(pgd, stage2_pgd_size(kvm));
+		return -ENOMEM;
+	}
+
+	for_each_possible_cpu(cpu)
+		*per_cpu_ptr(mmu->last_vcpu_ran, cpu) = -1;
+
+	mmu->kvm = kvm;
+	mmu->pgd = pgd;
+	mmu->pgd_phys = pgd_phys;
+	mmu->vmid.vmid_gen = 0;
+
 	return 0;
 }
 
@@ -954,7 +980,7 @@ static void stage2_unmap_memslot(struct kvm *kvm,
 
 		if (!(vma->vm_flags & VM_PFNMAP)) {
 			gpa_t gpa = addr + (vm_start - memslot->userspace_addr);
-			unmap_stage2_range(kvm, gpa, vm_end - vm_start);
+			unmap_stage2_range(&kvm->arch.mmu, gpa, vm_end - vm_start);
 		}
 		hva = vm_end;
 	} while (hva < reg_end);
@@ -986,39 +1012,34 @@ void stage2_unmap_vm(struct kvm *kvm)
 	srcu_read_unlock(&kvm->srcu, idx);
 }
 
-/**
- * kvm_free_stage2_pgd - free all stage-2 tables
- * @kvm:	The KVM struct pointer for the VM.
- *
- * Walks the level-1 page table pointed to by kvm->arch.pgd and frees all
- * underlying level-2 and level-3 tables before freeing the actual level-1 table
- * and setting the struct pointer to NULL.
- */
-void kvm_free_stage2_pgd(struct kvm *kvm)
+void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu)
 {
+	struct kvm *kvm = mmu->kvm;
 	void *pgd = NULL;
 
 	spin_lock(&kvm->mmu_lock);
-	if (kvm->arch.pgd) {
-		unmap_stage2_range(kvm, 0, kvm_phys_size(kvm));
-		pgd = READ_ONCE(kvm->arch.pgd);
-		kvm->arch.pgd = NULL;
-		kvm->arch.pgd_phys = 0;
+	if (mmu->pgd) {
+		unmap_stage2_range(mmu, 0, kvm_phys_size(kvm));
+		pgd = READ_ONCE(mmu->pgd);
+		mmu->pgd = NULL;
 	}
 	spin_unlock(&kvm->mmu_lock);
 
 	/* Free the HW pgd, one page at a time */
-	if (pgd)
+	if (pgd) {
 		free_pages_exact(pgd, stage2_pgd_size(kvm));
+		free_percpu(mmu->last_vcpu_ran);
+	}
 }
 
-static pud_t *stage2_get_pud(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
+static pud_t *stage2_get_pud(struct kvm_s2_mmu *mmu, struct kvm_mmu_memory_cache *cache,
 			     phys_addr_t addr)
 {
+	struct kvm *kvm __maybe_unused = mmu->kvm;
 	pgd_t *pgd;
 	pud_t *pud;
 
-	pgd = kvm->arch.pgd + stage2_pgd_index(kvm, addr);
+	pgd = mmu->pgd + stage2_pgd_index(kvm, addr);
 	if (stage2_pgd_none(kvm, *pgd)) {
 		if (!cache)
 			return NULL;
@@ -1030,13 +1051,14 @@ static pud_t *stage2_get_pud(struct kvm *kvm, struct kvm_mmu_memory_cache *cache
 	return stage2_pud_offset(kvm, pgd, addr);
 }
 
-static pmd_t *stage2_get_pmd(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
+static pmd_t *stage2_get_pmd(struct kvm_s2_mmu *mmu, struct kvm_mmu_memory_cache *cache,
 			     phys_addr_t addr)
 {
+	struct kvm *kvm __maybe_unused = mmu->kvm;
 	pud_t *pud;
 	pmd_t *pmd;
 
-	pud = stage2_get_pud(kvm, cache, addr);
+	pud = stage2_get_pud(mmu, cache, addr);
 	if (!pud || stage2_pud_huge(kvm, *pud))
 		return NULL;
 
@@ -1051,13 +1073,14 @@ static pmd_t *stage2_get_pmd(struct kvm *kvm, struct kvm_mmu_memory_cache *cache
 	return stage2_pmd_offset(kvm, pud, addr);
 }
 
-static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache
-			       *cache, phys_addr_t addr, const pmd_t *new_pmd)
+static int stage2_set_pmd_huge(struct kvm_s2_mmu *mmu,
+			       struct kvm_mmu_memory_cache *cache,
+			       phys_addr_t addr, const pmd_t *new_pmd)
 {
 	pmd_t *pmd, old_pmd;
 
 retry:
-	pmd = stage2_get_pmd(kvm, cache, addr);
+	pmd = stage2_get_pmd(mmu, cache, addr);
 	VM_BUG_ON(!pmd);
 
 	old_pmd = *pmd;
@@ -1090,7 +1113,7 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache
 		 * get handled accordingly.
 		 */
 		if (!pmd_thp_or_huge(old_pmd)) {
-			unmap_stage2_range(kvm, addr & S2_PMD_MASK, S2_PMD_SIZE);
+			unmap_stage2_range(mmu, addr & S2_PMD_MASK, S2_PMD_SIZE);
 			goto retry;
 		}
 		/*
@@ -1106,7 +1129,7 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache
 		 */
 		WARN_ON_ONCE(pmd_pfn(old_pmd) != pmd_pfn(*new_pmd));
 		pmd_clear(pmd);
-		kvm_tlb_flush_vmid_ipa(kvm, addr);
+		kvm_tlb_flush_vmid_ipa(mmu, addr);
 	} else {
 		get_page(virt_to_page(pmd));
 	}
@@ -1115,13 +1138,15 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache
 	return 0;
 }
 
-static int stage2_set_pud_huge(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
+static int stage2_set_pud_huge(struct kvm_s2_mmu *mmu,
+			       struct kvm_mmu_memory_cache *cache,
 			       phys_addr_t addr, const pud_t *new_pudp)
 {
+	struct kvm *kvm __maybe_unused = mmu->kvm;
 	pud_t *pudp, old_pud;
 
 retry:
-	pudp = stage2_get_pud(kvm, cache, addr);
+	pudp = stage2_get_pud(mmu, cache, addr);
 	VM_BUG_ON(!pudp);
 
 	old_pud = *pudp;
@@ -1140,13 +1165,13 @@ static int stage2_set_pud_huge(struct kvm *kvm, struct kvm_mmu_memory_cache *cac
 		 * the range for this block and retry.
 		 */
 		if (!stage2_pud_huge(kvm, old_pud)) {
-			unmap_stage2_range(kvm, addr & S2_PUD_MASK, S2_PUD_SIZE);
+			unmap_stage2_range(mmu, addr & S2_PUD_MASK, S2_PUD_SIZE);
 			goto retry;
 		}
 
 		WARN_ON_ONCE(kvm_pud_pfn(old_pud) != kvm_pud_pfn(*new_pudp));
 		stage2_pud_clear(kvm, pudp);
-		kvm_tlb_flush_vmid_ipa(kvm, addr);
+		kvm_tlb_flush_vmid_ipa(mmu, addr);
 	} else {
 		get_page(virt_to_page(pudp));
 	}
@@ -1161,9 +1186,10 @@ static int stage2_set_pud_huge(struct kvm *kvm, struct kvm_mmu_memory_cache *cac
  * leaf-entry is returned in the appropriate level variable - pudpp,
  * pmdpp, ptepp.
  */
-static bool stage2_get_leaf_entry(struct kvm *kvm, phys_addr_t addr,
+static bool stage2_get_leaf_entry(struct kvm_s2_mmu *mmu, phys_addr_t addr,
 				  pud_t **pudpp, pmd_t **pmdpp, pte_t **ptepp)
 {
+	struct kvm *kvm __maybe_unused = mmu->kvm;
 	pud_t *pudp;
 	pmd_t *pmdp;
 	pte_t *ptep;
@@ -1172,7 +1198,7 @@ static bool stage2_get_leaf_entry(struct kvm *kvm, phys_addr_t addr,
 	*pmdpp = NULL;
 	*ptepp = NULL;
 
-	pudp = stage2_get_pud(kvm, NULL, addr);
+	pudp = stage2_get_pud(mmu, NULL, addr);
 	if (!pudp || stage2_pud_none(kvm, *pudp) || !stage2_pud_present(kvm, *pudp))
 		return false;
 
@@ -1198,14 +1224,14 @@ static bool stage2_get_leaf_entry(struct kvm *kvm, phys_addr_t addr,
 	return true;
 }
 
-static bool stage2_is_exec(struct kvm *kvm, phys_addr_t addr)
+static bool stage2_is_exec(struct kvm_s2_mmu *mmu, phys_addr_t addr)
 {
 	pud_t *pudp;
 	pmd_t *pmdp;
 	pte_t *ptep;
 	bool found;
 
-	found = stage2_get_leaf_entry(kvm, addr, &pudp, &pmdp, &ptep);
+	found = stage2_get_leaf_entry(mmu, addr, &pudp, &pmdp, &ptep);
 	if (!found)
 		return false;
 
@@ -1217,10 +1243,12 @@ static bool stage2_is_exec(struct kvm *kvm, phys_addr_t addr)
 		return kvm_s2pte_exec(ptep);
 }
 
-static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
+static int stage2_set_pte(struct kvm_s2_mmu *mmu,
+			  struct kvm_mmu_memory_cache *cache,
 			  phys_addr_t addr, const pte_t *new_pte,
 			  unsigned long flags)
 {
+	struct kvm *kvm __maybe_unused = mmu->kvm;
 	pud_t *pud;
 	pmd_t *pmd;
 	pte_t *pte, old_pte;
@@ -1230,7 +1258,7 @@ static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
 	VM_BUG_ON(logging_active && !cache);
 
 	/* Create stage-2 page table mapping - Levels 0 and 1 */
-	pud = stage2_get_pud(kvm, cache, addr);
+	pud = stage2_get_pud(mmu, cache, addr);
 	if (!pud) {
 		/*
 		 * Ignore calls from kvm_set_spte_hva for unallocated
@@ -1244,7 +1272,7 @@ static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
 	 * on to allocate page.
 	 */
 	if (logging_active)
-		stage2_dissolve_pud(kvm, addr, pud);
+		stage2_dissolve_pud(mmu, addr, pud);
 
 	if (stage2_pud_none(kvm, *pud)) {
 		if (!cache)
@@ -1268,7 +1296,7 @@ static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
 	 * allocate page.
 	 */
 	if (logging_active)
-		stage2_dissolve_pmd(kvm, addr, pmd);
+		stage2_dissolve_pmd(mmu, addr, pmd);
 
 	/* Create stage-2 page mappings - Level 2 */
 	if (pmd_none(*pmd)) {
@@ -1292,7 +1320,7 @@ static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
 			return 0;
 
 		kvm_set_pte(pte, __pte(0));
-		kvm_tlb_flush_vmid_ipa(kvm, addr);
+		kvm_tlb_flush_vmid_ipa(mmu, addr);
 	} else {
 		get_page(virt_to_page(pte));
 	}
@@ -1358,8 +1386,8 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
 		if (ret)
 			goto out;
 		spin_lock(&kvm->mmu_lock);
-		ret = stage2_set_pte(kvm, &cache, addr, &pte,
-						KVM_S2PTE_FLAG_IS_IOMAP);
+		ret = stage2_set_pte(&kvm->arch.mmu, &cache, addr, &pte,
+				     KVM_S2PTE_FLAG_IS_IOMAP);
 		spin_unlock(&kvm->mmu_lock);
 		if (ret)
 			goto out;
@@ -1439,9 +1467,10 @@ static void stage2_wp_ptes(pmd_t *pmd, phys_addr_t addr, phys_addr_t end)
  * @addr:	range start address
  * @end:	range end address
  */
-static void stage2_wp_pmds(struct kvm *kvm, pud_t *pud,
+static void stage2_wp_pmds(struct kvm_s2_mmu *mmu, pud_t *pud,
 			   phys_addr_t addr, phys_addr_t end)
 {
+	struct kvm *kvm = mmu->kvm;
 	pmd_t *pmd;
 	phys_addr_t next;
 
@@ -1461,14 +1490,15 @@ static void stage2_wp_pmds(struct kvm *kvm, pud_t *pud,
 }
 
 /**
- * stage2_wp_puds - write protect PGD range
- * @pgd:	pointer to pgd entry
- * @addr:	range start address
- * @end:	range end address
- */
-static void  stage2_wp_puds(struct kvm *kvm, pgd_t *pgd,
+  * stage2_wp_puds - write protect PGD range
+  * @pgd:	pointer to pgd entry
+  * @addr:	range start address
+  * @end:	range end address
+  */
+static void  stage2_wp_puds(struct kvm_s2_mmu *mmu, pgd_t *pgd,
 			    phys_addr_t addr, phys_addr_t end)
 {
+	struct kvm *kvm __maybe_unused = mmu->kvm;
 	pud_t *pud;
 	phys_addr_t next;
 
@@ -1480,7 +1510,7 @@ static void  stage2_wp_puds(struct kvm *kvm, pgd_t *pgd,
 				if (!kvm_s2pud_readonly(pud))
 					kvm_set_s2pud_readonly(pud);
 			} else {
-				stage2_wp_pmds(kvm, pud, addr, next);
+				stage2_wp_pmds(mmu, pud, addr, next);
 			}
 		}
 	} while (pud++, addr = next, addr != end);
@@ -1492,12 +1522,13 @@ static void  stage2_wp_puds(struct kvm *kvm, pgd_t *pgd,
  * @addr:	Start address of range
  * @end:	End address of range
  */
-static void stage2_wp_range(struct kvm *kvm, phys_addr_t addr, phys_addr_t end)
+static void stage2_wp_range(struct kvm_s2_mmu *mmu, phys_addr_t addr, phys_addr_t end)
 {
+	struct kvm *kvm = mmu->kvm;
 	pgd_t *pgd;
 	phys_addr_t next;
 
-	pgd = kvm->arch.pgd + stage2_pgd_index(kvm, addr);
+	pgd = mmu->pgd + stage2_pgd_index(kvm, addr);
 	do {
 		/*
 		 * Release kvm_mmu_lock periodically if the memory region is
@@ -1509,11 +1540,11 @@ static void stage2_wp_range(struct kvm *kvm, phys_addr_t addr, phys_addr_t end)
 		 * the lock.
 		 */
 		cond_resched_lock(&kvm->mmu_lock);
-		if (!READ_ONCE(kvm->arch.pgd))
+		if (!READ_ONCE(mmu->pgd))
 			break;
 		next = stage2_pgd_addr_end(kvm, addr, end);
 		if (stage2_pgd_present(kvm, *pgd))
-			stage2_wp_puds(kvm, pgd, addr, next);
+			stage2_wp_puds(mmu, pgd, addr, next);
 	} while (pgd++, addr = next, addr != end);
 }
 
@@ -1543,7 +1574,7 @@ void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot)
 	end = (memslot->base_gfn + memslot->npages) << PAGE_SHIFT;
 
 	spin_lock(&kvm->mmu_lock);
-	stage2_wp_range(kvm, start, end);
+	stage2_wp_range(&kvm->arch.mmu, start, end);
 	spin_unlock(&kvm->mmu_lock);
 	kvm_flush_remote_tlbs(kvm);
 }
@@ -1567,7 +1598,7 @@ static void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
 	phys_addr_t start = (base_gfn +  __ffs(mask)) << PAGE_SHIFT;
 	phys_addr_t end = (base_gfn + __fls(mask) + 1) << PAGE_SHIFT;
 
-	stage2_wp_range(kvm, start, end);
+	stage2_wp_range(&kvm->arch.mmu, start, end);
 }
 
 /*
@@ -1673,6 +1704,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	pgprot_t mem_type = PAGE_S2;
 	bool logging_active = memslot_is_logging(memslot);
 	unsigned long vma_pagesize, flags = 0;
+	struct kvm_s2_mmu *mmu = vcpu->arch.hw_mmu;
 
 	write_fault = kvm_is_write_fault(vcpu);
 	exec_fault = kvm_vcpu_trap_is_iabt(vcpu);
@@ -1801,7 +1833,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	 * execute permissions, and we preserve whatever we have.
 	 */
 	needs_exec = exec_fault ||
-		(fault_status == FSC_PERM && stage2_is_exec(kvm, fault_ipa));
+		(fault_status == FSC_PERM && stage2_is_exec(mmu, fault_ipa));
 
 	if (vma_pagesize == PUD_SIZE) {
 		pud_t new_pud = kvm_pfn_pud(pfn, mem_type);
@@ -1813,7 +1845,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		if (needs_exec)
 			new_pud = kvm_s2pud_mkexec(new_pud);
 
-		ret = stage2_set_pud_huge(kvm, memcache, fault_ipa, &new_pud);
+		ret = stage2_set_pud_huge(mmu, memcache, fault_ipa, &new_pud);
 	} else if (vma_pagesize == PMD_SIZE) {
 		pmd_t new_pmd = kvm_pfn_pmd(pfn, mem_type);
 
@@ -1825,7 +1857,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		if (needs_exec)
 			new_pmd = kvm_s2pmd_mkexec(new_pmd);
 
-		ret = stage2_set_pmd_huge(kvm, memcache, fault_ipa, &new_pmd);
+		ret = stage2_set_pmd_huge(mmu, memcache, fault_ipa, &new_pmd);
 	} else {
 		pte_t new_pte = kvm_pfn_pte(pfn, mem_type);
 
@@ -1837,7 +1869,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		if (needs_exec)
 			new_pte = kvm_s2pte_mkexec(new_pte);
 
-		ret = stage2_set_pte(kvm, memcache, fault_ipa, &new_pte, flags);
+		ret = stage2_set_pte(mmu, memcache, fault_ipa, &new_pte, flags);
 	}
 
 out_unlock:
@@ -1866,7 +1898,7 @@ static void handle_access_fault(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
 
 	spin_lock(&vcpu->kvm->mmu_lock);
 
-	if (!stage2_get_leaf_entry(vcpu->kvm, fault_ipa, &pud, &pmd, &pte))
+	if (!stage2_get_leaf_entry(vcpu->arch.hw_mmu, fault_ipa, &pud, &pmd, &pte))
 		goto out;
 
 	if (pud) {		/* HugeTLB */
@@ -2040,14 +2072,14 @@ static int handle_hva_to_gpa(struct kvm *kvm,
 
 static int kvm_unmap_hva_handler(struct kvm *kvm, gpa_t gpa, u64 size, void *data)
 {
-	unmap_stage2_range(kvm, gpa, size);
+	unmap_stage2_range(&kvm->arch.mmu, gpa, size);
 	return 0;
 }
 
 int kvm_unmap_hva_range(struct kvm *kvm,
 			unsigned long start, unsigned long end)
 {
-	if (!kvm->arch.pgd)
+	if (!kvm->arch.mmu.pgd)
 		return 0;
 
 	trace_kvm_unmap_hva_range(start, end);
@@ -2067,7 +2099,7 @@ static int kvm_set_spte_handler(struct kvm *kvm, gpa_t gpa, u64 size, void *data
 	 * therefore stage2_set_pte() never needs to clear out a huge PMD
 	 * through this calling path.
 	 */
-	stage2_set_pte(kvm, NULL, gpa, pte, 0);
+	stage2_set_pte(&kvm->arch.mmu, NULL, gpa, pte, 0);
 	return 0;
 }
 
@@ -2078,7 +2110,7 @@ int kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte)
 	kvm_pfn_t pfn = pte_pfn(pte);
 	pte_t stage2_pte;
 
-	if (!kvm->arch.pgd)
+	if (!kvm->arch.mmu.pgd)
 		return 0;
 
 	trace_kvm_set_spte_hva(hva);
@@ -2101,7 +2133,7 @@ static int kvm_age_hva_handler(struct kvm *kvm, gpa_t gpa, u64 size, void *data)
 	pte_t *pte;
 
 	WARN_ON(size != PAGE_SIZE && size != PMD_SIZE && size != PUD_SIZE);
-	if (!stage2_get_leaf_entry(kvm, gpa, &pud, &pmd, &pte))
+	if (!stage2_get_leaf_entry(&kvm->arch.mmu, gpa, &pud, &pmd, &pte))
 		return 0;
 
 	if (pud)
@@ -2119,7 +2151,7 @@ static int kvm_test_age_hva_handler(struct kvm *kvm, gpa_t gpa, u64 size, void *
 	pte_t *pte;
 
 	WARN_ON(size != PAGE_SIZE && size != PMD_SIZE && size != PUD_SIZE);
-	if (!stage2_get_leaf_entry(kvm, gpa, &pud, &pmd, &pte))
+	if (!stage2_get_leaf_entry(&kvm->arch.mmu, gpa, &pud, &pmd, &pte))
 		return 0;
 
 	if (pud)
@@ -2132,7 +2164,7 @@ static int kvm_test_age_hva_handler(struct kvm *kvm, gpa_t gpa, u64 size, void *
 
 int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end)
 {
-	if (!kvm->arch.pgd)
+	if (!kvm->arch.mmu.pgd)
 		return 0;
 	trace_kvm_age_hva(start, end);
 	return handle_hva_to_gpa(kvm, start, end, kvm_age_hva_handler, NULL);
@@ -2140,7 +2172,7 @@ int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end)
 
 int kvm_test_age_hva(struct kvm *kvm, unsigned long hva)
 {
-	if (!kvm->arch.pgd)
+	if (!kvm->arch.mmu.pgd)
 		return 0;
 	trace_kvm_test_age_hva(hva);
 	return handle_hva_to_gpa(kvm, hva, hva + PAGE_SIZE,
@@ -2345,9 +2377,9 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
 
 	spin_lock(&kvm->mmu_lock);
 	if (ret)
-		unmap_stage2_range(kvm, mem->guest_phys_addr, mem->memory_size);
+		unmap_stage2_range(&kvm->arch.mmu, mem->guest_phys_addr, mem->memory_size);
 	else
-		stage2_flush_memslot(kvm, memslot);
+		stage2_flush_memslot(&kvm->arch.mmu, memslot);
 	spin_unlock(&kvm->mmu_lock);
 out:
 	up_read(&current->mm->mmap_sem);
@@ -2364,7 +2396,7 @@ void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen)
 
 void kvm_arch_flush_shadow_all(struct kvm *kvm)
 {
-	kvm_free_stage2_pgd(kvm);
+	kvm_free_stage2_pgd(&kvm->arch.mmu);
 }
 
 void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
@@ -2374,7 +2406,7 @@ void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
 	phys_addr_t size = slot->npages << PAGE_SHIFT;
 
 	spin_lock(&kvm->mmu_lock);
-	unmap_stage2_range(kvm, gpa, size);
+	unmap_stage2_range(&kvm->arch.mmu, gpa, size);
 	spin_unlock(&kvm->mmu_lock);
 }
 
-- 
2.26.1

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH 04/26] arm64: Detect the ARMv8.4 TTL feature
  2020-04-22 12:00 [PATCH 00/26] KVM: arm64: Preliminary NV patches Marc Zyngier
                   ` (2 preceding siblings ...)
  2020-04-22 12:00 ` [PATCH 03/26] KVM: arm64: Factor out stage 2 page table data from struct kvm Marc Zyngier
@ 2020-04-22 12:00 ` Marc Zyngier
  2020-04-27 15:55   ` Suzuki K Poulose
  2020-04-22 12:00 ` [PATCH 05/26] arm64: Document SW reserved PTE/PMD bits in Stage-2 descriptors Marc Zyngier
                   ` (21 subsequent siblings)
  25 siblings, 1 reply; 78+ messages in thread
From: Marc Zyngier @ 2020-04-22 12:00 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Will Deacon, Andre Przywara, Dave Martin, George Cherian,
	Zengtao (B),
	Catalin Marinas

In order to reduce the cost of TLB invalidation, the ARMv8.4 TTL
feature allows TLBs to be issued with a level allowing for quicker
invalidation.

Let's detect the feature for now. Further patches will implement
its actual usage.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/cpucaps.h |  3 ++-
 arch/arm64/include/asm/sysreg.h  |  1 +
 arch/arm64/kernel/cpufeature.c   | 11 +++++++++++
 3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/cpucaps.h b/arch/arm64/include/asm/cpucaps.h
index 8eb5a088ae658..cabb0c49a1d11 100644
--- a/arch/arm64/include/asm/cpucaps.h
+++ b/arch/arm64/include/asm/cpucaps.h
@@ -61,7 +61,8 @@
 #define ARM64_HAS_AMU_EXTN			51
 #define ARM64_HAS_ADDRESS_AUTH			52
 #define ARM64_HAS_GENERIC_AUTH			53
+#define ARM64_HAS_ARMv8_4_TTL			54
 
-#define ARM64_NCAPS				54
+#define ARM64_NCAPS				55
 
 #endif /* __ASM_CPUCAPS_H */
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 5d10c9148e844..79cf186b7e471 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -726,6 +726,7 @@
 
 /* id_aa64mmfr2 */
 #define ID_AA64MMFR2_E0PD_SHIFT		60
+#define ID_AA64MMFR2_TTL_SHIFT		48
 #define ID_AA64MMFR2_FWB_SHIFT		40
 #define ID_AA64MMFR2_AT_SHIFT		32
 #define ID_AA64MMFR2_LVA_SHIFT		16
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 9892a845d06c9..d8ab4f1e93bee 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -252,6 +252,7 @@ static const struct arm64_ftr_bits ftr_id_aa64mmfr1[] = {
 
 static const struct arm64_ftr_bits ftr_id_aa64mmfr2[] = {
 	ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64MMFR2_E0PD_SHIFT, 4, 0),
+	ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64MMFR2_TTL_SHIFT, 4, 0),
 	ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64MMFR2_FWB_SHIFT, 4, 0),
 	ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64MMFR2_AT_SHIFT, 4, 0),
 	ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64MMFR2_LVA_SHIFT, 4, 0),
@@ -1630,6 +1631,16 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
 		.matches = has_cpuid_feature,
 		.cpu_enable = cpu_has_fwb,
 	},
+	{
+		.desc = "ARMv8.4 Translation Table Level",
+		.type = ARM64_CPUCAP_SYSTEM_FEATURE,
+		.capability = ARM64_HAS_ARMv8_4_TTL,
+		.sys_reg = SYS_ID_AA64MMFR2_EL1,
+		.sign = FTR_UNSIGNED,
+		.field_pos = ID_AA64MMFR2_TTL_SHIFT,
+		.min_field_value = 1,
+		.matches = has_cpuid_feature,
+	},
 #ifdef CONFIG_ARM64_HW_AFDBM
 	{
 		/*
-- 
2.26.1

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH 05/26] arm64: Document SW reserved PTE/PMD bits in Stage-2 descriptors
  2020-04-22 12:00 [PATCH 00/26] KVM: arm64: Preliminary NV patches Marc Zyngier
                   ` (3 preceding siblings ...)
  2020-04-22 12:00 ` [PATCH 04/26] arm64: Detect the ARMv8.4 TTL feature Marc Zyngier
@ 2020-04-22 12:00 ` Marc Zyngier
  2020-05-05 15:59   ` Andrew Scull
  2020-04-22 12:00 ` [PATCH 06/26] arm64: Add level-hinted TLB invalidation helper Marc Zyngier
                   ` (20 subsequent siblings)
  25 siblings, 1 reply; 78+ messages in thread
From: Marc Zyngier @ 2020-04-22 12:00 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Will Deacon, Andre Przywara, Dave Martin, George Cherian,
	Zengtao (B),
	Catalin Marinas

Advertise bits [58:55] as reserved for SW in the S2 descriptors.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/pgtable-hwdef.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm64/include/asm/pgtable-hwdef.h b/arch/arm64/include/asm/pgtable-hwdef.h
index 6bf5e650da788..7eab0d23cdb52 100644
--- a/arch/arm64/include/asm/pgtable-hwdef.h
+++ b/arch/arm64/include/asm/pgtable-hwdef.h
@@ -177,10 +177,12 @@
 #define PTE_S2_RDONLY		(_AT(pteval_t, 1) << 6)   /* HAP[2:1] */
 #define PTE_S2_RDWR		(_AT(pteval_t, 3) << 6)   /* HAP[2:1] */
 #define PTE_S2_XN		(_AT(pteval_t, 2) << 53)  /* XN[1:0] */
+#define PTE_S2_SW_RESVD		(_AT(pteval_t, 15) << 55) /* Reserved for SW */
 
 #define PMD_S2_RDONLY		(_AT(pmdval_t, 1) << 6)   /* HAP[2:1] */
 #define PMD_S2_RDWR		(_AT(pmdval_t, 3) << 6)   /* HAP[2:1] */
 #define PMD_S2_XN		(_AT(pmdval_t, 2) << 53)  /* XN[1:0] */
+#define PMD_S2_SW_RESVD		(_AT(pmdval_t, 15) << 55) /* Reserved for SW */
 
 #define PUD_S2_RDONLY		(_AT(pudval_t, 1) << 6)   /* HAP[2:1] */
 #define PUD_S2_RDWR		(_AT(pudval_t, 3) << 6)   /* HAP[2:1] */
-- 
2.26.1

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH 06/26] arm64: Add level-hinted TLB invalidation helper
  2020-04-22 12:00 [PATCH 00/26] KVM: arm64: Preliminary NV patches Marc Zyngier
                   ` (4 preceding siblings ...)
  2020-04-22 12:00 ` [PATCH 05/26] arm64: Document SW reserved PTE/PMD bits in Stage-2 descriptors Marc Zyngier
@ 2020-04-22 12:00 ` Marc Zyngier
  2020-05-05 17:16   ` Andrew Scull
  2020-04-22 12:00 ` [PATCH 07/26] KVM: arm64: Add a level hint to __kvm_tlb_flush_vmid_ipa Marc Zyngier
                   ` (19 subsequent siblings)
  25 siblings, 1 reply; 78+ messages in thread
From: Marc Zyngier @ 2020-04-22 12:00 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Will Deacon, Andre Przywara, Dave Martin, George Cherian,
	Zengtao (B),
	Catalin Marinas

Add a level-hinted TLB invalidation helper that only gets used if
ARMv8.4-TTL gets detected.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/tlbflush.h | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
index bc39490647259..a3f70778a325c 100644
--- a/arch/arm64/include/asm/tlbflush.h
+++ b/arch/arm64/include/asm/tlbflush.h
@@ -10,6 +10,7 @@
 
 #ifndef __ASSEMBLY__
 
+#include <linux/bitfield.h>
 #include <linux/mm_types.h>
 #include <linux/sched.h>
 #include <asm/cputype.h>
@@ -59,6 +60,35 @@
 		__ta;						\
 	})
 
+#define TLBI_TTL_MASK	GENMASK_ULL(47, 44)
+
+#define __tlbi_level(op, addr, level)					\
+	do {								\
+		u64 arg = addr;						\
+									\
+		if (cpus_have_const_cap(ARM64_HAS_ARMv8_4_TTL) &&	\
+		    level) {						\
+			u64 ttl = level;				\
+									\
+			switch (PAGE_SIZE) {				\
+			case SZ_4K:					\
+				ttl |= 1 << 2;				\
+				break;					\
+			case SZ_16K:					\
+				ttl |= 2 << 2;				\
+				break;					\
+			case SZ_64K:					\
+				ttl |= 3 << 2;				\
+				break;					\
+			}						\
+									\
+			arg &= ~TLBI_TTL_MASK;				\
+			arg |= FIELD_PREP(TLBI_TTL_MASK, ttl);		\
+		}							\
+									\
+		__tlbi(op,  arg);					\
+	} while(0)
+
 /*
  *	TLB Invalidation
  *	================
-- 
2.26.1

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH 07/26] KVM: arm64: Add a level hint to __kvm_tlb_flush_vmid_ipa
  2020-04-22 12:00 [PATCH 00/26] KVM: arm64: Preliminary NV patches Marc Zyngier
                   ` (5 preceding siblings ...)
  2020-04-22 12:00 ` [PATCH 06/26] arm64: Add level-hinted TLB invalidation helper Marc Zyngier
@ 2020-04-22 12:00 ` Marc Zyngier
  2020-05-07 15:08   ` Andrew Scull
  2020-04-22 12:00 ` [PATCH 08/26] KVM: arm64: Use TTL hint in when invalidating stage-2 translations Marc Zyngier
                   ` (18 subsequent siblings)
  25 siblings, 1 reply; 78+ messages in thread
From: Marc Zyngier @ 2020-04-22 12:00 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Will Deacon, Andre Przywara, Dave Martin, George Cherian,
	Zengtao (B),
	Catalin Marinas

Introduce a level hint to __kvm_tlb_flush_vmid_ipa. No functional
change.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_asm.h | 3 ++-
 arch/arm64/kvm/hyp/tlb.c         | 5 +++--
 virt/kvm/arm/mmu.c               | 2 +-
 3 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 5adf4e1a4c2c9..fad03b83f8e41 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -61,7 +61,8 @@ extern char __kvm_hyp_init_end[];
 extern char __kvm_hyp_vector[];
 
 extern void __kvm_flush_vm_context(void);
-extern void __kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu, phys_addr_t ipa);
+extern void __kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu, phys_addr_t ipa,
+				     int level);
 extern void __kvm_tlb_flush_vmid(struct kvm_s2_mmu *mmu);
 extern void __kvm_tlb_flush_local_vmid(struct kvm_vcpu *vcpu);
 
diff --git a/arch/arm64/kvm/hyp/tlb.c b/arch/arm64/kvm/hyp/tlb.c
index b795891b0d091..6db2f8ad383f0 100644
--- a/arch/arm64/kvm/hyp/tlb.c
+++ b/arch/arm64/kvm/hyp/tlb.c
@@ -132,7 +132,8 @@ static void __hyp_text __tlb_switch_to_host(struct kvm_s2_mmu *mmu,
 		__tlb_switch_to_host_nvhe(cxt);
 }
 
-void __hyp_text __kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu, phys_addr_t ipa)
+void __hyp_text __kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu,
+					 phys_addr_t ipa, int level)
 {
 	struct tlb_inv_context cxt;
 
@@ -148,7 +149,7 @@ void __hyp_text __kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu, phys_addr_t ipa
 	 * whole of Stage-1. Weep...
 	 */
 	ipa >>= 12;
-	__tlbi(ipas2e1is, ipa);
+	__tlbi_level(ipas2e1is, ipa, level);
 
 	/*
 	 * We have to ensure completion of the invalidation at Stage-2,
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index 2f99749048285..611234e0aef12 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -60,7 +60,7 @@ void kvm_flush_remote_tlbs(struct kvm *kvm)
 
 static void kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu, phys_addr_t ipa)
 {
-	kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, ipa);
+	kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, ipa, 0);
 }
 
 /*
-- 
2.26.1

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH 08/26] KVM: arm64: Use TTL hint in when invalidating stage-2 translations
  2020-04-22 12:00 [PATCH 00/26] KVM: arm64: Preliminary NV patches Marc Zyngier
                   ` (6 preceding siblings ...)
  2020-04-22 12:00 ` [PATCH 07/26] KVM: arm64: Add a level hint to __kvm_tlb_flush_vmid_ipa Marc Zyngier
@ 2020-04-22 12:00 ` Marc Zyngier
  2020-05-07 15:13   ` Andrew Scull
  2020-05-12 17:26   ` James Morse
  2020-04-22 12:00 ` [PATCH 09/26] KVM: arm64: vgic-v3: Take cpu_if pointer directly instead of vcpu Marc Zyngier
                   ` (17 subsequent siblings)
  25 siblings, 2 replies; 78+ messages in thread
From: Marc Zyngier @ 2020-04-22 12:00 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Will Deacon, Andre Przywara, Dave Martin, George Cherian,
	Zengtao (B),
	Catalin Marinas

Since we always have a precide idea of the level we're dealing with
when invalidating TLBs, we can provide it to as a hint to our
invalidation helper.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/stage2_pgtable.h |  9 +++++++++
 virt/kvm/arm/mmu.c                      | 27 +++++++++++++------------
 2 files changed, 23 insertions(+), 13 deletions(-)

diff --git a/arch/arm64/include/asm/stage2_pgtable.h b/arch/arm64/include/asm/stage2_pgtable.h
index 326aac658b9da..7ed5c1a769a9b 100644
--- a/arch/arm64/include/asm/stage2_pgtable.h
+++ b/arch/arm64/include/asm/stage2_pgtable.h
@@ -230,4 +230,13 @@ stage2_pgd_addr_end(struct kvm *kvm, phys_addr_t addr, phys_addr_t end)
 	return (boundary - 1 < end - 1) ? boundary : end;
 }
 
+/*
+ * Level values for the ARMv8.4-TTL extension, mapping PUD/PMD/PTE and
+ * the architectural page-table level.
+ */
+#define S2_NO_LEVEL_HINT	0
+#define S2_PUD_LEVEL		1
+#define S2_PMD_LEVEL		2
+#define S2_PTE_LEVEL		3
+
 #endif	/* __ARM64_S2_PGTABLE_H_ */
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index 611234e0aef12..2c132f5595c4b 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -58,9 +58,10 @@ void kvm_flush_remote_tlbs(struct kvm *kvm)
 	kvm_call_hyp(__kvm_tlb_flush_vmid, &kvm->arch.mmu);
 }
 
-static void kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu, phys_addr_t ipa)
+static void kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu, phys_addr_t ipa,
+				   int level)
 {
-	kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, ipa, 0);
+	kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, ipa, level);
 }
 
 /*
@@ -102,7 +103,7 @@ static void stage2_dissolve_pmd(struct kvm_s2_mmu *mmu, phys_addr_t addr, pmd_t
 		return;
 
 	pmd_clear(pmd);
-	kvm_tlb_flush_vmid_ipa(mmu, addr);
+	kvm_tlb_flush_vmid_ipa(mmu, addr, S2_PMD_LEVEL);
 	put_page(virt_to_page(pmd));
 }
 
@@ -122,7 +123,7 @@ static void stage2_dissolve_pud(struct kvm_s2_mmu *mmu, phys_addr_t addr, pud_t
 		return;
 
 	stage2_pud_clear(kvm, pudp);
-	kvm_tlb_flush_vmid_ipa(mmu, addr);
+	kvm_tlb_flush_vmid_ipa(mmu, addr, S2_PUD_LEVEL);
 	put_page(virt_to_page(pudp));
 }
 
@@ -164,7 +165,7 @@ static void clear_stage2_pgd_entry(struct kvm_s2_mmu *mmu, pgd_t *pgd, phys_addr
 
 	pud_t *pud_table __maybe_unused = stage2_pud_offset(kvm, pgd, 0UL);
 	stage2_pgd_clear(kvm, pgd);
-	kvm_tlb_flush_vmid_ipa(mmu, addr);
+	kvm_tlb_flush_vmid_ipa(mmu, addr, S2_NO_LEVEL_HINT);
 	stage2_pud_free(kvm, pud_table);
 	put_page(virt_to_page(pgd));
 }
@@ -176,7 +177,7 @@ static void clear_stage2_pud_entry(struct kvm_s2_mmu *mmu, pud_t *pud, phys_addr
 	pmd_t *pmd_table __maybe_unused = stage2_pmd_offset(kvm, pud, 0);
 	VM_BUG_ON(stage2_pud_huge(kvm, *pud));
 	stage2_pud_clear(kvm, pud);
-	kvm_tlb_flush_vmid_ipa(mmu, addr);
+	kvm_tlb_flush_vmid_ipa(mmu, addr, S2_NO_LEVEL_HINT);
 	stage2_pmd_free(kvm, pmd_table);
 	put_page(virt_to_page(pud));
 }
@@ -186,7 +187,7 @@ static void clear_stage2_pmd_entry(struct kvm_s2_mmu *mmu, pmd_t *pmd, phys_addr
 	pte_t *pte_table = pte_offset_kernel(pmd, 0);
 	VM_BUG_ON(pmd_thp_or_huge(*pmd));
 	pmd_clear(pmd);
-	kvm_tlb_flush_vmid_ipa(mmu, addr);
+	kvm_tlb_flush_vmid_ipa(mmu, addr, S2_NO_LEVEL_HINT);
 	free_page((unsigned long)pte_table);
 	put_page(virt_to_page(pmd));
 }
@@ -256,7 +257,7 @@ static void unmap_stage2_ptes(struct kvm_s2_mmu *mmu, pmd_t *pmd,
 			pte_t old_pte = *pte;
 
 			kvm_set_pte(pte, __pte(0));
-			kvm_tlb_flush_vmid_ipa(mmu, addr);
+			kvm_tlb_flush_vmid_ipa(mmu, addr, S2_PTE_LEVEL);
 
 			/* No need to invalidate the cache for device mappings */
 			if (!kvm_is_device_pfn(pte_pfn(old_pte)))
@@ -285,7 +286,7 @@ static void unmap_stage2_pmds(struct kvm_s2_mmu *mmu, pud_t *pud,
 				pmd_t old_pmd = *pmd;
 
 				pmd_clear(pmd);
-				kvm_tlb_flush_vmid_ipa(mmu, addr);
+				kvm_tlb_flush_vmid_ipa(mmu, addr, S2_PMD_LEVEL);
 
 				kvm_flush_dcache_pmd(old_pmd);
 
@@ -315,7 +316,7 @@ static void unmap_stage2_puds(struct kvm_s2_mmu *mmu, pgd_t *pgd,
 				pud_t old_pud = *pud;
 
 				stage2_pud_clear(kvm, pud);
-				kvm_tlb_flush_vmid_ipa(mmu, addr);
+				kvm_tlb_flush_vmid_ipa(mmu, addr, S2_PUD_LEVEL);
 				kvm_flush_dcache_pud(old_pud);
 				put_page(virt_to_page(pud));
 			} else {
@@ -1129,7 +1130,7 @@ static int stage2_set_pmd_huge(struct kvm_s2_mmu *mmu,
 		 */
 		WARN_ON_ONCE(pmd_pfn(old_pmd) != pmd_pfn(*new_pmd));
 		pmd_clear(pmd);
-		kvm_tlb_flush_vmid_ipa(mmu, addr);
+		kvm_tlb_flush_vmid_ipa(mmu, addr, S2_PMD_LEVEL);
 	} else {
 		get_page(virt_to_page(pmd));
 	}
@@ -1171,7 +1172,7 @@ static int stage2_set_pud_huge(struct kvm_s2_mmu *mmu,
 
 		WARN_ON_ONCE(kvm_pud_pfn(old_pud) != kvm_pud_pfn(*new_pudp));
 		stage2_pud_clear(kvm, pudp);
-		kvm_tlb_flush_vmid_ipa(mmu, addr);
+		kvm_tlb_flush_vmid_ipa(mmu, addr, S2_PUD_LEVEL);
 	} else {
 		get_page(virt_to_page(pudp));
 	}
@@ -1320,7 +1321,7 @@ static int stage2_set_pte(struct kvm_s2_mmu *mmu,
 			return 0;
 
 		kvm_set_pte(pte, __pte(0));
-		kvm_tlb_flush_vmid_ipa(mmu, addr);
+		kvm_tlb_flush_vmid_ipa(mmu, addr, S2_PTE_LEVEL);
 	} else {
 		get_page(virt_to_page(pte));
 	}
-- 
2.26.1

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH 09/26] KVM: arm64: vgic-v3: Take cpu_if pointer directly instead of vcpu
  2020-04-22 12:00 [PATCH 00/26] KVM: arm64: Preliminary NV patches Marc Zyngier
                   ` (7 preceding siblings ...)
  2020-04-22 12:00 ` [PATCH 08/26] KVM: arm64: Use TTL hint in when invalidating stage-2 translations Marc Zyngier
@ 2020-04-22 12:00 ` Marc Zyngier
  2020-05-07 16:26   ` James Morse
  2020-04-22 12:00 ` [PATCH 10/26] KVM: arm64: Refactor vcpu_{read,write}_sys_reg Marc Zyngier
                   ` (16 subsequent siblings)
  25 siblings, 1 reply; 78+ messages in thread
From: Marc Zyngier @ 2020-04-22 12:00 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Will Deacon, Andre Przywara, Dave Martin, George Cherian,
	Zengtao (B),
	Catalin Marinas

From: Christoffer Dall <christoffer.dall@arm.com>

If we move the used_lrs field to the version-specific cpu interface
structure, the following functions only operate on the struct
vgic_v3_cpu_if and not the full vcpu:

  __vgic_v3_save_state
  __vgic_v3_restore_state
  __vgic_v3_activate_traps
  __vgic_v3_deactivate_traps
  __vgic_v3_save_aprs
  __vgic_v3_restore_aprs

This is going to be very useful for nested virt, so move the used_lrs
field and change the prototypes and implementations of these functions to
take the cpu_if parameter directly.

No functional change.

Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_hyp.h | 12 ++++++------
 arch/arm64/kvm/hyp/switch.c      |  8 ++++----
 include/kvm/arm_vgic.h           |  5 ++++-
 virt/kvm/arm/hyp/vgic-v3-sr.c    | 33 ++++++++++----------------------
 virt/kvm/arm/vgic/vgic-v2.c      | 10 +++++-----
 virt/kvm/arm/vgic/vgic-v3.c      | 14 ++++++++------
 virt/kvm/arm/vgic/vgic.c         | 25 ++++++++++++++++--------
 7 files changed, 54 insertions(+), 53 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
index dcb63bf941057..f3c3d7645000f 100644
--- a/arch/arm64/include/asm/kvm_hyp.h
+++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -55,12 +55,12 @@
 
 int __vgic_v2_perform_cpuif_access(struct kvm_vcpu *vcpu);
 
-void __vgic_v3_save_state(struct kvm_vcpu *vcpu);
-void __vgic_v3_restore_state(struct kvm_vcpu *vcpu);
-void __vgic_v3_activate_traps(struct kvm_vcpu *vcpu);
-void __vgic_v3_deactivate_traps(struct kvm_vcpu *vcpu);
-void __vgic_v3_save_aprs(struct kvm_vcpu *vcpu);
-void __vgic_v3_restore_aprs(struct kvm_vcpu *vcpu);
+void __vgic_v3_save_state(struct vgic_v3_cpu_if *cpu_if);
+void __vgic_v3_restore_state(struct vgic_v3_cpu_if *cpu_if);
+void __vgic_v3_activate_traps(struct vgic_v3_cpu_if *cpu_if);
+void __vgic_v3_deactivate_traps(struct vgic_v3_cpu_if *cpu_if);
+void __vgic_v3_save_aprs(struct vgic_v3_cpu_if *cpu_if);
+void __vgic_v3_restore_aprs(struct vgic_v3_cpu_if *cpu_if);
 int __vgic_v3_perform_cpuif_access(struct kvm_vcpu *vcpu);
 
 void __timer_enable_traps(struct kvm_vcpu *vcpu);
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index d79319038b119..c48c96565f1a7 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -270,8 +270,8 @@ static void __hyp_text __deactivate_vm(struct kvm_vcpu *vcpu)
 static void __hyp_text __hyp_vgic_save_state(struct kvm_vcpu *vcpu)
 {
 	if (static_branch_unlikely(&kvm_vgic_global_state.gicv3_cpuif)) {
-		__vgic_v3_save_state(vcpu);
-		__vgic_v3_deactivate_traps(vcpu);
+		__vgic_v3_save_state(&vcpu->arch.vgic_cpu.vgic_v3);
+		__vgic_v3_deactivate_traps(&vcpu->arch.vgic_cpu.vgic_v3);
 	}
 }
 
@@ -279,8 +279,8 @@ static void __hyp_text __hyp_vgic_save_state(struct kvm_vcpu *vcpu)
 static void __hyp_text __hyp_vgic_restore_state(struct kvm_vcpu *vcpu)
 {
 	if (static_branch_unlikely(&kvm_vgic_global_state.gicv3_cpuif)) {
-		__vgic_v3_activate_traps(vcpu);
-		__vgic_v3_restore_state(vcpu);
+		__vgic_v3_activate_traps(&vcpu->arch.vgic_cpu.vgic_v3);
+		__vgic_v3_restore_state(&vcpu->arch.vgic_cpu.vgic_v3);
 	}
 }
 
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 69f4164d64776..a8d8fdcd37230 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -274,6 +274,8 @@ struct vgic_v2_cpu_if {
 	u32		vgic_vmcr;
 	u32		vgic_apr;
 	u32		vgic_lr[VGIC_V2_MAX_LRS];
+
+	unsigned int used_lrs;
 };
 
 struct vgic_v3_cpu_if {
@@ -291,6 +293,8 @@ struct vgic_v3_cpu_if {
 	 * linking the Linux IRQ subsystem and the ITS together.
 	 */
 	struct its_vpe	its_vpe;
+
+	unsigned int used_lrs;
 };
 
 struct vgic_cpu {
@@ -300,7 +304,6 @@ struct vgic_cpu {
 		struct vgic_v3_cpu_if	vgic_v3;
 	};
 
-	unsigned int used_lrs;
 	struct vgic_irq private_irqs[VGIC_NR_PRIVATE_IRQS];
 
 	raw_spinlock_t ap_list_lock;	/* Protects the ap_list */
diff --git a/virt/kvm/arm/hyp/vgic-v3-sr.c b/virt/kvm/arm/hyp/vgic-v3-sr.c
index ccf1fde9836c1..2ea9a0b73fc44 100644
--- a/virt/kvm/arm/hyp/vgic-v3-sr.c
+++ b/virt/kvm/arm/hyp/vgic-v3-sr.c
@@ -194,10 +194,9 @@ static u32 __hyp_text __vgic_v3_read_ap1rn(int n)
 	return val;
 }
 
-void __hyp_text __vgic_v3_save_state(struct kvm_vcpu *vcpu)
+void __hyp_text __vgic_v3_save_state(struct vgic_v3_cpu_if *cpu_if)
 {
-	struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
-	u64 used_lrs = vcpu->arch.vgic_cpu.used_lrs;
+	u64 used_lrs = cpu_if->used_lrs;
 
 	/*
 	 * Make sure stores to the GIC via the memory mapped interface
@@ -230,10 +229,9 @@ void __hyp_text __vgic_v3_save_state(struct kvm_vcpu *vcpu)
 	}
 }
 
-void __hyp_text __vgic_v3_restore_state(struct kvm_vcpu *vcpu)
+void __hyp_text __vgic_v3_restore_state(struct vgic_v3_cpu_if *cpu_if)
 {
-	struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
-	u64 used_lrs = vcpu->arch.vgic_cpu.used_lrs;
+	u64 used_lrs = cpu_if->used_lrs;
 	int i;
 
 	if (used_lrs || cpu_if->its_vpe.its_vm) {
@@ -257,10 +255,8 @@ void __hyp_text __vgic_v3_restore_state(struct kvm_vcpu *vcpu)
 	}
 }
 
-void __hyp_text __vgic_v3_activate_traps(struct kvm_vcpu *vcpu)
+void __hyp_text __vgic_v3_activate_traps(struct vgic_v3_cpu_if *cpu_if)
 {
-	struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
-
 	/*
 	 * VFIQEn is RES1 if ICC_SRE_EL1.SRE is 1. This causes a
 	 * Group0 interrupt (as generated in GICv2 mode) to be
@@ -306,9 +302,8 @@ void __hyp_text __vgic_v3_activate_traps(struct kvm_vcpu *vcpu)
 		write_gicreg(cpu_if->vgic_hcr, ICH_HCR_EL2);
 }
 
-void __hyp_text __vgic_v3_deactivate_traps(struct kvm_vcpu *vcpu)
+void __hyp_text __vgic_v3_deactivate_traps(struct vgic_v3_cpu_if *cpu_if)
 {
-	struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
 	u64 val;
 
 	if (!cpu_if->vgic_sre) {
@@ -333,15 +328,11 @@ void __hyp_text __vgic_v3_deactivate_traps(struct kvm_vcpu *vcpu)
 		write_gicreg(0, ICH_HCR_EL2);
 }
 
-void __hyp_text __vgic_v3_save_aprs(struct kvm_vcpu *vcpu)
+void __hyp_text __vgic_v3_save_aprs(struct vgic_v3_cpu_if *cpu_if)
 {
-	struct vgic_v3_cpu_if *cpu_if;
 	u64 val;
 	u32 nr_pre_bits;
 
-	vcpu = kern_hyp_va(vcpu);
-	cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
-
 	val = read_gicreg(ICH_VTR_EL2);
 	nr_pre_bits = vtr_to_nr_pre_bits(val);
 
@@ -370,15 +361,11 @@ void __hyp_text __vgic_v3_save_aprs(struct kvm_vcpu *vcpu)
 	}
 }
 
-void __hyp_text __vgic_v3_restore_aprs(struct kvm_vcpu *vcpu)
+void __hyp_text __vgic_v3_restore_aprs(struct vgic_v3_cpu_if *cpu_if)
 {
-	struct vgic_v3_cpu_if *cpu_if;
 	u64 val;
 	u32 nr_pre_bits;
 
-	vcpu = kern_hyp_va(vcpu);
-	cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
-
 	val = read_gicreg(ICH_VTR_EL2);
 	nr_pre_bits = vtr_to_nr_pre_bits(val);
 
@@ -453,7 +440,7 @@ static int __hyp_text __vgic_v3_highest_priority_lr(struct kvm_vcpu *vcpu,
 						    u32 vmcr,
 						    u64 *lr_val)
 {
-	unsigned int used_lrs = vcpu->arch.vgic_cpu.used_lrs;
+	unsigned int used_lrs = vcpu->arch.vgic_cpu.vgic_v3.used_lrs;
 	u8 priority = GICv3_IDLE_PRIORITY;
 	int i, lr = -1;
 
@@ -492,7 +479,7 @@ static int __hyp_text __vgic_v3_highest_priority_lr(struct kvm_vcpu *vcpu,
 static int __hyp_text __vgic_v3_find_active_lr(struct kvm_vcpu *vcpu,
 					       int intid, u64 *lr_val)
 {
-	unsigned int used_lrs = vcpu->arch.vgic_cpu.used_lrs;
+	unsigned int used_lrs = vcpu->arch.vgic_cpu.vgic_v3.used_lrs;
 	int i;
 
 	for (i = 0; i < used_lrs; i++) {
diff --git a/virt/kvm/arm/vgic/vgic-v2.c b/virt/kvm/arm/vgic/vgic-v2.c
index 621cc168fe3f7..ebf53a4e12963 100644
--- a/virt/kvm/arm/vgic/vgic-v2.c
+++ b/virt/kvm/arm/vgic/vgic-v2.c
@@ -56,7 +56,7 @@ void vgic_v2_fold_lr_state(struct kvm_vcpu *vcpu)
 
 	cpuif->vgic_hcr &= ~GICH_HCR_UIE;
 
-	for (lr = 0; lr < vgic_cpu->used_lrs; lr++) {
+	for (lr = 0; lr < vgic_cpu->vgic_v2.used_lrs; lr++) {
 		u32 val = cpuif->vgic_lr[lr];
 		u32 cpuid, intid = val & GICH_LR_VIRTUALID;
 		struct vgic_irq *irq;
@@ -120,7 +120,7 @@ void vgic_v2_fold_lr_state(struct kvm_vcpu *vcpu)
 		vgic_put_irq(vcpu->kvm, irq);
 	}
 
-	vgic_cpu->used_lrs = 0;
+	cpuif->used_lrs = 0;
 }
 
 /*
@@ -427,7 +427,7 @@ int vgic_v2_probe(const struct gic_kvm_info *info)
 static void save_lrs(struct kvm_vcpu *vcpu, void __iomem *base)
 {
 	struct vgic_v2_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v2;
-	u64 used_lrs = vcpu->arch.vgic_cpu.used_lrs;
+	u64 used_lrs = cpu_if->used_lrs;
 	u64 elrsr;
 	int i;
 
@@ -448,7 +448,7 @@ static void save_lrs(struct kvm_vcpu *vcpu, void __iomem *base)
 void vgic_v2_save_state(struct kvm_vcpu *vcpu)
 {
 	void __iomem *base = kvm_vgic_global_state.vctrl_base;
-	u64 used_lrs = vcpu->arch.vgic_cpu.used_lrs;
+	u64 used_lrs = vcpu->arch.vgic_cpu.vgic_v2.used_lrs;
 
 	if (!base)
 		return;
@@ -463,7 +463,7 @@ void vgic_v2_restore_state(struct kvm_vcpu *vcpu)
 {
 	struct vgic_v2_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v2;
 	void __iomem *base = kvm_vgic_global_state.vctrl_base;
-	u64 used_lrs = vcpu->arch.vgic_cpu.used_lrs;
+	u64 used_lrs = cpu_if->used_lrs;
 	int i;
 
 	if (!base)
diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-v3.c
index 2c9fc13e2c590..118d044f48ddf 100644
--- a/virt/kvm/arm/vgic/vgic-v3.c
+++ b/virt/kvm/arm/vgic/vgic-v3.c
@@ -39,7 +39,7 @@ void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu)
 
 	cpuif->vgic_hcr &= ~ICH_HCR_UIE;
 
-	for (lr = 0; lr < vgic_cpu->used_lrs; lr++) {
+	for (lr = 0; lr < cpuif->used_lrs; lr++) {
 		u64 val = cpuif->vgic_lr[lr];
 		u32 intid, cpuid;
 		struct vgic_irq *irq;
@@ -111,7 +111,7 @@ void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu)
 		vgic_put_irq(vcpu->kvm, irq);
 	}
 
-	vgic_cpu->used_lrs = 0;
+	cpuif->used_lrs = 0;
 }
 
 /* Requires the irq to be locked already */
@@ -664,10 +664,10 @@ void vgic_v3_load(struct kvm_vcpu *vcpu)
 	if (likely(cpu_if->vgic_sre))
 		kvm_call_hyp(__vgic_v3_write_vmcr, cpu_if->vgic_vmcr);
 
-	kvm_call_hyp(__vgic_v3_restore_aprs, vcpu);
+	kvm_call_hyp(__vgic_v3_restore_aprs, kern_hyp_va(cpu_if));
 
 	if (has_vhe())
-		__vgic_v3_activate_traps(vcpu);
+		__vgic_v3_activate_traps(cpu_if);
 
 	WARN_ON(vgic_v4_load(vcpu));
 }
@@ -682,12 +682,14 @@ void vgic_v3_vmcr_sync(struct kvm_vcpu *vcpu)
 
 void vgic_v3_put(struct kvm_vcpu *vcpu)
 {
+	struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
+
 	WARN_ON(vgic_v4_put(vcpu, false));
 
 	vgic_v3_vmcr_sync(vcpu);
 
-	kvm_call_hyp(__vgic_v3_save_aprs, vcpu);
+	kvm_call_hyp(__vgic_v3_save_aprs, kern_hyp_va(cpu_if));
 
 	if (has_vhe())
-		__vgic_v3_deactivate_traps(vcpu);
+		__vgic_v3_deactivate_traps(cpu_if);
 }
diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
index 99b02ca730a87..c3643b7f101b7 100644
--- a/virt/kvm/arm/vgic/vgic.c
+++ b/virt/kvm/arm/vgic/vgic.c
@@ -786,6 +786,7 @@ static void vgic_flush_lr_state(struct kvm_vcpu *vcpu)
 	int count;
 	bool multi_sgi;
 	u8 prio = 0xff;
+	int i = 0;
 
 	lockdep_assert_held(&vgic_cpu->ap_list_lock);
 
@@ -827,11 +828,14 @@ static void vgic_flush_lr_state(struct kvm_vcpu *vcpu)
 		}
 	}
 
-	vcpu->arch.vgic_cpu.used_lrs = count;
-
 	/* Nuke remaining LRs */
-	for ( ; count < kvm_vgic_global_state.nr_lr; count++)
-		vgic_clear_lr(vcpu, count);
+	for (i = count ; i < kvm_vgic_global_state.nr_lr; i++)
+		vgic_clear_lr(vcpu, i);
+
+	if (!static_branch_unlikely(&kvm_vgic_global_state.gicv3_cpuif))
+		vcpu->arch.vgic_cpu.vgic_v2.used_lrs = count;
+	else
+		vcpu->arch.vgic_cpu.vgic_v3.used_lrs = count;
 }
 
 static inline bool can_access_vgic_from_kernel(void)
@@ -849,13 +853,13 @@ static inline void vgic_save_state(struct kvm_vcpu *vcpu)
 	if (!static_branch_unlikely(&kvm_vgic_global_state.gicv3_cpuif))
 		vgic_v2_save_state(vcpu);
 	else
-		__vgic_v3_save_state(vcpu);
+		__vgic_v3_save_state(&vcpu->arch.vgic_cpu.vgic_v3);
 }
 
 /* Sync back the hardware VGIC state into our emulation after a guest's run. */
 void kvm_vgic_sync_hwstate(struct kvm_vcpu *vcpu)
 {
-	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
+	int used_lrs;
 
 	/* An empty ap_list_head implies used_lrs == 0 */
 	if (list_empty(&vcpu->arch.vgic_cpu.ap_list_head))
@@ -864,7 +868,12 @@ void kvm_vgic_sync_hwstate(struct kvm_vcpu *vcpu)
 	if (can_access_vgic_from_kernel())
 		vgic_save_state(vcpu);
 
-	if (vgic_cpu->used_lrs)
+	if (!static_branch_unlikely(&kvm_vgic_global_state.gicv3_cpuif))
+		used_lrs = vcpu->arch.vgic_cpu.vgic_v2.used_lrs;
+	else
+		used_lrs = vcpu->arch.vgic_cpu.vgic_v3.used_lrs;
+
+	if (used_lrs)
 		vgic_fold_lr_state(vcpu);
 	vgic_prune_ap_list(vcpu);
 }
@@ -874,7 +883,7 @@ static inline void vgic_restore_state(struct kvm_vcpu *vcpu)
 	if (!static_branch_unlikely(&kvm_vgic_global_state.gicv3_cpuif))
 		vgic_v2_restore_state(vcpu);
 	else
-		__vgic_v3_restore_state(vcpu);
+		__vgic_v3_restore_state(&vcpu->arch.vgic_cpu.vgic_v3);
 }
 
 /* Flush our emulation state into the GIC hardware before entering the guest. */
-- 
2.26.1

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH 10/26] KVM: arm64: Refactor vcpu_{read,write}_sys_reg
  2020-04-22 12:00 [PATCH 00/26] KVM: arm64: Preliminary NV patches Marc Zyngier
                   ` (8 preceding siblings ...)
  2020-04-22 12:00 ` [PATCH 09/26] KVM: arm64: vgic-v3: Take cpu_if pointer directly instead of vcpu Marc Zyngier
@ 2020-04-22 12:00 ` Marc Zyngier
  2020-05-26 16:28   ` James Morse
  2020-04-22 12:00 ` [PATCH 11/26] KVM: arm64: Add missing reset handlers for PMU emulation Marc Zyngier
                   ` (15 subsequent siblings)
  25 siblings, 1 reply; 78+ messages in thread
From: Marc Zyngier @ 2020-04-22 12:00 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Will Deacon, Andre Przywara, Dave Martin, George Cherian,
	Zengtao (B),
	Catalin Marinas

Extract the direct HW accessors for later reuse.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/sys_regs.c | 135 ++++++++++++++++++++++----------------
 1 file changed, 78 insertions(+), 57 deletions(-)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 51db934702b64..46f218982df8c 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -64,11 +64,8 @@ static bool write_to_read_only(struct kvm_vcpu *vcpu,
 	return false;
 }
 
-u64 vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg)
+static bool __vcpu_read_sys_reg_from_cpu(int reg, u64 *val)
 {
-	if (!vcpu->arch.sysregs_loaded_on_cpu)
-		goto immediate_read;
-
 	/*
 	 * System registers listed in the switch are not saved on every
 	 * exit from the guest but are only saved on vcpu_put.
@@ -79,40 +76,37 @@ u64 vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg)
 	 * thread when emulating cross-VCPU communication.
 	 */
 	switch (reg) {
-	case CSSELR_EL1:	return read_sysreg_s(SYS_CSSELR_EL1);
-	case SCTLR_EL1:		return read_sysreg_s(SYS_SCTLR_EL12);
-	case ACTLR_EL1:		return read_sysreg_s(SYS_ACTLR_EL1);
-	case CPACR_EL1:		return read_sysreg_s(SYS_CPACR_EL12);
-	case TTBR0_EL1:		return read_sysreg_s(SYS_TTBR0_EL12);
-	case TTBR1_EL1:		return read_sysreg_s(SYS_TTBR1_EL12);
-	case TCR_EL1:		return read_sysreg_s(SYS_TCR_EL12);
-	case ESR_EL1:		return read_sysreg_s(SYS_ESR_EL12);
-	case AFSR0_EL1:		return read_sysreg_s(SYS_AFSR0_EL12);
-	case AFSR1_EL1:		return read_sysreg_s(SYS_AFSR1_EL12);
-	case FAR_EL1:		return read_sysreg_s(SYS_FAR_EL12);
-	case MAIR_EL1:		return read_sysreg_s(SYS_MAIR_EL12);
-	case VBAR_EL1:		return read_sysreg_s(SYS_VBAR_EL12);
-	case CONTEXTIDR_EL1:	return read_sysreg_s(SYS_CONTEXTIDR_EL12);
-	case TPIDR_EL0:		return read_sysreg_s(SYS_TPIDR_EL0);
-	case TPIDRRO_EL0:	return read_sysreg_s(SYS_TPIDRRO_EL0);
-	case TPIDR_EL1:		return read_sysreg_s(SYS_TPIDR_EL1);
-	case AMAIR_EL1:		return read_sysreg_s(SYS_AMAIR_EL12);
-	case CNTKCTL_EL1:	return read_sysreg_s(SYS_CNTKCTL_EL12);
-	case PAR_EL1:		return read_sysreg_s(SYS_PAR_EL1);
-	case DACR32_EL2:	return read_sysreg_s(SYS_DACR32_EL2);
-	case IFSR32_EL2:	return read_sysreg_s(SYS_IFSR32_EL2);
-	case DBGVCR32_EL2:	return read_sysreg_s(SYS_DBGVCR32_EL2);
+	case CSSELR_EL1:	*val = read_sysreg_s(SYS_CSSELR_EL1);	break;
+	case SCTLR_EL1:		*val = read_sysreg_s(SYS_SCTLR_EL12);	break;
+	case ACTLR_EL1:		*val = read_sysreg_s(SYS_ACTLR_EL1);	break;
+	case CPACR_EL1:		*val = read_sysreg_s(SYS_CPACR_EL12);	break;
+	case TTBR0_EL1:		*val = read_sysreg_s(SYS_TTBR0_EL12);	break;
+	case TTBR1_EL1:		*val = read_sysreg_s(SYS_TTBR1_EL12);	break;
+	case TCR_EL1:		*val = read_sysreg_s(SYS_TCR_EL12);	break;
+	case ESR_EL1:		*val = read_sysreg_s(SYS_ESR_EL12);	break;
+	case AFSR0_EL1:		*val = read_sysreg_s(SYS_AFSR0_EL12);	break;
+	case AFSR1_EL1:		*val = read_sysreg_s(SYS_AFSR1_EL12);	break;
+	case FAR_EL1:		*val = read_sysreg_s(SYS_FAR_EL12);	break;
+	case MAIR_EL1:		*val = read_sysreg_s(SYS_MAIR_EL12);	break;
+	case VBAR_EL1:		*val = read_sysreg_s(SYS_VBAR_EL12);	break;
+	case CONTEXTIDR_EL1:	*val = read_sysreg_s(SYS_CONTEXTIDR_EL12);break;
+	case TPIDR_EL0:		*val = read_sysreg_s(SYS_TPIDR_EL0);	break;
+	case TPIDRRO_EL0:	*val = read_sysreg_s(SYS_TPIDRRO_EL0);	break;
+	case TPIDR_EL1:		*val = read_sysreg_s(SYS_TPIDR_EL1);	break;
+	case AMAIR_EL1:		*val = read_sysreg_s(SYS_AMAIR_EL12);	break;
+	case CNTKCTL_EL1:	*val = read_sysreg_s(SYS_CNTKCTL_EL12);	break;
+	case PAR_EL1:		*val = read_sysreg_s(SYS_PAR_EL1);	break;
+	case DACR32_EL2:	*val = read_sysreg_s(SYS_DACR32_EL2);	break;
+	case IFSR32_EL2:	*val = read_sysreg_s(SYS_IFSR32_EL2);	break;
+	case DBGVCR32_EL2:	*val = read_sysreg_s(SYS_DBGVCR32_EL2);	break;
+	default:		return false;
 	}
 
-immediate_read:
-	return __vcpu_sys_reg(vcpu, reg);
+	return true;
 }
 
-void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, u64 val, int reg)
+static bool __vcpu_write_sys_reg_to_cpu(u64 val, int reg)
 {
-	if (!vcpu->arch.sysregs_loaded_on_cpu)
-		goto immediate_write;
-
 	/*
 	 * System registers listed in the switch are not restored on every
 	 * entry to the guest but are only restored on vcpu_load.
@@ -122,32 +116,59 @@ void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, u64 val, int reg)
 	 * set once, before running the VCPU, and never changed later.
 	 */
 	switch (reg) {
-	case CSSELR_EL1:	write_sysreg_s(val, SYS_CSSELR_EL1);	return;
-	case SCTLR_EL1:		write_sysreg_s(val, SYS_SCTLR_EL12);	return;
-	case ACTLR_EL1:		write_sysreg_s(val, SYS_ACTLR_EL1);	return;
-	case CPACR_EL1:		write_sysreg_s(val, SYS_CPACR_EL12);	return;
-	case TTBR0_EL1:		write_sysreg_s(val, SYS_TTBR0_EL12);	return;
-	case TTBR1_EL1:		write_sysreg_s(val, SYS_TTBR1_EL12);	return;
-	case TCR_EL1:		write_sysreg_s(val, SYS_TCR_EL12);	return;
-	case ESR_EL1:		write_sysreg_s(val, SYS_ESR_EL12);	return;
-	case AFSR0_EL1:		write_sysreg_s(val, SYS_AFSR0_EL12);	return;
-	case AFSR1_EL1:		write_sysreg_s(val, SYS_AFSR1_EL12);	return;
-	case FAR_EL1:		write_sysreg_s(val, SYS_FAR_EL12);	return;
-	case MAIR_EL1:		write_sysreg_s(val, SYS_MAIR_EL12);	return;
-	case VBAR_EL1:		write_sysreg_s(val, SYS_VBAR_EL12);	return;
-	case CONTEXTIDR_EL1:	write_sysreg_s(val, SYS_CONTEXTIDR_EL12); return;
-	case TPIDR_EL0:		write_sysreg_s(val, SYS_TPIDR_EL0);	return;
-	case TPIDRRO_EL0:	write_sysreg_s(val, SYS_TPIDRRO_EL0);	return;
-	case TPIDR_EL1:		write_sysreg_s(val, SYS_TPIDR_EL1);	return;
-	case AMAIR_EL1:		write_sysreg_s(val, SYS_AMAIR_EL12);	return;
-	case CNTKCTL_EL1:	write_sysreg_s(val, SYS_CNTKCTL_EL12);	return;
-	case PAR_EL1:		write_sysreg_s(val, SYS_PAR_EL1);	return;
-	case DACR32_EL2:	write_sysreg_s(val, SYS_DACR32_EL2);	return;
-	case IFSR32_EL2:	write_sysreg_s(val, SYS_IFSR32_EL2);	return;
-	case DBGVCR32_EL2:	write_sysreg_s(val, SYS_DBGVCR32_EL2);	return;
+	case CSSELR_EL1:	write_sysreg_s(val, SYS_CSSELR_EL1);	break;
+	case SCTLR_EL1:		write_sysreg_s(val, SYS_SCTLR_EL12);	break;
+	case ACTLR_EL1:		write_sysreg_s(val, SYS_ACTLR_EL1);	break;
+	case CPACR_EL1:		write_sysreg_s(val, SYS_CPACR_EL12);	break;
+	case TTBR0_EL1:		write_sysreg_s(val, SYS_TTBR0_EL12);	break;
+	case TTBR1_EL1:		write_sysreg_s(val, SYS_TTBR1_EL12);	break;
+	case TCR_EL1:		write_sysreg_s(val, SYS_TCR_EL12);	break;
+	case ESR_EL1:		write_sysreg_s(val, SYS_ESR_EL12);	break;
+	case AFSR0_EL1:		write_sysreg_s(val, SYS_AFSR0_EL12);	break;
+	case AFSR1_EL1:		write_sysreg_s(val, SYS_AFSR1_EL12);	break;
+	case FAR_EL1:		write_sysreg_s(val, SYS_FAR_EL12);	break;
+	case MAIR_EL1:		write_sysreg_s(val, SYS_MAIR_EL12);	break;
+	case VBAR_EL1:		write_sysreg_s(val, SYS_VBAR_EL12);	break;
+	case CONTEXTIDR_EL1:	write_sysreg_s(val, SYS_CONTEXTIDR_EL12);break;
+	case TPIDR_EL0:		write_sysreg_s(val, SYS_TPIDR_EL0);	break;
+	case TPIDRRO_EL0:	write_sysreg_s(val, SYS_TPIDRRO_EL0);	break;
+	case TPIDR_EL1:		write_sysreg_s(val, SYS_TPIDR_EL1);	break;
+	case AMAIR_EL1:		write_sysreg_s(val, SYS_AMAIR_EL12);	break;
+	case CNTKCTL_EL1:	write_sysreg_s(val, SYS_CNTKCTL_EL12);	break;
+	case PAR_EL1:		write_sysreg_s(val, SYS_PAR_EL1);	break;
+	case DACR32_EL2:	write_sysreg_s(val, SYS_DACR32_EL2);	break;
+	case IFSR32_EL2:	write_sysreg_s(val, SYS_IFSR32_EL2);	break;
+	case DBGVCR32_EL2:	write_sysreg_s(val, SYS_DBGVCR32_EL2);	break;
+	default:		return false;
+	}
+
+	return true;
+}
+
+u64 vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg)
+{
+	u64 val = 0x8badf00d8badf00d;
+
+	if (!vcpu->arch.sysregs_loaded_on_cpu) {
+		goto memory_read;
 	}
 
-immediate_write:
+	if (__vcpu_read_sys_reg_from_cpu(reg, &val))
+		return val;
+
+memory_read:
+	return __vcpu_sys_reg(vcpu, reg);
+}
+
+void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, u64 val, int reg)
+{
+	if (!vcpu->arch.sysregs_loaded_on_cpu)
+		goto memory_write;
+
+	if (__vcpu_write_sys_reg_to_cpu(val, reg))
+		return;
+
+memory_write:
 	 __vcpu_sys_reg(vcpu, reg) = val;
 }
 
-- 
2.26.1

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH 11/26] KVM: arm64: Add missing reset handlers for PMU emulation
  2020-04-22 12:00 [PATCH 00/26] KVM: arm64: Preliminary NV patches Marc Zyngier
                   ` (9 preceding siblings ...)
  2020-04-22 12:00 ` [PATCH 10/26] KVM: arm64: Refactor vcpu_{read,write}_sys_reg Marc Zyngier
@ 2020-04-22 12:00 ` Marc Zyngier
  2020-05-26 16:29   ` James Morse
  2020-04-22 12:00 ` [PATCH 12/26] KVM: arm64: Move sysreg reset check to boot time Marc Zyngier
                   ` (14 subsequent siblings)
  25 siblings, 1 reply; 78+ messages in thread
From: Marc Zyngier @ 2020-04-22 12:00 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Will Deacon, Andre Przywara, Dave Martin, George Cherian,
	Zengtao (B),
	Catalin Marinas

As we're about to become a bit more harsh when it comes to the lack of
reset callbacks, let's add the missing PMU reset handlers. Note that
these only cover *CLR registers that were always covered by their *SET
counterpart, so there is no semantic change here.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/sys_regs.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 46f218982df8c..c9443960b36e4 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1553,7 +1553,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 	{ SYS_DESC(SYS_PAR_EL1), NULL, reset_unknown, PAR_EL1 },
 
 	{ SYS_DESC(SYS_PMINTENSET_EL1), access_pminten, reset_unknown, PMINTENSET_EL1 },
-	{ SYS_DESC(SYS_PMINTENCLR_EL1), access_pminten, NULL, PMINTENSET_EL1 },
+	{ SYS_DESC(SYS_PMINTENCLR_EL1), access_pminten, reset_unknown, PMINTENSET_EL1 },
 
 	{ SYS_DESC(SYS_MAIR_EL1), access_vm_reg, reset_unknown, MAIR_EL1 },
 	{ SYS_DESC(SYS_AMAIR_EL1), access_vm_reg, reset_amair_el1, AMAIR_EL1 },
@@ -1592,8 +1592,8 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 
 	{ SYS_DESC(SYS_PMCR_EL0), access_pmcr, reset_pmcr, PMCR_EL0 },
 	{ SYS_DESC(SYS_PMCNTENSET_EL0), access_pmcnten, reset_unknown, PMCNTENSET_EL0 },
-	{ SYS_DESC(SYS_PMCNTENCLR_EL0), access_pmcnten, NULL, PMCNTENSET_EL0 },
-	{ SYS_DESC(SYS_PMOVSCLR_EL0), access_pmovs, NULL, PMOVSSET_EL0 },
+	{ SYS_DESC(SYS_PMCNTENCLR_EL0), access_pmcnten, reset_unknown, PMCNTENSET_EL0 },
+	{ SYS_DESC(SYS_PMOVSCLR_EL0), access_pmovs, reset_unknown, PMOVSSET_EL0 },
 	{ SYS_DESC(SYS_PMSWINC_EL0), access_pmswinc, reset_unknown, PMSWINC_EL0 },
 	{ SYS_DESC(SYS_PMSELR_EL0), access_pmselr, reset_unknown, PMSELR_EL0 },
 	{ SYS_DESC(SYS_PMCEID0_EL0), access_pmceid },
-- 
2.26.1

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH 12/26] KVM: arm64: Move sysreg reset check to boot time
  2020-04-22 12:00 [PATCH 00/26] KVM: arm64: Preliminary NV patches Marc Zyngier
                   ` (10 preceding siblings ...)
  2020-04-22 12:00 ` [PATCH 11/26] KVM: arm64: Add missing reset handlers for PMU emulation Marc Zyngier
@ 2020-04-22 12:00 ` Marc Zyngier
  2020-04-22 12:00 ` [PATCH 13/26] KVM: arm64: Introduce accessor for ctxt->sys_reg Marc Zyngier
                   ` (13 subsequent siblings)
  25 siblings, 0 replies; 78+ messages in thread
From: Marc Zyngier @ 2020-04-22 12:00 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Will Deacon, Andre Przywara, Dave Martin, George Cherian,
	Zengtao (B),
	Catalin Marinas

Our sysreg reset check has become a bit silly, as it only checks whether
a reset callback actually exists for a given sysreg entry, and apply the
method if available. Doing the check at each vcpu reset is pretty dumb,
as the tables never change. It is thus perfectly possible to do the same
checks at boot time.

This also allows us to introduce a sparse sys_regs[] array, something
that will be required with ARMv8.4-NV.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/sys_regs.c | 72 +++++++++++++++++++--------------------
 1 file changed, 35 insertions(+), 37 deletions(-)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index c9443960b36e4..5bda4af36a0e7 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -2094,12 +2094,37 @@ static const struct sys_reg_desc cp15_64_regs[] = {
 	{ SYS_DESC(SYS_AARCH32_CNTP_CVAL),    access_arch_timer },
 };
 
+static int check_sysreg_table(const struct sys_reg_desc *table, unsigned int n,
+			      bool is_32)
+{
+	unsigned int i;
+
+	for (i = 0; i < n; i++) {
+		if (!is_32 && table[i].reg && !table[i].reset) {
+			kvm_err("sys_reg table %p entry %d has lacks reset\n",
+				table, i);
+			return 1;
+		}
+
+		if (i && cmp_sys_reg(&table[i-1], &table[i]) >= 0) {
+			kvm_err("sys_reg table %p out of order (%d)\n", table, i - 1);
+			return 1;
+		}
+	}
+
+	return 0;
+}
+
 /* Target specific emulation tables */
 static struct kvm_sys_reg_target_table *target_tables[KVM_ARM_NUM_TARGETS];
 
 void kvm_register_target_sys_reg_table(unsigned int target,
 				       struct kvm_sys_reg_target_table *table)
 {
+	if (check_sysreg_table(table->table64.table, table->table64.num, false) ||
+	    check_sysreg_table(table->table32.table, table->table32.num, true))
+		return;
+
 	target_tables[target] = table;
 }
 
@@ -2385,19 +2410,13 @@ static int emulate_sys_reg(struct kvm_vcpu *vcpu,
 }
 
 static void reset_sys_reg_descs(struct kvm_vcpu *vcpu,
-				const struct sys_reg_desc *table, size_t num,
-				unsigned long *bmap)
+				const struct sys_reg_desc *table, size_t num)
 {
 	unsigned long i;
 
 	for (i = 0; i < num; i++)
-		if (table[i].reset) {
-			int reg = table[i].reg;
-
+		if (table[i].reset)
 			table[i].reset(vcpu, &table[i]);
-			if (reg > 0 && reg < NR_SYS_REGS)
-				set_bit(reg, bmap);
-		}
 }
 
 /**
@@ -2853,32 +2872,18 @@ int kvm_arm_copy_sys_reg_indices(struct kvm_vcpu *vcpu, u64 __user *uindices)
 	return write_demux_regids(uindices);
 }
 
-static int check_sysreg_table(const struct sys_reg_desc *table, unsigned int n)
-{
-	unsigned int i;
-
-	for (i = 1; i < n; i++) {
-		if (cmp_sys_reg(&table[i-1], &table[i]) >= 0) {
-			kvm_err("sys_reg table %p out of order (%d)\n", table, i - 1);
-			return 1;
-		}
-	}
-
-	return 0;
-}
-
 void kvm_sys_reg_table_init(void)
 {
 	unsigned int i;
 	struct sys_reg_desc clidr;
 
 	/* Make sure tables are unique and in order. */
-	BUG_ON(check_sysreg_table(sys_reg_descs, ARRAY_SIZE(sys_reg_descs)));
-	BUG_ON(check_sysreg_table(cp14_regs, ARRAY_SIZE(cp14_regs)));
-	BUG_ON(check_sysreg_table(cp14_64_regs, ARRAY_SIZE(cp14_64_regs)));
-	BUG_ON(check_sysreg_table(cp15_regs, ARRAY_SIZE(cp15_regs)));
-	BUG_ON(check_sysreg_table(cp15_64_regs, ARRAY_SIZE(cp15_64_regs)));
-	BUG_ON(check_sysreg_table(invariant_sys_regs, ARRAY_SIZE(invariant_sys_regs)));
+	BUG_ON(check_sysreg_table(sys_reg_descs, ARRAY_SIZE(sys_reg_descs), false));
+	BUG_ON(check_sysreg_table(cp14_regs, ARRAY_SIZE(cp14_regs), true));
+	BUG_ON(check_sysreg_table(cp14_64_regs, ARRAY_SIZE(cp14_64_regs), true));
+	BUG_ON(check_sysreg_table(cp15_regs, ARRAY_SIZE(cp15_regs), true));
+	BUG_ON(check_sysreg_table(cp15_64_regs, ARRAY_SIZE(cp15_64_regs), true));
+	BUG_ON(check_sysreg_table(invariant_sys_regs, ARRAY_SIZE(invariant_sys_regs), false));
 
 	/* We abuse the reset function to overwrite the table itself. */
 	for (i = 0; i < ARRAY_SIZE(invariant_sys_regs); i++)
@@ -2914,17 +2919,10 @@ void kvm_reset_sys_regs(struct kvm_vcpu *vcpu)
 {
 	size_t num;
 	const struct sys_reg_desc *table;
-	DECLARE_BITMAP(bmap, NR_SYS_REGS) = { 0, };
 
 	/* Generic chip reset first (so target could override). */
-	reset_sys_reg_descs(vcpu, sys_reg_descs, ARRAY_SIZE(sys_reg_descs), bmap);
+	reset_sys_reg_descs(vcpu, sys_reg_descs, ARRAY_SIZE(sys_reg_descs));
 
 	table = get_target_table(vcpu->arch.target, true, &num);
-	reset_sys_reg_descs(vcpu, table, num, bmap);
-
-	for (num = 1; num < NR_SYS_REGS; num++) {
-		if (WARN(!test_bit(num, bmap),
-			 "Didn't reset __vcpu_sys_reg(%zi)\n", num))
-			break;
-	}
+	reset_sys_reg_descs(vcpu, table, num);
 }
-- 
2.26.1

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH 13/26] KVM: arm64: Introduce accessor for ctxt->sys_reg
  2020-04-22 12:00 [PATCH 00/26] KVM: arm64: Preliminary NV patches Marc Zyngier
                   ` (11 preceding siblings ...)
  2020-04-22 12:00 ` [PATCH 12/26] KVM: arm64: Move sysreg reset check to boot time Marc Zyngier
@ 2020-04-22 12:00 ` Marc Zyngier
  2020-04-22 12:00 ` [PATCH 14/26] KVM: arm64: hyp: Use ctxt_sys_reg/__vcpu_sys_reg instead of raw sys_regs access Marc Zyngier
                   ` (12 subsequent siblings)
  25 siblings, 0 replies; 78+ messages in thread
From: Marc Zyngier @ 2020-04-22 12:00 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Will Deacon, Andre Przywara, Dave Martin, George Cherian,
	Zengtao (B),
	Catalin Marinas

In order to allow the disintegration of the per-vcpu sysreg array,
let's introduce a new helper (ctxt_sys_reg()) that returns the
in-memory copy of a system register, picked from a given context.

__vcpu_sys_reg() is rewritten to use this helper.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_host.h | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 664a5d92ae9b8..037589a691903 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -407,12 +407,17 @@ struct kvm_vcpu_arch {
 #define vcpu_gp_regs(v)		(&(v)->arch.ctxt.gp_regs)
 
 /*
- * Only use __vcpu_sys_reg if you know you want the memory backed version of a
- * register, and not the one most recently accessed by a running VCPU.  For
- * example, for userspace access or for system registers that are never context
- * switched, but only emulated.
+ * Only use __vcpu_sys_reg/ctxt_sys_reg if you know you want the
+ * memory backed version of a register, and not the one most recently
+ * accessed by a running VCPU.  For example, for userspace access or
+ * for system registers that are never context switched, but only
+ * emulated.
  */
-#define __vcpu_sys_reg(v,r)	((v)->arch.ctxt.sys_regs[(r)])
+#define __ctxt_sys_reg(c,r)	(&(c)->sys_regs[(r)])
+
+#define ctxt_sys_reg(c,r)	(*__ctxt_sys_reg(c,r))
+
+#define __vcpu_sys_reg(v,r)	(ctxt_sys_reg(&(v)->arch.ctxt, (r)))
 
 u64 vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg);
 void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, u64 val, int reg);
-- 
2.26.1

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH 14/26] KVM: arm64: hyp: Use ctxt_sys_reg/__vcpu_sys_reg instead of raw sys_regs access
  2020-04-22 12:00 [PATCH 00/26] KVM: arm64: Preliminary NV patches Marc Zyngier
                   ` (12 preceding siblings ...)
  2020-04-22 12:00 ` [PATCH 13/26] KVM: arm64: Introduce accessor for ctxt->sys_reg Marc Zyngier
@ 2020-04-22 12:00 ` Marc Zyngier
  2020-04-22 12:00 ` [PATCH 15/26] KVM: arm64: sve: Use __vcpu_sys_reg() " Marc Zyngier
                   ` (11 subsequent siblings)
  25 siblings, 0 replies; 78+ messages in thread
From: Marc Zyngier @ 2020-04-22 12:00 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Will Deacon, Andre Przywara, Dave Martin, George Cherian,
	Zengtao (B),
	Catalin Marinas

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_host.h |   2 +-
 arch/arm64/kvm/hyp/debug-sr.c     |   4 +-
 arch/arm64/kvm/hyp/switch.c       |  11 ++-
 arch/arm64/kvm/hyp/sysreg-sr.c    | 114 +++++++++++++++---------------
 4 files changed, 64 insertions(+), 67 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 037589a691903..37dd8d8faef67 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -548,7 +548,7 @@ DECLARE_PER_CPU(kvm_host_data_t, kvm_host_data);
 static inline void kvm_init_host_cpu_context(struct kvm_cpu_context *cpu_ctxt)
 {
 	/* The host's MPIDR is immutable, so let's set it up at boot time */
-	cpu_ctxt->sys_regs[MPIDR_EL1] = read_cpuid_mpidr();
+	ctxt_sys_reg(cpu_ctxt, MPIDR_EL1) = read_cpuid_mpidr();
 }
 
 void __kvm_enable_ssbs(void);
diff --git a/arch/arm64/kvm/hyp/debug-sr.c b/arch/arm64/kvm/hyp/debug-sr.c
index 0fc9872a14671..998758f8b5774 100644
--- a/arch/arm64/kvm/hyp/debug-sr.c
+++ b/arch/arm64/kvm/hyp/debug-sr.c
@@ -145,7 +145,7 @@ static void __hyp_text __debug_save_state(struct kvm_vcpu *vcpu,
 	save_debug(dbg->dbg_wcr, dbgwcr, wrps);
 	save_debug(dbg->dbg_wvr, dbgwvr, wrps);
 
-	ctxt->sys_regs[MDCCINT_EL1] = read_sysreg(mdccint_el1);
+	ctxt_sys_reg(ctxt, MDCCINT_EL1) = read_sysreg(mdccint_el1);
 }
 
 static void __hyp_text __debug_restore_state(struct kvm_vcpu *vcpu,
@@ -165,7 +165,7 @@ static void __hyp_text __debug_restore_state(struct kvm_vcpu *vcpu,
 	restore_debug(dbg->dbg_wcr, dbgwcr, wrps);
 	restore_debug(dbg->dbg_wvr, dbgwvr, wrps);
 
-	write_sysreg(ctxt->sys_regs[MDCCINT_EL1], mdccint_el1);
+	write_sysreg(ctxt_sys_reg(ctxt, MDCCINT_EL1), mdccint_el1);
 }
 
 void __hyp_text __debug_switch_to_guest(struct kvm_vcpu *vcpu)
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index c48c96565f1a7..bc6c405b343b4 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -48,7 +48,7 @@ static void __hyp_text __fpsimd_save_fpexc32(struct kvm_vcpu *vcpu)
 	if (!vcpu_el1_is_32bit(vcpu))
 		return;
 
-	vcpu->arch.ctxt.sys_regs[FPEXC32_EL2] = read_sysreg(fpexc32_el2);
+	__vcpu_sys_reg(vcpu, FPEXC32_EL2) = read_sysreg(fpexc32_el2);
 }
 
 static void __hyp_text __activate_traps_fpsimd32(struct kvm_vcpu *vcpu)
@@ -147,9 +147,9 @@ static void __hyp_text __activate_traps_nvhe(struct kvm_vcpu *vcpu)
 		 * configured and enabled. We can now restore the guest's S1
 		 * configuration: SCTLR, and only then TCR.
 		 */
-		write_sysreg_el1(ctxt->sys_regs[SCTLR_EL1],	SYS_SCTLR);
+		write_sysreg_el1(ctxt_sys_reg(ctxt, SCTLR_EL1),	SYS_SCTLR);
 		isb();
-		write_sysreg_el1(ctxt->sys_regs[TCR_EL1],	SYS_TCR);
+		write_sysreg_el1(ctxt_sys_reg(ctxt, TCR_EL1),	SYS_TCR);
 	}
 }
 
@@ -420,15 +420,14 @@ static bool __hyp_text __hyp_handle_fpsimd(struct kvm_vcpu *vcpu)
 		sve_load_state(vcpu_sve_pffr(vcpu),
 			       &vcpu->arch.ctxt.gp_regs.fp_regs.fpsr,
 			       sve_vq_from_vl(vcpu->arch.sve_max_vl) - 1);
-		write_sysreg_s(vcpu->arch.ctxt.sys_regs[ZCR_EL1], SYS_ZCR_EL12);
+		write_sysreg_s(__vcpu_sys_reg(vcpu, ZCR_EL1), SYS_ZCR_EL12);
 	} else {
 		__fpsimd_restore_state(&vcpu->arch.ctxt.gp_regs.fp_regs);
 	}
 
 	/* Skip restoring fpexc32 for AArch64 guests */
 	if (!(read_sysreg(hcr_el2) & HCR_RW))
-		write_sysreg(vcpu->arch.ctxt.sys_regs[FPEXC32_EL2],
-			     fpexc32_el2);
+		write_sysreg(__vcpu_sys_reg(vcpu, FPEXC32_EL2), fpexc32_el2);
 
 	vcpu->arch.flags |= KVM_ARM64_FP_ENABLED;
 
diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
index 75b1925763f16..970c94ce9d10a 100644
--- a/arch/arm64/kvm/hyp/sysreg-sr.c
+++ b/arch/arm64/kvm/hyp/sysreg-sr.c
@@ -25,7 +25,7 @@
 
 static void __hyp_text __sysreg_save_common_state(struct kvm_cpu_context *ctxt)
 {
-	ctxt->sys_regs[MDSCR_EL1]	= read_sysreg(mdscr_el1);
+	ctxt_sys_reg(ctxt, MDSCR_EL1)	= read_sysreg(mdscr_el1);
 
 	/*
 	 * The host arm64 Linux uses sp_el0 to point to 'current' and it must
@@ -36,30 +36,30 @@ static void __hyp_text __sysreg_save_common_state(struct kvm_cpu_context *ctxt)
 
 static void __hyp_text __sysreg_save_user_state(struct kvm_cpu_context *ctxt)
 {
-	ctxt->sys_regs[TPIDR_EL0]	= read_sysreg(tpidr_el0);
-	ctxt->sys_regs[TPIDRRO_EL0]	= read_sysreg(tpidrro_el0);
+	ctxt_sys_reg(ctxt, TPIDR_EL0)	= read_sysreg(tpidr_el0);
+	ctxt_sys_reg(ctxt, TPIDRRO_EL0)	= read_sysreg(tpidrro_el0);
 }
 
 static void __hyp_text __sysreg_save_el1_state(struct kvm_cpu_context *ctxt)
 {
-	ctxt->sys_regs[CSSELR_EL1]	= read_sysreg(csselr_el1);
-	ctxt->sys_regs[SCTLR_EL1]	= read_sysreg_el1(SYS_SCTLR);
-	ctxt->sys_regs[ACTLR_EL1]	= read_sysreg(actlr_el1);
-	ctxt->sys_regs[CPACR_EL1]	= read_sysreg_el1(SYS_CPACR);
-	ctxt->sys_regs[TTBR0_EL1]	= read_sysreg_el1(SYS_TTBR0);
-	ctxt->sys_regs[TTBR1_EL1]	= read_sysreg_el1(SYS_TTBR1);
-	ctxt->sys_regs[TCR_EL1]		= read_sysreg_el1(SYS_TCR);
-	ctxt->sys_regs[ESR_EL1]		= read_sysreg_el1(SYS_ESR);
-	ctxt->sys_regs[AFSR0_EL1]	= read_sysreg_el1(SYS_AFSR0);
-	ctxt->sys_regs[AFSR1_EL1]	= read_sysreg_el1(SYS_AFSR1);
-	ctxt->sys_regs[FAR_EL1]		= read_sysreg_el1(SYS_FAR);
-	ctxt->sys_regs[MAIR_EL1]	= read_sysreg_el1(SYS_MAIR);
-	ctxt->sys_regs[VBAR_EL1]	= read_sysreg_el1(SYS_VBAR);
-	ctxt->sys_regs[CONTEXTIDR_EL1]	= read_sysreg_el1(SYS_CONTEXTIDR);
-	ctxt->sys_regs[AMAIR_EL1]	= read_sysreg_el1(SYS_AMAIR);
-	ctxt->sys_regs[CNTKCTL_EL1]	= read_sysreg_el1(SYS_CNTKCTL);
-	ctxt->sys_regs[PAR_EL1]		= read_sysreg(par_el1);
-	ctxt->sys_regs[TPIDR_EL1]	= read_sysreg(tpidr_el1);
+	ctxt_sys_reg(ctxt, CSSELR_EL1)	= read_sysreg(csselr_el1);
+	ctxt_sys_reg(ctxt, SCTLR_EL1)	= read_sysreg_el1(SYS_SCTLR);
+	ctxt_sys_reg(ctxt, ACTLR_EL1)	= read_sysreg(actlr_el1);
+	ctxt_sys_reg(ctxt, CPACR_EL1)	= read_sysreg_el1(SYS_CPACR);
+	ctxt_sys_reg(ctxt, TTBR0_EL1)	= read_sysreg_el1(SYS_TTBR0);
+	ctxt_sys_reg(ctxt, TTBR1_EL1)	= read_sysreg_el1(SYS_TTBR1);
+	ctxt_sys_reg(ctxt, TCR_EL1)	= read_sysreg_el1(SYS_TCR);
+	ctxt_sys_reg(ctxt, ESR_EL1)	= read_sysreg_el1(SYS_ESR);
+	ctxt_sys_reg(ctxt, AFSR0_EL1)	= read_sysreg_el1(SYS_AFSR0);
+	ctxt_sys_reg(ctxt, AFSR1_EL1)	= read_sysreg_el1(SYS_AFSR1);
+	ctxt_sys_reg(ctxt, FAR_EL1)	= read_sysreg_el1(SYS_FAR);
+	ctxt_sys_reg(ctxt, MAIR_EL1)	= read_sysreg_el1(SYS_MAIR);
+	ctxt_sys_reg(ctxt, VBAR_EL1)	= read_sysreg_el1(SYS_VBAR);
+	ctxt_sys_reg(ctxt, CONTEXTIDR_EL1) = read_sysreg_el1(SYS_CONTEXTIDR);
+	ctxt_sys_reg(ctxt, AMAIR_EL1)	= read_sysreg_el1(SYS_AMAIR);
+	ctxt_sys_reg(ctxt, CNTKCTL_EL1)	= read_sysreg_el1(SYS_CNTKCTL);
+	ctxt_sys_reg(ctxt, PAR_EL1)	= read_sysreg(par_el1);
+	ctxt_sys_reg(ctxt, TPIDR_EL1)	= read_sysreg(tpidr_el1);
 
 	ctxt->gp_regs.sp_el1		= read_sysreg(sp_el1);
 	ctxt->gp_regs.elr_el1		= read_sysreg_el1(SYS_ELR);
@@ -72,7 +72,7 @@ static void __hyp_text __sysreg_save_el2_return_state(struct kvm_cpu_context *ct
 	ctxt->gp_regs.regs.pstate	= read_sysreg_el2(SYS_SPSR);
 
 	if (cpus_have_final_cap(ARM64_HAS_RAS_EXTN))
-		ctxt->sys_regs[DISR_EL1] = read_sysreg_s(SYS_VDISR_EL2);
+		ctxt_sys_reg(ctxt, DISR_EL1) = read_sysreg_s(SYS_VDISR_EL2);
 }
 
 void __hyp_text __sysreg_save_state_nvhe(struct kvm_cpu_context *ctxt)
@@ -98,7 +98,7 @@ NOKPROBE_SYMBOL(sysreg_save_guest_state_vhe);
 
 static void __hyp_text __sysreg_restore_common_state(struct kvm_cpu_context *ctxt)
 {
-	write_sysreg(ctxt->sys_regs[MDSCR_EL1],	  mdscr_el1);
+	write_sysreg(ctxt_sys_reg(ctxt, MDSCR_EL1),  mdscr_el1);
 
 	/*
 	 * The host arm64 Linux uses sp_el0 to point to 'current' and it must
@@ -109,45 +109,45 @@ static void __hyp_text __sysreg_restore_common_state(struct kvm_cpu_context *ctx
 
 static void __hyp_text __sysreg_restore_user_state(struct kvm_cpu_context *ctxt)
 {
-	write_sysreg(ctxt->sys_regs[TPIDR_EL0],		tpidr_el0);
-	write_sysreg(ctxt->sys_regs[TPIDRRO_EL0],	tpidrro_el0);
+	write_sysreg(ctxt_sys_reg(ctxt, TPIDR_EL0),	tpidr_el0);
+	write_sysreg(ctxt_sys_reg(ctxt, TPIDRRO_EL0),	tpidrro_el0);
 }
 
 static void __hyp_text __sysreg_restore_el1_state(struct kvm_cpu_context *ctxt)
 {
-	write_sysreg(ctxt->sys_regs[MPIDR_EL1],		vmpidr_el2);
-	write_sysreg(ctxt->sys_regs[CSSELR_EL1],	csselr_el1);
+	write_sysreg(ctxt_sys_reg(ctxt, MPIDR_EL1),	vmpidr_el2);
+	write_sysreg(ctxt_sys_reg(ctxt, CSSELR_EL1),	csselr_el1);
 
 	if (!cpus_have_final_cap(ARM64_WORKAROUND_SPECULATIVE_AT_NVHE)) {
-		write_sysreg_el1(ctxt->sys_regs[SCTLR_EL1],	SYS_SCTLR);
-		write_sysreg_el1(ctxt->sys_regs[TCR_EL1],	SYS_TCR);
+		write_sysreg_el1(ctxt_sys_reg(ctxt, SCTLR_EL1),	SYS_SCTLR);
+		write_sysreg_el1(ctxt_sys_reg(ctxt, TCR_EL1),	SYS_TCR);
 	} else	if (!ctxt->__hyp_running_vcpu) {
 		/*
 		 * Must only be done for guest registers, hence the context
 		 * test. We're coming from the host, so SCTLR.M is already
 		 * set. Pairs with __activate_traps_nvhe().
 		 */
-		write_sysreg_el1((ctxt->sys_regs[TCR_EL1] |
+		write_sysreg_el1((ctxt_sys_reg(ctxt, TCR_EL1) |
 				  TCR_EPD1_MASK | TCR_EPD0_MASK),
 				 SYS_TCR);
 		isb();
 	}
 
-	write_sysreg(ctxt->sys_regs[ACTLR_EL1],		actlr_el1);
-	write_sysreg_el1(ctxt->sys_regs[CPACR_EL1],	SYS_CPACR);
-	write_sysreg_el1(ctxt->sys_regs[TTBR0_EL1],	SYS_TTBR0);
-	write_sysreg_el1(ctxt->sys_regs[TTBR1_EL1],	SYS_TTBR1);
-	write_sysreg_el1(ctxt->sys_regs[ESR_EL1],	SYS_ESR);
-	write_sysreg_el1(ctxt->sys_regs[AFSR0_EL1],	SYS_AFSR0);
-	write_sysreg_el1(ctxt->sys_regs[AFSR1_EL1],	SYS_AFSR1);
-	write_sysreg_el1(ctxt->sys_regs[FAR_EL1],	SYS_FAR);
-	write_sysreg_el1(ctxt->sys_regs[MAIR_EL1],	SYS_MAIR);
-	write_sysreg_el1(ctxt->sys_regs[VBAR_EL1],	SYS_VBAR);
-	write_sysreg_el1(ctxt->sys_regs[CONTEXTIDR_EL1],SYS_CONTEXTIDR);
-	write_sysreg_el1(ctxt->sys_regs[AMAIR_EL1],	SYS_AMAIR);
-	write_sysreg_el1(ctxt->sys_regs[CNTKCTL_EL1],	SYS_CNTKCTL);
-	write_sysreg(ctxt->sys_regs[PAR_EL1],		par_el1);
-	write_sysreg(ctxt->sys_regs[TPIDR_EL1],		tpidr_el1);
+	write_sysreg(ctxt_sys_reg(ctxt, ACTLR_EL1),	actlr_el1);
+	write_sysreg_el1(ctxt_sys_reg(ctxt, CPACR_EL1),	SYS_CPACR);
+	write_sysreg_el1(ctxt_sys_reg(ctxt, TTBR0_EL1),	SYS_TTBR0);
+	write_sysreg_el1(ctxt_sys_reg(ctxt, TTBR1_EL1),	SYS_TTBR1);
+	write_sysreg_el1(ctxt_sys_reg(ctxt, ESR_EL1),	SYS_ESR);
+	write_sysreg_el1(ctxt_sys_reg(ctxt, AFSR0_EL1),	SYS_AFSR0);
+	write_sysreg_el1(ctxt_sys_reg(ctxt, AFSR1_EL1),	SYS_AFSR1);
+	write_sysreg_el1(ctxt_sys_reg(ctxt, FAR_EL1),	SYS_FAR);
+	write_sysreg_el1(ctxt_sys_reg(ctxt, MAIR_EL1),	SYS_MAIR);
+	write_sysreg_el1(ctxt_sys_reg(ctxt, VBAR_EL1),	SYS_VBAR);
+	write_sysreg_el1(ctxt_sys_reg(ctxt, CONTEXTIDR_EL1), SYS_CONTEXTIDR);
+	write_sysreg_el1(ctxt_sys_reg(ctxt, AMAIR_EL1),	SYS_AMAIR);
+	write_sysreg_el1(ctxt_sys_reg(ctxt, CNTKCTL_EL1), SYS_CNTKCTL);
+	write_sysreg(ctxt_sys_reg(ctxt, PAR_EL1),	par_el1);
+	write_sysreg(ctxt_sys_reg(ctxt, TPIDR_EL1),	tpidr_el1);
 
 	if (cpus_have_final_cap(ARM64_WORKAROUND_SPECULATIVE_AT_NVHE) &&
 	    ctxt->__hyp_running_vcpu) {
@@ -161,9 +161,9 @@ static void __hyp_text __sysreg_restore_el1_state(struct kvm_cpu_context *ctxt)
 		 * deconfigured and disabled. We can now restore the host's
 		 * S1 configuration: SCTLR, and only then TCR.
 		 */
-		write_sysreg_el1(ctxt->sys_regs[SCTLR_EL1],	SYS_SCTLR);
+		write_sysreg_el1(ctxt_sys_reg(ctxt, SCTLR_EL1),	SYS_SCTLR);
 		isb();
-		write_sysreg_el1(ctxt->sys_regs[TCR_EL1],	SYS_TCR);
+		write_sysreg_el1(ctxt_sys_reg(ctxt, TCR_EL1),	SYS_TCR);
 	}
 
 	write_sysreg(ctxt->gp_regs.sp_el1,		sp_el1);
@@ -195,7 +195,7 @@ __sysreg_restore_el2_return_state(struct kvm_cpu_context *ctxt)
 	write_sysreg_el2(pstate,			SYS_SPSR);
 
 	if (cpus_have_final_cap(ARM64_HAS_RAS_EXTN))
-		write_sysreg_s(ctxt->sys_regs[DISR_EL1], SYS_VDISR_EL2);
+		write_sysreg_s(ctxt_sys_reg(ctxt, DISR_EL1), SYS_VDISR_EL2);
 }
 
 void __hyp_text __sysreg_restore_state_nvhe(struct kvm_cpu_context *ctxt)
@@ -221,46 +221,44 @@ NOKPROBE_SYMBOL(sysreg_restore_guest_state_vhe);
 
 void __hyp_text __sysreg32_save_state(struct kvm_vcpu *vcpu)
 {
-	u64 *spsr, *sysreg;
+	u64 *spsr;
 
 	if (!vcpu_el1_is_32bit(vcpu))
 		return;
 
 	spsr = vcpu->arch.ctxt.gp_regs.spsr;
-	sysreg = vcpu->arch.ctxt.sys_regs;
 
 	spsr[KVM_SPSR_ABT] = read_sysreg(spsr_abt);
 	spsr[KVM_SPSR_UND] = read_sysreg(spsr_und);
 	spsr[KVM_SPSR_IRQ] = read_sysreg(spsr_irq);
 	spsr[KVM_SPSR_FIQ] = read_sysreg(spsr_fiq);
 
-	sysreg[DACR32_EL2] = read_sysreg(dacr32_el2);
-	sysreg[IFSR32_EL2] = read_sysreg(ifsr32_el2);
+	__vcpu_sys_reg(vcpu, DACR32_EL2) = read_sysreg(dacr32_el2);
+	__vcpu_sys_reg(vcpu, IFSR32_EL2) = read_sysreg(ifsr32_el2);
 
 	if (has_vhe() || vcpu->arch.flags & KVM_ARM64_DEBUG_DIRTY)
-		sysreg[DBGVCR32_EL2] = read_sysreg(dbgvcr32_el2);
+		__vcpu_sys_reg(vcpu, DBGVCR32_EL2) = read_sysreg(dbgvcr32_el2);
 }
 
 void __hyp_text __sysreg32_restore_state(struct kvm_vcpu *vcpu)
 {
-	u64 *spsr, *sysreg;
+	u64 *spsr;
 
 	if (!vcpu_el1_is_32bit(vcpu))
 		return;
 
 	spsr = vcpu->arch.ctxt.gp_regs.spsr;
-	sysreg = vcpu->arch.ctxt.sys_regs;
 
 	write_sysreg(spsr[KVM_SPSR_ABT], spsr_abt);
 	write_sysreg(spsr[KVM_SPSR_UND], spsr_und);
 	write_sysreg(spsr[KVM_SPSR_IRQ], spsr_irq);
 	write_sysreg(spsr[KVM_SPSR_FIQ], spsr_fiq);
 
-	write_sysreg(sysreg[DACR32_EL2], dacr32_el2);
-	write_sysreg(sysreg[IFSR32_EL2], ifsr32_el2);
+	write_sysreg(__vcpu_sys_reg(vcpu, DACR32_EL2), dacr32_el2);
+	write_sysreg(__vcpu_sys_reg(vcpu, IFSR32_EL2), ifsr32_el2);
 
 	if (has_vhe() || vcpu->arch.flags & KVM_ARM64_DEBUG_DIRTY)
-		write_sysreg(sysreg[DBGVCR32_EL2], dbgvcr32_el2);
+		write_sysreg(__vcpu_sys_reg(vcpu, DBGVCR32_EL2), dbgvcr32_el2);
 }
 
 /**
-- 
2.26.1

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH 15/26] KVM: arm64: sve: Use __vcpu_sys_reg() instead of raw sys_regs access
  2020-04-22 12:00 [PATCH 00/26] KVM: arm64: Preliminary NV patches Marc Zyngier
                   ` (13 preceding siblings ...)
  2020-04-22 12:00 ` [PATCH 14/26] KVM: arm64: hyp: Use ctxt_sys_reg/__vcpu_sys_reg instead of raw sys_regs access Marc Zyngier
@ 2020-04-22 12:00 ` Marc Zyngier
  2020-04-22 12:00 ` [PATCH 16/26] KVM: arm64: pauth: Use ctxt_sys_reg() " Marc Zyngier
                   ` (10 subsequent siblings)
  25 siblings, 0 replies; 78+ messages in thread
From: Marc Zyngier @ 2020-04-22 12:00 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Will Deacon, Andre Przywara, Dave Martin, George Cherian,
	Zengtao (B),
	Catalin Marinas

Now that we have a wrapper for the sysreg accesses, let's use that
consistently.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/fpsimd.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/arm64/kvm/fpsimd.c b/arch/arm64/kvm/fpsimd.c
index e329a36b2bee5..e503caff14d1e 100644
--- a/arch/arm64/kvm/fpsimd.c
+++ b/arch/arm64/kvm/fpsimd.c
@@ -109,12 +109,10 @@ void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu)
 	local_irq_save(flags);
 
 	if (vcpu->arch.flags & KVM_ARM64_FP_ENABLED) {
-		u64 *guest_zcr = &vcpu->arch.ctxt.sys_regs[ZCR_EL1];
-
 		fpsimd_save_and_flush_cpu_state();
 
 		if (guest_has_sve)
-			*guest_zcr = read_sysreg_s(SYS_ZCR_EL12);
+			__vcpu_sys_reg(vcpu, ZCR_EL1) = read_sysreg_s(SYS_ZCR_EL12);
 	} else if (host_has_sve) {
 		/*
 		 * The FPSIMD/SVE state in the CPU has not been touched, and we
-- 
2.26.1

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH 16/26] KVM: arm64: pauth: Use ctxt_sys_reg() instead of raw sys_regs access
  2020-04-22 12:00 [PATCH 00/26] KVM: arm64: Preliminary NV patches Marc Zyngier
                   ` (14 preceding siblings ...)
  2020-04-22 12:00 ` [PATCH 15/26] KVM: arm64: sve: Use __vcpu_sys_reg() " Marc Zyngier
@ 2020-04-22 12:00 ` Marc Zyngier
  2020-04-22 12:00 ` [PATCH 17/26] KVM: arm64: debug: " Marc Zyngier
                   ` (9 subsequent siblings)
  25 siblings, 0 replies; 78+ messages in thread
From: Marc Zyngier @ 2020-04-22 12:00 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Will Deacon, Andre Przywara, Dave Martin, George Cherian,
	Zengtao (B),
	Catalin Marinas

Now that we have a wrapper for the sysreg accesses, let's use that
consistently.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/handle_exit.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index aacfc55de44cb..1feb0eb3174a3 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -162,10 +162,11 @@ static int handle_sve(struct kvm_vcpu *vcpu, struct kvm_run *run)
 	return 1;
 }
 
-#define __ptrauth_save_key(regs, key)						\
+
+#define __ptrauth_save_key(ctxt, key)						\
 ({										\
-	regs[key ## KEYLO_EL1] = read_sysreg_s(SYS_ ## key ## KEYLO_EL1);	\
-	regs[key ## KEYHI_EL1] = read_sysreg_s(SYS_ ## key ## KEYHI_EL1);	\
+	ctxt_sys_reg(ctxt, key ## KEYLO_EL1) = read_sysreg_s(SYS_ ## key ## KEYLO_EL1); \
+	ctxt_sys_reg(ctxt, key ## KEYHI_EL1) = read_sysreg_s(SYS_ ## key ## KEYHI_EL1);	\
 })
 
 /*
@@ -179,11 +180,11 @@ void kvm_arm_vcpu_ptrauth_trap(struct kvm_vcpu *vcpu)
 	if (vcpu_has_ptrauth(vcpu)) {
 		vcpu_ptrauth_enable(vcpu);
 		ctxt = vcpu->arch.host_cpu_context;
-		__ptrauth_save_key(ctxt->sys_regs, APIA);
-		__ptrauth_save_key(ctxt->sys_regs, APIB);
-		__ptrauth_save_key(ctxt->sys_regs, APDA);
-		__ptrauth_save_key(ctxt->sys_regs, APDB);
-		__ptrauth_save_key(ctxt->sys_regs, APGA);
+		__ptrauth_save_key(ctxt, APIA);
+		__ptrauth_save_key(ctxt, APIB);
+		__ptrauth_save_key(ctxt, APDA);
+		__ptrauth_save_key(ctxt, APDB);
+		__ptrauth_save_key(ctxt, APGA);
 	} else {
 		kvm_inject_undefined(vcpu);
 	}
-- 
2.26.1

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH 17/26] KVM: arm64: debug: Use ctxt_sys_reg() instead of raw sys_regs access
  2020-04-22 12:00 [PATCH 00/26] KVM: arm64: Preliminary NV patches Marc Zyngier
                   ` (15 preceding siblings ...)
  2020-04-22 12:00 ` [PATCH 16/26] KVM: arm64: pauth: Use ctxt_sys_reg() " Marc Zyngier
@ 2020-04-22 12:00 ` Marc Zyngier
  2020-04-22 12:00 ` [PATCH 18/26] KVM: arm64: Don't use empty structures as CPU reset state Marc Zyngier
                   ` (8 subsequent siblings)
  25 siblings, 0 replies; 78+ messages in thread
From: Marc Zyngier @ 2020-04-22 12:00 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Will Deacon, Andre Przywara, Dave Martin, George Cherian,
	Zengtao (B),
	Catalin Marinas

Now that we have a wrapper for the sysreg accesses, let's use that
consistently.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/hyp/debug-sr.c | 14 ++++++--------
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/kvm/hyp/debug-sr.c b/arch/arm64/kvm/hyp/debug-sr.c
index 998758f8b5774..421e547ed10f3 100644
--- a/arch/arm64/kvm/hyp/debug-sr.c
+++ b/arch/arm64/kvm/hyp/debug-sr.c
@@ -129,8 +129,7 @@ static void __hyp_text __debug_restore_spe_nvhe(u64 pmscr_el1)
 	write_sysreg_s(pmscr_el1, SYS_PMSCR_EL1);
 }
 
-static void __hyp_text __debug_save_state(struct kvm_vcpu *vcpu,
-					  struct kvm_guest_debug_arch *dbg,
+static void __hyp_text __debug_save_state(struct kvm_guest_debug_arch *dbg,
 					  struct kvm_cpu_context *ctxt)
 {
 	u64 aa64dfr0;
@@ -148,8 +147,7 @@ static void __hyp_text __debug_save_state(struct kvm_vcpu *vcpu,
 	ctxt_sys_reg(ctxt, MDCCINT_EL1) = read_sysreg(mdccint_el1);
 }
 
-static void __hyp_text __debug_restore_state(struct kvm_vcpu *vcpu,
-					     struct kvm_guest_debug_arch *dbg,
+static void __hyp_text __debug_restore_state(struct kvm_guest_debug_arch *dbg,
 					     struct kvm_cpu_context *ctxt)
 {
 	u64 aa64dfr0;
@@ -190,8 +188,8 @@ void __hyp_text __debug_switch_to_guest(struct kvm_vcpu *vcpu)
 	host_dbg = &vcpu->arch.host_debug_state.regs;
 	guest_dbg = kern_hyp_va(vcpu->arch.debug_ptr);
 
-	__debug_save_state(vcpu, host_dbg, host_ctxt);
-	__debug_restore_state(vcpu, guest_dbg, guest_ctxt);
+	__debug_save_state(host_dbg, host_ctxt);
+	__debug_restore_state(guest_dbg, guest_ctxt);
 }
 
 void __hyp_text __debug_switch_to_host(struct kvm_vcpu *vcpu)
@@ -212,8 +210,8 @@ void __hyp_text __debug_switch_to_host(struct kvm_vcpu *vcpu)
 	host_dbg = &vcpu->arch.host_debug_state.regs;
 	guest_dbg = kern_hyp_va(vcpu->arch.debug_ptr);
 
-	__debug_save_state(vcpu, guest_dbg, guest_ctxt);
-	__debug_restore_state(vcpu, host_dbg, host_ctxt);
+	__debug_save_state(guest_dbg, guest_ctxt);
+	__debug_restore_state(host_dbg, host_ctxt);
 
 	vcpu->arch.flags &= ~KVM_ARM64_DEBUG_DIRTY;
 }
-- 
2.26.1

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH 18/26] KVM: arm64: Don't use empty structures as CPU reset state
  2020-04-22 12:00 [PATCH 00/26] KVM: arm64: Preliminary NV patches Marc Zyngier
                   ` (16 preceding siblings ...)
  2020-04-22 12:00 ` [PATCH 17/26] KVM: arm64: debug: " Marc Zyngier
@ 2020-04-22 12:00 ` Marc Zyngier
  2020-04-24  4:07   ` Zenghui Yu
  2020-04-22 12:00 ` [PATCH 19/26] KVM: arm64: Make struct kvm_regs userspace-only Marc Zyngier
                   ` (7 subsequent siblings)
  25 siblings, 1 reply; 78+ messages in thread
From: Marc Zyngier @ 2020-04-22 12:00 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Will Deacon, Andre Przywara, Dave Martin, George Cherian,
	Zengtao (B),
	Catalin Marinas

Keeping empty structure as the vcpu state initializer is slightly
wasteful: we only want to set pstate, and zero everything else.
Just do that.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/reset.c | 20 +++++++++-----------
 1 file changed, 9 insertions(+), 11 deletions(-)

diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index 241db35a7ef4f..895d7d9ad1866 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -37,15 +37,11 @@ static u32 kvm_ipa_limit;
 /*
  * ARMv8 Reset Values
  */
-static const struct kvm_regs default_regs_reset = {
-	.regs.pstate = (PSR_MODE_EL1h | PSR_A_BIT | PSR_I_BIT |
-			PSR_F_BIT | PSR_D_BIT),
-};
+#define VCPU_RESET_PSTATE_EL1	(PSR_MODE_EL1h | PSR_A_BIT | PSR_I_BIT | \
+				 PSR_F_BIT | PSR_D_BIT)
 
-static const struct kvm_regs default_regs_reset32 = {
-	.regs.pstate = (PSR_AA32_MODE_SVC | PSR_AA32_A_BIT |
-			PSR_AA32_I_BIT | PSR_AA32_F_BIT),
-};
+#define VCPU_RESET_PSTATE_SVC	(PSR_AA32_MODE_SVC | PSR_AA32_A_BIT | \
+				 PSR_AA32_I_BIT | PSR_AA32_F_BIT)
 
 static bool cpu_has_32bit_el1(void)
 {
@@ -261,6 +257,7 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
 	const struct kvm_regs *cpu_reset;
 	int ret = -EINVAL;
 	bool loaded;
+	u32 pstate;
 
 	/* Reset PMU outside of the non-preemptible section */
 	kvm_pmu_vcpu_reset(vcpu);
@@ -291,16 +288,17 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
 		if (test_bit(KVM_ARM_VCPU_EL1_32BIT, vcpu->arch.features)) {
 			if (!cpu_has_32bit_el1())
 				goto out;
-			cpu_reset = &default_regs_reset32;
+			pstate = VCPU_RESET_PSTATE_SVC;
 		} else {
-			cpu_reset = &default_regs_reset;
+			pstate = VCPU_RESET_PSTATE_EL1;
 		}
 
 		break;
 	}
 
 	/* Reset core registers */
-	memcpy(vcpu_gp_regs(vcpu), cpu_reset, sizeof(*cpu_reset));
+	memset(vcpu_gp_regs(vcpu), 0, sizeof(*cpu_reset));
+	vcpu_gp_regs(vcpu)->regs.pstate = pstate;
 
 	/* Reset system registers */
 	kvm_reset_sys_regs(vcpu);
-- 
2.26.1

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH 19/26] KVM: arm64: Make struct kvm_regs userspace-only
  2020-04-22 12:00 [PATCH 00/26] KVM: arm64: Preliminary NV patches Marc Zyngier
                   ` (17 preceding siblings ...)
  2020-04-22 12:00 ` [PATCH 18/26] KVM: arm64: Don't use empty structures as CPU reset state Marc Zyngier
@ 2020-04-22 12:00 ` Marc Zyngier
  2020-05-26 16:29   ` James Morse
  2020-04-22 12:00 ` [PATCH 20/26] KVM: arm64: Move ELR_EL1 to the system register array Marc Zyngier
                   ` (6 subsequent siblings)
  25 siblings, 1 reply; 78+ messages in thread
From: Marc Zyngier @ 2020-04-22 12:00 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Will Deacon, Andre Przywara, Dave Martin, George Cherian,
	Zengtao (B),
	Catalin Marinas

struct kvm_regs is used by userspace to indicate which register gets
accessed by the {GET,SET}_ONE_REG API. But as we're about to refactor
the layout of the in-kernel register structures, we need the kernel to
move away from it.

Let's make kvm_regs userspace only, and let the kernel map it to its own
internal representation.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_emulate.h | 18 +++----
 arch/arm64/include/asm/kvm_host.h    | 12 ++++-
 arch/arm64/kernel/asm-offsets.c      |  3 +-
 arch/arm64/kvm/fpsimd.c              |  2 +-
 arch/arm64/kvm/guest.c               | 70 +++++++++++++++++++++++++---
 arch/arm64/kvm/hyp/entry.S           |  3 +-
 arch/arm64/kvm/hyp/switch.c          |  4 +-
 arch/arm64/kvm/hyp/sysreg-sr.c       | 28 +++++------
 arch/arm64/kvm/regmap.c              |  6 +--
 arch/arm64/kvm/reset.c               |  2 +-
 10 files changed, 105 insertions(+), 43 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index a30b4eec7cb40..c0d9363f328dc 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -130,12 +130,12 @@ static inline void vcpu_set_vsesr(struct kvm_vcpu *vcpu, u64 vsesr)
 
 static __always_inline unsigned long *vcpu_pc(const struct kvm_vcpu *vcpu)
 {
-	return (unsigned long *)&vcpu_gp_regs(vcpu)->regs.pc;
+	return (unsigned long *)&vcpu_gp_regs(vcpu)->pc;
 }
 
 static inline unsigned long *__vcpu_elr_el1(const struct kvm_vcpu *vcpu)
 {
-	return (unsigned long *)&vcpu_gp_regs(vcpu)->elr_el1;
+	return (unsigned long *)&vcpu->arch.ctxt.elr_el1;
 }
 
 static inline unsigned long vcpu_read_elr_el1(const struct kvm_vcpu *vcpu)
@@ -156,7 +156,7 @@ static inline void vcpu_write_elr_el1(const struct kvm_vcpu *vcpu, unsigned long
 
 static __always_inline unsigned long *vcpu_cpsr(const struct kvm_vcpu *vcpu)
 {
-	return (unsigned long *)&vcpu_gp_regs(vcpu)->regs.pstate;
+	return (unsigned long *)&vcpu_gp_regs(vcpu)->pstate;
 }
 
 static __always_inline bool vcpu_mode_is_32bit(const struct kvm_vcpu *vcpu)
@@ -185,14 +185,14 @@ static inline void vcpu_set_thumb(struct kvm_vcpu *vcpu)
 static __always_inline unsigned long vcpu_get_reg(const struct kvm_vcpu *vcpu,
 					 u8 reg_num)
 {
-	return (reg_num == 31) ? 0 : vcpu_gp_regs(vcpu)->regs.regs[reg_num];
+	return (reg_num == 31) ? 0 : vcpu_gp_regs(vcpu)->regs[reg_num];
 }
 
 static __always_inline void vcpu_set_reg(struct kvm_vcpu *vcpu, u8 reg_num,
 				unsigned long val)
 {
 	if (reg_num != 31)
-		vcpu_gp_regs(vcpu)->regs.regs[reg_num] = val;
+		vcpu_gp_regs(vcpu)->regs[reg_num] = val;
 }
 
 static inline unsigned long vcpu_read_spsr(const struct kvm_vcpu *vcpu)
@@ -203,7 +203,7 @@ static inline unsigned long vcpu_read_spsr(const struct kvm_vcpu *vcpu)
 	if (vcpu->arch.sysregs_loaded_on_cpu)
 		return read_sysreg_el1(SYS_SPSR);
 	else
-		return vcpu_gp_regs(vcpu)->spsr[KVM_SPSR_EL1];
+		return vcpu->arch.ctxt.spsr[KVM_SPSR_EL1];
 }
 
 static inline void vcpu_write_spsr(struct kvm_vcpu *vcpu, unsigned long v)
@@ -216,7 +216,7 @@ static inline void vcpu_write_spsr(struct kvm_vcpu *vcpu, unsigned long v)
 	if (vcpu->arch.sysregs_loaded_on_cpu)
 		write_sysreg_el1(v, SYS_SPSR);
 	else
-		vcpu_gp_regs(vcpu)->spsr[KVM_SPSR_EL1] = v;
+		vcpu->arch.ctxt.spsr[KVM_SPSR_EL1] = v;
 }
 
 /*
@@ -523,11 +523,11 @@ static __always_inline void kvm_skip_instr(struct kvm_vcpu *vcpu, bool is_wide_i
 static __always_inline void __hyp_text __kvm_skip_instr(struct kvm_vcpu *vcpu)
 {
 	*vcpu_pc(vcpu) = read_sysreg_el2(SYS_ELR);
-	vcpu->arch.ctxt.gp_regs.regs.pstate = read_sysreg_el2(SYS_SPSR);
+	vcpu_gp_regs(vcpu)->pstate = read_sysreg_el2(SYS_SPSR);
 
 	kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
 
-	write_sysreg_el2(vcpu->arch.ctxt.gp_regs.regs.pstate, SYS_SPSR);
+	write_sysreg_el2(vcpu_gp_regs(vcpu)->pstate, SYS_SPSR);
 	write_sysreg_el2(*vcpu_pc(vcpu), SYS_ELR);
 }
 
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 37dd8d8faef67..95977b80265ce 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -235,7 +235,15 @@ enum vcpu_sysreg {
 #define NR_COPRO_REGS	(NR_SYS_REGS * 2)
 
 struct kvm_cpu_context {
-	struct kvm_regs	gp_regs;
+	struct user_pt_regs regs;	/* sp = sp_el0 */
+
+	u64	sp_el1;
+	u64	elr_el1;
+
+	u64	spsr[KVM_NR_SPSR];
+
+	struct user_fpsimd_state fp_regs;
+
 	union {
 		u64 sys_regs[NR_SYS_REGS];
 		u32 copro[NR_COPRO_REGS];
@@ -404,7 +412,7 @@ struct kvm_vcpu_arch {
 				  system_supports_generic_auth()) && \
 				 ((vcpu)->arch.flags & KVM_ARM64_GUEST_HAS_PTRAUTH))
 
-#define vcpu_gp_regs(v)		(&(v)->arch.ctxt.gp_regs)
+#define vcpu_gp_regs(v)		(&(v)->arch.ctxt.regs)
 
 /*
  * Only use __vcpu_sys_reg/ctxt_sys_reg if you know you want the
diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
index 9981a0a5a87f1..1125df6449b60 100644
--- a/arch/arm64/kernel/asm-offsets.c
+++ b/arch/arm64/kernel/asm-offsets.c
@@ -101,13 +101,12 @@ int main(void)
   DEFINE(VCPU_FAULT_DISR,	offsetof(struct kvm_vcpu, arch.fault.disr_el1));
   DEFINE(VCPU_WORKAROUND_FLAGS,	offsetof(struct kvm_vcpu, arch.workaround_flags));
   DEFINE(VCPU_HCR_EL2,		offsetof(struct kvm_vcpu, arch.hcr_el2));
-  DEFINE(CPU_GP_REGS,		offsetof(struct kvm_cpu_context, gp_regs));
+  DEFINE(CPU_USER_PT_REGS,	offsetof(struct kvm_cpu_context, regs));
   DEFINE(CPU_APIAKEYLO_EL1,	offsetof(struct kvm_cpu_context, sys_regs[APIAKEYLO_EL1]));
   DEFINE(CPU_APIBKEYLO_EL1,	offsetof(struct kvm_cpu_context, sys_regs[APIBKEYLO_EL1]));
   DEFINE(CPU_APDAKEYLO_EL1,	offsetof(struct kvm_cpu_context, sys_regs[APDAKEYLO_EL1]));
   DEFINE(CPU_APDBKEYLO_EL1,	offsetof(struct kvm_cpu_context, sys_regs[APDBKEYLO_EL1]));
   DEFINE(CPU_APGAKEYLO_EL1,	offsetof(struct kvm_cpu_context, sys_regs[APGAKEYLO_EL1]));
-  DEFINE(CPU_USER_PT_REGS,	offsetof(struct kvm_regs, regs));
   DEFINE(HOST_CONTEXT_VCPU,	offsetof(struct kvm_cpu_context, __hyp_running_vcpu));
   DEFINE(HOST_DATA_CONTEXT,	offsetof(struct kvm_host_data, host_ctxt));
 #endif
diff --git a/arch/arm64/kvm/fpsimd.c b/arch/arm64/kvm/fpsimd.c
index e503caff14d1e..3e081d556e810 100644
--- a/arch/arm64/kvm/fpsimd.c
+++ b/arch/arm64/kvm/fpsimd.c
@@ -85,7 +85,7 @@ void kvm_arch_vcpu_ctxsync_fp(struct kvm_vcpu *vcpu)
 	WARN_ON_ONCE(!irqs_disabled());
 
 	if (vcpu->arch.flags & KVM_ARM64_FP_ENABLED) {
-		fpsimd_bind_state_to_cpu(&vcpu->arch.ctxt.gp_regs.fp_regs,
+		fpsimd_bind_state_to_cpu(&vcpu->arch.ctxt.fp_regs,
 					 vcpu->arch.sve_state,
 					 vcpu->arch.sve_max_vl);
 
diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index 23ebe51410f06..9fec9231b63e2 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -102,6 +102,55 @@ static int core_reg_size_from_offset(const struct kvm_vcpu *vcpu, u64 off)
 	return size;
 }
 
+static void *core_reg_addr(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
+{
+	u64 off = core_reg_offset_from_id(reg->id);
+
+	switch (off) {
+	case KVM_REG_ARM_CORE_REG(regs.regs[0]) ...
+	     KVM_REG_ARM_CORE_REG(regs.regs[30]):
+		off -= KVM_REG_ARM_CORE_REG(regs.regs[0]);
+		off /= 2;
+		return &vcpu->arch.ctxt.regs.regs[off];
+
+	case KVM_REG_ARM_CORE_REG(regs.sp):
+		return &vcpu->arch.ctxt.regs.sp;
+
+	case KVM_REG_ARM_CORE_REG(regs.pc):
+		return &vcpu->arch.ctxt.regs.pc;
+
+	case KVM_REG_ARM_CORE_REG(regs.pstate):
+		return &vcpu->arch.ctxt.regs.pstate;
+
+	case KVM_REG_ARM_CORE_REG(sp_el1):
+		return &vcpu->arch.ctxt.sp_el1;
+
+	case KVM_REG_ARM_CORE_REG(elr_el1):
+		return &vcpu->arch.ctxt.elr_el1;
+
+	case KVM_REG_ARM_CORE_REG(spsr[0]) ...
+	     KVM_REG_ARM_CORE_REG(spsr[KVM_NR_SPSR - 1]):
+		off -= KVM_REG_ARM_CORE_REG(spsr[0]);
+		off /= 2;
+		return &vcpu->arch.ctxt.spsr[off];
+
+	case KVM_REG_ARM_CORE_REG(fp_regs.vregs[0]) ...
+	     KVM_REG_ARM_CORE_REG(fp_regs.vregs[31]):
+		off -= KVM_REG_ARM_CORE_REG(fp_regs.vregs[0]);
+		off /= 4;
+		return &vcpu->arch.ctxt.fp_regs.vregs[off];
+
+	case KVM_REG_ARM_CORE_REG(fp_regs.fpsr):
+		return &vcpu->arch.ctxt.fp_regs.fpsr;
+
+	case KVM_REG_ARM_CORE_REG(fp_regs.fpcr):
+		return &vcpu->arch.ctxt.fp_regs.fpcr;
+
+	default:
+		return NULL;
+	}
+}
+
 static int validate_core_offset(const struct kvm_vcpu *vcpu,
 				const struct kvm_one_reg *reg)
 {
@@ -126,8 +175,8 @@ static int get_core_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
 	 * off the index in the "array".
 	 */
 	__u32 __user *uaddr = (__u32 __user *)(unsigned long)reg->addr;
-	struct kvm_regs *regs = vcpu_gp_regs(vcpu);
-	int nr_regs = sizeof(*regs) / sizeof(__u32);
+	int nr_regs = sizeof(struct kvm_regs) / sizeof(__u32);
+	void *addr;
 	u32 off;
 
 	/* Our ID is an index into the kvm_regs struct. */
@@ -139,7 +188,11 @@ static int get_core_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
 	if (validate_core_offset(vcpu, reg))
 		return -EINVAL;
 
-	if (copy_to_user(uaddr, ((u32 *)regs) + off, KVM_REG_SIZE(reg->id)))
+	addr = core_reg_addr(vcpu, reg);
+	if (!addr)		/* Should not happen... */
+		return -EINVAL;
+
+	if (copy_to_user(uaddr, addr, KVM_REG_SIZE(reg->id)))
 		return -EFAULT;
 
 	return 0;
@@ -148,10 +201,9 @@ static int get_core_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
 static int set_core_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
 {
 	__u32 __user *uaddr = (__u32 __user *)(unsigned long)reg->addr;
-	struct kvm_regs *regs = vcpu_gp_regs(vcpu);
-	int nr_regs = sizeof(*regs) / sizeof(__u32);
+	int nr_regs = sizeof(struct kvm_regs) / sizeof(__u32);
 	__uint128_t tmp;
-	void *valp = &tmp;
+	void *valp = &tmp, *addr;
 	u64 off;
 	int err = 0;
 
@@ -164,6 +216,10 @@ static int set_core_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
 	if (validate_core_offset(vcpu, reg))
 		return -EINVAL;
 
+	addr = core_reg_addr(vcpu, reg);
+	if (!addr)		/* Should not happen... */
+		return -EINVAL;
+
 	if (KVM_REG_SIZE(reg->id) > sizeof(tmp))
 		return -EINVAL;
 
@@ -199,7 +255,7 @@ static int set_core_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
 		}
 	}
 
-	memcpy((u32 *)regs + off, valp, KVM_REG_SIZE(reg->id));
+	memcpy(addr, valp, KVM_REG_SIZE(reg->id));
 out:
 	return err;
 }
diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
index d22d0534dd600..f86f7dc8bebb8 100644
--- a/arch/arm64/kvm/hyp/entry.S
+++ b/arch/arm64/kvm/hyp/entry.S
@@ -16,8 +16,7 @@
 #include <asm/kvm_mmu.h>
 #include <asm/kvm_ptrauth.h>
 
-#define CPU_GP_REG_OFFSET(x)	(CPU_GP_REGS + x)
-#define CPU_XREG_OFFSET(x)	CPU_GP_REG_OFFSET(CPU_USER_PT_REGS + 8*x)
+#define CPU_XREG_OFFSET(x)	(CPU_USER_PT_REGS + 8*x)
 
 	.text
 	.pushsection	.hyp.text, "ax"
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index bc6c405b343b4..80dfbe4c4bede 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -418,11 +418,11 @@ static bool __hyp_text __hyp_handle_fpsimd(struct kvm_vcpu *vcpu)
 
 	if (sve_guest) {
 		sve_load_state(vcpu_sve_pffr(vcpu),
-			       &vcpu->arch.ctxt.gp_regs.fp_regs.fpsr,
+			       &vcpu->arch.ctxt.fp_regs.fpsr,
 			       sve_vq_from_vl(vcpu->arch.sve_max_vl) - 1);
 		write_sysreg_s(__vcpu_sys_reg(vcpu, ZCR_EL1), SYS_ZCR_EL12);
 	} else {
-		__fpsimd_restore_state(&vcpu->arch.ctxt.gp_regs.fp_regs);
+		__fpsimd_restore_state(&vcpu->arch.ctxt.fp_regs);
 	}
 
 	/* Skip restoring fpexc32 for AArch64 guests */
diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
index 970c94ce9d10a..d9dd19f39a537 100644
--- a/arch/arm64/kvm/hyp/sysreg-sr.c
+++ b/arch/arm64/kvm/hyp/sysreg-sr.c
@@ -31,7 +31,7 @@ static void __hyp_text __sysreg_save_common_state(struct kvm_cpu_context *ctxt)
 	 * The host arm64 Linux uses sp_el0 to point to 'current' and it must
 	 * therefore be saved/restored on every entry/exit to/from the guest.
 	 */
-	ctxt->gp_regs.regs.sp		= read_sysreg(sp_el0);
+	ctxt->regs.sp			= read_sysreg(sp_el0);
 }
 
 static void __hyp_text __sysreg_save_user_state(struct kvm_cpu_context *ctxt)
@@ -61,15 +61,15 @@ static void __hyp_text __sysreg_save_el1_state(struct kvm_cpu_context *ctxt)
 	ctxt_sys_reg(ctxt, PAR_EL1)	= read_sysreg(par_el1);
 	ctxt_sys_reg(ctxt, TPIDR_EL1)	= read_sysreg(tpidr_el1);
 
-	ctxt->gp_regs.sp_el1		= read_sysreg(sp_el1);
-	ctxt->gp_regs.elr_el1		= read_sysreg_el1(SYS_ELR);
-	ctxt->gp_regs.spsr[KVM_SPSR_EL1]= read_sysreg_el1(SYS_SPSR);
+	ctxt->sp_el1			= read_sysreg(sp_el1);
+	ctxt->elr_el1			= read_sysreg_el1(SYS_ELR);
+	ctxt->spsr[KVM_SPSR_EL1]	= read_sysreg_el1(SYS_SPSR);
 }
 
 static void __hyp_text __sysreg_save_el2_return_state(struct kvm_cpu_context *ctxt)
 {
-	ctxt->gp_regs.regs.pc		= read_sysreg_el2(SYS_ELR);
-	ctxt->gp_regs.regs.pstate	= read_sysreg_el2(SYS_SPSR);
+	ctxt->regs.pc			= read_sysreg_el2(SYS_ELR);
+	ctxt->regs.pstate		= read_sysreg_el2(SYS_SPSR);
 
 	if (cpus_have_final_cap(ARM64_HAS_RAS_EXTN))
 		ctxt_sys_reg(ctxt, DISR_EL1) = read_sysreg_s(SYS_VDISR_EL2);
@@ -104,7 +104,7 @@ static void __hyp_text __sysreg_restore_common_state(struct kvm_cpu_context *ctx
 	 * The host arm64 Linux uses sp_el0 to point to 'current' and it must
 	 * therefore be saved/restored on every entry/exit to/from the guest.
 	 */
-	write_sysreg(ctxt->gp_regs.regs.sp,	  sp_el0);
+	write_sysreg(ctxt->regs.sp,		  sp_el0);
 }
 
 static void __hyp_text __sysreg_restore_user_state(struct kvm_cpu_context *ctxt)
@@ -166,15 +166,15 @@ static void __hyp_text __sysreg_restore_el1_state(struct kvm_cpu_context *ctxt)
 		write_sysreg_el1(ctxt_sys_reg(ctxt, TCR_EL1),	SYS_TCR);
 	}
 
-	write_sysreg(ctxt->gp_regs.sp_el1,		sp_el1);
-	write_sysreg_el1(ctxt->gp_regs.elr_el1,		SYS_ELR);
-	write_sysreg_el1(ctxt->gp_regs.spsr[KVM_SPSR_EL1],SYS_SPSR);
+	write_sysreg(ctxt->sp_el1,			sp_el1);
+	write_sysreg_el1(ctxt->elr_el1,			SYS_ELR);
+	write_sysreg_el1(ctxt->spsr[KVM_SPSR_EL1],	SYS_SPSR);
 }
 
 static void __hyp_text
 __sysreg_restore_el2_return_state(struct kvm_cpu_context *ctxt)
 {
-	u64 pstate = ctxt->gp_regs.regs.pstate;
+	u64 pstate = ctxt->regs.pstate;
 	u64 mode = pstate & PSR_AA32_MODE_MASK;
 
 	/*
@@ -191,7 +191,7 @@ __sysreg_restore_el2_return_state(struct kvm_cpu_context *ctxt)
 	if (!(mode & PSR_MODE32_BIT) && mode >= PSR_MODE_EL2t)
 		pstate = PSR_MODE_EL2h | PSR_IL_BIT;
 
-	write_sysreg_el2(ctxt->gp_regs.regs.pc,		SYS_ELR);
+	write_sysreg_el2(ctxt->regs.pc,			SYS_ELR);
 	write_sysreg_el2(pstate,			SYS_SPSR);
 
 	if (cpus_have_final_cap(ARM64_HAS_RAS_EXTN))
@@ -226,7 +226,7 @@ void __hyp_text __sysreg32_save_state(struct kvm_vcpu *vcpu)
 	if (!vcpu_el1_is_32bit(vcpu))
 		return;
 
-	spsr = vcpu->arch.ctxt.gp_regs.spsr;
+	spsr = vcpu->arch.ctxt.spsr;
 
 	spsr[KVM_SPSR_ABT] = read_sysreg(spsr_abt);
 	spsr[KVM_SPSR_UND] = read_sysreg(spsr_und);
@@ -247,7 +247,7 @@ void __hyp_text __sysreg32_restore_state(struct kvm_vcpu *vcpu)
 	if (!vcpu_el1_is_32bit(vcpu))
 		return;
 
-	spsr = vcpu->arch.ctxt.gp_regs.spsr;
+	spsr = vcpu->arch.ctxt.spsr;
 
 	write_sysreg(spsr[KVM_SPSR_ABT], spsr_abt);
 	write_sysreg(spsr[KVM_SPSR_UND], spsr_und);
diff --git a/arch/arm64/kvm/regmap.c b/arch/arm64/kvm/regmap.c
index a900181e38678..b1596f314087b 100644
--- a/arch/arm64/kvm/regmap.c
+++ b/arch/arm64/kvm/regmap.c
@@ -100,7 +100,7 @@ static const unsigned long vcpu_reg_offsets[VCPU_NR_MODES][16] = {
  */
 unsigned long *vcpu_reg32(const struct kvm_vcpu *vcpu, u8 reg_num)
 {
-	unsigned long *reg_array = (unsigned long *)&vcpu->arch.ctxt.gp_regs.regs;
+	unsigned long *reg_array = (unsigned long *)&vcpu->arch.ctxt.regs;
 	unsigned long mode = *vcpu_cpsr(vcpu) & PSR_AA32_MODE_MASK;
 
 	switch (mode) {
@@ -148,7 +148,7 @@ unsigned long vcpu_read_spsr32(const struct kvm_vcpu *vcpu)
 	int spsr_idx = vcpu_spsr32_mode(vcpu);
 
 	if (!vcpu->arch.sysregs_loaded_on_cpu)
-		return vcpu_gp_regs(vcpu)->spsr[spsr_idx];
+		return vcpu->arch.ctxt.spsr[spsr_idx];
 
 	switch (spsr_idx) {
 	case KVM_SPSR_SVC:
@@ -171,7 +171,7 @@ void vcpu_write_spsr32(struct kvm_vcpu *vcpu, unsigned long v)
 	int spsr_idx = vcpu_spsr32_mode(vcpu);
 
 	if (!vcpu->arch.sysregs_loaded_on_cpu) {
-		vcpu_gp_regs(vcpu)->spsr[spsr_idx] = v;
+		vcpu->arch.ctxt.spsr[spsr_idx] = v;
 		return;
 	}
 
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index 895d7d9ad1866..4bf1242b616f5 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -298,7 +298,7 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
 
 	/* Reset core registers */
 	memset(vcpu_gp_regs(vcpu), 0, sizeof(*cpu_reset));
-	vcpu_gp_regs(vcpu)->regs.pstate = pstate;
+	vcpu_gp_regs(vcpu)->pstate = pstate;
 
 	/* Reset system registers */
 	kvm_reset_sys_regs(vcpu);
-- 
2.26.1

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH 20/26] KVM: arm64: Move ELR_EL1 to the system register array
  2020-04-22 12:00 [PATCH 00/26] KVM: arm64: Preliminary NV patches Marc Zyngier
                   ` (18 preceding siblings ...)
  2020-04-22 12:00 ` [PATCH 19/26] KVM: arm64: Make struct kvm_regs userspace-only Marc Zyngier
@ 2020-04-22 12:00 ` Marc Zyngier
  2020-05-26 16:29   ` James Morse
  2020-04-22 12:00 ` [PATCH 21/26] KVM: arm64: Move SP_EL1 " Marc Zyngier
                   ` (5 subsequent siblings)
  25 siblings, 1 reply; 78+ messages in thread
From: Marc Zyngier @ 2020-04-22 12:00 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Will Deacon, Andre Przywara, Dave Martin, George Cherian,
	Zengtao (B),
	Catalin Marinas

As ELR-EL1 is a VNCR-capable register with ARMv8.4-NV, let's move it to
the sys_regs array and repaint the accessors. While we're at it, let's
kill the now useless accessors used only on the fault injection path.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_emulate.h | 21 ---------------------
 arch/arm64/include/asm/kvm_host.h    |  3 ++-
 arch/arm64/kvm/guest.c               |  2 +-
 arch/arm64/kvm/hyp/sysreg-sr.c       |  4 ++--
 arch/arm64/kvm/inject_fault.c        |  4 ++--
 arch/arm64/kvm/sys_regs.c            |  2 ++
 6 files changed, 9 insertions(+), 27 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index c0d9363f328dc..1b8a396528851 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -133,27 +133,6 @@ static __always_inline unsigned long *vcpu_pc(const struct kvm_vcpu *vcpu)
 	return (unsigned long *)&vcpu_gp_regs(vcpu)->pc;
 }
 
-static inline unsigned long *__vcpu_elr_el1(const struct kvm_vcpu *vcpu)
-{
-	return (unsigned long *)&vcpu->arch.ctxt.elr_el1;
-}
-
-static inline unsigned long vcpu_read_elr_el1(const struct kvm_vcpu *vcpu)
-{
-	if (vcpu->arch.sysregs_loaded_on_cpu)
-		return read_sysreg_el1(SYS_ELR);
-	else
-		return *__vcpu_elr_el1(vcpu);
-}
-
-static inline void vcpu_write_elr_el1(const struct kvm_vcpu *vcpu, unsigned long v)
-{
-	if (vcpu->arch.sysregs_loaded_on_cpu)
-		write_sysreg_el1(v, SYS_ELR);
-	else
-		*__vcpu_elr_el1(vcpu) = v;
-}
-
 static __always_inline unsigned long *vcpu_cpsr(const struct kvm_vcpu *vcpu)
 {
 	return (unsigned long *)&vcpu_gp_regs(vcpu)->pstate;
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 95977b80265ce..46949fce3e813 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -184,6 +184,8 @@ enum vcpu_sysreg {
 	APGAKEYLO_EL1,
 	APGAKEYHI_EL1,
 
+	ELR_EL1,
+
 	/* 32bit specific registers. Keep them at the end of the range */
 	DACR32_EL2,	/* Domain Access Control Register */
 	IFSR32_EL2,	/* Instruction Fault Status Register */
@@ -238,7 +240,6 @@ struct kvm_cpu_context {
 	struct user_pt_regs regs;	/* sp = sp_el0 */
 
 	u64	sp_el1;
-	u64	elr_el1;
 
 	u64	spsr[KVM_NR_SPSR];
 
diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index 9fec9231b63e2..ab99fe91c16d0 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -126,7 +126,7 @@ static void *core_reg_addr(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
 		return &vcpu->arch.ctxt.sp_el1;
 
 	case KVM_REG_ARM_CORE_REG(elr_el1):
-		return &vcpu->arch.ctxt.elr_el1;
+		return __ctxt_sys_reg(&vcpu->arch.ctxt, ELR_EL1);
 
 	case KVM_REG_ARM_CORE_REG(spsr[0]) ...
 	     KVM_REG_ARM_CORE_REG(spsr[KVM_NR_SPSR - 1]):
diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
index d9dd19f39a537..1d862b81eb70a 100644
--- a/arch/arm64/kvm/hyp/sysreg-sr.c
+++ b/arch/arm64/kvm/hyp/sysreg-sr.c
@@ -62,7 +62,7 @@ static void __hyp_text __sysreg_save_el1_state(struct kvm_cpu_context *ctxt)
 	ctxt_sys_reg(ctxt, TPIDR_EL1)	= read_sysreg(tpidr_el1);
 
 	ctxt->sp_el1			= read_sysreg(sp_el1);
-	ctxt->elr_el1			= read_sysreg_el1(SYS_ELR);
+	ctxt_sys_reg(ctxt, ELR_EL1)	= read_sysreg_el1(SYS_ELR);
 	ctxt->spsr[KVM_SPSR_EL1]	= read_sysreg_el1(SYS_SPSR);
 }
 
@@ -167,7 +167,7 @@ static void __hyp_text __sysreg_restore_el1_state(struct kvm_cpu_context *ctxt)
 	}
 
 	write_sysreg(ctxt->sp_el1,			sp_el1);
-	write_sysreg_el1(ctxt->elr_el1,			SYS_ELR);
+	write_sysreg_el1(ctxt_sys_reg(ctxt, ELR_EL1),	SYS_ELR);
 	write_sysreg_el1(ctxt->spsr[KVM_SPSR_EL1],	SYS_SPSR);
 }
 
diff --git a/arch/arm64/kvm/inject_fault.c b/arch/arm64/kvm/inject_fault.c
index 6aafc2825c1cb..d3ebf8bca4b89 100644
--- a/arch/arm64/kvm/inject_fault.c
+++ b/arch/arm64/kvm/inject_fault.c
@@ -116,7 +116,7 @@ static void inject_abt64(struct kvm_vcpu *vcpu, bool is_iabt, unsigned long addr
 	bool is_aarch32 = vcpu_mode_is_32bit(vcpu);
 	u32 esr = 0;
 
-	vcpu_write_elr_el1(vcpu, *vcpu_pc(vcpu));
+	vcpu_write_sys_reg(vcpu, *vcpu_pc(vcpu), ELR_EL1);
 	*vcpu_pc(vcpu) = get_except_vector(vcpu, except_type_sync);
 
 	*vcpu_cpsr(vcpu) = get_except64_pstate(vcpu);
@@ -151,7 +151,7 @@ static void inject_undef64(struct kvm_vcpu *vcpu)
 	unsigned long cpsr = *vcpu_cpsr(vcpu);
 	u32 esr = (ESR_ELx_EC_UNKNOWN << ESR_ELx_EC_SHIFT);
 
-	vcpu_write_elr_el1(vcpu, *vcpu_pc(vcpu));
+	vcpu_write_sys_reg(vcpu, *vcpu_pc(vcpu), ELR_EL1);
 	*vcpu_pc(vcpu) = get_except_vector(vcpu, except_type_sync);
 
 	*vcpu_cpsr(vcpu) = get_except64_pstate(vcpu);
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 5bda4af36a0e7..7c2fffb20c217 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -95,6 +95,7 @@ static bool __vcpu_read_sys_reg_from_cpu(int reg, u64 *val)
 	case TPIDR_EL1:		*val = read_sysreg_s(SYS_TPIDR_EL1);	break;
 	case AMAIR_EL1:		*val = read_sysreg_s(SYS_AMAIR_EL12);	break;
 	case CNTKCTL_EL1:	*val = read_sysreg_s(SYS_CNTKCTL_EL12);	break;
+	case ELR_EL1:		*val = read_sysreg_s(SYS_ELR_EL12);	break;
 	case PAR_EL1:		*val = read_sysreg_s(SYS_PAR_EL1);	break;
 	case DACR32_EL2:	*val = read_sysreg_s(SYS_DACR32_EL2);	break;
 	case IFSR32_EL2:	*val = read_sysreg_s(SYS_IFSR32_EL2);	break;
@@ -135,6 +136,7 @@ static bool __vcpu_write_sys_reg_to_cpu(u64 val, int reg)
 	case TPIDR_EL1:		write_sysreg_s(val, SYS_TPIDR_EL1);	break;
 	case AMAIR_EL1:		write_sysreg_s(val, SYS_AMAIR_EL12);	break;
 	case CNTKCTL_EL1:	write_sysreg_s(val, SYS_CNTKCTL_EL12);	break;
+	case ELR_EL1:		write_sysreg_s(val, SYS_ELR_EL12);	break;
 	case PAR_EL1:		write_sysreg_s(val, SYS_PAR_EL1);	break;
 	case DACR32_EL2:	write_sysreg_s(val, SYS_DACR32_EL2);	break;
 	case IFSR32_EL2:	write_sysreg_s(val, SYS_IFSR32_EL2);	break;
-- 
2.26.1

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH 21/26] KVM: arm64: Move SP_EL1 to the system register array
  2020-04-22 12:00 [PATCH 00/26] KVM: arm64: Preliminary NV patches Marc Zyngier
                   ` (19 preceding siblings ...)
  2020-04-22 12:00 ` [PATCH 20/26] KVM: arm64: Move ELR_EL1 to the system register array Marc Zyngier
@ 2020-04-22 12:00 ` Marc Zyngier
  2020-05-26 16:29   ` James Morse
  2020-04-22 12:00 ` [PATCH 22/26] KVM: arm64: Disintegrate SPSR array Marc Zyngier
                   ` (4 subsequent siblings)
  25 siblings, 1 reply; 78+ messages in thread
From: Marc Zyngier @ 2020-04-22 12:00 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Will Deacon, Andre Przywara, Dave Martin, George Cherian,
	Zengtao (B),
	Catalin Marinas

SP_EL1 being a VNCR-capable register with ARMv8.4-NV, move it to the
system register array and update the accessors.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_host.h | 3 +--
 arch/arm64/kvm/guest.c            | 2 +-
 arch/arm64/kvm/hyp/sysreg-sr.c    | 4 ++--
 3 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 46949fce3e813..4eca55c21c158 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -185,6 +185,7 @@ enum vcpu_sysreg {
 	APGAKEYHI_EL1,
 
 	ELR_EL1,
+	SP_EL1,
 
 	/* 32bit specific registers. Keep them at the end of the range */
 	DACR32_EL2,	/* Domain Access Control Register */
@@ -239,8 +240,6 @@ enum vcpu_sysreg {
 struct kvm_cpu_context {
 	struct user_pt_regs regs;	/* sp = sp_el0 */
 
-	u64	sp_el1;
-
 	u64	spsr[KVM_NR_SPSR];
 
 	struct user_fpsimd_state fp_regs;
diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index ab99fe91c16d0..0f3315bc23ad9 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -123,7 +123,7 @@ static void *core_reg_addr(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
 		return &vcpu->arch.ctxt.regs.pstate;
 
 	case KVM_REG_ARM_CORE_REG(sp_el1):
-		return &vcpu->arch.ctxt.sp_el1;
+		return __ctxt_sys_reg(&vcpu->arch.ctxt, SP_EL1);
 
 	case KVM_REG_ARM_CORE_REG(elr_el1):
 		return __ctxt_sys_reg(&vcpu->arch.ctxt, ELR_EL1);
diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
index 1d862b81eb70a..7ff3b5f3c39d8 100644
--- a/arch/arm64/kvm/hyp/sysreg-sr.c
+++ b/arch/arm64/kvm/hyp/sysreg-sr.c
@@ -61,7 +61,7 @@ static void __hyp_text __sysreg_save_el1_state(struct kvm_cpu_context *ctxt)
 	ctxt_sys_reg(ctxt, PAR_EL1)	= read_sysreg(par_el1);
 	ctxt_sys_reg(ctxt, TPIDR_EL1)	= read_sysreg(tpidr_el1);
 
-	ctxt->sp_el1			= read_sysreg(sp_el1);
+	ctxt_sys_reg(ctxt, SP_EL1)	= read_sysreg(sp_el1);
 	ctxt_sys_reg(ctxt, ELR_EL1)	= read_sysreg_el1(SYS_ELR);
 	ctxt->spsr[KVM_SPSR_EL1]	= read_sysreg_el1(SYS_SPSR);
 }
@@ -166,7 +166,7 @@ static void __hyp_text __sysreg_restore_el1_state(struct kvm_cpu_context *ctxt)
 		write_sysreg_el1(ctxt_sys_reg(ctxt, TCR_EL1),	SYS_TCR);
 	}
 
-	write_sysreg(ctxt->sp_el1,			sp_el1);
+	write_sysreg(ctxt_sys_reg(ctxt, SP_EL1),	sp_el1);
 	write_sysreg_el1(ctxt_sys_reg(ctxt, ELR_EL1),	SYS_ELR);
 	write_sysreg_el1(ctxt->spsr[KVM_SPSR_EL1],	SYS_SPSR);
 }
-- 
2.26.1

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH 22/26] KVM: arm64: Disintegrate SPSR array
  2020-04-22 12:00 [PATCH 00/26] KVM: arm64: Preliminary NV patches Marc Zyngier
                   ` (20 preceding siblings ...)
  2020-04-22 12:00 ` [PATCH 21/26] KVM: arm64: Move SP_EL1 " Marc Zyngier
@ 2020-04-22 12:00 ` Marc Zyngier
  2020-05-26 16:30   ` James Morse
  2020-04-22 12:00 ` [PATCH 23/26] KVM: arm64: Move SPSR_EL1 to the system register array Marc Zyngier
                   ` (3 subsequent siblings)
  25 siblings, 1 reply; 78+ messages in thread
From: Marc Zyngier @ 2020-04-22 12:00 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Will Deacon, Andre Przywara, Dave Martin, George Cherian,
	Zengtao (B),
	Catalin Marinas

As we're about to move SPSR_EL1 into the VNCR page, we need to
disassociate it from the rest of the 32bit cruft. Let's break
the array into individual fields.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_emulate.h |  4 ++--
 arch/arm64/include/asm/kvm_host.h    |  6 ++++-
 arch/arm64/kvm/guest.c               | 19 +++++++++++----
 arch/arm64/kvm/hyp/sysreg-sr.c       | 28 ++++++++--------------
 arch/arm64/kvm/regmap.c              | 35 +++++++++++++++++++++++++---
 5 files changed, 63 insertions(+), 29 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 1b8a396528851..6eb48c1f324d5 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -182,7 +182,7 @@ static inline unsigned long vcpu_read_spsr(const struct kvm_vcpu *vcpu)
 	if (vcpu->arch.sysregs_loaded_on_cpu)
 		return read_sysreg_el1(SYS_SPSR);
 	else
-		return vcpu->arch.ctxt.spsr[KVM_SPSR_EL1];
+		return vcpu->arch.ctxt.spsr_el1;
 }
 
 static inline void vcpu_write_spsr(struct kvm_vcpu *vcpu, unsigned long v)
@@ -195,7 +195,7 @@ static inline void vcpu_write_spsr(struct kvm_vcpu *vcpu, unsigned long v)
 	if (vcpu->arch.sysregs_loaded_on_cpu)
 		write_sysreg_el1(v, SYS_SPSR);
 	else
-		vcpu->arch.ctxt.spsr[KVM_SPSR_EL1] = v;
+		vcpu->arch.ctxt.spsr_el1 = v;
 }
 
 /*
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 4eca55c21c158..a918f33074b15 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -240,7 +240,11 @@ enum vcpu_sysreg {
 struct kvm_cpu_context {
 	struct user_pt_regs regs;	/* sp = sp_el0 */
 
-	u64	spsr[KVM_NR_SPSR];
+	u64	spsr_el1;		/* aka spsr_svc */
+	u64	spsr_abt;
+	u64	spsr_und;
+	u64	spsr_irq;
+	u64	spsr_fiq;
 
 	struct user_fpsimd_state fp_regs;
 
diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index 0f3315bc23ad9..6e655ea6c98d0 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -128,11 +128,20 @@ static void *core_reg_addr(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
 	case KVM_REG_ARM_CORE_REG(elr_el1):
 		return __ctxt_sys_reg(&vcpu->arch.ctxt, ELR_EL1);
 
-	case KVM_REG_ARM_CORE_REG(spsr[0]) ...
-	     KVM_REG_ARM_CORE_REG(spsr[KVM_NR_SPSR - 1]):
-		off -= KVM_REG_ARM_CORE_REG(spsr[0]);
-		off /= 2;
-		return &vcpu->arch.ctxt.spsr[off];
+	case KVM_REG_ARM_CORE_REG(spsr[KVM_SPSR_EL1]):
+		return &vcpu->arch.ctxt.spsr_el1;
+
+	case KVM_REG_ARM_CORE_REG(spsr[KVM_SPSR_ABT]):
+		return &vcpu->arch.ctxt.spsr_abt;
+
+	case KVM_REG_ARM_CORE_REG(spsr[KVM_SPSR_UND]):
+		return &vcpu->arch.ctxt.spsr_und;
+
+	case KVM_REG_ARM_CORE_REG(spsr[KVM_SPSR_IRQ]):
+		return &vcpu->arch.ctxt.spsr_irq;
+
+	case KVM_REG_ARM_CORE_REG(spsr[KVM_SPSR_FIQ]):
+		return &vcpu->arch.ctxt.spsr_fiq;
 
 	case KVM_REG_ARM_CORE_REG(fp_regs.vregs[0]) ...
 	     KVM_REG_ARM_CORE_REG(fp_regs.vregs[31]):
diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
index 7ff3b5f3c39d8..c932ace46c9df 100644
--- a/arch/arm64/kvm/hyp/sysreg-sr.c
+++ b/arch/arm64/kvm/hyp/sysreg-sr.c
@@ -63,7 +63,7 @@ static void __hyp_text __sysreg_save_el1_state(struct kvm_cpu_context *ctxt)
 
 	ctxt_sys_reg(ctxt, SP_EL1)	= read_sysreg(sp_el1);
 	ctxt_sys_reg(ctxt, ELR_EL1)	= read_sysreg_el1(SYS_ELR);
-	ctxt->spsr[KVM_SPSR_EL1]	= read_sysreg_el1(SYS_SPSR);
+	ctxt->spsr_el1			= read_sysreg_el1(SYS_SPSR);
 }
 
 static void __hyp_text __sysreg_save_el2_return_state(struct kvm_cpu_context *ctxt)
@@ -168,7 +168,7 @@ static void __hyp_text __sysreg_restore_el1_state(struct kvm_cpu_context *ctxt)
 
 	write_sysreg(ctxt_sys_reg(ctxt, SP_EL1),	sp_el1);
 	write_sysreg_el1(ctxt_sys_reg(ctxt, ELR_EL1),	SYS_ELR);
-	write_sysreg_el1(ctxt->spsr[KVM_SPSR_EL1],	SYS_SPSR);
+	write_sysreg_el1(ctxt->spsr_el1,		SYS_SPSR);
 }
 
 static void __hyp_text
@@ -221,17 +221,13 @@ NOKPROBE_SYMBOL(sysreg_restore_guest_state_vhe);
 
 void __hyp_text __sysreg32_save_state(struct kvm_vcpu *vcpu)
 {
-	u64 *spsr;
-
 	if (!vcpu_el1_is_32bit(vcpu))
 		return;
 
-	spsr = vcpu->arch.ctxt.spsr;
-
-	spsr[KVM_SPSR_ABT] = read_sysreg(spsr_abt);
-	spsr[KVM_SPSR_UND] = read_sysreg(spsr_und);
-	spsr[KVM_SPSR_IRQ] = read_sysreg(spsr_irq);
-	spsr[KVM_SPSR_FIQ] = read_sysreg(spsr_fiq);
+	vcpu->arch.ctxt.spsr_abt = read_sysreg(spsr_abt);
+	vcpu->arch.ctxt.spsr_und = read_sysreg(spsr_und);
+	vcpu->arch.ctxt.spsr_irq = read_sysreg(spsr_irq);
+	vcpu->arch.ctxt.spsr_fiq = read_sysreg(spsr_fiq);
 
 	__vcpu_sys_reg(vcpu, DACR32_EL2) = read_sysreg(dacr32_el2);
 	__vcpu_sys_reg(vcpu, IFSR32_EL2) = read_sysreg(ifsr32_el2);
@@ -242,17 +238,13 @@ void __hyp_text __sysreg32_save_state(struct kvm_vcpu *vcpu)
 
 void __hyp_text __sysreg32_restore_state(struct kvm_vcpu *vcpu)
 {
-	u64 *spsr;
-
 	if (!vcpu_el1_is_32bit(vcpu))
 		return;
 
-	spsr = vcpu->arch.ctxt.spsr;
-
-	write_sysreg(spsr[KVM_SPSR_ABT], spsr_abt);
-	write_sysreg(spsr[KVM_SPSR_UND], spsr_und);
-	write_sysreg(spsr[KVM_SPSR_IRQ], spsr_irq);
-	write_sysreg(spsr[KVM_SPSR_FIQ], spsr_fiq);
+	write_sysreg(vcpu->arch.ctxt.spsr_abt, spsr_abt);
+	write_sysreg(vcpu->arch.ctxt.spsr_und, spsr_und);
+	write_sysreg(vcpu->arch.ctxt.spsr_irq, spsr_irq);
+	write_sysreg(vcpu->arch.ctxt.spsr_fiq, spsr_fiq);
 
 	write_sysreg(__vcpu_sys_reg(vcpu, DACR32_EL2), dacr32_el2);
 	write_sysreg(__vcpu_sys_reg(vcpu, IFSR32_EL2), ifsr32_el2);
diff --git a/arch/arm64/kvm/regmap.c b/arch/arm64/kvm/regmap.c
index b1596f314087b..97c110810527e 100644
--- a/arch/arm64/kvm/regmap.c
+++ b/arch/arm64/kvm/regmap.c
@@ -147,8 +147,20 @@ unsigned long vcpu_read_spsr32(const struct kvm_vcpu *vcpu)
 {
 	int spsr_idx = vcpu_spsr32_mode(vcpu);
 
-	if (!vcpu->arch.sysregs_loaded_on_cpu)
-		return vcpu->arch.ctxt.spsr[spsr_idx];
+	if (!vcpu->arch.sysregs_loaded_on_cpu) {
+		switch (spsr_idx) {
+		case KVM_SPSR_SVC:
+			return vcpu->arch.ctxt.spsr_el1;
+		case KVM_SPSR_ABT:
+			return vcpu->arch.ctxt.spsr_abt;
+		case KVM_SPSR_UND:
+			return vcpu->arch.ctxt.spsr_und;
+		case KVM_SPSR_IRQ:
+			return vcpu->arch.ctxt.spsr_irq;
+		case KVM_SPSR_FIQ:
+			return vcpu->arch.ctxt.spsr_fiq;
+		}
+	}
 
 	switch (spsr_idx) {
 	case KVM_SPSR_SVC:
@@ -171,7 +183,24 @@ void vcpu_write_spsr32(struct kvm_vcpu *vcpu, unsigned long v)
 	int spsr_idx = vcpu_spsr32_mode(vcpu);
 
 	if (!vcpu->arch.sysregs_loaded_on_cpu) {
-		vcpu->arch.ctxt.spsr[spsr_idx] = v;
+		switch (spsr_idx) {
+		case KVM_SPSR_SVC:
+			vcpu->arch.ctxt.spsr_el1 = v;
+			break;
+		case KVM_SPSR_ABT:
+			vcpu->arch.ctxt.spsr_abt = v;
+			break;
+		case KVM_SPSR_UND:
+			vcpu->arch.ctxt.spsr_und = v;
+			break;
+		case KVM_SPSR_IRQ:
+			vcpu->arch.ctxt.spsr_irq = v;
+			break;
+		case KVM_SPSR_FIQ:
+			vcpu->arch.ctxt.spsr_fiq = v;
+			break;
+		}
+
 		return;
 	}
 
-- 
2.26.1

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH 23/26] KVM: arm64: Move SPSR_EL1 to the system register array
  2020-04-22 12:00 [PATCH 00/26] KVM: arm64: Preliminary NV patches Marc Zyngier
                   ` (21 preceding siblings ...)
  2020-04-22 12:00 ` [PATCH 22/26] KVM: arm64: Disintegrate SPSR array Marc Zyngier
@ 2020-04-22 12:00 ` Marc Zyngier
  2020-05-26 16:30   ` James Morse
  2020-04-22 12:00 ` [PATCH 24/26] KVM: arm64: timers: Rename kvm_timer_sync_hwstate to kvm_timer_sync_user Marc Zyngier
                   ` (2 subsequent siblings)
  25 siblings, 1 reply; 78+ messages in thread
From: Marc Zyngier @ 2020-04-22 12:00 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Will Deacon, Andre Przywara, Dave Martin, George Cherian,
	Zengtao (B),
	Catalin Marinas

SPSR_EL1 being a VNCR-capable register with ARMv8.4-NV, move it to
the sysregs array and update the accessors.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_emulate.h | 4 ++--
 arch/arm64/include/asm/kvm_host.h    | 2 +-
 arch/arm64/kvm/guest.c               | 2 +-
 arch/arm64/kvm/hyp/sysreg-sr.c       | 4 ++--
 arch/arm64/kvm/regmap.c              | 4 ++--
 5 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 6eb48c1f324d5..77f771b08cf35 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -182,7 +182,7 @@ static inline unsigned long vcpu_read_spsr(const struct kvm_vcpu *vcpu)
 	if (vcpu->arch.sysregs_loaded_on_cpu)
 		return read_sysreg_el1(SYS_SPSR);
 	else
-		return vcpu->arch.ctxt.spsr_el1;
+		return __vcpu_sys_reg(vcpu, SPSR_EL1);
 }
 
 static inline void vcpu_write_spsr(struct kvm_vcpu *vcpu, unsigned long v)
@@ -195,7 +195,7 @@ static inline void vcpu_write_spsr(struct kvm_vcpu *vcpu, unsigned long v)
 	if (vcpu->arch.sysregs_loaded_on_cpu)
 		write_sysreg_el1(v, SYS_SPSR);
 	else
-		vcpu->arch.ctxt.spsr_el1 = v;
+		__vcpu_sys_reg(vcpu, SPSR_EL1) = v;
 }
 
 /*
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index a918f33074b15..e8305761f0077 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -186,6 +186,7 @@ enum vcpu_sysreg {
 
 	ELR_EL1,
 	SP_EL1,
+	SPSR_EL1,
 
 	/* 32bit specific registers. Keep them at the end of the range */
 	DACR32_EL2,	/* Domain Access Control Register */
@@ -240,7 +241,6 @@ enum vcpu_sysreg {
 struct kvm_cpu_context {
 	struct user_pt_regs regs;	/* sp = sp_el0 */
 
-	u64	spsr_el1;		/* aka spsr_svc */
 	u64	spsr_abt;
 	u64	spsr_und;
 	u64	spsr_irq;
diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index 6e655ea6c98d0..91215170e6883 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -129,7 +129,7 @@ static void *core_reg_addr(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
 		return __ctxt_sys_reg(&vcpu->arch.ctxt, ELR_EL1);
 
 	case KVM_REG_ARM_CORE_REG(spsr[KVM_SPSR_EL1]):
-		return &vcpu->arch.ctxt.spsr_el1;
+		return __ctxt_sys_reg(&vcpu->arch.ctxt, SPSR_EL1);
 
 	case KVM_REG_ARM_CORE_REG(spsr[KVM_SPSR_ABT]):
 		return &vcpu->arch.ctxt.spsr_abt;
diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
index c932ace46c9df..3457252e475de 100644
--- a/arch/arm64/kvm/hyp/sysreg-sr.c
+++ b/arch/arm64/kvm/hyp/sysreg-sr.c
@@ -63,7 +63,7 @@ static void __hyp_text __sysreg_save_el1_state(struct kvm_cpu_context *ctxt)
 
 	ctxt_sys_reg(ctxt, SP_EL1)	= read_sysreg(sp_el1);
 	ctxt_sys_reg(ctxt, ELR_EL1)	= read_sysreg_el1(SYS_ELR);
-	ctxt->spsr_el1			= read_sysreg_el1(SYS_SPSR);
+	ctxt_sys_reg(ctxt, SPSR_EL1)	= read_sysreg_el1(SYS_SPSR);
 }
 
 static void __hyp_text __sysreg_save_el2_return_state(struct kvm_cpu_context *ctxt)
@@ -168,7 +168,7 @@ static void __hyp_text __sysreg_restore_el1_state(struct kvm_cpu_context *ctxt)
 
 	write_sysreg(ctxt_sys_reg(ctxt, SP_EL1),	sp_el1);
 	write_sysreg_el1(ctxt_sys_reg(ctxt, ELR_EL1),	SYS_ELR);
-	write_sysreg_el1(ctxt->spsr_el1,		SYS_SPSR);
+	write_sysreg_el1(ctxt_sys_reg(ctxt, SPSR_EL1),	SYS_SPSR);
 }
 
 static void __hyp_text
diff --git a/arch/arm64/kvm/regmap.c b/arch/arm64/kvm/regmap.c
index 97c110810527e..accc1d5fba615 100644
--- a/arch/arm64/kvm/regmap.c
+++ b/arch/arm64/kvm/regmap.c
@@ -150,7 +150,7 @@ unsigned long vcpu_read_spsr32(const struct kvm_vcpu *vcpu)
 	if (!vcpu->arch.sysregs_loaded_on_cpu) {
 		switch (spsr_idx) {
 		case KVM_SPSR_SVC:
-			return vcpu->arch.ctxt.spsr_el1;
+			return __vcpu_sys_reg(vcpu, SPSR_EL1);
 		case KVM_SPSR_ABT:
 			return vcpu->arch.ctxt.spsr_abt;
 		case KVM_SPSR_UND:
@@ -185,7 +185,7 @@ void vcpu_write_spsr32(struct kvm_vcpu *vcpu, unsigned long v)
 	if (!vcpu->arch.sysregs_loaded_on_cpu) {
 		switch (spsr_idx) {
 		case KVM_SPSR_SVC:
-			vcpu->arch.ctxt.spsr_el1 = v;
+			__vcpu_sys_reg(vcpu, SPSR_EL1) = v;
 			break;
 		case KVM_SPSR_ABT:
 			vcpu->arch.ctxt.spsr_abt = v;
-- 
2.26.1

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH 24/26] KVM: arm64: timers: Rename kvm_timer_sync_hwstate to kvm_timer_sync_user
  2020-04-22 12:00 [PATCH 00/26] KVM: arm64: Preliminary NV patches Marc Zyngier
                   ` (22 preceding siblings ...)
  2020-04-22 12:00 ` [PATCH 23/26] KVM: arm64: Move SPSR_EL1 to the system register array Marc Zyngier
@ 2020-04-22 12:00 ` Marc Zyngier
  2020-04-22 12:00 ` [PATCH 25/26] KVM: arm64: timers: Move timer registers to the sys_regs file Marc Zyngier
  2020-04-22 12:00 ` [PATCH 26/26] KVM: arm64: Parametrize exception entry with a target EL Marc Zyngier
  25 siblings, 0 replies; 78+ messages in thread
From: Marc Zyngier @ 2020-04-22 12:00 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Will Deacon, Andre Przywara, Dave Martin, George Cherian,
	Zengtao (B),
	Catalin Marinas

kvm_timer_sync_hwstate() has nothing to do with the timer HW state,
but more to do with the state of a userspace interrupt controller.
Change the suffix from _hwstate to_user, in keeping with the rest
of the code.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 include/kvm/arm_arch_timer.h | 2 +-
 virt/kvm/arm/arch_timer.c    | 2 +-
 virt/kvm/arm/arm.c           | 4 ++--
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
index d120e6c323e7c..a821dd1df0cfa 100644
--- a/include/kvm/arm_arch_timer.h
+++ b/include/kvm/arm_arch_timer.h
@@ -71,7 +71,7 @@ int kvm_timer_hyp_init(bool);
 int kvm_timer_enable(struct kvm_vcpu *vcpu);
 int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu);
 void kvm_timer_vcpu_init(struct kvm_vcpu *vcpu);
-void kvm_timer_sync_hwstate(struct kvm_vcpu *vcpu);
+void kvm_timer_sync_user(struct kvm_vcpu *vcpu);
 bool kvm_timer_should_notify_user(struct kvm_vcpu *vcpu);
 void kvm_timer_update_run(struct kvm_vcpu *vcpu);
 void kvm_timer_vcpu_terminate(struct kvm_vcpu *vcpu);
diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
index 93bd59b46848e..1961de8828ee9 100644
--- a/virt/kvm/arm/arch_timer.c
+++ b/virt/kvm/arm/arch_timer.c
@@ -624,7 +624,7 @@ static void unmask_vtimer_irq_user(struct kvm_vcpu *vcpu)
 	}
 }
 
-void kvm_timer_sync_hwstate(struct kvm_vcpu *vcpu)
+void kvm_timer_sync_user(struct kvm_vcpu *vcpu)
 {
 	struct arch_timer_cpu *timer = vcpu_timer(vcpu);
 
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index 03f01fcfa2bd5..a7ca341e7fd76 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -720,7 +720,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 			isb(); /* Ensure work in x_flush_hwstate is committed */
 			kvm_pmu_sync_hwstate(vcpu);
 			if (static_branch_unlikely(&userspace_irqchip_in_use))
-				kvm_timer_sync_hwstate(vcpu);
+				kvm_timer_sync_user(vcpu);
 			kvm_vgic_sync_hwstate(vcpu);
 			local_irq_enable();
 			preempt_enable();
@@ -769,7 +769,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		 * timer virtual interrupt state.
 		 */
 		if (static_branch_unlikely(&userspace_irqchip_in_use))
-			kvm_timer_sync_hwstate(vcpu);
+			kvm_timer_sync_user(vcpu);
 
 		kvm_arch_vcpu_ctxsync_fp(vcpu);
 
-- 
2.26.1

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH 25/26] KVM: arm64: timers: Move timer registers to the sys_regs file
  2020-04-22 12:00 [PATCH 00/26] KVM: arm64: Preliminary NV patches Marc Zyngier
                   ` (23 preceding siblings ...)
  2020-04-22 12:00 ` [PATCH 24/26] KVM: arm64: timers: Rename kvm_timer_sync_hwstate to kvm_timer_sync_user Marc Zyngier
@ 2020-04-22 12:00 ` Marc Zyngier
  2020-04-22 12:00 ` [PATCH 26/26] KVM: arm64: Parametrize exception entry with a target EL Marc Zyngier
  25 siblings, 0 replies; 78+ messages in thread
From: Marc Zyngier @ 2020-04-22 12:00 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Will Deacon, Andre Przywara, Dave Martin, George Cherian,
	Zengtao (B),
	Catalin Marinas

Move the timer gsisters to the sysreg file. This will further help when
they are directly changed by a nesting hypervisor in the VNCR page.

This requires moving the initialisation of the timer struct so that some
of the helpers (such as arch_timer_ctx_index) can work correctly at an
early stage.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_host.h |   6 ++
 include/kvm/arm_arch_timer.h      |  11 +--
 virt/kvm/arm/arch_timer.c         | 155 +++++++++++++++++++++++-------
 virt/kvm/arm/trace.h              |   8 +-
 4 files changed, 136 insertions(+), 44 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index e8305761f0077..600dca55fc567 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -188,6 +188,12 @@ enum vcpu_sysreg {
 	SP_EL1,
 	SPSR_EL1,
 
+	CNTVOFF_EL2,
+	CNTV_CVAL_EL0,
+	CNTV_CTL_EL0,
+	CNTP_CVAL_EL0,
+	CNTP_CTL_EL0,
+
 	/* 32bit specific registers. Keep them at the end of the range */
 	DACR32_EL2,	/* Domain Access Control Register */
 	IFSR32_EL2,	/* Instruction Fault Status Register */
diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
index a821dd1df0cfa..51c19381108cf 100644
--- a/include/kvm/arm_arch_timer.h
+++ b/include/kvm/arm_arch_timer.h
@@ -26,16 +26,9 @@ enum kvm_arch_timer_regs {
 struct arch_timer_context {
 	struct kvm_vcpu			*vcpu;
 
-	/* Registers: control register, timer value */
-	u32				cnt_ctl;
-	u64				cnt_cval;
-
 	/* Timer IRQ */
 	struct kvm_irq_level		irq;
 
-	/* Virtual offset */
-	u64				cntvoff;
-
 	/* Emulated Timer (may be unused) */
 	struct hrtimer			hrtimer;
 
@@ -109,4 +102,8 @@ void kvm_arm_timer_write_sysreg(struct kvm_vcpu *vcpu,
 				enum kvm_arch_timer_regs treg,
 				u64 val);
 
+/* Needed for tracing */
+u32 timer_get_ctl(struct arch_timer_context *ctxt);
+u64 timer_get_cval(struct arch_timer_context *ctxt);
+
 #endif
diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
index 1961de8828ee9..78b8eca02312e 100644
--- a/virt/kvm/arm/arch_timer.c
+++ b/virt/kvm/arm/arch_timer.c
@@ -51,6 +51,93 @@ static u64 kvm_arm_timer_read(struct kvm_vcpu *vcpu,
 			      struct arch_timer_context *timer,
 			      enum kvm_arch_timer_regs treg);
 
+u32 timer_get_ctl(struct arch_timer_context *ctxt)
+{
+	struct kvm_vcpu *vcpu = ctxt->vcpu;
+
+	switch(arch_timer_ctx_index(ctxt)) {
+	case TIMER_VTIMER:
+		return __vcpu_sys_reg(vcpu, CNTV_CTL_EL0);
+	case TIMER_PTIMER:
+		return __vcpu_sys_reg(vcpu, CNTP_CTL_EL0);
+	default:
+		WARN_ON(1);
+		return 0;
+	}
+}
+
+u64 timer_get_cval(struct arch_timer_context *ctxt)
+{
+	struct kvm_vcpu *vcpu = ctxt->vcpu;
+
+	switch(arch_timer_ctx_index(ctxt)) {
+	case TIMER_VTIMER:
+		return __vcpu_sys_reg(vcpu, CNTV_CVAL_EL0);
+	case TIMER_PTIMER:
+		return __vcpu_sys_reg(vcpu, CNTP_CVAL_EL0);
+	default:
+		WARN_ON(1);
+		return 0;
+	}
+}
+
+static u64 timer_get_offset(struct arch_timer_context *ctxt)
+{
+	struct kvm_vcpu *vcpu = ctxt->vcpu;
+
+	switch(arch_timer_ctx_index(ctxt)) {
+	case TIMER_VTIMER:
+		return __vcpu_sys_reg(vcpu, CNTVOFF_EL2);
+	default:
+		return 0;
+	}
+}
+
+static void timer_set_ctl(struct arch_timer_context *ctxt, u32 ctl)
+{
+	struct kvm_vcpu *vcpu = ctxt->vcpu;
+
+	switch(arch_timer_ctx_index(ctxt)) {
+	case TIMER_VTIMER:
+		__vcpu_sys_reg(vcpu, CNTV_CTL_EL0) = ctl;
+		break;
+	case TIMER_PTIMER:
+		__vcpu_sys_reg(vcpu, CNTP_CTL_EL0) = ctl;
+		break;
+	default:
+		WARN_ON(1);
+	}
+}
+
+static void timer_set_cval(struct arch_timer_context *ctxt, u64 cval)
+{
+	struct kvm_vcpu *vcpu = ctxt->vcpu;
+
+	switch(arch_timer_ctx_index(ctxt)) {
+	case TIMER_VTIMER:
+		__vcpu_sys_reg(vcpu, CNTV_CVAL_EL0) = cval;
+		break;
+	case TIMER_PTIMER:
+		__vcpu_sys_reg(vcpu, CNTP_CVAL_EL0) = cval;
+		break;
+	default:
+		WARN_ON(1);
+	}
+}
+
+static void timer_set_offset(struct arch_timer_context *ctxt, u64 offset)
+{
+	struct kvm_vcpu *vcpu = ctxt->vcpu;
+
+	switch(arch_timer_ctx_index(ctxt)) {
+	case TIMER_VTIMER:
+		__vcpu_sys_reg(vcpu, CNTVOFF_EL2) = offset;
+		break;
+	default:
+		WARN(offset, "timer %ld\n", arch_timer_ctx_index(ctxt));
+	}
+}
+
 u64 kvm_phys_timer_read(void)
 {
 	return timecounter->cc->read(timecounter->cc);
@@ -124,8 +211,8 @@ static u64 kvm_timer_compute_delta(struct arch_timer_context *timer_ctx)
 {
 	u64 cval, now;
 
-	cval = timer_ctx->cnt_cval;
-	now = kvm_phys_timer_read() - timer_ctx->cntvoff;
+	cval = timer_get_cval(timer_ctx);
+	now = kvm_phys_timer_read() - timer_get_offset(timer_ctx);
 
 	if (now < cval) {
 		u64 ns;
@@ -144,8 +231,8 @@ static bool kvm_timer_irq_can_fire(struct arch_timer_context *timer_ctx)
 {
 	WARN_ON(timer_ctx && timer_ctx->loaded);
 	return timer_ctx &&
-	       !(timer_ctx->cnt_ctl & ARCH_TIMER_CTRL_IT_MASK) &&
-		(timer_ctx->cnt_ctl & ARCH_TIMER_CTRL_ENABLE);
+		((timer_get_ctl(timer_ctx) &
+		  (ARCH_TIMER_CTRL_IT_MASK | ARCH_TIMER_CTRL_ENABLE)) == ARCH_TIMER_CTRL_ENABLE);
 }
 
 /*
@@ -256,8 +343,8 @@ static bool kvm_timer_should_fire(struct arch_timer_context *timer_ctx)
 	if (!kvm_timer_irq_can_fire(timer_ctx))
 		return false;
 
-	cval = timer_ctx->cnt_cval;
-	now = kvm_phys_timer_read() - timer_ctx->cntvoff;
+	cval = timer_get_cval(timer_ctx);
+	now = kvm_phys_timer_read() - timer_get_offset(timer_ctx);
 
 	return cval <= now;
 }
@@ -350,8 +437,8 @@ static void timer_save_state(struct arch_timer_context *ctx)
 
 	switch (index) {
 	case TIMER_VTIMER:
-		ctx->cnt_ctl = read_sysreg_el0(SYS_CNTV_CTL);
-		ctx->cnt_cval = read_sysreg_el0(SYS_CNTV_CVAL);
+		timer_set_ctl(ctx, read_sysreg_el0(SYS_CNTV_CTL));
+		timer_set_cval(ctx, read_sysreg_el0(SYS_CNTV_CVAL));
 
 		/* Disable the timer */
 		write_sysreg_el0(0, SYS_CNTV_CTL);
@@ -359,8 +446,8 @@ static void timer_save_state(struct arch_timer_context *ctx)
 
 		break;
 	case TIMER_PTIMER:
-		ctx->cnt_ctl = read_sysreg_el0(SYS_CNTP_CTL);
-		ctx->cnt_cval = read_sysreg_el0(SYS_CNTP_CVAL);
+		timer_set_ctl(ctx, read_sysreg_el0(SYS_CNTP_CTL));
+		timer_set_cval(ctx, read_sysreg_el0(SYS_CNTP_CVAL));
 
 		/* Disable the timer */
 		write_sysreg_el0(0, SYS_CNTP_CTL);
@@ -429,14 +516,14 @@ static void timer_restore_state(struct arch_timer_context *ctx)
 
 	switch (index) {
 	case TIMER_VTIMER:
-		write_sysreg_el0(ctx->cnt_cval, SYS_CNTV_CVAL);
+		write_sysreg_el0(timer_get_cval(ctx), SYS_CNTV_CVAL);
 		isb();
-		write_sysreg_el0(ctx->cnt_ctl, SYS_CNTV_CTL);
+		write_sysreg_el0(timer_get_ctl(ctx), SYS_CNTV_CTL);
 		break;
 	case TIMER_PTIMER:
-		write_sysreg_el0(ctx->cnt_cval, SYS_CNTP_CVAL);
+		write_sysreg_el0(timer_get_cval(ctx), SYS_CNTP_CVAL);
 		isb();
-		write_sysreg_el0(ctx->cnt_ctl, SYS_CNTP_CTL);
+		write_sysreg_el0(timer_get_ctl(ctx), SYS_CNTP_CTL);
 		break;
 	case NR_KVM_TIMERS:
 		BUG();
@@ -538,7 +625,7 @@ void kvm_timer_vcpu_load(struct kvm_vcpu *vcpu)
 		kvm_timer_vcpu_load_nogic(vcpu);
 	}
 
-	set_cntvoff(map.direct_vtimer->cntvoff);
+	set_cntvoff(timer_get_offset(map.direct_vtimer));
 
 	kvm_timer_unblocking(vcpu);
 
@@ -648,8 +735,8 @@ int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu)
 	 * resets the timer to be disabled and unmasked and is compliant with
 	 * the ARMv7 architecture.
 	 */
-	vcpu_vtimer(vcpu)->cnt_ctl = 0;
-	vcpu_ptimer(vcpu)->cnt_ctl = 0;
+	timer_set_ctl(vcpu_vtimer(vcpu), 0);
+	timer_set_ctl(vcpu_ptimer(vcpu), 0);
 
 	if (timer->enabled) {
 		kvm_timer_update_irq(vcpu, false, vcpu_vtimer(vcpu));
@@ -677,13 +764,13 @@ static void update_vtimer_cntvoff(struct kvm_vcpu *vcpu, u64 cntvoff)
 
 	mutex_lock(&kvm->lock);
 	kvm_for_each_vcpu(i, tmp, kvm)
-		vcpu_vtimer(tmp)->cntvoff = cntvoff;
+		timer_set_offset(vcpu_vtimer(tmp), cntvoff);
 
 	/*
 	 * When called from the vcpu create path, the CPU being created is not
 	 * included in the loop above, so we just set it here as well.
 	 */
-	vcpu_vtimer(vcpu)->cntvoff = cntvoff;
+	timer_set_offset(vcpu_vtimer(vcpu), cntvoff);
 	mutex_unlock(&kvm->lock);
 }
 
@@ -693,9 +780,12 @@ void kvm_timer_vcpu_init(struct kvm_vcpu *vcpu)
 	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
 	struct arch_timer_context *ptimer = vcpu_ptimer(vcpu);
 
+	vtimer->vcpu = vcpu;
+	ptimer->vcpu = vcpu;
+
 	/* Synchronize cntvoff across all vtimers of a VM. */
 	update_vtimer_cntvoff(vcpu, kvm_phys_timer_read());
-	ptimer->cntvoff = 0;
+	timer_set_offset(ptimer, 0);
 
 	hrtimer_init(&timer->bg_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_HARD);
 	timer->bg_timer.function = kvm_bg_timer_expire;
@@ -713,9 +803,6 @@ void kvm_timer_vcpu_init(struct kvm_vcpu *vcpu)
 
 	vtimer->host_timer_irq_flags = host_vtimer_irq_flags;
 	ptimer->host_timer_irq_flags = host_ptimer_irq_flags;
-
-	vtimer->vcpu = vcpu;
-	ptimer->vcpu = vcpu;
 }
 
 static void kvm_timer_init_interrupt(void *info)
@@ -765,10 +852,12 @@ static u64 read_timer_ctl(struct arch_timer_context *timer)
 	 * UNKNOWN when ENABLE bit is 0, so we chose to set ISTATUS bit
 	 * regardless of ENABLE bit for our implementation convenience.
 	 */
+	u32 ctl = timer_get_ctl(timer);
+
 	if (!kvm_timer_compute_delta(timer))
-		return timer->cnt_ctl | ARCH_TIMER_CTRL_IT_STAT;
-	else
-		return timer->cnt_ctl;
+		ctl |= ARCH_TIMER_CTRL_IT_STAT;
+
+	return ctl;
 }
 
 u64 kvm_arm_timer_get_reg(struct kvm_vcpu *vcpu, u64 regid)
@@ -804,8 +893,8 @@ static u64 kvm_arm_timer_read(struct kvm_vcpu *vcpu,
 
 	switch (treg) {
 	case TIMER_REG_TVAL:
-		val = timer->cnt_cval - kvm_phys_timer_read() + timer->cntvoff;
-		val &= lower_32_bits(val);
+		val = timer_get_cval(timer) - kvm_phys_timer_read() + timer_get_offset(timer);
+		val = lower_32_bits(val);
 		break;
 
 	case TIMER_REG_CTL:
@@ -813,11 +902,11 @@ static u64 kvm_arm_timer_read(struct kvm_vcpu *vcpu,
 		break;
 
 	case TIMER_REG_CVAL:
-		val = timer->cnt_cval;
+		val = timer_get_cval(timer);
 		break;
 
 	case TIMER_REG_CNT:
-		val = kvm_phys_timer_read() - timer->cntvoff;
+		val = kvm_phys_timer_read() - timer_get_offset(timer);
 		break;
 
 	default:
@@ -851,15 +940,15 @@ static void kvm_arm_timer_write(struct kvm_vcpu *vcpu,
 {
 	switch (treg) {
 	case TIMER_REG_TVAL:
-		timer->cnt_cval = kvm_phys_timer_read() - timer->cntvoff + (s32)val;
+		timer_set_cval(timer, kvm_phys_timer_read() - timer_get_offset(timer) + (s32)val);
 		break;
 
 	case TIMER_REG_CTL:
-		timer->cnt_ctl = val & ~ARCH_TIMER_CTRL_IT_STAT;
+		timer_set_ctl(timer, val & ~ARCH_TIMER_CTRL_IT_STAT);
 		break;
 
 	case TIMER_REG_CVAL:
-		timer->cnt_cval = val;
+		timer_set_cval(timer, val);
 		break;
 
 	default:
diff --git a/virt/kvm/arm/trace.h b/virt/kvm/arm/trace.h
index cc94ccc688217..ff09cd493e480 100644
--- a/virt/kvm/arm/trace.h
+++ b/virt/kvm/arm/trace.h
@@ -302,8 +302,8 @@ TRACE_EVENT(kvm_timer_save_state,
 	),
 
 	TP_fast_assign(
-		__entry->ctl			= ctx->cnt_ctl;
-		__entry->cval			= ctx->cnt_cval;
+		__entry->ctl			= timer_get_ctl(ctx);
+		__entry->cval			= timer_get_cval(ctx);
 		__entry->timer_idx		= arch_timer_ctx_index(ctx);
 	),
 
@@ -324,8 +324,8 @@ TRACE_EVENT(kvm_timer_restore_state,
 	),
 
 	TP_fast_assign(
-		__entry->ctl			= ctx->cnt_ctl;
-		__entry->cval			= ctx->cnt_cval;
+		__entry->ctl			= timer_get_ctl(ctx);
+		__entry->cval			= timer_get_cval(ctx);
 		__entry->timer_idx		= arch_timer_ctx_index(ctx);
 	),
 
-- 
2.26.1

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH 26/26] KVM: arm64: Parametrize exception entry with a target EL
  2020-04-22 12:00 [PATCH 00/26] KVM: arm64: Preliminary NV patches Marc Zyngier
                   ` (24 preceding siblings ...)
  2020-04-22 12:00 ` [PATCH 25/26] KVM: arm64: timers: Move timer registers to the sys_regs file Marc Zyngier
@ 2020-04-22 12:00 ` Marc Zyngier
  2020-05-19 10:44   ` Mark Rutland
  25 siblings, 1 reply; 78+ messages in thread
From: Marc Zyngier @ 2020-04-22 12:00 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Will Deacon, Andre Przywara, Dave Martin, George Cherian,
	Zengtao (B),
	Catalin Marinas

We currently assume that an exception is delivered to EL1, always.
Once we emulate EL2, this no longer will be the case. To prepare
for this, add a target_mode parameter.

While we're at it, merge the computing of the target PC and PSTATE in
a single function that updates both PC and CPSR after saving their
previous values in the corresponding ELR/SPSR. This ensures that they
are updated in the correct order (a pretty common source of bugs...).

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/inject_fault.c | 75 ++++++++++++++++++-----------------
 1 file changed, 38 insertions(+), 37 deletions(-)

diff --git a/arch/arm64/kvm/inject_fault.c b/arch/arm64/kvm/inject_fault.c
index d3ebf8bca4b89..3dbcbc839b9c3 100644
--- a/arch/arm64/kvm/inject_fault.c
+++ b/arch/arm64/kvm/inject_fault.c
@@ -26,28 +26,12 @@ enum exception_type {
 	except_type_serror	= 0x180,
 };
 
-static u64 get_except_vector(struct kvm_vcpu *vcpu, enum exception_type type)
-{
-	u64 exc_offset;
-
-	switch (*vcpu_cpsr(vcpu) & (PSR_MODE_MASK | PSR_MODE32_BIT)) {
-	case PSR_MODE_EL1t:
-		exc_offset = CURRENT_EL_SP_EL0_VECTOR;
-		break;
-	case PSR_MODE_EL1h:
-		exc_offset = CURRENT_EL_SP_ELx_VECTOR;
-		break;
-	case PSR_MODE_EL0t:
-		exc_offset = LOWER_EL_AArch64_VECTOR;
-		break;
-	default:
-		exc_offset = LOWER_EL_AArch32_VECTOR;
-	}
-
-	return vcpu_read_sys_reg(vcpu, VBAR_EL1) + exc_offset + type;
-}
-
 /*
+ * This performs the exception entry at a given EL (@target_mode), stashing PC
+ * and PSTATE into ELR and SPSR respectively, and compute the new PC/PSTATE.
+ * The EL passed to this function *must* be a non-secure, privileged mode with
+ * bit 0 being set (PSTATE.SP == 1).
+ *
  * When an exception is taken, most PSTATE fields are left unchanged in the
  * handler. However, some are explicitly overridden (e.g. M[4:0]). Luckily all
  * of the inherited bits have the same position in the AArch64/AArch32 SPSR_ELx
@@ -59,10 +43,35 @@ static u64 get_except_vector(struct kvm_vcpu *vcpu, enum exception_type type)
  * Here we manipulate the fields in order of the AArch64 SPSR_ELx layout, from
  * MSB to LSB.
  */
-static unsigned long get_except64_pstate(struct kvm_vcpu *vcpu)
+static void enter_exception(struct kvm_vcpu *vcpu, unsigned long target_mode,
+			    enum exception_type type)
 {
-	unsigned long sctlr = vcpu_read_sys_reg(vcpu, SCTLR_EL1);
-	unsigned long old, new;
+	unsigned long sctlr, vbar, old, new, mode;
+	u64 exc_offset;
+
+	mode = *vcpu_cpsr(vcpu) & (PSR_MODE_MASK | PSR_MODE32_BIT);
+
+	if      (mode == target_mode)
+		exc_offset = CURRENT_EL_SP_ELx_VECTOR;
+	else if ((mode | 1) == target_mode)
+		exc_offset = CURRENT_EL_SP_EL0_VECTOR;
+	else if (!(mode & PSR_MODE32_BIT))
+		exc_offset = LOWER_EL_AArch64_VECTOR;
+	else
+		exc_offset = LOWER_EL_AArch32_VECTOR;
+
+	switch (target_mode) {
+	case PSR_MODE_EL1h:
+		vbar = vcpu_read_sys_reg(vcpu, VBAR_EL1);
+		sctlr = vcpu_read_sys_reg(vcpu, SCTLR_EL1);
+		vcpu_write_sys_reg(vcpu, *vcpu_pc(vcpu), ELR_EL1);
+		break;
+	default:
+		/* Don't do that */
+		BUG();
+	}
+
+	*vcpu_pc(vcpu) = vbar + exc_offset + type;
 
 	old = *vcpu_cpsr(vcpu);
 	new = 0;
@@ -105,9 +114,10 @@ static unsigned long get_except64_pstate(struct kvm_vcpu *vcpu)
 	new |= PSR_I_BIT;
 	new |= PSR_F_BIT;
 
-	new |= PSR_MODE_EL1h;
+	new |= target_mode;
 
-	return new;
+	*vcpu_cpsr(vcpu) = new;
+	vcpu_write_spsr(vcpu, old);
 }
 
 static void inject_abt64(struct kvm_vcpu *vcpu, bool is_iabt, unsigned long addr)
@@ -116,11 +126,7 @@ static void inject_abt64(struct kvm_vcpu *vcpu, bool is_iabt, unsigned long addr
 	bool is_aarch32 = vcpu_mode_is_32bit(vcpu);
 	u32 esr = 0;
 
-	vcpu_write_sys_reg(vcpu, *vcpu_pc(vcpu), ELR_EL1);
-	*vcpu_pc(vcpu) = get_except_vector(vcpu, except_type_sync);
-
-	*vcpu_cpsr(vcpu) = get_except64_pstate(vcpu);
-	vcpu_write_spsr(vcpu, cpsr);
+	enter_exception(vcpu, PSR_MODE_EL1h, except_type_sync);
 
 	vcpu_write_sys_reg(vcpu, addr, FAR_EL1);
 
@@ -148,14 +154,9 @@ static void inject_abt64(struct kvm_vcpu *vcpu, bool is_iabt, unsigned long addr
 
 static void inject_undef64(struct kvm_vcpu *vcpu)
 {
-	unsigned long cpsr = *vcpu_cpsr(vcpu);
 	u32 esr = (ESR_ELx_EC_UNKNOWN << ESR_ELx_EC_SHIFT);
 
-	vcpu_write_sys_reg(vcpu, *vcpu_pc(vcpu), ELR_EL1);
-	*vcpu_pc(vcpu) = get_except_vector(vcpu, except_type_sync);
-
-	*vcpu_cpsr(vcpu) = get_except64_pstate(vcpu);
-	vcpu_write_spsr(vcpu, cpsr);
+	enter_exception(vcpu, PSR_MODE_EL1h, except_type_sync);
 
 	/*
 	 * Build an unknown exception, depending on the instruction
-- 
2.26.1

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 01/26] KVM: arm64: Check advertised Stage-2 page size capability
  2020-04-22 12:00 ` [PATCH 01/26] KVM: arm64: Check advertised Stage-2 page size capability Marc Zyngier
@ 2020-04-22 13:40   ` Suzuki K Poulose
  2020-04-22 14:07     ` Marc Zyngier
  2020-05-07 11:42   ` Alexandru Elisei
  1 sibling, 1 reply; 78+ messages in thread
From: Suzuki K Poulose @ 2020-04-22 13:40 UTC (permalink / raw)
  To: maz, linux-arm-kernel, kvmarm, kvm
  Cc: will, andre.przywara, dave.martin, gcherian, prime.zeng, catalin.marinas

Hi Marc,

On 04/22/2020 01:00 PM, Marc Zyngier wrote:
> With ARMv8.5-GTG, the hardware (or more likely a hypervisor) can
> advertise the supported Stage-2 page sizes.
> 
> Let's check this at boot time.
> 
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>   arch/arm64/include/asm/kvm_host.h |  2 +-
>   arch/arm64/include/asm/sysreg.h   |  3 +++
>   arch/arm64/kernel/cpufeature.c    |  8 +++++++
>   arch/arm64/kvm/reset.c            | 40 ++++++++++++++++++++++++++++---
>   virt/kvm/arm/arm.c                |  4 +---
>   5 files changed, 50 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 32c8a675e5a4a..7dd8fefa6aecd 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -670,7 +670,7 @@ static inline int kvm_arm_have_ssbd(void)
>   void kvm_vcpu_load_sysregs(struct kvm_vcpu *vcpu);
>   void kvm_vcpu_put_sysregs(struct kvm_vcpu *vcpu);
>   
> -void kvm_set_ipa_limit(void);
> +int kvm_set_ipa_limit(void);
>   
>   #define __KVM_HAVE_ARCH_VM_ALLOC
>   struct kvm *kvm_arch_alloc_vm(void);
> diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
> index ebc6224328318..5d10c9148e844 100644
> --- a/arch/arm64/include/asm/sysreg.h
> +++ b/arch/arm64/include/asm/sysreg.h
> @@ -686,6 +686,9 @@
>   #define ID_AA64ZFR0_SVEVER_SVE2		0x1
>   
>   /* id_aa64mmfr0 */
> +#define ID_AA64MMFR0_TGRAN4_2_SHIFT	40
> +#define ID_AA64MMFR0_TGRAN64_2_SHIFT	36
> +#define ID_AA64MMFR0_TGRAN16_2_SHIFT	32
>   #define ID_AA64MMFR0_TGRAN4_SHIFT	28
>   #define ID_AA64MMFR0_TGRAN64_SHIFT	24
>   #define ID_AA64MMFR0_TGRAN16_SHIFT	20
> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> index 9fac745aa7bb2..9892a845d06c9 100644
> --- a/arch/arm64/kernel/cpufeature.c
> +++ b/arch/arm64/kernel/cpufeature.c
> @@ -208,6 +208,14 @@ static const struct arm64_ftr_bits ftr_id_aa64zfr0[] = {
>   };
>   
>   static const struct arm64_ftr_bits ftr_id_aa64mmfr0[] = {
> +	/*
> +	 * Page size not being supported at Stage-2 are not fatal. You
> +	 * just give up KVM if PAGE_SIZE isn't supported there. Go fix
> +	 * your favourite nesting hypervisor.
> +	 */
> +	ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_EXACT, ID_AA64MMFR0_TGRAN4_2_SHIFT, 4, 1),
> +	ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_EXACT, ID_AA64MMFR0_TGRAN64_2_SHIFT, 4, 1),
> +	ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_EXACT, ID_AA64MMFR0_TGRAN16_2_SHIFT, 4, 1),

One minor issue with this is, if we get a system with cpus having values
0 and 2 (both of which indicates the stage-2 support), we might reset
the value to 1 for the feature indicating, we don't support and block
KVM. But, we can blame the nesting hypervisor for not emulating this to
(2). Do we need a comment to make this explicit here ?

Otherwise this looks fine to me

Suzuki



>   	/*
>   	 * We already refuse to boot CPUs that don't support our configured
>   	 * page size, so we can only detect mismatches for a page size other
> diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
> index 30b7ea680f66c..241db35a7ef4f 100644
> --- a/arch/arm64/kvm/reset.c
> +++ b/arch/arm64/kvm/reset.c
> @@ -9,6 +9,7 @@
>    */
>   
>   #include <linux/errno.h>
> +#include <linux/bitfield.h>
>   #include <linux/kernel.h>
>   #include <linux/kvm_host.h>
>   #include <linux/kvm.h>
> @@ -340,11 +341,42 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
>   	return ret;
>   }
>   
> -void kvm_set_ipa_limit(void)
> +int kvm_set_ipa_limit(void)
>   {
> -	unsigned int ipa_max, pa_max, va_max, parange;
> +	unsigned int ipa_max, pa_max, va_max, parange, tgran_2;
> +	u64 mmfr0 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1);
>   
> -	parange = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1) & 0x7;
> +	/*
> +	 * Check with ARMv8.5-GTG that our PAGE_SIZE is supported at
> +	 * Stage-2. If not, things will stop very quickly.
> +	 */
> +	switch (PAGE_SIZE) {
> +	default:
> +	case SZ_4K:
> +		tgran_2 = ID_AA64MMFR0_TGRAN4_2_SHIFT;
> +		break;
> +	case SZ_16K:
> +		tgran_2 = ID_AA64MMFR0_TGRAN16_2_SHIFT;
> +		break;
> +	case SZ_64K:
> +		tgran_2 = ID_AA64MMFR0_TGRAN64_2_SHIFT;
> +		break;
> +	}
> +
> +	switch (FIELD_GET(0xFUL << tgran_2, mmfr0)) {
> +	default:
> +	case 1:
> +		kvm_err("PAGE_SIZE not supported at Stage-2, giving up\n");
> +		return -EINVAL;
> +	case 0:
> +		kvm_debug("PAGE_SIZE supported at Stage-2 (default)\n");
> +		break;
> +	case 2:
> +		kvm_debug("PAGE_SIZE supported at Stage-2 (advertised)\n");
> +		break;
> +	}
> +
> +	parange = mmfr0 & 0x7;
>   	pa_max = id_aa64mmfr0_parange_to_phys_shift(parange);
>   
>   	/* Clamp the IPA limit to the PA size supported by the kernel */
> @@ -378,6 +410,8 @@ void kvm_set_ipa_limit(void)
>   	     "KVM IPA limit (%d bit) is smaller than default size\n", ipa_max);
>   	kvm_ipa_limit = ipa_max;
>   	kvm_info("IPA Size Limit: %dbits\n", kvm_ipa_limit);
> +
> +	return 0;
>   }
>   
>   /*
> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> index 48d0ec44ad77e..53b3ba9173ba7 100644
> --- a/virt/kvm/arm/arm.c
> +++ b/virt/kvm/arm/arm.c
> @@ -1387,9 +1387,7 @@ static inline void hyp_cpu_pm_exit(void)
>   
>   static int init_common_resources(void)
>   {
> -	kvm_set_ipa_limit();
> -
> -	return 0;
> +	return kvm_set_ipa_limit();
>   }
>   
>   static int init_subsystems(void)
> 

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 02/26] KVM: arm64: Move __load_guest_stage2 to kvm_mmu.h
  2020-04-22 12:00 ` [PATCH 02/26] KVM: arm64: Move __load_guest_stage2 to kvm_mmu.h Marc Zyngier
@ 2020-04-22 13:51   ` Suzuki K Poulose
  2020-04-22 13:59     ` Marc Zyngier
  0 siblings, 1 reply; 78+ messages in thread
From: Suzuki K Poulose @ 2020-04-22 13:51 UTC (permalink / raw)
  To: maz, linux-arm-kernel, kvmarm, kvm
  Cc: will, andre.przywara, dave.martin, gcherian, prime.zeng, catalin.marinas

Hi Marc,


On 04/22/2020 01:00 PM, Marc Zyngier wrote:
> Having __load_guest_stage2 in kvm_hyp.h is quickly going to trigger
> a circular include problem. In order to avoid this, let's move
> it to kvm_mmu.h, where it will be a better fit anyway.
> 
> In the process, drop the __hyp_text annotation, which doesn't help
> as the function is marked as __always_inline.
> 
> Signed-off-by: Marc Zyngier <maz@kernel.org>

> ---
>   arch/arm64/include/asm/kvm_hyp.h | 18 ------------------
>   arch/arm64/include/asm/kvm_mmu.h | 17 +++++++++++++++++
>   2 files changed, 17 insertions(+), 18 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
> index fe57f60f06a89..dcb63bf941057 100644
> --- a/arch/arm64/include/asm/kvm_hyp.h
> +++ b/arch/arm64/include/asm/kvm_hyp.h
> @@ -10,7 +10,6 @@
>   #include <linux/compiler.h>
>   #include <linux/kvm_host.h>
>   #include <asm/alternative.h>
> -#include <asm/kvm_mmu.h>
>   #include <asm/sysreg.h>
>   
>   #define __hyp_text __section(.hyp.text) notrace
> @@ -88,22 +87,5 @@ void deactivate_traps_vhe_put(void);
>   u64 __guest_enter(struct kvm_vcpu *vcpu, struct kvm_cpu_context *host_ctxt);
>   void __noreturn __hyp_do_panic(unsigned long, ...);
>   
> -/*
> - * Must be called from hyp code running at EL2 with an updated VTTBR
> - * and interrupts disabled.
> - */
> -static __always_inline void __hyp_text __load_guest_stage2(struct kvm *kvm)
> -{
> -	write_sysreg(kvm->arch.vtcr, vtcr_el2);
> -	write_sysreg(kvm_get_vttbr(kvm), vttbr_el2);
> -
> -	/*
> -	 * ARM errata 1165522 and 1530923 require the actual execution of the
> -	 * above before we can switch to the EL1/EL0 translation regime used by
> -	 * the guest.
> -	 */
> -	asm(ALTERNATIVE("nop", "isb", ARM64_WORKAROUND_SPECULATIVE_AT_VHE));
> -}
> -
>   #endif /* __ARM64_KVM_HYP_H__ */
>   
> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> index 30b0e8d6b8953..5ba1310639ec6 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -604,5 +604,22 @@ static __always_inline u64 kvm_get_vttbr(struct kvm *kvm)
>   	return kvm_phys_to_vttbr(baddr) | vmid_field | cnp;
>   }
>   
> +/*
> + * Must be called from hyp code running at EL2 with an updated VTTBR
> + * and interrupts disabled.
> + */
> +static __always_inline void __load_guest_stage2(struct kvm *kvm)
> +{
> +	write_sysreg(kvm->arch.vtcr, vtcr_el2);
> +	write_sysreg(kvm_get_vttbr(kvm), vttbr_el2);
> +
> +	/*
> +	 * ARM erratum 1165522 requires the actual execution of the above

Is it intentional to drop the reference to errata 1530923 ?

Otherwise :

Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 02/26] KVM: arm64: Move __load_guest_stage2 to kvm_mmu.h
  2020-04-22 13:51   ` Suzuki K Poulose
@ 2020-04-22 13:59     ` Marc Zyngier
  0 siblings, 0 replies; 78+ messages in thread
From: Marc Zyngier @ 2020-04-22 13:59 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: kvm, andre.przywara, kvmarm, will, gcherian, prime.zeng,
	catalin.marinas, dave.martin, linux-arm-kernel

Hi Suzuki,

On 2020-04-22 14:51, Suzuki K Poulose wrote:
> Hi Marc,
> 
> 
> On 04/22/2020 01:00 PM, Marc Zyngier wrote:
>> Having __load_guest_stage2 in kvm_hyp.h is quickly going to trigger
>> a circular include problem. In order to avoid this, let's move
>> it to kvm_mmu.h, where it will be a better fit anyway.
>> 
>> In the process, drop the __hyp_text annotation, which doesn't help
>> as the function is marked as __always_inline.
>> 
>> Signed-off-by: Marc Zyngier <maz@kernel.org>
> 
>> ---
>>   arch/arm64/include/asm/kvm_hyp.h | 18 ------------------
>>   arch/arm64/include/asm/kvm_mmu.h | 17 +++++++++++++++++
>>   2 files changed, 17 insertions(+), 18 deletions(-)
>> 
>> diff --git a/arch/arm64/include/asm/kvm_hyp.h 
>> b/arch/arm64/include/asm/kvm_hyp.h
>> index fe57f60f06a89..dcb63bf941057 100644
>> --- a/arch/arm64/include/asm/kvm_hyp.h
>> +++ b/arch/arm64/include/asm/kvm_hyp.h
>> @@ -10,7 +10,6 @@
>>   #include <linux/compiler.h>
>>   #include <linux/kvm_host.h>
>>   #include <asm/alternative.h>
>> -#include <asm/kvm_mmu.h>
>>   #include <asm/sysreg.h>
>>     #define __hyp_text __section(.hyp.text) notrace
>> @@ -88,22 +87,5 @@ void deactivate_traps_vhe_put(void);
>>   u64 __guest_enter(struct kvm_vcpu *vcpu, struct kvm_cpu_context 
>> *host_ctxt);
>>   void __noreturn __hyp_do_panic(unsigned long, ...);
>>   -/*
>> - * Must be called from hyp code running at EL2 with an updated VTTBR
>> - * and interrupts disabled.
>> - */
>> -static __always_inline void __hyp_text __load_guest_stage2(struct kvm 
>> *kvm)
>> -{
>> -	write_sysreg(kvm->arch.vtcr, vtcr_el2);
>> -	write_sysreg(kvm_get_vttbr(kvm), vttbr_el2);
>> -
>> -	/*
>> -	 * ARM errata 1165522 and 1530923 require the actual execution of 
>> the
>> -	 * above before we can switch to the EL1/EL0 translation regime used 
>> by
>> -	 * the guest.
>> -	 */
>> -	asm(ALTERNATIVE("nop", "isb", ARM64_WORKAROUND_SPECULATIVE_AT_VHE));
>> -}
>> -
>>   #endif /* __ARM64_KVM_HYP_H__ */
>>   diff --git a/arch/arm64/include/asm/kvm_mmu.h 
>> b/arch/arm64/include/asm/kvm_mmu.h
>> index 30b0e8d6b8953..5ba1310639ec6 100644
>> --- a/arch/arm64/include/asm/kvm_mmu.h
>> +++ b/arch/arm64/include/asm/kvm_mmu.h
>> @@ -604,5 +604,22 @@ static __always_inline u64 kvm_get_vttbr(struct 
>> kvm *kvm)
>>   	return kvm_phys_to_vttbr(baddr) | vmid_field | cnp;
>>   }
>>   +/*
>> + * Must be called from hyp code running at EL2 with an updated VTTBR
>> + * and interrupts disabled.
>> + */
>> +static __always_inline void __load_guest_stage2(struct kvm *kvm)
>> +{
>> +	write_sysreg(kvm->arch.vtcr, vtcr_el2);
>> +	write_sysreg(kvm_get_vttbr(kvm), vttbr_el2);
>> +
>> +	/*
>> +	 * ARM erratum 1165522 requires the actual execution of the above
> 
> Is it intentional to drop the reference to errata 1530923 ?

No, that's obviously the effect of a bad conflict resolution during a 
rebase.
I'll fix it now, thanks for spotting it.

> Otherwise :
> 
> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>

Cheers,

         M.
-- 
Jazz is not dead. It just smells funny...
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 01/26] KVM: arm64: Check advertised Stage-2 page size capability
  2020-04-22 13:40   ` Suzuki K Poulose
@ 2020-04-22 14:07     ` Marc Zyngier
  2020-04-22 14:14       ` Suzuki K Poulose
  0 siblings, 1 reply; 78+ messages in thread
From: Marc Zyngier @ 2020-04-22 14:07 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: kvm, andre.przywara, kvmarm, will, gcherian, prime.zeng,
	catalin.marinas, dave.martin, linux-arm-kernel

Hi Suzuki,

On 2020-04-22 14:40, Suzuki K Poulose wrote:
> Hi Marc,
> 
> On 04/22/2020 01:00 PM, Marc Zyngier wrote:
>> With ARMv8.5-GTG, the hardware (or more likely a hypervisor) can
>> advertise the supported Stage-2 page sizes.
>> 
>> Let's check this at boot time.
>> 
>> Signed-off-by: Marc Zyngier <maz@kernel.org>
>> ---
>>   arch/arm64/include/asm/kvm_host.h |  2 +-
>>   arch/arm64/include/asm/sysreg.h   |  3 +++
>>   arch/arm64/kernel/cpufeature.c    |  8 +++++++
>>   arch/arm64/kvm/reset.c            | 40 
>> ++++++++++++++++++++++++++++---
>>   virt/kvm/arm/arm.c                |  4 +---
>>   5 files changed, 50 insertions(+), 7 deletions(-)
>> 
>> diff --git a/arch/arm64/include/asm/kvm_host.h 
>> b/arch/arm64/include/asm/kvm_host.h
>> index 32c8a675e5a4a..7dd8fefa6aecd 100644
>> --- a/arch/arm64/include/asm/kvm_host.h
>> +++ b/arch/arm64/include/asm/kvm_host.h
>> @@ -670,7 +670,7 @@ static inline int kvm_arm_have_ssbd(void)
>>   void kvm_vcpu_load_sysregs(struct kvm_vcpu *vcpu);
>>   void kvm_vcpu_put_sysregs(struct kvm_vcpu *vcpu);
>>   -void kvm_set_ipa_limit(void);
>> +int kvm_set_ipa_limit(void);
>>     #define __KVM_HAVE_ARCH_VM_ALLOC
>>   struct kvm *kvm_arch_alloc_vm(void);
>> diff --git a/arch/arm64/include/asm/sysreg.h 
>> b/arch/arm64/include/asm/sysreg.h
>> index ebc6224328318..5d10c9148e844 100644
>> --- a/arch/arm64/include/asm/sysreg.h
>> +++ b/arch/arm64/include/asm/sysreg.h
>> @@ -686,6 +686,9 @@
>>   #define ID_AA64ZFR0_SVEVER_SVE2		0x1
>>     /* id_aa64mmfr0 */
>> +#define ID_AA64MMFR0_TGRAN4_2_SHIFT	40
>> +#define ID_AA64MMFR0_TGRAN64_2_SHIFT	36
>> +#define ID_AA64MMFR0_TGRAN16_2_SHIFT	32
>>   #define ID_AA64MMFR0_TGRAN4_SHIFT	28
>>   #define ID_AA64MMFR0_TGRAN64_SHIFT	24
>>   #define ID_AA64MMFR0_TGRAN16_SHIFT	20
>> diff --git a/arch/arm64/kernel/cpufeature.c 
>> b/arch/arm64/kernel/cpufeature.c
>> index 9fac745aa7bb2..9892a845d06c9 100644
>> --- a/arch/arm64/kernel/cpufeature.c
>> +++ b/arch/arm64/kernel/cpufeature.c
>> @@ -208,6 +208,14 @@ static const struct arm64_ftr_bits 
>> ftr_id_aa64zfr0[] = {
>>   };
>>     static const struct arm64_ftr_bits ftr_id_aa64mmfr0[] = {
>> +	/*
>> +	 * Page size not being supported at Stage-2 are not fatal. You
>> +	 * just give up KVM if PAGE_SIZE isn't supported there. Go fix
>> +	 * your favourite nesting hypervisor.
>> +	 */
>> +	ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_EXACT, 
>> ID_AA64MMFR0_TGRAN4_2_SHIFT, 4, 1),
>> +	ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_EXACT, 
>> ID_AA64MMFR0_TGRAN64_2_SHIFT, 4, 1),
>> +	ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_EXACT, 
>> ID_AA64MMFR0_TGRAN16_2_SHIFT, 4, 1),
> 
> One minor issue with this is, if we get a system with cpus having 
> values
> 0 and 2 (both of which indicates the stage-2 support), we might reset
> the value to 1 for the feature indicating, we don't support and block
> KVM. But, we can blame the nesting hypervisor for not emulating this to
> (2). Do we need a comment to make this explicit here ?

Sure. How about something like:

"There is a small corner case where the hypervisor could explicitly 
advertise
  a given granule size at Stage-2 (value 2) on some vCPUs, and use the 
fallback
  to Stage-1 (value 0) for other vCPUs. Although this is not forbidden by 
the
  architecture, it indicates that the hypervisor is being silly (or 
buggy).
  We make no effort to cope with this and pretend that if these fields 
are
  inconsistent across vCPUs, then it isn't worth trying to bring KVM up."

Thanks,

         M.
-- 
Jazz is not dead. It just smells funny...
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 01/26] KVM: arm64: Check advertised Stage-2 page size capability
  2020-04-22 14:07     ` Marc Zyngier
@ 2020-04-22 14:14       ` Suzuki K Poulose
  0 siblings, 0 replies; 78+ messages in thread
From: Suzuki K Poulose @ 2020-04-22 14:14 UTC (permalink / raw)
  To: maz
  Cc: kvm, andre.przywara, kvmarm, will, gcherian, prime.zeng,
	catalin.marinas, dave.martin, linux-arm-kernel

On 04/22/2020 03:07 PM, Marc Zyngier wrote:
> Hi Suzuki,
> 
> On 2020-04-22 14:40, Suzuki K Poulose wrote:
>> Hi Marc,
>>
>> On 04/22/2020 01:00 PM, Marc Zyngier wrote:
>>> With ARMv8.5-GTG, the hardware (or more likely a hypervisor) can
>>> advertise the supported Stage-2 page sizes.
>>>
>>> Let's check this at boot time.
>>>
>>> Signed-off-by: Marc Zyngier <maz@kernel.org>
>>> ---
>>>   arch/arm64/include/asm/kvm_host.h |  2 +-
>>>   arch/arm64/include/asm/sysreg.h   |  3 +++
>>>   arch/arm64/kernel/cpufeature.c    |  8 +++++++
>>>   arch/arm64/kvm/reset.c            | 40 ++++++++++++++++++++++++++++---
>>>   virt/kvm/arm/arm.c                |  4 +---
>>>   5 files changed, 50 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/arch/arm64/include/asm/kvm_host.h 
>>> b/arch/arm64/include/asm/kvm_host.h
>>> index 32c8a675e5a4a..7dd8fefa6aecd 100644
>>> --- a/arch/arm64/include/asm/kvm_host.h
>>> +++ b/arch/arm64/include/asm/kvm_host.h
>>> @@ -670,7 +670,7 @@ static inline int kvm_arm_have_ssbd(void)
>>>   void kvm_vcpu_load_sysregs(struct kvm_vcpu *vcpu);
>>>   void kvm_vcpu_put_sysregs(struct kvm_vcpu *vcpu);
>>>   -void kvm_set_ipa_limit(void);
>>> +int kvm_set_ipa_limit(void);
>>>     #define __KVM_HAVE_ARCH_VM_ALLOC
>>>   struct kvm *kvm_arch_alloc_vm(void);
>>> diff --git a/arch/arm64/include/asm/sysreg.h 
>>> b/arch/arm64/include/asm/sysreg.h
>>> index ebc6224328318..5d10c9148e844 100644
>>> --- a/arch/arm64/include/asm/sysreg.h
>>> +++ b/arch/arm64/include/asm/sysreg.h
>>> @@ -686,6 +686,9 @@
>>>   #define ID_AA64ZFR0_SVEVER_SVE2        0x1
>>>     /* id_aa64mmfr0 */
>>> +#define ID_AA64MMFR0_TGRAN4_2_SHIFT    40
>>> +#define ID_AA64MMFR0_TGRAN64_2_SHIFT    36
>>> +#define ID_AA64MMFR0_TGRAN16_2_SHIFT    32
>>>   #define ID_AA64MMFR0_TGRAN4_SHIFT    28
>>>   #define ID_AA64MMFR0_TGRAN64_SHIFT    24
>>>   #define ID_AA64MMFR0_TGRAN16_SHIFT    20
>>> diff --git a/arch/arm64/kernel/cpufeature.c 
>>> b/arch/arm64/kernel/cpufeature.c
>>> index 9fac745aa7bb2..9892a845d06c9 100644
>>> --- a/arch/arm64/kernel/cpufeature.c
>>> +++ b/arch/arm64/kernel/cpufeature.c
>>> @@ -208,6 +208,14 @@ static const struct arm64_ftr_bits 
>>> ftr_id_aa64zfr0[] = {
>>>   };
>>>     static const struct arm64_ftr_bits ftr_id_aa64mmfr0[] = {
>>> +    /*
>>> +     * Page size not being supported at Stage-2 are not fatal. You
>>> +     * just give up KVM if PAGE_SIZE isn't supported there. Go fix
>>> +     * your favourite nesting hypervisor.
>>> +     */
>>> +    ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_EXACT, 
>>> ID_AA64MMFR0_TGRAN4_2_SHIFT, 4, 1),
>>> +    ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_EXACT, 
>>> ID_AA64MMFR0_TGRAN64_2_SHIFT, 4, 1),
>>> +    ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_EXACT, 
>>> ID_AA64MMFR0_TGRAN16_2_SHIFT, 4, 1),
>>
>> One minor issue with this is, if we get a system with cpus having values
>> 0 and 2 (both of which indicates the stage-2 support), we might reset
>> the value to 1 for the feature indicating, we don't support and block
>> KVM. But, we can blame the nesting hypervisor for not emulating this to
>> (2). Do we need a comment to make this explicit here ?
> 
> Sure. How about something like:
> 
> "There is a small corner case where the hypervisor could explicitly 
> advertise
>   a given granule size at Stage-2 (value 2) on some vCPUs, and use the 
> fallback
>   to Stage-1 (value 0) for other vCPUs. Although this is not forbidden 
> by the
>   architecture, it indicates that the hypervisor is being silly (or buggy).
>   We make no effort to cope with this and pretend that if these fields are
>   inconsistent across vCPUs, then it isn't worth trying to bring KVM up."
> 

Looks fine to me.

Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 18/26] KVM: arm64: Don't use empty structures as CPU reset state
  2020-04-22 12:00 ` [PATCH 18/26] KVM: arm64: Don't use empty structures as CPU reset state Marc Zyngier
@ 2020-04-24  4:07   ` Zenghui Yu
  2020-04-24  7:45     ` Marc Zyngier
  0 siblings, 1 reply; 78+ messages in thread
From: Zenghui Yu @ 2020-04-24  4:07 UTC (permalink / raw)
  To: Marc Zyngier, linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, George Cherian, Zengtao (B),
	Catalin Marinas, Will Deacon, Dave Martin

Hi Marc,

On 2020/4/22 20:00, Marc Zyngier wrote:
> Keeping empty structure as the vcpu state initializer is slightly
> wasteful: we only want to set pstate, and zero everything else.
> Just do that.
> 
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>   arch/arm64/kvm/reset.c | 20 +++++++++-----------
>   1 file changed, 9 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
> index 241db35a7ef4f..895d7d9ad1866 100644
> --- a/arch/arm64/kvm/reset.c
> +++ b/arch/arm64/kvm/reset.c
> @@ -37,15 +37,11 @@ static u32 kvm_ipa_limit;
>   /*
>    * ARMv8 Reset Values
>    */
> -static const struct kvm_regs default_regs_reset = {
> -	.regs.pstate = (PSR_MODE_EL1h | PSR_A_BIT | PSR_I_BIT |
> -			PSR_F_BIT | PSR_D_BIT),
> -};
> +#define VCPU_RESET_PSTATE_EL1	(PSR_MODE_EL1h | PSR_A_BIT | PSR_I_BIT | \
> +				 PSR_F_BIT | PSR_D_BIT)
>   
> -static const struct kvm_regs default_regs_reset32 = {
> -	.regs.pstate = (PSR_AA32_MODE_SVC | PSR_AA32_A_BIT |
> -			PSR_AA32_I_BIT | PSR_AA32_F_BIT),
> -};
> +#define VCPU_RESET_PSTATE_SVC	(PSR_AA32_MODE_SVC | PSR_AA32_A_BIT | \
> +				 PSR_AA32_I_BIT | PSR_AA32_F_BIT)
>   
>   static bool cpu_has_32bit_el1(void)
>   {
> @@ -261,6 +257,7 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
>   	const struct kvm_regs *cpu_reset;
>   	int ret = -EINVAL;
>   	bool loaded;
> +	u32 pstate;
>   
>   	/* Reset PMU outside of the non-preemptible section */
>   	kvm_pmu_vcpu_reset(vcpu);
> @@ -291,16 +288,17 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
>   		if (test_bit(KVM_ARM_VCPU_EL1_32BIT, vcpu->arch.features)) {
>   			if (!cpu_has_32bit_el1())
>   				goto out;
> -			cpu_reset = &default_regs_reset32;
> +			pstate = VCPU_RESET_PSTATE_SVC;
>   		} else {
> -			cpu_reset = &default_regs_reset;
> +			pstate = VCPU_RESET_PSTATE_EL1;
>   		}
>   
>   		break;
>   	}
>   
>   	/* Reset core registers */
> -	memcpy(vcpu_gp_regs(vcpu), cpu_reset, sizeof(*cpu_reset));
> +	memset(vcpu_gp_regs(vcpu), 0, sizeof(*cpu_reset));

Be careful that we can *not* use 'sizeof(*cpu_reset)' here anymore.  As
you're going to refactor the layout of the core registers whilst keeping
the kvm_regs API unchanged.  Resetting the whole kvm_regs will go
corrupting some affected registers and make them temporarily invalid.
The bad thing will show up after you start moving ELR_EL1 around,
specifically in patch #20...

And the first victim is ... MPIDR_EL1 (the first one in sys_regs array).
Now you know how this was spotted ;-)  I think this should be the root
cause of what Zengtao had previously reported [*].

If these registers are all expected to be reset to architecturally
UNKNOWN values, I think we can just drop this memset(), though haven't
check with the ARM ARM carefully.


Thanks,
Zenghui


[*] 
https://lore.kernel.org/kvmarm/f55386a9-8eaa-944f-453d-9c3c4abee5fb@arm.com/T/#mc6c7268755f5cdaff7a23c34e6e16ea36bcfbe22

> +	vcpu_gp_regs(vcpu)->regs.pstate = pstate;
>   
>   	/* Reset system registers */
>   	kvm_reset_sys_regs(vcpu);
> 

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 18/26] KVM: arm64: Don't use empty structures as CPU reset state
  2020-04-24  4:07   ` Zenghui Yu
@ 2020-04-24  7:45     ` Marc Zyngier
  2020-04-28  1:34       ` Zengtao (B)
  0 siblings, 1 reply; 78+ messages in thread
From: Marc Zyngier @ 2020-04-24  7:45 UTC (permalink / raw)
  To: Zenghui Yu
  Cc: kvm, Andre Przywara, kvmarm, George Cherian, Zengtao (B),
	Catalin Marinas, Will Deacon, Dave Martin, linux-arm-kernel

Hi Zenghui,

On Fri, 24 Apr 2020 12:07:50 +0800
Zenghui Yu <yuzenghui@huawei.com> wrote:

> Hi Marc,
> 
> On 2020/4/22 20:00, Marc Zyngier wrote:
> > Keeping empty structure as the vcpu state initializer is slightly
> > wasteful: we only want to set pstate, and zero everything else.
> > Just do that.
> > 
> > Signed-off-by: Marc Zyngier <maz@kernel.org>
> > ---
> >   arch/arm64/kvm/reset.c | 20 +++++++++-----------
> >   1 file changed, 9 insertions(+), 11 deletions(-)
> > 
> > diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
> > index 241db35a7ef4f..895d7d9ad1866 100644
> > --- a/arch/arm64/kvm/reset.c
> > +++ b/arch/arm64/kvm/reset.c
> > @@ -37,15 +37,11 @@ static u32 kvm_ipa_limit;
> >   /*
> >    * ARMv8 Reset Values
> >    */
> > -static const struct kvm_regs default_regs_reset = {
> > -	.regs.pstate = (PSR_MODE_EL1h | PSR_A_BIT | PSR_I_BIT |
> > -			PSR_F_BIT | PSR_D_BIT),
> > -};
> > +#define VCPU_RESET_PSTATE_EL1	(PSR_MODE_EL1h | PSR_A_BIT | PSR_I_BIT | \
> > +				 PSR_F_BIT | PSR_D_BIT)  
> >   > -static const struct kvm_regs default_regs_reset32 = {  
> > -	.regs.pstate = (PSR_AA32_MODE_SVC | PSR_AA32_A_BIT |
> > -			PSR_AA32_I_BIT | PSR_AA32_F_BIT),
> > -};
> > +#define VCPU_RESET_PSTATE_SVC	(PSR_AA32_MODE_SVC | PSR_AA32_A_BIT | \
> > +				 PSR_AA32_I_BIT | PSR_AA32_F_BIT)  
> >   >   static bool cpu_has_32bit_el1(void)  
> >   {
> > @@ -261,6 +257,7 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
> >   	const struct kvm_regs *cpu_reset;
> >   	int ret = -EINVAL;
> >   	bool loaded;
> > +	u32 pstate;  
> >   >   	/* Reset PMU outside of the non-preemptible section */  
> >   	kvm_pmu_vcpu_reset(vcpu);
> > @@ -291,16 +288,17 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
> >   		if (test_bit(KVM_ARM_VCPU_EL1_32BIT, vcpu->arch.features)) {
> >   			if (!cpu_has_32bit_el1())
> >   				goto out;
> > -			cpu_reset = &default_regs_reset32;
> > +			pstate = VCPU_RESET_PSTATE_SVC;
> >   		} else {
> > -			cpu_reset = &default_regs_reset;
> > +			pstate = VCPU_RESET_PSTATE_EL1;
> >   		}  
> >   >   		break;  
> >   	}  
> >   >   	/* Reset core registers */  
> > -	memcpy(vcpu_gp_regs(vcpu), cpu_reset, sizeof(*cpu_reset));
> > +	memset(vcpu_gp_regs(vcpu), 0, sizeof(*cpu_reset));  
> 
> Be careful that we can *not* use 'sizeof(*cpu_reset)' here anymore.  As
> you're going to refactor the layout of the core registers whilst keeping
> the kvm_regs API unchanged.  Resetting the whole kvm_regs will go
> corrupting some affected registers and make them temporarily invalid.
> The bad thing will show up after you start moving ELR_EL1 around,
> specifically in patch #20...

Ah, awesome find! Yes, it is pretty obvious now that you point it out.
If I had removed this now useless cpu_reset variable, I'd have spotted
it!

> And the first victim is ... MPIDR_EL1 (the first one in sys_regs array).
> Now you know how this was spotted ;-)  I think this should be the root
> cause of what Zengtao had previously reported [*].

It'd be good if Zengtao could confirm that changing this line to

	memset(vcpu_gp_regs(vcpu), 0, sizeof(*vcpu_gp_regs(vcpu)));

fixes his problem.

> If these registers are all expected to be reset to architecturally
> UNKNOWN values, I think we can just drop this memset(), though haven't
> check with the ARM ARM carefully.

D1.9.1 ("PE state on reset to AArch64 state"):

"All general-purpose, and SIMD and floating-point registers are
UNKNOWN."

There is a vaguely similar wording for AArch32 (G1.17.1), although it
is only described by omission:

"Immediately after a reset, much of the PE state is UNKNOWN. However,
some of the PE state is defined."

and the GPRs are not part of the list of defined states.

Still, I'm worried to change KVM's behaviour after so long... I'll have
a try with a handful of non-Linux guests and see if anything breaks.

Thanks again,

         M.
-- 
Jazz is not dead. It just smells funny...
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 04/26] arm64: Detect the ARMv8.4 TTL feature
  2020-04-22 12:00 ` [PATCH 04/26] arm64: Detect the ARMv8.4 TTL feature Marc Zyngier
@ 2020-04-27 15:55   ` Suzuki K Poulose
  0 siblings, 0 replies; 78+ messages in thread
From: Suzuki K Poulose @ 2020-04-27 15:55 UTC (permalink / raw)
  To: maz, linux-arm-kernel, kvmarm, kvm
  Cc: will, andre.przywara, dave.martin, gcherian, prime.zeng, catalin.marinas

On 04/22/2020 01:00 PM, Marc Zyngier wrote:
> In order to reduce the cost of TLB invalidation, the ARMv8.4 TTL
> feature allows TLBs to be issued with a level allowing for quicker
> invalidation.
> 
> Let's detect the feature for now. Further patches will implement
> its actual usage.
> 
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---


>   arch/arm64/include/asm/cpucaps.h |  3 ++-
>   arch/arm64/include/asm/sysreg.h  |  1 +
>   arch/arm64/kernel/cpufeature.c   | 11 +++++++++++
>   3 files changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/include/asm/cpucaps.h b/arch/arm64/include/asm/cpucaps.h
> index 8eb5a088ae658..cabb0c49a1d11 100644
> --- a/arch/arm64/include/asm/cpucaps.h
> +++ b/arch/arm64/include/asm/cpucaps.h
> @@ -61,7 +61,8 @@
>   #define ARM64_HAS_AMU_EXTN			51
>   #define ARM64_HAS_ADDRESS_AUTH			52
>   #define ARM64_HAS_GENERIC_AUTH			53
> +#define ARM64_HAS_ARMv8_4_TTL			54
>   
> -#define ARM64_NCAPS				54
> +#define ARM64_NCAPS				55
>   
>   #endif /* __ASM_CPUCAPS_H */
> diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
> index 5d10c9148e844..79cf186b7e471 100644
> --- a/arch/arm64/include/asm/sysreg.h
> +++ b/arch/arm64/include/asm/sysreg.h
> @@ -726,6 +726,7 @@
>   
>   /* id_aa64mmfr2 */
>   #define ID_AA64MMFR2_E0PD_SHIFT		60
> +#define ID_AA64MMFR2_TTL_SHIFT		48
>   #define ID_AA64MMFR2_FWB_SHIFT		40
>   #define ID_AA64MMFR2_AT_SHIFT		32
>   #define ID_AA64MMFR2_LVA_SHIFT		16
> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> index 9892a845d06c9..d8ab4f1e93bee 100644
> --- a/arch/arm64/kernel/cpufeature.c
> +++ b/arch/arm64/kernel/cpufeature.c
> @@ -252,6 +252,7 @@ static const struct arm64_ftr_bits ftr_id_aa64mmfr1[] = {
>   
>   static const struct arm64_ftr_bits ftr_id_aa64mmfr2[] = {
>   	ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64MMFR2_E0PD_SHIFT, 4, 0),
> +	ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64MMFR2_TTL_SHIFT, 4, 0),
>   	ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64MMFR2_FWB_SHIFT, 4, 0),
>   	ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64MMFR2_AT_SHIFT, 4, 0),
>   	ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64MMFR2_LVA_SHIFT, 4, 0),
> @@ -1630,6 +1631,16 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
>   		.matches = has_cpuid_feature,
>   		.cpu_enable = cpu_has_fwb,
>   	},
> +	{
> +		.desc = "ARMv8.4 Translation Table Level",
> +		.type = ARM64_CPUCAP_SYSTEM_FEATURE,
> +		.capability = ARM64_HAS_ARMv8_4_TTL,
> +		.sys_reg = SYS_ID_AA64MMFR2_EL1,
> +		.sign = FTR_UNSIGNED,
> +		.field_pos = ID_AA64MMFR2_TTL_SHIFT,
> +		.min_field_value = 1,
> +		.matches = has_cpuid_feature,
> +	},

Reviewed-by : Suzuki K Polose <suzuki.poulose@arm.com>
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 78+ messages in thread

* RE: [PATCH 18/26] KVM: arm64: Don't use empty structures as CPU reset state
  2020-04-24  7:45     ` Marc Zyngier
@ 2020-04-28  1:34       ` Zengtao (B)
  0 siblings, 0 replies; 78+ messages in thread
From: Zengtao (B) @ 2020-04-28  1:34 UTC (permalink / raw)
  To: Marc Zyngier, yuzenghui
  Cc: kvm, Andre Przywara, kvmarm, George Cherian, Catalin Marinas,
	Will Deacon, Dave Martin, linux-arm-kernel

> -----Original Message-----
> From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org]
> On Behalf Of Marc Zyngier
> Sent: Friday, April 24, 2020 3:45 PM
> To: yuzenghui
> Cc: linux-arm-kernel@lists.infradead.org; kvmarm@lists.cs.columbia.edu;
> kvm@vger.kernel.org; Will Deacon; Andre Przywara; Dave Martin;
> George Cherian; Zengtao (B); Catalin Marinas
> Subject: Re: [PATCH 18/26] KVM: arm64: Don't use empty structures as
> CPU reset state
> 
> Hi Zenghui,
> 
> On Fri, 24 Apr 2020 12:07:50 +0800
> Zenghui Yu <yuzenghui@huawei.com> wrote:
> 
> > Hi Marc,
> >
> > On 2020/4/22 20:00, Marc Zyngier wrote:
> > > Keeping empty structure as the vcpu state initializer is slightly
> > > wasteful: we only want to set pstate, and zero everything else.
> > > Just do that.
> > >
> > > Signed-off-by: Marc Zyngier <maz@kernel.org>
> > > ---
> > >   arch/arm64/kvm/reset.c | 20 +++++++++-----------
> > >   1 file changed, 9 insertions(+), 11 deletions(-)
> > >
> > > diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
> > > index 241db35a7ef4f..895d7d9ad1866 100644
> > > --- a/arch/arm64/kvm/reset.c
> > > +++ b/arch/arm64/kvm/reset.c
> > > @@ -37,15 +37,11 @@ static u32 kvm_ipa_limit;
> > >   /*
> > >    * ARMv8 Reset Values
> > >    */
> > > -static const struct kvm_regs default_regs_reset = {
> > > -	.regs.pstate = (PSR_MODE_EL1h | PSR_A_BIT | PSR_I_BIT |
> > > -			PSR_F_BIT | PSR_D_BIT),
> > > -};
> > > +#define VCPU_RESET_PSTATE_EL1	(PSR_MODE_EL1h |
> PSR_A_BIT | PSR_I_BIT | \
> > > +				 PSR_F_BIT | PSR_D_BIT)
> > >   > -static const struct kvm_regs default_regs_reset32 = {
> > > -	.regs.pstate = (PSR_AA32_MODE_SVC | PSR_AA32_A_BIT |
> > > -			PSR_AA32_I_BIT | PSR_AA32_F_BIT),
> > > -};
> > > +#define VCPU_RESET_PSTATE_SVC	(PSR_AA32_MODE_SVC |
> PSR_AA32_A_BIT | \
> > > +				 PSR_AA32_I_BIT | PSR_AA32_F_BIT)
> > >   >   static bool cpu_has_32bit_el1(void)
> > >   {
> > > @@ -261,6 +257,7 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
> > >   	const struct kvm_regs *cpu_reset;
> > >   	int ret = -EINVAL;
> > >   	bool loaded;
> > > +	u32 pstate;
> > >   >   	/* Reset PMU outside of the non-preemptible section */
> > >   	kvm_pmu_vcpu_reset(vcpu);
> > > @@ -291,16 +288,17 @@ int kvm_reset_vcpu(struct kvm_vcpu
> *vcpu)
> > >   		if (test_bit(KVM_ARM_VCPU_EL1_32BIT,
> vcpu->arch.features)) {
> > >   			if (!cpu_has_32bit_el1())
> > >   				goto out;
> > > -			cpu_reset = &default_regs_reset32;
> > > +			pstate = VCPU_RESET_PSTATE_SVC;
> > >   		} else {
> > > -			cpu_reset = &default_regs_reset;
> > > +			pstate = VCPU_RESET_PSTATE_EL1;
> > >   		}
> > >   >   		break;
> > >   	}
> > >   >   	/* Reset core registers */
> > > -	memcpy(vcpu_gp_regs(vcpu), cpu_reset, sizeof(*cpu_reset));
> > > +	memset(vcpu_gp_regs(vcpu), 0, sizeof(*cpu_reset));
> >
> > Be careful that we can *not* use 'sizeof(*cpu_reset)' here anymore.
> As
> > you're going to refactor the layout of the core registers whilst keeping
> > the kvm_regs API unchanged.  Resetting the whole kvm_regs will go
> > corrupting some affected registers and make them temporarily invalid.
> > The bad thing will show up after you start moving ELR_EL1 around,
> > specifically in patch #20...
> 
> Ah, awesome find! Yes, it is pretty obvious now that you point it out.
> If I had removed this now useless cpu_reset variable, I'd have spotted
> it!
> 
> > And the first victim is ... MPIDR_EL1 (the first one in sys_regs array).
> > Now you know how this was spotted ;-)  I think this should be the
> root
> > cause of what Zengtao had previously reported [*].
> 
> It'd be good if Zengtao could confirm that changing this line to
> 
> 	memset(vcpu_gp_regs(vcpu), 0, sizeof(*vcpu_gp_regs(vcpu)));
> 
> fixes his problem.

I tried this change, and it did fix my problem, thanks :). 

> 
> > If these registers are all expected to be reset to architecturally
> > UNKNOWN values, I think we can just drop this memset(), though
> haven't
> > check with the ARM ARM carefully.
> 
> D1.9.1 ("PE state on reset to AArch64 state"):
> 
> "All general-purpose, and SIMD and floating-point registers are
> UNKNOWN."
> 
> There is a vaguely similar wording for AArch32 (G1.17.1), although it
> is only described by omission:
> 
> "Immediately after a reset, much of the PE state is UNKNOWN. However,
> some of the PE state is defined."
> 
> and the GPRs are not part of the list of defined states.
> 
> Still, I'm worried to change KVM's behaviour after so long... I'll have
> a try with a handful of non-Linux guests and see if anything breaks.
> 
> Thanks again,
> 
>          M.
> --
> Jazz is not dead. It just smells funny...
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 03/26] KVM: arm64: Factor out stage 2 page table data from struct kvm
  2020-04-22 12:00 ` [PATCH 03/26] KVM: arm64: Factor out stage 2 page table data from struct kvm Marc Zyngier
@ 2020-05-05 15:26   ` Andrew Scull
  2020-05-05 16:32     ` Marc Zyngier
  2020-05-05 16:03   ` James Morse
  2020-05-11 16:38   ` Alexandru Elisei
  2 siblings, 1 reply; 78+ messages in thread
From: Andrew Scull @ 2020-05-05 15:26 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvm, Andre Przywara, kvmarm, George Cherian, Zengtao (B),
	Catalin Marinas, Will Deacon, Dave Martin, linux-arm-kernel

Having a go at reviewing. Might turn out to be more useful as a learning
exercise for me rather than useful feedback but we've got to start
somewhere..

> -struct kvm_arch {
> +struct kvm_s2_mmu {
>  	struct kvm_vmid vmid;
>  
> -	/* stage2 entry level table */
> -	pgd_t *pgd;
> -	phys_addr_t pgd_phys;
> -
> -	/* VTCR_EL2 value for this VM */
> -	u64    vtcr;
> +	/*
> +	 * stage2 entry level table
> +	 *
> +	 * Two kvm_s2_mmu structures in the same VM can point to the same pgd
> +	 * here.  This happens when running a non-VHE guest hypervisor which
> +	 * uses the canonical stage 2 page table for both vEL2 and for vEL1/0
> +	 * with vHCR_EL2.VM == 0.
> +	 */
> +	pgd_t		*pgd;
> +	phys_addr_t	pgd_phys;
>  
>  	/* The last vcpu id that ran on each physical CPU */
>  	int __percpu *last_vcpu_ran;
>  
> +	struct kvm *kvm;
> +};
> +
> +struct kvm_arch {
> +	struct kvm_s2_mmu mmu;
> +
> +	/* VTCR_EL2 value for this VM */
> +	u64    vtcr;

VTCR seems quite strongly tied to the MMU config. Is it not controlled
independently for the nested MMUs and so remains in this struct?

> -static void stage2_dissolve_pmd(struct kvm *kvm, phys_addr_t addr, pmd_t *pmd)
> +static void stage2_dissolve_pmd(struct kvm_s2_mmu *mmu, phys_addr_t addr, pmd_t *pmd)

How strictly is the long line style rule enforced? checkpatch has 16
such warnings on this patch.

> -static void stage2_dissolve_pud(struct kvm *kvm, phys_addr_t addr, pud_t *pudp)
> +static void stage2_dissolve_pud(struct kvm_s2_mmu *mmu, phys_addr_t addr, pud_t *pudp)
>  {
> +	struct kvm *kvm __maybe_unused = mmu->kvm;
> +
>  	if (!stage2_pud_huge(kvm, *pudp))
>  		return;

There're a couple places with `__maybe_unused` on variables that are
then used soon after. Can they be dropped in these cases so as not to
hide legitimate warning?
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 05/26] arm64: Document SW reserved PTE/PMD bits in Stage-2 descriptors
  2020-04-22 12:00 ` [PATCH 05/26] arm64: Document SW reserved PTE/PMD bits in Stage-2 descriptors Marc Zyngier
@ 2020-05-05 15:59   ` Andrew Scull
  2020-05-06  9:39     ` Marc Zyngier
  0 siblings, 1 reply; 78+ messages in thread
From: Andrew Scull @ 2020-05-05 15:59 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvm, Andre Przywara, kvmarm, George Cherian, Zengtao (B),
	Catalin Marinas, Will Deacon, Dave Martin, linux-arm-kernel

On Wed, Apr 22, 2020 at 01:00:29PM +0100, Marc Zyngier wrote:
> Advertise bits [58:55] as reserved for SW in the S2 descriptors.
> 
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/include/asm/pgtable-hwdef.h | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/pgtable-hwdef.h b/arch/arm64/include/asm/pgtable-hwdef.h
> index 6bf5e650da788..7eab0d23cdb52 100644
> --- a/arch/arm64/include/asm/pgtable-hwdef.h
> +++ b/arch/arm64/include/asm/pgtable-hwdef.h
> @@ -177,10 +177,12 @@
>  #define PTE_S2_RDONLY		(_AT(pteval_t, 1) << 6)   /* HAP[2:1] */
>  #define PTE_S2_RDWR		(_AT(pteval_t, 3) << 6)   /* HAP[2:1] */
>  #define PTE_S2_XN		(_AT(pteval_t, 2) << 53)  /* XN[1:0] */
> +#define PTE_S2_SW_RESVD		(_AT(pteval_t, 15) << 55) /* Reserved for SW */
>  
>  #define PMD_S2_RDONLY		(_AT(pmdval_t, 1) << 6)   /* HAP[2:1] */
>  #define PMD_S2_RDWR		(_AT(pmdval_t, 3) << 6)   /* HAP[2:1] */
>  #define PMD_S2_XN		(_AT(pmdval_t, 2) << 53)  /* XN[1:0] */
> +#define PMD_S2_SW_RESVD		(_AT(pmdval_t, 15) << 55) /* Reserved for SW */
>  
>  #define PUD_S2_RDONLY		(_AT(pudval_t, 1) << 6)   /* HAP[2:1] */
>  #define PUD_S2_RDWR		(_AT(pudval_t, 3) << 6)   /* HAP[2:1] */
> -- 
> 2.26.1
> 
> _______________________________________________
> kvmarm mailing list
> kvmarm@lists.cs.columbia.edu
> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

This is consistent with "Attribute fields in stage 1 VMSAv8-64 Block and
Page descriptors"

Reviewed-by: Andrew Scull <ascull@google.com>
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 03/26] KVM: arm64: Factor out stage 2 page table data from struct kvm
  2020-04-22 12:00 ` [PATCH 03/26] KVM: arm64: Factor out stage 2 page table data from struct kvm Marc Zyngier
  2020-05-05 15:26   ` Andrew Scull
@ 2020-05-05 16:03   ` James Morse
  2020-05-05 17:59     ` Marc Zyngier
  2020-05-11 16:38   ` Alexandru Elisei
  2 siblings, 1 reply; 78+ messages in thread
From: James Morse @ 2020-05-05 16:03 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvm, Andre Przywara, kvmarm, Will Deacon, George Cherian,
	Zengtao (B),
	Catalin Marinas, Dave Martin, linux-arm-kernel

Hi Marc,

On 22/04/2020 13:00, Marc Zyngier wrote:
> From: Christoffer Dall <christoffer.dall@arm.com>
> 
> As we are about to reuse our stage 2 page table manipulation code for
> shadow stage 2 page tables in the context of nested virtualization, we
> are going to manage multiple stage 2 page tables for a single VM.
> 
> This requires some pretty invasive changes to our data structures,
> which moves the vmid and pgd pointers into a separate structure and
> change pretty much all of our mmu code to operate on this structure
> instead.
> 
> The new structure is called struct kvm_s2_mmu.
> 
> There is no intended functional change by this patch alone.

It's not obvious to me that VTCR_EL2.T0SZ is a per-vm thing, today the size of the IPA
space comes from the VMM, its not a hardware/compile-time property. Where does the vEL2's
T0SZ go? ... but using this for nested guests would 'only' cause a translation fault, it
would still need handling in the emulation code. So making it per-vm should be simpler.

But accessing VTCR is why the stage2_dissolve_p?d() stuff still needs the kvm pointer,
hence the backreference... it might be neater to push the vtcr properties into kvm_s2_mmu
that way you could drop the kvm backref, and only things that take vm-wide locks would
need the kvm pointer. But I don't think it matters.


I think I get it. I can't see anything that should be the other vm/vcpu pointer.

Reviewed-by: James Morse <james.morse@arm.com>


Some boring fiddly stuff:

[...]

> @@ -125,24 +123,24 @@ static void __hyp_text __tlb_switch_to_host_nvhe(struct kvm *kvm,
>  	}
>  }
>  
> -static void __hyp_text __tlb_switch_to_host(struct kvm *kvm,
> +static void __hyp_text __tlb_switch_to_host(struct kvm_s2_mmu *mmu,
>  					    struct tlb_inv_context *cxt)
>  {
>  	if (has_vhe())
> -		__tlb_switch_to_host_vhe(kvm, cxt);
> +		__tlb_switch_to_host_vhe(cxt);
>  	else
> -		__tlb_switch_to_host_nvhe(kvm, cxt);
> +		__tlb_switch_to_host_nvhe(cxt);
>  }

What does __tlb_switch_to_host() need the kvm_s2_mmu for?

[...]


>  void __hyp_text __kvm_tlb_flush_local_vmid(struct kvm_vcpu *vcpu)
>  {
> -	struct kvm *kvm = kern_hyp_va(kern_hyp_va(vcpu)->kvm);
> +	struct kvm_s2_mmu *mmu = kern_hyp_va(kern_hyp_va(vcpu)->arch.hw_mmu);
>  	struct tlb_inv_context cxt;
>
>  	/* Switch to requested VMID */
> -	__tlb_switch_to_guest(kvm, &cxt);
> +	__tlb_switch_to_guest(mmu, &cxt);
>
>  	__tlbi(vmalle1);
>  	dsb(nsh);
>  	isb();
>
> -	__tlb_switch_to_host(kvm, &cxt);
> +	__tlb_switch_to_host(mmu, &cxt);
>  }

Does this need the vcpu in the future?
Its the odd one out, the other tlb functions here take the s2_mmu, or nothing.
We only use the s2_mmu here.

[...]


> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> index e3b9ee268823b..2f99749048285 100644
> --- a/virt/kvm/arm/mmu.c
> +++ b/virt/kvm/arm/mmu.c

> @@ -96,31 +96,33 @@ static bool kvm_is_device_pfn(unsigned long pfn)
>   *
>   * Function clears a PMD entry, flushes addr 1st and 2nd stage TLBs.
>   */
> -static void stage2_dissolve_pmd(struct kvm *kvm, phys_addr_t addr, pmd_t *pmd)
> +static void stage2_dissolve_pmd(struct kvm_s2_mmu *mmu, phys_addr_t addr, pmd_t *pmd)

The comment above this function still has '@kvm:	pointer to kvm structure.'

[...]


> @@ -331,8 +339,9 @@ static void unmap_stage2_puds(struct kvm *kvm, pgd_t *pgd,
>   * destroying the VM), otherwise another faulting VCPU may come in and mess
>   * with things behind our backs.
>   */
> -static void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size)
> +static void unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size)

The comment above this function still has '@kvm:   The VM pointer'

[...]

> -static void stage2_flush_memslot(struct kvm *kvm,
> +static void stage2_flush_memslot(struct kvm_s2_mmu *mmu,
>  				 struct kvm_memory_slot *memslot)
>  {

Wouldn't something manipulating a memslot have to mess with a set of kvm_s2_mmu once this
is all assembled?
stage2_unmap_memslot() takes struct kvm, it seems odd to pass one kvm_s2_mmu here.

[...]

> @@ -886,21 +898,23 @@ int create_hyp_exec_mappings(phys_addr_t phys_addr, size_t size,

> -int kvm_alloc_stage2_pgd(struct kvm *kvm)
> +int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu)
>  {
>  	phys_addr_t pgd_phys;
>  	pgd_t *pgd;
> +	int cpu;
>  
> -	if (kvm->arch.pgd != NULL) {
> +	if (mmu->pgd != NULL) {
>  		kvm_err("kvm_arch already initialized?\n");

Does this error message still make sense?


>  		return -EINVAL;
>  	}

[...]

> @@ -1439,9 +1467,10 @@ static void stage2_wp_ptes(pmd_t *pmd, phys_addr_t addr, phys_addr_t end)
>   * @addr:	range start address
>   * @end:	range end address
>   */
> -static void stage2_wp_pmds(struct kvm *kvm, pud_t *pud,
> +static void stage2_wp_pmds(struct kvm_s2_mmu *mmu, pud_t *pud,
>  			   phys_addr_t addr, phys_addr_t end)

The comment above this function still has 'kvm:		kvm instance for the VM'.


>  {
> +	struct kvm *kvm = mmu->kvm;
>  	pmd_t *pmd;
>  	phys_addr_t next;
>  
> @@ -1461,14 +1490,15 @@ static void stage2_wp_pmds(struct kvm *kvm, pud_t *pud,
>  }
>  
>  /**
> - * stage2_wp_puds - write protect PGD range
> - * @pgd:	pointer to pgd entry
> - * @addr:	range start address
> - * @end:	range end address
> - */
> -static void  stage2_wp_puds(struct kvm *kvm, pgd_t *pgd,
> +  * stage2_wp_puds - write protect PGD range
> +  * @pgd:	pointer to pgd entry
> +  * @addr:	range start address
> +  * @end:	range end address
> +  */
> +static void  stage2_wp_puds(struct kvm_s2_mmu *mmu, pgd_t *pgd,
>  			    phys_addr_t addr, phys_addr_t end)

Comment was missing @kvm, now its missing @mmu....


>  {
> +	struct kvm *kvm __maybe_unused = mmu->kvm;
>  	pud_t *pud;
>  	phys_addr_t next;
>  

> @@ -1492,12 +1522,13 @@ static void  stage2_wp_puds(struct kvm *kvm, pgd_t *pgd,
>   * @addr:	Start address of range
>   * @end:	End address of range
>   */
> -static void stage2_wp_range(struct kvm *kvm, phys_addr_t addr, phys_addr_t end)
> +static void stage2_wp_range(struct kvm_s2_mmu *mmu, phys_addr_t addr, phys_addr_t end)

The comment above this function still ... you get the picture.

[...]


Thanks,

James
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 03/26] KVM: arm64: Factor out stage 2 page table data from struct kvm
  2020-05-05 15:26   ` Andrew Scull
@ 2020-05-05 16:32     ` Marc Zyngier
  2020-05-05 17:23       ` Andrew Scull
  0 siblings, 1 reply; 78+ messages in thread
From: Marc Zyngier @ 2020-05-05 16:32 UTC (permalink / raw)
  To: Andrew Scull
  Cc: kvm, Andre Przywara, kvmarm, George Cherian, Zengtao (B),
	Catalin Marinas, Will Deacon, Dave Martin, linux-arm-kernel

Hi Andrew,

On Tue, 05 May 2020 16:26:48 +0100,
Andrew Scull <ascull@google.com> wrote:
> 
> Having a go at reviewing. Might turn out to be more useful as a learning
> exercise for me rather than useful feedback but we've got to start
> somewhere..

Thanks for making the effort. Asking questions is never a pointless
exercise, as it usually means that something isn't as crystal clear as
the author expects... ;-)

> 
> > -struct kvm_arch {
> > +struct kvm_s2_mmu {
> >  	struct kvm_vmid vmid;
> >  
> > -	/* stage2 entry level table */
> > -	pgd_t *pgd;
> > -	phys_addr_t pgd_phys;
> > -
> > -	/* VTCR_EL2 value for this VM */
> > -	u64    vtcr;
> > +	/*
> > +	 * stage2 entry level table
> > +	 *
> > +	 * Two kvm_s2_mmu structures in the same VM can point to the same pgd
> > +	 * here.  This happens when running a non-VHE guest hypervisor which
> > +	 * uses the canonical stage 2 page table for both vEL2 and for vEL1/0
> > +	 * with vHCR_EL2.VM == 0.
> > +	 */
> > +	pgd_t		*pgd;
> > +	phys_addr_t	pgd_phys;
> >  
> >  	/* The last vcpu id that ran on each physical CPU */
> >  	int __percpu *last_vcpu_ran;
> >  
> > +	struct kvm *kvm;
> > +};
> > +
> > +struct kvm_arch {
> > +	struct kvm_s2_mmu mmu;
> > +
> > +	/* VTCR_EL2 value for this VM */
> > +	u64    vtcr;
> 
> VTCR seems quite strongly tied to the MMU config. Is it not controlled
> independently for the nested MMUs and so remains in this struct?

This particular instance of VTCR_EL2 is the host's version. Which
means it describes the virtual HW for the EL1 guest. It constraints,
among other things, the number of IPA bits for the guest, for example,
and is configured by the VMM.

Once you start nesting, each vcpu has its own VTCR_EL2 which is still
constrained by the main one (no nested guest can have a T0SZ bigger
than the value imposed by userspace for this guest as a whole).

Does it make sense?

> 
> > -static void stage2_dissolve_pmd(struct kvm *kvm, phys_addr_t addr, pmd_t *pmd)
> > +static void stage2_dissolve_pmd(struct kvm_s2_mmu *mmu, phys_addr_t addr, pmd_t *pmd)
> 
> How strictly is the long line style rule enforced? checkpatch has 16
> such warnings on this patch.

It isn't enforced at all for KVM/arm. I am perfectly happy with
longish lines (I stupidly gave away my vt100 a very long time ago).
In general, checkpatch warnings are to be looked into (it sometimes
brings interesting stuff up), but this falls into the *cosmetic*
department, and I cannot be bothered.

> 
> > -static void stage2_dissolve_pud(struct kvm *kvm, phys_addr_t addr, pud_t *pudp)
> > +static void stage2_dissolve_pud(struct kvm_s2_mmu *mmu, phys_addr_t addr, pud_t *pudp)
> >  {
> > +	struct kvm *kvm __maybe_unused = mmu->kvm;
> > +
> >  	if (!stage2_pud_huge(kvm, *pudp))
> >  		return;
> 
> There're a couple places with `__maybe_unused` on variables that are
> then used soon after. Can they be dropped in these cases so as not to
> hide legitimate warning?

Absolutely. I'll have a look.

Thanks for the review!

	M.

-- 
Jazz is not dead, it just smells funny.
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 06/26] arm64: Add level-hinted TLB invalidation helper
  2020-04-22 12:00 ` [PATCH 06/26] arm64: Add level-hinted TLB invalidation helper Marc Zyngier
@ 2020-05-05 17:16   ` Andrew Scull
  2020-05-06  8:05     ` Marc Zyngier
  0 siblings, 1 reply; 78+ messages in thread
From: Andrew Scull @ 2020-05-05 17:16 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvm, Andre Przywara, kvmarm, George Cherian, Zengtao (B),
	Catalin Marinas, Will Deacon, Dave Martin, linux-arm-kernel

> +#define __tlbi_level(op, addr, level)					\
> +	do {								\
> +		u64 arg = addr;						\
> +									\
> +		if (cpus_have_const_cap(ARM64_HAS_ARMv8_4_TTL) &&	\
> +		    level) {						\
> +			u64 ttl = level;				\
> +									\
> +			switch (PAGE_SIZE) {				\
> +			case SZ_4K:					\
> +				ttl |= 1 << 2;				\
> +				break;					\
> +			case SZ_16K:					\
> +				ttl |= 2 << 2;				\
> +				break;					\
> +			case SZ_64K:					\
> +				ttl |= 3 << 2;				\
> +				break;					\
> +			}						\
> +									\
> +			arg &= ~TLBI_TTL_MASK;				\
> +			arg |= FIELD_PREP(TLBI_TTL_MASK, ttl);		\

Despite the spec saying both tables apply to TLB maintenance
instructions that do not apply to a range of addresses I think it only
means the 4-bit version (bug report to Arm, or I'm on the wrong spec).

This is consistent with Table D5-53 and the macro takes a single address
argument to make misuse with range based tlbi less likely.

It relies on the caller to get the level right and getting it wrong
could be pretty bad as the spec says all bets are off in that case. Is
it worth adding a check of the level against the address (seems a bit
involved), or that it is just 2 bits or adding a short doc comment to
explain it?

(Looks like we get some constants for the levels in a later patch that
could be referenced with some form of time travel)

> +		}							\
> +									\
> +		__tlbi(op,  arg);					\

cosmetic nit: double space in here
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 03/26] KVM: arm64: Factor out stage 2 page table data from struct kvm
  2020-05-05 16:32     ` Marc Zyngier
@ 2020-05-05 17:23       ` Andrew Scull
  2020-05-05 18:10         ` Marc Zyngier
  0 siblings, 1 reply; 78+ messages in thread
From: Andrew Scull @ 2020-05-05 17:23 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvm, Andre Przywara, kvmarm, George Cherian, Zengtao (B),
	Catalin Marinas, Will Deacon, Dave Martin, linux-arm-kernel

> > > +	/* VTCR_EL2 value for this VM */
> > > +	u64    vtcr;
> > 
> > VTCR seems quite strongly tied to the MMU config. Is it not controlled
> > independently for the nested MMUs and so remains in this struct?
> 
> This particular instance of VTCR_EL2 is the host's version. Which
> means it describes the virtual HW for the EL1 guest. It constraints,
> among other things, the number of IPA bits for the guest, for example,
> and is configured by the VMM.
> 
> Once you start nesting, each vcpu has its own VTCR_EL2 which is still
> constrained by the main one (no nested guest can have a T0SZ bigger
> than the value imposed by userspace for this guest as a whole).
> 
> Does it make sense?

It does up to my ignorance of the spec in this regard.

Simliar to James's question, should `vtcr` live inside the mmu struct
with the top level `kvm::mmu` field containing the host's version and
the nested mmus containing the nested version of vtcr to be applied to
the vCPU? I didn't noticed there being a vtcr for the nested version in
the ~90-patch series so maybe that just isn't something that needs
thinking about?
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 03/26] KVM: arm64: Factor out stage 2 page table data from struct kvm
  2020-05-05 16:03   ` James Morse
@ 2020-05-05 17:59     ` Marc Zyngier
  2020-05-06  9:30       ` Marc Zyngier
  0 siblings, 1 reply; 78+ messages in thread
From: Marc Zyngier @ 2020-05-05 17:59 UTC (permalink / raw)
  To: James Morse
  Cc: kvm, Andre Przywara, kvmarm, Will Deacon, George Cherian,
	Zengtao (B),
	Catalin Marinas, Dave Martin, linux-arm-kernel

Hi James,

On Tue, 05 May 2020 17:03:15 +0100,
James Morse <james.morse@arm.com> wrote:
> 
> Hi Marc,
> 
> On 22/04/2020 13:00, Marc Zyngier wrote:
> > From: Christoffer Dall <christoffer.dall@arm.com>
> > 
> > As we are about to reuse our stage 2 page table manipulation code for
> > shadow stage 2 page tables in the context of nested virtualization, we
> > are going to manage multiple stage 2 page tables for a single VM.
> > 
> > This requires some pretty invasive changes to our data structures,
> > which moves the vmid and pgd pointers into a separate structure and
> > change pretty much all of our mmu code to operate on this structure
> > instead.
> > 
> > The new structure is called struct kvm_s2_mmu.
> > 
> > There is no intended functional change by this patch alone.
> 
> It's not obvious to me that VTCR_EL2.T0SZ is a per-vm thing, today
> the size of the IPA space comes from the VMM, its not a
> hardware/compile-time property. Where does the vEL2's T0SZ go?
> ... but using this for nested guests would 'only' cause a
> translation fault, it would still need handling in the emulation
> code. So making it per-vm should be simpler.

My reasoning is that this VTCR defines the virtual HW, and the guest's
own VTCR_EL2 is just another guest system register. It is the role of
the NV code to compose the two in a way that makes sense (delivering
translation faults if the NV guest's S2 output range doesn't fit in
the host's view of the VM IPA range).

> But accessing VTCR is why the stage2_dissolve_p?d() stuff still
> needs the kvm pointer, hence the backreference... it might be neater
> to push the vtcr properties into kvm_s2_mmu that way you could drop
> the kvm backref, and only things that take vm-wide locks would need
> the kvm pointer. But I don't think it matters.

That's an interesting consideration. I'll have a look.

> I think I get it. I can't see anything that should be the other
> vm/vcpu pointer.
> 
> Reviewed-by: James Morse <james.morse@arm.com>

Thanks!

> Some boring fiddly stuff:
> 
> [...]
> 
> > @@ -125,24 +123,24 @@ static void __hyp_text __tlb_switch_to_host_nvhe(struct kvm *kvm,
> >  	}
> >  }
> >  
> > -static void __hyp_text __tlb_switch_to_host(struct kvm *kvm,
> > +static void __hyp_text __tlb_switch_to_host(struct kvm_s2_mmu *mmu,
> >  					    struct tlb_inv_context *cxt)
> >  {
> >  	if (has_vhe())
> > -		__tlb_switch_to_host_vhe(kvm, cxt);
> > +		__tlb_switch_to_host_vhe(cxt);
> >  	else
> > -		__tlb_switch_to_host_nvhe(kvm, cxt);
> > +		__tlb_switch_to_host_nvhe(cxt);
> >  }
> 
> What does __tlb_switch_to_host() need the kvm_s2_mmu for?

Not much. Obviously mechanical conversion of kvm->kvm_s2_mmu, and not
finishing the job. I'll fix that.

> 
> [...]
> 
> 
> >  void __hyp_text __kvm_tlb_flush_local_vmid(struct kvm_vcpu *vcpu)
> >  {
> > -	struct kvm *kvm = kern_hyp_va(kern_hyp_va(vcpu)->kvm);
> > +	struct kvm_s2_mmu *mmu = kern_hyp_va(kern_hyp_va(vcpu)->arch.hw_mmu);
> >  	struct tlb_inv_context cxt;
> >
> >  	/* Switch to requested VMID */
> > -	__tlb_switch_to_guest(kvm, &cxt);
> > +	__tlb_switch_to_guest(mmu, &cxt);
> >
> >  	__tlbi(vmalle1);
> >  	dsb(nsh);
> >  	isb();
> >
> > -	__tlb_switch_to_host(kvm, &cxt);
> > +	__tlb_switch_to_host(mmu, &cxt);
> >  }
> 
> Does this need the vcpu in the future?
> Its the odd one out, the other tlb functions here take the s2_mmu, or nothing.
> We only use the s2_mmu here.

I think this was done as a way not impact the 32bit code (rest in
peace). Definitely a candidate for immediate cleanup.

> 
> [...]
> 
> 
> > diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> > index e3b9ee268823b..2f99749048285 100644
> > --- a/virt/kvm/arm/mmu.c
> > +++ b/virt/kvm/arm/mmu.c
> 
> > @@ -96,31 +96,33 @@ static bool kvm_is_device_pfn(unsigned long pfn)
> >   *
> >   * Function clears a PMD entry, flushes addr 1st and 2nd stage TLBs.
> >   */
> > -static void stage2_dissolve_pmd(struct kvm *kvm, phys_addr_t addr, pmd_t *pmd)
> > +static void stage2_dissolve_pmd(struct kvm_s2_mmu *mmu, phys_addr_t addr, pmd_t *pmd)
> 
> The comment above this function still has '@kvm:	pointer to kvm structure.'
> 
> [...]
> 
> 
> > @@ -331,8 +339,9 @@ static void unmap_stage2_puds(struct kvm *kvm, pgd_t *pgd,
> >   * destroying the VM), otherwise another faulting VCPU may come in and mess
> >   * with things behind our backs.
> >   */
> > -static void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size)
> > +static void unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size)
> 
> The comment above this function still has '@kvm:   The VM pointer'
> 
> [...]
> 
> > -static void stage2_flush_memslot(struct kvm *kvm,
> > +static void stage2_flush_memslot(struct kvm_s2_mmu *mmu,
> >  				 struct kvm_memory_slot *memslot)
> >  {
> 
> Wouldn't something manipulating a memslot have to mess with a set of
> kvm_s2_mmu once this is all assembled?  stage2_unmap_memslot() takes
> struct kvm, it seems odd to pass one kvm_s2_mmu here.

Indeed, that doesn't make sense. I guess this was done to match
kvm_stage2_flush_range, which does need to take a kvm_s2_mmu (it is
directly called from the nesting code).

stage2_flush_memslot() only affects the "main" S2 PTs, so passing kvm
here should be the right thing to do.

> 
> [...]
> 
> > @@ -886,21 +898,23 @@ int create_hyp_exec_mappings(phys_addr_t phys_addr, size_t size,
> 
> > -int kvm_alloc_stage2_pgd(struct kvm *kvm)
> > +int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu)
> >  {
> >  	phys_addr_t pgd_phys;
> >  	pgd_t *pgd;
> > +	int cpu;
> >  
> > -	if (kvm->arch.pgd != NULL) {
> > +	if (mmu->pgd != NULL) {
> >  		kvm_err("kvm_arch already initialized?\n");
> 
> Does this error message still make sense?

Probably not anymore. I'll revisit that shortly.

> 
> 
> >  		return -EINVAL;
> >  	}
> 
> [...]
> 
> > @@ -1439,9 +1467,10 @@ static void stage2_wp_ptes(pmd_t *pmd, phys_addr_t addr, phys_addr_t end)
> >   * @addr:	range start address
> >   * @end:	range end address
> >   */
> > -static void stage2_wp_pmds(struct kvm *kvm, pud_t *pud,
> > +static void stage2_wp_pmds(struct kvm_s2_mmu *mmu, pud_t *pud,
> >  			   phys_addr_t addr, phys_addr_t end)
> 
> The comment above this function still has 'kvm:		kvm instance for the VM'.
> 
> 
> >  {
> > +	struct kvm *kvm = mmu->kvm;
> >  	pmd_t *pmd;
> >  	phys_addr_t next;
> >  
> > @@ -1461,14 +1490,15 @@ static void stage2_wp_pmds(struct kvm *kvm, pud_t *pud,
> >  }
> >  
> >  /**
> > - * stage2_wp_puds - write protect PGD range
> > - * @pgd:	pointer to pgd entry
> > - * @addr:	range start address
> > - * @end:	range end address
> > - */
> > -static void  stage2_wp_puds(struct kvm *kvm, pgd_t *pgd,
> > +  * stage2_wp_puds - write protect PGD range
> > +  * @pgd:	pointer to pgd entry
> > +  * @addr:	range start address
> > +  * @end:	range end address
> > +  */
> > +static void  stage2_wp_puds(struct kvm_s2_mmu *mmu, pgd_t *pgd,
> >  			    phys_addr_t addr, phys_addr_t end)
> 
> Comment was missing @kvm, now its missing @mmu....
> 
> 
> >  {
> > +	struct kvm *kvm __maybe_unused = mmu->kvm;
> >  	pud_t *pud;
> >  	phys_addr_t next;
> >  
> 
> > @@ -1492,12 +1522,13 @@ static void  stage2_wp_puds(struct kvm *kvm, pgd_t *pgd,
> >   * @addr:	Start address of range
> >   * @end:	End address of range
> >   */
> > -static void stage2_wp_range(struct kvm *kvm, phys_addr_t addr, phys_addr_t end)
> > +static void stage2_wp_range(struct kvm_s2_mmu *mmu, phys_addr_t addr, phys_addr_t end)
> 
> The comment above this function still ... you get the picture.

Thanks a lot for the careful review. I'll respin this shortly, as this
is one of the patch I'd like to get in early.

	M.

-- 
Jazz is not dead, it just smells funny.
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 03/26] KVM: arm64: Factor out stage 2 page table data from struct kvm
  2020-05-05 17:23       ` Andrew Scull
@ 2020-05-05 18:10         ` Marc Zyngier
  0 siblings, 0 replies; 78+ messages in thread
From: Marc Zyngier @ 2020-05-05 18:10 UTC (permalink / raw)
  To: Andrew Scull
  Cc: kvm, Andre Przywara, kvmarm, George Cherian, Zengtao (B),
	Catalin Marinas, Will Deacon, Dave Martin, linux-arm-kernel

On Tue, 05 May 2020 18:23:51 +0100,
Andrew Scull <ascull@google.com> wrote:
> 
> > > > +	/* VTCR_EL2 value for this VM */
> > > > +	u64    vtcr;
> > > 
> > > VTCR seems quite strongly tied to the MMU config. Is it not controlled
> > > independently for the nested MMUs and so remains in this struct?
> > 
> > This particular instance of VTCR_EL2 is the host's version. Which
> > means it describes the virtual HW for the EL1 guest. It constraints,
> > among other things, the number of IPA bits for the guest, for example,
> > and is configured by the VMM.
> > 
> > Once you start nesting, each vcpu has its own VTCR_EL2 which is still
> > constrained by the main one (no nested guest can have a T0SZ bigger
> > than the value imposed by userspace for this guest as a whole).
> > 
> > Does it make sense?
> 
> It does up to my ignorance of the spec in this regard.
> 
> Simliar to James's question, should `vtcr` live inside the mmu struct
> with the top level `kvm::mmu` field containing the host's version and
> the nested mmus containing the nested version of vtcr to be applied to
> the vCPU? I didn't noticed there being a vtcr for the nested version in
> the ~90-patch series so maybe that just isn't something that needs
> thinking about?

They serve two very different purposes. One defines the virtual HW,
the other one is the view that a guest hypervisor gives to its own
guests. The latter is also per-vcpu, and not per VM (yes, NV implies
the "de-facto CnP", for those who remember an intense whiteboard
session).  It thus lives in the system register file (being private to
each vcpu). Another reason is that the HW can directly access the
in-memory version of VTCR_EL2 when ARMv8.4-NV is present.

Thanks,

	M.

-- 
Jazz is not dead, it just smells funny.
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 06/26] arm64: Add level-hinted TLB invalidation helper
  2020-05-05 17:16   ` Andrew Scull
@ 2020-05-06  8:05     ` Marc Zyngier
  0 siblings, 0 replies; 78+ messages in thread
From: Marc Zyngier @ 2020-05-06  8:05 UTC (permalink / raw)
  To: Andrew Scull
  Cc: kvm, Andre Przywara, kvmarm, George Cherian, Zengtao (B),
	Catalin Marinas, Will Deacon, Dave Martin, linux-arm-kernel

On Tue, 5 May 2020 18:16:31 +0100
Andrew Scull <ascull@google.com> wrote:

Hi Andrew,

> > +#define __tlbi_level(op, addr, level)					\
> > +	do {								\
> > +		u64 arg = addr;						\
> > +									\
> > +		if (cpus_have_const_cap(ARM64_HAS_ARMv8_4_TTL) &&	\
> > +		    level) {						\
> > +			u64 ttl = level;				\
> > +									\
> > +			switch (PAGE_SIZE) {				\
> > +			case SZ_4K:					\
> > +				ttl |= 1 << 2;				\
> > +				break;					\
> > +			case SZ_16K:					\
> > +				ttl |= 2 << 2;				\
> > +				break;					\
> > +			case SZ_64K:					\
> > +				ttl |= 3 << 2;				\
> > +				break;					\
> > +			}						\
> > +									\
> > +			arg &= ~TLBI_TTL_MASK;				\
> > +			arg |= FIELD_PREP(TLBI_TTL_MASK, ttl);		\  
> 
> Despite the spec saying both tables apply to TLB maintenance
> instructions that do not apply to a range of addresses I think it only
> means the 4-bit version (bug report to Arm, or I'm on the wrong spec).

I'm not quite sure what you mean by the 4-bit version here. Do you mean
the instructions taking a VA or IPA as input address? In this case,
yes, this macro is solely for the use of the invalidation of a given
translation, identified by a VA/IPA and a level (which is an implicit
TLB size for a given translation granule).

And yes, it looks like there is a bad copy-paste bug in the ARM ARM
(Rev F.a).

> This is consistent with Table D5-53 and the macro takes a single address
> argument to make misuse with range based tlbi less likely.
> 
> It relies on the caller to get the level right and getting it wrong
> could be pretty bad as the spec says all bets are off in that case. Is
> it worth adding a check of the level against the address (seems a bit
> involved), or that it is just 2 bits or adding a short doc comment to
> explain it?

I'd like to avoid checking code at that level, to be honest. If you are
writing TLB invalidation code, you'd better know what you are doing.
Happy to document it a bit more thoroughly though, and set the
expectations that if you misuse the level, you're in for a treat.

> (Looks like we get some constants for the levels in a later patch that
> could be referenced with some form of time travel)

I'm happy to bring these definitions forward, maybe in a more generic
form (they are very S2-specific at the moment).

> 
> > +		}							\
> > +									\
> > +		__tlbi(op,  arg);					\  
> 
> cosmetic nit: double space in here

Well spotted.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 03/26] KVM: arm64: Factor out stage 2 page table data from struct kvm
  2020-05-05 17:59     ` Marc Zyngier
@ 2020-05-06  9:30       ` Marc Zyngier
  0 siblings, 0 replies; 78+ messages in thread
From: Marc Zyngier @ 2020-05-06  9:30 UTC (permalink / raw)
  To: James Morse
  Cc: kvm, Andre Przywara, kvmarm, Will Deacon, George Cherian,
	Zengtao (B),
	Catalin Marinas, Dave Martin, linux-arm-kernel

On Tue, 05 May 2020 18:59:56 +0100
Marc Zyngier <maz@kernel.org> wrote:

Hi James,

> > But accessing VTCR is why the stage2_dissolve_p?d() stuff still
> > needs the kvm pointer, hence the backreference... it might be neater
> > to push the vtcr properties into kvm_s2_mmu that way you could drop
> > the kvm backref, and only things that take vm-wide locks would need
> > the kvm pointer. But I don't think it matters.  
> 
> That's an interesting consideration. I'll have a look.

So I went back on forth on this (the joys of not sleeping), and decided
to keep the host's VTCR_EL2 where it is today (in the kvm structure).
Two reasons for this:

- This field is part of the host configuration. Moving it to the S2 MMU
  structure muddies the waters a bit once you start nesting, as this
  structure really describes an inner guest context. It has its own
  associated VTCR, which lives in the sysreg file, and it becomes a bit
  confusing to look at a kvm_s2_mmu structure in isolation and wonder
  whether this field is directly related to the PTs in this structure,
  or to something else.

- It duplicates state. If there is one thing I have learned over the
  past years, it is that you should keep a given state in one single
  place at all times. Granted, VTCR doesn't change over the lifetime of
  the guest, but still.

I guess the one thing that would push me to the other side of the
debate is if we can show that the amount of pointer chasing generated
by the mmu->kvm->vtcr dance is causing actual performance issues. So
far, I haven't measured such an impact.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 05/26] arm64: Document SW reserved PTE/PMD bits in Stage-2 descriptors
  2020-05-05 15:59   ` Andrew Scull
@ 2020-05-06  9:39     ` Marc Zyngier
  2020-05-06 10:11       ` Andrew Scull
  0 siblings, 1 reply; 78+ messages in thread
From: Marc Zyngier @ 2020-05-06  9:39 UTC (permalink / raw)
  To: Andrew Scull
  Cc: kvm, Andre Przywara, kvmarm, George Cherian, Zengtao (B),
	Catalin Marinas, Will Deacon, Dave Martin, linux-arm-kernel

Hi Andrew,

On 2020-05-05 16:59, Andrew Scull wrote:
> On Wed, Apr 22, 2020 at 01:00:29PM +0100, Marc Zyngier wrote:
>> Advertise bits [58:55] as reserved for SW in the S2 descriptors.
>> 
>> Signed-off-by: Marc Zyngier <maz@kernel.org>
>> ---
>>  arch/arm64/include/asm/pgtable-hwdef.h | 2 ++
>>  1 file changed, 2 insertions(+)
>> 
>> diff --git a/arch/arm64/include/asm/pgtable-hwdef.h 
>> b/arch/arm64/include/asm/pgtable-hwdef.h
>> index 6bf5e650da788..7eab0d23cdb52 100644
>> --- a/arch/arm64/include/asm/pgtable-hwdef.h
>> +++ b/arch/arm64/include/asm/pgtable-hwdef.h
>> @@ -177,10 +177,12 @@
>>  #define PTE_S2_RDONLY		(_AT(pteval_t, 1) << 6)   /* HAP[2:1] */
>>  #define PTE_S2_RDWR		(_AT(pteval_t, 3) << 6)   /* HAP[2:1] */
>>  #define PTE_S2_XN		(_AT(pteval_t, 2) << 53)  /* XN[1:0] */
>> +#define PTE_S2_SW_RESVD		(_AT(pteval_t, 15) << 55) /* Reserved for SW 
>> */
>> 
>>  #define PMD_S2_RDONLY		(_AT(pmdval_t, 1) << 6)   /* HAP[2:1] */
>>  #define PMD_S2_RDWR		(_AT(pmdval_t, 3) << 6)   /* HAP[2:1] */
>>  #define PMD_S2_XN		(_AT(pmdval_t, 2) << 53)  /* XN[1:0] */
>> +#define PMD_S2_SW_RESVD		(_AT(pmdval_t, 15) << 55) /* Reserved for SW 
>> */
>> 
>>  #define PUD_S2_RDONLY		(_AT(pudval_t, 1) << 6)   /* HAP[2:1] */
>>  #define PUD_S2_RDWR		(_AT(pudval_t, 3) << 6)   /* HAP[2:1] */
>> --
>> 2.26.1
>> 
>> _______________________________________________
>> kvmarm mailing list
>> kvmarm@lists.cs.columbia.edu
>> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
> 
> This is consistent with "Attribute fields in stage 1 VMSAv8-64 Block 
> and
> Page descriptors"

Do you mean "stage 2" instead? The reserved bits are the same, but I 
want
to be sure we have looked at the same thing (ARM DDI 0487F.a, D5-2603).

> Reviewed-by: Andrew Scull <ascull@google.com>

Thanks,

          M.
-- 
Jazz is not dead. It just smells funny...
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 05/26] arm64: Document SW reserved PTE/PMD bits in Stage-2 descriptors
  2020-05-06  9:39     ` Marc Zyngier
@ 2020-05-06 10:11       ` Andrew Scull
  0 siblings, 0 replies; 78+ messages in thread
From: Andrew Scull @ 2020-05-06 10:11 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvm, Andre Przywara, kvmarm, George Cherian, Zengtao (B),
	Catalin Marinas, Will Deacon, Dave Martin, linux-arm-kernel

On Wed, May 06, 2020 at 10:39:59AM +0100, Marc Zyngier wrote:
> Hi Andrew,
> 
> On 2020-05-05 16:59, Andrew Scull wrote:
> > On Wed, Apr 22, 2020 at 01:00:29PM +0100, Marc Zyngier wrote:
> > > Advertise bits [58:55] as reserved for SW in the S2 descriptors.
> > > 
> > > Signed-off-by: Marc Zyngier <maz@kernel.org>
> > > ---
> > >  arch/arm64/include/asm/pgtable-hwdef.h | 2 ++
> > >  1 file changed, 2 insertions(+)
> > > 
> > > diff --git a/arch/arm64/include/asm/pgtable-hwdef.h
> > > b/arch/arm64/include/asm/pgtable-hwdef.h
> > > index 6bf5e650da788..7eab0d23cdb52 100644
> > > --- a/arch/arm64/include/asm/pgtable-hwdef.h
> > > +++ b/arch/arm64/include/asm/pgtable-hwdef.h
> > > @@ -177,10 +177,12 @@
> > >  #define PTE_S2_RDONLY		(_AT(pteval_t, 1) << 6)   /* HAP[2:1] */
> > >  #define PTE_S2_RDWR		(_AT(pteval_t, 3) << 6)   /* HAP[2:1] */
> > >  #define PTE_S2_XN		(_AT(pteval_t, 2) << 53)  /* XN[1:0] */
> > > +#define PTE_S2_SW_RESVD		(_AT(pteval_t, 15) << 55) /* Reserved for
> > > SW */
> > > 
> > >  #define PMD_S2_RDONLY		(_AT(pmdval_t, 1) << 6)   /* HAP[2:1] */
> > >  #define PMD_S2_RDWR		(_AT(pmdval_t, 3) << 6)   /* HAP[2:1] */
> > >  #define PMD_S2_XN		(_AT(pmdval_t, 2) << 53)  /* XN[1:0] */
> > > +#define PMD_S2_SW_RESVD		(_AT(pmdval_t, 15) << 55) /* Reserved for
> > > SW */
> > > 
> > >  #define PUD_S2_RDONLY		(_AT(pudval_t, 1) << 6)   /* HAP[2:1] */
> > >  #define PUD_S2_RDWR		(_AT(pudval_t, 3) << 6)   /* HAP[2:1] */
> > > --
> > > 2.26.1
> > > 
> > > _______________________________________________
> > > kvmarm mailing list
> > > kvmarm@lists.cs.columbia.edu
> > > https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
> > 
> > This is consistent with "Attribute fields in stage 1 VMSAv8-64 Block and
> > Page descriptors"
> 
> Do you mean "stage 2" instead? The reserved bits are the same, but I want
> to be sure we have looked at the same thing (ARM DDI 0487F.a, D5-2603).

My turn for the copy-paste bug, yes "Attribute fields in stage 2
VMSAv8-64 Block and Page descriptors". And conviniently the same bits
are reserved for SW in both.
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 01/26] KVM: arm64: Check advertised Stage-2 page size capability
  2020-04-22 12:00 ` [PATCH 01/26] KVM: arm64: Check advertised Stage-2 page size capability Marc Zyngier
  2020-04-22 13:40   ` Suzuki K Poulose
@ 2020-05-07 11:42   ` Alexandru Elisei
  1 sibling, 0 replies; 78+ messages in thread
From: Alexandru Elisei @ 2020-05-07 11:42 UTC (permalink / raw)
  To: Marc Zyngier, linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Dave Martin, George Cherian, Zengtao (B),
	Catalin Marinas, Will Deacon

Hi,

On 4/22/20 1:00 PM, Marc Zyngier wrote:
> With ARMv8.5-GTG, the hardware (or more likely a hypervisor) can
> advertise the supported Stage-2 page sizes.
>
> Let's check this at boot time.
>
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/include/asm/kvm_host.h |  2 +-
>  arch/arm64/include/asm/sysreg.h   |  3 +++
>  arch/arm64/kernel/cpufeature.c    |  8 +++++++
>  arch/arm64/kvm/reset.c            | 40 ++++++++++++++++++++++++++++---
>  virt/kvm/arm/arm.c                |  4 +---
>  5 files changed, 50 insertions(+), 7 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 32c8a675e5a4a..7dd8fefa6aecd 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -670,7 +670,7 @@ static inline int kvm_arm_have_ssbd(void)
>  void kvm_vcpu_load_sysregs(struct kvm_vcpu *vcpu);
>  void kvm_vcpu_put_sysregs(struct kvm_vcpu *vcpu);
>  
> -void kvm_set_ipa_limit(void);
> +int kvm_set_ipa_limit(void);
>  
>  #define __KVM_HAVE_ARCH_VM_ALLOC
>  struct kvm *kvm_arch_alloc_vm(void);
> diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
> index ebc6224328318..5d10c9148e844 100644
> --- a/arch/arm64/include/asm/sysreg.h
> +++ b/arch/arm64/include/asm/sysreg.h
> @@ -686,6 +686,9 @@
>  #define ID_AA64ZFR0_SVEVER_SVE2		0x1
>  
>  /* id_aa64mmfr0 */
> +#define ID_AA64MMFR0_TGRAN4_2_SHIFT	40
> +#define ID_AA64MMFR0_TGRAN64_2_SHIFT	36
> +#define ID_AA64MMFR0_TGRAN16_2_SHIFT	32
>  #define ID_AA64MMFR0_TGRAN4_SHIFT	28
>  #define ID_AA64MMFR0_TGRAN64_SHIFT	24
>  #define ID_AA64MMFR0_TGRAN16_SHIFT	20
> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> index 9fac745aa7bb2..9892a845d06c9 100644
> --- a/arch/arm64/kernel/cpufeature.c
> +++ b/arch/arm64/kernel/cpufeature.c
> @@ -208,6 +208,14 @@ static const struct arm64_ftr_bits ftr_id_aa64zfr0[] = {
>  };
>  
>  static const struct arm64_ftr_bits ftr_id_aa64mmfr0[] = {
> +	/*
> +	 * Page size not being supported at Stage-2 are not fatal. You

s/are not fatal/is not fatal

> +	 * just give up KVM if PAGE_SIZE isn't supported there. Go fix
> +	 * your favourite nesting hypervisor.
> +	 */
> +	ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_EXACT, ID_AA64MMFR0_TGRAN4_2_SHIFT, 4, 1),
> +	ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_EXACT, ID_AA64MMFR0_TGRAN64_2_SHIFT, 4, 1),
> +	ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_EXACT, ID_AA64MMFR0_TGRAN16_2_SHIFT, 4, 1),
>  	/*
>  	 * We already refuse to boot CPUs that don't support our configured
>  	 * page size, so we can only detect mismatches for a page size other
> diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
> index 30b7ea680f66c..241db35a7ef4f 100644
> --- a/arch/arm64/kvm/reset.c
> +++ b/arch/arm64/kvm/reset.c
> @@ -9,6 +9,7 @@
>   */
>  
>  #include <linux/errno.h>
> +#include <linux/bitfield.h>
>  #include <linux/kernel.h>
>  #include <linux/kvm_host.h>
>  #include <linux/kvm.h>
> @@ -340,11 +341,42 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
>  	return ret;
>  }
>  
> -void kvm_set_ipa_limit(void)
> +int kvm_set_ipa_limit(void)
>  {
> -	unsigned int ipa_max, pa_max, va_max, parange;
> +	unsigned int ipa_max, pa_max, va_max, parange, tgran_2;
> +	u64 mmfr0 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1);
>  
> -	parange = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1) & 0x7;
> +	/*
> +	 * Check with ARMv8.5-GTG that our PAGE_SIZE is supported at
> +	 * Stage-2. If not, things will stop very quickly.
> +	 */
> +	switch (PAGE_SIZE) {
> +	default:
> +	case SZ_4K:
> +		tgran_2 = ID_AA64MMFR0_TGRAN4_2_SHIFT;
> +		break;
> +	case SZ_16K:
> +		tgran_2 = ID_AA64MMFR0_TGRAN16_2_SHIFT;
> +		break;
> +	case SZ_64K:
> +		tgran_2 = ID_AA64MMFR0_TGRAN64_2_SHIFT;
> +		break;
> +	}
> +
> +	switch (FIELD_GET(0xFUL << tgran_2, mmfr0)) {
> +	default:
> +	case 1:
> +		kvm_err("PAGE_SIZE not supported at Stage-2, giving up\n");
> +		return -EINVAL;
> +	case 0:
> +		kvm_debug("PAGE_SIZE supported at Stage-2 (default)\n");
> +		break;
> +	case 2:
> +		kvm_debug("PAGE_SIZE supported at Stage-2 (advertised)\n");
> +		break;
> +	}
> +
> +	parange = mmfr0 & 0x7;
>  	pa_max = id_aa64mmfr0_parange_to_phys_shift(parange);
>  
>  	/* Clamp the IPA limit to the PA size supported by the kernel */
> @@ -378,6 +410,8 @@ void kvm_set_ipa_limit(void)
>  	     "KVM IPA limit (%d bit) is smaller than default size\n", ipa_max);
>  	kvm_ipa_limit = ipa_max;
>  	kvm_info("IPA Size Limit: %dbits\n", kvm_ipa_limit);
> +
> +	return 0;
>  }
>  
>  /*
> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> index 48d0ec44ad77e..53b3ba9173ba7 100644
> --- a/virt/kvm/arm/arm.c
> +++ b/virt/kvm/arm/arm.c
> @@ -1387,9 +1387,7 @@ static inline void hyp_cpu_pm_exit(void)
>  
>  static int init_common_resources(void)
>  {
> -	kvm_set_ipa_limit();
> -
> -	return 0;
> +	return kvm_set_ipa_limit();
>  }
>  
>  static int init_subsystems(void)

For what is worth, I've taken a look at the ARMv8.5-GTG spec and your patch looks
fine to me:

Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>

Thanks,
Alex
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 07/26] KVM: arm64: Add a level hint to __kvm_tlb_flush_vmid_ipa
  2020-04-22 12:00 ` [PATCH 07/26] KVM: arm64: Add a level hint to __kvm_tlb_flush_vmid_ipa Marc Zyngier
@ 2020-05-07 15:08   ` Andrew Scull
  2020-05-07 15:13     ` Marc Zyngier
  0 siblings, 1 reply; 78+ messages in thread
From: Andrew Scull @ 2020-05-07 15:08 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvm, Andre Przywara, kvmarm, George Cherian, Zengtao (B),
	Catalin Marinas, Will Deacon, Dave Martin, linux-arm-kernel

> -void __hyp_text __kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu, phys_addr_t ipa)
> +void __hyp_text __kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu,
> +					 phys_addr_t ipa, int level)

The level feels like a good opportunity for an enum to add some
documentation from the type.

>  static void kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu, phys_addr_t ipa)
>  {
> -	kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, ipa);
> +	kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, ipa, 0);

With the constants from the next patch brought forward, the magic 0 can
be given a name, although it's very temporary so..

Otherwise, looks good.
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 08/26] KVM: arm64: Use TTL hint in when invalidating stage-2 translations
  2020-04-22 12:00 ` [PATCH 08/26] KVM: arm64: Use TTL hint in when invalidating stage-2 translations Marc Zyngier
@ 2020-05-07 15:13   ` Andrew Scull
  2020-05-12 12:04     ` James Morse
  2020-05-12 17:26   ` James Morse
  1 sibling, 1 reply; 78+ messages in thread
From: Andrew Scull @ 2020-05-07 15:13 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvm, Andre Przywara, kvmarm, George Cherian, Zengtao (B),
	Catalin Marinas, Will Deacon, Dave Martin, linux-arm-kernel

> @@ -176,7 +177,7 @@ static void clear_stage2_pud_entry(struct kvm_s2_mmu *mmu, pud_t *pud, phys_addr
>  	pmd_t *pmd_table __maybe_unused = stage2_pmd_offset(kvm, pud, 0);
>  	VM_BUG_ON(stage2_pud_huge(kvm, *pud));
>  	stage2_pud_clear(kvm, pud);
> -	kvm_tlb_flush_vmid_ipa(mmu, addr);
> +	kvm_tlb_flush_vmid_ipa(mmu, addr, S2_NO_LEVEL_HINT);
>  	stage2_pmd_free(kvm, pmd_table);
>  	put_page(virt_to_page(pud));
>  }
> @@ -186,7 +187,7 @@ static void clear_stage2_pmd_entry(struct kvm_s2_mmu *mmu, pmd_t *pmd, phys_addr
>  	pte_t *pte_table = pte_offset_kernel(pmd, 0);
>  	VM_BUG_ON(pmd_thp_or_huge(*pmd));
>  	pmd_clear(pmd);
> -	kvm_tlb_flush_vmid_ipa(mmu, addr);
> +	kvm_tlb_flush_vmid_ipa(mmu, addr, S2_NO_LEVEL_HINT);
>  	free_page((unsigned long)pte_table);
>  	put_page(virt_to_page(pmd));
>  }

Going by the names, is it possible to give a better level hint for these
cases?
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 07/26] KVM: arm64: Add a level hint to __kvm_tlb_flush_vmid_ipa
  2020-05-07 15:08   ` Andrew Scull
@ 2020-05-07 15:13     ` Marc Zyngier
  0 siblings, 0 replies; 78+ messages in thread
From: Marc Zyngier @ 2020-05-07 15:13 UTC (permalink / raw)
  To: Andrew Scull
  Cc: kvm, Andre Przywara, kvmarm, George Cherian, Zengtao (B),
	Catalin Marinas, Will Deacon, Dave Martin, linux-arm-kernel

On 2020-05-07 16:08, Andrew Scull wrote:
>> -void __hyp_text __kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu, 
>> phys_addr_t ipa)
>> +void __hyp_text __kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu,
>> +					 phys_addr_t ipa, int level)
> 
> The level feels like a good opportunity for an enum to add some
> documentation from the type.

Sure, why not. I'll give it a go.

> 
>>  static void kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu, 
>> phys_addr_t ipa)
>>  {
>> -	kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, ipa);
>> +	kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, ipa, 0);
> 
> With the constants from the next patch brought forward, the magic 0 can
> be given a name, although it's very temporary so..

Yup. To the point where I've now squashed this patch and the following
one together, and moved the constants to the previous patch.

> Otherwise, looks good.

Thanks,

         M.
-- 
Jazz is not dead. It just smells funny...
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 09/26] KVM: arm64: vgic-v3: Take cpu_if pointer directly instead of vcpu
  2020-04-22 12:00 ` [PATCH 09/26] KVM: arm64: vgic-v3: Take cpu_if pointer directly instead of vcpu Marc Zyngier
@ 2020-05-07 16:26   ` James Morse
  2020-05-08 12:20     ` Marc Zyngier
  0 siblings, 1 reply; 78+ messages in thread
From: James Morse @ 2020-05-07 16:26 UTC (permalink / raw)
  To: Marc Zyngier, kvmarm
  Cc: kvm, Andre Przywara, Dave Martin, Will Deacon, George Cherian,
	Zengtao (B),
	Catalin Marinas, linux-arm-kernel

Hi Marc, Christoffer,

On 22/04/2020 13:00, Marc Zyngier wrote:
> From: Christoffer Dall <christoffer.dall@arm.com>
> 
> If we move the used_lrs field to the version-specific cpu interface
> structure, the following functions only operate on the struct
> vgic_v3_cpu_if and not the full vcpu:
> 
>   __vgic_v3_save_state
>   __vgic_v3_restore_state
>   __vgic_v3_activate_traps
>   __vgic_v3_deactivate_traps
>   __vgic_v3_save_aprs
>   __vgic_v3_restore_aprs
> 
> This is going to be very useful for nested virt, 

... because you don't need to consider whether the vcpu is running in vEL2?


> so move the used_lrs
> field and change the prototypes and implementations of these functions to
> take the cpu_if parameter directly.


> No functional change.

Looks like no change!

Reviewed-by: James Morse <james.morse@arm.com>


Thanks,

James

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 09/26] KVM: arm64: vgic-v3: Take cpu_if pointer directly instead of vcpu
  2020-05-07 16:26   ` James Morse
@ 2020-05-08 12:20     ` Marc Zyngier
  0 siblings, 0 replies; 78+ messages in thread
From: Marc Zyngier @ 2020-05-08 12:20 UTC (permalink / raw)
  To: James Morse
  Cc: kvm, Andre Przywara, kvmarm, Will Deacon, George Cherian,
	Zengtao (B),
	Catalin Marinas, Dave Martin, linux-arm-kernel

On Thu, 7 May 2020 17:26:06 +0100
James Morse <james.morse@arm.com> wrote:

Hi James,

> Hi Marc, Christoffer,
> 
> On 22/04/2020 13:00, Marc Zyngier wrote:
> > From: Christoffer Dall <christoffer.dall@arm.com>
> > 
> > If we move the used_lrs field to the version-specific cpu interface
> > structure, the following functions only operate on the struct
> > vgic_v3_cpu_if and not the full vcpu:
> > 
> >   __vgic_v3_save_state
> >   __vgic_v3_restore_state
> >   __vgic_v3_activate_traps
> >   __vgic_v3_deactivate_traps
> >   __vgic_v3_save_aprs
> >   __vgic_v3_restore_aprs
> > 
> > This is going to be very useful for nested virt,   
> 
> ... because you don't need to consider whether the vcpu is running in vEL2?

That's one of the reasons, as vEL2 is still EL1.

But things become really fun when you run a L2 guest, which is an EL1
guest from the PoV of the guest hypervisor (aka L1). At this stage
what you feed to the HW is not the state that could be populated by L0
for L1, but instead what L1 has created for L2 (with a bit of
additional repainting to adjust some of the interrupt numbers).

So we build a shadow cpu_if on the fly and pass it to the normal vgic
handling functions. I told you it was fun! ;-)

> > so move the used_lrs
> > field and change the prototypes and implementations of these functions to
> > take the cpu_if parameter directly.  
> 
> 
> > No functional change.  
> 
> Looks like no change!
> 
> Reviewed-by: James Morse <james.morse@arm.com>

Thanks!

	M.
-- 
Jazz is not dead. It just smells funny...
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 03/26] KVM: arm64: Factor out stage 2 page table data from struct kvm
  2020-04-22 12:00 ` [PATCH 03/26] KVM: arm64: Factor out stage 2 page table data from struct kvm Marc Zyngier
  2020-05-05 15:26   ` Andrew Scull
  2020-05-05 16:03   ` James Morse
@ 2020-05-11 16:38   ` Alexandru Elisei
  2020-05-12 11:17     ` James Morse
  2 siblings, 1 reply; 78+ messages in thread
From: Alexandru Elisei @ 2020-05-11 16:38 UTC (permalink / raw)
  To: Marc Zyngier, linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Dave Martin, George Cherian, Zengtao (B),
	Catalin Marinas, Will Deacon

Hi,

On 4/22/20 1:00 PM, Marc Zyngier wrote:
> From: Christoffer Dall <christoffer.dall@arm.com>
>
> As we are about to reuse our stage 2 page table manipulation code for
> shadow stage 2 page tables in the context of nested virtualization, we
> are going to manage multiple stage 2 page tables for a single VM.
>
> This requires some pretty invasive changes to our data structures,
> which moves the vmid and pgd pointers into a separate structure and
> change pretty much all of our mmu code to operate on this structure
> instead.
>
> The new structure is called struct kvm_s2_mmu.
>
> There is no intended functional change by this patch alone.
>
> [Designed data structure layout in collaboration]
> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
> Co-developed-by: Marc Zyngier <maz@kernel.org>
> [maz: Moved the last_vcpu_ran down to the S2 MMU structure as well]
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/include/asm/kvm_asm.h  |   5 +-
>  arch/arm64/include/asm/kvm_host.h |  30 +++-
>  arch/arm64/include/asm/kvm_mmu.h  |  16 +-
>  arch/arm64/kvm/hyp/switch.c       |   8 +-
>  arch/arm64/kvm/hyp/tlb.c          |  48 +++---
>  virt/kvm/arm/arm.c                |  32 +---
>  virt/kvm/arm/mmu.c                | 266 +++++++++++++++++-------------
>  7 files changed, 219 insertions(+), 186 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
> index 7c7eeeaab9faa..5adf4e1a4c2c9 100644
> --- a/arch/arm64/include/asm/kvm_asm.h
> +++ b/arch/arm64/include/asm/kvm_asm.h
> @@ -53,6 +53,7 @@
>  
>  struct kvm;
>  struct kvm_vcpu;
> +struct kvm_s2_mmu;
>  
>  extern char __kvm_hyp_init[];
>  extern char __kvm_hyp_init_end[];
> @@ -60,8 +61,8 @@ extern char __kvm_hyp_init_end[];
>  extern char __kvm_hyp_vector[];
>  
>  extern void __kvm_flush_vm_context(void);
> -extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa);
> -extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
> +extern void __kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu, phys_addr_t ipa);
> +extern void __kvm_tlb_flush_vmid(struct kvm_s2_mmu *mmu);
>  extern void __kvm_tlb_flush_local_vmid(struct kvm_vcpu *vcpu);
>  
>  extern void __kvm_timer_set_cntvoff(u32 cntvoff_low, u32 cntvoff_high);
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 7dd8fefa6aecd..664a5d92ae9b8 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -63,19 +63,32 @@ struct kvm_vmid {
>  	u32    vmid;
>  };
>  
> -struct kvm_arch {
> +struct kvm_s2_mmu {
>  	struct kvm_vmid vmid;
>  
> -	/* stage2 entry level table */
> -	pgd_t *pgd;
> -	phys_addr_t pgd_phys;
> -
> -	/* VTCR_EL2 value for this VM */
> -	u64    vtcr;
> +	/*
> +	 * stage2 entry level table
> +	 *
> +	 * Two kvm_s2_mmu structures in the same VM can point to the same pgd
> +	 * here.  This happens when running a non-VHE guest hypervisor which
> +	 * uses the canonical stage 2 page table for both vEL2 and for vEL1/0
> +	 * with vHCR_EL2.VM == 0.

It makes more sense to me to say that a non-VHE guest hypervisor will use the
canonical stage *1* page table when running at EL2 (the "Non-secure EL2
translation regime" as ARM DDI 0487F.b calls it on page D5-2543). I think that's
the only situation where vEL2 and vEL1&0 will use the same L0 stage 2 tables. It's
been quite some time since I reviewed the initial version of the NV patches, did I
get that wrong?

> +	 */
> +	pgd_t		*pgd;
> +	phys_addr_t	pgd_phys;
>  
>  	/* The last vcpu id that ran on each physical CPU */
>  	int __percpu *last_vcpu_ran;

It makes sense for the other fields to be part of kvm_s2_mmu, but I'm struggling
to figure out why last_vcpu_ran is here. Would you mind sharing the rationale? I
don't see this change in v1 or v2 of the NV series.

More below.

>  
> +	struct kvm *kvm;
> +};
> +
> +struct kvm_arch {
> +	struct kvm_s2_mmu mmu;
> +
> +	/* VTCR_EL2 value for this VM */
> +	u64    vtcr;
> +
>  	/* The maximum number of vCPUs depends on the used GIC model */
>  	int max_vcpus;
>  
> @@ -255,6 +268,9 @@ struct kvm_vcpu_arch {
>  	void *sve_state;
>  	unsigned int sve_max_vl;
>  
> +	/* Stage 2 paging state used by the hardware on next switch */
> +	struct kvm_s2_mmu *hw_mmu;
> +
>  	/* HYP configuration */
>  	u64 hcr_el2;
>  	u32 mdcr_el2;
> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> index 5ba1310639ec6..c6c8eee008d66 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -154,8 +154,8 @@ int create_hyp_exec_mappings(phys_addr_t phys_addr, size_t size,
>  void free_hyp_pgds(void);
>  
>  void stage2_unmap_vm(struct kvm *kvm);
> -int kvm_alloc_stage2_pgd(struct kvm *kvm);
> -void kvm_free_stage2_pgd(struct kvm *kvm);
> +int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu);
> +void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu);
>  int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
>  			  phys_addr_t pa, unsigned long size, bool writable);
>  
> @@ -593,13 +593,13 @@ static inline u64 kvm_vttbr_baddr_mask(struct kvm *kvm)
>  	return vttbr_baddr_mask(kvm_phys_shift(kvm), kvm_stage2_levels(kvm));
>  }
>  
> -static __always_inline u64 kvm_get_vttbr(struct kvm *kvm)
> +static __always_inline u64 kvm_get_vttbr(struct kvm_s2_mmu *mmu)
>  {
> -	struct kvm_vmid *vmid = &kvm->arch.vmid;
> +	struct kvm_vmid *vmid = &mmu->vmid;
>  	u64 vmid_field, baddr;
>  	u64 cnp = system_supports_cnp() ? VTTBR_CNP_BIT : 0;
>  
> -	baddr = kvm->arch.pgd_phys;
> +	baddr = mmu->pgd_phys;
>  	vmid_field = (u64)vmid->vmid << VTTBR_VMID_SHIFT;
>  	return kvm_phys_to_vttbr(baddr) | vmid_field | cnp;
>  }
> @@ -608,10 +608,10 @@ static __always_inline u64 kvm_get_vttbr(struct kvm *kvm)
>   * Must be called from hyp code running at EL2 with an updated VTTBR
>   * and interrupts disabled.
>   */
> -static __always_inline void __load_guest_stage2(struct kvm *kvm)
> +static __always_inline void __load_guest_stage2(struct kvm_s2_mmu *mmu)
>  {
> -	write_sysreg(kvm->arch.vtcr, vtcr_el2);
> -	write_sysreg(kvm_get_vttbr(kvm), vttbr_el2);
> +	write_sysreg(kern_hyp_va(mmu->kvm)->arch.vtcr, vtcr_el2);
> +	write_sysreg(kvm_get_vttbr(mmu), vttbr_el2);
>  
>  	/*
>  	 * ARM erratum 1165522 requires the actual execution of the above
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index 8a1e81a400e0f..d79319038b119 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -256,9 +256,9 @@ void deactivate_traps_vhe_put(void)
>  	__deactivate_traps_common();
>  }
>  
> -static void __hyp_text __activate_vm(struct kvm *kvm)
> +static void __hyp_text __activate_vm(struct kvm_s2_mmu *mmu)
>  {
> -	__load_guest_stage2(kvm);
> +	__load_guest_stage2(mmu);
>  }
>  
>  static void __hyp_text __deactivate_vm(struct kvm_vcpu *vcpu)
> @@ -659,7 +659,7 @@ static int __kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
>  	 * stage 2 translation, and __activate_traps clear HCR_EL2.TGE
>  	 * (among other things).
>  	 */
> -	__activate_vm(vcpu->kvm);
> +	__activate_vm(vcpu->arch.hw_mmu);
>  	__activate_traps(vcpu);
>  
>  	sysreg_restore_guest_state_vhe(guest_ctxt);
> @@ -766,7 +766,7 @@ int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu)
>  	__sysreg32_restore_state(vcpu);
>  	__sysreg_restore_state_nvhe(guest_ctxt);
>  
> -	__activate_vm(kern_hyp_va(vcpu->kvm));
> +	__activate_vm(kern_hyp_va(vcpu->arch.hw_mmu));
>  	__activate_traps(vcpu);
>  
>  	__hyp_vgic_restore_state(vcpu);
> diff --git a/arch/arm64/kvm/hyp/tlb.c b/arch/arm64/kvm/hyp/tlb.c
> index ceaddbe4279f9..b795891b0d091 100644
> --- a/arch/arm64/kvm/hyp/tlb.c
> +++ b/arch/arm64/kvm/hyp/tlb.c
> @@ -16,7 +16,7 @@ struct tlb_inv_context {
>  	u64		sctlr;
>  };
>  
> -static void __hyp_text __tlb_switch_to_guest_vhe(struct kvm *kvm,
> +static void __hyp_text __tlb_switch_to_guest_vhe(struct kvm_s2_mmu *mmu,
>  						 struct tlb_inv_context *cxt)
>  {
>  	u64 val;
> @@ -53,14 +53,14 @@ static void __hyp_text __tlb_switch_to_guest_vhe(struct kvm *kvm,
>  	 * place before clearing TGE. __load_guest_stage2() already
>  	 * has an ISB in order to deal with this.
>  	 */
> -	__load_guest_stage2(kvm);
> +	__load_guest_stage2(mmu);
>  	val = read_sysreg(hcr_el2);
>  	val &= ~HCR_TGE;
>  	write_sysreg(val, hcr_el2);
>  	isb();
>  }
>  
> -static void __hyp_text __tlb_switch_to_guest_nvhe(struct kvm *kvm,
> +static void __hyp_text __tlb_switch_to_guest_nvhe(struct kvm_s2_mmu *mmu,
>  						  struct tlb_inv_context *cxt)
>  {
>  	if (cpus_have_final_cap(ARM64_WORKAROUND_SPECULATIVE_AT_NVHE)) {
> @@ -79,21 +79,20 @@ static void __hyp_text __tlb_switch_to_guest_nvhe(struct kvm *kvm,
>  		isb();
>  	}
>  
> -	__load_guest_stage2(kvm);
> +	__load_guest_stage2(mmu);
>  	isb();
>  }
>  
> -static void __hyp_text __tlb_switch_to_guest(struct kvm *kvm,
> +static void __hyp_text __tlb_switch_to_guest(struct kvm_s2_mmu *mmu,
>  					     struct tlb_inv_context *cxt)
>  {
>  	if (has_vhe())
> -		__tlb_switch_to_guest_vhe(kvm, cxt);
> +		__tlb_switch_to_guest_vhe(mmu, cxt);
>  	else
> -		__tlb_switch_to_guest_nvhe(kvm, cxt);
> +		__tlb_switch_to_guest_nvhe(mmu, cxt);
>  }
>  
> -static void __hyp_text __tlb_switch_to_host_vhe(struct kvm *kvm,
> -						struct tlb_inv_context *cxt)
> +static void __hyp_text __tlb_switch_to_host_vhe(struct tlb_inv_context *cxt)
>  {
>  	/*
>  	 * We're done with the TLB operation, let's restore the host's
> @@ -112,8 +111,7 @@ static void __hyp_text __tlb_switch_to_host_vhe(struct kvm *kvm,
>  	local_irq_restore(cxt->flags);
>  }
>  
> -static void __hyp_text __tlb_switch_to_host_nvhe(struct kvm *kvm,
> -						 struct tlb_inv_context *cxt)
> +static void __hyp_text __tlb_switch_to_host_nvhe(struct tlb_inv_context *cxt)
>  {
>  	write_sysreg(0, vttbr_el2);
>  
> @@ -125,24 +123,24 @@ static void __hyp_text __tlb_switch_to_host_nvhe(struct kvm *kvm,
>  	}
>  }
>  
> -static void __hyp_text __tlb_switch_to_host(struct kvm *kvm,
> +static void __hyp_text __tlb_switch_to_host(struct kvm_s2_mmu *mmu,
>  					    struct tlb_inv_context *cxt)
>  {
>  	if (has_vhe())
> -		__tlb_switch_to_host_vhe(kvm, cxt);
> +		__tlb_switch_to_host_vhe(cxt);
>  	else
> -		__tlb_switch_to_host_nvhe(kvm, cxt);
> +		__tlb_switch_to_host_nvhe(cxt);
>  }
>  
> -void __hyp_text __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa)
> +void __hyp_text __kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu, phys_addr_t ipa)
>  {
>  	struct tlb_inv_context cxt;
>  
>  	dsb(ishst);
>  
>  	/* Switch to requested VMID */
> -	kvm = kern_hyp_va(kvm);
> -	__tlb_switch_to_guest(kvm, &cxt);
> +	mmu = kern_hyp_va(mmu);
> +	__tlb_switch_to_guest(mmu, &cxt);
>  
>  	/*
>  	 * We could do so much better if we had the VA as well.
> @@ -185,39 +183,39 @@ void __hyp_text __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa)
>  	if (!has_vhe() && icache_is_vpipt())
>  		__flush_icache_all();
>  
> -	__tlb_switch_to_host(kvm, &cxt);
> +	__tlb_switch_to_host(mmu, &cxt);
>  }
>  
> -void __hyp_text __kvm_tlb_flush_vmid(struct kvm *kvm)
> +void __hyp_text __kvm_tlb_flush_vmid(struct kvm_s2_mmu *mmu)
>  {
>  	struct tlb_inv_context cxt;
>  
>  	dsb(ishst);
>  
>  	/* Switch to requested VMID */
> -	kvm = kern_hyp_va(kvm);
> -	__tlb_switch_to_guest(kvm, &cxt);
> +	mmu = kern_hyp_va(mmu);
> +	__tlb_switch_to_guest(mmu, &cxt);
>  
>  	__tlbi(vmalls12e1is);
>  	dsb(ish);
>  	isb();
>  
> -	__tlb_switch_to_host(kvm, &cxt);
> +	__tlb_switch_to_host(mmu, &cxt);
>  }
>  
>  void __hyp_text __kvm_tlb_flush_local_vmid(struct kvm_vcpu *vcpu)
>  {
> -	struct kvm *kvm = kern_hyp_va(kern_hyp_va(vcpu)->kvm);
> +	struct kvm_s2_mmu *mmu = kern_hyp_va(kern_hyp_va(vcpu)->arch.hw_mmu);
>  	struct tlb_inv_context cxt;
>  
>  	/* Switch to requested VMID */
> -	__tlb_switch_to_guest(kvm, &cxt);
> +	__tlb_switch_to_guest(mmu, &cxt);
>  
>  	__tlbi(vmalle1);
>  	dsb(nsh);
>  	isb();
>  
> -	__tlb_switch_to_host(kvm, &cxt);
> +	__tlb_switch_to_host(mmu, &cxt);
>  }
>  
>  void __hyp_text __kvm_flush_vm_context(void)
> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> index 53b3ba9173ba7..03f01fcfa2bd5 100644
> --- a/virt/kvm/arm/arm.c
> +++ b/virt/kvm/arm/arm.c

There's a comment that still mentions arch.vmid that you missed in this file:

static bool need_new_vmid_gen(struct kvm_vmid *vmid)
{
    u64 current_vmid_gen = atomic64_read(&kvm_vmid_gen);
    smp_rmb(); /* Orders read of kvm_vmid_gen and kvm->arch.vmid */

> @@ -101,22 +101,15 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
>   */
>  int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>  {
> -	int ret, cpu;
> +	int ret;
>  
>  	ret = kvm_arm_setup_stage2(kvm, type);
>  	if (ret)
>  		return ret;
>  
> -	kvm->arch.last_vcpu_ran = alloc_percpu(typeof(*kvm->arch.last_vcpu_ran));
> -	if (!kvm->arch.last_vcpu_ran)
> -		return -ENOMEM;
> -
> -	for_each_possible_cpu(cpu)
> -		*per_cpu_ptr(kvm->arch.last_vcpu_ran, cpu) = -1;
> -
> -	ret = kvm_alloc_stage2_pgd(kvm);
> +	ret = kvm_init_stage2_mmu(kvm, &kvm->arch.mmu);
>  	if (ret)
> -		goto out_fail_alloc;
> +		return ret;
>  
>  	ret = create_hyp_mappings(kvm, kvm + 1, PAGE_HYP);
>  	if (ret)
> @@ -124,19 +117,13 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>  
>  	kvm_vgic_early_init(kvm);
>  
> -	/* Mark the initial VMID generation invalid */
> -	kvm->arch.vmid.vmid_gen = 0;
> -
>  	/* The maximum number of VCPUs is limited by the host's GIC model */
>  	kvm->arch.max_vcpus = vgic_present ?
>  				kvm_vgic_get_max_vcpus() : KVM_MAX_VCPUS;
>  
>  	return ret;
>  out_free_stage2_pgd:
> -	kvm_free_stage2_pgd(kvm);
> -out_fail_alloc:
> -	free_percpu(kvm->arch.last_vcpu_ran);
> -	kvm->arch.last_vcpu_ran = NULL;
> +	kvm_free_stage2_pgd(&kvm->arch.mmu);
>  	return ret;
>  }
>  
> @@ -161,9 +148,6 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
>  
>  	kvm_vgic_destroy(kvm);
>  
> -	free_percpu(kvm->arch.last_vcpu_ran);
> -	kvm->arch.last_vcpu_ran = NULL;
> -
>  	for (i = 0; i < KVM_MAX_VCPUS; ++i) {
>  		if (kvm->vcpus[i]) {
>  			kvm_vcpu_destroy(kvm->vcpus[i]);
> @@ -279,6 +263,8 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
>  
>  	kvm_arm_pvtime_vcpu_init(&vcpu->arch);
>  
> +	vcpu->arch.hw_mmu = &vcpu->kvm->arch.mmu;
> +
>  	err = kvm_vgic_vcpu_init(vcpu);
>  	if (err)
>  		return err;
> @@ -337,7 +323,7 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>  	int *last_ran;
>  	kvm_host_data_t *cpu_data;
>  
> -	last_ran = this_cpu_ptr(vcpu->kvm->arch.last_vcpu_ran);
> +	last_ran = this_cpu_ptr(vcpu->arch.hw_mmu->last_vcpu_ran);
>  	cpu_data = this_cpu_ptr(&kvm_host_data);
>  
>  	/*
> @@ -679,7 +665,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  		 */
>  		cond_resched();
>  
> -		update_vmid(&vcpu->kvm->arch.vmid);
> +		update_vmid(&vcpu->arch.hw_mmu->vmid);
>  
>  		check_vcpu_requests(vcpu);
>  
> @@ -728,7 +714,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  		 */
>  		smp_store_mb(vcpu->mode, IN_GUEST_MODE);
>  
> -		if (ret <= 0 || need_new_vmid_gen(&vcpu->kvm->arch.vmid) ||
> +		if (ret <= 0 || need_new_vmid_gen(&vcpu->arch.hw_mmu->vmid) ||
>  		    kvm_request_pending(vcpu)) {
>  			vcpu->mode = OUTSIDE_GUEST_MODE;
>  			isb(); /* Ensure work in x_flush_hwstate is committed */
> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> index e3b9ee268823b..2f99749048285 100644
> --- a/virt/kvm/arm/mmu.c
> +++ b/virt/kvm/arm/mmu.c
> @@ -55,12 +55,12 @@ static bool memslot_is_logging(struct kvm_memory_slot *memslot)
>   */
>  void kvm_flush_remote_tlbs(struct kvm *kvm)
>  {
> -	kvm_call_hyp(__kvm_tlb_flush_vmid, kvm);
> +	kvm_call_hyp(__kvm_tlb_flush_vmid, &kvm->arch.mmu);
>  }
>  
> -static void kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa)
> +static void kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu, phys_addr_t ipa)
>  {
> -	kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, kvm, ipa);
> +	kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, ipa);
>  }
>  
>  /*
> @@ -96,31 +96,33 @@ static bool kvm_is_device_pfn(unsigned long pfn)
>   *
>   * Function clears a PMD entry, flushes addr 1st and 2nd stage TLBs.
>   */
> -static void stage2_dissolve_pmd(struct kvm *kvm, phys_addr_t addr, pmd_t *pmd)
> +static void stage2_dissolve_pmd(struct kvm_s2_mmu *mmu, phys_addr_t addr, pmd_t *pmd)
>  {
>  	if (!pmd_thp_or_huge(*pmd))
>  		return;
>  
>  	pmd_clear(pmd);
> -	kvm_tlb_flush_vmid_ipa(kvm, addr);
> +	kvm_tlb_flush_vmid_ipa(mmu, addr);
>  	put_page(virt_to_page(pmd));
>  }
>  
>  /**
>   * stage2_dissolve_pud() - clear and flush huge PUD entry
> - * @kvm:	pointer to kvm structure.
> + * @mmu:	pointer to mmu structure to operate on
>   * @addr:	IPA
>   * @pud:	pud pointer for IPA
>   *
>   * Function clears a PUD entry, flushes addr 1st and 2nd stage TLBs.
>   */
> -static void stage2_dissolve_pud(struct kvm *kvm, phys_addr_t addr, pud_t *pudp)
> +static void stage2_dissolve_pud(struct kvm_s2_mmu *mmu, phys_addr_t addr, pud_t *pudp)
>  {
> +	struct kvm *kvm __maybe_unused = mmu->kvm;
> +
>  	if (!stage2_pud_huge(kvm, *pudp))
>  		return;
>  
>  	stage2_pud_clear(kvm, pudp);
> -	kvm_tlb_flush_vmid_ipa(kvm, addr);
> +	kvm_tlb_flush_vmid_ipa(mmu, addr);
>  	put_page(virt_to_page(pudp));
>  }
>  
> @@ -156,31 +158,35 @@ static void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc)
>  	return p;
>  }
>  
> -static void clear_stage2_pgd_entry(struct kvm *kvm, pgd_t *pgd, phys_addr_t addr)
> +static void clear_stage2_pgd_entry(struct kvm_s2_mmu *mmu, pgd_t *pgd, phys_addr_t addr)
>  {
> +	struct kvm *kvm __maybe_unused = mmu->kvm;
> +
>  	pud_t *pud_table __maybe_unused = stage2_pud_offset(kvm, pgd, 0UL);
>  	stage2_pgd_clear(kvm, pgd);
> -	kvm_tlb_flush_vmid_ipa(kvm, addr);
> +	kvm_tlb_flush_vmid_ipa(mmu, addr);
>  	stage2_pud_free(kvm, pud_table);
>  	put_page(virt_to_page(pgd));
>  }
>  
> -static void clear_stage2_pud_entry(struct kvm *kvm, pud_t *pud, phys_addr_t addr)
> +static void clear_stage2_pud_entry(struct kvm_s2_mmu *mmu, pud_t *pud, phys_addr_t addr)
>  {
> +	struct kvm *kvm __maybe_unused = mmu->kvm;
> +
>  	pmd_t *pmd_table __maybe_unused = stage2_pmd_offset(kvm, pud, 0);
>  	VM_BUG_ON(stage2_pud_huge(kvm, *pud));
>  	stage2_pud_clear(kvm, pud);
> -	kvm_tlb_flush_vmid_ipa(kvm, addr);
> +	kvm_tlb_flush_vmid_ipa(mmu, addr);
>  	stage2_pmd_free(kvm, pmd_table);
>  	put_page(virt_to_page(pud));
>  }
>  
> -static void clear_stage2_pmd_entry(struct kvm *kvm, pmd_t *pmd, phys_addr_t addr)
> +static void clear_stage2_pmd_entry(struct kvm_s2_mmu *mmu, pmd_t *pmd, phys_addr_t addr)
>  {
>  	pte_t *pte_table = pte_offset_kernel(pmd, 0);
>  	VM_BUG_ON(pmd_thp_or_huge(*pmd));
>  	pmd_clear(pmd);
> -	kvm_tlb_flush_vmid_ipa(kvm, addr);
> +	kvm_tlb_flush_vmid_ipa(mmu, addr);
>  	free_page((unsigned long)pte_table);
>  	put_page(virt_to_page(pmd));
>  }
> @@ -238,7 +244,7 @@ static inline void kvm_pgd_populate(pgd_t *pgdp, pud_t *pudp)
>   * we then fully enforce cacheability of RAM, no matter what the guest
>   * does.
>   */
> -static void unmap_stage2_ptes(struct kvm *kvm, pmd_t *pmd,
> +static void unmap_stage2_ptes(struct kvm_s2_mmu *mmu, pmd_t *pmd,
>  		       phys_addr_t addr, phys_addr_t end)
>  {
>  	phys_addr_t start_addr = addr;
> @@ -250,7 +256,7 @@ static void unmap_stage2_ptes(struct kvm *kvm, pmd_t *pmd,
>  			pte_t old_pte = *pte;
>  
>  			kvm_set_pte(pte, __pte(0));
> -			kvm_tlb_flush_vmid_ipa(kvm, addr);
> +			kvm_tlb_flush_vmid_ipa(mmu, addr);
>  
>  			/* No need to invalidate the cache for device mappings */
>  			if (!kvm_is_device_pfn(pte_pfn(old_pte)))
> @@ -260,13 +266,14 @@ static void unmap_stage2_ptes(struct kvm *kvm, pmd_t *pmd,
>  		}
>  	} while (pte++, addr += PAGE_SIZE, addr != end);
>  
> -	if (stage2_pte_table_empty(kvm, start_pte))
> -		clear_stage2_pmd_entry(kvm, pmd, start_addr);
> +	if (stage2_pte_table_empty(mmu->kvm, start_pte))
> +		clear_stage2_pmd_entry(mmu, pmd, start_addr);
>  }
>  
> -static void unmap_stage2_pmds(struct kvm *kvm, pud_t *pud,
> +static void unmap_stage2_pmds(struct kvm_s2_mmu *mmu, pud_t *pud,
>  		       phys_addr_t addr, phys_addr_t end)
>  {
> +	struct kvm *kvm __maybe_unused = mmu->kvm;
>  	phys_addr_t next, start_addr = addr;
>  	pmd_t *pmd, *start_pmd;
>  
> @@ -278,24 +285,25 @@ static void unmap_stage2_pmds(struct kvm *kvm, pud_t *pud,
>  				pmd_t old_pmd = *pmd;
>  
>  				pmd_clear(pmd);
> -				kvm_tlb_flush_vmid_ipa(kvm, addr);
> +				kvm_tlb_flush_vmid_ipa(mmu, addr);
>  
>  				kvm_flush_dcache_pmd(old_pmd);
>  
>  				put_page(virt_to_page(pmd));
>  			} else {
> -				unmap_stage2_ptes(kvm, pmd, addr, next);
> +				unmap_stage2_ptes(mmu, pmd, addr, next);
>  			}
>  		}
>  	} while (pmd++, addr = next, addr != end);
>  
>  	if (stage2_pmd_table_empty(kvm, start_pmd))
> -		clear_stage2_pud_entry(kvm, pud, start_addr);
> +		clear_stage2_pud_entry(mmu, pud, start_addr);
>  }
>  
> -static void unmap_stage2_puds(struct kvm *kvm, pgd_t *pgd,
> +static void unmap_stage2_puds(struct kvm_s2_mmu *mmu, pgd_t *pgd,
>  		       phys_addr_t addr, phys_addr_t end)
>  {
> +	struct kvm *kvm __maybe_unused = mmu->kvm;
>  	phys_addr_t next, start_addr = addr;
>  	pud_t *pud, *start_pud;
>  
> @@ -307,17 +315,17 @@ static void unmap_stage2_puds(struct kvm *kvm, pgd_t *pgd,
>  				pud_t old_pud = *pud;
>  
>  				stage2_pud_clear(kvm, pud);
> -				kvm_tlb_flush_vmid_ipa(kvm, addr);
> +				kvm_tlb_flush_vmid_ipa(mmu, addr);
>  				kvm_flush_dcache_pud(old_pud);
>  				put_page(virt_to_page(pud));
>  			} else {
> -				unmap_stage2_pmds(kvm, pud, addr, next);
> +				unmap_stage2_pmds(mmu, pud, addr, next);
>  			}
>  		}
>  	} while (pud++, addr = next, addr != end);
>  
>  	if (stage2_pud_table_empty(kvm, start_pud))
> -		clear_stage2_pgd_entry(kvm, pgd, start_addr);
> +		clear_stage2_pgd_entry(mmu, pgd, start_addr);
>  }
>  
>  /**
> @@ -331,8 +339,9 @@ static void unmap_stage2_puds(struct kvm *kvm, pgd_t *pgd,
>   * destroying the VM), otherwise another faulting VCPU may come in and mess
>   * with things behind our backs.
>   */
> -static void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size)
> +static void unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size)
>  {
> +	struct kvm *kvm = mmu->kvm;
>  	pgd_t *pgd;
>  	phys_addr_t addr = start, end = start + size;
>  	phys_addr_t next;
> @@ -340,18 +349,18 @@ static void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size)
>  	assert_spin_locked(&kvm->mmu_lock);
>  	WARN_ON(size & ~PAGE_MASK);
>  
> -	pgd = kvm->arch.pgd + stage2_pgd_index(kvm, addr);
> +	pgd = mmu->pgd + stage2_pgd_index(kvm, addr);
>  	do {
>  		/*
>  		 * Make sure the page table is still active, as another thread
>  		 * could have possibly freed the page table, while we released
>  		 * the lock.
>  		 */
> -		if (!READ_ONCE(kvm->arch.pgd))
> +		if (!READ_ONCE(mmu->pgd))
>  			break;
>  		next = stage2_pgd_addr_end(kvm, addr, end);
>  		if (!stage2_pgd_none(kvm, *pgd))
> -			unmap_stage2_puds(kvm, pgd, addr, next);
> +			unmap_stage2_puds(mmu, pgd, addr, next);
>  		/*
>  		 * If the range is too large, release the kvm->mmu_lock
>  		 * to prevent starvation and lockup detector warnings.
> @@ -361,7 +370,7 @@ static void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size)
>  	} while (pgd++, addr = next, addr != end);
>  }
>  
> -static void stage2_flush_ptes(struct kvm *kvm, pmd_t *pmd,
> +static void stage2_flush_ptes(struct kvm_s2_mmu *mmu, pmd_t *pmd,
>  			      phys_addr_t addr, phys_addr_t end)
>  {
>  	pte_t *pte;
> @@ -373,9 +382,10 @@ static void stage2_flush_ptes(struct kvm *kvm, pmd_t *pmd,
>  	} while (pte++, addr += PAGE_SIZE, addr != end);
>  }
>  
> -static void stage2_flush_pmds(struct kvm *kvm, pud_t *pud,
> +static void stage2_flush_pmds(struct kvm_s2_mmu *mmu, pud_t *pud,
>  			      phys_addr_t addr, phys_addr_t end)
>  {
> +	struct kvm *kvm = mmu->kvm;
>  	pmd_t *pmd;
>  	phys_addr_t next;
>  
> @@ -386,14 +396,15 @@ static void stage2_flush_pmds(struct kvm *kvm, pud_t *pud,
>  			if (pmd_thp_or_huge(*pmd))
>  				kvm_flush_dcache_pmd(*pmd);
>  			else
> -				stage2_flush_ptes(kvm, pmd, addr, next);
> +				stage2_flush_ptes(mmu, pmd, addr, next);
>  		}
>  	} while (pmd++, addr = next, addr != end);
>  }
>  
> -static void stage2_flush_puds(struct kvm *kvm, pgd_t *pgd,
> +static void stage2_flush_puds(struct kvm_s2_mmu *mmu, pgd_t *pgd,
>  			      phys_addr_t addr, phys_addr_t end)
>  {
> +	struct kvm *kvm __maybe_unused = mmu->kvm;
>  	pud_t *pud;
>  	phys_addr_t next;
>  
> @@ -404,24 +415,25 @@ static void stage2_flush_puds(struct kvm *kvm, pgd_t *pgd,
>  			if (stage2_pud_huge(kvm, *pud))
>  				kvm_flush_dcache_pud(*pud);
>  			else
> -				stage2_flush_pmds(kvm, pud, addr, next);
> +				stage2_flush_pmds(mmu, pud, addr, next);
>  		}
>  	} while (pud++, addr = next, addr != end);
>  }
>  
> -static void stage2_flush_memslot(struct kvm *kvm,
> +static void stage2_flush_memslot(struct kvm_s2_mmu *mmu,
>  				 struct kvm_memory_slot *memslot)
>  {
> +	struct kvm *kvm = mmu->kvm;
>  	phys_addr_t addr = memslot->base_gfn << PAGE_SHIFT;
>  	phys_addr_t end = addr + PAGE_SIZE * memslot->npages;
>  	phys_addr_t next;
>  	pgd_t *pgd;
>  
> -	pgd = kvm->arch.pgd + stage2_pgd_index(kvm, addr);
> +	pgd = mmu->pgd + stage2_pgd_index(kvm, addr);
>  	do {
>  		next = stage2_pgd_addr_end(kvm, addr, end);
>  		if (!stage2_pgd_none(kvm, *pgd))
> -			stage2_flush_puds(kvm, pgd, addr, next);
> +			stage2_flush_puds(mmu, pgd, addr, next);
>  	} while (pgd++, addr = next, addr != end);
>  }
>  
> @@ -443,7 +455,7 @@ static void stage2_flush_vm(struct kvm *kvm)
>  
>  	slots = kvm_memslots(kvm);
>  	kvm_for_each_memslot(memslot, slots)
> -		stage2_flush_memslot(kvm, memslot);
> +		stage2_flush_memslot(&kvm->arch.mmu, memslot);
>  
>  	spin_unlock(&kvm->mmu_lock);
>  	srcu_read_unlock(&kvm->srcu, idx);
> @@ -886,21 +898,23 @@ int create_hyp_exec_mappings(phys_addr_t phys_addr, size_t size,
>  }
>  
>  /**
> - * kvm_alloc_stage2_pgd - allocate level-1 table for stage-2 translation.
> - * @kvm:	The KVM struct pointer for the VM.
> + * kvm_init_stage2_mmu - Initialise a S2 MMU strucrure
> + * @kvm:	The pointer to the KVM structure
> + * @mmu:	The pointer to the s2 MMU structure
>   *
>   * Allocates only the stage-2 HW PGD level table(s) of size defined by
> - * stage2_pgd_size(kvm).
> + * stage2_pgd_size(mmu->kvm).
>   *
>   * Note we don't need locking here as this is only called when the VM is
>   * created, which can only be done once.
>   */
> -int kvm_alloc_stage2_pgd(struct kvm *kvm)
> +int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu)
>  {
>  	phys_addr_t pgd_phys;
>  	pgd_t *pgd;
> +	int cpu;
>  
> -	if (kvm->arch.pgd != NULL) {
> +	if (mmu->pgd != NULL) {
>  		kvm_err("kvm_arch already initialized?\n");
>  		return -EINVAL;
>  	}
> @@ -914,8 +928,20 @@ int kvm_alloc_stage2_pgd(struct kvm *kvm)
>  	if (WARN_ON(pgd_phys & ~kvm_vttbr_baddr_mask(kvm)))

We don't free the pgd here, but we do free it if alloc_percpu fails. Is that
intentional?

>  		return -EINVAL;
>  
> -	kvm->arch.pgd = pgd;
> -	kvm->arch.pgd_phys = pgd_phys;
> +	mmu->last_vcpu_ran = alloc_percpu(typeof(*mmu->last_vcpu_ran));
> +	if (!mmu->last_vcpu_ran) {
> +		free_pages_exact(pgd, stage2_pgd_size(kvm));
> +		return -ENOMEM;
> +	}
> +
> +	for_each_possible_cpu(cpu)
> +		*per_cpu_ptr(mmu->last_vcpu_ran, cpu) = -1;
> +
> +	mmu->kvm = kvm;
> +	mmu->pgd = pgd;
> +	mmu->pgd_phys = pgd_phys;
> +	mmu->vmid.vmid_gen = 0;
> +
>  	return 0;
>  }
>  
> @@ -954,7 +980,7 @@ static void stage2_unmap_memslot(struct kvm *kvm,
>  
>  		if (!(vma->vm_flags & VM_PFNMAP)) {
>  			gpa_t gpa = addr + (vm_start - memslot->userspace_addr);
> -			unmap_stage2_range(kvm, gpa, vm_end - vm_start);
> +			unmap_stage2_range(&kvm->arch.mmu, gpa, vm_end - vm_start);
>  		}
>  		hva = vm_end;
>  	} while (hva < reg_end);
> @@ -986,39 +1012,34 @@ void stage2_unmap_vm(struct kvm *kvm)
>  	srcu_read_unlock(&kvm->srcu, idx);
>  }
>  
> -/**
> - * kvm_free_stage2_pgd - free all stage-2 tables
> - * @kvm:	The KVM struct pointer for the VM.
> - *
> - * Walks the level-1 page table pointed to by kvm->arch.pgd and frees all
> - * underlying level-2 and level-3 tables before freeing the actual level-1 table
> - * and setting the struct pointer to NULL.
> - */
> -void kvm_free_stage2_pgd(struct kvm *kvm)
> +void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu)
>  {
> +	struct kvm *kvm = mmu->kvm;
>  	void *pgd = NULL;
>  
>  	spin_lock(&kvm->mmu_lock);
> -	if (kvm->arch.pgd) {
> -		unmap_stage2_range(kvm, 0, kvm_phys_size(kvm));
> -		pgd = READ_ONCE(kvm->arch.pgd);
> -		kvm->arch.pgd = NULL;
> -		kvm->arch.pgd_phys = 0;
> +	if (mmu->pgd) {
> +		unmap_stage2_range(mmu, 0, kvm_phys_size(kvm));
> +		pgd = READ_ONCE(mmu->pgd);
> +		mmu->pgd = NULL;

The kvm->arch.pgd_phys = 0 instruction seems to have been dropped here. Is that
intentional?

Thanks,
Alex
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 03/26] KVM: arm64: Factor out stage 2 page table data from struct kvm
  2020-05-11 16:38   ` Alexandru Elisei
@ 2020-05-12 11:17     ` James Morse
  2020-05-12 15:47       ` Alexandru Elisei
  2020-05-12 16:53       ` Alexandru Elisei
  0 siblings, 2 replies; 78+ messages in thread
From: James Morse @ 2020-05-12 11:17 UTC (permalink / raw)
  To: Alexandru Elisei, Marc Zyngier
  Cc: kvm, Andre Przywara, kvmarm, George Cherian, Zengtao (B),
	Catalin Marinas, Will Deacon, Dave Martin, linux-arm-kernel

Hi Alex, Marc,

(just on this last_vcpu_ran thing...)

On 11/05/2020 17:38, Alexandru Elisei wrote:
> On 4/22/20 1:00 PM, Marc Zyngier wrote:
>> From: Christoffer Dall <christoffer.dall@arm.com>
>>
>> As we are about to reuse our stage 2 page table manipulation code for
>> shadow stage 2 page tables in the context of nested virtualization, we
>> are going to manage multiple stage 2 page tables for a single VM.
>>
>> This requires some pretty invasive changes to our data structures,
>> which moves the vmid and pgd pointers into a separate structure and
>> change pretty much all of our mmu code to operate on this structure
>> instead.
>>
>> The new structure is called struct kvm_s2_mmu.
>>
>> There is no intended functional change by this patch alone.

>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>> index 7dd8fefa6aecd..664a5d92ae9b8 100644
>> --- a/arch/arm64/include/asm/kvm_host.h
>> +++ b/arch/arm64/include/asm/kvm_host.h
>> @@ -63,19 +63,32 @@ struct kvm_vmid {
>>  	u32    vmid;
>>  };
>>  
>> -struct kvm_arch {
>> +struct kvm_s2_mmu {
>>  	struct kvm_vmid vmid;
>>  
>> -	/* stage2 entry level table */
>> -	pgd_t *pgd;
>> -	phys_addr_t pgd_phys;
>> -
>> -	/* VTCR_EL2 value for this VM */
>> -	u64    vtcr;
>> +	/*
>> +	 * stage2 entry level table
>> +	 *
>> +	 * Two kvm_s2_mmu structures in the same VM can point to the same pgd
>> +	 * here.  This happens when running a non-VHE guest hypervisor which
>> +	 * uses the canonical stage 2 page table for both vEL2 and for vEL1/0
>> +	 * with vHCR_EL2.VM == 0.
> 
> It makes more sense to me to say that a non-VHE guest hypervisor will use the
> canonical stage *1* page table when running at EL2

Can KVM say anything about stage1? Its totally under the the guests control even at vEL2...


> (the "Non-secure EL2 translation regime" as ARM DDI 0487F.b calls it on page D5-2543).

> I think that's
> the only situation where vEL2 and vEL1&0 will use the same L0 stage 2 tables. It's
> been quite some time since I reviewed the initial version of the NV patches, did I
> get that wrong?


>> +	 */
>> +	pgd_t		*pgd;
>> +	phys_addr_t	pgd_phys;
>>  
>>  	/* The last vcpu id that ran on each physical CPU */
>>  	int __percpu *last_vcpu_ran;
> 
> It makes sense for the other fields to be part of kvm_s2_mmu, but I'm struggling
> to figure out why last_vcpu_ran is here. Would you mind sharing the rationale? I
> don't see this change in v1 or v2 of the NV series.

Marc may have a better rationale. My thinking was because kvm_vmid is in here too.

last_vcpu_ran exists to prevent KVM accidentally emulating CNP without the opt-in. (we
call it defacto CNP).

The guest may expect to be able to use asid-4 with different page tables on different
vCPUs, assuming the TLB isn't shared. But if KVM is switching between those vCPU on one
physical CPU, the TLB is shared, ... the VMID and ASID are the same, but the page tables
are not. Not fun to debug!


NV makes this problem per-stage2, because each stage2 has its own VMID, we need to track
the vcpu_id that last ran this stage2 on this physical CPU. If its not the same, we need
to blow away this VMIDs TLB entries.

The workaround lives in virt/kvm/arm/arm.c::kvm_arch_vcpu_load()


> More below.

(lightly trimmed!)

Thanks,

James


>>  
>> +	struct kvm *kvm;
>> +};

[...]

>> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
>> index 53b3ba9173ba7..03f01fcfa2bd5 100644
>> --- a/virt/kvm/arm/arm.c
>> +++ b/virt/kvm/arm/arm.c
> 
> There's a comment that still mentions arch.vmid that you missed in this file:
> 
> static bool need_new_vmid_gen(struct kvm_vmid *vmid)
> {
>     u64 current_vmid_gen = atomic64_read(&kvm_vmid_gen);
>     smp_rmb(); /* Orders read of kvm_vmid_gen and kvm->arch.vmid */
> 

[..]

>> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
>> index e3b9ee268823b..2f99749048285 100644
>> --- a/virt/kvm/arm/mmu.c
>> +++ b/virt/kvm/arm/mmu.c

>> @@ -886,21 +898,23 @@ int create_hyp_exec_mappings(phys_addr_t phys_addr, size_t size,
>>  }
>>  
>>  /**
>> - * kvm_alloc_stage2_pgd - allocate level-1 table for stage-2 translation.
>> - * @kvm:	The KVM struct pointer for the VM.
>> + * kvm_init_stage2_mmu - Initialise a S2 MMU strucrure
>> + * @kvm:	The pointer to the KVM structure
>> + * @mmu:	The pointer to the s2 MMU structure
>>   *
>>   * Allocates only the stage-2 HW PGD level table(s) of size defined by
>> - * stage2_pgd_size(kvm).
>> + * stage2_pgd_size(mmu->kvm).
>>   *
>>   * Note we don't need locking here as this is only called when the VM is
>>   * created, which can only be done once.
>>   */
>> -int kvm_alloc_stage2_pgd(struct kvm *kvm)
>> +int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu)
>>  {
>>  	phys_addr_t pgd_phys;
>>  	pgd_t *pgd;
>> +	int cpu;
>>  
>> -	if (kvm->arch.pgd != NULL) {
>> +	if (mmu->pgd != NULL) {
>>  		kvm_err("kvm_arch already initialized?\n");
>>  		return -EINVAL;
>>  	}
>> @@ -914,8 +928,20 @@ int kvm_alloc_stage2_pgd(struct kvm *kvm)
>>  	if (WARN_ON(pgd_phys & ~kvm_vttbr_baddr_mask(kvm)))

> We don't free the pgd here, but we do free it if alloc_percpu fails. Is that
> intentional?


>>  		return -EINVAL;
>>  
>> -	kvm->arch.pgd = pgd;
>> -	kvm->arch.pgd_phys = pgd_phys;
>> +	mmu->last_vcpu_ran = alloc_percpu(typeof(*mmu->last_vcpu_ran));
>> +	if (!mmu->last_vcpu_ran) {
>> +		free_pages_exact(pgd, stage2_pgd_size(kvm));
>> +		return -ENOMEM;
>> +	}
>> +
>> +	for_each_possible_cpu(cpu)
>> +		*per_cpu_ptr(mmu->last_vcpu_ran, cpu) = -1;
>> +
>> +	mmu->kvm = kvm;
>> +	mmu->pgd = pgd;
>> +	mmu->pgd_phys = pgd_phys;
>> +	mmu->vmid.vmid_gen = 0;
>> +
>>  	return 0;
>>  }
>>  

>> @@ -986,39 +1012,34 @@ void stage2_unmap_vm(struct kvm *kvm)
>>  	srcu_read_unlock(&kvm->srcu, idx);
>>  }
>>  
>> -/**
>> - * kvm_free_stage2_pgd - free all stage-2 tables
>> - * @kvm:	The KVM struct pointer for the VM.
>> - *
>> - * Walks the level-1 page table pointed to by kvm->arch.pgd and frees all
>> - * underlying level-2 and level-3 tables before freeing the actual level-1 table
>> - * and setting the struct pointer to NULL.
>> - */
>> -void kvm_free_stage2_pgd(struct kvm *kvm)
>> +void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu)
>>  {
>> +	struct kvm *kvm = mmu->kvm;
>>  	void *pgd = NULL;
>>  
>>  	spin_lock(&kvm->mmu_lock);
>> -	if (kvm->arch.pgd) {
>> -		unmap_stage2_range(kvm, 0, kvm_phys_size(kvm));
>> -		pgd = READ_ONCE(kvm->arch.pgd);
>> -		kvm->arch.pgd = NULL;
>> -		kvm->arch.pgd_phys = 0;
>> +	if (mmu->pgd) {
>> +		unmap_stage2_range(mmu, 0, kvm_phys_size(kvm));
>> +		pgd = READ_ONCE(mmu->pgd);
>> +		mmu->pgd = NULL;
> 
> The kvm->arch.pgd_phys = 0 instruction seems to have been dropped here. Is that
> intentional?
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 08/26] KVM: arm64: Use TTL hint in when invalidating stage-2 translations
  2020-05-07 15:13   ` Andrew Scull
@ 2020-05-12 12:04     ` James Morse
  2020-05-13  9:06       ` Andrew Scull
  0 siblings, 1 reply; 78+ messages in thread
From: James Morse @ 2020-05-12 12:04 UTC (permalink / raw)
  To: Andrew Scull
  Cc: Catalin Marinas, kvm, Marc Zyngier, Dave Martin, George Cherian,
	Zengtao (B),
	Andre Przywara, Will Deacon, kvmarm, linux-arm-kernel

Hi Andrew,

On 07/05/2020 16:13, Andrew Scull wrote:
>> @@ -176,7 +177,7 @@ static void clear_stage2_pud_entry(struct kvm_s2_mmu *mmu, pud_t *pud, phys_addr
>>  	pmd_t *pmd_table __maybe_unused = stage2_pmd_offset(kvm, pud, 0);
>>  	VM_BUG_ON(stage2_pud_huge(kvm, *pud));
>>  	stage2_pud_clear(kvm, pud);
>> -	kvm_tlb_flush_vmid_ipa(mmu, addr);
>> +	kvm_tlb_flush_vmid_ipa(mmu, addr, S2_NO_LEVEL_HINT);
>>  	stage2_pmd_free(kvm, pmd_table);
>>  	put_page(virt_to_page(pud));
>>  }
>> @@ -186,7 +187,7 @@ static void clear_stage2_pmd_entry(struct kvm_s2_mmu *mmu, pmd_t *pmd, phys_addr
>>  	pte_t *pte_table = pte_offset_kernel(pmd, 0);
>>  	VM_BUG_ON(pmd_thp_or_huge(*pmd));
>>  	pmd_clear(pmd);
>> -	kvm_tlb_flush_vmid_ipa(mmu, addr);
>> +	kvm_tlb_flush_vmid_ipa(mmu, addr, S2_NO_LEVEL_HINT);
>>  	free_page((unsigned long)pte_table);
>>  	put_page(virt_to_page(pmd));
>>  }
> 
> Going by the names, is it possible to give a better level hint for these
> cases?

There is no leaf entry being invalidated here. After clearing the range, we found we'd
emptied (and invalidated) a whole page of mappings:
|	if (stage2_pmd_table_empty(kvm, start_pmd))
|		clear_stage2_pud_entry(mmu, pud, start_addr);

Now we want to remove the link to the empty page so we can free it. We are changing the
structure of the tables, not what gets mapped.

I think this is why we need the un-hinted behaviour, to invalidate "any level of the
translation table walk required to translate the specified IPA". Otherwise the hardware
can look for a leaf at the indicated level, find none, and do nothing.


This is sufficiently horrible, its possible I've got it completely wrong! (does it make
sense?)


Thanks,

James
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 03/26] KVM: arm64: Factor out stage 2 page table data from struct kvm
  2020-05-12 11:17     ` James Morse
@ 2020-05-12 15:47       ` Alexandru Elisei
  2020-05-12 16:13         ` James Morse
  2020-05-12 16:53       ` Alexandru Elisei
  1 sibling, 1 reply; 78+ messages in thread
From: Alexandru Elisei @ 2020-05-12 15:47 UTC (permalink / raw)
  To: James Morse, Marc Zyngier
  Cc: kvm, Andre Przywara, kvmarm, George Cherian, Zengtao (B),
	Catalin Marinas, Will Deacon, Dave Martin, linux-arm-kernel

Hi,

On 5/12/20 12:17 PM, James Morse wrote:
> Hi Alex, Marc,
>
> (just on this last_vcpu_ran thing...)
>
> On 11/05/2020 17:38, Alexandru Elisei wrote:
>> On 4/22/20 1:00 PM, Marc Zyngier wrote:
>>> From: Christoffer Dall <christoffer.dall@arm.com>
>>>
>>> As we are about to reuse our stage 2 page table manipulation code for
>>> shadow stage 2 page tables in the context of nested virtualization, we
>>> are going to manage multiple stage 2 page tables for a single VM.
>>>
>>> This requires some pretty invasive changes to our data structures,
>>> which moves the vmid and pgd pointers into a separate structure and
>>> change pretty much all of our mmu code to operate on this structure
>>> instead.
>>>
>>> The new structure is called struct kvm_s2_mmu.
>>>
>>> There is no intended functional change by this patch alone.
>>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>>> index 7dd8fefa6aecd..664a5d92ae9b8 100644
>>> --- a/arch/arm64/include/asm/kvm_host.h
>>> +++ b/arch/arm64/include/asm/kvm_host.h
>>> @@ -63,19 +63,32 @@ struct kvm_vmid {
>>>  	u32    vmid;
>>>  };
>>>  
>>> -struct kvm_arch {
>>> +struct kvm_s2_mmu {
>>>  	struct kvm_vmid vmid;
>>>  
>>> -	/* stage2 entry level table */
>>> -	pgd_t *pgd;
>>> -	phys_addr_t pgd_phys;
>>> -
>>> -	/* VTCR_EL2 value for this VM */
>>> -	u64    vtcr;
>>> +	/*
>>> +	 * stage2 entry level table
>>> +	 *
>>> +	 * Two kvm_s2_mmu structures in the same VM can point to the same pgd
>>> +	 * here.  This happens when running a non-VHE guest hypervisor which
>>> +	 * uses the canonical stage 2 page table for both vEL2 and for vEL1/0
>>> +	 * with vHCR_EL2.VM == 0.
>> It makes more sense to me to say that a non-VHE guest hypervisor will use the
>> canonical stage *1* page table when running at EL2
> Can KVM say anything about stage1? Its totally under the the guests control even at vEL2...

It is. My interpretation of the comment was that if the guest doesn't have virtual
stage 2 enabled (we're not running a guest of the L1 hypervisor), then the L0 host
can use the same L0 stage 2 tables because we're running the same guest (the L1
VM), regardless of the actual exception level for the guest. If I remember
correctly, KVM assigns different vmids for guests running at vEL1/0 and vEL2 with
vHCR_EL2.VM == 0 because the translation regimes are different, but keeps the same
translation tables.

>
>
>> (the "Non-secure EL2 translation regime" as ARM DDI 0487F.b calls it on page D5-2543).
>> I think that's
>> the only situation where vEL2 and vEL1&0 will use the same L0 stage 2 tables. It's
>> been quite some time since I reviewed the initial version of the NV patches, did I
>> get that wrong?
>
>>> +	 */
>>> +	pgd_t		*pgd;
>>> +	phys_addr_t	pgd_phys;
>>>  
>>>  	/* The last vcpu id that ran on each physical CPU */
>>>  	int __percpu *last_vcpu_ran;
>> It makes sense for the other fields to be part of kvm_s2_mmu, but I'm struggling
>> to figure out why last_vcpu_ran is here. Would you mind sharing the rationale? I
>> don't see this change in v1 or v2 of the NV series.
> Marc may have a better rationale. My thinking was because kvm_vmid is in here too.
>
> last_vcpu_ran exists to prevent KVM accidentally emulating CNP without the opt-in. (we
> call it defacto CNP).
>
> The guest may expect to be able to use asid-4 with different page tables on different

I'm afraid I don't know what asid-4 is.

> vCPUs, assuming the TLB isn't shared. But if KVM is switching between those vCPU on one
> physical CPU, the TLB is shared, ... the VMID and ASID are the same, but the page tables
> are not. Not fun to debug!
>
>
> NV makes this problem per-stage2, because each stage2 has its own VMID, we need to track
> the vcpu_id that last ran this stage2 on this physical CPU. If its not the same, we need
> to blow away this VMIDs TLB entries.
>
> The workaround lives in virt/kvm/arm/arm.c::kvm_arch_vcpu_load()

Makes sense, thank you for explaining that.

Thanks,
Alex
>
>
>> More below.
> (lightly trimmed!)
>
> Thanks,
>
> James
>
>
>>>  
>>> +	struct kvm *kvm;
>>> +};
> [...]
>
>>> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
>>> index 53b3ba9173ba7..03f01fcfa2bd5 100644
>>> --- a/virt/kvm/arm/arm.c
>>> +++ b/virt/kvm/arm/arm.c
>> There's a comment that still mentions arch.vmid that you missed in this file:
>>
>> static bool need_new_vmid_gen(struct kvm_vmid *vmid)
>> {
>>     u64 current_vmid_gen = atomic64_read(&kvm_vmid_gen);
>>     smp_rmb(); /* Orders read of kvm_vmid_gen and kvm->arch.vmid */
>>
> [..]
>
>>> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
>>> index e3b9ee268823b..2f99749048285 100644
>>> --- a/virt/kvm/arm/mmu.c
>>> +++ b/virt/kvm/arm/mmu.c
>>> @@ -886,21 +898,23 @@ int create_hyp_exec_mappings(phys_addr_t phys_addr, size_t size,
>>>  }
>>>  
>>>  /**
>>> - * kvm_alloc_stage2_pgd - allocate level-1 table for stage-2 translation.
>>> - * @kvm:	The KVM struct pointer for the VM.
>>> + * kvm_init_stage2_mmu - Initialise a S2 MMU strucrure
>>> + * @kvm:	The pointer to the KVM structure
>>> + * @mmu:	The pointer to the s2 MMU structure
>>>   *
>>>   * Allocates only the stage-2 HW PGD level table(s) of size defined by
>>> - * stage2_pgd_size(kvm).
>>> + * stage2_pgd_size(mmu->kvm).
>>>   *
>>>   * Note we don't need locking here as this is only called when the VM is
>>>   * created, which can only be done once.
>>>   */
>>> -int kvm_alloc_stage2_pgd(struct kvm *kvm)
>>> +int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu)
>>>  {
>>>  	phys_addr_t pgd_phys;
>>>  	pgd_t *pgd;
>>> +	int cpu;
>>>  
>>> -	if (kvm->arch.pgd != NULL) {
>>> +	if (mmu->pgd != NULL) {
>>>  		kvm_err("kvm_arch already initialized?\n");
>>>  		return -EINVAL;
>>>  	}
>>> @@ -914,8 +928,20 @@ int kvm_alloc_stage2_pgd(struct kvm *kvm)
>>>  	if (WARN_ON(pgd_phys & ~kvm_vttbr_baddr_mask(kvm)))
>> We don't free the pgd here, but we do free it if alloc_percpu fails. Is that
>> intentional?
>
>>>  		return -EINVAL;
>>>  
>>> -	kvm->arch.pgd = pgd;
>>> -	kvm->arch.pgd_phys = pgd_phys;
>>> +	mmu->last_vcpu_ran = alloc_percpu(typeof(*mmu->last_vcpu_ran));
>>> +	if (!mmu->last_vcpu_ran) {
>>> +		free_pages_exact(pgd, stage2_pgd_size(kvm));
>>> +		return -ENOMEM;
>>> +	}
>>> +
>>> +	for_each_possible_cpu(cpu)
>>> +		*per_cpu_ptr(mmu->last_vcpu_ran, cpu) = -1;
>>> +
>>> +	mmu->kvm = kvm;
>>> +	mmu->pgd = pgd;
>>> +	mmu->pgd_phys = pgd_phys;
>>> +	mmu->vmid.vmid_gen = 0;
>>> +
>>>  	return 0;
>>>  }
>>>  
>>> @@ -986,39 +1012,34 @@ void stage2_unmap_vm(struct kvm *kvm)
>>>  	srcu_read_unlock(&kvm->srcu, idx);
>>>  }
>>>  
>>> -/**
>>> - * kvm_free_stage2_pgd - free all stage-2 tables
>>> - * @kvm:	The KVM struct pointer for the VM.
>>> - *
>>> - * Walks the level-1 page table pointed to by kvm->arch.pgd and frees all
>>> - * underlying level-2 and level-3 tables before freeing the actual level-1 table
>>> - * and setting the struct pointer to NULL.
>>> - */
>>> -void kvm_free_stage2_pgd(struct kvm *kvm)
>>> +void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu)
>>>  {
>>> +	struct kvm *kvm = mmu->kvm;
>>>  	void *pgd = NULL;
>>>  
>>>  	spin_lock(&kvm->mmu_lock);
>>> -	if (kvm->arch.pgd) {
>>> -		unmap_stage2_range(kvm, 0, kvm_phys_size(kvm));
>>> -		pgd = READ_ONCE(kvm->arch.pgd);
>>> -		kvm->arch.pgd = NULL;
>>> -		kvm->arch.pgd_phys = 0;
>>> +	if (mmu->pgd) {
>>> +		unmap_stage2_range(mmu, 0, kvm_phys_size(kvm));
>>> +		pgd = READ_ONCE(mmu->pgd);
>>> +		mmu->pgd = NULL;
>> The kvm->arch.pgd_phys = 0 instruction seems to have been dropped here. Is that
>> intentional?
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 03/26] KVM: arm64: Factor out stage 2 page table data from struct kvm
  2020-05-12 15:47       ` Alexandru Elisei
@ 2020-05-12 16:13         ` James Morse
  0 siblings, 0 replies; 78+ messages in thread
From: James Morse @ 2020-05-12 16:13 UTC (permalink / raw)
  To: Alexandru Elisei
  Cc: Catalin Marinas, kvm, Marc Zyngier, Dave Martin, George Cherian,
	Zengtao (B),
	Andre Przywara, Will Deacon, kvmarm, linux-arm-kernel

Hi Alex,

On 12/05/2020 16:47, Alexandru Elisei wrote:
> On 5/12/20 12:17 PM, James Morse wrote:
>> On 11/05/2020 17:38, Alexandru Elisei wrote:
>>> On 4/22/20 1:00 PM, Marc Zyngier wrote:
>>>> From: Christoffer Dall <christoffer.dall@arm.com>
>>>>
>>>> As we are about to reuse our stage 2 page table manipulation code for
>>>> shadow stage 2 page tables in the context of nested virtualization, we
>>>> are going to manage multiple stage 2 page tables for a single VM.
>>>>
>>>> This requires some pretty invasive changes to our data structures,
>>>> which moves the vmid and pgd pointers into a separate structure and
>>>> change pretty much all of our mmu code to operate on this structure
>>>> instead.
>>>>
>>>> The new structure is called struct kvm_s2_mmu.
>>>>
>>>> There is no intended functional change by this patch alone.
>>>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>>>> index 7dd8fefa6aecd..664a5d92ae9b8 100644
>>>> --- a/arch/arm64/include/asm/kvm_host.h
>>>> +++ b/arch/arm64/include/asm/kvm_host.h
>>>> @@ -63,19 +63,32 @@ struct kvm_vmid {
>>>>  	u32    vmid;
>>>>  };
>>>>  
>>>> -struct kvm_arch {
>>>> +struct kvm_s2_mmu {
>>>>  	struct kvm_vmid vmid;
>>>>  
>>>> -	/* stage2 entry level table */
>>>> -	pgd_t *pgd;
>>>> -	phys_addr_t pgd_phys;
>>>> -
>>>> -	/* VTCR_EL2 value for this VM */
>>>> -	u64    vtcr;
>>>> +	/*
>>>> +	 * stage2 entry level table
>>>> +	 *
>>>> +	 * Two kvm_s2_mmu structures in the same VM can point to the same pgd
>>>> +	 * here.  This happens when running a non-VHE guest hypervisor which
>>>> +	 * uses the canonical stage 2 page table for both vEL2 and for vEL1/0
>>>> +	 * with vHCR_EL2.VM == 0.

>>> It makes more sense to me to say that a non-VHE guest hypervisor will use the
>>> canonical stage *1* page table when running at EL2

>> Can KVM say anything about stage1? Its totally under the the guests control even at vEL2...

> It is. My interpretation of the comment was that if the guest doesn't have virtual
> stage 2 enabled (we're not running a guest of the L1 hypervisor), then the L0 host
> can use the same L0 stage 2 tables because we're running the same guest (the L1
> VM), regardless of the actual exception level for the guest.

I think you're right, but I can't see where stage 1 comes in to it!


> If I remember
> correctly, KVM assigns different vmids for guests running at vEL1/0 and vEL2 with
> vHCR_EL2.VM == 0 because the translation regimes are different, but keeps the same
> translation tables.

Interesting. Is that because vEL2 really has ASIDs so it needs its own VMID space?



>>> (the "Non-secure EL2 translation regime" as ARM DDI 0487F.b calls it on page D5-2543).
>>> I think that's
>>> the only situation where vEL2 and vEL1&0 will use the same L0 stage 2 tables. It's
>>> been quite some time since I reviewed the initial version of the NV patches, did I
>>> get that wrong?
>>
>>>> +	 */
>>>> +	pgd_t		*pgd;
>>>> +	phys_addr_t	pgd_phys;
>>>>  
>>>>  	/* The last vcpu id that ran on each physical CPU */
>>>>  	int __percpu *last_vcpu_ran;

>>> It makes sense for the other fields to be part of kvm_s2_mmu, but I'm struggling
>>> to figure out why last_vcpu_ran is here. Would you mind sharing the rationale? I
>>> don't see this change in v1 or v2 of the NV series.

>> Marc may have a better rationale. My thinking was because kvm_vmid is in here too.
>>
>> last_vcpu_ran exists to prevent KVM accidentally emulating CNP without the opt-in. (we
>> call it defacto CNP).
>>
>> The guest may expect to be able to use asid-4 with different page tables on different

> I'm afraid I don't know what asid-4 is.

Sorry - 4 was just a random number![0]
'to use the same asid number on different vcpus'.


>> vCPUs, assuming the TLB isn't shared. But if KVM is switching between those vCPU on one
>> physical CPU, the TLB is shared, ... the VMID and ASID are the same, but the page tables
>> are not. Not fun to debug!
>>
>>
>> NV makes this problem per-stage2, because each stage2 has its own VMID, we need to track
>> the vcpu_id that last ran this stage2 on this physical CPU. If its not the same, we need
>> to blow away this VMIDs TLB entries.
>>
>> The workaround lives in virt/kvm/arm/arm.c::kvm_arch_vcpu_load()
> 
> Makes sense, thank you for explaining that.

Great,


Thanks,

James


[0] https://xkcd.com/221/
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 03/26] KVM: arm64: Factor out stage 2 page table data from struct kvm
  2020-05-12 11:17     ` James Morse
  2020-05-12 15:47       ` Alexandru Elisei
@ 2020-05-12 16:53       ` Alexandru Elisei
  2020-05-27  8:41         ` Marc Zyngier
  1 sibling, 1 reply; 78+ messages in thread
From: Alexandru Elisei @ 2020-05-12 16:53 UTC (permalink / raw)
  To: James Morse, Marc Zyngier
  Cc: kvm, Andre Przywara, kvmarm, George Cherian, Zengtao (B),
	Catalin Marinas, Will Deacon, Dave Martin, linux-arm-kernel

Hi,

On 5/12/20 12:17 PM, James Morse wrote:
> Hi Alex, Marc,
>
> (just on this last_vcpu_ran thing...)
>
> On 11/05/2020 17:38, Alexandru Elisei wrote:
>> On 4/22/20 1:00 PM, Marc Zyngier wrote:
>>> From: Christoffer Dall <christoffer.dall@arm.com>
>>>
>>> As we are about to reuse our stage 2 page table manipulation code for
>>> shadow stage 2 page tables in the context of nested virtualization, we
>>> are going to manage multiple stage 2 page tables for a single VM.
>>>
>>> This requires some pretty invasive changes to our data structures,
>>> which moves the vmid and pgd pointers into a separate structure and
>>> change pretty much all of our mmu code to operate on this structure
>>> instead.
>>>
>>> The new structure is called struct kvm_s2_mmu.
>>>
>>> There is no intended functional change by this patch alone.
>>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>>> index 7dd8fefa6aecd..664a5d92ae9b8 100644
>>> --- a/arch/arm64/include/asm/kvm_host.h
>>> +++ b/arch/arm64/include/asm/kvm_host.h
>>> @@ -63,19 +63,32 @@ struct kvm_vmid {
>>>  	u32    vmid;
>>>  };
>>>  
>>> -struct kvm_arch {
>>> +struct kvm_s2_mmu {
>>>  	struct kvm_vmid vmid;
>>>  
>>> -	/* stage2 entry level table */
>>> -	pgd_t *pgd;
>>> -	phys_addr_t pgd_phys;
>>> -
>>> -	/* VTCR_EL2 value for this VM */
>>> -	u64    vtcr;
>>> +	/*
>>> +	 * stage2 entry level table
>>> +	 *
>>> +	 * Two kvm_s2_mmu structures in the same VM can point to the same pgd
>>> +	 * here.  This happens when running a non-VHE guest hypervisor which
>>> +	 * uses the canonical stage 2 page table for both vEL2 and for vEL1/0
>>> +	 * with vHCR_EL2.VM == 0.
>> It makes more sense to me to say that a non-VHE guest hypervisor will use the
>> canonical stage *1* page table when running at EL2
> Can KVM say anything about stage1? Its totally under the the guests control even at vEL2...

It just occurred to me that "canonical stage 2 page table" refers to the L0
hypervisor stage 2, not to the L1 hypervisor stage 2. If you don't mind my
suggestion, perhaps the comment can be slightly improved to avoid any confusion?
Maybe something along the lines of "[..] This happens when running a non-VHE guest
hypervisor, in which case we use the canonical stage 2 page table for both vEL2
and for vEL1/0 with vHCR_EL2.VM == 0".

Thanks,
Alex
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 08/26] KVM: arm64: Use TTL hint in when invalidating stage-2 translations
  2020-04-22 12:00 ` [PATCH 08/26] KVM: arm64: Use TTL hint in when invalidating stage-2 translations Marc Zyngier
  2020-05-07 15:13   ` Andrew Scull
@ 2020-05-12 17:26   ` James Morse
  1 sibling, 0 replies; 78+ messages in thread
From: James Morse @ 2020-05-12 17:26 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvm, Andre Przywara, kvmarm, Will Deacon, George Cherian,
	Zengtao (B),
	Catalin Marinas, Dave Martin, linux-arm-kernel

Hi Marc,

On 22/04/2020 13:00, Marc Zyngier wrote:
> Since we always have a precide idea of the level we're dealing with

(precise)

> when invalidating TLBs, we can provide it to as a hint to our
> invalidation helper.

> diff --git a/arch/arm64/include/asm/stage2_pgtable.h b/arch/arm64/include/asm/stage2_pgtable.h
> index 326aac658b9da..7ed5c1a769a9b 100644
> --- a/arch/arm64/include/asm/stage2_pgtable.h
> +++ b/arch/arm64/include/asm/stage2_pgtable.h
> @@ -230,4 +230,13 @@ stage2_pgd_addr_end(struct kvm *kvm, phys_addr_t addr, phys_addr_t end)
>  	return (boundary - 1 < end - 1) ? boundary : end;
>  }
>  
> +/*
> + * Level values for the ARMv8.4-TTL extension, mapping PUD/PMD/PTE and
> + * the architectural page-table level.
> + */
> +#define S2_NO_LEVEL_HINT	0
> +#define S2_PUD_LEVEL		1
> +#define S2_PMD_LEVEL		2
> +#define S2_PTE_LEVEL		3

Are these really just for stage2, would the stage1 definition be the same?

~

Digging into the VTCR_EL2.SL0 trickery, it does everything at pgd where there are no block
mappings, and no hints, so it looks fine.

Reviewed-by: James Morse <james.morse@arm.com>


Thanks,

James
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 08/26] KVM: arm64: Use TTL hint in when invalidating stage-2 translations
  2020-05-12 12:04     ` James Morse
@ 2020-05-13  9:06       ` Andrew Scull
  2020-05-27  8:59         ` Marc Zyngier
  0 siblings, 1 reply; 78+ messages in thread
From: Andrew Scull @ 2020-05-13  9:06 UTC (permalink / raw)
  To: James Morse
  Cc: Catalin Marinas, kvm, Marc Zyngier, Dave Martin, George Cherian,
	Zengtao (B),
	Andre Przywara, Will Deacon, kvmarm, linux-arm-kernel

On Tue, May 12, 2020 at 01:04:31PM +0100, James Morse wrote:
> Hi Andrew,
> 
> On 07/05/2020 16:13, Andrew Scull wrote:
> >> @@ -176,7 +177,7 @@ static void clear_stage2_pud_entry(struct kvm_s2_mmu *mmu, pud_t *pud, phys_addr
> >>  	pmd_t *pmd_table __maybe_unused = stage2_pmd_offset(kvm, pud, 0);
> >>  	VM_BUG_ON(stage2_pud_huge(kvm, *pud));
> >>  	stage2_pud_clear(kvm, pud);
> >> -	kvm_tlb_flush_vmid_ipa(mmu, addr);
> >> +	kvm_tlb_flush_vmid_ipa(mmu, addr, S2_NO_LEVEL_HINT);
> >>  	stage2_pmd_free(kvm, pmd_table);
> >>  	put_page(virt_to_page(pud));
> >>  }
> >> @@ -186,7 +187,7 @@ static void clear_stage2_pmd_entry(struct kvm_s2_mmu *mmu, pmd_t *pmd, phys_addr
> >>  	pte_t *pte_table = pte_offset_kernel(pmd, 0);
> >>  	VM_BUG_ON(pmd_thp_or_huge(*pmd));
> >>  	pmd_clear(pmd);
> >> -	kvm_tlb_flush_vmid_ipa(mmu, addr);
> >> +	kvm_tlb_flush_vmid_ipa(mmu, addr, S2_NO_LEVEL_HINT);
> >>  	free_page((unsigned long)pte_table);
> >>  	put_page(virt_to_page(pmd));
> >>  }
> > 
> > Going by the names, is it possible to give a better level hint for these
> > cases?
> 
> There is no leaf entry being invalidated here. After clearing the range, we found we'd
> emptied (and invalidated) a whole page of mappings:
> |	if (stage2_pmd_table_empty(kvm, start_pmd))
> |		clear_stage2_pud_entry(mmu, pud, start_addr);
> 
> Now we want to remove the link to the empty page so we can free it. We are changing the
> structure of the tables, not what gets mapped.
> 
> I think this is why we need the un-hinted behaviour, to invalidate "any level of the
> translation table walk required to translate the specified IPA". Otherwise the hardware
> can look for a leaf at the indicated level, find none, and do nothing.
> 
> 
> This is sufficiently horrible, its possible I've got it completely wrong! (does it make
> sense?)

Ok. `addr` is an IPA, that IPA is now omitted from the map so doesn't
appear in any entry of the table, least of all a leaf entry. That makes
sense.

Is there a convention to distinguish IPA and PA similar to the
distinction for VA or does kvmarm just use phys_addr_t all round?

It seems like the TTL patches are failry self contained if it would be
easier to serparate them out from these larger series?
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 26/26] KVM: arm64: Parametrize exception entry with a target EL
  2020-04-22 12:00 ` [PATCH 26/26] KVM: arm64: Parametrize exception entry with a target EL Marc Zyngier
@ 2020-05-19 10:44   ` Mark Rutland
  2020-05-27  9:34     ` Marc Zyngier
  0 siblings, 1 reply; 78+ messages in thread
From: Mark Rutland @ 2020-05-19 10:44 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvm, Andre Przywara, kvmarm, Will Deacon, George Cherian,
	Zengtao (B),
	Catalin Marinas, Dave Martin, linux-arm-kernel

On Wed, Apr 22, 2020 at 01:00:50PM +0100, Marc Zyngier wrote:
> We currently assume that an exception is delivered to EL1, always.
> Once we emulate EL2, this no longer will be the case. To prepare
> for this, add a target_mode parameter.
> 
> While we're at it, merge the computing of the target PC and PSTATE in
> a single function that updates both PC and CPSR after saving their
> previous values in the corresponding ELR/SPSR. This ensures that they
> are updated in the correct order (a pretty common source of bugs...).
> 
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/kvm/inject_fault.c | 75 ++++++++++++++++++-----------------
>  1 file changed, 38 insertions(+), 37 deletions(-)
> 
> diff --git a/arch/arm64/kvm/inject_fault.c b/arch/arm64/kvm/inject_fault.c
> index d3ebf8bca4b89..3dbcbc839b9c3 100644
> --- a/arch/arm64/kvm/inject_fault.c
> +++ b/arch/arm64/kvm/inject_fault.c
> @@ -26,28 +26,12 @@ enum exception_type {
>  	except_type_serror	= 0x180,
>  };
>  
> -static u64 get_except_vector(struct kvm_vcpu *vcpu, enum exception_type type)
> -{
> -	u64 exc_offset;
> -
> -	switch (*vcpu_cpsr(vcpu) & (PSR_MODE_MASK | PSR_MODE32_BIT)) {
> -	case PSR_MODE_EL1t:
> -		exc_offset = CURRENT_EL_SP_EL0_VECTOR;
> -		break;
> -	case PSR_MODE_EL1h:
> -		exc_offset = CURRENT_EL_SP_ELx_VECTOR;
> -		break;
> -	case PSR_MODE_EL0t:
> -		exc_offset = LOWER_EL_AArch64_VECTOR;
> -		break;
> -	default:
> -		exc_offset = LOWER_EL_AArch32_VECTOR;
> -	}
> -
> -	return vcpu_read_sys_reg(vcpu, VBAR_EL1) + exc_offset + type;
> -}
> -
>  /*
> + * This performs the exception entry at a given EL (@target_mode), stashing PC
> + * and PSTATE into ELR and SPSR respectively, and compute the new PC/PSTATE.
> + * The EL passed to this function *must* be a non-secure, privileged mode with
> + * bit 0 being set (PSTATE.SP == 1).
> + *
>   * When an exception is taken, most PSTATE fields are left unchanged in the
>   * handler. However, some are explicitly overridden (e.g. M[4:0]). Luckily all
>   * of the inherited bits have the same position in the AArch64/AArch32 SPSR_ELx
> @@ -59,10 +43,35 @@ static u64 get_except_vector(struct kvm_vcpu *vcpu, enum exception_type type)
>   * Here we manipulate the fields in order of the AArch64 SPSR_ELx layout, from
>   * MSB to LSB.
>   */
> -static unsigned long get_except64_pstate(struct kvm_vcpu *vcpu)
> +static void enter_exception(struct kvm_vcpu *vcpu, unsigned long target_mode,
> +			    enum exception_type type)

Since this is all for an AArch64 target, could we keep `64` in the name,
e.g enter_exception64? That'd mirror the callers below.

>  {
> -	unsigned long sctlr = vcpu_read_sys_reg(vcpu, SCTLR_EL1);
> -	unsigned long old, new;
> +	unsigned long sctlr, vbar, old, new, mode;
> +	u64 exc_offset;
> +
> +	mode = *vcpu_cpsr(vcpu) & (PSR_MODE_MASK | PSR_MODE32_BIT);
> +
> +	if      (mode == target_mode)
> +		exc_offset = CURRENT_EL_SP_ELx_VECTOR;
> +	else if ((mode | 1) == target_mode)
> +		exc_offset = CURRENT_EL_SP_EL0_VECTOR;

It would be nice if we could add a mnemonic for the `1` here, e.g.
PSR_MODE_SP0 or PSR_MODE_THREAD_BIT.

> +	else if (!(mode & PSR_MODE32_BIT))
> +		exc_offset = LOWER_EL_AArch64_VECTOR;
> +	else
> +		exc_offset = LOWER_EL_AArch32_VECTOR;

Other than the above, I couldn't think of a nicer way of writing thism
and AFAICT this is correct.

> +
> +	switch (target_mode) {
> +	case PSR_MODE_EL1h:
> +		vbar = vcpu_read_sys_reg(vcpu, VBAR_EL1);
> +		sctlr = vcpu_read_sys_reg(vcpu, SCTLR_EL1);
> +		vcpu_write_sys_reg(vcpu, *vcpu_pc(vcpu), ELR_EL1);
> +		break;
> +	default:
> +		/* Don't do that */
> +		BUG();
> +	}
> +
> +	*vcpu_pc(vcpu) = vbar + exc_offset + type;
>  
>  	old = *vcpu_cpsr(vcpu);
>  	new = 0;
> @@ -105,9 +114,10 @@ static unsigned long get_except64_pstate(struct kvm_vcpu *vcpu)
>  	new |= PSR_I_BIT;
>  	new |= PSR_F_BIT;
>  
> -	new |= PSR_MODE_EL1h;
> +	new |= target_mode;

As a heads-up, some of the other bits will need to change for an EL2
target (e.g. SPAN will depend on HCR_EL2.E2H), but as-is this this is
fine.

Regardless of the above comments:

Reviewed-by: Mark Rutland <mark.rutland@arm.com>

Mark.
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 10/26] KVM: arm64: Refactor vcpu_{read,write}_sys_reg
  2020-04-22 12:00 ` [PATCH 10/26] KVM: arm64: Refactor vcpu_{read,write}_sys_reg Marc Zyngier
@ 2020-05-26 16:28   ` James Morse
  2020-05-27 10:04     ` Marc Zyngier
  0 siblings, 1 reply; 78+ messages in thread
From: James Morse @ 2020-05-26 16:28 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvm, Andre Przywara, kvmarm, Will Deacon, George Cherian,
	Zengtao (B),
	Catalin Marinas, Dave Martin, linux-arm-kernel

Hi Marc,

On 22/04/2020 13:00, Marc Zyngier wrote:
> Extract the direct HW accessors for later reuse.

> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 51db934702b64..46f218982df8c 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c

> +u64 vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg)
> +{
> +	u64 val = 0x8badf00d8badf00d;
> +
> +	if (!vcpu->arch.sysregs_loaded_on_cpu) {
> +		goto memory_read;
>  	}
>  
> -immediate_write:
> +	if (__vcpu_read_sys_reg_from_cpu(reg, &val))
> +		return val;
> +
> +memory_read:
> +	return __vcpu_sys_reg(vcpu, reg);
> +}


The goto here is a bit odd, is it just an artefact of how we got here?
Is this easier on the eye?:
| u64 vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg)
| {
|	u64 val = 0x8badf00d8badf00d;
|
|	if (vcpu->arch.sysregs_loaded_on_cpu &&
|	    __vcpu_read_sys_reg_from_cpu(reg, &val))
|		return val;
|
| 	return __vcpu_sys_reg(vcpu, reg);
| }


> +void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, u64 val, int reg)
> +{
> +	if (!vcpu->arch.sysregs_loaded_on_cpu)
> +		goto memory_write;
> +
> +	if (__vcpu_write_sys_reg_to_cpu(val, reg))
> +		return;
> +
> +memory_write:
>  	 __vcpu_sys_reg(vcpu, reg) = val;
>  }

Again I think its clearer without the goto....


Regardless:
Reviewed-by: James Morse <james.morse@arm.com>


Thanks,

James
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 11/26] KVM: arm64: Add missing reset handlers for PMU emulation
  2020-04-22 12:00 ` [PATCH 11/26] KVM: arm64: Add missing reset handlers for PMU emulation Marc Zyngier
@ 2020-05-26 16:29   ` James Morse
  0 siblings, 0 replies; 78+ messages in thread
From: James Morse @ 2020-05-26 16:29 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvm, Andre Przywara, kvmarm, Will Deacon, George Cherian,
	Zengtao (B),
	Catalin Marinas, Dave Martin, linux-arm-kernel

Hi Marc,

On 22/04/2020 13:00, Marc Zyngier wrote:
> As we're about to become a bit more harsh when it comes to the lack of
> reset callbacks, let's add the missing PMU reset handlers. Note that
> these only cover *CLR registers that were always covered by their *SET
> counterpart, so there is no semantic change here.

Reviewed-by: James Morse <james.morse@arm.com>


Thanks,

James
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 19/26] KVM: arm64: Make struct kvm_regs userspace-only
  2020-04-22 12:00 ` [PATCH 19/26] KVM: arm64: Make struct kvm_regs userspace-only Marc Zyngier
@ 2020-05-26 16:29   ` James Morse
  2020-05-27 10:22     ` Marc Zyngier
  0 siblings, 1 reply; 78+ messages in thread
From: James Morse @ 2020-05-26 16:29 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvm, Andre Przywara, kvmarm, Will Deacon, George Cherian,
	Zengtao (B),
	Catalin Marinas, Dave Martin, linux-arm-kernel

Hi Marc,

On 22/04/2020 13:00, Marc Zyngier wrote:
> struct kvm_regs is used by userspace to indicate which register gets
> accessed by the {GET,SET}_ONE_REG API. But as we're about to refactor
> the layout of the in-kernel register structures, we need the kernel to
> move away from it.
> 
> Let's make kvm_regs userspace only, and let the kernel map it to its own
> internal representation.

> diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
> index 23ebe51410f06..9fec9231b63e2 100644
> --- a/arch/arm64/kvm/guest.c
> +++ b/arch/arm64/kvm/guest.c
> @@ -102,6 +102,55 @@ static int core_reg_size_from_offset(const struct kvm_vcpu *vcpu, u64 off)
>  	return size;
>  }
>  
> +static void *core_reg_addr(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> +{
> +	u64 off = core_reg_offset_from_id(reg->id);
> +
> +	switch (off) {

> +	default:
> +		return NULL;

Doesn't this switch statement catch an out of range offset, and a misaligned offset?

... We still test for those explicitly in the caller. Better safe than implicit?


> +	}
> +}

With the reset thing reported by Zenghui and Zengtao on the previous patch fixed:
Reviewed-by: James Morse <james.morse@arm.com>

(otherwise struct kvm_regs isn't userspace-only!)


Thanks,

James
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 20/26] KVM: arm64: Move ELR_EL1 to the system register array
  2020-04-22 12:00 ` [PATCH 20/26] KVM: arm64: Move ELR_EL1 to the system register array Marc Zyngier
@ 2020-05-26 16:29   ` James Morse
  2020-05-27 10:36     ` Marc Zyngier
  0 siblings, 1 reply; 78+ messages in thread
From: James Morse @ 2020-05-26 16:29 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvm, Andre Przywara, kvmarm, Will Deacon, George Cherian,
	Zengtao (B),
	Catalin Marinas, Dave Martin, linux-arm-kernel

Hi Marc,

On 22/04/2020 13:00, Marc Zyngier wrote:
> As ELR-EL1 is a VNCR-capable register with ARMv8.4-NV, let's move it to
> the sys_regs array and repaint the accessors. While we're at it, let's
> kill the now useless accessors used only on the fault injection path.

Reviewed-by: James Morse <james.morse@arm.com>


A curiosity:

> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 95977b80265ce..46949fce3e813 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -184,6 +184,8 @@ enum vcpu_sysreg {

Comment above the enum has some claims about the order, but its already out of order with
__vcpu_read_sys_reg_from_cpu()... (PAR_EL1 being the culprit)

(I think it only matters for searching by encoding, which is checked at boot.)


>  	APGAKEYLO_EL1,
>  	APGAKEYHI_EL1,
>  
> +	ELR_EL1,
> +
>  	/* 32bit specific registers. Keep them at the end of the range */
>  	DACR32_EL2,	/* Domain Access Control Register */
>  	IFSR32_EL2,	/* Instruction Fault Status Register */

> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 5bda4af36a0e7..7c2fffb20c217 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -95,6 +95,7 @@ static bool __vcpu_read_sys_reg_from_cpu(int reg, u64 *val)
>  	case TPIDR_EL1:		*val = read_sysreg_s(SYS_TPIDR_EL1);	break;
>  	case AMAIR_EL1:		*val = read_sysreg_s(SYS_AMAIR_EL12);	break;
>  	case CNTKCTL_EL1:	*val = read_sysreg_s(SYS_CNTKCTL_EL12);	break;
> +	case ELR_EL1:		*val = read_sysreg_s(SYS_ELR_EL12);	break;
>  	case PAR_EL1:		*val = read_sysreg_s(SYS_PAR_EL1);	break;
>  	case DACR32_EL2:	*val = read_sysreg_s(SYS_DACR32_EL2);	break;
>  	case IFSR32_EL2:	*val = read_sysreg_s(SYS_IFSR32_EL2);	break;



Thanks,

James
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 21/26] KVM: arm64: Move SP_EL1 to the system register array
  2020-04-22 12:00 ` [PATCH 21/26] KVM: arm64: Move SP_EL1 " Marc Zyngier
@ 2020-05-26 16:29   ` James Morse
  0 siblings, 0 replies; 78+ messages in thread
From: James Morse @ 2020-05-26 16:29 UTC (permalink / raw)
  To: Marc Zyngier, linux-arm-kernel, kvmarm, kvm
  Cc: Will Deacon, Andre Przywara, Dave Martin, George Cherian,
	Zengtao (B),
	Catalin Marinas

Hi Marc,

On 22/04/2020 13:00, Marc Zyngier wrote:
> SP_EL1 being a VNCR-capable register with ARMv8.4-NV, move it to the
> system register array and update the accessors.

Reviewed-by: James Morse <james.morse@arm.com>


Thanks,

James
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 22/26] KVM: arm64: Disintegrate SPSR array
  2020-04-22 12:00 ` [PATCH 22/26] KVM: arm64: Disintegrate SPSR array Marc Zyngier
@ 2020-05-26 16:30   ` James Morse
  0 siblings, 0 replies; 78+ messages in thread
From: James Morse @ 2020-05-26 16:30 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvm, Andre Przywara, kvmarm, Will Deacon, George Cherian,
	Zengtao (B),
	Catalin Marinas, Dave Martin, linux-arm-kernel

Hi Marc,

On 22/04/2020 13:00, Marc Zyngier wrote:
> As we're about to move SPSR_EL1 into the VNCR page, we need to
> disassociate it from the rest of the 32bit cruft. Let's break
> the array into individual fields.

Reviewed-by: James Morse <james.morse@arm.com>


Thanks,

James
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 23/26] KVM: arm64: Move SPSR_EL1 to the system register array
  2020-04-22 12:00 ` [PATCH 23/26] KVM: arm64: Move SPSR_EL1 to the system register array Marc Zyngier
@ 2020-05-26 16:30   ` James Morse
  0 siblings, 0 replies; 78+ messages in thread
From: James Morse @ 2020-05-26 16:30 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvm, Andre Przywara, kvmarm, Will Deacon, George Cherian,
	Zengtao (B),
	Catalin Marinas, Dave Martin, linux-arm-kernel

Hi Marc,

On 22/04/2020 13:00, Marc Zyngier wrote:
> SPSR_EL1 being a VNCR-capable register with ARMv8.4-NV, move it to
> the sysregs array and update the accessors.

Reviewed-by: James Morse <james.morse@arm.com>

Thanks,

James
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 03/26] KVM: arm64: Factor out stage 2 page table data from struct kvm
  2020-05-12 16:53       ` Alexandru Elisei
@ 2020-05-27  8:41         ` Marc Zyngier
  2020-05-27  8:45           ` Alexandru Elisei
  0 siblings, 1 reply; 78+ messages in thread
From: Marc Zyngier @ 2020-05-27  8:41 UTC (permalink / raw)
  To: Alexandru Elisei
  Cc: kvm, Andre Przywara, Dave Martin, George Cherian, Zengtao (B),
	Catalin Marinas, Will Deacon, kvmarm, linux-arm-kernel

Hi Alex,

On 2020-05-12 17:53, Alexandru Elisei wrote:
> Hi,
> 
> On 5/12/20 12:17 PM, James Morse wrote:
>> Hi Alex, Marc,
>> 
>> (just on this last_vcpu_ran thing...)
>> 
>> On 11/05/2020 17:38, Alexandru Elisei wrote:
>>> On 4/22/20 1:00 PM, Marc Zyngier wrote:
>>>> From: Christoffer Dall <christoffer.dall@arm.com>
>>>> 
>>>> As we are about to reuse our stage 2 page table manipulation code 
>>>> for
>>>> shadow stage 2 page tables in the context of nested virtualization, 
>>>> we
>>>> are going to manage multiple stage 2 page tables for a single VM.
>>>> 
>>>> This requires some pretty invasive changes to our data structures,
>>>> which moves the vmid and pgd pointers into a separate structure and
>>>> change pretty much all of our mmu code to operate on this structure
>>>> instead.
>>>> 
>>>> The new structure is called struct kvm_s2_mmu.
>>>> 
>>>> There is no intended functional change by this patch alone.
>>>> diff --git a/arch/arm64/include/asm/kvm_host.h 
>>>> b/arch/arm64/include/asm/kvm_host.h
>>>> index 7dd8fefa6aecd..664a5d92ae9b8 100644
>>>> --- a/arch/arm64/include/asm/kvm_host.h
>>>> +++ b/arch/arm64/include/asm/kvm_host.h
>>>> @@ -63,19 +63,32 @@ struct kvm_vmid {
>>>>  	u32    vmid;
>>>>  };
>>>> 
>>>> -struct kvm_arch {
>>>> +struct kvm_s2_mmu {
>>>>  	struct kvm_vmid vmid;
>>>> 
>>>> -	/* stage2 entry level table */
>>>> -	pgd_t *pgd;
>>>> -	phys_addr_t pgd_phys;
>>>> -
>>>> -	/* VTCR_EL2 value for this VM */
>>>> -	u64    vtcr;
>>>> +	/*
>>>> +	 * stage2 entry level table
>>>> +	 *
>>>> +	 * Two kvm_s2_mmu structures in the same VM can point to the same 
>>>> pgd
>>>> +	 * here.  This happens when running a non-VHE guest hypervisor 
>>>> which
>>>> +	 * uses the canonical stage 2 page table for both vEL2 and for 
>>>> vEL1/0
>>>> +	 * with vHCR_EL2.VM == 0.
>>> It makes more sense to me to say that a non-VHE guest hypervisor will 
>>> use the
>>> canonical stage *1* page table when running at EL2
>> Can KVM say anything about stage1? Its totally under the the guests 
>> control even at vEL2...
> 
> It just occurred to me that "canonical stage 2 page table" refers to 
> the L0
> hypervisor stage 2, not to the L1 hypervisor stage 2. If you don't mind 
> my
> suggestion, perhaps the comment can be slightly improved to avoid any 
> confusion?
> Maybe something along the lines of "[..] This happens when running a
> non-VHE guest
> hypervisor, in which case we use the canonical stage 2 page table for 
> both vEL2
> and for vEL1/0 with vHCR_EL2.VM == 0".

If the confusion stems from the lack of guest stage-2, how about:

"This happens when running a guest using a translation regime that isn't
  affected by its own stage-2 translation, such as a non-VHE hypervisor
  running at vEL2, or for vEL1/EL0 with vHCR_EL2.VM == 0. In that case,
  we use the canonical stage-2 page tables."

instead? Does this lift the ambiguity?

Thanks,

         M.
-- 
Jazz is not dead. It just smells funny...
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 03/26] KVM: arm64: Factor out stage 2 page table data from struct kvm
  2020-05-27  8:41         ` Marc Zyngier
@ 2020-05-27  8:45           ` Alexandru Elisei
  0 siblings, 0 replies; 78+ messages in thread
From: Alexandru Elisei @ 2020-05-27  8:45 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvm, Andre Przywara, Dave Martin, George Cherian, Zengtao (B),
	Catalin Marinas, Will Deacon, kvmarm, linux-arm-kernel

Hi Marc,

On 5/27/20 9:41 AM, Marc Zyngier wrote:
> Hi Alex,
>
> On 2020-05-12 17:53, Alexandru Elisei wrote:
>> Hi,
>>
>> On 5/12/20 12:17 PM, James Morse wrote:
>>> Hi Alex, Marc,
>>>
>>> (just on this last_vcpu_ran thing...)
>>>
>>> On 11/05/2020 17:38, Alexandru Elisei wrote:
>>>> On 4/22/20 1:00 PM, Marc Zyngier wrote:
>>>>> From: Christoffer Dall <christoffer.dall@arm.com>
>>>>>
>>>>> As we are about to reuse our stage 2 page table manipulation code for
>>>>> shadow stage 2 page tables in the context of nested virtualization, we
>>>>> are going to manage multiple stage 2 page tables for a single VM.
>>>>>
>>>>> This requires some pretty invasive changes to our data structures,
>>>>> which moves the vmid and pgd pointers into a separate structure and
>>>>> change pretty much all of our mmu code to operate on this structure
>>>>> instead.
>>>>>
>>>>> The new structure is called struct kvm_s2_mmu.
>>>>>
>>>>> There is no intended functional change by this patch alone.
>>>>> diff --git a/arch/arm64/include/asm/kvm_host.h
>>>>> b/arch/arm64/include/asm/kvm_host.h
>>>>> index 7dd8fefa6aecd..664a5d92ae9b8 100644
>>>>> --- a/arch/arm64/include/asm/kvm_host.h
>>>>> +++ b/arch/arm64/include/asm/kvm_host.h
>>>>> @@ -63,19 +63,32 @@ struct kvm_vmid {
>>>>>      u32    vmid;
>>>>>  };
>>>>>
>>>>> -struct kvm_arch {
>>>>> +struct kvm_s2_mmu {
>>>>>      struct kvm_vmid vmid;
>>>>>
>>>>> -    /* stage2 entry level table */
>>>>> -    pgd_t *pgd;
>>>>> -    phys_addr_t pgd_phys;
>>>>> -
>>>>> -    /* VTCR_EL2 value for this VM */
>>>>> -    u64    vtcr;
>>>>> +    /*
>>>>> +     * stage2 entry level table
>>>>> +     *
>>>>> +     * Two kvm_s2_mmu structures in the same VM can point to the same pgd
>>>>> +     * here.  This happens when running a non-VHE guest hypervisor which
>>>>> +     * uses the canonical stage 2 page table for both vEL2 and for vEL1/0
>>>>> +     * with vHCR_EL2.VM == 0.
>>>> It makes more sense to me to say that a non-VHE guest hypervisor will use the
>>>> canonical stage *1* page table when running at EL2
>>> Can KVM say anything about stage1? Its totally under the the guests control
>>> even at vEL2...
>>
>> It just occurred to me that "canonical stage 2 page table" refers to the L0
>> hypervisor stage 2, not to the L1 hypervisor stage 2. If you don't mind my
>> suggestion, perhaps the comment can be slightly improved to avoid any confusion?
>> Maybe something along the lines of "[..] This happens when running a
>> non-VHE guest
>> hypervisor, in which case we use the canonical stage 2 page table for both vEL2
>> and for vEL1/0 with vHCR_EL2.VM == 0".
>
> If the confusion stems from the lack of guest stage-2, how about:
>
> "This happens when running a guest using a translation regime that isn't
>  affected by its own stage-2 translation, such as a non-VHE hypervisor
>  running at vEL2, or for vEL1/EL0 with vHCR_EL2.VM == 0. In that case,
>  we use the canonical stage-2 page tables."
>
> instead? Does this lift the ambiguity?

Yes, that's perfect.

Thanks,
Alex
>
> Thanks,
>
>         M.
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 08/26] KVM: arm64: Use TTL hint in when invalidating stage-2 translations
  2020-05-13  9:06       ` Andrew Scull
@ 2020-05-27  8:59         ` Marc Zyngier
  0 siblings, 0 replies; 78+ messages in thread
From: Marc Zyngier @ 2020-05-27  8:59 UTC (permalink / raw)
  To: Andrew Scull
  Cc: kvm, Andre Przywara, Dave Martin, George Cherian, Zengtao (B),
	Catalin Marinas, Will Deacon, kvmarm, linux-arm-kernel

On 2020-05-13 10:06, Andrew Scull wrote:
> On Tue, May 12, 2020 at 01:04:31PM +0100, James Morse wrote:
>> Hi Andrew,
>> 
>> On 07/05/2020 16:13, Andrew Scull wrote:
>> >> @@ -176,7 +177,7 @@ static void clear_stage2_pud_entry(struct kvm_s2_mmu *mmu, pud_t *pud, phys_addr
>> >>  	pmd_t *pmd_table __maybe_unused = stage2_pmd_offset(kvm, pud, 0);
>> >>  	VM_BUG_ON(stage2_pud_huge(kvm, *pud));
>> >>  	stage2_pud_clear(kvm, pud);
>> >> -	kvm_tlb_flush_vmid_ipa(mmu, addr);
>> >> +	kvm_tlb_flush_vmid_ipa(mmu, addr, S2_NO_LEVEL_HINT);
>> >>  	stage2_pmd_free(kvm, pmd_table);
>> >>  	put_page(virt_to_page(pud));
>> >>  }
>> >> @@ -186,7 +187,7 @@ static void clear_stage2_pmd_entry(struct kvm_s2_mmu *mmu, pmd_t *pmd, phys_addr
>> >>  	pte_t *pte_table = pte_offset_kernel(pmd, 0);
>> >>  	VM_BUG_ON(pmd_thp_or_huge(*pmd));
>> >>  	pmd_clear(pmd);
>> >> -	kvm_tlb_flush_vmid_ipa(mmu, addr);
>> >> +	kvm_tlb_flush_vmid_ipa(mmu, addr, S2_NO_LEVEL_HINT);
>> >>  	free_page((unsigned long)pte_table);
>> >>  	put_page(virt_to_page(pmd));
>> >>  }
>> >
>> > Going by the names, is it possible to give a better level hint for these
>> > cases?
>> 
>> There is no leaf entry being invalidated here. After clearing the 
>> range, we found we'd
>> emptied (and invalidated) a whole page of mappings:
>> |	if (stage2_pmd_table_empty(kvm, start_pmd))
>> |		clear_stage2_pud_entry(mmu, pud, start_addr);
>> 
>> Now we want to remove the link to the empty page so we can free it. We 
>> are changing the
>> structure of the tables, not what gets mapped.
>> 
>> I think this is why we need the un-hinted behaviour, to invalidate 
>> "any level of the
>> translation table walk required to translate the specified IPA". 
>> Otherwise the hardware
>> can look for a leaf at the indicated level, find none, and do nothing.
>> 
>> 
>> This is sufficiently horrible, its possible I've got it completely 
>> wrong! (does it make
>> sense?)
> 
> Ok. `addr` is an IPA, that IPA is now omitted from the map so doesn't
> appear in any entry of the table, least of all a leaf entry. That makes
> sense.
> 
> Is there a convention to distinguish IPA and PA similar to the
> distinction for VA or does kvmarm just use phys_addr_t all round?
> 
> It seems like the TTL patches are failry self contained if it would be
> easier to serparate them out from these larger series?

They are. This whole series is a mix of unrelated patches anyway.
Their only goal is to make my life a bit easier in the distant
future.

I'll repost that anyway, as I have made some cosmetic changes.

Thanks,

         M.
-- 
Jazz is not dead. It just smells funny...
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 26/26] KVM: arm64: Parametrize exception entry with a target EL
  2020-05-19 10:44   ` Mark Rutland
@ 2020-05-27  9:34     ` Marc Zyngier
  2020-05-27 14:41       ` Mark Rutland
  0 siblings, 1 reply; 78+ messages in thread
From: Marc Zyngier @ 2020-05-27  9:34 UTC (permalink / raw)
  To: Mark Rutland
  Cc: kvm, Andre Przywara, kvmarm, Will Deacon, George Cherian,
	Zengtao (B),
	Catalin Marinas, Dave Martin, linux-arm-kernel

HI Mark,

On 2020-05-19 11:44, Mark Rutland wrote:
> On Wed, Apr 22, 2020 at 01:00:50PM +0100, Marc Zyngier wrote:
>> We currently assume that an exception is delivered to EL1, always.
>> Once we emulate EL2, this no longer will be the case. To prepare
>> for this, add a target_mode parameter.
>> 
>> While we're at it, merge the computing of the target PC and PSTATE in
>> a single function that updates both PC and CPSR after saving their
>> previous values in the corresponding ELR/SPSR. This ensures that they
>> are updated in the correct order (a pretty common source of bugs...).
>> 
>> Signed-off-by: Marc Zyngier <maz@kernel.org>
>> ---
>>  arch/arm64/kvm/inject_fault.c | 75 
>> ++++++++++++++++++-----------------
>>  1 file changed, 38 insertions(+), 37 deletions(-)
>> 
>> diff --git a/arch/arm64/kvm/inject_fault.c 
>> b/arch/arm64/kvm/inject_fault.c
>> index d3ebf8bca4b89..3dbcbc839b9c3 100644
>> --- a/arch/arm64/kvm/inject_fault.c
>> +++ b/arch/arm64/kvm/inject_fault.c
>> @@ -26,28 +26,12 @@ enum exception_type {
>>  	except_type_serror	= 0x180,
>>  };
>> 
>> -static u64 get_except_vector(struct kvm_vcpu *vcpu, enum 
>> exception_type type)
>> -{
>> -	u64 exc_offset;
>> -
>> -	switch (*vcpu_cpsr(vcpu) & (PSR_MODE_MASK | PSR_MODE32_BIT)) {
>> -	case PSR_MODE_EL1t:
>> -		exc_offset = CURRENT_EL_SP_EL0_VECTOR;
>> -		break;
>> -	case PSR_MODE_EL1h:
>> -		exc_offset = CURRENT_EL_SP_ELx_VECTOR;
>> -		break;
>> -	case PSR_MODE_EL0t:
>> -		exc_offset = LOWER_EL_AArch64_VECTOR;
>> -		break;
>> -	default:
>> -		exc_offset = LOWER_EL_AArch32_VECTOR;
>> -	}
>> -
>> -	return vcpu_read_sys_reg(vcpu, VBAR_EL1) + exc_offset + type;
>> -}
>> -
>>  /*
>> + * This performs the exception entry at a given EL (@target_mode), 
>> stashing PC
>> + * and PSTATE into ELR and SPSR respectively, and compute the new 
>> PC/PSTATE.
>> + * The EL passed to this function *must* be a non-secure, privileged 
>> mode with
>> + * bit 0 being set (PSTATE.SP == 1).
>> + *
>>   * When an exception is taken, most PSTATE fields are left unchanged 
>> in the
>>   * handler. However, some are explicitly overridden (e.g. M[4:0]). 
>> Luckily all
>>   * of the inherited bits have the same position in the 
>> AArch64/AArch32 SPSR_ELx
>> @@ -59,10 +43,35 @@ static u64 get_except_vector(struct kvm_vcpu 
>> *vcpu, enum exception_type type)
>>   * Here we manipulate the fields in order of the AArch64 SPSR_ELx 
>> layout, from
>>   * MSB to LSB.
>>   */
>> -static unsigned long get_except64_pstate(struct kvm_vcpu *vcpu)
>> +static void enter_exception(struct kvm_vcpu *vcpu, unsigned long 
>> target_mode,
>> +			    enum exception_type type)
> 
> Since this is all for an AArch64 target, could we keep `64` in the 
> name,
> e.g enter_exception64? That'd mirror the callers below.
> 
>>  {
>> -	unsigned long sctlr = vcpu_read_sys_reg(vcpu, SCTLR_EL1);
>> -	unsigned long old, new;
>> +	unsigned long sctlr, vbar, old, new, mode;
>> +	u64 exc_offset;
>> +
>> +	mode = *vcpu_cpsr(vcpu) & (PSR_MODE_MASK | PSR_MODE32_BIT);
>> +
>> +	if      (mode == target_mode)
>> +		exc_offset = CURRENT_EL_SP_ELx_VECTOR;
>> +	else if ((mode | 1) == target_mode)
>> +		exc_offset = CURRENT_EL_SP_EL0_VECTOR;
> 
> It would be nice if we could add a mnemonic for the `1` here, e.g.
> PSR_MODE_SP0 or PSR_MODE_THREAD_BIT.

I've addressed both comments as follows:

diff --git a/arch/arm64/include/asm/ptrace.h 
b/arch/arm64/include/asm/ptrace.h
index bf57308fcd63..953b6a1ce549 100644
--- a/arch/arm64/include/asm/ptrace.h
+++ b/arch/arm64/include/asm/ptrace.h
@@ -35,6 +35,7 @@
  #define GIC_PRIO_PSR_I_SET		(1 << 4)

  /* Additional SPSR bits not exposed in the UABI */
+#define PSR_MODE_THREAD_BIT	(1 << 0)
  #define PSR_IL_BIT		(1 << 20)

  /* AArch32-specific ptrace requests */
diff --git a/arch/arm64/kvm/inject_fault.c 
b/arch/arm64/kvm/inject_fault.c
index 3dbcbc839b9c..ebfdfc27b2bd 100644
--- a/arch/arm64/kvm/inject_fault.c
+++ b/arch/arm64/kvm/inject_fault.c
@@ -43,8 +43,8 @@ enum exception_type {
   * Here we manipulate the fields in order of the AArch64 SPSR_ELx 
layout, from
   * MSB to LSB.
   */
-static void enter_exception(struct kvm_vcpu *vcpu, unsigned long 
target_mode,
-			    enum exception_type type)
+static void enter_exception64(struct kvm_vcpu *vcpu, unsigned long 
target_mode,
+			      enum exception_type type)
  {
  	unsigned long sctlr, vbar, old, new, mode;
  	u64 exc_offset;
@@ -53,7 +53,7 @@ static void enter_exception(struct kvm_vcpu *vcpu, 
unsigned long target_mode,

  	if      (mode == target_mode)
  		exc_offset = CURRENT_EL_SP_ELx_VECTOR;
-	else if ((mode | 1) == target_mode)
+	else if ((mode | PSR_MODE_THREAD_BIT) == target_mode)
  		exc_offset = CURRENT_EL_SP_EL0_VECTOR;
  	else if (!(mode & PSR_MODE32_BIT))
  		exc_offset = LOWER_EL_AArch64_VECTOR;
@@ -126,7 +126,7 @@ static void inject_abt64(struct kvm_vcpu *vcpu, bool 
is_iabt, unsigned long addr
  	bool is_aarch32 = vcpu_mode_is_32bit(vcpu);
  	u32 esr = 0;

-	enter_exception(vcpu, PSR_MODE_EL1h, except_type_sync);
+	enter_exception64(vcpu, PSR_MODE_EL1h, except_type_sync);

  	vcpu_write_sys_reg(vcpu, addr, FAR_EL1);

@@ -156,7 +156,7 @@ static void inject_undef64(struct kvm_vcpu *vcpu)
  {
  	u32 esr = (ESR_ELx_EC_UNKNOWN << ESR_ELx_EC_SHIFT);

-	enter_exception(vcpu, PSR_MODE_EL1h, except_type_sync);
+	enter_exception64(vcpu, PSR_MODE_EL1h, except_type_sync);

  	/*
  	 * Build an unknown exception, depending on the instruction


Thanks,

         M.
-- 
Jazz is not dead. It just smells funny...
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 10/26] KVM: arm64: Refactor vcpu_{read,write}_sys_reg
  2020-05-26 16:28   ` James Morse
@ 2020-05-27 10:04     ` Marc Zyngier
  0 siblings, 0 replies; 78+ messages in thread
From: Marc Zyngier @ 2020-05-27 10:04 UTC (permalink / raw)
  To: James Morse
  Cc: kvm, Andre Przywara, kvmarm, Will Deacon, George Cherian,
	Zengtao (B),
	Catalin Marinas, Dave Martin, linux-arm-kernel

On 2020-05-26 17:28, James Morse wrote:
> Hi Marc,
> 
> On 22/04/2020 13:00, Marc Zyngier wrote:
>> Extract the direct HW accessors for later reuse.
> 
>> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
>> index 51db934702b64..46f218982df8c 100644
>> --- a/arch/arm64/kvm/sys_regs.c
>> +++ b/arch/arm64/kvm/sys_regs.c
> 
>> +u64 vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg)
>> +{
>> +	u64 val = 0x8badf00d8badf00d;
>> +
>> +	if (!vcpu->arch.sysregs_loaded_on_cpu) {
>> +		goto memory_read;
>>  	}
>> 
>> -immediate_write:
>> +	if (__vcpu_read_sys_reg_from_cpu(reg, &val))
>> +		return val;
>> +
>> +memory_read:
>> +	return __vcpu_sys_reg(vcpu, reg);
>> +}
> 
> 
> The goto here is a bit odd, is it just an artefact of how we got here?

That's because a lot of this changes when NV gets added to the mix,
see [1].

> Is this easier on the eye?:
> | u64 vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg)
> | {
> |	u64 val = 0x8badf00d8badf00d;
> |
> |	if (vcpu->arch.sysregs_loaded_on_cpu &&
> |	    __vcpu_read_sys_reg_from_cpu(reg, &val))
> |		return val;
> |
> | 	return __vcpu_sys_reg(vcpu, reg);
> | }

Definitely. I don't mind reworking the NV branch so that the label
gets introduced there.

>> +void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, u64 val, int reg)
>> +{
>> +	if (!vcpu->arch.sysregs_loaded_on_cpu)
>> +		goto memory_write;
>> +
>> +	if (__vcpu_write_sys_reg_to_cpu(val, reg))
>> +		return;
>> +
>> +memory_write:
>>  	 __vcpu_sys_reg(vcpu, reg) = val;
>>  }
> 
> Again I think its clearer without the goto....
> 
> 
> Regardless:
> Reviewed-by: James Morse <james.morse@arm.com>

Thanks,

         M.

[1] 
https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/commit/?h=kvm-arm64/nv-5.7-rc1-WIP&id=11f3217d39a602cbfac7d08064c8b31afb57348e
-- 
Jazz is not dead. It just smells funny...
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 19/26] KVM: arm64: Make struct kvm_regs userspace-only
  2020-05-26 16:29   ` James Morse
@ 2020-05-27 10:22     ` Marc Zyngier
  0 siblings, 0 replies; 78+ messages in thread
From: Marc Zyngier @ 2020-05-27 10:22 UTC (permalink / raw)
  To: James Morse
  Cc: kvm, Andre Przywara, kvmarm, Will Deacon, George Cherian,
	Zengtao (B),
	Catalin Marinas, Dave Martin, linux-arm-kernel

On 2020-05-26 17:29, James Morse wrote:
> Hi Marc,
> 
> On 22/04/2020 13:00, Marc Zyngier wrote:
>> struct kvm_regs is used by userspace to indicate which register gets
>> accessed by the {GET,SET}_ONE_REG API. But as we're about to refactor
>> the layout of the in-kernel register structures, we need the kernel to
>> move away from it.
>> 
>> Let's make kvm_regs userspace only, and let the kernel map it to its 
>> own
>> internal representation.
> 
>> diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
>> index 23ebe51410f06..9fec9231b63e2 100644
>> --- a/arch/arm64/kvm/guest.c
>> +++ b/arch/arm64/kvm/guest.c
>> @@ -102,6 +102,55 @@ static int core_reg_size_from_offset(const struct 
>> kvm_vcpu *vcpu, u64 off)
>>  	return size;
>>  }
>> 
>> +static void *core_reg_addr(struct kvm_vcpu *vcpu, const struct 
>> kvm_one_reg *reg)
>> +{
>> +	u64 off = core_reg_offset_from_id(reg->id);
>> +
>> +	switch (off) {
> 
>> +	default:
>> +		return NULL;
> 
> Doesn't this switch statement catch an out of range offset, and a
> misaligned offset?
> 
> ... We still test for those explicitly in the caller. Better safe than 
> implicit?

Indeed, this is not supposed to happen at all. Maybe I should just fold
validate_core_offset offset there, and make this NULL value the error
case.

> 
>> +	}
>> +}
> 
> With the reset thing reported by Zenghui and Zengtao on the previous
> patch fixed:
> Reviewed-by: James Morse <james.morse@arm.com>
> 
> (otherwise struct kvm_regs isn't userspace-only!)

Indeed!

Thanks,

         M.
-- 
Jazz is not dead. It just smells funny...
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 20/26] KVM: arm64: Move ELR_EL1 to the system register array
  2020-05-26 16:29   ` James Morse
@ 2020-05-27 10:36     ` Marc Zyngier
  0 siblings, 0 replies; 78+ messages in thread
From: Marc Zyngier @ 2020-05-27 10:36 UTC (permalink / raw)
  To: James Morse
  Cc: kvm, Andre Przywara, kvmarm, Will Deacon, George Cherian,
	Zengtao (B),
	Catalin Marinas, Dave Martin, linux-arm-kernel

On 2020-05-26 17:29, James Morse wrote:
> Hi Marc,
> 
> On 22/04/2020 13:00, Marc Zyngier wrote:
>> As ELR-EL1 is a VNCR-capable register with ARMv8.4-NV, let's move it 
>> to
>> the sys_regs array and repaint the accessors. While we're at it, let's
>> kill the now useless accessors used only on the fault injection path.
> 
> Reviewed-by: James Morse <james.morse@arm.com>
> 
> 
> A curiosity:
> 
>> diff --git a/arch/arm64/include/asm/kvm_host.h 
>> b/arch/arm64/include/asm/kvm_host.h
>> index 95977b80265ce..46949fce3e813 100644
>> --- a/arch/arm64/include/asm/kvm_host.h
>> +++ b/arch/arm64/include/asm/kvm_host.h
>> @@ -184,6 +184,8 @@ enum vcpu_sysreg {
> 
> Comment above the enum has some claims about the order, but its
> already out of order with
> __vcpu_read_sys_reg_from_cpu()... (PAR_EL1 being the culprit)

This comment dates back from the original assembly implementation,
where I was paranoid about accessing the sys_regs array in strict
order to maximize the chances of the prefetcher doing the right thing.

As always with premature optimization, it was worthless, and moving
things to C forced us to do things differently anyway (not to mention
VHE...).

I'll delete the comment in a separate patch.

Thanks,

         M.
-- 
Jazz is not dead. It just smells funny...
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 26/26] KVM: arm64: Parametrize exception entry with a target EL
  2020-05-27  9:34     ` Marc Zyngier
@ 2020-05-27 14:41       ` Mark Rutland
  0 siblings, 0 replies; 78+ messages in thread
From: Mark Rutland @ 2020-05-27 14:41 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvm, Andre Przywara, kvmarm, Will Deacon, George Cherian,
	Zengtao (B),
	Catalin Marinas, Dave Martin, linux-arm-kernel

On Wed, May 27, 2020 at 10:34:09AM +0100, Marc Zyngier wrote:
> HI Mark,
> 
> On 2020-05-19 11:44, Mark Rutland wrote:
> > On Wed, Apr 22, 2020 at 01:00:50PM +0100, Marc Zyngier wrote:
> > > -static unsigned long get_except64_pstate(struct kvm_vcpu *vcpu)
> > > +static void enter_exception(struct kvm_vcpu *vcpu, unsigned long
> > > target_mode,
> > > +			    enum exception_type type)
> > 
> > Since this is all for an AArch64 target, could we keep `64` in the name,
> > e.g enter_exception64? That'd mirror the callers below.
> > 
> > >  {
> > > -	unsigned long sctlr = vcpu_read_sys_reg(vcpu, SCTLR_EL1);
> > > -	unsigned long old, new;
> > > +	unsigned long sctlr, vbar, old, new, mode;
> > > +	u64 exc_offset;
> > > +
> > > +	mode = *vcpu_cpsr(vcpu) & (PSR_MODE_MASK | PSR_MODE32_BIT);
> > > +
> > > +	if      (mode == target_mode)
> > > +		exc_offset = CURRENT_EL_SP_ELx_VECTOR;
> > > +	else if ((mode | 1) == target_mode)
> > > +		exc_offset = CURRENT_EL_SP_EL0_VECTOR;
> > 
> > It would be nice if we could add a mnemonic for the `1` here, e.g.
> > PSR_MODE_SP0 or PSR_MODE_THREAD_BIT.
> 
> I've addressed both comments as follows:
> 
> diff --git a/arch/arm64/include/asm/ptrace.h
> b/arch/arm64/include/asm/ptrace.h
> index bf57308fcd63..953b6a1ce549 100644
> --- a/arch/arm64/include/asm/ptrace.h
> +++ b/arch/arm64/include/asm/ptrace.h
> @@ -35,6 +35,7 @@
>  #define GIC_PRIO_PSR_I_SET		(1 << 4)
> 
>  /* Additional SPSR bits not exposed in the UABI */
> +#define PSR_MODE_THREAD_BIT	(1 << 0)
>  #define PSR_IL_BIT		(1 << 20)
> 
>  /* AArch32-specific ptrace requests */
> diff --git a/arch/arm64/kvm/inject_fault.c b/arch/arm64/kvm/inject_fault.c
> index 3dbcbc839b9c..ebfdfc27b2bd 100644
> --- a/arch/arm64/kvm/inject_fault.c
> +++ b/arch/arm64/kvm/inject_fault.c
> @@ -43,8 +43,8 @@ enum exception_type {
>   * Here we manipulate the fields in order of the AArch64 SPSR_ELx layout,
> from
>   * MSB to LSB.
>   */
> -static void enter_exception(struct kvm_vcpu *vcpu, unsigned long
> target_mode,
> -			    enum exception_type type)
> +static void enter_exception64(struct kvm_vcpu *vcpu, unsigned long
> target_mode,
> +			      enum exception_type type)
>  {
>  	unsigned long sctlr, vbar, old, new, mode;
>  	u64 exc_offset;
> @@ -53,7 +53,7 @@ static void enter_exception(struct kvm_vcpu *vcpu,
> unsigned long target_mode,
> 
>  	if      (mode == target_mode)
>  		exc_offset = CURRENT_EL_SP_ELx_VECTOR;
> -	else if ((mode | 1) == target_mode)
> +	else if ((mode | PSR_MODE_THREAD_BIT) == target_mode)
>  		exc_offset = CURRENT_EL_SP_EL0_VECTOR;
>  	else if (!(mode & PSR_MODE32_BIT))
>  		exc_offset = LOWER_EL_AArch64_VECTOR;
> @@ -126,7 +126,7 @@ static void inject_abt64(struct kvm_vcpu *vcpu, bool
> is_iabt, unsigned long addr
>  	bool is_aarch32 = vcpu_mode_is_32bit(vcpu);
>  	u32 esr = 0;
> 
> -	enter_exception(vcpu, PSR_MODE_EL1h, except_type_sync);
> +	enter_exception64(vcpu, PSR_MODE_EL1h, except_type_sync);
> 
>  	vcpu_write_sys_reg(vcpu, addr, FAR_EL1);
> 
> @@ -156,7 +156,7 @@ static void inject_undef64(struct kvm_vcpu *vcpu)
>  {
>  	u32 esr = (ESR_ELx_EC_UNKNOWN << ESR_ELx_EC_SHIFT);
> 
> -	enter_exception(vcpu, PSR_MODE_EL1h, except_type_sync);
> +	enter_exception64(vcpu, PSR_MODE_EL1h, except_type_sync);
> 
>  	/*
>  	 * Build an unknown exception, depending on the instruction

Thanks; that all looks good to me, and my R-b stands!

Mark.
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 78+ messages in thread

end of thread, back to index

Thread overview: 78+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-22 12:00 [PATCH 00/26] KVM: arm64: Preliminary NV patches Marc Zyngier
2020-04-22 12:00 ` [PATCH 01/26] KVM: arm64: Check advertised Stage-2 page size capability Marc Zyngier
2020-04-22 13:40   ` Suzuki K Poulose
2020-04-22 14:07     ` Marc Zyngier
2020-04-22 14:14       ` Suzuki K Poulose
2020-05-07 11:42   ` Alexandru Elisei
2020-04-22 12:00 ` [PATCH 02/26] KVM: arm64: Move __load_guest_stage2 to kvm_mmu.h Marc Zyngier
2020-04-22 13:51   ` Suzuki K Poulose
2020-04-22 13:59     ` Marc Zyngier
2020-04-22 12:00 ` [PATCH 03/26] KVM: arm64: Factor out stage 2 page table data from struct kvm Marc Zyngier
2020-05-05 15:26   ` Andrew Scull
2020-05-05 16:32     ` Marc Zyngier
2020-05-05 17:23       ` Andrew Scull
2020-05-05 18:10         ` Marc Zyngier
2020-05-05 16:03   ` James Morse
2020-05-05 17:59     ` Marc Zyngier
2020-05-06  9:30       ` Marc Zyngier
2020-05-11 16:38   ` Alexandru Elisei
2020-05-12 11:17     ` James Morse
2020-05-12 15:47       ` Alexandru Elisei
2020-05-12 16:13         ` James Morse
2020-05-12 16:53       ` Alexandru Elisei
2020-05-27  8:41         ` Marc Zyngier
2020-05-27  8:45           ` Alexandru Elisei
2020-04-22 12:00 ` [PATCH 04/26] arm64: Detect the ARMv8.4 TTL feature Marc Zyngier
2020-04-27 15:55   ` Suzuki K Poulose
2020-04-22 12:00 ` [PATCH 05/26] arm64: Document SW reserved PTE/PMD bits in Stage-2 descriptors Marc Zyngier
2020-05-05 15:59   ` Andrew Scull
2020-05-06  9:39     ` Marc Zyngier
2020-05-06 10:11       ` Andrew Scull
2020-04-22 12:00 ` [PATCH 06/26] arm64: Add level-hinted TLB invalidation helper Marc Zyngier
2020-05-05 17:16   ` Andrew Scull
2020-05-06  8:05     ` Marc Zyngier
2020-04-22 12:00 ` [PATCH 07/26] KVM: arm64: Add a level hint to __kvm_tlb_flush_vmid_ipa Marc Zyngier
2020-05-07 15:08   ` Andrew Scull
2020-05-07 15:13     ` Marc Zyngier
2020-04-22 12:00 ` [PATCH 08/26] KVM: arm64: Use TTL hint in when invalidating stage-2 translations Marc Zyngier
2020-05-07 15:13   ` Andrew Scull
2020-05-12 12:04     ` James Morse
2020-05-13  9:06       ` Andrew Scull
2020-05-27  8:59         ` Marc Zyngier
2020-05-12 17:26   ` James Morse
2020-04-22 12:00 ` [PATCH 09/26] KVM: arm64: vgic-v3: Take cpu_if pointer directly instead of vcpu Marc Zyngier
2020-05-07 16:26   ` James Morse
2020-05-08 12:20     ` Marc Zyngier
2020-04-22 12:00 ` [PATCH 10/26] KVM: arm64: Refactor vcpu_{read,write}_sys_reg Marc Zyngier
2020-05-26 16:28   ` James Morse
2020-05-27 10:04     ` Marc Zyngier
2020-04-22 12:00 ` [PATCH 11/26] KVM: arm64: Add missing reset handlers for PMU emulation Marc Zyngier
2020-05-26 16:29   ` James Morse
2020-04-22 12:00 ` [PATCH 12/26] KVM: arm64: Move sysreg reset check to boot time Marc Zyngier
2020-04-22 12:00 ` [PATCH 13/26] KVM: arm64: Introduce accessor for ctxt->sys_reg Marc Zyngier
2020-04-22 12:00 ` [PATCH 14/26] KVM: arm64: hyp: Use ctxt_sys_reg/__vcpu_sys_reg instead of raw sys_regs access Marc Zyngier
2020-04-22 12:00 ` [PATCH 15/26] KVM: arm64: sve: Use __vcpu_sys_reg() " Marc Zyngier
2020-04-22 12:00 ` [PATCH 16/26] KVM: arm64: pauth: Use ctxt_sys_reg() " Marc Zyngier
2020-04-22 12:00 ` [PATCH 17/26] KVM: arm64: debug: " Marc Zyngier
2020-04-22 12:00 ` [PATCH 18/26] KVM: arm64: Don't use empty structures as CPU reset state Marc Zyngier
2020-04-24  4:07   ` Zenghui Yu
2020-04-24  7:45     ` Marc Zyngier
2020-04-28  1:34       ` Zengtao (B)
2020-04-22 12:00 ` [PATCH 19/26] KVM: arm64: Make struct kvm_regs userspace-only Marc Zyngier
2020-05-26 16:29   ` James Morse
2020-05-27 10:22     ` Marc Zyngier
2020-04-22 12:00 ` [PATCH 20/26] KVM: arm64: Move ELR_EL1 to the system register array Marc Zyngier
2020-05-26 16:29   ` James Morse
2020-05-27 10:36     ` Marc Zyngier
2020-04-22 12:00 ` [PATCH 21/26] KVM: arm64: Move SP_EL1 " Marc Zyngier
2020-05-26 16:29   ` James Morse
2020-04-22 12:00 ` [PATCH 22/26] KVM: arm64: Disintegrate SPSR array Marc Zyngier
2020-05-26 16:30   ` James Morse
2020-04-22 12:00 ` [PATCH 23/26] KVM: arm64: Move SPSR_EL1 to the system register array Marc Zyngier
2020-05-26 16:30   ` James Morse
2020-04-22 12:00 ` [PATCH 24/26] KVM: arm64: timers: Rename kvm_timer_sync_hwstate to kvm_timer_sync_user Marc Zyngier
2020-04-22 12:00 ` [PATCH 25/26] KVM: arm64: timers: Move timer registers to the sys_regs file Marc Zyngier
2020-04-22 12:00 ` [PATCH 26/26] KVM: arm64: Parametrize exception entry with a target EL Marc Zyngier
2020-05-19 10:44   ` Mark Rutland
2020-05-27  9:34     ` Marc Zyngier
2020-05-27 14:41       ` Mark Rutland

KVM ARM Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/kvmarm/0 kvmarm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 kvmarm kvmarm/ https://lore.kernel.org/kvmarm \
		kvmarm@lists.cs.columbia.edu
	public-inbox-index kvmarm

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/edu.columbia.cs.lists.kvmarm


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git