* [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support
@ 2023-05-15 17:30 Marc Zyngier
  2023-05-15 17:30 ` [PATCH v10 01/59] KVM: arm64: Move VTCR_EL2 into struct s2_mmu Marc Zyngier
                   ` (61 more replies)
  0 siblings, 62 replies; 95+ messages in thread
From: Marc Zyngier @ 2023-05-15 17:30 UTC (permalink / raw)
  To: kvmarm, kvm, linux-arm-kernel
  Cc: Alexandru Elisei, Andre Przywara, Chase Conklin,
	Christoffer Dall, Ganapatrao Kulkarni, Darren Hart, Jintack Lim,
	Russell King, Miguel Luis, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu

This is the 4th drop of NV support on arm64 for this year.

For the previous episodes, see [1].

What's changed:

- New framework to track system register traps that are reinjected in
  guest EL2. It is expected to replace the discrete handling we have
  enjoyed so far, which didn't scale at all. This has already flushed
  out a number of hidden bugs (a bunch of traps were never
  forwarded...). Still a work in progress, but this is going in the
  right direction; a representative entry is sketched below.

- Allow the L1 hypervisor to have an S2 whose input range is larger
  than the L0 IPA space. This fixes a number of subtle issues,
  depending on how the initial guest was created.

- Consequently, the patch series has grown longer again. Boo. But
  hopefully some of it is easier to review...

[1] https://lore.kernel.org/r/20230405154008.3552854-1-maz@kernel.org
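
To give a feel for the new framework, here is the shape of the tables
(both snippets are lifted from patches 9 and 10 of this series): a trap
control is described once, statically, and system register encodings
are then mapped onto it:

    [CGT_HCR_TACR] = {
        .index      = HCR_EL2,
        .value      = HCR_TACR,
        .mask       = HCR_TACR,
        .behaviour  = BEHAVE_FORWARD_ANY,
    },
    ...
    SR_TRAP(SYS_ACTLR_EL1, CGT_HCR_TACR),

On a trapped ACTLR_EL1 access, the xarray lookup yields CGT_HCR_TACR,
and the access is forwarded to the L1 hypervisor whenever its virtual
HCR_EL2.TACR is set.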

Andre Przywara (1):
  KVM: arm64: nv: vgic: Allow userland to set VGIC maintenance IRQ

Christoffer Dall (5):
  KVM: arm64: nv: Trap EL1 VM register accesses in virtual EL2
  KVM: arm64: nv: Implement nested Stage-2 page table walk logic
  KVM: arm64: nv: Unmap/flush shadow stage 2 page tables
  KVM: arm64: nv: vgic: Emulate the HW bit in software
  KVM: arm64: nv: Sync nested timer state with FEAT_NV2

Jintack Lim (7):
  KVM: arm64: nv: Trap CPACR_EL1 access in virtual EL2
  KVM: arm64: nv: Respect virtual HCR_EL2.TWX setting
  KVM: arm64: nv: Respect virtual CPTR_EL2.{TFP,FPEN} settings
  KVM: arm64: nv: Respect virtual HCR_EL2.{NV,TSC} settings
  KVM: arm64: nv: Configure HCR_EL2 for nested virtualization
  KVM: arm64: nv: Trap and emulate TLBI instructions from virtual EL2
  KVM: arm64: nv: Nested GICv3 Support

Marc Zyngier (46):
  KVM: arm64: Move VTCR_EL2 into struct s2_mmu
  arm64: Add missing Set/Way CMO encodings
  arm64: Add missing VA CMO encodings
  arm64: Add missing ERXMISCx_EL1 encodings
  arm64: Add missing DC ZVA/GVA/GZVA encodings
  arm64: Add TLBI operation encodings
  arm64: Add AT operation encodings
  KVM: arm64: Add missing HCR_EL2 trap bits
  KVM: arm64: nv: Add trap forwarding infrastructure
  KVM: arm64: nv: Add trap forwarding for HCR_EL2
  KVM: arm64: nv: Expose FEAT_EVT to nested guests
  KVM: arm64: nv: Add trap forwarding for MDCR_EL2
  KVM: arm64: nv: Add trap forwarding for CNTHCTL_EL2
  KVM: arm64: nv: Add non-VHE-EL2->EL1 translation helpers
  KVM: arm64: nv: Handle virtual EL2 registers in
    vcpu_read/write_sys_reg()
  KVM: arm64: nv: Handle SPSR_EL2 specially
  KVM: arm64: nv: Handle HCR_EL2.E2H specially
  KVM: arm64: nv: Save/Restore vEL2 sysregs
  KVM: arm64: nv: Support multiple nested Stage-2 mmu structures
  KVM: arm64: nv: Handle shadow stage 2 page faults
  KVM: arm64: nv: Restrict S2 RD/WR permissions to match the guest's
  KVM: arm64: nv: Set a handler for the system instruction traps
  KVM: arm64: nv: Trap and emulate AT instructions from virtual EL2
  KVM: arm64: nv: Fold guest's HCR_EL2 configuration into the host's
  KVM: arm64: nv: Hide RAS from nested guests
  KVM: arm64: nv: Add handling of EL2-specific timer registers
  KVM: arm64: nv: Load timer before the GIC
  KVM: arm64: nv: Don't load the GICv4 context on entering a nested
    guest
  KVM: arm64: nv: Implement maintenance interrupt forwarding
  KVM: arm64: nv: Deal with broken VGIC on maintenance interrupt
    delivery
  KVM: arm64: nv: Allow userspace to request KVM_ARM_VCPU_NESTED_VIRT
  KVM: arm64: nv: Add handling of FEAT_TTL TLB invalidation
  KVM: arm64: nv: Invalidate TLBs based on shadow S2 TTL-like
    information
  KVM: arm64: nv: Tag shadow S2 entries with nested level
  KVM: arm64: nv: Add include containing the VNCR_EL2 offsets
  KVM: arm64: nv: Map VNCR-capable registers to a separate page
  KVM: arm64: nv: Move nested vgic state into the sysreg file
  KVM: arm64: Add FEAT_NV2 cpu feature
  KVM: arm64: nv: Fold GICv3 host trapping requirements into guest setup
  KVM: arm64: nv: Publish emulated timer interrupt state in the
    in-memory state
  KVM: arm64: nv: Allocate VNCR page when required
  KVM: arm64: nv: Enable ARMv8.4-NV support
  KVM: arm64: nv: Fast-track 'InHost' exception returns
  KVM: arm64: nv: Fast-track EL1 TLBIs for VHE guests
  KVM: arm64: nv: Use FEAT_ECV to trap access to EL0 timers
  KVM: arm64: nv: Accelerate EL0 timer read accesses when FEAT_ECV is on

 .../virt/kvm/devices/arm-vgic-v3.rst          |  12 +-
 arch/arm64/include/asm/esr.h                  |   1 +
 arch/arm64/include/asm/kvm_arm.h              |  14 +
 arch/arm64/include/asm/kvm_asm.h              |   4 +
 arch/arm64/include/asm/kvm_emulate.h          |  93 +-
 arch/arm64/include/asm/kvm_host.h             | 181 +++-
 arch/arm64/include/asm/kvm_hyp.h              |   2 +
 arch/arm64/include/asm/kvm_mmu.h              |  20 +-
 arch/arm64/include/asm/kvm_nested.h           | 133 +++
 arch/arm64/include/asm/stage2_pgtable.h       |   4 +-
 arch/arm64/include/asm/sysreg.h               | 196 ++++
 arch/arm64/include/asm/vncr_mapping.h         |  74 ++
 arch/arm64/include/uapi/asm/kvm.h             |   1 +
 arch/arm64/kernel/cpufeature.c                |  11 +
 arch/arm64/kvm/Makefile                       |   4 +-
 arch/arm64/kvm/arch_timer.c                   |  98 +-
 arch/arm64/kvm/arm.c                          |  33 +-
 arch/arm64/kvm/at.c                           | 219 ++++
 arch/arm64/kvm/emulate-nested.c               | 934 ++++++++++++++++-
 arch/arm64/kvm/handle_exit.c                  |  29 +-
 arch/arm64/kvm/hyp/include/hyp/switch.h       |   8 +-
 arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h    |   5 +-
 arch/arm64/kvm/hyp/nvhe/mem_protect.c         |   8 +-
 arch/arm64/kvm/hyp/nvhe/pkvm.c                |   4 +-
 arch/arm64/kvm/hyp/nvhe/switch.c              |   2 +-
 arch/arm64/kvm/hyp/nvhe/sysreg-sr.c           |   2 +-
 arch/arm64/kvm/hyp/pgtable.c                  |   2 +-
 arch/arm64/kvm/hyp/vgic-v3-sr.c               |   6 +-
 arch/arm64/kvm/hyp/vhe/switch.c               | 206 +++-
 arch/arm64/kvm/hyp/vhe/sysreg-sr.c            | 124 ++-
 arch/arm64/kvm/hyp/vhe/tlb.c                  |  83 ++
 arch/arm64/kvm/mmu.c                          | 255 ++++-
 arch/arm64/kvm/nested.c                       | 799 ++++++++++++++-
 arch/arm64/kvm/pkvm.c                         |   2 +-
 arch/arm64/kvm/reset.c                        |   7 +
 arch/arm64/kvm/sys_regs.c                     | 958 +++++++++++++++++-
 arch/arm64/kvm/trace_arm.h                    |  19 +
 arch/arm64/kvm/vgic/vgic-init.c               |  33 +
 arch/arm64/kvm/vgic/vgic-kvm-device.c         |  32 +-
 arch/arm64/kvm/vgic/vgic-v3-nested.c          | 248 +++++
 arch/arm64/kvm/vgic/vgic-v3.c                 |  43 +-
 arch/arm64/kvm/vgic/vgic.c                    |  29 +
 arch/arm64/kvm/vgic/vgic.h                    |  10 +
 arch/arm64/tools/cpucaps                      |   1 +
 include/clocksource/arm_arch_timer.h          |   4 +
 include/kvm/arm_arch_timer.h                  |   1 +
 include/kvm/arm_vgic.h                        |  17 +
 include/uapi/linux/kvm.h                      |   1 +
 tools/arch/arm/include/uapi/asm/kvm.h         |   1 +
 49 files changed, 4790 insertions(+), 183 deletions(-)
 create mode 100644 arch/arm64/include/asm/vncr_mapping.h
 create mode 100644 arch/arm64/kvm/at.c
 create mode 100644 arch/arm64/kvm/vgic/vgic-v3-nested.c

-- 
2.34.1



* [PATCH v10 01/59] KVM: arm64: Move VTCR_EL2 into struct s2_mmu
  2023-05-15 17:30 [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support Marc Zyngier
@ 2023-05-15 17:30 ` Marc Zyngier
  2023-05-15 17:30 ` [PATCH v10 02/59] arm64: Add missing Set/Way CMO encodings Marc Zyngier
                   ` (60 subsequent siblings)
  61 siblings, 0 replies; 95+ messages in thread
From: Marc Zyngier @ 2023-05-15 17:30 UTC (permalink / raw)
  To: kvmarm, kvm, linux-arm-kernel
  Cc: Alexandru Elisei, Andre Przywara, Chase Conklin,
	Christoffer Dall, Ganapatrao Kulkarni, Darren Hart, Jintack Lim,
	Russell King, Miguel Luis, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu

We currently have a global VTCR_EL2 value for each guest, even
if the guest uses NV. This implies that the guest's own S2 must
fit in the host's. This is odd, for multiple reasons:

- the PARange values and the number of IPA bits don't necessarily
  match: you can have 33 bits of IPA space, and yet you can only
  describe 32 or 36 bits of PARange

- When userspace sets the IPA space, it creates a contract with the
  kernel saying "this is the IPA space I'm prepared to handle".
  At no point does it constrain the guest's own IPA space as
  long as the guest doesn't try to use a [I]PA outside of the
  IPA space set by userspace

- We don't even try to hide the value of ID_AA64MMFR0_EL1.PARange.

And then there is the consequence of the above: if a guest tries
to create an S2 whose input address range is larger than the IPA
space defined by the host, we inject a fatal exception.

This is no good. For all intents and purposes, a guest should be
able to have the S2 it really wants, as long as the *output* address
of that S2 isn't outside of the IPA space.

For that, we need to have a per-s2_mmu VTCR_EL2 setting, which
allows us to represent the full PARange. Move the vtcr field into
the s2_mmu structure, which has no impact whatsoever, except for NV.

Note that once we are able to override ID_AA64MMFR0_EL1.PARange
from userspace, we'll also be able to restrict the size of the
shadow S2 that NV uses.
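
As a minimal sketch of what this enables (the helper below is
hypothetical, not part of this patch), a shadow S2 can now be sized
independently of the canonical MMU:

    /* Hypothetical illustration: size a shadow S2 independently */
    static void init_shadow_s2_vtcr(struct kvm_s2_mmu *mmu, u32 phys_shift)
    {
        u64 mmfr0 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1);
        u64 mmfr1 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR1_EL1);

        /* phys_shift can now track the PARange exposed to the guest */
        mmu->vtcr = kvm_get_vtcr(mmfr0, mmfr1, phys_shift);
    }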

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_host.h       | 13 ++++++++++---
 arch/arm64/include/asm/kvm_mmu.h        |  8 ++++----
 arch/arm64/include/asm/stage2_pgtable.h |  4 ++--
 arch/arm64/kvm/hyp/nvhe/mem_protect.c   |  8 ++++----
 arch/arm64/kvm/hyp/nvhe/pkvm.c          |  4 ++--
 arch/arm64/kvm/hyp/pgtable.c            |  2 +-
 arch/arm64/kvm/mmu.c                    | 13 +++++++------
 arch/arm64/kvm/pkvm.c                   |  2 +-
 arch/arm64/kvm/vgic/vgic-kvm-device.c   |  3 ++-
 9 files changed, 33 insertions(+), 24 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index ec3e69090c6e..f2e3b5889f8b 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -156,6 +156,16 @@ struct kvm_s2_mmu {
 	phys_addr_t	pgd_phys;
 	struct kvm_pgtable *pgt;
 
+	/*
+	 * VTCR value used on the host. For a non-NV guest (or an NV
+	 * guest that runs in a context where its own S2 doesn't
+	 * apply), its T0SZ value reflects that of the IPA size.
+	 *
+	 * For a shadow S2 MMU, T0SZ reflects the PARange exposed to
+	 * the guest.
+	 */
+	u64	vtcr;
+
 	/* The last vcpu id that ran on each physical CPU */
 	int __percpu *last_vcpu_ran;
 
@@ -188,9 +198,6 @@ struct kvm_protected_vm {
 struct kvm_arch {
 	struct kvm_s2_mmu mmu;
 
-	/* VTCR_EL2 value for this VM */
-	u64    vtcr;
-
 	/* Interrupt controller */
 	struct vgic_dist	vgic;
 
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 27e63c111f78..346754b2f3b0 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -150,9 +150,9 @@ static __always_inline unsigned long __kern_hyp_va(unsigned long v)
  */
 #define KVM_PHYS_SHIFT	(40)
 
-#define kvm_phys_shift(kvm)		VTCR_EL2_IPA(kvm->arch.vtcr)
-#define kvm_phys_size(kvm)		(_AC(1, ULL) << kvm_phys_shift(kvm))
-#define kvm_phys_mask(kvm)		(kvm_phys_size(kvm) - _AC(1, ULL))
+#define kvm_phys_shift(mmu)		VTCR_EL2_IPA((mmu)->vtcr)
+#define kvm_phys_size(mmu)		(_AC(1, ULL) << kvm_phys_shift(mmu))
+#define kvm_phys_mask(mmu)		(kvm_phys_size(mmu) - _AC(1, ULL))
 
 #include <asm/kvm_pgtable.h>
 #include <asm/stage2_pgtable.h>
@@ -296,7 +296,7 @@ static __always_inline u64 kvm_get_vttbr(struct kvm_s2_mmu *mmu)
 static __always_inline void __load_stage2(struct kvm_s2_mmu *mmu,
 					  struct kvm_arch *arch)
 {
-	write_sysreg(arch->vtcr, vtcr_el2);
+	write_sysreg(mmu->vtcr, vtcr_el2);
 	write_sysreg(kvm_get_vttbr(mmu), vttbr_el2);
 
 	/*
diff --git a/arch/arm64/include/asm/stage2_pgtable.h b/arch/arm64/include/asm/stage2_pgtable.h
index c8dca8ae359c..23d27623e478 100644
--- a/arch/arm64/include/asm/stage2_pgtable.h
+++ b/arch/arm64/include/asm/stage2_pgtable.h
@@ -21,13 +21,13 @@
  * (IPA_SHIFT - 4).
  */
 #define stage2_pgtable_levels(ipa)	ARM64_HW_PGTABLE_LEVELS((ipa) - 4)
-#define kvm_stage2_levels(kvm)		VTCR_EL2_LVLS(kvm->arch.vtcr)
+#define kvm_stage2_levels(mmu)		VTCR_EL2_LVLS((mmu)->vtcr)
 
 /*
  * kvm_mmmu_cache_min_pages() is the number of pages required to install
  * a stage-2 translation. We pre-allocate the entry level page table at
  * the VM creation.
  */
-#define kvm_mmu_cache_min_pages(kvm)	(kvm_stage2_levels(kvm) - 1)
+#define kvm_mmu_cache_min_pages(mmu)	(kvm_stage2_levels(mmu) - 1)
 
 #endif	/* __ARM64_S2_PGTABLE_H_ */
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 2e9ec4a2a4a3..07766e33cc32 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -129,8 +129,8 @@ static void prepare_host_vtcr(void)
 	parange = kvm_get_parange(id_aa64mmfr0_el1_sys_val);
 	phys_shift = id_aa64mmfr0_parange_to_phys_shift(parange);
 
-	host_mmu.arch.vtcr = kvm_get_vtcr(id_aa64mmfr0_el1_sys_val,
-					  id_aa64mmfr1_el1_sys_val, phys_shift);
+	host_mmu.arch.mmu.vtcr = kvm_get_vtcr(id_aa64mmfr0_el1_sys_val,
+					      id_aa64mmfr1_el1_sys_val, phys_shift);
 }
 
 static bool host_stage2_force_pte_cb(u64 addr, u64 end, enum kvm_pgtable_prot prot);
@@ -235,7 +235,7 @@ int kvm_guest_prepare_stage2(struct pkvm_hyp_vm *vm, void *pgd)
 	unsigned long nr_pages;
 	int ret;
 
-	nr_pages = kvm_pgtable_stage2_pgd_size(vm->kvm.arch.vtcr) >> PAGE_SHIFT;
+	nr_pages = kvm_pgtable_stage2_pgd_size(mmu->vtcr) >> PAGE_SHIFT;
 	ret = hyp_pool_init(&vm->pool, hyp_virt_to_pfn(pgd), nr_pages, 0);
 	if (ret)
 		return ret;
@@ -295,7 +295,7 @@ int __pkvm_prot_finalize(void)
 		return -EPERM;
 
 	params->vttbr = kvm_get_vttbr(mmu);
-	params->vtcr = host_mmu.arch.vtcr;
+	params->vtcr = mmu->vtcr;
 	params->hcr_el2 |= HCR_VM;
 
 	/*
diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
index a06ece14a6d8..28d17e91346b 100644
--- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
+++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
@@ -288,7 +288,7 @@ static void init_pkvm_hyp_vm(struct kvm *host_kvm, struct pkvm_hyp_vm *hyp_vm,
 {
 	hyp_vm->host_kvm = host_kvm;
 	hyp_vm->kvm.created_vcpus = nr_vcpus;
-	hyp_vm->kvm.arch.vtcr = host_mmu.arch.vtcr;
+	hyp_vm->kvm.arch.mmu.vtcr = host_mmu.arch.mmu.vtcr;
 }
 
 static int init_pkvm_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu,
@@ -468,7 +468,7 @@ int __pkvm_init_vm(struct kvm *host_kvm, unsigned long vm_hva,
 	}
 
 	vm_size = pkvm_get_hyp_vm_size(nr_vcpus);
-	pgd_size = kvm_pgtable_stage2_pgd_size(host_mmu.arch.vtcr);
+	pgd_size = kvm_pgtable_stage2_pgd_size(host_mmu.arch.mmu.vtcr);
 
 	ret = -ENOMEM;
 
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 5282cb9ca4cf..028cebd6ff3e 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -1237,7 +1237,7 @@ int __kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu,
 			      kvm_pgtable_force_pte_cb_t force_pte_cb)
 {
 	size_t pgd_sz;
-	u64 vtcr = mmu->arch->vtcr;
+	u64 vtcr = mmu->vtcr;
 	u32 ia_bits = VTCR_EL2_IPA(vtcr);
 	u32 sl0 = FIELD_GET(VTCR_EL2_SL0_MASK, vtcr);
 	u32 start_level = VTCR_EL2_TGRAN_SL0_BASE - sl0;
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 3b9d4d24c361..c934c911c702 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -750,7 +750,7 @@ int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu, unsigned long t
 
 	mmfr0 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1);
 	mmfr1 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR1_EL1);
-	kvm->arch.vtcr = kvm_get_vtcr(mmfr0, mmfr1, phys_shift);
+	mmu->vtcr = kvm_get_vtcr(mmfr0, mmfr1, phys_shift);
 
 	if (mmu->pgt != NULL) {
 		kvm_err("kvm_arch already initialized?\n");
@@ -915,7 +915,8 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
 	phys_addr_t addr;
 	int ret = 0;
 	struct kvm_mmu_memory_cache cache = { .gfp_zero = __GFP_ZERO };
-	struct kvm_pgtable *pgt = kvm->arch.mmu.pgt;
+	struct kvm_s2_mmu *mmu = &kvm->arch.mmu;
+	struct kvm_pgtable *pgt = mmu->pgt;
 	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_DEVICE |
 				     KVM_PGTABLE_PROT_R |
 				     (writable ? KVM_PGTABLE_PROT_W : 0);
@@ -928,7 +929,7 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
 
 	for (addr = guest_ipa; addr < guest_ipa + size; addr += PAGE_SIZE) {
 		ret = kvm_mmu_topup_memory_cache(&cache,
-						 kvm_mmu_cache_min_pages(kvm));
+						 kvm_mmu_cache_min_pages(mmu));
 		if (ret)
 			break;
 
@@ -1252,7 +1253,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	if (fault_status != ESR_ELx_FSC_PERM ||
 	    (logging_active && write_fault)) {
 		ret = kvm_mmu_topup_memory_cache(memcache,
-						 kvm_mmu_cache_min_pages(kvm));
+						 kvm_mmu_cache_min_pages(vcpu->arch.hw_mmu));
 		if (ret)
 			return ret;
 	}
@@ -1569,7 +1570,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 	}
 
 	/* Userspace should not be able to register out-of-bounds IPAs */
-	VM_BUG_ON(fault_ipa >= kvm_phys_size(vcpu->kvm));
+	VM_BUG_ON(fault_ipa >= kvm_phys_size(vcpu->arch.hw_mmu));
 
 	if (fault_status == ESR_ELx_FSC_ACCESS) {
 		handle_access_fault(vcpu, fault_ipa);
@@ -1823,7 +1824,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
 	 * Prevent userspace from creating a memory region outside of the IPA
 	 * space addressable by the KVM guest IPA space.
 	 */
-	if ((new->base_gfn + new->npages) > (kvm_phys_size(kvm) >> PAGE_SHIFT))
+	if ((new->base_gfn + new->npages) > (kvm_phys_size(&kvm->arch.mmu) >> PAGE_SHIFT))
 		return -EFAULT;
 
 	hva = new->userspace_addr;
diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
index cf56958b1492..1032690c20d1 100644
--- a/arch/arm64/kvm/pkvm.c
+++ b/arch/arm64/kvm/pkvm.c
@@ -118,7 +118,7 @@ static int __pkvm_create_hyp_vm(struct kvm *host_kvm)
 	if (host_kvm->created_vcpus < 1)
 		return -EINVAL;
 
-	pgd_sz = kvm_pgtable_stage2_pgd_size(host_kvm->arch.vtcr);
+	pgd_sz = kvm_pgtable_stage2_pgd_size(host_kvm->arch.mmu.vtcr);
 
 	/*
 	 * The PGD pages will be reclaimed using a hyp_memcache which implies
diff --git a/arch/arm64/kvm/vgic/vgic-kvm-device.c b/arch/arm64/kvm/vgic/vgic-kvm-device.c
index 35cfa268fd5d..a6294ec3714c 100644
--- a/arch/arm64/kvm/vgic/vgic-kvm-device.c
+++ b/arch/arm64/kvm/vgic/vgic-kvm-device.c
@@ -27,7 +27,8 @@ int vgic_check_iorange(struct kvm *kvm, phys_addr_t ioaddr,
 	if (addr + size < addr)
 		return -EINVAL;
 
-	if (addr & ~kvm_phys_mask(kvm) || addr + size > kvm_phys_size(kvm))
+	if (addr & ~kvm_phys_mask(&kvm->arch.mmu) ||
+	    (addr + size) > kvm_phys_size(&kvm->arch.mmu))
 		return -E2BIG;
 
 	return 0;
-- 
2.34.1



* [PATCH v10 02/59] arm64: Add missing Set/Way CMO encodings
  2023-05-15 17:30 [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support Marc Zyngier
  2023-05-15 17:30 ` [PATCH v10 01/59] KVM: arm64: Move VTCR_EL2 into struct s2_mmu Marc Zyngier
@ 2023-05-15 17:30 ` Marc Zyngier
  2023-05-15 17:30 ` [PATCH v10 03/59] arm64: Add missing VA " Marc Zyngier
                   ` (59 subsequent siblings)
  61 siblings, 0 replies; 95+ messages in thread
From: Marc Zyngier @ 2023-05-15 17:30 UTC (permalink / raw)
  To: kvmarm, kvm, linux-arm-kernel
  Cc: Alexandru Elisei, Andre Przywara, Chase Conklin,
	Christoffer Dall, Ganapatrao Kulkarni, Darren Hart, Jintack Lim,
	Russell King, Miguel Luis, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu

Add the missing Set/Way CMOs that apply to tagged memory.
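
These encodings are consumed later in the series by the NV trap
forwarding tables, which match them against HCR_EL2.TSW (see patch 10):

    SR_TRAP(SYS_DC_IGSW,   CGT_HCR_TSW),
    SR_TRAP(SYS_DC_CIGDSW, CGT_HCR_TSW),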

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/sysreg.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index a43f21559c3e..108663aebccb 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -115,8 +115,14 @@
 #define SB_BARRIER_INSN			__SYS_BARRIER_INSN(0, 7, 31)
 
 #define SYS_DC_ISW			sys_insn(1, 0, 7, 6, 2)
+#define SYS_DC_IGSW			sys_insn(1, 0, 7, 6, 4)
+#define SYS_DC_IGDSW			sys_insn(1, 0, 7, 6, 6)
 #define SYS_DC_CSW			sys_insn(1, 0, 7, 10, 2)
+#define SYS_DC_CGSW			sys_insn(1, 0, 7, 10, 4)
+#define SYS_DC_CGDSW			sys_insn(1, 0, 7, 10, 6)
 #define SYS_DC_CISW			sys_insn(1, 0, 7, 14, 2)
+#define SYS_DC_CIGSW			sys_insn(1, 0, 7, 14, 4)
+#define SYS_DC_CIGDSW			sys_insn(1, 0, 7, 14, 6)
 
 /*
  * Automatically generated definitions for system registers, the
-- 
2.34.1



* [PATCH v10 03/59] arm64: Add missing VA CMO encodings
  2023-05-15 17:30 [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support Marc Zyngier
  2023-05-15 17:30 ` [PATCH v10 01/59] KVM: arm64: Move VTCR_EL2 into struct s2_mmu Marc Zyngier
  2023-05-15 17:30 ` [PATCH v10 02/59] arm64: Add missing Set/Way CMO encodings Marc Zyngier
@ 2023-05-15 17:30 ` Marc Zyngier
  2023-06-05 17:46   ` Eric Auger
  2023-05-15 17:30 ` [PATCH v10 04/59] arm64: Add missing ERXMISCx_EL1 encodings Marc Zyngier
                   ` (58 subsequent siblings)
  61 siblings, 1 reply; 95+ messages in thread
From: Marc Zyngier @ 2023-05-15 17:30 UTC (permalink / raw)
  To: kvmarm, kvm, linux-arm-kernel
  Cc: Alexandru Elisei, Andre Przywara, Chase Conklin,
	Christoffer Dall, Ganapatrao Kulkarni, Darren Hart, Jintack Lim,
	Russell King, Miguel Luis, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu

Add the missing VA-based CMO encodings.
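
Patch 10 then splits these between the point-of-coherency and
point-of-unification traps, for example:

    SR_TRAP(SYS_DC_CIVAC, CGT_HCR_TPC),
    SR_TRAP(SYS_DC_CVAU,  CGT_HCR_TPU_TOCU),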

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/sysreg.h | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 108663aebccb..071cc8545fbe 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -124,6 +124,32 @@
 #define SYS_DC_CIGSW			sys_insn(1, 0, 7, 14, 4)
 #define SYS_DC_CIGDSW			sys_insn(1, 0, 7, 14, 6)
 
+#define SYS_IC_IALLUIS			sys_insn(1, 0, 7, 1, 0)
+#define SYS_IC_IALLU			sys_insn(1, 0, 7, 5, 0)
+#define SYS_IC_IVAU			sys_insn(1, 3, 7, 5, 1)
+
+#define SYS_DC_IVAC			sys_insn(1, 0, 7, 6, 1)
+#define SYS_DC_IGVAC			sys_insn(1, 0, 7, 6, 3)
+#define SYS_DC_IGDVAC			sys_insn(1, 0, 7, 6, 5)
+
+#define SYS_DC_CVAC			sys_insn(1, 3, 7, 10, 1)
+#define SYS_DC_CGVAC			sys_insn(1, 3, 7, 10, 3)
+#define SYS_DC_CGDVAC			sys_insn(1, 3, 7, 10, 5)
+
+#define SYS_DC_CVAU			sys_insn(1, 3, 7, 11, 1)
+
+#define SYS_DC_CVAP			sys_insn(1, 3, 7, 12, 1)
+#define SYS_DC_CGVAP			sys_insn(1, 3, 7, 12, 3)
+#define SYS_DC_CGDVAP			sys_insn(1, 3, 7, 12, 5)
+
+#define SYS_DC_CVADP			sys_insn(1, 3, 7, 13, 1)
+#define SYS_DC_CGVADP			sys_insn(1, 3, 7, 13, 3)
+#define SYS_DC_CGDVADP			sys_insn(1, 3, 7, 13, 5)
+
+#define SYS_DC_CIVAC			sys_insn(1, 3, 7, 14, 1)
+#define SYS_DC_CIGVAC			sys_insn(1, 3, 7, 14, 3)
+#define SYS_DC_CIGDVAC			sys_insn(1, 3, 7, 14, 5)
+
 /*
  * Automatically generated definitions for system registers, the
  * manual encodings below are in the process of being converted to
-- 
2.34.1



* [PATCH v10 04/59] arm64: Add missing ERXMISCx_EL1 encodings
  2023-05-15 17:30 [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support Marc Zyngier
                   ` (2 preceding siblings ...)
  2023-05-15 17:30 ` [PATCH v10 03/59] arm64: Add missing VA " Marc Zyngier
@ 2023-05-15 17:30 ` Marc Zyngier
  2023-06-05 17:47   ` Eric Auger
  2023-05-15 17:30 ` [PATCH v10 05/59] arm64: Add missing DC ZVA/GVA/GZVA encodings Marc Zyngier
                   ` (57 subsequent siblings)
  61 siblings, 1 reply; 95+ messages in thread
From: Marc Zyngier @ 2023-05-15 17:30 UTC (permalink / raw)
  To: kvmarm, kvm, linux-arm-kernel
  Cc: Alexandru Elisei, Andre Przywara, Chase Conklin,
	Christoffer Dall, Ganapatrao Kulkarni, Darren Hart, Jintack Lim,
	Russell King, Miguel Luis, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu

We only describe ERXMISC{0,1}_EL1. Add ERXMISC{2,3}_EL1 for good measure.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/sysreg.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 071cc8545fbe..71305f7425db 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -239,6 +239,8 @@
 #define SYS_ERXADDR_EL1			sys_reg(3, 0, 5, 4, 3)
 #define SYS_ERXMISC0_EL1		sys_reg(3, 0, 5, 5, 0)
 #define SYS_ERXMISC1_EL1		sys_reg(3, 0, 5, 5, 1)
+#define SYS_ERXMISC2_EL1		sys_reg(3, 0, 5, 5, 2)
+#define SYS_ERXMISC3_EL1		sys_reg(3, 0, 5, 5, 3)
 #define SYS_TFSR_EL1			sys_reg(3, 0, 5, 6, 0)
 #define SYS_TFSRE0_EL1			sys_reg(3, 0, 5, 6, 1)
 
-- 
2.34.1



* [PATCH v10 05/59] arm64: Add missing DC ZVA/GVA/GZVA encodings
  2023-05-15 17:30 [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support Marc Zyngier
                   ` (3 preceding siblings ...)
  2023-05-15 17:30 ` [PATCH v10 04/59] arm64: Add missing ERXMISCx_EL1 encodings Marc Zyngier
@ 2023-05-15 17:30 ` Marc Zyngier
  2023-06-05 17:47   ` Eric Auger
  2023-05-15 17:30 ` [PATCH v10 06/59] arm64: Add TLBI operation encodings Marc Zyngier
                   ` (56 subsequent siblings)
  61 siblings, 1 reply; 95+ messages in thread
From: Marc Zyngier @ 2023-05-15 17:30 UTC (permalink / raw)
  To: kvmarm, kvm, linux-arm-kernel
  Cc: Alexandru Elisei, Andre Przywara, Chase Conklin,
	Christoffer Dall, Ganapatrao Kulkarni, Darren Hart, Jintack Lim,
	Russell King, Miguel Luis, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu

Add the missing DC *VA encodings.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/sysreg.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 71305f7425db..28ccc379a172 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -150,6 +150,11 @@
 #define SYS_DC_CIGVAC			sys_insn(1, 3, 7, 14, 3)
 #define SYS_DC_CIGDVAC			sys_insn(1, 3, 7, 14, 5)
 
+/* Data cache zero operations */
+#define SYS_DC_ZVA			sys_insn(1, 3, 7, 4, 1)
+#define SYS_DC_GVA			sys_insn(1, 3, 7, 4, 3)
+#define SYS_DC_GZVA			sys_insn(1, 3, 7, 4, 4)
+
 /*
  * Automatically generated definitions for system registers, the
  * manual encodings below are in the process of being converted to
-- 
2.34.1



* [PATCH v10 06/59] arm64: Add TLBI operation encodings
  2023-05-15 17:30 [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support Marc Zyngier
                   ` (4 preceding siblings ...)
  2023-05-15 17:30 ` [PATCH v10 05/59] arm64: Add missing DC ZVA/GVA/GZVA encodings Marc Zyngier
@ 2023-05-15 17:30 ` Marc Zyngier
  2023-06-06  9:33   ` Eric Auger
  2023-05-15 17:30 ` [PATCH v10 07/59] arm64: Add AT " Marc Zyngier
                   ` (55 subsequent siblings)
  61 siblings, 1 reply; 95+ messages in thread
From: Marc Zyngier @ 2023-05-15 17:30 UTC (permalink / raw)
  To: kvmarm, kvm, linux-arm-kernel
  Cc: Alexandru Elisei, Andre Przywara, Chase Conklin,
	Christoffer Dall, Ganapatrao Kulkarni, Darren Hart, Jintack Lim,
	Russell King, Miguel Luis, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu

Add all the TLBI encodings that are usable from Non-Secure.
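
For reference, these follow the usual sys_insn(op0, op1, CRn, CRm, op2)
layout, with CRn=8 for the base operations and CRn=9 for their nXS
variants. For example:

    /* TLBI VMALLE1IS: op0=1, op1=0 (EL1), CRn=8, CRm=3, op2=0 */
    #define OP_TLBI_VMALLE1IS    sys_insn(1, 0, 8, 3, 0)

Patch 10 then matches these encodings against the HCR_EL2.TTLB{,IS,OS}
trap controls.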

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/sysreg.h | 128 ++++++++++++++++++++++++++++++++
 1 file changed, 128 insertions(+)

diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 28ccc379a172..2727e68dd65b 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -568,6 +568,134 @@
 
 #define SYS_SP_EL2			sys_reg(3, 6,  4, 1, 0)
 
+/* TLBI instructions */
+#define OP_TLBI_VMALLE1OS		sys_insn(1, 0, 8, 1, 0)
+#define OP_TLBI_VAE1OS			sys_insn(1, 0, 8, 1, 1)
+#define OP_TLBI_ASIDE1OS		sys_insn(1, 0, 8, 1, 2)
+#define OP_TLBI_VAAE1OS			sys_insn(1, 0, 8, 1, 3)
+#define OP_TLBI_VALE1OS			sys_insn(1, 0, 8, 1, 5)
+#define OP_TLBI_VAALE1OS		sys_insn(1, 0, 8, 1, 7)
+#define OP_TLBI_RVAE1IS			sys_insn(1, 0, 8, 2, 1)
+#define OP_TLBI_RVAAE1IS		sys_insn(1, 0, 8, 2, 3)
+#define OP_TLBI_RVALE1IS		sys_insn(1, 0, 8, 2, 5)
+#define OP_TLBI_RVAALE1IS		sys_insn(1, 0, 8, 2, 7)
+#define OP_TLBI_VMALLE1IS		sys_insn(1, 0, 8, 3, 0)
+#define OP_TLBI_VAE1IS			sys_insn(1, 0, 8, 3, 1)
+#define OP_TLBI_ASIDE1IS		sys_insn(1, 0, 8, 3, 2)
+#define OP_TLBI_VAAE1IS			sys_insn(1, 0, 8, 3, 3)
+#define OP_TLBI_VALE1IS			sys_insn(1, 0, 8, 3, 5)
+#define OP_TLBI_VAALE1IS		sys_insn(1, 0, 8, 3, 7)
+#define OP_TLBI_RVAE1OS			sys_insn(1, 0, 8, 5, 1)
+#define OP_TLBI_RVAAE1OS		sys_insn(1, 0, 8, 5, 3)
+#define OP_TLBI_RVALE1OS		sys_insn(1, 0, 8, 5, 5)
+#define OP_TLBI_RVAALE1OS		sys_insn(1, 0, 8, 5, 7)
+#define OP_TLBI_RVAE1			sys_insn(1, 0, 8, 6, 1)
+#define OP_TLBI_RVAAE1			sys_insn(1, 0, 8, 6, 3)
+#define OP_TLBI_RVALE1			sys_insn(1, 0, 8, 6, 5)
+#define OP_TLBI_RVAALE1			sys_insn(1, 0, 8, 6, 7)
+#define OP_TLBI_VMALLE1			sys_insn(1, 0, 8, 7, 0)
+#define OP_TLBI_VAE1			sys_insn(1, 0, 8, 7, 1)
+#define OP_TLBI_ASIDE1			sys_insn(1, 0, 8, 7, 2)
+#define OP_TLBI_VAAE1			sys_insn(1, 0, 8, 7, 3)
+#define OP_TLBI_VALE1			sys_insn(1, 0, 8, 7, 5)
+#define OP_TLBI_VAALE1			sys_insn(1, 0, 8, 7, 7)
+#define OP_TLBI_VMALLE1OSNXS		sys_insn(1, 0, 9, 1, 0)
+#define OP_TLBI_VAE1OSNXS		sys_insn(1, 0, 9, 1, 1)
+#define OP_TLBI_ASIDE1OSNXS		sys_insn(1, 0, 9, 1, 2)
+#define OP_TLBI_VAAE1OSNXS		sys_insn(1, 0, 9, 1, 3)
+#define OP_TLBI_VALE1OSNXS		sys_insn(1, 0, 9, 1, 5)
+#define OP_TLBI_VAALE1OSNXS		sys_insn(1, 0, 9, 1, 7)
+#define OP_TLBI_RVAE1ISNXS		sys_insn(1, 0, 9, 2, 1)
+#define OP_TLBI_RVAAE1ISNXS		sys_insn(1, 0, 9, 2, 3)
+#define OP_TLBI_RVALE1ISNXS		sys_insn(1, 0, 9, 2, 5)
+#define OP_TLBI_RVAALE1ISNXS		sys_insn(1, 0, 9, 2, 7)
+#define OP_TLBI_VMALLE1ISNXS		sys_insn(1, 0, 9, 3, 0)
+#define OP_TLBI_VAE1ISNXS		sys_insn(1, 0, 9, 3, 1)
+#define OP_TLBI_ASIDE1ISNXS		sys_insn(1, 0, 9, 3, 2)
+#define OP_TLBI_VAAE1ISNXS		sys_insn(1, 0, 9, 3, 3)
+#define OP_TLBI_VALE1ISNXS		sys_insn(1, 0, 9, 3, 5)
+#define OP_TLBI_VAALE1ISNXS		sys_insn(1, 0, 9, 3, 7)
+#define OP_TLBI_RVAE1OSNXS		sys_insn(1, 0, 9, 5, 1)
+#define OP_TLBI_RVAAE1OSNXS		sys_insn(1, 0, 9, 5, 3)
+#define OP_TLBI_RVALE1OSNXS		sys_insn(1, 0, 9, 5, 5)
+#define OP_TLBI_RVAALE1OSNXS		sys_insn(1, 0, 9, 5, 7)
+#define OP_TLBI_RVAE1NXS		sys_insn(1, 0, 9, 6, 1)
+#define OP_TLBI_RVAAE1NXS		sys_insn(1, 0, 9, 6, 3)
+#define OP_TLBI_RVALE1NXS		sys_insn(1, 0, 9, 6, 5)
+#define OP_TLBI_RVAALE1NXS		sys_insn(1, 0, 9, 6, 7)
+#define OP_TLBI_VMALLE1NXS		sys_insn(1, 0, 9, 7, 0)
+#define OP_TLBI_VAE1NXS			sys_insn(1, 0, 9, 7, 1)
+#define OP_TLBI_ASIDE1NXS		sys_insn(1, 0, 9, 7, 2)
+#define OP_TLBI_VAAE1NXS		sys_insn(1, 0, 9, 7, 3)
+#define OP_TLBI_VALE1NXS		sys_insn(1, 0, 9, 7, 5)
+#define OP_TLBI_VAALE1NXS		sys_insn(1, 0, 9, 7, 7)
+#define OP_TLBI_IPAS2E1IS		sys_insn(1, 4, 8, 0, 1)
+#define OP_TLBI_RIPAS2E1IS		sys_insn(1, 4, 8, 0, 2)
+#define OP_TLBI_IPAS2LE1IS		sys_insn(1, 4, 8, 0, 5)
+#define OP_TLBI_RIPAS2LE1IS		sys_insn(1, 4, 8, 0, 6)
+#define OP_TLBI_ALLE2OS			sys_insn(1, 4, 8, 1, 0)
+#define OP_TLBI_VAE2OS			sys_insn(1, 4, 8, 1, 1)
+#define OP_TLBI_ALLE1OS			sys_insn(1, 4, 8, 1, 4)
+#define OP_TLBI_VALE2OS			sys_insn(1, 4, 8, 1, 5)
+#define OP_TLBI_VMALLS12E1OS		sys_insn(1, 4, 8, 1, 6)
+#define OP_TLBI_RVAE2IS			sys_insn(1, 4, 8, 2, 1)
+#define OP_TLBI_RVALE2IS		sys_insn(1, 4, 8, 2, 5)
+#define OP_TLBI_ALLE2IS			sys_insn(1, 4, 8, 3, 0)
+#define OP_TLBI_VAE2IS			sys_insn(1, 4, 8, 3, 1)
+#define OP_TLBI_ALLE1IS			sys_insn(1, 4, 8, 3, 4)
+#define OP_TLBI_VALE2IS			sys_insn(1, 4, 8, 3, 5)
+#define OP_TLBI_VMALLS12E1IS		sys_insn(1, 4, 8, 3, 6)
+#define OP_TLBI_IPAS2E1OS		sys_insn(1, 4, 8, 4, 0)
+#define OP_TLBI_IPAS2E1			sys_insn(1, 4, 8, 4, 1)
+#define OP_TLBI_RIPAS2E1		sys_insn(1, 4, 8, 4, 2)
+#define OP_TLBI_RIPAS2E1OS		sys_insn(1, 4, 8, 4, 3)
+#define OP_TLBI_IPAS2LE1OS		sys_insn(1, 4, 8, 4, 4)
+#define OP_TLBI_IPAS2LE1		sys_insn(1, 4, 8, 4, 5)
+#define OP_TLBI_RIPAS2LE1		sys_insn(1, 4, 8, 4, 6)
+#define OP_TLBI_RIPAS2LE1OS		sys_insn(1, 4, 8, 4, 7)
+#define OP_TLBI_RVAE2OS			sys_insn(1, 4, 8, 5, 1)
+#define OP_TLBI_RVALE2OS		sys_insn(1, 4, 8, 5, 5)
+#define OP_TLBI_RVAE2			sys_insn(1, 4, 8, 6, 1)
+#define OP_TLBI_RVALE2			sys_insn(1, 4, 8, 6, 5)
+#define OP_TLBI_ALLE2			sys_insn(1, 4, 8, 7, 0)
+#define OP_TLBI_VAE2			sys_insn(1, 4, 8, 7, 1)
+#define OP_TLBI_ALLE1			sys_insn(1, 4, 8, 7, 4)
+#define OP_TLBI_VALE2			sys_insn(1, 4, 8, 7, 5)
+#define OP_TLBI_VMALLS12E1		sys_insn(1, 4, 8, 7, 6)
+#define OP_TLBI_IPAS2E1ISNXS		sys_insn(1, 4, 9, 0, 1)
+#define OP_TLBI_RIPAS2E1ISNXS		sys_insn(1, 4, 9, 0, 2)
+#define OP_TLBI_IPAS2LE1ISNXS		sys_insn(1, 4, 9, 0, 5)
+#define OP_TLBI_RIPAS2LE1ISNXS		sys_insn(1, 4, 9, 0, 6)
+#define OP_TLBI_ALLE2OSNXS		sys_insn(1, 4, 9, 1, 0)
+#define OP_TLBI_VAE2OSNXS		sys_insn(1, 4, 9, 1, 1)
+#define OP_TLBI_ALLE1OSNXS		sys_insn(1, 4, 9, 1, 4)
+#define OP_TLBI_VALE2OSNXS		sys_insn(1, 4, 9, 1, 5)
+#define OP_TLBI_VMALLS12E1OSNXS		sys_insn(1, 4, 9, 1, 6)
+#define OP_TLBI_RVAE2ISNXS		sys_insn(1, 4, 9, 2, 1)
+#define OP_TLBI_RVALE2ISNXS		sys_insn(1, 4, 9, 2, 5)
+#define OP_TLBI_ALLE2ISNXS		sys_insn(1, 4, 9, 3, 0)
+#define OP_TLBI_VAE2ISNXS		sys_insn(1, 4, 9, 3, 1)
+#define OP_TLBI_ALLE1ISNXS		sys_insn(1, 4, 9, 3, 4)
+#define OP_TLBI_VALE2ISNXS		sys_insn(1, 4, 9, 3, 5)
+#define OP_TLBI_VMALLS12E1ISNXS		sys_insn(1, 4, 9, 3, 6)
+#define OP_TLBI_IPAS2E1OSNXS		sys_insn(1, 4, 9, 4, 0)
+#define OP_TLBI_IPAS2E1NXS		sys_insn(1, 4, 9, 4, 1)
+#define OP_TLBI_RIPAS2E1NXS		sys_insn(1, 4, 9, 4, 2)
+#define OP_TLBI_RIPAS2E1OSNXS		sys_insn(1, 4, 9, 4, 3)
+#define OP_TLBI_IPAS2LE1OSNXS		sys_insn(1, 4, 9, 4, 4)
+#define OP_TLBI_IPAS2LE1NXS		sys_insn(1, 4, 9, 4, 5)
+#define OP_TLBI_RIPAS2LE1NXS		sys_insn(1, 4, 9, 4, 6)
+#define OP_TLBI_RIPAS2LE1OSNXS		sys_insn(1, 4, 9, 4, 7)
+#define OP_TLBI_RVAE2OSNXS		sys_insn(1, 4, 9, 5, 1)
+#define OP_TLBI_RVALE2OSNXS		sys_insn(1, 4, 9, 5, 5)
+#define OP_TLBI_RVAE2NXS		sys_insn(1, 4, 9, 6, 1)
+#define OP_TLBI_RVALE2NXS		sys_insn(1, 4, 9, 6, 5)
+#define OP_TLBI_ALLE2NXS		sys_insn(1, 4, 9, 7, 0)
+#define OP_TLBI_VAE2NXS			sys_insn(1, 4, 9, 7, 1)
+#define OP_TLBI_ALLE1NXS		sys_insn(1, 4, 9, 7, 4)
+#define OP_TLBI_VALE2NXS		sys_insn(1, 4, 9, 7, 5)
+#define OP_TLBI_VMALLS12E1NXS		sys_insn(1, 4, 9, 7, 6)
+
 /* Common SCTLR_ELx flags. */
 #define SCTLR_ELx_ENTP2	(BIT(60))
 #define SCTLR_ELx_DSSBS	(BIT(44))
-- 
2.34.1



* [PATCH v10 07/59] arm64: Add AT operation encodings
  2023-05-15 17:30 [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support Marc Zyngier
                   ` (5 preceding siblings ...)
  2023-05-15 17:30 ` [PATCH v10 06/59] arm64: Add TLBI operation encodings Marc Zyngier
@ 2023-05-15 17:30 ` Marc Zyngier
  2023-06-14 15:08   ` Eric Auger
  2023-05-15 17:30 ` [PATCH v10 08/59] KVM: arm64: Add missing HCR_EL2 trap bits Marc Zyngier
                   ` (54 subsequent siblings)
  61 siblings, 1 reply; 95+ messages in thread
From: Marc Zyngier @ 2023-05-15 17:30 UTC (permalink / raw)
  To: kvmarm, kvm, linux-arm-kernel
  Cc: Alexandru Elisei, Andre Przywara, Chase Conklin,
	Christoffer Dall, Ganapatrao Kulkarni, Darren Hart, Jintack Lim,
	Russell King, Miguel Luis, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu

Add the encodings for the AT operations that are usable from NS.
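
These will feed the AT trap handling added later in the series;
assuming the same SR_TRAP() scheme as the other entries, the expected
shape (illustrative only, the actual entries come with a later patch)
is:

    SR_TRAP(OP_AT_S1E1R, CGT_HCR_AT),
    SR_TRAP(OP_AT_S1E1W, CGT_HCR_AT),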

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/sysreg.h | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 2727e68dd65b..8be78a9b9b3b 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -568,6 +568,23 @@
 
 #define SYS_SP_EL2			sys_reg(3, 6,  4, 1, 0)
 
+/* AT instructions */
+#define AT_Op0 1
+#define AT_CRn 7
+
+#define OP_AT_S1E1R	sys_insn(AT_Op0, 0, AT_CRn, 8, 0)
+#define OP_AT_S1E1W	sys_insn(AT_Op0, 0, AT_CRn, 8, 1)
+#define OP_AT_S1E0R	sys_insn(AT_Op0, 0, AT_CRn, 8, 2)
+#define OP_AT_S1E0W	sys_insn(AT_Op0, 0, AT_CRn, 8, 3)
+#define OP_AT_S1E1RP	sys_insn(AT_Op0, 0, AT_CRn, 9, 0)
+#define OP_AT_S1E1WP	sys_insn(AT_Op0, 0, AT_CRn, 9, 1)
+#define OP_AT_S1E2R	sys_insn(AT_Op0, 4, AT_CRn, 8, 0)
+#define OP_AT_S1E2W	sys_insn(AT_Op0, 4, AT_CRn, 8, 1)
+#define OP_AT_S12E1R	sys_insn(AT_Op0, 4, AT_CRn, 8, 4)
+#define OP_AT_S12E1W	sys_insn(AT_Op0, 4, AT_CRn, 8, 5)
+#define OP_AT_S12E0R	sys_insn(AT_Op0, 4, AT_CRn, 8, 6)
+#define OP_AT_S12E0W	sys_insn(AT_Op0, 4, AT_CRn, 8, 7)
+
 /* TLBI instructions */
 #define OP_TLBI_VMALLE1OS		sys_insn(1, 0, 8, 1, 0)
 #define OP_TLBI_VAE1OS			sys_insn(1, 0, 8, 1, 1)
-- 
2.34.1



* [PATCH v10 08/59] KVM: arm64: Add missing HCR_EL2 trap bits
  2023-05-15 17:30 [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support Marc Zyngier
                   ` (6 preceding siblings ...)
  2023-05-15 17:30 ` [PATCH v10 07/59] arm64: Add AT " Marc Zyngier
@ 2023-05-15 17:30 ` Marc Zyngier
  2023-06-14 15:08   ` Eric Auger
  2023-05-15 17:30 ` [PATCH v10 09/59] KVM: arm64: nv: Add trap forwarding infrastructure Marc Zyngier
                   ` (53 subsequent siblings)
  61 siblings, 1 reply; 95+ messages in thread
From: Marc Zyngier @ 2023-05-15 17:30 UTC (permalink / raw)
  To: kvmarm, kvm, linux-arm-kernel
  Cc: Alexandru Elisei, Andre Przywara, Chase Conklin,
	Christoffer Dall, Ganapatrao Kulkarni, Darren Hart, Jintack Lim,
	Russell King, Miguel Luis, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu

We're still missing a handful of HCR_EL2 trap bits. Add them.
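
These get wired into the coarse-grained trap descriptions; note for
instance how patch 10 encodes the fact that the NV trap only applies
when NV2 is clear:

    [CGT_HCR_NV] = {
        .index      = HCR_EL2,
        .value      = HCR_NV,
        .mask       = HCR_NV | HCR_NV2,
        .behaviour  = BEHAVE_FORWARD_ANY,
    },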

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_arm.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index 209a4fba5d2a..4b3e55abb30f 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -17,9 +17,19 @@
 #define HCR_DCT		(UL(1) << 57)
 #define HCR_ATA_SHIFT	56
 #define HCR_ATA		(UL(1) << HCR_ATA_SHIFT)
+#define HCR_TTLBOS	(UL(1) << 55)
+#define HCR_TTLBIS	(UL(1) << 54)
+#define HCR_ENSCXT	(UL(1) << 53)
+#define HCR_TOCU	(UL(1) << 52)
 #define HCR_AMVOFFEN	(UL(1) << 51)
+#define HCR_TICAB	(UL(1) << 50)
+#define HCR_TID4	(UL(1) << 49)
 #define HCR_FIEN	(UL(1) << 47)
 #define HCR_FWB		(UL(1) << 46)
+#define HCR_NV2		(UL(1) << 45)
+#define HCR_AT		(UL(1) << 44)
+#define HCR_NV1		(UL(1) << 43)
+#define HCR_NV		(UL(1) << 42)
 #define HCR_API		(UL(1) << 41)
 #define HCR_APK		(UL(1) << 40)
 #define HCR_TEA		(UL(1) << 37)
-- 
2.34.1



* [PATCH v10 09/59] KVM: arm64: nv: Add trap forwarding infrastructure
  2023-05-15 17:30 [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support Marc Zyngier
                   ` (7 preceding siblings ...)
  2023-05-15 17:30 ` [PATCH v10 08/59] KVM: arm64: Add missing HCR_EL2 trap bits Marc Zyngier
@ 2023-05-15 17:30 ` Marc Zyngier
  2023-07-13 14:29   ` Eric Auger
  2023-05-15 17:30 ` [PATCH v10 10/59] KVM: arm64: nv: Add trap forwarding for HCR_EL2 Marc Zyngier
                   ` (52 subsequent siblings)
  61 siblings, 1 reply; 95+ messages in thread
From: Marc Zyngier @ 2023-05-15 17:30 UTC (permalink / raw)
  To: kvmarm, kvm, linux-arm-kernel
  Cc: Alexandru Elisei, Andre Przywara, Chase Conklin,
	Christoffer Dall, Ganapatrao Kulkarni, Darren Hart, Jintack Lim,
	Russell King, Miguel Luis, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu

A significant part of what an NV hypervisor needs to do is to decide
whether a trap from an L2+ guest has to be forwarded to an L1 guest
or handled locally. This is done by checking the trap bits that
the guest hypervisor has set and acting accordingly, as described by
the architecture.

A previous approach was to sprinkle a bunch of checks in all the
system register accessors, but this is pretty error-prone and doesn't
give a good overview of what is happening.

Instead, implement a set of global tables that describe a trap bit,
combinations of trap bits, behaviours on trap, and what bits must
be evaluated on a system register trap.

Although this is painful to describe, it allows each and every
control bit to be specified in a static manner. To make lookups
efficient, the table is inserted into an xarray that is global to
the system, and is consulted each time we trap a system register.

Add the basic infrastructure for now; additional patches will
populate the actual trap configuration.
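
For example, once the tables are populated by the following patches, a
trapped ACTLR_EL1 access from L2 resolves along these lines:

    esr    = kvm_vcpu_get_esr(vcpu);
    sysreg = esr_sys64_to_sysreg(esr);        /* SYS_ACTLR_EL1 */
    id     = get_trap_config(sysreg);         /* -> CGT_HCR_TACR */
    b      = compute_behaviour(vcpu, sysreg); /* checks virtual HCR_EL2.TACR */
    /* forwarding behaviour set -> kvm_inject_nested_sync(vcpu, esr) */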

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_host.h   |   1 +
 arch/arm64/include/asm/kvm_nested.h |   2 +
 arch/arm64/kvm/emulate-nested.c     | 175 ++++++++++++++++++++++++++++
 arch/arm64/kvm/sys_regs.c           |   6 +
 arch/arm64/kvm/trace_arm.h          |  19 +++
 5 files changed, 203 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index f2e3b5889f8b..65810618cb42 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -960,6 +960,7 @@ int kvm_handle_cp10_id(struct kvm_vcpu *vcpu);
 void kvm_reset_sys_regs(struct kvm_vcpu *vcpu);
 
 int __init kvm_sys_reg_table_init(void);
+void __init populate_nv_trap_config(void);
 
 bool lock_all_vcpus(struct kvm *kvm);
 void unlock_all_vcpus(struct kvm *kvm);
diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
index 8fb67f032fd1..fa23cc9c2adc 100644
--- a/arch/arm64/include/asm/kvm_nested.h
+++ b/arch/arm64/include/asm/kvm_nested.h
@@ -11,6 +11,8 @@ static inline bool vcpu_has_nv(const struct kvm_vcpu *vcpu)
 		test_bit(KVM_ARM_VCPU_HAS_EL2, vcpu->arch.features));
 }
 
+extern bool __check_nv_sr_forward(struct kvm_vcpu *vcpu);
+
 struct sys_reg_params;
 struct sys_reg_desc;
 
diff --git a/arch/arm64/kvm/emulate-nested.c b/arch/arm64/kvm/emulate-nested.c
index b96662029fb1..a923f7f47add 100644
--- a/arch/arm64/kvm/emulate-nested.c
+++ b/arch/arm64/kvm/emulate-nested.c
@@ -14,6 +14,181 @@
 
 #include "trace.h"
 
+enum trap_behaviour {
+	BEHAVE_HANDLE_LOCALLY	= 0,
+	BEHAVE_FORWARD_READ	= BIT(0),
+	BEHAVE_FORWARD_WRITE	= BIT(1),
+	BEHAVE_FORWARD_ANY	= BEHAVE_FORWARD_READ | BEHAVE_FORWARD_WRITE,
+};
+
+struct trap_bits {
+	const enum vcpu_sysreg		index;
+	const enum trap_behaviour	behaviour;
+	const u64			value;
+	const u64			mask;
+};
+
+enum coarse_grain_trap_id {
+	/* Indicates no coarse trap control */
+	__RESERVED__,
+
+	/*
+	 * The first batch of IDs denote coarse trapping that are used
+	 * on their own instead of being part of a combination of
+	 * trap controls.
+	 */
+
+	/*
+	 * Anything after this point is a combination of trap controls,
+	 * which all must be evaluated to decide what to do.
+	 */
+	__MULTIPLE_CONTROL_BITS__,
+
+	/*
+	 * Anything after this point requires a callback evaluating a
+	 * complex trap condition. Hopefully we'll never need this...
+	 */
+	__COMPLEX_CONDITIONS__,
+};
+
+static const struct trap_bits coarse_trap_bits[] = {
+};
+
+#define MCB(id, ...)					\
+	[id - __MULTIPLE_CONTROL_BITS__]	=	\
+		(const enum coarse_grain_trap_id []){	\
+			__VA_ARGS__ , __RESERVED__	\
+		}
+
+static const enum coarse_grain_trap_id *coarse_control_combo[] = {
+};
+
+typedef enum trap_behaviour (*complex_condition_check)(struct kvm_vcpu *);
+
+#define CCC(id, fn)	[id - __COMPLEX_CONDITIONS__] = fn
+
+static const complex_condition_check ccc[] = {
+};
+
+struct encoding_to_trap_configs {
+	const u32			encoding;
+	const u32			end;
+	const enum coarse_grain_trap_id	id;
+};
+
+#define SR_RANGE_TRAP(sr_start, sr_end, trap_id)			\
+	{								\
+		.encoding	= sr_start,				\
+		.end		= sr_end,				\
+		.id		= trap_id,				\
+	}
+
+#define SR_TRAP(sr, trap_id)		SR_RANGE_TRAP(sr, sr, trap_id)
+
+/*
+ * Map encoding to trap bits for exceptions reported with EC=0x18.
+ * These must only be evaluated when running a nested hypervisor and
+ * the current context is not a hypervisor context. When the
+ * trapped access matches one of the trap controls, the exception is
+ * re-injected in the nested hypervisor.
+ */
+static const struct encoding_to_trap_configs encoding_to_traps[] __initdata = {
+};
+
+static DEFINE_XARRAY(sr_forward_xa);
+
+void __init populate_nv_trap_config(void)
+{
+	for (int i = 0; i < ARRAY_SIZE(encoding_to_traps); i++) {
+		const struct encoding_to_trap_configs *ett = &encoding_to_traps[i];
+		void *prev;
+
+		prev = xa_store_range(&sr_forward_xa, ett->encoding, ett->end,
+				      xa_mk_value(ett->id), GFP_KERNEL);
+		WARN_ON(prev);
+	}
+
+	kvm_info("nv: %ld trap handlers\n", ARRAY_SIZE(encoding_to_traps));
+}
+
+static const enum coarse_grain_trap_id get_trap_config(u32 sysreg)
+{
+	return xa_to_value(xa_load(&sr_forward_xa, sysreg));
+}
+
+static enum trap_behaviour get_behaviour(struct kvm_vcpu *vcpu,
+					 const struct trap_bits *tb)
+{
+	enum trap_behaviour b = BEHAVE_HANDLE_LOCALLY;
+	u64 val;
+
+	val = __vcpu_sys_reg(vcpu, tb->index);
+	if ((val & tb->mask) == tb->value)
+		b |= tb->behaviour;
+
+	return b;
+}
+
+static enum trap_behaviour __do_compute_behaviour(struct kvm_vcpu *vcpu,
+						  const enum coarse_grain_trap_id id,
+						  enum trap_behaviour b)
+{
+	switch (id) {
+		const enum coarse_grain_trap_id *cgids;
+
+	case __RESERVED__ ... __MULTIPLE_CONTROL_BITS__ - 1:
+		if (likely(id != __RESERVED__))
+			b |= get_behaviour(vcpu, &coarse_trap_bits[id]);
+		break;
+	case __MULTIPLE_CONTROL_BITS__ ... __COMPLEX_CONDITIONS__ - 1:
+		/* Yes, this is recursive. Don't do anything stupid. */
+		cgids = coarse_control_combo[id - __MULTIPLE_CONTROL_BITS__];
+		for (int i = 0; cgids[i] != __RESERVED__; i++)
+			b |= __do_compute_behaviour(vcpu, cgids[i], b);
+		break;
+	default:
+		if (ARRAY_SIZE(ccc))
+			b |= ccc[id -  __COMPLEX_CONDITIONS__](vcpu);
+		break;
+	}
+
+	return b;
+}
+
+static enum trap_behaviour compute_behaviour(struct kvm_vcpu *vcpu, u32 sysreg)
+{
+	const enum coarse_grain_trap_id id = get_trap_config(sysreg);
+	enum trap_behaviour b = BEHAVE_HANDLE_LOCALLY;
+
+	return __do_compute_behaviour(vcpu, id, b);
+}
+
+bool __check_nv_sr_forward(struct kvm_vcpu *vcpu)
+{
+	enum trap_behaviour b;
+	bool is_read;
+	u32 sysreg;
+	u64 esr;
+
+	if (!vcpu_has_nv(vcpu) || is_hyp_ctxt(vcpu))
+		return false;
+
+	esr = kvm_vcpu_get_esr(vcpu);
+	sysreg = esr_sys64_to_sysreg(esr);
+	is_read = (esr & ESR_ELx_SYS64_ISS_DIR_MASK) == ESR_ELx_SYS64_ISS_DIR_READ;
+
+	b = compute_behaviour(vcpu, sysreg);
+
+	if (!((b & BEHAVE_FORWARD_READ) && is_read) &&
+	    !((b & BEHAVE_FORWARD_WRITE) && !is_read))
+		return false;
+
+	trace_kvm_forward_sysreg_trap(vcpu, sysreg, is_read);
+
+	kvm_inject_nested_sync(vcpu, kvm_vcpu_get_esr(vcpu));
+	return true;
+}
+
 static u64 kvm_check_illegal_exception_return(struct kvm_vcpu *vcpu, u64 spsr)
 {
 	u64 mode = spsr & PSR_MODE_MASK;
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 71b12094d613..77f8bc64148a 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -2955,6 +2955,9 @@ int kvm_handle_sys_reg(struct kvm_vcpu *vcpu)
 
 	trace_kvm_handle_sys_reg(esr);
 
+	if (__check_nv_sr_forward(vcpu))
+		return 1;
+
 	params = esr_sys64_to_params(esr);
 	params.regval = vcpu_get_reg(vcpu, Rt);
 
@@ -3363,5 +3366,8 @@ int __init kvm_sys_reg_table_init(void)
 	for (i = 0; i < ARRAY_SIZE(invariant_sys_regs); i++)
 		invariant_sys_regs[i].reset(NULL, &invariant_sys_regs[i]);
 
+	if (kvm_get_mode() == KVM_MODE_NV)
+		populate_nv_trap_config();
+
 	return 0;
 }
diff --git a/arch/arm64/kvm/trace_arm.h b/arch/arm64/kvm/trace_arm.h
index 6ce5c025218d..1f0f3f653606 100644
--- a/arch/arm64/kvm/trace_arm.h
+++ b/arch/arm64/kvm/trace_arm.h
@@ -364,6 +364,25 @@ TRACE_EVENT(kvm_inject_nested_exception,
 		  __entry->hcr_el2)
 );
 
+TRACE_EVENT(kvm_forward_sysreg_trap,
+	    TP_PROTO(struct kvm_vcpu *vcpu, u32 sysreg, bool is_read),
+	    TP_ARGS(vcpu, sysreg, is_read),
+
+	    TP_STRUCT__entry(
+		__field(struct kvm_vcpu *, vcpu)
+		__field(u32,		   sysreg)
+		__field(bool,		   is_read)
+	    ),
+
+	    TP_fast_assign(
+		__entry->vcpu = vcpu;
+		__entry->sysreg = sysreg;
+		__entry->is_read = is_read;
+	    ),
+
+	    TP_printk("%c %x", __entry->is_read ? 'R' : 'W', __entry->sysreg)
+);
+
 #endif /* _TRACE_ARM_ARM64_KVM_H */
 
 #undef TRACE_INCLUDE_PATH
-- 
2.34.1



* [PATCH v10 10/59] KVM: arm64: nv: Add trap forwarding for HCR_EL2
  2023-05-15 17:30 [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support Marc Zyngier
                   ` (8 preceding siblings ...)
  2023-05-15 17:30 ` [PATCH v10 09/59] KVM: arm64: nv: Add trap forwarding infrastructure Marc Zyngier
@ 2023-05-15 17:30 ` Marc Zyngier
  2023-05-15 17:30 ` [PATCH v10 11/59] KVM: arm64: nv: Expose FEAT_EVT to nested guests Marc Zyngier
                   ` (51 subsequent siblings)
  61 siblings, 0 replies; 95+ messages in thread
From: Marc Zyngier @ 2023-05-15 17:30 UTC (permalink / raw)
  To: kvmarm, kvm, linux-arm-kernel
  Cc: Alexandru Elisei, Andre Przywara, Chase Conklin,
	Christoffer Dall, Ganapatrao Kulkarni, Darren Hart, Jintack Lim,
	Russell King, Miguel Luis, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu

Describe the HCR_EL2 register, and associate it with all the sysregs
and instructions it can trap.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/emulate-nested.c | 472 ++++++++++++++++++++++++++++++++
 1 file changed, 472 insertions(+)

diff --git a/arch/arm64/kvm/emulate-nested.c b/arch/arm64/kvm/emulate-nested.c
index a923f7f47add..8be331e5de9d 100644
--- a/arch/arm64/kvm/emulate-nested.c
+++ b/arch/arm64/kvm/emulate-nested.c
@@ -37,12 +37,47 @@ enum coarse_grain_trap_id {
 	 * on their own instead of being part of a combination of
 	 * trap controls.
 	 */
+	CGT_HCR_TID1,
+	CGT_HCR_TID2,
+	CGT_HCR_TID3,
+	CGT_HCR_IMO,
+	CGT_HCR_FMO,
+	CGT_HCR_TIDCP,
+	CGT_HCR_TACR,
+	CGT_HCR_TSW,
+	CGT_HCR_TPC,
+	CGT_HCR_TPU,
+	CGT_HCR_TTLB,
+	CGT_HCR_TVM,
+	CGT_HCR_TDZ,
+	CGT_HCR_TRVM,
+	CGT_HCR_TLOR,
+	CGT_HCR_TERR,
+	CGT_HCR_APK,
+	CGT_HCR_NV,
+	CGT_HCR_NV1,
+	CGT_HCR_AT,
+	CGT_HCR_FIEN,
+	CGT_HCR_TID4,
+	CGT_HCR_TICAB,
+	CGT_HCR_TOCU,
+	CGT_HCR_ENSCXT,
+	CGT_HCR_TTLBIS,
+	CGT_HCR_TTLBOS,
 
 	/*
 	 * Anything after this point is a combination of trap controls,
 	 * which all must be evaluated to decide what to do.
 	 */
 	__MULTIPLE_CONTROL_BITS__,
+	CGT_HCR_IMO_FMO = __MULTIPLE_CONTROL_BITS__,
+	CGT_HCR_TID2_TID4,
+	CGT_HCR_TTLB_TTLBIS,
+	CGT_HCR_TTLB_TTLBOS,
+	CGT_HCR_TVM_TRVM,
+	CGT_HCR_TPU_TICAB,
+	CGT_HCR_TPU_TOCU,
+	CGT_HCR_NV1_ENSCXT,
 
 	/*
 	 * Anything after this point requires a callback evaluating a
@@ -52,6 +87,168 @@ enum coarse_grain_trap_id {
 };
 
 static const struct trap_bits coarse_trap_bits[] = {
+	[CGT_HCR_TID1] = {
+		.index		= HCR_EL2,
+		.value 		= HCR_TID1,
+		.mask		= HCR_TID1,
+		.behaviour	= BEHAVE_FORWARD_READ,
+	},
+	[CGT_HCR_TID2] = {
+		.index		= HCR_EL2,
+		.value 		= HCR_TID2,
+		.mask		= HCR_TID2,
+		.behaviour	= BEHAVE_FORWARD_ANY,
+	},
+	[CGT_HCR_TID3] = {
+		.index		= HCR_EL2,
+		.value 		= HCR_TID3,
+		.mask		= HCR_TID3,
+		.behaviour	= BEHAVE_FORWARD_READ,
+	},
+	[CGT_HCR_IMO] = {
+		.index		= HCR_EL2,
+		.value 		= HCR_IMO,
+		.mask		= HCR_IMO,
+		.behaviour	= BEHAVE_FORWARD_WRITE,
+	},
+	[CGT_HCR_FMO] = {
+		.index		= HCR_EL2,
+		.value 		= HCR_FMO,
+		.mask		= HCR_FMO,
+		.behaviour	= BEHAVE_FORWARD_WRITE,
+	},
+	[CGT_HCR_TIDCP] = {
+		.index		= HCR_EL2,
+		.value		= HCR_TIDCP,
+		.mask		= HCR_TIDCP,
+		.behaviour	= BEHAVE_FORWARD_ANY,
+	},
+	[CGT_HCR_TACR] = {
+		.index		= HCR_EL2,
+		.value		= HCR_TACR,
+		.mask		= HCR_TACR,
+		.behaviour	= BEHAVE_FORWARD_ANY,
+	},
+	[CGT_HCR_TSW] = {
+		.index		= HCR_EL2,
+		.value		= HCR_TSW,
+		.mask		= HCR_TSW,
+		.behaviour	= BEHAVE_FORWARD_ANY,
+	},
+	[CGT_HCR_TPC] = {
+		.index		= HCR_EL2,
+		.value		= HCR_TPC,
+		.mask		= HCR_TPC,
+		.behaviour	= BEHAVE_FORWARD_ANY,
+	},
+	[CGT_HCR_TPU] = {
+		.index		= HCR_EL2,
+		.value		= HCR_TPU,
+		.mask		= HCR_TPU,
+		.behaviour	= BEHAVE_FORWARD_ANY,
+	},
+	[CGT_HCR_TTLB] = {
+		.index		= HCR_EL2,
+		.value		= HCR_TTLB,
+		.mask		= HCR_TTLB,
+		.behaviour	= BEHAVE_FORWARD_ANY,
+	},
+	[CGT_HCR_TVM] = {
+		.index		= HCR_EL2,
+		.value		= HCR_TVM,
+		.mask		= HCR_TVM,
+		.behaviour	= BEHAVE_FORWARD_ANY,
+	},
+	[CGT_HCR_TDZ] = {
+		.index		= HCR_EL2,
+		.value		= HCR_TDZ,
+		.mask		= HCR_TDZ,
+		.behaviour	= BEHAVE_FORWARD_ANY,
+	},
+	[CGT_HCR_TRVM] = {
+		.index		= HCR_EL2,
+		.value		= HCR_TRVM,
+		.mask		= HCR_TRVM,
+		.behaviour	= BEHAVE_FORWARD_READ,
+	},
+	[CGT_HCR_TLOR] = {
+		.index		= HCR_EL2,
+		.value		= HCR_TLOR,
+		.mask		= HCR_TLOR,
+		.behaviour	= BEHAVE_FORWARD_ANY,
+	},
+	[CGT_HCR_TERR] = {
+		.index		= HCR_EL2,
+		.value		= HCR_TERR,
+		.mask		= HCR_TERR,
+		.behaviour	= BEHAVE_FORWARD_ANY,
+	},
+	[CGT_HCR_APK] = {
+		.index		= HCR_EL2,
+		.value		= 0,
+		.mask		= HCR_APK,
+		.behaviour	= BEHAVE_FORWARD_ANY,
+	},
+	[CGT_HCR_NV] = {
+		.index		= HCR_EL2,
+		.value		= HCR_NV,
+		.mask		= HCR_NV | HCR_NV2,
+		.behaviour	= BEHAVE_FORWARD_ANY,
+	},
+	[CGT_HCR_NV1] = {
+		.index		= HCR_EL2,
+		.value		= HCR_NV | HCR_NV1,
+		.mask		= HCR_NV | HCR_NV1 | HCR_NV2,
+		.behaviour	= BEHAVE_FORWARD_ANY,
+	},
+	[CGT_HCR_AT] = {
+		.index		= HCR_EL2,
+		.value		= HCR_AT,
+		.mask		= HCR_AT,
+		.behaviour	= BEHAVE_FORWARD_ANY,
+	},
+	[CGT_HCR_FIEN] = {
+		.index		= HCR_EL2,
+		.value		= HCR_FIEN,
+		.mask		= HCR_FIEN,
+		.behaviour	= BEHAVE_FORWARD_ANY,
+	},
+	[CGT_HCR_TID4] = {
+		.index		= HCR_EL2,
+		.value 		= HCR_TID4,
+		.mask		= HCR_TID4,
+		.behaviour	= BEHAVE_FORWARD_ANY,
+	},
+	[CGT_HCR_TICAB] = {
+		.index		= HCR_EL2,
+		.value 		= HCR_TICAB,
+		.mask		= HCR_TICAB,
+		.behaviour	= BEHAVE_FORWARD_ANY,
+	},
+	[CGT_HCR_TOCU] = {
+		.index		= HCR_EL2,
+		.value 		= HCR_TOCU,
+		.mask		= HCR_TOCU,
+		.behaviour	= BEHAVE_FORWARD_ANY,
+	},
+	[CGT_HCR_ENSCXT] = {
+		.index		= HCR_EL2,
+		.value 		= 0,
+		.mask		= HCR_ENSCXT,
+		.behaviour	= BEHAVE_FORWARD_ANY,
+	},
+	[CGT_HCR_TTLBIS] = {
+		.index		= HCR_EL2,
+		.value		= HCR_TTLBIS,
+		.mask		= HCR_TTLBIS,
+		.behaviour	= BEHAVE_FORWARD_ANY,
+	},
+	[CGT_HCR_TTLBOS] = {
+		.index		= HCR_EL2,
+		.value		= HCR_TTLBOS,
+		.mask		= HCR_TTLBOS,
+		.behaviour	= BEHAVE_FORWARD_ANY,
+	},
 };
 
 #define MCB(id, ...)					\
@@ -61,6 +258,14 @@ static const struct trap_bits coarse_trap_bits[] = {
 		}
 
 static const enum coarse_grain_trap_id *coarse_control_combo[] = {
+	MCB(CGT_HCR_IMO_FMO,		CGT_HCR_IMO, CGT_HCR_FMO),
+	MCB(CGT_HCR_TID2_TID4,		CGT_HCR_TID2, CGT_HCR_TID4),
+	MCB(CGT_HCR_TTLB_TTLBIS,	CGT_HCR_TTLB, CGT_HCR_TTLBIS),
+	MCB(CGT_HCR_TTLB_TTLBOS,	CGT_HCR_TTLB, CGT_HCR_TTLBOS),
+	MCB(CGT_HCR_TVM_TRVM,		CGT_HCR_TVM, CGT_HCR_TRVM),
+	MCB(CGT_HCR_TPU_TICAB,		CGT_HCR_TPU, CGT_HCR_TICAB),
+	MCB(CGT_HCR_TPU_TOCU,		CGT_HCR_TPU, CGT_HCR_TOCU),
+	MCB(CGT_HCR_NV1_ENSCXT,		CGT_HCR_NV1, CGT_HCR_ENSCXT),
 };
 
 typedef enum trap_behaviour (*complex_condition_check)(struct kvm_vcpu *);
@@ -93,6 +298,273 @@ struct encoding_to_trap_configs {
  * re-injected in the nested hypervisor.
  */
 static const struct encoding_to_trap_configs encoding_to_traps[] __initdata = {
+	SR_TRAP(SYS_REVIDR_EL1,		CGT_HCR_TID1),
+	SR_TRAP(SYS_AIDR_EL1,		CGT_HCR_TID1),
+	SR_TRAP(SYS_SMIDR_EL1,		CGT_HCR_TID1),
+	SR_TRAP(SYS_CTR_EL0,		CGT_HCR_TID2),
+	SR_TRAP(SYS_CCSIDR_EL1,		CGT_HCR_TID2_TID4),
+	SR_TRAP(SYS_CCSIDR2_EL1,	CGT_HCR_TID2_TID4),
+	SR_TRAP(SYS_CLIDR_EL1,		CGT_HCR_TID2_TID4),
+	SR_TRAP(SYS_CSSELR_EL1,		CGT_HCR_TID2_TID4),
+	SR_RANGE_TRAP(SYS_ID_PFR0_EL1,
+		      sys_reg(3, 0, 0, 7, 7), CGT_HCR_TID3),
+	SR_TRAP(SYS_ICC_SGI0R_EL1,	CGT_HCR_IMO_FMO),
+	SR_TRAP(SYS_ICC_ASGI1R_EL1,	CGT_HCR_IMO_FMO),
+	SR_TRAP(SYS_ICC_SGI1R_EL1,	CGT_HCR_IMO_FMO),
+	SR_RANGE_TRAP(sys_reg(3, 0, 11, 0, 0),
+		      sys_reg(3, 0, 11, 15, 7), CGT_HCR_TIDCP),
+	SR_RANGE_TRAP(sys_reg(3, 1, 11, 0, 0),
+		      sys_reg(3, 1, 11, 15, 7), CGT_HCR_TIDCP),
+	SR_RANGE_TRAP(sys_reg(3, 2, 11, 0, 0),
+		      sys_reg(3, 2, 11, 15, 7), CGT_HCR_TIDCP),
+	SR_RANGE_TRAP(sys_reg(3, 3, 11, 0, 0),
+		      sys_reg(3, 3, 11, 15, 7), CGT_HCR_TIDCP),
+	SR_RANGE_TRAP(sys_reg(3, 4, 11, 0, 0),
+		      sys_reg(3, 4, 11, 15, 7), CGT_HCR_TIDCP),
+	SR_RANGE_TRAP(sys_reg(3, 5, 11, 0, 0),
+		      sys_reg(3, 5, 11, 15, 7), CGT_HCR_TIDCP),
+	SR_RANGE_TRAP(sys_reg(3, 6, 11, 0, 0),
+		      sys_reg(3, 6, 11, 15, 7), CGT_HCR_TIDCP),
+	SR_RANGE_TRAP(sys_reg(3, 7, 11, 0, 0),
+		      sys_reg(3, 7, 11, 15, 7), CGT_HCR_TIDCP),
+	SR_RANGE_TRAP(sys_reg(3, 0, 15, 0, 0),
+		      sys_reg(3, 0, 15, 15, 7), CGT_HCR_TIDCP),
+	SR_RANGE_TRAP(sys_reg(3, 1, 15, 0, 0),
+		      sys_reg(3, 1, 15, 15, 7), CGT_HCR_TIDCP),
+	SR_RANGE_TRAP(sys_reg(3, 2, 15, 0, 0),
+		      sys_reg(3, 2, 15, 15, 7), CGT_HCR_TIDCP),
+	SR_RANGE_TRAP(sys_reg(3, 3, 15, 0, 0),
+		      sys_reg(3, 3, 15, 15, 7), CGT_HCR_TIDCP),
+	SR_RANGE_TRAP(sys_reg(3, 4, 15, 0, 0),
+		      sys_reg(3, 4, 15, 15, 7), CGT_HCR_TIDCP),
+	SR_RANGE_TRAP(sys_reg(3, 5, 15, 0, 0),
+		      sys_reg(3, 5, 15, 15, 7), CGT_HCR_TIDCP),
+	SR_RANGE_TRAP(sys_reg(3, 6, 15, 0, 0),
+		      sys_reg(3, 6, 15, 15, 7), CGT_HCR_TIDCP),
+	SR_RANGE_TRAP(sys_reg(3, 7, 15, 0, 0),
+		      sys_reg(3, 7, 15, 15, 7), CGT_HCR_TIDCP),
+	SR_TRAP(SYS_ACTLR_EL1,		CGT_HCR_TACR),
+	SR_TRAP(SYS_DC_ISW,		CGT_HCR_TSW),
+	SR_TRAP(SYS_DC_CSW,		CGT_HCR_TSW),
+	SR_TRAP(SYS_DC_CISW,		CGT_HCR_TSW),
+	SR_TRAP(SYS_DC_IGSW,		CGT_HCR_TSW),
+	SR_TRAP(SYS_DC_IGDSW,		CGT_HCR_TSW),
+	SR_TRAP(SYS_DC_CGSW,		CGT_HCR_TSW),
+	SR_TRAP(SYS_DC_CGDSW,		CGT_HCR_TSW),
+	SR_TRAP(SYS_DC_CIGSW,		CGT_HCR_TSW),
+	SR_TRAP(SYS_DC_CIGDSW,		CGT_HCR_TSW),
+	SR_TRAP(SYS_DC_CIVAC,		CGT_HCR_TPC),
+	SR_TRAP(SYS_DC_CVAC,		CGT_HCR_TPC),
+	SR_TRAP(SYS_DC_CVAP,		CGT_HCR_TPC),
+	SR_TRAP(SYS_DC_IVAC,		CGT_HCR_TPC),
+	SR_TRAP(SYS_DC_CIGVAC,		CGT_HCR_TPC),
+	SR_TRAP(SYS_DC_CIGDVAC,		CGT_HCR_TPC),
+	SR_TRAP(SYS_DC_IGVAC,		CGT_HCR_TPC),
+	SR_TRAP(SYS_DC_IGDVAC,		CGT_HCR_TPC),
+	SR_TRAP(SYS_DC_CGVAC,		CGT_HCR_TPC),
+	SR_TRAP(SYS_DC_CGDVAC,		CGT_HCR_TPC),
+	SR_TRAP(SYS_DC_CGVAP,		CGT_HCR_TPC),
+	SR_TRAP(SYS_DC_CGDVAP,		CGT_HCR_TPC),
+	SR_TRAP(SYS_DC_CGVADP,		CGT_HCR_TPC),
+	SR_TRAP(SYS_DC_CGDVADP,		CGT_HCR_TPC),
+	SR_TRAP(SYS_IC_IVAU,		CGT_HCR_TPU_TOCU),
+	SR_TRAP(SYS_IC_IALLU,		CGT_HCR_TPU_TOCU),
+	SR_TRAP(SYS_IC_IALLUIS,		CGT_HCR_TPU_TICAB),
+	SR_TRAP(SYS_DC_CVAU,		CGT_HCR_TPU_TOCU),
+	SR_TRAP(OP_TLBI_RVAE1,		CGT_HCR_TTLB),
+	SR_TRAP(OP_TLBI_RVAAE1,		CGT_HCR_TTLB),
+	SR_TRAP(OP_TLBI_RVALE1,		CGT_HCR_TTLB),
+	SR_TRAP(OP_TLBI_RVAALE1,	CGT_HCR_TTLB),
+	SR_TRAP(OP_TLBI_VMALLE1,	CGT_HCR_TTLB),
+	SR_TRAP(OP_TLBI_VAE1,		CGT_HCR_TTLB),
+	SR_TRAP(OP_TLBI_ASIDE1,		CGT_HCR_TTLB),
+	SR_TRAP(OP_TLBI_VAAE1,		CGT_HCR_TTLB),
+	SR_TRAP(OP_TLBI_VALE1,		CGT_HCR_TTLB),
+	SR_TRAP(OP_TLBI_VAALE1,		CGT_HCR_TTLB),
+	SR_TRAP(OP_TLBI_RVAE1NXS,	CGT_HCR_TTLB),
+	SR_TRAP(OP_TLBI_RVAAE1NXS,	CGT_HCR_TTLB),
+	SR_TRAP(OP_TLBI_RVALE1NXS,	CGT_HCR_TTLB),
+	SR_TRAP(OP_TLBI_RVAALE1NXS,	CGT_HCR_TTLB),
+	SR_TRAP(OP_TLBI_VMALLE1NXS,	CGT_HCR_TTLB),
+	SR_TRAP(OP_TLBI_VAE1NXS,	CGT_HCR_TTLB),
+	SR_TRAP(OP_TLBI_ASIDE1NXS,	CGT_HCR_TTLB),
+	SR_TRAP(OP_TLBI_VAAE1NXS,	CGT_HCR_TTLB),
+	SR_TRAP(OP_TLBI_VALE1NXS,	CGT_HCR_TTLB),
+	SR_TRAP(OP_TLBI_VAALE1NXS,	CGT_HCR_TTLB),
+	SR_TRAP(OP_TLBI_RVAE1IS,	CGT_HCR_TTLB_TTLBIS),
+	SR_TRAP(OP_TLBI_RVAAE1IS,	CGT_HCR_TTLB_TTLBIS),
+	SR_TRAP(OP_TLBI_RVALE1IS,	CGT_HCR_TTLB_TTLBIS),
+	SR_TRAP(OP_TLBI_RVAALE1IS,	CGT_HCR_TTLB_TTLBIS),
+	SR_TRAP(OP_TLBI_VMALLE1IS,	CGT_HCR_TTLB_TTLBIS),
+	SR_TRAP(OP_TLBI_VAE1IS,		CGT_HCR_TTLB_TTLBIS),
+	SR_TRAP(OP_TLBI_ASIDE1IS,	CGT_HCR_TTLB_TTLBIS),
+	SR_TRAP(OP_TLBI_VAAE1IS,	CGT_HCR_TTLB_TTLBIS),
+	SR_TRAP(OP_TLBI_VALE1IS,	CGT_HCR_TTLB_TTLBIS),
+	SR_TRAP(OP_TLBI_VAALE1IS,	CGT_HCR_TTLB_TTLBIS),
+	SR_TRAP(OP_TLBI_RVAE1ISNXS,	CGT_HCR_TTLB_TTLBIS),
+	SR_TRAP(OP_TLBI_RVAAE1ISNXS,	CGT_HCR_TTLB_TTLBIS),
+	SR_TRAP(OP_TLBI_RVALE1ISNXS,	CGT_HCR_TTLB_TTLBIS),
+	SR_TRAP(OP_TLBI_RVAALE1ISNXS,	CGT_HCR_TTLB_TTLBIS),
+	SR_TRAP(OP_TLBI_VMALLE1ISNXS,	CGT_HCR_TTLB_TTLBIS),
+	SR_TRAP(OP_TLBI_VAE1ISNXS,	CGT_HCR_TTLB_TTLBIS),
+	SR_TRAP(OP_TLBI_ASIDE1ISNXS,	CGT_HCR_TTLB_TTLBIS),
+	SR_TRAP(OP_TLBI_VAAE1ISNXS,	CGT_HCR_TTLB_TTLBIS),
+	SR_TRAP(OP_TLBI_VALE1ISNXS,	CGT_HCR_TTLB_TTLBIS),
+	SR_TRAP(OP_TLBI_VAALE1ISNXS,	CGT_HCR_TTLB_TTLBIS),
+	SR_TRAP(OP_TLBI_VMALLE1OS,	CGT_HCR_TTLB_TTLBOS),
+	SR_TRAP(OP_TLBI_VAE1OS,		CGT_HCR_TTLB_TTLBOS),
+	SR_TRAP(OP_TLBI_ASIDE1OS,	CGT_HCR_TTLB_TTLBOS),
+	SR_TRAP(OP_TLBI_VAAE1OS,	CGT_HCR_TTLB_TTLBOS),
+	SR_TRAP(OP_TLBI_VALE1OS,	CGT_HCR_TTLB_TTLBOS),
+	SR_TRAP(OP_TLBI_VAALE1OS,	CGT_HCR_TTLB_TTLBOS),
+	SR_TRAP(OP_TLBI_RVAE1OS,	CGT_HCR_TTLB_TTLBOS),
+	SR_TRAP(OP_TLBI_RVAAE1OS,	CGT_HCR_TTLB_TTLBOS),
+	SR_TRAP(OP_TLBI_RVALE1OS,	CGT_HCR_TTLB_TTLBOS),
+	SR_TRAP(OP_TLBI_RVAALE1OS,	CGT_HCR_TTLB_TTLBOS),
+	SR_TRAP(OP_TLBI_VMALLE1OSNXS,	CGT_HCR_TTLB_TTLBOS),
+	SR_TRAP(OP_TLBI_VAE1OSNXS,	CGT_HCR_TTLB_TTLBOS),
+	SR_TRAP(OP_TLBI_ASIDE1OSNXS,	CGT_HCR_TTLB_TTLBOS),
+	SR_TRAP(OP_TLBI_VAAE1OSNXS,	CGT_HCR_TTLB_TTLBOS),
+	SR_TRAP(OP_TLBI_VALE1OSNXS,	CGT_HCR_TTLB_TTLBOS),
+	SR_TRAP(OP_TLBI_VAALE1OSNXS,	CGT_HCR_TTLB_TTLBOS),
+	SR_TRAP(OP_TLBI_RVAE1OSNXS,	CGT_HCR_TTLB_TTLBOS),
+	SR_TRAP(OP_TLBI_RVAAE1OSNXS,	CGT_HCR_TTLB_TTLBOS),
+	SR_TRAP(OP_TLBI_RVALE1OSNXS,	CGT_HCR_TTLB_TTLBOS),
+	SR_TRAP(OP_TLBI_RVAALE1OSNXS,	CGT_HCR_TTLB_TTLBOS),
+	SR_TRAP(SYS_SCTLR_EL1,		CGT_HCR_TVM_TRVM),
+	SR_TRAP(SYS_TTBR0_EL1,		CGT_HCR_TVM_TRVM),
+	SR_TRAP(SYS_TTBR1_EL1,		CGT_HCR_TVM_TRVM),
+	SR_TRAP(SYS_TCR_EL1,		CGT_HCR_TVM_TRVM),
+	SR_TRAP(SYS_ESR_EL1,		CGT_HCR_TVM_TRVM),
+	SR_TRAP(SYS_FAR_EL1,		CGT_HCR_TVM_TRVM),
+	SR_TRAP(SYS_AFSR0_EL1,		CGT_HCR_TVM_TRVM),
+	SR_TRAP(SYS_AFSR1_EL1,		CGT_HCR_TVM_TRVM),
+	SR_TRAP(SYS_MAIR_EL1,		CGT_HCR_TVM_TRVM),
+	SR_TRAP(SYS_AMAIR_EL1,		CGT_HCR_TVM_TRVM),
+	SR_TRAP(SYS_CONTEXTIDR_EL1,	CGT_HCR_TVM_TRVM),
+	SR_TRAP(SYS_DC_ZVA,		CGT_HCR_TDZ),
+	SR_TRAP(SYS_DC_GVA,		CGT_HCR_TDZ),
+	SR_TRAP(SYS_DC_GZVA,		CGT_HCR_TDZ),
+	SR_RANGE_TRAP(SYS_LORSA_EL1,
+		      SYS_LORC_EL1,	CGT_HCR_TLOR),
+	SR_TRAP(SYS_LORID_EL1,		CGT_HCR_TLOR),
+	SR_TRAP(SYS_ERRIDR_EL1,		CGT_HCR_TERR),
+	SR_TRAP(SYS_ERRSELR_EL1,	CGT_HCR_TERR),
+	SR_TRAP(SYS_ERXADDR_EL1,	CGT_HCR_TERR),
+	SR_TRAP(SYS_ERXCTLR_EL1,	CGT_HCR_TERR),
+	SR_TRAP(SYS_ERXFR_EL1,		CGT_HCR_TERR),
+	SR_TRAP(SYS_ERXMISC0_EL1,	CGT_HCR_TERR),
+	SR_TRAP(SYS_ERXMISC1_EL1,	CGT_HCR_TERR),
+	SR_TRAP(SYS_ERXMISC2_EL1,	CGT_HCR_TERR),
+	SR_TRAP(SYS_ERXMISC3_EL1,	CGT_HCR_TERR),
+	SR_TRAP(SYS_ERXSTATUS_EL1,	CGT_HCR_TERR),
+	SR_TRAP(SYS_APIAKEYLO_EL1,	CGT_HCR_APK),
+	SR_TRAP(SYS_APIAKEYHI_EL1,	CGT_HCR_APK),
+	SR_TRAP(SYS_APIBKEYLO_EL1,	CGT_HCR_APK),
+	SR_TRAP(SYS_APIBKEYHI_EL1,	CGT_HCR_APK),
+	SR_TRAP(SYS_APDAKEYLO_EL1,	CGT_HCR_APK),
+	SR_TRAP(SYS_APDAKEYHI_EL1,	CGT_HCR_APK),
+	SR_TRAP(SYS_APDBKEYLO_EL1,	CGT_HCR_APK),
+	SR_TRAP(SYS_APDBKEYHI_EL1,	CGT_HCR_APK),
+	SR_TRAP(SYS_APGAKEYLO_EL1,	CGT_HCR_APK),
+	SR_TRAP(SYS_APGAKEYHI_EL1,	CGT_HCR_APK),
+	/* All _EL2 registers */
+	SR_RANGE_TRAP(sys_reg(3, 4, 0, 0, 0),
+		      sys_reg(3, 4, 10, 15, 7), CGT_HCR_NV),
+	SR_RANGE_TRAP(sys_reg(3, 4, 12, 0, 0),
+		      sys_reg(3, 4, 14, 15, 7), CGT_HCR_NV),
+	/* All _EL02, _EL12 registers */
+	SR_RANGE_TRAP(sys_reg(3, 5, 0, 0, 0),
+		      sys_reg(3, 5, 10, 15, 7), CGT_HCR_NV),
+	SR_RANGE_TRAP(sys_reg(3, 5, 12, 0, 0),
+		      sys_reg(3, 5, 14, 15, 7), CGT_HCR_NV),
+	SR_TRAP(SYS_SP_EL1,		CGT_HCR_NV),
+	SR_TRAP(OP_AT_S1E2R,		CGT_HCR_NV),
+	SR_TRAP(OP_AT_S1E2W,		CGT_HCR_NV),
+	SR_TRAP(OP_AT_S12E1R,		CGT_HCR_NV),
+	SR_TRAP(OP_AT_S12E1W,		CGT_HCR_NV),
+	SR_TRAP(OP_AT_S12E0R,		CGT_HCR_NV),
+	SR_TRAP(OP_AT_S12E0W,		CGT_HCR_NV),
+	SR_TRAP(OP_TLBI_IPAS2E1,	CGT_HCR_NV),
+	SR_TRAP(OP_TLBI_RIPAS2E1,	CGT_HCR_NV),
+	SR_TRAP(OP_TLBI_IPAS2LE1,	CGT_HCR_NV),
+	SR_TRAP(OP_TLBI_RIPAS2LE1,	CGT_HCR_NV),
+	SR_TRAP(OP_TLBI_RVAE2,		CGT_HCR_NV),
+	SR_TRAP(OP_TLBI_RVALE2,		CGT_HCR_NV),
+	SR_TRAP(OP_TLBI_ALLE2,		CGT_HCR_NV),
+	SR_TRAP(OP_TLBI_VAE2,		CGT_HCR_NV),
+	SR_TRAP(OP_TLBI_ALLE1,		CGT_HCR_NV),
+	SR_TRAP(OP_TLBI_VALE2,		CGT_HCR_NV),
+	SR_TRAP(OP_TLBI_VMALLS12E1,	CGT_HCR_NV),
+	SR_TRAP(OP_TLBI_IPAS2E1NXS,	CGT_HCR_NV),
+	SR_TRAP(OP_TLBI_RIPAS2E1NXS,	CGT_HCR_NV),
+	SR_TRAP(OP_TLBI_IPAS2LE1NXS,	CGT_HCR_NV),
+	SR_TRAP(OP_TLBI_RIPAS2LE1NXS,	CGT_HCR_NV),
+	SR_TRAP(OP_TLBI_RVAE2NXS,	CGT_HCR_NV),
+	SR_TRAP(OP_TLBI_RVALE2NXS,	CGT_HCR_NV),
+	SR_TRAP(OP_TLBI_ALLE2NXS,	CGT_HCR_NV),
+	SR_TRAP(OP_TLBI_VAE2NXS,	CGT_HCR_NV),
+	SR_TRAP(OP_TLBI_ALLE1NXS,	CGT_HCR_NV),
+	SR_TRAP(OP_TLBI_VALE2NXS,	CGT_HCR_NV),
+	SR_TRAP(OP_TLBI_VMALLS12E1NXS,	CGT_HCR_NV),
+	SR_TRAP(OP_TLBI_IPAS2E1IS,	CGT_HCR_NV),
+	SR_TRAP(OP_TLBI_RIPAS2E1IS,	CGT_HCR_NV),
+	SR_TRAP(OP_TLBI_IPAS2LE1IS,	CGT_HCR_NV),
+	SR_TRAP(OP_TLBI_RIPAS2LE1IS,	CGT_HCR_NV),
+	SR_TRAP(OP_TLBI_RVAE2IS,	CGT_HCR_NV),
+	SR_TRAP(OP_TLBI_RVALE2IS,	CGT_HCR_NV),
+	SR_TRAP(OP_TLBI_ALLE2IS,	CGT_HCR_NV),
+	SR_TRAP(OP_TLBI_VAE2IS,		CGT_HCR_NV),
+	SR_TRAP(OP_TLBI_ALLE1IS,	CGT_HCR_NV),
+	SR_TRAP(OP_TLBI_VALE2IS,	CGT_HCR_NV),
+	SR_TRAP(OP_TLBI_VMALLS12E1IS,	CGT_HCR_NV),
+	SR_TRAP(OP_TLBI_IPAS2E1ISNXS,	CGT_HCR_NV),
+	SR_TRAP(OP_TLBI_RIPAS2E1ISNXS,	CGT_HCR_NV),
+	SR_TRAP(OP_TLBI_IPAS2LE1ISNXS,	CGT_HCR_NV),
+	SR_TRAP(OP_TLBI_RIPAS2LE1ISNXS,	CGT_HCR_NV),
+	SR_TRAP(OP_TLBI_RVAE2ISNXS,	CGT_HCR_NV),
+	SR_TRAP(OP_TLBI_RVALE2ISNXS,	CGT_HCR_NV),
+	SR_TRAP(OP_TLBI_ALLE2ISNXS,	CGT_HCR_NV),
+	SR_TRAP(OP_TLBI_VAE2ISNXS,	CGT_HCR_NV),
+	SR_TRAP(OP_TLBI_ALLE1ISNXS,	CGT_HCR_NV),
+	SR_TRAP(OP_TLBI_VALE2ISNXS,	CGT_HCR_NV),
+	SR_TRAP(OP_TLBI_VMALLS12E1ISNXS,CGT_HCR_NV),
+	SR_TRAP(OP_TLBI_ALLE2OS,	CGT_HCR_NV),
+	SR_TRAP(OP_TLBI_VAE2OS,		CGT_HCR_NV),
+	SR_TRAP(OP_TLBI_ALLE1OS,	CGT_HCR_NV),
+	SR_TRAP(OP_TLBI_VALE2OS,	CGT_HCR_NV),
+	SR_TRAP(OP_TLBI_VMALLS12E1OS,	CGT_HCR_NV),
+	SR_TRAP(OP_TLBI_IPAS2E1OS,	CGT_HCR_NV),
+	SR_TRAP(OP_TLBI_RIPAS2E1OS,	CGT_HCR_NV),
+	SR_TRAP(OP_TLBI_IPAS2LE1OS,	CGT_HCR_NV),
+	SR_TRAP(OP_TLBI_RIPAS2LE1OS,	CGT_HCR_NV),
+	SR_TRAP(OP_TLBI_RVAE2OS,	CGT_HCR_NV),
+	SR_TRAP(OP_TLBI_RVALE2OS,	CGT_HCR_NV),
+	SR_TRAP(OP_TLBI_ALLE2OSNXS,	CGT_HCR_NV),
+	SR_TRAP(OP_TLBI_VAE2OSNXS,	CGT_HCR_NV),
+	SR_TRAP(OP_TLBI_ALLE1OSNXS,	CGT_HCR_NV),
+	SR_TRAP(OP_TLBI_VALE2OSNXS,	CGT_HCR_NV),
+	SR_TRAP(OP_TLBI_VMALLS12E1OSNXS,CGT_HCR_NV),
+	SR_TRAP(OP_TLBI_IPAS2E1OSNXS,	CGT_HCR_NV),
+	SR_TRAP(OP_TLBI_RIPAS2E1OSNXS,	CGT_HCR_NV),
+	SR_TRAP(OP_TLBI_IPAS2LE1OSNXS,	CGT_HCR_NV),
+	SR_TRAP(OP_TLBI_RIPAS2LE1OSNXS,	CGT_HCR_NV),
+	SR_TRAP(OP_TLBI_RVAE2OSNXS,	CGT_HCR_NV),
+	SR_TRAP(OP_TLBI_RVALE2OSNXS,	CGT_HCR_NV),
+	SR_TRAP(SYS_VBAR_EL1,		CGT_HCR_NV1),
+	SR_TRAP(SYS_ELR_EL1,		CGT_HCR_NV1),
+	SR_TRAP(SYS_SPSR_EL1,		CGT_HCR_NV1),
+	SR_TRAP(SYS_SCXTNUM_EL1,	CGT_HCR_NV1_ENSCXT),
+	SR_TRAP(OP_AT_S1E1R, 		CGT_HCR_AT),
+	SR_TRAP(OP_AT_S1E1W, 		CGT_HCR_AT),
+	SR_TRAP(OP_AT_S1E0R, 		CGT_HCR_AT),
+	SR_TRAP(OP_AT_S1E0W, 		CGT_HCR_AT),
+	SR_TRAP(OP_AT_S1E1RP, 		CGT_HCR_AT),
+	SR_TRAP(OP_AT_S1E1WP, 		CGT_HCR_AT),
+	/* ERXPFGCDN_EL1, ERXPFGCTL_EL1, and ERXPFGF_EL1 */
+	SR_RANGE_TRAP(sys_reg(3, 0, 5, 4, 4),
+		      sys_reg(3, 0, 5, 4, 6), CGT_HCR_FIEN),
+	SR_TRAP(SYS_SCXTNUM_EL0,	CGT_HCR_ENSCXT),
 };
 
 static DEFINE_XARRAY(sr_forward_xa);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [PATCH v10 11/59] KVM: arm64: nv: Expose FEAT_EVT to nested guests
  2023-05-15 17:30 [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support Marc Zyngier
                   ` (9 preceding siblings ...)
  2023-05-15 17:30 ` [PATCH v10 10/59] KVM: arm64: nv: Add trap forwarding for HCR_EL2 Marc Zyngier
@ 2023-05-15 17:30 ` Marc Zyngier
  2023-05-15 17:30 ` [PATCH v10 12/59] KVM: arm64: nv: Add trap forwarding for MDCR_EL2 Marc Zyngier
                   ` (50 subsequent siblings)
  61 siblings, 0 replies; 95+ messages in thread
From: Marc Zyngier @ 2023-05-15 17:30 UTC (permalink / raw)
  To: kvmarm, kvm, linux-arm-kernel
  Cc: Alexandru Elisei, Andre Przywara, Chase Conklin,
	Christoffer Dall, Ganapatrao Kulkarni, Darren Hart, Jintack Lim,
	Russell King, Miguel Luis, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu

Now that we properly implement FEAT_EVT (as we correctly forward
traps), expose it to guests.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/nested.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
index 315354d27978..7f80f385d9e8 100644
--- a/arch/arm64/kvm/nested.c
+++ b/arch/arm64/kvm/nested.c
@@ -124,8 +124,7 @@ void access_nested_id_reg(struct kvm_vcpu *v, struct sys_reg_params *p,
 		break;
 
 	case SYS_ID_AA64MMFR2_EL1:
-		val &= ~(NV_FTR(MMFR2, EVT)	|
-			 NV_FTR(MMFR2, BBM)	|
+		val &= ~(NV_FTR(MMFR2, BBM)	|
 			 NV_FTR(MMFR2, TTL)	|
 			 GENMASK_ULL(47, 44)	|
 			 NV_FTR(MMFR2, ST)	|
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [PATCH v10 12/59] KVM: arm64: nv: Add trap forwarding for MDCR_EL2
  2023-05-15 17:30 [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support Marc Zyngier
                   ` (10 preceding siblings ...)
  2023-05-15 17:30 ` [PATCH v10 11/59] KVM: arm64: nv: Expose FEAT_EVT to nested guests Marc Zyngier
@ 2023-05-15 17:30 ` Marc Zyngier
  2023-05-15 17:30 ` [PATCH v10 13/59] KVM: arm64: nv: Add trap forwarding for CNTHCTL_EL2 Marc Zyngier
                   ` (49 subsequent siblings)
  61 siblings, 0 replies; 95+ messages in thread
From: Marc Zyngier @ 2023-05-15 17:30 UTC (permalink / raw)
  To: kvmarm, kvm, linux-arm-kernel
  Cc: Alexandru Elisei, Andre Przywara, Chase Conklin,
	Christoffer Dall, Ganapatrao Kulkarni, Darren Hart, Jintack Lim,
	Russell King, Miguel Luis, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu

Describe the MDCR_EL2 register, and associate it with all the sysregs
whose accesses it can trap.
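
For illustration, a coarse-grained entry boils down to a value/mask
pair checked against the guest's view of MDCR_EL2. The struct and
helper below are simplified stand-ins for the example, not the names
used in the patch:

	struct example_trap_bits {
		u64	value;	/* expected value of the masked bits */
		u64	mask;	/* bits of the control register we care about */
	};

	/* e.g. CGT_MDCR_TPM has value == mask == MDCR_EL2_TPM */
	static bool example_traps(u64 mdcr_el2,
				  const struct example_trap_bits *tb)
	{
		return (mdcr_el2 & tb->mask) == tb->value;
	}

Note that entries such as CGT_MDCR_E2PB use value == 0 with a non-zero
mask, i.e. the access is forwarded when the field is clear.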

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/emulate-nested.c | 204 ++++++++++++++++++++++++++++++++
 1 file changed, 204 insertions(+)

diff --git a/arch/arm64/kvm/emulate-nested.c b/arch/arm64/kvm/emulate-nested.c
index 8be331e5de9d..1e2fe192c9a4 100644
--- a/arch/arm64/kvm/emulate-nested.c
+++ b/arch/arm64/kvm/emulate-nested.c
@@ -65,6 +65,18 @@ enum coarse_grain_trap_id {
 	CGT_HCR_TTLBIS,
 	CGT_HCR_TTLBOS,
 
+	CGT_MDCR_TPMCR,
+	CGT_MDCR_TPM,
+	CGT_MDCR_TDE,
+	CGT_MDCR_TDA,
+	CGT_MDCR_TDOSA,
+	CGT_MDCR_TDRA,
+	CGT_MDCR_E2PB,
+	CGT_MDCR_TPMS,
+	CGT_MDCR_TTRF,
+	CGT_MDCR_E2TB,
+	CGT_MDCR_TDCC,
+
 	/*
 	 * Anything after this point is a combination of trap controls,
 	 * which all must be evaluated to decide what to do.
@@ -78,6 +90,11 @@ enum coarse_grain_trap_id {
 	CGT_HCR_TPU_TICAB,
 	CGT_HCR_TPU_TOCU,
 	CGT_HCR_NV1_ENSCXT,
+	CGT_MDCR_TPM_TPMCR,
+	CGT_MDCR_TDE_TDA,
+	CGT_MDCR_TDE_TDOSA,
+	CGT_MDCR_TDE_TDRA,
+	CGT_MDCR_TDCC_TDE_TDA,
 
 	/*
 	 * Anything after this point requires a callback evaluating a
@@ -249,6 +266,72 @@ static const struct trap_bits coarse_trap_bits[] = {
 		.mask		= HCR_TTLBOS,
 		.behaviour	= BEHAVE_FORWARD_ANY,
 	},
+	[CGT_MDCR_TPMCR] = {
+		.index		= MDCR_EL2,
+		.value		= MDCR_EL2_TPMCR,
+		.mask		= MDCR_EL2_TPMCR,
+		.behaviour	= BEHAVE_FORWARD_ANY,
+	},
+	[CGT_MDCR_TPM] = {
+		.index		= MDCR_EL2,
+		.value		= MDCR_EL2_TPM,
+		.mask		= MDCR_EL2_TPM,
+		.behaviour	= BEHAVE_FORWARD_ANY,
+	},
+	[CGT_MDCR_TDE] = {
+		.index		= MDCR_EL2,
+		.value		= MDCR_EL2_TDE,
+		.mask		= MDCR_EL2_TDE,
+		.behaviour	= BEHAVE_FORWARD_ANY,
+	},
+	[CGT_MDCR_TDA] = {
+		.index		= MDCR_EL2,
+		.value		= MDCR_EL2_TDA,
+		.mask		= MDCR_EL2_TDA,
+		.behaviour	= BEHAVE_FORWARD_ANY,
+	},
+	[CGT_MDCR_TDOSA] = {
+		.index		= MDCR_EL2,
+		.value		= MDCR_EL2_TDOSA,
+		.mask		= MDCR_EL2_TDOSA,
+		.behaviour	= BEHAVE_FORWARD_ANY,
+	},
+	[CGT_MDCR_TDRA] = {
+		.index		= MDCR_EL2,
+		.value		= MDCR_EL2_TDRA,
+		.mask		= MDCR_EL2_TDRA,
+		.behaviour	= BEHAVE_FORWARD_ANY,
+	},
+	[CGT_MDCR_E2PB] = {
+		.index		= MDCR_EL2,
+		.value		= 0,
+		.mask		= BIT(MDCR_EL2_E2PB_SHIFT),
+		.behaviour	= BEHAVE_FORWARD_ANY,
+	},
+	[CGT_MDCR_TPMS] = {
+		.index		= MDCR_EL2,
+		.value		= MDCR_EL2_TPMS,
+		.mask		= MDCR_EL2_TPMS,
+		.behaviour	= BEHAVE_FORWARD_ANY,
+	},
+	[CGT_MDCR_TTRF] = {
+		.index		= MDCR_EL2,
+		.value		= MDCR_EL2_TTRF,
+		.mask		= MDCR_EL2_TTRF,
+		.behaviour	= BEHAVE_FORWARD_ANY,
+	},
+	[CGT_MDCR_E2TB] = {
+		.index		= MDCR_EL2,
+		.value		= 0,
+		.mask		= BIT(MDCR_EL2_E2TB_SHIFT),
+		.behaviour	= BEHAVE_FORWARD_ANY,
+	},
+	[CGT_MDCR_TDCC] = {
+		.index		= MDCR_EL2,
+		.value		= MDCR_EL2_TDCC,
+		.mask		= MDCR_EL2_TDCC,
+		.behaviour	= BEHAVE_FORWARD_ANY,
+	},
 };
 
 #define MCB(id, ...)					\
@@ -266,6 +349,11 @@ static const enum coarse_grain_trap_id *coarse_control_combo[] = {
 	MCB(CGT_HCR_TPU_TICAB,		CGT_HCR_TPU, CGT_HCR_TICAB),
 	MCB(CGT_HCR_TPU_TOCU,		CGT_HCR_TPU, CGT_HCR_TOCU),
 	MCB(CGT_HCR_NV1_ENSCXT,		CGT_HCR_NV1, CGT_HCR_ENSCXT),
+	MCB(CGT_MDCR_TPM_TPMCR,		CGT_MDCR_TPM, CGT_MDCR_TPMCR),
+	MCB(CGT_MDCR_TDE_TDA,		CGT_MDCR_TDE, CGT_MDCR_TDA),
+	MCB(CGT_MDCR_TDE_TDOSA,		CGT_MDCR_TDE, CGT_MDCR_TDOSA),
+	MCB(CGT_MDCR_TDE_TDRA,		CGT_MDCR_TDE, CGT_MDCR_TDRA),
+	MCB(CGT_MDCR_TDCC_TDE_TDA,	CGT_MDCR_TDCC, CGT_MDCR_TDE_TDA),
 };
 
 typedef enum trap_behaviour (*complex_condition_check)(struct kvm_vcpu *);
@@ -565,6 +653,122 @@ static const struct encoding_to_trap_configs encoding_to_traps[] __initdata = {
 	SR_RANGE_TRAP(sys_reg(3, 0, 5, 4, 4),
 		      sys_reg(3, 0, 5, 4, 6), CGT_HCR_FIEN),
 	SR_TRAP(SYS_SCXTNUM_EL0,	CGT_HCR_ENSCXT),
+	SR_TRAP(SYS_PMCR_EL0,		CGT_MDCR_TPM_TPMCR),
+	SR_TRAP(SYS_PMCNTENSET_EL0,	CGT_MDCR_TPM),
+	SR_TRAP(SYS_PMCNTENCLR_EL0,	CGT_MDCR_TPM),
+	SR_TRAP(SYS_PMOVSSET_EL0,	CGT_MDCR_TPM),
+	SR_TRAP(SYS_PMOVSCLR_EL0,	CGT_MDCR_TPM),
+	SR_TRAP(SYS_PMCEID0_EL0,	CGT_MDCR_TPM),
+	SR_TRAP(SYS_PMCEID1_EL0,	CGT_MDCR_TPM),
+	SR_TRAP(SYS_PMXEVTYPER_EL0,	CGT_MDCR_TPM),
+	SR_TRAP(SYS_PMSWINC_EL0,	CGT_MDCR_TPM),
+	SR_TRAP(SYS_PMSELR_EL0,		CGT_MDCR_TPM),
+	SR_TRAP(SYS_PMXEVCNTR_EL0,	CGT_MDCR_TPM),
+	SR_TRAP(SYS_PMCCNTR_EL0,	CGT_MDCR_TPM),
+	SR_TRAP(SYS_PMUSERENR_EL0,	CGT_MDCR_TPM),
+	SR_TRAP(SYS_PMINTENSET_EL1,	CGT_MDCR_TPM),
+	SR_TRAP(SYS_PMINTENCLR_EL1,	CGT_MDCR_TPM),
+	SR_TRAP(SYS_PMMIR_EL1,		CGT_MDCR_TPM),
+	SR_RANGE_TRAP(SYS_PMEVCNTRn_EL0(0),
+		      SYS_PMEVCNTRn_EL0(30), CGT_MDCR_TPM),
+	SR_RANGE_TRAP(SYS_PMEVTYPERn_EL0(0),
+		      SYS_PMEVTYPERn_EL0(30), CGT_MDCR_TPM),
+	SR_TRAP(SYS_PMCCFILTR_EL0,	CGT_MDCR_TPM),
+	SR_TRAP(SYS_MDCCSR_EL0,		CGT_MDCR_TDCC_TDE_TDA),
+	SR_TRAP(SYS_MDCCINT_EL1,	CGT_MDCR_TDCC_TDE_TDA),
+	SR_TRAP(SYS_OSDTRRX_EL1,	CGT_MDCR_TDCC_TDE_TDA),
+	SR_TRAP(SYS_OSDTRTX_EL1,	CGT_MDCR_TDCC_TDE_TDA),
+	SR_TRAP(SYS_MDSCR_EL1,		CGT_MDCR_TDE_TDA),
+	SR_TRAP(SYS_OSECCR_EL1,		CGT_MDCR_TDE_TDA),
+	SR_TRAP(SYS_DBGBVRn_EL1(0),	CGT_MDCR_TDE_TDA),
+	SR_TRAP(SYS_DBGBVRn_EL1(1),	CGT_MDCR_TDE_TDA),
+	SR_TRAP(SYS_DBGBVRn_EL1(2),	CGT_MDCR_TDE_TDA),
+	SR_TRAP(SYS_DBGBVRn_EL1(3),	CGT_MDCR_TDE_TDA),
+	SR_TRAP(SYS_DBGBVRn_EL1(4),	CGT_MDCR_TDE_TDA),
+	SR_TRAP(SYS_DBGBVRn_EL1(5),	CGT_MDCR_TDE_TDA),
+	SR_TRAP(SYS_DBGBVRn_EL1(6),	CGT_MDCR_TDE_TDA),
+	SR_TRAP(SYS_DBGBVRn_EL1(7),	CGT_MDCR_TDE_TDA),
+	SR_TRAP(SYS_DBGBVRn_EL1(8),	CGT_MDCR_TDE_TDA),
+	SR_TRAP(SYS_DBGBVRn_EL1(9),	CGT_MDCR_TDE_TDA),
+	SR_TRAP(SYS_DBGBVRn_EL1(10),	CGT_MDCR_TDE_TDA),
+	SR_TRAP(SYS_DBGBVRn_EL1(11),	CGT_MDCR_TDE_TDA),
+	SR_TRAP(SYS_DBGBVRn_EL1(12),	CGT_MDCR_TDE_TDA),
+	SR_TRAP(SYS_DBGBVRn_EL1(13),	CGT_MDCR_TDE_TDA),
+	SR_TRAP(SYS_DBGBVRn_EL1(14),	CGT_MDCR_TDE_TDA),
+	SR_TRAP(SYS_DBGBVRn_EL1(15),	CGT_MDCR_TDE_TDA),
+	SR_TRAP(SYS_DBGBCRn_EL1(0),	CGT_MDCR_TDE_TDA),
+	SR_TRAP(SYS_DBGBCRn_EL1(1),	CGT_MDCR_TDE_TDA),
+	SR_TRAP(SYS_DBGBCRn_EL1(2),	CGT_MDCR_TDE_TDA),
+	SR_TRAP(SYS_DBGBCRn_EL1(3),	CGT_MDCR_TDE_TDA),
+	SR_TRAP(SYS_DBGBCRn_EL1(4),	CGT_MDCR_TDE_TDA),
+	SR_TRAP(SYS_DBGBCRn_EL1(5),	CGT_MDCR_TDE_TDA),
+	SR_TRAP(SYS_DBGBCRn_EL1(6),	CGT_MDCR_TDE_TDA),
+	SR_TRAP(SYS_DBGBCRn_EL1(7),	CGT_MDCR_TDE_TDA),
+	SR_TRAP(SYS_DBGBCRn_EL1(8),	CGT_MDCR_TDE_TDA),
+	SR_TRAP(SYS_DBGBCRn_EL1(9),	CGT_MDCR_TDE_TDA),
+	SR_TRAP(SYS_DBGBCRn_EL1(10),	CGT_MDCR_TDE_TDA),
+	SR_TRAP(SYS_DBGBCRn_EL1(11),	CGT_MDCR_TDE_TDA),
+	SR_TRAP(SYS_DBGBCRn_EL1(12),	CGT_MDCR_TDE_TDA),
+	SR_TRAP(SYS_DBGBCRn_EL1(13),	CGT_MDCR_TDE_TDA),
+	SR_TRAP(SYS_DBGBCRn_EL1(14),	CGT_MDCR_TDE_TDA),
+	SR_TRAP(SYS_DBGBCRn_EL1(15),	CGT_MDCR_TDE_TDA),
+	SR_TRAP(SYS_DBGWVRn_EL1(0),	CGT_MDCR_TDE_TDA),
+	SR_TRAP(SYS_DBGWVRn_EL1(1),	CGT_MDCR_TDE_TDA),
+	SR_TRAP(SYS_DBGWVRn_EL1(2),	CGT_MDCR_TDE_TDA),
+	SR_TRAP(SYS_DBGWVRn_EL1(3),	CGT_MDCR_TDE_TDA),
+	SR_TRAP(SYS_DBGWVRn_EL1(4),	CGT_MDCR_TDE_TDA),
+	SR_TRAP(SYS_DBGWVRn_EL1(5),	CGT_MDCR_TDE_TDA),
+	SR_TRAP(SYS_DBGWVRn_EL1(6),	CGT_MDCR_TDE_TDA),
+	SR_TRAP(SYS_DBGWVRn_EL1(7),	CGT_MDCR_TDE_TDA),
+	SR_TRAP(SYS_DBGWVRn_EL1(8),	CGT_MDCR_TDE_TDA),
+	SR_TRAP(SYS_DBGWVRn_EL1(9),	CGT_MDCR_TDE_TDA),
+	SR_TRAP(SYS_DBGWVRn_EL1(10),	CGT_MDCR_TDE_TDA),
+	SR_TRAP(SYS_DBGWVRn_EL1(11),	CGT_MDCR_TDE_TDA),
+	SR_TRAP(SYS_DBGWVRn_EL1(12),	CGT_MDCR_TDE_TDA),
+	SR_TRAP(SYS_DBGWVRn_EL1(13),	CGT_MDCR_TDE_TDA),
+	SR_TRAP(SYS_DBGWVRn_EL1(14),	CGT_MDCR_TDE_TDA),
+	SR_TRAP(SYS_DBGWVRn_EL1(15),	CGT_MDCR_TDE_TDA),
+	SR_TRAP(SYS_DBGWCRn_EL1(0),	CGT_MDCR_TDE_TDA),
+	SR_TRAP(SYS_DBGWCRn_EL1(1),	CGT_MDCR_TDE_TDA),
+	SR_TRAP(SYS_DBGWCRn_EL1(2),	CGT_MDCR_TDE_TDA),
+	SR_TRAP(SYS_DBGWCRn_EL1(3),	CGT_MDCR_TDE_TDA),
+	SR_TRAP(SYS_DBGWCRn_EL1(4),	CGT_MDCR_TDE_TDA),
+	SR_TRAP(SYS_DBGWCRn_EL1(5),	CGT_MDCR_TDE_TDA),
+	SR_TRAP(SYS_DBGWCRn_EL1(6),	CGT_MDCR_TDE_TDA),
+	SR_TRAP(SYS_DBGWCRn_EL1(7),	CGT_MDCR_TDE_TDA),
+	SR_TRAP(SYS_DBGWCRn_EL1(8),	CGT_MDCR_TDE_TDA),
+	SR_TRAP(SYS_DBGWCRn_EL1(9),	CGT_MDCR_TDE_TDA),
+	SR_TRAP(SYS_DBGWCRn_EL1(10),	CGT_MDCR_TDE_TDA),
+	SR_TRAP(SYS_DBGWCRn_EL1(11),	CGT_MDCR_TDE_TDA),
+	SR_TRAP(SYS_DBGWCRn_EL1(12),	CGT_MDCR_TDE_TDA),
+	SR_TRAP(SYS_DBGWCRn_EL1(13),	CGT_MDCR_TDE_TDA),
+	SR_TRAP(SYS_DBGWCRn_EL1(14),	CGT_MDCR_TDE_TDA),
+	SR_TRAP(SYS_DBGCLAIMSET_EL1,	CGT_MDCR_TDE_TDA),
+	SR_TRAP(SYS_DBGCLAIMCLR_EL1,	CGT_MDCR_TDE_TDA),
+	SR_TRAP(SYS_DBGAUTHSTATUS_EL1,	CGT_MDCR_TDE_TDA),
+	SR_TRAP(SYS_OSLAR_EL1,		CGT_MDCR_TDE_TDOSA),
+	SR_TRAP(SYS_OSLSR_EL1,		CGT_MDCR_TDE_TDOSA),
+	SR_TRAP(SYS_OSDLR_EL1,		CGT_MDCR_TDE_TDOSA),
+	SR_TRAP(SYS_DBGPRCR_EL1,	CGT_MDCR_TDE_TDOSA),
+	SR_TRAP(SYS_MDRAR_EL1,		CGT_MDCR_TDE_TDRA),
+	SR_TRAP(SYS_PMBLIMITR_EL1,	CGT_MDCR_E2PB),
+	SR_TRAP(SYS_PMBPTR_EL1,		CGT_MDCR_E2PB),
+	SR_TRAP(SYS_PMBSR_EL1,		CGT_MDCR_E2PB),
+	SR_TRAP(SYS_PMSCR_EL1,		CGT_MDCR_TPMS),
+	SR_TRAP(SYS_PMSEVFR_EL1,	CGT_MDCR_TPMS),
+	SR_TRAP(SYS_PMSFCR_EL1,		CGT_MDCR_TPMS),
+	SR_TRAP(SYS_PMSICR_EL1,		CGT_MDCR_TPMS),
+	SR_TRAP(SYS_PMSIDR_EL1,		CGT_MDCR_TPMS),
+	SR_TRAP(SYS_PMSIRR_EL1,		CGT_MDCR_TPMS),
+	SR_TRAP(SYS_PMSLATFR_EL1,	CGT_MDCR_TPMS),
+	SR_TRAP(SYS_PMSNEVFR_EL1,	CGT_MDCR_TPMS),
+	SR_TRAP(SYS_TRFCR_EL1,		CGT_MDCR_TTRF),
+	SR_TRAP(SYS_TRBBASER_EL1,	CGT_MDCR_E2TB),
+	SR_TRAP(SYS_TRBLIMITR_EL1,	CGT_MDCR_E2TB),
+	SR_TRAP(SYS_TRBMAR_EL1, 	CGT_MDCR_E2TB),
+	SR_TRAP(SYS_TRBPTR_EL1, 	CGT_MDCR_E2TB),
+	SR_TRAP(SYS_TRBSR_EL1, 		CGT_MDCR_E2TB),
+	SR_TRAP(SYS_TRBTRG_EL1,		CGT_MDCR_E2TB),
 };
 
 static DEFINE_XARRAY(sr_forward_xa);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [PATCH v10 13/59] KVM: arm64: nv: Add trap forwarding for CNTHCTL_EL2
  2023-05-15 17:30 [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support Marc Zyngier
                   ` (11 preceding siblings ...)
  2023-05-15 17:30 ` [PATCH v10 12/59] KVM: arm64: nv: Add trap forwarding for MDCR_EL2 Marc Zyngier
@ 2023-05-15 17:30 ` Marc Zyngier
  2023-05-15 17:30 ` [PATCH v10 14/59] KVM: arm64: nv: Add non-VHE-EL2->EL1 translation helpers Marc Zyngier
                   ` (48 subsequent siblings)
  61 siblings, 0 replies; 95+ messages in thread
From: Marc Zyngier @ 2023-05-15 17:30 UTC (permalink / raw)
  To: kvmarm, kvm, linux-arm-kernel
  Cc: Alexandru Elisei, Andre Przywara, Chase Conklin,
	Christoffer Dall, Ganapatrao Kulkarni, Darren Hart, Jintack Lim,
	Russell King, Miguel Luis, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu

Describe the CNTHCTL_EL2 register, and associate it with all the sysregs
whose accesses it can trap.
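
The reason a "complex condition" is needed here is that the CNTHCTL_EL2
layout depends on HCR_EL2.E2H: without E2H, EL1PCTEN/EL1PCEN live at
bits [1:0], while with E2H they move to bits [11:10]. A rough sketch of
the check (illustrative names only; the real code is in
check_cnthctl_el1pcten() below):

	static bool example_cntpct_is_forwarded(u64 cnthctl, bool e2h)
	{
		/* Normalise the non-E2H layout to the E2H one */
		if (!e2h)
			cnthctl = (cnthctl & GENMASK(1, 0)) << 10;

		/* EL1PCTEN set means the access is handled locally */
		return !(cnthctl & BIT(10));
	}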

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/emulate-nested.c | 37 ++++++++++++++++++++++++++++++++-
 1 file changed, 36 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/emulate-nested.c b/arch/arm64/kvm/emulate-nested.c
index 1e2fe192c9a4..101292c43602 100644
--- a/arch/arm64/kvm/emulate-nested.c
+++ b/arch/arm64/kvm/emulate-nested.c
@@ -98,9 +98,11 @@ enum coarse_grain_trap_id {
 
 	/*
 	 * Anything after this point requires a callback evaluating a
-	 * complex trap condition. Hopefully we'll never need this...
+	 * complex trap condition. Ugly stuff.
 	 */
 	__COMPLEX_CONDITIONS__,
+	CGT_CNTHCTL_EL1PCTEN = __COMPLEX_CONDITIONS__,
+	CGT_CNTHCTL_EL1PTEN,
 };
 
 static const struct trap_bits coarse_trap_bits[] = {
@@ -358,9 +360,37 @@ static const enum coarse_grain_trap_id *coarse_control_combo[] = {
 
 typedef enum trap_behaviour (*complex_condition_check)(struct kvm_vcpu *);
 
+static u64 get_sanitized_cnthctl(struct kvm_vcpu *vcpu)
+{
+	u64 val = __vcpu_sys_reg(vcpu, CNTHCTL_EL2);
+
+	if (!vcpu_el2_e2h_is_set(vcpu))
+		val = (val & (CNTHCTL_EL1PCEN | CNTHCTL_EL1PCTEN)) << 10;
+
+	return val;
+}
+
+static enum trap_behaviour check_cnthctl_el1pcten(struct kvm_vcpu *vcpu)
+{
+	if (get_sanitized_cnthctl(vcpu) & (CNTHCTL_EL1PCTEN << 10))
+		return BEHAVE_HANDLE_LOCALLY;
+
+	return BEHAVE_FORWARD_ANY;
+}
+
+static enum trap_behaviour check_cnthctl_el1pten(struct kvm_vcpu *vcpu)
+{
+	if (get_sanitized_cnthctl(vcpu) & (CNTHCTL_EL1PCEN << 10))
+		return BEHAVE_HANDLE_LOCALLY;
+
+	return BEHAVE_FORWARD_ANY;
+}
+
 #define CCC(id, fn)	[id - __COMPLEX_CONDITIONS__] = fn
 
 static const complex_condition_check ccc[] = {
+	CCC(CGT_CNTHCTL_EL1PCTEN, check_cnthctl_el1pcten),
+	CCC(CGT_CNTHCTL_EL1PTEN, check_cnthctl_el1pten),
 };
 
 struct encoding_to_trap_configs {
@@ -769,6 +799,11 @@ static const struct encoding_to_trap_configs encoding_to_traps[] __initdata = {
 	SR_TRAP(SYS_TRBPTR_EL1, 	CGT_MDCR_E2TB),
 	SR_TRAP(SYS_TRBSR_EL1, 		CGT_MDCR_E2TB),
 	SR_TRAP(SYS_TRBTRG_EL1,		CGT_MDCR_E2TB),
+	SR_TRAP(SYS_CNTP_TVAL_EL0,	CGT_CNTHCTL_EL1PTEN),
+	SR_TRAP(SYS_CNTP_CVAL_EL0,	CGT_CNTHCTL_EL1PTEN),
+	SR_TRAP(SYS_CNTP_CTL_EL0,	CGT_CNTHCTL_EL1PTEN),
+	SR_TRAP(SYS_CNTPCT_EL0,		CGT_CNTHCTL_EL1PCTEN),
+	SR_TRAP(SYS_CNTPCTSS_EL0,	CGT_CNTHCTL_EL1PCTEN),
 };
 
 static DEFINE_XARRAY(sr_forward_xa);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [PATCH v10 14/59] KVM: arm64: nv: Add non-VHE-EL2->EL1 translation helpers
  2023-05-15 17:30 [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support Marc Zyngier
                   ` (12 preceding siblings ...)
  2023-05-15 17:30 ` [PATCH v10 13/59] KVM: arm64: nv: Add trap forwarding for CNTHCTL_EL2 Marc Zyngier
@ 2023-05-15 17:30 ` Marc Zyngier
  2023-05-15 17:30 ` [PATCH v10 15/59] KVM: arm64: nv: Handle virtual EL2 registers in vcpu_read/write_sys_reg() Marc Zyngier
                   ` (47 subsequent siblings)
  61 siblings, 0 replies; 95+ messages in thread
From: Marc Zyngier @ 2023-05-15 17:30 UTC (permalink / raw)
  To: kvmarm, kvm, linux-arm-kernel
  Cc: Alexandru Elisei, Andre Przywara, Chase Conklin,
	Christoffer Dall, Ganapatrao Kulkarni, Darren Hart, Jintack Lim,
	Russell King, Miguel Luis, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu

Some EL2 system registers immediately affect the current execution
of the system, so we need to use their respective EL1 counterparts.
For this we need to define a mapping between the two. In general,
this only affects non-VHE guest hypervisors, as VHE system registers
are compatible with their EL1 counterparts.

These helpers will get used in subsequent patches.
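
As a sketch of the intended use (the caller context here is
hypothetical; the actual users appear later in the series), loading a
non-VHE guest hypervisor's TCR_EL2 on the CPU looks like:

	/* TCR_EL2 must be rewritten in TCR_EL1 format first */
	u64 tcr = translate_tcr_el2_to_tcr_el1(ctxt_sys_reg(ctxt, TCR_EL2));

	write_sysreg_el1(tcr, SYS_TCR);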

Co-developed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_nested.h | 48 +++++++++++++++++++++++++++++
 1 file changed, 48 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
index fa23cc9c2adc..9099df57037d 100644
--- a/arch/arm64/include/asm/kvm_nested.h
+++ b/arch/arm64/include/asm/kvm_nested.h
@@ -2,6 +2,7 @@
 #ifndef __ARM64_KVM_NESTED_H
 #define __ARM64_KVM_NESTED_H
 
+#include <linux/bitfield.h>
 #include <linux/kvm_host.h>
 
 static inline bool vcpu_has_nv(const struct kvm_vcpu *vcpu)
@@ -11,6 +12,53 @@ static inline bool vcpu_has_nv(const struct kvm_vcpu *vcpu)
 		test_bit(KVM_ARM_VCPU_HAS_EL2, vcpu->arch.features));
 }
 
+/* Translation helpers from non-VHE EL2 to EL1 */
+static inline u64 tcr_el2_ps_to_tcr_el1_ips(u64 tcr_el2)
+{
+	return (u64)FIELD_GET(TCR_EL2_PS_MASK, tcr_el2) << TCR_IPS_SHIFT;
+}
+
+static inline u64 translate_tcr_el2_to_tcr_el1(u64 tcr)
+{
+	return TCR_EPD1_MASK |				/* disable TTBR1_EL1 */
+	       ((tcr & TCR_EL2_TBI) ? TCR_TBI0 : 0) |
+	       tcr_el2_ps_to_tcr_el1_ips(tcr) |
+	       (tcr & TCR_EL2_TG0_MASK) |
+	       (tcr & TCR_EL2_ORGN0_MASK) |
+	       (tcr & TCR_EL2_IRGN0_MASK) |
+	       (tcr & TCR_EL2_T0SZ_MASK);
+}
+
+static inline u64 translate_cptr_el2_to_cpacr_el1(u64 cptr_el2)
+{
+	u64 cpacr_el1 = 0;
+
+	if (cptr_el2 & CPTR_EL2_TTA)
+		cpacr_el1 |= CPACR_ELx_TTA;
+	if (!(cptr_el2 & CPTR_EL2_TFP))
+		cpacr_el1 |= CPACR_ELx_FPEN;
+	if (!(cptr_el2 & CPTR_EL2_TZ))
+		cpacr_el1 |= CPACR_ELx_ZEN;
+
+	return cpacr_el1;
+}
+
+static inline u64 translate_sctlr_el2_to_sctlr_el1(u64 val)
+{
+	/* Only preserve the minimal set of bits we support */
+	val &= (SCTLR_ELx_M | SCTLR_ELx_A | SCTLR_ELx_C | SCTLR_ELx_SA |
+		SCTLR_ELx_I | SCTLR_ELx_IESB | SCTLR_ELx_WXN | SCTLR_ELx_EE);
+	val |= SCTLR_EL1_RES1;
+
+	return val;
+}
+
+static inline u64 translate_ttbr0_el2_to_ttbr0_el1(u64 ttbr0)
+{
+	/* Clear the ASID field */
+	return ttbr0 & ~GENMASK_ULL(63, 48);
+}
+
 extern bool __check_nv_sr_forward(struct kvm_vcpu *vcpu);
 
 struct sys_reg_params;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [PATCH v10 15/59] KVM: arm64: nv: Handle virtual EL2 registers in vcpu_read/write_sys_reg()
  2023-05-15 17:30 [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support Marc Zyngier
                   ` (13 preceding siblings ...)
  2023-05-15 17:30 ` [PATCH v10 14/59] KVM: arm64: nv: Add non-VHE-EL2->EL1 translation helpers Marc Zyngier
@ 2023-05-15 17:30 ` Marc Zyngier
  2023-05-15 17:30 ` [PATCH v10 16/59] KVM: arm64: nv: Handle SPSR_EL2 specially Marc Zyngier
                   ` (46 subsequent siblings)
  61 siblings, 0 replies; 95+ messages in thread
From: Marc Zyngier @ 2023-05-15 17:30 UTC (permalink / raw)
  To: kvmarm, kvm, linux-arm-kernel
  Cc: Alexandru Elisei, Andre Przywara, Chase Conklin,
	Christoffer Dall, Ganapatrao Kulkarni, Darren Hart, Jintack Lim,
	Russell King, Miguel Luis, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu

KVM internally uses accessor functions when reading or writing the
guest's system registers. This takes care of accessing either the stored
copy or using the "live" EL1 system registers when the host uses VHE.

With the introduction of virtual EL2 we add a bunch of EL2 system
registers, which now must also be taken care of:
- If the guest is running in vEL2, and we access an EL1 sysreg, we must
  revert to the stored version of that, and not use the CPU's copy.
- If the guest is running in vEL1, and we access an EL2 sysreg, we must
  also use the stored version, since the CPU carries the EL1 copy.
- Some EL2 system registers are supposed to affect the current execution
  of the system, so we need to put them into their respective EL1
  counterparts. For this we need to define a mapping between the two.
  This is done using the newly introduced struct el2_sysreg_map.
- Some EL2 system registers have a different format than their EL1
  counterpart, so we need to translate them before writing them to the
  CPU. This is done using an (optional) translate function in the map.
- There are the three special registers SP_EL2, SPSR_EL2 and ELR_EL2,
  which need some separate handling (SPSR_EL2 is being handled in a
  separate patch).

All of these cases are now wrapped into the existing accessor functions,
so KVM users need not care whether they access EL2 or EL1 registers,
nor which state the guest is in.

This handles what was formerly known as the "shadow state" dynamically,
without requiring a separate copy for each vCPU EL.
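
From a caller's perspective nothing changes; a sketch (not from the
patch) of what the accessors now guarantee:

	/*
	 * The accessor picks the right backing store by itself:
	 * in vEL2, SCTLR_EL1 comes from the in-memory copy (the CPU
	 * holds the vEL2 state); in vEL1, HCR_EL2 comes from the
	 * in-memory copy (the CPU holds the EL1 state).
	 */
	u64 sctlr = vcpu_read_sys_reg(vcpu, SCTLR_EL1);
	u64 hcr = vcpu_read_sys_reg(vcpu, HCR_EL2);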

Reviewed-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Co-developed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/sys_regs.c | 143 ++++++++++++++++++++++++++++++++++++--
 1 file changed, 138 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 77f8bc64148a..6e5783655b03 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -63,24 +63,157 @@ static bool write_to_read_only(struct kvm_vcpu *vcpu,
 	return false;
 }
 
+#define PURE_EL2_SYSREG(el2)						\
+	case el2: {							\
+		*el1r = el2;						\
+		return true;						\
+	}
+
+#define MAPPED_EL2_SYSREG(el2, el1, fn)					\
+	case el2: {							\
+		*xlate = fn;						\
+		*el1r = el1;						\
+		return true;						\
+	}
+
+static bool get_el2_to_el1_mapping(unsigned int reg,
+				   unsigned int *el1r, u64 (**xlate)(u64))
+{
+	switch (reg) {
+		PURE_EL2_SYSREG(  VPIDR_EL2	);
+		PURE_EL2_SYSREG(  VMPIDR_EL2	);
+		PURE_EL2_SYSREG(  ACTLR_EL2	);
+		PURE_EL2_SYSREG(  HCR_EL2	);
+		PURE_EL2_SYSREG(  MDCR_EL2	);
+		PURE_EL2_SYSREG(  HSTR_EL2	);
+		PURE_EL2_SYSREG(  HACR_EL2	);
+		PURE_EL2_SYSREG(  VTTBR_EL2	);
+		PURE_EL2_SYSREG(  VTCR_EL2	);
+		PURE_EL2_SYSREG(  RVBAR_EL2	);
+		PURE_EL2_SYSREG(  TPIDR_EL2	);
+		PURE_EL2_SYSREG(  HPFAR_EL2	);
+		PURE_EL2_SYSREG(  ELR_EL2	);
+		PURE_EL2_SYSREG(  SPSR_EL2	);
+		PURE_EL2_SYSREG(  CNTHCTL_EL2	);
+		MAPPED_EL2_SYSREG(SCTLR_EL2,   SCTLR_EL1,
+				  translate_sctlr_el2_to_sctlr_el1	     );
+		MAPPED_EL2_SYSREG(CPTR_EL2,    CPACR_EL1,
+				  translate_cptr_el2_to_cpacr_el1	     );
+		MAPPED_EL2_SYSREG(TTBR0_EL2,   TTBR0_EL1,
+				  translate_ttbr0_el2_to_ttbr0_el1	     );
+		MAPPED_EL2_SYSREG(TTBR1_EL2,   TTBR1_EL1,   NULL	     );
+		MAPPED_EL2_SYSREG(TCR_EL2,     TCR_EL1,
+				  translate_tcr_el2_to_tcr_el1		     );
+		MAPPED_EL2_SYSREG(VBAR_EL2,    VBAR_EL1,    NULL	     );
+		MAPPED_EL2_SYSREG(AFSR0_EL2,   AFSR0_EL1,   NULL	     );
+		MAPPED_EL2_SYSREG(AFSR1_EL2,   AFSR1_EL1,   NULL	     );
+		MAPPED_EL2_SYSREG(ESR_EL2,     ESR_EL1,     NULL	     );
+		MAPPED_EL2_SYSREG(FAR_EL2,     FAR_EL1,     NULL	     );
+		MAPPED_EL2_SYSREG(MAIR_EL2,    MAIR_EL1,    NULL	     );
+		MAPPED_EL2_SYSREG(AMAIR_EL2,   AMAIR_EL1,   NULL	     );
+	default:
+		return false;
+	}
+}
+
 u64 vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg)
 {
 	u64 val = 0x8badf00d8badf00d;
+	u64 (*xlate)(u64) = NULL;
+	unsigned int el1r;
 
-	if (vcpu_get_flag(vcpu, SYSREGS_ON_CPU) &&
-	    __vcpu_read_sys_reg_from_cpu(reg, &val))
+	if (!vcpu_get_flag(vcpu, SYSREGS_ON_CPU))
+		goto memory_read;
+
+	if (unlikely(get_el2_to_el1_mapping(reg, &el1r, &xlate))) {
+		if (!is_hyp_ctxt(vcpu))
+			goto memory_read;
+
+		/*
+		 * ELR_EL2 is special cased for now.
+		 */
+		switch (reg) {
+		case ELR_EL2:
+			return read_sysreg_el1(SYS_ELR);
+		}
+
+		/*
+		 * If this register does not have an EL1 counterpart,
+		 * then read the stored EL2 version.
+		 */
+		if (reg == el1r)
+			goto memory_read;
+
+		/*
+		 * If we have a non-VHE guest and the sysreg
+		 * requires translation to be used at EL1, use the
+		 * in-memory copy instead.
+		 */
+		if (!vcpu_el2_e2h_is_set(vcpu) && xlate)
+			goto memory_read;
+
+		/* Get the current version of the EL1 counterpart. */
+		WARN_ON(!__vcpu_read_sys_reg_from_cpu(el1r, &val));
+		return val;
+	}
+
+	/* EL1 register can't be on the CPU if the guest is in vEL2. */
+	if (unlikely(is_hyp_ctxt(vcpu)))
+		goto memory_read;
+
+	if (__vcpu_read_sys_reg_from_cpu(reg, &val))
 		return val;
 
+memory_read:
 	return __vcpu_sys_reg(vcpu, reg);
 }
 
 void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, u64 val, int reg)
 {
-	if (vcpu_get_flag(vcpu, SYSREGS_ON_CPU) &&
-	    __vcpu_write_sys_reg_to_cpu(val, reg))
+	u64 (*xlate)(u64) = NULL;
+	unsigned int el1r;
+
+	if (!vcpu_get_flag(vcpu, SYSREGS_ON_CPU))
+		goto memory_write;
+
+	if (unlikely(get_el2_to_el1_mapping(reg, &el1r, &xlate))) {
+		if (!is_hyp_ctxt(vcpu))
+			goto memory_write;
+
+		/*
+		 * Always store a copy of the write to memory to avoid having
+		 * to reverse-translate virtual EL2 system registers for a
+		 * non-VHE guest hypervisor.
+		 */
+		__vcpu_sys_reg(vcpu, reg) = val;
+
+		switch (reg) {
+		case ELR_EL2:
+			write_sysreg_el1(val, SYS_ELR);
+			return;
+		}
+
+		/* No EL1 counterpart? We're done here. */
+		if (reg == el1r)
+			return;
+
+		if (!vcpu_el2_e2h_is_set(vcpu) && xlate)
+			val = xlate(val);
+
+		/* Redirect this to the EL1 version of the register. */
+		WARN_ON(!__vcpu_write_sys_reg_to_cpu(val, el1r));
+		return;
+	}
+
+	/* EL1 register can't be on the CPU if the guest is in vEL2. */
+	if (unlikely(is_hyp_ctxt(vcpu)))
+		goto memory_write;
+
+	if (__vcpu_write_sys_reg_to_cpu(val, reg))
 		return;
 
-	__vcpu_sys_reg(vcpu, reg) = val;
+memory_write:
+	__vcpu_sys_reg(vcpu, reg) = val;
 }
 
 /* CSSELR values; used to index KVM_REG_ARM_DEMUX_ID_CCSIDR */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [PATCH v10 16/59] KVM: arm64: nv: Handle SPSR_EL2 specially
  2023-05-15 17:30 [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support Marc Zyngier
                   ` (14 preceding siblings ...)
  2023-05-15 17:30 ` [PATCH v10 15/59] KVM: arm64: nv: Handle virtual EL2 registers in vcpu_read/write_sys_reg() Marc Zyngier
@ 2023-05-15 17:30 ` Marc Zyngier
  2023-05-15 17:30 ` [PATCH v10 17/59] KVM: arm64: nv: Handle HCR_EL2.E2H specially Marc Zyngier
                   ` (45 subsequent siblings)
  61 siblings, 0 replies; 95+ messages in thread
From: Marc Zyngier @ 2023-05-15 17:30 UTC (permalink / raw)
  To: kvmarm, kvm, linux-arm-kernel
  Cc: Alexandru Elisei, Andre Przywara, Chase Conklin,
	Christoffer Dall, Ganapatrao Kulkarni, Darren Hart, Jintack Lim,
	Russell King, Miguel Luis, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu

SPSR_EL2 needs special attention when running nested on ARMv8.3:

If taking an exception while running at vEL2 (actually EL1), the
HW will update the SPSR_EL1 register with the EL1 mode. We need
to track this in order to make sure that accesses to the virtual
view of SPSR_EL2 are correct.

To do so, we place an illegal value in SPSR_EL1.M, and patch it
accordingly if required when accessing it.
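
The read-side fixup boils down to the following (a condensed sketch of
the helper added below):

	u64 spsr = read_sysreg_el1(SYS_SPSR);

	if ((spsr & 0xc) == 0)
		/* The CPU never touched our write: use the saved copy */
		spsr = ctxt_sys_reg(ctxt, SPSR_EL2);
	else
		/* A local exception clobbered it: make .M look like EL2 */
		spsr = (spsr & ~0x0c) | CurrentEL_EL2;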

Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_emulate.h | 37 ++++++++++++++++++++++++++++
 arch/arm64/kvm/sys_regs.c            | 23 +++++++++++++++--
 2 files changed, 58 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index a08291051ac9..fd0979ba4328 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -251,6 +251,43 @@ static inline bool is_hyp_ctxt(const struct kvm_vcpu *vcpu)
 	return __is_hyp_ctxt(&vcpu->arch.ctxt);
 }
 
+static inline u64 __fixup_spsr_el2_write(struct kvm_cpu_context *ctxt, u64 val)
+{
+	if (!__vcpu_el2_e2h_is_set(ctxt)) {
+		/*
+		 * Clear the .M field when writing SPSR to the CPU, so that we
+		 * can detect when the CPU clobbered our SPSR copy during a
+		 * local exception.
+		 */
+		val &= ~0xc;
+	}
+
+	return val;
+}
+
+static inline u64 __fixup_spsr_el2_read(const struct kvm_cpu_context *ctxt, u64 val)
+{
+	if (__vcpu_el2_e2h_is_set(ctxt))
+		return val;
+
+	/*
+	 * SPSR.M == 0 means the CPU has not touched the SPSR, so the
+	 * register still has the value we saved on the last write.
+	 */
+	if ((val & 0xc) == 0)
+		return ctxt_sys_reg(ctxt, SPSR_EL2);
+
+	/*
+	 * Otherwise there was a "local" exception on the CPU,
+	 * which from the guest's point of view was being taken from
+	 * EL2 to EL2, although it actually happened to be from
+	 * EL1 to EL1.
+	 * So we need to fix the .M field in SPSR, to make it look
+	 * like EL2, which is what the guest would expect.
+	 */
+	return (val & ~0x0c) | CurrentEL_EL2;
+}
+
 /*
  * The layout of SPSR for an AArch32 state is different when observed from an
  * AArch64 SPSR_ELx or an AArch32 SPSR_*. This function generates the AArch32
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 6e5783655b03..42397eae594e 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -130,11 +130,14 @@ u64 vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg)
 			goto memory_read;
 
 		/*
-		 * ELR_EL2 is special cased for now.
+		 * ELR_EL2 and SPSR_EL2 are special cased for now.
 		 */
 		switch (reg) {
 		case ELR_EL2:
 			return read_sysreg_el1(SYS_ELR);
+		case SPSR_EL2:
+			val = read_sysreg_el1(SYS_SPSR);
+			return __fixup_spsr_el2_read(&vcpu->arch.ctxt, val);
 		}
 
 		/*
@@ -191,6 +194,10 @@ void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, u64 val, int reg)
 		case ELR_EL2:
 			write_sysreg_el1(val, SYS_ELR);
 			return;
+		case SPSR_EL2:
+			val = __fixup_spsr_el2_write(&vcpu->arch.ctxt, val);
+			write_sysreg_el1(val, SYS_SPSR);
+			return;
 		}
 
 		/* No EL1 counterpart? We're done here. */
@@ -1876,6 +1883,18 @@ static bool access_spsr(struct kvm_vcpu *vcpu,
 	return true;
 }
 
+static bool access_spsr_el2(struct kvm_vcpu *vcpu,
+			    struct sys_reg_params *p,
+			    const struct sys_reg_desc *r)
+{
+	if (p->is_write)
+		vcpu_write_sys_reg(vcpu, p->regval, SPSR_EL2);
+	else
+		p->regval = vcpu_read_sys_reg(vcpu, SPSR_EL2);
+
+	return true;
+}
+
 /*
  * Architected system registers.
  * Important: Must be sorted ascending by Op0, Op1, CRn, CRm, Op2
@@ -2324,7 +2343,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 	EL2_REG(VTCR_EL2, access_rw, reset_val, 0),
 
 	{ SYS_DESC(SYS_DACR32_EL2), NULL, reset_unknown, DACR32_EL2 },
-	EL2_REG(SPSR_EL2, access_rw, reset_val, 0),
+	EL2_REG(SPSR_EL2, access_spsr_el2, reset_val, 0),
 	EL2_REG(ELR_EL2, access_rw, reset_val, 0),
 	{ SYS_DESC(SYS_SP_EL1), access_sp_el1},
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [PATCH v10 17/59] KVM: arm64: nv: Handle HCR_EL2.E2H specially
  2023-05-15 17:30 [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support Marc Zyngier
                   ` (15 preceding siblings ...)
  2023-05-15 17:30 ` [PATCH v10 16/59] KVM: arm64: nv: Handle SPSR_EL2 specially Marc Zyngier
@ 2023-05-15 17:30 ` Marc Zyngier
  2023-05-15 17:30 ` [PATCH v10 18/59] KVM: arm64: nv: Save/Restore vEL2 sysregs Marc Zyngier
                   ` (44 subsequent siblings)
  61 siblings, 0 replies; 95+ messages in thread
From: Marc Zyngier @ 2023-05-15 17:30 UTC (permalink / raw)
  To: kvmarm, kvm, linux-arm-kernel
  Cc: Alexandru Elisei, Andre Przywara, Chase Conklin,
	Christoffer Dall, Ganapatrao Kulkarni, Darren Hart, Jintack Lim,
	Russell King, Miguel Luis, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu

HCR_EL2.E2H is nasty, as a flip of this bit completely changes the way
we deal with a lot of the state. So when the guest flips this bit
(sysregs are live), do the put/load dance so that we have a consistent
state.

Yes, this is slow. Don't do it.
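
The dance itself is short (condensed from the diff below; the
surrounding checks are elided):

	preempt_disable();
	kvm_arch_vcpu_put(vcpu);		/* flush state with the old E2H view */
	__vcpu_sys_reg(vcpu, HCR_EL2) = val;	/* flip E2H */
	kvm_arch_vcpu_load(vcpu, smp_processor_id());
	preempt_enable();			/* reloaded with the new view */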

Suggested-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/sys_regs.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 42397eae594e..a881d5cd1671 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -180,9 +180,24 @@ void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, u64 val, int reg)
 		goto memory_write;
 
 	if (unlikely(get_el2_to_el1_mapping(reg, &el1r, &xlate))) {
+		bool need_put_load;
+
 		if (!is_hyp_ctxt(vcpu))
 			goto memory_write;
 
+		/*
+		 * HCR_EL2.E2H is nasty: it changes the way we interpret a
+		 * lot of the EL2 state, so treat it as a full state
+		 * transition.
+		 */
+		need_put_load = ((reg == HCR_EL2) &&
+				 vcpu_el2_e2h_is_set(vcpu) != !!(val & HCR_E2H));
+
+		if (need_put_load) {
+			preempt_disable();
+			kvm_arch_vcpu_put(vcpu);
+		}
+
 		/*
 		 * Always store a copy of the write to memory to avoid having
 		 * to reverse-translate virtual EL2 system registers for a
@@ -190,6 +205,11 @@ void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, u64 val, int reg)
 		 */
 		__vcpu_sys_reg(vcpu, reg) = val;
 
+		if (need_put_load) {
+			kvm_arch_vcpu_load(vcpu, smp_processor_id());
+			preempt_enable();
+		}
+
 		switch (reg) {
 		case ELR_EL2:
 			write_sysreg_el1(val, SYS_ELR);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [PATCH v10 18/59] KVM: arm64: nv: Save/Restore vEL2 sysregs
  2023-05-15 17:30 [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support Marc Zyngier
                   ` (16 preceding siblings ...)
  2023-05-15 17:30 ` [PATCH v10 17/59] KVM: arm64: nv: Handle HCR_EL2.E2H specially Marc Zyngier
@ 2023-05-15 17:30 ` Marc Zyngier
  2023-05-15 17:30 ` [PATCH v10 19/59] KVM: arm64: nv: Trap EL1 VM register accesses in virtual EL2 Marc Zyngier
                   ` (43 subsequent siblings)
  61 siblings, 0 replies; 95+ messages in thread
From: Marc Zyngier @ 2023-05-15 17:30 UTC (permalink / raw)
  To: kvmarm, kvm, linux-arm-kernel
  Cc: Alexandru Elisei, Andre Przywara, Chase Conklin,
	Christoffer Dall, Ganapatrao Kulkarni, Darren Hart, Jintack Lim,
	Russell King, Miguel Luis, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu

Whenever we need to restore the guest's system registers to the CPU, we
now need to take care of the EL2 system registers as well. Most of them
are accessed via traps only, but some have an immediate effect, and a
guest running in VHE mode would also expect them to be accessible via
their EL1 encoding, which we do not trap.

For vEL2 we write the virtual EL2 registers that share a format with
their EL1 counterpart directly into that counterpart, and translate the
few registers that have a different format so that they have the same
effect on execution when running a non-VHE guest hypervisor.
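
In other words, the restore path does something like the following for
each affected register (SCTLR_EL2 shown here, condensed from the diff):

	if (__vcpu_el2_e2h_is_set(ctxt)) {
		/* VHE guest hypervisor: the formats are identical */
		write_sysreg_el1(ctxt_sys_reg(ctxt, SCTLR_EL2), SYS_SCTLR);
	} else {
		/* non-VHE: rewrite the value in EL1 format first */
		u64 val;

		val = translate_sctlr_el2_to_sctlr_el1(ctxt_sys_reg(ctxt, SCTLR_EL2));
		write_sysreg_el1(val, SYS_SCTLR);
	}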

Based on an initial patch from Andre Przywara, rewritten many times
since.

Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h |   5 +-
 arch/arm64/kvm/hyp/nvhe/sysreg-sr.c        |   2 +-
 arch/arm64/kvm/hyp/vhe/sysreg-sr.c         | 124 ++++++++++++++++++++-
 3 files changed, 126 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h b/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h
index 699ea1f8d409..d67c620f2af7 100644
--- a/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h
+++ b/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h
@@ -91,9 +91,10 @@ static inline void __sysreg_restore_user_state(struct kvm_cpu_context *ctxt)
 	write_sysreg(ctxt_sys_reg(ctxt, TPIDRRO_EL0),	tpidrro_el0);
 }
 
-static inline void __sysreg_restore_el1_state(struct kvm_cpu_context *ctxt)
+static inline void __sysreg_restore_el1_state(struct kvm_cpu_context *ctxt,
+					      u64 mpidr)
 {
-	write_sysreg(ctxt_sys_reg(ctxt, MPIDR_EL1),	vmpidr_el2);
+	write_sysreg(mpidr,				vmpidr_el2);
 
 	if (has_vhe() ||
 	    !cpus_have_final_cap(ARM64_WORKAROUND_SPECULATIVE_AT)) {
diff --git a/arch/arm64/kvm/hyp/nvhe/sysreg-sr.c b/arch/arm64/kvm/hyp/nvhe/sysreg-sr.c
index 29305022bc04..dba101565de3 100644
--- a/arch/arm64/kvm/hyp/nvhe/sysreg-sr.c
+++ b/arch/arm64/kvm/hyp/nvhe/sysreg-sr.c
@@ -28,7 +28,7 @@ void __sysreg_save_state_nvhe(struct kvm_cpu_context *ctxt)
 
 void __sysreg_restore_state_nvhe(struct kvm_cpu_context *ctxt)
 {
-	__sysreg_restore_el1_state(ctxt);
+	__sysreg_restore_el1_state(ctxt, ctxt_sys_reg(ctxt, MPIDR_EL1));
 	__sysreg_restore_common_state(ctxt);
 	__sysreg_restore_user_state(ctxt);
 	__sysreg_restore_el2_return_state(ctxt);
diff --git a/arch/arm64/kvm/hyp/vhe/sysreg-sr.c b/arch/arm64/kvm/hyp/vhe/sysreg-sr.c
index b35a178e7e0d..cae3e35a9034 100644
--- a/arch/arm64/kvm/hyp/vhe/sysreg-sr.c
+++ b/arch/arm64/kvm/hyp/vhe/sysreg-sr.c
@@ -15,6 +15,95 @@
 #include <asm/kvm_hyp.h>
 #include <asm/kvm_nested.h>
 
+static void __sysreg_save_vel2_state(struct kvm_cpu_context *ctxt)
+{
+	/* These registers are common with EL1 */
+	ctxt_sys_reg(ctxt, PAR_EL1)	= read_sysreg(par_el1);
+	ctxt_sys_reg(ctxt, TPIDR_EL1)	= read_sysreg(tpidr_el1);
+
+	ctxt_sys_reg(ctxt, ESR_EL2)	= read_sysreg_el1(SYS_ESR);
+	ctxt_sys_reg(ctxt, AFSR0_EL2)	= read_sysreg_el1(SYS_AFSR0);
+	ctxt_sys_reg(ctxt, AFSR1_EL2)	= read_sysreg_el1(SYS_AFSR1);
+	ctxt_sys_reg(ctxt, FAR_EL2)	= read_sysreg_el1(SYS_FAR);
+	ctxt_sys_reg(ctxt, MAIR_EL2)	= read_sysreg_el1(SYS_MAIR);
+	ctxt_sys_reg(ctxt, VBAR_EL2)	= read_sysreg_el1(SYS_VBAR);
+	ctxt_sys_reg(ctxt, CONTEXTIDR_EL2) = read_sysreg_el1(SYS_CONTEXTIDR);
+	ctxt_sys_reg(ctxt, AMAIR_EL2)	= read_sysreg_el1(SYS_AMAIR);
+
+	/*
+	 * In VHE mode those registers are compatible between EL1 and EL2,
+	 * and the guest uses the _EL1 versions on the CPU naturally.
+	 * So we save them into their _EL2 versions here.
+	 * For nVHE mode we trap accesses to those registers, so our
+	 * _EL2 copy in sys_regs[] is always up-to-date and we don't need
+	 * to save anything here.
+	 */
+	if (__vcpu_el2_e2h_is_set(ctxt)) {
+		ctxt_sys_reg(ctxt, SCTLR_EL2)	= read_sysreg_el1(SYS_SCTLR);
+		ctxt_sys_reg(ctxt, CPTR_EL2)	= read_sysreg_el1(SYS_CPACR);
+		ctxt_sys_reg(ctxt, TTBR0_EL2)	= read_sysreg_el1(SYS_TTBR0);
+		ctxt_sys_reg(ctxt, TTBR1_EL2)	= read_sysreg_el1(SYS_TTBR1);
+		ctxt_sys_reg(ctxt, TCR_EL2)	= read_sysreg_el1(SYS_TCR);
+		ctxt_sys_reg(ctxt, CNTHCTL_EL2)	= read_sysreg_el1(SYS_CNTKCTL);
+	}
+
+	ctxt_sys_reg(ctxt, SP_EL2)	= read_sysreg(sp_el1);
+	ctxt_sys_reg(ctxt, ELR_EL2)	= read_sysreg_el1(SYS_ELR);
+	ctxt_sys_reg(ctxt, SPSR_EL2)	= __fixup_spsr_el2_read(ctxt, read_sysreg_el1(SYS_SPSR));
+}
+
+static void __sysreg_restore_vel2_state(struct kvm_cpu_context *ctxt)
+{
+	u64 val;
+
+	/* These registers are common with EL1 */
+	write_sysreg(ctxt_sys_reg(ctxt, PAR_EL1),	par_el1);
+	write_sysreg(ctxt_sys_reg(ctxt, TPIDR_EL1),	tpidr_el1);
+
+	write_sysreg(read_cpuid_id(),			vpidr_el2);
+	write_sysreg(ctxt_sys_reg(ctxt, MPIDR_EL1),	vmpidr_el2);
+	write_sysreg_el1(ctxt_sys_reg(ctxt, MAIR_EL2),	SYS_MAIR);
+	write_sysreg_el1(ctxt_sys_reg(ctxt, VBAR_EL2),	SYS_VBAR);
+	write_sysreg_el1(ctxt_sys_reg(ctxt, CONTEXTIDR_EL2),SYS_CONTEXTIDR);
+	write_sysreg_el1(ctxt_sys_reg(ctxt, AMAIR_EL2),	SYS_AMAIR);
+
+	if (__vcpu_el2_e2h_is_set(ctxt)) {
+		/*
+		 * In VHE mode those registers are compatible between
+		 * EL1 and EL2.
+		 */
+		write_sysreg_el1(ctxt_sys_reg(ctxt, SCTLR_EL2),	SYS_SCTLR);
+		write_sysreg_el1(ctxt_sys_reg(ctxt, CPTR_EL2),	SYS_CPACR);
+		write_sysreg_el1(ctxt_sys_reg(ctxt, TTBR0_EL2),	SYS_TTBR0);
+		write_sysreg_el1(ctxt_sys_reg(ctxt, TTBR1_EL2),	SYS_TTBR1);
+		write_sysreg_el1(ctxt_sys_reg(ctxt, TCR_EL2),	SYS_TCR);
+		write_sysreg_el1(ctxt_sys_reg(ctxt, CNTHCTL_EL2), SYS_CNTKCTL);
+	} else {
+		/*
+		 * CNTHCTL_EL2 only affects EL1 when running nVHE, so
+		 * no need to restore it.
+		 */
+		val = translate_sctlr_el2_to_sctlr_el1(ctxt_sys_reg(ctxt, SCTLR_EL2));
+		write_sysreg_el1(val, SYS_SCTLR);
+		val = translate_cptr_el2_to_cpacr_el1(ctxt_sys_reg(ctxt, CPTR_EL2));
+		write_sysreg_el1(val, SYS_CPACR);
+		val = translate_ttbr0_el2_to_ttbr0_el1(ctxt_sys_reg(ctxt, TTBR0_EL2));
+		write_sysreg_el1(val, SYS_TTBR0);
+		val = translate_tcr_el2_to_tcr_el1(ctxt_sys_reg(ctxt, TCR_EL2));
+		write_sysreg_el1(val, SYS_TCR);
+	}
+
+	write_sysreg_el1(ctxt_sys_reg(ctxt, ESR_EL2),	SYS_ESR);
+	write_sysreg_el1(ctxt_sys_reg(ctxt, AFSR0_EL2),	SYS_AFSR0);
+	write_sysreg_el1(ctxt_sys_reg(ctxt, AFSR1_EL2),	SYS_AFSR1);
+	write_sysreg_el1(ctxt_sys_reg(ctxt, FAR_EL2),	SYS_FAR);
+	write_sysreg(ctxt_sys_reg(ctxt, SP_EL2),	sp_el1);
+	write_sysreg_el1(ctxt_sys_reg(ctxt, ELR_EL2),	SYS_ELR);
+
+	val = __fixup_spsr_el2_write(ctxt, ctxt_sys_reg(ctxt, SPSR_EL2));
+	write_sysreg_el1(val,	SYS_SPSR);
+}
+
 /*
  * VHE: Host and guest must save mdscr_el1 and sp_el0 (and the PC and
  * pstate, which are handled as part of the el2 return state) on every
@@ -66,6 +155,7 @@ void kvm_vcpu_load_sysregs_vhe(struct kvm_vcpu *vcpu)
 {
 	struct kvm_cpu_context *guest_ctxt = &vcpu->arch.ctxt;
 	struct kvm_cpu_context *host_ctxt;
+	u64 mpidr;
 
 	host_ctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
 	__sysreg_save_user_state(host_ctxt);
@@ -89,7 +179,29 @@ void kvm_vcpu_load_sysregs_vhe(struct kvm_vcpu *vcpu)
 	 */
 	__sysreg32_restore_state(vcpu);
 	__sysreg_restore_user_state(guest_ctxt);
-	__sysreg_restore_el1_state(guest_ctxt);
+
+	if (unlikely(__is_hyp_ctxt(guest_ctxt))) {
+		__sysreg_restore_vel2_state(guest_ctxt);
+	} else {
+		if (vcpu_has_nv(vcpu)) {
+			/*
+			 * Only set VPIDR_EL2 for nested VMs, as this is the
+			 * only time it changes. We'll restore the MIDR_EL1
+			 * view on put.
+			 */
+			write_sysreg(ctxt_sys_reg(guest_ctxt, VPIDR_EL2), vpidr_el2);
+
+			/*
+			 * As we're restoring a nested guest, set the value
+			 * provided by the guest hypervisor.
+			 */
+			mpidr = ctxt_sys_reg(guest_ctxt, VMPIDR_EL2);
+		} else {
+			mpidr = ctxt_sys_reg(guest_ctxt, MPIDR_EL1);
+		}
+
+		__sysreg_restore_el1_state(guest_ctxt, mpidr);
+	}
 
 	vcpu_set_flag(vcpu, SYSREGS_ON_CPU);
 
@@ -115,12 +227,20 @@ void kvm_vcpu_put_sysregs_vhe(struct kvm_vcpu *vcpu)
 	host_ctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
 	deactivate_traps_vhe_put(vcpu);
 
-	__sysreg_save_el1_state(guest_ctxt);
+	if (unlikely(__is_hyp_ctxt(guest_ctxt)))
+		__sysreg_save_vel2_state(guest_ctxt);
+	else
+		__sysreg_save_el1_state(guest_ctxt);
+
 	__sysreg_save_user_state(guest_ctxt);
 	__sysreg32_save_state(vcpu);
 
 	/* Restore host user state */
 	__sysreg_restore_user_state(host_ctxt);
 
+	/* If leaving a nesting guest, restore MIDR_EL1 default view */
+	if (vcpu_has_nv(vcpu))
+		write_sysreg(read_cpuid_id(),	vpidr_el2);
+
 	vcpu_clear_flag(vcpu, SYSREGS_ON_CPU);
 }
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [PATCH v10 19/59] KVM: arm64: nv: Trap EL1 VM register accesses in virtual EL2
  2023-05-15 17:30 [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support Marc Zyngier
                   ` (17 preceding siblings ...)
  2023-05-15 17:30 ` [PATCH v10 18/59] KVM: arm64: nv: Save/Restore vEL2 sysregs Marc Zyngier
@ 2023-05-15 17:30 ` Marc Zyngier
  2023-05-15 17:30 ` [PATCH v10 20/59] KVM: arm64: nv: Trap CPACR_EL1 access " Marc Zyngier
                   ` (42 subsequent siblings)
  61 siblings, 0 replies; 95+ messages in thread
From: Marc Zyngier @ 2023-05-15 17:30 UTC (permalink / raw)
  To: kvmarm, kvm, linux-arm-kernel
  Cc: Alexandru Elisei, Andre Przywara, Chase Conklin,
	Christoffer Dall, Ganapatrao Kulkarni, Darren Hart, Jintack Lim,
	Russell King, Miguel Luis, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu

From: Christoffer Dall <christoffer.dall@linaro.org>

When running in virtual EL2 mode, we actually run the hardware in EL1
and therefore have to use the EL1 registers to ensure correct operation.

By setting the HCR.TVM and HCR.TRVM bits we ensure that the virtual EL2 mode
doesn't shoot itself in the foot when setting up what it believes to be
a different mode's system register state (for example when preparing to
switch to a VM).

We can leverage the existing sysregs infrastructure to support trapped
accesses to these registers.
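
Concretely, the extra trap configuration amounts to the following on
guest entry (condensed from the diff below):

	/* Trap VM sysreg accesses if the vEL2 guest is not using VHE */
	if (vcpu_is_el2(vcpu) && !vcpu_el2_e2h_is_set(vcpu))
		hcr |= HCR_TVM | HCR_TRVM;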

Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/hyp/include/hyp/switch.h |  4 +---
 arch/arm64/kvm/hyp/nvhe/switch.c        |  2 +-
 arch/arm64/kvm/hyp/vhe/switch.c         |  7 ++++++-
 arch/arm64/kvm/sys_regs.c               | 19 ++++++++++++++++---
 4 files changed, 24 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h b/arch/arm64/kvm/hyp/include/hyp/switch.h
index e78a08a72a3c..c6d62357e736 100644
--- a/arch/arm64/kvm/hyp/include/hyp/switch.h
+++ b/arch/arm64/kvm/hyp/include/hyp/switch.h
@@ -119,10 +119,8 @@ static inline void __deactivate_traps_common(struct kvm_vcpu *vcpu)
 	}
 }
 
-static inline void ___activate_traps(struct kvm_vcpu *vcpu)
+static inline void ___activate_traps(struct kvm_vcpu *vcpu, u64 hcr)
 {
-	u64 hcr = vcpu->arch.hcr_el2;
-
 	if (cpus_have_final_cap(ARM64_WORKAROUND_CAVIUM_TX2_219_TVM))
 		hcr |= HCR_TVM;
 
diff --git a/arch/arm64/kvm/hyp/nvhe/switch.c b/arch/arm64/kvm/hyp/nvhe/switch.c
index 71fa16a0dc77..7730860552da 100644
--- a/arch/arm64/kvm/hyp/nvhe/switch.c
+++ b/arch/arm64/kvm/hyp/nvhe/switch.c
@@ -40,7 +40,7 @@ static void __activate_traps(struct kvm_vcpu *vcpu)
 {
 	u64 val;
 
-	___activate_traps(vcpu);
+	___activate_traps(vcpu, vcpu->arch.hcr_el2);
 	__activate_traps_common(vcpu);
 
 	val = vcpu->arch.cptr_el2;
diff --git a/arch/arm64/kvm/hyp/vhe/switch.c b/arch/arm64/kvm/hyp/vhe/switch.c
index 3d868e84c7a0..14880c84678f 100644
--- a/arch/arm64/kvm/hyp/vhe/switch.c
+++ b/arch/arm64/kvm/hyp/vhe/switch.c
@@ -35,9 +35,14 @@ DEFINE_PER_CPU(unsigned long, kvm_hyp_vector);
 
 static void __activate_traps(struct kvm_vcpu *vcpu)
 {
+	u64 hcr = vcpu->arch.hcr_el2;
 	u64 val;
 
-	___activate_traps(vcpu);
+	/* Trap VM sysreg accesses if an EL2 guest is not using VHE. */
+	if (vcpu_is_el2(vcpu) && !vcpu_el2_e2h_is_set(vcpu))
+		hcr |= HCR_TVM | HCR_TRVM;
+
+	___activate_traps(vcpu, hcr);
 
 	val = read_sysreg(cpacr_el1);
 	val |= CPACR_ELx_TTA;
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index a881d5cd1671..28b882b79d3e 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -391,8 +391,15 @@ static void get_access_mask(const struct sys_reg_desc *r, u64 *mask, u64 *shift)
 
 /*
  * Generic accessor for VM registers. Only called as long as HCR_TVM
- * is set. If the guest enables the MMU, we stop trapping the VM
- * sys_regs and leave it in complete control of the caches.
+ * is set.
+ *
+ * This is set in two cases: either (1) we're running at vEL2, or (2)
+ * we're running at EL1 and the guest has its MMU off.
+ *
+ * (1) TVM/TRVM is set, as we need to virtualise some of the VM
+ * registers for the guest hypervisor
+ * (2) Once the guest enables the MMU, we stop trapping the VM sys_regs
+ * and leave it in complete control of the caches.
  */
 static bool access_vm_reg(struct kvm_vcpu *vcpu,
 			  struct sys_reg_params *p,
@@ -401,7 +408,13 @@ static bool access_vm_reg(struct kvm_vcpu *vcpu,
 	bool was_enabled = vcpu_has_cache_enabled(vcpu);
 	u64 val, mask, shift;
 
-	BUG_ON(!p->is_write);
+	/* We don't expect TRVM on the host */
+	BUG_ON(!vcpu_is_el2(vcpu) && !p->is_write);
+
+	if (!p->is_write) {
+		p->regval = vcpu_read_sys_reg(vcpu, r->reg);
+		return true;
+	}
 
 	get_access_mask(r, &mask, &shift);
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [PATCH v10 20/59] KVM: arm64: nv: Trap CPACR_EL1 access in virtual EL2
  2023-05-15 17:30 [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support Marc Zyngier
                   ` (18 preceding siblings ...)
  2023-05-15 17:30 ` [PATCH v10 19/59] KVM: arm64: nv: Trap EL1 VM register accesses in virtual EL2 Marc Zyngier
@ 2023-05-15 17:30 ` Marc Zyngier
  2023-05-15 17:30 ` [PATCH v10 21/59] KVM: arm64: nv: Respect virtual HCR_EL2.TWX setting Marc Zyngier
                   ` (41 subsequent siblings)
  61 siblings, 0 replies; 95+ messages in thread
From: Marc Zyngier @ 2023-05-15 17:30 UTC (permalink / raw)
  To: kvmarm, kvm, linux-arm-kernel
  Cc: Alexandru Elisei, Andre Przywara, Chase Conklin,
	Christoffer Dall, Ganapatrao Kulkarni, Darren Hart, Jintack Lim,
	Russell King, Miguel Luis, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu

From: Jintack Lim <jintack.lim@linaro.org>

For the same reason we trap virtual memory register accesses in virtual
EL2, we trap CPACR_EL1 accesses too; we allow the virtual EL2 mode to
access EL1 system register state instead of the virtual EL2 one.
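
Purely as an illustration (not from the patch), the gating of the
extra trap bit boils down to the sketch below, with the booleans
standing in for vcpu_is_el2()/vcpu_el2_e2h_is_set() and the TCPAC bit
position taken from the Arm ARM:

#include <stdbool.h>
#include <stdint.h>

#define CPTR_EL2_TCPAC	(UINT64_C(1) << 31)	/* trap CPACR_EL1 accesses */

/* Only trap CPACR_EL1 when emulating a non-VHE guest hypervisor */
static uint64_t cptr_for_guest(uint64_t cptr, bool is_vel2, bool e2h)
{
	if (is_vel2 && !e2h)
		cptr |= CPTR_EL2_TCPAC;
	return cptr;
}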

Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/hyp/vhe/switch.c | 3 +++
 arch/arm64/kvm/sys_regs.c       | 2 +-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/hyp/vhe/switch.c b/arch/arm64/kvm/hyp/vhe/switch.c
index 14880c84678f..8d1a6b007c19 100644
--- a/arch/arm64/kvm/hyp/vhe/switch.c
+++ b/arch/arm64/kvm/hyp/vhe/switch.c
@@ -68,6 +68,9 @@ static void __activate_traps(struct kvm_vcpu *vcpu)
 		__activate_traps_fpsimd32(vcpu);
 	}
 
+	if (vcpu_is_el2(vcpu) && !vcpu_el2_e2h_is_set(vcpu))
+		val |= CPTR_EL2_TCPAC;
+
 	write_sysreg(val, cpacr_el1);
 
 	write_sysreg(__this_cpu_read(kvm_hyp_vector), vbar_el1);
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 28b882b79d3e..52d798dbc202 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -2065,7 +2065,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 
 	{ SYS_DESC(SYS_SCTLR_EL1), access_vm_reg, reset_val, SCTLR_EL1, 0x00C50078 },
 	{ SYS_DESC(SYS_ACTLR_EL1), access_actlr, reset_actlr, ACTLR_EL1 },
-	{ SYS_DESC(SYS_CPACR_EL1), NULL, reset_val, CPACR_EL1, 0 },
+	{ SYS_DESC(SYS_CPACR_EL1), access_rw, reset_val, CPACR_EL1, 0 },
 
 	MTE_REG(RGSR_EL1),
 	MTE_REG(GCR_EL1),
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [PATCH v10 21/59] KVM: arm64: nv: Respect virtual HCR_EL2.TWX setting
  2023-05-15 17:30 [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support Marc Zyngier
                   ` (19 preceding siblings ...)
  2023-05-15 17:30 ` [PATCH v10 20/59] KVM: arm64: nv: Trap CPACR_EL1 access " Marc Zyngier
@ 2023-05-15 17:30 ` Marc Zyngier
  2023-05-15 17:30 ` [PATCH v10 22/59] KVM: arm64: nv: Respect virtual CPTR_EL2.{TFP,FPEN} settings Marc Zyngier
                   ` (40 subsequent siblings)
  61 siblings, 0 replies; 95+ messages in thread
From: Marc Zyngier @ 2023-05-15 17:30 UTC (permalink / raw)
  To: kvmarm, kvm, linux-arm-kernel
  Cc: Alexandru Elisei, Andre Przywara, Chase Conklin,
	Christoffer Dall, Ganapatrao Kulkarni, Darren Hart, Jintack Lim,
	Russell King, Miguel Luis, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu

From: Jintack Lim <jintack.lim@linaro.org>

Forward exceptions due to WFI or WFE instructions to the virtual EL2 if
they are not coming from the virtual EL2 and virtual HCR_EL2.TWX is set.
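
For illustration only, the forwarding decision can be modelled as the
standalone snippet below; the names are mine and the bit positions
follow the Arm ARM:

#include <stdbool.h>
#include <stdint.h>

#define HCR_TWI			(UINT64_C(1) << 13)	/* trap WFI */
#define HCR_TWE			(UINT64_C(1) << 14)	/* trap WFE */
#define ESR_WFx_ISS_WFE		(UINT64_C(1) << 0)

/* Forward iff the matching TWx bit is set and we didn't come from vEL2 */
static bool should_forward_wfx(uint64_t esr, uint64_t vhcr, bool from_vel2)
{
	bool is_wfe = esr & ESR_WFx_ISS_WFE;

	if (from_vel2)
		return false;

	return is_wfe ? !!(vhcr & HCR_TWE) : !!(vhcr & HCR_TWI);
}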

Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_emulate.h | 14 ++++++++++++++
 arch/arm64/kvm/handle_exit.c         |  6 +++++-
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index fd0979ba4328..1ff0a224c32b 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -17,6 +17,7 @@
 #include <asm/esr.h>
 #include <asm/kvm_arm.h>
 #include <asm/kvm_hyp.h>
+#include <asm/kvm_nested.h>
 #include <asm/ptrace.h>
 #include <asm/cputype.h>
 #include <asm/virt.h>
@@ -339,6 +340,19 @@ static __always_inline u64 kvm_vcpu_get_esr(const struct kvm_vcpu *vcpu)
 	return vcpu->arch.fault.esr_el2;
 }
 
+static inline bool guest_hyp_wfx_traps_enabled(const struct kvm_vcpu *vcpu)
+{
+	u64 esr = kvm_vcpu_get_esr(vcpu);
+	bool is_wfe = !!(esr & ESR_ELx_WFx_ISS_WFE);
+	u64 hcr_el2 = __vcpu_sys_reg(vcpu, HCR_EL2);
+
+	if (!vcpu_has_nv(vcpu) || vcpu_is_el2(vcpu))
+		return false;
+
+	return ((is_wfe && (hcr_el2 & HCR_TWE)) ||
+		(!is_wfe && (hcr_el2 & HCR_TWI)));
+}
+
 static __always_inline int kvm_vcpu_get_condition(const struct kvm_vcpu *vcpu)
 {
 	u64 esr = kvm_vcpu_get_esr(vcpu);
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index 6dcd6604b6bc..5811a791cf01 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -114,8 +114,12 @@ static int handle_no_fpsimd(struct kvm_vcpu *vcpu)
 static int kvm_handle_wfx(struct kvm_vcpu *vcpu)
 {
 	u64 esr = kvm_vcpu_get_esr(vcpu);
+	bool is_wfe = !!(esr & ESR_ELx_WFx_ISS_WFE);
 
-	if (esr & ESR_ELx_WFx_ISS_WFE) {
+	if (guest_hyp_wfx_traps_enabled(vcpu))
+		return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_esr(vcpu));
+
+	if (is_wfe) {
 		trace_kvm_wfx_arm64(*vcpu_pc(vcpu), true);
 		vcpu->stat.wfe_exit_stat++;
 	} else {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [PATCH v10 22/59] KVM: arm64: nv: Respect virtual CPTR_EL2.{TFP,FPEN} settings
  2023-05-15 17:30 [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support Marc Zyngier
                   ` (20 preceding siblings ...)
  2023-05-15 17:30 ` [PATCH v10 21/59] KVM: arm64: nv: Respect virtual HCR_EL2.TWX setting Marc Zyngier
@ 2023-05-15 17:30 ` Marc Zyngier
  2023-05-15 17:30 ` [PATCH v10 23/59] KVM: arm64: nv: Respect virtual HCR_EL2.{NV,TSC} settings Marc Zyngier
                   ` (39 subsequent siblings)
  61 siblings, 0 replies; 95+ messages in thread
From: Marc Zyngier @ 2023-05-15 17:30 UTC (permalink / raw)
  To: kvmarm, kvm, linux-arm-kernel
  Cc: Alexandru Elisei, Andre Przywara, Chase Conklin,
	Christoffer Dall, Ganapatrao Kulkarni, Darren Hart, Jintack Lim,
	Russell King, Miguel Luis, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu

From: Jintack Lim <jintack.lim@linaro.org>

Forward traps due to FP/ASIMD register accesses to the virtual EL2
if virtual CPTR_EL2.TFP is set (with HCR_EL2.E2H == 0) or
CPTR_EL2.FPEN is configured to do so (with HCR_EL2.E2H == 1).
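
As an aside, here is a rough standalone model (mine, not the patch's)
of the resulting decision, mirroring guest_hyp_fpsimd_traps_enabled();
FPEN is the 2-bit field at CPACR[21:20] and TFP is CPTR_EL2 bit 10,
per the Arm ARM:

#include <stdbool.h>
#include <stdint.h>

#define CPTR_EL2_TFP		(UINT64_C(1) << 10)
#define CPACR_FPEN_SHIFT	20
#define CPACR_FPEN_MASK		(UINT64_C(3) << CPACR_FPEN_SHIFT)

static bool vcptr_traps_fpsimd(uint64_t vcptr, bool e2h, bool tge,
			       bool at_vel2)
{
	if (!e2h)
		return vcptr & CPTR_EL2_TFP;

	switch ((vcptr & CPACR_FPEN_MASK) >> CPACR_FPEN_SHIFT) {
	case 0: case 2:			/* trap EL1 and EL0 accesses */
		return true;
	case 1:				/* traps EL0 accesses only */
		return tge && !at_vel2;
	default:			/* 0b11: no trapping */
		return false;
	}
}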

Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
[maz: account for HCR_EL2.E2H when testing for TFP/FPEN, with
 all the hard work actually being done by Chase Conklin]
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_emulate.h    | 25 +++++++++++++++++++++++++
 arch/arm64/kvm/handle_exit.c            | 16 ++++++++++++----
 arch/arm64/kvm/hyp/include/hyp/switch.h |  4 ++++
 3 files changed, 41 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 1ff0a224c32b..c64980d33c8b 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -11,6 +11,7 @@
 #ifndef __ARM64_KVM_EMULATE_H__
 #define __ARM64_KVM_EMULATE_H__
 
+#include <linux/bitfield.h>
 #include <linux/kvm_host.h>
 
 #include <asm/debug-monitors.h>
@@ -335,6 +336,30 @@ static inline bool vcpu_mode_priv(const struct kvm_vcpu *vcpu)
 	return mode != PSR_MODE_EL0t;
 }
 
+static inline bool guest_hyp_fpsimd_traps_enabled(const struct kvm_vcpu *vcpu)
+{
+	u64 val;
+
+	if (!vcpu_has_nv(vcpu))
+		return false;
+
+	val = vcpu_read_sys_reg(vcpu, CPTR_EL2);
+
+	if (!vcpu_el2_e2h_is_set(vcpu))
+		return (val & CPTR_EL2_TFP);
+
+	switch (FIELD_GET(CPACR_ELx_FPEN, val)) {
+	case 0b00:
+	case 0b10:
+		return true;
+	case 0b01:
+		return vcpu_el2_tge_is_set(vcpu) && !vcpu_is_el2(vcpu);
+	case 0b11:
+	default:		/* GCC is dumb */
+		return false;
+	}
+}
+
 static __always_inline u64 kvm_vcpu_get_esr(const struct kvm_vcpu *vcpu)
 {
 	return vcpu->arch.fault.esr_el2;
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index 5811a791cf01..c4dc144726ee 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -87,11 +87,19 @@ static int handle_smc(struct kvm_vcpu *vcpu)
 }
 
 /*
- * Guest access to FP/ASIMD registers are routed to this handler only
- * when the system doesn't support FP/ASIMD.
+ * This handles the cases where the system does not support FP/ASIMD or when
+ * we are running nested virtualization and the guest hypervisor is trapping
+ * FP/ASIMD accesses by its own guest.
+ *
+ * All other handling of guest vs. host FP/ASIMD register state is handled in
+ * fixup_guest_exit().
  */
-static int handle_no_fpsimd(struct kvm_vcpu *vcpu)
+static int kvm_handle_fpasimd(struct kvm_vcpu *vcpu)
 {
+	if (guest_hyp_fpsimd_traps_enabled(vcpu))
+		return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_esr(vcpu));
+
+	/* This is the case when the system doesn't support FP/ASIMD. */
 	kvm_inject_undefined(vcpu);
 	return 1;
 }
@@ -253,7 +261,7 @@ static exit_handle_fn arm_exit_handlers[] = {
 	[ESR_ELx_EC_BREAKPT_LOW]= kvm_handle_guest_debug,
 	[ESR_ELx_EC_BKPT32]	= kvm_handle_guest_debug,
 	[ESR_ELx_EC_BRK64]	= kvm_handle_guest_debug,
-	[ESR_ELx_EC_FP_ASIMD]	= handle_no_fpsimd,
+	[ESR_ELx_EC_FP_ASIMD]	= kvm_handle_fpasimd,
 	[ESR_ELx_EC_PAC]	= kvm_handle_ptrauth,
 };
 
diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h b/arch/arm64/kvm/hyp/include/hyp/switch.h
index c6d62357e736..0a344184f26c 100644
--- a/arch/arm64/kvm/hyp/include/hyp/switch.h
+++ b/arch/arm64/kvm/hyp/include/hyp/switch.h
@@ -172,6 +172,10 @@ static bool kvm_hyp_handle_fpsimd(struct kvm_vcpu *vcpu, u64 *exit_code)
 	if (!system_supports_fpsimd())
 		return false;
 
+	/* Forward traps to the guest hypervisor as required */
+	if (guest_hyp_fpsimd_traps_enabled(vcpu))
+		return false;
+
 	sve_guest = vcpu_has_sve(vcpu);
 	esr_ec = kvm_vcpu_trap_get_class(vcpu);
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [PATCH v10 23/59] KVM: arm64: nv: Respect virtual HCR_EL2.{NV,TSC} settings
  2023-05-15 17:30 [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support Marc Zyngier
                   ` (21 preceding siblings ...)
  2023-05-15 17:30 ` [PATCH v10 22/59] KVM: arm64: nv: Respect virtual CPTR_EL2.{TFP,FPEN} settings Marc Zyngier
@ 2023-05-15 17:30 ` Marc Zyngier
  2023-05-15 17:30 ` [PATCH v10 24/59] KVM: arm64: nv: Configure HCR_EL2 for nested virtualization Marc Zyngier
                   ` (38 subsequent siblings)
  61 siblings, 0 replies; 95+ messages in thread
From: Marc Zyngier @ 2023-05-15 17:30 UTC (permalink / raw)
  To: kvmarm, kvm, linux-arm-kernel
  Cc: Alexandru Elisei, Andre Przywara, Chase Conklin,
	Christoffer Dall, Ganapatrao Kulkarni, Darren Hart, Jintack Lim,
	Russell King, Miguel Luis, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu

From: Jintack Lim <jintack.lim@linaro.org>

Forward traps due to the HCR_EL2.{NV,TSC} bits (covering ERET and SMC
instructions, respectively) to the virtual EL2 if they are not coming
from the virtual EL2 itself.
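
Illustrative sketch only (names are mine, bit positions per the Arm
ARM): both forwarding checks reduce to the same rule, a control bit
set in the virtual HCR_EL2 plus the requirement that the trap did not
come from virtual EL2:

#include <stdbool.h>
#include <stdint.h>

#define HCR_TSC	(UINT64_C(1) << 19)	/* trap SMC */
#define HCR_NV	(UINT64_C(1) << 42)	/* trap ERET (and more) */

/*
 * Reinject into virtual EL2 when the guest hypervisor asked for it
 * and the exception originated below virtual EL2.
 */
static bool should_forward(uint64_t vhcr, uint64_t bit, bool from_vel2)
{
	return !from_vel2 && (vhcr & bit);
}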

Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
[Moved code to emulate-nested.c]
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
[maz: drastically simplified]
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_nested.h |  1 +
 arch/arm64/kvm/emulate-nested.c     | 27 +++++++++++++++++++++++++++
 arch/arm64/kvm/handle_exit.c        |  7 +++++++
 3 files changed, 35 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
index 9099df57037d..948ba0337558 100644
--- a/arch/arm64/include/asm/kvm_nested.h
+++ b/arch/arm64/include/asm/kvm_nested.h
@@ -59,6 +59,7 @@ static inline u64 translate_ttbr0_el2_to_ttbr0_el1(u64 ttbr0)
 	return ttbr0 & ~GENMASK_ULL(63, 48);
 }
 
+extern bool forward_smc_trap(struct kvm_vcpu *vcpu);
 extern bool __check_nv_sr_forward(struct kvm_vcpu *vcpu);
 
 struct sys_reg_params;
diff --git a/arch/arm64/kvm/emulate-nested.c b/arch/arm64/kvm/emulate-nested.c
index 101292c43602..08b152f8ba6a 100644
--- a/arch/arm64/kvm/emulate-nested.c
+++ b/arch/arm64/kvm/emulate-nested.c
@@ -900,6 +900,26 @@ bool __check_nv_sr_forward(struct kvm_vcpu *vcpu)
 	return true;
 }
 
+static bool forward_traps(struct kvm_vcpu *vcpu, u64 control_bit)
+{
+	bool control_bit_set;
+
+	if (!vcpu_has_nv(vcpu))
+		return false;
+
+	control_bit_set = __vcpu_sys_reg(vcpu, HCR_EL2) & control_bit;
+	if (!vcpu_is_el2(vcpu) && control_bit_set) {
+		kvm_inject_nested_sync(vcpu, kvm_vcpu_get_esr(vcpu));
+		return true;
+	}
+	return false;
+}
+
+bool forward_smc_trap(struct kvm_vcpu *vcpu)
+{
+	return forward_traps(vcpu, HCR_TSC);
+}
+
 static u64 kvm_check_illegal_exception_return(struct kvm_vcpu *vcpu, u64 spsr)
 {
 	u64 mode = spsr & PSR_MODE_MASK;
@@ -938,6 +958,13 @@ void kvm_emulate_nested_eret(struct kvm_vcpu *vcpu)
 	u64 spsr, elr, mode;
 	bool direct_eret;
 
+	/*
+	 * Forward this trap to the virtual EL2 if the virtual
+	 * HCR_EL2.NV bit is set and this is coming from !EL2.
+	 */
+	if (forward_traps(vcpu, HCR_NV))
+		return;
+
 	/*
 	 * Going through the whole put/load motions is a waste of time
 	 * if this is a VHE guest hypervisor returning to its own
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index c4dc144726ee..5903102c83f5 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -55,6 +55,13 @@ static int handle_hvc(struct kvm_vcpu *vcpu)
 
 static int handle_smc(struct kvm_vcpu *vcpu)
 {
+	/*
+	 * Forward this trapped smc instruction to the virtual EL2 if
+	 * the guest has asked for it.
+	 */
+	if (forward_smc_trap(vcpu))
+		return 1;
+
 	/*
 	 * "If an SMC instruction executed at Non-secure EL1 is
 	 * trapped to EL2 because HCR_EL2.TSC is 1, the exception is a
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [PATCH v10 24/59] KVM: arm64: nv: Configure HCR_EL2 for nested virtualization
  2023-05-15 17:30 [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support Marc Zyngier
                   ` (22 preceding siblings ...)
  2023-05-15 17:30 ` [PATCH v10 23/59] KVM: arm64: nv: Respect virtual HCR_EL2.{NV,TSC} settings Marc Zyngier
@ 2023-05-15 17:30 ` Marc Zyngier
  2023-05-15 17:30 ` [PATCH v10 25/59] KVM: arm64: nv: Support multiple nested Stage-2 mmu structures Marc Zyngier
                   ` (37 subsequent siblings)
  61 siblings, 0 replies; 95+ messages in thread
From: Marc Zyngier @ 2023-05-15 17:30 UTC (permalink / raw)
  To: kvmarm, kvm, linux-arm-kernel
  Cc: Alexandru Elisei, Andre Przywara, Chase Conklin,
	Christoffer Dall, Ganapatrao Kulkarni, Darren Hart, Jintack Lim,
	Russell King, Miguel Luis, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu

From: Jintack Lim <jintack.lim@linaro.org>

We enable nested virtualization by setting the HCR NV and NV1 bits.

When the virtual E2H bit is set, we can support EL2 register accesses
via EL1 registers from the virtual EL2 by doing trap-and-emulate. A
better alternative, however, is to allow the virtual EL2 to access EL2
register state without trapping. This can be easily achieved by not
trapping EL1 registers, since those registers already hold the EL2
register state.
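
For the record, a simplified standalone model of the resulting HCR_EL2
composition (my sketch, not the patch; it only models the NV/NV1 and
TVM/TRVM handling, and ignores the filtering applied to the merged
virtual HCR_EL2 value):

#include <stdbool.h>
#include <stdint.h>

#define HCR_TVM		(UINT64_C(1) << 26)
#define HCR_TRVM	(UINT64_C(1) << 30)
#define HCR_NV		(UINT64_C(1) << 42)
#define HCR_NV1		(UINT64_C(1) << 43)

static uint64_t hcr_for_vel2(uint64_t hcr, uint64_t vhcr, bool e2h)
{
	hcr |= HCR_NV;

	if (!e2h) {
		/* v8.0 guest hypervisor: trap and emulate EL1 VM regs */
		hcr |= HCR_TVM | HCR_TRVM | HCR_NV1;
	} else {
		/* VHE guest hypervisor: inherit its own TVM/TRVM choice */
		hcr &= ~HCR_TVM;
		hcr |= vhcr & (HCR_TVM | HCR_TRVM);
	}

	return hcr;
}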

Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_arm.h |  1 +
 arch/arm64/kvm/hyp/vhe/switch.c  | 38 +++++++++++++++++++++++++++++---
 2 files changed, 36 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index 4b3e55abb30f..ee1c4d71137c 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -97,6 +97,7 @@
 			 HCR_BSU_IS | HCR_FB | HCR_TACR | \
 			 HCR_AMO | HCR_SWIO | HCR_TIDCP | HCR_RW | HCR_TLOR | \
 			 HCR_FMO | HCR_IMO | HCR_PTW | HCR_TID3)
+#define HCR_GUEST_NV_FILTER_FLAGS (HCR_ATA | HCR_API | HCR_APK | HCR_FIEN)
 #define HCR_VIRT_EXCP_MASK (HCR_VSE | HCR_VI | HCR_VF)
 #define HCR_HOST_NVHE_FLAGS (HCR_RW | HCR_API | HCR_APK | HCR_ATA)
 #define HCR_HOST_NVHE_PROTECTED_FLAGS (HCR_HOST_NVHE_FLAGS | HCR_TSC)
diff --git a/arch/arm64/kvm/hyp/vhe/switch.c b/arch/arm64/kvm/hyp/vhe/switch.c
index 8d1a6b007c19..d2fcd881a267 100644
--- a/arch/arm64/kvm/hyp/vhe/switch.c
+++ b/arch/arm64/kvm/hyp/vhe/switch.c
@@ -38,9 +38,41 @@ static void __activate_traps(struct kvm_vcpu *vcpu)
 	u64 hcr = vcpu->arch.hcr_el2;
 	u64 val;
 
-	/* Trap VM sysreg accesses if an EL2 guest is not using VHE. */
-	if (vcpu_is_el2(vcpu) && !vcpu_el2_e2h_is_set(vcpu))
-		hcr |= HCR_TVM | HCR_TRVM;
+	if (is_hyp_ctxt(vcpu)) {
+		hcr |= HCR_NV;
+
+		if (!vcpu_el2_e2h_is_set(vcpu)) {
+			/*
+			 * For a guest hypervisor on v8.0, trap and emulate
+			 * the EL1 virtual memory control register accesses.
+			 */
+			hcr |= HCR_TVM | HCR_TRVM | HCR_NV1;
+		} else {
+			/*
+			 * For a guest hypervisor on v8.1 (VHE), allow it to
+			 * access the EL1 virtual memory control registers
+			 * natively; these accesses actually reach the EL2
+			 * register state.
+			 * Note that we still need to respect the virtual
+			 * HCR_EL2 state.
+			 */
+			u64 vhcr_el2 = __vcpu_sys_reg(vcpu, HCR_EL2);
+
+			vhcr_el2 &= ~HCR_GUEST_NV_FILTER_FLAGS;
+
+			/*
+			 * We already set TVM to handle set/way cache maint
+			 * ops traps, which somewhat collides with the nested
+			 * virt trapping for nVHE. So turn this off for now
+			 * here, in the hope that VHE guests won't ever do this.
+			 * TODO: find out whether it's worth supporting both
+			 * cases at the same time.
+			 */
+			hcr &= ~HCR_TVM;
+
+			hcr |= vhcr_el2 & (HCR_TVM | HCR_TRVM);
+		}
+	}
 
 	___activate_traps(vcpu, hcr);
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [PATCH v10 25/59] KVM: arm64: nv: Support multiple nested Stage-2 mmu structures
  2023-05-15 17:30 [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support Marc Zyngier
                   ` (23 preceding siblings ...)
  2023-05-15 17:30 ` [PATCH v10 24/59] KVM: arm64: nv: Configure HCR_EL2 for nested virtualization Marc Zyngier
@ 2023-05-15 17:30 ` Marc Zyngier
  2023-05-15 17:30 ` [PATCH v10 26/59] KVM: arm64: nv: Implement nested Stage-2 page table walk logic Marc Zyngier
                   ` (36 subsequent siblings)
  61 siblings, 0 replies; 95+ messages in thread
From: Marc Zyngier @ 2023-05-15 17:30 UTC (permalink / raw)
  To: kvmarm, kvm, linux-arm-kernel
  Cc: Alexandru Elisei, Andre Przywara, Chase Conklin,
	Christoffer Dall, Ganapatrao Kulkarni, Darren Hart, Jintack Lim,
	Russell King, Miguel Luis, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu

Add Stage-2 mmu data structures for virtual EL2 and for nested guests.
We don't yet populate shadow Stage-2 page tables, but we now have a
framework for getting to a shadow Stage-2 pgd.

We allocate twice as many Stage-2 mmu structures as vcpus, because
that's sufficient for each vcpu to run two translation regimes without
having to flush the Stage-2 page tables.
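
To make the matching rule concrete, here is a standalone sketch (mine,
not the patch's; the struct loosely mirrors the new kvm_s2_mmu fields)
of how a shadow S2 context is matched:

#include <stdbool.h>
#include <stdint.h>

#define VTTBR_CNP	UINT64_C(1)

struct s2_ctx {
	uint64_t tlb_vttbr;	/* VMID+BADDR, CnP masked out */
	uint64_t tlb_vtcr;
	bool	 s2_enabled;
};

/* Match rule used when picking a shadow S2 context for a vcpu */
static bool s2_ctx_matches(const struct s2_ctx *c, uint64_t vttbr,
			   uint64_t vtcr, bool s2_enabled,
			   uint64_t vmid_mask)
{
	vttbr &= ~VTTBR_CNP;

	if (s2_enabled)
		return c->s2_enabled && c->tlb_vttbr == vttbr &&
		       c->tlb_vtcr == vtcr;

	/* S2 off: TLBs are still tagged by VMID, so match VMID only */
	return !c->s2_enabled &&
	       (c->tlb_vttbr & vmid_mask) == (vttbr & vmid_mask);
}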

Co-developed-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_host.h   |  41 ++++++
 arch/arm64/include/asm/kvm_mmu.h    |   9 ++
 arch/arm64/include/asm/kvm_nested.h |   7 +
 arch/arm64/kvm/arm.c                |  18 ++-
 arch/arm64/kvm/mmu.c                |  78 ++++++++---
 arch/arm64/kvm/nested.c             | 210 ++++++++++++++++++++++++++++
 arch/arm64/kvm/reset.c              |   6 +
 7 files changed, 345 insertions(+), 24 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 65810618cb42..2e2969f53b4a 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -170,8 +170,40 @@ struct kvm_s2_mmu {
 	int __percpu *last_vcpu_ran;
 
 	struct kvm_arch *arch;
+
+	/*
+	 * For a shadow stage-2 MMU, the virtual vttbr used by the
+	 * host to parse the guest S2.
+	 * This either contains:
+	 * - the virtual VTTBR programmed by the guest hypervisor with
+	 *   CnP cleared
+	 * - The value 1 (VMID=0, BADDR=0, CnP=1) if invalid
+	 *
+	 * We also cache the full VTCR which gets used for TLB invalidation,
+	 * taking the ARM ARM's "Any of the bits in VTCR_EL2 are permitted
+	 * to be cached in a TLB" to the letter.
+	 */
+	u64	tlb_vttbr;
+	u64	tlb_vtcr;
+
+	/*
+	 * true when this represents a nested context where virtual
+	 * HCR_EL2.VM == 1
+	 */
+	bool	nested_stage2_enabled;
+
+	/*
+	 *  0: Nobody is currently using this, check vttbr for validity
+	 * >0: Somebody is actively using this.
+	 */
+	atomic_t refcnt;
 };
 
+static inline bool kvm_s2_mmu_valid(struct kvm_s2_mmu *mmu)
+{
+	return !(mmu->tlb_vttbr & 1);
+}
+
 struct kvm_arch_memory_slot {
 };
 
@@ -198,6 +230,14 @@ struct kvm_protected_vm {
 struct kvm_arch {
 	struct kvm_s2_mmu mmu;
 
+	/*
+	 * Stage 2 paging state for VMs with nested S2 using a virtual
+	 * VMID.
+	 */
+	struct kvm_s2_mmu *nested_mmus;
+	size_t nested_mmus_size;
+	int nested_mmus_next;
+
 	/* Interrupt controller */
 	struct vgic_dist	vgic;
 
@@ -1082,6 +1122,7 @@ void kvm_vcpu_load_sysregs_vhe(struct kvm_vcpu *vcpu);
 void kvm_vcpu_put_sysregs_vhe(struct kvm_vcpu *vcpu);
 
 int __init kvm_set_ipa_limit(void);
+u32 kvm_get_pa_bits(struct kvm *kvm);
 
 #define __KVM_HAVE_ARCH_VM_ALLOC
 struct kvm *kvm_arch_alloc_vm(void);
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 346754b2f3b0..896acdf98e71 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -119,6 +119,7 @@ alternative_cb_end
 #include <asm/mmu_context.h>
 #include <asm/kvm_emulate.h>
 #include <asm/kvm_host.h>
+#include <asm/kvm_nested.h>
 
 void kvm_update_va_mask(struct alt_instr *alt,
 			__le32 *origptr, __le32 *updptr, int nr_inst);
@@ -170,6 +171,7 @@ int create_hyp_exec_mappings(phys_addr_t phys_addr, size_t size,
 			     void **haddr);
 void __init free_hyp_pgds(void);
 
+void kvm_unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size);
 void stage2_unmap_vm(struct kvm *kvm);
 int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu, unsigned long type);
 void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu);
@@ -311,5 +313,12 @@ static inline struct kvm *kvm_s2_mmu_to_kvm(struct kvm_s2_mmu *mmu)
 {
 	return container_of(mmu->arch, struct kvm, arch);
 }
+
+static inline u64 get_vmid(u64 vttbr)
+{
+	return (vttbr & VTTBR_VMID_MASK(kvm_get_vmid_bits())) >>
+		VTTBR_VMID_SHIFT;
+}
+
 #endif /* __ASSEMBLY__ */
 #endif /* __ARM64_KVM_MMU_H__ */
diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
index 948ba0337558..d57e609dd3f4 100644
--- a/arch/arm64/include/asm/kvm_nested.h
+++ b/arch/arm64/include/asm/kvm_nested.h
@@ -59,6 +59,13 @@ static inline u64 translate_ttbr0_el2_to_ttbr0_el1(u64 ttbr0)
 	return ttbr0 & ~GENMASK_ULL(63, 48);
 }
 
+extern void kvm_init_nested(struct kvm *kvm);
+extern int kvm_vcpu_init_nested(struct kvm_vcpu *vcpu);
+extern void kvm_init_nested_s2_mmu(struct kvm_s2_mmu *mmu);
+extern struct kvm_s2_mmu *lookup_s2_mmu(struct kvm_vcpu *vcpu);
+extern void kvm_vcpu_load_hw_mmu(struct kvm_vcpu *vcpu);
+extern void kvm_vcpu_put_hw_mmu(struct kvm_vcpu *vcpu);
+
 extern bool forward_smc_trap(struct kvm_vcpu *vcpu);
 extern bool __check_nv_sr_forward(struct kvm_vcpu *vcpu);
 
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 7b0241eb7483..9036206f0be8 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -36,9 +36,10 @@
 #include <asm/virt.h>
 #include <asm/kvm_arm.h>
 #include <asm/kvm_asm.h>
+#include <asm/kvm_emulate.h>
 #include <asm/kvm_mmu.h>
+#include <asm/kvm_nested.h>
 #include <asm/kvm_pkvm.h>
-#include <asm/kvm_emulate.h>
 #include <asm/sections.h>
 
 #include <kvm/arm_hypercalls.h>
@@ -138,6 +139,8 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 	mutex_unlock(&kvm->lock);
 #endif
 
+	kvm_init_nested(kvm);
+
 	ret = kvm_share_hyp(kvm, kvm + 1);
 	if (ret)
 		return ret;
@@ -413,6 +416,9 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	struct kvm_s2_mmu *mmu;
 	int *last_ran;
 
+	if (vcpu_has_nv(vcpu))
+		kvm_vcpu_load_hw_mmu(vcpu);
+
 	mmu = vcpu->arch.hw_mmu;
 	last_ran = this_cpu_ptr(mmu->last_vcpu_ran);
 
@@ -463,9 +469,12 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 	kvm_timer_vcpu_put(vcpu);
 	kvm_vgic_put(vcpu);
 	kvm_vcpu_pmu_restore_host(vcpu);
+	if (vcpu_has_nv(vcpu))
+		kvm_vcpu_put_hw_mmu(vcpu);
 	kvm_arm_vmid_clear_active();
 
 	vcpu_clear_on_unsupported_cpu(vcpu);
+
 	vcpu->cpu = -1;
 }
 
@@ -1206,8 +1215,13 @@ static int kvm_vcpu_set_target(struct kvm_vcpu *vcpu,
 
 	vcpu->arch.target = phys_target;
 
+	/* Prepare for nested if required */
+	ret = kvm_vcpu_init_nested(vcpu);
+
 	/* Now we know what it is, we can reset it. */
-	ret = kvm_reset_vcpu(vcpu);
+	if (!ret)
+		ret = kvm_reset_vcpu(vcpu);
+
 	if (ret) {
 		vcpu->arch.target = -1;
 		bitmap_zero(vcpu->arch.features, KVM_VCPU_MAX_FEATURES);
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index c934c911c702..3268936eec40 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -217,7 +217,7 @@ static void invalidate_icache_guest_page(void *va, size_t size)
  * does.
  */
 /**
- * unmap_stage2_range -- Clear stage2 page table entries to unmap a range
+ * __unmap_stage2_range -- Clear stage2 page table entries to unmap a range
  * @mmu:   The KVM stage-2 MMU pointer
  * @start: The intermediate physical base address of the range to unmap
  * @size:  The size of the area to unmap
@@ -240,7 +240,7 @@ static void __unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64
 				   may_block));
 }
 
-static void unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size)
+void kvm_unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size)
 {
 	__unmap_stage2_range(mmu, start, size, true);
 }
@@ -711,21 +711,9 @@ static struct kvm_pgtable_mm_ops kvm_s2_mm_ops = {
 	.icache_inval_pou	= invalidate_icache_guest_page,
 };
 
-/**
- * kvm_init_stage2_mmu - Initialise a S2 MMU structure
- * @kvm:	The pointer to the KVM structure
- * @mmu:	The pointer to the s2 MMU structure
- * @type:	The machine type of the virtual machine
- *
- * Allocates only the stage-2 HW PGD level table(s).
- * Note we don't need locking here as this is only called when the VM is
- * created, which can only be done once.
- */
-int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu, unsigned long type)
+static int kvm_init_ipa_range(struct kvm_s2_mmu *mmu, unsigned long type)
 {
 	u32 kvm_ipa_limit = get_kvm_ipa_limit();
-	int cpu, err;
-	struct kvm_pgtable *pgt;
 	u64 mmfr0, mmfr1;
 	u32 phys_shift;
 
@@ -752,11 +740,58 @@ int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu, unsigned long t
 	mmfr1 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR1_EL1);
 	mmu->vtcr = kvm_get_vtcr(mmfr0, mmfr1, phys_shift);
 
+	return 0;
+}
+
+/**
+ * kvm_init_stage2_mmu - Initialise a S2 MMU structure
+ * @kvm:	The pointer to the KVM structure
+ * @mmu:	The pointer to the s2 MMU structure
+ * @type:	The machine type of the virtual machine
+ *
+ * Allocates only the stage-2 HW PGD level table(s).
+ * Note we don't need locking here as this is only called in two cases:
+ *
+ * - when the VM is created, which can't race against anything
+ *
+ * - when secondary kvm_s2_mmu structures are initialised for NV
+ *   guests, and the caller must hold kvm->lock as this is called on a
+ *   per-vcpu basis.
+ */
+int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu, unsigned long type)
+{
+	int cpu, err;
+	struct kvm_pgtable *pgt;
+
+	/*
+	 * If we already have our page tables in place, and the
+	 * MMU context is the canonical one, we have a bug somewhere,
+	 * as this is only supposed to ever happen once per VM.
+	 *
+	 * Otherwise, we're building nested page tables, and that's
+	 * probably because userspace called KVM_ARM_VCPU_INIT more
+	 * than once on the same vcpu. Since that's actually legal,
+	 * don't kick up a fuss and leave gracefully.
+	 */
 	if (mmu->pgt != NULL) {
+		if (&kvm->arch.mmu != mmu)
+			return 0;
+
 		kvm_err("kvm_arch already initialized?\n");
 		return -EINVAL;
 	}
 
+	/*
+	 * We only initialise the IPA range on the canonical MMU, so
+	 * the type is meaningless in all other situations.
+	 */
+	if (&kvm->arch.mmu != mmu)
+		type = kvm_get_pa_bits(kvm);
+
+	err = kvm_init_ipa_range(mmu, type);
+	if (err)
+		return err;
+
 	pgt = kzalloc(sizeof(*pgt), GFP_KERNEL_ACCOUNT);
 	if (!pgt)
 		return -ENOMEM;
@@ -777,6 +812,10 @@ int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu, unsigned long t
 
 	mmu->pgt = pgt;
 	mmu->pgd_phys = __pa(pgt->pgd);
+
+	if (&kvm->arch.mmu != mmu)
+		kvm_init_nested_s2_mmu(mmu);
+
 	return 0;
 
 out_destroy_pgtable:
@@ -822,7 +861,7 @@ static void stage2_unmap_memslot(struct kvm *kvm,
 
 		if (!(vma->vm_flags & VM_PFNMAP)) {
 			gpa_t gpa = addr + (vm_start - memslot->userspace_addr);
-			unmap_stage2_range(&kvm->arch.mmu, gpa, vm_end - vm_start);
+			kvm_unmap_stage2_range(&kvm->arch.mmu, gpa, vm_end - vm_start);
 		}
 		hva = vm_end;
 	} while (hva < reg_end);
@@ -1876,11 +1915,6 @@ void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen)
 {
 }
 
-void kvm_arch_flush_shadow_all(struct kvm *kvm)
-{
-	kvm_free_stage2_pgd(&kvm->arch.mmu);
-}
-
 void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
 				   struct kvm_memory_slot *slot)
 {
@@ -1888,7 +1922,7 @@ void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
 	phys_addr_t size = slot->npages << PAGE_SHIFT;
 
 	write_lock(&kvm->mmu_lock);
-	unmap_stage2_range(&kvm->arch.mmu, gpa, size);
+	kvm_unmap_stage2_range(&kvm->arch.mmu, gpa, size);
 	write_unlock(&kvm->mmu_lock);
 }
 
diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
index 7f80f385d9e8..c3da57d16d88 100644
--- a/arch/arm64/kvm/nested.c
+++ b/arch/arm64/kvm/nested.c
@@ -7,7 +7,9 @@
 #include <linux/kvm.h>
 #include <linux/kvm_host.h>
 
+#include <asm/kvm_arm.h>
 #include <asm/kvm_emulate.h>
+#include <asm/kvm_mmu.h>
 #include <asm/kvm_nested.h>
 #include <asm/sysreg.h>
 
@@ -16,6 +18,214 @@
 /* Protection against the sysreg repainting madness... */
 #define NV_FTR(r, f)		ID_AA64##r##_EL1_##f
 
+void kvm_init_nested(struct kvm *kvm)
+{
+	kvm->arch.nested_mmus = NULL;
+	kvm->arch.nested_mmus_size = 0;
+}
+
+int kvm_vcpu_init_nested(struct kvm_vcpu *vcpu)
+{
+	struct kvm *kvm = vcpu->kvm;
+	struct kvm_s2_mmu *tmp;
+	int num_mmus;
+	int ret = -ENOMEM;
+
+	if (!test_bit(KVM_ARM_VCPU_HAS_EL2, vcpu->arch.features))
+		return 0;
+
+	if (!cpus_have_final_cap(ARM64_HAS_NESTED_VIRT))
+		return -EINVAL;
+
+	mutex_lock(&kvm->arch.config_lock);
+
+	/*
+	 * Let's treat memory allocation failures as benign: If we fail to
+	 * allocate anything, return an error and keep the allocated array
+	 * alive. Userspace may try to recover by initializing the vcpu
+	 * again, and there is no reason to affect the whole VM for this.
+	 */
+	num_mmus = atomic_read(&kvm->online_vcpus) * 2;
+	tmp = krealloc(kvm->arch.nested_mmus,
+		       num_mmus * sizeof(*kvm->arch.nested_mmus),
+		       GFP_KERNEL_ACCOUNT | __GFP_ZERO);
+	if (tmp) {
+		/*
+		 * If we went through a reallocation, adjust the MMU
+		 * back-pointers in the previously initialised
+		 * kvm_pgtable structures.
+		 */
+		if (kvm->arch.nested_mmus != tmp) {
+			int i;
+
+			for (i = 0; i < num_mmus - 2; i++)
+				tmp[i].pgt->mmu = &tmp[i];
+		}
+
+		if (kvm_init_stage2_mmu(kvm, &tmp[num_mmus - 1], 0) ||
+		    kvm_init_stage2_mmu(kvm, &tmp[num_mmus - 2], 0)) {
+			kvm_free_stage2_pgd(&tmp[num_mmus - 1]);
+			kvm_free_stage2_pgd(&tmp[num_mmus - 2]);
+		} else {
+			kvm->arch.nested_mmus_size = num_mmus;
+			ret = 0;
+		}
+
+		kvm->arch.nested_mmus = tmp;
+	}
+
+	mutex_unlock(&kvm->arch.config_lock);
+	return ret;
+}
+
+/* Must be called with kvm->mmu_lock held */
+struct kvm_s2_mmu *lookup_s2_mmu(struct kvm_vcpu *vcpu)
+{
+	bool nested_stage2_enabled;
+	u64 vttbr, vtcr, hcr;
+	struct kvm *kvm;
+	int i;
+
+	kvm = vcpu->kvm;
+
+	vttbr = vcpu_read_sys_reg(vcpu, VTTBR_EL2);
+	vtcr = vcpu_read_sys_reg(vcpu, VTCR_EL2);
+	hcr = vcpu_read_sys_reg(vcpu, HCR_EL2);
+
+	nested_stage2_enabled = hcr & HCR_VM;
+
+	/* Don't consider the CnP bit for the vttbr match */
+	vttbr = vttbr & ~VTTBR_CNP_BIT;
+
+	/*
+	 * Two possibilities when looking up a S2 MMU context:
+	 *
+	 * - either S2 is enabled in the guest, and we need a context that is
+	 *   S2-enabled and matches the full VTTBR (VMID+BADDR) and VTCR,
+	 *   which makes it safe from a TLB conflict perspective (a broken
+	 *   guest won't be able to generate them),
+	 *
+	 * - or S2 is disabled, and we need a context that is S2-disabled
+	 *   and matches the VMID only, as all TLBs are tagged by VMID even
+	 *   if S2 translation is disabled.
+	 */
+	for (i = 0; i < kvm->arch.nested_mmus_size; i++) {
+		struct kvm_s2_mmu *mmu = &kvm->arch.nested_mmus[i];
+
+		if (!kvm_s2_mmu_valid(mmu))
+			continue;
+
+		if (nested_stage2_enabled &&
+		    mmu->nested_stage2_enabled &&
+		    vttbr == mmu->tlb_vttbr &&
+		    vtcr == mmu->tlb_vtcr)
+			return mmu;
+
+		if (!nested_stage2_enabled &&
+		    !mmu->nested_stage2_enabled &&
+		    get_vmid(vttbr) == get_vmid(mmu->tlb_vttbr))
+			return mmu;
+	}
+	return NULL;
+}
+
+/* Must be called with kvm->mmu_lock held */
+static struct kvm_s2_mmu *get_s2_mmu_nested(struct kvm_vcpu *vcpu)
+{
+	struct kvm *kvm = vcpu->kvm;
+	struct kvm_s2_mmu *s2_mmu;
+	int i;
+
+	s2_mmu = lookup_s2_mmu(vcpu);
+	if (s2_mmu)
+		goto out;
+
+	/*
+	 * Make sure we don't always search from the same point, or we
+	 * will always reuse a potentially active context, leaving
+	 * free contexts unused.
+	 */
+	for (i = kvm->arch.nested_mmus_next;
+	     i < (kvm->arch.nested_mmus_size + kvm->arch.nested_mmus_next);
+	     i++) {
+		s2_mmu = &kvm->arch.nested_mmus[i % kvm->arch.nested_mmus_size];
+
+		if (atomic_read(&s2_mmu->refcnt) == 0)
+			break;
+	}
+	BUG_ON(atomic_read(&s2_mmu->refcnt)); /* We have struct MMUs to spare */
+
+	/* Set the scene for the next search */
+	kvm->arch.nested_mmus_next = (i + 1) % kvm->arch.nested_mmus_size;
+
+	if (kvm_s2_mmu_valid(s2_mmu)) {
+		/* Clear the old state */
+		kvm_unmap_stage2_range(s2_mmu, 0, kvm_phys_size(s2_mmu));
+		if (atomic64_read(&s2_mmu->vmid.id))
+			kvm_call_hyp(__kvm_tlb_flush_vmid, s2_mmu);
+	}
+
+	/*
+	 * The virtual VMID (modulo CnP) will be used as a key when matching
+	 * an existing kvm_s2_mmu.
+	 *
+	 * We cache VTCR at allocation time, once and for all. It'd be great
+	 * if the guest didn't screw that one up, as this is not very
+	 * forgiving...
+	 */
+	s2_mmu->tlb_vttbr = vcpu_read_sys_reg(vcpu, VTTBR_EL2) & ~VTTBR_CNP_BIT;
+	s2_mmu->tlb_vtcr = vcpu_read_sys_reg(vcpu, VTCR_EL2);
+	s2_mmu->nested_stage2_enabled = vcpu_read_sys_reg(vcpu, HCR_EL2) & HCR_VM;
+
+out:
+	atomic_inc(&s2_mmu->refcnt);
+	return s2_mmu;
+}
+
+void kvm_init_nested_s2_mmu(struct kvm_s2_mmu *mmu)
+{
+	mmu->tlb_vttbr = 1;
+	mmu->nested_stage2_enabled = false;
+	atomic_set(&mmu->refcnt, 0);
+}
+
+void kvm_vcpu_load_hw_mmu(struct kvm_vcpu *vcpu)
+{
+	if (is_hyp_ctxt(vcpu)) {
+		vcpu->arch.hw_mmu = &vcpu->kvm->arch.mmu;
+	} else {
+		write_lock(&vcpu->kvm->mmu_lock);
+		vcpu->arch.hw_mmu = get_s2_mmu_nested(vcpu);
+		write_unlock(&vcpu->kvm->mmu_lock);
+	}
+}
+
+void kvm_vcpu_put_hw_mmu(struct kvm_vcpu *vcpu)
+{
+	if (vcpu->arch.hw_mmu != &vcpu->kvm->arch.mmu) {
+		atomic_dec(&vcpu->arch.hw_mmu->refcnt);
+		vcpu->arch.hw_mmu = NULL;
+	}
+}
+
+void kvm_arch_flush_shadow_all(struct kvm *kvm)
+{
+	int i;
+
+	for (i = 0; i < kvm->arch.nested_mmus_size; i++) {
+		struct kvm_s2_mmu *mmu = &kvm->arch.nested_mmus[i];
+
+		WARN_ON(atomic_read(&mmu->refcnt));
+
+		if (!atomic_read(&mmu->refcnt))
+			kvm_free_stage2_pgd(mmu);
+	}
+	kfree(kvm->arch.nested_mmus);
+	kvm->arch.nested_mmus = NULL;
+	kvm->arch.nested_mmus_size = 0;
+	kvm_free_stage2_pgd(&kvm->arch.mmu);
+}
+
 /*
  * Our emulated CPU doesn't support all the possible features. For the
  * sake of simplicity (and probably mental sanity), wipe out a number
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index b5dee8e57e77..ad11cc514b3d 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -365,6 +365,12 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
 	return ret;
 }
 
+u32 kvm_get_pa_bits(struct kvm *kvm)
+{
+	/* Fixed limit until we can configure ID_AA64MMFR0.PARange */
+	return kvm_ipa_limit;
+}
+
 u32 get_kvm_ipa_limit(void)
 {
 	return kvm_ipa_limit;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [PATCH v10 26/59] KVM: arm64: nv: Implement nested Stage-2 page table walk logic
  2023-05-15 17:30 [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support Marc Zyngier
                   ` (24 preceding siblings ...)
  2023-05-15 17:30 ` [PATCH v10 25/59] KVM: arm64: nv: Support multiple nested Stage-2 mmu structures Marc Zyngier
@ 2023-05-15 17:30 ` Marc Zyngier
  2023-05-15 17:30 ` [PATCH v10 27/59] KVM: arm64: nv: Handle shadow stage 2 page faults Marc Zyngier
                   ` (35 subsequent siblings)
  61 siblings, 0 replies; 95+ messages in thread
From: Marc Zyngier @ 2023-05-15 17:30 UTC (permalink / raw)
  To: kvmarm, kvm, linux-arm-kernel
  Cc: Alexandru Elisei, Andre Przywara, Chase Conklin,
	Christoffer Dall, Ganapatrao Kulkarni, Darren Hart, Jintack Lim,
	Russell King, Miguel Luis, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu

From: Christoffer Dall <christoffer.dall@linaro.org>

Based on the pseudo-code in the ARM ARM, implement a stage 2 software
page table walker.
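
As a worked example (mine, not part of the patch), the per-level index
extraction for a 4K granule can be reproduced with the snippet below:
with T0SZ = 24 the input range is 40 bits, so a walk starting at level
1 consumes IPA[39:30], then IPA[29:21], then IPA[20:12]:

#include <stdint.h>
#include <stdio.h>

/*
 * For a 4K granule (pgshift = 12, stride = 9), the IPA bits consumed
 * at each level are IPA[top : (3 - level) * 9 + 12].
 */
int main(void)
{
	unsigned int pgshift = 12, stride = 9, input_size = 40;
	unsigned int top = input_size - 1;

	for (int level = 1; level <= 3; level++) {
		unsigned int bottom = (3 - level) * stride + pgshift;

		printf("level %d: index = IPA[%u:%u]\n", level, top, bottom);
		top = bottom - 1;
	}
	return 0;
}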

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
[maz: heavily reworked for future ARMv8.4-TTL support]
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/esr.h        |   1 +
 arch/arm64/include/asm/kvm_arm.h    |   2 +
 arch/arm64/include/asm/kvm_nested.h |  13 ++
 arch/arm64/kvm/nested.c             | 267 ++++++++++++++++++++++++++++
 4 files changed, 283 insertions(+)

diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
index 8487aec9b658..380156a9b956 100644
--- a/arch/arm64/include/asm/esr.h
+++ b/arch/arm64/include/asm/esr.h
@@ -141,6 +141,7 @@
 #define ESR_ELx_CM 		(UL(1) << ESR_ELx_CM_SHIFT)
 
 /* ISS field definitions for exceptions taken in to Hyp */
+#define ESR_ELx_FSC_ADDRSZ	(0x00)
 #define ESR_ELx_CV		(UL(1) << 24)
 #define ESR_ELx_COND_SHIFT	(20)
 #define ESR_ELx_COND_MASK	(UL(0xF) << ESR_ELx_COND_SHIFT)
diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index ee1c4d71137c..af24380e02d9 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -282,6 +282,8 @@
 #define VTTBR_VMID_SHIFT  (UL(48))
 #define VTTBR_VMID_MASK(size) (_AT(u64, (1 << size) - 1) << VTTBR_VMID_SHIFT)
 
+#define SCTLR_EE	(UL(1) << 25)
+
 /* Hyp System Trap Register */
 #define HSTR_EL2_T(x)	(1 << x)
 
diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
index d57e609dd3f4..5306f7ca5e07 100644
--- a/arch/arm64/include/asm/kvm_nested.h
+++ b/arch/arm64/include/asm/kvm_nested.h
@@ -66,6 +66,19 @@ extern struct kvm_s2_mmu *lookup_s2_mmu(struct kvm_vcpu *vcpu);
 extern void kvm_vcpu_load_hw_mmu(struct kvm_vcpu *vcpu);
 extern void kvm_vcpu_put_hw_mmu(struct kvm_vcpu *vcpu);
 
+struct kvm_s2_trans {
+	phys_addr_t output;
+	unsigned long block_size;
+	bool writable;
+	bool readable;
+	int level;
+	u32 esr;
+	u64 upper_attr;
+};
+
+extern int kvm_walk_nested_s2(struct kvm_vcpu *vcpu, phys_addr_t gipa,
+			      struct kvm_s2_trans *result);
+
 extern bool forward_smc_trap(struct kvm_vcpu *vcpu);
 extern bool __check_nv_sr_forward(struct kvm_vcpu *vcpu);
 
diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
index c3da57d16d88..9ab06055cd35 100644
--- a/arch/arm64/kvm/nested.c
+++ b/arch/arm64/kvm/nested.c
@@ -78,6 +78,273 @@ int kvm_vcpu_init_nested(struct kvm_vcpu *vcpu)
 	return ret;
 }
 
+struct s2_walk_info {
+	int	     (*read_desc)(phys_addr_t pa, u64 *desc, void *data);
+	void	     *data;
+	u64	     baddr;
+	unsigned int max_pa_bits;
+	unsigned int max_ipa_bits;
+	unsigned int pgshift;
+	unsigned int pgsize;
+	unsigned int ps;
+	unsigned int sl;
+	unsigned int t0sz;
+	bool	     be;
+};
+
+static unsigned int ps_to_output_size(unsigned int ps)
+{
+	switch (ps) {
+	case 0: return 32;
+	case 1: return 36;
+	case 2: return 40;
+	case 3: return 42;
+	case 4: return 44;
+	case 5:
+	default:
+		return 48;
+	}
+}
+
+static u32 compute_fsc(int level, u32 fsc)
+{
+	return fsc | (level & 0x3);
+}
+
+static int check_base_s2_limits(struct s2_walk_info *wi,
+				int level, int input_size, int stride)
+{
+	int start_size;
+
+	/* Check translation limits */
+	switch (wi->pgsize) {
+	case SZ_64K:
+		if (level == 0 || (level == 1 && wi->max_ipa_bits <= 42))
+			return -EFAULT;
+		break;
+	case SZ_16K:
+		if (level == 0 || (level == 1 && wi->max_ipa_bits <= 40))
+			return -EFAULT;
+		break;
+	case SZ_4K:
+		if (level < 0 || (level == 0 && wi->max_ipa_bits <= 42))
+			return -EFAULT;
+		break;
+	}
+
+	/* Check input size limits */
+	if (input_size > wi->max_ipa_bits)
+		return -EFAULT;
+
+	/* Check number of entries in starting level table */
+	start_size = input_size - ((3 - level) * stride + wi->pgshift);
+	if (start_size < 1 || start_size > stride + 4)
+		return -EFAULT;
+
+	return 0;
+}
+
+/* Check if output is within boundaries */
+static int check_output_size(struct s2_walk_info *wi, phys_addr_t output)
+{
+	unsigned int output_size = ps_to_output_size(wi->ps);
+
+	if (output_size > wi->max_pa_bits)
+		output_size = wi->max_pa_bits;
+
+	if (output_size != 48 && (output & GENMASK_ULL(47, output_size)))
+		return -1;
+
+	return 0;
+}
+
+/*
+ * This is essentially a C-version of the pseudo code from the ARM ARM
+ * AArch64.TranslationTableWalk function. I strongly recommend looking at
+ * that pseudocode when trying to understand this.
+ *
+ * Must be called with the kvm->srcu read lock held
+ */
+static int walk_nested_s2_pgd(phys_addr_t ipa,
+			      struct s2_walk_info *wi, struct kvm_s2_trans *out)
+{
+	int first_block_level, level, stride, input_size, base_lower_bound;
+	phys_addr_t base_addr;
+	unsigned int addr_top, addr_bottom;
+	u64 desc;  /* page table entry */
+	int ret;
+	phys_addr_t paddr;
+
+	switch (wi->pgsize) {
+	case SZ_64K:
+	case SZ_16K:
+		level = 3 - wi->sl;
+		first_block_level = 2;
+		break;
+	case SZ_4K:
+		level = 2 - wi->sl;
+		first_block_level = 1;
+		break;
+	default:
+		/* GCC is braindead */
+		unreachable();
+	}
+
+	stride = wi->pgshift - 3;
+	input_size = 64 - wi->t0sz;
+	if (input_size > 48 || input_size < 25)
+		return -EFAULT;
+
+	ret = check_base_s2_limits(wi, level, input_size, stride);
+	if (WARN_ON(ret))
+		return ret;
+
+	base_lower_bound = 3 + input_size - ((3 - level) * stride +
+			   wi->pgshift);
+	base_addr = wi->baddr & GENMASK_ULL(47, base_lower_bound);
+
+	if (check_output_size(wi, base_addr)) {
+		out->esr = compute_fsc(level, ESR_ELx_FSC_ADDRSZ);
+		return 1;
+	}
+
+	addr_top = input_size - 1;
+
+	while (1) {
+		phys_addr_t index;
+
+		addr_bottom = (3 - level) * stride + wi->pgshift;
+		index = (ipa & GENMASK_ULL(addr_top, addr_bottom))
+			>> (addr_bottom - 3);
+
+		paddr = base_addr | index;
+		ret = wi->read_desc(paddr, &desc, wi->data);
+		if (ret < 0)
+			return ret;
+
+		/*
+		 * Handle reversed descriptors if endianness differs between the
+		 * host and the guest hypervisor.
+		 */
+		if (wi->be)
+			desc = be64_to_cpu(desc);
+		else
+			desc = le64_to_cpu(desc);
+
+		/* Check for valid descriptor at this point */
+		if (!(desc & 1) || ((desc & 3) == 1 && level == 3)) {
+			out->esr = compute_fsc(level, ESR_ELx_FSC_FAULT);
+			out->upper_attr = desc;
+			return 1;
+		}
+
+		/* We're at the final level or block translation level */
+		if ((desc & 3) == 1 || level == 3)
+			break;
+
+		if (check_output_size(wi, desc)) {
+			out->esr = compute_fsc(level, ESR_ELx_FSC_ADDRSZ);
+			out->upper_attr = desc;
+			return 1;
+		}
+
+		base_addr = desc & GENMASK_ULL(47, wi->pgshift);
+
+		level += 1;
+		addr_top = addr_bottom - 1;
+	}
+
+	if (level < first_block_level) {
+		out->esr = compute_fsc(level, ESR_ELx_FSC_FAULT);
+		out->upper_attr = desc;
+		return 1;
+	}
+
+	/*
+	 * We don't use the contiguous bit in the stage-2 ptes, so skip check
+	 * for misprogramming of the contiguous bit.
+	 */
+
+	if (check_output_size(wi, desc)) {
+		out->esr = compute_fsc(level, ESR_ELx_FSC_ADDRSZ);
+		out->upper_attr = desc;
+		return 1;
+	}
+
+	if (!(desc & BIT(10))) {
+		out->esr = compute_fsc(level, ESR_ELx_FSC_ACCESS);
+		out->upper_attr = desc;
+		return 1;
+	}
+
+	/* Calculate and return the result */
+	paddr = (desc & GENMASK_ULL(47, addr_bottom)) |
+		(ipa & GENMASK_ULL(addr_bottom - 1, 0));
+	out->output = paddr;
+	out->block_size = 1UL << ((3 - level) * stride + wi->pgshift);
+	out->readable = desc & (0b01 << 6);
+	out->writable = desc & (0b10 << 6);
+	out->level = level;
+	out->upper_attr = desc & GENMASK_ULL(63, 52);
+	return 0;
+}
+
+static int read_guest_s2_desc(phys_addr_t pa, u64 *desc, void *data)
+{
+	struct kvm_vcpu *vcpu = data;
+
+	return kvm_read_guest(vcpu->kvm, pa, desc, sizeof(*desc));
+}
+
+static void vtcr_to_walk_info(u64 vtcr, struct s2_walk_info *wi)
+{
+	wi->t0sz = vtcr & TCR_EL2_T0SZ_MASK;
+
+	switch (vtcr & VTCR_EL2_TG0_MASK) {
+	case VTCR_EL2_TG0_4K:
+		wi->pgshift = 12;	 break;
+	case VTCR_EL2_TG0_16K:
+		wi->pgshift = 14;	 break;
+	case VTCR_EL2_TG0_64K:
+	default:
+		wi->pgshift = 16;	 break;
+	}
+
+	wi->pgsize = BIT(wi->pgshift);
+	wi->ps = FIELD_GET(VTCR_EL2_PS_MASK, vtcr);
+	wi->sl = FIELD_GET(VTCR_EL2_SL0_MASK, vtcr);
+	wi->max_ipa_bits = VTCR_EL2_IPA(vtcr);
+	/* Global limit for now, should eventually be per-VM */
+	wi->max_pa_bits = get_kvm_ipa_limit();
+}
+
+int kvm_walk_nested_s2(struct kvm_vcpu *vcpu, phys_addr_t gipa,
+		       struct kvm_s2_trans *result)
+{
+	u64 vtcr = vcpu_read_sys_reg(vcpu, VTCR_EL2);
+	struct s2_walk_info wi;
+	int ret;
+
+	result->esr = 0;
+
+	if (!vcpu_has_nv(vcpu))
+		return 0;
+
+	wi.read_desc = read_guest_s2_desc;
+	wi.data = vcpu;
+	wi.baddr = vcpu_read_sys_reg(vcpu, VTTBR_EL2);
+
+	vtcr_to_walk_info(vtcr, &wi);
+
+	wi.be = vcpu_read_sys_reg(vcpu, SCTLR_EL2) & SCTLR_EE;
+
+	ret = walk_nested_s2_pgd(gipa, &wi, result);
+	if (ret)
+		result->esr |= (kvm_vcpu_get_esr(vcpu) & ~ESR_ELx_FSC);
+
+	return ret;
+}
+
 /* Must be called with kvm->mmu_lock held */
 struct kvm_s2_mmu *lookup_s2_mmu(struct kvm_vcpu *vcpu)
 {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [PATCH v10 27/59] KVM: arm64: nv: Handle shadow stage 2 page faults
  2023-05-15 17:30 [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support Marc Zyngier
                   ` (25 preceding siblings ...)
  2023-05-15 17:30 ` [PATCH v10 26/59] KVM: arm64: nv: Implement nested Stage-2 page table walk logic Marc Zyngier
@ 2023-05-15 17:30 ` Marc Zyngier
  2023-05-15 17:30 ` [PATCH v10 28/59] KVM: arm64: nv: Restrict S2 RD/WR permissions to match the guest's Marc Zyngier
                   ` (34 subsequent siblings)
  61 siblings, 0 replies; 95+ messages in thread
From: Marc Zyngier @ 2023-05-15 17:30 UTC (permalink / raw)
  To: kvmarm, kvm, linux-arm-kernel
  Cc: Alexandru Elisei, Andre Przywara, Chase Conklin,
	Christoffer Dall, Ganapatrao Kulkarni, Darren Hart, Jintack Lim,
	Russell King, Miguel Luis, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu

If we are faulting on a shadow stage 2 translation, we first walk the
guest hypervisor's stage 2 page table to see if it has a mapping. If
not, we inject a stage 2 page fault to the virtual EL2. Otherwise, we
create a mapping in the shadow stage 2 page table.

Note that we have to deal with two IPAs when we got a shadow stage 2
page fault. One is the address we faulted on, and is in the L2 guest
phys space. The other is from the guest stage-2 page table walk, and is
in the L1 guest phys space.  To differentiate them, we rename variables
so that fault_ipa is used for the former and ipa is used for the latter.
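
A small worked example (made-up numbers, not from the patch) of how
the two addresses combine once the guest S2 walk returns a block
mapping: the block output address supplies the upper bits, and the
faulting L2 IPA supplies the offset within the block.

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint64_t fault_ipa  = 0x80123456;	/* L2 guest phys address */
	uint64_t block_out  = 0x40000000;	/* from the guest S2 walk */
	uint64_t block_size = 2 * 1024 * 1024;	/* 2MiB block mapping */

	uint64_t ipa = (block_out & ~(block_size - 1)) |
		       (fault_ipa & (block_size - 1));

	/* Prints "L1 IPA = 0x40123456" */
	printf("L1 IPA = 0x%llx\n", (unsigned long long)ipa);
	return 0;
}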

Co-developed-by: Christoffer Dall <christoffer.dall@linaro.org>
Co-developed-by: Jintack Lim <jintack.lim@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
[maz: rewrote this multiple times...]
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_emulate.h |  6 ++
 arch/arm64/include/asm/kvm_nested.h  | 19 ++++++
 arch/arm64/kvm/mmu.c                 | 89 ++++++++++++++++++++++++----
 arch/arm64/kvm/nested.c              | 48 +++++++++++++++
 4 files changed, 152 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index c64980d33c8b..f4ae0c61fa1c 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -652,4 +652,10 @@ static inline bool vcpu_has_feature(struct kvm_vcpu *vcpu, int feature)
 	return test_bit(feature, vcpu->arch.features);
 }
 
+static inline bool kvm_is_shadow_s2_fault(struct kvm_vcpu *vcpu)
+{
+	return (vcpu->arch.hw_mmu != &vcpu->kvm->arch.mmu &&
+		vcpu->arch.hw_mmu->nested_stage2_enabled);
+}
+
 #endif /* __ARM64_KVM_EMULATE_H__ */
diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
index 5306f7ca5e07..d2cee4faa387 100644
--- a/arch/arm64/include/asm/kvm_nested.h
+++ b/arch/arm64/include/asm/kvm_nested.h
@@ -76,8 +76,27 @@ struct kvm_s2_trans {
 	u64 upper_attr;
 };
 
+static inline phys_addr_t kvm_s2_trans_output(struct kvm_s2_trans *trans)
+{
+	return trans->output;
+}
+
+static inline unsigned long kvm_s2_trans_size(struct kvm_s2_trans *trans)
+{
+	return trans->block_size;
+}
+
+static inline u32 kvm_s2_trans_esr(struct kvm_s2_trans *trans)
+{
+	return trans->esr;
+}
+
 extern int kvm_walk_nested_s2(struct kvm_vcpu *vcpu, phys_addr_t gipa,
 			      struct kvm_s2_trans *result);
+extern int kvm_s2_handle_perm_fault(struct kvm_vcpu *vcpu,
+				    struct kvm_s2_trans *trans);
+extern int kvm_inject_s2_fault(struct kvm_vcpu *vcpu, u64 esr_el2);
+int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe);
 
 extern bool forward_smc_trap(struct kvm_vcpu *vcpu);
 extern bool __check_nv_sr_forward(struct kvm_vcpu *vcpu);
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 3268936eec40..bc6f004d4ffb 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1253,14 +1253,16 @@ static bool kvm_vma_mte_allowed(struct vm_area_struct *vma)
 }
 
 static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
-			  struct kvm_memory_slot *memslot, unsigned long hva,
-			  unsigned long fault_status)
+			  struct kvm_s2_trans *nested,
+			  struct kvm_memory_slot *memslot,
+			  unsigned long hva, unsigned long fault_status)
 {
 	int ret = 0;
 	bool write_fault, writable, force_pte = false;
 	bool exec_fault, mte_allowed;
 	bool device = false;
 	unsigned long mmu_seq;
+	phys_addr_t ipa = fault_ipa;
 	struct kvm *kvm = vcpu->kvm;
 	struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
 	struct vm_area_struct *vma;
@@ -1345,10 +1347,38 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	}
 
 	vma_pagesize = 1UL << vma_shift;
+
+	if (nested) {
+		unsigned long max_map_size;
+
+		max_map_size = force_pte ? PUD_SIZE : PAGE_SIZE;
+
+		ipa = kvm_s2_trans_output(nested);
+
+		/*
+		 * If we're about to create a shadow stage 2 entry, then we
+		 * can only create a block mapping if the guest stage 2 page
+		 * table uses at least as big a mapping.
+		 */
+		max_map_size = min(kvm_s2_trans_size(nested), max_map_size);
+
+		/*
+		 * Be careful that if the mapping size falls between
+		 * two host sizes, take the smallest of the two.
+		 */
+		if (max_map_size >= PMD_SIZE && max_map_size < PUD_SIZE)
+			max_map_size = PMD_SIZE;
+		else if (max_map_size >= PAGE_SIZE && max_map_size < PMD_SIZE)
+			max_map_size = PAGE_SIZE;
+
+		force_pte = (max_map_size == PAGE_SIZE);
+		vma_pagesize = min(vma_pagesize, (long)max_map_size);
+	}
+
 	if (vma_pagesize == PMD_SIZE || vma_pagesize == PUD_SIZE)
 		fault_ipa &= ~(vma_pagesize - 1);
 
-	gfn = fault_ipa >> PAGE_SHIFT;
+	gfn = ipa >> PAGE_SHIFT;
 	mte_allowed = kvm_vma_mte_allowed(vma);
 
 	/* Don't use the VMA after the unlock -- it may have vanished */
@@ -1499,8 +1529,10 @@ static void handle_access_fault(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
  */
 int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 {
+	struct kvm_s2_trans nested_trans, *nested = NULL;
 	unsigned long fault_status;
-	phys_addr_t fault_ipa;
+	phys_addr_t fault_ipa; /* The address we faulted on */
+	phys_addr_t ipa; /* Always the IPA in the L1 guest phys space */
 	struct kvm_memory_slot *memslot;
 	unsigned long hva;
 	bool is_iabt, write_fault, writable;
@@ -1509,7 +1541,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 
 	fault_status = kvm_vcpu_trap_get_fault_type(vcpu);
 
-	fault_ipa = kvm_vcpu_get_fault_ipa(vcpu);
+	ipa = fault_ipa = kvm_vcpu_get_fault_ipa(vcpu);
 	is_iabt = kvm_vcpu_trap_is_iabt(vcpu);
 
 	if (fault_status == ESR_ELx_FSC_FAULT) {
@@ -1550,6 +1582,12 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 	if (fault_status != ESR_ELx_FSC_FAULT &&
 	    fault_status != ESR_ELx_FSC_PERM &&
 	    fault_status != ESR_ELx_FSC_ACCESS) {
+		/*
+		 * We must never see an address size fault on a shadow stage 2
+		 * page table walk, because we would have injected an addr
+		 * size fault when we walked the nested s2 page tables and
+		 * never created the shadow entry.
+		 */
 		kvm_err("Unsupported FSC: EC=%#x xFSC=%#lx ESR_EL2=%#lx\n",
 			kvm_vcpu_trap_get_class(vcpu),
 			(unsigned long)kvm_vcpu_trap_get_fault(vcpu),
@@ -1559,7 +1597,37 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 
 	idx = srcu_read_lock(&vcpu->kvm->srcu);
 
-	gfn = fault_ipa >> PAGE_SHIFT;
+	/*
+	 * We may have faulted on a shadow stage 2 page table if we are
+	 * running a nested guest.  In this case, we have to resolve the L2
+	 * IPA to the L1 IPA first, before knowing what kind of memory should
+	 * back the L1 IPA.
+	 *
+	 * If the shadow stage 2 page table walk faults, then we simply inject
+	 * this to the guest and carry on.
+	 */
+	if (kvm_is_shadow_s2_fault(vcpu)) {
+		u32 esr;
+
+		ret = kvm_walk_nested_s2(vcpu, fault_ipa, &nested_trans);
+		if (ret) {
+			esr = kvm_s2_trans_esr(&nested_trans);
+			kvm_inject_s2_fault(vcpu, esr);
+			goto out_unlock;
+		}
+
+		ret = kvm_s2_handle_perm_fault(vcpu, &nested_trans);
+		if (ret) {
+			esr = kvm_s2_trans_esr(&nested_trans);
+			kvm_inject_s2_fault(vcpu, esr);
+			goto out_unlock;
+		}
+
+		ipa = kvm_s2_trans_output(&nested_trans);
+		nested = &nested_trans;
+	}
+
+	gfn = ipa >> PAGE_SHIFT;
 	memslot = gfn_to_memslot(vcpu->kvm, gfn);
 	hva = gfn_to_hva_memslot_prot(memslot, gfn, &writable);
 	write_fault = kvm_is_write_fault(vcpu);
@@ -1603,13 +1671,13 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 		 * faulting VA. This is always 12 bits, irrespective
 		 * of the page size.
 		 */
-		fault_ipa |= kvm_vcpu_get_hfar(vcpu) & ((1 << 12) - 1);
-		ret = io_mem_abort(vcpu, fault_ipa);
+		ipa |= kvm_vcpu_get_hfar(vcpu) & ((1 << 12) - 1);
+		ret = io_mem_abort(vcpu, ipa);
 		goto out_unlock;
 	}
 
 	/* Userspace should not be able to register out-of-bounds IPAs */
-	VM_BUG_ON(fault_ipa >= kvm_phys_size(vcpu->arch.hw_mmu));
+	VM_BUG_ON(ipa >= kvm_phys_size(vcpu->arch.hw_mmu));
 
 	if (fault_status == ESR_ELx_FSC_ACCESS) {
 		handle_access_fault(vcpu, fault_ipa);
@@ -1617,7 +1685,8 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 		goto out_unlock;
 	}
 
-	ret = user_mem_abort(vcpu, fault_ipa, memslot, hva, fault_status);
+	ret = user_mem_abort(vcpu, fault_ipa, nested,
+			     memslot, hva, fault_status);
 	if (ret == 0)
 		ret = 1;
 out:
diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
index 9ab06055cd35..2538e0c632f2 100644
--- a/arch/arm64/kvm/nested.c
+++ b/arch/arm64/kvm/nested.c
@@ -111,6 +111,15 @@ static u32 compute_fsc(int level, u32 fsc)
 	return fsc | (level & 0x3);
 }
 
+static int esr_s2_fault(struct kvm_vcpu *vcpu, int level, u32 fsc)
+{
+	u32 esr;
+
+	esr = kvm_vcpu_get_esr(vcpu) & ~ESR_ELx_FSC;
+	esr |= compute_fsc(level, fsc);
+	return esr;
+}
+
 static int check_base_s2_limits(struct s2_walk_info *wi,
 				int level, int input_size, int stride)
 {
@@ -475,6 +484,45 @@ void kvm_vcpu_put_hw_mmu(struct kvm_vcpu *vcpu)
 	}
 }
 
+/*
+ * Returns non-zero if the permission fault is handled by injecting it into
+ * the next-level hypervisor.
+ */
+int kvm_s2_handle_perm_fault(struct kvm_vcpu *vcpu, struct kvm_s2_trans *trans)
+{
+	unsigned long fault_status = kvm_vcpu_trap_get_fault_type(vcpu);
+	bool forward_fault = false;
+
+	trans->esr = 0;
+
+	if (fault_status != ESR_ELx_FSC_PERM)
+		return 0;
+
+	if (kvm_vcpu_trap_is_iabt(vcpu)) {
+		forward_fault = (trans->upper_attr & BIT(54));
+	} else {
+		bool write_fault = kvm_is_write_fault(vcpu);
+
+		forward_fault = ((write_fault && !trans->writable) ||
+				 (!write_fault && !trans->readable));
+	}
+
+	if (forward_fault) {
+		trans->esr = esr_s2_fault(vcpu, trans->level, ESR_ELx_FSC_PERM);
+		return 1;
+	}
+
+	return 0;
+}
+
+int kvm_inject_s2_fault(struct kvm_vcpu *vcpu, u64 esr_el2)
+{
+	vcpu_write_sys_reg(vcpu, vcpu->arch.fault.far_el2, FAR_EL2);
+	vcpu_write_sys_reg(vcpu, vcpu->arch.fault.hpfar_el2, HPFAR_EL2);
+
+	return kvm_inject_nested_sync(vcpu, esr_el2);
+}
+
 void kvm_arch_flush_shadow_all(struct kvm *kvm)
 {
 	int i;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [PATCH v10 28/59] KVM: arm64: nv: Restrict S2 RD/WR permissions to match the guest's
  2023-05-15 17:30 [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support Marc Zyngier
                   ` (26 preceding siblings ...)
  2023-05-15 17:30 ` [PATCH v10 27/59] KVM: arm64: nv: Handle shadow stage 2 page faults Marc Zyngier
@ 2023-05-15 17:30 ` Marc Zyngier
  2023-05-15 17:30 ` [PATCH v10 29/59] KVM: arm64: nv: Unmap/flush shadow stage 2 page tables Marc Zyngier
                   ` (33 subsequent siblings)
  61 siblings, 0 replies; 95+ messages in thread
From: Marc Zyngier @ 2023-05-15 17:30 UTC (permalink / raw)
  To: kvmarm, kvm, linux-arm-kernel
  Cc: Alexandru Elisei, Andre Przywara, Chase Conklin,
	Christoffer Dall, Ganapatrao Kulkarni, Darren Hart, Jintack Lim,
	Russell King, Miguel Luis, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu

When mapping a page in a shadow stage-2, special care must be taken
not to be more permissive than the guest is: the shadow mapping must
not be writable or readable when the guest's own stage-2 doesn't grant
that permission.

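For review purposes, the core of the change condenses to the following
sketch (taken from the mmu.c hunk below, with the surrounding context
elided):

	/* Clamp the shadow S2 permissions to the guest's own S2 */
	if (nested) {
		writable &= kvm_s2_trans_writable(nested);
		if (!kvm_s2_trans_readable(nested))
			prot &= ~KVM_PGTABLE_PROT_R;
	}

Execute permission is handled the same way via kvm_s2_trans_executable()
when deciding whether to set KVM_PGTABLE_PROT_X.
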
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_nested.h | 15 +++++++++++++++
 arch/arm64/kvm/mmu.c                | 14 +++++++++++++-
 arch/arm64/kvm/nested.c             |  2 +-
 3 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
index d2cee4faa387..f20d272fcd6d 100644
--- a/arch/arm64/include/asm/kvm_nested.h
+++ b/arch/arm64/include/asm/kvm_nested.h
@@ -91,6 +91,21 @@ static inline u32 kvm_s2_trans_esr(struct kvm_s2_trans *trans)
 	return trans->esr;
 }
 
+static inline bool kvm_s2_trans_readable(struct kvm_s2_trans *trans)
+{
+	return trans->readable;
+}
+
+static inline bool kvm_s2_trans_writable(struct kvm_s2_trans *trans)
+{
+	return trans->writable;
+}
+
+static inline bool kvm_s2_trans_executable(struct kvm_s2_trans *trans)
+{
+	return !(trans->upper_attr & BIT(54));
+}
+
 extern int kvm_walk_nested_s2(struct kvm_vcpu *vcpu, phys_addr_t gipa,
 			      struct kvm_s2_trans *result);
 extern int kvm_s2_handle_perm_fault(struct kvm_vcpu *vcpu,
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index bc6f004d4ffb..1e19c59b8235 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1427,6 +1427,17 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	if (exec_fault && device)
 		return -ENOEXEC;
 
+	/*
+	 * Potentially reduce shadow S2 permissions to match the guest's own
+	 * S2. For exec faults, we'd only reach this point if the guest
+	 * actually allowed it (see kvm_s2_handle_perm_fault).
+	 */
+	if (nested) {
+		writable &= kvm_s2_trans_writable(nested);
+		if (!kvm_s2_trans_readable(nested))
+			prot &= ~KVM_PGTABLE_PROT_R;
+	}
+
 	read_lock(&kvm->mmu_lock);
 	pgt = vcpu->arch.hw_mmu->pgt;
 	if (mmu_invalidate_retry(kvm, mmu_seq))
@@ -1469,7 +1480,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 
 	if (device)
 		prot |= KVM_PGTABLE_PROT_DEVICE;
-	else if (cpus_have_const_cap(ARM64_HAS_CACHE_DIC))
+	else if (cpus_have_const_cap(ARM64_HAS_CACHE_DIC) &&
+		 (!nested || kvm_s2_trans_executable(nested)))
 		prot |= KVM_PGTABLE_PROT_X;
 
 	/*
diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
index 2538e0c632f2..73c0be25345a 100644
--- a/arch/arm64/kvm/nested.c
+++ b/arch/arm64/kvm/nested.c
@@ -499,7 +499,7 @@ int kvm_s2_handle_perm_fault(struct kvm_vcpu *vcpu, struct kvm_s2_trans *trans)
 		return 0;
 
 	if (kvm_vcpu_trap_is_iabt(vcpu)) {
-		forward_fault = (trans->upper_attr & BIT(54));
+		forward_fault = !kvm_s2_trans_executable(trans);
 	} else {
 		bool write_fault = kvm_is_write_fault(vcpu);
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [PATCH v10 29/59] KVM: arm64: nv: Unmap/flush shadow stage 2 page tables
  2023-05-15 17:30 [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support Marc Zyngier
                   ` (27 preceding siblings ...)
  2023-05-15 17:30 ` [PATCH v10 28/59] KVM: arm64: nv: Restrict S2 RD/WR permissions to match the guest's Marc Zyngier
@ 2023-05-15 17:30 ` Marc Zyngier
  2023-09-14 13:10   ` Ganapatrao Kulkarni
  2023-05-15 17:30 ` [PATCH v10 30/59] KVM: arm64: nv: Set a handler for the system instruction traps Marc Zyngier
                   ` (32 subsequent siblings)
  61 siblings, 1 reply; 95+ messages in thread
From: Marc Zyngier @ 2023-05-15 17:30 UTC (permalink / raw)
  To: kvmarm, kvm, linux-arm-kernel
  Cc: Alexandru Elisei, Andre Przywara, Chase Conklin,
	Christoffer Dall, Ganapatrao Kulkarni, Darren Hart, Jintack Lim,
	Russell King, Miguel Luis, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu

From: Christoffer Dall <christoffer.dall@linaro.org>

Unmap/flush shadow stage 2 page tables for the nested VMs as well as the
stage 2 page table for the guest hypervisor.

Note: A bunch of the code in mmu.c relating to MMU notifiers is
currently dealt with in an extremely abrupt way, for example by clearing
out an entire shadow stage-2 table. This will be handled in a more
efficient way using the reverse mapping feature in a later version of
the patch series.

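All three new helpers share the same shape: walk every valid nested MMU
and apply the operation across its whole IPA range. A condensed sketch,
matching the nested.c hunk below:

	/* expects kvm->mmu_lock to be held */
	void kvm_nested_s2_unmap(struct kvm *kvm)
	{
		int i;

		for (i = 0; i < kvm->arch.nested_mmus_size; i++) {
			struct kvm_s2_mmu *mmu = &kvm->arch.nested_mmus[i];

			if (kvm_s2_mmu_valid(mmu))
				kvm_unmap_stage2_range(mmu, 0, kvm_phys_size(mmu));
		}
	}

kvm_nested_s2_wp() and kvm_nested_s2_flush() differ only in the range
operation they apply.
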
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_mmu.h    |  3 +++
 arch/arm64/include/asm/kvm_nested.h |  3 +++
 arch/arm64/kvm/mmu.c                | 30 ++++++++++++++++++----
 arch/arm64/kvm/nested.c             | 39 +++++++++++++++++++++++++++++
 4 files changed, 70 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 896acdf98e71..d155b3871c4c 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -169,6 +169,8 @@ int create_hyp_io_mappings(phys_addr_t phys_addr, size_t size,
 			   void __iomem **haddr);
 int create_hyp_exec_mappings(phys_addr_t phys_addr, size_t size,
 			     void **haddr);
+void kvm_stage2_flush_range(struct kvm_s2_mmu *mmu,
+			    phys_addr_t addr, phys_addr_t end);
 void __init free_hyp_pgds(void);
 
 void kvm_unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size);
@@ -177,6 +179,7 @@ int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu, unsigned long t
 void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu);
 int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
 			  phys_addr_t pa, unsigned long size, bool writable);
+void kvm_stage2_wp_range(struct kvm_s2_mmu *mmu, phys_addr_t addr, phys_addr_t end);
 
 int kvm_handle_guest_abort(struct kvm_vcpu *vcpu);
 
diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
index f20d272fcd6d..d330b947d48a 100644
--- a/arch/arm64/include/asm/kvm_nested.h
+++ b/arch/arm64/include/asm/kvm_nested.h
@@ -111,6 +111,9 @@ extern int kvm_walk_nested_s2(struct kvm_vcpu *vcpu, phys_addr_t gipa,
 extern int kvm_s2_handle_perm_fault(struct kvm_vcpu *vcpu,
 				    struct kvm_s2_trans *trans);
 extern int kvm_inject_s2_fault(struct kvm_vcpu *vcpu, u64 esr_el2);
+extern void kvm_nested_s2_wp(struct kvm *kvm);
+extern void kvm_nested_s2_unmap(struct kvm *kvm);
+extern void kvm_nested_s2_flush(struct kvm *kvm);
 int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe);
 
 extern bool forward_smc_trap(struct kvm_vcpu *vcpu);
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 1e19c59b8235..8144bb9b9ec8 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -245,13 +245,19 @@ void kvm_unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size)
 	__unmap_stage2_range(mmu, start, size, true);
 }
 
+void kvm_stage2_flush_range(struct kvm_s2_mmu *mmu,
+			    phys_addr_t addr, phys_addr_t end)
+{
+	stage2_apply_range_resched(mmu, addr, end, kvm_pgtable_stage2_flush);
+}
+
 static void stage2_flush_memslot(struct kvm *kvm,
 				 struct kvm_memory_slot *memslot)
 {
 	phys_addr_t addr = memslot->base_gfn << PAGE_SHIFT;
 	phys_addr_t end = addr + PAGE_SIZE * memslot->npages;
 
-	stage2_apply_range_resched(&kvm->arch.mmu, addr, end, kvm_pgtable_stage2_flush);
+	kvm_stage2_flush_range(&kvm->arch.mmu, addr, end);
 }
 
 /**
@@ -274,6 +280,8 @@ static void stage2_flush_vm(struct kvm *kvm)
 	kvm_for_each_memslot(memslot, bkt, slots)
 		stage2_flush_memslot(kvm, memslot);
 
+	kvm_nested_s2_flush(kvm);
+
 	write_unlock(&kvm->mmu_lock);
 	srcu_read_unlock(&kvm->srcu, idx);
 }
@@ -888,6 +896,8 @@ void stage2_unmap_vm(struct kvm *kvm)
 	kvm_for_each_memslot(memslot, bkt, slots)
 		stage2_unmap_memslot(kvm, memslot);
 
+	kvm_nested_s2_unmap(kvm);
+
 	write_unlock(&kvm->mmu_lock);
 	mmap_read_unlock(current->mm);
 	srcu_read_unlock(&kvm->srcu, idx);
@@ -987,12 +997,12 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
 }
 
 /**
- * stage2_wp_range() - write protect stage2 memory region range
+ * kvm_stage2_wp_range() - write protect stage2 memory region range
  * @mmu:        The KVM stage-2 MMU pointer
  * @addr:	Start address of range
  * @end:	End address of range
  */
-static void stage2_wp_range(struct kvm_s2_mmu *mmu, phys_addr_t addr, phys_addr_t end)
+void kvm_stage2_wp_range(struct kvm_s2_mmu *mmu, phys_addr_t addr, phys_addr_t end)
 {
 	stage2_apply_range_resched(mmu, addr, end, kvm_pgtable_stage2_wrprotect);
 }
@@ -1023,7 +1033,8 @@ static void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot)
 	end = (memslot->base_gfn + memslot->npages) << PAGE_SHIFT;
 
 	write_lock(&kvm->mmu_lock);
-	stage2_wp_range(&kvm->arch.mmu, start, end);
+	kvm_stage2_wp_range(&kvm->arch.mmu, start, end);
+	kvm_nested_s2_wp(kvm);
 	write_unlock(&kvm->mmu_lock);
 	kvm_flush_remote_tlbs(kvm);
 }
@@ -1047,7 +1058,7 @@ static void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
 	phys_addr_t start = (base_gfn +  __ffs(mask)) << PAGE_SHIFT;
 	phys_addr_t end = (base_gfn + __fls(mask) + 1) << PAGE_SHIFT;
 
-	stage2_wp_range(&kvm->arch.mmu, start, end);
+	kvm_stage2_wp_range(&kvm->arch.mmu, start, end);
 }
 
 /*
@@ -1062,6 +1073,7 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
 		gfn_t gfn_offset, unsigned long mask)
 {
 	kvm_mmu_write_protect_pt_masked(kvm, slot, gfn_offset, mask);
+	kvm_nested_s2_wp(kvm);
 }
 
 static void kvm_send_hwpoison_signal(unsigned long address, short lsb)
@@ -1720,6 +1732,7 @@ bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
 			     (range->end - range->start) << PAGE_SHIFT,
 			     range->may_block);
 
+	kvm_nested_s2_unmap(kvm);
 	return false;
 }
 
@@ -1754,6 +1767,7 @@ bool kvm_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
 			       PAGE_SIZE, __pfn_to_phys(pfn),
 			       KVM_PGTABLE_PROT_R, NULL, 0);
 
+	kvm_nested_s2_unmap(kvm);
 	return false;
 }
 
@@ -1772,6 +1786,11 @@ bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
 					range->start << PAGE_SHIFT);
 	pte = __pte(kpte);
+
+	/*
+	 * TODO: Handle nested_mmu structures here using the reverse mapping
+	 * in a later version of the patch series.
+	 */
 	return pte_valid(pte) && pte_young(pte);
 }
 
 bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
@@ -2004,6 +2023,7 @@ void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
 
 	write_lock(&kvm->mmu_lock);
 	kvm_unmap_stage2_range(&kvm->arch.mmu, gpa, size);
+	kvm_nested_s2_unmap(kvm);
 	write_unlock(&kvm->mmu_lock);
 }
 
diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
index 73c0be25345a..948ac5b9638c 100644
--- a/arch/arm64/kvm/nested.c
+++ b/arch/arm64/kvm/nested.c
@@ -523,6 +523,45 @@ int kvm_inject_s2_fault(struct kvm_vcpu *vcpu, u64 esr_el2)
 	return kvm_inject_nested_sync(vcpu, esr_el2);
 }
 
+/* expects kvm->mmu_lock to be held */
+void kvm_nested_s2_wp(struct kvm *kvm)
+{
+	int i;
+
+	for (i = 0; i < kvm->arch.nested_mmus_size; i++) {
+		struct kvm_s2_mmu *mmu = &kvm->arch.nested_mmus[i];
+
+		if (kvm_s2_mmu_valid(mmu))
+			kvm_stage2_wp_range(mmu, 0, kvm_phys_size(mmu));
+	}
+}
+
+/* expects kvm->mmu_lock to be held */
+void kvm_nested_s2_unmap(struct kvm *kvm)
+{
+	int i;
+
+	for (i = 0; i < kvm->arch.nested_mmus_size; i++) {
+		struct kvm_s2_mmu *mmu = &kvm->arch.nested_mmus[i];
+
+		if (kvm_s2_mmu_valid(mmu))
+			kvm_unmap_stage2_range(mmu, 0, kvm_phys_size(mmu));
+	}
+}
+
+/* expects kvm->mmu_lock to be held */
+void kvm_nested_s2_flush(struct kvm *kvm)
+{
+	int i;
+
+	for (i = 0; i < kvm->arch.nested_mmus_size; i++) {
+		struct kvm_s2_mmu *mmu = &kvm->arch.nested_mmus[i];
+
+		if (kvm_s2_mmu_valid(mmu))
+			kvm_stage2_flush_range(mmu, 0, kvm_phys_size(mmu));
+	}
+}
+
 void kvm_arch_flush_shadow_all(struct kvm *kvm)
 {
 	int i;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [PATCH v10 30/59] KVM: arm64: nv: Set a handler for the system instruction traps
  2023-05-15 17:30 [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support Marc Zyngier
                   ` (28 preceding siblings ...)
  2023-05-15 17:30 ` [PATCH v10 29/59] KVM: arm64: nv: Unmap/flush shadow stage 2 page tables Marc Zyngier
@ 2023-05-15 17:30 ` Marc Zyngier
  2023-05-15 17:30 ` [PATCH v10 31/59] KVM: arm64: nv: Trap and emulate AT instructions from virtual EL2 Marc Zyngier
                   ` (31 subsequent siblings)
  61 siblings, 0 replies; 95+ messages in thread
From: Marc Zyngier @ 2023-05-15 17:30 UTC (permalink / raw)
  To: kvmarm, kvm, linux-arm-kernel
  Cc: Alexandru Elisei, Andre Przywara, Chase Conklin,
	Christoffer Dall, Ganapatrao Kulkarni, Darren Hart, Jintack Lim,
	Russell King, Miguel Luis, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu

When the HCR_EL2.NV bit is set, execution of the EL2 translation regime
address translation instructions and TLB maintenance instructions is
trapped to EL2. In addition, execution of the EL1 translation regime
address translation instructions and TLB maintenance instructions that
are only accessible from EL2 and above is trapped to EL2. In these
cases, ESR_EL2.EC will be set to 0x18.

Rework the system instruction emulation framework to handle potentially
all system instruction traps other than MSR/MRS instructions. Those
system instructions would be AT and TLBI instructions controlled by
the HCR_EL2.NV, AT, and TTLB bits.

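The resulting dispatch in kvm_handle_sys_reg() boils down to routing on
Op0 (a condensed sketch of the sys_regs.c hunk below):

	params = esr_sys64_to_params(esr);
	params.regval = vcpu_get_reg(vcpu, Rt);

	/* Op0 == 2 or 3: a genuine MRS/MSR system register access */
	if (params.Op0 == 2 || params.Op0 == 3) {
		if (!emulate_sys_reg(vcpu, &params))
			return 1;

		if (!params.is_write)
			vcpu_set_reg(vcpu, Rt, params.regval);

		return 1;
	}

	/* Hints, PSTATE (Op0 == 0) and System instructions (Op0 == 1) */
	return emulate_sys_instr(vcpu, &params);
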
Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
[maz: squashed two patches together, redispatched various bits around]
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/sys_regs.c | 47 +++++++++++++++++++++++++++++++--------
 1 file changed, 38 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 52d798dbc202..1d3676d7b09f 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1940,10 +1940,6 @@ static bool access_spsr_el2(struct kvm_vcpu *vcpu,
  * guest...
  */
 static const struct sys_reg_desc sys_reg_descs[] = {
-	{ SYS_DESC(SYS_DC_ISW), access_dcsw },
-	{ SYS_DESC(SYS_DC_CSW), access_dcsw },
-	{ SYS_DESC(SYS_DC_CISW), access_dcsw },
-
 	DBG_BCR_BVR_WCR_WVR_EL1(0),
 	DBG_BCR_BVR_WCR_WVR_EL1(1),
 	{ SYS_DESC(SYS_MDCCINT_EL1), trap_debug_regs, reset_val, MDCCINT_EL1, 0 },
@@ -2422,6 +2418,12 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 	EL2_REG(SP_EL2, NULL, reset_unknown, 0),
 };
 
+static struct sys_reg_desc sys_insn_descs[] = {
+	{ SYS_DESC(SYS_DC_ISW), access_dcsw },
+	{ SYS_DESC(SYS_DC_CSW), access_dcsw },
+	{ SYS_DESC(SYS_DC_CISW), access_dcsw },
+};
+
 static bool trap_dbgdidr(struct kvm_vcpu *vcpu,
 			struct sys_reg_params *p,
 			const struct sys_reg_desc *r)
@@ -3112,6 +3114,24 @@ static bool emulate_sys_reg(struct kvm_vcpu *vcpu,
 	return false;
 }
 
+static int emulate_sys_instr(struct kvm_vcpu *vcpu, struct sys_reg_params *p)
+{
+	const struct sys_reg_desc *r;
+
+	/* Search from the system instruction table. */
+	r = find_reg(p, sys_insn_descs, ARRAY_SIZE(sys_insn_descs));
+
+	if (likely(r)) {
+		perform_access(vcpu, p, r);
+	} else {
+		kvm_err("Unsupported guest sys instruction at: %lx\n",
+			*vcpu_pc(vcpu));
+		print_sys_reg_instr(p);
+		kvm_inject_undefined(vcpu);
+	}
+	return 1;
+}
+
 /**
  * kvm_reset_sys_regs - sets system registers to reset value
  * @vcpu: The VCPU pointer
@@ -3129,7 +3149,8 @@ void kvm_reset_sys_regs(struct kvm_vcpu *vcpu)
 }
 
 /**
- * kvm_handle_sys_reg -- handles a mrs/msr trap on a guest sys_reg access
+ * kvm_handle_sys_reg -- handles a system instruction or mrs/msr instruction
+ *			 trap on a guest execution
  * @vcpu: The VCPU pointer
  */
 int kvm_handle_sys_reg(struct kvm_vcpu *vcpu)
@@ -3146,12 +3167,19 @@ int kvm_handle_sys_reg(struct kvm_vcpu *vcpu)
 	params = esr_sys64_to_params(esr);
 	params.regval = vcpu_get_reg(vcpu, Rt);
 
-	if (!emulate_sys_reg(vcpu, &params))
+	/* System register? */
+	if (params.Op0 == 2 || params.Op0 == 3) {
+		if (!emulate_sys_reg(vcpu, &params))
+			return 1;
+
+		if (!params.is_write)
+			vcpu_set_reg(vcpu, Rt, params.regval);
+
 		return 1;
+	}
 
-	if (!params.is_write)
-		vcpu_set_reg(vcpu, Rt, params.regval);
-	return 1;
+	/* Hints, PSTATE (Op0 == 0) and System instructions (Op0 == 1) */
+	return emulate_sys_instr(vcpu, &params);
 }
 
 /******************************************************************************
@@ -3543,6 +3571,7 @@ int __init kvm_sys_reg_table_init(void)
 	valid &= check_sysreg_table(cp15_regs, ARRAY_SIZE(cp15_regs), true);
 	valid &= check_sysreg_table(cp15_64_regs, ARRAY_SIZE(cp15_64_regs), true);
 	valid &= check_sysreg_table(invariant_sys_regs, ARRAY_SIZE(invariant_sys_regs), false);
+	valid &= check_sysreg_table(sys_insn_descs, ARRAY_SIZE(sys_insn_descs), false);
 
 	if (!valid)
 		return -EINVAL;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [PATCH v10 31/59] KVM: arm64: nv: Trap and emulate AT instructions from virtual EL2
  2023-05-15 17:30 [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support Marc Zyngier
                   ` (29 preceding siblings ...)
  2023-05-15 17:30 ` [PATCH v10 30/59] KVM: arm64: nv: Set a handler for the system instruction traps Marc Zyngier
@ 2023-05-15 17:30 ` Marc Zyngier
  2023-05-15 17:30 ` [PATCH v10 32/59] KVM: arm64: nv: Trap and emulate TLBI " Marc Zyngier
                   ` (30 subsequent siblings)
  61 siblings, 0 replies; 95+ messages in thread
From: Marc Zyngier @ 2023-05-15 17:30 UTC (permalink / raw)
  To: kvmarm, kvm, linux-arm-kernel
  Cc: Alexandru Elisei, Andre Przywara, Chase Conklin,
	Christoffer Dall, Ganapatrao Kulkarni, Darren Hart, Jintack Lim,
	Russell King, Miguel Luis, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu

When supporting nested virtualization a guest hypervisor executing AT
instructions must be trapped and emulated by the host hypervisor,
because untrapped AT instructions operating on S1E1 will use the wrong
translation regime (the one used to emulate virtual EL2 in EL1 instead
of virtual EL1) and AT instructions operating on S12 will not work from
EL1.

This patch does several things.

1. List and define all AT system instructions to emulate and document
the emulation design.

2. Implement AT instruction handling logic in EL2. This will be used to
emulate AT instructions executed in the virtual EL2.

AT instruction emulation works by loading the proper processor
context, which depends on the trapped instruction and the virtual
HCR_EL2, into the EL1 virtual memory control registers and then
executing AT instructions in EL2. Note that ctxt->hw_sys_regs is
expected to have the proper processor context before calling the
handling functions (__kvm_at_s1e01/__kvm_at_s1e2) implemented in this
patch.

3. Emulate AT S1E[01] instructions by issuing the same instructions in
EL2. We set the physical EL1 registers, NV and NV1 bits as described in
the AT instruction emulation overview.

4. Emulate AT S12E[01] instructions in two steps: first, do the stage-1
translation by reusing the existing AT emulation functions. Second, do
the stage-2 translation by walking the guest hypervisor's stage-2 page
table in software. Record the translation result in PAR_EL1.

5. Emulate AT S1E2 instructions by issuing the corresponding S1E1
instructions in EL2. We set the physical EL1 registers and the HCR_EL2
register as described in the AT instruction emulation overview.

6. Forward system instruction traps to the virtual EL2 if the corresponding
virtual AT bit is set in the virtual HCR_EL2.

  [ Much logic above has been reworked by Marc Zyngier ]

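As a condensed sketch of the S12 emulation described above (taken from
handle_s12() in the sys_regs.c hunk below), the stage-1 walk is done by
the hardware and the stage-2 walk in software:

	__kvm_at_s1e01(vcpu, op, va);		/* stage 1: HW AT */
	par = vcpu_read_sys_reg(vcpu, PAR_EL1);
	if (par & 1)
		return true;			/* S1 aborted, PAR_EL1 set */

	ipa = (par & GENMASK_ULL(47, 12)) | (va & GENMASK_ULL(11, 0));
	ret = kvm_walk_nested_s2(vcpu, ipa, &out);	/* stage 2: SW walk */
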
Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
---
 arch/arm64/include/asm/kvm_arm.h |   1 +
 arch/arm64/include/asm/kvm_asm.h |   2 +
 arch/arm64/kvm/Makefile          |   2 +-
 arch/arm64/kvm/at.c              | 219 +++++++++++++++++++++++++++++++
 arch/arm64/kvm/hyp/vhe/switch.c  |  13 +-
 arch/arm64/kvm/sys_regs.c        | 217 ++++++++++++++++++++++++++++++
 6 files changed, 451 insertions(+), 3 deletions(-)
 create mode 100644 arch/arm64/kvm/at.c

diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index af24380e02d9..604604a0fdc0 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -128,6 +128,7 @@
 #define VTCR_EL2_TG0_16K	TCR_TG0_16K
 #define VTCR_EL2_TG0_64K	TCR_TG0_64K
 #define VTCR_EL2_SH0_MASK	TCR_SH0_MASK
+#define VTCR_EL2_SH0_SHIFT	TCR_SH0_SHIFT
 #define VTCR_EL2_SH0_INNER	TCR_SH0_INNER
 #define VTCR_EL2_ORGN0_MASK	TCR_ORGN0_MASK
 #define VTCR_EL2_ORGN0_WBWA	TCR_ORGN0_WBWA
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 43c3bc0f9544..373558ea24d8 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -228,6 +228,8 @@ extern void __kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu, phys_addr_t ipa,
 extern void __kvm_tlb_flush_vmid(struct kvm_s2_mmu *mmu);
 
 extern void __kvm_timer_set_cntvoff(u64 cntvoff);
+extern void __kvm_at_s1e01(struct kvm_vcpu *vcpu, u32 op, u64 vaddr);
+extern void __kvm_at_s1e2(struct kvm_vcpu *vcpu, u32 op, u64 vaddr);
 
 extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
 
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index c0c050e53157..c2717a8f12f5 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -14,7 +14,7 @@ kvm-y += arm.o mmu.o mmio.o psci.o hypercalls.o pvtime.o \
 	 inject_fault.o va_layout.o handle_exit.o \
 	 guest.o debug.o reset.o sys_regs.o stacktrace.o \
 	 vgic-sys-reg-v3.o fpsimd.o pkvm.o \
-	 arch_timer.o trng.o vmid.o emulate-nested.o nested.o \
+	 arch_timer.o trng.o vmid.o emulate-nested.o nested.o at.o \
 	 vgic/vgic.o vgic/vgic-init.o \
 	 vgic/vgic-irqfd.o vgic/vgic-v2.o \
 	 vgic/vgic-v3.o vgic/vgic-v4.o \
diff --git a/arch/arm64/kvm/at.c b/arch/arm64/kvm/at.c
new file mode 100644
index 000000000000..6d47dd409384
--- /dev/null
+++ b/arch/arm64/kvm/at.c
@@ -0,0 +1,219 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2017 - Linaro Ltd
+ * Author: Jintack Lim <jintack.lim@linaro.org>
+ */
+
+#include <asm/kvm_hyp.h>
+#include <asm/kvm_mmu.h>
+
+struct mmu_config {
+	u64	ttbr0;
+	u64	ttbr1;
+	u64	tcr;
+	u64	sctlr;
+	u64	vttbr;
+	u64	vtcr;
+	u64	hcr;
+};
+
+static void __mmu_config_save(struct mmu_config *config)
+{
+	config->ttbr0	= read_sysreg_el1(SYS_TTBR0);
+	config->ttbr1	= read_sysreg_el1(SYS_TTBR1);
+	config->tcr	= read_sysreg_el1(SYS_TCR);
+	config->sctlr	= read_sysreg_el1(SYS_SCTLR);
+	config->vttbr	= read_sysreg(vttbr_el2);
+	config->vtcr	= read_sysreg(vtcr_el2);
+	config->hcr	= read_sysreg(hcr_el2);
+}
+
+static void __mmu_config_restore(struct mmu_config *config)
+{
+	write_sysreg_el1(config->ttbr0,	SYS_TTBR0);
+	write_sysreg_el1(config->ttbr1,	SYS_TTBR1);
+	write_sysreg_el1(config->tcr,	SYS_TCR);
+	write_sysreg_el1(config->sctlr,	SYS_SCTLR);
+	write_sysreg(config->vttbr,	vttbr_el2);
+	write_sysreg(config->vtcr,	vtcr_el2);
+	write_sysreg(config->hcr,	hcr_el2);
+
+	isb();
+}
+
+void __kvm_at_s1e01(struct kvm_vcpu *vcpu, u32 op, u64 vaddr)
+{
+	struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
+	struct mmu_config config;
+	struct kvm_s2_mmu *mmu;
+	bool fail;
+
+	write_lock(&vcpu->kvm->mmu_lock);
+
+	/*
+	 * If HCR_EL2.{E2H,TGE} == {1,1}, the MMU context is already
+	 * the right one (as we trapped from vEL2).
+	 */
+	if (vcpu_el2_e2h_is_set(vcpu) && vcpu_el2_tge_is_set(vcpu))
+		goto skip_mmu_switch;
+
+	/*
+	 * FIXME: Obtaining the S2 MMU for a L2 is horribly racy, and
+	 * we may not find it (recycled by another vcpu, for example).
+	 * See the other FIXME comment below about the need for a SW
+	 * PTW in this case.
+	 */
+	mmu = lookup_s2_mmu(vcpu);
+	if (WARN_ON(!mmu))
+		goto out;
+
+	/* We've trapped, so everything is live on the CPU. */
+	__mmu_config_save(&config);
+
+	write_sysreg_el1(ctxt_sys_reg(ctxt, TTBR0_EL1),	SYS_TTBR0);
+	write_sysreg_el1(ctxt_sys_reg(ctxt, TTBR1_EL1),	SYS_TTBR1);
+	write_sysreg_el1(ctxt_sys_reg(ctxt, TCR_EL1),	SYS_TCR);
+	write_sysreg_el1(ctxt_sys_reg(ctxt, SCTLR_EL1),	SYS_SCTLR);
+	write_sysreg(kvm_get_vttbr(mmu),		vttbr_el2);
+	/*
+	 * REVISIT: do we need anything from the guest's VTCR_EL2? It
+	 * looks like keeping the host's configuration is the right
+	 * thing to do at this stage (and we could avoid saving/restoring
+	 * it). Keep the host's version for now.
+	 */
+	write_sysreg((config.hcr & ~HCR_TGE) | HCR_VM,	hcr_el2);
+
+	isb();
+
+skip_mmu_switch:
+
+	switch (op) {
+	case OP_AT_S1E1R:
+	case OP_AT_S1E1RP:
+		fail = __kvm_at("s1e1r", vaddr);
+		break;
+	case OP_AT_S1E1W:
+	case OP_AT_S1E1WP:
+		fail = __kvm_at("s1e1w", vaddr);
+		break;
+	case OP_AT_S1E0R:
+		fail = __kvm_at("s1e0r", vaddr);
+		break;
+	case OP_AT_S1E0W:
+		fail = __kvm_at("s1e0w", vaddr);
+		break;
+	default:
+		WARN_ON_ONCE(1);
+		break;
+	}
+
+	if (!fail)
+		ctxt_sys_reg(ctxt, PAR_EL1) = read_sysreg(par_el1);
+	else
+		ctxt_sys_reg(ctxt, PAR_EL1) = SYS_PAR_EL1_F;
+
+	/*
+	 * Failed? let's leave the building now.
+	 *
+	 * FIXME: how about a failed translation because the shadow S2
+	 * wasn't populated? We may need to perform a SW PTW,
+	 * populating our shadow S2 and retry the instruction.
+	 */
+	if (ctxt_sys_reg(ctxt, PAR_EL1) & SYS_PAR_EL1_F)
+		goto nopan;
+
+	/* No PAN? No problem. */
+	if (!vcpu_el2_e2h_is_set(vcpu) || !(*vcpu_cpsr(vcpu) & PSR_PAN_BIT))
+		goto nopan;
+
+	/*
+	 * For PAN-involved AT operations, perform the same
+	 * translation, using EL0 this time.
+	 */
+	switch (op) {
+	case OP_AT_S1E1RP:
+		fail = __kvm_at("s1e0r", vaddr);
+		break;
+	case OP_AT_S1E1WP:
+		fail = __kvm_at("s1e0w", vaddr);
+		break;
+	default:
+		goto nopan;
+	}
+
+	/*
+	 * If the EL0 translation has succeeded, we need to pretend
+	 * the AT operation has failed, as the PAN setting forbids
+	 * such a translation.
+	 *
+	 * FIXME: we hardcode a Level-3 permission fault. We really
+	 * should return the real fault level.
+	 */
+	if (fail || !(read_sysreg(par_el1) & SYS_PAR_EL1_F))
+		ctxt_sys_reg(ctxt, PAR_EL1) = (0xf << 1) | SYS_PAR_EL1_F;
+
+nopan:
+	if (!(vcpu_el2_e2h_is_set(vcpu) && vcpu_el2_tge_is_set(vcpu)))
+		__mmu_config_restore(&config);
+
+out:
+	write_unlock(&vcpu->kvm->mmu_lock);
+}
+
+void __kvm_at_s1e2(struct kvm_vcpu *vcpu, u32 op, u64 vaddr)
+{
+	struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
+	struct mmu_config config;
+	struct kvm_s2_mmu *mmu;
+	u64 val;
+
+	write_lock(&vcpu->kvm->mmu_lock);
+
+	mmu = &vcpu->kvm->arch.mmu;
+
+	/* We've trapped, so everything is live on the CPU. */
+	__mmu_config_save(&config);
+
+	if (vcpu_el2_e2h_is_set(vcpu)) {
+		write_sysreg_el1(ctxt_sys_reg(ctxt, TTBR0_EL2),	SYS_TTBR0);
+		write_sysreg_el1(ctxt_sys_reg(ctxt, TTBR1_EL2),	SYS_TTBR1);
+		write_sysreg_el1(ctxt_sys_reg(ctxt, TCR_EL2),	SYS_TCR);
+		write_sysreg_el1(ctxt_sys_reg(ctxt, SCTLR_EL2),	SYS_SCTLR);
+
+		val = config.hcr;
+	} else {
+		write_sysreg_el1(ctxt_sys_reg(ctxt, TTBR0_EL2),	SYS_TTBR0);
+		val = translate_tcr_el2_to_tcr_el1(ctxt_sys_reg(ctxt, TCR_EL2));
+		write_sysreg_el1(val, SYS_TCR);
+		val = translate_sctlr_el2_to_sctlr_el1(ctxt_sys_reg(ctxt, SCTLR_EL2));
+		write_sysreg_el1(val, SYS_SCTLR);
+
+		val = config.hcr | HCR_NV | HCR_NV1;
+	}
+
+	write_sysreg(kvm_get_vttbr(mmu),		vttbr_el2);
+	/* FIXME: write S2 MMU VTCR_EL2? */
+	write_sysreg((val & ~HCR_TGE) | HCR_VM,		hcr_el2);
+
+	isb();
+
+	switch (op) {
+	case OP_AT_S1E2R:
+		asm volatile("at s1e1r, %0" : : "r" (vaddr));
+		break;
+	case OP_AT_S1E2W:
+		asm volatile("at s1e1w, %0" : : "r" (vaddr));
+		break;
+	default:
+		WARN_ON_ONCE(1);
+		break;
+	}
+
+	isb();
+
+	/* FIXME: handle failed translation due to shadow S2 */
+	ctxt_sys_reg(ctxt, PAR_EL1) = read_sysreg(par_el1);
+
+	__mmu_config_restore(&config);
+	write_unlock(&vcpu->kvm->mmu_lock);
+}
diff --git a/arch/arm64/kvm/hyp/vhe/switch.c b/arch/arm64/kvm/hyp/vhe/switch.c
index d2fcd881a267..83d0df0c39a5 100644
--- a/arch/arm64/kvm/hyp/vhe/switch.c
+++ b/arch/arm64/kvm/hyp/vhe/switch.c
@@ -44,9 +44,10 @@ static void __activate_traps(struct kvm_vcpu *vcpu)
 		if (!vcpu_el2_e2h_is_set(vcpu)) {
 			/*
 			 * For a guest hypervisor on v8.0, trap and emulate
-			 * the EL1 virtual memory control register accesses.
+			 * the EL1 virtual memory control register accesses
+			 * as well as the AT S1 operations.
 			 */
-			hcr |= HCR_TVM | HCR_TRVM | HCR_NV1;
+			hcr |= HCR_TVM | HCR_TRVM | HCR_AT | HCR_NV1;
 		} else {
 			/*
 			 * For a guest hypervisor on v8.1 (VHE), allow to
@@ -71,6 +72,14 @@ static void __activate_traps(struct kvm_vcpu *vcpu)
 			hcr &= ~HCR_TVM;
 
 			hcr |= vhcr_el2 & (HCR_TVM | HCR_TRVM);
+
+			/*
+			 * If we're using the EL1 translation regime
+			 * (TGE clear), then ensure that AT S1 ops are
+			 * trapped too.
+			 */
+			if (!vcpu_el2_tge_is_set(vcpu))
+				hcr |= HCR_AT;
 		}
 	}
 
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 1d3676d7b09f..859fa9140974 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -2418,10 +2418,227 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 	EL2_REG(SP_EL2, NULL, reset_unknown, 0),
 };
 
+static bool handle_s1e01(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
+			 const struct sys_reg_desc *r)
+{
+	int sys_encoding = sys_insn(p->Op0, p->Op1, p->CRn, p->CRm, p->Op2);
+
+	__kvm_at_s1e01(vcpu, sys_encoding, p->regval);
+
+	return true;
+}
+
+static bool handle_s1e2(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
+			const struct sys_reg_desc *r)
+{
+	int sys_encoding = sys_insn(p->Op0, p->Op1, p->CRn, p->CRm, p->Op2);
+
+	__kvm_at_s1e2(vcpu, sys_encoding, p->regval);
+
+	return true;
+}
+
+static u64 setup_par_aborted(u32 esr)
+{
+	u64 par = 0;
+
+	/* S [9]: fault in the stage 2 translation */
+	par |= (1 << 9);
+	/* FST [6:1]: Fault status code  */
+	par |= (esr << 1);
+	/* F [0]: translation is aborted */
+	par |= 1;
+
+	return par;
+}
+
+static u64 setup_par_completed(struct kvm_vcpu *vcpu, struct kvm_s2_trans *out)
+{
+	u64 par, vtcr_sh0;
+
+	/* F [0]: Translation is completed successfully */
+	par = 0;
+	/* ATTR [63:56] */
+	par |= out->upper_attr;
+	/* PA [47:12] */
+	par |= out->output & GENMASK_ULL(47, 12);
+	/* RES1 [11] */
+	par |= (1UL << 11);
+	/* SH [8:7]: Shareability attribute */
+	vtcr_sh0 = vcpu_read_sys_reg(vcpu, VTCR_EL2) & VTCR_EL2_SH0_MASK;
+	par |= (vtcr_sh0 >> VTCR_EL2_SH0_SHIFT) << 7;
+
+	return par;
+}
+
+static bool handle_s12(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
+		       const struct sys_reg_desc *r, bool write)
+{
+	u64 par, va;
+	u32 esr, op;
+	phys_addr_t ipa;
+	struct kvm_s2_trans out;
+	int ret;
+
+	/* Do the stage-1 translation */
+	va = p->regval;
+	op = sys_insn(p->Op0, p->Op1, p->CRn, p->CRm, p->Op2);
+	switch (op) {
+	case OP_AT_S12E1R:
+		op = OP_AT_S1E1R;
+		break;
+	case OP_AT_S12E1W:
+		op = OP_AT_S1E1W;
+		break;
+	case OP_AT_S12E0R:
+		op = OP_AT_S1E0R;
+		break;
+	case OP_AT_S12E0W:
+		op = OP_AT_S1E0W;
+		break;
+	default:
+		WARN_ON_ONCE(1);
+		return true;
+	}
+
+	__kvm_at_s1e01(vcpu, op, va);
+	par = vcpu_read_sys_reg(vcpu, PAR_EL1);
+	if (par & 1) {
+		/* The stage-1 translation aborted */
+		return true;
+	}
+
+	/* Do the stage-2 translation */
+	ipa = (par & GENMASK_ULL(47, 12)) | (va & GENMASK_ULL(11, 0));
+	out.esr = 0;
+	ret = kvm_walk_nested_s2(vcpu, ipa, &out);
+	if (ret < 0)
+		return false;
+
+	/* Check if the stage-2 PTW is aborted */
+	if (out.esr) {
+		esr = out.esr;
+		goto s2_trans_abort;
+	}
+
+	/* Check the access permission */
+	if ((!write && !out.readable) || (write && !out.writable)) {
+		esr = ESR_ELx_FSC_PERM;
+		esr |= out.level & 0x3;
+		goto s2_trans_abort;
+	}
+
+	vcpu_write_sys_reg(vcpu, setup_par_completed(vcpu, &out), PAR_EL1);
+	return true;
+
+s2_trans_abort:
+	vcpu_write_sys_reg(vcpu, setup_par_aborted(esr), PAR_EL1);
+	return true;
+}
+
+static bool handle_s12r(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
+			const struct sys_reg_desc *r)
+{
+	return handle_s12(vcpu, p, r, false);
+}
+
+static bool handle_s12w(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
+			const struct sys_reg_desc *r)
+{
+	return handle_s12(vcpu, p, r, true);
+}
+
+/*
+ * AT instruction emulation
+ *
+ * We emulate AT instructions executed in the virtual EL2.
+ * Basic strategy for the stage-1 translation emulation is to load proper
+ * context, which depends on the trapped instruction and the virtual HCR_EL2,
+ * to the EL1 virtual memory control registers and execute S1E[01] instructions
+ * in EL2. See below for more detail.
+ *
+ * For the stage-2 translation, which is necessary for S12E[01] emulation,
+ * we walk the guest hypervisor's stage-2 page table in software.
+ *
+ * The stage-1 translation emulations can be divided into two groups depending
+ * on the translation regime.
+ *
+ * 1. EL2 AT instructions: S1E2x
+ * +-----------------------------------------------------------------------+
+ * |                             |         Setting for the emulation       |
+ * | Virtual HCR_EL2.E2H on trap |-----------------------------------------+
+ * |                             | Phys EL1 regs | Phys NV, NV1 | Phys TGE |
+ * |-----------------------------------------------------------------------|
+ * |             0               |     vEL2      |    (1, 1)    |    0     |
+ * |             1               |     vEL2      |    (0, 0)    |    0     |
+ * +-----------------------------------------------------------------------+
+ *
+ * We emulate the EL2 AT instructions by loading virtual EL2 context
+ * to the EL1 virtual memory control registers and executing corresponding
+ * EL1 AT instructions.
+ *
+ * We set physical NV and NV1 bits to use EL2 page table format for non-VHE
+ * guest hypervisor (i.e. HCR_EL2.E2H == 0). As a VHE guest hypervisor uses the
+ * EL1 page table format, we don't set those bits.
+ *
+ * We should clear physical TGE bit not to use the EL2 translation regime when
+ * the host uses the VHE feature.
+ *
+ *
+ * 2. EL0/EL1 AT instructions: S1E[01]x, S12E1x
+ * +----------------------------------------------------------------------+
+ * |   Virtual HCR_EL2 on trap  |        Setting for the emulation        |
+ * |----------------------------------------------------------------------+
+ * | (vE2H, vTGE) | (vNV, vNV1) | Phys EL1 regs | Phys NV, NV1 | Phys TGE |
+ * |----------------------------------------------------------------------|
+ * |    (0, 0)*   |   (0, 0)    |      vEL1     |    (0, 0)    |    0     |
+ * |    (0, 0)    |   (1, 1)    |      vEL1     |    (1, 1)    |    0     |
+ * |    (1, 1)    |   (0, 0)    |      vEL2     |    (0, 0)    |    0     |
+ * |    (1, 1)    |   (1, 1)    |      vEL2     |    (1, 1)    |    0     |
+ * +----------------------------------------------------------------------+
+ *
+ * *For (0, 0) in the 'Virtual HCR_EL2 on trap' column, it actually means
+ *  (1, 1). Keep them (0, 0) just for readability.
+ *
+ * We set physical EL1 virtual memory control registers depending on
+ * (vE2H, vTGE) pair. When the pair is (0, 0) where AT instructions are
+ * supposed to use EL0/EL1 translation regime, we load the EL1 registers with
+ * the virtual EL1 registers (i.e. EL1 registers from the guest hypervisor's
+ * point of view). When the pair is (1, 1), however, AT instructions are defined
+ * to apply EL2 translation regime. To emulate this behavior, we load the EL1
+ * registers with the virtual EL2 context. (i.e the shadow registers)
+ *
+ * We respect the virtual NV and NV1 bit for the emulation. When those bits are
+ * set, it means that a guest hypervisor would like to use EL2 page table format
+ * for the EL1 translation regime. We emulate this by setting the physical
+ * NV and NV1 bits.
+ */
+
+#define SYS_INSN(insn, access_fn)					\
+	{								\
+		SYS_DESC(OP_##insn),					\
+		.access = (access_fn),					\
+	}
+
 static struct sys_reg_desc sys_insn_descs[] = {
 	{ SYS_DESC(SYS_DC_ISW), access_dcsw },
+
+	SYS_INSN(AT_S1E1R, handle_s1e01),
+	SYS_INSN(AT_S1E1W, handle_s1e01),
+	SYS_INSN(AT_S1E0R, handle_s1e01),
+	SYS_INSN(AT_S1E0W, handle_s1e01),
+	SYS_INSN(AT_S1E1RP, handle_s1e01),
+	SYS_INSN(AT_S1E1WP, handle_s1e01),
+
 	{ SYS_DESC(SYS_DC_CSW), access_dcsw },
 	{ SYS_DESC(SYS_DC_CISW), access_dcsw },
+
+	SYS_INSN(AT_S1E2R, handle_s1e2),
+	SYS_INSN(AT_S1E2W, handle_s1e2),
+	SYS_INSN(AT_S12E1R, handle_s12r),
+	SYS_INSN(AT_S12E1W, handle_s12w),
+	SYS_INSN(AT_S12E0R, handle_s12r),
+	SYS_INSN(AT_S12E0W, handle_s12w),
 };
 
 static bool trap_dbgdidr(struct kvm_vcpu *vcpu,
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [PATCH v10 32/59] KVM: arm64: nv: Trap and emulate TLBI instructions from virtual EL2
  2023-05-15 17:30 [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support Marc Zyngier
                   ` (30 preceding siblings ...)
  2023-05-15 17:30 ` [PATCH v10 31/59] KVM: arm64: nv: Trap and emulate AT instructions from virtual EL2 Marc Zyngier
@ 2023-05-15 17:30 ` Marc Zyngier
  2023-05-15 17:30 ` [PATCH v10 33/59] KVM: arm64: nv: Fold guest's HCR_EL2 configuration into the host's Marc Zyngier
                   ` (29 subsequent siblings)
  61 siblings, 0 replies; 95+ messages in thread
From: Marc Zyngier @ 2023-05-15 17:30 UTC (permalink / raw)
  To: kvmarm, kvm, linux-arm-kernel
  Cc: Alexandru Elisei, Andre Przywara, Chase Conklin,
	Christoffer Dall, Ganapatrao Kulkarni, Darren Hart, Jintack Lim,
	Russell King, Miguel Luis, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu

From: Jintack Lim <jintack.lim@linaro.org>

When supporting nested virtualization a guest hypervisor executing TLBI
instructions must be trapped and emulated by the host hypervisor,
because the guest hypervisor can only affect physical TLB entries
relating to its own execution environment (virtual EL2 in EL1) but not
those of the nested guests, as required by the semantics of the
instructions; TLBI instructions might also result in updates
(invalidations) to shadow page tables.

This patch does several things.

1. Emulate TLBI ALLE2(IS) instruction executed in the virtual EL2. Since
we emulate the virtual EL2 in the EL1, we invalidate EL1&0 regime stage
1 TLB entries with setting vttbr_el2 having the VMID of the virtual EL2.

2. Emulate TLBI VAE2* instruction executed in the virtual EL2. Based on the
same principle as TLBI ALLE2 instruction, we can simply emulate those
instructions by executing corresponding VAE1* instructions with the
virtual EL2's VMID assigned by the host hypervisor.

Note that we are able to emulate TLBI ALLE2IS precisely by only
invalidating stage 1 TLB entries via the TLBI VMALLE1IS instruction, but
to make it simple, we reuse the existing function, __kvm_tlb_flush_vmid(),
which invalidates both stage 1 and stage 2 TLB entries.

3. TLBI ALLE1(IS) instruction invalidates all EL1&0 regime stage 1 and 2
TLB entries (on all PEs in the same Inner Shareable domain). To emulate
these instructions, we first need to clear all the mappings in the
shadow page tables since executing those instructions implies the change
of mappings in the stage 2 page tables maintained by the guest
hypervisor.  We then need to invalidate all EL1&0 regime stage 1 and 2
TLB entries of all VMIDs, which are assigned by the host hypervisor, for
this VM.

4. Based on the same principle as TLBI ALLE1(IS) emulation, we clear the
mappings in the shadow stage-2 page tables and invalidate TLB entries.
But this time we do it only for the current VMID from the guest
hypervisor's perspective, not for all VMIDs.

5. Based on the same principle as TLBI ALLE1(IS) and TLBI VMALLS12E1(IS)
emulation, we clear the mappings in the shadow stage-2 page tables and
invalidate TLB entries. We do it only for one mapping for the current
VMID from the guest hypervisor's view.

6. Even though a guest hypervisor can execute TLBI instructions that are
accessible at EL1 without being trapped, doing so is wrong; all those
TLBI instructions operate on the current VMID, and when running a guest
hypervisor the current VMID is the guest hypervisor's own, not the one
from the virtual vttbr_el2. So letting a guest hypervisor execute those
TLBI instructions results in invalidating its own TLB entries and
leaving stale TLB entries unhandled.

Therefore we trap and emulate those TLBI instructions. The emulation is
simple; we find a shadow VMID mapped to the virtual vttbr_el2, set it in
the physical vttbr_el2, then execute the same instruction in EL2.

Note that, with the rework below, HCR_EL2.TTLB is now set when running a
guest hypervisor that uses the EL1 translation regime, so these EL1 TLBI
instructions are trapped as well; a condensed sketch of the EL2->EL1
conversion follows.

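A minimal sketch of that conversion (condensed from __kvm_tlb_vae2is()
in the tlb.c hunk below): the trapped EL2 TLBI is replayed as its EL1
counterpart under the right VMID, always upgraded to Inner Shareable:

	dsb(ishst);				/* make PT updates visible */
	__tlb_switch_to_guest(mmu, &cxt);	/* switch to the target VMID */

	switch (sys_encoding) {
	case OP_TLBI_VAE2:
	case OP_TLBI_VAE2IS:
		__tlbi(vae1is, va);		/* EL1 counterpart, IS-upgraded */
		break;
	case OP_TLBI_VALE2:
	case OP_TLBI_VALE2IS:
		__tlbi(vale1is, va);
		break;
	}

	dsb(ish);
	isb();
	__tlb_switch_to_host(&cxt);
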
  [ Changes performed by Marc Zyngier:

    The TLBI handling code more or less directly execute the same
    instruction that has been trapped (with an EL2->EL1 conversion
    in the case of an EL2 TLBI), but that's unfortunately not enough:

    - TLBIs must be upgraded to the Inner Shareable domain to account
      for vcpu migration, just like we already have with HCR_EL2.FB.

    - The DSB instruction that synchronises these must thus be on
      the Inner Shareable domain as well.

    - Prior to executing the TLBI, we need another DSB ISHST to make
      sure that the update to the page tables is now visible.

      Ordering of system instructions fixed

    - The current TLB invalidation code is pretty buggy, as it assumes a
      page mapping. On the contrary, it is likely that the TLB invalidation
      will cover more than a single page, and the size should be decided
      by the guest's configuration (and not the host's).

      Since we don't cache the guest mapping sizes in the shadow PT yet,
      let's assume the worst case (a block mapping) and invalidate that.

      Take this opportunity to fix the decoding of the parameter (it
      isn't a straight IPA).

    - In general, we always emulate local TLB invalidations as being
      upgraded to the Inner Shareable domain so that we can easily
      deal with vcpu migration. This is consistent with the fact that
      we set HCR_EL2.FB when running non-nested VMs.

      So let's emulate TLBI ALLE2 as ALLE2IS.
  ]

  [ Changes performed by Christoffer Dall:

    Sometimes when we are invalidating the TLB for a certain S2 MMU
    context, this context can also have EL2 context associated with it
    and we have to invalidate this too.
  ]

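Several of the new handlers need to apply an operation to every shadow
S2 that carries a given VMID; a typical use of the new iterator looks
like this (condensed from handle_vmalls12e1is() in the sys_regs.c hunk
below):

	u64 vttbr = vcpu_read_sys_reg(vcpu, VTTBR_EL2);

	/* Unmap the whole IPA space of every shadow S2 with this VMID */
	kvm_s2_mmu_iterate_by_vmid(vcpu->kvm, get_vmid(vttbr),
				   &(union tlbi_info) {
					   .range = {
						   .start = 0,
						   .size = BIT_ULL(kvm_get_pa_bits(vcpu->kvm)),
					   },
				   },
				   s2_mmu_unmap_stage2_range);
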
Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_asm.h    |   2 +
 arch/arm64/include/asm/kvm_nested.h |   7 +
 arch/arm64/include/asm/sysreg.h     |   4 +
 arch/arm64/kvm/hyp/vhe/switch.c     |   8 +-
 arch/arm64/kvm/hyp/vhe/tlb.c        |  81 ++++++++++
 arch/arm64/kvm/mmu.c                |  17 +-
 arch/arm64/kvm/nested.c             |  35 ++++
 arch/arm64/kvm/sys_regs.c           | 238 ++++++++++++++++++++++++++++
 8 files changed, 387 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 373558ea24d8..d73f8d5a8a25 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -226,6 +226,8 @@ extern void __kvm_flush_cpu_context(struct kvm_s2_mmu *mmu);
 extern void __kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu, phys_addr_t ipa,
 				     int level);
 extern void __kvm_tlb_flush_vmid(struct kvm_s2_mmu *mmu);
+extern void __kvm_tlb_vae2is(struct kvm_s2_mmu *mmu, u64 va, u64 sys_encoding);
+extern void __kvm_tlb_el1_instr(struct kvm_s2_mmu *mmu, u64 val, u64 sys_encoding);
 
 extern void __kvm_timer_set_cntvoff(u64 cntvoff);
 extern void __kvm_at_s1e01(struct kvm_vcpu *vcpu, u32 op, u64 vaddr);
diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
index d330b947d48a..a4f626a011db 100644
--- a/arch/arm64/include/asm/kvm_nested.h
+++ b/arch/arm64/include/asm/kvm_nested.h
@@ -63,6 +63,13 @@ extern void kvm_init_nested(struct kvm *kvm);
 extern int kvm_vcpu_init_nested(struct kvm_vcpu *vcpu);
 extern void kvm_init_nested_s2_mmu(struct kvm_s2_mmu *mmu);
 extern struct kvm_s2_mmu *lookup_s2_mmu(struct kvm_vcpu *vcpu);
+
+union tlbi_info;
+
+extern void kvm_s2_mmu_iterate_by_vmid(struct kvm *kvm, u16 vmid,
+				       const union tlbi_info *info,
+				       void (*)(struct kvm_s2_mmu *,
+						const union tlbi_info *));
 extern void kvm_vcpu_load_hw_mmu(struct kvm_vcpu *vcpu);
 extern void kvm_vcpu_put_hw_mmu(struct kvm_vcpu *vcpu);
 
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 8be78a9b9b3b..9f4a3bf6f182 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -586,6 +586,10 @@
 #define OP_AT_S12E0W	sys_insn(AT_Op0, 4, AT_CRn, 8, 7)
 
 /* TLBI instructions */
+#define TLBI_Op0	1
+#define TLBI_Op1_EL1	0	/* Accessible from EL1 or higher */
+#define TLBI_Op1_EL2	4	/* Accessible from EL2 or higher */
+
 #define OP_TLBI_VMALLE1OS		sys_insn(1, 0, 8, 1, 0)
 #define OP_TLBI_VAE1OS			sys_insn(1, 0, 8, 1, 1)
 #define OP_TLBI_ASIDE1OS		sys_insn(1, 0, 8, 1, 2)
diff --git a/arch/arm64/kvm/hyp/vhe/switch.c b/arch/arm64/kvm/hyp/vhe/switch.c
index 83d0df0c39a5..c7d083407fae 100644
--- a/arch/arm64/kvm/hyp/vhe/switch.c
+++ b/arch/arm64/kvm/hyp/vhe/switch.c
@@ -47,7 +47,7 @@ static void __activate_traps(struct kvm_vcpu *vcpu)
 			 * the EL1 virtual memory control register accesses
 			 * as well as the AT S1 operations.
 			 */
-			hcr |= HCR_TVM | HCR_TRVM | HCR_AT | HCR_NV1;
+			hcr |= HCR_TVM | HCR_TRVM | HCR_AT | HCR_TTLB | HCR_NV1;
 		} else {
 			/*
 			 * For a guest hypervisor on v8.1 (VHE), allow to
@@ -75,11 +75,11 @@ static void __activate_traps(struct kvm_vcpu *vcpu)
 
 			/*
 			 * If we're using the EL1 translation regime
-			 * (TGE clear), then ensure that AT S1 ops are
-			 * trapped too.
+			 * (TGE clear), then ensure that AT S1 and
+			 * TLBI E1 ops are trapped too.
 			 */
 			if (!vcpu_el2_tge_is_set(vcpu))
-				hcr |= HCR_AT;
+				hcr |= HCR_AT | HCR_TTLB;
 		}
 	}
 
diff --git a/arch/arm64/kvm/hyp/vhe/tlb.c b/arch/arm64/kvm/hyp/vhe/tlb.c
index 24cef9b87f9e..c4389db4cc22 100644
--- a/arch/arm64/kvm/hyp/vhe/tlb.c
+++ b/arch/arm64/kvm/hyp/vhe/tlb.c
@@ -161,3 +161,84 @@ void __kvm_flush_vm_context(void)
 
 	dsb(ish);
 }
+
+void __kvm_tlb_vae2is(struct kvm_s2_mmu *mmu, u64 va, u64 sys_encoding)
+{
+	struct tlb_inv_context cxt;
+
+	dsb(ishst);
+
+	/* Switch to requested VMID */
+	__tlb_switch_to_guest(mmu, &cxt);
+
+	/*
+	 * Execute the EL1 version of TLBI VAE2* instruction, forcing
+	 * an upgrade to the Inner Shareable domain in order to
+	 * perform the invalidation on all CPUs.
+	 */
+	switch (sys_encoding) {
+	case OP_TLBI_VAE2:
+	case OP_TLBI_VAE2IS:
+		__tlbi(vae1is, va);
+		break;
+	case OP_TLBI_VALE2:
+	case OP_TLBI_VALE2IS:
+		__tlbi(vale1is, va);
+		break;
+	default:
+		break;
+	}
+	dsb(ish);
+	isb();
+
+	__tlb_switch_to_host(&cxt);
+}
+
+void __kvm_tlb_el1_instr(struct kvm_s2_mmu *mmu, u64 val, u64 sys_encoding)
+{
+	struct tlb_inv_context cxt;
+
+	dsb(ishst);
+
+	/* Switch to requested VMID */
+	__tlb_switch_to_guest(mmu, &cxt);
+
+	/*
+	 * Execute the same instruction as the guest hypervisor did,
+	 * expanding the scope of local TLB invalidations to the Inner
+	 * Shareable domain so that it takes place on all CPUs. This
+	 * is equivalent to having HCR_EL2.FB set.
+	 */
+	switch (sys_encoding) {
+	case OP_TLBI_VMALLE1:
+	case OP_TLBI_VMALLE1IS:
+		__tlbi(vmalle1is);
+		break;
+	case OP_TLBI_VAE1:
+	case OP_TLBI_VAE1IS:
+		__tlbi(vae1is, val);
+		break;
+	case OP_TLBI_ASIDE1:
+	case OP_TLBI_ASIDE1IS:
+		__tlbi(aside1is, val);
+		break;
+	case OP_TLBI_VAAE1:
+	case OP_TLBI_VAAE1IS:
+		__tlbi(vaae1is, val);
+		break;
+	case OP_TLBI_VALE1:
+	case OP_TLBI_VALE1IS:
+		__tlbi(vale1is, val);
+		break;
+	case OP_TLBI_VAALE1:
+	case OP_TLBI_VAALE1IS:
+		__tlbi(vaale1is, val);
+		break;
+	default:
+		break;
+	}
+	dsb(ish);
+	isb();
+
+	__tlb_switch_to_host(&cxt);
+}
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 8144bb9b9ec8..6cd52397b97e 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -89,7 +89,22 @@ static bool memslot_is_logging(struct kvm_memory_slot *memslot)
 void kvm_flush_remote_tlbs(struct kvm *kvm)
 {
 	++kvm->stat.generic.remote_tlb_flush_requests;
-	kvm_call_hyp(__kvm_tlb_flush_vmid, &kvm->arch.mmu);
+
+	if (!kvm->arch.nested_mmus) {
+		/*
+		 * For a normal (i.e. non-nested) guest, flush entries for the
+		 * given VMID.
+		 */
+		kvm_call_hyp(__kvm_tlb_flush_vmid, &kvm->arch.mmu);
+	} else {
+		/*
+		 * When supporting nested virtualization, we can have multiple
+		 * VMIDs in play for each VCPU in the VM, so it's really not
+		 * worth it to try to quiesce the system and flush all the
+		 * VMIDs that may be in use, instead just nuke the whole thing.
+		 */
+		kvm_call_hyp(__kvm_flush_vm_context);
+	}
 }
 
 static bool kvm_is_device_pfn(unsigned long pfn)
diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
index 948ac5b9638c..29cb28845aad 100644
--- a/arch/arm64/kvm/nested.c
+++ b/arch/arm64/kvm/nested.c
@@ -354,6 +354,41 @@ int kvm_walk_nested_s2(struct kvm_vcpu *vcpu, phys_addr_t gipa,
 	return ret;
 }
 
+/*
+ * We can have multiple *different* MMU contexts with the same VMID:
+ *
+ * - S2 being enabled or not, hence differing by the HCR_EL2.VM bit
+ *
+ * - Multiple vcpus using private S2s (huh huh...), hence differing by the
+ *   VTTBR_EL2.BADDR address
+ *
+ * - A combination of the above...
+ *
+ * We can always identify which MMU context to pick at run-time.  However,
+ * TLB invalidation involving a VMID must take action on all the TLBs using
+ * this particular VMID. This translates into applying the same invalidation
+ * operation to all the contexts that are using this VMID. Moar phun!
+ */
+void kvm_s2_mmu_iterate_by_vmid(struct kvm *kvm, u16 vmid,
+				const union tlbi_info *info,
+				void (*tlbi_callback)(struct kvm_s2_mmu *,
+						      const union tlbi_info *))
+{
+	write_lock(&kvm->mmu_lock);
+
+	for (int i = 0; i < kvm->arch.nested_mmus_size; i++) {
+		struct kvm_s2_mmu *mmu = &kvm->arch.nested_mmus[i];
+
+		if (!kvm_s2_mmu_valid(mmu))
+			continue;
+
+		if (vmid == get_vmid(mmu->tlb_vttbr))
+			tlbi_callback(mmu, info);
+	}
+
+	write_unlock(&kvm->mmu_lock);
+}
+
 /* Must be called with kvm->mmu_lock held */
 struct kvm_s2_mmu *lookup_s2_mmu(struct kvm_vcpu *vcpu)
 {
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 859fa9140974..1c66f0520ec3 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -2548,6 +2548,216 @@ static bool handle_s12w(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
 	return handle_s12(vcpu, p, r, true);
 }
 
+static bool handle_alle2is(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
+			   const struct sys_reg_desc *r)
+{
+	/*
+	 * To emulate invalidating all EL2 regime stage 1 TLB entries for all
+	 * PEs, executing TLBI VMALLE1IS is enough. But reuse the existing
+	 * interface for simplicity; invalidating stage 2 entries doesn't
+	 * affect correctness.
+	 */
+	__kvm_tlb_flush_vmid(&vcpu->kvm->arch.mmu);
+	return true;
+}
+
+static bool handle_vae2is(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
+			  const struct sys_reg_desc *r)
+{
+	int sys_encoding = sys_insn(p->Op0, p->Op1, p->CRn, p->CRm, p->Op2);
+
+	/*
+	 * Based on the same principle as TLBI ALLE2 instruction
+	 * emulation, we emulate TLBI VAE2* instructions by executing
+	 * corresponding TLBI VAE1* instructions with the virtual
+	 * EL2's VMID assigned by the host hypervisor.
+	 */
+	__kvm_tlb_vae2is(&vcpu->kvm->arch.mmu, p->regval, sys_encoding);
+	return true;
+}
+
+static bool handle_alle1is(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
+			   const struct sys_reg_desc *r)
+{
+	struct kvm_s2_mmu *mmu = &vcpu->kvm->arch.mmu;
+
+	write_lock(&vcpu->kvm->mmu_lock);
+
+	/*
+	 * Clear all mappings in the shadow page tables and invalidate the stage
+	 * 1 and 2 TLB entries via kvm_tlb_flush_vmid_ipa().
+	 */
+	kvm_nested_s2_unmap(vcpu->kvm);
+
+	if (atomic64_read(&mmu->vmid.id)) {
+		/*
+		 * Invalidate the stage 1 and 2 TLB entries for the host OS
+		 * in a VM only if there is one.
+		 */
+		__kvm_tlb_flush_vmid(mmu);
+	}
+
+	write_unlock(&vcpu->kvm->mmu_lock);
+
+	return true;
+}
+
+/* Only defined here as this is an internal "abstraction" */
+union tlbi_info {
+	struct {
+		u64	start;
+		u64	size;
+	} range;
+
+	struct {
+		u64	addr;
+	} ipa;
+
+	struct {
+		u64	addr;
+		u32	encoding;
+	} va;
+};
+
+static void s2_mmu_unmap_stage2_range(struct kvm_s2_mmu *mmu,
+				      const union tlbi_info *info)
+{
+	kvm_unmap_stage2_range(mmu, info->range.start, info->range.size);
+}
+
+static bool handle_vmalls12e1is(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
+				const struct sys_reg_desc *r)
+{
+	u64 limit, vttbr;
+
+	vttbr = vcpu_read_sys_reg(vcpu, VTTBR_EL2);
+	limit = BIT_ULL(kvm_get_pa_bits(vcpu->kvm));
+
+	kvm_s2_mmu_iterate_by_vmid(vcpu->kvm, get_vmid(vttbr),
+				   &(union tlbi_info) {
+					   .range = {
+						   .start = 0,
+						   .size = limit,
+					   },
+				   },
+				   s2_mmu_unmap_stage2_range);
+
+	return true;
+}
+
+static void s2_mmu_unmap_stage2_ipa(struct kvm_s2_mmu *mmu,
+				    const union tlbi_info *info)
+{
+	unsigned long max_size;
+	u64 base_addr;
+
+	/*
+	 * We drop a number of things from the supplied value:
+	 *
+	 * - NS bit: we're non-secure only.
+	 *
+	 * - TTL field: We already have the granule size from the
+	 *   VTCR_EL2.TG0 field, and the level is only relevant to the
+	 *   guest's S2PT.
+	 *
+	 * - IPA[51:48]: We don't support 52bit IPA just yet...
+	 *
+	 * And of course, shift the IPA to turn it into an actual address.
+	 */
+	base_addr = (info->ipa.addr & GENMASK_ULL(35, 0)) << 12;
+
+	/* Compute the maximum extent of the invalidation */
+	switch (mmu->tlb_vtcr & VTCR_EL2_TG0_MASK) {
+	case VTCR_EL2_TG0_4K:
+		max_size = SZ_1G;
+		break;
+	case VTCR_EL2_TG0_16K:
+		max_size = SZ_32M;
+		break;
+	case VTCR_EL2_TG0_64K:
+		/*
+		 * No, we do not support 52bit IPA in nested yet. Once
+		 * we do, this should be 4TB.
+		 */
+		/* FIXME: remove the 52bit PA support from the IDregs */
+		max_size = SZ_512M;
+		break;
+	default:
+		BUG();
+	}
+
+	base_addr &= ~(max_size - 1);
+
+	kvm_unmap_stage2_range(mmu, base_addr, max_size);
+}
+
+static bool handle_ipas2e1is(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
+			     const struct sys_reg_desc *r)
+{
+	u64 vttbr = vcpu_read_sys_reg(vcpu, VTTBR_EL2);
+
+	kvm_s2_mmu_iterate_by_vmid(vcpu->kvm, get_vmid(vttbr),
+				   &(union tlbi_info) {
+					   .ipa = {
+						   .addr = p->regval,
+					   },
+				   },
+				   s2_mmu_unmap_stage2_ipa);
+
+	return true;
+}
+
+static void s2_mmu_unmap_stage2_va(struct kvm_s2_mmu *mmu,
+				   const union tlbi_info *info)
+{
+	__kvm_tlb_el1_instr(mmu, info->va.addr, info->va.encoding);
+}
+
+static bool handle_tlbi_el1(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
+			    const struct sys_reg_desc *r)
+{
+	u32 sys_encoding = sys_insn(p->Op0, p->Op1, p->CRn, p->CRm, p->Op2);
+	u64 vttbr = vcpu_read_sys_reg(vcpu, VTTBR_EL2);
+
+	/*
+	 * If we're here, this is because we've trapped on an EL1 TLBI
+	 * instruction that affects the EL1 translation regime while
+	 * we're running in a context that doesn't allow us to let the
+	 * HW do its thing (aka vEL2):
+	 *
+	 * - HCR_EL2.E2H == 0 : a non-VHE guest
+	 * - HCR_EL2.{E2H,TGE} == { 1, 0 } : a VHE guest in guest mode
+	 *
+	 * We don't expect these helpers to ever be called when running
+	 * in a vEL1 context.
+	 */
+
+	WARN_ON(!vcpu_is_el2(vcpu));
+
+	if ((__vcpu_sys_reg(vcpu, HCR_EL2) & (HCR_E2H | HCR_TGE)) == (HCR_E2H | HCR_TGE)) {
+		mutex_lock(&vcpu->kvm->lock);
+		/*
+		 * ARMv8.4-NV allows the guest to change TGE behind
+		 * our back, so we always trap EL1 TLBIs from vEL2...
+		 */
+		__kvm_tlb_el1_instr(&vcpu->kvm->arch.mmu, p->regval, sys_encoding);
+		mutex_unlock(&vcpu->kvm->lock);
+
+		return true;
+	}
+
+	kvm_s2_mmu_iterate_by_vmid(vcpu->kvm, get_vmid(vttbr),
+				   &(union tlbi_info) {
+					   .va = {
+						   .addr = p->regval,
+						   .encoding = sys_encoding,
+					   },
+				   },
+				   s2_mmu_unmap_stage2_va);
+
+	return true;
+}
+
 /*
  * AT instruction emulation
  *
@@ -2633,12 +2843,40 @@ static struct sys_reg_desc sys_insn_descs[] = {
 	{ SYS_DESC(SYS_DC_CSW), access_dcsw },
 	{ SYS_DESC(SYS_DC_CISW), access_dcsw },
 
+	SYS_INSN(TLBI_VMALLE1IS, handle_tlbi_el1),
+	SYS_INSN(TLBI_VAE1IS, handle_tlbi_el1),
+	SYS_INSN(TLBI_ASIDE1IS, handle_tlbi_el1),
+	SYS_INSN(TLBI_VAAE1IS, handle_tlbi_el1),
+	SYS_INSN(TLBI_VALE1IS, handle_tlbi_el1),
+	SYS_INSN(TLBI_VAALE1IS, handle_tlbi_el1),
+	SYS_INSN(TLBI_VMALLE1, handle_tlbi_el1),
+	SYS_INSN(TLBI_VAE1, handle_tlbi_el1),
+	SYS_INSN(TLBI_ASIDE1, handle_tlbi_el1),
+	SYS_INSN(TLBI_VAAE1, handle_tlbi_el1),
+	SYS_INSN(TLBI_VALE1, handle_tlbi_el1),
+	SYS_INSN(TLBI_VAALE1, handle_tlbi_el1),
+
 	SYS_INSN(AT_S1E2R, handle_s1e2),
 	SYS_INSN(AT_S1E2W, handle_s1e2),
 	SYS_INSN(AT_S12E1R, handle_s12r),
 	SYS_INSN(AT_S12E1W, handle_s12w),
 	SYS_INSN(AT_S12E0R, handle_s12r),
 	SYS_INSN(AT_S12E0W, handle_s12w),
+
+	SYS_INSN(TLBI_IPAS2E1IS, handle_ipas2e1is),
+	SYS_INSN(TLBI_IPAS2LE1IS, handle_ipas2e1is),
+	SYS_INSN(TLBI_ALLE2IS, handle_alle2is),
+	SYS_INSN(TLBI_VAE2IS, handle_vae2is),
+	SYS_INSN(TLBI_ALLE1IS, handle_alle1is),
+	SYS_INSN(TLBI_VALE2IS, handle_vae2is),
+	SYS_INSN(TLBI_VMALLS12E1IS, handle_vmalls12e1is),
+	SYS_INSN(TLBI_IPAS2E1, handle_ipas2e1is),
+	SYS_INSN(TLBI_IPAS2LE1, handle_ipas2e1is),
+	SYS_INSN(TLBI_ALLE2, handle_alle2is),
+	SYS_INSN(TLBI_VAE2, handle_vae2is),
+	SYS_INSN(TLBI_ALLE1, handle_alle1is),
+	SYS_INSN(TLBI_VALE2, handle_vae2is),
+	SYS_INSN(TLBI_VMALLS12E1, handle_vmalls12e1is),
 };
 
 static bool trap_dbgdidr(struct kvm_vcpu *vcpu,
-- 
2.34.1

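A worked example of the IPA computation in s2_mmu_unmap_stage2_ipa may
help here. This is a sketch with purely illustrative values (a
hypothetical Xt value and the 4K granule case), not part of the patch
itself:

	u64 xt       = 0x123456;			/* hypothetical Xt: IPA[51:12] in bits [35:0] */
	u64 base     = (xt & GENMASK_ULL(35, 0)) << 12;	/* 0x123456000 */
	u64 max_size = SZ_1G;				/* VTCR_EL2.TG0 == 4K */

	base &= ~(max_size - 1);			/* 0x100000000 */
	/* kvm_unmap_stage2_range(mmu, base, max_size); */

We trade precision for simplicity: rather than tracking the level the
guest invalidated at, the whole maximum block is unmapped and faulted
back in on demand.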

^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [PATCH v10 33/59] KVM: arm64: nv: Fold guest's HCR_EL2 configuration into the host's
  2023-05-15 17:30 [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support Marc Zyngier
                   ` (31 preceding siblings ...)
  2023-05-15 17:30 ` [PATCH v10 32/59] KVM: arm64: nv: Trap and emulate TLBI " Marc Zyngier
@ 2023-05-15 17:30 ` Marc Zyngier
  2023-05-15 17:30 ` [PATCH v10 34/59] KVM: arm64: nv: Hide RAS from nested guests Marc Zyngier
                   ` (28 subsequent siblings)
  61 siblings, 0 replies; 95+ messages in thread
From: Marc Zyngier @ 2023-05-15 17:30 UTC (permalink / raw)
  To: kvmarm, kvm, linux-arm-kernel
  Cc: Alexandru Elisei, Andre Przywara, Chase Conklin,
	Christoffer Dall, Ganapatrao Kulkarni, Darren Hart, Jintack Lim,
	Russell King, Miguel Luis, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu

When entering an L2 guest (nested virt enabled, but not in hypervisor
context), we need to honor the traps the L1 guest has asked for.

For now, just OR the guest's HCR_EL2 into the host's. We may have to do
some filtering in the future though.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/hyp/vhe/switch.c | 5 +++++
 1 file changed, 5 insertions(+)

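A minimal sketch of the fold, taking HCR_GUEST_FLAGS as a stand-in for
the host's baseline configuration and a hypothetical L1 value that sets
HCR_EL2.TWI (the filter mask is the one used in the hunk below):

	u64 hcr  = HCR_GUEST_FLAGS;		/* host baseline (stand-in) */
	u64 vhcr = HCR_VM | HCR_TWI;		/* hypothetical L1 HCR_EL2 */

	hcr |= vhcr & ~HCR_GUEST_NV_FILTER_FLAGS;

With such a merged value, a WFI executed by L2 traps to L0, which can
then forward the exception to L1, honoring its TWI setting.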
diff --git a/arch/arm64/kvm/hyp/vhe/switch.c b/arch/arm64/kvm/hyp/vhe/switch.c
index c7d083407fae..154d994c1015 100644
--- a/arch/arm64/kvm/hyp/vhe/switch.c
+++ b/arch/arm64/kvm/hyp/vhe/switch.c
@@ -81,6 +81,11 @@ static void __activate_traps(struct kvm_vcpu *vcpu)
 			if (!vcpu_el2_tge_is_set(vcpu))
 				hcr |= HCR_AT | HCR_TTLB;
 		}
+	} else if (vcpu_has_nv(vcpu)) {
+		u64 vhcr_el2 = __vcpu_sys_reg(vcpu, HCR_EL2);
+
+		vhcr_el2 &= ~HCR_GUEST_NV_FILTER_FLAGS;
+		hcr |= vhcr_el2;
 	}
 
 	___activate_traps(vcpu, hcr);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [PATCH v10 34/59] KVM: arm64: nv: Hide RAS from nested guests
  2023-05-15 17:30 [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support Marc Zyngier
                   ` (32 preceding siblings ...)
  2023-05-15 17:30 ` [PATCH v10 33/59] KVM: arm64: nv: Fold guest's HCR_EL2 configuration into the host's Marc Zyngier
@ 2023-05-15 17:30 ` Marc Zyngier
  2023-05-15 17:30 ` [PATCH v10 35/59] KVM: arm64: nv: Add handling of EL2-specific timer registers Marc Zyngier
                   ` (27 subsequent siblings)
  61 siblings, 0 replies; 95+ messages in thread
From: Marc Zyngier @ 2023-05-15 17:30 UTC (permalink / raw)
  To: kvmarm, kvm, linux-arm-kernel
  Cc: Alexandru Elisei, Andre Przywara, Chase Conklin,
	Christoffer Dall, Ganapatrao Kulkarni, Darren Hart, Jintack Lim,
	Russell King, Miguel Luis, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu

We don't want to expose complicated features to guests until we have
a good grasp on the basic CPU emulation. So let's pretend that RAS
doesn't exist in a nested guest. We already hide the feature bits,
let's now make sure VDISR_EL1 will UNDEF.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/sys_regs.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 1c66f0520ec3..0ed928dfe345 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -2391,6 +2391,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 	EL2_REG(VBAR_EL2, access_rw, reset_val, 0),
 	EL2_REG(RVBAR_EL2, access_rw, reset_val, 0),
 	{ SYS_DESC(SYS_RMR_EL2), trap_undef },
+	{ SYS_DESC(SYS_VDISR_EL2), trap_undef },
 
 	EL2_REG(CONTEXTIDR_EL2, access_rw, reset_val, 0),
 	EL2_REG(TPIDR_EL2, access_rw, reset_val, 0),
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [PATCH v10 35/59] KVM: arm64: nv: Add handling of EL2-specific timer registers
  2023-05-15 17:30 [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support Marc Zyngier
                   ` (33 preceding siblings ...)
  2023-05-15 17:30 ` [PATCH v10 34/59] KVM: arm64: nv: Hide RAS from nested guests Marc Zyngier
@ 2023-05-15 17:30 ` Marc Zyngier
  2023-05-15 17:30 ` [PATCH v10 36/59] KVM: arm64: nv: Load timer before the GIC Marc Zyngier
                   ` (26 subsequent siblings)
  61 siblings, 0 replies; 95+ messages in thread
From: Marc Zyngier @ 2023-05-15 17:30 UTC (permalink / raw)
  To: kvmarm, kvm, linux-arm-kernel
  Cc: Alexandru Elisei, Andre Przywara, Chase Conklin,
	Christoffer Dall, Ganapatrao Kulkarni, Darren Hart, Jintack Lim,
	Russell King, Miguel Luis, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu

Add the required handling for EL2 and EL02 registers, as
well as EL1 registers used in the E2H context.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/sysreg.h |   7 +++
 arch/arm64/kvm/sys_regs.c       | 102 +++++++++++++++++++++++++++++++-
 2 files changed, 108 insertions(+), 1 deletion(-)

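As background, TVAL and CVAL are two views of the same comparator,
which is why a single emulation path can serve both. A sketch of the
architectural relationship (timer_tval/timer_cval are illustrative
names, not KVM helpers):

	/* TVAL reads back bits [31:0] of (CVAL - counter) */
	static u64 timer_tval(u64 cval, u64 cnt)
	{
		return (u32)(cval - cnt);
	}

	/* Writing TVAL sets CVAL relative to the current counter value */
	static u64 timer_cval(u64 cnt, u32 tval)
	{
		return cnt + (s32)tval;
	}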
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 9f4a3bf6f182..3508ba196b55 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -428,6 +428,7 @@
 #define SYS_CNTFRQ_EL0			sys_reg(3, 3, 14, 0, 0)
 
 #define SYS_CNTPCT_EL0			sys_reg(3, 3, 14, 0, 1)
+#define SYS_CNTVCT_EL0			sys_reg(3, 3, 14, 0, 2)
 #define SYS_CNTPCTSS_EL0		sys_reg(3, 3, 14, 0, 5)
 #define SYS_CNTVCTSS_EL0		sys_reg(3, 3, 14, 0, 6)
 
@@ -543,6 +544,12 @@
 
 #define SYS_CNTVOFF_EL2			sys_reg(3, 4, 14, 0, 3)
 #define SYS_CNTHCTL_EL2			sys_reg(3, 4, 14, 1, 0)
+#define SYS_CNTHP_TVAL_EL2		sys_reg(3, 4, 14, 2, 0)
+#define SYS_CNTHP_CTL_EL2		sys_reg(3, 4, 14, 2, 1)
+#define SYS_CNTHP_CVAL_EL2		sys_reg(3, 4, 14, 2, 2)
+#define SYS_CNTHV_TVAL_EL2		sys_reg(3, 4, 14, 3, 0)
+#define SYS_CNTHV_CTL_EL2		sys_reg(3, 4, 14, 3, 1)
+#define SYS_CNTHV_CVAL_EL2		sys_reg(3, 4, 14, 3, 2)
 
 /* VHE encodings for architectural EL0/1 system registers */
 #define SYS_SCTLR_EL12			sys_reg(3, 5, 1, 0, 0)
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 0ed928dfe345..6be7a2615f01 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1313,26 +1313,111 @@ static bool access_arch_timer(struct kvm_vcpu *vcpu,
 
 	switch (reg) {
 	case SYS_CNTP_TVAL_EL0:
+		if (is_hyp_ctxt(vcpu) && vcpu_el2_e2h_is_set(vcpu))
+			tmr = TIMER_HPTIMER;
+		else
+			tmr = TIMER_PTIMER;
+		treg = TIMER_REG_TVAL;
+		break;
+
 	case SYS_AARCH32_CNTP_TVAL:
+	case SYS_CNTP_TVAL_EL02:
 		tmr = TIMER_PTIMER;
 		treg = TIMER_REG_TVAL;
 		break;
+
+	case SYS_CNTV_TVAL_EL02:
+		tmr = TIMER_VTIMER;
+		treg = TIMER_REG_TVAL;
+		break;
+
+	case SYS_CNTHP_TVAL_EL2:
+		tmr = TIMER_HPTIMER;
+		treg = TIMER_REG_TVAL;
+		break;
+
+	case SYS_CNTHV_TVAL_EL2:
+		tmr = TIMER_HVTIMER;
+		treg = TIMER_REG_TVAL;
+		break;
+
 	case SYS_CNTP_CTL_EL0:
+		if (is_hyp_ctxt(vcpu) && vcpu_el2_e2h_is_set(vcpu))
+			tmr = TIMER_HPTIMER;
+		else
+			tmr = TIMER_PTIMER;
+		treg = TIMER_REG_CTL;
+		break;
+
 	case SYS_AARCH32_CNTP_CTL:
+	case SYS_CNTP_CTL_EL02:
 		tmr = TIMER_PTIMER;
 		treg = TIMER_REG_CTL;
 		break;
+
+	case SYS_CNTV_CTL_EL02:
+		tmr = TIMER_VTIMER;
+		treg = TIMER_REG_CTL;
+		break;
+
+	case SYS_CNTHP_CTL_EL2:
+		tmr = TIMER_HPTIMER;
+		treg = TIMER_REG_CTL;
+		break;
+
+	case SYS_CNTHV_CTL_EL2:
+		tmr = TIMER_HVTIMER;
+		treg = TIMER_REG_CTL;
+		break;
+
 	case SYS_CNTP_CVAL_EL0:
+		if (is_hyp_ctxt(vcpu) && vcpu_el2_e2h_is_set(vcpu))
+			tmr = TIMER_HPTIMER;
+		else
+			tmr = TIMER_PTIMER;
+		treg = TIMER_REG_CVAL;
+		break;
+
 	case SYS_AARCH32_CNTP_CVAL:
+	case SYS_CNTP_CVAL_EL02:
 		tmr = TIMER_PTIMER;
 		treg = TIMER_REG_CVAL;
 		break;
+
+	case SYS_CNTV_CVAL_EL02:
+		tmr = TIMER_VTIMER;
+		treg = TIMER_REG_CVAL;
+		break;
+
+	case SYS_CNTHP_CVAL_EL2:
+		tmr = TIMER_HPTIMER;
+		treg = TIMER_REG_CVAL;
+		break;
+
+	case SYS_CNTHV_CVAL_EL2:
+		tmr = TIMER_HVTIMER;
+		treg = TIMER_REG_CVAL;
+		break;
+
+	case SYS_CNTVOFF_EL2:
+		tmr = TIMER_VTIMER;
+		treg = TIMER_REG_VOFF;
+		break;
+
 	case SYS_CNTPCT_EL0:
 	case SYS_CNTPCTSS_EL0:
+		if (is_hyp_ctxt(vcpu))
+			tmr = TIMER_HPTIMER;
+		else
+			tmr = TIMER_PTIMER;
+		treg = TIMER_REG_CNT;
+		break;
+
 	case SYS_AARCH32_CNTPCT:
 		tmr = TIMER_PTIMER;
 		treg = TIMER_REG_CNT;
 		break;
+
 	default:
 		print_sys_reg_msg(p, "%s", "Unhandled trapped timer register");
 		kvm_inject_undefined(vcpu);
@@ -2396,8 +2481,15 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 	EL2_REG(CONTEXTIDR_EL2, access_rw, reset_val, 0),
 	EL2_REG(TPIDR_EL2, access_rw, reset_val, 0),
 
-	EL2_REG(CNTVOFF_EL2, access_rw, reset_val, 0),
+	EL2_REG(CNTVOFF_EL2, access_arch_timer, reset_val, 0),
 	EL2_REG(CNTHCTL_EL2, access_rw, reset_val, 0),
+	{ SYS_DESC(SYS_CNTHP_TVAL_EL2), access_arch_timer },
+	EL2_REG(CNTHP_CTL_EL2, access_arch_timer, reset_val, 0),
+	EL2_REG(CNTHP_CVAL_EL2, access_arch_timer, reset_val, 0),
+
+	{ SYS_DESC(SYS_CNTHV_TVAL_EL2), access_arch_timer },
+	EL2_REG(CNTHV_CTL_EL2, access_arch_timer, reset_val, 0),
+	EL2_REG(CNTHV_CVAL_EL2, access_arch_timer, reset_val, 0),
 
 	EL12_REG(SCTLR, access_vm_reg, reset_val, 0x00C50078),
 	EL12_REG(CPACR, access_rw, reset_val, 0),
@@ -2416,6 +2508,14 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 	EL12_REG(CONTEXTIDR, access_vm_reg, reset_val, 0),
 	EL12_REG(CNTKCTL, access_rw, reset_val, 0),
 
+	{ SYS_DESC(SYS_CNTP_TVAL_EL02), access_arch_timer },
+	{ SYS_DESC(SYS_CNTP_CTL_EL02), access_arch_timer },
+	{ SYS_DESC(SYS_CNTP_CVAL_EL02), access_arch_timer },
+
+	{ SYS_DESC(SYS_CNTV_TVAL_EL02), access_arch_timer },
+	{ SYS_DESC(SYS_CNTV_CTL_EL02), access_arch_timer },
+	{ SYS_DESC(SYS_CNTV_CVAL_EL02), access_arch_timer },
+
 	EL2_REG(SP_EL2, NULL, reset_unknown, 0),
 };
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [PATCH v10 36/59] KVM: arm64: nv: Load timer before the GIC
  2023-05-15 17:30 [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support Marc Zyngier
                   ` (34 preceding siblings ...)
  2023-05-15 17:30 ` [PATCH v10 35/59] KVM: arm64: nv: Add handling of EL2-specific timer registers Marc Zyngier
@ 2023-05-15 17:30 ` Marc Zyngier
  2023-05-15 17:30 ` [PATCH v10 37/59] KVM: arm64: nv: Nested GICv3 Support Marc Zyngier
                   ` (25 subsequent siblings)
  61 siblings, 0 replies; 95+ messages in thread
From: Marc Zyngier @ 2023-05-15 17:30 UTC (permalink / raw)
  To: kvmarm, kvm, linux-arm-kernel
  Cc: Alexandru Elisei, Andre Przywara, Chase Conklin,
	Christoffer Dall, Ganapatrao Kulkarni, Darren Hart, Jintack Lim,
	Russell King, Miguel Luis, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu

In order for vgic_v3_load_nested to be able to observe which timer
interrupts have the HW bit set for the current context, the timers
must have been loaded in the new mode and the right timers mapped
to their corresponding HW IRQs.

At the moment, we load the GIC first, meaning that timer interrupts
injected to an L2 guest will never have the HW bit set (we see the
old configuration).

Swapping the two loads solves this particular problem.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/arm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 9036206f0be8..a95144c0d088 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -438,8 +438,8 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 
 	vcpu->cpu = cpu;
 
-	kvm_vgic_load(vcpu);
 	kvm_timer_vcpu_load(vcpu);
+	kvm_vgic_load(vcpu);
 	if (has_vhe())
 		kvm_vcpu_load_sysregs_vhe(vcpu);
 	kvm_arch_vcpu_load_fp(vcpu);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [PATCH v10 37/59] KVM: arm64: nv: Nested GICv3 Support
  2023-05-15 17:30 [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support Marc Zyngier
                   ` (35 preceding siblings ...)
  2023-05-15 17:30 ` [PATCH v10 36/59] KVM: arm64: nv: Load timer before the GIC Marc Zyngier
@ 2023-05-15 17:30 ` Marc Zyngier
  2023-05-15 17:30 ` [PATCH v10 38/59] KVM: arm64: nv: Don't load the GICv4 context on entering a nested guest Marc Zyngier
                   ` (24 subsequent siblings)
  61 siblings, 0 replies; 95+ messages in thread
From: Marc Zyngier @ 2023-05-15 17:30 UTC (permalink / raw)
  To: kvmarm, kvm, linux-arm-kernel
  Cc: Alexandru Elisei, Andre Przywara, Chase Conklin,
	Christoffer Dall, Ganapatrao Kulkarni, Darren Hart, Jintack Lim,
	Russell King, Miguel Luis, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu

From: Jintack Lim <jintack@cs.columbia.edu>

When entering a nested VM, we set up the hypervisor control interface
based on what the guest hypervisor has set. In particular, we inspect
each list register written by the guest hypervisor to see whether its
HW bit is set. If so, we translate the IRQ number from the guest's
point of view into the real hardware IRQ number, provided there is a
mapping.

Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
[Redesigned execution flow around vcpu load/put]
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
[Rewritten to support GICv3 instead of GICv2]
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_emulate.h |   8 +-
 arch/arm64/include/asm/kvm_host.h    |  13 +-
 arch/arm64/include/asm/kvm_nested.h  |   1 +
 arch/arm64/kvm/Makefile              |   2 +-
 arch/arm64/kvm/arm.c                 |   7 ++
 arch/arm64/kvm/nested.c              |  16 +++
 arch/arm64/kvm/sys_regs.c            | 161 +++++++++++++++++++++++-
 arch/arm64/kvm/vgic/vgic-v3-nested.c | 180 +++++++++++++++++++++++++++
 arch/arm64/kvm/vgic/vgic-v3.c        |  31 ++++-
 arch/arm64/kvm/vgic/vgic.c           |  27 ++++
 include/kvm/arm_vgic.h               |  20 +++
 11 files changed, 455 insertions(+), 11 deletions(-)
 create mode 100644 arch/arm64/kvm/vgic/vgic-v3-nested.c

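A sketch of the LR rewrite at the core of this patch (shadow_lr_example
is an illustrative name; hwintid stands for the host interrupt backing
L1's mapped interrupt; the masks come from <linux/irqchip/arm-gic-v3.h>):

	static u64 shadow_lr_example(u64 lr, u32 hwintid)
	{
		/*
		 * On entry, lr has ICH_LR_HW set and its pINTID field
		 * holds what L1 believes is the physical INTID (in
		 * reality, one of L1's own virtual interrupts).
		 */
		lr &= ~ICH_LR_PHYS_ID_MASK;
		lr |= (u64)hwintid << ICH_LR_PHYS_ID_SHIFT;

		return lr;
	}

vgic_v3_create_shadow_lr below does exactly this, plus the irq lookup
and the fallback of clearing ICH_LR_HW when no mapping exists.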
diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index f4ae0c61fa1c..be56ff626be2 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -546,7 +546,13 @@ static inline bool kvm_is_write_fault(struct kvm_vcpu *vcpu)
 
 static inline unsigned long kvm_vcpu_get_mpidr_aff(struct kvm_vcpu *vcpu)
 {
-	return vcpu_read_sys_reg(vcpu, MPIDR_EL1) & MPIDR_HWID_BITMASK;
+	/*
+	 * Use the in-memory view for MPIDR_EL1. It can't be changed by the
+	 * guest, and is also accessed from the context of *another* vcpu,
+	 * so anything using some other state (such as the NV state that is
+	 * used by vcpu_read_sys_reg) will eventually go wrong.
+	 */
+	return __vcpu_sys_reg(vcpu, MPIDR_EL1) & MPIDR_HWID_BITMASK;
 }
 
 static inline void kvm_vcpu_set_be(struct kvm_vcpu *vcpu)
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 2e2969f53b4a..1a6fc53f992d 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -42,12 +42,13 @@
 
 #define KVM_REQ_SLEEP \
 	KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
-#define KVM_REQ_IRQ_PENDING	KVM_ARCH_REQ(1)
-#define KVM_REQ_VCPU_RESET	KVM_ARCH_REQ(2)
-#define KVM_REQ_RECORD_STEAL	KVM_ARCH_REQ(3)
-#define KVM_REQ_RELOAD_GICv4	KVM_ARCH_REQ(4)
-#define KVM_REQ_RELOAD_PMU	KVM_ARCH_REQ(5)
-#define KVM_REQ_SUSPEND		KVM_ARCH_REQ(6)
+#define KVM_REQ_IRQ_PENDING		KVM_ARCH_REQ(1)
+#define KVM_REQ_VCPU_RESET		KVM_ARCH_REQ(2)
+#define KVM_REQ_RECORD_STEAL		KVM_ARCH_REQ(3)
+#define KVM_REQ_RELOAD_GICv4		KVM_ARCH_REQ(4)
+#define KVM_REQ_RELOAD_PMU		KVM_ARCH_REQ(5)
+#define KVM_REQ_SUSPEND			KVM_ARCH_REQ(6)
+#define KVM_REQ_GUEST_HYP_IRQ_PENDING	KVM_ARCH_REQ(7)
 
 #define KVM_DIRTY_LOG_MANUAL_CAPS   (KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE | \
 				     KVM_DIRTY_LOG_INITIALLY_SET)
diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
index a4f626a011db..f7aa32cc2a72 100644
--- a/arch/arm64/include/asm/kvm_nested.h
+++ b/arch/arm64/include/asm/kvm_nested.h
@@ -72,6 +72,7 @@ extern void kvm_s2_mmu_iterate_by_vmid(struct kvm *kvm, u16 vmid,
 						const union tlbi_info *));
 extern void kvm_vcpu_load_hw_mmu(struct kvm_vcpu *vcpu);
 extern void kvm_vcpu_put_hw_mmu(struct kvm_vcpu *vcpu);
+extern void check_nested_vcpu_requests(struct kvm_vcpu *vcpu);
 
 struct kvm_s2_trans {
 	phys_addr_t output;
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index c2717a8f12f5..4462bede5c60 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -20,7 +20,7 @@ kvm-y += arm.o mmu.o mmio.o psci.o hypercalls.o pvtime.o \
 	 vgic/vgic-v3.o vgic/vgic-v4.o \
 	 vgic/vgic-mmio.o vgic/vgic-mmio-v2.o \
 	 vgic/vgic-mmio-v3.o vgic/vgic-kvm-device.o \
-	 vgic/vgic-its.o vgic/vgic-debug.o
+	 vgic/vgic-its.o vgic/vgic-debug.o vgic/vgic-v3-nested.o
 
 kvm-$(CONFIG_HW_PERF_EVENTS)  += pmu-emul.o pmu.o
 
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index a95144c0d088..2f3149c09805 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -803,6 +803,8 @@ static int check_vcpu_requests(struct kvm_vcpu *vcpu)
 
 		if (kvm_dirty_ring_check_request(vcpu))
 			return 0;
+
+		check_nested_vcpu_requests(vcpu);
 	}
 
 	return 1;
@@ -937,6 +939,11 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 		 * preserved on VMID roll-over if the task was preempted,
 		 * making a thread's VMID inactive. So we need to call
 		 * kvm_arm_vmid_update() in non-premptible context.
+		 *
+		 * Note that this must happen after the check_vcpu_requests()
+		 * call to pick the correct s2_mmu structure, as a pending
+		 * nested exception (IRQ, for example) can trigger a change
+		 * in translation regime.
 		 */
 		kvm_arm_vmid_update(&vcpu->arch.hw_mmu->vmid);
 
diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
index 29cb28845aad..a01ed0e1db82 100644
--- a/arch/arm64/kvm/nested.c
+++ b/arch/arm64/kvm/nested.c
@@ -615,6 +615,22 @@ void kvm_arch_flush_shadow_all(struct kvm *kvm)
 	kvm_free_stage2_pgd(&kvm->arch.mmu);
 }
 
+bool vgic_state_is_nested(struct kvm_vcpu *vcpu)
+{
+	bool imo = __vcpu_sys_reg(vcpu, HCR_EL2) & HCR_IMO;
+	bool fmo = __vcpu_sys_reg(vcpu, HCR_EL2) & HCR_FMO;
+
+	WARN_ONCE(imo != fmo, "Separate virtual IRQ/FIQ settings not supported\n");
+
+	return vcpu_has_nv(vcpu) && imo && fmo && !is_hyp_ctxt(vcpu);
+}
+
+void check_nested_vcpu_requests(struct kvm_vcpu *vcpu)
+{
+	if (kvm_check_request(KVM_REQ_GUEST_HYP_IRQ_PENDING, vcpu))
+		kvm_inject_nested_irq(vcpu);
+}
+
 /*
  * Our emulated CPU doesn't support all the possible features. For the
  * sake of simplicity (and probably mental sanity), wipe out a number
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 6be7a2615f01..ce65a21f618d 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -17,6 +17,8 @@
 #include <linux/printk.h>
 #include <linux/uaccess.h>
 
+#include <linux/irqchip/arm-gic-v3.h>
+
 #include <asm/cacheflush.h>
 #include <asm/cputype.h>
 #include <asm/debug-monitors.h>
@@ -505,7 +507,13 @@ static bool access_gic_sre(struct kvm_vcpu *vcpu,
 	if (p->is_write)
 		return ignore_write(vcpu, p);
 
-	p->regval = vcpu->arch.vgic_cpu.vgic_v3.vgic_sre;
+	if (p->Op1 == 4) {	/* ICC_SRE_EL2 */
+		p->regval = (ICC_SRE_EL2_ENABLE | ICC_SRE_EL2_SRE |
+			     ICC_SRE_EL1_DIB | ICC_SRE_EL1_DFB);
+	} else {		/* ICC_SRE_EL1 */
+		p->regval = vcpu->arch.vgic_cpu.vgic_v3.vgic_sre;
+	}
+
 	return true;
 }
 
@@ -2013,6 +2021,122 @@ static bool access_spsr_el2(struct kvm_vcpu *vcpu,
 	return true;
 }
 
+static bool access_gic_apr(struct kvm_vcpu *vcpu,
+			   struct sys_reg_params *p,
+			   const struct sys_reg_desc *r)
+{
+	struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.nested_vgic_v3;
+	u32 index, *base;
+
+	index = r->Op2;
+	if (r->CRm == 8)
+		base = cpu_if->vgic_ap0r;
+	else
+		base = cpu_if->vgic_ap1r;
+
+	if (p->is_write)
+		base[index] = p->regval;
+	else
+		p->regval = base[index];
+
+	return true;
+}
+
+static bool access_gic_hcr(struct kvm_vcpu *vcpu,
+			   struct sys_reg_params *p,
+			   const struct sys_reg_desc *r)
+{
+	struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.nested_vgic_v3;
+
+	if (p->is_write)
+		cpu_if->vgic_hcr = p->regval;
+	else
+		p->regval = cpu_if->vgic_hcr;
+
+	return true;
+}
+
+static bool access_gic_vtr(struct kvm_vcpu *vcpu,
+			   struct sys_reg_params *p,
+			   const struct sys_reg_desc *r)
+{
+	if (p->is_write)
+		return write_to_read_only(vcpu, p, r);
+
+	p->regval = kvm_vgic_global_state.ich_vtr_el2;
+
+	return true;
+}
+
+static bool access_gic_misr(struct kvm_vcpu *vcpu,
+			    struct sys_reg_params *p,
+			    const struct sys_reg_desc *r)
+{
+	if (p->is_write)
+		return write_to_read_only(vcpu, p, r);
+
+	p->regval = vgic_v3_get_misr(vcpu);
+
+	return true;
+}
+
+static bool access_gic_eisr(struct kvm_vcpu *vcpu,
+			    struct sys_reg_params *p,
+			    const struct sys_reg_desc *r)
+{
+	if (p->is_write)
+		return write_to_read_only(vcpu, p, r);
+
+	p->regval = vgic_v3_get_eisr(vcpu);
+
+	return true;
+}
+
+static bool access_gic_elrsr(struct kvm_vcpu *vcpu,
+			     struct sys_reg_params *p,
+			     const struct sys_reg_desc *r)
+{
+	if (p->is_write)
+		return write_to_read_only(vcpu, p, r);
+
+	p->regval = vgic_v3_get_elrsr(vcpu);
+
+	return true;
+}
+
+static bool access_gic_vmcr(struct kvm_vcpu *vcpu,
+			    struct sys_reg_params *p,
+			    const struct sys_reg_desc *r)
+{
+	struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.nested_vgic_v3;
+
+	if (p->is_write)
+		cpu_if->vgic_vmcr = p->regval;
+	else
+		p->regval = cpu_if->vgic_vmcr;
+
+	return true;
+}
+
+static bool access_gic_lr(struct kvm_vcpu *vcpu,
+			  struct sys_reg_params *p,
+			  const struct sys_reg_desc *r)
+{
+	struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.nested_vgic_v3;
+	u32 index;
+
+	index = p->Op2;
+	if (p->CRm == 13)
+		index += 8;
+
+	if (p->is_write)
+		cpu_if->vgic_lr[index] = p->regval;
+	else
+		p->regval = cpu_if->vgic_lr[index];
+
+	return true;
+}
+
 /*
  * Architected system registers.
  * Important: Must be sorted ascending by Op0, Op1, CRn, CRm, Op2
@@ -2478,6 +2602,41 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 	{ SYS_DESC(SYS_RMR_EL2), trap_undef },
 	{ SYS_DESC(SYS_VDISR_EL2), trap_undef },
 
+	{ SYS_DESC(SYS_ICH_AP0R0_EL2), access_gic_apr },
+	{ SYS_DESC(SYS_ICH_AP0R1_EL2), access_gic_apr },
+	{ SYS_DESC(SYS_ICH_AP0R2_EL2), access_gic_apr },
+	{ SYS_DESC(SYS_ICH_AP0R3_EL2), access_gic_apr },
+	{ SYS_DESC(SYS_ICH_AP1R0_EL2), access_gic_apr },
+	{ SYS_DESC(SYS_ICH_AP1R1_EL2), access_gic_apr },
+	{ SYS_DESC(SYS_ICH_AP1R2_EL2), access_gic_apr },
+	{ SYS_DESC(SYS_ICH_AP1R3_EL2), access_gic_apr },
+
+	{ SYS_DESC(SYS_ICC_SRE_EL2), access_gic_sre },
+
+	{ SYS_DESC(SYS_ICH_HCR_EL2), access_gic_hcr },
+	{ SYS_DESC(SYS_ICH_VTR_EL2), access_gic_vtr },
+	{ SYS_DESC(SYS_ICH_MISR_EL2), access_gic_misr },
+	{ SYS_DESC(SYS_ICH_EISR_EL2), access_gic_eisr },
+	{ SYS_DESC(SYS_ICH_ELRSR_EL2), access_gic_elrsr },
+	{ SYS_DESC(SYS_ICH_VMCR_EL2), access_gic_vmcr },
+
+	{ SYS_DESC(SYS_ICH_LR0_EL2), access_gic_lr },
+	{ SYS_DESC(SYS_ICH_LR1_EL2), access_gic_lr },
+	{ SYS_DESC(SYS_ICH_LR2_EL2), access_gic_lr },
+	{ SYS_DESC(SYS_ICH_LR3_EL2), access_gic_lr },
+	{ SYS_DESC(SYS_ICH_LR4_EL2), access_gic_lr },
+	{ SYS_DESC(SYS_ICH_LR5_EL2), access_gic_lr },
+	{ SYS_DESC(SYS_ICH_LR6_EL2), access_gic_lr },
+	{ SYS_DESC(SYS_ICH_LR7_EL2), access_gic_lr },
+	{ SYS_DESC(SYS_ICH_LR8_EL2), access_gic_lr },
+	{ SYS_DESC(SYS_ICH_LR9_EL2), access_gic_lr },
+	{ SYS_DESC(SYS_ICH_LR10_EL2), access_gic_lr },
+	{ SYS_DESC(SYS_ICH_LR11_EL2), access_gic_lr },
+	{ SYS_DESC(SYS_ICH_LR12_EL2), access_gic_lr },
+	{ SYS_DESC(SYS_ICH_LR13_EL2), access_gic_lr },
+	{ SYS_DESC(SYS_ICH_LR14_EL2), access_gic_lr },
+	{ SYS_DESC(SYS_ICH_LR15_EL2), access_gic_lr },
+
 	EL2_REG(CONTEXTIDR_EL2, access_rw, reset_val, 0),
 	EL2_REG(TPIDR_EL2, access_rw, reset_val, 0),
 
diff --git a/arch/arm64/kvm/vgic/vgic-v3-nested.c b/arch/arm64/kvm/vgic/vgic-v3-nested.c
new file mode 100644
index 000000000000..ab8ddf490b31
--- /dev/null
+++ b/arch/arm64/kvm/vgic/vgic-v3-nested.c
@@ -0,0 +1,180 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include <linux/cpu.h>
+#include <linux/kvm.h>
+#include <linux/kvm_host.h>
+#include <linux/interrupt.h>
+#include <linux/io.h>
+#include <linux/uaccess.h>
+
+#include <linux/irqchip/arm-gic-v3.h>
+
+#include <asm/kvm_emulate.h>
+#include <asm/kvm_arm.h>
+#include <kvm/arm_vgic.h>
+
+#include "vgic.h"
+
+static inline struct vgic_v3_cpu_if *vcpu_nested_if(struct kvm_vcpu *vcpu)
+{
+	return &vcpu->arch.vgic_cpu.nested_vgic_v3;
+}
+
+static inline struct vgic_v3_cpu_if *vcpu_shadow_if(struct kvm_vcpu *vcpu)
+{
+	return &vcpu->arch.vgic_cpu.shadow_vgic_v3;
+}
+
+static inline bool lr_triggers_eoi(u64 lr)
+{
+	return !(lr & (ICH_LR_STATE | ICH_LR_HW)) && (lr & ICH_LR_EOI);
+}
+
+u16 vgic_v3_get_eisr(struct kvm_vcpu *vcpu)
+{
+	struct vgic_v3_cpu_if *cpu_if = vcpu_nested_if(vcpu);
+	u16 reg = 0;
+	int i;
+
+	for (i = 0; i < kvm_vgic_global_state.nr_lr; i++) {
+		if (lr_triggers_eoi(cpu_if->vgic_lr[i]))
+			reg |= BIT(i);
+	}
+
+	return reg;
+}
+
+u16 vgic_v3_get_elrsr(struct kvm_vcpu *vcpu)
+{
+	struct vgic_v3_cpu_if *cpu_if = vcpu_nested_if(vcpu);
+	u16 reg = 0;
+	int i;
+
+	for (i = 0; i < kvm_vgic_global_state.nr_lr; i++) {
+		if (!(cpu_if->vgic_lr[i] & ICH_LR_STATE))
+			reg |= BIT(i);
+	}
+
+	return reg;
+}
+
+u64 vgic_v3_get_misr(struct kvm_vcpu *vcpu)
+{
+	struct vgic_v3_cpu_if *cpu_if = vcpu_nested_if(vcpu);
+	int nr_lr = kvm_vgic_global_state.nr_lr;
+	u64 reg = 0;
+
+	if (vgic_v3_get_eisr(vcpu))
+		reg |= ICH_MISR_EOI;
+
+	if (cpu_if->vgic_hcr & ICH_HCR_UIE) {
+		int used_lrs;
+
+		used_lrs = nr_lr - hweight16(vgic_v3_get_elrsr(vcpu));
+		if (used_lrs <= 1)
+			reg |= ICH_MISR_U;
+	}
+
+	/* TODO: Support remaining bits in this register */
+	return reg;
+}
+
+/*
+ * For LRs which have HW bit set such as timer interrupts, we modify them to
+ * have the host hardware interrupt number instead of the virtual one programmed
+ * by the guest hypervisor.
+ */
+static void vgic_v3_create_shadow_lr(struct kvm_vcpu *vcpu)
+{
+	struct vgic_v3_cpu_if *cpu_if = vcpu_nested_if(vcpu);
+	struct vgic_v3_cpu_if *s_cpu_if = vcpu_shadow_if(vcpu);
+	struct vgic_irq *irq;
+	int i, used_lrs = 0;
+
+	for (i = 0; i < kvm_vgic_global_state.nr_lr; i++) {
+		u64 lr = cpu_if->vgic_lr[i];
+		int l1_irq;
+
+		if (!(lr & ICH_LR_HW))
+			goto next;
+
+		/* We have the HW bit set */
+		l1_irq = (lr & ICH_LR_PHYS_ID_MASK) >> ICH_LR_PHYS_ID_SHIFT;
+		irq = vgic_get_irq(vcpu->kvm, vcpu, l1_irq);
+
+		if (!irq || !irq->hw) {
+			/* There was no real mapping, so nuke the HW bit */
+			lr &= ~ICH_LR_HW;
+			if (irq)
+				vgic_put_irq(vcpu->kvm, irq);
+			goto next;
+		}
+
+		/* Translate the virtual mapping to the real one */
+		lr &= ~ICH_LR_EOI; /* Why? */
+		lr &= ~ICH_LR_PHYS_ID_MASK;
+		lr |= (u64)irq->hwintid << ICH_LR_PHYS_ID_SHIFT;
+		vgic_put_irq(vcpu->kvm, irq);
+
+next:
+		s_cpu_if->vgic_lr[i] = lr;
+		used_lrs = i + 1;
+	}
+
+	s_cpu_if->used_lrs = used_lrs;
+}
+
+/*
+ * Change the shadow HWIRQ field back to the virtual value before copying over
+ * the entire shadow struct to the nested state.
+ */
+static void vgic_v3_fixup_shadow_lr_state(struct kvm_vcpu *vcpu)
+{
+	struct vgic_v3_cpu_if *cpu_if = vcpu_nested_if(vcpu);
+	struct vgic_v3_cpu_if *s_cpu_if = vcpu_shadow_if(vcpu);
+	int lr;
+
+	for (lr = 0; lr < kvm_vgic_global_state.nr_lr; lr++) {
+		s_cpu_if->vgic_lr[lr] &= ~ICH_LR_PHYS_ID_MASK;
+		s_cpu_if->vgic_lr[lr] |= cpu_if->vgic_lr[lr] & ICH_LR_PHYS_ID_MASK;
+	}
+}
+
+void vgic_v3_load_nested(struct kvm_vcpu *vcpu)
+{
+	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
+
+	vgic_cpu->shadow_vgic_v3 = vgic_cpu->nested_vgic_v3;
+	vgic_v3_create_shadow_lr(vcpu);
+	__vgic_v3_restore_state(vcpu_shadow_if(vcpu));
+}
+
+void vgic_v3_put_nested(struct kvm_vcpu *vcpu)
+{
+	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
+
+	__vgic_v3_save_state(vcpu_shadow_if(vcpu));
+
+	/*
+	 * Translate the shadow state HW fields back to the virtual ones
+	 * before copying the shadow struct back to the nested one.
+	 */
+	vgic_v3_fixup_shadow_lr_state(vcpu);
+	vgic_cpu->nested_vgic_v3 = vgic_cpu->shadow_vgic_v3;
+}
+
+void vgic_v3_handle_nested_maint_irq(struct kvm_vcpu *vcpu)
+{
+	struct vgic_v3_cpu_if *cpu_if = vcpu_nested_if(vcpu);
+
+	/*
+	 * If we exit a nested VM with a pending maintenance interrupt from the
+	 * GIC, then we need to forward this to the guest hypervisor so that it
+	 * can re-sync the appropriate LRs and sample level triggered interrupts
+	 * again.
+	 */
+	if (vgic_state_is_nested(vcpu) &&
+	    (cpu_if->vgic_hcr & ICH_HCR_EN) &&
+	    vgic_v3_get_misr(vcpu))
+		kvm_inject_nested_irq(vcpu);
+}
diff --git a/arch/arm64/kvm/vgic/vgic-v3.c b/arch/arm64/kvm/vgic/vgic-v3.c
index 93a47a515c13..41a3d0e5c876 100644
--- a/arch/arm64/kvm/vgic/vgic-v3.c
+++ b/arch/arm64/kvm/vgic/vgic-v3.c
@@ -9,6 +9,7 @@
 #include <kvm/arm_vgic.h>
 #include <asm/kvm_hyp.h>
 #include <asm/kvm_mmu.h>
+#include <asm/kvm_nested.h>
 #include <asm/kvm_asm.h>
 
 #include "vgic.h"
@@ -278,6 +279,12 @@ void vgic_v3_enable(struct kvm_vcpu *vcpu)
 		vgic_v3->vgic_sre = (ICC_SRE_EL1_DIB |
 				     ICC_SRE_EL1_DFB |
 				     ICC_SRE_EL1_SRE);
+		/*
+		 * If nesting is allowed, force GICv3 onto the nested
+		 * guests as well.
+		 */
+		if (vcpu_has_nv(vcpu))
+			vcpu->arch.vgic_cpu.nested_vgic_v3.vgic_sre = vgic_v3->vgic_sre;
 		vcpu->arch.vgic_cpu.pendbaser = INITIAL_PENDBASER_VALUE;
 	} else {
 		vgic_v3->vgic_sre = 0;
@@ -728,6 +735,17 @@ void vgic_v3_load(struct kvm_vcpu *vcpu)
 {
 	struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
 
+	vcpu->arch.vgic_cpu.current_cpu_if = cpu_if;
+
+	/*
+	 * vgic_v3_load_nested only affects the LRs in the shadow
+	 * state, so it is fine to pass the nested state around.
+	 */
+	if (vgic_state_is_nested(vcpu)) {
+		cpu_if = &vcpu->arch.vgic_cpu.shadow_vgic_v3;
+		vcpu->arch.vgic_cpu.current_cpu_if = cpu_if;
+	}
+
 	/*
 	 * If dealing with a GICv2 emulation on GICv3, VMCR_EL2.VFIQen
 	 * is dependent on ICC_SRE_EL1.SRE, and we have to perform the
@@ -741,12 +759,15 @@ void vgic_v3_load(struct kvm_vcpu *vcpu)
 	if (has_vhe())
 		__vgic_v3_activate_traps(cpu_if);
 
+	if (vgic_state_is_nested(vcpu))
+		vgic_v3_load_nested(vcpu);
+
 	WARN_ON(vgic_v4_load(vcpu));
 }
 
 void vgic_v3_vmcr_sync(struct kvm_vcpu *vcpu)
 {
-	struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
+	struct vgic_v3_cpu_if *cpu_if = vcpu->arch.vgic_cpu.current_cpu_if;
 
 	if (likely(cpu_if->vgic_sre))
 		cpu_if->vgic_vmcr = kvm_call_hyp_ret(__vgic_v3_read_vmcr);
@@ -754,7 +775,7 @@ void vgic_v3_vmcr_sync(struct kvm_vcpu *vcpu)
 
 void vgic_v3_put(struct kvm_vcpu *vcpu)
 {
-	struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
+	struct vgic_v3_cpu_if *cpu_if = vcpu->arch.vgic_cpu.current_cpu_if;
 
 	WARN_ON(vgic_v4_put(vcpu, false));
 
@@ -764,4 +785,10 @@ void vgic_v3_put(struct kvm_vcpu *vcpu)
 
 	if (has_vhe())
 		__vgic_v3_deactivate_traps(cpu_if);
+
+	if (vgic_state_is_nested(vcpu))
+		vgic_v3_put_nested(vcpu);
+
+	/* Default back to the non-nested state for sanity */
+	vcpu->arch.vgic_cpu.current_cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
 }
diff --git a/arch/arm64/kvm/vgic/vgic.c b/arch/arm64/kvm/vgic/vgic.c
index 8be4c1ebdec2..34f13fd1238b 100644
--- a/arch/arm64/kvm/vgic/vgic.c
+++ b/arch/arm64/kvm/vgic/vgic.c
@@ -878,6 +878,10 @@ void kvm_vgic_sync_hwstate(struct kvm_vcpu *vcpu)
 {
 	int used_lrs;
 
+	/* If nesting, this is a load/put affair, not flush/sync. */
+	if (vgic_state_is_nested(vcpu))
+		return;
+
 	/* An empty ap_list_head implies used_lrs == 0 */
 	if (list_empty(&vcpu->arch.vgic_cpu.ap_list_head))
 		return;
@@ -922,6 +926,29 @@ void kvm_vgic_flush_hwstate(struct kvm_vcpu *vcpu)
 	    !vgic_supports_direct_msis(vcpu->kvm))
 		return;
 
+	/*
+	 * If in a nested state, we must return early. Two possibilities:
+	 *
+	 * - If we have any pending IRQ for the guest and the guest
+	 *   expects IRQs to be handled in its virtual EL2 mode (the
+	 *   virtual IMO bit is set) and it is not already running in
+	 *   virtual EL2 mode, then we have to emulate an IRQ
+	 *   exception to virtual EL2.
+	 *
+	 *   We do that by placing a request to ourselves which will
+	 *   abort the entry procedure and inject the exception at the
+	 *   beginning of the run loop.
+	 *
+	 * - Otherwise, do exactly *NOTHING*. The guest state is
+	 *   already loaded, and we can carry on with running it.
+	 */
+	if (vgic_state_is_nested(vcpu)) {
+		if (kvm_vgic_vcpu_pending_irq(vcpu))
+			kvm_make_request(KVM_REQ_GUEST_HYP_IRQ_PENDING, vcpu);
+
+		return;
+	}
+
 	DEBUG_SPINLOCK_BUG_ON(!irqs_disabled());
 
 	if (!list_empty(&vcpu->arch.vgic_cpu.ap_list_head)) {
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 402b545959af..d775eb50b72c 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -331,6 +331,17 @@ struct vgic_cpu {
 
 	struct vgic_irq private_irqs[VGIC_NR_PRIVATE_IRQS];
 
+	/* CPU vif control registers for the virtual GICH interface */
+	struct vgic_v3_cpu_if	nested_vgic_v3;
+
+	/*
+	 * The shadow vif control register loaded to the hardware when
+	 * running a nested L2 guest with the virtual IMO/FMO bit set.
+	 */
+	struct vgic_v3_cpu_if	shadow_vgic_v3;
+
+	struct vgic_v3_cpu_if	*current_cpu_if;
+
 	raw_spinlock_t ap_list_lock;	/* Protects the ap_list */
 
 	/*
@@ -389,6 +400,13 @@ void kvm_vgic_load(struct kvm_vcpu *vcpu);
 void kvm_vgic_put(struct kvm_vcpu *vcpu);
 void kvm_vgic_vmcr_sync(struct kvm_vcpu *vcpu);
 
+void vgic_v3_load_nested(struct kvm_vcpu *vcpu);
+void vgic_v3_put_nested(struct kvm_vcpu *vcpu);
+void vgic_v3_handle_nested_maint_irq(struct kvm_vcpu *vcpu);
+u16 vgic_v3_get_eisr(struct kvm_vcpu *vcpu);
+u16 vgic_v3_get_elrsr(struct kvm_vcpu *vcpu);
+u64 vgic_v3_get_misr(struct kvm_vcpu *vcpu);
+
 #define irqchip_in_kernel(k)	(!!((k)->arch.vgic.in_kernel))
 #define vgic_initialized(k)	((k)->arch.vgic.initialized)
 #define vgic_ready(k)		((k)->arch.vgic.ready)
@@ -433,6 +451,8 @@ int vgic_v4_load(struct kvm_vcpu *vcpu);
 void vgic_v4_commit(struct kvm_vcpu *vcpu);
 int vgic_v4_put(struct kvm_vcpu *vcpu, bool need_db);
 
+bool vgic_state_is_nested(struct kvm_vcpu *vcpu);
+
 /* CPU HP callbacks */
 void kvm_vgic_cpu_up(void);
 void kvm_vgic_cpu_down(void);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [PATCH v10 38/59] KVM: arm64: nv: Don't load the GICv4 context on entering a nested guest
  2023-05-15 17:30 [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support Marc Zyngier
                   ` (36 preceding siblings ...)
  2023-05-15 17:30 ` [PATCH v10 37/59] KVM: arm64: nv: Nested GICv3 Support Marc Zyngier
@ 2023-05-15 17:30 ` Marc Zyngier
  2023-05-15 17:30 ` [PATCH v10 39/59] KVM: arm64: nv: vgic: Emulate the HW bit in software Marc Zyngier
                   ` (23 subsequent siblings)
  61 siblings, 0 replies; 95+ messages in thread
From: Marc Zyngier @ 2023-05-15 17:30 UTC (permalink / raw)
  To: kvmarm, kvm, linux-arm-kernel
  Cc: Alexandru Elisei, Andre Przywara, Chase Conklin,
	Christoffer Dall, Ganapatrao Kulkarni, Darren Hart, Jintack Lim,
	Russell King, Miguel Luis, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu

When entering a nested guest (vgic_state_is_nested() == true),
special care must be taken *not* to make the vPE resident, as
these are interrupts targeting the L1 guest, and not any
nested guest.

By not making the vPE resident, we guarantee that the delivery
of a vLPI will result in a doorbell, forcing an exit from the
nested guest and a switch to the L1 guest to handle the interrupt.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/vgic/vgic-v3.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kvm/vgic/vgic-v3.c b/arch/arm64/kvm/vgic/vgic-v3.c
index 41a3d0e5c876..4e907c2b1c20 100644
--- a/arch/arm64/kvm/vgic/vgic-v3.c
+++ b/arch/arm64/kvm/vgic/vgic-v3.c
@@ -761,8 +761,8 @@ void vgic_v3_load(struct kvm_vcpu *vcpu)
 
 	if (vgic_state_is_nested(vcpu))
 		vgic_v3_load_nested(vcpu);
-
-	WARN_ON(vgic_v4_load(vcpu));
+	else
+		WARN_ON(vgic_v4_load(vcpu));
 }
 
 void vgic_v3_vmcr_sync(struct kvm_vcpu *vcpu)
@@ -777,6 +777,12 @@ void vgic_v3_put(struct kvm_vcpu *vcpu)
 {
 	struct vgic_v3_cpu_if *cpu_if = vcpu->arch.vgic_cpu.current_cpu_if;
 
+	/*
+	 * vgic_v4_put will do nothing if we were not resident. This
+	 * covers both the cases where we've blocked (we already have
+	 * done a vgic_v4_put) and when running a nested guest (the
+	 * vPE was never resident in order to generate a doorbell).
+	 */
 	WARN_ON(vgic_v4_put(vcpu, false));
 
 	vgic_v3_vmcr_sync(vcpu);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [PATCH v10 39/59] KVM: arm64: nv: vgic: Emulate the HW bit in software
  2023-05-15 17:30 [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support Marc Zyngier
                   ` (37 preceding siblings ...)
  2023-05-15 17:30 ` [PATCH v10 38/59] KVM: arm64: nv: Don't load the GICv4 context on entering a nested guest Marc Zyngier
@ 2023-05-15 17:30 ` Marc Zyngier
  2023-05-15 17:30 ` [PATCH v10 40/59] KVM: arm64: nv: vgic: Allow userland to set VGIC maintenance IRQ Marc Zyngier
                   ` (22 subsequent siblings)
  61 siblings, 0 replies; 95+ messages in thread
From: Marc Zyngier @ 2023-05-15 17:30 UTC (permalink / raw)
  To: kvmarm, kvm, linux-arm-kernel
  Cc: Alexandru Elisei, Andre Przywara, Chase Conklin,
	Christoffer Dall, Ganapatrao Kulkarni, Darren Hart, Jintack Lim,
	Russell King, Miguel Luis, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu

From: Christoffer Dall <christoffer.dall@arm.com>

Should the guest hypervisor use the HW bit in the LRs, we need to
emulate the deactivation from the L2 guest into the L1 distributor
emulation, which is handled by L0.

It's all good fun.

Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_hyp.h     |  2 ++
 arch/arm64/kvm/hyp/vgic-v3-sr.c      |  2 +-
 arch/arm64/kvm/vgic/vgic-v3-nested.c | 32 ++++++++++++++++++++++++++++
 arch/arm64/kvm/vgic/vgic.c           |  6 ++++--
 include/kvm/arm_vgic.h               |  1 +
 5 files changed, 40 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
index bdd9cf546d95..d57889de93e6 100644
--- a/arch/arm64/include/asm/kvm_hyp.h
+++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -57,6 +57,8 @@ DECLARE_PER_CPU(struct kvm_nvhe_init_params, kvm_init_params);
 
 int __vgic_v2_perform_cpuif_access(struct kvm_vcpu *vcpu);
 
+u64 __gic_v3_get_lr(unsigned int lr);
+
 void __vgic_v3_save_state(struct vgic_v3_cpu_if *cpu_if);
 void __vgic_v3_restore_state(struct vgic_v3_cpu_if *cpu_if);
 void __vgic_v3_activate_traps(struct vgic_v3_cpu_if *cpu_if);
diff --git a/arch/arm64/kvm/hyp/vgic-v3-sr.c b/arch/arm64/kvm/hyp/vgic-v3-sr.c
index 6cb638b184b1..75152c1ce646 100644
--- a/arch/arm64/kvm/hyp/vgic-v3-sr.c
+++ b/arch/arm64/kvm/hyp/vgic-v3-sr.c
@@ -18,7 +18,7 @@
 #define vtr_to_nr_pre_bits(v)		((((u32)(v) >> 26) & 7) + 1)
 #define vtr_to_nr_apr_regs(v)		(1 << (vtr_to_nr_pre_bits(v) - 5))
 
-static u64 __gic_v3_get_lr(unsigned int lr)
+u64 __gic_v3_get_lr(unsigned int lr)
 {
 	switch (lr & 0xf) {
 	case 0:
diff --git a/arch/arm64/kvm/vgic/vgic-v3-nested.c b/arch/arm64/kvm/vgic/vgic-v3-nested.c
index ab8ddf490b31..e88c75e79010 100644
--- a/arch/arm64/kvm/vgic/vgic-v3-nested.c
+++ b/arch/arm64/kvm/vgic/vgic-v3-nested.c
@@ -140,6 +140,38 @@ static void vgic_v3_fixup_shadow_lr_state(struct kvm_vcpu *vcpu)
 	}
 }
 
+void vgic_v3_sync_nested(struct kvm_vcpu *vcpu)
+{
+	struct vgic_v3_cpu_if *cpu_if = vcpu_nested_if(vcpu);
+	struct vgic_v3_cpu_if *s_cpu_if = vcpu_shadow_if(vcpu);
+	struct vgic_irq *irq;
+	int i;
+
+	for (i = 0; i < s_cpu_if->used_lrs; i++) {
+		u64 lr = cpu_if->vgic_lr[i];
+		int l1_irq;
+
+		if (!(lr & ICH_LR_HW) || !(lr & ICH_LR_STATE))
+			continue;
+
+		/*
+		 * If we had a HW lr programmed by the guest hypervisor, we
+		 * need to emulate the HW effect between the guest hypervisor
+		 * and the nested guest.
+		 */
+		l1_irq = (lr & ICH_LR_PHYS_ID_MASK) >> ICH_LR_PHYS_ID_SHIFT;
+		irq = vgic_get_irq(vcpu->kvm, vcpu, l1_irq);
+		if (!irq)
+			continue; /* oh well, the guest hyp is broken */
+
+		lr = __gic_v3_get_lr(i);
+		if (!(lr & ICH_LR_STATE))
+			irq->active = false;
+
+		vgic_put_irq(vcpu->kvm, irq);
+	}
+}
+
 void vgic_v3_load_nested(struct kvm_vcpu *vcpu)
 {
 	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
diff --git a/arch/arm64/kvm/vgic/vgic.c b/arch/arm64/kvm/vgic/vgic.c
index 34f13fd1238b..880d33d9d514 100644
--- a/arch/arm64/kvm/vgic/vgic.c
+++ b/arch/arm64/kvm/vgic/vgic.c
@@ -878,9 +878,11 @@ void kvm_vgic_sync_hwstate(struct kvm_vcpu *vcpu)
 {
 	int used_lrs;
 
-	/* If nesting, this is a load/put affair, not flush/sync. */
-	if (vgic_state_is_nested(vcpu))
+	/* If nesting, emulate the HW effect from L0 to L1 */
+	if (vgic_state_is_nested(vcpu)) {
+		vgic_v3_sync_nested(vcpu);
 		return;
+	}
 
 	/* An empty ap_list_head implies used_lrs == 0 */
 	if (list_empty(&vcpu->arch.vgic_cpu.ap_list_head))
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index d775eb50b72c..ea45accbc690 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -400,6 +400,7 @@ void kvm_vgic_load(struct kvm_vcpu *vcpu);
 void kvm_vgic_put(struct kvm_vcpu *vcpu);
 void kvm_vgic_vmcr_sync(struct kvm_vcpu *vcpu);
 
+void vgic_v3_sync_nested(struct kvm_vcpu *vcpu);
 void vgic_v3_load_nested(struct kvm_vcpu *vcpu);
 void vgic_v3_put_nested(struct kvm_vcpu *vcpu);
 void vgic_v3_handle_nested_maint_irq(struct kvm_vcpu *vcpu);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [PATCH v10 40/59] KVM: arm64: nv: vgic: Allow userland to set VGIC maintenance IRQ
  2023-05-15 17:30 [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support Marc Zyngier
                   ` (38 preceding siblings ...)
  2023-05-15 17:30 ` [PATCH v10 39/59] KVM: arm64: nv: vgic: Emulate the HW bit in software Marc Zyngier
@ 2023-05-15 17:30 ` Marc Zyngier
  2023-05-15 17:30 ` [PATCH v10 41/59] KVM: arm64: nv: Implement maintenance interrupt forwarding Marc Zyngier
                   ` (21 subsequent siblings)
  61 siblings, 0 replies; 95+ messages in thread
From: Marc Zyngier @ 2023-05-15 17:30 UTC (permalink / raw)
  To: kvmarm, kvm, linux-arm-kernel
  Cc: Alexandru Elisei, Andre Przywara, Chase Conklin,
	Christoffer Dall, Ganapatrao Kulkarni, Darren Hart, Jintack Lim,
	Russell King, Miguel Luis, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu

From: Andre Przywara <andre.przywara@arm.com>

When the GIC's virtualization extension is in use, the VGIC maintenance
IRQ signals various conditions about the LRs.
So far we haven't needed it, but nested virtualization needs to know
about this interrupt, so add a userland interface to set up the IRQ
number. The architecture mandates that it must be a PPI; on top of that,
this code only exposes a per-device option, so the PPI is the same on
all VCPUs.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
[added some bits of documentation]
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 .../virt/kvm/devices/arm-vgic-v3.rst          | 12 +++++++-
 arch/arm64/include/uapi/asm/kvm.h             |  1 +
 arch/arm64/kvm/vgic/vgic-kvm-device.c         | 29 +++++++++++++++++--
 include/kvm/arm_vgic.h                        |  3 ++
 tools/arch/arm/include/uapi/asm/kvm.h         |  1 +
 5 files changed, 43 insertions(+), 3 deletions(-)

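A sketch of the expected userspace usage (set_maint_irq is a
hypothetical helper; error handling omitted):

	#include <linux/kvm.h>
	#include <sys/ioctl.h>

	static int set_maint_irq(int vgic_dev_fd, __u32 intid)
	{
		struct kvm_device_attr attr = {
			.group	= KVM_DEV_ARM_VGIC_GRP_MAINT_IRQ,
			.addr	= (__u64)(unsigned long)&intid,
		};

		/* Must be called before the VGIC is initialised */
		return ioctl(vgic_dev_fd, KVM_SET_DEVICE_ATTR, &attr);
	}

set_maint_irq(fd, 25) programs PPI 25; values outside the PPI range
(16-31) are rejected with -EINVAL.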
diff --git a/Documentation/virt/kvm/devices/arm-vgic-v3.rst b/Documentation/virt/kvm/devices/arm-vgic-v3.rst
index 51e5e5762571..1901e651cc00 100644
--- a/Documentation/virt/kvm/devices/arm-vgic-v3.rst
+++ b/Documentation/virt/kvm/devices/arm-vgic-v3.rst
@@ -284,8 +284,18 @@ Groups:
       |    Aff3    |    Aff2    |    Aff1    |    Aff0    |
 
   Errors:
-
     =======  =============================================
     -EINVAL  vINTID is not multiple of 32 or info field is
 	     not VGIC_LEVEL_INFO_LINE_LEVEL
     =======  =============================================
+
+  KVM_DEV_ARM_VGIC_GRP_MAINT_IRQ
+   Attributes:
+
+    The attr field of kvm_device_attr encodes the following values:
+
+      bits:     | 31   ....    5 | 4  ....  0 |
+      values:   |      RES0      |   vINTID   |
+
+    The vINTID specifies which interrupt is injected when the vGIC
+    must signal a maintenance interrupt. It must be a PPI.
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index f7ddd73a8c0f..fafbf12d3b84 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -403,6 +403,7 @@ enum {
 #define KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS 6
 #define KVM_DEV_ARM_VGIC_GRP_LEVEL_INFO  7
 #define KVM_DEV_ARM_VGIC_GRP_ITS_REGS 8
+#define KVM_DEV_ARM_VGIC_GRP_MAINT_IRQ  9
 #define KVM_DEV_ARM_VGIC_LINE_LEVEL_INFO_SHIFT	10
 #define KVM_DEV_ARM_VGIC_LINE_LEVEL_INFO_MASK \
 			(0x3fffffULL << KVM_DEV_ARM_VGIC_LINE_LEVEL_INFO_SHIFT)
diff --git a/arch/arm64/kvm/vgic/vgic-kvm-device.c b/arch/arm64/kvm/vgic/vgic-kvm-device.c
index a6294ec3714c..da7cc135e16d 100644
--- a/arch/arm64/kvm/vgic/vgic-kvm-device.c
+++ b/arch/arm64/kvm/vgic/vgic-kvm-device.c
@@ -292,6 +292,12 @@ static int vgic_get_common_attr(struct kvm_device *dev,
 			     VGIC_NR_PRIVATE_IRQS, uaddr);
 		break;
 	}
+	case KVM_DEV_ARM_VGIC_GRP_MAINT_IRQ: {
+		u32 __user *uaddr = (u32 __user *)(long)attr->addr;
+
+		r = put_user(dev->kvm->arch.vgic.maint_irq, uaddr);
+		break;
+	}
 	}
 
 	return r;
@@ -510,7 +516,7 @@ static int vgic_v3_attr_regs_access(struct kvm_device *dev,
 	struct vgic_reg_attr reg_attr;
 	gpa_t addr;
 	struct kvm_vcpu *vcpu;
-	bool uaccess;
+	bool uaccess, post_init = true;
 	u32 val;
 	int ret;
 
@@ -526,6 +532,9 @@ static int vgic_v3_attr_regs_access(struct kvm_device *dev,
 		/* Sysregs uaccess is performed by the sysreg handling code */
 		uaccess = false;
 		break;
+	case KVM_DEV_ARM_VGIC_GRP_MAINT_IRQ:
+		post_init = false;
+		fallthrough;
 	default:
 		uaccess = true;
 	}
@@ -545,7 +554,7 @@ static int vgic_v3_attr_regs_access(struct kvm_device *dev,
 
 	mutex_lock(&dev->kvm->arch.config_lock);
 
-	if (unlikely(!vgic_initialized(dev->kvm))) {
+	if (post_init != vgic_initialized(dev->kvm)) {
 		ret = -EBUSY;
 		goto out;
 	}
@@ -575,6 +584,19 @@ static int vgic_v3_attr_regs_access(struct kvm_device *dev,
 		}
 		break;
 	}
+	case KVM_DEV_ARM_VGIC_GRP_MAINT_IRQ:
+		if (!is_write) {
+			val = dev->kvm->arch.vgic.maint_irq;
+			ret = 0;
+			break;
+		}
+
+		ret = -EINVAL;
+		if ((val < VGIC_NR_PRIVATE_IRQS) && (val >= VGIC_NR_SGIS)) {
+			dev->kvm->arch.vgic.maint_irq = val;
+			ret = 0;
+		}
+		break;
 	default:
 		ret = -EINVAL;
 		break;
@@ -601,6 +623,7 @@ static int vgic_v3_set_attr(struct kvm_device *dev,
 	case KVM_DEV_ARM_VGIC_GRP_REDIST_REGS:
 	case KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS:
 	case KVM_DEV_ARM_VGIC_GRP_LEVEL_INFO:
+	case KVM_DEV_ARM_VGIC_GRP_MAINT_IRQ:
 		return vgic_v3_attr_regs_access(dev, attr, true);
 	default:
 		return vgic_set_common_attr(dev, attr);
@@ -615,6 +638,7 @@ static int vgic_v3_get_attr(struct kvm_device *dev,
 	case KVM_DEV_ARM_VGIC_GRP_REDIST_REGS:
 	case KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS:
 	case KVM_DEV_ARM_VGIC_GRP_LEVEL_INFO:
+	case KVM_DEV_ARM_VGIC_GRP_MAINT_IRQ:
 		return vgic_v3_attr_regs_access(dev, attr, false);
 	default:
 		return vgic_get_common_attr(dev, attr);
@@ -638,6 +662,7 @@ static int vgic_v3_has_attr(struct kvm_device *dev,
 	case KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS:
 		return vgic_v3_has_attr_regs(dev, attr);
 	case KVM_DEV_ARM_VGIC_GRP_NR_IRQS:
+	case KVM_DEV_ARM_VGIC_GRP_MAINT_IRQ:
 		return 0;
 	case KVM_DEV_ARM_VGIC_GRP_LEVEL_INFO: {
 		if (((attr->attr & KVM_DEV_ARM_VGIC_LINE_LEVEL_INFO_MASK) >>
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index ea45accbc690..1a2e2e8abd92 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -243,6 +243,9 @@ struct vgic_dist {
 
 	int			nr_spis;
 
+	/* The GIC maintenance IRQ for nested hypervisors. */
+	u32			maint_irq;
+
 	/* base addresses in guest physical address space: */
 	gpa_t			vgic_dist_base;		/* distributor */
 	union {
diff --git a/tools/arch/arm/include/uapi/asm/kvm.h b/tools/arch/arm/include/uapi/asm/kvm.h
index 03cd7c19a683..d5dd96902817 100644
--- a/tools/arch/arm/include/uapi/asm/kvm.h
+++ b/tools/arch/arm/include/uapi/asm/kvm.h
@@ -246,6 +246,7 @@ struct kvm_vcpu_events {
 #define KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS 6
 #define KVM_DEV_ARM_VGIC_GRP_LEVEL_INFO  7
 #define KVM_DEV_ARM_VGIC_GRP_ITS_REGS	8
+#define KVM_DEV_ARM_VGIC_GRP_MAINT_IRQ	9
 #define KVM_DEV_ARM_VGIC_LINE_LEVEL_INFO_SHIFT	10
 #define KVM_DEV_ARM_VGIC_LINE_LEVEL_INFO_MASK \
 			(0x3fffffULL << KVM_DEV_ARM_VGIC_LINE_LEVEL_INFO_SHIFT)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [PATCH v10 41/59] KVM: arm64: nv: Implement maintenance interrupt forwarding
  2023-05-15 17:30 [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support Marc Zyngier
                   ` (39 preceding siblings ...)
  2023-05-15 17:30 ` [PATCH v10 40/59] KVM: arm64: nv: vgic: Allow userland to set VGIC maintenance IRQ Marc Zyngier
@ 2023-05-15 17:30 ` Marc Zyngier
  2023-05-15 17:30 ` [PATCH v10 42/59] KVM: arm64: nv: Deal with broken VGIC on maintenance interrupt delivery Marc Zyngier
                   ` (20 subsequent siblings)
  61 siblings, 0 replies; 95+ messages in thread
From: Marc Zyngier @ 2023-05-15 17:30 UTC (permalink / raw)
  To: kvmarm, kvm, linux-arm-kernel
  Cc: Alexandru Elisei, Andre Przywara, Chase Conklin,
	Christoffer Dall, Ganapatrao Kulkarni, Darren Hart, Jintack Lim,
	Russell King, Miguel Luis, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu

When we take a maintenance interrupt, we need to decide whether it
was generated by an action from the guest, or whether it is something
that needs to be forwarded to the guest hypervisor.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/vgic/vgic-init.c      | 33 ++++++++++++++++++++++++++++
 arch/arm64/kvm/vgic/vgic-v3-nested.c | 28 ++++++++++++++++++-----
 2 files changed, 55 insertions(+), 6 deletions(-)

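In effect, the guest's maintenance PPI behaves as a level line computed
from L1's view of the CPU interface. A sketch of that predicate
(l1_maint_level is an illustrative name, mirroring the hunk in
vgic-v3-nested.c below):

	static bool l1_maint_level(struct kvm_vcpu *vcpu)
	{
		struct vgic_v3_cpu_if *cpu_if =
			&vcpu->arch.vgic_cpu.nested_vgic_v3;

		/* enabled, and at least one maintenance cause pending */
		return (cpu_if->vgic_hcr & ICH_HCR_EN) &&
		       vgic_v3_get_misr(vcpu);
	}

kvm_vgic_inject_irq() is then called with that level, so the interrupt
is retired as soon as the condition goes away.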
diff --git a/arch/arm64/kvm/vgic/vgic-init.c b/arch/arm64/kvm/vgic/vgic-init.c
index 9d42c7cb2b58..4ded51eea1a6 100644
--- a/arch/arm64/kvm/vgic/vgic-init.c
+++ b/arch/arm64/kvm/vgic/vgic-init.c
@@ -6,10 +6,12 @@
 #include <linux/uaccess.h>
 #include <linux/interrupt.h>
 #include <linux/cpu.h>
+#include <linux/irq.h>
 #include <linux/kvm_host.h>
 #include <kvm/arm_vgic.h>
 #include <asm/kvm_emulate.h>
 #include <asm/kvm_mmu.h>
+#include <asm/kvm_nested.h>
 #include "vgic.h"
 
 /*
@@ -230,6 +232,16 @@ int kvm_vgic_vcpu_init(struct kvm_vcpu *vcpu)
 	if (!irqchip_in_kernel(vcpu->kvm))
 		return 0;
 
+	if (vcpu_has_nv(vcpu)) {
+		/* Cope with vintage userspace. Maybe we should fail instead */
+		if (vcpu->kvm->arch.vgic.maint_irq == 0)
+			vcpu->kvm->arch.vgic.maint_irq = kvm_vgic_global_state.maint_irq;
+		ret = kvm_vgic_set_owner(vcpu, vcpu->kvm->arch.vgic.maint_irq,
+					 vcpu);
+		if (ret)
+			return ret;
+	}
+
 	/*
 	 * If we are creating a VCPU with a GICv3 we must also register the
 	 * KVM io device for the redistributor that belongs to this VCPU.
@@ -488,12 +500,23 @@ void kvm_vgic_cpu_down(void)
 
 static irqreturn_t vgic_maintenance_handler(int irq, void *data)
 {
+	struct kvm_vcpu *vcpu = *(struct kvm_vcpu **)data;
+
 	/*
 	 * We cannot rely on the vgic maintenance interrupt to be
 	 * delivered synchronously. This means we can only use it to
 	 * exit the VM, and we perform the handling of EOIed
 	 * interrupts on the exit path (see vgic_fold_lr_state).
 	 */
+
+	/* If not nested, deactivate */
+	if (!vcpu || !vgic_state_is_nested(vcpu)) {
+		irq_set_irqchip_state(irq, IRQCHIP_STATE_ACTIVE, false);
+		return IRQ_HANDLED;
+	}
+
+	/* Assume nested from now */
+	vgic_v3_handle_nested_maint_irq(vcpu);
 	return IRQ_HANDLED;
 }
 
@@ -592,6 +615,16 @@ int kvm_vgic_hyp_init(void)
 		return ret;
 	}
 
+	if (has_mask) {
+		ret = irq_set_vcpu_affinity(kvm_vgic_global_state.maint_irq,
+					    kvm_get_running_vcpus());
+		if (ret) {
+			kvm_err("Error setting vcpu affinity\n");
+			free_percpu_irq(kvm_vgic_global_state.maint_irq, kvm_get_running_vcpus());
+			return ret;
+		}
+	}
+
 	kvm_info("vgic interrupt IRQ%d\n", kvm_vgic_global_state.maint_irq);
 	return 0;
 }
diff --git a/arch/arm64/kvm/vgic/vgic-v3-nested.c b/arch/arm64/kvm/vgic/vgic-v3-nested.c
index e88c75e79010..755e4819603a 100644
--- a/arch/arm64/kvm/vgic/vgic-v3-nested.c
+++ b/arch/arm64/kvm/vgic/vgic-v3-nested.c
@@ -175,10 +175,20 @@ void vgic_v3_sync_nested(struct kvm_vcpu *vcpu)
 void vgic_v3_load_nested(struct kvm_vcpu *vcpu)
 {
 	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
+	struct vgic_irq *irq;
+	unsigned long flags;
 
 	vgic_cpu->shadow_vgic_v3 = vgic_cpu->nested_vgic_v3;
 	vgic_v3_create_shadow_lr(vcpu);
 	__vgic_v3_restore_state(vcpu_shadow_if(vcpu));
+
+	irq = vgic_get_irq(vcpu->kvm, vcpu, vcpu->kvm->arch.vgic.maint_irq);
+	raw_spin_lock_irqsave(&irq->irq_lock, flags);
+	if (irq->line_level || irq->active)
+		irq_set_irqchip_state(kvm_vgic_global_state.maint_irq,
+				      IRQCHIP_STATE_ACTIVE, true);
+	raw_spin_unlock_irqrestore(&irq->irq_lock, flags);
+	vgic_put_irq(vcpu->kvm, irq);
 }
 
 void vgic_v3_put_nested(struct kvm_vcpu *vcpu)
@@ -193,20 +203,26 @@ void vgic_v3_put_nested(struct kvm_vcpu *vcpu)
 	 */
 	vgic_v3_fixup_shadow_lr_state(vcpu);
 	vgic_cpu->nested_vgic_v3 = vgic_cpu->shadow_vgic_v3;
+	irq_set_irqchip_state(kvm_vgic_global_state.maint_irq,
+			      IRQCHIP_STATE_ACTIVE, false);
 }
 
 void vgic_v3_handle_nested_maint_irq(struct kvm_vcpu *vcpu)
 {
-	struct vgic_v3_cpu_if *cpu_if = vcpu_nested_if(vcpu);
-
 	/*
 	 * If we exit a nested VM with a pending maintenance interrupt from the
 	 * GIC, then we need to forward this to the guest hypervisor so that it
 	 * can re-sync the appropriate LRs and sample level triggered interrupts
 	 * again.
 	 */
-	if (vgic_state_is_nested(vcpu) &&
-	    (cpu_if->vgic_hcr & ICH_HCR_EN) &&
-	    vgic_v3_get_misr(vcpu))
-		kvm_inject_nested_irq(vcpu);
+	if (vgic_state_is_nested(vcpu)) {
+		struct vgic_v3_cpu_if *cpu_if = vcpu_nested_if(vcpu);
+		bool state;
+
+		state  = cpu_if->vgic_hcr & ICH_HCR_EN;
+		state &= vgic_v3_get_misr(vcpu);
+
+		kvm_vgic_inject_irq(vcpu->kvm, vcpu->vcpu_id,
+				    vcpu->kvm->arch.vgic.maint_irq, state, vcpu);
+	}
 }
-- 
2.34.1



* [PATCH v10 42/59] KVM: arm64: nv: Deal with broken VGIC on maintenance interrupt delivery
  2023-05-15 17:30 [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support Marc Zyngier
                   ` (40 preceding siblings ...)
  2023-05-15 17:30 ` [PATCH v10 41/59] KVM: arm64: nv: Implement maintenance interrupt forwarding Marc Zyngier
@ 2023-05-15 17:30 ` Marc Zyngier
  2023-05-15 17:30 ` [PATCH v10 43/59] KVM: arm64: nv: Allow userspace to request KVM_ARM_VCPU_NESTED_VIRT Marc Zyngier
                   ` (19 subsequent siblings)
  61 siblings, 0 replies; 95+ messages in thread
From: Marc Zyngier @ 2023-05-15 17:30 UTC (permalink / raw)
  To: kvmarm, kvm, linux-arm-kernel
  Cc: Alexandru Elisei, Andre Przywara, Chase Conklin,
	Christoffer Dall, Ganapatrao Kulkarni, Darren Hart, Jintack Lim,
	Russell King, Miguel Luis, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu

Normal, non-nesting KVM deals with the maintenance interrupt in a very
simple way: we don't even try to handle it and just turn it off
as soon as we exit, long before the kernel can handle it.

However, with NV, we rely on the actual handling of the interrupt
to leave it active and pass it down to the L1 guest hypervisor
(we effectively treat it as an assigned interrupt, just like the
timer).

This doesn't work with something like the Apple M2, which doesn't
have an active state that allows the interrupt to be masked.

Instead, just disable the vgic after having taken the interrupt and
injected a virtual interrupt. This is enough for the guest to make
forward progress, but will limit its ability to handle further
interrupts until it next exits (IAR will always report "spurious").

Oh well.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/vgic/vgic-v3-nested.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/arm64/kvm/vgic/vgic-v3-nested.c b/arch/arm64/kvm/vgic/vgic-v3-nested.c
index 755e4819603a..919275b94625 100644
--- a/arch/arm64/kvm/vgic/vgic-v3-nested.c
+++ b/arch/arm64/kvm/vgic/vgic-v3-nested.c
@@ -225,4 +225,7 @@ void vgic_v3_handle_nested_maint_irq(struct kvm_vcpu *vcpu)
 		kvm_vgic_inject_irq(vcpu->kvm, vcpu->vcpu_id,
 				    vcpu->kvm->arch.vgic.maint_irq, state, vcpu);
 	}
+
+	if (unlikely(kvm_vgic_global_state.no_hw_deactivation))
+		sysreg_clear_set_s(SYS_ICH_HCR_EL2, ICH_HCR_EN, 0);
 }
-- 
2.34.1



* [PATCH v10 43/59] KVM: arm64: nv: Allow userspace to request KVM_ARM_VCPU_NESTED_VIRT
  2023-05-15 17:30 [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support Marc Zyngier
                   ` (41 preceding siblings ...)
  2023-05-15 17:30 ` [PATCH v10 42/59] KVM: arm64: nv: Deal with broken VGIC on maintenance interrupt delivery Marc Zyngier
@ 2023-05-15 17:30 ` Marc Zyngier
  2023-05-15 17:30 ` [PATCH v10 44/59] KVM: arm64: nv: Add handling of FEAT_TTL TLB invalidation Marc Zyngier
                   ` (18 subsequent siblings)
  61 siblings, 0 replies; 95+ messages in thread
From: Marc Zyngier @ 2023-05-15 17:30 UTC (permalink / raw)
  To: kvmarm, kvm, linux-arm-kernel
  Cc: Alexandru Elisei, Andre Przywara, Chase Conklin,
	Christoffer Dall, Ganapatrao Kulkarni, Darren Hart, Jintack Lim,
	Russell King, Miguel Luis, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu

Since we're (almost) feature complete, let's allow userspace to
request KVM_ARM_VCPU_NESTED_VIRT by bumping KVM_VCPU_MAX_FEATURES.
We also advertise the feature to userspace with a new capability.

It's going to be great...
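
For illustration, the userspace flow would look roughly like the sketch
below. The feature bit follows the KVM_ARM_VCPU_HAS_EL2 spelling used
elsewhere in the series; treat the details as assumptions rather than a
settled ABI.

#include <linux/kvm.h>
#include <sys/ioctl.h>

/* Minimal sketch: create an EL2-capable vcpu if the host supports it. */
static int vcpu_init_with_el2(int vm_fd, int vcpu_fd)
{
	struct kvm_vcpu_init init = { 0 };

	if (ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_ARM_EL2) <= 0)
		return -1;	/* no nested virt on this host */

	/* Let KVM pick the preferred target, then add the EL2 feature. */
	if (ioctl(vm_fd, KVM_ARM_PREFERRED_TARGET, &init))
		return -1;

	init.features[0] |= 1U << KVM_ARM_VCPU_HAS_EL2;

	return ioctl(vcpu_fd, KVM_ARM_VCPU_INIT, &init);
}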

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_host.h | 2 +-
 arch/arm64/kvm/arm.c              | 3 +++
 include/uapi/linux/kvm.h          | 1 +
 3 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 1a6fc53f992d..d9f47e88e39b 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -38,7 +38,7 @@
 
 #define KVM_MAX_VCPUS VGIC_V3_MAX_CPUS
 
-#define KVM_VCPU_MAX_FEATURES 7
+#define KVM_VCPU_MAX_FEATURES 8
 
 #define KVM_REQ_SLEEP \
 	KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 2f3149c09805..b69fb2c042b6 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -285,6 +285,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_ARM_EL1_32BIT:
 		r = cpus_have_const_cap(ARM64_HAS_32BIT_EL1);
 		break;
+	case KVM_CAP_ARM_EL2:
+		r = cpus_have_const_cap(ARM64_HAS_NESTED_VIRT);
+		break;
 	case KVM_CAP_GUEST_DEBUG_HW_BPS:
 		r = get_num_brps();
 		break;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 16287a996c32..b6bb905bc9b2 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1190,6 +1190,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP 225
 #define KVM_CAP_PMU_EVENT_MASKED_EVENTS 226
 #define KVM_CAP_COUNTER_OFFSET 227
+#define KVM_CAP_ARM_EL2 228
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
-- 
2.34.1



* [PATCH v10 44/59] KVM: arm64: nv: Add handling of FEAT_TTL TLB invalidation
  2023-05-15 17:30 [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support Marc Zyngier
                   ` (42 preceding siblings ...)
  2023-05-15 17:30 ` [PATCH v10 43/59] KVM: arm64: nv: Allow userspace to request KVM_ARM_VCPU_NESTED_VIRT Marc Zyngier
@ 2023-05-15 17:30 ` Marc Zyngier
  2023-05-15 17:30 ` [PATCH v10 45/59] KVM: arm64: nv: Invalidate TLBs based on shadow S2 TTL-like information Marc Zyngier
                   ` (17 subsequent siblings)
  61 siblings, 0 replies; 95+ messages in thread
From: Marc Zyngier @ 2023-05-15 17:30 UTC (permalink / raw)
  To: kvmarm, kvm, linux-arm-kernel
  Cc: Alexandru Elisei, Andre Przywara, Chase Conklin,
	Christoffer Dall, Ganapatrao Kulkarni, Darren Hart, Jintack Lim,
	Russell King, Miguel Luis, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu

Use the guest-provided TTL information to find out about the range
of the required invalidation.
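
Concretely, the TTL hint lives in bits [47:44] of the TLBI payload,
with the granule in its top two bits and the level in its bottom two.
A standalone sketch of the decode, covering only the 4K granule for
brevity (TLBI_TTL_TG_4K is 1 architecturally; the size constants are
spelled out so the snippet is self-contained):

#include <stdint.h>

#define SZ_4K	0x1000UL
#define SZ_2M	0x200000UL
#define SZ_1G	0x40000000UL

/*
 * E.g. ttl = 0b0110 (TG = 4K, level = 2) means the mapping is at most
 * one 2MiB block, so that is all we need to invalidate.
 */
static unsigned long ttl_hint_to_max_size(uint8_t ttl)
{
	uint8_t tg = (ttl >> 2) & 3;
	uint8_t level = ttl & 3;

	if (tg != 1)		/* other granules elided in this sketch */
		return 0;

	switch (level) {
	case 1: return SZ_1G;
	case 2: return SZ_2M;
	case 3: return SZ_4K;
	default: return 0;	/* level 0: no usable size information */
	}
}

The caller then aligns the address down (base &= ~(max_size - 1)) and
unmaps that window, which is exactly what the sys_regs.c hunk below
ends up doing via compute_tlb_inval_range().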

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_nested.h |  2 +
 arch/arm64/kvm/nested.c             | 90 +++++++++++++++++++++++++++++
 arch/arm64/kvm/sys_regs.c           | 26 +--------
 3 files changed, 93 insertions(+), 25 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
index f7aa32cc2a72..d203263334e1 100644
--- a/arch/arm64/include/asm/kvm_nested.h
+++ b/arch/arm64/include/asm/kvm_nested.h
@@ -127,6 +127,8 @@ int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe);
 extern bool forward_smc_trap(struct kvm_vcpu *vcpu);
 extern bool __check_nv_sr_forward(struct kvm_vcpu *vcpu);
 
+unsigned long compute_tlb_inval_range(struct kvm_s2_mmu *mmu, u64 val);
+
 struct sys_reg_params;
 struct sys_reg_desc;
 
diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
index a01ed0e1db82..4bb845f75725 100644
--- a/arch/arm64/kvm/nested.c
+++ b/arch/arm64/kvm/nested.c
@@ -354,6 +354,96 @@ int kvm_walk_nested_s2(struct kvm_vcpu *vcpu, phys_addr_t gipa,
 	return ret;
 }
 
+static unsigned int ttl_to_size(u8 ttl)
+{
+	int level = ttl & 3;
+	int gran = (ttl >> 2) & 3;
+	unsigned int max_size = 0;
+
+	switch (gran) {
+	case TLBI_TTL_TG_4K:
+		switch (level) {
+		case 0:
+			break;
+		case 1:
+			max_size = SZ_1G;
+			break;
+		case 2:
+			max_size = SZ_2M;
+			break;
+		case 3:
+			max_size = SZ_4K;
+			break;
+		}
+		break;
+	case TLBI_TTL_TG_16K:
+		switch (level) {
+		case 0:
+		case 1:
+			break;
+		case 2:
+			max_size = SZ_32M;
+			break;
+		case 3:
+			max_size = SZ_16K;
+			break;
+		}
+		break;
+	case TLBI_TTL_TG_64K:
+		switch (level) {
+		case 0:
+		case 1:
+			/* No 52bit IPA support */
+			break;
+		case 2:
+			max_size = SZ_512M;
+			break;
+		case 3:
+			max_size = SZ_64K;
+			break;
+		}
+		break;
+	default:			/* No size information */
+		break;
+	}
+
+	return max_size;
+}
+
+unsigned long compute_tlb_inval_range(struct kvm_s2_mmu *mmu, u64 val)
+{
+	unsigned long max_size;
+	u8 ttl;
+
+	ttl = FIELD_GET(GENMASK_ULL(47, 44), val);
+
+	max_size = ttl_to_size(ttl);
+
+	if (!max_size) {
+		/* Compute the maximum extent of the invalidation */
+		switch (mmu->tlb_vtcr & VTCR_EL2_TG0_MASK) {
+		case VTCR_EL2_TG0_4K:
+			max_size = SZ_1G;
+			break;
+		case VTCR_EL2_TG0_16K:
+			max_size = SZ_32M;
+			break;
+		case VTCR_EL2_TG0_64K:
+			/*
+			 * No, we do not support 52bit IPA in nested yet. Once
+			 * we do, this should be 4TB.
+			 */
+			max_size = SZ_512M;
+			break;
+		default:
+			BUG();
+		}
+	}
+
+	WARN_ON(!max_size);
+	return max_size;
+}
+
 /*
  * We can have multiple *different* MMU contexts with the same VMID:
  *
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index ce65a21f618d..3c2fe3fd9ab3 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -2916,36 +2916,12 @@ static void s2_mmu_unmap_stage2_ipa(struct kvm_s2_mmu *mmu,
 	 *
 	 * - NS bit: we're non-secure only.
 	 *
-	 * - TTL field: We already have the granule size from the
-	 *   VTCR_EL2.TG0 field, and the level is only relevant to the
-	 *   guest's S2PT.
-	 *
 	 * - IPA[51:48]: We don't support 52bit IPA just yet...
 	 *
 	 * And of course, adjust the IPA to be on an actual address.
 	 */
 	base_addr = (info->ipa.addr & GENMASK_ULL(35, 0)) << 12;
-
-	/* Compute the maximum extent of the invalidation */
-	switch (mmu->tlb_vtcr & VTCR_EL2_TG0_MASK) {
-	case VTCR_EL2_TG0_4K:
-		max_size = SZ_1G;
-		break;
-	case VTCR_EL2_TG0_16K:
-		max_size = SZ_32M;
-		break;
-	case VTCR_EL2_TG0_64K:
-		/*
-		 * No, we do not support 52bit IPA in nested yet. Once
-		 * we do, this should be 4TB.
-		 */
-		/* FIXME: remove the 52bit PA support from the IDregs */
-		max_size = SZ_512M;
-		break;
-	default:
-		BUG();
-	}
-
+	max_size = compute_tlb_inval_range(mmu, info->ipa.addr);
 	base_addr &= ~(max_size - 1);
 
 	kvm_unmap_stage2_range(mmu, base_addr, max_size);
-- 
2.34.1



* [PATCH v10 45/59] KVM: arm64: nv: Invalidate TLBs based on shadow S2 TTL-like information
  2023-05-15 17:30 [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support Marc Zyngier
                   ` (43 preceding siblings ...)
  2023-05-15 17:30 ` [PATCH v10 44/59] KVM: arm64: nv: Add handling of FEAT_TTL TLB invalidation Marc Zyngier
@ 2023-05-15 17:30 ` Marc Zyngier
  2023-05-15 17:30 ` [PATCH v10 46/59] KVM: arm64: nv: Tag shadow S2 entries with nested level Marc Zyngier
                   ` (16 subsequent siblings)
  61 siblings, 0 replies; 95+ messages in thread
From: Marc Zyngier @ 2023-05-15 17:30 UTC (permalink / raw)
  To: kvmarm, kvm, linux-arm-kernel
  Cc: Alexandru Elisei, Andre Przywara, Chase Conklin,
	Christoffer Dall, Ganapatrao Kulkarni, Darren Hart, Jintack Lim,
	Russell King, Miguel Luis, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu

In order to be able to make S2 TLB invalidations more performant on NV,
let's use a scheme derived from the ARMv8.4 TTL extension.

If bits [56:55] in the descriptor are non-zero, they indicate a level
which can be used as an invalidation range.
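
Put differently: when building the shadow S2, the guest's translation
level is stashed in the two software bits of the leaf descriptor, and a
TTL-style hint is later rebuilt as (granule << 2) | level. A minimal
sketch of that round trip, assuming SW0/SW1 are bits 55 and 56 as in
the KVM page-table code (the helper names are illustrative):

#include <stdint.h>

/* PTE software bits [56:55], matching KVM_PGTABLE_PROT_SW1|SW0. */
#define NV_LEVEL_SHIFT	55
#define NV_LEVEL_MASK	(3ULL << NV_LEVEL_SHIFT)

static uint64_t tag_shadow_pte(uint64_t pte, unsigned int guest_level)
{
	/* Stash the guest S2 level (1..3) in the SW bits of the leaf. */
	return (pte & ~NV_LEVEL_MASK) |
	       ((uint64_t)guest_level << NV_LEVEL_SHIFT);
}

static uint8_t rebuild_ttl(uint64_t pte, uint8_t tg /* 1=4K 2=16K 3=64K */)
{
	uint8_t level = (pte >> NV_LEVEL_SHIFT) & 3;

	/* A zero level means "no information"; otherwise TTL = TG:level. */
	return level ? (uint8_t)((tg << 2) | level) : 0;
}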

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_nested.h |  2 +
 arch/arm64/kvm/nested.c             | 81 +++++++++++++++++++++++++++++
 2 files changed, 83 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
index d203263334e1..a34f6e5a531a 100644
--- a/arch/arm64/include/asm/kvm_nested.h
+++ b/arch/arm64/include/asm/kvm_nested.h
@@ -135,4 +135,6 @@ struct sys_reg_desc;
 void access_nested_id_reg(struct kvm_vcpu *v, struct sys_reg_params *p,
 			  const struct sys_reg_desc *r);
 
+#define KVM_NV_GUEST_MAP_SZ	(KVM_PGTABLE_PROT_SW1 | KVM_PGTABLE_PROT_SW0)
+
 #endif /* __ARM64_KVM_NESTED_H */
diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
index 4bb845f75725..f41a86f0e924 100644
--- a/arch/arm64/kvm/nested.c
+++ b/arch/arm64/kvm/nested.c
@@ -4,6 +4,7 @@
  * Author: Jintack Lim <jintack.lim@linaro.org>
  */
 
+#include <linux/bitfield.h>
 #include <linux/kvm.h>
 #include <linux/kvm_host.h>
 
@@ -410,6 +411,81 @@ static unsigned int ttl_to_size(u8 ttl)
 	return max_size;
 }
 
+/*
+ * Compute the equivalent of the TTL field by parsing the shadow PT.  The
+ * granule size is extracted from the cached VTCR_EL2.TG0 while the level is
+ * retrieved from the first entry carrying the level as a tag.
+ */
+static u8 get_guest_mapping_ttl(struct kvm_s2_mmu *mmu, u64 addr)
+{
+	u64 tmp, sz = 0, vtcr = mmu->tlb_vtcr;
+	kvm_pte_t pte;
+	u8 ttl, level;
+
+	switch (vtcr & VTCR_EL2_TG0_MASK) {
+	case VTCR_EL2_TG0_4K:
+		ttl = (1 << 2);
+		break;
+	case VTCR_EL2_TG0_16K:
+		ttl = (2 << 2);
+		break;
+	case VTCR_EL2_TG0_64K:
+		ttl = (3 << 2);
+		break;
+	default:
+		BUG();
+	}
+
+	tmp = addr;
+
+again:
+	/* Iteratively compute the block sizes for a particular granule size */
+	switch (vtcr & VTCR_EL2_TG0_MASK) {
+	case VTCR_EL2_TG0_4K:
+		if	(sz < SZ_4K)	sz = SZ_4K;
+		else if (sz < SZ_2M)	sz = SZ_2M;
+		else if (sz < SZ_1G)	sz = SZ_1G;
+		else			sz = 0;
+		break;
+	case VTCR_EL2_TG0_16K:
+		if	(sz < SZ_16K)	sz = SZ_16K;
+		else if (sz < SZ_32M)	sz = SZ_32M;
+		else			sz = 0;
+		break;
+	case VTCR_EL2_TG0_64K:
+		if	(sz < SZ_64K)	sz = SZ_64K;
+		else if (sz < SZ_512M)	sz = SZ_512M;
+		else			sz = 0;
+		break;
+	default:
+		BUG();
+	}
+
+	if (sz == 0)
+		return 0;
+
+	tmp &= ~(sz - 1);
+	if (kvm_pgtable_get_leaf(mmu->pgt, tmp, &pte, NULL))
+		goto again;
+	if (!(pte & PTE_VALID))
+		goto again;
+	level = FIELD_GET(KVM_NV_GUEST_MAP_SZ, pte);
+	if (!level)
+		goto again;
+
+	ttl |= level;
+
+	/*
+	 * We now have found some level information in the shadow S2. Check
+	 * that the resulting range is actually including the original IPA.
+	 */
+	sz = ttl_to_size(ttl);
+	if (addr < (tmp + sz))
+		return ttl;
+
+	return 0;
+}
+
 unsigned long compute_tlb_inval_range(struct kvm_s2_mmu *mmu, u64 val)
 {
 	unsigned long max_size;
@@ -417,6 +493,11 @@ unsigned long compute_tlb_inval_range(struct kvm_s2_mmu *mmu, u64 val)
 
 	ttl = FIELD_GET(GENMASK_ULL(47, 44), val);
 
+	if (!(cpus_have_const_cap(ARM64_HAS_ARMv8_4_TTL) && ttl)) {
+		u64 addr = (val & GENMASK_ULL(35, 0)) << 12;
+		ttl = get_guest_mapping_ttl(mmu, addr);
+	}
+
 	max_size = ttl_to_size(ttl);
 
 	if (!max_size) {
-- 
2.34.1



* [PATCH v10 46/59] KVM: arm64: nv: Tag shadow S2 entries with nested level
  2023-05-15 17:30 [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support Marc Zyngier
                   ` (44 preceding siblings ...)
  2023-05-15 17:30 ` [PATCH v10 45/59] KVM: arm64: nv: Invalidate TLBs based on shadow S2 TTL-like information Marc Zyngier
@ 2023-05-15 17:30 ` Marc Zyngier
  2023-05-15 17:30 ` [PATCH v10 47/59] KVM: arm64: nv: Add include containing the VNCR_EL2 offsets Marc Zyngier
                   ` (15 subsequent siblings)
  61 siblings, 0 replies; 95+ messages in thread
From: Marc Zyngier @ 2023-05-15 17:30 UTC (permalink / raw)
  To: kvmarm, kvm, linux-arm-kernel
  Cc: Alexandru Elisei, Andre Przywara, Chase Conklin,
	Christoffer Dall, Ganapatrao Kulkarni, Darren Hart, Jintack Lim,
	Russell King, Miguel Luis, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu

Populate bits [56:55] of the leaf entry with the level provided
by the guest's S2 translation.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_nested.h |  7 +++++++
 arch/arm64/kvm/mmu.c                | 16 ++++++++++++++--
 2 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
index a34f6e5a531a..d1596d28a5f8 100644
--- a/arch/arm64/include/asm/kvm_nested.h
+++ b/arch/arm64/include/asm/kvm_nested.h
@@ -5,6 +5,8 @@
 #include <linux/bitfield.h>
 #include <linux/kvm_host.h>
 
+#include <asm/kvm_pgtable.h>
+
 static inline bool vcpu_has_nv(const struct kvm_vcpu *vcpu)
 {
 	return (!__is_defined(__KVM_NVHE_HYPERVISOR__) &&
@@ -137,4 +139,9 @@ void access_nested_id_reg(struct kvm_vcpu *v, struct sys_reg_params *p,
 
 #define KVM_NV_GUEST_MAP_SZ	(KVM_PGTABLE_PROT_SW1 | KVM_PGTABLE_PROT_SW0)
 
+static inline u64 kvm_encode_nested_level(struct kvm_s2_trans *trans)
+{
+	return FIELD_PREP(KVM_NV_GUEST_MAP_SZ, trans->level);
+}
+
 #endif /* __ARM64_KVM_NESTED_H */
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 6cd52397b97e..ddcc768d1f67 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1458,11 +1458,17 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	 * Potentially reduce shadow S2 permissions to match the guest's own
 	 * S2. For exec faults, we'd only reach this point if the guest
 	 * actually allowed it (see kvm_s2_handle_perm_fault).
+	 *
+	 * Also encode the level of the nested translation in the SW bits of
+	 * the PTE/PMD/PUD. This will be retrieved on TLB invalidation from
+	 * the guest.
 	 */
 	if (nested) {
 		writable &= kvm_s2_trans_writable(nested);
 		if (!kvm_s2_trans_readable(nested))
 			prot &= ~KVM_PGTABLE_PROT_R;
+
+		prot |= kvm_encode_nested_level(nested);
 	}
 
 	read_lock(&kvm->mmu_lock);
@@ -1516,14 +1522,20 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	 * permissions only if vma_pagesize equals fault_granule. Otherwise,
 	 * kvm_pgtable_stage2_map() should be called to change block size.
 	 */
-	if (fault_status == ESR_ELx_FSC_PERM && vma_pagesize == fault_granule)
+	if (fault_status == ESR_ELx_FSC_PERM && vma_pagesize == fault_granule) {
+		/*
+		 * Drop the SW bits in favour of those stored in the
+		 * PTE, which will be preserved.
+		 */
+		prot &= ~KVM_NV_GUEST_MAP_SZ;
 		ret = kvm_pgtable_stage2_relax_perms(pgt, fault_ipa, prot);
-	else
+	} else {
 		ret = kvm_pgtable_stage2_map(pgt, fault_ipa, vma_pagesize,
 					     __pfn_to_phys(pfn), prot,
 					     memcache,
 					     KVM_PGTABLE_WALK_HANDLE_FAULT |
 					     KVM_PGTABLE_WALK_SHARED);
+	}
 
 	/* Mark the page dirty only if the fault is handled successfully */
 	if (writable && !ret) {
-- 
2.34.1



* [PATCH v10 47/59] KVM: arm64: nv: Add include containing the VNCR_EL2 offsets
  2023-05-15 17:30 [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support Marc Zyngier
                   ` (45 preceding siblings ...)
  2023-05-15 17:30 ` [PATCH v10 46/59] KVM: arm64: nv: Tag shadow S2 entries with nested level Marc Zyngier
@ 2023-05-15 17:30 ` Marc Zyngier
  2023-05-15 17:30 ` [PATCH v10 48/59] KVM: arm64: nv: Map VNCR-capable registers to a separate page Marc Zyngier
                   ` (14 subsequent siblings)
  61 siblings, 0 replies; 95+ messages in thread
From: Marc Zyngier @ 2023-05-15 17:30 UTC (permalink / raw)
  To: kvmarm, kvm, linux-arm-kernel
  Cc: Alexandru Elisei, Andre Przywara, Chase Conklin,
	Christoffer Dall, Ganapatrao Kulkarni, Darren Hart, Jintack Lim,
	Russell King, Miguel Luis, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu

VNCR_EL2 points to a page containing a number of system registers
accessed by a guest hypervisor when ARMv8.4-NV is enabled.

Let's document the offsets in that page, as we are going to use
this layout.
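
The point of the layout is that, with FEAT_NV2, what looks like a
sysreg access from the guest hypervisor turns into a plain load or
store at a fixed byte offset within the page addressed by VNCR_EL2.
Illustratively (the accessors below are hypothetical and merely apply
the offsets defined in this file):

#include <stdint.h>

/* VNCR_TTBR0_EL1 etc. come from the vncr_mapping.h added here. */
static inline uint64_t vncr_read(void *vncr_page, unsigned int offset)
{
	return *(uint64_t *)((char *)vncr_page + offset);
}

static inline void vncr_write(void *vncr_page, unsigned int offset,
			      uint64_t val)
{
	*(uint64_t *)((char *)vncr_page + offset) = val;
}

/* e.g.: uint64_t ttbr0 = vncr_read(page, VNCR_TTBR0_EL1); */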

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/vncr_mapping.h | 74 +++++++++++++++++++++++++++
 1 file changed, 74 insertions(+)
 create mode 100644 arch/arm64/include/asm/vncr_mapping.h

diff --git a/arch/arm64/include/asm/vncr_mapping.h b/arch/arm64/include/asm/vncr_mapping.h
new file mode 100644
index 000000000000..c0a2acd5045c
--- /dev/null
+++ b/arch/arm64/include/asm/vncr_mapping.h
@@ -0,0 +1,74 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * System register offsets in the VNCR page
+ * All offsets are *byte* displacements!
+ */
+
+#ifndef __ARM64_VNCR_MAPPING_H__
+#define __ARM64_VNCR_MAPPING_H__
+
+#define VNCR_VTTBR_EL2          0x020
+#define VNCR_VTCR_EL2           0x040
+#define VNCR_VMPIDR_EL2         0x050
+#define VNCR_CNTVOFF_EL2        0x060
+#define VNCR_HCR_EL2            0x078
+#define VNCR_HSTR_EL2           0x080
+#define VNCR_VPIDR_EL2          0x088
+#define VNCR_TPIDR_EL2          0x090
+#define VNCR_VNCR_EL2           0x0B0
+#define VNCR_CPACR_EL1          0x100
+#define VNCR_CONTEXTIDR_EL1     0x108
+#define VNCR_SCTLR_EL1          0x110
+#define VNCR_ACTLR_EL1          0x118
+#define VNCR_TCR_EL1            0x120
+#define VNCR_AFSR0_EL1          0x128
+#define VNCR_AFSR1_EL1          0x130
+#define VNCR_ESR_EL1            0x138
+#define VNCR_MAIR_EL1           0x140
+#define VNCR_AMAIR_EL1          0x148
+#define VNCR_MDSCR_EL1          0x158
+#define VNCR_SPSR_EL1           0x160
+#define VNCR_CNTV_CVAL_EL0      0x168
+#define VNCR_CNTV_CTL_EL0       0x170
+#define VNCR_CNTP_CVAL_EL0      0x178
+#define VNCR_CNTP_CTL_EL0       0x180
+#define VNCR_SCXTNUM_EL1        0x188
+#define VNCR_TFSR_EL1		0x190
+#define VNCR_ZCR_EL1            0x1E0
+#define VNCR_TTBR0_EL1          0x200
+#define VNCR_TTBR1_EL1          0x210
+#define VNCR_FAR_EL1            0x220
+#define VNCR_ELR_EL1            0x230
+#define VNCR_SP_EL1             0x240
+#define VNCR_VBAR_EL1           0x250
+#define VNCR_ICH_LR0_EL2        0x400
+//      VNCR_ICH_LRN_EL2(n)     VNCR_ICH_LR0_EL2+8*((n) & 7)
+#define VNCR_ICH_AP0R0_EL2      0x480
+//      VNCR_ICH_AP0RN_EL2(n)   VNCR_ICH_AP0R0_EL2+8*((n) & 3)
+#define VNCR_ICH_AP1R0_EL2      0x4A0
+//      VNCR_ICH_AP1RN_EL2(n)   VNCR_ICH_AP1R0_EL2+8*((n) & 3)
+#define VNCR_ICH_HCR_EL2        0x4C0
+#define VNCR_ICH_VMCR_EL2       0x4C8
+#define VNCR_VDISR_EL2          0x500
+#define VNCR_PMBLIMITR_EL1      0x800
+#define VNCR_PMBPTR_EL1         0x810
+#define VNCR_PMBSR_EL1          0x820
+#define VNCR_PMSCR_EL1          0x828
+#define VNCR_PMSEVFR_EL1        0x830
+#define VNCR_PMSICR_EL1         0x838
+#define VNCR_PMSIRR_EL1         0x840
+#define VNCR_PMSLATFR_EL1       0x848
+#define VNCR_TRFCR_EL1          0x880
+#define VNCR_MPAM1_EL1          0x900
+#define VNCR_MPAMHCR_EL2        0x930
+#define VNCR_MPAMVPMV_EL2       0x938
+#define VNCR_MPAMVPM0_EL2       0x940
+#define VNCR_MPAMVPM1_EL2       0x948
+#define VNCR_MPAMVPM2_EL2       0x950
+#define VNCR_MPAMVPM3_EL2       0x958
+#define VNCR_MPAMVPM4_EL2       0x960
+#define VNCR_MPAMVPM5_EL2       0x968
+#define VNCR_MPAMVPM6_EL2       0x970
+#define VNCR_MPAMVPM7_EL2       0x978
+
+#endif /* __ARM64_VNCR_MAPPING_H__ */
-- 
2.34.1



* [PATCH v10 48/59] KVM: arm64: nv: Map VNCR-capable registers to a separate page
  2023-05-15 17:30 [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support Marc Zyngier
                   ` (46 preceding siblings ...)
  2023-05-15 17:30 ` [PATCH v10 47/59] KVM: arm64: nv: Add include containing the VNCR_EL2 offsets Marc Zyngier
@ 2023-05-15 17:30 ` Marc Zyngier
  2023-05-15 17:30 ` [PATCH v10 49/59] KVM: arm64: nv: Move nested vgic state into the sysreg file Marc Zyngier
                   ` (13 subsequent siblings)
  61 siblings, 0 replies; 95+ messages in thread
From: Marc Zyngier @ 2023-05-15 17:30 UTC (permalink / raw)
  To: kvmarm, kvm, linux-arm-kernel
  Cc: Alexandru Elisei, Andre Przywara, Chase Conklin,
	Christoffer Dall, Ganapatrao Kulkarni, Darren Hart, Jintack Lim,
	Russell King, Miguel Luis, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu

With ARMv8.4-NV, registers that can be directly accessed in memory
by the guest have to live at architected offsets in a special page.

Let's annotate the sysreg enum to reflect the offset at which they
are in this page, with a little twist:

If running on HW that doesn't have the ARMv8.4-NV feature, or even
a VM that doesn't use NV, we store all the system registers in the
usual sys_regs array. The only difference with the pre-8.4
situation is that VNCR-capable registers are at a "similar" offset to
the one they have in the VNCR page (we can compute the actual offset at compile
time), and that the sys_regs array is both bigger and sparse.
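
To make the "terrible hack" in the diff below a bit more tangible, here
is how a single VNCR() entry expands, with made-up counter values (the
0x110 offset is VNCR_SCTLR_EL1 from the previous patch):

#define __MAX__(x, y)	((x) ^ (((x) ^ (y)) & -((x) < (y))))

enum sketch {
	__VNCR_START__ = 100,	/* illustrative value of the counter */
	__before_SCTLR_EL1,	/* == 101, snapshots the implicit counter */
	SCTLR_EL1 = __VNCR_START__ + (0x110 / 8),	/* == 134 */
	__after_SCTLR_EL1 = __MAX__(__before_SCTLR_EL1 - 1, SCTLR_EL1),
	/* == max(100, 134) == 134, so the next enumerator is 135 */
};

Whichever of the running counter and the explicit VNCR slot is larger
wins, which is what guarantees that NR_SYS_REGS (and thus the sys_regs
array) is always big enough, no matter how the enum is ordered.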

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_host.h | 104 ++++++++++++++++++++----------
 arch/arm64/tools/cpucaps          |   1 +
 2 files changed, 70 insertions(+), 35 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index d9f47e88e39b..14e5bdbe7153 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -27,6 +27,7 @@
 #include <asm/fpsimd.h>
 #include <asm/kvm.h>
 #include <asm/kvm_asm.h>
+#include <asm/vncr_mapping.h>
 
 #define __KVM_HAVE_ARCH_INTC_INITIALIZED
 
@@ -315,32 +316,33 @@ struct kvm_vcpu_fault_info {
 	u64 disr_el1;		/* Deferred [SError] Status Register */
 };
 
+/*
+ * VNCR() just places the VNCR_capable registers in the enum after
+ * __VNCR_START__, and sets the value (after correction) to be an 8-byte offset
+ * from the VNCR base. As we don't require the enum to be otherwise ordered,
+ * we need the terrible hack below to ensure that we correctly size the
+ * sys_regs array, no matter what.
+ *
+ * The __MAX__ macro has been lifted from Sean Eron Anderson's wonderful
+ * treasure trove of bit hacks:
+ * https://graphics.stanford.edu/~seander/bithacks.html#IntegerMinOrMax
+ */
+#define __MAX__(x,y)	((x) ^ (((x) ^ (y)) & -((x) < (y))))
+#define VNCR(r)						\
+	__before_##r,					\
+	r = __VNCR_START__ + ((VNCR_ ## r) / 8),	\
+	__after_##r = __MAX__(__before_##r - 1, r)
+
 enum vcpu_sysreg {
 	__INVALID_SYSREG__,   /* 0 is reserved as an invalid value */
 	MPIDR_EL1,	/* MultiProcessor Affinity Register */
 	CLIDR_EL1,	/* Cache Level ID Register */
 	CSSELR_EL1,	/* Cache Size Selection Register */
-	SCTLR_EL1,	/* System Control Register */
-	ACTLR_EL1,	/* Auxiliary Control Register */
-	CPACR_EL1,	/* Coprocessor Access Control */
-	ZCR_EL1,	/* SVE Control */
-	TTBR0_EL1,	/* Translation Table Base Register 0 */
-	TTBR1_EL1,	/* Translation Table Base Register 1 */
-	TCR_EL1,	/* Translation Control Register */
-	ESR_EL1,	/* Exception Syndrome Register */
-	AFSR0_EL1,	/* Auxiliary Fault Status Register 0 */
-	AFSR1_EL1,	/* Auxiliary Fault Status Register 1 */
-	FAR_EL1,	/* Fault Address Register */
-	MAIR_EL1,	/* Memory Attribute Indirection Register */
-	VBAR_EL1,	/* Vector Base Address Register */
-	CONTEXTIDR_EL1,	/* Context ID Register */
 	TPIDR_EL0,	/* Thread ID, User R/W */
 	TPIDRRO_EL0,	/* Thread ID, User R/O */
 	TPIDR_EL1,	/* Thread ID, Privileged */
-	AMAIR_EL1,	/* Aux Memory Attribute Indirection Register */
 	CNTKCTL_EL1,	/* Timer Control Register (EL1) */
 	PAR_EL1,	/* Physical Address Register */
-	MDSCR_EL1,	/* Monitor Debug System Control Register */
 	MDCCINT_EL1,	/* Monitor Debug Comms Channel Interrupt Enable Reg */
 	OSLSR_EL1,	/* OS Lock Status Register */
 	DISR_EL1,	/* Deferred Interrupt Status Register */
@@ -371,20 +373,9 @@ enum vcpu_sysreg {
 	APGAKEYLO_EL1,
 	APGAKEYHI_EL1,
 
-	ELR_EL1,
-	SP_EL1,
-	SPSR_EL1,
-
-	CNTVOFF_EL2,
-	CNTV_CVAL_EL0,
-	CNTV_CTL_EL0,
-	CNTP_CVAL_EL0,
-	CNTP_CTL_EL0,
-
 	/* Memory Tagging Extension registers */
 	RGSR_EL1,	/* Random Allocation Tag Seed Register */
 	GCR_EL1,	/* Tag Control Register */
-	TFSR_EL1,	/* Tag Fault Status Register (EL1) */
 	TFSRE0_EL1,	/* Tag Fault Status Register (EL0) */
 
 	/* 32bit specific registers. */
@@ -394,20 +385,14 @@ enum vcpu_sysreg {
 	DBGVCR32_EL2,	/* Debug Vector Catch Register */
 
 	/* EL2 registers */
-	VPIDR_EL2,	/* Virtualization Processor ID Register */
-	VMPIDR_EL2,	/* Virtualization Multiprocessor ID Register */
 	SCTLR_EL2,	/* System Control Register (EL2) */
 	ACTLR_EL2,	/* Auxiliary Control Register (EL2) */
-	HCR_EL2,	/* Hypervisor Configuration Register */
 	MDCR_EL2,	/* Monitor Debug Configuration Register (EL2) */
 	CPTR_EL2,	/* Architectural Feature Trap Register (EL2) */
-	HSTR_EL2,	/* Hypervisor System Trap Register */
 	HACR_EL2,	/* Hypervisor Auxiliary Control Register */
 	TTBR0_EL2,	/* Translation Table Base Register 0 (EL2) */
 	TTBR1_EL2,	/* Translation Table Base Register 1 (EL2) */
 	TCR_EL2,	/* Translation Control Register (EL2) */
-	VTTBR_EL2,	/* Virtualization Translation Table Base Register */
-	VTCR_EL2,	/* Virtualization Translation Control Register */
 	SPSR_EL2,	/* EL2 saved program status register */
 	ELR_EL2,	/* EL2 exception link register */
 	AFSR0_EL2,	/* Auxiliary Fault Status Register 0 (EL2) */
@@ -420,7 +405,6 @@ enum vcpu_sysreg {
 	VBAR_EL2,	/* Vector Base Address Register (EL2) */
 	RVBAR_EL2,	/* Reset Vector Base Address Register */
 	CONTEXTIDR_EL2,	/* Context ID Register (EL2) */
-	TPIDR_EL2,	/* EL2 Software Thread ID Register */
 	CNTHCTL_EL2,	/* Counter-timer Hypervisor Control register */
 	SP_EL2,		/* EL2 Stack Pointer */
 	CNTHP_CTL_EL2,
@@ -428,6 +412,42 @@ enum vcpu_sysreg {
 	CNTHV_CTL_EL2,
 	CNTHV_CVAL_EL2,
 
+	__VNCR_START__,	/* Any VNCR-capable reg goes after this point */
+
+	VNCR(SCTLR_EL1),/* System Control Register */
+	VNCR(ACTLR_EL1),/* Auxiliary Control Register */
+	VNCR(CPACR_EL1),/* Coprocessor Access Control */
+	VNCR(ZCR_EL1),	/* SVE Control */
+	VNCR(TTBR0_EL1),/* Translation Table Base Register 0 */
+	VNCR(TTBR1_EL1),/* Translation Table Base Register 1 */
+	VNCR(TCR_EL1),	/* Translation Control Register */
+	VNCR(ESR_EL1),	/* Exception Syndrome Register */
+	VNCR(AFSR0_EL1),/* Auxiliary Fault Status Register 0 */
+	VNCR(AFSR1_EL1),/* Auxiliary Fault Status Register 1 */
+	VNCR(FAR_EL1),	/* Fault Address Register */
+	VNCR(MAIR_EL1),	/* Memory Attribute Indirection Register */
+	VNCR(VBAR_EL1),	/* Vector Base Address Register */
+	VNCR(CONTEXTIDR_EL1),	/* Context ID Register */
+	VNCR(AMAIR_EL1),/* Aux Memory Attribute Indirection Register */
+	VNCR(MDSCR_EL1),/* Monitor Debug System Control Register */
+	VNCR(ELR_EL1),
+	VNCR(SP_EL1),
+	VNCR(SPSR_EL1),
+	VNCR(TFSR_EL1),	/* Tag Fault Status Register (EL1) */
+	VNCR(VPIDR_EL2),/* Virtualization Processor ID Register */
+	VNCR(VMPIDR_EL2),/* Virtualization Multiprocessor ID Register */
+	VNCR(HCR_EL2),	/* Hypervisor Configuration Register */
+	VNCR(HSTR_EL2),	/* Hypervisor System Trap Register */
+	VNCR(VTTBR_EL2),/* Virtualization Translation Table Base Register */
+	VNCR(VTCR_EL2),	/* Virtualization Translation Control Register */
+	VNCR(TPIDR_EL2),/* EL2 Software Thread ID Register */
+
+	VNCR(CNTVOFF_EL2),
+	VNCR(CNTV_CVAL_EL0),
+	VNCR(CNTV_CTL_EL0),
+	VNCR(CNTP_CVAL_EL0),
+	VNCR(CNTP_CTL_EL0),
+
 	NR_SYS_REGS	/* Nothing after this line! */
 };
 
@@ -444,6 +464,9 @@ struct kvm_cpu_context {
 	u64 sys_regs[NR_SYS_REGS];
 
 	struct kvm_vcpu *__hyp_running_vcpu;
+
+	/* This pointer has to be 4kB aligned. */
+	u64 *vncr_array;
 };
 
 struct kvm_host_data {
@@ -803,8 +826,19 @@ struct kvm_vcpu_arch {
  * accessed by a running VCPU.  For example, for userspace access or
  * for system registers that are never context switched, but only
  * emulated.
+ *
+ * Don't bother with VNCR-based accesses in the nVHE code, it has no
+ * business dealing with NV.
  */
-#define __ctxt_sys_reg(c,r)	(&(c)->sys_regs[(r)])
+static inline u64 *__ctxt_sys_reg(const struct kvm_cpu_context *ctxt, int r)
+{
+#if !defined (__KVM_NVHE_HYPERVISOR__)
+	if (unlikely(cpus_have_final_cap(ARM64_HAS_ENHANCED_NESTED_VIRT) &&
+		     r >= __VNCR_START__ && ctxt->vncr_array))
+		return &ctxt->vncr_array[r - __VNCR_START__];
+#endif
+	return (u64 *)&ctxt->sys_regs[r];
+}
 
 #define ctxt_sys_reg(c,r)	(*__ctxt_sys_reg(c,r))
 
diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps
index 606d1184a5e9..b9bb46021593 100644
--- a/arch/arm64/tools/cpucaps
+++ b/arch/arm64/tools/cpucaps
@@ -24,6 +24,7 @@ HAS_DIT
 HAS_E0PD
 HAS_ECV
 HAS_ECV_CNTPOFF
+HAS_ENHANCED_NESTED_VIRT
 HAS_EPAN
 HAS_EVT
 HAS_GENERIC_AUTH
-- 
2.34.1



* [PATCH v10 49/59] KVM: arm64: nv: Move nested vgic state into the sysreg file
  2023-05-15 17:30 [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support Marc Zyngier
                   ` (47 preceding siblings ...)
  2023-05-15 17:30 ` [PATCH v10 48/59] KVM: arm64: nv: Map VNCR-capable registers to a separate page Marc Zyngier
@ 2023-05-15 17:30 ` Marc Zyngier
  2023-05-15 17:30 ` [PATCH v10 50/59] KVM: arm64: Add FEAT_NV2 cpu feature Marc Zyngier
                   ` (12 subsequent siblings)
  61 siblings, 0 replies; 95+ messages in thread
From: Marc Zyngier @ 2023-05-15 17:30 UTC (permalink / raw)
  To: kvmarm, kvm, linux-arm-kernel
  Cc: Alexandru Elisei, Andre Przywara, Chase Conklin,
	Christoffer Dall, Ganapatrao Kulkarni, Darren Hart, Jintack Lim,
	Russell King, Miguel Luis, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu

The vgic nested state needs to be accessible from the VNCR page, and
thus needs to be part of the normal sysreg file. Let's move it there.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_host.h    |  9 +++
 arch/arm64/kvm/sys_regs.c            | 53 ++++++++++++------
 arch/arm64/kvm/vgic/vgic-v3-nested.c | 82 ++++++++++++++--------------
 arch/arm64/kvm/vgic/vgic-v3.c        | 12 ++--
 arch/arm64/kvm/vgic/vgic.h           | 10 ++++
 include/kvm/arm_vgic.h               |  7 ---
 6 files changed, 104 insertions(+), 69 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 14e5bdbe7153..eb8ad037f1c5 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -448,6 +448,15 @@ enum vcpu_sysreg {
 	VNCR(CNTP_CVAL_EL0),
 	VNCR(CNTP_CTL_EL0),
 
+	VNCR(ICH_LR0_EL2),
+	ICH_LR15_EL2 = ICH_LR0_EL2 + 15,
+	VNCR(ICH_AP0R0_EL2),
+	ICH_AP0R3_EL2 = ICH_AP0R0_EL2 + 3,
+	VNCR(ICH_AP1R0_EL2),
+	ICH_AP1R3_EL2 = ICH_AP1R0_EL2 + 3,
+	VNCR(ICH_HCR_EL2),
+	VNCR(ICH_VMCR_EL2),
+
 	NR_SYS_REGS	/* Nothing after this line! */
 };
 
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 3c2fe3fd9ab3..228dba82e9e5 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -2025,17 +2025,17 @@ static bool access_gic_apr(struct kvm_vcpu *vcpu,
 			   struct sys_reg_params *p,
 			   const struct sys_reg_desc *r)
 {
-	struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.nested_vgic_v3;
-	u32 index, *base;
+	u64 *base;
+	u8 index;
 
 	index = r->Op2;
 	if (r->CRm == 8)
-		base = cpu_if->vgic_ap0r;
+		base = __ctxt_sys_reg(&vcpu->arch.ctxt, ICH_AP0R0_EL2);
 	else
-		base = cpu_if->vgic_ap1r;
+		base = __ctxt_sys_reg(&vcpu->arch.ctxt, ICH_AP1R0_EL2);
 
 	if (p->is_write)
-		base[index] = p->regval;
+		base[index] = lower_32_bits(p->regval);
 	else
 		p->regval = base[index];
 
@@ -2046,12 +2046,10 @@ static bool access_gic_hcr(struct kvm_vcpu *vcpu,
 			   struct sys_reg_params *p,
 			   const struct sys_reg_desc *r)
 {
-	struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.nested_vgic_v3;
-
 	if (p->is_write)
-		cpu_if->vgic_hcr = p->regval;
+		__vcpu_sys_reg(vcpu, ICH_HCR_EL2) = lower_32_bits(p->regval);
 	else
-		p->regval = cpu_if->vgic_hcr;
+		p->regval = __vcpu_sys_reg(vcpu, ICH_HCR_EL2);
 
 	return true;
 }
@@ -2108,12 +2106,19 @@ static bool access_gic_vmcr(struct kvm_vcpu *vcpu,
 			    struct sys_reg_params *p,
 			    const struct sys_reg_desc *r)
 {
-	struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.nested_vgic_v3;
-
 	if (p->is_write)
-		cpu_if->vgic_vmcr = p->regval;
+		__vcpu_sys_reg(vcpu, ICH_VMCR_EL2) = (p->regval	&
+						      (ICH_VMCR_ENG0_MASK	|
+						       ICH_VMCR_ENG1_MASK	|
+						       ICH_VMCR_PMR_MASK	|
+						       ICH_VMCR_BPR0_MASK	|
+						       ICH_VMCR_BPR1_MASK	|
+						       ICH_VMCR_EOIM_MASK	|
+						       ICH_VMCR_CBPR_MASK	|
+						       ICH_VMCR_FIQ_EN_MASK	|
+						       ICH_VMCR_ACK_CTL_MASK));
 	else
-		p->regval = cpu_if->vgic_vmcr;
+		p->regval = __vcpu_sys_reg(vcpu, ICH_VMCR_EL2);
 
 	return true;
 }
@@ -2122,17 +2127,29 @@ static bool access_gic_lr(struct kvm_vcpu *vcpu,
 			  struct sys_reg_params *p,
 			  const struct sys_reg_desc *r)
 {
-	struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.nested_vgic_v3;
 	u32 index;
+	u64 *base;
 
+	base = __ctxt_sys_reg(&vcpu->arch.ctxt, ICH_LR0_EL2);
 	index = p->Op2;
 	if (p->CRm == 13)
 		index += 8;
 
-	if (p->is_write)
-		cpu_if->vgic_lr[index] = p->regval;
-	else
-		p->regval = cpu_if->vgic_lr[index];
+	if (p->is_write) {
+		u64 mask = (ICH_LR_VIRTUAL_ID_MASK	|
+			    ICH_LR_GROUP		|
+			    ICH_LR_HW			|
+			    ICH_LR_STATE);
+
+		if (p->regval & ICH_LR_HW)
+			mask |= ICH_LR_PHYS_ID_MASK;
+		else
+			mask |= ICH_LR_EOI;
+
+		base[index] = p->regval & mask;
+	} else {
+		p->regval = base[index];
+	}
 
 	return true;
 }
diff --git a/arch/arm64/kvm/vgic/vgic-v3-nested.c b/arch/arm64/kvm/vgic/vgic-v3-nested.c
index 919275b94625..12937bc86e1c 100644
--- a/arch/arm64/kvm/vgic/vgic-v3-nested.c
+++ b/arch/arm64/kvm/vgic/vgic-v3-nested.c
@@ -15,11 +15,6 @@
 
 #include "vgic.h"
 
-static inline struct vgic_v3_cpu_if *vcpu_nested_if(struct kvm_vcpu *vcpu)
-{
-	return &vcpu->arch.vgic_cpu.nested_vgic_v3;
-}
-
 static inline struct vgic_v3_cpu_if *vcpu_shadow_if(struct kvm_vcpu *vcpu)
 {
 	return &vcpu->arch.vgic_cpu.shadow_vgic_v3;
@@ -32,12 +27,11 @@ static inline bool lr_triggers_eoi(u64 lr)
 
 u16 vgic_v3_get_eisr(struct kvm_vcpu *vcpu)
 {
-	struct vgic_v3_cpu_if *cpu_if = vcpu_nested_if(vcpu);
 	u16 reg = 0;
 	int i;
 
 	for (i = 0; i < kvm_vgic_global_state.nr_lr; i++) {
-		if (lr_triggers_eoi(cpu_if->vgic_lr[i]))
+		if (lr_triggers_eoi(__vcpu_sys_reg(vcpu, ICH_LRN(i))))
 			reg |= BIT(i);
 	}
 
@@ -46,12 +40,11 @@ u16 vgic_v3_get_eisr(struct kvm_vcpu *vcpu)
 
 u16 vgic_v3_get_elrsr(struct kvm_vcpu *vcpu)
 {
-	struct vgic_v3_cpu_if *cpu_if = vcpu_nested_if(vcpu);
 	u16 reg = 0;
 	int i;
 
 	for (i = 0; i < kvm_vgic_global_state.nr_lr; i++) {
-		if (!(cpu_if->vgic_lr[i] & ICH_LR_STATE))
+		if (!(__vcpu_sys_reg(vcpu, ICH_LRN(i)) & ICH_LR_STATE))
 			reg |= BIT(i);
 	}
 
@@ -60,14 +53,13 @@ u16 vgic_v3_get_elrsr(struct kvm_vcpu *vcpu)
 
 u64 vgic_v3_get_misr(struct kvm_vcpu *vcpu)
 {
-	struct vgic_v3_cpu_if *cpu_if = vcpu_nested_if(vcpu);
 	int nr_lr = kvm_vgic_global_state.nr_lr;
 	u64 reg = 0;
 
 	if (vgic_v3_get_eisr(vcpu))
 		reg |= ICH_MISR_EOI;
 
-	if (cpu_if->vgic_hcr & ICH_HCR_UIE) {
+	if (__vcpu_sys_reg(vcpu, ICH_HCR_EL2) & ICH_HCR_UIE) {
 		int used_lrs;
 
 		used_lrs = nr_lr - hweight16(vgic_v3_get_elrsr(vcpu));
@@ -86,13 +78,12 @@ u64 vgic_v3_get_misr(struct kvm_vcpu *vcpu)
  */
 static void vgic_v3_create_shadow_lr(struct kvm_vcpu *vcpu)
 {
-	struct vgic_v3_cpu_if *cpu_if = vcpu_nested_if(vcpu);
 	struct vgic_v3_cpu_if *s_cpu_if = vcpu_shadow_if(vcpu);
 	struct vgic_irq *irq;
 	int i, used_lrs = 0;
 
 	for (i = 0; i < kvm_vgic_global_state.nr_lr; i++) {
-		u64 lr = cpu_if->vgic_lr[i];
+		u64 lr = __vcpu_sys_reg(vcpu, ICH_LRN(i));
 		int l1_irq;
 
 		if (!(lr & ICH_LR_HW))
@@ -124,31 +115,14 @@ static void vgic_v3_create_shadow_lr(struct kvm_vcpu *vcpu)
 	s_cpu_if->used_lrs = used_lrs;
 }
 
-/*
- * Change the shadow HWIRQ field back to the virtual value before copying over
- * the entire shadow struct to the nested state.
- */
-static void vgic_v3_fixup_shadow_lr_state(struct kvm_vcpu *vcpu)
-{
-	struct vgic_v3_cpu_if *cpu_if = vcpu_nested_if(vcpu);
-	struct vgic_v3_cpu_if *s_cpu_if = vcpu_shadow_if(vcpu);
-	int lr;
-
-	for (lr = 0; lr < kvm_vgic_global_state.nr_lr; lr++) {
-		s_cpu_if->vgic_lr[lr] &= ~ICH_LR_PHYS_ID_MASK;
-		s_cpu_if->vgic_lr[lr] |= cpu_if->vgic_lr[lr] & ICH_LR_PHYS_ID_MASK;
-	}
-}
-
 void vgic_v3_sync_nested(struct kvm_vcpu *vcpu)
 {
-	struct vgic_v3_cpu_if *cpu_if = vcpu_nested_if(vcpu);
 	struct vgic_v3_cpu_if *s_cpu_if = vcpu_shadow_if(vcpu);
 	struct vgic_irq *irq;
 	int i;
 
 	for (i = 0; i < s_cpu_if->used_lrs; i++) {
-		u64 lr = cpu_if->vgic_lr[i];
+		u64 lr = __vcpu_sys_reg(vcpu, ICH_LRN(i));
 		int l1_irq;
 
 		if (!(lr & ICH_LR_HW) || !(lr & ICH_LR_STATE))
@@ -172,14 +146,27 @@ void vgic_v3_sync_nested(struct kvm_vcpu *vcpu)
 	}
 }
 
+void vgic_v3_create_shadow_state(struct kvm_vcpu *vcpu)
+{
+	struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.shadow_vgic_v3;
+	int i;
+
+	cpu_if->vgic_hcr = __vcpu_sys_reg(vcpu, ICH_HCR_EL2);
+	cpu_if->vgic_vmcr = __vcpu_sys_reg(vcpu, ICH_VMCR_EL2);
+
+	for (i = 0; i < 4; i++) {
+		cpu_if->vgic_ap0r[i] = __vcpu_sys_reg(vcpu, ICH_AP0RN(i));
+		cpu_if->vgic_ap1r[i] = __vcpu_sys_reg(vcpu, ICH_AP1RN(i));
+	}
+
+	vgic_v3_create_shadow_lr(vcpu);
+}
+
 void vgic_v3_load_nested(struct kvm_vcpu *vcpu)
 {
-	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
 	struct vgic_irq *irq;
 	unsigned long flags;
 
-	vgic_cpu->shadow_vgic_v3 = vgic_cpu->nested_vgic_v3;
-	vgic_v3_create_shadow_lr(vcpu);
 	__vgic_v3_restore_state(vcpu_shadow_if(vcpu));
 
 	irq = vgic_get_irq(vcpu->kvm, vcpu, vcpu->kvm->arch.vgic.maint_irq);
@@ -193,16 +180,32 @@ void vgic_v3_load_nested(struct kvm_vcpu *vcpu)
 
 void vgic_v3_put_nested(struct kvm_vcpu *vcpu)
 {
-	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
+	struct vgic_v3_cpu_if *s_cpu_if = vcpu_shadow_if(vcpu);
+	int i;
 
-	__vgic_v3_save_state(vcpu_shadow_if(vcpu));
+	__vgic_v3_save_state(s_cpu_if);
 
 	/*
 	 * Translate the shadow state HW fields back to the virtual ones
 	 * before copying the shadow struct back to the nested one.
 	 */
-	vgic_v3_fixup_shadow_lr_state(vcpu);
-	vgic_cpu->nested_vgic_v3 = vgic_cpu->shadow_vgic_v3;
+	__vcpu_sys_reg(vcpu, ICH_HCR_EL2) = s_cpu_if->vgic_hcr;
+	__vcpu_sys_reg(vcpu, ICH_VMCR_EL2) = s_cpu_if->vgic_vmcr;
+
+	for (i = 0; i < 4; i++) {
+		__vcpu_sys_reg(vcpu, ICH_AP0RN(i)) = s_cpu_if->vgic_ap0r[i];
+		__vcpu_sys_reg(vcpu, ICH_AP1RN(i)) = s_cpu_if->vgic_ap1r[i];
+	}
+
+	for (i = 0; i < kvm_vgic_global_state.nr_lr; i++) {
+		u64 val = __vcpu_sys_reg(vcpu, ICH_LRN(i));
+
+		val &= ~ICH_LR_STATE;
+		val |= s_cpu_if->vgic_lr[i] & ICH_LR_STATE;
+
+		__vcpu_sys_reg(vcpu, ICH_LRN(i)) = val;
+	}
+
 	irq_set_irqchip_state(kvm_vgic_global_state.maint_irq,
 			      IRQCHIP_STATE_ACTIVE, false);
 }
@@ -216,10 +219,9 @@ void vgic_v3_handle_nested_maint_irq(struct kvm_vcpu *vcpu)
 	 * again.
 	 */
 	if (vgic_state_is_nested(vcpu)) {
-		struct vgic_v3_cpu_if *cpu_if = vcpu_nested_if(vcpu);
 		bool state;
 
-		state  = cpu_if->vgic_hcr & ICH_HCR_EN;
+		state  = __vcpu_sys_reg(vcpu, ICH_HCR_EL2) & ICH_HCR_EN;
 		state &= vgic_v3_get_misr(vcpu);
 
 		kvm_vgic_inject_irq(vcpu->kvm, vcpu->vcpu_id,
diff --git a/arch/arm64/kvm/vgic/vgic-v3.c b/arch/arm64/kvm/vgic/vgic-v3.c
index 4e907c2b1c20..c520c12744b2 100644
--- a/arch/arm64/kvm/vgic/vgic-v3.c
+++ b/arch/arm64/kvm/vgic/vgic-v3.c
@@ -281,10 +281,11 @@ void vgic_v3_enable(struct kvm_vcpu *vcpu)
 				     ICC_SRE_EL1_SRE);
 		/*
 		 * If nesting is allowed, force GICv3 onto the nested
-		 * guests as well.
+		 * guests as well by setting the shadow state to the
+		 * same value.
 		 */
 		if (vcpu_has_nv(vcpu))
-			vcpu->arch.vgic_cpu.nested_vgic_v3.vgic_sre = vgic_v3->vgic_sre;
+			vcpu->arch.vgic_cpu.shadow_vgic_v3.vgic_sre = vgic_v3->vgic_sre;
 		vcpu->arch.vgic_cpu.pendbaser = INITIAL_PENDBASER_VALUE;
 	} else {
 		vgic_v3->vgic_sre = 0;
@@ -738,10 +739,13 @@ void vgic_v3_load(struct kvm_vcpu *vcpu)
 	vcpu->arch.vgic_cpu.current_cpu_if = cpu_if;
 
 	/*
-	 * vgic_v3_load_nested only affects the LRs in the shadow
-	 * state, so it is fine to pass the nested state around.
+	 * If the vgic is in nested state, populate the shadow state
+	 * from the guest's nested state. As vgic_v3_load_nested()
+	 * will only load LRs, let's deal with the rest of the state
+	 * here as if it was a non-nested state. Cunning.
 	 */
 	if (vgic_state_is_nested(vcpu)) {
+		vgic_v3_create_shadow_state(vcpu);
 		cpu_if = &vcpu->arch.vgic_cpu.shadow_vgic_v3;
 		vcpu->arch.vgic_cpu.current_cpu_if = cpu_if;
 	}
diff --git a/arch/arm64/kvm/vgic/vgic.h b/arch/arm64/kvm/vgic/vgic.h
index f9923beedd27..bbf07bde4dc0 100644
--- a/arch/arm64/kvm/vgic/vgic.h
+++ b/arch/arm64/kvm/vgic/vgic.h
@@ -344,4 +344,14 @@ void vgic_v4_configure_vsgis(struct kvm *kvm);
 void vgic_v4_get_vlpi_state(struct vgic_irq *irq, bool *val);
 int vgic_v4_request_vpe_irq(struct kvm_vcpu *vcpu, int irq);
 
+void vgic_v3_sync_nested(struct kvm_vcpu *vcpu);
+void vgic_v3_create_shadow_state(struct kvm_vcpu *vcpu);
+void vgic_v3_load_nested(struct kvm_vcpu *vcpu);
+void vgic_v3_put_nested(struct kvm_vcpu *vcpu);
+void vgic_v3_handle_nested_maint_irq(struct kvm_vcpu *vcpu);
+
+#define ICH_LRN(n)	(ICH_LR0_EL2 + (n))
+#define ICH_AP0RN(n)	(ICH_AP0R0_EL2 + (n))
+#define ICH_AP1RN(n)	(ICH_AP1R0_EL2 + (n))
+
 #endif
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 1a2e2e8abd92..9b91a8135dac 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -334,9 +334,6 @@ struct vgic_cpu {
 
 	struct vgic_irq private_irqs[VGIC_NR_PRIVATE_IRQS];
 
-	/* CPU vif control registers for the virtual GICH interface */
-	struct vgic_v3_cpu_if	nested_vgic_v3;
-
 	/*
 	 * The shadow vif control register loaded to the hardware when
 	 * running a nested L2 guest with the virtual IMO/FMO bit set.
@@ -403,10 +400,6 @@ void kvm_vgic_load(struct kvm_vcpu *vcpu);
 void kvm_vgic_put(struct kvm_vcpu *vcpu);
 void kvm_vgic_vmcr_sync(struct kvm_vcpu *vcpu);
 
-void vgic_v3_sync_nested(struct kvm_vcpu *vcpu);
-void vgic_v3_load_nested(struct kvm_vcpu *vcpu);
-void vgic_v3_put_nested(struct kvm_vcpu *vcpu);
-void vgic_v3_handle_nested_maint_irq(struct kvm_vcpu *vcpu);
 u16 vgic_v3_get_eisr(struct kvm_vcpu *vcpu);
 u16 vgic_v3_get_elrsr(struct kvm_vcpu *vcpu);
 u64 vgic_v3_get_misr(struct kvm_vcpu *vcpu);
-- 
2.34.1



* [PATCH v10 50/59] KVM: arm64: Add FEAT_NV2 cpu feature
  2023-05-15 17:30 [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support Marc Zyngier
                   ` (48 preceding siblings ...)
  2023-05-15 17:30 ` [PATCH v10 49/59] KVM: arm64: nv: Move nested vgic state into the sysreg file Marc Zyngier
@ 2023-05-15 17:30 ` Marc Zyngier
  2023-05-15 17:30 ` [PATCH v10 51/59] KVM: arm64: nv: Sync nested timer state with FEAT_NV2 Marc Zyngier
                   ` (11 subsequent siblings)
  61 siblings, 0 replies; 95+ messages in thread
From: Marc Zyngier @ 2023-05-15 17:30 UTC (permalink / raw)
  To: kvmarm, kvm, linux-arm-kernel
  Cc: Alexandru Elisei, Andre Przywara, Chase Conklin,
	Christoffer Dall, Ganapatrao Kulkarni, Darren Hart, Jintack Lim,
	Russell King, Miguel Luis, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu

Add the detection code for the FEAT_NV2 feature.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_nested.h |  6 ++++++
 arch/arm64/kernel/cpufeature.c      | 11 +++++++++++
 2 files changed, 17 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
index d1596d28a5f8..6a4ddf3683b9 100644
--- a/arch/arm64/include/asm/kvm_nested.h
+++ b/arch/arm64/include/asm/kvm_nested.h
@@ -14,6 +14,12 @@ static inline bool vcpu_has_nv(const struct kvm_vcpu *vcpu)
 		test_bit(KVM_ARM_VCPU_HAS_EL2, vcpu->arch.features));
 }
 
+static inline bool vcpu_has_nv2(const struct kvm_vcpu *vcpu)
+{
+	return cpus_have_final_cap(ARM64_HAS_ENHANCED_NESTED_VIRT) &&
+		vcpu_has_nv(vcpu);
+}
+
 /* Translation helpers from non-VHE EL2 to EL1 */
 static inline u64 tcr_el2_ps_to_tcr_el1_ips(u64 tcr_el2)
 {
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index bd184c2cef33..4f345f11d89d 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -2298,6 +2298,17 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
 		.field_width = 4,
 		.min_field_value = ID_AA64MMFR2_EL1_NV_IMP,
 	},
+	{
+		.desc = "Enhanced Nested Virtualization Support",
+		.capability = ARM64_HAS_ENHANCED_NESTED_VIRT,
+		.type = ARM64_CPUCAP_SYSTEM_FEATURE,
+		.matches = has_nested_virt_support,
+		.sys_reg = SYS_ID_AA64MMFR2_EL1,
+		.sign = FTR_UNSIGNED,
+		.field_pos = ID_AA64MMFR2_EL1_NV_SHIFT,
+		.field_width = 4,
+		.min_field_value = ID_AA64MMFR2_EL1_NV_NV2,
+	},
 	{
 		.capability = ARM64_HAS_32BIT_EL0_DO_NOT_USE,
 		.type = ARM64_CPUCAP_SYSTEM_FEATURE,
-- 
2.34.1



* [PATCH v10 51/59] KVM: arm64: nv: Sync nested timer state with FEAT_NV2
  2023-05-15 17:30 [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support Marc Zyngier
                   ` (49 preceding siblings ...)
  2023-05-15 17:30 ` [PATCH v10 50/59] KVM: arm64: Add FEAT_NV2 cpu feature Marc Zyngier
@ 2023-05-15 17:30 ` Marc Zyngier
  2023-05-15 17:30 ` [PATCH v10 52/59] KVM: arm64: nv: Fold GICv3 host trapping requirements into guest setup Marc Zyngier
                   ` (10 subsequent siblings)
  61 siblings, 0 replies; 95+ messages in thread
From: Marc Zyngier @ 2023-05-15 17:30 UTC (permalink / raw)
  To: kvmarm, kvm, linux-arm-kernel
  Cc: Alexandru Elisei, Andre Przywara, Chase Conklin,
	Christoffer Dall, Ganapatrao Kulkarni, Darren Hart, Jintack Lim,
	Russell King, Miguel Luis, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu

From: Christoffer Dall <christoffer.dall@arm.com>

Emulating the timers with FEAT_NV2 is a bit odd, as the timers
can be reconfigured behind our back without the hypervisor even
noticing. In the VHE case, that's an actual regression in the
architecture...

Co-developed-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/arch_timer.c  | 44 ++++++++++++++++++++++++++++++++++++
 arch/arm64/kvm/arm.c         |  3 +++
 include/kvm/arm_arch_timer.h |  1 +
 3 files changed, 48 insertions(+)

diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
index 05b022be885b..294ab5013e40 100644
--- a/arch/arm64/kvm/arch_timer.c
+++ b/arch/arm64/kvm/arch_timer.c
@@ -914,6 +914,50 @@ void kvm_timer_vcpu_put(struct kvm_vcpu *vcpu)
 		kvm_timer_blocking(vcpu);
 }
 
+void kvm_timer_sync_nested(struct kvm_vcpu *vcpu)
+{
+	/*
+	 * When NV2 is on, guest hypervisors have their EL0 timer register
+	 * accesses redirected to the VNCR page. Any guest action taken on
+	 * the timer is postponed until the next exit, leading to a very
+	 * poor quality of emulation.
+	 */
+	if (!is_hyp_ctxt(vcpu))
+		return;
+
+	if (!vcpu_el2_e2h_is_set(vcpu)) {
+		/*
+		 * A non-VHE guest hypervisor doesn't have any direct access
+		 * to its timers: the EL2 registers trap (and the HW is
+		 * fully emulated), while the EL0 registers access memory
+		 * despite the access being notionally direct. Boo.
+		 *
+		 * We update the hardware timer registers with the
+		 * latest value written by the guest to the VNCR page
+		 * and let the hardware take care of the rest.
+		 */
+		write_sysreg_el0(__vcpu_sys_reg(vcpu, CNTV_CTL_EL0),  SYS_CNTV_CTL);
+		write_sysreg_el0(__vcpu_sys_reg(vcpu, CNTV_CVAL_EL0), SYS_CNTV_CVAL);
+		write_sysreg_el0(__vcpu_sys_reg(vcpu, CNTP_CTL_EL0),  SYS_CNTP_CTL);
+		write_sysreg_el0(__vcpu_sys_reg(vcpu, CNTP_CVAL_EL0), SYS_CNTP_CVAL);
+	} else {
+		/*
+		 * For a VHE guest hypervisor, the EL2 state is directly
+		 * stored in the host EL0 timers, while the emulated EL0
+		 * state is stored in the VNCR page. The latter could have
+		 * been updated behind our back, and we must reset the
+		 * emulation of the timers.
+		 */
+		struct timer_map map;
+		get_timer_map(vcpu, &map);
+
+		soft_timer_cancel(&map.emul_vtimer->hrtimer);
+		soft_timer_cancel(&map.emul_ptimer->hrtimer);
+		timer_emulate(map.emul_vtimer);
+		timer_emulate(map.emul_ptimer);
+	}
+}
+
 /*
  * With a userspace irqchip we have to check if the guest de-asserted the
  * timer and if so, unmask the timer irq signal on the host interrupt
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index b69fb2c042b6..725439a019bc 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1019,6 +1019,9 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 		if (static_branch_unlikely(&userspace_irqchip_in_use))
 			kvm_timer_sync_user(vcpu);
 
+		if (vcpu_has_nv2(vcpu))
+			kvm_timer_sync_nested(vcpu);
+
 		kvm_arch_vcpu_ctxsync_fp(vcpu);
 
 		/*
diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
index bb3cb005873e..7dc9a33c38af 100644
--- a/include/kvm/arm_arch_timer.h
+++ b/include/kvm/arm_arch_timer.h
@@ -96,6 +96,7 @@ int __init kvm_timer_hyp_init(bool has_gic);
 int kvm_timer_enable(struct kvm_vcpu *vcpu);
 int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu);
 void kvm_timer_vcpu_init(struct kvm_vcpu *vcpu);
+void kvm_timer_sync_nested(struct kvm_vcpu *vcpu);
 void kvm_timer_sync_user(struct kvm_vcpu *vcpu);
 bool kvm_timer_should_notify_user(struct kvm_vcpu *vcpu);
 void kvm_timer_update_run(struct kvm_vcpu *vcpu);
-- 
2.34.1



* [PATCH v10 52/59] KVM: arm64: nv: Fold GICv3 host trapping requirements into guest setup
  2023-05-15 17:30 [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support Marc Zyngier
                   ` (50 preceding siblings ...)
  2023-05-15 17:30 ` [PATCH v10 51/59] KVM: arm64: nv: Sync nested timer state with FEAT_NV2 Marc Zyngier
@ 2023-05-15 17:30 ` Marc Zyngier
  2023-05-15 17:30 ` [PATCH v10 53/59] KVM: arm64: nv: Publish emulated timer interrupt state in the in-memory state Marc Zyngier
                   ` (9 subsequent siblings)
  61 siblings, 0 replies; 95+ messages in thread
From: Marc Zyngier @ 2023-05-15 17:30 UTC (permalink / raw)
  To: kvmarm, kvm, linux-arm-kernel
  Cc: Alexandru Elisei, Andre Przywara, Chase Conklin,
	Christoffer Dall, Ganapatrao Kulkarni, Darren Hart, Jintack Lim,
	Russell King, Miguel Luis, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu

Popular HW that is able to use NV also has a broken vgic implementation
that requires trapping.

On such HW, propagate the host trap bits into the guest's shadow
ICH_HCR_EL2 register, making sure we don't allow an L2 guest to bring
the system down.

This involves a bit of tweaking so that the emulation code correctly
picks up the shadow state as needed, and so that ICH_HCR_EL2 is only
partially synced back with the guest state to capture EOIcount.
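
For reference, the partial sync boils down to a masked merge. A minimal
standalone sketch, assuming the architectural EOIcount field at
ICH_HCR_EL2[31:27]; the names and values here are illustrative:

#include <stdint.h>
#include <stdio.h>

#define ICH_HCR_EOIcount_MASK	(0x1fULL << 27)	/* EOIcount: bits [31:27] */

/* Keep the guest's view of ICH_HCR_EL2, but import the HW-updated EOIcount */
static uint64_t sync_ich_hcr(uint64_t guest_hcr, uint64_t shadow_hcr)
{
	return (guest_hcr & ~ICH_HCR_EOIcount_MASK) |
	       (shadow_hcr & ICH_HCR_EOIcount_MASK);
}

int main(void)
{
	uint64_t guest = 0x1;			/* En set by L1 */
	uint64_t shadow = 0x1 | (3ULL << 27);	/* HW bumped EOIcount to 3 */

	printf("%#llx\n", (unsigned long long)sync_ich_hcr(guest, shadow));
	return 0;
}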

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/hyp/vgic-v3-sr.c      |  4 ++--
 arch/arm64/kvm/vgic/vgic-v3-nested.c | 21 ++++++++++++++++++---
 2 files changed, 20 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/kvm/hyp/vgic-v3-sr.c b/arch/arm64/kvm/hyp/vgic-v3-sr.c
index 75152c1ce646..aaaea35099e5 100644
--- a/arch/arm64/kvm/hyp/vgic-v3-sr.c
+++ b/arch/arm64/kvm/hyp/vgic-v3-sr.c
@@ -484,7 +484,7 @@ static int __vgic_v3_get_group(struct kvm_vcpu *vcpu)
 static int __vgic_v3_highest_priority_lr(struct kvm_vcpu *vcpu, u32 vmcr,
 					 u64 *lr_val)
 {
-	unsigned int used_lrs = vcpu->arch.vgic_cpu.vgic_v3.used_lrs;
+	unsigned int used_lrs = kern_hyp_va(vcpu->arch.vgic_cpu.current_cpu_if)->used_lrs;
 	u8 priority = GICv3_IDLE_PRIORITY;
 	int i, lr = -1;
 
@@ -523,7 +523,7 @@ static int __vgic_v3_highest_priority_lr(struct kvm_vcpu *vcpu, u32 vmcr,
 static int __vgic_v3_find_active_lr(struct kvm_vcpu *vcpu, int intid,
 				    u64 *lr_val)
 {
-	unsigned int used_lrs = vcpu->arch.vgic_cpu.vgic_v3.used_lrs;
+	unsigned int used_lrs = kern_hyp_va(vcpu->arch.vgic_cpu.current_cpu_if)->used_lrs;
 	int i;
 
 	for (i = 0; i < used_lrs; i++) {
diff --git a/arch/arm64/kvm/vgic/vgic-v3-nested.c b/arch/arm64/kvm/vgic/vgic-v3-nested.c
index 12937bc86e1c..51f97bd4489d 100644
--- a/arch/arm64/kvm/vgic/vgic-v3-nested.c
+++ b/arch/arm64/kvm/vgic/vgic-v3-nested.c
@@ -149,9 +149,20 @@ void vgic_v3_sync_nested(struct kvm_vcpu *vcpu)
 void vgic_v3_create_shadow_state(struct kvm_vcpu *vcpu)
 {
 	struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.shadow_vgic_v3;
+	struct vgic_v3_cpu_if *host_if = &vcpu->arch.vgic_cpu.vgic_v3;
+	u64 val = 0;
 	int i;
 
-	cpu_if->vgic_hcr = __vcpu_sys_reg(vcpu, ICH_HCR_EL2);
+	/*
+	 * If we're on a system with a broken vgic that requires
+	 * trapping, propagate the trapping requirements.
+	 *
+	 * Ah, the smell of rotten fruits...
+	 */
+	if (static_branch_unlikely(&vgic_v3_cpuif_trap))
+		val = host_if->vgic_hcr & (ICH_HCR_TALL0 | ICH_HCR_TALL1 |
+					   ICH_HCR_TC | ICH_HCR_TDIR);
+	cpu_if->vgic_hcr = __vcpu_sys_reg(vcpu, ICH_HCR_EL2) | val;
 	cpu_if->vgic_vmcr = __vcpu_sys_reg(vcpu, ICH_VMCR_EL2);
 
 	for (i = 0; i < 4; i++) {
@@ -181,6 +192,7 @@ void vgic_v3_load_nested(struct kvm_vcpu *vcpu)
 void vgic_v3_put_nested(struct kvm_vcpu *vcpu)
 {
 	struct vgic_v3_cpu_if *s_cpu_if = vcpu_shadow_if(vcpu);
+	u64 val;
 	int i;
 
 	__vgic_v3_save_state(s_cpu_if);
@@ -189,7 +201,10 @@ void vgic_v3_put_nested(struct kvm_vcpu *vcpu)
 	 * Translate the shadow state HW fields back to the virtual ones
 	 * before copying the shadow struct back to the nested one.
 	 */
-	__vcpu_sys_reg(vcpu, ICH_HCR_EL2) = s_cpu_if->vgic_hcr;
+	val = __vcpu_sys_reg(vcpu, ICH_HCR_EL2);
+	val &= ~ICH_HCR_EOIcount_MASK;
+	val |= (s_cpu_if->vgic_hcr & ICH_HCR_EOIcount_MASK);
+	__vcpu_sys_reg(vcpu, ICH_HCR_EL2) = val;
 	__vcpu_sys_reg(vcpu, ICH_VMCR_EL2) = s_cpu_if->vgic_vmcr;
 
 	for (i = 0; i < 4; i++) {
@@ -198,7 +213,7 @@ void vgic_v3_put_nested(struct kvm_vcpu *vcpu)
 	}
 
 	for (i = 0; i < kvm_vgic_global_state.nr_lr; i++) {
-		u64 val = __vcpu_sys_reg(vcpu, ICH_LRN(i));
+		val = __vcpu_sys_reg(vcpu, ICH_LRN(i));
 
 		val &= ~ICH_LR_STATE;
 		val |= s_cpu_if->vgic_lr[i] & ICH_LR_STATE;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [PATCH v10 53/59] KVM: arm64: nv: Publish emulated timer interrupt state in the in-memory state
  2023-05-15 17:30 [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support Marc Zyngier
                   ` (51 preceding siblings ...)
  2023-05-15 17:30 ` [PATCH v10 52/59] KVM: arm64: nv: Fold GICv3 host trapping requirements into guest setup Marc Zyngier
@ 2023-05-15 17:30 ` Marc Zyngier
  2023-05-15 17:30 ` [PATCH v10 54/59] KVM: arm64: nv: Allocate VNCR page when required Marc Zyngier
                   ` (8 subsequent siblings)
  61 siblings, 0 replies; 95+ messages in thread
From: Marc Zyngier @ 2023-05-15 17:30 UTC (permalink / raw)
  To: kvmarm, kvm, linux-arm-kernel
  Cc: Alexandru Elisei, Andre Przywara, Chase Conklin,
	Christoffer Dall, Ganapatrao Kulkarni, Darren Hart, Jintack Lim,
	Russell King, Miguel Luis, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu

With FEAT_NV2, the EL0 timer state is entirely stored in memory,
meaning that the hypervisor can only provide a very poor emulation.

The only thing we can really do is to publish the interrupt state
in the guest view of CNT{P,V}_CTL_EL0, and defer everything else
to the next exit.

Only FEAT_ECV will allow us to fix it, at the cost of extra trapping.
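
On the CTL side, publishing the state is a single architectural bit
flip; a minimal sketch, assuming ISTATUS is bit 2 of CNT{P,V}_CTL_EL0
(function name illustrative):

#include <stdint.h>
#include <stdio.h>

#define ARCH_TIMER_CTRL_IT_STAT	(1U << 2)	/* ISTATUS */

/* Publish the computed interrupt level into the in-memory CTL copy */
static uint32_t publish_istatus(uint32_t ctl, int level)
{
	if (level)
		ctl |= ARCH_TIMER_CTRL_IT_STAT;
	else
		ctl &= ~ARCH_TIMER_CTRL_IT_STAT;
	return ctl;
}

int main(void)
{
	printf("%#x\n", publish_istatus(0x1, 1));	/* -> 0x5 */
	printf("%#x\n", publish_istatus(0x5, 0));	/* -> 0x1 */
	return 0;
}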

Suggested-by: Chase Conklin <chase.conklin@arm.com>
Suggested-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/arch_timer.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
index 294ab5013e40..f551aa4ac321 100644
--- a/arch/arm64/kvm/arch_timer.c
+++ b/arch/arm64/kvm/arch_timer.c
@@ -453,6 +453,25 @@ static void kvm_timer_update_irq(struct kvm_vcpu *vcpu, bool new_level,
 {
 	int ret;
 
+	/*
+	 * Paper over NV2 brokenness by publishing the interrupt status
+	 * bit. This still results in a poor quality of emulation (guest
+	 * writes will have no effect until the next exit).
+	 *
+	 * But hey, it's fast, right?
+	 */
+	if (vcpu_has_nv2(vcpu) && is_hyp_ctxt(vcpu) &&
+	    (timer_ctx == vcpu_vtimer(vcpu) || timer_ctx == vcpu_ptimer(vcpu))) {
+		u32 ctl = timer_get_ctl(timer_ctx);
+
+		if (new_level)
+			ctl |= ARCH_TIMER_CTRL_IT_STAT;
+		else
+			ctl &= ~ARCH_TIMER_CTRL_IT_STAT;
+
+		timer_set_ctl(timer_ctx, ctl);
+	}
+
 	timer_ctx->irq.level = new_level;
 	trace_kvm_timer_update_irq(vcpu->vcpu_id, timer_irq(timer_ctx),
 				   timer_ctx->irq.level);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [PATCH v10 54/59] KVM: arm64: nv: Allocate VNCR page when required
  2023-05-15 17:30 [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support Marc Zyngier
                   ` (52 preceding siblings ...)
  2023-05-15 17:30 ` [PATCH v10 53/59] KVM: arm64: nv: Publish emulated timer interrupt state in the in-memory state Marc Zyngier
@ 2023-05-15 17:30 ` Marc Zyngier
  2023-05-15 17:30 ` [PATCH v10 55/59] KVM: arm64: nv: Enable ARMv8.4-NV support Marc Zyngier
                   ` (7 subsequent siblings)
  61 siblings, 0 replies; 95+ messages in thread
From: Marc Zyngier @ 2023-05-15 17:30 UTC (permalink / raw)
  To: kvmarm, kvm, linux-arm-kernel
  Cc: Alexandru Elisei, Andre Przywara, Chase Conklin,
	Christoffer Dall, Ganapatrao Kulkarni, Darren Hart, Jintack Lim,
	Russell King, Miguel Luis, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu

If running an NV guest on an ARMv8.4-NV capable system, let's
allocate an additional page that will be used by the hypervisor
to fulfill system register accesses.
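
The allocation itself is the usual allocate-once, zeroed-page pattern;
a userspace sketch with calloc() standing in for
__get_free_page(GFP_KERNEL | __GFP_ZERO), purely for illustration:

#include <stdint.h>
#include <stdlib.h>

#define PAGE_SIZE 4096

struct ctxt {
	uint64_t *vncr_array;
};

/* Idempotent: a second call must neither leak nor reallocate */
static int vncr_alloc(struct ctxt *c)
{
	if (!c->vncr_array)
		c->vncr_array = calloc(1, PAGE_SIZE);

	return c->vncr_array ? 0 : -1;	/* -ENOMEM in the kernel proper */
}

int main(void)
{
	struct ctxt c = { 0 };

	return vncr_alloc(&c);
}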

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/nested.c | 10 ++++++++++
 arch/arm64/kvm/reset.c  |  1 +
 2 files changed, 11 insertions(+)

diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
index f41a86f0e924..bc47c79afad0 100644
--- a/arch/arm64/kvm/nested.c
+++ b/arch/arm64/kvm/nested.c
@@ -38,6 +38,14 @@ int kvm_vcpu_init_nested(struct kvm_vcpu *vcpu)
 	if (!cpus_have_final_cap(ARM64_HAS_NESTED_VIRT))
 		return -EINVAL;
 
+	if (cpus_have_final_cap(ARM64_HAS_ENHANCED_NESTED_VIRT)) {
+		if (!vcpu->arch.ctxt.vncr_array)
+			vcpu->arch.ctxt.vncr_array = (u64 *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
+
+		if (!vcpu->arch.ctxt.vncr_array)
+			return -ENOMEM;
+	}
+
 	mutex_lock(&kvm->arch.config_lock);
 
 	/*
@@ -67,6 +75,8 @@ int kvm_vcpu_init_nested(struct kvm_vcpu *vcpu)
 		    kvm_init_stage2_mmu(kvm, &tmp[num_mmus - 2], 0)) {
 			kvm_free_stage2_pgd(&tmp[num_mmus - 1]);
 			kvm_free_stage2_pgd(&tmp[num_mmus - 2]);
+			free_page((unsigned long)vcpu->arch.ctxt.vncr_array);
+			vcpu->arch.ctxt.vncr_array = NULL;
 		} else {
 			kvm->arch.nested_mmus_size = num_mmus;
 			ret = 0;
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index ad11cc514b3d..d35b6a50a03a 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -161,6 +161,7 @@ void kvm_arm_vcpu_destroy(struct kvm_vcpu *vcpu)
 	if (sve_state)
 		kvm_unshare_hyp(sve_state, sve_state + vcpu_sve_state_size(vcpu));
 	kfree(sve_state);
+	free_page((unsigned long)vcpu->arch.ctxt.vncr_array);
 	kfree(vcpu->arch.ccsidr);
 }
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [PATCH v10 55/59] KVM: arm64: nv: Enable ARMv8.4-NV support
  2023-05-15 17:30 [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support Marc Zyngier
                   ` (53 preceding siblings ...)
  2023-05-15 17:30 ` [PATCH v10 54/59] KVM: arm64: nv: Allocate VNCR page when required Marc Zyngier
@ 2023-05-15 17:30 ` Marc Zyngier
  2023-05-15 17:31 ` [PATCH v10 56/59] KVM: arm64: nv: Fast-track 'InHost' exception returns Marc Zyngier
                   ` (6 subsequent siblings)
  61 siblings, 0 replies; 95+ messages in thread
From: Marc Zyngier @ 2023-05-15 17:30 UTC (permalink / raw)
  To: kvmarm, kvm, linux-arm-kernel
  Cc: Alexandru Elisei, Andre Przywara, Chase Conklin,
	Christoffer Dall, Ganapatrao Kulkarni, Darren Hart, Jintack Lim,
	Russell King, Miguel Luis, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu

As all the VNCR-capable system registers are nicely separated
from the rest of the crowd, let's turn HCR_EL2.NV2 on and get
the ball rolling.
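
What NV2 buys us is that the CPU turns vEL2 sysreg accesses into plain
loads/stores at fixed offsets from the VNCR_EL2 base. Roughly, and with
a made-up offset (the real offsets are architecturally fixed per
register, this one is purely for illustration):

#include <stdint.h>
#include <string.h>

#define VNCR_OFF_EXAMPLE	0x100	/* HYPOTHETICAL offset */

/* What the HW effectively does for an MRS of a VNCR-capable register */
static uint64_t vncr_read(const uint8_t *vncr_page, unsigned int off)
{
	uint64_t v;

	memcpy(&v, vncr_page + off, sizeof(v));
	return v;
}

int main(void)
{
	uint8_t page[4096] = { 0 };

	return (int)vncr_read(page, VNCR_OFF_EXAMPLE);
}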

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_emulate.h | 23 +++++++++++++----------
 arch/arm64/include/asm/sysreg.h      |  1 +
 arch/arm64/kvm/hyp/vhe/switch.c      | 14 +++++++++++++-
 3 files changed, 27 insertions(+), 11 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index be56ff626be2..461377f79c60 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -255,21 +255,24 @@ static inline bool is_hyp_ctxt(const struct kvm_vcpu *vcpu)
 
 static inline u64 __fixup_spsr_el2_write(struct kvm_cpu_context *ctxt, u64 val)
 {
-	if (!__vcpu_el2_e2h_is_set(ctxt)) {
-		/*
-		 * Clear the .M field when writing SPSR to the CPU, so that we
-		 * can detect when the CPU clobbered our SPSR copy during a
-		 * local exception.
-		 */
-		val &= ~0xc;
-	}
+	struct kvm_vcpu *vcpu = container_of(ctxt, struct kvm_vcpu, arch.ctxt);
+
+	if (vcpu_has_nv2(vcpu) || __vcpu_el2_e2h_is_set(ctxt))
+		return val;
 
-	return val;
+	/*
+	 * Clear the .M field when writing SPSR to the CPU, so that we
+	 * can detect when the CPU clobbered our SPSR copy during a
+	 * local exception.
+	 */
+	return val & ~0xc;
 }
 
 static inline u64 __fixup_spsr_el2_read(const struct kvm_cpu_context *ctxt, u64 val)
 {
-	if (__vcpu_el2_e2h_is_set(ctxt))
+	struct kvm_vcpu *vcpu = container_of(ctxt, struct kvm_vcpu, arch.ctxt);
+
+	if (vcpu_has_nv2(vcpu) || __vcpu_el2_e2h_is_set(ctxt))
 		return val;
 
 	/*
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 3508ba196b55..72ff6df5d75b 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -472,6 +472,7 @@
 #define SYS_TCR_EL2			sys_reg(3, 4, 2, 0, 2)
 #define SYS_VTTBR_EL2			sys_reg(3, 4, 2, 1, 0)
 #define SYS_VTCR_EL2			sys_reg(3, 4, 2, 1, 2)
+#define SYS_VNCR_EL2			sys_reg(3, 4, 2, 2, 0)
 
 #define SYS_TRFCR_EL2			sys_reg(3, 4, 1, 2, 1)
 #define SYS_HDFGRTR_EL2			sys_reg(3, 4, 3, 1, 4)
diff --git a/arch/arm64/kvm/hyp/vhe/switch.c b/arch/arm64/kvm/hyp/vhe/switch.c
index 154d994c1015..0aeafda5b966 100644
--- a/arch/arm64/kvm/hyp/vhe/switch.c
+++ b/arch/arm64/kvm/hyp/vhe/switch.c
@@ -47,7 +47,13 @@ static void __activate_traps(struct kvm_vcpu *vcpu)
 			 * the EL1 virtual memory control register accesses
 			 * as well as the AT S1 operations.
 			 */
-			hcr |= HCR_TVM | HCR_TRVM | HCR_AT | HCR_TTLB | HCR_NV1;
+			if (vcpu_has_nv2(vcpu)) {
+				hcr &= ~HCR_TVM;
+			} else {
+				hcr |= HCR_TVM | HCR_TRVM | HCR_TTLB;
+			}
+
+			hcr |= HCR_AT | HCR_NV1;
 		} else {
 			/*
 			 * For a guest hypervisor on v8.1 (VHE), allow to
@@ -81,6 +87,12 @@ static void __activate_traps(struct kvm_vcpu *vcpu)
 			if (!vcpu_el2_tge_is_set(vcpu))
 				hcr |= HCR_AT | HCR_TTLB;
 		}
+
+		if (vcpu_has_nv2(vcpu)) {
+			hcr |= HCR_AT | HCR_TTLB | HCR_NV2;
+			write_sysreg_s(vcpu->arch.ctxt.vncr_array,
+				       SYS_VNCR_EL2);
+		}
 	} else if (vcpu_has_nv(vcpu)) {
 		u64 vhcr_el2 = __vcpu_sys_reg(vcpu, HCR_EL2);
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [PATCH v10 56/59] KVM: arm64: nv: Fast-track 'InHost' exception returns
  2023-05-15 17:30 [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support Marc Zyngier
                   ` (54 preceding siblings ...)
  2023-05-15 17:30 ` [PATCH v10 55/59] KVM: arm64: nv: Enable ARMv8.4-NV support Marc Zyngier
@ 2023-05-15 17:31 ` Marc Zyngier
  2023-05-15 17:31 ` [PATCH v10 57/59] KVM: arm64: nv: Fast-track EL1 TLBIs for VHE guests Marc Zyngier
                   ` (5 subsequent siblings)
  61 siblings, 0 replies; 95+ messages in thread
From: Marc Zyngier @ 2023-05-15 17:31 UTC (permalink / raw)
  To: kvmarm, kvm, linux-arm-kernel
  Cc: Alexandru Elisei, Andre Przywara, Chase Conklin,
	Christoffer Dall, Ganapatrao Kulkarni, Darren Hart, Jintack Lim,
	Russell King, Miguel Luis, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu

A significant part of the ARMv8.3-NV extension is to trap ERET
instructions so that the hypervisor gets a chance to switch
from a vEL2 L1 guest to an EL1 L2 guest.

But this also has the unfortunate consequence of trapping ERET
in unexpected circumstances, such as staying at vEL2 (interrupt
handling while being in the guest hypervisor), or returning to host
userspace in the case of a VHE guest.

Although we already make some effort to handle these ERETs more quickly
by not doing the put/load dance, it is still way too far down the line
for it to be efficient enough.

For these cases, it would be ideal to ERET directly, no questions asked.
Of course, we can't do that. But the next best thing is to do it as
early as possible, in fixup_guest_exit(), much as we would handle
FPSIMD exceptions.
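
The fast path hinges on a small mode translation; a standalone sketch
using the architectural PSR mode encodings (EL0t=0, EL1t=4, EL1h=5,
EL2t=8, EL2h=9), illustrative only:

#include <stdint.h>
#include <stdio.h>

#define PSR_MODE_EL0t	0x0
#define PSR_MODE_EL1t	0x4
#define PSR_MODE_EL1h	0x5
#define PSR_MODE_EL2t	0x8
#define PSR_MODE_EL2h	0x9
#define PSR_MODE_MASK	0xf
#define PSR_MODE32_BIT	0x10

/* Map a vEL2 SPSR mode to the EL1 mode really entered; -1 => slow path */
static int remap_eret_mode(uint64_t spsr, int e2h_and_tge)
{
	switch (spsr & (PSR_MODE_MASK | PSR_MODE32_BIT)) {
	case PSR_MODE_EL2t: return PSR_MODE_EL1t;
	case PSR_MODE_EL2h: return PSR_MODE_EL1h;
	case PSR_MODE_EL0t: return e2h_and_tge ? PSR_MODE_EL0t : -1;
	default:	    return -1;
	}
}

int main(void)
{
	printf("%d\n", remap_eret_mode(0x3c9, 0));	/* EL2h -> 5 (EL1h) */
	return 0;
}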

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/emulate-nested.c | 29 +++------------------
 arch/arm64/kvm/hyp/vhe/switch.c | 46 +++++++++++++++++++++++++++++++++
 2 files changed, 49 insertions(+), 26 deletions(-)

diff --git a/arch/arm64/kvm/emulate-nested.c b/arch/arm64/kvm/emulate-nested.c
index 08b152f8ba6a..45d4e148b735 100644
--- a/arch/arm64/kvm/emulate-nested.c
+++ b/arch/arm64/kvm/emulate-nested.c
@@ -955,8 +955,7 @@ static u64 kvm_check_illegal_exception_return(struct kvm_vcpu *vcpu, u64 spsr)
 
 void kvm_emulate_nested_eret(struct kvm_vcpu *vcpu)
 {
-	u64 spsr, elr, mode;
-	bool direct_eret;
+	u64 spsr, elr;
 
 	/*
 	 * Forward this trap to the virtual EL2 if the virtual
@@ -965,33 +964,11 @@ void kvm_emulate_nested_eret(struct kvm_vcpu *vcpu)
 	if (forward_traps(vcpu, HCR_NV))
 		return;
 
-	/*
-	 * Going through the whole put/load motions is a waste of time
-	 * if this is a VHE guest hypervisor returning to its own
-	 * userspace, or the hypervisor performing a local exception
-	 * return. No need to save/restore registers, no need to
-	 * switch S2 MMU. Just do the canonical ERET.
-	 */
-	spsr = vcpu_read_sys_reg(vcpu, SPSR_EL2);
-	spsr = kvm_check_illegal_exception_return(vcpu, spsr);
-
-	mode = spsr & (PSR_MODE_MASK | PSR_MODE32_BIT);
-
-	direct_eret  = (mode == PSR_MODE_EL0t &&
-			vcpu_el2_e2h_is_set(vcpu) &&
-			vcpu_el2_tge_is_set(vcpu));
-	direct_eret |= (mode == PSR_MODE_EL2h || mode == PSR_MODE_EL2t);
-
-	if (direct_eret) {
-		*vcpu_pc(vcpu) = vcpu_read_sys_reg(vcpu, ELR_EL2);
-		*vcpu_cpsr(vcpu) = spsr;
-		trace_kvm_nested_eret(vcpu, *vcpu_pc(vcpu), spsr);
-		return;
-	}
-
 	preempt_disable();
 	kvm_arch_vcpu_put(vcpu);
 
+	spsr = __vcpu_sys_reg(vcpu, SPSR_EL2);
+	spsr = kvm_check_illegal_exception_return(vcpu, spsr);
 	elr = __vcpu_sys_reg(vcpu, ELR_EL2);
 
 	trace_kvm_nested_eret(vcpu, elr, spsr);
diff --git a/arch/arm64/kvm/hyp/vhe/switch.c b/arch/arm64/kvm/hyp/vhe/switch.c
index 0aeafda5b966..a63c44bf62ce 100644
--- a/arch/arm64/kvm/hyp/vhe/switch.c
+++ b/arch/arm64/kvm/hyp/vhe/switch.c
@@ -168,6 +168,51 @@ void deactivate_traps_vhe_put(struct kvm_vcpu *vcpu)
 	__deactivate_traps_common(vcpu);
 }
 
+static bool kvm_hyp_handle_eret(struct kvm_vcpu *vcpu, u64 *exit_code)
+{
+	struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
+	u64 spsr, mode;
+
+	/*
+	 * Going through the whole put/load motions is a waste of time
+	 * if this is a VHE guest hypervisor returning to its own
+	 * userspace, or the hypervisor performing a local exception
+	 * return. No need to save/restore registers, no need to
+	 * switch S2 MMU. Just do the canonical ERET.
+	 *
+	 * Unless the trap has to be forwarded further down the line,
+	 * of course...
+	 */
+	if (__vcpu_sys_reg(vcpu, HCR_EL2) & HCR_NV)
+		return false;
+
+	spsr = read_sysreg_el1(SYS_SPSR);
+	spsr = __fixup_spsr_el2_read(ctxt, spsr);
+	mode = spsr & (PSR_MODE_MASK | PSR_MODE32_BIT);
+
+	switch (mode) {
+	case PSR_MODE_EL0t:
+		if (!(vcpu_el2_e2h_is_set(vcpu) && vcpu_el2_tge_is_set(vcpu)))
+			return false;
+		break;
+	case PSR_MODE_EL2t:
+		mode = PSR_MODE_EL1t;
+		break;
+	case PSR_MODE_EL2h:
+		mode = PSR_MODE_EL1h;
+		break;
+	default:
+		return false;
+	}
+
+	spsr = (spsr & ~(PSR_MODE_MASK | PSR_MODE32_BIT)) | mode;
+
+	write_sysreg_el2(spsr, SYS_SPSR);
+	write_sysreg_el2(read_sysreg_el1(SYS_ELR), SYS_ELR);
+
+	return true;
+}
+
 static const exit_handler_fn hyp_exit_handlers[] = {
 	[0 ... ESR_ELx_EC_MAX]		= NULL,
 	[ESR_ELx_EC_CP15_32]		= kvm_hyp_handle_cp15_32,
@@ -177,6 +222,7 @@ static const exit_handler_fn hyp_exit_handlers[] = {
 	[ESR_ELx_EC_IABT_LOW]		= kvm_hyp_handle_iabt_low,
 	[ESR_ELx_EC_DABT_LOW]		= kvm_hyp_handle_dabt_low,
 	[ESR_ELx_EC_PAC]		= kvm_hyp_handle_ptrauth,
+	[ESR_ELx_EC_ERET]		= kvm_hyp_handle_eret,
 };
 
 static const exit_handler_fn *kvm_get_exit_handler_array(struct kvm_vcpu *vcpu)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [PATCH v10 57/59] KVM: arm64: nv: Fast-track EL1 TLBIs for VHE guests
  2023-05-15 17:30 [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support Marc Zyngier
                   ` (55 preceding siblings ...)
  2023-05-15 17:31 ` [PATCH v10 56/59] KVM: arm64: nv: Fast-track 'InHost' exception returns Marc Zyngier
@ 2023-05-15 17:31 ` Marc Zyngier
  2023-05-15 17:31 ` [PATCH v10 58/59] KVM: arm64: nv: Use FEAT_ECV to trap access to EL0 timers Marc Zyngier
                   ` (4 subsequent siblings)
  61 siblings, 0 replies; 95+ messages in thread
From: Marc Zyngier @ 2023-05-15 17:31 UTC (permalink / raw)
  To: kvmarm, kvm, linux-arm-kernel
  Cc: Alexandru Elisei, Andre Przywara, Chase Conklin,
	Christoffer Dall, Ganapatrao Kulkarni, Darren Hart, Jintack Lim,
	Russell King, Miguel Luis, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu

Due to the way ARMv8.4-NV suppresses traps when accessing EL2
system registers, we can't track when the guest changes its
HCR_EL2.TGE setting. This means we always trap EL1 TLBIs,
even if they don't affect any guest.

This obviously has a huge impact on performance, as we handle
TLBI traps as a normal exit, and a normal VHE host issues
thousands of TLBIs when booting (and quite a few when running
userspace).

A cheap way to reduce the overhead is to handle the limited
case of {E2H,TGE}=={1,1} as a guest fixup, as we already have
the right mmu configuration in place. Just execute the decoded
instruction right away and return to the guest.
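
The decode itself only needs the SYS64 ISS layout of ESR_EL2; a
standalone sketch of the field extraction (Op0[21:20], Op2[19:17],
Op1[16:14], CRn[13:10], Rt[9:5], CRm[4:1] per the Arm ARM), with the
CRn check added here purely as an extra illustration:

#include <stdint.h>
#include <stdio.h>

/* SYS64 ISS field extraction */
#define ISS_OP0(iss)	(((iss) >> 20) & 0x3)
#define ISS_OP2(iss)	(((iss) >> 17) & 0x7)
#define ISS_OP1(iss)	(((iss) >> 14) & 0x7)
#define ISS_CRN(iss)	(((iss) >> 10) & 0xf)
#define ISS_RT(iss)	(((iss) >>  5) & 0x1f)
#define ISS_CRM(iss)	(((iss) >>  1) & 0xf)

/* EL1 TLBI instructions sit at Op0=1, Op1=0 (and CRn=8 in practice) */
static int is_el1_tlbi(uint32_t iss)
{
	return ISS_OP0(iss) == 1 && ISS_OP1(iss) == 0;
}

int main(void)
{
	uint32_t iss = (1u << 20) | (8u << 10);	/* a TLBI-shaped encoding */

	printf("%d\n", is_el1_tlbi(iss));
	return 0;
}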

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/hyp/vhe/switch.c | 43 ++++++++++++++++++++++++++++++++-
 arch/arm64/kvm/hyp/vhe/tlb.c    |  6 +++--
 arch/arm64/kvm/sys_regs.c       | 12 ---------
 3 files changed, 46 insertions(+), 15 deletions(-)

diff --git a/arch/arm64/kvm/hyp/vhe/switch.c b/arch/arm64/kvm/hyp/vhe/switch.c
index a63c44bf62ce..2d305e0f145f 100644
--- a/arch/arm64/kvm/hyp/vhe/switch.c
+++ b/arch/arm64/kvm/hyp/vhe/switch.c
@@ -168,6 +168,47 @@ void deactivate_traps_vhe_put(struct kvm_vcpu *vcpu)
 	__deactivate_traps_common(vcpu);
 }
 
+static bool kvm_hyp_handle_tlbi_el1(struct kvm_vcpu *vcpu, u64 *exit_code)
+{
+	u32 instr;
+	u64 val;
+
+	/*
+	 * Ideally, we would never trap on EL1 TLB invalidations when the
+	 * guest's HCR_EL2.{E2H,TGE} == {1,1}. But "thanks" to ARMv8.4, we
+	 * don't trap writes to HCR_EL2, meaning that we can't track
+	 * changes to the virtual TGE bit. So we leave HCR_EL2.TTLB set on
+	 * the host. Oopsie...
+	 *
+	 * In order to speed up EL1 TLBIs from the vEL2 guest when TGE is
+	 * set, try and handle these invalidations as quickly as possible,
+	 * without fully exiting (unless this needs forwarding).
+	 */
+	if (!vcpu_has_nv2(vcpu) ||
+	    !vcpu_is_el2(vcpu) ||
+	    (__vcpu_sys_reg(vcpu, HCR_EL2) & (HCR_E2H | HCR_TGE)) != (HCR_E2H | HCR_TGE))
+		return false;
+
+	instr = esr_sys64_to_sysreg(kvm_vcpu_get_esr(vcpu));
+	if (sys_reg_Op0(instr) != TLBI_Op0 ||
+	    sys_reg_Op1(instr) != TLBI_Op1_EL1)
+		return false;
+
+	val = vcpu_get_reg(vcpu, kvm_vcpu_sys_get_rt(vcpu));
+	__kvm_tlb_el1_instr(NULL, val, instr);
+	__kvm_skip_instr(vcpu);
+
+	return true;
+}
+
+static bool kvm_hyp_handle_sysreg_vhe(struct kvm_vcpu *vcpu, u64 *exit_code)
+{
+	if (kvm_hyp_handle_tlbi_el1(vcpu, exit_code))
+		return true;
+
+	return kvm_hyp_handle_sysreg(vcpu, exit_code);
+}
+
 static bool kvm_hyp_handle_eret(struct kvm_vcpu *vcpu, u64 *exit_code)
 {
 	struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
@@ -216,7 +257,7 @@ static bool kvm_hyp_handle_eret(struct kvm_vcpu *vcpu, u64 *exit_code)
 static const exit_handler_fn hyp_exit_handlers[] = {
 	[0 ... ESR_ELx_EC_MAX]		= NULL,
 	[ESR_ELx_EC_CP15_32]		= kvm_hyp_handle_cp15_32,
-	[ESR_ELx_EC_SYS64]		= kvm_hyp_handle_sysreg,
+	[ESR_ELx_EC_SYS64]		= kvm_hyp_handle_sysreg_vhe,
 	[ESR_ELx_EC_SVE]		= kvm_hyp_handle_fpsimd,
 	[ESR_ELx_EC_FP_ASIMD]		= kvm_hyp_handle_fpsimd,
 	[ESR_ELx_EC_IABT_LOW]		= kvm_hyp_handle_iabt_low,
diff --git a/arch/arm64/kvm/hyp/vhe/tlb.c b/arch/arm64/kvm/hyp/vhe/tlb.c
index c4389db4cc22..beb162468c0b 100644
--- a/arch/arm64/kvm/hyp/vhe/tlb.c
+++ b/arch/arm64/kvm/hyp/vhe/tlb.c
@@ -201,7 +201,8 @@ void __kvm_tlb_el1_instr(struct kvm_s2_mmu *mmu, u64 val, u64 sys_encoding)
 	dsb(ishst);
 
 	/* Switch to requested VMID */
-	__tlb_switch_to_guest(mmu, &cxt);
+	if (mmu)
+		__tlb_switch_to_guest(mmu, &cxt);
 
 	/*
 	 * Execute the same instruction as the guest hypervisor did,
@@ -240,5 +241,6 @@ void __kvm_tlb_el1_instr(struct kvm_s2_mmu *mmu, u64 val, u64 sys_encoding)
 	dsb(ish);
 	isb();
 
-	__tlb_switch_to_host(&cxt);
+	if (mmu)
+		__tlb_switch_to_host(&cxt);
 }
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 228dba82e9e5..27a29dcbfcd2 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -2987,18 +2987,6 @@ static bool handle_tlbi_el1(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
 
 	WARN_ON(!vcpu_is_el2(vcpu));
 
-	if ((__vcpu_sys_reg(vcpu, HCR_EL2) & (HCR_E2H | HCR_TGE)) == (HCR_E2H | HCR_TGE)) {
-		mutex_lock(&vcpu->kvm->lock);
-		/*
-		 * ARMv8.4-NV allows the guest to change TGE behind
-		 * our back, so we always trap EL1 TLBIs from vEL2...
-		 */
-		__kvm_tlb_el1_instr(&vcpu->kvm->arch.mmu, p->regval, sys_encoding);
-		mutex_unlock(&vcpu->kvm->lock);
-
-		return true;
-	}
-
 	kvm_s2_mmu_iterate_by_vmid(vcpu->kvm, get_vmid(vttbr),
 				   &(union tlbi_info) {
 					   .va = {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [PATCH v10 58/59] KVM: arm64: nv: Use FEAT_ECV to trap access to EL0 timers
  2023-05-15 17:30 [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support Marc Zyngier
                   ` (56 preceding siblings ...)
  2023-05-15 17:31 ` [PATCH v10 57/59] KVM: arm64: nv: Fast-track EL1 TLBIs for VHE guests Marc Zyngier
@ 2023-05-15 17:31 ` Marc Zyngier
  2023-05-15 17:31 ` [PATCH v10 59/59] KVM: arm64: nv: Accelerate EL0 timer read accesses when FEAT_ECV is on Marc Zyngier
                   ` (3 subsequent siblings)
  61 siblings, 0 replies; 95+ messages in thread
From: Marc Zyngier @ 2023-05-15 17:31 UTC (permalink / raw)
  To: kvmarm, kvm, linux-arm-kernel
  Cc: Alexandru Elisei, Andre Przywara, Chase Conklin,
	Christoffer Dall, Ganapatrao Kulkarni, Darren Hart, Jintack Lim,
	Russell King, Miguel Luis, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu

Although FEAT_NV2 makes most things fast, it also makes it impossible
to correctly emulate the timers, as the sysreg accesses are redirected
to memory.

FEAT_ECV addresses this by giving a hypervisor the ability to trap
the EL02 sysregs as well as the virtual timer.

Add the required trap setting to make use of the feature, allowing
us to elide the ugly resync in the middle of the run loop.
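
The clr/set accumulation in timer_set_traps() relies on the local
assign_clear_set_bit() helper; its semantics (sketched below as a
function, assuming the macro behaves the same) explain why the argument
order flips between the EL1PCEN/EL1PCTEN lines and the new ECV ones:
the former are enable bits that must be cleared to trap, the latter are
trap bits that must be set.

#include <stdint.h>
#include <stdio.h>

/* If pred, accumulate bit into *set, otherwise into *clr */
static void assign_clear_set_bit(int pred, uint64_t bit,
				 uint64_t *clr, uint64_t *set)
{
	if (pred)
		*set |= bit;
	else
		*clr |= bit;
}

int main(void)
{
	uint64_t clr = 0, set = 0;

	/* Trap bit (EL1TVT-style): set it to trap */
	assign_clear_set_bit(1, 1ULL << 13, &clr, &set);
	/* Enable bit (EL1PCEN-style): clear it to trap (args swapped) */
	assign_clear_set_bit(1, 1ULL << 11, &set, &clr);

	printf("clr=%#llx set=%#llx\n",
	       (unsigned long long)clr, (unsigned long long)set);
	return 0;
}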

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/arch_timer.c          | 37 +++++++++++++++++++++++++---
 include/clocksource/arm_arch_timer.h |  4 +++
 2 files changed, 38 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
index f551aa4ac321..e64c9e35e5fe 100644
--- a/arch/arm64/kvm/arch_timer.c
+++ b/arch/arm64/kvm/arch_timer.c
@@ -790,7 +790,7 @@ static void kvm_timer_vcpu_load_nested_switch(struct kvm_vcpu *vcpu,
 
 static void timer_set_traps(struct kvm_vcpu *vcpu, struct timer_map *map)
 {
-	bool tpt, tpc;
+	bool tvt, tpt, tvc, tpc, tvt02, tpt02;
 	u64 clr, set;
 
 	/*
@@ -805,7 +805,30 @@ static void timer_set_traps(struct kvm_vcpu *vcpu, struct timer_map *map)
 	 * within this function, reality kicks in and we start adding
 	 * traps based on emulation requirements.
 	 */
-	tpt = tpc = false;
+	tvt = tpt = tvc = tpc = false;
+	tvt02 = tpt02 = false;
+
+	/*
+	 * NV2 badly breaks the timer semantics by redirecting accesses to
+	 * the EL0 timer state to memory, so let's call ECV to the rescue if
+	 * available: we trap all CNT{P,V}_{CTL,CVAL,TVAL}_EL0 accesses.
+	 *
+	 * The treatment varies slightly depending on whether we run a nVHE
+	 * or a VHE guest: nVHE will use the _EL0 registers directly, while
+	 * VHE will use the _EL02 accessors. This translates into different
+	 * trap bits.
+	 *
+	 * None of the trapping is required when running in non-HYP context,
+	 * unless required by the L1 hypervisor settings once we advertise
+	 * ECV+NV in the guest, or when we need trapping for other reasons.
+	 */
+	if (cpus_have_final_cap(ARM64_HAS_ECV) &&
+	    vcpu_has_nv2(vcpu) && is_hyp_ctxt(vcpu)) {
+		if (vcpu_el2_e2h_is_set(vcpu))
+			tvt02 = tpt02 = true;
+		else
+			tvt = tpt = true;
+	}
 
 	/*
 	 * We have two possibilities to deal with a physical offset:
@@ -845,6 +868,10 @@ static void timer_set_traps(struct kvm_vcpu *vcpu, struct timer_map *map)
 
 	assign_clear_set_bit(tpt, CNTHCTL_EL1PCEN << 10, set, clr);
 	assign_clear_set_bit(tpc, CNTHCTL_EL1PCTEN << 10, set, clr);
+	assign_clear_set_bit(tvt, CNTHCTL_EL1TVT, clr, set);
+	assign_clear_set_bit(tvc, CNTHCTL_EL1TVCT, clr, set);
+	assign_clear_set_bit(tvt02, CNTHCTL_EL1NVVCT, clr, set);
+	assign_clear_set_bit(tpt02, CNTHCTL_EL1NVPCT, clr, set);
 
 	/* This only happens on VHE, so use the CNTKCTL_EL1 accessor */
 	sysreg_clear_set(cntkctl_el1, clr, set);
@@ -940,8 +967,12 @@ void kvm_timer_sync_nested(struct kvm_vcpu *vcpu)
 	 * accesses redirected to the VNCR page. Any guest action taken on
 	 * the timer is postponed until the next exit, leading to a very
 	 * poor quality of emulation.
+	 *
+	 * This is an unmitigated disaster, only papered over by FEAT_ECV,
+	 * which allows trapping of the timer registers even with NV2.
+	 * Even so, this is still worse than FEAT_NV on its own. Meh.
 	 */
-	if (!is_hyp_ctxt(vcpu))
+	if (cpus_have_const_cap(ARM64_HAS_ECV) || !is_hyp_ctxt(vcpu))
 		return;
 
 	if (!vcpu_el2_e2h_is_set(vcpu)) {
diff --git a/include/clocksource/arm_arch_timer.h b/include/clocksource/arm_arch_timer.h
index cbbc9a6dc571..c62811fb4130 100644
--- a/include/clocksource/arm_arch_timer.h
+++ b/include/clocksource/arm_arch_timer.h
@@ -22,6 +22,10 @@
 #define CNTHCTL_EVNTDIR			(1 << 3)
 #define CNTHCTL_EVNTI			(0xF << 4)
 #define CNTHCTL_ECV			(1 << 12)
+#define CNTHCTL_EL1TVT			(1 << 13)
+#define CNTHCTL_EL1TVCT			(1 << 14)
+#define CNTHCTL_EL1NVPCT		(1 << 15)
+#define CNTHCTL_EL1NVVCT		(1 << 16)
 
 enum arch_timer_reg {
 	ARCH_TIMER_REG_CTRL,
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 95+ messages in thread

* [PATCH v10 59/59] KVM: arm64: nv: Accelerate EL0 timer read accesses when FEAT_ECV is on
  2023-05-15 17:30 [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support Marc Zyngier
                   ` (57 preceding siblings ...)
  2023-05-15 17:31 ` [PATCH v10 58/59] KVM: arm64: nv: Use FEAT_ECV to trap access to EL0 timers Marc Zyngier
@ 2023-05-15 17:31 ` Marc Zyngier
  2023-05-16 16:53 ` [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support Eric Auger
                   ` (2 subsequent siblings)
  61 siblings, 0 replies; 95+ messages in thread
From: Marc Zyngier @ 2023-05-15 17:31 UTC (permalink / raw)
  To: kvmarm, kvm, linux-arm-kernel
  Cc: Alexandru Elisei, Andre Przywara, Chase Conklin,
	Christoffer Dall, Ganapatrao Kulkarni, Darren Hart, Jintack Lim,
	Russell King, Miguel Luis, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu

Although FEAT_ECV allows us to correctly emulate the timers, it also
reduces performance pretty badly (an L2 guest doing a lot of virtio
emulated in L1 userspace results in a 30% degradation).

Mitigate this by emulating the CTL/CVAL register reads in the
inner run loop, without returning to the general kernel. This halves
the overhead described above.
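
The gating in the inner loop only needs the trap direction and the
register number; a minimal sketch of the direction check (ISS bit 0 is
the Direction bit, 1 for reads/MRS; function name illustrative):

#include <stdint.h>
#include <stdio.h>

#define ISS_DIR_READ	0x1	/* ESR ISS bit 0: 1 = MRS (read) */

/* Only reads can be satisfied from the in-memory copy; writes must exit */
static int can_fast_path(uint32_t iss)
{
	return (iss & ISS_DIR_READ) == ISS_DIR_READ;
}

int main(void)
{
	printf("%d %d\n", can_fast_path(0x1), can_fast_path(0x0));
	return 0;
}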

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/hyp/vhe/switch.c | 49 +++++++++++++++++++++++++++++++++
 1 file changed, 49 insertions(+)

diff --git a/arch/arm64/kvm/hyp/vhe/switch.c b/arch/arm64/kvm/hyp/vhe/switch.c
index 2d305e0f145f..fc8b7e374e9d 100644
--- a/arch/arm64/kvm/hyp/vhe/switch.c
+++ b/arch/arm64/kvm/hyp/vhe/switch.c
@@ -201,11 +201,60 @@ static bool kvm_hyp_handle_tlbi_el1(struct kvm_vcpu *vcpu, u64 *exit_code)
 	return true;
 }
 
+static bool kvm_hyp_handle_ecv(struct kvm_vcpu *vcpu, u64 *exit_code)
+{
+	u64 esr, val;
+
+	/*
+	 * Having FEAT_ECV allows for a better quality of timer emulation.
+	 * However, this comes at a huge cost in terms of traps. Try and
+	 * satisfy the reads without returning to the kernel if we can.
+	 */
+	if (!cpus_have_final_cap(ARM64_HAS_ECV))
+		return false;
+
+	if (!vcpu_has_nv2(vcpu))
+		return false;
+
+	esr = kvm_vcpu_get_esr(vcpu);
+	if ((esr & ESR_ELx_SYS64_ISS_DIR_MASK) != ESR_ELx_SYS64_ISS_DIR_READ)
+		return false;
+
+	switch (esr_sys64_to_sysreg(esr)) {
+	case SYS_CNTP_CTL_EL02:
+	case SYS_CNTP_CTL_EL0:
+		val = __vcpu_sys_reg(vcpu, CNTP_CTL_EL0);
+		break;
+	case SYS_CNTP_CVAL_EL02:
+	case SYS_CNTP_CVAL_EL0:
+		val = __vcpu_sys_reg(vcpu, CNTP_CVAL_EL0);
+		break;
+	case SYS_CNTV_CTL_EL02:
+	case SYS_CNTV_CTL_EL0:
+		val = __vcpu_sys_reg(vcpu, CNTV_CTL_EL0);
+		break;
+	case SYS_CNTV_CVAL_EL02:
+	case SYS_CNTV_CVAL_EL0:
+		val = __vcpu_sys_reg(vcpu, CNTV_CVAL_EL0);
+		break;
+	default:
+		return false;
+	}
+
+	vcpu_set_reg(vcpu, kvm_vcpu_sys_get_rt(vcpu), val);
+	__kvm_skip_instr(vcpu);
+
+	return true;
+}
+
 static bool kvm_hyp_handle_sysreg_vhe(struct kvm_vcpu *vcpu, u64 *exit_code)
 {
 	if (kvm_hyp_handle_tlbi_el1(vcpu, exit_code))
 		return true;
 
+	if (kvm_hyp_handle_ecv(vcpu, exit_code))
+		return true;
+
 	return kvm_hyp_handle_sysreg(vcpu, exit_code);
 }
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 95+ messages in thread

* Re: [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support
  2023-05-15 17:30 [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support Marc Zyngier
                   ` (58 preceding siblings ...)
  2023-05-15 17:31 ` [PATCH v10 59/59] KVM: arm64: nv: Accelerate EL0 timer read accesses when FEAT_ECV is on Marc Zyngier
@ 2023-05-16 16:53 ` Eric Auger
  2023-05-16 18:47   ` Marc Zyngier
  2023-05-16 20:28   ` Marc Zyngier
  2023-06-05 11:28 ` Eric Auger
  2023-06-28  6:45 ` Ganapatrao Kulkarni
  61 siblings, 2 replies; 95+ messages in thread
From: Eric Auger @ 2023-05-16 16:53 UTC (permalink / raw)
  To: Marc Zyngier, kvmarm, kvm, linux-arm-kernel
  Cc: Alexandru Elisei, Andre Przywara, Chase Conklin,
	Christoffer Dall, Ganapatrao Kulkarni, Darren Hart, Jintack Lim,
	Russell King, Miguel Luis, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu

Hi Marc,

On 5/15/23 19:30, Marc Zyngier wrote:
> This is the 4th drop of NV support on arm64 for this year.
> 
> For the previous episodes, see [1].
> 
> What's changed:
> 
> - New framework to track system register traps that are reinjected in
>   guest EL2. It is expected to replace the discrete handling we have
>   enjoyed so far, which didn't scale at all. This has already fixed a
>   number of bugs that were hidden (a bunch of traps were never
>   forwarded...). Still a work in progress, but this is going in the
>   right direction.
> 
> - Allow the L1 hypervisor to have a S2 that has an input larger than
>   the L0 IPA space. This fixes a number of subtle issues, depending on
>   how the initial guest was created.
> 
> - Consequently, the patch series has gone longer again. Boo. But
>   hopefully some of it is easier to review...
> 
> [1] https://lore.kernel.org/r/20230405154008.3552854-1-maz@kernel.org

I have started testing this and when booting my fedora guest I get

[  151.796544] kvm [7617]: Unsupported guest sys_reg access at:
23f425fd0 [80000209]
[  151.796544]  { Op0( 3), Op1( 3), CRn(14), CRm( 3), Op2( 1), func_write },

as soon as the host has kvm-arm.mode=nested

This seems to be triggered very early by EDK2
(ArmPkg/Drivers/TimerDxe/TimerDxe.c).

If I am not wrong this is CNTV_CTL_EL0. Do you have any idea?

By the way I got this already with your v9
(git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git
kvm-arm64/nv-6.4-WIP) and with your v10 (I cherry-picked your
maz/kvm-arm64/nv-6.5-WIP branch).

Thanks

Eric





> 
> Andre Przywara (1):
>   KVM: arm64: nv: vgic: Allow userland to set VGIC maintenance IRQ
> 
> Christoffer Dall (5):
>   KVM: arm64: nv: Trap EL1 VM register accesses in virtual EL2
>   KVM: arm64: nv: Implement nested Stage-2 page table walk logic
>   KVM: arm64: nv: Unmap/flush shadow stage 2 page tables
>   KVM: arm64: nv: vgic: Emulate the HW bit in software
>   KVM: arm64: nv: Sync nested timer state with FEAT_NV2
> 
> Jintack Lim (7):
>   KVM: arm64: nv: Trap CPACR_EL1 access in virtual EL2
>   KVM: arm64: nv: Respect virtual HCR_EL2.TWX setting
>   KVM: arm64: nv: Respect virtual CPTR_EL2.{TFP,FPEN} settings
>   KVM: arm64: nv: Respect virtual HCR_EL2.{NV,TSC) settings
>   KVM: arm64: nv: Configure HCR_EL2 for nested virtualization
>   KVM: arm64: nv: Trap and emulate TLBI instructions from virtual EL2
>   KVM: arm64: nv: Nested GICv3 Support
> 
> Marc Zyngier (46):
>   KVM: arm64: Move VTCR_EL2 into struct s2_mmu
>   arm64: Add missing Set/Way CMO encodings
>   arm64: Add missing VA CMO encodings
>   arm64: Add missing ERXMISCx_EL1 encodings
>   arm64: Add missing DC ZVA/GVA/GZVA encodings
>   arm64: Add TLBI operation encodings
>   arm64: Add AT operation encodings
>   KVM: arm64: Add missing HCR_EL2 trap bits
>   KVM: arm64: nv: Add trap forwarding infrastructure
>   KVM: arm64: nv: Add trap forwarding for HCR_EL2
>   KVM: arm64: nv: Expose FEAT_EVT to nested guests
>   KVM: arm64: nv: Add trap forwarding for MDCR_EL2
>   KVM: arm64: nv: Add trap forwarding for CNTHCTL_EL2
>   KVM: arm64: nv: Add non-VHE-EL2->EL1 translation helpers
>   KVM: arm64: nv: Handle virtual EL2 registers in
>     vcpu_read/write_sys_reg()
>   KVM: arm64: nv: Handle SPSR_EL2 specially
>   KVM: arm64: nv: Handle HCR_EL2.E2H specially
>   KVM: arm64: nv: Save/Restore vEL2 sysregs
>   KVM: arm64: nv: Support multiple nested Stage-2 mmu structures
>   KVM: arm64: nv: Handle shadow stage 2 page faults
>   KVM: arm64: nv: Restrict S2 RD/WR permissions to match the guest's
>   KVM: arm64: nv: Set a handler for the system instruction traps
>   KVM: arm64: nv: Trap and emulate AT instructions from virtual EL2
>   KVM: arm64: nv: Fold guest's HCR_EL2 configuration into the host's
>   KVM: arm64: nv: Hide RAS from nested guests
>   KVM: arm64: nv: Add handling of EL2-specific timer registers
>   KVM: arm64: nv: Load timer before the GIC
>   KVM: arm64: nv: Don't load the GICv4 context on entering a nested
>     guest
>   KVM: arm64: nv: Implement maintenance interrupt forwarding
>   KVM: arm64: nv: Deal with broken VGIC on maintenance interrupt
>     delivery
>   KVM: arm64: nv: Allow userspace to request KVM_ARM_VCPU_NESTED_VIRT
>   KVM: arm64: nv: Add handling of FEAT_TTL TLB invalidation
>   KVM: arm64: nv: Invalidate TLBs based on shadow S2 TTL-like
>     information
>   KVM: arm64: nv: Tag shadow S2 entries with nested level
>   KVM: arm64: nv: Add include containing the VNCR_EL2 offsets
>   KVM: arm64: nv: Map VNCR-capable registers to a separate page
>   KVM: arm64: nv: Move nested vgic state into the sysreg file
>   KVM: arm64: Add FEAT_NV2 cpu feature
>   KVM: arm64: nv: Fold GICv3 host trapping requirements into guest setup
>   KVM: arm64: nv: Publish emulated timer interrupt state in the
>     in-memory state
>   KVM: arm64: nv: Allocate VNCR page when required
>   KVM: arm64: nv: Enable ARMv8.4-NV support
>   KVM: arm64: nv: Fast-track 'InHost' exception returns
>   KVM: arm64: nv: Fast-track EL1 TLBIs for VHE guests
>   KVM: arm64: nv: Use FEAT_ECV to trap access to EL0 timers
>   KVM: arm64: nv: Accelerate EL0 timer read accesses when FEAT_ECV is on
> 
>  .../virt/kvm/devices/arm-vgic-v3.rst          |  12 +-
>  arch/arm64/include/asm/esr.h                  |   1 +
>  arch/arm64/include/asm/kvm_arm.h              |  14 +
>  arch/arm64/include/asm/kvm_asm.h              |   4 +
>  arch/arm64/include/asm/kvm_emulate.h          |  93 +-
>  arch/arm64/include/asm/kvm_host.h             | 181 +++-
>  arch/arm64/include/asm/kvm_hyp.h              |   2 +
>  arch/arm64/include/asm/kvm_mmu.h              |  20 +-
>  arch/arm64/include/asm/kvm_nested.h           | 133 +++
>  arch/arm64/include/asm/stage2_pgtable.h       |   4 +-
>  arch/arm64/include/asm/sysreg.h               | 196 ++++
>  arch/arm64/include/asm/vncr_mapping.h         |  74 ++
>  arch/arm64/include/uapi/asm/kvm.h             |   1 +
>  arch/arm64/kernel/cpufeature.c                |  11 +
>  arch/arm64/kvm/Makefile                       |   4 +-
>  arch/arm64/kvm/arch_timer.c                   |  98 +-
>  arch/arm64/kvm/arm.c                          |  33 +-
>  arch/arm64/kvm/at.c                           | 219 ++++
>  arch/arm64/kvm/emulate-nested.c               | 934 ++++++++++++++++-
>  arch/arm64/kvm/handle_exit.c                  |  29 +-
>  arch/arm64/kvm/hyp/include/hyp/switch.h       |   8 +-
>  arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h    |   5 +-
>  arch/arm64/kvm/hyp/nvhe/mem_protect.c         |   8 +-
>  arch/arm64/kvm/hyp/nvhe/pkvm.c                |   4 +-
>  arch/arm64/kvm/hyp/nvhe/switch.c              |   2 +-
>  arch/arm64/kvm/hyp/nvhe/sysreg-sr.c           |   2 +-
>  arch/arm64/kvm/hyp/pgtable.c                  |   2 +-
>  arch/arm64/kvm/hyp/vgic-v3-sr.c               |   6 +-
>  arch/arm64/kvm/hyp/vhe/switch.c               | 206 +++-
>  arch/arm64/kvm/hyp/vhe/sysreg-sr.c            | 124 ++-
>  arch/arm64/kvm/hyp/vhe/tlb.c                  |  83 ++
>  arch/arm64/kvm/mmu.c                          | 255 ++++-
>  arch/arm64/kvm/nested.c                       | 799 ++++++++++++++-
>  arch/arm64/kvm/pkvm.c                         |   2 +-
>  arch/arm64/kvm/reset.c                        |   7 +
>  arch/arm64/kvm/sys_regs.c                     | 958 +++++++++++++++++-
>  arch/arm64/kvm/trace_arm.h                    |  19 +
>  arch/arm64/kvm/vgic/vgic-init.c               |  33 +
>  arch/arm64/kvm/vgic/vgic-kvm-device.c         |  32 +-
>  arch/arm64/kvm/vgic/vgic-v3-nested.c          | 248 +++++
>  arch/arm64/kvm/vgic/vgic-v3.c                 |  43 +-
>  arch/arm64/kvm/vgic/vgic.c                    |  29 +
>  arch/arm64/kvm/vgic/vgic.h                    |  10 +
>  arch/arm64/tools/cpucaps                      |   1 +
>  include/clocksource/arm_arch_timer.h          |   4 +
>  include/kvm/arm_arch_timer.h                  |   1 +
>  include/kvm/arm_vgic.h                        |  17 +
>  include/uapi/linux/kvm.h                      |   1 +
>  tools/arch/arm/include/uapi/asm/kvm.h         |   1 +
>  49 files changed, 4790 insertions(+), 183 deletions(-)
>  create mode 100644 arch/arm64/include/asm/vncr_mapping.h
>  create mode 100644 arch/arm64/kvm/at.c
>  create mode 100644 arch/arm64/kvm/vgic/vgic-v3-nested.c
> 


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support
  2023-05-16 16:53 ` [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support Eric Auger
@ 2023-05-16 18:47   ` Marc Zyngier
  2023-05-16 20:28   ` Marc Zyngier
  1 sibling, 0 replies; 95+ messages in thread
From: Marc Zyngier @ 2023-05-16 18:47 UTC (permalink / raw)
  To: Eric Auger
  Cc: kvmarm, kvm, linux-arm-kernel, Alexandru Elisei, Andre Przywara,
	Chase Conklin, Christoffer Dall, Ganapatrao Kulkarni,
	Darren Hart, Jintack Lim, Russell King, Miguel Luis, James Morse,
	Suzuki K Poulose, Oliver Upton, Zenghui Yu

Hey Eric,

On Tue, 16 May 2023 17:53:14 +0100,
Eric Auger <eauger@redhat.com> wrote:
> 
> Hi Marc,
> 
> On 5/15/23 19:30, Marc Zyngier wrote:
> > This is the 4th drop of NV support on arm64 for this year.
> > 
> > For the previous episodes, see [1].
> > 
> > What's changed:
> > 
> > - New framework to track system register traps that are reinjected in
> >   guest EL2. It is expected to replace the discrete handling we have
> >   enjoyed so far, which didn't scale at all. This has already fixed a
> >   number of bugs that were hidden (a bunch of traps were never
> >   forwarded...). Still a work in progress, but this is going in the
> >   right direction.
> > 
> > - Allow the L1 hypervisor to have a S2 that has an input larger than
> >   the L0 IPA space. This fixes a number of subtle issues, depending on
> >   how the initial guest was created.
> > 
> > - Consequently, the patch series has gone longer again. Boo. But
> >   hopefully some of it is easier to review...
> > 
> > [1] https://lore.kernel.org/r/20230405154008.3552854-1-maz@kernel.org
> 
> I have started testing this and when booting my fedora guest I get
> 
> [  151.796544] kvm [7617]: Unsupported guest sys_reg access at:
> 23f425fd0 [80000209]
> [  151.796544]  { Op0( 3), Op1( 3), CRn(14), CRm( 3), Op2( 1), func_write },
> 
> as soon as the host has kvm-arm.mode=nested

Very odd. A write to CNTV_CTL_EL0 that fails, meaning that we have ECV
traps for the virtual timer set, and so this is probably done from
(virtual) EL2. Which of course we're not properly handling. Duh.
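
For cross-checking: the reported Op0/Op1/CRn/CRm/Op2 tuple does pack to
the CNTV_CTL_EL0 encoding. A quick sketch, assuming the asm/sysreg.h
shift layout (Op0 at bit 19, Op1 at 16, CRn at 12, CRm at 8, Op2 at 5):

#include <stdio.h>

#define sys_reg(op0, op1, crn, crm, op2) \
	(((op0) << 19) | ((op1) << 16) | ((crn) << 12) | \
	 ((crm) << 8) | ((op2) << 5))

int main(void)
{
	/* Op0(3), Op1(3), CRn(14), CRm(3), Op2(1) from the trap report */
	printf("%#x\n", sys_reg(3, 3, 14, 3, 1));	/* SYS_CNTV_CTL_EL0 */
	return 0;
}
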

> This seems to be triggered very early by EDK2
> (ArmPkg/Drivers/TimerDxe/TimerDxe.c).
> 
> If I am not wrong this is CNTV_CTL_EL0. Do you have any idea?

I have a good idea, but I could use some more information:

- Is this a nested guest? Or a non nested guest?

- Can you stash your EDK2 build somewhere so that I can try to
  reproduce it?

- What do you use for VMM on the host?

I'm running EDK2 on both L1 and L2 guests (some Debian builds), and I
don't see any issue. But when running EDK2 on L1, I run it non-nested.

> By the way I got this already with your v9
> (git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git
> kvm-arm64/nv-6.4-WIP) and with your v10 (I cherry-picked your
> maz/kvm-arm64/nv-6.5-WIP branch).

That's an interesting data point. At least, it hasn't regressed
further... I'll try and cook a test tomorrow and run it on my own
rig. Which may or may not behave like yours, you never know...

Thanks a lot for the report!

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support
  2023-05-16 16:53 ` [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support Eric Auger
  2023-05-16 18:47   ` Marc Zyngier
@ 2023-05-16 20:28   ` Marc Zyngier
  2023-05-17  8:59     ` Eric Auger
  1 sibling, 1 reply; 95+ messages in thread
From: Marc Zyngier @ 2023-05-16 20:28 UTC (permalink / raw)
  To: Eric Auger
  Cc: kvmarm, kvm, linux-arm-kernel, Alexandru Elisei, Andre Przywara,
	Chase Conklin, Christoffer Dall, Ganapatrao Kulkarni,
	Darren Hart, Jintack Lim, Russell King, Miguel Luis, James Morse,
	Suzuki K Poulose, Oliver Upton, Zenghui Yu

On Tue, 16 May 2023 17:53:14 +0100,
Eric Auger <eauger@redhat.com> wrote:
> 
> Hi Marc,
> 
> On 5/15/23 19:30, Marc Zyngier wrote:
> > This is the 4th drop of NV support on arm64 for this year.
> > 
> > For the previous episodes, see [1].
> > 
> > What's changed:
> > 
> > - New framework to track system register traps that are reinjected in
> >   guest EL2. It is expected to replace the discrete handling we have
> >   enjoyed so far, which didn't scale at all. This has already fixed a
> >   number of bugs that were hidden (a bunch of traps were never
> >   forwarded...). Still a work in progress, but this is going in the
> >   right direction.
> > 
> > - Allow the L1 hypervisor to have a S2 that has an input larger than
> >   the L0 IPA space. This fixes a number of subtle issues, depending on
> >   how the initial guest was created.
> > 
> > - Consequently, the patch series has gone longer again. Boo. But
> >   hopefully some of it is easier to review...
> > 
> > [1] https://lore.kernel.org/r/20230405154008.3552854-1-maz@kernel.org
> 
> I have started testing this and when booting my fedora guest I get
> 
> [  151.796544] kvm [7617]: Unsupported guest sys_reg access at:
> 23f425fd0 [80000209]
> [  151.796544]  { Op0( 3), Op1( 3), CRn(14), CRm( 3), Op2( 1), func_write },
> 
> as soon as the host has kvm-arm.mode=nested
> 
> This seems to be triggered very early by EDK2
> (ArmPkg/Drivers/TimerDxe/TimerDxe.c).
> 
> > If I am not wrong this is CNTV_CTL_EL0. Do you have any idea?

So here's my current analysis:

I assume you are running EDK2 as the L1 guest in a nested
configuration. I also assume that you are not running on an Apple
CPU. If these assumptions are correct, then EDK2 runs at vEL2, and is
in nVHE mode.

Finally, I'm going to assume that your implementation has FEAT_ECV and
FEAT_NV2, because I can't see how it could fail otherwise.

In these precise conditions, KVM sets the CNTHCTL_EL2.EL1TVT bit so
that we can trap the EL0 virtual timer and faithfully emulate it (it
is otherwise written to memory, which isn't very helpful).

As it turns out, we don't handle these traps. I didn't spot it because
my test machines are all Apple boxes that don't have a nVHE mode, so
nothing on the nVHE path is getting *ANY* coverage. Hint: having
access to such a machine would help (shipping address on request!).
Otherwise, I'll eventually kill the nVHE support altogether.

I have written the following patch, which compiles, but that I cannot
test with my current setup. Could you please give it a go?

Thanks again,

	M.

From feb03b57de0bcb83254a2d6a3ce320f5e39434b6 Mon Sep 17 00:00:00 2001
From: Marc Zyngier <maz@kernel.org>
Date: Tue, 16 May 2023 21:06:20 +0100
Subject: [PATCH] KVM: arm64: Handle virtual timer traps when
 CNTHCTL_EL2.EL1TVT is set

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/sysreg.h |  1 +
 arch/arm64/kvm/sys_regs.c       | 28 ++++++++++++++++++++++++++++
 2 files changed, 29 insertions(+)

diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 72ff6df5d75b..77a61179ea37 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -436,6 +436,7 @@
 #define SYS_CNTP_CTL_EL0		sys_reg(3, 3, 14, 2, 1)
 #define SYS_CNTP_CVAL_EL0		sys_reg(3, 3, 14, 2, 2)
 
+#define SYS_CNTV_TVAL_EL0		sys_reg(3, 3, 14, 3, 0)
 #define SYS_CNTV_CTL_EL0		sys_reg(3, 3, 14, 3, 1)
 #define SYS_CNTV_CVAL_EL0		sys_reg(3, 3, 14, 3, 2)
 
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 27a29dcbfcd2..9aa9c4e4b4d6 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1328,6 +1328,14 @@ static bool access_arch_timer(struct kvm_vcpu *vcpu,
 		treg = TIMER_REG_TVAL;
 		break;
 
+	case SYS_CNTV_TVAL_EL0:
+		if (is_hyp_ctxt(vcpu) && vcpu_el2_e2h_is_set(vcpu))
+			tmr = TIMER_HVTIMER;
+		else
+			tmr = TIMER_VTIMER;
+		treg = TIMER_REG_TVAL;
+		break;
+
 	case SYS_AARCH32_CNTP_TVAL:
 	case SYS_CNTP_TVAL_EL02:
 		tmr = TIMER_PTIMER;
@@ -1357,6 +1365,14 @@ static bool access_arch_timer(struct kvm_vcpu *vcpu,
 		treg = TIMER_REG_CTL;
 		break;
 
+	case SYS_CNTV_CTL_EL0:
+		if (is_hyp_ctxt(vcpu) && vcpu_el2_e2h_is_set(vcpu))
+			tmr = TIMER_HVTIMER;
+		else
+			tmr = TIMER_VTIMER;
+		treg = TIMER_REG_CTL;
+		break;
+
 	case SYS_AARCH32_CNTP_CTL:
 	case SYS_CNTP_CTL_EL02:
 		tmr = TIMER_PTIMER;
@@ -1386,6 +1402,14 @@ static bool access_arch_timer(struct kvm_vcpu *vcpu,
 		treg = TIMER_REG_CVAL;
 		break;
 
+	case SYS_CNTV_CVAL_EL0:
+		if (is_hyp_ctxt(vcpu) && vcpu_el2_e2h_is_set(vcpu))
+			tmr = TIMER_HVTIMER;
+		else
+			tmr = TIMER_VTIMER;
+		treg = TIMER_REG_CVAL;
+		break;
+
 	case SYS_AARCH32_CNTP_CVAL:
 	case SYS_CNTP_CVAL_EL02:
 		tmr = TIMER_PTIMER;
@@ -2510,6 +2534,10 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 	{ SYS_DESC(SYS_CNTP_CTL_EL0), access_arch_timer },
 	{ SYS_DESC(SYS_CNTP_CVAL_EL0), access_arch_timer },
 
+	{ SYS_DESC(SYS_CNTV_TVAL_EL0), access_arch_timer },
+	{ SYS_DESC(SYS_CNTV_CTL_EL0), access_arch_timer },
+	{ SYS_DESC(SYS_CNTV_CVAL_EL0), access_arch_timer },
+
 	/* PMEVCNTRn_EL0 */
 	PMU_PMEVCNTR_EL0(0),
 	PMU_PMEVCNTR_EL0(1),
-- 
2.39.2


-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply related	[flat|nested] 95+ messages in thread

* Re: [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support
  2023-05-16 20:28   ` Marc Zyngier
@ 2023-05-17  8:59     ` Eric Auger
  2023-05-17 14:12       ` Marc Zyngier
  0 siblings, 1 reply; 95+ messages in thread
From: Eric Auger @ 2023-05-17  8:59 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvmarm, kvm, linux-arm-kernel, Alexandru Elisei, Andre Przywara,
	Chase Conklin, Christoffer Dall, Ganapatrao Kulkarni,
	Darren Hart, Jintack Lim, Russell King, Miguel Luis, James Morse,
	Suzuki K Poulose, Oliver Upton, Zenghui Yu

Hi Marc,

On 5/16/23 22:28, Marc Zyngier wrote:
> On Tue, 16 May 2023 17:53:14 +0100,
> Eric Auger <eauger@redhat.com> wrote:
>>
>> Hi Marc,
>>
>> On 5/15/23 19:30, Marc Zyngier wrote:
>>> This is the 4th drop of NV support on arm64 for this year.
>>>
>>> For the previous episodes, see [1].
>>>
>>> What's changed:
>>>
>>> - New framework to track system register traps that are reinjected in
>>>   guest EL2. It is expected to replace the discrete handling we have
>>>   enjoyed so far, which didn't scale at all. This has already fixed a
>>>   number of bugs that were hidden (a bunch of traps were never
>>>   forwarded...). Still a work in progress, but this is going in the
>>>   right direction.
>>>
>>> - Allow the L1 hypervisor to have a S2 that has an input larger than
>>>   the L0 IPA space. This fixes a number of subtle issues, depending on
>>>   how the initial guest was created.
>>>
>>> - Consequently, the patch series has gone longer again. Boo. But
>>>   hopefully some of it is easier to review...
>>>
>>> [1] https://lore.kernel.org/r/20230405154008.3552854-1-maz@kernel.org
>>
>> I have started testing this and when booting my fedora guest I get
>>
>> [  151.796544] kvm [7617]: Unsupported guest sys_reg access at:
>> 23f425fd0 [80000209]
>> [  151.796544]  { Op0( 3), Op1( 3), CRn(14), CRm( 3), Op2( 1), func_write },
>>
>> as soon as the host has kvm-arm.mode=nested
>>
>> This seems to be triggered very early by EDK2
>> (ArmPkg/Drivers/TimerDxe/TimerDxe.c).
>>
> >> If I am not wrong this is CNTV_CTL_EL0. Do you have any idea?
> 
> So here's my current analysis:
> 
> I assume you are running EDK2 as the L1 guest in a nested
> configuration. I also assume that you are not running on an Apple
> CPU. If these assumptions are correct, then EDK2 runs at vEL2, and is
> in nVHE mode.
> 
> Finally, I'm going to assume that your implementation has FEAT_ECV and
> FEAT_NV2, because I can't see how it could fail otherwise.
all the above is correct.
> 
> In these precise conditions, KVM sets the CNTHCTL_EL2.EL1TVT bit so
> that we can trap the EL0 virtual timer and faithfully emulate it (it
> is otherwise written to memory, which isn't very helpful).

indeed
> 
> As it turns out, we don't handle these traps. I didn't spot it because
> my test machines are all Apple boxes that don't have a nVHE mode, so
> nothing on the nVHE path is getting *ANY* coverage. Hint: having
> access to such a machine would help (shipping address on request!).
> Otherwise, I'll eventually kill the nVHE support altogether.
> 
> I have written the following patch, which compiles, but that I cannot
> test with my current setup. Could you please give it a go?

with the patch below, my guest boots nicely. You got it right on the
first shot!!! So this fixes my issue. I will continue testing the v10.

Thank you again!

Eric


> 
> Thanks again,
> 
> 	M.
> 
> From feb03b57de0bcb83254a2d6a3ce320f5e39434b6 Mon Sep 17 00:00:00 2001
> From: Marc Zyngier <maz@kernel.org>
> Date: Tue, 16 May 2023 21:06:20 +0100
> Subject: [PATCH] KVM: arm64: Handle virtual timer traps when
>  CNTHCTL_EL2.EL1TVT is set
> 
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/include/asm/sysreg.h |  1 +
>  arch/arm64/kvm/sys_regs.c       | 28 ++++++++++++++++++++++++++++
>  2 files changed, 29 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
> index 72ff6df5d75b..77a61179ea37 100644
> --- a/arch/arm64/include/asm/sysreg.h
> +++ b/arch/arm64/include/asm/sysreg.h
> @@ -436,6 +436,7 @@
>  #define SYS_CNTP_CTL_EL0		sys_reg(3, 3, 14, 2, 1)
>  #define SYS_CNTP_CVAL_EL0		sys_reg(3, 3, 14, 2, 2)
>  
> +#define SYS_CNTV_TVAL_EL0		sys_reg(3, 3, 14, 3, 0)
>  #define SYS_CNTV_CTL_EL0		sys_reg(3, 3, 14, 3, 1)
>  #define SYS_CNTV_CVAL_EL0		sys_reg(3, 3, 14, 3, 2)
>  
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 27a29dcbfcd2..9aa9c4e4b4d6 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -1328,6 +1328,14 @@ static bool access_arch_timer(struct kvm_vcpu *vcpu,
>  		treg = TIMER_REG_TVAL;
>  		break;
>  
> +	case SYS_CNTV_TVAL_EL0:
> +		if (is_hyp_ctxt(vcpu) && vcpu_el2_e2h_is_set(vcpu))
> +			tmr = TIMER_HVTIMER;
> +		else
> +			tmr = TIMER_VTIMER;
> +		treg = TIMER_REG_TVAL;
> +		break;
> +
>  	case SYS_AARCH32_CNTP_TVAL:
>  	case SYS_CNTP_TVAL_EL02:
>  		tmr = TIMER_PTIMER;
> @@ -1357,6 +1365,14 @@ static bool access_arch_timer(struct kvm_vcpu *vcpu,
>  		treg = TIMER_REG_CTL;
>  		break;
>  
> +	case SYS_CNTV_CTL_EL0:
> +		if (is_hyp_ctxt(vcpu) && vcpu_el2_e2h_is_set(vcpu))
> +			tmr = TIMER_HVTIMER;
> +		else
> +			tmr = TIMER_VTIMER;
> +		treg = TIMER_REG_CTL;
> +		break;
> +
>  	case SYS_AARCH32_CNTP_CTL:
>  	case SYS_CNTP_CTL_EL02:
>  		tmr = TIMER_PTIMER;
> @@ -1386,6 +1402,14 @@ static bool access_arch_timer(struct kvm_vcpu *vcpu,
>  		treg = TIMER_REG_CVAL;
>  		break;
>  
> +	case SYS_CNTV_CVAL_EL0:
> +		if (is_hyp_ctxt(vcpu) && vcpu_el2_e2h_is_set(vcpu))
> +			tmr = TIMER_HVTIMER;
> +		else
> +			tmr = TIMER_VTIMER;
> +		treg = TIMER_REG_CVAL;
> +		break;
> +
>  	case SYS_AARCH32_CNTP_CVAL:
>  	case SYS_CNTP_CVAL_EL02:
>  		tmr = TIMER_PTIMER;
> @@ -2510,6 +2534,10 @@ static const struct sys_reg_desc sys_reg_descs[] = {
>  	{ SYS_DESC(SYS_CNTP_CTL_EL0), access_arch_timer },
>  	{ SYS_DESC(SYS_CNTP_CVAL_EL0), access_arch_timer },
>  
> +	{ SYS_DESC(SYS_CNTV_TVAL_EL0), access_arch_timer },
> +	{ SYS_DESC(SYS_CNTV_CTL_EL0), access_arch_timer },
> +	{ SYS_DESC(SYS_CNTV_CVAL_EL0), access_arch_timer },
> +
>  	/* PMEVCNTRn_EL0 */
>  	PMU_PMEVCNTR_EL0(0),
>  	PMU_PMEVCNTR_EL0(1),
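
For reference, the guest-side access being trapped here is the EL0
virtual timer programming that EDK2's TimerDxe performs from vEL2. A
minimal sketch of such a sequence (hypothetical standalone code,
assuming GCC-style inline asm; only the CNTV_* register names come from
the trap report above):

#include <stdint.h>

static inline uint64_t read_cntvct(void)
{
	uint64_t val;

	asm volatile("mrs %0, cntvct_el0" : "=r"(val));
	return val;
}

static void arm_vtimer(uint64_t ticks)
{
	/* program the comparator... */
	asm volatile("msr cntv_cval_el0, %0" :: "r"(read_cntvct() + ticks));

	/*
	 * ...and enable the timer. This is the write reported as
	 * { Op0( 3), Op1( 3), CRn(14), CRm( 3), Op2( 1), func_write };
	 * with CNTHCTL_EL2.EL1TVT set it now traps and lands in
	 * access_arch_timer() above.
	 */
	asm volatile("msr cntv_ctl_el0, %0" :: "r"(1UL));	/* ENABLE */
}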



* Re: [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support
  2023-05-17  8:59     ` Eric Auger
@ 2023-05-17 14:12       ` Marc Zyngier
  2023-06-06  9:33         ` Eric Auger
  0 siblings, 1 reply; 95+ messages in thread
From: Marc Zyngier @ 2023-05-17 14:12 UTC (permalink / raw)
  To: Eric Auger
  Cc: kvmarm, kvm, linux-arm-kernel, Alexandru Elisei, Andre Przywara,
	Chase Conklin, Christoffer Dall, Ganapatrao Kulkarni,
	Darren Hart, Jintack Lim, Russell King, Miguel Luis, James Morse,
	Suzuki K Poulose, Oliver Upton, Zenghui Yu

On Wed, 17 May 2023 09:59:45 +0100,
Eric Auger <eauger@redhat.com> wrote:
> 
> Hi Marc,
> 
> On 5/16/23 22:28, Marc Zyngier wrote:
> > On Tue, 16 May 2023 17:53:14 +0100,
> > Eric Auger <eauger@redhat.com> wrote:
> >>
> >> Hi Marc,
> >>
> >> On 5/15/23 19:30, Marc Zyngier wrote:
> >>> This is the 4th drop of NV support on arm64 for this year.
> >>>
> >>> For the previous episodes, see [1].
> >>>
> >>> What's changed:
> >>>
> >>> - New framework to track system register traps that are reinjected in
> >>>   guest EL2. It is expected to replace the discrete handling we have
> >>>   enjoyed so far, which didn't scale at all. This has already fixed a
> >>>   number of bugs that were hidden (a bunch of traps were never
> >>>   forwarded...). Still a work in progress, but this is going in the
> >>>   right direction.
> >>>
> >>> - Allow the L1 hypervisor to have a S2 that has an input larger than
> >>>   the L0 IPA space. This fixes a number of subtle issues, depending on
> >>>   how the initial guest was created.
> >>>
> >>> - Consequently, the patch series has gone longer again. Boo. But
> >>>   hopefully some of it is easier to review...
> >>>
> >>> [1] https://lore.kernel.org/r/20230405154008.3552854-1-maz@kernel.org
> >>
> >> I have started testing this and when booting my fedora guest I get
> >>
> >> [  151.796544] kvm [7617]: Unsupported guest sys_reg access at:
> >> 23f425fd0 [80000209]
> >> [  151.796544]  { Op0( 3), Op1( 3), CRn(14), CRm( 3), Op2( 1), func_write },
> >>
> >> as soon as the host has kvm-arm.mode=nested
> >>
> >> This seems to be triggered very early by EDK2
> >> (ArmPkg/Drivers/TimerDxe/TimerDxe.c).
> >>
>> If I am not wrong this is CNTV_CTL_EL0. Do you have any idea?
> > 
> > So here's my current analysis:
> > 
> > I assume you are running EDK2 as the L1 guest in a nested
> > configuration. I also assume that you are not running on an Apple
> > CPU. If these assumptions are correct, then EDK2 runs at vEL2, and is
> > in nVHE mode.
> > 
> > Finally, I'm going to assume that your implementation has FEAT_ECV and
> > FEAT_NV2, because I can't see how it could fail otherwise.
> all the above is correct.
> > 
> > In these precise conditions, KVM sets the CNTHCTL_EL2.EL1TVT bit so
> > that we can trap the EL0 virtual timer and faithfully emulate it (it
> > is otherwise written to memory, which isn't very helpful).
> 
> indeed
> > 
> > As it turns out, we don't handle these traps. I didn't spot it because
> > my test machines are all Apple boxes that don't have a nVHE mode, so
> > nothing on the nVHE path is getting *ANY* coverage. Hint: having
> > access to such a machine would help (shipping address on request!).
> > Otherwise, I'll eventually kill the nVHE support altogether.
> > 
> > I have written the following patch, which compiles, but that I cannot
> > test with my current setup. Could you please give it a go?
> 
> with the patch below, my guest boots nicely. You got it right on the
> first shot! So this fixes my issue. I will continue testing v10.

Thanks a lot for reporting the issue and testing my hacks. I'll
eventually fold it into the rest of the series.

By the way, what are you using as your VMM? I'd really like to
reproduce your setup.

Cheers,

	M.

-- 
Without deviation from the norm, progress is not possible.


* Re: [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support
  2023-05-15 17:30 [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support Marc Zyngier
                   ` (59 preceding siblings ...)
  2023-05-16 16:53 ` [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support Eric Auger
@ 2023-06-05 11:28 ` Eric Auger
  2023-06-06  7:30   ` Marc Zyngier
  2023-06-28  6:45 ` Ganapatrao Kulkarni
  61 siblings, 1 reply; 95+ messages in thread
From: Eric Auger @ 2023-06-05 11:28 UTC (permalink / raw)
  To: Marc Zyngier, kvmarm, kvm, linux-arm-kernel
  Cc: Alexandru Elisei, Andre Przywara, Chase Conklin,
	Christoffer Dall, Ganapatrao Kulkarni, Darren Hart, Jintack Lim,
	Russell King, Miguel Luis, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu

Hi Marc,

On 5/15/23 19:30, Marc Zyngier wrote:
> This is the 4th drop of NV support on arm64 for this year.
> 
> For the previous episodes, see [1].
> 
> What's changed:
> 
> - New framework to track system register traps that are reinjected in
>   guest EL2. It is expected to replace the discrete handling we have
>   enjoyed so far, which didn't scale at all. This has already fixed a
>   number of bugs that were hidden (a bunch of traps were never
>   forwarded...). Still a work in progress, but this is going in the
>   right direction.
> 
> - Allow the L1 hypervisor to have a S2 that has an input larger than
>   the L0 IPA space. This fixes a number of subtle issues, depending on
>   how the initial guest was created.
> 
> - Consequently, the patch series has gone longer again. Boo. But
>   hopefully some of it is easier to review...
> 
> [1] https://lore.kernel.org/r/20230405154008.3552854-1-maz@kernel.org
> 
> Andre Przywara (1):
>   KVM: arm64: nv: vgic: Allow userland to set VGIC maintenance IRQ

I guess you have executed kselftests on L1 guests. Have all the tests
passed there? On my end it stalls in the KVM_RUN.

for instance
tools/testing/selftests/kvm/aarch64/aarch32_id_regs.c fails in
test_guest_raz(vcpu) on the KVM_RUN. Even with a basic

static void guest_main(void)
{
	GUEST_DONE();
}

I get
 aarch32_id_regs-768     [002] .....   410.544665: kvm_exit: IRQ:
HSR_EC: 0x0000 (UNKNOWN), PC: 0x0000000000401ec4
 aarch32_id_regs-768     [002] d....   410.544666: kvm_entry: PC:
0x0000000000401ec4
 aarch32_id_regs-768     [002] .....   410.544675: kvm_exit: IRQ:
HSR_EC: 0x0000 (UNKNOWN), PC: 0x0000000000401ec4
 aarch32_id_regs-768     [002] d....   410.544676: kvm_entry: PC:
0x0000000000401ec4
 aarch32_id_regs-768     [002] .....   410.544685: kvm_exit: IRQ:
HSR_EC: 0x0000 (UNKNOWN), PC: 0x0000000000401ec4

looping forever instead of

aarch32_id_regs-1085576 [079] d..1. 1401295.068739: kvm_entry: PC:
0x0000000000401ec4
 aarch32_id_regs-1085576 [079] ...1. 1401295.068745: kvm_exit: TRAP:
HSR_EC: 0x0020 (IABT_LOW), PC: 0x0000000000401ec4
 aarch32_id_regs-1085576 [079] d..1. 1401295.068790: kvm_entry: PC:
0x0000000000401ec4
 aarch32_id_regs-1085576 [079] ...1. 1401295.068792: kvm_exit: TRAP:
HSR_EC: 0x0020 (IABT_LOW), PC: 0x0000000000401ec4
 aarch32_id_regs-1085576 [079] d..1. 1401295.068794: kvm_entry: PC:
0x0000000000401ec4
 aarch32_id_regs-1085576 [079] ...1. 1401295.068795: kvm_exit: TRAP:
HSR_EC: 0x0020 (IABT_LOW), PC: 0x0000000000401ec4
 aarch32_id_regs-1085576 [079] d..1. 1401295.068797: kvm_entry: PC:
0x0000000000401ec4
../..

Any idea or any known restriction wrt kselftests?

Thanks

Eric




> 
> Christoffer Dall (5):
>   KVM: arm64: nv: Trap EL1 VM register accesses in virtual EL2
>   KVM: arm64: nv: Implement nested Stage-2 page table walk logic
>   KVM: arm64: nv: Unmap/flush shadow stage 2 page tables
>   KVM: arm64: nv: vgic: Emulate the HW bit in software
>   KVM: arm64: nv: Sync nested timer state with FEAT_NV2
> 
> Jintack Lim (7):
>   KVM: arm64: nv: Trap CPACR_EL1 access in virtual EL2
>   KVM: arm64: nv: Respect virtual HCR_EL2.TWX setting
>   KVM: arm64: nv: Respect virtual CPTR_EL2.{TFP,FPEN} settings
>   KVM: arm64: nv: Respect virtual HCR_EL2.{NV,TSC) settings
>   KVM: arm64: nv: Configure HCR_EL2 for nested virtualization
>   KVM: arm64: nv: Trap and emulate TLBI instructions from virtual EL2
>   KVM: arm64: nv: Nested GICv3 Support
> 
> Marc Zyngier (46):
>   KVM: arm64: Move VTCR_EL2 into struct s2_mmu
>   arm64: Add missing Set/Way CMO encodings
>   arm64: Add missing VA CMO encodings
>   arm64: Add missing ERXMISCx_EL1 encodings
>   arm64: Add missing DC ZVA/GVA/GZVA encodings
>   arm64: Add TLBI operation encodings
>   arm64: Add AT operation encodings
>   KVM: arm64: Add missing HCR_EL2 trap bits
>   KVM: arm64: nv: Add trap forwarding infrastructure
>   KVM: arm64: nv: Add trap forwarding for HCR_EL2
>   KVM: arm64: nv: Expose FEAT_EVT to nested guests
>   KVM: arm64: nv: Add trap forwarding for MDCR_EL2
>   KVM: arm64: nv: Add trap forwarding for CNTHCTL_EL2
>   KVM: arm64: nv: Add non-VHE-EL2->EL1 translation helpers
>   KVM: arm64: nv: Handle virtual EL2 registers in
>     vcpu_read/write_sys_reg()
>   KVM: arm64: nv: Handle SPSR_EL2 specially
>   KVM: arm64: nv: Handle HCR_EL2.E2H specially
>   KVM: arm64: nv: Save/Restore vEL2 sysregs
>   KVM: arm64: nv: Support multiple nested Stage-2 mmu structures
>   KVM: arm64: nv: Handle shadow stage 2 page faults
>   KVM: arm64: nv: Restrict S2 RD/WR permissions to match the guest's
>   KVM: arm64: nv: Set a handler for the system instruction traps
>   KVM: arm64: nv: Trap and emulate AT instructions from virtual EL2
>   KVM: arm64: nv: Fold guest's HCR_EL2 configuration into the host's
>   KVM: arm64: nv: Hide RAS from nested guests
>   KVM: arm64: nv: Add handling of EL2-specific timer registers
>   KVM: arm64: nv: Load timer before the GIC
>   KVM: arm64: nv: Don't load the GICv4 context on entering a nested
>     guest
>   KVM: arm64: nv: Implement maintenance interrupt forwarding
>   KVM: arm64: nv: Deal with broken VGIC on maintenance interrupt
>     delivery
>   KVM: arm64: nv: Allow userspace to request KVM_ARM_VCPU_NESTED_VIRT
>   KVM: arm64: nv: Add handling of FEAT_TTL TLB invalidation
>   KVM: arm64: nv: Invalidate TLBs based on shadow S2 TTL-like
>     information
>   KVM: arm64: nv: Tag shadow S2 entries with nested level
>   KVM: arm64: nv: Add include containing the VNCR_EL2 offsets
>   KVM: arm64: nv: Map VNCR-capable registers to a separate page
>   KVM: arm64: nv: Move nested vgic state into the sysreg file
>   KVM: arm64: Add FEAT_NV2 cpu feature
>   KVM: arm64: nv: Fold GICv3 host trapping requirements into guest setup
>   KVM: arm64: nv: Publish emulated timer interrupt state in the
>     in-memory state
>   KVM: arm64: nv: Allocate VNCR page when required
>   KVM: arm64: nv: Enable ARMv8.4-NV support
>   KVM: arm64: nv: Fast-track 'InHost' exception returns
>   KVM: arm64: nv: Fast-track EL1 TLBIs for VHE guests
>   KVM: arm64: nv: Use FEAT_ECV to trap access to EL0 timers
>   KVM: arm64: nv: Accelerate EL0 timer read accesses when FEAT_ECV is on
> 
>  .../virt/kvm/devices/arm-vgic-v3.rst          |  12 +-
>  arch/arm64/include/asm/esr.h                  |   1 +
>  arch/arm64/include/asm/kvm_arm.h              |  14 +
>  arch/arm64/include/asm/kvm_asm.h              |   4 +
>  arch/arm64/include/asm/kvm_emulate.h          |  93 +-
>  arch/arm64/include/asm/kvm_host.h             | 181 +++-
>  arch/arm64/include/asm/kvm_hyp.h              |   2 +
>  arch/arm64/include/asm/kvm_mmu.h              |  20 +-
>  arch/arm64/include/asm/kvm_nested.h           | 133 +++
>  arch/arm64/include/asm/stage2_pgtable.h       |   4 +-
>  arch/arm64/include/asm/sysreg.h               | 196 ++++
>  arch/arm64/include/asm/vncr_mapping.h         |  74 ++
>  arch/arm64/include/uapi/asm/kvm.h             |   1 +
>  arch/arm64/kernel/cpufeature.c                |  11 +
>  arch/arm64/kvm/Makefile                       |   4 +-
>  arch/arm64/kvm/arch_timer.c                   |  98 +-
>  arch/arm64/kvm/arm.c                          |  33 +-
>  arch/arm64/kvm/at.c                           | 219 ++++
>  arch/arm64/kvm/emulate-nested.c               | 934 ++++++++++++++++-
>  arch/arm64/kvm/handle_exit.c                  |  29 +-
>  arch/arm64/kvm/hyp/include/hyp/switch.h       |   8 +-
>  arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h    |   5 +-
>  arch/arm64/kvm/hyp/nvhe/mem_protect.c         |   8 +-
>  arch/arm64/kvm/hyp/nvhe/pkvm.c                |   4 +-
>  arch/arm64/kvm/hyp/nvhe/switch.c              |   2 +-
>  arch/arm64/kvm/hyp/nvhe/sysreg-sr.c           |   2 +-
>  arch/arm64/kvm/hyp/pgtable.c                  |   2 +-
>  arch/arm64/kvm/hyp/vgic-v3-sr.c               |   6 +-
>  arch/arm64/kvm/hyp/vhe/switch.c               | 206 +++-
>  arch/arm64/kvm/hyp/vhe/sysreg-sr.c            | 124 ++-
>  arch/arm64/kvm/hyp/vhe/tlb.c                  |  83 ++
>  arch/arm64/kvm/mmu.c                          | 255 ++++-
>  arch/arm64/kvm/nested.c                       | 799 ++++++++++++++-
>  arch/arm64/kvm/pkvm.c                         |   2 +-
>  arch/arm64/kvm/reset.c                        |   7 +
>  arch/arm64/kvm/sys_regs.c                     | 958 +++++++++++++++++-
>  arch/arm64/kvm/trace_arm.h                    |  19 +
>  arch/arm64/kvm/vgic/vgic-init.c               |  33 +
>  arch/arm64/kvm/vgic/vgic-kvm-device.c         |  32 +-
>  arch/arm64/kvm/vgic/vgic-v3-nested.c          | 248 +++++
>  arch/arm64/kvm/vgic/vgic-v3.c                 |  43 +-
>  arch/arm64/kvm/vgic/vgic.c                    |  29 +
>  arch/arm64/kvm/vgic/vgic.h                    |  10 +
>  arch/arm64/tools/cpucaps                      |   1 +
>  include/clocksource/arm_arch_timer.h          |   4 +
>  include/kvm/arm_arch_timer.h                  |   1 +
>  include/kvm/arm_vgic.h                        |  17 +
>  include/uapi/linux/kvm.h                      |   1 +
>  tools/arch/arm/include/uapi/asm/kvm.h         |   1 +
>  49 files changed, 4790 insertions(+), 183 deletions(-)
>  create mode 100644 arch/arm64/include/asm/vncr_mapping.h
>  create mode 100644 arch/arm64/kvm/at.c
>  create mode 100644 arch/arm64/kvm/vgic/vgic-v3-nested.c
> 



* Re: [PATCH v10 03/59] arm64: Add missing VA CMO encodings
  2023-05-15 17:30 ` [PATCH v10 03/59] arm64: Add missing VA " Marc Zyngier
@ 2023-06-05 17:46   ` Eric Auger
  0 siblings, 0 replies; 95+ messages in thread
From: Eric Auger @ 2023-06-05 17:46 UTC (permalink / raw)
  To: Marc Zyngier, kvmarm, kvm, linux-arm-kernel
  Cc: Alexandru Elisei, Andre Przywara, Chase Conklin,
	Christoffer Dall, Ganapatrao Kulkarni, Darren Hart, Jintack Lim,
	Russell King, Miguel Luis, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu



On 5/15/23 19:30, Marc Zyngier wrote:
> Add the missing VA-based CMO encodings.
> 
> Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>

Eric
> ---
>  arch/arm64/include/asm/sysreg.h | 26 ++++++++++++++++++++++++++
>  1 file changed, 26 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
> index 108663aebccb..071cc8545fbe 100644
> --- a/arch/arm64/include/asm/sysreg.h
> +++ b/arch/arm64/include/asm/sysreg.h
> @@ -124,6 +124,32 @@
>  #define SYS_DC_CIGSW			sys_insn(1, 0, 7, 14, 4)
>  #define SYS_DC_CIGDSW			sys_insn(1, 0, 7, 14, 6)
>  
> +#define SYS_IC_IALLUIS			sys_insn(1, 0, 7, 1, 0)
> +#define SYS_IC_IALLU			sys_insn(1, 0, 7, 5, 0)
> +#define SYS_IC_IVAU			sys_insn(1, 3, 7, 5, 1)
> +
> +#define SYS_DC_IVAC			sys_insn(1, 0, 7, 6, 1)
> +#define SYS_DC_IGVAC			sys_insn(1, 0, 7, 6, 3)
> +#define SYS_DC_IGDVAC			sys_insn(1, 0, 7, 6, 5)
> +
> +#define SYS_DC_CVAC			sys_insn(1, 3, 7, 10, 1)
> +#define SYS_DC_CGVAC			sys_insn(1, 3, 7, 10, 3)
> +#define SYS_DC_CGDVAC			sys_insn(1, 3, 7, 10, 5)
> +
> +#define SYS_DC_CVAU			sys_insn(1, 3, 7, 11, 1)
> +
> +#define SYS_DC_CVAP			sys_insn(1, 3, 7, 12, 1)
> +#define SYS_DC_CGVAP			sys_insn(1, 3, 7, 12, 3)
> +#define SYS_DC_CGDVAP			sys_insn(1, 3, 7, 12, 5)
> +
> +#define SYS_DC_CVADP			sys_insn(1, 3, 7, 13, 1)
> +#define SYS_DC_CGVADP			sys_insn(1, 3, 7, 13, 3)
> +#define SYS_DC_CGDVADP			sys_insn(1, 3, 7, 13, 5)
> +
> +#define SYS_DC_CIVAC			sys_insn(1, 3, 7, 14, 1)
> +#define SYS_DC_CIGVAC			sys_insn(1, 3, 7, 14, 3)
> +#define SYS_DC_CIGDVAC			sys_insn(1, 3, 7, 14, 5)
> +
>  /*
>   * Automatically generated definitions for system registers, the
>   * manual encodings below are in the process of being converted to



* Re: [PATCH v10 04/59] arm64: Add missing ERXMISCx_EL1 encodings
  2023-05-15 17:30 ` [PATCH v10 04/59] arm64: Add missing ERXMISCx_EL1 encodings Marc Zyngier
@ 2023-06-05 17:47   ` Eric Auger
  0 siblings, 0 replies; 95+ messages in thread
From: Eric Auger @ 2023-06-05 17:47 UTC (permalink / raw)
  To: Marc Zyngier, kvmarm, kvm, linux-arm-kernel
  Cc: Alexandru Elisei, Andre Przywara, Chase Conklin,
	Christoffer Dall, Ganapatrao Kulkarni, Darren Hart, Jintack Lim,
	Russell King, Miguel Luis, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu

Hi Marc,

On 5/15/23 19:30, Marc Zyngier wrote:
> We only describe ERXMISC{0,1}_EL1. Add ERXMISC{2,3}_EL1 for good measure.
> 
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/include/asm/sysreg.h | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
> index 071cc8545fbe..71305f7425db 100644
> --- a/arch/arm64/include/asm/sysreg.h
> +++ b/arch/arm64/include/asm/sysreg.h
> @@ -239,6 +239,8 @@
>  #define SYS_ERXADDR_EL1			sys_reg(3, 0, 5, 4, 3)
>  #define SYS_ERXMISC0_EL1		sys_reg(3, 0, 5, 5, 0)
>  #define SYS_ERXMISC1_EL1		sys_reg(3, 0, 5, 5, 1)
> +#define SYS_ERXMISC2_EL1		sys_reg(3, 0, 5, 5, 2)
> +#define SYS_ERXMISC3_EL1		sys_reg(3, 0, 5, 5, 3)

>  #define SYS_TFSR_EL1			sys_reg(3, 0, 5, 6, 0)
>  #define SYS_TFSRE0_EL1			sys_reg(3, 0, 5, 6, 1)
>  
Reviewed-by: Eric Auger <eric.auger@redhat.com>

Eric



* Re: [PATCH v10 05/59] arm64: Add missing DC ZVA/GVA/GZVA encodings
  2023-05-15 17:30 ` [PATCH v10 05/59] arm64: Add missing DC ZVA/GVA/GZVA encodings Marc Zyngier
@ 2023-06-05 17:47   ` Eric Auger
  0 siblings, 0 replies; 95+ messages in thread
From: Eric Auger @ 2023-06-05 17:47 UTC (permalink / raw)
  To: Marc Zyngier, kvmarm, kvm, linux-arm-kernel
  Cc: Alexandru Elisei, Andre Przywara, Chase Conklin,
	Christoffer Dall, Ganapatrao Kulkarni, Darren Hart, Jintack Lim,
	Russell King, Miguel Luis, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu



On 5/15/23 19:30, Marc Zyngier wrote:
> Add the missing DC *VA encodings.
> 
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/include/asm/sysreg.h | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
> index 71305f7425db..28ccc379a172 100644
> --- a/arch/arm64/include/asm/sysreg.h
> +++ b/arch/arm64/include/asm/sysreg.h
> @@ -150,6 +150,11 @@
>  #define SYS_DC_CIGVAC			sys_insn(1, 3, 7, 14, 3)
>  #define SYS_DC_CIGDVAC			sys_insn(1, 3, 7, 14, 5)
>  
> +/* Data cache zero operations */
> +#define SYS_DC_ZVA			sys_insn(1, 3, 7, 4, 1)
> +#define SYS_DC_GVA			sys_insn(1, 3, 7, 4, 3)
> +#define SYS_DC_GZVA			sys_insn(1, 3, 7, 4, 4)
> +
>  /*
>   * Automatically generated definitions for system registers, the
>   * manual encodings below are in the process of being converted to
Reviewed-by: Eric Auger <eric.auger@redhat.com>

Eric



* Re: [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support
  2023-06-05 11:28 ` Eric Auger
@ 2023-06-06  7:30   ` Marc Zyngier
  2023-06-06  9:29     ` Eric Auger
  0 siblings, 1 reply; 95+ messages in thread
From: Marc Zyngier @ 2023-06-06  7:30 UTC (permalink / raw)
  To: Eric Auger
  Cc: kvmarm, kvm, linux-arm-kernel, Alexandru Elisei, Andre Przywara,
	Chase Conklin, Christoffer Dall, Ganapatrao Kulkarni,
	Darren Hart, Jintack Lim, Russell King, Miguel Luis, James Morse,
	Suzuki K Poulose, Oliver Upton, Zenghui Yu

Hey Eric,

On Mon, 05 Jun 2023 12:28:12 +0100,
Eric Auger <eauger@redhat.com> wrote:
> 
> Hi Marc,
> 
> On 5/15/23 19:30, Marc Zyngier wrote:
> > This is the 4th drop of NV support on arm64 for this year.
> > 
> > For the previous episodes, see [1].
> > 
> > What's changed:
> > 
> > - New framework to track system register traps that are reinjected in
> >   guest EL2. It is expected to replace the discrete handling we have
> >   enjoyed so far, which didn't scale at all. This has already fixed a
> >   number of bugs that were hidden (a bunch of traps were never
> >   forwarded...). Still a work in progress, but this is going in the
> >   right direction.
> > 
> > - Allow the L1 hypervisor to have a S2 that has an input larger than
> >   the L0 IPA space. This fixes a number of subtle issues, depending on
> >   how the initial guest was created.
> > 
> > - Consequently, the patch series has gone longer again. Boo. But
> >   hopefully some of it is easier to review...
> > 
> > [1] https://lore.kernel.org/r/20230405154008.3552854-1-maz@kernel.org
> > 
> > Andre Przywara (1):
> >   KVM: arm64: nv: vgic: Allow userland to set VGIC maintenance IRQ
> 
> I guess you have executed kselftests on L1 guests. Have all the tests
> passed there? On my end it stalls in the KVM_RUN.

No, I hardly run any kselftest, because they are just not designed to
run at EL2 at all. There's some work to be done there, but I just
don't have the bandwidth for that (hint, wink...)

> 
> for instance
> tools/testing/selftests/kvm/aarch64/aarch32_id_regs.c fails in
> test_guest_raz(vcpu) on the KVM_RUN. Even with a basic
> 
> static void guest_main(void)
> {
> 	GUEST_DONE();
> }

My guess is that the test harness expects things to run at EL1.
Depending on the value you get for HCR_EL2, you could get all sort of
odd behaviours. Also, the harness configures EL1 only, which is
unlikely to work at EL2. My conclusion is that "processor.c" needs to
be taught about EL2, at the very least.
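
A rough sketch of what "teaching processor.c about EL2" could boil down
to (entirely hypothetical names and layout, not the actual selftest
API):

#include <stdint.h>
#include <stdbool.h>

#define PSR_MODE_EL1h	0x05	/* mode values from the arm64 uapi */
#define PSR_MODE_EL2h	0x09

/* made-up container for the guest's initial register state */
struct vcpu_init_state {
	uint64_t pstate;
	uint64_t ttbr0, pgd;
};

static void init_guest_state(struct vcpu_init_state *vs, bool has_el2)
{
	if (has_el2) {
		/*
		 * vcpu created with KVM_ARM_VCPU_NESTED_VIRT: enter the
		 * guest at vEL2 and initialise the EL2 view of the
		 * translation registers (SCTLR_EL2, TCR_EL2, MAIR_EL2,
		 * TTBR0_EL2) instead of the EL1 ones set up today.
		 */
		vs->pstate = PSR_MODE_EL2h;
	} else {
		vs->pstate = PSR_MODE_EL1h;	/* existing EL1-only path */
	}

	vs->ttbr0 = vs->pgd;	/* same page tables either way */
}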

> 
> I get
>  aarch32_id_regs-768     [002] .....   410.544665: kvm_exit: IRQ:
> HSR_EC: 0x0000 (UNKNOWN), PC: 0x0000000000401ec4
>  aarch32_id_regs-768     [002] d....   410.544666: kvm_entry: PC:
> 0x0000000000401ec4
>  aarch32_id_regs-768     [002] .....   410.544675: kvm_exit: IRQ:
> HSR_EC: 0x0000 (UNKNOWN), PC: 0x0000000000401ec4
>  aarch32_id_regs-768     [002] d....   410.544676: kvm_entry: PC:
> 0x0000000000401ec4
>  aarch32_id_regs-768     [002] .....   410.544685: kvm_exit: IRQ:
> HSR_EC: 0x0000 (UNKNOWN), PC: 0x0000000000401ec4
> 
> looping forever instead of
> 
> aarch32_id_regs-1085576 [079] d..1. 1401295.068739: kvm_entry: PC:
> 0x0000000000401ec4
>  aarch32_id_regs-1085576 [079] ...1. 1401295.068745: kvm_exit: TRAP:
> HSR_EC: 0x0020 (IABT_LOW), PC: 0x0000000000401ec4
>  aarch32_id_regs-1085576 [079] d..1. 1401295.068790: kvm_entry: PC:
> 0x0000000000401ec4
>  aarch32_id_regs-1085576 [079] ...1. 1401295.068792: kvm_exit: TRAP:
> HSR_EC: 0x0020 (IABT_LOW), PC: 0x0000000000401ec4
>  aarch32_id_regs-1085576 [079] d..1. 1401295.068794: kvm_entry: PC:
> 0x0000000000401ec4
>  aarch32_id_regs-1085576 [079] ...1. 1401295.068795: kvm_exit: TRAP:
> HSR_EC: 0x0020 (IABT_LOW), PC: 0x0000000000401ec4
>  aarch32_id_regs-1085576 [079] d..1. 1401295.068797: kvm_entry: PC:
> 0x0000000000401ec4
> ../..
> 
> Any idea or any known restriction wrt kselftests?

See above. I'd love someone to actually start looking into it and
devise a testing harness that would run both at EL{0,1,2} *at the same
time* so that we can start exercising some of the trap behaviours that
the architecture mandates.

Also, Alexandru had some KUT tests a few years ago, but I don't know
what happened to them.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.


* Re: [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support
  2023-06-06  7:30   ` Marc Zyngier
@ 2023-06-06  9:29     ` Eric Auger
  2023-06-06 16:22       ` Marc Zyngier
  0 siblings, 1 reply; 95+ messages in thread
From: Eric Auger @ 2023-06-06  9:29 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvmarm, kvm, linux-arm-kernel, Alexandru Elisei, Andre Przywara,
	Chase Conklin, Christoffer Dall, Ganapatrao Kulkarni,
	Darren Hart, Jintack Lim, Russell King, Miguel Luis, James Morse,
	Suzuki K Poulose, Oliver Upton, Zenghui Yu

Hi Marc,
On 6/6/23 09:30, Marc Zyngier wrote:
> Hey Eric,
> 
> On Mon, 05 Jun 2023 12:28:12 +0100,
> Eric Auger <eauger@redhat.com> wrote:
>>
>> Hi Marc,
>>
>> On 5/15/23 19:30, Marc Zyngier wrote:
>>> This is the 4th drop of NV support on arm64 for this year.
>>>
>>> For the previous episodes, see [1].
>>>
>>> What's changed:
>>>
>>> - New framework to track system register traps that are reinjected in
>>>   guest EL2. It is expected to replace the discrete handling we have
>>>   enjoyed so far, which didn't scale at all. This has already fixed a
>>>   number of bugs that were hidden (a bunch of traps were never
>>>   forwarded...). Still a work in progress, but this is going in the
>>>   right direction.
>>>
>>> - Allow the L1 hypervisor to have a S2 that has an input larger than
>>>   the L0 IPA space. This fixes a number of subtle issues, depending on
>>>   how the initial guest was created.
>>>
>>> - Consequently, the patch series has gone longer again. Boo. But
>>>   hopefully some of it is easier to review...
>>>
>>> [1] https://lore.kernel.org/r/20230405154008.3552854-1-maz@kernel.org
>>>
>>> Andre Przywara (1):
>>>   KVM: arm64: nv: vgic: Allow userland to set VGIC maintenance IRQ
>>
>> I guess you have executed kselftests on L1 guests. Have all the tests
>> passed there? On my end it stalls in the KVM_RUN.
> 
> No, I hardly run any kselftest, because they are just not designed to
> run at EL2 at all. There's some work to be done there, but I just
> don't have the bandwidth for that (hint, wink...)

Oh OK, I missed that point. If nobody is working on this I can start
looking at it. It would be interesting to run them on nested guests too.
> 
>>
>> for instance
>> tools/testing/selftests/kvm/aarch64/aarch32_id_regs.c fails in
>> test_guest_raz(vcpu) on the KVM_RUN. Even with a basic
>>
>> static void guest_main(void)
>> {
>> 	GUEST_DONE();
>> }
> 
> My guess is that the test harness expects things to run at EL1.
> Depending on the value you get for HCR_EL2, you could get all sort of
> odd behaviours. Also, the harness configures EL1 only, which is
> unlikely to work at EL2. My conclusion is that "processor.c" needs to
> be taught about EL2, at the very least.
> 
>>
>> I get
>>  aarch32_id_regs-768     [002] .....   410.544665: kvm_exit: IRQ:
>> HSR_EC: 0x0000 (UNKNOWN), PC: 0x0000000000401ec4
>>  aarch32_id_regs-768     [002] d....   410.544666: kvm_entry: PC:
>> 0x0000000000401ec4
>>  aarch32_id_regs-768     [002] .....   410.544675: kvm_exit: IRQ:
>> HSR_EC: 0x0000 (UNKNOWN), PC: 0x0000000000401ec4
>>  aarch32_id_regs-768     [002] d....   410.544676: kvm_entry: PC:
>> 0x0000000000401ec4
>>  aarch32_id_regs-768     [002] .....   410.544685: kvm_exit: IRQ:
>> HSR_EC: 0x0000 (UNKNOWN), PC: 0x0000000000401ec4
>>
>> looping forever instead of
>>
>> aarch32_id_regs-1085576 [079] d..1. 1401295.068739: kvm_entry: PC:
>> 0x0000000000401ec4
>>  aarch32_id_regs-1085576 [079] ...1. 1401295.068745: kvm_exit: TRAP:
>> HSR_EC: 0x0020 (IABT_LOW), PC: 0x0000000000401ec4
>>  aarch32_id_regs-1085576 [079] d..1. 1401295.068790: kvm_entry: PC:
>> 0x0000000000401ec4
>>  aarch32_id_regs-1085576 [079] ...1. 1401295.068792: kvm_exit: TRAP:
>> HSR_EC: 0x0020 (IABT_LOW), PC: 0x0000000000401ec4
>>  aarch32_id_regs-1085576 [079] d..1. 1401295.068794: kvm_entry: PC:
>> 0x0000000000401ec4
>>  aarch32_id_regs-1085576 [079] ...1. 1401295.068795: kvm_exit: TRAP:
>> HSR_EC: 0x0020 (IABT_LOW), PC: 0x0000000000401ec4
>>  aarch32_id_regs-1085576 [079] d..1. 1401295.068797: kvm_entry: PC:
>> 0x0000000000401ec4
>> ../..
>>
>> Any idea or any known restriction wrt kselftests?
> 
> See above. I'd love someone to actually start looking into it and
> devise a testing harness that would run both at EL{0,1,2} *at the same
> time* so that we can start exercising some of the trap behaviours that
> the architecture mandates.
> 
> Also, Alexandru had some KUT tests a few years ago, but I don't know
> what happened to them.

Yeah, I remember that one too. Alexandru, do you have plans to revive it?
It would also be interesting to run the KUT tests on nested guests, giving
us a chance to do incremental and more unitary testing.

Thanks!

Eric

> 
> Thanks,
> 
> 	M.
> 



* Re: [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support
  2023-05-17 14:12       ` Marc Zyngier
@ 2023-06-06  9:33         ` Eric Auger
  2023-06-06 16:30           ` Marc Zyngier
  2023-06-06 17:52           ` Miguel Luis
  0 siblings, 2 replies; 95+ messages in thread
From: Eric Auger @ 2023-06-06  9:33 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvmarm, kvm, linux-arm-kernel, Alexandru Elisei, Andre Przywara,
	Chase Conklin, Christoffer Dall, Ganapatrao Kulkarni,
	Darren Hart, Jintack Lim, Russell King, Miguel Luis, James Morse,
	Suzuki K Poulose, Oliver Upton, Zenghui Yu

Hi Marc,

On 5/17/23 16:12, Marc Zyngier wrote:
> On Wed, 17 May 2023 09:59:45 +0100,
> Eric Auger <eauger@redhat.com> wrote:
>>
>> Hi Marc,
>> On 5/16/23 22:28, Marc Zyngier wrote:
>>> On Tue, 16 May 2023 17:53:14 +0100,
>>> Eric Auger <eauger@redhat.com> wrote:
>>>>
>>>> Hi Marc,
>>>>
>>>> On 5/15/23 19:30, Marc Zyngier wrote:
>>>>> This is the 4th drop of NV support on arm64 for this year.
>>>>>
>>>>> For the previous episodes, see [1].
>>>>>
>>>>> What's changed:
>>>>>
>>>>> - New framework to track system register traps that are reinjected in
>>>>>   guest EL2. It is expected to replace the discrete handling we have
>>>>>   enjoyed so far, which didn't scale at all. This has already fixed a
>>>>>   number of bugs that were hidden (a bunch of traps were never
>>>>>   forwarded...). Still a work in progress, but this is going in the
>>>>>   right direction.
>>>>>
>>>>> - Allow the L1 hypervisor to have a S2 that has an input larger than
>>>>>   the L0 IPA space. This fixes a number of subtle issues, depending on
>>>>>   how the initial guest was created.
>>>>>
>>>>> - Consequently, the patch series has gone longer again. Boo. But
>>>>>   hopefully some of it is easier to review...
>>>>>
>>>>> [1] https://lore.kernel.org/r/20230405154008.3552854-1-maz@kernel.org
>>>>
>>>> I have started testing this and when booting my fedora guest I get
>>>>
>>>> [  151.796544] kvm [7617]: Unsupported guest sys_reg access at:
>>>> 23f425fd0 [80000209]
>>>> [  151.796544]  { Op0( 3), Op1( 3), CRn(14), CRm( 3), Op2( 1), func_write },
>>>>
>>>> as soon as the host has kvm-arm.mode=nested
>>>>
>>>> This seems to be triggered very early by EDK2
>>>> (ArmPkg/Drivers/TimerDxe/TimerDxe.c).
>>>>
>>>> If I am not wrong this is CNTV_CTL_EL0. Do you have any idea?
>>>
>>> So here's my current analysis:
>>>
>>> I assume you are running EDK2 as the L1 guest in a nested
>>> configuration. I also assume that you are not running on an Apple
>>> CPU. If these assumptions are correct, then EDK2 runs at vEL2, and is
>>> in nVHE mode.
>>>
>>> Finally, I'm going to assume that your implementation has FEAT_ECV and
>>> FEAT_NV2, because I can't see how it could fail otherwise.
>> all the above is correct.
>>>
>>> In these precise conditions, KVM sets the CNTHCTL_EL2.EL1TVT bit so
>>> that we can trap the EL0 virtual timer and faithfully emulate it (it
>>> is otherwise written to memory, which isn't very helpful).
>>
>> indeed
>>>
>>> As it turns out, we don't handle these traps. I didn't spot it because
>>> my test machines are all Apple boxes that don't have a nVHE mode, so
>>> nothing on the nVHE path is getting *ANY* coverage. Hint: having
>>> access to such a machine would help (shipping address on request!).
>>> Otherwise, I'll eventually kill the nVHE support altogether.
>>>
>>> I have written the following patch, which compiles, but that I cannot
>>> test with my current setup. Could you please give it a go?
>>
>> with the patch below, my guest boots nicely. You got it right on the
>> first shot! So this fixes my issue. I will continue testing v10.
> 
> Thanks a lot for reporting the issue and testing my hacks. I'll
> eventually fold it into the rest of the series.
> 
> By the way, what are you using as your VMM? I'd really like to
> reproduce your setup.
Sorry, I missed your reply. I am using libvirt + QEMU (featuring Miguel's RFC)
and a Fedora L1 guest.

Thanks to your fix, this boots fine. But at the moment it does not
reboot; it hangs in EDK2, I think. Unfortunately this time I have no
trace on the host :-( While looking at your series I will add some traces.

Eric
> 
> Cheers,
> 
> 	M.
> 



* Re: [PATCH v10 06/59] arm64: Add TLBI operation encodings
  2023-05-15 17:30 ` [PATCH v10 06/59] arm64: Add TLBI operation encodings Marc Zyngier
@ 2023-06-06  9:33   ` Eric Auger
  0 siblings, 0 replies; 95+ messages in thread
From: Eric Auger @ 2023-06-06  9:33 UTC (permalink / raw)
  To: Marc Zyngier, kvmarm, kvm, linux-arm-kernel
  Cc: Alexandru Elisei, Andre Przywara, Chase Conklin,
	Christoffer Dall, Ganapatrao Kulkarni, Darren Hart, Jintack Lim,
	Russell King, Miguel Luis, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu

Hi,

On 5/15/23 19:30, Marc Zyngier wrote:
> Add all the TLBI encodings that are usable from Non-Secure.
> 

> Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>

Eric
> ---
>  arch/arm64/include/asm/sysreg.h | 128 ++++++++++++++++++++++++++++++++
>  1 file changed, 128 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
> index 28ccc379a172..2727e68dd65b 100644
> --- a/arch/arm64/include/asm/sysreg.h
> +++ b/arch/arm64/include/asm/sysreg.h
> @@ -568,6 +568,134 @@
>  
>  #define SYS_SP_EL2			sys_reg(3, 6,  4, 1, 0)
>  
> +/* TLBI instructions */
> +#define OP_TLBI_VMALLE1OS		sys_insn(1, 0, 8, 1, 0)
> +#define OP_TLBI_VAE1OS			sys_insn(1, 0, 8, 1, 1)
> +#define OP_TLBI_ASIDE1OS		sys_insn(1, 0, 8, 1, 2)
> +#define OP_TLBI_VAAE1OS			sys_insn(1, 0, 8, 1, 3)
> +#define OP_TLBI_VALE1OS			sys_insn(1, 0, 8, 1, 5)
> +#define OP_TLBI_VAALE1OS		sys_insn(1, 0, 8, 1, 7)
> +#define OP_TLBI_RVAE1IS			sys_insn(1, 0, 8, 2, 1)
> +#define OP_TLBI_RVAAE1IS		sys_insn(1, 0, 8, 2, 3)
> +#define OP_TLBI_RVALE1IS		sys_insn(1, 0, 8, 2, 5)
> +#define OP_TLBI_RVAALE1IS		sys_insn(1, 0, 8, 2, 7)
> +#define OP_TLBI_VMALLE1IS		sys_insn(1, 0, 8, 3, 0)
> +#define OP_TLBI_VAE1IS			sys_insn(1, 0, 8, 3, 1)
> +#define OP_TLBI_ASIDE1IS		sys_insn(1, 0, 8, 3, 2)
> +#define OP_TLBI_VAAE1IS			sys_insn(1, 0, 8, 3, 3)
> +#define OP_TLBI_VALE1IS			sys_insn(1, 0, 8, 3, 5)
> +#define OP_TLBI_VAALE1IS		sys_insn(1, 0, 8, 3, 7)
> +#define OP_TLBI_RVAE1OS			sys_insn(1, 0, 8, 5, 1)
> +#define OP_TLBI_RVAAE1OS		sys_insn(1, 0, 8, 5, 3)
> +#define OP_TLBI_RVALE1OS		sys_insn(1, 0, 8, 5, 5)
> +#define OP_TLBI_RVAALE1OS		sys_insn(1, 0, 8, 5, 7)
> +#define OP_TLBI_RVAE1			sys_insn(1, 0, 8, 6, 1)
> +#define OP_TLBI_RVAAE1			sys_insn(1, 0, 8, 6, 3)
> +#define OP_TLBI_RVALE1			sys_insn(1, 0, 8, 6, 5)
> +#define OP_TLBI_RVAALE1			sys_insn(1, 0, 8, 6, 7)
> +#define OP_TLBI_VMALLE1			sys_insn(1, 0, 8, 7, 0)
> +#define OP_TLBI_VAE1			sys_insn(1, 0, 8, 7, 1)
> +#define OP_TLBI_ASIDE1			sys_insn(1, 0, 8, 7, 2)
> +#define OP_TLBI_VAAE1			sys_insn(1, 0, 8, 7, 3)
> +#define OP_TLBI_VALE1			sys_insn(1, 0, 8, 7, 5)
> +#define OP_TLBI_VAALE1			sys_insn(1, 0, 8, 7, 7)
> +#define OP_TLBI_VMALLE1OSNXS		sys_insn(1, 0, 9, 1, 0)
> +#define OP_TLBI_VAE1OSNXS		sys_insn(1, 0, 9, 1, 1)
> +#define OP_TLBI_ASIDE1OSNXS		sys_insn(1, 0, 9, 1, 2)
> +#define OP_TLBI_VAAE1OSNXS		sys_insn(1, 0, 9, 1, 3)
> +#define OP_TLBI_VALE1OSNXS		sys_insn(1, 0, 9, 1, 5)
> +#define OP_TLBI_VAALE1OSNXS		sys_insn(1, 0, 9, 1, 7)
> +#define OP_TLBI_RVAE1ISNXS		sys_insn(1, 0, 9, 2, 1)
> +#define OP_TLBI_RVAAE1ISNXS		sys_insn(1, 0, 9, 2, 3)
> +#define OP_TLBI_RVALE1ISNXS		sys_insn(1, 0, 9, 2, 5)
> +#define OP_TLBI_RVAALE1ISNXS		sys_insn(1, 0, 9, 2, 7)
> +#define OP_TLBI_VMALLE1ISNXS		sys_insn(1, 0, 9, 3, 0)
> +#define OP_TLBI_VAE1ISNXS		sys_insn(1, 0, 9, 3, 1)
> +#define OP_TLBI_ASIDE1ISNXS		sys_insn(1, 0, 9, 3, 2)
> +#define OP_TLBI_VAAE1ISNXS		sys_insn(1, 0, 9, 3, 3)
> +#define OP_TLBI_VALE1ISNXS		sys_insn(1, 0, 9, 3, 5)
> +#define OP_TLBI_VAALE1ISNXS		sys_insn(1, 0, 9, 3, 7)
> +#define OP_TLBI_RVAE1OSNXS		sys_insn(1, 0, 9, 5, 1)
> +#define OP_TLBI_RVAAE1OSNXS		sys_insn(1, 0, 9, 5, 3)
> +#define OP_TLBI_RVALE1OSNXS		sys_insn(1, 0, 9, 5, 5)
> +#define OP_TLBI_RVAALE1OSNXS		sys_insn(1, 0, 9, 5, 7)
> +#define OP_TLBI_RVAE1NXS		sys_insn(1, 0, 9, 6, 1)
> +#define OP_TLBI_RVAAE1NXS		sys_insn(1, 0, 9, 6, 3)
> +#define OP_TLBI_RVALE1NXS		sys_insn(1, 0, 9, 6, 5)
> +#define OP_TLBI_RVAALE1NXS		sys_insn(1, 0, 9, 6, 7)
> +#define OP_TLBI_VMALLE1NXS		sys_insn(1, 0, 9, 7, 0)
> +#define OP_TLBI_VAE1NXS			sys_insn(1, 0, 9, 7, 1)
> +#define OP_TLBI_ASIDE1NXS		sys_insn(1, 0, 9, 7, 2)
> +#define OP_TLBI_VAAE1NXS		sys_insn(1, 0, 9, 7, 3)
> +#define OP_TLBI_VALE1NXS		sys_insn(1, 0, 9, 7, 5)
> +#define OP_TLBI_VAALE1NXS		sys_insn(1, 0, 9, 7, 7)
> +#define OP_TLBI_IPAS2E1IS		sys_insn(1, 4, 8, 0, 1)
> +#define OP_TLBI_RIPAS2E1IS		sys_insn(1, 4, 8, 0, 2)
> +#define OP_TLBI_IPAS2LE1IS		sys_insn(1, 4, 8, 0, 5)
> +#define OP_TLBI_RIPAS2LE1IS		sys_insn(1, 4, 8, 0, 6)
> +#define OP_TLBI_ALLE2OS			sys_insn(1, 4, 8, 1, 0)
> +#define OP_TLBI_VAE2OS			sys_insn(1, 4, 8, 1, 1)
> +#define OP_TLBI_ALLE1OS			sys_insn(1, 4, 8, 1, 4)
> +#define OP_TLBI_VALE2OS			sys_insn(1, 4, 8, 1, 5)
> +#define OP_TLBI_VMALLS12E1OS		sys_insn(1, 4, 8, 1, 6)
> +#define OP_TLBI_RVAE2IS			sys_insn(1, 4, 8, 2, 1)
> +#define OP_TLBI_RVALE2IS		sys_insn(1, 4, 8, 2, 5)
> +#define OP_TLBI_ALLE2IS			sys_insn(1, 4, 8, 3, 0)
> +#define OP_TLBI_VAE2IS			sys_insn(1, 4, 8, 3, 1)
> +#define OP_TLBI_ALLE1IS			sys_insn(1, 4, 8, 3, 4)
> +#define OP_TLBI_VALE2IS			sys_insn(1, 4, 8, 3, 5)
> +#define OP_TLBI_VMALLS12E1IS		sys_insn(1, 4, 8, 3, 6)
> +#define OP_TLBI_IPAS2E1OS		sys_insn(1, 4, 8, 4, 0)
> +#define OP_TLBI_IPAS2E1			sys_insn(1, 4, 8, 4, 1)
> +#define OP_TLBI_RIPAS2E1		sys_insn(1, 4, 8, 4, 2)
> +#define OP_TLBI_RIPAS2E1OS		sys_insn(1, 4, 8, 4, 3)
> +#define OP_TLBI_IPAS2LE1OS		sys_insn(1, 4, 8, 4, 4)
> +#define OP_TLBI_IPAS2LE1		sys_insn(1, 4, 8, 4, 5)
> +#define OP_TLBI_RIPAS2LE1		sys_insn(1, 4, 8, 4, 6)
> +#define OP_TLBI_RIPAS2LE1OS		sys_insn(1, 4, 8, 4, 7)
> +#define OP_TLBI_RVAE2OS			sys_insn(1, 4, 8, 5, 1)
> +#define OP_TLBI_RVALE2OS		sys_insn(1, 4, 8, 5, 5)
> +#define OP_TLBI_RVAE2			sys_insn(1, 4, 8, 6, 1)
> +#define OP_TLBI_RVALE2			sys_insn(1, 4, 8, 6, 5)
> +#define OP_TLBI_ALLE2			sys_insn(1, 4, 8, 7, 0)
> +#define OP_TLBI_VAE2			sys_insn(1, 4, 8, 7, 1)
> +#define OP_TLBI_ALLE1			sys_insn(1, 4, 8, 7, 4)
> +#define OP_TLBI_VALE2			sys_insn(1, 4, 8, 7, 5)
> +#define OP_TLBI_VMALLS12E1		sys_insn(1, 4, 8, 7, 6)
> +#define OP_TLBI_IPAS2E1ISNXS		sys_insn(1, 4, 9, 0, 1)
> +#define OP_TLBI_RIPAS2E1ISNXS		sys_insn(1, 4, 9, 0, 2)
> +#define OP_TLBI_IPAS2LE1ISNXS		sys_insn(1, 4, 9, 0, 5)
> +#define OP_TLBI_RIPAS2LE1ISNXS		sys_insn(1, 4, 9, 0, 6)
> +#define OP_TLBI_ALLE2OSNXS		sys_insn(1, 4, 9, 1, 0)
> +#define OP_TLBI_VAE2OSNXS		sys_insn(1, 4, 9, 1, 1)
> +#define OP_TLBI_ALLE1OSNXS		sys_insn(1, 4, 9, 1, 4)
> +#define OP_TLBI_VALE2OSNXS		sys_insn(1, 4, 9, 1, 5)
> +#define OP_TLBI_VMALLS12E1OSNXS		sys_insn(1, 4, 9, 1, 6)
> +#define OP_TLBI_RVAE2ISNXS		sys_insn(1, 4, 9, 2, 1)
> +#define OP_TLBI_RVALE2ISNXS		sys_insn(1, 4, 9, 2, 5)
> +#define OP_TLBI_ALLE2ISNXS		sys_insn(1, 4, 9, 3, 0)
> +#define OP_TLBI_VAE2ISNXS		sys_insn(1, 4, 9, 3, 1)
> +#define OP_TLBI_ALLE1ISNXS		sys_insn(1, 4, 9, 3, 4)
> +#define OP_TLBI_VALE2ISNXS		sys_insn(1, 4, 9, 3, 5)
> +#define OP_TLBI_VMALLS12E1ISNXS		sys_insn(1, 4, 9, 3, 6)
> +#define OP_TLBI_IPAS2E1OSNXS		sys_insn(1, 4, 9, 4, 0)
> +#define OP_TLBI_IPAS2E1NXS		sys_insn(1, 4, 9, 4, 1)
> +#define OP_TLBI_RIPAS2E1NXS		sys_insn(1, 4, 9, 4, 2)
> +#define OP_TLBI_RIPAS2E1OSNXS		sys_insn(1, 4, 9, 4, 3)
> +#define OP_TLBI_IPAS2LE1OSNXS		sys_insn(1, 4, 9, 4, 4)
> +#define OP_TLBI_IPAS2LE1NXS		sys_insn(1, 4, 9, 4, 5)
> +#define OP_TLBI_RIPAS2LE1NXS		sys_insn(1, 4, 9, 4, 6)
> +#define OP_TLBI_RIPAS2LE1OSNXS		sys_insn(1, 4, 9, 4, 7)
> +#define OP_TLBI_RVAE2OSNXS		sys_insn(1, 4, 9, 5, 1)
> +#define OP_TLBI_RVALE2OSNXS		sys_insn(1, 4, 9, 5, 5)
> +#define OP_TLBI_RVAE2NXS		sys_insn(1, 4, 9, 6, 1)
> +#define OP_TLBI_RVALE2NXS		sys_insn(1, 4, 9, 6, 5)
> +#define OP_TLBI_ALLE2NXS		sys_insn(1, 4, 9, 7, 0)
> +#define OP_TLBI_VAE2NXS			sys_insn(1, 4, 9, 7, 1)
> +#define OP_TLBI_ALLE1NXS		sys_insn(1, 4, 9, 7, 4)
> +#define OP_TLBI_VALE2NXS		sys_insn(1, 4, 9, 7, 5)
> +#define OP_TLBI_VMALLS12E1NXS		sys_insn(1, 4, 9, 7, 6)
> +
>  /* Common SCTLR_ELx flags. */
>  #define SCTLR_ELx_ENTP2	(BIT(60))
>  #define SCTLR_ELx_DSSBS	(BIT(44))



* Re: [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support
  2023-06-06  9:29     ` Eric Auger
@ 2023-06-06 16:22       ` Marc Zyngier
  2023-06-07 16:30         ` Eric Auger
  0 siblings, 1 reply; 95+ messages in thread
From: Marc Zyngier @ 2023-06-06 16:22 UTC (permalink / raw)
  To: Eric Auger
  Cc: kvmarm, kvm, linux-arm-kernel, Alexandru Elisei, Andre Przywara,
	Chase Conklin, Christoffer Dall, Ganapatrao Kulkarni,
	Darren Hart, Jintack Lim, Russell King, Miguel Luis, James Morse,
	Suzuki K Poulose, Oliver Upton, Zenghui Yu

On Tue, 06 Jun 2023 10:29:36 +0100,
Eric Auger <eauger@redhat.com> wrote:
> 
> Hi Marc,
> On 6/6/23 09:30, Marc Zyngier wrote:
> > Hey Eric,
> > 
> > On Mon, 05 Jun 2023 12:28:12 +0100,
> > Eric Auger <eauger@redhat.com> wrote:
> >>
> >> Hi Marc,
> >>
> >> On 5/15/23 19:30, Marc Zyngier wrote:
> >>> This is the 4th drop of NV support on arm64 for this year.
> >>>
> >>> For the previous episodes, see [1].
> >>>
> >>> What's changed:
> >>>
> >>> - New framework to track system register traps that are reinjected in
> >>>   guest EL2. It is expected to replace the discrete handling we have
> >>>   enjoyed so far, which didn't scale at all. This has already fixed a
> >>>   number of bugs that were hidden (a bunch of traps were never
> >>>   forwarded...). Still a work in progress, but this is going in the
> >>>   right direction.
> >>>
> >>> - Allow the L1 hypervisor to have a S2 that has an input larger than
> >>>   the L0 IPA space. This fixes a number of subtle issues, depending on
> >>>   how the initial guest was created.
> >>>
> >>> - Consequently, the patch series has gone longer again. Boo. But
> >>>   hopefully some of it is easier to review...
> >>>
> >>> [1] https://lore.kernel.org/r/20230405154008.3552854-1-maz@kernel.org
> >>>
> >>> Andre Przywara (1):
> >>>   KVM: arm64: nv: vgic: Allow userland to set VGIC maintenance IRQ
> >>
> >> I guess you have executed kselftests on L1 guests. Have all the tests
> >> passed there? On my end it stalls in the KVM_RUN.
> > 
> > No, I hardly run any kselftest, because they are just not designed to
> > run at EL2 at all. There's some work to be done there, but I just
> > don't have the bandwidth for that (hint, wink...)
> 
> Oh OK, I missed that point. If nobody is working on this I can start
> looking at it. It would be interesting to run them on nested guests too.

If you want to pick this up, it would be extremely helpful. And no,
nobody is really looking into it at the moment, so it's all yours!

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.


* Re: [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support
  2023-06-06  9:33         ` Eric Auger
@ 2023-06-06 16:30           ` Marc Zyngier
  2023-06-07 16:39             ` Eric Auger
  2023-06-06 17:52           ` Miguel Luis
  1 sibling, 1 reply; 95+ messages in thread
From: Marc Zyngier @ 2023-06-06 16:30 UTC (permalink / raw)
  To: Eric Auger
  Cc: kvmarm, kvm, linux-arm-kernel, Alexandru Elisei, Andre Przywara,
	Chase Conklin, Christoffer Dall, Ganapatrao Kulkarni,
	Darren Hart, Jintack Lim, Russell King, Miguel Luis, James Morse,
	Suzuki K Poulose, Oliver Upton, Zenghui Yu

On Tue, 06 Jun 2023 10:33:27 +0100,
Eric Auger <eauger@redhat.com> wrote:
> 
> > By the way, what are you using as your VMM? I'd really like to
> > reproduce your setup.
> Sorry, I missed your reply. I am using libvirt + QEMU (featuring Miguel's RFC)
> and a Fedora L1 guest.

OK, so that's *very* different from what I'm using, which is good! Do
you have a QEMU branch somewhere?

> Thanks to your fix, this boots fine. But at the moment it does not
>> reboot; it hangs in EDK2, I think. Unfortunately this time I have no
>> trace on the host :-( While looking at your series I will add some traces.

I've been able to run EDK2 compiled for kvmtool with only a couple of
changes (such as using SMCs instead of HVCs, and disabling the broken
DT-to-ACPI conversion). However, reboot isn't something I've played
with, as kvmtool doesn't even try to reboot the guest (reboot is
handled as power-off).

I'm pretty sure this is related to tearing down the shadow S2 MMU
contexts when QEMU reinits the vcpus, and we may fail to clean things
up.
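
To make the suspicion concrete, the cleanup that would be missing looks
roughly like this (a sketch with made-up structure and helper names,
not the actual series code):

#include <stdint.h>
#include <stddef.h>

/* minimal stand-ins for the kernel structures (hypothetical layout) */
struct s2_mmu {
	uint64_t vttbr;		/* 0 when the context is free */
	uint64_t ipa_limit;
};

struct nested_ctx {
	struct s2_mmu *mmus;
	size_t nr_mmus;
};

/* hypothetical helper: drop a range of shadow stage-2 mappings */
static void shadow_s2_unmap(struct s2_mmu *mmu, uint64_t start, uint64_t end)
{
	/* walk [start, end) of the shadow tables and zap the entries */
	(void)mmu; (void)start; (void)end;
}

/*
 * On vcpu re-init (e.g. a VMM-driven reboot), every shadow stage-2
 * context instantiated for the L1's guests must be unmapped and
 * returned to the pool; a context left behind keeps stale translations
 * alive across the reboot.
 */
static void teardown_shadow_s2(struct nested_ctx *nv)
{
	for (size_t i = 0; i < nv->nr_mmus; i++) {
		struct s2_mmu *mmu = &nv->mmus[i];

		if (!mmu->vttbr)	/* already free */
			continue;

		shadow_s2_unmap(mmu, 0, mmu->ipa_limit);
		mmu->vttbr = 0;		/* retire the context */
	}
}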

I'll try and have a look once I'm back from holiday (and we can have a
look at KVM Forum).

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.


* Re: [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support
  2023-06-06  9:33         ` Eric Auger
  2023-06-06 16:30           ` Marc Zyngier
@ 2023-06-06 17:52           ` Miguel Luis
  2023-06-07 16:40             ` Eric Auger
  1 sibling, 1 reply; 95+ messages in thread
From: Miguel Luis @ 2023-06-06 17:52 UTC (permalink / raw)
  To: Eric Auger
  Cc: Marc Zyngier, kvmarm, kvm, linux-arm-kernel, Alexandru Elisei,
	Andre Przywara, Chase Conklin, Christoffer Dall,
	Ganapatrao Kulkarni, Darren Hart, Jintack Lim, Russell King,
	James Morse, Suzuki K Poulose, Oliver Upton, Zenghui Yu

Hello Eric, Marc,

> On 6 Jun 2023, at 09:33, Eric Auger <eauger@redhat.com> wrote:
> 
> Hi Marc,
> 
> On 5/17/23 16:12, Marc Zyngier wrote:
>> On Wed, 17 May 2023 09:59:45 +0100,
>> Eric Auger <eauger@redhat.com> wrote:
>>> 
>>> Hi Marc,
>>> On 5/16/23 22:28, Marc Zyngier wrote:
>>>> On Tue, 16 May 2023 17:53:14 +0100,
>>>> Eric Auger <eauger@redhat.com> wrote:
>>>>> 
>>>>> Hi Marc,
>>>>> 
>>>>> On 5/15/23 19:30, Marc Zyngier wrote:
>>>>>> This is the 4th drop of NV support on arm64 for this year.
>>>>>> 
>>>>>> For the previous episodes, see [1].
>>>>>> 
>>>>>> What's changed:
>>>>>> 
>>>>>> - New framework to track system register traps that are reinjected in
>>>>>>  guest EL2. It is expected to replace the discrete handling we have
>>>>>>  enjoyed so far, which didn't scale at all. This has already fixed a
>>>>>>  number of bugs that were hidden (a bunch of traps were never
>>>>>>  forwarded...). Still a work in progress, but this is going in the
>>>>>>  right direction.
>>>>>> 
>>>>>> - Allow the L1 hypervisor to have a S2 that has an input larger than
>>>>>>  the L0 IPA space. This fixes a number of subtle issues, depending on
>>>>>>  how the initial guest was created.
>>>>>> 
>>>>>> - Consequently, the patch series has gone longer again. Boo. But
>>>>>>  hopefully some of it is easier to review...
>>>>>> 
>>>>>> [1] https://lore.kernel.org/r/20230405154008.3552854-1-maz@kernel.org
>>>>> 
>>>>> I have started testing this and when booting my fedora guest I get
>>>>> 
>>>>> [  151.796544] kvm [7617]: Unsupported guest sys_reg access at:
>>>>> 23f425fd0 [80000209]
>>>>> [  151.796544]  { Op0( 3), Op1( 3), CRn(14), CRm( 3), Op2( 1), func_write },
>>>>> 
>>>>> as soon as the host has kvm-arm.mode=nested
>>>>> 
>>>>> This seems to be triggered very early by EDK2
>>>>> (ArmPkg/Drivers/TimerDxe/TimerDxe.c).
>>>>> 
>>>>> If I am not wrong this is CNTV_CTL_EL0. Do you have any idea?
>>>> 
>>>> So here's my current analysis:
>>>> 
>>>> I assume you are running EDK2 as the L1 guest in a nested
>>>> configuration. I also assume that you are not running on an Apple
>>>> CPU. If these assumptions are correct, then EDK2 runs at vEL2, and is
>>>> in nVHE mode.
>>>> 
>>>> Finally, I'm going to assume that your implementation has FEAT_ECV and
>>>> FEAT_NV2, because I can't see how it could fail otherwise.
>>> all the above is correct.
>>>> 
>>>> In these precise conditions, KVM sets the CNTHCTL_EL2.EL1TVT bit so
>>>> that we can trap the EL0 virtual timer and faithfully emulate it (it
>>>> is otherwise written to memory, which isn't very helpful).
>>> 
>>> indeed
>>>> 
>>>> As it turns out, we don't handle these traps. I didn't spot it because
>>>> my test machines are all Apple boxes that don't have a nVHE mode, so
>>>> nothing on the nVHE path is getting *ANY* coverage. Hint: having
>>>> access to such a machine would help (shipping address on request!).
>>>> Otherwise, I'll eventually kill the nVHE support altogether.
>>>> 
>>>> I have written the following patch, which compiles, but that I cannot
>>>> test with my current setup. Could you please give it a go?
>>> 
>>> with the patch below, my guest boots nicely. You got it right on the
>>> first shot! So this fixes my issue. I will continue testing v10.
>> 
>> Thanks a lot for reporting the issue and testing my hacks. I'll
>> eventually fold it into the rest of the series.
>> 
>> By the way, what are you using as your VMM? I'd really like to
>> reproduce your setup.
> Sorry, I missed your reply. I am using libvirt + QEMU (featuring Miguel's RFC)
> and a Fedora L1 guest.
> 

Following up on this subject, I’ve forward-ported Alexandru’s KUT patches
(and I encourage others to do it also =)) which expose an EL2 test that
does three checks:

- check whether VHE is supported and enabled
- disable VHE
- re-enable VHE  

I’m running QEMU with virtualization=on as well to run this test and it is passing, although
problems seem to happen when running with virtualization=off, which I’m still looking into.
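
The rough shape of that test, as a simplified kvm-unit-tests style
sketch (hypothetical code, not Alexandru's actual patches; the real
test also has to juggle the translation regime and vectors around the
E2H flip, which is elided here):

#include <stdint.h>
#include <stdbool.h>

#define HCR_EL2_E2H	(1UL << 34)

void report(bool pass, const char *msg, ...);	/* kvm-unit-tests helper */

static inline uint64_t read_hcr_el2(void)
{
	uint64_t v;

	asm volatile("mrs %0, hcr_el2" : "=r"(v));
	return v;
}

static inline void write_hcr_el2(uint64_t v)
{
	asm volatile("msr hcr_el2, %0" :: "r"(v));
	asm volatile("isb");
}

static bool vhe_enabled(void)
{
	return read_hcr_el2() & HCR_EL2_E2H;
}

/* the three checks, run from (v)EL2 */
static void test_vhe(void)
{
	report(vhe_enabled(), "VHE supported and enabled");

	write_hcr_el2(read_hcr_el2() & ~HCR_EL2_E2H);	/* disable VHE */
	report(!vhe_enabled(), "VHE disabled");

	write_hcr_el2(read_hcr_el2() | HCR_EL2_E2H);	/* re-enable VHE */
	report(vhe_enabled(), "VHE re-enabled");
}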

Thanks
Miguel

> Thanks to your fix, this boots fine. But at the moment it does not
> reboot; it hangs in EDK2, I think. Unfortunately this time I have no
> trace on the host :-( While looking at your series I will add some traces.
> 
> Eric
>> 
>> Cheers,
>> 
>> M.




* Re: [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support
  2023-06-06 16:22       ` Marc Zyngier
@ 2023-06-07 16:30         ` Eric Auger
  0 siblings, 0 replies; 95+ messages in thread
From: Eric Auger @ 2023-06-07 16:30 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvmarm, kvm, linux-arm-kernel, Alexandru Elisei, Andre Przywara,
	Chase Conklin, Christoffer Dall, Ganapatrao Kulkarni,
	Darren Hart, Jintack Lim, Russell King, Miguel Luis, James Morse,
	Suzuki K Poulose, Oliver Upton, Zenghui Yu

Hi Marc,

On 6/6/23 18:22, Marc Zyngier wrote:
> On Tue, 06 Jun 2023 10:29:36 +0100,
> Eric Auger <eauger@redhat.com> wrote:
>>
>> Hi Marc,
>> On 6/6/23 09:30, Marc Zyngier wrote:
>>> Hey Eric,
>>>
>>> On Mon, 05 Jun 2023 12:28:12 +0100,
>>> Eric Auger <eauger@redhat.com> wrote:
>>>>
>>>> Hi Marc,
>>>>
>>>> On 5/15/23 19:30, Marc Zyngier wrote:
>>>>> This is the 4th drop of NV support on arm64 for this year.
>>>>>
>>>>> For the previous episodes, see [1].
>>>>>
>>>>> What's changed:
>>>>>
>>>>> - New framework to track system register traps that are reinjected in
>>>>>   guest EL2. It is expected to replace the discrete handling we have
>>>>>   enjoyed so far, which didn't scale at all. This has already fixed a
>>>>>   number of bugs that were hidden (a bunch of traps were never
>>>>>   forwarded...). Still a work in progress, but this is going in the
>>>>>   right direction.
>>>>>
>>>>> - Allow the L1 hypervisor to have a S2 that has an input larger than
>>>>>   the L0 IPA space. This fixes a number of subtle issues, depending on
>>>>>   how the initial guest was created.
>>>>>
>>>>> - Consequently, the patch series has gone longer again. Boo. But
>>>>>   hopefully some of it is easier to review...
>>>>>
>>>>> [1] https://lore.kernel.org/r/20230405154008.3552854-1-maz@kernel.org
>>>>>
>>>>> Andre Przywara (1):
>>>>>   KVM: arm64: nv: vgic: Allow userland to set VGIC maintenance IRQ
>>>>
>>>> I guess you have executed kselftests on L1 guests. Have all the tests
>>>> passed there? On my end it stalls in the KVM_RUN.
>>>
>>> No, I hardly run any kselftest, because they are just not designed to
>>> run at EL2 at all. There's some work to be done there, but I just
>>> don't have the bandwidth for that (hint, wink...)
>>
>> Oh OK, I missed that point. If nobody is working on this I can start
>> looking at it. It would be interesting to run them on nested guests too.
> 
> If you want to pick this up, it would be extremely helpful. And no,
> nobody is really looking into it at the moment, so it's all yours!

OK I will study that then :-)

Eric
> 
> Thanks,
> 
> 	M.
> 



* Re: [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support
  2023-06-06 16:30           ` Marc Zyngier
@ 2023-06-07 16:39             ` Eric Auger
  0 siblings, 0 replies; 95+ messages in thread
From: Eric Auger @ 2023-06-07 16:39 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvmarm, kvm, linux-arm-kernel, Alexandru Elisei, Andre Przywara,
	Chase Conklin, Christoffer Dall, Ganapatrao Kulkarni,
	Darren Hart, Jintack Lim, Russell King, Miguel Luis, James Morse,
	Suzuki K Poulose, Oliver Upton, Zenghui Yu

Hi Marc,

On 6/6/23 18:30, Marc Zyngier wrote:
> On Tue, 06 Jun 2023 10:33:27 +0100,
> Eric Auger <eauger@redhat.com> wrote:
>>
>>> By the way, what are you using as your VMM? I'd really like to
>>> reproduce your setup.
>> Sorry, I missed your reply. I am using libvirt + QEMU (featuring Miguel's RFC)
>> and a Fedora L1 guest.
> 
> OK, so that's *very* different from what I'm using, which is good! Do
> you have a QEMU branch somewhere?

here it is:
https://github.com/eauger/qemu/tree/nv_rfc_rebase
> 
>> Thanks to your fix, this boots fine. But at the moment it does not
>> reboot; it hangs in EDK2, I think. Unfortunately this time I have no
>> trace on the host :-( While looking at your series I will add some traces.
> 
> I've been able to run EDK2 compiled for kvmtool with only a couple of
> changes (such as using SMCs instead of HVCs, and disabling the broken
> DT-to-ACPI conversion). However, reboot isn't something I've played
> with, as kvmtool doesn't even try to reboot the guest (reboot is
> handled as power-off).
> 
> I'm pretty sure this is related to tearing down the shadow S2 MMU
> contexts when QEMU reinits the vcpus, and we may fail to clean things
> up.


> 
> I'll try and have a look once I'm back from holiday (and we can have a
> look at KVM Forum).

I won't be there this year. But enjoy the event and enjoy your vacation!

Thanks

Eric
> 
> Thanks,
> 
> 	M.
> 



* Re: [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support
  2023-06-06 17:52           ` Miguel Luis
@ 2023-06-07 16:40             ` Eric Auger
  2023-06-10  8:25               ` Miguel Luis
  0 siblings, 1 reply; 95+ messages in thread
From: Eric Auger @ 2023-06-07 16:40 UTC (permalink / raw)
  To: Miguel Luis
  Cc: Marc Zyngier, kvmarm, kvm, linux-arm-kernel, Alexandru Elisei,
	Andre Przywara, Chase Conklin, Christoffer Dall,
	Ganapatrao Kulkarni, Darren Hart, Jintack Lim, Russell King,
	James Morse, Suzuki K Poulose, Oliver Upton, Zenghui Yu

Hi Miguel,

On 6/6/23 19:52, Miguel Luis wrote:
> Hello Eric, Marc,
> 
>> On 6 Jun 2023, at 09:33, Eric Auger <eauger@redhat.com> wrote:
>>
>> Hi Marc,
>>
>> On 5/17/23 16:12, Marc Zyngier wrote:
>>> On Wed, 17 May 2023 09:59:45 +0100,
>>> Eric Auger <eauger@redhat.com> wrote:
>>>>
>>>> Hi Marc,
>>>> On 5/16/23 22:28, Marc Zyngier wrote:
>>>>> On Tue, 16 May 2023 17:53:14 +0100,
>>>>> Eric Auger <eauger@redhat.com> wrote:
>>>>>>
>>>>>> Hi Marc,
>>>>>>
>>>>>> On 5/15/23 19:30, Marc Zyngier wrote:
>>>>>>> This is the 4th drop of NV support on arm64 for this year.
>>>>>>>
>>>>>>> For the previous episodes, see [1].
>>>>>>>
>>>>>>> What's changed:
>>>>>>>
>>>>>>> - New framework to track system register traps that are reinjected in
>>>>>>>  guest EL2. It is expected to replace the discrete handling we have
>>>>>>>  enjoyed so far, which didn't scale at all. This has already fixed a
>>>>>>>  number of bugs that were hidden (a bunch of traps were never
>>>>>>>  forwarded...). Still a work in progress, but this is going in the
>>>>>>>  right direction.
>>>>>>>
>>>>>>> - Allow the L1 hypervisor to have a S2 that has an input larger than
>>>>>>>  the L0 IPA space. This fixes a number of subtle issues, depending on
>>>>>>>  how the initial guest was created.
>>>>>>>
>>>>>>> - Consequently, the patch series has gone longer again. Boo. But
>>>>>>>  hopefully some of it is easier to review...
>>>>>>>
>>>>>>> [1] https://lore.kernel.org/r/20230405154008.3552854-1-maz@kernel.org
>>>>>>
>>>>>> I have started testing this and when booting my fedora guest I get
>>>>>>
>>>>>> [  151.796544] kvm [7617]: Unsupported guest sys_reg access at:
>>>>>> 23f425fd0 [80000209]
>>>>>> [  151.796544]  { Op0( 3), Op1( 3), CRn(14), CRm( 3), Op2( 1), func_write },
>>>>>>
>>>>>> as soon as the host has kvm-arm.mode=nested
>>>>>>
>>>>>> This seems to be triggered very early by EDK2
>>>>>> (ArmPkg/Drivers/TimerDxe/TimerDxe.c).
>>>>>>
>>>>>> If I am not wrong this CNTV_CTL_EL0. Do you have any idea?
>>>>>
>>>>> So here's my current analysis:
>>>>>
>>>>> I assume you are running EDK2 as the L1 guest in a nested
>>>>> configuration. I also assume that you are not running on an Apple
>>>>> CPU. If these assumptions are correct, then EDK2 runs at vEL2, and is
>>>>> in nVHE mode.
>>>>>
>>>>> Finally, I'm going to assume that your implementation has FEAT_ECV and
>>>>> FEAT_NV2, because I can't see how it could fail otherwise.
>>>> all the above is correct.
>>>>>
>>>>> In these precise conditions, KVM sets the CNTHCTL_EL2.EL1TVT bit so
>>>>> that we can trap the EL0 virtual timer and faithfully emulate it (it
>>>>> is otherwise written to memory, which isn't very helpful).
>>>>
>>>> indeed
>>>>>
>>>>> As it turns out, we don't handle these traps. I didn't spot it because
>>>>> my test machines are all Apple boxes that don't have a nVHE mode, so
>>>>> nothing on the nVHE path is getting *ANY* coverage. Hint: having
>>>>> access to such a machine would help (shipping address on request!).
>>>>> Otherwise, I'll eventually kill the nVHE support altogether.
>>>>>
>>>>> I have written the following patch, which compiles, but that I cannot
>>>>> test with my current setup. Could you please give it a go?
>>>>
>>>> with the patch below, my guest boots nicely. You did it great on the 1st
>>>> shot!!! So this fixes my issue. I will continue testing the v10.
>>>
>>> Thanks a lot for reporting the issue and testing my hacks. I'll
>>> eventually fold it into the rest of the series.
>>>
>>> By the way, what are you using as your VMM? I'd really like to
>>> reproduce your setup.
>> Sorry I missed your reply. I am using libvirt + qemu (feat Miguel's RFC)
>> and fedora L1 guest.
>>
> 
> Following this subject, I’ve forward ported Alexandru’s KUT patches
> ( and I encourage others to do it also =) ) which expose an EL2 test that

Do you have a branch available with Alexandru's rebased kut series?

Thanks

Eric
> does three checks:
> 
> - whether VHE is supported and enabled
> - disable VHE
> - re-enable VHE  
> 
> I’m running qemu with virtualization=on as well to run this test and it is passing, although
> problems seem to happen when running with virtualization=off, which I’m still looking into.
> 
> Thanks
> Miguel
> 
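To make the three checks quoted above concrete: at EL2, "is VHE
supported and enabled" reduces to reading ID_AA64MMFR1_EL1.VH, and
disabling/re-enabling VHE reduces to toggling HCR_EL2.E2H. A rough,
self-contained sketch of that shape (the sketch_* accessors are
hypothetical stand-ins for kvm-unit-tests' sysreg helpers; this is not
Alexandru's actual test code):

#include <stdint.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the usual read_sysreg()/write_sysreg(). */
extern uint64_t sketch_read_id_aa64mmfr1_el1(void);
extern uint64_t sketch_read_hcr_el2(void);
extern void sketch_write_hcr_el2(uint64_t val);

#define SKETCH_HCR_E2H	(UINT64_C(1) << 34)

/* VHE is implemented when ID_AA64MMFR1_EL1.VH (bits [11:8]) is nonzero. */
static bool sketch_vhe_supported(void)
{
	return ((sketch_read_id_aa64mmfr1_el1() >> 8) & 0xf) != 0;
}

/*
 * Disabling or re-enabling VHE from EL2 amounts to toggling
 * HCR_EL2.E2H (an ISB is needed before relying on the new regime).
 */
static void sketch_set_vhe(bool enable)
{
	uint64_t hcr = sketch_read_hcr_el2();

	if (enable)
		hcr |= SKETCH_HCR_E2H;
	else
		hcr &= ~SKETCH_HCR_E2H;
	sketch_write_hcr_el2(hcr);
}

The interesting part of such a test is less the register twiddling than
checking that the world still works after each toggle, which is where
the virtualization=off problems mentioned above would show up.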
>> Thanks to your fix, this boots fine. But at the moment it does not
>> reboot and hangs in edk2 I think. Unfortunately this time I have no
>> trace on host :-( While looking at your series I will add some traces.
>>
>> Eric
>>>
>>> Cheers,
>>>
>>> M.
> 
> 
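Marc's actual fix is not reproduced at this point of the thread, but the
CNTHCTL_EL2.EL1TVT behaviour discussed above is worth restating: with
that bit set, the L1's CNTV_CTL_EL0/CNTV_CVAL_EL0 accesses trap to L0 as
sysreg traps (EC=0x18) instead of being redirected to memory by
FEAT_NV2, and L0 has to emulate them. A minimal sketch of the shape of
such a handler, with entirely hypothetical types and names:

#include <stdint.h>
#include <stdbool.h>

/* Hypothetical emulated EL0 virtual timer state for one vcpu. */
struct sketch_vtimer {
	uint64_t ctl;
	uint64_t cval;
};

enum sketch_vtimer_reg { SKETCH_CNTV_CTL, SKETCH_CNTV_CVAL };

/*
 * Sketch only: route a trapped CNTV_* access to the emulated timer
 * rather than letting it reach hardware or the NV2 backing page.
 * Returns false when the trapped encoding is not a virtual timer
 * register, so other handlers can have a go.
 */
static bool sketch_handle_vtimer_trap(struct sketch_vtimer *t,
				      enum sketch_vtimer_reg reg,
				      bool is_write, uint64_t *val)
{
	switch (reg) {
	case SKETCH_CNTV_CTL:
		if (is_write)
			t->ctl = *val;
		else
			*val = t->ctl;
		return true;
	case SKETCH_CNTV_CVAL:
		if (is_write)
			t->cval = *val;
		else
			*val = t->cval;
		return true;
	default:
		return false;
	}
}

In the real thing the emulated timer state also has to feed back into
interrupt injection, which is what makes faithful emulation preferable
to the memory-backed NV2 view.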


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support
  2023-06-07 16:40             ` Eric Auger
@ 2023-06-10  8:25               ` Miguel Luis
  2023-06-11 16:28                 ` Eric Auger
  0 siblings, 1 reply; 95+ messages in thread
From: Miguel Luis @ 2023-06-10  8:25 UTC (permalink / raw)
  To: Eric Auger
  Cc: Marc Zyngier, kvmarm, kvm, linux-arm-kernel, Alexandru Elisei,
	Andre Przywara, Chase Conklin, Christoffer Dall,
	Ganapatrao Kulkarni, Darren Hart, Jintack Lim, Russell King,
	James Morse, Suzuki K Poulose, Oliver Upton, Zenghui Yu

Hi Eric,

> On 7 Jun 2023, at 16:40, Eric Auger <eauger@redhat.com> wrote:
> 
> Hi Miguel,
> 
> On 6/6/23 19:52, Miguel Luis wrote:
>> Hello Eric, Marc,
>> 
>>> On 6 Jun 2023, at 09:33, Eric Auger <eauger@redhat.com> wrote:
>>> 
>>> Hi Marc,
>>> 
>>> On 5/17/23 16:12, Marc Zyngier wrote:
>>>> On Wed, 17 May 2023 09:59:45 +0100,
>>>> Eric Auger <eauger@redhat.com> wrote:
>>>>> 
>>>>> Hi Marc,
>>>>> On 5/16/23 22:28, Marc Zyngier wrote:
>>>>>> On Tue, 16 May 2023 17:53:14 +0100,
>>>>>> Eric Auger <eauger@redhat.com> wrote:
>>>>>>> 
>>>>>>> Hi Marc,
>>>>>>> 
>>>>>>> On 5/15/23 19:30, Marc Zyngier wrote:
>>>>>>>> This is the 4th drop of NV support on arm64 for this year.
>>>>>>>> 
>>>>>>>> For the previous episodes, see [1].
>>>>>>>> 
>>>>>>>> What's changed:
>>>>>>>> 
>>>>>>>> - New framework to track system register traps that are reinjected in
>>>>>>>> guest EL2. It is expected to replace the discrete handling we have
>>>>>>>> enjoyed so far, which didn't scale at all. This has already fixed a
>>>>>>>> number of bugs that were hidden (a bunch of traps were never
>>>>>>>> forwarded...). Still a work in progress, but this is going in the
>>>>>>>> right direction.
>>>>>>>> 
>>>>>>>> - Allow the L1 hypervisor to have a S2 that has an input larger than
>>>>>>>> the L0 IPA space. This fixes a number of subtle issues, depending on
>>>>>>>> how the initial guest was created.
>>>>>>>> 
>>>>>>>> - Consequently, the patch series has gone longer again. Boo. But
>>>>>>>> hopefully some of it is easier to review...
>>>>>>>> 
>>>>>>>> [1] https://lore.kernel.org/r/20230405154008.3552854-1-maz@kernel.org
>>>>>>> 
>>>>>>> I have started testing this and when booting my fedora guest I get
>>>>>>> 
>>>>>>> [  151.796544] kvm [7617]: Unsupported guest sys_reg access at:
>>>>>>> 23f425fd0 [80000209]
>>>>>>> [  151.796544]  { Op0( 3), Op1( 3), CRn(14), CRm( 3), Op2( 1), func_write },
>>>>>>> 
>>>>>>> as soon as the host has kvm-arm.mode=nested
>>>>>>> 
>>>>>>> This seems to be triggered very early by EDK2
>>>>>>> (ArmPkg/Drivers/TimerDxe/TimerDxe.c).
>>>>>>> 
>>>>>>> If I am not wrong this CNTV_CTL_EL0. Do you have any idea?
>>>>>> 
>>>>>> So here's my current analysis:
>>>>>> 
>>>>>> I assume you are running EDK2 as the L1 guest in a nested
>>>>>> configuration. I also assume that you are not running on an Apple
>>>>>> CPU. If these assumptions are correct, then EDK2 runs at vEL2, and is
>>>>>> in nVHE mode.
>>>>>> 
>>>>>> Finally, I'm going to assume that your implementation has FEAT_ECV and
>>>>>> FEAT_NV2, because I can't see how it could fail otherwise.
>>>>> all the above is correct.
>>>>>> 
>>>>>> In these precise conditions, KVM sets the CNTHCTL_EL2.EL1TVT bit so
>>>>>> that we can trap the EL0 virtual timer and faithfully emulate it (it
>>>>>> is otherwise written to memory, which isn't very helpful).
>>>>> 
>>>>> indeed
>>>>>> 
>>>>>> As it turns out, we don't handle these traps. I didn't spot it because
>>>>>> my test machines are all Apple boxes that don't have a nVHE mode, so
>>>>>> nothing on the nVHE path is getting *ANY* coverage. Hint: having
>>>>>> access to such a machine would help (shipping address on request!).
>>>>>> Otherwise, I'll eventually kill the nVHE support altogether.
>>>>>> 
>>>>>> I have written the following patch, which compiles, but that I cannot
>>>>>> test with my current setup. Could you please give it a go?
>>>>> 
>>>>> with the patch below, my guest boots nicely. You did it great on the 1st
>>>>> shot!!! So this fixes my issue. I will continue testing the v10.
>>>> 
>>>> Thanks a lot for reporting the issue and testing my hacks. I'll
>>>> eventually fold it into the rest of the series.
>>>> 
>>>> By the way, what are you using as your VMM? I'd really like to
>>>> reproduce your setup.
>>> Sorry I missed your reply. I am using libvirt + qemu (feat Miguel's RFC)
>>> and fedora L1 guest.
>>> 
>> 
>> Following this subject, I’ve forward ported Alexandru’s KUT patches
>> ( and I encourage others to do it also =) ) which expose an EL2 test that
> 
> Do you have a branch available with Alexandru's rebased kut series?

Now I do :)

https://github.com/mluis/kvm-unit-tests/tree/nv-WIP

Thanks,
Miguel

> 
> Thanks
> 
> Eric
>> does three checks:
>> 
>> - whether VHE is supported and enabled
>> - disable VHE
>> - re-enable VHE  
>> 
>> I’m running qemu with virtualization=on as well to run this test and it is passing although
>> problems seem to happen when running with virtualization=off, which I’m still looking into it.
>> 
>> Thanks
>> Miguel
>> 
>>> Thanks to your fix, this boots fine. But at the moment it does not
>>> reboot and hangs in edk2 I think. Unfortunately this time I have no
>>> trace on host :-( While looking at your series I will add some traces.
>>> 
>>> Eric
>>>> 
>>>> Cheers,
>>>> 
>>>> M.



^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support
  2023-06-10  8:25               ` Miguel Luis
@ 2023-06-11 16:28                 ` Eric Auger
  0 siblings, 0 replies; 95+ messages in thread
From: Eric Auger @ 2023-06-11 16:28 UTC (permalink / raw)
  To: Miguel Luis
  Cc: Marc Zyngier, kvmarm, kvm, linux-arm-kernel, Alexandru Elisei,
	Andre Przywara, Chase Conklin, Christoffer Dall,
	Ganapatrao Kulkarni, Darren Hart, Jintack Lim, Russell King,
	James Morse, Suzuki K Poulose, Oliver Upton, Zenghui Yu

Hi Miguel,

On 6/10/23 10:25, Miguel Luis wrote:
> Hi Eric,
> 
>> On 7 Jun 2023, at 16:40, Eric Auger <eauger@redhat.com> wrote:
>>
>> Hi Miguel,
>>
>> On 6/6/23 19:52, Miguel Luis wrote:
>>> Hello Eric, Marc,
>>>
>>>> On 6 Jun 2023, at 09:33, Eric Auger <eauger@redhat.com> wrote:
>>>>
>>>> Hi Marc,
>>>>
>>>> On 5/17/23 16:12, Marc Zyngier wrote:
>>>>> On Wed, 17 May 2023 09:59:45 +0100,
>>>>> Eric Auger <eauger@redhat.com> wrote:
>>>>>>
>>>>>> Hi Marc,
>>>>>> On 5/16/23 22:28, Marc Zyngier wrote:
>>>>>>> On Tue, 16 May 2023 17:53:14 +0100,
>>>>>>> Eric Auger <eauger@redhat.com> wrote:
>>>>>>>>
>>>>>>>> Hi Marc,
>>>>>>>>
>>>>>>>> On 5/15/23 19:30, Marc Zyngier wrote:
>>>>>>>>> This is the 4th drop of NV support on arm64 for this year.
>>>>>>>>>
>>>>>>>>> For the previous episodes, see [1].
>>>>>>>>>
>>>>>>>>> What's changed:
>>>>>>>>>
>>>>>>>>> - New framework to track system register traps that are reinjected in
>>>>>>>>> guest EL2. It is expected to replace the discrete handling we have
>>>>>>>>> enjoyed so far, which didn't scale at all. This has already fixed a
>>>>>>>>> number of bugs that were hidden (a bunch of traps were never
>>>>>>>>> forwarded...). Still a work in progress, but this is going in the
>>>>>>>>> right direction.
>>>>>>>>>
>>>>>>>>> - Allow the L1 hypervisor to have a S2 that has an input larger than
>>>>>>>>> the L0 IPA space. This fixes a number of subtle issues, depending on
>>>>>>>>> how the initial guest was created.
>>>>>>>>>
>>>>>>>>> - Consequently, the patch series has gone longer again. Boo. But
>>>>>>>>> hopefully some of it is easier to review...
>>>>>>>>>
>>>>>>>>> [1] https://lore.kernel.org/r/20230405154008.3552854-1-maz@kernel.org
>>>>>>>>
>>>>>>>> I have started testing this and when booting my fedora guest I get
>>>>>>>>
>>>>>>>> [  151.796544] kvm [7617]: Unsupported guest sys_reg access at:
>>>>>>>> 23f425fd0 [80000209]
>>>>>>>> [  151.796544]  { Op0( 3), Op1( 3), CRn(14), CRm( 3), Op2( 1), func_write },
>>>>>>>>
>>>>>>>> as soon as the host has kvm-arm.mode=nested
>>>>>>>>
>>>>>>>> This seems to be triggered very early by EDK2
>>>>>>>> (ArmPkg/Drivers/TimerDxe/TimerDxe.c).
>>>>>>>>
>>>>>>>> If I am not wrong this CNTV_CTL_EL0. Do you have any idea?
>>>>>>>
>>>>>>> So here's my current analysis:
>>>>>>>
>>>>>>> I assume you are running EDK2 as the L1 guest in a nested
>>>>>>> configuration. I also assume that you are not running on an Apple
>>>>>>> CPU. If these assumptions are correct, then EDK2 runs at vEL2, and is
>>>>>>> in nVHE mode.
>>>>>>>
>>>>>>> Finally, I'm going to assume that your implementation has FEAT_ECV and
>>>>>>> FEAT_NV2, because I can't see how it could fail otherwise.
>>>>>> all the above is correct.
>>>>>>>
>>>>>>> In these precise conditions, KVM sets the CNTHCTL_EL2.EL1TVT bit so
>>>>>>> that we can trap the EL0 virtual timer and faithfully emulate it (it
>>>>>>> is otherwise written to memory, which isn't very helpful).
>>>>>>
>>>>>> indeed
>>>>>>>
>>>>>>> As it turns out, we don't handle these traps. I didn't spot it because
>>>>>>> my test machines are all Apple boxes that don't have a nVHE mode, so
>>>>>>> nothing on the nVHE path is getting *ANY* coverage. Hint: having
>>>>>>> access to such a machine would help (shipping address on request!).
>>>>>>> Otherwise, I'll eventually kill the nVHE support altogether.
>>>>>>>
>>>>>>> I have written the following patch, which compiles, but that I cannot
>>>>>>> test with my current setup. Could you please give it a go?
>>>>>>
>>>>>> with the patch below, my guest boots nicely. You did it great on the 1st
>>>>>> shot!!! So this fixes my issue. I will continue testing the v10.
>>>>>
>>>>> Thanks a lot for reporting the issue and testing my hacks. I'll
>>>>> eventually fold it into the rest of the series.
>>>>>
>>>>> By the way, what are you using as your VMM? I'd really like to
>>>>> reproduce your setup.
>>>> Sorry I missed your reply. I am using libvirt + qemu (feat Miguel's RFC)
>>>> and fedora L1 guest.
>>>>
>>>
>>> Following this subject, I’ve forward ported Alexandru’s KUT patches
>>> ( and I encourage others to do it also =) ) which expose an EL2 test that
>>
>> Do you have a branch available with Alexandru's rebased kut series?
> 
> Now I do :)
> 
> https://github.com/mluis/kvm-unit-tests/tree/nv-WIP

Cool, thanks!

Eric
> 
> Thanks,
> Miguel
> 
>>
>> Thanks
>>
>> Eric
>>> does three checks:
>>>
>>> - whether VHE is supported and enabled
>>> - disable VHE
>>> - re-enable VHE  
>>>
>>> I’m running qemu with virtualization=on as well to run this test and it is passing although
>>> problems seem to happen when running with virtualization=off, which I’m still looking into it.
>>>
>>> Thanks
>>> Miguel
>>>
>>>> Thanks to your fix, this boots fine. But at the moment it does not
>>>> reboot and hangs in edk2 I think. Unfortunately this time I have no
>>>> trace on host :-( While looking at your series I will add some traces.
>>>>
>>>> Eric
>>>>>
>>>>> Cheers,
>>>>>
>>>>> M.
> 
> 


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH v10 07/59] arm64: Add AT operation encodings
  2023-05-15 17:30 ` [PATCH v10 07/59] arm64: Add AT " Marc Zyngier
@ 2023-06-14 15:08   ` Eric Auger
  0 siblings, 0 replies; 95+ messages in thread
From: Eric Auger @ 2023-06-14 15:08 UTC (permalink / raw)
  To: Marc Zyngier, kvmarm, kvm, linux-arm-kernel
  Cc: Alexandru Elisei, Andre Przywara, Chase Conklin,
	Christoffer Dall, Ganapatrao Kulkarni, Darren Hart, Jintack Lim,
	Russell King, Miguel Luis, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu

Hi Marc,

On 5/15/23 19:30, Marc Zyngier wrote:
> Add the encodings for the AT operation that are usable from NS.
> 
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/include/asm/sysreg.h | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
> index 2727e68dd65b..8be78a9b9b3b 100644
> --- a/arch/arm64/include/asm/sysreg.h
> +++ b/arch/arm64/include/asm/sysreg.h
> @@ -568,6 +568,23 @@
>  
>  #define SYS_SP_EL2			sys_reg(3, 6,  4, 1, 0)
>  
> +/* AT instructions */
> +#define AT_Op0 1
> +#define AT_CRn 7
> +
> +#define OP_AT_S1E1R	sys_insn(AT_Op0, 0, AT_CRn, 8, 0)
> +#define OP_AT_S1E1W	sys_insn(AT_Op0, 0, AT_CRn, 8, 1)
> +#define OP_AT_S1E0R	sys_insn(AT_Op0, 0, AT_CRn, 8, 2)
> +#define OP_AT_S1E0W	sys_insn(AT_Op0, 0, AT_CRn, 8, 3)
> +#define OP_AT_S1E1RP	sys_insn(AT_Op0, 0, AT_CRn, 9, 0)
> +#define OP_AT_S1E1WP	sys_insn(AT_Op0, 0, AT_CRn, 9, 1)
> +#define OP_AT_S1E2R	sys_insn(AT_Op0, 4, AT_CRn, 8, 0)
> +#define OP_AT_S1E2W	sys_insn(AT_Op0, 4, AT_CRn, 8, 1)
> +#define OP_AT_S12E1R	sys_insn(AT_Op0, 4, AT_CRn, 8, 4)
> +#define OP_AT_S12E1W	sys_insn(AT_Op0, 4, AT_CRn, 8, 5)
> +#define OP_AT_S12E0R	sys_insn(AT_Op0, 4, AT_CRn, 8, 6)
> +#define OP_AT_S12E0W	sys_insn(AT_Op0, 4, AT_CRn, 8, 7)
> +
>  /* TLBI instructions */
>  #define OP_TLBI_VMALLE1OS		sys_insn(1, 0, 8, 1, 0)
>  #define OP_TLBI_VAE1OS			sys_insn(1, 0, 8, 1, 1)

Reviewed-by: Eric Auger <eric.auger@redhat.com>

Eric
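As an aside for readers decoding these values: sys_insn() (an alias for
sys_reg() in asm/sysreg.h) just packs the five operands into a single
32-bit key using the Op0/Op1/CRn/CRm/Op2 shifts. A standalone sketch of
the packing, with the shift constants restated here from my reading of
asm/sysreg.h rather than quoted from the patch:

#include <stdint.h>
#include <stdio.h>

#define SKETCH_Op0_shift	19
#define SKETCH_Op1_shift	16
#define SKETCH_CRn_shift	12
#define SKETCH_CRm_shift	8
#define SKETCH_Op2_shift	5

#define sketch_sys_insn(op0, op1, crn, crm, op2)		\
	(((uint32_t)(op0) << SKETCH_Op0_shift) |		\
	 ((uint32_t)(op1) << SKETCH_Op1_shift) |		\
	 ((uint32_t)(crn) << SKETCH_CRn_shift) |		\
	 ((uint32_t)(crm) << SKETCH_CRm_shift) |		\
	 ((uint32_t)(op2) << SKETCH_Op2_shift))

int main(void)
{
	/* OP_AT_S1E2R from the hunk above: op0=1, op1=4, CRn=7, CRm=8, op2=0 */
	printf("OP_AT_S1E2R = %#x\n", sketch_sys_insn(1, 4, 7, 8, 0));
	return 0;
}

This lets a trapped operation's decoded Op0/Op1/CRn/CRm/Op2 fields be
matched against these constants with a single integer compare.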


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH v10 08/59] KVM: arm64: Add missing HCR_EL2 trap bits
  2023-05-15 17:30 ` [PATCH v10 08/59] KVM: arm64: Add missing HCR_EL2 trap bits Marc Zyngier
@ 2023-06-14 15:08   ` Eric Auger
  0 siblings, 0 replies; 95+ messages in thread
From: Eric Auger @ 2023-06-14 15:08 UTC (permalink / raw)
  To: Marc Zyngier, kvmarm, kvm, linux-arm-kernel
  Cc: Alexandru Elisei, Andre Przywara, Chase Conklin,
	Christoffer Dall, Ganapatrao Kulkarni, Darren Hart, Jintack Lim,
	Russell King, Miguel Luis, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu

Hi Marc,

On 5/15/23 19:30, Marc Zyngier wrote:
> We're still missing a handful of HCR_EL2 trap bits. Add them.
> 
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/include/asm/kvm_arm.h | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
> index 209a4fba5d2a..4b3e55abb30f 100644
> --- a/arch/arm64/include/asm/kvm_arm.h
> +++ b/arch/arm64/include/asm/kvm_arm.h
> @@ -17,9 +17,19 @@
>  #define HCR_DCT		(UL(1) << 57)
>  #define HCR_ATA_SHIFT	56
>  #define HCR_ATA		(UL(1) << HCR_ATA_SHIFT)
> +#define HCR_TTLBOS	(UL(1) << 55)
> +#define HCR_TTLBIS	(UL(1) << 54)
> +#define HCR_ENSCXT	(UL(1) << 53)
> +#define HCR_TOCU	(UL(1) << 52)
>  #define HCR_AMVOFFEN	(UL(1) << 51)
> +#define HCR_TICAB	(UL(1) << 50)
> +#define HCR_TID4	(UL(1) << 49)
>  #define HCR_FIEN	(UL(1) << 47)
>  #define HCR_FWB		(UL(1) << 46)
> +#define HCR_NV2		(UL(1) << 45)
> +#define HCR_AT		(UL(1) << 44)
> +#define HCR_NV1		(UL(1) << 43)
> +#define HCR_NV		(UL(1) << 42)
>  #define HCR_API		(UL(1) << 41)
>  #define HCR_APK		(UL(1) << 40)
>  #define HCR_TEA		(UL(1) << 37)
Reviewed-by: Eric Auger <eric.auger@redhat.com>

Eric
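For context on how the NV series consumes bits like these: when a
nested guest traps, L0 checks whether the L1 hypervisor asked for that
trap in its (in-memory) view of HCR_EL2, and forwards the exception if
so. A toy sketch of that kind of check (the helper name and calling
convention are mine, not the series'):

#include <stdint.h>
#include <stdbool.h>

#define SKETCH_HCR_TTLBIS	(UINT64_C(1) << 54)

/*
 * Toy decision: should a trapped inner-shareable TLBI be forwarded to
 * the L1 hypervisor? With NV it should whenever L1 set HCR_EL2.TTLBIS
 * in its own copy of the register.
 */
static bool sketch_forward_tlbi_is(uint64_t guest_hcr_el2)
{
	return (guest_hcr_el2 & SKETCH_HCR_TTLBIS) != 0;
}

The trap forwarding infrastructure reviewed later in this thread
generalises exactly this pattern into data-driven tables instead of
open-coded checks.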


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support
  2023-05-15 17:30 [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support Marc Zyngier
                   ` (60 preceding siblings ...)
  2023-06-05 11:28 ` Eric Auger
@ 2023-06-28  6:45 ` Ganapatrao Kulkarni
  2023-06-29  7:03   ` Marc Zyngier
  61 siblings, 1 reply; 95+ messages in thread
From: Ganapatrao Kulkarni @ 2023-06-28  6:45 UTC (permalink / raw)
  To: Marc Zyngier, kvmarm, kvm, linux-arm-kernel
  Cc: Alexandru Elisei, Andre Przywara, Chase Conklin,
	Christoffer Dall, Darren Hart, Jintack Lim, Russell King,
	Miguel Luis, James Morse, Suzuki K Poulose, Oliver Upton,
	Zenghui Yu, Eric Auger


Hi Marc,


On 15-05-2023 11:00 pm, Marc Zyngier wrote:
> This is the 4th drop of NV support on arm64 for this year.
> 
> For the previous episodes, see [1].
> 
> What's changed:
> 
> - New framework to track system register traps that are reinjected in
>    guest EL2. It is expected to replace the discrete handling we have
>    enjoyed so far, which didn't scale at all. This has already fixed a
>    number of bugs that were hidden (a bunch of traps were never
>    forwarded...). Still a work in progress, but this is going in the
>    right direction.
> 
> - Allow the L1 hypervisor to have a S2 that has an input larger than
>    the L0 IPA space. This fixes a number of subtle issues, depending on
>    how the initial guest was created.
> 
> - Consequently, the patch series has gone longer again. Boo. But
>    hopefully some of it is easier to review...
> 

I am facing an issue booting the NestedVM with the v9 as well as the
v10 patchset.

I have tried v9/v10 on an Ampere platform using kvmtool and I could boot
the Guest-Hypervisor and then the NestedVM without any issue.
However, when I try to boot using QEMU (not using EDK2/EFI), the
Guest-Hypervisor boots Fedora 37 from a virtio disk, but if I then try
to boot the NestedVM from the Guest-Hypervisor console (or an ssh
shell), the boot hangs at a very early stage.

I did some debugging using ftrace, and it seems the Guest-Hypervisor is
getting a very high rate of arch-timer interrupts; because of that, all
CPU time is spent servicing the Guest-Hypervisor and execution never
goes back to the NestedVM.

I am using vanilla QEMU v7.2.0 with the NV top-up patches [1]

[1] https://lore.kernel.org/all/20230227163718.62003-1-miguel.luis@oracle.com/

> [1] https://lore.kernel.org/r/20230405154008.3552854-1-maz@kernel.org
> 
> Andre Przywara (1):
>    KVM: arm64: nv: vgic: Allow userland to set VGIC maintenance IRQ
> 
> Christoffer Dall (5):
>    KVM: arm64: nv: Trap EL1 VM register accesses in virtual EL2
>    KVM: arm64: nv: Implement nested Stage-2 page table walk logic
>    KVM: arm64: nv: Unmap/flush shadow stage 2 page tables
>    KVM: arm64: nv: vgic: Emulate the HW bit in software
>    KVM: arm64: nv: Sync nested timer state with FEAT_NV2
> 
> Jintack Lim (7):
>    KVM: arm64: nv: Trap CPACR_EL1 access in virtual EL2
>    KVM: arm64: nv: Respect virtual HCR_EL2.TWX setting
>    KVM: arm64: nv: Respect virtual CPTR_EL2.{TFP,FPEN} settings
>    KVM: arm64: nv: Respect virtual HCR_EL2.{NV,TSC) settings
>    KVM: arm64: nv: Configure HCR_EL2 for nested virtualization
>    KVM: arm64: nv: Trap and emulate TLBI instructions from virtual EL2
>    KVM: arm64: nv: Nested GICv3 Support
> 
> Marc Zyngier (46):
>    KVM: arm64: Move VTCR_EL2 into struct s2_mmu
>    arm64: Add missing Set/Way CMO encodings
>    arm64: Add missing VA CMO encodings
>    arm64: Add missing ERXMISCx_EL1 encodings
>    arm64: Add missing DC ZVA/GVA/GZVA encodings
>    arm64: Add TLBI operation encodings
>    arm64: Add AT operation encodings
>    KVM: arm64: Add missing HCR_EL2 trap bits
>    KVM: arm64: nv: Add trap forwarding infrastructure
>    KVM: arm64: nv: Add trap forwarding for HCR_EL2
>    KVM: arm64: nv: Expose FEAT_EVT to nested guests
>    KVM: arm64: nv: Add trap forwarding for MDCR_EL2
>    KVM: arm64: nv: Add trap forwarding for CNTHCTL_EL2
>    KVM: arm64: nv: Add non-VHE-EL2->EL1 translation helpers
>    KVM: arm64: nv: Handle virtual EL2 registers in
>      vcpu_read/write_sys_reg()
>    KVM: arm64: nv: Handle SPSR_EL2 specially
>    KVM: arm64: nv: Handle HCR_EL2.E2H specially
>    KVM: arm64: nv: Save/Restore vEL2 sysregs
>    KVM: arm64: nv: Support multiple nested Stage-2 mmu structures
>    KVM: arm64: nv: Handle shadow stage 2 page faults
>    KVM: arm64: nv: Restrict S2 RD/WR permissions to match the guest's
>    KVM: arm64: nv: Set a handler for the system instruction traps
>    KVM: arm64: nv: Trap and emulate AT instructions from virtual EL2
>    KVM: arm64: nv: Fold guest's HCR_EL2 configuration into the host's
>    KVM: arm64: nv: Hide RAS from nested guests
>    KVM: arm64: nv: Add handling of EL2-specific timer registers
>    KVM: arm64: nv: Load timer before the GIC
>    KVM: arm64: nv: Don't load the GICv4 context on entering a nested
>      guest
>    KVM: arm64: nv: Implement maintenance interrupt forwarding
>    KVM: arm64: nv: Deal with broken VGIC on maintenance interrupt
>      delivery
>    KVM: arm64: nv: Allow userspace to request KVM_ARM_VCPU_NESTED_VIRT
>    KVM: arm64: nv: Add handling of FEAT_TTL TLB invalidation
>    KVM: arm64: nv: Invalidate TLBs based on shadow S2 TTL-like
>      information
>    KVM: arm64: nv: Tag shadow S2 entries with nested level
>    KVM: arm64: nv: Add include containing the VNCR_EL2 offsets
>    KVM: arm64: nv: Map VNCR-capable registers to a separate page
>    KVM: arm64: nv: Move nested vgic state into the sysreg file
>    KVM: arm64: Add FEAT_NV2 cpu feature
>    KVM: arm64: nv: Fold GICv3 host trapping requirements into guest setup
>    KVM: arm64: nv: Publish emulated timer interrupt state in the
>      in-memory state
>    KVM: arm64: nv: Allocate VNCR page when required
>    KVM: arm64: nv: Enable ARMv8.4-NV support
>    KVM: arm64: nv: Fast-track 'InHost' exception returns
>    KVM: arm64: nv: Fast-track EL1 TLBIs for VHE guests
>    KVM: arm64: nv: Use FEAT_ECV to trap access to EL0 timers
>    KVM: arm64: nv: Accelerate EL0 timer read accesses when FEAT_ECV is on
> 
>   .../virt/kvm/devices/arm-vgic-v3.rst          |  12 +-
>   arch/arm64/include/asm/esr.h                  |   1 +
>   arch/arm64/include/asm/kvm_arm.h              |  14 +
>   arch/arm64/include/asm/kvm_asm.h              |   4 +
>   arch/arm64/include/asm/kvm_emulate.h          |  93 +-
>   arch/arm64/include/asm/kvm_host.h             | 181 +++-
>   arch/arm64/include/asm/kvm_hyp.h              |   2 +
>   arch/arm64/include/asm/kvm_mmu.h              |  20 +-
>   arch/arm64/include/asm/kvm_nested.h           | 133 +++
>   arch/arm64/include/asm/stage2_pgtable.h       |   4 +-
>   arch/arm64/include/asm/sysreg.h               | 196 ++++
>   arch/arm64/include/asm/vncr_mapping.h         |  74 ++
>   arch/arm64/include/uapi/asm/kvm.h             |   1 +
>   arch/arm64/kernel/cpufeature.c                |  11 +
>   arch/arm64/kvm/Makefile                       |   4 +-
>   arch/arm64/kvm/arch_timer.c                   |  98 +-
>   arch/arm64/kvm/arm.c                          |  33 +-
>   arch/arm64/kvm/at.c                           | 219 ++++
>   arch/arm64/kvm/emulate-nested.c               | 934 ++++++++++++++++-
>   arch/arm64/kvm/handle_exit.c                  |  29 +-
>   arch/arm64/kvm/hyp/include/hyp/switch.h       |   8 +-
>   arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h    |   5 +-
>   arch/arm64/kvm/hyp/nvhe/mem_protect.c         |   8 +-
>   arch/arm64/kvm/hyp/nvhe/pkvm.c                |   4 +-
>   arch/arm64/kvm/hyp/nvhe/switch.c              |   2 +-
>   arch/arm64/kvm/hyp/nvhe/sysreg-sr.c           |   2 +-
>   arch/arm64/kvm/hyp/pgtable.c                  |   2 +-
>   arch/arm64/kvm/hyp/vgic-v3-sr.c               |   6 +-
>   arch/arm64/kvm/hyp/vhe/switch.c               | 206 +++-
>   arch/arm64/kvm/hyp/vhe/sysreg-sr.c            | 124 ++-
>   arch/arm64/kvm/hyp/vhe/tlb.c                  |  83 ++
>   arch/arm64/kvm/mmu.c                          | 255 ++++-
>   arch/arm64/kvm/nested.c                       | 799 ++++++++++++++-
>   arch/arm64/kvm/pkvm.c                         |   2 +-
>   arch/arm64/kvm/reset.c                        |   7 +
>   arch/arm64/kvm/sys_regs.c                     | 958 +++++++++++++++++-
>   arch/arm64/kvm/trace_arm.h                    |  19 +
>   arch/arm64/kvm/vgic/vgic-init.c               |  33 +
>   arch/arm64/kvm/vgic/vgic-kvm-device.c         |  32 +-
>   arch/arm64/kvm/vgic/vgic-v3-nested.c          | 248 +++++
>   arch/arm64/kvm/vgic/vgic-v3.c                 |  43 +-
>   arch/arm64/kvm/vgic/vgic.c                    |  29 +
>   arch/arm64/kvm/vgic/vgic.h                    |  10 +
>   arch/arm64/tools/cpucaps                      |   1 +
>   include/clocksource/arm_arch_timer.h          |   4 +
>   include/kvm/arm_arch_timer.h                  |   1 +
>   include/kvm/arm_vgic.h                        |  17 +
>   include/uapi/linux/kvm.h                      |   1 +
>   tools/arch/arm/include/uapi/asm/kvm.h         |   1 +
>   49 files changed, 4790 insertions(+), 183 deletions(-)
>   create mode 100644 arch/arm64/include/asm/vncr_mapping.h
>   create mode 100644 arch/arm64/kvm/at.c
>   create mode 100644 arch/arm64/kvm/vgic/vgic-v3-nested.c
> 

Thanks,
Ganapat

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support
  2023-06-28  6:45 ` Ganapatrao Kulkarni
@ 2023-06-29  7:03   ` Marc Zyngier
  2023-07-04 12:31     ` Ganapatrao Kulkarni
  2023-07-10 12:56     ` Miguel Luis
  0 siblings, 2 replies; 95+ messages in thread
From: Marc Zyngier @ 2023-06-29  7:03 UTC (permalink / raw)
  To: Ganapatrao Kulkarni, Miguel Luis
  Cc: kvmarm, kvm, linux-arm-kernel, Alexandru Elisei, Andre Przywara,
	Chase Conklin, Christoffer Dall, Darren Hart, Jintack Lim,
	Russell King, James Morse, Suzuki K Poulose, Oliver Upton,
	Zenghui Yu, Eric Auger

Hi Ganapatrao,

On Wed, 28 Jun 2023 07:45:55 +0100,
Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote:
> 
> 
> Hi Marc,
> 
> 
> On 15-05-2023 11:00 pm, Marc Zyngier wrote:
> > This is the 4th drop of NV support on arm64 for this year.
> > 
> > For the previous episodes, see [1].
> > 
> > What's changed:
> > 
> > - New framework to track system register traps that are reinjected in
> >    guest EL2. It is expected to replace the discrete handling we have
> >    enjoyed so far, which didn't scale at all. This has already fixed a
> >    number of bugs that were hidden (a bunch of traps were never
> >    forwarded...). Still a work in progress, but this is going in the
> >    right direction.
> > 
> > - Allow the L1 hypervisor to have a S2 that has an input larger than
> >    the L0 IPA space. This fixes a number of subtle issues, depending on
> >    how the initial guest was created.
> > 
> > - Consequently, the patch series has gone longer again. Boo. But
> >    hopefully some of it is easier to review...
> > 
> 
> I am facing issue in booting NestedVM with V9 as well with 10 patchset.
> 
> I have tried V9/V10 on Ampere platform using kvmtool and I could boot
> Guest-Hypervisor and then NestedVM without any issue.
> However when I try to boot using QEMU(not using EDK2/EFI),
> Guest-Hypervisor is booted with Fedora 37 using virtio disk. From
> Guest-Hypervisor console(or ssh shell), If I try to boot NestedVM,
> boot hangs very early stage of the boot.
> 
> I did some debug using ftrace and it seems the Guest-Hypervisor is
> getting very high rate of arch-timer interrupts,
> due to that all CPU time is going on in serving the Guest-Hypervisor
> and it is never going back to NestedVM.
> 
> I am using QEMU vanilla version v7.2.0 with top-up patches for NV [1]

So I went ahead and gave QEMU a go. On my systems, *nothing* works: I
cannot even boot an L1 with 'virtualization=on' (the guest is stuck at
the point where virtio gets probed and waits for its first interrupt).

Worse, booting an hVHE guest results in QEMU triggering an assertion as it
tries to inject an interrupt using the QEMU GICv3 model, something
that should *NEVER* be in use with KVM.

With help from Eric, I got to a point where the hVHE guest could boot
as long as I kept injecting console interrupts, which is again a
symptom of the vGIC not being used.

So something is *majorly* wrong with the QEMU patches. I don't know
what makes it possible for you to even boot the L1 - if the GIC is
external, injecting an interrupt in the L2 is simply impossible.

Miguel, can you please investigate this?

In the meantime, I'll add some code to the kernel side to refuse the
external interrupt controller configuration with NV. Hopefully that
will lead to some clues about what is going on.
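The shape of that kernel-side check is small; a sketch, assuming KVM's
existing vcpu_has_nv() and irqchip_in_kernel() helpers, with the
placement and error code being my choice rather than the eventual
patch's:

/*
 * Sketch: an NV-enabled vcpu cannot work with a userspace/external
 * interrupt controller, since L0 must be able to inject interrupts
 * into L2 via the in-kernel vGIC. Refuse the configuration early.
 */
static int sketch_nv_requires_vgic(struct kvm_vcpu *vcpu)
{
	if (vcpu_has_nv(vcpu) && !irqchip_in_kernel(vcpu->kvm))
		return -EINVAL;

	return 0;
}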

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support
  2023-06-29  7:03   ` Marc Zyngier
@ 2023-07-04 12:31     ` Ganapatrao Kulkarni
  2023-07-07  9:46       ` Ganapatrao Kulkarni
  2023-07-10 12:56     ` Miguel Luis
  1 sibling, 1 reply; 95+ messages in thread
From: Ganapatrao Kulkarni @ 2023-07-04 12:31 UTC (permalink / raw)
  To: Marc Zyngier, Miguel Luis, Eric Auger
  Cc: kvmarm, kvm, linux-arm-kernel, Alexandru Elisei, Andre Przywara,
	Chase Conklin, Christoffer Dall, Darren Hart, Jintack Lim,
	Russell King, James Morse, Suzuki K Poulose, Oliver Upton,
	Zenghui Yu


Hi Marc,

On 29-06-2023 12:33 pm, Marc Zyngier wrote:
> Hi Ganapatrao,
> 
> On Wed, 28 Jun 2023 07:45:55 +0100,
> Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote:
>>
>>
>> Hi Marc,
>>
>>
>> On 15-05-2023 11:00 pm, Marc Zyngier wrote:
>>> This is the 4th drop of NV support on arm64 for this year.
>>>
>>> For the previous episodes, see [1].
>>>
>>> What's changed:
>>>
>>> - New framework to track system register traps that are reinjected in
>>>     guest EL2. It is expected to replace the discrete handling we have
>>>     enjoyed so far, which didn't scale at all. This has already fixed a
>>>     number of bugs that were hidden (a bunch of traps were never
>>>     forwarded...). Still a work in progress, but this is going in the
>>>     right direction.
>>>
>>> - Allow the L1 hypervisor to have a S2 that has an input larger than
>>>     the L0 IPA space. This fixes a number of subtle issues, depending on
>>>     how the initial guest was created.
>>>
>>> - Consequently, the patch series has gone longer again. Boo. But
>>>     hopefully some of it is easier to review...
>>>
>>
>> I am facing issue in booting NestedVM with V9 as well with 10 patchset.
>>
>> I have tried V9/V10 on Ampere platform using kvmtool and I could boot
>> Guest-Hypervisor and then NestedVM without any issue.
>> However when I try to boot using QEMU(not using EDK2/EFI),
>> Guest-Hypervisor is booted with Fedora 37 using virtio disk. From
>> Guest-Hypervisor console(or ssh shell), If I try to boot NestedVM,
>> boot hangs very early stage of the boot.
>>
>> I did some debug using ftrace and it seems the Guest-Hypervisor is
>> getting very high rate of arch-timer interrupts,
>> due to that all CPU time is going on in serving the Guest-Hypervisor
>> and it is never going back to NestedVM.
>>
>> I am using QEMU vanilla version v7.2.0 with top-up patches for NV [1]
> 
> So I went ahead and gave QEMU a go. On my systems, *nothing* works (I
> cannot even boot a L1 with 'virtualization=on" (the guest is stuck at
> the point where virtio gets probed and waits for its first interrupt).
> 
> Worse, booting a hVHE guest results in QEMU generating an assert as it
> tries to inject an interrupt using the QEMU GICv3 model, something
> that should *NEVER* be in use with KVM.
> 
> With help from Eric, I got to a point where the hVHE guest could boot
> as long as I kept injecting console interrupts, which is again a
> symptom of the vGIC not being used.
> 
> So something is *majorly* wrong with the QEMU patches. I don't know
> what makes it possible for you to even boot the L1 - if the GIC is
> external, injecting an interrupt in the L2 is simply impossible.
> 
> Miguel, can you please investigate this?
> 
> In the meantime, I'll add some code to the kernel side to refuse the
> external interrupt controller configuration with NV. Hopefully that
> will lead to some clues about what is going on.

Continuing to debug the issue, it seems the endless ptimer interrupts
on the Ampere platform are due to the ptimer's CVAL getting messed up,
so the interrupt condition is always met as soon as the timer is
enabled.

I see timer_set_offset() being called from kvm_arm_timer_set_reg() in
the QEMU case, but there are no such calls during a kvmtool boot.

If I comment out the timer_set_offset() calls in kvm_arm_timer_set_reg(),
then I can boot the Guest-Hypervisor and then the NestedVM from GH/L1.

I also observed that in the QEMU case, kvm_arm_timer_set_reg() is called
to set CNT, CVAL and CTL of both the vtimer and the ptimer.
I am not sure why QEMU is setting these registers explicitly; I need to
dig further.
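For what it's worth, the offset movement follows directly from the
semantics of setting CNT: the guest's view of the counter is the
physical counter minus a per-timer offset, so the only way the kernel
can honour a "write CNT" request is to recompute that offset. A sketch
of the idea (not the exact kvm_arm_timer_set_reg() code; the helper is
hypothetical):

#include <stdint.h>

struct sketch_timer_ctx {
	uint64_t offset;	/* guest_cnt = phys_cnt - offset */
};

/* Hypothetical stand-in for reading the host physical counter. */
extern uint64_t sketch_read_phys_counter(void);

/*
 * A userspace CNT write is implemented by recomputing the offset. A
 * stale or unintended counter value pushed in by the VMM therefore
 * moves the offset, and a guest counter that jumps forward past CVAL
 * makes the timer condition true immediately, which matches the
 * interrupt storm seen here.
 */
static void sketch_timer_set_cnt(struct sketch_timer_ctx *t,
				 uint64_t new_guest_cnt)
{
	t->offset = sketch_read_phys_counter() - new_guest_cnt;
}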

> 
> Thanks,
> 
> 	M.
> 

Thanks,
Ganapat


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support
  2023-07-04 12:31     ` Ganapatrao Kulkarni
@ 2023-07-07  9:46       ` Ganapatrao Kulkarni
  2023-07-11 11:56         ` Ganapatrao Kulkarni
  0 siblings, 1 reply; 95+ messages in thread
From: Ganapatrao Kulkarni @ 2023-07-07  9:46 UTC (permalink / raw)
  To: Marc Zyngier, Miguel Luis, Eric Auger
  Cc: kvmarm, kvm, linux-arm-kernel, Alexandru Elisei, Andre Przywara,
	Chase Conklin, Christoffer Dall, Darren Hart, Jintack Lim,
	Russell King, James Morse, Suzuki K Poulose, Oliver Upton,
	Zenghui Yu



On 04-07-2023 06:01 pm, Ganapatrao Kulkarni wrote:
> 
> Hi Marc,
> 
> On 29-06-2023 12:33 pm, Marc Zyngier wrote:
>> Hi Ganapatrao,
>>
>> On Wed, 28 Jun 2023 07:45:55 +0100,
>> Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote:
>>>
>>>
>>> Hi Marc,
>>>
>>>
>>> On 15-05-2023 11:00 pm, Marc Zyngier wrote:
>>>> This is the 4th drop of NV support on arm64 for this year.
>>>>
>>>> For the previous episodes, see [1].
>>>>
>>>> What's changed:
>>>>
>>>> - New framework to track system register traps that are reinjected in
>>>>     guest EL2. It is expected to replace the discrete handling we have
>>>>     enjoyed so far, which didn't scale at all. This has already fixed a
>>>>     number of bugs that were hidden (a bunch of traps were never
>>>>     forwarded...). Still a work in progress, but this is going in the
>>>>     right direction.
>>>>
>>>> - Allow the L1 hypervisor to have a S2 that has an input larger than
>>>>     the L0 IPA space. This fixes a number of subtle issues, 
>>>> depending on
>>>>     how the initial guest was created.
>>>>
>>>> - Consequently, the patch series has gone longer again. Boo. But
>>>>     hopefully some of it is easier to review...
>>>>
>>>
>>> I am facing issue in booting NestedVM with V9 as well with 10 patchset.
>>>
>>> I have tried V9/V10 on Ampere platform using kvmtool and I could boot
>>> Guest-Hypervisor and then NestedVM without any issue.
>>> However when I try to boot using QEMU(not using EDK2/EFI),
>>> Guest-Hypervisor is booted with Fedora 37 using virtio disk. From
>>> Guest-Hypervisor console(or ssh shell), If I try to boot NestedVM,
>>> boot hangs very early stage of the boot.
>>>
>>> I did some debug using ftrace and it seems the Guest-Hypervisor is
>>> getting very high rate of arch-timer interrupts,
>>> due to that all CPU time is going on in serving the Guest-Hypervisor
>>> and it is never going back to NestedVM.
>>>
>>> I am using QEMU vanilla version v7.2.0 with top-up patches for NV [1]
>>
>> So I went ahead and gave QEMU a go. On my systems, *nothing* works (I
>> cannot even boot a L1 with 'virtualization=on" (the guest is stuck at
>> the point where virtio gets probed and waits for its first interrupt).
>>
>> Worse, booting a hVHE guest results in QEMU generating an assert as it
>> tries to inject an interrupt using the QEMU GICv3 model, something
>> that should *NEVER* be in use with KVM.
>>
>> With help from Eric, I got to a point where the hVHE guest could boot
>> as long as I kept injecting console interrupts, which is again a
>> symptom of the vGIC not being used.
>>
>> So something is *majorly* wrong with the QEMU patches. I don't know
>> what makes it possible for you to even boot the L1 - if the GIC is
>> external, injecting an interrupt in the L2 is simply impossible.
>>
>> Miguel, can you please investigate this?
>>
>> In the meantime, I'll add some code to the kernel side to refuse the
>> external interrupt controller configuration with NV. Hopefully that
>> will lead to some clues about what is going on.
> 
> Continued debugging of the issue and it seems the endless ptimer 
> interrupts on Ampere platform is due to some mess up of CVAL of ptimer, 
> resulting in interrupt triggered always when it is enabled.
> 
> I see function "timer_set_offset" called from kvm_arm_timer_set_reg in 
> QEMU case but there is no such calls in kvmtool boot.
> 
> If I comment the timer_set_offset calls in kvm_arm_timer_set_reg 
> function, then I could boot the Guest-Hypervisor then NestedVM from GH/L1.
> 
> I also observed in QEMU case, kvm_arm_timer_set_reg is called to set 
> CNT, CVAL and CTL of both vtimer and ptimer.
> Not sure why QEMU is setting these registers explicitly? need to dig.
> 

I don't see any direct ioctl calls changing any timer registers. It
looks like it is happening from the emulation code (target/arm/helper.c)?

>>
>> Thanks,
>>
>>     M.
>>
> 
> Thanks,
> Ganapat
> 

Thanks,
Ganapat

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support
  2023-06-29  7:03   ` Marc Zyngier
  2023-07-04 12:31     ` Ganapatrao Kulkarni
@ 2023-07-10 12:56     ` Miguel Luis
  2023-07-18 10:29       ` Miguel Luis
  1 sibling, 1 reply; 95+ messages in thread
From: Miguel Luis @ 2023-07-10 12:56 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Ganapatrao Kulkarni, kvmarm, kvm, linux-arm-kernel,
	Alexandru Elisei, Andre Przywara, Chase Conklin,
	Christoffer Dall, Darren Hart, Jintack Lim, Russell King,
	James Morse, Suzuki K Poulose, Oliver Upton, Zenghui Yu,
	Eric Auger

Hi Marc,

> On 29 Jun 2023, at 07:03, Marc Zyngier <maz@kernel.org> wrote:
> 
> Hi Ganapatrao,
> 
> On Wed, 28 Jun 2023 07:45:55 +0100,
> Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote:
>> 
>> 
>> Hi Marc,
>> 
>> 
>> On 15-05-2023 11:00 pm, Marc Zyngier wrote:
>>> This is the 4th drop of NV support on arm64 for this year.
>>> 
>>> For the previous episodes, see [1].
>>> 
>>> What's changed:
>>> 
>>> - New framework to track system register traps that are reinjected in
>>>   guest EL2. It is expected to replace the discrete handling we have
>>>   enjoyed so far, which didn't scale at all. This has already fixed a
>>>   number of bugs that were hidden (a bunch of traps were never
>>>   forwarded...). Still a work in progress, but this is going in the
>>>   right direction.
>>> 
>>> - Allow the L1 hypervisor to have a S2 that has an input larger than
>>>   the L0 IPA space. This fixes a number of subtle issues, depending on
>>>   how the initial guest was created.
>>> 
>>> - Consequently, the patch series has gone longer again. Boo. But
>>>   hopefully some of it is easier to review...
>>> 
>> 
>> I am facing issue in booting NestedVM with V9 as well with 10 patchset.
>> 
>> I have tried V9/V10 on Ampere platform using kvmtool and I could boot
>> Guest-Hypervisor and then NestedVM without any issue.
>> However when I try to boot using QEMU(not using EDK2/EFI),
>> Guest-Hypervisor is booted with Fedora 37 using virtio disk. From
>> Guest-Hypervisor console(or ssh shell), If I try to boot NestedVM,
>> boot hangs very early stage of the boot.
>> 
>> I did some debug using ftrace and it seems the Guest-Hypervisor is
>> getting very high rate of arch-timer interrupts,
>> due to that all CPU time is going on in serving the Guest-Hypervisor
>> and it is never going back to NestedVM.
>> 
>> I am using QEMU vanilla version v7.2.0 with top-up patches for NV [1]
> 
> So I went ahead and gave QEMU a go. On my systems, *nothing* works (I
> cannot even boot a L1 with 'virtualization=on" (the guest is stuck at
> the point where virtio gets probed and waits for its first interrupt).
> 
> Worse, booting a hVHE guest results in QEMU generating an assert as it
> tries to inject an interrupt using the QEMU GICv3 model, something
> that should *NEVER* be in use with KVM.
> 
> With help from Eric, I got to a point where the hVHE guest could boot
> as long as I kept injecting console interrupts, which is again a
> symptom of the vGIC not being used.
> 
> So something is *majorly* wrong with the QEMU patches. I don't know
> what makes it possible for you to even boot the L1 - if the GIC is
> external, injecting an interrupt in the L2 is simply impossible.
> 
> Miguel, can you please investigate this?

Yes, I will investigate it. Sorry for the delay in replying; I took a
break shortly after KVM Forum and I’ve just started to sync up.

Thanks,
Miguel

> 
> In the meantime, I'll add some code to the kernel side to refuse the
> external interrupt controller configuration with NV. Hopefully that
> will lead to some clues about what is going on.
> 
> Thanks,
> 
> M.
> 
> -- 
> Without deviation from the norm, progress is not possible.



^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support
  2023-07-07  9:46       ` Ganapatrao Kulkarni
@ 2023-07-11 11:56         ` Ganapatrao Kulkarni
  2023-07-11 12:30           ` Marc Zyngier
  0 siblings, 1 reply; 95+ messages in thread
From: Ganapatrao Kulkarni @ 2023-07-11 11:56 UTC (permalink / raw)
  To: Marc Zyngier, Miguel Luis, Eric Auger
  Cc: kvmarm, kvm, linux-arm-kernel, Alexandru Elisei, Andre Przywara,
	Chase Conklin, Christoffer Dall, Darren Hart, Jintack Lim,
	Russell King, James Morse, Suzuki K Poulose, Oliver Upton,
	Zenghui Yu



On 07-07-2023 03:16 pm, Ganapatrao Kulkarni wrote:
> 
> 
> On 04-07-2023 06:01 pm, Ganapatrao Kulkarni wrote:
>>
>> Hi Marc,
>>
>> On 29-06-2023 12:33 pm, Marc Zyngier wrote:
>>> Hi Ganapatrao,
>>>
>>> On Wed, 28 Jun 2023 07:45:55 +0100,
>>> Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote:
>>>>
>>>>
>>>> Hi Marc,
>>>>
>>>>
>>>> On 15-05-2023 11:00 pm, Marc Zyngier wrote:
>>>>> This is the 4th drop of NV support on arm64 for this year.
>>>>>
>>>>> For the previous episodes, see [1].
>>>>>
>>>>> What's changed:
>>>>>
>>>>> - New framework to track system register traps that are reinjected in
>>>>>     guest EL2. It is expected to replace the discrete handling we have
>>>>>     enjoyed so far, which didn't scale at all. This has already 
>>>>> fixed a
>>>>>     number of bugs that were hidden (a bunch of traps were never
>>>>>     forwarded...). Still a work in progress, but this is going in the
>>>>>     right direction.
>>>>>
>>>>> - Allow the L1 hypervisor to have a S2 that has an input larger than
>>>>>     the L0 IPA space. This fixes a number of subtle issues, 
>>>>> depending on
>>>>>     how the initial guest was created.
>>>>>
>>>>> - Consequently, the patch series has gone longer again. Boo. But
>>>>>     hopefully some of it is easier to review...
>>>>>
>>>>
>>>> I am facing issue in booting NestedVM with V9 as well with 10 patchset.
>>>>
>>>> I have tried V9/V10 on Ampere platform using kvmtool and I could boot
>>>> Guest-Hypervisor and then NestedVM without any issue.
>>>> However when I try to boot using QEMU(not using EDK2/EFI),
>>>> Guest-Hypervisor is booted with Fedora 37 using virtio disk. From
>>>> Guest-Hypervisor console(or ssh shell), If I try to boot NestedVM,
>>>> boot hangs very early stage of the boot.
>>>>
>>>> I did some debug using ftrace and it seems the Guest-Hypervisor is
>>>> getting very high rate of arch-timer interrupts,
>>>> due to that all CPU time is going on in serving the Guest-Hypervisor
>>>> and it is never going back to NestedVM.
>>>>
>>>> I am using QEMU vanilla version v7.2.0 with top-up patches for NV [1]
>>>
>>> So I went ahead and gave QEMU a go. On my systems, *nothing* works (I
>>> cannot even boot a L1 with 'virtualization=on" (the guest is stuck at
>>> the point where virtio gets probed and waits for its first interrupt).
>>>
>>> Worse, booting a hVHE guest results in QEMU generating an assert as it
>>> tries to inject an interrupt using the QEMU GICv3 model, something
>>> that should *NEVER* be in use with KVM.
>>>
>>> With help from Eric, I got to a point where the hVHE guest could boot
>>> as long as I kept injecting console interrupts, which is again a
>>> symptom of the vGIC not being used.
>>>
>>> So something is *majorly* wrong with the QEMU patches. I don't know
>>> what makes it possible for you to even boot the L1 - if the GIC is
>>> external, injecting an interrupt in the L2 is simply impossible.
>>>
>>> Miguel, can you please investigate this?
>>>
>>> In the meantime, I'll add some code to the kernel side to refuse the
>>> external interrupt controller configuration with NV. Hopefully that
>>> will lead to some clues about what is going on.
>>
>> Continued debugging of the issue and it seems the endless ptimer 
>> interrupts on Ampere platform is due to some mess up of CVAL of 
>> ptimer, resulting in interrupt triggered always when it is enabled.
>>
>> I see function "timer_set_offset" called from kvm_arm_timer_set_reg in 
>> QEMU case but there is no such calls in kvmtool boot.
>>
>> If I comment the timer_set_offset calls in kvm_arm_timer_set_reg 
>> function, then I could boot the Guest-Hypervisor then NestedVM from 
>> GH/L1.
>>
>> I also observed in QEMU case, kvm_arm_timer_set_reg is called to set 
>> CNT, CVAL and CTL of both vtimer and ptimer.
>> Not sure why QEMU is setting these registers explicitly? need to dig.
>>
> 
> I don't see any direct ioctl calls to change any timer registers. Looks 
> like it is happening from the emulation code(target/arm/helper.c)?

function "write_list_to_kvmstate(target/arm/kvm.c)" is issuing ioctl to 
write timer registers. PTIMER_CNT and TIMER_CNT writes from this 
function is resulting in offsets change.
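As I read it, the shape of that path is roughly the following
(paraphrased and simplified from target/arm/kvm.c, which also deals
with 32-bit vs 64-bit registers and error handling):

/*
 * write_list_to_kvmstate(), paraphrased: push every register in the
 * cpreg list, timer counters included, into the kernel.
 */
for (i = 0; i < cpu->cpreg_array_len; i++) {
	struct kvm_one_reg r = {
		.id   = cpu->cpreg_indexes[i],
		.addr = (uintptr_t)&cpu->cpreg_values[i],
	};

	kvm_vcpu_ioctl(CPU(cpu), KVM_SET_ONE_REG, &r);
}

Since the counter registers are in that list, every sync from QEMU to
KVM rewrites the counters and therefore the offsets.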

> 
>>>
>>> Thanks,
>>>
>>>     M.
>>>
>>
>> Thanks,
>> Ganapat
>>
> 
> Thanks,
> Ganapat

Thanks,
Ganapat

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support
  2023-07-11 11:56         ` Ganapatrao Kulkarni
@ 2023-07-11 12:30           ` Marc Zyngier
  0 siblings, 0 replies; 95+ messages in thread
From: Marc Zyngier @ 2023-07-11 12:30 UTC (permalink / raw)
  To: Ganapatrao Kulkarni
  Cc: Miguel Luis, Eric Auger, kvmarm, kvm, linux-arm-kernel,
	Alexandru Elisei, Andre Przywara, Chase Conklin,
	Christoffer Dall, Darren Hart, Jintack Lim, Russell King,
	James Morse, Suzuki K Poulose, Oliver Upton, Zenghui Yu

On Tue, 11 Jul 2023 12:56:48 +0100,
Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote:
> 
> 
> 
> On 07-07-2023 03:16 pm, Ganapatrao Kulkarni wrote:
> > 
> > 
> > On 04-07-2023 06:01 pm, Ganapatrao Kulkarni wrote:
> >> 
> >> Hi Marc,
> >> 
> >> On 29-06-2023 12:33 pm, Marc Zyngier wrote:
> >>> Hi Ganapatrao,
> >>> 
> >>> On Wed, 28 Jun 2023 07:45:55 +0100,
> >>> Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote:
> >>>> 
> >>>> 
> >>>> Hi Marc,
> >>>> 
> >>>> 
> >>>> On 15-05-2023 11:00 pm, Marc Zyngier wrote:
> >>>>> This is the 4th drop of NV support on arm64 for this year.
> >>>>> 
> >>>>> For the previous episodes, see [1].
> >>>>> 
> >>>>> What's changed:
> >>>>> 
> >>>>> - New framework to track system register traps that are reinjected in
> >>>>>     guest EL2. It is expected to replace the discrete handling we have
> >>>>>     enjoyed so far, which didn't scale at all. This has already
> >>>>> fixed a
> >>>>>     number of bugs that were hidden (a bunch of traps were never
> >>>>>     forwarded...). Still a work in progress, but this is going in the
> >>>>>     right direction.
> >>>>> 
> >>>>> - Allow the L1 hypervisor to have a S2 that has an input larger than
> >>>>>     the L0 IPA space. This fixes a number of subtle issues,
> >>>>> depending on
> >>>>>     how the initial guest was created.
> >>>>> 
> >>>>> - Consequently, the patch series has gone longer again. Boo. But
> >>>>>     hopefully some of it is easier to review...
> >>>>> 
> >>>> 
> >>>> I am facing issue in booting NestedVM with V9 as well with 10 patchset.
> >>>> 
> >>>> I have tried V9/V10 on Ampere platform using kvmtool and I could boot
> >>>> Guest-Hypervisor and then NestedVM without any issue.
> >>>> However when I try to boot using QEMU(not using EDK2/EFI),
> >>>> Guest-Hypervisor is booted with Fedora 37 using virtio disk. From
> >>>> Guest-Hypervisor console(or ssh shell), If I try to boot NestedVM,
> >>>> boot hangs very early stage of the boot.
> >>>> 
> >>>> I did some debug using ftrace and it seems the Guest-Hypervisor is
> >>>> getting very high rate of arch-timer interrupts,
> >>>> due to that all CPU time is going on in serving the Guest-Hypervisor
> >>>> and it is never going back to NestedVM.
> >>>> 
> >>>> I am using QEMU vanilla version v7.2.0 with top-up patches for NV [1]
> >>> 
> >>> So I went ahead and gave QEMU a go. On my systems, *nothing* works (I
> >>> cannot even boot a L1 with 'virtualization=on" (the guest is stuck at
> >>> the point where virtio gets probed and waits for its first interrupt).
> >>> 
> >>> Worse, booting a hVHE guest results in QEMU generating an assert as it
> >>> tries to inject an interrupt using the QEMU GICv3 model, something
> >>> that should *NEVER* be in use with KVM.
> >>> 
> >>> With help from Eric, I got to a point where the hVHE guest could boot
> >>> as long as I kept injecting console interrupts, which is again a
> >>> symptom of the vGIC not being used.
> >>> 
> >>> So something is *majorly* wrong with the QEMU patches. I don't know
> >>> what makes it possible for you to even boot the L1 - if the GIC is
> >>> external, injecting an interrupt in the L2 is simply impossible.
> >>> 
> >>> Miguel, can you please investigate this?
> >>> 
> >>> In the meantime, I'll add some code to the kernel side to refuse the
> >>> external interrupt controller configuration with NV. Hopefully that
> >>> will lead to some clues about what is going on.
> >> 
> >> Continued debugging of the issue and it seems the endless ptimer
> >> interrupts on Ampere platform is due to some mess up of CVAL of
> >> ptimer, resulting in interrupt triggered always when it is enabled.
> >> 
> >> I see function "timer_set_offset" called from kvm_arm_timer_set_reg
> >> in QEMU case but there is no such calls in kvmtool boot.
> >> 
> >> If I comment the timer_set_offset calls in kvm_arm_timer_set_reg
> >> function, then I could boot the Guest-Hypervisor then NestedVM from
> >> GH/L1.
> >> 
> >> I also observed in QEMU case, kvm_arm_timer_set_reg is called to
> >> set CNT, CVAL and CTL of both vtimer and ptimer.
> >> Not sure why QEMU is setting these registers explicitly? need to dig.
> >> 
> > 
> > I don't see any direct ioctl calls to change any timer
> > registers. Looks like it is happening from the emulation
> > code(target/arm/helper.c)?
> 
> function "write_list_to_kvmstate(target/arm/kvm.c)" is issuing ioctl
> to write timer registers. PTIMER_CNT and TIMER_CNT writes from this
> function is resulting in offsets change.

Madness. Why is QEMU doing this? It has no business writing to the
timer at any stage, at least with KVM.

This confirms my suspicions that QEMU is confused about what mode it
runs in, and does all sorts of funky things it was never expected to
do.

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH v10 09/59] KVM: arm64: nv: Add trap forwarding infrastructure
  2023-05-15 17:30 ` [PATCH v10 09/59] KVM: arm64: nv: Add trap forwarding infrastructure Marc Zyngier
@ 2023-07-13 14:29   ` Eric Auger
  2023-07-14 10:53     ` Marc Zyngier
  0 siblings, 1 reply; 95+ messages in thread
From: Eric Auger @ 2023-07-13 14:29 UTC (permalink / raw)
  To: Marc Zyngier, kvmarm, kvm, linux-arm-kernel
  Cc: Alexandru Elisei, Andre Przywara, Chase Conklin,
	Christoffer Dall, Ganapatrao Kulkarni, Darren Hart, Jintack Lim,
	Russell King, Miguel Luis, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu

Hi Marc,

On 5/15/23 19:30, Marc Zyngier wrote:
> A significant part of what a NV hypervisor needs to do is to decide
> whether a trap from a L2+ guest has to be forwarded to a L1 guest
> or handled locally. This is done by checking for the trap bits that
I am confused by the terminology. The comment below says
'When the trapped access matches one of the trap controls, the
exception is re-injected in the nested hypervisor.'

> the guest hypervisor has set and acting accordingly, as described by
> the architecture.
> 
> A previous approach was to sprinkle a bunch of checks in all the
> system register accessors, but this is pretty error prone and doesn't
> help getting an overview of what is happening.
> 
> Instead, implement a set of global tables that describe a trap bit,
> combinations of trap bits, behaviours on trap, and what bits must
> be evaluated on a system register trap.
> 
> Although this is painful to describe, this allows to specify each
> and every control bit in a static manner. To make it efficient,
> the table is inserted in an xarray that is global to the system,
> and checked each time we trap a system register.
> 
> Add the basic infrastructure for now, while additional patches will
> implement configuration registers.
> 
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/include/asm/kvm_host.h   |   1 +
>  arch/arm64/include/asm/kvm_nested.h |   2 +
>  arch/arm64/kvm/emulate-nested.c     | 175 ++++++++++++++++++++++++++++
>  arch/arm64/kvm/sys_regs.c           |   6 +
>  arch/arm64/kvm/trace_arm.h          |  19 +++
>  5 files changed, 203 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index f2e3b5889f8b..65810618cb42 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -960,6 +960,7 @@ int kvm_handle_cp10_id(struct kvm_vcpu *vcpu);
>  void kvm_reset_sys_regs(struct kvm_vcpu *vcpu);
>  
>  int __init kvm_sys_reg_table_init(void);
> +void __init populate_nv_trap_config(void);
>  
>  bool lock_all_vcpus(struct kvm *kvm);
>  void unlock_all_vcpus(struct kvm *kvm);
> diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
> index 8fb67f032fd1..fa23cc9c2adc 100644
> --- a/arch/arm64/include/asm/kvm_nested.h
> +++ b/arch/arm64/include/asm/kvm_nested.h
> @@ -11,6 +11,8 @@ static inline bool vcpu_has_nv(const struct kvm_vcpu *vcpu)
>  		test_bit(KVM_ARM_VCPU_HAS_EL2, vcpu->arch.features));
>  }
>  
> +extern bool __check_nv_sr_forward(struct kvm_vcpu *vcpu);
> +
>  struct sys_reg_params;
>  struct sys_reg_desc;
>  
> diff --git a/arch/arm64/kvm/emulate-nested.c b/arch/arm64/kvm/emulate-nested.c
> index b96662029fb1..a923f7f47add 100644
> --- a/arch/arm64/kvm/emulate-nested.c
> +++ b/arch/arm64/kvm/emulate-nested.c
> @@ -14,6 +14,181 @@
>  
>  #include "trace.h"
>  
> +enum trap_behaviour {
> +	BEHAVE_HANDLE_LOCALLY	= 0,
> +	BEHAVE_FORWARD_READ	= BIT(0),
> +	BEHAVE_FORWARD_WRITE	= BIT(1),
> +	BEHAVE_FORWARD_ANY	= BEHAVE_FORWARD_READ | BEHAVE_FORWARD_WRITE,
> +};
> +
> +struct trap_bits {
> +	const enum vcpu_sysreg		index;
> +	const enum trap_behaviour	behaviour;
> +	const u64			value;
> +	const u64			mask;
> +};
> +
> +enum coarse_grain_trap_id {
drop 'coarse' from the above name? It seems to cover coarse ids, combo
ids and complex-condition ids?
> +	/* Indicates no coarse trap control */
> +	__RESERVED__,
> +
> +	/*
> +	 * The first batch of IDs denote coarse trapping that are used
> +	 * on their own instead of being part of a combination of
> +	 * trap controls.
> +	 */
> +
> +	/*
> +	 * Anything after this point is a combination of trap controls,
> +	 * which all must be evaluated to decide what to do.
> +	 */
> +	__MULTIPLE_CONTROL_BITS__,
> +
> +	/*
> +	 * Anything after this point requires a callback evaluating a
> +	 * complex trap condition. Hopefully we'll never need this...
> +	 */
> +	__COMPLEX_CONDITIONS__,
> +};
> +
> +static const struct trap_bits coarse_trap_bits[] = {
> +};
> +
> +#define MCB(id, ...)					\
> +	[id - __MULTIPLE_CONTROL_BITS__]	=	\
> +		(const enum coarse_grain_trap_id []){	\
> +			__VA_ARGS__ , __RESERVED__	\
> +		}
nit: there are a few checkpatch errors
> +
> +static const enum coarse_grain_trap_id *coarse_control_combo[] = {
> +};
> +
> +typedef enum trap_behaviour (*complex_condition_check)(struct kvm_vcpu *);
> +
> +#define CCC(id, fn)	[id - __COMPLEX_CONDITIONS__] = fn
> +
> +static const complex_condition_check ccc[] = {
> +};
> +
> +struct encoding_to_trap_configs {
> +	const u32			encoding;
> +	const u32			end;
> +	const enum coarse_grain_trap_id	id;
> +};
> +
> +#define SR_RANGE_TRAP(sr_start, sr_end, trap_id)			\
> +	{								\
> +		.encoding	= sr_start,				\
> +		.end		= sr_end,				\
> +		.id		= trap_id,				\
> +	}
> +
> +#define SR_TRAP(sr, trap_id)		SR_RANGE_TRAP(sr, sr, trap_id)
> +
> +/*
> + * Map encoding to trap bits for exception reported with EC=0x18.
> + * These must only be evaluated when running a nested hypervisor, but
> + * that the current context is not a hypervisor context. When the
> + * trapped access matches one of the trap controls, the exception is
> + * re-injected in the nested hypervisor.
> + */
> +static const struct encoding_to_trap_configs encoding_to_traps[] __initdata = {
> +};
> +
> +static DEFINE_XARRAY(sr_forward_xa);
> +
> +void __init populate_nv_trap_config(void)
> +{
> +	for (int i = 0; i < ARRAY_SIZE(encoding_to_traps); i++) {
> +		const struct encoding_to_trap_configs *ett = &encoding_to_traps[i];
> +		void *prev;
> +
> +		prev = xa_store_range(&sr_forward_xa, ett->encoding, ett->end,
> +				      xa_mk_value(ett->id), GFP_KERNEL);
> +		WARN_ON(prev);
> +	}
> +
> +	kvm_info("nv: %ld trap handlers\n", ARRAY_SIZE(encoding_to_traps));
> +}
> +
> +static const enum coarse_grain_trap_id get_trap_config(u32 sysreg)
> +{
> +	return xa_to_value(xa_load(&sr_forward_xa, sysreg));
> +}
> +
> +static enum trap_behaviour get_behaviour(struct kvm_vcpu *vcpu,
> +					 const struct trap_bits *tb)
> +{
> +	enum trap_behaviour b = BEHAVE_HANDLE_LOCALLY;
> +	u64 val;
> +
> +	val = __vcpu_sys_reg(vcpu, tb->index);
> +	if ((val & tb->mask) == tb->value)
> +		b |= tb->behaviour;
> +
> +	return b;
> +}
> +
> +static enum trap_behaviour __do_compute_behaviour(struct kvm_vcpu *vcpu,
> +						  const enum coarse_grain_trap_id id,
> +						  enum trap_behaviour b)
> +{
> +	switch (id) {
> +		const enum coarse_grain_trap_id *cgids;
> +
> +	case __RESERVED__ ... __MULTIPLE_CONTROL_BITS__ - 1:
> +		if (likely(id != __RESERVED__))
> +			b |= get_behaviour(vcpu, &coarse_trap_bits[id]);
> +		break;
> +	case __MULTIPLE_CONTROL_BITS__ ... __COMPLEX_CONDITIONS__ - 1:
> +		/* Yes, this is recursive. Don't do anything stupid. */
> +		cgids = coarse_control_combo[id - __MULTIPLE_CONTROL_BITS__];
> +		for (int i = 0; cgids[i] != __RESERVED__; i++)
> +			b |= __do_compute_behaviour(vcpu, cgids[i], b);
> +		break;
> +	default:
> +		if (ARRAY_SIZE(ccc))
> +			b |= ccc[id -  __COMPLEX_CONDITIONS__](vcpu);
> +		break;
> +	}
> +
> +	return b;
> +}
> +
> +static enum trap_behaviour compute_behaviour(struct kvm_vcpu *vcpu, u32 sysreg)
> +{
> +	const enum coarse_grain_trap_id id = get_trap_config(sysreg);
> +	enum trap_behaviour b = BEHAVE_HANDLE_LOCALLY;
> +
> +	return __do_compute_behaviour(vcpu, id, b);
> +}
> +
> +bool __check_nv_sr_forward(struct kvm_vcpu *vcpu)
> +{
> +	enum trap_behaviour b;
> +	bool is_read;
> +	u32 sysreg;
> +	u64 esr;
> +
> +	if (!vcpu_has_nv(vcpu) || is_hyp_ctxt(vcpu))
> +		return false;
> +
> +	esr = kvm_vcpu_get_esr(vcpu);
> +	sysreg = esr_sys64_to_sysreg(esr);
> +	is_read = (esr & ESR_ELx_SYS64_ISS_DIR_MASK) == ESR_ELx_SYS64_ISS_DIR_READ;
> +
> +	b = compute_behaviour(vcpu, sysreg);
nit: maybe compute_trap_behaviour would be clearer/more explicit about
what it does, here and above.
> +
> +	if (!((b & BEHAVE_FORWARD_READ) && is_read) &&
> +	    !((b & BEHAVE_FORWARD_WRITE) && !is_read))
> +		return false;
> +
> +	trace_kvm_forward_sysreg_trap(vcpu, sysreg, is_read);
> +
> +	kvm_inject_nested_sync(vcpu, kvm_vcpu_get_esr(vcpu));
> +	return true;
> +}
> +
>  static u64 kvm_check_illegal_exception_return(struct kvm_vcpu *vcpu, u64 spsr)
>  {
>  	u64 mode = spsr & PSR_MODE_MASK;
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 71b12094d613..77f8bc64148a 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -2955,6 +2955,9 @@ int kvm_handle_sys_reg(struct kvm_vcpu *vcpu)
>  
>  	trace_kvm_handle_sys_reg(esr);
>  
> +	if (__check_nv_sr_forward(vcpu))
> +		return 1;
> +
>  	params = esr_sys64_to_params(esr);
>  	params.regval = vcpu_get_reg(vcpu, Rt);
>  
> @@ -3363,5 +3366,8 @@ int __init kvm_sys_reg_table_init(void)
>  	for (i = 0; i < ARRAY_SIZE(invariant_sys_regs); i++)
>  		invariant_sys_regs[i].reset(NULL, &invariant_sys_regs[i]);
>  
> +	if (kvm_get_mode() == KVM_MODE_NV)
> +		populate_nv_trap_config();
> +		
>  	return 0;
>  }
> diff --git a/arch/arm64/kvm/trace_arm.h b/arch/arm64/kvm/trace_arm.h
> index 6ce5c025218d..1f0f3f653606 100644
> --- a/arch/arm64/kvm/trace_arm.h
> +++ b/arch/arm64/kvm/trace_arm.h
> @@ -364,6 +364,25 @@ TRACE_EVENT(kvm_inject_nested_exception,
>  		  __entry->hcr_el2)
>  );
>  
> +TRACE_EVENT(kvm_forward_sysreg_trap,
> +	    TP_PROTO(struct kvm_vcpu *vcpu, u32 sysreg, bool is_read),
> +	    TP_ARGS(vcpu, sysreg, is_read),
> +
> +	    TP_STRUCT__entry(
> +		__field(struct kvm_vcpu *, vcpu)
> +		__field(u32,		   sysreg)
> +		__field(bool,		   is_read)
> +	    ),
> +
> +	    TP_fast_assign(
> +		__entry->vcpu = vcpu;
> +		__entry->sysreg = sysreg;
> +		__entry->is_read = is_read;
> +	    ),
> +
> +	    TP_printk("%c %x", __entry->is_read ? 'R' : 'W', __entry->sysreg)
> +);
> +
>  #endif /* _TRACE_ARM_ARM64_KVM_H */
>  
>  #undef TRACE_INCLUDE_PATH

Thanks

Eric


^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH v10 09/59] KVM: arm64: nv: Add trap forwarding infrastructure
  2023-07-13 14:29   ` Eric Auger
@ 2023-07-14 10:53     ` Marc Zyngier
  0 siblings, 0 replies; 95+ messages in thread
From: Marc Zyngier @ 2023-07-14 10:53 UTC (permalink / raw)
  To: Eric Auger
  Cc: kvmarm, kvm, linux-arm-kernel, Alexandru Elisei, Andre Przywara,
	Chase Conklin, Christoffer Dall, Ganapatrao Kulkarni,
	Darren Hart, Jintack Lim, Russell King, Miguel Luis, James Morse,
	Suzuki K Poulose, Oliver Upton, Zenghui Yu

Hi Eric,

Careful, you are not replying to the trap forwarding series, but to
an older full NV series. The code hasn't majorly changed since, but
there are some differences.

On Thu, 13 Jul 2023 15:29:02 +0100,
Eric Auger <eauger@redhat.com> wrote:
> 
> Hi Marc,
> 
> On 5/15/23 19:30, Marc Zyngier wrote:
> > A significant part of what an NV hypervisor needs to do is to decide
> > whether a trap from an L2+ guest has to be forwarded to an L1 guest
> > or handled locally. This is done by checking for the trap bits that
> I am confused by the terminology. The comment below says:
> 'When the trapped access matches one of the trap controls, the
> exception is re-injected in the nested hypervisor.'

Can you spell out what confuses you here? I'm happy to rework the
commit log, the comment, or even both of them.

> 
> > the guest hypervisor has set and acting accordingly, as described by
> > the architecture.
> > 
> > A previous approach was to sprinkle a bunch of checks in all the
> > system register accessors, but this is pretty error prone and doesn't
> > help getting an overview of what is happening.
> > 
> > Instead, implement a set of global tables that describe a trap bit,
> > combinations of trap bits, behaviours on trap, and what bits must
> > be evaluated on a system register trap.
> > 
> > Although this is painful to describe, this allows specifying each
> > and every control bit in a static manner. To make it efficient,
> > the table is inserted in an xarray that is global to the system,
> > and checked each time we trap a system register.
> > 
> > Add the basic infrastructure for now, while additional patches will
> > implement configuration registers.
> > 
> > Signed-off-by: Marc Zyngier <maz@kernel.org>
> > ---
> >  arch/arm64/include/asm/kvm_host.h   |   1 +
> >  arch/arm64/include/asm/kvm_nested.h |   2 +
> >  arch/arm64/kvm/emulate-nested.c     | 175 ++++++++++++++++++++++++++++
> >  arch/arm64/kvm/sys_regs.c           |   6 +
> >  arch/arm64/kvm/trace_arm.h          |  19 +++
> >  5 files changed, 203 insertions(+)
> > 
> > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > index f2e3b5889f8b..65810618cb42 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -960,6 +960,7 @@ int kvm_handle_cp10_id(struct kvm_vcpu *vcpu);
> >  void kvm_reset_sys_regs(struct kvm_vcpu *vcpu);
> >  
> >  int __init kvm_sys_reg_table_init(void);
> > +void __init populate_nv_trap_config(void);
> >  
> >  bool lock_all_vcpus(struct kvm *kvm);
> >  void unlock_all_vcpus(struct kvm *kvm);
> > diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
> > index 8fb67f032fd1..fa23cc9c2adc 100644
> > --- a/arch/arm64/include/asm/kvm_nested.h
> > +++ b/arch/arm64/include/asm/kvm_nested.h
> > @@ -11,6 +11,8 @@ static inline bool vcpu_has_nv(const struct kvm_vcpu *vcpu)
> >  		test_bit(KVM_ARM_VCPU_HAS_EL2, vcpu->arch.features));
> >  }
> >  
> > +extern bool __check_nv_sr_forward(struct kvm_vcpu *vcpu);
> > +
> >  struct sys_reg_params;
> >  struct sys_reg_desc;
> >  
> > diff --git a/arch/arm64/kvm/emulate-nested.c b/arch/arm64/kvm/emulate-nested.c
> > index b96662029fb1..a923f7f47add 100644
> > --- a/arch/arm64/kvm/emulate-nested.c
> > +++ b/arch/arm64/kvm/emulate-nested.c
> > @@ -14,6 +14,181 @@
> >  
> >  #include "trace.h"
> >  
> > +enum trap_behaviour {
> > +	BEHAVE_HANDLE_LOCALLY	= 0,
> > +	BEHAVE_FORWARD_READ	= BIT(0),
> > +	BEHAVE_FORWARD_WRITE	= BIT(1),
> > +	BEHAVE_FORWARD_ANY	= BEHAVE_FORWARD_READ | BEHAVE_FORWARD_WRITE,
> > +};
> > +
> > +struct trap_bits {
> > +	const enum vcpu_sysreg		index;
> > +	const enum trap_behaviour	behaviour;
> > +	const u64			value;
> > +	const u64			mask;
> > +};
> > +
> > +enum coarse_grain_trap_id {
> drop 'coarse' from the above name? It seems to cover coarse ids, combo
> ids and complex-condition ids?

I used 'coarse' in opposition to 'fine', but I agree this is
confusing. How about 'trap_group' instead, in an effort to preserve
the idea that it has a wider impact than the fine-grained traps?
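
To make the shape of the tables concrete, later patches in the series
populate them roughly along these lines (an illustrative subset, not
quoted verbatim from the series):

	enum coarse_grain_trap_id {
		__RESERVED__,

		/* Coarse trap controls used on their own */
		CGT_HCR_TVM,		/* HCR_EL2.TVM: trap EL1 VM reg writes */

		__MULTIPLE_CONTROL_BITS__,
		__COMPLEX_CONDITIONS__,
	};

	static const struct trap_bits coarse_trap_bits[] = {
		[CGT_HCR_TVM] = {
			.index		= HCR_EL2,
			.value		= HCR_TVM,
			.mask		= HCR_TVM,
			.behaviour	= BEHAVE_FORWARD_WRITE,
		},
	};

	static const struct encoding_to_trap_configs encoding_to_traps[] __initdata = {
		SR_TRAP(SYS_SCTLR_EL1, CGT_HCR_TVM),
	};

With that, an EL1 write to SCTLR_EL1 from L2 looks up the xarray, finds
CGT_HCR_TVM, checks the guest hypervisor's HCR_EL2.TVM, and forwards the
trap when the bit is set.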

> > +	/* Indicates no coarse trap control */
> > +	__RESERVED__,
> > +
> > +	/*
> > +	 * The first batch of IDs denote coarse trapping that are used
> > +	 * on their own instead of being part of a combination of
> > +	 * trap controls.
> > +	 */
> > +
> > +	/*
> > +	 * Anything after this point is a combination of trap controls,
> > +	 * which all must be evaluated to decide what to do.
> > +	 */
> > +	__MULTIPLE_CONTROL_BITS__,
> > +
> > +	/*
> > +	 * Anything after this point requires a callback evaluating a
> > +	 * complex trap condition. Hopefully we'll never need this...
> > +	 */
> > +	__COMPLEX_CONDITIONS__,
> > +};
> > +
> > +static const struct trap_bits coarse_trap_bits[] = {
> > +};
> > +
> > +#define MCB(id, ...)					\
> > +	[id - __MULTIPLE_CONTROL_BITS__]	=	\
> > +		(const enum coarse_grain_trap_id []){	\
> > +			__VA_ARGS__ , __RESERVED__	\
> > +		}
> nit: there are a few checkpatch errors

checkpatch? is this still a thing? ;-) I'll have a look at what it's
angry about...

> > +
> > +static const enum coarse_grain_trap_id *coarse_control_combo[] = {
> > +};
> > +
> > +typedef enum trap_behaviour (*complex_condition_check)(struct kvm_vcpu *);
> > +
> > +#define CCC(id, fn)	[id - __COMPLEX_CONDITIONS__] = fn
> > +
> > +static const complex_condition_check ccc[] = {
> > +};
> > +
> > +struct encoding_to_trap_configs {
> > +	const u32			encoding;
> > +	const u32			end;
> > +	const enum coarse_grain_trap_id	id;
> > +};
> > +
> > +#define SR_RANGE_TRAP(sr_start, sr_end, trap_id)			\
> > +	{								\
> > +		.encoding	= sr_start,				\
> > +		.end		= sr_end,				\
> > +		.id		= trap_id,				\
> > +	}
> > +
> > +#define SR_TRAP(sr, trap_id)		SR_RANGE_TRAP(sr, sr, trap_id)
> > +
> > +/*
> > + * Map encoding to trap bits for exception reported with EC=0x18.
> > + * These must only be evaluated when running a nested hypervisor, but
> > + * that the current context is not a hypervisor context. When the
> > + * trapped access matches one of the trap controls, the exception is
> > + * re-injected in the nested hypervisor.
> > + */
> > +static const struct encoding_to_trap_configs encoding_to_traps[] __initdata = {
> > +};
> > +
> > +static DEFINE_XARRAY(sr_forward_xa);
> > +
> > +void __init populate_nv_trap_config(void)
> > +{
> > +	for (int i = 0; i < ARRAY_SIZE(encoding_to_traps); i++) {
> > +		const struct encoding_to_trap_configs *ett = &encoding_to_traps[i];
> > +		void *prev;
> > +
> > +		prev = xa_store_range(&sr_forward_xa, ett->encoding, ett->end,
> > +				      xa_mk_value(ett->id), GFP_KERNEL);
> > +		WARN_ON(prev);
> > +	}
> > +
> > +	kvm_info("nv: %ld trap handlers\n", ARRAY_SIZE(encoding_to_traps));
> > +}
> > +
> > +static const enum coarse_grain_trap_id get_trap_config(u32 sysreg)
> > +{
> > +	return xa_to_value(xa_load(&sr_forward_xa, sysreg));
> > +}
> > +
> > +static enum trap_behaviour get_behaviour(struct kvm_vcpu *vcpu,
> > +					 const struct trap_bits *tb)
> > +{
> > +	enum trap_behaviour b = BEHAVE_HANDLE_LOCALLY;
> > +	u64 val;
> > +
> > +	val = __vcpu_sys_reg(vcpu, tb->index);
> > +	if ((val & tb->mask) == tb->value)
> > +		b |= tb->behaviour;
> > +
> > +	return b;
> > +}
> > +
> > +static enum trap_behaviour __do_compute_behaviour(struct kvm_vcpu *vcpu,
> > +						  const enum coarse_grain_trap_id id,
> > +						  enum trap_behaviour b)
> > +{
> > +	switch (id) {
> > +		const enum coarse_grain_trap_id *cgids;
> > +
> > +	case __RESERVED__ ... __MULTIPLE_CONTROL_BITS__ - 1:
> > +		if (likely(id != __RESERVED__))
> > +			b |= get_behaviour(vcpu, &coarse_trap_bits[id]);
> > +		break;
> > +	case __MULTIPLE_CONTROL_BITS__ ... __COMPLEX_CONDITIONS__ - 1:
> > +		/* Yes, this is recursive. Don't do anything stupid. */
> > +		cgids = coarse_control_combo[id - __MULTIPLE_CONTROL_BITS__];
> > +		for (int i = 0; cgids[i] != __RESERVED__; i++)
> > +			b |= __do_compute_behaviour(vcpu, cgids[i], b);
> > +		break;
> > +	default:
> > +		if (ARRAY_SIZE(ccc))
> > +			b |= ccc[id -  __COMPLEX_CONDITIONS__](vcpu);
> > +		break;
> > +	}
> > +
> > +	return b;
> > +}
> > +
> > +static enum trap_behaviour compute_behaviour(struct kvm_vcpu *vcpu, u32 sysreg)
> > +{
> > +	const enum coarse_grain_trap_id id = get_trap_config(sysreg);
> > +	enum trap_behaviour b = BEHAVE_HANDLE_LOCALLY;
> > +
> > +	return __do_compute_behaviour(vcpu, id, b);
> > +}
> > +
> > +bool __check_nv_sr_forward(struct kvm_vcpu *vcpu)
> > +{
> > +	enum trap_behaviour b;
> > +	bool is_read;
> > +	u32 sysreg;
> > +	u64 esr;
> > +
> > +	if (!vcpu_has_nv(vcpu) || is_hyp_ctxt(vcpu))
> > +		return false;
> > +
> > +	esr = kvm_vcpu_get_esr(vcpu);
> > +	sysreg = esr_sys64_to_sysreg(esr);
> > +	is_read = (esr & ESR_ELx_SYS64_ISS_DIR_MASK) == ESR_ELx_SYS64_ISS_DIR_READ;
> > +
> > +	b = compute_behaviour(vcpu, sysreg);
> nit: maybe compute_trap_behaviour would be clearer/more explicit about
> what it does, here and above.

Yup, that's a sensible change. I'll do that.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support
  2023-07-10 12:56     ` Miguel Luis
@ 2023-07-18 10:29       ` Miguel Luis
  0 siblings, 0 replies; 95+ messages in thread
From: Miguel Luis @ 2023-07-18 10:29 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Ganapatrao Kulkarni, kvmarm, kvm, linux-arm-kernel,
	Alexandru Elisei, Andre Przywara, Chase Conklin,
	Christoffer Dall, Darren Hart, Jintack Lim, Russell King,
	James Morse, Suzuki K Poulose, Oliver Upton, Zenghui Yu,
	Eric Auger

Hi Marc,

> On 10 Jul 2023, at 12:56, Miguel Luis <miguel.luis@oracle.com> wrote:
> 
> Hi Marc,
> 
>> On 29 Jun 2023, at 07:03, Marc Zyngier <maz@kernel.org> wrote:
>> 
>> Hi Ganapatrao,
>> 
>> On Wed, 28 Jun 2023 07:45:55 +0100,
>> Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote:
>>> 
>>> 
>>> Hi Marc,
>>> 
>>> 
>>> On 15-05-2023 11:00 pm, Marc Zyngier wrote:
>>>> This is the 4th drop of NV support on arm64 for this year.
>>>> 
>>>> For the previous episodes, see [1].
>>>> 
>>>> What's changed:
>>>> 
>>>> - New framework to track system register traps that are reinjected in
>>>>  guest EL2. It is expected to replace the discrete handling we have
>>>>  enjoyed so far, which didn't scale at all. This has already fixed a
>>>>  number of bugs that were hidden (a bunch of traps were never
>>>>  forwarded...). Still a work in progress, but this is going in the
>>>>  right direction.
>>>> 
>>>> - Allow the L1 hypervisor to have a S2 that has an input larger than
>>>>  the L0 IPA space. This fixes a number of subtle issues, depending on
>>>>  how the initial guest was created.
>>>> 
>>>> - Consequently, the patch series has gone longer again. Boo. But
>>>>  hopefully some of it is easier to review...
>>>> 
>>> 
>>> I am facing an issue booting the NestedVM with the v9 as well as the v10 patchset.
>>> 
>>> I have tried v9/v10 on an Ampere platform using kvmtool and I could boot
>>> the Guest-Hypervisor and then the NestedVM without any issue.
>>> However, when I try to boot using QEMU (not using EDK2/EFI), the
>>> Guest-Hypervisor boots Fedora 37 from a virtio disk, but when I try to
>>> boot the NestedVM from the Guest-Hypervisor console (or an ssh shell),
>>> the boot hangs at a very early stage.
>>> 
>>> I did some debugging using ftrace and it seems the Guest-Hypervisor is
>>> getting a very high rate of arch-timer interrupts; because of that, all
>>> CPU time goes to serving the Guest-Hypervisor and execution never
>>> returns to the NestedVM.
>>> 
>>> I am using vanilla QEMU v7.2.0 with the NV patches applied on top [1]
>> 
>> So I went ahead and gave QEMU a go. On my systems, *nothing* works (I
>> cannot even boot a L1 with 'virtualization=on" (the guest is stuck at
>> the point where virtio gets probed and waits for its first interrupt).

In order to use the previous patches you need to update the Linux headers
in QEMU to match the target kernel you’re testing. So you would want to run
./scripts/update-linux-headers.sh <kernel src dir> <qemu src dir> in place of
patch 1; then you should be able to boot the L1 guest with virtualization=on.
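
For example, from the top of the QEMU source tree (paths are illustrative;
this assumes the kernel tree lives in ~/src/linux):

	$ ./scripts/update-linux-headers.sh ~/src/linux .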

Regarding the L2 guest, it does not boot and I’m in the process of
understanding why. The previous patches had some improvements to be made,
but I couldn’t relate those to the L2 guest not booting, yet.

Eric reported an issue with NV and SVE enablement which I’m still looking
at (it is similar to commit
5b578f088ada3c4319f7274c0221b5d92143fe6a in your kvmtool branch arm64/nv-5.16).

As a test, I’ve disabled SVE on the kernel side with 'arm64.nosve' and
the output matches the one from your kvmtool:

[    0.000000] CPU features: SYS_ID_AA64PFR0_EL1[35:32]: already set to 0
[    0.000000] CPU features: SYS_ID_AA64ZFR0_EL1[59:56]: already set to 0
[    0.000000] CPU features: SYS_ID_AA64ZFR0_EL1[55:52]: already set to 0
[    0.000000] CPU features: SYS_ID_AA64ZFR0_EL1[47:44]: already set to 0
[    0.000000] CPU features: SYS_ID_AA64ZFR0_EL1[43:40]: already set to 0
[    0.000000] CPU features: SYS_ID_AA64ZFR0_EL1[35:32]: already set to 0
[    0.000000] CPU features: SYS_ID_AA64ZFR0_EL1[23:20]: already set to 0
[    0.000000] CPU features: SYS_ID_AA64ZFR0_EL1[19:16]: already set to 0
[    0.000000] CPU features: SYS_ID_AA64ZFR0_EL1[7:4]: already set to 0
[    0.000000] CPU features: SYS_ID_AA64ZFR0_EL1[3:0]: already set to 0

That didn’t enable the L2 guest to boot on QEMU, so the issue still feels like it is in a grey area.

As a baseline I tested your kvmtool branch for 5.16 after updating the includes,
and as I expected both L1 and L2 guests boot.

Miguel

>> 
>> Worse, booting a hVHE guest results in QEMU generating an assert as it
>> tries to inject an interrupt using the QEMU GICv3 model, something
>> that should *NEVER* be in use with KVM.
>> 
>> With help from Eric, I got to a point where the hVHE guest could boot
>> as long as I kept injecting console interrupts, which is again a
>> symptom of the vGIC not being used.
>> 
>> So something is *majorly* wrong with the QEMU patches. I don't know
>> what makes it possible for you to even boot the L1 - if the GIC is
>> external, injecting an interrupt in the L2 is simply impossible.
>> 
>> Miguel, can you please investigate this?
> 
> Yes, I will investigate it. Sorry for the delay in replying as I took a break
> shortly after KVM Forum and I’ve just started to sync up.
> 
> Thanks,
> Miguel
> 
>> 
>> In the meantime, I'll add some code to the kernel side to refuse the
>> external interrupt controller configuration with NV. Hopefully that
>> will lead to some clues about what is going on.
>> 
>> Thanks,
>> 
>> M.
>> 
>> -- 
>> Without deviation from the norm, progress is not possible.



^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH v10 29/59] KVM: arm64: nv: Unmap/flush shadow stage 2 page tables
  2023-05-15 17:30 ` [PATCH v10 29/59] KVM: arm64: nv: Unmap/flush shadow stage 2 page tables Marc Zyngier
@ 2023-09-14 13:10   ` Ganapatrao Kulkarni
  2023-09-14 13:37     ` Marc Zyngier
  0 siblings, 1 reply; 95+ messages in thread
From: Ganapatrao Kulkarni @ 2023-09-14 13:10 UTC (permalink / raw)
  To: Marc Zyngier, kvmarm, kvm, linux-arm-kernel, Christoffer Dall
  Cc: Alexandru Elisei, Andre Przywara, Chase Conklin, Darren Hart,
	Jintack Lim, Russell King, Miguel Luis, James Morse,
	Suzuki K Poulose, Oliver Upton, Zenghui Yu, D Scott Phillips


Hi Marc,

On 15-05-2023 11:00 pm, Marc Zyngier wrote:
> From: Christoffer Dall <christoffer.dall@linaro.org>
> 
> Unmap/flush shadow stage 2 page tables for the nested VMs as well as the
> stage 2 page table for the guest hypervisor.
> 
> Note: A bunch of the code in mmu.c relating to MMU notifiers is
> currently dealt with in an extremely abrupt way, for example by clearing
> out an entire shadow stage-2 table. This will be handled in a more
> efficient way using the reverse mapping feature in a later version of
> the patch series.

We are seeing spin-lock contention due to this patch when the
Guest-Hypervisor (L1) is booted with a higher number of cores and auto-NUMA
is enabled on L0.
kvm_nested_s2_unmap is called as part of the MMU notifier callback when
NUMA page migration happens, and this function, which holds the lock,
becomes a source of contention when more vCPUs are processing the
auto-NUMA page faults/migrations.
> 
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>   arch/arm64/include/asm/kvm_mmu.h    |  3 +++
>   arch/arm64/include/asm/kvm_nested.h |  3 +++
>   arch/arm64/kvm/mmu.c                | 30 ++++++++++++++++++----
>   arch/arm64/kvm/nested.c             | 39 +++++++++++++++++++++++++++++
>   4 files changed, 70 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> index 896acdf98e71..d155b3871c4c 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -169,6 +169,8 @@ int create_hyp_io_mappings(phys_addr_t phys_addr, size_t size,
>   			   void __iomem **haddr);
>   int create_hyp_exec_mappings(phys_addr_t phys_addr, size_t size,
>   			     void **haddr);
> +void kvm_stage2_flush_range(struct kvm_s2_mmu *mmu,
> +			    phys_addr_t addr, phys_addr_t end);
>   void __init free_hyp_pgds(void);
>   
>   void kvm_unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size);
> @@ -177,6 +179,7 @@ int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu, unsigned long t
>   void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu);
>   int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
>   			  phys_addr_t pa, unsigned long size, bool writable);
> +void kvm_stage2_wp_range(struct kvm_s2_mmu *mmu, phys_addr_t addr, phys_addr_t end);
>   
>   int kvm_handle_guest_abort(struct kvm_vcpu *vcpu);
>   
> diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
> index f20d272fcd6d..d330b947d48a 100644
> --- a/arch/arm64/include/asm/kvm_nested.h
> +++ b/arch/arm64/include/asm/kvm_nested.h
> @@ -111,6 +111,9 @@ extern int kvm_walk_nested_s2(struct kvm_vcpu *vcpu, phys_addr_t gipa,
>   extern int kvm_s2_handle_perm_fault(struct kvm_vcpu *vcpu,
>   				    struct kvm_s2_trans *trans);
>   extern int kvm_inject_s2_fault(struct kvm_vcpu *vcpu, u64 esr_el2);
> +extern void kvm_nested_s2_wp(struct kvm *kvm);
> +extern void kvm_nested_s2_unmap(struct kvm *kvm);
> +extern void kvm_nested_s2_flush(struct kvm *kvm);
>   int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe);
>   
>   extern bool forward_smc_trap(struct kvm_vcpu *vcpu);
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 1e19c59b8235..8144bb9b9ec8 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -245,13 +245,19 @@ void kvm_unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size)
>   	__unmap_stage2_range(mmu, start, size, true);
>   }
>   
> +void kvm_stage2_flush_range(struct kvm_s2_mmu *mmu,
> +			    phys_addr_t addr, phys_addr_t end)
> +{
> +	stage2_apply_range_resched(mmu, addr, end, kvm_pgtable_stage2_flush);
> +}
> +
>   static void stage2_flush_memslot(struct kvm *kvm,
>   				 struct kvm_memory_slot *memslot)
>   {
>   	phys_addr_t addr = memslot->base_gfn << PAGE_SHIFT;
>   	phys_addr_t end = addr + PAGE_SIZE * memslot->npages;
>   
> -	stage2_apply_range_resched(&kvm->arch.mmu, addr, end, kvm_pgtable_stage2_flush);
> +	kvm_stage2_flush_range(&kvm->arch.mmu, addr, end);
>   }
>   
>   /**
> @@ -274,6 +280,8 @@ static void stage2_flush_vm(struct kvm *kvm)
>   	kvm_for_each_memslot(memslot, bkt, slots)
>   		stage2_flush_memslot(kvm, memslot);
>   
> +	kvm_nested_s2_flush(kvm);
> +
>   	write_unlock(&kvm->mmu_lock);
>   	srcu_read_unlock(&kvm->srcu, idx);
>   }
> @@ -888,6 +896,8 @@ void stage2_unmap_vm(struct kvm *kvm)
>   	kvm_for_each_memslot(memslot, bkt, slots)
>   		stage2_unmap_memslot(kvm, memslot);
>   
> +	kvm_nested_s2_unmap(kvm);
> +
>   	write_unlock(&kvm->mmu_lock);
>   	mmap_read_unlock(current->mm);
>   	srcu_read_unlock(&kvm->srcu, idx);
> @@ -987,12 +997,12 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
>   }
>   
>   /**
> - * stage2_wp_range() - write protect stage2 memory region range
> + * kvm_stage2_wp_range() - write protect stage2 memory region range
>    * @mmu:        The KVM stage-2 MMU pointer
>    * @addr:	Start address of range
>    * @end:	End address of range
>    */
> -static void stage2_wp_range(struct kvm_s2_mmu *mmu, phys_addr_t addr, phys_addr_t end)
> +void kvm_stage2_wp_range(struct kvm_s2_mmu *mmu, phys_addr_t addr, phys_addr_t end)
>   {
>   	stage2_apply_range_resched(mmu, addr, end, kvm_pgtable_stage2_wrprotect);
>   }
> @@ -1023,7 +1033,8 @@ static void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot)
>   	end = (memslot->base_gfn + memslot->npages) << PAGE_SHIFT;
>   
>   	write_lock(&kvm->mmu_lock);
> -	stage2_wp_range(&kvm->arch.mmu, start, end);
> +	kvm_stage2_wp_range(&kvm->arch.mmu, start, end);
> +	kvm_nested_s2_wp(kvm);
>   	write_unlock(&kvm->mmu_lock);
>   	kvm_flush_remote_tlbs(kvm);
>   }
> @@ -1047,7 +1058,7 @@ static void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
>   	phys_addr_t start = (base_gfn +  __ffs(mask)) << PAGE_SHIFT;
>   	phys_addr_t end = (base_gfn + __fls(mask) + 1) << PAGE_SHIFT;
>   
> -	stage2_wp_range(&kvm->arch.mmu, start, end);
> +	kvm_stage2_wp_range(&kvm->arch.mmu, start, end);
>   }
>   
>   /*
> @@ -1062,6 +1073,7 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
>   		gfn_t gfn_offset, unsigned long mask)
>   {
>   	kvm_mmu_write_protect_pt_masked(kvm, slot, gfn_offset, mask);
> +	kvm_nested_s2_wp(kvm);
>   }
>   
>   static void kvm_send_hwpoison_signal(unsigned long address, short lsb)
> @@ -1720,6 +1732,7 @@ bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
>   			     (range->end - range->start) << PAGE_SHIFT,
>   			     range->may_block);
>   
> +	kvm_nested_s2_unmap(kvm);

This kvm_nested_s2_unmap/kvm_unmap_stage2_range is called for every
active L2, and the page table walk iterates for a long time since
kvm_phys_size(mmu) is pretty big (at least 48 bits).
What would be the best fix if we want to avoid this unnecessarily long
page table walk?

>   	return false;
>   }
>   
> @@ -1754,6 +1767,7 @@ bool kvm_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
>   			       PAGE_SIZE, __pfn_to_phys(pfn),
>   			       KVM_PGTABLE_PROT_R, NULL, 0);
>   
> +	kvm_nested_s2_unmap(kvm);
>   	return false;
>   }
>   
> @@ -1772,6 +1786,11 @@ bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
>   					range->start << PAGE_SHIFT);
>   	pte = __pte(kpte);
>   	return pte_valid(pte) && pte_young(pte);
> +
> +	/*
> +	 * TODO: Handle nested_mmu structures here using the reverse mapping in
> +	 * a later version of patch series.
> +	 */
>   }
>   
>   bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
> @@ -2004,6 +2023,7 @@ void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
>   
>   	write_lock(&kvm->mmu_lock);
>   	kvm_unmap_stage2_range(&kvm->arch.mmu, gpa, size);
> +	kvm_nested_s2_unmap(kvm);
>   	write_unlock(&kvm->mmu_lock);
>   }
>   
> diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
> index 73c0be25345a..948ac5b9638c 100644
> --- a/arch/arm64/kvm/nested.c
> +++ b/arch/arm64/kvm/nested.c
> @@ -523,6 +523,45 @@ int kvm_inject_s2_fault(struct kvm_vcpu *vcpu, u64 esr_el2)
>   	return kvm_inject_nested_sync(vcpu, esr_el2);
>   }
>   
> +/* expects kvm->mmu_lock to be held */
> +void kvm_nested_s2_wp(struct kvm *kvm)
> +{
> +	int i;
> +
> +	for (i = 0; i < kvm->arch.nested_mmus_size; i++) {
> +		struct kvm_s2_mmu *mmu = &kvm->arch.nested_mmus[i];
> +
> +		if (kvm_s2_mmu_valid(mmu))
> +			kvm_stage2_wp_range(mmu, 0, kvm_phys_size(mmu));
> +	}
> +}
> +
> +/* expects kvm->mmu_lock to be held */
> +void kvm_nested_s2_unmap(struct kvm *kvm)
> +{
> +	int i;
> +
> +	for (i = 0; i < kvm->arch.nested_mmus_size; i++) {
> +		struct kvm_s2_mmu *mmu = &kvm->arch.nested_mmus[i];
> +
> +		if (kvm_s2_mmu_valid(mmu))
> +			kvm_unmap_stage2_range(mmu, 0, kvm_phys_size(mmu));
> +	}
> +}
> +
> +/* expects kvm->mmu_lock to be held */
> +void kvm_nested_s2_flush(struct kvm *kvm)
> +{
> +	int i;
> +
> +	for (i = 0; i < kvm->arch.nested_mmus_size; i++) {
> +		struct kvm_s2_mmu *mmu = &kvm->arch.nested_mmus[i];
> +
> +		if (kvm_s2_mmu_valid(mmu))
> +			kvm_stage2_flush_range(mmu, 0, kvm_phys_size(mmu));
> +	}
> +}
> +
>   void kvm_arch_flush_shadow_all(struct kvm *kvm)
>   {
>   	int i;

Thanks,
Ganapat

^ permalink raw reply	[flat|nested] 95+ messages in thread

* Re: [PATCH v10 29/59] KVM: arm64: nv: Unmap/flush shadow stage 2 page tables
  2023-09-14 13:10   ` Ganapatrao Kulkarni
@ 2023-09-14 13:37     ` Marc Zyngier
  0 siblings, 0 replies; 95+ messages in thread
From: Marc Zyngier @ 2023-09-14 13:37 UTC (permalink / raw)
  To: Ganapatrao Kulkarni
  Cc: kvmarm, kvm, linux-arm-kernel, Christoffer Dall,
	Alexandru Elisei, Andre Przywara, Chase Conklin, Darren Hart,
	Jintack Lim, Russell King, Miguel Luis, James Morse,
	Suzuki K Poulose, Oliver Upton, Zenghui Yu, D Scott Phillips

Hi Ganapatrao,

On 2023-09-14 14:10, Ganapatrao Kulkarni wrote:
> Hi Marc,
> 
> On 15-05-2023 11:00 pm, Marc Zyngier wrote:
>> From: Christoffer Dall <christoffer.dall@linaro.org>
>> 
>> Unmap/flush shadow stage 2 page tables for the nested VMs as well as 
>> the
>> stage 2 page table for the guest hypervisor.
>> 
>> Note: A bunch of the code in mmu.c relating to MMU notifiers is
>> currently dealt with in an extremely abrupt way, for example by 
>> clearing
>> out an entire shadow stage-2 table. This will be handled in a more
>> efficient way using the reverse mapping feature in a later version of
>> the patch series.
> 
> We are seeing spin-lock contention due to this patch when the
> Guest-Hypervisor (L1) is booted with a higher number of cores and
> auto-NUMA is enabled on L0.
> kvm_nested_s2_unmap is called as part of the MMU notifier callback when
> NUMA page migration happens, and this function, which holds the lock,
> becomes a source of contention when more vCPUs are processing the
> auto-NUMA page faults/migrations.

This is fully expected. Honestly, expecting any sort of performance at
this stage is extremely premature (and I have zero sympathy for hacks
like auto-numa...).

[...]

>>   +	kvm_nested_s2_unmap(kvm);
> 
> This kvm_nested_s2_unmap/kvm_unmap_stage2_range is called for every
> active L2, and the page table walk iterates for a long time since
> kvm_phys_size(mmu) is pretty big (at least 48 bits).
> What would be the best fix if we want to avoid this unnecessarily long
> page table walk?

The fix would be to have a reverse mapping from any canonical IPA to
any shadow IPA, which would allow us to not fully tear down the shadow
PTs but only the bits we need to remove. This is clearly stated in the
commit message you quoted.
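
One possible shape for such a reverse map (purely hypothetical: every name
below is invented for illustration, none of this is the planned design):

	/* One entry per shadow mapping of a canonical IPA range. */
	struct shadow_rmap_entry {
		struct kvm_s2_mmu	*mmu;		/* shadow S2 holding the mapping */
		phys_addr_t		shadow_ipa;	/* where L1 mapped the range */
		u64			size;
		struct list_head	link;
	};

	/*
	 * Hypothetical replacement for the full-range teardown: only unmap
	 * the shadow ranges that alias [start, start + size), instead of
	 * 0..kvm_phys_size(mmu) for every valid shadow MMU.
	 * rmap_for_each_overlap() is invented for this sketch.
	 */
	static void kvm_nested_s2_unmap_range(struct kvm *kvm,
					      phys_addr_t start, u64 size)
	{
		struct shadow_rmap_entry *e;

		rmap_for_each_overlap(kvm, start, size, e)
			kvm_unmap_stage2_range(e->mmu, e->shadow_ipa, e->size);
	}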

However, this plan is pretty far down the list, and I have no plan to
work on performance optimisations until we have basic support merged
upstream.

Thanks,

         M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 95+ messages in thread

end of thread, other threads:[~2023-09-14 13:39 UTC | newest]

Thread overview: 95+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-15 17:30 [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support Marc Zyngier
2023-05-15 17:30 ` [PATCH v10 01/59] KVM: arm64: Move VTCR_EL2 into struct s2_mmu Marc Zyngier
2023-05-15 17:30 ` [PATCH v10 02/59] arm64: Add missing Set/Way CMO encodings Marc Zyngier
2023-05-15 17:30 ` [PATCH v10 03/59] arm64: Add missing VA " Marc Zyngier
2023-06-05 17:46   ` Eric Auger
2023-05-15 17:30 ` [PATCH v10 04/59] arm64: Add missing ERXMISCx_EL1 encodings Marc Zyngier
2023-06-05 17:47   ` Eric Auger
2023-05-15 17:30 ` [PATCH v10 05/59] arm64: Add missing DC ZVA/GVA/GZVA encodings Marc Zyngier
2023-06-05 17:47   ` Eric Auger
2023-05-15 17:30 ` [PATCH v10 06/59] arm64: Add TLBI operation encodings Marc Zyngier
2023-06-06  9:33   ` Eric Auger
2023-05-15 17:30 ` [PATCH v10 07/59] arm64: Add AT " Marc Zyngier
2023-06-14 15:08   ` Eric Auger
2023-05-15 17:30 ` [PATCH v10 08/59] KVM: arm64: Add missing HCR_EL2 trap bits Marc Zyngier
2023-06-14 15:08   ` Eric Auger
2023-05-15 17:30 ` [PATCH v10 09/59] KVM: arm64: nv: Add trap forwarding infrastructure Marc Zyngier
2023-07-13 14:29   ` Eric Auger
2023-07-14 10:53     ` Marc Zyngier
2023-05-15 17:30 ` [PATCH v10 10/59] KVM: arm64: nv: Add trap forwarding for HCR_EL2 Marc Zyngier
2023-05-15 17:30 ` [PATCH v10 11/59] KVM: arm64: nv: Expose FEAT_EVT to nested guests Marc Zyngier
2023-05-15 17:30 ` [PATCH v10 12/59] KVM: arm64: nv: Add trap forwarding for MDCR_EL2 Marc Zyngier
2023-05-15 17:30 ` [PATCH v10 13/59] KVM: arm64: nv: Add trap forwarding for CNTHCTL_EL2 Marc Zyngier
2023-05-15 17:30 ` [PATCH v10 14/59] KVM: arm64: nv: Add non-VHE-EL2->EL1 translation helpers Marc Zyngier
2023-05-15 17:30 ` [PATCH v10 15/59] KVM: arm64: nv: Handle virtual EL2 registers in vcpu_read/write_sys_reg() Marc Zyngier
2023-05-15 17:30 ` [PATCH v10 16/59] KVM: arm64: nv: Handle SPSR_EL2 specially Marc Zyngier
2023-05-15 17:30 ` [PATCH v10 17/59] KVM: arm64: nv: Handle HCR_EL2.E2H specially Marc Zyngier
2023-05-15 17:30 ` [PATCH v10 18/59] KVM: arm64: nv: Save/Restore vEL2 sysregs Marc Zyngier
2023-05-15 17:30 ` [PATCH v10 19/59] KVM: arm64: nv: Trap EL1 VM register accesses in virtual EL2 Marc Zyngier
2023-05-15 17:30 ` [PATCH v10 20/59] KVM: arm64: nv: Trap CPACR_EL1 access " Marc Zyngier
2023-05-15 17:30 ` [PATCH v10 21/59] KVM: arm64: nv: Respect virtual HCR_EL2.TWX setting Marc Zyngier
2023-05-15 17:30 ` [PATCH v10 22/59] KVM: arm64: nv: Respect virtual CPTR_EL2.{TFP,FPEN} settings Marc Zyngier
2023-05-15 17:30 ` [PATCH v10 23/59] KVM: arm64: nv: Respect virtual HCR_EL2.{NV,TSC) settings Marc Zyngier
2023-05-15 17:30 ` [PATCH v10 24/59] KVM: arm64: nv: Configure HCR_EL2 for nested virtualization Marc Zyngier
2023-05-15 17:30 ` [PATCH v10 25/59] KVM: arm64: nv: Support multiple nested Stage-2 mmu structures Marc Zyngier
2023-05-15 17:30 ` [PATCH v10 26/59] KVM: arm64: nv: Implement nested Stage-2 page table walk logic Marc Zyngier
2023-05-15 17:30 ` [PATCH v10 27/59] KVM: arm64: nv: Handle shadow stage 2 page faults Marc Zyngier
2023-05-15 17:30 ` [PATCH v10 28/59] KVM: arm64: nv: Restrict S2 RD/WR permissions to match the guest's Marc Zyngier
2023-05-15 17:30 ` [PATCH v10 29/59] KVM: arm64: nv: Unmap/flush shadow stage 2 page tables Marc Zyngier
2023-09-14 13:10   ` Ganapatrao Kulkarni
2023-09-14 13:37     ` Marc Zyngier
2023-05-15 17:30 ` [PATCH v10 30/59] KVM: arm64: nv: Set a handler for the system instruction traps Marc Zyngier
2023-05-15 17:30 ` [PATCH v10 31/59] KVM: arm64: nv: Trap and emulate AT instructions from virtual EL2 Marc Zyngier
2023-05-15 17:30 ` [PATCH v10 32/59] KVM: arm64: nv: Trap and emulate TLBI " Marc Zyngier
2023-05-15 17:30 ` [PATCH v10 33/59] KVM: arm64: nv: Fold guest's HCR_EL2 configuration into the host's Marc Zyngier
2023-05-15 17:30 ` [PATCH v10 34/59] KVM: arm64: nv: Hide RAS from nested guests Marc Zyngier
2023-05-15 17:30 ` [PATCH v10 35/59] KVM: arm64: nv: Add handling of EL2-specific timer registers Marc Zyngier
2023-05-15 17:30 ` [PATCH v10 36/59] KVM: arm64: nv: Load timer before the GIC Marc Zyngier
2023-05-15 17:30 ` [PATCH v10 37/59] KVM: arm64: nv: Nested GICv3 Support Marc Zyngier
2023-05-15 17:30 ` [PATCH v10 38/59] KVM: arm64: nv: Don't load the GICv4 context on entering a nested guest Marc Zyngier
2023-05-15 17:30 ` [PATCH v10 39/59] KVM: arm64: nv: vgic: Emulate the HW bit in software Marc Zyngier
2023-05-15 17:30 ` [PATCH v10 40/59] KVM: arm64: nv: vgic: Allow userland to set VGIC maintenance IRQ Marc Zyngier
2023-05-15 17:30 ` [PATCH v10 41/59] KVM: arm64: nv: Implement maintenance interrupt forwarding Marc Zyngier
2023-05-15 17:30 ` [PATCH v10 42/59] KVM: arm64: nv: Deal with broken VGIC on maintenance interrupt delivery Marc Zyngier
2023-05-15 17:30 ` [PATCH v10 43/59] KVM: arm64: nv: Allow userspace to request KVM_ARM_VCPU_NESTED_VIRT Marc Zyngier
2023-05-15 17:30 ` [PATCH v10 44/59] KVM: arm64: nv: Add handling of FEAT_TTL TLB invalidation Marc Zyngier
2023-05-15 17:30 ` [PATCH v10 45/59] KVM: arm64: nv: Invalidate TLBs based on shadow S2 TTL-like information Marc Zyngier
2023-05-15 17:30 ` [PATCH v10 46/59] KVM: arm64: nv: Tag shadow S2 entries with nested level Marc Zyngier
2023-05-15 17:30 ` [PATCH v10 47/59] KVM: arm64: nv: Add include containing the VNCR_EL2 offsets Marc Zyngier
2023-05-15 17:30 ` [PATCH v10 48/59] KVM: arm64: nv: Map VNCR-capable registers to a separate page Marc Zyngier
2023-05-15 17:30 ` [PATCH v10 49/59] KVM: arm64: nv: Move nested vgic state into the sysreg file Marc Zyngier
2023-05-15 17:30 ` [PATCH v10 50/59] KVM: arm64: Add FEAT_NV2 cpu feature Marc Zyngier
2023-05-15 17:30 ` [PATCH v10 51/59] KVM: arm64: nv: Sync nested timer state with FEAT_NV2 Marc Zyngier
2023-05-15 17:30 ` [PATCH v10 52/59] KVM: arm64: nv: Fold GICv3 host trapping requirements into guest setup Marc Zyngier
2023-05-15 17:30 ` [PATCH v10 53/59] KVM: arm64: nv: Publish emulated timer interrupt state in the in-memory state Marc Zyngier
2023-05-15 17:30 ` [PATCH v10 54/59] KVM: arm64: nv: Allocate VNCR page when required Marc Zyngier
2023-05-15 17:30 ` [PATCH v10 55/59] KVM: arm64: nv: Enable ARMv8.4-NV support Marc Zyngier
2023-05-15 17:31 ` [PATCH v10 56/59] KVM: arm64: nv: Fast-track 'InHost' exception returns Marc Zyngier
2023-05-15 17:31 ` [PATCH v10 57/59] KVM: arm64: nv: Fast-track EL1 TLBIs for VHE guests Marc Zyngier
2023-05-15 17:31 ` [PATCH v10 58/59] KVM: arm64: nv: Use FEAT_ECV to trap access to EL0 timers Marc Zyngier
2023-05-15 17:31 ` [PATCH v10 59/59] KVM: arm64: nv: Accelerate EL0 timer read accesses when FEAT_ECV is on Marc Zyngier
2023-05-16 16:53 ` [PATCH v10 00/59] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support Eric Auger
2023-05-16 18:47   ` Marc Zyngier
2023-05-16 20:28   ` Marc Zyngier
2023-05-17  8:59     ` Eric Auger
2023-05-17 14:12       ` Marc Zyngier
2023-06-06  9:33         ` Eric Auger
2023-06-06 16:30           ` Marc Zyngier
2023-06-07 16:39             ` Eric Auger
2023-06-06 17:52           ` Miguel Luis
2023-06-07 16:40             ` Eric Auger
2023-06-10  8:25               ` Miguel Luis
2023-06-11 16:28                 ` Eric Auger
2023-06-05 11:28 ` Eric Auger
2023-06-06  7:30   ` Marc Zyngier
2023-06-06  9:29     ` Eric Auger
2023-06-06 16:22       ` Marc Zyngier
2023-06-07 16:30         ` Eric Auger
2023-06-28  6:45 ` Ganapatrao Kulkarni
2023-06-29  7:03   ` Marc Zyngier
2023-07-04 12:31     ` Ganapatrao Kulkarni
2023-07-07  9:46       ` Ganapatrao Kulkarni
2023-07-11 11:56         ` Ganapatrao Kulkarni
2023-07-11 12:30           ` Marc Zyngier
2023-07-10 12:56     ` Miguel Luis
2023-07-18 10:29       ` Miguel Luis

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).