* [PATCH 00/59] KVM: arm64: ARMv8.3 Nested Virtualization support
@ 2019-06-21  9:37 Marc Zyngier
From: Marc Zyngier @ 2019-06-21  9:37 UTC
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	Julien Thierry, James Morse, Suzuki K Poulose

I've taken over the maintenance of this series originally written by
Jintack and Christoffer. Since then, the series has been substantially
reworked, new features (and most probably bugs) have been added, and
the whole thing rebased multiple times. If anything breaks, please
blame me, and nobody else.

As you can tell, this is quite big. It is also remarkably incomplete
(we're missing many critical bits needed to fully emulate EL2), but the idea
is to start merging things early in order to reduce the maintenance
headache. What we want to achieve is that with NV disabled, there is
no performance overhead and no regression. The only thing I intend to
merge ASAP is the first patch in the series, because it should have
zero effect and is a reasonable cleanup.

The series is roughly divided into four parts: exception handling, memory
virtualization, interrupts and timers. There are of course some
dependencies, but you'll hopefully get the gist of it.

For the most courageous of you, I've put out a branch[1] containing this
and a bit more. Of course, you'll need some userspace. Andre maintains
a hacked version of kvmtool[2] that takes a --nested option, allowing
the guest to be started at EL2. You can run the whole stack in the
Foundation model. Don't be in a hurry ;-).

[1] git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git kvm-arm64/nv-wip-5.2-rc5
[2] git://linux-arm.org/kvmtool.git nv/nv-wip-5.2-rc5

Andre Przywara (4):
  KVM: arm64: nv: Handle virtual EL2 registers in
    vcpu_read/write_sys_reg()
  KVM: arm64: nv: Save/Restore vEL2 sysregs
  KVM: arm64: nv: Handle traps for timer _EL02 and _EL2 sysregs
    accessors
  KVM: arm64: nv: vgic: Allow userland to set VGIC maintenance IRQ

Christoffer Dall (16):
  KVM: arm64: nv: Introduce nested virtualization VCPU feature
  KVM: arm64: nv: Reset VCPU to EL2 registers if VCPU nested virt is set
  KVM: arm64: nv: Allow userspace to set PSR_MODE_EL2x
  KVM: arm64: nv: Add nested virt VCPU primitives for vEL2 VCPU state
  KVM: arm64: nv: Handle trapped ERET from virtual EL2
  KVM: arm64: nv: Emulate PSTATE.M for a guest hypervisor
  KVM: arm64: nv: Trap EL1 VM register accesses in virtual EL2
  KVM: arm64: nv: Only toggle cache for virtual EL2 when SCTLR_EL2
    changes
  KVM: arm/arm64: nv: Support multiple nested stage 2 mmu structures
  KVM: arm64: nv: Implement nested Stage-2 page table walk logic
  KVM: arm64: nv: Handle shadow stage 2 page faults
  KVM: arm64: nv: Unmap/flush shadow stage 2 page tables
  KVM: arm64: nv: arch_timer: Support hyp timer emulation
  KVM: arm64: nv: vgic-v3: Take cpu_if pointer directly instead of vcpu
  KVM: arm64: nv: vgic: Emulate the HW bit in software
  KVM: arm64: nv: Add nested GICv3 tracepoints

Dave Martin (1):
  KVM: arm64: Migrate _elx sysreg accessors to msr_s/mrs_s

Jintack Lim (21):
  arm64: Add ARM64_HAS_NESTED_VIRT cpufeature
  KVM: arm64: nv: Add EL2 system registers to vcpu context
  KVM: arm64: nv: Support virtual EL2 exceptions
  KVM: arm64: nv: Inject HVC exceptions to the virtual EL2
  KVM: arm64: nv: Trap SPSR_EL1, ELR_EL1 and VBAR_EL1 from virtual EL2
  KVM: arm64: nv: Trap CPACR_EL1 access in virtual EL2
  KVM: arm64: nv: Set a handler for the system instruction traps
  KVM: arm64: nv: Handle PSCI call via smc from the guest
  KVM: arm64: nv: Respect virtual HCR_EL2.TWX setting
  KVM: arm64: nv: Respect virtual CPTR_EL2.TFP setting
  KVM: arm64: nv: Respect the virtual HCR_EL2.NV bit setting
  KVM: arm64: nv: Respect virtual HCR_EL2.TVM and TRVM settings
  KVM: arm64: nv: Respect the virtual HCR_EL2.NV1 bit setting
  KVM: arm64: nv: Emulate EL12 register accesses from the virtual EL2
  KVM: arm64: nv: Configure HCR_EL2 for nested virtualization
  KVM: arm64: nv: Pretend we only support larger-than-host page sizes
  KVM: arm64: nv: Introduce sys_reg_desc.forward_trap
  KVM: arm64: nv: Rework the system instruction emulation framework
  KVM: arm64: nv: Trap and emulate AT instructions from virtual EL2
  KVM: arm64: nv: Trap and emulate TLBI instructions from virtual EL2
  KVM: arm64: nv: Nested GICv3 Support

Marc Zyngier (17):
  KVM: arm64: Move __load_guest_stage2 to kvm_mmu.h
  KVM: arm64: nv: Reset VMPIDR_EL2 and VPIDR_EL2 to sane values
  KVM: arm64: nv: Handle SPSR_EL2 specially
  KVM: arm64: nv: Refactor vcpu_{read,write}_sys_reg
  KVM: arm64: nv: Don't expose SVE to nested guests
  KVM: arm64: nv: Hide RAS from nested guests
  KVM: arm/arm64: nv: Factor out stage 2 page table data from struct kvm
  KVM: arm64: nv: Move last_vcpu_ran to be per s2 mmu
  KVM: arm64: nv: Don't always start an S2 MMU search from the beginning
  KVM: arm64: nv: Propagate CNTVOFF_EL2 to the virtual EL1 timer
  KVM: arm64: nv: Load timer before the GIC
  KVM: arm64: nv: Implement maintenance interrupt forwarding
  arm64: KVM: nv: Add handling of EL2-specific timer registers
  arm64: KVM: nv: Honor SCTLR_EL2.SPAN on entering vEL2
  arm64: KVM: nv: Handle SCTLR_EL2 RES0/RES1 bits
  arm64: KVM: nv: Restrict S2 RD/WR permissions to match the guest's
  arm64: KVM: nv: Allow userspace to request KVM_ARM_VCPU_NESTED_VIRT

 .../admin-guide/kernel-parameters.txt         |    4 +
 .../virtual/kvm/devices/arm-vgic-v3.txt       |    9 +
 arch/arm/include/asm/kvm_asm.h                |    5 +-
 arch/arm/include/asm/kvm_emulate.h            |    3 +
 arch/arm/include/asm/kvm_host.h               |   31 +-
 arch/arm/include/asm/kvm_hyp.h                |   25 +-
 arch/arm/include/asm/kvm_mmu.h                |   83 +-
 arch/arm/include/asm/kvm_nested.h             |    9 +
 arch/arm/include/uapi/asm/kvm.h               |    1 +
 arch/arm/kvm/hyp/switch.c                     |   11 +-
 arch/arm/kvm/hyp/tlb.c                        |   13 +-
 arch/arm64/include/asm/cpucaps.h              |    3 +-
 arch/arm64/include/asm/esr.h                  |    4 +-
 arch/arm64/include/asm/kvm_arm.h              |   28 +-
 arch/arm64/include/asm/kvm_asm.h              |    9 +-
 arch/arm64/include/asm/kvm_coproc.h           |    2 +-
 arch/arm64/include/asm/kvm_emulate.h          |  157 +-
 arch/arm64/include/asm/kvm_host.h             |  105 +-
 arch/arm64/include/asm/kvm_hyp.h              |   82 +-
 arch/arm64/include/asm/kvm_mmu.h              |   62 +-
 arch/arm64/include/asm/kvm_nested.h           |   68 +
 arch/arm64/include/asm/sysreg.h               |  143 +-
 arch/arm64/include/uapi/asm/kvm.h             |    2 +
 arch/arm64/kernel/cpufeature.c                |   26 +
 arch/arm64/kvm/Makefile                       |    4 +
 arch/arm64/kvm/emulate-nested.c               |  223 +++
 arch/arm64/kvm/guest.c                        |    6 +
 arch/arm64/kvm/handle_exit.c                  |   76 +-
 arch/arm64/kvm/hyp/Makefile                   |    1 +
 arch/arm64/kvm/hyp/at.c                       |  217 +++
 arch/arm64/kvm/hyp/switch.c                   |   86 +-
 arch/arm64/kvm/hyp/sysreg-sr.c                |  267 ++-
 arch/arm64/kvm/hyp/tlb.c                      |  129 +-
 arch/arm64/kvm/hyp/vgic-v2-cpuif-proxy.c      |    2 +-
 arch/arm64/kvm/inject_fault.c                 |   12 -
 arch/arm64/kvm/nested.c                       |  551 +++++++
 arch/arm64/kvm/regmap.c                       |    4 +-
 arch/arm64/kvm/reset.c                        |    7 +
 arch/arm64/kvm/sys_regs.c                     | 1460 +++++++++++++++--
 arch/arm64/kvm/sys_regs.h                     |    6 +
 arch/arm64/kvm/trace.h                        |   58 +-
 include/kvm/arm_arch_timer.h                  |    6 +
 include/kvm/arm_vgic.h                        |   28 +-
 virt/kvm/arm/arch_timer.c                     |  158 +-
 virt/kvm/arm/arm.c                            |   62 +-
 virt/kvm/arm/hyp/vgic-v3-sr.c                 |   35 +-
 virt/kvm/arm/mmio.c                           |   12 +-
 virt/kvm/arm/mmu.c                            |  445 +++--
 virt/kvm/arm/trace.h                          |    6 +-
 virt/kvm/arm/vgic/vgic-init.c                 |   30 +
 virt/kvm/arm/vgic/vgic-kvm-device.c           |   22 +
 virt/kvm/arm/vgic/vgic-nested-trace.h         |  137 ++
 virt/kvm/arm/vgic/vgic-v2.c                   |   10 +-
 virt/kvm/arm/vgic/vgic-v3-nested.c            |  236 +++
 virt/kvm/arm/vgic/vgic-v3.c                   |   40 +-
 virt/kvm/arm/vgic/vgic.c                      |   74 +-
 56 files changed, 4683 insertions(+), 612 deletions(-)
 create mode 100644 arch/arm/include/asm/kvm_nested.h
 create mode 100644 arch/arm64/include/asm/kvm_nested.h
 create mode 100644 arch/arm64/kvm/emulate-nested.c
 create mode 100644 arch/arm64/kvm/hyp/at.c
 create mode 100644 arch/arm64/kvm/nested.c
 create mode 100644 virt/kvm/arm/vgic/vgic-nested-trace.h
 create mode 100644 virt/kvm/arm/vgic/vgic-v3-nested.c

-- 
2.20.1

* [PATCH 01/59] KVM: arm64: Migrate _elx sysreg accessors to msr_s/mrs_s
@ 2019-06-21  9:37 Marc Zyngier
From: Marc Zyngier @ 2019-06-21  9:37 UTC
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	Julien Thierry, James Morse, Suzuki K Poulose

From: Dave Martin <Dave.Martin@arm.com>

Currently, the {read,write}_sysreg_el*() accessors for accessing
particular ELs' sysregs in the presence of VHE rely on some local
hacks and define their system register encodings in a way that is
inconsistent with the core definitions in <asm/sysreg.h>.

As a result, it is necessary to add duplicate definitions for any
system register that already needs a definition in sysreg.h for
other reasons.

This is a bit of a maintenance headache, and the reason the _el*()
accessors work the way they do is largely historical.

This patch gets rid of the shadow sysreg definitions in
<asm/kvm_hyp.h>, converts the _el*() accessors to use the core
__msr_s/__mrs_s interface, and converts all call sites to use the
standard sysreg #define names (i.e., upper case, with SYS_ prefix).

This patch will conflict heavily anyway, so the opportunity is taken
to clean up some bad whitespace in the context of the changes.

The change exposes a few system registers that have no sysreg.h
definition, due to msr_s/mrs_s being used in place of msr/mrs:
additions are made in order to fill in the gaps.
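
To make the shape of the conversion concrete, a typical call site
changes as follows (taken from the hunks below):

	/* Before: lower-case suffix, resolved via the ad-hoc
	 * encodings that used to live in <asm/kvm_hyp.h> */
	far = read_sysreg_el2(far);

	/* After: the standard <asm/sysreg.h> definition, SYS_ prefix */
	far = read_sysreg_el2(SYS_FAR);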

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoffer Dall <christoffer.dall@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Link: https://www.spinics.net/lists/kvm-arm/msg31717.html
[Rebased to v4.21-rc1]
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
[Rebased to v5.2-rc5, changelog updates]
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm/include/asm/kvm_hyp.h           | 13 ++--
 arch/arm64/include/asm/kvm_emulate.h     | 16 ++---
 arch/arm64/include/asm/kvm_hyp.h         | 50 ++-------------
 arch/arm64/include/asm/sysreg.h          | 35 ++++++++++-
 arch/arm64/kvm/hyp/switch.c              | 14 ++---
 arch/arm64/kvm/hyp/sysreg-sr.c           | 78 ++++++++++++------------
 arch/arm64/kvm/hyp/tlb.c                 | 12 ++--
 arch/arm64/kvm/hyp/vgic-v2-cpuif-proxy.c |  2 +-
 arch/arm64/kvm/regmap.c                  |  4 +-
 arch/arm64/kvm/sys_regs.c                | 56 ++++++++---------
 virt/kvm/arm/arch_timer.c                | 24 ++++----
 11 files changed, 148 insertions(+), 156 deletions(-)

diff --git a/arch/arm/include/asm/kvm_hyp.h b/arch/arm/include/asm/kvm_hyp.h
index 87bcd18df8d5..059224fb14db 100644
--- a/arch/arm/include/asm/kvm_hyp.h
+++ b/arch/arm/include/asm/kvm_hyp.h
@@ -93,13 +93,14 @@
 #define VFP_FPEXC	__ACCESS_VFP(FPEXC)
 
 /* AArch64 compatibility macros, only for the timer so far */
-#define read_sysreg_el0(r)		read_sysreg(r##_el0)
-#define write_sysreg_el0(v, r)		write_sysreg(v, r##_el0)
+#define read_sysreg_el0(r)		read_sysreg(r##_EL0)
+#define write_sysreg_el0(v, r)		write_sysreg(v, r##_EL0)
+
+#define SYS_CNTP_CTL_EL0		CNTP_CTL
+#define SYS_CNTP_CVAL_EL0		CNTP_CVAL
+#define SYS_CNTV_CTL_EL0		CNTV_CTL
+#define SYS_CNTV_CVAL_EL0		CNTV_CVAL
 
-#define cntp_ctl_el0			CNTP_CTL
-#define cntp_cval_el0			CNTP_CVAL
-#define cntv_ctl_el0			CNTV_CTL
-#define cntv_cval_el0			CNTV_CVAL
 #define cntvoff_el2			CNTVOFF
 #define cnthctl_el2			CNTHCTL
 
diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 613427fafff9..39ffe41855bc 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -137,7 +137,7 @@ static inline unsigned long *__vcpu_elr_el1(const struct kvm_vcpu *vcpu)
 static inline unsigned long vcpu_read_elr_el1(const struct kvm_vcpu *vcpu)
 {
 	if (vcpu->arch.sysregs_loaded_on_cpu)
-		return read_sysreg_el1(elr);
+		return read_sysreg_el1(SYS_ELR);
 	else
 		return *__vcpu_elr_el1(vcpu);
 }
@@ -145,7 +145,7 @@ static inline unsigned long vcpu_read_elr_el1(const struct kvm_vcpu *vcpu)
 static inline void vcpu_write_elr_el1(const struct kvm_vcpu *vcpu, unsigned long v)
 {
 	if (vcpu->arch.sysregs_loaded_on_cpu)
-		write_sysreg_el1(v, elr);
+		write_sysreg_el1(v, SYS_ELR);
 	else
 		*__vcpu_elr_el1(vcpu) = v;
 }
@@ -197,7 +197,7 @@ static inline unsigned long vcpu_read_spsr(const struct kvm_vcpu *vcpu)
 		return vcpu_read_spsr32(vcpu);
 
 	if (vcpu->arch.sysregs_loaded_on_cpu)
-		return read_sysreg_el1(spsr);
+		return read_sysreg_el1(SYS_SPSR);
 	else
 		return vcpu_gp_regs(vcpu)->spsr[KVM_SPSR_EL1];
 }
@@ -210,7 +210,7 @@ static inline void vcpu_write_spsr(struct kvm_vcpu *vcpu, unsigned long v)
 	}
 
 	if (vcpu->arch.sysregs_loaded_on_cpu)
-		write_sysreg_el1(v, spsr);
+		write_sysreg_el1(v, SYS_SPSR);
 	else
 		vcpu_gp_regs(vcpu)->spsr[KVM_SPSR_EL1] = v;
 }
@@ -462,13 +462,13 @@ static inline void kvm_skip_instr(struct kvm_vcpu *vcpu, bool is_wide_instr)
  */
 static inline void __hyp_text __kvm_skip_instr(struct kvm_vcpu *vcpu)
 {
-	*vcpu_pc(vcpu) = read_sysreg_el2(elr);
-	vcpu->arch.ctxt.gp_regs.regs.pstate = read_sysreg_el2(spsr);
+	*vcpu_pc(vcpu) = read_sysreg_el2(SYS_ELR);
+	vcpu->arch.ctxt.gp_regs.regs.pstate = read_sysreg_el2(SYS_SPSR);
 
 	kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
 
-	write_sysreg_el2(vcpu->arch.ctxt.gp_regs.regs.pstate, spsr);
-	write_sysreg_el2(*vcpu_pc(vcpu), elr);
+	write_sysreg_el2(vcpu->arch.ctxt.gp_regs.regs.pstate, SYS_SPSR);
+	write_sysreg_el2(*vcpu_pc(vcpu), SYS_ELR);
 }
 
 #endif /* __ARM64_KVM_EMULATE_H__ */
diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
index 09fe8bd15f6e..ce99c2daff04 100644
--- a/arch/arm64/include/asm/kvm_hyp.h
+++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -29,7 +29,7 @@
 #define read_sysreg_elx(r,nvh,vh)					\
 	({								\
 		u64 reg;						\
-		asm volatile(ALTERNATIVE("mrs %0, " __stringify(r##nvh),\
+		asm volatile(ALTERNATIVE(__mrs_s("%0", r##nvh),	\
 					 __mrs_s("%0", r##vh),		\
 					 ARM64_HAS_VIRT_HOST_EXTN)	\
 			     : "=r" (reg));				\
@@ -39,7 +39,7 @@
 #define write_sysreg_elx(v,r,nvh,vh)					\
 	do {								\
 		u64 __val = (u64)(v);					\
-		asm volatile(ALTERNATIVE("msr " __stringify(r##nvh) ", %x0",\
+		asm volatile(ALTERNATIVE(__msr_s(r##nvh, "%x0"),	\
 					 __msr_s(r##vh, "%x0"),		\
 					 ARM64_HAS_VIRT_HOST_EXTN)	\
 					 : : "rZ" (__val));		\
@@ -48,55 +48,15 @@
 /*
  * Unified accessors for registers that have a different encoding
  * between VHE and non-VHE. They must be specified without their "ELx"
- * encoding.
+ * encoding, but with the SYS_ prefix, as defined in asm/sysreg.h.
  */
-#define read_sysreg_el2(r)						\
-	({								\
-		u64 reg;						\
-		asm volatile(ALTERNATIVE("mrs %0, " __stringify(r##_EL2),\
-					 "mrs %0, " __stringify(r##_EL1),\
-					 ARM64_HAS_VIRT_HOST_EXTN)	\
-			     : "=r" (reg));				\
-		reg;							\
-	})
-
-#define write_sysreg_el2(v,r)						\
-	do {								\
-		u64 __val = (u64)(v);					\
-		asm volatile(ALTERNATIVE("msr " __stringify(r##_EL2) ", %x0",\
-					 "msr " __stringify(r##_EL1) ", %x0",\
-					 ARM64_HAS_VIRT_HOST_EXTN)	\
-					 : : "rZ" (__val));		\
-	} while (0)
 
 #define read_sysreg_el0(r)	read_sysreg_elx(r, _EL0, _EL02)
 #define write_sysreg_el0(v,r)	write_sysreg_elx(v, r, _EL0, _EL02)
 #define read_sysreg_el1(r)	read_sysreg_elx(r, _EL1, _EL12)
 #define write_sysreg_el1(v,r)	write_sysreg_elx(v, r, _EL1, _EL12)
-
-/* The VHE specific system registers and their encoding */
-#define sctlr_EL12              sys_reg(3, 5, 1, 0, 0)
-#define cpacr_EL12              sys_reg(3, 5, 1, 0, 2)
-#define ttbr0_EL12              sys_reg(3, 5, 2, 0, 0)
-#define ttbr1_EL12              sys_reg(3, 5, 2, 0, 1)
-#define tcr_EL12                sys_reg(3, 5, 2, 0, 2)
-#define afsr0_EL12              sys_reg(3, 5, 5, 1, 0)
-#define afsr1_EL12              sys_reg(3, 5, 5, 1, 1)
-#define esr_EL12                sys_reg(3, 5, 5, 2, 0)
-#define far_EL12                sys_reg(3, 5, 6, 0, 0)
-#define mair_EL12               sys_reg(3, 5, 10, 2, 0)
-#define amair_EL12              sys_reg(3, 5, 10, 3, 0)
-#define vbar_EL12               sys_reg(3, 5, 12, 0, 0)
-#define contextidr_EL12         sys_reg(3, 5, 13, 0, 1)
-#define cntkctl_EL12            sys_reg(3, 5, 14, 1, 0)
-#define cntp_tval_EL02          sys_reg(3, 5, 14, 2, 0)
-#define cntp_ctl_EL02           sys_reg(3, 5, 14, 2, 1)
-#define cntp_cval_EL02          sys_reg(3, 5, 14, 2, 2)
-#define cntv_tval_EL02          sys_reg(3, 5, 14, 3, 0)
-#define cntv_ctl_EL02           sys_reg(3, 5, 14, 3, 1)
-#define cntv_cval_EL02          sys_reg(3, 5, 14, 3, 2)
-#define spsr_EL12               sys_reg(3, 5, 4, 0, 0)
-#define elr_EL12                sys_reg(3, 5, 4, 0, 1)
+#define read_sysreg_el2(r)	read_sysreg_elx(r, _EL2, _EL1)
+#define write_sysreg_el2(v,r)	write_sysreg_elx(v, r, _EL2, _EL1)
 
 /**
  * hyp_alternate_select - Generates patchable code sequences that are
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 902d75b60914..434cf53d527b 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -202,6 +202,9 @@
 #define SYS_APGAKEYLO_EL1		sys_reg(3, 0, 2, 3, 0)
 #define SYS_APGAKEYHI_EL1		sys_reg(3, 0, 2, 3, 1)
 
+#define SYS_SPSR_EL1			sys_reg(3, 0, 4, 0, 0)
+#define SYS_ELR_EL1			sys_reg(3, 0, 4, 0, 1)
+
 #define SYS_ICC_PMR_EL1			sys_reg(3, 0, 4, 6, 0)
 
 #define SYS_AFSR0_EL1			sys_reg(3, 0, 5, 1, 0)
@@ -393,6 +396,9 @@
 #define SYS_CNTP_CTL_EL0		sys_reg(3, 3, 14, 2, 1)
 #define SYS_CNTP_CVAL_EL0		sys_reg(3, 3, 14, 2, 2)
 
+#define SYS_CNTV_CTL_EL0		sys_reg(3, 3, 14, 3, 1)
+#define SYS_CNTV_CVAL_EL0		sys_reg(3, 3, 14, 3, 2)
+
 #define SYS_AARCH32_CNTP_TVAL		sys_reg(0, 0, 14, 2, 0)
 #define SYS_AARCH32_CNTP_CTL		sys_reg(0, 0, 14, 2, 1)
 #define SYS_AARCH32_CNTP_CVAL		sys_reg(0, 2, 0, 14, 0)
@@ -403,14 +409,17 @@
 #define __TYPER_CRm(n)			(0xc | (((n) >> 3) & 0x3))
 #define SYS_PMEVTYPERn_EL0(n)		sys_reg(3, 3, 14, __TYPER_CRm(n), __PMEV_op2(n))
 
-#define SYS_PMCCFILTR_EL0		sys_reg (3, 3, 14, 15, 7)
+#define SYS_PMCCFILTR_EL0		sys_reg(3, 3, 14, 15, 7)
 
 #define SYS_ZCR_EL2			sys_reg(3, 4, 1, 2, 0)
-
 #define SYS_DACR32_EL2			sys_reg(3, 4, 3, 0, 0)
+#define SYS_SPSR_EL2			sys_reg(3, 4, 4, 0, 0)
+#define SYS_ELR_EL2			sys_reg(3, 4, 4, 0, 1)
 #define SYS_IFSR32_EL2			sys_reg(3, 4, 5, 0, 1)
+#define SYS_ESR_EL2			sys_reg(3, 4, 5, 2, 0)
 #define SYS_VSESR_EL2			sys_reg(3, 4, 5, 2, 3)
 #define SYS_FPEXC32_EL2			sys_reg(3, 4, 5, 3, 0)
+#define SYS_FAR_EL2			sys_reg(3, 4, 6, 0, 0)
 
 #define SYS_VDISR_EL2			sys_reg(3, 4, 12, 1,  1)
 #define __SYS__AP0Rx_EL2(x)		sys_reg(3, 4, 12, 8, x)
@@ -455,7 +464,29 @@
 #define SYS_ICH_LR15_EL2		__SYS__LR8_EL2(7)
 
 /* VHE encodings for architectural EL0/1 system registers */
+#define SYS_SCTLR_EL12			sys_reg(3, 5, 1, 0, 0)
+#define SYS_CPACR_EL12			sys_reg(3, 5, 1, 0, 2)
 #define SYS_ZCR_EL12			sys_reg(3, 5, 1, 2, 0)
+#define SYS_TTBR0_EL12			sys_reg(3, 5, 2, 0, 0)
+#define SYS_TTBR1_EL12			sys_reg(3, 5, 2, 0, 1)
+#define SYS_TCR_EL12			sys_reg(3, 5, 2, 0, 2)
+#define SYS_SPSR_EL12			sys_reg(3, 5, 4, 0, 0)
+#define SYS_ELR_EL12			sys_reg(3, 5, 4, 0, 1)
+#define SYS_AFSR0_EL12			sys_reg(3, 5, 5, 1, 0)
+#define SYS_AFSR1_EL12			sys_reg(3, 5, 5, 1, 1)
+#define SYS_ESR_EL12			sys_reg(3, 5, 5, 2, 0)
+#define SYS_FAR_EL12			sys_reg(3, 5, 6, 0, 0)
+#define SYS_MAIR_EL12			sys_reg(3, 5, 10, 2, 0)
+#define SYS_AMAIR_EL12			sys_reg(3, 5, 10, 3, 0)
+#define SYS_VBAR_EL12			sys_reg(3, 5, 12, 0, 0)
+#define SYS_CONTEXTIDR_EL12		sys_reg(3, 5, 13, 0, 1)
+#define SYS_CNTKCTL_EL12		sys_reg(3, 5, 14, 1, 0)
+#define SYS_CNTP_TVAL_EL02		sys_reg(3, 5, 14, 2, 0)
+#define SYS_CNTP_CTL_EL02		sys_reg(3, 5, 14, 2, 1)
+#define SYS_CNTP_CVAL_EL02		sys_reg(3, 5, 14, 2, 2)
+#define SYS_CNTV_TVAL_EL02		sys_reg(3, 5, 14, 3, 0)
+#define SYS_CNTV_CTL_EL02		sys_reg(3, 5, 14, 3, 1)
+#define SYS_CNTV_CVAL_EL02		sys_reg(3, 5, 14, 3, 2)
 
 /* Common SCTLR_ELx flags. */
 #define SCTLR_ELx_DSSBS	(_BITUL(44))
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 8799e0c267d4..7b55c11b30fb 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -295,7 +295,7 @@ static bool __hyp_text __populate_fault_info(struct kvm_vcpu *vcpu)
 	if (ec != ESR_ELx_EC_DABT_LOW && ec != ESR_ELx_EC_IABT_LOW)
 		return true;
 
-	far = read_sysreg_el2(far);
+	far = read_sysreg_el2(SYS_FAR);
 
 	/*
 	 * The HPFAR can be invalid if the stage 2 fault did not
@@ -412,7 +412,7 @@ static bool __hyp_text __hyp_handle_fpsimd(struct kvm_vcpu *vcpu)
 static bool __hyp_text fixup_guest_exit(struct kvm_vcpu *vcpu, u64 *exit_code)
 {
 	if (ARM_EXCEPTION_CODE(*exit_code) != ARM_EXCEPTION_IRQ)
-		vcpu->arch.fault.esr_el2 = read_sysreg_el2(esr);
+		vcpu->arch.fault.esr_el2 = read_sysreg_el2(SYS_ESR);
 
 	/*
 	 * We're using the raw exception code in order to only process
@@ -708,8 +708,8 @@ static void __hyp_text __hyp_call_panic_nvhe(u64 spsr, u64 elr, u64 par,
 	asm volatile("ldr %0, =__hyp_panic_string" : "=r" (str_va));
 
 	__hyp_do_panic(str_va,
-		       spsr,  elr,
-		       read_sysreg(esr_el2),   read_sysreg_el2(far),
+		       spsr, elr,
+		       read_sysreg(esr_el2), read_sysreg_el2(SYS_FAR),
 		       read_sysreg(hpfar_el2), par, vcpu);
 }
 
@@ -724,15 +724,15 @@ static void __hyp_call_panic_vhe(u64 spsr, u64 elr, u64 par,
 
 	panic(__hyp_panic_string,
 	      spsr,  elr,
-	      read_sysreg_el2(esr),   read_sysreg_el2(far),
+	      read_sysreg_el2(SYS_ESR),   read_sysreg_el2(SYS_FAR),
 	      read_sysreg(hpfar_el2), par, vcpu);
 }
 NOKPROBE_SYMBOL(__hyp_call_panic_vhe);
 
 void __hyp_text __noreturn hyp_panic(struct kvm_cpu_context *host_ctxt)
 {
-	u64 spsr = read_sysreg_el2(spsr);
-	u64 elr = read_sysreg_el2(elr);
+	u64 spsr = read_sysreg_el2(SYS_SPSR);
+	u64 elr = read_sysreg_el2(SYS_ELR);
 	u64 par = read_sysreg(par_el1);
 
 	if (!has_vhe())
diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
index c52a8451637c..62866a68e852 100644
--- a/arch/arm64/kvm/hyp/sysreg-sr.c
+++ b/arch/arm64/kvm/hyp/sysreg-sr.c
@@ -54,33 +54,33 @@ static void __hyp_text __sysreg_save_user_state(struct kvm_cpu_context *ctxt)
 static void __hyp_text __sysreg_save_el1_state(struct kvm_cpu_context *ctxt)
 {
 	ctxt->sys_regs[CSSELR_EL1]	= read_sysreg(csselr_el1);
-	ctxt->sys_regs[SCTLR_EL1]	= read_sysreg_el1(sctlr);
+	ctxt->sys_regs[SCTLR_EL1]	= read_sysreg_el1(SYS_SCTLR);
 	ctxt->sys_regs[ACTLR_EL1]	= read_sysreg(actlr_el1);
-	ctxt->sys_regs[CPACR_EL1]	= read_sysreg_el1(cpacr);
-	ctxt->sys_regs[TTBR0_EL1]	= read_sysreg_el1(ttbr0);
-	ctxt->sys_regs[TTBR1_EL1]	= read_sysreg_el1(ttbr1);
-	ctxt->sys_regs[TCR_EL1]		= read_sysreg_el1(tcr);
-	ctxt->sys_regs[ESR_EL1]		= read_sysreg_el1(esr);
-	ctxt->sys_regs[AFSR0_EL1]	= read_sysreg_el1(afsr0);
-	ctxt->sys_regs[AFSR1_EL1]	= read_sysreg_el1(afsr1);
-	ctxt->sys_regs[FAR_EL1]		= read_sysreg_el1(far);
-	ctxt->sys_regs[MAIR_EL1]	= read_sysreg_el1(mair);
-	ctxt->sys_regs[VBAR_EL1]	= read_sysreg_el1(vbar);
-	ctxt->sys_regs[CONTEXTIDR_EL1]	= read_sysreg_el1(contextidr);
-	ctxt->sys_regs[AMAIR_EL1]	= read_sysreg_el1(amair);
-	ctxt->sys_regs[CNTKCTL_EL1]	= read_sysreg_el1(cntkctl);
+	ctxt->sys_regs[CPACR_EL1]	= read_sysreg_el1(SYS_CPACR);
+	ctxt->sys_regs[TTBR0_EL1]	= read_sysreg_el1(SYS_TTBR0);
+	ctxt->sys_regs[TTBR1_EL1]	= read_sysreg_el1(SYS_TTBR1);
+	ctxt->sys_regs[TCR_EL1]		= read_sysreg_el1(SYS_TCR);
+	ctxt->sys_regs[ESR_EL1]		= read_sysreg_el1(SYS_ESR);
+	ctxt->sys_regs[AFSR0_EL1]	= read_sysreg_el1(SYS_AFSR0);
+	ctxt->sys_regs[AFSR1_EL1]	= read_sysreg_el1(SYS_AFSR1);
+	ctxt->sys_regs[FAR_EL1]		= read_sysreg_el1(SYS_FAR);
+	ctxt->sys_regs[MAIR_EL1]	= read_sysreg_el1(SYS_MAIR);
+	ctxt->sys_regs[VBAR_EL1]	= read_sysreg_el1(SYS_VBAR);
+	ctxt->sys_regs[CONTEXTIDR_EL1]	= read_sysreg_el1(SYS_CONTEXTIDR);
+	ctxt->sys_regs[AMAIR_EL1]	= read_sysreg_el1(SYS_AMAIR);
+	ctxt->sys_regs[CNTKCTL_EL1]	= read_sysreg_el1(SYS_CNTKCTL);
 	ctxt->sys_regs[PAR_EL1]		= read_sysreg(par_el1);
 	ctxt->sys_regs[TPIDR_EL1]	= read_sysreg(tpidr_el1);
 
 	ctxt->gp_regs.sp_el1		= read_sysreg(sp_el1);
-	ctxt->gp_regs.elr_el1		= read_sysreg_el1(elr);
-	ctxt->gp_regs.spsr[KVM_SPSR_EL1]= read_sysreg_el1(spsr);
+	ctxt->gp_regs.elr_el1		= read_sysreg_el1(SYS_ELR);
+	ctxt->gp_regs.spsr[KVM_SPSR_EL1]= read_sysreg_el1(SYS_SPSR);
 }
 
 static void __hyp_text __sysreg_save_el2_return_state(struct kvm_cpu_context *ctxt)
 {
-	ctxt->gp_regs.regs.pc		= read_sysreg_el2(elr);
-	ctxt->gp_regs.regs.pstate	= read_sysreg_el2(spsr);
+	ctxt->gp_regs.regs.pc		= read_sysreg_el2(SYS_ELR);
+	ctxt->gp_regs.regs.pstate	= read_sysreg_el2(SYS_SPSR);
 
 	if (cpus_have_const_cap(ARM64_HAS_RAS_EXTN))
 		ctxt->sys_regs[DISR_EL1] = read_sysreg_s(SYS_VDISR_EL2);
@@ -120,35 +120,35 @@ static void __hyp_text __sysreg_restore_common_state(struct kvm_cpu_context *ctx
 
 static void __hyp_text __sysreg_restore_user_state(struct kvm_cpu_context *ctxt)
 {
-	write_sysreg(ctxt->sys_regs[TPIDR_EL0],	  	tpidr_el0);
-	write_sysreg(ctxt->sys_regs[TPIDRRO_EL0], 	tpidrro_el0);
+	write_sysreg(ctxt->sys_regs[TPIDR_EL0],		tpidr_el0);
+	write_sysreg(ctxt->sys_regs[TPIDRRO_EL0],	tpidrro_el0);
 }
 
 static void __hyp_text __sysreg_restore_el1_state(struct kvm_cpu_context *ctxt)
 {
 	write_sysreg(ctxt->sys_regs[MPIDR_EL1],		vmpidr_el2);
 	write_sysreg(ctxt->sys_regs[CSSELR_EL1],	csselr_el1);
-	write_sysreg_el1(ctxt->sys_regs[SCTLR_EL1],	sctlr);
-	write_sysreg(ctxt->sys_regs[ACTLR_EL1],	  	actlr_el1);
-	write_sysreg_el1(ctxt->sys_regs[CPACR_EL1],	cpacr);
-	write_sysreg_el1(ctxt->sys_regs[TTBR0_EL1],	ttbr0);
-	write_sysreg_el1(ctxt->sys_regs[TTBR1_EL1],	ttbr1);
-	write_sysreg_el1(ctxt->sys_regs[TCR_EL1],	tcr);
-	write_sysreg_el1(ctxt->sys_regs[ESR_EL1],	esr);
-	write_sysreg_el1(ctxt->sys_regs[AFSR0_EL1],	afsr0);
-	write_sysreg_el1(ctxt->sys_regs[AFSR1_EL1],	afsr1);
-	write_sysreg_el1(ctxt->sys_regs[FAR_EL1],	far);
-	write_sysreg_el1(ctxt->sys_regs[MAIR_EL1],	mair);
-	write_sysreg_el1(ctxt->sys_regs[VBAR_EL1],	vbar);
-	write_sysreg_el1(ctxt->sys_regs[CONTEXTIDR_EL1],contextidr);
-	write_sysreg_el1(ctxt->sys_regs[AMAIR_EL1],	amair);
-	write_sysreg_el1(ctxt->sys_regs[CNTKCTL_EL1], 	cntkctl);
+	write_sysreg_el1(ctxt->sys_regs[SCTLR_EL1],	SYS_SCTLR);
+	write_sysreg(ctxt->sys_regs[ACTLR_EL1],		actlr_el1);
+	write_sysreg_el1(ctxt->sys_regs[CPACR_EL1],	SYS_CPACR);
+	write_sysreg_el1(ctxt->sys_regs[TTBR0_EL1],	SYS_TTBR0);
+	write_sysreg_el1(ctxt->sys_regs[TTBR1_EL1],	SYS_TTBR1);
+	write_sysreg_el1(ctxt->sys_regs[TCR_EL1],	SYS_TCR);
+	write_sysreg_el1(ctxt->sys_regs[ESR_EL1],	SYS_ESR);
+	write_sysreg_el1(ctxt->sys_regs[AFSR0_EL1],	SYS_AFSR0);
+	write_sysreg_el1(ctxt->sys_regs[AFSR1_EL1],	SYS_AFSR1);
+	write_sysreg_el1(ctxt->sys_regs[FAR_EL1],	SYS_FAR);
+	write_sysreg_el1(ctxt->sys_regs[MAIR_EL1],	SYS_MAIR);
+	write_sysreg_el1(ctxt->sys_regs[VBAR_EL1],	SYS_VBAR);
+	write_sysreg_el1(ctxt->sys_regs[CONTEXTIDR_EL1],SYS_CONTEXTIDR);
+	write_sysreg_el1(ctxt->sys_regs[AMAIR_EL1],	SYS_AMAIR);
+	write_sysreg_el1(ctxt->sys_regs[CNTKCTL_EL1],	SYS_CNTKCTL);
 	write_sysreg(ctxt->sys_regs[PAR_EL1],		par_el1);
 	write_sysreg(ctxt->sys_regs[TPIDR_EL1],		tpidr_el1);
 
 	write_sysreg(ctxt->gp_regs.sp_el1,		sp_el1);
-	write_sysreg_el1(ctxt->gp_regs.elr_el1,		elr);
-	write_sysreg_el1(ctxt->gp_regs.spsr[KVM_SPSR_EL1],spsr);
+	write_sysreg_el1(ctxt->gp_regs.elr_el1,		SYS_ELR);
+	write_sysreg_el1(ctxt->gp_regs.spsr[KVM_SPSR_EL1],SYS_SPSR);
 }
 
 static void __hyp_text
@@ -171,8 +171,8 @@ __sysreg_restore_el2_return_state(struct kvm_cpu_context *ctxt)
 	if (!(mode & PSR_MODE32_BIT) && mode >= PSR_MODE_EL2t)
 		pstate = PSR_MODE_EL2h | PSR_IL_BIT;
 
-	write_sysreg_el2(ctxt->gp_regs.regs.pc,		elr);
-	write_sysreg_el2(pstate,			spsr);
+	write_sysreg_el2(ctxt->gp_regs.regs.pc,		SYS_ELR);
+	write_sysreg_el2(pstate,			SYS_SPSR);
 
 	if (cpus_have_const_cap(ARM64_HAS_RAS_EXTN))
 		write_sysreg_s(ctxt->sys_regs[DISR_EL1], SYS_VDISR_EL2);
diff --git a/arch/arm64/kvm/hyp/tlb.c b/arch/arm64/kvm/hyp/tlb.c
index 76c30866069e..32a782bb00be 100644
--- a/arch/arm64/kvm/hyp/tlb.c
+++ b/arch/arm64/kvm/hyp/tlb.c
@@ -44,12 +44,12 @@ static void __hyp_text __tlb_switch_to_guest_vhe(struct kvm *kvm,
 		 * in the TCR_EL1 register. We also need to prevent it to
 		 * allocate IPA->PA walks, so we enable the S1 MMU...
 		 */
-		val = cxt->tcr = read_sysreg_el1(tcr);
+		val = cxt->tcr = read_sysreg_el1(SYS_TCR);
 		val |= TCR_EPD1_MASK | TCR_EPD0_MASK;
-		write_sysreg_el1(val, tcr);
-		val = cxt->sctlr = read_sysreg_el1(sctlr);
+		write_sysreg_el1(val, SYS_TCR);
+		val = cxt->sctlr = read_sysreg_el1(SYS_SCTLR);
 		val |= SCTLR_ELx_M;
-		write_sysreg_el1(val, sctlr);
+		write_sysreg_el1(val, SYS_SCTLR);
 	}
 
 	/*
@@ -96,8 +96,8 @@ static void __hyp_text __tlb_switch_to_host_vhe(struct kvm *kvm,
 
 	if (cpus_have_const_cap(ARM64_WORKAROUND_1165522)) {
 		/* Restore the registers to what they were */
-		write_sysreg_el1(cxt->tcr, tcr);
-		write_sysreg_el1(cxt->sctlr, sctlr);
+		write_sysreg_el1(cxt->tcr, SYS_TCR);
+		write_sysreg_el1(cxt->sctlr, SYS_SCTLR);
 	}
 
 	local_irq_restore(cxt->flags);
diff --git a/arch/arm64/kvm/hyp/vgic-v2-cpuif-proxy.c b/arch/arm64/kvm/hyp/vgic-v2-cpuif-proxy.c
index 9cbdd034a563..4cd32c856110 100644
--- a/arch/arm64/kvm/hyp/vgic-v2-cpuif-proxy.c
+++ b/arch/arm64/kvm/hyp/vgic-v2-cpuif-proxy.c
@@ -27,7 +27,7 @@
 static bool __hyp_text __is_be(struct kvm_vcpu *vcpu)
 {
 	if (vcpu_mode_is_32bit(vcpu))
-		return !!(read_sysreg_el2(spsr) & PSR_AA32_E_BIT);
+		return !!(read_sysreg_el2(SYS_SPSR) & PSR_AA32_E_BIT);
 
 	return !!(read_sysreg(SCTLR_EL1) & SCTLR_ELx_EE);
 }
diff --git a/arch/arm64/kvm/regmap.c b/arch/arm64/kvm/regmap.c
index 7a5173ea2276..5dd110b384e4 100644
--- a/arch/arm64/kvm/regmap.c
+++ b/arch/arm64/kvm/regmap.c
@@ -163,7 +163,7 @@ unsigned long vcpu_read_spsr32(const struct kvm_vcpu *vcpu)
 
 	switch (spsr_idx) {
 	case KVM_SPSR_SVC:
-		return read_sysreg_el1(spsr);
+		return read_sysreg_el1(SYS_SPSR);
 	case KVM_SPSR_ABT:
 		return read_sysreg(spsr_abt);
 	case KVM_SPSR_UND:
@@ -188,7 +188,7 @@ void vcpu_write_spsr32(struct kvm_vcpu *vcpu, unsigned long v)
 
 	switch (spsr_idx) {
 	case KVM_SPSR_SVC:
-		write_sysreg_el1(v, spsr);
+		write_sysreg_el1(v, SYS_SPSR);
 	case KVM_SPSR_ABT:
 		write_sysreg(v, spsr_abt);
 	case KVM_SPSR_UND:
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 857b226bcdde..adb8a7e9c8e4 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -92,24 +92,24 @@ u64 vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg)
 	 */
 	switch (reg) {
 	case CSSELR_EL1:	return read_sysreg_s(SYS_CSSELR_EL1);
-	case SCTLR_EL1:		return read_sysreg_s(sctlr_EL12);
+	case SCTLR_EL1:		return read_sysreg_s(SYS_SCTLR_EL12);
 	case ACTLR_EL1:		return read_sysreg_s(SYS_ACTLR_EL1);
-	case CPACR_EL1:		return read_sysreg_s(cpacr_EL12);
-	case TTBR0_EL1:		return read_sysreg_s(ttbr0_EL12);
-	case TTBR1_EL1:		return read_sysreg_s(ttbr1_EL12);
-	case TCR_EL1:		return read_sysreg_s(tcr_EL12);
-	case ESR_EL1:		return read_sysreg_s(esr_EL12);
-	case AFSR0_EL1:		return read_sysreg_s(afsr0_EL12);
-	case AFSR1_EL1:		return read_sysreg_s(afsr1_EL12);
-	case FAR_EL1:		return read_sysreg_s(far_EL12);
-	case MAIR_EL1:		return read_sysreg_s(mair_EL12);
-	case VBAR_EL1:		return read_sysreg_s(vbar_EL12);
-	case CONTEXTIDR_EL1:	return read_sysreg_s(contextidr_EL12);
+	case CPACR_EL1:		return read_sysreg_s(SYS_CPACR_EL12);
+	case TTBR0_EL1:		return read_sysreg_s(SYS_TTBR0_EL12);
+	case TTBR1_EL1:		return read_sysreg_s(SYS_TTBR1_EL12);
+	case TCR_EL1:		return read_sysreg_s(SYS_TCR_EL12);
+	case ESR_EL1:		return read_sysreg_s(SYS_ESR_EL12);
+	case AFSR0_EL1:		return read_sysreg_s(SYS_AFSR0_EL12);
+	case AFSR1_EL1:		return read_sysreg_s(SYS_AFSR1_EL12);
+	case FAR_EL1:		return read_sysreg_s(SYS_FAR_EL12);
+	case MAIR_EL1:		return read_sysreg_s(SYS_MAIR_EL12);
+	case VBAR_EL1:		return read_sysreg_s(SYS_VBAR_EL12);
+	case CONTEXTIDR_EL1:	return read_sysreg_s(SYS_CONTEXTIDR_EL12);
 	case TPIDR_EL0:		return read_sysreg_s(SYS_TPIDR_EL0);
 	case TPIDRRO_EL0:	return read_sysreg_s(SYS_TPIDRRO_EL0);
 	case TPIDR_EL1:		return read_sysreg_s(SYS_TPIDR_EL1);
-	case AMAIR_EL1:		return read_sysreg_s(amair_EL12);
-	case CNTKCTL_EL1:	return read_sysreg_s(cntkctl_EL12);
+	case AMAIR_EL1:		return read_sysreg_s(SYS_AMAIR_EL12);
+	case CNTKCTL_EL1:	return read_sysreg_s(SYS_CNTKCTL_EL12);
 	case PAR_EL1:		return read_sysreg_s(SYS_PAR_EL1);
 	case DACR32_EL2:	return read_sysreg_s(SYS_DACR32_EL2);
 	case IFSR32_EL2:	return read_sysreg_s(SYS_IFSR32_EL2);
@@ -135,24 +135,24 @@ void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, u64 val, int reg)
 	 */
 	switch (reg) {
 	case CSSELR_EL1:	write_sysreg_s(val, SYS_CSSELR_EL1);	return;
-	case SCTLR_EL1:		write_sysreg_s(val, sctlr_EL12);	return;
+	case SCTLR_EL1:		write_sysreg_s(val, SYS_SCTLR_EL12);	return;
 	case ACTLR_EL1:		write_sysreg_s(val, SYS_ACTLR_EL1);	return;
-	case CPACR_EL1:		write_sysreg_s(val, cpacr_EL12);	return;
-	case TTBR0_EL1:		write_sysreg_s(val, ttbr0_EL12);	return;
-	case TTBR1_EL1:		write_sysreg_s(val, ttbr1_EL12);	return;
-	case TCR_EL1:		write_sysreg_s(val, tcr_EL12);		return;
-	case ESR_EL1:		write_sysreg_s(val, esr_EL12);		return;
-	case AFSR0_EL1:		write_sysreg_s(val, afsr0_EL12);	return;
-	case AFSR1_EL1:		write_sysreg_s(val, afsr1_EL12);	return;
-	case FAR_EL1:		write_sysreg_s(val, far_EL12);		return;
-	case MAIR_EL1:		write_sysreg_s(val, mair_EL12);		return;
-	case VBAR_EL1:		write_sysreg_s(val, vbar_EL12);		return;
-	case CONTEXTIDR_EL1:	write_sysreg_s(val, contextidr_EL12);	return;
+	case CPACR_EL1:		write_sysreg_s(val, SYS_CPACR_EL12);	return;
+	case TTBR0_EL1:		write_sysreg_s(val, SYS_TTBR0_EL12);	return;
+	case TTBR1_EL1:		write_sysreg_s(val, SYS_TTBR1_EL12);	return;
+	case TCR_EL1:		write_sysreg_s(val, SYS_TCR_EL12);	return;
+	case ESR_EL1:		write_sysreg_s(val, SYS_ESR_EL12);	return;
+	case AFSR0_EL1:		write_sysreg_s(val, SYS_AFSR0_EL12);	return;
+	case AFSR1_EL1:		write_sysreg_s(val, SYS_AFSR1_EL12);	return;
+	case FAR_EL1:		write_sysreg_s(val, SYS_FAR_EL12);	return;
+	case MAIR_EL1:		write_sysreg_s(val, SYS_MAIR_EL12);	return;
+	case VBAR_EL1:		write_sysreg_s(val, SYS_VBAR_EL12);	return;
+	case CONTEXTIDR_EL1:	write_sysreg_s(val, SYS_CONTEXTIDR_EL12); return;
 	case TPIDR_EL0:		write_sysreg_s(val, SYS_TPIDR_EL0);	return;
 	case TPIDRRO_EL0:	write_sysreg_s(val, SYS_TPIDRRO_EL0);	return;
 	case TPIDR_EL1:		write_sysreg_s(val, SYS_TPIDR_EL1);	return;
-	case AMAIR_EL1:		write_sysreg_s(val, amair_EL12);	return;
-	case CNTKCTL_EL1:	write_sysreg_s(val, cntkctl_EL12);	return;
+	case AMAIR_EL1:		write_sysreg_s(val, SYS_AMAIR_EL12);	return;
+	case CNTKCTL_EL1:	write_sysreg_s(val, SYS_CNTKCTL_EL12);	return;
 	case PAR_EL1:		write_sysreg_s(val, SYS_PAR_EL1);	return;
 	case DACR32_EL2:	write_sysreg_s(val, SYS_DACR32_EL2);	return;
 	case IFSR32_EL2:	write_sysreg_s(val, SYS_IFSR32_EL2);	return;
diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
index 05ddb6293b79..089441a07ed7 100644
--- a/virt/kvm/arm/arch_timer.c
+++ b/virt/kvm/arm/arch_timer.c
@@ -237,10 +237,10 @@ static bool kvm_timer_should_fire(struct arch_timer_context *timer_ctx)
 
 		switch (index) {
 		case TIMER_VTIMER:
-			cnt_ctl = read_sysreg_el0(cntv_ctl);
+			cnt_ctl = read_sysreg_el0(SYS_CNTV_CTL);
 			break;
 		case TIMER_PTIMER:
-			cnt_ctl = read_sysreg_el0(cntp_ctl);
+			cnt_ctl = read_sysreg_el0(SYS_CNTP_CTL);
 			break;
 		case NR_KVM_TIMERS:
 			/* GCC is braindead */
@@ -349,20 +349,20 @@ static void timer_save_state(struct arch_timer_context *ctx)
 
 	switch (index) {
 	case TIMER_VTIMER:
-		ctx->cnt_ctl = read_sysreg_el0(cntv_ctl);
-		ctx->cnt_cval = read_sysreg_el0(cntv_cval);
+		ctx->cnt_ctl = read_sysreg_el0(SYS_CNTV_CTL);
+		ctx->cnt_cval = read_sysreg_el0(SYS_CNTV_CVAL);
 
 		/* Disable the timer */
-		write_sysreg_el0(0, cntv_ctl);
+		write_sysreg_el0(0, SYS_CNTV_CTL);
 		isb();
 
 		break;
 	case TIMER_PTIMER:
-		ctx->cnt_ctl = read_sysreg_el0(cntp_ctl);
-		ctx->cnt_cval = read_sysreg_el0(cntp_cval);
+		ctx->cnt_ctl = read_sysreg_el0(SYS_CNTP_CTL);
+		ctx->cnt_cval = read_sysreg_el0(SYS_CNTP_CVAL);
 
 		/* Disable the timer */
-		write_sysreg_el0(0, cntp_ctl);
+		write_sysreg_el0(0, SYS_CNTP_CTL);
 		isb();
 
 		break;
@@ -428,14 +428,14 @@ static void timer_restore_state(struct arch_timer_context *ctx)
 
 	switch (index) {
 	case TIMER_VTIMER:
-		write_sysreg_el0(ctx->cnt_cval, cntv_cval);
+		write_sysreg_el0(ctx->cnt_cval, SYS_CNTV_CVAL);
 		isb();
-		write_sysreg_el0(ctx->cnt_ctl, cntv_ctl);
+		write_sysreg_el0(ctx->cnt_ctl, SYS_CNTV_CTL);
 		break;
 	case TIMER_PTIMER:
-		write_sysreg_el0(ctx->cnt_cval, cntp_cval);
+		write_sysreg_el0(ctx->cnt_cval, SYS_CNTP_CVAL);
 		isb();
-		write_sysreg_el0(ctx->cnt_ctl, cntp_ctl);
+		write_sysreg_el0(ctx->cnt_ctl, SYS_CNTP_CTL);
 		break;
 	case NR_KVM_TIMERS:
 		BUG();
-- 
2.20.1

* [PATCH 02/59] KVM: arm64: Move __load_guest_stage2 to kvm_mmu.h
@ 2019-06-21  9:37 Marc Zyngier
From: Marc Zyngier @ 2019-06-21  9:37 UTC
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	Julien Thierry, James Morse, Suzuki K Poulose

Having __load_guest_stage2 in kvm_hyp.h is quickly going to trigger
a circular include problem. In order to avoid this, let's move
it to kvm_mmu.h, where it will be a better fit anyway.

In the process, drop the __hyp_text annotation, which doesn't help
as the function is marked as __always_inline.
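
To illustrate the __hyp_text point (a sketch, not part of the patch;
example_helper() is made up):

	#define __hyp_text __section(.hyp.text) notrace

	/* __hyp_text only controls where an out-of-line copy of the
	 * function would be placed. An __always_inline function never
	 * has an out-of-line copy: its body is emitted into whichever
	 * section each caller lives in, so the annotation is inert. */
	static __always_inline void example_helper(void) { }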

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm64/include/asm/kvm_hyp.h | 18 ------------------
 arch/arm64/include/asm/kvm_mmu.h | 17 +++++++++++++++++
 2 files changed, 17 insertions(+), 18 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
index ce99c2daff04..e8044f265824 100644
--- a/arch/arm64/include/asm/kvm_hyp.h
+++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -21,7 +21,6 @@
 #include <linux/compiler.h>
 #include <linux/kvm_host.h>
 #include <asm/alternative.h>
-#include <asm/kvm_mmu.h>
 #include <asm/sysreg.h>
 
 #define __hyp_text __section(.hyp.text) notrace
@@ -116,22 +115,5 @@ void deactivate_traps_vhe_put(void);
 u64 __guest_enter(struct kvm_vcpu *vcpu, struct kvm_cpu_context *host_ctxt);
 void __noreturn __hyp_do_panic(unsigned long, ...);
 
-/*
- * Must be called from hyp code running at EL2 with an updated VTTBR
- * and interrupts disabled.
- */
-static __always_inline void __hyp_text __load_guest_stage2(struct kvm *kvm)
-{
-	write_sysreg(kvm->arch.vtcr, vtcr_el2);
-	write_sysreg(kvm_get_vttbr(kvm), vttbr_el2);
-
-	/*
-	 * ARM erratum 1165522 requires the actual execution of the above
-	 * before we can switch to the EL1/EL0 translation regime used by
-	 * the guest.
-	 */
-	asm(ALTERNATIVE("nop", "isb", ARM64_WORKAROUND_1165522));
-}
-
 #endif /* __ARM64_KVM_HYP_H__ */
 
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index ebeefcf835e8..3120ef948fa4 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -614,5 +614,22 @@ static __always_inline u64 kvm_get_vttbr(struct kvm *kvm)
 	return kvm_phys_to_vttbr(baddr) | vmid_field | cnp;
 }
 
+/*
+ * Must be called from hyp code running at EL2 with an updated VTTBR
+ * and interrupts disabled.
+ */
+static __always_inline void __load_guest_stage2(struct kvm *kvm)
+{
+	write_sysreg(kvm->arch.vtcr, vtcr_el2);
+	write_sysreg(kvm_get_vttbr(kvm), vttbr_el2);
+
+	/*
+	 * ARM erratum 1165522 requires the actual execution of the above
+	 * before we can switch to the EL1/EL0 translation regime used by
+	 * the guest.
+	 */
+	asm(ALTERNATIVE("nop", "isb", ARM64_WORKAROUND_1165522));
+}
+
 #endif /* __ASSEMBLY__ */
 #endif /* __ARM64_KVM_MMU_H__ */
-- 
2.20.1

* [PATCH 03/59] arm64: Add ARM64_HAS_NESTED_VIRT cpufeature
@ 2019-06-21  9:37 Marc Zyngier
From: Marc Zyngier @ 2019-06-21  9:37 UTC
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	Julien Thierry, James Morse, Suzuki K Poulose

From: Jintack Lim <jintack.lim@linaro.org>

Add a new ARM64_HAS_NESTED_VIRT feature to indicate that the
CPU has the ARMv8.3 nested virtualization capability.

This will be used to support nested virtualization in KVM.
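
As a sketch of how the gate is meant to be consumed (the real user
arrives in a later patch), the capability only matches when the
ID_AA64MMFR2_EL1.NV field reports the feature *and* kvm-arm.nested=1
was passed on the kernel command line:

	/* Hypothetical caller */
	if (cpus_have_const_cap(ARM64_HAS_NESTED_VIRT))
		; /* NV present on all CPUs and enabled by the user */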

Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 .../admin-guide/kernel-parameters.txt         |  4 +++
 arch/arm64/include/asm/cpucaps.h              |  3 ++-
 arch/arm64/include/asm/sysreg.h               |  1 +
 arch/arm64/kernel/cpufeature.c                | 26 +++++++++++++++++++
 4 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 138f6664b2e2..202bb2115d83 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2046,6 +2046,10 @@
 			[KVM,ARM] Allow use of GICv4 for direct injection of
 			LPIs.
 
+	kvm-arm.nested=
+			[KVM,ARM] Allow nested virtualization in KVM/ARM.
+			Default is 0 (disabled)
+
 	kvm-intel.ept=	[KVM,Intel] Disable extended page tables
 			(virtualized MMU) support on capable Intel chips.
 			Default is 1 (enabled)
diff --git a/arch/arm64/include/asm/cpucaps.h b/arch/arm64/include/asm/cpucaps.h
index 33401ebc187c..faa13c1f1f65 100644
--- a/arch/arm64/include/asm/cpucaps.h
+++ b/arch/arm64/include/asm/cpucaps.h
@@ -63,7 +63,8 @@
 #define ARM64_HAS_IRQ_PRIO_MASKING		42
 #define ARM64_HAS_DCPODP			43
 #define ARM64_WORKAROUND_1463225		44
+#define ARM64_HAS_NESTED_VIRT			45
 
-#define ARM64_NCAPS				45
+#define ARM64_NCAPS				46
 
 #endif /* __ASM_CPUCAPS_H */
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 434cf53d527b..f3ca7e4796ab 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -693,6 +693,7 @@
 /* id_aa64mmfr2 */
 #define ID_AA64MMFR2_FWB_SHIFT		40
 #define ID_AA64MMFR2_AT_SHIFT		32
+#define ID_AA64MMFR2_NV_SHIFT		24
 #define ID_AA64MMFR2_LVA_SHIFT		16
 #define ID_AA64MMFR2_IESB_SHIFT		12
 #define ID_AA64MMFR2_LSM_SHIFT		8
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 80babf451519..2f8e7d4e8e45 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -224,6 +224,7 @@ static const struct arm64_ftr_bits ftr_id_aa64mmfr1[] = {
 static const struct arm64_ftr_bits ftr_id_aa64mmfr2[] = {
 	ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64MMFR2_FWB_SHIFT, 4, 0),
 	ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64MMFR2_AT_SHIFT, 4, 0),
+	ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64MMFR2_NV_SHIFT, 4, 0),
 	ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64MMFR2_LVA_SHIFT, 4, 0),
 	ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64MMFR2_IESB_SHIFT, 4, 0),
 	ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64MMFR2_LSM_SHIFT, 4, 0),
@@ -1161,6 +1162,21 @@ static void cpu_copy_el2regs(const struct arm64_cpu_capabilities *__unused)
 	if (!alternative_is_applied(ARM64_HAS_VIRT_HOST_EXTN))
 		write_sysreg(read_sysreg(tpidr_el1), tpidr_el2);
 }
+
+static bool nested_param;
+static bool has_nested_virt_support(const struct arm64_cpu_capabilities *cap,
+				    int scope)
+{
+	return has_cpuid_feature(cap, scope) &&
+		nested_param;
+}
+
+static int __init kvmarm_nested_cfg(char *buf)
+{
+	return strtobool(buf, &nested_param);
+}
+
+early_param("kvm-arm.nested", kvmarm_nested_cfg);
 #endif
 
 static void cpu_has_fwb(const struct arm64_cpu_capabilities *__unused)
@@ -1331,6 +1347,16 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
 		.matches = runs_at_el2,
 		.cpu_enable = cpu_copy_el2regs,
 	},
+	{
+		.desc = "Nested Virtualization Support",
+		.capability = ARM64_HAS_NESTED_VIRT,
+		.type = ARM64_CPUCAP_SYSTEM_FEATURE,
+		.matches = has_nested_virt_support,
+		.sys_reg = SYS_ID_AA64MMFR2_EL1,
+		.sign = FTR_UNSIGNED,
+		.field_pos = ID_AA64MMFR2_NV_SHIFT,
+		.min_field_value = 1,
+	},
 #endif	/* CONFIG_ARM64_VHE */
 	{
 		.desc = "32-bit EL0 Support",
-- 
2.20.1

* [PATCH 04/59] KVM: arm64: nv: Introduce nested virtualization VCPU feature
@ 2019-06-21  9:37 Marc Zyngier
From: Marc Zyngier @ 2019-06-21  9:37 UTC
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	Julien Thierry, James Morse, Suzuki K Poulose

From: Christoffer Dall <christoffer.dall@arm.com>

Introduce the feature bit and a primitive, nested_virt_in_use(), that
checks whether the feature is set, behind a static-key-backed
cpus_have_const_cap() check.

Checking nested_virt_in_use() on systems without nested virt enabled
should have negligible overhead.

We don't yet allow userspace to actually set this feature.
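
As a usage sketch (hypothetical caller; nothing calls this until later
in the series):

	if (nested_virt_in_use(vcpu))
		; /* take the nested-virt-specific path */

The cpus_have_const_cap() test compiles down to a static branch, so on
systems without the capability the per-vcpu test_bit() is never
evaluated.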

Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm/include/asm/kvm_nested.h   |  9 +++++++++
 arch/arm64/include/asm/kvm_nested.h | 13 +++++++++++++
 arch/arm64/include/uapi/asm/kvm.h   |  1 +
 3 files changed, 23 insertions(+)
 create mode 100644 arch/arm/include/asm/kvm_nested.h
 create mode 100644 arch/arm64/include/asm/kvm_nested.h

diff --git a/arch/arm/include/asm/kvm_nested.h b/arch/arm/include/asm/kvm_nested.h
new file mode 100644
index 000000000000..124ff6445f8f
--- /dev/null
+++ b/arch/arm/include/asm/kvm_nested.h
@@ -0,0 +1,9 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __ARM_KVM_NESTED_H
+#define __ARM_KVM_NESTED_H
+
+#include <linux/kvm_host.h>
+
+static inline bool nested_virt_in_use(const struct kvm_vcpu *vcpu) { return false; }
+
+#endif /* __ARM_KVM_NESTED_H */
diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
new file mode 100644
index 000000000000..8a3d121a0b42
--- /dev/null
+++ b/arch/arm64/include/asm/kvm_nested.h
@@ -0,0 +1,13 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __ARM64_KVM_NESTED_H
+#define __ARM64_KVM_NESTED_H
+
+#include <linux/kvm_host.h>
+
+static inline bool nested_virt_in_use(const struct kvm_vcpu *vcpu)
+{
+	return cpus_have_const_cap(ARM64_HAS_NESTED_VIRT) &&
+		test_bit(KVM_ARM_VCPU_NESTED_VIRT, vcpu->arch.features);
+}
+
+#endif /* __ARM64_KVM_NESTED_H */
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index d819a3e8b552..563e2a8bae93 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -106,6 +106,7 @@ struct kvm_regs {
 #define KVM_ARM_VCPU_SVE		4 /* enable SVE for this CPU */
 #define KVM_ARM_VCPU_PTRAUTH_ADDRESS	5 /* VCPU uses address authentication */
 #define KVM_ARM_VCPU_PTRAUTH_GENERIC	6 /* VCPU uses generic authentication */
+#define KVM_ARM_VCPU_NESTED_VIRT	7 /* Support nested virtualization */
 
 struct kvm_vcpu_init {
 	__u32 target;
-- 
2.20.1

* [PATCH 05/59] KVM: arm64: nv: Reset VCPU to EL2 registers if VCPU nested virt is set
@ 2019-06-21  9:37 Marc Zyngier
From: Marc Zyngier @ 2019-06-21  9:37 UTC
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	Julien Thierry, James Morse, Suzuki K Poulose

From: Christoffer Dall <christoffer.dall@arm.com>

Reset the VCPU with PSTATE.M = EL2h when the nested virtualization
feature is enabled on the VCPU.
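
For reference, the architectural PSTATE.M encodings involved (a note,
not part of the patch):

	/* PSTATE.M (AArch64):
	 *   PSR_MODE_EL1h = 0x5  - the previous default
	 *   PSR_MODE_EL2t = 0x8  - EL2, sharing SP_EL0
	 *   PSR_MODE_EL2h = 0x9  - EL2, dedicated SP_EL2 (chosen here)
	 */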

Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm64/kvm/reset.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index 1140b4485575..675ca07dbb05 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -52,6 +52,11 @@ static const struct kvm_regs default_regs_reset = {
 			PSR_F_BIT | PSR_D_BIT),
 };
 
+static const struct kvm_regs default_regs_reset_el2 = {
+	.regs.pstate = (PSR_MODE_EL2h | PSR_A_BIT | PSR_I_BIT |
+			PSR_F_BIT | PSR_D_BIT),
+};
+
 static const struct kvm_regs default_regs_reset32 = {
 	.regs.pstate = (PSR_AA32_MODE_SVC | PSR_AA32_A_BIT |
 			PSR_AA32_I_BIT | PSR_AA32_F_BIT),
@@ -302,6 +307,8 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
 			if (!cpu_has_32bit_el1())
 				goto out;
 			cpu_reset = &default_regs_reset32;
+		} else if (test_bit(KVM_ARM_VCPU_NESTED_VIRT, vcpu->arch.features)) {
+			cpu_reset = &default_regs_reset_el2;
 		} else {
 			cpu_reset = &default_regs_reset;
 		}
-- 
2.20.1

* [PATCH 06/59] KVM: arm64: nv: Allow userspace to set PSR_MODE_EL2x
@ 2019-06-21  9:37 Marc Zyngier
From: Marc Zyngier @ 2019-06-21  9:37 UTC
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	Julien Thierry, James Morse, Suzuki K Poulose

From: Christoffer Dall <christoffer.dall@linaro.org>

We were not allowing userspace to set a more privileged mode for the VCPU
than EL1, but we should allow this when nested virtualization is enabled
for the VCPU.
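
For completeness, a userspace sketch of what this permits (vcpu_fd is
hypothetical; the register id plumbing is the standard KVM_SET_ONE_REG
interface from <linux/kvm.h> and <asm/kvm.h>):

	__u64 pstate = PSR_MODE_EL2h | PSR_A_BIT | PSR_I_BIT |
		       PSR_F_BIT | PSR_D_BIT;
	struct kvm_one_reg reg = {
		.id   = KVM_REG_ARM64 | KVM_REG_SIZE_U64 |
			KVM_REG_ARM_CORE |
			KVM_REG_ARM_CORE_REG(regs.pstate),
		.addr = (__u64)&pstate,
	};

	/* Rejected with EINVAL unless KVM_ARM_VCPU_NESTED_VIRT is set */
	ioctl(vcpu_fd, KVM_SET_ONE_REG, &reg);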

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm64/kvm/guest.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index 3ae2f82fca46..4c35b5d51e21 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -37,6 +37,7 @@
 #include <asm/kvm_emulate.h>
 #include <asm/kvm_coproc.h>
 #include <asm/kvm_host.h>
+#include <asm/kvm_nested.h>
 #include <asm/sigcontext.h>
 
 #include "trace.h"
@@ -194,6 +195,11 @@ static int set_core_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
 			if (vcpu_el1_is_32bit(vcpu))
 				return -EINVAL;
 			break;
+		case PSR_MODE_EL2h:
+		case PSR_MODE_EL2t:
+			if (vcpu_el1_is_32bit(vcpu) || !nested_virt_in_use(vcpu))
+				return -EINVAL;
+			break;
 		default:
 			err = -EINVAL;
 			goto out;
-- 
2.20.1

* [PATCH 07/59] KVM: arm64: nv: Add EL2 system registers to vcpu context
@ 2019-06-21  9:37 Marc Zyngier
From: Marc Zyngier @ 2019-06-21  9:37 UTC
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	Julien Thierry, James Morse, Suzuki K Poulose

From: Jintack Lim <jintack.lim@linaro.org>

ARMv8.3 introduces a new bit in HCR_EL2: the NV bit. When this bit is
set, accessing EL2 registers from EL1 traps to EL2. In addition,
executing the following instructions at EL1 will trap to EL2: tlbi, at,
eret, and msr/mrs instructions accessing SP_EL1. Most of the
instructions that trap to EL2 with the NV bit were UNDEFINED at EL1
prior to ARMv8.3; the only exception is eret.

This patch sets up a handler for EL2 registers and SP_EL1 register
accesses at EL1. The host hypervisor keeps those register values in
memory, and will emulate their behavior.

This patch doesn't set the NV bit yet. It will be set in a later patch
once nested virtualization support is completed.
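
The emulation strategy can be sketched as follows (illustrative only;
the handler name is made up, but struct sys_reg_params and
__vcpu_sys_reg() are the existing sys_regs.c machinery):

	static bool access_el2_reg(struct kvm_vcpu *vcpu,
				   struct sys_reg_params *p,
				   const struct sys_reg_desc *r)
	{
		/* Satisfy the trapped access from the in-memory copy */
		if (p->is_write)
			__vcpu_sys_reg(vcpu, r->reg) = p->regval;
		else
			p->regval = __vcpu_sys_reg(vcpu, r->reg);

		return true;	/* handled; the caller skips the insn */
	}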

Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm64/include/asm/kvm_host.h | 37 +++++++++++++++-
 arch/arm64/include/asm/sysreg.h   | 50 ++++++++++++++++++++-
 arch/arm64/kvm/sys_regs.c         | 74 ++++++++++++++++++++++++++++---
 3 files changed, 154 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 4bcd9c1291d5..2d4290d2513a 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -173,12 +173,47 @@ enum vcpu_sysreg {
 	APGAKEYLO_EL1,
 	APGAKEYHI_EL1,
 
-	/* 32bit specific registers. Keep them at the end of the range */
+	/* 32bit specific registers. */
 	DACR32_EL2,	/* Domain Access Control Register */
 	IFSR32_EL2,	/* Instruction Fault Status Register */
 	FPEXC32_EL2,	/* Floating-Point Exception Control Register */
 	DBGVCR32_EL2,	/* Debug Vector Catch Register */
 
+	/* EL2 registers sorted ascending by Op0, Op1, CRn, CRm, Op2 */
+	FIRST_EL2_SYSREG,
+	VPIDR_EL2 = FIRST_EL2_SYSREG,
+			/* Virtualization Processor ID Register */
+	VMPIDR_EL2,	/* Virtualization Multiprocessor ID Register */
+	SCTLR_EL2,	/* System Control Register (EL2) */
+	ACTLR_EL2,	/* Auxiliary Control Register (EL2) */
+	HCR_EL2,	/* Hypervisor Configuration Register */
+	MDCR_EL2,	/* Monitor Debug Configuration Register (EL2) */
+	CPTR_EL2,	/* Architectural Feature Trap Register (EL2) */
+	HSTR_EL2,	/* Hypervisor System Trap Register */
+	HACR_EL2,	/* Hypervisor Auxiliary Control Register */
+	TTBR0_EL2,	/* Translation Table Base Register 0 (EL2) */
+	TTBR1_EL2,	/* Translation Table Base Register 1 (EL2) */
+	TCR_EL2,	/* Translation Control Register (EL2) */
+	VTTBR_EL2,	/* Virtualization Translation Table Base Register */
+	VTCR_EL2,	/* Virtualization Translation Control Register */
+	SPSR_EL2,	/* EL2 saved program status register */
+	ELR_EL2,	/* EL2 exception link register */
+	AFSR0_EL2,	/* Auxiliary Fault Status Register 0 (EL2) */
+	AFSR1_EL2,	/* Auxiliary Fault Status Register 1 (EL2) */
+	ESR_EL2,	/* Exception Syndrome Register (EL2) */
+	FAR_EL2,	/* Fault Address Register (EL2) */
+	HPFAR_EL2,	/* Hypervisor IPA Fault Address Register */
+	MAIR_EL2,	/* Memory Attribute Indirection Register (EL2) */
+	AMAIR_EL2,	/* Auxiliary Memory Attribute Indirection Register (EL2) */
+	VBAR_EL2,	/* Vector Base Address Register (EL2) */
+	RVBAR_EL2,	/* Reset Vector Base Address Register */
+	RMR_EL2,	/* Reset Management Register */
+	CONTEXTIDR_EL2,	/* Context ID Register (EL2) */
+	TPIDR_EL2,	/* EL2 Software Thread ID Register */
+	CNTVOFF_EL2,	/* Counter-timer Virtual Offset register */
+	CNTHCTL_EL2,	/* Counter-timer Hypervisor Control register */
+	SP_EL2,		/* EL2 Stack Pointer */
+
 	NR_SYS_REGS	/* Nothing after this line! */
 };
 
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index f3ca7e4796ab..8b95f2c42c3d 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -411,17 +411,48 @@
 
 #define SYS_PMCCFILTR_EL0		sys_reg(3, 3, 14, 15, 7)
 
+#define SYS_VPIDR_EL2			sys_reg(3, 4, 0, 0, 0)
+#define SYS_VMPIDR_EL2			sys_reg(3, 4, 0, 0, 5)
+
+#define SYS_SCTLR_EL2			sys_reg(3, 4, 1, 0, 0)
+#define SYS_ACTLR_EL2			sys_reg(3, 4, 1, 0, 1)
+#define SYS_HCR_EL2			sys_reg(3, 4, 1, 1, 0)
+#define SYS_MDCR_EL2			sys_reg(3, 4, 1, 1, 1)
+#define SYS_CPTR_EL2			sys_reg(3, 4, 1, 1, 2)
+#define SYS_HSTR_EL2			sys_reg(3, 4, 1, 1, 3)
+#define SYS_HACR_EL2			sys_reg(3, 4, 1, 1, 7)
+
 #define SYS_ZCR_EL2			sys_reg(3, 4, 1, 2, 0)
+
+#define SYS_TTBR0_EL2			sys_reg(3, 4, 2, 0, 0)
+#define SYS_TTBR1_EL2			sys_reg(3, 4, 2, 0, 1)
+#define SYS_TCR_EL2			sys_reg(3, 4, 2, 0, 2)
+#define SYS_VTTBR_EL2			sys_reg(3, 4, 2, 1, 0)
+#define SYS_VTCR_EL2			sys_reg(3, 4, 2, 1, 2)
+
 #define SYS_DACR32_EL2			sys_reg(3, 4, 3, 0, 0)
+
 #define SYS_SPSR_EL2			sys_reg(3, 4, 4, 0, 0)
 #define SYS_ELR_EL2			sys_reg(3, 4, 4, 0, 1)
+#define SYS_SP_EL1			sys_reg(3, 4, 4, 1, 0)
+
 #define SYS_IFSR32_EL2			sys_reg(3, 4, 5, 0, 1)
+#define SYS_AFSR0_EL2			sys_reg(3, 4, 5, 1, 0)
+#define SYS_AFSR1_EL2			sys_reg(3, 4, 5, 1, 1)
 #define SYS_ESR_EL2			sys_reg(3, 4, 5, 2, 0)
 #define SYS_VSESR_EL2			sys_reg(3, 4, 5, 2, 3)
 #define SYS_FPEXC32_EL2			sys_reg(3, 4, 5, 3, 0)
 #define SYS_FAR_EL2			sys_reg(3, 4, 6, 0, 0)
 
-#define SYS_VDISR_EL2			sys_reg(3, 4, 12, 1,  1)
+#define SYS_HPFAR_EL2			sys_reg(3, 4, 6, 0, 4)
+
+#define SYS_MAIR_EL2			sys_reg(3, 4, 10, 2, 0)
+#define SYS_AMAIR_EL2			sys_reg(3, 4, 10, 3, 0)
+
+#define SYS_VBAR_EL2			sys_reg(3, 4, 12, 0, 0)
+#define SYS_RVBAR_EL2			sys_reg(3, 4, 12, 0, 1)
+#define SYS_RMR_EL2			sys_reg(3, 4, 12, 0, 2)
+#define SYS_VDISR_EL2			sys_reg(3, 4, 12, 1, 1)
 #define __SYS__AP0Rx_EL2(x)		sys_reg(3, 4, 12, 8, x)
 #define SYS_ICH_AP0R0_EL2		__SYS__AP0Rx_EL2(0)
 #define SYS_ICH_AP0R1_EL2		__SYS__AP0Rx_EL2(1)
@@ -463,23 +495,37 @@
 #define SYS_ICH_LR14_EL2		__SYS__LR8_EL2(6)
 #define SYS_ICH_LR15_EL2		__SYS__LR8_EL2(7)
 
+#define SYS_CONTEXTIDR_EL2		sys_reg(3, 4, 13, 0, 1)
+#define SYS_TPIDR_EL2			sys_reg(3, 4, 13, 0, 2)
+
+#define SYS_CNTVOFF_EL2			sys_reg(3, 4, 14, 0, 3)
+#define SYS_CNTHCTL_EL2			sys_reg(3, 4, 14, 1, 0)
+
 /* VHE encodings for architectural EL0/1 system registers */
 #define SYS_SCTLR_EL12			sys_reg(3, 5, 1, 0, 0)
 #define SYS_CPACR_EL12			sys_reg(3, 5, 1, 0, 2)
 #define SYS_ZCR_EL12			sys_reg(3, 5, 1, 2, 0)
+
 #define SYS_TTBR0_EL12			sys_reg(3, 5, 2, 0, 0)
 #define SYS_TTBR1_EL12			sys_reg(3, 5, 2, 0, 1)
 #define SYS_TCR_EL12			sys_reg(3, 5, 2, 0, 2)
+
 #define SYS_SPSR_EL12			sys_reg(3, 5, 4, 0, 0)
 #define SYS_ELR_EL12			sys_reg(3, 5, 4, 0, 1)
+
 #define SYS_AFSR0_EL12			sys_reg(3, 5, 5, 1, 0)
 #define SYS_AFSR1_EL12			sys_reg(3, 5, 5, 1, 1)
 #define SYS_ESR_EL12			sys_reg(3, 5, 5, 2, 0)
+
 #define SYS_FAR_EL12			sys_reg(3, 5, 6, 0, 0)
+
 #define SYS_MAIR_EL12			sys_reg(3, 5, 10, 2, 0)
 #define SYS_AMAIR_EL12			sys_reg(3, 5, 10, 3, 0)
+
 #define SYS_VBAR_EL12			sys_reg(3, 5, 12, 0, 0)
+
 #define SYS_CONTEXTIDR_EL12		sys_reg(3, 5, 13, 0, 1)
+
 #define SYS_CNTKCTL_EL12		sys_reg(3, 5, 14, 1, 0)
 #define SYS_CNTP_TVAL_EL02		sys_reg(3, 5, 14, 2, 0)
 #define SYS_CNTP_CTL_EL02		sys_reg(3, 5, 14, 2, 1)
@@ -488,6 +534,8 @@
 #define SYS_CNTV_CTL_EL02		sys_reg(3, 5, 14, 3, 1)
 #define SYS_CNTV_CVAL_EL02		sys_reg(3, 5, 14, 3, 2)
 
+#define SYS_SP_EL2			sys_reg(3, 6,  4, 1, 0)
+
 /* Common SCTLR_ELx flags. */
 #define SCTLR_ELx_DSSBS	(_BITUL(44))
 #define SCTLR_ELx_ENIA	(_BITUL(31))
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index adb8a7e9c8e4..e81be6debe07 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -184,6 +184,18 @@ static u32 get_ccsidr(u32 csselr)
 	return ccsidr;
 }
 
+static bool access_rw(struct kvm_vcpu *vcpu,
+		      struct sys_reg_params *p,
+		      const struct sys_reg_desc *r)
+{
+	if (p->is_write)
+		vcpu_write_sys_reg(vcpu, p->regval, r->reg);
+	else
+		p->regval = vcpu_read_sys_reg(vcpu, r->reg);
+
+	return true;
+}
+
 /*
  * See note at ARMv7 ARM B1.14.4 (TL;DR: S/W ops are not easily virtualized).
  */
@@ -394,12 +406,9 @@ static bool trap_debug_regs(struct kvm_vcpu *vcpu,
 			    struct sys_reg_params *p,
 			    const struct sys_reg_desc *r)
 {
-	if (p->is_write) {
-		vcpu_write_sys_reg(vcpu, p->regval, r->reg);
+	access_rw(vcpu, p, r);
+	if (p->is_write)
 		vcpu->arch.flags |= KVM_ARM64_DEBUG_DIRTY;
-	} else {
-		p->regval = vcpu_read_sys_reg(vcpu, r->reg);
-	}
 
 	trace_trap_reg(__func__, r->reg, p->is_write, p->regval);
 
@@ -1354,6 +1363,19 @@ static bool access_ccsidr(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
 	.set_user = set_raz_id_reg,		\
 }
 
+static bool access_sp_el1(struct kvm_vcpu *vcpu,
+			  struct sys_reg_params *p,
+			  const struct sys_reg_desc *r)
+{
+	/* SP_EL1 is NOT maintained in sys_regs array */
+	if (p->is_write)
+		vcpu->arch.ctxt.gp_regs.sp_el1 = p->regval;
+	else
+		p->regval = vcpu->arch.ctxt.gp_regs.sp_el1;
+
+	return true;
+}
+
 /*
  * Architected system registers.
  * Important: Must be sorted ascending by Op0, Op1, CRn, CRm, Op2
@@ -1646,9 +1668,51 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 	 */
 	{ SYS_DESC(SYS_PMCCFILTR_EL0), access_pmu_evtyper, reset_val, PMCCFILTR_EL0, 0 },
 
+	{ SYS_DESC(SYS_VPIDR_EL2), access_rw, reset_val, VPIDR_EL2, 0 },
+	{ SYS_DESC(SYS_VMPIDR_EL2), access_rw, reset_val, VMPIDR_EL2, 0 },
+
+	{ SYS_DESC(SYS_SCTLR_EL2), access_rw, reset_val, SCTLR_EL2, 0 },
+	{ SYS_DESC(SYS_ACTLR_EL2), access_rw, reset_val, ACTLR_EL2, 0 },
+	{ SYS_DESC(SYS_HCR_EL2), access_rw, reset_val, HCR_EL2, 0 },
+	{ SYS_DESC(SYS_MDCR_EL2), access_rw, reset_val, MDCR_EL2, 0 },
+	{ SYS_DESC(SYS_CPTR_EL2), access_rw, reset_val, CPTR_EL2, 0 },
+	{ SYS_DESC(SYS_HSTR_EL2), access_rw, reset_val, HSTR_EL2, 0 },
+	{ SYS_DESC(SYS_HACR_EL2), access_rw, reset_val, HACR_EL2, 0 },
+
+	{ SYS_DESC(SYS_TTBR0_EL2), access_rw, reset_val, TTBR0_EL2, 0 },
+	{ SYS_DESC(SYS_TTBR1_EL2), access_rw, reset_val, TTBR1_EL2, 0 },
+	{ SYS_DESC(SYS_TCR_EL2), access_rw, reset_val, TCR_EL2, 0 },
+	{ SYS_DESC(SYS_VTTBR_EL2), access_rw, reset_val, VTTBR_EL2, 0 },
+	{ SYS_DESC(SYS_VTCR_EL2), access_rw, reset_val, VTCR_EL2, 0 },
+
 	{ SYS_DESC(SYS_DACR32_EL2), NULL, reset_unknown, DACR32_EL2 },
+	{ SYS_DESC(SYS_SPSR_EL2), access_rw, reset_val, SPSR_EL2, 0 },
+	{ SYS_DESC(SYS_ELR_EL2), access_rw, reset_val, ELR_EL2, 0 },
+	{ SYS_DESC(SYS_SP_EL1), access_sp_el1 },
+
 	{ SYS_DESC(SYS_IFSR32_EL2), NULL, reset_unknown, IFSR32_EL2 },
+	{ SYS_DESC(SYS_AFSR0_EL2), access_rw, reset_val, AFSR0_EL2, 0 },
+	{ SYS_DESC(SYS_AFSR1_EL2), access_rw, reset_val, AFSR1_EL2, 0 },
+	{ SYS_DESC(SYS_ESR_EL2), access_rw, reset_val, ESR_EL2, 0 },
 	{ SYS_DESC(SYS_FPEXC32_EL2), NULL, reset_val, FPEXC32_EL2, 0x700 },
+
+	{ SYS_DESC(SYS_FAR_EL2), access_rw, reset_val, FAR_EL2, 0 },
+	{ SYS_DESC(SYS_HPFAR_EL2), access_rw, reset_val, HPFAR_EL2, 0 },
+
+	{ SYS_DESC(SYS_MAIR_EL2), access_rw, reset_val, MAIR_EL2, 0 },
+	{ SYS_DESC(SYS_AMAIR_EL2), access_rw, reset_val, AMAIR_EL2, 0 },
+
+	{ SYS_DESC(SYS_VBAR_EL2), access_rw, reset_val, VBAR_EL2, 0 },
+	{ SYS_DESC(SYS_RVBAR_EL2), access_rw, reset_val, RVBAR_EL2, 0 },
+	{ SYS_DESC(SYS_RMR_EL2), access_rw, reset_val, RMR_EL2, 0 },
+
+	{ SYS_DESC(SYS_CONTEXTIDR_EL2), access_rw, reset_val, CONTEXTIDR_EL2, 0 },
+	{ SYS_DESC(SYS_TPIDR_EL2), access_rw, reset_val, TPIDR_EL2, 0 },
+
+	{ SYS_DESC(SYS_CNTVOFF_EL2), access_rw, reset_val, CNTVOFF_EL2, 0 },
+	{ SYS_DESC(SYS_CNTHCTL_EL2), access_rw, reset_val, CNTHCTL_EL2, 0 },
+
+	{ SYS_DESC(SYS_SP_EL2), NULL, reset_unknown, SP_EL2 },
 };
 
 static bool trap_dbgidr(struct kvm_vcpu *vcpu,
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 176+ messages in thread

* [PATCH 08/59] KVM: arm64: nv: Reset VMPIDR_EL2 and VPIDR_EL2 to sane values
  2019-06-21  9:37 [PATCH 00/59] KVM: arm64: ARMv8.3 Nested Virtualization support Marc Zyngier
                   ` (6 preceding siblings ...)
  2019-06-21  9:37 ` [PATCH 07/59] KVM: arm64: nv: Add EL2 system registers to vcpu context Marc Zyngier
@ 2019-06-21  9:37 ` Marc Zyngier
  2019-06-24 12:59   ` Dave Martin
  2019-06-21  9:37 ` [PATCH 09/59] KVM: arm64: nv: Add nested virt VCPU primitives for vEL2 VCPU state Marc Zyngier
                   ` (52 subsequent siblings)
  60 siblings, 1 reply; 176+ messages in thread
From: Marc Zyngier @ 2019-06-21  9:37 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	Julien Thierry, James Morse, Suzuki K Poulose

The VMPIDR_EL2 and VPIDR_EL2 are architecturally UNKNOWN at reset, but
let's be nice to a guest hypervisor behaving foolishly and reset these
to something reasonable anyway.
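
A worked example of the affinity packing done by compute_reset_mpidr()
below; this sketch assumes MPIDR_LEVEL_SHIFT(0/1/2) resolve to 0/8/16,
and the example_* helper is hypothetical:

/* Sketch: split vcpu_id (e.g. 0x12c) across MPIDR affinity levels. */
static u64 example_reset_mpidr(unsigned long vcpu_id)
{
	u64 mpidr;

	mpidr  = (vcpu_id & 0x0f) << 0;		 /* Aff0 = 0x0c */
	mpidr |= ((vcpu_id >> 4) & 0xff) << 8;	 /* Aff1 = 0x12 */
	mpidr |= ((vcpu_id >> 12) & 0xff) << 16; /* Aff2 = 0x00 */
	mpidr |= (1ULL << 31);			 /* bit 31 is RES1 */

	return mpidr;	/* 0x8000120c for vcpu_id == 0x12c */
}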

Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm64/kvm/sys_regs.c | 25 +++++++++++++++++++++----
 1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index e81be6debe07..693dd063c9c2 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -624,7 +624,7 @@ static void reset_amair_el1(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
 	vcpu_write_sys_reg(vcpu, amair, AMAIR_EL1);
 }
 
-static void reset_mpidr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
+static u64 compute_reset_mpidr(struct kvm_vcpu *vcpu)
 {
 	u64 mpidr;
 
@@ -638,7 +638,24 @@ static void reset_mpidr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
 	mpidr = (vcpu->vcpu_id & 0x0f) << MPIDR_LEVEL_SHIFT(0);
 	mpidr |= ((vcpu->vcpu_id >> 4) & 0xff) << MPIDR_LEVEL_SHIFT(1);
 	mpidr |= ((vcpu->vcpu_id >> 12) & 0xff) << MPIDR_LEVEL_SHIFT(2);
-	vcpu_write_sys_reg(vcpu, (1ULL << 31) | mpidr, MPIDR_EL1);
+	mpidr |= (1ULL << 31);
+
+	return mpidr;
+}
+
+static void reset_mpidr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
+{
+	vcpu_write_sys_reg(vcpu, compute_reset_mpidr(vcpu), MPIDR_EL1);
+}
+
+static void reset_vmpidr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
+{
+	vcpu_write_sys_reg(vcpu, compute_reset_mpidr(vcpu), VMPIDR_EL2);
+}
+
+static void reset_vpidr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
+{
+	vcpu_write_sys_reg(vcpu, read_cpuid_id(), VPIDR_EL2);
 }
 
 static void reset_pmcr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
@@ -1668,8 +1685,8 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 	 */
 	{ SYS_DESC(SYS_PMCCFILTR_EL0), access_pmu_evtyper, reset_val, PMCCFILTR_EL0, 0 },
 
-	{ SYS_DESC(SYS_VPIDR_EL2), access_rw, reset_val, VPIDR_EL2, 0 },
-	{ SYS_DESC(SYS_VMPIDR_EL2), access_rw, reset_val, VMPIDR_EL2, 0 },
+	{ SYS_DESC(SYS_VPIDR_EL2), access_rw, reset_vpidr, VPIDR_EL2 },
+	{ SYS_DESC(SYS_VMPIDR_EL2), access_rw, reset_vmpidr, VMPIDR_EL2 },
 
 	{ SYS_DESC(SYS_SCTLR_EL2), access_rw, reset_val, SCTLR_EL2, 0 },
 	{ SYS_DESC(SYS_ACTLR_EL2), access_rw, reset_val, ACTLR_EL2, 0 },
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 176+ messages in thread

* [PATCH 09/59] KVM: arm64: nv: Add nested virt VCPU primitives for vEL2 VCPU state
  2019-06-21  9:37 [PATCH 00/59] KVM: arm64: ARMv8.3 Nested Virtualization support Marc Zyngier
                   ` (7 preceding siblings ...)
  2019-06-21  9:37 ` [PATCH 08/59] KVM: arm64: nv: Reset VMPIDR_EL2 and VPIDR_EL2 to sane values Marc Zyngier
@ 2019-06-21  9:37 ` Marc Zyngier
  2019-06-24 13:08   ` Dave Martin
  2019-06-21  9:37 ` [PATCH 10/59] KVM: arm64: nv: Support virtual EL2 exceptions Marc Zyngier
                   ` (51 subsequent siblings)
  60 siblings, 1 reply; 176+ messages in thread
From: Marc Zyngier @ 2019-06-21  9:37 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	Julien Thierry, James Morse, Suzuki K Poulose

From: Christoffer Dall <christoffer.dall@arm.com>

When running a nested hypervisor we commonly have to figure out whether
the VCPU is running in the context of a guest hypervisor, of that
hypervisor's own nested guest, or of just a normal guest.

Add convenient primitives for this.
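
In essence these helpers decode PSTATE.M. A minimal sketch of the test
they perform (mode constants as in the uapi asm/ptrace.h; the
example_* function name is illustrative):

/* Sketch: is this PSTATE value an AArch64 EL2 mode? */
static bool example_mode_is_el2(u64 cpsr)
{
	u32 mode = cpsr & PSR_MODE_MASK;	/* M[3:0] for AArch64 */

	return !(cpsr & PSR_MODE32_BIT) &&
	       (mode == PSR_MODE_EL2h || mode == PSR_MODE_EL2t);
}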

Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm64/include/asm/kvm_emulate.h | 55 ++++++++++++++++++++++++++++
 1 file changed, 55 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 39ffe41855bc..8f201ea56f6e 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -191,6 +191,61 @@ static inline void vcpu_set_reg(struct kvm_vcpu *vcpu, u8 reg_num,
 		vcpu_gp_regs(vcpu)->regs.regs[reg_num] = val;
 }
 
+static inline bool vcpu_mode_el2_ctxt(const struct kvm_cpu_context *ctxt)
+{
+	unsigned long cpsr = ctxt->gp_regs.regs.pstate;
+	u32 mode;
+
+	if (cpsr & PSR_MODE32_BIT)
+		return false;
+
+	mode = cpsr & PSR_MODE_MASK;
+
+	return mode == PSR_MODE_EL2h || mode == PSR_MODE_EL2t;
+}
+
+static inline bool vcpu_mode_el2(const struct kvm_vcpu *vcpu)
+{
+	return vcpu_mode_el2_ctxt(&vcpu->arch.ctxt);
+}
+
+static inline bool __vcpu_el2_e2h_is_set(const struct kvm_cpu_context *ctxt)
+{
+	return ctxt->sys_regs[HCR_EL2] & HCR_E2H;
+}
+
+static inline bool vcpu_el2_e2h_is_set(const struct kvm_vcpu *vcpu)
+{
+	return __vcpu_el2_e2h_is_set(&vcpu->arch.ctxt);
+}
+
+static inline bool __vcpu_el2_tge_is_set(const struct kvm_cpu_context *ctxt)
+{
+	return ctxt->sys_regs[HCR_EL2] & HCR_TGE;
+}
+
+static inline bool vcpu_el2_tge_is_set(const struct kvm_vcpu *vcpu)
+{
+	return __vcpu_el2_tge_is_set(&vcpu->arch.ctxt);
+}
+
+static inline bool __is_hyp_ctxt(const struct kvm_cpu_context *ctxt)
+{
+	/*
+	 * We are in a hypervisor context if the vcpu mode is EL2, or if
+	 * both the E2H and TGE bits are set. The latter means we are in
+	 * the user space of the VHE kernel; the ARMv8.1 ARM describes
+	 * this as 'InHost'.
+	 */
+	return vcpu_mode_el2_ctxt(ctxt) ||
+		(__vcpu_el2_e2h_is_set(ctxt) && __vcpu_el2_tge_is_set(ctxt)) ||
+		WARN_ON(__vcpu_el2_tge_is_set(ctxt));
+}
+
+static inline bool is_hyp_ctxt(const struct kvm_vcpu *vcpu)
+{
+	return __is_hyp_ctxt(&vcpu->arch.ctxt);
+}
+
 static inline unsigned long vcpu_read_spsr(const struct kvm_vcpu *vcpu)
 {
 	if (vcpu_mode_is_32bit(vcpu))
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 176+ messages in thread

* [PATCH 10/59] KVM: arm64: nv: Support virtual EL2 exceptions
  2019-06-21  9:37 [PATCH 00/59] KVM: arm64: ARMv8.3 Nested Virtualization support Marc Zyngier
                   ` (8 preceding siblings ...)
  2019-06-21  9:37 ` [PATCH 09/59] KVM: arm64: nv: Add nested virt VCPU primitives for vEL2 VCPU state Marc Zyngier
@ 2019-06-21  9:37 ` Marc Zyngier
  2019-07-08 13:56   ` Steven Price
  2019-06-21  9:37 ` [PATCH 11/59] KVM: arm64: nv: Inject HVC exceptions to the virtual EL2 Marc Zyngier
                   ` (50 subsequent siblings)
  60 siblings, 1 reply; 176+ messages in thread
From: Marc Zyngier @ 2019-06-21  9:37 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	Julien Thierry, James Morse, Suzuki K Poulose

From: Jintack Lim <jintack.lim@linaro.org>

Support injecting exceptions and performing exception returns to and
from virtual EL2.  This must be done entirely in software except when
taking an exception from vEL0 to vEL2 when the virtual HCR_EL2.{E2H,TGE}
== {1,1} (a VHE guest hypervisor).
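
A worked example of the vector arithmetic used below: an IRQ taken
while the guest runs at vEL2 with SP_ELx selected vectors to

	VBAR_EL2 + CURRENT_EL_SP_ELx_VECTOR + except_type_irq
	= VBAR_EL2 + 0x200 + 0x80
	= VBAR_EL2 + 0x280

which is exactly what get_el2_except_vector() computes.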

Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm64/include/asm/kvm_arm.h     |  17 +++
 arch/arm64/include/asm/kvm_emulate.h |  22 ++++
 arch/arm64/kvm/Makefile              |   2 +
 arch/arm64/kvm/emulate-nested.c      | 184 +++++++++++++++++++++++++++
 arch/arm64/kvm/inject_fault.c        |  12 --
 arch/arm64/kvm/trace.h               |  56 ++++++++
 6 files changed, 281 insertions(+), 12 deletions(-)
 create mode 100644 arch/arm64/kvm/emulate-nested.c

diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index 7f9d2bfcf82e..9d70a5362fbb 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -339,4 +339,21 @@
 #define CPACR_EL1_TTA		(1 << 28)
 #define CPACR_EL1_DEFAULT	(CPACR_EL1_FPEN | CPACR_EL1_ZEN_EL1EN)
 
+#define kvm_mode_names				\
+	{ PSR_MODE_EL0t,	"EL0t" },	\
+	{ PSR_MODE_EL1t,	"EL1t" },	\
+	{ PSR_MODE_EL1h,	"EL1h" },	\
+	{ PSR_MODE_EL2t,	"EL2t" },	\
+	{ PSR_MODE_EL2h,	"EL2h" },	\
+	{ PSR_MODE_EL3t,	"EL3t" },	\
+	{ PSR_MODE_EL3h,	"EL3h" },	\
+	{ PSR_AA32_MODE_USR,	"32-bit USR" },	\
+	{ PSR_AA32_MODE_FIQ,	"32-bit FIQ" },	\
+	{ PSR_AA32_MODE_IRQ,	"32-bit IRQ" },	\
+	{ PSR_AA32_MODE_SVC,	"32-bit SVC" },	\
+	{ PSR_AA32_MODE_ABT,	"32-bit ABT" },	\
+	{ PSR_AA32_MODE_HYP,	"32-bit HYP" },	\
+	{ PSR_AA32_MODE_UND,	"32-bit UND" },	\
+	{ PSR_AA32_MODE_SYS,	"32-bit SYS" }
+
 #endif /* __ARM64_KVM_ARM_H__ */
diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 8f201ea56f6e..c43aac5fed69 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -33,6 +33,24 @@
 #include <asm/cputype.h>
 #include <asm/virt.h>
 
+#define CURRENT_EL_SP_EL0_VECTOR	0x0
+#define CURRENT_EL_SP_ELx_VECTOR	0x200
+#define LOWER_EL_AArch64_VECTOR		0x400
+#define LOWER_EL_AArch32_VECTOR		0x600
+
+enum exception_type {
+	except_type_sync	= 0,
+	except_type_irq		= 0x80,
+	except_type_fiq		= 0x100,
+	except_type_serror	= 0x180,
+};
+
+#define kvm_exception_type_names		\
+	{ except_type_sync,	"SYNC"   },	\
+	{ except_type_irq,	"IRQ"    },	\
+	{ except_type_fiq,	"FIQ"    },	\
+	{ except_type_serror,	"SERROR" }
+
 unsigned long *vcpu_reg32(const struct kvm_vcpu *vcpu, u8 reg_num);
 unsigned long vcpu_read_spsr32(const struct kvm_vcpu *vcpu);
 void vcpu_write_spsr32(struct kvm_vcpu *vcpu, unsigned long v);
@@ -48,6 +66,10 @@ void kvm_inject_undef32(struct kvm_vcpu *vcpu);
 void kvm_inject_dabt32(struct kvm_vcpu *vcpu, unsigned long addr);
 void kvm_inject_pabt32(struct kvm_vcpu *vcpu, unsigned long addr);
 
+void kvm_emulate_nested_eret(struct kvm_vcpu *vcpu);
+int kvm_inject_nested_sync(struct kvm_vcpu *vcpu, u64 esr_el2);
+int kvm_inject_nested_irq(struct kvm_vcpu *vcpu);
+
 static inline bool vcpu_el1_is_32bit(struct kvm_vcpu *vcpu)
 {
 	return !(vcpu->arch.hcr_el2 & HCR_RW);
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index 3ac1a64d2fb9..9e450aea7db6 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -35,3 +35,5 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic/vgic-debug.o
 kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/irqchip.o
 kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/arch_timer.o
 kvm-$(CONFIG_KVM_ARM_PMU) += $(KVM)/arm/pmu.o
+
+kvm-$(CONFIG_KVM_ARM_HOST) += emulate-nested.o
diff --git a/arch/arm64/kvm/emulate-nested.c b/arch/arm64/kvm/emulate-nested.c
new file mode 100644
index 000000000000..f829b8b04dc8
--- /dev/null
+++ b/arch/arm64/kvm/emulate-nested.c
@@ -0,0 +1,184 @@
+/*
+ * Copyright (C) 2016 - Linaro and Columbia University
+ * Author: Jintack Lim <jintack.lim@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/kvm.h>
+#include <linux/kvm_host.h>
+
+#include <asm/kvm_coproc.h>
+#include <asm/kvm_emulate.h>
+#include <asm/kvm_nested.h>
+
+#include "trace.h"
+
+/* This is borrowed from get_except_vector in inject_fault.c */
+static u64 get_el2_except_vector(struct kvm_vcpu *vcpu,
+		enum exception_type type)
+{
+	u64 exc_offset;
+
+	switch (*vcpu_cpsr(vcpu) & (PSR_MODE_MASK | PSR_MODE32_BIT)) {
+	case PSR_MODE_EL2t:
+		exc_offset = CURRENT_EL_SP_EL0_VECTOR;
+		break;
+	case PSR_MODE_EL2h:
+		exc_offset = CURRENT_EL_SP_ELx_VECTOR;
+		break;
+	case PSR_MODE_EL1t:
+	case PSR_MODE_EL1h:
+	case PSR_MODE_EL0t:
+		exc_offset = LOWER_EL_AArch64_VECTOR;
+		break;
+	default:
+		kvm_err("Unexpected previous exception level: aarch32\n");
+		exc_offset = LOWER_EL_AArch32_VECTOR;
+	}
+
+	return vcpu_read_sys_reg(vcpu, VBAR_EL2) + exc_offset + type;
+}
+
+void kvm_emulate_nested_eret(struct kvm_vcpu *vcpu)
+{
+	u64 spsr, elr, mode;
+	bool direct_eret;
+
+	/*
+	 * Going through the whole put/load motions is a waste of time
+	 * if this is a VHE guest hypervisor returning to its own
+	 * userspace, or the hypervisor performing a local exception
+	 * return. No need to save/restore registers, no need to
+	 * switch S2 MMU. Just do the canonical ERET.
+	 */
+	spsr = vcpu_read_sys_reg(vcpu, SPSR_EL2);
+	mode = spsr & (PSR_MODE_MASK | PSR_MODE32_BIT);
+
+	direct_eret  = (mode == PSR_MODE_EL0t &&
+			vcpu_el2_e2h_is_set(vcpu) &&
+			vcpu_el2_tge_is_set(vcpu));
+	direct_eret |= (mode == PSR_MODE_EL2h || mode == PSR_MODE_EL2t);
+
+	if (direct_eret) {
+		*vcpu_pc(vcpu) = vcpu_read_sys_reg(vcpu, ELR_EL2);
+		*vcpu_cpsr(vcpu) = spsr;
+		trace_kvm_nested_eret(vcpu, *vcpu_pc(vcpu), *vcpu_cpsr(vcpu));
+		return;
+	}
+
+	preempt_disable();
+	kvm_arch_vcpu_put(vcpu);
+
+	spsr = __vcpu_sys_reg(vcpu, SPSR_EL2);
+	elr = __vcpu_sys_reg(vcpu, ELR_EL2);
+
+	trace_kvm_nested_eret(vcpu, elr, spsr);
+
+	/*
+	 * Note that the current exception level is always the virtual EL2,
+	 * since we set HCR_EL2.NV bit only when entering the virtual EL2.
+	 */
+	*vcpu_pc(vcpu) = elr;
+	*vcpu_cpsr(vcpu) = spsr;
+
+	kvm_arch_vcpu_load(vcpu, smp_processor_id());
+	preempt_enable();
+}
+
+static void enter_el2_exception(struct kvm_vcpu *vcpu, u64 esr_el2,
+				enum exception_type type)
+{
+	trace_kvm_inject_nested_exception(vcpu, esr_el2, type);
+
+	vcpu_write_sys_reg(vcpu, *vcpu_cpsr(vcpu), SPSR_EL2);
+	vcpu_write_sys_reg(vcpu, *vcpu_pc(vcpu), ELR_EL2);
+	vcpu_write_sys_reg(vcpu, esr_el2, ESR_EL2);
+
+	*vcpu_pc(vcpu) = get_el2_except_vector(vcpu, type);
+	/* On an exception, PSTATE.SP becomes 1 */
+	*vcpu_cpsr(vcpu) = PSR_MODE_EL2h;
+	*vcpu_cpsr(vcpu) |= PSR_A_BIT | PSR_F_BIT | PSR_I_BIT | PSR_D_BIT;
+}
+
+/*
+ * Emulate taking an exception to EL2.
+ * See ARM ARM J8.1.2 AArch64.TakeException()
+ */
+static int kvm_inject_nested(struct kvm_vcpu *vcpu, u64 esr_el2,
+			     enum exception_type type)
+{
+	u64 pstate, mode;
+	bool direct_inject;
+
+	if (!nested_virt_in_use(vcpu)) {
+		kvm_err("Unexpected call to %s for the non-nesting configuration\n",
+				__func__);
+		return -EINVAL;
+	}
+
+	/*
+	 * As for ERET, we can avoid doing too much on the injection path by
+	 * checking that we either took the exception from a VHE host
+	 * userspace or from vEL2. In these cases, there is no change in
+	 * translation regime (or anything else), so let's do as little as
+	 * possible.
+	 */
+	pstate = *vcpu_cpsr(vcpu);
+	mode = pstate & (PSR_MODE_MASK | PSR_MODE32_BIT);
+
+	direct_inject  = (mode == PSR_MODE_EL0t &&
+			  vcpu_el2_e2h_is_set(vcpu) &&
+			  vcpu_el2_tge_is_set(vcpu));
+	direct_inject |= (mode == PSR_MODE_EL2h || mode == PSR_MODE_EL2t);
+
+	if (direct_inject) {
+		enter_el2_exception(vcpu, esr_el2, type);
+		return 1;
+	}
+
+	preempt_disable();
+	kvm_arch_vcpu_put(vcpu);
+
+	enter_el2_exception(vcpu, esr_el2, type);
+
+	kvm_arch_vcpu_load(vcpu, smp_processor_id());
+	preempt_enable();
+
+	return 1;
+}
+
+int kvm_inject_nested_sync(struct kvm_vcpu *vcpu, u64 esr_el2)
+{
+	return kvm_inject_nested(vcpu, esr_el2, except_type_sync);
+}
+
+int kvm_inject_nested_irq(struct kvm_vcpu *vcpu)
+{
+	/*
+	 * Do not inject an irq if all of the following hold:
+	 *  - the current exception level is (virtual) EL2, and
+	 *  - virtual HCR_EL2.TGE == 0, and
+	 *  - virtual HCR_EL2.IMO == 0.
+	 *
+	 * See Table D1-17 "Physical interrupt target and masking when EL3 is
+	 * not implemented and EL2 is implemented" in ARM DDI 0487C.a.
+	 */
+
+	if (vcpu_mode_el2(vcpu) && !vcpu_el2_tge_is_set(vcpu) &&
+	    !(__vcpu_sys_reg(vcpu, HCR_EL2) & HCR_IMO))
+		return 1;
+
+	/* esr_el2 value doesn't matter for exits due to irqs. */
+	return kvm_inject_nested(vcpu, 0, except_type_irq);
+}
diff --git a/arch/arm64/kvm/inject_fault.c b/arch/arm64/kvm/inject_fault.c
index a55e91dfcf8f..fac962b467bd 100644
--- a/arch/arm64/kvm/inject_fault.c
+++ b/arch/arm64/kvm/inject_fault.c
@@ -28,18 +28,6 @@
 #define PSTATE_FAULT_BITS_64 	(PSR_MODE_EL1h | PSR_A_BIT | PSR_F_BIT | \
 				 PSR_I_BIT | PSR_D_BIT)
 
-#define CURRENT_EL_SP_EL0_VECTOR	0x0
-#define CURRENT_EL_SP_ELx_VECTOR	0x200
-#define LOWER_EL_AArch64_VECTOR		0x400
-#define LOWER_EL_AArch32_VECTOR		0x600
-
-enum exception_type {
-	except_type_sync	= 0,
-	except_type_irq		= 0x80,
-	except_type_fiq		= 0x100,
-	except_type_serror	= 0x180,
-};
-
 static u64 get_except_vector(struct kvm_vcpu *vcpu, enum exception_type type)
 {
 	u64 exc_offset;
diff --git a/arch/arm64/kvm/trace.h b/arch/arm64/kvm/trace.h
index eab91ad0effb..797a705bb644 100644
--- a/arch/arm64/kvm/trace.h
+++ b/arch/arm64/kvm/trace.h
@@ -204,7 +204,63 @@ TRACE_EVENT(kvm_set_guest_debug,
 	TP_printk("vcpu: %p, flags: 0x%08x", __entry->vcpu, __entry->guest_debug)
 );
 
+TRACE_EVENT(kvm_nested_eret,
+	TP_PROTO(struct kvm_vcpu *vcpu, unsigned long elr_el2,
+		 unsigned long spsr_el2),
+	TP_ARGS(vcpu, elr_el2, spsr_el2),
 
+	TP_STRUCT__entry(
+		__field(struct kvm_vcpu *,	vcpu)
+		__field(unsigned long,		elr_el2)
+		__field(unsigned long,		spsr_el2)
+		__field(unsigned long,		target_mode)
+		__field(unsigned long,		hcr_el2)
+	),
+
+	TP_fast_assign(
+		__entry->vcpu = vcpu;
+		__entry->elr_el2 = elr_el2;
+		__entry->spsr_el2 = spsr_el2;
+		__entry->target_mode = spsr_el2 & (PSR_MODE_MASK | PSR_MODE32_BIT);
+		__entry->hcr_el2 = __vcpu_sys_reg(vcpu, HCR_EL2);
+	),
+
+	TP_printk("elr_el2: 0x%lx spsr_el2: 0x%08lx (M: %s) hcr_el2: %lx",
+		  __entry->elr_el2, __entry->spsr_el2,
+		  __print_symbolic(__entry->target_mode, kvm_mode_names),
+		  __entry->hcr_el2)
+);
+
+TRACE_EVENT(kvm_inject_nested_exception,
+	TP_PROTO(struct kvm_vcpu *vcpu, u64 esr_el2, int type),
+	TP_ARGS(vcpu, esr_el2, type),
+
+	TP_STRUCT__entry(
+		__field(struct kvm_vcpu *,		vcpu)
+		__field(unsigned long,			esr_el2)
+		__field(int,				type)
+		__field(unsigned long,			spsr_el2)
+		__field(unsigned long,			pc)
+		__field(int,				source_mode)
+		__field(unsigned long,			hcr_el2)
+	),
+
+	TP_fast_assign(
+		__entry->vcpu = vcpu;
+		__entry->esr_el2 = esr_el2;
+		__entry->type = type;
+		__entry->spsr_el2 = *vcpu_cpsr(vcpu);
+		__entry->pc = *vcpu_pc(vcpu);
+		__entry->source_mode = *vcpu_cpsr(vcpu) & (PSR_MODE_MASK | PSR_MODE32_BIT);
+		__entry->hcr_el2 = __vcpu_sys_reg(vcpu, HCR_EL2);
+	),
+
+	TP_printk("%s: esr_el2 0x%lx elr_el2: 0x%lx spsr_el2: 0x%08lx (M: %s) hcr_el2: %lx",
+		  __print_symbolic(__entry->type, kvm_exception_type_names),
+		  __entry->esr_el2, __entry->pc, __entry->spsr_el2,
+		  __print_symbolic(__entry->source_mode, kvm_mode_names),
+		  __entry->hcr_el2)
+);
 #endif /* _TRACE_ARM64_KVM_H */
 
 #undef TRACE_INCLUDE_PATH
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 176+ messages in thread

* [PATCH 11/59] KVM: arm64: nv: Inject HVC exceptions to the virtual EL2
  2019-06-21  9:37 [PATCH 00/59] KVM: arm64: ARMv8.3 Nested Virtualization support Marc Zyngier
                   ` (9 preceding siblings ...)
  2019-06-21  9:37 ` [PATCH 10/59] KVM: arm64: nv: Support virtual EL2 exceptions Marc Zyngier
@ 2019-06-21  9:37 ` Marc Zyngier
  2019-06-25 13:13   ` Alexandru Elisei
  2019-06-21  9:37 ` [PATCH 12/59] KVM: arm64: nv: Handle trapped ERET from " Marc Zyngier
                   ` (49 subsequent siblings)
  60 siblings, 1 reply; 176+ messages in thread
From: Marc Zyngier @ 2019-06-21  9:37 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	Julien Thierry, James Morse, Suzuki K Poulose

From: Jintack Lim <jintack.lim@linaro.org>

Now that the PSCI call is issued via the smc instruction when nested
virtualization is enabled, it is clear that all hvc instructions from
the VM (including those from the virtual EL2) are supposed to be
handled in the virtual EL2.
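
For illustration, the syndrome forwarded by kvm_inject_nested_sync()
is the one hardware reported for the hvc itself, so the guest
hypervisor can decode it unmodified. A sketch of what it ends up
seeing (fragment, in the context of handle_hvc()):

	u64 esr = kvm_vcpu_get_hsr(vcpu); /* EC == ESR_ELx_EC_HVC64 (0x16) */
	u16 imm = esr & 0xffff;		  /* the hvc #imm, in ISS[15:0] */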

Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm64/kvm/handle_exit.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index 516aead3c2a9..6c0ac52b34cc 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -30,6 +30,7 @@
 #include <asm/kvm_coproc.h>
 #include <asm/kvm_emulate.h>
 #include <asm/kvm_mmu.h>
+#include <asm/kvm_nested.h>
 #include <asm/debug-monitors.h>
 #include <asm/traps.h>
 
@@ -52,6 +53,12 @@ static int handle_hvc(struct kvm_vcpu *vcpu, struct kvm_run *run)
 			    kvm_vcpu_hvc_get_imm(vcpu));
 	vcpu->stat.hvc_exit_stat++;
 
+	/* Forward hvc instructions to the virtual EL2 if the guest has EL2. */
+	if (nested_virt_in_use(vcpu)) {
+		kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
+		return 1;
+	}
+
 	ret = kvm_hvc_call_handler(vcpu);
 	if (ret < 0) {
 		vcpu_set_reg(vcpu, 0, ~0UL);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 176+ messages in thread

* [PATCH 12/59] KVM: arm64: nv: Handle trapped ERET from virtual EL2
  2019-06-21  9:37 [PATCH 00/59] KVM: arm64: ARMv8.3 Nested Virtualization support Marc Zyngier
                   ` (10 preceding siblings ...)
  2019-06-21  9:37 ` [PATCH 11/59] KVM: arm64: nv: Inject HVC exceptions to the virtual EL2 Marc Zyngier
@ 2019-06-21  9:37 ` Marc Zyngier
  2019-07-02 12:00   ` Alexandru Elisei
  2019-06-21  9:37 ` [PATCH 13/59] KVM: arm64: nv: Handle virtual EL2 registers in vcpu_read/write_sys_reg() Marc Zyngier
                   ` (48 subsequent siblings)
  60 siblings, 1 reply; 176+ messages in thread
From: Marc Zyngier @ 2019-06-21  9:37 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	Julien Thierry, James Morse, Suzuki K Poulose

From: Christoffer Dall <christoffer.dall@arm.com>

When a guest hypervisor running virtual EL2 in EL1 executes an ERET
instruction, we will have set HCR_EL2.NV which traps ERET to EL2, so
that we can emulate the exception return in software.
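
A sketch of how the trapped ERET is routed, assuming the usual ESR
layout with EC in bits [31:26] (the open-coded extraction here is for
illustration only):

	u32 esr = kvm_vcpu_get_hsr(vcpu);
	u8 ec = (esr >> 26) & 0x3f;	/* ESR_ELx_EC_SHIFT == 26 */

	if (ec == 0x1A)			/* ESR_ELx_EC_ERET */
		kvm_handle_eret(vcpu, run); /* emulate the return */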

Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm64/include/asm/esr.h     | 3 ++-
 arch/arm64/include/asm/kvm_arm.h | 2 +-
 arch/arm64/kvm/handle_exit.c     | 8 ++++++++
 3 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
index 0e27fe91d5ea..f85aa269082c 100644
--- a/arch/arm64/include/asm/esr.h
+++ b/arch/arm64/include/asm/esr.h
@@ -45,7 +45,8 @@
 #define ESR_ELx_EC_SMC64	(0x17)	/* EL2 and above */
 #define ESR_ELx_EC_SYS64	(0x18)
 #define ESR_ELx_EC_SVE		(0x19)
-/* Unallocated EC: 0x1A - 0x1E */
+#define ESR_ELx_EC_ERET		(0x1A)  /* EL2 only */
+/* Unallocated EC: 0x1B - 0x1E */
 #define ESR_ELx_EC_IMP_DEF	(0x1f)	/* EL3 only */
 #define ESR_ELx_EC_IABT_LOW	(0x20)
 #define ESR_ELx_EC_IABT_CUR	(0x21)
diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index 9d70a5362fbb..b2e363ac624d 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -333,7 +333,7 @@
 	ECN(SP_ALIGN), ECN(FP_EXC32), ECN(FP_EXC64), ECN(SERROR), \
 	ECN(BREAKPT_LOW), ECN(BREAKPT_CUR), ECN(SOFTSTP_LOW), \
 	ECN(SOFTSTP_CUR), ECN(WATCHPT_LOW), ECN(WATCHPT_CUR), \
-	ECN(BKPT32), ECN(VECTOR32), ECN(BRK64)
+	ECN(BKPT32), ECN(VECTOR32), ECN(BRK64), ECN(ERET)
 
 #define CPACR_EL1_FPEN		(3 << 20)
 #define CPACR_EL1_TTA		(1 << 28)
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index 6c0ac52b34cc..2517711f034f 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -177,6 +177,13 @@ static int handle_sve(struct kvm_vcpu *vcpu, struct kvm_run *run)
 {
 	/* Until SVE is supported for guests: */
 	kvm_inject_undefined(vcpu);
+
+	return 1;
+}
+
+static int kvm_handle_eret(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	kvm_emulate_nested_eret(vcpu);
 	return 1;
 }
 
@@ -231,6 +238,7 @@ static exit_handle_fn arm_exit_handlers[] = {
 	[ESR_ELx_EC_SMC64]	= handle_smc,
 	[ESR_ELx_EC_SYS64]	= kvm_handle_sys_reg,
 	[ESR_ELx_EC_SVE]	= handle_sve,
+	[ESR_ELx_EC_ERET]	= kvm_handle_eret,
 	[ESR_ELx_EC_IABT_LOW]	= kvm_handle_guest_abort,
 	[ESR_ELx_EC_DABT_LOW]	= kvm_handle_guest_abort,
 	[ESR_ELx_EC_SOFTSTP_LOW]= kvm_handle_guest_debug,
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 176+ messages in thread

* [PATCH 13/59] KVM: arm64: nv: Handle virtual EL2 registers in vcpu_read/write_sys_reg()
  2019-06-21  9:37 [PATCH 00/59] KVM: arm64: ARMv8.3 Nested Virtualization support Marc Zyngier
                   ` (11 preceding siblings ...)
  2019-06-21  9:37 ` [PATCH 12/59] KVM: arm64: nv: Handle trapped ERET from " Marc Zyngier
@ 2019-06-21  9:37 ` Marc Zyngier
  2019-06-24 12:42   ` Julien Thierry
                     ` (3 more replies)
  2019-06-21  9:37 ` [PATCH 14/59] KVM: arm64: nv: Handle SPSR_EL2 specially Marc Zyngier
                   ` (47 subsequent siblings)
  60 siblings, 4 replies; 176+ messages in thread
From: Marc Zyngier @ 2019-06-21  9:37 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	Julien Thierry, James Morse, Suzuki K Poulose

From: Andre Przywara <andre.przywara@arm.com>

KVM internally uses accessor functions when reading or writing the
guest's system registers. These take care of either accessing the
stored copy or using the "live" EL1 system registers when the host
uses VHE.

With the introduction of virtual EL2 we add a bunch of EL2 system
registers, which now must also be taken care of:
- If the guest is running in vEL2, and we access an EL1 sysreg, we must
  revert to the stored version of that, and not use the CPU's copy.
- If the guest is running in vEL1, and we access an EL2 sysreg, we must
  also use the stored version, since the CPU carries the EL1 copy.
- Some EL2 system registers are supposed to affect the current execution
  of the system, so we need to put them into their respective EL1
  counterparts. For this we need to define a mapping between the two.
  This is done using the newly introduced struct el2_sysreg_map.
- Some EL2 system registers have a different format than their EL1
  counterpart, so we need to translate them before writing them to the
  CPU. This is done using an (optional) translate function in the map.
- There are the three special registers SP_EL2, SPSR_EL2 and ELR_EL2,
  which need some separate handling.

All of these cases are now wrapped into the existing accessor functions,
so KVM users don't need to care whether they access an EL2 or an EL1
register, nor which state the guest is in.

This handles what was formerly known as the "shadow state" dynamically,
without requiring a separate copy for each vCPU EL.
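
As a worked example of the translation step (the input value is
hypothetical, the fixup matches translate_sctlr() below): a non-VHE
guest hypervisor writing SCTLR_EL2 has the raw value stored in the
sysregs array and a translated copy placed in the EL1 register:

	u64 sctlr_el2 = 0x30c50830;			/* guest's write */
	u64 sctlr_el1 = translate_sctlr(sctlr_el2);	/* == 0x30d50830 */
	/* only delta: bit 20, RES0 in SCTLR_EL2 but RES1 in SCTLR_EL1 */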

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm64/include/asm/kvm_emulate.h |   6 +
 arch/arm64/include/asm/kvm_host.h    |   5 +
 arch/arm64/kvm/sys_regs.c            | 163 +++++++++++++++++++++++++++
 3 files changed, 174 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index c43aac5fed69..f37006b6eec4 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -70,6 +70,12 @@ void kvm_emulate_nested_eret(struct kvm_vcpu *vcpu);
 int kvm_inject_nested_sync(struct kvm_vcpu *vcpu, u64 esr_el2);
 int kvm_inject_nested_irq(struct kvm_vcpu *vcpu);
 
+u64 translate_tcr(u64 tcr);
+u64 translate_cptr(u64 tcr);
+u64 translate_sctlr(u64 tcr);
+u64 translate_ttbr0(u64 tcr);
+u64 translate_cnthctl(u64 tcr);
+
 static inline bool vcpu_el1_is_32bit(struct kvm_vcpu *vcpu)
 {
 	return !(vcpu->arch.hcr_el2 & HCR_RW);
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 2d4290d2513a..dae9c42a7219 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -217,6 +217,11 @@ enum vcpu_sysreg {
 	NR_SYS_REGS	/* Nothing after this line! */
 };
 
+static inline bool sysreg_is_el2(int reg)
+{
+	return reg >= FIRST_EL2_SYSREG && reg < NR_SYS_REGS;
+}
+
 /* 32bit mapping */
 #define c0_MPIDR	(MPIDR_EL1 * 2)	/* MultiProcessor ID Register */
 #define c0_CSSELR	(CSSELR_EL1 * 2)/* Cache Size Selection Register */
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 693dd063c9c2..d024114da162 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -76,11 +76,142 @@ static bool write_to_read_only(struct kvm_vcpu *vcpu,
 	return false;
 }
 
+static u64 tcr_el2_ips_to_tcr_el1_ps(u64 tcr_el2)
+{
+	return ((tcr_el2 & TCR_EL2_PS_MASK) >> TCR_EL2_PS_SHIFT)
+		<< TCR_IPS_SHIFT;
+}
+
+u64 translate_tcr(u64 tcr)
+{
+	return TCR_EPD1_MASK |				/* disable TTBR1_EL1 */
+	       ((tcr & TCR_EL2_TBI) ? TCR_TBI0 : 0) |
+	       tcr_el2_ips_to_tcr_el1_ps(tcr) |
+	       (tcr & TCR_EL2_TG0_MASK) |
+	       (tcr & TCR_EL2_ORGN0_MASK) |
+	       (tcr & TCR_EL2_IRGN0_MASK) |
+	       (tcr & TCR_EL2_T0SZ_MASK);
+}
+
+u64 translate_cptr(u64 cptr_el2)
+{
+	u64 cpacr_el1 = 0;
+
+	if (!(cptr_el2 & CPTR_EL2_TFP))
+		cpacr_el1 |= CPACR_EL1_FPEN;
+	if (cptr_el2 & CPTR_EL2_TTA)
+		cpacr_el1 |= CPACR_EL1_TTA;
+	if (!(cptr_el2 & CPTR_EL2_TZ))
+		cpacr_el1 |= CPACR_EL1_ZEN;
+
+	return cpacr_el1;
+}
+
+u64 translate_sctlr(u64 sctlr)
+{
+	/* Bit 20 is RES1 in SCTLR_EL1, but RES0 in SCTLR_EL2 */
+	return sctlr | BIT(20);
+}
+
+u64 translate_ttbr0(u64 ttbr0)
+{
+	/* Force ASID to 0 (ASID 0 or RES0) */
+	return ttbr0 & ~GENMASK_ULL(63, 48);
+}
+
+u64 translate_cnthctl(u64 cnthctl)
+{
+	return ((cnthctl & 0x3) << 10) | (cnthctl & 0xfc);
+}
+
+#define EL2_SYSREG(el2, el1, translate)	\
+	[el2 - FIRST_EL2_SYSREG] = { el2, el1, translate }
+#define PURE_EL2_SYSREG(el2) \
+	[el2 - FIRST_EL2_SYSREG] = { el2, __INVALID_SYSREG__, NULL }
+/*
+ * Associate vEL2 registers to their EL1 counterparts on the CPU.
+ * The translate function can be NULL, when the register layout is identical.
+ */
+struct el2_sysreg_map {
+	int sysreg;	/* EL2 register index into the array above */
+	int mapping;	/* associated EL1 register */
+	u64 (*translate)(u64 value);
+} nested_sysreg_map[NR_SYS_REGS - FIRST_EL2_SYSREG] = {
+	PURE_EL2_SYSREG( VPIDR_EL2 ),
+	PURE_EL2_SYSREG( VMPIDR_EL2 ),
+	PURE_EL2_SYSREG( ACTLR_EL2 ),
+	PURE_EL2_SYSREG( HCR_EL2 ),
+	PURE_EL2_SYSREG( MDCR_EL2 ),
+	PURE_EL2_SYSREG( HSTR_EL2 ),
+	PURE_EL2_SYSREG( HACR_EL2 ),
+	PURE_EL2_SYSREG( VTTBR_EL2 ),
+	PURE_EL2_SYSREG( VTCR_EL2 ),
+	PURE_EL2_SYSREG( RVBAR_EL2 ),
+	PURE_EL2_SYSREG( RMR_EL2 ),
+	PURE_EL2_SYSREG( TPIDR_EL2 ),
+	PURE_EL2_SYSREG( CNTVOFF_EL2 ),
+	PURE_EL2_SYSREG( CNTHCTL_EL2 ),
+	PURE_EL2_SYSREG( HPFAR_EL2 ),
+	EL2_SYSREG(      SCTLR_EL2,  SCTLR_EL1,      translate_sctlr ),
+	EL2_SYSREG(      CPTR_EL2,   CPACR_EL1,      translate_cptr  ),
+	EL2_SYSREG(      TTBR0_EL2,  TTBR0_EL1,      translate_ttbr0 ),
+	EL2_SYSREG(      TTBR1_EL2,  TTBR1_EL1,      NULL            ),
+	EL2_SYSREG(      TCR_EL2,    TCR_EL1,        translate_tcr   ),
+	EL2_SYSREG(      VBAR_EL2,   VBAR_EL1,       NULL            ),
+	EL2_SYSREG(      AFSR0_EL2,  AFSR0_EL1,      NULL            ),
+	EL2_SYSREG(      AFSR1_EL2,  AFSR1_EL1,      NULL            ),
+	EL2_SYSREG(      ESR_EL2,    ESR_EL1,        NULL            ),
+	EL2_SYSREG(      FAR_EL2,    FAR_EL1,        NULL            ),
+	EL2_SYSREG(      MAIR_EL2,   MAIR_EL1,       NULL            ),
+	EL2_SYSREG(      AMAIR_EL2,  AMAIR_EL1,      NULL            ),
+};
+
+static
+const struct el2_sysreg_map *find_el2_sysreg(const struct el2_sysreg_map *map,
+					     int reg)
+{
+	const struct el2_sysreg_map *entry;
+
+	if (!sysreg_is_el2(reg))
+		return NULL;
+
+	entry = &nested_sysreg_map[reg - FIRST_EL2_SYSREG];
+	if (entry->sysreg == __INVALID_SYSREG__)
+		return NULL;
+
+	return entry;
+}
+
 u64 vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg)
 {
+
 	if (!vcpu->arch.sysregs_loaded_on_cpu)
 		goto immediate_read;
 
+	if (unlikely(sysreg_is_el2(reg))) {
+		const struct el2_sysreg_map *el2_reg;
+
+		if (!is_hyp_ctxt(vcpu))
+			goto immediate_read;
+
+		el2_reg = find_el2_sysreg(nested_sysreg_map, reg);
+		if (el2_reg) {
+			/*
+			 * If this register does not have an EL1 counterpart,
+			 * then read the stored EL2 version.
+			 */
+			if (el2_reg->mapping == __INVALID_SYSREG__)
+				goto immediate_read;
+
+			/* Get the current version of the EL1 counterpart. */
+			reg = el2_reg->mapping;
+		}
+	} else {
+		/* EL1 register can't be on the CPU if the guest is in vEL2. */
+		if (unlikely(is_hyp_ctxt(vcpu)))
+			goto immediate_read;
+	}
+
 	/*
 	 * System registers listed in the switch are not saved on every
 	 * exit from the guest but are only saved on vcpu_put.
@@ -114,6 +245,8 @@ u64 vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg)
 	case DACR32_EL2:	return read_sysreg_s(SYS_DACR32_EL2);
 	case IFSR32_EL2:	return read_sysreg_s(SYS_IFSR32_EL2);
 	case DBGVCR32_EL2:	return read_sysreg_s(SYS_DBGVCR32_EL2);
+	case SP_EL2:		return read_sysreg(sp_el1);
+	case ELR_EL2:		return read_sysreg_el1(SYS_ELR);
 	}
 
 immediate_read:
@@ -125,6 +258,34 @@ void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, u64 val, int reg)
 	if (!vcpu->arch.sysregs_loaded_on_cpu)
 		goto immediate_write;
 
+	if (unlikely(sysreg_is_el2(reg))) {
+		const struct el2_sysreg_map *el2_reg;
+
+		if (!is_hyp_ctxt(vcpu))
+			goto immediate_write;
+
+		/* Store the EL2 version in the sysregs array. */
+		__vcpu_sys_reg(vcpu, reg) = val;
+
+		el2_reg = find_el2_sysreg(nested_sysreg_map, reg);
+		if (el2_reg) {
+			/* Does this register have an EL1 counterpart? */
+			if (el2_reg->mapping == __INVALID_SYSREG__)
+				return;
+
+			if (!vcpu_el2_e2h_is_set(vcpu) &&
+			    el2_reg->translate)
+				val = el2_reg->translate(val);
+
+			/* Redirect this to the EL1 version of the register. */
+			reg = el2_reg->mapping;
+		}
+	} else {
+		/* EL1 register can't be on the CPU if the guest is in vEL2. */
+		if (unlikely(is_hyp_ctxt(vcpu)))
+			goto immediate_write;
+	}
+
 	/*
 	 * System registers listed in the switch are not restored on every
 	 * entry to the guest but are only restored on vcpu_load.
@@ -157,6 +318,8 @@ void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, u64 val, int reg)
 	case DACR32_EL2:	write_sysreg_s(val, SYS_DACR32_EL2);	return;
 	case IFSR32_EL2:	write_sysreg_s(val, SYS_IFSR32_EL2);	return;
 	case DBGVCR32_EL2:	write_sysreg_s(val, SYS_DBGVCR32_EL2);	return;
+	case SP_EL2:		write_sysreg(val, sp_el1);		return;
+	case ELR_EL2:		write_sysreg_el1(val, SYS_ELR);		return;
 	}
 
 immediate_write:
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 176+ messages in thread

* [PATCH 14/59] KVM: arm64: nv: Handle SPSR_EL2 specially
  2019-06-21  9:37 [PATCH 00/59] KVM: arm64: ARMv8.3 Nested Virtualization support Marc Zyngier
                   ` (12 preceding siblings ...)
  2019-06-21  9:37 ` [PATCH 13/59] KVM: arm64: nv: Handle virtual EL2 registers in vcpu_read/write_sys_reg() Marc Zyngier
@ 2019-06-21  9:37 ` Marc Zyngier
  2019-06-21  9:37 ` [PATCH 15/59] KVM: arm64: nv: Refactor vcpu_{read,write}_sys_reg Marc Zyngier
                   ` (46 subsequent siblings)
  60 siblings, 0 replies; 176+ messages in thread
From: Marc Zyngier @ 2019-06-21  9:37 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	Julien Thierry, James Morse, Suzuki K Poulose

SPSR_EL2 needs special attention when running nested on ARMv8.3:

If taking an exception while running at vEL2 (actually EL1), the
HW will update the SPSR_EL1 register with the EL1 mode. We need
to track this in order to make sure that accesses to the virtual
view of SPSR_EL2 are correct.

To do so, we place an illegal value in SPSR_EL1.M, and patch it
accordingly if required when accessing it.
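
A sketch of the round trip for the non-VHE (E2H clear) case, mirroring
the __fixup_spsr_el2_{write,read} helpers below (the sample value is
arbitrary):

	u64 stored = PSR_MODE_EL2h | PSR_I_BIT;	/* guest's SPSR_EL2 view */
	u64 hw = stored & ~0xcUL;	/* M[3:2] cleared before writing */

	/*
	 * On read back: M[3:2] still zero means the CPU never touched
	 * SPSR_EL1, so return the stored copy. M[3:2] non-zero means a
	 * local EL1 exception fired, so patch .M back to EL2 before
	 * handing the value to the guest.
	 */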

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm64/include/asm/kvm_emulate.h | 45 ++++++++++++++++++++++++++++
 arch/arm64/kvm/sys_regs.c            | 28 ++++++++++++++++-
 2 files changed, 72 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index f37006b6eec4..2644258e96ba 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -274,11 +274,51 @@ static inline bool is_hyp_ctxt(const struct kvm_vcpu *vcpu)
 	return __is_hyp_ctxt(&vcpu->arch.ctxt);
 }
 
+static inline u64 __fixup_spsr_el2_write(struct kvm_cpu_context *ctxt, u64 val)
+{
+	if (!__vcpu_el2_e2h_is_set(ctxt)) {
+		/*
+		 * Clear the .M field when writing SPSR to the CPU, so that we
+		 * can detect when the CPU clobbered our SPSR copy during a
+		 * local exception.
+		 */
+		val &= ~0xc;
+	}
+
+	return val;
+}
+
+static inline u64 __fixup_spsr_el2_read(const struct kvm_cpu_context *ctxt, u64 val)
+{
+	if (__vcpu_el2_e2h_is_set(ctxt))
+		return val;
+
+	/*
+	 * SPSR.M == 0 means the CPU has not touched the SPSR, so the
+	 * register has still the value we saved on the last write.
+	 */
+	if ((val & 0xc) == 0)
+		return ctxt->sys_regs[SPSR_EL2];
+
+	/*
+	 * Otherwise there was a "local" exception on the CPU,
+	 * which from the guest's point of view was being taken from
+	 * EL2 to EL2, although it actually happened to be from
+	 * EL1 to EL1.
+	 * So we need to fix the .M field in SPSR, to make it look
+	 * like EL2, which is what the guest would expect.
+	 */
+	return (val & ~0x0c) | CurrentEL_EL2;
+}
+
 static inline unsigned long vcpu_read_spsr(const struct kvm_vcpu *vcpu)
 {
 	if (vcpu_mode_is_32bit(vcpu))
 		return vcpu_read_spsr32(vcpu);
 
+	if (unlikely(vcpu_mode_el2(vcpu)))
+		return vcpu_read_sys_reg(vcpu, SPSR_EL2);
+
 	if (vcpu->arch.sysregs_loaded_on_cpu)
 		return read_sysreg_el1(SYS_SPSR);
 	else
@@ -292,6 +332,11 @@ static inline void vcpu_write_spsr(struct kvm_vcpu *vcpu, unsigned long v)
 		return;
 	}
 
+	if (unlikely(vcpu_mode_el2(vcpu))) {
+		vcpu_write_sys_reg(vcpu, v, SPSR_EL2);
+		return;
+	}
+
 	if (vcpu->arch.sysregs_loaded_on_cpu)
 		write_sysreg_el1(v, SYS_SPSR);
 	else
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index d024114da162..2b8734f75a09 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -184,6 +184,7 @@ const struct el2_sysreg_map *find_el2_sysreg(const struct el2_sysreg_map *map,
 
 u64 vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg)
 {
+	u64 val;
 
 	if (!vcpu->arch.sysregs_loaded_on_cpu)
 		goto immediate_read;
@@ -194,6 +195,12 @@ u64 vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg)
 		if (!is_hyp_ctxt(vcpu))
 			goto immediate_read;
 
+		switch (reg) {
+		case SPSR_EL2:
+			val = read_sysreg_el1(SYS_SPSR);
+			return __fixup_spsr_el2_read(&vcpu->arch.ctxt, val);
+		}
+
 		el2_reg = find_el2_sysreg(nested_sysreg_map, reg);
 		if (el2_reg) {
 			/*
@@ -267,6 +274,13 @@ void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, u64 val, int reg)
 		/* Store the EL2 version in the sysregs array. */
 		__vcpu_sys_reg(vcpu, reg) = val;
 
+		switch (reg) {
+		case SPSR_EL2:
+			val = __fixup_spsr_el2_write(&vcpu->arch.ctxt, val);
+			write_sysreg_el1(val, SYS_SPSR);
+			return;
+		}
+
 		el2_reg = find_el2_sysreg(nested_sysreg_map, reg);
 		if (el2_reg) {
 			/* Does this register have an EL1 counterpart? */
@@ -1556,6 +1570,18 @@ static bool access_sp_el1(struct kvm_vcpu *vcpu,
 	return true;
 }
 
+static bool access_spsr_el2(struct kvm_vcpu *vcpu,
+			    struct sys_reg_params *p,
+			    const struct sys_reg_desc *r)
+{
+	if (p->is_write)
+		vcpu_write_sys_reg(vcpu, p->regval, SPSR_EL2);
+	else
+		p->regval = vcpu_read_sys_reg(vcpu, SPSR_EL2);
+
+	return true;
+}
+
 /*
  * Architected system registers.
  * Important: Must be sorted ascending by Op0, Op1, CRn, CRm, Op2
@@ -1866,7 +1892,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 	{ SYS_DESC(SYS_VTCR_EL2), access_rw, reset_val, VTCR_EL2, 0 },
 
 	{ SYS_DESC(SYS_DACR32_EL2), NULL, reset_unknown, DACR32_EL2 },
-	{ SYS_DESC(SYS_SPSR_EL2), access_rw, reset_val, SPSR_EL2, 0 },
+	{ SYS_DESC(SYS_SPSR_EL2), access_spsr_el2, reset_val, SPSR_EL2, 0 },
 	{ SYS_DESC(SYS_ELR_EL2), access_rw, reset_val, ELR_EL2, 0 },
 	{ SYS_DESC(SYS_SP_EL1), access_sp_el1 },
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 176+ messages in thread

* [PATCH 15/59] KVM: arm64: nv: Refactor vcpu_{read,write}_sys_reg
  2019-06-21  9:37 [PATCH 00/59] KVM: arm64: ARMv8.3 Nested Virtualization support Marc Zyngier
                   ` (13 preceding siblings ...)
  2019-06-21  9:37 ` [PATCH 14/59] KVM: arm64: nv: Handle SPSR_EL2 specially Marc Zyngier
@ 2019-06-21  9:37 ` Marc Zyngier
  2019-06-24 15:07   ` Julien Thierry
  2019-06-27  9:21   ` Alexandru Elisei
  2019-06-21  9:38 ` [PATCH 16/59] KVM: arm64: nv: Save/Restore vEL2 sysregs Marc Zyngier
                   ` (45 subsequent siblings)
  60 siblings, 2 replies; 176+ messages in thread
From: Marc Zyngier @ 2019-06-21  9:37 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	Julien Thierry, James Morse, Suzuki K Poulose

Extract the direct HW accessors for later reuse.
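
The resulting call pattern for users of the extracted helpers is
sketched below: try the live CPU copy first, and fall back to the
in-memory array otherwise (fragment, in the context of
vcpu_read_sys_reg()):

	u64 val;

	if (!__vcpu_read_sys_reg_from_cpu(reg, &val))
		val = __vcpu_sys_reg(vcpu, reg);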

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm64/kvm/sys_regs.c | 247 +++++++++++++++++++++-----------------
 1 file changed, 139 insertions(+), 108 deletions(-)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 2b8734f75a09..e181359adadf 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -182,99 +182,161 @@ const struct el2_sysreg_map *find_el2_sysreg(const struct el2_sysreg_map *map,
 	return entry;
 }
 
+static bool __vcpu_read_sys_reg_from_cpu(int reg, u64 *val)
+{
+	/*
+	 * System registers listed in the switch are not saved on every
+	 * exit from the guest but are only saved on vcpu_put.
+	 *
+	 * Note that MPIDR_EL1 for the guest is set by KVM via VMPIDR_EL2 but
+	 * should never be listed below, because the guest cannot modify its
+	 * own MPIDR_EL1 and MPIDR_EL1 is accessed for VCPU A from VCPU B's
+	 * thread when emulating cross-VCPU communication.
+	 */
+	switch (reg) {
+	case CSSELR_EL1:	*val = read_sysreg_s(SYS_CSSELR_EL1);	break;
+	case SCTLR_EL1:		*val = read_sysreg_s(SYS_SCTLR_EL12);	break;
+	case ACTLR_EL1:		*val = read_sysreg_s(SYS_ACTLR_EL1);	break;
+	case CPACR_EL1:		*val = read_sysreg_s(SYS_CPACR_EL12);	break;
+	case TTBR0_EL1:		*val = read_sysreg_s(SYS_TTBR0_EL12);	break;
+	case TTBR1_EL1:		*val = read_sysreg_s(SYS_TTBR1_EL12);	break;
+	case TCR_EL1:		*val = read_sysreg_s(SYS_TCR_EL12);	break;
+	case ESR_EL1:		*val = read_sysreg_s(SYS_ESR_EL12);	break;
+	case AFSR0_EL1:		*val = read_sysreg_s(SYS_AFSR0_EL12);	break;
+	case AFSR1_EL1:		*val = read_sysreg_s(SYS_AFSR1_EL12);	break;
+	case FAR_EL1:		*val = read_sysreg_s(SYS_FAR_EL12);	break;
+	case MAIR_EL1:		*val = read_sysreg_s(SYS_MAIR_EL12);	break;
+	case VBAR_EL1:		*val = read_sysreg_s(SYS_VBAR_EL12);	break;
+	case CONTEXTIDR_EL1:	*val = read_sysreg_s(SYS_CONTEXTIDR_EL12);break;
+	case TPIDR_EL0:		*val = read_sysreg_s(SYS_TPIDR_EL0);	break;
+	case TPIDRRO_EL0:	*val = read_sysreg_s(SYS_TPIDRRO_EL0);	break;
+	case TPIDR_EL1:		*val = read_sysreg_s(SYS_TPIDR_EL1);	break;
+	case AMAIR_EL1:		*val = read_sysreg_s(SYS_AMAIR_EL12);	break;
+	case CNTKCTL_EL1:	*val = read_sysreg_s(SYS_CNTKCTL_EL12);	break;
+	case PAR_EL1:		*val = read_sysreg_s(SYS_PAR_EL1);	break;
+	case DACR32_EL2:	*val = read_sysreg_s(SYS_DACR32_EL2);	break;
+	case IFSR32_EL2:	*val = read_sysreg_s(SYS_IFSR32_EL2);	break;
+	case DBGVCR32_EL2:	*val = read_sysreg_s(SYS_DBGVCR32_EL2);	break;
+	default:		return false;
+	}
+
+	return true;
+}
+
+static bool __vcpu_write_sys_reg_to_cpu(u64 val, int reg)
+{
+	/*
+	 * System registers listed in the switch are not restored on every
+	 * entry to the guest but are only restored on vcpu_load.
+	 *
+	 * Note that MPIDR_EL1 for the guest is set by KVM via VMPIDR_EL2 but
+	 * should never be listed below, because the MPIDR should only be
+	 * set once, before running the VCPU, and never changed later.
+	 */
+	switch (reg) {
+	case CSSELR_EL1:	write_sysreg_s(val, SYS_CSSELR_EL1);	break;
+	case SCTLR_EL1:		write_sysreg_s(val, SYS_SCTLR_EL12);	break;
+	case ACTLR_EL1:		write_sysreg_s(val, SYS_ACTLR_EL1);	break;
+	case CPACR_EL1:		write_sysreg_s(val, SYS_CPACR_EL12);	break;
+	case TTBR0_EL1:		write_sysreg_s(val, SYS_TTBR0_EL12);	break;
+	case TTBR1_EL1:		write_sysreg_s(val, SYS_TTBR1_EL12);	break;
+	case TCR_EL1:		write_sysreg_s(val, SYS_TCR_EL12);	break;
+	case ESR_EL1:		write_sysreg_s(val, SYS_ESR_EL12);	break;
+	case AFSR0_EL1:		write_sysreg_s(val, SYS_AFSR0_EL12);	break;
+	case AFSR1_EL1:		write_sysreg_s(val, SYS_AFSR1_EL12);	break;
+	case FAR_EL1:		write_sysreg_s(val, SYS_FAR_EL12);	break;
+	case MAIR_EL1:		write_sysreg_s(val, SYS_MAIR_EL12);	break;
+	case VBAR_EL1:		write_sysreg_s(val, SYS_VBAR_EL12);	break;
+	case CONTEXTIDR_EL1:	write_sysreg_s(val, SYS_CONTEXTIDR_EL12);break;
+	case TPIDR_EL0:		write_sysreg_s(val, SYS_TPIDR_EL0);	break;
+	case TPIDRRO_EL0:	write_sysreg_s(val, SYS_TPIDRRO_EL0);	break;
+	case TPIDR_EL1:		write_sysreg_s(val, SYS_TPIDR_EL1);	break;
+	case AMAIR_EL1:		write_sysreg_s(val, SYS_AMAIR_EL12);	break;
+	case CNTKCTL_EL1:	write_sysreg_s(val, SYS_CNTKCTL_EL12);	break;
+	case PAR_EL1:		write_sysreg_s(val, SYS_PAR_EL1);	break;
+	case DACR32_EL2:	write_sysreg_s(val, SYS_DACR32_EL2);	break;
+	case IFSR32_EL2:	write_sysreg_s(val, SYS_IFSR32_EL2);	break;
+	case DBGVCR32_EL2:	write_sysreg_s(val, SYS_DBGVCR32_EL2);	break;
+	default:		return false;
+	}
+
+	return true;
+}
+
 u64 vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg)
 {
-	u64 val;
+	u64 val = 0x8badf00d8badf00d;
 
 	if (!vcpu->arch.sysregs_loaded_on_cpu)
-		goto immediate_read;
+		goto memory_read;
 
 	if (unlikely(sysreg_is_el2(reg))) {
 		const struct el2_sysreg_map *el2_reg;
 
 		if (!is_hyp_ctxt(vcpu))
-			goto immediate_read;
+			goto memory_read;
 
 		switch (reg) {
+		case ELR_EL2:
+			return read_sysreg_el1(SYS_ELR);
 		case SPSR_EL2:
 			val = read_sysreg_el1(SYS_SPSR);
 			return __fixup_spsr_el2_read(&vcpu->arch.ctxt, val);
 		}
 
 		el2_reg = find_el2_sysreg(nested_sysreg_map, reg);
-		if (el2_reg) {
-			/*
-			 * If this register does not have an EL1 counterpart,
-			 * then read the stored EL2 version.
-			 */
-			if (el2_reg->mapping == __INVALID_SYSREG__)
-				goto immediate_read;
-
-			/* Get the current version of the EL1 counterpart. */
-			reg = el2_reg->mapping;
-		}
-	} else {
-		/* EL1 register can't be on the CPU if the guest is in vEL2. */
-		if (unlikely(is_hyp_ctxt(vcpu)))
-			goto immediate_read;
+		BUG_ON(!el2_reg);
+
+		/*
+		 * If this register does not have an EL1 counterpart,
+		 * then read the stored EL2 version.
+		 */
+		if (el2_reg->mapping == __INVALID_SYSREG__)
+			goto memory_read;
+
+		if (!vcpu_el2_e2h_is_set(vcpu) &&
+		    el2_reg->translate)
+			goto memory_read;
+
+		/* Get the current version of the EL1 counterpart. */
+		reg = el2_reg->mapping;
+		WARN_ON(!__vcpu_read_sys_reg_from_cpu(reg, &val));
+		return val;
 	}
 
-	/*
-	 * System registers listed in the switch are not saved on every
-	 * exit from the guest but are only saved on vcpu_put.
-	 *
-	 * Note that MPIDR_EL1 for the guest is set by KVM via VMPIDR_EL2 but
-	 * should never be listed below, because the guest cannot modify its
-	 * own MPIDR_EL1 and MPIDR_EL1 is accessed for VCPU A from VCPU B's
-	 * thread when emulating cross-VCPU communication.
-	 */
-	switch (reg) {
-	case CSSELR_EL1:	return read_sysreg_s(SYS_CSSELR_EL1);
-	case SCTLR_EL1:		return read_sysreg_s(SYS_SCTLR_EL12);
-	case ACTLR_EL1:		return read_sysreg_s(SYS_ACTLR_EL1);
-	case CPACR_EL1:		return read_sysreg_s(SYS_CPACR_EL12);
-	case TTBR0_EL1:		return read_sysreg_s(SYS_TTBR0_EL12);
-	case TTBR1_EL1:		return read_sysreg_s(SYS_TTBR1_EL12);
-	case TCR_EL1:		return read_sysreg_s(SYS_TCR_EL12);
-	case ESR_EL1:		return read_sysreg_s(SYS_ESR_EL12);
-	case AFSR0_EL1:		return read_sysreg_s(SYS_AFSR0_EL12);
-	case AFSR1_EL1:		return read_sysreg_s(SYS_AFSR1_EL12);
-	case FAR_EL1:		return read_sysreg_s(SYS_FAR_EL12);
-	case MAIR_EL1:		return read_sysreg_s(SYS_MAIR_EL12);
-	case VBAR_EL1:		return read_sysreg_s(SYS_VBAR_EL12);
-	case CONTEXTIDR_EL1:	return read_sysreg_s(SYS_CONTEXTIDR_EL12);
-	case TPIDR_EL0:		return read_sysreg_s(SYS_TPIDR_EL0);
-	case TPIDRRO_EL0:	return read_sysreg_s(SYS_TPIDRRO_EL0);
-	case TPIDR_EL1:		return read_sysreg_s(SYS_TPIDR_EL1);
-	case AMAIR_EL1:		return read_sysreg_s(SYS_AMAIR_EL12);
-	case CNTKCTL_EL1:	return read_sysreg_s(SYS_CNTKCTL_EL12);
-	case PAR_EL1:		return read_sysreg_s(SYS_PAR_EL1);
-	case DACR32_EL2:	return read_sysreg_s(SYS_DACR32_EL2);
-	case IFSR32_EL2:	return read_sysreg_s(SYS_IFSR32_EL2);
-	case DBGVCR32_EL2:	return read_sysreg_s(SYS_DBGVCR32_EL2);
-	case SP_EL2:		return read_sysreg(sp_el1);
-	case ELR_EL2:		return read_sysreg_el1(SYS_ELR);
-	}
+	/* EL1 register can't be on the CPU if the guest is in vEL2. */
+	if (unlikely(is_hyp_ctxt(vcpu)))
+		goto memory_read;
+
+	if (__vcpu_read_sys_reg_from_cpu(reg, &val))
+		return val;
 
-immediate_read:
+memory_read:
 	return __vcpu_sys_reg(vcpu, reg);
 }
 
 void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, u64 val, int reg)
 {
 	if (!vcpu->arch.sysregs_loaded_on_cpu)
-		goto immediate_write;
+		goto memory_write;
 
 	if (unlikely(sysreg_is_el2(reg))) {
 		const struct el2_sysreg_map *el2_reg;
 
 		if (!is_hyp_ctxt(vcpu))
-			goto immediate_write;
+			goto memory_write;
 
-		/* Store the EL2 version in the sysregs array. */
+		/*
+		 * Always store a copy of the write to memory to avoid having
+		 * to reverse-translate virtual EL2 system registers for a
+		 * non-VHE guest hypervisor.
+		 */
 		__vcpu_sys_reg(vcpu, reg) = val;
 
 		switch (reg) {
+		case ELR_EL2:
+			write_sysreg_el1(val, SYS_ELR);
+			return;
 		case SPSR_EL2:
 			val = __fixup_spsr_el2_write(&vcpu->arch.ctxt, val);
 			write_sysreg_el1(val, SYS_SPSR);
@@ -282,61 +344,30 @@ void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, u64 val, int reg)
 		}
 
 		el2_reg = find_el2_sysreg(nested_sysreg_map, reg);
-		if (el2_reg) {
-			/* Does this register have an EL1 counterpart? */
-			if (el2_reg->mapping == __INVALID_SYSREG__)
-				return;
+		if (WARN(!el2_reg, "reg: %d\n", reg))
+			goto memory_write;
 
-			if (!vcpu_el2_e2h_is_set(vcpu) &&
-			    el2_reg->translate)
-				val = el2_reg->translate(val);
+		/* Does this register have an EL1 counterpart? */
+		if (el2_reg->mapping == __INVALID_SYSREG__)
+			goto memory_write;
 
-			/* Redirect this to the EL1 version of the register. */
-			reg = el2_reg->mapping;
-		}
-	} else {
-		/* EL1 register can't be on the CPU if the guest is in vEL2. */
-		if (unlikely(is_hyp_ctxt(vcpu)))
-			goto immediate_write;
-	}
+		if (!vcpu_el2_e2h_is_set(vcpu) &&
+		    el2_reg->translate)
+			val = el2_reg->translate(val);
 
-	/*
-	 * System registers listed in the switch are not restored on every
-	 * entry to the guest but are only restored on vcpu_load.
-	 *
-	 * Note that MPIDR_EL1 for the guest is set by KVM via VMPIDR_EL2 but
-	 * should never be listed below, because the the MPIDR should only be
-	 * set once, before running the VCPU, and never changed later.
-	 */
-	switch (reg) {
-	case CSSELR_EL1:	write_sysreg_s(val, SYS_CSSELR_EL1);	return;
-	case SCTLR_EL1:		write_sysreg_s(val, SYS_SCTLR_EL12);	return;
-	case ACTLR_EL1:		write_sysreg_s(val, SYS_ACTLR_EL1);	return;
-	case CPACR_EL1:		write_sysreg_s(val, SYS_CPACR_EL12);	return;
-	case TTBR0_EL1:		write_sysreg_s(val, SYS_TTBR0_EL12);	return;
-	case TTBR1_EL1:		write_sysreg_s(val, SYS_TTBR1_EL12);	return;
-	case TCR_EL1:		write_sysreg_s(val, SYS_TCR_EL12);	return;
-	case ESR_EL1:		write_sysreg_s(val, SYS_ESR_EL12);	return;
-	case AFSR0_EL1:		write_sysreg_s(val, SYS_AFSR0_EL12);	return;
-	case AFSR1_EL1:		write_sysreg_s(val, SYS_AFSR1_EL12);	return;
-	case FAR_EL1:		write_sysreg_s(val, SYS_FAR_EL12);	return;
-	case MAIR_EL1:		write_sysreg_s(val, SYS_MAIR_EL12);	return;
-	case VBAR_EL1:		write_sysreg_s(val, SYS_VBAR_EL12);	return;
-	case CONTEXTIDR_EL1:	write_sysreg_s(val, SYS_CONTEXTIDR_EL12); return;
-	case TPIDR_EL0:		write_sysreg_s(val, SYS_TPIDR_EL0);	return;
-	case TPIDRRO_EL0:	write_sysreg_s(val, SYS_TPIDRRO_EL0);	return;
-	case TPIDR_EL1:		write_sysreg_s(val, SYS_TPIDR_EL1);	return;
-	case AMAIR_EL1:		write_sysreg_s(val, SYS_AMAIR_EL12);	return;
-	case CNTKCTL_EL1:	write_sysreg_s(val, SYS_CNTKCTL_EL12);	return;
-	case PAR_EL1:		write_sysreg_s(val, SYS_PAR_EL1);	return;
-	case DACR32_EL2:	write_sysreg_s(val, SYS_DACR32_EL2);	return;
-	case IFSR32_EL2:	write_sysreg_s(val, SYS_IFSR32_EL2);	return;
-	case DBGVCR32_EL2:	write_sysreg_s(val, SYS_DBGVCR32_EL2);	return;
-	case SP_EL2:		write_sysreg(val, sp_el1);		return;
-	case ELR_EL2:		write_sysreg_el1(val, SYS_ELR);		return;
+		/* Redirect this to the EL1 version of the register. */
+		reg = el2_reg->mapping;
+		WARN_ON(!__vcpu_write_sys_reg_to_cpu(val, reg));
+		return;
 	}
 
-immediate_write:
+	/* EL1 register can't be on the CPU if the guest is in vEL2. */
+	if (unlikely(is_hyp_ctxt(vcpu)))
+		goto memory_write;
+
+	if (__vcpu_write_sys_reg_to_cpu(val, reg))
+		return;
+
+memory_write:
 	 __vcpu_sys_reg(vcpu, reg) = val;
 }
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 176+ messages in thread

* [PATCH 16/59] KVM: arm64: nv: Save/Restore vEL2 sysregs
  2019-06-21  9:37 [PATCH 00/59] KVM: arm64: ARMv8.3 Nested Virtualization support Marc Zyngier
                   ` (14 preceding siblings ...)
  2019-06-21  9:37 ` [PATCH 15/59] KVM: arm64: nv: Refactor vcpu_{read,write}_sys_reg Marc Zyngier
@ 2019-06-21  9:38 ` Marc Zyngier
  2019-06-25  8:48   ` Julien Thierry
                     ` (2 more replies)
  2019-06-21  9:38 ` [PATCH 17/59] KVM: arm64: nv: Emulate PSTATE.M for a guest hypervisor Marc Zyngier
                   ` (44 subsequent siblings)
  60 siblings, 3 replies; 176+ messages in thread
From: Marc Zyngier @ 2019-06-21  9:38 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	Julien Thierry, James Morse, Suzuki K Poulose

From: Andre Przywara <andre.przywara@arm.com>

Whenever we need to restore the guest's system registers to the CPU, we
now need to take care of the EL2 system registers as well. Most of them
are accessed via traps only, but some have an immediate effect, and a
guest running in VHE mode would also expect them to be accessible via
their EL1 encoding, which we do not trap.

Split the current __sysreg_{save,restore}_el1_state() functions into
handling common sysregs, then differentiate between the guest running in
vEL2 and vEL1.

For vEL2 we write the virtual EL2 registers that have an identical format
directly into their EL1 counterparts, and translate the few registers whose
format differs, so that they have the same effect on execution when running
a non-VHE guest hypervisor.
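
To give an idea of what "translate" means here, below is a minimal,
self-contained sketch of a TCR-style remap: non-VHE TCR_EL2 encodes the PA
size as PS[18:16] while TCR_EL1 uses IPS[34:32], with the low fields shared.
The helper name and the reduced field set are illustrative only, not the
actual translate_tcr() from this series:

#include <stdint.h>
#include <stdio.h>

#define TCR_EL2_PS_SHIFT	16
#define TCR_EL1_IPS_SHIFT	32
#define TCR_PS_MASK		0x7ULL
#define TCR_COMMON_MASK		0xffffULL /* T0SZ, IRGN0, ORGN0, SH0, TG0 */

/* Hypothetical non-VHE TCR_EL2 -> TCR_EL1 image conversion. */
static uint64_t sketch_translate_tcr(uint64_t tcr_el2)
{
	uint64_t ps = (tcr_el2 >> TCR_EL2_PS_SHIFT) & TCR_PS_MASK;

	/* Keep the shared low fields, move PS into the IPS position. */
	return (tcr_el2 & TCR_COMMON_MASK) | (ps << TCR_EL1_IPS_SHIFT);
}

int main(void)
{
	/* 48-bit VA (T0SZ=16), PS=0b101 (48-bit PA). */
	uint64_t tcr_el2 = 16 | (5ULL << TCR_EL2_PS_SHIFT);

	printf("TCR_EL1 image: 0x%llx\n",
	       (unsigned long long)sketch_translate_tcr(tcr_el2));
	return 0;
}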

  [ Commit message reworked and many bug fixes applied by Marc Zyngier
    and Christoffer Dall. ]

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
---
 arch/arm64/kvm/hyp/sysreg-sr.c | 160 +++++++++++++++++++++++++++++++--
 1 file changed, 153 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
index 62866a68e852..2abb9c3ff24f 100644
--- a/arch/arm64/kvm/hyp/sysreg-sr.c
+++ b/arch/arm64/kvm/hyp/sysreg-sr.c
@@ -22,6 +22,7 @@
 #include <asm/kvm_asm.h>
 #include <asm/kvm_emulate.h>
 #include <asm/kvm_hyp.h>
+#include <asm/kvm_nested.h>
 
 /*
  * Non-VHE: Both host and guest must save everything.
@@ -51,11 +52,9 @@ static void __hyp_text __sysreg_save_user_state(struct kvm_cpu_context *ctxt)
 	ctxt->sys_regs[TPIDRRO_EL0]	= read_sysreg(tpidrro_el0);
 }
 
-static void __hyp_text __sysreg_save_el1_state(struct kvm_cpu_context *ctxt)
+static void __hyp_text __sysreg_save_vel1_state(struct kvm_cpu_context *ctxt)
 {
-	ctxt->sys_regs[CSSELR_EL1]	= read_sysreg(csselr_el1);
 	ctxt->sys_regs[SCTLR_EL1]	= read_sysreg_el1(SYS_SCTLR);
-	ctxt->sys_regs[ACTLR_EL1]	= read_sysreg(actlr_el1);
 	ctxt->sys_regs[CPACR_EL1]	= read_sysreg_el1(SYS_CPACR);
 	ctxt->sys_regs[TTBR0_EL1]	= read_sysreg_el1(SYS_TTBR0);
 	ctxt->sys_regs[TTBR1_EL1]	= read_sysreg_el1(SYS_TTBR1);
@@ -69,14 +68,58 @@ static void __hyp_text __sysreg_save_el1_state(struct kvm_cpu_context *ctxt)
 	ctxt->sys_regs[CONTEXTIDR_EL1]	= read_sysreg_el1(SYS_CONTEXTIDR);
 	ctxt->sys_regs[AMAIR_EL1]	= read_sysreg_el1(SYS_AMAIR);
 	ctxt->sys_regs[CNTKCTL_EL1]	= read_sysreg_el1(SYS_CNTKCTL);
-	ctxt->sys_regs[PAR_EL1]		= read_sysreg(par_el1);
-	ctxt->sys_regs[TPIDR_EL1]	= read_sysreg(tpidr_el1);
 
 	ctxt->gp_regs.sp_el1		= read_sysreg(sp_el1);
 	ctxt->gp_regs.elr_el1		= read_sysreg_el1(SYS_ELR);
 	ctxt->gp_regs.spsr[KVM_SPSR_EL1]= read_sysreg_el1(SYS_SPSR);
 }
 
+static void __sysreg_save_vel2_state(struct kvm_cpu_context *ctxt)
+{
+	ctxt->sys_regs[ESR_EL2]		= read_sysreg_el1(SYS_ESR);
+	ctxt->sys_regs[AFSR0_EL2]	= read_sysreg_el1(SYS_AFSR0);
+	ctxt->sys_regs[AFSR1_EL2]	= read_sysreg_el1(SYS_AFSR1);
+	ctxt->sys_regs[FAR_EL2]		= read_sysreg_el1(SYS_FAR);
+	ctxt->sys_regs[MAIR_EL2]	= read_sysreg_el1(SYS_MAIR);
+	ctxt->sys_regs[VBAR_EL2]	= read_sysreg_el1(SYS_VBAR);
+	ctxt->sys_regs[CONTEXTIDR_EL2]	= read_sysreg_el1(SYS_CONTEXTIDR);
+	ctxt->sys_regs[AMAIR_EL2]	= read_sysreg_el1(SYS_AMAIR);
+
+	/*
+	 * In VHE mode those registers are compatible between EL1 and EL2,
+	 * and the guest uses the _EL1 versions on the CPU naturally.
+	 * So we save them into their _EL2 versions here.
+	 * For nVHE mode we trap accesses to those registers, so our
+	 * _EL2 copy in sys_regs[] is always up-to-date and we don't need
+	 * to save anything here.
+	 */
+	if (__vcpu_el2_e2h_is_set(ctxt)) {
+		ctxt->sys_regs[SCTLR_EL2]	= read_sysreg_el1(SYS_SCTLR);
+		ctxt->sys_regs[CPTR_EL2]	= read_sysreg_el1(SYS_CPACR);
+		ctxt->sys_regs[TTBR0_EL2]	= read_sysreg_el1(SYS_TTBR0);
+		ctxt->sys_regs[TTBR1_EL2]	= read_sysreg_el1(SYS_TTBR1);
+		ctxt->sys_regs[TCR_EL2]		= read_sysreg_el1(SYS_TCR);
+		ctxt->sys_regs[CNTHCTL_EL2]	= read_sysreg_el1(SYS_CNTKCTL);
+	}
+
+	ctxt->sys_regs[SP_EL2]		= read_sysreg(sp_el1);
+	ctxt->sys_regs[ELR_EL2]		= read_sysreg_el1(SYS_ELR);
+	ctxt->sys_regs[SPSR_EL2]	= __fixup_spsr_el2_read(ctxt, read_sysreg_el1(SYS_SPSR));
+}
+
+static void __hyp_text __sysreg_save_el1_state(struct kvm_cpu_context *ctxt)
+{
+	ctxt->sys_regs[CSSELR_EL1]	= read_sysreg(csselr_el1);
+	ctxt->sys_regs[ACTLR_EL1]	= read_sysreg(actlr_el1);
+	ctxt->sys_regs[PAR_EL1]		= read_sysreg(par_el1);
+	ctxt->sys_regs[TPIDR_EL1]	= read_sysreg(tpidr_el1);
+
+	if (unlikely(__is_hyp_ctxt(ctxt)))
+		__sysreg_save_vel2_state(ctxt);
+	else
+		__sysreg_save_vel1_state(ctxt);
+}
+
 static void __hyp_text __sysreg_save_el2_return_state(struct kvm_cpu_context *ctxt)
 {
 	ctxt->gp_regs.regs.pc		= read_sysreg_el2(SYS_ELR);
@@ -124,10 +167,91 @@ static void __hyp_text __sysreg_restore_user_state(struct kvm_cpu_context *ctxt)
 	write_sysreg(ctxt->sys_regs[TPIDRRO_EL0],	tpidrro_el0);
 }
 
-static void __hyp_text __sysreg_restore_el1_state(struct kvm_cpu_context *ctxt)
+static void __sysreg_restore_vel2_state(struct kvm_cpu_context *ctxt)
 {
+	u64 val;
+
+	write_sysreg(read_cpuid_id(),			vpidr_el2);
 	write_sysreg(ctxt->sys_regs[MPIDR_EL1],		vmpidr_el2);
-	write_sysreg(ctxt->sys_regs[CSSELR_EL1],	csselr_el1);
+	write_sysreg_el1(ctxt->sys_regs[MAIR_EL2],	SYS_MAIR);
+	write_sysreg_el1(ctxt->sys_regs[VBAR_EL2],	SYS_VBAR);
+	write_sysreg_el1(ctxt->sys_regs[CONTEXTIDR_EL2],SYS_CONTEXTIDR);
+	write_sysreg_el1(ctxt->sys_regs[AMAIR_EL2],	SYS_AMAIR);
+
+	if (__vcpu_el2_e2h_is_set(ctxt)) {
+		/*
+		 * In VHE mode those registers are compatible between
+		 * EL1 and EL2.
+		 */
+		write_sysreg_el1(ctxt->sys_regs[SCTLR_EL2],	SYS_SCTLR);
+		write_sysreg_el1(ctxt->sys_regs[CPTR_EL2],	SYS_CPACR);
+		write_sysreg_el1(ctxt->sys_regs[TTBR0_EL2],	SYS_TTBR0);
+		write_sysreg_el1(ctxt->sys_regs[TTBR1_EL2],	SYS_TTBR1);
+		write_sysreg_el1(ctxt->sys_regs[TCR_EL2],	SYS_TCR);
+		write_sysreg_el1(ctxt->sys_regs[CNTHCTL_EL2],	SYS_CNTKCTL);
+	} else {
+		write_sysreg_el1(translate_sctlr(ctxt->sys_regs[SCTLR_EL2]),
+				 SYS_SCTLR);
+		write_sysreg_el1(translate_cptr(ctxt->sys_regs[CPTR_EL2]),
+				 SYS_CPACR);
+		write_sysreg_el1(translate_ttbr0(ctxt->sys_regs[TTBR0_EL2]),
+				 SYS_TTBR0);
+		write_sysreg_el1(translate_tcr(ctxt->sys_regs[TCR_EL2]),
+				 SYS_TCR);
+		write_sysreg_el1(translate_cnthctl(ctxt->sys_regs[CNTHCTL_EL2]),
+				 SYS_CNTKCTL);
+	}
+
+	/*
+	 * These registers can be modified behind our back by a fault
+	 * taken inside vEL2. Restore them unconditionally.
+	 */
+	write_sysreg_el1(ctxt->sys_regs[ESR_EL2],	SYS_ESR);
+	write_sysreg_el1(ctxt->sys_regs[AFSR0_EL2],	SYS_AFSR0);
+	write_sysreg_el1(ctxt->sys_regs[AFSR1_EL2],	SYS_AFSR1);
+	write_sysreg_el1(ctxt->sys_regs[FAR_EL2],	SYS_FAR);
+	write_sysreg(ctxt->sys_regs[SP_EL2],		sp_el1);
+	write_sysreg_el1(ctxt->sys_regs[ELR_EL2],	SYS_ELR);
+
+	val = __fixup_spsr_el2_write(ctxt, ctxt->sys_regs[SPSR_EL2]);
+	write_sysreg_el1(val,	SYS_SPSR);
+}
+
+static void __hyp_text __sysreg_restore_vel1_state(struct kvm_cpu_context *ctxt)
+{
+	u64 mpidr;
+
+	if (has_vhe()) {
+		struct kvm_vcpu *vcpu;
+
+		/*
+		 * Warning: this hack only works on VHE, because we only
+		 * call this with the *guest* context, which is part of
+		 * struct kvm_vcpu. On a host context, you'd get pure junk.
+		 */
+		vcpu = container_of(ctxt, struct kvm_vcpu, arch.ctxt);
+
+		if (nested_virt_in_use(vcpu)) {
+			/*
+			 * Only set VPIDR_EL2 for nested VMs, as this is the
+			 * only time it changes. We'll restore the MIDR_EL1
+			 * view on put.
+			 */
+			write_sysreg(ctxt->sys_regs[VPIDR_EL2],	vpidr_el2);
+
+			/*
+			 * As we're restoring a nested guest, set the value
+			 * provided by the guest hypervisor.
+			 */
+			mpidr = ctxt->sys_regs[VMPIDR_EL2];
+		} else {
+			mpidr = ctxt->sys_regs[MPIDR_EL1];
+		}
+	} else {
+		mpidr = ctxt->sys_regs[MPIDR_EL1];
+	}
+
+	write_sysreg(mpidr,				vmpidr_el2);
 	write_sysreg_el1(ctxt->sys_regs[SCTLR_EL1],	SYS_SCTLR);
 	write_sysreg(ctxt->sys_regs[ACTLR_EL1],		actlr_el1);
 	write_sysreg_el1(ctxt->sys_regs[CPACR_EL1],	SYS_CPACR);
@@ -151,6 +275,19 @@ static void __hyp_text __sysreg_restore_el1_state(struct kvm_cpu_context *ctxt)
 	write_sysreg_el1(ctxt->gp_regs.spsr[KVM_SPSR_EL1],SYS_SPSR);
 }
 
+static void __hyp_text __sysreg_restore_el1_state(struct kvm_cpu_context *ctxt)
+{
+	write_sysreg(ctxt->sys_regs[CSSELR_EL1],	csselr_el1);
+	write_sysreg(ctxt->sys_regs[ACTLR_EL1],	  	actlr_el1);
+	write_sysreg(ctxt->sys_regs[PAR_EL1],		par_el1);
+	write_sysreg(ctxt->sys_regs[TPIDR_EL1],		tpidr_el1);
+
+	if (__is_hyp_ctxt(ctxt))
+		__sysreg_restore_vel2_state(ctxt);
+	else
+		__sysreg_restore_vel1_state(ctxt);
+}
+
 static void __hyp_text
 __sysreg_restore_el2_return_state(struct kvm_cpu_context *ctxt)
 {
@@ -307,6 +444,15 @@ void kvm_vcpu_put_sysregs(struct kvm_vcpu *vcpu)
 	/* Restore host user state */
 	__sysreg_restore_user_state(host_ctxt);
 
+	/*
+	 * If leaving a nesting guest, restore the default MIDR_EL1 view of
+	 * VPIDR_EL2. It is slightly ugly to do it here, but the alternative
+	 * is to penalize
+	 * all non-nesting guests by forcing this on every load. Instead, we
+	 * choose to only penalize nesting VMs.
+	 */
+	if (nested_virt_in_use(vcpu))
+		write_sysreg(read_cpuid_id(),	vpidr_el2);
+
 	vcpu->arch.sysregs_loaded_on_cpu = false;
 }
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 176+ messages in thread

* [PATCH 17/59] KVM: arm64: nv: Emulate PSTATE.M for a guest hypervisor
  2019-06-21  9:37 [PATCH 00/59] KVM: arm64: ARMv8.3 Nested Virtualization support Marc Zyngier
                   ` (15 preceding siblings ...)
  2019-06-21  9:38 ` [PATCH 16/59] KVM: arm64: nv: Save/Restore vEL2 sysregs Marc Zyngier
@ 2019-06-21  9:38 ` Marc Zyngier
  2019-06-21  9:38 ` [PATCH 18/59] KVM: arm64: nv: Trap EL1 VM register accesses in virtual EL2 Marc Zyngier
                   ` (43 subsequent siblings)
  60 siblings, 0 replies; 176+ messages in thread
From: Marc Zyngier @ 2019-06-21  9:38 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	Julien Thierry, James Morse, Suzuki K Poulose

From: Christoffer Dall <christoffer.dall@arm.com>

We can no longer blindly copy the VCPU's PSTATE into SPSR_EL2 and return
to the guest and vice versa when taking an exception to the hypervisor,
because we emulate virtual EL2 in EL1 and therefore have to translate
the mode field from EL2 to EL1 and vice versa.
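
For reference, the PSTATE.M encodings involved and the translation
direction used on guest entry, as a standalone sketch (the constants match
the architectural encodings; the helper itself is illustrative):

#include <assert.h>
#include <stdint.h>

#define PSR_MODE_MASK	0xf
#define PSR_MODE_EL1t	0x4
#define PSR_MODE_EL1h	0x5
#define PSR_MODE_EL2t	0x8
#define PSR_MODE_EL2h	0x9

/* vEL2 -> hardware EL1: the direction used when entering the guest. */
static uint64_t to_hw_mode(uint64_t pstate)
{
	uint64_t mode = pstate & PSR_MODE_MASK;

	if (mode == PSR_MODE_EL2t)
		mode = PSR_MODE_EL1t;
	else if (mode == PSR_MODE_EL2h)
		mode = PSR_MODE_EL1h;

	return (pstate & ~PSR_MODE_MASK) | mode;
}

int main(void)
{
	/* A vEL2 guest in EL2h must actually run the hardware in EL1h. */
	assert((to_hw_mode(PSR_MODE_EL2h) & PSR_MODE_MASK) == PSR_MODE_EL1h);
	return 0;
}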

Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm64/kvm/hyp/sysreg-sr.c | 41 ++++++++++++++++++++++++++++++++--
 1 file changed, 39 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
index 2abb9c3ff24f..ea800eed811d 100644
--- a/arch/arm64/kvm/hyp/sysreg-sr.c
+++ b/arch/arm64/kvm/hyp/sysreg-sr.c
@@ -120,10 +120,32 @@ static void __hyp_text __sysreg_save_el1_state(struct kvm_cpu_context *ctxt)
 		__sysreg_save_vel1_state(ctxt);
 }
 
+static u64 __hyp_text from_hw_pstate(const struct kvm_cpu_context *ctxt)
+{
+	u64 reg = read_sysreg_el2(SYS_SPSR);
+
+	if (__is_hyp_ctxt(ctxt)) {
+		u64 mode = reg & PSR_MODE_MASK;
+
+		switch (mode) {
+		case PSR_MODE_EL1t:
+			mode = PSR_MODE_EL2t;
+			break;
+		case PSR_MODE_EL1h:
+			mode = PSR_MODE_EL2h;
+			break;
+		}
+
+		return (reg & ~PSR_MODE_MASK) | mode;
+	}
+
+	return reg;
+}
+
 static void __hyp_text __sysreg_save_el2_return_state(struct kvm_cpu_context *ctxt)
 {
 	ctxt->gp_regs.regs.pc		= read_sysreg_el2(SYS_ELR);
-	ctxt->gp_regs.regs.pstate	= read_sysreg_el2(SYS_SPSR);
+	ctxt->gp_regs.regs.pstate	= from_hw_pstate(ctxt);
 
 	if (cpus_have_const_cap(ARM64_HAS_RAS_EXTN))
 		ctxt->sys_regs[DISR_EL1] = read_sysreg_s(SYS_VDISR_EL2);
@@ -288,10 +310,25 @@ static void __hyp_text __sysreg_restore_el1_state(struct kvm_cpu_context *ctxt)
 		__sysreg_restore_vel1_state(ctxt);
 }
 
+/* Read the VCPU state's PSTATE, but translate (v)EL2 to EL1. */
+static u64 __hyp_text to_hw_pstate(const struct kvm_cpu_context *ctxt)
+{
+	u64 mode = ctxt->gp_regs.regs.pstate & PSR_MODE_MASK;
+
+	switch (mode) {
+	case PSR_MODE_EL2t:
+		mode = PSR_MODE_EL1t;
+		break;
+	case PSR_MODE_EL2h:
+		mode = PSR_MODE_EL1h;
+		break;
+	}
+
+	return (ctxt->gp_regs.regs.pstate & ~PSR_MODE_MASK) | mode;
+}
+
 static void __hyp_text
 __sysreg_restore_el2_return_state(struct kvm_cpu_context *ctxt)
 {
-	u64 pstate = ctxt->gp_regs.regs.pstate;
+	u64 pstate = to_hw_pstate(ctxt);
 	u64 mode = pstate & PSR_AA32_MODE_MASK;
 
 	/*
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 176+ messages in thread

* [PATCH 18/59] KVM: arm64: nv: Trap EL1 VM register accesses in virtual EL2
  2019-06-21  9:37 [PATCH 00/59] KVM: arm64: ARMv8.3 Nested Virtualization support Marc Zyngier
                   ` (16 preceding siblings ...)
  2019-06-21  9:38 ` [PATCH 17/59] KVM: arm64: nv: Emulate PSTATE.M for a guest hypervisor Marc Zyngier
@ 2019-06-21  9:38 ` Marc Zyngier
  2019-07-01 16:12   ` Alexandru Elisei
  2019-06-21  9:38 ` [PATCH 19/59] KVM: arm64: nv: Trap SPSR_EL1, ELR_EL1 and VBAR_EL1 from " Marc Zyngier
                   ` (42 subsequent siblings)
  60 siblings, 1 reply; 176+ messages in thread
From: Marc Zyngier @ 2019-06-21  9:38 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	Julien Thierry, James Morse, Suzuki K Poulose

From: Christoffer Dall <christoffer.dall@linaro.org>

When running in virtual EL2 mode, we actually run the hardware in EL1
and therefore have to use the EL1 registers to ensure correct operation.

By setting HCR_EL2.TVM and HCR_EL2.TRVM we ensure that a guest running in
virtual EL2 doesn't shoot itself in the foot when setting up what it
believes to be a different mode's system register state (for example when
preparing to switch to a VM).

We can leverage the existing sysregs infrastructure to support trapped
accesses to these registers.
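
A toy model of why the trap is needed: while running vEL2, the hardware EL1
registers hold vEL2's own state, so a vEL2 write to an EL1 VM register must
land in the shadow copy instead. The names below (vcpu_model,
vel2_write_sctlr_el1) are made up for illustration:

#include <stdint.h>
#include <stdio.h>

/* Toy model: hardware EL1 registers currently back vEL2's own state,
 * so a vEL2 write to an EL1 VM register must go to the shadow copy. */
struct vcpu_model {
	uint64_t hw_sctlr_el1;		/* holds vEL2's SCTLR image */
	uint64_t shadow_sctlr_el1;	/* the guest's view of SCTLR_EL1 */
};

static void vel2_write_sctlr_el1(struct vcpu_model *v, uint64_t val)
{
	/* Trapped by HCR_EL2.TVM: do NOT touch hw_sctlr_el1. */
	v->shadow_sctlr_el1 = val;
}

int main(void)
{
	struct vcpu_model v = { .hw_sctlr_el1 = 0x30d00800 };

	vel2_write_sctlr_el1(&v, 0xc50838);
	printf("hw: 0x%llx shadow: 0x%llx\n",
	       (unsigned long long)v.hw_sctlr_el1,
	       (unsigned long long)v.shadow_sctlr_el1);
	return 0;
}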

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm64/kvm/hyp/switch.c | 4 ++++
 arch/arm64/kvm/sys_regs.c   | 7 ++++++-
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 7b55c11b30fb..791b26570347 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -135,6 +135,10 @@ static void __hyp_text __activate_traps(struct kvm_vcpu *vcpu)
 {
 	u64 hcr = vcpu->arch.hcr_el2;
 
+	/* Trap VM sysreg accesses if an EL2 guest is not using VHE. */
+	if (vcpu_mode_el2(vcpu) && !vcpu_el2_e2h_is_set(vcpu))
+		hcr |= HCR_TVM | HCR_TRVM;
+
 	write_sysreg(hcr, hcr_el2);
 
 	if (cpus_have_const_cap(ARM64_HAS_RAS_EXTN) && (hcr & HCR_VSE))
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index e181359adadf..0464d8e29cba 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -440,7 +440,12 @@ static bool access_vm_reg(struct kvm_vcpu *vcpu,
 	u64 val;
 	int reg = r->reg;
 
-	BUG_ON(!p->is_write);
+	BUG_ON(!vcpu_mode_el2(vcpu) && !p->is_write);
+
+	if (!p->is_write) {
+		p->regval = vcpu_read_sys_reg(vcpu, reg);
+		return true;
+	}
 
 	/* See the 32bit mapping in kvm_host.h */
 	if (p->is_aarch32)
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 176+ messages in thread

* [PATCH 19/59] KVM: arm64: nv: Trap SPSR_EL1, ELR_EL1 and VBAR_EL1 from virtual EL2
  2019-06-21  9:37 [PATCH 00/59] KVM: arm64: ARMv8.3 Nested Virtualization support Marc Zyngier
                   ` (17 preceding siblings ...)
  2019-06-21  9:38 ` [PATCH 18/59] KVM: arm64: nv: Trap EL1 VM register accesses in virtual EL2 Marc Zyngier
@ 2019-06-21  9:38 ` Marc Zyngier
  2019-06-21  9:38 ` [PATCH 20/59] KVM: arm64: nv: Trap CPACR_EL1 access in " Marc Zyngier
                   ` (41 subsequent siblings)
  60 siblings, 0 replies; 176+ messages in thread
From: Marc Zyngier @ 2019-06-21  9:38 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	Julien Thierry, James Morse, Suzuki K Poulose

From: Jintack Lim <jintack.lim@linaro.org>

For the same reason we trap virtual memory register accesses from virtual
EL2, we need to trap SPSR_EL1, ELR_EL1 and VBAR_EL1 accesses. ARMv8.3
introduces the HCR_EL2.NV1 bit to allow trapping those register accesses
from EL1. Do not set this bit until the whole of the nesting support is
complete.

Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm64/kvm/sys_regs.c | 29 ++++++++++++++++++++++++++++-
 1 file changed, 28 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 0464d8e29cba..7fc87657382d 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1606,6 +1606,30 @@ static bool access_sp_el1(struct kvm_vcpu *vcpu,
 	return true;
 }
 
+static bool access_elr(struct kvm_vcpu *vcpu,
+		       struct sys_reg_params *p,
+		       const struct sys_reg_desc *r)
+{
+	if (p->is_write)
+		vcpu->arch.ctxt.gp_regs.elr_el1 = p->regval;
+	else
+		p->regval = vcpu->arch.ctxt.gp_regs.elr_el1;
+
+	return true;
+}
+
+static bool access_spsr(struct kvm_vcpu *vcpu,
+			struct sys_reg_params *p,
+			const struct sys_reg_desc *r)
+{
+	if (p->is_write)
+		vcpu->arch.ctxt.gp_regs.spsr[KVM_SPSR_EL1] = p->regval;
+	else
+		p->regval = vcpu->arch.ctxt.gp_regs.spsr[KVM_SPSR_EL1];
+
+	return true;
+}
+
 static bool access_spsr_el2(struct kvm_vcpu *vcpu,
 			    struct sys_reg_params *p,
 			    const struct sys_reg_desc *r)
@@ -1761,6 +1785,9 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 	PTRAUTH_KEY(APDB),
 	PTRAUTH_KEY(APGA),
 
+	{ SYS_DESC(SYS_SPSR_EL1), access_spsr},
+	{ SYS_DESC(SYS_ELR_EL1), access_elr},
+
 	{ SYS_DESC(SYS_AFSR0_EL1), access_vm_reg, reset_unknown, AFSR0_EL1 },
 	{ SYS_DESC(SYS_AFSR1_EL1), access_vm_reg, reset_unknown, AFSR1_EL1 },
 	{ SYS_DESC(SYS_ESR_EL1), access_vm_reg, reset_unknown, ESR_EL1 },
@@ -1789,7 +1816,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 	{ SYS_DESC(SYS_LORC_EL1), trap_loregion },
 	{ SYS_DESC(SYS_LORID_EL1), trap_loregion },
 
-	{ SYS_DESC(SYS_VBAR_EL1), NULL, reset_val, VBAR_EL1, 0 },
+	{ SYS_DESC(SYS_VBAR_EL1), access_rw, reset_val, VBAR_EL1, 0 },
 	{ SYS_DESC(SYS_DISR_EL1), NULL, reset_val, DISR_EL1, 0 },
 
 	{ SYS_DESC(SYS_ICC_IAR0_EL1), write_to_read_only },
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 176+ messages in thread

* [PATCH 20/59] KVM: arm64: nv: Trap CPACR_EL1 access in virtual EL2
  2019-06-21  9:37 [PATCH 00/59] KVM: arm64: ARMv8.3 Nested Virtualization support Marc Zyngier
                   ` (18 preceding siblings ...)
  2019-06-21  9:38 ` [PATCH 19/59] KVM: arm64: nv: Trap SPSR_EL1, ELR_EL1 and VBAR_EL1 from " Marc Zyngier
@ 2019-06-21  9:38 ` Marc Zyngier
  2019-07-01 16:40   ` Alexandru Elisei
  2019-06-21  9:38 ` [PATCH 21/59] KVM: arm64: nv: Set a handler for the system instruction traps Marc Zyngier
                   ` (40 subsequent siblings)
  60 siblings, 1 reply; 176+ messages in thread
From: Marc Zyngier @ 2019-06-21  9:38 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	Julien Thierry, James Morse, Suzuki K Poulose

From: Jintack Lim <jintack.lim@linaro.org>

For the same reason we trap virtual memory register accesses in virtual
EL2, we trap CPACR_EL1 accesses too; we allow the virtual EL2 mode to
access the EL1 system register state instead of the virtual EL2 one.
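
As background for why this needs care, the nVHE and VHE formats of CPTR_EL2
differ: nVHE uses TFP (bit 10, set-to-trap) while the VHE layout mirrors
CPACR_EL1's FPEN (bits [21:20], clear-to-trap). A hedged sketch of such a
format conversion, reduced to the FP trap only and not the series' actual
translate_cptr():

#include <assert.h>
#include <stdint.h>

#define CPTR_EL2_TFP		(1u << 10)	/* nVHE: set to trap FP */
#define CPACR_EL1_FPEN		(3u << 20)	/* VHE: 0b11 = no trapping */

/* Illustrative nVHE CPTR_EL2 -> CPACR-format conversion. */
static uint32_t sketch_translate_cptr(uint32_t cptr_nvhe)
{
	return (cptr_nvhe & CPTR_EL2_TFP) ? 0 : CPACR_EL1_FPEN;
}

int main(void)
{
	assert(sketch_translate_cptr(CPTR_EL2_TFP) == 0);
	assert(sketch_translate_cptr(0) == CPACR_EL1_FPEN);
	return 0;
}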

Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm64/include/asm/kvm_arm.h | 3 ++-
 arch/arm64/kvm/hyp/switch.c      | 2 ++
 arch/arm64/kvm/sys_regs.c        | 2 +-
 3 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index b2e363ac624d..48e15af2bece 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -278,12 +278,13 @@
 #define CPTR_EL2_TFP_SHIFT 10
 
 /* Hyp Coprocessor Trap Register */
-#define CPTR_EL2_TCPAC	(1 << 31)
+#define CPTR_EL2_TCPAC	(1U << 31)
 #define CPTR_EL2_TTA	(1 << 20)
 #define CPTR_EL2_TFP	(1 << CPTR_EL2_TFP_SHIFT)
 #define CPTR_EL2_TZ	(1 << 8)
 #define CPTR_EL2_RES1	0x000032ff /* known RES1 bits in CPTR_EL2 */
 #define CPTR_EL2_DEFAULT	CPTR_EL2_RES1
+#define CPTR_EL2_E2H_TCPAC	(1U << 31)
 
 /* Hyp Debug Configuration Register bits */
 #define MDCR_EL2_TPMS		(1 << 14)
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 791b26570347..62359c7c3d6b 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -108,6 +108,8 @@ static void activate_traps_vhe(struct kvm_vcpu *vcpu)
 		val &= ~CPACR_EL1_FPEN;
 		__activate_traps_fpsimd32(vcpu);
 	}
+	if (vcpu_mode_el2(vcpu) && !vcpu_el2_e2h_is_set(vcpu))
+		val |= CPTR_EL2_E2H_TCPAC;
 
 	write_sysreg(val, cpacr_el1);
 
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 7fc87657382d..1d1312425cf2 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1773,7 +1773,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 	ID_UNALLOCATED(7,7),
 
 	{ SYS_DESC(SYS_SCTLR_EL1), access_vm_reg, reset_val, SCTLR_EL1, 0x00C50078 },
-	{ SYS_DESC(SYS_CPACR_EL1), NULL, reset_val, CPACR_EL1, 0 },
+	{ SYS_DESC(SYS_CPACR_EL1), access_rw, reset_val, CPACR_EL1, 0 },
 	{ SYS_DESC(SYS_ZCR_EL1), NULL, reset_val, ZCR_EL1, 0, .visibility = sve_visibility },
 	{ SYS_DESC(SYS_TTBR0_EL1), access_vm_reg, reset_unknown, TTBR0_EL1 },
 	{ SYS_DESC(SYS_TTBR1_EL1), access_vm_reg, reset_unknown, TTBR1_EL1 },
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 176+ messages in thread

* [PATCH 21/59] KVM: arm64: nv: Set a handler for the system instruction traps
  2019-06-21  9:37 [PATCH 00/59] KVM: arm64: ARMv8.3 Nested Virtualization support Marc Zyngier
                   ` (19 preceding siblings ...)
  2019-06-21  9:38 ` [PATCH 20/59] KVM: arm64: nv: Trap CPACR_EL1 access in " Marc Zyngier
@ 2019-06-21  9:38 ` Marc Zyngier
  2019-06-25 12:55   ` Julien Thierry
  2019-06-21  9:38 ` [PATCH 22/59] KVM: arm64: nv: Handle PSCI call via smc from the guest Marc Zyngier
                   ` (39 subsequent siblings)
  60 siblings, 1 reply; 176+ messages in thread
From: Marc Zyngier @ 2019-06-21  9:38 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	Julien Thierry, James Morse, Suzuki K Poulose

From: Jintack Lim <jintack.lim@linaro.org>

When the HCR_EL2.NV bit is set, execution of the EL2 translation regime
address translation instructions and TLB maintenance instructions is
trapped to EL2. In addition, execution of the EL1 translation regime
address translation instructions and TLB maintenance instructions that are
only accessible from EL2 and above is trapped to EL2. In these cases,
ESR_EL2.EC will be set to 0x18.

Change the existing handler to handle those system instructions as well
as MRS/MSR instructions. Emulation of each system instruction will be
done in separate patches.
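
For context, the ISS of an EC=0x18 trap encodes the instruction's
Op0/Op1/Op2/CRn/CRm fields, and the dispatch below keys off Op0 and CRn. A
standalone decode sketch (field positions per the ARMv8 ESR_ELx layout;
names are illustrative):

#include <stdint.h>
#include <stdio.h>

struct sys_params {
	unsigned Op0, Op1, Op2, CRn, CRm, Rt, is_write;
};

/* Decode the ISS of an EC=0x18 (SYS64) trap. */
static struct sys_params decode_sys_iss(uint32_t esr)
{
	struct sys_params p = {
		.Op0	  = (esr >> 20) & 0x3,
		.Op2	  = (esr >> 17) & 0x7,
		.Op1	  = (esr >> 14) & 0x7,
		.CRn	  = (esr >> 10) & 0xf,
		.Rt	  = (esr >> 5)  & 0x1f,
		.CRm	  = (esr >> 1)  & 0xf,
		.is_write = !(esr & 1),
	};
	return p;
}

int main(void)
{
	/* TLBI VMALLE1 encodes as Op0=1, Op1=0, CRn=8, CRm=7, Op2=0. */
	uint32_t esr = (1u << 20) | (8u << 10) | (7u << 1);
	struct sys_params p = decode_sys_iss(esr);

	if (p.Op0 == 1 && p.CRn == 8)
		printf("TLB maintenance instruction trapped\n");
	return 0;
}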

Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm64/include/asm/kvm_coproc.h |  2 +-
 arch/arm64/kvm/handle_exit.c        |  2 +-
 arch/arm64/kvm/sys_regs.c           | 53 +++++++++++++++++++++++++----
 arch/arm64/kvm/trace.h              |  2 +-
 4 files changed, 50 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_coproc.h b/arch/arm64/include/asm/kvm_coproc.h
index 0b52377a6c11..1b3d21bd8adb 100644
--- a/arch/arm64/include/asm/kvm_coproc.h
+++ b/arch/arm64/include/asm/kvm_coproc.h
@@ -43,7 +43,7 @@ int kvm_handle_cp14_32(struct kvm_vcpu *vcpu, struct kvm_run *run);
 int kvm_handle_cp14_64(struct kvm_vcpu *vcpu, struct kvm_run *run);
 int kvm_handle_cp15_32(struct kvm_vcpu *vcpu, struct kvm_run *run);
 int kvm_handle_cp15_64(struct kvm_vcpu *vcpu, struct kvm_run *run);
-int kvm_handle_sys_reg(struct kvm_vcpu *vcpu, struct kvm_run *run);
+int kvm_handle_sys(struct kvm_vcpu *vcpu, struct kvm_run *run);
 
 #define kvm_coproc_table_init kvm_sys_reg_table_init
 void kvm_sys_reg_table_init(void);
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index 2517711f034f..e662f23b63a1 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -236,7 +236,7 @@ static exit_handle_fn arm_exit_handlers[] = {
 	[ESR_ELx_EC_SMC32]	= handle_smc,
 	[ESR_ELx_EC_HVC64]	= handle_hvc,
 	[ESR_ELx_EC_SMC64]	= handle_smc,
-	[ESR_ELx_EC_SYS64]	= kvm_handle_sys_reg,
+	[ESR_ELx_EC_SYS64]	= kvm_handle_sys,
 	[ESR_ELx_EC_SVE]	= handle_sve,
 	[ESR_ELx_EC_ERET]	= kvm_handle_eret,
 	[ESR_ELx_EC_IABT_LOW]	= kvm_handle_guest_abort,
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 1d1312425cf2..e711dde4511c 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -2597,6 +2597,40 @@ static int emulate_sys_reg(struct kvm_vcpu *vcpu,
 	return 1;
 }
 
+static int emulate_tlbi(struct kvm_vcpu *vcpu,
+			     struct sys_reg_params *params)
+{
+	/* TODO: support tlbi instruction emulation */
+	kvm_inject_undefined(vcpu);
+	return 1;
+}
+
+static int emulate_at(struct kvm_vcpu *vcpu,
+			     struct sys_reg_params *params)
+{
+	/* TODO: support address translation instruction emulation */
+	kvm_inject_undefined(vcpu);
+	return 1;
+}
+
+static int emulate_sys_instr(struct kvm_vcpu *vcpu,
+			     struct sys_reg_params *params)
+{
+	int ret = 0;
+
+	/* TLB maintenance instructions */
+	if (params->CRn == 0b1000)
+		ret = emulate_tlbi(vcpu, params);
+	/* Address Translation instructions */
+	else if (params->CRn == 0b0111 && params->CRm == 0b1000)
+		ret = emulate_at(vcpu, params);
+
+	if (ret)
+		kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
+
+	return ret;
+}
+
 static void reset_sys_reg_descs(struct kvm_vcpu *vcpu,
 			      const struct sys_reg_desc *table, size_t num)
 {
@@ -2608,18 +2642,19 @@ static void reset_sys_reg_descs(struct kvm_vcpu *vcpu,
 }
 
 /**
- * kvm_handle_sys_reg -- handles a mrs/msr trap on a guest sys_reg access
+ * kvm_handle_sys -- handles a system instruction or MRS/MSR instruction
+ *		      trap during guest execution
  * @vcpu: The VCPU pointer
  * @run:  The kvm_run struct
  */
-int kvm_handle_sys_reg(struct kvm_vcpu *vcpu, struct kvm_run *run)
+int kvm_handle_sys(struct kvm_vcpu *vcpu, struct kvm_run *run)
 {
 	struct sys_reg_params params;
 	unsigned long esr = kvm_vcpu_get_hsr(vcpu);
 	int Rt = kvm_vcpu_sys_get_rt(vcpu);
 	int ret;
 
-	trace_kvm_handle_sys_reg(esr);
+	trace_kvm_handle_sys(esr);
 
 	params.is_aarch32 = false;
 	params.is_32bit = false;
@@ -2631,10 +2666,16 @@ int kvm_handle_sys_reg(struct kvm_vcpu *vcpu, struct kvm_run *run)
 	params.regval = vcpu_get_reg(vcpu, Rt);
 	params.is_write = !(esr & 1);
 
-	ret = emulate_sys_reg(vcpu, &params);
+	if (params.Op0 == 1) {
+		/* System instructions */
+		ret = emulate_sys_instr(vcpu, &params);
+	} else {
+		/* MRS/MSR instructions */
+		ret = emulate_sys_reg(vcpu, &params);
+		if (!params.is_write)
+			vcpu_set_reg(vcpu, Rt, params.regval);
+	}
 
-	if (!params.is_write)
-		vcpu_set_reg(vcpu, Rt, params.regval);
 	return ret;
 }
 
diff --git a/arch/arm64/kvm/trace.h b/arch/arm64/kvm/trace.h
index 797a705bb644..2a6a54ef3824 100644
--- a/arch/arm64/kvm/trace.h
+++ b/arch/arm64/kvm/trace.h
@@ -138,7 +138,7 @@ TRACE_EVENT(trap_reg,
 	TP_printk("%s %s reg %d (0x%08llx)", __entry->fn,  __entry->is_write?"write to":"read from", __entry->reg, __entry->write_value)
 );
 
-TRACE_EVENT(kvm_handle_sys_reg,
+TRACE_EVENT(kvm_handle_sys,
 	TP_PROTO(unsigned long hsr),
 	TP_ARGS(hsr),
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 176+ messages in thread

* [PATCH 22/59] KVM: arm64: nv: Handle PSCI call via smc from the guest
  2019-06-21  9:37 [PATCH 00/59] KVM: arm64: ARMv8.3 Nested Virtualization support Marc Zyngier
                   ` (20 preceding siblings ...)
  2019-06-21  9:38 ` [PATCH 21/59] KVM: arm64: nv: Set a handler for the system instruction traps Marc Zyngier
@ 2019-06-21  9:38 ` Marc Zyngier
  2019-06-21  9:38 ` [PATCH 23/59] KVM: arm64: nv: Respect virtual HCR_EL2.TWX setting Marc Zyngier
                   ` (38 subsequent siblings)
  60 siblings, 0 replies; 176+ messages in thread
From: Marc Zyngier @ 2019-06-21  9:38 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	Julien Thierry, James Morse, Suzuki K Poulose

From: Jintack Lim <jintack.lim@linaro.org>

VMs used to execute hvc #0 for PSCI calls when EL3 is not implemented.
However, once we provide virtual EL2 to the VM, the host OS inside the VM
issues kvm_call_hyp(), which is also hvc #0, so it's hard for the host
hypervisor to differentiate between the two.

So, let the VM execute the smc instruction for PSCI calls. On ARMv8.3,
even if EL3 is not implemented, an smc instruction executed at non-secure
EL1 is trapped to EL2 if HCR_EL2.TSC==1, rather than being treated as
UNDEFINED. This way, the host hypervisor can handle the PSCI call without
any ambiguity.
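
A guest-side sketch of the resulting convention: PSCI via smc #0 instead of
hvc #0. This only builds for aarch64 and assumes SMCCC register usage;
PSCI_VERSION (function ID 0x84000000) is used as the example:

#include <stdio.h>

#define PSCI_0_2_FN_PSCI_VERSION	0x84000000UL

/* Guest-side PSCI call via SMC; under a hypervisor that sets
 * HCR_EL2.TSC, this traps to EL2 as described above. */
static unsigned long psci_call_smc(unsigned long fn_id)
{
	register unsigned long x0 asm("x0") = fn_id;

	asm volatile("smc #0" : "+r" (x0) : : "x1", "x2", "x3", "memory");
	return x0;
}

int main(void)
{
	printf("PSCI version: 0x%lx\n",
	       psci_call_smc(PSCI_0_2_FN_PSCI_VERSION));
	return 0;
}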

Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm64/kvm/handle_exit.c | 24 ++++++++++++++++++++++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index e662f23b63a1..e348c15c81bc 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -70,6 +70,8 @@ static int handle_hvc(struct kvm_vcpu *vcpu, struct kvm_run *run)
 
 static int handle_smc(struct kvm_vcpu *vcpu, struct kvm_run *run)
 {
+	int ret;
+
 	/*
 	 * "If an SMC instruction executed at Non-secure EL1 is
 	 * trapped to EL2 because HCR_EL2.TSC is 1, the exception is a
@@ -77,10 +79,28 @@ static int handle_smc(struct kvm_vcpu *vcpu, struct kvm_run *run)
 	 *
 	 * We need to advance the PC after the trap, as it would
 	 * otherwise return to the same address...
+	 *
+	 * If imm is non-zero, it cannot be a PSCI call (SMCCC mandates an
+	 * immediate of zero), so return an error to the guest and skip
+	 * the instruction.
+	 */
+	if (kvm_vcpu_hvc_get_imm(vcpu)) {
+		vcpu_set_reg(vcpu, 0, ~0UL);
+		kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
+		return 1;
+	}
+
+	/*
+	 * If imm is zero, it's a psci call.
+	 * Note that on ARMv8.3, even if EL3 is not implemented, SMC executed
+	 * at Non-secure EL1 is trapped to EL2 if HCR_EL2.TSC==1, rather than
+	 * being treated as UNDEFINED.
 	 */
-	vcpu_set_reg(vcpu, 0, ~0UL);
+	ret = kvm_hvc_call_handler(vcpu);
+	if (ret < 0)
+		vcpu_set_reg(vcpu, 0, ~0UL);
+
 	kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
-	return 1;
+
+	return ret;
 }
 
 /*
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 176+ messages in thread

* [PATCH 23/59] KVM: arm64: nv: Respect virtual HCR_EL2.TWX setting
  2019-06-21  9:37 [PATCH 00/59] KVM: arm64: ARMv8.3 Nested Virtualization support Marc Zyngier
                   ` (21 preceding siblings ...)
  2019-06-21  9:38 ` [PATCH 22/59] KVM: arm64: nv: Handle PSCI call via smc from the guest Marc Zyngier
@ 2019-06-21  9:38 ` Marc Zyngier
  2019-06-25 14:19   ` Julien Thierry
  2019-06-21  9:38 ` [PATCH 24/59] KVM: arm64: nv: Respect virtual CPTR_EL2.TFP setting Marc Zyngier
                   ` (37 subsequent siblings)
  60 siblings, 1 reply; 176+ messages in thread
From: Marc Zyngier @ 2019-06-21  9:38 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	Julien Thierry, James Morse, Suzuki K Poulose

From: Jintack Lim <jintack.lim@linaro.org>

Forward exceptions due to WFI or WFE instructions to the virtual EL2 if
they are not coming from the virtual EL2 and virtual HCR_EL2.TWX is set.

Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm64/include/asm/kvm_nested.h |  2 ++
 arch/arm64/kvm/Makefile             |  1 +
 arch/arm64/kvm/handle_exit.c        | 13 +++++++++-
 arch/arm64/kvm/nested.c             | 39 +++++++++++++++++++++++++++++
 4 files changed, 54 insertions(+), 1 deletion(-)
 create mode 100644 arch/arm64/kvm/nested.c

diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
index 8a3d121a0b42..645e5e11b749 100644
--- a/arch/arm64/include/asm/kvm_nested.h
+++ b/arch/arm64/include/asm/kvm_nested.h
@@ -10,4 +10,6 @@ static inline bool nested_virt_in_use(const struct kvm_vcpu *vcpu)
 		test_bit(KVM_ARM_VCPU_NESTED_VIRT, vcpu->arch.features);
 }
 
+int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe);
+
 #endif /* __ARM64_KVM_NESTED_H */
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index 9e450aea7db6..f11bd8b0d837 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -36,4 +36,5 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/irqchip.o
 kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/arch_timer.o
 kvm-$(CONFIG_KVM_ARM_PMU) += $(KVM)/arm/pmu.o
 
+kvm-$(CONFIG_KVM_ARM_HOST) += nested.o
 kvm-$(CONFIG_KVM_ARM_HOST) += emulate-nested.o
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index e348c15c81bc..ddba212fd6ec 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -127,7 +127,18 @@ static int handle_no_fpsimd(struct kvm_vcpu *vcpu, struct kvm_run *run)
  */
 static int kvm_handle_wfx(struct kvm_vcpu *vcpu, struct kvm_run *run)
 {
-	if (kvm_vcpu_get_hsr(vcpu) & ESR_ELx_WFx_ISS_WFE) {
+	bool is_wfe = !!(kvm_vcpu_get_hsr(vcpu) & ESR_ELx_WFx_ISS_WFE);
+
+	if (nested_virt_in_use(vcpu)) {
+		int ret = handle_wfx_nested(vcpu, is_wfe);
+
+		if (ret != -EINVAL)
+			return ret;
+	}
+
+	if (is_wfe) {
 		trace_kvm_wfx_arm64(*vcpu_pc(vcpu), true);
 		vcpu->stat.wfe_exit_stat++;
 		kvm_vcpu_on_spin(vcpu, vcpu_mode_priv(vcpu));
diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
new file mode 100644
index 000000000000..3872e3cf1691
--- /dev/null
+++ b/arch/arm64/kvm/nested.c
@@ -0,0 +1,39 @@
+/*
+ * Copyright (C) 2017 - Columbia University and Linaro Ltd.
+ * Author: Jintack Lim <jintack.lim@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/kvm.h>
+#include <linux/kvm_host.h>
+
+#include <asm/kvm_emulate.h>
+
+/*
+ * Inject wfx to the virtual EL2 if this is not from the virtual EL2 and
+ * the virtual HCR_EL2.TWX is set. Otherwise, let the host hypervisor
+ * handle this.
+ */
+int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe)
+{
+	u64 hcr_el2 = __vcpu_sys_reg(vcpu, HCR_EL2);
+
+	if (vcpu_mode_el2(vcpu))
+		return -EINVAL;
+
+	if ((is_wfe && (hcr_el2 & HCR_TWE)) || (!is_wfe && (hcr_el2 & HCR_TWI)))
+		return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
+
+	return -EINVAL;
+}
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 176+ messages in thread

* [PATCH 24/59] KVM: arm64: nv: Respect virtual CPTR_EL2.TFP setting
  2019-06-21  9:37 [PATCH 00/59] KVM: arm64: ARMv8.3 Nested Virtualization support Marc Zyngier
                   ` (22 preceding siblings ...)
  2019-06-21  9:38 ` [PATCH 23/59] KVM: arm64: nv: Respect virtual HCR_EL2.TWX setting Marc Zyngier
@ 2019-06-21  9:38 ` Marc Zyngier
  2019-06-21  9:38 ` [PATCH 25/59] KVM: arm64: nv: Don't expose SVE to nested guests Marc Zyngier
                   ` (36 subsequent siblings)
  60 siblings, 0 replies; 176+ messages in thread
From: Marc Zyngier @ 2019-06-21  9:38 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	Julien Thierry, James Morse, Suzuki K Poulose

From: Jintack Lim <jintack.lim@linaro.org>

Forward traps due to FP/ASIMD register accesses to the virtual EL2 if
virtual CPTR_EL2.TFP is set. Note that if the TFP bit is set, accesses to
FP/ASIMD registers from EL2 as well as from NS EL0/1 trap to EL2, so we
don't check the VM's exception level.
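
A compact model of the check introduced below: forward the FP/ASIMD trap
whenever nested virt is in use and the guest hypervisor's CPTR_EL2.TFP is
set, with no exception-level check. A self-contained sketch (bit position
per CPTR_EL2_TFP_SHIFT; names are illustrative):

#include <assert.h>
#include <stdbool.h>

#define CPTR_EL2_TFP	(1u << 10)

/* TFP traps FP/ASIMD from EL0, EL1 and EL2 alike, so the vcpu's
 * current (virtual) exception level does not factor in. */
static bool fwd_fpsimd_trap(bool nested_in_use, unsigned vcptr_el2)
{
	return nested_in_use && (vcptr_el2 & CPTR_EL2_TFP);
}

int main(void)
{
	assert(fwd_fpsimd_trap(true, CPTR_EL2_TFP));
	assert(!fwd_fpsimd_trap(true, 0));
	return 0;
}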

Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm64/include/asm/kvm_emulate.h |  7 +++++++
 arch/arm64/kvm/handle_exit.c         | 16 ++++++++++++----
 arch/arm64/kvm/hyp/switch.c          | 11 +++++++++--
 3 files changed, 28 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 2644258e96ba..73d8c54a52c6 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -29,6 +29,7 @@
 #include <asm/kvm_arm.h>
 #include <asm/kvm_hyp.h>
 #include <asm/kvm_mmio.h>
+#include <asm/kvm_nested.h>
 #include <asm/ptrace.h>
 #include <asm/cputype.h>
 #include <asm/virt.h>
@@ -357,6 +358,12 @@ static inline bool vcpu_mode_priv(const struct kvm_vcpu *vcpu)
 	return mode != PSR_MODE_EL0t;
 }
 
+static inline bool guest_hyp_fpsimd_traps_enabled(const struct kvm_vcpu *vcpu)
+{
+	return nested_virt_in_use(vcpu) &&
+		(vcpu_read_sys_reg(vcpu, CPTR_EL2) & CPTR_EL2_TFP);
+}
+
 static inline u32 kvm_vcpu_get_hsr(const struct kvm_vcpu *vcpu)
 {
 	return vcpu->arch.fault.esr_el2;
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index ddba212fd6ec..39602a4c1d61 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -104,11 +104,19 @@ static int handle_smc(struct kvm_vcpu *vcpu, struct kvm_run *run)
 }
 
 /*
- * Guest access to FP/ASIMD registers are routed to this handler only
- * when the system doesn't support FP/ASIMD.
+ * This handles the cases where the system does not support FP/ASIMD or when
+ * we are running nested virtualization and the guest hypervisor is trapping
+ * FP/ASIMD accesses by its own guest.
+ *
+ * All other handling of guest vs. host FP/ASIMD register state is handled in
+ * fixup_guest_exit().
  */
-static int handle_no_fpsimd(struct kvm_vcpu *vcpu, struct kvm_run *run)
+static int kvm_handle_fpasimd(struct kvm_vcpu *vcpu, struct kvm_run *run)
 {
+	if (guest_hyp_fpsimd_traps_enabled(vcpu))
+		return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
+
+	/* This is the case when the system doesn't support FP/ASIMD. */
 	kvm_inject_undefined(vcpu);
 	return 1;
 }
@@ -277,7 +285,7 @@ static exit_handle_fn arm_exit_handlers[] = {
 	[ESR_ELx_EC_BREAKPT_LOW]= kvm_handle_guest_debug,
 	[ESR_ELx_EC_BKPT32]	= kvm_handle_guest_debug,
 	[ESR_ELx_EC_BRK64]	= kvm_handle_guest_debug,
-	[ESR_ELx_EC_FP_ASIMD]	= handle_no_fpsimd,
+	[ESR_ELx_EC_FP_ASIMD]	= kvm_handle_fpasimd,
 	[ESR_ELx_EC_PAC]	= kvm_handle_ptrauth,
 };
 
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 62359c7c3d6b..9b5129cdc26a 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -351,11 +351,18 @@ static bool __hyp_text __hyp_handle_fpsimd(struct kvm_vcpu *vcpu)
 	    hsr_ec != ESR_ELx_EC_SVE)
 		return false;
 
-	/* Don't handle SVE traps for non-SVE vcpus here: */
-	if (!sve_guest)
+	/*
+	 * Don't handle SVE traps for non-SVE vcpus here. This
+	 * includes NV guests for the time being.
+	 */
+	if (!sve_guest) {
 		if (hsr_ec != ESR_ELx_EC_FP_ASIMD)
 			return false;
 
+		if (guest_hyp_fpsimd_traps_enabled(vcpu))
+			return false;
+	}
+
 	/* Valid trap.  Switch the context: */
 
 	if (vhe) {
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 176+ messages in thread

* [PATCH 25/59] KVM: arm64: nv: Don't expose SVE to nested guests
  2019-06-21  9:37 [PATCH 00/59] KVM: arm64: ARMv8.3 Nested Virtualization support Marc Zyngier
                   ` (23 preceding siblings ...)
  2019-06-21  9:38 ` [PATCH 24/59] KVM: arm64: nv: Respect virtual CPTR_EL2.TFP setting Marc Zyngier
@ 2019-06-21  9:38 ` Marc Zyngier
  2019-06-21  9:38 ` [PATCH 26/59] KVM: arm64: nv: Respect the virtual HCR_EL2.NV bit setting Marc Zyngier
                   ` (35 subsequent siblings)
  60 siblings, 0 replies; 176+ messages in thread
From: Marc Zyngier @ 2019-06-21  9:38 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	Julien Thierry, James Morse, Suzuki K Poulose

For the time being, pretend that NV and SVE are incompatible.
Things will shortly change... Or not.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm64/kvm/sys_regs.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index e711dde4511c..94affa43e86c 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1324,7 +1324,8 @@ static u64 read_id_reg(const struct kvm_vcpu *vcpu,
 			 (u32)r->CRn, (u32)r->CRm, (u32)r->Op2);
 	u64 val = raz ? 0 : read_sanitised_ftr_reg(id);
 
-	if (id == SYS_ID_AA64PFR0_EL1 && !vcpu_has_sve(vcpu)) {
+	if (id == SYS_ID_AA64PFR0_EL1 &&
+	    (!vcpu_has_sve(vcpu) || nested_virt_in_use(vcpu))) {
 		val &= ~(0xfUL << ID_AA64PFR0_SVE_SHIFT);
 	} else if (id == SYS_ID_AA64ISAR1_EL1 && !vcpu_has_ptrauth(vcpu)) {
 		val &= ~((0xfUL << ID_AA64ISAR1_APA_SHIFT) |
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 176+ messages in thread

* [PATCH 26/59] KVM: arm64: nv: Respect the virtual HCR_EL2.NV bit setting
  2019-06-21  9:37 [PATCH 00/59] KVM: arm64: ARMv8.3 Nested Virtualization support Marc Zyngier
                   ` (24 preceding siblings ...)
  2019-06-21  9:38 ` [PATCH 25/59] KVM: arm64: nv: Don't expose SVE to nested guests Marc Zyngier
@ 2019-06-21  9:38 ` Marc Zyngier
  2019-06-26  5:31   ` Julien Thierry
  2019-06-21  9:38 ` [PATCH 27/59] KVM: arm64: nv: Respect virtual HCR_EL2.TVM and TRVM settings Marc Zyngier
                   ` (34 subsequent siblings)
  60 siblings, 1 reply; 176+ messages in thread
From: Marc Zyngier @ 2019-06-21  9:38 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	Julien Thierry, James Morse, Suzuki K Poulose

From: Jintack Lim <jintack.lim@linaro.org>

Forward traps due to the HCR_EL2.NV bit to the virtual EL2 if they are
not coming from the virtual EL2 and the virtual HCR_EL2.NV bit is set.

In addition to EL2 register accesses, setting the NV bit also makes EL12
register accesses trap to EL2. To emulate this for the virtual EL2,
forward traps due to EL12 register accesses to the virtual EL2 if the
virtual HCR_EL2.NV bit is set.

This is for recursive nested virtualization.
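
To make the forwarding rule concrete, a toy decision model: a trap taken
while not in vEL2 is reflected into virtual EL2 iff the guest hypervisor's
HCR_EL2 image has NV set; traps from vEL2 itself stay with the host. This
is illustrative only:

#include <stdbool.h>
#include <stdio.h>

#define HCR_NV (1ULL << 42)

/* Toy decision: should this trap be forwarded to virtual EL2? */
static bool should_forward(bool vcpu_in_vel2, unsigned long long vhcr_el2)
{
	return !vcpu_in_vel2 && (vhcr_el2 & HCR_NV);
}

int main(void)
{
	/* L2 guest traps while L1 (guest hypervisor) set NV: forward. */
	printf("%d\n", should_forward(false, HCR_NV));	/* 1 */
	/* L1 itself is running in vEL2: the host handles it. */
	printf("%d\n", should_forward(true, HCR_NV));	/* 0 */
	return 0;
}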

Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
[Moved code to emulate-nested.c]
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm64/include/asm/kvm_arm.h    |  1 +
 arch/arm64/include/asm/kvm_nested.h |  2 ++
 arch/arm64/kvm/emulate-nested.c     | 28 ++++++++++++++++++++++++++++
 arch/arm64/kvm/handle_exit.c        |  6 ++++++
 arch/arm64/kvm/sys_regs.c           | 18 ++++++++++++++++++
 5 files changed, 55 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index 48e15af2bece..d21486274eeb 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -24,6 +24,7 @@
 
 /* Hyp Configuration Register (HCR) bits */
 #define HCR_FWB		(UL(1) << 46)
+#define HCR_NV		(UL(1) << 42)
 #define HCR_API		(UL(1) << 41)
 #define HCR_APK		(UL(1) << 40)
 #define HCR_TEA		(UL(1) << 37)
diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
index 645e5e11b749..61e71d0d2151 100644
--- a/arch/arm64/include/asm/kvm_nested.h
+++ b/arch/arm64/include/asm/kvm_nested.h
@@ -11,5 +11,7 @@ static inline bool nested_virt_in_use(const struct kvm_vcpu *vcpu)
 }
 
 int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe);
+extern bool forward_traps(struct kvm_vcpu *vcpu, u64 control_bit);
+extern bool forward_nv_traps(struct kvm_vcpu *vcpu);
 
 #endif /* __ARM64_KVM_NESTED_H */
diff --git a/arch/arm64/kvm/emulate-nested.c b/arch/arm64/kvm/emulate-nested.c
index f829b8b04dc8..c406fd688b9f 100644
--- a/arch/arm64/kvm/emulate-nested.c
+++ b/arch/arm64/kvm/emulate-nested.c
@@ -24,6 +24,27 @@
 
 #include "trace.h"
 
+bool forward_traps(struct kvm_vcpu *vcpu, u64 control_bit)
+{
+	bool control_bit_set;
+
+	if (!nested_virt_in_use(vcpu))
+		return false;
+
+	control_bit_set = __vcpu_sys_reg(vcpu, HCR_EL2) & control_bit;
+	if (!vcpu_mode_el2(vcpu) && control_bit_set) {
+		kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
+		return true;
+	}
+	return false;
+}
+
+bool forward_nv_traps(struct kvm_vcpu *vcpu)
+{
+	return forward_traps(vcpu, HCR_NV);
+}
+
 /* This is borrowed from get_except_vector in inject_fault.c */
 static u64 get_el2_except_vector(struct kvm_vcpu *vcpu,
 		enum exception_type type)
@@ -55,6 +76,13 @@ void kvm_emulate_nested_eret(struct kvm_vcpu *vcpu)
 	u64 spsr, elr, mode;
 	bool direct_eret;
 
+	/*
+	 * Forward this trap to the virtual EL2 if the virtual
+	 * HCR_EL2.NV bit is set and this is coming from !EL2.
+	 */
+	if (forward_nv_traps(vcpu))
+		return;
+
 	/*
 	 * Going through the whole put/load motions is a waste of time
 	 * if this is a VHE guest hypervisor returning to its own
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index 39602a4c1d61..7e8b1ec1d251 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -72,6 +72,12 @@ static int handle_smc(struct kvm_vcpu *vcpu, struct kvm_run *run)
 {
 	int ret;
 
+	/*
+	 * Forward this trapped smc instruction to the virtual EL2.
+	 */
+	if ((vcpu_read_sys_reg(vcpu, HCR_EL2) & HCR_TSC) && forward_nv_traps(vcpu))
+		return 1;
+
 	/*
 	 * "If an SMC instruction executed at Non-secure EL1 is
 	 * trapped to EL2 because HCR_EL2.TSC is 1, the exception is a
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 94affa43e86c..582d62aa48b7 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -392,10 +392,19 @@ static u32 get_ccsidr(u32 csselr)
 	return ccsidr;
 }
 
+static bool el12_reg(struct sys_reg_params *p)
+{
+	/* All *_EL12 registers have Op1=5. */
+	return (p->Op1 == 5);
+}
+
 static bool access_rw(struct kvm_vcpu *vcpu,
 		      struct sys_reg_params *p,
 		      const struct sys_reg_desc *r)
 {
+	if (el12_reg(p) && forward_nv_traps(vcpu))
+		return false;
+
 	if (p->is_write)
 		vcpu_write_sys_reg(vcpu, p->regval, r->reg);
 	else
@@ -440,6 +449,9 @@ static bool access_vm_reg(struct kvm_vcpu *vcpu,
 	u64 val;
 	int reg = r->reg;
 
+	if (el12_reg(p) && forward_nv_traps(vcpu))
+		return false;
+
 	BUG_ON(!vcpu_mode_el2(vcpu) && !p->is_write);
 
 	if (!p->is_write) {
@@ -1611,6 +1623,9 @@ static bool access_elr(struct kvm_vcpu *vcpu,
 		       struct sys_reg_params *p,
 		       const struct sys_reg_desc *r)
 {
+	if (el12_reg(p) && forward_nv_traps(vcpu))
+		return false;
+
 	if (p->is_write)
 		vcpu->arch.ctxt.gp_regs.elr_el1 = p->regval;
 	else
@@ -1623,6 +1638,9 @@ static bool access_spsr(struct kvm_vcpu *vcpu,
 			struct sys_reg_params *p,
 			const struct sys_reg_desc *r)
 {
+	if (el12_reg(p) && forward_nv_traps(vcpu))
+		return false;
+
 	if (p->is_write)
 		vcpu->arch.ctxt.gp_regs.spsr[KVM_SPSR_EL1] = p->regval;
 	else
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 176+ messages in thread

* [PATCH 27/59] KVM: arm64: nv: Respect virtual HCR_EL2.TVM and TRVM settings
  2019-06-21  9:37 [PATCH 00/59] KVM: arm64: ARMv8.3 Nested Virtualization support Marc Zyngier
                   ` (25 preceding siblings ...)
  2019-06-21  9:38 ` [PATCH 26/59] KVM: arm64: nv: Respect the virtual HCR_EL2.NV bit setting Marc Zyngier
@ 2019-06-21  9:38 ` Marc Zyngier
  2019-06-26  6:55   ` Julien Thierry
  2019-06-21  9:38 ` [PATCH 28/59] KVM: arm64: nv: Respect the virtual HCR_EL2.NV1 bit setting Marc Zyngier
                   ` (33 subsequent siblings)
  60 siblings, 1 reply; 176+ messages in thread
From: Marc Zyngier @ 2019-06-21  9:38 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	Julien Thierry, James Morse, Suzuki K Poulose

From: Jintack Lim <jintack.lim@linaro.org>

Forward the EL1 virtual memory register traps to the virtual EL2 if they
are not coming from the virtual EL2 and the virtual HCR_EL2.TVM or TRVM
bit is set.

This is for recursive nested virtualization.

Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm64/kvm/sys_regs.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 582d62aa48b7..0f74b9277a86 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -436,6 +436,27 @@ static bool access_dcsw(struct kvm_vcpu *vcpu,
 	return true;
 }
 
+/* This function supports recursive nested virtualization */
+static bool forward_vm_traps(struct kvm_vcpu *vcpu, struct sys_reg_params *p)
+{
+	u64 hcr_el2 = __vcpu_sys_reg(vcpu, HCR_EL2);
+
+	/* A trap from the virtual EL2 is handled by the host hypervisor. */
+	if (vcpu_mode_el2(vcpu))
+		return false;
+
+	/*
+	 * If the virtual HCR_EL2.TVM or TRVM bit is set, we need to forward
+	 * this trap to the virtual EL2.
+	 */
+	if ((hcr_el2 & HCR_TVM) && p->is_write)
+		return true;
+	if ((hcr_el2 & HCR_TRVM) && !p->is_write)
+		return true;
+
+	return false;
+}
+
 /*
  * Generic accessor for VM registers. Only called as long as HCR_TVM
  * is set. If the guest enables the MMU, we stop trapping the VM
@@ -452,6 +473,9 @@ static bool access_vm_reg(struct kvm_vcpu *vcpu,
 	if (el12_reg(p) && forward_nv_traps(vcpu))
 		return false;
 
+	if (!el12_reg(p) && forward_vm_traps(vcpu, p))
+		return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
+
 	BUG_ON(!vcpu_mode_el2(vcpu) && !p->is_write);
 
 	if (!p->is_write) {
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 176+ messages in thread

* [PATCH 28/59] KVM: arm64: nv: Respect the virtual HCR_EL2.NV1 bit setting
  2019-06-21  9:37 [PATCH 00/59] KVM: arm64: ARMv8.3 Nested Virtualization support Marc Zyngier
                   ` (26 preceding siblings ...)
  2019-06-21  9:38 ` [PATCH 27/59] KVM: arm64: nv: Respect virtual HCR_EL2.TVM and TRVM settings Marc Zyngier
@ 2019-06-21  9:38 ` Marc Zyngier
  2019-06-26  7:23   ` Julien Thierry
  2019-07-02 16:32   ` Alexandru Elisei
  2019-06-21  9:38 ` [PATCH 29/59] KVM: arm64: nv: Emulate EL12 register accesses from the virtual EL2 Marc Zyngier
                   ` (32 subsequent siblings)
  60 siblings, 2 replies; 176+ messages in thread
From: Marc Zyngier @ 2019-06-21  9:38 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	Julien Thierry, James Morse, Suzuki K Poulose

From: Jintack Lim <jintack@cs.columbia.edu>

Forward ELR_EL1, SPSR_EL1 and VBAR_EL1 traps to the virtual EL2 if the
virtual HCR_EL2.NV1 bit is set.

This is for recursive nested virtualization.
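
For clarity, the NV1 check added below reduces to the following
standalone sketch (should_forward_nv1_trap is a name invented for this
example; HCR_NV1 is the bit this series defines as 1 << 43):

    #include <stdbool.h>
    #include <stdint.h>

    #define HCR_NV1 (UINT64_C(1) << 43)

    /* Re-inject an ELR_EL1/SPSR_EL1/VBAR_EL1 trap into virtual EL2? */
    static bool should_forward_nv1_trap(uint64_t vhcr_el2, bool in_vel2)
    {
            return !in_vel2 && (vhcr_el2 & HCR_NV1);
    }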

Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm64/include/asm/kvm_arm.h |  1 +
 arch/arm64/kvm/sys_regs.c        | 19 +++++++++++++++++--
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index d21486274eeb..55f4525c112c 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -24,6 +24,7 @@
 
 /* Hyp Configuration Register (HCR) bits */
 #define HCR_FWB		(UL(1) << 46)
+#define HCR_NV1		(UL(1) << 43)
 #define HCR_NV		(UL(1) << 42)
 #define HCR_API		(UL(1) << 41)
 #define HCR_APK		(UL(1) << 40)
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 0f74b9277a86..beadebcfc888 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -473,8 +473,10 @@ static bool access_vm_reg(struct kvm_vcpu *vcpu,
 	if (el12_reg(p) && forward_nv_traps(vcpu))
 		return false;
 
-	if (!el12_reg(p) && forward_vm_traps(vcpu, p))
-		return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
+	if (!el12_reg(p) && forward_vm_traps(vcpu, p)) {
+		kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
+		return false;
+	}
 
 	BUG_ON(!vcpu_mode_el2(vcpu) && !p->is_write);
 
@@ -1643,6 +1645,13 @@ static bool access_sp_el1(struct kvm_vcpu *vcpu,
 	return true;
 }
 
+
+/* This function supports recursive nested virtualization */
+static bool forward_nv1_traps(struct kvm_vcpu *vcpu, struct sys_reg_params *p)
+{
+	return forward_traps(vcpu, HCR_NV1);
+}
+
 static bool access_elr(struct kvm_vcpu *vcpu,
 		       struct sys_reg_params *p,
 		       const struct sys_reg_desc *r)
@@ -1650,6 +1659,9 @@ static bool access_elr(struct kvm_vcpu *vcpu,
 	if (el12_reg(p) && forward_nv_traps(vcpu))
 		return false;
 
+	if (!el12_reg(p) && forward_nv1_traps(vcpu, p))
+		return false;
+
 	if (p->is_write)
 		vcpu->arch.ctxt.gp_regs.elr_el1 = p->regval;
 	else
@@ -1665,6 +1677,9 @@ static bool access_spsr(struct kvm_vcpu *vcpu,
 	if (el12_reg(p) && forward_nv_traps(vcpu))
 		return false;
 
+	if (!el12_reg(p) && forward_nv1_traps(vcpu, p))
+		return false;
+
 	if (p->is_write)
 		vcpu->arch.ctxt.gp_regs.spsr[KVM_SPSR_EL1] = p->regval;
 	else
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 176+ messages in thread

* [PATCH 29/59] KVM: arm64: nv: Emulate EL12 register accesses from the virtual EL2
  2019-06-21  9:37 [PATCH 00/59] KVM: arm64: ARMv8.3 Nested Virtualization support Marc Zyngier
                   ` (27 preceding siblings ...)
  2019-06-21  9:38 ` [PATCH 28/59] KVM: arm64: nv: Respect the virtual HCR_EL2.NV1 bit setting Marc Zyngier
@ 2019-06-21  9:38 ` Marc Zyngier
  2019-07-03  9:16   ` Alexandru Elisei
  2019-06-21  9:38 ` [PATCH 30/59] KVM: arm64: nv: Configure HCR_EL2 for nested virtualization Marc Zyngier
                   ` (31 subsequent siblings)
  60 siblings, 1 reply; 176+ messages in thread
From: Marc Zyngier @ 2019-06-21  9:38 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	Julien Thierry, James Morse, Suzuki K Poulose

From: Jintack Lim <jintack.lim@linaro.org>

With the HCR_EL2.NV bit set, accesses to EL12 registers from the virtual
EL2 trap to EL2. Handle those traps just like we do for EL1 registers.

One exception is CNTKCTL_EL12. We don't trap on CNTKCTL_EL1 for non-VHE
virtual EL2 because we don't have to. However, accessing CNTKCTL_EL12
will trap since it's one of the EL12 registers controlled by the
HCR_EL2.NV bit. Therefore, add a handler for it and don't treat it as a
non-trapped register when preparing a shadow context.

Move EL12 system register macros to a common place to reuse them.

Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm64/kvm/sys_regs.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index beadebcfc888..34f1b79f7856 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -2039,6 +2039,23 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 	{ SYS_DESC(SYS_CNTVOFF_EL2), access_rw, reset_val, CNTVOFF_EL2, 0 },
 	{ SYS_DESC(SYS_CNTHCTL_EL2), access_rw, reset_val, CNTHCTL_EL2, 0 },
 
+	{ SYS_DESC(SYS_SCTLR_EL12), access_vm_reg, reset_val, SCTLR_EL1, 0x00C50078 },
+	{ SYS_DESC(SYS_CPACR_EL12), access_rw, reset_val, CPACR_EL1, 0 },
+	{ SYS_DESC(SYS_TTBR0_EL12), access_vm_reg, reset_unknown, TTBR0_EL1 },
+	{ SYS_DESC(SYS_TTBR1_EL12), access_vm_reg, reset_unknown, TTBR1_EL1 },
+	{ SYS_DESC(SYS_TCR_EL12), access_vm_reg, reset_val, TCR_EL1, 0 },
+	{ SYS_DESC(SYS_SPSR_EL12), access_spsr},
+	{ SYS_DESC(SYS_ELR_EL12), access_elr},
+	{ SYS_DESC(SYS_AFSR0_EL12), access_vm_reg, reset_unknown, AFSR0_EL1 },
+	{ SYS_DESC(SYS_AFSR1_EL12), access_vm_reg, reset_unknown, AFSR1_EL1 },
+	{ SYS_DESC(SYS_ESR_EL12), access_vm_reg, reset_unknown, ESR_EL1 },
+	{ SYS_DESC(SYS_FAR_EL12), access_vm_reg, reset_unknown, FAR_EL1 },
+	{ SYS_DESC(SYS_MAIR_EL12), access_vm_reg, reset_unknown, MAIR_EL1 },
+	{ SYS_DESC(SYS_AMAIR_EL12), access_vm_reg, reset_amair_el1, AMAIR_EL1 },
+	{ SYS_DESC(SYS_VBAR_EL12), access_rw, reset_val, VBAR_EL1, 0 },
+	{ SYS_DESC(SYS_CONTEXTIDR_EL12), access_vm_reg, reset_val, CONTEXTIDR_EL1, 0 },
+	{ SYS_DESC(SYS_CNTKCTL_EL12), access_rw, reset_val, CNTKCTL_EL1, 0 },
+
 	{ SYS_DESC(SYS_SP_EL2), NULL, reset_unknown, SP_EL2 },
 };
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 176+ messages in thread

* [PATCH 30/59] KVM: arm64: nv: Configure HCR_EL2 for nested virtualization
  2019-06-21  9:37 [PATCH 00/59] KVM: arm64: ARMv8.3 Nested Virtualization support Marc Zyngier
                   ` (28 preceding siblings ...)
  2019-06-21  9:38 ` [PATCH 29/59] KVM: arm64: nv: Emulate EL12 register accesses from the virtual EL2 Marc Zyngier
@ 2019-06-21  9:38 ` Marc Zyngier
  2019-06-21  9:38 ` [PATCH 31/59] KVM: arm64: nv: Only toggle cache for virtual EL2 when SCTLR_EL2 changes Marc Zyngier
                   ` (30 subsequent siblings)
  60 siblings, 0 replies; 176+ messages in thread
From: Marc Zyngier @ 2019-06-21  9:38 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	Julien Thierry, James Morse, Suzuki K Poulose

From: Jintack Lim <jintack.lim@linaro.org>

We enable nested virtualization by setting the HCR_EL2 NV and NV1 bits.

When the virtual E2H bit is set, we can support EL2 register accesses
via EL1 registers from the virtual EL2 by doing trap-and-emulate. A
better alternative, however, is to allow the virtual EL2 to access EL2
register state without trapping. This is easily achieved by not
trapping the EL1 registers, since those registers already hold the EL2
register state.
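
The resulting trap configuration can be summarised by this sketch of
the __activate_traps() change below (guest_hyp_hcr is a hypothetical
helper written for illustration only):

    #include <stdbool.h>
    #include <stdint.h>

    #define HCR_TVM  (UINT64_C(1) << 26)
    #define HCR_TRVM (UINT64_C(1) << 30)
    #define HCR_NV   (UINT64_C(1) << 42)
    #define HCR_NV1  (UINT64_C(1) << 43)

    static uint64_t guest_hyp_hcr(uint64_t hcr, uint64_t vhcr_el2, bool vhe)
    {
            hcr |= HCR_NV;

            /* v8.0 (non-VHE) guest hypervisor: trap and emulate EL1 VM regs */
            if (!vhe)
                    return hcr | HCR_TVM | HCR_TRVM | HCR_NV1;

            /* VHE guest hypervisor: mirror its own TVM/TRVM choices */
            hcr &= ~HCR_TVM;
            return hcr | (vhcr_el2 & (HCR_TVM | HCR_TRVM));
    }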

Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm64/kvm/hyp/switch.c | 36 +++++++++++++++++++++++++++++++++---
 1 file changed, 33 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 9b5129cdc26a..4b2c45060b38 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -137,9 +137,39 @@ static void __hyp_text __activate_traps(struct kvm_vcpu *vcpu)
 {
 	u64 hcr = vcpu->arch.hcr_el2;
 
-	/* Trap VM sysreg accesses if an EL2 guest is not using VHE. */
-	if (vcpu_mode_el2(vcpu) && !vcpu_el2_e2h_is_set(vcpu))
-		hcr |= HCR_TVM | HCR_TRVM;
+	if (is_hyp_ctxt(vcpu)) {
+		hcr |= HCR_NV;
+
+		if (!vcpu_el2_e2h_is_set(vcpu)) {
+			/*
+			 * For a guest hypervisor on v8.0, trap and emulate
+			 * the EL1 virtual memory control register accesses.
+			 */
+			hcr |= HCR_TVM | HCR_TRVM | HCR_NV1;
+		} else {
+			/*
+			 * For a guest hypervisor on v8.1 (VHE), allow to
+			 * access the EL1 virtual memory control registers
+			 * natively. These accesses are to access EL2 register
+			 * states.
+			 * Note that we still need to respect the virtual
+			 * HCR_EL2 state.
+			 */
+			u64 vhcr_el2 = __vcpu_sys_reg(vcpu, HCR_EL2);
+
+			/*
+			 * We already set TVM to handle set/way cache maint
+			 * ops traps; this somewhat collides with the nested
+			 * virt trapping for nVHE. So turn this off for now
+			 * here, in the hope that VHE guests won't ever do this.
+			 * TODO: find out whether it's worth supporting both
+			 * cases at the same time.
+			 */
+			hcr &= ~HCR_TVM;
+
+			hcr |= vhcr_el2 & (HCR_TVM | HCR_TRVM);
+		}
+	}
 
 	write_sysreg(hcr, hcr_el2);
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 176+ messages in thread

* [PATCH 31/59] KVM: arm64: nv: Only toggle cache for virtual EL2 when SCTLR_EL2 changes
  2019-06-21  9:37 [PATCH 00/59] KVM: arm64: ARMv8.3 Nested Virtualization support Marc Zyngier
                   ` (29 preceding siblings ...)
  2019-06-21  9:38 ` [PATCH 30/59] KVM: arm64: nv: Configure HCR_EL2 for nested virtualization Marc Zyngier
@ 2019-06-21  9:38 ` Marc Zyngier
  2019-06-21  9:38 ` [PATCH 32/59] KVM: arm64: nv: Hide RAS from nested guests Marc Zyngier
                   ` (29 subsequent siblings)
  60 siblings, 0 replies; 176+ messages in thread
From: Marc Zyngier @ 2019-06-21  9:38 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	Julien Thierry, James Morse, Suzuki K Poulose

From: Christoffer Dall <christoffer.dall@linaro.org>

So far we have been flushing almost the entire universe whenever a VM
would load/unload SCTLR_EL1 and the two versions of that register had
different MMU-enabled settings. This turned out to be so slow that it
prevented forward progress for a nested VM, because a scheduler timer
tick interrupt would always be pending when we reached the nested VM.

To avoid this problem, we consider the SCTLR_EL2 when evaluating if
caches are on or off when entering virtual EL2 (because this is the
value that we end up shadowing onto the hardware EL1 register).
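
The 0b101 mask used in vcpu_has_cache_enabled() covers SCTLR_ELx.M
(bit 0, MMU enable) and SCTLR_ELx.C (bit 2, data cache enable). As a
standalone sketch of the same check (cache_enabled is a made-up name):

    #include <stdbool.h>
    #include <stdint.h>

    #define SCTLR_M (UINT64_C(1) << 0) /* MMU enable */
    #define SCTLR_C (UINT64_C(1) << 2) /* data cache enable */

    static bool cache_enabled(uint64_t sctlr)
    {
            /* caches count as "on" only with both MMU and D-cache enabled */
            return (sctlr & (SCTLR_M | SCTLR_C)) == (SCTLR_M | SCTLR_C);
    }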

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm64/include/asm/kvm_mmu.h | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 3120ef948fa4..fe954efc992c 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -99,6 +99,7 @@ alternative_cb_end
 #include <asm/cacheflush.h>
 #include <asm/mmu_context.h>
 #include <asm/pgtable.h>
+#include <asm/kvm_emulate.h>
 
 void kvm_update_va_mask(struct alt_instr *alt,
 			__le32 *origptr, __le32 *updptr, int nr_inst);
@@ -315,7 +316,10 @@ struct kvm;
 
 static inline bool vcpu_has_cache_enabled(struct kvm_vcpu *vcpu)
 {
-	return (vcpu_read_sys_reg(vcpu, SCTLR_EL1) & 0b101) == 0b101;
+	if (vcpu_mode_el2(vcpu))
+		return (__vcpu_sys_reg(vcpu, SCTLR_EL2) & 0b101) == 0b101;
+	else
+		return (vcpu_read_sys_reg(vcpu, SCTLR_EL1) & 0b101) == 0b101;
 }
 
 static inline void __clean_dcache_guest_page(kvm_pfn_t pfn, unsigned long size)
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 176+ messages in thread

* [PATCH 32/59] KVM: arm64: nv: Hide RAS from nested guests
  2019-06-21  9:37 [PATCH 00/59] KVM: arm64: ARMv8.3 Nested Virtualization support Marc Zyngier
                   ` (30 preceding siblings ...)
  2019-06-21  9:38 ` [PATCH 31/59] KVM: arm64: nv: Only toggle cache for virtual EL2 when SCTLR_EL2 changes Marc Zyngier
@ 2019-06-21  9:38 ` Marc Zyngier
  2019-07-03 13:59   ` Alexandru Elisei
  2019-06-21  9:38 ` [PATCH 33/59] KVM: arm64: nv: Pretend we only support larger-than-host page sizes Marc Zyngier
                   ` (28 subsequent siblings)
  60 siblings, 1 reply; 176+ messages in thread
From: Marc Zyngier @ 2019-06-21  9:38 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	Julien Thierry, James Morse, Suzuki K Poulose

We don't want to expose complicated features to guests until we have
a good grasp on the basic CPU emulation. So let's pretend that RAS,
just like SVE, doesn't exist in a nested guest.
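
Concretely, the RAS field occupies ID_AA64PFR0_EL1[31:28], so hiding it
is just a matter of masking that field out, as in this standalone
sketch (hide_ras is a name made up for the example):

    #include <stdint.h>

    #define ID_AA64PFR0_RAS_SHIFT 28

    static uint64_t hide_ras(uint64_t id_aa64pfr0)
    {
            /* clear the 4-bit RAS field, leave everything else intact */
            return id_aa64pfr0 & ~(UINT64_C(0xf) << ID_AA64PFR0_RAS_SHIFT);
    }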

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm64/kvm/sys_regs.c | 32 +++++++++++++++++++++++++++++---
 1 file changed, 29 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 34f1b79f7856..ec34b81da936 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -577,6 +577,14 @@ static bool trap_raz_wi(struct kvm_vcpu *vcpu,
 		return read_zero(vcpu, p);
 }
 
+static bool trap_undef(struct kvm_vcpu *vcpu,
+		       struct sys_reg_params *p,
+		       const struct sys_reg_desc *r)
+{
+	kvm_inject_undefined(vcpu);
+	return false;
+}
+
 /*
  * ARMv8.1 mandates at least a trivial LORegion implementation, where all the
  * RW registers are RES0 (which we can implement as RAZ/WI). On an ARMv8.0
@@ -1601,13 +1609,15 @@ static bool access_ccsidr(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
 }
 
 /* sys_reg_desc initialiser for known cpufeature ID registers */
-#define ID_SANITISED(name) {			\
+#define ID_SANITISED_FN(name, fn) {		\
 	SYS_DESC(SYS_##name),			\
-	.access	= access_id_reg,		\
+	.access	= fn,				\
 	.get_user = get_id_reg,			\
 	.set_user = set_id_reg,			\
 }
 
+#define ID_SANITISED(name) 	ID_SANITISED_FN(name, access_id_reg)
+
 /*
  * sys_reg_desc initialiser for architecturally unallocated cpufeature ID
  * register with encoding Op0=3, Op1=0, CRn=0, CRm=crm, Op2=op2
@@ -1700,6 +1710,21 @@ static bool access_spsr_el2(struct kvm_vcpu *vcpu,
 	return true;
 }
 
+static bool access_id_aa64pfr0_el1(struct kvm_vcpu *v,
+				   struct sys_reg_params *p,
+				   const struct sys_reg_desc *r)
+{
+	u64 val;
+
+	if (!nested_virt_in_use(v) || p->is_write)
+		return access_id_reg(v, p, r);
+
+	val = read_sanitised_ftr_reg(SYS_ID_AA64PFR0_EL1);
+	p->regval = val & ~(0xfULL << ID_AA64PFR0_RAS_SHIFT);
+
+	return true;
+}
+
 /*
  * Architected system registers.
  * Important: Must be sorted ascending by Op0, Op1, CRn, CRm, Op2
@@ -1791,7 +1816,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 
 	/* AArch64 ID registers */
 	/* CRm=4 */
-	ID_SANITISED(ID_AA64PFR0_EL1),
+	ID_SANITISED_FN(ID_AA64PFR0_EL1, access_id_aa64pfr0_el1),
 	ID_SANITISED(ID_AA64PFR1_EL1),
 	ID_UNALLOCATED(4,2),
 	ID_UNALLOCATED(4,3),
@@ -2032,6 +2057,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 	{ SYS_DESC(SYS_VBAR_EL2), access_rw, reset_val, VBAR_EL2, 0 },
 	{ SYS_DESC(SYS_RVBAR_EL2), access_rw, reset_val, RVBAR_EL2, 0 },
 	{ SYS_DESC(SYS_RMR_EL2), access_rw, reset_val, RMR_EL2, 0 },
+	{ SYS_DESC(SYS_VDISR_EL2), trap_undef },
 
 	{ SYS_DESC(SYS_CONTEXTIDR_EL2), access_rw, reset_val, CONTEXTIDR_EL2, 0 },
 	{ SYS_DESC(SYS_TPIDR_EL2), access_rw, reset_val, TPIDR_EL2, 0 },
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 176+ messages in thread

* [PATCH 33/59] KVM: arm64: nv: Pretend we only support larger-than-host page sizes
  2019-06-21  9:37 [PATCH 00/59] KVM: arm64: ARMv8.3 Nested Virtualization support Marc Zyngier
                   ` (31 preceding siblings ...)
  2019-06-21  9:38 ` [PATCH 32/59] KVM: arm64: nv: Hide RAS from nested guests Marc Zyngier
@ 2019-06-21  9:38 ` Marc Zyngier
  2019-07-03 14:13   ` Alexandru Elisei
  2019-06-21  9:38 ` [PATCH 34/59] KVM: arm/arm64: nv: Factor out stage 2 page table data from struct kvm Marc Zyngier
                   ` (27 subsequent siblings)
  60 siblings, 1 reply; 176+ messages in thread
From: Marc Zyngier @ 2019-06-21  9:38 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	Julien Thierry, James Morse, Suzuki K Poulose

From: Jintack Lim <jintack.lim@linaro.org>

Exposing the same memory management support to the virtual EL2 as is
exposed to the host hypervisor would make the implementation too complex
and inefficient. Therefore, expose limited memory management support,
restricted to the following two cases.

We expose the same or larger page granules than the ones the host uses.
We could theoretically support a guest hypervisor having
smaller-than-host granularities, but it is not worth it: it would
complicate the implementation and waste memory.

We expose 40 bits of physical address range to the virtual EL2, because
we only support a 40-bit IPA for the guest. Eventually, this will change.
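
For reference, the ID_AA64MMFR0_EL1.PARange encodings map to physical
address sizes as follows; 0x2, used below, is the 40-bit encoding
(parange_to_bits is a helper invented for this illustration):

    static unsigned int parange_to_bits(unsigned int parange)
    {
            /* PARange encodings 0x0..0x6, per the ARM ARM */
            static const unsigned int bits[] = { 32, 36, 40, 42, 44, 48, 52 };

            return parange <= 0x6 ? bits[parange] : 0;
    }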

  [ This was only trapping on the 32-bit encoding, also using the
    current target register value as a base for the sanitisation.

    Use as the handler for the 64-bit sysreg as well, also load the
    sanitised version of the sysreg before clearing and setting bits.

    -- Andre Przywara ]

Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm64/kvm/sys_regs.c | 50 ++++++++++++++++++++++++++++++++++++++-
 1 file changed, 49 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index ec34b81da936..cc994ec3c121 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1710,6 +1710,54 @@ static bool access_spsr_el2(struct kvm_vcpu *vcpu,
 	return true;
 }
 
+static bool access_id_aa64mmfr0_el1(struct kvm_vcpu *v,
+				    struct sys_reg_params *p,
+				    const struct sys_reg_desc *r)
+{
+	u64 val;
+
+	if (p->is_write)
+		return write_to_read_only(v, p, r);
+
+	val = read_id_reg(v, r, false);
+
+	if (!nested_virt_in_use(v))
+		goto out;
+
+	/*
+	 * Don't expose granules smaller than the host's granule to the guest.
+	 * We could theoretically support a guest hypervisor having
+	 * smaller-than-host granularities, but it is not worth it: it would
+	 * complicate the implementation and waste memory.
+	 */
+	switch (PAGE_SIZE) {
+	case SZ_64K:
+		/* 16KB granule not supported */
+		val &= ~(0xfULL << ID_AA64MMFR0_TGRAN16_SHIFT);
+		val |= (ID_AA64MMFR0_TGRAN16_NI << ID_AA64MMFR0_TGRAN16_SHIFT);
+		/* fall through */
+	case SZ_16K:
+		/* 4KB granule not supported */
+		val &= ~(0xfULL << ID_AA64MMFR0_TGRAN4_SHIFT);
+		val |= (ID_AA64MMFR0_TGRAN4_NI << ID_AA64MMFR0_TGRAN4_SHIFT);
+		break;
+	case SZ_4K:
+		/* All granule sizes are supported */
+		break;
+	default:
+		unreachable();
+	}
+
+	/* Expose only 40 bits physical address range to the guest hypervisor */
+	val &= ~(0xfULL << ID_AA64MMFR0_PARANGE_SHIFT);
+	val |= (0x2 << ID_AA64MMFR0_PARANGE_SHIFT); /* 40 bits */
+
+out:
+	p->regval = val;
+
+	return true;
+}
+
 static bool access_id_aa64pfr0_el1(struct kvm_vcpu *v,
 				   struct sys_reg_params *p,
 				   const struct sys_reg_desc *r)
@@ -1846,7 +1894,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 	ID_UNALLOCATED(6,7),
 
 	/* CRm=7 */
-	ID_SANITISED(ID_AA64MMFR0_EL1),
+	ID_SANITISED_FN(ID_AA64MMFR0_EL1, access_id_aa64mmfr0_el1),
 	ID_SANITISED(ID_AA64MMFR1_EL1),
 	ID_SANITISED(ID_AA64MMFR2_EL1),
 	ID_UNALLOCATED(7,3),
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 176+ messages in thread

* [PATCH 34/59] KVM: arm/arm64: nv: Factor out stage 2 page table data from struct kvm
  2019-06-21  9:37 [PATCH 00/59] KVM: arm64: ARMv8.3 Nested Virtualization support Marc Zyngier
                   ` (32 preceding siblings ...)
  2019-06-21  9:38 ` [PATCH 33/59] KVM: arm64: nv: Pretend we only support larger-than-host page sizes Marc Zyngier
@ 2019-06-21  9:38 ` Marc Zyngier
  2019-07-03 15:52   ` Alexandru Elisei
  2019-06-21  9:38 ` [PATCH 35/59] KVM: arm/arm64: nv: Support multiple nested stage 2 mmu structures Marc Zyngier
                   ` (26 subsequent siblings)
  60 siblings, 1 reply; 176+ messages in thread
From: Marc Zyngier @ 2019-06-21  9:38 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	Julien Thierry, James Morse, Suzuki K Poulose

As we are about to reuse our stage 2 page table manipulation code for
shadow stage 2 page tables in the context of nested virtualization, we
are going to manage multiple stage 2 page tables for a single VM.

This requires some pretty invasive changes to our data structures: the
vmid and pgd pointers move into a separate structure, and pretty much
all of our mmu code now operates on that structure instead.

The new structure is called struct kvm_s2_mmu.

There is no intended functional change by this patch alone.
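
To illustrate the shape of the conversion, call sites change along
these lines (both examples are lifted from the hunks below):

    /* before: stage 2 operations take the VM */
    kvm_call_hyp(__kvm_tlb_flush_vmid, kvm);
    unmap_stage2_range(kvm, gpa, vm_end - vm_start);

    /* after: they take a stage 2 mmu context instead */
    kvm_call_hyp(__kvm_tlb_flush_vmid, &kvm->arch.mmu);
    unmap_stage2_range(&kvm->arch.mmu, gpa, vm_end - vm_start);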

[Designed data structure layout in collaboration]
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
---
 arch/arm/include/asm/kvm_asm.h    |   5 +-
 arch/arm/include/asm/kvm_host.h   |  23 ++-
 arch/arm/include/asm/kvm_mmu.h    |  10 +-
 arch/arm/kvm/hyp/switch.c         |   3 +-
 arch/arm/kvm/hyp/tlb.c            |  13 +-
 arch/arm64/include/asm/kvm_asm.h  |   5 +-
 arch/arm64/include/asm/kvm_host.h |  24 ++-
 arch/arm64/include/asm/kvm_mmu.h  |  16 +-
 arch/arm64/kvm/hyp/switch.c       |   8 +-
 arch/arm64/kvm/hyp/tlb.c          |  36 ++---
 virt/kvm/arm/arm.c                |  17 +-
 virt/kvm/arm/mmu.c                | 250 ++++++++++++++++--------------
 12 files changed, 224 insertions(+), 186 deletions(-)

diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
index f615830f9f57..4f85323f1290 100644
--- a/arch/arm/include/asm/kvm_asm.h
+++ b/arch/arm/include/asm/kvm_asm.h
@@ -49,13 +49,14 @@
 #ifndef __ASSEMBLY__
 struct kvm;
 struct kvm_vcpu;
+struct kvm_s2_mmu;
 
 extern char __kvm_hyp_init[];
 extern char __kvm_hyp_init_end[];
 
 extern void __kvm_flush_vm_context(void);
-extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa);
-extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
+extern void __kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu, phys_addr_t ipa);
+extern void __kvm_tlb_flush_vmid(struct kvm_s2_mmu *mmu);
 extern void __kvm_tlb_flush_local_vmid(struct kvm_vcpu *vcpu);
 
 extern void __kvm_timer_set_cntvoff(u32 cntvoff_low, u32 cntvoff_high);
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index f80418ddeb60..e3217c4ad25b 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -55,18 +55,23 @@ struct kvm_vmid {
 	u32    vmid;
 };
 
+struct kvm_s2_mmu {
+	/* The VMID generation used for the virt. memory system */
+	struct kvm_vmid vmid;
+
+	/* Stage-2 page table */
+	pgd_t *pgd;
+	phys_addr_t pgd_phys;
+
+	struct kvm *kvm;
+};
+
 struct kvm_arch {
+	struct kvm_s2_mmu mmu;
+
 	/* The last vcpu id that ran on each physical CPU */
 	int __percpu *last_vcpu_ran;
 
-	/*
-	 * Anything that is not used directly from assembly code goes
-	 * here.
-	 */
-
-	/* The VMID generation used for the virt. memory system */
-	struct kvm_vmid vmid;
-
 	/* Stage-2 page table */
 	pgd_t *pgd;
 	phys_addr_t pgd_phys;
@@ -164,6 +169,8 @@ struct vcpu_reset_state {
 struct kvm_vcpu_arch {
 	struct kvm_cpu_context ctxt;
 
+	struct kvm_s2_mmu *hw_mmu;
+
 	int target; /* Processor target */
 	DECLARE_BITMAP(features, KVM_VCPU_MAX_FEATURES);
 
diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index 0d84d50bf9ba..be23e3f8e08c 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -52,8 +52,8 @@ int create_hyp_exec_mappings(phys_addr_t phys_addr, size_t size,
 void free_hyp_pgds(void);
 
 void stage2_unmap_vm(struct kvm *kvm);
-int kvm_alloc_stage2_pgd(struct kvm *kvm);
-void kvm_free_stage2_pgd(struct kvm *kvm);
+int kvm_alloc_stage2_pgd(struct kvm_s2_mmu *mmu);
+void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu);
 int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
 			  phys_addr_t pa, unsigned long size, bool writable);
 
@@ -420,12 +420,12 @@ static inline int hyp_map_aux_data(void)
 
 static inline void kvm_set_ipa_limit(void) {}
 
-static __always_inline u64 kvm_get_vttbr(struct kvm *kvm)
+static __always_inline u64 kvm_get_vttbr(struct kvm_s2_mmu *mmu)
 {
-	struct kvm_vmid *vmid = &kvm->arch.vmid;
+	struct kvm_vmid *vmid = &mmu->vmid;
 	u64 vmid_field, baddr;
 
-	baddr = kvm->arch.pgd_phys;
+	baddr = mmu->pgd_phys;
 	vmid_field = (u64)vmid->vmid << VTTBR_VMID_SHIFT;
 	return kvm_phys_to_vttbr(baddr) | vmid_field;
 }
diff --git a/arch/arm/kvm/hyp/switch.c b/arch/arm/kvm/hyp/switch.c
index 3b058a5d7c5f..6e9c3f11bfa4 100644
--- a/arch/arm/kvm/hyp/switch.c
+++ b/arch/arm/kvm/hyp/switch.c
@@ -76,8 +76,7 @@ static void __hyp_text __deactivate_traps(struct kvm_vcpu *vcpu)
 
 static void __hyp_text __activate_vm(struct kvm_vcpu *vcpu)
 {
-	struct kvm *kvm = kern_hyp_va(vcpu->kvm);
-	write_sysreg(kvm_get_vttbr(kvm), VTTBR);
+	write_sysreg(kvm_get_vttbr(vcpu->arch.hw_mmu), VTTBR);
 	write_sysreg(vcpu->arch.midr, VPIDR);
 }
 
diff --git a/arch/arm/kvm/hyp/tlb.c b/arch/arm/kvm/hyp/tlb.c
index 8e4afba73635..2d66288e20ed 100644
--- a/arch/arm/kvm/hyp/tlb.c
+++ b/arch/arm/kvm/hyp/tlb.c
@@ -35,13 +35,12 @@
  * As v7 does not support flushing per IPA, just nuke the whole TLB
  * instead, ignoring the ipa value.
  */
-void __hyp_text __kvm_tlb_flush_vmid(struct kvm *kvm)
+void __hyp_text __kvm_tlb_flush_vmid(struct kvm_s2_mmu *mmu)
 {
 	dsb(ishst);
 
 	/* Switch to requested VMID */
-	kvm = kern_hyp_va(kvm);
-	write_sysreg(kvm_get_vttbr(kvm), VTTBR);
+	write_sysreg(kvm_get_vttbr(mmu), VTTBR);
 	isb();
 
 	write_sysreg(0, TLBIALLIS);
@@ -51,17 +50,15 @@ void __hyp_text __kvm_tlb_flush_vmid(struct kvm *kvm)
 	write_sysreg(0, VTTBR);
 }
 
-void __hyp_text __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa)
+void __hyp_text __kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu, phys_addr_t ipa)
 {
-	__kvm_tlb_flush_vmid(kvm);
+	__kvm_tlb_flush_vmid(mmu);
 }
 
 void __hyp_text __kvm_tlb_flush_local_vmid(struct kvm_vcpu *vcpu)
 {
-	struct kvm *kvm = kern_hyp_va(kern_hyp_va(vcpu)->kvm);
-
 	/* Switch to requested VMID */
-	write_sysreg(kvm_get_vttbr(kvm), VTTBR);
+	write_sysreg(kvm_get_vttbr(vcpu->arch.hw_mmu), VTTBR);
 	isb();
 
 	write_sysreg(0, TLBIALL);
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index ff73f5462aca..5e956c2cd9b4 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -56,6 +56,7 @@
 
 struct kvm;
 struct kvm_vcpu;
+struct kvm_s2_mmu;
 
 extern char __kvm_hyp_init[];
 extern char __kvm_hyp_init_end[];
@@ -63,8 +64,8 @@ extern char __kvm_hyp_init_end[];
 extern char __kvm_hyp_vector[];
 
 extern void __kvm_flush_vm_context(void);
-extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa);
-extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
+extern void __kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu, phys_addr_t ipa);
+extern void __kvm_tlb_flush_vmid(struct kvm_s2_mmu *mmu);
 extern void __kvm_tlb_flush_local_vmid(struct kvm_vcpu *vcpu);
 
 extern void __kvm_timer_set_cntvoff(u32 cntvoff_low, u32 cntvoff_high);
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index dae9c42a7219..3dee5e17a4ee 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -73,12 +73,25 @@ struct kvm_vmid {
 	u32    vmid;
 };
 
-struct kvm_arch {
+struct kvm_s2_mmu {
 	struct kvm_vmid vmid;
 
-	/* stage2 entry level table */
-	pgd_t *pgd;
-	phys_addr_t pgd_phys;
+	/*
+	 * stage2 entry level table
+	 *
+	 * Two kvm_s2_mmu structures in the same VM can point to the same pgd
+	 * here.  This happens when running a non-VHE guest hypervisor which
+	 * uses the canonical stage 2 page table for both vEL2 and for vEL1/0
+	 * with vHCR_EL2.VM == 0.
+	 */
+	pgd_t		*pgd;
+	phys_addr_t	pgd_phys;
+
+	struct kvm *kvm;
+};
+
+struct kvm_arch {
+	struct kvm_s2_mmu mmu;
 
 	/* VTCR_EL2 value for this VM */
 	u64    vtcr;
@@ -297,6 +310,9 @@ struct kvm_vcpu_arch {
 	void *sve_state;
 	unsigned int sve_max_vl;
 
+	/* Stage 2 paging state used by the hardware on next switch */
+	struct kvm_s2_mmu *hw_mmu;
+
 	/* HYP configuration */
 	u64 hcr_el2;
 	u32 mdcr_el2;
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index fe954efc992c..1eb6e0ca61c2 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -165,8 +165,8 @@ int create_hyp_exec_mappings(phys_addr_t phys_addr, size_t size,
 void free_hyp_pgds(void);
 
 void stage2_unmap_vm(struct kvm *kvm);
-int kvm_alloc_stage2_pgd(struct kvm *kvm);
-void kvm_free_stage2_pgd(struct kvm *kvm);
+int kvm_alloc_stage2_pgd(struct kvm_s2_mmu *mmu);
+void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu);
 int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
 			  phys_addr_t pa, unsigned long size, bool writable);
 
@@ -607,13 +607,13 @@ static inline u64 kvm_vttbr_baddr_mask(struct kvm *kvm)
 	return vttbr_baddr_mask(kvm_phys_shift(kvm), kvm_stage2_levels(kvm));
 }
 
-static __always_inline u64 kvm_get_vttbr(struct kvm *kvm)
+static __always_inline u64 kvm_get_vttbr(struct kvm_s2_mmu *mmu)
 {
-	struct kvm_vmid *vmid = &kvm->arch.vmid;
+	struct kvm_vmid *vmid = &mmu->vmid;
 	u64 vmid_field, baddr;
 	u64 cnp = system_supports_cnp() ? VTTBR_CNP_BIT : 0;
 
-	baddr = kvm->arch.pgd_phys;
+	baddr = mmu->pgd_phys;
 	vmid_field = (u64)vmid->vmid << VTTBR_VMID_SHIFT;
 	return kvm_phys_to_vttbr(baddr) | vmid_field | cnp;
 }
@@ -622,10 +622,10 @@ static __always_inline u64 kvm_get_vttbr(struct kvm *kvm)
  * Must be called from hyp code running at EL2 with an updated VTTBR
  * and interrupts disabled.
  */
-static __always_inline void __load_guest_stage2(struct kvm *kvm)
+static __always_inline void __load_guest_stage2(struct kvm_s2_mmu *mmu)
 {
-	write_sysreg(kvm->arch.vtcr, vtcr_el2);
-	write_sysreg(kvm_get_vttbr(kvm), vttbr_el2);
+	write_sysreg(kern_hyp_va(mmu->kvm)->arch.vtcr, vtcr_el2);
+	write_sysreg(kvm_get_vttbr(mmu), vttbr_el2);
 
 	/*
 	 * ARM erratum 1165522 requires the actual execution of the above
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 4b2c45060b38..fb479c71b521 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -248,9 +248,9 @@ void deactivate_traps_vhe_put(void)
 	__deactivate_traps_common();
 }
 
-static void __hyp_text __activate_vm(struct kvm *kvm)
+static void __hyp_text __activate_vm(struct kvm_s2_mmu *mmu)
 {
-	__load_guest_stage2(kvm);
+	__load_guest_stage2(mmu);
 }
 
 static void __hyp_text __deactivate_vm(struct kvm_vcpu *vcpu)
@@ -611,7 +611,7 @@ int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
 	 * stage 2 translation, and __activate_traps clear HCR_EL2.TGE
 	 * (among other things).
 	 */
-	__activate_vm(vcpu->kvm);
+	__activate_vm(vcpu->arch.hw_mmu);
 	__activate_traps(vcpu);
 
 	sysreg_restore_guest_state_vhe(guest_ctxt);
@@ -672,7 +672,7 @@ int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu)
 
 	__sysreg_save_state_nvhe(host_ctxt);
 
-	__activate_vm(kern_hyp_va(vcpu->kvm));
+	__activate_vm(kern_hyp_va(vcpu->arch.hw_mmu));
 	__activate_traps(vcpu);
 
 	__hyp_vgic_restore_state(vcpu);
diff --git a/arch/arm64/kvm/hyp/tlb.c b/arch/arm64/kvm/hyp/tlb.c
index 32a782bb00be..779405db3fb3 100644
--- a/arch/arm64/kvm/hyp/tlb.c
+++ b/arch/arm64/kvm/hyp/tlb.c
@@ -27,7 +27,7 @@ struct tlb_inv_context {
 	u64		sctlr;
 };
 
-static void __hyp_text __tlb_switch_to_guest_vhe(struct kvm *kvm,
+static void __hyp_text __tlb_switch_to_guest_vhe(struct kvm_s2_mmu *mmu,
 						 struct tlb_inv_context *cxt)
 {
 	u64 val;
@@ -64,17 +64,17 @@ static void __hyp_text __tlb_switch_to_guest_vhe(struct kvm *kvm,
 	 * place before clearing TGE. __load_guest_stage2() already
 	 * has an ISB in order to deal with this.
 	 */
-	__load_guest_stage2(kvm);
+	__load_guest_stage2(mmu);
 	val = read_sysreg(hcr_el2);
 	val &= ~HCR_TGE;
 	write_sysreg(val, hcr_el2);
 	isb();
 }
 
-static void __hyp_text __tlb_switch_to_guest_nvhe(struct kvm *kvm,
+static void __hyp_text __tlb_switch_to_guest_nvhe(struct kvm_s2_mmu *mmu,
 						  struct tlb_inv_context *cxt)
 {
-	__load_guest_stage2(kvm);
+	__load_guest_stage2(mmu);
 	isb();
 }
 
@@ -83,8 +83,7 @@ static hyp_alternate_select(__tlb_switch_to_guest,
 			    __tlb_switch_to_guest_vhe,
 			    ARM64_HAS_VIRT_HOST_EXTN);
 
-static void __hyp_text __tlb_switch_to_host_vhe(struct kvm *kvm,
-						struct tlb_inv_context *cxt)
+static void __hyp_text __tlb_switch_to_host_vhe(struct tlb_inv_context *cxt)
 {
 	/*
 	 * We're done with the TLB operation, let's restore the host's
@@ -103,8 +102,7 @@ static void __hyp_text __tlb_switch_to_host_vhe(struct kvm *kvm,
 	local_irq_restore(cxt->flags);
 }
 
-static void __hyp_text __tlb_switch_to_host_nvhe(struct kvm *kvm,
-						 struct tlb_inv_context *cxt)
+static void __hyp_text __tlb_switch_to_host_nvhe(struct tlb_inv_context *cxt)
 {
 	write_sysreg(0, vttbr_el2);
 }
@@ -114,15 +112,15 @@ static hyp_alternate_select(__tlb_switch_to_host,
 			    __tlb_switch_to_host_vhe,
 			    ARM64_HAS_VIRT_HOST_EXTN);
 
-void __hyp_text __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa)
+void __hyp_text __kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu, phys_addr_t ipa)
 {
 	struct tlb_inv_context cxt;
 
 	dsb(ishst);
 
 	/* Switch to requested VMID */
-	kvm = kern_hyp_va(kvm);
-	__tlb_switch_to_guest()(kvm, &cxt);
+	mmu = kern_hyp_va(mmu);
+	__tlb_switch_to_guest()(mmu, &cxt);
 
 	/*
 	 * We could do so much better if we had the VA as well.
@@ -165,39 +163,39 @@ void __hyp_text __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa)
 	if (!has_vhe() && icache_is_vpipt())
 		__flush_icache_all();
 
-	__tlb_switch_to_host()(kvm, &cxt);
+	__tlb_switch_to_host()(&cxt);
 }
 
-void __hyp_text __kvm_tlb_flush_vmid(struct kvm *kvm)
+void __hyp_text __kvm_tlb_flush_vmid(struct kvm_s2_mmu *mmu)
 {
 	struct tlb_inv_context cxt;
 
 	dsb(ishst);
 
 	/* Switch to requested VMID */
-	kvm = kern_hyp_va(kvm);
-	__tlb_switch_to_guest()(kvm, &cxt);
+	mmu = kern_hyp_va(mmu);
+	__tlb_switch_to_guest()(mmu, &cxt);
 
 	__tlbi(vmalls12e1is);
 	dsb(ish);
 	isb();
 
-	__tlb_switch_to_host()(kvm, &cxt);
+	__tlb_switch_to_host()(&cxt);
 }
 
 void __hyp_text __kvm_tlb_flush_local_vmid(struct kvm_vcpu *vcpu)
 {
-	struct kvm *kvm = kern_hyp_va(kern_hyp_va(vcpu)->kvm);
+	struct kvm_s2_mmu *mmu = kern_hyp_va(kern_hyp_va(vcpu)->arch.hw_mmu);
 	struct tlb_inv_context cxt;
 
 	/* Switch to requested VMID */
-	__tlb_switch_to_guest()(kvm, &cxt);
+	__tlb_switch_to_guest()(mmu, &cxt);
 
 	__tlbi(vmalle1);
 	dsb(nsh);
 	isb();
 
-	__tlb_switch_to_host()(kvm, &cxt);
+	__tlb_switch_to_host()(&cxt);
 }
 
 void __hyp_text __kvm_flush_vm_context(void)
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index bd5c55916d0d..5d4371633e1c 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -118,26 +118,27 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 	for_each_possible_cpu(cpu)
 		*per_cpu_ptr(kvm->arch.last_vcpu_ran, cpu) = -1;
 
-	ret = kvm_alloc_stage2_pgd(kvm);
+	ret = kvm_alloc_stage2_pgd(&kvm->arch.mmu);
 	if (ret)
 		goto out_fail_alloc;
 
+	/* Mark the initial VMID generation invalid */
+	kvm->arch.mmu.vmid.vmid_gen = 0;
+	kvm->arch.mmu.kvm = kvm;
+
 	ret = create_hyp_mappings(kvm, kvm + 1, PAGE_HYP);
 	if (ret)
 		goto out_free_stage2_pgd;
 
 	kvm_vgic_early_init(kvm);
 
-	/* Mark the initial VMID generation invalid */
-	kvm->arch.vmid.vmid_gen = 0;
-
 	/* The maximum number of VCPUs is limited by the host's GIC model */
 	kvm->arch.max_vcpus = vgic_present ?
 				kvm_vgic_get_max_vcpus() : KVM_MAX_VCPUS;
 
 	return ret;
 out_free_stage2_pgd:
-	kvm_free_stage2_pgd(kvm);
+	kvm_free_stage2_pgd(&kvm->arch.mmu);
 out_fail_alloc:
 	free_percpu(kvm->arch.last_vcpu_ran);
 	kvm->arch.last_vcpu_ran = NULL;
@@ -342,6 +343,8 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
 
 	kvm_arm_reset_debug_ptr(vcpu);
 
+	vcpu->arch.hw_mmu = &vcpu->kvm->arch.mmu;
+
 	return kvm_vgic_vcpu_init(vcpu);
 }
 
@@ -682,7 +685,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		 */
 		cond_resched();
 
-		update_vmid(&vcpu->kvm->arch.vmid);
+		update_vmid(&vcpu->arch.hw_mmu->vmid);
 
 		check_vcpu_requests(vcpu);
 
@@ -731,7 +734,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		 */
 		smp_store_mb(vcpu->mode, IN_GUEST_MODE);
 
-		if (ret <= 0 || need_new_vmid_gen(&vcpu->kvm->arch.vmid) ||
+		if (ret <= 0 || need_new_vmid_gen(&vcpu->arch.hw_mmu->vmid) ||
 		    kvm_request_pending(vcpu)) {
 			vcpu->mode = OUTSIDE_GUEST_MODE;
 			isb(); /* Ensure work in x_flush_hwstate is committed */
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index 198e5171e1f7..bb1be4ea55ec 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -51,12 +51,12 @@ static bool memslot_is_logging(struct kvm_memory_slot *memslot)
  */
 void kvm_flush_remote_tlbs(struct kvm *kvm)
 {
-	kvm_call_hyp(__kvm_tlb_flush_vmid, kvm);
+	kvm_call_hyp(__kvm_tlb_flush_vmid, &kvm->arch.mmu);
 }
 
-static void kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa)
+static void kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu, phys_addr_t ipa)
 {
-	kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, kvm, ipa);
+	kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, ipa);
 }
 
 /*
@@ -92,31 +92,33 @@ static bool kvm_is_device_pfn(unsigned long pfn)
  *
  * Function clears a PMD entry, flushes addr 1st and 2nd stage TLBs.
  */
-static void stage2_dissolve_pmd(struct kvm *kvm, phys_addr_t addr, pmd_t *pmd)
+static void stage2_dissolve_pmd(struct kvm_s2_mmu *mmu, phys_addr_t addr, pmd_t *pmd)
 {
 	if (!pmd_thp_or_huge(*pmd))
 		return;
 
 	pmd_clear(pmd);
-	kvm_tlb_flush_vmid_ipa(kvm, addr);
+	kvm_tlb_flush_vmid_ipa(mmu, addr);
 	put_page(virt_to_page(pmd));
 }
 
 /**
  * stage2_dissolve_pud() - clear and flush huge PUD entry
- * @kvm:	pointer to kvm structure.
+ * @mmu:	pointer to mmu structure to operate on
  * @addr:	IPA
  * @pud:	pud pointer for IPA
  *
  * Function clears a PUD entry, flushes addr 1st and 2nd stage TLBs.
  */
-static void stage2_dissolve_pud(struct kvm *kvm, phys_addr_t addr, pud_t *pudp)
+static void stage2_dissolve_pud(struct kvm_s2_mmu *mmu, phys_addr_t addr, pud_t *pudp)
 {
+	struct kvm *kvm __maybe_unused = mmu->kvm;
+
 	if (!stage2_pud_huge(kvm, *pudp))
 		return;
 
 	stage2_pud_clear(kvm, pudp);
-	kvm_tlb_flush_vmid_ipa(kvm, addr);
+	kvm_tlb_flush_vmid_ipa(mmu, addr);
 	put_page(virt_to_page(pudp));
 }
 
@@ -152,31 +154,35 @@ static void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc)
 	return p;
 }
 
-static void clear_stage2_pgd_entry(struct kvm *kvm, pgd_t *pgd, phys_addr_t addr)
+static void clear_stage2_pgd_entry(struct kvm_s2_mmu *mmu, pgd_t *pgd, phys_addr_t addr)
 {
+	struct kvm *kvm __maybe_unused = mmu->kvm;
+
 	pud_t *pud_table __maybe_unused = stage2_pud_offset(kvm, pgd, 0UL);
 	stage2_pgd_clear(kvm, pgd);
-	kvm_tlb_flush_vmid_ipa(kvm, addr);
+	kvm_tlb_flush_vmid_ipa(mmu, addr);
 	stage2_pud_free(kvm, pud_table);
 	put_page(virt_to_page(pgd));
 }
 
-static void clear_stage2_pud_entry(struct kvm *kvm, pud_t *pud, phys_addr_t addr)
+static void clear_stage2_pud_entry(struct kvm_s2_mmu *mmu, pud_t *pud, phys_addr_t addr)
 {
+	struct kvm *kvm __maybe_unused = mmu->kvm;
+
 	pmd_t *pmd_table __maybe_unused = stage2_pmd_offset(kvm, pud, 0);
 	VM_BUG_ON(stage2_pud_huge(kvm, *pud));
 	stage2_pud_clear(kvm, pud);
-	kvm_tlb_flush_vmid_ipa(kvm, addr);
+	kvm_tlb_flush_vmid_ipa(mmu, addr);
 	stage2_pmd_free(kvm, pmd_table);
 	put_page(virt_to_page(pud));
 }
 
-static void clear_stage2_pmd_entry(struct kvm *kvm, pmd_t *pmd, phys_addr_t addr)
+static void clear_stage2_pmd_entry(struct kvm_s2_mmu *mmu, pmd_t *pmd, phys_addr_t addr)
 {
 	pte_t *pte_table = pte_offset_kernel(pmd, 0);
 	VM_BUG_ON(pmd_thp_or_huge(*pmd));
 	pmd_clear(pmd);
-	kvm_tlb_flush_vmid_ipa(kvm, addr);
+	kvm_tlb_flush_vmid_ipa(mmu, addr);
 	free_page((unsigned long)pte_table);
 	put_page(virt_to_page(pmd));
 }
@@ -234,7 +240,7 @@ static inline void kvm_pgd_populate(pgd_t *pgdp, pud_t *pudp)
  * we then fully enforce cacheability of RAM, no matter what the guest
  * does.
  */
-static void unmap_stage2_ptes(struct kvm *kvm, pmd_t *pmd,
+static void unmap_stage2_ptes(struct kvm_s2_mmu *mmu, pmd_t *pmd,
 		       phys_addr_t addr, phys_addr_t end)
 {
 	phys_addr_t start_addr = addr;
@@ -246,7 +252,7 @@ static void unmap_stage2_ptes(struct kvm *kvm, pmd_t *pmd,
 			pte_t old_pte = *pte;
 
 			kvm_set_pte(pte, __pte(0));
-			kvm_tlb_flush_vmid_ipa(kvm, addr);
+			kvm_tlb_flush_vmid_ipa(mmu, addr);
 
 			/* No need to invalidate the cache for device mappings */
 			if (!kvm_is_device_pfn(pte_pfn(old_pte)))
@@ -256,13 +262,14 @@ static void unmap_stage2_ptes(struct kvm *kvm, pmd_t *pmd,
 		}
 	} while (pte++, addr += PAGE_SIZE, addr != end);
 
-	if (stage2_pte_table_empty(kvm, start_pte))
-		clear_stage2_pmd_entry(kvm, pmd, start_addr);
+	if (stage2_pte_table_empty(mmu->kvm, start_pte))
+		clear_stage2_pmd_entry(mmu, pmd, start_addr);
 }
 
-static void unmap_stage2_pmds(struct kvm *kvm, pud_t *pud,
+static void unmap_stage2_pmds(struct kvm_s2_mmu *mmu, pud_t *pud,
 		       phys_addr_t addr, phys_addr_t end)
 {
+	struct kvm *kvm __maybe_unused = mmu->kvm;
 	phys_addr_t next, start_addr = addr;
 	pmd_t *pmd, *start_pmd;
 
@@ -274,24 +281,25 @@ static void unmap_stage2_pmds(struct kvm *kvm, pud_t *pud,
 				pmd_t old_pmd = *pmd;
 
 				pmd_clear(pmd);
-				kvm_tlb_flush_vmid_ipa(kvm, addr);
+				kvm_tlb_flush_vmid_ipa(mmu, addr);
 
 				kvm_flush_dcache_pmd(old_pmd);
 
 				put_page(virt_to_page(pmd));
 			} else {
-				unmap_stage2_ptes(kvm, pmd, addr, next);
+				unmap_stage2_ptes(mmu, pmd, addr, next);
 			}
 		}
 	} while (pmd++, addr = next, addr != end);
 
 	if (stage2_pmd_table_empty(kvm, start_pmd))
-		clear_stage2_pud_entry(kvm, pud, start_addr);
+		clear_stage2_pud_entry(mmu, pud, start_addr);
 }
 
-static void unmap_stage2_puds(struct kvm *kvm, pgd_t *pgd,
+static void unmap_stage2_puds(struct kvm_s2_mmu *mmu, pgd_t *pgd,
 		       phys_addr_t addr, phys_addr_t end)
 {
+	struct kvm *kvm __maybe_unused = mmu->kvm;
 	phys_addr_t next, start_addr = addr;
 	pud_t *pud, *start_pud;
 
@@ -303,17 +311,17 @@ static void unmap_stage2_puds(struct kvm *kvm, pgd_t *pgd,
 				pud_t old_pud = *pud;
 
 				stage2_pud_clear(kvm, pud);
-				kvm_tlb_flush_vmid_ipa(kvm, addr);
+				kvm_tlb_flush_vmid_ipa(mmu, addr);
 				kvm_flush_dcache_pud(old_pud);
 				put_page(virt_to_page(pud));
 			} else {
-				unmap_stage2_pmds(kvm, pud, addr, next);
+				unmap_stage2_pmds(mmu, pud, addr, next);
 			}
 		}
 	} while (pud++, addr = next, addr != end);
 
 	if (stage2_pud_table_empty(kvm, start_pud))
-		clear_stage2_pgd_entry(kvm, pgd, start_addr);
+		clear_stage2_pgd_entry(mmu, pgd, start_addr);
 }
 
 /**
@@ -327,8 +335,9 @@ static void unmap_stage2_puds(struct kvm *kvm, pgd_t *pgd,
  * destroying the VM), otherwise another faulting VCPU may come in and mess
  * with things behind our backs.
  */
-static void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size)
+static void unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size)
 {
+	struct kvm *kvm = mmu->kvm;
 	pgd_t *pgd;
 	phys_addr_t addr = start, end = start + size;
 	phys_addr_t next;
@@ -336,18 +345,18 @@ static void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size)
 	assert_spin_locked(&kvm->mmu_lock);
 	WARN_ON(size & ~PAGE_MASK);
 
-	pgd = kvm->arch.pgd + stage2_pgd_index(kvm, addr);
+	pgd = mmu->pgd + stage2_pgd_index(kvm, addr);
 	do {
 		/*
 		 * Make sure the page table is still active, as another thread
 		 * could have possibly freed the page table, while we released
 		 * the lock.
 		 */
-		if (!READ_ONCE(kvm->arch.pgd))
+		if (!READ_ONCE(mmu->pgd))
 			break;
 		next = stage2_pgd_addr_end(kvm, addr, end);
 		if (!stage2_pgd_none(kvm, *pgd))
-			unmap_stage2_puds(kvm, pgd, addr, next);
+			unmap_stage2_puds(mmu, pgd, addr, next);
 		/*
 		 * If the range is too large, release the kvm->mmu_lock
 		 * to prevent starvation and lockup detector warnings.
@@ -357,7 +366,7 @@ static void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size)
 	} while (pgd++, addr = next, addr != end);
 }
 
-static void stage2_flush_ptes(struct kvm *kvm, pmd_t *pmd,
+static void stage2_flush_ptes(struct kvm_s2_mmu *mmu, pmd_t *pmd,
 			      phys_addr_t addr, phys_addr_t end)
 {
 	pte_t *pte;
@@ -369,9 +378,10 @@ static void stage2_flush_ptes(struct kvm *kvm, pmd_t *pmd,
 	} while (pte++, addr += PAGE_SIZE, addr != end);
 }
 
-static void stage2_flush_pmds(struct kvm *kvm, pud_t *pud,
+static void stage2_flush_pmds(struct kvm_s2_mmu *mmu, pud_t *pud,
 			      phys_addr_t addr, phys_addr_t end)
 {
+	struct kvm *kvm = mmu->kvm;
 	pmd_t *pmd;
 	phys_addr_t next;
 
@@ -382,14 +392,15 @@ static void stage2_flush_pmds(struct kvm *kvm, pud_t *pud,
 			if (pmd_thp_or_huge(*pmd))
 				kvm_flush_dcache_pmd(*pmd);
 			else
-				stage2_flush_ptes(kvm, pmd, addr, next);
+				stage2_flush_ptes(mmu, pmd, addr, next);
 		}
 	} while (pmd++, addr = next, addr != end);
 }
 
-static void stage2_flush_puds(struct kvm *kvm, pgd_t *pgd,
+static void stage2_flush_puds(struct kvm_s2_mmu *mmu, pgd_t *pgd,
 			      phys_addr_t addr, phys_addr_t end)
 {
+	struct kvm *kvm __maybe_unused = mmu->kvm;
 	pud_t *pud;
 	phys_addr_t next;
 
@@ -400,24 +411,25 @@ static void stage2_flush_puds(struct kvm *kvm, pgd_t *pgd,
 			if (stage2_pud_huge(kvm, *pud))
 				kvm_flush_dcache_pud(*pud);
 			else
-				stage2_flush_pmds(kvm, pud, addr, next);
+				stage2_flush_pmds(mmu, pud, addr, next);
 		}
 	} while (pud++, addr = next, addr != end);
 }
 
-static void stage2_flush_memslot(struct kvm *kvm,
+static void stage2_flush_memslot(struct kvm_s2_mmu *mmu,
 				 struct kvm_memory_slot *memslot)
 {
+	struct kvm *kvm = mmu->kvm;
 	phys_addr_t addr = memslot->base_gfn << PAGE_SHIFT;
 	phys_addr_t end = addr + PAGE_SIZE * memslot->npages;
 	phys_addr_t next;
 	pgd_t *pgd;
 
-	pgd = kvm->arch.pgd + stage2_pgd_index(kvm, addr);
+	pgd = mmu->pgd + stage2_pgd_index(kvm, addr);
 	do {
 		next = stage2_pgd_addr_end(kvm, addr, end);
 		if (!stage2_pgd_none(kvm, *pgd))
-			stage2_flush_puds(kvm, pgd, addr, next);
+			stage2_flush_puds(mmu, pgd, addr, next);
 	} while (pgd++, addr = next, addr != end);
 }
 
@@ -439,7 +451,7 @@ static void stage2_flush_vm(struct kvm *kvm)
 
 	slots = kvm_memslots(kvm);
 	kvm_for_each_memslot(memslot, slots)
-		stage2_flush_memslot(kvm, memslot);
+		stage2_flush_memslot(&kvm->arch.mmu, memslot);
 
 	spin_unlock(&kvm->mmu_lock);
 	srcu_read_unlock(&kvm->srcu, idx);
@@ -883,35 +895,35 @@ int create_hyp_exec_mappings(phys_addr_t phys_addr, size_t size,
 
 /**
  * kvm_alloc_stage2_pgd - allocate level-1 table for stage-2 translation.
- * @kvm:	The KVM struct pointer for the VM.
+ * @mmu:	The stage 2 mmu struct pointer
  *
  * Allocates only the stage-2 HW PGD level table(s) of size defined by
- * stage2_pgd_size(kvm).
+ * stage2_pgd_size(mmu->kvm).
  *
  * Note we don't need locking here as this is only called when the VM is
  * created, which can only be done once.
  */
-int kvm_alloc_stage2_pgd(struct kvm *kvm)
+int kvm_alloc_stage2_pgd(struct kvm_s2_mmu *mmu)
 {
 	phys_addr_t pgd_phys;
 	pgd_t *pgd;
 
-	if (kvm->arch.pgd != NULL) {
+	if (mmu->pgd != NULL) {
 		kvm_err("kvm_arch already initialized?\n");
 		return -EINVAL;
 	}
 
 	/* Allocate the HW PGD, making sure that each page gets its own refcount */
-	pgd = alloc_pages_exact(stage2_pgd_size(kvm), GFP_KERNEL | __GFP_ZERO);
+	pgd = alloc_pages_exact(stage2_pgd_size(mmu->kvm), GFP_KERNEL | __GFP_ZERO);
 	if (!pgd)
 		return -ENOMEM;
 
 	pgd_phys = virt_to_phys(pgd);
-	if (WARN_ON(pgd_phys & ~kvm_vttbr_baddr_mask(kvm)))
+	if (WARN_ON(pgd_phys & ~kvm_vttbr_baddr_mask(mmu->kvm)))
 		return -EINVAL;
 
-	kvm->arch.pgd = pgd;
-	kvm->arch.pgd_phys = pgd_phys;
+	mmu->pgd = pgd;
+	mmu->pgd_phys = pgd_phys;
 	return 0;
 }
 
@@ -950,7 +962,7 @@ static void stage2_unmap_memslot(struct kvm *kvm,
 
 		if (!(vma->vm_flags & VM_PFNMAP)) {
 			gpa_t gpa = addr + (vm_start - memslot->userspace_addr);
-			unmap_stage2_range(kvm, gpa, vm_end - vm_start);
+			unmap_stage2_range(&kvm->arch.mmu, gpa, vm_end - vm_start);
 		}
 		hva = vm_end;
 	} while (hva < reg_end);
@@ -982,24 +994,16 @@ void stage2_unmap_vm(struct kvm *kvm)
 	srcu_read_unlock(&kvm->srcu, idx);
 }
 
-/**
- * kvm_free_stage2_pgd - free all stage-2 tables
- * @kvm:	The KVM struct pointer for the VM.
- *
- * Walks the level-1 page table pointed to by kvm->arch.pgd and frees all
- * underlying level-2 and level-3 tables before freeing the actual level-1 table
- * and setting the struct pointer to NULL.
- */
-void kvm_free_stage2_pgd(struct kvm *kvm)
+void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu)
 {
+	struct kvm *kvm = mmu->kvm;
 	void *pgd = NULL;
 
 	spin_lock(&kvm->mmu_lock);
-	if (kvm->arch.pgd) {
-		unmap_stage2_range(kvm, 0, kvm_phys_size(kvm));
-		pgd = READ_ONCE(kvm->arch.pgd);
-		kvm->arch.pgd = NULL;
-		kvm->arch.pgd_phys = 0;
+	if (mmu->pgd) {
+		unmap_stage2_range(mmu, 0, kvm_phys_size(kvm));
+		pgd = READ_ONCE(mmu->pgd);
+		mmu->pgd = NULL;
 	}
 	spin_unlock(&kvm->mmu_lock);
 
@@ -1008,13 +1012,14 @@ void kvm_free_stage2_pgd(struct kvm *kvm)
 		free_pages_exact(pgd, stage2_pgd_size(kvm));
 }
 
-static pud_t *stage2_get_pud(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
+static pud_t *stage2_get_pud(struct kvm_s2_mmu *mmu, struct kvm_mmu_memory_cache *cache,
 			     phys_addr_t addr)
 {
+	struct kvm *kvm __maybe_unused = mmu->kvm;
 	pgd_t *pgd;
 	pud_t *pud;
 
-	pgd = kvm->arch.pgd + stage2_pgd_index(kvm, addr);
+	pgd = mmu->pgd + stage2_pgd_index(kvm, addr);
 	if (stage2_pgd_none(kvm, *pgd)) {
 		if (!cache)
 			return NULL;
@@ -1026,13 +1031,14 @@ static pud_t *stage2_get_pud(struct kvm *kvm, struct kvm_mmu_memory_cache *cache
 	return stage2_pud_offset(kvm, pgd, addr);
 }
 
-static pmd_t *stage2_get_pmd(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
+static pmd_t *stage2_get_pmd(struct kvm_s2_mmu *mmu, struct kvm_mmu_memory_cache *cache,
 			     phys_addr_t addr)
 {
+	struct kvm *kvm __maybe_unused = mmu->kvm;
 	pud_t *pud;
 	pmd_t *pmd;
 
-	pud = stage2_get_pud(kvm, cache, addr);
+	pud = stage2_get_pud(mmu, cache, addr);
 	if (!pud || stage2_pud_huge(kvm, *pud))
 		return NULL;
 
@@ -1047,13 +1053,14 @@ static pmd_t *stage2_get_pmd(struct kvm *kvm, struct kvm_mmu_memory_cache *cache
 	return stage2_pmd_offset(kvm, pud, addr);
 }
 
-static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache
-			       *cache, phys_addr_t addr, const pmd_t *new_pmd)
+static int stage2_set_pmd_huge(struct kvm_s2_mmu *mmu,
+			       struct kvm_mmu_memory_cache *cache,
+			       phys_addr_t addr, const pmd_t *new_pmd)
 {
 	pmd_t *pmd, old_pmd;
 
 retry:
-	pmd = stage2_get_pmd(kvm, cache, addr);
+	pmd = stage2_get_pmd(mmu, cache, addr);
 	VM_BUG_ON(!pmd);
 
 	old_pmd = *pmd;
@@ -1086,7 +1093,7 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache
 		 * get handled accordingly.
 		 */
 		if (!pmd_thp_or_huge(old_pmd)) {
-			unmap_stage2_range(kvm, addr & S2_PMD_MASK, S2_PMD_SIZE);
+			unmap_stage2_range(mmu, addr & S2_PMD_MASK, S2_PMD_SIZE);
 			goto retry;
 		}
 		/*
@@ -1102,7 +1109,7 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache
 		 */
 		WARN_ON_ONCE(pmd_pfn(old_pmd) != pmd_pfn(*new_pmd));
 		pmd_clear(pmd);
-		kvm_tlb_flush_vmid_ipa(kvm, addr);
+		kvm_tlb_flush_vmid_ipa(mmu, addr);
 	} else {
 		get_page(virt_to_page(pmd));
 	}
@@ -1111,13 +1118,15 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache
 	return 0;
 }
 
-static int stage2_set_pud_huge(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
+static int stage2_set_pud_huge(struct kvm_s2_mmu *mmu,
+			       struct kvm_mmu_memory_cache *cache,
 			       phys_addr_t addr, const pud_t *new_pudp)
 {
+	struct kvm *kvm __maybe_unused = mmu->kvm;
 	pud_t *pudp, old_pud;
 
 retry:
-	pudp = stage2_get_pud(kvm, cache, addr);
+	pudp = stage2_get_pud(mmu, cache, addr);
 	VM_BUG_ON(!pudp);
 
 	old_pud = *pudp;
@@ -1136,13 +1145,13 @@ static int stage2_set_pud_huge(struct kvm *kvm, struct kvm_mmu_memory_cache *cac
 		 * the range for this block and retry.
 		 */
 		if (!stage2_pud_huge(kvm, old_pud)) {
-			unmap_stage2_range(kvm, addr & S2_PUD_MASK, S2_PUD_SIZE);
+			unmap_stage2_range(mmu, addr & S2_PUD_MASK, S2_PUD_SIZE);
 			goto retry;
 		}
 
 		WARN_ON_ONCE(kvm_pud_pfn(old_pud) != kvm_pud_pfn(*new_pudp));
 		stage2_pud_clear(kvm, pudp);
-		kvm_tlb_flush_vmid_ipa(kvm, addr);
+		kvm_tlb_flush_vmid_ipa(mmu, addr);
 	} else {
 		get_page(virt_to_page(pudp));
 	}
@@ -1157,9 +1166,10 @@ static int stage2_set_pud_huge(struct kvm *kvm, struct kvm_mmu_memory_cache *cac
  * leaf-entry is returned in the appropriate level variable - pudpp,
  * pmdpp, ptepp.
  */
-static bool stage2_get_leaf_entry(struct kvm *kvm, phys_addr_t addr,
+static bool stage2_get_leaf_entry(struct kvm_s2_mmu *mmu, phys_addr_t addr,
 				  pud_t **pudpp, pmd_t **pmdpp, pte_t **ptepp)
 {
+	struct kvm *kvm __maybe_unused = mmu->kvm;
 	pud_t *pudp;
 	pmd_t *pmdp;
 	pte_t *ptep;
@@ -1168,7 +1178,7 @@ static bool stage2_get_leaf_entry(struct kvm *kvm, phys_addr_t addr,
 	*pmdpp = NULL;
 	*ptepp = NULL;
 
-	pudp = stage2_get_pud(kvm, NULL, addr);
+	pudp = stage2_get_pud(mmu, NULL, addr);
 	if (!pudp || stage2_pud_none(kvm, *pudp) || !stage2_pud_present(kvm, *pudp))
 		return false;
 
@@ -1194,14 +1204,14 @@ static bool stage2_get_leaf_entry(struct kvm *kvm, phys_addr_t addr,
 	return true;
 }
 
-static bool stage2_is_exec(struct kvm *kvm, phys_addr_t addr)
+static bool stage2_is_exec(struct kvm_s2_mmu *mmu, phys_addr_t addr)
 {
 	pud_t *pudp;
 	pmd_t *pmdp;
 	pte_t *ptep;
 	bool found;
 
-	found = stage2_get_leaf_entry(kvm, addr, &pudp, &pmdp, &ptep);
+	found = stage2_get_leaf_entry(mmu, addr, &pudp, &pmdp, &ptep);
 	if (!found)
 		return false;
 
@@ -1213,10 +1223,12 @@ static bool stage2_is_exec(struct kvm *kvm, phys_addr_t addr)
 		return kvm_s2pte_exec(ptep);
 }
 
-static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
+static int stage2_set_pte(struct kvm_s2_mmu *mmu,
+			  struct kvm_mmu_memory_cache *cache,
 			  phys_addr_t addr, const pte_t *new_pte,
 			  unsigned long flags)
 {
+	struct kvm *kvm __maybe_unused = mmu->kvm;
 	pud_t *pud;
 	pmd_t *pmd;
 	pte_t *pte, old_pte;
@@ -1226,7 +1238,7 @@ static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
 	VM_BUG_ON(logging_active && !cache);
 
 	/* Create stage-2 page table mapping - Levels 0 and 1 */
-	pud = stage2_get_pud(kvm, cache, addr);
+	pud = stage2_get_pud(mmu, cache, addr);
 	if (!pud) {
 		/*
 		 * Ignore calls from kvm_set_spte_hva for unallocated
@@ -1240,7 +1252,7 @@ static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
 	 * on to allocate page.
 	 */
 	if (logging_active)
-		stage2_dissolve_pud(kvm, addr, pud);
+		stage2_dissolve_pud(mmu, addr, pud);
 
 	if (stage2_pud_none(kvm, *pud)) {
 		if (!cache)
@@ -1264,7 +1276,7 @@ static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
 	 * allocate page.
 	 */
 	if (logging_active)
-		stage2_dissolve_pmd(kvm, addr, pmd);
+		stage2_dissolve_pmd(mmu, addr, pmd);
 
 	/* Create stage-2 page mappings - Level 2 */
 	if (pmd_none(*pmd)) {
@@ -1288,7 +1300,7 @@ static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
 			return 0;
 
 		kvm_set_pte(pte, __pte(0));
-		kvm_tlb_flush_vmid_ipa(kvm, addr);
+		kvm_tlb_flush_vmid_ipa(mmu, addr);
 	} else {
 		get_page(virt_to_page(pte));
 	}
@@ -1354,8 +1366,8 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
 		if (ret)
 			goto out;
 		spin_lock(&kvm->mmu_lock);
-		ret = stage2_set_pte(kvm, &cache, addr, &pte,
-						KVM_S2PTE_FLAG_IS_IOMAP);
+		ret = stage2_set_pte(&kvm->arch.mmu, &cache, addr, &pte,
+				     KVM_S2PTE_FLAG_IS_IOMAP);
 		spin_unlock(&kvm->mmu_lock);
 		if (ret)
 			goto out;
@@ -1441,9 +1453,10 @@ static void stage2_wp_ptes(pmd_t *pmd, phys_addr_t addr, phys_addr_t end)
  * @addr:	range start address
  * @end:	range end address
  */
-static void stage2_wp_pmds(struct kvm *kvm, pud_t *pud,
+static void stage2_wp_pmds(struct kvm_s2_mmu *mmu, pud_t *pud,
 			   phys_addr_t addr, phys_addr_t end)
 {
+	struct kvm *kvm = mmu->kvm;
 	pmd_t *pmd;
 	phys_addr_t next;
 
@@ -1463,14 +1476,15 @@ static void stage2_wp_pmds(struct kvm *kvm, pud_t *pud,
 }
 
 /**
  * stage2_wp_puds - write protect PGD range
  * @pgd:	pointer to pgd entry
  * @addr:	range start address
  * @end:	range end address
  */
-static void  stage2_wp_puds(struct kvm *kvm, pgd_t *pgd,
+static void  stage2_wp_puds(struct kvm_s2_mmu *mmu, pgd_t *pgd,
 			    phys_addr_t addr, phys_addr_t end)
 {
+	struct kvm *kvm __maybe_unused = mmu->kvm;
 	pud_t *pud;
 	phys_addr_t next;
 
@@ -1482,7 +1496,7 @@ static void  stage2_wp_puds(struct kvm *kvm, pgd_t *pgd,
 				if (!kvm_s2pud_readonly(pud))
 					kvm_set_s2pud_readonly(pud);
 			} else {
-				stage2_wp_pmds(kvm, pud, addr, next);
+				stage2_wp_pmds(mmu, pud, addr, next);
 			}
 		}
 	} while (pud++, addr = next, addr != end);
@@ -1494,12 +1508,13 @@ static void  stage2_wp_puds(struct kvm *kvm, pgd_t *pgd,
  * @addr:	Start address of range
  * @end:	End address of range
  */
-static void stage2_wp_range(struct kvm *kvm, phys_addr_t addr, phys_addr_t end)
+static void stage2_wp_range(struct kvm_s2_mmu *mmu, phys_addr_t addr, phys_addr_t end)
 {
+	struct kvm *kvm = mmu->kvm;
 	pgd_t *pgd;
 	phys_addr_t next;
 
-	pgd = kvm->arch.pgd + stage2_pgd_index(kvm, addr);
+	pgd = mmu->pgd + stage2_pgd_index(kvm, addr);
 	do {
 		/*
 		 * Release kvm_mmu_lock periodically if the memory region is
@@ -1511,11 +1526,11 @@ static void stage2_wp_range(struct kvm *kvm, phys_addr_t addr, phys_addr_t end)
 		 * the lock.
 		 */
 		cond_resched_lock(&kvm->mmu_lock);
-		if (!READ_ONCE(kvm->arch.pgd))
+		if (!READ_ONCE(mmu->pgd))
 			break;
 		next = stage2_pgd_addr_end(kvm, addr, end);
 		if (stage2_pgd_present(kvm, *pgd))
-			stage2_wp_puds(kvm, pgd, addr, next);
+			stage2_wp_puds(mmu, pgd, addr, next);
 	} while (pgd++, addr = next, addr != end);
 }
 
@@ -1540,7 +1555,7 @@ void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot)
 	phys_addr_t end = (memslot->base_gfn + memslot->npages) << PAGE_SHIFT;
 
 	spin_lock(&kvm->mmu_lock);
-	stage2_wp_range(kvm, start, end);
+	stage2_wp_range(&kvm->arch.mmu, start, end);
 	spin_unlock(&kvm->mmu_lock);
 	kvm_flush_remote_tlbs(kvm);
 }
@@ -1564,7 +1579,7 @@ static void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
 	phys_addr_t start = (base_gfn +  __ffs(mask)) << PAGE_SHIFT;
 	phys_addr_t end = (base_gfn + __fls(mask) + 1) << PAGE_SHIFT;
 
-	stage2_wp_range(kvm, start, end);
+	stage2_wp_range(&kvm->arch.mmu, start, end);
 }
 
 /*
@@ -1677,6 +1692,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	pgprot_t mem_type = PAGE_S2;
 	bool logging_active = memslot_is_logging(memslot);
 	unsigned long vma_pagesize, flags = 0;
+	struct kvm_s2_mmu *mmu = vcpu->arch.hw_mmu;
 
 	write_fault = kvm_is_write_fault(vcpu);
 	exec_fault = kvm_vcpu_trap_is_iabt(vcpu);
@@ -1796,7 +1812,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	 * execute permissions, and we preserve whatever we have.
 	 */
 	needs_exec = exec_fault ||
-		(fault_status == FSC_PERM && stage2_is_exec(kvm, fault_ipa));
+		(fault_status == FSC_PERM && stage2_is_exec(mmu, fault_ipa));
 
 	if (vma_pagesize == PUD_SIZE) {
 		pud_t new_pud = kvm_pfn_pud(pfn, mem_type);
@@ -1808,7 +1824,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		if (needs_exec)
 			new_pud = kvm_s2pud_mkexec(new_pud);
 
-		ret = stage2_set_pud_huge(kvm, memcache, fault_ipa, &new_pud);
+		ret = stage2_set_pud_huge(mmu, memcache, fault_ipa, &new_pud);
 	} else if (vma_pagesize == PMD_SIZE) {
 		pmd_t new_pmd = kvm_pfn_pmd(pfn, mem_type);
 
@@ -1820,7 +1836,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		if (needs_exec)
 			new_pmd = kvm_s2pmd_mkexec(new_pmd);
 
-		ret = stage2_set_pmd_huge(kvm, memcache, fault_ipa, &new_pmd);
+		ret = stage2_set_pmd_huge(mmu, memcache, fault_ipa, &new_pmd);
 	} else {
 		pte_t new_pte = kvm_pfn_pte(pfn, mem_type);
 
@@ -1832,7 +1848,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		if (needs_exec)
 			new_pte = kvm_s2pte_mkexec(new_pte);
 
-		ret = stage2_set_pte(kvm, memcache, fault_ipa, &new_pte, flags);
+		ret = stage2_set_pte(mmu, memcache, fault_ipa, &new_pte, flags);
 	}
 
 out_unlock:
@@ -1861,7 +1877,7 @@ static void handle_access_fault(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
 
 	spin_lock(&vcpu->kvm->mmu_lock);
 
-	if (!stage2_get_leaf_entry(vcpu->kvm, fault_ipa, &pud, &pmd, &pte))
+	if (!stage2_get_leaf_entry(vcpu->arch.hw_mmu, fault_ipa, &pud, &pmd, &pte))
 		goto out;
 
 	if (pud) {		/* HugeTLB */
@@ -2031,14 +2047,14 @@ static int handle_hva_to_gpa(struct kvm *kvm,
 
 static int kvm_unmap_hva_handler(struct kvm *kvm, gpa_t gpa, u64 size, void *data)
 {
-	unmap_stage2_range(kvm, gpa, size);
+	unmap_stage2_range(&kvm->arch.mmu, gpa, size);
 	return 0;
 }
 
 int kvm_unmap_hva_range(struct kvm *kvm,
 			unsigned long start, unsigned long end)
 {
-	if (!kvm->arch.pgd)
+	if (!kvm->arch.mmu.pgd)
 		return 0;
 
 	trace_kvm_unmap_hva_range(start, end);
@@ -2058,7 +2074,7 @@ static int kvm_set_spte_handler(struct kvm *kvm, gpa_t gpa, u64 size, void *data
 	 * therefore stage2_set_pte() never needs to clear out a huge PMD
 	 * through this calling path.
 	 */
-	stage2_set_pte(kvm, NULL, gpa, pte, 0);
+	stage2_set_pte(&kvm->arch.mmu, NULL, gpa, pte, 0);
 	return 0;
 }
 
@@ -2069,7 +2085,7 @@ int kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte)
 	kvm_pfn_t pfn = pte_pfn(pte);
 	pte_t stage2_pte;
 
-	if (!kvm->arch.pgd)
+	if (!kvm->arch.mmu.pgd)
 		return 0;
 
 	trace_kvm_set_spte_hva(hva);
@@ -2092,7 +2108,7 @@ static int kvm_age_hva_handler(struct kvm *kvm, gpa_t gpa, u64 size, void *data)
 	pte_t *pte;
 
 	WARN_ON(size != PAGE_SIZE && size != PMD_SIZE && size != PUD_SIZE);
-	if (!stage2_get_leaf_entry(kvm, gpa, &pud, &pmd, &pte))
+	if (!stage2_get_leaf_entry(&kvm->arch.mmu, gpa, &pud, &pmd, &pte))
 		return 0;
 
 	if (pud)
@@ -2110,7 +2126,7 @@ static int kvm_test_age_hva_handler(struct kvm *kvm, gpa_t gpa, u64 size, void *
 	pte_t *pte;
 
 	WARN_ON(size != PAGE_SIZE && size != PMD_SIZE && size != PUD_SIZE);
-	if (!stage2_get_leaf_entry(kvm, gpa, &pud, &pmd, &pte))
+	if (!stage2_get_leaf_entry(&kvm->arch.mmu, gpa, &pud, &pmd, &pte))
 		return 0;
 
 	if (pud)
@@ -2123,7 +2139,7 @@ static int kvm_test_age_hva_handler(struct kvm *kvm, gpa_t gpa, u64 size, void *
 
 int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end)
 {
-	if (!kvm->arch.pgd)
+	if (!kvm->arch.mmu.pgd)
 		return 0;
 	trace_kvm_age_hva(start, end);
 	return handle_hva_to_gpa(kvm, start, end, kvm_age_hva_handler, NULL);
@@ -2131,7 +2147,7 @@ int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end)
 
 int kvm_test_age_hva(struct kvm *kvm, unsigned long hva)
 {
-	if (!kvm->arch.pgd)
+	if (!kvm->arch.mmu.pgd)
 		return 0;
 	trace_kvm_test_age_hva(hva);
 	return handle_hva_to_gpa(kvm, hva, hva, kvm_test_age_hva_handler, NULL);
@@ -2344,9 +2360,9 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
 
 	spin_lock(&kvm->mmu_lock);
 	if (ret)
-		unmap_stage2_range(kvm, mem->guest_phys_addr, mem->memory_size);
+		unmap_stage2_range(&kvm->arch.mmu, mem->guest_phys_addr, mem->memory_size);
 	else
-		stage2_flush_memslot(kvm, memslot);
+		stage2_flush_memslot(&kvm->arch.mmu, memslot);
 	spin_unlock(&kvm->mmu_lock);
 out:
 	up_read(&current->mm->mmap_sem);
@@ -2370,7 +2386,7 @@ void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen)
 
 void kvm_arch_flush_shadow_all(struct kvm *kvm)
 {
-	kvm_free_stage2_pgd(kvm);
+	kvm_free_stage2_pgd(&kvm->arch.mmu);
 }
 
 void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
@@ -2380,7 +2396,7 @@ void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
 	phys_addr_t size = slot->npages << PAGE_SHIFT;
 
 	spin_lock(&kvm->mmu_lock);
-	unmap_stage2_range(kvm, gpa, size);
+	unmap_stage2_range(&kvm->arch.mmu, gpa, size);
 	spin_unlock(&kvm->mmu_lock);
 }
 
-- 
2.20.1



* [PATCH 35/59] KVM: arm/arm64: nv: Support multiple nested stage 2 mmu structures
  2019-06-21  9:37 [PATCH 00/59] KVM: arm64: ARMv8.3 Nested Virtualization support Marc Zyngier
                   ` (33 preceding siblings ...)
  2019-06-21  9:38 ` [PATCH 34/59] KVM: arm/arm64: nv: Factor out stage 2 page table data from struct kvm Marc Zyngier
@ 2019-06-21  9:38 ` Marc Zyngier
  2019-06-25 12:19   ` Alexandru Elisei
                     ` (2 more replies)
  2019-06-21  9:38 ` [PATCH 36/59] KVM: arm64: nv: Implement nested Stage-2 page table walk logic Marc Zyngier
                   ` (25 subsequent siblings)
  60 siblings, 3 replies; 176+ messages in thread
From: Marc Zyngier @ 2019-06-21  9:38 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	Julien Thierry, James Morse, Suzuki K Poulose

From: Christoffer Dall <christoffer.dall@arm.com>

Add stage 2 mmu data structures for virtual EL2 and for nested guests.
We don't yet populate shadow stage 2 page tables, but we now have a
framework for getting to a shadow stage 2 pgd.

We allocate twice as many stage 2 mmu structures as vcpus, which is
sufficient for each vcpu to run two VMs without having to flush the
stage 2 page tables.
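
As a rough illustration of the sizing rule (a sketch only; the helper
below is hypothetical, though it mirrors the allocation added in this
patch):

	/* Sketch: how many stage 2 MMU structures a VM wants. */
	static int nested_mmus_wanted(struct kvm *kvm)
	{
		/*
		 * Two per vcpu: a vcpu can alternate between two
		 * translation contexts (say, the guest hypervisor's own
		 * VM and one nested VM) without stealing a structure
		 * that another vcpu is actively using, which would
		 * force a stage 2 flush.
		 */
		return atomic_read(&kvm->online_vcpus) * 2;
	}

With 4 online vcpus this yields 8 kvm_s2_mmu entries, all starting out
invalid (vttbr == 1, per the init code below) until a nested context
claims them.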

Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm/include/asm/kvm_host.h     |   4 +
 arch/arm/include/asm/kvm_mmu.h      |   3 +
 arch/arm64/include/asm/kvm_host.h   |  28 +++++
 arch/arm64/include/asm/kvm_mmu.h    |   8 ++
 arch/arm64/include/asm/kvm_nested.h |   7 ++
 arch/arm64/kvm/nested.c             | 172 ++++++++++++++++++++++++++++
 virt/kvm/arm/arm.c                  |  16 ++-
 virt/kvm/arm/mmu.c                  |  31 ++---
 8 files changed, 254 insertions(+), 15 deletions(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index e3217c4ad25b..b821eb2383ad 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -424,4 +424,8 @@ static inline bool kvm_arm_vcpu_is_finalized(struct kvm_vcpu *vcpu)
 	return true;
 }
 
+static inline void kvm_vcpu_load_hw_mmu(struct kvm_vcpu *vcpu) {}
+static inline void kvm_vcpu_put_hw_mmu(struct kvm_vcpu *vcpu) {}
+static inline int kvm_vcpu_init_nested(struct kvm_vcpu *vcpu) { return 0; }
+
 #endif /* __ARM_KVM_HOST_H__ */
diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index be23e3f8e08c..e6984b6da2ce 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -420,6 +420,9 @@ static inline int hyp_map_aux_data(void)
 
 static inline void kvm_set_ipa_limit(void) {}
 
+static inline void kvm_init_s2_mmu(struct kvm_s2_mmu *mmu) {}
+static inline void kvm_init_nested(struct kvm *kvm) {}
+
 static __always_inline u64 kvm_get_vttbr(struct kvm_s2_mmu *mmu)
 {
 	struct kvm_vmid *vmid = &mmu->vmid;
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 3dee5e17a4ee..cc238de170d2 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -88,11 +88,39 @@ struct kvm_s2_mmu {
 	phys_addr_t	pgd_phys;
 
 	struct kvm *kvm;
+
+	/*
+	 * For a shadow stage-2 MMU, the virtual vttbr programmed by the guest
+	 * hypervisor.  Unused for kvm_arch->mmu. Set to 1 when the structure
+	 * contains no valid information.
+	 */
+	u64	vttbr;
+
+	/* true when this represents a nested context where virtual HCR_EL2.VM == 1 */
+	bool	nested_stage2_enabled;
+
+	/*
+	 *  0: Nobody is currently using this, check vttbr for validity
+	 * >0: Somebody is actively using this.
+	 */
+	atomic_t refcnt;
 };
 
+static inline bool kvm_s2_mmu_valid(struct kvm_s2_mmu *mmu)
+{
+	return !(mmu->vttbr & 1);
+}
+
 struct kvm_arch {
 	struct kvm_s2_mmu mmu;
 
+	/*
+	 * Stage 2 paging state for VMs with nested virtualization,
+	 * using a virtual VMID.
+	 */
+	struct kvm_s2_mmu *nested_mmus;
+	size_t nested_mmus_size;
+
 	/* VTCR_EL2 value for this VM */
 	u64    vtcr;
 
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 1eb6e0ca61c2..32bcaa1845dc 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -100,6 +100,7 @@ alternative_cb_end
 #include <asm/mmu_context.h>
 #include <asm/pgtable.h>
 #include <asm/kvm_emulate.h>
+#include <asm/kvm_nested.h>
 
 void kvm_update_va_mask(struct alt_instr *alt,
 			__le32 *origptr, __le32 *updptr, int nr_inst);
@@ -164,6 +165,7 @@ int create_hyp_exec_mappings(phys_addr_t phys_addr, size_t size,
 			     void **haddr);
 void free_hyp_pgds(void);
 
+void kvm_unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size);
 void stage2_unmap_vm(struct kvm *kvm);
 int kvm_alloc_stage2_pgd(struct kvm_s2_mmu *mmu);
 void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu);
@@ -635,5 +637,11 @@ static __always_inline void __load_guest_stage2(struct kvm_s2_mmu *mmu)
 	asm(ALTERNATIVE("nop", "isb", ARM64_WORKAROUND_1165522));
 }
 
+static inline u64 get_vmid(u64 vttbr)
+{
+	return (vttbr & VTTBR_VMID_MASK(kvm_get_vmid_bits())) >>
+		VTTBR_VMID_SHIFT;
+}
+
 #endif /* __ASSEMBLY__ */
 #endif /* __ARM64_KVM_MMU_H__ */
diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
index 61e71d0d2151..d4021d0892bd 100644
--- a/arch/arm64/include/asm/kvm_nested.h
+++ b/arch/arm64/include/asm/kvm_nested.h
@@ -10,6 +10,13 @@ static inline bool nested_virt_in_use(const struct kvm_vcpu *vcpu)
 		test_bit(KVM_ARM_VCPU_NESTED_VIRT, vcpu->arch.features);
 }
 
+extern void kvm_init_nested(struct kvm *kvm);
+extern int kvm_vcpu_init_nested(struct kvm_vcpu *vcpu);
+extern void kvm_init_s2_mmu(struct kvm_s2_mmu *mmu);
+extern struct kvm_s2_mmu *lookup_s2_mmu(struct kvm *kvm, u64 vttbr, u64 hcr);
+extern void kvm_vcpu_load_hw_mmu(struct kvm_vcpu *vcpu);
+extern void kvm_vcpu_put_hw_mmu(struct kvm_vcpu *vcpu);
+
 int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe);
 extern bool forward_traps(struct kvm_vcpu *vcpu, u64 control_bit);
 extern bool forward_nv_traps(struct kvm_vcpu *vcpu);
diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
index 3872e3cf1691..4b38dc5c0be3 100644
--- a/arch/arm64/kvm/nested.c
+++ b/arch/arm64/kvm/nested.c
@@ -18,7 +18,161 @@
 #include <linux/kvm.h>
 #include <linux/kvm_host.h>
 
+#include <asm/kvm_arm.h>
 #include <asm/kvm_emulate.h>
+#include <asm/kvm_mmu.h>
+#include <asm/kvm_nested.h>
+
+void kvm_init_nested(struct kvm *kvm)
+{
+	kvm_init_s2_mmu(&kvm->arch.mmu);
+
+	kvm->arch.nested_mmus = NULL;
+	kvm->arch.nested_mmus_size = 0;
+}
+
+int kvm_vcpu_init_nested(struct kvm_vcpu *vcpu)
+{
+	struct kvm *kvm = vcpu->kvm;
+	struct kvm_s2_mmu *tmp;
+	int num_mmus;
+	int ret = -ENOMEM;
+
+	if (!test_bit(KVM_ARM_VCPU_NESTED_VIRT, vcpu->arch.features))
+		return 0;
+
+	if (!cpus_have_const_cap(ARM64_HAS_NESTED_VIRT))
+		return -EINVAL;
+
+	mutex_lock(&kvm->lock);
+
+	num_mmus = atomic_read(&kvm->online_vcpus) * 2;
+	tmp = __krealloc(kvm->arch.nested_mmus,
+			 num_mmus * sizeof(*kvm->arch.nested_mmus),
+			 GFP_KERNEL | __GFP_ZERO);
+
+	if (tmp) {
+		if (tmp != kvm->arch.nested_mmus)
+			kfree(kvm->arch.nested_mmus);
+
+		tmp[num_mmus - 1].kvm = kvm;
+		atomic_set(&tmp[num_mmus - 1].refcnt, 0);
+		ret = kvm_alloc_stage2_pgd(&tmp[num_mmus - 1]);
+		if (ret)
+			goto out;
+
+		tmp[num_mmus - 2].kvm = kvm;
+		atomic_set(&tmp[num_mmus - 2].refcnt, 0);
+		ret = kvm_alloc_stage2_pgd(&tmp[num_mmus - 2]);
+		if (ret) {
+			kvm_free_stage2_pgd(&tmp[num_mmus - 1]);
+			goto out;
+		}
+
+		kvm->arch.nested_mmus_size = num_mmus;
+		kvm->arch.nested_mmus = tmp;
+		tmp = NULL;
+	}
+
+out:
+	kfree(tmp);
+	mutex_unlock(&kvm->lock);
+	return ret;
+}
+
+/* Must be called with kvm->lock held */
+struct kvm_s2_mmu *lookup_s2_mmu(struct kvm *kvm, u64 vttbr, u64 hcr)
+{
+	bool nested_stage2_enabled = hcr & HCR_VM;
+	int i;
+
+	/* Don't consider the CnP bit for the vttbr match */
+	vttbr = vttbr & ~1UL;
+
+	/* Search a mmu in the list using the virtual VMID as a key */
+	for (i = 0; i < kvm->arch.nested_mmus_size; i++) {
+		struct kvm_s2_mmu *mmu = &kvm->arch.nested_mmus[i];
+
+		if (!kvm_s2_mmu_valid(mmu))
+			continue;
+
+		if (nested_stage2_enabled &&
+		    mmu->nested_stage2_enabled &&
+		    vttbr == mmu->vttbr)
+			return mmu;
+
+		if (!nested_stage2_enabled &&
+		    !mmu->nested_stage2_enabled &&
+		    get_vmid(vttbr) == get_vmid(mmu->vttbr))
+			return mmu;
+	}
+	return NULL;
+}
+
+static struct kvm_s2_mmu *get_s2_mmu_nested(struct kvm_vcpu *vcpu)
+{
+	struct kvm *kvm = vcpu->kvm;
+	u64 vttbr = vcpu_read_sys_reg(vcpu, VTTBR_EL2);
+	u64 hcr = vcpu_read_sys_reg(vcpu, HCR_EL2);
+	struct kvm_s2_mmu *s2_mmu;
+	int i;
+
+	s2_mmu = lookup_s2_mmu(kvm, vttbr, hcr);
+	if (s2_mmu)
+		goto out;
+
+	for (i = 0; i < kvm->arch.nested_mmus_size; i++) {
+		s2_mmu = &kvm->arch.nested_mmus[i];
+
+		if (atomic_read(&s2_mmu->refcnt) == 0)
+			break;
+	}
+	BUG_ON(atomic_read(&s2_mmu->refcnt)); /* We have struct MMUs to spare */
+
+	if (kvm_s2_mmu_valid(s2_mmu)) {
+		/* Clear the old state */
+		kvm_unmap_stage2_range(s2_mmu, 0, kvm_phys_size(kvm));
+		if (s2_mmu->vmid.vmid_gen)
+			kvm_call_hyp(__kvm_tlb_flush_vmid, s2_mmu);
+	}
+
+	/*
+	 * The virtual VMID (modulo CnP) will be used as a key when matching
+	 * an existing kvm_s2_mmu.
+	 */
+	s2_mmu->vttbr = vttbr & ~1UL;
+	s2_mmu->nested_stage2_enabled = hcr & HCR_VM;
+
+out:
+	atomic_inc(&s2_mmu->refcnt);
+	return s2_mmu;
+}
+
+void kvm_init_s2_mmu(struct kvm_s2_mmu *mmu)
+{
+	mmu->vttbr = 1;
+	mmu->nested_stage2_enabled = false;
+	atomic_set(&mmu->refcnt, 0);
+}
+
+void kvm_vcpu_load_hw_mmu(struct kvm_vcpu *vcpu)
+{
+	if (is_hyp_ctxt(vcpu)) {
+		vcpu->arch.hw_mmu = &vcpu->kvm->arch.mmu;
+	} else {
+		spin_lock(&vcpu->kvm->mmu_lock);
+		vcpu->arch.hw_mmu = get_s2_mmu_nested(vcpu);
+		spin_unlock(&vcpu->kvm->mmu_lock);
+	}
+}
+
+void kvm_vcpu_put_hw_mmu(struct kvm_vcpu *vcpu)
+{
+	if (vcpu->arch.hw_mmu != &vcpu->kvm->arch.mmu) {
+		atomic_dec(&vcpu->arch.hw_mmu->refcnt);
+		vcpu->arch.hw_mmu = NULL;
+	}
+}
 
 /*
  * Inject wfx to the virtual EL2 if this is not from the virtual EL2 and
@@ -37,3 +191,21 @@ int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe)
 
 	return -EINVAL;
 }
+
+void kvm_arch_flush_shadow_all(struct kvm *kvm)
+{
+	int i;
+
+	for (i = 0; i < kvm->arch.nested_mmus_size; i++) {
+		struct kvm_s2_mmu *mmu = &kvm->arch.nested_mmus[i];
+
+		WARN_ON(atomic_read(&mmu->refcnt));
+
+		if (!atomic_read(&mmu->refcnt))
+			kvm_free_stage2_pgd(mmu);
+	}
+	kfree(kvm->arch.nested_mmus);
+	kvm->arch.nested_mmus = NULL;
+	kvm->arch.nested_mmus_size = 0;
+	kvm_free_stage2_pgd(&kvm->arch.mmu);
+}
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index 5d4371633e1c..4e3cbfa1ecbe 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -36,6 +36,7 @@
 #include <asm/kvm_arm.h>
 #include <asm/kvm_asm.h>
 #include <asm/kvm_mmu.h>
+#include <asm/kvm_nested.h>
 #include <asm/kvm_emulate.h>
 #include <asm/kvm_coproc.h>
 #include <asm/sections.h>
@@ -126,6 +127,8 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 	kvm->arch.mmu.vmid.vmid_gen = 0;
 	kvm->arch.mmu.kvm = kvm;
 
+	kvm_init_nested(kvm);
+
 	ret = create_hyp_mappings(kvm, kvm + 1, PAGE_HYP);
 	if (ret)
 		goto out_free_stage2_pgd;
@@ -353,6 +356,9 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	int *last_ran;
 	kvm_host_data_t *cpu_data;
 
+	if (nested_virt_in_use(vcpu))
+		kvm_vcpu_load_hw_mmu(vcpu);
+
 	last_ran = this_cpu_ptr(vcpu->kvm->arch.last_vcpu_ran);
 	cpu_data = this_cpu_ptr(&kvm_host_data);
 
@@ -391,6 +397,9 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 	kvm_vgic_put(vcpu);
 	kvm_vcpu_pmu_restore_host(vcpu);
 
+	if (nested_virt_in_use(vcpu))
+		kvm_vcpu_put_hw_mmu(vcpu);
+
 	vcpu->cpu = -1;
 
 	kvm_arm_set_running_vcpu(NULL);
@@ -968,8 +977,13 @@ static int kvm_vcpu_set_target(struct kvm_vcpu *vcpu,
 
 	vcpu->arch.target = phys_target;
 
+	/* Prepare for nested if required */
+	ret = kvm_vcpu_init_nested(vcpu);
+
 	/* Now we know what it is, we can reset it. */
-	ret = kvm_reset_vcpu(vcpu);
+	if (!ret)
+		ret = kvm_reset_vcpu(vcpu);
+
 	if (ret) {
 		vcpu->arch.target = -1;
 		bitmap_zero(vcpu->arch.features, KVM_VCPU_MAX_FEATURES);
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index bb1be4ea55ec..faa61a81c8cc 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -325,7 +325,7 @@ static void unmap_stage2_puds(struct kvm_s2_mmu *mmu, pgd_t *pgd,
 }
 
 /**
- * unmap_stage2_range -- Clear stage2 page table entries to unmap a range
+ * kvm_unmap_stage2_range -- Clear stage2 page table entries to unmap a range
  * @kvm:   The VM pointer
  * @start: The intermediate physical base address of the range to unmap
  * @size:  The size of the area to unmap
@@ -335,7 +335,7 @@ static void unmap_stage2_puds(struct kvm_s2_mmu *mmu, pgd_t *pgd,
  * destroying the VM), otherwise another faulting VCPU may come in and mess
  * with things behind our backs.
  */
-static void unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size)
+void kvm_unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size)
 {
 	struct kvm *kvm = mmu->kvm;
 	pgd_t *pgd;
@@ -924,6 +924,10 @@ int kvm_alloc_stage2_pgd(struct kvm_s2_mmu *mmu)
 
 	mmu->pgd = pgd;
 	mmu->pgd_phys = pgd_phys;
+	mmu->vmid.vmid_gen = 0;
+
+	kvm_init_s2_mmu(mmu);
+
 	return 0;
 }
 
@@ -962,7 +966,7 @@ static void stage2_unmap_memslot(struct kvm *kvm,
 
 		if (!(vma->vm_flags & VM_PFNMAP)) {
 			gpa_t gpa = addr + (vm_start - memslot->userspace_addr);
-			unmap_stage2_range(&kvm->arch.mmu, gpa, vm_end - vm_start);
+			kvm_unmap_stage2_range(&kvm->arch.mmu, gpa, vm_end - vm_start);
 		}
 		hva = vm_end;
 	} while (hva < reg_end);
@@ -1001,7 +1005,7 @@ void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu)
 
 	spin_lock(&kvm->mmu_lock);
 	if (mmu->pgd) {
-		unmap_stage2_range(mmu, 0, kvm_phys_size(kvm));
+		kvm_unmap_stage2_range(mmu, 0, kvm_phys_size(kvm));
 		pgd = READ_ONCE(mmu->pgd);
 		mmu->pgd = NULL;
 	}
@@ -1093,7 +1097,7 @@ static int stage2_set_pmd_huge(struct kvm_s2_mmu *mmu,
 		 * get handled accordingly.
 		 */
 		if (!pmd_thp_or_huge(old_pmd)) {
-			unmap_stage2_range(mmu, addr & S2_PMD_MASK, S2_PMD_SIZE);
+			kvm_unmap_stage2_range(mmu, addr & S2_PMD_MASK, S2_PMD_SIZE);
 			goto retry;
 		}
 		/*
@@ -1145,7 +1149,7 @@ static int stage2_set_pud_huge(struct kvm_s2_mmu *mmu,
 		 * the range for this block and retry.
 		 */
 		if (!stage2_pud_huge(kvm, old_pud)) {
-			unmap_stage2_range(mmu, addr & S2_PUD_MASK, S2_PUD_SIZE);
+			kvm_unmap_stage2_range(mmu, addr & S2_PUD_MASK, S2_PUD_SIZE);
 			goto retry;
 		}
 
@@ -2047,7 +2051,7 @@ static int handle_hva_to_gpa(struct kvm *kvm,
 
 static int kvm_unmap_hva_handler(struct kvm *kvm, gpa_t gpa, u64 size, void *data)
 {
-	unmap_stage2_range(&kvm->arch.mmu, gpa, size);
+	kvm_unmap_stage2_range(&kvm->arch.mmu, gpa, size);
 	return 0;
 }
 
@@ -2360,7 +2364,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
 
 	spin_lock(&kvm->mmu_lock);
 	if (ret)
-		unmap_stage2_range(&kvm->arch.mmu, mem->guest_phys_addr, mem->memory_size);
+		kvm_unmap_stage2_range(&kvm->arch.mmu, mem->guest_phys_addr, mem->memory_size);
 	else
 		stage2_flush_memslot(&kvm->arch.mmu, memslot);
 	spin_unlock(&kvm->mmu_lock);
@@ -2384,11 +2388,6 @@ void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen)
 {
 }
 
-void kvm_arch_flush_shadow_all(struct kvm *kvm)
-{
-	kvm_free_stage2_pgd(&kvm->arch.mmu);
-}
-
 void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
 				   struct kvm_memory_slot *slot)
 {
@@ -2396,7 +2395,7 @@ void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
 	phys_addr_t size = slot->npages << PAGE_SHIFT;
 
 	spin_lock(&kvm->mmu_lock);
-	unmap_stage2_range(&kvm->arch.mmu, gpa, size);
+	kvm_unmap_stage2_range(&kvm->arch.mmu, gpa, size);
 	spin_unlock(&kvm->mmu_lock);
 }
 
@@ -2467,3 +2466,7 @@ void kvm_toggle_cache(struct kvm_vcpu *vcpu, bool was_enabled)
 
 	trace_kvm_toggle_cache(*vcpu_pc(vcpu), was_enabled, now_enabled);
 }
+
+__weak void kvm_arch_flush_shadow_all(struct kvm *kvm)
+{
+}
-- 
2.20.1



* [PATCH 36/59] KVM: arm64: nv: Implement nested Stage-2 page table walk logic
  2019-06-21  9:37 [PATCH 00/59] KVM: arm64: ARMv8.3 Nested Virtualization support Marc Zyngier
                   ` (34 preceding siblings ...)
  2019-06-21  9:38 ` [PATCH 35/59] KVM: arm/arm64: nv: Support multiple nested stage 2 mmu structures Marc Zyngier
@ 2019-06-21  9:38 ` Marc Zyngier
  2019-06-21  9:38 ` [PATCH 37/59] KVM: arm64: nv: Handle shadow stage 2 page faults Marc Zyngier
                   ` (24 subsequent siblings)
  60 siblings, 0 replies; 176+ messages in thread
From: Marc Zyngier @ 2019-06-21  9:38 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	Julien Thierry, James Morse, Suzuki K Poulose

From: Christoffer Dall <christoffer.dall@linaro.org>

Based on the pseudo-code in the ARM ARM, implement a stage 2 software
page table walker.
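
For orientation, the geometry the walker derives from VTCR_EL2 follows
the usual VMSAv8-64 relationships. A minimal sketch (illustrative only,
not part of the patch):

	/*
	 * Levels needed to translate an IPA of (64 - t0sz) bits when
	 * each table level resolves (pgshift - 3) bits of the address.
	 */
	static unsigned int s2_walk_levels(unsigned int pgshift,
					   unsigned int t0sz)
	{
		unsigned int stride = pgshift - 3;
		unsigned int input_size = 64 - t0sz;

		return (input_size - pgshift + stride - 1) / stride;
	}

For a 4K granule (pgshift = 12) and T0SZ = 24 this gives
(40 - 12 + 8) / 9 = 4 levels; stage 2 can start the walk lower down
(selected by SL0) with a larger, concatenated initial table.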

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm64/include/asm/esr.h        |   1 +
 arch/arm64/include/asm/kvm_arm.h    |   2 +
 arch/arm64/include/asm/kvm_nested.h |  13 ++
 arch/arm64/kvm/nested.c             | 241 ++++++++++++++++++++++++++++
 4 files changed, 257 insertions(+)

diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
index f85aa269082c..8efc91435948 100644
--- a/arch/arm64/include/asm/esr.h
+++ b/arch/arm64/include/asm/esr.h
@@ -136,6 +136,7 @@
 #define ESR_ELx_CM 		(UL(1) << ESR_ELx_CM_SHIFT)
 
 /* ISS field definitions for exceptions taken in to Hyp */
+#define ESR_ELx_FSC_ADDRSZ	(0x00)
 #define ESR_ELx_CV		(UL(1) << 24)
 #define ESR_ELx_COND_SHIFT	(20)
 #define ESR_ELx_COND_MASK	(UL(0xF) << ESR_ELx_COND_SHIFT)
diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index 55f4525c112c..1e4dbe0b1c8e 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -273,6 +273,8 @@
 #define VTTBR_VMID_SHIFT  (UL(48))
 #define VTTBR_VMID_MASK(size) (_AT(u64, (1 << size) - 1) << VTTBR_VMID_SHIFT)
 
+#define SCTLR_EE	(UL(1) << 25)
+
 /* Hyp System Trap Register */
 #define HSTR_EL2_T(x)	(1 << x)
 
diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
index d4021d0892bd..686ba53379ab 100644
--- a/arch/arm64/include/asm/kvm_nested.h
+++ b/arch/arm64/include/asm/kvm_nested.h
@@ -17,6 +17,19 @@ extern struct kvm_s2_mmu *lookup_s2_mmu(struct kvm *kvm, u64 vttbr, u64 hcr);
 extern void kvm_vcpu_load_hw_mmu(struct kvm_vcpu *vcpu);
 extern void kvm_vcpu_put_hw_mmu(struct kvm_vcpu *vcpu);
 
+struct kvm_s2_trans {
+	phys_addr_t output;
+	phys_addr_t block_size;
+	bool writable;
+	bool readable;
+	int level;
+	u32 esr;
+	u64 upper_attr;
+};
+
+extern int kvm_walk_nested_s2(struct kvm_vcpu *vcpu, phys_addr_t gipa,
+			      struct kvm_s2_trans *result);
+
 int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe);
 extern bool forward_traps(struct kvm_vcpu *vcpu, u64 control_bit);
 extern bool forward_nv_traps(struct kvm_vcpu *vcpu);
diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
index 4b38dc5c0be3..6a9bd68b769b 100644
--- a/arch/arm64/kvm/nested.c
+++ b/arch/arm64/kvm/nested.c
@@ -80,6 +80,247 @@ int kvm_vcpu_init_nested(struct kvm_vcpu *vcpu)
 	return ret;
 }
 
+struct s2_walk_info {
+	unsigned int pgshift;
+	unsigned int pgsize;
+	unsigned int ps;
+	unsigned int sl;
+	unsigned int t0sz;
+};
+
+static unsigned int ps_to_output_size(unsigned int ps)
+{
+	switch (ps) {
+	case 0: return 32;
+	case 1: return 36;
+	case 2: return 40;
+	case 3: return 42;
+	case 4: return 44;
+	case 5:
+	default:
+		return 48;
+	}
+}
+
+static unsigned int pa_max(void)
+{
+	/* We always emulate a VM with maximum PA size of KVM_PHYS_SIZE. */
+	return KVM_PHYS_SHIFT;
+}
+
+static int esr_s2_fault(struct kvm_vcpu *vcpu, int level, u32 fsc)
+{
+	u32 esr;
+
+	esr = kvm_vcpu_get_hsr(vcpu) & ~ESR_ELx_FSC;
+	esr |= fsc;
+	esr |= level & 0x3;
+	return esr;
+}
+
+static int check_base_s2_limits(struct kvm_vcpu *vcpu, struct s2_walk_info *wi,
+				int level, int input_size, int stride)
+{
+	int start_size;
+
+	/* Check translation limits */
+	switch (wi->pgsize) {
+	case SZ_64K:
+		if (level == 0 || (level == 1 && pa_max() <= 42))
+			return -EFAULT;
+		break;
+	case SZ_16K:
+		if (level == 0 || (level == 1 && pa_max() <= 40))
+			return -EFAULT;
+		break;
+	case SZ_4K:
+		if (level < 0 || (level == 0 && pa_max() <= 42))
+			return -EFAULT;
+		break;
+	}
+
+	/* Check input size limits */
+	if (input_size > pa_max() &&
+	    (!vcpu_mode_is_32bit(vcpu) || input_size > 40))
+		return -EFAULT;
+
+	/* Check number of entries in starting level table */
+	start_size = input_size - ((3 - level) * stride + wi->pgshift);
+	if (start_size < 1 || start_size > stride + 4)
+		return -EFAULT;
+
+	return 0;
+}
+
+/* Check if output is within boundaries */
+static int check_output_size(struct kvm_vcpu *vcpu, struct s2_walk_info *wi,
+			     phys_addr_t output)
+{
+	unsigned int output_size = ps_to_output_size(wi->ps);
+
+	if (output_size > pa_max())
+		output_size = pa_max();
+
+	if (output_size != 48 && (output & GENMASK_ULL(47, output_size)))
+		return -1;
+
+	return 0;
+}
+
+/*
+ * This is essentially a C version of the pseudocode from the ARM ARM
+ * AArch64.TranslationTableWalk function. I strongly recommend looking at
+ * that pseudocode when trying to understand this.
+ *
+ * Must be called with the kvm->srcu read lock held
+ */
+static int walk_nested_s2_pgd(struct kvm_vcpu *vcpu, phys_addr_t ipa,
+			      struct s2_walk_info *wi, struct kvm_s2_trans *out)
+{
+	u64 vttbr = vcpu_read_sys_reg(vcpu, VTTBR_EL2);
+	int first_block_level, level, stride, input_size, base_lower_bound;
+	phys_addr_t base_addr;
+	unsigned int addr_top, addr_bottom;
+	u64 desc;  /* page table entry */
+	int ret;
+	phys_addr_t paddr;
+
+	switch (wi->pgsize) {
+	case SZ_64K:
+	case SZ_16K:
+		level = 3 - wi->sl;
+		first_block_level = 2;
+		break;
+	case SZ_4K:
+		level = 2 - wi->sl;
+		first_block_level = 1;
+		break;
+	default:
+		/* GCC is braindead */
+		WARN(1, "Page size is none of 4K, 16K or 64K");
+	}
+
+	stride = wi->pgshift - 3;
+	input_size = 64 - wi->t0sz;
+	if (input_size > 48 || input_size < 25)
+		return -EFAULT;
+
+	ret = check_base_s2_limits(vcpu, wi, level, input_size, stride);
+	if (WARN_ON(ret))
+		return ret;
+
+	if (check_output_size(vcpu, wi, vttbr)) {
+		out->esr = esr_s2_fault(vcpu, level, ESR_ELx_FSC_ADDRSZ);
+		return 1;
+	}
+
+	base_lower_bound = 3 + input_size - ((3 - level) * stride +
+			   wi->pgshift);
+	base_addr = vttbr & GENMASK_ULL(47, base_lower_bound);
+
+	addr_top = input_size - 1;
+
+	while (1) {
+		phys_addr_t index;
+
+		addr_bottom = (3 - level) * stride + wi->pgshift;
+		index = (ipa & GENMASK_ULL(addr_top, addr_bottom))
+			>> (addr_bottom - 3);
+
+		paddr = base_addr | index;
+		ret = kvm_read_guest(vcpu->kvm, paddr, &desc, sizeof(desc));
+		if (ret < 0)
+			return ret;
+
+		/*
+		 * Handle reversed descriptors if endianness differs between the
+		 * host and the guest hypervisor.
+		 */
+		if (vcpu_read_sys_reg(vcpu, SCTLR_EL2) & SCTLR_EE)
+			desc = be64_to_cpu(desc);
+		else
+			desc = le64_to_cpu(desc);
+
+		/* Check for valid descriptor at this point */
+		if (!(desc & 1) || ((desc & 3) == 1 && level == 3)) {
+			out->esr = esr_s2_fault(vcpu, level, ESR_ELx_FSC_FAULT);
+			return 1;
+		}
+
+		/* We're at the final level or block translation level */
+		if ((desc & 3) == 1 || level == 3)
+			break;
+
+		if (check_output_size(vcpu, wi, desc)) {
+			out->esr = esr_s2_fault(vcpu, level, ESR_ELx_FSC_ADDRSZ);
+			return 1;
+		}
+
+		base_addr = desc & GENMASK_ULL(47, wi->pgshift);
+
+		level += 1;
+		addr_top = addr_bottom - 1;
+	}
+
+	if (level < first_block_level) {
+		out->esr = esr_s2_fault(vcpu, level, ESR_ELx_FSC_FAULT);
+		return 1;
+	}
+
+	/*
+	 * We don't use the contiguous bit in the stage-2 ptes, so skip check
+	 * for misprogramming of the contiguous bit.
+	 */
+
+	if (check_output_size(vcpu, wi, desc)) {
+		out->esr = esr_s2_fault(vcpu, level, ESR_ELx_FSC_ADDRSZ);
+		return 1;
+	}
+
+	if (!(desc & BIT(10))) {
+		out->esr = esr_s2_fault(vcpu, level, ESR_ELx_FSC_ACCESS);
+		return 1;
+	}
+
+	/* Calculate and return the result */
+	paddr = (desc & GENMASK_ULL(47, addr_bottom)) |
+		(ipa & GENMASK_ULL(addr_bottom - 1, 0));
+	out->output = paddr;
+	out->block_size = 1UL << ((3 - level) * stride + wi->pgshift);
+	out->readable = desc & (0b01 << 6);
+	out->writable = desc & (0b10 << 6);
+	out->level = level;
+	out->upper_attr = desc & GENMASK_ULL(63, 52);
+	return 0;
+}
+
+int kvm_walk_nested_s2(struct kvm_vcpu *vcpu, phys_addr_t gipa,
+		       struct kvm_s2_trans *result)
+{
+	u64 vtcr = vcpu_read_sys_reg(vcpu, VTCR_EL2);
+	struct s2_walk_info wi;
+
+	if (!nested_virt_in_use(vcpu))
+		return 0;
+
+	wi.t0sz = vtcr & TCR_EL2_T0SZ_MASK;
+
+	switch (vtcr & VTCR_EL2_TG0_MASK) {
+	case VTCR_EL2_TG0_4K:
+		wi.pgshift = 12;	 break;
+	case VTCR_EL2_TG0_16K:
+		wi.pgshift = 14;	 break;
+	case VTCR_EL2_TG0_64K:
+	default:
+		wi.pgshift = 16;	 break;
+	}
+	wi.pgsize = 1UL << wi.pgshift;
+	wi.ps = (vtcr & VTCR_EL2_PS_MASK) >> VTCR_EL2_PS_SHIFT;
+	wi.sl = (vtcr & VTCR_EL2_SL0_MASK) >> VTCR_EL2_SL0_SHIFT;
+
+	return walk_nested_s2_pgd(vcpu, gipa, &wi, result);
+}
+
 /* Must be called with kvm->lock held */
 struct kvm_s2_mmu *lookup_s2_mmu(struct kvm *kvm, u64 vttbr, u64 hcr)
 {
-- 
2.20.1



* [PATCH 37/59] KVM: arm64: nv: Handle shadow stage 2 page faults
  2019-06-21  9:37 [PATCH 00/59] KVM: arm64: ARMv8.3 Nested Virtualization support Marc Zyngier
                   ` (35 preceding siblings ...)
  2019-06-21  9:38 ` [PATCH 36/59] KVM: arm64: nv: Implement nested Stage-2 page table walk logic Marc Zyngier
@ 2019-06-21  9:38 ` Marc Zyngier
  2019-07-05 14:28   ` Alexandru Elisei
  2019-06-21  9:38 ` [PATCH 38/59] KVM: arm64: nv: Unmap/flush shadow stage 2 page tables Marc Zyngier
                   ` (23 subsequent siblings)
  60 siblings, 1 reply; 176+ messages in thread
From: Marc Zyngier @ 2019-06-21  9:38 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	Julien Thierry, James Morse, Suzuki K Poulose

From: Christoffer Dall <christoffer.dall@linaro.org>

If we are faulting on a shadow stage 2 translation, we first walk the
guest hypervisor's stage 2 page table to see if it has a mapping. If
not, we inject a stage 2 page fault to the virtual EL2. Otherwise, we
create a mapping in the shadow stage 2 page table.

Note that we have to deal with two IPAs when we get a shadow stage 2
page fault. One is the address we faulted on, and is in the L2 guest
phys space. The other is from the guest stage-2 page table walk, and is
in the L1 guest phys space. To differentiate them, we rename the
variables so that fault_ipa is used for the former and ipa is used for
the latter.
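
Condensed, the resolution flow looks like this (a sketch assuming the
helpers introduced in this patch; not a drop-in implementation):

	static int shadow_s2_fault_sketch(struct kvm_vcpu *vcpu,
					  phys_addr_t fault_ipa) /* L2 IPA */
	{
		struct kvm_s2_trans trans;
		int ret;

		/* Walk the guest hypervisor's stage 2 page tables. */
		ret = kvm_walk_nested_s2(vcpu, fault_ipa, &trans);
		if (kvm_s2_trans_esr(&trans))
			kvm_inject_s2_fault(vcpu, kvm_s2_trans_esr(&trans));
		if (ret)
			return ret;

		/*
		 * trans.output is the L1 IPA ("ipa" in this patch); the
		 * shadow entry maps fault_ipa to the host physical page
		 * backing that L1 IPA.
		 */
		return 0;
	}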

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm/include/asm/kvm_mmu.h       | 52 +++++++++++++++
 arch/arm64/include/asm/kvm_emulate.h |  6 ++
 arch/arm64/include/asm/kvm_nested.h  | 20 +++++-
 arch/arm64/kvm/nested.c              | 41 ++++++++++++
 virt/kvm/arm/mmio.c                  | 12 ++--
 virt/kvm/arm/mmu.c                   | 99 ++++++++++++++++++++++------
 6 files changed, 203 insertions(+), 27 deletions(-)

diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index e6984b6da2ce..afabf1fd1d17 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -423,6 +423,58 @@ static inline void kvm_set_ipa_limit(void) {}
 static inline void kvm_init_s2_mmu(struct kvm_s2_mmu *mmu) {}
 static inline void kvm_init_nested(struct kvm *kvm) {}
 
+struct kvm_s2_trans {};
+static inline phys_addr_t kvm_s2_trans_output(struct kvm_s2_trans *trans)
+{
+	BUG();
+}
+
+static inline unsigned long kvm_s2_trans_size(struct kvm_s2_trans *trans)
+{
+	BUG();
+}
+
+static inline u32 kvm_s2_trans_esr(struct kvm_s2_trans *trans)
+{
+	BUG();
+}
+
+static inline int kvm_walk_nested_s2(struct kvm_vcpu *vcpu, phys_addr_t ipa,
+				     struct kvm_s2_trans *trans)
+{
+	BUG();
+}
+
+static inline int kvm_s2_handle_perm_fault(struct kvm_vcpu *vcpu,
+					   struct kvm_s2_trans *trans)
+{
+	BUG();
+}
+
+static inline void kvm_inject_s2_fault(struct kvm_vcpu *vcpu, u32 esr)
+{
+	BUG();
+}
+
+static inline bool kvm_s2_trans_readable(struct kvm_s2_trans *trans)
+{
+	BUG();
+}
+
+static inline bool kvm_s2_trans_writable(struct kvm_s2_trans *trans)
+{
+	BUG();
+}
+
+static inline void kvm_nested_s2_flush(struct kvm *kvm) {}
+static inline void kvm_nested_s2_wp(struct kvm *kvm) {}
+static inline void kvm_nested_s2_clear(struct kvm *kvm) {}
+
+static inline bool kvm_is_shadow_s2_fault(struct kvm_vcpu *vcpu)
+{
+	return false;
+}
+
 static __always_inline u64 kvm_get_vttbr(struct kvm_s2_mmu *mmu)
 {
 	struct kvm_vmid *vmid = &mmu->vmid;
diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 73d8c54a52c6..b49a47f3daa8 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -606,4 +606,10 @@ static inline void __hyp_text __kvm_skip_instr(struct kvm_vcpu *vcpu)
 	write_sysreg_el2(*vcpu_pc(vcpu), SYS_ELR);
 }
 
+static inline bool kvm_is_shadow_s2_fault(struct kvm_vcpu *vcpu)
+{
+	return (vcpu->arch.hw_mmu != &vcpu->kvm->arch.mmu &&
+		vcpu->arch.hw_mmu->nested_stage2_enabled);
+}
+
 #endif /* __ARM64_KVM_EMULATE_H__ */
diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
index 686ba53379ab..052d46d96201 100644
--- a/arch/arm64/include/asm/kvm_nested.h
+++ b/arch/arm64/include/asm/kvm_nested.h
@@ -19,7 +19,7 @@ extern void kvm_vcpu_put_hw_mmu(struct kvm_vcpu *vcpu);
 
 struct kvm_s2_trans {
 	phys_addr_t output;
-	phys_addr_t block_size;
+	unsigned long block_size;
 	bool writable;
 	bool readable;
 	int level;
@@ -27,9 +27,27 @@ struct kvm_s2_trans {
 	u64 upper_attr;
 };
 
+static inline phys_addr_t kvm_s2_trans_output(struct kvm_s2_trans *trans)
+{
+	return trans->output;
+}
+
+static inline unsigned long kvm_s2_trans_size(struct kvm_s2_trans *trans)
+{
+	return trans->block_size;
+}
+
+static inline u32 kvm_s2_trans_esr(struct kvm_s2_trans *trans)
+{
+	return trans->esr;
+}
+
 extern int kvm_walk_nested_s2(struct kvm_vcpu *vcpu, phys_addr_t gipa,
 			      struct kvm_s2_trans *result);
 
+extern int kvm_s2_handle_perm_fault(struct kvm_vcpu *vcpu,
+				    struct kvm_s2_trans *trans);
+extern int kvm_inject_s2_fault(struct kvm_vcpu *vcpu, u64 esr_el2);
 int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe);
 extern bool forward_traps(struct kvm_vcpu *vcpu, u64 control_bit);
 extern bool forward_nv_traps(struct kvm_vcpu *vcpu);
diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
index 6a9bd68b769b..023027fa2db5 100644
--- a/arch/arm64/kvm/nested.c
+++ b/arch/arm64/kvm/nested.c
@@ -300,6 +300,8 @@ int kvm_walk_nested_s2(struct kvm_vcpu *vcpu, phys_addr_t gipa,
 	u64 vtcr = vcpu_read_sys_reg(vcpu, VTCR_EL2);
 	struct s2_walk_info wi;
 
+	result->esr = 0;
+
 	if (!nested_virt_in_use(vcpu))
 		return 0;
 
@@ -415,6 +417,45 @@ void kvm_vcpu_put_hw_mmu(struct kvm_vcpu *vcpu)
 	}
 }
 
+/*
+ * Returns non-zero if permission fault is handled by injecting it to the next
+ * level hypervisor.
+ */
+int kvm_s2_handle_perm_fault(struct kvm_vcpu *vcpu, struct kvm_s2_trans *trans)
+{
+	unsigned long fault_status = kvm_vcpu_trap_get_fault_type(vcpu);
+	bool forward_fault = false;
+
+	trans->esr = 0;
+
+	if (fault_status != FSC_PERM)
+		return 0;
+
+	if (kvm_vcpu_trap_is_iabt(vcpu)) {
+		forward_fault = (trans->upper_attr & PTE_S2_XN);
+	} else {
+		bool write_fault = kvm_is_write_fault(vcpu);
+
+		forward_fault = ((write_fault && !trans->writable) ||
+				 (!write_fault && !trans->readable));
+	}
+
+	if (forward_fault) {
+		trans->esr = esr_s2_fault(vcpu, trans->level, ESR_ELx_FSC_PERM);
+		return 1;
+	}
+
+	return 0;
+}
+
+int kvm_inject_s2_fault(struct kvm_vcpu *vcpu, u64 esr_el2)
+{
+	vcpu_write_sys_reg(vcpu, vcpu->arch.fault.far_el2, FAR_EL2);
+	vcpu_write_sys_reg(vcpu, vcpu->arch.fault.hpfar_el2, HPFAR_EL2);
+
+	return kvm_inject_nested_sync(vcpu, esr_el2);
+}
+
 /*
  * Inject wfx to the virtual EL2 if this is not from the virtual EL2 and
  * the virtual HCR_EL2.TWX is set. Otherwise, let the host hypervisor
diff --git a/virt/kvm/arm/mmio.c b/virt/kvm/arm/mmio.c
index a8a6a0c883f1..2b5de8388bf4 100644
--- a/virt/kvm/arm/mmio.c
+++ b/virt/kvm/arm/mmio.c
@@ -142,7 +142,7 @@ static int decode_hsr(struct kvm_vcpu *vcpu, bool *is_write, int *len)
 }
 
 int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run,
-		 phys_addr_t fault_ipa)
+		 phys_addr_t ipa)
 {
 	unsigned long data;
 	unsigned long rt;
@@ -171,22 +171,22 @@ int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run,
 		data = vcpu_data_guest_to_host(vcpu, vcpu_get_reg(vcpu, rt),
 					       len);
 
-		trace_kvm_mmio(KVM_TRACE_MMIO_WRITE, len, fault_ipa, &data);
+		trace_kvm_mmio(KVM_TRACE_MMIO_WRITE, len, ipa, &data);
 		kvm_mmio_write_buf(data_buf, len, data);
 
-		ret = kvm_io_bus_write(vcpu, KVM_MMIO_BUS, fault_ipa, len,
+		ret = kvm_io_bus_write(vcpu, KVM_MMIO_BUS, ipa, len,
 				       data_buf);
 	} else {
 		trace_kvm_mmio(KVM_TRACE_MMIO_READ_UNSATISFIED, len,
-			       fault_ipa, NULL);
+			       ipa, NULL);
 
-		ret = kvm_io_bus_read(vcpu, KVM_MMIO_BUS, fault_ipa, len,
+		ret = kvm_io_bus_read(vcpu, KVM_MMIO_BUS, ipa, len,
 				      data_buf);
 	}
 
 	/* Now prepare kvm_run for the potential return to userland. */
 	run->mmio.is_write	= is_write;
-	run->mmio.phys_addr	= fault_ipa;
+	run->mmio.phys_addr	= ipa;
 	run->mmio.len		= len;
 
 	if (!ret) {
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index faa61a81c8cc..3c7845832db8 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -1384,7 +1384,8 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
 	return ret;
 }
 
-static bool transparent_hugepage_adjust(kvm_pfn_t *pfnp, phys_addr_t *ipap)
+static bool transparent_hugepage_adjust(kvm_pfn_t *pfnp, phys_addr_t *ipap,
+					phys_addr_t *fault_ipap)
 {
 	kvm_pfn_t pfn = *pfnp;
 	gfn_t gfn = *ipap >> PAGE_SHIFT;
@@ -1418,6 +1419,7 @@ static bool transparent_hugepage_adjust(kvm_pfn_t *pfnp, phys_addr_t *ipap)
 		mask = PTRS_PER_PMD - 1;
 		VM_BUG_ON((gfn & mask) != (pfn & mask));
 		if (pfn & mask) {
+			*fault_ipap &= PMD_MASK;
 			*ipap &= PMD_MASK;
 			kvm_release_pfn_clean(pfn);
 			pfn &= ~mask;
@@ -1681,14 +1683,16 @@ static bool fault_supports_stage2_huge_mapping(struct kvm_memory_slot *memslot,
 }
 
 static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
-			  struct kvm_memory_slot *memslot, unsigned long hva,
-			  unsigned long fault_status)
+			  struct kvm_s2_trans *nested,
+			  struct kvm_memory_slot *memslot,
+			  unsigned long hva, unsigned long fault_status)
 {
 	int ret;
-	bool write_fault, writable, force_pte = false;
+	bool write_fault, writable;
 	bool exec_fault, needs_exec;
 	unsigned long mmu_seq;
-	gfn_t gfn = fault_ipa >> PAGE_SHIFT;
+	phys_addr_t ipa = fault_ipa;
+	gfn_t gfn;
 	struct kvm *kvm = vcpu->kvm;
 	struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
 	struct vm_area_struct *vma;
@@ -1697,6 +1701,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	bool logging_active = memslot_is_logging(memslot);
 	unsigned long vma_pagesize, flags = 0;
 	struct kvm_s2_mmu *mmu = vcpu->arch.hw_mmu;
+	unsigned long max_map_size = PUD_SIZE;
 
 	write_fault = kvm_is_write_fault(vcpu);
 	exec_fault = kvm_vcpu_trap_is_iabt(vcpu);
@@ -1717,11 +1722,26 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	}
 
 	vma_pagesize = vma_kernel_pagesize(vma);
-	if (logging_active ||
-	    !fault_supports_stage2_huge_mapping(memslot, hva, vma_pagesize)) {
-		force_pte = true;
-		vma_pagesize = PAGE_SIZE;
+
+	if (!fault_supports_stage2_huge_mapping(memslot, hva, vma_pagesize))
+		max_map_size = PAGE_SIZE;
+
+	if (logging_active)
+		max_map_size = PAGE_SIZE;
+
+	if (kvm_is_shadow_s2_fault(vcpu)) {
+		ipa = kvm_s2_trans_output(nested);
+
+		/*
+		 * If we're about to create a shadow stage 2 entry, then we
+		 * can only create a block mapping if the guest stage 2 page
+		 * table uses at least as big a mapping.
+		 */
+		max_map_size = min(kvm_s2_trans_size(nested), max_map_size);
 	}
+	gfn = ipa >> PAGE_SHIFT;
+
+	vma_pagesize = min(vma_pagesize, max_map_size);
 
 	/*
 	 * The stage2 has a minimum of 2 level table (For arm64 see
@@ -1731,8 +1751,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	 * 3 levels, i.e, PMD is not folded.
 	 */
 	if (vma_pagesize == PMD_SIZE ||
-	    (vma_pagesize == PUD_SIZE && kvm_stage2_has_pmd(kvm)))
-		gfn = (fault_ipa & huge_page_mask(hstate_vma(vma))) >> PAGE_SHIFT;
+	    (vma_pagesize == PUD_SIZE && kvm_stage2_has_pmd(kvm))) {
+		gfn = (ipa & huge_page_mask(hstate_vma(vma))) >> PAGE_SHIFT;
+	}
 	up_read(&current->mm->mmap_sem);
 
 	/* We need minimum second+third level pages */
@@ -1784,7 +1805,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	if (mmu_notifier_retry(kvm, mmu_seq))
 		goto out_unlock;
 
-	if (vma_pagesize == PAGE_SIZE && !force_pte) {
+	if (vma_pagesize == PAGE_SIZE && max_map_size >= PMD_SIZE) {
 		/*
 		 * Only PMD_SIZE transparent hugepages(THP) are
 		 * currently supported. This code will need to be
@@ -1794,7 +1815,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		 * aligned and that the block is contained within the memslot.
 		 */
 		if (fault_supports_stage2_huge_mapping(memslot, hva, PMD_SIZE) &&
-		    transparent_hugepage_adjust(&pfn, &fault_ipa))
+		    transparent_hugepage_adjust(&pfn, &ipa, &fault_ipa))
 			vma_pagesize = PMD_SIZE;
 	}
 
@@ -1919,8 +1940,10 @@ static void handle_access_fault(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
 int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
 {
 	unsigned long fault_status;
-	phys_addr_t fault_ipa;
+	phys_addr_t fault_ipa; /* The address we faulted on */
+	phys_addr_t ipa; /* Always the IPA in the L1 guest phys space */
 	struct kvm_memory_slot *memslot;
+	struct kvm_s2_trans nested_trans;
 	unsigned long hva;
 	bool is_iabt, write_fault, writable;
 	gfn_t gfn;
@@ -1928,7 +1951,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
 
 	fault_status = kvm_vcpu_trap_get_fault_type(vcpu);
 
-	fault_ipa = kvm_vcpu_get_fault_ipa(vcpu);
+	ipa = fault_ipa = kvm_vcpu_get_fault_ipa(vcpu);
 	is_iabt = kvm_vcpu_trap_is_iabt(vcpu);
 
 	/* Synchronous External Abort? */
@@ -1952,6 +1975,12 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
 	/* Check the stage-2 fault is trans. fault or write fault */
 	if (fault_status != FSC_FAULT && fault_status != FSC_PERM &&
 	    fault_status != FSC_ACCESS) {
+		/*
+		 * We must never see an address size fault on a shadow stage 2
+		 * page table walk, because we would have injected an address
+		 * size fault when we walked the nested s2 page tables and not
+		 * created the shadow entry.
+		 */
 		kvm_err("Unsupported FSC: EC=%#x xFSC=%#lx ESR_EL2=%#lx\n",
 			kvm_vcpu_trap_get_class(vcpu),
 			(unsigned long)kvm_vcpu_trap_get_fault(vcpu),
@@ -1961,7 +1990,36 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
 
 	idx = srcu_read_lock(&vcpu->kvm->srcu);
 
-	gfn = fault_ipa >> PAGE_SHIFT;
+	/*
+	 * We may have faulted on a shadow stage 2 page table if we are
+	 * running a nested guest.  In this case, we have to resolve the L2
+	 * IPA to the L1 IPA first, before knowing what kind of memory should
+	 * back the L1 IPA.
+	 *
+	 * If the shadow stage 2 page table walk faults, then we simply inject
+	 * this to the guest and carry on.
+	 */
+	if (kvm_is_shadow_s2_fault(vcpu)) {
+		u32 esr;
+
+		ret = kvm_walk_nested_s2(vcpu, fault_ipa, &nested_trans);
+		esr = kvm_s2_trans_esr(&nested_trans);
+		if (esr)
+			kvm_inject_s2_fault(vcpu, esr);
+		if (ret)
+			goto out_unlock;
+
+		ret = kvm_s2_handle_perm_fault(vcpu, &nested_trans);
+		esr = kvm_s2_trans_esr(&nested_trans);
+		if (esr)
+			kvm_inject_s2_fault(vcpu, esr);
+		if (ret)
+			goto out_unlock;
+
+		ipa = kvm_s2_trans_output(&nested_trans);
+	}
+
+	gfn = ipa >> PAGE_SHIFT;
 	memslot = gfn_to_memslot(vcpu->kvm, gfn);
 	hva = gfn_to_hva_memslot_prot(memslot, gfn, &writable);
 	write_fault = kvm_is_write_fault(vcpu);
@@ -1995,13 +2053,13 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		 * faulting VA. This is always 12 bits, irrespective
 		 * of the page size.
 		 */
-		fault_ipa |= kvm_vcpu_get_hfar(vcpu) & ((1 << 12) - 1);
-		ret = io_mem_abort(vcpu, run, fault_ipa);
+		ipa |= kvm_vcpu_get_hfar(vcpu) & ((1 << 12) - 1);
+		ret = io_mem_abort(vcpu, run, ipa);
 		goto out_unlock;
 	}
 
 	/* Userspace should not be able to register out-of-bounds IPAs */
-	VM_BUG_ON(fault_ipa >= kvm_phys_size(vcpu->kvm));
+	VM_BUG_ON(ipa >= kvm_phys_size(vcpu->kvm));
 
 	if (fault_status == FSC_ACCESS) {
 		handle_access_fault(vcpu, fault_ipa);
@@ -2009,7 +2067,8 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		goto out_unlock;
 	}
 
-	ret = user_mem_abort(vcpu, fault_ipa, memslot, hva, fault_status);
+	ret = user_mem_abort(vcpu, fault_ipa, &nested_trans,
+			     memslot, hva, fault_status);
 	if (ret == 0)
 		ret = 1;
 out_unlock:
-- 
2.20.1



* [PATCH 38/59] KVM: arm64: nv: Unmap/flush shadow stage 2 page tables
  2019-06-21  9:37 [PATCH 00/59] KVM: arm64: ARMv8.3 Nested Virtualization support Marc Zyngier
                   ` (36 preceding siblings ...)
  2019-06-21  9:38 ` [PATCH 37/59] KVM: arm64: nv: Handle shadow stage 2 page faults Marc Zyngier
@ 2019-06-21  9:38 ` Marc Zyngier
  2019-07-01  8:03   ` Julien Thierry
  2019-06-21  9:38 ` [PATCH 39/59] KVM: arm64: nv: Move last_vcpu_ran to be per s2 mmu Marc Zyngier
                   ` (22 subsequent siblings)
  60 siblings, 1 reply; 176+ messages in thread
From: Marc Zyngier @ 2019-06-21  9:38 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	Julien Thierry, James Morse, Suzuki K Poulose

From: Christoffer Dall <christoffer.dall@linaro.org>

Unmap/flush shadow stage 2 page tables for the nested VMs as well as the
stage 2 page table for the guest hypervisor.
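
All three helpers added below share one shape; a condensed sketch
(illustration only, with the write-protect case shown):

	/* With kvm->mmu_lock held, visit every valid shadow MMU. */
	static void nested_s2_visit_sketch(struct kvm *kvm)
	{
		int i;

		for (i = 0; i < kvm->arch.nested_mmus_size; i++) {
			struct kvm_s2_mmu *mmu = &kvm->arch.nested_mmus[i];

			if (kvm_s2_mmu_valid(mmu))
				kvm_stage2_wp_range(mmu, 0,
						    kvm_phys_size(kvm));
		}
	}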

Note: A bunch of the code in mmu.c relating to MMU notifiers is
currently dealt with in an extremely abrupt way, for example by clearing
out an entire shadow stage-2 table. This will be handled in a more
efficient way using the reverse mapping feature in a later version of
the patch series.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm64/include/asm/kvm_mmu.h    |  3 +++
 arch/arm64/include/asm/kvm_nested.h |  3 +++
 arch/arm64/kvm/nested.c             | 39 +++++++++++++++++++++++++++
 virt/kvm/arm/arm.c                  |  4 ++-
 virt/kvm/arm/mmu.c                  | 42 +++++++++++++++++++++++------
 5 files changed, 82 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 32bcaa1845dc..f4c5ac5eb95f 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -163,6 +163,8 @@ int create_hyp_io_mappings(phys_addr_t phys_addr, size_t size,
 			   void __iomem **haddr);
 int create_hyp_exec_mappings(phys_addr_t phys_addr, size_t size,
 			     void **haddr);
+void kvm_stage2_flush_range(struct kvm_s2_mmu *mmu,
+			    phys_addr_t addr, phys_addr_t end);
 void free_hyp_pgds(void);
 
 void kvm_unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size);
@@ -171,6 +173,7 @@ int kvm_alloc_stage2_pgd(struct kvm_s2_mmu *mmu);
 void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu);
 int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
 			  phys_addr_t pa, unsigned long size, bool writable);
+void kvm_stage2_wp_range(struct kvm_s2_mmu *mmu, phys_addr_t addr, phys_addr_t end);
 
 int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run);
 
diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
index 052d46d96201..3b415bc76ced 100644
--- a/arch/arm64/include/asm/kvm_nested.h
+++ b/arch/arm64/include/asm/kvm_nested.h
@@ -48,6 +48,9 @@ extern int kvm_walk_nested_s2(struct kvm_vcpu *vcpu, phys_addr_t gipa,
 extern int kvm_s2_handle_perm_fault(struct kvm_vcpu *vcpu,
 				    struct kvm_s2_trans *trans);
 extern int kvm_inject_s2_fault(struct kvm_vcpu *vcpu, u64 esr_el2);
+extern void kvm_nested_s2_wp(struct kvm *kvm);
+extern void kvm_nested_s2_clear(struct kvm *kvm);
+extern void kvm_nested_s2_flush(struct kvm *kvm);
 int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe);
 extern bool forward_traps(struct kvm_vcpu *vcpu, u64 control_bit);
 extern bool forward_nv_traps(struct kvm_vcpu *vcpu);
diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
index 023027fa2db5..8880033fb6e0 100644
--- a/arch/arm64/kvm/nested.c
+++ b/arch/arm64/kvm/nested.c
@@ -456,6 +456,45 @@ int kvm_inject_s2_fault(struct kvm_vcpu *vcpu, u64 esr_el2)
 	return kvm_inject_nested_sync(vcpu, esr_el2);
 }
 
+/* expects kvm->mmu_lock to be held */
+void kvm_nested_s2_wp(struct kvm *kvm)
+{
+	int i;
+
+	for (i = 0; i < kvm->arch.nested_mmus_size; i++) {
+		struct kvm_s2_mmu *mmu = &kvm->arch.nested_mmus[i];
+
+		if (kvm_s2_mmu_valid(mmu))
+			kvm_stage2_wp_range(mmu, 0, kvm_phys_size(kvm));
+	}
+}
+
+/* expects kvm->mmu_lock to be held */
+void kvm_nested_s2_clear(struct kvm *kvm)
+{
+	int i;
+
+	for (i = 0; i < kvm->arch.nested_mmus_size; i++) {
+		struct kvm_s2_mmu *mmu = &kvm->arch.nested_mmus[i];
+
+		if (kvm_s2_mmu_valid(mmu))
+			kvm_unmap_stage2_range(mmu, 0, kvm_phys_size(kvm));
+	}
+}
+
+/* expects kvm->mmu_lock to be held */
+void kvm_nested_s2_flush(struct kvm *kvm)
+{
+	int i;
+
+	for (i = 0; i < kvm->arch.nested_mmus_size; i++) {
+		struct kvm_s2_mmu *mmu = &kvm->arch.nested_mmus[i];
+
+		if (kvm_s2_mmu_valid(mmu))
+			kvm_stage2_flush_range(mmu, 0, kvm_phys_size(kvm));
+	}
+}
+
 /*
  * Inject wfx to the virtual EL2 if this is not from the virtual EL2 and
  * the virtual HCR_EL2.TWX is set. Otherwise, let the host hypervisor
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index 4e3cbfa1ecbe..bcca27d5c481 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -1005,8 +1005,10 @@ static int kvm_arch_vcpu_ioctl_vcpu_init(struct kvm_vcpu *vcpu,
 	 * Ensure a rebooted VM will fault in RAM pages and detect if the
 	 * guest MMU is turned off and flush the caches as needed.
 	 */
-	if (vcpu->arch.has_run_once)
+	if (vcpu->arch.has_run_once) {
 		stage2_unmap_vm(vcpu->kvm);
+		kvm_nested_s2_clear(vcpu->kvm);
+	}
 
 	vcpu_reset_hcr(vcpu);
 
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index 3c7845832db8..94d400e7af57 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -416,12 +416,10 @@ static void stage2_flush_puds(struct kvm_s2_mmu *mmu, pgd_t *pgd,
 	} while (pud++, addr = next, addr != end);
 }
 
-static void stage2_flush_memslot(struct kvm_s2_mmu *mmu,
-				 struct kvm_memory_slot *memslot)
+void kvm_stage2_flush_range(struct kvm_s2_mmu *mmu,
+			    phys_addr_t addr, phys_addr_t end)
 {
 	struct kvm *kvm = mmu->kvm;
-	phys_addr_t addr = memslot->base_gfn << PAGE_SHIFT;
-	phys_addr_t end = addr + PAGE_SIZE * memslot->npages;
 	phys_addr_t next;
 	pgd_t *pgd;
 
@@ -433,6 +431,15 @@ static void stage2_flush_memslot(struct kvm_s2_mmu *mmu,
 	} while (pgd++, addr = next, addr != end);
 }
 
+static void stage2_flush_memslot(struct kvm_s2_mmu *mmu,
+				 struct kvm_memory_slot *memslot)
+{
+	phys_addr_t addr = memslot->base_gfn << PAGE_SHIFT;
+	phys_addr_t end = addr + PAGE_SIZE * memslot->npages;
+
+	kvm_stage2_flush_range(mmu, addr, end);
+}
+
 /**
  * stage2_flush_vm - Invalidate cache for pages mapped in stage 2
  * @kvm: The struct kvm pointer
@@ -453,6 +460,8 @@ static void stage2_flush_vm(struct kvm *kvm)
 	kvm_for_each_memslot(memslot, slots)
 		stage2_flush_memslot(&kvm->arch.mmu, memslot);
 
+	kvm_nested_s2_flush(kvm);
+
 	spin_unlock(&kvm->mmu_lock);
 	srcu_read_unlock(&kvm->srcu, idx);
 }
@@ -1509,12 +1518,12 @@ static void  stage2_wp_puds(struct kvm_s2_mmu *mmu, pgd_t *pgd,
 }
 
 /**
- * stage2_wp_range() - write protect stage2 memory region range
+ * kvm_stage2_wp_range() - write protect stage2 memory region range
  * @kvm:	The KVM pointer
  * @addr:	Start address of range
  * @end:	End address of range
  */
-static void stage2_wp_range(struct kvm_s2_mmu *mmu, phys_addr_t addr, phys_addr_t end)
+void kvm_stage2_wp_range(struct kvm_s2_mmu *mmu, phys_addr_t addr, phys_addr_t end)
 {
 	struct kvm *kvm = mmu->kvm;
 	pgd_t *pgd;
@@ -1561,7 +1570,8 @@ void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot)
 	phys_addr_t end = (memslot->base_gfn + memslot->npages) << PAGE_SHIFT;
 
 	spin_lock(&kvm->mmu_lock);
-	stage2_wp_range(&kvm->arch.mmu, start, end);
+	kvm_stage2_wp_range(&kvm->arch.mmu, start, end);
+	kvm_nested_s2_wp(kvm);
 	spin_unlock(&kvm->mmu_lock);
 	kvm_flush_remote_tlbs(kvm);
 }
@@ -1585,7 +1595,7 @@ static void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
 	phys_addr_t start = (base_gfn +  __ffs(mask)) << PAGE_SHIFT;
 	phys_addr_t end = (base_gfn + __fls(mask) + 1) << PAGE_SHIFT;
 
-	stage2_wp_range(&kvm->arch.mmu, start, end);
+	kvm_stage2_wp_range(&kvm->arch.mmu, start, end);
 }
 
 /*
@@ -1600,6 +1610,7 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
 		gfn_t gfn_offset, unsigned long mask)
 {
 	kvm_mmu_write_protect_pt_masked(kvm, slot, gfn_offset, mask);
+	kvm_nested_s2_wp(kvm);
 }
 
 static void clean_dcache_guest_page(kvm_pfn_t pfn, unsigned long size)
@@ -2111,6 +2122,7 @@ static int handle_hva_to_gpa(struct kvm *kvm,
 static int kvm_unmap_hva_handler(struct kvm *kvm, gpa_t gpa, u64 size, void *data)
 {
 	kvm_unmap_stage2_range(&kvm->arch.mmu, gpa, size);
+	kvm_nested_s2_clear(kvm);
 	return 0;
 }
 
@@ -2138,6 +2150,7 @@ static int kvm_set_spte_handler(struct kvm *kvm, gpa_t gpa, u64 size, void *data
 	 * through this calling path.
 	 */
 	stage2_set_pte(&kvm->arch.mmu, NULL, gpa, pte, 0);
+	kvm_nested_s2_clear(kvm);
 	return 0;
 }
 
@@ -2180,6 +2193,13 @@ static int kvm_age_hva_handler(struct kvm *kvm, gpa_t gpa, u64 size, void *data)
 		return stage2_pmdp_test_and_clear_young(pmd);
 	else
 		return stage2_ptep_test_and_clear_young(pte);
+
+	/*
+	 * TODO: Handle nested_mmu structures here using the reverse
+	 * mapping, in a later version of this patch series. Note that
+	 * both branches above return, so shadow S2 entries are simply
+	 * not aged yet.
+	 */
 }
 
 static int kvm_test_age_hva_handler(struct kvm *kvm, gpa_t gpa, u64 size, void *data)
@@ -2198,6 +2218,11 @@ static int kvm_test_age_hva_handler(struct kvm *kvm, gpa_t gpa, u64 size, void *
 		return pmd_young(*pmd);
 	else
 		return pte_young(*pte);
+
+	/*
+	 * TODO: Handle nested_mmu structures here using the reverse mapping in
+	 * a later version of the patch series.
+	 */
 }
 
 int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end)
@@ -2455,6 +2480,7 @@ void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
 
 	spin_lock(&kvm->mmu_lock);
 	kvm_unmap_stage2_range(&kvm->arch.mmu, gpa, size);
+	kvm_nested_s2_clear(kvm);
 	spin_unlock(&kvm->mmu_lock);
 }
 
-- 
2.20.1



* [PATCH 39/59] KVM: arm64: nv: Move last_vcpu_ran to be per s2 mmu
  2019-06-21  9:37 [PATCH 00/59] KVM: arm64: ARMv8.3 Nested Virtualization support Marc Zyngier
                   ` (37 preceding siblings ...)
  2019-06-21  9:38 ` [PATCH 38/59] KVM: arm64: nv: Unmap/flush shadow stage 2 page tables Marc Zyngier
@ 2019-06-21  9:38 ` Marc Zyngier
  2019-07-01  9:10   ` Julien Thierry
  2019-07-05 15:28   ` Alexandru Elisei
  2019-06-21  9:38 ` [PATCH 40/59] KVM: arm64: nv: Don't always start an S2 MMU search from the beginning Marc Zyngier
                   ` (21 subsequent siblings)
  60 siblings, 2 replies; 176+ messages in thread
From: Marc Zyngier @ 2019-06-21  9:38 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	Julien Thierry, James Morse, Suzuki K Poulose

last_vcpu_ran has to be per S2 MMU now that we can have multiple S2
MMUs per VM. Let's take this opportunity to perform some cleanup.
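
For context, the consumer of last_vcpu_ran is the vcpu_load path, which
uses it to detect that a different vCPU last ran this translation
context on the current physical CPU, and flushes the local TLB in that
case. A rough sketch of how the check looks with the per-MMU field
(based on the existing kvm_arch_vcpu_load() logic; details may differ):

	int *last_ran;

	/* Staleness is now tracked per stage-2 MMU rather than per VM */
	last_ran = this_cpu_ptr(vcpu->arch.hw_mmu->last_vcpu_ran);
	if (*last_ran != vcpu->vcpu_id) {
		/* Another vCPU used this MMU here: nuke the local TLB */
		kvm_call_hyp(__kvm_tlb_flush_local_vmid, vcpu);
		*last_ran = vcpu->vcpu_id;
	}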

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm/include/asm/kvm_host.h   |  6 +++---
 arch/arm/include/asm/kvm_mmu.h    |  2 +-
 arch/arm64/include/asm/kvm_host.h |  6 +++---
 arch/arm64/include/asm/kvm_mmu.h  |  2 +-
 arch/arm64/kvm/nested.c           | 13 ++++++-------
 virt/kvm/arm/arm.c                | 22 ++++------------------
 virt/kvm/arm/mmu.c                | 26 ++++++++++++++++++++------
 7 files changed, 38 insertions(+), 39 deletions(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index b821eb2383ad..cc761610e41e 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -63,15 +63,15 @@ struct kvm_s2_mmu {
 	pgd_t *pgd;
 	phys_addr_t pgd_phys;
 
+	/* The last vcpu id that ran on each physical CPU */
+	int __percpu *last_vcpu_ran;
+
 	struct kvm *kvm;
 };
 
 struct kvm_arch {
 	struct kvm_s2_mmu mmu;
 
-	/* The last vcpu id that ran on each physical CPU */
-	int __percpu *last_vcpu_ran;
-
 	/* Stage-2 page table */
 	pgd_t *pgd;
 	phys_addr_t pgd_phys;
diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index afabf1fd1d17..7a6e9008ed45 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -52,7 +52,7 @@ int create_hyp_exec_mappings(phys_addr_t phys_addr, size_t size,
 void free_hyp_pgds(void);
 
 void stage2_unmap_vm(struct kvm *kvm);
-int kvm_alloc_stage2_pgd(struct kvm_s2_mmu *mmu);
+int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu);
 void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu);
 int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
 			  phys_addr_t pa, unsigned long size, bool writable);
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index cc238de170d2..b71a7a237f95 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -104,6 +104,9 @@ struct kvm_s2_mmu {
 	 * >0: Somebody is actively using this.
 	 */
 	atomic_t refcnt;
+
+	/* The last vcpu id that ran on each physical CPU */
+	int __percpu *last_vcpu_ran;
 };
 
 static inline bool kvm_s2_mmu_valid(struct kvm_s2_mmu *mmu)
@@ -124,9 +127,6 @@ struct kvm_arch {
 	/* VTCR_EL2 value for this VM */
 	u64    vtcr;
 
-	/* The last vcpu id that ran on each physical CPU */
-	int __percpu *last_vcpu_ran;
-
 	/* The maximum number of vCPUs depends on the used GIC model */
 	int max_vcpus;
 
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index f4c5ac5eb95f..53103607065a 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -169,7 +169,7 @@ void free_hyp_pgds(void);
 
 void kvm_unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size);
 void stage2_unmap_vm(struct kvm *kvm);
-int kvm_alloc_stage2_pgd(struct kvm_s2_mmu *mmu);
+int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu);
 void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu);
 int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
 			  phys_addr_t pa, unsigned long size, bool writable);
diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
index 8880033fb6e0..09afafbdc8fe 100644
--- a/arch/arm64/kvm/nested.c
+++ b/arch/arm64/kvm/nested.c
@@ -52,18 +52,17 @@ int kvm_vcpu_init_nested(struct kvm_vcpu *vcpu)
 			 GFP_KERNEL | __GFP_ZERO);
 
 	if (tmp) {
-		if (tmp != kvm->arch.nested_mmus)
+		if (tmp != kvm->arch.nested_mmus) {
 			kfree(kvm->arch.nested_mmus);
+			kvm->arch.nested_mmus = NULL;
+			kvm->arch.nested_mmus_size = 0;
+		}
 
-		tmp[num_mmus - 1].kvm = kvm;
-		atomic_set(&tmp[num_mmus - 1].refcnt, 0);
-		ret = kvm_alloc_stage2_pgd(&tmp[num_mmus - 1]);
+		ret = kvm_init_stage2_mmu(kvm, &tmp[num_mmus - 1]);
 		if (ret)
 			goto out;
 
-		tmp[num_mmus - 2].kvm = kvm;
-		atomic_set(&tmp[num_mmus - 2].refcnt, 0);
-		ret = kvm_alloc_stage2_pgd(&tmp[num_mmus - 2]);
+		ret = kvm_init_stage2_mmu(kvm, &tmp[num_mmus - 2]);
 		if (ret) {
 			kvm_free_stage2_pgd(&tmp[num_mmus - 1]);
 			goto out;
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index bcca27d5c481..e8b584b79847 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -99,29 +99,21 @@ void kvm_arch_check_processor_compat(void *rtn)
 	*(int *)rtn = 0;
 }
 
-
 /**
  * kvm_arch_init_vm - initializes a VM data structure
  * @kvm:	pointer to the KVM struct
  */
 int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 {
-	int ret, cpu;
+	int ret;
 
 	ret = kvm_arm_setup_stage2(kvm, type);
 	if (ret)
 		return ret;
 
-	kvm->arch.last_vcpu_ran = alloc_percpu(typeof(*kvm->arch.last_vcpu_ran));
-	if (!kvm->arch.last_vcpu_ran)
-		return -ENOMEM;
-
-	for_each_possible_cpu(cpu)
-		*per_cpu_ptr(kvm->arch.last_vcpu_ran, cpu) = -1;
-
-	ret = kvm_alloc_stage2_pgd(&kvm->arch.mmu);
+	ret = kvm_init_stage2_mmu(kvm, &kvm->arch.mmu);
 	if (ret)
-		goto out_fail_alloc;
+		return ret;
 
 	/* Mark the initial VMID generation invalid */
 	kvm->arch.mmu.vmid.vmid_gen = 0;
@@ -142,9 +134,6 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 	return ret;
 out_free_stage2_pgd:
 	kvm_free_stage2_pgd(&kvm->arch.mmu);
-out_fail_alloc:
-	free_percpu(kvm->arch.last_vcpu_ran);
-	kvm->arch.last_vcpu_ran = NULL;
 	return ret;
 }
 
@@ -174,9 +163,6 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
 
 	kvm_vgic_destroy(kvm);
 
-	free_percpu(kvm->arch.last_vcpu_ran);
-	kvm->arch.last_vcpu_ran = NULL;
-
 	for (i = 0; i < KVM_MAX_VCPUS; ++i) {
 		if (kvm->vcpus[i]) {
 			kvm_arch_vcpu_free(kvm->vcpus[i]);
@@ -359,7 +345,7 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	if (nested_virt_in_use(vcpu))
 		kvm_vcpu_load_hw_mmu(vcpu);
 
-	last_ran = this_cpu_ptr(vcpu->kvm->arch.last_vcpu_ran);
+	last_ran = this_cpu_ptr(vcpu->arch.hw_mmu->last_vcpu_ran);
 	cpu_data = this_cpu_ptr(&kvm_host_data);
 
 	/*
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index 94d400e7af57..6a7cba077bce 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -903,8 +903,9 @@ int create_hyp_exec_mappings(phys_addr_t phys_addr, size_t size,
 }
 
 /**
- * kvm_alloc_stage2_pgd - allocate level-1 table for stage-2 translation.
- * @mmu:	The stage 2 mmu struct pointer
+ * kvm_init_stage2_mmu - Initialise an S2 MMU structure
+ * @kvm:	The pointer to the KVM structure
+ * @mmu:	The pointer to the s2 MMU structure
  *
  * Allocates only the stage-2 HW PGD level table(s) of size defined by
  * stage2_pgd_size(mmu->kvm).
@@ -912,10 +913,11 @@ int create_hyp_exec_mappings(phys_addr_t phys_addr, size_t size,
  * Note we don't need locking here as this is only called when the VM is
  * created, which can only be done once.
  */
-int kvm_alloc_stage2_pgd(struct kvm_s2_mmu *mmu)
+int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu)
 {
 	phys_addr_t pgd_phys;
 	pgd_t *pgd;
+	int cpu;
 
 	if (mmu->pgd != NULL) {
 		kvm_err("kvm_arch already initialized?\n");
@@ -923,18 +925,28 @@ int kvm_alloc_stage2_pgd(struct kvm_s2_mmu *mmu)
 	}
 
 	/* Allocate the HW PGD, making sure that each page gets its own refcount */
-	pgd = alloc_pages_exact(stage2_pgd_size(mmu->kvm), GFP_KERNEL | __GFP_ZERO);
+	pgd = alloc_pages_exact(stage2_pgd_size(kvm), GFP_KERNEL | __GFP_ZERO);
 	if (!pgd)
 		return -ENOMEM;
 
 	pgd_phys = virt_to_phys(pgd);
-	if (WARN_ON(pgd_phys & ~kvm_vttbr_baddr_mask(mmu->kvm)))
+	if (WARN_ON(pgd_phys & ~kvm_vttbr_baddr_mask(kvm)))
 		return -EINVAL;
 
+	mmu->last_vcpu_ran = alloc_percpu(typeof(*mmu->last_vcpu_ran));
+	if (!mmu->last_vcpu_ran) {
+		free_pages_exact(pgd, stage2_pgd_size(kvm));
+		return -ENOMEM;
+	}
+
+	mmu->kvm = kvm;
 	mmu->pgd = pgd;
 	mmu->pgd_phys = pgd_phys;
 	mmu->vmid.vmid_gen = 0;
 
+	for_each_possible_cpu(cpu)
+		*per_cpu_ptr(mmu->last_vcpu_ran, cpu) = -1;
+
 	kvm_init_s2_mmu(mmu);
 
 	return 0;
@@ -1021,8 +1033,10 @@ void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu)
 	spin_unlock(&kvm->mmu_lock);
 
 	/* Free the HW pgd, one page at a time */
-	if (pgd)
+	if (pgd) {
 		free_pages_exact(pgd, stage2_pgd_size(kvm));
+		free_percpu(mmu->last_vcpu_ran);
+	}
 }
 
 static pud_t *stage2_get_pud(struct kvm_s2_mmu *mmu, struct kvm_mmu_memory_cache *cache,
-- 
2.20.1



* [PATCH 40/59] KVM: arm64: nv: Don't always start an S2 MMU search from the beginning
  2019-06-21  9:37 [PATCH 00/59] KVM: arm64: ARMv8.3 Nested Virtualization support Marc Zyngier
                   ` (38 preceding siblings ...)
  2019-06-21  9:38 ` [PATCH 39/59] KVM: arm64: nv: Move last_vcpu_ran to be per s2 mmu Marc Zyngier
@ 2019-06-21  9:38 ` Marc Zyngier
  2019-07-09  9:59   ` Alexandru Elisei
  2019-06-21  9:38 ` [PATCH 41/59] KVM: arm64: nv: Introduce sys_reg_desc.forward_trap Marc Zyngier
                   ` (20 subsequent siblings)
  60 siblings, 1 reply; 176+ messages in thread
From: Marc Zyngier @ 2019-06-21  9:38 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	Julien Thierry, James Morse, Suzuki K Poulose

Always starting an S2 MMU search from the beginning means that we're
potentially nuking a useful context (like the one we'd have for a !VHE
KVM guest).

Instead, let's always start the search from the point *after* the
last allocated context. This should ensure that alternating between
two EL1 contexts will not result in nuking the whole S2 each time.

lookup_s2_mmu now has a chance to provide a hit.
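
As a worked example (numbers purely illustrative): with
nested_mmus_size == 4 and nested_mmus_next == 2, the scan visits
indices 2, 3, 0, 1, so the most recently handed-out context is only
reconsidered as a last resort. A standalone sketch of the rotation,
where is_free() is a stand-in for the refcount check in the patch:

	int i, next = 2, size = 4;

	for (i = next; i < size + next; i++) {
		if (is_free(i % size))	/* visits 2, 3, 0, 1 */
			break;
	}
	next = (i + 1) % size;		/* a hit at i == 3 yields next == 0 */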

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm64/include/asm/kvm_host.h |  1 +
 arch/arm64/kvm/nested.c           | 14 ++++++++++++--
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index b71a7a237f95..b7c44adcdbf3 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -123,6 +123,7 @@ struct kvm_arch {
 	 */
 	struct kvm_s2_mmu *nested_mmus;
 	size_t nested_mmus_size;
+	int nested_mmus_next;
 
 	/* VTCR_EL2 value for this VM */
 	u64    vtcr;
diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
index 09afafbdc8fe..214d59019935 100644
--- a/arch/arm64/kvm/nested.c
+++ b/arch/arm64/kvm/nested.c
@@ -363,14 +363,24 @@ static struct kvm_s2_mmu *get_s2_mmu_nested(struct kvm_vcpu *vcpu)
 	if (s2_mmu)
 		goto out;
 
-	for (i = 0; i < kvm->arch.nested_mmus_size; i++) {
-		s2_mmu = &kvm->arch.nested_mmus[i];
+	/*
+	 * Make sure we don't always search from the same point, or we
+	 * will always reuse a potentially active context, leaving
+	 * free contexts unused.
+	 */
+	for (i = kvm->arch.nested_mmus_next;
+	     i < (kvm->arch.nested_mmus_size + kvm->arch.nested_mmus_next);
+	     i++) {
+		s2_mmu = &kvm->arch.nested_mmus[i % kvm->arch.nested_mmus_size];
 
 		if (atomic_read(&s2_mmu->refcnt) == 0)
 			break;
 	}
 	BUG_ON(atomic_read(&s2_mmu->refcnt)); /* We have struct MMUs to spare */
 
+	/* Set the scene for the next search */
+	kvm->arch.nested_mmus_next = (i + 1) % kvm->arch.nested_mmus_size;
+
 	if (kvm_s2_mmu_valid(s2_mmu)) {
 		/* Clear the old state */
 		kvm_unmap_stage2_range(s2_mmu, 0, kvm_phys_size(kvm));
-- 
2.20.1



* [PATCH 41/59] KVM: arm64: nv: Introduce sys_reg_desc.forward_trap
  2019-06-21  9:37 [PATCH 00/59] KVM: arm64: ARMv8.3 Nested Virtualization support Marc Zyngier
                   ` (39 preceding siblings ...)
  2019-06-21  9:38 ` [PATCH 40/59] KVM: arm64: nv: Don't always start an S2 MMU search from the beginning Marc Zyngier
@ 2019-06-21  9:38 ` Marc Zyngier
  2019-06-21  9:38 ` [PATCH 42/59] KVM: arm64: nv: Rework the system instruction emulation framework Marc Zyngier
                   ` (19 subsequent siblings)
  60 siblings, 0 replies; 176+ messages in thread
From: Marc Zyngier @ 2019-06-21  9:38 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	Julien Thierry, James Morse, Suzuki K Poulose

From: Jintack Lim <jintack.lim@linaro.org>

This introduces a function prototype to determine if we need to forward
system instruction traps to the virtual EL2. The implementation of
forward_trap functions for each system instruction will be added in
later patches.

Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm64/kvm/sys_regs.c | 9 +++++++++
 arch/arm64/kvm/sys_regs.h | 6 ++++++
 2 files changed, 15 insertions(+)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index cc994ec3c121..8fbb04ddde11 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -36,6 +36,7 @@
 #include <asm/kvm_host.h>
 #include <asm/kvm_hyp.h>
 #include <asm/kvm_mmu.h>
+#include <asm/kvm_nested.h>
 #include <asm/perf_event.h>
 #include <asm/sysreg.h>
 
@@ -2532,6 +2533,14 @@ static void perform_access(struct kvm_vcpu *vcpu,
 	 */
 	BUG_ON(!r->access);
 
+	/*
+	 * Forward this trap to the virtual EL2 if the guest hypervisor has
+	 * configured traps for the current instruction.
+	 */
+	if (nested_virt_in_use(vcpu) && r->forward_trap
+	    && unlikely(r->forward_trap(vcpu)))
+		return;
+
 	/* Skip instruction if instructed so */
 	if (likely(r->access(vcpu, params, r)))
 		kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
diff --git a/arch/arm64/kvm/sys_regs.h b/arch/arm64/kvm/sys_regs.h
index 2be99508dcb9..0702d6b401a1 100644
--- a/arch/arm64/kvm/sys_regs.h
+++ b/arch/arm64/kvm/sys_regs.h
@@ -65,6 +65,12 @@ struct sys_reg_desc {
 	int (*set_user)(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd,
 			const struct kvm_one_reg *reg, void __user *uaddr);
 
+	/*
+	 * Forward the trap to the virtual EL2 if the guest hypervisor has
+	 * configured traps for the current instruction.
+	 */
+	bool (*forward_trap)(struct kvm_vcpu *vcpu);
+
 	/* Return mask of REG_* runtime visibility overrides */
 	unsigned int (*visibility)(const struct kvm_vcpu *vcpu,
 				   const struct sys_reg_desc *rd);
-- 
2.20.1



* [PATCH 42/59] KVM: arm64: nv: Rework the system instruction emulation framework
  2019-06-21  9:37 [PATCH 00/59] KVM: arm64: ARMv8.3 Nested Virtualization support Marc Zyngier
                   ` (40 preceding siblings ...)
  2019-06-21  9:38 ` [PATCH 41/59] KVM: arm64: nv: Introduce sys_reg_desc.forward_trap Marc Zyngier
@ 2019-06-21  9:38 ` Marc Zyngier
  2019-06-21  9:38 ` [PATCH 43/59] KVM: arm64: nv: Trap and emulate AT instructions from virtual EL2 Marc Zyngier
                   ` (18 subsequent siblings)
  60 siblings, 0 replies; 176+ messages in thread
From: Marc Zyngier @ 2019-06-21  9:38 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	Julien Thierry, James Morse, Suzuki K Poulose

From: Jintack Lim <jintack.lim@linaro.org>

Rework the system instruction emulation framework so that it can
handle potentially all system instruction traps other than MSR/MRS
instructions. These are the AT and TLBI instructions, controlled by
the HCR_EL2.NV, AT, and TTLB bits.

Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
[Changed to use a generic forward_traps wrapper for forward_nv_traps]
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm64/kvm/sys_regs.c | 62 ++++++++++++++++++---------------------
 1 file changed, 28 insertions(+), 34 deletions(-)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 8fbb04ddde11..0d5b7a7c76de 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1656,7 +1656,6 @@ static bool access_sp_el1(struct kvm_vcpu *vcpu,
 	return true;
 }
 
-
 /* This function is to support the recursive nested virtualization */
 static bool forward_nv1_traps(struct kvm_vcpu *vcpu, struct sys_reg_params *p)
 {
@@ -1703,6 +1702,12 @@ static bool access_spsr_el2(struct kvm_vcpu *vcpu,
 			    struct sys_reg_params *p,
 			    const struct sys_reg_desc *r)
 {
+	if (el12_reg(p) && forward_nv_traps(vcpu))
+		return false;
+
+	if (!el12_reg(p) && forward_nv1_traps(vcpu, p))
+		return false;
+
 	if (p->is_write)
 		vcpu_write_sys_reg(vcpu, p->regval, SPSR_EL2);
 	else
@@ -1786,10 +1791,6 @@ static bool access_id_aa64pfr0_el1(struct kvm_vcpu *v,
  * more demanding guest...
  */
 static const struct sys_reg_desc sys_reg_descs[] = {
-	{ SYS_DESC(SYS_DC_ISW), access_dcsw },
-	{ SYS_DESC(SYS_DC_CSW), access_dcsw },
-	{ SYS_DESC(SYS_DC_CISW), access_dcsw },
-
 	DBG_BCR_BVR_WCR_WVR_EL1(0),
 	DBG_BCR_BVR_WCR_WVR_EL1(1),
 	{ SYS_DESC(SYS_MDCCINT_EL1), trap_debug_regs, reset_val, MDCCINT_EL1, 0 },
@@ -2134,6 +2135,14 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 	{ SYS_DESC(SYS_SP_EL2), NULL, reset_unknown, SP_EL2 },
 };
 
+#define SYS_INSN_TO_DESC(insn, access_fn, forward_fn)	\
+	{ SYS_DESC((insn)), (access_fn), NULL, 0, 0, NULL, NULL, (forward_fn) }
+static struct sys_reg_desc sys_insn_descs[] = {
+	{ SYS_DESC(SYS_DC_ISW), access_dcsw },
+	{ SYS_DESC(SYS_DC_CSW), access_dcsw },
+	{ SYS_DESC(SYS_DC_CISW), access_dcsw },
+};
+
 static bool trap_dbgidr(struct kvm_vcpu *vcpu,
 			struct sys_reg_params *p,
 			const struct sys_reg_desc *r)
@@ -2755,38 +2764,22 @@ static int emulate_sys_reg(struct kvm_vcpu *vcpu,
 	return 1;
 }
 
-static int emulate_tlbi(struct kvm_vcpu *vcpu,
-			     struct sys_reg_params *params)
+static int emulate_sys_instr(struct kvm_vcpu *vcpu, struct sys_reg_params *p)
 {
-	/* TODO: support tlbi instruction emulation*/
-	kvm_inject_undefined(vcpu);
-	return 1;
-}
-
-static int emulate_at(struct kvm_vcpu *vcpu,
-			     struct sys_reg_params *params)
-{
-	/* TODO: support address translation instruction emulation */
-	kvm_inject_undefined(vcpu);
-	return 1;
-}
-
-static int emulate_sys_instr(struct kvm_vcpu *vcpu,
-			     struct sys_reg_params *params)
-{
-	int ret = 0;
-
-	/* TLB maintenance instructions*/
-	if (params->CRn == 0b1000)
-		ret = emulate_tlbi(vcpu, params);
-	/* Address Translation instructions */
-	else if (params->CRn == 0b0111 && params->CRm == 0b1000)
-		ret = emulate_at(vcpu, params);
+	const struct sys_reg_desc *r;
 
-	if (ret)
-		kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
+	/* Search from the system instruction table. */
+	r = find_reg(p, sys_insn_descs, ARRAY_SIZE(sys_insn_descs));
 
-	return ret;
+	if (likely(r)) {
+		perform_access(vcpu, p, r);
+	} else {
+		kvm_err("Unsupported guest sys instruction at: %lx\n",
+			*vcpu_pc(vcpu));
+		print_sys_reg_instr(p);
+		kvm_inject_undefined(vcpu);
+	}
+	return 1;
 }
 
 static void reset_sys_reg_descs(struct kvm_vcpu *vcpu,
@@ -3282,6 +3275,7 @@ void kvm_sys_reg_table_init(void)
 	BUG_ON(check_sysreg_table(cp15_regs, ARRAY_SIZE(cp15_regs)));
 	BUG_ON(check_sysreg_table(cp15_64_regs, ARRAY_SIZE(cp15_64_regs)));
 	BUG_ON(check_sysreg_table(invariant_sys_regs, ARRAY_SIZE(invariant_sys_regs)));
+	BUG_ON(check_sysreg_table(sys_insn_descs, ARRAY_SIZE(sys_insn_descs)));
 
 	/* We abuse the reset function to overwrite the table itself. */
 	for (i = 0; i < ARRAY_SIZE(invariant_sys_regs); i++)
-- 
2.20.1



* [PATCH 43/59] KVM: arm64: nv: Trap and emulate AT instructions from virtual EL2
  2019-06-21  9:37 [PATCH 00/59] KVM: arm64: ARMv8.3 Nested Virtualization support Marc Zyngier
                   ` (41 preceding siblings ...)
  2019-06-21  9:38 ` [PATCH 42/59] KVM: arm64: nv: Rework the system instruction emulation framework Marc Zyngier
@ 2019-06-21  9:38 ` Marc Zyngier
  2019-07-01 15:45   ` Julien Thierry
                     ` (2 more replies)
  2019-06-21  9:38 ` [PATCH 44/59] KVM: arm64: nv: Trap and emulate TLBI " Marc Zyngier
                   ` (17 subsequent siblings)
  60 siblings, 3 replies; 176+ messages in thread
From: Marc Zyngier @ 2019-06-21  9:38 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	Julien Thierry, James Morse, Suzuki K Poulose

From: Jintack Lim <jintack.lim@linaro.org>

When supporting nested virtualization, AT instructions executed by a
guest hypervisor must be trapped and emulated by the host hypervisor:
untrapped AT instructions operating on S1E1 would use the wrong
translation regime (the one used to emulate virtual EL2 in EL1 instead
of virtual EL1), and AT instructions operating on S12 do not work from
EL1.

This patch does several things.

1. List and define all AT system instructions to emulate and document
the emulation design.

2. Implement AT instruction handling logic in EL2. This will be used to
emulate AT instructions executed in the virtual EL2.

AT instruction emulation works by loading the proper processor
context, which depends on the trapped instruction and the virtual
HCR_EL2, into the EL1 virtual memory control registers and executing AT
instructions. Note that ctxt->hw_sys_regs is expected to have the
proper processor context before calling the handling
function (__kvm_at_insn) implemented in this patch.

3. Emulate AT S1E[01] instructions by issuing the same instructions in
EL2. We set the physical EL1 registers, NV and NV1 bits as described in
the AT instruction emulation overview.

4. Emulate AT S12E[01] instructions in two steps: First, do the stage-1
translation by reusing the existing AT emulation functions.  Second, do
the stage-2 translation by walking the guest hypervisor's stage-2 page
table in software. Record the translation result to PAR_EL1.

5. Emulate AT S1E2 instructions by issuing the corresponding S1E1
instructions in EL2. We set the physical EL1 registers and the HCR_EL2
register as described in the AT instruction emulation overview.

6. Forward system instruction traps to the virtual EL2 if the corresponding
virtual AT bit is set in the virtual HCR_EL2.

  [ Much logic above has been reworked by Marc Zyngier ]

Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
---
 arch/arm64/include/asm/kvm_arm.h |   2 +
 arch/arm64/include/asm/kvm_asm.h |   2 +
 arch/arm64/include/asm/sysreg.h  |  17 +++
 arch/arm64/kvm/hyp/Makefile      |   1 +
 arch/arm64/kvm/hyp/at.c          | 217 +++++++++++++++++++++++++++++++
 arch/arm64/kvm/hyp/switch.c      |  13 +-
 arch/arm64/kvm/sys_regs.c        | 202 +++++++++++++++++++++++++++-
 7 files changed, 450 insertions(+), 4 deletions(-)
 create mode 100644 arch/arm64/kvm/hyp/at.c

diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index 1e4dbe0b1c8e..9903f10f6343 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -24,6 +24,7 @@
 
 /* Hyp Configuration Register (HCR) bits */
 #define HCR_FWB		(UL(1) << 46)
+#define HCR_AT		(UL(1) << 44)
 #define HCR_NV1		(UL(1) << 43)
 #define HCR_NV		(UL(1) << 42)
 #define HCR_API		(UL(1) << 41)
@@ -119,6 +120,7 @@
 #define VTCR_EL2_TG0_16K	TCR_TG0_16K
 #define VTCR_EL2_TG0_64K	TCR_TG0_64K
 #define VTCR_EL2_SH0_MASK	TCR_SH0_MASK
+#define VTCR_EL2_SH0_SHIFT	TCR_SH0_SHIFT
 #define VTCR_EL2_SH0_INNER	TCR_SH0_INNER
 #define VTCR_EL2_ORGN0_MASK	TCR_ORGN0_MASK
 #define VTCR_EL2_ORGN0_WBWA	TCR_ORGN0_WBWA
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 5e956c2cd9b4..1cfa4d2cf772 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -69,6 +69,8 @@ extern void __kvm_tlb_flush_vmid(struct kvm_s2_mmu *mmu);
 extern void __kvm_tlb_flush_local_vmid(struct kvm_vcpu *vcpu);
 
 extern void __kvm_timer_set_cntvoff(u32 cntvoff_low, u32 cntvoff_high);
+extern void __kvm_at_s1e01(struct kvm_vcpu *vcpu, u32 op, u64 vaddr);
+extern void __kvm_at_s1e2(struct kvm_vcpu *vcpu, u32 op, u64 vaddr);
 
 extern int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu);
 
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 8b95f2c42c3d..b3a8d21c07b3 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -536,6 +536,23 @@
 
 #define SYS_SP_EL2			sys_reg(3, 6,  4, 1, 0)
 
+/* AT instructions */
+#define AT_Op0 1
+#define AT_CRn 7
+
+#define OP_AT_S1E1R	sys_insn(AT_Op0, 0, AT_CRn, 8, 0)
+#define OP_AT_S1E1W	sys_insn(AT_Op0, 0, AT_CRn, 8, 1)
+#define OP_AT_S1E0R	sys_insn(AT_Op0, 0, AT_CRn, 8, 2)
+#define OP_AT_S1E0W	sys_insn(AT_Op0, 0, AT_CRn, 8, 3)
+#define OP_AT_S1E1RP	sys_insn(AT_Op0, 0, AT_CRn, 9, 0)
+#define OP_AT_S1E1WP	sys_insn(AT_Op0, 0, AT_CRn, 9, 1)
+#define OP_AT_S1E2R	sys_insn(AT_Op0, 4, AT_CRn, 8, 0)
+#define OP_AT_S1E2W	sys_insn(AT_Op0, 4, AT_CRn, 8, 1)
+#define OP_AT_S12E1R	sys_insn(AT_Op0, 4, AT_CRn, 8, 4)
+#define OP_AT_S12E1W	sys_insn(AT_Op0, 4, AT_CRn, 8, 5)
+#define OP_AT_S12E0R	sys_insn(AT_Op0, 4, AT_CRn, 8, 6)
+#define OP_AT_S12E0W	sys_insn(AT_Op0, 4, AT_CRn, 8, 7)
+
 /* Common SCTLR_ELx flags. */
 #define SCTLR_ELx_DSSBS	(_BITUL(44))
 #define SCTLR_ELx_ENIA	(_BITUL(31))
diff --git a/arch/arm64/kvm/hyp/Makefile b/arch/arm64/kvm/hyp/Makefile
index ea710f674cb6..f7af51647079 100644
--- a/arch/arm64/kvm/hyp/Makefile
+++ b/arch/arm64/kvm/hyp/Makefile
@@ -19,6 +19,7 @@ obj-$(CONFIG_KVM_ARM_HOST) += entry.o
 obj-$(CONFIG_KVM_ARM_HOST) += switch.o
 obj-$(CONFIG_KVM_ARM_HOST) += fpsimd.o
 obj-$(CONFIG_KVM_ARM_HOST) += tlb.o
+obj-$(CONFIG_KVM_ARM_HOST) += at.o
 obj-$(CONFIG_KVM_ARM_HOST) += hyp-entry.o
 
 # KVM code is run at a different exception code with a different map, so
diff --git a/arch/arm64/kvm/hyp/at.c b/arch/arm64/kvm/hyp/at.c
new file mode 100644
index 000000000000..0e938b6f8e43
--- /dev/null
+++ b/arch/arm64/kvm/hyp/at.c
@@ -0,0 +1,217 @@
+/*
+ * Copyright (C) 2017 - Linaro Ltd
+ * Author: Jintack Lim <jintack.lim@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <asm/kvm_hyp.h>
+#include <asm/kvm_mmu.h>
+
+struct mmu_config {
+	u64	ttbr0;
+	u64	ttbr1;
+	u64	tcr;
+	u64	sctlr;
+	u64	vttbr;
+	u64	vtcr;
+	u64	hcr;
+};
+
+static void __mmu_config_save(struct mmu_config *config)
+{
+	config->ttbr0	= read_sysreg_el1(SYS_TTBR0);
+	config->ttbr1	= read_sysreg_el1(SYS_TTBR1);
+	config->tcr	= read_sysreg_el1(SYS_TCR);
+	config->sctlr	= read_sysreg_el1(SYS_SCTLR);
+	config->vttbr	= read_sysreg(vttbr_el2);
+	config->vtcr	= read_sysreg(vtcr_el2);
+	config->hcr	= read_sysreg(hcr_el2);
+}
+
+static void __mmu_config_restore(struct mmu_config *config)
+{
+	write_sysreg_el1(config->ttbr0,	SYS_TTBR0);
+	write_sysreg_el1(config->ttbr1,	SYS_TTBR1);
+	write_sysreg_el1(config->tcr,	SYS_TCR);
+	write_sysreg_el1(config->sctlr,	SYS_SCTLR);
+	write_sysreg(config->vttbr,	vttbr_el2);
+	write_sysreg(config->vtcr,	vtcr_el2);
+	write_sysreg(config->hcr,	hcr_el2);
+
+	isb();
+}
+
+void __kvm_at_s1e01(struct kvm_vcpu *vcpu, u32 op, u64 vaddr)
+{
+	struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
+	struct mmu_config config;
+	struct kvm_s2_mmu *mmu;
+
+	/*
+	 * We can only get here when trapping from vEL2, so we're
+	 * translating a guest's guest VA.
+	 *
+	 * FIXME: Obtaining the S2 MMU for a guest's guest is horribly
+	 * racy, and we may not find it.
+	 */
+	spin_lock(&vcpu->kvm->mmu_lock);
+
+	mmu = lookup_s2_mmu(vcpu->kvm,
+			    vcpu_read_sys_reg(vcpu, VTTBR_EL2),
+			    vcpu_read_sys_reg(vcpu, HCR_EL2));
+
+	if (WARN_ON(!mmu))
+		goto out;
+
+	/* We've trapped, so everything is live on the CPU. */
+	__mmu_config_save(&config);
+
+	write_sysreg_el1(ctxt->sys_regs[TTBR0_EL1],	SYS_TTBR0);
+	write_sysreg_el1(ctxt->sys_regs[TTBR1_EL1],	SYS_TTBR1);
+	write_sysreg_el1(ctxt->sys_regs[TCR_EL1],	SYS_TCR);
+	write_sysreg_el1(ctxt->sys_regs[SCTLR_EL1],	SYS_SCTLR);
+	write_sysreg(kvm_get_vttbr(mmu),		vttbr_el2);
+	/* FIXME: write S2 MMU VTCR_EL2 */
+	write_sysreg(config.hcr & ~HCR_TGE,		hcr_el2);
+
+	isb();
+
+	switch (op) {
+	case OP_AT_S1E1R:
+	case OP_AT_S1E1RP:
+		asm volatile("at s1e1r, %0" : : "r" (vaddr));
+		break;
+	case OP_AT_S1E1W:
+	case OP_AT_S1E1WP:
+		asm volatile("at s1e1w, %0" : : "r" (vaddr));
+		break;
+	case OP_AT_S1E0R:
+		asm volatile("at s1e0r, %0" : : "r" (vaddr));
+		break;
+	case OP_AT_S1E0W:
+		asm volatile("at s1e0w, %0" : : "r" (vaddr));
+		break;
+	default:
+		WARN_ON(1);
+		break;
+	}
+
+	isb();
+
+	ctxt->sys_regs[PAR_EL1] = read_sysreg(par_el1);
+
+	/*
+	 * Failed? Let's leave the building now.
+	 *
+	 * FIXME: how about a failed translation because the shadow S2
+	 * wasn't populated? We may need to perform a SW PTW,
+	 * populating our shadow S2 and retry the instruction.
+	 */
+	if (ctxt->sys_regs[PAR_EL1] & 1)
+		goto nopan;
+
+	/* No PAN? No problem. */
+	if (!(*vcpu_cpsr(vcpu) & PSR_PAN_BIT))
+		goto nopan;
+
+	/*
+	 * For PAN-involved AT operations, perform the same
+	 * translation, using EL0 this time.
+	 */
+	switch (op) {
+	case OP_AT_S1E1RP:
+		asm volatile("at s1e0r, %0" : : "r" (vaddr));
+		break;
+	case OP_AT_S1E1WP:
+		asm volatile("at s1e0w, %0" : : "r" (vaddr));
+		break;
+	default:
+		goto nopan;
+	}
+
+	/*
+	 * If the EL0 translation has succeeded, we need to pretend
+	 * the AT operation has failed, as the PAN setting forbids
+	 * such a translation.
+	 *
+	 * FIXME: we hardcode a Level-3 permission fault. We really
+	 * should return the real fault level.
+	 */
+	if (!(read_sysreg(par_el1) & 1))
+		ctxt->sys_regs[PAR_EL1] = 0x1f;
+
+nopan:
+	__mmu_config_restore(&config);
+
+out:
+	spin_unlock(&vcpu->kvm->mmu_lock);
+}
+
+void __kvm_at_s1e2(struct kvm_vcpu *vcpu, u32 op, u64 vaddr)
+{
+	struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
+	struct mmu_config config;
+	struct kvm_s2_mmu *mmu;
+	u64 val;
+
+	spin_lock(&vcpu->kvm->mmu_lock);
+
+	mmu = &vcpu->kvm->arch.mmu;
+
+	/* We've trapped, so everything is live on the CPU. */
+	__mmu_config_save(&config);
+
+	if (vcpu_el2_e2h_is_set(vcpu)) {
+		write_sysreg_el1(ctxt->sys_regs[TTBR0_EL2],	SYS_TTBR0);
+		write_sysreg_el1(ctxt->sys_regs[TTBR1_EL2],	SYS_TTBR1);
+		write_sysreg_el1(ctxt->sys_regs[TCR_EL2],	SYS_TCR);
+		write_sysreg_el1(ctxt->sys_regs[SCTLR_EL2],	SYS_SCTLR);
+
+		val = config.hcr;
+	} else {
+		write_sysreg_el1(ctxt->sys_regs[TTBR0_EL2],	SYS_TTBR0);
+		write_sysreg_el1(translate_tcr(ctxt->sys_regs[TCR_EL2]),
+				 SYS_TCR);
+		write_sysreg_el1(translate_sctlr(ctxt->sys_regs[SCTLR_EL2]),
+				 SYS_SCTLR);
+
+		val = config.hcr | HCR_NV | HCR_NV1;
+	}
+
+	write_sysreg(kvm_get_vttbr(mmu),		vttbr_el2);
+	/* FIXME: write S2 MMU VTCR_EL2 */
+	write_sysreg(val & ~HCR_TGE,			hcr_el2);
+
+	isb();
+
+	switch (op) {
+	case OP_AT_S1E2R:
+		asm volatile("at s1e1r, %0" : : "r" (vaddr));
+		break;
+	case OP_AT_S1E2W:
+		asm volatile("at s1e1w, %0" : : "r" (vaddr));
+		break;
+	default:
+		WARN_ON(1);
+		break;
+	}
+
+	isb();
+
+	/* FIXME: handle failed translation due to shadow S2 */
+	ctxt->sys_regs[PAR_EL1] = read_sysreg(par_el1);
+
+	__mmu_config_restore(&config);
+	spin_unlock(&vcpu->kvm->mmu_lock);
+}
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index fb479c71b521..bd9fc0dae8e8 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -143,9 +143,10 @@ static void __hyp_text __activate_traps(struct kvm_vcpu *vcpu)
 		if (!vcpu_el2_e2h_is_set(vcpu)) {
 			/*
 			 * For a guest hypervisor on v8.0, trap and emulate
-			 * the EL1 virtual memory control register accesses.
+			 * the EL1 virtual memory control register accesses
+			 * as well as the AT S1 operations.
 			 */
-			hcr |= HCR_TVM | HCR_TRVM | HCR_NV1;
+			hcr |= HCR_TVM | HCR_TRVM | HCR_AT | HCR_NV1;
 		} else {
 			/*
 			 * For a guest hypervisor on v8.1 (VHE), allow to
@@ -168,6 +169,14 @@ static void __hyp_text __activate_traps(struct kvm_vcpu *vcpu)
 			hcr &= ~HCR_TVM;
 
 			hcr |= vhcr_el2 & (HCR_TVM | HCR_TRVM);
+
+			/*
+			 * If we're using the EL1 translation regime
+			 * (TGE clear), then ensure that AT S1 ops are
+			 * trapped too.
+			 */
+			if (!vcpu_el2_tge_is_set(vcpu))
+				hcr |= HCR_AT;
 		}
 	}
 
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 0d5b7a7c76de..102419b837e8 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1656,6 +1656,11 @@ static bool access_sp_el1(struct kvm_vcpu *vcpu,
 	return true;
 }
 
+static bool forward_at_traps(struct kvm_vcpu *vcpu)
+{
+	return forward_traps(vcpu, HCR_AT);
+}
+
 /* This function is to support the recursive nested virtualization */
 static bool forward_nv1_traps(struct kvm_vcpu *vcpu, struct sys_reg_params *p)
 {
@@ -2135,12 +2140,205 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 	{ SYS_DESC(SYS_SP_EL2), NULL, reset_unknown, SP_EL2 },
 };
 
-#define SYS_INSN_TO_DESC(insn, access_fn, forward_fn)	\
-	{ SYS_DESC((insn)), (access_fn), NULL, 0, 0, NULL, NULL, (forward_fn) }
+static bool handle_s1e01(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
+			 const struct sys_reg_desc *r)
+{
+	int sys_encoding = sys_insn(p->Op0, p->Op1, p->CRn, p->CRm, p->Op2);
+
+	__kvm_at_s1e01(vcpu, sys_encoding, p->regval);
+
+	return true;
+}
+
+static bool handle_s1e2(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
+			const struct sys_reg_desc *r)
+{
+	int sys_encoding = sys_insn(p->Op0, p->Op1, p->CRn, p->CRm, p->Op2);
+
+	__kvm_at_s1e2(vcpu, sys_encoding, p->regval);
+
+	return true;
+}
+
+static u64 setup_par_aborted(u32 esr)
+{
+	u64 par = 0;
+
+	/* S [9]: fault in the stage 2 translation */
+	par |= (1 << 9);
+	/* FST [6:1]: Fault status code  */
+	par |= (esr << 1);
+	/* F [0]: translation is aborted */
+	par |= 1;
+
+	return par;
+}
+
+static u64 setup_par_completed(struct kvm_vcpu *vcpu, struct kvm_s2_trans *out)
+{
+	u64 par, vtcr_sh0;
+
+	/* F [0]: Translation is completed successfully */
+	par = 0;
+	/* ATTR [63:56] */
+	par |= out->upper_attr;
+	/* PA [47:12] */
+	par |= out->output & GENMASK_ULL(47, 12);
+	/* RES1 [11] */
+	par |= (1UL << 11);
+	/* SH [8:7]: Shareability attribute */
+	vtcr_sh0 = vcpu_read_sys_reg(vcpu, VTCR_EL2) & VTCR_EL2_SH0_MASK;
+	par |= (vtcr_sh0 >> VTCR_EL2_SH0_SHIFT) << 7;
+
+	return par;
+}
+
+static bool handle_s12(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
+		       const struct sys_reg_desc *r, bool write)
+{
+	u64 par, va;
+	u32 esr;
+	phys_addr_t ipa;
+	struct kvm_s2_trans out;
+	int ret;
+
+	/* Do the stage-1 translation */
+	handle_s1e01(vcpu, p, r);
+	par = vcpu_read_sys_reg(vcpu, PAR_EL1);
+	if (par & 1) {
+		/* The stage-1 translation aborted */
+		return true;
+	}
+
+	/* Do the stage-2 translation */
+	va = p->regval;
+	ipa = (par & GENMASK_ULL(47, 12)) | (va & GENMASK_ULL(11, 0));
+	out.esr = 0;
+	ret = kvm_walk_nested_s2(vcpu, ipa, &out);
+	if (ret < 0)
+		return false;
+
+	/* Check if the stage-2 PTW is aborted */
+	if (out.esr) {
+		esr = out.esr;
+		goto s2_trans_abort;
+	}
+
+	/* Check the access permission */
+	if ((!write && !out.readable) || (write && !out.writable)) {
+		esr = ESR_ELx_FSC_PERM;
+		esr |= out.level & 0x3;
+		goto s2_trans_abort;
+	}
+
+	vcpu_write_sys_reg(vcpu, setup_par_completed(vcpu, &out), PAR_EL1);
+	return true;
+
+s2_trans_abort:
+	vcpu_write_sys_reg(vcpu, setup_par_aborted(esr), PAR_EL1);
+	return true;
+}
+
+static bool handle_s12r(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
+			const struct sys_reg_desc *r)
+{
+	return handle_s12(vcpu, p, r, false);
+}
+
+static bool handle_s12w(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
+			const struct sys_reg_desc *r)
+{
+	return handle_s12(vcpu, p, r, true);
+}
+
+/*
+ * AT instruction emulation
+ *
+ * We emulate AT instructions executed in the virtual EL2.
+ * Basic strategy for the stage-1 translation emulation is to load proper
+ * context, which depends on the trapped instruction and the virtual HCR_EL2,
+ * to the EL1 virtual memory control registers and execute S1E[01] instructions
+ * in EL2. See below for more detail.
+ *
+ * For the stage-2 translation, which is necessary for S12E[01] emulation,
+ * we walk the guest hypervisor's stage-2 page table in software.
+ *
+ * The stage-1 translation emulations can be divided into two groups depending
+ * on the translation regime.
+ *
+ * 1. EL2 AT instructions: S1E2x
+ * +-----------------------------------------------------------------------+
+ * |                             |         Setting for the emulation       |
+ * | Virtual HCR_EL2.E2H on trap |-----------------------------------------+
+ * |                             | Phys EL1 regs | Phys NV, NV1 | Phys TGE |
+ * |-----------------------------------------------------------------------|
+ * |             0               |     vEL2      |    (1, 1)    |    0     |
+ * |             1               |     vEL2      |    (0, 0)    |    0     |
+ * +-----------------------------------------------------------------------+
+ *
+ * We emulate the EL2 AT instructions by loading virtual EL2 context
+ * to the EL1 virtual memory control registers and executing corresponding
+ * EL1 AT instructions.
+ *
+ * We set physical NV and NV1 bits to use EL2 page table format for non-VHE
+ * guest hypervisor (i.e. HCR_EL2.E2H == 0). As a VHE guest hypervisor uses the
+ * EL1 page table format, we don't set those bits.
+ *
+ * We clear the physical TGE bit so as not to use the EL2 translation regime
+ * when the host uses the VHE feature.
+ *
+ *
+ * 2. EL0/EL1 AT instructions: S1E[01]x, S12E1x
+ * +----------------------------------------------------------------------+
+ * |   Virtual HCR_EL2 on trap  |        Setting for the emulation        |
+ * |----------------------------------------------------------------------+
+ * | (vE2H, vTGE) | (vNV, vNV1) | Phys EL1 regs | Phys NV, NV1 | Phys TGE |
+ * |----------------------------------------------------------------------|
+ * |    (0, 0)*   |   (0, 0)    |      vEL1     |    (0, 0)    |    0     |
+ * |    (0, 0)    |   (1, 1)    |      vEL1     |    (1, 1)    |    0     |
+ * |    (1, 1)    |   (0, 0)    |      vEL2     |    (0, 0)    |    0     |
+ * |    (1, 1)    |   (1, 1)    |      vEL2     |    (1, 1)    |    0     |
+ * +----------------------------------------------------------------------+
+ *
+ * *For (0, 0) in the 'Virtual HCR_EL2 on trap' column, it actually means
+ *  (1, 1). Keep them (0, 0) just for readability.
+ *
+ * We set physical EL1 virtual memory control registers depending on
+ * (vE2H, vTGE) pair. When the pair is (0, 0) where AT instructions are
+ * supposed to use EL0/EL1 translation regime, we load the EL1 registers with
+ * the virtual EL1 registers (i.e. EL1 registers from the guest hypervisor's
+ * point of view). When the pair is (1, 1), however, AT instructions are defined
+ * to apply the EL2 translation regime. To emulate this behavior, we load the EL1
+ * registers with the virtual EL2 context. (i.e the shadow registers)
+ *
+ * We respect the virtual NV and NV1 bit for the emulation. When those bits are
+ * set, it means that a guest hypervisor would like to use EL2 page table format
+ * for the EL1 translation regime. We emulate this by setting the physical
+ * NV and NV1 bits.
+ */
+
+#define SYS_INSN_TO_DESC(insn, access_fn, forward_fn)			\
+	{ SYS_DESC(OP_##insn), (access_fn), NULL, 0, 0,			\
+	  NULL, NULL, (forward_fn) }
 static struct sys_reg_desc sys_insn_descs[] = {
 	{ SYS_DESC(SYS_DC_ISW), access_dcsw },
+
+	SYS_INSN_TO_DESC(AT_S1E1R, handle_s1e01, forward_at_traps),
+	SYS_INSN_TO_DESC(AT_S1E1W, handle_s1e01, forward_at_traps),
+	SYS_INSN_TO_DESC(AT_S1E0R, handle_s1e01, forward_at_traps),
+	SYS_INSN_TO_DESC(AT_S1E0W, handle_s1e01, forward_at_traps),
+	SYS_INSN_TO_DESC(AT_S1E1RP, handle_s1e01, forward_at_traps),
+	SYS_INSN_TO_DESC(AT_S1E1WP, handle_s1e01, forward_at_traps),
+
 	{ SYS_DESC(SYS_DC_CSW), access_dcsw },
 	{ SYS_DESC(SYS_DC_CISW), access_dcsw },
+
+	SYS_INSN_TO_DESC(AT_S1E2R, handle_s1e2, forward_nv_traps),
+	SYS_INSN_TO_DESC(AT_S1E2W, handle_s1e2, forward_nv_traps),
+	SYS_INSN_TO_DESC(AT_S12E1R, handle_s12r, forward_nv_traps),
+	SYS_INSN_TO_DESC(AT_S12E1W, handle_s12w, forward_nv_traps),
+	SYS_INSN_TO_DESC(AT_S12E0R, handle_s12r, forward_nv_traps),
+	SYS_INSN_TO_DESC(AT_S12E0W, handle_s12w, forward_nv_traps),
 };
 
 static bool trap_dbgidr(struct kvm_vcpu *vcpu,
-- 
2.20.1



* [PATCH 44/59] KVM: arm64: nv: Trap and emulate TLBI instructions from virtual EL2
  2019-06-21  9:37 [PATCH 00/59] KVM: arm64: ARMv8.3 Nested Virtualization support Marc Zyngier
                   ` (42 preceding siblings ...)
  2019-06-21  9:38 ` [PATCH 43/59] KVM: arm64: nv: Trap and emulate AT instructions from virtual EL2 Marc Zyngier
@ 2019-06-21  9:38 ` Marc Zyngier
  2019-07-02 12:37   ` Julien Thierry
  2019-07-10 10:15   ` Alexandru Elisei
  2019-06-21  9:38 ` [PATCH 45/59] KVM: arm64: nv: Handle traps for timer _EL02 and _EL2 sysregs accessors Marc Zyngier
                   ` (16 subsequent siblings)
  60 siblings, 2 replies; 176+ messages in thread
From: Marc Zyngier @ 2019-06-21  9:38 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	Julien Thierry, James Morse, Suzuki K Poulose

From: Jintack Lim <jintack.lim@linaro.org>

When supporting nested virtualization, TLBI instructions executed by a
guest hypervisor must be trapped and emulated by the host hypervisor,
because the guest hypervisor can only affect physical TLB entries
relating to its own execution environment (virtual EL2 in EL1), not
those of the nested guests, as the semantics of the instructions
require. TLBI instructions might also result in updates
(invalidations) to the shadow page tables.

This patch does several things.

1. List and define all TLBI system instructions to emulate.

2. Emulate TLBI ALLE2(IS) instruction executed in the virtual EL2. Since
we emulate the virtual EL2 in EL1, we invalidate EL1&0 regime stage 1
TLB entries by setting vttbr_el2 to the VMID of the virtual EL2.

3. Emulate TLBI VAE2* instruction executed in the virtual EL2. Based on the
same principle as TLBI ALLE2 instruction, we can simply emulate those
instructions by executing corresponding VAE1* instructions with the
virtual EL2's VMID assigned by the host hypervisor.

Note that we are able to emulate TLBI ALLE2IS precisely by only
invalidating stage 1 TLB entries via the TLBI VMALLE1IS instruction,
but to keep it simple, we reuse the existing function,
__kvm_tlb_flush_vmid(), which invalidates both stage 1 and stage 2 TLB
entries.

4. TLBI ALLE1(IS) instruction invalidates all EL1&0 regime stage 1 and 2
TLB entries (on all PEs in the same Inner Shareable domain). To emulate
these instructions, we first need to clear all the mappings in the
shadow page tables since executing those instructions implies the change
of mappings in the stage 2 page tables maintained by the guest
hypervisor.  We then need to invalidate all EL1&0 regime stage 1 and 2
TLB entries of all VMIDs, which are assigned by the host hypervisor, for
this VM.

5. Based on the same principle as TLBI ALLE1(IS) emulation, we clear the
mappings in the shadow stage-2 page tables and invalidate TLB entries.
But this time we do it only for the current VMID from the guest
hypervisor's perspective, not for all VMIDs.

6. Based on the same principle as TLBI ALLE1(IS) and TLBI VMALLS12E1(IS)
emulation, we clear the mappings in the shadow stage-2 page tables and
invalidate TLB entries. We do it only for one mapping for the current
VMID from the guest hypervisor's view.

7. Forward system instruction traps to the virtual EL2 if a
corresponding bit in the virtual HCR_EL2 is set.

8. Even though a guest hypervisor can execute, without being trapped,
the TLBI instructions that are accessible at EL1, doing so would be
wrong: all those TLBI instructions operate on the current VMID, and
when a guest hypervisor is running, the current VMID is its own, not
the one derived from the virtual vttbr_el2. Letting a guest hypervisor
execute those TLBI instructions untrapped would thus invalidate its
own TLB entries and leave the stale entries it actually targeted
unhandled.

Therefore we trap and emulate those TLBI instructions. The emulation is
simple; we find a shadow VMID mapped to the virtual vttbr_el2, set it in
the physical vttbr_el2, then execute the same instruction in EL2.

We don't set HCR_EL2.TTLB bit yet.
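
The resulting handler shape for these EL1 TLBI traps (a sketch only;
the actual handler in this patch may differ in detail) is to map the
virtual vttbr_el2 onto one of the shadow S2 MMUs and replay the
trapped instruction under that shadow VMID:

	u64 vttbr = vcpu_read_sys_reg(vcpu, VTTBR_EL2);
	int sys_encoding = sys_insn(p->Op0, p->Op1, p->CRn, p->CRm, p->Op2);
	struct kvm_s2_mmu *mmu;

	spin_lock(&vcpu->kvm->mmu_lock);
	mmu = lookup_s2_mmu(vcpu->kvm, vttbr, HCR_VM);
	if (mmu)
		/* Run the same TLBI, but under the shadow VMID */
		kvm_call_hyp(__kvm_tlb_el1_instr, mmu, p->regval, sys_encoding);
	spin_unlock(&vcpu->kvm->mmu_lock);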

  [ Changes performed by Marc Zyngier:

    The TLBI handling code more or less directly execute the same
    instruction that has been trapped (with an EL2->EL1 conversion
    in the case of an EL2 TLBI), but that's unfortunately not enough:

    - TLBIs must be upgraded to the Inner Shareable domain to account
      for vcpu migration, just like we already have with HCR_EL2.FB.

    - The DSB instruction that synchronises these must thus be on
      the Inner Shareable domain as well.

    - Prior to executing the TLBI, we need another DSB ISHST to make
      sure that the update to the page tables is now visible.

      The ordering of these system instructions has been fixed accordingly.

    - The current TLB invalidation code is pretty buggy, as it assumes a
      page mapping. On the contrary, it is likely that a TLB invalidation
      will cover more than a single page, and the size should be decided
      by the guest's configuration (and not the host's).

      Since we don't cache the guest mapping sizes in the shadow PT yet,
      let's assume the worst case (a block mapping) and invalidate that.

      Take this opportunity to fix the decoding of the parameter (it
      isn't a straight IPA); see the sketch after these bracketed notes.

    - In general, we always emulate local TLB invalidations as being
      upgraded to the Inner Shareable domain so that we can easily
      deal with vcpu migration. This is consistent with the fact that
      we set HCR_EL2.FB when running non-nested VMs.

      So let's emulate TLBI ALLE2 as ALLE2IS.
  ]

  [ Changes performed by Christoffer Dall:

    Sometimes when we are invalidating the TLB for a certain S2 MMU
    context, this context can also have EL2 context associated with it
    and we have to invalidate this too.
  ]
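
To illustrate the parameter decode called out above: for TLBI
IPAS2E1*, the register operand carries IPA[47:12] in bits [35:0], so
the IPA must be rebuilt before unmapping, and, absent cached mapping
sizes, the invalidation is widened to a naturally-aligned block. A
sketch, with mmu being the shadow MMU looked up from the virtual
vttbr_el2 (the PMD-level block size is an illustrative assumption):

	u64 base_addr;

	/* Xt[35:0] holds IPA[47:12]; it is not a straight IPA */
	base_addr = (p->regval & GENMASK_ULL(35, 0)) << 12;

	/* Worst case: assume a block mapping and align down to it */
	base_addr &= PMD_MASK;
	kvm_unmap_stage2_range(mmu, base_addr, PMD_SIZE);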

Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
---
 arch/arm64/include/asm/kvm_asm.h |   2 +
 arch/arm64/include/asm/sysreg.h  |  36 ++++++
 arch/arm64/kvm/hyp/tlb.c         |  81 +++++++++++++
 arch/arm64/kvm/sys_regs.c        | 201 +++++++++++++++++++++++++++++++
 virt/kvm/arm/mmu.c               |  18 ++-
 5 files changed, 337 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 1cfa4d2cf772..9cb9ab066ebc 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -67,6 +67,8 @@ extern void __kvm_flush_vm_context(void);
 extern void __kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu, phys_addr_t ipa);
 extern void __kvm_tlb_flush_vmid(struct kvm_s2_mmu *mmu);
 extern void __kvm_tlb_flush_local_vmid(struct kvm_vcpu *vcpu);
+extern void __kvm_tlb_vae2(struct kvm_s2_mmu *mmu, u64 va, u64 sys_encoding);
+extern void __kvm_tlb_el1_instr(struct kvm_s2_mmu *mmu, u64 val, u64 sys_encoding);
 
 extern void __kvm_timer_set_cntvoff(u32 cntvoff_low, u32 cntvoff_high);
 extern void __kvm_at_s1e01(struct kvm_vcpu *vcpu, u32 op, u64 vaddr);
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index b3a8d21c07b3..e0912ececd92 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -553,6 +553,42 @@
 #define OP_AT_S12E0R	sys_insn(AT_Op0, 4, AT_CRn, 8, 6)
 #define OP_AT_S12E0W	sys_insn(AT_Op0, 4, AT_CRn, 8, 7)
 
+/* TLBI instructions */
+#define TLBI_Op0	1
+#define TLBI_Op1_EL1	0	/* Accessible from EL1 or higher */
+#define TLBI_Op1_EL2	4	/* Accessible from EL2 or higher */
+#define TLBI_CRn	8
+#define tlbi_insn_el1(CRm, Op2)	sys_insn(TLBI_Op0, TLBI_Op1_EL1, TLBI_CRn, (CRm), (Op2))
+#define tlbi_insn_el2(CRm, Op2)	sys_insn(TLBI_Op0, TLBI_Op1_EL2, TLBI_CRn, (CRm), (Op2))
+
+#define OP_TLBI_VMALLE1IS	tlbi_insn_el1(3, 0)
+#define OP_TLBI_VAE1IS		tlbi_insn_el1(3, 1)
+#define OP_TLBI_ASIDE1IS	tlbi_insn_el1(3, 2)
+#define OP_TLBI_VAAE1IS		tlbi_insn_el1(3, 3)
+#define OP_TLBI_VALE1IS		tlbi_insn_el1(3, 5)
+#define OP_TLBI_VAALE1IS	tlbi_insn_el1(3, 7)
+#define OP_TLBI_VMALLE1		tlbi_insn_el1(7, 0)
+#define OP_TLBI_VAE1		tlbi_insn_el1(7, 1)
+#define OP_TLBI_ASIDE1		tlbi_insn_el1(7, 2)
+#define OP_TLBI_VAAE1		tlbi_insn_el1(7, 3)
+#define OP_TLBI_VALE1		tlbi_insn_el1(7, 5)
+#define OP_TLBI_VAALE1		tlbi_insn_el1(7, 7)
+
+#define OP_TLBI_IPAS2E1IS	tlbi_insn_el2(0, 1)
+#define OP_TLBI_IPAS2LE1IS	tlbi_insn_el2(0, 5)
+#define OP_TLBI_ALLE2IS		tlbi_insn_el2(3, 0)
+#define OP_TLBI_VAE2IS		tlbi_insn_el2(3, 1)
+#define OP_TLBI_ALLE1IS		tlbi_insn_el2(3, 4)
+#define OP_TLBI_VALE2IS		tlbi_insn_el2(3, 5)
+#define OP_TLBI_VMALLS12E1IS	tlbi_insn_el2(3, 6)
+#define OP_TLBI_IPAS2E1		tlbi_insn_el2(4, 1)
+#define OP_TLBI_IPAS2LE1	tlbi_insn_el2(4, 5)
+#define OP_TLBI_ALLE2		tlbi_insn_el2(7, 0)
+#define OP_TLBI_VAE2		tlbi_insn_el2(7, 1)
+#define OP_TLBI_ALLE1		tlbi_insn_el2(7, 4)
+#define OP_TLBI_VALE2		tlbi_insn_el2(7, 5)
+#define OP_TLBI_VMALLS12E1	tlbi_insn_el2(7, 6)
+
 /* Common SCTLR_ELx flags. */
 #define SCTLR_ELx_DSSBS	(_BITUL(44))
 #define SCTLR_ELx_ENIA	(_BITUL(31))
diff --git a/arch/arm64/kvm/hyp/tlb.c b/arch/arm64/kvm/hyp/tlb.c
index 779405db3fb3..026afbf1a697 100644
--- a/arch/arm64/kvm/hyp/tlb.c
+++ b/arch/arm64/kvm/hyp/tlb.c
@@ -205,3 +205,84 @@ void __hyp_text __kvm_flush_vm_context(void)
 	asm volatile("ic ialluis" : : );
 	dsb(ish);
 }
+
+void __hyp_text __kvm_tlb_vae2(struct kvm_s2_mmu *mmu, u64 va, u64 sys_encoding)
+{
+	struct tlb_inv_context cxt;
+
+	dsb(ishst);
+
+	/* Switch to requested VMID */
+	__tlb_switch_to_guest()(mmu, &cxt);
+
+	/*
+	 * Execute the EL1 version of TLBI VAE2* instruction, forcing
+	 * an upgrade to the Inner Shareable domain in order to
+	 * perform the invalidation on all CPUs.
+	 */
+	switch (sys_encoding) {
+	case OP_TLBI_VAE2:
+	case OP_TLBI_VAE2IS:
+		__tlbi(vae1is, va);
+		break;
+	case OP_TLBI_VALE2:
+	case OP_TLBI_VALE2IS:
+		__tlbi(vale1is, va);
+		break;
+	default:
+		break;
+	}
+	dsb(ish);
+	isb();
+
+	__tlb_switch_to_host()(&cxt);
+}
+
+void __hyp_text __kvm_tlb_el1_instr(struct kvm_s2_mmu *mmu, u64 val, u64 sys_encoding)
+{
+	struct tlb_inv_context cxt;
+
+	dsb(ishst);
+
+	/* Switch to requested VMID */
+	__tlb_switch_to_guest()(mmu, &cxt);
+
+	/*
+	 * Execute the same instruction as the guest hypervisor did,
+	 * expanding the scope of local TLB invalidations to the Inner
+	 * Shareable domain so that it takes place on all CPUs. This
+	 * is equivalent to having HCR_EL2.FB set.
+	 */
+	switch (sys_encoding) {
+	case OP_TLBI_VMALLE1:
+	case OP_TLBI_VMALLE1IS:
+		__tlbi(vmalle1is);
+		break;
+	case OP_TLBI_VAE1:
+	case OP_TLBI_VAE1IS:
+		__tlbi(vae1is, val);
+		break;
+	case OP_TLBI_ASIDE1:
+	case OP_TLBI_ASIDE1IS:
+		__tlbi(aside1is, val);
+		break;
+	case OP_TLBI_VAAE1:
+	case OP_TLBI_VAAE1IS:
+		__tlbi(vaae1is, val);
+		break;
+	case OP_TLBI_VALE1:
+	case OP_TLBI_VALE1IS:
+		__tlbi(vale1is, val);
+		break;
+	case OP_TLBI_VAALE1:
+	case OP_TLBI_VAALE1IS:
+		__tlbi(vaale1is, val);
+		break;
+	default:
+		break;
+	}
+	dsb(ish);
+	isb();
+
+	__tlb_switch_to_host()(&cxt);
+}
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 102419b837e8..0343682fe47f 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1661,6 +1661,11 @@ static bool forward_at_traps(struct kvm_vcpu *vcpu)
 	return forward_traps(vcpu, HCR_AT);
 }
 
+static bool forward_ttlb_traps(struct kvm_vcpu *vcpu)
+{
+	return forward_traps(vcpu, HCR_TTLB);
+}
+
 /* This function is to support the recursive nested virtualization */
 static bool forward_nv1_traps(struct kvm_vcpu *vcpu, struct sys_reg_params *p)
 {
@@ -2251,6 +2256,174 @@ static bool handle_s12w(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
 	return handle_s12(vcpu, p, r, true);
 }
 
+static bool handle_alle2is(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
+			   const struct sys_reg_desc *r)
+{
+	/*
+	 * To emulate invalidating all EL2 regime stage 1 TLB entries for all
+	 * PEs, executing TLBI VMALLE1IS is enough. But reuse the existing
+	 * interface for simplicity; invalidating stage 2 entries doesn't
+	 * affect correctness.
+	 */
+	kvm_call_hyp(__kvm_tlb_flush_vmid, &vcpu->kvm->arch.mmu);
+	return true;
+}
+
+static bool handle_vae2(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
+		       const struct sys_reg_desc *r)
+{
+	int sys_encoding = sys_insn(p->Op0, p->Op1, p->CRn, p->CRm, p->Op2);
+
+	/*
+	 * Based on the same principle as TLBI ALLE2 instruction emulation, we
+	 * emulate TLBI VAE2* instructions by executing corresponding TLBI VAE1*
+	 * instructions with the virtual EL2's VMID assigned by the host
+	 * hypervisor.
+	 */
+	kvm_call_hyp(__kvm_tlb_vae2, &vcpu->kvm->arch.mmu,
+		     p->regval, sys_encoding);
+	return true;
+}
+
+static bool handle_alle1is(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
+			   const struct sys_reg_desc *r)
+{
+	struct kvm_s2_mmu *mmu = &vcpu->kvm->arch.mmu;
+	spin_lock(&vcpu->kvm->mmu_lock);
+
+	/*
+	 * Clear all mappings in the shadow page tables and invalidate the stage
+	 * 1 and 2 TLB entries via kvm_tlb_flush_vmid_ipa().
+	 */
+	kvm_nested_s2_clear(vcpu->kvm);
+
+	if (mmu->vmid.vmid_gen) {
+		/*
+		 * Invalidate the stage 1 and 2 TLB entries for the host OS
+		 * in a VM only if there is one.
+		 */
+		kvm_call_hyp(__kvm_tlb_flush_vmid, mmu);
+	}
+
+	spin_unlock(&vcpu->kvm->mmu_lock);
+
+	return true;
+}
+
+static bool handle_vmalls12e1is(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
+				const struct sys_reg_desc *r)
+{
+	u64 vttbr = vcpu_read_sys_reg(vcpu, VTTBR_EL2);
+	struct kvm_s2_mmu *mmu;
+
+	spin_lock(&vcpu->kvm->mmu_lock);
+
+	mmu = lookup_s2_mmu(vcpu->kvm, vttbr, HCR_VM);
+	if (mmu)
+		kvm_unmap_stage2_range(mmu, 0, kvm_phys_size(vcpu->kvm));
+
+	mmu = lookup_s2_mmu(vcpu->kvm, vttbr, 0);
+	if (mmu)
+		kvm_unmap_stage2_range(mmu, 0, kvm_phys_size(vcpu->kvm));
+
+	spin_unlock(&vcpu->kvm->mmu_lock);
+
+	return true;
+}
+
+static bool handle_ipas2e1is(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
+			     const struct sys_reg_desc *r)
+{
+	u64 vttbr = vcpu_read_sys_reg(vcpu, VTTBR_EL2);
+	u64 vtcr = vcpu_read_sys_reg(vcpu, VTCR_EL2);
+	struct kvm_s2_mmu *mmu;
+	u64 base_addr;
+	int max_size;
+
+	/*
+	 * We drop a number of things from the supplied value:
+	 *
+	 * - NS bit: we're non-secure only.
+	 *
+	 * - TTL field: We already have the granule size from the
+	 *   VTCR_EL2.TG0 field, and the level is only relevant to the
+	 *   guest's S2PT.
+	 *
+	 * - IPA[51:48]: We don't support 52bit IPA just yet...
+	 *
+	 * And of course, adjust the IPA to be on an actual address.
+	 */
+	base_addr = (p->regval & GENMASK_ULL(35, 0)) << 12;
+
+	/* Compute the maximum extent of the invalidation */
+	switch ((vtcr & VTCR_EL2_TG0_MASK)) {
+	case VTCR_EL2_TG0_4K:
+		max_size = SZ_1G;
+		break;
+	case VTCR_EL2_TG0_16K:
+		max_size = SZ_32M;
+		break;
+	case VTCR_EL2_TG0_64K:
+		/*
+		 * No, we do not support 52bit IPA in nested yet. Once
+		 * we do, this should be 4TB.
+		 */
+		/* FIXME: remove the 52bit PA support from the IDregs */
+		max_size = SZ_512M;
+		break;
+	default:
+		BUG();
+	}
+
+	spin_lock(&vcpu->kvm->mmu_lock);
+
+	mmu = lookup_s2_mmu(vcpu->kvm, vttbr, HCR_VM);
+	if (mmu)
+		kvm_unmap_stage2_range(mmu, base_addr, max_size);
+
+	mmu = lookup_s2_mmu(vcpu->kvm, vttbr, 0);
+	if (mmu)
+		kvm_unmap_stage2_range(mmu, base_addr, max_size);
+
+	spin_unlock(&vcpu->kvm->mmu_lock);
+
+	return true;
+}
+
+static bool handle_tlbi_el1(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
+			    const struct sys_reg_desc *r)
+{
+	u64 virtual_vttbr = vcpu_read_sys_reg(vcpu, VTTBR_EL2);
+	struct kvm_s2_mmu *mmu;
+	int sys_encoding = sys_insn(p->Op0, p->Op1, p->CRn, p->CRm, p->Op2);
+
+	/*
+	 * TODO: Revisit this comment:
+	 *
+	 * If we can't find a shadow VMID, then either the virtual VMID
+	 * belongs to the host OS in the VM, or the nested VM using that
+	 * virtual VMID has never been executed. (Note that we create a
+	 * shadow VMID when entering a VM.) For the former, we can flush
+	 * TLB entries belonging to the host OS in a VM. For the latter,
+	 * we don't have to do anything. Since we can't differentiate
+	 * between those cases, just do what we can do for the former.
+	 */
+
+	mutex_lock(&vcpu->kvm->lock);
+	mmu = lookup_s2_mmu(vcpu->kvm, virtual_vttbr, HCR_VM);
+	if (mmu)
+		kvm_call_hyp(__kvm_tlb_el1_instr,
+			     mmu, p->regval, sys_encoding);
+
+	mmu = lookup_s2_mmu(vcpu->kvm, virtual_vttbr, 0);
+	if (mmu)
+		kvm_call_hyp(__kvm_tlb_el1_instr,
+			     mmu, p->regval, sys_encoding);
+	mutex_unlock(&vcpu->kvm->lock);
+
+	return true;
+}
+
 /*
  * AT instruction emulation
  *
@@ -2333,12 +2506,40 @@ static struct sys_reg_desc sys_insn_descs[] = {
 	{ SYS_DESC(SYS_DC_CSW), access_dcsw },
 	{ SYS_DESC(SYS_DC_CISW), access_dcsw },
 
+	SYS_INSN_TO_DESC(TLBI_VMALLE1IS, handle_tlbi_el1, forward_ttlb_traps),
+	SYS_INSN_TO_DESC(TLBI_VAE1IS, handle_tlbi_el1, forward_ttlb_traps),
+	SYS_INSN_TO_DESC(TLBI_ASIDE1IS, handle_tlbi_el1, forward_ttlb_traps),
+	SYS_INSN_TO_DESC(TLBI_VAAE1IS, handle_tlbi_el1, forward_ttlb_traps),
+	SYS_INSN_TO_DESC(TLBI_VALE1IS, handle_tlbi_el1, forward_ttlb_traps),
+	SYS_INSN_TO_DESC(TLBI_VAALE1IS, handle_tlbi_el1, forward_ttlb_traps),
+	SYS_INSN_TO_DESC(TLBI_VMALLE1, handle_tlbi_el1, forward_ttlb_traps),
+	SYS_INSN_TO_DESC(TLBI_VAE1, handle_tlbi_el1, forward_ttlb_traps),
+	SYS_INSN_TO_DESC(TLBI_ASIDE1, handle_tlbi_el1, forward_ttlb_traps),
+	SYS_INSN_TO_DESC(TLBI_VAAE1, handle_tlbi_el1, forward_ttlb_traps),
+	SYS_INSN_TO_DESC(TLBI_VALE1, handle_tlbi_el1, forward_ttlb_traps),
+	SYS_INSN_TO_DESC(TLBI_VAALE1, handle_tlbi_el1, forward_ttlb_traps),
+
 	SYS_INSN_TO_DESC(AT_S1E2R, handle_s1e2, forward_nv_traps),
 	SYS_INSN_TO_DESC(AT_S1E2W, handle_s1e2, forward_nv_traps),
 	SYS_INSN_TO_DESC(AT_S12E1R, handle_s12r, forward_nv_traps),
 	SYS_INSN_TO_DESC(AT_S12E1W, handle_s12w, forward_nv_traps),
 	SYS_INSN_TO_DESC(AT_S12E0R, handle_s12r, forward_nv_traps),
 	SYS_INSN_TO_DESC(AT_S12E0W, handle_s12w, forward_nv_traps),
+
+	SYS_INSN_TO_DESC(TLBI_IPAS2E1IS, handle_ipas2e1is, forward_nv_traps),
+	SYS_INSN_TO_DESC(TLBI_IPAS2LE1IS, handle_ipas2e1is, forward_nv_traps),
+	SYS_INSN_TO_DESC(TLBI_ALLE2IS, handle_alle2is, forward_nv_traps),
+	SYS_INSN_TO_DESC(TLBI_VAE2IS, handle_vae2, forward_nv_traps),
+	SYS_INSN_TO_DESC(TLBI_ALLE1IS, handle_alle1is, forward_nv_traps),
+	SYS_INSN_TO_DESC(TLBI_VALE2IS, handle_vae2, forward_nv_traps),
+	SYS_INSN_TO_DESC(TLBI_VMALLS12E1IS, handle_vmalls12e1is, forward_nv_traps),
+	SYS_INSN_TO_DESC(TLBI_IPAS2E1, handle_ipas2e1is, forward_nv_traps),
+	SYS_INSN_TO_DESC(TLBI_IPAS2LE1, handle_ipas2e1is, forward_nv_traps),
+	SYS_INSN_TO_DESC(TLBI_ALLE2, handle_alle2is, forward_nv_traps),
+	SYS_INSN_TO_DESC(TLBI_VAE2, handle_vae2, forward_nv_traps),
+	SYS_INSN_TO_DESC(TLBI_ALLE1, handle_alle1is, forward_nv_traps),
+	SYS_INSN_TO_DESC(TLBI_VALE2, handle_vae2, forward_nv_traps),
+	SYS_INSN_TO_DESC(TLBI_VMALLS12E1, handle_vmalls12e1is, forward_nv_traps),
 };
 
 static bool trap_dbgidr(struct kvm_vcpu *vcpu,
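
For readers checking the max_size computation in handle_ipas2e1is()
above, the values follow from the largest block mapping each granule
allows at stage 2 (an illustration, not part of the patch): with 4K
granules the largest block is a 1GB level-1 block, hence SZ_1G; with
16K granules a 32MB level-2 block, hence SZ_32M; with 64K granules a
512MB level-2 block, hence SZ_512M (the 4TB mentioned in the comment
is the 64K level-1 block size, only reachable with 52bit IPA support).
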
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index 6a7cba077bce..0ea79e543b29 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -51,7 +51,23 @@ static bool memslot_is_logging(struct kvm_memory_slot *memslot)
  */
 void kvm_flush_remote_tlbs(struct kvm *kvm)
 {
-	kvm_call_hyp(__kvm_tlb_flush_vmid, &kvm->arch.mmu);
+	struct kvm_s2_mmu *mmu = &kvm->arch.mmu;
+
+	if (mmu == &kvm->arch.mmu) {
+		/*
+		 * For a normal (i.e. non-nested) guest, flush entries for the
+		 * given VMID.
+		 */
+		kvm_call_hyp(__kvm_tlb_flush_vmid, mmu);
+	} else {
+		/*
+		 * When supporting nested virtualization, we can have multiple
+		 * VMIDs in play for each VCPU in the VM, so it's really not
+		 * worth it to try to quiesce the system and flush all the
+		 * VMIDs that may be in use, instead just nuke the whole thing.
+		 */
+		kvm_call_hyp(__kvm_flush_vm_context);
+	}
 }
 
 static void kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu, phys_addr_t ipa)
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 176+ messages in thread

* [PATCH 45/59] KVM: arm64: nv: Handle traps for timer _EL02 and _EL2 sysregs accessors
  2019-06-21  9:37 [PATCH 00/59] KVM: arm64: ARMv8.3 Nested Virtualization support Marc Zyngier
                   ` (43 preceding siblings ...)
  2019-06-21  9:38 ` [PATCH 44/59] KVM: arm64: nv: Trap and emulate TLBI " Marc Zyngier
@ 2019-06-21  9:38 ` Marc Zyngier
  2019-06-21  9:38 ` [PATCH 46/59] KVM: arm64: nv: arch_timer: Support hyp timer emulation Marc Zyngier
                   ` (15 subsequent siblings)
  60 siblings, 0 replies; 176+ messages in thread
From: Marc Zyngier @ 2019-06-21  9:38 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	Julien Thierry, James Morse, Suzuki K Poulose

From: Andre Przywara <andre.przywara@arm.com>

Add trap handlers for the timer system registers accessed from a guest
hypervisor using either _EL02 or _EL2 system register access
instructions.
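
For illustration only, a trapped access could be routed through the
generic timer accessors along these lines. This is a sketch, not the
literal dispatch code, and it assumes the TIMER_HPTIMER/TIMER_HVTIMER
contexts introduced later in this series:

	static bool access_hyp_timer_sketch(struct kvm_vcpu *vcpu,
					    struct sys_reg_params *p,
					    const struct sys_reg_desc *r)
	{
		int enc = sys_insn(p->Op0, p->Op1, p->CRn, p->CRm, p->Op2);
		enum kvm_arch_timers tmr;
		enum kvm_arch_timer_regs treg;

		switch (enc) {
		case SYS_CNTHP_CTL_EL2:		/* hyp physical timer */
			tmr = TIMER_HPTIMER;
			treg = TIMER_REG_CTL;
			break;
		case SYS_CNTHV_CVAL_EL2:	/* hyp virtual timer */
			tmr = TIMER_HVTIMER;
			treg = TIMER_REG_CVAL;
			break;
		default:
			return false;
		}

		if (p->is_write)
			kvm_arm_timer_write_sysreg(vcpu, tmr, treg, p->regval);
		else
			p->regval = kvm_arm_timer_read_sysreg(vcpu, tmr, treg);

		return true;
	}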

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm64/include/asm/sysreg.h |  6 ++++++
 arch/arm64/kvm/sys_regs.c       | 15 +++++++++++++++
 2 files changed, 21 insertions(+)

diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index e0912ececd92..c24f1550d2d7 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -500,6 +500,12 @@
 
 #define SYS_CNTVOFF_EL2			sys_reg(3, 4, 14, 0, 3)
 #define SYS_CNTHCTL_EL2			sys_reg(3, 4, 14, 1, 0)
+#define SYS_CNTHP_TVAL_EL2		sys_reg(3, 4, 14, 2, 0)
+#define SYS_CNTHP_CTL_EL2		sys_reg(3, 4, 14, 2, 1)
+#define SYS_CNTHP_CVAL_EL2		sys_reg(3, 4, 14, 2, 2)
+#define SYS_CNTHV_TVAL_EL2		sys_reg(3, 4, 14, 3, 0)
+#define SYS_CNTHV_CTL_EL2		sys_reg(3, 4, 14, 3, 1)
+#define SYS_CNTHV_CVAL_EL2		sys_reg(3, 4, 14, 3, 2)
 
 /* VHE encodings for architectural EL0/1 system registers */
 #define SYS_SCTLR_EL12			sys_reg(3, 5, 1, 0, 0)
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 0343682fe47f..1b8016330a19 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -2125,6 +2125,13 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 	{ SYS_DESC(SYS_CNTVOFF_EL2), access_rw, reset_val, CNTVOFF_EL2, 0 },
 	{ SYS_DESC(SYS_CNTHCTL_EL2), access_rw, reset_val, CNTHCTL_EL2, 0 },
 
+	{ SYS_DESC(SYS_CNTHP_TVAL_EL2), access_arch_timer },
+	{ SYS_DESC(SYS_CNTHP_CTL_EL2), access_arch_timer },
+	{ SYS_DESC(SYS_CNTHP_CVAL_EL2), access_arch_timer },
+	{ SYS_DESC(SYS_CNTHV_TVAL_EL2), access_arch_timer },
+	{ SYS_DESC(SYS_CNTHV_CTL_EL2), access_arch_timer },
+	{ SYS_DESC(SYS_CNTHV_CVAL_EL2), access_arch_timer },
+
 	{ SYS_DESC(SYS_SCTLR_EL12), access_vm_reg, reset_val, SCTLR_EL1, 0x00C50078 },
 	{ SYS_DESC(SYS_CPACR_EL12), access_rw, reset_val, CPACR_EL1, 0 },
 	{ SYS_DESC(SYS_TTBR0_EL12), access_vm_reg, reset_unknown, TTBR0_EL1 },
@@ -2142,6 +2149,14 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 	{ SYS_DESC(SYS_CONTEXTIDR_EL12), access_vm_reg, reset_val, CONTEXTIDR_EL1, 0 },
 	{ SYS_DESC(SYS_CNTKCTL_EL12), access_rw, reset_val, CNTKCTL_EL1, 0 },
 
+	{ SYS_DESC(SYS_CNTP_TVAL_EL02), access_arch_timer },
+	{ SYS_DESC(SYS_CNTP_CTL_EL02), access_arch_timer },
+	{ SYS_DESC(SYS_CNTP_CVAL_EL02), access_arch_timer },
+
+	{ SYS_DESC(SYS_CNTV_TVAL_EL02), access_arch_timer },
+	{ SYS_DESC(SYS_CNTV_CTL_EL02), access_arch_timer },
+	{ SYS_DESC(SYS_CNTV_CVAL_EL02), access_arch_timer },
+
 	{ SYS_DESC(SYS_SP_EL2), NULL, reset_unknown, SP_EL2 },
 };
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 176+ messages in thread

* [PATCH 46/59] KVM: arm64: nv: arch_timer: Support hyp timer emulation
  2019-06-21  9:37 [PATCH 00/59] KVM: arm64: ARMv8.3 Nested Virtualization support Marc Zyngier
                   ` (44 preceding siblings ...)
  2019-06-21  9:38 ` [PATCH 45/59] KVM: arm64: nv: Handle traps for timer _EL02 and _EL2 sysregs accessors Marc Zyngier
@ 2019-06-21  9:38 ` Marc Zyngier
  2019-07-10 16:23   ` Alexandru Elisei
  2019-06-21  9:38 ` [PATCH 47/59] KVM: arm64: nv: Propagate CNTVOFF_EL2 to the virtual EL1 timer Marc Zyngier
                   ` (14 subsequent siblings)
  60 siblings, 1 reply; 176+ messages in thread
From: Marc Zyngier @ 2019-06-21  9:38 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	Julien Thierry, James Morse, Suzuki K Poulose

From: Christoffer Dall <christoffer.dall@arm.com>

Emulating EL2 also means emulating the EL2 timers. To do so, we expand
our timer framework to deal with at most 4 timers. At any given time,
two timers are using the HW timers, and the two others are purely
emulated.

Deciding which is which at any given time is left to a mapping
function, which is called every time we need to make such a
decision.
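
Concretely, the mapping works out as sketched below. This is an
illustration of the intended invariant rather than code from the
patch (get_timer_map() itself is static to arch_timer.c; the other
helpers are the ones used elsewhere in this series):

	struct timer_map map;

	get_timer_map(vcpu, &map);

	if (nested_virt_in_use(vcpu) && is_hyp_ctxt(vcpu)) {
		/* vEL2 context: the EL2 timers are backed by HW... */
		WARN_ON(map.direct_vtimer != vcpu_hvtimer(vcpu));
		/* ...and the EL1 timers are purely emulated */
		WARN_ON(map.emul_ptimer != vcpu_ptimer(vcpu));
	} else if (nested_virt_in_use(vcpu)) {
		/* vEL1/0 context: the roles are swapped */
		WARN_ON(map.direct_vtimer != vcpu_vtimer(vcpu));
		WARN_ON(map.emul_ptimer != vcpu_hptimer(vcpu));
	}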

Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm/include/asm/kvm_emulate.h |   2 +
 include/kvm/arm_arch_timer.h       |   5 ++
 include/kvm/arm_vgic.h             |   1 +
 virt/kvm/arm/arch_timer.c          | 122 ++++++++++++++++++++++++++++-
 virt/kvm/arm/trace.h               |   6 +-
 virt/kvm/arm/vgic/vgic.c           |  15 ++++
 6 files changed, 147 insertions(+), 4 deletions(-)

diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
index 6b7644a383f6..865ce545b465 100644
--- a/arch/arm/include/asm/kvm_emulate.h
+++ b/arch/arm/include/asm/kvm_emulate.h
@@ -333,4 +333,6 @@ static inline unsigned long vcpu_data_host_to_guest(struct kvm_vcpu *vcpu,
 
 static inline void vcpu_ptrauth_setup_lazy(struct kvm_vcpu *vcpu) {}
 
+static inline bool is_hyp_ctxt(struct kvm_vcpu *vcpu) { return false; }
+
 #endif /* __ARM_KVM_EMULATE_H__ */
diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
index d120e6c323e7..3a5d9255120e 100644
--- a/include/kvm/arm_arch_timer.h
+++ b/include/kvm/arm_arch_timer.h
@@ -13,6 +13,8 @@
 enum kvm_arch_timers {
 	TIMER_PTIMER,
 	TIMER_VTIMER,
+	TIMER_HVTIMER,
+	TIMER_HPTIMER,
 	NR_KVM_TIMERS
 };
 
@@ -54,6 +56,7 @@ struct arch_timer_context {
 struct timer_map {
 	struct arch_timer_context *direct_vtimer;
 	struct arch_timer_context *direct_ptimer;
+	struct arch_timer_context *emul_vtimer;
 	struct arch_timer_context *emul_ptimer;
 };
 
@@ -98,6 +101,8 @@ bool kvm_arch_timer_get_input_level(int vintid);
 #define vcpu_get_timer(v,t)	(&vcpu_timer(v)->timers[(t)])
 #define vcpu_vtimer(v)	(&(v)->arch.timer_cpu.timers[TIMER_VTIMER])
 #define vcpu_ptimer(v)	(&(v)->arch.timer_cpu.timers[TIMER_PTIMER])
+#define vcpu_hvtimer(v)	(&(v)->arch.timer_cpu.timers[TIMER_HVTIMER])
+#define vcpu_hptimer(v)	(&(v)->arch.timer_cpu.timers[TIMER_HPTIMER])
 
 #define arch_timer_ctx_index(ctx)	((ctx) - vcpu_timer((ctx)->vcpu)->timers)
 
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index c36c86f1ec9a..7fc3b413b3de 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -355,6 +355,7 @@ int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int intid,
 int kvm_vgic_map_phys_irq(struct kvm_vcpu *vcpu, unsigned int host_irq,
 			  u32 vintid, bool (*get_input_level)(int vindid));
 int kvm_vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, unsigned int vintid);
+int kvm_vgic_get_map(struct kvm_vcpu *vcpu, unsigned int vintid);
 bool kvm_vgic_map_is_active(struct kvm_vcpu *vcpu, unsigned int vintid);
 
 int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu);
diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
index 089441a07ed7..3d84c240071d 100644
--- a/virt/kvm/arm/arch_timer.c
+++ b/virt/kvm/arm/arch_timer.c
@@ -15,6 +15,7 @@
 #include <asm/arch_timer.h>
 #include <asm/kvm_emulate.h>
 #include <asm/kvm_hyp.h>
+#include <asm/kvm_nested.h>
 
 #include <kvm/arm_vgic.h>
 #include <kvm/arm_arch_timer.h>
@@ -39,6 +40,16 @@ static const struct kvm_irq_level default_vtimer_irq = {
 	.level	= 1,
 };
 
+static const struct kvm_irq_level default_hptimer_irq = {
+	.irq	= 26,
+	.level	= 1,
+};
+
+static const struct kvm_irq_level default_hvtimer_irq = {
+	.irq	= 28,
+	.level	= 1,
+};
+
 static bool kvm_timer_irq_can_fire(struct arch_timer_context *timer_ctx);
 static void kvm_timer_update_irq(struct kvm_vcpu *vcpu, bool new_level,
 				 struct arch_timer_context *timer_ctx);
@@ -58,13 +69,27 @@ u64 kvm_phys_timer_read(void)
 
 static void get_timer_map(struct kvm_vcpu *vcpu, struct timer_map *map)
 {
-	if (has_vhe()) {
+	if (nested_virt_in_use(vcpu)) {
+		if (is_hyp_ctxt(vcpu)) {
+			map->direct_vtimer = vcpu_hvtimer(vcpu);
+			map->direct_ptimer = vcpu_hptimer(vcpu);
+			map->emul_vtimer = vcpu_vtimer(vcpu);
+			map->emul_ptimer = vcpu_ptimer(vcpu);
+		} else {
+			map->direct_vtimer = vcpu_vtimer(vcpu);
+			map->direct_ptimer = vcpu_ptimer(vcpu);
+			map->emul_vtimer = vcpu_hvtimer(vcpu);
+			map->emul_ptimer = vcpu_hptimer(vcpu);
+		}
+	} else if (has_vhe()) {
 		map->direct_vtimer = vcpu_vtimer(vcpu);
 		map->direct_ptimer = vcpu_ptimer(vcpu);
+		map->emul_vtimer = NULL;
 		map->emul_ptimer = NULL;
 	} else {
 		map->direct_vtimer = vcpu_vtimer(vcpu);
 		map->direct_ptimer = NULL;
+		map->emul_vtimer = NULL;
 		map->emul_ptimer = vcpu_ptimer(vcpu);
 	}
 
@@ -237,9 +262,11 @@ static bool kvm_timer_should_fire(struct arch_timer_context *timer_ctx)
 
 		switch (index) {
 		case TIMER_VTIMER:
+		case TIMER_HVTIMER:
 			cnt_ctl = read_sysreg_el0(SYS_CNTV_CTL);
 			break;
 		case TIMER_PTIMER:
+		case TIMER_HPTIMER:
 			cnt_ctl = read_sysreg_el0(SYS_CNTP_CTL);
 			break;
 		case NR_KVM_TIMERS:
@@ -270,6 +297,7 @@ bool kvm_timer_is_pending(struct kvm_vcpu *vcpu)
 
 	return kvm_timer_should_fire(map.direct_vtimer) ||
 	       kvm_timer_should_fire(map.direct_ptimer) ||
+	       kvm_timer_should_fire(map.emul_vtimer) ||
 	       kvm_timer_should_fire(map.emul_ptimer);
 }
 
@@ -349,6 +377,7 @@ static void timer_save_state(struct arch_timer_context *ctx)
 
 	switch (index) {
 	case TIMER_VTIMER:
+	case TIMER_HVTIMER:
 		ctx->cnt_ctl = read_sysreg_el0(SYS_CNTV_CTL);
 		ctx->cnt_cval = read_sysreg_el0(SYS_CNTV_CVAL);
 
@@ -358,6 +387,7 @@ static void timer_save_state(struct arch_timer_context *ctx)
 
 		break;
 	case TIMER_PTIMER:
+	case TIMER_HPTIMER:
 		ctx->cnt_ctl = read_sysreg_el0(SYS_CNTP_CTL);
 		ctx->cnt_cval = read_sysreg_el0(SYS_CNTP_CVAL);
 
@@ -395,6 +425,7 @@ static void kvm_timer_blocking(struct kvm_vcpu *vcpu)
 	 */
 	if (!kvm_timer_irq_can_fire(map.direct_vtimer) &&
 	    !kvm_timer_irq_can_fire(map.direct_ptimer) &&
+	    !kvm_timer_irq_can_fire(map.emul_vtimer) &&
 	    !kvm_timer_irq_can_fire(map.emul_ptimer))
 		return;
 
@@ -428,11 +459,13 @@ static void timer_restore_state(struct arch_timer_context *ctx)
 
 	switch (index) {
 	case TIMER_VTIMER:
+	case TIMER_HVTIMER:
 		write_sysreg_el0(ctx->cnt_cval, SYS_CNTV_CVAL);
 		isb();
 		write_sysreg_el0(ctx->cnt_ctl, SYS_CNTV_CTL);
 		break;
 	case TIMER_PTIMER:
+	case TIMER_HPTIMER:
 		write_sysreg_el0(ctx->cnt_cval, SYS_CNTP_CVAL);
 		isb();
 		write_sysreg_el0(ctx->cnt_ctl, SYS_CNTP_CTL);
@@ -519,6 +552,40 @@ static void kvm_timer_vcpu_load_nogic(struct kvm_vcpu *vcpu)
 		enable_percpu_irq(host_vtimer_irq, host_vtimer_irq_flags);
 }
 
+static void kvm_timer_vcpu_load_nested_switch(struct kvm_vcpu *vcpu,
+					      struct timer_map *map)
+{
+	int hw, ret;
+
+	if (!irqchip_in_kernel(vcpu->kvm))
+		return;
+
+	/*
+	 * We only ever unmap the vtimer irq on a VHE system that runs nested
+	 * virtualization, in which case we have a valid emul_vtimer,
+	 * emul_ptimer, direct_vtimer, and direct_ptimer.
+	 *
+	 * Since this is called from kvm_timer_vcpu_load(), a change between
+	 * vEL2 and vEL1/0 will have just happened, and the timer_map will
+	 * represent this, and therefore we switch the emul/direct mappings
+	 * below.
+	 */
+	hw = kvm_vgic_get_map(vcpu, map->direct_vtimer->irq.irq);
+	if (hw < 0) {
+		kvm_vgic_unmap_phys_irq(vcpu, map->emul_vtimer->irq.irq);
+		kvm_vgic_unmap_phys_irq(vcpu, map->emul_ptimer->irq.irq);
+
+		ret = kvm_vgic_map_phys_irq(vcpu,
+					    map->direct_vtimer->host_timer_irq,
+					    map->direct_vtimer->irq.irq,
+					    kvm_arch_timer_get_input_level);
+		ret = kvm_vgic_map_phys_irq(vcpu,
+					    map->direct_ptimer->host_timer_irq,
+					    map->direct_ptimer->irq.irq,
+					    kvm_arch_timer_get_input_level);
+	}
+}
+
 void kvm_timer_vcpu_load(struct kvm_vcpu *vcpu)
 {
 	struct arch_timer_cpu *timer = vcpu_timer(vcpu);
@@ -530,6 +597,8 @@ void kvm_timer_vcpu_load(struct kvm_vcpu *vcpu)
 	get_timer_map(vcpu, &map);
 
 	if (static_branch_likely(&has_gic_active_state)) {
+		kvm_timer_vcpu_load_nested_switch(vcpu, &map);
+
 		kvm_timer_vcpu_load_gic(map.direct_vtimer);
 		if (map.direct_ptimer)
 			kvm_timer_vcpu_load_gic(map.direct_ptimer);
@@ -545,6 +614,8 @@ void kvm_timer_vcpu_load(struct kvm_vcpu *vcpu)
 	if (map.direct_ptimer)
 		timer_restore_state(map.direct_ptimer);
 
+	if (map.emul_vtimer)
+		timer_emulate(map.emul_vtimer);
 	if (map.emul_ptimer)
 		timer_emulate(map.emul_ptimer);
 }
@@ -589,6 +660,8 @@ void kvm_timer_vcpu_put(struct kvm_vcpu *vcpu)
 	 * In any case, we re-schedule the hrtimer for the physical timer when
 	 * coming back to the VCPU thread in kvm_timer_vcpu_load().
 	 */
+	if (map.emul_vtimer)
+		soft_timer_cancel(&map.emul_vtimer->hrtimer);
 	if (map.emul_ptimer)
 		soft_timer_cancel(&map.emul_ptimer->hrtimer);
 
@@ -649,10 +722,14 @@ int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu)
 	 */
 	vcpu_vtimer(vcpu)->cnt_ctl = 0;
 	vcpu_ptimer(vcpu)->cnt_ctl = 0;
+	vcpu_hvtimer(vcpu)->cnt_ctl = 0;
+	vcpu_hptimer(vcpu)->cnt_ctl = 0;
 
 	if (timer->enabled) {
 		kvm_timer_update_irq(vcpu, false, vcpu_vtimer(vcpu));
 		kvm_timer_update_irq(vcpu, false, vcpu_ptimer(vcpu));
+		kvm_timer_update_irq(vcpu, false, vcpu_hvtimer(vcpu));
+		kvm_timer_update_irq(vcpu, false, vcpu_hptimer(vcpu));
 
 		if (irqchip_in_kernel(vcpu->kvm)) {
 			kvm_vgic_reset_mapped_irq(vcpu, map.direct_vtimer->irq.irq);
@@ -661,6 +738,8 @@ int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu)
 		}
 	}
 
+	if (map.emul_vtimer)
+		soft_timer_cancel(&map.emul_vtimer->hrtimer);
 	if (map.emul_ptimer)
 		soft_timer_cancel(&map.emul_ptimer->hrtimer);
 
@@ -691,30 +770,46 @@ void kvm_timer_vcpu_init(struct kvm_vcpu *vcpu)
 	struct arch_timer_cpu *timer = vcpu_timer(vcpu);
 	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
 	struct arch_timer_context *ptimer = vcpu_ptimer(vcpu);
+	struct arch_timer_context *hvtimer = vcpu_hvtimer(vcpu);
+	struct arch_timer_context *hptimer = vcpu_hptimer(vcpu);
 
 	/* Synchronize cntvoff across all vtimers of a VM. */
 	update_vtimer_cntvoff(vcpu, kvm_phys_timer_read());
 	ptimer->cntvoff = 0;
+	hvtimer->cntvoff = 0;
+	hptimer->cntvoff = 0;
 
 	hrtimer_init(&timer->bg_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
 	timer->bg_timer.function = kvm_bg_timer_expire;
 
 	hrtimer_init(&vtimer->hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
 	hrtimer_init(&ptimer->hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
+	hrtimer_init(&hvtimer->hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
+	hrtimer_init(&hptimer->hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
 	vtimer->hrtimer.function = kvm_hrtimer_expire;
 	ptimer->hrtimer.function = kvm_hrtimer_expire;
+	hvtimer->hrtimer.function = kvm_hrtimer_expire;
+	hptimer->hrtimer.function = kvm_hrtimer_expire;
 
 	vtimer->irq.irq = default_vtimer_irq.irq;
 	ptimer->irq.irq = default_ptimer_irq.irq;
+	hvtimer->irq.irq = default_hvtimer_irq.irq;
+	hptimer->irq.irq = default_hptimer_irq.irq;
 
 	vtimer->host_timer_irq = host_vtimer_irq;
 	ptimer->host_timer_irq = host_ptimer_irq;
+	hvtimer->host_timer_irq = host_vtimer_irq;
+	hptimer->host_timer_irq = host_ptimer_irq;
 
 	vtimer->host_timer_irq_flags = host_vtimer_irq_flags;
 	ptimer->host_timer_irq_flags = host_ptimer_irq_flags;
+	hvtimer->host_timer_irq_flags = host_vtimer_irq_flags;
+	hptimer->host_timer_irq_flags = host_ptimer_irq_flags;
 
 	vtimer->vcpu = vcpu;
 	ptimer->vcpu = vcpu;
+	hvtimer->vcpu = vcpu;
+	hptimer->vcpu = vcpu;
 }
 
 static void kvm_timer_init_interrupt(void *info)
@@ -997,7 +1092,7 @@ void kvm_timer_vcpu_terminate(struct kvm_vcpu *vcpu)
 
 static bool timer_irqs_are_valid(struct kvm_vcpu *vcpu)
 {
-	int vtimer_irq, ptimer_irq;
+	int vtimer_irq, ptimer_irq, hvtimer_irq, hptimer_irq;
 	int i, ret;
 
 	vtimer_irq = vcpu_vtimer(vcpu)->irq.irq;
@@ -1010,9 +1105,21 @@ static bool timer_irqs_are_valid(struct kvm_vcpu *vcpu)
 	if (ret)
 		return false;
 
+	hvtimer_irq = vcpu_hvtimer(vcpu)->irq.irq;
+	ret = kvm_vgic_set_owner(vcpu, hvtimer_irq, vcpu_hvtimer(vcpu));
+	if (ret)
+		return false;
+
+	hptimer_irq = vcpu_hptimer(vcpu)->irq.irq;
+	ret = kvm_vgic_set_owner(vcpu, hptimer_irq, vcpu_hptimer(vcpu));
+	if (ret)
+		return false;
+
 	kvm_for_each_vcpu(i, vcpu, vcpu->kvm) {
 		if (vcpu_vtimer(vcpu)->irq.irq != vtimer_irq ||
-		    vcpu_ptimer(vcpu)->irq.irq != ptimer_irq)
+		    vcpu_ptimer(vcpu)->irq.irq != ptimer_irq ||
+		    vcpu_hvtimer(vcpu)->irq.irq != hvtimer_irq ||
+		    vcpu_hptimer(vcpu)->irq.irq != hptimer_irq)
 			return false;
 	}
 
@@ -1028,6 +1135,10 @@ bool kvm_arch_timer_get_input_level(int vintid)
 		timer = vcpu_vtimer(vcpu);
 	else if (vintid == vcpu_ptimer(vcpu)->irq.irq)
 		timer = vcpu_ptimer(vcpu);
+	else if (vintid == vcpu_hvtimer(vcpu)->irq.irq)
+		timer = vcpu_hvtimer(vcpu);
+	else if (vintid == vcpu_hptimer(vcpu)->irq.irq)
+		timer = vcpu_hptimer(vcpu);
 	else
 		BUG();
 
@@ -1109,6 +1220,7 @@ static void set_timer_irqs(struct kvm *kvm, int vtimer_irq, int ptimer_irq)
 	kvm_for_each_vcpu(i, vcpu, kvm) {
 		vcpu_vtimer(vcpu)->irq.irq = vtimer_irq;
 		vcpu_ptimer(vcpu)->irq.irq = ptimer_irq;
+		/* TODO: Add support for hv/hp timers */
 	}
 }
 
@@ -1119,6 +1231,8 @@ int kvm_arm_timer_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
 	struct arch_timer_context *ptimer = vcpu_ptimer(vcpu);
 	int irq;
 
+	/* TODO: Add support for hv/hp timers */
+
 	if (!irqchip_in_kernel(vcpu->kvm))
 		return -EINVAL;
 
@@ -1151,6 +1265,8 @@ int kvm_arm_timer_get_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
 	struct arch_timer_context *timer;
 	int irq;
 
+	/* TODO: Add support for hv/hp timers */
+
 	switch (attr->attr) {
 	case KVM_ARM_VCPU_TIMER_IRQ_VTIMER:
 		timer = vcpu_vtimer(vcpu);
diff --git a/virt/kvm/arm/trace.h b/virt/kvm/arm/trace.h
index 204d210d01c2..3b08cc0376f4 100644
--- a/virt/kvm/arm/trace.h
+++ b/virt/kvm/arm/trace.h
@@ -271,6 +271,7 @@ TRACE_EVENT(kvm_get_timer_map,
 		__field(	unsigned long,		vcpu_id	)
 		__field(	int,			direct_vtimer	)
 		__field(	int,			direct_ptimer	)
+		__field(	int,			emul_vtimer	)
 		__field(	int,			emul_ptimer	)
 	),
 
@@ -279,14 +280,17 @@ TRACE_EVENT(kvm_get_timer_map,
 		__entry->direct_vtimer		= arch_timer_ctx_index(map->direct_vtimer);
 		__entry->direct_ptimer =
 			(map->direct_ptimer) ? arch_timer_ctx_index(map->direct_ptimer) : -1;
+		__entry->emul_vtimer =
+			(map->emul_vtimer) ? arch_timer_ctx_index(map->emul_vtimer) : -1;
 		__entry->emul_ptimer =
 			(map->emul_ptimer) ? arch_timer_ctx_index(map->emul_ptimer) : -1;
 	),
 
-	TP_printk("VCPU: %ld, dv: %d, dp: %d, ep: %d",
+	TP_printk("VCPU: %ld, dv: %d, dp: %d, ev: %d, ep: %d",
 		  __entry->vcpu_id,
 		  __entry->direct_vtimer,
 		  __entry->direct_ptimer,
+		  __entry->emul_vtimer,
 		  __entry->emul_ptimer)
 );
 
diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
index 191deccf60bf..1c5b4dbd33e4 100644
--- a/virt/kvm/arm/vgic/vgic.c
+++ b/virt/kvm/arm/vgic/vgic.c
@@ -569,6 +569,21 @@ int kvm_vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, unsigned int vintid)
 	return 0;
 }
 
+int kvm_vgic_get_map(struct kvm_vcpu *vcpu, unsigned int vintid)
+{
+	struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, vintid);
+	unsigned long flags;
+	int ret = -1;
+
+	raw_spin_lock_irqsave(&irq->irq_lock, flags);
+	if (irq->hw)
+		ret = irq->hwintid;
+	raw_spin_unlock_irqrestore(&irq->irq_lock, flags);
+
+	vgic_put_irq(vcpu->kvm, irq);
+	return ret;
+}
+
 /**
  * kvm_vgic_set_owner - Set the owner of an interrupt for a VM
  *
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 176+ messages in thread

* [PATCH 47/59] KVM: arm64: nv: Propagate CNTVOFF_EL2 to the virtual EL1 timer
  2019-06-21  9:37 [PATCH 00/59] KVM: arm64: ARMv8.3 Nested Virtualization support Marc Zyngier
                   ` (45 preceding siblings ...)
  2019-06-21  9:38 ` [PATCH 46/59] KVM: arm64: nv: arch_timer: Support hyp timer emulation Marc Zyngier
@ 2019-06-21  9:38 ` Marc Zyngier
  2019-08-08  9:34   ` Alexandru Elisei
  2019-06-21  9:38 ` [PATCH 48/59] KVM: arm64: nv: Load timer before the GIC Marc Zyngier
                   ` (13 subsequent siblings)
  60 siblings, 1 reply; 176+ messages in thread
From: Marc Zyngier @ 2019-06-21  9:38 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	Julien Thierry, James Morse, Suzuki K Poulose

We need to allow a guest hypervisor to virtualize the virtual timer.
For that, let's propagate CNTVOFF_EL2 to the guest's view of that
timer.
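
With CNTVOFF_EL2 stored in the vtimer context, the guest's virtual
counter then derives from the physical counter in the usual way; a
minimal sketch of the arithmetic (guest_cntvct() is a hypothetical
helper; the patch plumbs this through TIMER_REG_VOFF instead):

	/* CNTVCT = CNTPCT - CNTVOFF_EL2 */
	static u64 guest_cntvct(struct arch_timer_context *vtimer)
	{
		return kvm_phys_timer_read() - vtimer->cntvoff;
	}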

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm64/include/asm/kvm_host.h |  1 -
 arch/arm64/kvm/sys_regs.c         |  8 ++++++--
 include/kvm/arm_arch_timer.h      |  1 +
 virt/kvm/arm/arch_timer.c         | 12 ++++++++++++
 4 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index b7c44adcdbf3..e0fe9acb46bf 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -252,7 +252,6 @@ enum vcpu_sysreg {
 	RMR_EL2,	/* Reset Management Register */
 	CONTEXTIDR_EL2,	/* Context ID Register (EL2) */
 	TPIDR_EL2,	/* EL2 Software Thread ID Register */
-	CNTVOFF_EL2,	/* Counter-timer Virtual Offset register */
 	CNTHCTL_EL2,	/* Counter-timer Hypervisor Control register */
 	SP_EL2,		/* EL2 Stack Pointer */
 
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 1b8016330a19..2031a59fcf49 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -150,7 +150,6 @@ struct el2_sysreg_map {
 	PURE_EL2_SYSREG( RVBAR_EL2 ),
 	PURE_EL2_SYSREG( RMR_EL2 ),
 	PURE_EL2_SYSREG( TPIDR_EL2 ),
-	PURE_EL2_SYSREG( CNTVOFF_EL2 ),
 	PURE_EL2_SYSREG( CNTHCTL_EL2 ),
 	PURE_EL2_SYSREG( HPFAR_EL2 ),
 	EL2_SYSREG(      SCTLR_EL2,  SCTLR_EL1,      translate_sctlr ),
@@ -1351,6 +1350,11 @@ static bool access_arch_timer(struct kvm_vcpu *vcpu,
 		tmr = TIMER_PTIMER;
 		treg = TIMER_REG_CVAL;
 		break;
+	case SYS_CNTVOFF_EL2:
+		tmr = TIMER_VTIMER;
+		treg = TIMER_REG_VOFF;
+		break;
+
 	default:
 		BUG();
 	}
@@ -2122,7 +2126,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 	{ SYS_DESC(SYS_CONTEXTIDR_EL2), access_rw, reset_val, CONTEXTIDR_EL2, 0 },
 	{ SYS_DESC(SYS_TPIDR_EL2), access_rw, reset_val, TPIDR_EL2, 0 },
 
-	{ SYS_DESC(SYS_CNTVOFF_EL2), access_rw, reset_val, CNTVOFF_EL2, 0 },
+	{ SYS_DESC(SYS_CNTVOFF_EL2), access_arch_timer },
 	{ SYS_DESC(SYS_CNTHCTL_EL2), access_rw, reset_val, CNTHCTL_EL2, 0 },
 
 	{ SYS_DESC(SYS_CNTHP_TVAL_EL2), access_arch_timer },
diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
index 3a5d9255120e..3389606f3029 100644
--- a/include/kvm/arm_arch_timer.h
+++ b/include/kvm/arm_arch_timer.h
@@ -23,6 +23,7 @@ enum kvm_arch_timer_regs {
 	TIMER_REG_CVAL,
 	TIMER_REG_TVAL,
 	TIMER_REG_CTL,
+	TIMER_REG_VOFF,
 };
 
 struct arch_timer_context {
diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
index 3d84c240071d..1d53352c7d97 100644
--- a/virt/kvm/arm/arch_timer.c
+++ b/virt/kvm/arm/arch_timer.c
@@ -913,6 +913,10 @@ static u64 kvm_arm_timer_read(struct kvm_vcpu *vcpu,
 		val = kvm_phys_timer_read() - timer->cntvoff;
 		break;
 
+	case TIMER_REG_VOFF:
+		val = timer->cntvoff;
+		break;
+
 	default:
 		BUG();
 	}
@@ -955,6 +959,10 @@ static void kvm_arm_timer_write(struct kvm_vcpu *vcpu,
 		timer->cnt_cval = val;
 		break;
 
+	case TIMER_REG_VOFF:
+		timer->cntvoff = val;
+		break;
+
 	default:
 		BUG();
 	}
@@ -1166,6 +1174,10 @@ int kvm_timer_enable(struct kvm_vcpu *vcpu)
 		return -EINVAL;
 	}
 
+	/* Nested virtualization requires zero offset for virtual EL2 */
+	if (nested_virt_in_use(vcpu))
+		vcpu_vtimer(vcpu)->cntvoff = 0;
+
 	get_timer_map(vcpu, &map);
 
 	ret = kvm_vgic_map_phys_irq(vcpu,
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 176+ messages in thread

* [PATCH 48/59] KVM: arm64: nv: Load timer before the GIC
  2019-06-21  9:37 [PATCH 00/59] KVM: arm64: ARMv8.3 Nested Virtualization support Marc Zyngier
                   ` (46 preceding siblings ...)
  2019-06-21  9:38 ` [PATCH 47/59] KVM: arm64: nv: Propagate CNTVOFF_EL2 to the virtual EL1 timer Marc Zyngier
@ 2019-06-21  9:38 ` Marc Zyngier
  2019-07-11 13:17   ` Alexandru Elisei
  2019-06-21  9:38 ` [PATCH 49/59] KVM: arm64: nv: vgic-v3: Take cpu_if pointer directly instead of vcpu Marc Zyngier
                   ` (12 subsequent siblings)
  60 siblings, 1 reply; 176+ messages in thread
From: Marc Zyngier @ 2019-06-21  9:38 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	Julien Thierry, James Morse, Suzuki K Poulose

In order for vgic_v3_load_nested to be able to observe which
timer interrupts have the HW bit set for the current context,
the timers must have been loaded in the new mode and the right
timers mapped to their corresponding HW IRQs.

At the moment, we load the GIC first, meaning that timer
interrupts injected to an L2 guest will never have the HW
bit set (we see the old configuration).

Swapping the two loads solves this particular problem.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 virt/kvm/arm/arm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index e8b584b79847..ca10a11e044e 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -361,8 +361,8 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	vcpu->arch.host_cpu_context = &cpu_data->host_ctxt;
 
 	kvm_arm_set_running_vcpu(vcpu);
-	kvm_vgic_load(vcpu);
 	kvm_timer_vcpu_load(vcpu);
+	kvm_vgic_load(vcpu);
 	kvm_vcpu_load_sysregs(vcpu);
 	kvm_arch_vcpu_load_fp(vcpu);
 	kvm_vcpu_pmu_restore_guest(vcpu);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 176+ messages in thread

* [PATCH 49/59] KVM: arm64: nv: vgic-v3: Take cpu_if pointer directly instead of vcpu
  2019-06-21  9:37 [PATCH 00/59] KVM: arm64: ARMv8.3 Nested Virtualization support Marc Zyngier
                   ` (47 preceding siblings ...)
  2019-06-21  9:38 ` [PATCH 48/59] KVM: arm64: nv: Load timer before the GIC Marc Zyngier
@ 2019-06-21  9:38 ` Marc Zyngier
  2019-06-21  9:38 ` [PATCH 50/59] KVM: arm64: nv: Nested GICv3 Support Marc Zyngier
                   ` (11 subsequent siblings)
  60 siblings, 0 replies; 176+ messages in thread
From: Marc Zyngier @ 2019-06-21  9:38 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	Julien Thierry, James Morse, Suzuki K Poulose

From: Christoffer Dall <christoffer.dall@arm.com>

If we move the used_lrs field to the version-specific cpu interface
structure, the following functions only operate on the struct
vgic_v3_cpu_if and not the full vcpu:

  __vgic_v3_save_state
  __vgic_v3_restore_state
  __vgic_v3_activate_traps
  __vgic_v3_deactivate_traps
  __vgic_v3_save_aprs
  __vgic_v3_restore_aprs

This is going to be very useful for nested virt, so move the used_lrs
field and change the prototypes and implementations of these functions to
take the cpu_if parameter directly.

No functional change.
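
As a usage sketch (not part of the diff), a hyp-side caller now
passes the version-specific interface directly, which is what later
lets the nested code substitute a different cpu_if for the same vcpu:

	struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;

	__vgic_v3_save_state(cpu_if);
	__vgic_v3_deactivate_traps(cpu_if);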

Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm/include/asm/kvm_hyp.h   | 12 ++++++------
 arch/arm/kvm/hyp/switch.c        |  8 ++++----
 arch/arm64/include/asm/kvm_hyp.h | 12 ++++++------
 arch/arm64/kvm/hyp/switch.c      |  8 ++++----
 include/kvm/arm_vgic.h           |  5 ++++-
 virt/kvm/arm/hyp/vgic-v3-sr.c    | 33 ++++++++++----------------------
 virt/kvm/arm/vgic/vgic-v2.c      | 10 +++++-----
 virt/kvm/arm/vgic/vgic-v3.c      | 12 ++++++------
 virt/kvm/arm/vgic/vgic.c         | 25 ++++++++++++++++--------
 9 files changed, 62 insertions(+), 63 deletions(-)

diff --git a/arch/arm/include/asm/kvm_hyp.h b/arch/arm/include/asm/kvm_hyp.h
index 059224fb14db..da68073ab672 100644
--- a/arch/arm/include/asm/kvm_hyp.h
+++ b/arch/arm/include/asm/kvm_hyp.h
@@ -113,12 +113,12 @@ void __vgic_v2_restore_state(struct kvm_vcpu *vcpu);
 void __sysreg_save_state(struct kvm_cpu_context *ctxt);
 void __sysreg_restore_state(struct kvm_cpu_context *ctxt);
 
-void __vgic_v3_save_state(struct kvm_vcpu *vcpu);
-void __vgic_v3_restore_state(struct kvm_vcpu *vcpu);
-void __vgic_v3_activate_traps(struct kvm_vcpu *vcpu);
-void __vgic_v3_deactivate_traps(struct kvm_vcpu *vcpu);
-void __vgic_v3_save_aprs(struct kvm_vcpu *vcpu);
-void __vgic_v3_restore_aprs(struct kvm_vcpu *vcpu);
+void __vgic_v3_save_state(struct vgic_v3_cpu_if *cpu_if);
+void __vgic_v3_restore_state(struct vgic_v3_cpu_if *cpu_if);
+void __vgic_v3_activate_traps(struct vgic_v3_cpu_if *cpu_if);
+void __vgic_v3_deactivate_traps(struct vgic_v3_cpu_if *cpu_if);
+void __vgic_v3_save_aprs(struct vgic_v3_cpu_if *cpu_if);
+void __vgic_v3_restore_aprs(struct vgic_v3_cpu_if *cpu_if);
 
 asmlinkage void __vfp_save_state(struct vfp_hard_struct *vfp);
 asmlinkage void __vfp_restore_state(struct vfp_hard_struct *vfp);
diff --git a/arch/arm/kvm/hyp/switch.c b/arch/arm/kvm/hyp/switch.c
index 6e9c3f11bfa4..92513fb5bfa2 100644
--- a/arch/arm/kvm/hyp/switch.c
+++ b/arch/arm/kvm/hyp/switch.c
@@ -90,16 +90,16 @@ static void __hyp_text __deactivate_vm(struct kvm_vcpu *vcpu)
 static void __hyp_text __vgic_save_state(struct kvm_vcpu *vcpu)
 {
 	if (static_branch_unlikely(&kvm_vgic_global_state.gicv3_cpuif)) {
-		__vgic_v3_save_state(vcpu);
-		__vgic_v3_deactivate_traps(vcpu);
+		__vgic_v3_save_state(&vcpu->arch.vgic_cpu.vgic_v3);
+		__vgic_v3_deactivate_traps(&vcpu->arch.vgic_cpu.vgic_v3);
 	}
 }
 
 static void __hyp_text __vgic_restore_state(struct kvm_vcpu *vcpu)
 {
 	if (static_branch_unlikely(&kvm_vgic_global_state.gicv3_cpuif)) {
-		__vgic_v3_activate_traps(vcpu);
-		__vgic_v3_restore_state(vcpu);
+		__vgic_v3_activate_traps(&vcpu->arch.vgic_cpu.vgic_v3);
+		__vgic_v3_restore_state(&vcpu->arch.vgic_cpu.vgic_v3);
 	}
 }
 
diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
index e8044f265824..3ebbf307d2e2 100644
--- a/arch/arm64/include/asm/kvm_hyp.h
+++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -83,12 +83,12 @@ typeof(orig) * __hyp_text fname(void)					\
 
 int __vgic_v2_perform_cpuif_access(struct kvm_vcpu *vcpu);
 
-void __vgic_v3_save_state(struct kvm_vcpu *vcpu);
-void __vgic_v3_restore_state(struct kvm_vcpu *vcpu);
-void __vgic_v3_activate_traps(struct kvm_vcpu *vcpu);
-void __vgic_v3_deactivate_traps(struct kvm_vcpu *vcpu);
-void __vgic_v3_save_aprs(struct kvm_vcpu *vcpu);
-void __vgic_v3_restore_aprs(struct kvm_vcpu *vcpu);
+void __vgic_v3_save_state(struct vgic_v3_cpu_if *cpu_if);
+void __vgic_v3_restore_state(struct vgic_v3_cpu_if *cpu_if);
+void __vgic_v3_activate_traps(struct vgic_v3_cpu_if *cpu_if);
+void __vgic_v3_deactivate_traps(struct vgic_v3_cpu_if *cpu_if);
+void __vgic_v3_save_aprs(struct vgic_v3_cpu_if *cpu_if);
+void __vgic_v3_restore_aprs(struct vgic_v3_cpu_if *cpu_if);
 int __vgic_v3_perform_cpuif_access(struct kvm_vcpu *vcpu);
 
 void __timer_enable_traps(struct kvm_vcpu *vcpu);
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index bd9fc0dae8e8..0fba9d47ff56 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -271,8 +271,8 @@ static void __hyp_text __deactivate_vm(struct kvm_vcpu *vcpu)
 static void __hyp_text __hyp_vgic_save_state(struct kvm_vcpu *vcpu)
 {
 	if (static_branch_unlikely(&kvm_vgic_global_state.gicv3_cpuif)) {
-		__vgic_v3_save_state(vcpu);
-		__vgic_v3_deactivate_traps(vcpu);
+		__vgic_v3_save_state(&vcpu->arch.vgic_cpu.vgic_v3);
+		__vgic_v3_deactivate_traps(&vcpu->arch.vgic_cpu.vgic_v3);
 	}
 }
 
@@ -280,8 +280,8 @@ static void __hyp_text __hyp_vgic_save_state(struct kvm_vcpu *vcpu)
 static void __hyp_text __hyp_vgic_restore_state(struct kvm_vcpu *vcpu)
 {
 	if (static_branch_unlikely(&kvm_vgic_global_state.gicv3_cpuif)) {
-		__vgic_v3_activate_traps(vcpu);
-		__vgic_v3_restore_state(vcpu);
+		__vgic_v3_activate_traps(&vcpu->arch.vgic_cpu.vgic_v3);
+		__vgic_v3_restore_state(&vcpu->arch.vgic_cpu.vgic_v3);
 	}
 }
 
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 7fc3b413b3de..163b132e100e 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -278,6 +278,8 @@ struct vgic_v2_cpu_if {
 	u32		vgic_vmcr;
 	u32		vgic_apr;
 	u32		vgic_lr[VGIC_V2_MAX_LRS];
+
+	unsigned int used_lrs;
 };
 
 struct vgic_v3_cpu_if {
@@ -295,6 +297,8 @@ struct vgic_v3_cpu_if {
 	 * linking the Linux IRQ subsystem and the ITS together.
 	 */
 	struct its_vpe	its_vpe;
+
+	unsigned int used_lrs;
 };
 
 struct vgic_cpu {
@@ -304,7 +308,6 @@ struct vgic_cpu {
 		struct vgic_v3_cpu_if	vgic_v3;
 	};
 
-	unsigned int used_lrs;
 	struct vgic_irq private_irqs[VGIC_NR_PRIVATE_IRQS];
 
 	raw_spinlock_t ap_list_lock;	/* Protects the ap_list */
diff --git a/virt/kvm/arm/hyp/vgic-v3-sr.c b/virt/kvm/arm/hyp/vgic-v3-sr.c
index 370bd6c5e6cb..b5d818d8b458 100644
--- a/virt/kvm/arm/hyp/vgic-v3-sr.c
+++ b/virt/kvm/arm/hyp/vgic-v3-sr.c
@@ -205,10 +205,9 @@ static u32 __hyp_text __vgic_v3_read_ap1rn(int n)
 	return val;
 }
 
-void __hyp_text __vgic_v3_save_state(struct kvm_vcpu *vcpu)
+void __hyp_text __vgic_v3_save_state(struct vgic_v3_cpu_if *cpu_if)
 {
-	struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
-	u64 used_lrs = vcpu->arch.vgic_cpu.used_lrs;
+	u64 used_lrs = cpu_if->used_lrs;
 
 	/*
 	 * Make sure stores to the GIC via the memory mapped interface
@@ -241,10 +240,9 @@ void __hyp_text __vgic_v3_save_state(struct kvm_vcpu *vcpu)
 	}
 }
 
-void __hyp_text __vgic_v3_restore_state(struct kvm_vcpu *vcpu)
+void __hyp_text __vgic_v3_restore_state(struct vgic_v3_cpu_if *cpu_if)
 {
-	struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
-	u64 used_lrs = vcpu->arch.vgic_cpu.used_lrs;
+	u64 used_lrs = cpu_if->used_lrs;
 	int i;
 
 	if (used_lrs || cpu_if->its_vpe.its_vm) {
@@ -268,10 +266,8 @@ void __hyp_text __vgic_v3_restore_state(struct kvm_vcpu *vcpu)
 	}
 }
 
-void __hyp_text __vgic_v3_activate_traps(struct kvm_vcpu *vcpu)
+void __hyp_text __vgic_v3_activate_traps(struct vgic_v3_cpu_if *cpu_if)
 {
-	struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
-
 	/*
 	 * VFIQEn is RES1 if ICC_SRE_EL1.SRE is 1. This causes a
 	 * Group0 interrupt (as generated in GICv2 mode) to be
@@ -317,9 +313,8 @@ void __hyp_text __vgic_v3_activate_traps(struct kvm_vcpu *vcpu)
 		write_gicreg(cpu_if->vgic_hcr, ICH_HCR_EL2);
 }
 
-void __hyp_text __vgic_v3_deactivate_traps(struct kvm_vcpu *vcpu)
+void __hyp_text __vgic_v3_deactivate_traps(struct vgic_v3_cpu_if *cpu_if)
 {
-	struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
 	u64 val;
 
 	if (!cpu_if->vgic_sre) {
@@ -344,15 +339,11 @@ void __hyp_text __vgic_v3_deactivate_traps(struct kvm_vcpu *vcpu)
 		write_gicreg(0, ICH_HCR_EL2);
 }
 
-void __hyp_text __vgic_v3_save_aprs(struct kvm_vcpu *vcpu)
+void __hyp_text __vgic_v3_save_aprs(struct vgic_v3_cpu_if *cpu_if)
 {
-	struct vgic_v3_cpu_if *cpu_if;
 	u64 val;
 	u32 nr_pre_bits;
 
-	vcpu = kern_hyp_va(vcpu);
-	cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
-
 	val = read_gicreg(ICH_VTR_EL2);
 	nr_pre_bits = vtr_to_nr_pre_bits(val);
 
@@ -377,15 +368,11 @@ void __hyp_text __vgic_v3_save_aprs(struct kvm_vcpu *vcpu)
 	}
 }
 
-void __hyp_text __vgic_v3_restore_aprs(struct kvm_vcpu *vcpu)
+void __hyp_text __vgic_v3_restore_aprs(struct vgic_v3_cpu_if *cpu_if)
 {
-	struct vgic_v3_cpu_if *cpu_if;
 	u64 val;
 	u32 nr_pre_bits;
 
-	vcpu = kern_hyp_va(vcpu);
-	cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
-
 	val = read_gicreg(ICH_VTR_EL2);
 	nr_pre_bits = vtr_to_nr_pre_bits(val);
 
@@ -456,7 +443,7 @@ static int __hyp_text __vgic_v3_highest_priority_lr(struct kvm_vcpu *vcpu,
 						    u32 vmcr,
 						    u64 *lr_val)
 {
-	unsigned int used_lrs = vcpu->arch.vgic_cpu.used_lrs;
+	unsigned int used_lrs = vcpu->arch.vgic_cpu.vgic_v3.used_lrs;
 	u8 priority = GICv3_IDLE_PRIORITY;
 	int i, lr = -1;
 
@@ -495,7 +482,7 @@ static int __hyp_text __vgic_v3_highest_priority_lr(struct kvm_vcpu *vcpu,
 static int __hyp_text __vgic_v3_find_active_lr(struct kvm_vcpu *vcpu,
 					       int intid, u64 *lr_val)
 {
-	unsigned int used_lrs = vcpu->arch.vgic_cpu.used_lrs;
+	unsigned int used_lrs = vcpu->arch.vgic_cpu.vgic_v3.used_lrs;
 	int i;
 
 	for (i = 0; i < used_lrs; i++) {
diff --git a/virt/kvm/arm/vgic/vgic-v2.c b/virt/kvm/arm/vgic/vgic-v2.c
index d91a8938aa7c..66fdffe28ea9 100644
--- a/virt/kvm/arm/vgic/vgic-v2.c
+++ b/virt/kvm/arm/vgic/vgic-v2.c
@@ -67,7 +67,7 @@ void vgic_v2_fold_lr_state(struct kvm_vcpu *vcpu)
 
 	cpuif->vgic_hcr &= ~GICH_HCR_UIE;
 
-	for (lr = 0; lr < vgic_cpu->used_lrs; lr++) {
+	for (lr = 0; lr < vgic_cpu->vgic_v2.used_lrs; lr++) {
 		u32 val = cpuif->vgic_lr[lr];
 		u32 cpuid, intid = val & GICH_LR_VIRTUALID;
 		struct vgic_irq *irq;
@@ -131,7 +131,7 @@ void vgic_v2_fold_lr_state(struct kvm_vcpu *vcpu)
 		vgic_put_irq(vcpu->kvm, irq);
 	}
 
-	vgic_cpu->used_lrs = 0;
+	cpuif->used_lrs = 0;
 }
 
 /*
@@ -434,7 +434,7 @@ int vgic_v2_probe(const struct gic_kvm_info *info)
 static void save_lrs(struct kvm_vcpu *vcpu, void __iomem *base)
 {
 	struct vgic_v2_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v2;
-	u64 used_lrs = vcpu->arch.vgic_cpu.used_lrs;
+	u64 used_lrs = cpu_if->used_lrs;
 	u64 elrsr;
 	int i;
 
@@ -455,7 +455,7 @@ static void save_lrs(struct kvm_vcpu *vcpu, void __iomem *base)
 void vgic_v2_save_state(struct kvm_vcpu *vcpu)
 {
 	void __iomem *base = kvm_vgic_global_state.vctrl_base;
-	u64 used_lrs = vcpu->arch.vgic_cpu.used_lrs;
+	u64 used_lrs = vcpu->arch.vgic_cpu.vgic_v2.used_lrs;
 
 	if (!base)
 		return;
@@ -470,7 +470,7 @@ void vgic_v2_restore_state(struct kvm_vcpu *vcpu)
 {
 	struct vgic_v2_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v2;
 	void __iomem *base = kvm_vgic_global_state.vctrl_base;
-	u64 used_lrs = vcpu->arch.vgic_cpu.used_lrs;
+	u64 used_lrs = cpu_if->used_lrs;
 	int i;
 
 	if (!base)
diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-v3.c
index 9f87e58dbd4a..77d23e817756 100644
--- a/virt/kvm/arm/vgic/vgic-v3.c
+++ b/virt/kvm/arm/vgic/vgic-v3.c
@@ -51,7 +51,7 @@ void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu)
 
 	cpuif->vgic_hcr &= ~ICH_HCR_UIE;
 
-	for (lr = 0; lr < vgic_cpu->used_lrs; lr++) {
+	for (lr = 0; lr < cpuif->used_lrs; lr++) {
 		u64 val = cpuif->vgic_lr[lr];
 		u32 intid, cpuid;
 		struct vgic_irq *irq;
@@ -123,7 +123,7 @@ void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu)
 		vgic_put_irq(vcpu->kvm, irq);
 	}
 
-	vgic_cpu->used_lrs = 0;
+	cpuif->used_lrs = 0;
 }
 
 /* Requires the irq to be locked already */
@@ -668,10 +668,10 @@ void vgic_v3_load(struct kvm_vcpu *vcpu)
 	if (likely(cpu_if->vgic_sre))
 		kvm_call_hyp(__vgic_v3_write_vmcr, cpu_if->vgic_vmcr);
 
-	kvm_call_hyp(__vgic_v3_restore_aprs, vcpu);
+	kvm_call_hyp(__vgic_v3_restore_aprs, kern_hyp_va(cpu_if));
 
 	if (has_vhe())
-		__vgic_v3_activate_traps(vcpu);
+		__vgic_v3_activate_traps(cpu_if);
 }
 
 void vgic_v3_put(struct kvm_vcpu *vcpu)
@@ -681,8 +681,8 @@ void vgic_v3_put(struct kvm_vcpu *vcpu)
 	if (likely(cpu_if->vgic_sre))
 		cpu_if->vgic_vmcr = kvm_call_hyp_ret(__vgic_v3_read_vmcr);
 
-	kvm_call_hyp(__vgic_v3_save_aprs, vcpu);
+	kvm_call_hyp(__vgic_v3_save_aprs, kern_hyp_va(cpu_if));
 
 	if (has_vhe())
-		__vgic_v3_deactivate_traps(vcpu);
+		__vgic_v3_deactivate_traps(cpu_if);
 }
diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
index 1c5b4dbd33e4..6953aefecbb6 100644
--- a/virt/kvm/arm/vgic/vgic.c
+++ b/virt/kvm/arm/vgic/vgic.c
@@ -797,6 +797,7 @@ static void vgic_flush_lr_state(struct kvm_vcpu *vcpu)
 	int count;
 	bool multi_sgi;
 	u8 prio = 0xff;
+	int i = 0;
 
 	lockdep_assert_held(&vgic_cpu->ap_list_lock);
 
@@ -838,11 +839,14 @@ static void vgic_flush_lr_state(struct kvm_vcpu *vcpu)
 		}
 	}
 
-	vcpu->arch.vgic_cpu.used_lrs = count;
-
 	/* Nuke remaining LRs */
-	for ( ; count < kvm_vgic_global_state.nr_lr; count++)
-		vgic_clear_lr(vcpu, count);
+	for (i = count ; i < kvm_vgic_global_state.nr_lr; i++)
+		vgic_clear_lr(vcpu, i);
+
+	if (!static_branch_unlikely(&kvm_vgic_global_state.gicv3_cpuif))
+		vcpu->arch.vgic_cpu.vgic_v2.used_lrs = count;
+	else
+		vcpu->arch.vgic_cpu.vgic_v3.used_lrs = count;
 }
 
 static inline bool can_access_vgic_from_kernel(void)
@@ -860,13 +864,13 @@ static inline void vgic_save_state(struct kvm_vcpu *vcpu)
 	if (!static_branch_unlikely(&kvm_vgic_global_state.gicv3_cpuif))
 		vgic_v2_save_state(vcpu);
 	else
-		__vgic_v3_save_state(vcpu);
+		__vgic_v3_save_state(&vcpu->arch.vgic_cpu.vgic_v3);
 }
 
 /* Sync back the hardware VGIC state into our emulation after a guest's run. */
 void kvm_vgic_sync_hwstate(struct kvm_vcpu *vcpu)
 {
-	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
+	int used_lrs;
 
 	WARN_ON(vgic_v4_sync_hwstate(vcpu));
 
@@ -877,7 +881,12 @@ void kvm_vgic_sync_hwstate(struct kvm_vcpu *vcpu)
 	if (can_access_vgic_from_kernel())
 		vgic_save_state(vcpu);
 
-	if (vgic_cpu->used_lrs)
+	if (!static_branch_unlikely(&kvm_vgic_global_state.gicv3_cpuif))
+		used_lrs = vcpu->arch.vgic_cpu.vgic_v2.used_lrs;
+	else
+		used_lrs = vcpu->arch.vgic_cpu.vgic_v3.used_lrs;
+
+	if (used_lrs)
 		vgic_fold_lr_state(vcpu);
 	vgic_prune_ap_list(vcpu);
 }
@@ -887,7 +896,7 @@ static inline void vgic_restore_state(struct kvm_vcpu *vcpu)
 	if (!static_branch_unlikely(&kvm_vgic_global_state.gicv3_cpuif))
 		vgic_v2_restore_state(vcpu);
 	else
-		__vgic_v3_restore_state(vcpu);
+		__vgic_v3_restore_state(&vcpu->arch.vgic_cpu.vgic_v3);
 }
 
 /* Flush our emulation state into the GIC hardware before entering the guest. */
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 176+ messages in thread

* [PATCH 50/59] KVM: arm64: nv: Nested GICv3 Support
  2019-06-21  9:37 [PATCH 00/59] KVM: arm64: ARMv8.3 Nested Virtualization support Marc Zyngier
                   ` (48 preceding siblings ...)
  2019-06-21  9:38 ` [PATCH 49/59] KVM: arm64: nv: vgic-v3: Take cpu_if pointer directly instead of vcpu Marc Zyngier
@ 2019-06-21  9:38 ` Marc Zyngier
  2019-07-16 11:41   ` Alexandru Elisei
  2019-06-21  9:38 ` [PATCH 51/59] KVM: arm64: nv: vgic: Emulate the HW bit in software Marc Zyngier
                   ` (10 subsequent siblings)
  60 siblings, 1 reply; 176+ messages in thread
From: Marc Zyngier @ 2019-06-21  9:38 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	Julien Thierry, James Morse, Suzuki K Poulose

From: Jintack Lim <jintack@cs.columbia.edu>

When entering a nested VM, we set up the hypervisor control interface
based on what the guest hypervisor has set. In particular, we inspect
each list register written by the guest hypervisor to see whether its
HW bit is set. If so, we translate the hw irq number from the guest's
point of view to the real hardware irq number, if there is a mapping.
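
The translation step amounts to the loop sketched below. This is a
simplified illustration, not the patch's literal code: it reuses the
kvm_vgic_get_map() helper added earlier in this series, and
"shadow_cpu_if" stands for the hardware-facing copy of the guest
hypervisor's cpu interface:

	int i;

	for (i = 0; i < kvm_vgic_global_state.nr_lr; i++) {
		u64 lr = cpu_if->vgic_lr[i];

		if (lr & ICH_LR_HW) {
			/* INTID the guest hypervisor believes is physical */
			int l1_intid = (lr & ICH_LR_PHYS_ID_MASK) >>
				       ICH_LR_PHYS_ID_SHIFT;
			int hwintid = kvm_vgic_get_map(vcpu, l1_intid);

			if (hwintid < 0) {
				/* No host mapping: demote to a SW interrupt */
				lr &= ~ICH_LR_HW;
			} else {
				lr &= ~ICH_LR_PHYS_ID_MASK;
				lr |= (u64)hwintid << ICH_LR_PHYS_ID_SHIFT;
			}
		}

		shadow_cpu_if->vgic_lr[i] = lr;
	}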

Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
[Rewritten to support GICv3 instead of GICv2]
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
[Redesigned execution flow around vcpu load/put]
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
---
 arch/arm/include/asm/kvm_emulate.h |   1 +
 arch/arm/include/asm/kvm_host.h    |   6 +-
 arch/arm64/include/asm/kvm_host.h  |   5 +-
 arch/arm64/kvm/Makefile            |   1 +
 arch/arm64/kvm/nested.c            |  10 ++
 arch/arm64/kvm/sys_regs.c          | 178 ++++++++++++++++++++++++++++-
 include/kvm/arm_vgic.h             |  18 +++
 virt/kvm/arm/arm.c                 |   7 +-
 virt/kvm/arm/vgic/vgic-v3-nested.c | 177 ++++++++++++++++++++++++++++
 virt/kvm/arm/vgic/vgic-v3.c        |  28 +++++
 virt/kvm/arm/vgic/vgic.c           |  32 ++++++
 11 files changed, 456 insertions(+), 7 deletions(-)
 create mode 100644 virt/kvm/arm/vgic/vgic-v3-nested.c

diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
index 865ce545b465..a53f19041e16 100644
--- a/arch/arm/include/asm/kvm_emulate.h
+++ b/arch/arm/include/asm/kvm_emulate.h
@@ -334,5 +334,6 @@ static inline unsigned long vcpu_data_host_to_guest(struct kvm_vcpu *vcpu,
 static inline void vcpu_ptrauth_setup_lazy(struct kvm_vcpu *vcpu) {}
 
 static inline bool is_hyp_ctxt(struct kvm_vcpu *vcpu) { return false; }
+static inline int kvm_inject_nested_irq(struct kvm_vcpu *vcpu) { BUG(); }
 
 #endif /* __ARM_KVM_EMULATE_H__ */
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index cc761610e41e..d6923ed55796 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -35,10 +35,12 @@
 #define KVM_MAX_VCPUS VGIC_V2_MAX_CPUS
 #endif
 
+/* KVM_REQ_GUEST_HYP_IRQ_PENDING is actually unused */
 #define KVM_REQ_SLEEP \
 	KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
-#define KVM_REQ_IRQ_PENDING	KVM_ARCH_REQ(1)
-#define KVM_REQ_VCPU_RESET	KVM_ARCH_REQ(2)
+#define KVM_REQ_IRQ_PENDING		KVM_ARCH_REQ(1)
+#define KVM_REQ_VCPU_RESET		KVM_ARCH_REQ(2)
+#define KVM_REQ_GUEST_HYP_IRQ_PENDING	KVM_ARCH_REQ(3)
 
 DECLARE_STATIC_KEY_FALSE(userspace_irqchip_in_use);
 
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index e0fe9acb46bf..e2e44cc650bf 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -53,8 +53,9 @@
 
 #define KVM_REQ_SLEEP \
 	KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
-#define KVM_REQ_IRQ_PENDING	KVM_ARCH_REQ(1)
-#define KVM_REQ_VCPU_RESET	KVM_ARCH_REQ(2)
+#define KVM_REQ_IRQ_PENDING		KVM_ARCH_REQ(1)
+#define KVM_REQ_VCPU_RESET		KVM_ARCH_REQ(2)
+#define KVM_REQ_GUEST_HYP_IRQ_PENDING	KVM_ARCH_REQ(3)
 
 DECLARE_STATIC_KEY_FALSE(userspace_irqchip_in_use);
 
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index f11bd8b0d837..045a8f18f465 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -38,3 +38,4 @@ kvm-$(CONFIG_KVM_ARM_PMU) += $(KVM)/arm/pmu.o
 
 kvm-$(CONFIG_KVM_ARM_HOST) += nested.o
 kvm-$(CONFIG_KVM_ARM_HOST) += emulate-nested.o
+kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic/vgic-v3-nested.o
diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
index 214d59019935..df2db9ab7cfb 100644
--- a/arch/arm64/kvm/nested.c
+++ b/arch/arm64/kvm/nested.c
@@ -539,3 +539,13 @@ void kvm_arch_flush_shadow_all(struct kvm *kvm)
 	kvm->arch.nested_mmus_size = 0;
 	kvm_free_stage2_pgd(&kvm->arch.mmu);
 }
+
+bool vgic_state_is_nested(struct kvm_vcpu *vcpu)
+{
+	bool imo = __vcpu_sys_reg(vcpu, HCR_EL2) & HCR_IMO;
+	bool fmo = __vcpu_sys_reg(vcpu, HCR_EL2) & HCR_FMO;
+
+	WARN(imo != fmo, "Separate virtual IRQ/FIQ settings not supported\n");
+
+	return nested_virt_in_use(vcpu) && imo && fmo && !is_hyp_ctxt(vcpu);
+}
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 2031a59fcf49..ba3bcd29c02d 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -26,6 +26,8 @@
 #include <linux/printk.h>
 #include <linux/uaccess.h>
 
+#include <linux/irqchip/arm-gic-v3.h>
+
 #include <asm/cacheflush.h>
 #include <asm/cputype.h>
 #include <asm/debug-monitors.h>
@@ -505,6 +507,18 @@ static bool access_vm_reg(struct kvm_vcpu *vcpu,
 	return true;
 }
 
+/*
+ * The architecture says that non-secure write accesses to this register from
+ * EL1 are trapped to EL2, if either:
+ *  - HCR_EL2.FMO==1, or
+ *  - HCR_EL2.IMO==1
+ */
+static bool sgi_traps_to_vel2(struct kvm_vcpu *vcpu)
+{
+	return !vcpu_mode_el2(vcpu) &&
+		!!(__vcpu_sys_reg(vcpu, HCR_EL2) & (HCR_IMO | HCR_FMO));
+}
+
 /*
  * Trap handler for the GICv3 SGI generation system register.
  * Forward the request to the VGIC emulation.
@@ -520,6 +534,11 @@ static bool access_gic_sgi(struct kvm_vcpu *vcpu,
 	if (!p->is_write)
 		return read_from_write_only(vcpu, p, r);
 
+	if (sgi_traps_to_vel2(vcpu)) {
+		kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
+		return false;
+	}
+
 	/*
 	 * In a system where GICD_CTLR.DS=1, a ICC_SGI0R_EL1 access generates
 	 * Group0 SGIs only, while ICC_SGI1R_EL1 can generate either group,
@@ -563,7 +582,13 @@ static bool access_gic_sre(struct kvm_vcpu *vcpu,
 	if (p->is_write)
 		return ignore_write(vcpu, p);
 
-	p->regval = vcpu->arch.vgic_cpu.vgic_v3.vgic_sre;
+	if (p->Op1 == 4) {	/* ICC_SRE_EL2 */
+		p->regval = (ICC_SRE_EL2_ENABLE | ICC_SRE_EL2_SRE |
+			     ICC_SRE_EL1_DIB | ICC_SRE_EL1_DFB);
+	} else {		/* ICC_SRE_EL1 */
+		p->regval = vcpu->arch.vgic_cpu.vgic_v3.vgic_sre;
+	}
+
 	return true;
 }
 
@@ -1793,6 +1818,122 @@ static bool access_id_aa64pfr0_el1(struct kvm_vcpu *v,
 	return true;
 }
 
+static bool access_gic_apr(struct kvm_vcpu *vcpu,
+			   struct sys_reg_params *p,
+			   const struct sys_reg_desc *r)
+{
+	struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.nested_vgic_v3;
+	u32 index, *base;
+
+	index = r->Op2;
+	if (r->CRm == 8)
+		base = cpu_if->vgic_ap0r;
+	else
+		base = cpu_if->vgic_ap1r;
+
+	if (p->is_write)
+		base[index] = p->regval;
+	else
+		p->regval = base[index];
+
+	return true;
+}
+
+static bool access_gic_hcr(struct kvm_vcpu *vcpu,
+			   struct sys_reg_params *p,
+			   const struct sys_reg_desc *r)
+{
+	struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.nested_vgic_v3;
+
+	if (p->is_write)
+		cpu_if->vgic_hcr = p->regval;
+	else
+		p->regval = cpu_if->vgic_hcr;
+
+	return true;
+}
+
+static bool access_gic_vtr(struct kvm_vcpu *vcpu,
+			   struct sys_reg_params *p,
+			   const struct sys_reg_desc *r)
+{
+	if (p->is_write)
+		return write_to_read_only(vcpu, p, r);
+
+	p->regval = kvm_vgic_global_state.ich_vtr_el2;
+
+	return true;
+}
+
+static bool access_gic_misr(struct kvm_vcpu *vcpu,
+			    struct sys_reg_params *p,
+			    const struct sys_reg_desc *r)
+{
+	if (p->is_write)
+		return write_to_read_only(vcpu, p, r);
+
+	p->regval = vgic_v3_get_misr(vcpu);
+
+	return true;
+}
+
+static bool access_gic_eisr(struct kvm_vcpu *vcpu,
+			    struct sys_reg_params *p,
+			    const struct sys_reg_desc *r)
+{
+	if (p->is_write)
+		return write_to_read_only(vcpu, p, r);
+
+	p->regval = vgic_v3_get_eisr(vcpu);
+
+	return true;
+}
+
+static bool access_gic_elrsr(struct kvm_vcpu *vcpu,
+			     struct sys_reg_params *p,
+			     const struct sys_reg_desc *r)
+{
+	if (p->is_write)
+		return write_to_read_only(vcpu, p, r);
+
+	p->regval = vgic_v3_get_elrsr(vcpu);
+
+	return true;
+}
+
+static bool access_gic_vmcr(struct kvm_vcpu *vcpu,
+			    struct sys_reg_params *p,
+			    const struct sys_reg_desc *r)
+{
+	struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.nested_vgic_v3;
+
+	if (p->is_write)
+		cpu_if->vgic_vmcr = p->regval;
+	else
+		p->regval = cpu_if->vgic_vmcr;
+
+	return true;
+}
+
+static bool access_gic_lr(struct kvm_vcpu *vcpu,
+			  struct sys_reg_params *p,
+			  const struct sys_reg_desc *r)
+{
+	struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.nested_vgic_v3;
+	u32 index;
+
+	index = p->Op2;
+	if (p->CRm == 13)
+		index += 8;
+
+	if (p->is_write)
+		cpu_if->vgic_lr[index] = p->regval;
+	else
+		p->regval = cpu_if->vgic_lr[index];
+
+	return true;
+}
+
 /*
  * Architected system registers.
  * Important: Must be sorted ascending by Op0, Op1, CRn, CRm, Op2
@@ -2123,6 +2264,41 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 	{ SYS_DESC(SYS_RMR_EL2), access_rw, reset_val, RMR_EL2, 0 },
 	{ SYS_DESC(SYS_VDISR_EL2), trap_undef },
 
+	{ SYS_DESC(SYS_ICH_AP0R0_EL2), access_gic_apr },
+	{ SYS_DESC(SYS_ICH_AP0R1_EL2), access_gic_apr },
+	{ SYS_DESC(SYS_ICH_AP0R2_EL2), access_gic_apr },
+	{ SYS_DESC(SYS_ICH_AP0R3_EL2), access_gic_apr },
+	{ SYS_DESC(SYS_ICH_AP1R0_EL2), access_gic_apr },
+	{ SYS_DESC(SYS_ICH_AP1R1_EL2), access_gic_apr },
+	{ SYS_DESC(SYS_ICH_AP1R2_EL2), access_gic_apr },
+	{ SYS_DESC(SYS_ICH_AP1R3_EL2), access_gic_apr },
+
+	{ SYS_DESC(SYS_ICC_SRE_EL2), access_gic_sre },
+
+	{ SYS_DESC(SYS_ICH_HCR_EL2), access_gic_hcr },
+	{ SYS_DESC(SYS_ICH_VTR_EL2), access_gic_vtr },
+	{ SYS_DESC(SYS_ICH_MISR_EL2), access_gic_misr },
+	{ SYS_DESC(SYS_ICH_EISR_EL2), access_gic_eisr },
+	{ SYS_DESC(SYS_ICH_ELRSR_EL2), access_gic_elrsr },
+	{ SYS_DESC(SYS_ICH_VMCR_EL2), access_gic_vmcr },
+
+	{ SYS_DESC(SYS_ICH_LR0_EL2), access_gic_lr },
+	{ SYS_DESC(SYS_ICH_LR1_EL2), access_gic_lr },
+	{ SYS_DESC(SYS_ICH_LR2_EL2), access_gic_lr },
+	{ SYS_DESC(SYS_ICH_LR3_EL2), access_gic_lr },
+	{ SYS_DESC(SYS_ICH_LR4_EL2), access_gic_lr },
+	{ SYS_DESC(SYS_ICH_LR5_EL2), access_gic_lr },
+	{ SYS_DESC(SYS_ICH_LR6_EL2), access_gic_lr },
+	{ SYS_DESC(SYS_ICH_LR7_EL2), access_gic_lr },
+	{ SYS_DESC(SYS_ICH_LR8_EL2), access_gic_lr },
+	{ SYS_DESC(SYS_ICH_LR9_EL2), access_gic_lr },
+	{ SYS_DESC(SYS_ICH_LR10_EL2), access_gic_lr },
+	{ SYS_DESC(SYS_ICH_LR11_EL2), access_gic_lr },
+	{ SYS_DESC(SYS_ICH_LR12_EL2), access_gic_lr },
+	{ SYS_DESC(SYS_ICH_LR13_EL2), access_gic_lr },
+	{ SYS_DESC(SYS_ICH_LR14_EL2), access_gic_lr },
+	{ SYS_DESC(SYS_ICH_LR15_EL2), access_gic_lr },
+
 	{ SYS_DESC(SYS_CONTEXTIDR_EL2), access_rw, reset_val, CONTEXTIDR_EL2, 0 },
 	{ SYS_DESC(SYS_TPIDR_EL2), access_rw, reset_val, TPIDR_EL2, 0 },
 
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 163b132e100e..707fbe627155 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -310,6 +310,15 @@ struct vgic_cpu {
 
 	struct vgic_irq private_irqs[VGIC_NR_PRIVATE_IRQS];
 
+	/* CPU vif control registers for the virtual GICH interface */
+	struct vgic_v3_cpu_if	nested_vgic_v3;
+
+	/*
+	 * The shadow vif control register loaded to the hardware when
+	 * running a nested L2 guest with the virtual IMO/FMO bit set.
+	 */
+	struct vgic_v3_cpu_if	shadow_vgic_v3;
+
 	raw_spinlock_t ap_list_lock;	/* Protects the ap_list */
 
 	/*
@@ -366,6 +375,13 @@ int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu);
 void kvm_vgic_load(struct kvm_vcpu *vcpu);
 void kvm_vgic_put(struct kvm_vcpu *vcpu);
 
+void vgic_v3_load_nested(struct kvm_vcpu *vcpu);
+void vgic_v3_put_nested(struct kvm_vcpu *vcpu);
+void vgic_v3_handle_nested_maint_irq(struct kvm_vcpu *vcpu);
+u16 vgic_v3_get_eisr(struct kvm_vcpu *vcpu);
+u16 vgic_v3_get_elrsr(struct kvm_vcpu *vcpu);
+u64 vgic_v3_get_misr(struct kvm_vcpu *vcpu);
+
 #define irqchip_in_kernel(k)	(!!((k)->arch.vgic.in_kernel))
 #define vgic_initialized(k)	((k)->arch.vgic.initialized)
 #define vgic_ready(k)		((k)->arch.vgic.ready)
@@ -411,4 +427,6 @@ int kvm_vgic_v4_unset_forwarding(struct kvm *kvm, int irq,
 void kvm_vgic_v4_enable_doorbell(struct kvm_vcpu *vcpu);
 void kvm_vgic_v4_disable_doorbell(struct kvm_vcpu *vcpu);
 
+bool vgic_state_is_nested(struct kvm_vcpu *vcpu);
+
 #endif /* __KVM_ARM_VGIC_H */
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index ca10a11e044e..ddcab58ae440 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -634,6 +634,9 @@ static void check_vcpu_requests(struct kvm_vcpu *vcpu)
 		 * that a VCPU sees new virtual interrupts.
 		 */
 		kvm_check_request(KVM_REQ_IRQ_PENDING, vcpu);
+
+		if (kvm_check_request(KVM_REQ_GUEST_HYP_IRQ_PENDING, vcpu))
+			kvm_inject_nested_irq(vcpu);
 	}
 }
 
@@ -680,10 +683,10 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		 */
 		cond_resched();
 
-		update_vmid(&vcpu->arch.hw_mmu->vmid);
-
 		check_vcpu_requests(vcpu);
 
+		update_vmid(&vcpu->arch.hw_mmu->vmid);
+
 		/*
 		 * Preparing the interrupts to be injected also
 		 * involves poking the GIC, which must be done in a
diff --git a/virt/kvm/arm/vgic/vgic-v3-nested.c b/virt/kvm/arm/vgic/vgic-v3-nested.c
new file mode 100644
index 000000000000..6fb81dfbb679
--- /dev/null
+++ b/virt/kvm/arm/vgic/vgic-v3-nested.c
@@ -0,0 +1,177 @@
+#include <linux/cpu.h>
+#include <linux/kvm.h>
+#include <linux/kvm_host.h>
+#include <linux/interrupt.h>
+#include <linux/io.h>
+#include <linux/uaccess.h>
+
+#include <linux/irqchip/arm-gic-v3.h>
+
+#include <asm/kvm_emulate.h>
+#include <asm/kvm_arm.h>
+#include <kvm/arm_vgic.h>
+
+#include "vgic.h"
+
+static inline struct vgic_v3_cpu_if *vcpu_nested_if(struct kvm_vcpu *vcpu)
+{
+	return &vcpu->arch.vgic_cpu.nested_vgic_v3;
+}
+
+static inline struct vgic_v3_cpu_if *vcpu_shadow_if(struct kvm_vcpu *vcpu)
+{
+	return &vcpu->arch.vgic_cpu.shadow_vgic_v3;
+}
+
+static inline bool lr_triggers_eoi(u64 lr)
+{
+	return !(lr & (ICH_LR_STATE | ICH_LR_HW)) && (lr & ICH_LR_EOI);
+}
+
+u16 vgic_v3_get_eisr(struct kvm_vcpu *vcpu)
+{
+	struct vgic_v3_cpu_if *cpu_if = vcpu_nested_if(vcpu);
+	u16 reg = 0;
+	int i;
+
+	for (i = 0; i < kvm_vgic_global_state.nr_lr; i++) {
+		if (lr_triggers_eoi(cpu_if->vgic_lr[i]))
+			reg |= BIT(i);
+	}
+
+	return reg;
+}
+
+u16 vgic_v3_get_elrsr(struct kvm_vcpu *vcpu)
+{
+	struct vgic_v3_cpu_if *cpu_if = vcpu_nested_if(vcpu);
+	u16 reg = 0;
+	int i;
+
+	for (i = 0; i < kvm_vgic_global_state.nr_lr; i++) {
+		if (!(cpu_if->vgic_lr[i] & ICH_LR_STATE))
+			reg |= BIT(i);
+	}
+
+	return reg;
+}
+
+u64 vgic_v3_get_misr(struct kvm_vcpu *vcpu)
+{
+	struct vgic_v3_cpu_if *cpu_if = vcpu_nested_if(vcpu);
+	int nr_lr = kvm_vgic_global_state.nr_lr;
+	u64 reg = 0;
+
+	if (vgic_v3_get_eisr(vcpu))
+		reg |= ICH_MISR_EOI;
+
+	if (cpu_if->vgic_hcr & ICH_HCR_UIE) {
+		int used_lrs;
+
+		used_lrs = nr_lr - hweight16(vgic_v3_get_elrsr(vcpu));
+		if (used_lrs <= 1)
+			reg |= ICH_MISR_U;
+	}
+
+	/* TODO: Support remaining bits in this register */
+	return reg;
+}
+
+/*
+ * For LRs that have the HW bit set (such as timer interrupts), we replace
+ * the virtual interrupt number programmed by the guest hypervisor with the
+ * host's hardware interrupt number.
+ */
+static void vgic_v3_create_shadow_lr(struct kvm_vcpu *vcpu)
+{
+	struct vgic_v3_cpu_if *cpu_if = vcpu_nested_if(vcpu);
+	struct vgic_v3_cpu_if *s_cpu_if = vcpu_shadow_if(vcpu);
+	struct vgic_irq *irq;
+	int i;
+
+	for (i = 0; i < kvm_vgic_global_state.nr_lr; i++) {
+		u64 lr = cpu_if->vgic_lr[i];
+		int l1_irq;
+
+		if (!(lr & ICH_LR_HW))
+			goto next;
+
+		/* We have the HW bit set */
+		l1_irq = (lr & ICH_LR_PHYS_ID_MASK) >> ICH_LR_PHYS_ID_SHIFT;
+		irq = vgic_get_irq(vcpu->kvm, vcpu, l1_irq);
+
+		if (!irq || !irq->hw) {
+			/* There was no real mapping, so nuke the HW bit */
+			lr &= ~ICH_LR_HW;
+			if (irq)
+				vgic_put_irq(vcpu->kvm, irq);
+			goto next;
+		}
+
+		/* Translate the virtual mapping to the real one */
+		lr &= ~ICH_LR_EOI; /* EOI is RES0 when the HW bit is set */
+		lr &= ~ICH_LR_PHYS_ID_MASK;
+		lr |= (u64)irq->hwintid << ICH_LR_PHYS_ID_SHIFT;
+		vgic_put_irq(vcpu->kvm, irq);
+
+next:
+		s_cpu_if->vgic_lr[i] = lr;
+	}
+
+	s_cpu_if->used_lrs = kvm_vgic_global_state.nr_lr;
+}
+
+/*
+ * Change the shadow HWIRQ field back to the virtual value before copying over
+ * the entire shadow struct to the nested state.
+ */
+static void vgic_v3_fixup_shadow_lr_state(struct kvm_vcpu *vcpu)
+{
+	struct vgic_v3_cpu_if *cpu_if = vcpu_nested_if(vcpu);
+	struct vgic_v3_cpu_if *s_cpu_if = vcpu_shadow_if(vcpu);
+	int lr;
+
+	for (lr = 0; lr < kvm_vgic_global_state.nr_lr; lr++) {
+		s_cpu_if->vgic_lr[lr] &= ~ICH_LR_PHYS_ID_MASK;
+		s_cpu_if->vgic_lr[lr] |= cpu_if->vgic_lr[lr] & ICH_LR_PHYS_ID_MASK;
+	}
+}
+
+void vgic_v3_load_nested(struct kvm_vcpu *vcpu)
+{
+	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
+
+	vgic_cpu->shadow_vgic_v3 = vgic_cpu->nested_vgic_v3;
+	vgic_v3_create_shadow_lr(vcpu);
+	__vgic_v3_restore_state(vcpu_shadow_if(vcpu));
+}
+
+void vgic_v3_put_nested(struct kvm_vcpu *vcpu)
+{
+	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
+
+	__vgic_v3_save_state(vcpu_shadow_if(vcpu));
+
+	/*
+	 * Translate the shadow state HW fields back to the virtual ones
+	 * before copying the shadow struct back to the nested one.
+	 */
+	vgic_v3_fixup_shadow_lr_state(vcpu);
+	vgic_cpu->nested_vgic_v3 = vgic_cpu->shadow_vgic_v3;
+}
+
+void vgic_v3_handle_nested_maint_irq(struct kvm_vcpu *vcpu)
+{
+	struct vgic_v3_cpu_if *cpu_if = vcpu_nested_if(vcpu);
+
+	/*
+	 * If we exit a nested VM with a pending maintenance interrupt from the
+	 * GIC, then we need to forward this to the guest hypervisor so that it
+	 * can re-sync the appropriate LRs and sample level triggered interrupts
+	 * again.
+	 */
+	if (vgic_state_is_nested(vcpu) &&
+	    (cpu_if->vgic_hcr & ICH_HCR_EN) &&
+	    vgic_v3_get_misr(vcpu))
+		kvm_inject_nested_irq(vcpu);
+}
diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-v3.c
index 77d23e817756..25edf32c28fb 100644
--- a/virt/kvm/arm/vgic/vgic-v3.c
+++ b/virt/kvm/arm/vgic/vgic-v3.c
@@ -18,6 +18,7 @@
 #include <kvm/arm_vgic.h>
 #include <asm/kvm_hyp.h>
 #include <asm/kvm_mmu.h>
+#include <asm/kvm_nested.h>
 #include <asm/kvm_asm.h>
 
 #include "vgic.h"
@@ -298,6 +299,12 @@ void vgic_v3_enable(struct kvm_vcpu *vcpu)
 		vgic_v3->vgic_sre = (ICC_SRE_EL1_DIB |
 				     ICC_SRE_EL1_DFB |
 				     ICC_SRE_EL1_SRE);
+		/*
+		 * If nesting is allowed, force GICv3 onto the nested
+		 * guests as well.
+		 */
+		if (nested_virt_in_use(vcpu))
+			vcpu->arch.vgic_cpu.nested_vgic_v3.vgic_sre = vgic_v3->vgic_sre;
 		vcpu->arch.vgic_cpu.pendbaser = INITIAL_PENDBASER_VALUE;
 	} else {
 		vgic_v3->vgic_sre = 0;
@@ -660,6 +667,13 @@ void vgic_v3_load(struct kvm_vcpu *vcpu)
 {
 	struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
 
+	/*
+	 * vgic_v3_load_nested only affects the LRs in the shadow
+	 * state, so it is fine to pass the nested state around.
+	 */
+	if (vgic_state_is_nested(vcpu))
+		cpu_if = &vcpu->arch.vgic_cpu.nested_vgic_v3;
+
 	/*
 	 * If dealing with a GICv2 emulation on GICv3, VMCR_EL2.VFIQen
 	 * is dependent on ICC_SRE_EL1.SRE, and we have to perform the
@@ -672,12 +686,18 @@ void vgic_v3_load(struct kvm_vcpu *vcpu)
 
 	if (has_vhe())
 		__vgic_v3_activate_traps(cpu_if);
+
+	if (vgic_state_is_nested(vcpu))
+		vgic_v3_load_nested(vcpu);
 }
 
 void vgic_v3_put(struct kvm_vcpu *vcpu)
 {
 	struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
 
+	if (vgic_state_is_nested(vcpu))
+		cpu_if = &vcpu->arch.vgic_cpu.shadow_vgic_v3;
+
 	if (likely(cpu_if->vgic_sre))
 		cpu_if->vgic_vmcr = kvm_call_hyp_ret(__vgic_v3_read_vmcr);
 
@@ -685,4 +705,12 @@ void vgic_v3_put(struct kvm_vcpu *vcpu)
 
 	if (has_vhe())
 		__vgic_v3_deactivate_traps(cpu_if);
+
+	if (vgic_state_is_nested(vcpu))
+		vgic_v3_put_nested(vcpu);
 }
+
+__weak void vgic_v3_sync_nested(struct kvm_vcpu *vcpu) {}
+__weak void vgic_v3_handle_nested_maint_irq(struct kvm_vcpu *vcpu) {}
+__weak void vgic_v3_load_nested(struct kvm_vcpu *vcpu) {}
+__weak void vgic_v3_put_nested(struct kvm_vcpu *vcpu) {}
diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
index 6953aefecbb6..f32f49b0c803 100644
--- a/virt/kvm/arm/vgic/vgic.c
+++ b/virt/kvm/arm/vgic/vgic.c
@@ -872,6 +872,10 @@ void kvm_vgic_sync_hwstate(struct kvm_vcpu *vcpu)
 {
 	int used_lrs;
 
+	/* If nesting, this is a load/put affair, not flush/sync. */
+	if (vgic_state_is_nested(vcpu))
+		return;
+
 	WARN_ON(vgic_v4_sync_hwstate(vcpu));
 
 	/* An empty ap_list_head implies used_lrs == 0 */
@@ -920,6 +924,29 @@ void kvm_vgic_flush_hwstate(struct kvm_vcpu *vcpu)
 	    !vgic_supports_direct_msis(vcpu->kvm))
 		return;
 
+	/*
+	 * If in a nested state, we must return early. Two possibilities:
+	 *
+	 * - If we have any pending IRQ for the guest and the guest
+	 *   expects IRQs to be handled in its virtual EL2 mode (the
+	 *   virtual IMO bit is set) and it is not already running in
+	 *   virtual EL2 mode, then we have to emulate an IRQ
+	 *   exception to virtual EL2.
+	 *
+	 *   We do that by placing a request to ourselves which will
+	 *   abort the entry procedure and inject the exception at the
+	 *   beginning of the run loop.
+	 *
+	 * - Otherwise, do exactly *NOTHING*. The guest state is
+	 *   already loaded, and we can carry on with running it.
+	 */
+	if (vgic_state_is_nested(vcpu)) {
+		if (kvm_vgic_vcpu_pending_irq(vcpu))
+			kvm_make_request(KVM_REQ_GUEST_HYP_IRQ_PENDING, vcpu);
+
+		return;
+	}
+
 	DEBUG_SPINLOCK_BUG_ON(!irqs_disabled());
 
 	if (!list_empty(&vcpu->arch.vgic_cpu.ap_list_head)) {
@@ -1022,3 +1049,8 @@ bool kvm_vgic_map_is_active(struct kvm_vcpu *vcpu, unsigned int vintid)
 
 	return map_is_active;
 }
+
+__weak bool vgic_state_is_nested(struct kvm_vcpu *vcpu)
+{
+	return false;
+}
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 176+ messages in thread

* [PATCH 51/59] KVM: arm64: nv: vgic: Emulate the HW bit in software
  2019-06-21  9:37 [PATCH 00/59] KVM: arm64: ARMv8.3 Nested Virtualization support Marc Zyngier
                   ` (49 preceding siblings ...)
  2019-06-21  9:38 ` [PATCH 50/59] KVM: arm64: nv: Nested GICv3 Support Marc Zyngier
@ 2019-06-21  9:38 ` Marc Zyngier
  2019-06-21  9:38 ` [PATCH 52/59] KVM: arm64: nv: vgic: Allow userland to set VGIC maintenance IRQ Marc Zyngier
                   ` (9 subsequent siblings)
  60 siblings, 0 replies; 176+ messages in thread
From: Marc Zyngier @ 2019-06-21  9:38 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	Julien Thierry, James Morse, Suzuki K Poulose

From: Christoffer Dall <christoffer.dall@arm.com>

Should the guest hypervisor use the HW bit in the LRs, we need to
emulate the deactivation performed by the L2 guest, reflecting it
into the L1 distributor emulation, which is handled by L0.

It's all good fun.
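
Condensed, the emulation boils down to spotting this condition (a
sketch only; vgic_v3_sync_nested() in the hunk below is the real
thing):

  /*
   * guest_lr: the LR as programmed by the L1 hypervisor on entry.
   * hw_lr:    the corresponding hardware LR read back on exit.
   */
  static bool l2_deactivated_hw_irq(u64 guest_lr, u64 hw_lr)
  {
  	/* Was live with HW set on entry, no state left on exit */
  	return (guest_lr & ICH_LR_HW) &&
  	       (guest_lr & ICH_LR_STATE) &&
  	       !(hw_lr & ICH_LR_STATE);
  }

When this holds, the physical deactivate has already taken place, and
L0 clears the active state of the corresponding interrupt in its
emulation of the L1 distributor.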

Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm64/include/asm/kvm_hyp.h   |  2 ++
 include/kvm/arm_vgic.h             |  1 +
 virt/kvm/arm/hyp/vgic-v3-sr.c      |  2 +-
 virt/kvm/arm/vgic/vgic-v3-nested.c | 32 ++++++++++++++++++++++++++++++
 virt/kvm/arm/vgic/vgic.c           | 10 ++++++----
 5 files changed, 42 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
index 3ebbf307d2e2..f86c396e2c67 100644
--- a/arch/arm64/include/asm/kvm_hyp.h
+++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -83,6 +83,8 @@ typeof(orig) * __hyp_text fname(void)					\
 
 int __vgic_v2_perform_cpuif_access(struct kvm_vcpu *vcpu);
 
+u64 __hyp_text __gic_v3_get_lr(unsigned int lr);
+
 void __vgic_v3_save_state(struct vgic_v3_cpu_if *cpu_if);
 void __vgic_v3_restore_state(struct vgic_v3_cpu_if *cpu_if);
 void __vgic_v3_activate_traps(struct vgic_v3_cpu_if *cpu_if);
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 707fbe627155..17d32c2797c0 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -375,6 +375,7 @@ int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu);
 void kvm_vgic_load(struct kvm_vcpu *vcpu);
 void kvm_vgic_put(struct kvm_vcpu *vcpu);
 
+void vgic_v3_sync_nested(struct kvm_vcpu *vcpu);
 void vgic_v3_load_nested(struct kvm_vcpu *vcpu);
 void vgic_v3_put_nested(struct kvm_vcpu *vcpu);
 void vgic_v3_handle_nested_maint_irq(struct kvm_vcpu *vcpu);
diff --git a/virt/kvm/arm/hyp/vgic-v3-sr.c b/virt/kvm/arm/hyp/vgic-v3-sr.c
index b5d818d8b458..c2e4354416a3 100644
--- a/virt/kvm/arm/hyp/vgic-v3-sr.c
+++ b/virt/kvm/arm/hyp/vgic-v3-sr.c
@@ -27,7 +27,7 @@
 #define vtr_to_nr_pre_bits(v)		((((u32)(v) >> 26) & 7) + 1)
 #define vtr_to_nr_apr_regs(v)		(1 << (vtr_to_nr_pre_bits(v) - 5))
 
-static u64 __hyp_text __gic_v3_get_lr(unsigned int lr)
+u64 __hyp_text __gic_v3_get_lr(unsigned int lr)
 {
 	switch (lr & 0xf) {
 	case 0:
diff --git a/virt/kvm/arm/vgic/vgic-v3-nested.c b/virt/kvm/arm/vgic/vgic-v3-nested.c
index 6fb81dfbb679..c917d49e4a14 100644
--- a/virt/kvm/arm/vgic/vgic-v3-nested.c
+++ b/virt/kvm/arm/vgic/vgic-v3-nested.c
@@ -137,6 +137,38 @@ static void vgic_v3_fixup_shadow_lr_state(struct kvm_vcpu *vcpu)
 	}
 }
 
+void vgic_v3_sync_nested(struct kvm_vcpu *vcpu)
+{
+	struct vgic_v3_cpu_if *cpu_if = vcpu_nested_if(vcpu);
+	struct vgic_v3_cpu_if *s_cpu_if = vcpu_shadow_if(vcpu);
+	struct vgic_irq *irq;
+	int i;
+
+	for (i = 0; i < s_cpu_if->used_lrs; i++) {
+		u64 lr = cpu_if->vgic_lr[i];
+		int l1_irq;
+
+		if (!(lr & ICH_LR_HW) || !(lr & ICH_LR_STATE))
+			continue;
+
+		/*
+		 * If we had a HW lr programmed by the guest hypervisor, we
+		 * need to emulate the HW effect between the guest hypervisor
+		 * and the nested guest.
+		 */
+		l1_irq = (lr & ICH_LR_PHYS_ID_MASK) >> ICH_LR_PHYS_ID_SHIFT;
+		irq = vgic_get_irq(vcpu->kvm, vcpu, l1_irq);
+		if (!irq)
+			continue; /* oh well, the guest hyp is broken */
+
+		lr = __gic_v3_get_lr(i);
+		if (!(lr & ICH_LR_STATE))
+			irq->active = false;
+
+		vgic_put_irq(vcpu->kvm, irq);
+	}
+}
+
 void vgic_v3_load_nested(struct kvm_vcpu *vcpu)
 {
 	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
index f32f49b0c803..3dd670166d70 100644
--- a/virt/kvm/arm/vgic/vgic.c
+++ b/virt/kvm/arm/vgic/vgic.c
@@ -872,12 +872,14 @@ void kvm_vgic_sync_hwstate(struct kvm_vcpu *vcpu)
 {
 	int used_lrs;
 
-	/* If nesting, this is a load/put affair, not flush/sync. */
-	if (vgic_state_is_nested(vcpu))
-		return;
-
 	WARN_ON(vgic_v4_sync_hwstate(vcpu));
 
+	/* If nesting, emulate the HW effect from L0 to L1 */
+	if (vgic_state_is_nested(vcpu)) {
+		vgic_v3_sync_nested(vcpu);
+		return;
+	}
+
 	/* An empty ap_list_head implies used_lrs == 0 */
 	if (list_empty(&vcpu->arch.vgic_cpu.ap_list_head))
 		return;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 176+ messages in thread

* [PATCH 52/59] KVM: arm64: nv: vgic: Allow userland to set VGIC maintenance IRQ
  2019-06-21  9:37 [PATCH 00/59] KVM: arm64: ARMv8.3 Nested Virtualization support Marc Zyngier
                   ` (50 preceding siblings ...)
  2019-06-21  9:38 ` [PATCH 51/59] KVM: arm64: nv: vgic: Emulate the HW bit in software Marc Zyngier
@ 2019-06-21  9:38 ` Marc Zyngier
  2019-07-04  7:38   ` Julien Thierry
  2019-06-21  9:38 ` [PATCH 53/59] KVM: arm64: nv: Implement maintenance interrupt forwarding Marc Zyngier
                   ` (8 subsequent siblings)
  60 siblings, 1 reply; 176+ messages in thread
From: Marc Zyngier @ 2019-06-21  9:38 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	Julien Thierry, James Morse, Suzuki K Poulose

From: Andre Przywara <andre.przywara@arm.com>

The VGIC maintenance IRQ signals various conditions about the LRs, when
the GIC's virtualization extension is used.
So far we didn't need it, but nested virtualization needs to know about
this interrupt, so add a userland interface to set up the IRQ number.
The architecture mandates that it must be a PPI; on top of that, this
code only exposes a per-device option, so the PPI is the same on all
VCPUs.
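
For illustration, a minimal userspace sketch (the helper is
hypothetical and not part of this series; vgic_fd is assumed to be the
fd returned by KVM_CREATE_DEVICE for the GICv3 device, and the value
travels through kvm_device_attr.addr, matching the kernel side below):

  #include <stdint.h>
  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  /* Hypothetical helper: program the maintenance PPI, e.g. INTID 25. */
  static int set_vgic_maint_irq(int vgic_fd, uint32_t vintid)
  {
  	struct kvm_device_attr attr = {
  		.group = KVM_DEV_ARM_VGIC_GRP_MAINT_IRQ,
  		.addr  = (uint64_t)(unsigned long)&vintid,
  	};

  	/* The kernel rejects anything that is not a PPI (16-31) */
  	return ioctl(vgic_fd, KVM_SET_DEVICE_ATTR, &attr);
  }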

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
[added some bits of documentation]
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 .../virtual/kvm/devices/arm-vgic-v3.txt       |  9 ++++++++
 arch/arm/include/uapi/asm/kvm.h               |  1 +
 arch/arm64/include/uapi/asm/kvm.h             |  1 +
 include/kvm/arm_vgic.h                        |  3 +++
 virt/kvm/arm/vgic/vgic-kvm-device.c           | 22 +++++++++++++++++++
 5 files changed, 36 insertions(+)

diff --git a/Documentation/virtual/kvm/devices/arm-vgic-v3.txt b/Documentation/virtual/kvm/devices/arm-vgic-v3.txt
index ff290b43c8e5..c70e8f2e0c9c 100644
--- a/Documentation/virtual/kvm/devices/arm-vgic-v3.txt
+++ b/Documentation/virtual/kvm/devices/arm-vgic-v3.txt
@@ -249,3 +249,12 @@ Groups:
   Errors:
     -EINVAL: vINTID is not multiple of 32 or
      info field is not VGIC_LEVEL_INFO_LINE_LEVEL
+
+  KVM_DEV_ARM_VGIC_GRP_MAINT_IRQ
+    The attr field of kvm_device_attr encodes the following values:
+    bits:     | 31   ....    5 | 4  ....  0 |
+    values:   |      RES0      |   vINTID   |
+
+    The vINTID specifies which interrupt is injected when the vGIC
+    has to signal a maintenance interrupt. It must be a PPI.
+
diff --git a/arch/arm/include/uapi/asm/kvm.h b/arch/arm/include/uapi/asm/kvm.h
index 4602464ebdfb..a6af45645a6d 100644
--- a/arch/arm/include/uapi/asm/kvm.h
+++ b/arch/arm/include/uapi/asm/kvm.h
@@ -233,6 +233,7 @@ struct kvm_vcpu_events {
 #define KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS 6
 #define KVM_DEV_ARM_VGIC_GRP_LEVEL_INFO  7
 #define KVM_DEV_ARM_VGIC_GRP_ITS_REGS	8
+#define KVM_DEV_ARM_VGIC_GRP_MAINT_IRQ	9
 #define KVM_DEV_ARM_VGIC_LINE_LEVEL_INFO_SHIFT	10
 #define KVM_DEV_ARM_VGIC_LINE_LEVEL_INFO_MASK \
 			(0x3fffffULL << KVM_DEV_ARM_VGIC_LINE_LEVEL_INFO_SHIFT)
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index 563e2a8bae93..8e1c6ffe1b59 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -295,6 +295,7 @@ struct kvm_vcpu_events {
 #define KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS 6
 #define KVM_DEV_ARM_VGIC_GRP_LEVEL_INFO  7
 #define KVM_DEV_ARM_VGIC_GRP_ITS_REGS 8
+#define KVM_DEV_ARM_VGIC_GRP_MAINT_IRQ  9
 #define KVM_DEV_ARM_VGIC_LINE_LEVEL_INFO_SHIFT	10
 #define KVM_DEV_ARM_VGIC_LINE_LEVEL_INFO_MASK \
 			(0x3fffffULL << KVM_DEV_ARM_VGIC_LINE_LEVEL_INFO_SHIFT)
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 17d32c2797c0..95c6232663c3 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -229,6 +229,9 @@ struct vgic_dist {
 
 	int			nr_spis;
 
+	/* The GIC maintenance IRQ for nested hypervisors. */
+	u32			maint_irq;
+
 	/* base addresses in guest physical address space: */
 	gpa_t			vgic_dist_base;		/* distributor */
 	union {
diff --git a/virt/kvm/arm/vgic/vgic-kvm-device.c b/virt/kvm/arm/vgic/vgic-kvm-device.c
index 44419679f91a..dfb1d7cc66b3 100644
--- a/virt/kvm/arm/vgic/vgic-kvm-device.c
+++ b/virt/kvm/arm/vgic/vgic-kvm-device.c
@@ -241,6 +241,12 @@ static int vgic_get_common_attr(struct kvm_device *dev,
 			     VGIC_NR_PRIVATE_IRQS, uaddr);
 		break;
 	}
+	case KVM_DEV_ARM_VGIC_GRP_MAINT_IRQ: {
+		u32 __user *uaddr = (u32 __user *)(long)attr->addr;
+
+		r = put_user(dev->kvm->arch.vgic.maint_irq, uaddr);
+		break;
+	}
 	}
 
 	return r;
@@ -627,6 +633,21 @@ static int vgic_v3_set_attr(struct kvm_device *dev,
 		reg = tmp32;
 		return vgic_v3_attr_regs_access(dev, attr, &reg, true);
 	}
+	case KVM_DEV_ARM_VGIC_GRP_MAINT_IRQ: {
+		u32 __user *uaddr = (u32 __user *)(long)attr->addr;
+		u32 val;
+
+		if (get_user(val, uaddr))
+			return -EFAULT;
+
+		/* Must be a PPI. */
+		if ((val >= VGIC_NR_PRIVATE_IRQS) || (val < VGIC_NR_SGIS))
+			return -EINVAL;
+
+		dev->kvm->arch.vgic.maint_irq = val;
+
+		return 0;
+	}
 	case KVM_DEV_ARM_VGIC_GRP_CTRL: {
 		int ret;
 
@@ -712,6 +733,7 @@ static int vgic_v3_has_attr(struct kvm_device *dev,
 	case KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS:
 		return vgic_v3_has_attr_regs(dev, attr);
 	case KVM_DEV_ARM_VGIC_GRP_NR_IRQS:
+	case KVM_DEV_ARM_VGIC_GRP_MAINT_IRQ:
 		return 0;
 	case KVM_DEV_ARM_VGIC_GRP_LEVEL_INFO: {
 		if (((attr->attr & KVM_DEV_ARM_VGIC_LINE_LEVEL_INFO_MASK) >>
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 176+ messages in thread

* [PATCH 53/59] KVM: arm64: nv: Implement maintenance interrupt forwarding
  2019-06-21  9:37 [PATCH 00/59] KVM: arm64: ARMv8.3 Nested Virtualization support Marc Zyngier
                   ` (51 preceding siblings ...)
  2019-06-21  9:38 ` [PATCH 52/59] KVM: arm64: nv: vgic: Allow userland to set VGIC maintenance IRQ Marc Zyngier
@ 2019-06-21  9:38 ` Marc Zyngier
  2019-07-04  8:06   ` Julien Thierry
  2019-07-16 16:35   ` Alexandru Elisei
  2019-06-21  9:38 ` [PATCH 54/59] KVM: arm64: nv: Add nested GICv3 tracepoints Marc Zyngier
                   ` (7 subsequent siblings)
  60 siblings, 2 replies; 176+ messages in thread
From: Marc Zyngier @ 2019-06-21  9:38 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	Julien Thierry, James Morse, Suzuki K Poulose

When we take a maintenance interrupt, we need to decide whether
it was triggered by an action of the guest itself, or whether it is
something that needs to be forwarded to the guest hypervisor.
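
Condensed, the forwarding condition amounts to this (a sketch; the
handler changes below are authoritative):

  /* Sketch: is this maintenance IRQ the L1 hypervisor's business? */
  static bool maint_irq_is_for_l1(struct kvm_vcpu *vcpu)
  {
  	struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.nested_vgic_v3;

  	return vgic_state_is_nested(vcpu) &&
  	       (cpu_if->vgic_hcr & ICH_HCR_EN) &&
  	       vgic_v3_get_misr(vcpu);
  }

If it is, the interrupt is made pending as the guest's maintenance
PPI; otherwise the host simply deactivates it and relies on the usual
exit path to fold the LR state.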

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm64/kvm/nested.c            |  2 +-
 virt/kvm/arm/vgic/vgic-init.c      | 30 ++++++++++++++++++++++++++++++
 virt/kvm/arm/vgic/vgic-v3-nested.c | 25 +++++++++++++++++++++----
 3 files changed, 52 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
index df2db9ab7cfb..ab61f0f30ee6 100644
--- a/arch/arm64/kvm/nested.c
+++ b/arch/arm64/kvm/nested.c
@@ -545,7 +545,7 @@ bool vgic_state_is_nested(struct kvm_vcpu *vcpu)
 	bool imo = __vcpu_sys_reg(vcpu, HCR_EL2) & HCR_IMO;
 	bool fmo = __vcpu_sys_reg(vcpu, HCR_EL2) & HCR_FMO;
 
-	WARN(imo != fmo, "Separate virtual IRQ/FIQ settings not supported\n");
+	WARN_ONCE(imo != fmo, "Separate virtual IRQ/FIQ settings not supported\n");
 
 	return nested_virt_in_use(vcpu) && imo && fmo && !is_hyp_ctxt(vcpu);
 }
diff --git a/virt/kvm/arm/vgic/vgic-init.c b/virt/kvm/arm/vgic/vgic-init.c
index 3bdb31eaed64..ec54bc8d5126 100644
--- a/virt/kvm/arm/vgic/vgic-init.c
+++ b/virt/kvm/arm/vgic/vgic-init.c
@@ -17,9 +17,11 @@
 #include <linux/uaccess.h>
 #include <linux/interrupt.h>
 #include <linux/cpu.h>
+#include <linux/irq.h>
 #include <linux/kvm_host.h>
 #include <kvm/arm_vgic.h>
 #include <asm/kvm_mmu.h>
+#include <asm/kvm_nested.h>
 #include "vgic.h"
 
 /*
@@ -240,6 +242,16 @@ int kvm_vgic_vcpu_init(struct kvm_vcpu *vcpu)
 	if (!irqchip_in_kernel(vcpu->kvm))
 		return 0;
 
+	if (nested_virt_in_use(vcpu)) {
+		/* FIXME: remove this hack */
+		if (vcpu->kvm->arch.vgic.maint_irq == 0)
+			vcpu->kvm->arch.vgic.maint_irq = kvm_vgic_global_state.maint_irq;
+		ret = kvm_vgic_set_owner(vcpu, vcpu->kvm->arch.vgic.maint_irq,
+					 vcpu);
+		if (ret)
+			return ret;
+	}
+
 	/*
 	 * If we are creating a VCPU with a GICv3 we must also register the
 	 * KVM io device for the redistributor that belongs to this VCPU.
@@ -455,12 +467,23 @@ static int vgic_init_cpu_dying(unsigned int cpu)
 
 static irqreturn_t vgic_maintenance_handler(int irq, void *data)
 {
+	struct kvm_vcpu *vcpu = *(struct kvm_vcpu **)data;
+
 	/*
 	 * We cannot rely on the vgic maintenance interrupt to be
 	 * delivered synchronously. This means we can only use it to
 	 * exit the VM, and we perform the handling of EOIed
 	 * interrupts on the exit path (see vgic_fold_lr_state).
 	 */
+
+	/* If not nested, deactivate */
+	if (!vcpu || !vgic_state_is_nested(vcpu)) {
+		irq_set_irqchip_state(irq, IRQCHIP_STATE_ACTIVE, false);
+		return IRQ_HANDLED;
+	}
+
+	/* Assume nested from now on */
+	vgic_v3_handle_nested_maint_irq(vcpu);
 	return IRQ_HANDLED;
 }
 
@@ -531,6 +554,13 @@ int kvm_vgic_hyp_init(void)
 		return ret;
 	}
 
+	ret = irq_set_vcpu_affinity(kvm_vgic_global_state.maint_irq,
+				    kvm_get_running_vcpus());
+	if (ret) {
+		kvm_err("Error setting vcpu affinity\n");
+		goto out_free_irq;
+	}
+
 	ret = cpuhp_setup_state(CPUHP_AP_KVM_ARM_VGIC_INIT_STARTING,
 				"kvm/arm/vgic:starting",
 				vgic_init_cpu_starting, vgic_init_cpu_dying);
diff --git a/virt/kvm/arm/vgic/vgic-v3-nested.c b/virt/kvm/arm/vgic/vgic-v3-nested.c
index c917d49e4a14..7c5f82ae68e0 100644
--- a/virt/kvm/arm/vgic/vgic-v3-nested.c
+++ b/virt/kvm/arm/vgic/vgic-v3-nested.c
@@ -172,10 +172,20 @@ void vgic_v3_sync_nested(struct kvm_vcpu *vcpu)
 void vgic_v3_load_nested(struct kvm_vcpu *vcpu)
 {
 	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
+	struct vgic_irq *irq;
+	unsigned long flags;
 
 	vgic_cpu->shadow_vgic_v3 = vgic_cpu->nested_vgic_v3;
 	vgic_v3_create_shadow_lr(vcpu);
 	__vgic_v3_restore_state(vcpu_shadow_if(vcpu));
+
+	irq = vgic_get_irq(vcpu->kvm, vcpu, vcpu->kvm->arch.vgic.maint_irq);
+	raw_spin_lock_irqsave(&irq->irq_lock, flags);
+	if (irq->line_level || irq->active)
+		irq_set_irqchip_state(kvm_vgic_global_state.maint_irq,
+				      IRQCHIP_STATE_ACTIVE, true);
+	raw_spin_unlock_irqrestore(&irq->irq_lock, flags);
+	vgic_put_irq(vcpu->kvm, irq);
 }
 
 void vgic_v3_put_nested(struct kvm_vcpu *vcpu)
@@ -190,11 +200,14 @@ void vgic_v3_put_nested(struct kvm_vcpu *vcpu)
 	 */
 	vgic_v3_fixup_shadow_lr_state(vcpu);
 	vgic_cpu->nested_vgic_v3 = vgic_cpu->shadow_vgic_v3;
+	irq_set_irqchip_state(kvm_vgic_global_state.maint_irq,
+			      IRQCHIP_STATE_ACTIVE, false);
 }
 
 void vgic_v3_handle_nested_maint_irq(struct kvm_vcpu *vcpu)
 {
 	struct vgic_v3_cpu_if *cpu_if = vcpu_nested_if(vcpu);
+	bool state;
 
 	/*
 	 * If we exit a nested VM with a pending maintenance interrupt from the
@@ -202,8 +215,12 @@ void vgic_v3_handle_nested_maint_irq(struct kvm_vcpu *vcpu)
 	 * can re-sync the appropriate LRs and sample level triggered interrupts
 	 * again.
 	 */
-	if (vgic_state_is_nested(vcpu) &&
-	    (cpu_if->vgic_hcr & ICH_HCR_EN) &&
-	    vgic_v3_get_misr(vcpu))
-		kvm_inject_nested_irq(vcpu);
+	if (!vgic_state_is_nested(vcpu))
+		return;
+
+	state  = cpu_if->vgic_hcr & ICH_HCR_EN;
+	state &= vgic_v3_get_misr(vcpu);
+
+	kvm_vgic_inject_irq(vcpu->kvm, vcpu->vcpu_id,
+			    vcpu->kvm->arch.vgic.maint_irq, state, vcpu);
 }
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 176+ messages in thread

* [PATCH 54/59] KVM: arm64: nv: Add nested GICv3 tracepoints
  2019-06-21  9:37 [PATCH 00/59] KVM: arm64: ARMv8.3 Nested Virtualization support Marc Zyngier
                   ` (52 preceding siblings ...)
  2019-06-21  9:38 ` [PATCH 53/59] KVM: arm64: nv: Implement maintenance interrupt forwarding Marc Zyngier
@ 2019-06-21  9:38 ` Marc Zyngier
  2019-06-21  9:38 ` [PATCH 55/59] arm64: KVM: nv: Add handling of EL2-specific timer registers Marc Zyngier
                   ` (6 subsequent siblings)
  60 siblings, 0 replies; 176+ messages in thread
From: Marc Zyngier @ 2019-06-21  9:38 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	Julien Thierry, James Morse, Suzuki K Poulose

From: Christoffer Dall <christoffer.dall@arm.com>

Add tracepoints to be able to peek into the shadow LRs used when
running a nested guest.

Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 virt/kvm/arm/vgic/vgic-nested-trace.h | 137 ++++++++++++++++++++++++++
 virt/kvm/arm/vgic/vgic-v3-nested.c    |  12 ++-
 2 files changed, 148 insertions(+), 1 deletion(-)
 create mode 100644 virt/kvm/arm/vgic/vgic-nested-trace.h

diff --git a/virt/kvm/arm/vgic/vgic-nested-trace.h b/virt/kvm/arm/vgic/vgic-nested-trace.h
new file mode 100644
index 000000000000..69f4ec031e7c
--- /dev/null
+++ b/virt/kvm/arm/vgic/vgic-nested-trace.h
@@ -0,0 +1,137 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#if !defined(_TRACE_VGIC_NESTED_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_VGIC_NESTED_H
+
+#include <linux/tracepoint.h>
+
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM kvm
+
+#define SLR_ENTRY_VALS(x)							\
+	" ",									\
+	!!(__entry->lrs[x] & ICH_LR_HW),		   			\
+	!!(__entry->lrs[x] & ICH_LR_PENDING_BIT),	   			\
+	!!(__entry->lrs[x] & ICH_LR_ACTIVE_BIT),	   			\
+	__entry->lrs[x] & ICH_LR_VIRTUAL_ID_MASK,				\
+	(__entry->lrs[x] & ICH_LR_PHYS_ID_MASK) >> ICH_LR_PHYS_ID_SHIFT,	\
+	(__entry->orig_lrs[x] & ICH_LR_PHYS_ID_MASK) >> ICH_LR_PHYS_ID_SHIFT
+
+TRACE_EVENT(vgic_create_shadow_lrs,
+	TP_PROTO(struct kvm_vcpu *vcpu, int nr_lr, u64 *lrs, u64 *orig_lrs),
+	TP_ARGS(vcpu, nr_lr, lrs, orig_lrs),
+
+	TP_STRUCT__entry(
+		__field(	int,	nr_lr			)
+		__array(	u64,	lrs,		16	)
+		__array(	u64,	orig_lrs,	16	)
+	),
+
+	TP_fast_assign(
+		__entry->nr_lr		= nr_lr;
+		memcpy(__entry->lrs, lrs, 16 * sizeof(u64));
+		memcpy(__entry->orig_lrs, orig_lrs, 16 * sizeof(u64));
+	),
+
+	TP_printk("nr_lr: %d\n"
+		  "%50sLR[ 0]: HW: %d P: %d: A: %d vINTID: %5llu pINTID: %5llu (%5llu)\n"
+		  "%50sLR[ 1]: HW: %d P: %d: A: %d vINTID: %5llu pINTID: %5llu (%5llu)\n"
+		  "%50sLR[ 2]: HW: %d P: %d: A: %d vINTID: %5llu pINTID: %5llu (%5llu)\n"
+		  "%50sLR[ 3]: HW: %d P: %d: A: %d vINTID: %5llu pINTID: %5llu (%5llu)\n"
+		  "%50sLR[ 4]: HW: %d P: %d: A: %d vINTID: %5llu pINTID: %5llu (%5llu)\n"
+		  "%50sLR[ 5]: HW: %d P: %d: A: %d vINTID: %5llu pINTID: %5llu (%5llu)\n"
+		  "%50sLR[ 6]: HW: %d P: %d: A: %d vINTID: %5llu pINTID: %5llu (%5llu)\n"
+		  "%50sLR[ 7]: HW: %d P: %d: A: %d vINTID: %5llu pINTID: %5llu (%5llu)\n"
+		  "%50sLR[ 8]: HW: %d P: %d: A: %d vINTID: %5llu pINTID: %5llu (%5llu)\n"
+		  "%50sLR[ 9]: HW: %d P: %d: A: %d vINTID: %5llu pINTID: %5llu (%5llu)\n"
+		  "%50sLR[10]: HW: %d P: %d: A: %d vINTID: %5llu pINTID: %5llu (%5llu)\n"
+		  "%50sLR[11]: HW: %d P: %d: A: %d vINTID: %5llu pINTID: %5llu (%5llu)\n"
+		  "%50sLR[12]: HW: %d P: %d: A: %d vINTID: %5llu pINTID: %5llu (%5llu)\n"
+		  "%50sLR[13]: HW: %d P: %d: A: %d vINTID: %5llu pINTID: %5llu (%5llu)\n"
+		  "%50sLR[14]: HW: %d P: %d: A: %d vINTID: %5llu pINTID: %5llu (%5llu)\n"
+		  "%50sLR[15]: HW: %d P: %d: A: %d vINTID: %5llu pINTID: %5llu (%5llu)",
+		  __entry->nr_lr,
+		  SLR_ENTRY_VALS(0), SLR_ENTRY_VALS(1), SLR_ENTRY_VALS(2),
+		  SLR_ENTRY_VALS(3), SLR_ENTRY_VALS(4), SLR_ENTRY_VALS(5),
+		  SLR_ENTRY_VALS(6), SLR_ENTRY_VALS(7), SLR_ENTRY_VALS(8),
+		  SLR_ENTRY_VALS(9), SLR_ENTRY_VALS(10), SLR_ENTRY_VALS(11),
+		  SLR_ENTRY_VALS(12), SLR_ENTRY_VALS(13), SLR_ENTRY_VALS(14),
+		  SLR_ENTRY_VALS(15))
+);
+
+#define LR_ENTRY_VALS(x)							\
+	" ",									\
+	!!(__entry->lrs[x] & ICH_LR_HW),		   			\
+	!!(__entry->lrs[x] & ICH_LR_PENDING_BIT),	   			\
+	!!(__entry->lrs[x] & ICH_LR_ACTIVE_BIT),	   			\
+	__entry->lrs[x] & ICH_LR_VIRTUAL_ID_MASK,				\
+	(__entry->lrs[x] & ICH_LR_PHYS_ID_MASK) >> ICH_LR_PHYS_ID_SHIFT
+
+TRACE_EVENT(vgic_put_nested,
+	TP_PROTO(struct kvm_vcpu *vcpu, int nr_lr, u64 *lrs),
+	TP_ARGS(vcpu, nr_lr, lrs),
+
+	TP_STRUCT__entry(
+		__field(	int,	nr_lr			)
+		__array(	u64,	lrs,		16	)
+	),
+
+	TP_fast_assign(
+		__entry->nr_lr		= nr_lr;
+		memcpy(__entry->lrs, lrs, 16 * sizeof(u64));
+	),
+
+	TP_printk("nr_lr: %d\n"
+		  "%50sLR[ 0]: HW: %d P: %d: A: %d vINTID: %5llu pINTID: %5llu\n"
+		  "%50sLR[ 1]: HW: %d P: %d: A: %d vINTID: %5llu pINTID: %5llu\n"
+		  "%50sLR[ 2]: HW: %d P: %d: A: %d vINTID: %5llu pINTID: %5llu\n"
+		  "%50sLR[ 3]: HW: %d P: %d: A: %d vINTID: %5llu pINTID: %5llu\n"
+		  "%50sLR[ 4]: HW: %d P: %d: A: %d vINTID: %5llu pINTID: %5llu\n"
+		  "%50sLR[ 5]: HW: %d P: %d: A: %d vINTID: %5llu pINTID: %5llu\n"
+		  "%50sLR[ 6]: HW: %d P: %d: A: %d vINTID: %5llu pINTID: %5llu\n"
+		  "%50sLR[ 7]: HW: %d P: %d: A: %d vINTID: %5llu pINTID: %5llu\n"
+		  "%50sLR[ 8]: HW: %d P: %d: A: %d vINTID: %5llu pINTID: %5llu\n"
+		  "%50sLR[ 9]: HW: %d P: %d: A: %d vINTID: %5llu pINTID: %5llu\n"
+		  "%50sLR[10]: HW: %d P: %d: A: %d vINTID: %5llu pINTID: %5llu\n"
+		  "%50sLR[11]: HW: %d P: %d: A: %d vINTID: %5llu pINTID: %5llu\n"
+		  "%50sLR[12]: HW: %d P: %d: A: %d vINTID: %5llu pINTID: %5llu\n"
+		  "%50sLR[13]: HW: %d P: %d: A: %d vINTID: %5llu pINTID: %5llu\n"
+		  "%50sLR[14]: HW: %d P: %d: A: %d vINTID: %5llu pINTID: %5llu\n"
+		  "%50sLR[15]: HW: %d P: %d: A: %d vINTID: %5llu pINTID: %5llu",
+		  __entry->nr_lr,
+		  LR_ENTRY_VALS(0), LR_ENTRY_VALS(1), LR_ENTRY_VALS(2),
+		  LR_ENTRY_VALS(3), LR_ENTRY_VALS(4), LR_ENTRY_VALS(5),
+		  LR_ENTRY_VALS(6), LR_ENTRY_VALS(7), LR_ENTRY_VALS(8),
+		  LR_ENTRY_VALS(9), LR_ENTRY_VALS(10), LR_ENTRY_VALS(11),
+		  LR_ENTRY_VALS(12), LR_ENTRY_VALS(13), LR_ENTRY_VALS(14),
+		  LR_ENTRY_VALS(15))
+);
+
+TRACE_EVENT(vgic_nested_hw_emulate,
+	TP_PROTO(int lr, u64 lr_val, u32 l1_intid),
+	TP_ARGS(lr, lr_val, l1_intid),
+
+	TP_STRUCT__entry(
+		__field(	int,	lr		)
+		__field(	u64,	lr_val		)
+		__field(	u32,	l1_intid	)
+	),
+
+	TP_fast_assign(
+		__entry->lr		= lr;
+		__entry->lr_val		= lr_val;
+		__entry->l1_intid	= l1_intid;
+	),
+
+	TP_printk("lr: %d LR %llx L1 INTID: %u\n",
+		  __entry->lr, __entry->lr_val, __entry->l1_intid)
+);
+
+#endif /* _TRACE_VGIC_NESTED_H */
+
+#undef TRACE_INCLUDE_PATH
+#define TRACE_INCLUDE_PATH ../../../virt/kvm/arm/vgic
+#undef TRACE_INCLUDE_FILE
+#define TRACE_INCLUDE_FILE vgic-nested-trace
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/virt/kvm/arm/vgic/vgic-v3-nested.c b/virt/kvm/arm/vgic/vgic-v3-nested.c
index 7c5f82ae68e0..ec838b7be6dc 100644
--- a/virt/kvm/arm/vgic/vgic-v3-nested.c
+++ b/virt/kvm/arm/vgic/vgic-v3-nested.c
@@ -13,6 +13,9 @@
 
 #include "vgic.h"
 
+#define CREATE_TRACE_POINTS
+#include "vgic-nested-trace.h"
+
 static inline struct vgic_v3_cpu_if *vcpu_nested_if(struct kvm_vcpu *vcpu)
 {
 	return &vcpu->arch.vgic_cpu.nested_vgic_v3;
@@ -118,6 +121,8 @@ static void vgic_v3_create_shadow_lr(struct kvm_vcpu *vcpu)
 		s_cpu_if->vgic_lr[i] = lr;
 	}
 
+	trace_vgic_create_shadow_lrs(vcpu, kvm_vgic_global_state.nr_lr,
+				     s_cpu_if->vgic_lr, cpu_if->vgic_lr);
 	s_cpu_if->used_lrs = kvm_vgic_global_state.nr_lr;
 }
 
@@ -162,8 +167,10 @@ void vgic_v3_sync_nested(struct kvm_vcpu *vcpu)
 			continue; /* oh well, the guest hyp is broken */
 
 		lr = __gic_v3_get_lr(i);
-		if (!(lr & ICH_LR_STATE))
+		if (!(lr & ICH_LR_STATE)) {
+			trace_vgic_nested_hw_emulate(i, lr, l1_irq);
 			irq->active = false;
+		}
 
 		vgic_put_irq(vcpu->kvm, irq);
 	}
@@ -194,6 +201,9 @@ void vgic_v3_put_nested(struct kvm_vcpu *vcpu)
 
 	__vgic_v3_save_state(vcpu_shadow_if(vcpu));
 
+	trace_vgic_put_nested(vcpu, kvm_vgic_global_state.nr_lr,
+			      vcpu_shadow_if(vcpu)->vgic_lr);
+
 	/*
 	 * Translate the shadow state HW fields back to the virtual ones
 	 * before copying the shadow struct back to the nested one.
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 176+ messages in thread

* [PATCH 55/59] arm64: KVM: nv: Add handling of EL2-specific timer registers
  2019-06-21  9:37 [PATCH 00/59] KVM: arm64: ARMv8.3 Nested Virtualization support Marc Zyngier
                   ` (53 preceding siblings ...)
  2019-06-21  9:38 ` [PATCH 54/59] KVM: arm64: nv: Add nested GICv3 tracepoints Marc Zyngier
@ 2019-06-21  9:38 ` Marc Zyngier
  2019-07-11 12:35   ` Alexandru Elisei
  2019-07-17 10:19   ` Alexandru Elisei
  2019-06-21  9:38 ` [PATCH 56/59] arm64: KVM: nv: Honor SCTLR_EL2.SPAN on entering vEL2 Marc Zyngier
                   ` (5 subsequent siblings)
  60 siblings, 2 replies; 176+ messages in thread
From: Marc Zyngier @ 2019-06-21  9:38 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	Julien Thierry, James Morse, Suzuki K Poulose

Add the required handling for EL2 and EL02 registers, as
well as EL1 registers used in the E2H context.
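
The subtle part is the E2H redirection: when the vCPU runs at vEL2
with HCR_EL2.E2H set, the CNTP_*_EL0 register names actually address
the EL2 physical timer. Each CNTP_* case below therefore follows this
pattern (excerpted from the hunk):

  if (vcpu_mode_el2(vcpu) && vcpu_el2_e2h_is_set(vcpu))
  	tmr = TIMER_HPTIMER;	/* CNTP_*_EL0 aliases CNTHP_*_EL2 */
  else
  	tmr = TIMER_PTIMER;	/* plain EL0 physical timer */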

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm64/kvm/sys_regs.c | 72 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 72 insertions(+)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index ba3bcd29c02d..0bd6a66b621e 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1361,20 +1361,92 @@ static bool access_arch_timer(struct kvm_vcpu *vcpu,
 
 	switch (reg) {
 	case SYS_CNTP_TVAL_EL0:
+		if (vcpu_mode_el2(vcpu) && vcpu_el2_e2h_is_set(vcpu))
+			tmr = TIMER_HPTIMER;
+		else
+			tmr = TIMER_PTIMER;
+		treg = TIMER_REG_TVAL;
+		break;
+
 	case SYS_AARCH32_CNTP_TVAL:
+	case SYS_CNTP_TVAL_EL02:
 		tmr = TIMER_PTIMER;
 		treg = TIMER_REG_TVAL;
 		break;
+
+	case SYS_CNTV_TVAL_EL02:
+		tmr = TIMER_VTIMER;
+		treg = TIMER_REG_TVAL;
+		break;
+
+	case SYS_CNTHP_TVAL_EL2:
+		tmr = TIMER_HPTIMER;
+		treg = TIMER_REG_TVAL;
+		break;
+
+	case SYS_CNTHV_TVAL_EL2:
+		tmr = TIMER_HVTIMER;
+		treg = TIMER_REG_TVAL;
+		break;
+
 	case SYS_CNTP_CTL_EL0:
+		if (vcpu_mode_el2(vcpu) && vcpu_el2_e2h_is_set(vcpu))
+			tmr = TIMER_HPTIMER;
+		else
+			tmr = TIMER_PTIMER;
+		treg = TIMER_REG_CTL;
+		break;
+
 	case SYS_AARCH32_CNTP_CTL:
+	case SYS_CNTP_CTL_EL02:
 		tmr = TIMER_PTIMER;
 		treg = TIMER_REG_CTL;
 		break;
+
+	case SYS_CNTV_CTL_EL02:
+		tmr = TIMER_VTIMER;
+		treg = TIMER_REG_CTL;
+		break;
+
+	case SYS_CNTHP_CTL_EL2:
+		tmr = TIMER_HPTIMER;
+		treg = TIMER_REG_CTL;
+		break;
+
+	case SYS_CNTHV_CTL_EL2:
+		tmr = TIMER_HVTIMER;
+		treg = TIMER_REG_CTL;
+		break;
+
 	case SYS_CNTP_CVAL_EL0:
+		if (vcpu_mode_el2(vcpu) && vcpu_el2_e2h_is_set(vcpu))
+			tmr = TIMER_HPTIMER;
+		else
+			tmr = TIMER_PTIMER;
+		treg = TIMER_REG_CVAL;
+		break;
+
 	case SYS_AARCH32_CNTP_CVAL:
+	case SYS_CNTP_CVAL_EL02:
 		tmr = TIMER_PTIMER;
 		treg = TIMER_REG_CVAL;
 		break;
+
+	case SYS_CNTV_CVAL_EL02:
+		tmr = TIMER_VTIMER;
+		treg = TIMER_REG_CVAL;
+		break;
+
+	case SYS_CNTHP_CVAL_EL2:
+		tmr = TIMER_HPTIMER;
+		treg = TIMER_REG_CVAL;
+		break;
+
+	case SYS_CNTHV_CVAL_EL2:
+		tmr = TIMER_HVTIMER;
+		treg = TIMER_REG_CVAL;
+		break;
+
 	case SYS_CNTVOFF_EL2:
 		tmr = TIMER_VTIMER;
 		treg = TIMER_REG_VOFF;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 176+ messages in thread

* [PATCH 56/59] arm64: KVM: nv: Honor SCTLR_EL2.SPAN on entering vEL2
  2019-06-21  9:37 [PATCH 00/59] KVM: arm64: ARMv8.3 Nested Virtualization support Marc Zyngier
                   ` (54 preceding siblings ...)
  2019-06-21  9:38 ` [PATCH 55/59] arm64: KVM: nv: Add handling of EL2-specific timer registers Marc Zyngier
@ 2019-06-21  9:38 ` Marc Zyngier
  2019-06-21  9:38 ` [PATCH 57/59] arm64: KVM: nv: Handle SCTLR_EL2 RES0/RES1 bits Marc Zyngier
                   ` (4 subsequent siblings)
  60 siblings, 0 replies; 176+ messages in thread
From: Marc Zyngier @ 2019-06-21  9:38 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	Julien Thierry, James Morse, Suzuki K Poulose

On entering vEL2, we must honor the SCTLR_EL2.SPAN bit so that
PSTATE.PAN reflects the expected setting.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm64/kvm/emulate-nested.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/emulate-nested.c b/arch/arm64/kvm/emulate-nested.c
index c406fd688b9f..e37af2749b38 100644
--- a/arch/arm64/kvm/emulate-nested.c
+++ b/arch/arm64/kvm/emulate-nested.c
@@ -127,9 +127,11 @@ void kvm_emulate_nested_eret(struct kvm_vcpu *vcpu)
 static void enter_el2_exception(struct kvm_vcpu *vcpu, u64 esr_el2,
 				enum exception_type type)
 {
+	u64 spsr = *vcpu_cpsr(vcpu);
+
 	trace_kvm_inject_nested_exception(vcpu, esr_el2, type);
 
-	vcpu_write_sys_reg(vcpu, *vcpu_cpsr(vcpu), SPSR_EL2);
+	vcpu_write_sys_reg(vcpu, spsr, SPSR_EL2);
 	vcpu_write_sys_reg(vcpu, *vcpu_pc(vcpu), ELR_EL2);
 	vcpu_write_sys_reg(vcpu, esr_el2, ESR_EL2);
 
@@ -137,6 +139,15 @@ static void enter_el2_exception(struct kvm_vcpu *vcpu, u64 esr_el2,
 	/* On an exception, PSTATE.SP becomes 1 */
 	*vcpu_cpsr(vcpu) = PSR_MODE_EL2h;
 	*vcpu_cpsr(vcpu) |= PSR_A_BIT | PSR_F_BIT | PSR_I_BIT | PSR_D_BIT;
+
+	/*
+	 * If SPAN is clear, set the PAN bit on exception entry;
+	 * if SPAN is set, copy the PAN bit across.
+	 */
+	if (!(vcpu_read_sys_reg(vcpu, SCTLR_EL2) & SCTLR_EL1_SPAN))
+		*vcpu_cpsr(vcpu) |= PSR_PAN_BIT;
+	else
+		*vcpu_cpsr(vcpu) |= (spsr & PSR_PAN_BIT);
 }
 
 /*
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 176+ messages in thread

* [PATCH 57/59] arm64: KVM: nv: Handle SCTLR_EL2 RES0/RES1 bits
  2019-06-21  9:37 [PATCH 00/59] KVM: arm64: ARMv8.3 Nested Virtualization support Marc Zyngier
                   ` (55 preceding siblings ...)
  2019-06-21  9:38 ` [PATCH 56/59] arm64: KVM: nv: Honor SCTLR_EL2.SPAN on entering vEL2 Marc Zyngier
@ 2019-06-21  9:38 ` Marc Zyngier
  2019-06-21  9:38 ` [PATCH 58/59] arm64: KVM: nv: Restrict S2 RD/WR permissions to match the guest's Marc Zyngier
                   ` (3 subsequent siblings)
  60 siblings, 0 replies; 176+ messages in thread
From: Marc Zyngier @ 2019-06-21  9:38 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	Julien Thierry, James Morse, Suzuki K Poulose

Depending on the HCR_EL2.{E2H,TGE} values, SCTLR_EL2 has different
RES0/RES1 constraints.

Let's handle that.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm64/kvm/sys_regs.c | 28 +++++++++++++++++++++++++++-
 1 file changed, 27 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 0bd6a66b621e..1d896113f1f8 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -415,6 +415,32 @@ static bool access_rw(struct kvm_vcpu *vcpu,
 	return true;
 }
 
+static bool access_sctlr_el2(struct kvm_vcpu *vcpu,
+			     struct sys_reg_params *p,
+			     const struct sys_reg_desc *r)
+{
+	if (el12_reg(p) && forward_nv_traps(vcpu))
+		return false;
+
+	if (p->is_write) {
+		u64 val = p->regval;
+
+		if (vcpu_el2_e2h_is_set(vcpu) && vcpu_el2_tge_is_set(vcpu)) {
+			val &= ~SCTLR_EL1_RES0;
+			val |=  SCTLR_EL1_RES1;
+		} else {
+			val &= ~SCTLR_EL2_RES0;
+			val |=  SCTLR_EL2_RES1;
+		}
+
+		vcpu_write_sys_reg(vcpu, val, r->reg);
+	} else {
+		p->regval = vcpu_read_sys_reg(vcpu, r->reg);
+	}
+
+	return true;
+}
+
 /*
  * See note at ARMv7 ARM B1.14.4 (TL;DR: S/W ops are not easily virtualized).
  */
@@ -2300,7 +2326,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 	{ SYS_DESC(SYS_VPIDR_EL2), access_rw, reset_vpidr, VPIDR_EL2 },
 	{ SYS_DESC(SYS_VMPIDR_EL2), access_rw, reset_vmpidr, VMPIDR_EL2 },
 
-	{ SYS_DESC(SYS_SCTLR_EL2), access_rw, reset_val, SCTLR_EL2, 0 },
+	{ SYS_DESC(SYS_SCTLR_EL2), access_sctlr_el2, reset_val, SCTLR_EL2, SCTLR_EL2_RES1 },
 	{ SYS_DESC(SYS_ACTLR_EL2), access_rw, reset_val, ACTLR_EL2, 0 },
 	{ SYS_DESC(SYS_HCR_EL2), access_rw, reset_val, HCR_EL2, 0 },
 	{ SYS_DESC(SYS_MDCR_EL2), access_rw, reset_val, MDCR_EL2, 0 },
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 176+ messages in thread

* [PATCH 58/59] arm64: KVM: nv: Restrict S2 RD/WR permissions to match the guest's
  2019-06-21  9:37 [PATCH 00/59] KVM: arm64: ARMv8.3 Nested Virtualization support Marc Zyngier
                   ` (56 preceding siblings ...)
  2019-06-21  9:38 ` [PATCH 57/59] arm64: KVM: nv: Handle SCTLR_EL2 RES0/RES1 bits Marc Zyngier
@ 2019-06-21  9:38 ` Marc Zyngier
  2019-06-21  9:38 ` [PATCH 59/59] arm64: KVM: nv: Allow userspace to request KVM_ARM_VCPU_NESTED_VIRT Marc Zyngier
                   ` (2 subsequent siblings)
  60 siblings, 0 replies; 176+ messages in thread
From: Marc Zyngier @ 2019-06-21  9:38 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	Julien Thierry, James Morse, Suzuki K Poulose

When mapping a page in a shadow stage-2, special care must be
taken not to be more permissive than the guest's own stage-2: we must
not map a page writable or readable when the guest hasn't granted
that permission.
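
Concretely, the shadow permission is the logical AND of what the
canonical stage-2 would allow and what the guest's own stage-2 grants
(excerpted from the fault path below):

  if (kvm_is_shadow_s2_fault(vcpu)) {
  	writable &= kvm_s2_trans_writable(nested);
  	readable &= kvm_s2_trans_readable(nested);
  }

so a page the guest maps read-only can never end up writable in the
shadow stage-2, whatever the host-side outcome was.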

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm/include/asm/kvm_mmu.h      | 18 ++++++++++++++++++
 arch/arm64/include/asm/kvm_mmu.h    | 18 ++++++++++++++++++
 arch/arm64/include/asm/kvm_nested.h | 10 ++++++++++
 virt/kvm/arm/mmu.c                  | 21 ++++++++++++++++++++-
 4 files changed, 66 insertions(+), 1 deletion(-)

diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index 7a6e9008ed45..7aa7182ca744 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -113,6 +113,12 @@ static inline pud_t kvm_s2pud_mkexec(pud_t pud)
 	return pud;
 }
 
+static inline pud_t kvm_s2pud_revoke_read(pud_t pud)
+{
+	WARN_ON(1);
+	return pud;
+}
+
 static inline bool kvm_s2pud_exec(pud_t *pud)
 {
 	WARN_ON(1);
@@ -155,6 +161,18 @@ static inline pmd_t kvm_s2pmd_mkexec(pmd_t pmd)
 	return pmd;
 }
 
+static inline pte_t kvm_s2pte_revoke_read(pte_t pte)
+{
+	pte_val(pte) &= ~L_PTE_S2_RDONLY;
+	return pte;
+}
+
+static inline pmd_t kvm_s2pmd_revoke_read(pmd_t pmd)
+{
+	pmd_val(pmd) &= ~L_PMD_S2_RDONLY;
+	return pmd;
+}
+
 static inline void kvm_set_s2pte_readonly(pte_t *pte)
 {
 	pte_val(*pte) = (pte_val(*pte) & ~L_PTE_S2_RDWR) | L_PTE_S2_RDONLY;
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 53103607065a..f46ea12e55c2 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -238,6 +238,24 @@ static inline pud_t kvm_s2pud_mkexec(pud_t pud)
 	return pud;
 }
 
+static inline pte_t kvm_s2pte_revoke_read(pte_t pte)
+{
+	pte_val(pte) &= ~PTE_S2_RDONLY;
+	return pte;
+}
+
+static inline pmd_t kvm_s2pmd_revoke_read(pmd_t pmd)
+{
+	pmd_val(pmd) &= ~PMD_S2_RDONLY;
+	return pmd;
+}
+
+static inline pud_t kvm_s2pud_revoke_read(pud_t pud)
+{
+	pud_val(pud) &= ~PUD_S2_RDONLY;
+	return pud;
+}
+
 static inline void kvm_set_s2pte_readonly(pte_t *ptep)
 {
 	pteval_t old_pteval, pteval;
diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
index 3b415bc76ced..76777b1b9419 100644
--- a/arch/arm64/include/asm/kvm_nested.h
+++ b/arch/arm64/include/asm/kvm_nested.h
@@ -42,6 +42,16 @@ static inline u32 kvm_s2_trans_esr(struct kvm_s2_trans *trans)
 	return trans->esr;
 }
 
+static inline bool kvm_s2_trans_readable(struct kvm_s2_trans *trans)
+{
+	return trans->readable;
+}
+
+static inline bool kvm_s2_trans_writable(struct kvm_s2_trans *trans)
+{
+	return trans->writable;
+}
+
 extern int kvm_walk_nested_s2(struct kvm_vcpu *vcpu, phys_addr_t gipa,
 			      struct kvm_s2_trans *result);
 
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index 0ea79e543b29..bc8cb529647b 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -1729,7 +1729,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 			  unsigned long hva, unsigned long fault_status)
 {
 	int ret;
-	bool write_fault, writable;
+	bool write_fault, writable, readable = true;
 	bool exec_fault, needs_exec;
 	unsigned long mmu_seq;
 	phys_addr_t ipa = fault_ipa;
@@ -1842,6 +1842,16 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 			writable = false;
 	}
 
+	/*
+	 * Potentially reduce shadow S2 permissions to match the guest's own
+	 * S2. For exec faults, we'd only reach this point if the guest
+	 * actually allowed it (see kvm_s2_handle_perm_fault).
+	 */
+	if (kvm_is_shadow_s2_fault(vcpu)) {
+		writable &= kvm_s2_trans_writable(nested);
+		readable &= kvm_s2_trans_readable(nested);
+	}
+
 	spin_lock(&kvm->mmu_lock);
 	if (mmu_notifier_retry(kvm, mmu_seq))
 		goto out_unlock;
@@ -1887,6 +1897,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		if (writable)
 			new_pud = kvm_s2pud_mkwrite(new_pud);
 
+		if (!readable)
+			new_pud = kvm_s2pud_revoke_read(new_pud);
+
 		if (needs_exec)
 			new_pud = kvm_s2pud_mkexec(new_pud);
 
@@ -1899,6 +1912,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		if (writable)
 			new_pmd = kvm_s2pmd_mkwrite(new_pmd);
 
+		if (!readable)
+			new_pmd = kvm_s2pmd_revoke_read(new_pmd);
+
 		if (needs_exec)
 			new_pmd = kvm_s2pmd_mkexec(new_pmd);
 
@@ -1911,6 +1927,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 			mark_page_dirty(kvm, gfn);
 		}
 
+		if (!readable)
+			new_pte = kvm_s2pte_revoke_read(new_pte);
+
 		if (needs_exec)
 			new_pte = kvm_s2pte_mkexec(new_pte);
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 176+ messages in thread

* [PATCH 59/59] arm64: KVM: nv: Allow userspace to request KVM_ARM_VCPU_NESTED_VIRT
  2019-06-21  9:37 [PATCH 00/59] KVM: arm64: ARMv8.3 Nested Virtualization support Marc Zyngier
                   ` (57 preceding siblings ...)
  2019-06-21  9:38 ` [PATCH 58/59] arm64: KVM: nv: Restrict S2 RD/WR permissions to match the guest's Marc Zyngier
@ 2019-06-21  9:38 ` Marc Zyngier
       [not found] ` <CANW9uyssDm_0ysC_pnvhHRrnsmFZik+3_ENmFz7L2GCmtH09fw@mail.gmail.com>
  2019-08-02 10:11 ` Alexandru Elisei
  60 siblings, 0 replies; 176+ messages in thread
From: Marc Zyngier @ 2019-06-21  9:38 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	Julien Thierry, James Morse, Suzuki K Poulose

Since we're (almost) feature complete, let's allow userspace to
request KVM_ARM_VCPU_NESTED_VIRT by bumping KVM_VCPU_MAX_FEATURES up.

It's going to be great...
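
For reference, userspace would request the feature along these lines
(a hypothetical sketch; KVM_ARM_VCPU_NESTED_VIRT is the feature bit
introduced earlier in this series):

  #include <stdint.h>
  #include <string.h>
  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  /* Hypothetical: ask for nested virt when initializing a vCPU. */
  static int vcpu_init_nested(int vcpu_fd, uint32_t target)
  {
  	struct kvm_vcpu_init init;

  	memset(&init, 0, sizeof(init));
  	init.target = target;	/* e.g. from KVM_ARM_PREFERRED_TARGET */
  	init.features[0] |= 1U << KVM_ARM_VCPU_NESTED_VIRT;

  	return ioctl(vcpu_fd, KVM_ARM_VCPU_INIT, &init);
  }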

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm64/include/asm/kvm_host.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index e2e44cc650bf..a29e3ab72eec 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -49,7 +49,7 @@
 
 #define KVM_MAX_VCPUS VGIC_V3_MAX_CPUS
 
-#define KVM_VCPU_MAX_FEATURES 7
+#define KVM_VCPU_MAX_FEATURES 8
 
 #define KVM_REQ_SLEEP \
 	KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
-- 
2.20.1
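
For completeness: once this feature bit exists, userspace requests it
through the usual vcpu init path. A sketch of what kvmtool's --nested
option would boil down to (assuming vm_fd and vcpu_fd are already set
up; the feature bit value is the one introduced by this series):

	#include <linux/kvm.h>
	#include <sys/ioctl.h>

	#ifndef KVM_ARM_VCPU_NESTED_VIRT
	#define KVM_ARM_VCPU_NESTED_VIRT 7	/* from this series */
	#endif

	static int vcpu_init_nested(int vm_fd, int vcpu_fd)
	{
		struct kvm_vcpu_init init;

		/* Ask KVM for the preferred target for this host... */
		if (ioctl(vm_fd, KVM_ARM_PREFERRED_TARGET, &init) < 0)
			return -1;

		/* ...and opt in to the nested virt feature. */
		init.features[0] |= 1U << KVM_ARM_VCPU_NESTED_VIRT;
		return ioctl(vcpu_fd, KVM_ARM_VCPU_INIT, &init);
	}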


^ permalink raw reply related	[flat|nested] 176+ messages in thread

* Re: [PATCH 00/59] KVM: arm64: ARMv8.3 Nested Virtualization support
       [not found] ` <CANW9uyssDm_0ysC_pnvhHRrnsmFZik+3_ENmFz7L2GCmtH09fw@mail.gmail.com>
@ 2019-06-21 11:21   ` Marc Zyngier
  0 siblings, 0 replies; 176+ messages in thread
From: Marc Zyngier @ 2019-06-21 11:21 UTC (permalink / raw)
  To: Itaru Kitayama
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, James Morse,
	Jintack Lim, Julien Thierry, Suzuki K Poulose, kvm, kvmarm,
	linux-arm-kernel

On 21/06/2019 10:57, Itaru Kitayama wrote:
> Marc,
> Is the only possible way to test this series to run it on the
> Fast Model?

Unless you have some ARMv8.3-capable hardware lying around, yes.
Note that the Foundation model is also a good option.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH 03/59] arm64: Add ARM64_HAS_NESTED_VIRT cpufeature
  2019-06-21  9:37 ` [PATCH 03/59] arm64: Add ARM64_HAS_NESTED_VIRT cpufeature Marc Zyngier
@ 2019-06-21 13:08   ` Julien Thierry
  2019-06-21 13:22     ` Marc Zyngier
  2019-06-21 13:44   ` Suzuki K Poulose
  2019-06-24 11:24   ` Dave Martin
  2 siblings, 1 reply; 176+ messages in thread
From: Julien Thierry @ 2019-06-21 13:08 UTC (permalink / raw)
  To: Marc Zyngier, linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	James Morse, Suzuki K Poulose



On 21/06/2019 10:37, Marc Zyngier wrote:
> From: Jintack Lim <jintack.lim@linaro.org>
> 
> Add a new ARM64_HAS_NESTED_VIRT feature to indicate that the
> CPU has the ARMv8.3 nested virtualization capability.
> 
> This will be used to support nested virtualization in KVM.
> 
> Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  .../admin-guide/kernel-parameters.txt         |  4 +++
>  arch/arm64/include/asm/cpucaps.h              |  3 ++-
>  arch/arm64/include/asm/sysreg.h               |  1 +
>  arch/arm64/kernel/cpufeature.c                | 26 +++++++++++++++++++
>  4 files changed, 33 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 138f6664b2e2..202bb2115d83 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -2046,6 +2046,10 @@
>  			[KVM,ARM] Allow use of GICv4 for direct injection of
>  			LPIs.
>  
> +	kvm-arm.nested=
> +			[KVM,ARM] Allow nested virtualization in KVM/ARM.
> +			Default is 0 (disabled)
> +

Once the kernel has been built with nested guest support, what do we
gain from having it disabled by default?

It seems a bit odd, since guests have to opt in to the capability of
running guests of their own.

Is it likely to have a negative impact on the host kernel? Or on guests
that do not request use of nested virt?

If not I feel that this kernel parameter should be dropped.

Cheers,

-- 
Julien Thierry

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH 04/59] KVM: arm64: nv: Introduce nested virtualization VCPU feature
  2019-06-21  9:37 ` [PATCH 04/59] KVM: arm64: nv: Introduce nested virtualization VCPU feature Marc Zyngier
@ 2019-06-21 13:08   ` Julien Thierry
  2019-06-24 11:28   ` Dave Martin
  2019-06-24 11:43   ` Dave Martin
  2 siblings, 0 replies; 176+ messages in thread
From: Julien Thierry @ 2019-06-21 13:08 UTC (permalink / raw)
  To: Marc Zyngier, linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	James Morse, Suzuki K Poulose



On 21/06/2019 10:37, Marc Zyngier wrote:
> From: Christoffer Dall <christoffer.dall@arm.com>
> 
> Introduce the feature bit and a primitive that checks if the feature is
> set behind a static key check based on the cpus_have_const_cap check.
> 
> Checking nested_virt_in_use() on systems without nested virt enabled
> should have negligible overhead.
> 
> We don't yet allow userspace to actually set this feature.
> 
> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  arch/arm/include/asm/kvm_nested.h   |  9 +++++++++
>  arch/arm64/include/asm/kvm_nested.h | 13 +++++++++++++
>  arch/arm64/include/uapi/asm/kvm.h   |  1 +
>  3 files changed, 23 insertions(+)
>  create mode 100644 arch/arm/include/asm/kvm_nested.h
>  create mode 100644 arch/arm64/include/asm/kvm_nested.h
> 
> diff --git a/arch/arm/include/asm/kvm_nested.h b/arch/arm/include/asm/kvm_nested.h
> new file mode 100644
> index 000000000000..124ff6445f8f
> --- /dev/null
> +++ b/arch/arm/include/asm/kvm_nested.h
> @@ -0,0 +1,9 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef __ARM_KVM_NESTED_H
> +#define __ARM_KVM_NESTED_H
> +
> +#include <linux/kvm_host.h>
> +
> +static inline bool nested_virt_in_use(const struct kvm_vcpu *vcpu) { return false; }
> +
> +#endif /* __ARM_KVM_NESTED_H */
> diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
> new file mode 100644
> index 000000000000..8a3d121a0b42
> --- /dev/null
> +++ b/arch/arm64/include/asm/kvm_nested.h
> @@ -0,0 +1,13 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef __ARM64_KVM_NESTED_H
> +#define __ARM64_KVM_NESTED_H
> +
> +#include <linux/kvm_host.h>
> +
> +static inline bool nested_virt_in_use(const struct kvm_vcpu *vcpu)
> +{
> +	return cpus_have_const_cap(ARM64_HAS_NESTED_VIRT) &&
> +		test_bit(KVM_ARM_VCPU_NESTED_VIRT, vcpu->arch.features);

Nit: You could make it even cheaper for some systems by adding
IS_ENABLED(CONFIG_ARM64_VHE). It would also make the dependency between
NV and VHE more explicit.
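
A sketch of that variant (IS_ENABLED() being the generic kconfig
helper; otherwise identical to the code quoted above):

	static inline bool nested_virt_in_use(const struct kvm_vcpu *vcpu)
	{
		return IS_ENABLED(CONFIG_ARM64_VHE) &&
			cpus_have_const_cap(ARM64_HAS_NESTED_VIRT) &&
			test_bit(KVM_ARM_VCPU_NESTED_VIRT, vcpu->arch.features);
	}

On !VHE configurations this folds to a compile-time false, so the NV
paths can be discarded entirely by the compiler.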

Otherwise:

Reviewed-by: Julien Thierry <julien.thierry@arm.com>

Cheers,

-- 
Julien Thierry

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH 03/59] arm64: Add ARM64_HAS_NESTED_VIRT cpufeature
  2019-06-21 13:08   ` Julien Thierry
@ 2019-06-21 13:22     ` Marc Zyngier
  0 siblings, 0 replies; 176+ messages in thread
From: Marc Zyngier @ 2019-06-21 13:22 UTC (permalink / raw)
  To: Julien Thierry, linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	James Morse, Suzuki K Poulose

On 21/06/2019 14:08, Julien Thierry wrote:
> 
> 
> On 21/06/2019 10:37, Marc Zyngier wrote:
>> From: Jintack Lim <jintack.lim@linaro.org>
>>
>> Add a new ARM64_HAS_NESTED_VIRT feature to indicate that the
>> CPU has the ARMv8.3 nested virtualization capability.
>>
>> This will be used to support nested virtualization in KVM.
>>
>> Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
>> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
>> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>> ---
>>  .../admin-guide/kernel-parameters.txt         |  4 +++
>>  arch/arm64/include/asm/cpucaps.h              |  3 ++-
>>  arch/arm64/include/asm/sysreg.h               |  1 +
>>  arch/arm64/kernel/cpufeature.c                | 26 +++++++++++++++++++
>>  4 files changed, 33 insertions(+), 1 deletion(-)
>>
>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
>> index 138f6664b2e2..202bb2115d83 100644
>> --- a/Documentation/admin-guide/kernel-parameters.txt
>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>> @@ -2046,6 +2046,10 @@
>>  			[KVM,ARM] Allow use of GICv4 for direct injection of
>>  			LPIs.
>>  
>> +	kvm-arm.nested=
>> +			[KVM,ARM] Allow nested virtualization in KVM/ARM.
>> +			Default is 0 (disabled)
>> +
> 
> Once the kernel has been built with nested guest support, what do we
> gain from having it disabled by default?

We have a bunch of fast paths almost everywhere when NV isn't enabled.
It makes a real difference at the moment.

> It seems a bit odd, since guests have to opt in to the capability of
> running guests of their own.
> 
> Is it likely to have a negative impact on the host kernel? Or on guests
> that do not request use of nested virt?
> 
> If not I feel that this kernel parameter should be dropped.

It really does. Speed is one concern, and security is another. NV adds
all kinds of new paths and complexity. Having a central knob to control
it, and having it OFF by default, helps me sleep at night...

This is also what x86 had for multiple years until it was deemed safe
enough to be on by default.
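
For reference, a knob like this is typically wired up along the
following lines (a sketch only, not necessarily the code in patch
03/59; strtobool() and early_param() are the standard kernel helpers):

	#include <linux/init.h>
	#include <linux/string.h>

	static bool nested;	/* default off */

	static int __init early_kvm_nested_cfg(char *buf)
	{
		return strtobool(buf, &nested);
	}
	early_param("kvm-arm.nested", early_kvm_nested_cfg);

The cpufeature code can then refuse to report ARM64_HAS_NESTED_VIRT
unless both the hardware and this knob agree.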

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH 06/59] KVM: arm64: nv: Allow userspace to set PSR_MODE_EL2x
  2019-06-21  9:37 ` [PATCH 06/59] KVM: arm64: nv: Allow userspace to set PSR_MODE_EL2x Marc Zyngier
@ 2019-06-21 13:24   ` Julien Thierry
  2019-06-21 13:50     ` Marc Zyngier
  0 siblings, 1 reply; 176+ messages in thread
From: Julien Thierry @ 2019-06-21 13:24 UTC (permalink / raw)
  To: Marc Zyngier, linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	James Morse, Suzuki K Poulose



On 21/06/2019 10:37, Marc Zyngier wrote:
> From: Christoffer Dall <christoffer.dall@linaro.org>
> 
> We were not allowing userspace to set a more privileged mode for the VCPU
> than EL1, but we should allow this when nested virtualization is enabled
> for the VCPU.
> 
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  arch/arm64/kvm/guest.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
> index 3ae2f82fca46..4c35b5d51e21 100644
> --- a/arch/arm64/kvm/guest.c
> +++ b/arch/arm64/kvm/guest.c
> @@ -37,6 +37,7 @@
>  #include <asm/kvm_emulate.h>
>  #include <asm/kvm_coproc.h>
>  #include <asm/kvm_host.h>
> +#include <asm/kvm_nested.h>
>  #include <asm/sigcontext.h>
>  
>  #include "trace.h"
> @@ -194,6 +195,11 @@ static int set_core_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
>  			if (vcpu_el1_is_32bit(vcpu))
>  				return -EINVAL;
>  			break;
> +		case PSR_MODE_EL2h:
> +		case PSR_MODE_EL2t:
> +			if (vcpu_el1_is_32bit(vcpu) || !nested_virt_in_use(vcpu))

This condition reads a bit weirdly. Why do we care about anything other
than !nested_virt_in_use()?

If nested virt is not in use then obviously we return the error.

If nested virt is in use then why do we care about EL1? Or should this
test read as "highest_el_is_32bit"?

Thanks,

-- 
Julien Thierry

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH 03/59] arm64: Add ARM64_HAS_NESTED_VIRT cpufeature
  2019-06-21  9:37 ` [PATCH 03/59] arm64: Add ARM64_HAS_NESTED_VIRT cpufeature Marc Zyngier
  2019-06-21 13:08   ` Julien Thierry
@ 2019-06-21 13:44   ` Suzuki K Poulose
  2019-06-24 11:24   ` Dave Martin
  2 siblings, 0 replies; 176+ messages in thread
From: Suzuki K Poulose @ 2019-06-21 13:44 UTC (permalink / raw)
  To: marc.zyngier, linux-arm-kernel, kvmarm, kvm
  Cc: andre.przywara, christoffer.dall, dave.martin, jintack,
	julien.thierry, james.morse

On 06/21/2019 10:37 AM, Marc Zyngier wrote:
> From: Jintack Lim <jintack.lim@linaro.org>
> 
> Add a new ARM64_HAS_NESTED_VIRT feature to indicate that the
> CPU has the ARMv8.3 nested virtualization capability.
> 
> This will be used to support nested virtualization in KVM.
> 
> Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>


Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH 06/59] KVM: arm64: nv: Allow userspace to set PSR_MODE_EL2x
  2019-06-21 13:24   ` Julien Thierry
@ 2019-06-21 13:50     ` Marc Zyngier
  2019-06-24 12:48       ` Dave Martin
  0 siblings, 1 reply; 176+ messages in thread
From: Marc Zyngier @ 2019-06-21 13:50 UTC (permalink / raw)
  To: Julien Thierry, linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	James Morse, Suzuki K Poulose

On 21/06/2019 14:24, Julien Thierry wrote:
> 
> 
> On 21/06/2019 10:37, Marc Zyngier wrote:
>> From: Christoffer Dall <christoffer.dall@linaro.org>
>>
>> We were not allowing userspace to set a more privileged mode for the VCPU
>> than EL1, but we should allow this when nested virtualization is enabled
>> for the VCPU.
>>
>> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>> ---
>>  arch/arm64/kvm/guest.c | 6 ++++++
>>  1 file changed, 6 insertions(+)
>>
>> diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
>> index 3ae2f82fca46..4c35b5d51e21 100644
>> --- a/arch/arm64/kvm/guest.c
>> +++ b/arch/arm64/kvm/guest.c
>> @@ -37,6 +37,7 @@
>>  #include <asm/kvm_emulate.h>
>>  #include <asm/kvm_coproc.h>
>>  #include <asm/kvm_host.h>
>> +#include <asm/kvm_nested.h>
>>  #include <asm/sigcontext.h>
>>  
>>  #include "trace.h"
>> @@ -194,6 +195,11 @@ static int set_core_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
>>  			if (vcpu_el1_is_32bit(vcpu))
>>  				return -EINVAL;
>>  			break;
>> +		case PSR_MODE_EL2h:
>> +		case PSR_MODE_EL2t:
>> +			if (vcpu_el1_is_32bit(vcpu) || !nested_virt_in_use(vcpu))
> 
> This condition reads a bit weirdly. Why do we care about anything other
> than !nested_virt_in_use()?
> 
> If nested virt is not in use then obviously we return the error.
> 
> If nested virt is in use then why do we care about EL1? Or should this
> test read as "highest_el_is_32bit"?

There are multiple things at play here:

- MODE_EL2x is not a valid 32bit mode
- The architecture forbids nested virt with 32bit EL2

The code above is a simplification of these two conditions. But we can
certainly do a bit better, as kvm_reset_vcpu() doesn't really check that
we don't create a vcpu with both 32bit+NV. These two bits should really
be exclusive.
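
Something along these lines at vcpu init time would enforce that
exclusivity (a sketch; the placement is hypothetical, but the feature
bit names are the ones used by this series):

	/* A 32bit EL1 and a virtual EL2 cannot coexist on the same vcpu */
	if (test_bit(KVM_ARM_VCPU_EL1_32BIT, vcpu->arch.features) &&
	    test_bit(KVM_ARM_VCPU_NESTED_VIRT, vcpu->arch.features))
		return -EINVAL;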

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH 05/59] KVM: arm64: nv: Reset VCPU to EL2 registers if VCPU nested virt is set
  2019-06-21  9:37 ` [PATCH 05/59] KVM: arm64: nv: Reset VCPU to EL2 registers if VCPU nested virt is set Marc Zyngier
@ 2019-06-24 10:19   ` Suzuki K Poulose
  2019-06-24 11:38   ` Dave Martin
  1 sibling, 0 replies; 176+ messages in thread
From: Suzuki K Poulose @ 2019-06-24 10:19 UTC (permalink / raw)
  To: marc.zyngier, linux-arm-kernel, kvmarm, kvm
  Cc: andre.przywara, christoffer.dall, dave.martin, jintack,
	julien.thierry, james.morse



On 21/06/2019 10:37, Marc Zyngier wrote:
> From: Christoffer Dall <christoffer.dall@arm.com>
> 
> Reset the VCPU with PSTATE.M = EL2h when the nested virtualization
> feature is enabled on the VCPU.
> 
> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>   arch/arm64/kvm/reset.c | 7 +++++++
>   1 file changed, 7 insertions(+)
> 
> diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
> index 1140b4485575..675ca07dbb05 100644
> --- a/arch/arm64/kvm/reset.c
> +++ b/arch/arm64/kvm/reset.c
> @@ -52,6 +52,11 @@ static const struct kvm_regs default_regs_reset = {
>   			PSR_F_BIT | PSR_D_BIT),
>   };
>   
> +static const struct kvm_regs default_regs_reset_el2 = {
> +	.regs.pstate = (PSR_MODE_EL2h | PSR_A_BIT | PSR_I_BIT |
> +			PSR_F_BIT | PSR_D_BIT),
> +};
> +
>   static const struct kvm_regs default_regs_reset32 = {
>   	.regs.pstate = (PSR_AA32_MODE_SVC | PSR_AA32_A_BIT |
>   			PSR_AA32_I_BIT | PSR_AA32_F_BIT),
> @@ -302,6 +307,8 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
>   			if (!cpu_has_32bit_el1())
>   				goto out;
>   			cpu_reset = &default_regs_reset32;
> +		} else if (test_bit(KVM_ARM_VCPU_NESTED_VIRT, vcpu->arch.features)) {

minor nit: "else if nested_virt_in_use(vcpu)"?

Either way:

Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH 01/59] KVM: arm64: Migrate _elx sysreg accessors to msr_s/mrs_s
  2019-06-21  9:37 ` [PATCH 01/59] KVM: arm64: Migrate _elx sysreg accessors to msr_s/mrs_s Marc Zyngier
@ 2019-06-24 11:16   ` Dave Martin
  2019-06-24 12:59   ` Alexandru Elisei
  1 sibling, 0 replies; 176+ messages in thread
From: Dave Martin @ 2019-06-24 11:16 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, Julien Thierry, Andre Przywara,
	Suzuki K Poulose, Christoffer Dall, James Morse, Jintack Lim

On Fri, Jun 21, 2019 at 10:37:45AM +0100, Marc Zyngier wrote:
> From: Dave Martin <Dave.Martin@arm.com>
> 
> Currently, the {read,write}_sysreg_el*() accessors for accessing
> particular ELs' sysregs in the presence of VHE rely on some local
> hacks and define their system register encodings in a way that is
> inconsistent with the core definitions in <asm/sysreg.h>.
> 
> As a result, it is necessary to add duplicate definitions for any
> system register that already needs a definition in sysreg.h for
> other reasons.
> 
> This is a bit of a maintenance headache, and the reasons for the
> _el*() accessors working the way they do are a bit historical.
> 
> This patch gets rid of the shadow sysreg definitions in
> <asm/kvm_hyp.h>, converts the _el*() accessors to use the core
> __msr_s/__mrs_s interface, and converts all call sites to use the
> standard sysreg #define names (i.e., upper case, with SYS_ prefix).
> 
> This patch will conflict heavily anyway, so the opportunity taken
> to clean up some bad whitespace in the context of the changes is
> taken.

FWIW, "opportunity taken ... is taken".

Anyway, Ack, thanks to you and Sudeep for keeping this alive.

Cheers
---Dave

> The change exposes a few system registers that have no sysreg.h
> definition, due to msr_s/mrs_s being used in place of msr/mrs:
> additions are made in order to fill in the gaps.
> 
> Signed-off-by: Dave Martin <Dave.Martin@arm.com>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Christoffer Dall <christoffer.dall@arm.com>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Will Deacon <will.deacon@arm.com>
> Link: https://www.spinics.net/lists/kvm-arm/msg31717.html
> [Rebased to v4.21-rc1]
> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
> [Rebased to v5.2-rc5, changelog updates]
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  arch/arm/include/asm/kvm_hyp.h           | 13 ++--
>  arch/arm64/include/asm/kvm_emulate.h     | 16 ++---
>  arch/arm64/include/asm/kvm_hyp.h         | 50 ++-------------
>  arch/arm64/include/asm/sysreg.h          | 35 ++++++++++-
>  arch/arm64/kvm/hyp/switch.c              | 14 ++---
>  arch/arm64/kvm/hyp/sysreg-sr.c           | 78 ++++++++++++------------
>  arch/arm64/kvm/hyp/tlb.c                 | 12 ++--
>  arch/arm64/kvm/hyp/vgic-v2-cpuif-proxy.c |  2 +-
>  arch/arm64/kvm/regmap.c                  |  4 +-
>  arch/arm64/kvm/sys_regs.c                | 56 ++++++++---------
>  virt/kvm/arm/arch_timer.c                | 24 ++++----
>  11 files changed, 148 insertions(+), 156 deletions(-)
> 
> diff --git a/arch/arm/include/asm/kvm_hyp.h b/arch/arm/include/asm/kvm_hyp.h
> index 87bcd18df8d5..059224fb14db 100644
> --- a/arch/arm/include/asm/kvm_hyp.h
> +++ b/arch/arm/include/asm/kvm_hyp.h
> @@ -93,13 +93,14 @@
>  #define VFP_FPEXC	__ACCESS_VFP(FPEXC)
>  
>  /* AArch64 compatibility macros, only for the timer so far */
> -#define read_sysreg_el0(r)		read_sysreg(r##_el0)
> -#define write_sysreg_el0(v, r)		write_sysreg(v, r##_el0)
> +#define read_sysreg_el0(r)		read_sysreg(r##_EL0)
> +#define write_sysreg_el0(v, r)		write_sysreg(v, r##_EL0)
> +
> +#define SYS_CNTP_CTL_EL0		CNTP_CTL
> +#define SYS_CNTP_CVAL_EL0		CNTP_CVAL
> +#define SYS_CNTV_CTL_EL0		CNTV_CTL
> +#define SYS_CNTV_CVAL_EL0		CNTV_CVAL
>  
> -#define cntp_ctl_el0			CNTP_CTL
> -#define cntp_cval_el0			CNTP_CVAL
> -#define cntv_ctl_el0			CNTV_CTL
> -#define cntv_cval_el0			CNTV_CVAL
>  #define cntvoff_el2			CNTVOFF
>  #define cnthctl_el2			CNTHCTL
>  
> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> index 613427fafff9..39ffe41855bc 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -137,7 +137,7 @@ static inline unsigned long *__vcpu_elr_el1(const struct kvm_vcpu *vcpu)
>  static inline unsigned long vcpu_read_elr_el1(const struct kvm_vcpu *vcpu)
>  {
>  	if (vcpu->arch.sysregs_loaded_on_cpu)
> -		return read_sysreg_el1(elr);
> +		return read_sysreg_el1(SYS_ELR);
>  	else
>  		return *__vcpu_elr_el1(vcpu);
>  }
> @@ -145,7 +145,7 @@ static inline unsigned long vcpu_read_elr_el1(const struct kvm_vcpu *vcpu)
>  static inline void vcpu_write_elr_el1(const struct kvm_vcpu *vcpu, unsigned long v)
>  {
>  	if (vcpu->arch.sysregs_loaded_on_cpu)
> -		write_sysreg_el1(v, elr);
> +		write_sysreg_el1(v, SYS_ELR);
>  	else
>  		*__vcpu_elr_el1(vcpu) = v;
>  }
> @@ -197,7 +197,7 @@ static inline unsigned long vcpu_read_spsr(const struct kvm_vcpu *vcpu)
>  		return vcpu_read_spsr32(vcpu);
>  
>  	if (vcpu->arch.sysregs_loaded_on_cpu)
> -		return read_sysreg_el1(spsr);
> +		return read_sysreg_el1(SYS_SPSR);
>  	else
>  		return vcpu_gp_regs(vcpu)->spsr[KVM_SPSR_EL1];
>  }
> @@ -210,7 +210,7 @@ static inline void vcpu_write_spsr(struct kvm_vcpu *vcpu, unsigned long v)
>  	}
>  
>  	if (vcpu->arch.sysregs_loaded_on_cpu)
> -		write_sysreg_el1(v, spsr);
> +		write_sysreg_el1(v, SYS_SPSR);
>  	else
>  		vcpu_gp_regs(vcpu)->spsr[KVM_SPSR_EL1] = v;
>  }
> @@ -462,13 +462,13 @@ static inline void kvm_skip_instr(struct kvm_vcpu *vcpu, bool is_wide_instr)
>   */
>  static inline void __hyp_text __kvm_skip_instr(struct kvm_vcpu *vcpu)
>  {
> -	*vcpu_pc(vcpu) = read_sysreg_el2(elr);
> -	vcpu->arch.ctxt.gp_regs.regs.pstate = read_sysreg_el2(spsr);
> +	*vcpu_pc(vcpu) = read_sysreg_el2(SYS_ELR);
> +	vcpu->arch.ctxt.gp_regs.regs.pstate = read_sysreg_el2(SYS_SPSR);
>  
>  	kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
>  
> -	write_sysreg_el2(vcpu->arch.ctxt.gp_regs.regs.pstate, spsr);
> -	write_sysreg_el2(*vcpu_pc(vcpu), elr);
> +	write_sysreg_el2(vcpu->arch.ctxt.gp_regs.regs.pstate, SYS_SPSR);
> +	write_sysreg_el2(*vcpu_pc(vcpu), SYS_ELR);
>  }
>  
>  #endif /* __ARM64_KVM_EMULATE_H__ */
> diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
> index 09fe8bd15f6e..ce99c2daff04 100644
> --- a/arch/arm64/include/asm/kvm_hyp.h
> +++ b/arch/arm64/include/asm/kvm_hyp.h
> @@ -29,7 +29,7 @@
>  #define read_sysreg_elx(r,nvh,vh)					\
>  	({								\
>  		u64 reg;						\
> -		asm volatile(ALTERNATIVE("mrs %0, " __stringify(r##nvh),\
> +		asm volatile(ALTERNATIVE(__mrs_s("%0", r##nvh),	\
>  					 __mrs_s("%0", r##vh),		\
>  					 ARM64_HAS_VIRT_HOST_EXTN)	\
>  			     : "=r" (reg));				\
> @@ -39,7 +39,7 @@
>  #define write_sysreg_elx(v,r,nvh,vh)					\
>  	do {								\
>  		u64 __val = (u64)(v);					\
> -		asm volatile(ALTERNATIVE("msr " __stringify(r##nvh) ", %x0",\
> +		asm volatile(ALTERNATIVE(__msr_s(r##nvh, "%x0"),	\
>  					 __msr_s(r##vh, "%x0"),		\
>  					 ARM64_HAS_VIRT_HOST_EXTN)	\
>  					 : : "rZ" (__val));		\
> @@ -48,55 +48,15 @@
>  /*
>   * Unified accessors for registers that have a different encoding
>   * between VHE and non-VHE. They must be specified without their "ELx"
> - * encoding.
> + * encoding, but with the SYS_ prefix, as defined in asm/sysreg.h.
>   */
> -#define read_sysreg_el2(r)						\
> -	({								\
> -		u64 reg;						\
> -		asm volatile(ALTERNATIVE("mrs %0, " __stringify(r##_EL2),\
> -					 "mrs %0, " __stringify(r##_EL1),\
> -					 ARM64_HAS_VIRT_HOST_EXTN)	\
> -			     : "=r" (reg));				\
> -		reg;							\
> -	})
> -
> -#define write_sysreg_el2(v,r)						\
> -	do {								\
> -		u64 __val = (u64)(v);					\
> -		asm volatile(ALTERNATIVE("msr " __stringify(r##_EL2) ", %x0",\
> -					 "msr " __stringify(r##_EL1) ", %x0",\
> -					 ARM64_HAS_VIRT_HOST_EXTN)	\
> -					 : : "rZ" (__val));		\
> -	} while (0)
>  
>  #define read_sysreg_el0(r)	read_sysreg_elx(r, _EL0, _EL02)
>  #define write_sysreg_el0(v,r)	write_sysreg_elx(v, r, _EL0, _EL02)
>  #define read_sysreg_el1(r)	read_sysreg_elx(r, _EL1, _EL12)
>  #define write_sysreg_el1(v,r)	write_sysreg_elx(v, r, _EL1, _EL12)
> -
> -/* The VHE specific system registers and their encoding */
> -#define sctlr_EL12              sys_reg(3, 5, 1, 0, 0)
> -#define cpacr_EL12              sys_reg(3, 5, 1, 0, 2)
> -#define ttbr0_EL12              sys_reg(3, 5, 2, 0, 0)
> -#define ttbr1_EL12              sys_reg(3, 5, 2, 0, 1)
> -#define tcr_EL12                sys_reg(3, 5, 2, 0, 2)
> -#define afsr0_EL12              sys_reg(3, 5, 5, 1, 0)
> -#define afsr1_EL12              sys_reg(3, 5, 5, 1, 1)
> -#define esr_EL12                sys_reg(3, 5, 5, 2, 0)
> -#define far_EL12                sys_reg(3, 5, 6, 0, 0)
> -#define mair_EL12               sys_reg(3, 5, 10, 2, 0)
> -#define amair_EL12              sys_reg(3, 5, 10, 3, 0)
> -#define vbar_EL12               sys_reg(3, 5, 12, 0, 0)
> -#define contextidr_EL12         sys_reg(3, 5, 13, 0, 1)
> -#define cntkctl_EL12            sys_reg(3, 5, 14, 1, 0)
> -#define cntp_tval_EL02          sys_reg(3, 5, 14, 2, 0)
> -#define cntp_ctl_EL02           sys_reg(3, 5, 14, 2, 1)
> -#define cntp_cval_EL02          sys_reg(3, 5, 14, 2, 2)
> -#define cntv_tval_EL02          sys_reg(3, 5, 14, 3, 0)
> -#define cntv_ctl_EL02           sys_reg(3, 5, 14, 3, 1)
> -#define cntv_cval_EL02          sys_reg(3, 5, 14, 3, 2)
> -#define spsr_EL12               sys_reg(3, 5, 4, 0, 0)
> -#define elr_EL12                sys_reg(3, 5, 4, 0, 1)
> +#define read_sysreg_el2(r)	read_sysreg_elx(r, _EL2, _EL1)
> +#define write_sysreg_el2(v,r)	write_sysreg_elx(v, r, _EL2, _EL1)
>  
>  /**
>   * hyp_alternate_select - Generates patchable code sequences that are
> diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
> index 902d75b60914..434cf53d527b 100644
> --- a/arch/arm64/include/asm/sysreg.h
> +++ b/arch/arm64/include/asm/sysreg.h
> @@ -202,6 +202,9 @@
>  #define SYS_APGAKEYLO_EL1		sys_reg(3, 0, 2, 3, 0)
>  #define SYS_APGAKEYHI_EL1		sys_reg(3, 0, 2, 3, 1)
>  
> +#define SYS_SPSR_EL1			sys_reg(3, 0, 4, 0, 0)
> +#define SYS_ELR_EL1			sys_reg(3, 0, 4, 0, 1)
> +
>  #define SYS_ICC_PMR_EL1			sys_reg(3, 0, 4, 6, 0)
>  
>  #define SYS_AFSR0_EL1			sys_reg(3, 0, 5, 1, 0)
> @@ -393,6 +396,9 @@
>  #define SYS_CNTP_CTL_EL0		sys_reg(3, 3, 14, 2, 1)
>  #define SYS_CNTP_CVAL_EL0		sys_reg(3, 3, 14, 2, 2)
>  
> +#define SYS_CNTV_CTL_EL0		sys_reg(3, 3, 14, 3, 1)
> +#define SYS_CNTV_CVAL_EL0		sys_reg(3, 3, 14, 3, 2)
> +
>  #define SYS_AARCH32_CNTP_TVAL		sys_reg(0, 0, 14, 2, 0)
>  #define SYS_AARCH32_CNTP_CTL		sys_reg(0, 0, 14, 2, 1)
>  #define SYS_AARCH32_CNTP_CVAL		sys_reg(0, 2, 0, 14, 0)
> @@ -403,14 +409,17 @@
>  #define __TYPER_CRm(n)			(0xc | (((n) >> 3) & 0x3))
>  #define SYS_PMEVTYPERn_EL0(n)		sys_reg(3, 3, 14, __TYPER_CRm(n), __PMEV_op2(n))
>  
> -#define SYS_PMCCFILTR_EL0		sys_reg (3, 3, 14, 15, 7)
> +#define SYS_PMCCFILTR_EL0		sys_reg(3, 3, 14, 15, 7)
>  
>  #define SYS_ZCR_EL2			sys_reg(3, 4, 1, 2, 0)
> -
>  #define SYS_DACR32_EL2			sys_reg(3, 4, 3, 0, 0)
> +#define SYS_SPSR_EL2			sys_reg(3, 4, 4, 0, 0)
> +#define SYS_ELR_EL2			sys_reg(3, 4, 4, 0, 1)
>  #define SYS_IFSR32_EL2			sys_reg(3, 4, 5, 0, 1)
> +#define SYS_ESR_EL2			sys_reg(3, 4, 5, 2, 0)
>  #define SYS_VSESR_EL2			sys_reg(3, 4, 5, 2, 3)
>  #define SYS_FPEXC32_EL2			sys_reg(3, 4, 5, 3, 0)
> +#define SYS_FAR_EL2			sys_reg(3, 4, 6, 0, 0)
>  
>  #define SYS_VDISR_EL2			sys_reg(3, 4, 12, 1,  1)
>  #define __SYS__AP0Rx_EL2(x)		sys_reg(3, 4, 12, 8, x)
> @@ -455,7 +464,29 @@
>  #define SYS_ICH_LR15_EL2		__SYS__LR8_EL2(7)
>  
>  /* VHE encodings for architectural EL0/1 system registers */
> +#define SYS_SCTLR_EL12			sys_reg(3, 5, 1, 0, 0)
> +#define SYS_CPACR_EL12			sys_reg(3, 5, 1, 0, 2)
>  #define SYS_ZCR_EL12			sys_reg(3, 5, 1, 2, 0)
> +#define SYS_TTBR0_EL12			sys_reg(3, 5, 2, 0, 0)
> +#define SYS_TTBR1_EL12			sys_reg(3, 5, 2, 0, 1)
> +#define SYS_TCR_EL12			sys_reg(3, 5, 2, 0, 2)
> +#define SYS_SPSR_EL12			sys_reg(3, 5, 4, 0, 0)
> +#define SYS_ELR_EL12			sys_reg(3, 5, 4, 0, 1)
> +#define SYS_AFSR0_EL12			sys_reg(3, 5, 5, 1, 0)
> +#define SYS_AFSR1_EL12			sys_reg(3, 5, 5, 1, 1)
> +#define SYS_ESR_EL12			sys_reg(3, 5, 5, 2, 0)
> +#define SYS_FAR_EL12			sys_reg(3, 5, 6, 0, 0)
> +#define SYS_MAIR_EL12			sys_reg(3, 5, 10, 2, 0)
> +#define SYS_AMAIR_EL12			sys_reg(3, 5, 10, 3, 0)
> +#define SYS_VBAR_EL12			sys_reg(3, 5, 12, 0, 0)
> +#define SYS_CONTEXTIDR_EL12		sys_reg(3, 5, 13, 0, 1)
> +#define SYS_CNTKCTL_EL12		sys_reg(3, 5, 14, 1, 0)
> +#define SYS_CNTP_TVAL_EL02		sys_reg(3, 5, 14, 2, 0)
> +#define SYS_CNTP_CTL_EL02		sys_reg(3, 5, 14, 2, 1)
> +#define SYS_CNTP_CVAL_EL02		sys_reg(3, 5, 14, 2, 2)
> +#define SYS_CNTV_TVAL_EL02		sys_reg(3, 5, 14, 3, 0)
> +#define SYS_CNTV_CTL_EL02		sys_reg(3, 5, 14, 3, 1)
> +#define SYS_CNTV_CVAL_EL02		sys_reg(3, 5, 14, 3, 2)
>  
>  /* Common SCTLR_ELx flags. */
>  #define SCTLR_ELx_DSSBS	(_BITUL(44))
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index 8799e0c267d4..7b55c11b30fb 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -295,7 +295,7 @@ static bool __hyp_text __populate_fault_info(struct kvm_vcpu *vcpu)
>  	if (ec != ESR_ELx_EC_DABT_LOW && ec != ESR_ELx_EC_IABT_LOW)
>  		return true;
>  
> -	far = read_sysreg_el2(far);
> +	far = read_sysreg_el2(SYS_FAR);
>  
>  	/*
>  	 * The HPFAR can be invalid if the stage 2 fault did not
> @@ -412,7 +412,7 @@ static bool __hyp_text __hyp_handle_fpsimd(struct kvm_vcpu *vcpu)
>  static bool __hyp_text fixup_guest_exit(struct kvm_vcpu *vcpu, u64 *exit_code)
>  {
>  	if (ARM_EXCEPTION_CODE(*exit_code) != ARM_EXCEPTION_IRQ)
> -		vcpu->arch.fault.esr_el2 = read_sysreg_el2(esr);
> +		vcpu->arch.fault.esr_el2 = read_sysreg_el2(SYS_ESR);
>  
>  	/*
>  	 * We're using the raw exception code in order to only process
> @@ -708,8 +708,8 @@ static void __hyp_text __hyp_call_panic_nvhe(u64 spsr, u64 elr, u64 par,
>  	asm volatile("ldr %0, =__hyp_panic_string" : "=r" (str_va));
>  
>  	__hyp_do_panic(str_va,
> -		       spsr,  elr,
> -		       read_sysreg(esr_el2),   read_sysreg_el2(far),
> +		       spsr, elr,
> +		       read_sysreg(esr_el2), read_sysreg_el2(SYS_FAR),
>  		       read_sysreg(hpfar_el2), par, vcpu);
>  }
>  
> @@ -724,15 +724,15 @@ static void __hyp_call_panic_vhe(u64 spsr, u64 elr, u64 par,
>  
>  	panic(__hyp_panic_string,
>  	      spsr,  elr,
> -	      read_sysreg_el2(esr),   read_sysreg_el2(far),
> +	      read_sysreg_el2(SYS_ESR),   read_sysreg_el2(SYS_FAR),
>  	      read_sysreg(hpfar_el2), par, vcpu);
>  }
>  NOKPROBE_SYMBOL(__hyp_call_panic_vhe);
>  
>  void __hyp_text __noreturn hyp_panic(struct kvm_cpu_context *host_ctxt)
>  {
> -	u64 spsr = read_sysreg_el2(spsr);
> -	u64 elr = read_sysreg_el2(elr);
> +	u64 spsr = read_sysreg_el2(SYS_SPSR);
> +	u64 elr = read_sysreg_el2(SYS_ELR);
>  	u64 par = read_sysreg(par_el1);
>  
>  	if (!has_vhe())
> diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
> index c52a8451637c..62866a68e852 100644
> --- a/arch/arm64/kvm/hyp/sysreg-sr.c
> +++ b/arch/arm64/kvm/hyp/sysreg-sr.c
> @@ -54,33 +54,33 @@ static void __hyp_text __sysreg_save_user_state(struct kvm_cpu_context *ctxt)
>  static void __hyp_text __sysreg_save_el1_state(struct kvm_cpu_context *ctxt)
>  {
>  	ctxt->sys_regs[CSSELR_EL1]	= read_sysreg(csselr_el1);
> -	ctxt->sys_regs[SCTLR_EL1]	= read_sysreg_el1(sctlr);
> +	ctxt->sys_regs[SCTLR_EL1]	= read_sysreg_el1(SYS_SCTLR);
>  	ctxt->sys_regs[ACTLR_EL1]	= read_sysreg(actlr_el1);
> -	ctxt->sys_regs[CPACR_EL1]	= read_sysreg_el1(cpacr);
> -	ctxt->sys_regs[TTBR0_EL1]	= read_sysreg_el1(ttbr0);
> -	ctxt->sys_regs[TTBR1_EL1]	= read_sysreg_el1(ttbr1);
> -	ctxt->sys_regs[TCR_EL1]		= read_sysreg_el1(tcr);
> -	ctxt->sys_regs[ESR_EL1]		= read_sysreg_el1(esr);
> -	ctxt->sys_regs[AFSR0_EL1]	= read_sysreg_el1(afsr0);
> -	ctxt->sys_regs[AFSR1_EL1]	= read_sysreg_el1(afsr1);
> -	ctxt->sys_regs[FAR_EL1]		= read_sysreg_el1(far);
> -	ctxt->sys_regs[MAIR_EL1]	= read_sysreg_el1(mair);
> -	ctxt->sys_regs[VBAR_EL1]	= read_sysreg_el1(vbar);
> -	ctxt->sys_regs[CONTEXTIDR_EL1]	= read_sysreg_el1(contextidr);
> -	ctxt->sys_regs[AMAIR_EL1]	= read_sysreg_el1(amair);
> -	ctxt->sys_regs[CNTKCTL_EL1]	= read_sysreg_el1(cntkctl);
> +	ctxt->sys_regs[CPACR_EL1]	= read_sysreg_el1(SYS_CPACR);
> +	ctxt->sys_regs[TTBR0_EL1]	= read_sysreg_el1(SYS_TTBR0);
> +	ctxt->sys_regs[TTBR1_EL1]	= read_sysreg_el1(SYS_TTBR1);
> +	ctxt->sys_regs[TCR_EL1]		= read_sysreg_el1(SYS_TCR);
> +	ctxt->sys_regs[ESR_EL1]		= read_sysreg_el1(SYS_ESR);
> +	ctxt->sys_regs[AFSR0_EL1]	= read_sysreg_el1(SYS_AFSR0);
> +	ctxt->sys_regs[AFSR1_EL1]	= read_sysreg_el1(SYS_AFSR1);
> +	ctxt->sys_regs[FAR_EL1]		= read_sysreg_el1(SYS_FAR);
> +	ctxt->sys_regs[MAIR_EL1]	= read_sysreg_el1(SYS_MAIR);
> +	ctxt->sys_regs[VBAR_EL1]	= read_sysreg_el1(SYS_VBAR);
> +	ctxt->sys_regs[CONTEXTIDR_EL1]	= read_sysreg_el1(SYS_CONTEXTIDR);
> +	ctxt->sys_regs[AMAIR_EL1]	= read_sysreg_el1(SYS_AMAIR);
> +	ctxt->sys_regs[CNTKCTL_EL1]	= read_sysreg_el1(SYS_CNTKCTL);
>  	ctxt->sys_regs[PAR_EL1]		= read_sysreg(par_el1);
>  	ctxt->sys_regs[TPIDR_EL1]	= read_sysreg(tpidr_el1);
>  
>  	ctxt->gp_regs.sp_el1		= read_sysreg(sp_el1);
> -	ctxt->gp_regs.elr_el1		= read_sysreg_el1(elr);
> -	ctxt->gp_regs.spsr[KVM_SPSR_EL1]= read_sysreg_el1(spsr);
> +	ctxt->gp_regs.elr_el1		= read_sysreg_el1(SYS_ELR);
> +	ctxt->gp_regs.spsr[KVM_SPSR_EL1]= read_sysreg_el1(SYS_SPSR);
>  }
>  
>  static void __hyp_text __sysreg_save_el2_return_state(struct kvm_cpu_context *ctxt)
>  {
> -	ctxt->gp_regs.regs.pc		= read_sysreg_el2(elr);
> -	ctxt->gp_regs.regs.pstate	= read_sysreg_el2(spsr);
> +	ctxt->gp_regs.regs.pc		= read_sysreg_el2(SYS_ELR);
> +	ctxt->gp_regs.regs.pstate	= read_sysreg_el2(SYS_SPSR);
>  
>  	if (cpus_have_const_cap(ARM64_HAS_RAS_EXTN))
>  		ctxt->sys_regs[DISR_EL1] = read_sysreg_s(SYS_VDISR_EL2);
> @@ -120,35 +120,35 @@ static void __hyp_text __sysreg_restore_common_state(struct kvm_cpu_context *ctx
>  
>  static void __hyp_text __sysreg_restore_user_state(struct kvm_cpu_context *ctxt)
>  {
> -	write_sysreg(ctxt->sys_regs[TPIDR_EL0],	  	tpidr_el0);
> -	write_sysreg(ctxt->sys_regs[TPIDRRO_EL0], 	tpidrro_el0);
> +	write_sysreg(ctxt->sys_regs[TPIDR_EL0],		tpidr_el0);
> +	write_sysreg(ctxt->sys_regs[TPIDRRO_EL0],	tpidrro_el0);
>  }
>  
>  static void __hyp_text __sysreg_restore_el1_state(struct kvm_cpu_context *ctxt)
>  {
>  	write_sysreg(ctxt->sys_regs[MPIDR_EL1],		vmpidr_el2);
>  	write_sysreg(ctxt->sys_regs[CSSELR_EL1],	csselr_el1);
> -	write_sysreg_el1(ctxt->sys_regs[SCTLR_EL1],	sctlr);
> -	write_sysreg(ctxt->sys_regs[ACTLR_EL1],	  	actlr_el1);
> -	write_sysreg_el1(ctxt->sys_regs[CPACR_EL1],	cpacr);
> -	write_sysreg_el1(ctxt->sys_regs[TTBR0_EL1],	ttbr0);
> -	write_sysreg_el1(ctxt->sys_regs[TTBR1_EL1],	ttbr1);
> -	write_sysreg_el1(ctxt->sys_regs[TCR_EL1],	tcr);
> -	write_sysreg_el1(ctxt->sys_regs[ESR_EL1],	esr);
> -	write_sysreg_el1(ctxt->sys_regs[AFSR0_EL1],	afsr0);
> -	write_sysreg_el1(ctxt->sys_regs[AFSR1_EL1],	afsr1);
> -	write_sysreg_el1(ctxt->sys_regs[FAR_EL1],	far);
> -	write_sysreg_el1(ctxt->sys_regs[MAIR_EL1],	mair);
> -	write_sysreg_el1(ctxt->sys_regs[VBAR_EL1],	vbar);
> -	write_sysreg_el1(ctxt->sys_regs[CONTEXTIDR_EL1],contextidr);
> -	write_sysreg_el1(ctxt->sys_regs[AMAIR_EL1],	amair);
> -	write_sysreg_el1(ctxt->sys_regs[CNTKCTL_EL1], 	cntkctl);
> +	write_sysreg_el1(ctxt->sys_regs[SCTLR_EL1],	SYS_SCTLR);
> +	write_sysreg(ctxt->sys_regs[ACTLR_EL1],		actlr_el1);
> +	write_sysreg_el1(ctxt->sys_regs[CPACR_EL1],	SYS_CPACR);
> +	write_sysreg_el1(ctxt->sys_regs[TTBR0_EL1],	SYS_TTBR0);
> +	write_sysreg_el1(ctxt->sys_regs[TTBR1_EL1],	SYS_TTBR1);
> +	write_sysreg_el1(ctxt->sys_regs[TCR_EL1],	SYS_TCR);
> +	write_sysreg_el1(ctxt->sys_regs[ESR_EL1],	SYS_ESR);
> +	write_sysreg_el1(ctxt->sys_regs[AFSR0_EL1],	SYS_AFSR0);
> +	write_sysreg_el1(ctxt->sys_regs[AFSR1_EL1],	SYS_AFSR1);
> +	write_sysreg_el1(ctxt->sys_regs[FAR_EL1],	SYS_FAR);
> +	write_sysreg_el1(ctxt->sys_regs[MAIR_EL1],	SYS_MAIR);
> +	write_sysreg_el1(ctxt->sys_regs[VBAR_EL1],	SYS_VBAR);
> +	write_sysreg_el1(ctxt->sys_regs[CONTEXTIDR_EL1],SYS_CONTEXTIDR);
> +	write_sysreg_el1(ctxt->sys_regs[AMAIR_EL1],	SYS_AMAIR);
> +	write_sysreg_el1(ctxt->sys_regs[CNTKCTL_EL1],	SYS_CNTKCTL);
>  	write_sysreg(ctxt->sys_regs[PAR_EL1],		par_el1);
>  	write_sysreg(ctxt->sys_regs[TPIDR_EL1],		tpidr_el1);
>  
>  	write_sysreg(ctxt->gp_regs.sp_el1,		sp_el1);
> -	write_sysreg_el1(ctxt->gp_regs.elr_el1,		elr);
> -	write_sysreg_el1(ctxt->gp_regs.spsr[KVM_SPSR_EL1],spsr);
> +	write_sysreg_el1(ctxt->gp_regs.elr_el1,		SYS_ELR);
> +	write_sysreg_el1(ctxt->gp_regs.spsr[KVM_SPSR_EL1],SYS_SPSR);
>  }
>  
>  static void __hyp_text
> @@ -171,8 +171,8 @@ __sysreg_restore_el2_return_state(struct kvm_cpu_context *ctxt)
>  	if (!(mode & PSR_MODE32_BIT) && mode >= PSR_MODE_EL2t)
>  		pstate = PSR_MODE_EL2h | PSR_IL_BIT;
>  
> -	write_sysreg_el2(ctxt->gp_regs.regs.pc,		elr);
> -	write_sysreg_el2(pstate,			spsr);
> +	write_sysreg_el2(ctxt->gp_regs.regs.pc,		SYS_ELR);
> +	write_sysreg_el2(pstate,			SYS_SPSR);
>  
>  	if (cpus_have_const_cap(ARM64_HAS_RAS_EXTN))
>  		write_sysreg_s(ctxt->sys_regs[DISR_EL1], SYS_VDISR_EL2);
> diff --git a/arch/arm64/kvm/hyp/tlb.c b/arch/arm64/kvm/hyp/tlb.c
> index 76c30866069e..32a782bb00be 100644
> --- a/arch/arm64/kvm/hyp/tlb.c
> +++ b/arch/arm64/kvm/hyp/tlb.c
> @@ -44,12 +44,12 @@ static void __hyp_text __tlb_switch_to_guest_vhe(struct kvm *kvm,
>  		 * in the TCR_EL1 register. We also need to prevent it to
>  		 * allocate IPA->PA walks, so we enable the S1 MMU...
>  		 */
> -		val = cxt->tcr = read_sysreg_el1(tcr);
> +		val = cxt->tcr = read_sysreg_el1(SYS_TCR);
>  		val |= TCR_EPD1_MASK | TCR_EPD0_MASK;
> -		write_sysreg_el1(val, tcr);
> -		val = cxt->sctlr = read_sysreg_el1(sctlr);
> +		write_sysreg_el1(val, SYS_TCR);
> +		val = cxt->sctlr = read_sysreg_el1(SYS_SCTLR);
>  		val |= SCTLR_ELx_M;
> -		write_sysreg_el1(val, sctlr);
> +		write_sysreg_el1(val, SYS_SCTLR);
>  	}
>  
>  	/*
> @@ -96,8 +96,8 @@ static void __hyp_text __tlb_switch_to_host_vhe(struct kvm *kvm,
>  
>  	if (cpus_have_const_cap(ARM64_WORKAROUND_1165522)) {
>  		/* Restore the registers to what they were */
> -		write_sysreg_el1(cxt->tcr, tcr);
> -		write_sysreg_el1(cxt->sctlr, sctlr);
> +		write_sysreg_el1(cxt->tcr, SYS_TCR);
> +		write_sysreg_el1(cxt->sctlr, SYS_SCTLR);
>  	}
>  
>  	local_irq_restore(cxt->flags);
> diff --git a/arch/arm64/kvm/hyp/vgic-v2-cpuif-proxy.c b/arch/arm64/kvm/hyp/vgic-v2-cpuif-proxy.c
> index 9cbdd034a563..4cd32c856110 100644
> --- a/arch/arm64/kvm/hyp/vgic-v2-cpuif-proxy.c
> +++ b/arch/arm64/kvm/hyp/vgic-v2-cpuif-proxy.c
> @@ -27,7 +27,7 @@
>  static bool __hyp_text __is_be(struct kvm_vcpu *vcpu)
>  {
>  	if (vcpu_mode_is_32bit(vcpu))
> -		return !!(read_sysreg_el2(spsr) & PSR_AA32_E_BIT);
> +		return !!(read_sysreg_el2(SYS_SPSR) & PSR_AA32_E_BIT);
>  
>  	return !!(read_sysreg(SCTLR_EL1) & SCTLR_ELx_EE);
>  }
> diff --git a/arch/arm64/kvm/regmap.c b/arch/arm64/kvm/regmap.c
> index 7a5173ea2276..5dd110b384e4 100644
> --- a/arch/arm64/kvm/regmap.c
> +++ b/arch/arm64/kvm/regmap.c
> @@ -163,7 +163,7 @@ unsigned long vcpu_read_spsr32(const struct kvm_vcpu *vcpu)
>  
>  	switch (spsr_idx) {
>  	case KVM_SPSR_SVC:
> -		return read_sysreg_el1(spsr);
> +		return read_sysreg_el1(SYS_SPSR);
>  	case KVM_SPSR_ABT:
>  		return read_sysreg(spsr_abt);
>  	case KVM_SPSR_UND:
> @@ -188,7 +188,7 @@ void vcpu_write_spsr32(struct kvm_vcpu *vcpu, unsigned long v)
>  
>  	switch (spsr_idx) {
>  	case KVM_SPSR_SVC:
> -		write_sysreg_el1(v, spsr);
> +		write_sysreg_el1(v, SYS_SPSR);
>  	case KVM_SPSR_ABT:
>  		write_sysreg(v, spsr_abt);
>  	case KVM_SPSR_UND:
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 857b226bcdde..adb8a7e9c8e4 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -92,24 +92,24 @@ u64 vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg)
>  	 */
>  	switch (reg) {
>  	case CSSELR_EL1:	return read_sysreg_s(SYS_CSSELR_EL1);
> -	case SCTLR_EL1:		return read_sysreg_s(sctlr_EL12);
> +	case SCTLR_EL1:		return read_sysreg_s(SYS_SCTLR_EL12);
>  	case ACTLR_EL1:		return read_sysreg_s(SYS_ACTLR_EL1);
> -	case CPACR_EL1:		return read_sysreg_s(cpacr_EL12);
> -	case TTBR0_EL1:		return read_sysreg_s(ttbr0_EL12);
> -	case TTBR1_EL1:		return read_sysreg_s(ttbr1_EL12);
> -	case TCR_EL1:		return read_sysreg_s(tcr_EL12);
> -	case ESR_EL1:		return read_sysreg_s(esr_EL12);
> -	case AFSR0_EL1:		return read_sysreg_s(afsr0_EL12);
> -	case AFSR1_EL1:		return read_sysreg_s(afsr1_EL12);
> -	case FAR_EL1:		return read_sysreg_s(far_EL12);
> -	case MAIR_EL1:		return read_sysreg_s(mair_EL12);
> -	case VBAR_EL1:		return read_sysreg_s(vbar_EL12);
> -	case CONTEXTIDR_EL1:	return read_sysreg_s(contextidr_EL12);
> +	case CPACR_EL1:		return read_sysreg_s(SYS_CPACR_EL12);
> +	case TTBR0_EL1:		return read_sysreg_s(SYS_TTBR0_EL12);
> +	case TTBR1_EL1:		return read_sysreg_s(SYS_TTBR1_EL12);
> +	case TCR_EL1:		return read_sysreg_s(SYS_TCR_EL12);
> +	case ESR_EL1:		return read_sysreg_s(SYS_ESR_EL12);
> +	case AFSR0_EL1:		return read_sysreg_s(SYS_AFSR0_EL12);
> +	case AFSR1_EL1:		return read_sysreg_s(SYS_AFSR1_EL12);
> +	case FAR_EL1:		return read_sysreg_s(SYS_FAR_EL12);
> +	case MAIR_EL1:		return read_sysreg_s(SYS_MAIR_EL12);
> +	case VBAR_EL1:		return read_sysreg_s(SYS_VBAR_EL12);
> +	case CONTEXTIDR_EL1:	return read_sysreg_s(SYS_CONTEXTIDR_EL12);
>  	case TPIDR_EL0:		return read_sysreg_s(SYS_TPIDR_EL0);
>  	case TPIDRRO_EL0:	return read_sysreg_s(SYS_TPIDRRO_EL0);
>  	case TPIDR_EL1:		return read_sysreg_s(SYS_TPIDR_EL1);
> -	case AMAIR_EL1:		return read_sysreg_s(amair_EL12);
> -	case CNTKCTL_EL1:	return read_sysreg_s(cntkctl_EL12);
> +	case AMAIR_EL1:		return read_sysreg_s(SYS_AMAIR_EL12);
> +	case CNTKCTL_EL1:	return read_sysreg_s(SYS_CNTKCTL_EL12);
>  	case PAR_EL1:		return read_sysreg_s(SYS_PAR_EL1);
>  	case DACR32_EL2:	return read_sysreg_s(SYS_DACR32_EL2);
>  	case IFSR32_EL2:	return read_sysreg_s(SYS_IFSR32_EL2);
> @@ -135,24 +135,24 @@ void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, u64 val, int reg)
>  	 */
>  	switch (reg) {
>  	case CSSELR_EL1:	write_sysreg_s(val, SYS_CSSELR_EL1);	return;
> -	case SCTLR_EL1:		write_sysreg_s(val, sctlr_EL12);	return;
> +	case SCTLR_EL1:		write_sysreg_s(val, SYS_SCTLR_EL12);	return;
>  	case ACTLR_EL1:		write_sysreg_s(val, SYS_ACTLR_EL1);	return;
> -	case CPACR_EL1:		write_sysreg_s(val, cpacr_EL12);	return;
> -	case TTBR0_EL1:		write_sysreg_s(val, ttbr0_EL12);	return;
> -	case TTBR1_EL1:		write_sysreg_s(val, ttbr1_EL12);	return;
> -	case TCR_EL1:		write_sysreg_s(val, tcr_EL12);		return;
> -	case ESR_EL1:		write_sysreg_s(val, esr_EL12);		return;
> -	case AFSR0_EL1:		write_sysreg_s(val, afsr0_EL12);	return;
> -	case AFSR1_EL1:		write_sysreg_s(val, afsr1_EL12);	return;
> -	case FAR_EL1:		write_sysreg_s(val, far_EL12);		return;
> -	case MAIR_EL1:		write_sysreg_s(val, mair_EL12);		return;
> -	case VBAR_EL1:		write_sysreg_s(val, vbar_EL12);		return;
> -	case CONTEXTIDR_EL1:	write_sysreg_s(val, contextidr_EL12);	return;
> +	case CPACR_EL1:		write_sysreg_s(val, SYS_CPACR_EL12);	return;
> +	case TTBR0_EL1:		write_sysreg_s(val, SYS_TTBR0_EL12);	return;
> +	case TTBR1_EL1:		write_sysreg_s(val, SYS_TTBR1_EL12);	return;
> +	case TCR_EL1:		write_sysreg_s(val, SYS_TCR_EL12);	return;
> +	case ESR_EL1:		write_sysreg_s(val, SYS_ESR_EL12);	return;
> +	case AFSR0_EL1:		write_sysreg_s(val, SYS_AFSR0_EL12);	return;
> +	case AFSR1_EL1:		write_sysreg_s(val, SYS_AFSR1_EL12);	return;
> +	case FAR_EL1:		write_sysreg_s(val, SYS_FAR_EL12);	return;
> +	case MAIR_EL1:		write_sysreg_s(val, SYS_MAIR_EL12);	return;
> +	case VBAR_EL1:		write_sysreg_s(val, SYS_VBAR_EL12);	return;
> +	case CONTEXTIDR_EL1:	write_sysreg_s(val, SYS_CONTEXTIDR_EL12); return;
>  	case TPIDR_EL0:		write_sysreg_s(val, SYS_TPIDR_EL0);	return;
>  	case TPIDRRO_EL0:	write_sysreg_s(val, SYS_TPIDRRO_EL0);	return;
>  	case TPIDR_EL1:		write_sysreg_s(val, SYS_TPIDR_EL1);	return;
> -	case AMAIR_EL1:		write_sysreg_s(val, amair_EL12);	return;
> -	case CNTKCTL_EL1:	write_sysreg_s(val, cntkctl_EL12);	return;
> +	case AMAIR_EL1:		write_sysreg_s(val, SYS_AMAIR_EL12);	return;
> +	case CNTKCTL_EL1:	write_sysreg_s(val, SYS_CNTKCTL_EL12);	return;
>  	case PAR_EL1:		write_sysreg_s(val, SYS_PAR_EL1);	return;
>  	case DACR32_EL2:	write_sysreg_s(val, SYS_DACR32_EL2);	return;
>  	case IFSR32_EL2:	write_sysreg_s(val, SYS_IFSR32_EL2);	return;
> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> index 05ddb6293b79..089441a07ed7 100644
> --- a/virt/kvm/arm/arch_timer.c
> +++ b/virt/kvm/arm/arch_timer.c
> @@ -237,10 +237,10 @@ static bool kvm_timer_should_fire(struct arch_timer_context *timer_ctx)
>  
>  		switch (index) {
>  		case TIMER_VTIMER:
> -			cnt_ctl = read_sysreg_el0(cntv_ctl);
> +			cnt_ctl = read_sysreg_el0(SYS_CNTV_CTL);
>  			break;
>  		case TIMER_PTIMER:
> -			cnt_ctl = read_sysreg_el0(cntp_ctl);
> +			cnt_ctl = read_sysreg_el0(SYS_CNTP_CTL);
>  			break;
>  		case NR_KVM_TIMERS:
>  			/* GCC is braindead */
> @@ -349,20 +349,20 @@ static void timer_save_state(struct arch_timer_context *ctx)
>  
>  	switch (index) {
>  	case TIMER_VTIMER:
> -		ctx->cnt_ctl = read_sysreg_el0(cntv_ctl);
> -		ctx->cnt_cval = read_sysreg_el0(cntv_cval);
> +		ctx->cnt_ctl = read_sysreg_el0(SYS_CNTV_CTL);
> +		ctx->cnt_cval = read_sysreg_el0(SYS_CNTV_CVAL);
>  
>  		/* Disable the timer */
> -		write_sysreg_el0(0, cntv_ctl);
> +		write_sysreg_el0(0, SYS_CNTV_CTL);
>  		isb();
>  
>  		break;
>  	case TIMER_PTIMER:
> -		ctx->cnt_ctl = read_sysreg_el0(cntp_ctl);
> -		ctx->cnt_cval = read_sysreg_el0(cntp_cval);
> +		ctx->cnt_ctl = read_sysreg_el0(SYS_CNTP_CTL);
> +		ctx->cnt_cval = read_sysreg_el0(SYS_CNTP_CVAL);
>  
>  		/* Disable the timer */
> -		write_sysreg_el0(0, cntp_ctl);
> +		write_sysreg_el0(0, SYS_CNTP_CTL);
>  		isb();
>  
>  		break;
> @@ -428,14 +428,14 @@ static void timer_restore_state(struct arch_timer_context *ctx)
>  
>  	switch (index) {
>  	case TIMER_VTIMER:
> -		write_sysreg_el0(ctx->cnt_cval, cntv_cval);
> +		write_sysreg_el0(ctx->cnt_cval, SYS_CNTV_CVAL);
>  		isb();
> -		write_sysreg_el0(ctx->cnt_ctl, cntv_ctl);
> +		write_sysreg_el0(ctx->cnt_ctl, SYS_CNTV_CTL);
>  		break;
>  	case TIMER_PTIMER:
> -		write_sysreg_el0(ctx->cnt_cval, cntp_cval);
> +		write_sysreg_el0(ctx->cnt_cval, SYS_CNTP_CVAL);
>  		isb();
> -		write_sysreg_el0(ctx->cnt_ctl, cntp_ctl);
> +		write_sysreg_el0(ctx->cnt_ctl, SYS_CNTP_CTL);
>  		break;
>  	case NR_KVM_TIMERS:
>  		BUG();
> -- 
> 2.20.1
> 
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH 02/59] KVM: arm64: Move __load_guest_stage2 to kvm_mmu.h
  2019-06-21  9:37 ` [PATCH 02/59] KVM: arm64: Move __load_guest_stage2 to kvm_mmu.h Marc Zyngier
@ 2019-06-24 11:19   ` Dave Martin
  2019-07-03  9:30     ` Marc Zyngier
  0 siblings, 1 reply; 176+ messages in thread
From: Dave Martin @ 2019-06-24 11:19 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, Julien Thierry, Andre Przywara,
	Suzuki K Poulose, Christoffer Dall, James Morse, Jintack Lim

On Fri, Jun 21, 2019 at 10:37:46AM +0100, Marc Zyngier wrote:
> Having __load_guest_stage2 in kvm_hyp.h is quickly going to trigger
> a circular include problem. In order to avoid this, let's move
> it to kvm_mmu.h, where it will be a better fit anyway.
> 
> In the process, drop the __hyp_text annotation, which doesn't help
> as the function is marked as __always_inline.

Does GCC always inline things marked __always_inline?

I seem to remember some gotchas in this area, but I may be being
paranoid.

If this is still only called from hyp, I'd be tempted to keep the
__hyp_text annotation just to be on the safe side.

[...]

Cheers
---Dave

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH 03/59] arm64: Add ARM64_HAS_NESTED_VIRT cpufeature
  2019-06-21  9:37 ` [PATCH 03/59] arm64: Add ARM64_HAS_NESTED_VIRT cpufeature Marc Zyngier
  2019-06-21 13:08   ` Julien Thierry
  2019-06-21 13:44   ` Suzuki K Poulose
@ 2019-06-24 11:24   ` Dave Martin
  2 siblings, 0 replies; 176+ messages in thread
From: Dave Martin @ 2019-06-24 11:24 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, Julien Thierry, Andre Przywara,
	Suzuki K Poulose, Christoffer Dall, James Morse, Jintack Lim

On Fri, Jun 21, 2019 at 10:37:47AM +0100, Marc Zyngier wrote:
> From: Jintack Lim <jintack.lim@linaro.org>
> 
> Add a new ARM64_HAS_NESTED_VIRT feature to indicate that the
> CPU has the ARMv8.3 nested virtualization capability.
> 
> This will be used to support nested virtualization in KVM.
> 
> Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  .../admin-guide/kernel-parameters.txt         |  4 +++
>  arch/arm64/include/asm/cpucaps.h              |  3 ++-
>  arch/arm64/include/asm/sysreg.h               |  1 +
>  arch/arm64/kernel/cpufeature.c                | 26 +++++++++++++++++++
>  4 files changed, 33 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 138f6664b2e2..202bb2115d83 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -2046,6 +2046,10 @@
>  			[KVM,ARM] Allow use of GICv4 for direct injection of
>  			LPIs.
>  
> +	kvm-arm.nested=
> +			[KVM,ARM] Allow nested virtualization in KVM/ARM.
> +			Default is 0 (disabled)
> +

In light of the discussion on this patch, is it worth documenting 0 as
making no guarantee about nested being available, rather than as a
guarantee that nested is disabled?

This would allow the option to be turned into a no-op later once the NV
code is considered mature enough to rip out all the conditionality.

[...]

Cheers
---Dave

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH 04/59] KVM: arm64: nv: Introduce nested virtualization VCPU feature
  2019-06-21  9:37 ` [PATCH 04/59] KVM: arm64: nv: Introduce nested virtualization VCPU feature Marc Zyngier
  2019-06-21 13:08   ` Julien Thierry
@ 2019-06-24 11:28   ` Dave Martin
  2019-07-03 11:53     ` Marc Zyngier
  2019-06-24 11:43   ` Dave Martin
  2 siblings, 1 reply; 176+ messages in thread
From: Dave Martin @ 2019-06-24 11:28 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, Julien Thierry, Andre Przywara,
	Suzuki K Poulose, Christoffer Dall, James Morse, Jintack Lim

On Fri, Jun 21, 2019 at 10:37:48AM +0100, Marc Zyngier wrote:
> From: Christoffer Dall <christoffer.dall@arm.com>
> 
> Introduce the feature bit and a primitive that checks if the feature is
> set behind a static key check based on the cpus_have_const_cap check.
> 
> Checking nested_virt_in_use() on systems without nested virt enabled
> should have negligible overhead.
> 
> We don't yet allow userspace to actually set this feature.
> 
> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  arch/arm/include/asm/kvm_nested.h   |  9 +++++++++
>  arch/arm64/include/asm/kvm_nested.h | 13 +++++++++++++
>  arch/arm64/include/uapi/asm/kvm.h   |  1 +
>  3 files changed, 23 insertions(+)
>  create mode 100644 arch/arm/include/asm/kvm_nested.h
>  create mode 100644 arch/arm64/include/asm/kvm_nested.h
> 
> diff --git a/arch/arm/include/asm/kvm_nested.h b/arch/arm/include/asm/kvm_nested.h
> new file mode 100644
> index 000000000000..124ff6445f8f
> --- /dev/null
> +++ b/arch/arm/include/asm/kvm_nested.h
> @@ -0,0 +1,9 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef __ARM_KVM_NESTED_H
> +#define __ARM_KVM_NESTED_H
> +
> +#include <linux/kvm_host.h>
> +
> +static inline bool nested_virt_in_use(const struct kvm_vcpu *vcpu) { return false; }
> +
> +#endif /* __ARM_KVM_NESTED_H */
> diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
> new file mode 100644
> index 000000000000..8a3d121a0b42
> --- /dev/null
> +++ b/arch/arm64/include/asm/kvm_nested.h
> @@ -0,0 +1,13 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef __ARM64_KVM_NESTED_H
> +#define __ARM64_KVM_NESTED_H
> +
> +#include <linux/kvm_host.h>
> +
> +static inline bool nested_virt_in_use(const struct kvm_vcpu *vcpu)
> +{
> +	return cpus_have_const_cap(ARM64_HAS_NESTED_VIRT) &&
> +		test_bit(KVM_ARM_VCPU_NESTED_VIRT, vcpu->arch.features);
> +}
> +
> +#endif /* __ARM64_KVM_NESTED_H */
> diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
> index d819a3e8b552..563e2a8bae93 100644
> --- a/arch/arm64/include/uapi/asm/kvm.h
> +++ b/arch/arm64/include/uapi/asm/kvm.h
> @@ -106,6 +106,7 @@ struct kvm_regs {
>  #define KVM_ARM_VCPU_SVE		4 /* enable SVE for this CPU */
>  #define KVM_ARM_VCPU_PTRAUTH_ADDRESS	5 /* VCPU uses address authentication */
>  #define KVM_ARM_VCPU_PTRAUTH_GENERIC	6 /* VCPU uses generic authentication */
> +#define KVM_ARM_VCPU_NESTED_VIRT	7 /* Support nested virtualization */

This seems weirdly named:

Isn't the feature we're exposing here really EL2?  In that case, the
feature the guest gets with this flag enabled is plain virtualisation,
possibly with the option to nest further.

Does the guest also get nested virt (i.e., recursively nested virt from
the host's PoV) as a side effect, or would that require an explicit
extra flag?

Cheers
---Dave

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH 05/59] KVM: arm64: nv: Reset VCPU to EL2 registers if VCPU nested virt is set
  2019-06-21  9:37 ` [PATCH 05/59] KVM: arm64: nv: Reset VCPU to EL2 registers if VCPU nested virt is set Marc Zyngier
  2019-06-24 10:19   ` Suzuki K Poulose
@ 2019-06-24 11:38   ` Dave Martin
  1 sibling, 0 replies; 176+ messages in thread
From: Dave Martin @ 2019-06-24 11:38 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, Julien Thierry, Andre Przywara,
	Suzuki K Poulose, Christoffer Dall, James Morse, Jintack Lim

On Fri, Jun 21, 2019 at 10:37:49AM +0100, Marc Zyngier wrote:
> From: Christoffer Dall <christoffer.dall@arm.com>
> 
> Reset the VCPU with PSTATE.M = EL2h when the nested virtualization
> feature is enabled on the VCPU.
> 
> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  arch/arm64/kvm/reset.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
> index 1140b4485575..675ca07dbb05 100644
> --- a/arch/arm64/kvm/reset.c
> +++ b/arch/arm64/kvm/reset.c
> @@ -52,6 +52,11 @@ static const struct kvm_regs default_regs_reset = {
>  			PSR_F_BIT | PSR_D_BIT),
>  };
>  
> +static const struct kvm_regs default_regs_reset_el2 = {
> +	.regs.pstate = (PSR_MODE_EL2h | PSR_A_BIT | PSR_I_BIT |
> +			PSR_F_BIT | PSR_D_BIT),
> +};
> +

Is it worth having a #define for the common non-mode bits?  It's a bit
weird for EL2 and EL1 to have independent DAIF defaults.

Putting a big block of zeros in the kernel text just to initialise one
register seems overkill.  Now that we're adding a third block of zeros,
maybe this is worth refactoring?  We really just need a memset(0)
followed by config-dependent initialisation of regs.pstate AFAICT.

Not a big deal though: this doesn't look like a high risk for
maintainability.
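
One possible shape for that refactoring, with hypothetical names (just
a sketch of the memset-plus-pstate idea, not a proposed patch):

	#define VCPU_RESET_PSTATE_COMMON \
		(PSR_A_BIT | PSR_I_BIT | PSR_F_BIT | PSR_D_BIT)

	static void vcpu_reset_core_regs(struct kvm_regs *regs,
					 unsigned long mode)
	{
		memset(regs, 0, sizeof(*regs));
		regs->regs.pstate = mode | VCPU_RESET_PSTATE_COMMON;
	}

	/* e.g. vcpu_reset_core_regs(regs, PSR_MODE_EL2h) for an NV vcpu */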

Cheers
---Dave

>  static const struct kvm_regs default_regs_reset32 = {
>  	.regs.pstate = (PSR_AA32_MODE_SVC | PSR_AA32_A_BIT |
>  			PSR_AA32_I_BIT | PSR_AA32_F_BIT),
> @@ -302,6 +307,8 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
>  			if (!cpu_has_32bit_el1())
>  				goto out;
>  			cpu_reset = &default_regs_reset32;
> +		} else if (test_bit(KVM_ARM_VCPU_NESTED_VIRT, vcpu->arch.features)) {
> +			cpu_reset = &default_regs_reset_el2;
>  		} else {
>  			cpu_reset = &default_regs_reset;
>  		}
> -- 
> 2.20.1
> 
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH 04/59] KVM: arm64: nv: Introduce nested virtualization VCPU feature
  2019-06-21  9:37 ` [PATCH 04/59] KVM: arm64: nv: Introduce nested virtualization VCPU feature Marc Zyngier
  2019-06-21 13:08   ` Julien Thierry
  2019-06-24 11:28   ` Dave Martin
@ 2019-06-24 11:43   ` Dave Martin
  2019-07-03 11:56     ` Marc Zyngier
  2 siblings, 1 reply; 176+ messages in thread
From: Dave Martin @ 2019-06-24 11:43 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, Julien Thierry, Andre Przywara,
	Suzuki K Poulose, Christoffer Dall, James Morse, Jintack Lim

On Fri, Jun 21, 2019 at 10:37:48AM +0100, Marc Zyngier wrote:
> From: Christoffer Dall <christoffer.dall@arm.com>
> 
> Introduce the feature bit and a primitive that checks if the feature is
> set behind a static key check based on the cpus_have_const_cap check.
> 
> Checking nested_virt_in_use() on systems without nested virt enabled
> should have negligible overhead.
> 
> We don't yet allow userspace to actually set this feature.
> 
> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---

[...]

> diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
> new file mode 100644
> index 000000000000..8a3d121a0b42
> --- /dev/null
> +++ b/arch/arm64/include/asm/kvm_nested.h
> @@ -0,0 +1,13 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef __ARM64_KVM_NESTED_H
> +#define __ARM64_KVM_NESTED_H
> +
> +#include <linux/kvm_host.h>
> +
> +static inline bool nested_virt_in_use(const struct kvm_vcpu *vcpu)
> +{
> +	return cpus_have_const_cap(ARM64_HAS_NESTED_VIRT) &&
> +		test_bit(KVM_ARM_VCPU_NESTED_VIRT, vcpu->arch.features);
> +}

Also, is it worth having a vcpu->arch.flags flag for this, similarly to
SVE and ptrauth?
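
E.g. (sketch only, the flag name and bit position here are made up,
mirroring KVM_ARM64_GUEST_HAS_SVE):

	#define KVM_ARM64_GUEST_HAS_NV	(1 << 7) /* NV exposed to guest */

	static inline bool nested_virt_in_use(const struct kvm_vcpu *vcpu)
	{
		return cpus_have_const_cap(ARM64_HAS_NESTED_VIRT) &&
			(vcpu->arch.flags & KVM_ARM64_GUEST_HAS_NV);
	}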

[...]

Cheers
---Dave

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH 13/59] KVM: arm64: nv: Handle virtual EL2 registers in vcpu_read/write_sys_reg()
  2019-06-21  9:37 ` [PATCH 13/59] KVM: arm64: nv: Handle virtual EL2 registers in vcpu_read/write_sys_reg() Marc Zyngier
@ 2019-06-24 12:42   ` Julien Thierry
  2019-06-25 14:02     ` Alexandru Elisei
  2019-07-03 12:15     ` Marc Zyngier
  2019-06-25 15:18   ` Alexandru Elisei
                     ` (2 subsequent siblings)
  3 siblings, 2 replies; 176+ messages in thread
From: Julien Thierry @ 2019-06-24 12:42 UTC (permalink / raw)
  To: Marc Zyngier, linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	James Morse, Suzuki K Poulose



On 06/21/2019 10:37 AM, Marc Zyngier wrote:
> From: Andre Przywara <andre.przywara@arm.com>
> 
> KVM internally uses accessor functions when reading or writing the
> guest's system registers. This takes care of either accessing the stored
> copy or using the "live" EL1 system registers when the host uses VHE.
> 
> With the introduction of virtual EL2 we add a bunch of EL2 system
> registers, which now must also be taken care of:
> - If the guest is running in vEL2, and we access an EL1 sysreg, we must
>   revert to the stored version of that, and not use the CPU's copy.
> - If the guest is running in vEL1, and we access an EL2 sysreg, we must
>   also use the stored version, since the CPU carries the EL1 copy.
> - Some EL2 system registers are supposed to affect the current execution
>   of the system, so we need to put them into their respective EL1
>   counterparts. For this we need to define a mapping between the two.
>   This is done using the newly introduced struct el2_sysreg_map.
> - Some EL2 system registers have a different format than their EL1
>   counterpart, so we need to translate them before writing them to the
>   CPU. This is done using an (optional) translate function in the map.
> - There are the three special registers SP_EL2, SPSR_EL2 and ELR_EL2,
>   which need some separate handling.
> 
> All of these cases are now wrapped into the existing accessor functions,
> so KVM users don't need to care whether they are accessing EL2 or EL1
> registers, or which state the guest is in.
> 
> This handles what was formerly known as the "shadow state" dynamically,
> without requiring a separate copy for each vCPU EL.
> 
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  arch/arm64/include/asm/kvm_emulate.h |   6 +
>  arch/arm64/include/asm/kvm_host.h    |   5 +
>  arch/arm64/kvm/sys_regs.c            | 163 +++++++++++++++++++++++++++
>  3 files changed, 174 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> index c43aac5fed69..f37006b6eec4 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -70,6 +70,12 @@ void kvm_emulate_nested_eret(struct kvm_vcpu *vcpu);
>  int kvm_inject_nested_sync(struct kvm_vcpu *vcpu, u64 esr_el2);
>  int kvm_inject_nested_irq(struct kvm_vcpu *vcpu);
>  
> +u64 translate_tcr(u64 tcr);
> +u64 translate_cptr(u64 cptr_el2);
> +u64 translate_sctlr(u64 sctlr);
> +u64 translate_ttbr0(u64 ttbr0);
> +u64 translate_cnthctl(u64 cnthctl);
> +
>  static inline bool vcpu_el1_is_32bit(struct kvm_vcpu *vcpu)
>  {
>  	return !(vcpu->arch.hcr_el2 & HCR_RW);
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 2d4290d2513a..dae9c42a7219 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -217,6 +217,11 @@ enum vcpu_sysreg {
>  	NR_SYS_REGS	/* Nothing after this line! */
>  };
>  
> +static inline bool sysreg_is_el2(int reg)
> +{
> +	return reg >= FIRST_EL2_SYSREG && reg < NR_SYS_REGS;
> +}
> +
>  /* 32bit mapping */
>  #define c0_MPIDR	(MPIDR_EL1 * 2)	/* MultiProcessor ID Register */
>  #define c0_CSSELR	(CSSELR_EL1 * 2)/* Cache Size Selection Register */
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 693dd063c9c2..d024114da162 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -76,11 +76,142 @@ static bool write_to_read_only(struct kvm_vcpu *vcpu,
>  	return false;
>  }
>  
> +static u64 tcr_el2_ips_to_tcr_el1_ps(u64 tcr_el2)
> +{
> +	return ((tcr_el2 & TCR_EL2_PS_MASK) >> TCR_EL2_PS_SHIFT)
> +		<< TCR_IPS_SHIFT;
> +}
> +
> +u64 translate_tcr(u64 tcr)
> +{
> +	return TCR_EPD1_MASK |				/* disable TTBR1_EL1 */
> +	       ((tcr & TCR_EL2_TBI) ? TCR_TBI0 : 0) |
> +	       tcr_el2_ips_to_tcr_el1_ps(tcr) |
> +	       (tcr & TCR_EL2_TG0_MASK) |
> +	       (tcr & TCR_EL2_ORGN0_MASK) |
> +	       (tcr & TCR_EL2_IRGN0_MASK) |
> +	       (tcr & TCR_EL2_T0SZ_MASK);
> +}
> +
> +u64 translate_cptr(u64 cptr_el2)
> +{
> +	u64 cpacr_el1 = 0;
> +
> +	if (!(cptr_el2 & CPTR_EL2_TFP))
> +		cpacr_el1 |= CPACR_EL1_FPEN;
> +	if (cptr_el2 & CPTR_EL2_TTA)
> +		cpacr_el1 |= CPACR_EL1_TTA;
> +	if (!(cptr_el2 & CPTR_EL2_TZ))
> +		cpacr_el1 |= CPACR_EL1_ZEN;
> +
> +	return cpacr_el1;
> +}
> +
> +u64 translate_sctlr(u64 sctlr)
> +{
> +	/* Bit 20 is RES1 in SCTLR_EL1, but RES0 in SCTLR_EL2 */
> +	return sctlr | BIT(20);
> +}
> +
> +u64 translate_ttbr0(u64 ttbr0)
> +{
> +	/* Force ASID to 0 (ASID 0 or RES0) */
> +	return ttbr0 & ~GENMASK_ULL(63, 48);
> +}
> +
> +u64 translate_cnthctl(u64 cnthctl)
> +{
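> +	/*
> +	 * EL1PCTEN/EL1PCEN sit in bits [1:0] of the non-VHE view of
> +	 * CNTHCTL_EL2, but in bits [11:10] of the VHE (E2H=1) view;
> +	 * the event stream bits [7:2] are in the same place in both.
> +	 */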
> +	return ((cnthctl & 0x3) << 10) | (cnthctl & 0xfc);
> +}
> +
> +#define EL2_SYSREG(el2, el1, translate)	\
> +	[el2 - FIRST_EL2_SYSREG] = { el2, el1, translate }
> +#define PURE_EL2_SYSREG(el2) \
> +	[el2 - FIRST_EL2_SYSREG] = { el2, __INVALID_SYSREG__, NULL }
> +/*
> + * Associate vEL2 registers to their EL1 counterparts on the CPU.
> + * The translate function can be NULL, when the register layout is identical.
> + */
> +struct el2_sysreg_map {
> +	int sysreg;	/* EL2 register index into the array above */
> +	int mapping;	/* associated EL1 register */
> +	u64 (*translate)(u64 value);
> +} nested_sysreg_map[NR_SYS_REGS - FIRST_EL2_SYSREG] = {
> +	PURE_EL2_SYSREG( VPIDR_EL2 ),
> +	PURE_EL2_SYSREG( VMPIDR_EL2 ),
> +	PURE_EL2_SYSREG( ACTLR_EL2 ),
> +	PURE_EL2_SYSREG( HCR_EL2 ),
> +	PURE_EL2_SYSREG( MDCR_EL2 ),
> +	PURE_EL2_SYSREG( HSTR_EL2 ),
> +	PURE_EL2_SYSREG( HACR_EL2 ),
> +	PURE_EL2_SYSREG( VTTBR_EL2 ),
> +	PURE_EL2_SYSREG( VTCR_EL2 ),
> +	PURE_EL2_SYSREG( RVBAR_EL2 ),
> +	PURE_EL2_SYSREG( RMR_EL2 ),
> +	PURE_EL2_SYSREG( TPIDR_EL2 ),
> +	PURE_EL2_SYSREG( CNTVOFF_EL2 ),
> +	PURE_EL2_SYSREG( CNTHCTL_EL2 ),
> +	PURE_EL2_SYSREG( HPFAR_EL2 ),
> +	EL2_SYSREG(      SCTLR_EL2,  SCTLR_EL1,      translate_sctlr ),
> +	EL2_SYSREG(      CPTR_EL2,   CPACR_EL1,      translate_cptr  ),
> +	EL2_SYSREG(      TTBR0_EL2,  TTBR0_EL1,      translate_ttbr0 ),
> +	EL2_SYSREG(      TTBR1_EL2,  TTBR1_EL1,      NULL            ),
> +	EL2_SYSREG(      TCR_EL2,    TCR_EL1,        translate_tcr   ),
> +	EL2_SYSREG(      VBAR_EL2,   VBAR_EL1,       NULL            ),
> +	EL2_SYSREG(      AFSR0_EL2,  AFSR0_EL1,      NULL            ),
> +	EL2_SYSREG(      AFSR1_EL2,  AFSR1_EL1,      NULL            ),
> +	EL2_SYSREG(      ESR_EL2,    ESR_EL1,        NULL            ),
> +	EL2_SYSREG(      FAR_EL2,    FAR_EL1,        NULL            ),
> +	EL2_SYSREG(      MAIR_EL2,   MAIR_EL1,       NULL            ),
> +	EL2_SYSREG(      AMAIR_EL2,  AMAIR_EL1,      NULL            ),
> +};
> +
> +static
> +const struct el2_sysreg_map *find_el2_sysreg(const struct el2_sysreg_map *map,
> +					     int reg)
> +{
> +	const struct el2_sysreg_map *entry;
> +
> +	if (!sysreg_is_el2(reg))
> +		return NULL;
> +
> +	entry = &nested_sysreg_map[reg - FIRST_EL2_SYSREG];
> +	if (entry->sysreg == __INVALID_SYSREG__)
> +		return NULL;
> +
> +	return entry;
> +}
> +
>  u64 vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg)
>  {
> +
>  	if (!vcpu->arch.sysregs_loaded_on_cpu)
>  		goto immediate_read;
>  
> +	if (unlikely(sysreg_is_el2(reg))) {
> +		const struct el2_sysreg_map *el2_reg;
> +
> +		if (!is_hyp_ctxt(vcpu))
> +			goto immediate_read;
> +
> +		el2_reg = find_el2_sysreg(nested_sysreg_map, reg);
> +		if (el2_reg) {
> +			/*
> +			 * If this register does not have an EL1 counterpart,
> +			 * then read the stored EL2 version.
> +			 */
> +			if (el2_reg->mapping == __INVALID_SYSREG__)

In this patch, find_el2_sysreg returns NULL for PURE_EL2 registers, so
for PURE_EL2 the access would go through the switch case. However, this
branch suggests that for PURE_EL2 registers we intend to do the read from
the memory-backed version.

Which should it be?
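
If the memory-backed read is what we want for those, one option might be
to make find_el2_sysreg only return NULL for indices that have no map
entry at all, e.g. (sketch, not tested):

	entry = &nested_sysreg_map[reg - FIRST_EL2_SYSREG];
	if (entry->sysreg != reg)	/* hole in the map */
		return NULL;

	return entry;

and rely on the el2_reg->mapping == __INVALID_SYSREG__ check here.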

> +				goto immediate_read;
> +
> +			/* Get the current version of the EL1 counterpart. */
> +			reg = el2_reg->mapping;
> +		}
> +	} else {
> +		/* EL1 register can't be on the CPU if the guest is in vEL2. */
> +		if (unlikely(is_hyp_ctxt(vcpu)))
> +			goto immediate_read;
> +	}
> +
>  	/*
>  	 * System registers listed in the switch are not saved on every
>  	 * exit from the guest but are only saved on vcpu_put.
> @@ -114,6 +245,8 @@ u64 vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg)
>  	case DACR32_EL2:	return read_sysreg_s(SYS_DACR32_EL2);
>  	case IFSR32_EL2:	return read_sysreg_s(SYS_IFSR32_EL2);
>  	case DBGVCR32_EL2:	return read_sysreg_s(SYS_DBGVCR32_EL2);
> +	case SP_EL2:		return read_sysreg(sp_el1);
> +	case ELR_EL2:		return read_sysreg_el1(SYS_ELR);
>  	}
>  
>  immediate_read:
> @@ -125,6 +258,34 @@ void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, u64 val, int reg)
>  	if (!vcpu->arch.sysregs_loaded_on_cpu)
>  		goto immediate_write;
>  
> +	if (unlikely(sysreg_is_el2(reg))) {
> +		const struct el2_sysreg_map *el2_reg;
> +
> +		if (!is_hyp_ctxt(vcpu))
> +			goto immediate_write;
> +
> +		/* Store the EL2 version in the sysregs array. */
> +		__vcpu_sys_reg(vcpu, reg) = val;
> +
> +		el2_reg = find_el2_sysreg(nested_sysreg_map, reg);
> +		if (el2_reg) {
> +			/* Does this register have an EL1 counterpart? */
> +			if (el2_reg->mapping == __INVALID_SYSREG__)
> +				return;

As in the read case, this is never reached and we'll go through the
switch case.

Cheers,

-- 
Julien Thierry

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH 06/59] KVM: arm64: nv: Allow userspace to set PSR_MODE_EL2x
  2019-06-21 13:50     ` Marc Zyngier
@ 2019-06-24 12:48       ` Dave Martin
  2019-07-03  9:21         ` Marc Zyngier
  0 siblings, 1 reply; 176+ messages in thread
From: Dave Martin @ 2019-06-24 12:48 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Julien Thierry, linux-arm-kernel, kvmarm, kvm, Andre Przywara

On Fri, Jun 21, 2019 at 02:50:08PM +0100, Marc Zyngier wrote:
> On 21/06/2019 14:24, Julien Thierry wrote:
> > 
> > 
> > On 21/06/2019 10:37, Marc Zyngier wrote:
> >> From: Christoffer Dall <christoffer.dall@linaro.org>
> >>
> >> We were not allowing userspace to set a more privileged mode for the VCPU
> >> than EL1, but we should allow this when nested virtualization is enabled
> >> for the VCPU.
> >>
> >> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> >> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> >> ---
> >>  arch/arm64/kvm/guest.c | 6 ++++++
> >>  1 file changed, 6 insertions(+)
> >>
> >> diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
> >> index 3ae2f82fca46..4c35b5d51e21 100644
> >> --- a/arch/arm64/kvm/guest.c
> >> +++ b/arch/arm64/kvm/guest.c
> >> @@ -37,6 +37,7 @@
> >>  #include <asm/kvm_emulate.h>
> >>  #include <asm/kvm_coproc.h>
> >>  #include <asm/kvm_host.h>
> >> +#include <asm/kvm_nested.h>
> >>  #include <asm/sigcontext.h>
> >>  
> >>  #include "trace.h"
> >> @@ -194,6 +195,11 @@ static int set_core_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> >>  			if (vcpu_el1_is_32bit(vcpu))
> >>  				return -EINVAL;
> >>  			break;
> >> +		case PSR_MODE_EL2h:
> >> +		case PSR_MODE_EL2t:
> >> +			if (vcpu_el1_is_32bit(vcpu) || !nested_virt_in_use(vcpu))
> > 
> > This condition reads a bit weirdly. Why do we care about anything else
> > than !nested_virt_in_use() ?
> > 
> > If nested virt is not in use then obviously we return the error.
> > 
> > If nested virt is in use then why do we care about EL1? Or should this
> > test read as "highest_el_is_32bit" ?
> 
> There are multiple things at play here:
> 
> - MODE_EL2x is not a valid 32bit mode
> - The architecture forbids nested virt with 32bit EL2
> 
> The code above is a simplification of these two conditions. But
> certainly we can do a bit better, as kvm_reset_cpu() doesn't really
> check that we don't create a vcpu with both 32bit+NV. These two bits
> should really be exclusive.

This code is safe for now because KVM_VCPU_MAX_FEATURES <=
KVM_ARM_VCPU_NESTED_VIRT, right, i.e., nested_virt_in_use() cannot be
true?

This makes me a little uneasy, but I think that's paranoia talking: we
want bisectability, but no sane person should ship with just half of this
series.  So I guess this is fine.

We could stick something like

	if (WARN_ON(...))
		return false;

in nested_virt_in_use() and then remove it in the final patch, but it's
probably overkill.
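
I.e., something like (sketch):

	static inline bool nested_virt_in_use(const struct kvm_vcpu *vcpu)
	{
		/* Transitional guard, to be dropped by the final patch: */
		if (WARN_ON_ONCE(test_bit(KVM_ARM_VCPU_NESTED_VIRT,
					  vcpu->arch.features)))
			return false;

		return cpus_have_const_cap(ARM64_HAS_NESTED_VIRT) &&
			test_bit(KVM_ARM_VCPU_NESTED_VIRT, vcpu->arch.features);
	}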

Cheers
---Dave

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH 07/59] KVM: arm64: nv: Add EL2 system registers to vcpu context
  2019-06-21  9:37 ` [PATCH 07/59] KVM: arm64: nv: Add EL2 system registers to vcpu context Marc Zyngier
@ 2019-06-24 12:54   ` Dave Martin
  2019-07-03 12:20     ` Marc Zyngier
  2019-06-24 15:47   ` Alexandru Elisei
  2019-07-01 16:36   ` Suzuki K Poulose
  2 siblings, 1 reply; 176+ messages in thread
From: Dave Martin @ 2019-06-24 12:54 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, Julien Thierry, Andre Przywara,
	Suzuki K Poulose, Christoffer Dall, James Morse, Jintack Lim

On Fri, Jun 21, 2019 at 10:37:51AM +0100, Marc Zyngier wrote:
> From: Jintack Lim <jintack.lim@linaro.org>
> 
> ARM v8.3 introduces a new bit in the HCR_EL2, which is the NV bit. When
> this bit is set, accessing EL2 registers in EL1 traps to EL2. In
> addition, executing the following instructions in EL1 will trap to EL2:
> tlbi, at, eret, and msr/mrs instructions to access SP_EL1. Most of the
> instructions that trap to EL2 with the NV bit were undef at EL1 prior to
> ARM v8.3. The only instruction that was not undef is eret.
> 
> This patch sets up a handler for EL2 registers and SP_EL1 register
> accesses at EL1. The host hypervisor keeps those register values in
> memory, and will emulate their behavior.
> 
> This patch doesn't set the NV bit yet. It will be set in a later patch
> once nested virtualization support is completed.
> 
> Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  arch/arm64/include/asm/kvm_host.h | 37 +++++++++++++++-
>  arch/arm64/include/asm/sysreg.h   | 50 ++++++++++++++++++++-
>  arch/arm64/kvm/sys_regs.c         | 74 ++++++++++++++++++++++++++++---
>  3 files changed, 154 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 4bcd9c1291d5..2d4290d2513a 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -173,12 +173,47 @@ enum vcpu_sysreg {
>  	APGAKEYLO_EL1,
>  	APGAKEYHI_EL1,
>  
> -	/* 32bit specific registers. Keep them at the end of the range */
> +	/* 32bit specific registers. */

Out of interest, why did we originally want these to be at the end?
Because they're not at the end any more...

>  	DACR32_EL2,	/* Domain Access Control Register */
>  	IFSR32_EL2,	/* Instruction Fault Status Register */
>  	FPEXC32_EL2,	/* Floating-Point Exception Control Register */
>  	DBGVCR32_EL2,	/* Debug Vector Catch Register */
>  
> +	/* EL2 registers sorted ascending by Op0, Op1, CRn, CRm, Op2 */
> +	FIRST_EL2_SYSREG,
> +	VPIDR_EL2 = FIRST_EL2_SYSREG,
> +			/* Virtualization Processor ID Register */
> +	VMPIDR_EL2,	/* Virtualization Multiprocessor ID Register */
> +	SCTLR_EL2,	/* System Control Register (EL2) */
> +	ACTLR_EL2,	/* Auxiliary Control Register (EL2) */
> +	HCR_EL2,	/* Hypervisor Configuration Register */
> +	MDCR_EL2,	/* Monitor Debug Configuration Register (EL2) */
> +	CPTR_EL2,	/* Architectural Feature Trap Register (EL2) */
> +	HSTR_EL2,	/* Hypervisor System Trap Register */
> +	HACR_EL2,	/* Hypervisor Auxiliary Control Register */
> +	TTBR0_EL2,	/* Translation Table Base Register 0 (EL2) */
> +	TTBR1_EL2,	/* Translation Table Base Register 1 (EL2) */
> +	TCR_EL2,	/* Translation Control Register (EL2) */
> +	VTTBR_EL2,	/* Virtualization Translation Table Base Register */
> +	VTCR_EL2,	/* Virtualization Translation Control Register */
> +	SPSR_EL2,	/* EL2 saved program status register */
> +	ELR_EL2,	/* EL2 exception link register */
> +	AFSR0_EL2,	/* Auxiliary Fault Status Register 0 (EL2) */
> +	AFSR1_EL2,	/* Auxiliary Fault Status Register 1 (EL2) */
> +	ESR_EL2,	/* Exception Syndrome Register (EL2) */
> +	FAR_EL2,	/* Fault Address Register (EL2) */
> +	HPFAR_EL2,	/* Hypervisor IPA Fault Address Register */
> +	MAIR_EL2,	/* Memory Attribute Indirection Register (EL2) */
> +	AMAIR_EL2,	/* Auxiliary Memory Attribute Indirection Register (EL2) */
> +	VBAR_EL2,	/* Vector Base Address Register (EL2) */
> +	RVBAR_EL2,	/* Reset Vector Base Address Register */
> +	RMR_EL2,	/* Reset Management Register */
> +	CONTEXTIDR_EL2,	/* Context ID Register (EL2) */
> +	TPIDR_EL2,	/* EL2 Software Thread ID Register */
> +	CNTVOFF_EL2,	/* Counter-timer Virtual Offset register */
> +	CNTHCTL_EL2,	/* Counter-timer Hypervisor Control register */
> +	SP_EL2,		/* EL2 Stack Pointer */
> +

I wonder whether we could make these conditionally present somehow.  Not
worth worrying about for now to save 200-odd bytes per vcpu though.
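
Purely illustrative (the names are made up), but something like only
allocating the EL2 range when the feature bit is set:

	/* in struct kvm_vcpu_arch: */
	u64 *el2_sys_regs;	/* NULL unless KVM_ARM_VCPU_NESTED_VIRT */

	/* at vcpu init: */
	vcpu->arch.el2_sys_regs = kcalloc(NR_SYS_REGS - FIRST_EL2_SYSREG,
					  sizeof(u64), GFP_KERNEL);

with __vcpu_sys_reg() indirecting for the EL2 range.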

[...]

Cheers
---Dave

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH 08/59] KVM: arm64: nv: Reset VMPIDR_EL2 and VPIDR_EL2 to sane values
  2019-06-21  9:37 ` [PATCH 08/59] KVM: arm64: nv: Reset VMPIDR_EL2 and VPIDR_EL2 to sane values Marc Zyngier
@ 2019-06-24 12:59   ` Dave Martin
  0 siblings, 0 replies; 176+ messages in thread
From: Dave Martin @ 2019-06-24 12:59 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, Julien Thierry, Andre Przywara,
	Suzuki K Poulose, Christoffer Dall, James Morse, Jintack Lim

On Fri, Jun 21, 2019 at 10:37:52AM +0100, Marc Zyngier wrote:
> The VMPIDR_EL2 and VPIDR_EL2 are architecturally UNKNOWN at reset, but
> let's be nice to a guest hypervisor behaving foolishly and reset these
> to something reasonable anyway.

Why be nice?  Generally we do try to initialise UNKNOWN regs to garbage,
to help trip up badly-written guests.
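
(For what it's worth, reset_unknown() in sys_regs.h exists for exactly
this; IIRC it poisons the backing store with a recognisable pattern,
along the lines of:

	__vcpu_sys_reg(vcpu, r->reg) = 0x1de7ec7edbadc0deULL;

so these two could presumably just use that.)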

Cheers
---Dave

> 
> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  arch/arm64/kvm/sys_regs.c | 25 +++++++++++++++++++++----
>  1 file changed, 21 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index e81be6debe07..693dd063c9c2 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -624,7 +624,7 @@ static void reset_amair_el1(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
>  	vcpu_write_sys_reg(vcpu, amair, AMAIR_EL1);
>  }
>  
> -static void reset_mpidr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
> +static u64 compute_reset_mpidr(struct kvm_vcpu *vcpu)
>  {
>  	u64 mpidr;
>  
> @@ -638,7 +638,24 @@ static void reset_mpidr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
>  	mpidr = (vcpu->vcpu_id & 0x0f) << MPIDR_LEVEL_SHIFT(0);
>  	mpidr |= ((vcpu->vcpu_id >> 4) & 0xff) << MPIDR_LEVEL_SHIFT(1);
>  	mpidr |= ((vcpu->vcpu_id >> 12) & 0xff) << MPIDR_LEVEL_SHIFT(2);
> -	vcpu_write_sys_reg(vcpu, (1ULL << 31) | mpidr, MPIDR_EL1);
> +	mpidr |= (1ULL << 31);
> +
> +	return mpidr;
> +}
> +
> +static void reset_mpidr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
> +{
> +	vcpu_write_sys_reg(vcpu, compute_reset_mpidr(vcpu), MPIDR_EL1);
> +}
> +
> +static void reset_vmpidr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
> +{
> +	vcpu_write_sys_reg(vcpu, compute_reset_mpidr(vcpu), VMPIDR_EL2);
> +}
> +
> +static void reset_vpidr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
> +{
> +	vcpu_write_sys_reg(vcpu, read_cpuid_id(), VPIDR_EL2);
>  }
>  
>  static void reset_pmcr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
> @@ -1668,8 +1685,8 @@ static const struct sys_reg_desc sys_reg_descs[] = {
>  	 */
>  	{ SYS_DESC(SYS_PMCCFILTR_EL0), access_pmu_evtyper, reset_val, PMCCFILTR_EL0, 0 },
>  
> -	{ SYS_DESC(SYS_VPIDR_EL2), access_rw, reset_val, VPIDR_EL2, 0 },
> -	{ SYS_DESC(SYS_VMPIDR_EL2), access_rw, reset_val, VMPIDR_EL2, 0 },
> +	{ SYS_DESC(SYS_VPIDR_EL2), access_rw, reset_vpidr, VPIDR_EL2 },
> +	{ SYS_DESC(SYS_VMPIDR_EL2), access_rw, reset_vmpidr, VMPIDR_EL2 },
>  
>  	{ SYS_DESC(SYS_SCTLR_EL2), access_rw, reset_val, SCTLR_EL2, 0 },
>  	{ SYS_DESC(SYS_ACTLR_EL2), access_rw, reset_val, ACTLR_EL2, 0 },
> -- 
> 2.20.1

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH 01/59] KVM: arm64: Migrate _elx sysreg accessors to msr_s/mrs_s
  2019-06-21  9:37 ` [PATCH 01/59] KVM: arm64: Migrate _elx sysreg accessors to msr_s/mrs_s Marc Zyngier
  2019-06-24 11:16   ` Dave Martin
@ 2019-06-24 12:59   ` Alexandru Elisei
  2019-07-03 12:32     ` Marc Zyngier
  1 sibling, 1 reply; 176+ messages in thread
From: Alexandru Elisei @ 2019-06-24 12:59 UTC (permalink / raw)
  To: Marc Zyngier, linux-arm-kernel, kvmarm, kvm; +Cc: Andre Przywara, Dave Martin

On 6/21/19 10:37 AM, Marc Zyngier wrote:
> From: Dave Martin <Dave.Martin@arm.com>
>
> Currently, the {read,write}_sysreg_el*() accessors for accessing
> particular ELs' sysregs in the presence of VHE rely on some local
> hacks and define their system register encodings in a way that is
> inconsistent with the core definitions in <asm/sysreg.h>.
>
> As a result, it is necessary to add duplicate definitions for any
> system register that already needs a definition in sysreg.h for
> other reasons.
>
> This is a bit of a maintenance headache, and the reasons for the
> _el*() accessors working the way they do are a bit historical.
>
> This patch gets rid of the shadow sysreg definitions in
> <asm/kvm_hyp.h>, converts the _el*() accessors to use the core
> __msr_s/__mrs_s interface, and converts all call sites to use the
> standard sysreg #define names (i.e., upper case, with SYS_ prefix).
>
> This patch will conflict heavily anyway, so the opportunity is taken
> to clean up some bad whitespace in the context of the changes.
>
> The change exposes a few system registers that have no sysreg.h
> definition, due to msr_s/mrs_s being used in place of msr/mrs:
> additions are made in order to fill in the gaps.
>
> Signed-off-by: Dave Martin <Dave.Martin@arm.com>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Christoffer Dall <christoffer.dall@arm.com>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Will Deacon <will.deacon@arm.com>
> Link: https://www.spinics.net/lists/kvm-arm/msg31717.html
> [Rebased to v4.21-rc1]
> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
> [Rebased to v5.2-rc5, changelog updates]
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  arch/arm/include/asm/kvm_hyp.h           | 13 ++--
>  arch/arm64/include/asm/kvm_emulate.h     | 16 ++---
>  arch/arm64/include/asm/kvm_hyp.h         | 50 ++-------------
>  arch/arm64/include/asm/sysreg.h          | 35 ++++++++++-
>  arch/arm64/kvm/hyp/switch.c              | 14 ++---
>  arch/arm64/kvm/hyp/sysreg-sr.c           | 78 ++++++++++++------------
>  arch/arm64/kvm/hyp/tlb.c                 | 12 ++--
>  arch/arm64/kvm/hyp/vgic-v2-cpuif-proxy.c |  2 +-
>  arch/arm64/kvm/regmap.c                  |  4 +-
>  arch/arm64/kvm/sys_regs.c                | 56 ++++++++---------
>  virt/kvm/arm/arch_timer.c                | 24 ++++----
>  11 files changed, 148 insertions(+), 156 deletions(-)
>
> diff --git a/arch/arm/include/asm/kvm_hyp.h b/arch/arm/include/asm/kvm_hyp.h
> index 87bcd18df8d5..059224fb14db 100644
> --- a/arch/arm/include/asm/kvm_hyp.h
> +++ b/arch/arm/include/asm/kvm_hyp.h
> @@ -93,13 +93,14 @@
>  #define VFP_FPEXC	__ACCESS_VFP(FPEXC)
>  
>  /* AArch64 compatibility macros, only for the timer so far */
> -#define read_sysreg_el0(r)		read_sysreg(r##_el0)
> -#define write_sysreg_el0(v, r)		write_sysreg(v, r##_el0)
> +#define read_sysreg_el0(r)		read_sysreg(r##_EL0)
> +#define write_sysreg_el0(v, r)		write_sysreg(v, r##_EL0)
> +
> +#define SYS_CNTP_CTL_EL0		CNTP_CTL
> +#define SYS_CNTP_CVAL_EL0		CNTP_CVAL
> +#define SYS_CNTV_CTL_EL0		CNTV_CTL
> +#define SYS_CNTV_CVAL_EL0		CNTV_CVAL
>  
> -#define cntp_ctl_el0			CNTP_CTL
> -#define cntp_cval_el0			CNTP_CVAL
> -#define cntv_ctl_el0			CNTV_CTL
> -#define cntv_cval_el0			CNTV_CVAL
>  #define cntvoff_el2			CNTVOFF
>  #define cnthctl_el2			CNTHCTL
>  
> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> index 613427fafff9..39ffe41855bc 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -137,7 +137,7 @@ static inline unsigned long *__vcpu_elr_el1(const struct kvm_vcpu *vcpu)
>  static inline unsigned long vcpu_read_elr_el1(const struct kvm_vcpu *vcpu)
>  {
>  	if (vcpu->arch.sysregs_loaded_on_cpu)
> -		return read_sysreg_el1(elr);
> +		return read_sysreg_el1(SYS_ELR);
>  	else
>  		return *__vcpu_elr_el1(vcpu);
>  }
> @@ -145,7 +145,7 @@ static inline unsigned long vcpu_read_elr_el1(const struct kvm_vcpu *vcpu)
>  static inline void vcpu_write_elr_el1(const struct kvm_vcpu *vcpu, unsigned long v)
>  {
>  	if (vcpu->arch.sysregs_loaded_on_cpu)
> -		write_sysreg_el1(v, elr);
> +		write_sysreg_el1(v, SYS_ELR);
>  	else
>  		*__vcpu_elr_el1(vcpu) = v;
>  }
> @@ -197,7 +197,7 @@ static inline unsigned long vcpu_read_spsr(const struct kvm_vcpu *vcpu)
>  		return vcpu_read_spsr32(vcpu);
>  
>  	if (vcpu->arch.sysregs_loaded_on_cpu)
> -		return read_sysreg_el1(spsr);
> +		return read_sysreg_el1(SYS_SPSR);
>  	else
>  		return vcpu_gp_regs(vcpu)->spsr[KVM_SPSR_EL1];
>  }
> @@ -210,7 +210,7 @@ static inline void vcpu_write_spsr(struct kvm_vcpu *vcpu, unsigned long v)
>  	}
>  
>  	if (vcpu->arch.sysregs_loaded_on_cpu)
> -		write_sysreg_el1(v, spsr);
> +		write_sysreg_el1(v, SYS_SPSR);
>  	else
>  		vcpu_gp_regs(vcpu)->spsr[KVM_SPSR_EL1] = v;
>  }
> @@ -462,13 +462,13 @@ static inline void kvm_skip_instr(struct kvm_vcpu *vcpu, bool is_wide_instr)
>   */
>  static inline void __hyp_text __kvm_skip_instr(struct kvm_vcpu *vcpu)
>  {
> -	*vcpu_pc(vcpu) = read_sysreg_el2(elr);
> -	vcpu->arch.ctxt.gp_regs.regs.pstate = read_sysreg_el2(spsr);
> +	*vcpu_pc(vcpu) = read_sysreg_el2(SYS_ELR);
> +	vcpu->arch.ctxt.gp_regs.regs.pstate = read_sysreg_el2(SYS_SPSR);
>  
>  	kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
>  
> -	write_sysreg_el2(vcpu->arch.ctxt.gp_regs.regs.pstate, spsr);
> -	write_sysreg_el2(*vcpu_pc(vcpu), elr);
> +	write_sysreg_el2(vcpu->arch.ctxt.gp_regs.regs.pstate, SYS_SPSR);
> +	write_sysreg_el2(*vcpu_pc(vcpu), SYS_ELR);
>  }
>  
>  #endif /* __ARM64_KVM_EMULATE_H__ */
> diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
> index 09fe8bd15f6e..ce99c2daff04 100644
> --- a/arch/arm64/include/asm/kvm_hyp.h
> +++ b/arch/arm64/include/asm/kvm_hyp.h
> @@ -29,7 +29,7 @@
>  #define read_sysreg_elx(r,nvh,vh)					\
>  	({								\
>  		u64 reg;						\
> -		asm volatile(ALTERNATIVE("mrs %0, " __stringify(r##nvh),\
> +		asm volatile(ALTERNATIVE(__mrs_s("%0", r##nvh),	\
>  					 __mrs_s("%0", r##vh),		\
>  					 ARM64_HAS_VIRT_HOST_EXTN)	\
>  			     : "=r" (reg));				\
> @@ -39,7 +39,7 @@
>  #define write_sysreg_elx(v,r,nvh,vh)					\
>  	do {								\
>  		u64 __val = (u64)(v);					\
> -		asm volatile(ALTERNATIVE("msr " __stringify(r##nvh) ", %x0",\
> +		asm volatile(ALTERNATIVE(__msr_s(r##nvh, "%x0"),	\
>  					 __msr_s(r##vh, "%x0"),		\
>  					 ARM64_HAS_VIRT_HOST_EXTN)	\
>  					 : : "rZ" (__val));		\
> @@ -48,55 +48,15 @@
>  /*
>   * Unified accessors for registers that have a different encoding
>   * between VHE and non-VHE. They must be specified without their "ELx"
> - * encoding.
> + * encoding, but with the SYS_ prefix, as defined in asm/sysreg.h.
>   */
> -#define read_sysreg_el2(r)						\
> -	({								\
> -		u64 reg;						\
> -		asm volatile(ALTERNATIVE("mrs %0, " __stringify(r##_EL2),\
> -					 "mrs %0, " __stringify(r##_EL1),\
> -					 ARM64_HAS_VIRT_HOST_EXTN)	\
> -			     : "=r" (reg));				\
> -		reg;							\
> -	})
> -
> -#define write_sysreg_el2(v,r)						\
> -	do {								\
> -		u64 __val = (u64)(v);					\
> -		asm volatile(ALTERNATIVE("msr " __stringify(r##_EL2) ", %x0",\
> -					 "msr " __stringify(r##_EL1) ", %x0",\
> -					 ARM64_HAS_VIRT_HOST_EXTN)	\
> -					 : : "rZ" (__val));		\
> -	} while (0)
>  
>  #define read_sysreg_el0(r)	read_sysreg_elx(r, _EL0, _EL02)
>  #define write_sysreg_el0(v,r)	write_sysreg_elx(v, r, _EL0, _EL02)
>  #define read_sysreg_el1(r)	read_sysreg_elx(r, _EL1, _EL12)
>  #define write_sysreg_el1(v,r)	write_sysreg_elx(v, r, _EL1, _EL12)
> -
> -/* The VHE specific system registers and their encoding */
> -#define sctlr_EL12              sys_reg(3, 5, 1, 0, 0)
> -#define cpacr_EL12              sys_reg(3, 5, 1, 0, 2)
> -#define ttbr0_EL12              sys_reg(3, 5, 2, 0, 0)
> -#define ttbr1_EL12              sys_reg(3, 5, 2, 0, 1)
> -#define tcr_EL12                sys_reg(3, 5, 2, 0, 2)
> -#define afsr0_EL12              sys_reg(3, 5, 5, 1, 0)
> -#define afsr1_EL12              sys_reg(3, 5, 5, 1, 1)
> -#define esr_EL12                sys_reg(3, 5, 5, 2, 0)
> -#define far_EL12                sys_reg(3, 5, 6, 0, 0)
> -#define mair_EL12               sys_reg(3, 5, 10, 2, 0)
> -#define amair_EL12              sys_reg(3, 5, 10, 3, 0)
> -#define vbar_EL12               sys_reg(3, 5, 12, 0, 0)
> -#define contextidr_EL12         sys_reg(3, 5, 13, 0, 1)
> -#define cntkctl_EL12            sys_reg(3, 5, 14, 1, 0)
> -#define cntp_tval_EL02          sys_reg(3, 5, 14, 2, 0)
> -#define cntp_ctl_EL02           sys_reg(3, 5, 14, 2, 1)
> -#define cntp_cval_EL02          sys_reg(3, 5, 14, 2, 2)
> -#define cntv_tval_EL02          sys_reg(3, 5, 14, 3, 0)
> -#define cntv_ctl_EL02           sys_reg(3, 5, 14, 3, 1)
> -#define cntv_cval_EL02          sys_reg(3, 5, 14, 3, 2)
> -#define spsr_EL12               sys_reg(3, 5, 4, 0, 0)
> -#define elr_EL12                sys_reg(3, 5, 4, 0, 1)
> +#define read_sysreg_el2(r)	read_sysreg_elx(r, _EL2, _EL1)
> +#define write_sysreg_el2(v,r)	write_sysreg_elx(v, r, _EL2, _EL1)
>  
>  /**
>   * hyp_alternate_select - Generates patchable code sequences that are
> diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
> index 902d75b60914..434cf53d527b 100644
> --- a/arch/arm64/include/asm/sysreg.h
> +++ b/arch/arm64/include/asm/sysreg.h
> @@ -202,6 +202,9 @@
>  #define SYS_APGAKEYLO_EL1		sys_reg(3, 0, 2, 3, 0)
>  #define SYS_APGAKEYHI_EL1		sys_reg(3, 0, 2, 3, 1)
>  
> +#define SYS_SPSR_EL1			sys_reg(3, 0, 4, 0, 0)
> +#define SYS_ELR_EL1			sys_reg(3, 0, 4, 0, 1)
> +
>  #define SYS_ICC_PMR_EL1			sys_reg(3, 0, 4, 6, 0)
>  
>  #define SYS_AFSR0_EL1			sys_reg(3, 0, 5, 1, 0)
> @@ -393,6 +396,9 @@
>  #define SYS_CNTP_CTL_EL0		sys_reg(3, 3, 14, 2, 1)
>  #define SYS_CNTP_CVAL_EL0		sys_reg(3, 3, 14, 2, 2)
>  
> +#define SYS_CNTV_CTL_EL0		sys_reg(3, 3, 14, 3, 1)
> +#define SYS_CNTV_CVAL_EL0		sys_reg(3, 3, 14, 3, 2)
> +
>  #define SYS_AARCH32_CNTP_TVAL		sys_reg(0, 0, 14, 2, 0)
>  #define SYS_AARCH32_CNTP_CTL		sys_reg(0, 0, 14, 2, 1)
>  #define SYS_AARCH32_CNTP_CVAL		sys_reg(0, 2, 0, 14, 0)
> @@ -403,14 +409,17 @@
>  #define __TYPER_CRm(n)			(0xc | (((n) >> 3) & 0x3))
>  #define SYS_PMEVTYPERn_EL0(n)		sys_reg(3, 3, 14, __TYPER_CRm(n), __PMEV_op2(n))
>  
> -#define SYS_PMCCFILTR_EL0		sys_reg (3, 3, 14, 15, 7)
> +#define SYS_PMCCFILTR_EL0		sys_reg(3, 3, 14, 15, 7)
>  
>  #define SYS_ZCR_EL2			sys_reg(3, 4, 1, 2, 0)
> -
>  #define SYS_DACR32_EL2			sys_reg(3, 4, 3, 0, 0)
> +#define SYS_SPSR_EL2			sys_reg(3, 4, 4, 0, 0)
> +#define SYS_ELR_EL2			sys_reg(3, 4, 4, 0, 1)
>  #define SYS_IFSR32_EL2			sys_reg(3, 4, 5, 0, 1)
> +#define SYS_ESR_EL2			sys_reg(3, 4, 5, 2, 0)
>  #define SYS_VSESR_EL2			sys_reg(3, 4, 5, 2, 3)
>  #define SYS_FPEXC32_EL2			sys_reg(3, 4, 5, 3, 0)
> +#define SYS_FAR_EL2			sys_reg(3, 4, 6, 0, 0)
>  
>  #define SYS_VDISR_EL2			sys_reg(3, 4, 12, 1,  1)
>  #define __SYS__AP0Rx_EL2(x)		sys_reg(3, 4, 12, 8, x)
> @@ -455,7 +464,29 @@
>  #define SYS_ICH_LR15_EL2		__SYS__LR8_EL2(7)
>  
>  /* VHE encodings for architectural EL0/1 system registers */
> +#define SYS_SCTLR_EL12			sys_reg(3, 5, 1, 0, 0)
> +#define SYS_CPACR_EL12			sys_reg(3, 5, 1, 0, 2)
>  #define SYS_ZCR_EL12			sys_reg(3, 5, 1, 2, 0)
> +#define SYS_TTBR0_EL12			sys_reg(3, 5, 2, 0, 0)
> +#define SYS_TTBR1_EL12			sys_reg(3, 5, 2, 0, 1)
> +#define SYS_TCR_EL12			sys_reg(3, 5, 2, 0, 2)
> +#define SYS_SPSR_EL12			sys_reg(3, 5, 4, 0, 0)
> +#define SYS_ELR_EL12			sys_reg(3, 5, 4, 0, 1)
> +#define SYS_AFSR0_EL12			sys_reg(3, 5, 5, 1, 0)
> +#define SYS_AFSR1_EL12			sys_reg(3, 5, 5, 1, 1)
> +#define SYS_ESR_EL12			sys_reg(3, 5, 5, 2, 0)
> +#define SYS_FAR_EL12			sys_reg(3, 5, 6, 0, 0)
> +#define SYS_MAIR_EL12			sys_reg(3, 5, 10, 2, 0)
> +#define SYS_AMAIR_EL12			sys_reg(3, 5, 10, 3, 0)
> +#define SYS_VBAR_EL12			sys_reg(3, 5, 12, 0, 0)
> +#define SYS_CONTEXTIDR_EL12		sys_reg(3, 5, 13, 0, 1)
> +#define SYS_CNTKCTL_EL12		sys_reg(3, 5, 14, 1, 0)
> +#define SYS_CNTP_TVAL_EL02		sys_reg(3, 5, 14, 2, 0)
> +#define SYS_CNTP_CTL_EL02		sys_reg(3, 5, 14, 2, 1)
> +#define SYS_CNTP_CVAL_EL02		sys_reg(3, 5, 14, 2, 2)
> +#define SYS_CNTV_TVAL_EL02		sys_reg(3, 5, 14, 3, 0)
> +#define SYS_CNTV_CTL_EL02		sys_reg(3, 5, 14, 3, 1)
> +#define SYS_CNTV_CVAL_EL02		sys_reg(3, 5, 14, 3, 2)
>  
>  /* Common SCTLR_ELx flags. */
>  #define SCTLR_ELx_DSSBS	(_BITUL(44))
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index 8799e0c267d4..7b55c11b30fb 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -295,7 +295,7 @@ static bool __hyp_text __populate_fault_info(struct kvm_vcpu *vcpu)
>  	if (ec != ESR_ELx_EC_DABT_LOW && ec != ESR_ELx_EC_IABT_LOW)
>  		return true;
>  
> -	far = read_sysreg_el2(far);
> +	far = read_sysreg_el2(SYS_FAR);
>  
>  	/*
>  	 * The HPFAR can be invalid if the stage 2 fault did not
> @@ -412,7 +412,7 @@ static bool __hyp_text __hyp_handle_fpsimd(struct kvm_vcpu *vcpu)
>  static bool __hyp_text fixup_guest_exit(struct kvm_vcpu *vcpu, u64 *exit_code)
>  {
>  	if (ARM_EXCEPTION_CODE(*exit_code) != ARM_EXCEPTION_IRQ)
> -		vcpu->arch.fault.esr_el2 = read_sysreg_el2(esr);
> +		vcpu->arch.fault.esr_el2 = read_sysreg_el2(SYS_ESR);
>  
>  	/*
>  	 * We're using the raw exception code in order to only process
> @@ -708,8 +708,8 @@ static void __hyp_text __hyp_call_panic_nvhe(u64 spsr, u64 elr, u64 par,
>  	asm volatile("ldr %0, =__hyp_panic_string" : "=r" (str_va));
>  
>  	__hyp_do_panic(str_va,
> -		       spsr,  elr,
> -		       read_sysreg(esr_el2),   read_sysreg_el2(far),
> +		       spsr, elr,
> +		       read_sysreg(esr_el2), read_sysreg_el2(SYS_FAR),
Seems to me we are pretty sure here we don't have VHE, so why not make both
reads either read_sysreg or read_sysreg_el2 for consistency? Am I missing something?
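I.e., just to illustrate, either

	read_sysreg(esr_el2), read_sysreg(far_el2),

or

	read_sysreg_el2(SYS_ESR), read_sysreg_el2(SYS_FAR),

(assuming I'm reading the !VHE path right).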
>  		       read_sysreg(hpfar_el2), par, vcpu);
>  }
>  
> @@ -724,15 +724,15 @@ static void __hyp_call_panic_vhe(u64 spsr, u64 elr, u64 par,
>  
>  	panic(__hyp_panic_string,
>  	      spsr,  elr,
> -	      read_sysreg_el2(esr),   read_sysreg_el2(far),
> +	      read_sysreg_el2(SYS_ESR),   read_sysreg_el2(SYS_FAR),
>  	      read_sysreg(hpfar_el2), par, vcpu);
>  }
>  NOKPROBE_SYMBOL(__hyp_call_panic_vhe);
>  
>  void __hyp_text __noreturn hyp_panic(struct kvm_cpu_context *host_ctxt)
>  {
> -	u64 spsr = read_sysreg_el2(spsr);
> -	u64 elr = read_sysreg_el2(elr);
> +	u64 spsr = read_sysreg_el2(SYS_SPSR);
> +	u64 elr = read_sysreg_el2(SYS_ELR);
>  	u64 par = read_sysreg(par_el1);
>  
>  	if (!has_vhe())
> diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
> index c52a8451637c..62866a68e852 100644
> --- a/arch/arm64/kvm/hyp/sysreg-sr.c
> +++ b/arch/arm64/kvm/hyp/sysreg-sr.c
> @@ -54,33 +54,33 @@ static void __hyp_text __sysreg_save_user_state(struct kvm_cpu_context *ctxt)
>  static void __hyp_text __sysreg_save_el1_state(struct kvm_cpu_context *ctxt)
>  {
>  	ctxt->sys_regs[CSSELR_EL1]	= read_sysreg(csselr_el1);
> -	ctxt->sys_regs[SCTLR_EL1]	= read_sysreg_el1(sctlr);
> +	ctxt->sys_regs[SCTLR_EL1]	= read_sysreg_el1(SYS_SCTLR);
>  	ctxt->sys_regs[ACTLR_EL1]	= read_sysreg(actlr_el1);
> -	ctxt->sys_regs[CPACR_EL1]	= read_sysreg_el1(cpacr);
> -	ctxt->sys_regs[TTBR0_EL1]	= read_sysreg_el1(ttbr0);
> -	ctxt->sys_regs[TTBR1_EL1]	= read_sysreg_el1(ttbr1);
> -	ctxt->sys_regs[TCR_EL1]		= read_sysreg_el1(tcr);
> -	ctxt->sys_regs[ESR_EL1]		= read_sysreg_el1(esr);
> -	ctxt->sys_regs[AFSR0_EL1]	= read_sysreg_el1(afsr0);
> -	ctxt->sys_regs[AFSR1_EL1]	= read_sysreg_el1(afsr1);
> -	ctxt->sys_regs[FAR_EL1]		= read_sysreg_el1(far);
> -	ctxt->sys_regs[MAIR_EL1]	= read_sysreg_el1(mair);
> -	ctxt->sys_regs[VBAR_EL1]	= read_sysreg_el1(vbar);
> -	ctxt->sys_regs[CONTEXTIDR_EL1]	= read_sysreg_el1(contextidr);
> -	ctxt->sys_regs[AMAIR_EL1]	= read_sysreg_el1(amair);
> -	ctxt->sys_regs[CNTKCTL_EL1]	= read_sysreg_el1(cntkctl);
> +	ctxt->sys_regs[CPACR_EL1]	= read_sysreg_el1(SYS_CPACR);
> +	ctxt->sys_regs[TTBR0_EL1]	= read_sysreg_el1(SYS_TTBR0);
> +	ctxt->sys_regs[TTBR1_EL1]	= read_sysreg_el1(SYS_TTBR1);
> +	ctxt->sys_regs[TCR_EL1]		= read_sysreg_el1(SYS_TCR);
> +	ctxt->sys_regs[ESR_EL1]		= read_sysreg_el1(SYS_ESR);
> +	ctxt->sys_regs[AFSR0_EL1]	= read_sysreg_el1(SYS_AFSR0);
> +	ctxt->sys_regs[AFSR1_EL1]	= read_sysreg_el1(SYS_AFSR1);
> +	ctxt->sys_regs[FAR_EL1]		= read_sysreg_el1(SYS_FAR);
> +	ctxt->sys_regs[MAIR_EL1]	= read_sysreg_el1(SYS_MAIR);
> +	ctxt->sys_regs[VBAR_EL1]	= read_sysreg_el1(SYS_VBAR);
> +	ctxt->sys_regs[CONTEXTIDR_EL1]	= read_sysreg_el1(SYS_CONTEXTIDR);
> +	ctxt->sys_regs[AMAIR_EL1]	= read_sysreg_el1(SYS_AMAIR);
> +	ctxt->sys_regs[CNTKCTL_EL1]	= read_sysreg_el1(SYS_CNTKCTL);
>  	ctxt->sys_regs[PAR_EL1]		= read_sysreg(par_el1);
>  	ctxt->sys_regs[TPIDR_EL1]	= read_sysreg(tpidr_el1);
>  
>  	ctxt->gp_regs.sp_el1		= read_sysreg(sp_el1);
> -	ctxt->gp_regs.elr_el1		= read_sysreg_el1(elr);
> -	ctxt->gp_regs.spsr[KVM_SPSR_EL1]= read_sysreg_el1(spsr);
> +	ctxt->gp_regs.elr_el1		= read_sysreg_el1(SYS_ELR);
> +	ctxt->gp_regs.spsr[KVM_SPSR_EL1]= read_sysreg_el1(SYS_SPSR);
>  }
>  
>  static void __hyp_text __sysreg_save_el2_return_state(struct kvm_cpu_context *ctxt)
>  {
> -	ctxt->gp_regs.regs.pc		= read_sysreg_el2(elr);
> -	ctxt->gp_regs.regs.pstate	= read_sysreg_el2(spsr);
> +	ctxt->gp_regs.regs.pc		= read_sysreg_el2(SYS_ELR);
> +	ctxt->gp_regs.regs.pstate	= read_sysreg_el2(SYS_SPSR);
>  
>  	if (cpus_have_const_cap(ARM64_HAS_RAS_EXTN))
>  		ctxt->sys_regs[DISR_EL1] = read_sysreg_s(SYS_VDISR_EL2);
> @@ -120,35 +120,35 @@ static void __hyp_text __sysreg_restore_common_state(struct kvm_cpu_context *ctx
>  
>  static void __hyp_text __sysreg_restore_user_state(struct kvm_cpu_context *ctxt)
>  {
> -	write_sysreg(ctxt->sys_regs[TPIDR_EL0],	  	tpidr_el0);
> -	write_sysreg(ctxt->sys_regs[TPIDRRO_EL0], 	tpidrro_el0);
> +	write_sysreg(ctxt->sys_regs[TPIDR_EL0],		tpidr_el0);
> +	write_sysreg(ctxt->sys_regs[TPIDRRO_EL0],	tpidrro_el0);
>  }
>  
>  static void __hyp_text __sysreg_restore_el1_state(struct kvm_cpu_context *ctxt)
>  {
>  	write_sysreg(ctxt->sys_regs[MPIDR_EL1],		vmpidr_el2);
>  	write_sysreg(ctxt->sys_regs[CSSELR_EL1],	csselr_el1);
> -	write_sysreg_el1(ctxt->sys_regs[SCTLR_EL1],	sctlr);
> -	write_sysreg(ctxt->sys_regs[ACTLR_EL1],	  	actlr_el1);
> -	write_sysreg_el1(ctxt->sys_regs[CPACR_EL1],	cpacr);
> -	write_sysreg_el1(ctxt->sys_regs[TTBR0_EL1],	ttbr0);
> -	write_sysreg_el1(ctxt->sys_regs[TTBR1_EL1],	ttbr1);
> -	write_sysreg_el1(ctxt->sys_regs[TCR_EL1],	tcr);
> -	write_sysreg_el1(ctxt->sys_regs[ESR_EL1],	esr);
> -	write_sysreg_el1(ctxt->sys_regs[AFSR0_EL1],	afsr0);
> -	write_sysreg_el1(ctxt->sys_regs[AFSR1_EL1],	afsr1);
> -	write_sysreg_el1(ctxt->sys_regs[FAR_EL1],	far);
> -	write_sysreg_el1(ctxt->sys_regs[MAIR_EL1],	mair);
> -	write_sysreg_el1(ctxt->sys_regs[VBAR_EL1],	vbar);
> -	write_sysreg_el1(ctxt->sys_regs[CONTEXTIDR_EL1],contextidr);
> -	write_sysreg_el1(ctxt->sys_regs[AMAIR_EL1],	amair);
> -	write_sysreg_el1(ctxt->sys_regs[CNTKCTL_EL1], 	cntkctl);
> +	write_sysreg_el1(ctxt->sys_regs[SCTLR_EL1],	SYS_SCTLR);
> +	write_sysreg(ctxt->sys_regs[ACTLR_EL1],		actlr_el1);
> +	write_sysreg_el1(ctxt->sys_regs[CPACR_EL1],	SYS_CPACR);
> +	write_sysreg_el1(ctxt->sys_regs[TTBR0_EL1],	SYS_TTBR0);
> +	write_sysreg_el1(ctxt->sys_regs[TTBR1_EL1],	SYS_TTBR1);
> +	write_sysreg_el1(ctxt->sys_regs[TCR_EL1],	SYS_TCR);
> +	write_sysreg_el1(ctxt->sys_regs[ESR_EL1],	SYS_ESR);
> +	write_sysreg_el1(ctxt->sys_regs[AFSR0_EL1],	SYS_AFSR0);
> +	write_sysreg_el1(ctxt->sys_regs[AFSR1_EL1],	SYS_AFSR1);
> +	write_sysreg_el1(ctxt->sys_regs[FAR_EL1],	SYS_FAR);
> +	write_sysreg_el1(ctxt->sys_regs[MAIR_EL1],	SYS_MAIR);
> +	write_sysreg_el1(ctxt->sys_regs[VBAR_EL1],	SYS_VBAR);
> +	write_sysreg_el1(ctxt->sys_regs[CONTEXTIDR_EL1],SYS_CONTEXTIDR);
> +	write_sysreg_el1(ctxt->sys_regs[AMAIR_EL1],	SYS_AMAIR);
> +	write_sysreg_el1(ctxt->sys_regs[CNTKCTL_EL1],	SYS_CNTKCTL);
>  	write_sysreg(ctxt->sys_regs[PAR_EL1],		par_el1);
>  	write_sysreg(ctxt->sys_regs[TPIDR_EL1],		tpidr_el1);
>  
>  	write_sysreg(ctxt->gp_regs.sp_el1,		sp_el1);
> -	write_sysreg_el1(ctxt->gp_regs.elr_el1,		elr);
> -	write_sysreg_el1(ctxt->gp_regs.spsr[KVM_SPSR_EL1],spsr);
> +	write_sysreg_el1(ctxt->gp_regs.elr_el1,		SYS_ELR);
> +	write_sysreg_el1(ctxt->gp_regs.spsr[KVM_SPSR_EL1],SYS_SPSR);
>  }
>  
>  static void __hyp_text
> @@ -171,8 +171,8 @@ __sysreg_restore_el2_return_state(struct kvm_cpu_context *ctxt)
>  	if (!(mode & PSR_MODE32_BIT) && mode >= PSR_MODE_EL2t)
>  		pstate = PSR_MODE_EL2h | PSR_IL_BIT;
>  
> -	write_sysreg_el2(ctxt->gp_regs.regs.pc,		elr);
> -	write_sysreg_el2(pstate,			spsr);
> +	write_sysreg_el2(ctxt->gp_regs.regs.pc,		SYS_ELR);
> +	write_sysreg_el2(pstate,			SYS_SPSR);
>  
>  	if (cpus_have_const_cap(ARM64_HAS_RAS_EXTN))
>  		write_sysreg_s(ctxt->sys_regs[DISR_EL1], SYS_VDISR_EL2);
> diff --git a/arch/arm64/kvm/hyp/tlb.c b/arch/arm64/kvm/hyp/tlb.c
> index 76c30866069e..32a782bb00be 100644
> --- a/arch/arm64/kvm/hyp/tlb.c
> +++ b/arch/arm64/kvm/hyp/tlb.c
> @@ -44,12 +44,12 @@ static void __hyp_text __tlb_switch_to_guest_vhe(struct kvm *kvm,
>  		 * in the TCR_EL1 register. We also need to prevent it to
>  		 * allocate IPA->PA walks, so we enable the S1 MMU...
>  		 */
> -		val = cxt->tcr = read_sysreg_el1(tcr);
> +		val = cxt->tcr = read_sysreg_el1(SYS_TCR);
>  		val |= TCR_EPD1_MASK | TCR_EPD0_MASK;
> -		write_sysreg_el1(val, tcr);
> -		val = cxt->sctlr = read_sysreg_el1(sctlr);
> +		write_sysreg_el1(val, SYS_TCR);
> +		val = cxt->sctlr = read_sysreg_el1(SYS_SCTLR);
>  		val |= SCTLR_ELx_M;
> -		write_sysreg_el1(val, sctlr);
> +		write_sysreg_el1(val, SYS_SCTLR);
>  	}
>  
>  	/*
> @@ -96,8 +96,8 @@ static void __hyp_text __tlb_switch_to_host_vhe(struct kvm *kvm,
>  
>  	if (cpus_have_const_cap(ARM64_WORKAROUND_1165522)) {
>  		/* Restore the registers to what they were */
> -		write_sysreg_el1(cxt->tcr, tcr);
> -		write_sysreg_el1(cxt->sctlr, sctlr);
> +		write_sysreg_el1(cxt->tcr, SYS_TCR);
> +		write_sysreg_el1(cxt->sctlr, SYS_SCTLR);
>  	}
>  
>  	local_irq_restore(cxt->flags);
> diff --git a/arch/arm64/kvm/hyp/vgic-v2-cpuif-proxy.c b/arch/arm64/kvm/hyp/vgic-v2-cpuif-proxy.c
> index 9cbdd034a563..4cd32c856110 100644
> --- a/arch/arm64/kvm/hyp/vgic-v2-cpuif-proxy.c
> +++ b/arch/arm64/kvm/hyp/vgic-v2-cpuif-proxy.c
> @@ -27,7 +27,7 @@
>  static bool __hyp_text __is_be(struct kvm_vcpu *vcpu)
>  {
>  	if (vcpu_mode_is_32bit(vcpu))
> -		return !!(read_sysreg_el2(spsr) & PSR_AA32_E_BIT);
> +		return !!(read_sysreg_el2(SYS_SPSR) & PSR_AA32_E_BIT);
>  
>  	return !!(read_sysreg(SCTLR_EL1) & SCTLR_ELx_EE);
>  }
> diff --git a/arch/arm64/kvm/regmap.c b/arch/arm64/kvm/regmap.c
> index 7a5173ea2276..5dd110b384e4 100644
> --- a/arch/arm64/kvm/regmap.c
> +++ b/arch/arm64/kvm/regmap.c
> @@ -163,7 +163,7 @@ unsigned long vcpu_read_spsr32(const struct kvm_vcpu *vcpu)
>  
>  	switch (spsr_idx) {
>  	case KVM_SPSR_SVC:
> -		return read_sysreg_el1(spsr);
> +		return read_sysreg_el1(SYS_SPSR);
>  	case KVM_SPSR_ABT:
>  		return read_sysreg(spsr_abt);
>  	case KVM_SPSR_UND:
> @@ -188,7 +188,7 @@ void vcpu_write_spsr32(struct kvm_vcpu *vcpu, unsigned long v)
>  
>  	switch (spsr_idx) {
>  	case KVM_SPSR_SVC:
> -		write_sysreg_el1(v, spsr);
> +		write_sysreg_el1(v, SYS_SPSR);
>  	case KVM_SPSR_ABT:
>  		write_sysreg(v, spsr_abt);
>  	case KVM_SPSR_UND:
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 857b226bcdde..adb8a7e9c8e4 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -92,24 +92,24 @@ u64 vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg)
>  	 */
>  	switch (reg) {
>  	case CSSELR_EL1:	return read_sysreg_s(SYS_CSSELR_EL1);
> -	case SCTLR_EL1:		return read_sysreg_s(sctlr_EL12);
> +	case SCTLR_EL1:		return read_sysreg_s(SYS_SCTLR_EL12);
>  	case ACTLR_EL1:		return read_sysreg_s(SYS_ACTLR_EL1);
> -	case CPACR_EL1:		return read_sysreg_s(cpacr_EL12);
> -	case TTBR0_EL1:		return read_sysreg_s(ttbr0_EL12);
> -	case TTBR1_EL1:		return read_sysreg_s(ttbr1_EL12);
> -	case TCR_EL1:		return read_sysreg_s(tcr_EL12);
> -	case ESR_EL1:		return read_sysreg_s(esr_EL12);
> -	case AFSR0_EL1:		return read_sysreg_s(afsr0_EL12);
> -	case AFSR1_EL1:		return read_sysreg_s(afsr1_EL12);
> -	case FAR_EL1:		return read_sysreg_s(far_EL12);
> -	case MAIR_EL1:		return read_sysreg_s(mair_EL12);
> -	case VBAR_EL1:		return read_sysreg_s(vbar_EL12);
> -	case CONTEXTIDR_EL1:	return read_sysreg_s(contextidr_EL12);
> +	case CPACR_EL1:		return read_sysreg_s(SYS_CPACR_EL12);
> +	case TTBR0_EL1:		return read_sysreg_s(SYS_TTBR0_EL12);
> +	case TTBR1_EL1:		return read_sysreg_s(SYS_TTBR1_EL12);
> +	case TCR_EL1:		return read_sysreg_s(SYS_TCR_EL12);
> +	case ESR_EL1:		return read_sysreg_s(SYS_ESR_EL12);
> +	case AFSR0_EL1:		return read_sysreg_s(SYS_AFSR0_EL12);
> +	case AFSR1_EL1:		return read_sysreg_s(SYS_AFSR1_EL12);
> +	case FAR_EL1:		return read_sysreg_s(SYS_FAR_EL12);
> +	case MAIR_EL1:		return read_sysreg_s(SYS_MAIR_EL12);
> +	case VBAR_EL1:		return read_sysreg_s(SYS_VBAR_EL12);
> +	case CONTEXTIDR_EL1:	return read_sysreg_s(SYS_CONTEXTIDR_EL12);
>  	case TPIDR_EL0:		return read_sysreg_s(SYS_TPIDR_EL0);
>  	case TPIDRRO_EL0:	return read_sysreg_s(SYS_TPIDRRO_EL0);
>  	case TPIDR_EL1:		return read_sysreg_s(SYS_TPIDR_EL1);
> -	case AMAIR_EL1:		return read_sysreg_s(amair_EL12);
> -	case CNTKCTL_EL1:	return read_sysreg_s(cntkctl_EL12);
> +	case AMAIR_EL1:		return read_sysreg_s(SYS_AMAIR_EL12);
> +	case CNTKCTL_EL1:	return read_sysreg_s(SYS_CNTKCTL_EL12);
>  	case PAR_EL1:		return read_sysreg_s(SYS_PAR_EL1);
>  	case DACR32_EL2:	return read_sysreg_s(SYS_DACR32_EL2);
>  	case IFSR32_EL2:	return read_sysreg_s(SYS_IFSR32_EL2);
> @@ -135,24 +135,24 @@ void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, u64 val, int reg)
>  	 */
>  	switch (reg) {
>  	case CSSELR_EL1:	write_sysreg_s(val, SYS_CSSELR_EL1);	return;
> -	case SCTLR_EL1:		write_sysreg_s(val, sctlr_EL12);	return;
> +	case SCTLR_EL1:		write_sysreg_s(val, SYS_SCTLR_EL12);	return;
>  	case ACTLR_EL1:		write_sysreg_s(val, SYS_ACTLR_EL1);	return;
> -	case CPACR_EL1:		write_sysreg_s(val, cpacr_EL12);	return;
> -	case TTBR0_EL1:		write_sysreg_s(val, ttbr0_EL12);	return;
> -	case TTBR1_EL1:		write_sysreg_s(val, ttbr1_EL12);	return;
> -	case TCR_EL1:		write_sysreg_s(val, tcr_EL12);		return;
> -	case ESR_EL1:		write_sysreg_s(val, esr_EL12);		return;
> -	case AFSR0_EL1:		write_sysreg_s(val, afsr0_EL12);	return;
> -	case AFSR1_EL1:		write_sysreg_s(val, afsr1_EL12);	return;
> -	case FAR_EL1:		write_sysreg_s(val, far_EL12);		return;
> -	case MAIR_EL1:		write_sysreg_s(val, mair_EL12);		return;
> -	case VBAR_EL1:		write_sysreg_s(val, vbar_EL12);		return;
> -	case CONTEXTIDR_EL1:	write_sysreg_s(val, contextidr_EL12);	return;
> +	case CPACR_EL1:		write_sysreg_s(val, SYS_CPACR_EL12);	return;
> +	case TTBR0_EL1:		write_sysreg_s(val, SYS_TTBR0_EL12);	return;
> +	case TTBR1_EL1:		write_sysreg_s(val, SYS_TTBR1_EL12);	return;
> +	case TCR_EL1:		write_sysreg_s(val, SYS_TCR_EL12);	return;
> +	case ESR_EL1:		write_sysreg_s(val, SYS_ESR_EL12);	return;
> +	case AFSR0_EL1:		write_sysreg_s(val, SYS_AFSR0_EL12);	return;
> +	case AFSR1_EL1:		write_sysreg_s(val, SYS_AFSR1_EL12);	return;
> +	case FAR_EL1:		write_sysreg_s(val, SYS_FAR_EL12);	return;
> +	case MAIR_EL1:		write_sysreg_s(val, SYS_MAIR_EL12);	return;
> +	case VBAR_EL1:		write_sysreg_s(val, SYS_VBAR_EL12);	return;
> +	case CONTEXTIDR_EL1:	write_sysreg_s(val, SYS_CONTEXTIDR_EL12); return;
>  	case TPIDR_EL0:		write_sysreg_s(val, SYS_TPIDR_EL0);	return;
>  	case TPIDRRO_EL0:	write_sysreg_s(val, SYS_TPIDRRO_EL0);	return;
>  	case TPIDR_EL1:		write_sysreg_s(val, SYS_TPIDR_EL1);	return;
> -	case AMAIR_EL1:		write_sysreg_s(val, amair_EL12);	return;
> -	case CNTKCTL_EL1:	write_sysreg_s(val, cntkctl_EL12);	return;
> +	case AMAIR_EL1:		write_sysreg_s(val, SYS_AMAIR_EL12);	return;
> +	case CNTKCTL_EL1:	write_sysreg_s(val, SYS_CNTKCTL_EL12);	return;
>  	case PAR_EL1:		write_sysreg_s(val, SYS_PAR_EL1);	return;
>  	case DACR32_EL2:	write_sysreg_s(val, SYS_DACR32_EL2);	return;
>  	case IFSR32_EL2:	write_sysreg_s(val, SYS_IFSR32_EL2);	return;
> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> index 05ddb6293b79..089441a07ed7 100644
> --- a/virt/kvm/arm/arch_timer.c
> +++ b/virt/kvm/arm/arch_timer.c
> @@ -237,10 +237,10 @@ static bool kvm_timer_should_fire(struct arch_timer_context *timer_ctx)
>  
>  		switch (index) {
>  		case TIMER_VTIMER:
> -			cnt_ctl = read_sysreg_el0(cntv_ctl);
> +			cnt_ctl = read_sysreg_el0(SYS_CNTV_CTL);
>  			break;
>  		case TIMER_PTIMER:
> -			cnt_ctl = read_sysreg_el0(cntp_ctl);
> +			cnt_ctl = read_sysreg_el0(SYS_CNTP_CTL);
>  			break;
>  		case NR_KVM_TIMERS:
>  			/* GCC is braindead */
> @@ -349,20 +349,20 @@ static void timer_save_state(struct arch_timer_context *ctx)
>  
>  	switch (index) {
>  	case TIMER_VTIMER:
> -		ctx->cnt_ctl = read_sysreg_el0(cntv_ctl);
> -		ctx->cnt_cval = read_sysreg_el0(cntv_cval);
> +		ctx->cnt_ctl = read_sysreg_el0(SYS_CNTV_CTL);
> +		ctx->cnt_cval = read_sysreg_el0(SYS_CNTV_CVAL);
>  
>  		/* Disable the timer */
> -		write_sysreg_el0(0, cntv_ctl);
> +		write_sysreg_el0(0, SYS_CNTV_CTL);
>  		isb();
>  
>  		break;
>  	case TIMER_PTIMER:
> -		ctx->cnt_ctl = read_sysreg_el0(cntp_ctl);
> -		ctx->cnt_cval = read_sysreg_el0(cntp_cval);
> +		ctx->cnt_ctl = read_sysreg_el0(SYS_CNTP_CTL);
> +		ctx->cnt_cval = read_sysreg_el0(SYS_CNTP_CVAL);
>  
>  		/* Disable the timer */
> -		write_sysreg_el0(0, cntp_ctl);
> +		write_sysreg_el0(0, SYS_CNTP_CTL);
>  		isb();
>  
>  		break;
> @@ -428,14 +428,14 @@ static void timer_restore_state(struct arch_timer_context *ctx)
>  
>  	switch (index) {
>  	case TIMER_VTIMER:
> -		write_sysreg_el0(ctx->cnt_cval, cntv_cval);
> +		write_sysreg_el0(ctx->cnt_cval, SYS_CNTV_CVAL);
>  		isb();
> -		write_sysreg_el0(ctx->cnt_ctl, cntv_ctl);
> +		write_sysreg_el0(ctx->cnt_ctl, SYS_CNTV_CTL);
>  		break;
>  	case TIMER_PTIMER:
> -		write_sysreg_el0(ctx->cnt_cval, cntp_cval);
> +		write_sysreg_el0(ctx->cnt_cval, SYS_CNTP_CVAL);
>  		isb();
> -		write_sysreg_el0(ctx->cnt_ctl, cntp_ctl);
> +		write_sysreg_el0(ctx->cnt_ctl, SYS_CNTP_CTL);
>  		break;
>  	case NR_KVM_TIMERS:
>  		BUG();

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH 09/59] KVM: arm64: nv: Add nested virt VCPU primitives for vEL2 VCPU state
  2019-06-21  9:37 ` [PATCH 09/59] KVM: arm64: nv: Add nested virt VCPU primitives for vEL2 VCPU state Marc Zyngier
@ 2019-06-24 13:08   ` Dave Martin
  0 siblings, 0 replies; 176+ messages in thread
From: Dave Martin @ 2019-06-24 13:08 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, Julien Thierry, Andre Przywara,
	Suzuki K Poulose, Christoffer Dall, James Morse, Jintack Lim

On Fri, Jun 21, 2019 at 10:37:53AM +0100, Marc Zyngier wrote:
> From: Christoffer Dall <christoffer.dall@arm.com>
> 
> When running a nested hypervisor we commonly have to figure out if
> the VCPU is running in the context of a guest hypervisor, of the guest
> hypervisor's own guest, or just a normal guest.
> 
> Add convenient primitives for this.
> 
> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  arch/arm64/include/asm/kvm_emulate.h | 55 ++++++++++++++++++++++++++++
>  1 file changed, 55 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> index 39ffe41855bc..8f201ea56f6e 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -191,6 +191,61 @@ static inline void vcpu_set_reg(struct kvm_vcpu *vcpu, u8 reg_num,
>  		vcpu_gp_regs(vcpu)->regs.regs[reg_num] = val;
>  }
>  
> +static inline bool vcpu_mode_el2_ctxt(const struct kvm_cpu_context *ctxt)
> +{
> +	unsigned long cpsr = ctxt->gp_regs.regs.pstate;
> +	u32 mode;
> +
> +	if (cpsr & PSR_MODE32_BIT)
> +		return false;
> +
> +	mode = cpsr & PSR_MODE_MASK;
> +
> +	return mode == PSR_MODE_EL2h || mode == PSR_MODE_EL2t;

We could also treat PSR_MODE32_BIT and PSR_MODE_MASK as a single field,
as in the next patch, say:

	switch (ctxt->gp_regs.regs.pstate & (PSR_MODE32_BIT | PSR_MODE_MASK)) {
	case PSR_MODE_EL2h:
	case PSR_MODE_EL2t:
		return true;
	}

	return false;

(This is blatant bikeshedding...)

[...]

Cheers
---Dave

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH 15/59] KVM: arm64: nv: Refactor vcpu_{read,write}_sys_reg
  2019-06-21  9:37 ` [PATCH 15/59] KVM: arm64: nv: Refactor vcpu_{read,write}_sys_reg Marc Zyngier
@ 2019-06-24 15:07   ` Julien Thierry
  2019-07-03 13:09     ` Marc Zyngier
  2019-06-27  9:21   ` Alexandru Elisei
  1 sibling, 1 reply; 176+ messages in thread
From: Julien Thierry @ 2019-06-24 15:07 UTC (permalink / raw)
  To: Marc Zyngier, linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	James Morse, Suzuki K Poulose



On 06/21/2019 10:37 AM, Marc Zyngier wrote:
> Extract the direct HW accessors for later reuse.
> 
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  arch/arm64/kvm/sys_regs.c | 247 +++++++++++++++++++++-----------------
>  1 file changed, 139 insertions(+), 108 deletions(-)
> 
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 2b8734f75a09..e181359adadf 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -182,99 +182,161 @@ const struct el2_sysreg_map *find_el2_sysreg(const struct el2_sysreg_map *map,
>  	return entry;
>  }
>  
> +static bool __vcpu_read_sys_reg_from_cpu(int reg, u64 *val)
> +{
> +	/*
> +	 * System registers listed in the switch are not saved on every
> +	 * exit from the guest but are only saved on vcpu_put.
> +	 *
> +	 * Note that MPIDR_EL1 for the guest is set by KVM via VMPIDR_EL2 but
> +	 * should never be listed below, because the guest cannot modify its
> +	 * own MPIDR_EL1 and MPIDR_EL1 is accessed for VCPU A from VCPU B's
> +	 * thread when emulating cross-VCPU communication.
> +	 */
> +	switch (reg) {
> +	case CSSELR_EL1:	*val = read_sysreg_s(SYS_CSSELR_EL1);	break;
> +	case SCTLR_EL1:		*val = read_sysreg_s(SYS_SCTLR_EL12);	break;
> +	case ACTLR_EL1:		*val = read_sysreg_s(SYS_ACTLR_EL1);	break;
> +	case CPACR_EL1:		*val = read_sysreg_s(SYS_CPACR_EL12);	break;
> +	case TTBR0_EL1:		*val = read_sysreg_s(SYS_TTBR0_EL12);	break;
> +	case TTBR1_EL1:		*val = read_sysreg_s(SYS_TTBR1_EL12);	break;
> +	case TCR_EL1:		*val = read_sysreg_s(SYS_TCR_EL12);	break;
> +	case ESR_EL1:		*val = read_sysreg_s(SYS_ESR_EL12);	break;
> +	case AFSR0_EL1:		*val = read_sysreg_s(SYS_AFSR0_EL12);	break;
> +	case AFSR1_EL1:		*val = read_sysreg_s(SYS_AFSR1_EL12);	break;
> +	case FAR_EL1:		*val = read_sysreg_s(SYS_FAR_EL12);	break;
> +	case MAIR_EL1:		*val = read_sysreg_s(SYS_MAIR_EL12);	break;
> +	case VBAR_EL1:		*val = read_sysreg_s(SYS_VBAR_EL12);	break;
> +	case CONTEXTIDR_EL1:	*val = read_sysreg_s(SYS_CONTEXTIDR_EL12);break;
> +	case TPIDR_EL0:		*val = read_sysreg_s(SYS_TPIDR_EL0);	break;
> +	case TPIDRRO_EL0:	*val = read_sysreg_s(SYS_TPIDRRO_EL0);	break;
> +	case TPIDR_EL1:		*val = read_sysreg_s(SYS_TPIDR_EL1);	break;
> +	case AMAIR_EL1:		*val = read_sysreg_s(SYS_AMAIR_EL12);	break;
> +	case CNTKCTL_EL1:	*val = read_sysreg_s(SYS_CNTKCTL_EL12);	break;
> +	case PAR_EL1:		*val = read_sysreg_s(SYS_PAR_EL1);	break;
> +	case DACR32_EL2:	*val = read_sysreg_s(SYS_DACR32_EL2);	break;
> +	case IFSR32_EL2:	*val = read_sysreg_s(SYS_IFSR32_EL2);	break;
> +	case DBGVCR32_EL2:	*val = read_sysreg_s(SYS_DBGVCR32_EL2);	break;
> +	default:		return false;
> +	}
> +
> +	return true;
> +}
> +
> +static bool __vcpu_write_sys_reg_to_cpu(u64 val, int reg)
> +{
> +	/*
> +	 * System registers listed in the switch are not restored on every
> +	 * entry to the guest but are only restored on vcpu_load.
> +	 *
> +	 * Note that MPIDR_EL1 for the guest is set by KVM via VMPIDR_EL2 but
> +	 * should never be listed below, because the MPIDR should only be
> +	 * set once, before running the VCPU, and never changed later.
> +	 */
> +	switch (reg) {
> +	case CSSELR_EL1:	write_sysreg_s(val, SYS_CSSELR_EL1);	break;
> +	case SCTLR_EL1:		write_sysreg_s(val, SYS_SCTLR_EL12);	break;
> +	case ACTLR_EL1:		write_sysreg_s(val, SYS_ACTLR_EL1);	break;
> +	case CPACR_EL1:		write_sysreg_s(val, SYS_CPACR_EL12);	break;
> +	case TTBR0_EL1:		write_sysreg_s(val, SYS_TTBR0_EL12);	break;
> +	case TTBR1_EL1:		write_sysreg_s(val, SYS_TTBR1_EL12);	break;
> +	case TCR_EL1:		write_sysreg_s(val, SYS_TCR_EL12);	break;
> +	case ESR_EL1:		write_sysreg_s(val, SYS_ESR_EL12);	break;
> +	case AFSR0_EL1:		write_sysreg_s(val, SYS_AFSR0_EL12);	break;
> +	case AFSR1_EL1:		write_sysreg_s(val, SYS_AFSR1_EL12);	break;
> +	case FAR_EL1:		write_sysreg_s(val, SYS_FAR_EL12);	break;
> +	case MAIR_EL1:		write_sysreg_s(val, SYS_MAIR_EL12);	break;
> +	case VBAR_EL1:		write_sysreg_s(val, SYS_VBAR_EL12);	break;
> +	case CONTEXTIDR_EL1:	write_sysreg_s(val, SYS_CONTEXTIDR_EL12);break;
> +	case TPIDR_EL0:		write_sysreg_s(val, SYS_TPIDR_EL0);	break;
> +	case TPIDRRO_EL0:	write_sysreg_s(val, SYS_TPIDRRO_EL0);	break;
> +	case TPIDR_EL1:		write_sysreg_s(val, SYS_TPIDR_EL1);	break;
> +	case AMAIR_EL1:		write_sysreg_s(val, SYS_AMAIR_EL12);	break;
> +	case CNTKCTL_EL1:	write_sysreg_s(val, SYS_CNTKCTL_EL12);	break;
> +	case PAR_EL1:		write_sysreg_s(val, SYS_PAR_EL1);	break;
> +	case DACR32_EL2:	write_sysreg_s(val, SYS_DACR32_EL2);	break;
> +	case IFSR32_EL2:	write_sysreg_s(val, SYS_IFSR32_EL2);	break;
> +	case DBGVCR32_EL2:	write_sysreg_s(val, SYS_DBGVCR32_EL2);	break;
> +	default:		return false;
> +	}
> +
> +	return true;
> +}
> +
>  u64 vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg)
>  {
> -	u64 val;
> +	u64 val = 0x8badf00d8badf00d;
>  
>  	if (!vcpu->arch.sysregs_loaded_on_cpu)
> -		goto immediate_read;
> +		goto memory_read;
>  
>  	if (unlikely(sysreg_is_el2(reg))) {
>  		const struct el2_sysreg_map *el2_reg;
>  
>  		if (!is_hyp_ctxt(vcpu))
> -			goto immediate_read;
> +			goto memory_read;
>  
>  		switch (reg) {
> +		case ELR_EL2:
> +			return read_sysreg_el1(SYS_ELR);

Hmmm, this change feels a bit out of place.

Also, patch 13 added ELR_EL2 and SP_EL2 to the switch cases for physical
sysreg accesses. Now ELR_EL2 is moved out of the main switch cases and
SP_EL2 is completely omitted.

I'd say either patch 13 needs to be reworked, or a separate patch should
be extracted from this one to provide an intermediate state, or the
commit message on this patch should be more detailed.

Cheers,

-- 
Julien Thierry

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH 07/59] KVM: arm64: nv: Add EL2 system registers to vcpu context
  2019-06-21  9:37 ` [PATCH 07/59] KVM: arm64: nv: Add EL2 system registers to vcpu context Marc Zyngier
  2019-06-24 12:54   ` Dave Martin
@ 2019-06-24 15:47   ` Alexandru Elisei
  2019-07-03 13:20     ` Marc Zyngier
  2019-07-01 16:36   ` Suzuki K Poulose
  2 siblings, 1 reply; 176+ messages in thread
From: Alexandru Elisei @ 2019-06-24 15:47 UTC (permalink / raw)
  To: Marc Zyngier, linux-arm-kernel, kvmarm, kvm; +Cc: Andre Przywara, Dave Martin

On 6/21/19 10:37 AM, Marc Zyngier wrote:
> From: Jintack Lim <jintack.lim@linaro.org>
>
> ARM v8.3 introduces a new bit in the HCR_EL2, which is the NV bit. When
> this bit is set, accessing EL2 registers in EL1 traps to EL2. In
> addition, executing the following instructions in EL1 will trap to EL2:
> tlbi, at, eret, and msr/mrs instructions to access SP_EL1. Most of the
> instructions that trap to EL2 with the NV bit were undef at EL1 prior to
> ARM v8.3. The only instruction that was not undef is eret.
>
> This patch sets up a handler for EL2 registers and SP_EL1 register
> accesses at EL1. The host hypervisor keeps those register values in
> memory, and will emulate their behavior.
>
> This patch doesn't set the NV bit yet. It will be set in a later patch
> once nested virtualization support is completed.
>
> Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  arch/arm64/include/asm/kvm_host.h | 37 +++++++++++++++-
>  arch/arm64/include/asm/sysreg.h   | 50 ++++++++++++++++++++-
>  arch/arm64/kvm/sys_regs.c         | 74 ++++++++++++++++++++++++++++---
>  3 files changed, 154 insertions(+), 7 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 4bcd9c1291d5..2d4290d2513a 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -173,12 +173,47 @@ enum vcpu_sysreg {
>  	APGAKEYLO_EL1,
>  	APGAKEYHI_EL1,
>  
> -	/* 32bit specific registers. Keep them at the end of the range */
> +	/* 32bit specific registers. */
>  	DACR32_EL2,	/* Domain Access Control Register */
>  	IFSR32_EL2,	/* Instruction Fault Status Register */
>  	FPEXC32_EL2,	/* Floating-Point Exception Control Register */
>  	DBGVCR32_EL2,	/* Debug Vector Catch Register */
>  
> +	/* EL2 registers sorted ascending by Op0, Op1, CRn, CRm, Op2 */
> +	FIRST_EL2_SYSREG,
> +	VPIDR_EL2 = FIRST_EL2_SYSREG,
> +			/* Virtualization Processor ID Register */
> +	VMPIDR_EL2,	/* Virtualization Multiprocessor ID Register */
> +	SCTLR_EL2,	/* System Control Register (EL2) */
> +	ACTLR_EL2,	/* Auxiliary Control Register (EL2) */
> +	HCR_EL2,	/* Hypervisor Configuration Register */
> +	MDCR_EL2,	/* Monitor Debug Configuration Register (EL2) */
> +	CPTR_EL2,	/* Architectural Feature Trap Register (EL2) */
> +	HSTR_EL2,	/* Hypervisor System Trap Register */
> +	HACR_EL2,	/* Hypervisor Auxiliary Control Register */
> +	TTBR0_EL2,	/* Translation Table Base Register 0 (EL2) */
> +	TTBR1_EL2,	/* Translation Table Base Register 1 (EL2) */
> +	TCR_EL2,	/* Translation Control Register (EL2) */
> +	VTTBR_EL2,	/* Virtualization Translation Table Base Register */
> +	VTCR_EL2,	/* Virtualization Translation Control Register */
> +	SPSR_EL2,	/* EL2 saved program status register */
> +	ELR_EL2,	/* EL2 exception link register */
> +	AFSR0_EL2,	/* Auxiliary Fault Status Register 0 (EL2) */
> +	AFSR1_EL2,	/* Auxiliary Fault Status Register 1 (EL2) */
> +	ESR_EL2,	/* Exception Syndrome Register (EL2) */
> +	FAR_EL2,	/* Hypervisor IPA Fault Address Register */
> +	HPFAR_EL2,	/* Hypervisor IPA Fault Address Register */
> +	MAIR_EL2,	/* Memory Attribute Indirection Register (EL2) */
> +	AMAIR_EL2,	/* Auxiliary Memory Attribute Indirection Register (EL2) */
> +	VBAR_EL2,	/* Vector Base Address Register (EL2) */
> +	RVBAR_EL2,	/* Reset Vector Base Address Register */
> +	RMR_EL2,	/* Reset Management Register */
> +	CONTEXTIDR_EL2,	/* Context ID Register (EL2) */
> +	TPIDR_EL2,	/* EL2 Software Thread ID Register */
> +	CNTVOFF_EL2,	/* Counter-timer Virtual Offset register */
> +	CNTHCTL_EL2,	/* Counter-timer Hypervisor Control register */
> +	SP_EL2,		/* EL2 Stack Pointer */
> +
>  	NR_SYS_REGS	/* Nothing after this line! */
>  };
>  
> diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
> index f3ca7e4796ab..8b95f2c42c3d 100644
> --- a/arch/arm64/include/asm/sysreg.h
> +++ b/arch/arm64/include/asm/sysreg.h
> @@ -411,17 +411,49 @@
>  
>  #define SYS_PMCCFILTR_EL0		sys_reg(3, 3, 14, 15, 7)
>  
> +#define SYS_VPIDR_EL2			sys_reg(3, 4, 0, 0, 0)
> +#define SYS_VMPIDR_EL2			sys_reg(3, 4, 0, 0, 5)
> +
> +#define SYS_SCTLR_EL2			sys_reg(3, 4, 1, 0, 0)
> +#define SYS_ACTLR_EL2			sys_reg(3, 4, 1, 0, 1)
> +#define SYS_HCR_EL2			sys_reg(3, 4, 1, 1, 0)
> +#define SYS_MDCR_EL2			sys_reg(3, 4, 1, 1, 1)
> +#define SYS_CPTR_EL2			sys_reg(3, 4, 1, 1, 2)
> +#define SYS_HSTR_EL2			sys_reg(3, 4, 1, 1, 3)
> +#define SYS_HACR_EL2			sys_reg(3, 4, 1, 1, 7)
> +
>  #define SYS_ZCR_EL2			sys_reg(3, 4, 1, 2, 0)
> +
> +#define SYS_TTBR0_EL2			sys_reg(3, 4, 2, 0, 0)
> +#define SYS_TTBR1_EL2			sys_reg(3, 4, 2, 0, 1)
> +#define SYS_TCR_EL2			sys_reg(3, 4, 2, 0, 2)
> +#define SYS_VTTBR_EL2			sys_reg(3, 4, 2, 1, 0)
> +#define SYS_VTCR_EL2			sys_reg(3, 4, 2, 1, 2)
> +
>  #define SYS_DACR32_EL2			sys_reg(3, 4, 3, 0, 0)
> +
>  #define SYS_SPSR_EL2			sys_reg(3, 4, 4, 0, 0)
>  #define SYS_ELR_EL2			sys_reg(3, 4, 4, 0, 1)
> +#define SYS_SP_EL1			sys_reg(3, 4, 4, 1, 0)
> +
>  #define SYS_IFSR32_EL2			sys_reg(3, 4, 5, 0, 1)
> +#define SYS_AFSR0_EL2			sys_reg(3, 4, 5, 1, 0)
> +#define SYS_AFSR1_EL2			sys_reg(3, 4, 5, 1, 1)
>  #define SYS_ESR_EL2			sys_reg(3, 4, 5, 2, 0)
>  #define SYS_VSESR_EL2			sys_reg(3, 4, 5, 2, 3)
>  #define SYS_FPEXC32_EL2			sys_reg(3, 4, 5, 3, 0)
>  #define SYS_FAR_EL2			sys_reg(3, 4, 6, 0, 0)
>  
> -#define SYS_VDISR_EL2			sys_reg(3, 4, 12, 1,  1)
> +#define SYS_FAR_EL2			sys_reg(3, 4, 6, 0, 0)
> +#define SYS_HPFAR_EL2			sys_reg(3, 4, 6, 0, 4)
> +
> +#define SYS_MAIR_EL2			sys_reg(3, 4, 10, 2, 0)
> +#define SYS_AMAIR_EL2			sys_reg(3, 4, 10, 3, 0)
> +
> +#define SYS_VBAR_EL2			sys_reg(3, 4, 12, 0, 0)
> +#define SYS_RVBAR_EL2			sys_reg(3, 4, 12, 0, 1)
> +#define SYS_RMR_EL2			sys_reg(3, 4, 12, 0, 2)
> +#define SYS_VDISR_EL2			sys_reg(3, 4, 12, 1, 1)
>  #define __SYS__AP0Rx_EL2(x)		sys_reg(3, 4, 12, 8, x)
>  #define SYS_ICH_AP0R0_EL2		__SYS__AP0Rx_EL2(0)
>  #define SYS_ICH_AP0R1_EL2		__SYS__AP0Rx_EL2(1)
> @@ -463,23 +495,37 @@
>  #define SYS_ICH_LR14_EL2		__SYS__LR8_EL2(6)
>  #define SYS_ICH_LR15_EL2		__SYS__LR8_EL2(7)
>  
> +#define SYS_CONTEXTIDR_EL2		sys_reg(3, 4, 13, 0, 1)
> +#define SYS_TPIDR_EL2			sys_reg(3, 4, 13, 0, 2)
> +
> +#define SYS_CNTVOFF_EL2			sys_reg(3, 4, 14, 0, 3)
> +#define SYS_CNTHCTL_EL2			sys_reg(3, 4, 14, 1, 0)
> +
>  /* VHE encodings for architectural EL0/1 system registers */
>  #define SYS_SCTLR_EL12			sys_reg(3, 5, 1, 0, 0)
>  #define SYS_CPACR_EL12			sys_reg(3, 5, 1, 0, 2)
>  #define SYS_ZCR_EL12			sys_reg(3, 5, 1, 2, 0)
> +
>  #define SYS_TTBR0_EL12			sys_reg(3, 5, 2, 0, 0)
>  #define SYS_TTBR1_EL12			sys_reg(3, 5, 2, 0, 1)
>  #define SYS_TCR_EL12			sys_reg(3, 5, 2, 0, 2)
> +
>  #define SYS_SPSR_EL12			sys_reg(3, 5, 4, 0, 0)
>  #define SYS_ELR_EL12			sys_reg(3, 5, 4, 0, 1)
> +
>  #define SYS_AFSR0_EL12			sys_reg(3, 5, 5, 1, 0)
>  #define SYS_AFSR1_EL12			sys_reg(3, 5, 5, 1, 1)
>  #define SYS_ESR_EL12			sys_reg(3, 5, 5, 2, 0)
> +
>  #define SYS_FAR_EL12			sys_reg(3, 5, 6, 0, 0)
> +
>  #define SYS_MAIR_EL12			sys_reg(3, 5, 10, 2, 0)
>  #define SYS_AMAIR_EL12			sys_reg(3, 5, 10, 3, 0)
> +
>  #define SYS_VBAR_EL12			sys_reg(3, 5, 12, 0, 0)
> +
>  #define SYS_CONTEXTIDR_EL12		sys_reg(3, 5, 13, 0, 1)
> +
>  #define SYS_CNTKCTL_EL12		sys_reg(3, 5, 14, 1, 0)
>  #define SYS_CNTP_TVAL_EL02		sys_reg(3, 5, 14, 2, 0)
>  #define SYS_CNTP_CTL_EL02		sys_reg(3, 5, 14, 2, 1)
> @@ -488,6 +534,8 @@
>  #define SYS_CNTV_CTL_EL02		sys_reg(3, 5, 14, 3, 1)
>  #define SYS_CNTV_CVAL_EL02		sys_reg(3, 5, 14, 3, 2)
>  
> +#define SYS_SP_EL2			sys_reg(3, 6,  4, 1, 0)
> +
>  /* Common SCTLR_ELx flags. */
>  #define SCTLR_ELx_DSSBS	(_BITUL(44))
>  #define SCTLR_ELx_ENIA	(_BITUL(31))
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index adb8a7e9c8e4..e81be6debe07 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -184,6 +184,18 @@ static u32 get_ccsidr(u32 csselr)
>  	return ccsidr;
>  }
>  
> +static bool access_rw(struct kvm_vcpu *vcpu,
> +		      struct sys_reg_params *p,
> +		      const struct sys_reg_desc *r)
> +{
> +	if (p->is_write)
> +		vcpu_write_sys_reg(vcpu, p->regval, r->reg);
> +	else
> +		p->regval = vcpu_read_sys_reg(vcpu, r->reg);
> +
> +	return true;
> +}
> +
>  /*
>   * See note at ARMv7 ARM B1.14.4 (TL;DR: S/W ops are not easily virtualized).
>   */
> @@ -394,12 +406,9 @@ static bool trap_debug_regs(struct kvm_vcpu *vcpu,
>  			    struct sys_reg_params *p,
>  			    const struct sys_reg_desc *r)
>  {
> -	if (p->is_write) {
> -		vcpu_write_sys_reg(vcpu, p->regval, r->reg);
> +	access_rw(vcpu, p, r);
> +	if (p->is_write)
>  		vcpu->arch.flags |= KVM_ARM64_DEBUG_DIRTY;
> -	} else {
> -		p->regval = vcpu_read_sys_reg(vcpu, r->reg);
> -	}
>  
>  	trace_trap_reg(__func__, r->reg, p->is_write, p->regval);
>  
> @@ -1354,6 +1363,19 @@ static bool access_ccsidr(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
>  	.set_user = set_raz_id_reg,		\
>  }
>  
> +static bool access_sp_el1(struct kvm_vcpu *vcpu,
> +			  struct sys_reg_params *p,
> +			  const struct sys_reg_desc *r)
> +{
> +	/* SP_EL1 is NOT maintained in sys_regs array */
> +	if (p->is_write)
> +		vcpu->arch.ctxt.gp_regs.sp_el1 = p->regval;
> +	else
> +		p->regval = vcpu->arch.ctxt.gp_regs.sp_el1;
> +
> +	return true;
> +}
> +
>  /*
>   * Architected system registers.
>   * Important: Must be sorted ascending by Op0, Op1, CRn, CRm, Op2
> @@ -1646,9 +1668,51 @@ static const struct sys_reg_desc sys_reg_descs[] = {
>  	 */
>  	{ SYS_DESC(SYS_PMCCFILTR_EL0), access_pmu_evtyper, reset_val, PMCCFILTR_EL0, 0 },
I have to admit I haven't gone through all the patches, and maybe this is part of
the bits that will be added at a later date, but some of the reset values seem
incorrect according to ARM DDI 0487D.a. I'll comment below on the relevant registers.
>  
> +	{ SYS_DESC(SYS_VPIDR_EL2), access_rw, reset_val, VPIDR_EL2, 0 },
> +	{ SYS_DESC(SYS_VMPIDR_EL2), access_rw, reset_val, VMPIDR_EL2, 0 },
> +
> +	{ SYS_DESC(SYS_SCTLR_EL2), access_rw, reset_val, SCTLR_EL2, 0 },
Some bits are RES1 for SCTLR_EL2.
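One possible shape for the fix, as a sketch (SCTLR_EL2_RES1 is the
existing mask in sysreg.h):

	{ SYS_DESC(SYS_SCTLR_EL2), access_rw, reset_val, SCTLR_EL2, SCTLR_EL2_RES1 },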
> +	{ SYS_DESC(SYS_ACTLR_EL2), access_rw, reset_val, ACTLR_EL2, 0 },
> +	{ SYS_DESC(SYS_HCR_EL2), access_rw, reset_val, HCR_EL2, 0 },
> +	{ SYS_DESC(SYS_MDCR_EL2), access_rw, reset_val, MDCR_EL2, 0 },
> +	{ SYS_DESC(SYS_CPTR_EL2), access_rw, reset_val, CPTR_EL2, 0 },
Some bits are RES1 for CPTR_EL2 if HCR_EL2.E2H == 0, which the reset value for
HCR_EL2 seems to imply.
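Sketch, assuming the existing CPTR_EL2_DEFAULT from kvm_arm.h (which
sets the E2H == 0 RES1 bits) is acceptable as a reset value:

	{ SYS_DESC(SYS_CPTR_EL2), access_rw, reset_val, CPTR_EL2, CPTR_EL2_DEFAULT },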
> +	{ SYS_DESC(SYS_HSTR_EL2), access_rw, reset_val, HSTR_EL2, 0 },
> +	{ SYS_DESC(SYS_HACR_EL2), access_rw, reset_val, HACR_EL2, 0 },
> +
> +	{ SYS_DESC(SYS_TTBR0_EL2), access_rw, reset_val, TTBR0_EL2, 0 },
> +	{ SYS_DESC(SYS_TTBR1_EL2), access_rw, reset_val, TTBR1_EL2, 0 },
> +	{ SYS_DESC(SYS_TCR_EL2), access_rw, reset_val, TCR_EL2, 0 },
Same here, bits 31 and 23 are RES1 for TCR_EL2 when HCR_EL2.E2H == 0.
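kvm_arm.h already has TCR_EL2_RES1 covering exactly those two bits, so a
sketch of the fix could be:

	{ SYS_DESC(SYS_TCR_EL2), access_rw, reset_val, TCR_EL2, TCR_EL2_RES1 },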
> +	{ SYS_DESC(SYS_VTTBR_EL2), access_rw, reset_val, VTTBR_EL2, 0 },
> +	{ SYS_DESC(SYS_VTCR_EL2), access_rw, reset_val, VTCR_EL2, 0 },
> +
>  	{ SYS_DESC(SYS_DACR32_EL2), NULL, reset_unknown, DACR32_EL2 },
> +	{ SYS_DESC(SYS_SPSR_EL2), access_rw, reset_val, SPSR_EL2, 0 },
> +	{ SYS_DESC(SYS_ELR_EL2), access_rw, reset_val, ELR_EL2, 0 },
> +	{ SYS_DESC(SYS_SP_EL1), access_sp_el1},
> +
>  	{ SYS_DESC(SYS_IFSR32_EL2), NULL, reset_unknown, IFSR32_EL2 },
> +	{ SYS_DESC(SYS_AFSR0_EL2), access_rw, reset_val, AFSR0_EL2, 0 },
> +	{ SYS_DESC(SYS_AFSR1_EL2), access_rw, reset_val, AFSR1_EL2, 0 },
> +	{ SYS_DESC(SYS_ESR_EL2), access_rw, reset_val, ESR_EL2, 0 },
>  	{ SYS_DESC(SYS_FPEXC32_EL2), NULL, reset_val, FPEXC32_EL2, 0x700 },
> +
> +	{ SYS_DESC(SYS_FAR_EL2), access_rw, reset_val, FAR_EL2, 0 },
> +	{ SYS_DESC(SYS_HPFAR_EL2), access_rw, reset_val, HPFAR_EL2, 0 },
> +
> +	{ SYS_DESC(SYS_MAIR_EL2), access_rw, reset_val, MAIR_EL2, 0 },
> +	{ SYS_DESC(SYS_AMAIR_EL2), access_rw, reset_val, AMAIR_EL2, 0 },
> +
> +	{ SYS_DESC(SYS_VBAR_EL2), access_rw, reset_val, VBAR_EL2, 0 },
> +	{ SYS_DESC(SYS_RVBAR_EL2), access_rw, reset_val, RVBAR_EL2, 0 },
> +	{ SYS_DESC(SYS_RMR_EL2), access_rw, reset_val, RMR_EL2, 0 },
Bit AA64 [0] of RMR_EL2 is RAO/WI when EL2 cannot use AArch32, which is what the
patches seem to enforce.
> +
> +	{ SYS_DESC(SYS_CONTEXTIDR_EL2), access_rw, reset_val, CONTEXTIDR_EL2, 0 },
> +	{ SYS_DESC(SYS_TPIDR_EL2), access_rw, reset_val, TPIDR_EL2, 0 },
> +
> +	{ SYS_DESC(SYS_CNTVOFF_EL2), access_rw, reset_val, CNTVOFF_EL2, 0 },
> +	{ SYS_DESC(SYS_CNTHCTL_EL2), access_rw, reset_val, CNTHCTL_EL2, 0 },
> +
> +	{ SYS_DESC(SYS_SP_EL2), NULL, reset_unknown, SP_EL2 },
>  };
>  
>  static bool trap_dbgidr(struct kvm_vcpu *vcpu,

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH 16/59] KVM: arm64: nv: Save/Restore vEL2 sysregs
  2019-06-21  9:38 ` [PATCH 16/59] KVM: arm64: nv: Save/Restore vEL2 sysregs Marc Zyngier
@ 2019-06-25  8:48   ` Julien Thierry
  2019-07-03 13:42     ` Marc Zyngier
  2019-07-01 12:09   ` Alexandru Elisei
  2019-08-21 11:57   ` Alexandru Elisei
  2 siblings, 1 reply; 176+ messages in thread
From: Julien Thierry @ 2019-06-25  8:48 UTC (permalink / raw)
  To: Marc Zyngier, linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	James Morse, Suzuki K Poulose



On 06/21/2019 10:38 AM, Marc Zyngier wrote:
> From: Andre Przywara <andre.przywara@arm.com>
> 
> Whenever we need to restore the guest's system registers to the CPU, we
> now need to take care of the EL2 system registers as well. Most of them
> are accessed via traps only, but some have an immediate effect, and a
> guest running in VHE mode would also expect them to be accessible via
> their EL1 encoding, which we do not trap.
> 
> Split the current __sysreg_{save,restore}_el1_state() functions into
> handling common sysregs, then differentiate between the guest running in
> vEL2 and vEL1.
> 
> For vEL2 we write the virtual EL2 registers with an identical format directly
> into their EL1 counterparts, and translate the few registers that have a
> different format so that they have the same effect on execution when
> running a non-VHE guest hypervisor.
> 
>   [ Commit message reworked and many bug fixes applied by Marc Zyngier
>     and Christoffer Dall. ]
> 
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
> ---
>  arch/arm64/kvm/hyp/sysreg-sr.c | 160 +++++++++++++++++++++++++++++++--
>  1 file changed, 153 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
> index 62866a68e852..2abb9c3ff24f 100644
> --- a/arch/arm64/kvm/hyp/sysreg-sr.c
> +++ b/arch/arm64/kvm/hyp/sysreg-sr.c

[...]

> @@ -124,10 +167,91 @@ static void __hyp_text __sysreg_restore_user_state(struct kvm_cpu_context *ctxt)
>  	write_sysreg(ctxt->sys_regs[TPIDRRO_EL0],	tpidrro_el0);
>  }
>  
> -static void __hyp_text __sysreg_restore_el1_state(struct kvm_cpu_context *ctxt)
> +static void __sysreg_restore_vel2_state(struct kvm_cpu_context *ctxt)
>  {
> +	u64 val;
> +
> +	write_sysreg(read_cpuid_id(),			vpidr_el2);
>  	write_sysreg(ctxt->sys_regs[MPIDR_EL1],		vmpidr_el2);
> -	write_sysreg(ctxt->sys_regs[CSSELR_EL1],	csselr_el1);
> +	write_sysreg_el1(ctxt->sys_regs[MAIR_EL2],	SYS_MAIR);
> +	write_sysreg_el1(ctxt->sys_regs[VBAR_EL2],	SYS_VBAR);
> +	write_sysreg_el1(ctxt->sys_regs[CONTEXTIDR_EL2],SYS_CONTEXTIDR);
> +	write_sysreg_el1(ctxt->sys_regs[AMAIR_EL2],	SYS_AMAIR);
> +
> +	if (__vcpu_el2_e2h_is_set(ctxt)) {
> +		/*
> +		 * In VHE mode those registers are compatible between
> +		 * EL1 and EL2.
> +		 */
> +		write_sysreg_el1(ctxt->sys_regs[SCTLR_EL2],	SYS_SCTLR);
> +		write_sysreg_el1(ctxt->sys_regs[CPTR_EL2],	SYS_CPACR);
> +		write_sysreg_el1(ctxt->sys_regs[TTBR0_EL2],	SYS_TTBR0);
> +		write_sysreg_el1(ctxt->sys_regs[TTBR1_EL2],	SYS_TTBR1);
> +		write_sysreg_el1(ctxt->sys_regs[TCR_EL2],	SYS_TCR);
> +		write_sysreg_el1(ctxt->sys_regs[CNTHCTL_EL2],	SYS_CNTKCTL);
> +	} else {
> +		write_sysreg_el1(translate_sctlr(ctxt->sys_regs[SCTLR_EL2]),
> +				 SYS_SCTLR);
> +		write_sysreg_el1(translate_cptr(ctxt->sys_regs[CPTR_EL2]),
> +				 SYS_CPACR);
> +		write_sysreg_el1(translate_ttbr0(ctxt->sys_regs[TTBR0_EL2]),
> +				 SYS_TTBR0);
> +		write_sysreg_el1(translate_tcr(ctxt->sys_regs[TCR_EL2]),
> +				 SYS_TCR);
> +		write_sysreg_el1(translate_cnthctl(ctxt->sys_regs[CNTHCTL_EL2]),
> +				 SYS_CNTKCTL);
> +	}
> +
> +	/*
> +	 * These registers can be modified behind our back by a fault
> +	 * taken inside vEL2. Restore them, always.
> +	 */
> +	write_sysreg_el1(ctxt->sys_regs[ESR_EL2],	SYS_ESR);
> +	write_sysreg_el1(ctxt->sys_regs[AFSR0_EL2],	SYS_AFSR0);
> +	write_sysreg_el1(ctxt->sys_regs[AFSR1_EL2],	SYS_AFSR1);
> +	write_sysreg_el1(ctxt->sys_regs[FAR_EL2],	SYS_FAR);
> +	write_sysreg(ctxt->sys_regs[SP_EL2],		sp_el1);
> +	write_sysreg_el1(ctxt->sys_regs[ELR_EL2],	SYS_ELR);
> +
> +	val = __fixup_spsr_el2_write(ctxt, ctxt->sys_regs[SPSR_EL2]);
> +	write_sysreg_el1(val,	SYS_SPSR);
> +}
> +
> +static void __hyp_text __sysreg_restore_vel1_state(struct kvm_cpu_context *ctxt)
> +{
> +	u64 mpidr;
> +
> +	if (has_vhe()) {
> +		struct kvm_vcpu *vcpu;
> +
> +		/*
> +		 * Warning: this hack only works on VHE, because we only
> +		 * call this with the *guest* context, which is part of
> +		 * struct kvm_vcpu. On a host context, you'd get pure junk.
> +		 */
> +		vcpu = container_of(ctxt, struct kvm_vcpu, arch.ctxt);

This seems very fragile, just to find out whether the guest has hyp
capabilities. It would at least be nice to make sure this is indeed a
guest context.

The *clean* way to do it could be to have a pointer to kvm_vcpu in the
kvm_cpu_context which would be NULL for host contexts.

Otherwise, I'm under the impression that for a host context,
ctxt->sys_reg[HCR_EL2] == 0 and that this would also be true for a guest
without nested virt capability. Could we use something like that here?
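
A minimal sketch of the back-pointer idea (the "vcpu" field here is
hypothetical, it is not part of this series):

	struct kvm_cpu_context {
		struct kvm_regs	gp_regs;
		u64		sys_regs[NR_SYS_REGS];
		/* ... */
		struct kvm_vcpu	*vcpu;	/* NULL for host contexts */
	};

	static void __hyp_text __sysreg_restore_vel1_state(struct kvm_cpu_context *ctxt)
	{
		struct kvm_vcpu *vcpu = ctxt->vcpu;

		if (vcpu && nested_virt_in_use(vcpu)) {
			/* ... vEL2 handling as in the patch ... */
		}
		/* ... */
	}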

Cheers,

Julien

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH 35/59] KVM: arm/arm64: nv: Support multiple nested stage 2 mmu structures
  2019-06-21  9:38 ` [PATCH 35/59] KVM: arm/arm64: nv: Support multiple nested stage 2 mmu structures Marc Zyngier
@ 2019-06-25 12:19   ` Alexandru Elisei
  2019-07-03 13:47     ` Marc Zyngier
  2019-06-27 13:15   ` Julien Thierry
  2019-07-04 15:51   ` Alexandru Elisei
  2 siblings, 1 reply; 176+ messages in thread
From: Alexandru Elisei @ 2019-06-25 12:19 UTC (permalink / raw)
  To: Marc Zyngier, linux-arm-kernel, kvmarm, kvm; +Cc: Andre Przywara, Dave Martin

On 6/21/19 10:38 AM, Marc Zyngier wrote:
> From: Christoffer Dall <christoffer.dall@arm.com>
>
> Add stage 2 mmu data structures for virtual EL2 and for nested guests.
> We don't yet populate shadow stage 2 page tables, but we now have a
> framework for getting to a shadow stage 2 pgd.
>
> We allocate twice the number of vcpus as stage 2 mmu structures because
> that's sufficient for each vcpu running two VMs without having to flush
> the stage 2 page tables.
>
> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  arch/arm/include/asm/kvm_host.h     |   4 +
>  arch/arm/include/asm/kvm_mmu.h      |   3 +
>  arch/arm64/include/asm/kvm_host.h   |  28 +++++
>  arch/arm64/include/asm/kvm_mmu.h    |   8 ++
>  arch/arm64/include/asm/kvm_nested.h |   7 ++
>  arch/arm64/kvm/nested.c             | 172 ++++++++++++++++++++++++++++
>  virt/kvm/arm/arm.c                  |  16 ++-
>  virt/kvm/arm/mmu.c                  |  31 ++---
>  8 files changed, 254 insertions(+), 15 deletions(-)
>
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index e3217c4ad25b..b821eb2383ad 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -424,4 +424,8 @@ static inline bool kvm_arm_vcpu_is_finalized(struct kvm_vcpu *vcpu)
>  	return true;
>  }
>  
> +static inline void kvm_vcpu_load_hw_mmu(struct kvm_vcpu *vcpu) {}
> +static inline void kvm_vcpu_put_hw_mmu(struct kvm_vcpu *vcpu) {}
> +static inline int kvm_vcpu_init_nested(struct kvm_vcpu *vcpu) { return 0; }
> +
>  #endif /* __ARM_KVM_HOST_H__ */
> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
> index be23e3f8e08c..e6984b6da2ce 100644
> --- a/arch/arm/include/asm/kvm_mmu.h
> +++ b/arch/arm/include/asm/kvm_mmu.h
> @@ -420,6 +420,9 @@ static inline int hyp_map_aux_data(void)
>  
>  static inline void kvm_set_ipa_limit(void) {}
>  
> +static inline void kvm_init_s2_mmu(struct kvm_s2_mmu *mmu) {}
> +static inline void kvm_init_nested(struct kvm *kvm) {}
> +
>  static __always_inline u64 kvm_get_vttbr(struct kvm_s2_mmu *mmu)
>  {
>  	struct kvm_vmid *vmid = &mmu->vmid;
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 3dee5e17a4ee..cc238de170d2 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -88,11 +88,39 @@ struct kvm_s2_mmu {
>  	phys_addr_t	pgd_phys;
>  
>  	struct kvm *kvm;
> +
> +	/*
> +	 * For a shadow stage-2 MMU, the virtual vttbr programmed by the guest
> +	 * hypervisor.  Unused for kvm_arch->mmu. Set to 1 when the structure
> +	 * contains no valid information.
> +	 */
> +	u64	vttbr;
> +
> +	/* true when this represents a nested context where virtual HCR_EL2.VM == 1 */
> +	bool	nested_stage2_enabled;
> +
> +	/*
> +	 *  0: Nobody is currently using this, check vttbr for validity
> +	 * >0: Somebody is actively using this.
> +	 */
> +	atomic_t refcnt;
>  };
>  
> +static inline bool kvm_s2_mmu_valid(struct kvm_s2_mmu *mmu)
> +{
> +	return !(mmu->vttbr & 1);
> +}
> +
>  struct kvm_arch {
>  	struct kvm_s2_mmu mmu;
>  
> +	/*
> +	 * Stage 2 paging state for VMs with nested virtualization using a
> +	 * virtual VMID.
> +	 */
> +	struct kvm_s2_mmu *nested_mmus;
> +	size_t nested_mmus_size;
> +
>  	/* VTCR_EL2 value for this VM */
>  	u64    vtcr;
>  
> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> index 1eb6e0ca61c2..32bcaa1845dc 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -100,6 +100,7 @@ alternative_cb_end
>  #include <asm/mmu_context.h>
>  #include <asm/pgtable.h>
>  #include <asm/kvm_emulate.h>
> +#include <asm/kvm_nested.h>
>  
>  void kvm_update_va_mask(struct alt_instr *alt,
>  			__le32 *origptr, __le32 *updptr, int nr_inst);
> @@ -164,6 +165,7 @@ int create_hyp_exec_mappings(phys_addr_t phys_addr, size_t size,
>  			     void **haddr);
>  void free_hyp_pgds(void);
>  
> +void kvm_unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size);
>  void stage2_unmap_vm(struct kvm *kvm);
>  int kvm_alloc_stage2_pgd(struct kvm_s2_mmu *mmu);
>  void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu);
> @@ -635,5 +637,11 @@ static __always_inline void __load_guest_stage2(struct kvm_s2_mmu *mmu)
>  	asm(ALTERNATIVE("nop", "isb", ARM64_WORKAROUND_1165522));
>  }
>  
> +static inline u64 get_vmid(u64 vttbr)
> +{
> +	return (vttbr & VTTBR_VMID_MASK(kvm_get_vmid_bits())) >>
> +		VTTBR_VMID_SHIFT;
> +}
> +
>  #endif /* __ASSEMBLY__ */
>  #endif /* __ARM64_KVM_MMU_H__ */
> diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
> index 61e71d0d2151..d4021d0892bd 100644
> --- a/arch/arm64/include/asm/kvm_nested.h
> +++ b/arch/arm64/include/asm/kvm_nested.h
> @@ -10,6 +10,13 @@ static inline bool nested_virt_in_use(const struct kvm_vcpu *vcpu)
>  		test_bit(KVM_ARM_VCPU_NESTED_VIRT, vcpu->arch.features);
>  }
>  
> +extern void kvm_init_nested(struct kvm *kvm);
> +extern int kvm_vcpu_init_nested(struct kvm_vcpu *vcpu);
> +extern void kvm_init_s2_mmu(struct kvm_s2_mmu *mmu);
> +extern struct kvm_s2_mmu *lookup_s2_mmu(struct kvm *kvm, u64 vttbr, u64 hcr);
> +extern void kvm_vcpu_load_hw_mmu(struct kvm_vcpu *vcpu);
> +extern void kvm_vcpu_put_hw_mmu(struct kvm_vcpu *vcpu);
> +
>  int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe);
>  extern bool forward_traps(struct kvm_vcpu *vcpu, u64 control_bit);
>  extern bool forward_nv_traps(struct kvm_vcpu *vcpu);
> diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
> index 3872e3cf1691..4b38dc5c0be3 100644
> --- a/arch/arm64/kvm/nested.c
> +++ b/arch/arm64/kvm/nested.c
> @@ -18,7 +18,161 @@
>  #include <linux/kvm.h>
>  #include <linux/kvm_host.h>
>  
> +#include <asm/kvm_arm.h>
>  #include <asm/kvm_emulate.h>
> +#include <asm/kvm_mmu.h>
> +#include <asm/kvm_nested.h>
> +
> +void kvm_init_nested(struct kvm *kvm)
> +{
> +	kvm_init_s2_mmu(&kvm->arch.mmu);
> +
> +	kvm->arch.nested_mmus = NULL;
> +	kvm->arch.nested_mmus_size = 0;
> +}
> +
> +int kvm_vcpu_init_nested(struct kvm_vcpu *vcpu)
> +{
> +	struct kvm *kvm = vcpu->kvm;
> +	struct kvm_s2_mmu *tmp;
> +	int num_mmus;
> +	int ret = -ENOMEM;
> +
> +	if (!test_bit(KVM_ARM_VCPU_NESTED_VIRT, vcpu->arch.features))
> +		return 0;
> +
> +	if (!cpus_have_const_cap(ARM64_HAS_NESTED_VIRT))
> +		return -EINVAL;

Here we fail if the KVM_ARM_VCPU_NESTED_VIRT feature was requested for the vcpu,
but the nested capability isn't present. This function is called as a result of
KVM_ARM_VCPU_INIT, and when it fails, the KVM_ARM_VCPU_INIT ioctl also fails.
This means that we cannot have a vcpu with the nested virt feature on a system
which doesn't support nested virtualization.

However, commit 04/59 "KVM: arm64: nv: Introduce nested virtualization VCPU
feature" added the function nested_virt_in_use (in
arch/arm64/include/asm/kvm_nested.h) which checks for **both** conditions before
returning true. I believe the capability check is not required in
nested_virt_in_use.
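
A sketch of that simplification, assuming KVM_ARM_VCPU_INIT (via
kvm_vcpu_init_nested above) remains the only place the capability is
checked:

	static inline bool nested_virt_in_use(const struct kvm_vcpu *vcpu)
	{
		/* The feature bit can only be set if the cap was present */
		return test_bit(KVM_ARM_VCPU_NESTED_VIRT, vcpu->arch.features);
	}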

> +
> +	mutex_lock(&kvm->lock);
> +
> +	num_mmus = atomic_read(&kvm->online_vcpus) * 2;
> +	tmp = __krealloc(kvm->arch.nested_mmus,
> +			 num_mmus * sizeof(*kvm->arch.nested_mmus),
> +			 GFP_KERNEL | __GFP_ZERO);
> +
> +	if (tmp) {
> +		if (tmp != kvm->arch.nested_mmus)
> +			kfree(kvm->arch.nested_mmus);
> +
> +		tmp[num_mmus - 1].kvm = kvm;
> +		atomic_set(&tmp[num_mmus - 1].refcnt, 0);
> +		ret = kvm_alloc_stage2_pgd(&tmp[num_mmus - 1]);
> +		if (ret)
> +			goto out;
> +
> +		tmp[num_mmus - 2].kvm = kvm;
> +		atomic_set(&tmp[num_mmus - 2].refcnt, 0);
> +		ret = kvm_alloc_stage2_pgd(&tmp[num_mmus - 2]);
> +		if (ret) {
> +			kvm_free_stage2_pgd(&tmp[num_mmus - 1]);
> +			goto out;
> +		}
> +
> +		kvm->arch.nested_mmus_size = num_mmus;
> +		kvm->arch.nested_mmus = tmp;
> +		tmp = NULL;
> +	}
> +
> +out:
> +	kfree(tmp);
> +	mutex_unlock(&kvm->lock);
> +	return ret;
> +}
> +
> +/* Must be called with kvm->lock held */
> +struct kvm_s2_mmu *lookup_s2_mmu(struct kvm *kvm, u64 vttbr, u64 hcr)
> +{
> +	bool nested_stage2_enabled = hcr & HCR_VM;
> +	int i;
> +
> +	/* Don't consider the CnP bit for the vttbr match */
> +	vttbr = vttbr & ~1UL;
> +
> +	/* Search a mmu in the list using the virtual VMID as a key */
> +	for (i = 0; i < kvm->arch.nested_mmus_size; i++) {
> +		struct kvm_s2_mmu *mmu = &kvm->arch.nested_mmus[i];
> +
> +		if (!kvm_s2_mmu_valid(mmu))
> +			continue;
> +
> +		if (nested_stage2_enabled &&
> +		    mmu->nested_stage2_enabled &&
> +		    vttbr == mmu->vttbr)
> +			return mmu;
> +
> +		if (!nested_stage2_enabled &&
> +		    !mmu->nested_stage2_enabled &&
> +		    get_vmid(vttbr) == get_vmid(mmu->vttbr))
> +			return mmu;
> +	}
> +	return NULL;
> +}
> +
> +static struct kvm_s2_mmu *get_s2_mmu_nested(struct kvm_vcpu *vcpu)
> +{
> +	struct kvm *kvm = vcpu->kvm;
> +	u64 vttbr = vcpu_read_sys_reg(vcpu, VTTBR_EL2);
> +	u64 hcr = vcpu_read_sys_reg(vcpu, HCR_EL2);
> +	struct kvm_s2_mmu *s2_mmu;
> +	int i;
> +
> +	s2_mmu = lookup_s2_mmu(kvm, vttbr, hcr);
> +	if (s2_mmu)
> +		goto out;
> +
> +	for (i = 0; i < kvm->arch.nested_mmus_size; i++) {
> +		s2_mmu = &kvm->arch.nested_mmus[i];
> +
> +		if (atomic_read(&s2_mmu->refcnt) == 0)
> +			break;
> +	}
> +	BUG_ON(atomic_read(&s2_mmu->refcnt)); /* We have struct MMUs to spare */
> +
> +	if (kvm_s2_mmu_valid(s2_mmu)) {
> +		/* Clear the old state */
> +		kvm_unmap_stage2_range(s2_mmu, 0, kvm_phys_size(kvm));
> +		if (s2_mmu->vmid.vmid_gen)
> +			kvm_call_hyp(__kvm_tlb_flush_vmid, s2_mmu);
> +	}
> +
> +	/*
> +	 * The virtual VMID (modulo CnP) will be used as a key when matching
> +	 * an existing kvm_s2_mmu.
> +	 */
> +	s2_mmu->vttbr = vttbr & ~1UL;
> +	s2_mmu->nested_stage2_enabled = hcr & HCR_VM;
> +
> +out:
> +	atomic_inc(&s2_mmu->refcnt);
> +	return s2_mmu;
> +}
> +
> +void kvm_init_s2_mmu(struct kvm_s2_mmu *mmu)
> +{
> +	mmu->vttbr = 1;
> +	mmu->nested_stage2_enabled = false;
> +	atomic_set(&mmu->refcnt, 0);
> +}
> +
> +void kvm_vcpu_load_hw_mmu(struct kvm_vcpu *vcpu)
> +{
> +	if (is_hyp_ctxt(vcpu)) {
> +		vcpu->arch.hw_mmu = &vcpu->kvm->arch.mmu;
> +	} else {
> +		spin_lock(&vcpu->kvm->mmu_lock);
> +		vcpu->arch.hw_mmu = get_s2_mmu_nested(vcpu);
> +		spin_unlock(&vcpu->kvm->mmu_lock);
> +	}
> +}
> +
> +void kvm_vcpu_put_hw_mmu(struct kvm_vcpu *vcpu)
> +{
> +	if (vcpu->arch.hw_mmu != &vcpu->kvm->arch.mmu) {
> +		atomic_dec(&vcpu->arch.hw_mmu->refcnt);
> +		vcpu->arch.hw_mmu = NULL;
> +	}
> +}
>  
>  /*
>   * Inject wfx to the virtual EL2 if this is not from the virtual EL2 and
> @@ -37,3 +191,21 @@ int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe)
>  
>  	return -EINVAL;
>  }
> +
> +void kvm_arch_flush_shadow_all(struct kvm *kvm)
> +{
> +	int i;
> +
> +	for (i = 0; i < kvm->arch.nested_mmus_size; i++) {
> +		struct kvm_s2_mmu *mmu = &kvm->arch.nested_mmus[i];
> +
> +		WARN_ON(atomic_read(&mmu->refcnt));
> +
> +		if (!atomic_read(&mmu->refcnt))
> +			kvm_free_stage2_pgd(mmu);
> +	}
> +	kfree(kvm->arch.nested_mmus);
> +	kvm->arch.nested_mmus = NULL;
> +	kvm->arch.nested_mmus_size = 0;
> +	kvm_free_stage2_pgd(&kvm->arch.mmu);
> +}
> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> index 5d4371633e1c..4e3cbfa1ecbe 100644
> --- a/virt/kvm/arm/arm.c
> +++ b/virt/kvm/arm/arm.c
> @@ -36,6 +36,7 @@
>  #include <asm/kvm_arm.h>
>  #include <asm/kvm_asm.h>
>  #include <asm/kvm_mmu.h>
> +#include <asm/kvm_nested.h>
>  #include <asm/kvm_emulate.h>
>  #include <asm/kvm_coproc.h>
>  #include <asm/sections.h>
> @@ -126,6 +127,8 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>  	kvm->arch.mmu.vmid.vmid_gen = 0;
>  	kvm->arch.mmu.kvm = kvm;
>  
> +	kvm_init_nested(kvm);
> +
>  	ret = create_hyp_mappings(kvm, kvm + 1, PAGE_HYP);
>  	if (ret)
>  		goto out_free_stage2_pgd;
> @@ -353,6 +356,9 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>  	int *last_ran;
>  	kvm_host_data_t *cpu_data;
>  
> +	if (nested_virt_in_use(vcpu))
> +		kvm_vcpu_load_hw_mmu(vcpu);
> +
>  	last_ran = this_cpu_ptr(vcpu->kvm->arch.last_vcpu_ran);
>  	cpu_data = this_cpu_ptr(&kvm_host_data);
>  
> @@ -391,6 +397,9 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>  	kvm_vgic_put(vcpu);
>  	kvm_vcpu_pmu_restore_host(vcpu);
>  
> +	if (nested_virt_in_use(vcpu))
> +		kvm_vcpu_put_hw_mmu(vcpu);
> +
>  	vcpu->cpu = -1;
>  
>  	kvm_arm_set_running_vcpu(NULL);
> @@ -968,8 +977,13 @@ static int kvm_vcpu_set_target(struct kvm_vcpu *vcpu,
>  
>  	vcpu->arch.target = phys_target;
>  
> +	/* Prepare for nested if required */
> +	ret = kvm_vcpu_init_nested(vcpu);
> +
>  	/* Now we know what it is, we can reset it. */
> -	ret = kvm_reset_vcpu(vcpu);
> +	if (!ret)
> +		ret = kvm_reset_vcpu(vcpu);
> +
>  	if (ret) {
>  		vcpu->arch.target = -1;
>  		bitmap_zero(vcpu->arch.features, KVM_VCPU_MAX_FEATURES);
> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> index bb1be4ea55ec..faa61a81c8cc 100644
> --- a/virt/kvm/arm/mmu.c
> +++ b/virt/kvm/arm/mmu.c
> @@ -325,7 +325,7 @@ static void unmap_stage2_puds(struct kvm_s2_mmu *mmu, pgd_t *pgd,
>  }
>  
>  /**
> - * unmap_stage2_range -- Clear stage2 page table entries to unmap a range
> + * kvm_unmap_stage2_range -- Clear stage2 page table entries to unmap a range
>   * @kvm:   The VM pointer
>   * @start: The intermediate physical base address of the range to unmap
>   * @size:  The size of the area to unmap
> @@ -335,7 +335,7 @@ static void unmap_stage2_puds(struct kvm_s2_mmu *mmu, pgd_t *pgd,
>   * destroying the VM), otherwise another faulting VCPU may come in and mess
>   * with things behind our backs.
>   */
> -static void unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size)
> +void kvm_unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size)
>  {
>  	struct kvm *kvm = mmu->kvm;
>  	pgd_t *pgd;
> @@ -924,6 +924,10 @@ int kvm_alloc_stage2_pgd(struct kvm_s2_mmu *mmu)
>  
>  	mmu->pgd = pgd;
>  	mmu->pgd_phys = pgd_phys;
> +	mmu->vmid.vmid_gen = 0;
> +
> +	kvm_init_s2_mmu(mmu);
> +
>  	return 0;
>  }
>  
> @@ -962,7 +966,7 @@ static void stage2_unmap_memslot(struct kvm *kvm,
>  
>  		if (!(vma->vm_flags & VM_PFNMAP)) {
>  			gpa_t gpa = addr + (vm_start - memslot->userspace_addr);
> -			unmap_stage2_range(&kvm->arch.mmu, gpa, vm_end - vm_start);
> +			kvm_unmap_stage2_range(&kvm->arch.mmu, gpa, vm_end - vm_start);
>  		}
>  		hva = vm_end;
>  	} while (hva < reg_end);
> @@ -1001,7 +1005,7 @@ void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu)
>  
>  	spin_lock(&kvm->mmu_lock);
>  	if (mmu->pgd) {
> -		unmap_stage2_range(mmu, 0, kvm_phys_size(kvm));
> +		kvm_unmap_stage2_range(mmu, 0, kvm_phys_size(kvm));
>  		pgd = READ_ONCE(mmu->pgd);
>  		mmu->pgd = NULL;
>  	}
> @@ -1093,7 +1097,7 @@ static int stage2_set_pmd_huge(struct kvm_s2_mmu *mmu,
>  		 * get handled accordingly.
>  		 */
>  		if (!pmd_thp_or_huge(old_pmd)) {
> -			unmap_stage2_range(mmu, addr & S2_PMD_MASK, S2_PMD_SIZE);
> +			kvm_unmap_stage2_range(mmu, addr & S2_PMD_MASK, S2_PMD_SIZE);
>  			goto retry;
>  		}
>  		/*
> @@ -1145,7 +1149,7 @@ static int stage2_set_pud_huge(struct kvm_s2_mmu *mmu,
>  		 * the range for this block and retry.
>  		 */
>  		if (!stage2_pud_huge(kvm, old_pud)) {
> -			unmap_stage2_range(mmu, addr & S2_PUD_MASK, S2_PUD_SIZE);
> +			kvm_unmap_stage2_range(mmu, addr & S2_PUD_MASK, S2_PUD_SIZE);
>  			goto retry;
>  		}
>  
> @@ -2047,7 +2051,7 @@ static int handle_hva_to_gpa(struct kvm *kvm,
>  
>  static int kvm_unmap_hva_handler(struct kvm *kvm, gpa_t gpa, u64 size, void *data)
>  {
> -	unmap_stage2_range(&kvm->arch.mmu, gpa, size);
> +	kvm_unmap_stage2_range(&kvm->arch.mmu, gpa, size);
>  	return 0;
>  }
>  
> @@ -2360,7 +2364,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
>  
>  	spin_lock(&kvm->mmu_lock);
>  	if (ret)
> -		unmap_stage2_range(&kvm->arch.mmu, mem->guest_phys_addr, mem->memory_size);
> +		kvm_unmap_stage2_range(&kvm->arch.mmu, mem->guest_phys_addr, mem->memory_size);
>  	else
>  		stage2_flush_memslot(&kvm->arch.mmu, memslot);
>  	spin_unlock(&kvm->mmu_lock);
> @@ -2384,11 +2388,6 @@ void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen)
>  {
>  }
>  
> -void kvm_arch_flush_shadow_all(struct kvm *kvm)
> -{
> -	kvm_free_stage2_pgd(&kvm->arch.mmu);
> -}
> -
>  void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
>  				   struct kvm_memory_slot *slot)
>  {
> @@ -2396,7 +2395,7 @@ void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
>  	phys_addr_t size = slot->npages << PAGE_SHIFT;
>  
>  	spin_lock(&kvm->mmu_lock);
> -	unmap_stage2_range(&kvm->arch.mmu, gpa, size);
> +	kvm_unmap_stage2_range(&kvm->arch.mmu, gpa, size);
>  	spin_unlock(&kvm->mmu_lock);
>  }
>  
> @@ -2467,3 +2466,7 @@ void kvm_toggle_cache(struct kvm_vcpu *vcpu, bool was_enabled)
>  
>  	trace_kvm_toggle_cache(*vcpu_pc(vcpu), was_enabled, now_enabled);
>  }
> +
> +__weak void kvm_arch_flush_shadow_all(struct kvm *kvm)
> +{
> +}

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH 21/59] KVM: arm64: nv: Set a handler for the system instruction traps
  2019-06-21  9:38 ` [PATCH 21/59] KVM: arm64: nv: Set a handler for the system instruction traps Marc Zyngier
@ 2019-06-25 12:55   ` Julien Thierry
  2019-07-03 14:15     ` Marc Zyngier
  0 siblings, 1 reply; 176+ messages in thread
From: Julien Thierry @ 2019-06-25 12:55 UTC (permalink / raw)
  To: Marc Zyngier, linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	James Morse, Suzuki K Poulose



On 06/21/2019 10:38 AM, Marc Zyngier wrote:
> From: Jintack Lim <jintack.lim@linaro.org>
> 
> When HCR.NV bit is set, execution of the EL2 translation regime address
> aranslation instructions and TLB maintenance instructions are trapped to
> EL2. In addition, execution of the EL1 translation regime address
> aranslation instructions and TLB maintenance instructions that are only

What's "translation regime address aranslation"? I would guess
"aranslation" should be removed, but since the same pattern appears
twice in the commit message, doubt took over me :).

> accessible from EL2 and above are trapped to EL2. In these cases,
> ESR_EL2.EC will be set to 0x18.
> 
> Change the existing handler to handle those system instructions as well
> as MRS/MSR instructions.  Emulation of each system instructions will be
> done in separate patches.
> 
> Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  arch/arm64/include/asm/kvm_coproc.h |  2 +-
>  arch/arm64/kvm/handle_exit.c        |  2 +-
>  arch/arm64/kvm/sys_regs.c           | 53 +++++++++++++++++++++++++----
>  arch/arm64/kvm/trace.h              |  2 +-
>  4 files changed, 50 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_coproc.h b/arch/arm64/include/asm/kvm_coproc.h
> index 0b52377a6c11..1b3d21bd8adb 100644
> --- a/arch/arm64/include/asm/kvm_coproc.h
> +++ b/arch/arm64/include/asm/kvm_coproc.h
> @@ -43,7 +43,7 @@ int kvm_handle_cp14_32(struct kvm_vcpu *vcpu, struct kvm_run *run);
>  int kvm_handle_cp14_64(struct kvm_vcpu *vcpu, struct kvm_run *run);
>  int kvm_handle_cp15_32(struct kvm_vcpu *vcpu, struct kvm_run *run);
>  int kvm_handle_cp15_64(struct kvm_vcpu *vcpu, struct kvm_run *run);
> -int kvm_handle_sys_reg(struct kvm_vcpu *vcpu, struct kvm_run *run);
> +int kvm_handle_sys(struct kvm_vcpu *vcpu, struct kvm_run *run);
>  
>  #define kvm_coproc_table_init kvm_sys_reg_table_init
>  void kvm_sys_reg_table_init(void);
> diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
> index 2517711f034f..e662f23b63a1 100644
> --- a/arch/arm64/kvm/handle_exit.c
> +++ b/arch/arm64/kvm/handle_exit.c
> @@ -236,7 +236,7 @@ static exit_handle_fn arm_exit_handlers[] = {
>  	[ESR_ELx_EC_SMC32]	= handle_smc,
>  	[ESR_ELx_EC_HVC64]	= handle_hvc,
>  	[ESR_ELx_EC_SMC64]	= handle_smc,
> -	[ESR_ELx_EC_SYS64]	= kvm_handle_sys_reg,
> +	[ESR_ELx_EC_SYS64]	= kvm_handle_sys,
>  	[ESR_ELx_EC_SVE]	= handle_sve,
>  	[ESR_ELx_EC_ERET]	= kvm_handle_eret,
>  	[ESR_ELx_EC_IABT_LOW]	= kvm_handle_guest_abort,
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 1d1312425cf2..e711dde4511c 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -2597,6 +2597,40 @@ static int emulate_sys_reg(struct kvm_vcpu *vcpu,
>  	return 1;
>  }
>  
> +static int emulate_tlbi(struct kvm_vcpu *vcpu,
> +			     struct sys_reg_params *params)
> +{
> +	/* TODO: support tlbi instruction emulation */
> +	kvm_inject_undefined(vcpu);
> +	return 1;
> +}
> +
> +static int emulate_at(struct kvm_vcpu *vcpu,
> +			     struct sys_reg_params *params)
> +{
> +	/* TODO: support address translation instruction emulation */
> +	kvm_inject_undefined(vcpu);
> +	return 1;
> +}
> +
> +static int emulate_sys_instr(struct kvm_vcpu *vcpu,
> +			     struct sys_reg_params *params)
> +{
> +	int ret = 0;
> +
> +	/* TLB maintenance instructions */
> +	if (params->CRn == 0b1000)
> +		ret = emulate_tlbi(vcpu, params);
> +	/* Address Translation instructions */
> +	else if (params->CRn == 0b0111 && params->CRm == 0b1000)
> +		ret = emulate_at(vcpu, params);
> +

So, in theory the NV bit shouldn't trap other Op0 == 1 instructions.
Would it be worth adding a WARN() or BUG() in an "else" branch here,
just in case?
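
For instance (sketch only; reshuffling the if/else chain quoted above):

	/* TLB maintenance instructions */
	if (params->CRn == 0b1000) {
		ret = emulate_tlbi(vcpu, params);
	/* Address Translation instructions */
	} else if (params->CRn == 0b0111 && params->CRm == 0b1000) {
		ret = emulate_at(vcpu, params);
	} else {
		/* The NV bit should not trap any other Op0 == 1 instruction */
		WARN_ONCE(1, "unexpected trapped system instruction\n");
		kvm_inject_undefined(vcpu);
		ret = 1;
	}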

Thanks,

-- 
Julien Thierry

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH 11/59] KVM: arm64: nv: Inject HVC exceptions to the virtual EL2
  2019-06-21  9:37 ` [PATCH 11/59] KVM: arm64: nv: Inject HVC exceptions to the virtual EL2 Marc Zyngier
@ 2019-06-25 13:13   ` Alexandru Elisei
  2019-07-03 14:16     ` Marc Zyngier
  2019-07-30 14:08     ` Alexandru Elisei
  0 siblings, 2 replies; 176+ messages in thread
From: Alexandru Elisei @ 2019-06-25 13:13 UTC (permalink / raw)
  To: Marc Zyngier, linux-arm-kernel, kvmarm, kvm; +Cc: Andre Przywara, Dave Martin

On 6/21/19 10:37 AM, Marc Zyngier wrote:
> From: Jintack Lim <jintack.lim@linaro.org>
>
> Now that the psci call is done by the smc instruction when nested
This suggests that we have support for PSCI calls using SMC as the conduit, but
that is not the case, as the handle_smc function is not changed by this commit,
and support for PSCI via SMC is added later in patch 22/59 "KVM: arm64: nv:
Handle PSCI call via smc from the guest". Perhaps the commit message should be
reworded to reflect that?
> virtualization is enabled, it is clear that all hvc instructions from the
> VM (including from the virtual EL2) are supposed to be handled in the
> virtual EL2.
>
> Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  arch/arm64/kvm/handle_exit.c | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
> index 516aead3c2a9..6c0ac52b34cc 100644
> --- a/arch/arm64/kvm/handle_exit.c
> +++ b/arch/arm64/kvm/handle_exit.c
> @@ -30,6 +30,7 @@
>  #include <asm/kvm_coproc.h>
>  #include <asm/kvm_emulate.h>
>  #include <asm/kvm_mmu.h>
> +#include <asm/kvm_nested.h>
>  #include <asm/debug-monitors.h>
>  #include <asm/traps.h>
>  
> @@ -52,6 +53,12 @@ static int handle_hvc(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  			    kvm_vcpu_hvc_get_imm(vcpu));
>  	vcpu->stat.hvc_exit_stat++;
>  
> +	/* Forward hvc instructions to the virtual EL2 if the guest has EL2. */
> +	if (nested_virt_in_use(vcpu)) {
> +		kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
> +		return 1;
> +	}
> +
>  	ret = kvm_hvc_call_handler(vcpu);
>  	if (ret < 0) {
>  		vcpu_set_reg(vcpu, 0, ~0UL);

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH 13/59] KVM: arm64: nv: Handle virtual EL2 registers in vcpu_read/write_sys_reg()
  2019-06-24 12:42   ` Julien Thierry
@ 2019-06-25 14:02     ` Alexandru Elisei
  2019-07-03 12:15     ` Marc Zyngier
  1 sibling, 0 replies; 176+ messages in thread
From: Alexandru Elisei @ 2019-06-25 14:02 UTC (permalink / raw)
  To: Julien Thierry, Marc Zyngier, linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	James Morse, Suzuki K Poulose

On 6/24/19 1:42 PM, Julien Thierry wrote:
>
> On 06/21/2019 10:37 AM, Marc Zyngier wrote:
>> From: Andre Przywara <andre.przywara@arm.com>
>>
>> KVM internally uses accessor functions when reading or writing the
>> guest's system registers. This takes care of accessing either the stored
>> copy or using the "live" EL1 system registers when the host uses VHE.
>>
>> With the introduction of virtual EL2 we add a bunch of EL2 system
>> registers, which now must also be taken care of:
>> - If the guest is running in vEL2, and we access an EL1 sysreg, we must
>>   revert to the stored version of that, and not use the CPU's copy.
>> - If the guest is running in vEL1, and we access an EL2 sysreg, we must
>>   also use the stored version, since the CPU carries the EL1 copy.
>> - Some EL2 system registers are supposed to affect the current execution
>>   of the system, so we need to put them into their respective EL1
>>   counterparts. For this we need to define a mapping between the two.
>>   This is done using the newly introduced struct el2_sysreg_map.
>> - Some EL2 system registers have a different format than their EL1
>>   counterpart, so we need to translate them before writing them to the
>>   CPU. This is done using an (optional) translate function in the map.
>> - There are the three special registers SP_EL2, SPSR_EL2 and ELR_EL2,
>>   which need some separate handling.
>>
>> All of these cases are now wrapped into the existing accessor functions,
>> so KVM users wouldn't need to care whether they access EL2 or EL1
>> registers and also which state the guest is in.
>>
>> This handles what was formerly known as the "shadow state" dynamically,
>> without requiring a separate copy for each vCPU EL.
>>
>> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>> ---
>>  arch/arm64/include/asm/kvm_emulate.h |   6 +
>>  arch/arm64/include/asm/kvm_host.h    |   5 +
>>  arch/arm64/kvm/sys_regs.c            | 163 +++++++++++++++++++++++++++
>>  3 files changed, 174 insertions(+)
>>
>> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
>> index c43aac5fed69..f37006b6eec4 100644
>> --- a/arch/arm64/include/asm/kvm_emulate.h
>> +++ b/arch/arm64/include/asm/kvm_emulate.h
>> @@ -70,6 +70,12 @@ void kvm_emulate_nested_eret(struct kvm_vcpu *vcpu);
>>  int kvm_inject_nested_sync(struct kvm_vcpu *vcpu, u64 esr_el2);
>>  int kvm_inject_nested_irq(struct kvm_vcpu *vcpu);
>>  
>> +u64 translate_tcr(u64 tcr);
>> +u64 translate_cptr(u64 tcr);
>> +u64 translate_sctlr(u64 tcr);
>> +u64 translate_ttbr0(u64 tcr);
>> +u64 translate_cnthctl(u64 tcr);
>> +
>>  static inline bool vcpu_el1_is_32bit(struct kvm_vcpu *vcpu)
>>  {
>>  	return !(vcpu->arch.hcr_el2 & HCR_RW);
>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>> index 2d4290d2513a..dae9c42a7219 100644
>> --- a/arch/arm64/include/asm/kvm_host.h
>> +++ b/arch/arm64/include/asm/kvm_host.h
>> @@ -217,6 +217,11 @@ enum vcpu_sysreg {
>>  	NR_SYS_REGS	/* Nothing after this line! */
>>  };
>>  
>> +static inline bool sysreg_is_el2(int reg)
>> +{
>> +	return reg >= FIRST_EL2_SYSREG && reg < NR_SYS_REGS;
>> +}
>> +
>>  /* 32bit mapping */
>>  #define c0_MPIDR	(MPIDR_EL1 * 2)	/* MultiProcessor ID Register */
>>  #define c0_CSSELR	(CSSELR_EL1 * 2)/* Cache Size Selection Register */
>> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
>> index 693dd063c9c2..d024114da162 100644
>> --- a/arch/arm64/kvm/sys_regs.c
>> +++ b/arch/arm64/kvm/sys_regs.c
>> @@ -76,11 +76,142 @@ static bool write_to_read_only(struct kvm_vcpu *vcpu,
>>  	return false;
>>  }
>>  
>> +static u64 tcr_el2_ips_to_tcr_el1_ps(u64 tcr_el2)
>> +{
>> +	return ((tcr_el2 & TCR_EL2_PS_MASK) >> TCR_EL2_PS_SHIFT)
>> +		<< TCR_IPS_SHIFT;
>> +}
>> +
>> +u64 translate_tcr(u64 tcr)
>> +{
>> +	return TCR_EPD1_MASK |				/* disable TTBR1_EL1 */
>> +	       ((tcr & TCR_EL2_TBI) ? TCR_TBI0 : 0) |
>> +	       tcr_el2_ips_to_tcr_el1_ps(tcr) |
>> +	       (tcr & TCR_EL2_TG0_MASK) |
>> +	       (tcr & TCR_EL2_ORGN0_MASK) |
>> +	       (tcr & TCR_EL2_IRGN0_MASK) |
>> +	       (tcr & TCR_EL2_T0SZ_MASK);
>> +}
>> +
>> +u64 translate_cptr(u64 cptr_el2)
>> +{
>> +	u64 cpacr_el1 = 0;
>> +
>> +	if (!(cptr_el2 & CPTR_EL2_TFP))
>> +		cpacr_el1 |= CPACR_EL1_FPEN;
>> +	if (cptr_el2 & CPTR_EL2_TTA)
>> +		cpacr_el1 |= CPACR_EL1_TTA;
>> +	if (!(cptr_el2 & CPTR_EL2_TZ))
>> +		cpacr_el1 |= CPACR_EL1_ZEN;
>> +
>> +	return cpacr_el1;
>> +}
>> +
>> +u64 translate_sctlr(u64 sctlr)
>> +{
>> +	/* Bit 20 is RES1 in SCTLR_EL1, but RES0 in SCTLR_EL2 */
>> +	return sctlr | BIT(20);
>> +}
>> +
>> +u64 translate_ttbr0(u64 ttbr0)
>> +{
>> +	/* Force ASID to 0 (ASID 0 or RES0) */
>> +	return ttbr0 & ~GENMASK_ULL(63, 48);
>> +}
>> +
>> +u64 translate_cnthctl(u64 cnthctl)
>> +{
>> +	return ((cnthctl & 0x3) << 10) | (cnthctl & 0xfc);
>> +}
>> +
>> +#define EL2_SYSREG(el2, el1, translate)	\
>> +	[el2 - FIRST_EL2_SYSREG] = { el2, el1, translate }
>> +#define PURE_EL2_SYSREG(el2) \
>> +	[el2 - FIRST_EL2_SYSREG] = { el2,__INVALID_SYSREG__, NULL }
>> +/*
>> + * Associate vEL2 registers to their EL1 counterparts on the CPU.
>> + * The translate function can be NULL, when the register layout is identical.
>> + */
>> +struct el2_sysreg_map {
>> +	int sysreg;	/* EL2 register index into the array above */
>> +	int mapping;	/* associated EL1 register */
>> +	u64 (*translate)(u64 value);
>> +} nested_sysreg_map[NR_SYS_REGS - FIRST_EL2_SYSREG] = {
>> +	PURE_EL2_SYSREG( VPIDR_EL2 ),
>> +	PURE_EL2_SYSREG( VMPIDR_EL2 ),
>> +	PURE_EL2_SYSREG( ACTLR_EL2 ),
>> +	PURE_EL2_SYSREG( HCR_EL2 ),
>> +	PURE_EL2_SYSREG( MDCR_EL2 ),
>> +	PURE_EL2_SYSREG( HSTR_EL2 ),
>> +	PURE_EL2_SYSREG( HACR_EL2 ),
>> +	PURE_EL2_SYSREG( VTTBR_EL2 ),
>> +	PURE_EL2_SYSREG( VTCR_EL2 ),
>> +	PURE_EL2_SYSREG( RVBAR_EL2 ),
>> +	PURE_EL2_SYSREG( RMR_EL2 ),
>> +	PURE_EL2_SYSREG( TPIDR_EL2 ),
>> +	PURE_EL2_SYSREG( CNTVOFF_EL2 ),
>> +	PURE_EL2_SYSREG( CNTHCTL_EL2 ),
>> +	PURE_EL2_SYSREG( HPFAR_EL2 ),
>> +	EL2_SYSREG(      SCTLR_EL2,  SCTLR_EL1,      translate_sctlr ),
>> +	EL2_SYSREG(      CPTR_EL2,   CPACR_EL1,      translate_cptr  ),
>> +	EL2_SYSREG(      TTBR0_EL2,  TTBR0_EL1,      translate_ttbr0 ),
>> +	EL2_SYSREG(      TTBR1_EL2,  TTBR1_EL1,      NULL            ),
>> +	EL2_SYSREG(      TCR_EL2,    TCR_EL1,        translate_tcr   ),
>> +	EL2_SYSREG(      VBAR_EL2,   VBAR_EL1,       NULL            ),
>> +	EL2_SYSREG(      AFSR0_EL2,  AFSR0_EL1,      NULL            ),
>> +	EL2_SYSREG(      AFSR1_EL2,  AFSR1_EL1,      NULL            ),
>> +	EL2_SYSREG(      ESR_EL2,    ESR_EL1,        NULL            ),
>> +	EL2_SYSREG(      FAR_EL2,    FAR_EL1,        NULL            ),
>> +	EL2_SYSREG(      MAIR_EL2,   MAIR_EL1,       NULL            ),
>> +	EL2_SYSREG(      AMAIR_EL2,  AMAIR_EL1,      NULL            ),
>> +};
>> +
>> +static
>> +const struct el2_sysreg_map *find_el2_sysreg(const struct el2_sysreg_map *map,
>> +					     int reg)
>> +{
>> +	const struct el2_sysreg_map *entry;
>> +
>> +	if (!sysreg_is_el2(reg))
>> +		return NULL;
>> +
>> +	entry = &nested_sysreg_map[reg - FIRST_EL2_SYSREG];
>> +	if (entry->sysreg == __INVALID_SYSREG__)
>> +		return NULL;
>> +
>> +	return entry;
>> +}
>> +
>>  u64 vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg)
>>  {
>> +
>>  	if (!vcpu->arch.sysregs_loaded_on_cpu)
>>  		goto immediate_read;
>>  
>> +	if (unlikely(sysreg_is_el2(reg))) {
>> +		const struct el2_sysreg_map *el2_reg;
>> +
>> +		if (!is_hyp_ctxt(vcpu))
>> +			goto immediate_read;
>> +
>> +		el2_reg = find_el2_sysreg(nested_sysreg_map, reg);
>> +		if (el2_reg) {
>> +			/*
>> +			 * If this register does not have an EL1 counterpart,
>> +			 * then read the stored EL2 version.
>> +			 */
>> +			if (el2_reg->mapping == __INVALID_SYSREG__)
> In this patch, find_el2_sysreg returns NULL for PURE_EL2 registers. So
> for PURE_EL2, the access would go through the switch case. However this
> branch suggests that for PURE_EL2 registers we intend to do the read from
> the memory-backed version.
>
> Which should it be?
From my understanding of the code, find_el2_sysreg returns NULL when reg is not
an EL2 register or when the entry associated with reg in nested_sysreg_map is
zero (reg is not in the map).
>
>> +				goto immediate_read;
>> +
>> +			/* Get the current version of the EL1 counterpart. */
>> +			reg = el2_reg->mapping;
>> +		}
>> +	} else {
>> +		/* EL1 register can't be on the CPU if the guest is in vEL2. */
>> +		if (unlikely(is_hyp_ctxt(vcpu)))
>> +			goto immediate_read;
>> +	}
>> +
>>  	/*
>>  	 * System registers listed in the switch are not saved on every
>>  	 * exit from the guest but are only saved on vcpu_put.
>> @@ -114,6 +245,8 @@ u64 vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg)
>>  	case DACR32_EL2:	return read_sysreg_s(SYS_DACR32_EL2);
>>  	case IFSR32_EL2:	return read_sysreg_s(SYS_IFSR32_EL2);
>>  	case DBGVCR32_EL2:	return read_sysreg_s(SYS_DBGVCR32_EL2);
>> +	case SP_EL2:		return read_sysreg(sp_el1);
>> +	case ELR_EL2:		return read_sysreg_el1(SYS_ELR);
>>  	}
>>  
>>  immediate_read:
>> @@ -125,6 +258,34 @@ void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, u64 val, int reg)
>>  	if (!vcpu->arch.sysregs_loaded_on_cpu)
>>  		goto immediate_write;
>>  
>> +	if (unlikely(sysreg_is_el2(reg))) {
>> +		const struct el2_sysreg_map *el2_reg;
>> +
>> +		if (!is_hyp_ctxt(vcpu))
>> +			goto immediate_write;
>> +
>> +		/* Store the EL2 version in the sysregs array. */
>> +		__vcpu_sys_reg(vcpu, reg) = val;
>> +
>> +		el2_reg = find_el2_sysreg(nested_sysreg_map, reg);
>> +		if (el2_reg) {
>> +			/* Does this register have an EL1 counterpart? */
>> +			if (el2_reg->mapping == __INVALID_SYSREG__)
>> +				return;
> As in the read case, this is never reached and we'll go through the
> switch case.
>
> Cheers,
>

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH 23/59] KVM: arm64: nv: Respect virtual HCR_EL2.TWX setting
  2019-06-21  9:38 ` [PATCH 23/59] KVM: arm64: nv: Respect virtual HCR_EL2.TWX setting Marc Zyngier
@ 2019-06-25 14:19   ` Julien Thierry
  2019-07-02 12:54     ` Alexandru Elisei
  2019-07-03 14:18     ` Marc Zyngier
  0 siblings, 2 replies; 176+ messages in thread
From: Julien Thierry @ 2019-06-25 14:19 UTC (permalink / raw)
  To: Marc Zyngier, linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	James Morse, Suzuki K Poulose



On 06/21/2019 10:38 AM, Marc Zyngier wrote:
> From: Jintack Lim <jintack.lim@linaro.org>
> 
> Forward exceptions due to WFI or WFE instructions to the virtual EL2 if
> they are not coming from the virtual EL2 and virtual HCR_EL2.TWX is set.
> 
> Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  arch/arm64/include/asm/kvm_nested.h |  2 ++
>  arch/arm64/kvm/Makefile             |  1 +
>  arch/arm64/kvm/handle_exit.c        | 13 +++++++++-
>  arch/arm64/kvm/nested.c             | 39 +++++++++++++++++++++++++++++
>  4 files changed, 54 insertions(+), 1 deletion(-)
>  create mode 100644 arch/arm64/kvm/nested.c
> 
> diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
> index 8a3d121a0b42..645e5e11b749 100644
> --- a/arch/arm64/include/asm/kvm_nested.h
> +++ b/arch/arm64/include/asm/kvm_nested.h
> @@ -10,4 +10,6 @@ static inline bool nested_virt_in_use(const struct kvm_vcpu *vcpu)
>  		test_bit(KVM_ARM_VCPU_NESTED_VIRT, vcpu->arch.features);
>  }
>  
> +int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe);
> +
>  #endif /* __ARM64_KVM_NESTED_H */
> diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
> index 9e450aea7db6..f11bd8b0d837 100644
> --- a/arch/arm64/kvm/Makefile
> +++ b/arch/arm64/kvm/Makefile
> @@ -36,4 +36,5 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/irqchip.o
>  kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/arch_timer.o
>  kvm-$(CONFIG_KVM_ARM_PMU) += $(KVM)/arm/pmu.o
>  
> +kvm-$(CONFIG_KVM_ARM_HOST) += nested.o
>  kvm-$(CONFIG_KVM_ARM_HOST) += emulate-nested.o
> diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
> index e348c15c81bc..ddba212fd6ec 100644
> --- a/arch/arm64/kvm/handle_exit.c
> +++ b/arch/arm64/kvm/handle_exit.c
> @@ -127,7 +127,18 @@ static int handle_no_fpsimd(struct kvm_vcpu *vcpu, struct kvm_run *run)
>   */
>  static int kvm_handle_wfx(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  {
> -	if (kvm_vcpu_get_hsr(vcpu) & ESR_ELx_WFx_ISS_WFE) {
> +	bool is_wfe = !!(kvm_vcpu_get_hsr(vcpu) & ESR_ELx_WFx_ISS_WFE);
> +
> +	if (nested_virt_in_use(vcpu)) {
> +		int ret = handle_wfx_nested(vcpu, is_wfe);
> +
> +		if (ret < 0 && ret != -EINVAL)
> +			return ret;
> +		else if (ret >= 0)
> +			return ret;

I think you can simplify this:

	if (ret != -EINVAL)
		return ret;
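
With that change, the whole nested hunk would (I believe) reduce to:

	if (nested_virt_in_use(vcpu)) {
		int ret = handle_wfx_nested(vcpu, is_wfe);

		if (ret != -EINVAL)
			return ret;
	}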

Cheers,

Julien


> +	}
> +
> +	if (is_wfe) {
>  		trace_kvm_wfx_arm64(*vcpu_pc(vcpu), true);
>  		vcpu->stat.wfe_exit_stat++;
>  		kvm_vcpu_on_spin(vcpu, vcpu_mode_priv(vcpu));
> diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
> new file mode 100644
> index 000000000000..3872e3cf1691
> --- /dev/null
> +++ b/arch/arm64/kvm/nested.c
> @@ -0,0 +1,39 @@
> +/*
> + * Copyright (C) 2017 - Columbia University and Linaro Ltd.
> + * Author: Jintack Lim <jintack.lim@linaro.org>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include <linux/kvm.h>
> +#include <linux/kvm_host.h>
> +
> +#include <asm/kvm_emulate.h>
> +
> +/*
> + * Inject wfx to the virtual EL2 if this is not from the virtual EL2 and
> + * the virtual HCR_EL2.TWX is set. Otherwise, let the host hypervisor
> + * handle this.
> + */
> +int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe)
> +{
> +	u64 hcr_el2 = __vcpu_sys_reg(vcpu, HCR_EL2);
> +
> +	if (vcpu_mode_el2(vcpu))
> +		return -EINVAL;
> +
> +	if ((is_wfe && (hcr_el2 & HCR_TWE)) || (!is_wfe && (hcr_el2 & HCR_TWI)))
> +		return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
> +
> +	return -EINVAL;
> +}
> 

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH 13/59] KVM: arm64: nv: Handle virtual EL2 registers in vcpu_read/write_sys_reg()
  2019-06-21  9:37 ` [PATCH 13/59] KVM: arm64: nv: Handle virtual EL2 registers in vcpu_read/write_sys_reg() Marc Zyngier
  2019-06-24 12:42   ` Julien Thierry
@ 2019-06-25 15:18   ` Alexandru Elisei
  2019-07-01  9:58     ` Alexandru Elisei
  2019-07-03 15:59     ` Marc Zyngier
  2019-06-26 15:04   ` Alexandru Elisei
  2019-07-01 12:10   ` Alexandru Elisei
  3 siblings, 2 replies; 176+ messages in thread
From: Alexandru Elisei @ 2019-06-25 15:18 UTC (permalink / raw)
  To: Marc Zyngier, linux-arm-kernel, kvmarm, kvm; +Cc: Andre Przywara, Dave Martin

Hi Marc,

A question regarding this patch. This patch modifies vcpu_{read,write}_sys_reg
to handle virtual EL2 registers. However, the file kvm/emulate-nested.c added by
patch 10/59 "KVM: arm64: nv: Support virtual EL2 exceptions" already uses
vcpu_{read,write}_sys_reg to access EL2 registers. In my opinion, it doesn't
really matter which comes first because nested support is only enabled in the
last patch of the series, but I thought I should bring this up in case it is not
what you intended.

On 6/21/19 10:37 AM, Marc Zyngier wrote:
> From: Andre Przywara <andre.przywara@arm.com>
>
> KVM internally uses accessor functions when reading or writing the
> guest's system registers. This takes care of accessing either the stored
> copy or using the "live" EL1 system registers when the host uses VHE.
>
> With the introduction of virtual EL2 we add a bunch of EL2 system
> registers, which now must also be taken care of:
> - If the guest is running in vEL2, and we access an EL1 sysreg, we must
>   revert to the stored version of that, and not use the CPU's copy.
> - If the guest is running in vEL1, and we access an EL2 sysreg, we must
>   also use the stored version, since the CPU carries the EL1 copy.
> - Some EL2 system registers are supposed to affect the current execution
>   of the system, so we need to put them into their respective EL1
>   counterparts. For this we need to define a mapping between the two.
>   This is done using the newly introduced struct el2_sysreg_map.
> - Some EL2 system registers have a different format than their EL1
>   counterpart, so we need to translate them before writing them to the
>   CPU. This is done using an (optional) translate function in the map.
> - There are the three special registers SP_EL2, SPSR_EL2 and ELR_EL2,
>   which need some separate handling.
I see no change in this patch related to SPSR_EL2. Special handling of SPSR_EL2
is added in the next patch, patch 14/59 "KVM: arm64: nv: Handle SPSR_EL2 specially".
>
> All of these cases are now wrapped into the existing accessor functions,
> so KVM users wouldn't need to care whether they access EL2 or EL1
> registers and also which state the guest is in.
>
> This handles what was formerly known as the "shadow state" dynamically,
> without requiring a separate copy for each vCPU EL.
>
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  arch/arm64/include/asm/kvm_emulate.h |   6 +
>  arch/arm64/include/asm/kvm_host.h    |   5 +
>  arch/arm64/kvm/sys_regs.c            | 163 +++++++++++++++++++++++++++
>  3 files changed, 174 insertions(+)
>
> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> index c43aac5fed69..f37006b6eec4 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -70,6 +70,12 @@ void kvm_emulate_nested_eret(struct kvm_vcpu *vcpu);
>  int kvm_inject_nested_sync(struct kvm_vcpu *vcpu, u64 esr_el2);
>  int kvm_inject_nested_irq(struct kvm_vcpu *vcpu);
>  
> +u64 translate_tcr(u64 tcr);
> +u64 translate_cptr(u64 tcr);
> +u64 translate_sctlr(u64 tcr);
> +u64 translate_ttbr0(u64 tcr);
> +u64 translate_cnthctl(u64 tcr);
> +
>  static inline bool vcpu_el1_is_32bit(struct kvm_vcpu *vcpu)
>  {
>  	return !(vcpu->arch.hcr_el2 & HCR_RW);
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 2d4290d2513a..dae9c42a7219 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -217,6 +217,11 @@ enum vcpu_sysreg {
>  	NR_SYS_REGS	/* Nothing after this line! */
>  };
>  
> +static inline bool sysreg_is_el2(int reg)
> +{
> +	return reg >= FIRST_EL2_SYSREG && reg < NR_SYS_REGS;
> +}
> +
>  /* 32bit mapping */
>  #define c0_MPIDR	(MPIDR_EL1 * 2)	/* MultiProcessor ID Register */
>  #define c0_CSSELR	(CSSELR_EL1 * 2)/* Cache Size Selection Register */
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 693dd063c9c2..d024114da162 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -76,11 +76,142 @@ static bool write_to_read_only(struct kvm_vcpu *vcpu,
>  	return false;
>  }
>  
> +static u64 tcr_el2_ips_to_tcr_el1_ps(u64 tcr_el2)
The code seems to suggest that you are translating TCR_EL2.PS to TCR_EL1.IPS.
Perhaps the function should be named tcr_el2_ps_to_tcr_el1_ips?
> +{
> +	return ((tcr_el2 & TCR_EL2_PS_MASK) >> TCR_EL2_PS_SHIFT)
> +		<< TCR_IPS_SHIFT;
> +}
> +
> +u64 translate_tcr(u64 tcr)
> +{
> +	return TCR_EPD1_MASK |				/* disable TTBR1_EL1 */
> +	       ((tcr & TCR_EL2_TBI) ? TCR_TBI0 : 0) |
> +	       tcr_el2_ips_to_tcr_el1_ps(tcr) |
> +	       (tcr & TCR_EL2_TG0_MASK) |
> +	       (tcr & TCR_EL2_ORGN0_MASK) |
> +	       (tcr & TCR_EL2_IRGN0_MASK) |
> +	       (tcr & TCR_EL2_T0SZ_MASK);
> +}
> +
> +u64 translate_cptr(u64 cptr_el2)
The argument name is not consistent with the other translate_* functions. I
think it is reasonably obvious that you are translating an EL2 register.
> +{
> +	u64 cpacr_el1 = 0;
> +
> +	if (!(cptr_el2 & CPTR_EL2_TFP))
> +		cpacr_el1 |= CPACR_EL1_FPEN;
> +	if (cptr_el2 & CPTR_EL2_TTA)
> +		cpacr_el1 |= CPACR_EL1_TTA;
> +	if (!(cptr_el2 & CPTR_EL2_TZ))
> +		cpacr_el1 |= CPACR_EL1_ZEN;
> +
> +	return cpacr_el1;
> +}
> +
> +u64 translate_sctlr(u64 sctlr)
> +{
> +	/* Bit 20 is RES1 in SCTLR_EL1, but RES0 in SCTLR_EL2 */
> +	return sctlr | BIT(20);
> +}
> +
> +u64 translate_ttbr0(u64 ttbr0)
> +{
> +	/* Force ASID to 0 (ASID 0 or RES0) */
Are you forcing ASID to 0 because you are only expecting a non-vhe guest
hypervisor to access ttbr0_el2, in which case the architecture says that the
ASID field is RES0? Is it so unlikely that a vhe guest hypervisor will access
ttbr0_el2 directly that it's not worth adding a check for that?
> +	return ttbr0 & ~GENMASK_ULL(63, 48);
> +}
> +
> +u64 translate_cnthctl(u64 cnthctl)
> +{
> +	return ((cnthctl & 0x3) << 10) | (cnthctl & 0xfc);

Patch 16/59 "KVM: arm64: nv: Save/Restore vEL2 sysregs" suggests that you are
translating CNTHCTL to write it to CNTKCTL_EL1. Looking at ARM DDI 0487D.b,
CNTKCTL_EL1 has bits 63:10 RES0. I think the correct value should be ((cnthctl &
0x3) << 8) | (cnthctl & 0xfc).
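
In other words, presumably:

	u64 translate_cnthctl(u64 cnthctl)
	{
		/* CNTKCTL_EL1 bits [63:10] are RES0, so keep everything below bit 10 */
		return ((cnthctl & 0x3) << 8) | (cnthctl & 0xfc);
	}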

> +}
> +
> +#define EL2_SYSREG(el2, el1, translate)	\
> +	[el2 - FIRST_EL2_SYSREG] = { el2, el1, translate }
> +#define PURE_EL2_SYSREG(el2) \
> +	[el2 - FIRST_EL2_SYSREG] = { el2,__INVALID_SYSREG__, NULL }
> +/*
> + * Associate vEL2 registers to their EL1 counterparts on the CPU.
> + * The translate function can be NULL, when the register layout is identical.
> + */
> +struct el2_sysreg_map {
> +	int sysreg;	/* EL2 register index into the array above */
> +	int mapping;	/* associated EL1 register */
> +	u64 (*translate)(u64 value);
> +} nested_sysreg_map[NR_SYS_REGS - FIRST_EL2_SYSREG] = {
> +	PURE_EL2_SYSREG( VPIDR_EL2 ),
> +	PURE_EL2_SYSREG( VMPIDR_EL2 ),
> +	PURE_EL2_SYSREG( ACTLR_EL2 ),
> +	PURE_EL2_SYSREG( HCR_EL2 ),
> +	PURE_EL2_SYSREG( MDCR_EL2 ),
> +	PURE_EL2_SYSREG( HSTR_EL2 ),
> +	PURE_EL2_SYSREG( HACR_EL2 ),
> +	PURE_EL2_SYSREG( VTTBR_EL2 ),
> +	PURE_EL2_SYSREG( VTCR_EL2 ),
> +	PURE_EL2_SYSREG( RVBAR_EL2 ),
> +	PURE_EL2_SYSREG( RMR_EL2 ),
> +	PURE_EL2_SYSREG( TPIDR_EL2 ),
> +	PURE_EL2_SYSREG( CNTVOFF_EL2 ),
> +	PURE_EL2_SYSREG( CNTHCTL_EL2 ),
> +	PURE_EL2_SYSREG( HPFAR_EL2 ),
> +	EL2_SYSREG(      SCTLR_EL2,  SCTLR_EL1,      translate_sctlr ),
> +	EL2_SYSREG(      CPTR_EL2,   CPACR_EL1,      translate_cptr  ),
> +	EL2_SYSREG(      TTBR0_EL2,  TTBR0_EL1,      translate_ttbr0 ),
> +	EL2_SYSREG(      TTBR1_EL2,  TTBR1_EL1,      NULL            ),
> +	EL2_SYSREG(      TCR_EL2,    TCR_EL1,        translate_tcr   ),
> +	EL2_SYSREG(      VBAR_EL2,   VBAR_EL1,       NULL            ),
> +	EL2_SYSREG(      AFSR0_EL2,  AFSR0_EL1,      NULL            ),
> +	EL2_SYSREG(      AFSR1_EL2,  AFSR1_EL1,      NULL            ),
> +	EL2_SYSREG(      ESR_EL2,    ESR_EL1,        NULL            ),
> +	EL2_SYSREG(      FAR_EL2,    FAR_EL1,        NULL            ),
> +	EL2_SYSREG(      MAIR_EL2,   MAIR_EL1,       NULL            ),
> +	EL2_SYSREG(      AMAIR_EL2,  AMAIR_EL1,      NULL            ),
> +};
Figuring out which registers are in this map, and which aren't and so are
treated differently, is really cumbersome: the EL2 registers are split into
two types, and their order is different from the order in enum vcpu_sysreg
(in kvm_host.h). Perhaps adding a comment about which registers are treated
differently would make the code a bit easier to follow? See the sketch below.
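
e.g. something along the lines of:

	/*
	 * EL2 registers without an entry here (SP_EL2, ELR_EL2, SPSR_EL2, ...)
	 * are handled separately in vcpu_{read,write}_sys_reg().
	 */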
> +
> +static
> +const struct el2_sysreg_map *find_el2_sysreg(const struct el2_sysreg_map *map,
> +					     int reg)
> +{
> +	const struct el2_sysreg_map *entry;
> +
> +	if (!sysreg_is_el2(reg))
> +		return NULL;
> +
> +	entry = &nested_sysreg_map[reg - FIRST_EL2_SYSREG];
> +	if (entry->sysreg == __INVALID_SYSREG__)
> +		return NULL;
> +
> +	return entry;
> +}
> +
>  u64 vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg)
>  {
> +
>  	if (!vcpu->arch.sysregs_loaded_on_cpu)
>  		goto immediate_read;
>  
> +	if (unlikely(sysreg_is_el2(reg))) {
> +		const struct el2_sysreg_map *el2_reg;
> +
> +		if (!is_hyp_ctxt(vcpu))
> +			goto immediate_read;
I'm confused by this. is_hyp_ctxt returns false when the guest is not in vEL2
AND HCR_EL2.E2H or HCR_EL2.TGE are not set. In this case, the NV bit will not be
set and the hardware will raise an undefined instruction exception when
accessing an EL2 register from EL1. What am I missing?
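
For reference, my reading of is_hyp_ctxt() from earlier in the series is
roughly this (a sketch, not the exact code):

	static inline bool is_hyp_ctxt(const struct kvm_vcpu *vcpu)
	{
		u64 hcr = __vcpu_sys_reg(vcpu, HCR_EL2);

		return vcpu_mode_el2(vcpu) ||
		       ((hcr & (HCR_E2H | HCR_TGE)) == (HCR_E2H | HCR_TGE));
	}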
> +
> +		el2_reg = find_el2_sysreg(nested_sysreg_map, reg);
> +		if (el2_reg) {
> +			/*
> +			 * If this register does not have an EL1 counterpart,
> +			 * then read the stored EL2 version.
> +			 */
> +			if (el2_reg->mapping == __INVALID_SYSREG__)
> +				goto immediate_read;
> +
> +			/* Get the current version of the EL1 counterpart. */
> +			reg = el2_reg->mapping;
> +		}
> +	} else {
> +		/* EL1 register can't be on the CPU if the guest is in vEL2. */
> +		if (unlikely(is_hyp_ctxt(vcpu)))
> +			goto immediate_read;
> +	}
> +
>  	/*
>  	 * System registers listed in the switch are not saved on every
>  	 * exit from the guest but are only saved on vcpu_put.
> @@ -114,6 +245,8 @@ u64 vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg)
>  	case DACR32_EL2:	return read_sysreg_s(SYS_DACR32_EL2);
>  	case IFSR32_EL2:	return read_sysreg_s(SYS_IFSR32_EL2);
>  	case DBGVCR32_EL2:	return read_sysreg_s(SYS_DBGVCR32_EL2);
> +	case SP_EL2:		return read_sysreg(sp_el1);
> +	case ELR_EL2:		return read_sysreg_el1(SYS_ELR);
>  	}
>  
>  immediate_read:
> @@ -125,6 +258,34 @@ void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, u64 val, int reg)
>  	if (!vcpu->arch.sysregs_loaded_on_cpu)
>  		goto immediate_write;
>  
> +	if (unlikely(sysreg_is_el2(reg))) {
> +		const struct el2_sysreg_map *el2_reg;
> +
> +		if (!is_hyp_ctxt(vcpu))
> +			goto immediate_write;
> +
> +		/* Store the EL2 version in the sysregs array. */
> +		__vcpu_sys_reg(vcpu, reg) = val;
> +
> +		el2_reg = find_el2_sysreg(nested_sysreg_map, reg);
> +		if (el2_reg) {
> +			/* Does this register have an EL1 counterpart? */
> +			if (el2_reg->mapping == __INVALID_SYSREG__)
> +				return;
> +
> +			if (!vcpu_el2_e2h_is_set(vcpu) &&
> +			    el2_reg->translate)
> +				val = el2_reg->translate(val);
> +
> +			/* Redirect this to the EL1 version of the register. */
> +			reg = el2_reg->mapping;
> +		}
> +	} else {
> +		/* EL1 register can't be on the CPU if the guest is in vEL2. */
> +		if (unlikely(is_hyp_ctxt(vcpu)))
> +			goto immediate_write;
> +	}
> +
>  	/*
>  	 * System registers listed in the switch are not restored on every
>  	 * entry to the guest but are only restored on vcpu_load.
> @@ -157,6 +318,8 @@ void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, u64 val, int reg)
>  	case DACR32_EL2:	write_sysreg_s(val, SYS_DACR32_EL2);	return;
>  	case IFSR32_EL2:	write_sysreg_s(val, SYS_IFSR32_EL2);	return;
>  	case DBGVCR32_EL2:	write_sysreg_s(val, SYS_DBGVCR32_EL2);	return;
> +	case SP_EL2:		write_sysreg(val, sp_el1);		return;
> +	case ELR_EL2:		write_sysreg_el1(val, SYS_ELR);		return;
>  	}
>  
>  immediate_write:

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH 26/59] KVM: arm64: nv: Respect the virtual HCR_EL2.NV bit setting
  2019-06-21  9:38 ` [PATCH 26/59] KVM: arm64: nv: Respect the virtual HCR_EL2.NV bit setting Marc Zyngier
@ 2019-06-26  5:31   ` Julien Thierry
  2019-07-03 16:31     ` Marc Zyngier
  0 siblings, 1 reply; 176+ messages in thread
From: Julien Thierry @ 2019-06-26  5:31 UTC (permalink / raw)
  To: Marc Zyngier, linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	James Morse, Suzuki K Poulose



On 06/21/2019 10:38 AM, Marc Zyngier wrote:
> From: Jintack Lim <jintack.lim@linaro.org>
> 
> Forward traps due to HCR_EL2.NV bit to the virtual EL2 if they are not
> coming from the virtual EL2 and the virtual HCR_EL2.NV bit is set.
> 
> In addition to EL2 register accesses, setting the NV bit will also make
> EL12 register accesses trap to EL2. To emulate this for the virtual EL2,
> forward traps due to EL12 register accesses to the virtual EL2 if the
> virtual HCR_EL2.NV bit is set.
> 
> This is for recursive nested virtualization.
> 
> Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
> [Moved code to emulate-nested.c]
> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  arch/arm64/include/asm/kvm_arm.h    |  1 +
>  arch/arm64/include/asm/kvm_nested.h |  2 ++
>  arch/arm64/kvm/emulate-nested.c     | 28 ++++++++++++++++++++++++++++
>  arch/arm64/kvm/handle_exit.c        |  6 ++++++
>  arch/arm64/kvm/sys_regs.c           | 18 ++++++++++++++++++
>  5 files changed, 55 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
> index 48e15af2bece..d21486274eeb 100644
> --- a/arch/arm64/include/asm/kvm_arm.h
> +++ b/arch/arm64/include/asm/kvm_arm.h
> @@ -24,6 +24,7 @@
>  
>  /* Hyp Configuration Register (HCR) bits */
>  #define HCR_FWB		(UL(1) << 46)
> +#define HCR_NV		(UL(1) << 42)
>  #define HCR_API		(UL(1) << 41)
>  #define HCR_APK		(UL(1) << 40)
>  #define HCR_TEA		(UL(1) << 37)
> diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
> index 645e5e11b749..61e71d0d2151 100644
> --- a/arch/arm64/include/asm/kvm_nested.h
> +++ b/arch/arm64/include/asm/kvm_nested.h
> @@ -11,5 +11,7 @@ static inline bool nested_virt_in_use(const struct kvm_vcpu *vcpu)
>  }
>  
>  int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe);
> +extern bool forward_traps(struct kvm_vcpu *vcpu, u64 control_bit);
> +extern bool forward_nv_traps(struct kvm_vcpu *vcpu);
>  
>  #endif /* __ARM64_KVM_NESTED_H */
> diff --git a/arch/arm64/kvm/emulate-nested.c b/arch/arm64/kvm/emulate-nested.c
> index f829b8b04dc8..c406fd688b9f 100644
> --- a/arch/arm64/kvm/emulate-nested.c
> +++ b/arch/arm64/kvm/emulate-nested.c
> @@ -24,6 +24,27 @@
>  
>  #include "trace.h"
>  
> +bool forward_traps(struct kvm_vcpu *vcpu, u64 control_bit)

Should this one be static?

> +{
> +	bool control_bit_set;
> +
> +	if (!nested_virt_in_use(vcpu))
> +		return false;
> +
> +	control_bit_set = __vcpu_sys_reg(vcpu, HCR_EL2) & control_bit;
> +	if (!vcpu_mode_el2(vcpu) && control_bit_set) {
> +		kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
> +		return true;
> +	}
> +	return false;
> +}
> +
> +bool forward_nv_traps(struct kvm_vcpu *vcpu)
> +{
> +	return forward_traps(vcpu, HCR_NV);
> +}
> +
> +
>  /* This is borrowed from get_except_vector in inject_fault.c */
>  static u64 get_el2_except_vector(struct kvm_vcpu *vcpu,
>  		enum exception_type type)
> @@ -55,6 +76,13 @@ void kvm_emulate_nested_eret(struct kvm_vcpu *vcpu)
>  	u64 spsr, elr, mode;
>  	bool direct_eret;
>  
> +	/*
> +	 * Forward this trap to the virtual EL2 if the virtual
> +	 * HCR_EL2.NV bit is set and this is coming from !EL2.
> +	 */
> +	if (forward_nv_traps(vcpu))
> +		return;
> +
>  	/*
>  	 * Going through the whole put/load motions is a waste of time
>  	 * if this is a VHE guest hypervisor returning to its own
> diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
> index 39602a4c1d61..7e8b1ec1d251 100644
> --- a/arch/arm64/kvm/handle_exit.c
> +++ b/arch/arm64/kvm/handle_exit.c
> @@ -72,6 +72,12 @@ static int handle_smc(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  {
>  	int ret;
>  
> +	/*
> +	 * Forward this trapped smc instruction to the virtual EL2.
> +	 */
> +	if ((vcpu_read_sys_reg(vcpu, HCR_EL2) & HCR_TSC) && forward_nv_traps(vcpu))

Not sure I understand why this would be only when the guest hyp also has
NV set.

If the guest hyp requested to trap SMC instructions and we received one
while in vEL1, shouldn't we always forward it to the guest hyp to let it
implement the SMC response the way it wants?
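
That is, couldn't this simply reuse forward_traps() added earlier in this
patch? Something like the below (assuming the handler keeps returning 1
once the exception has been injected):

	if (forward_traps(vcpu, HCR_TSC))
		return 1;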

Cheers,

Julien

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH 27/59] KVM: arm64: nv: Respect virtual HCR_EL2.TVM and TRVM settings
  2019-06-21  9:38 ` [PATCH 27/59] KVM: arm64: nv: Respect virtual HCR_EL2.TVM and TRVM settings Marc Zyngier
@ 2019-06-26  6:55   ` Julien Thierry
  2019-07-04 14:57     ` Marc Zyngier
  0 siblings, 1 reply; 176+ messages in thread
From: Julien Thierry @ 2019-06-26  6:55 UTC (permalink / raw)
  To: Marc Zyngier, linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	James Morse, Suzuki K Poulose



On 06/21/2019 10:38 AM, Marc Zyngier wrote:
> From: Jintack Lim <jintack.lim@linaro.org>
> 
> Forward the EL1 virtual memory register traps to the virtual EL2 if they
> are not coming from the virtual EL2 and the virtual HCR_EL2.TVM or TRVM
> bit is set.
> 
> This is for recursive nested virtualization.
> 
> Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  arch/arm64/kvm/sys_regs.c | 24 ++++++++++++++++++++++++
>  1 file changed, 24 insertions(+)
> 
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 582d62aa48b7..0f74b9277a86 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -436,6 +436,27 @@ static bool access_dcsw(struct kvm_vcpu *vcpu,
>  	return true;
>  }
>  
> +/* This function is to support the recursive nested virtualization */
> +static bool forward_vm_traps(struct kvm_vcpu *vcpu, struct sys_reg_params *p)
> +{
> +	u64 hcr_el2 = __vcpu_sys_reg(vcpu, HCR_EL2);
> +
> +	/* If a trap comes from the virtual EL2, the host hypervisor handles. */
> +	if (vcpu_mode_el2(vcpu))
> +		return false;
> +
> +	/*
> +	 * If the virtual HCR_EL2.TVM or TRVM bit is set, we need to forward
> +	 * this trap to the virtual EL2.
> +	 */
> +	if ((hcr_el2 & HCR_TVM) && p->is_write)
> +		return true;
> +	else if ((hcr_el2 & HCR_TRVM) && !p->is_write)
> +		return true;
> +
> +	return false;
> +}
> +
>  /*
>   * Generic accessor for VM registers. Only called as long as HCR_TVM
>   * is set. If the guest enables the MMU, we stop trapping the VM
> @@ -452,6 +473,9 @@ static bool access_vm_reg(struct kvm_vcpu *vcpu,
>  	if (el12_reg(p) && forward_nv_traps(vcpu))
>  		return false;
>  
> +	if (!el12_reg(p) && forward_vm_traps(vcpu, p))
> +		return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));

Since we already have forward_traps(), isn't this just:

	if (!el12_reg(p) && forward_traps(vcpu, p->is_write ? HCR_TVM : HCR_TRVM))
		return true;

We could maybe simplify forward_vm_traps() to just call forward_traps()
similar to forward_nv_traps().
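
Something along these lines (a sketch):

	static bool forward_vm_traps(struct kvm_vcpu *vcpu, struct sys_reg_params *p)
	{
		return forward_traps(vcpu, p->is_write ? HCR_TVM : HCR_TRVM);
	}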

Cheers,

Julien

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH 28/59] KVM: arm64: nv: Respect the virtual HCR_EL2.NV1 bit setting
  2019-06-21  9:38 ` [PATCH 28/59] KVM: arm64: nv: Respect the virtual HCR_EL2.NV1 bit setting Marc Zyngier
@ 2019-06-26  7:23   ` Julien Thierry
  2019-07-02 16:32   ` Alexandru Elisei
  1 sibling, 0 replies; 176+ messages in thread
From: Julien Thierry @ 2019-06-26  7:23 UTC (permalink / raw)
  To: Marc Zyngier, linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	James Morse, Suzuki K Poulose



On 06/21/2019 10:38 AM, Marc Zyngier wrote:
> From: Jintack Lim <jintack@cs.columbia.edu>
> 
> Forward ELR_EL1, SPSR_EL1 and VBAR_EL1 traps to the virtual EL2 if the
> virtual HCR_EL2.NV bit is set.
> 
> This is for recursive nested virtualization.
> 
> Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  arch/arm64/include/asm/kvm_arm.h |  1 +
>  arch/arm64/kvm/sys_regs.c        | 19 +++++++++++++++++--
>  2 files changed, 18 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
> index d21486274eeb..55f4525c112c 100644
> --- a/arch/arm64/include/asm/kvm_arm.h
> +++ b/arch/arm64/include/asm/kvm_arm.h
> @@ -24,6 +24,7 @@
>  
>  /* Hyp Configuration Register (HCR) bits */
>  #define HCR_FWB		(UL(1) << 46)
> +#define HCR_NV1		(UL(1) << 43)
>  #define HCR_NV		(UL(1) << 42)
>  #define HCR_API		(UL(1) << 41)
>  #define HCR_APK		(UL(1) << 40)
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 0f74b9277a86..beadebcfc888 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -473,8 +473,10 @@ static bool access_vm_reg(struct kvm_vcpu *vcpu,
>  	if (el12_reg(p) && forward_nv_traps(vcpu))
>  		return false;
>  
> -	if (!el12_reg(p) && forward_vm_traps(vcpu, p))
> -		return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
> +	if (!el12_reg(p) && forward_vm_traps(vcpu, p)) {
> +		kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
> +		return false;

I feel like this change is actually intended to be part of the previous
patch.

Cheers,

Julien

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH 13/59] KVM: arm64: nv: Handle virtual EL2 registers in vcpu_read/write_sys_reg()
  2019-06-21  9:37 ` [PATCH 13/59] KVM: arm64: nv: Handle virtual EL2 registers in vcpu_read/write_sys_reg() Marc Zyngier
  2019-06-24 12:42   ` Julien Thierry
  2019-06-25 15:18   ` Alexandru Elisei
@ 2019-06-26 15:04   ` Alexandru Elisei
  2019-07-04 15:05     ` Marc Zyngier
  2019-07-01 12:10   ` Alexandru Elisei
  3 siblings, 1 reply; 176+ messages in thread
From: Alexandru Elisei @ 2019-06-26 15:04 UTC (permalink / raw)
  To: Marc Zyngier, linux-arm-kernel, kvmarm, kvm; +Cc: Andre Przywara, Dave Martin

On 6/21/19 10:37 AM, Marc Zyngier wrote:
> From: Andre Przywara <andre.przywara@arm.com>
>
> KVM internally uses accessor functions when reading or writing the
> guest's system registers. This takes care of accessing either the stored
> copy or using the "live" EL1 system registers when the host uses VHE.
>
> With the introduction of virtual EL2 we add a bunch of EL2 system
> registers, which now must also be taken care of:
> - If the guest is running in vEL2, and we access an EL1 sysreg, we must
>   revert to the stored version of that, and not use the CPU's copy.
> - If the guest is running in vEL1, and we access an EL2 sysreg, we must
>   also use the stored version, since the CPU carries the EL1 copy.
> - Some EL2 system registers are supposed to affect the current execution
>   of the system, so we need to put them into their respective EL1
>   counterparts. For this we need to define a mapping between the two.
>   This is done using the newly introduced struct el2_sysreg_map.
> - Some EL2 system registers have a different format than their EL1
>   counterpart, so we need to translate them before writing them to the
>   CPU. This is done using an (optional) translate function in the map.
> - There are the three special registers SP_EL2, SPSR_EL2 and ELR_EL2,
>   which need some separate handling.
>
> All of these cases are now wrapped into the existing accessor functions,
> so KVM users wouldn't need to care whether they access EL2 or EL1
> registers and also which state the guest is in.
>
> This handles what was formerly known as the "shadow state" dynamically,
> without requiring a separate copy for each vCPU EL.
>
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  arch/arm64/include/asm/kvm_emulate.h |   6 +
>  arch/arm64/include/asm/kvm_host.h    |   5 +
>  arch/arm64/kvm/sys_regs.c            | 163 +++++++++++++++++++++++++++
>  3 files changed, 174 insertions(+)
>
> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> index c43aac5fed69..f37006b6eec4 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -70,6 +70,12 @@ void kvm_emulate_nested_eret(struct kvm_vcpu *vcpu);
>  int kvm_inject_nested_sync(struct kvm_vcpu *vcpu, u64 esr_el2);
>  int kvm_inject_nested_irq(struct kvm_vcpu *vcpu);
>  
> +u64 translate_tcr(u64 tcr);
> +u64 translate_cptr(u64 tcr);
> +u64 translate_sctlr(u64 tcr);
> +u64 translate_ttbr0(u64 tcr);
> +u64 translate_cnthctl(u64 tcr);
> +
>  static inline bool vcpu_el1_is_32bit(struct kvm_vcpu *vcpu)
>  {
>  	return !(vcpu->arch.hcr_el2 & HCR_RW);
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 2d4290d2513a..dae9c42a7219 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -217,6 +217,11 @@ enum vcpu_sysreg {
>  	NR_SYS_REGS	/* Nothing after this line! */
>  };
>  
> +static inline bool sysreg_is_el2(int reg)
> +{
> +	return reg >= FIRST_EL2_SYSREG && reg < NR_SYS_REGS;
> +}
> +
>  /* 32bit mapping */
>  #define c0_MPIDR	(MPIDR_EL1 * 2)	/* MultiProcessor ID Register */
>  #define c0_CSSELR	(CSSELR_EL1 * 2)/* Cache Size Selection Register */
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 693dd063c9c2..d024114da162 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -76,11 +76,142 @@ static bool write_to_read_only(struct kvm_vcpu *vcpu,
>  	return false;
>  }
>  
> +static u64 tcr_el2_ips_to_tcr_el1_ps(u64 tcr_el2)
> +{
> +	return ((tcr_el2 & TCR_EL2_PS_MASK) >> TCR_EL2_PS_SHIFT)
> +		<< TCR_IPS_SHIFT;
> +}
> +
> +u64 translate_tcr(u64 tcr)
> +{
> +	return TCR_EPD1_MASK |				/* disable TTBR1_EL1 */
> +	       ((tcr & TCR_EL2_TBI) ? TCR_TBI0 : 0) |
> +	       tcr_el2_ips_to_tcr_el1_ps(tcr) |
> +	       (tcr & TCR_EL2_TG0_MASK) |
> +	       (tcr & TCR_EL2_ORGN0_MASK) |
> +	       (tcr & TCR_EL2_IRGN0_MASK) |
> +	       (tcr & TCR_EL2_T0SZ_MASK);
> +}
> +
> +u64 translate_cptr(u64 cptr_el2)
> +{
> +	u64 cpacr_el1 = 0;
> +
> +	if (!(cptr_el2 & CPTR_EL2_TFP))
> +		cpacr_el1 |= CPACR_EL1_FPEN;
> +	if (cptr_el2 & CPTR_EL2_TTA)
> +		cpacr_el1 |= CPACR_EL1_TTA;
> +	if (!(cptr_el2 & CPTR_EL2_TZ))
> +		cpacr_el1 |= CPACR_EL1_ZEN;
> +
> +	return cpacr_el1;
> +}
> +
> +u64 translate_sctlr(u64 sctlr)
> +{
> +	/* Bit 20 is RES1 in SCTLR_EL1, but RES0 in SCTLR_EL2 */
> +	return sctlr | BIT(20);
> +}
> +
> +u64 translate_ttbr0(u64 ttbr0)
> +{
> +	/* Force ASID to 0 (ASID 0 or RES0) */
> +	return ttbr0 & ~GENMASK_ULL(63, 48);
> +}
> +
> +u64 translate_cnthctl(u64 cnthctl)
> +{
> +	return ((cnthctl & 0x3) << 10) | (cnthctl & 0xfc);
> +}
> +
> +#define EL2_SYSREG(el2, el1, translate)	\
> +	[el2 - FIRST_EL2_SYSREG] = { el2, el1, translate }
> +#define PURE_EL2_SYSREG(el2) \
> +	[el2 - FIRST_EL2_SYSREG] = { el2,__INVALID_SYSREG__, NULL }
> +/*
> + * Associate vEL2 registers to their EL1 counterparts on the CPU.
> + * The translate function can be NULL, when the register layout is identical.
> + */
> +struct el2_sysreg_map {
> +	int sysreg;	/* EL2 register index into the array above */
> +	int mapping;	/* associated EL1 register */
> +	u64 (*translate)(u64 value);
> +} nested_sysreg_map[NR_SYS_REGS - FIRST_EL2_SYSREG] = {
> +	PURE_EL2_SYSREG( VPIDR_EL2 ),
> +	PURE_EL2_SYSREG( VMPIDR_EL2 ),
> +	PURE_EL2_SYSREG( ACTLR_EL2 ),
> +	PURE_EL2_SYSREG( HCR_EL2 ),
> +	PURE_EL2_SYSREG( MDCR_EL2 ),
> +	PURE_EL2_SYSREG( HSTR_EL2 ),
> +	PURE_EL2_SYSREG( HACR_EL2 ),
> +	PURE_EL2_SYSREG( VTTBR_EL2 ),
> +	PURE_EL2_SYSREG( VTCR_EL2 ),
> +	PURE_EL2_SYSREG( RVBAR_EL2 ),
> +	PURE_EL2_SYSREG( RMR_EL2 ),
> +	PURE_EL2_SYSREG( TPIDR_EL2 ),
> +	PURE_EL2_SYSREG( CNTVOFF_EL2 ),
> +	PURE_EL2_SYSREG( CNTHCTL_EL2 ),
> +	PURE_EL2_SYSREG( HPFAR_EL2 ),
> +	EL2_SYSREG(      SCTLR_EL2,  SCTLR_EL1,      translate_sctlr ),
> +	EL2_SYSREG(      CPTR_EL2,   CPACR_EL1,      translate_cptr  ),
> +	EL2_SYSREG(      TTBR0_EL2,  TTBR0_EL1,      translate_ttbr0 ),
> +	EL2_SYSREG(      TTBR1_EL2,  TTBR1_EL1,      NULL            ),
> +	EL2_SYSREG(      TCR_EL2,    TCR_EL1,        translate_tcr   ),
> +	EL2_SYSREG(      VBAR_EL2,   VBAR_EL1,       NULL            ),
> +	EL2_SYSREG(      AFSR0_EL2,  AFSR0_EL1,      NULL            ),
> +	EL2_SYSREG(      AFSR1_EL2,  AFSR1_EL1,      NULL            ),
> +	EL2_SYSREG(      ESR_EL2,    ESR_EL1,        NULL            ),
> +	EL2_SYSREG(      FAR_EL2,    FAR_EL1,        NULL            ),
> +	EL2_SYSREG(      MAIR_EL2,   MAIR_EL1,       NULL            ),
> +	EL2_SYSREG(      AMAIR_EL2,  AMAIR_EL1,      NULL            ),
> +};
> +
> +static
> +const struct el2_sysreg_map *find_el2_sysreg(const struct el2_sysreg_map *map,
> +					     int reg)
> +{
> +	const struct el2_sysreg_map *entry;
> +
> +	if (!sysreg_is_el2(reg))
> +		return NULL;
> +
> +	entry = &nested_sysreg_map[reg - FIRST_EL2_SYSREG];
> +	if (entry->sysreg == __INVALID_SYSREG__)
> +		return NULL;
> +
> +	return entry;
> +}
> +
>  u64 vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg)
>  {
> +
>  	if (!vcpu->arch.sysregs_loaded_on_cpu)
>  		goto immediate_read;
>  
> +	if (unlikely(sysreg_is_el2(reg))) {
> +		const struct el2_sysreg_map *el2_reg;
> +
> +		if (!is_hyp_ctxt(vcpu))
> +			goto immediate_read;
> +
> +		el2_reg = find_el2_sysreg(nested_sysreg_map, reg);
> +		if (el2_reg) {
> +			/*
> +			 * If this register does not have an EL1 counterpart,
> +			 * then read the stored EL2 version.
> +			 */
> +			if (el2_reg->mapping == __INVALID_SYSREG__)
> +				goto immediate_read;
> +
> +			/* Get the current version of the EL1 counterpart. */
> +			reg = el2_reg->mapping;
> +		}
> +	} else {
> +		/* EL1 register can't be on the CPU if the guest is in vEL2. */
> +		if (unlikely(is_hyp_ctxt(vcpu)))
> +			goto immediate_read;
> +	}
> +
>  	/*
>  	 * System registers listed in the switch are not saved on every
>  	 * exit from the guest but are only saved on vcpu_put.
> @@ -114,6 +245,8 @@ u64 vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg)
>  	case DACR32_EL2:	return read_sysreg_s(SYS_DACR32_EL2);
>  	case IFSR32_EL2:	return read_sysreg_s(SYS_IFSR32_EL2);
>  	case DBGVCR32_EL2:	return read_sysreg_s(SYS_DBGVCR32_EL2);
> +	case SP_EL2:		return read_sysreg(sp_el1);
From ARM DDI 0487D.b, section Behavior when HCR_EL2.NV == 1: "Reads or writes to
any allocated and implemented System register or Special-purpose register named
*_EL2, *_EL02, or *_EL12 in the MRS or MSR instruction, other than SP_EL2, are
trapped to EL2 rather than being UNDEFINED" (page D5-2480). My interpretation of
the text is that attempted reads of SP_EL2 from virtual EL2 cause an undefined
instruction exception.
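
If that reading is correct, a hypothetical fix would be to drop the SP_EL2
case from these switches and instead inject an UNDEF from the sysreg trap
path, along the lines of:

	/* Sketch: SP_EL2 accesses from virtual EL2 are UNDEFINED */
	kvm_inject_undefined(vcpu);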
> +	case ELR_EL2:		return read_sysreg_el1(SYS_ELR);
>  	}
>  
>  immediate_read:
> @@ -125,6 +258,34 @@ void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, u64 val, int reg)
>  	if (!vcpu->arch.sysregs_loaded_on_cpu)
>  		goto immediate_write;
>  
> +	if (unlikely(sysreg_is_el2(reg))) {
> +		const struct el2_sysreg_map *el2_reg;
> +
> +		if (!is_hyp_ctxt(vcpu))
> +			goto immediate_write;
> +
> +		/* Store the EL2 version in the sysregs array. */
> +		__vcpu_sys_reg(vcpu, reg) = val;
> +
> +		el2_reg = find_el2_sysreg(nested_sysreg_map, reg);
> +		if (el2_reg) {
> +			/* Does this register have an EL1 counterpart? */
> +			if (el2_reg->mapping == __INVALID_SYSREG__)
> +				return;
> +
> +			if (!vcpu_el2_e2h_is_set(vcpu) &&
> +			    el2_reg->translate)
> +				val = el2_reg->translate(val);
> +
> +			/* Redirect this to the EL1 version of the register. */
> +			reg = el2_reg->mapping;
> +		}
> +	} else {
> +		/* EL1 register can't be on the CPU if the guest is in vEL2. */
> +		if (unlikely(is_hyp_ctxt(vcpu)))
> +			goto immediate_write;
> +	}
> +
>  	/*
>  	 * System registers listed in the switch are not restored on every
>  	 * entry to the guest but are only restored on vcpu_load.
> @@ -157,6 +318,8 @@ void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, u64 val, int reg)
>  	case DACR32_EL2:	write_sysreg_s(val, SYS_DACR32_EL2);	return;
>  	case IFSR32_EL2:	write_sysreg_s(val, SYS_IFSR32_EL2);	return;
>  	case DBGVCR32_EL2:	write_sysreg_s(val, SYS_DBGVCR32_EL2);	return;
> +	case SP_EL2:		write_sysreg(val, sp_el1);		return;
> +	case ELR_EL2:		write_sysreg_el1(val, SYS_ELR);		return;
>  	}
>  
>  immediate_write:

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH 15/59] KVM: arm64: nv: Refactor vcpu_{read,write}_sys_reg
  2019-06-21  9:37 ` [PATCH 15/59] KVM: arm64: nv: Refactor vcpu_{read,write}_sys_reg Marc Zyngier
  2019-06-24 15:07   ` Julien Thierry
@ 2019-06-27  9:21   ` Alexandru Elisei
  2019-07-04 15:15     ` Marc Zyngier
  1 sibling, 1 reply; 176+ messages in thread
From: Alexandru Elisei @ 2019-06-27  9:21 UTC (permalink / raw)
  To: Marc Zyngier, linux-arm-kernel, kvmarm, kvm; +Cc: Andre Przywara, Dave Martin

On 6/21/19 10:37 AM, Marc Zyngier wrote:
> Extract the direct HW accessors for later reuse.
>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  arch/arm64/kvm/sys_regs.c | 247 +++++++++++++++++++++-----------------
>  1 file changed, 139 insertions(+), 108 deletions(-)
>
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 2b8734f75a09..e181359adadf 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -182,99 +182,161 @@ const struct el2_sysreg_map *find_el2_sysreg(const struct el2_sysreg_map *map,
>  	return entry;
>  }
>  
> +static bool __vcpu_read_sys_reg_from_cpu(int reg, u64 *val)
> +{
> +	/*
> +	 * System registers listed in the switch are not saved on every
> +	 * exit from the guest but are only saved on vcpu_put.
> +	 *
> +	 * Note that MPIDR_EL1 for the guest is set by KVM via VMPIDR_EL2 but
> +	 * should never be listed below, because the guest cannot modify its
> +	 * own MPIDR_EL1 and MPIDR_EL1 is accessed for VCPU A from VCPU B's
> +	 * thread when emulating cross-VCPU communication.
> +	 */
> +	switch (reg) {
> +	case CSSELR_EL1:	*val = read_sysreg_s(SYS_CSSELR_EL1);	break;
> +	case SCTLR_EL1:		*val = read_sysreg_s(SYS_SCTLR_EL12);	break;
> +	case ACTLR_EL1:		*val = read_sysreg_s(SYS_ACTLR_EL1);	break;
> +	case CPACR_EL1:		*val = read_sysreg_s(SYS_CPACR_EL12);	break;
> +	case TTBR0_EL1:		*val = read_sysreg_s(SYS_TTBR0_EL12);	break;
> +	case TTBR1_EL1:		*val = read_sysreg_s(SYS_TTBR1_EL12);	break;
> +	case TCR_EL1:		*val = read_sysreg_s(SYS_TCR_EL12);	break;
> +	case ESR_EL1:		*val = read_sysreg_s(SYS_ESR_EL12);	break;
> +	case AFSR0_EL1:		*val = read_sysreg_s(SYS_AFSR0_EL12);	break;
> +	case AFSR1_EL1:		*val = read_sysreg_s(SYS_AFSR1_EL12);	break;
> +	case FAR_EL1:		*val = read_sysreg_s(SYS_FAR_EL12);	break;
> +	case MAIR_EL1:		*val = read_sysreg_s(SYS_MAIR_EL12);	break;
> +	case VBAR_EL1:		*val = read_sysreg_s(SYS_VBAR_EL12);	break;
> +	case CONTEXTIDR_EL1:	*val = read_sysreg_s(SYS_CONTEXTIDR_EL12);break;
> +	case TPIDR_EL0:		*val = read_sysreg_s(SYS_TPIDR_EL0);	break;
> +	case TPIDRRO_EL0:	*val = read_sysreg_s(SYS_TPIDRRO_EL0);	break;
> +	case TPIDR_EL1:		*val = read_sysreg_s(SYS_TPIDR_EL1);	break;
> +	case AMAIR_EL1:		*val = read_sysreg_s(SYS_AMAIR_EL12);	break;
> +	case CNTKCTL_EL1:	*val = read_sysreg_s(SYS_CNTKCTL_EL12);	break;
> +	case PAR_EL1:		*val = read_sysreg_s(SYS_PAR_EL1);	break;
> +	case DACR32_EL2:	*val = read_sysreg_s(SYS_DACR32_EL2);	break;
> +	case IFSR32_EL2:	*val = read_sysreg_s(SYS_IFSR32_EL2);	break;
> +	case DBGVCR32_EL2:	*val = read_sysreg_s(SYS_DBGVCR32_EL2);	break;
> +	default:		return false;
> +	}
> +
> +	return true;
> +}
> +
> +static bool __vcpu_write_sys_reg_to_cpu(u64 val, int reg)
> +{
> +	/*
> +	 * System registers listed in the switch are not restored on every
> +	 * entry to the guest but are only restored on vcpu_load.
> +	 *
> +	 * Note that MPIDR_EL1 for the guest is set by KVM via VMPIDR_EL2 but
> +	 * should never be listed below, because the MPIDR should only be
> +	 * set once, before running the VCPU, and never changed later.
> +	 */
> +	switch (reg) {
> +	case CSSELR_EL1:	write_sysreg_s(val, SYS_CSSELR_EL1);	break;
> +	case SCTLR_EL1:		write_sysreg_s(val, SYS_SCTLR_EL12);	break;
> +	case ACTLR_EL1:		write_sysreg_s(val, SYS_ACTLR_EL1);	break;
> +	case CPACR_EL1:		write_sysreg_s(val, SYS_CPACR_EL12);	break;
> +	case TTBR0_EL1:		write_sysreg_s(val, SYS_TTBR0_EL12);	break;
> +	case TTBR1_EL1:		write_sysreg_s(val, SYS_TTBR1_EL12);	break;
> +	case TCR_EL1:		write_sysreg_s(val, SYS_TCR_EL12);	break;
> +	case ESR_EL1:		write_sysreg_s(val, SYS_ESR_EL12);	break;
> +	case AFSR0_EL1:		write_sysreg_s(val, SYS_AFSR0_EL12);	break;
> +	case AFSR1_EL1:		write_sysreg_s(val, SYS_AFSR1_EL12);	break;
> +	case FAR_EL1:		write_sysreg_s(val, SYS_FAR_EL12);	break;
> +	case MAIR_EL1:		write_sysreg_s(val, SYS_MAIR_EL12);	break;
> +	case VBAR_EL1:		write_sysreg_s(val, SYS_VBAR_EL12);	break;
> +	case CONTEXTIDR_EL1:	write_sysreg_s(val, SYS_CONTEXTIDR_EL12);break;
> +	case TPIDR_EL0:		write_sysreg_s(val, SYS_TPIDR_EL0);	break;
> +	case TPIDRRO_EL0:	write_sysreg_s(val, SYS_TPIDRRO_EL0);	break;
> +	case TPIDR_EL1:		write_sysreg_s(val, SYS_TPIDR_EL1);	break;
> +	case AMAIR_EL1:		write_sysreg_s(val, SYS_AMAIR_EL12);	break;
> +	case CNTKCTL_EL1:	write_sysreg_s(val, SYS_CNTKCTL_EL12);	break;
> +	case PAR_EL1:		write_sysreg_s(val, SYS_PAR_EL1);	break;
> +	case DACR32_EL2:	write_sysreg_s(val, SYS_DACR32_EL2);	break;
> +	case IFSR32_EL2:	write_sysreg_s(val, SYS_IFSR32_EL2);	break;
> +	case DBGVCR32_EL2:	write_sysreg_s(val, SYS_DBGVCR32_EL2);	break;
> +	default:		return false;
> +	}
> +
> +	return true;
> +}
> +
>  u64 vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg)
>  {
> -	u64 val;
> +	u64 val = 0x8badf00d8badf00d;
>  
>  	if (!vcpu->arch.sysregs_loaded_on_cpu)
> -		goto immediate_read;
> +		goto memory_read;
>  
>  	if (unlikely(sysreg_is_el2(reg))) {
>  		const struct el2_sysreg_map *el2_reg;
>  
>  		if (!is_hyp_ctxt(vcpu))
> -			goto immediate_read;
> +			goto memory_read;
>  
>  		switch (reg) {
> +		case ELR_EL2:
> +			return read_sysreg_el1(SYS_ELR);
>  		case SPSR_EL2:
>  			val = read_sysreg_el1(SYS_SPSR);
>  			return __fixup_spsr_el2_read(&vcpu->arch.ctxt, val);
>  		}
>  
>  		el2_reg = find_el2_sysreg(nested_sysreg_map, reg);
> -		if (el2_reg) {
> -			/*
> -			 * If this register does not have an EL1 counterpart,
> -			 * then read the stored EL2 version.
> -			 */
> -			if (el2_reg->mapping == __INVALID_SYSREG__)
> -				goto immediate_read;
> -
> -			/* Get the current version of the EL1 counterpart. */
> -			reg = el2_reg->mapping;
> -		}
> -	} else {
> -		/* EL1 register can't be on the CPU if the guest is in vEL2. */
> -		if (unlikely(is_hyp_ctxt(vcpu)))
> -			goto immediate_read;
> +		BUG_ON(!el2_reg);
> +
> +		/*
> +		 * If this register does not have an EL1 counterpart,
> +		 * then read the stored EL2 version.
> +		 */
> +		if (el2_reg->mapping == __INVALID_SYSREG__)
> +			goto memory_read;
> +
> +		if (!vcpu_el2_e2h_is_set(vcpu) &&
> +		    el2_reg->translate)
> +			goto memory_read;

Nit: the condition can be written on one line.
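
i.e.:

	if (!vcpu_el2_e2h_is_set(vcpu) && el2_reg->translate)
		goto memory_read;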

This condition wasn't present in patch 13 which introduced EL2 register
handling, and I'm struggling to understand what it does. As I understand the
code, this condition basically translates into:

- if the register is one of SCTLR_EL2, TTBR0_EL2, CPTR_EL2 or TCR_EL2, then read
it from memory.

- if the register is an EL2 register whose value is written unmodified to the
corresponding EL1 register, then read the corresponding EL1 register and return
that value.

Looking at vcpu_write_sys_reg, the values of the EL2 registers are always
saved in memory. In this case the guest hypervisor is non-VHE (E2H clear), so
writes to the EL1 registers shouldn't be reflected in the corresponding EL2
registers. I think it's safe to always return the value from memory.

I tried testing this with the following patch:

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 1235a88ec575..27d39bb9564d 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -290,6 +290,9 @@ u64 vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg)
                el2_reg = find_el2_sysreg(nested_sysreg_map, reg);
                BUG_ON(!el2_reg);
 
+               if (!vcpu_el2_e2h_is_set(vcpu))
+                       goto memory_read;
+
                /*
                 * If this register does not have an EL1 counterpart,
                 * then read the stored EL2 version.
@@ -297,10 +300,6 @@ u64 vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg)
                if (el2_reg->mapping == __INVALID_SYSREG__)
                        goto memory_read;
 
-               if (!vcpu_el2_e2h_is_set(vcpu) &&
-                   el2_reg->translate)
-                       goto memory_read;
-
                /* Get the current version of the EL1 counterpart. */
                reg = el2_reg->mapping;
                WARN_ON(!__vcpu_read_sys_reg_from_cpu(reg, &val));

I know it's not conclusive, but I was able to boot an L2 guest under an L1
non-VHE hypervisor.

> +
> +		/* Get the current version of the EL1 counterpart. */
> +		reg = el2_reg->mapping;
> +		WARN_ON(!__vcpu_read_sys_reg_from_cpu(reg, &val));
> +		return val;
>  	}
>  
> -	/*
> -	 * System registers listed in the switch are not saved on every
> -	 * exit from the guest but are only saved on vcpu_put.
> -	 *
> -	 * Note that MPIDR_EL1 for the guest is set by KVM via VMPIDR_EL2 but
> -	 * should never be listed below, because the guest cannot modify its
> -	 * own MPIDR_EL1 and MPIDR_EL1 is accessed for VCPU A from VCPU B's
> -	 * thread when emulating cross-VCPU communication.
> -	 */
> -	switch (reg) {
> -	case CSSELR_EL1:	return read_sysreg_s(SYS_CSSELR_EL1);
> -	case SCTLR_EL1:		return read_sysreg_s(SYS_SCTLR_EL12);
> -	case ACTLR_EL1:		return read_sysreg_s(SYS_ACTLR_EL1);
> -	case CPACR_EL1:		return read_sysreg_s(SYS_CPACR_EL12);
> -	case TTBR0_EL1:		return read_sysreg_s(SYS_TTBR0_EL12);
> -	case TTBR1_EL1:		return read_sysreg_s(SYS_TTBR1_EL12);
> -	case TCR_EL1:		return read_sysreg_s(SYS_TCR_EL12);
> -	case ESR_EL1:		return read_sysreg_s(SYS_ESR_EL12);
> -	case AFSR0_EL1:		return read_sysreg_s(SYS_AFSR0_EL12);
> -	case AFSR1_EL1:		return read_sysreg_s(SYS_AFSR1_EL12);
> -	case FAR_EL1:		return read_sysreg_s(SYS_FAR_EL12);
> -	case MAIR_EL1:		return read_sysreg_s(SYS_MAIR_EL12);
> -	case VBAR_EL1:		return read_sysreg_s(SYS_VBAR_EL12);
> -	case CONTEXTIDR_EL1:	return read_sysreg_s(SYS_CONTEXTIDR_EL12);
> -	case TPIDR_EL0:		return read_sysreg_s(SYS_TPIDR_EL0);
> -	case TPIDRRO_EL0:	return read_sysreg_s(SYS_TPIDRRO_EL0);
> -	case TPIDR_EL1:		return read_sysreg_s(SYS_TPIDR_EL1);
> -	case AMAIR_EL1:		return read_sysreg_s(SYS_AMAIR_EL12);
> -	case CNTKCTL_EL1:	return read_sysreg_s(SYS_CNTKCTL_EL12);
> -	case PAR_EL1:		return read_sysreg_s(SYS_PAR_EL1);
> -	case DACR32_EL2:	return read_sysreg_s(SYS_DACR32_EL2);
> -	case IFSR32_EL2:	return read_sysreg_s(SYS_IFSR32_EL2);
> -	case DBGVCR32_EL2:	return read_sysreg_s(SYS_DBGVCR32_EL2);
> -	case SP_EL2:		return read_sysreg(sp_el1);
> -	case ELR_EL2:		return read_sysreg_el1(SYS_ELR);
> -	}
> +	/* EL1 register can't be on the CPU if the guest is in vEL2. */
> +	if (unlikely(is_hyp_ctxt(vcpu)))
> +		goto memory_read;
> +
> +	if (__vcpu_read_sys_reg_from_cpu(reg, &val))
> +		return val;
>  
> -immediate_read:
> +memory_read:
>  	return __vcpu_sys_reg(vcpu, reg);
>  }
>  
>  void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, u64 val, int reg)
>  {
>  	if (!vcpu->arch.sysregs_loaded_on_cpu)
> -		goto immediate_write;
> +		goto memory_write;
>  
>  	if (unlikely(sysreg_is_el2(reg))) {
>  		const struct el2_sysreg_map *el2_reg;
>  
>  		if (!is_hyp_ctxt(vcpu))
> -			goto immediate_write;
> +			goto memory_write;
>  
> -		/* Store the EL2 version in the sysregs array. */
> +		/*
> +		 * Always store a copy of the write to memory to avoid having
> +		 * to reverse-translate virtual EL2 system registers for a
> +		 * non-VHE guest hypervisor.
> +		 */
>  		__vcpu_sys_reg(vcpu, reg) = val;
>  
>  		switch (reg) {
> +		case ELR_EL2:
> +			write_sysreg_el1(val, SYS_ELR);
> +			return;
>  		case SPSR_EL2:
>  			val = __fixup_spsr_el2_write(&vcpu->arch.ctxt, val);
>  			write_sysreg_el1(val, SYS_SPSR);
> @@ -282,61 +344,30 @@ void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, u64 val, int reg)
>  		}
>  
>  		el2_reg = find_el2_sysreg(nested_sysreg_map, reg);
> -		if (el2_reg) {
> -			/* Does this register have an EL1 counterpart? */
> -			if (el2_reg->mapping == __INVALID_SYSREG__)
> -				return;
> +		WARN(!el2_reg, "reg: %d\n", reg);
>  
> -			if (!vcpu_el2_e2h_is_set(vcpu) &&
> -			    el2_reg->translate)
> -				val = el2_reg->translate(val);
> +		/* Does this register have an EL1 counterpart? */
> +		if (el2_reg->mapping == __INVALID_SYSREG__)
> +			goto memory_write;
>  
> -			/* Redirect this to the EL1 version of the register. */
> -			reg = el2_reg->mapping;
> -		}
> -	} else {
> -		/* EL1 register can't be on the CPU if the guest is in vEL2. */
> -		if (unlikely(is_hyp_ctxt(vcpu)))
> -			goto immediate_write;
> -	}
> +		if (!vcpu_el2_e2h_is_set(vcpu) &&
> +		    el2_reg->translate)
> +			val = el2_reg->translate(val);
>  
> -	/*
> -	 * System registers listed in the switch are not restored on every
> -	 * entry to the guest but are only restored on vcpu_load.
> -	 *
> -	 * Note that MPIDR_EL1 for the guest is set by KVM via VMPIDR_EL2 but
> -	 * should never be listed below, because the the MPIDR should only be
> -	 * set once, before running the VCPU, and never changed later.
> -	 */
> -	switch (reg) {
> -	case CSSELR_EL1:	write_sysreg_s(val, SYS_CSSELR_EL1);	return;
> -	case SCTLR_EL1:		write_sysreg_s(val, SYS_SCTLR_EL12);	return;
> -	case ACTLR_EL1:		write_sysreg_s(val, SYS_ACTLR_EL1);	return;
> -	case CPACR_EL1:		write_sysreg_s(val, SYS_CPACR_EL12);	return;
> -	case TTBR0_EL1:		write_sysreg_s(val, SYS_TTBR0_EL12);	return;
> -	case TTBR1_EL1:		write_sysreg_s(val, SYS_TTBR1_EL12);	return;
> -	case TCR_EL1:		write_sysreg_s(val, SYS_TCR_EL12);	return;
> -	case ESR_EL1:		write_sysreg_s(val, SYS_ESR_EL12);	return;
> -	case AFSR0_EL1:		write_sysreg_s(val, SYS_AFSR0_EL12);	return;
> -	case AFSR1_EL1:		write_sysreg_s(val, SYS_AFSR1_EL12);	return;
> -	case FAR_EL1:		write_sysreg_s(val, SYS_FAR_EL12);	return;
> -	case MAIR_EL1:		write_sysreg_s(val, SYS_MAIR_EL12);	return;
> -	case VBAR_EL1:		write_sysreg_s(val, SYS_VBAR_EL12);	return;
> -	case CONTEXTIDR_EL1:	write_sysreg_s(val, SYS_CONTEXTIDR_EL12); return;
> -	case TPIDR_EL0:		write_sysreg_s(val, SYS_TPIDR_EL0);	return;
> -	case TPIDRRO_EL0:	write_sysreg_s(val, SYS_TPIDRRO_EL0);	return;
> -	case TPIDR_EL1:		write_sysreg_s(val, SYS_TPIDR_EL1);	return;
> -	case AMAIR_EL1:		write_sysreg_s(val, SYS_AMAIR_EL12);	return;
> -	case CNTKCTL_EL1:	write_sysreg_s(val, SYS_CNTKCTL_EL12);	return;
> -	case PAR_EL1:		write_sysreg_s(val, SYS_PAR_EL1);	return;
> -	case DACR32_EL2:	write_sysreg_s(val, SYS_DACR32_EL2);	return;
> -	case IFSR32_EL2:	write_sysreg_s(val, SYS_IFSR32_EL2);	return;
> -	case DBGVCR32_EL2:	write_sysreg_s(val, SYS_DBGVCR32_EL2);	return;
> -	case SP_EL2:		write_sysreg(val, sp_el1);		return;
> -	case ELR_EL2:		write_sysreg_el1(val, SYS_ELR);		return;
> +		/* Redirect this to the EL1 version of the register. */
> +		reg = el2_reg->mapping;
> +		WARN_ON(!__vcpu_write_sys_reg_to_cpu(val, reg));
> +		return;
>  	}
>  
> -immediate_write:
> +	/* EL1 register can't be on the CPU if the guest is in vEL2. */
> +	if (unlikely(is_hyp_ctxt(vcpu)))
> +		goto memory_write;
> +
> +	if (__vcpu_write_sys_reg_to_cpu(val, reg))
> +		return;
> +
> +memory_write:
>  	 __vcpu_sys_reg(vcpu, reg) = val;
>  }
>  

^ permalink raw reply related	[flat|nested] 176+ messages in thread

* Re: [PATCH 35/59] KVM: arm/arm64: nv: Support multiple nested stage 2 mmu structures
  2019-06-21  9:38 ` [PATCH 35/59] KVM: arm/arm64: nv: Support multiple nested stage 2 mmu structures Marc Zyngier
  2019-06-25 12:19   ` Alexandru Elisei
@ 2019-06-27 13:15   ` Julien Thierry
  2019-07-04 15:51   ` Alexandru Elisei
  2 siblings, 0 replies; 176+ messages in thread
From: Julien Thierry @ 2019-06-27 13:15 UTC (permalink / raw)
  To: Marc Zyngier, linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	James Morse, Suzuki K Poulose



On 06/21/2019 10:38 AM, Marc Zyngier wrote:
> From: Christoffer Dall <christoffer.dall@arm.com>
> 
> Add stage 2 mmu data structures for virtual EL2 and for nested guests.
> We don't yet populate shadow stage 2 page tables, but we now have a
> framework for getting to a shadow stage 2 pgd.
> 
> We allocate twice the number of vcpus as stage 2 mmu structures because
> that's sufficient for each vcpu running two VMs without having to flush
> the stage 2 page tables.
> 
> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  arch/arm/include/asm/kvm_host.h     |   4 +
>  arch/arm/include/asm/kvm_mmu.h      |   3 +
>  arch/arm64/include/asm/kvm_host.h   |  28 +++++
>  arch/arm64/include/asm/kvm_mmu.h    |   8 ++
>  arch/arm64/include/asm/kvm_nested.h |   7 ++
>  arch/arm64/kvm/nested.c             | 172 ++++++++++++++++++++++++++++
>  virt/kvm/arm/arm.c                  |  16 ++-
>  virt/kvm/arm/mmu.c                  |  31 ++---
>  8 files changed, 254 insertions(+), 15 deletions(-)
> 
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index e3217c4ad25b..b821eb2383ad 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -424,4 +424,8 @@ static inline bool kvm_arm_vcpu_is_finalized(struct kvm_vcpu *vcpu)
>  	return true;
>  }
>  
> +static inline void kvm_vcpu_load_hw_mmu(struct kvm_vcpu *vcpu) {}
> +static inline void kvm_vcpu_put_hw_mmu(struct kvm_vcpu *vcpu) {}
> +static inline int kvm_vcpu_init_nested(struct kvm_vcpu *vcpu) { return 0; }
> +
>  #endif /* __ARM_KVM_HOST_H__ */
> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
> index be23e3f8e08c..e6984b6da2ce 100644
> --- a/arch/arm/include/asm/kvm_mmu.h
> +++ b/arch/arm/include/asm/kvm_mmu.h
> @@ -420,6 +420,9 @@ static inline int hyp_map_aux_data(void)
>  
>  static inline void kvm_set_ipa_limit(void) {}
>  
> +static inline void kvm_init_s2_mmu(struct kvm_s2_mmu *mmu) {}
> +static inline void kvm_init_nested(struct kvm *kvm) {}
> +
>  static __always_inline u64 kvm_get_vttbr(struct kvm_s2_mmu *mmu)
>  {
>  	struct kvm_vmid *vmid = &mmu->vmid;
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 3dee5e17a4ee..cc238de170d2 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -88,11 +88,39 @@ struct kvm_s2_mmu {
>  	phys_addr_t	pgd_phys;
>  
>  	struct kvm *kvm;
> +
> +	/*
> +	 * For a shadow stage-2 MMU, the virtual vttbr programmed by the guest
> +	 * hypervisor.  Unused for kvm_arch->mmu. Set to 1 when the structure
> +	 * contains no valid information.
> +	 */
> +	u64	vttbr;
> +
> +	/* true when this represents a nested context where virtual HCR_EL2.VM == 1 */
> +	bool	nested_stage2_enabled;
> +
> +	/*
> +	 *  0: Nobody is currently using this, check vttbr for validity
> +	 * >0: Somebody is actively using this.
> +	 */
> +	atomic_t refcnt;
>  };
>  
> +static inline bool kvm_s2_mmu_valid(struct kvm_s2_mmu *mmu)
> +{
> +	return !(mmu->vttbr & 1);
> +}
> +
>  struct kvm_arch {
>  	struct kvm_s2_mmu mmu;
>  
> +	/*
> +	 * Stage 2 paging state for VMs with nested virtualization, using a
> +	 * virtual VMID.
> +	 */
> +	struct kvm_s2_mmu *nested_mmus;
> +	size_t nested_mmus_size;
> +
>  	/* VTCR_EL2 value for this VM */
>  	u64    vtcr;
>  
> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> index 1eb6e0ca61c2..32bcaa1845dc 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -100,6 +100,7 @@ alternative_cb_end
>  #include <asm/mmu_context.h>
>  #include <asm/pgtable.h>
>  #include <asm/kvm_emulate.h>
> +#include <asm/kvm_nested.h>
>  
>  void kvm_update_va_mask(struct alt_instr *alt,
>  			__le32 *origptr, __le32 *updptr, int nr_inst);
> @@ -164,6 +165,7 @@ int create_hyp_exec_mappings(phys_addr_t phys_addr, size_t size,
>  			     void **haddr);
>  void free_hyp_pgds(void);
>  
> +void kvm_unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size);
>  void stage2_unmap_vm(struct kvm *kvm);
>  int kvm_alloc_stage2_pgd(struct kvm_s2_mmu *mmu);
>  void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu);
> @@ -635,5 +637,11 @@ static __always_inline void __load_guest_stage2(struct kvm_s2_mmu *mmu)
>  	asm(ALTERNATIVE("nop", "isb", ARM64_WORKAROUND_1165522));
>  }
>  
> +static inline u64 get_vmid(u64 vttbr)
> +{
> +	return (vttbr & VTTBR_VMID_MASK(kvm_get_vmid_bits())) >>
> +		VTTBR_VMID_SHIFT;
> +}
> +
>  #endif /* __ASSEMBLY__ */
>  #endif /* __ARM64_KVM_MMU_H__ */
> diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
> index 61e71d0d2151..d4021d0892bd 100644
> --- a/arch/arm64/include/asm/kvm_nested.h
> +++ b/arch/arm64/include/asm/kvm_nested.h
> @@ -10,6 +10,13 @@ static inline bool nested_virt_in_use(const struct kvm_vcpu *vcpu)
>  		test_bit(KVM_ARM_VCPU_NESTED_VIRT, vcpu->arch.features);
>  }
>  
> +extern void kvm_init_nested(struct kvm *kvm);
> +extern int kvm_vcpu_init_nested(struct kvm_vcpu *vcpu);
> +extern void kvm_init_s2_mmu(struct kvm_s2_mmu *mmu);
> +extern struct kvm_s2_mmu *lookup_s2_mmu(struct kvm *kvm, u64 vttbr, u64 hcr);
> +extern void kvm_vcpu_load_hw_mmu(struct kvm_vcpu *vcpu);
> +extern void kvm_vcpu_put_hw_mmu(struct kvm_vcpu *vcpu);
> +
>  int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe);
>  extern bool forward_traps(struct kvm_vcpu *vcpu, u64 control_bit);
>  extern bool forward_nv_traps(struct kvm_vcpu *vcpu);
> diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
> index 3872e3cf1691..4b38dc5c0be3 100644
> --- a/arch/arm64/kvm/nested.c
> +++ b/arch/arm64/kvm/nested.c
> @@ -18,7 +18,161 @@
>  #include <linux/kvm.h>
>  #include <linux/kvm_host.h>
>  
> +#include <asm/kvm_arm.h>
>  #include <asm/kvm_emulate.h>
> +#include <asm/kvm_mmu.h>
> +#include <asm/kvm_nested.h>
> +
> +void kvm_init_nested(struct kvm *kvm)
> +{
> +	kvm_init_s2_mmu(&kvm->arch.mmu);
> +
> +	kvm->arch.nested_mmus = NULL;
> +	kvm->arch.nested_mmus_size = 0;
> +}
> +
> +int kvm_vcpu_init_nested(struct kvm_vcpu *vcpu)
> +{
> +	struct kvm *kvm = vcpu->kvm;
> +	struct kvm_s2_mmu *tmp;
> +	int num_mmus;
> +	int ret = -ENOMEM;
> +
> +	if (!test_bit(KVM_ARM_VCPU_NESTED_VIRT, vcpu->arch.features))
> +		return 0;
> +
> +	if (!cpus_have_const_cap(ARM64_HAS_NESTED_VIRT))
> +		return -EINVAL;
> +
> +	mutex_lock(&kvm->lock);
> +
> +	num_mmus = atomic_read(&kvm->online_vcpus) * 2;
> +	tmp = __krealloc(kvm->arch.nested_mmus,
> +			 num_mmus * sizeof(*kvm->arch.nested_mmus),
> +			 GFP_KERNEL | __GFP_ZERO);
> +
> +	if (tmp) {
> +		if (tmp != kvm->arch.nested_mmus)
> +			kfree(kvm->arch.nested_mmus);
> +
> +		tmp[num_mmus - 1].kvm = kvm;
> +		atomic_set(&tmp[num_mmus - 1].refcnt, 0);
> +		ret = kvm_alloc_stage2_pgd(&tmp[num_mmus - 1]);
> +		if (ret)
> +			goto out;
> +
> +		tmp[num_mmus - 2].kvm = kvm;
> +		atomic_set(&tmp[num_mmus - 2].refcnt, 0);
> +		ret = kvm_alloc_stage2_pgd(&tmp[num_mmus - 2]);
> +		if (ret) {
> +			kvm_free_stage2_pgd(&tmp[num_mmus - 1]);
> +			goto out;
> +		}
> +
> +		kvm->arch.nested_mmus_size = num_mmus;
> +		kvm->arch.nested_mmus = tmp;
> +		tmp = NULL;
> +	}
> +
> +out:
> +	kfree(tmp);
> +	mutex_unlock(&kvm->lock);
> +	return ret;
> +}
> +
> +/* Must be called with kvm->lock held */
> +struct kvm_s2_mmu *lookup_s2_mmu(struct kvm *kvm, u64 vttbr, u64 hcr)
> +{
> +	bool nested_stage2_enabled = hcr & HCR_VM;
> +	int i;
> +
> +	/* Don't consider the CnP bit for the vttbr match */
> +	vttbr = vttbr & ~1UL;
> +
> +	/* Search a mmu in the list using the virtual VMID as a key */
> +	for (i = 0; i < kvm->arch.nested_mmus_size; i++) {
> +		struct kvm_s2_mmu *mmu = &kvm->arch.nested_mmus[i];
> +
> +		if (!kvm_s2_mmu_valid(mmu))
> +			continue;
> +
> +		if (nested_stage2_enabled &&
> +		    mmu->nested_stage2_enabled &&
> +		    vttbr == mmu->vttbr)
> +			return mmu;
> +
> +		if (!nested_stage2_enabled &&
> +		    !mmu->nested_stage2_enabled &&
> +		    get_vmid(vttbr) == get_vmid(mmu->vttbr))
> +			return mmu;
> +	}
> +	return NULL;
> +}
> +
> +static struct kvm_s2_mmu *get_s2_mmu_nested(struct kvm_vcpu *vcpu)
> +{
> +	struct kvm *kvm = vcpu->kvm;
> +	u64 vttbr = vcpu_read_sys_reg(vcpu, VTTBR_EL2);
> +	u64 hcr = vcpu_read_sys_reg(vcpu, HCR_EL2);
> +	struct kvm_s2_mmu *s2_mmu;
> +	int i;
> +
> +	s2_mmu = lookup_s2_mmu(kvm, vttbr, hcr);
> +	if (s2_mmu)
> +		goto out;
> +
> +	for (i = 0; i < kvm->arch.nested_mmus_size; i++) {
> +		s2_mmu = &kvm->arch.nested_mmus[i];
> +
> +		if (atomic_read(&s2_mmu->refcnt) == 0)
> +			break;
> +	}
> +	BUG_ON(atomic_read(&s2_mmu->refcnt)); /* We have struct MMUs to spare */
> +
> +	if (kvm_s2_mmu_valid(s2_mmu)) {
> +		/* Clear the old state */
> +		kvm_unmap_stage2_range(s2_mmu, 0, kvm_phys_size(kvm));
> +		if (s2_mmu->vmid.vmid_gen)
> +			kvm_call_hyp(__kvm_tlb_flush_vmid, s2_mmu);
> +	}
> +
> +	/*
> +	 * The virtual VMID (modulo CnP) will be used as a key when matching
> +	 * an existing kvm_s2_mmu.
> +	 */
> +	s2_mmu->vttbr = vttbr & ~1UL;
> +	s2_mmu->nested_stage2_enabled = hcr & HCR_VM;
> +
> +out:
> +	atomic_inc(&s2_mmu->refcnt);
> +	return s2_mmu;
> +}
> +
> +void kvm_init_s2_mmu(struct kvm_s2_mmu *mmu)
> +{
> +	mmu->vttbr = 1;
> +	mmu->nested_stage2_enabled = false;
> +	atomic_set(&mmu->refcnt, 0);
> +}
> +
> +void kvm_vcpu_load_hw_mmu(struct kvm_vcpu *vcpu)
> +{
> +	if (is_hyp_ctxt(vcpu)) {
> +		vcpu->arch.hw_mmu = &vcpu->kvm->arch.mmu;
> +	} else {
> +		spin_lock(&vcpu->kvm->mmu_lock);

For the allocation + initialization of s2 mmus, kvm->lock is taken in
kvm_vcpu_init_nested(). But here we take kvm->mmu_lock.

Are we in trouble? Or are we expecting
get_s2_mmu_nested()/lookup_s2_mmu() to be called only after
kvm_vcpu_init_nested() has completed on all vcpus of the VM? Otherwise
we could end up using the old kvm->arch.nested_mmus array after it has
been freed and before the pointer is updated with the new allocation.

(I feel we should be taking kvm->mmu_lock in kvm_vcpu_init_nested().)
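
Something along these lines (a completely untested sketch, reusing the
names from this patch) is what I have in mind:

	/* in kvm_vcpu_init_nested(), once the new array is fully set up: */
	struct kvm_s2_mmu *old = kvm->arch.nested_mmus;

	/* Publish the new array under the mmu_lock... */
	spin_lock(&kvm->mmu_lock);
	kvm->arch.nested_mmus = tmp;
	kvm->arch.nested_mmus_size = num_mmus;
	spin_unlock(&kvm->mmu_lock);

	/* ...and only free the old one afterwards. */
	if (old != tmp)
		kfree(old);

That way get_s2_mmu_nested() (and lookup_s2_mmu() called from it), which
runs under kvm->mmu_lock in kvm_vcpu_load_hw_mmu(), can never observe the
array between the kfree() and the pointer update.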

> +		vcpu->arch.hw_mmu = get_s2_mmu_nested(vcpu);
> +		spin_unlock(&vcpu->kvm->mmu_lock);
> +	}
> +}
> +
> +void kvm_vcpu_put_hw_mmu(struct kvm_vcpu *vcpu)
> +{
> +	if (vcpu->arch.hw_mmu != &vcpu->kvm->arch.mmu) {
> +		atomic_dec(&vcpu->arch.hw_mmu->refcnt);
> +		vcpu->arch.hw_mmu = NULL;
> +	}
> +}
>  
>  /*
>   * Inject wfx to the virtual EL2 if this is not from the virtual EL2 and
> @@ -37,3 +191,21 @@ int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe)
>  
>  	return -EINVAL;
>  }
> +
> +void kvm_arch_flush_shadow_all(struct kvm *kvm)
> +{
> +	int i;
> +
> +	for (i = 0; i < kvm->arch.nested_mmus_size; i++) {
> +		struct kvm_s2_mmu *mmu = &kvm->arch.nested_mmus[i];
> +
> +		WARN_ON(atomic_read(&mmu->refcnt));
> +
> +		if (!atomic_read(&mmu->refcnt))
> +			kvm_free_stage2_pgd(mmu);
> +	}
> +	kfree(kvm->arch.nested_mmus);
> +	kvm->arch.nested_mmus = NULL;
> +	kvm->arch.nested_mmus_size = 0;

Don't we also need to take the lock before modifying those? (Apparently
we're killing the VM, so there shouldn't be any other users, but I just
want to make sure...)
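
If we do want to be defensive here, a minimal (untested) sketch would be
to serialize against kvm_vcpu_init_nested() with the same mutex:

	void kvm_arch_flush_shadow_all(struct kvm *kvm)
	{
		int i;

		mutex_lock(&kvm->lock);
		for (i = 0; i < kvm->arch.nested_mmus_size; i++) {
			struct kvm_s2_mmu *mmu = &kvm->arch.nested_mmus[i];

			WARN_ON(atomic_read(&mmu->refcnt));

			if (!atomic_read(&mmu->refcnt))
				kvm_free_stage2_pgd(mmu);
		}
		kfree(kvm->arch.nested_mmus);
		kvm->arch.nested_mmus = NULL;
		kvm->arch.nested_mmus_size = 0;
		mutex_unlock(&kvm->lock);
	}

(Note that kvm->mmu_lock wouldn't work here, since kvm_free_stage2_pgd()
takes it itself.)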

Cheers,

-- 
Julien Thierry


* Re: [PATCH 38/59] KVM: arm64: nv: Unmap/flush shadow stage 2 page tables
  2019-06-21  9:38 ` [PATCH 38/59] KVM: arm64: nv: Unmap/flush shadow stage 2 page tables Marc Zyngier
@ 2019-07-01  8:03   ` Julien Thierry
  0 siblings, 0 replies; 176+ messages in thread
From: Julien Thierry @ 2019-07-01  8:03 UTC (permalink / raw)
  To: Marc Zyngier, linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	James Morse, Suzuki K Poulose



On 21/06/2019 10:38, Marc Zyngier wrote:
> From: Christoffer Dall <christoffer.dall@linaro.org>
> 
> Unmap/flush shadow stage 2 page tables for the nested VMs as well as the
> stage 2 page table for the guest hypervisor.
> 
> Note: A bunch of the code in mmu.c relating to MMU notifiers is
> currently dealt with in an extremely abrupt way, for example by clearing
> out an entire shadow stage-2 table. This will be handled in a more
> efficient way using the reverse mapping feature in a later version of
> the patch series.
> 
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  arch/arm64/include/asm/kvm_mmu.h    |  3 +++
>  arch/arm64/include/asm/kvm_nested.h |  3 +++
>  arch/arm64/kvm/nested.c             | 39 +++++++++++++++++++++++++++
>  virt/kvm/arm/arm.c                  |  4 ++-
>  virt/kvm/arm/mmu.c                  | 42 +++++++++++++++++++++++------
>  5 files changed, 82 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> index 32bcaa1845dc..f4c5ac5eb95f 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -163,6 +163,8 @@ int create_hyp_io_mappings(phys_addr_t phys_addr, size_t size,
>  			   void __iomem **haddr);
>  int create_hyp_exec_mappings(phys_addr_t phys_addr, size_t size,
>  			     void **haddr);
> +void kvm_stage2_flush_range(struct kvm_s2_mmu *mmu,
> +			    phys_addr_t addr, phys_addr_t end);
>  void free_hyp_pgds(void);
>  
>  void kvm_unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size);
> @@ -171,6 +173,7 @@ int kvm_alloc_stage2_pgd(struct kvm_s2_mmu *mmu);
>  void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu);
>  int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
>  			  phys_addr_t pa, unsigned long size, bool writable);
> +void kvm_stage2_wp_range(struct kvm_s2_mmu *mmu, phys_addr_t addr, phys_addr_t end);
>  
>  int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run);
>  
> diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
> index 052d46d96201..3b415bc76ced 100644
> --- a/arch/arm64/include/asm/kvm_nested.h
> +++ b/arch/arm64/include/asm/kvm_nested.h
> @@ -48,6 +48,9 @@ extern int kvm_walk_nested_s2(struct kvm_vcpu *vcpu, phys_addr_t gipa,
>  extern int kvm_s2_handle_perm_fault(struct kvm_vcpu *vcpu,
>  				    struct kvm_s2_trans *trans);
>  extern int kvm_inject_s2_fault(struct kvm_vcpu *vcpu, u64 esr_el2);
> +extern void kvm_nested_s2_wp(struct kvm *kvm);
> +extern void kvm_nested_s2_clear(struct kvm *kvm);
> +extern void kvm_nested_s2_flush(struct kvm *kvm);
>  int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe);
>  extern bool forward_traps(struct kvm_vcpu *vcpu, u64 control_bit);
>  extern bool forward_nv_traps(struct kvm_vcpu *vcpu);
> diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
> index 023027fa2db5..8880033fb6e0 100644
> --- a/arch/arm64/kvm/nested.c
> +++ b/arch/arm64/kvm/nested.c
> @@ -456,6 +456,45 @@ int kvm_inject_s2_fault(struct kvm_vcpu *vcpu, u64 esr_el2)
>  	return kvm_inject_nested_sync(vcpu, esr_el2);
>  }
>  
> +/* expects kvm->mmu_lock to be held */
> +void kvm_nested_s2_wp(struct kvm *kvm)
> +{
> +	int i;
> +
> +	for (i = 0; i < kvm->arch.nested_mmus_size; i++) {
> +		struct kvm_s2_mmu *mmu = &kvm->arch.nested_mmus[i];
> +
> +		if (kvm_s2_mmu_valid(mmu))
> +			kvm_stage2_wp_range(mmu, 0, kvm_phys_size(kvm));
> +	}
> +}
> +
> +/* expects kvm->mmu_lock to be held */
> +void kvm_nested_s2_clear(struct kvm *kvm)
> +{
> +	int i;
> +
> +	for (i = 0; i < kvm->arch.nested_mmus_size; i++) {
> +		struct kvm_s2_mmu *mmu = &kvm->arch.nested_mmus[i];
> +
> +		if (kvm_s2_mmu_valid(mmu))
> +			kvm_unmap_stage2_range(mmu, 0, kvm_phys_size(kvm));
> +	}
> +}
> +
> +/* expects kvm->mmu_lock to be held */
> +void kvm_nested_s2_flush(struct kvm *kvm)
> +{
> +	int i;
> +
> +	for (i = 0; i < kvm->arch.nested_mmus_size; i++) {
> +		struct kvm_s2_mmu *mmu = &kvm->arch.nested_mmus[i];
> +
> +		if (kvm_s2_mmu_valid(mmu))
> +			kvm_stage2_flush_range(mmu, 0, kvm_phys_size(kvm));
> +	}
> +}
> +
>  /*
>   * Inject wfx to the virtual EL2 if this is not from the virtual EL2 and
>   * the virtual HCR_EL2.TWX is set. Otherwise, let the host hypervisor
> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> index 4e3cbfa1ecbe..bcca27d5c481 100644
> --- a/virt/kvm/arm/arm.c
> +++ b/virt/kvm/arm/arm.c
> @@ -1005,8 +1005,10 @@ static int kvm_arch_vcpu_ioctl_vcpu_init(struct kvm_vcpu *vcpu,
>  	 * Ensure a rebooted VM will fault in RAM pages and detect if the
>  	 * guest MMU is turned off and flush the caches as needed.
>  	 */
> -	if (vcpu->arch.has_run_once)
> +	if (vcpu->arch.has_run_once) {
>  		stage2_unmap_vm(vcpu->kvm);
> +		kvm_nested_s2_clear(vcpu->kvm);

The comment above kvm_nested_s2_clear() states that kvm->mmu_lock needs
to be held, but at this point it isn't (stage2_unmap_vm() acquires the
lock and releases it internally).
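
i.e. something like this (untested) at the call site:

	if (vcpu->arch.has_run_once) {
		stage2_unmap_vm(vcpu->kvm);

		/* Hold mmu_lock across the nested S2 teardown, as the
		 * comment above kvm_nested_s2_clear() requires. */
		spin_lock(&vcpu->kvm->mmu_lock);
		kvm_nested_s2_clear(vcpu->kvm);
		spin_unlock(&vcpu->kvm->mmu_lock);
	}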

Cheers,

-- 
Julien Thierry


* Re: [PATCH 39/59] KVM: arm64: nv: Move last_vcpu_ran to be per s2 mmu
  2019-06-21  9:38 ` [PATCH 39/59] KVM: arm64: nv: Move last_vcpu_ran to be per s2 mmu Marc Zyngier
@ 2019-07-01  9:10   ` Julien Thierry
  2019-07-05 15:28   ` Alexandru Elisei
  1 sibling, 0 replies; 176+ messages in thread
From: Julien Thierry @ 2019-07-01  9:10 UTC (permalink / raw)
  To: Marc Zyngier, linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	James Morse, Suzuki K Poulose



On 21/06/2019 10:38, Marc Zyngier wrote:
> last_vcpu_ran has to be per s2 mmu now that we can have multiple S2
> per VM. Let's take this opportunity to perform some cleanup.
> 
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  arch/arm/include/asm/kvm_host.h   |  6 +++---
>  arch/arm/include/asm/kvm_mmu.h    |  2 +-
>  arch/arm64/include/asm/kvm_host.h |  6 +++---
>  arch/arm64/include/asm/kvm_mmu.h  |  2 +-
>  arch/arm64/kvm/nested.c           | 13 ++++++-------
>  virt/kvm/arm/arm.c                | 22 ++++------------------
>  virt/kvm/arm/mmu.c                | 26 ++++++++++++++++++++------
>  7 files changed, 38 insertions(+), 39 deletions(-)
> 
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index b821eb2383ad..cc761610e41e 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -63,15 +63,15 @@ struct kvm_s2_mmu {
>  	pgd_t *pgd;
>  	phys_addr_t pgd_phys;
>  
> +	/* The last vcpu id that ran on each physical CPU */
> +	int __percpu *last_vcpu_ran;
> +
>  	struct kvm *kvm;
>  };
>  
>  struct kvm_arch {
>  	struct kvm_s2_mmu mmu;
>  
> -	/* The last vcpu id that ran on each physical CPU */
> -	int __percpu *last_vcpu_ran;
> -
>  	/* Stage-2 page table */
>  	pgd_t *pgd;
>  	phys_addr_t pgd_phys;
> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
> index afabf1fd1d17..7a6e9008ed45 100644
> --- a/arch/arm/include/asm/kvm_mmu.h
> +++ b/arch/arm/include/asm/kvm_mmu.h
> @@ -52,7 +52,7 @@ int create_hyp_exec_mappings(phys_addr_t phys_addr, size_t size,
>  void free_hyp_pgds(void);
>  
>  void stage2_unmap_vm(struct kvm *kvm);
> -int kvm_alloc_stage2_pgd(struct kvm_s2_mmu *mmu);
> +int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu);
>  void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu);
>  int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
>  			  phys_addr_t pa, unsigned long size, bool writable);
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index cc238de170d2..b71a7a237f95 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -104,6 +104,9 @@ struct kvm_s2_mmu {
>  	 * >0: Somebody is actively using this.
>  	 */
>  	atomic_t refcnt;
> +
> +	/* The last vcpu id that ran on each physical CPU */
> +	int __percpu *last_vcpu_ran;
>  };
>  
>  static inline bool kvm_s2_mmu_valid(struct kvm_s2_mmu *mmu)
> @@ -124,9 +127,6 @@ struct kvm_arch {
>  	/* VTCR_EL2 value for this VM */
>  	u64    vtcr;
>  
> -	/* The last vcpu id that ran on each physical CPU */
> -	int __percpu *last_vcpu_ran;
> -
>  	/* The maximum number of vCPUs depends on the used GIC model */
>  	int max_vcpus;
>  
> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> index f4c5ac5eb95f..53103607065a 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -169,7 +169,7 @@ void free_hyp_pgds(void);
>  
>  void kvm_unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size);
>  void stage2_unmap_vm(struct kvm *kvm);
> -int kvm_alloc_stage2_pgd(struct kvm_s2_mmu *mmu);
> +int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu);
>  void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu);
>  int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
>  			  phys_addr_t pa, unsigned long size, bool writable);
> diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
> index 8880033fb6e0..09afafbdc8fe 100644
> --- a/arch/arm64/kvm/nested.c
> +++ b/arch/arm64/kvm/nested.c
> @@ -52,18 +52,17 @@ int kvm_vcpu_init_nested(struct kvm_vcpu *vcpu)
>  			 GFP_KERNEL | __GFP_ZERO);
>  
>  	if (tmp) {
> -		if (tmp != kvm->arch.nested_mmus)
> +		if (tmp != kvm->arch.nested_mmus) {
>  			kfree(kvm->arch.nested_mmus);
> +			kvm->arch.nested_mmus = NULL;
> +			kvm->arch.nested_mmus_size = 0;
> +		}
>  
> -		tmp[num_mmus - 1].kvm = kvm;
> -		atomic_set(&tmp[num_mmus - 1].refcnt, 0);
> -		ret = kvm_alloc_stage2_pgd(&tmp[num_mmus - 1]);
> +		ret = kvm_init_stage2_mmu(kvm, &tmp[num_mmus - 1]);
>  		if (ret)
>  			goto out;
>  
> -		tmp[num_mmus - 2].kvm = kvm;
> -		atomic_set(&tmp[num_mmus - 2].refcnt, 0);
> -		ret = kvm_alloc_stage2_pgd(&tmp[num_mmus - 2]);
> +		ret = kvm_init_stage2_mmu(kvm, &tmp[num_mmus - 2]);
>  		if (ret) {
>  			kvm_free_stage2_pgd(&tmp[num_mmus - 1]);
>  			goto out;
> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> index bcca27d5c481..e8b584b79847 100644
> --- a/virt/kvm/arm/arm.c
> +++ b/virt/kvm/arm/arm.c
> @@ -99,29 +99,21 @@ void kvm_arch_check_processor_compat(void *rtn)
>  	*(int *)rtn = 0;
>  }
>  
> -
>  /**
>   * kvm_arch_init_vm - initializes a VM data structure
>   * @kvm:	pointer to the KVM struct
>   */
>  int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>  {
> -	int ret, cpu;
> +	int ret;
>  
>  	ret = kvm_arm_setup_stage2(kvm, type);
>  	if (ret)
>  		return ret;
>  
> -	kvm->arch.last_vcpu_ran = alloc_percpu(typeof(*kvm->arch.last_vcpu_ran));
> -	if (!kvm->arch.last_vcpu_ran)
> -		return -ENOMEM;
> -
> -	for_each_possible_cpu(cpu)
> -		*per_cpu_ptr(kvm->arch.last_vcpu_ran, cpu) = -1;
> -
> -	ret = kvm_alloc_stage2_pgd(&kvm->arch.mmu);
> +	ret = kvm_init_stage2_mmu(kvm, &kvm->arch.mmu);
>  	if (ret)
> -		goto out_fail_alloc;
> +		return ret;
>  
>  	/* Mark the initial VMID generation invalid */
>  	kvm->arch.mmu.vmid.vmid_gen = 0;
> @@ -142,9 +134,6 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>  	return ret;
>  out_free_stage2_pgd:
>  	kvm_free_stage2_pgd(&kvm->arch.mmu);
> -out_fail_alloc:
> -	free_percpu(kvm->arch.last_vcpu_ran);
> -	kvm->arch.last_vcpu_ran = NULL;
>  	return ret;
>  }
>  
> @@ -174,9 +163,6 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
>  
>  	kvm_vgic_destroy(kvm);
>  
> -	free_percpu(kvm->arch.last_vcpu_ran);
> -	kvm->arch.last_vcpu_ran = NULL;
> -
>  	for (i = 0; i < KVM_MAX_VCPUS; ++i) {
>  		if (kvm->vcpus[i]) {
>  			kvm_arch_vcpu_free(kvm->vcpus[i]);
> @@ -359,7 +345,7 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>  	if (nested_virt_in_use(vcpu))
>  		kvm_vcpu_load_hw_mmu(vcpu);
>  
> -	last_ran = this_cpu_ptr(vcpu->kvm->arch.last_vcpu_ran);
> +	last_ran = this_cpu_ptr(vcpu->arch.hw_mmu->last_vcpu_ran);
>  	cpu_data = this_cpu_ptr(&kvm_host_data);
>  
>  	/*
> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> index 94d400e7af57..6a7cba077bce 100644
> --- a/virt/kvm/arm/mmu.c
> +++ b/virt/kvm/arm/mmu.c
> @@ -903,8 +903,9 @@ int create_hyp_exec_mappings(phys_addr_t phys_addr, size_t size,
>  }
>  
>  /**
> - * kvm_alloc_stage2_pgd - allocate level-1 table for stage-2 translation.
> - * @mmu:	The stage 2 mmu struct pointer
> + * kvm_init_stage2_mmu - Initialise an S2 MMU structure
> + * @kvm:	The pointer to the KVM structure
> + * @mmu:	The pointer to the s2 MMU structure
>   *
>   * Allocates only the stage-2 HW PGD level table(s) of size defined by
>   * stage2_pgd_size(mmu->kvm).
> @@ -912,10 +913,11 @@ int create_hyp_exec_mappings(phys_addr_t phys_addr, size_t size,
>   * Note we don't need locking here as this is only called when the VM is
>   * created, which can only be done once.
>   */
> -int kvm_alloc_stage2_pgd(struct kvm_s2_mmu *mmu)
> +int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu)
>  {
>  	phys_addr_t pgd_phys;
>  	pgd_t *pgd;
> +	int cpu;
>  
>  	if (mmu->pgd != NULL) {
>  		kvm_err("kvm_arch already initialized?\n");
> @@ -923,18 +925,28 @@ int kvm_alloc_stage2_pgd(struct kvm_s2_mmu *mmu)
>  	}
>  
>  	/* Allocate the HW PGD, making sure that each page gets its own refcount */
> -	pgd = alloc_pages_exact(stage2_pgd_size(mmu->kvm), GFP_KERNEL | __GFP_ZERO);
> +	pgd = alloc_pages_exact(stage2_pgd_size(kvm), GFP_KERNEL | __GFP_ZERO);
>  	if (!pgd)
>  		return -ENOMEM;
>  
>  	pgd_phys = virt_to_phys(pgd);
> -	if (WARN_ON(pgd_phys & ~kvm_vttbr_baddr_mask(mmu->kvm)))
> +	if (WARN_ON(pgd_phys & ~kvm_vttbr_baddr_mask(kvm)))
>  		return -EINVAL;
>  
> +	mmu->last_vcpu_ran = alloc_percpu(typeof(*mmu->last_vcpu_ran));
> +	if (!mmu->last_vcpu_ran) {
> +		free_pages_exact(pgd, stage2_pgd_size(kvm));
> +		return -ENOMEM;
> +	}
> +
> +	mmu->kvm = kvm;

If we're initializing this here, we probably want to get rid of the
assignment in kvm_arch_init_vm().

>  	mmu->pgd = pgd;
>  	mmu->pgd_phys = pgd_phys;
>  	mmu->vmid.vmid_gen = 0;
>  
> +	for_each_possible_cpu(cpu)
> +		*per_cpu_ptr(mmu->last_vcpu_ran, cpu) = -1;

Nit: I'd suggest putting that right after the allocation of last_vcpu_ran.
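
i.e. grouping the allocation and its initialisation (a sketch, simply
reordering the code from this patch):

	mmu->last_vcpu_ran = alloc_percpu(typeof(*mmu->last_vcpu_ran));
	if (!mmu->last_vcpu_ran) {
		free_pages_exact(pgd, stage2_pgd_size(kvm));
		return -ENOMEM;
	}

	/* Initialise right away, while the allocation is in sight */
	for_each_possible_cpu(cpu)
		*per_cpu_ptr(mmu->last_vcpu_ran, cpu) = -1;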

> +
>  	kvm_init_s2_mmu(mmu);

Hmm, now we have kvm_init_stage2_mmu() and an arch (arm or arm64)
specific kvm_init_s2_mmu()...

If we want to keep the s2 mmu structure different for arm and arm64, I'd
suggest at least renaming kvm_init_s2_mmu() so the distinction with
kvm_init_stage2_mmu() is clearer.

>  
>  	return 0;
> @@ -1021,8 +1033,10 @@ void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu)
>  	spin_unlock(&kvm->mmu_lock);
>  
>  	/* Free the HW pgd, one page at a time */
> -	if (pgd)
> +	if (pgd) {
>  		free_pages_exact(pgd, stage2_pgd_size(kvm));
> +		free_percpu(mmu->last_vcpu_ran);
> +	}
>  }
>  
>  static pud_t *stage2_get_pud(struct kvm_s2_mmu *mmu, struct kvm_mmu_memory_cache *cache,
> 

Cheers,

-- 
Julien Thierry


* Re: [PATCH 13/59] KVM: arm64: nv: Handle virtual EL2 registers in vcpu_read/write_sys_reg()
  2019-06-25 15:18   ` Alexandru Elisei
@ 2019-07-01  9:58     ` Alexandru Elisei
  2019-07-03 15:59     ` Marc Zyngier
  1 sibling, 0 replies; 176+ messages in thread
From: Alexandru Elisei @ 2019-07-01  9:58 UTC (permalink / raw)
  To: Marc Zyngier, linux-arm-kernel, kvmarm, kvm; +Cc: Andre Przywara, Dave Martin


On 6/25/19 4:18 PM, Alexandru Elisei wrote:
> Hi Marc,
>
> A question regarding this patch. This patch modifies vcpu_{read,write}_sys_reg
> to handle virtual EL2 registers. However, the file kvm/emulate-nested.c added by
> patch 10/59 "KVM: arm64: nv: Support virtual EL2 exceptions" already uses
> vcpu_{read,write}_sys_reg to access EL2 registers. In my opinion, it doesn't
> really matter which comes first because nested support is only enabled in the
> last patch of the series, but I thought I should bring this up in case it is not
> what you intended.
>
> On 6/21/19 10:37 AM, Marc Zyngier wrote:
>> From: Andre Przywara <andre.przywara@arm.com>
>>
>> KVM internally uses accessor functions when reading or writing the
>> guest's system registers. This takes care of accessing either the stored
>> copy or using the "live" EL1 system registers when the host uses VHE.
>>
>> With the introduction of virtual EL2 we add a bunch of EL2 system
>> registers, which now must also be taken care of:
>> - If the guest is running in vEL2, and we access an EL1 sysreg, we must
>>   revert to the stored version of that, and not use the CPU's copy.
>> - If the guest is running in vEL1, and we access an EL2 sysreg, we must
>>   also use the stored version, since the CPU carries the EL1 copy.
>> - Some EL2 system registers are supposed to affect the current execution
>>   of the system, so we need to put them into their respective EL1
>>   counterparts. For this we need to define a mapping between the two.
>>   This is done using the newly introduced struct el2_sysreg_map.
>> - Some EL2 system registers have a different format than their EL1
>>   counterpart, so we need to translate them before writing them to the
>>   CPU. This is done using an (optional) translate function in the map.
>> - There are the three special registers SP_EL2, SPSR_EL2 and ELR_EL2,
>>   which need some separate handling.
> I see no change in this patch related to SPSR_EL2. Special handling of SPSR_EL2
> is added in the next patch, patch 14/59 "KVM: arm64: nv: Handle SPSR_EL2 specially".
>> All of these cases are now wrapped into the existing accessor functions,
>> so KVM users don't need to care whether they are accessing EL2 or EL1
>> registers, or which state the guest is in.
>>
>> This handles what was formerly known as the "shadow state" dynamically,
>> without requiring a separate copy for each vCPU EL.
>>
>> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>> ---
>>  arch/arm64/include/asm/kvm_emulate.h |   6 +
>>  arch/arm64/include/asm/kvm_host.h    |   5 +
>>  arch/arm64/kvm/sys_regs.c            | 163 +++++++++++++++++++++++++++
>>  3 files changed, 174 insertions(+)
>>
>> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
>> index c43aac5fed69..f37006b6eec4 100644
>> --- a/arch/arm64/include/asm/kvm_emulate.h
>> +++ b/arch/arm64/include/asm/kvm_emulate.h
>> @@ -70,6 +70,12 @@ void kvm_emulate_nested_eret(struct kvm_vcpu *vcpu);
>>  int kvm_inject_nested_sync(struct kvm_vcpu *vcpu, u64 esr_el2);
>>  int kvm_inject_nested_irq(struct kvm_vcpu *vcpu);
>>  
>> +u64 translate_tcr(u64 tcr);
>> +u64 translate_cptr(u64 tcr);
>> +u64 translate_sctlr(u64 tcr);
>> +u64 translate_ttbr0(u64 tcr);
>> +u64 translate_cnthctl(u64 tcr);
>> +
>>  static inline bool vcpu_el1_is_32bit(struct kvm_vcpu *vcpu)
>>  {
>>  	return !(vcpu->arch.hcr_el2 & HCR_RW);
>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>> index 2d4290d2513a..dae9c42a7219 100644
>> --- a/arch/arm64/include/asm/kvm_host.h
>> +++ b/arch/arm64/include/asm/kvm_host.h
>> @@ -217,6 +217,11 @@ enum vcpu_sysreg {
>>  	NR_SYS_REGS	/* Nothing after this line! */
>>  };
>>  
>> +static inline bool sysreg_is_el2(int reg)
>> +{
>> +	return reg >= FIRST_EL2_SYSREG && reg < NR_SYS_REGS;
>> +}
>> +
>>  /* 32bit mapping */
>>  #define c0_MPIDR	(MPIDR_EL1 * 2)	/* MultiProcessor ID Register */
>>  #define c0_CSSELR	(CSSELR_EL1 * 2)/* Cache Size Selection Register */
>> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
>> index 693dd063c9c2..d024114da162 100644
>> --- a/arch/arm64/kvm/sys_regs.c
>> +++ b/arch/arm64/kvm/sys_regs.c
>> @@ -76,11 +76,142 @@ static bool write_to_read_only(struct kvm_vcpu *vcpu,
>>  	return false;
>>  }
>>  
>> +static u64 tcr_el2_ips_to_tcr_el1_ps(u64 tcr_el2)
> The code seems to suggest that you are translating TCR_EL2.PS to TCR_EL1.IPS.
> Perhaps the function should be named tcr_el2_ps_to_tcr_el1_ips?
>> +{
>> +	return ((tcr_el2 & TCR_EL2_PS_MASK) >> TCR_EL2_PS_SHIFT)
>> +		<< TCR_IPS_SHIFT;
>> +}
>> +
>> +u64 translate_tcr(u64 tcr)
>> +{
>> +	return TCR_EPD1_MASK |				/* disable TTBR1_EL1 */
>> +	       ((tcr & TCR_EL2_TBI) ? TCR_TBI0 : 0) |
>> +	       tcr_el2_ips_to_tcr_el1_ps(tcr) |
>> +	       (tcr & TCR_EL2_TG0_MASK) |
>> +	       (tcr & TCR_EL2_ORGN0_MASK) |
>> +	       (tcr & TCR_EL2_IRGN0_MASK) |
>> +	       (tcr & TCR_EL2_T0SZ_MASK);
>> +}
>> +
>> +u64 translate_cptr(u64 cptr_el2)
> The argument name is not consistent with the other translate_* functions. I
> think it is reasonably obvious that you are translating an EL2 register.
>> +{
>> +	u64 cpacr_el1 = 0;
>> +
>> +	if (!(cptr_el2 & CPTR_EL2_TFP))
>> +		cpacr_el1 |= CPACR_EL1_FPEN;
>> +	if (cptr_el2 & CPTR_EL2_TTA)
>> +		cpacr_el1 |= CPACR_EL1_TTA;
>> +	if (!(cptr_el2 & CPTR_EL2_TZ))
>> +		cpacr_el1 |= CPACR_EL1_ZEN;
>> +
>> +	return cpacr_el1;
>> +}
>> +
>> +u64 translate_sctlr(u64 sctlr)
>> +{
>> +	/* Bit 20 is RES1 in SCTLR_EL1, but RES0 in SCTLR_EL2 */
>> +	return sctlr | BIT(20);
>> +}
>> +
>> +u64 translate_ttbr0(u64 ttbr0)
>> +{
>> +	/* Force ASID to 0 (ASID 0 or RES0) */
> Are you forcing ASID to 0 because you are only expecting a non-vhe guest
> hypervisor to access ttbr0_el2, in which case the architecture says that the
> ASID field is RES0? Is it so unlikely that a vhe guest hypervisor will access
> ttbr0_el2 directly that it's not worth adding a check for that?

My mistake, obviously the translate functions are only used when VHE is
disabled, because when E2H is set the EL2 registers have the same format
as their EL1 counterparts.

Sorry for the noise,

Alex



* Re: [PATCH 16/59] KVM: arm64: nv: Save/Restore vEL2 sysregs
  2019-06-21  9:38 ` [PATCH 16/59] KVM: arm64: nv: Save/Restore vEL2 sysregs Marc Zyngier
  2019-06-25  8:48   ` Julien Thierry
@ 2019-07-01 12:09   ` Alexandru Elisei
  2019-08-21 11:57   ` Alexandru Elisei
  2 siblings, 0 replies; 176+ messages in thread
From: Alexandru Elisei @ 2019-07-01 12:09 UTC (permalink / raw)
  To: Marc Zyngier, linux-arm-kernel, kvmarm, kvm; +Cc: Andre Przywara, Dave Martin


On 6/21/19 10:38 AM, Marc Zyngier wrote:
> From: Andre Przywara <andre.przywara@arm.com>
>
> Whenever we need to restore the guest's system registers to the CPU, we
> now need to take care of the EL2 system registers as well. Most of them
> are accessed via traps only, but some have an immediate effect and also
> a guest running in VHE mode would expect them to be accessible via their
> EL1 encoding, which we do not trap.
>
> Split the current __sysreg_{save,restore}_el1_state() functions into
> handling common sysregs, then differentiate between the guest running in
> vEL2 and vEL1.
>
> For vEL2 we write the virtual EL2 registers with an identical format directly
> into their EL1 counterpart, and translate the few registers that have a
> different format so they have the same effect on execution when running a
> non-VHE guest hypervisor.
>
>   [ Commit message reworked and many bug fixes applied by Marc Zyngier
>     and Christoffer Dall. ]
>
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
> ---
>  arch/arm64/kvm/hyp/sysreg-sr.c | 160 +++++++++++++++++++++++++++++++--
>  1 file changed, 153 insertions(+), 7 deletions(-)
>
> diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
> index 62866a68e852..2abb9c3ff24f 100644
> --- a/arch/arm64/kvm/hyp/sysreg-sr.c
> +++ b/arch/arm64/kvm/hyp/sysreg-sr.c
> @@ -22,6 +22,7 @@
>  #include <asm/kvm_asm.h>
>  #include <asm/kvm_emulate.h>
>  #include <asm/kvm_hyp.h>
> +#include <asm/kvm_nested.h>
>  
>  /*
>   * Non-VHE: Both host and guest must save everything.
> @@ -51,11 +52,9 @@ static void __hyp_text __sysreg_save_user_state(struct kvm_cpu_context *ctxt)
>  	ctxt->sys_regs[TPIDRRO_EL0]	= read_sysreg(tpidrro_el0);
>  }
>  
> -static void __hyp_text __sysreg_save_el1_state(struct kvm_cpu_context *ctxt)
> +static void __hyp_text __sysreg_save_vel1_state(struct kvm_cpu_context *ctxt)
>  {
> -	ctxt->sys_regs[CSSELR_EL1]	= read_sysreg(csselr_el1);
>  	ctxt->sys_regs[SCTLR_EL1]	= read_sysreg_el1(SYS_SCTLR);
> -	ctxt->sys_regs[ACTLR_EL1]	= read_sysreg(actlr_el1);
>  	ctxt->sys_regs[CPACR_EL1]	= read_sysreg_el1(SYS_CPACR);
>  	ctxt->sys_regs[TTBR0_EL1]	= read_sysreg_el1(SYS_TTBR0);
>  	ctxt->sys_regs[TTBR1_EL1]	= read_sysreg_el1(SYS_TTBR1);
> @@ -69,14 +68,58 @@ static void __hyp_text __sysreg_save_el1_state(struct kvm_cpu_context *ctxt)
>  	ctxt->sys_regs[CONTEXTIDR_EL1]	= read_sysreg_el1(SYS_CONTEXTIDR);
>  	ctxt->sys_regs[AMAIR_EL1]	= read_sysreg_el1(SYS_AMAIR);
>  	ctxt->sys_regs[CNTKCTL_EL1]	= read_sysreg_el1(SYS_CNTKCTL);
> -	ctxt->sys_regs[PAR_EL1]		= read_sysreg(par_el1);
> -	ctxt->sys_regs[TPIDR_EL1]	= read_sysreg(tpidr_el1);
>  
>  	ctxt->gp_regs.sp_el1		= read_sysreg(sp_el1);
>  	ctxt->gp_regs.elr_el1		= read_sysreg_el1(SYS_ELR);
>  	ctxt->gp_regs.spsr[KVM_SPSR_EL1]= read_sysreg_el1(SYS_SPSR);
>  }
>  
> +static void __sysreg_save_vel2_state(struct kvm_cpu_context *ctxt)
> +{
> +	ctxt->sys_regs[ESR_EL2]		= read_sysreg_el1(SYS_ESR);
> +	ctxt->sys_regs[AFSR0_EL2]	= read_sysreg_el1(SYS_AFSR0);
> +	ctxt->sys_regs[AFSR1_EL2]	= read_sysreg_el1(SYS_AFSR1);
> +	ctxt->sys_regs[FAR_EL2]		= read_sysreg_el1(SYS_FAR);
> +	ctxt->sys_regs[MAIR_EL2]	= read_sysreg_el1(SYS_MAIR);
> +	ctxt->sys_regs[VBAR_EL2]	= read_sysreg_el1(SYS_VBAR);
> +	ctxt->sys_regs[CONTEXTIDR_EL2]	= read_sysreg_el1(SYS_CONTEXTIDR);
> +	ctxt->sys_regs[AMAIR_EL2]	= read_sysreg_el1(SYS_AMAIR);
> +
> +	/*
> +	 * In VHE mode those registers are compatible between EL1 and EL2,
> +	 * and the guest uses the _EL1 versions on the CPU naturally.
> +	 * So we save them into their _EL2 versions here.
> +	 * For nVHE mode we trap accesses to those registers, so our
> +	 * _EL2 copy in sys_regs[] is always up-to-date and we don't need
> +	 * to save anything here.
> +	 */
> +	if (__vcpu_el2_e2h_is_set(ctxt)) {
> +		ctxt->sys_regs[SCTLR_EL2]	= read_sysreg_el1(SYS_SCTLR);
> +		ctxt->sys_regs[CPTR_EL2]	= read_sysreg_el1(SYS_CPACR);
> +		ctxt->sys_regs[TTBR0_EL2]	= read_sysreg_el1(SYS_TTBR0);
> +		ctxt->sys_regs[TTBR1_EL2]	= read_sysreg_el1(SYS_TTBR1);
> +		ctxt->sys_regs[TCR_EL2]		= read_sysreg_el1(SYS_TCR);
> +		ctxt->sys_regs[CNTHCTL_EL2]	= read_sysreg_el1(SYS_CNTKCTL);

This goes against how the register is declared in arch/arm64/kvm/sys_regs.c
(added by patch 13), where it's declared as a "pure" EL2 register with no EL1
counterpart. I think this save/restore code is correct, and that declaring it
as a pure register is not the right approach; I'll explain why in my comments
on patch 13.

> +	}
> +
> +	ctxt->sys_regs[SP_EL2]		= read_sysreg(sp_el1);
> +	ctxt->sys_regs[ELR_EL2]		= read_sysreg_el1(SYS_ELR);
> +	ctxt->sys_regs[SPSR_EL2]	= __fixup_spsr_el2_read(ctxt, read_sysreg_el1(SYS_SPSR));
> +}
> +
> +static void __hyp_text __sysreg_save_el1_state(struct kvm_cpu_context *ctxt)
> +{
> +	ctxt->sys_regs[CSSELR_EL1]	= read_sysreg(csselr_el1);
> +	ctxt->sys_regs[ACTLR_EL1]	= read_sysreg(actlr_el1);
> +	ctxt->sys_regs[PAR_EL1]		= read_sysreg(par_el1);
> +	ctxt->sys_regs[TPIDR_EL1]	= read_sysreg(tpidr_el1);
> +
> +	if (unlikely(__is_hyp_ctxt(ctxt)))
> +		__sysreg_save_vel2_state(ctxt);
> +	else
> +		__sysreg_save_vel1_state(ctxt);
> +}
> +
>  static void __hyp_text __sysreg_save_el2_return_state(struct kvm_cpu_context *ctxt)
>  {
>  	ctxt->gp_regs.regs.pc		= read_sysreg_el2(SYS_ELR);
> @@ -124,10 +167,91 @@ static void __hyp_text __sysreg_restore_user_state(struct kvm_cpu_context *ctxt)
>  	write_sysreg(ctxt->sys_regs[TPIDRRO_EL0],	tpidrro_el0);
>  }
>  
> -static void __hyp_text __sysreg_restore_el1_state(struct kvm_cpu_context *ctxt)
> +static void __sysreg_restore_vel2_state(struct kvm_cpu_context *ctxt)
>  {
> +	u64 val;
> +
> +	write_sysreg(read_cpuid_id(),			vpidr_el2);
>  	write_sysreg(ctxt->sys_regs[MPIDR_EL1],		vmpidr_el2);
> -	write_sysreg(ctxt->sys_regs[CSSELR_EL1],	csselr_el1);
> +	write_sysreg_el1(ctxt->sys_regs[MAIR_EL2],	SYS_MAIR);
> +	write_sysreg_el1(ctxt->sys_regs[VBAR_EL2],	SYS_VBAR);
> +	write_sysreg_el1(ctxt->sys_regs[CONTEXTIDR_EL2],SYS_CONTEXTIDR);
> +	write_sysreg_el1(ctxt->sys_regs[AMAIR_EL2],	SYS_AMAIR);
> +
> +	if (__vcpu_el2_e2h_is_set(ctxt)) {
> +		/*
> +		 * In VHE mode those registers are compatible between
> +		 * EL1 and EL2.
> +		 */
> +		write_sysreg_el1(ctxt->sys_regs[SCTLR_EL2],	SYS_SCTLR);
> +		write_sysreg_el1(ctxt->sys_regs[CPTR_EL2],	SYS_CPACR);
> +		write_sysreg_el1(ctxt->sys_regs[TTBR0_EL2],	SYS_TTBR0);
> +		write_sysreg_el1(ctxt->sys_regs[TTBR1_EL2],	SYS_TTBR1);
> +		write_sysreg_el1(ctxt->sys_regs[TCR_EL2],	SYS_TCR);
> +		write_sysreg_el1(ctxt->sys_regs[CNTHCTL_EL2],	SYS_CNTKCTL);
> +	} else {
> +		write_sysreg_el1(translate_sctlr(ctxt->sys_regs[SCTLR_EL2]),
> +				 SYS_SCTLR);
> +		write_sysreg_el1(translate_cptr(ctxt->sys_regs[CPTR_EL2]),
> +				 SYS_CPACR);
> +		write_sysreg_el1(translate_ttbr0(ctxt->sys_regs[TTBR0_EL2]),
> +				 SYS_TTBR0);
> +		write_sysreg_el1(translate_tcr(ctxt->sys_regs[TCR_EL2]),
> +				 SYS_TCR);
> +		write_sysreg_el1(translate_cnthctl(ctxt->sys_regs[CNTHCTL_EL2]),
> +				 SYS_CNTKCTL);
> +	}
> +
> +	/*
> +	 * These registers can be modified behind our back by a fault
> +	 * taken inside vEL2. Restore them, always.
> +	 */
> +	write_sysreg_el1(ctxt->sys_regs[ESR_EL2],	SYS_ESR);
> +	write_sysreg_el1(ctxt->sys_regs[AFSR0_EL2],	SYS_AFSR0);
> +	write_sysreg_el1(ctxt->sys_regs[AFSR1_EL2],	SYS_AFSR1);
> +	write_sysreg_el1(ctxt->sys_regs[FAR_EL2],	SYS_FAR);
> +	write_sysreg(ctxt->sys_regs[SP_EL2],		sp_el1);
> +	write_sysreg_el1(ctxt->sys_regs[ELR_EL2],	SYS_ELR);
> +
> +	val = __fixup_spsr_el2_write(ctxt, ctxt->sys_regs[SPSR_EL2]);
> +	write_sysreg_el1(val,	SYS_SPSR);
> +}
> +
> +static void __hyp_text __sysreg_restore_vel1_state(struct kvm_cpu_context *ctxt)
> +{
> +	u64 mpidr;
> +
> +	if (has_vhe()) {
> +		struct kvm_vcpu *vcpu;
> +
> +		/*
> +		 * Warning: this hack only works on VHE, because we only
> +		 * call this with the *guest* context, which is part of
> +		 * struct kvm_vcpu. On a host context, you'd get pure junk.
> +		 */
> +		vcpu = container_of(ctxt, struct kvm_vcpu, arch.ctxt);
> +
> +		if (nested_virt_in_use(vcpu)) {
> +			/*
> +			 * Only set VPIDR_EL2 for nested VMs, as this is the
> +			 * only time it changes. We'll restore the MIDR_EL1
> +			 * view on put.
> +			 */
> +			write_sysreg(ctxt->sys_regs[VPIDR_EL2],	vpidr_el2);
> +
> +			/*
> +			 * As we're restoring a nested guest, set the value
> +			 * provided by the guest hypervisor.
> +			 */
> +			mpidr = ctxt->sys_regs[VMPIDR_EL2];
> +		} else {
> +			mpidr = ctxt->sys_regs[MPIDR_EL1];
> +		}
> +	} else {
> +		mpidr = ctxt->sys_regs[MPIDR_EL1];
> +	}
> +
> +	write_sysreg(mpidr,				vmpidr_el2);
>  	write_sysreg_el1(ctxt->sys_regs[SCTLR_EL1],	SYS_SCTLR);
>  	write_sysreg(ctxt->sys_regs[ACTLR_EL1],		actlr_el1);
>  	write_sysreg_el1(ctxt->sys_regs[CPACR_EL1],	SYS_CPACR);
> @@ -151,6 +275,19 @@ static void __hyp_text __sysreg_restore_el1_state(struct kvm_cpu_context *ctxt)
>  	write_sysreg_el1(ctxt->gp_regs.spsr[KVM_SPSR_EL1],SYS_SPSR);
>  }
>  
> +static void __hyp_text __sysreg_restore_el1_state(struct kvm_cpu_context *ctxt)
> +{
> +	write_sysreg(ctxt->sys_regs[CSSELR_EL1],	csselr_el1);
> +	write_sysreg(ctxt->sys_regs[ACTLR_EL1],	  	actlr_el1);
> +	write_sysreg(ctxt->sys_regs[PAR_EL1],		par_el1);
> +	write_sysreg(ctxt->sys_regs[TPIDR_EL1],		tpidr_el1);
> +
> +	if (__is_hyp_ctxt(ctxt))
> +		__sysreg_restore_vel2_state(ctxt);
> +	else
> +		__sysreg_restore_vel1_state(ctxt);
> +}
> +
>  static void __hyp_text
>  __sysreg_restore_el2_return_state(struct kvm_cpu_context *ctxt)
>  {
> @@ -307,6 +444,15 @@ void kvm_vcpu_put_sysregs(struct kvm_vcpu *vcpu)
>  	/* Restore host user state */
>  	__sysreg_restore_user_state(host_ctxt);
>  
> +	/*
> +	 * If leaving a nesting guest, restore MPIDR_EL1 default view. It is
> +	 * slightly ugly to do it here, but the alternative is to penalize
> +	 * all non-nesting guests by forcing this on every load. Instead, we
> +	 * choose to only penalize nesting VMs.
> +	 */
> +	if (nested_virt_in_use(vcpu))
> +		write_sysreg(read_cpuid_id(),	vpidr_el2);
> +
>  	vcpu->arch.sysregs_loaded_on_cpu = false;
>  }
>  


* Re: [PATCH 13/59] KVM: arm64: nv: Handle virtual EL2 registers in vcpu_read/write_sys_reg()
  2019-06-21  9:37 ` [PATCH 13/59] KVM: arm64: nv: Handle virtual EL2 registers in vcpu_read/write_sys_reg() Marc Zyngier
                     ` (2 preceding siblings ...)
  2019-06-26 15:04   ` Alexandru Elisei
@ 2019-07-01 12:10   ` Alexandru Elisei
  3 siblings, 0 replies; 176+ messages in thread
From: Alexandru Elisei @ 2019-07-01 12:10 UTC (permalink / raw)
  To: Marc Zyngier, linux-arm-kernel, kvmarm, kvm; +Cc: Andre Przywara, Dave Martin


On 6/21/19 10:37 AM, Marc Zyngier wrote:
> From: Andre Przywara <andre.przywara@arm.com>
>
> KVM internally uses accessor functions when reading or writing the
> guest's system registers. This takes care of accessing either the stored
> copy or using the "live" EL1 system registers when the host uses VHE.
>
> With the introduction of virtual EL2 we add a bunch of EL2 system
> registers, which now must also be taken care of:
> - If the guest is running in vEL2, and we access an EL1 sysreg, we must
>   revert to the stored version of that, and not use the CPU's copy.
> - If the guest is running in vEL1, and we access an EL2 sysreg, we must
>   also use the stored version, since the CPU carries the EL1 copy.
> - Some EL2 system registers are supposed to affect the current execution
>   of the system, so we need to put them into their respective EL1
>   counterparts. For this we need to define a mapping between the two.
>   This is done using the newly introduced struct el2_sysreg_map.
> - Some EL2 system registers have a different format than their EL1
>   counterpart, so we need to translate them before writing them to the
>   CPU. This is done using an (optional) translate function in the map.
> - There are the three special registers SP_EL2, SPSR_EL2 and ELR_EL2,
>   which need some separate handling.
>
> All of these cases are now wrapped into the existing accessor functions,
> so KVM users don't need to care whether they are accessing EL2 or EL1
> registers, or which state the guest is in.
>
> This handles what was formerly known as the "shadow state" dynamically,
> without requiring a separate copy for each vCPU EL.
>
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  arch/arm64/include/asm/kvm_emulate.h |   6 +
>  arch/arm64/include/asm/kvm_host.h    |   5 +
>  arch/arm64/kvm/sys_regs.c            | 163 +++++++++++++++++++++++++++
>  3 files changed, 174 insertions(+)
>
> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> index c43aac5fed69..f37006b6eec4 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -70,6 +70,12 @@ void kvm_emulate_nested_eret(struct kvm_vcpu *vcpu);
>  int kvm_inject_nested_sync(struct kvm_vcpu *vcpu, u64 esr_el2);
>  int kvm_inject_nested_irq(struct kvm_vcpu *vcpu);
>  
> +u64 translate_tcr(u64 tcr);
> +u64 translate_cptr(u64 tcr);
> +u64 translate_sctlr(u64 tcr);
> +u64 translate_ttbr0(u64 tcr);
> +u64 translate_cnthctl(u64 tcr);
> +
>  static inline bool vcpu_el1_is_32bit(struct kvm_vcpu *vcpu)
>  {
>  	return !(vcpu->arch.hcr_el2 & HCR_RW);
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 2d4290d2513a..dae9c42a7219 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -217,6 +217,11 @@ enum vcpu_sysreg {
>  	NR_SYS_REGS	/* Nothing after this line! */
>  };
>  
> +static inline bool sysreg_is_el2(int reg)
> +{
> +	return reg >= FIRST_EL2_SYSREG && reg < NR_SYS_REGS;
> +}
> +
>  /* 32bit mapping */
>  #define c0_MPIDR	(MPIDR_EL1 * 2)	/* MultiProcessor ID Register */
>  #define c0_CSSELR	(CSSELR_EL1 * 2)/* Cache Size Selection Register */
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 693dd063c9c2..d024114da162 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -76,11 +76,142 @@ static bool write_to_read_only(struct kvm_vcpu *vcpu,
>  	return false;
>  }
>  
> +static u64 tcr_el2_ips_to_tcr_el1_ps(u64 tcr_el2)
> +{
> +	return ((tcr_el2 & TCR_EL2_PS_MASK) >> TCR_EL2_PS_SHIFT)
> +		<< TCR_IPS_SHIFT;
> +}
> +
> +u64 translate_tcr(u64 tcr)
> +{
> +	return TCR_EPD1_MASK |				/* disable TTBR1_EL1 */
> +	       ((tcr & TCR_EL2_TBI) ? TCR_TBI0 : 0) |
> +	       tcr_el2_ips_to_tcr_el1_ps(tcr) |
> +	       (tcr & TCR_EL2_TG0_MASK) |
> +	       (tcr & TCR_EL2_ORGN0_MASK) |
> +	       (tcr & TCR_EL2_IRGN0_MASK) |
> +	       (tcr & TCR_EL2_T0SZ_MASK);
> +}
> +
> +u64 translate_cptr(u64 cptr_el2)
> +{
> +	u64 cpacr_el1 = 0;
> +
> +	if (!(cptr_el2 & CPTR_EL2_TFP))
> +		cpacr_el1 |= CPACR_EL1_FPEN;
> +	if (cptr_el2 & CPTR_EL2_TTA)
> +		cpacr_el1 |= CPACR_EL1_TTA;
> +	if (!(cptr_el2 & CPTR_EL2_TZ))
> +		cpacr_el1 |= CPACR_EL1_ZEN;
> +
> +	return cpacr_el1;
> +}
> +
> +u64 translate_sctlr(u64 sctlr)
> +{
> +	/* Bit 20 is RES1 in SCTLR_EL1, but RES0 in SCTLR_EL2 */
> +	return sctlr | BIT(20);
> +}
> +
> +u64 translate_ttbr0(u64 ttbr0)
> +{
> +	/* Force ASID to 0 (ASID 0 or RES0) */
> +	return ttbr0 & ~GENMASK_ULL(63, 48);
> +}
> +
> +u64 translate_cnthctl(u64 cnthctl)
> +{
> +	return ((cnthctl & 0x3) << 10) | (cnthctl & 0xfc);
> +}
> +
> +#define EL2_SYSREG(el2, el1, translate)	\
> +	[el2 - FIRST_EL2_SYSREG] = { el2, el1, translate }
> +#define PURE_EL2_SYSREG(el2) \
> +	[el2 - FIRST_EL2_SYSREG] = { el2,__INVALID_SYSREG__, NULL }
> +/*
> + * Associate vEL2 registers to their EL1 counterparts on the CPU.
> + * The translate function can be NULL, when the register layout is identical.
> + */
> +struct el2_sysreg_map {
> +	int sysreg;	/* EL2 register index into the array above */
> +	int mapping;	/* associated EL1 register */
> +	u64 (*translate)(u64 value);
> +} nested_sysreg_map[NR_SYS_REGS - FIRST_EL2_SYSREG] = {
> +	PURE_EL2_SYSREG( VPIDR_EL2 ),
> +	PURE_EL2_SYSREG( VMPIDR_EL2 ),
> +	PURE_EL2_SYSREG( ACTLR_EL2 ),
> +	PURE_EL2_SYSREG( HCR_EL2 ),
> +	PURE_EL2_SYSREG( MDCR_EL2 ),
> +	PURE_EL2_SYSREG( HSTR_EL2 ),
> +	PURE_EL2_SYSREG( HACR_EL2 ),
> +	PURE_EL2_SYSREG( VTTBR_EL2 ),
> +	PURE_EL2_SYSREG( VTCR_EL2 ),
> +	PURE_EL2_SYSREG( RVBAR_EL2 ),
> +	PURE_EL2_SYSREG( RMR_EL2 ),
> +	PURE_EL2_SYSREG( TPIDR_EL2 ),
> +	PURE_EL2_SYSREG( CNTVOFF_EL2 ),
> +	PURE_EL2_SYSREG( CNTHCTL_EL2 ),
I don't think having CNTHCTL_EL2 as a "pure" EL2 register is the right approach.
More details below.
> +	PURE_EL2_SYSREG( HPFAR_EL2 ),
> +	EL2_SYSREG(      SCTLR_EL2,  SCTLR_EL1,      translate_sctlr ),
> +	EL2_SYSREG(      CPTR_EL2,   CPACR_EL1,      translate_cptr  ),
> +	EL2_SYSREG(      TTBR0_EL2,  TTBR0_EL1,      translate_ttbr0 ),
> +	EL2_SYSREG(      TTBR1_EL2,  TTBR1_EL1,      NULL            ),
> +	EL2_SYSREG(      TCR_EL2,    TCR_EL1,        translate_tcr   ),
> +	EL2_SYSREG(      VBAR_EL2,   VBAR_EL1,       NULL            ),
> +	EL2_SYSREG(      AFSR0_EL2,  AFSR0_EL1,      NULL            ),
> +	EL2_SYSREG(      AFSR1_EL2,  AFSR1_EL1,      NULL            ),
> +	EL2_SYSREG(      ESR_EL2,    ESR_EL1,        NULL            ),
> +	EL2_SYSREG(      FAR_EL2,    FAR_EL1,        NULL            ),
> +	EL2_SYSREG(      MAIR_EL2,   MAIR_EL1,       NULL            ),
> +	EL2_SYSREG(      AMAIR_EL2,  AMAIR_EL1,      NULL            ),
> +};
> +
> +static
> +const struct el2_sysreg_map *find_el2_sysreg(const struct el2_sysreg_map *map,
> +					     int reg)
> +{
> +	const struct el2_sysreg_map *entry;
> +
> +	if (!sysreg_is_el2(reg))
> +		return NULL;
> +
> +	entry = &nested_sysreg_map[reg - FIRST_EL2_SYSREG];
> +	if (entry->sysreg == __INVALID_SYSREG__)
> +		return NULL;
> +
> +	return entry;
> +}
> +
>  u64 vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg)
>  {
> +
>  	if (!vcpu->arch.sysregs_loaded_on_cpu)
>  		goto immediate_read;
>  
> +	if (unlikely(sysreg_is_el2(reg))) {
> +		const struct el2_sysreg_map *el2_reg;
> +
> +		if (!is_hyp_ctxt(vcpu))
> +			goto immediate_read;
> +
> +		el2_reg = find_el2_sysreg(nested_sysreg_map, reg);
> +		if (el2_reg) {
> +			/*
> +			 * If this register does not have an EL1 counterpart,
> +			 * then read the stored EL2 version.
> +			 */
> +			if (el2_reg->mapping == __INVALID_SYSREG__)
> +				goto immediate_read;

With CNTHCTL_EL2 as a "pure" EL2 register, reads (and writes, in
vcpu_write_sys_reg) will go to memory. However, when vhe is enabled, CNTHCTL_EL2
has the same format as CNTKCTL_EL1 and reads/writes to CNTKCTL_EL1 should be
reflected in the value of CNTHCTL_EL2 according to the pseudocode for accessing
CNTKCTL_EL1 (ARM DDI 0487D.b, page D12-3496). This doesn't happen for vhe guest
hypervisors because CNTHCTL_EL2 is declared as a "pure" EL2 register.

I have tested that with this hack for reads (function chosen at random):

diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index 04e554cae3a2..3a6260745680 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -653,8 +653,22 @@ static void check_vcpu_requests(struct kvm_vcpu *vcpu)
  */
 int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 {
+       u64 cntkctl, cnthctl;
        int ret;
 
+       /* Check that CNTHCTL_EL2 writes are reflected in CNTKCTL_EL1 reads */
+       if (has_vhe()) {
+               /* Check that CNTKCTL_EL1 reads are redirected to CNTHCTL_EL2 */
+               cntkctl = read_sysreg(cntkctl_el1);
+               cnthctl = cntkctl ^ 1;
+               write_sysreg(cnthctl, cnthctl_el2);
+               cntkctl = read_sysreg(cntkctl_el1);
+               BUG_ON(cntkctl != cnthctl);
+               /* Restore original value */
+               cnthctl ^= 1;
+               write_sysreg(cnthctl, cnthctl_el2);
+       }
+
        if (unlikely(!kvm_vcpu_initialized(vcpu)))
                return -ENOEXEC;

and this hack for writes:

diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index 04e554cae3a2..1cfe47b6fa99 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -653,8 +653,21 @@ static void check_vcpu_requests(struct kvm_vcpu *vcpu)
  */
 int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 {
+       u64 cntkctl, cnthctl;
        int ret;
 
+       /* Check that CNTKCTL_EL1 writes are redirected to CNTHCTL_EL2 */
+       if (has_vhe()) {
+               cntkctl = read_sysreg(cntkctl_el1);
+               cntkctl ^= 1;
+               write_sysreg(cntkctl, cntkctl_el1);
+               cnthctl = read_sysreg(cnthctl_el2);
+               BUG_ON(cntkctl != cnthctl);
+               /* Restore original value */
+               cntkctl ^= 1;
+               write_sysreg(cntkctl, cntkctl_el1);
+       }
+
        if (unlikely(!kvm_vcpu_initialized(vcpu)))
                return -ENOEXEC;

The BUG_ON is not triggered on baremetal, but is triggered when running as an
L1 guest hypervisor.

Another issue with CNTHCTL_EL2 being a "pure" EL2 register is that with non-vhe
guests, writes to CNTHCTL_EL2 aren't translated and written to CNTKCTL_EL1.

This patch seems to fix the issues with vhe and non-vhe guest hypervisors
(tested by booting an L2 guest):

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 1235a88ec575..bd21f0f45a86 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -153,7 +153,6 @@ struct el2_sysreg_map {
        PURE_EL2_SYSREG( RVBAR_EL2 ),
        PURE_EL2_SYSREG( RMR_EL2 ),
        PURE_EL2_SYSREG( TPIDR_EL2 ),
-       PURE_EL2_SYSREG( CNTHCTL_EL2 ),
        PURE_EL2_SYSREG( HPFAR_EL2 ),
        EL2_SYSREG(      SCTLR_EL2,  SCTLR_EL1,      translate_sctlr ),
        EL2_SYSREG(      CPTR_EL2,   CPACR_EL1,      translate_cptr  ),
@@ -167,6 +166,7 @@ struct el2_sysreg_map {
        EL2_SYSREG(      FAR_EL2,    FAR_EL1,        NULL            ),
        EL2_SYSREG(      MAIR_EL2,   MAIR_EL1,       NULL            ),
        EL2_SYSREG(      AMAIR_EL2,  AMAIR_EL1,      NULL            ),
+       EL2_SYSREG(      CNTHCTL_EL2,CNTKCTL_EL1,    translate_cnthctl),
 };
 
 static

> +
> +			/* Get the current version of the EL1 counterpart. */
> +			reg = el2_reg->mapping;
> +		}
> +	} else {
> +		/* EL1 register can't be on the CPU if the guest is in vEL2. */
> +		if (unlikely(is_hyp_ctxt(vcpu)))
> +			goto immediate_read;
> +	}
> +
>  	/*
>  	 * System registers listed in the switch are not saved on every
>  	 * exit from the guest but are only saved on vcpu_put.
> @@ -114,6 +245,8 @@ u64 vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg)
>  	case DACR32_EL2:	return read_sysreg_s(SYS_DACR32_EL2);
>  	case IFSR32_EL2:	return read_sysreg_s(SYS_IFSR32_EL2);
>  	case DBGVCR32_EL2:	return read_sysreg_s(SYS_DBGVCR32_EL2);
> +	case SP_EL2:		return read_sysreg(sp_el1);
> +	case ELR_EL2:		return read_sysreg_el1(SYS_ELR);
>  	}
>  
>  immediate_read:
> @@ -125,6 +258,34 @@ void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, u64 val, int reg)
>  	if (!vcpu->arch.sysregs_loaded_on_cpu)
>  		goto immediate_write;
>  
> +	if (unlikely(sysreg_is_el2(reg))) {
> +		const struct el2_sysreg_map *el2_reg;
> +
> +		if (!is_hyp_ctxt(vcpu))
> +			goto immediate_write;
> +
> +		/* Store the EL2 version in the sysregs array. */
> +		__vcpu_sys_reg(vcpu, reg) = val;
> +
> +		el2_reg = find_el2_sysreg(nested_sysreg_map, reg);
> +		if (el2_reg) {
> +			/* Does this register have an EL1 counterpart? */
> +			if (el2_reg->mapping == __INVALID_SYSREG__)
> +				return;
> +
> +			if (!vcpu_el2_e2h_is_set(vcpu) &&
> +			    el2_reg->translate)
> +				val = el2_reg->translate(val);
> +
> +			/* Redirect this to the EL1 version of the register. */
> +			reg = el2_reg->mapping;
> +		}
> +	} else {
> +		/* EL1 register can't be on the CPU if the guest is in vEL2. */
> +		if (unlikely(is_hyp_ctxt(vcpu)))
> +			goto immediate_write;
> +	}
> +
>  	/*
>  	 * System registers listed in the switch are not restored on every
>  	 * entry to the guest but are only restored on vcpu_load.
> @@ -157,6 +318,8 @@ void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, u64 val, int reg)
>  	case DACR32_EL2:	write_sysreg_s(val, SYS_DACR32_EL2);	return;
>  	case IFSR32_EL2:	write_sysreg_s(val, SYS_IFSR32_EL2);	return;
>  	case DBGVCR32_EL2:	write_sysreg_s(val, SYS_DBGVCR32_EL2);	return;
> +	case SP_EL2:		write_sysreg(val, sp_el1);		return;
> +	case ELR_EL2:		write_sysreg_el1(val, SYS_ELR);		return;
>  	}
>  
>  immediate_write:


* Re: [PATCH 43/59] KVM: arm64: nv: Trap and emulate AT instructions from virtual EL2
  2019-06-21  9:38 ` [PATCH 43/59] KVM: arm64: nv: Trap and emulate AT instructions from virtual EL2 Marc Zyngier
@ 2019-07-01 15:45   ` Julien Thierry
  2019-07-09 13:20   ` Alexandru Elisei
  2019-07-24 10:25   ` Tomasz Nowicki
  2 siblings, 0 replies; 176+ messages in thread
From: Julien Thierry @ 2019-07-01 15:45 UTC (permalink / raw)
  To: Marc Zyngier, linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	James Morse, Suzuki K Poulose



On 21/06/2019 10:38, Marc Zyngier wrote:
> From: Jintack Lim <jintack.lim@linaro.org>
> 
> When supporting nested virtualization a guest hypervisor executing AT
> instructions must be trapped and emulated by the host hypervisor,
> because untrapped AT instructions operating on S1E1 will use the wrong
> translation regime (the one used to emulate virtual EL2 in EL1 instead
> of virtual EL1) and AT instructions operating on S12 will not work from
> EL1.
> 
> This patch does several things.
> 
> 1. List and define all AT system instructions to emulate and document
> the emulation design.
> 
> 2. Implement AT instruction handling logic in EL2. This will be used to
> emulate AT instructions executed in the virtual EL2.
> 
> AT instruction emulation works by loading the proper processor
> context, which depends on the trapped instruction and the virtual
> HCR_EL2, into the EL1 virtual memory control registers and executing AT
> instructions. Note that ctxt->hw_sys_regs is expected to have the
> proper processor context before calling the handling
> function (__kvm_at_insn) implemented in this patch.
> 
> 3. Emulate AT S1E[01] instructions by issuing the same instructions in
> EL2. We set the physical EL1 registers, NV and NV1 bits as described in
> the AT instruction emulation overview.
> 
> 4. Emulate AT S12E[01] instructions in two steps: First, do the stage-1
> translation by reusing the existing AT emulation functions.  Second, do
> the stage-2 translation by walking the guest hypervisor's stage-2 page
> table in software. Record the translation result to PAR_EL1.
> 
> 5. Emulate AT S1E2 instructions by issuing the corresponding S1E1
> instructions in EL2. We set the physical EL1 registers and the HCR_EL2
> register as described in the AT instruction emulation overview.
> 
> 6. Forward system instruction traps to the virtual EL2 if the corresponding
> virtual AT bit is set in the virtual HCR_EL2.
> 
>   [ Much logic above has been reworked by Marc Zyngier ]
> 
> Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
> ---
>  arch/arm64/include/asm/kvm_arm.h |   2 +
>  arch/arm64/include/asm/kvm_asm.h |   2 +
>  arch/arm64/include/asm/sysreg.h  |  17 +++
>  arch/arm64/kvm/hyp/Makefile      |   1 +
>  arch/arm64/kvm/hyp/at.c          | 217 +++++++++++++++++++++++++++++++
>  arch/arm64/kvm/hyp/switch.c      |  13 +-
>  arch/arm64/kvm/sys_regs.c        | 202 +++++++++++++++++++++++++++-
>  7 files changed, 450 insertions(+), 4 deletions(-)
>  create mode 100644 arch/arm64/kvm/hyp/at.c
> 
> diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
> index 1e4dbe0b1c8e..9903f10f6343 100644
> --- a/arch/arm64/include/asm/kvm_arm.h
> +++ b/arch/arm64/include/asm/kvm_arm.h
> @@ -24,6 +24,7 @@
>  
>  /* Hyp Configuration Register (HCR) bits */
>  #define HCR_FWB		(UL(1) << 46)
> +#define HCR_AT		(UL(1) << 44)
>  #define HCR_NV1		(UL(1) << 43)
>  #define HCR_NV		(UL(1) << 42)
>  #define HCR_API		(UL(1) << 41)
> @@ -119,6 +120,7 @@
>  #define VTCR_EL2_TG0_16K	TCR_TG0_16K
>  #define VTCR_EL2_TG0_64K	TCR_TG0_64K
>  #define VTCR_EL2_SH0_MASK	TCR_SH0_MASK
> +#define VTCR_EL2_SH0_SHIFT	TCR_SH0_SHIFT
>  #define VTCR_EL2_SH0_INNER	TCR_SH0_INNER
>  #define VTCR_EL2_ORGN0_MASK	TCR_ORGN0_MASK
>  #define VTCR_EL2_ORGN0_WBWA	TCR_ORGN0_WBWA
> diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
> index 5e956c2cd9b4..1cfa4d2cf772 100644
> --- a/arch/arm64/include/asm/kvm_asm.h
> +++ b/arch/arm64/include/asm/kvm_asm.h
> @@ -69,6 +69,8 @@ extern void __kvm_tlb_flush_vmid(struct kvm_s2_mmu *mmu);
>  extern void __kvm_tlb_flush_local_vmid(struct kvm_vcpu *vcpu);
>  
>  extern void __kvm_timer_set_cntvoff(u32 cntvoff_low, u32 cntvoff_high);
> +extern void __kvm_at_s1e01(struct kvm_vcpu *vcpu, u32 op, u64 vaddr);
> +extern void __kvm_at_s1e2(struct kvm_vcpu *vcpu, u32 op, u64 vaddr);
>  
>  extern int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu);
>  
> diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
> index 8b95f2c42c3d..b3a8d21c07b3 100644
> --- a/arch/arm64/include/asm/sysreg.h
> +++ b/arch/arm64/include/asm/sysreg.h
> @@ -536,6 +536,23 @@
>  
>  #define SYS_SP_EL2			sys_reg(3, 6,  4, 1, 0)
>  
> +/* AT instructions */
> +#define AT_Op0 1
> +#define AT_CRn 7
> +
> +#define OP_AT_S1E1R	sys_insn(AT_Op0, 0, AT_CRn, 8, 0)
> +#define OP_AT_S1E1W	sys_insn(AT_Op0, 0, AT_CRn, 8, 1)
> +#define OP_AT_S1E0R	sys_insn(AT_Op0, 0, AT_CRn, 8, 2)
> +#define OP_AT_S1E0W	sys_insn(AT_Op0, 0, AT_CRn, 8, 3)
> +#define OP_AT_S1E1RP	sys_insn(AT_Op0, 0, AT_CRn, 9, 0)
> +#define OP_AT_S1E1WP	sys_insn(AT_Op0, 0, AT_CRn, 9, 1)
> +#define OP_AT_S1E2R	sys_insn(AT_Op0, 4, AT_CRn, 8, 0)
> +#define OP_AT_S1E2W	sys_insn(AT_Op0, 4, AT_CRn, 8, 1)
> +#define OP_AT_S12E1R	sys_insn(AT_Op0, 4, AT_CRn, 8, 4)
> +#define OP_AT_S12E1W	sys_insn(AT_Op0, 4, AT_CRn, 8, 5)
> +#define OP_AT_S12E0R	sys_insn(AT_Op0, 4, AT_CRn, 8, 6)
> +#define OP_AT_S12E0W	sys_insn(AT_Op0, 4, AT_CRn, 8, 7)
> +
>  /* Common SCTLR_ELx flags. */
>  #define SCTLR_ELx_DSSBS	(_BITUL(44))
>  #define SCTLR_ELx_ENIA	(_BITUL(31))
> diff --git a/arch/arm64/kvm/hyp/Makefile b/arch/arm64/kvm/hyp/Makefile
> index ea710f674cb6..f7af51647079 100644
> --- a/arch/arm64/kvm/hyp/Makefile
> +++ b/arch/arm64/kvm/hyp/Makefile
> @@ -19,6 +19,7 @@ obj-$(CONFIG_KVM_ARM_HOST) += entry.o
>  obj-$(CONFIG_KVM_ARM_HOST) += switch.o
>  obj-$(CONFIG_KVM_ARM_HOST) += fpsimd.o
>  obj-$(CONFIG_KVM_ARM_HOST) += tlb.o
> +obj-$(CONFIG_KVM_ARM_HOST) += at.o
>  obj-$(CONFIG_KVM_ARM_HOST) += hyp-entry.o
>  
>  # KVM code is run at a different exception code with a different map, so
> diff --git a/arch/arm64/kvm/hyp/at.c b/arch/arm64/kvm/hyp/at.c
> new file mode 100644
> index 000000000000..0e938b6f8e43
> --- /dev/null
> +++ b/arch/arm64/kvm/hyp/at.c
> @@ -0,0 +1,217 @@
> +/*
> + * Copyright (C) 2017 - Linaro Ltd
> + * Author: Jintack Lim <jintack.lim@linaro.org>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include <asm/kvm_hyp.h>
> +#include <asm/kvm_mmu.h>
> +
> +struct mmu_config {
> +	u64	ttbr0;
> +	u64	ttbr1;
> +	u64	tcr;
> +	u64	sctlr;
> +	u64	vttbr;
> +	u64	vtcr;
> +	u64	hcr;
> +};
> +
> +static void __mmu_config_save(struct mmu_config *config)
> +{
> +	config->ttbr0	= read_sysreg_el1(SYS_TTBR0);
> +	config->ttbr1	= read_sysreg_el1(SYS_TTBR1);
> +	config->tcr	= read_sysreg_el1(SYS_TCR);
> +	config->sctlr	= read_sysreg_el1(SYS_SCTLR);
> +	config->vttbr	= read_sysreg(vttbr_el2);
> +	config->vtcr	= read_sysreg(vtcr_el2);
> +	config->hcr	= read_sysreg(hcr_el2);
> +}
> +
> +static void __mmu_config_restore(struct mmu_config *config)
> +{
> +	write_sysreg_el1(config->ttbr0,	SYS_TTBR0);
> +	write_sysreg_el1(config->ttbr1,	SYS_TTBR1);
> +	write_sysreg_el1(config->tcr,	SYS_TCR);
> +	write_sysreg_el1(config->sctlr,	SYS_SCTLR);
> +	write_sysreg(config->vttbr,	vttbr_el2);
> +	write_sysreg(config->vtcr,	vttbr_el2);

Copy-paste with terrible consequences! I guess you want to write this
one to vtcr_el2.
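
That is, presumably:

	write_sysreg(config->vtcr,	vtcr_el2);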

Actually, things still seem to run with that. It looks like the
save/restore might not be completely required.

This seems to only get called in the context of handle_exit(). At that
point I think we don't need to save the *_el2 registers. vttbr_el2 and
vtcr_el2 both get set from the vcpu content in __activate_vm() before
jumping to EL1 (or vEL2), and hcr_el2 gets set in the same manner in
__activate_traps().

I think the *_el1 regs still need the save/restore as we don't hit
vcpu_load() before re-running the guest after a successful handle_exit().

So unless we plan to call the "at" emulation code within
kvm_vcpu_run_vhe(), it should be safe to drop the hcr/vttbr/vtcr from
the mmu_config.
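
With that, the saved state would shrink to something like (an untested
sketch):

	struct mmu_config {
		u64	ttbr0;
		u64	ttbr1;
		u64	tcr;
		u64	sctlr;
	};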

> +	write_sysreg(config->hcr,	hcr_el2);
> +
> +	isb();
> +}
> +

Cheers,

-- 
Julien Thierry

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH 18/59] KVM: arm64: nv: Trap EL1 VM register accesses in virtual EL2
  2019-06-21  9:38 ` [PATCH 18/59] KVM: arm64: nv: Trap EL1 VM register accesses in virtual EL2 Marc Zyngier
@ 2019-07-01 16:12   ` Alexandru Elisei
  0 siblings, 0 replies; 176+ messages in thread
From: Alexandru Elisei @ 2019-07-01 16:12 UTC (permalink / raw)
  To: Marc Zyngier, linux-arm-kernel, kvmarm, kvm; +Cc: Andre Przywara, Dave Martin


On 6/21/19 10:38 AM, Marc Zyngier wrote:
> From: Christoffer Dall <christoffer.dall@linaro.org>
>
> When running in virtual EL2 mode, we actually run the hardware in EL1
> and therefore have to use the EL1 registers to ensure correct operation.
>
> By setting the HCR.TVM and HCR.TRVM we ensure that the virtual EL2 mode
> doesn't shoot itself in the foot when setting up what it believes to be
> a different mode's system register state (for example when preparing to
> switch to a VM).
A guest hypervisor with vhe enabled uses the _EL12 register names when preparing
to run a guest, and accesses to those registers are already trapped when setting
HCR_EL2.NV. This patch affects only non-vhe guest hypervisors; would you mind
updating the message to reflect that?
>
> We can leverage the existing sysregs infrastructure to support trapped
> accesses to these registers.
>
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  arch/arm64/kvm/hyp/switch.c | 4 ++++
>  arch/arm64/kvm/sys_regs.c   | 7 ++++++-
>  2 files changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index 7b55c11b30fb..791b26570347 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -135,6 +135,10 @@ static void __hyp_text __activate_traps(struct kvm_vcpu *vcpu)
>  {
>  	u64 hcr = vcpu->arch.hcr_el2;
>  
> +	/* Trap VM sysreg accesses if an EL2 guest is not using VHE. */
> +	if (vcpu_mode_el2(vcpu) && !vcpu_el2_e2h_is_set(vcpu))
> +		hcr |= HCR_TVM | HCR_TRVM;
> +
>  	write_sysreg(hcr, hcr_el2);
>  
>  	if (cpus_have_const_cap(ARM64_HAS_RAS_EXTN) && (hcr & HCR_VSE))
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index e181359adadf..0464d8e29cba 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -440,7 +440,12 @@ static bool access_vm_reg(struct kvm_vcpu *vcpu,
>  	u64 val;
>  	int reg = r->reg;
>  
> -	BUG_ON(!p->is_write);
> +	BUG_ON(!vcpu_mode_el2(vcpu) && !p->is_write);
> +
> +	if (!p->is_write) {
> +		p->regval = vcpu_read_sys_reg(vcpu, reg);
> +		return true;
> +	}
>  
>  	/* See the 32bit mapping in kvm_host.h */
>  	if (p->is_aarch32)

For more context:

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index e181359adadf..0464d8e29cba 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -428,31 +428,36 @@ static bool access_dcsw(struct kvm_vcpu *vcpu,
 }
 
 /*
  * Generic accessor for VM registers. Only called as long as HCR_TVM
  * is set. If the guest enables the MMU, we stop trapping the VM
  * sys_regs and leave it in complete control of the caches.
  */
 static bool access_vm_reg(struct kvm_vcpu *vcpu,
                          struct sys_reg_params *p,
                          const struct sys_reg_desc *r)
 {
        bool was_enabled = vcpu_has_cache_enabled(vcpu);
        u64 val;
        int reg = r->reg;
 
-       BUG_ON(!p->is_write);
+       BUG_ON(!vcpu_mode_el2(vcpu) && !p->is_write);
+
+       if (!p->is_write) {
+               p->regval = vcpu_read_sys_reg(vcpu, reg);
+               return true;
+       }
 
        /* See the 32bit mapping in kvm_host.h */
        if (p->is_aarch32)
                reg = r->reg / 2;
 
        if (!p->is_aarch32 || !p->is_32bit) {
                val = p->regval;
        } else {
                val = vcpu_read_sys_reg(vcpu, reg);
                if (r->reg % 2)
                        val = (p->regval << 32) | (u64)lower_32_bits(val);
                else
                        val = ((u64)upper_32_bits(val) << 32) |
                                lower_32_bits(p->regval);
        }

Perhaps the function comment should be updated to reflect that the function is
also used to handle VM register read traps from a non-vhe guest hypervisor?
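
Something along the lines of (a sketch only):

	/*
	 * Generic accessor for VM registers. Only called as long as
	 * HCR_TVM is set (or, for a non-vhe guest hypervisor, while
	 * the EL1 VM registers are trapped for both reads and writes).
	 * If the guest enables the MMU, we stop trapping the VM
	 * sys_regs and leave it in complete control of the caches.
	 */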


^ permalink raw reply related	[flat|nested] 176+ messages in thread

* Re: [PATCH 07/59] KVM: arm64: nv: Add EL2 system registers to vcpu context
  2019-06-21  9:37 ` [PATCH 07/59] KVM: arm64: nv: Add EL2 system registers to vcpu context Marc Zyngier
  2019-06-24 12:54   ` Dave Martin
  2019-06-24 15:47   ` Alexandru Elisei
@ 2019-07-01 16:36   ` Suzuki K Poulose
  2 siblings, 0 replies; 176+ messages in thread
From: Suzuki K Poulose @ 2019-07-01 16:36 UTC (permalink / raw)
  To: marc.zyngier, linux-arm-kernel, kvmarm, kvm
  Cc: andre.przywara, christoffer.dall, dave.martin, jintack,
	julien.thierry, james.morse

Hi Marc,

On 21/06/2019 10:37, Marc Zyngier wrote:
> From: Jintack Lim <jintack.lim@linaro.org>
> 
> ARM v8.3 introduces a new bit in the HCR_EL2, which is the NV bit. When
> this bit is set, accessing EL2 registers in EL1 traps to EL2. In
> addition, executing the following instructions in EL1 will trap to EL2:
> tlbi, at, eret, and msr/mrs instructions to access SP_EL1. Most of the
> instructions that trap to EL2 with the NV bit were undef at EL1 prior to
> ARM v8.3. The only instruction that was not undef is eret.
> 
> This patch sets up a handler for EL2 registers and SP_EL1 register
> accesses at EL1. The host hypervisor keeps those register values in
> memory, and will emulate their behavior.
> 
> This patch doesn't set the NV bit yet. It will be set in a later patch
> once nested virtualization support is completed.
> 
> Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>   arch/arm64/include/asm/kvm_host.h | 37 +++++++++++++++-
>   arch/arm64/include/asm/sysreg.h   | 50 ++++++++++++++++++++-
>   arch/arm64/kvm/sys_regs.c         | 74 ++++++++++++++++++++++++++++---
>   3 files changed, 154 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 4bcd9c1291d5..2d4290d2513a 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -173,12 +173,47 @@ enum vcpu_sysreg {
>   	APGAKEYLO_EL1,
>   	APGAKEYHI_EL1,
>   
> -	/* 32bit specific registers. Keep them at the end of the range */
> +	/* 32bit specific registers. */
>   	DACR32_EL2,	/* Domain Access Control Register */
>   	IFSR32_EL2,	/* Instruction Fault Status Register */
>   	FPEXC32_EL2,	/* Floating-Point Exception Control Register */
>   	DBGVCR32_EL2,	/* Debug Vector Catch Register */
>   
> +	/* EL2 registers sorted ascending by Op0, Op1, CRn, CRm, Op2 */
> +	FIRST_EL2_SYSREG,
> +	VPIDR_EL2 = FIRST_EL2_SYSREG,
> +			/* Virtualization Processor ID Register */
> +	VMPIDR_EL2,	/* Virtualization Multiprocessor ID Register */
> +	SCTLR_EL2,	/* System Control Register (EL2) */
> +	ACTLR_EL2,	/* Auxiliary Control Register (EL2) */
> +	HCR_EL2,	/* Hypervisor Configuration Register */
> +	MDCR_EL2,	/* Monitor Debug Configuration Register (EL2) */
> +	CPTR_EL2,	/* Architectural Feature Trap Register (EL2) */
> +	HSTR_EL2,	/* Hypervisor System Trap Register */
> +	HACR_EL2,	/* Hypervisor Auxiliary Control Register */
> +	TTBR0_EL2,	/* Translation Table Base Register 0 (EL2) */
> +	TTBR1_EL2,	/* Translation Table Base Register 1 (EL2) */
> +	TCR_EL2,	/* Translation Control Register (EL2) */
> +	VTTBR_EL2,	/* Virtualization Translation Table Base Register */
> +	VTCR_EL2,	/* Virtualization Translation Control Register */
> +	SPSR_EL2,	/* EL2 saved program status register */
> +	ELR_EL2,	/* EL2 exception link register */
> +	AFSR0_EL2,	/* Auxiliary Fault Status Register 0 (EL2) */
> +	AFSR1_EL2,	/* Auxiliary Fault Status Register 1 (EL2) */
> +	ESR_EL2,	/* Exception Syndrome Register (EL2) */
> +	FAR_EL2,	/* Hypervisor IPA Fault Address Register */
> +	HPFAR_EL2,	/* Hypervisor IPA Fault Address Register */
> +	MAIR_EL2,	/* Memory Attribute Indirection Register (EL2) */
> +	AMAIR_EL2,	/* Auxiliary Memory Attribute Indirection Register (EL2) */
> +	VBAR_EL2,	/* Vector Base Address Register (EL2) */
> +	RVBAR_EL2,	/* Reset Vector Base Address Register */
> +	RMR_EL2,	/* Reset Management Register */
> +	CONTEXTIDR_EL2,	/* Context ID Register (EL2) */
> +	TPIDR_EL2,	/* EL2 Software Thread ID Register */
> +	CNTVOFF_EL2,	/* Counter-timer Virtual Offset register */
> +	CNTHCTL_EL2,	/* Counter-timer Hypervisor Control register */
> +	SP_EL2,		/* EL2 Stack Pointer */

Does this need to include the Counter-timer Hyp Physical timer registers,
i.e. CNTHP_{CTL,CVAL,TVAL}_EL2? Or do we have some other magic with NV for
the virtual EL2?
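
If they do need to be listed, presumably something along these lines
would be appended (a hypothetical sketch, names assumed):

	CNTHP_CTL_EL2,	/* Counter-timer Hyp Physical Timer Control register */
	CNTHP_CVAL_EL2,	/* Counter-timer Hyp Physical Timer CompareValue register */
	CNTHP_TVAL_EL2,	/* Counter-timer Hyp Physical Timer TimerValue register */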

Cheers
Suzuki


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH 20/59] KVM: arm64: nv: Trap CPACR_EL1 access in virtual EL2
  2019-06-21  9:38 ` [PATCH 20/59] KVM: arm64: nv: Trap CPACR_EL1 access in " Marc Zyngier
@ 2019-07-01 16:40   ` Alexandru Elisei
  0 siblings, 0 replies; 176+ messages in thread
From: Alexandru Elisei @ 2019-07-01 16:40 UTC (permalink / raw)
  To: Marc Zyngier, linux-arm-kernel, kvmarm, kvm; +Cc: Andre Przywara, Dave Martin


On 6/21/19 10:38 AM, Marc Zyngier wrote:
> From: Jintack Lim <jintack.lim@linaro.org>
>
> For the same reason we trap virtual memory register accesses in virtual
> EL2, we trap CPACR_EL1 access too; We allow the virtual EL2 mode to
> access EL1 system register state instead of the virtual EL2 one.
>
> Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  arch/arm64/include/asm/kvm_arm.h | 3 ++-
>  arch/arm64/kvm/hyp/switch.c      | 2 ++
>  arch/arm64/kvm/sys_regs.c        | 2 +-
>  3 files changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
> index b2e363ac624d..48e15af2bece 100644
> --- a/arch/arm64/include/asm/kvm_arm.h
> +++ b/arch/arm64/include/asm/kvm_arm.h
> @@ -278,12 +278,13 @@
>  #define CPTR_EL2_TFP_SHIFT 10
>  
>  /* Hyp Coprocessor Trap Register */
> -#define CPTR_EL2_TCPAC	(1 << 31)
> +#define CPTR_EL2_TCPAC	(1U << 31)
>  #define CPTR_EL2_TTA	(1 << 20)
>  #define CPTR_EL2_TFP	(1 << CPTR_EL2_TFP_SHIFT)
>  #define CPTR_EL2_TZ	(1 << 8)
>  #define CPTR_EL2_RES1	0x000032ff /* known RES1 bits in CPTR_EL2 */
>  #define CPTR_EL2_DEFAULT	CPTR_EL2_RES1
> +#define CPTR_EL2_E2H_TCPAC	(1U << 31)
I'm not sure why CPTR_EL2_TCPAC is being renamed to CPTR_EL2_E2H_TCPAC.
CPTR_EL2.TCPAC is always bit 31, regardless of the value of HCR_EL2.E2H. I also
did a grep and it's only used in the one place added by this patch.
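
The hunk below could then reuse the existing define directly (sketch):

	if (vcpu_mode_el2(vcpu) && !vcpu_el2_e2h_is_set(vcpu))
		val |= CPTR_EL2_TCPAC;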
>  
>  /* Hyp Debug Configuration Register bits */
>  #define MDCR_EL2_TPMS		(1 << 14)
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index 791b26570347..62359c7c3d6b 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -108,6 +108,8 @@ static void activate_traps_vhe(struct kvm_vcpu *vcpu)
>  		val &= ~CPACR_EL1_FPEN;
>  		__activate_traps_fpsimd32(vcpu);
>  	}
> +	if (vcpu_mode_el2(vcpu) && !vcpu_el2_e2h_is_set(vcpu))
> +		val |= CPTR_EL2_E2H_TCPAC;
>  
>  	write_sysreg(val, cpacr_el1);
>  
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 7fc87657382d..1d1312425cf2 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -1773,7 +1773,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
>  	ID_UNALLOCATED(7,7),
>  
>  	{ SYS_DESC(SYS_SCTLR_EL1), access_vm_reg, reset_val, SCTLR_EL1, 0x00C50078 },
> -	{ SYS_DESC(SYS_CPACR_EL1), NULL, reset_val, CPACR_EL1, 0 },
> +	{ SYS_DESC(SYS_CPACR_EL1), access_rw, reset_val, CPACR_EL1, 0 },
>  	{ SYS_DESC(SYS_ZCR_EL1), NULL, reset_val, ZCR_EL1, 0, .visibility = sve_visibility },
>  	{ SYS_DESC(SYS_TTBR0_EL1), access_vm_reg, reset_unknown, TTBR0_EL1 },
>  	{ SYS_DESC(SYS_TTBR1_EL1), access_vm_reg, reset_unknown, TTBR1_EL1 },

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH 12/59] KVM: arm64: nv: Handle trapped ERET from virtual EL2
  2019-06-21  9:37 ` [PATCH 12/59] KVM: arm64: nv: Handle trapped ERET from " Marc Zyngier
@ 2019-07-02 12:00   ` Alexandru Elisei
  0 siblings, 0 replies; 176+ messages in thread
From: Alexandru Elisei @ 2019-07-02 12:00 UTC (permalink / raw)
  To: Marc Zyngier, linux-arm-kernel, kvmarm, kvm; +Cc: Andre Przywara, Dave Martin


On 6/21/19 10:37 AM, Marc Zyngier wrote:
> From: Christoffer Dall <christoffer.dall@arm.com>
>
> When a guest hypervisor running virtual EL2 in EL1 executes an ERET
> instruction, we will have set HCR_EL2.NV which traps ERET to EL2, so
> that we can emulate the exception return in software.
>
> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  arch/arm64/include/asm/esr.h     | 3 ++-
>  arch/arm64/include/asm/kvm_arm.h | 2 +-
>  arch/arm64/kvm/handle_exit.c     | 8 ++++++++
>  3 files changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
> index 0e27fe91d5ea..f85aa269082c 100644
> --- a/arch/arm64/include/asm/esr.h
> +++ b/arch/arm64/include/asm/esr.h
> @@ -45,7 +45,8 @@
>  #define ESR_ELx_EC_SMC64	(0x17)	/* EL2 and above */
>  #define ESR_ELx_EC_SYS64	(0x18)
>  #define ESR_ELx_EC_SVE		(0x19)
> -/* Unallocated EC: 0x1A - 0x1E */
> +#define ESR_ELx_EC_ERET		(0x1A)  /* EL2 only */

From ARM DDI 0487D.b, about HCR_EL2.NV (page D12-2889):

"The priority of this trap is higher than the priority of the HCR_EL2.API trap.
If both of these bits are set so that EL1 execution of an ERETAA or ERETAB
instruction is trapped to EL2, then the syndrome reported is 0x1A."

I'm not familiar with the pointer authentication code, but it looks like the
HCR_EL2.API bit will trap if userspace sets the pointer authentication vcpu
feature, and I don't see any handling of the ERETAA or ERETAB instructions in
kvm_emulate_nested_eret. Is that pending in the next iteration of the series? Or
are the two features incompatible?
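
For reference, the syndrome does make the two cases distinguishable in
the handler; a hypothetical check (raw ISS bit, no existing define
assumed):

	/* For EC 0x1A, ISS bit [1] is set for ERETAA/ERETAB and clear
	 * for a plain ERET. */
	if (kvm_vcpu_get_hsr(vcpu) & BIT(1)) {
		/* ELR_EL2 would need to be authenticated before
		 * emulating the exception return. */
	}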

> +/* Unallocated EC: 0x1B - 0x1E */
>  #define ESR_ELx_EC_IMP_DEF	(0x1f)	/* EL3 only */
>  #define ESR_ELx_EC_IABT_LOW	(0x20)
>  #define ESR_ELx_EC_IABT_CUR	(0x21)
> diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
> index 9d70a5362fbb..b2e363ac624d 100644
> --- a/arch/arm64/include/asm/kvm_arm.h
> +++ b/arch/arm64/include/asm/kvm_arm.h
> @@ -333,7 +333,7 @@
>  	ECN(SP_ALIGN), ECN(FP_EXC32), ECN(FP_EXC64), ECN(SERROR), \
>  	ECN(BREAKPT_LOW), ECN(BREAKPT_CUR), ECN(SOFTSTP_LOW), \
>  	ECN(SOFTSTP_CUR), ECN(WATCHPT_LOW), ECN(WATCHPT_CUR), \
> -	ECN(BKPT32), ECN(VECTOR32), ECN(BRK64)
> +	ECN(BKPT32), ECN(VECTOR32), ECN(BRK64), ECN(ERET)
>  
>  #define CPACR_EL1_FPEN		(3 << 20)
>  #define CPACR_EL1_TTA		(1 << 28)
> diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
> index 6c0ac52b34cc..2517711f034f 100644
> --- a/arch/arm64/kvm/handle_exit.c
> +++ b/arch/arm64/kvm/handle_exit.c
> @@ -177,6 +177,13 @@ static int handle_sve(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  {
>  	/* Until SVE is supported for guests: */
>  	kvm_inject_undefined(vcpu);
> +
> +	return 1;
> +}
> +
> +static int kvm_handle_eret(struct kvm_vcpu *vcpu, struct kvm_run *run)
> +{
> +	kvm_emulate_nested_eret(vcpu);
>  	return 1;
>  }
>  
> @@ -231,6 +238,7 @@ static exit_handle_fn arm_exit_handlers[] = {
>  	[ESR_ELx_EC_SMC64]	= handle_smc,
>  	[ESR_ELx_EC_SYS64]	= kvm_handle_sys_reg,
>  	[ESR_ELx_EC_SVE]	= handle_sve,
> +	[ESR_ELx_EC_ERET]	= kvm_handle_eret,
>  	[ESR_ELx_EC_IABT_LOW]	= kvm_handle_guest_abort,
>  	[ESR_ELx_EC_DABT_LOW]	= kvm_handle_guest_abort,
>  	[ESR_ELx_EC_SOFTSTP_LOW]= kvm_handle_guest_debug,

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH 44/59] KVM: arm64: nv: Trap and emulate TLBI instructions from virtual EL2
  2019-06-21  9:38 ` [PATCH 44/59] KVM: arm64: nv: Trap and emulate TLBI " Marc Zyngier
@ 2019-07-02 12:37   ` Julien Thierry
  2019-07-10 10:15   ` Alexandru Elisei
  1 sibling, 0 replies; 176+ messages in thread
From: Julien Thierry @ 2019-07-02 12:37 UTC (permalink / raw)
  To: Marc Zyngier, linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	James Morse, Suzuki K Poulose



On 21/06/2019 10:38, Marc Zyngier wrote:
> From: Jintack Lim <jintack.lim@linaro.org>
> 
> When supporting nested virtualization a guest hypervisor executing TLBI
> instructions must be trapped and emulated by the host hypervisor,
> because the guest hypervisor can only affect physical TLB entries
> relating to its own execution environment (virtual EL2 in EL1) but not
> to the nested guests, as the semantics of the instructions require;
> TLBI instructions might also result in updates (invalidations) to the
> shadow page tables.
> 
> This patch does several things.
> 
> 1. List and define all TLBI system instructions to emulate.
> 
> 2. Emulate TLBI ALLE2(IS) instruction executed in the virtual EL2. Since
> we emulate the virtual EL2 in the EL1, we invalidate EL1&0 regime stage
> 1 TLB entries with setting vttbr_el2 having the VMID of the virtual EL2.
> 
> 3. Emulate TLBI VAE2* instruction executed in the virtual EL2. Based on the
> same principle as TLBI ALLE2 instruction, we can simply emulate those
> instructions by executing corresponding VAE1* instructions with the
> virtual EL2's VMID assigned by the host hypervisor.
> 
> Note that we are able to emulate TLBI ALLE2IS precisely by only
> invalidating stage 1 TLB entries via the TLBI VMALLE1IS instruction, but to
> make it simple, we reuse the existing function, __kvm_tlb_flush_vmid(),
> which invalidates both stage 1 and stage 2 TLB entries.
> 
> 4. TLBI ALLE1(IS) instruction invalidates all EL1&0 regime stage 1 and 2
> TLB entries (on all PEs in the same Inner Shareable domain). To emulate
> these instructions, we first need to clear all the mappings in the
> shadow page tables since executing those instructions implies the change
> of mappings in the stage 2 page tables maintained by the guest
> hypervisor.  We then need to invalidate all EL1&0 regime stage 1 and 2
> TLB entries of all VMIDs, which are assigned by the host hypervisor, for
> this VM.
> 
> 5. Based on the same principle as TLBI ALLE1(IS) emulation, we clear the
> mappings in the shadow stage-2 page tables and invalidate TLB entries.
> But this time we do it only for the current VMID from the guest
> hypervisor's perspective, not for all VMIDs.
> 
> 6. Based on the same principle as TLBI ALLE1(IS) and TLBI VMALLS12E1(IS)
> emulation, we clear the mappings in the shadow stage-2 page tables and
> invalidate TLB entries. We do it only for one mapping for the current
> VMID from the guest hypervisor's view.
> 
> 7. Forward system instruction traps to the virtual EL2 if a
> corresponding bit in the virtual HCR_EL2 is set.
> 
> 8. Even though a guest hypervisor can execute TLBI instructions that are
> accessible at EL1 without trapping, that is wrong; all those TLBI
> instructions work based on the current VMID, and when running a guest hypervisor the current
> VMID is the one for itself, not the one from the virtual vttbr_el2. So
> letting a guest hypervisor execute those TLBI instructions results in
> invalidating its own TLB entries and leaving invalid TLB entries
> unhandled.
> 
> Therefore we trap and emulate those TLBI instructions. The emulation is
> simple; we find a shadow VMID mapped to the virtual vttbr_el2, set it in
> the physical vttbr_el2, then execute the same instruction in EL2.
> 
> We don't set HCR_EL2.TTLB bit yet.
> 
>   [ Changes performed by Marc Zyngier:
> 
>     The TLBI handling code more or less directly execute the same
>     instruction that has been trapped (with an EL2->EL1 conversion
>     in the case of an EL2 TLBI), but that's unfortunately not enough:
> 
>     - TLBIs must be upgraded to the Inner Shareable domain to account
>       for vcpu migration, just like we already have with HCR_EL2.FB.
> 
>     - The DSB instruction that synchronises these must thus be on
>       the Inner Shareable domain as well.
> 
>     - Prior to executing the TLBI, we need another DSB ISHST to make
>       sure that the update to the page tables is now visible.
> 
>       Ordering of system instructions fixed
> 
>     - The current TLB invalidation code is pretty buggy, as it assumes a
>       page mapping. On the contrary, it is likely that TLB invalidation
>       will cover more than a single page, and the size should be decided
>       by the guests configuration (and not the host's).
> 
>       Since we don't cache the guest mapping sizes in the shadow PT yet,
>       let's assume the worst case (a block mapping) and invalidate that.
> 
>       Take this opportunity to fix the decoding of the parameter (it
>       isn't a straight IPA).
> 
>     - In general, we always emulate local TLB invalidations as being
>       upgraded to the Inner Shareable domain so that we can easily
>       deal with vcpu migration. This is consistent with the fact that
>       we set HCR_EL2.FB when running non-nested VMs.
> 
>       So let's emulate TLBI ALLE2 as ALLE2IS.
>   ]
> 
>   [ Changes performed by Christoffer Dall:
> 
>     Sometimes when we are invalidating the TLB for a certain S2 MMU
>     context, this context can also have EL2 context associated with it
>     and we have to invalidate this too.
>   ]
> 
> Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
> ---
>  arch/arm64/include/asm/kvm_asm.h |   2 +
>  arch/arm64/include/asm/sysreg.h  |  36 ++++++
>  arch/arm64/kvm/hyp/tlb.c         |  81 +++++++++++++
>  arch/arm64/kvm/sys_regs.c        | 201 +++++++++++++++++++++++++++++++
>  virt/kvm/arm/mmu.c               |  18 ++-
>  5 files changed, 337 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
> index 1cfa4d2cf772..9cb9ab066ebc 100644
> --- a/arch/arm64/include/asm/kvm_asm.h
> +++ b/arch/arm64/include/asm/kvm_asm.h
> @@ -67,6 +67,8 @@ extern void __kvm_flush_vm_context(void);
>  extern void __kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu, phys_addr_t ipa);
>  extern void __kvm_tlb_flush_vmid(struct kvm_s2_mmu *mmu);
>  extern void __kvm_tlb_flush_local_vmid(struct kvm_vcpu *vcpu);
> +extern void __kvm_tlb_vae2(struct kvm_s2_mmu *mmu, u64 va, u64 sys_encoding);
> +extern void __kvm_tlb_el1_instr(struct kvm_s2_mmu *mmu, u64 val, u64 sys_encoding);
>  
>  extern void __kvm_timer_set_cntvoff(u32 cntvoff_low, u32 cntvoff_high);
>  extern void __kvm_at_s1e01(struct kvm_vcpu *vcpu, u32 op, u64 vaddr);
> diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
> index b3a8d21c07b3..e0912ececd92 100644
> --- a/arch/arm64/include/asm/sysreg.h
> +++ b/arch/arm64/include/asm/sysreg.h
> @@ -553,6 +553,42 @@
>  #define OP_AT_S12E0R	sys_insn(AT_Op0, 4, AT_CRn, 8, 6)
>  #define OP_AT_S12E0W	sys_insn(AT_Op0, 4, AT_CRn, 8, 7)
>  
> +/* TLBI instructions */
> +#define TLBI_Op0	1
> +#define TLBI_Op1_EL1	0	/* Accessible from EL1 or higher */
> +#define TLBI_Op1_EL2	4	/* Accessible from EL2 or higher */
> +#define TLBI_CRn	8
> +#define tlbi_insn_el1(CRm, Op2)	sys_insn(TLBI_Op0, TLBI_Op1_EL1, TLBI_CRn, (CRm), (Op2))
> +#define tlbi_insn_el2(CRm, Op2)	sys_insn(TLBI_Op0, TLBI_Op1_EL2, TLBI_CRn, (CRm), (Op2))
> +
> +#define OP_TLBI_VMALLE1IS	tlbi_insn_el1(3, 0)
> +#define OP_TLBI_VAE1IS		tlbi_insn_el1(3, 1)
> +#define OP_TLBI_ASIDE1IS	tlbi_insn_el1(3, 2)
> +#define OP_TLBI_VAAE1IS		tlbi_insn_el1(3, 3)
> +#define OP_TLBI_VALE1IS		tlbi_insn_el1(3, 5)
> +#define OP_TLBI_VAALE1IS	tlbi_insn_el1(3, 7)
> +#define OP_TLBI_VMALLE1		tlbi_insn_el1(7, 0)
> +#define OP_TLBI_VAE1		tlbi_insn_el1(7, 1)
> +#define OP_TLBI_ASIDE1		tlbi_insn_el1(7, 2)
> +#define OP_TLBI_VAAE1		tlbi_insn_el1(7, 3)
> +#define OP_TLBI_VALE1		tlbi_insn_el1(7, 5)
> +#define OP_TLBI_VAALE1		tlbi_insn_el1(7, 7)
> +
> +#define OP_TLBI_IPAS2E1IS	tlbi_insn_el2(0, 1)
> +#define OP_TLBI_IPAS2LE1IS	tlbi_insn_el2(0, 5)
> +#define OP_TLBI_ALLE2IS		tlbi_insn_el2(3, 0)
> +#define OP_TLBI_VAE2IS		tlbi_insn_el2(3, 1)
> +#define OP_TLBI_ALLE1IS		tlbi_insn_el2(3, 4)
> +#define OP_TLBI_VALE2IS		tlbi_insn_el2(3, 5)
> +#define OP_TLBI_VMALLS12E1IS	tlbi_insn_el2(3, 6)
> +#define OP_TLBI_IPAS2E1		tlbi_insn_el2(4, 1)
> +#define OP_TLBI_IPAS2LE1	tlbi_insn_el2(4, 5)
> +#define OP_TLBI_ALLE2		tlbi_insn_el2(7, 0)
> +#define OP_TLBI_VAE2		tlbi_insn_el2(7, 1)
> +#define OP_TLBI_ALLE1		tlbi_insn_el2(7, 4)
> +#define OP_TLBI_VALE2		tlbi_insn_el2(7, 5)
> +#define OP_TLBI_VMALLS12E1	tlbi_insn_el2(7, 6)
> +
>  /* Common SCTLR_ELx flags. */
>  #define SCTLR_ELx_DSSBS	(_BITUL(44))
>  #define SCTLR_ELx_ENIA	(_BITUL(31))
> diff --git a/arch/arm64/kvm/hyp/tlb.c b/arch/arm64/kvm/hyp/tlb.c
> index 779405db3fb3..026afbf1a697 100644
> --- a/arch/arm64/kvm/hyp/tlb.c
> +++ b/arch/arm64/kvm/hyp/tlb.c
> @@ -205,3 +205,84 @@ void __hyp_text __kvm_flush_vm_context(void)
>  	asm volatile("ic ialluis" : : );
>  	dsb(ish);
>  }
> +
> +void __hyp_text __kvm_tlb_vae2(struct kvm_s2_mmu *mmu, u64 va, u64 sys_encoding)
> +{
> +	struct tlb_inv_context cxt;
> +
> +	dsb(ishst);
> +
> +	/* Switch to requested VMID */
> +	__tlb_switch_to_guest()(mmu, &cxt);
> +
> +	/*
> +	 * Execute the EL1 version of TLBI VAE2* instruction, forcing
> +	 * an upgrade to the Inner Shareable domain in order to
> +	 * perform the invalidation on all CPUs.
> +	 */
> +	switch (sys_encoding) {
> +	case OP_TLBI_VAE2:
> +	case OP_TLBI_VAE2IS:
> +		__tlbi(vae1is, va);
> +		break;
> +	case OP_TLBI_VALE2:
> +	case OP_TLBI_VALE2IS:
> +		__tlbi(vale1is, va);
> +		break;
> +	default:
> +		break;
> +	}
> +	dsb(ish);
> +	isb();
> +
> +	__tlb_switch_to_host()(&cxt);
> +}
> +
> +void __hyp_text __kvm_tlb_el1_instr(struct kvm_s2_mmu *mmu, u64 val, u64 sys_encoding)
> +{
> +	struct tlb_inv_context cxt;
> +
> +	dsb(ishst);
> +
> +	/* Switch to requested VMID */
> +	__tlb_switch_to_guest()(mmu, &cxt);
> +
> +	/*
> +	 * Execute the same instruction as the guest hypervisor did,
> +	 * expanding the scope of local TLB invalidations to the Inner
> +	 * Shareable domain so that it takes place on all CPUs. This
> +	 * is equivalent to having HCR_EL2.FB set.
> +	 */
> +	switch (sys_encoding) {
> +	case OP_TLBI_VMALLE1:
> +	case OP_TLBI_VMALLE1IS:
> +		__tlbi(vmalle1is);
> +		break;
> +	case OP_TLBI_VAE1:
> +	case OP_TLBI_VAE1IS:
> +		__tlbi(vae1is, val);
> +		break;
> +	case OP_TLBI_ASIDE1:
> +	case OP_TLBI_ASIDE1IS:
> +		__tlbi(aside1is, val);
> +		break;
> +	case OP_TLBI_VAAE1:
> +	case OP_TLBI_VAAE1IS:
> +		__tlbi(vaae1is, val);
> +		break;
> +	case OP_TLBI_VALE1:
> +	case OP_TLBI_VALE1IS:
> +		__tlbi(vale1is, val);
> +		break;
> +	case OP_TLBI_VAALE1:
> +	case OP_TLBI_VAALE1IS:
> +		__tlbi(vaale1is, val);
> +		break;
> +	default:
> +		break;
> +	}
> +	dsb(ish);
> +	isb();
> +
> +	__tlb_switch_to_host()(&cxt);
> +}
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 102419b837e8..0343682fe47f 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -1661,6 +1661,11 @@ static bool forward_at_traps(struct kvm_vcpu *vcpu)
>  	return forward_traps(vcpu, HCR_AT);
>  }
>  
> +static bool forward_ttlb_traps(struct kvm_vcpu *vcpu)
> +{
> +	return forward_traps(vcpu, HCR_TTLB);
> +}
> +
>  /* This function is to support the recursive nested virtualization */
>  static bool forward_nv1_traps(struct kvm_vcpu *vcpu, struct sys_reg_params *p)
>  {
> @@ -2251,6 +2256,174 @@ static bool handle_s12w(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
>  	return handle_s12(vcpu, p, r, true);
>  }
>  
> +static bool handle_alle2is(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
> +			   const struct sys_reg_desc *r)
> +{
> +	/*
> +	 * To emulate invalidating all EL2 regime stage 1 TLB entries for all
> +	 * PEs, executing TLBI VMALLE1IS is enough. But reuse the existing
> +	 * interface for the simplicity; invalidating stage 2 entries doesn't
> +	 * affect the correctness.
> +	 */
> +	kvm_call_hyp(__kvm_tlb_flush_vmid, &vcpu->kvm->arch.mmu);
> +	return true;
> +}
> +
> +static bool handle_vae2(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
> +		       const struct sys_reg_desc *r)
> +{
> +	int sys_encoding = sys_insn(p->Op0, p->Op1, p->CRn, p->CRm, p->Op2);
> +
> +	/*
> +	 * Based on the same principle as TLBI ALLE2 instruction emulation, we
> +	 * emulate TLBI VAE2* instructions by executing corresponding TLBI VAE1*
> +	 * instructions with the virtual EL2's VMID assigned by the host
> +	 * hypervisor.
> +	 */
> +	kvm_call_hyp(__kvm_tlb_vae2, &vcpu->kvm->arch.mmu,
> +		     p->regval, sys_encoding);
> +	return true;
> +}
> +
> +static bool handle_alle1is(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
> +			   const struct sys_reg_desc *r)
> +{
> +	struct kvm_s2_mmu *mmu = &vcpu->kvm->arch.mmu;
> +	spin_lock(&vcpu->kvm->mmu_lock);
> +
> +	/*
> +	 * Clear all mappings in the shadow page tables and invalidate the stage
> +	 * 1 and 2 TLB entries via kvm_tlb_flush_vmid_ipa().
> +	 */
> +	kvm_nested_s2_clear(vcpu->kvm);
> +
> +	if (mmu->vmid.vmid_gen) {
> +		/*
> +		 * Invalidate the stage 1 and 2 TLB entries for the host OS
> +		 * in a VM only if there is one.
> +		 */
> +		kvm_call_hyp(__kvm_tlb_flush_vmid, mmu);
> +	}
> +
> +	spin_unlock(&vcpu->kvm->mmu_lock);
> +
> +	return true;
> +}
> +
> +static bool handle_vmalls12e1is(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
> +				const struct sys_reg_desc *r)
> +{
> +	u64 vttbr = vcpu_read_sys_reg(vcpu, VTTBR_EL2);
> +	struct kvm_s2_mmu *mmu;
> +
> +	spin_lock(&vcpu->kvm->mmu_lock);
> +
> +	mmu = lookup_s2_mmu(vcpu->kvm, vttbr, HCR_VM);
> +	if (mmu)
> +		kvm_unmap_stage2_range(mmu, 0, kvm_phys_size(vcpu->kvm));
> +
> +	mmu = lookup_s2_mmu(vcpu->kvm, vttbr, 0);
> +	if (mmu)
> +		kvm_unmap_stage2_range(mmu, 0, kvm_phys_size(vcpu->kvm));
> +
> +	spin_unlock(&vcpu->kvm->mmu_lock);
> +
> +	return true;
> +}
> +
> +static bool handle_ipas2e1is(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
> +			     const struct sys_reg_desc *r)
> +{
> +	u64 vttbr = vcpu_read_sys_reg(vcpu, VTTBR_EL2);
> +	u64 vtcr = vcpu_read_sys_reg(vcpu, VTCR_EL2);
> +	struct kvm_s2_mmu *mmu;
> +	u64 base_addr;
> +	int max_size;
> +
> +	/*
> +	 * We drop a number of things from the supplied value:
> +	 *
> +	 * - NS bit: we're non-secure only.
> +	 *
> +	 * - TTL field: We already have the granule size from the
> +	 *   VTCR_EL2.TG0 field, and the level is only relevant to the
> +	 *   guest's S2PT.
> +	 *
> +	 * - IPA[51:48]: We don't support 52bit IPA just yet...
> +	 *
> +	 * And of course, adjust the IPA to be on an actual address.
> +	 */
> +	base_addr = (p->regval & GENMASK_ULL(35, 0)) << 12;
> +
> +	/* Compute the maximum extent of the invalidation */
> +	switch ((vtcr & VTCR_EL2_TG0_MASK)) {
> +	case VTCR_EL2_TG0_4K:
> +		max_size = SZ_1G;
> +		break;
> +	case VTCR_EL2_TG0_16K:
> +		max_size = SZ_32M;
> +		break;
> +	case VTCR_EL2_TG0_64K:
> +		/*
> +		 * No, we do not support 52bit IPA in nested yet. Once
> +		 * we do, this should be 4TB.
> +		 */
> +		/* FIXME: remove the 52bit PA support from the IDregs */
> +		max_size = SZ_512M;
> +		break;
> +	default:
> +		BUG();
> +	}
> +
> +	spin_lock(&vcpu->kvm->mmu_lock);
> +
> +	mmu = lookup_s2_mmu(vcpu->kvm, vttbr, HCR_VM);
> +	if (mmu)
> +		kvm_unmap_stage2_range(mmu, base_addr, max_size);
> +
> +	mmu = lookup_s2_mmu(vcpu->kvm, vttbr, 0);
> +	if (mmu)
> +		kvm_unmap_stage2_range(mmu, base_addr, max_size);
> +
> +	spin_unlock(&vcpu->kvm->mmu_lock);

For lookup_s2_mmu(), sometimes we take kvm->lock, sometimes we take
kvm->mmu_lock instead (without the other being taken). The comment above
lookup_s2_mmu() suggest kvm->lock is the one that should be taken, but
I'm not sure which is the correct one.

Should the code here be something like the following?

	mutex_lock(&vcpu->kvm->lock);

	mmu = lookup_s2_mmu(vcpu->kvm, vttbr, HCR_VM);
	if (!mmu)
		mmu = lookup_s2_mmu(vcpu->kvm, vttbr, 0);

	if (mmu) {
		spin_lock(&vcpu->kvm->mmu_lock);
		kvm_unmap_stage2_range(...);
		spin_unlock(&vcpu->kvm_mmu_lock);
	}
	mutex_unlock(&vcpu->kvm->lock);


Overall, there seems to be other places where lookup_s2_mmu() is called
only with kvm->mmu_lock taken instead of kvm->lock. Should the comment
be fixed to kvm->mmu_lock and callers taking only kvm->lock (i.e.
creation/destruction of s2_mmu) be updated?


> +
> +	return true;
> +}
> +
> +static bool handle_tlbi_el1(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
> +			    const struct sys_reg_desc *r)
> +{
> +	u64 virtual_vttbr = vcpu_read_sys_reg(vcpu, VTTBR_EL2);
> +	struct kvm_s2_mmu *mmu;
> +	int sys_encoding = sys_insn(p->Op0, p->Op1, p->CRn, p->CRm, p->Op2);
> +
> +	/*
> +	 * TODO: Revisit this comment:
> +	 *
> +	 * If we can't find a shadow VMID, it is either the virtual
> +	 * VMID is for the host OS or the nested VM having the virtual
> +	 * VMID is never executed. (Note that we create a shadow VMID
> +	 * when entering a VM.) For the former, we can flush TLB
> +	 * entries belonging to the host OS in a VM. For the latter, we
> +	 * don't have to do anything. Since we can't differentiate
> +	 * between those cases, just do what we can do for the former.
> +	 */
> +
> +	mutex_lock(&vcpu->kvm->lock);
> +	mmu = lookup_s2_mmu(vcpu->kvm, virtual_vttbr, HCR_VM);
> +	if (mmu)
> +		kvm_call_hyp(__kvm_tlb_el1_instr,
> +			     mmu, p->regval, sys_encoding);
> +
> +	mmu = lookup_s2_mmu(vcpu->kvm, virtual_vttbr, 0);
> +	if (mmu)
> +		kvm_call_hyp(__kvm_tlb_el1_instr,
> +			     mmu, p->regval, sys_encoding);
> +	mutex_unlock(&vcpu->kvm->lock);
> +
> +	return true;
> +}
> +
>  /*
>   * AT instruction emulation
>   *
> @@ -2333,12 +2506,40 @@ static struct sys_reg_desc sys_insn_descs[] = {
>  	{ SYS_DESC(SYS_DC_CSW), access_dcsw },
>  	{ SYS_DESC(SYS_DC_CISW), access_dcsw },
>  
> +	SYS_INSN_TO_DESC(TLBI_VMALLE1IS, handle_tlbi_el1, forward_ttlb_traps),
> +	SYS_INSN_TO_DESC(TLBI_VAE1IS, handle_tlbi_el1, forward_ttlb_traps),
> +	SYS_INSN_TO_DESC(TLBI_ASIDE1IS, handle_tlbi_el1, forward_ttlb_traps),
> +	SYS_INSN_TO_DESC(TLBI_VAAE1IS, handle_tlbi_el1, forward_ttlb_traps),
> +	SYS_INSN_TO_DESC(TLBI_VALE1IS, handle_tlbi_el1, forward_ttlb_traps),
> +	SYS_INSN_TO_DESC(TLBI_VAALE1IS, handle_tlbi_el1, forward_ttlb_traps),
> +	SYS_INSN_TO_DESC(TLBI_VMALLE1, handle_tlbi_el1, forward_ttlb_traps),
> +	SYS_INSN_TO_DESC(TLBI_VAE1, handle_tlbi_el1, forward_ttlb_traps),
> +	SYS_INSN_TO_DESC(TLBI_ASIDE1, handle_tlbi_el1, forward_ttlb_traps),
> +	SYS_INSN_TO_DESC(TLBI_VAAE1, handle_tlbi_el1, forward_ttlb_traps),
> +	SYS_INSN_TO_DESC(TLBI_VALE1, handle_tlbi_el1, forward_ttlb_traps),
> +	SYS_INSN_TO_DESC(TLBI_VAALE1, handle_tlbi_el1, forward_ttlb_traps),
> +
>  	SYS_INSN_TO_DESC(AT_S1E2R, handle_s1e2, forward_nv_traps),
>  	SYS_INSN_TO_DESC(AT_S1E2W, handle_s1e2, forward_nv_traps),
>  	SYS_INSN_TO_DESC(AT_S12E1R, handle_s12r, forward_nv_traps),
>  	SYS_INSN_TO_DESC(AT_S12E1W, handle_s12w, forward_nv_traps),
>  	SYS_INSN_TO_DESC(AT_S12E0R, handle_s12r, forward_nv_traps),
>  	SYS_INSN_TO_DESC(AT_S12E0W, handle_s12w, forward_nv_traps),
> +
> +	SYS_INSN_TO_DESC(TLBI_IPAS2E1IS, handle_ipas2e1is, forward_nv_traps),
> +	SYS_INSN_TO_DESC(TLBI_IPAS2LE1IS, handle_ipas2e1is, forward_nv_traps),
> +	SYS_INSN_TO_DESC(TLBI_ALLE2IS, handle_alle2is, forward_nv_traps),
> +	SYS_INSN_TO_DESC(TLBI_VAE2IS, handle_vae2, forward_nv_traps),
> +	SYS_INSN_TO_DESC(TLBI_ALLE1IS, handle_alle1is, forward_nv_traps),
> +	SYS_INSN_TO_DESC(TLBI_VALE2IS, handle_vae2, forward_nv_traps),
> +	SYS_INSN_TO_DESC(TLBI_VMALLS12E1IS, handle_vmalls12e1is, forward_nv_traps),
> +	SYS_INSN_TO_DESC(TLBI_IPAS2E1, handle_ipas2e1is, forward_nv_traps),
> +	SYS_INSN_TO_DESC(TLBI_IPAS2LE1, handle_ipas2e1is, forward_nv_traps),
> +	SYS_INSN_TO_DESC(TLBI_ALLE2, handle_alle2is, forward_nv_traps),
> +	SYS_INSN_TO_DESC(TLBI_VAE2, handle_vae2, forward_nv_traps),
> +	SYS_INSN_TO_DESC(TLBI_ALLE1, handle_alle1is, forward_nv_traps),
> +	SYS_INSN_TO_DESC(TLBI_VALE2, handle_vae2, forward_nv_traps),
> +	SYS_INSN_TO_DESC(TLBI_VMALLS12E1, handle_vmalls12e1is, forward_nv_traps),
>  };
>  
>  static bool trap_dbgidr(struct kvm_vcpu *vcpu,
> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> index 6a7cba077bce..0ea79e543b29 100644
> --- a/virt/kvm/arm/mmu.c
> +++ b/virt/kvm/arm/mmu.c
> @@ -51,7 +51,23 @@ static bool memslot_is_logging(struct kvm_memory_slot *memslot)
>   */
>  void kvm_flush_remote_tlbs(struct kvm *kvm)
>  {
> -	kvm_call_hyp(__kvm_tlb_flush_vmid, &kvm->arch.mmu);
> +	struct kvm_s2_mmu *mmu = &kvm->arch.mmu;
> +
> +	if (mmu == &kvm->arch.mmu) {
> +		/*
> +		 * For a normal (i.e. non-nested) guest, flush entries for the
> +		 * given VMID *
> +		 */
> +		kvm_call_hyp(__kvm_tlb_flush_vmid, mmu);
> +	} else {
> +		/*
> +		 * When supporting nested virtualization, we can have multiple
> +		 * VMIDs in play for each VCPU in the VM, so it's really not
> +		 * worth it to try to quiesce the system and flush all the
> +		 * VMIDs that may be in use, instead just nuke the whole thing.
> +		 */
> +		kvm_call_hyp(__kvm_flush_vm_context);
> +	}
>  }
>  
>  static void kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu, phys_addr_t ipa)
> 

Cheers,

-- 
Julien Thierry

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH 23/59] KVM: arm64: nv: Respect virtual HCR_EL2.TWX setting
  2019-06-25 14:19   ` Julien Thierry
@ 2019-07-02 12:54     ` Alexandru Elisei
  2019-07-03 14:18     ` Marc Zyngier
  1 sibling, 0 replies; 176+ messages in thread
From: Alexandru Elisei @ 2019-07-02 12:54 UTC (permalink / raw)
  To: Julien Thierry, Marc Zyngier, linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	James Morse, Suzuki K Poulose


On 6/25/19 3:19 PM, Julien Thierry wrote:
>
> On 06/21/2019 10:38 AM, Marc Zyngier wrote:
>> From: Jintack Lim <jintack.lim@linaro.org>
>>
>> Forward exceptions due to WFI or WFE instructions to the virtual EL2 if
>> they are not coming from the virtual EL2 and virtual HCR_EL2.TWX is set.
>>
>> Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>> ---
>>  arch/arm64/include/asm/kvm_nested.h |  2 ++
>>  arch/arm64/kvm/Makefile             |  1 +
>>  arch/arm64/kvm/handle_exit.c        | 13 +++++++++-
>>  arch/arm64/kvm/nested.c             | 39 +++++++++++++++++++++++++++++
>>  4 files changed, 54 insertions(+), 1 deletion(-)
>>  create mode 100644 arch/arm64/kvm/nested.c
>>
>> diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
>> index 8a3d121a0b42..645e5e11b749 100644
>> --- a/arch/arm64/include/asm/kvm_nested.h
>> +++ b/arch/arm64/include/asm/kvm_nested.h
>> @@ -10,4 +10,6 @@ static inline bool nested_virt_in_use(const struct kvm_vcpu *vcpu)
>>  		test_bit(KVM_ARM_VCPU_NESTED_VIRT, vcpu->arch.features);
>>  }
>>  
>> +int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe);
>> +
>>  #endif /* __ARM64_KVM_NESTED_H */
>> diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
>> index 9e450aea7db6..f11bd8b0d837 100644
>> --- a/arch/arm64/kvm/Makefile
>> +++ b/arch/arm64/kvm/Makefile
>> @@ -36,4 +36,5 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/irqchip.o
>>  kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/arch_timer.o
>>  kvm-$(CONFIG_KVM_ARM_PMU) += $(KVM)/arm/pmu.o
>>  
>> +kvm-$(CONFIG_KVM_ARM_HOST) += nested.o
>>  kvm-$(CONFIG_KVM_ARM_HOST) += emulate-nested.o
>> diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
>> index e348c15c81bc..ddba212fd6ec 100644
>> --- a/arch/arm64/kvm/handle_exit.c
>> +++ b/arch/arm64/kvm/handle_exit.c
>> @@ -127,7 +127,18 @@ static int handle_no_fpsimd(struct kvm_vcpu *vcpu, struct kvm_run *run)
>>   */
>>  static int kvm_handle_wfx(struct kvm_vcpu *vcpu, struct kvm_run *run)
>>  {
>> -	if (kvm_vcpu_get_hsr(vcpu) & ESR_ELx_WFx_ISS_WFE) {
>> +	bool is_wfe = !!(kvm_vcpu_get_hsr(vcpu) & ESR_ELx_WFx_ISS_WFE);
>> +
>> +	if (nested_virt_in_use(vcpu)) {
>> +		int ret = handle_wfx_nested(vcpu, is_wfe);
>> +
>> +		if (ret < 0 && ret != -EINVAL)
>> +			return ret;
>> +		else if (ret >= 0)
>> +			return ret;
> I think you can simplify this:
>
> 	if (ret != -EINVAL)
> 		return ret;

And handle_wfx_nested can only return -EINVAL or 1 (from
kvm_inject_nested_sync), so the condition is not only complicated, but also
misleading.
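
The whole hunk could then read (sketch):

	if (nested_virt_in_use(vcpu)) {
		int ret = handle_wfx_nested(vcpu, is_wfe);

		/* handle_wfx_nested() returns -EINVAL when the trap is
		 * not forwarded to the virtual EL2. */
		if (ret != -EINVAL)
			return ret;
	}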

Thanks,

Alex

>
> Cheers,
>
> Julien
>
>
>> +	}
>> +
>> +	if (is_wfe) {
>>  		trace_kvm_wfx_arm64(*vcpu_pc(vcpu), true);
>>  		vcpu->stat.wfe_exit_stat++;
>>  		kvm_vcpu_on_spin(vcpu, vcpu_mode_priv(vcpu));
>> diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
>> new file mode 100644
>> index 000000000000..3872e3cf1691
>> --- /dev/null
>> +++ b/arch/arm64/kvm/nested.c
>> @@ -0,0 +1,39 @@
>> +/*
>> + * Copyright (C) 2017 - Columbia University and Linaro Ltd.
>> + * Author: Jintack Lim <jintack.lim@linaro.org>
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License version 2 as
>> + * published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
>> + */
>> +
>> +#include <linux/kvm.h>
>> +#include <linux/kvm_host.h>
>> +
>> +#include <asm/kvm_emulate.h>
>> +
>> +/*
>> + * Inject wfx to the virtual EL2 if this is not from the virtual EL2 and
>> + * the virtual HCR_EL2.TWX is set. Otherwise, let the host hypervisor
>> + * handle this.
>> + */
>> +int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe)
>> +{
>> +	u64 hcr_el2 = __vcpu_sys_reg(vcpu, HCR_EL2);
>> +
>> +	if (vcpu_mode_el2(vcpu))
>> +		return -EINVAL;
>> +
>> +	if ((is_wfe && (hcr_el2 & HCR_TWE)) || (!is_wfe && (hcr_el2 & HCR_TWI)))
>> +		return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
>> +
>> +	return -EINVAL;
>> +}
>>

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH 28/59] KVM: arm64: nv: Respect the virtual HCR_EL2.NV1 bit setting
  2019-06-21  9:38 ` [PATCH 28/59] KVM: arm64: nv: Respect the virtual HCR_EL2.NV1 bit setting Marc Zyngier
  2019-06-26  7:23   ` Julien Thierry
@ 2019-07-02 16:32   ` Alexandru Elisei
  2019-07-03  9:10     ` Alexandru Elisei
  1 sibling, 1 reply; 176+ messages in thread
From: Alexandru Elisei @ 2019-07-02 16:32 UTC (permalink / raw)
  To: Marc Zyngier, linux-arm-kernel, kvmarm, kvm; +Cc: Andre Przywara, Dave Martin


On 6/21/19 10:38 AM, Marc Zyngier wrote:
> From: Jintack Lim <jintack@cs.columbia.edu>
>
> Forward ELR_EL1, SPSR_EL1 and VBAR_EL1 traps to the virtual EL2 if the
> virtual HCR_EL2.NV bit is set.
HCR_EL2.NV1?
>
> This is for recursive nested virtualization.
>
> Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  arch/arm64/include/asm/kvm_arm.h |  1 +
>  arch/arm64/kvm/sys_regs.c        | 19 +++++++++++++++++--
>  2 files changed, 18 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
> index d21486274eeb..55f4525c112c 100644
> --- a/arch/arm64/include/asm/kvm_arm.h
> +++ b/arch/arm64/include/asm/kvm_arm.h
> @@ -24,6 +24,7 @@
>  
>  /* Hyp Configuration Register (HCR) bits */
>  #define HCR_FWB		(UL(1) << 46)
> +#define HCR_NV1		(UL(1) << 43)
>  #define HCR_NV		(UL(1) << 42)
>  #define HCR_API		(UL(1) << 41)
>  #define HCR_APK		(UL(1) << 40)
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 0f74b9277a86..beadebcfc888 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -473,8 +473,10 @@ static bool access_vm_reg(struct kvm_vcpu *vcpu,
>  	if (el12_reg(p) && forward_nv_traps(vcpu))
>  		return false;
>  
> -	if (!el12_reg(p) && forward_vm_traps(vcpu, p))
> -		return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
> +	if (!el12_reg(p) && forward_vm_traps(vcpu, p)) {
> +		kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
> +		return false;
> +	}
>  
>  	BUG_ON(!vcpu_mode_el2(vcpu) && !p->is_write);
>  
> @@ -1643,6 +1645,13 @@ static bool access_sp_el1(struct kvm_vcpu *vcpu,
>  	return true;
>  }
>  
> +
> +/* This function is to support the recursive nested virtualization */
> +static bool forward_nv1_traps(struct kvm_vcpu *vcpu, struct sys_reg_params *p)
Why the struct sys_reg_params *p argument? It isn't used by the function.
> +{
> +	return forward_traps(vcpu, HCR_NV1);
> +}
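
It could simply mirror forward_ttlb_traps(), with the call sites
dropping the unused argument (sketch):

	static bool forward_nv1_traps(struct kvm_vcpu *vcpu)
	{
		return forward_traps(vcpu, HCR_NV1);
	}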
> +
>  static bool access_elr(struct kvm_vcpu *vcpu,
>  		       struct sys_reg_params *p,
>  		       const struct sys_reg_desc *r)
> @@ -1650,6 +1659,9 @@ static bool access_elr(struct kvm_vcpu *vcpu,
>  	if (el12_reg(p) && forward_nv_traps(vcpu))
>  		return false;
>  
> +	if (!el12_reg(p) && forward_nv1_traps(vcpu, p))
> +		return false;
> +
>  	if (p->is_write)
>  		vcpu->arch.ctxt.gp_regs.elr_el1 = p->regval;
>  	else
> @@ -1665,6 +1677,9 @@ static bool access_spsr(struct kvm_vcpu *vcpu,
>  	if (el12_reg(p) && forward_nv_traps(vcpu))
>  		return false;
>  
> +	if (!el12_reg(p) && forward_nv1_traps(vcpu, p))
> +		return false;
> +
>  	if (p->is_write)
>  		vcpu->arch.ctxt.gp_regs.spsr[KVM_SPSR_EL1] = p->regval;
>  	else

The commit message mentions VBAR_EL1, but there's no change related to it.
Perhaps you're missing this (build tested only):

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index bd21f0f45a86..082dc31ff533 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -401,6 +401,12 @@ static bool el12_reg(struct sys_reg_params *p)
        return (p->Op1 == 5);
 }
 
+/* This function is to support the recursive nested virtualization */
+static bool forward_nv1_traps(struct kvm_vcpu *vcpu, struct sys_reg_params *p)
+{
+       return forward_traps(vcpu, HCR_NV1);
+}
+
 static bool access_rw(struct kvm_vcpu *vcpu,
                      struct sys_reg_params *p,
                      const struct sys_reg_desc *r)
@@ -408,6 +414,10 @@ static bool access_rw(struct kvm_vcpu *vcpu,
        if (el12_reg(p) && forward_nv_traps(vcpu))
                return false;
 
+       if (sys_reg(p->Op0, p->Op1, p->CRn, p->CRm, p->Op2) == SYS_VBAR_EL1 &&
+           forward_nv1_traps(vcpu, p))
+               return false;
+
        if (p->is_write)
                vcpu_write_sys_reg(vcpu, p->regval, r->reg);
        else
@@ -1794,12 +1804,6 @@ static bool forward_ttlb_traps(struct kvm_vcpu *vcpu)
        return forward_traps(vcpu, HCR_TTLB);
 }
 
-/* This function is to support the recursive nested virtualization */
-static bool forward_nv1_traps(struct kvm_vcpu *vcpu, struct sys_reg_params *p)
-{
-       return forward_traps(vcpu, HCR_NV1);
-}
-
 static bool access_elr(struct kvm_vcpu *vcpu,
                       struct sys_reg_params *p,
                       const struct sys_reg_desc *r)


^ permalink raw reply related	[flat|nested] 176+ messages in thread

* Re: [PATCH 28/59] KVM: arm64: nv: Respect the virtual HCR_EL2.NV1 bit setting
  2019-07-02 16:32   ` Alexandru Elisei
@ 2019-07-03  9:10     ` Alexandru Elisei
  0 siblings, 0 replies; 176+ messages in thread
From: Alexandru Elisei @ 2019-07-03  9:10 UTC (permalink / raw)
  To: Marc Zyngier, linux-arm-kernel, kvmarm, kvm; +Cc: Andre Przywara, Dave Martin


On 7/2/19 5:32 PM, Alexandru Elisei wrote:
> On 6/21/19 10:38 AM, Marc Zyngier wrote:
>> From: Jintack Lim <jintack@cs.columbia.edu>
>>
>> Forward ELR_EL1, SPSR_EL1 and VBAR_EL1 traps to the virtual EL2 if the
>> virtual HCR_EL2.NV bit is set.
> HCR_EL2.NV1?
>> This is for recursive nested virtualization.
>>
>> Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>> ---
>>  arch/arm64/include/asm/kvm_arm.h |  1 +
>>  arch/arm64/kvm/sys_regs.c        | 19 +++++++++++++++++--
>>  2 files changed, 18 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
>> index d21486274eeb..55f4525c112c 100644
>> --- a/arch/arm64/include/asm/kvm_arm.h
>> +++ b/arch/arm64/include/asm/kvm_arm.h
>> @@ -24,6 +24,7 @@
>>  
>>  /* Hyp Configuration Register (HCR) bits */
>>  #define HCR_FWB		(UL(1) << 46)
>> +#define HCR_NV1		(UL(1) << 43)
>>  #define HCR_NV		(UL(1) << 42)
>>  #define HCR_API		(UL(1) << 41)
>>  #define HCR_APK		(UL(1) << 40)
>> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
>> index 0f74b9277a86..beadebcfc888 100644
>> --- a/arch/arm64/kvm/sys_regs.c
>> +++ b/arch/arm64/kvm/sys_regs.c
>> @@ -473,8 +473,10 @@ static bool access_vm_reg(struct kvm_vcpu *vcpu,
>>  	if (el12_reg(p) && forward_nv_traps(vcpu))
>>  		return false;
>>  
>> -	if (!el12_reg(p) && forward_vm_traps(vcpu, p))
>> -		return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
>> +	if (!el12_reg(p) && forward_vm_traps(vcpu, p)) {
>> +		kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
>> +		return false;
>> +	}
>>  
>>  	BUG_ON(!vcpu_mode_el2(vcpu) && !p->is_write);
>>  
>> @@ -1643,6 +1645,13 @@ static bool access_sp_el1(struct kvm_vcpu *vcpu,
>>  	return true;
>>  }
>>  
>> +
>> +/* This function is to support the recursive nested virtualization */
>> +static bool forward_nv1_traps(struct kvm_vcpu *vcpu, struct sys_reg_params *p)
> Why the struct sys_reg_params *p argument? It isn't used by the function.
>> +{
>> +	return forward_traps(vcpu, HCR_NV1);
>> +}
>> +
>>  static bool access_elr(struct kvm_vcpu *vcpu,
>>  		       struct sys_reg_params *p,
>>  		       const struct sys_reg_desc *r)
>> @@ -1650,6 +1659,9 @@ static bool access_elr(struct kvm_vcpu *vcpu,
>>  	if (el12_reg(p) && forward_nv_traps(vcpu))
>>  		return false;
>>  
>> +	if (!el12_reg(p) && forward_nv1_traps(vcpu, p))
>> +		return false;
>> +
>>  	if (p->is_write)
>>  		vcpu->arch.ctxt.gp_regs.elr_el1 = p->regval;
>>  	else
>> @@ -1665,6 +1677,9 @@ static bool access_spsr(struct kvm_vcpu *vcpu,
>>  	if (el12_reg(p) && forward_nv_traps(vcpu))
>>  		return false;
>>  
>> +	if (!el12_reg(p) && forward_nv1_traps(vcpu, p))
>> +		return false;
>> +
>>  	if (p->is_write)
>>  		vcpu->arch.ctxt.gp_regs.spsr[KVM_SPSR_EL1] = p->regval;
>>  	else
> The commit message mentions VBAR_EL1, but there's no change related to it.
> Perhaps you're missing this (build tested only):
>
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index bd21f0f45a86..082dc31ff533 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -401,6 +401,12 @@ static bool el12_reg(struct sys_reg_params *p)
>         return (p->Op1 == 5);
>  }
>  
> +/* This function is to support the recursive nested virtualization */
> +static bool forward_nv1_traps(struct kvm_vcpu *vcpu, struct sys_reg_params *p)
> +{
> +       return forward_traps(vcpu, HCR_NV1);
> +}
> +
>  static bool access_rw(struct kvm_vcpu *vcpu,
>                       struct sys_reg_params *p,
>                       const struct sys_reg_desc *r)
> @@ -408,6 +414,10 @@ static bool access_rw(struct kvm_vcpu *vcpu,
>         if (el12_reg(p) && forward_nv_traps(vcpu))
>                 return false;
>  
> +       if (sys_reg(p->Op0, p->Op1, p->CRn, p->CRm, p->Op2) == SYS_VBAR_EL1 &&
> +           forward_nv1_traps(vcpu, p))

Ahem... this is probably better, since by this point the sys_reg_desc has
already been matched against the trapped encoding and r->reg holds the
decoded register index:

+       if (r->reg == VBAR_EL1 && forward_nv1_traps(vcpu, p))

> +               return false;
> +
>         if (p->is_write)
>                 vcpu_write_sys_reg(vcpu, p->regval, r->reg);
>         else
> @@ -1794,12 +1804,6 @@ static bool forward_ttlb_traps(struct kvm_vcpu *vcpu)
>         return forward_traps(vcpu, HCR_TTLB);
>  }
>  
> -/* This function is to support the recursive nested virtualization */
> -static bool forward_nv1_traps(struct kvm_vcpu *vcpu, struct sys_reg_params *p)
> -{
> -       return forward_traps(vcpu, HCR_NV1);
> -}
> -
>  static bool access_elr(struct kvm_vcpu *vcpu,
>                        struct sys_reg_params *p,
>                        const struct sys_reg_desc *r)
>


* Re: [PATCH 29/59] KVM: arm64: nv: Emulate EL12 register accesses from the virtual EL2
  2019-06-21  9:38 ` [PATCH 29/59] KVM: arm64: nv: Emulate EL12 register accesses from the virtual EL2 Marc Zyngier
@ 2019-07-03  9:16   ` Alexandru Elisei
  0 siblings, 0 replies; 176+ messages in thread
From: Alexandru Elisei @ 2019-07-03  9:16 UTC (permalink / raw)
  To: Marc Zyngier, linux-arm-kernel, kvmarm, kvm; +Cc: Andre Przywara, Dave Martin


On 6/21/19 10:38 AM, Marc Zyngier wrote:
> From: Jintack Lim <jintack.lim@linaro.org>
>
> With HCR_EL2.NV bit set, accesses to EL12 registers in the virtual EL2
> trap to EL2. Handle those traps just like we do for EL1 registers.
>
> One exception is CNTKCTL_EL12. We don't trap on CNTKCTL_EL1 for non-VHE
> virtual EL2 because we don't have to. However, accessing CNTKCTL_EL12
> will trap since it's one of the EL12 registers controlled by the
> HCR_EL2.NV bit.  Therefore, add a handler for it and don't treat it as a
> non-trapping register when preparing a shadow context.
>
> Move EL12 system register macros to a common place to reuse them.

I see no change related to moving macros, is this a left over from a rebase?

>
> Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  arch/arm64/kvm/sys_regs.c | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
>
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index beadebcfc888..34f1b79f7856 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -2039,6 +2039,23 @@ static const struct sys_reg_desc sys_reg_descs[] = {
>  	{ SYS_DESC(SYS_CNTVOFF_EL2), access_rw, reset_val, CNTVOFF_EL2, 0 },
>  	{ SYS_DESC(SYS_CNTHCTL_EL2), access_rw, reset_val, CNTHCTL_EL2, 0 },
>  
> +	{ SYS_DESC(SYS_SCTLR_EL12), access_vm_reg, reset_val, SCTLR_EL1, 0x00C50078 },
> +	{ SYS_DESC(SYS_CPACR_EL12), access_rw, reset_val, CPACR_EL1, 0 },
> +	{ SYS_DESC(SYS_TTBR0_EL12), access_vm_reg, reset_unknown, TTBR0_EL1 },
> +	{ SYS_DESC(SYS_TTBR1_EL12), access_vm_reg, reset_unknown, TTBR1_EL1 },
> +	{ SYS_DESC(SYS_TCR_EL12), access_vm_reg, reset_val, TCR_EL1, 0 },
> +	{ SYS_DESC(SYS_SPSR_EL12), access_spsr},
> +	{ SYS_DESC(SYS_ELR_EL12), access_elr},
> +	{ SYS_DESC(SYS_AFSR0_EL12), access_vm_reg, reset_unknown, AFSR0_EL1 },
> +	{ SYS_DESC(SYS_AFSR1_EL12), access_vm_reg, reset_unknown, AFSR1_EL1 },
> +	{ SYS_DESC(SYS_ESR_EL12), access_vm_reg, reset_unknown, ESR_EL1 },
> +	{ SYS_DESC(SYS_FAR_EL12), access_vm_reg, reset_unknown, FAR_EL1 },
> +	{ SYS_DESC(SYS_MAIR_EL12), access_vm_reg, reset_unknown, MAIR_EL1 },
> +	{ SYS_DESC(SYS_AMAIR_EL12), access_vm_reg, reset_amair_el1, AMAIR_EL1 },
> +	{ SYS_DESC(SYS_VBAR_EL12), access_rw, reset_val, VBAR_EL1, 0 },
> +	{ SYS_DESC(SYS_CONTEXTIDR_EL12), access_vm_reg, reset_val, CONTEXTIDR_EL1, 0 },
> +	{ SYS_DESC(SYS_CNTKCTL_EL12), access_rw, reset_val, CNTKCTL_EL1, 0 },
> +
>  	{ SYS_DESC(SYS_SP_EL2), NULL, reset_unknown, SP_EL2 },
>  };
>  


* Re: [PATCH 06/59] KVM: arm64: nv: Allow userspace to set PSR_MODE_EL2x
  2019-06-24 12:48       ` Dave Martin
@ 2019-07-03  9:21         ` Marc Zyngier
  2019-07-04 10:00           ` Dave Martin
  0 siblings, 1 reply; 176+ messages in thread
From: Marc Zyngier @ 2019-07-03  9:21 UTC (permalink / raw)
  To: Dave Martin; +Cc: Julien Thierry, linux-arm-kernel, kvmarm, kvm, Andre Przywara

On 24/06/2019 13:48, Dave Martin wrote:
> On Fri, Jun 21, 2019 at 02:50:08PM +0100, Marc Zyngier wrote:
>> On 21/06/2019 14:24, Julien Thierry wrote:
>>>
>>>
>>> On 21/06/2019 10:37, Marc Zyngier wrote:
>>>> From: Christoffer Dall <christoffer.dall@linaro.org>
>>>>
>>>> We were not allowing userspace to set a more privileged mode for the VCPU
>>>> than EL1, but we should allow this when nested virtualization is enabled
>>>> for the VCPU.
>>>>
>>>> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
>>>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>>>> ---
>>>>  arch/arm64/kvm/guest.c | 6 ++++++
>>>>  1 file changed, 6 insertions(+)
>>>>
>>>> diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
>>>> index 3ae2f82fca46..4c35b5d51e21 100644
>>>> --- a/arch/arm64/kvm/guest.c
>>>> +++ b/arch/arm64/kvm/guest.c
>>>> @@ -37,6 +37,7 @@
>>>>  #include <asm/kvm_emulate.h>
>>>>  #include <asm/kvm_coproc.h>
>>>>  #include <asm/kvm_host.h>
>>>> +#include <asm/kvm_nested.h>
>>>>  #include <asm/sigcontext.h>
>>>>  
>>>>  #include "trace.h"
>>>> @@ -194,6 +195,11 @@ static int set_core_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
>>>>  			if (vcpu_el1_is_32bit(vcpu))
>>>>  				return -EINVAL;
>>>>  			break;
>>>> +		case PSR_MODE_EL2h:
>>>> +		case PSR_MODE_EL2t:
>>>> +			if (vcpu_el1_is_32bit(vcpu) || !nested_virt_in_use(vcpu))
>>>
>>> This condition reads a bit weirdly. Why do we care about anything else
>>> than !nested_virt_in_use() ?
>>>
>>> If nested virt is not in use then obviously we return the error.
>>>
>>> If nested virt is in use then why do we care about EL1? Or should this
>>> test read as "highest_el_is_32bit" ?
>>
>> There are multiple things at play here:
>>
>> - MODE_EL2x is not a valid 32bit mode
>> - The architecture forbids nested virt with 32bit EL2
>>
>> The code above is a simplification of these two conditions. But
>> certainly we can do a bit better, as kvm_reset_cpu() doesn't really
>> check that we don't create a vcpu with both 32bit+NV. These two bits
>> should really be exclusive.
> 
> This code is safe for now because KVM_VCPU_MAX_FEATURES <=
> KVM_ARM_VCPU_NESTED_VIRT, right, i.e., nested_virt_in_use() cannot be
> true?
> 
> This makes me a little uneasy, but I think that's paranoia talking: we
> want bisectability, but no sane person should ship with just half of this
> series.  So I guess this is fine.
> 
> We could stick something like
> 
> 	if (WARN_ON(...))
> 		return false;
> 
> in nested_virt_in_use() and then remove it in the final patch, but it's
> probably overkill.

The only case I can imagine something going wrong is if this series is
only applied halfway, and another series bumps the maximum feature to
something that includes NV. I guess your suggestion would solve that.
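Something like this, perhaps (a hypothetical reading of your suggestion;
the exact condition isn't anywhere in the series):

	static inline bool nested_virt_in_use(const struct kvm_vcpu *vcpu)
	{
		/* Safety net until the full series is in: nothing should
		 * have been able to set the feature bit yet. Remove in
		 * the final patch. */
		if (WARN_ON(test_bit(KVM_ARM_VCPU_NESTED_VIRT,
				     vcpu->arch.features)))
			return false;

		return cpus_have_const_cap(ARM64_HAS_NESTED_VIRT) &&
			test_bit(KVM_ARM_VCPU_NESTED_VIRT, vcpu->arch.features);
	}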

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...


* Re: [PATCH 02/59] KVM: arm64: Move __load_guest_stage2 to kvm_mmu.h
  2019-06-24 11:19   ` Dave Martin
@ 2019-07-03  9:30     ` Marc Zyngier
  2019-07-03 16:13       ` Dave Martin
  0 siblings, 1 reply; 176+ messages in thread
From: Marc Zyngier @ 2019-07-03  9:30 UTC (permalink / raw)
  To: Dave Martin
  Cc: linux-arm-kernel, kvmarm, kvm, Julien Thierry, Andre Przywara,
	Suzuki K Poulose, Christoffer Dall, James Morse, Jintack Lim

On 24/06/2019 12:19, Dave Martin wrote:
> On Fri, Jun 21, 2019 at 10:37:46AM +0100, Marc Zyngier wrote:
>> Having __load_guest_stage2 in kvm_hyp.h is quickly going to trigger
>> a circular include problem. In order to avoid this, let's move
>> it to kvm_mmu.h, where it will be a better fit anyway.
>>
>> In the process, drop the __hyp_text annotation, which doesn't help
>> as the function is marked as __always_inline.
> 
> Does GCC always inline things marked __always_inline?
> 
> I seem to remember some gotchas in this area, but I may be being
> paranoid.

Yes, this is a strong guarantee. Things like static keys rely on that,
for example.
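To illustrate (a contrived sketch, the helper name is made up): with a
plain "inline", the compiler may still emit an out-of-line copy, which
would then need the __hyp_text section annotation to be reachable at EL2.
__always_inline guarantees the body is expanded into its callers, so it
simply inherits their section:

	static __always_inline void __sketch_load_vtcr(struct kvm *kvm)
	{
		/* no standalone copy of this helper is ever emitted */
		write_sysreg(kvm->arch.vtcr, vtcr_el2);
	}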

> 
> If this is still only called from hyp, I'd be tempted to keep the
> __hyp_text annotation just to be on the safe side.

The trouble with that is that it re-introduces the circular dependency with
kvm_hyp.h that this patch is trying to break...

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...


* Re: [PATCH 04/59] KVM: arm64: nv: Introduce nested virtualization VCPU feature
  2019-06-24 11:28   ` Dave Martin
@ 2019-07-03 11:53     ` Marc Zyngier
  2019-07-03 16:27       ` Dave Martin
  0 siblings, 1 reply; 176+ messages in thread
From: Marc Zyngier @ 2019-07-03 11:53 UTC (permalink / raw)
  To: Dave Martin
  Cc: linux-arm-kernel, kvmarm, kvm, Julien Thierry, Andre Przywara,
	Suzuki K Poulose, Christoffer Dall, James Morse, Jintack Lim

On 24/06/2019 12:28, Dave Martin wrote:
> On Fri, Jun 21, 2019 at 10:37:48AM +0100, Marc Zyngier wrote:
>> From: Christoffer Dall <christoffer.dall@arm.com>
>>
>> Introduce the feature bit and a primitive that checks if the feature is
>> set behind a static key check based on the cpus_have_const_cap check.
>>
>> Checking nested_virt_in_use() on systems without nested virt enabled
>> should have negligible overhead.
>>
>> We don't yet allow userspace to actually set this feature.
>>
>> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>> ---
>>  arch/arm/include/asm/kvm_nested.h   |  9 +++++++++
>>  arch/arm64/include/asm/kvm_nested.h | 13 +++++++++++++
>>  arch/arm64/include/uapi/asm/kvm.h   |  1 +
>>  3 files changed, 23 insertions(+)
>>  create mode 100644 arch/arm/include/asm/kvm_nested.h
>>  create mode 100644 arch/arm64/include/asm/kvm_nested.h
>>
>> diff --git a/arch/arm/include/asm/kvm_nested.h b/arch/arm/include/asm/kvm_nested.h
>> new file mode 100644
>> index 000000000000..124ff6445f8f
>> --- /dev/null
>> +++ b/arch/arm/include/asm/kvm_nested.h
>> @@ -0,0 +1,9 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +#ifndef __ARM_KVM_NESTED_H
>> +#define __ARM_KVM_NESTED_H
>> +
>> +#include <linux/kvm_host.h>
>> +
>> +static inline bool nested_virt_in_use(const struct kvm_vcpu *vcpu) { return false; }
>> +
>> +#endif /* __ARM_KVM_NESTED_H */
>> diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
>> new file mode 100644
>> index 000000000000..8a3d121a0b42
>> --- /dev/null
>> +++ b/arch/arm64/include/asm/kvm_nested.h
>> @@ -0,0 +1,13 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +#ifndef __ARM64_KVM_NESTED_H
>> +#define __ARM64_KVM_NESTED_H
>> +
>> +#include <linux/kvm_host.h>
>> +
>> +static inline bool nested_virt_in_use(const struct kvm_vcpu *vcpu)
>> +{
>> +	return cpus_have_const_cap(ARM64_HAS_NESTED_VIRT) &&
>> +		test_bit(KVM_ARM_VCPU_NESTED_VIRT, vcpu->arch.features);
>> +}
>> +
>> +#endif /* __ARM64_KVM_NESTED_H */
>> diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
>> index d819a3e8b552..563e2a8bae93 100644
>> --- a/arch/arm64/include/uapi/asm/kvm.h
>> +++ b/arch/arm64/include/uapi/asm/kvm.h
>> @@ -106,6 +106,7 @@ struct kvm_regs {
>>  #define KVM_ARM_VCPU_SVE		4 /* enable SVE for this CPU */
>>  #define KVM_ARM_VCPU_PTRAUTH_ADDRESS	5 /* VCPU uses address authentication */
>>  #define KVM_ARM_VCPU_PTRAUTH_GENERIC	6 /* VCPU uses generic authentication */
>> +#define KVM_ARM_VCPU_NESTED_VIRT	7 /* Support nested virtualization */
> 
> This seems weirdly named:
> 
> Isn't the feature we're exposing here really EL2?  In that case, the
> feature the guest gets with this flag enabled is plain virtualisation,
> possibly with the option to nest further.
> 
> Does the guest also get nested virt (i.e., recursively nested virt from
> the host's PoV) as a side effect, or would require an explicit extra
> flag?

So far, there is no extra flag to describe further nesting, and it
directly comes from EL2 being emulated. I don't mind renaming this to
KVM_ARM_VCPU_HAS_EL2, or something similar... Whether we want userspace
to control the exposure of the nesting capability (i.e. EL2 with
ARMv8.3-NV) is another question.
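(If we went down that road, the rename itself would be mechanical, e.g.
something like:

	#define KVM_ARM_VCPU_HAS_EL2	7 /* VCPU runs with a virtual EL2 */

with nested_virt_in_use() -- or whatever it becomes -- testing that bit.)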

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...


* Re: [PATCH 04/59] KVM: arm64: nv: Introduce nested virtualization VCPU feature
  2019-06-24 11:43   ` Dave Martin
@ 2019-07-03 11:56     ` Marc Zyngier
  2019-07-03 16:24       ` Dave Martin
  0 siblings, 1 reply; 176+ messages in thread
From: Marc Zyngier @ 2019-07-03 11:56 UTC (permalink / raw)
  To: Dave Martin
  Cc: linux-arm-kernel, kvmarm, kvm, Julien Thierry, Andre Przywara,
	Suzuki K Poulose, Christoffer Dall, James Morse, Jintack Lim

On 24/06/2019 12:43, Dave Martin wrote:
> On Fri, Jun 21, 2019 at 10:37:48AM +0100, Marc Zyngier wrote:
>> From: Christoffer Dall <christoffer.dall@arm.com>
>>
>> Introduce the feature bit and a primitive that checks if the feature is
>> set behind a static key check based on the cpus_have_const_cap check.
>>
>> Checking nested_virt_in_use() on systems without nested virt enabled
>> should have negligible overhead.
>>
>> We don't yet allow userspace to actually set this feature.
>>
>> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>> ---
> 
> [...]
> 
>> diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
>> new file mode 100644
>> index 000000000000..8a3d121a0b42
>> --- /dev/null
>> +++ b/arch/arm64/include/asm/kvm_nested.h
>> @@ -0,0 +1,13 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +#ifndef __ARM64_KVM_NESTED_H
>> +#define __ARM64_KVM_NESTED_H
>> +
>> +#include <linux/kvm_host.h>
>> +
>> +static inline bool nested_virt_in_use(const struct kvm_vcpu *vcpu)
>> +{
>> +	return cpus_have_const_cap(ARM64_HAS_NESTED_VIRT) &&
>> +		test_bit(KVM_ARM_VCPU_NESTED_VIRT, vcpu->arch.features);
>> +}
> 
> Also, is it worth having a vcpu->arch.flags flag for this, similarly to
> SVE and ptrauth?

What would we expose through this flag?

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...


* Re: [PATCH 13/59] KVM: arm64: nv: Handle virtual EL2 registers in vcpu_read/write_sys_reg()
  2019-06-24 12:42   ` Julien Thierry
  2019-06-25 14:02     ` Alexandru Elisei
@ 2019-07-03 12:15     ` Marc Zyngier
  2019-07-03 15:21       ` Julien Thierry
  1 sibling, 1 reply; 176+ messages in thread
From: Marc Zyngier @ 2019-07-03 12:15 UTC (permalink / raw)
  To: Julien Thierry, linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	James Morse, Suzuki K Poulose

On 24/06/2019 13:42, Julien Thierry wrote:
> 
> 
> On 06/21/2019 10:37 AM, Marc Zyngier wrote:
>> From: Andre Przywara <andre.przywara@arm.com>
>>
>> KVM internally uses accessor functions when reading or writing the
>> guest's system registers. This takes care of accessing either the stored
>> copy or using the "live" EL1 system registers when the host uses VHE.
>>
>> With the introduction of virtual EL2 we add a bunch of EL2 system
>> registers, which now must also be taken care of:
>> - If the guest is running in vEL2, and we access an EL1 sysreg, we must
>>   revert to the stored version of that, and not use the CPU's copy.
>> - If the guest is running in vEL1, and we access an EL2 sysreg, we must
>>   also use the stored version, since the CPU carries the EL1 copy.
>> - Some EL2 system registers are supposed to affect the current execution
>>   of the system, so we need to put them into their respective EL1
>>   counterparts. For this we need to define a mapping between the two.
>>   This is done using the newly introduced struct el2_sysreg_map.
>> - Some EL2 system registers have a different format than their EL1
>>   counterpart, so we need to translate them before writing them to the
>>   CPU. This is done using an (optional) translate function in the map.
>> - There are the three special registers SP_EL2, SPSR_EL2 and ELR_EL2,
>>   which need some separate handling.
>>
>> All of these cases are now wrapped into the existing accessor functions,
>> so KVM users wouldn't need to care whether they access EL2 or EL1
>> registers and also which state the guest is in.
>>
>> This handles what was formerly known as the "shadow state" dynamically,
>> without requiring a separate copy for each vCPU EL.
>>
>> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>> ---
>>  arch/arm64/include/asm/kvm_emulate.h |   6 +
>>  arch/arm64/include/asm/kvm_host.h    |   5 +
>>  arch/arm64/kvm/sys_regs.c            | 163 +++++++++++++++++++++++++++
>>  3 files changed, 174 insertions(+)
>>
>> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
>> index c43aac5fed69..f37006b6eec4 100644
>> --- a/arch/arm64/include/asm/kvm_emulate.h
>> +++ b/arch/arm64/include/asm/kvm_emulate.h
>> @@ -70,6 +70,12 @@ void kvm_emulate_nested_eret(struct kvm_vcpu *vcpu);
>>  int kvm_inject_nested_sync(struct kvm_vcpu *vcpu, u64 esr_el2);
>>  int kvm_inject_nested_irq(struct kvm_vcpu *vcpu);
>>  
>> +u64 translate_tcr(u64 tcr);
>> +u64 translate_cptr(u64 tcr);
>> +u64 translate_sctlr(u64 tcr);
>> +u64 translate_ttbr0(u64 tcr);
>> +u64 translate_cnthctl(u64 tcr);
>> +
>>  static inline bool vcpu_el1_is_32bit(struct kvm_vcpu *vcpu)
>>  {
>>  	return !(vcpu->arch.hcr_el2 & HCR_RW);
>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>> index 2d4290d2513a..dae9c42a7219 100644
>> --- a/arch/arm64/include/asm/kvm_host.h
>> +++ b/arch/arm64/include/asm/kvm_host.h
>> @@ -217,6 +217,11 @@ enum vcpu_sysreg {
>>  	NR_SYS_REGS	/* Nothing after this line! */
>>  };
>>  
>> +static inline bool sysreg_is_el2(int reg)
>> +{
>> +	return reg >= FIRST_EL2_SYSREG && reg < NR_SYS_REGS;
>> +}
>> +
>>  /* 32bit mapping */
>>  #define c0_MPIDR	(MPIDR_EL1 * 2)	/* MultiProcessor ID Register */
>>  #define c0_CSSELR	(CSSELR_EL1 * 2)/* Cache Size Selection Register */
>> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
>> index 693dd063c9c2..d024114da162 100644
>> --- a/arch/arm64/kvm/sys_regs.c
>> +++ b/arch/arm64/kvm/sys_regs.c
>> @@ -76,11 +76,142 @@ static bool write_to_read_only(struct kvm_vcpu *vcpu,
>>  	return false;
>>  }
>>  
>> +static u64 tcr_el2_ips_to_tcr_el1_ps(u64 tcr_el2)
>> +{
>> +	return ((tcr_el2 & TCR_EL2_PS_MASK) >> TCR_EL2_PS_SHIFT)
>> +		<< TCR_IPS_SHIFT;
>> +}
>> +
>> +u64 translate_tcr(u64 tcr)
>> +{
>> +	return TCR_EPD1_MASK |				/* disable TTBR1_EL1 */
>> +	       ((tcr & TCR_EL2_TBI) ? TCR_TBI0 : 0) |
>> +	       tcr_el2_ips_to_tcr_el1_ps(tcr) |
>> +	       (tcr & TCR_EL2_TG0_MASK) |
>> +	       (tcr & TCR_EL2_ORGN0_MASK) |
>> +	       (tcr & TCR_EL2_IRGN0_MASK) |
>> +	       (tcr & TCR_EL2_T0SZ_MASK);
>> +}
>> +
>> +u64 translate_cptr(u64 cptr_el2)
>> +{
>> +	u64 cpacr_el1 = 0;
>> +
>> +	if (!(cptr_el2 & CPTR_EL2_TFP))
>> +		cpacr_el1 |= CPACR_EL1_FPEN;
>> +	if (cptr_el2 & CPTR_EL2_TTA)
>> +		cpacr_el1 |= CPACR_EL1_TTA;
>> +	if (!(cptr_el2 & CPTR_EL2_TZ))
>> +		cpacr_el1 |= CPACR_EL1_ZEN;
>> +
>> +	return cpacr_el1;
>> +}
>> +
>> +u64 translate_sctlr(u64 sctlr)
>> +{
>> +	/* Bit 20 is RES1 in SCTLR_EL1, but RES0 in SCTLR_EL2 */
>> +	return sctlr | BIT(20);
>> +}
>> +
>> +u64 translate_ttbr0(u64 ttbr0)
>> +{
>> +	/* Force ASID to 0 (ASID 0 or RES0) */
>> +	return ttbr0 & ~GENMASK_ULL(63, 48);
>> +}
>> +
>> +u64 translate_cnthctl(u64 cnthctl)
>> +{
>> +	return ((cnthctl & 0x3) << 10) | (cnthctl & 0xfc);
>> +}
>> +
>> +#define EL2_SYSREG(el2, el1, translate)	\
>> +	[el2 - FIRST_EL2_SYSREG] = { el2, el1, translate }
>> +#define PURE_EL2_SYSREG(el2) \
>> +	[el2 - FIRST_EL2_SYSREG] = { el2,__INVALID_SYSREG__, NULL }
>> +/*
>> + * Associate vEL2 registers to their EL1 counterparts on the CPU.
>> + * The translate function can be NULL, when the register layout is identical.
>> + */
>> +struct el2_sysreg_map {
>> +	int sysreg;	/* EL2 register index into the array above */
>> +	int mapping;	/* associated EL1 register */
>> +	u64 (*translate)(u64 value);
>> +} nested_sysreg_map[NR_SYS_REGS - FIRST_EL2_SYSREG] = {
>> +	PURE_EL2_SYSREG( VPIDR_EL2 ),
>> +	PURE_EL2_SYSREG( VMPIDR_EL2 ),
>> +	PURE_EL2_SYSREG( ACTLR_EL2 ),
>> +	PURE_EL2_SYSREG( HCR_EL2 ),
>> +	PURE_EL2_SYSREG( MDCR_EL2 ),
>> +	PURE_EL2_SYSREG( HSTR_EL2 ),
>> +	PURE_EL2_SYSREG( HACR_EL2 ),
>> +	PURE_EL2_SYSREG( VTTBR_EL2 ),
>> +	PURE_EL2_SYSREG( VTCR_EL2 ),
>> +	PURE_EL2_SYSREG( RVBAR_EL2 ),
>> +	PURE_EL2_SYSREG( RMR_EL2 ),
>> +	PURE_EL2_SYSREG( TPIDR_EL2 ),
>> +	PURE_EL2_SYSREG( CNTVOFF_EL2 ),
>> +	PURE_EL2_SYSREG( CNTHCTL_EL2 ),
>> +	PURE_EL2_SYSREG( HPFAR_EL2 ),
>> +	EL2_SYSREG(      SCTLR_EL2,  SCTLR_EL1,      translate_sctlr ),
>> +	EL2_SYSREG(      CPTR_EL2,   CPACR_EL1,      translate_cptr  ),
>> +	EL2_SYSREG(      TTBR0_EL2,  TTBR0_EL1,      translate_ttbr0 ),
>> +	EL2_SYSREG(      TTBR1_EL2,  TTBR1_EL1,      NULL            ),
>> +	EL2_SYSREG(      TCR_EL2,    TCR_EL1,        translate_tcr   ),
>> +	EL2_SYSREG(      VBAR_EL2,   VBAR_EL1,       NULL            ),
>> +	EL2_SYSREG(      AFSR0_EL2,  AFSR0_EL1,      NULL            ),
>> +	EL2_SYSREG(      AFSR1_EL2,  AFSR1_EL1,      NULL            ),
>> +	EL2_SYSREG(      ESR_EL2,    ESR_EL1,        NULL            ),
>> +	EL2_SYSREG(      FAR_EL2,    FAR_EL1,        NULL            ),
>> +	EL2_SYSREG(      MAIR_EL2,   MAIR_EL1,       NULL            ),
>> +	EL2_SYSREG(      AMAIR_EL2,  AMAIR_EL1,      NULL            ),
>> +};
>> +
>> +static
>> +const struct el2_sysreg_map *find_el2_sysreg(const struct el2_sysreg_map *map,
>> +					     int reg)
>> +{
>> +	const struct el2_sysreg_map *entry;
>> +
>> +	if (!sysreg_is_el2(reg))
>> +		return NULL;
>> +
>> +	entry = &nested_sysreg_map[reg - FIRST_EL2_SYSREG];
>> +	if (entry->sysreg == __INVALID_SYSREG__)
>> +		return NULL;
>> +
>> +	return entry;
>> +}
>> +
>>  u64 vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg)
>>  {
>> +
>>  	if (!vcpu->arch.sysregs_loaded_on_cpu)
>>  		goto immediate_read;
>>  
>> +	if (unlikely(sysreg_is_el2(reg))) {
>> +		const struct el2_sysreg_map *el2_reg;
>> +
>> +		if (!is_hyp_ctxt(vcpu))
>> +			goto immediate_read;
>> +
>> +		el2_reg = find_el2_sysreg(nested_sysreg_map, reg);
>> +		if (el2_reg) {
>> +			/*
>> +			 * If this register does not have an EL1 counterpart,
>> +			 * then read the stored EL2 version.
>> +			 */
>> +			if (el2_reg->mapping == __INVALID_SYSREG__)
> 
> In this patch, find_el2_sysreg returns NULL for PURE_EL2 registers. So

That's not how I read this code. You get NULL if the EL2 sysreg is
set to __INVALID_SYSREG__, of which there is no occurrence (yeah, dead
code).

> for PURE_EL2, the access would go through the switch case. However this
> branch suggest that for PURE_EL2 register we intend to do the read from
> the memory backed version.
> 
> Which should it be?

My understanding of this code is that we're actually hitting memory
here. Am I missing anything? Note that I'm actively refactoring this
code as part of the ARMv8.4-NV effort, hopefully making it a bit simpler:

https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/commit/?h=kvm-arm64/nv-wip-5.2-rc6&id=ea93236776772ce08e0eab51d9b77a9197121fde

> 
>> +				goto immediate_read;
>> +
>> +			/* Get the current version of the EL1 counterpart. */
>> +			reg = el2_reg->mapping;
>> +		}
>> +	} else {
>> +		/* EL1 register can't be on the CPU if the guest is in vEL2. */
>> +		if (unlikely(is_hyp_ctxt(vcpu)))
>> +			goto immediate_read;
>> +	}
>> +
>>  	/*
>>  	 * System registers listed in the switch are not saved on every
>>  	 * exit from the guest but are only saved on vcpu_put.
>> @@ -114,6 +245,8 @@ u64 vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg)
>>  	case DACR32_EL2:	return read_sysreg_s(SYS_DACR32_EL2);
>>  	case IFSR32_EL2:	return read_sysreg_s(SYS_IFSR32_EL2);
>>  	case DBGVCR32_EL2:	return read_sysreg_s(SYS_DBGVCR32_EL2);
>> +	case SP_EL2:		return read_sysreg(sp_el1);
>> +	case ELR_EL2:		return read_sysreg_el1(SYS_ELR);
>>  	}
>>  
>>  immediate_read:
>> @@ -125,6 +258,34 @@ void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, u64 val, int reg)
>>  	if (!vcpu->arch.sysregs_loaded_on_cpu)
>>  		goto immediate_write;
>>  
>> +	if (unlikely(sysreg_is_el2(reg))) {
>> +		const struct el2_sysreg_map *el2_reg;
>> +
>> +		if (!is_hyp_ctxt(vcpu))
>> +			goto immediate_write;
>> +
>> +		/* Store the EL2 version in the sysregs array. */
>> +		__vcpu_sys_reg(vcpu, reg) = val;
>> +
>> +		el2_reg = find_el2_sysreg(nested_sysreg_map, reg);
>> +		if (el2_reg) {
>> +			/* Does this register have an EL1 counterpart? */
>> +			if (el2_reg->mapping == __INVALID_SYSREG__)
>> +				return;
> 
> As in the read case, this is never reached and we'll go through the
> switch case.

Same thing. That's the mapping that is evaluated, not the sysreg itself.
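To spell out my reading of the write path (names are from the patch, the
comments are mine):

	el2_reg = find_el2_sysreg(nested_sysreg_map, reg);
	/* NULL only if the entry's .sysreg is __INVALID_SYSREG__,
	 * which no entry sets today -- hence the dead code remark */
	if (el2_reg) {
		if (el2_reg->mapping == __INVALID_SYSREG__)
			return;	/* pure EL2 reg, memory copy is enough */
		reg = el2_reg->mapping; /* mirror into the EL1 register */
	}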

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...


* Re: [PATCH 07/59] KVM: arm64: nv: Add EL2 system registers to vcpu context
  2019-06-24 12:54   ` Dave Martin
@ 2019-07-03 12:20     ` Marc Zyngier
  2019-07-03 16:31       ` Dave Martin
  0 siblings, 1 reply; 176+ messages in thread
From: Marc Zyngier @ 2019-07-03 12:20 UTC (permalink / raw)
  To: Dave Martin
  Cc: linux-arm-kernel, kvmarm, kvm, Julien Thierry, Andre Przywara,
	Suzuki K Poulose, Christoffer Dall, James Morse, Jintack Lim

On 24/06/2019 13:54, Dave Martin wrote:
> On Fri, Jun 21, 2019 at 10:37:51AM +0100, Marc Zyngier wrote:
>> From: Jintack Lim <jintack.lim@linaro.org>
>>
>> ARM v8.3 introduces a new bit in the HCR_EL2, which is the NV bit. When
>> this bit is set, accessing EL2 registers in EL1 traps to EL2. In
>> addition, executing the following instructions in EL1 will trap to EL2:
>> tlbi, at, eret, and msr/mrs instructions to access SP_EL1. Most of the
>> instructions that trap to EL2 with the NV bit were undef at EL1 prior to
>> ARM v8.3. The only instruction that was not undef is eret.
>>
>> This patch sets up a handler for EL2 registers and SP_EL1 register
>> accesses at EL1. The host hypervisor keeps those register values in
>> memory, and will emulate their behavior.
>>
>> This patch doesn't set the NV bit yet. It will be set in a later patch
>> once nested virtualization support is completed.
>>
>> Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>> ---
>>  arch/arm64/include/asm/kvm_host.h | 37 +++++++++++++++-
>>  arch/arm64/include/asm/sysreg.h   | 50 ++++++++++++++++++++-
>>  arch/arm64/kvm/sys_regs.c         | 74 ++++++++++++++++++++++++++++---
>>  3 files changed, 154 insertions(+), 7 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>> index 4bcd9c1291d5..2d4290d2513a 100644
>> --- a/arch/arm64/include/asm/kvm_host.h
>> +++ b/arch/arm64/include/asm/kvm_host.h
>> @@ -173,12 +173,47 @@ enum vcpu_sysreg {
>>  	APGAKEYLO_EL1,
>>  	APGAKEYHI_EL1,
>>  
>> -	/* 32bit specific registers. Keep them at the end of the range */
>> +	/* 32bit specific registers. */
> 
> Out of interest, why did we originally want these to be at the end?
> Because they're not at the end any more...

I seem to remember the original assembly switch code used that property.
This is a long gone requirement, thankfully.

> 
>>  	DACR32_EL2,	/* Domain Access Control Register */
>>  	IFSR32_EL2,	/* Instruction Fault Status Register */
>>  	FPEXC32_EL2,	/* Floating-Point Exception Control Register */
>>  	DBGVCR32_EL2,	/* Debug Vector Catch Register */
>>  
>> +	/* EL2 registers sorted ascending by Op0, Op1, CRn, CRm, Op2 */
>> +	FIRST_EL2_SYSREG,
>> +	VPIDR_EL2 = FIRST_EL2_SYSREG,
>> +			/* Virtualization Processor ID Register */
>> +	VMPIDR_EL2,	/* Virtualization Multiprocessor ID Register */
>> +	SCTLR_EL2,	/* System Control Register (EL2) */
>> +	ACTLR_EL2,	/* Auxiliary Control Register (EL2) */
>> +	HCR_EL2,	/* Hypervisor Configuration Register */
>> +	MDCR_EL2,	/* Monitor Debug Configuration Register (EL2) */
>> +	CPTR_EL2,	/* Architectural Feature Trap Register (EL2) */
>> +	HSTR_EL2,	/* Hypervisor System Trap Register */
>> +	HACR_EL2,	/* Hypervisor Auxiliary Control Register */
>> +	TTBR0_EL2,	/* Translation Table Base Register 0 (EL2) */
>> +	TTBR1_EL2,	/* Translation Table Base Register 1 (EL2) */
>> +	TCR_EL2,	/* Translation Control Register (EL2) */
>> +	VTTBR_EL2,	/* Virtualization Translation Table Base Register */
>> +	VTCR_EL2,	/* Virtualization Translation Control Register */
>> +	SPSR_EL2,	/* EL2 saved program status register */
>> +	ELR_EL2,	/* EL2 exception link register */
>> +	AFSR0_EL2,	/* Auxiliary Fault Status Register 0 (EL2) */
>> +	AFSR1_EL2,	/* Auxiliary Fault Status Register 1 (EL2) */
>> +	ESR_EL2,	/* Exception Syndrome Register (EL2) */
>> +	FAR_EL2,	/* Hypervisor IPA Fault Address Register */
>> +	HPFAR_EL2,	/* Hypervisor IPA Fault Address Register */
>> +	MAIR_EL2,	/* Memory Attribute Indirection Register (EL2) */
>> +	AMAIR_EL2,	/* Auxiliary Memory Attribute Indirection Register (EL2) */
>> +	VBAR_EL2,	/* Vector Base Address Register (EL2) */
>> +	RVBAR_EL2,	/* Reset Vector Base Address Register */
>> +	RMR_EL2,	/* Reset Management Register */
>> +	CONTEXTIDR_EL2,	/* Context ID Register (EL2) */
>> +	TPIDR_EL2,	/* EL2 Software Thread ID Register */
>> +	CNTVOFF_EL2,	/* Counter-timer Virtual Offset register */
>> +	CNTHCTL_EL2,	/* Counter-timer Hypervisor Control register */
>> +	SP_EL2,		/* EL2 Stack Pointer */
>> +
> 
> I wonder whether we could make these conditionally present somehow.  Not
> worth worrying about for now to save 200-odd bytes per vcpu though.

With 8.4-NV, these 200 bytes turn into a whole 8kB (4kB page, plus
almost 4kB of padding that I need to reduce one way or another). So I'm
not too worried about this for now.

I really want the NV code to always be present though, in order to avoid
configuration-related regressions. I'm not sure how to make this better.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...


* Re: [PATCH 01/59] KVM: arm64: Migrate _elx sysreg accessors to msr_s/mrs_s
  2019-06-24 12:59   ` Alexandru Elisei
@ 2019-07-03 12:32     ` Marc Zyngier
  0 siblings, 0 replies; 176+ messages in thread
From: Marc Zyngier @ 2019-07-03 12:32 UTC (permalink / raw)
  To: Alexandru Elisei, linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Dave Martin

On 24/06/2019 13:59, Alexandru Elisei wrote:
> On 6/21/19 10:37 AM, Marc Zyngier wrote:
>> From: Dave Martin <Dave.Martin@arm.com>
>>
>> Currently, the {read,write}_sysreg_el*() accessors for accessing
>> particular ELs' sysregs in the presence of VHE rely on some local
>> hacks and define their system register encodings in a way that is
>> inconsistent with the core definitions in <asm/sysreg.h>.
>>
>> As a result, it is necessary to add duplicate definitions for any
>> system register that already needs a definition in sysreg.h for
>> other reasons.
>>
>> This is a bit of a maintenance headache, and the reasons for the
>> _el*() accessors working the way they do is a bit historical.
>>
>> This patch gets rid of the shadow sysreg definitions in
>> <asm/kvm_hyp.h>, converts the _el*() accessors to use the core
>> __msr_s/__mrs_s interface, and converts all call sites to use the
>> standard sysreg #define names (i.e., upper case, with SYS_ prefix).
>>
>> This patch will conflict heavily anyway, so the opportunity taken
>> to clean up some bad whitespace in the context of the changes is
>> taken.
>>
>> The change exposes a few system registers that have no sysreg.h
>> definition, due to msr_s/mrs_s being used in place of msr/mrs:
>> additions are made in order to fill in the gaps.
>>
>> Signed-off-by: Dave Martin <Dave.Martin@arm.com>
>> Cc: Catalin Marinas <catalin.marinas@arm.com>
>> Cc: Christoffer Dall <christoffer.dall@arm.com>
>> Cc: Mark Rutland <mark.rutland@arm.com>
>> Cc: Will Deacon <will.deacon@arm.com>
>> Link: https://www.spinics.net/lists/kvm-arm/msg31717.html
>> [Rebased to v4.21-rc1]
>> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
>> [Rebased to v5.2-rc5, changelog updates]
>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>> ---
>>  arch/arm/include/asm/kvm_hyp.h           | 13 ++--
>>  arch/arm64/include/asm/kvm_emulate.h     | 16 ++---
>>  arch/arm64/include/asm/kvm_hyp.h         | 50 ++-------------
>>  arch/arm64/include/asm/sysreg.h          | 35 ++++++++++-
>>  arch/arm64/kvm/hyp/switch.c              | 14 ++---
>>  arch/arm64/kvm/hyp/sysreg-sr.c           | 78 ++++++++++++------------
>>  arch/arm64/kvm/hyp/tlb.c                 | 12 ++--
>>  arch/arm64/kvm/hyp/vgic-v2-cpuif-proxy.c |  2 +-
>>  arch/arm64/kvm/regmap.c                  |  4 +-
>>  arch/arm64/kvm/sys_regs.c                | 56 ++++++++---------
>>  virt/kvm/arm/arch_timer.c                | 24 ++++----
>>  11 files changed, 148 insertions(+), 156 deletions(-)
>>
>> diff --git a/arch/arm/include/asm/kvm_hyp.h b/arch/arm/include/asm/kvm_hyp.h
>> index 87bcd18df8d5..059224fb14db 100644
>> --- a/arch/arm/include/asm/kvm_hyp.h
>> +++ b/arch/arm/include/asm/kvm_hyp.h
>> @@ -93,13 +93,14 @@
>>  #define VFP_FPEXC	__ACCESS_VFP(FPEXC)
>>  
>>  /* AArch64 compatibility macros, only for the timer so far */
>> -#define read_sysreg_el0(r)		read_sysreg(r##_el0)
>> -#define write_sysreg_el0(v, r)		write_sysreg(v, r##_el0)
>> +#define read_sysreg_el0(r)		read_sysreg(r##_EL0)
>> +#define write_sysreg_el0(v, r)		write_sysreg(v, r##_EL0)
>> +
>> +#define SYS_CNTP_CTL_EL0		CNTP_CTL
>> +#define SYS_CNTP_CVAL_EL0		CNTP_CVAL
>> +#define SYS_CNTV_CTL_EL0		CNTV_CTL
>> +#define SYS_CNTV_CVAL_EL0		CNTV_CVAL
>>  
>> -#define cntp_ctl_el0			CNTP_CTL
>> -#define cntp_cval_el0			CNTP_CVAL
>> -#define cntv_ctl_el0			CNTV_CTL
>> -#define cntv_cval_el0			CNTV_CVAL
>>  #define cntvoff_el2			CNTVOFF
>>  #define cnthctl_el2			CNTHCTL
>>  
>> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
>> index 613427fafff9..39ffe41855bc 100644
>> --- a/arch/arm64/include/asm/kvm_emulate.h
>> +++ b/arch/arm64/include/asm/kvm_emulate.h
>> @@ -137,7 +137,7 @@ static inline unsigned long *__vcpu_elr_el1(const struct kvm_vcpu *vcpu)
>>  static inline unsigned long vcpu_read_elr_el1(const struct kvm_vcpu *vcpu)
>>  {
>>  	if (vcpu->arch.sysregs_loaded_on_cpu)
>> -		return read_sysreg_el1(elr);
>> +		return read_sysreg_el1(SYS_ELR);
>>  	else
>>  		return *__vcpu_elr_el1(vcpu);
>>  }
>> @@ -145,7 +145,7 @@ static inline unsigned long vcpu_read_elr_el1(const struct kvm_vcpu *vcpu)
>>  static inline void vcpu_write_elr_el1(const struct kvm_vcpu *vcpu, unsigned long v)
>>  {
>>  	if (vcpu->arch.sysregs_loaded_on_cpu)
>> -		write_sysreg_el1(v, elr);
>> +		write_sysreg_el1(v, SYS_ELR);
>>  	else
>>  		*__vcpu_elr_el1(vcpu) = v;
>>  }
>> @@ -197,7 +197,7 @@ static inline unsigned long vcpu_read_spsr(const struct kvm_vcpu *vcpu)
>>  		return vcpu_read_spsr32(vcpu);
>>  
>>  	if (vcpu->arch.sysregs_loaded_on_cpu)
>> -		return read_sysreg_el1(spsr);
>> +		return read_sysreg_el1(SYS_SPSR);
>>  	else
>>  		return vcpu_gp_regs(vcpu)->spsr[KVM_SPSR_EL1];
>>  }
>> @@ -210,7 +210,7 @@ static inline void vcpu_write_spsr(struct kvm_vcpu *vcpu, unsigned long v)
>>  	}
>>  
>>  	if (vcpu->arch.sysregs_loaded_on_cpu)
>> -		write_sysreg_el1(v, spsr);
>> +		write_sysreg_el1(v, SYS_SPSR);
>>  	else
>>  		vcpu_gp_regs(vcpu)->spsr[KVM_SPSR_EL1] = v;
>>  }
>> @@ -462,13 +462,13 @@ static inline void kvm_skip_instr(struct kvm_vcpu *vcpu, bool is_wide_instr)
>>   */
>>  static inline void __hyp_text __kvm_skip_instr(struct kvm_vcpu *vcpu)
>>  {
>> -	*vcpu_pc(vcpu) = read_sysreg_el2(elr);
>> -	vcpu->arch.ctxt.gp_regs.regs.pstate = read_sysreg_el2(spsr);
>> +	*vcpu_pc(vcpu) = read_sysreg_el2(SYS_ELR);
>> +	vcpu->arch.ctxt.gp_regs.regs.pstate = read_sysreg_el2(SYS_SPSR);
>>  
>>  	kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
>>  
>> -	write_sysreg_el2(vcpu->arch.ctxt.gp_regs.regs.pstate, spsr);
>> -	write_sysreg_el2(*vcpu_pc(vcpu), elr);
>> +	write_sysreg_el2(vcpu->arch.ctxt.gp_regs.regs.pstate, SYS_SPSR);
>> +	write_sysreg_el2(*vcpu_pc(vcpu), SYS_ELR);
>>  }
>>  
>>  #endif /* __ARM64_KVM_EMULATE_H__ */
>> diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
>> index 09fe8bd15f6e..ce99c2daff04 100644
>> --- a/arch/arm64/include/asm/kvm_hyp.h
>> +++ b/arch/arm64/include/asm/kvm_hyp.h
>> @@ -29,7 +29,7 @@
>>  #define read_sysreg_elx(r,nvh,vh)					\
>>  	({								\
>>  		u64 reg;						\
>> -		asm volatile(ALTERNATIVE("mrs %0, " __stringify(r##nvh),\
>> +		asm volatile(ALTERNATIVE(__mrs_s("%0", r##nvh),	\
>>  					 __mrs_s("%0", r##vh),		\
>>  					 ARM64_HAS_VIRT_HOST_EXTN)	\
>>  			     : "=r" (reg));				\
>> @@ -39,7 +39,7 @@
>>  #define write_sysreg_elx(v,r,nvh,vh)					\
>>  	do {								\
>>  		u64 __val = (u64)(v);					\
>> -		asm volatile(ALTERNATIVE("msr " __stringify(r##nvh) ", %x0",\
>> +		asm volatile(ALTERNATIVE(__msr_s(r##nvh, "%x0"),	\
>>  					 __msr_s(r##vh, "%x0"),		\
>>  					 ARM64_HAS_VIRT_HOST_EXTN)	\
>>  					 : : "rZ" (__val));		\
>> @@ -48,55 +48,15 @@
>>  /*
>>   * Unified accessors for registers that have a different encoding
>>   * between VHE and non-VHE. They must be specified without their "ELx"
>> - * encoding.
>> + * encoding, but with the SYS_ prefix, as defined in asm/sysreg.h.
>>   */
>> -#define read_sysreg_el2(r)						\
>> -	({								\
>> -		u64 reg;						\
>> -		asm volatile(ALTERNATIVE("mrs %0, " __stringify(r##_EL2),\
>> -					 "mrs %0, " __stringify(r##_EL1),\
>> -					 ARM64_HAS_VIRT_HOST_EXTN)	\
>> -			     : "=r" (reg));				\
>> -		reg;							\
>> -	})
>> -
>> -#define write_sysreg_el2(v,r)						\
>> -	do {								\
>> -		u64 __val = (u64)(v);					\
>> -		asm volatile(ALTERNATIVE("msr " __stringify(r##_EL2) ", %x0",\
>> -					 "msr " __stringify(r##_EL1) ", %x0",\
>> -					 ARM64_HAS_VIRT_HOST_EXTN)	\
>> -					 : : "rZ" (__val));		\
>> -	} while (0)
>>  
>>  #define read_sysreg_el0(r)	read_sysreg_elx(r, _EL0, _EL02)
>>  #define write_sysreg_el0(v,r)	write_sysreg_elx(v, r, _EL0, _EL02)
>>  #define read_sysreg_el1(r)	read_sysreg_elx(r, _EL1, _EL12)
>>  #define write_sysreg_el1(v,r)	write_sysreg_elx(v, r, _EL1, _EL12)
>> -
>> -/* The VHE specific system registers and their encoding */
>> -#define sctlr_EL12              sys_reg(3, 5, 1, 0, 0)
>> -#define cpacr_EL12              sys_reg(3, 5, 1, 0, 2)
>> -#define ttbr0_EL12              sys_reg(3, 5, 2, 0, 0)
>> -#define ttbr1_EL12              sys_reg(3, 5, 2, 0, 1)
>> -#define tcr_EL12                sys_reg(3, 5, 2, 0, 2)
>> -#define afsr0_EL12              sys_reg(3, 5, 5, 1, 0)
>> -#define afsr1_EL12              sys_reg(3, 5, 5, 1, 1)
>> -#define esr_EL12                sys_reg(3, 5, 5, 2, 0)
>> -#define far_EL12                sys_reg(3, 5, 6, 0, 0)
>> -#define mair_EL12               sys_reg(3, 5, 10, 2, 0)
>> -#define amair_EL12              sys_reg(3, 5, 10, 3, 0)
>> -#define vbar_EL12               sys_reg(3, 5, 12, 0, 0)
>> -#define contextidr_EL12         sys_reg(3, 5, 13, 0, 1)
>> -#define cntkctl_EL12            sys_reg(3, 5, 14, 1, 0)
>> -#define cntp_tval_EL02          sys_reg(3, 5, 14, 2, 0)
>> -#define cntp_ctl_EL02           sys_reg(3, 5, 14, 2, 1)
>> -#define cntp_cval_EL02          sys_reg(3, 5, 14, 2, 2)
>> -#define cntv_tval_EL02          sys_reg(3, 5, 14, 3, 0)
>> -#define cntv_ctl_EL02           sys_reg(3, 5, 14, 3, 1)
>> -#define cntv_cval_EL02          sys_reg(3, 5, 14, 3, 2)
>> -#define spsr_EL12               sys_reg(3, 5, 4, 0, 0)
>> -#define elr_EL12                sys_reg(3, 5, 4, 0, 1)
>> +#define read_sysreg_el2(r)	read_sysreg_elx(r, _EL2, _EL1)
>> +#define write_sysreg_el2(v,r)	write_sysreg_elx(v, r, _EL2, _EL1)
>>  
>>  /**
>>   * hyp_alternate_select - Generates patchable code sequences that are
>> diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
>> index 902d75b60914..434cf53d527b 100644
>> --- a/arch/arm64/include/asm/sysreg.h
>> +++ b/arch/arm64/include/asm/sysreg.h
>> @@ -202,6 +202,9 @@
>>  #define SYS_APGAKEYLO_EL1		sys_reg(3, 0, 2, 3, 0)
>>  #define SYS_APGAKEYHI_EL1		sys_reg(3, 0, 2, 3, 1)
>>  
>> +#define SYS_SPSR_EL1			sys_reg(3, 0, 4, 0, 0)
>> +#define SYS_ELR_EL1			sys_reg(3, 0, 4, 0, 1)
>> +
>>  #define SYS_ICC_PMR_EL1			sys_reg(3, 0, 4, 6, 0)
>>  
>>  #define SYS_AFSR0_EL1			sys_reg(3, 0, 5, 1, 0)
>> @@ -393,6 +396,9 @@
>>  #define SYS_CNTP_CTL_EL0		sys_reg(3, 3, 14, 2, 1)
>>  #define SYS_CNTP_CVAL_EL0		sys_reg(3, 3, 14, 2, 2)
>>  
>> +#define SYS_CNTV_CTL_EL0		sys_reg(3, 3, 14, 3, 1)
>> +#define SYS_CNTV_CVAL_EL0		sys_reg(3, 3, 14, 3, 2)
>> +
>>  #define SYS_AARCH32_CNTP_TVAL		sys_reg(0, 0, 14, 2, 0)
>>  #define SYS_AARCH32_CNTP_CTL		sys_reg(0, 0, 14, 2, 1)
>>  #define SYS_AARCH32_CNTP_CVAL		sys_reg(0, 2, 0, 14, 0)
>> @@ -403,14 +409,17 @@
>>  #define __TYPER_CRm(n)			(0xc | (((n) >> 3) & 0x3))
>>  #define SYS_PMEVTYPERn_EL0(n)		sys_reg(3, 3, 14, __TYPER_CRm(n), __PMEV_op2(n))
>>  
>> -#define SYS_PMCCFILTR_EL0		sys_reg (3, 3, 14, 15, 7)
>> +#define SYS_PMCCFILTR_EL0		sys_reg(3, 3, 14, 15, 7)
>>  
>>  #define SYS_ZCR_EL2			sys_reg(3, 4, 1, 2, 0)
>> -
>>  #define SYS_DACR32_EL2			sys_reg(3, 4, 3, 0, 0)
>> +#define SYS_SPSR_EL2			sys_reg(3, 4, 4, 0, 0)
>> +#define SYS_ELR_EL2			sys_reg(3, 4, 4, 0, 1)
>>  #define SYS_IFSR32_EL2			sys_reg(3, 4, 5, 0, 1)
>> +#define SYS_ESR_EL2			sys_reg(3, 4, 5, 2, 0)
>>  #define SYS_VSESR_EL2			sys_reg(3, 4, 5, 2, 3)
>>  #define SYS_FPEXC32_EL2			sys_reg(3, 4, 5, 3, 0)
>> +#define SYS_FAR_EL2			sys_reg(3, 4, 6, 0, 0)
>>  
>>  #define SYS_VDISR_EL2			sys_reg(3, 4, 12, 1,  1)
>>  #define __SYS__AP0Rx_EL2(x)		sys_reg(3, 4, 12, 8, x)
>> @@ -455,7 +464,29 @@
>>  #define SYS_ICH_LR15_EL2		__SYS__LR8_EL2(7)
>>  
>>  /* VHE encodings for architectural EL0/1 system registers */
>> +#define SYS_SCTLR_EL12			sys_reg(3, 5, 1, 0, 0)
>> +#define SYS_CPACR_EL12			sys_reg(3, 5, 1, 0, 2)
>>  #define SYS_ZCR_EL12			sys_reg(3, 5, 1, 2, 0)
>> +#define SYS_TTBR0_EL12			sys_reg(3, 5, 2, 0, 0)
>> +#define SYS_TTBR1_EL12			sys_reg(3, 5, 2, 0, 1)
>> +#define SYS_TCR_EL12			sys_reg(3, 5, 2, 0, 2)
>> +#define SYS_SPSR_EL12			sys_reg(3, 5, 4, 0, 0)
>> +#define SYS_ELR_EL12			sys_reg(3, 5, 4, 0, 1)
>> +#define SYS_AFSR0_EL12			sys_reg(3, 5, 5, 1, 0)
>> +#define SYS_AFSR1_EL12			sys_reg(3, 5, 5, 1, 1)
>> +#define SYS_ESR_EL12			sys_reg(3, 5, 5, 2, 0)
>> +#define SYS_FAR_EL12			sys_reg(3, 5, 6, 0, 0)
>> +#define SYS_MAIR_EL12			sys_reg(3, 5, 10, 2, 0)
>> +#define SYS_AMAIR_EL12			sys_reg(3, 5, 10, 3, 0)
>> +#define SYS_VBAR_EL12			sys_reg(3, 5, 12, 0, 0)
>> +#define SYS_CONTEXTIDR_EL12		sys_reg(3, 5, 13, 0, 1)
>> +#define SYS_CNTKCTL_EL12		sys_reg(3, 5, 14, 1, 0)
>> +#define SYS_CNTP_TVAL_EL02		sys_reg(3, 5, 14, 2, 0)
>> +#define SYS_CNTP_CTL_EL02		sys_reg(3, 5, 14, 2, 1)
>> +#define SYS_CNTP_CVAL_EL02		sys_reg(3, 5, 14, 2, 2)
>> +#define SYS_CNTV_TVAL_EL02		sys_reg(3, 5, 14, 3, 0)
>> +#define SYS_CNTV_CTL_EL02		sys_reg(3, 5, 14, 3, 1)
>> +#define SYS_CNTV_CVAL_EL02		sys_reg(3, 5, 14, 3, 2)
>>  
>>  /* Common SCTLR_ELx flags. */
>>  #define SCTLR_ELx_DSSBS	(_BITUL(44))
>> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
>> index 8799e0c267d4..7b55c11b30fb 100644
>> --- a/arch/arm64/kvm/hyp/switch.c
>> +++ b/arch/arm64/kvm/hyp/switch.c
>> @@ -295,7 +295,7 @@ static bool __hyp_text __populate_fault_info(struct kvm_vcpu *vcpu)
>>  	if (ec != ESR_ELx_EC_DABT_LOW && ec != ESR_ELx_EC_IABT_LOW)
>>  		return true;
>>  
>> -	far = read_sysreg_el2(far);
>> +	far = read_sysreg_el2(SYS_FAR);
>>  
>>  	/*
>>  	 * The HPFAR can be invalid if the stage 2 fault did not
>> @@ -412,7 +412,7 @@ static bool __hyp_text __hyp_handle_fpsimd(struct kvm_vcpu *vcpu)
>>  static bool __hyp_text fixup_guest_exit(struct kvm_vcpu *vcpu, u64 *exit_code)
>>  {
>>  	if (ARM_EXCEPTION_CODE(*exit_code) != ARM_EXCEPTION_IRQ)
>> -		vcpu->arch.fault.esr_el2 = read_sysreg_el2(esr);
>> +		vcpu->arch.fault.esr_el2 = read_sysreg_el2(SYS_ESR);
>>  
>>  	/*
>>  	 * We're using the raw exception code in order to only process
>> @@ -708,8 +708,8 @@ static void __hyp_text __hyp_call_panic_nvhe(u64 spsr, u64 elr, u64 par,
>>  	asm volatile("ldr %0, =__hyp_panic_string" : "=r" (str_va));
>>  
>>  	__hyp_do_panic(str_va,
>> -		       spsr,  elr,
>> -		       read_sysreg(esr_el2),   read_sysreg_el2(far),
>> +		       spsr, elr,
>> +		       read_sysreg(esr_el2), read_sysreg_el2(SYS_FAR),
> Seems to me we are pretty sure here we don't have VHE, so why not make both
> reads either read_sysreg or read_sysreg_el2 for consistency? Am I missing something?

You're not missing much, only that it isn't what this change is about.
If we want to make these things consistent, I'd rather have a separate
patch that changes just that.
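(As an aside, for anyone chasing the rename through the macros: after
this patch, read_sysreg_el2(SYS_FAR) expands via read_sysreg_elx to
roughly

	ALTERNATIVE(__mrs_s("%0", SYS_FAR_EL2),	/* non-VHE */
		    __mrs_s("%0", SYS_FAR_EL1),	/* VHE: E2H redirects the
						 * EL1 encoding to FAR_EL2 */
		    ARM64_HAS_VIRT_HOST_EXTN)

so both flavours end up reading the same physical register.)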

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...


* Re: [PATCH 15/59] KVM: arm64: nv: Refactor vcpu_{read,write}_sys_reg
  2019-06-24 15:07   ` Julien Thierry
@ 2019-07-03 13:09     ` Marc Zyngier
  0 siblings, 0 replies; 176+ messages in thread
From: Marc Zyngier @ 2019-07-03 13:09 UTC (permalink / raw)
  To: Julien Thierry, linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	James Morse, Suzuki K Poulose

On 24/06/2019 16:07, Julien Thierry wrote:
> 
> 
> On 06/21/2019 10:37 AM, Marc Zyngier wrote:
>> Extract the direct HW accessors for later reuse.
>>
>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>> ---
>>  arch/arm64/kvm/sys_regs.c | 247 +++++++++++++++++++++-----------------
>>  1 file changed, 139 insertions(+), 108 deletions(-)
>>
>> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
>> index 2b8734f75a09..e181359adadf 100644
>> --- a/arch/arm64/kvm/sys_regs.c
>> +++ b/arch/arm64/kvm/sys_regs.c
>> @@ -182,99 +182,161 @@ const struct el2_sysreg_map *find_el2_sysreg(const struct el2_sysreg_map *map,
>>  	return entry;
>>  }
>>  
>> +static bool __vcpu_read_sys_reg_from_cpu(int reg, u64 *val)
>> +{
>> +	/*
>> +	 * System registers listed in the switch are not saved on every
>> +	 * exit from the guest but are only saved on vcpu_put.
>> +	 *
>> +	 * Note that MPIDR_EL1 for the guest is set by KVM via VMPIDR_EL2 but
>> +	 * should never be listed below, because the guest cannot modify its
>> +	 * own MPIDR_EL1 and MPIDR_EL1 is accessed for VCPU A from VCPU B's
>> +	 * thread when emulating cross-VCPU communication.
>> +	 */
>> +	switch (reg) {
>> +	case CSSELR_EL1:	*val = read_sysreg_s(SYS_CSSELR_EL1);	break;
>> +	case SCTLR_EL1:		*val = read_sysreg_s(SYS_SCTLR_EL12);	break;
>> +	case ACTLR_EL1:		*val = read_sysreg_s(SYS_ACTLR_EL1);	break;
>> +	case CPACR_EL1:		*val = read_sysreg_s(SYS_CPACR_EL12);	break;
>> +	case TTBR0_EL1:		*val = read_sysreg_s(SYS_TTBR0_EL12);	break;
>> +	case TTBR1_EL1:		*val = read_sysreg_s(SYS_TTBR1_EL12);	break;
>> +	case TCR_EL1:		*val = read_sysreg_s(SYS_TCR_EL12);	break;
>> +	case ESR_EL1:		*val = read_sysreg_s(SYS_ESR_EL12);	break;
>> +	case AFSR0_EL1:		*val = read_sysreg_s(SYS_AFSR0_EL12);	break;
>> +	case AFSR1_EL1:		*val = read_sysreg_s(SYS_AFSR1_EL12);	break;
>> +	case FAR_EL1:		*val = read_sysreg_s(SYS_FAR_EL12);	break;
>> +	case MAIR_EL1:		*val = read_sysreg_s(SYS_MAIR_EL12);	break;
>> +	case VBAR_EL1:		*val = read_sysreg_s(SYS_VBAR_EL12);	break;
>> +	case CONTEXTIDR_EL1:	*val = read_sysreg_s(SYS_CONTEXTIDR_EL12);break;
>> +	case TPIDR_EL0:		*val = read_sysreg_s(SYS_TPIDR_EL0);	break;
>> +	case TPIDRRO_EL0:	*val = read_sysreg_s(SYS_TPIDRRO_EL0);	break;
>> +	case TPIDR_EL1:		*val = read_sysreg_s(SYS_TPIDR_EL1);	break;
>> +	case AMAIR_EL1:		*val = read_sysreg_s(SYS_AMAIR_EL12);	break;
>> +	case CNTKCTL_EL1:	*val = read_sysreg_s(SYS_CNTKCTL_EL12);	break;
>> +	case PAR_EL1:		*val = read_sysreg_s(SYS_PAR_EL1);	break;
>> +	case DACR32_EL2:	*val = read_sysreg_s(SYS_DACR32_EL2);	break;
>> +	case IFSR32_EL2:	*val = read_sysreg_s(SYS_IFSR32_EL2);	break;
>> +	case DBGVCR32_EL2:	*val = read_sysreg_s(SYS_DBGVCR32_EL2);	break;
>> +	default:		return false;
>> +	}
>> +
>> +	return true;
>> +}
>> +
>> +static bool __vcpu_write_sys_reg_to_cpu(u64 val, int reg)
>> +{
>> +	/*
>> +	 * System registers listed in the switch are not restored on every
>> +	 * entry to the guest but are only restored on vcpu_load.
>> +	 *
>> +	 * Note that MPIDR_EL1 for the guest is set by KVM via VMPIDR_EL2 but
>> +	 * should never be listed below, because the MPIDR should only be
>> +	 * set once, before running the VCPU, and never changed later.
>> +	 */
>> +	switch (reg) {
>> +	case CSSELR_EL1:	write_sysreg_s(val, SYS_CSSELR_EL1);	break;
>> +	case SCTLR_EL1:		write_sysreg_s(val, SYS_SCTLR_EL12);	break;
>> +	case ACTLR_EL1:		write_sysreg_s(val, SYS_ACTLR_EL1);	break;
>> +	case CPACR_EL1:		write_sysreg_s(val, SYS_CPACR_EL12);	break;
>> +	case TTBR0_EL1:		write_sysreg_s(val, SYS_TTBR0_EL12);	break;
>> +	case TTBR1_EL1:		write_sysreg_s(val, SYS_TTBR1_EL12);	break;
>> +	case TCR_EL1:		write_sysreg_s(val, SYS_TCR_EL12);	break;
>> +	case ESR_EL1:		write_sysreg_s(val, SYS_ESR_EL12);	break;
>> +	case AFSR0_EL1:		write_sysreg_s(val, SYS_AFSR0_EL12);	break;
>> +	case AFSR1_EL1:		write_sysreg_s(val, SYS_AFSR1_EL12);	break;
>> +	case FAR_EL1:		write_sysreg_s(val, SYS_FAR_EL12);	break;
>> +	case MAIR_EL1:		write_sysreg_s(val, SYS_MAIR_EL12);	break;
>> +	case VBAR_EL1:		write_sysreg_s(val, SYS_VBAR_EL12);	break;
>> +	case CONTEXTIDR_EL1:	write_sysreg_s(val, SYS_CONTEXTIDR_EL12);break;
>> +	case TPIDR_EL0:		write_sysreg_s(val, SYS_TPIDR_EL0);	break;
>> +	case TPIDRRO_EL0:	write_sysreg_s(val, SYS_TPIDRRO_EL0);	break;
>> +	case TPIDR_EL1:		write_sysreg_s(val, SYS_TPIDR_EL1);	break;
>> +	case AMAIR_EL1:		write_sysreg_s(val, SYS_AMAIR_EL12);	break;
>> +	case CNTKCTL_EL1:	write_sysreg_s(val, SYS_CNTKCTL_EL12);	break;
>> +	case PAR_EL1:		write_sysreg_s(val, SYS_PAR_EL1);	break;
>> +	case DACR32_EL2:	write_sysreg_s(val, SYS_DACR32_EL2);	break;
>> +	case IFSR32_EL2:	write_sysreg_s(val, SYS_IFSR32_EL2);	break;
>> +	case DBGVCR32_EL2:	write_sysreg_s(val, SYS_DBGVCR32_EL2);	break;
>> +	default:		return false;
>> +	}
>> +
>> +	return true;
>> +}
>> +
>>  u64 vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg)
>>  {
>> -	u64 val;
>> +	u64 val = 0x8badf00d8badf00d;
>>  
>>  	if (!vcpu->arch.sysregs_loaded_on_cpu)
>> -		goto immediate_read;
>> +		goto memory_read;
>>  
>>  	if (unlikely(sysreg_is_el2(reg))) {
>>  		const struct el2_sysreg_map *el2_reg;
>>  
>>  		if (!is_hyp_ctxt(vcpu))
>> -			goto immediate_read;
>> +			goto memory_read;
>>  
>>  		switch (reg) {
>> +		case ELR_EL2:
>> +			return read_sysreg_el1(SYS_ELR);
> 
> Hmmm, this change feels a bit out of place.
> 
> Also, patch 13 added ELR_EL2 and SP_EL2 to the switch cases for physical
> sysreg accesses. Now ELR_EL2 is moved out of the main switch cases and
> SP_EL2 is completely omitted.
> 
> I'd say either patch 13 needs to be reworked, or a separate patch should
> be extracted from this one to provide an intermediate state, or the
> commit message on this patch should be more detailed.

Yeah, I wanted to actually squash most of this patch into #13, and got
sidetracked. Definitely room for improvement. I may even bring in the
rework I mentioned against #13.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...


* Re: [PATCH 07/59] KVM: arm64: nv: Add EL2 system registers to vcpu context
  2019-06-24 15:47   ` Alexandru Elisei
@ 2019-07-03 13:20     ` Marc Zyngier
  2019-07-03 16:01       ` Marc Zyngier
  0 siblings, 1 reply; 176+ messages in thread
From: Marc Zyngier @ 2019-07-03 13:20 UTC (permalink / raw)
  To: Alexandru Elisei, linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Dave Martin

On 24/06/2019 16:47, Alexandru Elisei wrote:
> On 6/21/19 10:37 AM, Marc Zyngier wrote:
>> From: Jintack Lim <jintack.lim@linaro.org>
>>
>> ARM v8.3 introduces a new bit in the HCR_EL2, which is the NV bit. When
>> this bit is set, accessing EL2 registers in EL1 traps to EL2. In
>> addition, executing the following instructions in EL1 will trap to EL2:
>> tlbi, at, eret, and msr/mrs instructions to access SP_EL1. Most of the
>> instructions that trap to EL2 with the NV bit were undef at EL1 prior to
>> ARM v8.3. The only instruction that was not undef is eret.
>>
>> This patch sets up a handler for EL2 registers and SP_EL1 register
>> accesses at EL1. The host hypervisor keeps those register values in
>> memory, and will emulate their behavior.
>>
>> This patch doesn't set the NV bit yet. It will be set in a later patch
>> once nested virtualization support is completed.
>>
>> Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>> ---
>>  arch/arm64/include/asm/kvm_host.h | 37 +++++++++++++++-
>>  arch/arm64/include/asm/sysreg.h   | 50 ++++++++++++++++++++-
>>  arch/arm64/kvm/sys_regs.c         | 74 ++++++++++++++++++++++++++++---
>>  3 files changed, 154 insertions(+), 7 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>> index 4bcd9c1291d5..2d4290d2513a 100644
>> --- a/arch/arm64/include/asm/kvm_host.h
>> +++ b/arch/arm64/include/asm/kvm_host.h
>> @@ -173,12 +173,47 @@ enum vcpu_sysreg {
>>  	APGAKEYLO_EL1,
>>  	APGAKEYHI_EL1,
>>  
>> -	/* 32bit specific registers. Keep them at the end of the range */
>> +	/* 32bit specific registers. */
>>  	DACR32_EL2,	/* Domain Access Control Register */
>>  	IFSR32_EL2,	/* Instruction Fault Status Register */
>>  	FPEXC32_EL2,	/* Floating-Point Exception Control Register */
>>  	DBGVCR32_EL2,	/* Debug Vector Catch Register */
>>  
>> +	/* EL2 registers sorted ascending by Op0, Op1, CRn, CRm, Op2 */
>> +	FIRST_EL2_SYSREG,
>> +	VPIDR_EL2 = FIRST_EL2_SYSREG,
>> +			/* Virtualization Processor ID Register */
>> +	VMPIDR_EL2,	/* Virtualization Multiprocessor ID Register */
>> +	SCTLR_EL2,	/* System Control Register (EL2) */
>> +	ACTLR_EL2,	/* Auxiliary Control Register (EL2) */
>> +	HCR_EL2,	/* Hypervisor Configuration Register */
>> +	MDCR_EL2,	/* Monitor Debug Configuration Register (EL2) */
>> +	CPTR_EL2,	/* Architectural Feature Trap Register (EL2) */
>> +	HSTR_EL2,	/* Hypervisor System Trap Register */
>> +	HACR_EL2,	/* Hypervisor Auxiliary Control Register */
>> +	TTBR0_EL2,	/* Translation Table Base Register 0 (EL2) */
>> +	TTBR1_EL2,	/* Translation Table Base Register 1 (EL2) */
>> +	TCR_EL2,	/* Translation Control Register (EL2) */
>> +	VTTBR_EL2,	/* Virtualization Translation Table Base Register */
>> +	VTCR_EL2,	/* Virtualization Translation Control Register */
>> +	SPSR_EL2,	/* EL2 saved program status register */
>> +	ELR_EL2,	/* EL2 exception link register */
>> +	AFSR0_EL2,	/* Auxiliary Fault Status Register 0 (EL2) */
>> +	AFSR1_EL2,	/* Auxiliary Fault Status Register 1 (EL2) */
>> +	ESR_EL2,	/* Exception Syndrome Register (EL2) */
>> +	FAR_EL2,	/* Hypervisor Fault Address Register */
>> +	HPFAR_EL2,	/* Hypervisor IPA Fault Address Register */
>> +	MAIR_EL2,	/* Memory Attribute Indirection Register (EL2) */
>> +	AMAIR_EL2,	/* Auxiliary Memory Attribute Indirection Register (EL2) */
>> +	VBAR_EL2,	/* Vector Base Address Register (EL2) */
>> +	RVBAR_EL2,	/* Reset Vector Base Address Register */
>> +	RMR_EL2,	/* Reset Management Register */
>> +	CONTEXTIDR_EL2,	/* Context ID Register (EL2) */
>> +	TPIDR_EL2,	/* EL2 Software Thread ID Register */
>> +	CNTVOFF_EL2,	/* Counter-timer Virtual Offset register */
>> +	CNTHCTL_EL2,	/* Counter-timer Hypervisor Control register */
>> +	SP_EL2,		/* EL2 Stack Pointer */
>> +
>>  	NR_SYS_REGS	/* Nothing after this line! */
>>  };
>>  
>> diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
>> index f3ca7e4796ab..8b95f2c42c3d 100644
>> --- a/arch/arm64/include/asm/sysreg.h
>> +++ b/arch/arm64/include/asm/sysreg.h
>> @@ -411,17 +411,49 @@
>>  
>>  #define SYS_PMCCFILTR_EL0		sys_reg(3, 3, 14, 15, 7)
>>  
>> +#define SYS_VPIDR_EL2			sys_reg(3, 4, 0, 0, 0)
>> +#define SYS_VMPIDR_EL2			sys_reg(3, 4, 0, 0, 5)
>> +
>> +#define SYS_SCTLR_EL2			sys_reg(3, 4, 1, 0, 0)
>> +#define SYS_ACTLR_EL2			sys_reg(3, 4, 1, 0, 1)
>> +#define SYS_HCR_EL2			sys_reg(3, 4, 1, 1, 0)
>> +#define SYS_MDCR_EL2			sys_reg(3, 4, 1, 1, 1)
>> +#define SYS_CPTR_EL2			sys_reg(3, 4, 1, 1, 2)
>> +#define SYS_HSTR_EL2			sys_reg(3, 4, 1, 1, 3)
>> +#define SYS_HACR_EL2			sys_reg(3, 4, 1, 1, 7)
>> +
>>  #define SYS_ZCR_EL2			sys_reg(3, 4, 1, 2, 0)
>> +
>> +#define SYS_TTBR0_EL2			sys_reg(3, 4, 2, 0, 0)
>> +#define SYS_TTBR1_EL2			sys_reg(3, 4, 2, 0, 1)
>> +#define SYS_TCR_EL2			sys_reg(3, 4, 2, 0, 2)
>> +#define SYS_VTTBR_EL2			sys_reg(3, 4, 2, 1, 0)
>> +#define SYS_VTCR_EL2			sys_reg(3, 4, 2, 1, 2)
>> +
>>  #define SYS_DACR32_EL2			sys_reg(3, 4, 3, 0, 0)
>> +
>>  #define SYS_SPSR_EL2			sys_reg(3, 4, 4, 0, 0)
>>  #define SYS_ELR_EL2			sys_reg(3, 4, 4, 0, 1)
>> +#define SYS_SP_EL1			sys_reg(3, 4, 4, 1, 0)
>> +
>>  #define SYS_IFSR32_EL2			sys_reg(3, 4, 5, 0, 1)
>> +#define SYS_AFSR0_EL2			sys_reg(3, 4, 5, 1, 0)
>> +#define SYS_AFSR1_EL2			sys_reg(3, 4, 5, 1, 1)
>>  #define SYS_ESR_EL2			sys_reg(3, 4, 5, 2, 0)
>>  #define SYS_VSESR_EL2			sys_reg(3, 4, 5, 2, 3)
>>  #define SYS_FPEXC32_EL2			sys_reg(3, 4, 5, 3, 0)
>>  #define SYS_FAR_EL2			sys_reg(3, 4, 6, 0, 0)
>>  
>> -#define SYS_VDISR_EL2			sys_reg(3, 4, 12, 1,  1)
>> +#define SYS_HPFAR_EL2			sys_reg(3, 4, 6, 0, 4)
>> +
>> +#define SYS_MAIR_EL2			sys_reg(3, 4, 10, 2, 0)
>> +#define SYS_AMAIR_EL2			sys_reg(3, 4, 10, 3, 0)
>> +
>> +#define SYS_VBAR_EL2			sys_reg(3, 4, 12, 0, 0)
>> +#define SYS_RVBAR_EL2			sys_reg(3, 4, 12, 0, 1)
>> +#define SYS_RMR_EL2			sys_reg(3, 4, 12, 0, 2)
>> +#define SYS_VDISR_EL2			sys_reg(3, 4, 12, 1, 1)
>>  #define __SYS__AP0Rx_EL2(x)		sys_reg(3, 4, 12, 8, x)
>>  #define SYS_ICH_AP0R0_EL2		__SYS__AP0Rx_EL2(0)
>>  #define SYS_ICH_AP0R1_EL2		__SYS__AP0Rx_EL2(1)
>> @@ -463,23 +495,37 @@
>>  #define SYS_ICH_LR14_EL2		__SYS__LR8_EL2(6)
>>  #define SYS_ICH_LR15_EL2		__SYS__LR8_EL2(7)
>>  
>> +#define SYS_CONTEXTIDR_EL2		sys_reg(3, 4, 13, 0, 1)
>> +#define SYS_TPIDR_EL2			sys_reg(3, 4, 13, 0, 2)
>> +
>> +#define SYS_CNTVOFF_EL2			sys_reg(3, 4, 14, 0, 3)
>> +#define SYS_CNTHCTL_EL2			sys_reg(3, 4, 14, 1, 0)
>> +
>>  /* VHE encodings for architectural EL0/1 system registers */
>>  #define SYS_SCTLR_EL12			sys_reg(3, 5, 1, 0, 0)
>>  #define SYS_CPACR_EL12			sys_reg(3, 5, 1, 0, 2)
>>  #define SYS_ZCR_EL12			sys_reg(3, 5, 1, 2, 0)
>> +
>>  #define SYS_TTBR0_EL12			sys_reg(3, 5, 2, 0, 0)
>>  #define SYS_TTBR1_EL12			sys_reg(3, 5, 2, 0, 1)
>>  #define SYS_TCR_EL12			sys_reg(3, 5, 2, 0, 2)
>> +
>>  #define SYS_SPSR_EL12			sys_reg(3, 5, 4, 0, 0)
>>  #define SYS_ELR_EL12			sys_reg(3, 5, 4, 0, 1)
>> +
>>  #define SYS_AFSR0_EL12			sys_reg(3, 5, 5, 1, 0)
>>  #define SYS_AFSR1_EL12			sys_reg(3, 5, 5, 1, 1)
>>  #define SYS_ESR_EL12			sys_reg(3, 5, 5, 2, 0)
>> +
>>  #define SYS_FAR_EL12			sys_reg(3, 5, 6, 0, 0)
>> +
>>  #define SYS_MAIR_EL12			sys_reg(3, 5, 10, 2, 0)
>>  #define SYS_AMAIR_EL12			sys_reg(3, 5, 10, 3, 0)
>> +
>>  #define SYS_VBAR_EL12			sys_reg(3, 5, 12, 0, 0)
>> +
>>  #define SYS_CONTEXTIDR_EL12		sys_reg(3, 5, 13, 0, 1)
>> +
>>  #define SYS_CNTKCTL_EL12		sys_reg(3, 5, 14, 1, 0)
>>  #define SYS_CNTP_TVAL_EL02		sys_reg(3, 5, 14, 2, 0)
>>  #define SYS_CNTP_CTL_EL02		sys_reg(3, 5, 14, 2, 1)
>> @@ -488,6 +534,8 @@
>>  #define SYS_CNTV_CTL_EL02		sys_reg(3, 5, 14, 3, 1)
>>  #define SYS_CNTV_CVAL_EL02		sys_reg(3, 5, 14, 3, 2)
>>  
>> +#define SYS_SP_EL2			sys_reg(3, 6,  4, 1, 0)
>> +
>>  /* Common SCTLR_ELx flags. */
>>  #define SCTLR_ELx_DSSBS	(_BITUL(44))
>>  #define SCTLR_ELx_ENIA	(_BITUL(31))
>> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
>> index adb8a7e9c8e4..e81be6debe07 100644
>> --- a/arch/arm64/kvm/sys_regs.c
>> +++ b/arch/arm64/kvm/sys_regs.c
>> @@ -184,6 +184,18 @@ static u32 get_ccsidr(u32 csselr)
>>  	return ccsidr;
>>  }
>>  
>> +static bool access_rw(struct kvm_vcpu *vcpu,
>> +		      struct sys_reg_params *p,
>> +		      const struct sys_reg_desc *r)
>> +{
>> +	if (p->is_write)
>> +		vcpu_write_sys_reg(vcpu, p->regval, r->reg);
>> +	else
>> +		p->regval = vcpu_read_sys_reg(vcpu, r->reg);
>> +
>> +	return true;
>> +}
>> +
>>  /*
>>   * See note at ARMv7 ARM B1.14.4 (TL;DR: S/W ops are not easily virtualized).
>>   */
>> @@ -394,12 +406,9 @@ static bool trap_debug_regs(struct kvm_vcpu *vcpu,
>>  			    struct sys_reg_params *p,
>>  			    const struct sys_reg_desc *r)
>>  {
>> -	if (p->is_write) {
>> -		vcpu_write_sys_reg(vcpu, p->regval, r->reg);
>> +	access_rw(vcpu, p, r);
>> +	if (p->is_write)
>>  		vcpu->arch.flags |= KVM_ARM64_DEBUG_DIRTY;
>> -	} else {
>> -		p->regval = vcpu_read_sys_reg(vcpu, r->reg);
>> -	}
>>  
>>  	trace_trap_reg(__func__, r->reg, p->is_write, p->regval);
>>  
>> @@ -1354,6 +1363,19 @@ static bool access_ccsidr(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
>>  	.set_user = set_raz_id_reg,		\
>>  }
>>  
>> +static bool access_sp_el1(struct kvm_vcpu *vcpu,
>> +			  struct sys_reg_params *p,
>> +			  const struct sys_reg_desc *r)
>> +{
>> +	/* SP_EL1 is NOT maintained in sys_regs array */
>> +	if (p->is_write)
>> +		vcpu->arch.ctxt.gp_regs.sp_el1 = p->regval;
>> +	else
>> +		p->regval = vcpu->arch.ctxt.gp_regs.sp_el1;
>> +
>> +	return true;
>> +}
>> +
>>  /*
>>   * Architected system registers.
>>   * Important: Must be sorted ascending by Op0, Op1, CRn, CRm, Op2
>> @@ -1646,9 +1668,51 @@ static const struct sys_reg_desc sys_reg_descs[] = {
>>  	 */
>>  	{ SYS_DESC(SYS_PMCCFILTR_EL0), access_pmu_evtyper, reset_val, PMCCFILTR_EL0, 0 },
> I have to admit I haven't gone through all the patches, or maybe this is part of
> the bits that will be added at a later date, but some of the reset values seem
> incorrect according to ARM DDI 0487D.a. I'll comment below the relevant registers.
>>  
>> +	{ SYS_DESC(SYS_VPIDR_EL2), access_rw, reset_val, VPIDR_EL2, 0 },
>> +	{ SYS_DESC(SYS_VMPIDR_EL2), access_rw, reset_val, VMPIDR_EL2, 0 },
>> +
>> +	{ SYS_DESC(SYS_SCTLR_EL2), access_rw, reset_val, SCTLR_EL2, 0 },
> Some bits are RES1 for SCTLR_EL2.

See Patch #67.
>> +	{ SYS_DESC(SYS_ACTLR_EL2), access_rw, reset_val, ACTLR_EL2, 0 },
>> +	{ SYS_DESC(SYS_HCR_EL2), access_rw, reset_val, HCR_EL2, 0 },
>> +	{ SYS_DESC(SYS_MDCR_EL2), access_rw, reset_val, MDCR_EL2, 0 },
>> +	{ SYS_DESC(SYS_CPTR_EL2), access_rw, reset_val, CPTR_EL2, 0 },
> Some bits are RES1 for CPTR_EL2 if HCR_EL2.E2H == 0, which the reset value for
> HCR_EL2 seems to imply.

Correct.

>> +	{ SYS_DESC(SYS_HSTR_EL2), access_rw, reset_val, HSTR_EL2, 0 },
>> +	{ SYS_DESC(SYS_HACR_EL2), access_rw, reset_val, HACR_EL2, 0 },
>> +
>> +	{ SYS_DESC(SYS_TTBR0_EL2), access_rw, reset_val, TTBR0_EL2, 0 },
>> +	{ SYS_DESC(SYS_TTBR1_EL2), access_rw, reset_val, TTBR1_EL2, 0 },
>> +	{ SYS_DESC(SYS_TCR_EL2), access_rw, reset_val, TCR_EL2, 0 },
> Same here, bits 31 and 23 are RES1 for TCR_EL2 when HCR_EL2.E2H == 0.

Indeed. This requires separate handling altogether.
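
Something like this, maybe (untested sketch; TCR_EL2_RES1 standing for
bits 31 and 23, and only valid while the guest's HCR_EL2.E2H is clear):

	#define TCR_EL2_RES1	(BIT(31) | BIT(23))

	static void reset_tcr_el2(struct kvm_vcpu *vcpu,
				  const struct sys_reg_desc *r)
	{
		/* nVHE view of TCR_EL2: bits 31 and 23 are RES1 */
		__vcpu_sys_reg(vcpu, TCR_EL2) = TCR_EL2_RES1;
	}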

>> +	{ SYS_DESC(SYS_VTTBR_EL2), access_rw, reset_val, VTTBR_EL2, 0 },
>> +	{ SYS_DESC(SYS_VTCR_EL2), access_rw, reset_val, VTCR_EL2, 0 },
>> +
>>  	{ SYS_DESC(SYS_DACR32_EL2), NULL, reset_unknown, DACR32_EL2 },
>> +	{ SYS_DESC(SYS_SPSR_EL2), access_rw, reset_val, SPSR_EL2, 0 },
>> +	{ SYS_DESC(SYS_ELR_EL2), access_rw, reset_val, ELR_EL2, 0 },
>> +	{ SYS_DESC(SYS_SP_EL1), access_sp_el1},
>> +
>>  	{ SYS_DESC(SYS_IFSR32_EL2), NULL, reset_unknown, IFSR32_EL2 },
>> +	{ SYS_DESC(SYS_AFSR0_EL2), access_rw, reset_val, AFSR0_EL2, 0 },
>> +	{ SYS_DESC(SYS_AFSR1_EL2), access_rw, reset_val, AFSR1_EL2, 0 },
>> +	{ SYS_DESC(SYS_ESR_EL2), access_rw, reset_val, ESR_EL2, 0 },
>>  	{ SYS_DESC(SYS_FPEXC32_EL2), NULL, reset_val, FPEXC32_EL2, 0x700 },
>> +
>> +	{ SYS_DESC(SYS_FAR_EL2), access_rw, reset_val, FAR_EL2, 0 },
>> +	{ SYS_DESC(SYS_HPFAR_EL2), access_rw, reset_val, HPFAR_EL2, 0 },
>> +
>> +	{ SYS_DESC(SYS_MAIR_EL2), access_rw, reset_val, MAIR_EL2, 0 },
>> +	{ SYS_DESC(SYS_AMAIR_EL2), access_rw, reset_val, AMAIR_EL2, 0 },
>> +
>> +	{ SYS_DESC(SYS_VBAR_EL2), access_rw, reset_val, VBAR_EL2, 0 },
>> +	{ SYS_DESC(SYS_RVBAR_EL2), access_rw, reset_val, RVBAR_EL2, 0 },
>> +	{ SYS_DESC(SYS_RMR_EL2), access_rw, reset_val, RMR_EL2, 0 },
> Bit AA64 [0] for RMR_EL2 is RAO/WI when EL2 cannot be AArch32, which is what the
> patches seem to enforce.

Yup.

I guess I'll end up splitting those registers out of this patch and
handle them separately.
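
For RMR_EL2, for example, a dedicated accessor forcing the AA64 bit
would probably do (rough sketch only):

	static bool access_rmr_el2(struct kvm_vcpu *vcpu,
				   struct sys_reg_params *p,
				   const struct sys_reg_desc *r)
	{
		/* Bit 0 (AA64) is RAO/WI when EL2 cannot be AArch32 */
		if (p->is_write)
			__vcpu_sys_reg(vcpu, RMR_EL2) = p->regval | BIT(0);
		else
			p->regval = __vcpu_sys_reg(vcpu, RMR_EL2) | BIT(0);

		return true;
	}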

>> +
>> +	{ SYS_DESC(SYS_CONTEXTIDR_EL2), access_rw, reset_val, CONTEXTIDR_EL2, 0 },
>> +	{ SYS_DESC(SYS_TPIDR_EL2), access_rw, reset_val, TPIDR_EL2, 0 },
>> +
>> +	{ SYS_DESC(SYS_CNTVOFF_EL2), access_rw, reset_val, CNTVOFF_EL2, 0 },
>> +	{ SYS_DESC(SYS_CNTHCTL_EL2), access_rw, reset_val, CNTHCTL_EL2, 0 },
>> +
>> +	{ SYS_DESC(SYS_SP_EL2), NULL, reset_unknown, SP_EL2 },
>>  };
>>  
>>  static bool trap_dbgidr(struct kvm_vcpu *vcpu,

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH 16/59] KVM: arm64: nv: Save/Restore vEL2 sysregs
  2019-06-25  8:48   ` Julien Thierry
@ 2019-07-03 13:42     ` Marc Zyngier
  0 siblings, 0 replies; 176+ messages in thread
From: Marc Zyngier @ 2019-07-03 13:42 UTC (permalink / raw)
  To: Julien Thierry, linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	James Morse, Suzuki K Poulose

On 25/06/2019 09:48, Julien Thierry wrote:
> 
> 
> On 06/21/2019 10:38 AM, Marc Zyngier wrote:
>> From: Andre Przywara <andre.przywara@arm.com>
>>
>> Whenever we need to restore the guest's system registers to the CPU, we
>> now need to take care of the EL2 system registers as well. Most of them
>> are accessed via traps only, but some have an immediate effect, and a
>> guest running in VHE mode would also expect them to be accessible via their
>> EL1 encoding, which we do not trap.
>>
>> Split the current __sysreg_{save,restore}_el1_state() functions into
>> handling common sysregs, then differentiate between the guest running in
>> vEL2 and vEL1.
>>
>> For vEL2 we write the virtual EL2 registers with an identical format directly
>> into their EL1 counterpart, and translate the few registers that have a
>> different format for the same effect on the execution when running a
>> non-VHE guest hypervisor.
>>
>>   [ Commit message reworked and many bug fixes applied by Marc Zyngier
>>     and Christoffer Dall. ]
>>
>> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
>> ---
>>  arch/arm64/kvm/hyp/sysreg-sr.c | 160 +++++++++++++++++++++++++++++++--
>>  1 file changed, 153 insertions(+), 7 deletions(-)
>>
>> diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
>> index 62866a68e852..2abb9c3ff24f 100644
>> --- a/arch/arm64/kvm/hyp/sysreg-sr.c
>> +++ b/arch/arm64/kvm/hyp/sysreg-sr.c
> 
> [...]
> 
>> @@ -124,10 +167,91 @@ static void __hyp_text __sysreg_restore_user_state(struct kvm_cpu_context *ctxt)
>>  	write_sysreg(ctxt->sys_regs[TPIDRRO_EL0],	tpidrro_el0);
>>  }
>>  
>> -static void __hyp_text __sysreg_restore_el1_state(struct kvm_cpu_context *ctxt)
>> +static void __sysreg_restore_vel2_state(struct kvm_cpu_context *ctxt)
>>  {
>> +	u64 val;
>> +
>> +	write_sysreg(read_cpuid_id(),			vpidr_el2);
>>  	write_sysreg(ctxt->sys_regs[MPIDR_EL1],		vmpidr_el2);
>> -	write_sysreg(ctxt->sys_regs[CSSELR_EL1],	csselr_el1);
>> +	write_sysreg_el1(ctxt->sys_regs[MAIR_EL2],	SYS_MAIR);
>> +	write_sysreg_el1(ctxt->sys_regs[VBAR_EL2],	SYS_VBAR);
>> +	write_sysreg_el1(ctxt->sys_regs[CONTEXTIDR_EL2],SYS_CONTEXTIDR);
>> +	write_sysreg_el1(ctxt->sys_regs[AMAIR_EL2],	SYS_AMAIR);
>> +
>> +	if (__vcpu_el2_e2h_is_set(ctxt)) {
>> +		/*
>> +		 * In VHE mode those registers are compatible between
>> +		 * EL1 and EL2.
>> +		 */
>> +		write_sysreg_el1(ctxt->sys_regs[SCTLR_EL2],	SYS_SCTLR);
>> +		write_sysreg_el1(ctxt->sys_regs[CPTR_EL2],	SYS_CPACR);
>> +		write_sysreg_el1(ctxt->sys_regs[TTBR0_EL2],	SYS_TTBR0);
>> +		write_sysreg_el1(ctxt->sys_regs[TTBR1_EL2],	SYS_TTBR1);
>> +		write_sysreg_el1(ctxt->sys_regs[TCR_EL2],	SYS_TCR);
>> +		write_sysreg_el1(ctxt->sys_regs[CNTHCTL_EL2],	SYS_CNTKCTL);
>> +	} else {
>> +		write_sysreg_el1(translate_sctlr(ctxt->sys_regs[SCTLR_EL2]),
>> +				 SYS_SCTLR);
>> +		write_sysreg_el1(translate_cptr(ctxt->sys_regs[CPTR_EL2]),
>> +				 SYS_CPACR);
>> +		write_sysreg_el1(translate_ttbr0(ctxt->sys_regs[TTBR0_EL2]),
>> +				 SYS_TTBR0);
>> +		write_sysreg_el1(translate_tcr(ctxt->sys_regs[TCR_EL2]),
>> +				 SYS_TCR);
>> +		write_sysreg_el1(translate_cnthctl(ctxt->sys_regs[CNTHCTL_EL2]),
>> +				 SYS_CNTKCTL);
>> +	}
>> +
>> +	/*
>> +	 * These registers can be modified behind our back by a fault
>> +	 * taken inside vEL2. Save them, always.
>> +	 */
>> +	write_sysreg_el1(ctxt->sys_regs[ESR_EL2],	SYS_ESR);
>> +	write_sysreg_el1(ctxt->sys_regs[AFSR0_EL2],	SYS_AFSR0);
>> +	write_sysreg_el1(ctxt->sys_regs[AFSR1_EL2],	SYS_AFSR1);
>> +	write_sysreg_el1(ctxt->sys_regs[FAR_EL2],	SYS_FAR);
>> +	write_sysreg(ctxt->sys_regs[SP_EL2],		sp_el1);
>> +	write_sysreg_el1(ctxt->sys_regs[ELR_EL2],	SYS_ELR);
>> +
>> +	val = __fixup_spsr_el2_write(ctxt, ctxt->sys_regs[SPSR_EL2]);
>> +	write_sysreg_el1(val,	SYS_SPSR);
>> +}
>> +
>> +static void __hyp_text __sysreg_restore_vel1_state(struct kvm_cpu_context *ctxt)
>> +{
>> +	u64 mpidr;
>> +
>> +	if (has_vhe()) {
>> +		struct kvm_vcpu *vcpu;
>> +
>> +		/*
>> +		 * Warning: this hack only works on VHE, because we only
>> +		 * call this with the *guest* context, which is part of
>> +		 * struct kvm_vcpu. On a host context, you'd get pure junk.
>> +		 */
>> +		vcpu = container_of(ctxt, struct kvm_vcpu, arch.ctxt);
> 
> This seems very fragile, just to find out whether the guest has hyp
> capabilities. It would be at least nice to make sure this is indeed a
> guest context.

Oh come on! It is such a nice hack! ;-) I distinctly remember
Christoffer being >that< close to vomiting when he first saw it.

More seriously, we know what the context is by construction.

> The *clean* way to do it could be to have a pointer to kvm_vcpu in the
> kvm_cpu_context which would be NULL for host contexts.

Funny you mention that. We have the exact opposite (host context
pointing to the running vcpu, and NULL in the guest context). Maybe we
can come up with something that always points to the vcpu, assuming
nothing yet checks for NULL to identify a guest context! ;-)

> Otherwise, I'm under the impression that for a host context,
> ctxt->sys_reg[HCR_EL2] == 0 and that this would also be true for a guest
> without nested virt capability. Could we use something like that here?

Urgh. I think I'd prefer the above suggestion. Or even my hack.
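
For reference, the "clean" variant we're discussing boils down to
something like this (sketch only, field name invented):

	struct kvm_cpu_context {
		/* ... existing register state ... */
		struct kvm_vcpu *vcpu;	/* valid for host and guest alike */
	};

at which point __sysreg_restore_vel1_state() could simply do

	vcpu = ctxt->vcpu;

instead of playing container_of() games.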

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH 35/59] KVM: arm/arm64: nv: Support multiple nested stage 2 mmu structures
  2019-06-25 12:19   ` Alexandru Elisei
@ 2019-07-03 13:47     ` Marc Zyngier
  0 siblings, 0 replies; 176+ messages in thread
From: Marc Zyngier @ 2019-07-03 13:47 UTC (permalink / raw)
  To: Alexandru Elisei, linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Dave Martin

On 25/06/2019 13:19, Alexandru Elisei wrote:
> On 6/21/19 10:38 AM, Marc Zyngier wrote:
>> From: Christoffer Dall <christoffer.dall@arm.com>
>>
>> Add stage 2 mmu data structures for virtual EL2 and for nested guests.
>> We don't yet populate shadow stage 2 page tables, but we now have a
>> framework for getting to a shadow stage 2 pgd.
>>
>> We allocate twice the number of vcpus as stage 2 mmu structures because
>> that's sufficient for each vcpu running two VMs without having to flush
>> the stage 2 page tables.
>>
>> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>> ---
>>  arch/arm/include/asm/kvm_host.h     |   4 +
>>  arch/arm/include/asm/kvm_mmu.h      |   3 +
>>  arch/arm64/include/asm/kvm_host.h   |  28 +++++
>>  arch/arm64/include/asm/kvm_mmu.h    |   8 ++
>>  arch/arm64/include/asm/kvm_nested.h |   7 ++
>>  arch/arm64/kvm/nested.c             | 172 ++++++++++++++++++++++++++++
>>  virt/kvm/arm/arm.c                  |  16 ++-
>>  virt/kvm/arm/mmu.c                  |  31 ++---
>>  8 files changed, 254 insertions(+), 15 deletions(-)
>>
>> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
>> index e3217c4ad25b..b821eb2383ad 100644
>> --- a/arch/arm/include/asm/kvm_host.h
>> +++ b/arch/arm/include/asm/kvm_host.h
>> @@ -424,4 +424,8 @@ static inline bool kvm_arm_vcpu_is_finalized(struct kvm_vcpu *vcpu)
>>  	return true;
>>  }
>>  
>> +static inline void kvm_vcpu_load_hw_mmu(struct kvm_vcpu *vcpu) {}
>> +static inline void kvm_vcpu_put_hw_mmu(struct kvm_vcpu *vcpu) {}
>> +static inline int kvm_vcpu_init_nested(struct kvm_vcpu *vcpu) { return 0; }
>> +
>>  #endif /* __ARM_KVM_HOST_H__ */
>> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
>> index be23e3f8e08c..e6984b6da2ce 100644
>> --- a/arch/arm/include/asm/kvm_mmu.h
>> +++ b/arch/arm/include/asm/kvm_mmu.h
>> @@ -420,6 +420,9 @@ static inline int hyp_map_aux_data(void)
>>  
>>  static inline void kvm_set_ipa_limit(void) {}
>>  
>> +static inline void kvm_init_s2_mmu(struct kvm_s2_mmu *mmu) {}
>> +static inline void kvm_init_nested(struct kvm *kvm) {}
>> +
>>  static __always_inline u64 kvm_get_vttbr(struct kvm_s2_mmu *mmu)
>>  {
>>  	struct kvm_vmid *vmid = &mmu->vmid;
>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>> index 3dee5e17a4ee..cc238de170d2 100644
>> --- a/arch/arm64/include/asm/kvm_host.h
>> +++ b/arch/arm64/include/asm/kvm_host.h
>> @@ -88,11 +88,39 @@ struct kvm_s2_mmu {
>>  	phys_addr_t	pgd_phys;
>>  
>>  	struct kvm *kvm;
>> +
>> +	/*
>> +	 * For a shadow stage-2 MMU, the virtual vttbr programmed by the guest
>> +	 * hypervisor.  Unused for kvm_arch->mmu. Set to 1 when the structure
>> +	 * contains no valid information.
>> +	 */
>> +	u64	vttbr;
>> +
>> +	/* true when this represents a nested context where virtual HCR_EL2.VM == 1 */
>> +	bool	nested_stage2_enabled;
>> +
>> +	/*
>> +	 *  0: Nobody is currently using this, check vttbr for validity
>> +	 * >0: Somebody is actively using this.
>> +	 */
>> +	atomic_t refcnt;
>>  };
>>  
>> +static inline bool kvm_s2_mmu_valid(struct kvm_s2_mmu *mmu)
>> +{
>> +	return !(mmu->vttbr & 1);
>> +}
>> +
>>  struct kvm_arch {
>>  	struct kvm_s2_mmu mmu;
>>  
>> +	/*
>> +	 * Stage 2 paging state for VMs with nested virtualization, using a virtual
>> +	 * VMID.
>> +	 */
>> +	struct kvm_s2_mmu *nested_mmus;
>> +	size_t nested_mmus_size;
>> +
>>  	/* VTCR_EL2 value for this VM */
>>  	u64    vtcr;
>>  
>> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
>> index 1eb6e0ca61c2..32bcaa1845dc 100644
>> --- a/arch/arm64/include/asm/kvm_mmu.h
>> +++ b/arch/arm64/include/asm/kvm_mmu.h
>> @@ -100,6 +100,7 @@ alternative_cb_end
>>  #include <asm/mmu_context.h>
>>  #include <asm/pgtable.h>
>>  #include <asm/kvm_emulate.h>
>> +#include <asm/kvm_nested.h>
>>  
>>  void kvm_update_va_mask(struct alt_instr *alt,
>>  			__le32 *origptr, __le32 *updptr, int nr_inst);
>> @@ -164,6 +165,7 @@ int create_hyp_exec_mappings(phys_addr_t phys_addr, size_t size,
>>  			     void **haddr);
>>  void free_hyp_pgds(void);
>>  
>> +void kvm_unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size);
>>  void stage2_unmap_vm(struct kvm *kvm);
>>  int kvm_alloc_stage2_pgd(struct kvm_s2_mmu *mmu);
>>  void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu);
>> @@ -635,5 +637,11 @@ static __always_inline void __load_guest_stage2(struct kvm_s2_mmu *mmu)
>>  	asm(ALTERNATIVE("nop", "isb", ARM64_WORKAROUND_1165522));
>>  }
>>  
>> +static inline u64 get_vmid(u64 vttbr)
>> +{
>> +	return (vttbr & VTTBR_VMID_MASK(kvm_get_vmid_bits())) >>
>> +		VTTBR_VMID_SHIFT;
>> +}
>> +
>>  #endif /* __ASSEMBLY__ */
>>  #endif /* __ARM64_KVM_MMU_H__ */
>> diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
>> index 61e71d0d2151..d4021d0892bd 100644
>> --- a/arch/arm64/include/asm/kvm_nested.h
>> +++ b/arch/arm64/include/asm/kvm_nested.h
>> @@ -10,6 +10,13 @@ static inline bool nested_virt_in_use(const struct kvm_vcpu *vcpu)
>>  		test_bit(KVM_ARM_VCPU_NESTED_VIRT, vcpu->arch.features);
>>  }
>>  
>> +extern void kvm_init_nested(struct kvm *kvm);
>> +extern int kvm_vcpu_init_nested(struct kvm_vcpu *vcpu);
>> +extern void kvm_init_s2_mmu(struct kvm_s2_mmu *mmu);
>> +extern struct kvm_s2_mmu *lookup_s2_mmu(struct kvm *kvm, u64 vttbr, u64 hcr);
>> +extern void kvm_vcpu_load_hw_mmu(struct kvm_vcpu *vcpu);
>> +extern void kvm_vcpu_put_hw_mmu(struct kvm_vcpu *vcpu);
>> +
>>  int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe);
>>  extern bool forward_traps(struct kvm_vcpu *vcpu, u64 control_bit);
>>  extern bool forward_nv_traps(struct kvm_vcpu *vcpu);
>> diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
>> index 3872e3cf1691..4b38dc5c0be3 100644
>> --- a/arch/arm64/kvm/nested.c
>> +++ b/arch/arm64/kvm/nested.c
>> @@ -18,7 +18,161 @@
>>  #include <linux/kvm.h>
>>  #include <linux/kvm_host.h>
>>  
>> +#include <asm/kvm_arm.h>
>>  #include <asm/kvm_emulate.h>
>> +#include <asm/kvm_mmu.h>
>> +#include <asm/kvm_nested.h>
>> +
>> +void kvm_init_nested(struct kvm *kvm)
>> +{
>> +	kvm_init_s2_mmu(&kvm->arch.mmu);
>> +
>> +	kvm->arch.nested_mmus = NULL;
>> +	kvm->arch.nested_mmus_size = 0;
>> +}
>> +
>> +int kvm_vcpu_init_nested(struct kvm_vcpu *vcpu)
>> +{
>> +	struct kvm *kvm = vcpu->kvm;
>> +	struct kvm_s2_mmu *tmp;
>> +	int num_mmus;
>> +	int ret = -ENOMEM;
>> +
>> +	if (!test_bit(KVM_ARM_VCPU_NESTED_VIRT, vcpu->arch.features))
>> +		return 0;
>> +
>> +	if (!cpus_have_const_cap(ARM64_HAS_NESTED_VIRT))
>> +		return -EINVAL;
> 
> Here we fail if the KVM_ARM_VCPU_NESTED_VIRT feature was requested for the
> vcpu, but the nested capability isn't present. This function is called as a
> result of the KVM_ARM_VCPU_INIT, and when this function fails, the
> KVM_ARM_VCPU_INIT ioctl will also fail. This means that we cannot have a vcpu
> with the nested virt feature on a system which doesn't support nested
> virtualization.
> 
> However, commit 04/59 "KVM: arm64: nv: Introduce nested virtualization VCPU
> feature" added the function nested_virt_in_use (in
> arch/arm64/include/asm/kvm_nested.h) which checks for **both** conditions before
> returning true. I believe the capability check is not required in
> nested_virt_in_use.

The capability check is a static branch, meaning that there is no
actual evaluation, just a branch. If you remove this branch, you end up
checking the feature bit even on systems that do not have NV, and that'd
be a measurable performance drag.
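
For the record, the helper reads roughly as:

	static inline bool nested_virt_in_use(const struct kvm_vcpu *vcpu)
	{
		/*
		 * cpus_have_const_cap() compiles down to a static branch,
		 * so on a !NV system this returns false without ever
		 * loading the per-vcpu feature bitmap.
		 */
		return cpus_have_const_cap(ARM64_HAS_NESTED_VIRT) &&
			test_bit(KVM_ARM_VCPU_NESTED_VIRT,
				 vcpu->arch.features);
	}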

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH 32/59] KVM: arm64: nv: Hide RAS from nested guests
  2019-06-21  9:38 ` [PATCH 32/59] KVM: arm64: nv: Hide RAS from nested guests Marc Zyngier
@ 2019-07-03 13:59   ` Alexandru Elisei
  0 siblings, 0 replies; 176+ messages in thread
From: Alexandru Elisei @ 2019-07-03 13:59 UTC (permalink / raw)
  To: Marc Zyngier, linux-arm-kernel, kvmarm, kvm; +Cc: Andre Przywara, Dave P Martin


On 6/21/19 10:38 AM, Marc Zyngier wrote:
> We don't want to expose complicated features to guests until we have
> a good grasp on the basic CPU emulation. So let's pretend that RAS,
> just like SVE, doesn't exist in a nested guest.
>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  arch/arm64/kvm/sys_regs.c | 32 +++++++++++++++++++++++++++++---
>  1 file changed, 29 insertions(+), 3 deletions(-)
>
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 34f1b79f7856..ec34b81da936 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -577,6 +577,14 @@ static bool trap_raz_wi(struct kvm_vcpu *vcpu,
>               return read_zero(vcpu, p);
>  }
>
> +static bool trap_undef(struct kvm_vcpu *vcpu,
> +                    struct sys_reg_params *p,
> +                    const struct sys_reg_desc *r)
> +{
> +     kvm_inject_undefined(vcpu);
> +     return false;
> +}
> +
>  /*
>   * ARMv8.1 mandates at least a trivial LORegion implementation, where all the
>   * RW registers are RES0 (which we can implement as RAZ/WI). On an ARMv8.0
> @@ -1601,13 +1609,15 @@ static bool access_ccsidr(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
>  }
>
>  /* sys_reg_desc initialiser for known cpufeature ID registers */
> -#define ID_SANITISED(name) {                 \
> +#define ID_SANITISED_FN(name, fn) {          \
>       SYS_DESC(SYS_##name),                   \
> -     .access = access_id_reg,                \
> +     .access = fn,                           \
>       .get_user = get_id_reg,                 \
>       .set_user = set_id_reg,                 \
>  }
>
> +#define ID_SANITISED(name)   ID_SANITISED_FN(name, access_id_reg)
> +
>  /*
>   * sys_reg_desc initialiser for architecturally unallocated cpufeature ID
>   * register with encoding Op0=3, Op1=0, CRn=0, CRm=crm, Op2=op2
> @@ -1700,6 +1710,21 @@ static bool access_spsr_el2(struct kvm_vcpu *vcpu,
>       return true;
>  }
>
> +static bool access_id_aa64pfr0_el1(struct kvm_vcpu *v,
> +                                struct sys_reg_params *p,
> +                                const struct sys_reg_desc *r)
> +{
> +     u64 val;
> +
> +     if (!nested_virt_in_use(v) || p->is_write)
> +             return access_id_reg(v, p, r);

So SVE is masked in the nested case in access_id_reg (which calls read_id_reg,
modified in patch 25 of the series). Looks to me that the above condition means
that when nested virtualization is in use, on reads we don't go through
access_id_reg and we could end up with SVE support advertised to the guest. How
about we hide SVE from guests here, just like we do with RAS?
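
Something along these lines, maybe (untested, reusing the RAS masking
below; ID_AA64PFR0_SVE_SHIFT taken from sysreg.h):

	val = read_sanitised_ftr_reg(SYS_ID_AA64PFR0_EL1);
	val &= ~(0xf << ID_AA64PFR0_RAS_SHIFT);
	/* The SVE field lives at bit 32, hence the UL */
	val &= ~(0xfUL << ID_AA64PFR0_SVE_SHIFT);
	p->regval = val;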

> +
> +     val = read_sanitised_ftr_reg(SYS_ID_AA64PFR0_EL1);
> +     p->regval = val & ~(0xf << ID_AA64PFR0_RAS_SHIFT);
> +
> +     return true;
> +}
> +
>  /*
>   * Architected system registers.
>   * Important: Must be sorted ascending by Op0, Op1, CRn, CRm, Op2
> @@ -1791,7 +1816,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
>
>       /* AArch64 ID registers */
>       /* CRm=4 */
> -     ID_SANITISED(ID_AA64PFR0_EL1),
> +     ID_SANITISED_FN(ID_AA64PFR0_EL1, access_id_aa64pfr0_el1),
>       ID_SANITISED(ID_AA64PFR1_EL1),
>       ID_UNALLOCATED(4,2),
>       ID_UNALLOCATED(4,3),
> @@ -2032,6 +2057,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
>       { SYS_DESC(SYS_VBAR_EL2), access_rw, reset_val, VBAR_EL2, 0 },
>       { SYS_DESC(SYS_RVBAR_EL2), access_rw, reset_val, RVBAR_EL2, 0 },
>       { SYS_DESC(SYS_RMR_EL2), access_rw, reset_val, RMR_EL2, 0 },
> +     { SYS_DESC(SYS_VDISR_EL2), trap_undef },
>
>       { SYS_DESC(SYS_CONTEXTIDR_EL2), access_rw, reset_val, CONTEXTIDR_EL2, 0 },
>       { SYS_DESC(SYS_TPIDR_EL2), access_rw, reset_val, TPIDR_EL2, 0 },

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH 33/59] KVM: arm64: nv: Pretend we only support larger-than-host page sizes
  2019-06-21  9:38 ` [PATCH 33/59] KVM: arm64: nv: Pretend we only support larger-than-host page sizes Marc Zyngier
@ 2019-07-03 14:13   ` Alexandru Elisei
  0 siblings, 0 replies; 176+ messages in thread
From: Alexandru Elisei @ 2019-07-03 14:13 UTC (permalink / raw)
  To: Marc Zyngier, linux-arm-kernel, kvmarm, kvm; +Cc: Andre Przywara, Dave Martin


On 6/21/19 10:38 AM, Marc Zyngier wrote:
> From: Jintack Lim <jintack.lim@linaro.org>
>
> Exposing memory management support to the virtual EL2 as is exposed to
> the host hypervisor would make the implementation too complex and
> inefficient. Therefore expose limited memory management support for the
> following two cases.
>
> We expose the same or larger page granules than the ones the host uses.  We can
> theoretically support a guest hypervisor having smaller-than-host
> granularities but it is not worth it since it makes the implementation
> complicated and it would waste memory.
>
> We expose 40 bits of physical address range to the virtual EL2, because
> we only support a 40bit IPA for the guest. Eventually, this will change.
>
>   [ This was only trapping on the 32-bit encoding, also using the
>     current target register value as a base for the sanitisation.
>
>     Use as the handler for the 64-bit sysreg as well, also load the
>     sanitised version of the sysreg before clearing and setting bits.
>
>     -- Andre Przywara ]
>
> Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  arch/arm64/kvm/sys_regs.c | 50 ++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 49 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index ec34b81da936..cc994ec3c121 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -1710,6 +1710,54 @@ static bool access_spsr_el2(struct kvm_vcpu *vcpu,
>  	return true;
>  }
>  
> +static bool access_id_aa64mmfr0_el1(struct kvm_vcpu *v,
> +				    struct sys_reg_params *p,
> +				    const struct sys_reg_desc *r)
> +{
> +	u64 val;
> +
> +	if (p->is_write)
> +		return write_to_read_only(v, p, r);
> +
> +	val = read_id_reg(v, r, false);
> +
> +	if (!nested_virt_in_use(v))
> +		goto out;
> +
> +	/*
> +	 * Don't expose granules smaller than the host's granule to the guest.
> +	 * We can theoretically support a guest hypervisor having
> +	 * smaller-than-host granularities but it is not worth it since it
> +	 * makes the implementation complicated and it would waste memory.
> +	 */
> +	switch (PAGE_SIZE) {
> +	case SZ_64K:
> +		/* 16KB granule not supported */
> +		val &= ~(0xf << ID_AA64MMFR0_TGRAN16_SHIFT);
> +		val |= (ID_AA64MMFR0_TGRAN16_NI << ID_AA64MMFR0_TGRAN16_SHIFT);
> +		/* fall through */
> +	case SZ_16K:
> +		/* 4KB granule not supported */
> +		val &= ~(0xf << ID_AA64MMFR0_TGRAN4_SHIFT);
> +		val |= (ID_AA64MMFR0_TGRAN4_NI << ID_AA64MMFR0_TGRAN4_SHIFT);
> +		break;
> +	case SZ_4K:
> +		/* All granule sizes are supported */
> +		break;
> +	default:
> +		unreachable();
> +	}
> +
> +	/* Expose only 40 bits physical address range to the guest hypervisor */
> +	val &= ~(0xf << ID_AA64MMFR0_PARANGE_SHIFT);
> +	val |= (0x2 << ID_AA64MMFR0_PARANGE_SHIFT); /* 40 bits */

There are already defines for ID_AA64MMFR0_PARANGE_48 and
ID_AA64MMFR0_PARANGE_52 in sysreg.h, perhaps a similar define for
ID_AA64MMFR0_PARANGE_40 would be appropriate?
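
Something like this, for instance (sketch):

	#define ID_AA64MMFR0_PARANGE_40		0x2

	/* Expose only 40 bits physical address range to the guest hypervisor */
	val &= ~(0xf << ID_AA64MMFR0_PARANGE_SHIFT);
	val |= (ID_AA64MMFR0_PARANGE_40 << ID_AA64MMFR0_PARANGE_SHIFT);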

> +
> +out:
> +	p->regval = val;
> +
> +	return true;
> +}
> +
>  static bool access_id_aa64pfr0_el1(struct kvm_vcpu *v,
>  				   struct sys_reg_params *p,
>  				   const struct sys_reg_desc *r)
> @@ -1846,7 +1894,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
>  	ID_UNALLOCATED(6,7),
>  
>  	/* CRm=7 */
> -	ID_SANITISED(ID_AA64MMFR0_EL1),
> +	ID_SANITISED_FN(ID_AA64MMFR0_EL1, access_id_aa64mmfr0_el1),
>  	ID_SANITISED(ID_AA64MMFR1_EL1),
>  	ID_SANITISED(ID_AA64MMFR2_EL1),
>  	ID_UNALLOCATED(7,3),

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH 21/59] KVM: arm64: nv: Set a handler for the system instruction traps
  2019-06-25 12:55   ` Julien Thierry
@ 2019-07-03 14:15     ` Marc Zyngier
  0 siblings, 0 replies; 176+ messages in thread
From: Marc Zyngier @ 2019-07-03 14:15 UTC (permalink / raw)
  To: Julien Thierry, linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	James Morse, Suzuki K Poulose

On 25/06/2019 13:55, Julien Thierry wrote:
> 
> 
> On 06/21/2019 10:38 AM, Marc Zyngier wrote:
>> From: Jintack Lim <jintack.lim@linaro.org>
>>
>> When HCR.NV bit is set, execution of the EL2 translation regime address
>> aranslation instructions and TLB maintenance instructions are trapped to
>> EL2. In addition, execution of the EL1 translation regime address
>> aranslation instructions and TLB maintenance instructions that are only
> 
> What's "translation regime address aranslation" ? I would guess
> "aranslation" should be removed, but since the same pattern appears
> twice in the commit doubt took over me :) .

It's a whole new concept. Still working on it though! ;-)

> 
>> accessible from EL2 and above are trapped to EL2. In these cases,
>> ESR_EL2.EC will be set to 0x18.
>>
>> Change the existing handler to handle those system instructions as well
>> as MRS/MSR instructions.  Emulation of each system instructions will be
>> done in separate patches.
>>
>> Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>> ---
>>  arch/arm64/include/asm/kvm_coproc.h |  2 +-
>>  arch/arm64/kvm/handle_exit.c        |  2 +-
>>  arch/arm64/kvm/sys_regs.c           | 53 +++++++++++++++++++++++++----
>>  arch/arm64/kvm/trace.h              |  2 +-
>>  4 files changed, 50 insertions(+), 9 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/kvm_coproc.h b/arch/arm64/include/asm/kvm_coproc.h
>> index 0b52377a6c11..1b3d21bd8adb 100644
>> --- a/arch/arm64/include/asm/kvm_coproc.h
>> +++ b/arch/arm64/include/asm/kvm_coproc.h
>> @@ -43,7 +43,7 @@ int kvm_handle_cp14_32(struct kvm_vcpu *vcpu, struct kvm_run *run);
>>  int kvm_handle_cp14_64(struct kvm_vcpu *vcpu, struct kvm_run *run);
>>  int kvm_handle_cp15_32(struct kvm_vcpu *vcpu, struct kvm_run *run);
>>  int kvm_handle_cp15_64(struct kvm_vcpu *vcpu, struct kvm_run *run);
>> -int kvm_handle_sys_reg(struct kvm_vcpu *vcpu, struct kvm_run *run);
>> +int kvm_handle_sys(struct kvm_vcpu *vcpu, struct kvm_run *run);
>>  
>>  #define kvm_coproc_table_init kvm_sys_reg_table_init
>>  void kvm_sys_reg_table_init(void);
>> diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
>> index 2517711f034f..e662f23b63a1 100644
>> --- a/arch/arm64/kvm/handle_exit.c
>> +++ b/arch/arm64/kvm/handle_exit.c
>> @@ -236,7 +236,7 @@ static exit_handle_fn arm_exit_handlers[] = {
>>  	[ESR_ELx_EC_SMC32]	= handle_smc,
>>  	[ESR_ELx_EC_HVC64]	= handle_hvc,
>>  	[ESR_ELx_EC_SMC64]	= handle_smc,
>> -	[ESR_ELx_EC_SYS64]	= kvm_handle_sys_reg,
>> +	[ESR_ELx_EC_SYS64]	= kvm_handle_sys,
>>  	[ESR_ELx_EC_SVE]	= handle_sve,
>>  	[ESR_ELx_EC_ERET]	= kvm_handle_eret,
>>  	[ESR_ELx_EC_IABT_LOW]	= kvm_handle_guest_abort,
>> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
>> index 1d1312425cf2..e711dde4511c 100644
>> --- a/arch/arm64/kvm/sys_regs.c
>> +++ b/arch/arm64/kvm/sys_regs.c
>> @@ -2597,6 +2597,40 @@ static int emulate_sys_reg(struct kvm_vcpu *vcpu,
>>  	return 1;
>>  }
>>  
>> +static int emulate_tlbi(struct kvm_vcpu *vcpu,
>> +			     struct sys_reg_params *params)
>> +{
>> +	/* TODO: support tlbi instruction emulation */
>> +	kvm_inject_undefined(vcpu);
>> +	return 1;
>> +}
>> +
>> +static int emulate_at(struct kvm_vcpu *vcpu,
>> +			     struct sys_reg_params *params)
>> +{
>> +	/* TODO: support address translation instruction emulation */
>> +	kvm_inject_undefined(vcpu);
>> +	return 1;
>> +}
>> +
>> +static int emulate_sys_instr(struct kvm_vcpu *vcpu,
>> +			     struct sys_reg_params *params)
>> +{
>> +	int ret = 0;
>> +
>> +	/* TLB maintenance instructions */
>> +	if (params->CRn == 0b1000)
>> +		ret = emulate_tlbi(vcpu, params);
>> +	/* Address Translation instructions */
>> +	else if (params->CRn == 0b0111 && params->CRm == 0b1000)
>> +		ret = emulate_at(vcpu, params);
>> +
> 
> So, in theory the NV bit shouldn't trap other Op0 == 1 instructions.
> Would it be worth adding a WARN() or BUG() in an "else" branch here,
> just in case?

Probably not a BUG(), but a WARN_ONCE() would be good.
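
Something like this at the end of emulate_sys_instr(), I'd say (sketch):

	else
		/* NV shouldn't have trapped any other Op0 == 1 instruction */
		WARN_ONCE(1, "unexpected sys instruction trap: CRn=%d CRm=%d\n",
			  params->CRn, params->CRm);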

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH 11/59] KVM: arm64: nv: Inject HVC exceptions to the virtual EL2
  2019-06-25 13:13   ` Alexandru Elisei
@ 2019-07-03 14:16     ` Marc Zyngier
  2019-07-30 14:08     ` Alexandru Elisei
  1 sibling, 0 replies; 176+ messages in thread
From: Marc Zyngier @ 2019-07-03 14:16 UTC (permalink / raw)
  To: Alexandru Elisei, linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Dave Martin

On 25/06/2019 14:13, Alexandru Elisei wrote:
> On 6/21/19 10:37 AM, Marc Zyngier wrote:
>> From: Jintack Lim <jintack.lim@linaro.org>
>>
>> Now that the psci call is done by the smc instruction when nested
> This suggests that we have support for PSCI calls using SMC as the conduit, but
> that is not the case, as the handle_smc function is not changed by this commit,
> and support for PSCI via SMC is added later in patch 22/59 "KVM: arm64: nv:
> Handle PSCI call via smc from the guest". Perhaps the commit message should be
> reworded to reflect that?

Sure.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH 23/59] KVM: arm64: nv: Respect virtual HCR_EL2.TWX setting
  2019-06-25 14:19   ` Julien Thierry
  2019-07-02 12:54     ` Alexandru Elisei
@ 2019-07-03 14:18     ` Marc Zyngier
  1 sibling, 0 replies; 176+ messages in thread
From: Marc Zyngier @ 2019-07-03 14:18 UTC (permalink / raw)
  To: Julien Thierry, linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	James Morse, Suzuki K Poulose

On 25/06/2019 15:19, Julien Thierry wrote:
> 
> 
> On 06/21/2019 10:38 AM, Marc Zyngier wrote:
>> From: Jintack Lim <jintack.lim@linaro.org>
>>
>> Forward exceptions due to WFI or WFE instructions to the virtual EL2 if
>> they are not coming from the virtual EL2 and virtual HCR_EL2.TWX is set.
>>
>> Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>> ---
>>  arch/arm64/include/asm/kvm_nested.h |  2 ++
>>  arch/arm64/kvm/Makefile             |  1 +
>>  arch/arm64/kvm/handle_exit.c        | 13 +++++++++-
>>  arch/arm64/kvm/nested.c             | 39 +++++++++++++++++++++++++++++
>>  4 files changed, 54 insertions(+), 1 deletion(-)
>>  create mode 100644 arch/arm64/kvm/nested.c
>>
>> diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
>> index 8a3d121a0b42..645e5e11b749 100644
>> --- a/arch/arm64/include/asm/kvm_nested.h
>> +++ b/arch/arm64/include/asm/kvm_nested.h
>> @@ -10,4 +10,6 @@ static inline bool nested_virt_in_use(const struct kvm_vcpu *vcpu)
>>  		test_bit(KVM_ARM_VCPU_NESTED_VIRT, vcpu->arch.features);
>>  }
>>  
>> +int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe);
>> +
>>  #endif /* __ARM64_KVM_NESTED_H */
>> diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
>> index 9e450aea7db6..f11bd8b0d837 100644
>> --- a/arch/arm64/kvm/Makefile
>> +++ b/arch/arm64/kvm/Makefile
>> @@ -36,4 +36,5 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/irqchip.o
>>  kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/arch_timer.o
>>  kvm-$(CONFIG_KVM_ARM_PMU) += $(KVM)/arm/pmu.o
>>  
>> +kvm-$(CONFIG_KVM_ARM_HOST) += nested.o
>>  kvm-$(CONFIG_KVM_ARM_HOST) += emulate-nested.o
>> diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
>> index e348c15c81bc..ddba212fd6ec 100644
>> --- a/arch/arm64/kvm/handle_exit.c
>> +++ b/arch/arm64/kvm/handle_exit.c
>> @@ -127,7 +127,18 @@ static int handle_no_fpsimd(struct kvm_vcpu *vcpu, struct kvm_run *run)
>>   */
>>  static int kvm_handle_wfx(struct kvm_vcpu *vcpu, struct kvm_run *run)
>>  {
>> -	if (kvm_vcpu_get_hsr(vcpu) & ESR_ELx_WFx_ISS_WFE) {
>> +	bool is_wfe = !!(kvm_vcpu_get_hsr(vcpu) & ESR_ELx_WFx_ISS_WFE);
>> +
>> +	if (nested_virt_in_use(vcpu)) {
>> +		int ret = handle_wfx_nested(vcpu, is_wfe);
>> +
>> +		if (ret < 0 && ret != -EINVAL)
>> +			return ret;
>> +		else if (ret >= 0)
>> +			return ret;
> 
> I think you can simplify this:
> 
> 	if (ret != -EINVAL)
> 		return ret;

Ah! Yes. ;-)
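
Which turns the whole thing into (sketch):

	if (nested_virt_in_use(vcpu)) {
		int ret = handle_wfx_nested(vcpu, is_wfe);

		/* -EINVAL means "not for vEL2, keep the usual handling" */
		if (ret != -EINVAL)
			return ret;
	}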

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH 13/59] KVM: arm64: nv: Handle virtual EL2 registers in vcpu_read/write_sys_reg()
  2019-07-03 12:15     ` Marc Zyngier
@ 2019-07-03 15:21       ` Julien Thierry
  0 siblings, 0 replies; 176+ messages in thread
From: Julien Thierry @ 2019-07-03 15:21 UTC (permalink / raw)
  To: Marc Zyngier, linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	James Morse, Suzuki K Poulose



On 03/07/2019 13:15, Marc Zyngier wrote:
> On 24/06/2019 13:42, Julien Thierry wrote:
>>
>>
>> On 06/21/2019 10:37 AM, Marc Zyngier wrote:
>>> From: Andre Przywara <andre.przywara@arm.com>
>>>
>>> KVM internally uses accessor functions when reading or writing the
>>> guest's system registers. This takes care of accessing either the stored
>>> copy or using the "live" EL1 system registers when the host uses VHE.
>>>
>>> With the introduction of virtual EL2 we add a bunch of EL2 system
>>> registers, which now must also be taken care of:
>>> - If the guest is running in vEL2, and we access an EL1 sysreg, we must
>>>   revert to the stored version of that, and not use the CPU's copy.
>>> - If the guest is running in vEL1, and we access an EL2 sysreg, we must
>>>   also use the stored version, since the CPU carries the EL1 copy.
>>> - Some EL2 system registers are supposed to affect the current execution
>>>   of the system, so we need to put them into their respective EL1
>>>   counterparts. For this we need to define a mapping between the two.
>>>   This is done using the newly introduced struct el2_sysreg_map.
>>> - Some EL2 system registers have a different format than their EL1
>>>   counterpart, so we need to translate them before writing them to the
>>>   CPU. This is done using an (optional) translate function in the map.
>>> - There are the three special registers SP_EL2, SPSR_EL2 and ELR_EL2,
>>>   which need some separate handling.
>>>
>>> All of these cases are now wrapped into the existing accessor functions,
>>> so KVM users wouldn't need to care whether they access EL2 or EL1
>>> registers and also which state the guest is in.
>>>
>>> This handles what was formerly known as the "shadow state" dynamically,
>>> without requiring a separate copy for each vCPU EL.
>>>
>>> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
>>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>>> ---
>>>  arch/arm64/include/asm/kvm_emulate.h |   6 +
>>>  arch/arm64/include/asm/kvm_host.h    |   5 +
>>>  arch/arm64/kvm/sys_regs.c            | 163 +++++++++++++++++++++++++++
>>>  3 files changed, 174 insertions(+)
>>>
>>> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
>>> index c43aac5fed69..f37006b6eec4 100644
>>> --- a/arch/arm64/include/asm/kvm_emulate.h
>>> +++ b/arch/arm64/include/asm/kvm_emulate.h
>>> @@ -70,6 +70,12 @@ void kvm_emulate_nested_eret(struct kvm_vcpu *vcpu);
>>>  int kvm_inject_nested_sync(struct kvm_vcpu *vcpu, u64 esr_el2);
>>>  int kvm_inject_nested_irq(struct kvm_vcpu *vcpu);
>>>  
>>> +u64 translate_tcr(u64 tcr);
>>> +u64 translate_cptr(u64 tcr);
>>> +u64 translate_sctlr(u64 tcr);
>>> +u64 translate_ttbr0(u64 tcr);
>>> +u64 translate_cnthctl(u64 tcr);
>>> +
>>>  static inline bool vcpu_el1_is_32bit(struct kvm_vcpu *vcpu)
>>>  {
>>>  	return !(vcpu->arch.hcr_el2 & HCR_RW);
>>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>>> index 2d4290d2513a..dae9c42a7219 100644
>>> --- a/arch/arm64/include/asm/kvm_host.h
>>> +++ b/arch/arm64/include/asm/kvm_host.h
>>> @@ -217,6 +217,11 @@ enum vcpu_sysreg {
>>>  	NR_SYS_REGS	/* Nothing after this line! */
>>>  };
>>>  
>>> +static inline bool sysreg_is_el2(int reg)
>>> +{
>>> +	return reg >= FIRST_EL2_SYSREG && reg < NR_SYS_REGS;
>>> +}
>>> +
>>>  /* 32bit mapping */
>>>  #define c0_MPIDR	(MPIDR_EL1 * 2)	/* MultiProcessor ID Register */
>>>  #define c0_CSSELR	(CSSELR_EL1 * 2)/* Cache Size Selection Register */
>>> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
>>> index 693dd063c9c2..d024114da162 100644
>>> --- a/arch/arm64/kvm/sys_regs.c
>>> +++ b/arch/arm64/kvm/sys_regs.c
>>> @@ -76,11 +76,142 @@ static bool write_to_read_only(struct kvm_vcpu *vcpu,
>>>  	return false;
>>>  }
>>>  
>>> +static u64 tcr_el2_ips_to_tcr_el1_ps(u64 tcr_el2)
>>> +{
>>> +	return ((tcr_el2 & TCR_EL2_PS_MASK) >> TCR_EL2_PS_SHIFT)
>>> +		<< TCR_IPS_SHIFT;
>>> +}
>>> +
>>> +u64 translate_tcr(u64 tcr)
>>> +{
>>> +	return TCR_EPD1_MASK |				/* disable TTBR1_EL1 */
>>> +	       ((tcr & TCR_EL2_TBI) ? TCR_TBI0 : 0) |
>>> +	       tcr_el2_ips_to_tcr_el1_ps(tcr) |
>>> +	       (tcr & TCR_EL2_TG0_MASK) |
>>> +	       (tcr & TCR_EL2_ORGN0_MASK) |
>>> +	       (tcr & TCR_EL2_IRGN0_MASK) |
>>> +	       (tcr & TCR_EL2_T0SZ_MASK);
>>> +}
>>> +
>>> +u64 translate_cptr(u64 cptr_el2)
>>> +{
>>> +	u64 cpacr_el1 = 0;
>>> +
>>> +	if (!(cptr_el2 & CPTR_EL2_TFP))
>>> +		cpacr_el1 |= CPACR_EL1_FPEN;
>>> +	if (cptr_el2 & CPTR_EL2_TTA)
>>> +		cpacr_el1 |= CPACR_EL1_TTA;
>>> +	if (!(cptr_el2 & CPTR_EL2_TZ))
>>> +		cpacr_el1 |= CPACR_EL1_ZEN;
>>> +
>>> +	return cpacr_el1;
>>> +}
>>> +
>>> +u64 translate_sctlr(u64 sctlr)
>>> +{
>>> +	/* Bit 20 is RES1 in SCTLR_EL1, but RES0 in SCTLR_EL2 */
>>> +	return sctlr | BIT(20);
>>> +}
>>> +
>>> +u64 translate_ttbr0(u64 ttbr0)
>>> +{
>>> +	/* Force ASID to 0 (ASID 0 or RES0) */
>>> +	return ttbr0 & ~GENMASK_ULL(63, 48);
>>> +}
>>> +
>>> +u64 translate_cnthctl(u64 cnthctl)
>>> +{
>>> +	return ((cnthctl & 0x3) << 10) | (cnthctl & 0xfc);
>>> +}
>>> +
>>> +#define EL2_SYSREG(el2, el1, translate)	\
>>> +	[el2 - FIRST_EL2_SYSREG] = { el2, el1, translate }
>>> +#define PURE_EL2_SYSREG(el2) \
>>> +	[el2 - FIRST_EL2_SYSREG] = { el2,__INVALID_SYSREG__, NULL }
>>> +/*
>>> + * Associate vEL2 registers to their EL1 counterparts on the CPU.
>>> + * The translate function can be NULL, when the register layout is identical.
>>> + */
>>> +struct el2_sysreg_map {
>>> +	int sysreg;	/* EL2 register index into the array above */
>>> +	int mapping;	/* associated EL1 register */
>>> +	u64 (*translate)(u64 value);
>>> +} nested_sysreg_map[NR_SYS_REGS - FIRST_EL2_SYSREG] = {
>>> +	PURE_EL2_SYSREG( VPIDR_EL2 ),
>>> +	PURE_EL2_SYSREG( VMPIDR_EL2 ),
>>> +	PURE_EL2_SYSREG( ACTLR_EL2 ),
>>> +	PURE_EL2_SYSREG( HCR_EL2 ),
>>> +	PURE_EL2_SYSREG( MDCR_EL2 ),
>>> +	PURE_EL2_SYSREG( HSTR_EL2 ),
>>> +	PURE_EL2_SYSREG( HACR_EL2 ),
>>> +	PURE_EL2_SYSREG( VTTBR_EL2 ),
>>> +	PURE_EL2_SYSREG( VTCR_EL2 ),
>>> +	PURE_EL2_SYSREG( RVBAR_EL2 ),
>>> +	PURE_EL2_SYSREG( RMR_EL2 ),
>>> +	PURE_EL2_SYSREG( TPIDR_EL2 ),
>>> +	PURE_EL2_SYSREG( CNTVOFF_EL2 ),
>>> +	PURE_EL2_SYSREG( CNTHCTL_EL2 ),
>>> +	PURE_EL2_SYSREG( HPFAR_EL2 ),
>>> +	EL2_SYSREG(      SCTLR_EL2,  SCTLR_EL1,      translate_sctlr ),
>>> +	EL2_SYSREG(      CPTR_EL2,   CPACR_EL1,      translate_cptr  ),
>>> +	EL2_SYSREG(      TTBR0_EL2,  TTBR0_EL1,      translate_ttbr0 ),
>>> +	EL2_SYSREG(      TTBR1_EL2,  TTBR1_EL1,      NULL            ),
>>> +	EL2_SYSREG(      TCR_EL2,    TCR_EL1,        translate_tcr   ),
>>> +	EL2_SYSREG(      VBAR_EL2,   VBAR_EL1,       NULL            ),
>>> +	EL2_SYSREG(      AFSR0_EL2,  AFSR0_EL1,      NULL            ),
>>> +	EL2_SYSREG(      AFSR1_EL2,  AFSR1_EL1,      NULL            ),
>>> +	EL2_SYSREG(      ESR_EL2,    ESR_EL1,        NULL            ),
>>> +	EL2_SYSREG(      FAR_EL2,    FAR_EL1,        NULL            ),
>>> +	EL2_SYSREG(      MAIR_EL2,   MAIR_EL1,       NULL            ),
>>> +	EL2_SYSREG(      AMAIR_EL2,  AMAIR_EL1,      NULL            ),
>>> +};
>>> +
>>> +static
>>> +const struct el2_sysreg_map *find_el2_sysreg(const struct el2_sysreg_map *map,
>>> +					     int reg)
>>> +{
>>> +	const struct el2_sysreg_map *entry;
>>> +
>>> +	if (!sysreg_is_el2(reg))
>>> +		return NULL;
>>> +
>>> +	entry = &nested_sysreg_map[reg - FIRST_EL2_SYSREG];
>>> +	if (entry->sysreg == __INVALID_SYSREG__)
>>> +		return NULL;
>>> +
>>> +	return entry;
>>> +}
>>> +
>>>  u64 vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg)
>>>  {
>>> +
>>>  	if (!vcpu->arch.sysregs_loaded_on_cpu)
>>>  		goto immediate_read;
>>>  
>>> +	if (unlikely(sysreg_is_el2(reg))) {
>>> +		const struct el2_sysreg_map *el2_reg;
>>> +
>>> +		if (!is_hyp_ctxt(vcpu))
>>> +			goto immediate_read;
>>> +
>>> +		el2_reg = find_el2_sysreg(nested_sysreg_map, reg);
>>> +		if (el2_reg) {
>>> +			/*
>>> +			 * If this register does not have an EL1 counterpart,
>>> +			 * then read the stored EL2 version.
>>> +			 */
>>> +			if (el2_reg->mapping == __INVALID_SYSREG__)
>>
>> In this patch, find_el2_sysreg returns NULL for PURE_EL2 registers. So
> 
> That's not how I read this code. You get NULL if the EL2 sysreg is
> set to __INVALID_SYSREG__, of which there is no occurrence (yeah, dead
> code).
> 

Ah yes, as you guessed, I got confused between ->sysreg and ->mapping.
Something must have gotten in my eyes when I was doing the review!

You can ignore my comments on patch then!
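
For anyone else tripping over the same thing: the two fields differ.
Going by the macros quoted above, PURE_EL2_SYSREG(HCR_EL2) expands to

	[HCR_EL2 - FIRST_EL2_SYSREG] = { HCR_EL2, __INVALID_SYSREG__, NULL }

so ->sysreg is always valid for an initialised entry, and only
->mapping can be __INVALID_SYSREG__.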

Thanks,

-- 
Julien Thierry

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH 34/59] KVM: arm/arm64: nv: Factor out stage 2 page table data from struct kvm
  2019-06-21  9:38 ` [PATCH 34/59] KVM: arm/arm64: nv: Factor out stage 2 page table data from struct kvm Marc Zyngier
@ 2019-07-03 15:52   ` Alexandru Elisei
  0 siblings, 0 replies; 176+ messages in thread
From: Alexandru Elisei @ 2019-07-03 15:52 UTC (permalink / raw)
  To: Marc Zyngier, linux-arm-kernel, kvmarm, kvm; +Cc: Andre Przywara, Dave Martin


On 6/21/19 10:38 AM, Marc Zyngier wrote:
> As we are about to reuse our stage 2 page table manipulation code for
> shadow stage 2 page tables in the context of nested virtualization, we
> are going to manage multiple stage 2 page tables for a single VM.
>
> This requires some pretty invasive changes to our data structures,
> which moves the vmid and pgd pointers into a separate structure and
> changes pretty much all of our mmu code to operate on this structure
> instead.
>
> The new structure is called struct kvm_s2_mmu.
>
> There is no intended functional change by this patch alone.
>
> [Designed data structure layout in collaboration]
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
> ---
>  arch/arm/include/asm/kvm_asm.h    |   5 +-
>  arch/arm/include/asm/kvm_host.h   |  23 ++-
>  arch/arm/include/asm/kvm_mmu.h    |  10 +-
>  arch/arm/kvm/hyp/switch.c         |   3 +-
>  arch/arm/kvm/hyp/tlb.c            |  13 +-
>  arch/arm64/include/asm/kvm_asm.h  |   5 +-
>  arch/arm64/include/asm/kvm_host.h |  24 ++-
>  arch/arm64/include/asm/kvm_mmu.h  |  16 +-
>  arch/arm64/kvm/hyp/switch.c       |   8 +-
>  arch/arm64/kvm/hyp/tlb.c          |  36 ++---
>  virt/kvm/arm/arm.c                |  17 +-
>  virt/kvm/arm/mmu.c                | 250 ++++++++++++++++--------------
>  12 files changed, 224 insertions(+), 186 deletions(-)
>
> diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
> index f615830f9f57..4f85323f1290 100644
> --- a/arch/arm/include/asm/kvm_asm.h
> +++ b/arch/arm/include/asm/kvm_asm.h
> @@ -49,13 +49,14 @@
>  #ifndef __ASSEMBLY__
>  struct kvm;
>  struct kvm_vcpu;
> +struct kvm_s2_mmu;
>  
>  extern char __kvm_hyp_init[];
>  extern char __kvm_hyp_init_end[];
>  
>  extern void __kvm_flush_vm_context(void);
> -extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa);
> -extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
> +extern void __kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu, phys_addr_t ipa);
> +extern void __kvm_tlb_flush_vmid(struct kvm_s2_mmu *mmu);
>  extern void __kvm_tlb_flush_local_vmid(struct kvm_vcpu *vcpu);
>  
>  extern void __kvm_timer_set_cntvoff(u32 cntvoff_low, u32 cntvoff_high);
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index f80418ddeb60..e3217c4ad25b 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -55,18 +55,23 @@ struct kvm_vmid {
>  	u32    vmid;
>  };
>  
> +struct kvm_s2_mmu {
> +	/* The VMID generation used for the virt. memory system */

For more context:

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index f80418ddeb60..e3217c4ad25b 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -52,24 +52,29 @@ void kvm_reset_coprocs(struct kvm_vcpu *vcpu);
 struct kvm_vmid {
        /* The VMID generation used for the virt. memory system */
        u64    vmid_gen;
        u32    vmid;
 };
 
+struct kvm_s2_mmu {
+       /* The VMID generation used for the virt. memory system */
+       struct kvm_vmid vmid;
+
+       /* Stage-2 page table */
+       pgd_t *pgd;
+       phys_addr_t pgd_phys;
+
+       struct kvm *kvm;
+};

[..]

I think one of the comments is redundant.

> +	struct kvm_vmid vmid;
> +
> +	/* Stage-2 page table */
> +	pgd_t *pgd;
> +	phys_addr_t pgd_phys;
> +
> +	struct kvm *kvm;
> +};
> +
>  struct kvm_arch {
> +	struct kvm_s2_mmu mmu;
> +
>  	/* The last vcpu id that ran on each physical CPU */
>  	int __percpu *last_vcpu_ran;
>  
> -	/*
> -	 * Anything that is not used directly from assembly code goes
> -	 * here.
> -	 */
> -
> -	/* The VMID generation used for the virt. memory system */
> -	struct kvm_vmid vmid;
> -
>  	/* Stage-2 page table */
>  	pgd_t *pgd;
>  	phys_addr_t pgd_phys;
> @@ -164,6 +169,8 @@ struct vcpu_reset_state {
>  struct kvm_vcpu_arch {
>  	struct kvm_cpu_context ctxt;
>  
> +	struct kvm_s2_mmu *hw_mmu;
> +
>  	int target; /* Processor target */
>  	DECLARE_BITMAP(features, KVM_VCPU_MAX_FEATURES);
>  
> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
> index 0d84d50bf9ba..be23e3f8e08c 100644
> --- a/arch/arm/include/asm/kvm_mmu.h
> +++ b/arch/arm/include/asm/kvm_mmu.h
> @@ -52,8 +52,8 @@ int create_hyp_exec_mappings(phys_addr_t phys_addr, size_t size,
>  void free_hyp_pgds(void);
>  
>  void stage2_unmap_vm(struct kvm *kvm);
> -int kvm_alloc_stage2_pgd(struct kvm *kvm);
> -void kvm_free_stage2_pgd(struct kvm *kvm);
> +int kvm_alloc_stage2_pgd(struct kvm_s2_mmu *mmu);
> +void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu);
>  int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
>  			  phys_addr_t pa, unsigned long size, bool writable);
>  
> @@ -420,12 +420,12 @@ static inline int hyp_map_aux_data(void)
>  
>  static inline void kvm_set_ipa_limit(void) {}
>  
> -static __always_inline u64 kvm_get_vttbr(struct kvm *kvm)
> +static __always_inline u64 kvm_get_vttbr(struct kvm_s2_mmu *mmu)
>  {
> -	struct kvm_vmid *vmid = &kvm->arch.vmid;
> +	struct kvm_vmid *vmid = &mmu->vmid;
>  	u64 vmid_field, baddr;
>  
> -	baddr = kvm->arch.pgd_phys;
> +	baddr = mmu->pgd_phys;
>  	vmid_field = (u64)vmid->vmid << VTTBR_VMID_SHIFT;
>  	return kvm_phys_to_vttbr(baddr) | vmid_field;
>  }
> diff --git a/arch/arm/kvm/hyp/switch.c b/arch/arm/kvm/hyp/switch.c
> index 3b058a5d7c5f..6e9c3f11bfa4 100644
> --- a/arch/arm/kvm/hyp/switch.c
> +++ b/arch/arm/kvm/hyp/switch.c
> @@ -76,8 +76,7 @@ static void __hyp_text __deactivate_traps(struct kvm_vcpu *vcpu)
>  
>  static void __hyp_text __activate_vm(struct kvm_vcpu *vcpu)
>  {
> -	struct kvm *kvm = kern_hyp_va(vcpu->kvm);
> -	write_sysreg(kvm_get_vttbr(kvm), VTTBR);
> +	write_sysreg(kvm_get_vttbr(vcpu->arch.hw_mmu), VTTBR);
>  	write_sysreg(vcpu->arch.midr, VPIDR);
>  }
>  
> diff --git a/arch/arm/kvm/hyp/tlb.c b/arch/arm/kvm/hyp/tlb.c
> index 8e4afba73635..2d66288e20ed 100644
> --- a/arch/arm/kvm/hyp/tlb.c
> +++ b/arch/arm/kvm/hyp/tlb.c
> @@ -35,13 +35,12 @@
>   * As v7 does not support flushing per IPA, just nuke the whole TLB
>   * instead, ignoring the ipa value.
>   */
> -void __hyp_text __kvm_tlb_flush_vmid(struct kvm *kvm)
> +void __hyp_text __kvm_tlb_flush_vmid(struct kvm_s2_mmu *mmu)
>  {
>  	dsb(ishst);
>  
>  	/* Switch to requested VMID */
> -	kvm = kern_hyp_va(kvm);
> -	write_sysreg(kvm_get_vttbr(kvm), VTTBR);
> +	write_sysreg(kvm_get_vttbr(mmu), VTTBR);
>  	isb();
>  
>  	write_sysreg(0, TLBIALLIS);
> @@ -51,17 +50,15 @@ void __hyp_text __kvm_tlb_flush_vmid(struct kvm *kvm)
>  	write_sysreg(0, VTTBR);
>  }
>  
> -void __hyp_text __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa)
> +void __hyp_text __kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu, phys_addr_t ipa)
>  {
> -	__kvm_tlb_flush_vmid(kvm);
> +	__kvm_tlb_flush_vmid(mmu);
>  }
>  
>  void __hyp_text __kvm_tlb_flush_local_vmid(struct kvm_vcpu *vcpu)
>  {
> -	struct kvm *kvm = kern_hyp_va(kern_hyp_va(vcpu)->kvm);
> -
>  	/* Switch to requested VMID */
> -	write_sysreg(kvm_get_vttbr(kvm), VTTBR);
> +	write_sysreg(kvm_get_vttbr(vcpu->arch.hw_mmu), VTTBR);
>  	isb();
>  
>  	write_sysreg(0, TLBIALL);
> diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
> index ff73f5462aca..5e956c2cd9b4 100644
> --- a/arch/arm64/include/asm/kvm_asm.h
> +++ b/arch/arm64/include/asm/kvm_asm.h
> @@ -56,6 +56,7 @@
>  
>  struct kvm;
>  struct kvm_vcpu;
> +struct kvm_s2_mmu;
>  
>  extern char __kvm_hyp_init[];
>  extern char __kvm_hyp_init_end[];
> @@ -63,8 +64,8 @@ extern char __kvm_hyp_init_end[];
>  extern char __kvm_hyp_vector[];
>  
>  extern void __kvm_flush_vm_context(void);
> -extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa);
> -extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
> +extern void __kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu, phys_addr_t ipa);
> +extern void __kvm_tlb_flush_vmid(struct kvm_s2_mmu *mmu);
>  extern void __kvm_tlb_flush_local_vmid(struct kvm_vcpu *vcpu);
>  
>  extern void __kvm_timer_set_cntvoff(u32 cntvoff_low, u32 cntvoff_high);
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index dae9c42a7219..3dee5e17a4ee 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -73,12 +73,25 @@ struct kvm_vmid {
>  	u32    vmid;
>  };
>  
> -struct kvm_arch {
> +struct kvm_s2_mmu {
>  	struct kvm_vmid vmid;
>  
> -	/* stage2 entry level table */
> -	pgd_t *pgd;
> -	phys_addr_t pgd_phys;
> +	/*
> +	 * stage2 entry level table
> +	 *
> +	 * Two kvm_s2_mmu structures in the same VM can point to the same pgd
> +	 * here.  This happens when running a non-VHE guest hypervisor which
> +	 * uses the canonical stage 2 page table for both vEL2 and for vEL1/0
> +	 * with vHCR_EL2.VM == 0.
> +	 */
> +	pgd_t		*pgd;
> +	phys_addr_t	pgd_phys;
> +
> +	struct kvm *kvm;
> +};
> +
> +struct kvm_arch {
> +	struct kvm_s2_mmu mmu;
>  
>  	/* VTCR_EL2 value for this VM */
>  	u64    vtcr;
> @@ -297,6 +310,9 @@ struct kvm_vcpu_arch {
>  	void *sve_state;
>  	unsigned int sve_max_vl;
>  
> +	/* Stage 2 paging state used by the hardware on next switch */
> +	struct kvm_s2_mmu *hw_mmu;
> +
>  	/* HYP configuration */
>  	u64 hcr_el2;
>  	u32 mdcr_el2;
> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> index fe954efc992c..1eb6e0ca61c2 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -165,8 +165,8 @@ int create_hyp_exec_mappings(phys_addr_t phys_addr, size_t size,
>  void free_hyp_pgds(void);
>  
>  void stage2_unmap_vm(struct kvm *kvm);
> -int kvm_alloc_stage2_pgd(struct kvm *kvm);
> -void kvm_free_stage2_pgd(struct kvm *kvm);
> +int kvm_alloc_stage2_pgd(struct kvm_s2_mmu *mmu);
> +void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu);
>  int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
>  			  phys_addr_t pa, unsigned long size, bool writable);
>  
> @@ -607,13 +607,13 @@ static inline u64 kvm_vttbr_baddr_mask(struct kvm *kvm)
>  	return vttbr_baddr_mask(kvm_phys_shift(kvm), kvm_stage2_levels(kvm));
>  }
>  
> -static __always_inline u64 kvm_get_vttbr(struct kvm *kvm)
> +static __always_inline u64 kvm_get_vttbr(struct kvm_s2_mmu *mmu)
>  {
> -	struct kvm_vmid *vmid = &kvm->arch.vmid;
> +	struct kvm_vmid *vmid = &mmu->vmid;
>  	u64 vmid_field, baddr;
>  	u64 cnp = system_supports_cnp() ? VTTBR_CNP_BIT : 0;
>  
> -	baddr = kvm->arch.pgd_phys;
> +	baddr = mmu->pgd_phys;
>  	vmid_field = (u64)vmid->vmid << VTTBR_VMID_SHIFT;
>  	return kvm_phys_to_vttbr(baddr) | vmid_field | cnp;
>  }
> @@ -622,10 +622,10 @@ static __always_inline u64 kvm_get_vttbr(struct kvm *kvm)
>   * Must be called from hyp code running at EL2 with an updated VTTBR
>   * and interrupts disabled.
>   */
> -static __always_inline void __load_guest_stage2(struct kvm *kvm)
> +static __always_inline void __load_guest_stage2(struct kvm_s2_mmu *mmu)
>  {
> -	write_sysreg(kvm->arch.vtcr, vtcr_el2);
> -	write_sysreg(kvm_get_vttbr(kvm), vttbr_el2);
> +	write_sysreg(kern_hyp_va(mmu->kvm)->arch.vtcr, vtcr_el2);
> +	write_sysreg(kvm_get_vttbr(mmu), vttbr_el2);
>  
>  	/*
>  	 * ARM erratum 1165522 requires the actual execution of the above
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index 4b2c45060b38..fb479c71b521 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -248,9 +248,9 @@ void deactivate_traps_vhe_put(void)
>  	__deactivate_traps_common();
>  }
>  
> -static void __hyp_text __activate_vm(struct kvm *kvm)
> +static void __hyp_text __activate_vm(struct kvm_s2_mmu *mmu)
>  {
> -	__load_guest_stage2(kvm);
> +	__load_guest_stage2(mmu);
>  }
>  
>  static void __hyp_text __deactivate_vm(struct kvm_vcpu *vcpu)
> @@ -611,7 +611,7 @@ int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
>  	 * stage 2 translation, and __activate_traps clear HCR_EL2.TGE
>  	 * (among other things).
>  	 */
> -	__activate_vm(vcpu->kvm);
> +	__activate_vm(vcpu->arch.hw_mmu);
>  	__activate_traps(vcpu);
>  
>  	sysreg_restore_guest_state_vhe(guest_ctxt);
> @@ -672,7 +672,7 @@ int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu)
>  
>  	__sysreg_save_state_nvhe(host_ctxt);
>  
> -	__activate_vm(kern_hyp_va(vcpu->kvm));
> +	__activate_vm(kern_hyp_va(vcpu->arch.hw_mmu));
>  	__activate_traps(vcpu);
>  
>  	__hyp_vgic_restore_state(vcpu);
> diff --git a/arch/arm64/kvm/hyp/tlb.c b/arch/arm64/kvm/hyp/tlb.c
> index 32a782bb00be..779405db3fb3 100644
> --- a/arch/arm64/kvm/hyp/tlb.c
> +++ b/arch/arm64/kvm/hyp/tlb.c
> @@ -27,7 +27,7 @@ struct tlb_inv_context {
>  	u64		sctlr;
>  };
>  
> -static void __hyp_text __tlb_switch_to_guest_vhe(struct kvm *kvm,
> +static void __hyp_text __tlb_switch_to_guest_vhe(struct kvm_s2_mmu *mmu,
>  						 struct tlb_inv_context *cxt)
>  {
>  	u64 val;
> @@ -64,17 +64,17 @@ static void __hyp_text __tlb_switch_to_guest_vhe(struct kvm *kvm,
>  	 * place before clearing TGE. __load_guest_stage2() already
>  	 * has an ISB in order to deal with this.
>  	 */
> -	__load_guest_stage2(kvm);
> +	__load_guest_stage2(mmu);
>  	val = read_sysreg(hcr_el2);
>  	val &= ~HCR_TGE;
>  	write_sysreg(val, hcr_el2);
>  	isb();
>  }
>  
> -static void __hyp_text __tlb_switch_to_guest_nvhe(struct kvm *kvm,
> +static void __hyp_text __tlb_switch_to_guest_nvhe(struct kvm_s2_mmu *mmu,
>  						  struct tlb_inv_context *cxt)
>  {
> -	__load_guest_stage2(kvm);
> +	__load_guest_stage2(mmu);
>  	isb();
>  }
>  
> @@ -83,8 +83,7 @@ static hyp_alternate_select(__tlb_switch_to_guest,
>  			    __tlb_switch_to_guest_vhe,
>  			    ARM64_HAS_VIRT_HOST_EXTN);
>  
> -static void __hyp_text __tlb_switch_to_host_vhe(struct kvm *kvm,
> -						struct tlb_inv_context *cxt)
> +static void __hyp_text __tlb_switch_to_host_vhe(struct tlb_inv_context *cxt)
>  {
>  	/*
>  	 * We're done with the TLB operation, let's restore the host's
> @@ -103,8 +102,7 @@ static void __hyp_text __tlb_switch_to_host_vhe(struct kvm *kvm,
>  	local_irq_restore(cxt->flags);
>  }
>  
> -static void __hyp_text __tlb_switch_to_host_nvhe(struct kvm *kvm,
> -						 struct tlb_inv_context *cxt)
> +static void __hyp_text __tlb_switch_to_host_nvhe(struct tlb_inv_context *cxt)
>  {
>  	write_sysreg(0, vttbr_el2);
>  }
> @@ -114,15 +112,15 @@ static hyp_alternate_select(__tlb_switch_to_host,
>  			    __tlb_switch_to_host_vhe,
>  			    ARM64_HAS_VIRT_HOST_EXTN);
>  
> -void __hyp_text __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa)
> +void __hyp_text __kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu, phys_addr_t ipa)
>  {
>  	struct tlb_inv_context cxt;
>  
>  	dsb(ishst);
>  
>  	/* Switch to requested VMID */
> -	kvm = kern_hyp_va(kvm);
> -	__tlb_switch_to_guest()(kvm, &cxt);
> +	mmu = kern_hyp_va(mmu);
> +	__tlb_switch_to_guest()(mmu, &cxt);
>  
>  	/*
>  	 * We could do so much better if we had the VA as well.
> @@ -165,39 +163,39 @@ void __hyp_text __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa)
>  	if (!has_vhe() && icache_is_vpipt())
>  		__flush_icache_all();
>  
> -	__tlb_switch_to_host()(kvm, &cxt);
> +	__tlb_switch_to_host()(&cxt);
>  }
>  
> -void __hyp_text __kvm_tlb_flush_vmid(struct kvm *kvm)
> +void __hyp_text __kvm_tlb_flush_vmid(struct kvm_s2_mmu *mmu)
>  {
>  	struct tlb_inv_context cxt;
>  
>  	dsb(ishst);
>  
>  	/* Switch to requested VMID */
> -	kvm = kern_hyp_va(kvm);
> -	__tlb_switch_to_guest()(kvm, &cxt);
> +	mmu = kern_hyp_va(mmu);
> +	__tlb_switch_to_guest()(mmu, &cxt);
>  
>  	__tlbi(vmalls12e1is);
>  	dsb(ish);
>  	isb();
>  
> -	__tlb_switch_to_host()(kvm, &cxt);
> +	__tlb_switch_to_host()(&cxt);
>  }
>  
>  void __hyp_text __kvm_tlb_flush_local_vmid(struct kvm_vcpu *vcpu)
>  {
> -	struct kvm *kvm = kern_hyp_va(kern_hyp_va(vcpu)->kvm);
> +	struct kvm_s2_mmu *mmu = kern_hyp_va(kern_hyp_va(vcpu)->arch.hw_mmu);
>  	struct tlb_inv_context cxt;
>  
>  	/* Switch to requested VMID */
> -	__tlb_switch_to_guest()(kvm, &cxt);
> +	__tlb_switch_to_guest()(mmu, &cxt);
>  
>  	__tlbi(vmalle1);
>  	dsb(nsh);
>  	isb();
>  
> -	__tlb_switch_to_host()(kvm, &cxt);
> +	__tlb_switch_to_host()(&cxt);
>  }
>  
>  void __hyp_text __kvm_flush_vm_context(void)
> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> index bd5c55916d0d..5d4371633e1c 100644
> --- a/virt/kvm/arm/arm.c
> +++ b/virt/kvm/arm/arm.c
> @@ -118,26 +118,27 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>  	for_each_possible_cpu(cpu)
>  		*per_cpu_ptr(kvm->arch.last_vcpu_ran, cpu) = -1;
>  
> -	ret = kvm_alloc_stage2_pgd(kvm);
> +	ret = kvm_alloc_stage2_pgd(&kvm->arch.mmu);

I don't think this is correct, because kvm_alloc_stage2_pgd will do this:

pgd = alloc_pages_exact(stage2_pgd_size(mmu->kvm), GFP_KERNEL | __GFP_ZERO);

and mmu->kvm is zero at that point. As evidenced by this error I get when trying
to run a guest with the host built from this patch:

 /test/stress64/kvm/lkvm run -k /opt/kvm/guest-0/Image -d
/opt/kvm/guest-0/fs.ext2 -c 1 -m 511 --console virtio --irqchip=gicv3 --params
console=hvc earlycon=uart8250,0x3f8 swiotlb=1024
  # lkvm run -k /opt/kvm/guest-0/Image -m 511 -c 1 --name guest-90
[    3.296083] Unable to handle kernel paging request at virtual address
0000000000001120
[    3.296083] Mem abort info:
[    3.297109]   ESR = 0x96000006
[    3.297451]   Exception class = DABT (current EL), IL = 32 bits
[    3.297962]   SET = 0, FnV = 0
[    3.297962]   EA = 0, S1PTW = 0
[    3.298645] Data abort info:
[    3.298986]   ISV = 0, ISS = 0x00000006
[    3.299499]   CM = 0, WnR = 0
[    3.299499] user pgtable: 4k pages, 48-bit VAs, pgdp=00000000b8e1a000
[    3.300695] [0000000000001120] pgd=00000000b8e20003, pud=00000000b8e21003,
pmd=0000000000000000
[    3.301547] Internal error: Oops: 96000006 [#1] SMP
[    3.302058] Modules linked in:
[    3.302634] CPU: 0 PID: 90 Comm: lkvm Not tainted
5.2.0-rc5-b42cb0673478-dirty-4.20-nano-mc-fs-slr-a64-kvm+ #212
[    3.303301] Hardware name: Generated (DT)
[    3.303765] pstate: 62400009 (nZCv daif +PAN -UAO)
[    3.304448] pc : kvm_alloc_stage2_pgd+0x24/0x118
[    3.305131] lr : kvm_arch_init_vm+0xb0/0x138
[    3.305473] sp : ffff000010e0bcc0
[    3.305813] x29: ffff000010e0bcc0 x28: ffff800039b14240
[    3.306495] x27: 0000000000000000 x26: 0000000000000000
[    3.307178] x25: 0000000056000000 x24: 0000000000000003
[    3.307751] x23: 00000000ffffffff x22: ffff000010869920
[    3.308373] x21: ffff00001003e0f8 x20: ffff00001003d000
[    3.309056] x19: ffff00001003e0f8 x18: 0000000000000000
[    3.309568] x17: 0000000000000000 x16: 0000000000000000
[    3.310250] x15: 0000000000000010 x14: ffffffffffffffff
[    3.310933] x13: ffff000090e0ba7f x12: ffff000010e0ba87
[    3.311445] x11: ffff000010879000 x10: ffff000010e0ba20
[    3.312129] x9 : 00000000ffffffd0 x8 : ffff00001044ebf8
[    3.312811] x7 : 000000000000008f x6 : ffff0000108c83b9
[    3.313493] x5 : 000000000000000a x4 : ffff800039a94c80
[    3.314005] x3 : 0000000000000040 x2 : 00000000ffffffff
[    3.314690] x1 : 0000000000000000 x0 : 0000000000000008
[    3.315302] Call trace:
[    3.315712]  kvm_alloc_stage2_pgd+0x24/0x118
[    3.316395]  kvm_arch_init_vm+0xb0/0x138
[    3.316917]  kvm_dev_ioctl+0x160/0x640
[    3.317418]  do_vfs_ioctl+0xa4/0x858
[    3.318101]  ksys_ioctl+0x78/0xa8
[    3.318634]  __arm64_sys_ioctl+0x1c/0x28
[    3.319295]  el0_svc_common.constprop.0+0x88/0x150
[    3.319808]  el0_svc_handler+0x28/0x78
[    3.320320]  el0_svc+0x8/0xc
[    3.321002] Code: b5000720 f9401261 d2800803 d2800100 (f9489021)
[    3.321515] ---[ end trace f37de9a5e8acd1dc ]---
[    3.322027] Kernel panic - not syncing: Fatal exception
[    3.322367] Kernel Offset: disabled
[    3.322882] CPU features: 0x0297,2a00aa38
[    3.323221] Memory Limit: none
[    3.323733] ---[ end Kernel panic - not syncing: Fatal exception ]---

With this change I was able to boot a guest to userspace:

diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index 5d4371633e1c..83253976edd3 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -118,6 +118,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
        for_each_possible_cpu(cpu)
                *per_cpu_ptr(kvm->arch.last_vcpu_ran, cpu) = -1;
 
+       kvm->arch.mmu.kvm = kvm;
        ret = kvm_alloc_stage2_pgd(&kvm->arch.mmu);
        if (ret)
                goto out_fail_alloc;

>  	if (ret)
>  		goto out_fail_alloc;
>  
> +	/* Mark the initial VMID generation invalid */
> +	kvm->arch.mmu.vmid.vmid_gen = 0;
> +	kvm->arch.mmu.kvm = kvm;
> +
>  	ret = create_hyp_mappings(kvm, kvm + 1, PAGE_HYP);
>  	if (ret)
>  		goto out_free_stage2_pgd;
>  
>  	kvm_vgic_early_init(kvm);
>  
> -	/* Mark the initial VMID generation invalid */
> -	kvm->arch.vmid.vmid_gen = 0;
> -
>  	/* The maximum number of VCPUs is limited by the host's GIC model */
>  	kvm->arch.max_vcpus = vgic_present ?
>  				kvm_vgic_get_max_vcpus() : KVM_MAX_VCPUS;
>  
>  	return ret;
>  out_free_stage2_pgd:
> -	kvm_free_stage2_pgd(kvm);
> +	kvm_free_stage2_pgd(&kvm->arch.mmu);
>  out_fail_alloc:
>  	free_percpu(kvm->arch.last_vcpu_ran);
>  	kvm->arch.last_vcpu_ran = NULL;
> @@ -342,6 +343,8 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
>  
>  	kvm_arm_reset_debug_ptr(vcpu);
>  
> +	vcpu->arch.hw_mmu = &vcpu->kvm->arch.mmu;
> +
>  	return kvm_vgic_vcpu_init(vcpu);
>  }
>  
> @@ -682,7 +685,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  		 */
>  		cond_resched();
>  
> -		update_vmid(&vcpu->kvm->arch.vmid);
> +		update_vmid(&vcpu->arch.hw_mmu->vmid);
>  
>  		check_vcpu_requests(vcpu);
>  
> @@ -731,7 +734,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  		 */
>  		smp_store_mb(vcpu->mode, IN_GUEST_MODE);
>  
> -		if (ret <= 0 || need_new_vmid_gen(&vcpu->kvm->arch.vmid) ||
> +		if (ret <= 0 || need_new_vmid_gen(&vcpu->arch.hw_mmu->vmid) ||
>  		    kvm_request_pending(vcpu)) {
>  			vcpu->mode = OUTSIDE_GUEST_MODE;
>  			isb(); /* Ensure work in x_flush_hwstate is committed */
> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> index 198e5171e1f7..bb1be4ea55ec 100644
> --- a/virt/kvm/arm/mmu.c
> +++ b/virt/kvm/arm/mmu.c
> @@ -51,12 +51,12 @@ static bool memslot_is_logging(struct kvm_memory_slot *memslot)
>   */
>  void kvm_flush_remote_tlbs(struct kvm *kvm)
>  {
> -	kvm_call_hyp(__kvm_tlb_flush_vmid, kvm);
> +	kvm_call_hyp(__kvm_tlb_flush_vmid, &kvm->arch.mmu);
>  }
>  
> -static void kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa)
> +static void kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu, phys_addr_t ipa)
>  {
> -	kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, kvm, ipa);
> +	kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, ipa);
>  }
>  
>  /*
> @@ -92,31 +92,33 @@ static bool kvm_is_device_pfn(unsigned long pfn)
>   *
>   * Function clears a PMD entry, flushes addr 1st and 2nd stage TLBs.
>   */
> -static void stage2_dissolve_pmd(struct kvm *kvm, phys_addr_t addr, pmd_t *pmd)
> +static void stage2_dissolve_pmd(struct kvm_s2_mmu *mmu, phys_addr_t addr, pmd_t *pmd)
>  {
>  	if (!pmd_thp_or_huge(*pmd))
>  		return;
>  
>  	pmd_clear(pmd);
> -	kvm_tlb_flush_vmid_ipa(kvm, addr);
> +	kvm_tlb_flush_vmid_ipa(mmu, addr);
>  	put_page(virt_to_page(pmd));
>  }
>  
>  /**
>   * stage2_dissolve_pud() - clear and flush huge PUD entry
> - * @kvm:	pointer to kvm structure.
> + * @mmu:	pointer to mmu structure to operate on
>   * @addr:	IPA
>   * @pud:	pud pointer for IPA
>   *
>   * Function clears a PUD entry, flushes addr 1st and 2nd stage TLBs.
>   */
> -static void stage2_dissolve_pud(struct kvm *kvm, phys_addr_t addr, pud_t *pudp)
> +static void stage2_dissolve_pud(struct kvm_s2_mmu *mmu, phys_addr_t addr, pud_t *pudp)
>  {
> +	struct kvm *kvm __maybe_unused = mmu->kvm;
> +
>  	if (!stage2_pud_huge(kvm, *pudp))
>  		return;
>  
>  	stage2_pud_clear(kvm, pudp);
> -	kvm_tlb_flush_vmid_ipa(kvm, addr);
> +	kvm_tlb_flush_vmid_ipa(mmu, addr);
>  	put_page(virt_to_page(pudp));
>  }
>  
> @@ -152,31 +154,35 @@ static void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc)
>  	return p;
>  }
>  
> -static void clear_stage2_pgd_entry(struct kvm *kvm, pgd_t *pgd, phys_addr_t addr)
> +static void clear_stage2_pgd_entry(struct kvm_s2_mmu *mmu, pgd_t *pgd, phys_addr_t addr)
>  {
> +	struct kvm *kvm __maybe_unused = mmu->kvm;
> +
>  	pud_t *pud_table __maybe_unused = stage2_pud_offset(kvm, pgd, 0UL);
>  	stage2_pgd_clear(kvm, pgd);
> -	kvm_tlb_flush_vmid_ipa(kvm, addr);
> +	kvm_tlb_flush_vmid_ipa(mmu, addr);
>  	stage2_pud_free(kvm, pud_table);
>  	put_page(virt_to_page(pgd));
>  }
>  
> -static void clear_stage2_pud_entry(struct kvm *kvm, pud_t *pud, phys_addr_t addr)
> +static void clear_stage2_pud_entry(struct kvm_s2_mmu *mmu, pud_t *pud, phys_addr_t addr)
>  {
> +	struct kvm *kvm __maybe_unused = mmu->kvm;
> +
>  	pmd_t *pmd_table __maybe_unused = stage2_pmd_offset(kvm, pud, 0);
>  	VM_BUG_ON(stage2_pud_huge(kvm, *pud));
>  	stage2_pud_clear(kvm, pud);
> -	kvm_tlb_flush_vmid_ipa(kvm, addr);
> +	kvm_tlb_flush_vmid_ipa(mmu, addr);
>  	stage2_pmd_free(kvm, pmd_table);
>  	put_page(virt_to_page(pud));
>  }
>  
> -static void clear_stage2_pmd_entry(struct kvm *kvm, pmd_t *pmd, phys_addr_t addr)
> +static void clear_stage2_pmd_entry(struct kvm_s2_mmu *mmu, pmd_t *pmd, phys_addr_t addr)
>  {
>  	pte_t *pte_table = pte_offset_kernel(pmd, 0);
>  	VM_BUG_ON(pmd_thp_or_huge(*pmd));
>  	pmd_clear(pmd);
> -	kvm_tlb_flush_vmid_ipa(kvm, addr);
> +	kvm_tlb_flush_vmid_ipa(mmu, addr);
>  	free_page((unsigned long)pte_table);
>  	put_page(virt_to_page(pmd));
>  }
> @@ -234,7 +240,7 @@ static inline void kvm_pgd_populate(pgd_t *pgdp, pud_t *pudp)
>   * we then fully enforce cacheability of RAM, no matter what the guest
>   * does.
>   */
> -static void unmap_stage2_ptes(struct kvm *kvm, pmd_t *pmd,
> +static void unmap_stage2_ptes(struct kvm_s2_mmu *mmu, pmd_t *pmd,
>  		       phys_addr_t addr, phys_addr_t end)
>  {
>  	phys_addr_t start_addr = addr;
> @@ -246,7 +252,7 @@ static void unmap_stage2_ptes(struct kvm *kvm, pmd_t *pmd,
>  			pte_t old_pte = *pte;
>  
>  			kvm_set_pte(pte, __pte(0));
> -			kvm_tlb_flush_vmid_ipa(kvm, addr);
> +			kvm_tlb_flush_vmid_ipa(mmu, addr);
>  
>  			/* No need to invalidate the cache for device mappings */
>  			if (!kvm_is_device_pfn(pte_pfn(old_pte)))
> @@ -256,13 +262,14 @@ static void unmap_stage2_ptes(struct kvm *kvm, pmd_t *pmd,
>  		}
>  	} while (pte++, addr += PAGE_SIZE, addr != end);
>  
> -	if (stage2_pte_table_empty(kvm, start_pte))
> -		clear_stage2_pmd_entry(kvm, pmd, start_addr);
> +	if (stage2_pte_table_empty(mmu->kvm, start_pte))
> +		clear_stage2_pmd_entry(mmu, pmd, start_addr);
>  }
>  
> -static void unmap_stage2_pmds(struct kvm *kvm, pud_t *pud,
> +static void unmap_stage2_pmds(struct kvm_s2_mmu *mmu, pud_t *pud,
>  		       phys_addr_t addr, phys_addr_t end)
>  {
> +	struct kvm *kvm __maybe_unused = mmu->kvm;
>  	phys_addr_t next, start_addr = addr;
>  	pmd_t *pmd, *start_pmd;
>  
> @@ -274,24 +281,25 @@ static void unmap_stage2_pmds(struct kvm *kvm, pud_t *pud,
>  				pmd_t old_pmd = *pmd;
>  
>  				pmd_clear(pmd);
> -				kvm_tlb_flush_vmid_ipa(kvm, addr);
> +				kvm_tlb_flush_vmid_ipa(mmu, addr);
>  
>  				kvm_flush_dcache_pmd(old_pmd);
>  
>  				put_page(virt_to_page(pmd));
>  			} else {
> -				unmap_stage2_ptes(kvm, pmd, addr, next);
> +				unmap_stage2_ptes(mmu, pmd, addr, next);
>  			}
>  		}
>  	} while (pmd++, addr = next, addr != end);
>  
>  	if (stage2_pmd_table_empty(kvm, start_pmd))
> -		clear_stage2_pud_entry(kvm, pud, start_addr);
> +		clear_stage2_pud_entry(mmu, pud, start_addr);
>  }
>  
> -static void unmap_stage2_puds(struct kvm *kvm, pgd_t *pgd,
> +static void unmap_stage2_puds(struct kvm_s2_mmu *mmu, pgd_t *pgd,
>  		       phys_addr_t addr, phys_addr_t end)
>  {
> +	struct kvm *kvm __maybe_unused = mmu->kvm;
>  	phys_addr_t next, start_addr = addr;
>  	pud_t *pud, *start_pud;
>  
> @@ -303,17 +311,17 @@ static void unmap_stage2_puds(struct kvm *kvm, pgd_t *pgd,
>  				pud_t old_pud = *pud;
>  
>  				stage2_pud_clear(kvm, pud);
> -				kvm_tlb_flush_vmid_ipa(kvm, addr);
> +				kvm_tlb_flush_vmid_ipa(mmu, addr);
>  				kvm_flush_dcache_pud(old_pud);
>  				put_page(virt_to_page(pud));
>  			} else {
> -				unmap_stage2_pmds(kvm, pud, addr, next);
> +				unmap_stage2_pmds(mmu, pud, addr, next);
>  			}
>  		}
>  	} while (pud++, addr = next, addr != end);
>  
>  	if (stage2_pud_table_empty(kvm, start_pud))
> -		clear_stage2_pgd_entry(kvm, pgd, start_addr);
> +		clear_stage2_pgd_entry(mmu, pgd, start_addr);
>  }
>  
>  /**
> @@ -327,8 +335,9 @@ static void unmap_stage2_puds(struct kvm *kvm, pgd_t *pgd,
>   * destroying the VM), otherwise another faulting VCPU may come in and mess
>   * with things behind our backs.
>   */
> -static void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size)
> +static void unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size)

I'm nitpicking here, but this line is longer than 80 characters.
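
E.g. something like:

-static void unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size)
+static void unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start,
+			       u64 size)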

>  {
> +	struct kvm *kvm = mmu->kvm;
>  	pgd_t *pgd;
>  	phys_addr_t addr = start, end = start + size;
>  	phys_addr_t next;
> @@ -336,18 +345,18 @@ static void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size)
>  	assert_spin_locked(&kvm->mmu_lock);
>  	WARN_ON(size & ~PAGE_MASK);
>  
> -	pgd = kvm->arch.pgd + stage2_pgd_index(kvm, addr);
> +	pgd = mmu->pgd + stage2_pgd_index(kvm, addr);
>  	do {
>  		/*
>  		 * Make sure the page table is still active, as another thread
>  		 * could have possibly freed the page table, while we released
>  		 * the lock.
>  		 */
> -		if (!READ_ONCE(kvm->arch.pgd))
> +		if (!READ_ONCE(mmu->pgd))
>  			break;
>  		next = stage2_pgd_addr_end(kvm, addr, end);
>  		if (!stage2_pgd_none(kvm, *pgd))
> -			unmap_stage2_puds(kvm, pgd, addr, next);
> +			unmap_stage2_puds(mmu, pgd, addr, next);
>  		/*
>  		 * If the range is too large, release the kvm->mmu_lock
>  		 * to prevent starvation and lockup detector warnings.
> @@ -357,7 +366,7 @@ static void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size)
>  	} while (pgd++, addr = next, addr != end);
>  }
>  
> -static void stage2_flush_ptes(struct kvm *kvm, pmd_t *pmd,
> +static void stage2_flush_ptes(struct kvm_s2_mmu *mmu, pmd_t *pmd,
>  			      phys_addr_t addr, phys_addr_t end)
>  {
>  	pte_t *pte;
> @@ -369,9 +378,10 @@ static void stage2_flush_ptes(struct kvm *kvm, pmd_t *pmd,
>  	} while (pte++, addr += PAGE_SIZE, addr != end);
>  }
>  
> -static void stage2_flush_pmds(struct kvm *kvm, pud_t *pud,
> +static void stage2_flush_pmds(struct kvm_s2_mmu *mmu, pud_t *pud,
>  			      phys_addr_t addr, phys_addr_t end)
>  {
> +	struct kvm *kvm = mmu->kvm;
>  	pmd_t *pmd;
>  	phys_addr_t next;
>  
> @@ -382,14 +392,15 @@ static void stage2_flush_pmds(struct kvm *kvm, pud_t *pud,
>  			if (pmd_thp_or_huge(*pmd))
>  				kvm_flush_dcache_pmd(*pmd);
>  			else
> -				stage2_flush_ptes(kvm, pmd, addr, next);
> +				stage2_flush_ptes(mmu, pmd, addr, next);
>  		}
>  	} while (pmd++, addr = next, addr != end);
>  }
>  
> -static void stage2_flush_puds(struct kvm *kvm, pgd_t *pgd,
> +static void stage2_flush_puds(struct kvm_s2_mmu *mmu, pgd_t *pgd,
>  			      phys_addr_t addr, phys_addr_t end)
>  {
> +	struct kvm *kvm __maybe_unused = mmu->kvm;
>  	pud_t *pud;
>  	phys_addr_t next;
>  
> @@ -400,24 +411,25 @@ static void stage2_flush_puds(struct kvm *kvm, pgd_t *pgd,
>  			if (stage2_pud_huge(kvm, *pud))
>  				kvm_flush_dcache_pud(*pud);
>  			else
> -				stage2_flush_pmds(kvm, pud, addr, next);
> +				stage2_flush_pmds(mmu, pud, addr, next);
>  		}
>  	} while (pud++, addr = next, addr != end);
>  }
>  
> -static void stage2_flush_memslot(struct kvm *kvm,
> +static void stage2_flush_memslot(struct kvm_s2_mmu *mmu,
>  				 struct kvm_memory_slot *memslot)
>  {
> +	struct kvm *kvm = mmu->kvm;
>  	phys_addr_t addr = memslot->base_gfn << PAGE_SHIFT;
>  	phys_addr_t end = addr + PAGE_SIZE * memslot->npages;
>  	phys_addr_t next;
>  	pgd_t *pgd;
>  
> -	pgd = kvm->arch.pgd + stage2_pgd_index(kvm, addr);
> +	pgd = mmu->pgd + stage2_pgd_index(kvm, addr);
>  	do {
>  		next = stage2_pgd_addr_end(kvm, addr, end);
>  		if (!stage2_pgd_none(kvm, *pgd))
> -			stage2_flush_puds(kvm, pgd, addr, next);
> +			stage2_flush_puds(mmu, pgd, addr, next);
>  	} while (pgd++, addr = next, addr != end);
>  }
>  
> @@ -439,7 +451,7 @@ static void stage2_flush_vm(struct kvm *kvm)
>  
>  	slots = kvm_memslots(kvm);
>  	kvm_for_each_memslot(memslot, slots)
> -		stage2_flush_memslot(kvm, memslot);
> +		stage2_flush_memslot(&kvm->arch.mmu, memslot);
>  
>  	spin_unlock(&kvm->mmu_lock);
>  	srcu_read_unlock(&kvm->srcu, idx);
> @@ -883,35 +895,35 @@ int create_hyp_exec_mappings(phys_addr_t phys_addr, size_t size,
>  
>  /**
>   * kvm_alloc_stage2_pgd - allocate level-1 table for stage-2 translation.
> - * @kvm:	The KVM struct pointer for the VM.
> + * @mmu:	The stage 2 mmu struct pointer
>   *
>   * Allocates only the stage-2 HW PGD level table(s) of size defined by
> - * stage2_pgd_size(kvm).
> + * stage2_pgd_size(mmu->kvm).
>   *
>   * Note we don't need locking here as this is only called when the VM is
>   * created, which can only be done once.
>   */
> -int kvm_alloc_stage2_pgd(struct kvm *kvm)
> +int kvm_alloc_stage2_pgd(struct kvm_s2_mmu *mmu)
>  {
>  	phys_addr_t pgd_phys;
>  	pgd_t *pgd;
>  
> -	if (kvm->arch.pgd != NULL) {
> +	if (mmu->pgd != NULL) {
>  		kvm_err("kvm_arch already initialized?\n");
>  		return -EINVAL;
>  	}
>  
>  	/* Allocate the HW PGD, making sure that each page gets its own refcount */
> -	pgd = alloc_pages_exact(stage2_pgd_size(kvm), GFP_KERNEL | __GFP_ZERO);
> +	pgd = alloc_pages_exact(stage2_pgd_size(mmu->kvm), GFP_KERNEL | __GFP_ZERO);
>  	if (!pgd)
>  		return -ENOMEM;
>  
>  	pgd_phys = virt_to_phys(pgd);
> -	if (WARN_ON(pgd_phys & ~kvm_vttbr_baddr_mask(kvm)))
> +	if (WARN_ON(pgd_phys & ~kvm_vttbr_baddr_mask(mmu->kvm)))
>  		return -EINVAL;
>  
> -	kvm->arch.pgd = pgd;
> -	kvm->arch.pgd_phys = pgd_phys;
> +	mmu->pgd = pgd;
> +	mmu->pgd_phys = pgd_phys;
>  	return 0;
>  }
>  
> @@ -950,7 +962,7 @@ static void stage2_unmap_memslot(struct kvm *kvm,
>  
>  		if (!(vma->vm_flags & VM_PFNMAP)) {
>  			gpa_t gpa = addr + (vm_start - memslot->userspace_addr);
> -			unmap_stage2_range(kvm, gpa, vm_end - vm_start);
> +			unmap_stage2_range(&kvm->arch.mmu, gpa, vm_end - vm_start);
>  		}
>  		hva = vm_end;
>  	} while (hva < reg_end);
> @@ -982,24 +994,16 @@ void stage2_unmap_vm(struct kvm *kvm)
>  	srcu_read_unlock(&kvm->srcu, idx);
>  }
>  
> -/**
> - * kvm_free_stage2_pgd - free all stage-2 tables
> - * @kvm:	The KVM struct pointer for the VM.
> - *
> - * Walks the level-1 page table pointed to by kvm->arch.pgd and frees all
> - * underlying level-2 and level-3 tables before freeing the actual level-1 table
> - * and setting the struct pointer to NULL.
> - */
> -void kvm_free_stage2_pgd(struct kvm *kvm)
> +void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu)
>  {
> +	struct kvm *kvm = mmu->kvm;
>  	void *pgd = NULL;
>  
>  	spin_lock(&kvm->mmu_lock);
> -	if (kvm->arch.pgd) {
> -		unmap_stage2_range(kvm, 0, kvm_phys_size(kvm));
> -		pgd = READ_ONCE(kvm->arch.pgd);
> -		kvm->arch.pgd = NULL;
> -		kvm->arch.pgd_phys = 0;
> +	if (mmu->pgd) {
> +		unmap_stage2_range(mmu, 0, kvm_phys_size(kvm));
> +		pgd = READ_ONCE(mmu->pgd);
> +		mmu->pgd = NULL;
>  	}
>  	spin_unlock(&kvm->mmu_lock);
>  
> @@ -1008,13 +1012,14 @@ void kvm_free_stage2_pgd(struct kvm *kvm)
>  		free_pages_exact(pgd, stage2_pgd_size(kvm));
>  }
>  
> -static pud_t *stage2_get_pud(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
> +static pud_t *stage2_get_pud(struct kvm_s2_mmu *mmu, struct kvm_mmu_memory_cache *cache,

This line is also longer than 80 characters. I'm bringing it up because in other
places you have tried not to go over the 80-character limit.

>  			     phys_addr_t addr)
>  {
> +	struct kvm *kvm __maybe_unused = mmu->kvm;
>  	pgd_t *pgd;
>  	pud_t *pud;
>  
> -	pgd = kvm->arch.pgd + stage2_pgd_index(kvm, addr);
> +	pgd = mmu->pgd + stage2_pgd_index(kvm, addr);
>  	if (stage2_pgd_none(kvm, *pgd)) {
>  		if (!cache)
>  			return NULL;
> @@ -1026,13 +1031,14 @@ static pud_t *stage2_get_pud(struct kvm *kvm, struct kvm_mmu_memory_cache *cache
>  	return stage2_pud_offset(kvm, pgd, addr);
>  }
>  
> -static pmd_t *stage2_get_pmd(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
> +static pmd_t *stage2_get_pmd(struct kvm_s2_mmu *mmu, struct kvm_mmu_memory_cache *cache,

Same here.

>  			     phys_addr_t addr)
>  {
> +	struct kvm *kvm __maybe_unused = mmu->kvm;
>  	pud_t *pud;
>  	pmd_t *pmd;
>  
> -	pud = stage2_get_pud(kvm, cache, addr);
> +	pud = stage2_get_pud(mmu, cache, addr);
>  	if (!pud || stage2_pud_huge(kvm, *pud))
>  		return NULL;
>  
> @@ -1047,13 +1053,14 @@ static pmd_t *stage2_get_pmd(struct kvm *kvm, struct kvm_mmu_memory_cache *cache
>  	return stage2_pmd_offset(kvm, pud, addr);
>  }
>  
> -static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache
> -			       *cache, phys_addr_t addr, const pmd_t *new_pmd)
> +static int stage2_set_pmd_huge(struct kvm_s2_mmu *mmu,
> +			       struct kvm_mmu_memory_cache *cache,
> +			       phys_addr_t addr, const pmd_t *new_pmd)
>  {
>  	pmd_t *pmd, old_pmd;
>  
>  retry:
> -	pmd = stage2_get_pmd(kvm, cache, addr);
> +	pmd = stage2_get_pmd(mmu, cache, addr);
>  	VM_BUG_ON(!pmd);
>  
>  	old_pmd = *pmd;
> @@ -1086,7 +1093,7 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache
>  		 * get handled accordingly.
>  		 */
>  		if (!pmd_thp_or_huge(old_pmd)) {
> -			unmap_stage2_range(kvm, addr & S2_PMD_MASK, S2_PMD_SIZE);
> +			unmap_stage2_range(mmu, addr & S2_PMD_MASK, S2_PMD_SIZE);
>  			goto retry;
>  		}
>  		/*
> @@ -1102,7 +1109,7 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache
>  		 */
>  		WARN_ON_ONCE(pmd_pfn(old_pmd) != pmd_pfn(*new_pmd));
>  		pmd_clear(pmd);
> -		kvm_tlb_flush_vmid_ipa(kvm, addr);
> +		kvm_tlb_flush_vmid_ipa(mmu, addr);
>  	} else {
>  		get_page(virt_to_page(pmd));
>  	}
> @@ -1111,13 +1118,15 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache
>  	return 0;
>  }
>  
> -static int stage2_set_pud_huge(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
> +static int stage2_set_pud_huge(struct kvm_s2_mmu *mmu,
> +			       struct kvm_mmu_memory_cache *cache,
>  			       phys_addr_t addr, const pud_t *new_pudp)
>  {
> +	struct kvm *kvm __maybe_unused = mmu->kvm;
>  	pud_t *pudp, old_pud;
>  
>  retry:
> -	pudp = stage2_get_pud(kvm, cache, addr);
> +	pudp = stage2_get_pud(mmu, cache, addr);
>  	VM_BUG_ON(!pudp);
>  
>  	old_pud = *pudp;
> @@ -1136,13 +1145,13 @@ static int stage2_set_pud_huge(struct kvm *kvm, struct kvm_mmu_memory_cache *cac
>  		 * the range for this block and retry.
>  		 */
>  		if (!stage2_pud_huge(kvm, old_pud)) {
> -			unmap_stage2_range(kvm, addr & S2_PUD_MASK, S2_PUD_SIZE);
> +			unmap_stage2_range(mmu, addr & S2_PUD_MASK, S2_PUD_SIZE);
>  			goto retry;
>  		}
>  
>  		WARN_ON_ONCE(kvm_pud_pfn(old_pud) != kvm_pud_pfn(*new_pudp));
>  		stage2_pud_clear(kvm, pudp);
> -		kvm_tlb_flush_vmid_ipa(kvm, addr);
> +		kvm_tlb_flush_vmid_ipa(mmu, addr);
>  	} else {
>  		get_page(virt_to_page(pudp));
>  	}
> @@ -1157,9 +1166,10 @@ static int stage2_set_pud_huge(struct kvm *kvm, struct kvm_mmu_memory_cache *cac
>   * leaf-entry is returned in the appropriate level variable - pudpp,
>   * pmdpp, ptepp.
>   */
> -static bool stage2_get_leaf_entry(struct kvm *kvm, phys_addr_t addr,
> +static bool stage2_get_leaf_entry(struct kvm_s2_mmu *mmu, phys_addr_t addr,
>  				  pud_t **pudpp, pmd_t **pmdpp, pte_t **ptepp)
>  {
> +	struct kvm *kvm __maybe_unused = mmu->kvm;
>  	pud_t *pudp;
>  	pmd_t *pmdp;
>  	pte_t *ptep;
> @@ -1168,7 +1178,7 @@ static bool stage2_get_leaf_entry(struct kvm *kvm, phys_addr_t addr,
>  	*pmdpp = NULL;
>  	*ptepp = NULL;
>  
> -	pudp = stage2_get_pud(kvm, NULL, addr);
> +	pudp = stage2_get_pud(mmu, NULL, addr);
>  	if (!pudp || stage2_pud_none(kvm, *pudp) || !stage2_pud_present(kvm, *pudp))
>  		return false;
>  
> @@ -1194,14 +1204,14 @@ static bool stage2_get_leaf_entry(struct kvm *kvm, phys_addr_t addr,
>  	return true;
>  }
>  
> -static bool stage2_is_exec(struct kvm *kvm, phys_addr_t addr)
> +static bool stage2_is_exec(struct kvm_s2_mmu *mmu, phys_addr_t addr)
>  {
>  	pud_t *pudp;
>  	pmd_t *pmdp;
>  	pte_t *ptep;
>  	bool found;
>  
> -	found = stage2_get_leaf_entry(kvm, addr, &pudp, &pmdp, &ptep);
> +	found = stage2_get_leaf_entry(mmu, addr, &pudp, &pmdp, &ptep);
>  	if (!found)
>  		return false;
>  
> @@ -1213,10 +1223,12 @@ static bool stage2_is_exec(struct kvm *kvm, phys_addr_t addr)
>  		return kvm_s2pte_exec(ptep);
>  }
>  
> -static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
> +static int stage2_set_pte(struct kvm_s2_mmu *mmu,
> +			  struct kvm_mmu_memory_cache *cache,
>  			  phys_addr_t addr, const pte_t *new_pte,
>  			  unsigned long flags)
>  {
> +	struct kvm *kvm __maybe_unused = mmu->kvm;
>  	pud_t *pud;
>  	pmd_t *pmd;
>  	pte_t *pte, old_pte;
> @@ -1226,7 +1238,7 @@ static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
>  	VM_BUG_ON(logging_active && !cache);
>  
>  	/* Create stage-2 page table mapping - Levels 0 and 1 */
> -	pud = stage2_get_pud(kvm, cache, addr);
> +	pud = stage2_get_pud(mmu, cache, addr);
>  	if (!pud) {
>  		/*
>  		 * Ignore calls from kvm_set_spte_hva for unallocated
> @@ -1240,7 +1252,7 @@ static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
>  	 * on to allocate page.
>  	 */
>  	if (logging_active)
> -		stage2_dissolve_pud(kvm, addr, pud);
> +		stage2_dissolve_pud(mmu, addr, pud);
>  
>  	if (stage2_pud_none(kvm, *pud)) {
>  		if (!cache)
> @@ -1264,7 +1276,7 @@ static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
>  	 * allocate page.
>  	 */
>  	if (logging_active)
> -		stage2_dissolve_pmd(kvm, addr, pmd);
> +		stage2_dissolve_pmd(mmu, addr, pmd);
>  
>  	/* Create stage-2 page mappings - Level 2 */
>  	if (pmd_none(*pmd)) {
> @@ -1288,7 +1300,7 @@ static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
>  			return 0;
>  
>  		kvm_set_pte(pte, __pte(0));
> -		kvm_tlb_flush_vmid_ipa(kvm, addr);
> +		kvm_tlb_flush_vmid_ipa(mmu, addr);
>  	} else {
>  		get_page(virt_to_page(pte));
>  	}
> @@ -1354,8 +1366,8 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
>  		if (ret)
>  			goto out;
>  		spin_lock(&kvm->mmu_lock);
> -		ret = stage2_set_pte(kvm, &cache, addr, &pte,
> -						KVM_S2PTE_FLAG_IS_IOMAP);
> +		ret = stage2_set_pte(&kvm->arch.mmu, &cache, addr, &pte,
> +				     KVM_S2PTE_FLAG_IS_IOMAP);
>  		spin_unlock(&kvm->mmu_lock);
>  		if (ret)
>  			goto out;
> @@ -1441,9 +1453,10 @@ static void stage2_wp_ptes(pmd_t *pmd, phys_addr_t addr, phys_addr_t end)
>   * @addr:	range start address
>   * @end:	range end address
>   */
> -static void stage2_wp_pmds(struct kvm *kvm, pud_t *pud,
> +static void stage2_wp_pmds(struct kvm_s2_mmu *mmu, pud_t *pud,
>  			   phys_addr_t addr, phys_addr_t end)
>  {
> +	struct kvm *kvm = mmu->kvm;
>  	pmd_t *pmd;
>  	phys_addr_t next;
>  
> @@ -1463,14 +1476,15 @@ static void stage2_wp_pmds(struct kvm *kvm, pud_t *pud,
>  }
>  
>  /**
> - * stage2_wp_puds - write protect PGD range
> - * @pgd:	pointer to pgd entry
> - * @addr:	range start address
> - * @end:	range end address
> - */
> -static void  stage2_wp_puds(struct kvm *kvm, pgd_t *pgd,
> +  * stage2_wp_puds - write protect PGD range
> +  * @pgd:	pointer to pgd entry
> +  * @addr:	range start address
> +  * @end:	range end address
> +  */
> +static void  stage2_wp_puds(struct kvm_s2_mmu *mmu, pgd_t *pgd,
>  			    phys_addr_t addr, phys_addr_t end)
>  {
> +	struct kvm *kvm __maybe_unused = mmu->kvm;
>  	pud_t *pud;
>  	phys_addr_t next;
>  
> @@ -1482,7 +1496,7 @@ static void  stage2_wp_puds(struct kvm *kvm, pgd_t *pgd,
>  				if (!kvm_s2pud_readonly(pud))
>  					kvm_set_s2pud_readonly(pud);
>  			} else {
> -				stage2_wp_pmds(kvm, pud, addr, next);
> +				stage2_wp_pmds(mmu, pud, addr, next);
>  			}
>  		}
>  	} while (pud++, addr = next, addr != end);
> @@ -1494,12 +1508,13 @@ static void  stage2_wp_puds(struct kvm *kvm, pgd_t *pgd,
>   * @addr:	Start address of range
>   * @end:	End address of range
>   */
> -static void stage2_wp_range(struct kvm *kvm, phys_addr_t addr, phys_addr_t end)
> +static void stage2_wp_range(struct kvm_s2_mmu *mmu, phys_addr_t addr, phys_addr_t end)

Same here.

>  {
> +	struct kvm *kvm = mmu->kvm;
>  	pgd_t *pgd;
>  	phys_addr_t next;
>  
> -	pgd = kvm->arch.pgd + stage2_pgd_index(kvm, addr);
> +	pgd = mmu->pgd + stage2_pgd_index(kvm, addr);
>  	do {
>  		/*
>  		 * Release kvm_mmu_lock periodically if the memory region is
> @@ -1511,11 +1526,11 @@ static void stage2_wp_range(struct kvm *kvm, phys_addr_t addr, phys_addr_t end)
>  		 * the lock.
>  		 */
>  		cond_resched_lock(&kvm->mmu_lock);
> -		if (!READ_ONCE(kvm->arch.pgd))
> +		if (!READ_ONCE(mmu->pgd))
>  			break;
>  		next = stage2_pgd_addr_end(kvm, addr, end);
>  		if (stage2_pgd_present(kvm, *pgd))
> -			stage2_wp_puds(kvm, pgd, addr, next);
> +			stage2_wp_puds(mmu, pgd, addr, next);
>  	} while (pgd++, addr = next, addr != end);
>  }
>  
> @@ -1540,7 +1555,7 @@ void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot)
>  	phys_addr_t end = (memslot->base_gfn + memslot->npages) << PAGE_SHIFT;
>  
>  	spin_lock(&kvm->mmu_lock);
> -	stage2_wp_range(kvm, start, end);
> +	stage2_wp_range(&kvm->arch.mmu, start, end);
>  	spin_unlock(&kvm->mmu_lock);
>  	kvm_flush_remote_tlbs(kvm);
>  }
> @@ -1564,7 +1579,7 @@ static void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
>  	phys_addr_t start = (base_gfn +  __ffs(mask)) << PAGE_SHIFT;
>  	phys_addr_t end = (base_gfn + __fls(mask) + 1) << PAGE_SHIFT;
>  
> -	stage2_wp_range(kvm, start, end);
> +	stage2_wp_range(&kvm->arch.mmu, start, end);
>  }
>  
>  /*
> @@ -1677,6 +1692,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	pgprot_t mem_type = PAGE_S2;
>  	bool logging_active = memslot_is_logging(memslot);
>  	unsigned long vma_pagesize, flags = 0;
> +	struct kvm_s2_mmu *mmu = vcpu->arch.hw_mmu;
>  
>  	write_fault = kvm_is_write_fault(vcpu);
>  	exec_fault = kvm_vcpu_trap_is_iabt(vcpu);
> @@ -1796,7 +1812,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	 * execute permissions, and we preserve whatever we have.
>  	 */
>  	needs_exec = exec_fault ||
> -		(fault_status == FSC_PERM && stage2_is_exec(kvm, fault_ipa));
> +		(fault_status == FSC_PERM && stage2_is_exec(mmu, fault_ipa));
>  
>  	if (vma_pagesize == PUD_SIZE) {
>  		pud_t new_pud = kvm_pfn_pud(pfn, mem_type);
> @@ -1808,7 +1824,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  		if (needs_exec)
>  			new_pud = kvm_s2pud_mkexec(new_pud);
>  
> -		ret = stage2_set_pud_huge(kvm, memcache, fault_ipa, &new_pud);
> +		ret = stage2_set_pud_huge(mmu, memcache, fault_ipa, &new_pud);
>  	} else if (vma_pagesize == PMD_SIZE) {
>  		pmd_t new_pmd = kvm_pfn_pmd(pfn, mem_type);
>  
> @@ -1820,7 +1836,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  		if (needs_exec)
>  			new_pmd = kvm_s2pmd_mkexec(new_pmd);
>  
> -		ret = stage2_set_pmd_huge(kvm, memcache, fault_ipa, &new_pmd);
> +		ret = stage2_set_pmd_huge(mmu, memcache, fault_ipa, &new_pmd);
>  	} else {
>  		pte_t new_pte = kvm_pfn_pte(pfn, mem_type);
>  
> @@ -1832,7 +1848,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  		if (needs_exec)
>  			new_pte = kvm_s2pte_mkexec(new_pte);
>  
> -		ret = stage2_set_pte(kvm, memcache, fault_ipa, &new_pte, flags);
> +		ret = stage2_set_pte(mmu, memcache, fault_ipa, &new_pte, flags);
>  	}
>  
>  out_unlock:
> @@ -1861,7 +1877,7 @@ static void handle_access_fault(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
>  
>  	spin_lock(&vcpu->kvm->mmu_lock);
>  
> -	if (!stage2_get_leaf_entry(vcpu->kvm, fault_ipa, &pud, &pmd, &pte))
> +	if (!stage2_get_leaf_entry(vcpu->arch.hw_mmu, fault_ipa, &pud, &pmd, &pte))
>  		goto out;
>  
>  	if (pud) {		/* HugeTLB */
> @@ -2031,14 +2047,14 @@ static int handle_hva_to_gpa(struct kvm *kvm,
>  
>  static int kvm_unmap_hva_handler(struct kvm *kvm, gpa_t gpa, u64 size, void *data)
>  {
> -	unmap_stage2_range(kvm, gpa, size);
> +	unmap_stage2_range(&kvm->arch.mmu, gpa, size);
>  	return 0;
>  }
>  
>  int kvm_unmap_hva_range(struct kvm *kvm,
>  			unsigned long start, unsigned long end)
>  {
> -	if (!kvm->arch.pgd)
> +	if (!kvm->arch.mmu.pgd)
>  		return 0;
>  
>  	trace_kvm_unmap_hva_range(start, end);
> @@ -2058,7 +2074,7 @@ static int kvm_set_spte_handler(struct kvm *kvm, gpa_t gpa, u64 size, void *data
>  	 * therefore stage2_set_pte() never needs to clear out a huge PMD
>  	 * through this calling path.
>  	 */
> -	stage2_set_pte(kvm, NULL, gpa, pte, 0);
> +	stage2_set_pte(&kvm->arch.mmu, NULL, gpa, pte, 0);
>  	return 0;
>  }
>  
> @@ -2069,7 +2085,7 @@ int kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte)
>  	kvm_pfn_t pfn = pte_pfn(pte);
>  	pte_t stage2_pte;
>  
> -	if (!kvm->arch.pgd)
> +	if (!kvm->arch.mmu.pgd)
>  		return 0;
>  
>  	trace_kvm_set_spte_hva(hva);
> @@ -2092,7 +2108,7 @@ static int kvm_age_hva_handler(struct kvm *kvm, gpa_t gpa, u64 size, void *data)
>  	pte_t *pte;
>  
>  	WARN_ON(size != PAGE_SIZE && size != PMD_SIZE && size != PUD_SIZE);
> -	if (!stage2_get_leaf_entry(kvm, gpa, &pud, &pmd, &pte))
> +	if (!stage2_get_leaf_entry(&kvm->arch.mmu, gpa, &pud, &pmd, &pte))
>  		return 0;
>  
>  	if (pud)
> @@ -2110,7 +2126,7 @@ static int kvm_test_age_hva_handler(struct kvm *kvm, gpa_t gpa, u64 size, void *
>  	pte_t *pte;
>  
>  	WARN_ON(size != PAGE_SIZE && size != PMD_SIZE && size != PUD_SIZE);
> -	if (!stage2_get_leaf_entry(kvm, gpa, &pud, &pmd, &pte))
> +	if (!stage2_get_leaf_entry(&kvm->arch.mmu, gpa, &pud, &pmd, &pte))
>  		return 0;
>  
>  	if (pud)
> @@ -2123,7 +2139,7 @@ static int kvm_test_age_hva_handler(struct kvm *kvm, gpa_t gpa, u64 size, void *
>  
>  int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end)
>  {
> -	if (!kvm->arch.pgd)
> +	if (!kvm->arch.mmu.pgd)
>  		return 0;
>  	trace_kvm_age_hva(start, end);
>  	return handle_hva_to_gpa(kvm, start, end, kvm_age_hva_handler, NULL);
> @@ -2131,7 +2147,7 @@ int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end)
>  
>  int kvm_test_age_hva(struct kvm *kvm, unsigned long hva)
>  {
> -	if (!kvm->arch.pgd)
> +	if (!kvm->arch.mmu.pgd)
>  		return 0;
>  	trace_kvm_test_age_hva(hva);
>  	return handle_hva_to_gpa(kvm, hva, hva, kvm_test_age_hva_handler, NULL);
> @@ -2344,9 +2360,9 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
>  
>  	spin_lock(&kvm->mmu_lock);
>  	if (ret)
> -		unmap_stage2_range(kvm, mem->guest_phys_addr, mem->memory_size);
> +		unmap_stage2_range(&kvm->arch.mmu, mem->guest_phys_addr, mem->memory_size);

Same here.

Thanks,

Alex

>  	else
> -		stage2_flush_memslot(kvm, memslot);
> +		stage2_flush_memslot(&kvm->arch.mmu, memslot);
>  	spin_unlock(&kvm->mmu_lock);
>  out:
>  	up_read(&current->mm->mmap_sem);
> @@ -2370,7 +2386,7 @@ void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen)
>  
>  void kvm_arch_flush_shadow_all(struct kvm *kvm)
>  {
> -	kvm_free_stage2_pgd(kvm);
> +	kvm_free_stage2_pgd(&kvm->arch.mmu);
>  }
>  
>  void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
> @@ -2380,7 +2396,7 @@ void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
>  	phys_addr_t size = slot->npages << PAGE_SHIFT;
>  
>  	spin_lock(&kvm->mmu_lock);
> -	unmap_stage2_range(kvm, gpa, size);
> +	unmap_stage2_range(&kvm->arch.mmu, gpa, size);
>  	spin_unlock(&kvm->mmu_lock);
>  }
>  

^ permalink raw reply related	[flat|nested] 176+ messages in thread

* Re: [PATCH 13/59] KVM: arm64: nv: Handle virtual EL2 registers in vcpu_read/write_sys_reg()
  2019-06-25 15:18   ` Alexandru Elisei
  2019-07-01  9:58     ` Alexandru Elisei
@ 2019-07-03 15:59     ` Marc Zyngier
  2019-07-03 16:32       ` Alexandru Elisei
  1 sibling, 1 reply; 176+ messages in thread
From: Marc Zyngier @ 2019-07-03 15:59 UTC (permalink / raw)
  To: Alexandru Elisei, linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Dave Martin

On 25/06/2019 16:18, Alexandru Elisei wrote:
> Hi Marc,
> 
> A question regarding this patch. This patch modifies vcpu_{read,write}_sys_reg
> to handle virtual EL2 registers. However, the file kvm/emulate-nested.c added by
> patch 10/59 "KVM: arm64: nv: Support virtual EL2 exceptions" already uses
> vcpu_{read,write}_sys_reg to access EL2 registers. In my opinion, it doesn't
> really matter which comes first because nested support is only enabled in the
> last patch of the series, but I thought I should bring this up in case it is not
> what you intended.

It doesn't really matter at that stage. The only thing I'm trying to
achieve in the middle of the series is not to break the build, and not
to cause non-NV to fail.

> 
> On 6/21/19 10:37 AM, Marc Zyngier wrote:
>> From: Andre Przywara <andre.przywara@arm.com>
>>
>> KVM internally uses accessor functions when reading or writing the
>> guest's system registers. This takes care of accessing either the stored
>> copy or using the "live" EL1 system registers when the host uses VHE.
>>
>> With the introduction of virtual EL2 we add a bunch of EL2 system
>> registers, which now must also be taken care of:
>> - If the guest is running in vEL2, and we access an EL1 sysreg, we must
>>   revert to the stored version of that, and not use the CPU's copy.
>> - If the guest is running in vEL1, and we access an EL2 sysreg, we must
>>   also use the stored version, since the CPU carries the EL1 copy.
>> - Some EL2 system registers are supposed to affect the current execution
>>   of the system, so we need to put them into their respective EL1
>>   counterparts. For this we need to define a mapping between the two.
>>   This is done using the newly introduced struct el2_sysreg_map.
>> - Some EL2 system registers have a different format than their EL1
>>   counterpart, so we need to translate them before writing them to the
>>   CPU. This is done using an (optional) translate function in the map.
>> - There are the three special registers SP_EL2, SPSR_EL2 and ELR_EL2,
>>   which need some separate handling.
> I see no change in this patch related to SPSR_EL2. Special handling of SPSR_EL2
> is added in the next patch, patch 14/59 "KVM: arm64: nv: Handle SPSR_EL2 specially".

Indeed, this needs rewriting (we ended up splitting the SPSR stuff out
as it was messy and not completely correct). I may take the rest of the
special stuff out as well.

>>
>> All of these cases are now wrapped into the existing accessor functions,
>> so KVM users wouldn't need to care whether they access EL2 or EL1
>> registers and also which state the guest is in.
>>
>> This handles what was formerly known as the "shadow state" dynamically,
>> without requiring a separate copy for each vCPU EL.
>>
>> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>> ---
>>  arch/arm64/include/asm/kvm_emulate.h |   6 +
>>  arch/arm64/include/asm/kvm_host.h    |   5 +
>>  arch/arm64/kvm/sys_regs.c            | 163 +++++++++++++++++++++++++++
>>  3 files changed, 174 insertions(+)
>>
>> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
>> index c43aac5fed69..f37006b6eec4 100644
>> --- a/arch/arm64/include/asm/kvm_emulate.h
>> +++ b/arch/arm64/include/asm/kvm_emulate.h
>> @@ -70,6 +70,12 @@ void kvm_emulate_nested_eret(struct kvm_vcpu *vcpu);
>>  int kvm_inject_nested_sync(struct kvm_vcpu *vcpu, u64 esr_el2);
>>  int kvm_inject_nested_irq(struct kvm_vcpu *vcpu);
>>  
>> +u64 translate_tcr(u64 tcr);
>> +u64 translate_cptr(u64 tcr);
>> +u64 translate_sctlr(u64 tcr);
>> +u64 translate_ttbr0(u64 tcr);
>> +u64 translate_cnthctl(u64 tcr);
>> +
>>  static inline bool vcpu_el1_is_32bit(struct kvm_vcpu *vcpu)
>>  {
>>  	return !(vcpu->arch.hcr_el2 & HCR_RW);
>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>> index 2d4290d2513a..dae9c42a7219 100644
>> --- a/arch/arm64/include/asm/kvm_host.h
>> +++ b/arch/arm64/include/asm/kvm_host.h
>> @@ -217,6 +217,11 @@ enum vcpu_sysreg {
>>  	NR_SYS_REGS	/* Nothing after this line! */
>>  };
>>  
>> +static inline bool sysreg_is_el2(int reg)
>> +{
>> +	return reg >= FIRST_EL2_SYSREG && reg < NR_SYS_REGS;
>> +}
>> +
>>  /* 32bit mapping */
>>  #define c0_MPIDR	(MPIDR_EL1 * 2)	/* MultiProcessor ID Register */
>>  #define c0_CSSELR	(CSSELR_EL1 * 2)/* Cache Size Selection Register */
>> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
>> index 693dd063c9c2..d024114da162 100644
>> --- a/arch/arm64/kvm/sys_regs.c
>> +++ b/arch/arm64/kvm/sys_regs.c
>> @@ -76,11 +76,142 @@ static bool write_to_read_only(struct kvm_vcpu *vcpu,
>>  	return false;
>>  }
>>  
>> +static u64 tcr_el2_ips_to_tcr_el1_ps(u64 tcr_el2)
> The code seems to suggest that you are translating TCR_EL2.PS to TCR_EL1.IPS.
> Perhaps the function should be named tcr_el2_ps_to_tcr_el1_ips?

yup.

>> +{
>> +	return ((tcr_el2 & TCR_EL2_PS_MASK) >> TCR_EL2_PS_SHIFT)
>> +		<< TCR_IPS_SHIFT;
>> +}
>> +
>> +u64 translate_tcr(u64 tcr)
>> +{
>> +	return TCR_EPD1_MASK |				/* disable TTBR1_EL1 */
>> +	       ((tcr & TCR_EL2_TBI) ? TCR_TBI0 : 0) |
>> +	       tcr_el2_ips_to_tcr_el1_ps(tcr) |
>> +	       (tcr & TCR_EL2_TG0_MASK) |
>> +	       (tcr & TCR_EL2_ORGN0_MASK) |
>> +	       (tcr & TCR_EL2_IRGN0_MASK) |
>> +	       (tcr & TCR_EL2_T0SZ_MASK);
>> +}
>> +
>> +u64 translate_cptr(u64 cptr_el2)
> The argument name is not consistent with the other translate_* functions. I
> think it is reasonably obvious that you are translating an EL2 register.

That's pretty much immaterial, and the variable could be called zorglub.
Consistency is good, but I don't think we need to worry about that level
of detail.

>> +{
>> +	u64 cpacr_el1 = 0;
>> +
>> +	if (!(cptr_el2 & CPTR_EL2_TFP))
>> +		cpacr_el1 |= CPACR_EL1_FPEN;
>> +	if (cptr_el2 & CPTR_EL2_TTA)
>> +		cpacr_el1 |= CPACR_EL1_TTA;
>> +	if (!(cptr_el2 & CPTR_EL2_TZ))
>> +		cpacr_el1 |= CPACR_EL1_ZEN;
>> +
>> +	return cpacr_el1;
>> +}
>> +
>> +u64 translate_sctlr(u64 sctlr)
>> +{
>> +	/* Bit 20 is RES1 in SCTLR_EL1, but RES0 in SCTLR_EL2 */
>> +	return sctlr | BIT(20);
>> +}
>> +
>> +u64 translate_ttbr0(u64 ttbr0)
>> +{
>> +	/* Force ASID to 0 (ASID 0 or RES0) */
> Are you forcing ASID to 0 because you are only expecting a non-VHE guest
> hypervisor to access TTBR0_EL2, in which case the architecture says that the
> ASID field is RES0? Is it so unlikely that a VHE guest hypervisor will access
> TTBR0_EL2 directly that it's not worth adding a check for that?

Like all the translate_* functions, this is only called when running a
non-VHE guest so that the EL2 register is translated to the EL1 format.
A VHE guest usually has its sysregs in the EL1 format, and certainly
does for TTBR0_EL2.
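
To make this concrete, the write path ends up doing roughly this (only a
sketch, the actual code in the patch is more involved):

	const struct el2_sysreg_map *el2_reg;

	el2_reg = find_el2_sysreg(nested_sysreg_map, reg);
	if (el2_reg && el2_reg->mapping != __INVALID_SYSREG__) {
		/* A NULL ->translate means the layouts are identical */
		if (el2_reg->translate)
			val = el2_reg->translate(val);
		/* Store into the EL1 counterpart on the CPU */
		reg = el2_reg->mapping;
	}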

>> +	return ttbr0 & ~GENMASK_ULL(63, 48);
>> +}
>> +
>> +u64 translate_cnthctl(u64 cnthctl)
>> +{
>> +	return ((cnthctl & 0x3) << 10) | (cnthctl & 0xfc);
> 
> Patch 16/59 "KVM: arm64: nv: Save/Restore vEL2 sysregs" suggests that you are
> translating CNTHCTL to write it to CNTKCTL_EL1. Looking at ARM DDI 0487D.b,
> CNTKCTL_EL1 has bits 63:10 RES0. I think the correct value should be ((cnthctl &
> 0x3) << 8) | (cnthctl & 0xfc).

Rookie mistake! When HCR_EL2.E2H==1 (which is always the case for NV),
CNTKCTL_EL1 accesses CNTHCTL_EL2. What you have here is the translation
of non-VHE CNTHCTL_EL2 to its VHE equivalent.
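
Spelling out the bit movement (a sketch of my reading of the two layouts):

	/*
	 * Non-VHE bit 0 (EL1PCTEN) moves to bit 10 (EL1PCTEN), and
	 * non-VHE bit 1 (EL1PCEN) to bit 11 (EL1PTEN) of the E2H==1
	 * layout; the event stream controls in bits [7:2] keep their
	 * position.
	 */
	vhe_cnthctl = ((cnthctl & 0x3) << 10) | (cnthctl & 0xfc);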

> 
>> +}
>> +
>> +#define EL2_SYSREG(el2, el1, translate)	\
>> +	[el2 - FIRST_EL2_SYSREG] = { el2, el1, translate }
>> +#define PURE_EL2_SYSREG(el2) \
>> +	[el2 - FIRST_EL2_SYSREG] = { el2,__INVALID_SYSREG__, NULL }
>> +/*
>> + * Associate vEL2 registers to their EL1 counterparts on the CPU.
>> + * The translate function can be NULL, when the register layout is identical.
>> + */
>> +struct el2_sysreg_map {
>> +	int sysreg;	/* EL2 register index into the array above */
>> +	int mapping;	/* associated EL1 register */
>> +	u64 (*translate)(u64 value);
>> +} nested_sysreg_map[NR_SYS_REGS - FIRST_EL2_SYSREG] = {
>> +	PURE_EL2_SYSREG( VPIDR_EL2 ),
>> +	PURE_EL2_SYSREG( VMPIDR_EL2 ),
>> +	PURE_EL2_SYSREG( ACTLR_EL2 ),
>> +	PURE_EL2_SYSREG( HCR_EL2 ),
>> +	PURE_EL2_SYSREG( MDCR_EL2 ),
>> +	PURE_EL2_SYSREG( HSTR_EL2 ),
>> +	PURE_EL2_SYSREG( HACR_EL2 ),
>> +	PURE_EL2_SYSREG( VTTBR_EL2 ),
>> +	PURE_EL2_SYSREG( VTCR_EL2 ),
>> +	PURE_EL2_SYSREG( RVBAR_EL2 ),
>> +	PURE_EL2_SYSREG( RMR_EL2 ),
>> +	PURE_EL2_SYSREG( TPIDR_EL2 ),
>> +	PURE_EL2_SYSREG( CNTVOFF_EL2 ),
>> +	PURE_EL2_SYSREG( CNTHCTL_EL2 ),
>> +	PURE_EL2_SYSREG( HPFAR_EL2 ),
>> +	EL2_SYSREG(      SCTLR_EL2,  SCTLR_EL1,      translate_sctlr ),
>> +	EL2_SYSREG(      CPTR_EL2,   CPACR_EL1,      translate_cptr  ),
>> +	EL2_SYSREG(      TTBR0_EL2,  TTBR0_EL1,      translate_ttbr0 ),
>> +	EL2_SYSREG(      TTBR1_EL2,  TTBR1_EL1,      NULL            ),
>> +	EL2_SYSREG(      TCR_EL2,    TCR_EL1,        translate_tcr   ),
>> +	EL2_SYSREG(      VBAR_EL2,   VBAR_EL1,       NULL            ),
>> +	EL2_SYSREG(      AFSR0_EL2,  AFSR0_EL1,      NULL            ),
>> +	EL2_SYSREG(      AFSR1_EL2,  AFSR1_EL1,      NULL            ),
>> +	EL2_SYSREG(      ESR_EL2,    ESR_EL1,        NULL            ),
>> +	EL2_SYSREG(      FAR_EL2,    FAR_EL1,        NULL            ),
>> +	EL2_SYSREG(      MAIR_EL2,   MAIR_EL1,       NULL            ),
>> +	EL2_SYSREG(      AMAIR_EL2,  AMAIR_EL1,      NULL            ),
>> +};
> Figuring out which registers are in this map and which aren't and are supposed
> to be treated differently is really cumbersome because they are split into two
> types of el2 registers and their order is different from the order in enum
> vcpu_sysreg (in kvm_host.h). Perhaps adding a comment about what registers will
> be treated differently would make the code a bit easier to follow?

I'm not sure what this buys us. We have 3 categories of EL2 sysregs:
- Purely emulated
- Directly mapped onto an EL1 sysreg
- Translated from EL2 to EL1

I think the wrappers represent that pretty well, although we could split
EL2_SYSREG into DIRECT_EL2_SYSREG and TRANSLATE_EL2_SYSREG. As for the
order, does it really matter? We also have the trap table order, which
is also different from the enum. Do you propose we reorder everything?
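
If we did split them, it would look something like this (untested
sketch, same underlying struct and array):

	#define PURE_EL2_SYSREG(el2) \
		[el2 - FIRST_EL2_SYSREG] = { el2, __INVALID_SYSREG__, NULL }
	#define DIRECT_EL2_SYSREG(el2, el1) \
		[el2 - FIRST_EL2_SYSREG] = { el2, el1, NULL }
	#define TRANSLATE_EL2_SYSREG(el2, el1, fn) \
		[el2 - FIRST_EL2_SYSREG] = { el2, el1, fn }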

>> +
>> +static
>> +const struct el2_sysreg_map *find_el2_sysreg(const struct el2_sysreg_map *map,
>> +					     int reg)
>> +{
>> +	const struct el2_sysreg_map *entry;
>> +
>> +	if (!sysreg_is_el2(reg))
>> +		return NULL;
>> +
>> +	entry = &nested_sysreg_map[reg - FIRST_EL2_SYSREG];
>> +	if (entry->sysreg == __INVALID_SYSREG__)
>> +		return NULL;
>> +
>> +	return entry;
>> +}
>> +
>>  u64 vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg)
>>  {
>> +
>>  	if (!vcpu->arch.sysregs_loaded_on_cpu)
>>  		goto immediate_read;
>>  
>> +	if (unlikely(sysreg_is_el2(reg))) {
>> +		const struct el2_sysreg_map *el2_reg;
>> +
>> +		if (!is_hyp_ctxt(vcpu))
>> +			goto immediate_read;
> I'm confused by this. is_hyp_ctxt returns false when the guest is not in vEL2
> AND HCR_EL2.E2H or HCR_EL2.TGE are not set. In this case, the NV bit will not be
> set and the hardware will raise an undefined instruction exception when
> accessing an EL2 register from EL1. What am I missing?

You don't necessarily access an EL2 register just because you run at
EL2. You also access it because you emulate an EL1 instruction whose
behaviour is conditioned by an EL2 register.
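
As a purely hypothetical illustration: emulating a WFE trapped at vEL1,
where forwarding depends on the guest hypervisor's HCR_EL2.TWE. The
vcpu is not in vEL2, yet we still have to read the virtual HCR_EL2:

	/* sketch, not the actual series code */
	if (!vcpu_mode_el2(vcpu) &&
	    (vcpu_read_sys_reg(vcpu, HCR_EL2) & HCR_TWE))
		return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));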

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH 07/59] KVM: arm64: nv: Add EL2 system registers to vcpu context
  2019-07-03 13:20     ` Marc Zyngier
@ 2019-07-03 16:01       ` Marc Zyngier
  0 siblings, 0 replies; 176+ messages in thread
From: Marc Zyngier @ 2019-07-03 16:01 UTC (permalink / raw)
  To: Alexandru Elisei, linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Dave Martin

On 03/07/2019 14:20, Marc Zyngier wrote:
> On 24/06/2019 16:47, Alexandru Elisei wrote:
>> On 6/21/19 10:37 AM, Marc Zyngier wrote:
>>> From: Jintack Lim <jintack.lim@linaro.org>

[...]

>>> +	{ SYS_DESC(SYS_VPIDR_EL2), access_rw, reset_val, VPIDR_EL2, 0 },
>>> +	{ SYS_DESC(SYS_VMPIDR_EL2), access_rw, reset_val, VMPIDR_EL2, 0 },
>>> +
>>> +	{ SYS_DESC(SYS_SCTLR_EL2), access_rw, reset_val, SCTLR_EL2, 0 },
>> Some bits are RES1 for SCTLR_EL2.
> 
> See Patch #67.

The astute reader will notice that there is no patch #67 (yet). Patch
#57 is what I had in mind...

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH 02/59] KVM: arm64: Move __load_guest_stage2 to kvm_mmu.h
  2019-07-03  9:30     ` Marc Zyngier
@ 2019-07-03 16:13       ` Dave Martin
  0 siblings, 0 replies; 176+ messages in thread
From: Dave Martin @ 2019-07-03 16:13 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: kvm, Andre Przywara, kvmarm, linux-arm-kernel

On Wed, Jul 03, 2019 at 10:30:03AM +0100, Marc Zyngier wrote:
> On 24/06/2019 12:19, Dave Martin wrote:
> > On Fri, Jun 21, 2019 at 10:37:46AM +0100, Marc Zyngier wrote:
> >> Having __load_guest_stage2 in kvm_hyp.h is quickly going to trigger
> >> a circular include problem. In order to avoid this, let's move
> >> it to kvm_mmu.h, where it will be a better fit anyway.
> >>
> >> In the process, drop the __hyp_text annotation, which doesn't help
> >> as the function is marked as __always_inline.
> > 
> > Does GCC always inline things marked __always_inline?
> > 
> > I seem to remember some gotchas in this area, but I may be being
> > paranoid.
> 
> Yes, this is a strong guarantee. Things like static keys rely on that,
> for example.
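
(For reference, the definition in the kernel headers is roughly:

	/* include/linux/compiler_attributes.h, paraphrased */
	#define __always_inline inline __attribute__((__always_inline__))

so the compiler is not given a choice.)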
> 
> > 
> > If this is still only called from hyp, I'd be tempted to keep the
> > __hyp_text annotation just to be on the safe side.
> 
> The trouble with that is that re-introduces the circular dependency with
> kvm_hyp.h that this patch is trying to break...

Ah, right.

I guess it's easier to put up with this, then.

Cheers
---Dave

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH 04/59] KVM: arm64: nv: Introduce nested virtualization VCPU feature
  2019-07-03 11:56     ` Marc Zyngier
@ 2019-07-03 16:24       ` Dave Martin
  0 siblings, 0 replies; 176+ messages in thread
From: Dave Martin @ 2019-07-03 16:24 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: kvm, Andre Przywara, kvmarm, linux-arm-kernel

On Wed, Jul 03, 2019 at 12:56:51PM +0100, Marc Zyngier wrote:
> On 24/06/2019 12:43, Dave Martin wrote:
> > On Fri, Jun 21, 2019 at 10:37:48AM +0100, Marc Zyngier wrote:
> >> From: Christoffer Dall <christoffer.dall@arm.com>
> >>
> >> Introduce the feature bit and a primitive that checks if the feature is
> >> set behind a static key check based on the cpus_have_const_cap check.
> >>
> >> Checking nested_virt_in_use() on systems without nested virt enabled
>> should have negligible overhead.
> >>
> >> We don't yet allow userspace to actually set this feature.
> >>
> >> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
> >> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> >> ---
> > 
> > [...]
> > 
> >> diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
> >> new file mode 100644
> >> index 000000000000..8a3d121a0b42
> >> --- /dev/null
> >> +++ b/arch/arm64/include/asm/kvm_nested.h
> >> @@ -0,0 +1,13 @@
> >> +/* SPDX-License-Identifier: GPL-2.0 */
> >> +#ifndef __ARM64_KVM_NESTED_H
> >> +#define __ARM64_KVM_NESTED_H
> >> +
> >> +#include <linux/kvm_host.h>
> >> +
> >> +static inline bool nested_virt_in_use(const struct kvm_vcpu *vcpu)
> >> +{
> >> +	return cpus_have_const_cap(ARM64_HAS_NESTED_VIRT) &&
> >> +		test_bit(KVM_ARM_VCPU_NESTED_VIRT, vcpu->arch.features);
> >> +}
> > 
> > Also, is it worth having a vcpu->arch.flags flag for this, similarly to
> > SVE and ptrauth?
> 
> What would we expose through this flag?

Nothing new, but possibly more efficient to access.

AFAIK, test_bit() always results in an explicit load, whereas
vcpu->arch.flags is just a variable, which we already access on some hot
paths.  So the compiler can read it once and cache it, with a bit of
luck.

For flags that are fixed after vcpu init, or flags that are only read/
written by the vcpu thread itself, this should work fine.
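
Concretely, something along these lines (flag name and bit position
invented here, mirroring the existing KVM_ARM64_GUEST_HAS_SVE flag):

	#define KVM_ARM64_GUEST_HAS_NV	(1 << 8) /* hypothetical */

	static inline bool nested_virt_in_use(const struct kvm_vcpu *vcpu)
	{
		return cpus_have_const_cap(ARM64_HAS_NESTED_VIRT) &&
			(vcpu->arch.flags & KVM_ARM64_GUEST_HAS_NV);
	}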

Cheers
---Dave

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH 04/59] KVM: arm64: nv: Introduce nested virtualization VCPU feature
  2019-07-03 11:53     ` Marc Zyngier
@ 2019-07-03 16:27       ` Dave Martin
  0 siblings, 0 replies; 176+ messages in thread
From: Dave Martin @ 2019-07-03 16:27 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: kvm, Andre Przywara, kvmarm, linux-arm-kernel

On Wed, Jul 03, 2019 at 12:53:58PM +0100, Marc Zyngier wrote:
> On 24/06/2019 12:28, Dave Martin wrote:
> > On Fri, Jun 21, 2019 at 10:37:48AM +0100, Marc Zyngier wrote:
> >> From: Christoffer Dall <christoffer.dall@arm.com>
> >>
> >> Introduce the feature bit and a primitive that checks if the feature is
> >> set behind a static key check based on the cpus_have_const_cap check.
> >>
> >> Checking nested_virt_in_use() on systems without nested virt enabled
> >> should have negligible overhead.
> >>
> >> We don't yet allow userspace to actually set this feature.
> >>
> >> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
> >> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> >> ---
> >>  arch/arm/include/asm/kvm_nested.h   |  9 +++++++++
> >>  arch/arm64/include/asm/kvm_nested.h | 13 +++++++++++++
> >>  arch/arm64/include/uapi/asm/kvm.h   |  1 +
> >>  3 files changed, 23 insertions(+)
> >>  create mode 100644 arch/arm/include/asm/kvm_nested.h
> >>  create mode 100644 arch/arm64/include/asm/kvm_nested.h
> >>
> >> diff --git a/arch/arm/include/asm/kvm_nested.h b/arch/arm/include/asm/kvm_nested.h
> >> new file mode 100644
> >> index 000000000000..124ff6445f8f
> >> --- /dev/null
> >> +++ b/arch/arm/include/asm/kvm_nested.h
> >> @@ -0,0 +1,9 @@
> >> +/* SPDX-License-Identifier: GPL-2.0 */
> >> +#ifndef __ARM_KVM_NESTED_H
> >> +#define __ARM_KVM_NESTED_H
> >> +
> >> +#include <linux/kvm_host.h>
> >> +
> >> +static inline bool nested_virt_in_use(const struct kvm_vcpu *vcpu) { return false; }
> >> +
> >> +#endif /* __ARM_KVM_NESTED_H */
> >> diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
> >> new file mode 100644
> >> index 000000000000..8a3d121a0b42
> >> --- /dev/null
> >> +++ b/arch/arm64/include/asm/kvm_nested.h
> >> @@ -0,0 +1,13 @@
> >> +/* SPDX-License-Identifier: GPL-2.0 */
> >> +#ifndef __ARM64_KVM_NESTED_H
> >> +#define __ARM64_KVM_NESTED_H
> >> +
> >> +#include <linux/kvm_host.h>
> >> +
> >> +static inline bool nested_virt_in_use(const struct kvm_vcpu *vcpu)
> >> +{
> >> +	return cpus_have_const_cap(ARM64_HAS_NESTED_VIRT) &&
> >> +		test_bit(KVM_ARM_VCPU_NESTED_VIRT, vcpu->arch.features);
> >> +}
> >> +
> >> +#endif /* __ARM64_KVM_NESTED_H */
> >> diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
> >> index d819a3e8b552..563e2a8bae93 100644
> >> --- a/arch/arm64/include/uapi/asm/kvm.h
> >> +++ b/arch/arm64/include/uapi/asm/kvm.h
> >> @@ -106,6 +106,7 @@ struct kvm_regs {
> >>  #define KVM_ARM_VCPU_SVE		4 /* enable SVE for this CPU */
> >>  #define KVM_ARM_VCPU_PTRAUTH_ADDRESS	5 /* VCPU uses address authentication */
> >>  #define KVM_ARM_VCPU_PTRAUTH_GENERIC	6 /* VCPU uses generic authentication */
> >> +#define KVM_ARM_VCPU_NESTED_VIRT	7 /* Support nested virtualization */
> > 
> > This seems weirdly named:
> > 
> > Isn't the feature we're exposing here really EL2?  In that case, the
> > feature the guest gets with this flag enabled is plain virtualisation,
> > possibly with the option to nest further.
> > 
> > Does the guest also get nested virt (i.e., recursively nested virt from
> > the host's PoV) as a side effect, or would require an explicit extra
> > flag?
> 
> So far, there is no extra flag to describe further nesting, and it
> directly comes from EL2 being emulated. I don't mind renaming this to
> KVM_ARM_VCPU_HAS_EL2, or something similar... Whether we want userspace
> to control the exposure of the nesting capability (i.e. EL2 with
> ARMv8.3-NV) is another question.

Agreed.

KVM_ARM_VCPU_HAS_EL2 seems a reasonable name to me.

If we have an internal flag in vcpu_arch.flags we could call that
something different (i.e., keep the NESTED_VIRT naming) if it's natural
to do so.

Cheers
---Dave

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH 07/59] KVM: arm64: nv: Add EL2 system registers to vcpu context
  2019-07-03 12:20     ` Marc Zyngier
@ 2019-07-03 16:31       ` Dave Martin
  0 siblings, 0 replies; 176+ messages in thread
From: Dave Martin @ 2019-07-03 16:31 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: kvm, Andre Przywara, kvmarm, linux-arm-kernel

On Wed, Jul 03, 2019 at 01:20:55PM +0100, Marc Zyngier wrote:
> On 24/06/2019 13:54, Dave Martin wrote:
> > On Fri, Jun 21, 2019 at 10:37:51AM +0100, Marc Zyngier wrote:
> >> From: Jintack Lim <jintack.lim@linaro.org>
> >>
> >> ARM v8.3 introduces a new bit in the HCR_EL2, which is the NV bit. When
> >> this bit is set, accessing EL2 registers in EL1 traps to EL2. In
> >> addition, executing the following instructions in EL1 will trap to EL2:
> >> tlbi, at, eret, and msr/mrs instructions to access SP_EL1. Most of the
> >> instructions that trap to EL2 with the NV bit were undef at EL1 prior to
> >> ARM v8.3. The only instruction that was not undef is eret.
> >>
> >> This patch sets up a handler for EL2 registers and SP_EL1 register
> >> accesses at EL1. The host hypervisor keeps those register values in
> >> memory, and will emulate their behavior.
> >>
> >> This patch doesn't set the NV bit yet. It will be set in a later patch
> >> once nested virtualization support is completed.
> >>
> >> Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
> >> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> >> ---
> >>  arch/arm64/include/asm/kvm_host.h | 37 +++++++++++++++-
> >>  arch/arm64/include/asm/sysreg.h   | 50 ++++++++++++++++++++-
> >>  arch/arm64/kvm/sys_regs.c         | 74 ++++++++++++++++++++++++++++---
> >>  3 files changed, 154 insertions(+), 7 deletions(-)
> >>
> >> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> >> index 4bcd9c1291d5..2d4290d2513a 100644
> >> --- a/arch/arm64/include/asm/kvm_host.h
> >> +++ b/arch/arm64/include/asm/kvm_host.h
> >> @@ -173,12 +173,47 @@ enum vcpu_sysreg {
> >>  	APGAKEYLO_EL1,
> >>  	APGAKEYHI_EL1,
> >>  
> >> -	/* 32bit specific registers. Keep them at the end of the range */
> >> +	/* 32bit specific registers. */
> > 
> > Out of interest, why did we originally want these to be at the end?
> > Because they're not at the end any more...
> 
> I seem to remember the original assembly switch code used that property.
> This is a long gone requirement, thankfully.

Ah, right.

> >>  	DACR32_EL2,	/* Domain Access Control Register */
> >>  	IFSR32_EL2,	/* Instruction Fault Status Register */
> >>  	FPEXC32_EL2,	/* Floating-Point Exception Control Register */
> >>  	DBGVCR32_EL2,	/* Debug Vector Catch Register */
> >>  
> >> +	/* EL2 registers sorted ascending by Op0, Op1, CRn, CRm, Op2 */
> >> +	FIRST_EL2_SYSREG,
> >> +	VPIDR_EL2 = FIRST_EL2_SYSREG,
> >> +			/* Virtualization Processor ID Register */
> >> +	VMPIDR_EL2,	/* Virtualization Multiprocessor ID Register */
> >> +	SCTLR_EL2,	/* System Control Register (EL2) */
> >> +	ACTLR_EL2,	/* Auxiliary Control Register (EL2) */
> >> +	HCR_EL2,	/* Hypervisor Configuration Register */
> >> +	MDCR_EL2,	/* Monitor Debug Configuration Register (EL2) */
> >> +	CPTR_EL2,	/* Architectural Feature Trap Register (EL2) */
> >> +	HSTR_EL2,	/* Hypervisor System Trap Register */
> >> +	HACR_EL2,	/* Hypervisor Auxiliary Control Register */
> >> +	TTBR0_EL2,	/* Translation Table Base Register 0 (EL2) */
> >> +	TTBR1_EL2,	/* Translation Table Base Register 1 (EL2) */
> >> +	TCR_EL2,	/* Translation Control Register (EL2) */
> >> +	VTTBR_EL2,	/* Virtualization Translation Table Base Register */
> >> +	VTCR_EL2,	/* Virtualization Translation Control Register */
> >> +	SPSR_EL2,	/* EL2 saved program status register */
> >> +	ELR_EL2,	/* EL2 exception link register */
> >> +	AFSR0_EL2,	/* Auxiliary Fault Status Register 0 (EL2) */
> >> +	AFSR1_EL2,	/* Auxiliary Fault Status Register 1 (EL2) */
> >> +	ESR_EL2,	/* Exception Syndrome Register (EL2) */
> >> +	FAR_EL2,	/* Hypervisor IPA Fault Address Register */
> >> +	HPFAR_EL2,	/* Hypervisor IPA Fault Address Register */
> >> +	MAIR_EL2,	/* Memory Attribute Indirection Register (EL2) */
> >> +	AMAIR_EL2,	/* Auxiliary Memory Attribute Indirection Register (EL2) */
> >> +	VBAR_EL2,	/* Vector Base Address Register (EL2) */
> >> +	RVBAR_EL2,	/* Reset Vector Base Address Register */
> >> +	RMR_EL2,	/* Reset Management Register */
> >> +	CONTEXTIDR_EL2,	/* Context ID Register (EL2) */
> >> +	TPIDR_EL2,	/* EL2 Software Thread ID Register */
> >> +	CNTVOFF_EL2,	/* Counter-timer Virtual Offset register */
> >> +	CNTHCTL_EL2,	/* Counter-timer Hypervisor Control register */
> >> +	SP_EL2,		/* EL2 Stack Pointer */
> >> +
> > 
> > I wonder whether we could make these conditionally present somehow.  Not
> > worth worrying about for now to save 200-odd bytes per vcpu though.
> 
> With 8.4-NV, these 200 bytes turn into a whole 8kB (4kB page, plus
> almost 4kB of padding that I need to reduce one way or another). So I'm
> not too worried about this for now.
> 
> I really want the NV code to always be present though, in order to avoid
> configuration-related regressions. I'm not sure how to make this better.

Fair enough -- sounds like addressing this would probably be premature
optimisation, then.

I suppose we could have two alternate layouts, but that would likely be
a source of overhead, and bugs...

Cheers
---Dave

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH 26/59] KVM: arm64: nv: Respect the virtual HCR_EL2.NV bit setting
  2019-06-26  5:31   ` Julien Thierry
@ 2019-07-03 16:31     ` Marc Zyngier
  0 siblings, 0 replies; 176+ messages in thread
From: Marc Zyngier @ 2019-07-03 16:31 UTC (permalink / raw)
  To: Julien Thierry, linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	James Morse, Suzuki K Poulose

On 26/06/2019 06:31, Julien Thierry wrote:
> 
> 
> On 06/21/2019 10:38 AM, Marc Zyngier wrote:
>> From: Jintack Lim <jintack.lim@linaro.org>
>>
>> Forward traps due to HCR_EL2.NV bit to the virtual EL2 if they are not
>> coming from the virtual EL2 and the virtual HCR_EL2.NV bit is set.
>>
>> In addition to EL2 register accesses, setting NV bit will also make EL12
>> register accesses trap to EL2. To emulate this for the virtual EL2,
>> forward traps due to EL12 register accesses to the virtual EL2 if the
>> virtual HCR_EL2.NV bit is set.
>>
>> This is for recursive nested virtualization.
>>
>> Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
>> [Moved code to emulate-nested.c]
>> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>> ---
>>  arch/arm64/include/asm/kvm_arm.h    |  1 +
>>  arch/arm64/include/asm/kvm_nested.h |  2 ++
>>  arch/arm64/kvm/emulate-nested.c     | 28 ++++++++++++++++++++++++++++
>>  arch/arm64/kvm/handle_exit.c        |  6 ++++++
>>  arch/arm64/kvm/sys_regs.c           | 18 ++++++++++++++++++
>>  5 files changed, 55 insertions(+)
>>
>> diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
>> index 48e15af2bece..d21486274eeb 100644
>> --- a/arch/arm64/include/asm/kvm_arm.h
>> +++ b/arch/arm64/include/asm/kvm_arm.h
>> @@ -24,6 +24,7 @@
>>  
>>  /* Hyp Configuration Register (HCR) bits */
>>  #define HCR_FWB		(UL(1) << 46)
>> +#define HCR_NV		(UL(1) << 42)
>>  #define HCR_API		(UL(1) << 41)
>>  #define HCR_APK		(UL(1) << 40)
>>  #define HCR_TEA		(UL(1) << 37)
>> diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
>> index 645e5e11b749..61e71d0d2151 100644
>> --- a/arch/arm64/include/asm/kvm_nested.h
>> +++ b/arch/arm64/include/asm/kvm_nested.h
>> @@ -11,5 +11,7 @@ static inline bool nested_virt_in_use(const struct kvm_vcpu *vcpu)
>>  }
>>  
>>  int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe);
>> +extern bool forward_traps(struct kvm_vcpu *vcpu, u64 control_bit);
>> +extern bool forward_nv_traps(struct kvm_vcpu *vcpu);
>>  
>>  #endif /* __ARM64_KVM_NESTED_H */
>> diff --git a/arch/arm64/kvm/emulate-nested.c b/arch/arm64/kvm/emulate-nested.c
>> index f829b8b04dc8..c406fd688b9f 100644
>> --- a/arch/arm64/kvm/emulate-nested.c
>> +++ b/arch/arm64/kvm/emulate-nested.c
>> @@ -24,6 +24,27 @@
>>  
>>  #include "trace.h"
>>  
>> +bool forward_traps(struct kvm_vcpu *vcpu, u64 control_bit)
> 
> Should this one be static?

$ git grep forward_traps
arch/arm64/include/asm/kvm_nested.h:extern bool forward_traps(struct kvm_vcpu *vcpu, u64 control_bit);
arch/arm64/kvm/emulate-nested.c:bool forward_traps(struct kvm_vcpu *vcpu, u64 control_bit)
arch/arm64/kvm/emulate-nested.c:        return forward_traps(vcpu, HCR_NV);
arch/arm64/kvm/sys_regs.c:      return forward_traps(vcpu, HCR_AT);
arch/arm64/kvm/sys_regs.c:      return forward_traps(vcpu, HCR_TTLB);
arch/arm64/kvm/sys_regs.c:      return forward_traps(vcpu, HCR_NV1);

I guess not.

> 
>> +{
>> +	bool control_bit_set;
>> +
>> +	if (!nested_virt_in_use(vcpu))
>> +		return false;
>> +
>> +	control_bit_set = __vcpu_sys_reg(vcpu, HCR_EL2) & control_bit;
>> +	if (!vcpu_mode_el2(vcpu) && control_bit_set) {
>> +		kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
>> +		return true;
>> +	}
>> +	return false;
>> +}
>> +
>> +bool forward_nv_traps(struct kvm_vcpu *vcpu)
>> +{
>> +	return forward_traps(vcpu, HCR_NV);
>> +}
>> +
>> +
>>  /* This is borrowed from get_except_vector in inject_fault.c */
>>  static u64 get_el2_except_vector(struct kvm_vcpu *vcpu,
>>  		enum exception_type type)
>> @@ -55,6 +76,13 @@ void kvm_emulate_nested_eret(struct kvm_vcpu *vcpu)
>>  	u64 spsr, elr, mode;
>>  	bool direct_eret;
>>  
>> +	/*
>> +	 * Forward this trap to the virtual EL2 if the virtual
>> +	 * HCR_EL2.NV bit is set and this is coming from !EL2.
>> +	 */
>> +	if (forward_nv_traps(vcpu))
>> +		return;
>> +
>>  	/*
>>  	 * Going through the whole put/load motions is a waste of time
>>  	 * if this is a VHE guest hypervisor returning to its own
>> diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
>> index 39602a4c1d61..7e8b1ec1d251 100644
>> --- a/arch/arm64/kvm/handle_exit.c
>> +++ b/arch/arm64/kvm/handle_exit.c
>> @@ -72,6 +72,12 @@ static int handle_smc(struct kvm_vcpu *vcpu, struct kvm_run *run)
>>  {
>>  	int ret;
>>  
>> +	/*
>> +	 * Forward this trapped smc instruction to the virtual EL2.
>> +	 */
>> +	if ((vcpu_read_sys_reg(vcpu, HCR_EL2) & HCR_TSC) && forward_nv_traps(vcpu))
> 
> Not sure I understand why this would be only when the guest hyp also has
> NV set.
> 
> If the guest hyp requested to trap smc instructions and we received
> one while in vEL1, shouldn't we always forward it to the guest hyp to
> let it implement the smc response the way it wants?

There is a small difference, but I'm not sure it matters: If EL3 is not
implemented, SMC UNDEFs at EL1 unless NV is set. So we know here that our
guest is running a guest hypervisor. But does it change a thing?

I need to think about it again... :-(

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH 13/59] KVM: arm64: nv: Handle virtual EL2 registers in vcpu_read/write_sys_reg()
  2019-07-03 15:59     ` Marc Zyngier
@ 2019-07-03 16:32       ` Alexandru Elisei
  2019-07-04 14:39         ` Marc Zyngier
  0 siblings, 1 reply; 176+ messages in thread
From: Alexandru Elisei @ 2019-07-03 16:32 UTC (permalink / raw)
  To: Marc Zyngier, linux-arm-kernel, kvmarm, kvm; +Cc: Andre Przywara, Dave Martin


On 7/3/19 4:59 PM, Marc Zyngier wrote:
> On 25/06/2019 16:18, Alexandru Elisei wrote:
>> Hi Marc,
>>
>> A question regarding this patch. This patch modifies vcpu_{read,write}_sys_reg
>> to handle virtual EL2 registers. However, the file kvm/emulate-nested.c added by
>> patch 10/59 "KVM: arm64: nv: Support virtual EL2 exceptions" already uses
>> vcpu_{read,write}_sys_reg to access EL2 registers. In my opinion, it doesn't
>> really matter which comes first because nested support is only enabled in the
>> last patch of the series, but I thought I should bring this up in case it is not
>> what you intended.
> It doesn't really matter at that stage. The only thing I'm trying to
> achieve in the middle of the series is not to break the build, and not
> to cause non-NV to fail.
>
>> On 6/21/19 10:37 AM, Marc Zyngier wrote:
>>> From: Andre Przywara <andre.przywara@arm.com>
>>>
>>> KVM internally uses accessor functions when reading or writing the
>>> guest's system registers. This takes care of accessing either the stored
>>> copy or using the "live" EL1 system registers when the host uses VHE.
>>>
>>> With the introduction of virtual EL2 we add a bunch of EL2 system
>>> registers, which now must also be taken care of:
>>> - If the guest is running in vEL2, and we access an EL1 sysreg, we must
>>>   revert to the stored version of that, and not use the CPU's copy.
>>> - If the guest is running in vEL1, and we access an EL2 sysreg, we must
>>>   also use the stored version, since the CPU carries the EL1 copy.
>>> - Some EL2 system registers are supposed to affect the current execution
>>>   of the system, so we need to put them into their respective EL1
>>>   counterparts. For this we need to define a mapping between the two.
>>>   This is done using the newly introduced struct el2_sysreg_map.
>>> - Some EL2 system registers have a different format than their EL1
>>>   counterpart, so we need to translate them before writing them to the
>>>   CPU. This is done using an (optional) translate function in the map.
>>> - There are the three special registers SP_EL2, SPSR_EL2 and ELR_EL2,
>>>   which need some separate handling.
>> I see no change in this patch related to SPSR_EL2. Special handling of SPSR_EL2
>> is added in the next patch, patch 14/59 "KVM: arm64: nv: Handle SPSR_EL2 specially".
> Indeed, this needs rewriting (we ended-up splitting the SPSR stuff out
> as it was messy and not completely correct). I may take the rest of the
> special stuff out as well.
>
>>> All of these cases are now wrapped into the existing accessor functions,
>>> so KVM users wouldn't need to care whether they access EL2 or EL1
>>> registers and also which state the guest is in.
>>>
>>> This handles what was formerly known as the "shadow state" dynamically,
>>> without requiring a separate copy for each vCPU EL.
>>>
>>> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
>>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>>> ---
>>>  arch/arm64/include/asm/kvm_emulate.h |   6 +
>>>  arch/arm64/include/asm/kvm_host.h    |   5 +
>>>  arch/arm64/kvm/sys_regs.c            | 163 +++++++++++++++++++++++++++
>>>  3 files changed, 174 insertions(+)
>>>
>>> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
>>> index c43aac5fed69..f37006b6eec4 100644
>>> --- a/arch/arm64/include/asm/kvm_emulate.h
>>> +++ b/arch/arm64/include/asm/kvm_emulate.h
>>> @@ -70,6 +70,12 @@ void kvm_emulate_nested_eret(struct kvm_vcpu *vcpu);
>>>  int kvm_inject_nested_sync(struct kvm_vcpu *vcpu, u64 esr_el2);
>>>  int kvm_inject_nested_irq(struct kvm_vcpu *vcpu);
>>>  
>>> +u64 translate_tcr(u64 tcr);
>>> +u64 translate_cptr(u64 tcr);
>>> +u64 translate_sctlr(u64 tcr);
>>> +u64 translate_ttbr0(u64 tcr);
>>> +u64 translate_cnthctl(u64 tcr);
>>> +
>>>  static inline bool vcpu_el1_is_32bit(struct kvm_vcpu *vcpu)
>>>  {
>>>  	return !(vcpu->arch.hcr_el2 & HCR_RW);
>>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>>> index 2d4290d2513a..dae9c42a7219 100644
>>> --- a/arch/arm64/include/asm/kvm_host.h
>>> +++ b/arch/arm64/include/asm/kvm_host.h
>>> @@ -217,6 +217,11 @@ enum vcpu_sysreg {
>>>  	NR_SYS_REGS	/* Nothing after this line! */
>>>  };
>>>  
>>> +static inline bool sysreg_is_el2(int reg)
>>> +{
>>> +	return reg >= FIRST_EL2_SYSREG && reg < NR_SYS_REGS;
>>> +}
>>> +
>>>  /* 32bit mapping */
>>>  #define c0_MPIDR	(MPIDR_EL1 * 2)	/* MultiProcessor ID Register */
>>>  #define c0_CSSELR	(CSSELR_EL1 * 2)/* Cache Size Selection Register */
>>> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
>>> index 693dd063c9c2..d024114da162 100644
>>> --- a/arch/arm64/kvm/sys_regs.c
>>> +++ b/arch/arm64/kvm/sys_regs.c
>>> @@ -76,11 +76,142 @@ static bool write_to_read_only(struct kvm_vcpu *vcpu,
>>>  	return false;
>>>  }
>>>  
>>> +static u64 tcr_el2_ips_to_tcr_el1_ps(u64 tcr_el2)
>> The code seems to suggest that you are translating TCR_EL2.PS to TCR_EL1.IPS.
>> Perhaps the function should be named tcr_el2_ps_to_tcr_el1_ips?
> yup.
>
>>> +{
>>> +	return ((tcr_el2 & TCR_EL2_PS_MASK) >> TCR_EL2_PS_SHIFT)
>>> +		<< TCR_IPS_SHIFT;
>>> +}
>>> +
>>> +u64 translate_tcr(u64 tcr)
>>> +{
>>> +	return TCR_EPD1_MASK |				/* disable TTBR1_EL1 */
>>> +	       ((tcr & TCR_EL2_TBI) ? TCR_TBI0 : 0) |
>>> +	       tcr_el2_ips_to_tcr_el1_ps(tcr) |
>>> +	       (tcr & TCR_EL2_TG0_MASK) |
>>> +	       (tcr & TCR_EL2_ORGN0_MASK) |
>>> +	       (tcr & TCR_EL2_IRGN0_MASK) |
>>> +	       (tcr & TCR_EL2_T0SZ_MASK);
>>> +}
>>> +
>>> +u64 translate_cptr(u64 cptr_el2)
>> The argument name is not consistent with the other translate_* functions. I
>> think it is reasonably obvious that you are translating an EL2 register.
> That's pretty much immaterial, and the variable could be called zorglub.
> Consistency is good, but I don't think we need to worry about that level
> of detail.

Sure.

>
>>> +{
>>> +	u64 cpacr_el1 = 0;
>>> +
>>> +	if (!(cptr_el2 & CPTR_EL2_TFP))
>>> +		cpacr_el1 |= CPACR_EL1_FPEN;
>>> +	if (cptr_el2 & CPTR_EL2_TTA)
>>> +		cpacr_el1 |= CPACR_EL1_TTA;
>>> +	if (!(cptr_el2 & CPTR_EL2_TZ))
>>> +		cpacr_el1 |= CPACR_EL1_ZEN;
>>> +
>>> +	return cpacr_el1;
>>> +}
>>> +
>>> +u64 translate_sctlr(u64 sctlr)
>>> +{
>>> +	/* Bit 20 is RES1 in SCTLR_EL1, but RES0 in SCTLR_EL2 */
>>> +	return sctlr | BIT(20);
>>> +}
>>> +
>>> +u64 translate_ttbr0(u64 ttbr0)
>>> +{
>>> +	/* Force ASID to 0 (ASID 0 or RES0) */
>> Are you forcing ASID to 0 because you are only expecting a non-vhe guest
>> hypervisor to access ttbr0_el2, in which case the architecture says that the
>> ASID field is RES0? Is it so unlikely that a vhe guest hypervisor will access
>> ttbr0_el2 directly that it's not worth adding a check for that?
> Like all the translate_* function, this is only called when running a
> non-VHE guest so that the EL2 register is translated to the EL1 format.
> A VHE guest usually has its sysregs in the EL1 format, and certainly
> does for TTBR0_EL2.

Yeah, figured that out after I sent this patch.

>
>>> +	return ttbr0 & ~GENMASK_ULL(63, 48);
>>> +}
>>> +
>>> +u64 translate_cnthctl(u64 cnthctl)
>>> +{
>>> +	return ((cnthctl & 0x3) << 10) | (cnthctl & 0xfc);
>> Patch 16/59 "KVM: arm64: nv: Save/Restore vEL2 sysregs" suggests that you are
>> translating CNTHCTL to write it to CNTKCTL_EL1. Looking at ARM DDI 0487D.b,
>> CNTKCTL_EL1 has bits 63:10 RES0. I think the correct value should be ((cnthctl &
>> 0x3) << 8) | (cnthctl & 0xfc).
> Rookie mistake! When HCR_EL2.E2H==1 (which is always the case for NV),
> CNTKCTL_EL1 accesses CNTHCTL_EL2. What you have here is the translation
> of non-VHE CNTHCTL_EL2 to its VHE equivalent.

Indeed! Thank you for pointing it out.

>
>>> +}
>>> +
>>> +#define EL2_SYSREG(el2, el1, translate)	\
>>> +	[el2 - FIRST_EL2_SYSREG] = { el2, el1, translate }
>>> +#define PURE_EL2_SYSREG(el2) \
>>> +	[el2 - FIRST_EL2_SYSREG] = { el2, __INVALID_SYSREG__, NULL }
>>> +/*
>>> + * Associate vEL2 registers to their EL1 counterparts on the CPU.
>>> + * The translate function can be NULL, when the register layout is identical.
>>> + */
>>> +struct el2_sysreg_map {
>>> +	int sysreg;	/* EL2 register index into the array above */
>>> +	int mapping;	/* associated EL1 register */
>>> +	u64 (*translate)(u64 value);
>>> +} nested_sysreg_map[NR_SYS_REGS - FIRST_EL2_SYSREG] = {
>>> +	PURE_EL2_SYSREG( VPIDR_EL2 ),
>>> +	PURE_EL2_SYSREG( VMPIDR_EL2 ),
>>> +	PURE_EL2_SYSREG( ACTLR_EL2 ),
>>> +	PURE_EL2_SYSREG( HCR_EL2 ),
>>> +	PURE_EL2_SYSREG( MDCR_EL2 ),
>>> +	PURE_EL2_SYSREG( HSTR_EL2 ),
>>> +	PURE_EL2_SYSREG( HACR_EL2 ),
>>> +	PURE_EL2_SYSREG( VTTBR_EL2 ),
>>> +	PURE_EL2_SYSREG( VTCR_EL2 ),
>>> +	PURE_EL2_SYSREG( RVBAR_EL2 ),
>>> +	PURE_EL2_SYSREG( RMR_EL2 ),
>>> +	PURE_EL2_SYSREG( TPIDR_EL2 ),
>>> +	PURE_EL2_SYSREG( CNTVOFF_EL2 ),
>>> +	PURE_EL2_SYSREG( CNTHCTL_EL2 ),
>>> +	PURE_EL2_SYSREG( HPFAR_EL2 ),
>>> +	EL2_SYSREG(      SCTLR_EL2,  SCTLR_EL1,      translate_sctlr ),
>>> +	EL2_SYSREG(      CPTR_EL2,   CPACR_EL1,      translate_cptr  ),
>>> +	EL2_SYSREG(      TTBR0_EL2,  TTBR0_EL1,      translate_ttbr0 ),
>>> +	EL2_SYSREG(      TTBR1_EL2,  TTBR1_EL1,      NULL            ),
>>> +	EL2_SYSREG(      TCR_EL2,    TCR_EL1,        translate_tcr   ),
>>> +	EL2_SYSREG(      VBAR_EL2,   VBAR_EL1,       NULL            ),
>>> +	EL2_SYSREG(      AFSR0_EL2,  AFSR0_EL1,      NULL            ),
>>> +	EL2_SYSREG(      AFSR1_EL2,  AFSR1_EL1,      NULL            ),
>>> +	EL2_SYSREG(      ESR_EL2,    ESR_EL1,        NULL            ),
>>> +	EL2_SYSREG(      FAR_EL2,    FAR_EL1,        NULL            ),
>>> +	EL2_SYSREG(      MAIR_EL2,   MAIR_EL1,       NULL            ),
>>> +	EL2_SYSREG(      AMAIR_EL2,  AMAIR_EL1,      NULL            ),
>>> +};
>> Figuring out which registers are in this map and which aren't and are supposed
>> to be treated differently is really cumbersome because they are split into two
>> types of el2 registers and their order is different from the order in enum
>> vcpu_sysreg (in kvm_host.h). Perhaps adding a comment about what registers will
>> be treated differently would make the code a bit easier to follow?
> I'm not sure what this buys us. We have 3 categories of EL2 sysregs:
> - Purely emulated
> - Directly mapped onto an EL1 sysreg
> - Translated from EL2 to EL1
>
> I think the wrappers represent that pretty well, although we could split
> EL2_SYSREG into DIRECT_EL2_SYSREG and TRANSLATE_EL2_SYSREG. As for the
> order, does it really matter? We also have the trap table order, which
> is also different from the enum. Do you propose we reorder everything?

The wrappers and the naming are fine.

I was trying to figure out which EL2 registers are in the nested_sysreg_map and
which aren't (that's what I meant by "two types of registers") by looking at the
vcpu_sysreg enum. Because the order in the map is different from the order in
the enum, I was having a difficult time figuring out which registers are not in
the nested_sysreg_map, to make sure we haven't somehow forgotten to emulate a register.

So no, I wasn't asking to reorder everything. I was asking if it would be
appropriate to write a comment stating the intention to treat registers X, Y and
Z separately from the registers in nested_sysreg_map.
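
Something as short as this above the map would do (sketch only, naming
the three specials from the commit message):

	/*
	 * SPSR_EL2, ELR_EL2 and SP_EL2 are deliberately *not* in this
	 * map; they get special handling in vcpu_read/write_sys_reg().
	 */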

>
>>> +
>>> +static
>>> +const struct el2_sysreg_map *find_el2_sysreg(const struct el2_sysreg_map *map,
>>> +					     int reg)
>>> +{
>>> +	const struct el2_sysreg_map *entry;
>>> +
>>> +	if (!sysreg_is_el2(reg))
>>> +		return NULL;
>>> +
>>> +	entry = &nested_sysreg_map[reg - FIRST_EL2_SYSREG];
>>> +	if (entry->sysreg == __INVALID_SYSREG__)
>>> +		return NULL;
>>> +
>>> +	return entry;
>>> +}
>>> +
>>>  u64 vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg)
>>>  {
>>> +
>>>  	if (!vcpu->arch.sysregs_loaded_on_cpu)
>>>  		goto immediate_read;
>>>  
>>> +	if (unlikely(sysreg_is_el2(reg))) {
>>> +		const struct el2_sysreg_map *el2_reg;
>>> +
>>> +		if (!is_hyp_ctxt(vcpu))
>>> +			goto immediate_read;
>> I'm confused by this. is_hyp_ctxt returns false when the guest is not in vEL2
>> AND HCR_EL2.E2H or HCR_EL2.TGE are not set. In this case, the NV bit will not be
>> set and the hardware will raise an undefined instruction exception when
>> accessing an EL2 register from EL1. What am I missing?
> You don't necessarily access an EL2 register just because you run at
> EL2. You also access it because you emulate an EL1 instruction whose
> behaviour is conditioned by an EL2 register.

Got it, now it makes a lot more sense.

>
> Thanks,
>
> 	M.

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH 52/59] KVM: arm64: nv: vgic: Allow userland to set VGIC maintenance IRQ
  2019-06-21  9:38 ` [PATCH 52/59] KVM: arm64: nv: vgic: Allow userland to set VGIC maintenance IRQ Marc Zyngier
@ 2019-07-04  7:38   ` Julien Thierry
  2019-07-04  9:01     ` Andre Przywara
  0 siblings, 1 reply; 176+ messages in thread
From: Julien Thierry @ 2019-07-04  7:38 UTC (permalink / raw)
  To: Marc Zyngier, linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	James Morse, Suzuki K Poulose



On 21/06/2019 10:38, Marc Zyngier wrote:
> From: Andre Przywara <andre.przywara@arm.com>
> 
> The VGIC maintenance IRQ signals various conditions about the LRs, when
> the GIC's virtualization extension is used.
> So far we didn't need it, but nested virtualization needs to know about
> this interrupt, so add a userland interface to set up the IRQ number.
> The architecture mandates that it must be a PPI; on top of that, this code
> only exports a per-device option, so the PPI is the same on all VCPUs.
> 
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> [added some bits of documentation]
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  .../virtual/kvm/devices/arm-vgic-v3.txt       |  9 ++++++++
>  arch/arm/include/uapi/asm/kvm.h               |  1 +
>  arch/arm64/include/uapi/asm/kvm.h             |  1 +
>  include/kvm/arm_vgic.h                        |  3 +++
>  virt/kvm/arm/vgic/vgic-kvm-device.c           | 22 +++++++++++++++++++
>  5 files changed, 36 insertions(+)
> 
> diff --git a/Documentation/virtual/kvm/devices/arm-vgic-v3.txt b/Documentation/virtual/kvm/devices/arm-vgic-v3.txt
> index ff290b43c8e5..c70e8f2e0c9c 100644
> --- a/Documentation/virtual/kvm/devices/arm-vgic-v3.txt
> +++ b/Documentation/virtual/kvm/devices/arm-vgic-v3.txt
> @@ -249,3 +249,12 @@ Groups:
>    Errors:
>      -EINVAL: vINTID is not multiple of 32 or
>       info field is not VGIC_LEVEL_INFO_LINE_LEVEL
> +
> +  KVM_DEV_ARM_VGIC_GRP_MAINT_IRQ
> +    The attr field of kvm_device_attr encodes the following values:
> +    bits:     | 31   ....    5 | 4  ....  0 |
> +    values:   |      RES0      |   vINTID   |
> +
> +    The vINTID specifies which interrupt is generated when the vGIC
> +    must generate a maintenance interrupt. This must be a PPI.
> +

Something seems off. The documentation suggests that the value of the
attribute will be between 0-15 (and other values will be masked down to
a value between 0 and 15). However, in the code the value should be
between 16 and 32 (PPI INTID) and other values are rejected.

Am I reading this wrong?

Cheers,

Julien

> diff --git a/arch/arm/include/uapi/asm/kvm.h b/arch/arm/include/uapi/asm/kvm.h
> index 4602464ebdfb..a6af45645a6d 100644
> --- a/arch/arm/include/uapi/asm/kvm.h
> +++ b/arch/arm/include/uapi/asm/kvm.h
> @@ -233,6 +233,7 @@ struct kvm_vcpu_events {
>  #define KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS 6
>  #define KVM_DEV_ARM_VGIC_GRP_LEVEL_INFO  7
>  #define KVM_DEV_ARM_VGIC_GRP_ITS_REGS	8
> +#define KVM_DEV_ARM_VGIC_GRP_MAINT_IRQ	9
>  #define KVM_DEV_ARM_VGIC_LINE_LEVEL_INFO_SHIFT	10
>  #define KVM_DEV_ARM_VGIC_LINE_LEVEL_INFO_MASK \
>  			(0x3fffffULL << KVM_DEV_ARM_VGIC_LINE_LEVEL_INFO_SHIFT)
> diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
> index 563e2a8bae93..8e1c6ffe1b59 100644
> --- a/arch/arm64/include/uapi/asm/kvm.h
> +++ b/arch/arm64/include/uapi/asm/kvm.h
> @@ -295,6 +295,7 @@ struct kvm_vcpu_events {
>  #define KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS 6
>  #define KVM_DEV_ARM_VGIC_GRP_LEVEL_INFO  7
>  #define KVM_DEV_ARM_VGIC_GRP_ITS_REGS 8
> +#define KVM_DEV_ARM_VGIC_GRP_MAINT_IRQ  9
>  #define KVM_DEV_ARM_VGIC_LINE_LEVEL_INFO_SHIFT	10
>  #define KVM_DEV_ARM_VGIC_LINE_LEVEL_INFO_MASK \
>  			(0x3fffffULL << KVM_DEV_ARM_VGIC_LINE_LEVEL_INFO_SHIFT)
> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
> index 17d32c2797c0..95c6232663c3 100644
> --- a/include/kvm/arm_vgic.h
> +++ b/include/kvm/arm_vgic.h
> @@ -229,6 +229,9 @@ struct vgic_dist {
>  
>  	int			nr_spis;
>  
> +	/* The GIC maintenance IRQ for nested hypervisors. */
> +	u32			maint_irq;
> +
>  	/* base addresses in guest physical address space: */
>  	gpa_t			vgic_dist_base;		/* distributor */
>  	union {
> diff --git a/virt/kvm/arm/vgic/vgic-kvm-device.c b/virt/kvm/arm/vgic/vgic-kvm-device.c
> index 44419679f91a..dfb1d7cc66b3 100644
> --- a/virt/kvm/arm/vgic/vgic-kvm-device.c
> +++ b/virt/kvm/arm/vgic/vgic-kvm-device.c
> @@ -241,6 +241,12 @@ static int vgic_get_common_attr(struct kvm_device *dev,
>  			     VGIC_NR_PRIVATE_IRQS, uaddr);
>  		break;
>  	}
> +	case KVM_DEV_ARM_VGIC_GRP_MAINT_IRQ: {
> +		u32 __user *uaddr = (u32 __user *)(long)attr->addr;
> +
> +		r = put_user(dev->kvm->arch.vgic.maint_irq, uaddr);
> +		break;
> +	}
>  	}
>  
>  	return r;
> @@ -627,6 +633,21 @@ static int vgic_v3_set_attr(struct kvm_device *dev,
>  		reg = tmp32;
>  		return vgic_v3_attr_regs_access(dev, attr, &reg, true);
>  	}
> +	case KVM_DEV_ARM_VGIC_GRP_MAINT_IRQ: {
> +		u32 __user *uaddr = (u32 __user *)(long)attr->addr;
> +		u32 val;
> +
> +		if (get_user(val, uaddr))
> +			return -EFAULT;
> +
> +		/* Must be a PPI. */
> +		if ((val >= VGIC_NR_PRIVATE_IRQS) || (val < VGIC_NR_SGIS))
> +			return -EINVAL;
> +
> +		dev->kvm->arch.vgic.maint_irq = val;
> +
> +		return 0;
> +	}
>  	case KVM_DEV_ARM_VGIC_GRP_CTRL: {
>  		int ret;
>  
> @@ -712,6 +733,7 @@ static int vgic_v3_has_attr(struct kvm_device *dev,
>  	case KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS:
>  		return vgic_v3_has_attr_regs(dev, attr);
>  	case KVM_DEV_ARM_VGIC_GRP_NR_IRQS:
> +	case KVM_DEV_ARM_VGIC_GRP_MAINT_IRQ:
>  		return 0;
>  	case KVM_DEV_ARM_VGIC_GRP_LEVEL_INFO: {
>  		if (((attr->attr & KVM_DEV_ARM_VGIC_LINE_LEVEL_INFO_MASK) >>
> 

-- 
Julien Thierry

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH 53/59] KVM: arm64: nv: Implement maintenance interrupt forwarding
  2019-06-21  9:38 ` [PATCH 53/59] KVM: arm64: nv: Implement maintenance interrupt forwarding Marc Zyngier
@ 2019-07-04  8:06   ` Julien Thierry
  2019-07-16 16:35   ` Alexandru Elisei
  1 sibling, 0 replies; 176+ messages in thread
From: Julien Thierry @ 2019-07-04  8:06 UTC (permalink / raw)
  To: Marc Zyngier, linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	James Morse, Suzuki K Poulose



On 21/06/2019 10:38, Marc Zyngier wrote:
> When we take a maintenance interrupt, we need to decide whether
> it is generated on an action from the guest, or if it is something
> that needs to be forwarded to the guest hypervisor.
> 
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  arch/arm64/kvm/nested.c            |  2 +-
>  virt/kvm/arm/vgic/vgic-init.c      | 30 ++++++++++++++++++++++++++++++
>  virt/kvm/arm/vgic/vgic-v3-nested.c | 25 +++++++++++++++++++++----
>  3 files changed, 52 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
> index df2db9ab7cfb..ab61f0f30ee6 100644
> --- a/arch/arm64/kvm/nested.c
> +++ b/arch/arm64/kvm/nested.c
> @@ -545,7 +545,7 @@ bool vgic_state_is_nested(struct kvm_vcpu *vcpu)
>  	bool imo = __vcpu_sys_reg(vcpu, HCR_EL2) & HCR_IMO;
>  	bool fmo = __vcpu_sys_reg(vcpu, HCR_EL2) & HCR_FMO;
>  
> -	WARN(imo != fmo, "Separate virtual IRQ/FIQ settings not supported\n");
> +	WARN_ONCE(imo != fmo, "Separate virtual IRQ/FIQ settings not supported\n");
>  
>  	return nested_virt_in_use(vcpu) && imo && fmo && !is_hyp_ctxt(vcpu);
>  }
> diff --git a/virt/kvm/arm/vgic/vgic-init.c b/virt/kvm/arm/vgic/vgic-init.c
> index 3bdb31eaed64..ec54bc8d5126 100644
> --- a/virt/kvm/arm/vgic/vgic-init.c
> +++ b/virt/kvm/arm/vgic/vgic-init.c
> @@ -17,9 +17,11 @@
>  #include <linux/uaccess.h>
>  #include <linux/interrupt.h>
>  #include <linux/cpu.h>
> +#include <linux/irq.h>
>  #include <linux/kvm_host.h>
>  #include <kvm/arm_vgic.h>
>  #include <asm/kvm_mmu.h>
> +#include <asm/kvm_nested.h>
>  #include "vgic.h"
>  
>  /*
> @@ -240,6 +242,16 @@ int kvm_vgic_vcpu_init(struct kvm_vcpu *vcpu)
>  	if (!irqchip_in_kernel(vcpu->kvm))
>  		return 0;
>  
> +	if (nested_virt_in_use(vcpu)) {
> +		/* FIXME: remove this hack */
> +		if (vcpu->kvm->arch.vgic.maint_irq == 0)
> +			vcpu->kvm->arch.vgic.maint_irq = kvm_vgic_global_state.maint_irq;
> +		ret = kvm_vgic_set_owner(vcpu, vcpu->kvm->arch.vgic.maint_irq,
> +					 vcpu);

Having this here, does it mean that the userland needs to set the
maintenance irq attribute before creating any vCPU for the VM?

If so, should it be clarified in the documentation? Or is it a general
rule that (vGIC?) attributes should be set before creating vCPUs?

Cheers,

Julien

> +		if (ret)
> +			return ret;
> +	}
> +
>  	/*
>  	 * If we are creating a VCPU with a GICv3 we must also register the
>  	 * KVM io device for the redistributor that belongs to this VCPU.
> @@ -455,12 +467,23 @@ static int vgic_init_cpu_dying(unsigned int cpu)
>  
>  static irqreturn_t vgic_maintenance_handler(int irq, void *data)
>  {
> +	struct kvm_vcpu *vcpu = *(struct kvm_vcpu **)data;
> +
>  	/*
>  	 * We cannot rely on the vgic maintenance interrupt to be
>  	 * delivered synchronously. This means we can only use it to
>  	 * exit the VM, and we perform the handling of EOIed
>  	 * interrupts on the exit path (see vgic_fold_lr_state).
>  	 */
> +
> +	/* If not nested, deactivate */
> +	if (!vcpu || !vgic_state_is_nested(vcpu)) {
> +		irq_set_irqchip_state(irq, IRQCHIP_STATE_ACTIVE, false);
> +		return IRQ_HANDLED;
> +	}
> +
> +	/* Assume nested from now */
> +	vgic_v3_handle_nested_maint_irq(vcpu);
>  	return IRQ_HANDLED;
>  }
>  
> @@ -531,6 +554,13 @@ int kvm_vgic_hyp_init(void)
>  		return ret;
>  	}
>  
> +	ret = irq_set_vcpu_affinity(kvm_vgic_global_state.maint_irq,
> +				    kvm_get_running_vcpus());
> +	if (ret) {
> +		kvm_err("Error setting vcpu affinity\n");
> +		goto out_free_irq;
> +	}
> +
>  	ret = cpuhp_setup_state(CPUHP_AP_KVM_ARM_VGIC_INIT_STARTING,
>  				"kvm/arm/vgic:starting",
>  				vgic_init_cpu_starting, vgic_init_cpu_dying);
> diff --git a/virt/kvm/arm/vgic/vgic-v3-nested.c b/virt/kvm/arm/vgic/vgic-v3-nested.c
> index c917d49e4a14..7c5f82ae68e0 100644
> --- a/virt/kvm/arm/vgic/vgic-v3-nested.c
> +++ b/virt/kvm/arm/vgic/vgic-v3-nested.c
> @@ -172,10 +172,20 @@ void vgic_v3_sync_nested(struct kvm_vcpu *vcpu)
>  void vgic_v3_load_nested(struct kvm_vcpu *vcpu)
>  {
>  	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
> +	struct vgic_irq *irq;
> +	unsigned long flags;
>  
>  	vgic_cpu->shadow_vgic_v3 = vgic_cpu->nested_vgic_v3;
>  	vgic_v3_create_shadow_lr(vcpu);
>  	__vgic_v3_restore_state(vcpu_shadow_if(vcpu));
> +
> +	irq = vgic_get_irq(vcpu->kvm, vcpu, vcpu->kvm->arch.vgic.maint_irq);
> +	raw_spin_lock_irqsave(&irq->irq_lock, flags);
> +	if (irq->line_level || irq->active)
> +		irq_set_irqchip_state(kvm_vgic_global_state.maint_irq,
> +				      IRQCHIP_STATE_ACTIVE, true);
> +	raw_spin_unlock_irqrestore(&irq->irq_lock, flags);
> +	vgic_put_irq(vcpu->kvm, irq);
>  }
>  
>  void vgic_v3_put_nested(struct kvm_vcpu *vcpu)
> @@ -190,11 +200,14 @@ void vgic_v3_put_nested(struct kvm_vcpu *vcpu)
>  	 */
>  	vgic_v3_fixup_shadow_lr_state(vcpu);
>  	vgic_cpu->nested_vgic_v3 = vgic_cpu->shadow_vgic_v3;
> +	irq_set_irqchip_state(kvm_vgic_global_state.maint_irq,
> +			      IRQCHIP_STATE_ACTIVE, false);
>  }
>  
>  void vgic_v3_handle_nested_maint_irq(struct kvm_vcpu *vcpu)
>  {
>  	struct vgic_v3_cpu_if *cpu_if = vcpu_nested_if(vcpu);
> +	bool state;
>  
>  	/*
>  	 * If we exit a nested VM with a pending maintenance interrupt from the
> @@ -202,8 +215,12 @@ void vgic_v3_handle_nested_maint_irq(struct kvm_vcpu *vcpu)
>  	 * can re-sync the appropriate LRs and sample level triggered interrupts
>  	 * again.
>  	 */
> -	if (vgic_state_is_nested(vcpu) &&
> -	    (cpu_if->vgic_hcr & ICH_HCR_EN) &&
> -	    vgic_v3_get_misr(vcpu))
> -		kvm_inject_nested_irq(vcpu);
> +	if (!vgic_state_is_nested(vcpu))
> +		return;
> +
> +	state  = cpu_if->vgic_hcr & ICH_HCR_EN;
> +	state &= vgic_v3_get_misr(vcpu);
> +
> +	kvm_vgic_inject_irq(vcpu->kvm, vcpu->vcpu_id,
> +			    vcpu->kvm->arch.vgic.maint_irq, state, vcpu);
>  }
> 

-- 
Julien Thierry

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH 52/59] KVM: arm64: nv: vgic: Allow userland to set VGIC maintenance IRQ
  2019-07-04  7:38   ` Julien Thierry
@ 2019-07-04  9:01     ` Andre Przywara
  2019-07-04  9:04       ` Julien Thierry
  0 siblings, 1 reply; 176+ messages in thread
From: Andre Przywara @ 2019-07-04  9:01 UTC (permalink / raw)
  To: Julien Thierry
  Cc: Marc Zyngier, linux-arm-kernel, kvmarm, kvm, Christoffer Dall,
	Dave Martin, Jintack Lim, James Morse, Suzuki K Poulose

On Thu, 4 Jul 2019 08:38:20 +0100
Julien Thierry <julien.thierry@arm.com> wrote:

> On 21/06/2019 10:38, Marc Zyngier wrote:
> > From: Andre Przywara <andre.przywara@arm.com>
> > 
> > The VGIC maintenance IRQ signals various conditions about the LRs, when
> > the GIC's virtualization extension is used.
> > So far we didn't need it, but nested virtualization needs to know about
> > this interrupt, so add a userland interface to set up the IRQ number.
> > The architecture mandates that it must be a PPI; on top of that, this code
> > only exports a per-device option, so the PPI is the same on all VCPUs.
> > 
> > Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> > [added some bits of documentation]
> > Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> > ---
> >  .../virtual/kvm/devices/arm-vgic-v3.txt       |  9 ++++++++
> >  arch/arm/include/uapi/asm/kvm.h               |  1 +
> >  arch/arm64/include/uapi/asm/kvm.h             |  1 +
> >  include/kvm/arm_vgic.h                        |  3 +++
> >  virt/kvm/arm/vgic/vgic-kvm-device.c           | 22 +++++++++++++++++++
> >  5 files changed, 36 insertions(+)
> > 
> > diff --git a/Documentation/virtual/kvm/devices/arm-vgic-v3.txt b/Documentation/virtual/kvm/devices/arm-vgic-v3.txt
> > index ff290b43c8e5..c70e8f2e0c9c 100644
> > --- a/Documentation/virtual/kvm/devices/arm-vgic-v3.txt
> > +++ b/Documentation/virtual/kvm/devices/arm-vgic-v3.txt
> > @@ -249,3 +249,12 @@ Groups:
> >    Errors:
> >      -EINVAL: vINTID is not multiple of 32 or
> >       info field is not VGIC_LEVEL_INFO_LINE_LEVEL
> > +
> > +  KVM_DEV_ARM_VGIC_GRP_MAINT_IRQ
> > +    The attr field of kvm_device_attr encodes the following values:
> > +    bits:     | 31   ....    5 | 4  ....  0 |
> > +    values:   |      RES0      |   vINTID   |
> > +
> > +    The vINTID specifies which interrupt is generated when the vGIC
> > +    must generate a maintenance interrupt. This must be a PPI.
> > +  
> 
> Something seems off. The documentation suggests that the value of the
> attribute will be between 0-15 (and other values will be masked down to
> a value between 0 and 15).

Where does that happen? The mask is [4:0], so 5 bits, that should be enough for PPIs as well.
We could add a line to the documentation to stress that this is an interrupt ID as seen by the virtual GIC, if that helps.
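
For the record, the arithmetic: PPIs are vINTIDs 16-31, and 31 is
0b11111, so the [4:0] field covers them exactly. The kernel-side check
quoted further down enforces it:

	/* Must be a PPI: 16 <= val < 32, i.e. it fits in attr bits [4:0] */
	if ((val >= VGIC_NR_PRIVATE_IRQS) || (val < VGIC_NR_SGIS))
		return -EINVAL;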

Cheers,
Andre.

> However, in the code the value should be
> between 16 and 32 (PPI INTID) and other values are rejected.
> 
> Am I reading this wrong?
> 
> Cheers,
> 
> Julien
> 
> > diff --git a/arch/arm/include/uapi/asm/kvm.h b/arch/arm/include/uapi/asm/kvm.h
> > index 4602464ebdfb..a6af45645a6d 100644
> > --- a/arch/arm/include/uapi/asm/kvm.h
> > +++ b/arch/arm/include/uapi/asm/kvm.h
> > @@ -233,6 +233,7 @@ struct kvm_vcpu_events {
> >  #define KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS 6
> >  #define KVM_DEV_ARM_VGIC_GRP_LEVEL_INFO  7
> >  #define KVM_DEV_ARM_VGIC_GRP_ITS_REGS	8
> > +#define KVM_DEV_ARM_VGIC_GRP_MAINT_IRQ	9
> >  #define KVM_DEV_ARM_VGIC_LINE_LEVEL_INFO_SHIFT	10
> >  #define KVM_DEV_ARM_VGIC_LINE_LEVEL_INFO_MASK \
> >  			(0x3fffffULL << KVM_DEV_ARM_VGIC_LINE_LEVEL_INFO_SHIFT)
> > diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
> > index 563e2a8bae93..8e1c6ffe1b59 100644
> > --- a/arch/arm64/include/uapi/asm/kvm.h
> > +++ b/arch/arm64/include/uapi/asm/kvm.h
> > @@ -295,6 +295,7 @@ struct kvm_vcpu_events {
> >  #define KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS 6
> >  #define KVM_DEV_ARM_VGIC_GRP_LEVEL_INFO  7
> >  #define KVM_DEV_ARM_VGIC_GRP_ITS_REGS 8
> > +#define KVM_DEV_ARM_VGIC_GRP_MAINT_IRQ  9
> >  #define KVM_DEV_ARM_VGIC_LINE_LEVEL_INFO_SHIFT	10
> >  #define KVM_DEV_ARM_VGIC_LINE_LEVEL_INFO_MASK \
> >  			(0x3fffffULL << KVM_DEV_ARM_VGIC_LINE_LEVEL_INFO_SHIFT)
> > diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
> > index 17d32c2797c0..95c6232663c3 100644
> > --- a/include/kvm/arm_vgic.h
> > +++ b/include/kvm/arm_vgic.h
> > @@ -229,6 +229,9 @@ struct vgic_dist {
> >  
> >  	int			nr_spis;
> >  
> > +	/* The GIC maintenance IRQ for nested hypervisors. */
> > +	u32			maint_irq;
> > +
> >  	/* base addresses in guest physical address space: */
> >  	gpa_t			vgic_dist_base;		/* distributor */
> >  	union {
> > diff --git a/virt/kvm/arm/vgic/vgic-kvm-device.c b/virt/kvm/arm/vgic/vgic-kvm-device.c
> > index 44419679f91a..dfb1d7cc66b3 100644
> > --- a/virt/kvm/arm/vgic/vgic-kvm-device.c
> > +++ b/virt/kvm/arm/vgic/vgic-kvm-device.c
> > @@ -241,6 +241,12 @@ static int vgic_get_common_attr(struct kvm_device *dev,
> >  			     VGIC_NR_PRIVATE_IRQS, uaddr);
> >  		break;
> >  	}
> > +	case KVM_DEV_ARM_VGIC_GRP_MAINT_IRQ: {
> > +		u32 __user *uaddr = (u32 __user *)(long)attr->addr;
> > +
> > +		r = put_user(dev->kvm->arch.vgic.maint_irq, uaddr);
> > +		break;
> > +	}
> >  	}
> >  
> >  	return r;
> > @@ -627,6 +633,21 @@ static int vgic_v3_set_attr(struct kvm_device *dev,
> >  		reg = tmp32;
> >  		return vgic_v3_attr_regs_access(dev, attr, &reg, true);
> >  	}
> > +	case KVM_DEV_ARM_VGIC_GRP_MAINT_IRQ: {
> > +		u32 __user *uaddr = (u32 __user *)(long)attr->addr;
> > +		u32 val;
> > +
> > +		if (get_user(val, uaddr))
> > +			return -EFAULT;
> > +
> > +		/* Must be a PPI. */
> > +		if ((val >= VGIC_NR_PRIVATE_IRQS) || (val < VGIC_NR_SGIS))
> > +			return -EINVAL;
> > +
> > +		dev->kvm->arch.vgic.maint_irq = val;
> > +
> > +		return 0;
> > +	}
> >  	case KVM_DEV_ARM_VGIC_GRP_CTRL: {
> >  		int ret;
> >  
> > @@ -712,6 +733,7 @@ static int vgic_v3_has_attr(struct kvm_device *dev,
> >  	case KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS:
> >  		return vgic_v3_has_attr_regs(dev, attr);
> >  	case KVM_DEV_ARM_VGIC_GRP_NR_IRQS:
> > +	case KVM_DEV_ARM_VGIC_GRP_MAINT_IRQ:
> >  		return 0;
> >  	case KVM_DEV_ARM_VGIC_GRP_LEVEL_INFO: {
> >  		if (((attr->attr & KVM_DEV_ARM_VGIC_LINE_LEVEL_INFO_MASK) >>
> >   
> 


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH 52/59] KVM: arm64: nv: vgic: Allow userland to set VGIC maintenance IRQ
  2019-07-04  9:01     ` Andre Przywara
@ 2019-07-04  9:04       ` Julien Thierry
  0 siblings, 0 replies; 176+ messages in thread
From: Julien Thierry @ 2019-07-04  9:04 UTC (permalink / raw)
  To: Andre Przywara
  Cc: kvm, Suzuki K Poulose, Marc Zyngier, Christoffer Dall, kvmarm,
	Jintack Lim, James Morse, Dave Martin, linux-arm-kernel



On 04/07/2019 10:01, Andre Przywara wrote:
> On Thu, 4 Jul 2019 08:38:20 +0100
> Julien Thierry <julien.thierry@arm.com> wrote:
> 
>> On 21/06/2019 10:38, Marc Zyngier wrote:
>>> From: Andre Przywara <andre.przywara@arm.com>
>>>
>>> The VGIC maintenance IRQ signals various conditions about the LRs, when
>>> the GIC's virtualization extension is used.
>>> So far we didn't need it, but nested virtualization needs to know about
>>> this interrupt, so add a userland interface to set up the IRQ number.
>>> The architecture mandates that it must be a PPI; on top of that, this code
>>> only exports a per-device option, so the PPI is the same on all VCPUs.
>>>
>>> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
>>> [added some bits of documentation]
>>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>>> ---
>>>  .../virtual/kvm/devices/arm-vgic-v3.txt       |  9 ++++++++
>>>  arch/arm/include/uapi/asm/kvm.h               |  1 +
>>>  arch/arm64/include/uapi/asm/kvm.h             |  1 +
>>>  include/kvm/arm_vgic.h                        |  3 +++
>>>  virt/kvm/arm/vgic/vgic-kvm-device.c           | 22 +++++++++++++++++++
>>>  5 files changed, 36 insertions(+)
>>>
>>> diff --git a/Documentation/virtual/kvm/devices/arm-vgic-v3.txt b/Documentation/virtual/kvm/devices/arm-vgic-v3.txt
>>> index ff290b43c8e5..c70e8f2e0c9c 100644
>>> --- a/Documentation/virtual/kvm/devices/arm-vgic-v3.txt
>>> +++ b/Documentation/virtual/kvm/devices/arm-vgic-v3.txt
>>> @@ -249,3 +249,12 @@ Groups:
>>>    Errors:
>>>      -EINVAL: vINTID is not multiple of 32 or
>>>       info field is not VGIC_LEVEL_INFO_LINE_LEVEL
>>> +
>>> +  KVM_DEV_ARM_VGIC_GRP_MAINT_IRQ
>>> +    The attr field of kvm_device_attr encodes the following values:
>>> +    bits:     | 31   ....    5 | 4  ....  0 |
>>> +    values:   |      RES0      |   vINTID   |
>>> +
>>> +    The vINTID specifies which interrupt is generated when the vGIC
>>> +    must generate a maintenance interrupt. This must be a PPI.
>>> +  
>>
>> Something seems off. The documentation suggests that the value of the
>> attribute will be between 0-15 (and other values will be masked down to
>> a value between 0 and 15).
> 
> Where does that happen? The mask is [4:0], so 5 bits, that should be enough for PPIs as well.
> We could add a line to the documentation to stress that this is an interrupt ID as seen by the virtual GIC, if that helps.
> 

You're right, I misread the length of the vINTID field.

Nevermind then!
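
For reference, driving this from userspace would look something like the
sketch below (matching the handler above, which reads the vINTID through
attr->addr; "vgic_fd" is the fd returned by KVM_CREATE_DEVICE, and PPI 25
is an arbitrary example):

	__u32 maint_irq = 25;	/* must be in the PPI range [16..31] */
	struct kvm_device_attr attr = {
		.group	= KVM_DEV_ARM_VGIC_GRP_MAINT_IRQ,
		.attr	= 0,
		.addr	= (__u64)(unsigned long)&maint_irq,
	};

	if (ioctl(vgic_fd, KVM_SET_DEVICE_ATTR, &attr))
		perror("KVM_SET_DEVICE_ATTR");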

Thanks,

-- 
Julien Thierry

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH 06/59] KVM: arm64: nv: Allow userspace to set PSR_MODE_EL2x
  2019-07-03  9:21         ` Marc Zyngier
@ 2019-07-04 10:00           ` Dave Martin
  0 siblings, 0 replies; 176+ messages in thread
From: Dave Martin @ 2019-07-04 10:00 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: kvm, Andre Przywara, kvmarm, linux-arm-kernel

On Wed, Jul 03, 2019 at 10:21:57AM +0100, Marc Zyngier wrote:
> On 24/06/2019 13:48, Dave Martin wrote:
> > On Fri, Jun 21, 2019 at 02:50:08PM +0100, Marc Zyngier wrote:
> >> On 21/06/2019 14:24, Julien Thierry wrote:
> >>>
> >>>
> >>> On 21/06/2019 10:37, Marc Zyngier wrote:
> >>>> From: Christoffer Dall <christoffer.dall@linaro.org>
> >>>>
> >>>> We were not allowing userspace to set a more privileged mode for the VCPU
> >>>> than EL1, but we should allow this when nested virtualization is enabled
> >>>> for the VCPU.
> >>>>
> >>>> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> >>>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> >>>> ---
> >>>>  arch/arm64/kvm/guest.c | 6 ++++++
> >>>>  1 file changed, 6 insertions(+)
> >>>>
> >>>> diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
> >>>> index 3ae2f82fca46..4c35b5d51e21 100644
> >>>> --- a/arch/arm64/kvm/guest.c
> >>>> +++ b/arch/arm64/kvm/guest.c
> >>>> @@ -37,6 +37,7 @@
> >>>>  #include <asm/kvm_emulate.h>
> >>>>  #include <asm/kvm_coproc.h>
> >>>>  #include <asm/kvm_host.h>
> >>>> +#include <asm/kvm_nested.h>
> >>>>  #include <asm/sigcontext.h>
> >>>>  
> >>>>  #include "trace.h"
> >>>> @@ -194,6 +195,11 @@ static int set_core_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
> >>>>  			if (vcpu_el1_is_32bit(vcpu))
> >>>>  				return -EINVAL;
> >>>>  			break;
> >>>> +		case PSR_MODE_EL2h:
> >>>> +		case PSR_MODE_EL2t:
> >>>> +			if (vcpu_el1_is_32bit(vcpu) || !nested_virt_in_use(vcpu))
> >>>
> >>> This condition reads a bit weirdly. Why do we care about anything else
> >>> than !nested_virt_in_use() ?
> >>>
> >>> If nested virt is not in use then obviously we return the error.
> >>>
> >>> If nested virt is in use then why do we care about EL1? Or should this
> >>> test read as "highest_el_is_32bit" ?
> >>
> >> There are multiple things at play here:
> >>
> >> - MODE_EL2x is not a valid 32bit mode
> >> - The architecture forbids nested virt with 32bit EL2
> >>
> >> The code above is a simplification of these two conditions. But
> >> certainly we can do a bit better, as kvm_reset_cpu() doesn't really
> >> check that we don't create a vcpu with both 32bit+NV. These two bits
> >> should really be exclusive.
> > 
> > This code is safe for now because KVM_VCPU_MAX_FEATURES <=
> > KVM_ARM_VCPU_NESTED_VIRT, right, i.e., nested_virt_in_use() cannot be
> > true?
> > 
> > This makes me a little uneasy, but I think that's paranoia talking: we
> > want bisectability, but no sane person should ship with just half of this
> > series.  So I guess this is fine.
> > 
> > We could stick something like
> > 
> > 	if (WARN_ON(...))
> > 		return false;
> > 
> > in nested_virt_in_use() and then remove it in the final patch, but it's
> > probably overkill.
> 
> The only case I can imagine something going wrong is if this series is
> only applied halfway, and another series bumps the maximum feature to
> something that includes NV. I guess your suggestion would solve that.
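
Something like the below, for the record (an untested sketch; the real
helper also has a cpufeature check, and the final patch would delete the
guard again):

static inline bool nested_virt_in_use(const struct kvm_vcpu *vcpu)
{
	/*
	 * Bisection guard: while the series is only half applied,
	 * KVM_ARM_VCPU_NESTED_VIRT >= KVM_VCPU_MAX_FEATURES and the
	 * feature bit cannot have been legitimately set.
	 */
	if (KVM_ARM_VCPU_NESTED_VIRT >= KVM_VCPU_MAX_FEATURES) {
		WARN_ON(test_bit(KVM_ARM_VCPU_NESTED_VIRT,
				 vcpu->arch.features));
		return false;
	}

	return test_bit(KVM_ARM_VCPU_NESTED_VIRT, vcpu->arch.features);
}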

I won't lose sleep over it either way.

Cheers
---Dave

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH 13/59] KVM: arm64: nv: Handle virtual EL2 registers in vcpu_read/write_sys_reg()
  2019-07-03 16:32       ` Alexandru Elisei
@ 2019-07-04 14:39         ` Marc Zyngier
  0 siblings, 0 replies; 176+ messages in thread
From: Marc Zyngier @ 2019-07-04 14:39 UTC (permalink / raw)
  To: Alexandru Elisei, linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Dave Martin

On 03/07/2019 17:32, Alexandru Elisei wrote:

[...]

>>>> +}
>>>> +
>>>> +#define EL2_SYSREG(el2, el1, translate)	\
>>>> +	[el2 - FIRST_EL2_SYSREG] = { el2, el1, translate }
>>>> +	[el2 - FIRST_EL2_SYSREG] = { el2, __INVALID_SYSREG__, NULL }
>>>> +	[el2 - FIRST_EL2_SYSREG] = { el2,__INVALID_SYSREG__, NULL }
>>>> +/*
>>>> + * Associate vEL2 registers to their EL1 counterparts on the CPU.
>>>> + * The translate function can be NULL, when the register layout is identical.
>>>> + */
>>>> +struct el2_sysreg_map {
>>>> +	int sysreg;	/* EL2 register index into the array above */
>>>> +	int mapping;	/* associated EL1 register */
>>>> +	u64 (*translate)(u64 value);
>>>> +} nested_sysreg_map[NR_SYS_REGS - FIRST_EL2_SYSREG] = {
>>>> +	PURE_EL2_SYSREG( VPIDR_EL2 ),
>>>> +	PURE_EL2_SYSREG( VMPIDR_EL2 ),
>>>> +	PURE_EL2_SYSREG( ACTLR_EL2 ),
>>>> +	PURE_EL2_SYSREG( HCR_EL2 ),
>>>> +	PURE_EL2_SYSREG( MDCR_EL2 ),
>>>> +	PURE_EL2_SYSREG( HSTR_EL2 ),
>>>> +	PURE_EL2_SYSREG( HACR_EL2 ),
>>>> +	PURE_EL2_SYSREG( VTTBR_EL2 ),
>>>> +	PURE_EL2_SYSREG( VTCR_EL2 ),
>>>> +	PURE_EL2_SYSREG( RVBAR_EL2 ),
>>>> +	PURE_EL2_SYSREG( RMR_EL2 ),
>>>> +	PURE_EL2_SYSREG( TPIDR_EL2 ),
>>>> +	PURE_EL2_SYSREG( CNTVOFF_EL2 ),
>>>> +	PURE_EL2_SYSREG( CNTHCTL_EL2 ),
>>>> +	PURE_EL2_SYSREG( HPFAR_EL2 ),
>>>> +	EL2_SYSREG(      SCTLR_EL2,  SCTLR_EL1,      translate_sctlr ),
>>>> +	EL2_SYSREG(      CPTR_EL2,   CPACR_EL1,      translate_cptr  ),
>>>> +	EL2_SYSREG(      TTBR0_EL2,  TTBR0_EL1,      translate_ttbr0 ),
>>>> +	EL2_SYSREG(      TTBR1_EL2,  TTBR1_EL1,      NULL            ),
>>>> +	EL2_SYSREG(      TCR_EL2,    TCR_EL1,        translate_tcr   ),
>>>> +	EL2_SYSREG(      VBAR_EL2,   VBAR_EL1,       NULL            ),
>>>> +	EL2_SYSREG(      AFSR0_EL2,  AFSR0_EL1,      NULL            ),
>>>> +	EL2_SYSREG(      AFSR1_EL2,  AFSR1_EL1,      NULL            ),
>>>> +	EL2_SYSREG(      ESR_EL2,    ESR_EL1,        NULL            ),
>>>> +	EL2_SYSREG(      FAR_EL2,    FAR_EL1,        NULL            ),
>>>> +	EL2_SYSREG(      MAIR_EL2,   MAIR_EL1,       NULL            ),
>>>> +	EL2_SYSREG(      AMAIR_EL2,  AMAIR_EL1,      NULL            ),
>>>> +};
>>> Figuring out which registers are in this map and which aren't and are supposed
>>> to be treated differently is really cumbersome because they are split into two
>>> types of el2 registers and their order is different from the order in enum
>>> vcpu_sysreg (in kvm_host.h). Perhaps adding a comment about what registers will
>>> be treated differently would make the code a bit easier to follow?
>> I'm not sure what this buys us. We have 3 categories of EL2 sysregs:
>> - Purely emulated
>> - Directly mapped onto an EL1 sysreg
>> - Translated from EL2 to EL1
>>
>> I think the wrappers represent that pretty well, although we could split
>> EL2_SYSREG into DIRECT_EL2_SYSREG and TRANSLATE_EL2_SYSREG. As for the
>> order, does it really matter? We also have the trap table order, which
>> is also different from the enum. Do you propose we reorder everything?
> 
> The wrappers and the naming are fine.
> 
> I was trying to figure out which EL2 registers are in the nested_sysreg_map and
> which aren't (that's what I meant by "two types of registers") by looking at the
> vcpu_sysreg enum. Because the order in the map is different than the order in
> the enum, I was having a difficult time figuring out which registers are not in
> the nested_sysreg_map to make sure we haven't somehow forgot to emulate a register.
> 
> So no, I wasn't asking to reorder everything. I was asking if it would be
> appropriate to write a comment stating the intention to treat registers X, Y and
> Z separately from the registers in nested_sysreg_map.

Ah, fair enough. Yes, that's a very reasonable suggestion.
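
Something like this above the map, then (a quick sketch; the exact
wording and register list to be double-checked against the enum):

/*
 * PURE_EL2_SYSREG entries are purely emulated and only ever live in
 * memory. EL2_SYSREG entries are mapped onto an EL1 register,
 * optionally via a translate function. SP_EL2, SPSR_EL2 and ELR_EL2
 * are deliberately absent from this map and get dedicated handling
 * in vcpu_read/write_sys_reg().
 */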

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH 27/59] KVM: arm64: nv: Respect virtual HCR_EL2.TVM and TRVM settings
  2019-06-26  6:55   ` Julien Thierry
@ 2019-07-04 14:57     ` Marc Zyngier
  0 siblings, 0 replies; 176+ messages in thread
From: Marc Zyngier @ 2019-07-04 14:57 UTC (permalink / raw)
  To: Julien Thierry, linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Dave Martin, Jintack Lim,
	James Morse, Suzuki K Poulose

On 26/06/2019 07:55, Julien Thierry wrote:
> 
> 
> On 06/21/2019 10:38 AM, Marc Zyngier wrote:
>> From: Jintack Lim <jintack.lim@linaro.org>
>>
>> Forward the EL1 virtual memory register traps to the virtual EL2 if they
>> are not coming from the virtual EL2 and the virtual HCR_EL2.TVM or TRVM
>> bit is set.
>>
>> This is for recursive nested virtualization.
>>
>> Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>> ---
>>  arch/arm64/kvm/sys_regs.c | 24 ++++++++++++++++++++++++
>>  1 file changed, 24 insertions(+)
>>
>> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
>> index 582d62aa48b7..0f74b9277a86 100644
>> --- a/arch/arm64/kvm/sys_regs.c
>> +++ b/arch/arm64/kvm/sys_regs.c
>> @@ -436,6 +436,27 @@ static bool access_dcsw(struct kvm_vcpu *vcpu,
>>  	return true;
>>  }
>>  
>> +/* This function supports recursive nested virtualization */
>> +static bool forward_vm_traps(struct kvm_vcpu *vcpu, struct sys_reg_params *p)
>> +{
>> +	u64 hcr_el2 = __vcpu_sys_reg(vcpu, HCR_EL2);
>> +
>> +	/* If a trap comes from the virtual EL2, the host hypervisor handles it. */
>> +	if (vcpu_mode_el2(vcpu))
>> +		return false;
>> +
>> +	/*
>> +	 * If the virtual HCR_EL2.TVM or TRVM bit is set, we need to forward
>> +	 * this trap to the virtual EL2.
>> +	 */
>> +	if ((hcr_el2 & HCR_TVM) && p->is_write)
>> +		return true;
>> +	else if ((hcr_el2 & HCR_TRVM) && !p->is_write)
>> +		return true;
>> +
>> +	return false;
>> +}
>> +
>>  /*
>>   * Generic accessor for VM registers. Only called as long as HCR_TVM
>>   * is set. If the guest enables the MMU, we stop trapping the VM
>> @@ -452,6 +473,9 @@ static bool access_vm_reg(struct kvm_vcpu *vcpu,
>>  	if (el12_reg(p) && forward_nv_traps(vcpu))
>>  		return false;
>>  
>> +	if (!el12_reg(p) && forward_vm_traps(vcpu, p))
>> +		return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
> 
> Since we already have forward_traps(), isn't this just:
> 
> 	if (!el12_reg(p) && forward_traps(vcpu, p->is_write ? HCR_TVM : HCR_TRVM))
> 		return true;
> 
> We could maybe simplify forward_vm_traps() to just call forward_traps()
> similar to forward_nv_traps().

Odd. I remember doing something like that. Where has it gone? Yes, this
looks sensible.
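
i.e. something like this (a sketch, assuming forward_traps() keeps
performing the vEL2 check itself):

static bool forward_vm_traps(struct kvm_vcpu *vcpu, struct sys_reg_params *p)
{
	return forward_traps(vcpu, p->is_write ? HCR_TVM : HCR_TRVM);
}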

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH 13/59] KVM: arm64: nv: Handle virtual EL2 registers in vcpu_read/write_sys_reg()
  2019-06-26 15:04   ` Alexandru Elisei
@ 2019-07-04 15:05     ` Marc Zyngier
  0 siblings, 0 replies; 176+ messages in thread
From: Marc Zyngier @ 2019-07-04 15:05 UTC (permalink / raw)
  To: Alexandru Elisei, linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Dave Martin

On 26/06/2019 16:04, Alexandru Elisei wrote:
> On 6/21/19 10:37 AM, Marc Zyngier wrote:
>> From: Andre Przywara <andre.przywara@arm.com>
>>
>> KVM internally uses accessor functions when reading or writing the
>> guest's system registers. This takes care of accessing either the stored
>> copy or using the "live" EL1 system registers when the host uses VHE.
>>
>> With the introduction of virtual EL2 we add a bunch of EL2 system
>> registers, which now must also be taken care of:
>> - If the guest is running in vEL2, and we access an EL1 sysreg, we must
>>   revert to the stored version of that, and not use the CPU's copy.
>> - If the guest is running in vEL1, and we access an EL2 sysreg, we must
>>   also use the stored version, since the CPU carries the EL1 copy.
>> - Some EL2 system registers are supposed to affect the current execution
>>   of the system, so we need to put them into their respective EL1
>>   counterparts. For this we need to define a mapping between the two.
>>   This is done using the newly introduced struct el2_sysreg_map.
>> - Some EL2 system registers have a different format than their EL1
>>   counterpart, so we need to translate them before writing them to the
>>   CPU. This is done using an (optional) translate function in the map.
>> - There are the three special registers SP_EL2, SPSR_EL2 and ELR_EL2,
>>   which need some separate handling.
>>
>> All of these cases are now wrapped into the existing accessor functions,
>> so KVM users wouldn't need to care whether they access EL2 or EL1
>> registers and also which state the guest is in.
>>
>> This handles what was formerly known as the "shadow state" dynamically,
>> without requiring a separate copy for each vCPU EL.
>>
>> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>> ---
>>  arch/arm64/include/asm/kvm_emulate.h |   6 +
>>  arch/arm64/include/asm/kvm_host.h    |   5 +
>>  arch/arm64/kvm/sys_regs.c            | 163 +++++++++++++++++++++++++++
>>  3 files changed, 174 insertions(+)
>>
>> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
>> index c43aac5fed69..f37006b6eec4 100644
>> --- a/arch/arm64/include/asm/kvm_emulate.h
>> +++ b/arch/arm64/include/asm/kvm_emulate.h
>> @@ -70,6 +70,12 @@ void kvm_emulate_nested_eret(struct kvm_vcpu *vcpu);
>>  int kvm_inject_nested_sync(struct kvm_vcpu *vcpu, u64 esr_el2);
>>  int kvm_inject_nested_irq(struct kvm_vcpu *vcpu);
>>  
>> +u64 translate_tcr(u64 tcr);
>> +u64 translate_cptr(u64 tcr);
>> +u64 translate_sctlr(u64 tcr);
>> +u64 translate_ttbr0(u64 tcr);
>> +u64 translate_cnthctl(u64 tcr);
>> +
>>  static inline bool vcpu_el1_is_32bit(struct kvm_vcpu *vcpu)
>>  {
>>  	return !(vcpu->arch.hcr_el2 & HCR_RW);
>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>> index 2d4290d2513a..dae9c42a7219 100644
>> --- a/arch/arm64/include/asm/kvm_host.h
>> +++ b/arch/arm64/include/asm/kvm_host.h
>> @@ -217,6 +217,11 @@ enum vcpu_sysreg {
>>  	NR_SYS_REGS	/* Nothing after this line! */
>>  };
>>  
>> +static inline bool sysreg_is_el2(int reg)
>> +{
>> +	return reg >= FIRST_EL2_SYSREG && reg < NR_SYS_REGS;
>> +}
>> +
>>  /* 32bit mapping */
>>  #define c0_MPIDR	(MPIDR_EL1 * 2)	/* MultiProcessor ID Register */
>>  #define c0_CSSELR	(CSSELR_EL1 * 2)/* Cache Size Selection Register */
>> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
>> index 693dd063c9c2..d024114da162 100644
>> --- a/arch/arm64/kvm/sys_regs.c
>> +++ b/arch/arm64/kvm/sys_regs.c
>> @@ -76,11 +76,142 @@ static bool write_to_read_only(struct kvm_vcpu *vcpu,
>>  	return false;
>>  }
>>  
>> +static u64 tcr_el2_ips_to_tcr_el1_ps(u64 tcr_el2)
>> +{
>> +	return ((tcr_el2 & TCR_EL2_PS_MASK) >> TCR_EL2_PS_SHIFT)
>> +		<< TCR_IPS_SHIFT;
>> +}
>> +
>> +u64 translate_tcr(u64 tcr)
>> +{
>> +	return TCR_EPD1_MASK |				/* disable TTBR1_EL1 */
>> +	       ((tcr & TCR_EL2_TBI) ? TCR_TBI0 : 0) |
>> +	       tcr_el2_ips_to_tcr_el1_ps(tcr) |
>> +	       (tcr & TCR_EL2_TG0_MASK) |
>> +	       (tcr & TCR_EL2_ORGN0_MASK) |
>> +	       (tcr & TCR_EL2_IRGN0_MASK) |
>> +	       (tcr & TCR_EL2_T0SZ_MASK);
>> +}
>> +
>> +u64 translate_cptr(u64 cptr_el2)
>> +{
>> +	u64 cpacr_el1 = 0;
>> +
>> +	if (!(cptr_el2 & CPTR_EL2_TFP))
>> +		cpacr_el1 |= CPACR_EL1_FPEN;
>> +	if (cptr_el2 & CPTR_EL2_TTA)
>> +		cpacr_el1 |= CPACR_EL1_TTA;
>> +	if (!(cptr_el2 & CPTR_EL2_TZ))
>> +		cpacr_el1 |= CPACR_EL1_ZEN;
>> +
>> +	return cpacr_el1;
>> +}
>> +
>> +u64 translate_sctlr(u64 sctlr)
>> +{
>> +	/* Bit 20 is RES1 in SCTLR_EL1, but RES0 in SCTLR_EL2 */
>> +	return sctlr | BIT(20);
>> +}
>> +
>> +u64 translate_ttbr0(u64 ttbr0)
>> +{
>> +	/* Force ASID to 0 (ASID 0 or RES0) */
>> +	return ttbr0 & ~GENMASK_ULL(63, 48);
>> +}
>> +
>> +u64 translate_cnthctl(u64 cnthctl)
>> +{
>> +	return ((cnthctl & 0x3) << 10) | (cnthctl & 0xfc);
>> +}
>> +
>> +#define EL2_SYSREG(el2, el1, translate)	\
>> +	[el2 - FIRST_EL2_SYSREG] = { el2, el1, translate }
>> +#define PURE_EL2_SYSREG(el2) \
>> +	[el2 - FIRST_EL2_SYSREG] = { el2, __INVALID_SYSREG__, NULL }
>> +/*
>> + * Associate vEL2 registers to their EL1 counterparts on the CPU.
>> + * The translate function can be NULL, when the register layout is identical.
>> + */
>> +struct el2_sysreg_map {
>> +	int sysreg;	/* EL2 register index into the array above */
>> +	int mapping;	/* associated EL1 register */
>> +	u64 (*translate)(u64 value);
>> +} nested_sysreg_map[NR_SYS_REGS - FIRST_EL2_SYSREG] = {
>> +	PURE_EL2_SYSREG( VPIDR_EL2 ),
>> +	PURE_EL2_SYSREG( VMPIDR_EL2 ),
>> +	PURE_EL2_SYSREG( ACTLR_EL2 ),
>> +	PURE_EL2_SYSREG( HCR_EL2 ),
>> +	PURE_EL2_SYSREG( MDCR_EL2 ),
>> +	PURE_EL2_SYSREG( HSTR_EL2 ),
>> +	PURE_EL2_SYSREG( HACR_EL2 ),
>> +	PURE_EL2_SYSREG( VTTBR_EL2 ),
>> +	PURE_EL2_SYSREG( VTCR_EL2 ),
>> +	PURE_EL2_SYSREG( RVBAR_EL2 ),
>> +	PURE_EL2_SYSREG( RMR_EL2 ),
>> +	PURE_EL2_SYSREG( TPIDR_EL2 ),
>> +	PURE_EL2_SYSREG( CNTVOFF_EL2 ),
>> +	PURE_EL2_SYSREG( CNTHCTL_EL2 ),
>> +	PURE_EL2_SYSREG( HPFAR_EL2 ),
>> +	EL2_SYSREG(      SCTLR_EL2,  SCTLR_EL1,      translate_sctlr ),
>> +	EL2_SYSREG(      CPTR_EL2,   CPACR_EL1,      translate_cptr  ),
>> +	EL2_SYSREG(      TTBR0_EL2,  TTBR0_EL1,      translate_ttbr0 ),
>> +	EL2_SYSREG(      TTBR1_EL2,  TTBR1_EL1,      NULL            ),
>> +	EL2_SYSREG(      TCR_EL2,    TCR_EL1,        translate_tcr   ),
>> +	EL2_SYSREG(      VBAR_EL2,   VBAR_EL1,       NULL            ),
>> +	EL2_SYSREG(      AFSR0_EL2,  AFSR0_EL1,      NULL            ),
>> +	EL2_SYSREG(      AFSR1_EL2,  AFSR1_EL1,      NULL            ),
>> +	EL2_SYSREG(      ESR_EL2,    ESR_EL1,        NULL            ),
>> +	EL2_SYSREG(      FAR_EL2,    FAR_EL1,        NULL            ),
>> +	EL2_SYSREG(      MAIR_EL2,   MAIR_EL1,       NULL            ),
>> +	EL2_SYSREG(      AMAIR_EL2,  AMAIR_EL1,      NULL            ),
>> +};
>> +
>> +static
>> +const struct el2_sysreg_map *find_el2_sysreg(const struct el2_sysreg_map *map,
>> +					     int reg)
>> +{
>> +	const struct el2_sysreg_map *entry;
>> +
>> +	if (!sysreg_is_el2(reg))
>> +		return NULL;
>> +
>> +	entry = &nested_sysreg_map[reg - FIRST_EL2_SYSREG];
>> +	if (entry->sysreg == __INVALID_SYSREG__)
>> +		return NULL;
>> +
>> +	return entry;
>> +}
>> +
>>  u64 vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg)
>>  {
>> +
>>  	if (!vcpu->arch.sysregs_loaded_on_cpu)
>>  		goto immediate_read;
>>  
>> +	if (unlikely(sysreg_is_el2(reg))) {
>> +		const struct el2_sysreg_map *el2_reg;
>> +
>> +		if (!is_hyp_ctxt(vcpu))
>> +			goto immediate_read;
>> +
>> +		el2_reg = find_el2_sysreg(nested_sysreg_map, reg);
>> +		if (el2_reg) {
>> +			/*
>> +			 * If this register does not have an EL1 counterpart,
>> +			 * then read the stored EL2 version.
>> +			 */
>> +			if (el2_reg->mapping == __INVALID_SYSREG__)
>> +				goto immediate_read;
>> +
>> +			/* Get the current version of the EL1 counterpart. */
>> +			reg = el2_reg->mapping;
>> +		}
>> +	} else {
>> +		/* EL1 register can't be on the CPU if the guest is in vEL2. */
>> +		if (unlikely(is_hyp_ctxt(vcpu)))
>> +			goto immediate_read;
>> +	}
>> +
>>  	/*
>>  	 * System registers listed in the switch are not saved on every
>>  	 * exit from the guest but are only saved on vcpu_put.
>> @@ -114,6 +245,8 @@ u64 vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg)
>>  	case DACR32_EL2:	return read_sysreg_s(SYS_DACR32_EL2);
>>  	case IFSR32_EL2:	return read_sysreg_s(SYS_IFSR32_EL2);
>>  	case DBGVCR32_EL2:	return read_sysreg_s(SYS_DBGVCR32_EL2);
>> +	case SP_EL2:		return read_sysreg(sp_el1);
> From ARM DDI 0487D.b, section Behavior when HCR_EL2.NV == 1: "Reads or writes to
> any allocated and implemented System register or Special-purpose register named
> *_EL2, *_EL02, or *_EL12 in the MRS or MSR instruction, other than SP_EL2, are
> trapped to EL2 rather than being UNDEFINED" (page D5-2480). My interpretation of
> the text is that attempted reads of SP_EL2 from virtual EL2 cause an undefined
> instruction exception.

Sure. Nonetheless, the virtual EL2 has a stack pointer, accessible via
SP_EL1 when it is loaded on the CPU. Somehow, this gets dropped later in
the series (which is a bit wrong). I definitely should bring it back.
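
Roughly this on both paths, keeping the vEL2 stack pointer wired to the
physical SP_EL1 while the guest's EL2 context is loaded (a sketch — the
write side mirrors the read side quoted above):

	/* in vcpu_read_sys_reg() */
	case SP_EL2:		return read_sysreg(sp_el1);

	/* in vcpu_write_sys_reg() */
	case SP_EL2:		write_sysreg(val, sp_el1);	break;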

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH 15/59] KVM: arm64: nv: Refactor vcpu_{read,write}_sys_reg
  2019-06-27  9:21   ` Alexandru Elisei
@ 2019-07-04 15:15     ` Marc Zyngier
  0 siblings, 0 replies; 176+ messages in thread
From: Marc Zyngier @ 2019-07-04 15:15 UTC (permalink / raw)
  To: Alexandru Elisei, linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Dave Martin

On 27/06/2019 10:21, Alexandru Elisei wrote:
> On 6/21/19 10:37 AM, Marc Zyngier wrote:
>> Extract the direct HW accessors for later reuse.
>>
>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>> ---
>>  arch/arm64/kvm/sys_regs.c | 247 +++++++++++++++++++++-----------------
>>  1 file changed, 139 insertions(+), 108 deletions(-)
>>
>> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
>> index 2b8734f75a09..e181359adadf 100644
>> --- a/arch/arm64/kvm/sys_regs.c
>> +++ b/arch/arm64/kvm/sys_regs.c
>> @@ -182,99 +182,161 @@ const struct el2_sysreg_map *find_el2_sysreg(const struct el2_sysreg_map *map,
>>  	return entry;
>>  }
>>  
>> +static bool __vcpu_read_sys_reg_from_cpu(int reg, u64 *val)
>> +{
>> +	/*
>> +	 * System registers listed in the switch are not saved on every
>> +	 * exit from the guest but are only saved on vcpu_put.
>> +	 *
>> +	 * Note that MPIDR_EL1 for the guest is set by KVM via VMPIDR_EL2 but
>> +	 * should never be listed below, because the guest cannot modify its
>> +	 * own MPIDR_EL1 and MPIDR_EL1 is accessed for VCPU A from VCPU B's
>> +	 * thread when emulating cross-VCPU communication.
>> +	 */
>> +	switch (reg) {
>> +	case CSSELR_EL1:	*val = read_sysreg_s(SYS_CSSELR_EL1);	break;
>> +	case SCTLR_EL1:		*val = read_sysreg_s(SYS_SCTLR_EL12);	break;
>> +	case ACTLR_EL1:		*val = read_sysreg_s(SYS_ACTLR_EL1);	break;
>> +	case CPACR_EL1:		*val = read_sysreg_s(SYS_CPACR_EL12);	break;
>> +	case TTBR0_EL1:		*val = read_sysreg_s(SYS_TTBR0_EL12);	break;
>> +	case TTBR1_EL1:		*val = read_sysreg_s(SYS_TTBR1_EL12);	break;
>> +	case TCR_EL1:		*val = read_sysreg_s(SYS_TCR_EL12);	break;
>> +	case ESR_EL1:		*val = read_sysreg_s(SYS_ESR_EL12);	break;
>> +	case AFSR0_EL1:		*val = read_sysreg_s(SYS_AFSR0_EL12);	break;
>> +	case AFSR1_EL1:		*val = read_sysreg_s(SYS_AFSR1_EL12);	break;
>> +	case FAR_EL1:		*val = read_sysreg_s(SYS_FAR_EL12);	break;
>> +	case MAIR_EL1:		*val = read_sysreg_s(SYS_MAIR_EL12);	break;
>> +	case VBAR_EL1:		*val = read_sysreg_s(SYS_VBAR_EL12);	break;
>> +	case CONTEXTIDR_EL1:	*val = read_sysreg_s(SYS_CONTEXTIDR_EL12);break;
>> +	case TPIDR_EL0:		*val = read_sysreg_s(SYS_TPIDR_EL0);	break;
>> +	case TPIDRRO_EL0:	*val = read_sysreg_s(SYS_TPIDRRO_EL0);	break;
>> +	case TPIDR_EL1:		*val = read_sysreg_s(SYS_TPIDR_EL1);	break;
>> +	case AMAIR_EL1:		*val = read_sysreg_s(SYS_AMAIR_EL12);	break;
>> +	case CNTKCTL_EL1:	*val = read_sysreg_s(SYS_CNTKCTL_EL12);	break;
>> +	case PAR_EL1:		*val = read_sysreg_s(SYS_PAR_EL1);	break;
>> +	case DACR32_EL2:	*val = read_sysreg_s(SYS_DACR32_EL2);	break;
>> +	case IFSR32_EL2:	*val = read_sysreg_s(SYS_IFSR32_EL2);	break;
>> +	case DBGVCR32_EL2:	*val = read_sysreg_s(SYS_DBGVCR32_EL2);	break;
>> +	default:		return false;
>> +	}
>> +
>> +	return true;
>> +}
>> +
>> +static bool __vcpu_write_sys_reg_to_cpu(u64 val, int reg)
>> +{
>> +	/*
>> +	 * System registers listed in the switch are not restored on every
>> +	 * entry to the guest but are only restored on vcpu_load.
>> +	 *
>> +	 * Note that MPIDR_EL1 for the guest is set by KVM via VMPIDR_EL2 but
>> +	 * should never be listed below, because the MPIDR should only be
>> +	 * set once, before running the VCPU, and never changed later.
>> +	 */
>> +	switch (reg) {
>> +	case CSSELR_EL1:	write_sysreg_s(val, SYS_CSSELR_EL1);	break;
>> +	case SCTLR_EL1:		write_sysreg_s(val, SYS_SCTLR_EL12);	break;
>> +	case ACTLR_EL1:		write_sysreg_s(val, SYS_ACTLR_EL1);	break;
>> +	case CPACR_EL1:		write_sysreg_s(val, SYS_CPACR_EL12);	break;
>> +	case TTBR0_EL1:		write_sysreg_s(val, SYS_TTBR0_EL12);	break;
>> +	case TTBR1_EL1:		write_sysreg_s(val, SYS_TTBR1_EL12);	break;
>> +	case TCR_EL1:		write_sysreg_s(val, SYS_TCR_EL12);	break;
>> +	case ESR_EL1:		write_sysreg_s(val, SYS_ESR_EL12);	break;
>> +	case AFSR0_EL1:		write_sysreg_s(val, SYS_AFSR0_EL12);	break;
>> +	case AFSR1_EL1:		write_sysreg_s(val, SYS_AFSR1_EL12);	break;
>> +	case FAR_EL1:		write_sysreg_s(val, SYS_FAR_EL12);	break;
>> +	case MAIR_EL1:		write_sysreg_s(val, SYS_MAIR_EL12);	break;
>> +	case VBAR_EL1:		write_sysreg_s(val, SYS_VBAR_EL12);	break;
>> +	case CONTEXTIDR_EL1:	write_sysreg_s(val, SYS_CONTEXTIDR_EL12);break;
>> +	case TPIDR_EL0:		write_sysreg_s(val, SYS_TPIDR_EL0);	break;
>> +	case TPIDRRO_EL0:	write_sysreg_s(val, SYS_TPIDRRO_EL0);	break;
>> +	case TPIDR_EL1:		write_sysreg_s(val, SYS_TPIDR_EL1);	break;
>> +	case AMAIR_EL1:		write_sysreg_s(val, SYS_AMAIR_EL12);	break;
>> +	case CNTKCTL_EL1:	write_sysreg_s(val, SYS_CNTKCTL_EL12);	break;
>> +	case PAR_EL1:		write_sysreg_s(val, SYS_PAR_EL1);	break;
>> +	case DACR32_EL2:	write_sysreg_s(val, SYS_DACR32_EL2);	break;
>> +	case IFSR32_EL2:	write_sysreg_s(val, SYS_IFSR32_EL2);	break;
>> +	case DBGVCR32_EL2:	write_sysreg_s(val, SYS_DBGVCR32_EL2);	break;
>> +	default:		return false;
>> +	}
>> +
>> +	return true;
>> +}
>> +
>>  u64 vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg)
>>  {
>> -	u64 val;
>> +	u64 val = 0x8badf00d8badf00d;
>>  
>>  	if (!vcpu->arch.sysregs_loaded_on_cpu)
>> -		goto immediate_read;
>> +		goto memory_read;
>>  
>>  	if (unlikely(sysreg_is_el2(reg))) {
>>  		const struct el2_sysreg_map *el2_reg;
>>  
>>  		if (!is_hyp_ctxt(vcpu))
>> -			goto immediate_read;
>> +			goto memory_read;
>>  
>>  		switch (reg) {
>> +		case ELR_EL2:
>> +			return read_sysreg_el1(SYS_ELR);
>>  		case SPSR_EL2:
>>  			val = read_sysreg_el1(SYS_SPSR);
>>  			return __fixup_spsr_el2_read(&vcpu->arch.ctxt, val);
>>  		}
>>  
>>  		el2_reg = find_el2_sysreg(nested_sysreg_map, reg);
>> -		if (el2_reg) {
>> -			/*
>> -			 * If this register does not have an EL1 counterpart,
>> -			 * then read the stored EL2 version.
>> -			 */
>> -			if (el2_reg->mapping == __INVALID_SYSREG__)
>> -				goto immediate_read;
>> -
>> -			/* Get the current version of the EL1 counterpart. */
>> -			reg = el2_reg->mapping;
>> -		}
>> -	} else {
>> -		/* EL1 register can't be on the CPU if the guest is in vEL2. */
>> -		if (unlikely(is_hyp_ctxt(vcpu)))
>> -			goto immediate_read;
>> +		BUG_ON(!el2_reg);
>> +
>> +		/*
>> +		 * If this register does not have an EL1 counterpart,
>> +		 * then read the stored EL2 version.
>> +		 */
>> +		if (el2_reg->mapping == __INVALID_SYSREG__)
>> +			goto memory_read;
>> +
>> +		if (!vcpu_el2_e2h_is_set(vcpu) &&
>> +		    el2_reg->translate)
>> +			goto memory_read;
> 
> Nit: the condition can be written on one line.
> 
> This condition wasn't present in patch 13 which introduced EL2 register
> handling, and I'm struggling to understand what it does. As I understand the
> code, this condition basically translates into:
> 
> - if the register is one of SCTLR_EL2, TTBR0_EL2, CPTR_EL2 or TCR_EL2, then read
> it from memory.
> 
> - if the register is an EL2 register whose value is written unmodified to the
> corresponding EL1 register, then read the corresponding EL1 register and return
> that value.
> 
> Looking at vcpu_write_sys_reg, the values for the EL2 registers are always saved
> in memory. The guest is a non-vhe guest, so writes to EL1 registers shouldn't be
> reflected in the corresponding EL2 register. I think it's safe to always return
> the value from memory.
> 
> I tried testing this with the following patch:
> 
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 1235a88ec575..27d39bb9564d 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -290,6 +290,9 @@ u64 vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg)
>                 el2_reg = find_el2_sysreg(nested_sysreg_map, reg);
>                 BUG_ON(!el2_reg);
>  
> +               if (!vcpu_el2_e2h_is_set(vcpu))
> +                       goto memory_read;
> +
>                 /*
>                  * If this register does not have an EL1 counterpart,
>                  * then read the stored EL2 version.
> @@ -297,10 +300,6 @@ u64 vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg)
>                 if (el2_reg->mapping == __INVALID_SYSREG__)
>                         goto memory_read;
>  
> -               if (!vcpu_el2_e2h_is_set(vcpu) &&
> -                   el2_reg->translate)
> -                       goto memory_read;
> -
>                 /* Get the current version of the EL1 counterpart. */
>                 reg = el2_reg->mapping;
>                 WARN_ON(!__vcpu_read_sys_reg_from_cpu(reg, &val));
> 
> I know it's not conclusive, but I was able to boot a L2 guest under a L1 non-vhe
> hypervisor.

And now you can't properly handle the terrible ARMv8.3 business of
SPSR_EL1 being changed behind your back if you get an exception from vEL2
to vEL2 on non-VHE. To handle this, you need both the live system
register and the memory backup (see __fixup_spsr_el2_read and co).

More generally, some registers can be modified behind your back. That's
ELR, SPSR, FAR, ESR.  Anything related to taking an exception. No, this
can't be observed with KVM because we don't allow exceptions to be taken
at EL2 in the absence of RAS errors.
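
Concretely, that's why the SPSR_EL2 read path merges the live register
with the memory-backed state instead of trusting either alone:

	case SPSR_EL2:
		/* live copy, possibly rewritten by an exception entry */
		val = read_sysreg_el1(SYS_SPSR);
		/* reconcile with the saved vEL2 view */
		return __fixup_spsr_el2_read(&vcpu->arch.ctxt, val);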

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH 35/59] KVM: arm/arm64: nv: Support multiple nested stage 2 mmu structures
  2019-06-21  9:38 ` [PATCH 35/59] KVM: arm/arm64: nv: Support multiple nested stage 2 mmu structures Marc Zyngier
  2019-06-25 12:19   ` Alexandru Elisei
  2019-06-27 13:15   ` Julien Thierry
@ 2019-07-04 15:51   ` Alexandru Elisei
  2020-01-05 11:35     ` Marc Zyngier
  2 siblings, 1 reply; 176+ messages in thread
From: Alexandru Elisei @ 2019-07-04 15:51 UTC (permalink / raw)
  To: Marc Zyngier, linux-arm-kernel, kvmarm, kvm; +Cc: Andre Przywara, Dave Martin

On 6/21/19 10:38 AM, Marc Zyngier wrote:

> From: Christoffer Dall <christoffer.dall@arm.com>
>
> Add stage 2 mmu data structures for virtual EL2 and for nested guests.
> We don't yet populate shadow stage 2 page tables, but we now have a
> framework for getting to a shadow stage 2 pgd.
>
> We allocate twice the number of vcpus as stage 2 mmu structures because
> that's sufficient for each vcpu running two VMs without having to flush
> the stage 2 page tables.
>
> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  arch/arm/include/asm/kvm_host.h     |   4 +
>  arch/arm/include/asm/kvm_mmu.h      |   3 +
>  arch/arm64/include/asm/kvm_host.h   |  28 +++++
>  arch/arm64/include/asm/kvm_mmu.h    |   8 ++
>  arch/arm64/include/asm/kvm_nested.h |   7 ++
>  arch/arm64/kvm/nested.c             | 172 ++++++++++++++++++++++++++++
>  virt/kvm/arm/arm.c                  |  16 ++-
>  virt/kvm/arm/mmu.c                  |  31 ++---
>  8 files changed, 254 insertions(+), 15 deletions(-)
>
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index e3217c4ad25b..b821eb2383ad 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -424,4 +424,8 @@ static inline bool kvm_arm_vcpu_is_finalized(struct kvm_vcpu *vcpu)
>  	return true;
>  }
>  
> +static inline void kvm_vcpu_load_hw_mmu(struct kvm_vcpu *vcpu) {}
> +static inline void kvm_vcpu_put_hw_mmu(struct kvm_vcpu *vcpu) {}
> +static inline int kvm_vcpu_init_nested(struct kvm_vcpu *vcpu) { return 0; }
> +
>  #endif /* __ARM_KVM_HOST_H__ */
> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
> index be23e3f8e08c..e6984b6da2ce 100644
> --- a/arch/arm/include/asm/kvm_mmu.h
> +++ b/arch/arm/include/asm/kvm_mmu.h
> @@ -420,6 +420,9 @@ static inline int hyp_map_aux_data(void)
>  
>  static inline void kvm_set_ipa_limit(void) {}
>  
> +static inline void kvm_init_s2_mmu(struct kvm_s2_mmu *mmu) {}
> +static inline void kvm_init_nested(struct kvm *kvm) {}
> +
>  static __always_inline u64 kvm_get_vttbr(struct kvm_s2_mmu *mmu)
>  {
>  	struct kvm_vmid *vmid = &mmu->vmid;
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 3dee5e17a4ee..cc238de170d2 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -88,11 +88,39 @@ struct kvm_s2_mmu {
>  	phys_addr_t	pgd_phys;
>  
>  	struct kvm *kvm;
> +
> +	/*
> +	 * For a shadow stage-2 MMU, the virtual vttbr programmed by the guest
> +	 * hypervisor.  Unused for kvm_arch->mmu. Set to 1 when the structure
> +	 * contains no valid information.
> +	 */
> +	u64	vttbr;
> +
> +	/* true when this represents a nested context where virtual HCR_EL2.VM == 1 */
> +	bool	nested_stage2_enabled;
> +
> +	/*
> +	 *  0: Nobody is currently using this, check vttbr for validity
> +	 * >0: Somebody is actively using this.
> +	 */
> +	atomic_t refcnt;
>  };
>  
> +static inline bool kvm_s2_mmu_valid(struct kvm_s2_mmu *mmu)
> +{
> +	return !(mmu->vttbr & 1);
> +}
> +
>  struct kvm_arch {
>  	struct kvm_s2_mmu mmu;
>  
> +	/*
> +	 * Stage 2 paging stage for VMs with nested virtual using a virtual
> +	 * VMID.
> +	 */
> +	struct kvm_s2_mmu *nested_mmus;
> +	size_t nested_mmus_size;
> +
>  	/* VTCR_EL2 value for this VM */
>  	u64    vtcr;
>  
> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> index 1eb6e0ca61c2..32bcaa1845dc 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -100,6 +100,7 @@ alternative_cb_end
>  #include <asm/mmu_context.h>
>  #include <asm/pgtable.h>
>  #include <asm/kvm_emulate.h>
> +#include <asm/kvm_nested.h>
>  
>  void kvm_update_va_mask(struct alt_instr *alt,
>  			__le32 *origptr, __le32 *updptr, int nr_inst);
> @@ -164,6 +165,7 @@ int create_hyp_exec_mappings(phys_addr_t phys_addr, size_t size,
>  			     void **haddr);
>  void free_hyp_pgds(void);
>  
> +void kvm_unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size);
>  void stage2_unmap_vm(struct kvm *kvm);
>  int kvm_alloc_stage2_pgd(struct kvm_s2_mmu *mmu);
>  void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu);
> @@ -635,5 +637,11 @@ static __always_inline void __load_guest_stage2(struct kvm_s2_mmu *mmu)
>  	asm(ALTERNATIVE("nop", "isb", ARM64_WORKAROUND_1165522));
>  }
>  
> +static inline u64 get_vmid(u64 vttbr)
> +{
> +	return (vttbr & VTTBR_VMID_MASK(kvm_get_vmid_bits())) >>
> +		VTTBR_VMID_SHIFT;
> +}
> +
>  #endif /* __ASSEMBLY__ */
>  #endif /* __ARM64_KVM_MMU_H__ */
> diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
> index 61e71d0d2151..d4021d0892bd 100644
> --- a/arch/arm64/include/asm/kvm_nested.h
> +++ b/arch/arm64/include/asm/kvm_nested.h
> @@ -10,6 +10,13 @@ static inline bool nested_virt_in_use(const struct kvm_vcpu *vcpu)
>  		test_bit(KVM_ARM_VCPU_NESTED_VIRT, vcpu->arch.features);
>  }
>  
> +extern void kvm_init_nested(struct kvm *kvm);
> +extern int kvm_vcpu_init_nested(struct kvm_vcpu *vcpu);
> +extern void kvm_init_s2_mmu(struct kvm_s2_mmu *mmu);
> +extern struct kvm_s2_mmu *lookup_s2_mmu(struct kvm *kvm, u64 vttbr, u64 hcr);
> +extern void kvm_vcpu_load_hw_mmu(struct kvm_vcpu *vcpu);
> +extern void kvm_vcpu_put_hw_mmu(struct kvm_vcpu *vcpu);
> +
>  int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe);
>  extern bool forward_traps(struct kvm_vcpu *vcpu, u64 control_bit);
>  extern bool forward_nv_traps(struct kvm_vcpu *vcpu);
> diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
> index 3872e3cf1691..4b38dc5c0be3 100644
> --- a/arch/arm64/kvm/nested.c
> +++ b/arch/arm64/kvm/nested.c
> @@ -18,7 +18,161 @@
>  #include <linux/kvm.h>
>  #include <linux/kvm_host.h>
>  
> +#include <asm/kvm_arm.h>
>  #include <asm/kvm_emulate.h>
> +#include <asm/kvm_mmu.h>
> +#include <asm/kvm_nested.h>
> +
> +void kvm_init_nested(struct kvm *kvm)
> +{
> +	kvm_init_s2_mmu(&kvm->arch.mmu);
> +
> +	kvm->arch.nested_mmus = NULL;
> +	kvm->arch.nested_mmus_size = 0;
> +}
> +
> +int kvm_vcpu_init_nested(struct kvm_vcpu *vcpu)
> +{
> +	struct kvm *kvm = vcpu->kvm;
> +	struct kvm_s2_mmu *tmp;
> +	int num_mmus;
> +	int ret = -ENOMEM;
> +
> +	if (!test_bit(KVM_ARM_VCPU_NESTED_VIRT, vcpu->arch.features))
> +		return 0;
> +
> +	if (!cpus_have_const_cap(ARM64_HAS_NESTED_VIRT))
> +		return -EINVAL;
> +
> +	mutex_lock(&kvm->lock);
> +
> +	num_mmus = atomic_read(&kvm->online_vcpus) * 2;
> +	tmp = __krealloc(kvm->arch.nested_mmus,
> +			 num_mmus * sizeof(*kvm->arch.nested_mmus),
> +			 GFP_KERNEL | __GFP_ZERO);
> +
> +	if (tmp) {
> +		if (tmp != kvm->arch.nested_mmus)
> +			kfree(kvm->arch.nested_mmus);

Here we are freeing a resource that is shared between all virtual CPUs. So if
kvm_alloc_stage2_pgd (below) fails, we could end up freeing something that has
been initialized by previous vcpu(s). Same thing happens if tmp ==
kvm->arch.nested_mmus (because of kfree(tmp)).

Looking at Documentation/virtual/kvm/api.txt (section 4.82, KVM_ARM_VCPU_INIT):

Userspace can call this function multiple times for a given vcpu, including
after the vcpu has been run. This will reset the vcpu to its initial
state.

It seems to me that it is allowed for userspace to try to call
KVM_ARM_VCPU_INIT again after it failed. I was wondering if it's possible to
end up in a situation where the second call succeeds, but kvm->arch.nested_mmus
is not initialized properly and if we should care about that.
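
One way out would be to build the new array on the side and only commit
it once everything has succeeded — an untested sketch:

	tmp = kcalloc(num_mmus, sizeof(*tmp), GFP_KERNEL);
	if (!tmp)
		goto out_unlock;

	tmp[num_mmus - 1].kvm = kvm;
	atomic_set(&tmp[num_mmus - 1].refcnt, 0);
	ret = kvm_alloc_stage2_pgd(&tmp[num_mmus - 1]);
	if (ret) {
		kfree(tmp);	/* the old array is left untouched */
		goto out_unlock;
	}
	/* ... same dance for tmp[num_mmus - 2] ... */

	memcpy(tmp, kvm->arch.nested_mmus,
	       kvm->arch.nested_mmus_size * sizeof(*tmp));
	kfree(kvm->arch.nested_mmus);
	kvm->arch.nested_mmus = tmp;
	kvm->arch.nested_mmus_size = num_mmus;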

> +
> +		tmp[num_mmus - 1].kvm = kvm;
> +		atomic_set(&tmp[num_mmus - 1].refcnt, 0);
> +		ret = kvm_alloc_stage2_pgd(&tmp[num_mmus - 1]);
> +		if (ret)
> +			goto out;
> +
> +		tmp[num_mmus - 2].kvm = kvm;
> +		atomic_set(&tmp[num_mmus - 2].refcnt, 0);
> +		ret = kvm_alloc_stage2_pgd(&tmp[num_mmus - 2]);
> +		if (ret) {
> +			kvm_free_stage2_pgd(&tmp[num_mmus - 1]);
> +			goto out;
> +		}
> +
> +		kvm->arch.nested_mmus_size = num_mmus;
> +		kvm->arch.nested_mmus = tmp;
> +		tmp = NULL;
> +	}
> +
> +out:
> +	kfree(tmp);
> +	mutex_unlock(&kvm->lock);
> +	return ret;
> +}
> +
> +/* Must be called with kvm->lock held */
> +struct kvm_s2_mmu *lookup_s2_mmu(struct kvm *kvm, u64 vttbr, u64 hcr)
> +{
> +	bool nested_stage2_enabled = hcr & HCR_VM;
> +	int i;
> +
> +	/* Don't consider the CnP bit for the vttbr match */
> +	vttbr = vttbr & ~1UL;

There's a define for that bit, VTTBR_CNP_BIT in kvm_arm.h (which this file
includes).
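
i.e. simply:

	vttbr &= ~VTTBR_CNP_BIT;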

> +
> +	/* Search a mmu in the list using the virtual VMID as a key */

I think when the guest has stage 2 enabled we also match on VTTBR_EL2.VMID +
VTTBR_EL2.BADDR.

> +	for (i = 0; i < kvm->arch.nested_mmus_size; i++) {
> +		struct kvm_s2_mmu *mmu = &kvm->arch.nested_mmus[i];
> +
> +		if (!kvm_s2_mmu_valid(mmu))
> +			continue;
> +
> +		if (nested_stage2_enabled &&
> +		    mmu->nested_stage2_enabled &&
> +		    vttbr == mmu->vttbr)
> +			return mmu;
> +
> +		if (!nested_stage2_enabled &&
> +		    !mmu->nested_stage2_enabled &&
> +		    get_vmid(vttbr) == get_vmid(mmu->vttbr))
> +			return mmu;

I'm struggling to understand why we do this. As far as I can tell, this applies
only to non-vhe guest hypervisors, because that's the only situation where we
run something at vEL1 without shadow stage 2 enabled. There's only one MMU that
matches that guest, because there's only one host for the non-vhe hypervisor. So
why are we matching on vmid only, instead of matching on the entire vttbr? Is it
because we don't want to get fooled by a naughty guest that changes the BADDR
each time it switches to its EL1 host? Or something else entirely that I'm missing?

> +	}
> +	return NULL;
> +}
> +
> +static struct kvm_s2_mmu *get_s2_mmu_nested(struct kvm_vcpu *vcpu)
> +{
> +	struct kvm *kvm = vcpu->kvm;
> +	u64 vttbr = vcpu_read_sys_reg(vcpu, VTTBR_EL2);
> +	u64 hcr = vcpu_read_sys_reg(vcpu, HCR_EL2);
> +	struct kvm_s2_mmu *s2_mmu;
> +	int i;
> +
> +	s2_mmu = lookup_s2_mmu(kvm, vttbr, hcr);
> +	if (s2_mmu)
> +		goto out;
> +
> +	for (i = 0; i < kvm->arch.nested_mmus_size; i++) {
> +		s2_mmu = &kvm->arch.nested_mmus[i];
> +
> +		if (atomic_read(&s2_mmu->refcnt) == 0)
> +			break;
> +	}
> +	BUG_ON(atomic_read(&s2_mmu->refcnt)); /* We have struct MMUs to spare */
> +
> +	if (kvm_s2_mmu_valid(s2_mmu)) {
> +		/* Clear the old state */
> +		kvm_unmap_stage2_range(s2_mmu, 0, kvm_phys_size(kvm));
> +		if (s2_mmu->vmid.vmid_gen)
> +			kvm_call_hyp(__kvm_tlb_flush_vmid, s2_mmu);
> +	}
> +
> +	/*
> +	 * The virtual VMID (modulo CnP) will be used as a key when matching
> +	 * an existing kvm_s2_mmu.
> +	 */
> +	s2_mmu->vttbr = vttbr & ~1UL;
> +	s2_mmu->nested_stage2_enabled = hcr & HCR_VM;
> +
> +out:
> +	atomic_inc(&s2_mmu->refcnt);
> +	return s2_mmu;
> +}
> +
> +void kvm_init_s2_mmu(struct kvm_s2_mmu *mmu)
> +{
> +	mmu->vttbr = 1;
> +	mmu->nested_stage2_enabled = false;
> +	atomic_set(&mmu->refcnt, 0);
> +}
> +
> +void kvm_vcpu_load_hw_mmu(struct kvm_vcpu *vcpu)
> +{
> +	if (is_hyp_ctxt(vcpu)) {

If userspace has set the nested feature, but hasn't set the vcpu mode to
PSR_MODE_EL2h/PSR_MODE_EL2t, we will never use kvm->arch.mmu, and instead we'll
always take the mmu_lock and search for the mmu. Is that something we should
care about?

> +		vcpu->arch.hw_mmu = &vcpu->kvm->arch.mmu;
> +	} else {
> +		spin_lock(&vcpu->kvm->mmu_lock);
> +		vcpu->arch.hw_mmu = get_s2_mmu_nested(vcpu);
> +		spin_unlock(&vcpu->kvm->mmu_lock);
> +	}
> +}
> +
> +void kvm_vcpu_put_hw_mmu(struct kvm_vcpu *vcpu)
> +{
> +	if (vcpu->arch.hw_mmu != &vcpu->kvm->arch.mmu) {
> +		atomic_dec(&vcpu->arch.hw_mmu->refcnt);
> +		vcpu->arch.hw_mmu = NULL;
> +	}
> +}
>  
>  /*
>   * Inject wfx to the virtual EL2 if this is not from the virtual EL2 and
> @@ -37,3 +191,21 @@ int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe)
>  
>  	return -EINVAL;
>  }
> +
> +void kvm_arch_flush_shadow_all(struct kvm *kvm)
> +{
> +	int i;
> +
> +	for (i = 0; i < kvm->arch.nested_mmus_size; i++) {
> +		struct kvm_s2_mmu *mmu = &kvm->arch.nested_mmus[i];
> +
> +		WARN_ON(atomic_read(&mmu->refcnt));
> +
> +		if (!atomic_read(&mmu->refcnt))
> +			kvm_free_stage2_pgd(mmu);
> +	}
> +	kfree(kvm->arch.nested_mmus);
> +	kvm->arch.nested_mmus = NULL;
> +	kvm->arch.nested_mmus_size = 0;
> +	kvm_free_stage2_pgd(&kvm->arch.mmu);
> +}
> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> index 5d4371633e1c..4e3cbfa1ecbe 100644
> --- a/virt/kvm/arm/arm.c
> +++ b/virt/kvm/arm/arm.c
> @@ -36,6 +36,7 @@
>  #include <asm/kvm_arm.h>
>  #include <asm/kvm_asm.h>
>  #include <asm/kvm_mmu.h>
> +#include <asm/kvm_nested.h>
>  #include <asm/kvm_emulate.h>
>  #include <asm/kvm_coproc.h>
>  #include <asm/sections.h>
> @@ -126,6 +127,8 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>  	kvm->arch.mmu.vmid.vmid_gen = 0;
>  	kvm->arch.mmu.kvm = kvm;
>  
> +	kvm_init_nested(kvm);

More context:

@@ -120,18 +121,20 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 
     ret = kvm_alloc_stage2_pgd(&kvm->arch.mmu);
     if (ret)
         goto out_fail_alloc;
 
     /* Mark the initial VMID generation invalid */
     kvm->arch.mmu.vmid.vmid_gen = 0;
     kvm->arch.mmu.kvm = kvm;
 
+    kvm_init_nested(kvm);
+

kvm_alloc_stage2_pgd ends up calling kvm_init_s2_mmu for kvm->arch.mmu.
kvm_init_nested does the same thing, and we end up calling kvm_init_s2_mmu for
kvm->arch.mmu twice. There's really no harm in doing it twice, but I thought I
could mention it because it looks to me that the fix is simple: remove the
kvm_init_s2_mmu call from kvm_init_nested. The double initialization is also
true for the tip of the series.
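
That is, kvm_init_nested() would shrink to (sketch):

void kvm_init_nested(struct kvm *kvm)
{
	kvm->arch.nested_mmus = NULL;
	kvm->arch.nested_mmus_size = 0;
}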

Thanks,

Alex

> +
>  	ret = create_hyp_mappings(kvm, kvm + 1, PAGE_HYP);
>  	if (ret)
>  		goto out_free_stage2_pgd;
> @@ -353,6 +356,9 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>  	int *last_ran;
>  	kvm_host_data_t *cpu_data;
>  
> +	if (nested_virt_in_use(vcpu))
> +		kvm_vcpu_load_hw_mmu(vcpu);
> +
>  	last_ran = this_cpu_ptr(vcpu->kvm->arch.last_vcpu_ran);
>  	cpu_data = this_cpu_ptr(&kvm_host_data);
>  
> @@ -391,6 +397,9 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>  	kvm_vgic_put(vcpu);
>  	kvm_vcpu_pmu_restore_host(vcpu);
>  
> +	if (nested_virt_in_use(vcpu))
> +		kvm_vcpu_put_hw_mmu(vcpu);
> +
>  	vcpu->cpu = -1;
>  
>  	kvm_arm_set_running_vcpu(NULL);
> @@ -968,8 +977,13 @@ static int kvm_vcpu_set_target(struct kvm_vcpu *vcpu,
>  
>  	vcpu->arch.target = phys_target;
>  
> +	/* Prepare for nested if required */
> +	ret = kvm_vcpu_init_nested(vcpu);
> +
>  	/* Now we know what it is, we can reset it. */
> -	ret = kvm_reset_vcpu(vcpu);
> +	if (!ret)
> +		ret = kvm_reset_vcpu(vcpu);
> +
>  	if (ret) {
>  		vcpu->arch.target = -1;
>  		bitmap_zero(vcpu->arch.features, KVM_VCPU_MAX_FEATURES);
> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> index bb1be4ea55ec..faa61a81c8cc 100644
> --- a/virt/kvm/arm/mmu.c
> +++ b/virt/kvm/arm/mmu.c
> @@ -325,7 +325,7 @@ static void unmap_stage2_puds(struct kvm_s2_mmu *mmu, pgd_t *pgd,
>  }
>  
>  /**
> - * unmap_stage2_range -- Clear stage2 page table entries to unmap a range
> + * kvm_unmap_stage2_range -- Clear stage2 page table entries to unmap a range
>   * @kvm:   The VM pointer
>   * @start: The intermediate physical base address of the range to unmap
>   * @size:  The size of the area to unmap
> @@ -335,7 +335,7 @@ static void unmap_stage2_puds(struct kvm_s2_mmu *mmu, pgd_t *pgd,
>   * destroying the VM), otherwise another faulting VCPU may come in and mess
>   * with things behind our backs.
>   */
> -static void unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size)
> +void kvm_unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size)
>  {
>  	struct kvm *kvm = mmu->kvm;
>  	pgd_t *pgd;
> @@ -924,6 +924,10 @@ int kvm_alloc_stage2_pgd(struct kvm_s2_mmu *mmu)
>  
>  	mmu->pgd = pgd;
>  	mmu->pgd_phys = pgd_phys;
> +	mmu->vmid.vmid_gen = 0;
> +
> +	kvm_init_s2_mmu(mmu);
> +
>  	return 0;
>  }
>  
> @@ -962,7 +966,7 @@ static void stage2_unmap_memslot(struct kvm *kvm,
>  
>  		if (!(vma->vm_flags & VM_PFNMAP)) {
>  			gpa_t gpa = addr + (vm_start - memslot->userspace_addr);
> -			unmap_stage2_range(&kvm->arch.mmu, gpa, vm_end - vm_start);
> +			kvm_unmap_stage2_range(&kvm->arch.mmu, gpa, vm_end - vm_start);
>  		}
>  		hva = vm_end;
>  	} while (hva < reg_end);
> @@ -1001,7 +1005,7 @@ void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu)
>  
>  	spin_lock(&kvm->mmu_lock);
>  	if (mmu->pgd) {
> -		unmap_stage2_range(mmu, 0, kvm_phys_size(kvm));
> +		kvm_unmap_stage2_range(mmu, 0, kvm_phys_size(kvm));
>  		pgd = READ_ONCE(mmu->pgd);
>  		mmu->pgd = NULL;
>  	}
> @@ -1093,7 +1097,7 @@ static int stage2_set_pmd_huge(struct kvm_s2_mmu *mmu,
>  		 * get handled accordingly.
>  		 */
>  		if (!pmd_thp_or_huge(old_pmd)) {
> -			unmap_stage2_range(mmu, addr & S2_PMD_MASK, S2_PMD_SIZE);
> +			kvm_unmap_stage2_range(mmu, addr & S2_PMD_MASK, S2_PMD_SIZE);
>  			goto retry;
>  		}
>  		/*
> @@ -1145,7 +1149,7 @@ static int stage2_set_pud_huge(struct kvm_s2_mmu *mmu,
>  		 * the range for this block and retry.
>  		 */
>  		if (!stage2_pud_huge(kvm, old_pud)) {
> -			unmap_stage2_range(mmu, addr & S2_PUD_MASK, S2_PUD_SIZE);
> +			kvm_unmap_stage2_range(mmu, addr & S2_PUD_MASK, S2_PUD_SIZE);
>  			goto retry;
>  		}
>  
> @@ -2047,7 +2051,7 @@ static int handle_hva_to_gpa(struct kvm *kvm,
>  
>  static int kvm_unmap_hva_handler(struct kvm *kvm, gpa_t gpa, u64 size, void *data)
>  {
> -	unmap_stage2_range(&kvm->arch.mmu, gpa, size);
> +	kvm_unmap_stage2_range(&kvm->arch.mmu, gpa, size);
>  	return 0;
>  }
>  
> @@ -2360,7 +2364,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
>  
>  	spin_lock(&kvm->mmu_lock);
>  	if (ret)
> -		unmap_stage2_range(&kvm->arch.mmu, mem->guest_phys_addr, mem->memory_size);
> +		kvm_unmap_stage2_range(&kvm->arch.mmu, mem->guest_phys_addr, mem->memory_size);
>  	else
>  		stage2_flush_memslot(&kvm->arch.mmu, memslot);
>  	spin_unlock(&kvm->mmu_lock);
> @@ -2384,11 +2388,6 @@ void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen)
>  {
>  }
>  
> -void kvm_arch_flush_shadow_all(struct kvm *kvm)
> -{
> -	kvm_free_stage2_pgd(&kvm->arch.mmu);
> -}
> -
>  void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
>  				   struct kvm_memory_slot *slot)
>  {
> @@ -2396,7 +2395,7 @@ void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
>  	phys_addr_t size = slot->npages << PAGE_SHIFT;
>  
>  	spin_lock(&kvm->mmu_lock);
> -	unmap_stage2_range(&kvm->arch.mmu, gpa, size);
> +	kvm_unmap_stage2_range(&kvm->arch.mmu, gpa, size);
>  	spin_unlock(&kvm->mmu_lock);
>  }
>  
> @@ -2467,3 +2466,7 @@ void kvm_toggle_cache(struct kvm_vcpu *vcpu, bool was_enabled)
>  
>  	trace_kvm_toggle_cache(*vcpu_pc(vcpu), was_enabled, now_enabled);
>  }
> +
> +__weak void kvm_arch_flush_shadow_all(struct kvm *kvm)
> +{
> +}

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH 37/59] KVM: arm64: nv: Handle shadow stage 2 page faults
  2019-06-21  9:38 ` [PATCH 37/59] KVM: arm64: nv: Handle shadow stage 2 page faults Marc Zyngier
@ 2019-07-05 14:28   ` Alexandru Elisei
  0 siblings, 0 replies; 176+ messages in thread
From: Alexandru Elisei @ 2019-07-05 14:28 UTC (permalink / raw)
  To: Marc Zyngier, linux-arm-kernel, kvmarm, kvm; +Cc: Andre Przywara, Dave Martin

On 6/21/19 10:38 AM, Marc Zyngier wrote:
> From: Christoffer Dall <christoffer.dall@linaro.org>
>
> If we are faulting on a shadow stage 2 translation, we first walk the
> guest hypervisor's stage 2 page table to see if it has a mapping. If
> not, we inject a stage 2 page fault to the virtual EL2. Otherwise, we
> create a mapping in the shadow stage 2 page table.
>
> Note that we have to deal with two IPAs when we got a showdow stage 2

I think it should be "shadow", not "showdow".

> page fault. One is the address we faulted on, and is in the L2 guest
> phys space. The other is from the guest stage-2 page table walk, and is
> in the L1 guest phys space.  To differentiate them, we rename variable
> names so that fault_ipa is used for the former and ipa is used for the

How about "To differentiate them, we rename variables so that [..]"?

> latter.
>
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  arch/arm/include/asm/kvm_mmu.h       | 52 +++++++++++++++
>  arch/arm64/include/asm/kvm_emulate.h |  6 ++
>  arch/arm64/include/asm/kvm_nested.h  | 20 +++++-
>  arch/arm64/kvm/nested.c              | 41 ++++++++++++
>  virt/kvm/arm/mmio.c                  | 12 ++--
>  virt/kvm/arm/mmu.c                   | 99 ++++++++++++++++++++++------
>  6 files changed, 203 insertions(+), 27 deletions(-)
>
> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
> index e6984b6da2ce..afabf1fd1d17 100644
> --- a/arch/arm/include/asm/kvm_mmu.h
> +++ b/arch/arm/include/asm/kvm_mmu.h
> @@ -423,6 +423,58 @@ static inline void kvm_set_ipa_limit(void) {}
>  static inline void kvm_init_s2_mmu(struct kvm_s2_mmu *mmu) {}
>  static inline void kvm_init_nested(struct kvm *kvm) {}
>  
> +struct kvm_s2_trans {};
> +static inline phys_addr_t kvm_s2_trans_output(struct kvm_s2_trans *trans)
> +{
> +	BUG();
> +}
> +
> +static inline unsigned long kvm_s2_trans_size(struct kvm_s2_trans *trans)
> +{
> +	BUG();
> +}
> +
> +static inline u32 kvm_s2_trans_esr(struct kvm_s2_trans *trans)
> +{
> +	BUG();
> +}
> +
> +static inline int kvm_walk_nested_s2(struct kvm_vcpu *vcpu, phys_addr_t ipa,
> +				     struct kvm_s2_trans *trans)
> +{
> +	BUG();
> +}
> +
> +static inline int kvm_s2_handle_perm_fault(struct kvm_vcpu *vcpu,
> +					   struct kvm_s2_trans *trans)
> +{
> +	BUG();
> +}
> +
> +static inline void kvm_inject_s2_fault(struct kvm_vcpu *vcpu, u32 esr)
> +{
> +	BUG();
> +}
> +
> +static inline bool kvm_s2_trans_readable(struct kvm_s2_trans *trans)
> +{
> +	BUG();
> +}
> +
> +static inline bool kvm_s2_trans_writable(struct kvm_s2_trans *trans)
> +{
> +	BUG();
> +}
> +
> +static inline void kvm_nested_s2_flush(struct kvm *kvm) {}
> +static inline void kvm_nested_s2_wp(struct kvm *kvm) {}
> +static inline void kvm_nested_s2_clear(struct kvm *kvm) {}
> +
> +static inline bool kvm_is_shadow_s2_fault(struct kvm_vcpu *vcpu)
> +{
> +	return false;
> +}
> +
>  static __always_inline u64 kvm_get_vttbr(struct kvm_s2_mmu *mmu)
>  {
>  	struct kvm_vmid *vmid = &mmu->vmid;
> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> index 73d8c54a52c6..b49a47f3daa8 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -606,4 +606,10 @@ static inline void __hyp_text __kvm_skip_instr(struct kvm_vcpu *vcpu)
>  	write_sysreg_el2(*vcpu_pc(vcpu), SYS_ELR);
>  }
>  
> +static inline bool kvm_is_shadow_s2_fault(struct kvm_vcpu *vcpu)
> +{
> +	return (vcpu->arch.hw_mmu != &vcpu->kvm->arch.mmu &&
> +		vcpu->arch.hw_mmu->nested_stage2_enabled);
> +}
> +
>  #endif /* __ARM64_KVM_EMULATE_H__ */
> diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
> index 686ba53379ab..052d46d96201 100644
> --- a/arch/arm64/include/asm/kvm_nested.h
> +++ b/arch/arm64/include/asm/kvm_nested.h
> @@ -19,7 +19,7 @@ extern void kvm_vcpu_put_hw_mmu(struct kvm_vcpu *vcpu);
>  
>  struct kvm_s2_trans {
>  	phys_addr_t output;
> -	phys_addr_t block_size;
> +	unsigned long block_size;

Shouldn't this be part of the previous patch, where we introduce the struct?

>  	bool writable;
>  	bool readable;
>  	int level;
> @@ -27,9 +27,27 @@ struct kvm_s2_trans {
>  	u64 upper_attr;
>  };
>  
> +static inline phys_addr_t kvm_s2_trans_output(struct kvm_s2_trans *trans)
> +{
> +	return trans->output;
> +}
> +
> +static inline unsigned long kvm_s2_trans_size(struct kvm_s2_trans *trans)
> +{
> +	return trans->block_size;
> +}
> +
> +static inline u32 kvm_s2_trans_esr(struct kvm_s2_trans *trans)
> +{
> +	return trans->esr;
> +}
> +
>  extern int kvm_walk_nested_s2(struct kvm_vcpu *vcpu, phys_addr_t gipa,
>  			      struct kvm_s2_trans *result);
>  
> +extern int kvm_s2_handle_perm_fault(struct kvm_vcpu *vcpu,
> +				    struct kvm_s2_trans *trans);
> +extern int kvm_inject_s2_fault(struct kvm_vcpu *vcpu, u64 esr_el2);
>  int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe);
>  extern bool forward_traps(struct kvm_vcpu *vcpu, u64 control_bit);
>  extern bool forward_nv_traps(struct kvm_vcpu *vcpu);
> diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
> index 6a9bd68b769b..023027fa2db5 100644
> --- a/arch/arm64/kvm/nested.c
> +++ b/arch/arm64/kvm/nested.c
> @@ -300,6 +300,8 @@ int kvm_walk_nested_s2(struct kvm_vcpu *vcpu, phys_addr_t gipa,
>  	u64 vtcr = vcpu_read_sys_reg(vcpu, VTCR_EL2);
>  	struct s2_walk_info wi;
>  
> +	result->esr = 0;

I think this should be part of the previous patch, looks like a fix to me.

Thanks,
Alex
> +
>  	if (!nested_virt_in_use(vcpu))
>  		return 0;
>  
> @@ -415,6 +417,45 @@ void kvm_vcpu_put_hw_mmu(struct kvm_vcpu *vcpu)
>  	}
>  }
>  
> +/*
> + * Returns non-zero if permission fault is handled by injecting it to the next
> + * level hypervisor.
> + */
> +int kvm_s2_handle_perm_fault(struct kvm_vcpu *vcpu, struct kvm_s2_trans *trans)
> +{
> +	unsigned long fault_status = kvm_vcpu_trap_get_fault_type(vcpu);
> +	bool forward_fault = false;
> +
> +	trans->esr = 0;
> +
> +	if (fault_status != FSC_PERM)
> +		return 0;
> +
> +	if (kvm_vcpu_trap_is_iabt(vcpu)) {
> +		forward_fault = (trans->upper_attr & PTE_S2_XN);
> +	} else {
> +		bool write_fault = kvm_is_write_fault(vcpu);
> +
> +		forward_fault = ((write_fault && !trans->writable) ||
> +				 (!write_fault && !trans->readable));
> +	}
> +
> +	if (forward_fault) {
> +		trans->esr = esr_s2_fault(vcpu, trans->level, ESR_ELx_FSC_PERM);
> +		return 1;
> +	}
> +
> +	return 0;
> +}
> +
> +int kvm_inject_s2_fault(struct kvm_vcpu *vcpu, u64 esr_el2)
> +{
> +	vcpu_write_sys_reg(vcpu, vcpu->arch.fault.far_el2, FAR_EL2);
> +	vcpu_write_sys_reg(vcpu, vcpu->arch.fault.hpfar_el2, HPFAR_EL2);
> +
> +	return kvm_inject_nested_sync(vcpu, esr_el2);
> +}
> +
>  /*
>   * Inject wfx to the virtual EL2 if this is not from the virtual EL2 and
>   * the virtual HCR_EL2.TWX is set. Otherwise, let the host hypervisor
> diff --git a/virt/kvm/arm/mmio.c b/virt/kvm/arm/mmio.c
> index a8a6a0c883f1..2b5de8388bf4 100644
> --- a/virt/kvm/arm/mmio.c
> +++ b/virt/kvm/arm/mmio.c
> @@ -142,7 +142,7 @@ static int decode_hsr(struct kvm_vcpu *vcpu, bool *is_write, int *len)
>  }
>  
>  int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run,
> -		 phys_addr_t fault_ipa)
> +		 phys_addr_t ipa)
>  {
>  	unsigned long data;
>  	unsigned long rt;
> @@ -171,22 +171,22 @@ int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run,
>  		data = vcpu_data_guest_to_host(vcpu, vcpu_get_reg(vcpu, rt),
>  					       len);
>  
> -		trace_kvm_mmio(KVM_TRACE_MMIO_WRITE, len, fault_ipa, &data);
> +		trace_kvm_mmio(KVM_TRACE_MMIO_WRITE, len, ipa, &data);
>  		kvm_mmio_write_buf(data_buf, len, data);
>  
> -		ret = kvm_io_bus_write(vcpu, KVM_MMIO_BUS, fault_ipa, len,
> +		ret = kvm_io_bus_write(vcpu, KVM_MMIO_BUS, ipa, len,
>  				       data_buf);
>  	} else {
>  		trace_kvm_mmio(KVM_TRACE_MMIO_READ_UNSATISFIED, len,
> -			       fault_ipa, NULL);
> +			       ipa, NULL);
>  
> -		ret = kvm_io_bus_read(vcpu, KVM_MMIO_BUS, fault_ipa, len,
> +		ret = kvm_io_bus_read(vcpu, KVM_MMIO_BUS, ipa, len,
>  				      data_buf);
>  	}
>  
>  	/* Now prepare kvm_run for the potential return to userland. */
>  	run->mmio.is_write	= is_write;
> -	run->mmio.phys_addr	= fault_ipa;
> +	run->mmio.phys_addr	= ipa;
>  	run->mmio.len		= len;
>  
>  	if (!ret) {
> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> index faa61a81c8cc..3c7845832db8 100644
> --- a/virt/kvm/arm/mmu.c
> +++ b/virt/kvm/arm/mmu.c
> @@ -1384,7 +1384,8 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
>  	return ret;
>  }
>  
> -static bool transparent_hugepage_adjust(kvm_pfn_t *pfnp, phys_addr_t *ipap)
> +static bool transparent_hugepage_adjust(kvm_pfn_t *pfnp, phys_addr_t *ipap,
> +					phys_addr_t *fault_ipap)
>  {
>  	kvm_pfn_t pfn = *pfnp;
>  	gfn_t gfn = *ipap >> PAGE_SHIFT;
> @@ -1418,6 +1419,7 @@ static bool transparent_hugepage_adjust(kvm_pfn_t *pfnp, phys_addr_t *ipap)
>  		mask = PTRS_PER_PMD - 1;
>  		VM_BUG_ON((gfn & mask) != (pfn & mask));
>  		if (pfn & mask) {
> +			*fault_ipap &= PMD_MASK;
>  			*ipap &= PMD_MASK;
>  			kvm_release_pfn_clean(pfn);
>  			pfn &= ~mask;
> @@ -1681,14 +1683,16 @@ static bool fault_supports_stage2_huge_mapping(struct kvm_memory_slot *memslot,
>  }
>  
>  static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> -			  struct kvm_memory_slot *memslot, unsigned long hva,
> -			  unsigned long fault_status)
> +			  struct kvm_s2_trans *nested,
> +			  struct kvm_memory_slot *memslot,
> +			  unsigned long hva, unsigned long fault_status)
>  {
>  	int ret;
> -	bool write_fault, writable, force_pte = false;
> +	bool write_fault, writable;
>  	bool exec_fault, needs_exec;
>  	unsigned long mmu_seq;
> -	gfn_t gfn = fault_ipa >> PAGE_SHIFT;
> +	phys_addr_t ipa = fault_ipa;
> +	gfn_t gfn;
>  	struct kvm *kvm = vcpu->kvm;
>  	struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
>  	struct vm_area_struct *vma;
> @@ -1697,6 +1701,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	bool logging_active = memslot_is_logging(memslot);
>  	unsigned long vma_pagesize, flags = 0;
>  	struct kvm_s2_mmu *mmu = vcpu->arch.hw_mmu;
> +	unsigned long max_map_size = PUD_SIZE;
>  
>  	write_fault = kvm_is_write_fault(vcpu);
>  	exec_fault = kvm_vcpu_trap_is_iabt(vcpu);
> @@ -1717,11 +1722,26 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	}
>  
>  	vma_pagesize = vma_kernel_pagesize(vma);
> -	if (logging_active ||
> -	    !fault_supports_stage2_huge_mapping(memslot, hva, vma_pagesize)) {
> -		force_pte = true;
> -		vma_pagesize = PAGE_SIZE;
> +
> +	if (!fault_supports_stage2_huge_mapping(memslot, hva, vma_pagesize))
> +		max_map_size = PAGE_SIZE;
> +
> +	if (logging_active)
> +		max_map_size = PAGE_SIZE;
> +
> +	if (kvm_is_shadow_s2_fault(vcpu)) {
> +		ipa = kvm_s2_trans_output(nested);
> +
> +		/*
> +		 * If we're about to create a shadow stage 2 entry, then we
> +		 * can only create a block mapping if the guest stage 2 page
> +		 * table uses at least as big a mapping.
> +		 */
> +		max_map_size = min(kvm_s2_trans_size(nested), max_map_size);
>  	}
> +	gfn = ipa >> PAGE_SHIFT;
> +
> +	vma_pagesize = min(vma_pagesize, max_map_size);
>  
>  	/*
>  	 * The stage2 has a minimum of 2 level table (For arm64 see
> @@ -1731,8 +1751,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	 * 3 levels, i.e, PMD is not folded.
>  	 */
>  	if (vma_pagesize == PMD_SIZE ||
> -	    (vma_pagesize == PUD_SIZE && kvm_stage2_has_pmd(kvm)))
> -		gfn = (fault_ipa & huge_page_mask(hstate_vma(vma))) >> PAGE_SHIFT;
> +	    (vma_pagesize == PUD_SIZE && kvm_stage2_has_pmd(kvm))) {
> +		gfn = (ipa & huge_page_mask(hstate_vma(vma))) >> PAGE_SHIFT;
> +	}
>  	up_read(&current->mm->mmap_sem);
>  
>  	/* We need minimum second+third level pages */
> @@ -1784,7 +1805,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	if (mmu_notifier_retry(kvm, mmu_seq))
>  		goto out_unlock;
>  
> -	if (vma_pagesize == PAGE_SIZE && !force_pte) {
> +	if (vma_pagesize == PAGE_SIZE && max_map_size >= PMD_SIZE) {
>  		/*
>  		 * Only PMD_SIZE transparent hugepages(THP) are
>  		 * currently supported. This code will need to be
> @@ -1794,7 +1815,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  		 * aligned and that the block is contained within the memslot.
>  		 */
>  		if (fault_supports_stage2_huge_mapping(memslot, hva, PMD_SIZE) &&
> -		    transparent_hugepage_adjust(&pfn, &fault_ipa))
> +		    transparent_hugepage_adjust(&pfn, &ipa, &fault_ipa))
>  			vma_pagesize = PMD_SIZE;
>  	}
>  
> @@ -1919,8 +1940,10 @@ static void handle_access_fault(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
>  int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  {
>  	unsigned long fault_status;
> -	phys_addr_t fault_ipa;
> +	phys_addr_t fault_ipa; /* The address we faulted on */
> +	phys_addr_t ipa; /* Always the IPA in the L1 guest phys space */
>  	struct kvm_memory_slot *memslot;
> +	struct kvm_s2_trans nested_trans;
>  	unsigned long hva;
>  	bool is_iabt, write_fault, writable;
>  	gfn_t gfn;
> @@ -1928,7 +1951,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  
>  	fault_status = kvm_vcpu_trap_get_fault_type(vcpu);
>  
> -	fault_ipa = kvm_vcpu_get_fault_ipa(vcpu);
> +	ipa = fault_ipa = kvm_vcpu_get_fault_ipa(vcpu);
>  	is_iabt = kvm_vcpu_trap_is_iabt(vcpu);
>  
>  	/* Synchronous External Abort? */
> @@ -1952,6 +1975,12 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  	/* Check the stage-2 fault is trans. fault or write fault */
>  	if (fault_status != FSC_FAULT && fault_status != FSC_PERM &&
>  	    fault_status != FSC_ACCESS) {
> +		/*
> +		 * We must never see an address size fault on a shadow stage 2
> +		 * page table walk, because we would have injected an address
> +		 * size fault when we walked the nested s2 page tables and not
> +		 * created the shadow entry.
> +		 */
>  		kvm_err("Unsupported FSC: EC=%#x xFSC=%#lx ESR_EL2=%#lx\n",
>  			kvm_vcpu_trap_get_class(vcpu),
>  			(unsigned long)kvm_vcpu_trap_get_fault(vcpu),
> @@ -1961,7 +1990,36 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  
>  	idx = srcu_read_lock(&vcpu->kvm->srcu);
>  
> -	gfn = fault_ipa >> PAGE_SHIFT;
> +	/*
> +	 * We may have faulted on a shadow stage 2 page table if we are
> +	 * running a nested guest.  In this case, we have to resolve the L2
> +	 * IPA to the L1 IPA first, before knowing what kind of memory should
> +	 * back the L1 IPA.
> +	 *
> +	 * If the shadow stage 2 page table walk faults, then we simply inject
> +	 * this to the guest and carry on.
> +	 */
> +	if (kvm_is_shadow_s2_fault(vcpu)) {
> +		u32 esr;
> +
> +		ret = kvm_walk_nested_s2(vcpu, fault_ipa, &nested_trans);
> +		esr = kvm_s2_trans_esr(&nested_trans);
> +		if (esr)
> +			kvm_inject_s2_fault(vcpu, esr);
> +		if (ret)
> +			goto out_unlock;
> +
> +		ret = kvm_s2_handle_perm_fault(vcpu, &nested_trans);
> +		esr = kvm_s2_trans_esr(&nested_trans);
> +		if (esr)
> +			kvm_inject_s2_fault(vcpu, esr);
> +		if (ret)
> +			goto out_unlock;
> +
> +		ipa = kvm_s2_trans_output(&nested_trans);
> +	}
> +
> +	gfn = ipa >> PAGE_SHIFT;
>  	memslot = gfn_to_memslot(vcpu->kvm, gfn);
>  	hva = gfn_to_hva_memslot_prot(memslot, gfn, &writable);
>  	write_fault = kvm_is_write_fault(vcpu);
> @@ -1995,13 +2053,13 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  		 * faulting VA. This is always 12 bits, irrespective
>  		 * of the page size.
>  		 */
> -		fault_ipa |= kvm_vcpu_get_hfar(vcpu) & ((1 << 12) - 1);
> -		ret = io_mem_abort(vcpu, run, fault_ipa);
> +		ipa |= kvm_vcpu_get_hfar(vcpu) & ((1 << 12) - 1);
> +		ret = io_mem_abort(vcpu, run, ipa);
>  		goto out_unlock;
>  	}
>  
>  	/* Userspace should not be able to register out-of-bounds IPAs */
> -	VM_BUG_ON(fault_ipa >= kvm_phys_size(vcpu->kvm));
> +	VM_BUG_ON(ipa >= kvm_phys_size(vcpu->kvm));
>  
>  	if (fault_status == FSC_ACCESS) {
>  		handle_access_fault(vcpu, fault_ipa);
> @@ -2009,7 +2067,8 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  		goto out_unlock;
>  	}
>  
> -	ret = user_mem_abort(vcpu, fault_ipa, memslot, hva, fault_status);
> +	ret = user_mem_abort(vcpu, fault_ipa, &nested_trans,
> +			     memslot, hva, fault_status);
>  	if (ret == 0)
>  		ret = 1;
>  out_unlock:


* Re: [PATCH 39/59] KVM: arm64: nv: Move last_vcpu_ran to be per s2 mmu
  2019-06-21  9:38 ` [PATCH 39/59] KVM: arm64: nv: Move last_vcpu_ran to be per s2 mmu Marc Zyngier
  2019-07-01  9:10   ` Julien Thierry
@ 2019-07-05 15:28   ` Alexandru Elisei
  1 sibling, 0 replies; 176+ messages in thread
From: Alexandru Elisei @ 2019-07-05 15:28 UTC (permalink / raw)
  To: Marc Zyngier, linux-arm-kernel, kvmarm, kvm; +Cc: Andre Przywara, Dave Martin

On 6/21/19 10:38 AM, Marc Zyngier wrote:
> last_vcpu_ran has to be per s2 mmu now that we can have multiple S2
> per VM. Let's take this opportunity to perform some cleanup.
>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  arch/arm/include/asm/kvm_host.h   |  6 +++---
>  arch/arm/include/asm/kvm_mmu.h    |  2 +-
>  arch/arm64/include/asm/kvm_host.h |  6 +++---
>  arch/arm64/include/asm/kvm_mmu.h  |  2 +-
>  arch/arm64/kvm/nested.c           | 13 ++++++-------
>  virt/kvm/arm/arm.c                | 22 ++++------------------
>  virt/kvm/arm/mmu.c                | 26 ++++++++++++++++++++------
>  7 files changed, 38 insertions(+), 39 deletions(-)
>
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index b821eb2383ad..cc761610e41e 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -63,15 +63,15 @@ struct kvm_s2_mmu {
>  	pgd_t *pgd;
>  	phys_addr_t pgd_phys;
>  
> +	/* The last vcpu id that ran on each physical CPU */
> +	int __percpu *last_vcpu_ran;
> +
>  	struct kvm *kvm;
>  };
>  
>  struct kvm_arch {
>  	struct kvm_s2_mmu mmu;
>  
> -	/* The last vcpu id that ran on each physical CPU */
> -	int __percpu *last_vcpu_ran;
> -
>  	/* Stage-2 page table */
>  	pgd_t *pgd;
>  	phys_addr_t pgd_phys;
> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
> index afabf1fd1d17..7a6e9008ed45 100644
> --- a/arch/arm/include/asm/kvm_mmu.h
> +++ b/arch/arm/include/asm/kvm_mmu.h
> @@ -52,7 +52,7 @@ int create_hyp_exec_mappings(phys_addr_t phys_addr, size_t size,
>  void free_hyp_pgds(void);
>  
>  void stage2_unmap_vm(struct kvm *kvm);
> -int kvm_alloc_stage2_pgd(struct kvm_s2_mmu *mmu);
> +int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu);
>  void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu);
>  int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
>  			  phys_addr_t pa, unsigned long size, bool writable);
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index cc238de170d2..b71a7a237f95 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -104,6 +104,9 @@ struct kvm_s2_mmu {
>  	 * >0: Somebody is actively using this.
>  	 */
>  	atomic_t refcnt;
> +
> +	/* The last vcpu id that ran on each physical CPU */
> +	int __percpu *last_vcpu_ran;
>  };
>  
>  static inline bool kvm_s2_mmu_valid(struct kvm_s2_mmu *mmu)
> @@ -124,9 +127,6 @@ struct kvm_arch {
>  	/* VTCR_EL2 value for this VM */
>  	u64    vtcr;
>  
> -	/* The last vcpu id that ran on each physical CPU */
> -	int __percpu *last_vcpu_ran;
> -
>  	/* The maximum number of vCPUs depends on the used GIC model */
>  	int max_vcpus;
>  
> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> index f4c5ac5eb95f..53103607065a 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -169,7 +169,7 @@ void free_hyp_pgds(void);
>  
>  void kvm_unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size);
>  void stage2_unmap_vm(struct kvm *kvm);
> -int kvm_alloc_stage2_pgd(struct kvm_s2_mmu *mmu);
> +int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu);
>  void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu);
>  int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
>  			  phys_addr_t pa, unsigned long size, bool writable);
> diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
> index 8880033fb6e0..09afafbdc8fe 100644
> --- a/arch/arm64/kvm/nested.c
> +++ b/arch/arm64/kvm/nested.c
> @@ -52,18 +52,17 @@ int kvm_vcpu_init_nested(struct kvm_vcpu *vcpu)
>  			 GFP_KERNEL | __GFP_ZERO);
>  
>  	if (tmp) {
> -		if (tmp != kvm->arch.nested_mmus)
> +		if (tmp != kvm->arch.nested_mmus) {
>  			kfree(kvm->arch.nested_mmus);
> +			kvm->arch.nested_mmus = NULL;
> +			kvm->arch.nested_mmus_size = 0;
> +		}
>  
> -		tmp[num_mmus - 1].kvm = kvm;
> -		atomic_set(&tmp[num_mmus - 1].refcnt, 0);
> -		ret = kvm_alloc_stage2_pgd(&tmp[num_mmus - 1]);
> +		ret = kvm_init_stage2_mmu(kvm, &tmp[num_mmus - 1]);
>  		if (ret)
>  			goto out;
>  
> -		tmp[num_mmus - 2].kvm = kvm;
> -		atomic_set(&tmp[num_mmus - 2].refcnt, 0);
> -		ret = kvm_alloc_stage2_pgd(&tmp[num_mmus - 2]);
> +		ret = kvm_init_stage2_mmu(kvm, &tmp[num_mmus - 2]);
>  		if (ret) {
>  			kvm_free_stage2_pgd(&tmp[num_mmus - 1]);
>  			goto out;
> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> index bcca27d5c481..e8b584b79847 100644
> --- a/virt/kvm/arm/arm.c
> +++ b/virt/kvm/arm/arm.c
> @@ -99,29 +99,21 @@ void kvm_arch_check_processor_compat(void *rtn)
>  	*(int *)rtn = 0;
>  }
>  
> -
>  /**
>   * kvm_arch_init_vm - initializes a VM data structure
>   * @kvm:	pointer to the KVM struct
>   */
>  int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>  {
> -	int ret, cpu;
> +	int ret;
>  
>  	ret = kvm_arm_setup_stage2(kvm, type);
>  	if (ret)
>  		return ret;
>  
> -	kvm->arch.last_vcpu_ran = alloc_percpu(typeof(*kvm->arch.last_vcpu_ran));
> -	if (!kvm->arch.last_vcpu_ran)
> -		return -ENOMEM;
> -
> -	for_each_possible_cpu(cpu)
> -		*per_cpu_ptr(kvm->arch.last_vcpu_ran, cpu) = -1;
> -
> -	ret = kvm_alloc_stage2_pgd(&kvm->arch.mmu);
> +	ret = kvm_init_stage2_mmu(kvm, &kvm->arch.mmu);
>  	if (ret)
> -		goto out_fail_alloc;
> +		return ret;
>  
>  	/* Mark the initial VMID generation invalid */
>  	kvm->arch.mmu.vmid.vmid_gen = 0;

More context:

 int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 {
-    int ret, cpu;
+    int ret;
 
     ret = kvm_arm_setup_stage2(kvm, type);
     if (ret)
         return ret;
 
-    kvm->arch.last_vcpu_ran = alloc_percpu(typeof(*kvm->arch.last_vcpu_ran));
-    if (!kvm->arch.last_vcpu_ran)
-        return -ENOMEM;
-
-    for_each_possible_cpu(cpu)
-        *per_cpu_ptr(kvm->arch.last_vcpu_ran, cpu) = -1;
-
-    ret = kvm_alloc_stage2_pgd(&kvm->arch.mmu);
+    ret = kvm_init_stage2_mmu(kvm, &kvm->arch.mmu);
     if (ret)
-        goto out_fail_alloc;
+        return ret;
 
     /* Mark the initial VMID generation invalid */
     kvm->arch.mmu.vmid.vmid_gen = 0;
     kvm->arch.mmu.kvm = kvm;
 

kvm_init_stage2_mmu already sets vmid_gen and kvm, so these two assignments
are now redundant.
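
I.e. the tail of kvm_arch_init_vm() could be reduced to something like this
(untested sketch):

	ret = kvm_init_stage2_mmu(kvm, &kvm->arch.mmu);
	if (ret)
		return ret;

	/*
	 * No need to set kvm->arch.mmu.vmid.vmid_gen or kvm->arch.mmu.kvm
	 * here, kvm_init_stage2_mmu() has already done it.
	 */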

> @@ -142,9 +134,6 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>  	return ret;
>  out_free_stage2_pgd:
>  	kvm_free_stage2_pgd(&kvm->arch.mmu);
> -out_fail_alloc:
> -	free_percpu(kvm->arch.last_vcpu_ran);
> -	kvm->arch.last_vcpu_ran = NULL;
>  	return ret;
>  }
>  
> @@ -174,9 +163,6 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
>  
>  	kvm_vgic_destroy(kvm);
>  
> -	free_percpu(kvm->arch.last_vcpu_ran);
> -	kvm->arch.last_vcpu_ran = NULL;
> -
>  	for (i = 0; i < KVM_MAX_VCPUS; ++i) {
>  		if (kvm->vcpus[i]) {
>  			kvm_arch_vcpu_free(kvm->vcpus[i]);
> @@ -359,7 +345,7 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>  	if (nested_virt_in_use(vcpu))
>  		kvm_vcpu_load_hw_mmu(vcpu);
>  
> -	last_ran = this_cpu_ptr(vcpu->kvm->arch.last_vcpu_ran);
> +	last_ran = this_cpu_ptr(vcpu->arch.hw_mmu->last_vcpu_ran);
>  	cpu_data = this_cpu_ptr(&kvm_host_data);
>  
>  	/*
> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> index 94d400e7af57..6a7cba077bce 100644
> --- a/virt/kvm/arm/mmu.c
> +++ b/virt/kvm/arm/mmu.c
> @@ -903,8 +903,9 @@ int create_hyp_exec_mappings(phys_addr_t phys_addr, size_t size,
>  }
>  
>  /**
> - * kvm_alloc_stage2_pgd - allocate level-1 table for stage-2 translation.
> - * @mmu:	The stage 2 mmu struct pointer
> + * kvm_init_stage2_mmu - Initialise an S2 MMU structure
> + * @kvm:	The pointer to the KVM structure
> + * @mmu:	The pointer to the s2 MMU structure
>   *
>   * Allocates only the stage-2 HW PGD level table(s) of size defined by
>   * stage2_pgd_size(mmu->kvm).
> @@ -912,10 +913,11 @@ int create_hyp_exec_mappings(phys_addr_t phys_addr, size_t size,
>   * Note we don't need locking here as this is only called when the VM is
>   * created, which can only be done once.
>   */
> -int kvm_alloc_stage2_pgd(struct kvm_s2_mmu *mmu)
> +int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu)
>  {
>  	phys_addr_t pgd_phys;
>  	pgd_t *pgd;
> +	int cpu;
>  
>  	if (mmu->pgd != NULL) {
>  		kvm_err("kvm_arch already initialized?\n");
> @@ -923,18 +925,28 @@ int kvm_alloc_stage2_pgd(struct kvm_s2_mmu *mmu)
>  	}
>  
>  	/* Allocate the HW PGD, making sure that each page gets its own refcount */
> -	pgd = alloc_pages_exact(stage2_pgd_size(mmu->kvm), GFP_KERNEL | __GFP_ZERO);
> +	pgd = alloc_pages_exact(stage2_pgd_size(kvm), GFP_KERNEL | __GFP_ZERO);
>  	if (!pgd)
>  		return -ENOMEM;
>  
>  	pgd_phys = virt_to_phys(pgd);
> -	if (WARN_ON(pgd_phys & ~kvm_vttbr_baddr_mask(mmu->kvm)))
> +	if (WARN_ON(pgd_phys & ~kvm_vttbr_baddr_mask(kvm)))
>  		return -EINVAL;
>  
> +	mmu->last_vcpu_ran = alloc_percpu(typeof(*mmu->last_vcpu_ran));
> +	if (!mmu->last_vcpu_ran) {
> +		free_pages_exact(pgd, stage2_pgd_size(kvm));
> +		return -ENOMEM;
> +	}
> +
> +	mmu->kvm = kvm;
>  	mmu->pgd = pgd;
>  	mmu->pgd_phys = pgd_phys;
>  	mmu->vmid.vmid_gen = 0;
>  
> +	for_each_possible_cpu(cpu)
> +		*per_cpu_ptr(mmu->last_vcpu_ran, cpu) = -1;
> +
>  	kvm_init_s2_mmu(mmu);
>  
>  	return 0;
> @@ -1021,8 +1033,10 @@ void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu)
>  	spin_unlock(&kvm->mmu_lock);
>  
>  	/* Free the HW pgd, one page at a time */

The comment should probably mention last_vcpu_ran too.
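
E.g. (sketch):

	/* Free the HW pgd, one page at a time, and the last_vcpu_ran data */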

> -	if (pgd)
> +	if (pgd) {
>  		free_pages_exact(pgd, stage2_pgd_size(kvm));
> +		free_percpu(mmu->last_vcpu_ran);
> +	}
>  }
>  
>  static pud_t *stage2_get_pud(struct kvm_s2_mmu *mmu, struct kvm_mmu_memory_cache *cache,


* Re: [PATCH 10/59] KVM: arm64: nv: Support virtual EL2 exceptions
  2019-06-21  9:37 ` [PATCH 10/59] KVM: arm64: nv: Support virtual EL2 exceptions Marc Zyngier
@ 2019-07-08 13:56   ` Steven Price
  0 siblings, 0 replies; 176+ messages in thread
From: Steven Price @ 2019-07-08 13:56 UTC (permalink / raw)
  To: Marc Zyngier, linux-arm-kernel, kvmarm, kvm
  Cc: Julien Thierry, Andre Przywara, Suzuki K Poulose,
	Christoffer Dall, Dave Martin, James Morse, Jintack Lim

On 21/06/2019 10:37, Marc Zyngier wrote:
> From: Jintack Lim <jintack.lim@linaro.org>
> 
> Support injecting exceptions and performing exception returns to and
> from virtual EL2.  This must be done entirely in software except when
> taking an exception from vEL0 to vEL2 when the virtual HCR_EL2.{E2H,TGE}
> == {1,1}  (a VHE guest hypervisor).
> 
> Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  arch/arm64/include/asm/kvm_arm.h     |  17 +++
>  arch/arm64/include/asm/kvm_emulate.h |  22 ++++
>  arch/arm64/kvm/Makefile              |   2 +
>  arch/arm64/kvm/emulate-nested.c      | 184 +++++++++++++++++++++++++++
>  arch/arm64/kvm/inject_fault.c        |  12 --
>  arch/arm64/kvm/trace.h               |  56 ++++++++
>  6 files changed, 281 insertions(+), 12 deletions(-)
>  create mode 100644 arch/arm64/kvm/emulate-nested.c
> 
> diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
> index 7f9d2bfcf82e..9d70a5362fbb 100644
> --- a/arch/arm64/include/asm/kvm_arm.h
> +++ b/arch/arm64/include/asm/kvm_arm.h
> @@ -339,4 +339,21 @@
>  #define CPACR_EL1_TTA		(1 << 28)
>  #define CPACR_EL1_DEFAULT	(CPACR_EL1_FPEN | CPACR_EL1_ZEN_EL1EN)
>  
> +#define kvm_mode_names				\
> +	{ PSR_MODE_EL0t,	"EL0t" },	\
> +	{ PSR_MODE_EL1t,	"EL1t" },	\
> +	{ PSR_MODE_EL1h,	"EL1h" },	\
> +	{ PSR_MODE_EL2t,	"EL2t" },	\
> +	{ PSR_MODE_EL2h,	"EL2h" },	\
> +	{ PSR_MODE_EL3t,	"EL3t" },	\
> +	{ PSR_MODE_EL3h,	"EL3h" },	\
> +	{ PSR_AA32_MODE_USR,	"32-bit USR" },	\
> +	{ PSR_AA32_MODE_FIQ,	"32-bit FIQ" },	\
> +	{ PSR_AA32_MODE_IRQ,	"32-bit IRQ" },	\
> +	{ PSR_AA32_MODE_SVC,	"32-bit SVC" },	\
> +	{ PSR_AA32_MODE_ABT,	"32-bit ABT" },	\
> +	{ PSR_AA32_MODE_HYP,	"32-bit HYP" },	\
> +	{ PSR_AA32_MODE_UND,	"32-bit UND" },	\
> +	{ PSR_AA32_MODE_SYS,	"32-bit SYS" }
> +
>  #endif /* __ARM64_KVM_ARM_H__ */
> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> index 8f201ea56f6e..c43aac5fed69 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -33,6 +33,24 @@
>  #include <asm/cputype.h>
>  #include <asm/virt.h>
>  
> +#define CURRENT_EL_SP_EL0_VECTOR	0x0
> +#define CURRENT_EL_SP_ELx_VECTOR	0x200
> +#define LOWER_EL_AArch64_VECTOR		0x400
> +#define LOWER_EL_AArch32_VECTOR		0x600
> +
> +enum exception_type {
> +	except_type_sync	= 0,
> +	except_type_irq		= 0x80,
> +	except_type_fiq		= 0x100,
> +	except_type_serror	= 0x180,
> +};
> +
> +#define kvm_exception_type_names		\
> +	{ except_type_sync,	"SYNC"   },	\
> +	{ except_type_irq,	"IRQ"    },	\
> +	{ except_type_fiq,	"FIQ"    },	\
> +	{ except_type_serror,	"SERROR" }
> +
>  unsigned long *vcpu_reg32(const struct kvm_vcpu *vcpu, u8 reg_num);
>  unsigned long vcpu_read_spsr32(const struct kvm_vcpu *vcpu);
>  void vcpu_write_spsr32(struct kvm_vcpu *vcpu, unsigned long v);
> @@ -48,6 +66,10 @@ void kvm_inject_undef32(struct kvm_vcpu *vcpu);
>  void kvm_inject_dabt32(struct kvm_vcpu *vcpu, unsigned long addr);
>  void kvm_inject_pabt32(struct kvm_vcpu *vcpu, unsigned long addr);
>  
> +void kvm_emulate_nested_eret(struct kvm_vcpu *vcpu);
> +int kvm_inject_nested_sync(struct kvm_vcpu *vcpu, u64 esr_el2);
> +int kvm_inject_nested_irq(struct kvm_vcpu *vcpu);
> +
>  static inline bool vcpu_el1_is_32bit(struct kvm_vcpu *vcpu)
>  {
>  	return !(vcpu->arch.hcr_el2 & HCR_RW);
> diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
> index 3ac1a64d2fb9..9e450aea7db6 100644
> --- a/arch/arm64/kvm/Makefile
> +++ b/arch/arm64/kvm/Makefile
> @@ -35,3 +35,5 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic/vgic-debug.o
>  kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/irqchip.o
>  kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/arch_timer.o
>  kvm-$(CONFIG_KVM_ARM_PMU) += $(KVM)/arm/pmu.o
> +
> +kvm-$(CONFIG_KVM_ARM_HOST) += emulate-nested.o
> diff --git a/arch/arm64/kvm/emulate-nested.c b/arch/arm64/kvm/emulate-nested.c
> new file mode 100644
> index 000000000000..f829b8b04dc8
> --- /dev/null
> +++ b/arch/arm64/kvm/emulate-nested.c
> @@ -0,0 +1,184 @@
> +/*
> + * Copyright (C) 2016 - Linaro and Columbia University
> + * Author: Jintack Lim <jintack.lim@linaro.org>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include <linux/kvm.h>
> +#include <linux/kvm_host.h>
> +
> +#include <asm/kvm_coproc.h>
> +#include <asm/kvm_emulate.h>
> +#include <asm/kvm_nested.h>
> +
> +#include "trace.h"
> +
> +/* This is borrowed from get_except_vector in inject_fault.c */
> +static u64 get_el2_except_vector(struct kvm_vcpu *vcpu,
> +		enum exception_type type)
> +{
> +	u64 exc_offset;
> +
> +	switch (*vcpu_cpsr(vcpu) & (PSR_MODE_MASK | PSR_MODE32_BIT)) {
> +	case PSR_MODE_EL2t:
> +		exc_offset = CURRENT_EL_SP_EL0_VECTOR;
> +		break;
> +	case PSR_MODE_EL2h:
> +		exc_offset = CURRENT_EL_SP_ELx_VECTOR;
> +		break;
> +	case PSR_MODE_EL1t:
> +	case PSR_MODE_EL1h:
> +	case PSR_MODE_EL0t:
> +		exc_offset = LOWER_EL_AArch64_VECTOR;
> +		break;
> +	default:
> +		kvm_err("Unexpected previous exception level: aarch32\n");
> +		exc_offset = LOWER_EL_AArch32_VECTOR;
> +	}
> +
> +	return vcpu_read_sys_reg(vcpu, VBAR_EL2) + exc_offset + type;
> +}
> +
> +void kvm_emulate_nested_eret(struct kvm_vcpu *vcpu)
> +{
> +	u64 spsr, elr, mode;
> +	bool direct_eret;
> +
> +	/*
> +	 * Going through the whole put/load motions is a waste of time
> +	 * if this is a VHE guest hypervisor returning to its own
> +	 * userspace, or the hypervisor performing a local exception
> +	 * return. No need to save/restore registers, no need to
> +	 * switch S2 MMU. Just do the canonical ERET.
> +	 */
> +	spsr = vcpu_read_sys_reg(vcpu, SPSR_EL2);
> +	mode = spsr & (PSR_MODE_MASK | PSR_MODE32_BIT);
> +
> +	direct_eret  = (mode == PSR_MODE_EL0t &&
> +			vcpu_el2_e2h_is_set(vcpu) &&
> +			vcpu_el2_tge_is_set(vcpu));
> +	direct_eret |= (mode == PSR_MODE_EL2h || mode == PSR_MODE_EL2t);
> +
> +	if (direct_eret) {
> +		*vcpu_pc(vcpu) = vcpu_read_sys_reg(vcpu, ELR_EL2);
> +		*vcpu_cpsr(vcpu) = spsr;
> +		trace_kvm_nested_eret(vcpu, *vcpu_pc(vcpu), *vcpu_cpsr(vcpu));
> +		return;
> +	}
> +
> +	preempt_disable();
> +	kvm_arch_vcpu_put(vcpu);
> +
> +	spsr = __vcpu_sys_reg(vcpu, SPSR_EL2);

Why do we need to reload SPSR here?

> +	elr = __vcpu_sys_reg(vcpu, ELR_EL2);
> +
> +	trace_kvm_nested_eret(vcpu, elr, spsr);
> +
> +	/*
> +	 * Note that the current exception level is always the virtual EL2,
> +	 * since we set HCR_EL2.NV bit only when entering the virtual EL2.
> +	 */
> +	*vcpu_pc(vcpu) = elr;
> +	*vcpu_cpsr(vcpu) = spsr;
> +
> +	kvm_arch_vcpu_load(vcpu, smp_processor_id());
> +	preempt_enable();
> +}
> +
> +static void enter_el2_exception(struct kvm_vcpu *vcpu, u64 esr_el2,
> +				enum exception_type type)
> +{
> +	trace_kvm_inject_nested_exception(vcpu, esr_el2, type);
> +
> +	vcpu_write_sys_reg(vcpu, *vcpu_cpsr(vcpu), SPSR_EL2);
> +	vcpu_write_sys_reg(vcpu, *vcpu_pc(vcpu), ELR_EL2);
> +	vcpu_write_sys_reg(vcpu, esr_el2, ESR_EL2);
> +
> +	*vcpu_pc(vcpu) = get_el2_except_vector(vcpu, type);
> +	/* On an exception, PSTATE.SP becomes 1 */
> +	*vcpu_cpsr(vcpu) = PSR_MODE_EL2h;
> +	*vcpu_cpsr(vcpu) |= PSR_A_BIT | PSR_F_BIT | PSR_I_BIT | PSR_D_BIT;
> +}
> +
> +/*
> + * Emulate taking an exception to EL2.
> + * See ARM ARM J8.1.2 AArch64.TakeException()
> + */
> +static int kvm_inject_nested(struct kvm_vcpu *vcpu, u64 esr_el2,
> +			     enum exception_type type)
> +{
> +	u64 pstate, mode;
> +	bool direct_inject;
> +
> +	if (!nested_virt_in_use(vcpu)) {
> +		kvm_err("Unexpected call to %s for the non-nesting configuration\n",
> +				__func__);
> +		return -EINVAL;
> +	}
> +
> +	/*
> +	 * As for ERET, we can avoid doing too much on the injection path by
> +	 * checking that we either took the exception from a VHE host
> +	 * userspace or from vEL2. In these cases, there is no change in
> +	 * translation regime (or anything else), so let's do as little as
> +	 * possible.
> +	 */
> +	pstate = *vcpu_cpsr(vcpu);
> +	mode = pstate & (PSR_MODE_MASK | PSR_MODE32_BIT);
> +
> +	direct_inject  = (mode == PSR_MODE_EL0t &&
> +			  vcpu_el2_e2h_is_set(vcpu) &&
> +			  vcpu_el2_tge_is_set(vcpu));
> +	direct_inject |= (mode == PSR_MODE_EL2h || mode == PSR_MODE_EL2t);
> +
> +	if (direct_inject) {
> +		enter_el2_exception(vcpu, esr_el2, type);
> +		return 1;
> +	}
> +
> +	preempt_disable();
> +	kvm_arch_vcpu_put(vcpu);
> +
> +	enter_el2_exception(vcpu, esr_el2, type);
> +
> +	kvm_arch_vcpu_load(vcpu, smp_processor_id());
> +	preempt_enable();
> +
> +	return 1;
> +}
> +
> +int kvm_inject_nested_sync(struct kvm_vcpu *vcpu, u64 esr_el2)
> +{
> +	return kvm_inject_nested(vcpu, esr_el2, except_type_sync);
> +}
> +
> +int kvm_inject_nested_irq(struct kvm_vcpu *vcpu)
> +{
> +	/*
> +	 * Do not inject an irq if:
> +	 *  - the current exception level is EL2, and
> +	 *  - the virtual HCR_EL2.TGE == 0, and
> +	 *  - the virtual HCR_EL2.IMO == 0.
> +	 *
> +	 * See Table D1-17 "Physical interrupt target and masking when EL3 is
> +	 * not implemented and EL2 is implemented" in ARM DDI 0487C.a.
> +	 */
> +
> +	if (vcpu_mode_el2(vcpu) && !vcpu_el2_tge_is_set(vcpu) &&
> +	    !(__vcpu_sys_reg(vcpu, HCR_EL2) & HCR_IMO))
> +		return 1;
> +
> +	/* esr_el2 value doesn't matter for exits due to irqs. */
> +	return kvm_inject_nested(vcpu, 0, except_type_irq);
> +}
> diff --git a/arch/arm64/kvm/inject_fault.c b/arch/arm64/kvm/inject_fault.c
> index a55e91dfcf8f..fac962b467bd 100644
> --- a/arch/arm64/kvm/inject_fault.c
> +++ b/arch/arm64/kvm/inject_fault.c
> @@ -28,18 +28,6 @@
>  #define PSTATE_FAULT_BITS_64 	(PSR_MODE_EL1h | PSR_A_BIT | PSR_F_BIT | \
>  				 PSR_I_BIT | PSR_D_BIT)
>  
> -#define CURRENT_EL_SP_EL0_VECTOR	0x0
> -#define CURRENT_EL_SP_ELx_VECTOR	0x200
> -#define LOWER_EL_AArch64_VECTOR		0x400
> -#define LOWER_EL_AArch32_VECTOR		0x600
> -
> -enum exception_type {
> -	except_type_sync	= 0,
> -	except_type_irq		= 0x80,
> -	except_type_fiq		= 0x100,
> -	except_type_serror	= 0x180,
> -};
> -
>  static u64 get_except_vector(struct kvm_vcpu *vcpu, enum exception_type type)
>  {
>  	u64 exc_offset;
> diff --git a/arch/arm64/kvm/trace.h b/arch/arm64/kvm/trace.h
> index eab91ad0effb..797a705bb644 100644
> --- a/arch/arm64/kvm/trace.h
> +++ b/arch/arm64/kvm/trace.h
> @@ -204,7 +204,63 @@ TRACE_EVENT(kvm_set_guest_debug,
>  	TP_printk("vcpu: %p, flags: 0x%08x", __entry->vcpu, __entry->guest_debug)
>  );
>  
> +TRACE_EVENT(kvm_nested_eret,
> +	TP_PROTO(struct kvm_vcpu *vcpu, unsigned long elr_el2,
> +		 unsigned long spsr_el2),
> +	TP_ARGS(vcpu, elr_el2, spsr_el2),
>  
> +	TP_STRUCT__entry(
> +		__field(struct kvm_vcpu *,	vcpu)
> +		__field(unsigned long,		elr_el2)
> +		__field(unsigned long,		spsr_el2)
> +		__field(unsigned long,		target_mode)
> +		__field(unsigned long,		hcr_el2)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->vcpu = vcpu;
> +		__entry->elr_el2 = elr_el2;
> +		__entry->spsr_el2 = spsr_el2;
> +		__entry->target_mode = spsr_el2 & (PSR_MODE_MASK | PSR_MODE32_BIT);
> +		__entry->hcr_el2 = __vcpu_sys_reg(vcpu, HCR_EL2);
> +	),
> +
> +	TP_printk("elr_el2: 0x%lx spsr_el2: 0x%08lx (M: %s) hcr_el2: %lx",
> +		  __entry->elr_el2, __entry->spsr_el2,
> +		  __print_symbolic(__entry->target_mode, kvm_mode_names),
> +		  __entry->hcr_el2)
> +);
> +
> +TRACE_EVENT(kvm_inject_nested_exception,
> +	TP_PROTO(struct kvm_vcpu *vcpu, u64 esr_el2, int type),
> +	TP_ARGS(vcpu, esr_el2, type),
> +
> +	TP_STRUCT__entry(
> +		__field(struct kvm_vcpu *,		vcpu)
> +		__field(unsigned long,			esr_el2)
> +		__field(int,				type)
> +		__field(unsigned long,			spsr_el2)
> +		__field(unsigned long,			pc)
> +		__field(int,				source_mode)

NIT: target_mode is "unsigned long", but source_mode (above) is "int",
even though both are constrained by the same mask (PSR_MODE_MASK | PSR_MODE32_BIT).
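
I.e. something like:

		__field(unsigned long,			source_mode)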

Steve

> +		__field(unsigned long,			hcr_el2)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->vcpu = vcpu;
> +		__entry->esr_el2 = esr_el2;
> +		__entry->type = type;
> +		__entry->spsr_el2 = *vcpu_cpsr(vcpu);
> +		__entry->pc = *vcpu_pc(vcpu);
> +		__entry->source_mode = *vcpu_cpsr(vcpu) & (PSR_MODE_MASK | PSR_MODE32_BIT);
> +		__entry->hcr_el2 = __vcpu_sys_reg(vcpu, HCR_EL2);
> +	),
> +
> +	TP_printk("%s: esr_el2 0x%lx elr_el2: 0x%lx spsr_el2: 0x%08lx (M: %s) hcr_el2: %lx",
> +		  __print_symbolic(__entry->type, kvm_exception_type_names),
> +		  __entry->esr_el2, __entry->pc, __entry->spsr_el2,
> +		  __print_symbolic(__entry->source_mode, kvm_mode_names),
> +		  __entry->hcr_el2)
> +);
>  #endif /* _TRACE_ARM64_KVM_H */
>  
>  #undef TRACE_INCLUDE_PATH
> 



* Re: [PATCH 40/59] KVM: arm64: nv: Don't always start an S2 MMU search from the beginning
  2019-06-21  9:38 ` [PATCH 40/59] KVM: arm64: nv: Don't always start an S2 MMU search from the beginning Marc Zyngier
@ 2019-07-09  9:59   ` Alexandru Elisei
  0 siblings, 0 replies; 176+ messages in thread
From: Alexandru Elisei @ 2019-07-09  9:59 UTC (permalink / raw)
  To: Marc Zyngier, linux-arm-kernel, kvmarm, kvm; +Cc: Andre Przywara, Dave Martin

On 6/21/19 10:38 AM, Marc Zyngier wrote:
> Starting an S2 MMU search from the beginning every time means that
> we're potentially nuking a useful context (like the one we'd have
> for a !VHE KVM guest).
>
> Instead, let's always start the search from the point *after* the
> last allocated context. This should ensure that alternating between
> two EL1 contexts will not result in nuking the whole S2 each time.
>
> lookup_s2_mmu now has a chance to provide a hit.
>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  arch/arm64/include/asm/kvm_host.h |  1 +
>  arch/arm64/kvm/nested.c           | 14 ++++++++++++--
>  2 files changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index b71a7a237f95..b7c44adcdbf3 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -123,6 +123,7 @@ struct kvm_arch {
>  	 */
>  	struct kvm_s2_mmu *nested_mmus;
>  	size_t nested_mmus_size;
> +	int nested_mmus_next;

For consistency, shouldn't nested_mmus_next be zero initialized in
kvm_init_nested (arch/arm64/kvm/nested.c), like nested_mmus and
nested_mmus_size? Not a big deal either way, since struct kvm is allocated using
vzalloc.
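
I.e. something like this in kvm_init_nested() (sketch; the first two
assignments are presumably already there):

	kvm->arch.nested_mmus = NULL;
	kvm->arch.nested_mmus_size = 0;
	kvm->arch.nested_mmus_next = 0;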

>  	/* VTCR_EL2 value for this VM */
>  	u64    vtcr;
> diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
> index 09afafbdc8fe..214d59019935 100644
> --- a/arch/arm64/kvm/nested.c
> +++ b/arch/arm64/kvm/nested.c
> @@ -363,14 +363,24 @@ static struct kvm_s2_mmu *get_s2_mmu_nested(struct kvm_vcpu *vcpu)
>  	if (s2_mmu)
>  		goto out;
>  
> -	for (i = 0; i < kvm->arch.nested_mmus_size; i++) {
> -		s2_mmu = &kvm->arch.nested_mmus[i];
> +	/*
> +	 * Make sure we don't always search from the same point, or we
> +	 * will always reuse a potentially active context, leaving
> +	 * free contexts unused.
> +	 */
> +	for (i = kvm->arch.nested_mmus_next;
> +	     i < (kvm->arch.nested_mmus_size + kvm->arch.nested_mmus_next);
> +	     i++) {
> +		s2_mmu = &kvm->arch.nested_mmus[i % kvm->arch.nested_mmus_size];
>  
>  		if (atomic_read(&s2_mmu->refcnt) == 0)
>  			break;
>  	}
>  	BUG_ON(atomic_read(&s2_mmu->refcnt)); /* We have struct MMUs to spare */
>  
> +	/* Set the scene for the next search */
> +	kvm->arch.nested_mmus_next = (i + 1) % kvm->arch.nested_mmus_size;
> +
>  	if (kvm_s2_mmu_valid(s2_mmu)) {
>  		/* Clear the old state */
>  		kvm_unmap_stage2_range(s2_mmu, 0, kvm_phys_size(kvm));


* Re: [PATCH 43/59] KVM: arm64: nv: Trap and emulate AT instructions from virtual EL2
  2019-06-21  9:38 ` [PATCH 43/59] KVM: arm64: nv: Trap and emulate AT instructions from virtual EL2 Marc Zyngier
  2019-07-01 15:45   ` Julien Thierry
@ 2019-07-09 13:20   ` Alexandru Elisei
  2019-07-18 12:13     ` Tomasz Nowicki
  2019-07-24 10:25   ` Tomasz Nowicki
  2 siblings, 1 reply; 176+ messages in thread
From: Alexandru Elisei @ 2019-07-09 13:20 UTC (permalink / raw)
  To: Marc Zyngier, linux-arm-kernel, kvmarm, kvm; +Cc: Andre Przywara, Dave Martin

On 6/21/19 10:38 AM, Marc Zyngier wrote:
> From: Jintack Lim <jintack.lim@linaro.org>
>
> When supporting nested virtualization, a guest hypervisor executing AT
> instructions must be trapped and emulated by the host hypervisor,
> because untrapped AT instructions operating on S1E1 will use the wrong
> translation regieme (the one used to emulate virtual EL2 in EL1 instead

I think that should be "regime".

> of virtual EL1) and AT instructions operating on S12 will not work from
> EL1.
>
> This patch does several things.
>
> 1. List and define all AT system instructions to emulate and document
> the emulation design.
>
> 2. Implement AT instruction handling logic in EL2. This will be used to
> emulate AT instructions executed in the virtual EL2.
>
> AT instruction emulation works by loading the proper processor
> context, which depends on the trapped instruction and the virtual
> HCR_EL2, to the EL1 virtual memory control registers and executing AT
> instructions. Note that ctxt->hw_sys_regs is expected to have the
> proper processor context before calling the handling
> function (__kvm_at_insn) implemented in this patch.
>
> 4. Emulate AT S1E[01] instructions by issuing the same instructions in
> EL2. We set the physical EL1 registers, NV and NV1 bits as described in
> the AT instruction emulation overview.

Is item number 3 missing, or is that the result of an unfortunate typo?

>
> 5. Emulate AT S12E[01] instructions in two steps: First, do the stage-1
> translation by reusing the existing AT emulation functions.  Second, do
> the stage-2 translation by walking the guest hypervisor's stage-2 page
> table in software. Record the translation result to PAR_EL1.
>
> 6. Emulate AT S1E2 instructions by issuing the corresponding S1E1
> instructions in EL2. We set the physical EL1 registers and the HCR_EL2
> register as described in the AT instruction emulation overview.
>
> 7. Forward system instruction traps to the virtual EL2 if the corresponding
> virtual AT bit is set in the virtual HCR_EL2.
>
>   [ Much logic above has been reworked by Marc Zyngier ]
>
> Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
> ---
>  arch/arm64/include/asm/kvm_arm.h |   2 +
>  arch/arm64/include/asm/kvm_asm.h |   2 +
>  arch/arm64/include/asm/sysreg.h  |  17 +++
>  arch/arm64/kvm/hyp/Makefile      |   1 +
>  arch/arm64/kvm/hyp/at.c          | 217 +++++++++++++++++++++++++++++++
>  arch/arm64/kvm/hyp/switch.c      |  13 +-
>  arch/arm64/kvm/sys_regs.c        | 202 +++++++++++++++++++++++++++-
>  7 files changed, 450 insertions(+), 4 deletions(-)
>  create mode 100644 arch/arm64/kvm/hyp/at.c
>
> diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
> index 1e4dbe0b1c8e..9903f10f6343 100644
> --- a/arch/arm64/include/asm/kvm_arm.h
> +++ b/arch/arm64/include/asm/kvm_arm.h
> @@ -24,6 +24,7 @@
>  
>  /* Hyp Configuration Register (HCR) bits */
>  #define HCR_FWB		(UL(1) << 46)
> +#define HCR_AT		(UL(1) << 44)
>  #define HCR_NV1		(UL(1) << 43)
>  #define HCR_NV		(UL(1) << 42)
>  #define HCR_API		(UL(1) << 41)
> @@ -119,6 +120,7 @@
>  #define VTCR_EL2_TG0_16K	TCR_TG0_16K
>  #define VTCR_EL2_TG0_64K	TCR_TG0_64K
>  #define VTCR_EL2_SH0_MASK	TCR_SH0_MASK
> +#define VTCR_EL2_SH0_SHIFT	TCR_SH0_SHIFT
>  #define VTCR_EL2_SH0_INNER	TCR_SH0_INNER
>  #define VTCR_EL2_ORGN0_MASK	TCR_ORGN0_MASK
>  #define VTCR_EL2_ORGN0_WBWA	TCR_ORGN0_WBWA
> diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
> index 5e956c2cd9b4..1cfa4d2cf772 100644
> --- a/arch/arm64/include/asm/kvm_asm.h
> +++ b/arch/arm64/include/asm/kvm_asm.h
> @@ -69,6 +69,8 @@ extern void __kvm_tlb_flush_vmid(struct kvm_s2_mmu *mmu);
>  extern void __kvm_tlb_flush_local_vmid(struct kvm_vcpu *vcpu);
>  
>  extern void __kvm_timer_set_cntvoff(u32 cntvoff_low, u32 cntvoff_high);
> +extern void __kvm_at_s1e01(struct kvm_vcpu *vcpu, u32 op, u64 vaddr);
> +extern void __kvm_at_s1e2(struct kvm_vcpu *vcpu, u32 op, u64 vaddr);
>  
>  extern int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu);
>  
> diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
> index 8b95f2c42c3d..b3a8d21c07b3 100644
> --- a/arch/arm64/include/asm/sysreg.h
> +++ b/arch/arm64/include/asm/sysreg.h
> @@ -536,6 +536,23 @@
>  
>  #define SYS_SP_EL2			sys_reg(3, 6,  4, 1, 0)
>  
> +/* AT instructions */
> +#define AT_Op0 1
> +#define AT_CRn 7
> +
> +#define OP_AT_S1E1R	sys_insn(AT_Op0, 0, AT_CRn, 8, 0)
> +#define OP_AT_S1E1W	sys_insn(AT_Op0, 0, AT_CRn, 8, 1)
> +#define OP_AT_S1E0R	sys_insn(AT_Op0, 0, AT_CRn, 8, 2)
> +#define OP_AT_S1E0W	sys_insn(AT_Op0, 0, AT_CRn, 8, 3)
> +#define OP_AT_S1E1RP	sys_insn(AT_Op0, 0, AT_CRn, 9, 0)
> +#define OP_AT_S1E1WP	sys_insn(AT_Op0, 0, AT_CRn, 9, 1)
> +#define OP_AT_S1E2R	sys_insn(AT_Op0, 4, AT_CRn, 8, 0)
> +#define OP_AT_S1E2W	sys_insn(AT_Op0, 4, AT_CRn, 8, 1)
> +#define OP_AT_S12E1R	sys_insn(AT_Op0, 4, AT_CRn, 8, 4)
> +#define OP_AT_S12E1W	sys_insn(AT_Op0, 4, AT_CRn, 8, 5)
> +#define OP_AT_S12E0R	sys_insn(AT_Op0, 4, AT_CRn, 8, 6)
> +#define OP_AT_S12E0W	sys_insn(AT_Op0, 4, AT_CRn, 8, 7)
> +
>  /* Common SCTLR_ELx flags. */
>  #define SCTLR_ELx_DSSBS	(_BITUL(44))
>  #define SCTLR_ELx_ENIA	(_BITUL(31))
> diff --git a/arch/arm64/kvm/hyp/Makefile b/arch/arm64/kvm/hyp/Makefile
> index ea710f674cb6..f7af51647079 100644
> --- a/arch/arm64/kvm/hyp/Makefile
> +++ b/arch/arm64/kvm/hyp/Makefile
> @@ -19,6 +19,7 @@ obj-$(CONFIG_KVM_ARM_HOST) += entry.o
>  obj-$(CONFIG_KVM_ARM_HOST) += switch.o
>  obj-$(CONFIG_KVM_ARM_HOST) += fpsimd.o
>  obj-$(CONFIG_KVM_ARM_HOST) += tlb.o
> +obj-$(CONFIG_KVM_ARM_HOST) += at.o
>  obj-$(CONFIG_KVM_ARM_HOST) += hyp-entry.o
>  
>  # KVM code is run at a different exception code with a different map, so
> diff --git a/arch/arm64/kvm/hyp/at.c b/arch/arm64/kvm/hyp/at.c
> new file mode 100644
> index 000000000000..0e938b6f8e43
> --- /dev/null
> +++ b/arch/arm64/kvm/hyp/at.c
> @@ -0,0 +1,217 @@
> +/*
> + * Copyright (C) 2017 - Linaro Ltd
> + * Author: Jintack Lim <jintack.lim@linaro.org>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include <asm/kvm_hyp.h>
> +#include <asm/kvm_mmu.h>
> +
> +struct mmu_config {
> +	u64	ttbr0;
> +	u64	ttbr1;
> +	u64	tcr;
> +	u64	sctlr;
> +	u64	vttbr;
> +	u64	vtcr;
> +	u64	hcr;
> +};
> +
> +static void __mmu_config_save(struct mmu_config *config)
> +{
> +	config->ttbr0	= read_sysreg_el1(SYS_TTBR0);
> +	config->ttbr1	= read_sysreg_el1(SYS_TTBR1);
> +	config->tcr	= read_sysreg_el1(SYS_TCR);
> +	config->sctlr	= read_sysreg_el1(SYS_SCTLR);
> +	config->vttbr	= read_sysreg(vttbr_el2);
> +	config->vtcr	= read_sysreg(vtcr_el2);
> +	config->hcr	= read_sysreg(hcr_el2);
> +}
> +
> +static void __mmu_config_restore(struct mmu_config *config)
> +{
> +	write_sysreg_el1(config->ttbr0,	SYS_TTBR0);
> +	write_sysreg_el1(config->ttbr1,	SYS_TTBR1);
> +	write_sysreg_el1(config->tcr,	SYS_TCR);
> +	write_sysreg_el1(config->sctlr,	SYS_SCTLR);
> +	write_sysreg(config->vttbr,	vttbr_el2);
> +	write_sysreg(config->vtcr,	vtcr_el2);
> +	write_sysreg(config->hcr,	hcr_el2);
> +
> +	isb();
> +}
> +
> +void __kvm_at_s1e01(struct kvm_vcpu *vcpu, u32 op, u64 vaddr)
> +{
> +	struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
> +	struct mmu_config config;
> +	struct kvm_s2_mmu *mmu;
> +
> +	/*
> +	 * We can only get here when trapping from vEL2, so we're
> +	 * translating a guest's guest VA.
> +	 *
> +	 * FIXME: Obtaining the S2 MMU for a guest's guest is horribly
> +	 * racy, and we may not find it.
> +	 */
> +	spin_lock(&vcpu->kvm->mmu_lock);
> +
> +	mmu = lookup_s2_mmu(vcpu->kvm,
> +			    vcpu_read_sys_reg(vcpu, VTTBR_EL2),
> +			    vcpu_read_sys_reg(vcpu, HCR_EL2));

From ARM DDI 0487D.b, the description for AT S1E1R (page C5-467; it's the same
for the other AT S1E{0,1}* instructions):

[..] Performs stage 1 address translation, with permissions as if reading from
the given virtual address from EL1, or from EL2 [..], using the following
translation regime:
- If HCR_EL2.{E2H,TGE} is {1, 1}, the EL2&0 translation regime, accessed from EL2.

If the guest is VHE, I don't think there's any need to switch mmus. The AT
instruction will use the physical EL1&0 translation regime already on the
hardware (assuming host HCR_EL2.TGE == 0), which is the vEL2&0 regime for the
guest hypervisor.
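
I.e. something along these lines at the top of __kvm_at_s1e01() could
short-circuit the MMU switch entirely (rough sketch; the S1E1RP/WP PAN
variants would still need the existing handling):

	/*
	 * vHCR_EL2.{E2H,TGE} == {1,1}: the EL1&0 regime that is live on
	 * the CPU is the guest hypervisor's vEL2&0 regime, which is
	 * exactly the regime these AT instructions are defined to use.
	 */
	if (vcpu_el2_e2h_is_set(vcpu) && vcpu_el2_tge_is_set(vcpu)) {
		switch (op) {
		case OP_AT_S1E1R:
			asm volatile("at s1e1r, %0" : : "r" (vaddr));
			break;
		case OP_AT_S1E1W:
			asm volatile("at s1e1w, %0" : : "r" (vaddr));
			break;
		case OP_AT_S1E0R:
			asm volatile("at s1e0r, %0" : : "r" (vaddr));
			break;
		case OP_AT_S1E0W:
			asm volatile("at s1e0w, %0" : : "r" (vaddr));
			break;
		}
		isb();
		ctxt->sys_regs[PAR_EL1] = read_sysreg(par_el1);
		return;
	}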

> +
> +	if (WARN_ON(!mmu))
> +		goto out;
> +
> +	/* We've trapped, so everything is live on the CPU. */
> +	__mmu_config_save(&config);
> +
> +	write_sysreg_el1(ctxt->sys_regs[TTBR0_EL1],	SYS_TTBR0);
> +	write_sysreg_el1(ctxt->sys_regs[TTBR1_EL1],	SYS_TTBR1);
> +	write_sysreg_el1(ctxt->sys_regs[TCR_EL1],	SYS_TCR);
> +	write_sysreg_el1(ctxt->sys_regs[SCTLR_EL1],	SYS_SCTLR);
> +	write_sysreg(kvm_get_vttbr(mmu),		vttbr_el2);
> +	/* FIXME: write S2 MMU VTCR_EL2 */
> +	write_sysreg(config.hcr & ~HCR_TGE,		hcr_el2);
> +
> +	isb();
> +
> +	switch (op) {
> +	case OP_AT_S1E1R:
> +	case OP_AT_S1E1RP:
> +		asm volatile("at s1e1r, %0" : : "r" (vaddr));
> +		break;
> +	case OP_AT_S1E1W:
> +	case OP_AT_S1E1WP:
> +		asm volatile("at s1e1w, %0" : : "r" (vaddr));
> +		break;
> +	case OP_AT_S1E0R:
> +		asm volatile("at s1e0r, %0" : : "r" (vaddr));
> +		break;
> +	case OP_AT_S1E0W:
> +		asm volatile("at s1e0w, %0" : : "r" (vaddr));
> +		break;
> +	default:
> +		WARN_ON(1);
> +		break;
> +	}
> +
> +	isb();
> +
> +	ctxt->sys_regs[PAR_EL1] = read_sysreg(par_el1);
> +
> +	/*
> +	 * Failed? Let's leave the building now.
> +	 *
> +	 * FIXME: how about a failed translation because the shadow S2
> +	 * wasn't populated? We may need to perform a SW PTW,
> +	 * populating our shadow S2 and retry the instruction.
> +	 */

I think this can also fail if the L2 IPA is not in the L1 guest stage 2 tables
(and therefore not in the shadow stage 2 tables). At that point we should stop
and fail the AT instruction emulation.
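
A very rough sketch of how the two cases could be told apart at the FIXME
(assuming the IPA the stage 1 walk aborted on is made available, which
would itself require a SW stage 1 walk):

	struct kvm_s2_trans out = {};

	/* The L2 IPA isn't mapped in the vS2 at all: a genuine abort */
	if (kvm_walk_nested_s2(vcpu, ipa, &out) < 0 || out.esr)
		goto nopan;

	/* Otherwise only our shadow S2 is stale: populate it and retry */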

Thanks,
Alex
> +	if (ctxt->sys_regs[PAR_EL1] & 1)
> +		goto nopan;
> +
> +	/* No PAN? No problem. */
> +	if (!(*vcpu_cpsr(vcpu) & PSR_PAN_BIT))
> +		goto nopan;
> +
> +	/*
> +	 * For PAN-involved AT operations, perform the same
> +	 * translation, using EL0 this time.
> +	 */
> +	switch (op) {
> +	case OP_AT_S1E1RP:
> +		asm volatile("at s1e0r, %0" : : "r" (vaddr));
> +		break;
> +	case OP_AT_S1E1WP:
> +		asm volatile("at s1e0w, %0" : : "r" (vaddr));
> +		break;
> +	default:
> +		goto nopan;
> +	}
> +
> +	/*
> +	 * If the EL0 translation has succeeded, we need to pretend
> +	 * the AT operation has failed, as the PAN setting forbids
> +	 * such a translation.
> +	 *
> +	 * FIXME: we hardcode a Level-3 permission fault. We really
> +	 * should return the real fault level.
> +	 */
> +	if (!(read_sysreg(par_el1) & 1))
> +		ctxt->sys_regs[PAR_EL1] = 0x1f;
> +
> +nopan:
> +	__mmu_config_restore(&config);
> +
> +out:
> +	spin_unlock(&vcpu->kvm->mmu_lock);
> +}
> +
> +void __kvm_at_s1e2(struct kvm_vcpu *vcpu, u32 op, u64 vaddr)
> +{
> +	struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
> +	struct mmu_config config;
> +	struct kvm_s2_mmu *mmu;
> +	u64 val;
> +
> +	spin_lock(&vcpu->kvm->mmu_lock);
> +
> +	mmu = &vcpu->kvm->arch.mmu;
> +
> +	/* We've trapped, so everything is live on the CPU. */
> +	__mmu_config_save(&config);
> +
> +	if (vcpu_el2_e2h_is_set(vcpu)) {
> +		write_sysreg_el1(ctxt->sys_regs[TTBR0_EL2],	SYS_TTBR0);
> +		write_sysreg_el1(ctxt->sys_regs[TTBR1_EL2],	SYS_TTBR1);
> +		write_sysreg_el1(ctxt->sys_regs[TCR_EL2],	SYS_TCR);
> +		write_sysreg_el1(ctxt->sys_regs[SCTLR_EL2],	SYS_SCTLR);
> +
> +		val = config.hcr;
> +	} else {
> +		write_sysreg_el1(ctxt->sys_regs[TTBR0_EL2],	SYS_TTBR0);
> +		write_sysreg_el1(translate_tcr(ctxt->sys_regs[TCR_EL2]),
> +				 SYS_TCR);
> +		write_sysreg_el1(translate_sctlr(ctxt->sys_regs[SCTLR_EL2]),
> +				 SYS_SCTLR);
> +
> +		val = config.hcr | HCR_NV | HCR_NV1;
> +	}
> +
> +	write_sysreg(kvm_get_vttbr(mmu),		vttbr_el2);
> +	/* FIXME: write S2 MMU VTCR_EL2 */
> +	write_sysreg(val & ~HCR_TGE,			hcr_el2);
> +
> +	isb();
> +
> +	switch (op) {
> +	case OP_AT_S1E2R:
> +		asm volatile("at s1e1r, %0" : : "r" (vaddr));
> +		break;
> +	case OP_AT_S1E2W:
> +		asm volatile("at s1e1w, %0" : : "r" (vaddr));
> +		break;
> +	default:
> +		WARN_ON(1);
> +		break;
> +	}
> +
> +	isb();
> +
> +	/* FIXME: handle failed translation due to shadow S2 */
> +	ctxt->sys_regs[PAR_EL1] = read_sysreg(par_el1);
> +
> +	__mmu_config_restore(&config);
> +	spin_unlock(&vcpu->kvm->mmu_lock);
> +}
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index fb479c71b521..bd9fc0dae8e8 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -143,9 +143,10 @@ static void __hyp_text __activate_traps(struct kvm_vcpu *vcpu)
>  		if (!vcpu_el2_e2h_is_set(vcpu)) {
>  			/*
>  			 * For a guest hypervisor on v8.0, trap and emulate
> -			 * the EL1 virtual memory control register accesses.
> +			 * the EL1 virtual memory control register accesses
> +			 * as well as the AT S1 operations.
>  			 */
> -			hcr |= HCR_TVM | HCR_TRVM | HCR_NV1;
> +			hcr |= HCR_TVM | HCR_TRVM | HCR_AT | HCR_NV1;
>  		} else {
>  			/*
>  			 * For a guest hypervisor on v8.1 (VHE), allow to
> @@ -168,6 +169,14 @@ static void __hyp_text __activate_traps(struct kvm_vcpu *vcpu)
>  			hcr &= ~HCR_TVM;
>  
>  			hcr |= vhcr_el2 & (HCR_TVM | HCR_TRVM);
> +
> +			/*
> +			 * If we're using the EL1 translation regime
> +			 * (TGE clear), then ensure that AT S1 ops are
> +			 * trapped too.
> +			 */
> +			if (!vcpu_el2_tge_is_set(vcpu))
> +				hcr |= HCR_AT;
>  		}
>  	}
>  
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 0d5b7a7c76de..102419b837e8 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -1656,6 +1656,11 @@ static bool access_sp_el1(struct kvm_vcpu *vcpu,
>  	return true;
>  }
>  
> +static bool forward_at_traps(struct kvm_vcpu *vcpu)
> +{
> +	return forward_traps(vcpu, HCR_AT);
> +}
> +
>  /* This function is to support the recursive nested virtualization */
>  static bool forward_nv1_traps(struct kvm_vcpu *vcpu, struct sys_reg_params *p)
>  {
> @@ -2135,12 +2140,205 @@ static const struct sys_reg_desc sys_reg_descs[] = {
>  	{ SYS_DESC(SYS_SP_EL2), NULL, reset_unknown, SP_EL2 },
>  };
>  
> -#define SYS_INSN_TO_DESC(insn, access_fn, forward_fn)	\
> -	{ SYS_DESC((insn)), (access_fn), NULL, 0, 0, NULL, NULL, (forward_fn) }
> +static bool handle_s1e01(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
> +			 const struct sys_reg_desc *r)
> +{
> +	int sys_encoding = sys_insn(p->Op0, p->Op1, p->CRn, p->CRm, p->Op2);
> +
> +	__kvm_at_s1e01(vcpu, sys_encoding, p->regval);
> +
> +	return true;
> +}
> +
> +static bool handle_s1e2(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
> +			const struct sys_reg_desc *r)
> +{
> +	int sys_encoding = sys_insn(p->Op0, p->Op1, p->CRn, p->CRm, p->Op2);
> +
> +	__kvm_at_s1e2(vcpu, sys_encoding, p->regval);
> +
> +	return true;
> +}
> +
> +static u64 setup_par_aborted(u32 esr)
> +{
> +	u64 par = 0;
> +
> +	/* S [9]: fault in the stage 2 translation */
> +	par |= (1 << 9);
> +	/* FST [6:1]: Fault status code  */
> +	par |= (esr << 1);
> +	/* F [0]: translation is aborted */
> +	par |= 1;
> +
> +	return par;
> +}
> +
> +static u64 setup_par_completed(struct kvm_vcpu *vcpu, struct kvm_s2_trans *out)
> +{
> +	u64 par, vtcr_sh0;
> +
> +	/* F [0]: Translation is completed successfully */
> +	par = 0;
> +	/* ATTR [63:56] */
> +	par |= out->upper_attr;
> +	/* PA [47:12] */
> +	par |= out->output & GENMASK_ULL(47, 12);
> +	/* RES1 [11] */
> +	par |= (1UL << 11);
> +	/* SH [8:7]: Shareability attribute */
> +	vtcr_sh0 = vcpu_read_sys_reg(vcpu, VTCR_EL2) & VTCR_EL2_SH0_MASK;
> +	par |= (vtcr_sh0 >> VTCR_EL2_SH0_SHIFT) << 7;
> +
> +	return par;
> +}
> +
> +static bool handle_s12(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
> +		       const struct sys_reg_desc *r, bool write)
> +{
> +	u64 par, va;
> +	u32 esr;
> +	phys_addr_t ipa;
> +	struct kvm_s2_trans out;
> +	int ret;
> +
> +	/* Do the stage-1 translation */
> +	handle_s1e01(vcpu, p, r);
> +	par = vcpu_read_sys_reg(vcpu, PAR_EL1);
> +	if (par & 1) {
> +		/* The stage-1 translation aborted */
> +		return true;
> +	}
> +
> +	/* Do the stage-2 translation */
> +	va = p->regval;
> +	ipa = (par & GENMASK_ULL(47, 12)) | (va & GENMASK_ULL(11, 0));
> +	out.esr = 0;
> +	ret = kvm_walk_nested_s2(vcpu, ipa, &out);
> +	if (ret < 0)
> +		return false;
> +
> +	/* Check if the stage-2 PTW is aborted */
> +	if (out.esr) {
> +		esr = out.esr;
> +		goto s2_trans_abort;
> +	}
> +
> +	/* Check the access permission */
> +	if ((!write && !out.readable) || (write && !out.writable)) {
> +		esr = ESR_ELx_FSC_PERM;
> +		esr |= out.level & 0x3;
> +		goto s2_trans_abort;
> +	}
> +
> +	vcpu_write_sys_reg(vcpu, setup_par_completed(vcpu, &out), PAR_EL1);
> +	return true;
> +
> +s2_trans_abort:
> +	vcpu_write_sys_reg(vcpu, setup_par_aborted(esr), PAR_EL1);
> +	return true;
> +}
> +
> +static bool handle_s12r(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
> +			const struct sys_reg_desc *r)
> +{
> +	return handle_s12(vcpu, p, r, false);
> +}
> +
> +static bool handle_s12w(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
> +			const struct sys_reg_desc *r)
> +{
> +	return handle_s12(vcpu, p, r, true);
> +}
> +
> +/*
> + * AT instruction emulation
> + *
> + * We emulate AT instructions executed in the virtual EL2.
> + * Basic strategy for the stage-1 translation emulation is to load proper
> + * context, which depends on the trapped instruction and the virtual HCR_EL2,
> + * to the EL1 virtual memory control registers and execute S1E[01] instructions
> + * in EL2. See below for more detail.
> + *
> + * For the stage-2 translation, which is necessary for S12E[01] emulation,
> + * we walk the guest hypervisor's stage-2 page table in software.
> + *
> + * The stage-1 translation emulations can be divided into two groups depending
> + * on the translation regime.
> + *
> + * 1. EL2 AT instructions: S1E2x
> + * +-----------------------------------------------------------------------+
> + * |                             |         Setting for the emulation       |
> + * | Virtual HCR_EL2.E2H on trap |-----------------------------------------+
> + * |                             | Phys EL1 regs | Phys NV, NV1 | Phys TGE |
> + * |-----------------------------------------------------------------------|
> + * |             0               |     vEL2      |    (1, 1)    |    0     |
> + * |             1               |     vEL2      |    (0, 0)    |    0     |
> + * +-----------------------------------------------------------------------+
> + *
> + * We emulate the EL2 AT instructions by loading virtual EL2 context
> + * to the EL1 virtual memory control registers and executing corresponding
> + * EL1 AT instructions.
> + *
> + * We set physical NV and NV1 bits to use EL2 page table format for non-VHE
> + * guest hypervisor (i.e. HCR_EL2.E2H == 0). As a VHE guest hypervisor uses the
> + * EL1 page table format, we don't set those bits.
> + *
> + * We should clear the physical TGE bit so as not to use the EL2 translation
> + * regime when the host uses the VHE feature.
> + *
> + *
> + * 2. EL0/EL1 AT instructions: S1E[01]x, S12E1x
> + * +----------------------------------------------------------------------+
> + * |   Virtual HCR_EL2 on trap  |        Setting for the emulation        |
> + * |----------------------------------------------------------------------+
> + * | (vE2H, vTGE) | (vNV, vNV1) | Phys EL1 regs | Phys NV, NV1 | Phys TGE |
> + * |----------------------------------------------------------------------|
> + * |    (0, 0)*   |   (0, 0)    |      vEL1     |    (0, 0)    |    0     |
> + * |    (0, 0)    |   (1, 1)    |      vEL1     |    (1, 1)    |    0     |
> + * |    (1, 1)    |   (0, 0)    |      vEL2     |    (0, 0)    |    0     |
> + * |    (1, 1)    |   (1, 1)    |      vEL2     |    (1, 1)    |    0     |
> + * +----------------------------------------------------------------------+
> + *
> + * *For (0, 0) in the 'Virtual HCR_EL2 on trap' column, the value actually
> + *  means (1, 1). We keep it as (0, 0) just for readability.
> + *
> + * We set physical EL1 virtual memory control registers depending on
> + * (vE2H, vTGE) pair. When the pair is (0, 0) where AT instructions are
> + * supposed to use EL0/EL1 translation regime, we load the EL1 registers with
> + * the virtual EL1 registers (i.e. EL1 registers from the guest hypervisor's
> + * point of view). When the pair is (1, 1), however, AT instructions are
> + * defined to apply to the EL2 translation regime. To emulate this, we load
> + * the EL1 registers with the virtual EL2 context (i.e. the shadow registers).
> + *
> + * We respect the virtual NV and NV1 bits for the emulation. When those bits
> + * are set, the guest hypervisor wants to use the EL2 page table format
> + * for the EL1 translation regime. We emulate this by setting the physical
> + * NV and NV1 bits.
> + */
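As a concrete reading of the second table, here is a minimal sketch of the
setting selection (the helper name and its boolean interface are hypothetical,
purely illustrative — the patch wires this into the AT handlers instead):

/* Hypothetical sketch: derive the physical settings from table 2 above */
static void sketch_s1e01_settings(bool vE2H, bool vTGE, bool vNV, bool vNV1,
				  bool *use_vel2_regs, bool *phys_nv,
				  bool *phys_nv1, bool *phys_tge)
{
	/* (vE2H, vTGE) == (1, 1): EL2&0 regime, so load the vEL2 context */
	*use_vel2_regs = vE2H && vTGE;
	/* Mirror the guest's NV/NV1 so the walk uses the right PT format */
	*phys_nv = vNV;
	*phys_nv1 = vNV1;
	/* Never let the emulated walk use the host's EL2 regime */
	*phys_tge = false;
}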
> +
> +#define SYS_INSN_TO_DESC(insn, access_fn, forward_fn)			\
> +	{ SYS_DESC(OP_##insn), (access_fn), NULL, 0, 0,			\
> +	  NULL, NULL, (forward_fn) }
>  static struct sys_reg_desc sys_insn_descs[] = {
>  	{ SYS_DESC(SYS_DC_ISW), access_dcsw },
> +
> +	SYS_INSN_TO_DESC(AT_S1E1R, handle_s1e01, forward_at_traps),
> +	SYS_INSN_TO_DESC(AT_S1E1W, handle_s1e01, forward_at_traps),
> +	SYS_INSN_TO_DESC(AT_S1E0R, handle_s1e01, forward_at_traps),
> +	SYS_INSN_TO_DESC(AT_S1E0W, handle_s1e01, forward_at_traps),
> +	SYS_INSN_TO_DESC(AT_S1E1RP, handle_s1e01, forward_at_traps),
> +	SYS_INSN_TO_DESC(AT_S1E1WP, handle_s1e01, forward_at_traps),
> +
>  	{ SYS_DESC(SYS_DC_CSW), access_dcsw },
>  	{ SYS_DESC(SYS_DC_CISW), access_dcsw },
> +
> +	SYS_INSN_TO_DESC(AT_S1E2R, handle_s1e2, forward_nv_traps),
> +	SYS_INSN_TO_DESC(AT_S1E2W, handle_s1e2, forward_nv_traps),
> +	SYS_INSN_TO_DESC(AT_S12E1R, handle_s12r, forward_nv_traps),
> +	SYS_INSN_TO_DESC(AT_S12E1W, handle_s12w, forward_nv_traps),
> +	SYS_INSN_TO_DESC(AT_S12E0R, handle_s12r, forward_nv_traps),
> +	SYS_INSN_TO_DESC(AT_S12E0W, handle_s12w, forward_nv_traps),
>  };
>  
>  static bool trap_dbgidr(struct kvm_vcpu *vcpu,


* Re: [PATCH 44/59] KVM: arm64: nv: Trap and emulate TLBI instructions from virtual EL2
  2019-06-21  9:38 ` [PATCH 44/59] KVM: arm64: nv: Trap and emulate TLBI " Marc Zyngier
  2019-07-02 12:37   ` Julien Thierry
@ 2019-07-10 10:15   ` Alexandru Elisei
  1 sibling, 0 replies; 176+ messages in thread
From: Alexandru Elisei @ 2019-07-10 10:15 UTC (permalink / raw)
  To: Marc Zyngier, linux-arm-kernel, kvmarm, kvm; +Cc: Andre Przywara, Dave Martin

On 6/21/19 10:38 AM, Marc Zyngier wrote:
> From: Jintack Lim <jintack.lim@linaro.org>
>
> When supporting nested virtualization, a guest hypervisor executing TLBI
> instructions must be trapped and emulated by the host hypervisor,
> because the guest hypervisor can only affect physical TLB entries
> relating to its own execution environment (virtual EL2 in EL1), not
> those of the nested guests, as the semantics of the instructions
> require. TLBI instructions might also result in updates (invalidations)
> to the shadow page tables.
>
> This patch does several things.
>
> 1. List and define all TLBI system instructions to emulate.
>
> 2. Emulate the TLBI ALLE2(IS) instruction executed in the virtual EL2. Since
> we emulate the virtual EL2 in EL1, we invalidate EL1&0 regime stage 1 TLB
> entries after setting vttbr_el2 to the VMID assigned to the virtual EL2.
>
> 3. Emulate TLBI VAE2* instruction executed in the virtual EL2. Based on the
> same principle as TLBI ALLE2 instruction, we can simply emulate those
> instructions by executing corresponding VAE1* instructions with the
> virtual EL2's VMID assigned by the host hypervisor.
>
> Note that we are able to emulate TLBI ALLE2IS precisely by only
> invalidating stage 1 TLB entries via TLBI VMALL1IS instruction, but to
> make it simeple, we reuse the existing function, __kvm_tlb_flush_vmid(),

s/simeple/simple

> which invalidates both of stage 1 and 2 TLB entries.
>
> 4. TLBI ALLE1(IS) instruction invalidates all EL1&0 regime stage 1 and 2
> TLB entries (on all PEs in the same Inner Shareable domain). To emulate
> these instructions, we first need to clear all the mappings in the
> shadow page tables since executing those instructions implies the change
> of mappings in the stage 2 page tables maintained by the guest
> hypervisor.  We then need to invalidate all EL1&0 regime stage 1 and 2
> TLB entries of all VMIDs, which are assigned by the host hypervisor, for
> this VM.
>
> 5. Based on the same principle as TLBI ALLE1(IS) emulation, we clear the
> mappings in the shadow stage-2 page tables and invalidate TLB entries.
> But this time we do it only for the current VMID from the guest
> hypervisor's perspective, not for all VMIDs.
>
> 6. Based on the same principle as TLBI ALLE1(IS) and TLBI VMALLS12E1(IS)
> emulation, we clear the mappings in the shadow stage-2 page tables and
> invalidate TLB entries. We do it only for one mapping for the current
> VMID from the guest hypervisor's view.
>
> 7. Forward system instruction traps to the virtual EL2 if a
> corresponding bit in the virtual HCR_EL2 is set.
>
> 8. Even though a guest hypervisor can execute TLBI instructions that are
> accessible at EL1 without trapping, doing so would be wrong; all those
> TLBI instructions work on the current VMID, and when running a guest
> hypervisor the current VMID is the guest hypervisor's own, not the one
> derived from the virtual vttbr_el2. So letting a guest hypervisor execute
> those TLBI instructions results in it invalidating its own TLB entries
> and leaving the TLB entries it meant to invalidate untouched.
>
> Therefore we trap and emulate those TLBI instructions. The emulation is
> simple; we find a shadow VMID mapped to the virtual vttbr_el2, set it in
> the physical vttbr_el2, then execute the same instruction in EL2.
>
> We don't set HCR_EL2.TTLB bit yet.
>
>   [ Changes performed by Marc Zyngier:
>
>     The TLBI handling code more or less directly executes the same
>     instruction that has been trapped (with an EL2->EL1 conversion
>     in the case of an EL2 TLBI), but that's unfortunately not enough:
>
>     - TLBIs must be upgraded to the Inner Shareable domain to account
>       for vcpu migration, just like we already have with HCR_EL2.FB.
>
>     - The DSB instruction that synchronises these must thus be on
>       the Inner Shareable domain as well.
>
>     - Prior to executing the TLBI, we need another DSB ISHST to make
>       sure that the update to the page tables is now visible.
>
>       Ordering of system instructions fixed
>
>     - The current TLB invalidation code is pretty buggy, as it assumes a
>       page mapping. On the contrary, it is likely that TLB invalidation
>       will cover more than a single page, and the size should be decided
>       by the guest's configuration (and not the host's).
>
>       Since we don't cache the guest mapping sizes in the shadow PT yet,
>       let's assume the worst case (a block mapping) and invalidate that.
>
>       Take this opportunity to fix the decoding of the parameter (it
>       isn't a straight IPA).
>
>     - In general, we always emulate local TLB invalidations as being
>       upgraded to the Inner Shareable domain so that we can easily
>       deal with vcpu migration. This is consistent with the fact that
>       we set HCR_EL2.FB when running non-nested VMs.
>
>       So let's emulate TLBI ALLE2 as ALLE2IS.
>   ]
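Put together, the ordering constraints listed above boil down to a sequence
like this (mirroring what the patch does in hyp/tlb.c below; shown here only
to make the barrier placement explicit):

	dsb(ishst);		/* make the S2 PT update visible first */
	__tlbi(vae1is, va);	/* TLBI upgraded to Inner Shareable */
	dsb(ish);		/* complete the TLBI on all CPUs */
	isb();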
>
>   [ Changes performed by Christoffer Dall:
>
>     Sometimes when we are invalidating the TLB for a certain S2 MMU
>     context, this context can also have EL2 context associated with it
>     and we have to invalidate this too.
>   ]
>
> Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
> ---
>  arch/arm64/include/asm/kvm_asm.h |   2 +
>  arch/arm64/include/asm/sysreg.h  |  36 ++++++
>  arch/arm64/kvm/hyp/tlb.c         |  81 +++++++++++++
>  arch/arm64/kvm/sys_regs.c        | 201 +++++++++++++++++++++++++++++++
>  virt/kvm/arm/mmu.c               |  18 ++-
>  5 files changed, 337 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
> index 1cfa4d2cf772..9cb9ab066ebc 100644
> --- a/arch/arm64/include/asm/kvm_asm.h
> +++ b/arch/arm64/include/asm/kvm_asm.h
> @@ -67,6 +67,8 @@ extern void __kvm_flush_vm_context(void);
>  extern void __kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu, phys_addr_t ipa);
>  extern void __kvm_tlb_flush_vmid(struct kvm_s2_mmu *mmu);
>  extern void __kvm_tlb_flush_local_vmid(struct kvm_vcpu *vcpu);
> +extern void __kvm_tlb_vae2(struct kvm_s2_mmu *mmu, u64 va, u64 sys_encoding);
> +extern void __kvm_tlb_el1_instr(struct kvm_s2_mmu *mmu, u64 val, u64 sys_encoding);
>  
>  extern void __kvm_timer_set_cntvoff(u32 cntvoff_low, u32 cntvoff_high);
>  extern void __kvm_at_s1e01(struct kvm_vcpu *vcpu, u32 op, u64 vaddr);
> diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
> index b3a8d21c07b3..e0912ececd92 100644
> --- a/arch/arm64/include/asm/sysreg.h
> +++ b/arch/arm64/include/asm/sysreg.h
> @@ -553,6 +553,42 @@
>  #define OP_AT_S12E0R	sys_insn(AT_Op0, 4, AT_CRn, 8, 6)
>  #define OP_AT_S12E0W	sys_insn(AT_Op0, 4, AT_CRn, 8, 7)
>  
> +/* TLBI instructions */
> +#define TLBI_Op0	1
> +#define TLBI_Op1_EL1	0	/* Accessible from EL1 or higher */
> +#define TLBI_Op1_EL2	4	/* Accessible from EL2 or higher */
> +#define TLBI_CRn	8
> +#define tlbi_insn_el1(CRm, Op2)	sys_insn(TLBI_Op0, TLBI_Op1_EL1, TLBI_CRn, (CRm), (Op2))
> +#define tlbi_insn_el2(CRm, Op2)	sys_insn(TLBI_Op0, TLBI_Op1_EL2, TLBI_CRn, (CRm), (Op2))
> +
> +#define OP_TLBI_VMALLE1IS	tlbi_insn_el1(3, 0)
> +#define OP_TLBI_VAE1IS		tlbi_insn_el1(3, 1)
> +#define OP_TLBI_ASIDE1IS	tlbi_insn_el1(3, 2)
> +#define OP_TLBI_VAAE1IS		tlbi_insn_el1(3, 3)
> +#define OP_TLBI_VALE1IS		tlbi_insn_el1(3, 5)
> +#define OP_TLBI_VAALE1IS	tlbi_insn_el1(3, 7)
> +#define OP_TLBI_VMALLE1		tlbi_insn_el1(7, 0)
> +#define OP_TLBI_VAE1		tlbi_insn_el1(7, 1)
> +#define OP_TLBI_ASIDE1		tlbi_insn_el1(7, 2)
> +#define OP_TLBI_VAAE1		tlbi_insn_el1(7, 3)
> +#define OP_TLBI_VALE1		tlbi_insn_el1(7, 5)
> +#define OP_TLBI_VAALE1		tlbi_insn_el1(7, 7)
> +
> +#define OP_TLBI_IPAS2E1IS	tlbi_insn_el2(0, 1)
> +#define OP_TLBI_IPAS2LE1IS	tlbi_insn_el2(0, 5)
> +#define OP_TLBI_ALLE2IS		tlbi_insn_el2(3, 0)
> +#define OP_TLBI_VAE2IS		tlbi_insn_el2(3, 1)
> +#define OP_TLBI_ALLE1IS		tlbi_insn_el2(3, 4)
> +#define OP_TLBI_VALE2IS		tlbi_insn_el2(3, 5)
> +#define OP_TLBI_VMALLS12E1IS	tlbi_insn_el2(3, 6)
> +#define OP_TLBI_IPAS2E1		tlbi_insn_el2(4, 1)
> +#define OP_TLBI_IPAS2LE1	tlbi_insn_el2(4, 5)
> +#define OP_TLBI_ALLE2		tlbi_insn_el2(7, 0)
> +#define OP_TLBI_VAE2		tlbi_insn_el2(7, 1)
> +#define OP_TLBI_ALLE1		tlbi_insn_el2(7, 4)
> +#define OP_TLBI_VALE2		tlbi_insn_el2(7, 5)
> +#define OP_TLBI_VMALLS12E1	tlbi_insn_el2(7, 6)
> +
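As a worked example of the encoding scheme, the expansions below follow
directly from the defines above and the sys_insn() packing:

/*
 * OP_TLBI_VMALLE1IS = tlbi_insn_el1(3, 0) = sys_insn(1, 0, 8, 3, 0)
 *                     -> TLBI VMALLE1IS (Op0=1, Op1=0, CRn=8, CRm=3, Op2=0)
 * OP_TLBI_ALLE2     = tlbi_insn_el2(7, 0) = sys_insn(1, 4, 8, 7, 0)
 *                     -> TLBI ALLE2     (Op0=1, Op1=4, CRn=8, CRm=7, Op2=0)
 */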
>  /* Common SCTLR_ELx flags. */
>  #define SCTLR_ELx_DSSBS	(_BITUL(44))
>  #define SCTLR_ELx_ENIA	(_BITUL(31))
> diff --git a/arch/arm64/kvm/hyp/tlb.c b/arch/arm64/kvm/hyp/tlb.c
> index 779405db3fb3..026afbf1a697 100644
> --- a/arch/arm64/kvm/hyp/tlb.c
> +++ b/arch/arm64/kvm/hyp/tlb.c
> @@ -205,3 +205,84 @@ void __hyp_text __kvm_flush_vm_context(void)
>  	asm volatile("ic ialluis" : : );
>  	dsb(ish);
>  }
> +
> +void __hyp_text __kvm_tlb_vae2(struct kvm_s2_mmu *mmu, u64 va, u64 sys_encoding)
> +{
> +	struct tlb_inv_context cxt;
> +
> +	dsb(ishst);
> +
> +	/* Switch to requested VMID */
> +	__tlb_switch_to_guest()(mmu, &cxt);
> +
> +	/*
> +	 * Execute the EL1 version of TLBI VAE2* instruction, forcing
> +	 * an upgrade to the Inner Shareable domain in order to
> +	 * perform the invalidation on all CPUs.
> +	 */
> +	switch (sys_encoding) {
> +	case OP_TLBI_VAE2:
> +	case OP_TLBI_VAE2IS:
> +		__tlbi(vae1is, va);
> +		break;
> +	case OP_TLBI_VALE2:
> +	case OP_TLBI_VALE2IS:
> +		__tlbi(vale1is, va);
> +		break;
> +	default:
> +		break;
> +	}
> +	dsb(ish);
> +	isb();
> +
> +	__tlb_switch_to_host()(&cxt);
> +}
> +
> +void __hyp_text __kvm_tlb_el1_instr(struct kvm_s2_mmu *mmu, u64 val, u64 sys_encoding)
> +{
> +	struct tlb_inv_context cxt;
> +
> +	dsb(ishst);
> +
> +	/* Switch to requested VMID */
> +	__tlb_switch_to_guest()(mmu, &cxt);
> +
> +	/*
> +	 * Execute the same instruction as the guest hypervisor did,
> +	 * expanding the scope of local TLB invalidations to the Inner
> +	 * Shareable domain so that it takes place on all CPUs. This
> +	 * is equivalent to having HCR_EL2.FB set.
> +	 */
> +	switch (sys_encoding) {
> +	case OP_TLBI_VMALLE1:
> +	case OP_TLBI_VMALLE1IS:
> +		__tlbi(vmalle1is);
> +		break;
> +	case OP_TLBI_VAE1:
> +	case OP_TLBI_VAE1IS:
> +		__tlbi(vae1is, val);
> +		break;
> +	case OP_TLBI_ASIDE1:
> +	case OP_TLBI_ASIDE1IS:
> +		__tlbi(aside1is, val);
> +		break;
> +	case OP_TLBI_VAAE1:
> +	case OP_TLBI_VAAE1IS:
> +		__tlbi(vaae1is, val);
> +		break;
> +	case OP_TLBI_VALE1:
> +	case OP_TLBI_VALE1IS:
> +		__tlbi(vale1is, val);
> +		break;
> +	case OP_TLBI_VAALE1:
> +	case OP_TLBI_VAALE1IS:
> +		__tlbi(vaale1is, val);
> +		break;
> +	default:
> +		break;
> +	}
> +	dsb(ish);
> +	isb();
> +
> +	__tlb_switch_to_host()(&cxt);
> +}
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 102419b837e8..0343682fe47f 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -1661,6 +1661,11 @@ static bool forward_at_traps(struct kvm_vcpu *vcpu)
>  	return forward_traps(vcpu, HCR_AT);
>  }
>  
> +static bool forward_ttlb_traps(struct kvm_vcpu *vcpu)
> +{
> +	return forward_traps(vcpu, HCR_TTLB);
> +}
> +
>  /* This function is to support the recursive nested virtualization */
>  static bool forward_nv1_traps(struct kvm_vcpu *vcpu, struct sys_reg_params *p)
>  {
> @@ -2251,6 +2256,174 @@ static bool handle_s12w(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
>  	return handle_s12(vcpu, p, r, true);
>  }
>  
> +static bool handle_alle2is(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
> +			   const struct sys_reg_desc *r)
> +{
> +	/*
> +	 * To emulate invalidating all EL2 regime stage 1 TLB entries for all
> +	 * PEs, executing TLBI VMALLE1IS is enough. But we reuse the existing
> +	 * interface for simplicity; invalidating stage 2 entries doesn't
> +	 * affect correctness.
> +	 */
> +	kvm_call_hyp(__kvm_tlb_flush_vmid, &vcpu->kvm->arch.mmu);
> +	return true;
> +}
> +
> +static bool handle_vae2(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
> +		       const struct sys_reg_desc *r)

This is nitpicking, but the function should really be named handle_vae2is
because __kvm_tlb_vae2 upgrades the tlb operation to inner shareable. All the
functions that upgrade the operation to inner shareable have the "is" suffix and
when reading the patch my first impression was that the above function doesn't
do the instruction upgrade.

> +{
> +	int sys_encoding = sys_insn(p->Op0, p->Op1, p->CRn, p->CRm, p->Op2);
> +
> +	/*
> +	 * Based on the same principle as TLBI ALLE2 instruction emulation, we
> +	 * emulate TLBI VAE2* instructions by executing corresponding TLBI VAE1*
> +	 * instructions with the virtual EL2's VMID assigned by the host
> +	 * hypervisor.
> +	 */
> +	kvm_call_hyp(__kvm_tlb_vae2, &vcpu->kvm->arch.mmu,
> +		     p->regval, sys_encoding);
> +	return true;
> +}
> +
> +static bool handle_alle1is(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
> +			   const struct sys_reg_desc *r)
> +{
> +	struct kvm_s2_mmu *mmu = &vcpu->kvm->arch.mmu;
> +	spin_lock(&vcpu->kvm->mmu_lock);
> +
> +	/*
> +	 * Clear all mappings in the shadow page tables and invalidate the stage
> +	 * 1 and 2 TLB entries via kvm_tlb_flush_vmid_ipa().
> +	 */
> +	kvm_nested_s2_clear(vcpu->kvm);
> +
> +	if (mmu->vmid.vmid_gen) {
> +		/*
> +		 * Invalidate the stage 1 and 2 TLB entries for the host OS
> +		 * in a VM only if there is one.
> +		 */
> +		kvm_call_hyp(__kvm_tlb_flush_vmid, mmu);
> +	}
> +
> +	spin_unlock(&vcpu->kvm->mmu_lock);
> +
> +	return true;
> +}
> +
> +static bool handle_vmalls12e1is(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
> +				const struct sys_reg_desc *r)
> +{
> +	u64 vttbr = vcpu_read_sys_reg(vcpu, VTTBR_EL2);
> +	struct kvm_s2_mmu *mmu;
> +
> +	spin_lock(&vcpu->kvm->mmu_lock);
> +
> +	mmu = lookup_s2_mmu(vcpu->kvm, vttbr, HCR_VM);
> +	if (mmu)
> +		kvm_unmap_stage2_range(mmu, 0, kvm_phys_size(vcpu->kvm));
> +
> +	mmu = lookup_s2_mmu(vcpu->kvm, vttbr, 0);
> +	if (mmu)
> +		kvm_unmap_stage2_range(mmu, 0, kvm_phys_size(vcpu->kvm));
> +
> +	spin_unlock(&vcpu->kvm->mmu_lock);
> +
> +	return true;
> +}
> +
> +static bool handle_ipas2e1is(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
> +			     const struct sys_reg_desc *r)
> +{
> +	u64 vttbr = vcpu_read_sys_reg(vcpu, VTTBR_EL2);
> +	u64 vtcr = vcpu_read_sys_reg(vcpu, VTCR_EL2);
> +	struct kvm_s2_mmu *mmu;
> +	u64 base_addr;
> +	int max_size;
> +
> +	/*
> +	 * We drop a number of things from the supplied value:
> +	 *
> +	 * - NS bit: we're non-secure only.
> +	 *
> +	 * - TTL field: We already have the granule size from the
> +	 *   VTCR_EL2.TG0 field, and the level is only relevant to the
> +	 *   guest's S2PT.
> +	 *
> +	 * - IPA[51:48]: We don't support 52bit IPA just yet...
> +	 *
> +	 * And of course, adjust the IPA to be on an actual address.
> +	 */
> +	base_addr = (p->regval & GENMASK_ULL(35, 0)) << 12;
> +
> +	/* Compute the maximum extent of the invalidation */
> +	switch ((vtcr & VTCR_EL2_TG0_MASK)) {
> +	case VTCR_EL2_TG0_4K:
> +		max_size = SZ_1G;
> +		break;
> +	case VTCR_EL2_TG0_16K:
> +		max_size = SZ_32M;
> +		break;
> +	case VTCR_EL2_TG0_64K:
> +		/*
> +		 * No, we do not support 52bit IPA in nested yet. Once
> +		 * we do, this should be 4TB.
> +		 */
> +		/* FIXME: remove the 52bit PA support from the IDregs */
> +		max_size = SZ_512M;
> +		break;
> +	default:
> +		BUG();
> +	}
> +
> +	spin_lock(&vcpu->kvm->mmu_lock);
> +
> +	mmu = lookup_s2_mmu(vcpu->kvm, vttbr, HCR_VM);
> +	if (mmu)
> +		kvm_unmap_stage2_range(mmu, base_addr, max_size);
> +
> +	mmu = lookup_s2_mmu(vcpu->kvm, vttbr, 0);
> +	if (mmu)
> +		kvm_unmap_stage2_range(mmu, base_addr, max_size);
> +
> +	spin_unlock(&vcpu->kvm->mmu_lock);
> +
> +	return true;
> +}
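For concreteness, a worked example of the decode above (the trapped value is
illustrative):

/*
 * With VTCR_EL2.TG0 = 4K and a trapped register value of 0x80000,
 * bits [35:0] hold IPA[47:12], so
 *
 *	base_addr = 0x80000 << 12 = 0x80000000
 *
 * and at most SZ_1G starting at 0x8000_0000 is unmapped from the
 * shadow stage-2 tables.
 */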
> +
> +static bool handle_tlbi_el1(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
> +			    const struct sys_reg_desc *r)
> +{
> +	u64 virtual_vttbr = vcpu_read_sys_reg(vcpu, VTTBR_EL2);
> +	struct kvm_s2_mmu *mmu;
> +	int sys_encoding = sys_insn(p->Op0, p->Op1, p->CRn, p->CRm, p->Op2);
> +
> +	/*
> +	 * TODO: Revisit this comment:
> +	 *
> +	 * If we can't find a shadow VMID, then either the virtual VMID
> +	 * belongs to the host OS, or the nested VM using that virtual
> +	 * VMID has never been executed. (Note that we create a shadow
> +	 * VMID when entering a VM.) For the former, we can flush the TLB
> +	 * entries belonging to the host OS in a VM. For the latter, we
> +	 * don't have to do anything. Since we can't differentiate
> +	 * between those cases, just do what we can do for the former.
> +	 */

As I see it, we're not searching by shadow VMID, but by guest hypervisor
vttbr_el2, and we set the vttbr when we create a new mmu as a result of
emulating eret.

> +
> +	mutex_lock(&vcpu->kvm->lock);
> +	mmu = lookup_s2_mmu(vcpu->kvm, virtual_vttbr, HCR_VM);
> +	if (mmu)
> +		kvm_call_hyp(__kvm_tlb_el1_instr,
> +			     mmu, p->regval, sys_encoding);
> +
> +	mmu = lookup_s2_mmu(vcpu->kvm, virtual_vttbr, 0);
> +	if (mmu)
> +		kvm_call_hyp(__kvm_tlb_el1_instr,
> +			     mmu, p->regval, sys_encoding);
> +	mutex_unlock(&vcpu->kvm->lock);

I can't figure out why we're trying both cases. Why not use __vcpu_sys_reg(vcpu,
HCR_EL2) and pass that to lookup_s2_mmu? I see that you're doing the above in
several other places, so there must be something that I'm missing.

> +
> +	return true;
> +}
> +
>  /*
>   * AT instruction emulation
>   *
> @@ -2333,12 +2506,40 @@ static struct sys_reg_desc sys_insn_descs[] = {
>  	{ SYS_DESC(SYS_DC_CSW), access_dcsw },
>  	{ SYS_DESC(SYS_DC_CISW), access_dcsw },
>  
> +	SYS_INSN_TO_DESC(TLBI_VMALLE1IS, handle_tlbi_el1, forward_ttlb_traps),
> +	SYS_INSN_TO_DESC(TLBI_VAE1IS, handle_tlbi_el1, forward_ttlb_traps),
> +	SYS_INSN_TO_DESC(TLBI_ASIDE1IS, handle_tlbi_el1, forward_ttlb_traps),
> +	SYS_INSN_TO_DESC(TLBI_VAAE1IS, handle_tlbi_el1, forward_ttlb_traps),
> +	SYS_INSN_TO_DESC(TLBI_VALE1IS, handle_tlbi_el1, forward_ttlb_traps),
> +	SYS_INSN_TO_DESC(TLBI_VAALE1IS, handle_tlbi_el1, forward_ttlb_traps),
> +	SYS_INSN_TO_DESC(TLBI_VMALLE1, handle_tlbi_el1, forward_ttlb_traps),
> +	SYS_INSN_TO_DESC(TLBI_VAE1, handle_tlbi_el1, forward_ttlb_traps),
> +	SYS_INSN_TO_DESC(TLBI_ASIDE1, handle_tlbi_el1, forward_ttlb_traps),
> +	SYS_INSN_TO_DESC(TLBI_VAAE1, handle_tlbi_el1, forward_ttlb_traps),
> +	SYS_INSN_TO_DESC(TLBI_VALE1, handle_tlbi_el1, forward_ttlb_traps),
> +	SYS_INSN_TO_DESC(TLBI_VAALE1, handle_tlbi_el1, forward_ttlb_traps),
> +
>  	SYS_INSN_TO_DESC(AT_S1E2R, handle_s1e2, forward_nv_traps),
>  	SYS_INSN_TO_DESC(AT_S1E2W, handle_s1e2, forward_nv_traps),
>  	SYS_INSN_TO_DESC(AT_S12E1R, handle_s12r, forward_nv_traps),
>  	SYS_INSN_TO_DESC(AT_S12E1W, handle_s12w, forward_nv_traps),
>  	SYS_INSN_TO_DESC(AT_S12E0R, handle_s12r, forward_nv_traps),
>  	SYS_INSN_TO_DESC(AT_S12E0W, handle_s12w, forward_nv_traps),
> +
> +	SYS_INSN_TO_DESC(TLBI_IPAS2E1IS, handle_ipas2e1is, forward_nv_traps),
> +	SYS_INSN_TO_DESC(TLBI_IPAS2LE1IS, handle_ipas2e1is, forward_nv_traps),
> +	SYS_INSN_TO_DESC(TLBI_ALLE2IS, handle_alle2is, forward_nv_traps),
> +	SYS_INSN_TO_DESC(TLBI_VAE2IS, handle_vae2, forward_nv_traps),
> +	SYS_INSN_TO_DESC(TLBI_ALLE1IS, handle_alle1is, forward_nv_traps),
> +	SYS_INSN_TO_DESC(TLBI_VALE2IS, handle_vae2, forward_nv_traps),
> +	SYS_INSN_TO_DESC(TLBI_VMALLS12E1IS, handle_vmalls12e1is, forward_nv_traps),
> +	SYS_INSN_TO_DESC(TLBI_IPAS2E1, handle_ipas2e1is, forward_nv_traps),
> +	SYS_INSN_TO_DESC(TLBI_IPAS2LE1, handle_ipas2e1is, forward_nv_traps),
> +	SYS_INSN_TO_DESC(TLBI_ALLE2, handle_alle2is, forward_nv_traps),
> +	SYS_INSN_TO_DESC(TLBI_VAE2, handle_vae2, forward_nv_traps),
> +	SYS_INSN_TO_DESC(TLBI_ALLE1, handle_alle1is, forward_nv_traps),
> +	SYS_INSN_TO_DESC(TLBI_VALE2, handle_vae2, forward_nv_traps),
> +	SYS_INSN_TO_DESC(TLBI_VMALLS12E1, handle_vmalls12e1is, forward_nv_traps),
>  };
>  
>  static bool trap_dbgidr(struct kvm_vcpu *vcpu,
> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> index 6a7cba077bce..0ea79e543b29 100644
> --- a/virt/kvm/arm/mmu.c
> +++ b/virt/kvm/arm/mmu.c
> @@ -51,7 +51,23 @@ static bool memslot_is_logging(struct kvm_memory_slot *memslot)
>   */
>  void kvm_flush_remote_tlbs(struct kvm *kvm)
>  {
> -	kvm_call_hyp(__kvm_tlb_flush_vmid, &kvm->arch.mmu);
> +	struct kvm_s2_mmu *mmu = &kvm->arch.mmu;
> +
> +	if (mmu == &kvm->arch.mmu) {

Soo... mmu was initialized to &kvm->arch.mmu and now we're testing to see if mmu
== &kvm->arch.mmu? I think we want to compare it with kvm->arch.hw_mmu, i.e.
kvm->arch.hw_mmu == &kvm->arch.mmu.

Assuming that we want to compare kvm->arch.mmu with kvm->arch.hw_mmu, I'm not
really sure that that is enough to tell that we have a regular (non-nesting)
guest. kvm->arch.hw_mmu == &kvm->arch.mmu only means that we're running the L1
guest hypervisor. We could probably do it by iterating through the
kvm->arch.nested_mmus and check that mmu.vttbr is invalid for each mmu. Or check
that the vcpu doesn't have the KVM_ARM_VCPU_NESTED_VIRT feature.
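A minimal sketch of the comparison being suggested, assuming the
kvm->arch.hw_mmu field referenced above (illustrative only, not the patch's
code):

void kvm_flush_remote_tlbs(struct kvm *kvm)
{
	if (kvm->arch.hw_mmu == &kvm->arch.mmu) {
		/* Running the L1 guest: a single VMID covers everything */
		kvm_call_hyp(__kvm_tlb_flush_vmid, &kvm->arch.mmu);
	} else {
		/* Nested: several shadow VMIDs may be live, nuke them all */
		kvm_call_hyp(__kvm_flush_vm_context);
	}
}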

Also, this change to the kvm_flush_remote_tlbs function wasn't mentioned in the
commit message and doesn't look like it's really related to this patch; it looks
more like a fix. Perhaps it should be part of patch #34 "KVM: arm/arm64: nv:
Factor out stage 2 page table data from struct kvm".

Thanks,
Alex
> +		/*
> +		 * For a normal (i.e. non-nested) guest, flush entries for the
> +		 * given VMID.
> +		 */
> +		kvm_call_hyp(__kvm_tlb_flush_vmid, mmu);
> +	} else {
> +		/*
> +		 * When supporting nested virtualization, we can have multiple
> +		 * VMIDs in play for each VCPU in the VM, so it's really not
> +		 * worth it to try to quiesce the system and flush all the
> +		 * VMIDs that may be in use, instead just nuke the whole thing.
> +		 */
> +		kvm_call_hyp(__kvm_flush_vm_context);
> +	}
>  }
>  
>  static void kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu, phys_addr_t ipa)


* Re: [PATCH 46/59] KVM: arm64: nv: arch_timer: Support hyp timer emulation
  2019-06-21  9:38 ` [PATCH 46/59] KVM: arm64: nv: arch_timer: Support hyp timer emulation Marc Zyngier
@ 2019-07-10 16:23   ` Alexandru Elisei
  0 siblings, 0 replies; 176+ messages in thread
From: Alexandru Elisei @ 2019-07-10 16:23 UTC (permalink / raw)
  To: Marc Zyngier, linux-arm-kernel, kvmarm, kvm; +Cc: Andre Przywara, Dave Martin

On 6/21/19 10:38 AM, Marc Zyngier wrote:
> From: Christoffer Dall <christoffer.dall@arm.com>
>
> Emulating EL2 also means emulating the EL2 timers. To do so, we expand
> our timer framework to deal with at most 4 timers. At any given time,
> two of them are backed by the HW timers, and the other two are purely
> emulated.
>
> The role of deciding which is which is left to a mapping function,
> which is called every time we need to make such a decision.
>
> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  arch/arm/include/asm/kvm_emulate.h |   2 +
>  include/kvm/arm_arch_timer.h       |   5 ++
>  include/kvm/arm_vgic.h             |   1 +
>  virt/kvm/arm/arch_timer.c          | 122 ++++++++++++++++++++++++++++-
>  virt/kvm/arm/trace.h               |   6 +-
>  virt/kvm/arm/vgic/vgic.c           |  15 ++++
>  6 files changed, 147 insertions(+), 4 deletions(-)
>
> diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
> index 6b7644a383f6..865ce545b465 100644
> --- a/arch/arm/include/asm/kvm_emulate.h
> +++ b/arch/arm/include/asm/kvm_emulate.h
> @@ -333,4 +333,6 @@ static inline unsigned long vcpu_data_host_to_guest(struct kvm_vcpu *vcpu,
>  
>  static inline void vcpu_ptrauth_setup_lazy(struct kvm_vcpu *vcpu) {}
>  
> +static inline bool is_hyp_ctxt(struct kvm_vcpu *vcpu) { return false; }
> +
>  #endif /* __ARM_KVM_EMULATE_H__ */
> diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
> index d120e6c323e7..3a5d9255120e 100644
> --- a/include/kvm/arm_arch_timer.h
> +++ b/include/kvm/arm_arch_timer.h
> @@ -13,6 +13,8 @@
>  enum kvm_arch_timers {
>  	TIMER_PTIMER,
>  	TIMER_VTIMER,
> +	TIMER_HVTIMER,
> +	TIMER_HPTIMER,
>  	NR_KVM_TIMERS
>  };
>  
> @@ -54,6 +56,7 @@ struct arch_timer_context {
>  struct timer_map {
>  	struct arch_timer_context *direct_vtimer;
>  	struct arch_timer_context *direct_ptimer;
> +	struct arch_timer_context *emul_vtimer;
>  	struct arch_timer_context *emul_ptimer;
>  };
>  
> @@ -98,6 +101,8 @@ bool kvm_arch_timer_get_input_level(int vintid);
>  #define vcpu_get_timer(v,t)	(&vcpu_timer(v)->timers[(t)])
>  #define vcpu_vtimer(v)	(&(v)->arch.timer_cpu.timers[TIMER_VTIMER])
>  #define vcpu_ptimer(v)	(&(v)->arch.timer_cpu.timers[TIMER_PTIMER])
> +#define vcpu_hvtimer(v)	(&(v)->arch.timer_cpu.timers[TIMER_HVTIMER])
> +#define vcpu_hptimer(v)	(&(v)->arch.timer_cpu.timers[TIMER_HPTIMER])
>  
>  #define arch_timer_ctx_index(ctx)	((ctx) - vcpu_timer((ctx)->vcpu)->timers)
>  
> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
> index c36c86f1ec9a..7fc3b413b3de 100644
> --- a/include/kvm/arm_vgic.h
> +++ b/include/kvm/arm_vgic.h
> @@ -355,6 +355,7 @@ int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int intid,
>  int kvm_vgic_map_phys_irq(struct kvm_vcpu *vcpu, unsigned int host_irq,
>  			  u32 vintid, bool (*get_input_level)(int vindid));
>  int kvm_vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, unsigned int vintid);
> +int kvm_vgic_get_map(struct kvm_vcpu *vcpu, unsigned int vintid);
>  bool kvm_vgic_map_is_active(struct kvm_vcpu *vcpu, unsigned int vintid);
>  
>  int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu);
> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> index 089441a07ed7..3d84c240071d 100644
> --- a/virt/kvm/arm/arch_timer.c
> +++ b/virt/kvm/arm/arch_timer.c
> @@ -15,6 +15,7 @@
>  #include <asm/arch_timer.h>
>  #include <asm/kvm_emulate.h>
>  #include <asm/kvm_hyp.h>
> +#include <asm/kvm_nested.h>
>  
>  #include <kvm/arm_vgic.h>
>  #include <kvm/arm_arch_timer.h>
> @@ -39,6 +40,16 @@ static const struct kvm_irq_level default_vtimer_irq = {
>  	.level	= 1,
>  };
>  
> +static const struct kvm_irq_level default_hptimer_irq = {
> +	.irq	= 26,
> +	.level	= 1,
> +};
> +
> +static const struct kvm_irq_level default_hvtimer_irq = {
> +	.irq	= 28,
> +	.level	= 1,
> +};
> +
>  static bool kvm_timer_irq_can_fire(struct arch_timer_context *timer_ctx);
>  static void kvm_timer_update_irq(struct kvm_vcpu *vcpu, bool new_level,
>  				 struct arch_timer_context *timer_ctx);
> @@ -58,13 +69,27 @@ u64 kvm_phys_timer_read(void)
>  
>  static void get_timer_map(struct kvm_vcpu *vcpu, struct timer_map *map)
>  {
> -	if (has_vhe()) {
> +	if (nested_virt_in_use(vcpu)) {
> +		if (is_hyp_ctxt(vcpu)) {
> +			map->direct_vtimer = vcpu_hvtimer(vcpu);
> +			map->direct_ptimer = vcpu_hptimer(vcpu);
> +			map->emul_vtimer = vcpu_vtimer(vcpu);
> +			map->emul_ptimer = vcpu_ptimer(vcpu);
> +		} else {
> +			map->direct_vtimer = vcpu_vtimer(vcpu);
> +			map->direct_ptimer = vcpu_ptimer(vcpu);
> +			map->emul_vtimer = vcpu_hvtimer(vcpu);
> +			map->emul_ptimer = vcpu_hptimer(vcpu);
> +		}
> +	} else if (has_vhe()) {
>  		map->direct_vtimer = vcpu_vtimer(vcpu);
>  		map->direct_ptimer = vcpu_ptimer(vcpu);
> +		map->emul_vtimer = NULL;
>  		map->emul_ptimer = NULL;
>  	} else {
>  		map->direct_vtimer = vcpu_vtimer(vcpu);
>  		map->direct_ptimer = NULL;
> +		map->emul_vtimer = NULL;
>  		map->emul_ptimer = vcpu_ptimer(vcpu);
>  	}
>  
> @@ -237,9 +262,11 @@ static bool kvm_timer_should_fire(struct arch_timer_context *timer_ctx)
>  
>  		switch (index) {
>  		case TIMER_VTIMER:
> +		case TIMER_HVTIMER:
>  			cnt_ctl = read_sysreg_el0(SYS_CNTV_CTL);
>  			break;
>  		case TIMER_PTIMER:
> +		case TIMER_HPTIMER:
>  			cnt_ctl = read_sysreg_el0(SYS_CNTP_CTL);
>  			break;
>  		case NR_KVM_TIMERS:
> @@ -270,6 +297,7 @@ bool kvm_timer_is_pending(struct kvm_vcpu *vcpu)
>  
>  	return kvm_timer_should_fire(map.direct_vtimer) ||
>  	       kvm_timer_should_fire(map.direct_ptimer) ||
> +	       kvm_timer_should_fire(map.emul_vtimer) ||
>  	       kvm_timer_should_fire(map.emul_ptimer);
>  }
>  
> @@ -349,6 +377,7 @@ static void timer_save_state(struct arch_timer_context *ctx)
>  
>  	switch (index) {
>  	case TIMER_VTIMER:
> +	case TIMER_HVTIMER:
>  		ctx->cnt_ctl = read_sysreg_el0(SYS_CNTV_CTL);
>  		ctx->cnt_cval = read_sysreg_el0(SYS_CNTV_CVAL);
>  
> @@ -358,6 +387,7 @@ static void timer_save_state(struct arch_timer_context *ctx)
>  
>  		break;
>  	case TIMER_PTIMER:
> +	case TIMER_HPTIMER:
>  		ctx->cnt_ctl = read_sysreg_el0(SYS_CNTP_CTL);
>  		ctx->cnt_cval = read_sysreg_el0(SYS_CNTP_CVAL);
>  
> @@ -395,6 +425,7 @@ static void kvm_timer_blocking(struct kvm_vcpu *vcpu)
>  	 */
>  	if (!kvm_timer_irq_can_fire(map.direct_vtimer) &&
>  	    !kvm_timer_irq_can_fire(map.direct_ptimer) &&
> +	    !kvm_timer_irq_can_fire(map.emul_vtimer) &&
>  	    !kvm_timer_irq_can_fire(map.emul_ptimer))
>  		return;
>  
> @@ -428,11 +459,13 @@ static void timer_restore_state(struct arch_timer_context *ctx)
>  
>  	switch (index) {
>  	case TIMER_VTIMER:
> +	case TIMER_HVTIMER:
>  		write_sysreg_el0(ctx->cnt_cval, SYS_CNTV_CVAL);
>  		isb();
>  		write_sysreg_el0(ctx->cnt_ctl, SYS_CNTV_CTL);
>  		break;
>  	case TIMER_PTIMER:
> +	case TIMER_HPTIMER:
>  		write_sysreg_el0(ctx->cnt_cval, SYS_CNTP_CVAL);
>  		isb();
>  		write_sysreg_el0(ctx->cnt_ctl, SYS_CNTP_CTL);
> @@ -519,6 +552,40 @@ static void kvm_timer_vcpu_load_nogic(struct kvm_vcpu *vcpu)
>  		enable_percpu_irq(host_vtimer_irq, host_vtimer_irq_flags);
>  }
>  
> +static void kvm_timer_vcpu_load_nested_switch(struct kvm_vcpu *vcpu,
> +					      struct timer_map *map)
> +{
> +	int hw, ret;
> +
> +	if (!irqchip_in_kernel(vcpu->kvm))
> +		return;
> +
> +	/*
> +	 * We only ever unmap the vtimer irq on a VHE system that runs nested
> +	 * virtualization, in which case we have valid emul_vtimer,
> +	 * emul_ptimer, direct_vtimer, and direct_ptimer contexts.
> +	 *
> +	 * Since this is called from kvm_timer_vcpu_load(), a change between
> +	 * vEL2 and vEL1/0 will have just happened, and the timer_map will
> +	 * represent this, and therefore we switch the emul/direct mappings
> +	 * below.
> +	 */
> +	hw = kvm_vgic_get_map(vcpu, map->direct_vtimer->irq.irq);
> +	if (hw < 0) {
> +		kvm_vgic_unmap_phys_irq(vcpu, map->emul_vtimer->irq.irq);
> +		kvm_vgic_unmap_phys_irq(vcpu, map->emul_ptimer->irq.irq);
> +
> +		ret = kvm_vgic_map_phys_irq(vcpu,
> +					    map->direct_vtimer->host_timer_irq,
> +					    map->direct_vtimer->irq.irq,
> +					    kvm_arch_timer_get_input_level);
> +		ret = kvm_vgic_map_phys_irq(vcpu,
> +					    map->direct_ptimer->host_timer_irq,
> +					    map->direct_ptimer->irq.irq,
> +					    kvm_arch_timer_get_input_level);
> +	}
> +}
> +
>  void kvm_timer_vcpu_load(struct kvm_vcpu *vcpu)
>  {
>  	struct arch_timer_cpu *timer = vcpu_timer(vcpu);
> @@ -530,6 +597,8 @@ void kvm_timer_vcpu_load(struct kvm_vcpu *vcpu)
>  	get_timer_map(vcpu, &map);
>  
>  	if (static_branch_likely(&has_gic_active_state)) {
> +		kvm_timer_vcpu_load_nested_switch(vcpu, &map);

Would it be possible for this to be a conditional call that depends on
nested_virt_in_use(vcpu), since that's the only situation where we change the
direct_vtimer when we switch from vEL1 to vEL2 or vice versa?

Thanks,
Alex
> +
>  		kvm_timer_vcpu_load_gic(map.direct_vtimer);
>  		if (map.direct_ptimer)
>  			kvm_timer_vcpu_load_gic(map.direct_ptimer);
> @@ -545,6 +614,8 @@ void kvm_timer_vcpu_load(struct kvm_vcpu *vcpu)
>  	if (map.direct_ptimer)
>  		timer_restore_state(map.direct_ptimer);
>  
> +	if (map.emul_vtimer)
> +		timer_emulate(map.emul_vtimer);
>  	if (map.emul_ptimer)
>  		timer_emulate(map.emul_ptimer);
>  }
> @@ -589,6 +660,8 @@ void kvm_timer_vcpu_put(struct kvm_vcpu *vcpu)
>  	 * In any case, we re-schedule the hrtimer for the physical timer when
>  	 * coming back to the VCPU thread in kvm_timer_vcpu_load().
>  	 */
> +	if (map.emul_vtimer)
> +		soft_timer_cancel(&map.emul_vtimer->hrtimer);
>  	if (map.emul_ptimer)
>  		soft_timer_cancel(&map.emul_ptimer->hrtimer);
>  
> @@ -649,10 +722,14 @@ int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu)
>  	 */
>  	vcpu_vtimer(vcpu)->cnt_ctl = 0;
>  	vcpu_ptimer(vcpu)->cnt_ctl = 0;
> +	vcpu_hvtimer(vcpu)->cnt_ctl = 0;
> +	vcpu_hptimer(vcpu)->cnt_ctl = 0;
>  
>  	if (timer->enabled) {
>  		kvm_timer_update_irq(vcpu, false, vcpu_vtimer(vcpu));
>  		kvm_timer_update_irq(vcpu, false, vcpu_ptimer(vcpu));
> +		kvm_timer_update_irq(vcpu, false, vcpu_hvtimer(vcpu));
> +		kvm_timer_update_irq(vcpu, false, vcpu_hptimer(vcpu));
>  
>  		if (irqchip_in_kernel(vcpu->kvm)) {
>  			kvm_vgic_reset_mapped_irq(vcpu, map.direct_vtimer->irq.irq);
> @@ -661,6 +738,8 @@ int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu)
>  		}
>  	}
>  
> +	if (map.emul_vtimer)
> +		soft_timer_cancel(&map.emul_vtimer->hrtimer);
>  	if (map.emul_ptimer)
>  		soft_timer_cancel(&map.emul_ptimer->hrtimer);
>  
> @@ -691,30 +770,46 @@ void kvm_timer_vcpu_init(struct kvm_vcpu *vcpu)
>  	struct arch_timer_cpu *timer = vcpu_timer(vcpu);
>  	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
>  	struct arch_timer_context *ptimer = vcpu_ptimer(vcpu);
> +	struct arch_timer_context *hvtimer = vcpu_hvtimer(vcpu);
> +	struct arch_timer_context *hptimer = vcpu_hptimer(vcpu);
>  
>  	/* Synchronize cntvoff across all vtimers of a VM. */
>  	update_vtimer_cntvoff(vcpu, kvm_phys_timer_read());
>  	ptimer->cntvoff = 0;
> +	hvtimer->cntvoff = 0;
> +	hptimer->cntvoff = 0;
>  
>  	hrtimer_init(&timer->bg_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
>  	timer->bg_timer.function = kvm_bg_timer_expire;
>  
>  	hrtimer_init(&vtimer->hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
>  	hrtimer_init(&ptimer->hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
> +	hrtimer_init(&hvtimer->hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
> +	hrtimer_init(&hptimer->hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
>  	vtimer->hrtimer.function = kvm_hrtimer_expire;
>  	ptimer->hrtimer.function = kvm_hrtimer_expire;
> +	hvtimer->hrtimer.function = kvm_hrtimer_expire;
> +	hptimer->hrtimer.function = kvm_hrtimer_expire;
>  
>  	vtimer->irq.irq = default_vtimer_irq.irq;
>  	ptimer->irq.irq = default_ptimer_irq.irq;
> +	hvtimer->irq.irq = default_hvtimer_irq.irq;
> +	hptimer->irq.irq = default_hptimer_irq.irq;
>  
>  	vtimer->host_timer_irq = host_vtimer_irq;
>  	ptimer->host_timer_irq = host_ptimer_irq;
> +	hvtimer->host_timer_irq = host_vtimer_irq;
> +	hptimer->host_timer_irq = host_ptimer_irq;
>  
>  	vtimer->host_timer_irq_flags = host_vtimer_irq_flags;
>  	ptimer->host_timer_irq_flags = host_ptimer_irq_flags;
> +	hvtimer->host_timer_irq_flags = host_vtimer_irq_flags;
> +	hptimer->host_timer_irq_flags = host_ptimer_irq_flags;
>  
>  	vtimer->vcpu = vcpu;
>  	ptimer->vcpu = vcpu;
> +	hvtimer->vcpu = vcpu;
> +	hptimer->vcpu = vcpu;
>  }
>  
>  static void kvm_timer_init_interrupt(void *info)
> @@ -997,7 +1092,7 @@ void kvm_timer_vcpu_terminate(struct kvm_vcpu *vcpu)
>  
>  static bool timer_irqs_are_valid(struct kvm_vcpu *vcpu)
>  {
> -	int vtimer_irq, ptimer_irq;
> +	int vtimer_irq, ptimer_irq, hvtimer_irq, hptimer_irq;
>  	int i, ret;
>  
>  	vtimer_irq = vcpu_vtimer(vcpu)->irq.irq;
> @@ -1010,9 +1105,21 @@ static bool timer_irqs_are_valid(struct kvm_vcpu *vcpu)
>  	if (ret)
>  		return false;
>  
> +	hvtimer_irq = vcpu_hvtimer(vcpu)->irq.irq;
> +	ret = kvm_vgic_set_owner(vcpu, hvtimer_irq, vcpu_hvtimer(vcpu));
> +	if (ret)
> +		return false;
> +
> +	hptimer_irq = vcpu_hptimer(vcpu)->irq.irq;
> +	ret = kvm_vgic_set_owner(vcpu, hptimer_irq, vcpu_hptimer(vcpu));
> +	if (ret)
> +		return false;
> +
>  	kvm_for_each_vcpu(i, vcpu, vcpu->kvm) {
>  		if (vcpu_vtimer(vcpu)->irq.irq != vtimer_irq ||
> -		    vcpu_ptimer(vcpu)->irq.irq != ptimer_irq)
> +		    vcpu_ptimer(vcpu)->irq.irq != ptimer_irq ||
> +		    vcpu_hvtimer(vcpu)->irq.irq != hvtimer_irq ||
> +		    vcpu_hptimer(vcpu)->irq.irq != hptimer_irq)
>  			return false;
>  	}
>  
> @@ -1028,6 +1135,10 @@ bool kvm_arch_timer_get_input_level(int vintid)
>  		timer = vcpu_vtimer(vcpu);
>  	else if (vintid == vcpu_ptimer(vcpu)->irq.irq)
>  		timer = vcpu_ptimer(vcpu);
> +	else if (vintid == vcpu_hvtimer(vcpu)->irq.irq)
> +		timer = vcpu_hvtimer(vcpu);
> +	else if (vintid == vcpu_hptimer(vcpu)->irq.irq)
> +		timer = vcpu_hptimer(vcpu);
>  	else
>  		BUG();
>  
> @@ -1109,6 +1220,7 @@ static void set_timer_irqs(struct kvm *kvm, int vtimer_irq, int ptimer_irq)
>  	kvm_for_each_vcpu(i, vcpu, kvm) {
>  		vcpu_vtimer(vcpu)->irq.irq = vtimer_irq;
>  		vcpu_ptimer(vcpu)->irq.irq = ptimer_irq;
> +		/* TODO: Add support for hv/hp timers */
>  	}
>  }
>  
> @@ -1119,6 +1231,8 @@ int kvm_arm_timer_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
>  	struct arch_timer_context *ptimer = vcpu_ptimer(vcpu);
>  	int irq;
>  
> +	/* TODO: Add support for hv/hp timers */
> +
>  	if (!irqchip_in_kernel(vcpu->kvm))
>  		return -EINVAL;
>  
> @@ -1151,6 +1265,8 @@ int kvm_arm_timer_get_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
>  	struct arch_timer_context *timer;
>  	int irq;
>  
> +	/* TODO: Add support for hv/hp timers */
> +
>  	switch (attr->attr) {
>  	case KVM_ARM_VCPU_TIMER_IRQ_VTIMER:
>  		timer = vcpu_vtimer(vcpu);
> diff --git a/virt/kvm/arm/trace.h b/virt/kvm/arm/trace.h
> index 204d210d01c2..3b08cc0376f4 100644
> --- a/virt/kvm/arm/trace.h
> +++ b/virt/kvm/arm/trace.h
> @@ -271,6 +271,7 @@ TRACE_EVENT(kvm_get_timer_map,
>  		__field(	unsigned long,		vcpu_id	)
>  		__field(	int,			direct_vtimer	)
>  		__field(	int,			direct_ptimer	)
> +		__field(	int,			emul_vtimer	)
>  		__field(	int,			emul_ptimer	)
>  	),
>  
> @@ -279,14 +280,17 @@ TRACE_EVENT(kvm_get_timer_map,
>  		__entry->direct_vtimer		= arch_timer_ctx_index(map->direct_vtimer);
>  		__entry->direct_ptimer =
>  			(map->direct_ptimer) ? arch_timer_ctx_index(map->direct_ptimer) : -1;
> +		__entry->emul_vtimer =
> +			(map->emul_vtimer) ? arch_timer_ctx_index(map->emul_vtimer) : -1;
>  		__entry->emul_ptimer =
>  			(map->emul_ptimer) ? arch_timer_ctx_index(map->emul_ptimer) : -1;
>  	),
>  
> -	TP_printk("VCPU: %ld, dv: %d, dp: %d, ep: %d",
> +	TP_printk("VCPU: %ld, dv: %d, dp: %d, ev: %d, ep: %d",
>  		  __entry->vcpu_id,
>  		  __entry->direct_vtimer,
>  		  __entry->direct_ptimer,
> +		  __entry->emul_vtimer,
>  		  __entry->emul_ptimer)
>  );
>  
> diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
> index 191deccf60bf..1c5b4dbd33e4 100644
> --- a/virt/kvm/arm/vgic/vgic.c
> +++ b/virt/kvm/arm/vgic/vgic.c
> @@ -569,6 +569,21 @@ int kvm_vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, unsigned int vintid)
>  	return 0;
>  }
>  
> +int kvm_vgic_get_map(struct kvm_vcpu *vcpu, unsigned int vintid)
> +{
> +	struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, vintid);
> +	unsigned long flags;
> +	int ret = -1;
> +
> +	raw_spin_lock_irqsave(&irq->irq_lock, flags);
> +	if (irq->hw)
> +		ret = irq->hwintid;
> +	raw_spin_unlock_irqrestore(&irq->irq_lock, flags);
> +
> +	vgic_put_irq(vcpu->kvm, irq);
> +	return ret;
> +}
> +
>  /**
>   * kvm_vgic_set_owner - Set the owner of an interrupt for a VM
>   *


* Re: [PATCH 55/59] arm64: KVM: nv: Add handling of EL2-specific timer registers
  2019-06-21  9:38 ` [PATCH 55/59] arm64: KVM: nv: Add handling of EL2-specific timer registers Marc Zyngier
@ 2019-07-11 12:35   ` Alexandru Elisei
  2019-07-17 10:19   ` Alexandru Elisei
  1 sibling, 0 replies; 176+ messages in thread
From: Alexandru Elisei @ 2019-07-11 12:35 UTC (permalink / raw)
  To: Marc Zyngier, linux-arm-kernel, kvmarm, kvm; +Cc: Andre Przywara, Dave Martin

On 6/21/19 10:38 AM, Marc Zyngier wrote:

> Add the required handling for EL2 and EL02 registers, as
> well as EL1 registers used in the E2H context.
>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  arch/arm64/kvm/sys_regs.c | 72 +++++++++++++++++++++++++++++++++++++++
>  1 file changed, 72 insertions(+)
>
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index ba3bcd29c02d..0bd6a66b621e 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -1361,20 +1361,92 @@ static bool access_arch_timer(struct kvm_vcpu *vcpu,
>  
>  	switch (reg) {
>  	case SYS_CNTP_TVAL_EL0:
> +		if (vcpu_mode_el2(vcpu) && vcpu_el2_e2h_is_set(vcpu))
> +			tmr = TIMER_HPTIMER;
> +		else
> +			tmr = TIMER_PTIMER;
> +		treg = TIMER_REG_TVAL;
> +		break;
> +
>  	case SYS_AARCH32_CNTP_TVAL:
> +	case SYS_CNTP_TVAL_EL02:
>  		tmr = TIMER_PTIMER;
>  		treg = TIMER_REG_TVAL;
>  		break;
> +
> +	case SYS_CNTV_TVAL_EL02:
> +		tmr = TIMER_VTIMER;
> +		treg = TIMER_REG_TVAL;
> +		break;
> +
> +	case SYS_CNTHP_TVAL_EL2:
> +		tmr = TIMER_HPTIMER;
> +		treg = TIMER_REG_TVAL;
> +		break;
> +
> +	case SYS_CNTHV_TVAL_EL2:
> +		tmr = TIMER_HVTIMER;
> +		treg = TIMER_REG_TVAL;
> +		break;
> +
>  	case SYS_CNTP_CTL_EL0:
> +		if (vcpu_mode_el2(vcpu) && vcpu_el2_e2h_is_set(vcpu))
> +			tmr = TIMER_HPTIMER;
> +		else
> +			tmr = TIMER_PTIMER;
> +		treg = TIMER_REG_CTL;
> +		break;
> +
>  	case SYS_AARCH32_CNTP_CTL:
> +	case SYS_CNTP_CTL_EL02:
>  		tmr = TIMER_PTIMER;
>  		treg = TIMER_REG_CTL;
>  		break;
> +
> +	case SYS_CNTV_CTL_EL02:
> +		tmr = TIMER_VTIMER;
> +		treg = TIMER_REG_CTL;
> +		break;
> +
> +	case SYS_CNTHP_CTL_EL2:
> +		tmr = TIMER_HPTIMER;
> +		treg = TIMER_REG_CTL;
> +		break;
> +
> +	case SYS_CNTHV_CTL_EL2:
> +		tmr = TIMER_HVTIMER;
> +		treg = TIMER_REG_CTL;
> +		break;
> +
>  	case SYS_CNTP_CVAL_EL0:
> +		if (vcpu_mode_el2(vcpu) && vcpu_el2_e2h_is_set(vcpu))
> +			tmr = TIMER_HPTIMER;
> +		else
> +			tmr = TIMER_PTIMER;
> +		treg = TIMER_REG_CVAL;
> +		break;
> +
>  	case SYS_AARCH32_CNTP_CVAL:
> +	case SYS_CNTP_CVAL_EL02:
>  		tmr = TIMER_PTIMER;
>  		treg = TIMER_REG_CVAL;
>  		break;
> +
> +	case SYS_CNTV_CVAL_EL02:
> +		tmr = TIMER_VTIMER;
> +		treg = TIMER_REG_CVAL;
> +		break;
> +
> +	case SYS_CNTHP_CVAL_EL2:
> +		tmr = TIMER_HPTIMER;
> +		treg = TIMER_REG_CVAL;
> +		break;
> +
> +	case SYS_CNTHV_CVAL_EL2:
> +		tmr = TIMER_HVTIMER;
> +		treg = TIMER_REG_CVAL;
> +		break;
> +
>  	case SYS_CNTVOFF_EL2:
>  		tmr = TIMER_VTIMER;
>  		treg = TIMER_REG_VOFF;

Shouldn't we forward physical timer traps to the L1 guest hypervisor if the
EL1PTEN bit in __vcpu_sys_reg(vcpu, CNTHCTL_EL2) is 0 (trap access) and
!vcpu_mode_el2(vcpu)? A regular (non-nested) non-VHE hypervisor can set that bit
to emulate the physical timer. If I remember correctly, KVM with VHE used to do
it too until some time ago.
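Something along these lines, perhaps (a rough sketch only; the CNTHCTL_EL1PTEN
mask name is assumed here, and the E2H-dependent bit layout of CNTHCTL_EL2 is
ignored):

	/* Hypothetical: forward the trap to vEL2 when EL1PTEN is clear */
	if (!vcpu_mode_el2(vcpu) &&
	    !(__vcpu_sys_reg(vcpu, CNTHCTL_EL2) & CNTHCTL_EL1PTEN)) {
		kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
		return false;
	}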



* Re: [PATCH 48/59] KVM: arm64: nv: Load timer before the GIC
  2019-06-21  9:38 ` [PATCH 48/59] KVM: arm64: nv: Load timer before the GIC Marc Zyngier
@ 2019-07-11 13:17   ` Alexandru Elisei
  0 siblings, 0 replies; 176+ messages in thread
From: Alexandru Elisei @ 2019-07-11 13:17 UTC (permalink / raw)
  To: Marc Zyngier, linux-arm-kernel, kvmarm, kvm; +Cc: Andre Przywara, Dave Martin

On 6/21/19 10:38 AM, Marc Zyngier wrote:
> In order for vgic_v3_load_nested to be able to observe which
> which timer interrupts have the HW bit set for the current

s/which which/which

> context, the timers must have been loaded in the new mode
> and the right timer mapped to their corresponding HW IRQs.
>
> At the moment, we load the GIC first, meaning that timer
> interrupts injected to an L2 guest will never have the HW
> HW bit set (we see the old configuration).

s/HW HW/HW

> Swapping the two loads solves this particular problem.
>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  virt/kvm/arm/arm.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> index e8b584b79847..ca10a11e044e 100644
> --- a/virt/kvm/arm/arm.c
> +++ b/virt/kvm/arm/arm.c
> @@ -361,8 +361,8 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>  	vcpu->arch.host_cpu_context = &cpu_data->host_ctxt;
>  
>  	kvm_arm_set_running_vcpu(vcpu);
> -	kvm_vgic_load(vcpu);
>  	kvm_timer_vcpu_load(vcpu);
> +	kvm_vgic_load(vcpu);
>  	kvm_vcpu_load_sysregs(vcpu);
>  	kvm_arch_vcpu_load_fp(vcpu);
>  	kvm_vcpu_pmu_restore_guest(vcpu);


* Re: [PATCH 50/59] KVM: arm64: nv: Nested GICv3 Support
  2019-06-21  9:38 ` [PATCH 50/59] KVM: arm64: nv: Nested GICv3 Support Marc Zyngier
@ 2019-07-16 11:41   ` Alexandru Elisei
  0 siblings, 0 replies; 176+ messages in thread
From: Alexandru Elisei @ 2019-07-16 11:41 UTC (permalink / raw)
  To: Marc Zyngier, linux-arm-kernel, kvmarm, kvm; +Cc: Andre Przywara, Dave Martin

On 6/21/19 10:38 AM, Marc Zyngier wrote:
> From: Jintack Lim <jintack@cs.columbia.edu>
>
> When entering a nested VM, we set up the hypervisor control interface
> based on what the guest hypervisor has set. In particular, we inspect
> each list register written by the guest hypervisor to see whether its
> HW bit is set. If so, we translate the hw irq number from the guest's
> point of view into the real hardware irq number, if there is a mapping.
>
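In other words, something like the following per-LR fixup (a sketch under
assumptions: the function name, the shadow_if destination and the used_lrs
bound are made up; only the HW-bit translation is the point, and
kvm_vgic_get_map() is the helper added earlier in the series):

/* Hypothetical sketch of the list register fixup described above */
static void sketch_shadow_lrs(struct kvm_vcpu *vcpu,
			      struct vgic_v3_cpu_if *cpu_if,
			      struct vgic_v3_cpu_if *shadow_if, int used_lrs)
{
	int i;

	for (i = 0; i < used_lrs; i++) {
		u64 lr = cpu_if->vgic_lr[i];

		if (lr & ICH_LR_HW) {
			int intid = (lr & ICH_LR_PHYS_ID_MASK) >>
				    ICH_LR_PHYS_ID_SHIFT;
			int hwirq = kvm_vgic_get_map(vcpu, intid);

			if (hwirq < 0) {
				/* No mapping: demote to a SW interrupt */
				lr &= ~ICH_LR_HW;
			} else {
				lr &= ~ICH_LR_PHYS_ID_MASK;
				lr |= (u64)hwirq << ICH_LR_PHYS_ID_SHIFT;
			}
		}
		shadow_if->vgic_lr[i] = lr;
	}
}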
> Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
> [Rewritten to support GICv3 instead of GICv2]
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> [Redesigned execution flow around vcpu load/put]
> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
> ---
>  arch/arm/include/asm/kvm_emulate.h |   1 +
>  arch/arm/include/asm/kvm_host.h    |   6 +-
>  arch/arm64/include/asm/kvm_host.h  |   5 +-
>  arch/arm64/kvm/Makefile            |   1 +
>  arch/arm64/kvm/nested.c            |  10 ++
>  arch/arm64/kvm/sys_regs.c          | 178 ++++++++++++++++++++++++++++-
>  include/kvm/arm_vgic.h             |  18 +++
>  virt/kvm/arm/arm.c                 |   7 +-
>  virt/kvm/arm/vgic/vgic-v3-nested.c | 177 ++++++++++++++++++++++++++++
>  virt/kvm/arm/vgic/vgic-v3.c        |  28 +++++
>  virt/kvm/arm/vgic/vgic.c           |  32 ++++++
>  11 files changed, 456 insertions(+), 7 deletions(-)
>  create mode 100644 virt/kvm/arm/vgic/vgic-v3-nested.c
>
> diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
> index 865ce545b465..a53f19041e16 100644
> --- a/arch/arm/include/asm/kvm_emulate.h
> +++ b/arch/arm/include/asm/kvm_emulate.h
> @@ -334,5 +334,6 @@ static inline unsigned long vcpu_data_host_to_guest(struct kvm_vcpu *vcpu,
>  static inline void vcpu_ptrauth_setup_lazy(struct kvm_vcpu *vcpu) {}
>  
>  static inline bool is_hyp_ctxt(struct kvm_vcpu *vcpu) { return false; }
> +static inline int kvm_inject_nested_irq(struct kvm_vcpu *vcpu) { BUG(); }
>  
>  #endif /* __ARM_KVM_EMULATE_H__ */
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index cc761610e41e..d6923ed55796 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -35,10 +35,12 @@
>  #define KVM_MAX_VCPUS VGIC_V2_MAX_CPUS
>  #endif
>  
> +/* KVM_REQ_GUEST_HYP_IRQ_PENDING is actually unused */
>  #define KVM_REQ_SLEEP \
>  	KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
> -#define KVM_REQ_IRQ_PENDING	KVM_ARCH_REQ(1)
> -#define KVM_REQ_VCPU_RESET	KVM_ARCH_REQ(2)
> +#define KVM_REQ_IRQ_PENDING		KVM_ARCH_REQ(1)
> +#define KVM_REQ_VCPU_RESET		KVM_ARCH_REQ(2)
> +#define KVM_REQ_GUEST_HYP_IRQ_PENDING	KVM_ARCH_REQ(3)
>  
>  DECLARE_STATIC_KEY_FALSE(userspace_irqchip_in_use);
>  
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index e0fe9acb46bf..e2e44cc650bf 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -53,8 +53,9 @@
>  
>  #define KVM_REQ_SLEEP \
>  	KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
> -#define KVM_REQ_IRQ_PENDING	KVM_ARCH_REQ(1)
> -#define KVM_REQ_VCPU_RESET	KVM_ARCH_REQ(2)
> +#define KVM_REQ_IRQ_PENDING		KVM_ARCH_REQ(1)
> +#define KVM_REQ_VCPU_RESET		KVM_ARCH_REQ(2)
> +#define KVM_REQ_GUEST_HYP_IRQ_PENDING	KVM_ARCH_REQ(3)
>  
>  DECLARE_STATIC_KEY_FALSE(userspace_irqchip_in_use);
>  
> diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
> index f11bd8b0d837..045a8f18f465 100644
> --- a/arch/arm64/kvm/Makefile
> +++ b/arch/arm64/kvm/Makefile
> @@ -38,3 +38,4 @@ kvm-$(CONFIG_KVM_ARM_PMU) += $(KVM)/arm/pmu.o
>  
>  kvm-$(CONFIG_KVM_ARM_HOST) += nested.o
>  kvm-$(CONFIG_KVM_ARM_HOST) += emulate-nested.o
> +kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic/vgic-v3-nested.o
> diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
> index 214d59019935..df2db9ab7cfb 100644
> --- a/arch/arm64/kvm/nested.c
> +++ b/arch/arm64/kvm/nested.c
> @@ -539,3 +539,13 @@ void kvm_arch_flush_shadow_all(struct kvm *kvm)
>  	kvm->arch.nested_mmus_size = 0;
>  	kvm_free_stage2_pgd(&kvm->arch.mmu);
>  }
> +
> +bool vgic_state_is_nested(struct kvm_vcpu *vcpu)
> +{
> +	bool imo = __vcpu_sys_reg(vcpu, HCR_EL2) & HCR_IMO;
> +	bool fmo = __vcpu_sys_reg(vcpu, HCR_EL2) & HCR_FMO;
> +
> +	WARN(imo != fmo, "Separate virtual IRQ/FIQ settings not supported\n");
> +
> +	return nested_virt_in_use(vcpu) && imo && fmo && !is_hyp_ctxt(vcpu);
> +}
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 2031a59fcf49..ba3bcd29c02d 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -26,6 +26,8 @@
>  #include <linux/printk.h>
>  #include <linux/uaccess.h>
>  
> +#include <linux/irqchip/arm-gic-v3.h>
> +
>  #include <asm/cacheflush.h>
>  #include <asm/cputype.h>
>  #include <asm/debug-monitors.h>
> @@ -505,6 +507,18 @@ static bool access_vm_reg(struct kvm_vcpu *vcpu,
>  	return true;
>  }
>  
> +/*
> + * The architecture says that non-secure write accesses to this register from
> + * EL1 are trapped to EL2, if either:
> + *  - HCR_EL2.FMO==1, or
> + *  - HCR_EL2.IMO==1
> + */
> +static bool sgi_traps_to_vel2(struct kvm_vcpu *vcpu)
> +{
> +	return !vcpu_mode_el2(vcpu) &&
> +		!!(__vcpu_sys_reg(vcpu, HCR_EL2) & (HCR_IMO | HCR_FMO));
> +}
> +
>  /*
>   * Trap handler for the GICv3 SGI generation system register.
>   * Forward the request to the VGIC emulation.
> @@ -520,6 +534,11 @@ static bool access_gic_sgi(struct kvm_vcpu *vcpu,
>  	if (!p->is_write)
>  		return read_from_write_only(vcpu, p, r);
>  
> +	if (sgi_traps_to_vel2(vcpu)) {
> +		kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
> +		return false;
> +	}
> +
>  	/*
>  	 * In a system where GICD_CTLR.DS=1, a ICC_SGI0R_EL1 access generates
>  	 * Group0 SGIs only, while ICC_SGI1R_EL1 can generate either group,
> @@ -563,7 +582,13 @@ static bool access_gic_sre(struct kvm_vcpu *vcpu,
>  	if (p->is_write)
>  		return ignore_write(vcpu, p);
>  
> -	p->regval = vcpu->arch.vgic_cpu.vgic_v3.vgic_sre;
> +	if (p->Op1 == 4) {	/* ICC_SRE_EL2 */
> +		p->regval = (ICC_SRE_EL2_ENABLE | ICC_SRE_EL2_SRE |
> +			     ICC_SRE_EL1_DIB | ICC_SRE_EL1_DFB);
> +	} else {		/* ICC_SRE_EL1 */
> +		p->regval = vcpu->arch.vgic_cpu.vgic_v3.vgic_sre;
> +	}
> +
>  	return true;
>  }
>  
> @@ -1793,6 +1818,122 @@ static bool access_id_aa64pfr0_el1(struct kvm_vcpu *v,
>  	return true;
>  }
>  
> +static bool access_gic_apr(struct kvm_vcpu *vcpu,
> +			   struct sys_reg_params *p,
> +			   const struct sys_reg_desc *r)
> +{
> +	struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.nested_vgic_v3;
> +	u32 index, *base;
> +
> +	index = r->Op2;
> +	if (r->CRm == 8)
> +		base = cpu_if->vgic_ap0r;
> +	else
> +		base = cpu_if->vgic_ap1r;
> +
> +	if (p->is_write)
> +		base[index] = p->regval;
> +	else
> +		p->regval = base[index];
> +
> +	return true;
> +}
> +
> +static bool access_gic_hcr(struct kvm_vcpu *vcpu,
> +			   struct sys_reg_params *p,
> +			   const struct sys_reg_desc *r)
> +{
> +	struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.nested_vgic_v3;
> +
> +	if (p->is_write)
> +		cpu_if->vgic_hcr = p->regval;

This is probably because there's only enough NV support to run an L1 KVM
hypervisor + an L2 guest, but the L1 guest's ICH_HCR_EL2 value is written to the
register unmodified in vgic_v3_load, and there's no support for forwarding the
traps that can be configured via ICH_HCR_EL2 (or even for handling some of them
- ICV_CTLR_EL1 can be trapped when ICH_HCR_EL2.TC = 1).

> +	else
> +		p->regval = cpu_if->vgic_hcr;
> +
> +	return true;
> +}
> +
> +static bool access_gic_vtr(struct kvm_vcpu *vcpu,
> +			   struct sys_reg_params *p,
> +			   const struct sys_reg_desc *r)
> +{
> +	if (p->is_write)
> +		return write_to_read_only(vcpu, p, r);
> +
> +	p->regval = kvm_vgic_global_state.ich_vtr_el2;
> +
> +	return true;
> +}
> +
> +static bool access_gic_misr(struct kvm_vcpu *vcpu,
> +			    struct sys_reg_params *p,
> +			    const struct sys_reg_desc *r)
> +{
> +	if (p->is_write)
> +		return write_to_read_only(vcpu, p, r);
> +
> +	p->regval = vgic_v3_get_misr(vcpu);
> +
> +	return true;
> +}
> +
> +static bool access_gic_eisr(struct kvm_vcpu *vcpu,
> +			    struct sys_reg_params *p,
> +			    const struct sys_reg_desc *r)
> +{
> +	if (p->is_write)
> +		return write_to_read_only(vcpu, p, r);
> +
> +	p->regval = vgic_v3_get_eisr(vcpu);
> +
> +	return true;
> +}
> +
> +static bool access_gic_elrsr(struct kvm_vcpu *vcpu,
> +			     struct sys_reg_params *p,
> +			     const struct sys_reg_desc *r)
> +{
> +	if (p->is_write)
> +		return write_to_read_only(vcpu, p, r);
> +
> +	p->regval = vgic_v3_get_elrsr(vcpu);
> +
> +	return true;
> +}
> +
> +static bool access_gic_vmcr(struct kvm_vcpu *vcpu,
> +			    struct sys_reg_params *p,
> +			    const struct sys_reg_desc *r)
> +{
> +	struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.nested_vgic_v3;
> +
> +	if (p->is_write)
> +		cpu_if->vgic_vmcr = p->regval;
> +	else
> +		p->regval = cpu_if->vgic_vmcr;
> +
> +	return true;
> +}
> +
> +static bool access_gic_lr(struct kvm_vcpu *vcpu,
> +			  struct sys_reg_params *p,
> +			  const struct sys_reg_desc *r)
> +{
> +	struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.nested_vgic_v3;
> +	u32 index;
> +
> +	index = p->Op2;
> +	if (p->CRm == 13)
> +		index += 8;
> +
> +	if (p->is_write)
> +		cpu_if->vgic_lr[index] = p->regval;
> +	else
> +		p->regval = cpu_if->vgic_lr[index];
> +
> +	return true;
> +}
> +
>  /*
>   * Architected system registers.
>   * Important: Must be sorted ascending by Op0, Op1, CRn, CRm, Op2
> @@ -2123,6 +2264,41 @@ static const struct sys_reg_desc sys_reg_descs[] = {
>  	{ SYS_DESC(SYS_RMR_EL2), access_rw, reset_val, RMR_EL2, 0 },
>  	{ SYS_DESC(SYS_VDISR_EL2), trap_undef },
>  
> +	{ SYS_DESC(SYS_ICH_AP0R0_EL2), access_gic_apr },
> +	{ SYS_DESC(SYS_ICH_AP0R1_EL2), access_gic_apr },
> +	{ SYS_DESC(SYS_ICH_AP0R2_EL2), access_gic_apr },
> +	{ SYS_DESC(SYS_ICH_AP0R3_EL2), access_gic_apr },
> +	{ SYS_DESC(SYS_ICH_AP1R0_EL2), access_gic_apr },
> +	{ SYS_DESC(SYS_ICH_AP1R1_EL2), access_gic_apr },
> +	{ SYS_DESC(SYS_ICH_AP1R2_EL2), access_gic_apr },
> +	{ SYS_DESC(SYS_ICH_AP1R3_EL2), access_gic_apr },
> +
> +	{ SYS_DESC(SYS_ICC_SRE_EL2), access_gic_sre },
> +
> +	{ SYS_DESC(SYS_ICH_HCR_EL2), access_gic_hcr },
> +	{ SYS_DESC(SYS_ICH_VTR_EL2), access_gic_vtr },
> +	{ SYS_DESC(SYS_ICH_MISR_EL2), access_gic_misr },
> +	{ SYS_DESC(SYS_ICH_EISR_EL2), access_gic_eisr },
> +	{ SYS_DESC(SYS_ICH_ELRSR_EL2), access_gic_elrsr },
> +	{ SYS_DESC(SYS_ICH_VMCR_EL2), access_gic_vmcr },
> +
> +	{ SYS_DESC(SYS_ICH_LR0_EL2), access_gic_lr },
> +	{ SYS_DESC(SYS_ICH_LR1_EL2), access_gic_lr },
> +	{ SYS_DESC(SYS_ICH_LR2_EL2), access_gic_lr },
> +	{ SYS_DESC(SYS_ICH_LR3_EL2), access_gic_lr },
> +	{ SYS_DESC(SYS_ICH_LR4_EL2), access_gic_lr },
> +	{ SYS_DESC(SYS_ICH_LR5_EL2), access_gic_lr },
> +	{ SYS_DESC(SYS_ICH_LR6_EL2), access_gic_lr },
> +	{ SYS_DESC(SYS_ICH_LR7_EL2), access_gic_lr },
> +	{ SYS_DESC(SYS_ICH_LR8_EL2), access_gic_lr },
> +	{ SYS_DESC(SYS_ICH_LR9_EL2), access_gic_lr },
> +	{ SYS_DESC(SYS_ICH_LR10_EL2), access_gic_lr },
> +	{ SYS_DESC(SYS_ICH_LR11_EL2), access_gic_lr },
> +	{ SYS_DESC(SYS_ICH_LR12_EL2), access_gic_lr },
> +	{ SYS_DESC(SYS_ICH_LR13_EL2), access_gic_lr },
> +	{ SYS_DESC(SYS_ICH_LR14_EL2), access_gic_lr },
> +	{ SYS_DESC(SYS_ICH_LR15_EL2), access_gic_lr },
> +
>  	{ SYS_DESC(SYS_CONTEXTIDR_EL2), access_rw, reset_val, CONTEXTIDR_EL2, 0 },
>  	{ SYS_DESC(SYS_TPIDR_EL2), access_rw, reset_val, TPIDR_EL2, 0 },
>  
> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
> index 163b132e100e..707fbe627155 100644
> --- a/include/kvm/arm_vgic.h
> +++ b/include/kvm/arm_vgic.h
> @@ -310,6 +310,15 @@ struct vgic_cpu {
>  
>  	struct vgic_irq private_irqs[VGIC_NR_PRIVATE_IRQS];
>  
> +	/* CPU vif control registers for the virtual GICH interface */
> +	struct vgic_v3_cpu_if	nested_vgic_v3;
> +
> +	/*
> +	 * The shadow vif control register loaded to the hardware when
> +	 * running a nested L2 guest with the virtual IMO/FMO bit set.
> +	 */
> +	struct vgic_v3_cpu_if	shadow_vgic_v3;
> +
>  	raw_spinlock_t ap_list_lock;	/* Protects the ap_list */
>  
>  	/*
> @@ -366,6 +375,13 @@ int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu);
>  void kvm_vgic_load(struct kvm_vcpu *vcpu);
>  void kvm_vgic_put(struct kvm_vcpu *vcpu);
>  
> +void vgic_v3_load_nested(struct kvm_vcpu *vcpu);
> +void vgic_v3_put_nested(struct kvm_vcpu *vcpu);
> +void vgic_v3_handle_nested_maint_irq(struct kvm_vcpu *vcpu);
> +u16 vgic_v3_get_eisr(struct kvm_vcpu *vcpu);
> +u16 vgic_v3_get_elrsr(struct kvm_vcpu *vcpu);
> +u64 vgic_v3_get_misr(struct kvm_vcpu *vcpu);
> +
>  #define irqchip_in_kernel(k)	(!!((k)->arch.vgic.in_kernel))
>  #define vgic_initialized(k)	((k)->arch.vgic.initialized)
>  #define vgic_ready(k)		((k)->arch.vgic.ready)
> @@ -411,4 +427,6 @@ int kvm_vgic_v4_unset_forwarding(struct kvm *kvm, int irq,
>  void kvm_vgic_v4_enable_doorbell(struct kvm_vcpu *vcpu);
>  void kvm_vgic_v4_disable_doorbell(struct kvm_vcpu *vcpu);
>  
> +bool vgic_state_is_nested(struct kvm_vcpu *vcpu);
> +
>  #endif /* __KVM_ARM_VGIC_H */
> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> index ca10a11e044e..ddcab58ae440 100644
> --- a/virt/kvm/arm/arm.c
> +++ b/virt/kvm/arm/arm.c
> @@ -634,6 +634,9 @@ static void check_vcpu_requests(struct kvm_vcpu *vcpu)
>  		 * that a VCPU sees new virtual interrupts.
>  		 */
>  		kvm_check_request(KVM_REQ_IRQ_PENDING, vcpu);
> +
> +		if (kvm_check_request(KVM_REQ_GUEST_HYP_IRQ_PENDING, vcpu))
> +			kvm_inject_nested_irq(vcpu);
>  	}
>  }
>  
> @@ -680,10 +683,10 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  		 */
>  		cond_resched();
>  
> -		update_vmid(&vcpu->arch.hw_mmu->vmid);
> -
>  		check_vcpu_requests(vcpu);
>  
> +		update_vmid(&vcpu->arch.hw_mmu->vmid);

Was this change made to prevent having an mmu with a valid vmid_gen that was
never actually run? Or something else entirely?

> +
>  		/*
>  		 * Preparing the interrupts to be injected also
>  		 * involves poking the GIC, which must be done in a
> diff --git a/virt/kvm/arm/vgic/vgic-v3-nested.c b/virt/kvm/arm/vgic/vgic-v3-nested.c
> new file mode 100644
> index 000000000000..6fb81dfbb679
> --- /dev/null
> +++ b/virt/kvm/arm/vgic/vgic-v3-nested.c
> @@ -0,0 +1,177 @@
> +#include <linux/cpu.h>
> +#include <linux/kvm.h>
> +#include <linux/kvm_host.h>
> +#include <linux/interrupt.h>
> +#include <linux/io.h>
> +#include <linux/uaccess.h>
> +
> +#include <linux/irqchip/arm-gic-v3.h>
> +
> +#include <asm/kvm_emulate.h>
> +#include <asm/kvm_arm.h>
> +#include <kvm/arm_vgic.h>
> +
> +#include "vgic.h"
> +
> +static inline struct vgic_v3_cpu_if *vcpu_nested_if(struct kvm_vcpu *vcpu)
> +{
> +	return &vcpu->arch.vgic_cpu.nested_vgic_v3;
> +}

Not especially relevant at this stage, but the nested_vgic_v3 member is accessed
in several other places in sys_regs.c and vgic-v3.c. Perhaps this function could
be moved to include/kvm/arm_vgic.h in a future revision.
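
For illustration (hypothetical placement, and modulo the usual header ordering
issues around kvm_host.h), the shared helper would just be:

	static inline struct vgic_v3_cpu_if *vcpu_nested_if(struct kvm_vcpu *vcpu)
	{
		return &vcpu->arch.vgic_cpu.nested_vgic_v3;
	}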

> +
> +static inline struct vgic_v3_cpu_if *vcpu_shadow_if(struct kvm_vcpu *vcpu)
> +{
> +	return &vcpu->arch.vgic_cpu.shadow_vgic_v3;
> +}
> +
> +static inline bool lr_triggers_eoi(u64 lr)
> +{
> +	return !(lr & (ICH_LR_STATE | ICH_LR_HW)) && (lr & ICH_LR_EOI);
> +}
> +
> +u16 vgic_v3_get_eisr(struct kvm_vcpu *vcpu)
> +{
> +	struct vgic_v3_cpu_if *cpu_if = vcpu_nested_if(vcpu);
> +	u16 reg = 0;
> +	int i;
> +
> +	for (i = 0; i < kvm_vgic_global_state.nr_lr; i++) {
> +		if (lr_triggers_eoi(cpu_if->vgic_lr[i]))
> +			reg |= BIT(i);
> +	}
> +
> +	return reg;
> +}
> +
> +u16 vgic_v3_get_elrsr(struct kvm_vcpu *vcpu)
> +{
> +	struct vgic_v3_cpu_if *cpu_if = vcpu_nested_if(vcpu);
> +	u16 reg = 0;
> +	int i;
> +
> +	for (i = 0; i < kvm_vgic_global_state.nr_lr; i++) {
> +		if (!(cpu_if->vgic_lr[i] & ICH_LR_STATE))
> +			reg |= BIT(i);
> +	}
> +
> +	return reg;
> +}
> +
> +u64 vgic_v3_get_misr(struct kvm_vcpu *vcpu)
> +{
> +	struct vgic_v3_cpu_if *cpu_if = vcpu_nested_if(vcpu);
> +	int nr_lr = kvm_vgic_global_state.nr_lr;
> +	u64 reg = 0;
> +
> +	if (vgic_v3_get_eisr(vcpu))
> +		reg |= ICH_MISR_EOI;
> +
> +	if (cpu_if->vgic_hcr & ICH_HCR_UIE) {
> +		int used_lrs;
> +
> +		used_lrs = nr_lr - hweight16(vgic_v3_get_elrsr(vcpu));
> +		if (used_lrs <= 1)
> +			reg |= ICH_MISR_U;
> +	}
> +
> +	/* TODO: Support remaining bits in this register */
> +	return reg;
> +}
> +
> +/*
> + * For LRs which have HW bit set such as timer interrupts, we modify them to
> + * have the host hardware interrupt number instead of the virtual one programmed
> + * by the guest hypervisor.
> + */
> +static void vgic_v3_create_shadow_lr(struct kvm_vcpu *vcpu)
> +{
> +	struct vgic_v3_cpu_if *cpu_if = vcpu_nested_if(vcpu);
> +	struct vgic_v3_cpu_if *s_cpu_if = vcpu_shadow_if(vcpu);
> +	struct vgic_irq *irq;
> +	int i;
> +
> +	for (i = 0; i < kvm_vgic_global_state.nr_lr; i++) {
> +		u64 lr = cpu_if->vgic_lr[i];
> +		int l1_irq;
> +
> +		if (!(lr & ICH_LR_HW))
> +			goto next;
> +
> +		/* We have the HW bit set */
> +		l1_irq = (lr & ICH_LR_PHYS_ID_MASK) >> ICH_LR_PHYS_ID_SHIFT;
> +		irq = vgic_get_irq(vcpu->kvm, vcpu, l1_irq);
> +
> +		if (!irq || !irq->hw) {
> +			/* There was no real mapping, so nuke the HW bit */
> +			lr &= ~ICH_LR_HW;
> +			if (irq)
> +				vgic_put_irq(vcpu->kvm, irq);
> +			goto next;
> +		}
> +
> +		/* Translate the virtual mapping to the real one */
> +		lr &= ~ICH_LR_EOI; /* Why? */
> +		lr &= ~ICH_LR_PHYS_ID_MASK;
> +		lr |= (u64)irq->hwintid << ICH_LR_PHYS_ID_SHIFT;
> +		vgic_put_irq(vcpu->kvm, irq);
> +
> +next:
> +		s_cpu_if->vgic_lr[i] = lr;
> +	}
> +
> +	s_cpu_if->used_lrs = kvm_vgic_global_state.nr_lr;
> +}
> +
> +/*
> + * Change the shadow HWIRQ field back to the virtual value before copying over
> + * the entire shadow struct to the nested state.
> + */
> +static void vgic_v3_fixup_shadow_lr_state(struct kvm_vcpu *vcpu)
> +{
> +	struct vgic_v3_cpu_if *cpu_if = vcpu_nested_if(vcpu);
> +	struct vgic_v3_cpu_if *s_cpu_if = vcpu_shadow_if(vcpu);
> +	int lr;
> +
> +	for (lr = 0; lr < kvm_vgic_global_state.nr_lr; lr++) {
> +		s_cpu_if->vgic_lr[lr] &= ~ICH_LR_PHYS_ID_MASK;
> +		s_cpu_if->vgic_lr[lr] |= cpu_if->vgic_lr[lr] & ICH_LR_PHYS_ID_MASK;
> +	}
> +}
> +
> +void vgic_v3_load_nested(struct kvm_vcpu *vcpu)
> +{
> +	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
> +
> +	vgic_cpu->shadow_vgic_v3 = vgic_cpu->nested_vgic_v3;
> +	vgic_v3_create_shadow_lr(vcpu);
> +	__vgic_v3_restore_state(vcpu_shadow_if(vcpu));
> +}
> +
> +void vgic_v3_put_nested(struct kvm_vcpu *vcpu)
> +{
> +	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
> +
> +	__vgic_v3_save_state(vcpu_shadow_if(vcpu));
> +
> +	/*
> +	 * Translate the shadow state HW fields back to the virtual ones
> +	 * before copying the shadow struct back to the nested one.
> +	 */
> +	vgic_v3_fixup_shadow_lr_state(vcpu);
> +	vgic_cpu->nested_vgic_v3 = vgic_cpu->shadow_vgic_v3;
> +}
> +
> +void vgic_v3_handle_nested_maint_irq(struct kvm_vcpu *vcpu)
> +{
> +	struct vgic_v3_cpu_if *cpu_if = vcpu_nested_if(vcpu);
> +
> +	/*
> +	 * If we exit a nested VM with a pending maintenance interrupt from the
> +	 * GIC, then we need to forward this to the guest hypervisor so that it
> +	 * can re-sync the appropriate LRs and sample level triggered interrupts
> +	 * again.
> +	 */
> +	if (vgic_state_is_nested(vcpu) &&
> +	    (cpu_if->vgic_hcr & ICH_HCR_EN) &&
> +	    vgic_v3_get_misr(vcpu))
> +		kvm_inject_nested_irq(vcpu);
> +}

I don't see this function used anywhere; shouldn't it be part of #53 "KVM:
arm64: nv: Implement maintenance interrupt forwarding"?

> diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-v3.c
> index 77d23e817756..25edf32c28fb 100644
> --- a/virt/kvm/arm/vgic/vgic-v3.c
> +++ b/virt/kvm/arm/vgic/vgic-v3.c
> @@ -18,6 +18,7 @@
>  #include <kvm/arm_vgic.h>
>  #include <asm/kvm_hyp.h>
>  #include <asm/kvm_mmu.h>
> +#include <asm/kvm_nested.h>
>  #include <asm/kvm_asm.h>
>  
>  #include "vgic.h"
> @@ -298,6 +299,12 @@ void vgic_v3_enable(struct kvm_vcpu *vcpu)
>  		vgic_v3->vgic_sre = (ICC_SRE_EL1_DIB |
>  				     ICC_SRE_EL1_DFB |
>  				     ICC_SRE_EL1_SRE);
> +		/*
> +		 * If nesting is allowed, force GICv3 onto the nested
> +		 * guests as well.
> +		 */
> +		if (nested_virt_in_use(vcpu))
> +			vcpu->arch.vgic_cpu.nested_vgic_v3.vgic_sre = vgic_v3->vgic_sre;
>  		vcpu->arch.vgic_cpu.pendbaser = INITIAL_PENDBASER_VALUE;
>  	} else {
>  		vgic_v3->vgic_sre = 0;
> @@ -660,6 +667,13 @@ void vgic_v3_load(struct kvm_vcpu *vcpu)
>  {
>  	struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
>  
> +	/*
> +	 * vgic_v3_load_nested only affects the LRs in the shadow
> +	 * state, so it is fine to pass the nested state around.
> +	 */
> +	if (vgic_state_is_nested(vcpu))
> +		cpu_if = &vcpu->arch.vgic_cpu.nested_vgic_v3;
> +
>  	/*
>  	 * If dealing with a GICv2 emulation on GICv3, VMCR_EL2.VFIQen
>  	 * is dependent on ICC_SRE_EL1.SRE, and we have to perform the
> @@ -672,12 +686,18 @@ void vgic_v3_load(struct kvm_vcpu *vcpu)
>  
>  	if (has_vhe())
>  		__vgic_v3_activate_traps(cpu_if);
> +
> +	if (vgic_state_is_nested(vcpu))
> +		vgic_v3_load_nested(vcpu);
>  }
>  
>  void vgic_v3_put(struct kvm_vcpu *vcpu)
>  {
>  	struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
>  
> +	if (vgic_state_is_nested(vcpu))
> +		cpu_if = &vcpu->arch.vgic_cpu.shadow_vgic_v3;
> +
>  	if (likely(cpu_if->vgic_sre))
>  		cpu_if->vgic_vmcr = kvm_call_hyp_ret(__vgic_v3_read_vmcr);
>  
> @@ -685,4 +705,12 @@ void vgic_v3_put(struct kvm_vcpu *vcpu)
>  
>  	if (has_vhe())
>  		__vgic_v3_deactivate_traps(cpu_if);
> +
> +	if (vgic_state_is_nested(vcpu))
> +		vgic_v3_put_nested(vcpu);
>  }
> +
> +__weak void vgic_v3_sync_nested(struct kvm_vcpu *vcpu) {}
> +__weak void vgic_v3_handle_nested_maint_irq(struct kvm_vcpu *vcpu) {}
> +__weak void vgic_v3_load_nested(struct kvm_vcpu *vcpu) {}
> +__weak void vgic_v3_put_nested(struct kvm_vcpu *vcpu) {}
> diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
> index 6953aefecbb6..f32f49b0c803 100644
> --- a/virt/kvm/arm/vgic/vgic.c
> +++ b/virt/kvm/arm/vgic/vgic.c
> @@ -872,6 +872,10 @@ void kvm_vgic_sync_hwstate(struct kvm_vcpu *vcpu)
>  {
>  	int used_lrs;
>  
> +	/* If nesting, this is a load/put affair, not flush/sync. */
> +	if (vgic_state_is_nested(vcpu))
> +		return;
> +
>  	WARN_ON(vgic_v4_sync_hwstate(vcpu));
>  
>  	/* An empty ap_list_head implies used_lrs == 0 */
> @@ -920,6 +924,29 @@ void kvm_vgic_flush_hwstate(struct kvm_vcpu *vcpu)
>  	    !vgic_supports_direct_msis(vcpu->kvm))
>  		return;
>  
> +	/*
> +	 * If in a nested state, we must return early. Two possibilities:
> +	 *
> +	 * - If we have any pending IRQ for the guest and the guest
> +	 *   expects IRQs to be handled in its virtual EL2 mode (the
> +	 *   virtual IMO bit is set) and it is not already running in
> +	 *   virtual EL2 mode, then we have to emulate an IRQ
> +	 *   exception to virtual EL2.
> +	 *
> +	 *   We do that by placing a request to ourselves which will
> +	 *   abort the entry procedure and inject the exception at the
> +	 *   beginning of the run loop.
> +	 *
> +	 * - Otherwise, do exactly *NOTHING*. The guest state is
> +	 *   already loaded, and we can carry on with running it.
> +	 */
> +	if (vgic_state_is_nested(vcpu)) {
> +		if (kvm_vgic_vcpu_pending_irq(vcpu))
> +			kvm_make_request(KVM_REQ_GUEST_HYP_IRQ_PENDING, vcpu);
> +
> +		return;
> +	}
> +
>  	DEBUG_SPINLOCK_BUG_ON(!irqs_disabled());
>  
>  	if (!list_empty(&vcpu->arch.vgic_cpu.ap_list_head)) {
> @@ -1022,3 +1049,8 @@ bool kvm_vgic_map_is_active(struct kvm_vcpu *vcpu, unsigned int vintid)
>  
>  	return map_is_active;
>  }
> +
> +__weak bool vgic_state_is_nested(struct kvm_vcpu *vcpu)
> +{
> +	return false;
> +}


* Re: [PATCH 53/59] KVM: arm64: nv: Implement maintenance interrupt forwarding
  2019-06-21  9:38 ` [PATCH 53/59] KVM: arm64: nv: Implement maintenance interrupt forwarding Marc Zyngier
  2019-07-04  8:06   ` Julien Thierry
@ 2019-07-16 16:35   ` Alexandru Elisei
  1 sibling, 0 replies; 176+ messages in thread
From: Alexandru Elisei @ 2019-07-16 16:35 UTC (permalink / raw)
  To: Marc Zyngier, linux-arm-kernel, kvmarm, kvm; +Cc: Andre Przywara, Dave Martin

On 6/21/19 10:38 AM, Marc Zyngier wrote:
> When we take a maintenance interrupt, we need to decide whether
> it is generated on an action from the guest, or if it is something
> that needs to be forwarded to the guest hypervisor.
>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  arch/arm64/kvm/nested.c            |  2 +-
>  virt/kvm/arm/vgic/vgic-init.c      | 30 ++++++++++++++++++++++++++++++
>  virt/kvm/arm/vgic/vgic-v3-nested.c | 25 +++++++++++++++++++++----
>  3 files changed, 52 insertions(+), 5 deletions(-)
>
> diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
> index df2db9ab7cfb..ab61f0f30ee6 100644
> --- a/arch/arm64/kvm/nested.c
> +++ b/arch/arm64/kvm/nested.c
> @@ -545,7 +545,7 @@ bool vgic_state_is_nested(struct kvm_vcpu *vcpu)
>  	bool imo = __vcpu_sys_reg(vcpu, HCR_EL2) & HCR_IMO;
>  	bool fmo = __vcpu_sys_reg(vcpu, HCR_EL2) & HCR_FMO;
>  
> -	WARN(imo != fmo, "Separate virtual IRQ/FIQ settings not supported\n");
> +	WARN_ONCE(imo != fmo, "Separate virtual IRQ/FIQ settings not supported\n");
>  
>  	return nested_virt_in_use(vcpu) && imo && fmo && !is_hyp_ctxt(vcpu);
>  }
> diff --git a/virt/kvm/arm/vgic/vgic-init.c b/virt/kvm/arm/vgic/vgic-init.c
> index 3bdb31eaed64..ec54bc8d5126 100644
> --- a/virt/kvm/arm/vgic/vgic-init.c
> +++ b/virt/kvm/arm/vgic/vgic-init.c
> @@ -17,9 +17,11 @@
>  #include <linux/uaccess.h>
>  #include <linux/interrupt.h>
>  #include <linux/cpu.h>
> +#include <linux/irq.h>
>  #include <linux/kvm_host.h>
>  #include <kvm/arm_vgic.h>
>  #include <asm/kvm_mmu.h>
> +#include <asm/kvm_nested.h>
>  #include "vgic.h"
>  
>  /*
> @@ -240,6 +242,16 @@ int kvm_vgic_vcpu_init(struct kvm_vcpu *vcpu)
>  	if (!irqchip_in_kernel(vcpu->kvm))
>  		return 0;
>  
> +	if (nested_virt_in_use(vcpu)) {
> +		/* FIXME: remove this hack */
> +		if (vcpu->kvm->arch.vgic.maint_irq == 0)
> +			vcpu->kvm->arch.vgic.maint_irq = kvm_vgic_global_state.maint_irq;
> +		ret = kvm_vgic_set_owner(vcpu, vcpu->kvm->arch.vgic.maint_irq,
> +					 vcpu);
> +		if (ret)
> +			return ret;
> +	}
> +
>  	/*
>  	 * If we are creating a VCPU with a GICv3 we must also register the
>  	 * KVM io device for the redistributor that belongs to this VCPU.
> @@ -455,12 +467,23 @@ static int vgic_init_cpu_dying(unsigned int cpu)
>  
>  static irqreturn_t vgic_maintenance_handler(int irq, void *data)
>  {
> +	struct kvm_vcpu *vcpu = *(struct kvm_vcpu **)data;
> +
>  	/*
>  	 * We cannot rely on the vgic maintenance interrupt to be
>  	 * delivered synchronously. This means we can only use it to
>  	 * exit the VM, and we perform the handling of EOIed
>  	 * interrupts on the exit path (see vgic_fold_lr_state).
>  	 */
> +
> +	/* If not nested, deactivate */
> +	if (!vcpu || !vgic_state_is_nested(vcpu)) {
> +		irq_set_irqchip_state(irq, IRQCHIP_STATE_ACTIVE, false);
> +		return IRQ_HANDLED;
> +	}
> +
> +	/* Assume nested from now */
> +	vgic_v3_handle_nested_maint_irq(vcpu);
>  	return IRQ_HANDLED;
>  }
>  
> @@ -531,6 +554,13 @@ int kvm_vgic_hyp_init(void)
>  		return ret;
>  	}
>  
> +	ret = irq_set_vcpu_affinity(kvm_vgic_global_state.maint_irq,
> +				    kvm_get_running_vcpus());
> +	if (ret) {
> +		kvm_err("Error setting vcpu affinity\n");
> +		goto out_free_irq;
> +	}
> +
>  	ret = cpuhp_setup_state(CPUHP_AP_KVM_ARM_VGIC_INIT_STARTING,
>  				"kvm/arm/vgic:starting",
>  				vgic_init_cpu_starting, vgic_init_cpu_dying);
> diff --git a/virt/kvm/arm/vgic/vgic-v3-nested.c b/virt/kvm/arm/vgic/vgic-v3-nested.c
> index c917d49e4a14..7c5f82ae68e0 100644
> --- a/virt/kvm/arm/vgic/vgic-v3-nested.c
> +++ b/virt/kvm/arm/vgic/vgic-v3-nested.c
> @@ -172,10 +172,20 @@ void vgic_v3_sync_nested(struct kvm_vcpu *vcpu)
>  void vgic_v3_load_nested(struct kvm_vcpu *vcpu)
>  {
>  	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
> +	struct vgic_irq *irq;
> +	unsigned long flags;
>  
>  	vgic_cpu->shadow_vgic_v3 = vgic_cpu->nested_vgic_v3;
>  	vgic_v3_create_shadow_lr(vcpu);
>  	__vgic_v3_restore_state(vcpu_shadow_if(vcpu));
> +
> +	irq = vgic_get_irq(vcpu->kvm, vcpu, vcpu->kvm->arch.vgic.maint_irq);
> +	raw_spin_lock_irqsave(&irq->irq_lock, flags);
> +	if (irq->line_level || irq->active)
> +		irq_set_irqchip_state(kvm_vgic_global_state.maint_irq,
> +				      IRQCHIP_STATE_ACTIVE, true);
> +	raw_spin_unlock_irqrestore(&irq->irq_lock, flags);
> +	vgic_put_irq(vcpu->kvm, irq);
>  }
>  
>  void vgic_v3_put_nested(struct kvm_vcpu *vcpu)
> @@ -190,11 +200,14 @@ void vgic_v3_put_nested(struct kvm_vcpu *vcpu)
>  	 */
>  	vgic_v3_fixup_shadow_lr_state(vcpu);
>  	vgic_cpu->nested_vgic_v3 = vgic_cpu->shadow_vgic_v3;
> +	irq_set_irqchip_state(kvm_vgic_global_state.maint_irq,
> +			      IRQCHIP_STATE_ACTIVE, false);
>  }
>  
>  void vgic_v3_handle_nested_maint_irq(struct kvm_vcpu *vcpu)
>  {
>  	struct vgic_v3_cpu_if *cpu_if = vcpu_nested_if(vcpu);
> +	bool state;
>  
>  	/*
>  	 * If we exit a nested VM with a pending maintenance interrupt from the
> @@ -202,8 +215,12 @@ void vgic_v3_handle_nested_maint_irq(struct kvm_vcpu *vcpu)
>  	 * can re-sync the appropriate LRs and sample level triggered interrupts
>  	 * again.
>  	 */
> -	if (vgic_state_is_nested(vcpu) &&
> -	    (cpu_if->vgic_hcr & ICH_HCR_EN) &&
> -	    vgic_v3_get_misr(vcpu))
> -		kvm_inject_nested_irq(vcpu);
> +	if (!vgic_state_is_nested(vcpu))
> +		return;

Isn't this redundant with the same check in vgic_maintenance_handler?

Thanks,
Alex
> +
> +	state  = cpu_if->vgic_hcr & ICH_HCR_EN;
> +	state &= vgic_v3_get_misr(vcpu);
> +
> +	kvm_vgic_inject_irq(vcpu->kvm, vcpu->vcpu_id,
> +			    vcpu->kvm->arch.vgic.maint_irq, state, vcpu);
>  }


* Re: [PATCH 55/59] arm64: KVM: nv: Add handling of EL2-specific timer registers
  2019-06-21  9:38 ` [PATCH 55/59] arm64: KVM: nv: Add handling of EL2-specific timer registers Marc Zyngier
  2019-07-11 12:35   ` Alexandru Elisei
@ 2019-07-17 10:19   ` Alexandru Elisei
  1 sibling, 0 replies; 176+ messages in thread
From: Alexandru Elisei @ 2019-07-17 10:19 UTC (permalink / raw)
  To: Marc Zyngier, linux-arm-kernel, kvmarm, kvm; +Cc: Andre Przywara, Dave Martin

On 6/21/19 10:38 AM, Marc Zyngier wrote:
> Add the required handling for EL2 and EL02 registers, as
> well as EL1 registers used in the E2H context.

Shouldn't that be "[..] EL0 registers used in E2H context"?

Thanks,
Alex
>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  arch/arm64/kvm/sys_regs.c | 72 +++++++++++++++++++++++++++++++++++++++
>  1 file changed, 72 insertions(+)
>
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index ba3bcd29c02d..0bd6a66b621e 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -1361,20 +1361,92 @@ static bool access_arch_timer(struct kvm_vcpu *vcpu,
>  
>  	switch (reg) {
>  	case SYS_CNTP_TVAL_EL0:
> +		if (vcpu_mode_el2(vcpu) && vcpu_el2_e2h_is_set(vcpu))
> +			tmr = TIMER_HPTIMER;
> +		else
> +			tmr = TIMER_PTIMER;
> +		treg = TIMER_REG_TVAL;
> +		break;
> +
>  	case SYS_AARCH32_CNTP_TVAL:
> +	case SYS_CNTP_TVAL_EL02:
>  		tmr = TIMER_PTIMER;
>  		treg = TIMER_REG_TVAL;
>  		break;
> +
> +	case SYS_CNTV_TVAL_EL02:
> +		tmr = TIMER_VTIMER;
> +		treg = TIMER_REG_TVAL;
> +		break;
> +
> +	case SYS_CNTHP_TVAL_EL2:
> +		tmr = TIMER_HPTIMER;
> +		treg = TIMER_REG_TVAL;
> +		break;
> +
> +	case SYS_CNTHV_TVAL_EL2:
> +		tmr = TIMER_HVTIMER;
> +		treg = TIMER_REG_TVAL;
> +		break;
> +
>  	case SYS_CNTP_CTL_EL0:
> +		if (vcpu_mode_el2(vcpu) && vcpu_el2_e2h_is_set(vcpu))
> +			tmr = TIMER_HPTIMER;
> +		else
> +			tmr = TIMER_PTIMER;
> +		treg = TIMER_REG_CTL;
> +		break;
> +
>  	case SYS_AARCH32_CNTP_CTL:
> +	case SYS_CNTP_CTL_EL02:
>  		tmr = TIMER_PTIMER;
>  		treg = TIMER_REG_CTL;
>  		break;
> +
> +	case SYS_CNTV_CTL_EL02:
> +		tmr = TIMER_VTIMER;
> +		treg = TIMER_REG_CTL;
> +		break;
> +
> +	case SYS_CNTHP_CTL_EL2:
> +		tmr = TIMER_HPTIMER;
> +		treg = TIMER_REG_CTL;
> +		break;
> +
> +	case SYS_CNTHV_CTL_EL2:
> +		tmr = TIMER_HVTIMER;
> +		treg = TIMER_REG_CTL;
> +		break;
> +		
>  	case SYS_CNTP_CVAL_EL0:
> +		if (vcpu_mode_el2(vcpu) && vcpu_el2_e2h_is_set(vcpu))
> +			tmr = TIMER_HPTIMER;
> +		else
> +			tmr = TIMER_PTIMER;
> +		treg = TIMER_REG_CVAL;
> +		break;
> +
>  	case SYS_AARCH32_CNTP_CVAL:
> +	case SYS_CNTP_CVAL_EL02:
>  		tmr = TIMER_PTIMER;
>  		treg = TIMER_REG_CVAL;
>  		break;
> +
> +	case SYS_CNTV_CVAL_EL02:
> +		tmr = TIMER_VTIMER;
> +		treg = TIMER_REG_CVAL;
> +		break;
> +
> +	case SYS_CNTHP_CVAL_EL2:
> +		tmr = TIMER_HPTIMER;
> +		treg = TIMER_REG_CVAL;
> +		break;
> +
> +	case SYS_CNTHV_CVAL_EL2:
> +		tmr = TIMER_HVTIMER;
> +		treg = TIMER_REG_CVAL;
> +		break;
> +
>  	case SYS_CNTVOFF_EL2:
>  		tmr = TIMER_VTIMER;
>  		treg = TIMER_REG_VOFF;


* Re: [PATCH 43/59] KVM: arm64: nv: Trap and emulate AT instructions from virtual EL2
  2019-07-09 13:20   ` Alexandru Elisei
@ 2019-07-18 12:13     ` Tomasz Nowicki
       [not found]       ` <6537c8d2-3bda-788e-8861-b70971a625cb@arm.com>
  0 siblings, 1 reply; 176+ messages in thread
From: Tomasz Nowicki @ 2019-07-18 12:13 UTC (permalink / raw)
  To: Alexandru Elisei, Marc Zyngier, linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Dave Martin

Hello Alex,

On 09.07.2019 15:20, Alexandru Elisei wrote:
> On 6/21/19 10:38 AM, Marc Zyngier wrote:
>> From: Jintack Lim <jintack.lim@linaro.org>
>>
>> When supporting nested virtualization a guest hypervisor executing AT
>> instructions must be trapped and emulated by the host hypervisor,
>> because untrapped AT instructions operating on S1E1 will use the wrong
>> translation regieme (the one used to emulate virtual EL2 in EL1 instead
> 
> I think that should be "regime".
> 
>> of virtual EL1) and AT instructions operating on S12 will not work from
>> EL1.
>>
>> This patch does several things.
>>
>> 1. List and define all AT system instructions to emulate and document
>> the emulation design.
>>
>> 2. Implement AT instruction handling logic in EL2. This will be used to
>> emulate AT instructions executed in the virtual EL2.
>>
>> AT instruction emulation works by loading the proper processor
>> context, which depends on the trapped instruction and the virtual
>> HCR_EL2, to the EL1 virtual memory control registers and executing AT
>> instructions. Note that ctxt->hw_sys_regs is expected to have the
>> proper processor context before calling the handling
>> function(__kvm_at_insn) implemented in this patch.
>>
>> 4. Emulate AT S1E[01] instructions by issuing the same instructions in
>> EL2. We set the physical EL1 registers, NV and NV1 bits as described in
>> the AT instruction emulation overview.
> 
> Is item number 3 missing, or is that the result of an unfortunate typo?
> 
>>
>> 5. Emulate AT A12E[01] instructions in two steps: First, do the stage-1
>> translation by reusing the existing AT emulation functions.  Second, do
>> the stage-2 translation by walking the guest hypervisor's stage-2 page
>> table in software. Record the translation result to PAR_EL1.
>>
>> 6. Emulate AT S1E2 instructions by issuing the corresponding S1E1
>> instructions in EL2. We set the physical EL1 registers and the HCR_EL2
>> register as described in the AT instruction emulation overview.
>>
>> 7. Forward system instruction traps to the virtual EL2 if the corresponding
>> virtual AT bit is set in the virtual HCR_EL2.
>>
>>    [ Much logic above has been reworked by Marc Zyngier ]
>>
>> Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
>> ---
>>   arch/arm64/include/asm/kvm_arm.h |   2 +
>>   arch/arm64/include/asm/kvm_asm.h |   2 +
>>   arch/arm64/include/asm/sysreg.h  |  17 +++
>>   arch/arm64/kvm/hyp/Makefile      |   1 +
>>   arch/arm64/kvm/hyp/at.c          | 217 +++++++++++++++++++++++++++++++
>>   arch/arm64/kvm/hyp/switch.c      |  13 +-
>>   arch/arm64/kvm/sys_regs.c        | 202 +++++++++++++++++++++++++++-
>>   7 files changed, 450 insertions(+), 4 deletions(-)
>>   create mode 100644 arch/arm64/kvm/hyp/at.c
>>

[...]

>> +
>> +void __kvm_at_s1e01(struct kvm_vcpu *vcpu, u32 op, u64 vaddr)
>> +{
>> +	struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
>> +	struct mmu_config config;
>> +	struct kvm_s2_mmu *mmu;
>> +
>> +	/*
>> +	 * We can only get here when trapping from vEL2, so we're
>> +	 * translating a guest guest VA.
>> +	 *
>> +	 * FIXME: Obtaining the S2 MMU for a a guest guest is horribly
>> +	 * racy, and we may not find it.
>> +	 */
>> +	spin_lock(&vcpu->kvm->mmu_lock);
>> +
>> +	mmu = lookup_s2_mmu(vcpu->kvm,
>> +			    vcpu_read_sys_reg(vcpu, VTTBR_EL2),
>> +			    vcpu_read_sys_reg(vcpu, HCR_EL2));
>  From ARM DDI 0487D.b, the description for AT S1E1R (page C5-467, it's the same
> for the other at s1e{0,1}* instructions):
> 
> [..] Performs stage 1 address translation, with permisions as if reading from
> the given virtual address from EL1, or from EL2 [..], using the following
> translation regime:
> - If HCR_EL2.{E2H,TGE} is {1, 1}, the EL2&0 translation regime, accessed from EL2.
> 
> If the guest is VHE, I don't think there's any need to switch mmus. The AT
> instruction will use the physical EL1&0 translation regime already on the
> hardware (assuming host HCR_EL2.TGE == 0), which is the vEL2&0 regime for the
> guest hypervisor.

Here we want to run AT for the L2 (guest guest) EL1&0 regime and not the L1
(guest hypervisor) one, so we have to look up and switch to the nested VM MMU
context. Or did I miss your point?

Thanks,
Tomasz


* Re: [PATCH 43/59] KVM: arm64: nv: Trap and emulate AT instructions from virtual EL2
       [not found]       ` <6537c8d2-3bda-788e-8861-b70971a625cb@arm.com>
@ 2019-07-18 12:59         ` Tomasz Nowicki
  0 siblings, 0 replies; 176+ messages in thread
From: Tomasz Nowicki @ 2019-07-18 12:59 UTC (permalink / raw)
  To: Alexandru Elisei, Marc Zyngier, linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Dave Martin

On 18.07.2019 14:36, Alexandru Elisei wrote:
> On 7/18/19 1:13 PM, Tomasz Nowicki wrote:
> 
>> Hello Alex,
>>
>> On 09.07.2019 15:20, Alexandru Elisei wrote:
>>> On 6/21/19 10:38 AM, Marc Zyngier wrote:
>>>> From: Jintack Lim<jintack.lim@linaro.org>
>>>>
>>>> When supporting nested virtualization a guest hypervisor executing AT
>>>> instructions must be trapped and emulated by the host hypervisor,
>>>> because untrapped AT instructions operating on S1E1 will use the wrong
>>>> translation regieme (the one used to emulate virtual EL2 in EL1 instead
>>> I think that should be "regime".
>>>
>>>> of virtual EL1) and AT instructions operating on S12 will not work from
>>>> EL1.
>>>>
>>>> This patch does several things.
>>>>
>>>> 1. List and define all AT system instructions to emulate and document
>>>> the emulation design.
>>>>
>>>> 2. Implement AT instruction handling logic in EL2. This will be used to
>>>> emulate AT instructions executed in the virtual EL2.
>>>>
>>>> AT instruction emulation works by loading the proper processor
>>>> context, which depends on the trapped instruction and the virtual
>>>> HCR_EL2, to the EL1 virtual memory control registers and executing AT
>>>> instructions. Note that ctxt->hw_sys_regs is expected to have the
>>>> proper processor context before calling the handling
>>>> function(__kvm_at_insn) implemented in this patch.
>>>>
>>>> 4. Emulate AT S1E[01] instructions by issuing the same instructions in
>>>> EL2. We set the physical EL1 registers, NV and NV1 bits as described in
>>>> the AT instruction emulation overview.
>>> Is item number 3 missing, or is that the result of an unfortunate typo?
>>>
>>>> 5. Emulate AT A12E[01] instructions in two steps: First, do the stage-1
>>>> translation by reusing the existing AT emulation functions.  Second, do
>>>> the stage-2 translation by walking the guest hypervisor's stage-2 page
>>>> table in software. Record the translation result to PAR_EL1.
>>>>
>>>> 6. Emulate AT S1E2 instructions by issuing the corresponding S1E1
>>>> instructions in EL2. We set the physical EL1 registers and the HCR_EL2
>>>> register as described in the AT instruction emulation overview.
>>>>
>>>> 7. Forward system instruction traps to the virtual EL2 if the corresponding
>>>> virtual AT bit is set in the virtual HCR_EL2.
>>>>
>>>>     [ Much logic above has been reworked by Marc Zyngier ]
>>>>
>>>> Signed-off-by: Jintack Lim<jintack.lim@linaro.org>
>>>> Signed-off-by: Marc Zyngier<marc.zyngier@arm.com>
>>>> Signed-off-by: Christoffer Dall<christoffer.dall@arm.com>
>>>> ---
>>>>    arch/arm64/include/asm/kvm_arm.h |   2 +
>>>>    arch/arm64/include/asm/kvm_asm.h |   2 +
>>>>    arch/arm64/include/asm/sysreg.h  |  17 +++
>>>>    arch/arm64/kvm/hyp/Makefile      |   1 +
>>>>    arch/arm64/kvm/hyp/at.c          | 217 +++++++++++++++++++++++++++++++
>>>>    arch/arm64/kvm/hyp/switch.c      |  13 +-
>>>>    arch/arm64/kvm/sys_regs.c        | 202 +++++++++++++++++++++++++++-
>>>>    7 files changed, 450 insertions(+), 4 deletions(-)
>>>>    create mode 100644 arch/arm64/kvm/hyp/at.c
>>>>
>> [...]
>>
>>>> +
>>>> +void __kvm_at_s1e01(struct kvm_vcpu *vcpu, u32 op, u64 vaddr)
>>>> +{
>>>> +	struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
>>>> +	struct mmu_config config;
>>>> +	struct kvm_s2_mmu *mmu;
>>>> +
>>>> +	/*
>>>> +	 * We can only get here when trapping from vEL2, so we're
>>>> +	 * translating a guest guest VA.
>>>> +	 *
>>>> +	 * FIXME: Obtaining the S2 MMU for a a guest guest is horribly
>>>> +	 * racy, and we may not find it.
>>>> +	 */
>>>> +	spin_lock(&vcpu->kvm->mmu_lock);
>>>> +
>>>> +	mmu = lookup_s2_mmu(vcpu->kvm,
>>>> +			    vcpu_read_sys_reg(vcpu, VTTBR_EL2),
>>>> +			    vcpu_read_sys_reg(vcpu, HCR_EL2));
>>>   From ARM DDI 0487D.b, the description for AT S1E1R (page C5-467, it's the same
>>> for the other at s1e{0,1}* instructions):
>>>
>>> [..] Performs stage 1 address translation, with permisions as if reading from
>>> the given virtual address from EL1, or from EL2 [..], using the following
>>> translation regime:
>>> - If HCR_EL2.{E2H,TGE} is {1, 1}, the EL2&0 translation regime, accessed from EL2.
>>>
>>> If the guest is VHE, I don't think there's any need to switch mmus. The AT
>>> instruction will use the physical EL1&0 translation regime already on the
>>> hardware (assuming host HCR_EL2.TGE == 0), which is the vEL2&0 regime for the
>>> guest hypervisor.
>> Here we want to run AT for the L2 (guest guest) EL1&0 regime and not the L1
>> (guest hypervisor) one, so we have to look up and switch to the nested VM MMU
>> context. Or did I miss your point?
>>
>> Thanks,
>> Tomasz
> 
> What I mean to say is that if the L1 guest has set HCR_EL2.{E2H, TGE} = {1, 1},
> then the instruction affects the vEL2&0 translation regime (as per the
> instruction description in the architecture), which is already loaded. The AT
> instruction will affect the L1 guest hypervisor, not the L2 guest.
> 
> In other words:
> 
> if (!vcpu_el2_e2h_is_set(vcpu) || !vcpu_el2_tge_is_set(vcpu)) {
>          /* switch mmus, the instruction affects the L2 guest (the guest guest) */
> } else {
>          /* do not switch mmus, the instruction affects the L1 guest hypervisor which is loaded */
> }
> 
> I hope this makes things clearer.
> 

Yes, now I see your point, obviously you are right.

Thanks,
Tomasz


* Re: [PATCH 43/59] KVM: arm64: nv: Trap and emulate AT instructions from virtual EL2
  2019-06-21  9:38 ` [PATCH 43/59] KVM: arm64: nv: Trap and emulate AT instructions from virtual EL2 Marc Zyngier
  2019-07-01 15:45   ` Julien Thierry
  2019-07-09 13:20   ` Alexandru Elisei
@ 2019-07-24 10:25   ` Tomasz Nowicki
  2019-07-24 12:39     ` Marc Zyngier
  2 siblings, 1 reply; 176+ messages in thread
From: Tomasz Nowicki @ 2019-07-24 10:25 UTC (permalink / raw)
  To: Marc Zyngier, linux-arm-kernel, kvmarm, kvm; +Cc: Andre Przywara, Dave Martin

On 21.06.2019 11:38, Marc Zyngier wrote:
> From: Jintack Lim <jintack.lim@linaro.org>
> 
> When supporting nested virtualization a guest hypervisor executing AT
> instructions must be trapped and emulated by the host hypervisor,
> because untrapped AT instructions operating on S1E1 will use the wrong
> translation regieme (the one used to emulate virtual EL2 in EL1 instead
> of virtual EL1) and AT instructions operating on S12 will not work from
> EL1.
> 
> This patch does several things.
> 
> 1. List and define all AT system instructions to emulate and document
> the emulation design.
> 
> 2. Implement AT instruction handling logic in EL2. This will be used to
> emulate AT instructions executed in the virtual EL2.
> 
> AT instruction emulation works by loading the proper processor
> context, which depends on the trapped instruction and the virtual
> HCR_EL2, to the EL1 virtual memory control registers and executing AT
> instructions. Note that ctxt->hw_sys_regs is expected to have the
> proper processor context before calling the handling
> function(__kvm_at_insn) implemented in this patch.
> 
> 4. Emulate AT S1E[01] instructions by issuing the same instructions in
> EL2. We set the physical EL1 registers, NV and NV1 bits as described in
> the AT instruction emulation overview.
> 
> 5. Emulate AT A12E[01] instructions in two steps: First, do the stage-1
> translation by reusing the existing AT emulation functions.  Second, do
> the stage-2 translation by walking the guest hypervisor's stage-2 page
> table in software. Record the translation result to PAR_EL1.
> 
> 6. Emulate AT S1E2 instructions by issuing the corresponding S1E1
> instructions in EL2. We set the physical EL1 registers and the HCR_EL2
> register as described in the AT instruction emulation overview.
> 
> 7. Forward system instruction traps to the virtual EL2 if the corresponding
> virtual AT bit is set in the virtual HCR_EL2.
> 
>    [ Much logic above has been reworked by Marc Zyngier ]
> 
> Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
> ---
>   arch/arm64/include/asm/kvm_arm.h |   2 +
>   arch/arm64/include/asm/kvm_asm.h |   2 +
>   arch/arm64/include/asm/sysreg.h  |  17 +++
>   arch/arm64/kvm/hyp/Makefile      |   1 +
>   arch/arm64/kvm/hyp/at.c          | 217 +++++++++++++++++++++++++++++++
>   arch/arm64/kvm/hyp/switch.c      |  13 +-
>   arch/arm64/kvm/sys_regs.c        | 202 +++++++++++++++++++++++++++-
>   7 files changed, 450 insertions(+), 4 deletions(-)
>   create mode 100644 arch/arm64/kvm/hyp/at.c
> 

[...]

> +
> +void __kvm_at_s1e01(struct kvm_vcpu *vcpu, u32 op, u64 vaddr)
> +{
> +	struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
> +	struct mmu_config config;
> +	struct kvm_s2_mmu *mmu;
> +
> +	/*
> +	 * We can only get here when trapping from vEL2, so we're
> +	 * translating a guest guest VA.
> +	 *
> +	 * FIXME: Obtaining the S2 MMU for a a guest guest is horribly
> +	 * racy, and we may not find it.
> +	 */
> +	spin_lock(&vcpu->kvm->mmu_lock);
> +
> +	mmu = lookup_s2_mmu(vcpu->kvm,
> +			    vcpu_read_sys_reg(vcpu, VTTBR_EL2),
> +			    vcpu_read_sys_reg(vcpu, HCR_EL2));
> +
> +	if (WARN_ON(!mmu))
> +		goto out;
> +
> +	/* We've trapped, so everything is live on the CPU. */
> +	__mmu_config_save(&config);
> +
> +	write_sysreg_el1(ctxt->sys_regs[TTBR0_EL1],	SYS_TTBR0);
> +	write_sysreg_el1(ctxt->sys_regs[TTBR1_EL1],	SYS_TTBR1);
> +	write_sysreg_el1(ctxt->sys_regs[TCR_EL1],	SYS_TCR);
> +	write_sysreg_el1(ctxt->sys_regs[SCTLR_EL1],	SYS_SCTLR);
> +	write_sysreg(kvm_get_vttbr(mmu),		vttbr_el2);
> +	/* FIXME: write S2 MMU VTCR_EL2 */
> +	write_sysreg(config.hcr & ~HCR_TGE,		hcr_el2);

All below AT S1E* operations may initiate stage-1 PTW. And stage-1 table 
walk addresses are themselves subject to stage-2 translation.

So should we enable stage-2 translation here by setting the HCR_VM bit?
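
In other words, something like this (untested, HCR_VM from kvm_arm.h):

	write_sysreg((config.hcr & ~HCR_TGE) | HCR_VM,	hcr_el2);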

> +
> +	isb();
> +
> +	switch (op) {
> +	case OP_AT_S1E1R:
> +	case OP_AT_S1E1RP:
> +		asm volatile("at s1e1r, %0" : : "r" (vaddr));
> +		break;
> +	case OP_AT_S1E1W:
> +	case OP_AT_S1E1WP:
> +		asm volatile("at s1e1w, %0" : : "r" (vaddr));
> +		break;
> +	case OP_AT_S1E0R:
> +		asm volatile("at s1e0r, %0" : : "r" (vaddr));
> +		break;
> +	case OP_AT_S1E0W:
> +		asm volatile("at s1e0w, %0" : : "r" (vaddr));
> +		break;
> +	default:
> +		WARN_ON(1);
> +		break;
> +	}
> +
> +	isb();
> +
> +	ctxt->sys_regs[PAR_EL1] = read_sysreg(par_el1);
> +
> +	/*
> +	 * Failed? let's leave the building now.
> +	 *
> +	 * FIXME: how about a failed translation because the shadow S2
> +	 * wasn't populated? We may need to perform a SW PTW,
> +	 * populating our shadow S2 and retry the instruction.
> +	 */
> +	if (ctxt->sys_regs[PAR_EL1] & 1)
> +		goto nopan;
> +
> +	/* No PAN? No problem. */
> +	if (!(*vcpu_cpsr(vcpu) & PSR_PAN_BIT))
> +		goto nopan;
> +
> +	/*
> +	 * For PAN-involved AT operations, perform the same
> +	 * translation, using EL0 this time.
> +	 */
> +	switch (op) {
> +	case OP_AT_S1E1RP:
> +		asm volatile("at s1e0r, %0" : : "r" (vaddr));
> +		break;
> +	case OP_AT_S1E1WP:
> +		asm volatile("at s1e0w, %0" : : "r" (vaddr));
> +		break;
> +	default:
> +		goto nopan;
> +	}
> +
> +	/*
> +	 * If the EL0 translation has succeeded, we need to pretend
> +	 * the AT operation has failed, as the PAN setting forbids
> +	 * such a translation.
> +	 *
> +	 * FIXME: we hardcode a Level-3 permission fault. We really
> +	 * should return the real fault level.
> +	 */
> +	if (!(read_sysreg(par_el1) & 1))
> +		ctxt->sys_regs[PAR_EL1] = 0x1f;
> +
> +nopan:
> +	__mmu_config_restore(&config);
> +
> +out:
> +	spin_unlock(&vcpu->kvm->mmu_lock);
> +}
> +
> +void __kvm_at_s1e2(struct kvm_vcpu *vcpu, u32 op, u64 vaddr)
> +{
> +	struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
> +	struct mmu_config config;
> +	struct kvm_s2_mmu *mmu;
> +	u64 val;
> +
> +	spin_lock(&vcpu->kvm->mmu_lock);
> +
> +	mmu = &vcpu->kvm->arch.mmu;
> +
> +	/* We've trapped, so everything is live on the CPU. */
> +	__mmu_config_save(&config);
> +
> +	if (vcpu_el2_e2h_is_set(vcpu)) {
> +		write_sysreg_el1(ctxt->sys_regs[TTBR0_EL2],	SYS_TTBR0);
> +		write_sysreg_el1(ctxt->sys_regs[TTBR1_EL2],	SYS_TTBR1);
> +		write_sysreg_el1(ctxt->sys_regs[TCR_EL2],	SYS_TCR);
> +		write_sysreg_el1(ctxt->sys_regs[SCTLR_EL2],	SYS_SCTLR);
> +
> +		val = config.hcr;
> +	} else {
> +		write_sysreg_el1(ctxt->sys_regs[TTBR0_EL2],	SYS_TTBR0);
> +		write_sysreg_el1(translate_tcr(ctxt->sys_regs[TCR_EL2]),
> +				 SYS_TCR);
> +		write_sysreg_el1(translate_sctlr(ctxt->sys_regs[SCTLR_EL2]),
> +				 SYS_SCTLR);
> +
> +		val = config.hcr | HCR_NV | HCR_NV1;
> +	}
> +
> +	write_sysreg(kvm_get_vttbr(mmu),		vttbr_el2);
> +	/* FIXME: write S2 MMU VTCR_EL2 */
> +	write_sysreg(val & ~HCR_TGE,			hcr_el2);

And here the same.

> +
> +	isb();
> +
> +	switch (op) {
> +	case OP_AT_S1E2R:
> +		asm volatile("at s1e1r, %0" : : "r" (vaddr));
> +		break;
> +	case OP_AT_S1E2W:
> +		asm volatile("at s1e1w, %0" : : "r" (vaddr));
> +		break;
> +	default:
> +		WARN_ON(1);
> +		break;
> +	}
> +
> +	isb();
> +
> +	/* FIXME: handle failed translation due to shadow S2 */
> +	ctxt->sys_regs[PAR_EL1] = read_sysreg(par_el1);
> +
> +	__mmu_config_restore(&config);
> +	spin_unlock(&vcpu->kvm->mmu_lock);
> +}

Thanks,
Tomasz



* Re: [PATCH 43/59] KVM: arm64: nv: Trap and emulate AT instructions from virtual EL2
  2019-07-24 10:25   ` Tomasz Nowicki
@ 2019-07-24 12:39     ` Marc Zyngier
  2019-07-24 13:56       ` Tomasz Nowicki
  0 siblings, 1 reply; 176+ messages in thread
From: Marc Zyngier @ 2019-07-24 12:39 UTC (permalink / raw)
  To: Tomasz Nowicki, linux-arm-kernel, kvmarm, kvm; +Cc: Andre Przywara, Dave Martin

On 24/07/2019 11:25, Tomasz Nowicki wrote:
> On 21.06.2019 11:38, Marc Zyngier wrote:
>> From: Jintack Lim <jintack.lim@linaro.org>
>>
>> When supporting nested virtualization a guest hypervisor executing AT
>> instructions must be trapped and emulated by the host hypervisor,
>> because untrapped AT instructions operating on S1E1 will use the wrong
>> translation regieme (the one used to emulate virtual EL2 in EL1 instead
>> of virtual EL1) and AT instructions operating on S12 will not work from
>> EL1.
>>
>> This patch does several things.
>>
>> 1. List and define all AT system instructions to emulate and document
>> the emulation design.
>>
>> 2. Implement AT instruction handling logic in EL2. This will be used to
>> emulate AT instructions executed in the virtual EL2.
>>
>> AT instruction emulation works by loading the proper processor
>> context, which depends on the trapped instruction and the virtual
>> HCR_EL2, to the EL1 virtual memory control registers and executing AT
>> instructions. Note that ctxt->hw_sys_regs is expected to have the
>> proper processor context before calling the handling
>> function(__kvm_at_insn) implemented in this patch.
>>
>> 4. Emulate AT S1E[01] instructions by issuing the same instructions in
>> EL2. We set the physical EL1 registers, NV and NV1 bits as described in
>> the AT instruction emulation overview.
>>
>> 5. Emulate AT A12E[01] instructions in two steps: First, do the stage-1
>> translation by reusing the existing AT emulation functions.  Second, do
>> the stage-2 translation by walking the guest hypervisor's stage-2 page
>> table in software. Record the translation result to PAR_EL1.
>>
>> 6. Emulate AT S1E2 instructions by issuing the corresponding S1E1
>> instructions in EL2. We set the physical EL1 registers and the HCR_EL2
>> register as described in the AT instruction emulation overview.
>>
>> 7. Forward system instruction traps to the virtual EL2 if the corresponding
>> virtual AT bit is set in the virtual HCR_EL2.
>>
>>    [ Much logic above has been reworked by Marc Zyngier ]
>>
>> Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
>> ---
>>   arch/arm64/include/asm/kvm_arm.h |   2 +
>>   arch/arm64/include/asm/kvm_asm.h |   2 +
>>   arch/arm64/include/asm/sysreg.h  |  17 +++
>>   arch/arm64/kvm/hyp/Makefile      |   1 +
>>   arch/arm64/kvm/hyp/at.c          | 217 +++++++++++++++++++++++++++++++
>>   arch/arm64/kvm/hyp/switch.c      |  13 +-
>>   arch/arm64/kvm/sys_regs.c        | 202 +++++++++++++++++++++++++++-
>>   7 files changed, 450 insertions(+), 4 deletions(-)
>>   create mode 100644 arch/arm64/kvm/hyp/at.c
>>
> 
> [...]
> 
>> +
>> +void __kvm_at_s1e01(struct kvm_vcpu *vcpu, u32 op, u64 vaddr)
>> +{
>> +	struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
>> +	struct mmu_config config;
>> +	struct kvm_s2_mmu *mmu;
>> +
>> +	/*
>> +	 * We can only get here when trapping from vEL2, so we're
>> +	 * translating a guest guest VA.
>> +	 *
>> +	 * FIXME: Obtaining the S2 MMU for a a guest guest is horribly
>> +	 * racy, and we may not find it.
>> +	 */
>> +	spin_lock(&vcpu->kvm->mmu_lock);
>> +
>> +	mmu = lookup_s2_mmu(vcpu->kvm,
>> +			    vcpu_read_sys_reg(vcpu, VTTBR_EL2),
>> +			    vcpu_read_sys_reg(vcpu, HCR_EL2));
>> +
>> +	if (WARN_ON(!mmu))
>> +		goto out;
>> +
>> +	/* We've trapped, so everything is live on the CPU. */
>> +	__mmu_config_save(&config);
>> +
>> +	write_sysreg_el1(ctxt->sys_regs[TTBR0_EL1],	SYS_TTBR0);
>> +	write_sysreg_el1(ctxt->sys_regs[TTBR1_EL1],	SYS_TTBR1);
>> +	write_sysreg_el1(ctxt->sys_regs[TCR_EL1],	SYS_TCR);
>> +	write_sysreg_el1(ctxt->sys_regs[SCTLR_EL1],	SYS_SCTLR);
>> +	write_sysreg(kvm_get_vttbr(mmu),		vttbr_el2);
>> +	/* FIXME: write S2 MMU VTCR_EL2 */
>> +	write_sysreg(config.hcr & ~HCR_TGE,		hcr_el2);
> 
> All below AT S1E* operations may initiate stage-1 PTW. And stage-1 table 
> walk addresses are themselves subject to stage-2 translation.
> 
> So should we enable stage-2 translation here by setting the HCR_VM bit?

Hopefully that's already the case, otherwise nothing works. Have you
verified that it is actually clear at this stage?

It's a bit of a moot point as we need a full S1 PTW in SW anyway (so
most of that code will eventually be removed), but at least this should
be fixed if it needs to.

Thanks,

	M. (who really needs to get on top of these review comments)
-- 
Jazz is not dead. It just smells funny...


* Re: [PATCH 43/59] KVM: arm64: nv: Trap and emulate AT instructions from virtual EL2
  2019-07-24 12:39     ` Marc Zyngier
@ 2019-07-24 13:56       ` Tomasz Nowicki
  0 siblings, 0 replies; 176+ messages in thread
From: Tomasz Nowicki @ 2019-07-24 13:56 UTC (permalink / raw)
  To: Marc Zyngier, linux-arm-kernel, kvmarm, kvm; +Cc: Andre Przywara, Dave Martin

On 24.07.2019 14:39, Marc Zyngier wrote:
> On 24/07/2019 11:25, Tomasz Nowicki wrote:
>> On 21.06.2019 11:38, Marc Zyngier wrote:
>>> From: Jintack Lim <jintack.lim@linaro.org>
>>>
>>> When supporting nested virtualization a guest hypervisor executing AT
>>> instructions must be trapped and emulated by the host hypervisor,
>>> because untrapped AT instructions operating on S1E1 will use the wrong
>>> translation regieme (the one used to emulate virtual EL2 in EL1 instead
>>> of virtual EL1) and AT instructions operating on S12 will not work from
>>> EL1.
>>>
>>> This patch does several things.
>>>
>>> 1. List and define all AT system instructions to emulate and document
>>> the emulation design.
>>>
>>> 2. Implement AT instruction handling logic in EL2. This will be used to
>>> emulate AT instructions executed in the virtual EL2.
>>>
>>> AT instruction emulation works by loading the proper processor
>>> context, which depends on the trapped instruction and the virtual
>>> HCR_EL2, to the EL1 virtual memory control registers and executing AT
>>> instructions. Note that ctxt->hw_sys_regs is expected to have the
>>> proper processor context before calling the handling
>>> function(__kvm_at_insn) implemented in this patch.
>>>
>>> 4. Emulate AT S1E[01] instructions by issuing the same instructions in
>>> EL2. We set the physical EL1 registers, NV and NV1 bits as described in
>>> the AT instruction emulation overview.
>>>
>>> 5. Emulate AT A12E[01] instructions in two steps: First, do the stage-1
>>> translation by reusing the existing AT emulation functions.  Second, do
>>> the stage-2 translation by walking the guest hypervisor's stage-2 page
>>> table in software. Record the translation result to PAR_EL1.
>>>
>>> 6. Emulate AT S1E2 instructions by issuing the corresponding S1E1
>>> instructions in EL2. We set the physical EL1 registers and the HCR_EL2
>>> register as described in the AT instruction emulation overview.
>>>
>>> 7. Forward system instruction traps to the virtual EL2 if the corresponding
>>> virtual AT bit is set in the virtual HCR_EL2.
>>>
>>>     [ Much logic above has been reworked by Marc Zyngier ]
>>>
>>> Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
>>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>>> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
>>> ---
>>>    arch/arm64/include/asm/kvm_arm.h |   2 +
>>>    arch/arm64/include/asm/kvm_asm.h |   2 +
>>>    arch/arm64/include/asm/sysreg.h  |  17 +++
>>>    arch/arm64/kvm/hyp/Makefile      |   1 +
>>>    arch/arm64/kvm/hyp/at.c          | 217 +++++++++++++++++++++++++++++++
>>>    arch/arm64/kvm/hyp/switch.c      |  13 +-
>>>    arch/arm64/kvm/sys_regs.c        | 202 +++++++++++++++++++++++++++-
>>>    7 files changed, 450 insertions(+), 4 deletions(-)
>>>    create mode 100644 arch/arm64/kvm/hyp/at.c
>>>
>>
>> [...]
>>
>>> +
>>> +void __kvm_at_s1e01(struct kvm_vcpu *vcpu, u32 op, u64 vaddr)
>>> +{
>>> +	struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
>>> +	struct mmu_config config;
>>> +	struct kvm_s2_mmu *mmu;
>>> +
>>> +	/*
>>> +	 * We can only get here when trapping from vEL2, so we're
>>> +	 * translating a guest guest VA.
>>> +	 *
>>> +	 * FIXME: Obtaining the S2 MMU for a a guest guest is horribly
>>> +	 * racy, and we may not find it.
>>> +	 */
>>> +	spin_lock(&vcpu->kvm->mmu_lock);
>>> +
>>> +	mmu = lookup_s2_mmu(vcpu->kvm,
>>> +			    vcpu_read_sys_reg(vcpu, VTTBR_EL2),
>>> +			    vcpu_read_sys_reg(vcpu, HCR_EL2));
>>> +
>>> +	if (WARN_ON(!mmu))
>>> +		goto out;
>>> +
>>> +	/* We've trapped, so everything is live on the CPU. */
>>> +	__mmu_config_save(&config);
>>> +
>>> +	write_sysreg_el1(ctxt->sys_regs[TTBR0_EL1],	SYS_TTBR0);
>>> +	write_sysreg_el1(ctxt->sys_regs[TTBR1_EL1],	SYS_TTBR1);
>>> +	write_sysreg_el1(ctxt->sys_regs[TCR_EL1],	SYS_TCR);
>>> +	write_sysreg_el1(ctxt->sys_regs[SCTLR_EL1],	SYS_SCTLR);
>>> +	write_sysreg(kvm_get_vttbr(mmu),		vttbr_el2);
>>> +	/* FIXME: write S2 MMU VTCR_EL2 */
>>> +	write_sysreg(config.hcr & ~HCR_TGE,		hcr_el2);
>>
>> All below AT S1E* operations may initiate stage-1 PTW. And stage-1 table
>> walk addresses are themselves subject to stage-2 translation.
>>
>> So should we enable stage-2 translation here by setting the HCR_VM bit?
> 
> Hopefully that's already the case, otherwise nothing works. Have you
> verified that it is actually clear at this stage?

Yes, HCR_EL2.VM is clear at this stage (cleared during guest exit when 
deactivating traps). The weird thing is that my Fastmodel is still 
working with HCR_EL2.VM=0 here and this path is being exercised.

Thanks,
Tomasz



* Re: [PATCH 11/59] KVM: arm64: nv: Inject HVC exceptions to the virtual EL2
  2019-06-25 13:13   ` Alexandru Elisei
  2019-07-03 14:16     ` Marc Zyngier
@ 2019-07-30 14:08     ` Alexandru Elisei
  1 sibling, 0 replies; 176+ messages in thread
From: Alexandru Elisei @ 2019-07-30 14:08 UTC (permalink / raw)
  To: Marc Zyngier, linux-arm-kernel, kvmarm, kvm; +Cc: Andre Przywara, Dave Martin

On 6/25/19 2:13 PM, Alexandru Elisei wrote:
> On 6/21/19 10:37 AM, Marc Zyngier wrote:
>> From: Jintack Lim <jintack.lim@linaro.org>
>>
>> Now that the psci call is done by the smc instruction when nested
> This suggests that we have support for PSCI calls using SMC as the conduit, but
> that is not the case, as the handle_smc function is not changed by this commit,
> and support for PSCI via SMC is added later in patch 22/59 "KVM: arm64: nv:
> Handle PSCI call via smc from the guest". Perhaps the commit message should be
> reworded to reflect that?
>> virtualization is enabled, it is clear that all hvc instruction from the
>> VM (including from the virtual EL2) are supposed to handled in the
>> virtual EL2.
>>
>> Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>> ---
>>  arch/arm64/kvm/handle_exit.c | 7 +++++++
>>  1 file changed, 7 insertions(+)
>>
>> diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
>> index 516aead3c2a9..6c0ac52b34cc 100644
>> --- a/arch/arm64/kvm/handle_exit.c
>> +++ b/arch/arm64/kvm/handle_exit.c
>> @@ -30,6 +30,7 @@
>>  #include <asm/kvm_coproc.h>
>>  #include <asm/kvm_emulate.h>
>>  #include <asm/kvm_mmu.h>
>> +#include <asm/kvm_nested.h>
>>  #include <asm/debug-monitors.h>
>>  #include <asm/traps.h>
>>  
>> @@ -52,6 +53,12 @@ static int handle_hvc(struct kvm_vcpu *vcpu, struct kvm_run *run)
>>  			    kvm_vcpu_hvc_get_imm(vcpu));
>>  	vcpu->stat.hvc_exit_stat++;
>>  
>> +	/* Forward hvc instructions to the virtual EL2 if the guest has EL2. */
>> +	if (nested_virt_in_use(vcpu)) {
>> +		kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
>> +		return 1;
>> +	}

According to ARM DDI 0487E.a, when HCR_EL2.HCD = 1, HVC instructions are
undefined at EL2 and EL1.
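
Something like the below before the forwarding is probably needed (untested
sketch, reusing this series' accessors; HCR_HCD comes from kvm_arm.h):

	/* Hypothetical: HVC is UNDEFINED when the virtual HCR_EL2.HCD is set */
	if (nested_virt_in_use(vcpu) &&
	    (__vcpu_sys_reg(vcpu, HCR_EL2) & HCR_HCD)) {
		kvm_inject_undefined(vcpu);
		return 1;
	}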

Thanks,
Alex
>> +
>>  	ret = kvm_hvc_call_handler(vcpu);
>>  	if (ret < 0) {
>>  		vcpu_set_reg(vcpu, 0, ~0UL);


* Re: [PATCH 00/59] KVM: arm64: ARMv8.3 Nested Virtualization support
  2019-06-21  9:37 [PATCH 00/59] KVM: arm64: ARMv8.3 Nested Virtualization support Marc Zyngier
                   ` (59 preceding siblings ...)
       [not found] ` <CANW9uyssDm_0ysC_pnvhHRrnsmFZik+3_ENmFz7L2GCmtH09fw@mail.gmail.com>
@ 2019-08-02 10:11 ` Alexandru Elisei
  2019-08-02 10:30   ` Andrew Jones
  2019-08-09 10:01   ` Alexandru Elisei
  60 siblings, 2 replies; 176+ messages in thread
From: Alexandru Elisei @ 2019-08-02 10:11 UTC (permalink / raw)
  To: Marc Zyngier, linux-arm-kernel, kvmarm, kvm; +Cc: Andre Przywara, Dave P Martin

Hi,

On 6/21/19 10:37 AM, Marc Zyngier wrote:
> I've taken over the maintenance of this series originally written by
> Jintack and Christoffer. Since then, the series has been substantially
> reworked, new features (and most probably bugs) have been added, and
> the whole thing rebased multiple times. If anything breaks, please
> blame me, and nobody else.
>
> As you can tell, this is quite big. It is also remarkably incomplete
> (we're missing many critical bits for fully emulate EL2), but the idea
> is to start merging things early in order to reduce the maintenance
> headache. What we want to achieve is that with NV disabled, there is
> no performance overhead and no regression. The only thing I intend to
> merge ASAP is the first patch in the series, because it should have
> zero effect and is a reasonable cleanup.
>
> The series is roughly divided in 4 parts: exception handling, memory
> virtualization, interrupts and timers. There are of course some
> dependencies, but you'll hopefully get the gist of it.
>
> For the most courageous of you, I've put out a branch[1] containing this
> and a bit more. Of course, you'll need some userspace. Andre maintains
> a hacked version of kvmtool[1] that takes a --nested option, allowing
> the guest to be started at EL2. You can run the whole stack in the
> Foundation model. Don't be in a hurry ;-).
>
> [1] git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git kvm-arm64/nv-wip-5.2-rc5
> [2] git://linux-arm.org/kvmtool.git nv/nv-wip-5.2-rc5
>
> Andre Przywara (4):
>   KVM: arm64: nv: Handle virtual EL2 registers in
>     vcpu_read/write_sys_reg()
>   KVM: arm64: nv: Save/Restore vEL2 sysregs
>   KVM: arm64: nv: Handle traps for timer _EL02 and _EL2 sysregs
>     accessors
>   KVM: arm64: nv: vgic: Allow userland to set VGIC maintenance IRQ
>
> Christoffer Dall (16):
>   KVM: arm64: nv: Introduce nested virtualization VCPU feature
>   KVM: arm64: nv: Reset VCPU to EL2 registers if VCPU nested virt is set
>   KVM: arm64: nv: Allow userspace to set PSR_MODE_EL2x
>   KVM: arm64: nv: Add nested virt VCPU primitives for vEL2 VCPU state
>   KVM: arm64: nv: Handle trapped ERET from virtual EL2
>   KVM: arm64: nv: Emulate PSTATE.M for a guest hypervisor
>   KVM: arm64: nv: Trap EL1 VM register accesses in virtual EL2
>   KVM: arm64: nv: Only toggle cache for virtual EL2 when SCTLR_EL2
>     changes
>   KVM: arm/arm64: nv: Support multiple nested stage 2 mmu structures
>   KVM: arm64: nv: Implement nested Stage-2 page table walk logic
>   KVM: arm64: nv: Handle shadow stage 2 page faults
>   KVM: arm64: nv: Unmap/flush shadow stage 2 page tables
>   KVM: arm64: nv: arch_timer: Support hyp timer emulation
>   KVM: arm64: nv: vgic-v3: Take cpu_if pointer directly instead of vcpu
>   KVM: arm64: nv: vgic: Emulate the HW bit in software
>   KVM: arm64: nv: Add nested GICv3 tracepoints
>
> Dave Martin (1):
>   KVM: arm64: Migrate _elx sysreg accessors to msr_s/mrs_s
>
> Jintack Lim (21):
>   arm64: Add ARM64_HAS_NESTED_VIRT cpufeature
>   KVM: arm64: nv: Add EL2 system registers to vcpu context
>   KVM: arm64: nv: Support virtual EL2 exceptions
>   KVM: arm64: nv: Inject HVC exceptions to the virtual EL2
>   KVM: arm64: nv: Trap SPSR_EL1, ELR_EL1 and VBAR_EL1 from virtual EL2
>   KVM: arm64: nv: Trap CPACR_EL1 access in virtual EL2
>   KVM: arm64: nv: Set a handler for the system instruction traps
>   KVM: arm64: nv: Handle PSCI call via smc from the guest
>   KVM: arm64: nv: Respect virtual HCR_EL2.TWX setting
>   KVM: arm64: nv: Respect virtual CPTR_EL2.TFP setting
>   KVM: arm64: nv: Respect the virtual HCR_EL2.NV bit setting
>   KVM: arm64: nv: Respect virtual HCR_EL2.TVM and TRVM settings
>   KVM: arm64: nv: Respect the virtual HCR_EL2.NV1 bit setting
>   KVM: arm64: nv: Emulate EL12 register accesses from the virtual EL2
>   KVM: arm64: nv: Configure HCR_EL2 for nested virtualization
>   KVM: arm64: nv: Pretend we only support larger-than-host page sizes
>   KVM: arm64: nv: Introduce sys_reg_desc.forward_trap
>   KVM: arm64: nv: Rework the system instruction emulation framework
>   KVM: arm64: nv: Trap and emulate AT instructions from virtual EL2
>   KVM: arm64: nv: Trap and emulate TLBI instructions from virtual EL2
>   KVM: arm64: nv: Nested GICv3 Support
>
> Marc Zyngier (17):
>   KVM: arm64: Move __load_guest_stage2 to kvm_mmu.h
>   KVM: arm64: nv: Reset VMPIDR_EL2 and VPIDR_EL2 to sane values
>   KVM: arm64: nv: Handle SPSR_EL2 specially
>   KVM: arm64: nv: Refactor vcpu_{read,write}_sys_reg
>   KVM: arm64: nv: Don't expose SVE to nested guests
>   KVM: arm64: nv: Hide RAS from nested guests
>   KVM: arm/arm64: nv: Factor out stage 2 page table data from struct kvm
>   KVM: arm64: nv: Move last_vcpu_ran to be per s2 mmu
>   KVM: arm64: nv: Don't always start an S2 MMU search from the beginning
>   KVM: arm64: nv: Propagate CNTVOFF_EL2 to the virtual EL1 timer
>   KVM: arm64: nv: Load timer before the GIC
>   KVM: arm64: nv: Implement maintenance interrupt forwarding
>   arm64: KVM: nv: Add handling of EL2-specific timer registers
>   arm64: KVM: nv: Honor SCTLR_EL2.SPAN on entering vEL2
>   arm64: KVM: nv: Handle SCTLR_EL2 RES0/RES1 bits
>   arm64: KVM: nv: Restrict S2 RD/WR permissions to match the guest's
>   arm64: KVM: nv: Allow userspace to request KVM_ARM_VCPU_NESTED_VIRT
>
>  .../admin-guide/kernel-parameters.txt         |    4 +
>  .../virtual/kvm/devices/arm-vgic-v3.txt       |    9 +
>  arch/arm/include/asm/kvm_asm.h                |    5 +-
>  arch/arm/include/asm/kvm_emulate.h            |    3 +
>  arch/arm/include/asm/kvm_host.h               |   31 +-
>  arch/arm/include/asm/kvm_hyp.h                |   25 +-
>  arch/arm/include/asm/kvm_mmu.h                |   83 +-
>  arch/arm/include/asm/kvm_nested.h             |    9 +
>  arch/arm/include/uapi/asm/kvm.h               |    1 +
>  arch/arm/kvm/hyp/switch.c                     |   11 +-
>  arch/arm/kvm/hyp/tlb.c                        |   13 +-
>  arch/arm64/include/asm/cpucaps.h              |    3 +-
>  arch/arm64/include/asm/esr.h                  |    4 +-
>  arch/arm64/include/asm/kvm_arm.h              |   28 +-
>  arch/arm64/include/asm/kvm_asm.h              |    9 +-
>  arch/arm64/include/asm/kvm_coproc.h           |    2 +-
>  arch/arm64/include/asm/kvm_emulate.h          |  157 +-
>  arch/arm64/include/asm/kvm_host.h             |  105 +-
>  arch/arm64/include/asm/kvm_hyp.h              |   82 +-
>  arch/arm64/include/asm/kvm_mmu.h              |   62 +-
>  arch/arm64/include/asm/kvm_nested.h           |   68 +
>  arch/arm64/include/asm/sysreg.h               |  143 +-
>  arch/arm64/include/uapi/asm/kvm.h             |    2 +
>  arch/arm64/kernel/cpufeature.c                |   26 +
>  arch/arm64/kvm/Makefile                       |    4 +
>  arch/arm64/kvm/emulate-nested.c               |  223 +++
>  arch/arm64/kvm/guest.c                        |    6 +
>  arch/arm64/kvm/handle_exit.c                  |   76 +-
>  arch/arm64/kvm/hyp/Makefile                   |    1 +
>  arch/arm64/kvm/hyp/at.c                       |  217 +++
>  arch/arm64/kvm/hyp/switch.c                   |   86 +-
>  arch/arm64/kvm/hyp/sysreg-sr.c                |  267 ++-
>  arch/arm64/kvm/hyp/tlb.c                      |  129 +-
>  arch/arm64/kvm/hyp/vgic-v2-cpuif-proxy.c      |    2 +-
>  arch/arm64/kvm/inject_fault.c                 |   12 -
>  arch/arm64/kvm/nested.c                       |  551 +++++++
>  arch/arm64/kvm/regmap.c                       |    4 +-
>  arch/arm64/kvm/reset.c                        |    7 +
>  arch/arm64/kvm/sys_regs.c                     | 1460 +++++++++++++++--
>  arch/arm64/kvm/sys_regs.h                     |    6 +
>  arch/arm64/kvm/trace.h                        |   58 +-
>  include/kvm/arm_arch_timer.h                  |    6 +
>  include/kvm/arm_vgic.h                        |   28 +-
>  virt/kvm/arm/arch_timer.c                     |  158 +-
>  virt/kvm/arm/arm.c                            |   62 +-
>  virt/kvm/arm/hyp/vgic-v3-sr.c                 |   35 +-
>  virt/kvm/arm/mmio.c                           |   12 +-
>  virt/kvm/arm/mmu.c                            |  445 +++--
>  virt/kvm/arm/trace.h                          |    6 +-
>  virt/kvm/arm/vgic/vgic-init.c                 |   30 +
>  virt/kvm/arm/vgic/vgic-kvm-device.c           |   22 +
>  virt/kvm/arm/vgic/vgic-nested-trace.h         |  137 ++
>  virt/kvm/arm/vgic/vgic-v2.c                   |   10 +-
>  virt/kvm/arm/vgic/vgic-v3-nested.c            |  236 +++
>  virt/kvm/arm/vgic/vgic-v3.c                   |   40 +-
>  virt/kvm/arm/vgic/vgic.c                      |   74 +-
>  56 files changed, 4683 insertions(+), 612 deletions(-)
>  create mode 100644 arch/arm/include/asm/kvm_nested.h
>  create mode 100644 arch/arm64/include/asm/kvm_nested.h
>  create mode 100644 arch/arm64/kvm/emulate-nested.c
>  create mode 100644 arch/arm64/kvm/hyp/at.c
>  create mode 100644 arch/arm64/kvm/nested.c
>  create mode 100644 virt/kvm/arm/vgic/vgic-nested-trace.h
>  create mode 100644 virt/kvm/arm/vgic/vgic-v3-nested.c
>
When working on adding support for EL2 to kvm-unit-tests I was able to trigger
the following warning:

# ./lkvm run -f psci.flat -m 128 -c 8 --console serial --irqchip gicv3 --nested
  # lkvm run --firmware psci.flat -m 128 -c 8 --name guest-151
  Info: Placing fdt at 0x80200000 - 0x80210000
  # Warning: The maximum recommended amount of VCPUs is 4
chr_testdev_init: chr-testdev: can't find a virtio-console
INFO: PSCI version 1.0
PASS: invalid-function
PASS: affinity-info-on
PASS: affinity-info-off
[   24.381266] WARNING: CPU: 3 PID: 160 at
arch/arm64/kvm/../../../virt/kvm/arm/arch_timer.c:170
kvm_timer_irq_can_fire+0xc/0x30
[   24.381366] Modules linked in:
[   24.381466] CPU: 3 PID: 160 Comm: kvm-vcpu-1 Not tainted
5.2.0-rc5-00060-g7dbce63bd1c7 #145
[   24.381566] Hardware name: Foundation-v8A (DT)
[   24.381566] pstate: 40400009 (nZcv daif +PAN -UAO)
[   24.381666] pc : kvm_timer_irq_can_fire+0xc/0x30
[   24.381766] lr : timer_emulate+0x24/0x98
[   24.381766] sp : ffff000013d8b780
[   24.381866] x29: ffff000013d8b780 x28: ffff80087a639b80
[   24.381966] x27: ffff000010ba8648 x26: ffff000010b71b40
[   24.382066] x25: ffff80087a63a100 x24: 0000000000000000
[   24.382111] x23: 000080086ca54000 x22: ffff0000100ce260
[   24.382166] x21: ffff800875e7c918 x20: ffff800875e7a800
[   24.382275] x19: ffff800875e7ca08 x18: 0000000000000000
[   24.382366] x17: 0000000000000000 x16: 0000000000000000
[   24.382466] x15: 0000000000000000 x14: 0000000000002118
[   24.382566] x13: 0000000000002190 x12: 0000000000002280
[   24.382566] x11: 0000000000002208 x10: 0000000000000040
[   24.382666] x9 : ffff000012dc3b38 x8 : 0000000000000000
[   24.382766] x7 : 0000000000000000 x6 : ffff80087ac00248
[   24.382866] x5 : 000080086ca54000 x4 : 0000000000002118
[   24.382966] x3 : eeeeeeeeeeeeeeef x2 : ffff800875e7c918
[   24.383066] x1 : 0000000000000001 x0 : ffff800875e7ca08
[   24.383066] Call trace:
[   24.383166]  kvm_timer_irq_can_fire+0xc/0x30
[   24.383266]  kvm_timer_vcpu_load+0x9c/0x1a0
[   24.383366]  kvm_arch_vcpu_load+0xb0/0x1f0
[   24.383366]  kvm_sched_in+0x1c/0x28
[   24.383466]  finish_task_switch+0xd8/0x1d8
[   24.383566]  __schedule+0x248/0x4a0
[   24.383666]  preempt_schedule_irq+0x60/0x90
[   24.383666]  el1_irq+0xd0/0x180
[   24.383766]  kvm_handle_guest_abort+0x0/0x3a0
[   24.383866]  kvm_arch_vcpu_ioctl_run+0x41c/0x688
[   24.383866]  kvm_vcpu_ioctl+0x4c0/0x838
[   24.383966]  do_vfs_ioctl+0xb8/0x878
[   24.384077]  ksys_ioctl+0x84/0x90
[   24.384166]  __arm64_sys_ioctl+0x18/0x28
[   24.384166]  el0_svc_common.constprop.0+0xb0/0x168
[   24.384266]  el0_svc_handler+0x28/0x78
[   24.384366]  el0_svc+0x8/0xc
[   24.384366] ---[ end trace 37a32293e43ac12c ]---
[   24.384666] WARNING: CPU: 3 PID: 160 at
arch/arm64/kvm/../../../virt/kvm/arm/arch_timer.c:170
kvm_timer_irq_can_fire+0xc/0x30
[   24.384766] Modules linked in:
[   24.384866] CPU: 3 PID: 160 Comm: kvm-vcpu-1 Tainted: G        W
5.2.0-rc5-00060-g7dbce63bd1c7 #145
[   24.384966] Hardware name: Foundation-v8A (DT)
[   24.384966] pstate: 40400009 (nZcv daif +PAN -UAO)
[   24.385066] pc : kvm_timer_irq_can_fire+0xc/0x30
[   24.385166] lr : timer_emulate+0x24/0x98
[   24.385166] sp : ffff000013d8b780
[   24.385266] x29: ffff000013d8b780 x28: ffff80087a639b80
[   24.385366] x27: ffff000010ba8648 x26: ffff000010b71b40
[   24.385466] x25: ffff80087a63a100 x24: 0000000000000000
[   24.385466] x23: 000080086ca54000 x22: ffff0000100ce260
[   24.385566] x21: ffff800875e7c918 x20: ffff800875e7a800
[   24.385666] x19: ffff800875e7ca80 x18: 0000000000000000
[   24.385766] x17: 0000000000000000 x16: 0000000000000000
[   24.385866] x15: 0000000000000000 x14: 0000000000002118
[   24.385966] x13: 0000000000002190 x12: 0000000000002280
[   24.385966] x11: 0000000000002208 x10: 0000000000000040
[   24.386066] x9 : ffff000012dc3b38 x8 : 0000000000000000
[   24.386166] x7 : 0000000000000000 x6 : ffff80087ac00248
[   24.386266] x5 : 000080086ca54000 x4 : 0000000000002118
[   24.386366] x3 : eeeeeeeeeeeeeeef x2 : ffff800875e7c918
[   24.386466] x1 : 0000000000000001 x0 : ffff800875e7ca80
[   24.386466] Call trace:
[   24.386566]  kvm_timer_irq_can_fire+0xc/0x30
[   24.386666]  kvm_timer_vcpu_load+0xa8/0x1a0
[   24.386666]  kvm_arch_vcpu_load+0xb0/0x1f0
[   24.386898]  kvm_sched_in+0x1c/0x28
[   24.386966]  finish_task_switch+0xd8/0x1d8
[   24.387166]  __schedule+0x248/0x4a0
[   24.387354]  preempt_schedule_irq+0x60/0x90
[   24.387366]  el1_irq+0xd0/0x180
[   24.387466]  kvm_handle_guest_abort+0x0/0x3a0
[   24.387566]  kvm_arch_vcpu_ioctl_run+0x41c/0x688
[   24.387566]  kvm_vcpu_ioctl+0x4c0/0x838
[   24.387666]  do_vfs_ioctl+0xb8/0x878
[   24.387766]  ksys_ioctl+0x84/0x90
[   24.387866]  __arm64_sys_ioctl+0x18/0x28
[   24.387866]  el0_svc_common.constprop.0+0xb0/0x168
[   24.387966]  el0_svc_handler+0x28/0x78
[   24.388066]  el0_svc+0x8/0xc
[   24.388066] ---[ end trace 37a32293e43ac12d ]---
PASS: cpu-on
SUMMARY: 4 te[   24.390266] WARNING: CPU: 3 PID: 160 at
arch/arm64/kvm/../../../virt/kvm/arm/arch_timer.c:170
kvm_timer_irq_can_fire+0xc/0x30
s[   24.390366] Modules linked in:
ts[   24.390366] CPU: 3 PID: 160 Comm: kvm-vcpu-1 Tainted: G        W
5.2.0-rc5-00060-g7dbce63bd1c7 #145
[   24.390566] Hardware name: Foundation-v8A (DT)

[   24.390795] pstate: 40400009 (nZcv daif +PAN -UAO)
[   24.390866] pc : kvm_timer_irq_can_fire+0xc/0x30
[   24.390966] lr : timer_emulate+0x24/0x98
[   24.391066] sp : ffff000013d8b780
[   24.391066] x29: ffff000013d8b780 x28: ffff80087a639b80
[   24.391166] x27: ffff000010ba8648 x26: ffff000010b71b40
[   24.391266] x25: ffff80087a63a100 x24: 0000000000000000
[   24.391366] x23: 000080086ca54000 x22: 0000000000000003
[   24.391466] x21: ffff800875e7c918 x20: ffff800875e7a800
[   24.391466] x19: ffff800875e7ca08 x18: 0000000000000000
[   24.391566] x17: 0000000000000000 x16: 0000000000000000
[   24.391666] x15: 0000000000000000 x14: 0000000000002118
[   24.391766] x13: 0000000000002190 x12: 0000000000002280
[   24.391866] x11: 0000000000002208 x10: 0000000000000040
[   24.391942] x9 : ffff000012dc3b38 x8 : 0000000000000000
[   24.391966] x7 : 0000000000000000 x6 : ffff80087ac00248
[   24.392066] x5 : 000080086ca54000 x4 : 0000000000002118
[   24.392166] x3 : eeeeeeeeeeeeeeef x2 : ffff800875e7c918
[   24.392269] x1 : 0000000000000001 x0 : ffff800875e7ca08
[   24.392366] Call trace:
[   24.392433]  kvm_timer_irq_can_fire+0xc/0x30
[   24.392466]  kvm_timer_vcpu_load+0x9c/0x1a0
[   24.392597]  kvm_arch_vcpu_load+0xb0/0x1f0
[   24.392666]  kvm_sched_in+0x1c/0x28
[   24.392766]  finish_task_switch+0xd8/0x1d8
[   24.392766]  __schedule+0x248/0x4a0
[   24.392866]  preempt_schedule_irq+0x60/0x90
[   24.392966]  el1_irq+0xd0/0x180
[   24.392966]  kvm_handle_guest_abort+0x0/0x3a0
[   24.393066]  kvm_arch_vcpu_ioctl_run+0x41c/0x688
[   24.393166]  kvm_vcpu_ioctl+0x4c0/0x838
[   24.393266]  do_vfs_ioctl+0xb8/0x878
[   24.393266]  ksys_ioctl+0x84/0x90
[   24.393366]  __arm64_sys_ioctl+0x18/0x28
[   24.393466]  el0_svc_common.constprop.0+0xb0/0x168
[   24.393566]  el0_svc_handler+0x28/0x78
[   24.393566]  el0_svc+0x8/0xc
[   24.393666] ---[ end trace 37a32293e43ac12e ]---
[   24.393866] WARNING: CPU: 3 PID: 160 at
arch/arm64/kvm/../../../virt/kvm/arm/arch_timer.c:170
kvm_timer_irq_can_fire+0xc/0x30
[   24.394066] Modules linked in:
[   24.394266] CPU: 3 PID: 160 Comm: kvm-vcpu-1 Tainted: G        W
5.2.0-rc5-00060-g7dbce63bd1c7 #145
[   24.394366] Hardware name: Foundation-v8A (DT)
[   24.394466] pstate: 40400009 (nZcv daif +PAN -UAO)
[   24.394466] pc : kvm_timer_irq_can_fire+0xc/0x30
[   24.394566] lr : timer_emulate+0x24/0x98
[   24.394666] sp : ffff000013d8b780
[   24.394727] x29: ffff000013d8b780 x28: ffff80087a639b80
[   24.394766] x27: ffff000010ba8648 x26: ffff000010b71b40
[   24.394866] x25: ffff80087a63a100 x24: 0000000000000000
[   24.394966] x23: 000080086ca54000 x22: 0000000000000003
[   24.394966] x21: ffff800875e7c918 x20: ffff800875e7a800
[   24.395066] x19: ffff800875e7ca80 x18: 0000000000000000
[   24.395166] x17: 0000000000000000 x16: 0000000000000000
[   24.395266] x15: 0000000000000000 x14: 0000000000002118
[   24.395383] x13: 0000000000002190 x12: 0000000000002280
[   24.395466] x11: 0000000000002208 x10: 0000000000000040
[   24.395547] x9 : ffff000012dc3b38 x8 : 0000000000000000
[   24.395666] x7 : 0000000000000000 x6 : ffff80087ac00248
[   24.395866] x5 : 000080086ca54000 x4 : 0000000000002118
[   24.395966] x3 : eeeeeeeeeeeeeeef x2 : ffff800875e7c918
[   24.396066] x1 : 0000000000000001 x0 : ffff800875e7ca80
[   24.396066] Call trace:
[   24.396166]  kvm_timer_irq_can_fire+0xc/0x30
[   24.396266]  kvm_timer_vcpu_load+0xa8/0x1a0
[   24.396366]  kvm_arch_vcpu_load+0xb0/0x1f0
[   24.396366]  kvm_sched_in+0x1c/0x28
[   24.396466]  finish_task_switch+0xd8/0x1d8
[   24.396566]  __schedule+0x248/0x4a0
[   24.396666]  preempt_schedule_irq+0x60/0x90
[   24.396666]  el1_irq+0xd0/0x180
[   24.396766]  kvm_handle_guest_abort+0x0/0x3a0
[   24.396866]  kvm_arch_vcpu_ioctl_run+0x41c/0x688
[   24.396866]  kvm_vcpu_ioctl+0x4c0/0x838
[   24.397021]  do_vfs_ioctl+0xb8/0x878
[   24.397066]  ksys_ioctl+0x84/0x90
[   24.397166]  __arm64_sys_ioctl+0x18/0x28
[   24.397348]  el0_svc_common.constprop.0+0xb0/0x168
[   24.397366]  el0_svc_handler+0x28/0x78
[   24.397566]  el0_svc+0x8/0xc
[   24.397676] ---[ end trace 37a32293e43ac12f ]---

  # KVM compatibility warning.
    virtio-9p device was not detected.
    While you have requested a virtio-9p device, the guest kernel did not
initialize it.
    Please make sure that the guest kernel was compiled with
CONFIG_NET_9P_VIRTIO=y enabled in .config.

  # KVM compatibility warning.
    virtio-net device was not detected.
    While you have requested a virtio-net device, the guest kernel did not
initialize it.
    Please make sure that the guest kernel was compiled with CONFIG_VIRTIO_NET=y
enabled in .config.

These are the changes that I made to kvm-unit-tests (the diff can be applied on
top of upstream master, 2130fd4154ad ("tscdeadline_latency: Check condition
first before loop")):

diff --git a/arm/cstart64.S b/arm/cstart64.S
index b0e8baa1a23a..a7631b5a1801 100644
--- a/arm/cstart64.S
+++ b/arm/cstart64.S
@@ -51,6 +51,17 @@ start:
        b       1b

 1:
+       mrs     x4, CurrentEL
+       cmp     x4, CurrentEL_EL2
+       b.ne    1f
+       mrs     x4, mpidr_el1
+       msr     vmpidr_el2, x4
+       mrs     x4, midr_el1
+       msr     vpidr_el2, x4
+       ldr     x4, =(HCR_EL2_TGE | HCR_EL2_E2H)
+       msr     hcr_el2, x4
+       isb
+1:
        /* set up stack */
        mov     x4, #1
        msr     spsel, x4
@@ -101,6 +112,17 @@ get_mmu_off:

 .globl secondary_entry
 secondary_entry:
+       mrs     x0, CurrentEL
+       cmp     x0, CurrentEL_EL2
+       b.ne    1f
+       mrs     x0, mpidr_el1
+       msr     vmpidr_el2, x0
+       mrs     x0, midr_el1
+       msr     vpidr_el2, x0
+       ldr     x0, =(HCR_EL2_TGE | HCR_EL2_E2H)
+       msr     hcr_el2, x0
+       isb
+1:
        /* Enable FP/ASIMD */
        mov     x0, #(3 << 20)
        msr     cpacr_el1, x0
diff --git a/lib/arm/asm/psci.h b/lib/arm/asm/psci.h
index 7b956bf5987d..07297a27e0ce 100644
--- a/lib/arm/asm/psci.h
+++ b/lib/arm/asm/psci.h
@@ -3,6 +3,15 @@
 #include <libcflat.h>
 #include <linux/psci.h>

+enum psci_conduit {
+       PSCI_CONDUIT_HVC,
+       PSCI_CONDUIT_SMC,
+};
+
+extern void psci_init(void);
+extern void psci_set_conduit(enum psci_conduit conduit);
+extern enum psci_conduit psci_get_conduit(void);
+
 extern int psci_invoke(unsigned long function_id, unsigned long arg0,
                       unsigned long arg1, unsigned long arg2);
 extern int psci_cpu_on(unsigned long cpuid, unsigned long entry_point);
diff --git a/lib/arm/psci.c b/lib/arm/psci.c
index c3d399064ae3..20ad4b944738 100644
--- a/lib/arm/psci.c
+++ b/lib/arm/psci.c
@@ -6,13 +6,14 @@
  *
  * This work is licensed under the terms of the GNU LGPL, version 2.
  */
+#include <devicetree.h>
+#include <string.h>
 #include <asm/psci.h>
 #include <asm/setup.h>
 #include <asm/page.h>
 #include <asm/smp.h>

-__attribute__((noinline))
-int psci_invoke(unsigned long function_id, unsigned long arg0,
+static int psci_invoke_hvc(unsigned long function_id, unsigned long arg0,
                unsigned long arg1, unsigned long arg2)
 {
        asm volatile(
@@ -22,6 +23,63 @@ int psci_invoke(unsigned long function_id, unsigned long arg0,
        return function_id;
 }

+static int psci_invoke_smc(unsigned long function_id, unsigned long arg0,
+               unsigned long arg1, unsigned long arg2)
+{
+       asm volatile(
+               "smc #0"
+       : "+r" (function_id)
+       : "r" (arg0), "r" (arg1), "r" (arg2));
+       return function_id;
+}
+
+/*
+ * Initialize to something sensible, so the exit fallback psci_system_off still
+ * works before calling psci_init when booted at EL1.
+ */
+static enum psci_conduit psci_conduit = PSCI_CONDUIT_HVC;
+static int (*psci_fn)(unsigned long, unsigned long, unsigned long,
+               unsigned long) = &psci_invoke_hvc;
+
+void psci_set_conduit(enum psci_conduit conduit)
+{
+       psci_conduit = conduit;
+       if (conduit == PSCI_CONDUIT_HVC)
+               psci_fn = &psci_invoke_hvc;
+       else
+               psci_fn = &psci_invoke_smc;
+}
+
+enum psci_conduit psci_get_conduit(void)
+{
+       return psci_conduit;
+}
+
+int psci_invoke(unsigned long function_id, unsigned long arg0,
+               unsigned long arg1, unsigned long arg2)
+{
+       return psci_fn(function_id, arg0, arg1, arg2);
+}
+
+void psci_init(void)
+{
+       const char *conduit;
+       int ret;
+
+       ret = dt_get_psci_conduit(&conduit);
+       assert(ret == 0 || ret == -FDT_ERR_NOTFOUND);
+
+       if (ret == -FDT_ERR_NOTFOUND)
+               conduit = "hvc";
+
+       assert(strcmp(conduit, "hvc") == 0 || strcmp(conduit, "smc") == 0);
+
+       if (strcmp(conduit, "hvc") == 0)
+               psci_set_conduit(PSCI_CONDUIT_HVC);
+       else
+               psci_set_conduit(PSCI_CONDUIT_SMC);
+}
+
 int psci_cpu_on(unsigned long cpuid, unsigned long entry_point)
 {
 #ifdef __arm__
diff --git a/lib/arm/setup.c b/lib/arm/setup.c
index 4f02fca85607..e0dc9e4801b0 100644
--- a/lib/arm/setup.c
+++ b/lib/arm/setup.c
@@ -21,6 +21,7 @@
 #include <asm/setup.h>
 #include <asm/page.h>
 #include <asm/smp.h>
+#include <asm/psci.h>

 #include "io.h"

@@ -164,7 +165,11 @@ void setup(const void *fdt)
                freemem += initrd_size;
        }

-       /* call init functions */
+       /*
+        * call init functions. psci_init goes first so psci_system_off fallback
+        * works in case of an assert failure
+        */
+       psci_init();
        mem_init(PAGE_ALIGN((unsigned long)freemem));
        cpu_init();

diff --git a/lib/arm64/asm/processor.h b/lib/arm64/asm/processor.h
index 1d9223f728a5..18c5d29ddd1f 100644
--- a/lib/arm64/asm/processor.h
+++ b/lib/arm64/asm/processor.h
@@ -16,6 +16,9 @@
 #define SCTLR_EL1_A    (1 << 1)
 #define SCTLR_EL1_M    (1 << 0)

+#define HCR_EL2_TGE    (1 << 27)
+#define HCR_EL2_E2H    (1 << 34)
+
 #ifndef __ASSEMBLY__
 #include <asm/ptrace.h>
 #include <asm/esr.h>
diff --git a/lib/devicetree.c b/lib/devicetree.c
index 2b89178a109b..4e684c7100b2 100644
--- a/lib/devicetree.c
+++ b/lib/devicetree.c
@@ -263,6 +263,27 @@ int dt_get_bootargs(const char **bootargs)
        return 0;
 }

+int dt_get_psci_conduit(const char **conduit)
+{
+       const struct fdt_property *prop;
+       int node, len;
+
+       *conduit = NULL;
+
+       node = fdt_node_offset_by_compatible(fdt, -1, "arm,psci-0.2");
+       if (node < 0)
+               return node;
+
+       prop = fdt_get_property(fdt, node, "method", &len);
+       if (!prop)
+               return len;
+       if (len < 4)
+               return -FDT_ERR_NOTFOUND;
+
+       *conduit = prop->data;
+       return 0;
+}
+
 int dt_get_default_console_node(void)
 {
        const struct fdt_property *prop;
diff --git a/lib/devicetree.h b/lib/devicetree.h
index 93c7ebc63bd8..236035eb777d 100644
--- a/lib/devicetree.h
+++ b/lib/devicetree.h
@@ -211,6 +211,15 @@ extern int dt_get_reg(int fdtnode, int regidx, struct dt_reg *reg);
 extern int dt_get_bootargs(const char **bootargs);

 /*
+ * dt_get_psci_conduit gets the conduit for PSCI function invocations from
+ * /psci/method
+ * returns
+ *   - zero on success
+ *   - a negative FDT_ERR_* value on failure, and @conduit will be set to null
+ */
+extern int dt_get_psci_conduit(const char **conduit);
+
+/*
  * dt_get_default_console_node gets the node of the path stored in
  * /chosen/stdout-path (or the deprecated /chosen/linux,stdout-path)
  * returns

^ permalink raw reply related	[flat|nested] 176+ messages in thread

* Re: [PATCH 00/59] KVM: arm64: ARMv8.3 Nested Virtualization support
  2019-08-02 10:11 ` Alexandru Elisei
@ 2019-08-02 10:30   ` Andrew Jones
  2019-08-09 10:01   ` Alexandru Elisei
  1 sibling, 0 replies; 176+ messages in thread
From: Andrew Jones @ 2019-08-02 10:30 UTC (permalink / raw)
  To: Alexandru Elisei
  Cc: Marc Zyngier, linux-arm-kernel, kvmarm, kvm, Andre Przywara,
	Dave P Martin

On Fri, Aug 02, 2019 at 10:11:38AM +0000, Alexandru Elisei wrote:
> These are the changes that I made to kvm-unit-tests (the diff can be applied on
> top of upstream master, 2130fd4154ad ("tscdeadline_latency: Check condition
> first before loop")):

It's great to hear that you're doing this. You may find these bit-rotted
commits useful too:

https://github.com/rhdrjones/kvm-unit-tests/commits/arm64/hyp-mode

Thanks,
drew

> 
> diff --git a/arm/cstart64.S b/arm/cstart64.S
> index b0e8baa1a23a..a7631b5a1801 100644
> --- a/arm/cstart64.S
> +++ b/arm/cstart64.S
> @@ -51,6 +51,17 @@ start:
>         b       1b
> 
>  1:
> +       mrs     x4, CurrentEL
> +       cmp     x4, CurrentEL_EL2
> +       b.ne    1f
> +       mrs     x4, mpidr_el1
> +       msr     vmpidr_el2, x4
> +       mrs     x4, midr_el1
> +       msr     vpidr_el2, x4
> +       ldr     x4, =(HCR_EL2_TGE | HCR_EL2_E2H)
> +       msr     hcr_el2, x4
> +       isb
> +1:
>         /* set up stack */
>         mov     x4, #1
>         msr     spsel, x4
> @@ -101,6 +112,17 @@ get_mmu_off:
> 
>  .globl secondary_entry
>  secondary_entry:
> +       mrs     x0, CurrentEL
> +       cmp     x0, CurrentEL_EL2
> +       b.ne    1f
> +       mrs     x0, mpidr_el1
> +       msr     vmpidr_el2, x0
> +       mrs     x0, midr_el1
> +       msr     vpidr_el2, x0
> +       ldr     x0, =(HCR_EL2_TGE | HCR_EL2_E2H)
> +       msr     hcr_el2, x0
> +       isb
> +1:
>         /* Enable FP/ASIMD */
>         mov     x0, #(3 << 20)
>         msr     cpacr_el1, x0
> diff --git a/lib/arm/asm/psci.h b/lib/arm/asm/psci.h
> index 7b956bf5987d..07297a27e0ce 100644
> --- a/lib/arm/asm/psci.h
> +++ b/lib/arm/asm/psci.h
> @@ -3,6 +3,15 @@
>  #include <libcflat.h>
>  #include <linux/psci.h>
> 
> +enum psci_conduit {
> +       PSCI_CONDUIT_HVC,
> +       PSCI_CONDUIT_SMC,
> +};
> +
> +extern void psci_init(void);
> +extern void psci_set_conduit(enum psci_conduit conduit);
> +extern enum psci_conduit psci_get_conduit(void);
> +
>  extern int psci_invoke(unsigned long function_id, unsigned long arg0,
>                        unsigned long arg1, unsigned long arg2);
>  extern int psci_cpu_on(unsigned long cpuid, unsigned long entry_point);
> diff --git a/lib/arm/psci.c b/lib/arm/psci.c
> index c3d399064ae3..20ad4b944738 100644
> --- a/lib/arm/psci.c
> +++ b/lib/arm/psci.c
> @@ -6,13 +6,14 @@
>   *
>   * This work is licensed under the terms of the GNU LGPL, version 2.
>   */
> +#include <devicetree.h>
> +#include <string.h>
>  #include <asm/psci.h>
>  #include <asm/setup.h>
>  #include <asm/page.h>
>  #include <asm/smp.h>
> 
> -__attribute__((noinline))
> -int psci_invoke(unsigned long function_id, unsigned long arg0,
> +static int psci_invoke_hvc(unsigned long function_id, unsigned long arg0,
>                 unsigned long arg1, unsigned long arg2)
>  {
>         asm volatile(
> @@ -22,6 +23,63 @@ int psci_invoke(unsigned long function_id, unsigned long arg0,
>         return function_id;
>  }
> 
> +static int psci_invoke_smc(unsigned long function_id, unsigned long arg0,
> +               unsigned long arg1, unsigned long arg2)
> +{
> +       asm volatile(
> +               "smc #0"
> +       : "+r" (function_id)
> +       : "r" (arg0), "r" (arg1), "r" (arg2));
> +       return function_id;
> +}
> +
> +/*
> + * Initialize to something sensible, so the exit fallback psci_system_off still
> + * works before calling psci_init when booted at EL1.
> + */
> +static enum psci_conduit psci_conduit = PSCI_CONDUIT_HVC;
> +static int (*psci_fn)(unsigned long, unsigned long, unsigned long,
> +               unsigned long) = &psci_invoke_hvc;
> +
> +void psci_set_conduit(enum psci_conduit conduit)
> +{
> +       psci_conduit = conduit;
> +       if (conduit == PSCI_CONDUIT_HVC)
> +               psci_fn = &psci_invoke_hvc;
> +       else
> +               psci_fn = &psci_invoke_smc;
> +}
> +
> +enum psci_conduit psci_get_conduit(void)
> +{
> +       return psci_conduit;
> +}
> +
> +int psci_invoke(unsigned long function_id, unsigned long arg0,
> +               unsigned long arg1, unsigned long arg2)
> +{
> +       return psci_fn(function_id, arg0, arg1, arg2);
> +}
> +
> +void psci_init(void)
> +{
> +       const char *conduit;
> +       int ret;
> +
> +       ret = dt_get_psci_conduit(&conduit);
> +       assert(ret == 0 || ret == -FDT_ERR_NOTFOUND);
> +
> +       if (ret == -FDT_ERR_NOTFOUND)
> +               conduit = "hvc";
> +
> +       assert(strcmp(conduit, "hvc") == 0 || strcmp(conduit, "smc") == 0);
> +
> +       if (strcmp(conduit, "hvc") == 0)
> +               psci_set_conduit(PSCI_CONDUIT_HVC);
> +       else
> +               psci_set_conduit(PSCI_CONDUIT_SMC);
> +}
> +
>  int psci_cpu_on(unsigned long cpuid, unsigned long entry_point)
>  {
>  #ifdef __arm__
> diff --git a/lib/arm/setup.c b/lib/arm/setup.c
> index 4f02fca85607..e0dc9e4801b0 100644
> --- a/lib/arm/setup.c
> +++ b/lib/arm/setup.c
> @@ -21,6 +21,7 @@
>  #include <asm/setup.h>
>  #include <asm/page.h>
>  #include <asm/smp.h>
> +#include <asm/psci.h>
> 
>  #include "io.h"
> 
> @@ -164,7 +165,11 @@ void setup(const void *fdt)
>                 freemem += initrd_size;
>         }
> 
> -       /* call init functions */
> +       /*
> +        * call init functions. psci_init goes first so psci_system_off fallback
> +        * works in case of an assert failure
> +        */
> +       psci_init();
>         mem_init(PAGE_ALIGN((unsigned long)freemem));
>         cpu_init();
> 
> diff --git a/lib/arm64/asm/processor.h b/lib/arm64/asm/processor.h
> index 1d9223f728a5..18c5d29ddd1f 100644
> --- a/lib/arm64/asm/processor.h
> +++ b/lib/arm64/asm/processor.h
> @@ -16,6 +16,9 @@
>  #define SCTLR_EL1_A    (1 << 1)
>  #define SCTLR_EL1_M    (1 << 0)
> 
> +#define HCR_EL2_TGE    (1 << 27)
> +#define HCR_EL2_E2H    (1 << 34)
> +
>  #ifndef __ASSEMBLY__
>  #include <asm/ptrace.h>
>  #include <asm/esr.h>
> diff --git a/lib/devicetree.c b/lib/devicetree.c
> index 2b89178a109b..4e684c7100b2 100644
> --- a/lib/devicetree.c
> +++ b/lib/devicetree.c
> @@ -263,6 +263,27 @@ int dt_get_bootargs(const char **bootargs)
>         return 0;
>  }
> 
> +int dt_get_psci_conduit(const char **conduit)
> +{
> +       const struct fdt_property *prop;
> +       int node, len;
> +
> +       *conduit = NULL;
> +
> +       node = fdt_node_offset_by_compatible(fdt, -1, "arm,psci-0.2");
> +       if (node < 0)
> +               return node;
> +
> +       prop = fdt_get_property(fdt, node, "method", &len);
> +       if (!prop)
> +               return len;
> +       if (len < 4)
> +               return -FDT_ERR_NOTFOUND;
> +
> +       *conduit = prop->data;
> +       return 0;
> +}
> +
>  int dt_get_default_console_node(void)
>  {
>         const struct fdt_property *prop;
> diff --git a/lib/devicetree.h b/lib/devicetree.h
> index 93c7ebc63bd8..236035eb777d 100644
> --- a/lib/devicetree.h
> +++ b/lib/devicetree.h
> @@ -211,6 +211,15 @@ extern int dt_get_reg(int fdtnode, int regidx, struct dt_reg *reg);
>  extern int dt_get_bootargs(const char **bootargs);
> 
>  /*
> + * dt_get_psci_conduit gets the conduit for PSCI function invocations from
> + * /psci/method
> + * returns
> + *   - zero on success
> + *   - a negative FDT_ERR_* value on failure, and @conduit will be set to null
> + */
> +extern int dt_get_psci_conduit(const char **conduit);
> +
> +/*
>   * dt_get_default_console_node gets the node of the path stored in
>   * /chosen/stdout-path (or the deprecated /chosen/linux,stdout-path)
>   * returns

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH 47/59] KVM: arm64: nv: Propagate CNTVOFF_EL2 to the virtual EL1 timer
  2019-06-21  9:38 ` [PATCH 47/59] KVM: arm64: nv: Propagate CNTVOFF_EL2 to the virtual EL1 timer Marc Zyngier
@ 2019-08-08  9:34   ` Alexandru Elisei
  0 siblings, 0 replies; 176+ messages in thread
From: Alexandru Elisei @ 2019-08-08  9:34 UTC (permalink / raw)
  To: Marc Zyngier, linux-arm-kernel, kvmarm, kvm; +Cc: Andre Przywara, Dave Martin

On 6/21/19 10:38 AM, Marc Zyngier wrote:
> We need to allow a guest hypervisor to virtualize the virtual timer.
> For that, let's propagate CNTVOFF_EL2 to the guest's view of that
> timer.
>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  arch/arm64/include/asm/kvm_host.h |  1 -
>  arch/arm64/kvm/sys_regs.c         |  8 ++++++--
>  include/kvm/arm_arch_timer.h      |  1 +
>  virt/kvm/arm/arch_timer.c         | 12 ++++++++++++
>  4 files changed, 19 insertions(+), 3 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index b7c44adcdbf3..e0fe9acb46bf 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -252,7 +252,6 @@ enum vcpu_sysreg {
>  	RMR_EL2,	/* Reset Management Register */
>  	CONTEXTIDR_EL2,	/* Context ID Register (EL2) */
>  	TPIDR_EL2,	/* EL2 Software Thread ID Register */
> -	CNTVOFF_EL2,	/* Counter-timer Virtual Offset register */
>  	CNTHCTL_EL2,	/* Counter-timer Hypervisor Control register */
>  	SP_EL2,		/* EL2 Stack Pointer */
>  
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 1b8016330a19..2031a59fcf49 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -150,7 +150,6 @@ struct el2_sysreg_map {
>  	PURE_EL2_SYSREG( RVBAR_EL2 ),
>  	PURE_EL2_SYSREG( RMR_EL2 ),
>  	PURE_EL2_SYSREG( TPIDR_EL2 ),
> -	PURE_EL2_SYSREG( CNTVOFF_EL2 ),
>  	PURE_EL2_SYSREG( CNTHCTL_EL2 ),
>  	PURE_EL2_SYSREG( HPFAR_EL2 ),
>  	EL2_SYSREG(      SCTLR_EL2,  SCTLR_EL1,      translate_sctlr ),
> @@ -1351,6 +1350,11 @@ static bool access_arch_timer(struct kvm_vcpu *vcpu,
>  		tmr = TIMER_PTIMER;
>  		treg = TIMER_REG_CVAL;
>  		break;
> +	case SYS_CNTVOFF_EL2:
> +		tmr = TIMER_VTIMER;
> +		treg = TIMER_REG_VOFF;
> +		break;
> +
>  	default:
>  		BUG();
>  	}
> @@ -2122,7 +2126,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
>  	{ SYS_DESC(SYS_CONTEXTIDR_EL2), access_rw, reset_val, CONTEXTIDR_EL2, 0 },
>  	{ SYS_DESC(SYS_TPIDR_EL2), access_rw, reset_val, TPIDR_EL2, 0 },
>  
> -	{ SYS_DESC(SYS_CNTVOFF_EL2), access_rw, reset_val, CNTVOFF_EL2, 0 },
> +	{ SYS_DESC(SYS_CNTVOFF_EL2), access_arch_timer },
>  	{ SYS_DESC(SYS_CNTHCTL_EL2), access_rw, reset_val, CNTHCTL_EL2, 0 },
>  
>  	{ SYS_DESC(SYS_CNTHP_TVAL_EL2), access_arch_timer },
> diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
> index 3a5d9255120e..3389606f3029 100644
> --- a/include/kvm/arm_arch_timer.h
> +++ b/include/kvm/arm_arch_timer.h
> @@ -23,6 +23,7 @@ enum kvm_arch_timer_regs {
>  	TIMER_REG_CVAL,
>  	TIMER_REG_TVAL,
>  	TIMER_REG_CTL,
> +	TIMER_REG_VOFF,
>  };
>  
>  struct arch_timer_context {
> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> index 3d84c240071d..1d53352c7d97 100644
> --- a/virt/kvm/arm/arch_timer.c
> +++ b/virt/kvm/arm/arch_timer.c
> @@ -913,6 +913,10 @@ static u64 kvm_arm_timer_read(struct kvm_vcpu *vcpu,
>  		val = kvm_phys_timer_read() - timer->cntvoff;
>  		break;
>  
> +	case TIMER_REG_VOFF:
> +		val = timer->cntvoff;
> +		break;
> +
>  	default:
>  		BUG();
>  	}
> @@ -955,6 +959,10 @@ static void kvm_arm_timer_write(struct kvm_vcpu *vcpu,
>  		timer->cnt_cval = val;
>  		break;
>  
> +	case TIMER_REG_VOFF:
> +		timer->cntvoff = val;
> +		break;
> +
>  	default:
>  		BUG();
>  	}
> @@ -1166,6 +1174,10 @@ int kvm_timer_enable(struct kvm_vcpu *vcpu)
>  		return -EINVAL;
>  	}
>  
> +	/* Nested virtualization requires zero offset for virtual EL2 */
> +	if (nested_virt_in_use(vcpu))
> +		vcpu_vtimer(vcpu)->cntvoff = 0;

I think this is related to the fact that the virtual offset is treated as 0 when
reading CNTVCT_EL0 from EL2, or from EL2 and EL0 if E2H and TGE are set
(please correct me if I'm wrong).
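
In other words, the count the guest hypervisor sees would be:

	CNTVCT_EL0 read from EL1:                    CNTPCT_EL0 - CNTVOFF_EL2
	CNTVCT_EL0 read from EL2 (or EL0, E2H+TGE):  CNTPCT_EL0 (offset is 0)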

However, when the guest runs in virtual EL2, the direct_vtimer is the hvtimer,
so the value that ends up in CNTVOFF_EL2 is vcpu_hvtimer(vcpu)->cntvoff, which
the hunk above leaves untouched.
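
If that's right, a sketch of what I would have expected instead (assuming the
vcpu_hvtimer() accessor from this series is usable at this point) is to zero
the offset of the EL2 timer context as well:

	/* Nested virtualization requires zero offset for virtual EL2 */
	if (nested_virt_in_use(vcpu)) {
		vcpu_vtimer(vcpu)->cntvoff = 0;
		vcpu_hvtimer(vcpu)->cntvoff = 0;
	}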

Thanks,
Alex
> +
>  	get_timer_map(vcpu, &map);
>  
>  	ret = kvm_vgic_map_phys_irq(vcpu,

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH 00/59] KVM: arm64: ARMv8.3 Nested Virtualization support
  2019-08-02 10:11 ` Alexandru Elisei
  2019-08-02 10:30   ` Andrew Jones
@ 2019-08-09 10:01   ` Alexandru Elisei
  2019-08-09 11:44     ` Andrew Jones
  2019-08-22 11:57     ` Alexandru Elisei
  1 sibling, 2 replies; 176+ messages in thread
From: Alexandru Elisei @ 2019-08-09 10:01 UTC (permalink / raw)
  To: Marc Zyngier, linux-arm-kernel, kvmarm, kvm; +Cc: Andre Przywara, Dave P Martin

On 8/2/19 11:11 AM, Alexandru Elisei wrote:
> Hi,
>
> On 6/21/19 10:37 AM, Marc Zyngier wrote:
>> I've taken over the maintenance of this series originally written by
>> Jintack and Christoffer. Since then, the series has been substantially
>> reworked, new features (and most probably bugs) have been added, and
>> the whole thing rebased multiple times. If anything breaks, please
>> blame me, and nobody else.
>>
>> As you can tell, this is quite big. It is also remarkably incomplete
>> (we're missing many critical bits for fully emulate EL2), but the idea
>> is to start merging things early in order to reduce the maintenance
>> headache. What we want to achieve is that with NV disabled, there is
>> no performance overhead and no regression. The only thing I intend to
>> merge ASAP is the first patch in the series, because it should have
>> zero effect and is a reasonable cleanup.
>>
>> The series is roughly divided in 4 parts: exception handling, memory
>> virtualization, interrupts and timers. There are of course some
>> dependencies, but you'll hopefully get the gist of it.
>>
>> For the most courageous of you, I've put out a branch[1] containing this
>> and a bit more. Of course, you'll need some userspace. Andre maintains
>> a hacked version of kvmtool[1] that takes a --nested option, allowing
>> the guest to be started at EL2. You can run the whole stack in the
>> Foundation model. Don't be in a hurry ;-).
>>
>> [1] git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git kvm-arm64/nv-wip-5.2-rc5
>> [2] git://linux-arm.org/kvmtool.git nv/nv-wip-5.2-rc5
>>
>> Andre Przywara (4):
>>   KVM: arm64: nv: Handle virtual EL2 registers in
>>     vcpu_read/write_sys_reg()
>>   KVM: arm64: nv: Save/Restore vEL2 sysregs
>>   KVM: arm64: nv: Handle traps for timer _EL02 and _EL2 sysregs
>>     accessors
>>   KVM: arm64: nv: vgic: Allow userland to set VGIC maintenance IRQ
>>
>> Christoffer Dall (16):
>>   KVM: arm64: nv: Introduce nested virtualization VCPU feature
>>   KVM: arm64: nv: Reset VCPU to EL2 registers if VCPU nested virt is set
>>   KVM: arm64: nv: Allow userspace to set PSR_MODE_EL2x
>>   KVM: arm64: nv: Add nested virt VCPU primitives for vEL2 VCPU state
>>   KVM: arm64: nv: Handle trapped ERET from virtual EL2
>>   KVM: arm64: nv: Emulate PSTATE.M for a guest hypervisor
>>   KVM: arm64: nv: Trap EL1 VM register accesses in virtual EL2
>>   KVM: arm64: nv: Only toggle cache for virtual EL2 when SCTLR_EL2
>>     changes
>>   KVM: arm/arm64: nv: Support multiple nested stage 2 mmu structures
>>   KVM: arm64: nv: Implement nested Stage-2 page table walk logic
>>   KVM: arm64: nv: Handle shadow stage 2 page faults
>>   KVM: arm64: nv: Unmap/flush shadow stage 2 page tables
>>   KVM: arm64: nv: arch_timer: Support hyp timer emulation
>>   KVM: arm64: nv: vgic-v3: Take cpu_if pointer directly instead of vcpu
>>   KVM: arm64: nv: vgic: Emulate the HW bit in software
>>   KVM: arm64: nv: Add nested GICv3 tracepoints
>>
>> Dave Martin (1):
>>   KVM: arm64: Migrate _elx sysreg accessors to msr_s/mrs_s
>>
>> Jintack Lim (21):
>>   arm64: Add ARM64_HAS_NESTED_VIRT cpufeature
>>   KVM: arm64: nv: Add EL2 system registers to vcpu context
>>   KVM: arm64: nv: Support virtual EL2 exceptions
>>   KVM: arm64: nv: Inject HVC exceptions to the virtual EL2
>>   KVM: arm64: nv: Trap SPSR_EL1, ELR_EL1 and VBAR_EL1 from virtual EL2
>>   KVM: arm64: nv: Trap CPACR_EL1 access in virtual EL2
>>   KVM: arm64: nv: Set a handler for the system instruction traps
>>   KVM: arm64: nv: Handle PSCI call via smc from the guest
>>   KVM: arm64: nv: Respect virtual HCR_EL2.TWX setting
>>   KVM: arm64: nv: Respect virtual CPTR_EL2.TFP setting
>>   KVM: arm64: nv: Respect the virtual HCR_EL2.NV bit setting
>>   KVM: arm64: nv: Respect virtual HCR_EL2.TVM and TRVM settings
>>   KVM: arm64: nv: Respect the virtual HCR_EL2.NV1 bit setting
>>   KVM: arm64: nv: Emulate EL12 register accesses from the virtual EL2
>>   KVM: arm64: nv: Configure HCR_EL2 for nested virtualization
>>   KVM: arm64: nv: Pretend we only support larger-than-host page sizes
>>   KVM: arm64: nv: Introduce sys_reg_desc.forward_trap
>>   KVM: arm64: nv: Rework the system instruction emulation framework
>>   KVM: arm64: nv: Trap and emulate AT instructions from virtual EL2
>>   KVM: arm64: nv: Trap and emulate TLBI instructions from virtual EL2
>>   KVM: arm64: nv: Nested GICv3 Support
>>
>> Marc Zyngier (17):
>>   KVM: arm64: Move __load_guest_stage2 to kvm_mmu.h
>>   KVM: arm64: nv: Reset VMPIDR_EL2 and VPIDR_EL2 to sane values
>>   KVM: arm64: nv: Handle SPSR_EL2 specially
>>   KVM: arm64: nv: Refactor vcpu_{read,write}_sys_reg
>>   KVM: arm64: nv: Don't expose SVE to nested guests
>>   KVM: arm64: nv: Hide RAS from nested guests
>>   KVM: arm/arm64: nv: Factor out stage 2 page table data from struct kvm
>>   KVM: arm64: nv: Move last_vcpu_ran to be per s2 mmu
>>   KVM: arm64: nv: Don't always start an S2 MMU search from the beginning
>>   KVM: arm64: nv: Propagate CNTVOFF_EL2 to the virtual EL1 timer
>>   KVM: arm64: nv: Load timer before the GIC
>>   KVM: arm64: nv: Implement maintenance interrupt forwarding
>>   arm64: KVM: nv: Add handling of EL2-specific timer registers
>>   arm64: KVM: nv: Honor SCTLR_EL2.SPAN on entering vEL2
>>   arm64: KVM: nv: Handle SCTLR_EL2 RES0/RES1 bits
>>   arm64: KVM: nv: Restrict S2 RD/WR permissions to match the guest's
>>   arm64: KVM: nv: Allow userspace to request KVM_ARM_VCPU_NESTED_VIRT
>>
>>  .../admin-guide/kernel-parameters.txt         |    4 +
>>  .../virtual/kvm/devices/arm-vgic-v3.txt       |    9 +
>>  arch/arm/include/asm/kvm_asm.h                |    5 +-
>>  arch/arm/include/asm/kvm_emulate.h            |    3 +
>>  arch/arm/include/asm/kvm_host.h               |   31 +-
>>  arch/arm/include/asm/kvm_hyp.h                |   25 +-
>>  arch/arm/include/asm/kvm_mmu.h                |   83 +-
>>  arch/arm/include/asm/kvm_nested.h             |    9 +
>>  arch/arm/include/uapi/asm/kvm.h               |    1 +
>>  arch/arm/kvm/hyp/switch.c                     |   11 +-
>>  arch/arm/kvm/hyp/tlb.c                        |   13 +-
>>  arch/arm64/include/asm/cpucaps.h              |    3 +-
>>  arch/arm64/include/asm/esr.h                  |    4 +-
>>  arch/arm64/include/asm/kvm_arm.h              |   28 +-
>>  arch/arm64/include/asm/kvm_asm.h              |    9 +-
>>  arch/arm64/include/asm/kvm_coproc.h           |    2 +-
>>  arch/arm64/include/asm/kvm_emulate.h          |  157 +-
>>  arch/arm64/include/asm/kvm_host.h             |  105 +-
>>  arch/arm64/include/asm/kvm_hyp.h              |   82 +-
>>  arch/arm64/include/asm/kvm_mmu.h              |   62 +-
>>  arch/arm64/include/asm/kvm_nested.h           |   68 +
>>  arch/arm64/include/asm/sysreg.h               |  143 +-
>>  arch/arm64/include/uapi/asm/kvm.h             |    2 +
>>  arch/arm64/kernel/cpufeature.c                |   26 +
>>  arch/arm64/kvm/Makefile                       |    4 +
>>  arch/arm64/kvm/emulate-nested.c               |  223 +++
>>  arch/arm64/kvm/guest.c                        |    6 +
>>  arch/arm64/kvm/handle_exit.c                  |   76 +-
>>  arch/arm64/kvm/hyp/Makefile                   |    1 +
>>  arch/arm64/kvm/hyp/at.c                       |  217 +++
>>  arch/arm64/kvm/hyp/switch.c                   |   86 +-
>>  arch/arm64/kvm/hyp/sysreg-sr.c                |  267 ++-
>>  arch/arm64/kvm/hyp/tlb.c                      |  129 +-
>>  arch/arm64/kvm/hyp/vgic-v2-cpuif-proxy.c      |    2 +-
>>  arch/arm64/kvm/inject_fault.c                 |   12 -
>>  arch/arm64/kvm/nested.c                       |  551 +++++++
>>  arch/arm64/kvm/regmap.c                       |    4 +-
>>  arch/arm64/kvm/reset.c                        |    7 +
>>  arch/arm64/kvm/sys_regs.c                     | 1460 +++++++++++++++--
>>  arch/arm64/kvm/sys_regs.h                     |    6 +
>>  arch/arm64/kvm/trace.h                        |   58 +-
>>  include/kvm/arm_arch_timer.h                  |    6 +
>>  include/kvm/arm_vgic.h                        |   28 +-
>>  virt/kvm/arm/arch_timer.c                     |  158 +-
>>  virt/kvm/arm/arm.c                            |   62 +-
>>  virt/kvm/arm/hyp/vgic-v3-sr.c                 |   35 +-
>>  virt/kvm/arm/mmio.c                           |   12 +-
>>  virt/kvm/arm/mmu.c                            |  445 +++--
>>  virt/kvm/arm/trace.h                          |    6 +-
>>  virt/kvm/arm/vgic/vgic-init.c                 |   30 +
>>  virt/kvm/arm/vgic/vgic-kvm-device.c           |   22 +
>>  virt/kvm/arm/vgic/vgic-nested-trace.h         |  137 ++
>>  virt/kvm/arm/vgic/vgic-v2.c                   |   10 +-
>>  virt/kvm/arm/vgic/vgic-v3-nested.c            |  236 +++
>>  virt/kvm/arm/vgic/vgic-v3.c                   |   40 +-
>>  virt/kvm/arm/vgic/vgic.c                      |   74 +-
>>  56 files changed, 4683 insertions(+), 612 deletions(-)
>>  create mode 100644 arch/arm/include/asm/kvm_nested.h
>>  create mode 100644 arch/arm64/include/asm/kvm_nested.h
>>  create mode 100644 arch/arm64/kvm/emulate-nested.c
>>  create mode 100644 arch/arm64/kvm/hyp/at.c
>>  create mode 100644 arch/arm64/kvm/nested.c
>>  create mode 100644 virt/kvm/arm/vgic/vgic-nested-trace.h
>>  create mode 100644 virt/kvm/arm/vgic/vgic-v3-nested.c
>>
> When working on adding support for EL2 to kvm-unit-tests I was able to trigger
> the following warning:
>
> # ./lkvm run -f psci.flat -m 128 -c 8 --console serial --irqchip gicv3 --nested
>   # lkvm run --firmware psci.flat -m 128 -c 8 --name guest-151
>   Info: Placing fdt at 0x80200000 - 0x80210000
>   # Warning: The maximum recommended amount of VCPUs is 4
> chr_testdev_init: chr-testdev: can't find a virtio-console
> INFO: PSCI version 1.0
> PASS: invalid-function
> PASS: affinity-info-on
> PASS: affinity-info-off
> [   24.381266] WARNING: CPU: 3 PID: 160 at
> arch/arm64/kvm/../../../virt/kvm/arm/arch_timer.c:170
> kvm_timer_irq_can_fire+0xc/0x30
> [   24.381366] Modules linked in:
> [   24.381466] CPU: 3 PID: 160 Comm: kvm-vcpu-1 Not tainted
> 5.2.0-rc5-00060-g7dbce63bd1c7 #145
> [   24.381566] Hardware name: Foundation-v8A (DT)
> [   24.381566] pstate: 40400009 (nZcv daif +PAN -UAO)
> [   24.381666] pc : kvm_timer_irq_can_fire+0xc/0x30
> [   24.381766] lr : timer_emulate+0x24/0x98
> [   24.381766] sp : ffff000013d8b780
> [   24.381866] x29: ffff000013d8b780 x28: ffff80087a639b80
> [   24.381966] x27: ffff000010ba8648 x26: ffff000010b71b40
> [   24.382066] x25: ffff80087a63a100 x24: 0000000000000000
> [   24.382111] x23: 000080086ca54000 x22: ffff0000100ce260
> [   24.382166] x21: ffff800875e7c918 x20: ffff800875e7a800
> [   24.382275] x19: ffff800875e7ca08 x18: 0000000000000000
> [   24.382366] x17: 0000000000000000 x16: 0000000000000000
> [   24.382466] x15: 0000000000000000 x14: 0000000000002118
> [   24.382566] x13: 0000000000002190 x12: 0000000000002280
> [   24.382566] x11: 0000000000002208 x10: 0000000000000040
> [   24.382666] x9 : ffff000012dc3b38 x8 : 0000000000000000
> [   24.382766] x7 : 0000000000000000 x6 : ffff80087ac00248
> [   24.382866] x5 : 000080086ca54000 x4 : 0000000000002118
> [   24.382966] x3 : eeeeeeeeeeeeeeef x2 : ffff800875e7c918
> [   24.383066] x1 : 0000000000000001 x0 : ffff800875e7ca08
> [   24.383066] Call trace:
> [   24.383166]  kvm_timer_irq_can_fire+0xc/0x30
> [   24.383266]  kvm_timer_vcpu_load+0x9c/0x1a0
> [   24.383366]  kvm_arch_vcpu_load+0xb0/0x1f0
> [   24.383366]  kvm_sched_in+0x1c/0x28
> [   24.383466]  finish_task_switch+0xd8/0x1d8
> [   24.383566]  __schedule+0x248/0x4a0
> [   24.383666]  preempt_schedule_irq+0x60/0x90
> [   24.383666]  el1_irq+0xd0/0x180
> [   24.383766]  kvm_handle_guest_abort+0x0/0x3a0
> [   24.383866]  kvm_arch_vcpu_ioctl_run+0x41c/0x688
> [   24.383866]  kvm_vcpu_ioctl+0x4c0/0x838
> [   24.383966]  do_vfs_ioctl+0xb8/0x878
> [   24.384077]  ksys_ioctl+0x84/0x90
> [   24.384166]  __arm64_sys_ioctl+0x18/0x28
> [   24.384166]  el0_svc_common.constprop.0+0xb0/0x168
> [   24.384266]  el0_svc_handler+0x28/0x78
> [   24.384366]  el0_svc+0x8/0xc
> [   24.384366] ---[ end trace 37a32293e43ac12c ]---
> [   24.384666] WARNING: CPU: 3 PID: 160 at
> arch/arm64/kvm/../../../virt/kvm/arm/arch_timer.c:170
> kvm_timer_irq_can_fire+0xc/0x30
> [   24.384766] Modules linked in:
> [   24.384866] CPU: 3 PID: 160 Comm: kvm-vcpu-1 Tainted: G        W
> 5.2.0-rc5-00060-g7dbce63bd1c7 #145
> [   24.384966] Hardware name: Foundation-v8A (DT)
> [   24.384966] pstate: 40400009 (nZcv daif +PAN -UAO)
> [   24.385066] pc : kvm_timer_irq_can_fire+0xc/0x30
> [   24.385166] lr : timer_emulate+0x24/0x98
> [   24.385166] sp : ffff000013d8b780
> [   24.385266] x29: ffff000013d8b780 x28: ffff80087a639b80
> [   24.385366] x27: ffff000010ba8648 x26: ffff000010b71b40
> [   24.385466] x25: ffff80087a63a100 x24: 0000000000000000
> [   24.385466] x23: 000080086ca54000 x22: ffff0000100ce260
> [   24.385566] x21: ffff800875e7c918 x20: ffff800875e7a800
> [   24.385666] x19: ffff800875e7ca80 x18: 0000000000000000
> [   24.385766] x17: 0000000000000000 x16: 0000000000000000
> [   24.385866] x15: 0000000000000000 x14: 0000000000002118
> [   24.385966] x13: 0000000000002190 x12: 0000000000002280
> [   24.385966] x11: 0000000000002208 x10: 0000000000000040
> [   24.386066] x9 : ffff000012dc3b38 x8 : 0000000000000000
> [   24.386166] x7 : 0000000000000000 x6 : ffff80087ac00248
> [   24.386266] x5 : 000080086ca54000 x4 : 0000000000002118
> [   24.386366] x3 : eeeeeeeeeeeeeeef x2 : ffff800875e7c918
> [   24.386466] x1 : 0000000000000001 x0 : ffff800875e7ca80
> [   24.386466] Call trace:
> [   24.386566]  kvm_timer_irq_can_fire+0xc/0x30
> [   24.386666]  kvm_timer_vcpu_load+0xa8/0x1a0
> [   24.386666]  kvm_arch_vcpu_load+0xb0/0x1f0
> [   24.386898]  kvm_sched_in+0x1c/0x28
> [   24.386966]  finish_task_switch+0xd8/0x1d8
> [   24.387166]  __schedule+0x248/0x4a0
> [   24.387354]  preempt_schedule_irq+0x60/0x90
> [   24.387366]  el1_irq+0xd0/0x180
> [   24.387466]  kvm_handle_guest_abort+0x0/0x3a0
> [   24.387566]  kvm_arch_vcpu_ioctl_run+0x41c/0x688
> [   24.387566]  kvm_vcpu_ioctl+0x4c0/0x838
> [   24.387666]  do_vfs_ioctl+0xb8/0x878
> [   24.387766]  ksys_ioctl+0x84/0x90
> [   24.387866]  __arm64_sys_ioctl+0x18/0x28
> [   24.387866]  el0_svc_common.constprop.0+0xb0/0x168
> [   24.387966]  el0_svc_handler+0x28/0x78
> [   24.388066]  el0_svc+0x8/0xc
> [   24.388066] ---[ end trace 37a32293e43ac12d ]---
> PASS: cpu-on
> SUMMARY: 4 te[   24.390266] WARNING: CPU: 3 PID: 160 at
> arch/arm64/kvm/../../../virt/kvm/arm/arch_timer.c:170
> kvm_timer_irq_can_fire+0xc/0x30
> s[   24.390366] Modules linked in:
> ts[   24.390366] CPU: 3 PID: 160 Comm: kvm-vcpu-1 Tainted: G        W
> 5.2.0-rc5-00060-g7dbce63bd1c7 #145
> [   24.390566] Hardware name: Foundation-v8A (DT)
>
> [   24.390795] pstate: 40400009 (nZcv daif +PAN -UAO)
> [   24.390866] pc : kvm_timer_irq_can_fire+0xc/0x30
> [   24.390966] lr : timer_emulate+0x24/0x98
> [   24.391066] sp : ffff000013d8b780
> [   24.391066] x29: ffff000013d8b780 x28: ffff80087a639b80
> [   24.391166] x27: ffff000010ba8648 x26: ffff000010b71b40
> [   24.391266] x25: ffff80087a63a100 x24: 0000000000000000
> [   24.391366] x23: 000080086ca54000 x22: 0000000000000003
> [   24.391466] x21: ffff800875e7c918 x20: ffff800875e7a800
> [   24.391466] x19: ffff800875e7ca08 x18: 0000000000000000
> [   24.391566] x17: 0000000000000000 x16: 0000000000000000
> [   24.391666] x15: 0000000000000000 x14: 0000000000002118
> [   24.391766] x13: 0000000000002190 x12: 0000000000002280
> [   24.391866] x11: 0000000000002208 x10: 0000000000000040
> [   24.391942] x9 : ffff000012dc3b38 x8 : 0000000000000000
> [   24.391966] x7 : 0000000000000000 x6 : ffff80087ac00248
> [   24.392066] x5 : 000080086ca54000 x4 : 0000000000002118
> [   24.392166] x3 : eeeeeeeeeeeeeeef x2 : ffff800875e7c918
> [   24.392269] x1 : 0000000000000001 x0 : ffff800875e7ca08
> [   24.392366] Call trace:
> [   24.392433]  kvm_timer_irq_can_fire+0xc/0x30
> [   24.392466]  kvm_timer_vcpu_load+0x9c/0x1a0
> [   24.392597]  kvm_arch_vcpu_load+0xb0/0x1f0
> [   24.392666]  kvm_sched_in+0x1c/0x28
> [   24.392766]  finish_task_switch+0xd8/0x1d8
> [   24.392766]  __schedule+0x248/0x4a0
> [   24.392866]  preempt_schedule_irq+0x60/0x90
> [   24.392966]  el1_irq+0xd0/0x180
> [   24.392966]  kvm_handle_guest_abort+0x0/0x3a0
> [   24.393066]  kvm_arch_vcpu_ioctl_run+0x41c/0x688
> [   24.393166]  kvm_vcpu_ioctl+0x4c0/0x838
> [   24.393266]  do_vfs_ioctl+0xb8/0x878
> [   24.393266]  ksys_ioctl+0x84/0x90
> [   24.393366]  __arm64_sys_ioctl+0x18/0x28
> [   24.393466]  el0_svc_common.constprop.0+0xb0/0x168
> [   24.393566]  el0_svc_handler+0x28/0x78
> [   24.393566]  el0_svc+0x8/0xc
> [   24.393666] ---[ end trace 37a32293e43ac12e ]---
> [   24.393866] WARNING: CPU: 3 PID: 160 at
> arch/arm64/kvm/../../../virt/kvm/arm/arch_timer.c:170
> kvm_timer_irq_can_fire+0xc/0x30
> [   24.394066] Modules linked in:
> [   24.394266] CPU: 3 PID: 160 Comm: kvm-vcpu-1 Tainted: G        W
> 5.2.0-rc5-00060-g7dbce63bd1c7 #145
> [   24.394366] Hardware name: Foundation-v8A (DT)
> [   24.394466] pstate: 40400009 (nZcv daif +PAN -UAO)
> [   24.394466] pc : kvm_timer_irq_can_fire+0xc/0x30
> [   24.394566] lr : timer_emulate+0x24/0x98
> [   24.394666] sp : ffff000013d8b780
> [   24.394727] x29: ffff000013d8b780 x28: ffff80087a639b80
> [   24.394766] x27: ffff000010ba8648 x26: ffff000010b71b40
> [   24.394866] x25: ffff80087a63a100 x24: 0000000000000000
> [   24.394966] x23: 000080086ca54000 x22: 0000000000000003
> [   24.394966] x21: ffff800875e7c918 x20: ffff800875e7a800
> [   24.395066] x19: ffff800875e7ca80 x18: 0000000000000000
> [   24.395166] x17: 0000000000000000 x16: 0000000000000000
> [   24.395266] x15: 0000000000000000 x14: 0000000000002118
> [   24.395383] x13: 0000000000002190 x12: 0000000000002280
> [   24.395466] x11: 0000000000002208 x10: 0000000000000040
> [   24.395547] x9 : ffff000012dc3b38 x8 : 0000000000000000
> [   24.395666] x7 : 0000000000000000 x6 : ffff80087ac00248
> [   24.395866] x5 : 000080086ca54000 x4 : 0000000000002118
> [   24.395966] x3 : eeeeeeeeeeeeeeef x2 : ffff800875e7c918
> [   24.396066] x1 : 0000000000000001 x0 : ffff800875e7ca80
> [   24.396066] Call trace:
> [   24.396166]  kvm_timer_irq_can_fire+0xc/0x30
> [   24.396266]  kvm_timer_vcpu_load+0xa8/0x1a0
> [   24.396366]  kvm_arch_vcpu_load+0xb0/0x1f0
> [   24.396366]  kvm_sched_in+0x1c/0x28
> [   24.396466]  finish_task_switch+0xd8/0x1d8
> [   24.396566]  __schedule+0x248/0x4a0
> [   24.396666]  preempt_schedule_irq+0x60/0x90
> [   24.396666]  el1_irq+0xd0/0x180
> [   24.396766]  kvm_handle_guest_abort+0x0/0x3a0
> [   24.396866]  kvm_arch_vcpu_ioctl_run+0x41c/0x688
> [   24.396866]  kvm_vcpu_ioctl+0x4c0/0x838
> [   24.397021]  do_vfs_ioctl+0xb8/0x878
> [   24.397066]  ksys_ioctl+0x84/0x90
> [   24.397166]  __arm64_sys_ioctl+0x18/0x28
> [   24.397348]  el0_svc_common.constprop.0+0xb0/0x168
> [   24.397366]  el0_svc_handler+0x28/0x78
> [   24.397566]  el0_svc+0x8/0xc
> [   24.397676] ---[ end trace 37a32293e43ac12f ]---
>
>   # KVM compatibility warning.
>     virtio-9p device was not detected.
>     While you have requested a virtio-9p device, the guest kernel did not
> initialize it.
>     Please make sure that the guest kernel was compiled with
> CONFIG_NET_9P_VIRTIO=y enabled in .config.
>
>   # KVM compatibility warning.
>     virtio-net device was not detected.
>     While you have requested a virtio-net device, the guest kernel did not
> initialize it.
>     Please make sure that the guest kernel was compiled with CONFIG_VIRTIO_NET=y
> enabled in .config.
>
> [..]

Did some investigating, and this was caused by a bug in kvm-unit-tests (the fix
for it will be part of the EL2 patches for kvm-unit-tests). The guest was trying
to fetch an instruction from address 0x200, which KVM interprets as a prefetch
abort on an I/O address and ends up calling kvm_inject_pabt. The code in
arch/arm64/kvm/inject_fault.c doesn't know anything about nested virtualization,
and it sets the VCPU mode directly to PSR_MODE_EL1h. This makes is_hyp_ctxt()
return false, and get_timer_map() will return an incorrect mapping.

On the next kvm_timer_vcpu_put, the direct timers will be {p,v}timer, and
h{p,v}timer->loaded will not be set to false. In the corresponding call to
kvm_timer_vcpu_load, KVM will try to emulate the hptimer and hvtimer, which
still have loaded = true. And this causes the warning I saw.

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH 00/59] KVM: arm64: ARMv8.3 Nested Virtualization support
  2019-08-09 10:01   ` Alexandru Elisei
@ 2019-08-09 11:44     ` Andrew Jones
  2019-08-09 12:00       ` Alexandru Elisei
  2019-08-22 11:57     ` Alexandru Elisei
  1 sibling, 1 reply; 176+ messages in thread
From: Andrew Jones @ 2019-08-09 11:44 UTC (permalink / raw)
  To: Alexandru Elisei
  Cc: Marc Zyngier, linux-arm-kernel, kvmarm, kvm, Andre Przywara,
	Dave P Martin

On Fri, Aug 09, 2019 at 11:01:51AM +0100, Alexandru Elisei wrote:
> On 8/2/19 11:11 AM, Alexandru Elisei wrote:
> > [..]
> 
> Did some investigating, and this was caused by a bug in kvm-unit-tests (the fix
> for it will be part of the EL2 patches for kvm-unit-tests). The guest was trying
> to fetch an instruction from address 0x200, which KVM interprets as a prefetch
> abort on an I/O address and ends up calling kvm_inject_pabt. The code in
> arch/arm64/kvm/inject_fault.c doesn't know anything about nested virtualization,
> and it sets the VCPU mode directly to PSR_MODE_EL1h. This makes is_hyp_ctxt()
> return false, and get_timer_map() will return an incorrect mapping.
> 
> On the next kvm_timer_vcpu_put, the direct timers will be {p,v}timer, and
> h{p,v}timer->loaded will not be set to false. In the corresponding call to
> kvm_timer_vcpu_load, KVM will try to emulate the hptimer and hvtimer, which
> still have loaded = true. And this causes the warning I saw.
>

Hi Alexandru,

While a unit test in kvm-unit-tests may not do what it should to exercise
the code it targets, and therefore needs to be fixed, I'd argue that if a
guest can induce a host warning then that's a host bug. Indeed, now that
you've analyzed the issue, you could write a kvm-unit-tests test that
specifically reproduces the warning and then use it to test any host fix
candidates.
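
As a starting point, a reproducer could be as simple as the sketch below
(hypothetical and untested, assuming the same setup as your psci run, i.e.
a guest started at vEL2 via --nested, with the MMU off so the fetch reaches
KVM as a stage-2 fault on an unbacked address):

/* hypothetical kvm-unit-tests style sketch, not an actual test */
#include <libcflat.h>

int main(void)
{
	void (*bad)(void) = (void (*)(void))0x200UL;

	printf("branching to an unbacked I/O address\n");
	bad();	/* fetch aborts; KVM should end up in kvm_inject_pabt */

	/* not reached; the injected pabt vectors into the guest */
	return 0;
}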

Thanks,
drew


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH 00/59] KVM: arm64: ARMv8.3 Nested Virtualization support
  2019-08-09 11:44     ` Andrew Jones
@ 2019-08-09 12:00       ` Alexandru Elisei
  2019-08-09 13:00         ` Andrew Jones
  0 siblings, 1 reply; 176+ messages in thread
From: Alexandru Elisei @ 2019-08-09 12:00 UTC (permalink / raw)
  To: Andrew Jones
  Cc: Marc Zyngier, linux-arm-kernel, kvmarm, kvm, Andre Przywara,
	Dave P Martin

Hi Andrew,

On 8/9/19 12:44 PM, Andrew Jones wrote:
> On Fri, Aug 09, 2019 at 11:01:51AM +0100, Alexandru Elisei wrote:
>> On 8/2/19 11:11 AM, Alexandru Elisei wrote:
>>> [..]
>>
>> Did some investigating, and this was caused by a bug in kvm-unit-tests (the fix
>> for it will be part of the EL2 patches for kvm-unit-tests). The guest was trying
>> to fetch an instruction from address 0x200, which KVM interprets as a prefetch
>> abort on an I/O address and ends up calling kvm_inject_pabt. The code in
>> arch/arm64/kvm/inject_fault.c doesn't know anything about nested virtualization,
>> and it sets the VCPU mode directly to PSR_MODE_EL1h. This makes is_hyp_ctxt()
>> return false, and get_timer_map() will return an incorrect mapping.
>>
>> On the next kvm_timer_vcpu_put, the direct timers will be {p,v}timer, and
>> h{p,v}timer->loaded will not be set to false. In the corresponding call to
>> kvm_timer_vcpu_load, KVM will try to emulate the hptimer and hvtimer, which
>> still have loaded = true. And this causes the warning I saw.
>>
> Hi Alexandru,
>
> While a unit test in kvm-unit-tests may not do what it should to exercise
> the code it targets, and therefore needs to be fixed, I'd argue that if a
> guest can induce a host warning then that's a host bug. Indeed, now that
> you've analyzed the issue, you could write a kvm-unit-tests test that
> specifically reproduces the warning and then use it to test any host fix
> candidates.
>
> Thanks,
> drew
>
It was a host bug triggered by a bug in kvm-unit-tests. The kvm-unit-tests bug
is a real bug, because it goes against the intent of the psci test; it wasn't
discovered until now because the upstream version of Linux doesn't print any
messages about it. I'll post a patch for it as soon as I can, and we can
discuss how we want to fix it :)

Thanks,
Alex

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH 00/59] KVM: arm64: ARMv8.3 Nested Virtualization support
  2019-08-09 12:00       ` Alexandru Elisei
@ 2019-08-09 13:00         ` Andrew Jones
  0 siblings, 0 replies; 176+ messages in thread
From: Andrew Jones @ 2019-08-09 13:00 UTC (permalink / raw)
  To: Alexandru Elisei
  Cc: Marc Zyngier, linux-arm-kernel, kvmarm, kvm, Andre Przywara,
	Dave P Martin

On Fri, Aug 09, 2019 at 01:00:02PM +0100, Alexandru Elisei wrote:
> Hi Andrew,
> 
> On 8/9/19 12:44 PM, Andrew Jones wrote:
> > On Fri, Aug 09, 2019 at 11:01:51AM +0100, Alexandru Elisei wrote:
> >> On 8/2/19 11:11 AM, Alexandru Elisei wrote:
> >>> [..]
> >>>
> >>> When working on adding support for EL2 to kvm-unit-tests I was able to trigger
> >>> the following warning:
> >>>
> >>> # ./lkvm run -f psci.flat -m 128 -c 8 --console serial --irqchip gicv3 --nested
> >>>   # lkvm run --firmware psci.flat -m 128 -c 8 --name guest-151
> >>>   Info: Placing fdt at 0x80200000 - 0x80210000
> >>>   # Warning: The maximum recommended amount of VCPUs is 4
> >>> chr_testdev_init: chr-testdev: can't find a virtio-console
> >>> INFO: PSCI version 1.0
> >>> PASS: invalid-function
> >>> PASS: affinity-info-on
> >>> PASS: affinity-info-off
> >>> [   24.381266] WARNING: CPU: 3 PID: 160 at
> >>> arch/arm64/kvm/../../../virt/kvm/arm/arch_timer.c:170
> >>> kvm_timer_irq_can_fire+0xc/0x30
> >>> [   24.381366] Modules linked in:
> >>> [   24.381466] CPU: 3 PID: 160 Comm: kvm-vcpu-1 Not tainted
> >>> 5.2.0-rc5-00060-g7dbce63bd1c7 #145
> >>> [   24.381566] Hardware name: Foundation-v8A (DT)
> >>> [   24.381566] pstate: 40400009 (nZcv daif +PAN -UAO)
> >>> [   24.381666] pc : kvm_timer_irq_can_fire+0xc/0x30
> >>> [   24.381766] lr : timer_emulate+0x24/0x98
> >>> [   24.381766] sp : ffff000013d8b780
> >>> [   24.381866] x29: ffff000013d8b780 x28: ffff80087a639b80
> >>> [   24.381966] x27: ffff000010ba8648 x26: ffff000010b71b40
> >>> [   24.382066] x25: ffff80087a63a100 x24: 0000000000000000
> >>> [   24.382111] x23: 000080086ca54000 x22: ffff0000100ce260
> >>> [   24.382166] x21: ffff800875e7c918 x20: ffff800875e7a800
> >>> [   24.382275] x19: ffff800875e7ca08 x18: 0000000000000000
> >>> [   24.382366] x17: 0000000000000000 x16: 0000000000000000
> >>> [   24.382466] x15: 0000000000000000 x14: 0000000000002118
> >>> [   24.382566] x13: 0000000000002190 x12: 0000000000002280
> >>> [   24.382566] x11: 0000000000002208 x10: 0000000000000040
> >>> [   24.382666] x9 : ffff000012dc3b38 x8 : 0000000000000000
> >>> [   24.382766] x7 : 0000000000000000 x6 : ffff80087ac00248
> >>> [   24.382866] x5 : 000080086ca54000 x4 : 0000000000002118
> >>> [   24.382966] x3 : eeeeeeeeeeeeeeef x2 : ffff800875e7c918
> >>> [   24.383066] x1 : 0000000000000001 x0 : ffff800875e7ca08
> >>> [   24.383066] Call trace:
> >>> [   24.383166]  kvm_timer_irq_can_fire+0xc/0x30
> >>> [   24.383266]  kvm_timer_vcpu_load+0x9c/0x1a0
> >>> [   24.383366]  kvm_arch_vcpu_load+0xb0/0x1f0
> >>> [   24.383366]  kvm_sched_in+0x1c/0x28
> >>> [   24.383466]  finish_task_switch+0xd8/0x1d8
> >>> [   24.383566]  __schedule+0x248/0x4a0
> >>> [   24.383666]  preempt_schedule_irq+0x60/0x90
> >>> [   24.383666]  el1_irq+0xd0/0x180
> >>> [   24.383766]  kvm_handle_guest_abort+0x0/0x3a0
> >>> [   24.383866]  kvm_arch_vcpu_ioctl_run+0x41c/0x688
> >>> [   24.383866]  kvm_vcpu_ioctl+0x4c0/0x838
> >>> [   24.383966]  do_vfs_ioctl+0xb8/0x878
> >>> [   24.384077]  ksys_ioctl+0x84/0x90
> >>> [   24.384166]  __arm64_sys_ioctl+0x18/0x28
> >>> [   24.384166]  el0_svc_common.constprop.0+0xb0/0x168
> >>> [   24.384266]  el0_svc_handler+0x28/0x78
> >>> [   24.384366]  el0_svc+0x8/0xc
> >>> [   24.384366] ---[ end trace 37a32293e43ac12c ]---
> >>> [   24.384666] WARNING: CPU: 3 PID: 160 at
> >>> arch/arm64/kvm/../../../virt/kvm/arm/arch_timer.c:170
> >>> kvm_timer_irq_can_fire+0xc/0x30
> >>> [   24.384766] Modules linked in:
> >>> [   24.384866] CPU: 3 PID: 160 Comm: kvm-vcpu-1 Tainted: G        W
> >>> 5.2.0-rc5-00060-g7dbce63bd1c7 #145
> >>> [   24.384966] Hardware name: Foundation-v8A (DT)
> >>> [   24.384966] pstate: 40400009 (nZcv daif +PAN -UAO)
> >>> [   24.385066] pc : kvm_timer_irq_can_fire+0xc/0x30
> >>> [   24.385166] lr : timer_emulate+0x24/0x98
> >>> [   24.385166] sp : ffff000013d8b780
> >>> [   24.385266] x29: ffff000013d8b780 x28: ffff80087a639b80
> >>> [   24.385366] x27: ffff000010ba8648 x26: ffff000010b71b40
> >>> [   24.385466] x25: ffff80087a63a100 x24: 0000000000000000
> >>> [   24.385466] x23: 000080086ca54000 x22: ffff0000100ce260
> >>> [   24.385566] x21: ffff800875e7c918 x20: ffff800875e7a800
> >>> [   24.385666] x19: ffff800875e7ca80 x18: 0000000000000000
> >>> [   24.385766] x17: 0000000000000000 x16: 0000000000000000
> >>> [   24.385866] x15: 0000000000000000 x14: 0000000000002118
> >>> [   24.385966] x13: 0000000000002190 x12: 0000000000002280
> >>> [   24.385966] x11: 0000000000002208 x10: 0000000000000040
> >>> [   24.386066] x9 : ffff000012dc3b38 x8 : 0000000000000000
> >>> [   24.386166] x7 : 0000000000000000 x6 : ffff80087ac00248
> >>> [   24.386266] x5 : 000080086ca54000 x4 : 0000000000002118
> >>> [   24.386366] x3 : eeeeeeeeeeeeeeef x2 : ffff800875e7c918
> >>> [   24.386466] x1 : 0000000000000001 x0 : ffff800875e7ca80
> >>> [   24.386466] Call trace:
> >>> [   24.386566]  kvm_timer_irq_can_fire+0xc/0x30
> >>> [   24.386666]  kvm_timer_vcpu_load+0xa8/0x1a0
> >>> [   24.386666]  kvm_arch_vcpu_load+0xb0/0x1f0
> >>> [   24.386898]  kvm_sched_in+0x1c/0x28
> >>> [   24.386966]  finish_task_switch+0xd8/0x1d8
> >>> [   24.387166]  __schedule+0x248/0x4a0
> >>> [   24.387354]  preempt_schedule_irq+0x60/0x90
> >>> [   24.387366]  el1_irq+0xd0/0x180
> >>> [   24.387466]  kvm_handle_guest_abort+0x0/0x3a0
> >>> [   24.387566]  kvm_arch_vcpu_ioctl_run+0x41c/0x688
> >>> [   24.387566]  kvm_vcpu_ioctl+0x4c0/0x838
> >>> [   24.387666]  do_vfs_ioctl+0xb8/0x878
> >>> [   24.387766]  ksys_ioctl+0x84/0x90
> >>> [   24.387866]  __arm64_sys_ioctl+0x18/0x28
> >>> [   24.387866]  el0_svc_common.constprop.0+0xb0/0x168
> >>> [   24.387966]  el0_svc_handler+0x28/0x78
> >>> [   24.388066]  el0_svc+0x8/0xc
> >>> [   24.388066] ---[ end trace 37a32293e43ac12d ]---
> >>> PASS: cpu-on
> >>> SUMMARY: 4 te[   24.390266] WARNING: CPU: 3 PID: 160 at
> >>> arch/arm64/kvm/../../../virt/kvm/arm/arch_timer.c:170
> >>> kvm_timer_irq_can_fire+0xc/0x30
> >>> s[   24.390366] Modules linked in:
> >>> ts[   24.390366] CPU: 3 PID: 160 Comm: kvm-vcpu-1 Tainted: G        W
> >>> 5.2.0-rc5-00060-g7dbce63bd1c7 #145
> >>> [   24.390566] Hardware name: Foundation-v8A (DT)
> >>>
> >>> [   24.390795] pstate: 40400009 (nZcv daif +PAN -UAO)
> >>> [   24.390866] pc : kvm_timer_irq_can_fire+0xc/0x30
> >>> [   24.390966] lr : timer_emulate+0x24/0x98
> >>> [   24.391066] sp : ffff000013d8b780
> >>> [   24.391066] x29: ffff000013d8b780 x28: ffff80087a639b80
> >>> [   24.391166] x27: ffff000010ba8648 x26: ffff000010b71b40
> >>> [   24.391266] x25: ffff80087a63a100 x24: 0000000000000000
> >>> [   24.391366] x23: 000080086ca54000 x22: 0000000000000003
> >>> [   24.391466] x21: ffff800875e7c918 x20: ffff800875e7a800
> >>> [   24.391466] x19: ffff800875e7ca08 x18: 0000000000000000
> >>> [   24.391566] x17: 0000000000000000 x16: 0000000000000000
> >>> [   24.391666] x15: 0000000000000000 x14: 0000000000002118
> >>> [   24.391766] x13: 0000000000002190 x12: 0000000000002280
> >>> [   24.391866] x11: 0000000000002208 x10: 0000000000000040
> >>> [   24.391942] x9 : ffff000012dc3b38 x8 : 0000000000000000
> >>> [   24.391966] x7 : 0000000000000000 x6 : ffff80087ac00248
> >>> [   24.392066] x5 : 000080086ca54000 x4 : 0000000000002118
> >>> [   24.392166] x3 : eeeeeeeeeeeeeeef x2 : ffff800875e7c918
> >>> [   24.392269] x1 : 0000000000000001 x0 : ffff800875e7ca08
> >>> [   24.392366] Call trace:
> >>> [   24.392433]  kvm_timer_irq_can_fire+0xc/0x30
> >>> [   24.392466]  kvm_timer_vcpu_load+0x9c/0x1a0
> >>> [   24.392597]  kvm_arch_vcpu_load+0xb0/0x1f0
> >>> [   24.392666]  kvm_sched_in+0x1c/0x28
> >>> [   24.392766]  finish_task_switch+0xd8/0x1d8
> >>> [   24.392766]  __schedule+0x248/0x4a0
> >>> [   24.392866]  preempt_schedule_irq+0x60/0x90
> >>> [   24.392966]  el1_irq+0xd0/0x180
> >>> [   24.392966]  kvm_handle_guest_abort+0x0/0x3a0
> >>> [   24.393066]  kvm_arch_vcpu_ioctl_run+0x41c/0x688
> >>> [   24.393166]  kvm_vcpu_ioctl+0x4c0/0x838
> >>> [   24.393266]  do_vfs_ioctl+0xb8/0x878
> >>> [   24.393266]  ksys_ioctl+0x84/0x90
> >>> [   24.393366]  __arm64_sys_ioctl+0x18/0x28
> >>> [   24.393466]  el0_svc_common.constprop.0+0xb0/0x168
> >>> [   24.393566]  el0_svc_handler+0x28/0x78
> >>> [   24.393566]  el0_svc+0x8/0xc
> >>> [   24.393666] ---[ end trace 37a32293e43ac12e ]---
> >>> [   24.393866] WARNING: CPU: 3 PID: 160 at
> >>> arch/arm64/kvm/../../../virt/kvm/arm/arch_timer.c:170
> >>> kvm_timer_irq_can_fire+0xc/0x30
> >>> [   24.394066] Modules linked in:
> >>> [   24.394266] CPU: 3 PID: 160 Comm: kvm-vcpu-1 Tainted: G        W
> >>> 5.2.0-rc5-00060-g7dbce63bd1c7 #145
> >>> [   24.394366] Hardware name: Foundation-v8A (DT)
> >>> [   24.394466] pstate: 40400009 (nZcv daif +PAN -UAO)
> >>> [   24.394466] pc : kvm_timer_irq_can_fire+0xc/0x30
> >>> [   24.394566] lr : timer_emulate+0x24/0x98
> >>> [   24.394666] sp : ffff000013d8b780
> >>> [   24.394727] x29: ffff000013d8b780 x28: ffff80087a639b80
> >>> [   24.394766] x27: ffff000010ba8648 x26: ffff000010b71b40
> >>> [   24.394866] x25: ffff80087a63a100 x24: 0000000000000000
> >>> [   24.394966] x23: 000080086ca54000 x22: 0000000000000003
> >>> [   24.394966] x21: ffff800875e7c918 x20: ffff800875e7a800
> >>> [   24.395066] x19: ffff800875e7ca80 x18: 0000000000000000
> >>> [   24.395166] x17: 0000000000000000 x16: 0000000000000000
> >>> [   24.395266] x15: 0000000000000000 x14: 0000000000002118
> >>> [   24.395383] x13: 0000000000002190 x12: 0000000000002280
> >>> [   24.395466] x11: 0000000000002208 x10: 0000000000000040
> >>> [   24.395547] x9 : ffff000012dc3b38 x8 : 0000000000000000
> >>> [   24.395666] x7 : 0000000000000000 x6 : ffff80087ac00248
> >>> [   24.395866] x5 : 000080086ca54000 x4 : 0000000000002118
> >>> [   24.395966] x3 : eeeeeeeeeeeeeeef x2 : ffff800875e7c918
> >>> [   24.396066] x1 : 0000000000000001 x0 : ffff800875e7ca80
> >>> [   24.396066] Call trace:
> >>> [   24.396166]  kvm_timer_irq_can_fire+0xc/0x30
> >>> [   24.396266]  kvm_timer_vcpu_load+0xa8/0x1a0
> >>> [   24.396366]  kvm_arch_vcpu_load+0xb0/0x1f0
> >>> [   24.396366]  kvm_sched_in+0x1c/0x28
> >>> [   24.396466]  finish_task_switch+0xd8/0x1d8
> >>> [   24.396566]  __schedule+0x248/0x4a0
> >>> [   24.396666]  preempt_schedule_irq+0x60/0x90
> >>> [   24.396666]  el1_irq+0xd0/0x180
> >>> [   24.396766]  kvm_handle_guest_abort+0x0/0x3a0
> >>> [   24.396866]  kvm_arch_vcpu_ioctl_run+0x41c/0x688
> >>> [   24.396866]  kvm_vcpu_ioctl+0x4c0/0x838
> >>> [   24.397021]  do_vfs_ioctl+0xb8/0x878
> >>> [   24.397066]  ksys_ioctl+0x84/0x90
> >>> [   24.397166]  __arm64_sys_ioctl+0x18/0x28
> >>> [   24.397348]  el0_svc_common.constprop.0+0xb0/0x168
> >>> [   24.397366]  el0_svc_handler+0x28/0x78
> >>> [   24.397566]  el0_svc+0x8/0xc
> >>> [   24.397676] ---[ end trace 37a32293e43ac12f ]---
> >>>
> >>>   # KVM compatibility warning.
> >>>     virtio-9p device was not detected.
> >>>     While you have requested a virtio-9p device, the guest kernel did not
> >>> initialize it.
> >>>     Please make sure that the guest kernel was compiled with
> >>> CONFIG_NET_9P_VIRTIO=y enabled in .config.
> >>>
> >>>   # KVM compatibility warning.
> >>>     virtio-net device was not detected.
> >>>     While you have requested a virtio-net device, the guest kernel did not
> >>> initialize it.
> >>>     Please make sure that the guest kernel was compiled with CONFIG_VIRTIO_NET=y
> >>> enabled in .config.
> >>>
> >>> [..]
> >> Did some investigating and this was caused by a bug in kvm-unit-tests (the fix
> >> for it will be part of the EL2 patches for kvm-unit-tests). The guest was trying
> >> to fetch an instruction from address 0x200, which KVM interprets as a prefetch
> >> abort on an I/O address and ends up calling kvm_inject_pabt. The code from
> >> arch/arm64/kvm/inject_fault.c doesn't know anything about nested virtualization,
> >> and it sets the VCPU mode directly to PSR_MODE_EL1h. This makes is_hyp_ctxt() return
> >> false, and get_timer_map will return an incorrect mapping.
> >>
> >> On the next kvm_timer_vcpu_put, the direct timers will be {p,v}timer, and
> >> h{p,v}timer->loaded will not be set to false. In the corresponding call to
> >> kvm_timer_vcpu_load, KVM will try to emulate the hptimer and hvtimer, which
> >> still have loaded = true. And this causes the warning I saw.
> >>
> > Hi Alexandru,
> >
> > While a unit test in kvm-unit-tests may not do what it should in order to
> > exercise the code it's targeting appropriately, and therefore needs to be
> > fixed, I'd argue that if a guest can induce a host
> > warning then that's a host bug. Indeed now that you've analyzed the
> > issue you could write a kvm-unit-tests test to specifically reproduce the
> > warning and then use that test to test any host fix candidates.
> >
> > Thanks,
> > drew
> >
> It was a host bug triggered by a bug in kvm-unit-tests. The kvm-unit-tests bug
> is a real bug because it goes against the intent of the psci test. It wasn't
> discovered until now because with the upstream version of Linux we don't get any
> messages about it. I'll post a patch for it as soon as I can and we can discuss
> how we want to fix it :)
>

Great, that's how I understood it, but it wasn't clear from your original
message that we were also acknowledging the host bug, only the
kvm-unit-tests bug. I wanted to make sure we fix both.

Thanks,
drew

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH 16/59] KVM: arm64: nv: Save/Restore vEL2 sysregs
  2019-06-21  9:38 ` [PATCH 16/59] KVM: arm64: nv: Save/Restore vEL2 sysregs Marc Zyngier
  2019-06-25  8:48   ` Julien Thierry
  2019-07-01 12:09   ` Alexandru Elisei
@ 2019-08-21 11:57   ` Alexandru Elisei
  2 siblings, 0 replies; 176+ messages in thread
From: Alexandru Elisei @ 2019-08-21 11:57 UTC (permalink / raw)
  To: Marc Zyngier, linux-arm-kernel, kvmarm, kvm; +Cc: Andre Przywara, Dave Martin

On 6/21/19 10:38 AM, Marc Zyngier wrote:
> From: Andre Przywara <andre.przywara@arm.com>
>
> Whenever we need to restore the guest's system registers to the CPU, we
> now need to take care of the EL2 system registers as well. Most of them
> are accessed via traps only, but some have an immediate effect and also
> a guest running in VHE mode would expect them to be accessible via their
> EL1 encoding, which we do not trap.
>
> Split the current __sysreg_{save,restore}_el1_state() functions into
> handling common sysregs, then differentiate between the guest running in
> vEL2 and vEL1.
>
> For vEL2 we write the virtual EL2 registers with an identical format directly
> into their EL1 counterpart, and translate the few registers that have a
> different format for the same effect on the execution when running a
> non-VHE guest hypervisor.
>
>   [ Commit message reworked and many bug fixes applied by Marc Zyngier
>     and Christoffer Dall. ]
>
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
> ---
>  arch/arm64/kvm/hyp/sysreg-sr.c | 160 +++++++++++++++++++++++++++++++--
>  1 file changed, 153 insertions(+), 7 deletions(-)
>
> diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
> index 62866a68e852..2abb9c3ff24f 100644
> --- a/arch/arm64/kvm/hyp/sysreg-sr.c
> +++ b/arch/arm64/kvm/hyp/sysreg-sr.c
> @@ -22,6 +22,7 @@
>  #include <asm/kvm_asm.h>
>  #include <asm/kvm_emulate.h>
>  #include <asm/kvm_hyp.h>
> +#include <asm/kvm_nested.h>
>  
>  /*
>   * Non-VHE: Both host and guest must save everything.
> @@ -51,11 +52,9 @@ static void __hyp_text __sysreg_save_user_state(struct kvm_cpu_context *ctxt)
>  	ctxt->sys_regs[TPIDRRO_EL0]	= read_sysreg(tpidrro_el0);
>  }
>  
> -static void __hyp_text __sysreg_save_el1_state(struct kvm_cpu_context *ctxt)
> +static void __hyp_text __sysreg_save_vel1_state(struct kvm_cpu_context *ctxt)
>  {
> -	ctxt->sys_regs[CSSELR_EL1]	= read_sysreg(csselr_el1);
>  	ctxt->sys_regs[SCTLR_EL1]	= read_sysreg_el1(SYS_SCTLR);
> -	ctxt->sys_regs[ACTLR_EL1]	= read_sysreg(actlr_el1);
>  	ctxt->sys_regs[CPACR_EL1]	= read_sysreg_el1(SYS_CPACR);
>  	ctxt->sys_regs[TTBR0_EL1]	= read_sysreg_el1(SYS_TTBR0);
>  	ctxt->sys_regs[TTBR1_EL1]	= read_sysreg_el1(SYS_TTBR1);
> @@ -69,14 +68,58 @@ static void __hyp_text __sysreg_save_el1_state(struct kvm_cpu_context *ctxt)
>  	ctxt->sys_regs[CONTEXTIDR_EL1]	= read_sysreg_el1(SYS_CONTEXTIDR);
>  	ctxt->sys_regs[AMAIR_EL1]	= read_sysreg_el1(SYS_AMAIR);
>  	ctxt->sys_regs[CNTKCTL_EL1]	= read_sysreg_el1(SYS_CNTKCTL);
> -	ctxt->sys_regs[PAR_EL1]		= read_sysreg(par_el1);
> -	ctxt->sys_regs[TPIDR_EL1]	= read_sysreg(tpidr_el1);
>  
>  	ctxt->gp_regs.sp_el1		= read_sysreg(sp_el1);
>  	ctxt->gp_regs.elr_el1		= read_sysreg_el1(SYS_ELR);
>  	ctxt->gp_regs.spsr[KVM_SPSR_EL1]= read_sysreg_el1(SYS_SPSR);
>  }
>  
> +static void __sysreg_save_vel2_state(struct kvm_cpu_context *ctxt)
> +{
> +	ctxt->sys_regs[ESR_EL2]		= read_sysreg_el1(SYS_ESR);
> +	ctxt->sys_regs[AFSR0_EL2]	= read_sysreg_el1(SYS_AFSR0);
> +	ctxt->sys_regs[AFSR1_EL2]	= read_sysreg_el1(SYS_AFSR1);
> +	ctxt->sys_regs[FAR_EL2]		= read_sysreg_el1(SYS_FAR);
> +	ctxt->sys_regs[MAIR_EL2]	= read_sysreg_el1(SYS_MAIR);
> +	ctxt->sys_regs[VBAR_EL2]	= read_sysreg_el1(SYS_VBAR);
> +	ctxt->sys_regs[CONTEXTIDR_EL2]	= read_sysreg_el1(SYS_CONTEXTIDR);
> +	ctxt->sys_regs[AMAIR_EL2]	= read_sysreg_el1(SYS_AMAIR);
> +
> +	/*
> +	 * In VHE mode those registers are compatible between EL1 and EL2,
> +	 * and the guest uses the _EL1 versions on the CPU naturally.
> +	 * So we save them into their _EL2 versions here.
> +	 * For nVHE mode we trap accesses to those registers, so our
> +	 * _EL2 copy in sys_regs[] is always up-to-date and we don't need
> +	 * to save anything here.
> +	 */
> +	if (__vcpu_el2_e2h_is_set(ctxt)) {
> +		ctxt->sys_regs[SCTLR_EL2]	= read_sysreg_el1(SYS_SCTLR);
> +		ctxt->sys_regs[CPTR_EL2]	= read_sysreg_el1(SYS_CPACR);
> +		ctxt->sys_regs[TTBR0_EL2]	= read_sysreg_el1(SYS_TTBR0);
> +		ctxt->sys_regs[TTBR1_EL2]	= read_sysreg_el1(SYS_TTBR1);
> +		ctxt->sys_regs[TCR_EL2]		= read_sysreg_el1(SYS_TCR);
> +		ctxt->sys_regs[CNTHCTL_EL2]	= read_sysreg_el1(SYS_CNTKCTL);
> +	}

This can break guests that run with VHE on, then disable it. I stumbled into
this while working on kvm-unit-tests, which uses TTBR0 for the translation
tables. Let's consider the following scenario:

1. Guest sets HCR_EL2.E2H
2. Guest programs translation tables in TTBR0_EL1, which should reflect in
TTBR0_EL2.
3. Guest enabled MMU and does stuff.
4. Guest disables MMU and clears HCR_EL2.E2H
5. Guest turns MMU on. It doesn't change TTBR0_EL2, because it will use the same
translation tables as when running with E2H set.
6. The vcpu gets scheduled out. E2H is not set, so the value that the guest
programmed in hardware TTBR0_EL1 won't be copied to virtual TTBR0_EL2.
7. The vcpu gets scheduled back in. KVM will write the reset value for virtual
TTBR0_EL2 (which is 0x0).
8. The guest hangs.
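
In code, the scenario looks roughly like this (C for brevity; the actual
test code is assembly, see the kvm-unit-tests diff at the end of this
mail, and 'pgtable' stands in for the allocated translation tables):

	/* Running at EL2 throughout. */
	write_sysreg(HCR_EL2_E2H | HCR_EL2_TGE, hcr_el2);	/* step 1 */
	write_sysreg(pgtable, ttbr0_el1);	/* step 2: E2H redirects this
						 * write to TTBR0_EL2 */
	asm_mmu_enable(pgtable);				/* step 3 */
	asm_mmu_disable();					/* step 4 */
	write_sysreg(0, hcr_el2);
	asm_mmu_enable_hyp();	/* step 5: TTBR0_EL2 is expected to still
				 * hold 'pgtable', but KVM resets it */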

I think this is actually a symptom of a deeper issue. When E2H is cleared, the
values that the guest wrote to the EL1 registers aren't immediately reflected in
the virtual EL2 registers, as would happen on real hardware. Instead, some of the
hardware values from the EL1 registers are copied to the corresponding EL2
registers on the next vcpu_put, which happens at a later time.

I am thinking that something like this will fix the issues (it did fix disabling
VHE in kvm-unit-tests):

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 1d896113f1f8..f2b5a39762d0 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -333,7 +333,8 @@ void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, u64 val, int reg)
                 * to reverse-translate virtual EL2 system registers for a
                 * non-VHE guest hypervisor.
                 */
-               __vcpu_sys_reg(vcpu, reg) = val;
+               if (reg != HCR_EL2)
+                       __vcpu_sys_reg(vcpu, reg) = val;
 
                switch (reg) {
                case ELR_EL2:
@@ -370,7 +371,17 @@ void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, u64 val, int reg)
                return;
 
 memory_write:
-        __vcpu_sys_reg(vcpu, reg) = val;
+       if (reg == HCR_EL2 && vcpu_el2_e2h_is_set(vcpu) && !(val & HCR_E2H)) {
+               preempt_disable();
+               kvm_arch_vcpu_put(vcpu);
+
+               __vcpu_sys_reg(vcpu, reg) = val;
+
+               kvm_arch_vcpu_load(vcpu, smp_processor_id());
+               preempt_enable();
+       } else {
+                __vcpu_sys_reg(vcpu, reg) = val;
+       }
 }
 
 /* 3 bits per cache level, as per CLIDR, but non-existent caches always 0 */

I don't think there's any need to convert EL1 registers to their non-vhe EL2
format because of how RES0/RES1 is defined in the architecture glossary (ARM DDI
0487E.a, page 7893 for RES0 and 7894 for RES1):

"If a bit is RES0 only in some contexts:
A read of the bit must return the value last successfully written to the bit, by
either a direct or an indirect write, regardless of the use of the register when
the bit was written
[..]
While the use of the register is such that the bit is described as RES0, the
value of the bit must have no effect on the operation of the PE, other than
determining the value read back from that bit, unless this Manual explicitly
defines additional properties for the bit"

We have the translate functions that should take care of converting the non-vhe
EL2 format to the hardware EL1 format.
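
To illustrate the kind of conversion those helpers do, here is a made-up
example for the nVHE CPTR_EL2 -> CPACR_EL1 case (not the actual series
code, and the function name is invented, it's just the idea):

static u64 cptr_el2_to_cpacr_el1(u64 cptr)
{
	u64 cpacr = 0;

	/*
	 * CPTR_EL2.TFP (bit 10) set means trap FP/SIMD;
	 * CPACR_EL1.FPEN (bits 21:20) == 0b11 means don't trap.
	 */
	if (!(cptr & BIT(10)))
		cpacr |= 3UL << 20;

	/* CPTR_EL2.TTA (bit 20) becomes CPACR_EL1.TTA (bit 28). */
	if (cptr & BIT(20))
		cpacr |= BIT(28);

	return cpacr;
}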

As an aside, the diff looks weird because vcpu_write_sys_reg() is very
complex: there are a LOT of exit points from the function, and the register
value is written twice for some registers. I think it's worth considering
making the function simpler, maybe splitting it into two separate functions,
one for EL2 registers and one for regular registers (see the sketch below).
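
Something along these lines (rough sketch, not even compile-tested;
is_el2_sysreg() is a helper I just made up):

static void vcpu_write_el2_sys_reg(struct kvm_vcpu *vcpu, u64 val, int reg)
{
	/*
	 * All the vEL2 special cases (immediate-effect registers,
	 * translations for a non-VHE guest hypervisor, etc.) would
	 * live here, with a single write of the in-memory copy and
	 * a single exit point.
	 */
}

static void vcpu_write_el1_sys_reg(struct kvm_vcpu *vcpu, u64 val, int reg)
{
	/* The existing logic for regular registers, unchanged. */
	__vcpu_sys_reg(vcpu, reg) = val;
}

void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, u64 val, int reg)
{
	if (is_el2_sysreg(reg))
		vcpu_write_el2_sys_reg(vcpu, val, reg);
	else
		vcpu_write_el1_sys_reg(vcpu, val, reg);
}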

Here's the kvm-unit-tests diff that I used to spot the bug. It's very far from
being correct, but the test is able to finish with the fix (it hangs otherwise).
You can apply it on top of 2130fd4154ad ("tscdeadline_latency: Check condition
first before loop"):

diff --git a/arm/cstart64.S b/arm/cstart64.S
index b0e8baa1a23a..01357b3b116b 100644
--- a/arm/cstart64.S
+++ b/arm/cstart64.S
@@ -51,6 +51,18 @@ start:
     b    1b
 
 1:
+    mrs    x4, CurrentEL
+    cmp    x4, CurrentEL_EL2
+    b.ne    1f
+    mrs    x4, mpidr_el1
+    msr    vmpidr_el2, x4
+    mrs    x4, midr_el1
+    msr    vpidr_el2, x4
+    ldr    x4, =(HCR_EL2_TGE | HCR_EL2_E2H)
+    msr    hcr_el2, x4
+    isb
+
+1:
     /* set up stack */
     mov    x4, #1
     msr    spsel, x4
@@ -101,6 +113,18 @@ get_mmu_off:
 
 .globl secondary_entry
 secondary_entry:
+    mrs    x0, CurrentEL
+    cmp    x0, CurrentEL_EL2
+    b.ne    1f
+    mrs    x0, mpidr_el1
+    msr    vmpidr_el2, x0
+    mrs    x0, midr_el1
+    msr    vpidr_el2, x0
+    ldr    x0, =(HCR_EL2_TGE | HCR_EL2_E2H)
+    msr    hcr_el2, x0
+    isb
+
+1:
     /* Enable FP/ASIMD */
     mov    x0, #(3 << 20)
     msr    cpacr_el1, x0
@@ -194,6 +218,33 @@ asm_mmu_enable:
 
     ret
 
+asm_mmu_enable_hyp:
+       ic      iallu
+       tlbi    alle2is
+       dsb     ish
+
+        /* TCR */
+       ldr     x1, =TCR_EL2_RES1 |                     \
+                    TCR_T0SZ(VA_BITS) |                \
+                    TCR_TG0_64K |                      \
+                    TCR_IRGN0_WBWA | TCR_ORGN0_WBWA |  \
+                    TCR_SH0_IS
+       mrs     x2, id_aa64mmfr0_el1
+       bfi     x1, x2, #TCR_EL2_PS_SHIFT, #3
+       msr     tcr_el2, x1
+
+       /* Same MAIR and TTBR0 as in VHE mode */
+
+       /* SCTLR */
+       ldr     x1, =SCTLR_EL2_RES1 |                   \
+                    SCTLR_EL2_C |                      \
+                    SCTLR_EL2_I |                      \
+                    SCTLR_EL2_M
+       msr     sctlr_el2, x1
+       isb
+
+       ret
+
 .globl asm_mmu_disable
 asm_mmu_disable:
     mrs    x0, sctlr_el1
@@ -202,6 +253,18 @@ asm_mmu_disable:
     isb
     ret
 
+.globl asm_disable_vhe
+asm_disable_vhe:
+    str     x30, [sp, #-16]!
+
+    bl      asm_mmu_disable
+    msr     hcr_el2, xzr
+    isb
+    bl      asm_mmu_enable_hyp
+
+    ldr     x30, [sp], #16
+    ret
+
 /*
  * Vectors
  * Adapted from arch/arm64/kernel/entry.S
diff --git a/arm/selftest.c b/arm/selftest.c
index 28a17f7a7531..68a18036221b 100644
--- a/arm/selftest.c
+++ b/arm/selftest.c
@@ -287,6 +287,12 @@ static void user_psci_system_off(struct pt_regs *regs, unsigned int esr)
 {
     __user_psci_system_off();
 }
+
+extern void asm_disable_vhe(void);
+static void check_el2(void)
+{
+    asm_disable_vhe();
+}
 #endif
 
 static void check_vectors(void *arg __unused)
@@ -369,6 +375,10 @@ int main(int argc, char **argv)
         report("PSCI version", psci_check());
         on_cpus(cpu_report, NULL);
 
+    } else if (strcmp(argv[1], "el2") == 0) {
+
+        check_el2();
+
     } else {
         printf("Unknown subtest\n");
         abort();
diff --git a/lib/arm/psci.c b/lib/arm/psci.c
index c3d399064ae3..5ef1c0386ce1 100644
--- a/lib/arm/psci.c
+++ b/lib/arm/psci.c
@@ -16,7 +16,7 @@ int psci_invoke(unsigned long function_id, unsigned long arg0,
         unsigned long arg1, unsigned long arg2)
 {
     asm volatile(
-        "hvc #0"
+        "smc #0"
     : "+r" (function_id)
     : "r" (arg0), "r" (arg1), "r" (arg2));
     return function_id;
diff --git a/lib/arm64/asm/pgtable-hwdef.h b/lib/arm64/asm/pgtable-hwdef.h
index 045a3ce12645..6b80e34dda0c 100644
--- a/lib/arm64/asm/pgtable-hwdef.h
+++ b/lib/arm64/asm/pgtable-hwdef.h
@@ -95,18 +95,42 @@
 /*
  * TCR flags.
  */
-#define TCR_TxSZ(x)        (((UL(64) - (x)) << 16) | ((UL(64) - (x)) << 0))
-#define TCR_IRGN_NC        ((UL(0) << 8) | (UL(0) << 24))
-#define TCR_IRGN_WBWA        ((UL(1) << 8) | (UL(1) << 24))
-#define TCR_IRGN_WT        ((UL(2) << 8) | (UL(2) << 24))
-#define TCR_IRGN_WBnWA        ((UL(3) << 8) | (UL(3) << 24))
-#define TCR_IRGN_MASK        ((UL(3) << 8) | (UL(3) << 24))
-#define TCR_ORGN_NC        ((UL(0) << 10) | (UL(0) << 26))
-#define TCR_ORGN_WBWA        ((UL(1) << 10) | (UL(1) << 26))
-#define TCR_ORGN_WT        ((UL(2) << 10) | (UL(2) << 26))
-#define TCR_ORGN_WBnWA        ((UL(3) << 10) | (UL(3) << 26))
-#define TCR_ORGN_MASK        ((UL(3) << 10) | (UL(3) << 26))
-#define TCR_SHARED        ((UL(3) << 12) | (UL(3) << 28))
+#define TCR_T0SZ(x)        ((UL(64) - (x)) << 0)
+#define TCR_T1SZ(x)        ((UL(64) - (x)) << 16)
+#define TCR_TxSZ(x)        (TCR_T0SZ(x) | TCR_T1SZ(x))
+#define TCR_IRGN0_NC        (UL(0) << 8)
+#define TCR_IRGN1_NC        (UL(0) << 24)
+#define TCR_IRGN_NC        (TCR_IRGN0_NC | TCR_IRGN1_NC)
+#define TCR_IRGN0_WBWA        (UL(1) << 8)
+#define TCR_IRGN1_WBWA        (UL(1) << 24)
+#define TCR_IRGN_WBWA        (TCR_IRGN0_WBWA | TCR_IRGN1_WBWA)
+#define TCR_IRGN0_WT        (UL(2) << 8)
+#define TCR_IRGN1_WT        (UL(2) << 24)
+#define TCR_IRGN_WT        (TCR_IRGN0_WT | TCR_IRGN1_WT)
+#define TCR_IRGN0_WBnWA        (UL(3) << 8)
+#define TCR_IRGN1_WBnWA        (UL(3) << 24)
+#define TCR_IRGN_WBnWA        (TCR_IRGN0_WBnWA | TCR_IRGN1_WBnWA)
+#define TCR_IRGN0_MASK        (UL(3) << 8)
+#define TCR_IRGN1_MASK        (UL(3) << 24)
+#define TCR_IRGN_MASK        (TCR_IRGN0_MASK | TCR_IRGN1_MASK)
+#define TCR_ORGN0_NC        (UL(0) << 10)
+#define TCR_ORGN1_NC        (UL(0) << 26)
+#define TCR_ORGN_NC        (TCR_ORGN0_NC | TCR_ORGN1_NC)
+#define TCR_ORGN0_WBWA        (UL(1) << 10)
+#define TCR_ORGN1_WBWA        (UL(1) << 26)
+#define TCR_ORGN_WBWA        (TCR_ORGN0_WBWA | TCR_ORGN1_WBWA)
+#define TCR_ORGN0_WT        (UL(2) << 10)
+#define TCR_ORGN1_WT        (UL(2) << 26)
+#define TCR_ORGN_WT        (TCR_ORGN0_WT | TCR_ORGN1_WT)
+#define TCR_ORGN0_WBnWA        (UL(3) << 10)
+#define TCR_ORGN1_WBnWA        (UL(3) << 26)
+#define TCR_ORGN_WBnWA        (TCR_ORGN0_WBnWA | TCR_ORGN1_WBnWA)
+#define TCR_ORGN0_MASK        (UL(3) << 10)
+#define TCR_ORGN1_MASK        (UL(3) << 26)
+#define TCR_ORGN_MASK        (TCR_ORGN0_MASK | TCR_ORGN1_MASK)
+#define TCR_SH0_IS        (UL(3) << 12)
+#define TCR_SH1_IS        (UL(3) << 28)
+#define TCR_SHARED        (TCR_SH0_IS | TCR_SH1_IS)
 #define TCR_TG0_4K        (UL(0) << 14)
 #define TCR_TG0_64K        (UL(1) << 14)
 #define TCR_TG0_16K        (UL(2) << 14)
@@ -116,6 +140,9 @@
 #define TCR_ASID16        (UL(1) << 36)
 #define TCR_TBI0        (UL(1) << 37)
 
+#define TCR_EL2_RES1        ((UL(1) << 31) | (UL(1) << 23))
+#define TCR_EL2_PS_SHIFT    16
+
 /*
  * Memory types available.
  */
diff --git a/lib/arm64/asm/processor.h b/lib/arm64/asm/processor.h
index 1d9223f728a5..b2136acda743 100644
--- a/lib/arm64/asm/processor.h
+++ b/lib/arm64/asm/processor.h
@@ -16,6 +16,16 @@
 #define SCTLR_EL1_A    (1 << 1)
 #define SCTLR_EL1_M    (1 << 0)
 
+#define HCR_EL2_TGE    (1 << 27)
+#define HCR_EL2_E2H    (1 << 34)
+
+#define SCTLR_EL2_RES1    ((UL(3) << 28) | (UL(3) << 22) |    \
+            (UL(1) << 18) | (UL(1) << 16) |        \
+            (UL(1) << 11) | (UL(3) << 4))
+#define SCTLR_EL2_I    SCTLR_EL1_I
+#define SCTLR_EL2_C    SCTLR_EL1_C
+#define SCTLR_EL2_M    SCTLR_EL1_M
+
 #ifndef __ASSEMBLY__
 #include <asm/ptrace.h>
 #include <asm/esr.h>


To run it:

lkvm run -f selftest.flat -c 1 -m 128 -p el2 --nested --irqchip gicv3 --console
serial

Thanks,
Alex

^ permalink raw reply related	[flat|nested] 176+ messages in thread

* Re: [PATCH 00/59] KVM: arm64: ARMv8.3 Nested Virtualization support
  2019-08-09 10:01   ` Alexandru Elisei
  2019-08-09 11:44     ` Andrew Jones
@ 2019-08-22 11:57     ` Alexandru Elisei
  2019-08-22 15:32       ` Alexandru Elisei
  1 sibling, 1 reply; 176+ messages in thread
From: Alexandru Elisei @ 2019-08-22 11:57 UTC (permalink / raw)
  To: Marc Zyngier, linux-arm-kernel, kvmarm, kvm; +Cc: Andre Przywara, Dave P Martin

On 8/9/19 11:01 AM, Alexandru Elisei wrote:
> On 8/2/19 11:11 AM, Alexandru Elisei wrote:
>> Hi,
>>
>> On 6/21/19 10:37 AM, Marc Zyngier wrote:
>> When working on adding support for EL2 to kvm-unit-tests I was able to trigger
>> the following warning:
>>
>> [..]
> Did some investigating and this was caused by a bug in kvm-unit-tests (the fix
> for it will be part of the EL2 patches for kvm-unit-tests). The guest was trying
> to fetch an instruction from address 0x200, which KVM interprets as a prefetch
> abort on an I/O address and ends up calling kvm_inject_pabt. The code from
> arch/arm64/kvm/inject_fault.c doesn't know anything about nested virtualization,
> and it sets the VCPU mode directly to PSR_MODE_EL1h. This makes is_hyp_ctxt() return
> false, and get_timer_map will return an incorrect mapping.
>
> On the next kvm_timer_vcpu_put, the direct timers will be {p,v}timer, and
> h{p,v}timer->loaded will not be set to false. In the corresponding call to
> kvm_timer_vcpu_load, KVM will try to emulate the hptimer and hvtimer, which
> still have loaded = true. And this causes the warning I saw.

I tried to fix it with the following patch; inject_undef64 was similarly broken:

diff --git a/arch/arm64/kvm/inject_fault.c b/arch/arm64/kvm/inject_fault.c
index fac962b467bd..aee8a9ef36d5 100644
--- a/arch/arm64/kvm/inject_fault.c
+++ b/arch/arm64/kvm/inject_fault.c
@@ -53,15 +53,7 @@ static void inject_abt64(struct kvm_vcpu *vcpu, bool is_iabt, unsigned long addr
 {
     unsigned long cpsr = *vcpu_cpsr(vcpu);
     bool is_aarch32 = vcpu_mode_is_32bit(vcpu);
-    u32 esr = 0;
-
-    vcpu_write_elr_el1(vcpu, *vcpu_pc(vcpu));
-    *vcpu_pc(vcpu) = get_except_vector(vcpu, except_type_sync);
-
-    *vcpu_cpsr(vcpu) = PSTATE_FAULT_BITS_64;
-    vcpu_write_spsr(vcpu, cpsr);
-
-    vcpu_write_sys_reg(vcpu, addr, FAR_EL1);
+    u32 esr = ESR_ELx_FSC_EXTABT;
 
     /*
      * Build an {i,d}abort, depending on the level and the
@@ -82,13 +74,12 @@ static void inject_abt64(struct kvm_vcpu *vcpu, bool is_iabt, unsigned long addr
     if (!is_iabt)
         esr |= ESR_ELx_EC_DABT_LOW << ESR_ELx_EC_SHIFT;
 
-    vcpu_write_sys_reg(vcpu, esr | ESR_ELx_FSC_EXTABT, ESR_EL1);
-}
+    if (nested_virt_in_use(vcpu)) {
+        kvm_inject_nested_sync(vcpu, esr);
+        return;
+    }
 
-static void inject_undef64(struct kvm_vcpu *vcpu)
-{
-    unsigned long cpsr = *vcpu_cpsr(vcpu);
-    u32 esr = (ESR_ELx_EC_UNKNOWN << ESR_ELx_EC_SHIFT);
+    vcpu_write_sys_reg(vcpu, esr, ESR_EL1);
 
     vcpu_write_elr_el1(vcpu, *vcpu_pc(vcpu));
     *vcpu_pc(vcpu) = get_except_vector(vcpu, except_type_sync);
@@ -96,6 +87,14 @@ static void inject_undef64(struct kvm_vcpu *vcpu)
     *vcpu_cpsr(vcpu) = PSTATE_FAULT_BITS_64;
     vcpu_write_spsr(vcpu, cpsr);
 
+    vcpu_write_sys_reg(vcpu, addr, FAR_EL1);
+}
+
+static void inject_undef64(struct kvm_vcpu *vcpu)
+{
+    unsigned long cpsr = *vcpu_cpsr(vcpu);
+    u32 esr = ESR_ELx_EC_UNKNOWN << ESR_ELx_EC_SHIFT;
+
     /*
      * Build an unknown exception, depending on the instruction
      * set.
@@ -103,7 +102,18 @@ static void inject_undef64(struct kvm_vcpu *vcpu)
     if (kvm_vcpu_trap_il_is32bit(vcpu))
         esr |= ESR_ELx_IL;
 
+    if (nested_virt_in_use(vcpu)) {
+        kvm_inject_nested_sync(vcpu, esr);
+        return;
+    }
+
     vcpu_write_sys_reg(vcpu, esr, ESR_EL1);
+
+    vcpu_write_elr_el1(vcpu, *vcpu_pc(vcpu));
+    *vcpu_pc(vcpu) = get_except_vector(vcpu, except_type_sync);
+
+    *vcpu_cpsr(vcpu) = PSTATE_FAULT_BITS_64;
+    vcpu_write_spsr(vcpu, cpsr);
 }
 
 /**


^ permalink raw reply related	[flat|nested] 176+ messages in thread

* Re: [PATCH 00/59] KVM: arm64: ARMv8.3 Nested Virtualization support
  2019-08-22 11:57     ` Alexandru Elisei
@ 2019-08-22 15:32       ` Alexandru Elisei
  0 siblings, 0 replies; 176+ messages in thread
From: Alexandru Elisei @ 2019-08-22 15:32 UTC (permalink / raw)
  To: Marc Zyngier, linux-arm-kernel, kvmarm, kvm; +Cc: Andre Przywara, Dave P Martin

On 8/22/19 12:57 PM, Alexandru Elisei wrote:
> [..]
> I tried to fix it with the following patch, inject_undef64 was similarly broken:
>
> diff --git a/arch/arm64/kvm/inject_fault.c b/arch/arm64/kvm/inject_fault.c
> index fac962b467bd..aee8a9ef36d5 100644
> --- a/arch/arm64/kvm/inject_fault.c
> +++ b/arch/arm64/kvm/inject_fault.c
> @@ -53,15 +53,7 @@ static void inject_abt64(struct kvm_vcpu *vcpu, bool is_iabt, unsigned long addr
>  {
>      unsigned long cpsr = *vcpu_cpsr(vcpu);
>      bool is_aarch32 = vcpu_mode_is_32bit(vcpu);
> -    u32 esr = 0;
> -
> -    vcpu_write_elr_el1(vcpu, *vcpu_pc(vcpu));
> -    *vcpu_pc(vcpu) = get_except_vector(vcpu, except_type_sync);
> -
> -    *vcpu_cpsr(vcpu) = PSTATE_FAULT_BITS_64;
> -    vcpu_write_spsr(vcpu, cpsr);
> -
> -    vcpu_write_sys_reg(vcpu, addr, FAR_EL1);
> +    u32 esr = ESR_ELx_FSC_EXTABT;
>  
>      /*
>       * Build an {i,d}abort, depending on the level and the
> @@ -82,13 +74,12 @@ static void inject_abt64(struct kvm_vcpu *vcpu, bool is_iabt, unsigned long addr
>      if (!is_iabt)
>          esr |= ESR_ELx_EC_DABT_LOW << ESR_ELx_EC_SHIFT;
>  
> -    vcpu_write_sys_reg(vcpu, esr | ESR_ELx_FSC_EXTABT, ESR_EL1);
> -}
> +    if (nested_virt_in_use(vcpu)) {
> +        kvm_inject_nested_sync(vcpu, esr);
> +        return;
> +    }
>  
> -static void inject_undef64(struct kvm_vcpu *vcpu)
> -{
> -    unsigned long cpsr = *vcpu_cpsr(vcpu);
> -    u32 esr = (ESR_ELx_EC_UNKNOWN << ESR_ELx_EC_SHIFT);
> +    vcpu_write_sys_reg(vcpu, esr, ESR_EL1);
>  
>      vcpu_write_elr_el1(vcpu, *vcpu_pc(vcpu));
>      *vcpu_pc(vcpu) = get_except_vector(vcpu, except_type_sync);
> @@ -96,6 +87,14 @@ static void inject_undef64(struct kvm_vcpu *vcpu)
>      *vcpu_cpsr(vcpu) = PSTATE_FAULT_BITS_64;
>      vcpu_write_spsr(vcpu, cpsr);
>  
> +    vcpu_write_sys_reg(vcpu, addr, FAR_EL1);
> +}
> +
> +static void inject_undef64(struct kvm_vcpu *vcpu)
> +{
> +    unsigned long cpsr = *vcpu_cpsr(vcpu);
> +    u32 esr = ESR_ELx_EC_UNKNOWN << ESR_ELx_EC_SHIFT;
> +
>      /*
>       * Build an unknown exception, depending on the instruction
>       * set.
> @@ -103,7 +102,18 @@ static void inject_undef64(struct kvm_vcpu *vcpu)
>      if (kvm_vcpu_trap_il_is32bit(vcpu))
>          esr |= ESR_ELx_IL;
>  
> +    if (nested_virt_in_use(vcpu)) {
> +        kvm_inject_nested_sync(vcpu, esr);
> +        return;
> +    }
> +
>      vcpu_write_sys_reg(vcpu, esr, ESR_EL1);
> +
> +    vcpu_write_elr_el1(vcpu, *vcpu_pc(vcpu));
> +    *vcpu_pc(vcpu) = get_except_vector(vcpu, except_type_sync);
> +
> +    *vcpu_cpsr(vcpu) = PSTATE_FAULT_BITS_64;
> +    vcpu_write_spsr(vcpu, cpsr);
>  }
>  
>  /**
>
Oops, the above is broken for anything running under an L1 guest hypervisor.
Hopefully this is better:

diff --git a/arch/arm64/kvm/inject_fault.c b/arch/arm64/kvm/inject_fault.c
index fac962b467bd..952e49aeb6f0 100644
--- a/arch/arm64/kvm/inject_fault.c
+++ b/arch/arm64/kvm/inject_fault.c
@@ -53,15 +53,7 @@ static void inject_abt64(struct kvm_vcpu *vcpu, bool is_iabt, unsigned long addr
 {
     unsigned long cpsr = *vcpu_cpsr(vcpu);
     bool is_aarch32 = vcpu_mode_is_32bit(vcpu);
-    u32 esr = 0;
-
-    vcpu_write_elr_el1(vcpu, *vcpu_pc(vcpu));
-    *vcpu_pc(vcpu) = get_except_vector(vcpu, except_type_sync);
-
-    *vcpu_cpsr(vcpu) = PSTATE_FAULT_BITS_64;
-    vcpu_write_spsr(vcpu, cpsr);
-
-    vcpu_write_sys_reg(vcpu, addr, FAR_EL1);
+    u32 esr = ESR_ELx_FSC_EXTABT;
 
     /*
      * Build an {i,d}abort, depending on the level and the
@@ -82,13 +74,12 @@ static void inject_abt64(struct kvm_vcpu *vcpu, bool is_iabt, unsigned long addr
     if (!is_iabt)
         esr |= ESR_ELx_EC_DABT_LOW << ESR_ELx_EC_SHIFT;
 
-    vcpu_write_sys_reg(vcpu, esr | ESR_ELx_FSC_EXTABT, ESR_EL1);
-}
+    if (is_hyp_ctxt(vcpu)) {
+        kvm_inject_nested_sync(vcpu, esr);
+        return;
+    }
 
-static void inject_undef64(struct kvm_vcpu *vcpu)
-{
-    unsigned long cpsr = *vcpu_cpsr(vcpu);
-    u32 esr = (ESR_ELx_EC_UNKNOWN << ESR_ELx_EC_SHIFT);
+    vcpu_write_sys_reg(vcpu, esr, ESR_EL1);
 
     vcpu_write_elr_el1(vcpu, *vcpu_pc(vcpu));
     *vcpu_pc(vcpu) = get_except_vector(vcpu, except_type_sync);
@@ -96,6 +87,14 @@ static void inject_undef64(struct kvm_vcpu *vcpu)
     *vcpu_cpsr(vcpu) = PSTATE_FAULT_BITS_64;
     vcpu_write_spsr(vcpu, cpsr);
 
+    vcpu_write_sys_reg(vcpu, addr, FAR_EL1);
+}
+
+static void inject_undef64(struct kvm_vcpu *vcpu)
+{
+    unsigned long cpsr = *vcpu_cpsr(vcpu);
+    u32 esr = ESR_ELx_EC_UNKNOWN << ESR_ELx_EC_SHIFT;
+
     /*
      * Build an unknown exception, depending on the instruction
      * set.
@@ -103,7 +102,18 @@ static void inject_undef64(struct kvm_vcpu *vcpu)
     if (kvm_vcpu_trap_il_is32bit(vcpu))
         esr |= ESR_ELx_IL;
 
+    if (is_hyp_ctxt(vcpu)) {
+        kvm_inject_nested_sync(vcpu, esr);
+        return;
+    }
+
     vcpu_write_sys_reg(vcpu, esr, ESR_EL1);
+
+    vcpu_write_elr_el1(vcpu, *vcpu_pc(vcpu));
+    *vcpu_pc(vcpu) = get_except_vector(vcpu, except_type_sync);
+
+    *vcpu_cpsr(vcpu) = PSTATE_FAULT_BITS_64;
+    vcpu_write_spsr(vcpu, cpsr);
 }
 
 /**


^ permalink raw reply related	[flat|nested] 176+ messages in thread

* Re: [PATCH 35/59] KVM: arm/arm64: nv: Support multiple nested stage 2 mmu structures
  2019-07-04 15:51   ` Alexandru Elisei
@ 2020-01-05 11:35     ` Marc Zyngier
  2020-01-06 16:31       ` Alexandru Elisei
  0 siblings, 1 reply; 176+ messages in thread
From: Marc Zyngier @ 2020-01-05 11:35 UTC (permalink / raw)
  To: Alexandru Elisei
  Cc: linux-arm-kernel, kvmarm, kvm, Andre Przywara, Dave Martin,
	Julien Thierry, James Morse, Suzuki K Poulose, Christoffer Dall,
	Jintack Lim

[Blast from the past!]

Hi Alexandru,

I'm currently reworking this series, so taking the opportunity to go
through your comments (I won't reply to them all, as most are
perfectly valid and need no further discussion).

On Thu, 04 Jul 2019 16:51:52 +0100,
Alexandru Elisei <alexandru.elisei@arm.com> wrote:
> 
> On 6/21/19 10:38 AM, Marc Zyngier wrote:
> 
> > From: Christoffer Dall <christoffer.dall@arm.com>
> >
> > Add stage 2 mmu data structures for virtual EL2 and for nested guests.
> > We don't yet populate shadow stage 2 page tables, but we now have a
> > framework for getting to a shadow stage 2 pgd.
> >
> > We allocate twice the number of vcpus as stage 2 mmu structures because
> > that's sufficient for each vcpu running two VMs without having to flush
> > the stage 2 page tables.
> >
> > Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
> > Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> > ---
> >  arch/arm/include/asm/kvm_host.h     |   4 +
> >  arch/arm/include/asm/kvm_mmu.h      |   3 +
> >  arch/arm64/include/asm/kvm_host.h   |  28 +++++
> >  arch/arm64/include/asm/kvm_mmu.h    |   8 ++
> >  arch/arm64/include/asm/kvm_nested.h |   7 ++
> >  arch/arm64/kvm/nested.c             | 172 ++++++++++++++++++++++++++++
> >  virt/kvm/arm/arm.c                  |  16 ++-
> >  virt/kvm/arm/mmu.c                  |  31 ++---
> >  8 files changed, 254 insertions(+), 15 deletions(-)
> >
> > diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> > index e3217c4ad25b..b821eb2383ad 100644
> > --- a/arch/arm/include/asm/kvm_host.h
> > +++ b/arch/arm/include/asm/kvm_host.h
> > @@ -424,4 +424,8 @@ static inline bool kvm_arm_vcpu_is_finalized(struct kvm_vcpu *vcpu)
> >  	return true;
> >  }
> >  
> > +static inline void kvm_vcpu_load_hw_mmu(struct kvm_vcpu *vcpu) {}
> > +static inline void kvm_vcpu_put_hw_mmu(struct kvm_vcpu *vcpu) {}
> > +static inline int kvm_vcpu_init_nested(struct kvm_vcpu *vcpu) { return 0; }
> > +
> >  #endif /* __ARM_KVM_HOST_H__ */
> > diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
> > index be23e3f8e08c..e6984b6da2ce 100644
> > --- a/arch/arm/include/asm/kvm_mmu.h
> > +++ b/arch/arm/include/asm/kvm_mmu.h
> > @@ -420,6 +420,9 @@ static inline int hyp_map_aux_data(void)
> >  
> >  static inline void kvm_set_ipa_limit(void) {}
> >  
> > +static inline void kvm_init_s2_mmu(struct kvm_s2_mmu *mmu) {}
> > +static inline void kvm_init_nested(struct kvm *kvm) {}
> > +
> >  static __always_inline u64 kvm_get_vttbr(struct kvm_s2_mmu *mmu)
> >  {
> >  	struct kvm_vmid *vmid = &mmu->vmid;
> > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > index 3dee5e17a4ee..cc238de170d2 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -88,11 +88,39 @@ struct kvm_s2_mmu {
> >  	phys_addr_t	pgd_phys;
> >  
> >  	struct kvm *kvm;
> > +
> > +	/*
> > +	 * For a shadow stage-2 MMU, the virtual vttbr programmed by the guest
> > +	 * hypervisor.  Unused for kvm_arch->mmu. Set to 1 when the structure
> > +	 * contains no valid information.
> > +	 */
> > +	u64	vttbr;
> > +
> > +	/* true when this represents a nested context where virtual HCR_EL2.VM == 1 */
> > +	bool	nested_stage2_enabled;
> > +
> > +	/*
> > +	 *  0: Nobody is currently using this, check vttbr for validity
> > +	 * >0: Somebody is actively using this.
> > +	 */
> > +	atomic_t refcnt;
> >  };
> >  
> > +static inline bool kvm_s2_mmu_valid(struct kvm_s2_mmu *mmu)
> > +{
> > +	return !(mmu->vttbr & 1);
> > +}
> > +
> >  struct kvm_arch {
> >  	struct kvm_s2_mmu mmu;
> >  
> > +	/*
> > +	 * Stage 2 paging stage for VMs with nested virtual using a virtual
> > +	 * VMID.
> > +	 */
> > +	struct kvm_s2_mmu *nested_mmus;
> > +	size_t nested_mmus_size;
> > +
> >  	/* VTCR_EL2 value for this VM */
> >  	u64    vtcr;
> >  
> > diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> > index 1eb6e0ca61c2..32bcaa1845dc 100644
> > --- a/arch/arm64/include/asm/kvm_mmu.h
> > +++ b/arch/arm64/include/asm/kvm_mmu.h
> > @@ -100,6 +100,7 @@ alternative_cb_end
> >  #include <asm/mmu_context.h>
> >  #include <asm/pgtable.h>
> >  #include <asm/kvm_emulate.h>
> > +#include <asm/kvm_nested.h>
> >  
> >  void kvm_update_va_mask(struct alt_instr *alt,
> >  			__le32 *origptr, __le32 *updptr, int nr_inst);
> > @@ -164,6 +165,7 @@ int create_hyp_exec_mappings(phys_addr_t phys_addr, size_t size,
> >  			     void **haddr);
> >  void free_hyp_pgds(void);
> >  
> > +void kvm_unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size);
> >  void stage2_unmap_vm(struct kvm *kvm);
> >  int kvm_alloc_stage2_pgd(struct kvm_s2_mmu *mmu);
> >  void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu);
> > @@ -635,5 +637,11 @@ static __always_inline void __load_guest_stage2(struct kvm_s2_mmu *mmu)
> >  	asm(ALTERNATIVE("nop", "isb", ARM64_WORKAROUND_1165522));
> >  }
> >  
> > +static inline u64 get_vmid(u64 vttbr)
> > +{
> > +	return (vttbr & VTTBR_VMID_MASK(kvm_get_vmid_bits())) >>
> > +		VTTBR_VMID_SHIFT;
> > +}
> > +
> >  #endif /* __ASSEMBLY__ */
> >  #endif /* __ARM64_KVM_MMU_H__ */
> > diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
> > index 61e71d0d2151..d4021d0892bd 100644
> > --- a/arch/arm64/include/asm/kvm_nested.h
> > +++ b/arch/arm64/include/asm/kvm_nested.h
> > @@ -10,6 +10,13 @@ static inline bool nested_virt_in_use(const struct kvm_vcpu *vcpu)
> >  		test_bit(KVM_ARM_VCPU_NESTED_VIRT, vcpu->arch.features);
> >  }
> >  
> > +extern void kvm_init_nested(struct kvm *kvm);
> > +extern int kvm_vcpu_init_nested(struct kvm_vcpu *vcpu);
> > +extern void kvm_init_s2_mmu(struct kvm_s2_mmu *mmu);
> > +extern struct kvm_s2_mmu *lookup_s2_mmu(struct kvm *kvm, u64 vttbr, u64 hcr);
> > +extern void kvm_vcpu_load_hw_mmu(struct kvm_vcpu *vcpu);
> > +extern void kvm_vcpu_put_hw_mmu(struct kvm_vcpu *vcpu);
> > +
> >  int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe);
> >  extern bool forward_traps(struct kvm_vcpu *vcpu, u64 control_bit);
> >  extern bool forward_nv_traps(struct kvm_vcpu *vcpu);
> > diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
> > index 3872e3cf1691..4b38dc5c0be3 100644
> > --- a/arch/arm64/kvm/nested.c
> > +++ b/arch/arm64/kvm/nested.c
> > @@ -18,7 +18,161 @@
> >  #include <linux/kvm.h>
> >  #include <linux/kvm_host.h>
> >  
> > +#include <asm/kvm_arm.h>
> >  #include <asm/kvm_emulate.h>
> > +#include <asm/kvm_mmu.h>
> > +#include <asm/kvm_nested.h>
> > +
> > +void kvm_init_nested(struct kvm *kvm)
> > +{
> > +	kvm_init_s2_mmu(&kvm->arch.mmu);
> > +
> > +	kvm->arch.nested_mmus = NULL;
> > +	kvm->arch.nested_mmus_size = 0;
> > +}
> > +
> > +int kvm_vcpu_init_nested(struct kvm_vcpu *vcpu)
> > +{
> > +	struct kvm *kvm = vcpu->kvm;
> > +	struct kvm_s2_mmu *tmp;
> > +	int num_mmus;
> > +	int ret = -ENOMEM;
> > +
> > +	if (!test_bit(KVM_ARM_VCPU_NESTED_VIRT, vcpu->arch.features))
> > +		return 0;
> > +
> > +	if (!cpus_have_const_cap(ARM64_HAS_NESTED_VIRT))
> > +		return -EINVAL;
> > +
> > +	mutex_lock(&kvm->lock);
> > +
> > +	num_mmus = atomic_read(&kvm->online_vcpus) * 2;
> > +	tmp = __krealloc(kvm->arch.nested_mmus,
> > +			 num_mmus * sizeof(*kvm->arch.nested_mmus),
> > +			 GFP_KERNEL | __GFP_ZERO);
> > +
> > +	if (tmp) {
> > +		if (tmp != kvm->arch.nested_mmus)
> > +			kfree(kvm->arch.nested_mmus);
> 
> Here we are freeing a resource that is shared between all virtual
> CPUs. So if kvm_alloc_stage2_pgd (below) fails, we could end up
> freeing something that has been initialized by previous
> vcpu(s). Same thing happens if tmp == kvm->arch.nested_mmus (because
> of kfree(tmp)).
> 
> Looking at Documentation/virtual/kvm/api.txt (section 4.82,
> KVM_ARM_VCPU_INIT):
> 
> Userspace can call this function multiple times for a given vcpu,
> including after the vcpu has been run. This will reset the vcpu to
> its initial state.
> 
> It seems to me that it is allowed for userspace to try to call
> KVM_ARM_VCPU_INIT again after it failed. I was wondering if it's
> possible to end up in a situation where the second call succeeds,
> but kvm->arch.nested_mmus is not initialized properly and if we
> should care about that.

Indeed, this needs reworking. I think that in the event of any
allocation failure, we're better off leaving things in place. Given
that we always perform a krealloc(), it should be safe to do so.
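
Something like this, maybe (completely untested, and assuming
kvm_alloc_stage2_pgd() keeps initialising each entry via
kvm_init_s2_mmu(), and that nothing can hold a pointer into the old
array while a vcpu is being initialised):

int kvm_vcpu_init_nested(struct kvm_vcpu *vcpu)
{
	struct kvm *kvm = vcpu->kvm;
	struct kvm_s2_mmu *tmp;
	int num_mmus, i;
	int ret = 0;

	if (!test_bit(KVM_ARM_VCPU_NESTED_VIRT, vcpu->arch.features))
		return 0;

	if (!cpus_have_const_cap(ARM64_HAS_NESTED_VIRT))
		return -EINVAL;

	mutex_lock(&kvm->lock);

	num_mmus = atomic_read(&kvm->online_vcpus) * 2;
	tmp = krealloc(kvm->arch.nested_mmus,
		       num_mmus * sizeof(*kvm->arch.nested_mmus),
		       GFP_KERNEL);
	if (!tmp) {
		ret = -ENOMEM;
		goto out_unlock;
	}

	/*
	 * krealloc() copies the old entries and frees the old array
	 * if it moved, so the pgds allocated by previous vcpus stay
	 * valid. Only initialise the entries we have just grown into,
	 * and leave everything else in place on failure.
	 */
	kvm->arch.nested_mmus = tmp;

	for (i = kvm->arch.nested_mmus_size; i < num_mmus; i++) {
		/* New entries are not zeroed by krealloc() */
		memset(&tmp[i], 0, sizeof(tmp[i]));
		tmp[i].kvm = kvm;
		atomic_set(&tmp[i].refcnt, 0);

		ret = kvm_alloc_stage2_pgd(&tmp[i]);
		if (ret) {
			/* Only undo what *this* call allocated */
			while (i-- > kvm->arch.nested_mmus_size)
				kvm_free_stage2_pgd(&tmp[i]);
			goto out_unlock;
		}
	}

	kvm->arch.nested_mmus_size = num_mmus;

out_unlock:
	mutex_unlock(&kvm->lock);
	return ret;
}

That way a failed KVM_ARM_VCPU_INIT can be retried without ever
observing a half-torn-down array.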

> 
> > +
> > +		tmp[num_mmus - 1].kvm = kvm;
> > +		atomic_set(&tmp[num_mmus - 1].refcnt, 0);
> > +		ret = kvm_alloc_stage2_pgd(&tmp[num_mmus - 1]);
> > +		if (ret)
> > +			goto out;
> > +
> > +		tmp[num_mmus - 2].kvm = kvm;
> > +		atomic_set(&tmp[num_mmus - 2].refcnt, 0);
> > +		ret = kvm_alloc_stage2_pgd(&tmp[num_mmus - 2]);
> > +		if (ret) {
> > +			kvm_free_stage2_pgd(&tmp[num_mmus - 1]);
> > +			goto out;
> > +		}
> > +
> > +		kvm->arch.nested_mmus_size = num_mmus;
> > +		kvm->arch.nested_mmus = tmp;
> > +		tmp = NULL;
> > +	}
> > +
> > +out:
> > +	kfree(tmp);
> > +	mutex_unlock(&kvm->lock);
> > +	return ret;
> > +}
> > +
> > +/* Must be called with kvm->lock held */
> > +struct kvm_s2_mmu *lookup_s2_mmu(struct kvm *kvm, u64 vttbr, u64 hcr)
> > +{
> > +	bool nested_stage2_enabled = hcr & HCR_VM;
> > +	int i;
> > +
> > +	/* Don't consider the CnP bit for the vttbr match */
> > +	vttbr = vttbr & ~1UL;
> 
> There's a define for that bit, VTTBR_CNP_BIT in kvm_arm.h (which this file
> includes).

Fair enough.
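
Will use it, i.e.:

	/* Don't consider the CnP bit for the vttbr match */
	vttbr &= ~VTTBR_CNP_BIT;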

> 
> > +
> > +	/* Search a mmu in the list using the virtual VMID as a key */
> 
> I think when the guest has stage 2 enabled we also match on
> VTTBR_EL2.VMID + VTTBR_EL2.BADDR.

Yes.
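
And the comment above the loop should spell both cases out; maybe
something like:

	/*
	 * Two possible matches for a mmu in the list:
	 * - S2 translation enabled: match on VMID + BADDR
	 * - S2 translation disabled: match on the VMID alone
	 */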

> 
> > +	for (i = 0; i < kvm->arch.nested_mmus_size; i++) {
> > +		struct kvm_s2_mmu *mmu = &kvm->arch.nested_mmus[i];
> > +
> > +		if (!kvm_s2_mmu_valid(mmu))
> > +			continue;
> > +
> > +		if (nested_stage2_enabled &&
> > +		    mmu->nested_stage2_enabled &&
> > +		    vttbr == mmu->vttbr)
> > +			return mmu;
> > +
> > +		if (!nested_stage2_enabled &&
> > +		    !mmu->nested_stage2_enabled &&
> > +		    get_vmid(vttbr) == get_vmid(mmu->vttbr))
> > +			return mmu;
> 
> I'm struggling to understand why we do this. As far as I can tell,
> this applies only to non-vhe guest hypervisors, because that's the
> only situation where we run something at vEL1 without shadow stage 2
> enabled.

I don't think the non-VHE consideration is relevant here. Trying to
distinguish between arbitrary use cases is just a massive distraction
and gets in the way of implementing the architecture correctly.

> There's only one MMU that matches that guest, because
> there's only one host for the non-vhe hypervisor. So why are we
> matching on vmid only, instead of matching on the entire vttbr? Is
> it because we don't want to get fooled by a naughty guest that
> changes the BADDR each time it switches to its EL1 host? Or
> something else entirely that I'm missing?

First, let's remember one thing: No matter whether we have S2 enabled
or not, TLBs are tagged by VMID. I wondered about that for a long
time, but couldn't find anything in the architecture that would
contradict this. And if you think of what a CPU must implement, it
makes more sense to always tag things than try to come up with more
fancy schemes.

Given that our shadow S2 is an ugly form of SW-loaded TLBs, matching
the VMID is critical to hit the right shadow MMU context. And since S2
*translation* is disabled, it is perfectly valid to ignore the BADDR
part of VTTBR, and VHE-vs-!VHE doesn't come into the equation.

Now, the real question concerns the previous statement: why do we
decide to match on the whole VTTBR when S2 translation is enabled?
Let's imagine a guest that reuses the same VMID for different
translations (that's a bit bonkers, but unfortunately not forbidden by
the architecture if you consider per-CPU translations).

Or even a single-VCPU guest that changes BADDR each time without
invalidating the TLBs: if we only matched on the VMID, we could
potentially end up with TLB conflicts, and that's not a nice
place to be. So we will just pick a new MMU context, and start
populating shadow translations based on this context. This is pretty
forgiving for broken guests, but it also simplifies the implementation.
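
All of which is probably worth capturing next to the code; something
like this (comments only, the logic is unchanged):

	for (i = 0; i < kvm->arch.nested_mmus_size; i++) {
		struct kvm_s2_mmu *mmu = &kvm->arch.nested_mmus[i];

		if (!kvm_s2_mmu_valid(mmu))
			continue;

		/*
		 * S2 translation enabled: match the full VTTBR (VMID +
		 * BADDR), so that a guest reusing a VMID for different
		 * translations gets a fresh shadow context instead of
		 * a potential TLB conflict.
		 */
		if (nested_stage2_enabled &&
		    mmu->nested_stage2_enabled &&
		    vttbr == mmu->vttbr)
			return mmu;

		/*
		 * S2 translation disabled: TLBs are still tagged by
		 * VMID, so matching on the VMID alone hits the right
		 * shadow context, and BADDR can safely be ignored.
		 */
		if (!nested_stage2_enabled &&
		    !mmu->nested_stage2_enabled &&
		    get_vmid(vttbr) == get_vmid(mmu->vttbr))
			return mmu;
	}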

> 
> > +	}
> > +	return NULL;
> > +}
> > +
> > +static struct kvm_s2_mmu *get_s2_mmu_nested(struct kvm_vcpu *vcpu)
> > +{
> > +	struct kvm *kvm = vcpu->kvm;
> > +	u64 vttbr = vcpu_read_sys_reg(vcpu, VTTBR_EL2);
> > +	u64 hcr = vcpu_read_sys_reg(vcpu, HCR_EL2);
> > +	struct kvm_s2_mmu *s2_mmu;
> > +	int i;
> > +
> > +	s2_mmu = lookup_s2_mmu(kvm, vttbr, hcr);
> > +	if (s2_mmu)
> > +		goto out;
> > +
> > +	for (i = 0; i < kvm->arch.nested_mmus_size; i++) {
> > +		s2_mmu = &kvm->arch.nested_mmus[i];
> > +
> > +		if (atomic_read(&s2_mmu->refcnt) == 0)
> > +			break;
> > +	}
> > +	BUG_ON(atomic_read(&s2_mmu->refcnt)); /* We have struct MMUs to spare */
> > +
> > +	if (kvm_s2_mmu_valid(s2_mmu)) {
> > +		/* Clear the old state */
> > +		kvm_unmap_stage2_range(s2_mmu, 0, kvm_phys_size(kvm));
> > +		if (s2_mmu->vmid.vmid_gen)
> > +			kvm_call_hyp(__kvm_tlb_flush_vmid, s2_mmu);
> > +	}
> > +
> > +	/*
> > +	 * The virtual VMID (modulo CnP) will be used as a key when matching
> > +	 * an existing kvm_s2_mmu.
> > +	 */
> > +	s2_mmu->vttbr = vttbr & ~1UL;
> > +	s2_mmu->nested_stage2_enabled = hcr & HCR_VM;
> > +
> > +out:
> > +	atomic_inc(&s2_mmu->refcnt);
> > +	return s2_mmu;
> > +}
> > +
> > +void kvm_init_s2_mmu(struct kvm_s2_mmu *mmu)
> > +{
> > +	mmu->vttbr = 1;
> > +	mmu->nested_stage2_enabled = false;
> > +	atomic_set(&mmu->refcnt, 0);
> > +}
> > +
> > +void kvm_vcpu_load_hw_mmu(struct kvm_vcpu *vcpu)
> > +{
> > +	if (is_hyp_ctxt(vcpu)) {
> 
> If userspace has set the nested feature, but hasn't set the vcpu
> mode to PSR_MODE_EL2h/PSR_MODE_EL2t, we will never use
> kvm->arch.mmu, and instead we'll always take the mmu_lock and search
> for the mmu. Is that something we should care about?

I'm really not worried about this. If the user has asked for nested
support but doesn't make use of it, tough luck.

> 
> > +		vcpu->arch.hw_mmu = &vcpu->kvm->arch.mmu;
> > +	} else {
> > +		spin_lock(&vcpu->kvm->mmu_lock);
> > +		vcpu->arch.hw_mmu = get_s2_mmu_nested(vcpu);
> > +		spin_unlock(&vcpu->kvm->mmu_lock);
> > +	}
> > +}
> > +
> > +void kvm_vcpu_put_hw_mmu(struct kvm_vcpu *vcpu)
> > +{
> > +	if (vcpu->arch.hw_mmu != &vcpu->kvm->arch.mmu) {
> > +		atomic_dec(&vcpu->arch.hw_mmu->refcnt);
> > +		vcpu->arch.hw_mmu = NULL;
> > +	}
> > +}
> >  
> >  /*
> >   * Inject wfx to the virtual EL2 if this is not from the virtual EL2 and
> > @@ -37,3 +191,21 @@ int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe)
> >  
> >  	return -EINVAL;
> >  }
> > +
> > +void kvm_arch_flush_shadow_all(struct kvm *kvm)
> > +{
> > +	int i;
> > +
> > +	for (i = 0; i < kvm->arch.nested_mmus_size; i++) {
> > +		struct kvm_s2_mmu *mmu = &kvm->arch.nested_mmus[i];
> > +
> > +		WARN_ON(atomic_read(&mmu->refcnt));
> > +
> > +		if (!atomic_read(&mmu->refcnt))
> > +			kvm_free_stage2_pgd(mmu);
> > +	}
> > +	kfree(kvm->arch.nested_mmus);
> > +	kvm->arch.nested_mmus = NULL;
> > +	kvm->arch.nested_mmus_size = 0;
> > +	kvm_free_stage2_pgd(&kvm->arch.mmu);
> > +}
> > diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> > index 5d4371633e1c..4e3cbfa1ecbe 100644
> > --- a/virt/kvm/arm/arm.c
> > +++ b/virt/kvm/arm/arm.c
> > @@ -36,6 +36,7 @@
> >  #include <asm/kvm_arm.h>
> >  #include <asm/kvm_asm.h>
> >  #include <asm/kvm_mmu.h>
> > +#include <asm/kvm_nested.h>
> >  #include <asm/kvm_emulate.h>
> >  #include <asm/kvm_coproc.h>
> >  #include <asm/sections.h>
> > @@ -126,6 +127,8 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
> >  	kvm->arch.mmu.vmid.vmid_gen = 0;
> >  	kvm->arch.mmu.kvm = kvm;
> >  
> > +	kvm_init_nested(kvm);
> 
> More context:
> 
> @@ -120,18 +121,20 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>  
>      ret = kvm_alloc_stage2_pgd(&kvm->arch.mmu);
>      if (ret)
>          goto out_fail_alloc;
>  
>      /* Mark the initial VMID generation invalid */
>      kvm->arch.mmu.vmid.vmid_gen = 0;
>      kvm->arch.mmu.kvm = kvm;
>  
> +    kvm_init_nested(kvm);
> +
> 
> kvm_alloc_stage2_pgd ends up calling kvm_init_s2_mmu for
> kvm->arch.mmu.  kvm_init_nested does the same thing, and we end up
> calling kvm_init_s2_mmu for kvm->arch.mmu twice. There's really no
> harm in doing it twice, but I thought I'd mention it because it
> looks to me that the fix is simple: remove the kvm_init_s2_mmu call
> from kvm_init_nested. The double initialization is also true for the
> tip of the series.

I've reworked this part a bit, and need to reconcile the above with
what I now have. Overall, I do agree that there is some vastly
redundant init here.
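
On the current code, Alex's suggestion boils down to something like
(kvm_alloc_stage2_pgd() already runs kvm_init_s2_mmu() on
kvm->arch.mmu, so there is nothing else to do there):

void kvm_init_nested(struct kvm *kvm)
{
	kvm->arch.nested_mmus = NULL;
	kvm->arch.nested_mmus_size = 0;
}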

And a very belated thanks for all the work you've put into reviewing
this difficult series. Rest assured that this wasn't in vain!

Thanks,

	M.

-- 
Jazz is not dead, it just smells funny.

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH 35/59] KVM: arm/arm64: nv: Support multiple nested stage 2 mmu structures
  2020-01-05 11:35     ` Marc Zyngier
@ 2020-01-06 16:31       ` Alexandru Elisei
  0 siblings, 0 replies; 176+ messages in thread
From: Alexandru Elisei @ 2020-01-06 16:31 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, Andre Przywara, Dave Martin,
	Julien Thierry, James Morse, Suzuki K Poulose, Christoffer Dall,
	Jintack Lim

Hi,

On 1/5/20 11:35 AM, Marc Zyngier wrote:
> [Blast from the past!]
>
> Hi Alexandru,
>
> I'm currently reworking this series, so taking the opportunity to go
> through your comments (I won't reply to them all, as most are
> perfectly valid and need no further discussion).
>
> On Thu, 04 Jul 2019 16:51:52 +0100,
> Alexandru Elisei <alexandru.elisei@arm.com> wrote:
>> On 6/21/19 10:38 AM, Marc Zyngier wrote:
>>
>> [...]
>>
>> I'm struggling to understand why we do this. As far as I can tell,
>> this applies only to non-vhe guest hypervisors, because that's the
>> only situation where we run something at vEL1 without shadow stage 2
>> enabled.
> I don't think the non-VHE consideration is relevant here. Trying to
> distinguish between arbitrary use cases is just a massive distraction
> and gets in the way of implementing the architecture correctly.

This is very true. By the time I had finished reviewing the series I had fallen
into this trap several times. Lesson learned, I hope.

>
>> There's only one MMU that matches that guest, because
>> there's only one host for the non-vhe hypervisor. So why are we
>> matching on vmid only, instead of matching on the entire vttbr? Is
>> it because we don't want to get fooled by a naughty guest that
>> changes the BADDR each time it switches to its EL1 host? Or
>> something else entirely that I'm missing?
> First, let's remember one thing: No matter whether we have S2 enabled
> or not, TLBs are tagged by VMID. I wondered about that for a long
> time, but couldn't find anything in the architecture that would
> contradict this. And if you think of what a CPU must implement, it
> makes more sense to always tag things than try to come up with more
> fancy schemes.
>
> Given that our shadow S2 is an ugly form of SW-loaded TLBs, matching
> the VMID is critical to hit the right shadow MMU context. And since S2
> *translation* is disabled, it is perfectly valid to ignore the BADDR
> part of VTTBR, and VHE-vs-!VHE doesn't come into the equation.

Very true.

>
> Now, the real question concerns the previous statement: why do we
> decide to match on the whole VTTBR when S2 translation is enabled?
> Let's imagine a guest that reuses the same VMID for different
> translations (that's a bit bonkers, but unfortunately not forbidden by
> the architecture if you consider per-CPU translations).
>
> Or even a single-VCPU guest that changes BADDR each time without
> invalidating the TLBs: if we only matched on the VMID, we could
> potentially end up with TLB conflicts, and that's not a nice
> place to be. So we will just pick a new MMU context, and start
> populating shadow translations based on this context. This is pretty
> forgiving for broken guests, but it also simplifies the implementation.

I think I understand what you are saying, thank you for the explanation.

>
>>> [...]
>>>
>>> +void kvm_vcpu_load_hw_mmu(struct kvm_vcpu *vcpu)
>>> +{
>>> +	if (is_hyp_ctxt(vcpu)) {
>> If userspace has set the nested feature, but hasn't set the vcpu
>> mode to PSR_MODE_EL2h/PSR_MODE_EL2t, we will never use
>> kvm->arch.mmu, and instead we'll always take the mmu_lock and search
>> for the mmu. Is that something we should care about?
> I'm really not worried about this. If the user has asked for nested
> support but doesn't make use of it, tough luck.

Fair enough.

>
> [...]
>
> And a very belated thanks for all the work you've put into reviewing
> this difficult series. Rest assured that this wasn't in vain!

Thank you. Reviewing this series has been very useful for learning how KVM works,
as well as some of the finer points of the Arm architecture. I look forward to
reviewing v2 of the patches when they're ready.

Thanks,
Alex

^ permalink raw reply	[flat|nested] 176+ messages in thread

end of thread, other threads:[~2020-01-06 16:31 UTC | newest]

Thread overview: 176+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-06-21  9:37 [PATCH 00/59] KVM: arm64: ARMv8.3 Nested Virtualization support Marc Zyngier
2019-06-21  9:37 ` [PATCH 01/59] KVM: arm64: Migrate _elx sysreg accessors to msr_s/mrs_s Marc Zyngier
2019-06-24 11:16   ` Dave Martin
2019-06-24 12:59   ` Alexandru Elisei
2019-07-03 12:32     ` Marc Zyngier
2019-06-21  9:37 ` [PATCH 02/59] KVM: arm64: Move __load_guest_stage2 to kvm_mmu.h Marc Zyngier
2019-06-24 11:19   ` Dave Martin
2019-07-03  9:30     ` Marc Zyngier
2019-07-03 16:13       ` Dave Martin
2019-06-21  9:37 ` [PATCH 03/59] arm64: Add ARM64_HAS_NESTED_VIRT cpufeature Marc Zyngier
2019-06-21 13:08   ` Julien Thierry
2019-06-21 13:22     ` Marc Zyngier
2019-06-21 13:44   ` Suzuki K Poulose
2019-06-24 11:24   ` Dave Martin
2019-06-21  9:37 ` [PATCH 04/59] KVM: arm64: nv: Introduce nested virtualization VCPU feature Marc Zyngier
2019-06-21 13:08   ` Julien Thierry
2019-06-24 11:28   ` Dave Martin
2019-07-03 11:53     ` Marc Zyngier
2019-07-03 16:27       ` Dave Martin
2019-06-24 11:43   ` Dave Martin
2019-07-03 11:56     ` Marc Zyngier
2019-07-03 16:24       ` Dave Martin
2019-06-21  9:37 ` [PATCH 05/59] KVM: arm64: nv: Reset VCPU to EL2 registers if VCPU nested virt is set Marc Zyngier
2019-06-24 10:19   ` Suzuki K Poulose
2019-06-24 11:38   ` Dave Martin
2019-06-21  9:37 ` [PATCH 06/59] KVM: arm64: nv: Allow userspace to set PSR_MODE_EL2x Marc Zyngier
2019-06-21 13:24   ` Julien Thierry
2019-06-21 13:50     ` Marc Zyngier
2019-06-24 12:48       ` Dave Martin
2019-07-03  9:21         ` Marc Zyngier
2019-07-04 10:00           ` Dave Martin
2019-06-21  9:37 ` [PATCH 07/59] KVM: arm64: nv: Add EL2 system registers to vcpu context Marc Zyngier
2019-06-24 12:54   ` Dave Martin
2019-07-03 12:20     ` Marc Zyngier
2019-07-03 16:31       ` Dave Martin
2019-06-24 15:47   ` Alexandru Elisei
2019-07-03 13:20     ` Marc Zyngier
2019-07-03 16:01       ` Marc Zyngier
2019-07-01 16:36   ` Suzuki K Poulose
2019-06-21  9:37 ` [PATCH 08/59] KVM: arm64: nv: Reset VMPIDR_EL2 and VPIDR_EL2 to sane values Marc Zyngier
2019-06-24 12:59   ` Dave Martin
2019-06-21  9:37 ` [PATCH 09/59] KVM: arm64: nv: Add nested virt VCPU primitives for vEL2 VCPU state Marc Zyngier
2019-06-24 13:08   ` Dave Martin
2019-06-21  9:37 ` [PATCH 10/59] KVM: arm64: nv: Support virtual EL2 exceptions Marc Zyngier
2019-07-08 13:56   ` Steven Price
2019-06-21  9:37 ` [PATCH 11/59] KVM: arm64: nv: Inject HVC exceptions to the virtual EL2 Marc Zyngier
2019-06-25 13:13   ` Alexandru Elisei
2019-07-03 14:16     ` Marc Zyngier
2019-07-30 14:08     ` Alexandru Elisei
2019-06-21  9:37 ` [PATCH 12/59] KVM: arm64: nv: Handle trapped ERET from " Marc Zyngier
2019-07-02 12:00   ` Alexandru Elisei
2019-06-21  9:37 ` [PATCH 13/59] KVM: arm64: nv: Handle virtual EL2 registers in vcpu_read/write_sys_reg() Marc Zyngier
2019-06-24 12:42   ` Julien Thierry
2019-06-25 14:02     ` Alexandru Elisei
2019-07-03 12:15     ` Marc Zyngier
2019-07-03 15:21       ` Julien Thierry
2019-06-25 15:18   ` Alexandru Elisei
2019-07-01  9:58     ` Alexandru Elisei
2019-07-03 15:59     ` Marc Zyngier
2019-07-03 16:32       ` Alexandru Elisei
2019-07-04 14:39         ` Marc Zyngier
2019-06-26 15:04   ` Alexandru Elisei
2019-07-04 15:05     ` Marc Zyngier
2019-07-01 12:10   ` Alexandru Elisei
2019-06-21  9:37 ` [PATCH 14/59] KVM: arm64: nv: Handle SPSR_EL2 specially Marc Zyngier
2019-06-21  9:37 ` [PATCH 15/59] KVM: arm64: nv: Refactor vcpu_{read,write}_sys_reg Marc Zyngier
2019-06-24 15:07   ` Julien Thierry
2019-07-03 13:09     ` Marc Zyngier
2019-06-27  9:21   ` Alexandru Elisei
2019-07-04 15:15     ` Marc Zyngier
2019-06-21  9:38 ` [PATCH 16/59] KVM: arm64: nv: Save/Restore vEL2 sysregs Marc Zyngier
2019-06-25  8:48   ` Julien Thierry
2019-07-03 13:42     ` Marc Zyngier
2019-07-01 12:09   ` Alexandru Elisei
2019-08-21 11:57   ` Alexandru Elisei
2019-06-21  9:38 ` [PATCH 17/59] KVM: arm64: nv: Emulate PSTATE.M for a guest hypervisor Marc Zyngier
2019-06-21  9:38 ` [PATCH 18/59] KVM: arm64: nv: Trap EL1 VM register accesses in virtual EL2 Marc Zyngier
2019-07-01 16:12   ` Alexandru Elisei
2019-06-21  9:38 ` [PATCH 19/59] KVM: arm64: nv: Trap SPSR_EL1, ELR_EL1 and VBAR_EL1 from " Marc Zyngier
2019-06-21  9:38 ` [PATCH 20/59] KVM: arm64: nv: Trap CPACR_EL1 access in " Marc Zyngier
2019-07-01 16:40   ` Alexandru Elisei
2019-06-21  9:38 ` [PATCH 21/59] KVM: arm64: nv: Set a handler for the system instruction traps Marc Zyngier
2019-06-25 12:55   ` Julien Thierry
2019-07-03 14:15     ` Marc Zyngier
2019-06-21  9:38 ` [PATCH 22/59] KVM: arm64: nv: Handle PSCI call via smc from the guest Marc Zyngier
2019-06-21  9:38 ` [PATCH 23/59] KVM: arm64: nv: Respect virtual HCR_EL2.TWX setting Marc Zyngier
2019-06-25 14:19   ` Julien Thierry
2019-07-02 12:54     ` Alexandru Elisei
2019-07-03 14:18     ` Marc Zyngier
2019-06-21  9:38 ` [PATCH 24/59] KVM: arm64: nv: Respect virtual CPTR_EL2.TFP setting Marc Zyngier
2019-06-21  9:38 ` [PATCH 25/59] KVM: arm64: nv: Don't expose SVE to nested guests Marc Zyngier
2019-06-21  9:38 ` [PATCH 26/59] KVM: arm64: nv: Respect the virtual HCR_EL2.NV bit setting Marc Zyngier
2019-06-26  5:31   ` Julien Thierry
2019-07-03 16:31     ` Marc Zyngier
2019-06-21  9:38 ` [PATCH 27/59] KVM: arm64: nv: Respect virtual HCR_EL2.TVM and TRVM settings Marc Zyngier
2019-06-26  6:55   ` Julien Thierry
2019-07-04 14:57     ` Marc Zyngier
2019-06-21  9:38 ` [PATCH 28/59] KVM: arm64: nv: Respect the virtual HCR_EL2.NV1 bit setting Marc Zyngier
2019-06-26  7:23   ` Julien Thierry
2019-07-02 16:32   ` Alexandru Elisei
2019-07-03  9:10     ` Alexandru Elisei
2019-06-21  9:38 ` [PATCH 29/59] KVM: arm64: nv: Emulate EL12 register accesses from the virtual EL2 Marc Zyngier
2019-07-03  9:16   ` Alexandru Elisei
2019-06-21  9:38 ` [PATCH 30/59] KVM: arm64: nv: Configure HCR_EL2 for nested virtualization Marc Zyngier
2019-06-21  9:38 ` [PATCH 31/59] KVM: arm64: nv: Only toggle cache for virtual EL2 when SCTLR_EL2 changes Marc Zyngier
2019-06-21  9:38 ` [PATCH 32/59] KVM: arm64: nv: Hide RAS from nested guests Marc Zyngier
2019-07-03 13:59   ` Alexandru Elisei
2019-06-21  9:38 ` [PATCH 33/59] KVM: arm64: nv: Pretend we only support larger-than-host page sizes Marc Zyngier
2019-07-03 14:13   ` Alexandru Elisei
2019-06-21  9:38 ` [PATCH 34/59] KVM: arm/arm64: nv: Factor out stage 2 page table data from struct kvm Marc Zyngier
2019-07-03 15:52   ` Alexandru Elisei
2019-06-21  9:38 ` [PATCH 35/59] KVM: arm/arm64: nv: Support multiple nested stage 2 mmu structures Marc Zyngier
2019-06-25 12:19   ` Alexandru Elisei
2019-07-03 13:47     ` Marc Zyngier
2019-06-27 13:15   ` Julien Thierry
2019-07-04 15:51   ` Alexandru Elisei
2020-01-05 11:35     ` Marc Zyngier
2020-01-06 16:31       ` Alexandru Elisei
2019-06-21  9:38 ` [PATCH 36/59] KVM: arm64: nv: Implement nested Stage-2 page table walk logic Marc Zyngier
2019-06-21  9:38 ` [PATCH 37/59] KVM: arm64: nv: Handle shadow stage 2 page faults Marc Zyngier
2019-07-05 14:28   ` Alexandru Elisei
2019-06-21  9:38 ` [PATCH 38/59] KVM: arm64: nv: Unmap/flush shadow stage 2 page tables Marc Zyngier
2019-07-01  8:03   ` Julien Thierry
2019-06-21  9:38 ` [PATCH 39/59] KVM: arm64: nv: Move last_vcpu_ran to be per s2 mmu Marc Zyngier
2019-07-01  9:10   ` Julien Thierry
2019-07-05 15:28   ` Alexandru Elisei
2019-06-21  9:38 ` [PATCH 40/59] KVM: arm64: nv: Don't always start an S2 MMU search from the beginning Marc Zyngier
2019-07-09  9:59   ` Alexandru Elisei
2019-06-21  9:38 ` [PATCH 41/59] KVM: arm64: nv: Introduce sys_reg_desc.forward_trap Marc Zyngier
2019-06-21  9:38 ` [PATCH 42/59] KVM: arm64: nv: Rework the system instruction emulation framework Marc Zyngier
2019-06-21  9:38 ` [PATCH 43/59] KVM: arm64: nv: Trap and emulate AT instructions from virtual EL2 Marc Zyngier
2019-07-01 15:45   ` Julien Thierry
2019-07-09 13:20   ` Alexandru Elisei
2019-07-18 12:13     ` Tomasz Nowicki
     [not found]       ` <6537c8d2-3bda-788e-8861-b70971a625cb@arm.com>
2019-07-18 12:59         ` Tomasz Nowicki
2019-07-24 10:25   ` Tomasz Nowicki
2019-07-24 12:39     ` Marc Zyngier
2019-07-24 13:56       ` Tomasz Nowicki
2019-06-21  9:38 ` [PATCH 44/59] KVM: arm64: nv: Trap and emulate TLBI " Marc Zyngier
2019-07-02 12:37   ` Julien Thierry
2019-07-10 10:15   ` Alexandru Elisei
2019-06-21  9:38 ` [PATCH 45/59] KVM: arm64: nv: Handle traps for timer _EL02 and _EL2 sysregs accessors Marc Zyngier
2019-06-21  9:38 ` [PATCH 46/59] KVM: arm64: nv: arch_timer: Support hyp timer emulation Marc Zyngier
2019-07-10 16:23   ` Alexandru Elisei
2019-06-21  9:38 ` [PATCH 47/59] KVM: arm64: nv: Propagate CNTVOFF_EL2 to the virtual EL1 timer Marc Zyngier
2019-08-08  9:34   ` Alexandru Elisei
2019-06-21  9:38 ` [PATCH 48/59] KVM: arm64: nv: Load timer before the GIC Marc Zyngier
2019-07-11 13:17   ` Alexandru Elisei
2019-06-21  9:38 ` [PATCH 49/59] KVM: arm64: nv: vgic-v3: Take cpu_if pointer directly instead of vcpu Marc Zyngier
2019-06-21  9:38 ` [PATCH 50/59] KVM: arm64: nv: Nested GICv3 Support Marc Zyngier
2019-07-16 11:41   ` Alexandru Elisei
2019-06-21  9:38 ` [PATCH 51/59] KVM: arm64: nv: vgic: Emulate the HW bit in software Marc Zyngier
2019-06-21  9:38 ` [PATCH 52/59] KVM: arm64: nv: vgic: Allow userland to set VGIC maintenance IRQ Marc Zyngier
2019-07-04  7:38   ` Julien Thierry
2019-07-04  9:01     ` Andre Przywara
2019-07-04  9:04       ` Julien Thierry
2019-06-21  9:38 ` [PATCH 53/59] KVM: arm64: nv: Implement maintenance interrupt forwarding Marc Zyngier
2019-07-04  8:06   ` Julien Thierry
2019-07-16 16:35   ` Alexandru Elisei
2019-06-21  9:38 ` [PATCH 54/59] KVM: arm64: nv: Add nested GICv3 tracepoints Marc Zyngier
2019-06-21  9:38 ` [PATCH 55/59] arm64: KVM: nv: Add handling of EL2-specific timer registers Marc Zyngier
2019-07-11 12:35   ` Alexandru Elisei
2019-07-17 10:19   ` Alexandru Elisei
2019-06-21  9:38 ` [PATCH 56/59] arm64: KVM: nv: Honor SCTLR_EL2.SPAN on entering vEL2 Marc Zyngier
2019-06-21  9:38 ` [PATCH 57/59] arm64: KVM: nv: Handle SCTLR_EL2 RES0/RES1 bits Marc Zyngier
2019-06-21  9:38 ` [PATCH 58/59] arm64: KVM: nv: Restrict S2 RD/WR permissions to match the guest's Marc Zyngier
2019-06-21  9:38 ` [PATCH 59/59] arm64: KVM: nv: Allow userspace to request KVM_ARM_VCPU_NESTED_VIRT Marc Zyngier
     [not found] ` <CANW9uyssDm_0ysC_pnvhHRrnsmFZik+3_ENmFz7L2GCmtH09fw@mail.gmail.com>
2019-06-21 11:21   ` [PATCH 00/59] KVM: arm64: ARMv8.3 Nested Virtualization support Marc Zyngier
2019-08-02 10:11 ` Alexandru Elisei
2019-08-02 10:30   ` Andrew Jones
2019-08-09 10:01   ` Alexandru Elisei
2019-08-09 11:44     ` Andrew Jones
2019-08-09 12:00       ` Alexandru Elisei
2019-08-09 13:00         ` Andrew Jones
2019-08-22 11:57     ` Alexandru Elisei
2019-08-22 15:32       ` Alexandru Elisei

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).