* [PATCH v6 00/64] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support
From: Marc Zyngier @ 2022-01-28 12:18 UTC
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Jintack Lim, Haibo Xu,
	Ganapatrao Kulkarni, Chase Conklin, Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

Here is the first drop of the KVM/arm64 NV support code for 2022. Nothing
to worry about, it certainly isn't going to be the last!

A number of changes since [1]:

- The exposure of the EL2 sysregs to userspace is now gated by NV
  being enabled, as you'd expect. Which means we shouldn't break live
  migration anymore (yay!).

- We now correctly detect and handle an Illegal Exception Return from
  vEL2. Don't try this at home, kids!

- Non-nested exception injection when executing at EL0 has been fixed.

- We forbid NV+SVE for now. This is a change in ABI, and I hope to
  remove this requirement at some point. I've pushed out a kvmtool
  change that enforces this [2], and QEMU will need similar surgery
  for now (a rough sketch of the userspace side follows this list).

- nested_virt_in_use() is now renamed to vcpu_has_nv() (resp.
  vcpu_has_nv2() for the v8.4 support).

- A bunch of small nits have been tidied up, thanks to our eagle-eyed
  reviewers.
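
As an illustration of what the NV+SVE restriction means for a VMM, here
is a minimal sketch of the userspace side (not the actual kvmtool code,
and assuming uapi headers that already carry the KVM_ARM_VCPU_HAS_EL2
bit introduced later in this series):

  #include <stdbool.h>
  #include <errno.h>
  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  /* Hypothetical helper: refuse NV+SVE, then init the vCPU with NV. */
  static int vcpu_init_with_nv(int vcpu_fd, struct kvm_vcpu_init *init,
                               bool want_nv, bool want_sve)
  {
          if (want_nv && want_sve) {
                  errno = EINVAL;         /* NV+SVE is forbidden for now */
                  return -1;
          }

          if (want_nv)
                  init->features[0] |= 1U << KVM_ARM_VCPU_HAS_EL2;
          if (want_sve)
                  init->features[0] |= 1U << KVM_ARM_VCPU_SVE;

          return ioctl(vcpu_fd, KVM_ARM_VCPU_INIT, init);
  }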

Many thanks to Alexandru, Chase, Ganapatrao and Russell for spending a
lot of time reviewing this and actively spotting issues. There is a
pending issue that Ganapatrao mentioned when running L0 with 4kB pages
and L1 with 64kB pages, resulting in no forward progress. I haven't
investigated this yet, but I strongly suspect that something is amiss in
the S2 PTW
at the point where we combine the L2 IPA with the L1 IPA.

As usual, blame me for any bug, and nobody else. It has been tested on
my usual zoo, and nothing caught fire. Which means nothing, of course.
Obviously, it isn't feature complete, and it is quite easy to write a
guest that will not behave as intended. The current goal is to make
sure that non-NV KVM is unaffected by the NV stuff.

It is massively painful to run on the FVP, but if you have a Neoverse
V1 or N2 system (or anything else with ARMv8.4-NV) that is collecting
dust, I have the right stuff to keep it busy.

	M.

[1] https://lore.kernel.org/r/20211129200150.351436-1-maz@kernel.org
[2] https://git.kernel.org/pub/scm/linux/kernel/git/maz/kvmtool.git/log/?h=arm64/nv-5.16

Andre Przywara (1):
  KVM: arm64: nv: vgic: Allow userland to set VGIC maintenance IRQ

Christoffer Dall (15):
  KVM: arm64: nv: Introduce nested virtualization VCPU feature
  KVM: arm64: nv: Reset VCPU to EL2 registers if VCPU nested virt is set
  KVM: arm64: nv: Allow userspace to set PSR_MODE_EL2x
  KVM: arm64: nv: Add nested virt VCPU primitives for vEL2 VCPU state
  KVM: arm64: nv: Reset VMPIDR_EL2 and VPIDR_EL2 to sane values
  KVM: arm64: nv: Handle trapped ERET from virtual EL2
  KVM: arm64: nv: Emulate PSTATE.M for a guest hypervisor
  KVM: arm64: nv: Trap EL1 VM register accesses in virtual EL2
  KVM: arm64: nv: Only toggle cache for virtual EL2 when SCTLR_EL2
    changes
  KVM: arm64: nv: Implement nested Stage-2 page table walk logic
  KVM: arm64: nv: Unmap/flush shadow stage 2 page tables
  KVM: arm64: nv: arch_timer: Support hyp timer emulation
  KVM: arm64: nv: vgic: Emulate the HW bit in software
  KVM: arm64: nv: Add nested GICv3 tracepoints
  KVM: arm64: nv: Sync nested timer state with ARMv8.4

Jintack Lim (18):
  arm64: Add ARM64_HAS_NESTED_VIRT cpufeature
  KVM: arm64: nv: Handle HCR_EL2.NV system register traps
  KVM: arm64: nv: Support virtual EL2 exceptions
  KVM: arm64: nv: Inject HVC exceptions to the virtual EL2
  KVM: arm64: nv: Trap SPSR_EL1, ELR_EL1 and VBAR_EL1 from virtual EL2
  KVM: arm64: nv: Trap CPACR_EL1 access in virtual EL2
  KVM: arm64: nv: Handle PSCI call via smc from the guest
  KVM: arm64: nv: Respect virtual HCR_EL2.TWX setting
  KVM: arm64: nv: Respect virtual CPTR_EL2.{TFP,FPEN} settings
  KVM: arm64: nv: Respect the virtual HCR_EL2.NV bit setting
  KVM: arm64: nv: Respect virtual HCR_EL2.TVM and TRVM settings
  KVM: arm64: nv: Respect the virtual HCR_EL2.NV1 bit setting
  KVM: arm64: nv: Emulate EL12 register accesses from the virtual EL2
  KVM: arm64: nv: Configure HCR_EL2 for nested virtualization
  KVM: arm64: nv: Set a handler for the system instruction traps
  KVM: arm64: nv: Trap and emulate AT instructions from virtual EL2
  KVM: arm64: nv: Trap and emulate TLBI instructions from virtual EL2
  KVM: arm64: nv: Nested GICv3 Support

Marc Zyngier (30):
  KVM: arm64: nv: Add EL2 system registers to vcpu context
  KVM: arm64: nv: Add non-VHE-EL2->EL1 translation helpers
  KVM: arm64: nv: Handle virtual EL2 registers in
    vcpu_read/write_sys_reg()
  KVM: arm64: nv: Handle SPSR_EL2 specially
  KVM: arm64: nv: Handle HCR_EL2.E2H specially
  KVM: arm64: nv: Save/Restore vEL2 sysregs
  KVM: arm64: nv: Allow a sysreg to be hidden from userspace only
  KVM: arm64: nv: Forward debug traps to the nested guest
  KVM: arm64: nv: Filter out unsupported features from ID regs
  KVM: arm64: nv: Hide RAS from nested guests
  KVM: arm64: nv: Support multiple nested Stage-2 mmu structures
  KVM: arm64: nv: Handle shadow stage 2 page faults
  KVM: arm64: nv: Restrict S2 RD/WR permissions to match the guest's
  KVM: arm64: nv: Fold guest's HCR_EL2 configuration into the host's
  KVM: arm64: nv: Add handling of EL2-specific timer registers
  KVM: arm64: nv: Load timer before the GIC
  KVM: arm64: nv: Don't load the GICv4 context on entering a nested
    guest
  KVM: arm64: nv: Implement maintenance interrupt forwarding
  KVM: arm64: nv: Allow userspace to request KVM_ARM_VCPU_NESTED_VIRT
  KVM: arm64: nv: Add handling of ARMv8.4-TTL TLB invalidation
  KVM: arm64: nv: Invalidate TLBs based on shadow S2 TTL-like
    information
  KVM: arm64: nv: Tag shadow S2 entries with nested level
  KVM: arm64: nv: Add include containing the VNCR_EL2 offsets
  KVM: arm64: nv: Map VNCR-capable registers to a separate page
  KVM: arm64: nv: Move nested vgic state into the sysreg file
  KVM: arm64: Add ARMv8.4 Enhanced Nested Virt cpufeature
  KVM: arm64: nv: Allocate VNCR page when required
  KVM: arm64: nv: Enable ARMv8.4-NV support
  KVM: arm64: nv: Fast-track 'InHost' exception returns
  KVM: arm64: nv: Fast-track EL1 TLBIs for VHE guests

 .../admin-guide/kernel-parameters.txt         |    7 +-
 .../virt/kvm/devices/arm-vgic-v3.rst          |   12 +-
 arch/arm64/include/asm/esr.h                  |    5 +
 arch/arm64/include/asm/kvm_arm.h              |   27 +-
 arch/arm64/include/asm/kvm_asm.h              |    4 +
 arch/arm64/include/asm/kvm_emulate.h          |  143 +-
 arch/arm64/include/asm/kvm_host.h             |  185 ++-
 arch/arm64/include/asm/kvm_hyp.h              |    2 +
 arch/arm64/include/asm/kvm_mmu.h              |   18 +-
 arch/arm64/include/asm/kvm_nested.h           |  156 ++
 arch/arm64/include/asm/sysreg.h               |  101 +-
 arch/arm64/include/asm/vncr_mapping.h         |   74 +
 arch/arm64/include/uapi/asm/kvm.h             |    2 +
 arch/arm64/kernel/cpufeature.c                |   34 +
 arch/arm64/kvm/Makefile                       |    4 +-
 arch/arm64/kvm/arch_timer.c                   |  202 ++-
 arch/arm64/kvm/arm.c                          |   42 +-
 arch/arm64/kvm/at.c                           |  219 +++
 arch/arm64/kvm/emulate-nested.c               |  211 +++
 arch/arm64/kvm/guest.c                        |    6 +
 arch/arm64/kvm/handle_exit.c                  |   81 +-
 arch/arm64/kvm/hyp/exception.c                |   49 +-
 arch/arm64/kvm/hyp/include/hyp/switch.h       |   12 +-
 arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h    |   24 +-
 arch/arm64/kvm/hyp/nvhe/switch.c              |    2 +-
 arch/arm64/kvm/hyp/nvhe/sysreg-sr.c           |    2 +-
 arch/arm64/kvm/hyp/vgic-v3-sr.c               |    2 +-
 arch/arm64/kvm/hyp/vhe/switch.c               |  181 ++-
 arch/arm64/kvm/hyp/vhe/sysreg-sr.c            |  125 +-
 arch/arm64/kvm/hyp/vhe/tlb.c                  |   83 ++
 arch/arm64/kvm/inject_fault.c                 |   68 +-
 arch/arm64/kvm/mmu.c                          |  206 ++-
 arch/arm64/kvm/nested.c                       |  915 ++++++++++++
 arch/arm64/kvm/reset.c                        |   30 +-
 arch/arm64/kvm/sys_regs.c                     | 1252 ++++++++++++++++-
 arch/arm64/kvm/sys_regs.h                     |   14 +-
 arch/arm64/kvm/trace_arm.h                    |   65 +-
 arch/arm64/kvm/vgic/vgic-init.c               |   30 +
 arch/arm64/kvm/vgic/vgic-kvm-device.c         |   22 +
 arch/arm64/kvm/vgic/vgic-nested-trace.h       |  137 ++
 arch/arm64/kvm/vgic/vgic-v3-nested.c          |  242 ++++
 arch/arm64/kvm/vgic/vgic-v3.c                 |   39 +-
 arch/arm64/kvm/vgic/vgic.c                    |   44 +
 arch/arm64/kvm/vgic/vgic.h                    |   10 +
 arch/arm64/tools/cpucaps                      |    2 +
 include/kvm/arm_arch_timer.h                  |    9 +-
 include/kvm/arm_vgic.h                        |   16 +
 include/uapi/linux/kvm.h                      |    1 +
 tools/arch/arm/include/uapi/asm/kvm.h         |    1 +
 49 files changed, 4947 insertions(+), 171 deletions(-)
 create mode 100644 arch/arm64/include/asm/kvm_nested.h
 create mode 100644 arch/arm64/include/asm/vncr_mapping.h
 create mode 100644 arch/arm64/kvm/at.c
 create mode 100644 arch/arm64/kvm/emulate-nested.c
 create mode 100644 arch/arm64/kvm/nested.c
 create mode 100644 arch/arm64/kvm/vgic/vgic-nested-trace.h
 create mode 100644 arch/arm64/kvm/vgic/vgic-v3-nested.c

-- 
2.30.2


* [PATCH v6 01/64] arm64: Add ARM64_HAS_NESTED_VIRT cpufeature
From: Marc Zyngier @ 2022-01-28 12:18 UTC
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Jintack Lim, Haibo Xu,
	Ganapatrao Kulkarni, Chase Conklin, Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

From: Jintack Lim <jintack.lim@linaro.org>

Add a new ARM64_HAS_NESTED_VIRT feature to indicate that the
CPU has the ARMv8.3 nested virtualization capability, together
with the 'kvm-arm.mode=nested' command line option.

This will be used to support nested virtualization in KVM.
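
For example (an illustrative command line only), a VHE-capable ARMv8.3+
host opts into the nested mode by booting with:

	kvm-arm.mode=nested

The capability below then only matches when this mode was requested and
ID_AA64MMFR2_EL1.NV is at least 1.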

Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
[maz: moved the command-line option to kvm-arm.mode]
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 .../admin-guide/kernel-parameters.txt         |  7 +++++-
 arch/arm64/include/asm/kvm_host.h             |  5 ++++
 arch/arm64/kernel/cpufeature.c                | 24 +++++++++++++++++++
 arch/arm64/kvm/arm.c                          |  5 ++++
 arch/arm64/tools/cpucaps                      |  1 +
 5 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index f5a27f067db9..466d8fdbaee2 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2391,9 +2391,14 @@
 				   state is kept private from the host.
 				   Not valid if the kernel is running in EL2.
 
+			nested: VHE-based mode with support for nested
+				virtualization. Requires at least ARMv8.3
+				hardware.
+
 			Defaults to VHE/nVHE based on hardware support. Setting
 			mode to "protected" will disable kexec and hibernation
-			for the host.
+			for the host. "nested" is experimental and should be
+			used with extreme caution.
 
 	kvm-arm.vgic_v3_group0_trap=
 			[KVM,ARM] Trap guest accesses to GICv3 group-0
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 5bc01e62c08a..115e0e2caf9a 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -57,9 +57,14 @@
 enum kvm_mode {
 	KVM_MODE_DEFAULT,
 	KVM_MODE_PROTECTED,
+	KVM_MODE_NV,
 	KVM_MODE_NONE,
 };
+#ifdef CONFIG_KVM
 enum kvm_mode kvm_get_mode(void);
+#else
+static inline enum kvm_mode kvm_get_mode(void) { return KVM_MODE_NONE; };
+#endif
 
 DECLARE_STATIC_KEY_FALSE(userspace_irqchip_in_use);
 
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index a46ab3b1c4d5..2fa39ce108d0 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -1772,6 +1772,20 @@ static void cpu_copy_el2regs(const struct arm64_cpu_capabilities *__unused)
 		write_sysreg(read_sysreg(tpidr_el1), tpidr_el2);
 }
 
+static bool has_nested_virt_support(const struct arm64_cpu_capabilities *cap,
+				    int scope)
+{
+	if (kvm_get_mode() != KVM_MODE_NV)
+		return false;
+
+	if (!has_cpuid_feature(cap, scope)) {
+		pr_warn("unavailable: %s\n", cap->desc);
+		return false;
+	}
+
+	return true;
+}
+
 static void cpu_has_fwb(const struct arm64_cpu_capabilities *__unused)
 {
 	u64 val = read_sysreg_s(SYS_CLIDR_EL1);
@@ -2005,6 +2019,16 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
 		.matches = runs_at_el2,
 		.cpu_enable = cpu_copy_el2regs,
 	},
+	{
+		.desc = "Nested Virtualization Support",
+		.capability = ARM64_HAS_NESTED_VIRT,
+		.type = ARM64_CPUCAP_SYSTEM_FEATURE,
+		.matches = has_nested_virt_support,
+		.sys_reg = SYS_ID_AA64MMFR2_EL1,
+		.sign = FTR_UNSIGNED,
+		.field_pos = ID_AA64MMFR2_NV_SHIFT,
+		.min_field_value = 1,
+	},
 	{
 		.capability = ARM64_HAS_32BIT_EL0_DO_NOT_USE,
 		.type = ARM64_CPUCAP_SYSTEM_FEATURE,
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index a4a0063df456..06ca11e90482 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -2209,6 +2209,11 @@ static int __init early_kvm_mode_cfg(char *arg)
 		return 0;
 	}
 
+	if (strcmp(arg, "nested") == 0 && !WARN_ON(!is_kernel_in_hyp_mode())) {
+		kvm_mode = KVM_MODE_NV;
+		return 0;
+	}
+
 	if (strcmp(arg, "none") == 0) {
 		kvm_mode = KVM_MODE_NONE;
 		return 0;
diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps
index 870c39537dd0..a49864b56a07 100644
--- a/arch/arm64/tools/cpucaps
+++ b/arch/arm64/tools/cpucaps
@@ -26,6 +26,7 @@ HAS_GENERIC_AUTH_IMP_DEF
 HAS_IRQ_PRIO_MASKING
 HAS_LDAPR
 HAS_LSE_ATOMICS
+HAS_NESTED_VIRT
 HAS_NO_FPSIMD
 HAS_NO_HW_PREFETCH
 HAS_PAN
-- 
2.30.2


* [PATCH v6 02/64] KVM: arm64: nv: Introduce nested virtualization VCPU feature
From: Marc Zyngier @ 2022-01-28 12:18 UTC
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Jintack Lim, Haibo Xu,
	Ganapatrao Kulkarni, Chase Conklin, Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

From: Christoffer Dall <christoffer.dall@arm.com>

Introduce the feature bit, and a primitive that checks whether the
feature is set, behind a static key check based on the
cpus_have_const_cap check.

Checking vcpu_has_nv() on systems without nested virt enabled
should have negligible overhead.

We don't yet allow userspace to actually set this feature.
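
As a minimal sketch (hypothetical caller, not part of this patch) of how
later patches in the series use the helper to gate vEL2-only work:

	#include <asm/kvm_nested.h>

	static void example_handle_vel2_state(struct kvm_vcpu *vcpu)
	{
		/*
		 * Resolves to a final-cap check plus a feature bit test,
		 * so this stays cheap on systems without nested virt.
		 */
		if (!vcpu_has_nv(vcpu))
			return;

		/* vEL2 state exists for this vCPU; EL2 emulation goes here */
	}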

Reviewed-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_nested.h | 14 ++++++++++++++
 arch/arm64/include/uapi/asm/kvm.h   |  1 +
 2 files changed, 15 insertions(+)
 create mode 100644 arch/arm64/include/asm/kvm_nested.h

diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
new file mode 100644
index 000000000000..fd601ea68d13
--- /dev/null
+++ b/arch/arm64/include/asm/kvm_nested.h
@@ -0,0 +1,14 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __ARM64_KVM_NESTED_H
+#define __ARM64_KVM_NESTED_H
+
+#include <linux/kvm_host.h>
+
+static inline bool vcpu_has_nv(const struct kvm_vcpu *vcpu)
+{
+	return (!__is_defined(__KVM_NVHE_HYPERVISOR__) &&
+		cpus_have_final_cap(ARM64_HAS_NESTED_VIRT) &&
+		test_bit(KVM_ARM_VCPU_HAS_EL2, vcpu->arch.features));
+}
+
+#endif /* __ARM64_KVM_NESTED_H */
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index b3edde68bc3e..395a4c039bcc 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -106,6 +106,7 @@ struct kvm_regs {
 #define KVM_ARM_VCPU_SVE		4 /* enable SVE for this CPU */
 #define KVM_ARM_VCPU_PTRAUTH_ADDRESS	5 /* VCPU uses address authentication */
 #define KVM_ARM_VCPU_PTRAUTH_GENERIC	6 /* VCPU uses generic authentication */
+#define KVM_ARM_VCPU_HAS_EL2		7 /* Support nested virtualization */
 
 struct kvm_vcpu_init {
 	__u32 target;
-- 
2.30.2


* [PATCH v6 03/64] KVM: arm64: nv: Reset VCPU to EL2 registers if VCPU nested virt is set
From: Marc Zyngier @ 2022-01-28 12:18 UTC
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Jintack Lim, Haibo Xu,
	Ganapatrao Kulkarni, Chase Conklin, Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

From: Christoffer Dall <christoffer.dall@arm.com>

Reset the VCPU with PSTATE.M = EL2h when the nested virtualization
feature is enabled on the VCPU.
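
For reference, a minimal sketch of the resulting reset value (assuming
the standard uapi PSTATE bit encodings; the macro name is illustrative):

	/* PSR_MODE_EL2h | PSR_D_BIT | PSR_A_BIT | PSR_I_BIT | PSR_F_BIT */
	#define EXAMPLE_RESET_PSTATE_EL2	0x3c9

i.e. EL2h with all of DAIF masked, mirroring the existing EL1 reset
value (0x3c5) with only PSTATE.M changed from EL1h to EL2h.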

Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
[maz: rework register reset not to use empty data structures]
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/reset.c | 29 ++++++++++++++++++++++++-----
 1 file changed, 24 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index ecc40c8cd6f6..d19a9aad2d85 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -27,6 +27,7 @@
 #include <asm/kvm_asm.h>
 #include <asm/kvm_emulate.h>
 #include <asm/kvm_mmu.h>
+#include <asm/kvm_nested.h>
 #include <asm/virt.h>
 
 /* Maximum phys_shift supported for any VM on this host */
@@ -38,6 +39,9 @@ static u32 kvm_ipa_limit;
 #define VCPU_RESET_PSTATE_EL1	(PSR_MODE_EL1h | PSR_A_BIT | PSR_I_BIT | \
 				 PSR_F_BIT | PSR_D_BIT)
 
+#define VCPU_RESET_PSTATE_EL2	(PSR_MODE_EL2h | PSR_A_BIT | PSR_I_BIT | \
+				 PSR_F_BIT | PSR_D_BIT)
+
 #define VCPU_RESET_PSTATE_SVC	(PSR_AA32_MODE_SVC | PSR_AA32_A_BIT | \
 				 PSR_AA32_I_BIT | PSR_AA32_F_BIT)
 
@@ -188,12 +192,19 @@ static bool vcpu_allowed_register_width(struct kvm_vcpu *vcpu)
 	unsigned long i;
 
 	is32bit = vcpu_has_feature(vcpu, KVM_ARM_VCPU_EL1_32BIT);
-	if (!cpus_have_const_cap(ARM64_HAS_32BIT_EL1) && is32bit)
-		return false;
+	if (is32bit) {
+		/* The HW must obviously support AArch32 at EL1 */
+		if (!cpus_have_const_cap(ARM64_HAS_32BIT_EL1))
+			return false;
 
-	/* MTE is incompatible with AArch32 */
-	if (kvm_has_mte(vcpu->kvm) && is32bit)
-		return false;
+		/* MTE is incompatible with AArch32 */
+		if (kvm_has_mte(vcpu->kvm))
+			return false;
+
+		/* NV is incompatible with AArch32 */
+		if (vcpu_has_nv(vcpu))
+			return false;
+	}
 
 	/* Check that the vcpus are either all 32bit or all 64bit */
 	kvm_for_each_vcpu(i, tmp, vcpu->kvm) {
@@ -265,10 +276,18 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
 		goto out;
 	}
 
+	if (vcpu_has_nv(vcpu) &&
+	    vcpu_has_feature(vcpu, KVM_ARM_VCPU_SVE)) {
+		ret = -EINVAL;
+		goto out;
+	}
+
 	switch (vcpu->arch.target) {
 	default:
 		if (test_bit(KVM_ARM_VCPU_EL1_32BIT, vcpu->arch.features)) {
 			pstate = VCPU_RESET_PSTATE_SVC;
+		} else if (vcpu_has_nv(vcpu)) {
+			pstate = VCPU_RESET_PSTATE_EL2;
 		} else {
 			pstate = VCPU_RESET_PSTATE_EL1;
 		}
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 378+ messages in thread

* [PATCH v6 04/64] KVM: arm64: nv: Allow userspace to set PSR_MODE_EL2x
  2022-01-28 12:18 ` Marc Zyngier
  (?)
@ 2022-01-28 12:18   ` Marc Zyngier
  -1 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:18 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Jintack Lim, Haibo Xu,
	Ganapatrao Kulkarni, Chase Conklin, Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

From: Christoffer Dall <christoffer.dall@linaro.org>

We were not allowing userspace to set a more privileged mode for the VCPU
than EL1, but we should allow this when nested virtualization is enabled
for the VCPU.

Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/guest.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index e116c7767730..db6209622be9 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -24,6 +24,7 @@
 #include <asm/fpsimd.h>
 #include <asm/kvm.h>
 #include <asm/kvm_emulate.h>
+#include <asm/kvm_nested.h>
 #include <asm/sigcontext.h>
 
 #include "trace.h"
@@ -259,6 +260,11 @@ static int set_core_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
 			if (vcpu_el1_is_32bit(vcpu))
 				return -EINVAL;
 			break;
+		case PSR_MODE_EL2h:
+		case PSR_MODE_EL2t:
+			if (vcpu_el1_is_32bit(vcpu) || !vcpu_has_nv(vcpu))
+				return -EINVAL;
+			break;
 		default:
 			err = -EINVAL;
 			goto out;
-- 
2.30.2
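
For completeness, the userspace side that this change makes legal: a
VMM can now write an EL2 mode into the core pstate pseudo-register. A
hedged sketch following the usual core-register id encoding; it is not
taken from any existing VMM.

/* Hedged sketch: ask KVM to run the vCPU in vEL2h */
#include <linux/kvm.h>
#include <asm/kvm.h>		/* KVM_REG_ARM_CORE_REG(), struct kvm_regs */
#include <asm/ptrace.h>		/* PSR_MODE_EL2h and the PSR_*_BIT flags */
#include <sys/ioctl.h>

static int set_vcpu_mode_el2h(int vcpu_fd)
{
	__u64 pstate = PSR_MODE_EL2h | PSR_A_BIT | PSR_I_BIT |
		       PSR_F_BIT | PSR_D_BIT;
	struct kvm_one_reg reg = {
		.id	= KVM_REG_ARM64 | KVM_REG_SIZE_U64 |
			  KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(regs.pstate),
		.addr	= (__u64)&pstate,
	};

	/* Fails with -EINVAL unless the vCPU has the NV feature */
	return ioctl(vcpu_fd, KVM_SET_ONE_REG, &reg);
}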


^ permalink raw reply related	[flat|nested] 378+ messages in thread

* [PATCH v6 05/64] KVM: arm64: nv: Add EL2 system registers to vcpu context
  2022-01-28 12:18 ` Marc Zyngier
  (?)
@ 2022-01-28 12:18   ` Marc Zyngier
  -1 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:18 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Jintack Lim, Haibo Xu,
	Ganapatrao Kulkarni, Chase Conklin, Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

Add the minimal set of EL2 system registers to the vcpu context.
Nothing uses them just yet.

Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_host.h | 33 ++++++++++++++++++++++++++++++-
 1 file changed, 32 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 115e0e2caf9a..15f690c27baf 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -220,12 +220,43 @@ enum vcpu_sysreg {
 	TFSR_EL1,	/* Tag Fault Status Register (EL1) */
 	TFSRE0_EL1,	/* Tag Fault Status Register (EL0) */
 
-	/* 32bit specific registers. Keep them at the end of the range */
+	/* 32bit specific registers. */
 	DACR32_EL2,	/* Domain Access Control Register */
 	IFSR32_EL2,	/* Instruction Fault Status Register */
 	FPEXC32_EL2,	/* Floating-Point Exception Control Register */
 	DBGVCR32_EL2,	/* Debug Vector Catch Register */
 
+	/* EL2 registers */
+	VPIDR_EL2,	/* Virtualization Processor ID Register */
+	VMPIDR_EL2,	/* Virtualization Multiprocessor ID Register */
+	SCTLR_EL2,	/* System Control Register (EL2) */
+	ACTLR_EL2,	/* Auxiliary Control Register (EL2) */
+	HCR_EL2,	/* Hypervisor Configuration Register */
+	MDCR_EL2,	/* Monitor Debug Configuration Register (EL2) */
+	CPTR_EL2,	/* Architectural Feature Trap Register (EL2) */
+	HSTR_EL2,	/* Hypervisor System Trap Register */
+	HACR_EL2,	/* Hypervisor Auxiliary Control Register */
+	TTBR0_EL2,	/* Translation Table Base Register 0 (EL2) */
+	TTBR1_EL2,	/* Translation Table Base Register 1 (EL2) */
+	TCR_EL2,	/* Translation Control Register (EL2) */
+	VTTBR_EL2,	/* Virtualization Translation Table Base Register */
+	VTCR_EL2,	/* Virtualization Translation Control Register */
+	SPSR_EL2,	/* EL2 saved program status register */
+	ELR_EL2,	/* EL2 exception link register */
+	AFSR0_EL2,	/* Auxiliary Fault Status Register 0 (EL2) */
+	AFSR1_EL2,	/* Auxiliary Fault Status Register 1 (EL2) */
+	ESR_EL2,	/* Exception Syndrome Register (EL2) */
+	FAR_EL2,	/* Hypervisor IPA Fault Address Register */
+	HPFAR_EL2,	/* Hypervisor IPA Fault Address Register */
+	MAIR_EL2,	/* Memory Attribute Indirection Register (EL2) */
+	AMAIR_EL2,	/* Auxiliary Memory Attribute Indirection Register (EL2) */
+	VBAR_EL2,	/* Vector Base Address Register (EL2) */
+	RVBAR_EL2,	/* Reset Vector Base Address Register */
+	CONTEXTIDR_EL2,	/* Context ID Register (EL2) */
+	TPIDR_EL2,	/* EL2 Software Thread ID Register */
+	CNTHCTL_EL2,	/* Counter-timer Hypervisor Control register */
+	SP_EL2,		/* EL2 Stack Pointer */
+
 	NR_SYS_REGS	/* Nothing after this line! */
 };
 
-- 
2.30.2
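
The new enum entries are plain indices into the vCPU's sys_regs[]
array, so the rest of the series can manipulate the virtual EL2 state
with the accessors KVM already has. A hedged sketch, assuming the usual
kvm_host.h context; the values are examples, not anything this patch
sets:

/* Hedged sketch: the vEL2 state is reached like any other sysreg */
static void demo_el2_sysreg_access(struct kvm_vcpu *vcpu)
{
	/* Example only: pretend the guest hypervisor runs AArch64 EL1 */
	__vcpu_sys_reg(vcpu, HCR_EL2) = HCR_RW;

	/* Read back the guest hypervisor's VTCR_EL2 */
	u64 vtcr = __vcpu_sys_reg(vcpu, VTCR_EL2);

	(void)vtcr;
}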


^ permalink raw reply related	[flat|nested] 378+ messages in thread

* [PATCH v6 06/64] KVM: arm64: nv: Add nested virt VCPU primitives for vEL2 VCPU state
  2022-01-28 12:18 ` Marc Zyngier
  (?)
@ 2022-01-28 12:18   ` Marc Zyngier
  -1 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:18 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Jintack Lim, Haibo Xu,
	Ganapatrao Kulkarni, Chase Conklin, Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

From: Christoffer Dall <christoffer.dall@arm.com>

When running a nested hypervisor, we commonly have to figure out
whether the VCPU is running in the context of a guest hypervisor, of a
guest's guest, or just of a normal guest.

Add convenient primitives for this.

Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_emulate.h | 53 ++++++++++++++++++++++++++++
 1 file changed, 53 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index d62405ce3e6d..ea9a130c4b6a 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -178,6 +178,59 @@ static __always_inline void vcpu_set_reg(struct kvm_vcpu *vcpu, u8 reg_num,
 		vcpu_gp_regs(vcpu)->regs[reg_num] = val;
 }
 
+static inline bool vcpu_is_el2_ctxt(const struct kvm_cpu_context *ctxt)
+{
+	switch (ctxt->regs.pstate & (PSR_MODE32_BIT | PSR_MODE_MASK)) {
+	case PSR_MODE_EL2h:
+	case PSR_MODE_EL2t:
+		return true;
+	default:
+		return false;
+	}
+}
+
+static inline bool vcpu_is_el2(const struct kvm_vcpu *vcpu)
+{
+	return vcpu_is_el2_ctxt(&vcpu->arch.ctxt);
+}
+
+static inline bool __vcpu_el2_e2h_is_set(const struct kvm_cpu_context *ctxt)
+{
+	return ctxt_sys_reg(ctxt, HCR_EL2) & HCR_E2H;
+}
+
+static inline bool vcpu_el2_e2h_is_set(const struct kvm_vcpu *vcpu)
+{
+	return __vcpu_el2_e2h_is_set(&vcpu->arch.ctxt);
+}
+
+static inline bool __vcpu_el2_tge_is_set(const struct kvm_cpu_context *ctxt)
+{
+	return ctxt_sys_reg(ctxt, HCR_EL2) & HCR_TGE;
+}
+
+static inline bool vcpu_el2_tge_is_set(const struct kvm_vcpu *vcpu)
+{
+	return __vcpu_el2_tge_is_set(&vcpu->arch.ctxt);
+}
+
+static inline bool __is_hyp_ctxt(const struct kvm_cpu_context *ctxt)
+{
+	/*
+	 * We are in a hypervisor context if the vcpu mode is EL2 or
+	 * E2H and TGE bits are set. The latter means we are in the user space
+	 * of the VHE kernel. ARMv8.1 ARM describes this as 'InHost'
+	 */
+	return vcpu_is_el2_ctxt(ctxt) ||
+		(__vcpu_el2_e2h_is_set(ctxt) && __vcpu_el2_tge_is_set(ctxt)) ||
+		WARN_ON(__vcpu_el2_tge_is_set(ctxt));
+}
+
+static inline bool is_hyp_ctxt(const struct kvm_vcpu *vcpu)
+{
+	return __is_hyp_ctxt(&vcpu->arch.ctxt);
+}
+
 /*
  * The layout of SPSR for an AArch32 state is different when observed from an
  * AArch64 SPSR_ELx or an AArch32 SPSR_*. This function generates the AArch32
-- 
2.30.2
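
A hedged sketch of how later patches can consume these primitives when
routing a trap; the handler names are hypothetical and only illustrate
the three cases the comment above describes:

/* Hedged sketch: branching on the vEL2 primitives in a trap handler */
static int demo_route_trap(struct kvm_vcpu *vcpu)
{
	if (vcpu_is_el2(vcpu)) {
		/* Taken while the guest hypervisor ran at vEL2 */
		return handle_guest_hyp_trap(vcpu);	/* hypothetical */
	}

	if (is_hyp_ctxt(vcpu)) {
		/* vEL0 with HCR_EL2.{E2H,TGE} set: the 'InHost' case */
		return handle_vhe_host_trap(vcpu);	/* hypothetical */
	}

	/* Ordinary guest (or the guest hypervisor's own guest) */
	return handle_guest_trap(vcpu);			/* hypothetical */
}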


^ permalink raw reply related	[flat|nested] 378+ messages in thread

* [PATCH v6 07/64] KVM: arm64: nv: Handle HCR_EL2.NV system register traps
  2022-01-28 12:18 ` Marc Zyngier
  (?)
@ 2022-01-28 12:18   ` Marc Zyngier
  -1 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:18 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Jintack Lim, Haibo Xu,
	Ganapatrao Kulkarni, Chase Conklin, Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

From: Jintack Lim <jintack.lim@linaro.org>

ARMv8.3 introduces a new bit in HCR_EL2, the NV bit. When this bit is
set, accessing EL2 registers from EL1 traps to EL2. In addition,
executing the following instructions at EL1 traps to EL2: tlbi, at,
eret, and msr/mrs instructions accessing SP_EL1. Most of the
instructions that trap to EL2 when the NV bit is set were already
undefined at EL1 prior to ARMv8.3; the only exception is eret.

This patch sets up a handler for EL2 registers and SP_EL1 register
accesses at EL1. The host hypervisor keeps those register values in
memory, and will emulate their behavior.

This patch doesn't set the NV bit yet. It will be set in a later patch
once nested virtualization support is completed.

Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
[maz: EL2_REG() macros]
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/sysreg.h | 41 +++++++++++++-
 arch/arm64/kvm/sys_regs.c       | 99 +++++++++++++++++++++++++++++++--
 2 files changed, 134 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 898bee0004ae..61c990e5591a 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -532,10 +532,26 @@
 
 #define SYS_PMCCFILTR_EL0		sys_reg(3, 3, 14, 15, 7)
 
+#define SYS_VPIDR_EL2			sys_reg(3, 4, 0, 0, 0)
+#define SYS_VMPIDR_EL2			sys_reg(3, 4, 0, 0, 5)
+
 #define SYS_SCTLR_EL2			sys_reg(3, 4, 1, 0, 0)
+#define SYS_ACTLR_EL2			sys_reg(3, 4, 1, 0, 1)
+#define SYS_HCR_EL2			sys_reg(3, 4, 1, 1, 0)
+#define SYS_MDCR_EL2			sys_reg(3, 4, 1, 1, 1)
+#define SYS_CPTR_EL2			sys_reg(3, 4, 1, 1, 2)
+#define SYS_HSTR_EL2			sys_reg(3, 4, 1, 1, 3)
 #define SYS_HFGRTR_EL2			sys_reg(3, 4, 1, 1, 4)
 #define SYS_HFGWTR_EL2			sys_reg(3, 4, 1, 1, 5)
 #define SYS_HFGITR_EL2			sys_reg(3, 4, 1, 1, 6)
+#define SYS_HACR_EL2			sys_reg(3, 4, 1, 1, 7)
+
+#define SYS_TTBR0_EL2			sys_reg(3, 4, 2, 0, 0)
+#define SYS_TTBR1_EL2			sys_reg(3, 4, 2, 0, 1)
+#define SYS_TCR_EL2			sys_reg(3, 4, 2, 0, 2)
+#define SYS_VTTBR_EL2			sys_reg(3, 4, 2, 1, 0)
+#define SYS_VTCR_EL2			sys_reg(3, 4, 2, 1, 2)
+
 #define SYS_ZCR_EL2			sys_reg(3, 4, 1, 2, 0)
 #define SYS_TRFCR_EL2			sys_reg(3, 4, 1, 2, 1)
 #define SYS_DACR32_EL2			sys_reg(3, 4, 3, 0, 0)
@@ -544,14 +560,26 @@
 #define SYS_HAFGRTR_EL2			sys_reg(3, 4, 3, 1, 6)
 #define SYS_SPSR_EL2			sys_reg(3, 4, 4, 0, 0)
 #define SYS_ELR_EL2			sys_reg(3, 4, 4, 0, 1)
+#define SYS_SP_EL1			sys_reg(3, 4, 4, 1, 0)
 #define SYS_IFSR32_EL2			sys_reg(3, 4, 5, 0, 1)
+#define SYS_AFSR0_EL2			sys_reg(3, 4, 5, 1, 0)
+#define SYS_AFSR1_EL2			sys_reg(3, 4, 5, 1, 1)
 #define SYS_ESR_EL2			sys_reg(3, 4, 5, 2, 0)
 #define SYS_VSESR_EL2			sys_reg(3, 4, 5, 2, 3)
 #define SYS_FPEXC32_EL2			sys_reg(3, 4, 5, 3, 0)
 #define SYS_TFSR_EL2			sys_reg(3, 4, 5, 6, 0)
 #define SYS_FAR_EL2			sys_reg(3, 4, 6, 0, 0)
 
-#define SYS_VDISR_EL2			sys_reg(3, 4, 12, 1,  1)
+#define SYS_FAR_EL2			sys_reg(3, 4, 6, 0, 0)
+#define SYS_HPFAR_EL2			sys_reg(3, 4, 6, 0, 4)
+
+#define SYS_MAIR_EL2			sys_reg(3, 4, 10, 2, 0)
+#define SYS_AMAIR_EL2			sys_reg(3, 4, 10, 3, 0)
+
+#define SYS_VBAR_EL2			sys_reg(3, 4, 12, 0, 0)
+#define SYS_RVBAR_EL2			sys_reg(3, 4, 12, 0, 1)
+#define SYS_RMR_EL2			sys_reg(3, 4, 12, 0, 2)
+#define SYS_VDISR_EL2			sys_reg(3, 4, 12, 1, 1)
 #define __SYS__AP0Rx_EL2(x)		sys_reg(3, 4, 12, 8, x)
 #define SYS_ICH_AP0R0_EL2		__SYS__AP0Rx_EL2(0)
 #define SYS_ICH_AP0R1_EL2		__SYS__AP0Rx_EL2(1)
@@ -593,15 +621,24 @@
 #define SYS_ICH_LR14_EL2		__SYS__LR8_EL2(6)
 #define SYS_ICH_LR15_EL2		__SYS__LR8_EL2(7)
 
+#define SYS_CONTEXTIDR_EL2		sys_reg(3, 4, 13, 0, 1)
+#define SYS_TPIDR_EL2			sys_reg(3, 4, 13, 0, 2)
+
+#define SYS_CNTVOFF_EL2			sys_reg(3, 4, 14, 0, 3)
+#define SYS_CNTHCTL_EL2			sys_reg(3, 4, 14, 1, 0)
+
 /* VHE encodings for architectural EL0/1 system registers */
 #define SYS_SCTLR_EL12			sys_reg(3, 5, 1, 0, 0)
 #define SYS_CPACR_EL12			sys_reg(3, 5, 1, 0, 2)
 #define SYS_ZCR_EL12			sys_reg(3, 5, 1, 2, 0)
+
 #define SYS_TTBR0_EL12			sys_reg(3, 5, 2, 0, 0)
 #define SYS_TTBR1_EL12			sys_reg(3, 5, 2, 0, 1)
 #define SYS_TCR_EL12			sys_reg(3, 5, 2, 0, 2)
+
 #define SYS_SPSR_EL12			sys_reg(3, 5, 4, 0, 0)
 #define SYS_ELR_EL12			sys_reg(3, 5, 4, 0, 1)
+
 #define SYS_AFSR0_EL12			sys_reg(3, 5, 5, 1, 0)
 #define SYS_AFSR1_EL12			sys_reg(3, 5, 5, 1, 1)
 #define SYS_ESR_EL12			sys_reg(3, 5, 5, 2, 0)
@@ -619,6 +656,8 @@
 #define SYS_CNTV_CTL_EL02		sys_reg(3, 5, 14, 3, 1)
 #define SYS_CNTV_CVAL_EL02		sys_reg(3, 5, 14, 3, 2)
 
+#define SYS_SP_EL2			sys_reg(3, 6,  4, 1, 0)
+
 /* Common SCTLR_ELx flags. */
 #define SCTLR_ELx_DSSBS	(BIT(44))
 #define SCTLR_ELx_ATA	(BIT(43))
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 4dc2fba316ff..06ca98458815 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -24,6 +24,7 @@
 #include <asm/kvm_emulate.h>
 #include <asm/kvm_hyp.h>
 #include <asm/kvm_mmu.h>
+#include <asm/kvm_nested.h>
 #include <asm/perf_event.h>
 #include <asm/sysreg.h>
 
@@ -105,6 +106,18 @@ static u32 get_ccsidr(u32 csselr)
 	return ccsidr;
 }
 
+static bool access_rw(struct kvm_vcpu *vcpu,
+		      struct sys_reg_params *p,
+		      const struct sys_reg_desc *r)
+{
+	if (p->is_write)
+		vcpu_write_sys_reg(vcpu, p->regval, r->reg);
+	else
+		p->regval = vcpu_read_sys_reg(vcpu, r->reg);
+
+	return true;
+}
+
 /*
  * See note at ARMv7 ARM B1.14.4 (TL;DR: S/W ops are not easily virtualized).
  */
@@ -263,6 +276,14 @@ static bool trap_raz_wi(struct kvm_vcpu *vcpu,
 		return read_zero(vcpu, p);
 }
 
+static bool trap_undef(struct kvm_vcpu *vcpu,
+		       struct sys_reg_params *p,
+		       const struct sys_reg_desc *r)
+{
+	kvm_inject_undefined(vcpu);
+	return false;
+}
+
 /*
  * ARMv8.1 mandates at least a trivial LORegion implementation, where all the
  * RW registers are RES0 (which we can implement as RAZ/WI). On an ARMv8.0
@@ -342,12 +363,9 @@ static bool trap_debug_regs(struct kvm_vcpu *vcpu,
 			    struct sys_reg_params *p,
 			    const struct sys_reg_desc *r)
 {
-	if (p->is_write) {
-		vcpu_write_sys_reg(vcpu, p->regval, r->reg);
+	access_rw(vcpu, p, r);
+	if (p->is_write)
 		vcpu->arch.flags |= KVM_ARM64_DEBUG_DIRTY;
-	} else {
-		p->regval = vcpu_read_sys_reg(vcpu, r->reg);
-	}
 
 	trace_trap_reg(__func__, r->reg, p->is_write, p->regval);
 
@@ -1378,6 +1396,24 @@ static unsigned int mte_visibility(const struct kvm_vcpu *vcpu,
 	.visibility = mte_visibility,		\
 }
 
+static unsigned int el2_visibility(const struct kvm_vcpu *vcpu,
+				   const struct sys_reg_desc *rd)
+{
+	if (vcpu_has_nv(vcpu))
+		return 0;
+
+	return REG_HIDDEN;
+}
+
+#define EL2_REG(name, acc, rst, v) {		\
+	SYS_DESC(SYS_##name),			\
+	.access = acc,				\
+	.reset = rst,				\
+	.reg = name,				\
+	.visibility = el2_visibility,		\
+	.val = v,				\
+}
+
 /* sys_reg_desc initialiser for known cpufeature ID registers */
 #define ID_SANITISED(name) {			\
 	SYS_DESC(SYS_##name),			\
@@ -1411,6 +1447,18 @@ static unsigned int mte_visibility(const struct kvm_vcpu *vcpu,
 	.set_user = set_raz_id_reg,		\
 }
 
+static bool access_sp_el1(struct kvm_vcpu *vcpu,
+			  struct sys_reg_params *p,
+			  const struct sys_reg_desc *r)
+{
+	if (p->is_write)
+		__vcpu_sys_reg(vcpu, SP_EL1) = p->regval;
+	else
+		p->regval = __vcpu_sys_reg(vcpu, SP_EL1);
+
+	return true;
+}
+
 /*
  * Architected system registers.
  * Important: Must be sorted ascending by Op0, Op1, CRn, CRm, Op2
@@ -1825,9 +1873,50 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 	{ PMU_SYS_REG(SYS_PMCCFILTR_EL0), .access = access_pmu_evtyper,
 	  .reset = reset_val, .reg = PMCCFILTR_EL0, .val = 0 },
 
+	EL2_REG(VPIDR_EL2, access_rw, reset_val, 0),
+	EL2_REG(VMPIDR_EL2, access_rw, reset_val, 0),
+	EL2_REG(SCTLR_EL2, access_rw, reset_val, SCTLR_EL2_RES1),
+	EL2_REG(ACTLR_EL2, access_rw, reset_val, 0),
+	EL2_REG(HCR_EL2, access_rw, reset_val, 0),
+	EL2_REG(MDCR_EL2, access_rw, reset_val, 0),
+	EL2_REG(CPTR_EL2, access_rw, reset_val, CPTR_EL2_DEFAULT ),
+	EL2_REG(HSTR_EL2, access_rw, reset_val, 0),
+	EL2_REG(HACR_EL2, access_rw, reset_val, 0),
+
+	EL2_REG(TTBR0_EL2, access_rw, reset_val, 0),
+	EL2_REG(TTBR1_EL2, access_rw, reset_val, 0),
+	EL2_REG(TCR_EL2, access_rw, reset_val, TCR_EL2_RES1),
+	EL2_REG(VTTBR_EL2, access_rw, reset_val, 0),
+	EL2_REG(VTCR_EL2, access_rw, reset_val, 0),
+
 	{ SYS_DESC(SYS_DACR32_EL2), NULL, reset_unknown, DACR32_EL2 },
+	EL2_REG(SPSR_EL2, access_rw, reset_val, 0),
+	EL2_REG(ELR_EL2, access_rw, reset_val, 0),
+	{ SYS_DESC(SYS_SP_EL1), access_sp_el1},
+
 	{ SYS_DESC(SYS_IFSR32_EL2), NULL, reset_unknown, IFSR32_EL2 },
+	EL2_REG(AFSR0_EL2, access_rw, reset_val, 0),
+	EL2_REG(AFSR1_EL2, access_rw, reset_val, 0),
+	EL2_REG(ESR_EL2, access_rw, reset_val, 0),
 	{ SYS_DESC(SYS_FPEXC32_EL2), NULL, reset_val, FPEXC32_EL2, 0x700 },
+
+	EL2_REG(FAR_EL2, access_rw, reset_val, 0),
+	EL2_REG(HPFAR_EL2, access_rw, reset_val, 0),
+
+	EL2_REG(MAIR_EL2, access_rw, reset_val, 0),
+	EL2_REG(AMAIR_EL2, access_rw, reset_val, 0),
+
+	EL2_REG(VBAR_EL2, access_rw, reset_val, 0),
+	EL2_REG(RVBAR_EL2, access_rw, reset_val, 0),
+	{ SYS_DESC(SYS_RMR_EL2), trap_undef },
+
+	EL2_REG(CONTEXTIDR_EL2, access_rw, reset_val, 0),
+	EL2_REG(TPIDR_EL2, access_rw, reset_val, 0),
+
+	EL2_REG(CNTVOFF_EL2, access_rw, reset_val, 0),
+	EL2_REG(CNTHCTL_EL2, access_rw, reset_val, 0),
+
+	EL2_REG(SP_EL2, NULL, reset_unknown, 0),
 };
 
 static bool trap_dbgdidr(struct kvm_vcpu *vcpu,
-- 
2.30.2
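
To make the sys_reg_descs[] additions above easier to read, this is
roughly what a single EL2_REG() entry expands to, taking HCR_EL2 as the
example; it follows directly from the macro defined in this patch:

/* Approximate expansion of EL2_REG(HCR_EL2, access_rw, reset_val, 0) */
{
	SYS_DESC(SYS_HCR_EL2),		/* sys_reg(3, 4, 1, 1, 0) */
	.access		= access_rw,	/* RW on the in-memory copy */
	.reset		= reset_val,	/* reset the register to .val */
	.reg		= HCR_EL2,	/* slot in the vcpu sysreg array */
	.visibility	= el2_visibility, /* hidden unless NV is enabled */
	.val		= 0,
},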


^ permalink raw reply related	[flat|nested] 378+ messages in thread

* [PATCH v6 07/64] KVM: arm64: nv: Handle HCR_EL2.NV system register traps
@ 2022-01-28 12:18   ` Marc Zyngier
  0 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:18 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: kernel-team, Andre Przywara, Christoffer Dall, Chase Conklin,
	Russell King (Oracle),
	mihai.carabas, Ganapatrao Kulkarni

From: Jintack Lim <jintack.lim@linaro.org>

ARM v8.3 introduces a new bit in the HCR_EL2, which is the NV bit. When
this bit is set, accessing EL2 registers in EL1 traps to EL2. In
addition, executing the following instructions in EL1 will trap to EL2:
tlbi, at, eret, and msr/mrs instructions to access SP_EL1. Most of the
instructions that trap to EL2 with the NV bit were undef at EL1 prior to
ARM v8.3. The only instruction that was not undef is eret.

This patch sets up a handler for EL2 registers and SP_EL1 register
accesses at EL1. The host hypervisor keeps those register values in
memory, and will emulate their behavior.

This patch doesn't set the NV bit yet. It will be set in a later patch
once nested virtualization support is completed.

Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
[maz: EL2_REG() macros]
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/sysreg.h | 41 +++++++++++++-
 arch/arm64/kvm/sys_regs.c       | 99 +++++++++++++++++++++++++++++++--
 2 files changed, 134 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 898bee0004ae..61c990e5591a 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -532,10 +532,26 @@
 
 #define SYS_PMCCFILTR_EL0		sys_reg(3, 3, 14, 15, 7)
 
+#define SYS_VPIDR_EL2			sys_reg(3, 4, 0, 0, 0)
+#define SYS_VMPIDR_EL2			sys_reg(3, 4, 0, 0, 5)
+
 #define SYS_SCTLR_EL2			sys_reg(3, 4, 1, 0, 0)
+#define SYS_ACTLR_EL2			sys_reg(3, 4, 1, 0, 1)
+#define SYS_HCR_EL2			sys_reg(3, 4, 1, 1, 0)
+#define SYS_MDCR_EL2			sys_reg(3, 4, 1, 1, 1)
+#define SYS_CPTR_EL2			sys_reg(3, 4, 1, 1, 2)
+#define SYS_HSTR_EL2			sys_reg(3, 4, 1, 1, 3)
 #define SYS_HFGRTR_EL2			sys_reg(3, 4, 1, 1, 4)
 #define SYS_HFGWTR_EL2			sys_reg(3, 4, 1, 1, 5)
 #define SYS_HFGITR_EL2			sys_reg(3, 4, 1, 1, 6)
+#define SYS_HACR_EL2			sys_reg(3, 4, 1, 1, 7)
+
+#define SYS_TTBR0_EL2			sys_reg(3, 4, 2, 0, 0)
+#define SYS_TTBR1_EL2			sys_reg(3, 4, 2, 0, 1)
+#define SYS_TCR_EL2			sys_reg(3, 4, 2, 0, 2)
+#define SYS_VTTBR_EL2			sys_reg(3, 4, 2, 1, 0)
+#define SYS_VTCR_EL2			sys_reg(3, 4, 2, 1, 2)
+
 #define SYS_ZCR_EL2			sys_reg(3, 4, 1, 2, 0)
 #define SYS_TRFCR_EL2			sys_reg(3, 4, 1, 2, 1)
 #define SYS_DACR32_EL2			sys_reg(3, 4, 3, 0, 0)
@@ -544,14 +560,26 @@
 #define SYS_HAFGRTR_EL2			sys_reg(3, 4, 3, 1, 6)
 #define SYS_SPSR_EL2			sys_reg(3, 4, 4, 0, 0)
 #define SYS_ELR_EL2			sys_reg(3, 4, 4, 0, 1)
+#define SYS_SP_EL1			sys_reg(3, 4, 4, 1, 0)
 #define SYS_IFSR32_EL2			sys_reg(3, 4, 5, 0, 1)
+#define SYS_AFSR0_EL2			sys_reg(3, 4, 5, 1, 0)
+#define SYS_AFSR1_EL2			sys_reg(3, 4, 5, 1, 1)
 #define SYS_ESR_EL2			sys_reg(3, 4, 5, 2, 0)
 #define SYS_VSESR_EL2			sys_reg(3, 4, 5, 2, 3)
 #define SYS_FPEXC32_EL2			sys_reg(3, 4, 5, 3, 0)
 #define SYS_TFSR_EL2			sys_reg(3, 4, 5, 6, 0)
 #define SYS_FAR_EL2			sys_reg(3, 4, 6, 0, 0)
 
-#define SYS_VDISR_EL2			sys_reg(3, 4, 12, 1,  1)
+#define SYS_FAR_EL2			sys_reg(3, 4, 6, 0, 0)
+#define SYS_HPFAR_EL2			sys_reg(3, 4, 6, 0, 4)
+
+#define SYS_MAIR_EL2			sys_reg(3, 4, 10, 2, 0)
+#define SYS_AMAIR_EL2			sys_reg(3, 4, 10, 3, 0)
+
+#define SYS_VBAR_EL2			sys_reg(3, 4, 12, 0, 0)
+#define SYS_RVBAR_EL2			sys_reg(3, 4, 12, 0, 1)
+#define SYS_RMR_EL2			sys_reg(3, 4, 12, 0, 2)
+#define SYS_VDISR_EL2			sys_reg(3, 4, 12, 1, 1)
 #define __SYS__AP0Rx_EL2(x)		sys_reg(3, 4, 12, 8, x)
 #define SYS_ICH_AP0R0_EL2		__SYS__AP0Rx_EL2(0)
 #define SYS_ICH_AP0R1_EL2		__SYS__AP0Rx_EL2(1)
@@ -593,15 +621,24 @@
 #define SYS_ICH_LR14_EL2		__SYS__LR8_EL2(6)
 #define SYS_ICH_LR15_EL2		__SYS__LR8_EL2(7)
 
+#define SYS_CONTEXTIDR_EL2		sys_reg(3, 4, 13, 0, 1)
+#define SYS_TPIDR_EL2			sys_reg(3, 4, 13, 0, 2)
+
+#define SYS_CNTVOFF_EL2			sys_reg(3, 4, 14, 0, 3)
+#define SYS_CNTHCTL_EL2			sys_reg(3, 4, 14, 1, 0)
+
 /* VHE encodings for architectural EL0/1 system registers */
 #define SYS_SCTLR_EL12			sys_reg(3, 5, 1, 0, 0)
 #define SYS_CPACR_EL12			sys_reg(3, 5, 1, 0, 2)
 #define SYS_ZCR_EL12			sys_reg(3, 5, 1, 2, 0)
+
 #define SYS_TTBR0_EL12			sys_reg(3, 5, 2, 0, 0)
 #define SYS_TTBR1_EL12			sys_reg(3, 5, 2, 0, 1)
 #define SYS_TCR_EL12			sys_reg(3, 5, 2, 0, 2)
+
 #define SYS_SPSR_EL12			sys_reg(3, 5, 4, 0, 0)
 #define SYS_ELR_EL12			sys_reg(3, 5, 4, 0, 1)
+
 #define SYS_AFSR0_EL12			sys_reg(3, 5, 5, 1, 0)
 #define SYS_AFSR1_EL12			sys_reg(3, 5, 5, 1, 1)
 #define SYS_ESR_EL12			sys_reg(3, 5, 5, 2, 0)
@@ -619,6 +656,8 @@
 #define SYS_CNTV_CTL_EL02		sys_reg(3, 5, 14, 3, 1)
 #define SYS_CNTV_CVAL_EL02		sys_reg(3, 5, 14, 3, 2)
 
+#define SYS_SP_EL2			sys_reg(3, 6,  4, 1, 0)
+
 /* Common SCTLR_ELx flags. */
 #define SCTLR_ELx_DSSBS	(BIT(44))
 #define SCTLR_ELx_ATA	(BIT(43))
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 4dc2fba316ff..06ca98458815 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -24,6 +24,7 @@
 #include <asm/kvm_emulate.h>
 #include <asm/kvm_hyp.h>
 #include <asm/kvm_mmu.h>
+#include <asm/kvm_nested.h>
 #include <asm/perf_event.h>
 #include <asm/sysreg.h>
 
@@ -105,6 +106,18 @@ static u32 get_ccsidr(u32 csselr)
 	return ccsidr;
 }
 
+static bool access_rw(struct kvm_vcpu *vcpu,
+		      struct sys_reg_params *p,
+		      const struct sys_reg_desc *r)
+{
+	if (p->is_write)
+		vcpu_write_sys_reg(vcpu, p->regval, r->reg);
+	else
+		p->regval = vcpu_read_sys_reg(vcpu, r->reg);
+
+	return true;
+}
+
 /*
  * See note at ARMv7 ARM B1.14.4 (TL;DR: S/W ops are not easily virtualized).
  */
@@ -263,6 +276,14 @@ static bool trap_raz_wi(struct kvm_vcpu *vcpu,
 		return read_zero(vcpu, p);
 }
 
+static bool trap_undef(struct kvm_vcpu *vcpu,
+		       struct sys_reg_params *p,
+		       const struct sys_reg_desc *r)
+{
+	kvm_inject_undefined(vcpu);
+	return false;
+}
+
 /*
  * ARMv8.1 mandates at least a trivial LORegion implementation, where all the
  * RW registers are RES0 (which we can implement as RAZ/WI). On an ARMv8.0
@@ -342,12 +363,9 @@ static bool trap_debug_regs(struct kvm_vcpu *vcpu,
 			    struct sys_reg_params *p,
 			    const struct sys_reg_desc *r)
 {
-	if (p->is_write) {
-		vcpu_write_sys_reg(vcpu, p->regval, r->reg);
+	access_rw(vcpu, p, r);
+	if (p->is_write)
 		vcpu->arch.flags |= KVM_ARM64_DEBUG_DIRTY;
-	} else {
-		p->regval = vcpu_read_sys_reg(vcpu, r->reg);
-	}
 
 	trace_trap_reg(__func__, r->reg, p->is_write, p->regval);
 
@@ -1378,6 +1396,24 @@ static unsigned int mte_visibility(const struct kvm_vcpu *vcpu,
 	.visibility = mte_visibility,		\
 }
 
+static unsigned int el2_visibility(const struct kvm_vcpu *vcpu,
+				   const struct sys_reg_desc *rd)
+{
+	if (vcpu_has_nv(vcpu))
+		return 0;
+
+	return REG_HIDDEN;
+}
+
+#define EL2_REG(name, acc, rst, v) {		\
+	SYS_DESC(SYS_##name),			\
+	.access = acc,				\
+	.reset = rst,				\
+	.reg = name,				\
+	.visibility = el2_visibility,		\
+	.val = v,				\
+}
+
 /* sys_reg_desc initialiser for known cpufeature ID registers */
 #define ID_SANITISED(name) {			\
 	SYS_DESC(SYS_##name),			\
@@ -1411,6 +1447,18 @@ static unsigned int mte_visibility(const struct kvm_vcpu *vcpu,
 	.set_user = set_raz_id_reg,		\
 }
 
+static bool access_sp_el1(struct kvm_vcpu *vcpu,
+			  struct sys_reg_params *p,
+			  const struct sys_reg_desc *r)
+{
+	if (p->is_write)
+		__vcpu_sys_reg(vcpu, SP_EL1) = p->regval;
+	else
+		p->regval = __vcpu_sys_reg(vcpu, SP_EL1);
+
+	return true;
+}
+
 /*
  * Architected system registers.
  * Important: Must be sorted ascending by Op0, Op1, CRn, CRm, Op2
@@ -1825,9 +1873,50 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 	{ PMU_SYS_REG(SYS_PMCCFILTR_EL0), .access = access_pmu_evtyper,
 	  .reset = reset_val, .reg = PMCCFILTR_EL0, .val = 0 },
 
+	EL2_REG(VPIDR_EL2, access_rw, reset_val, 0),
+	EL2_REG(VMPIDR_EL2, access_rw, reset_val, 0),
+	EL2_REG(SCTLR_EL2, access_rw, reset_val, SCTLR_EL2_RES1),
+	EL2_REG(ACTLR_EL2, access_rw, reset_val, 0),
+	EL2_REG(HCR_EL2, access_rw, reset_val, 0),
+	EL2_REG(MDCR_EL2, access_rw, reset_val, 0),
+	EL2_REG(CPTR_EL2, access_rw, reset_val, CPTR_EL2_DEFAULT ),
+	EL2_REG(HSTR_EL2, access_rw, reset_val, 0),
+	EL2_REG(HACR_EL2, access_rw, reset_val, 0),
+
+	EL2_REG(TTBR0_EL2, access_rw, reset_val, 0),
+	EL2_REG(TTBR1_EL2, access_rw, reset_val, 0),
+	EL2_REG(TCR_EL2, access_rw, reset_val, TCR_EL2_RES1),
+	EL2_REG(VTTBR_EL2, access_rw, reset_val, 0),
+	EL2_REG(VTCR_EL2, access_rw, reset_val, 0),
+
 	{ SYS_DESC(SYS_DACR32_EL2), NULL, reset_unknown, DACR32_EL2 },
+	EL2_REG(SPSR_EL2, access_rw, reset_val, 0),
+	EL2_REG(ELR_EL2, access_rw, reset_val, 0),
+	{ SYS_DESC(SYS_SP_EL1), access_sp_el1},
+
 	{ SYS_DESC(SYS_IFSR32_EL2), NULL, reset_unknown, IFSR32_EL2 },
+	EL2_REG(AFSR0_EL2, access_rw, reset_val, 0),
+	EL2_REG(AFSR1_EL2, access_rw, reset_val, 0),
+	EL2_REG(ESR_EL2, access_rw, reset_val, 0),
 	{ SYS_DESC(SYS_FPEXC32_EL2), NULL, reset_val, FPEXC32_EL2, 0x700 },
+
+	EL2_REG(FAR_EL2, access_rw, reset_val, 0),
+	EL2_REG(HPFAR_EL2, access_rw, reset_val, 0),
+
+	EL2_REG(MAIR_EL2, access_rw, reset_val, 0),
+	EL2_REG(AMAIR_EL2, access_rw, reset_val, 0),
+
+	EL2_REG(VBAR_EL2, access_rw, reset_val, 0),
+	EL2_REG(RVBAR_EL2, access_rw, reset_val, 0),
+	{ SYS_DESC(SYS_RMR_EL2), trap_undef },
+
+	EL2_REG(CONTEXTIDR_EL2, access_rw, reset_val, 0),
+	EL2_REG(TPIDR_EL2, access_rw, reset_val, 0),
+
+	EL2_REG(CNTVOFF_EL2, access_rw, reset_val, 0),
+	EL2_REG(CNTHCTL_EL2, access_rw, reset_val, 0),
+
+	EL2_REG(SP_EL2, NULL, reset_unknown, 0),
 };
 
 static bool trap_dbgdidr(struct kvm_vcpu *vcpu,
-- 
2.30.2

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 378+ messages in thread

* [PATCH v6 07/64] KVM: arm64: nv: Handle HCR_EL2.NV system register traps
@ 2022-01-28 12:18   ` Marc Zyngier
  0 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:18 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Jintack Lim, Haibo Xu,
	Ganapatrao Kulkarni, Chase Conklin, Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

From: Jintack Lim <jintack.lim@linaro.org>

ARM v8.3 introduces a new bit in the HCR_EL2, which is the NV bit. When
this bit is set, accessing EL2 registers in EL1 traps to EL2. In
addition, executing the following instructions in EL1 will trap to EL2:
tlbi, at, eret, and msr/mrs instructions to access SP_EL1. Most of the
instructions that trap to EL2 with the NV bit were undef at EL1 prior to
ARM v8.3. The only instruction that was not undef is eret.

This patch sets up a handler for EL2 registers and SP_EL1 register
accesses at EL1. The host hypervisor keeps those register values in
memory, and will emulate their behavior.

This patch doesn't set the NV bit yet. It will be set in a later patch
once nested virtualization support is completed.

Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
[maz: EL2_REG() macros]
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/sysreg.h | 41 +++++++++++++-
 arch/arm64/kvm/sys_regs.c       | 99 +++++++++++++++++++++++++++++++--
 2 files changed, 134 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 898bee0004ae..61c990e5591a 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -532,10 +532,26 @@
 
 #define SYS_PMCCFILTR_EL0		sys_reg(3, 3, 14, 15, 7)
 
+#define SYS_VPIDR_EL2			sys_reg(3, 4, 0, 0, 0)
+#define SYS_VMPIDR_EL2			sys_reg(3, 4, 0, 0, 5)
+
 #define SYS_SCTLR_EL2			sys_reg(3, 4, 1, 0, 0)
+#define SYS_ACTLR_EL2			sys_reg(3, 4, 1, 0, 1)
+#define SYS_HCR_EL2			sys_reg(3, 4, 1, 1, 0)
+#define SYS_MDCR_EL2			sys_reg(3, 4, 1, 1, 1)
+#define SYS_CPTR_EL2			sys_reg(3, 4, 1, 1, 2)
+#define SYS_HSTR_EL2			sys_reg(3, 4, 1, 1, 3)
 #define SYS_HFGRTR_EL2			sys_reg(3, 4, 1, 1, 4)
 #define SYS_HFGWTR_EL2			sys_reg(3, 4, 1, 1, 5)
 #define SYS_HFGITR_EL2			sys_reg(3, 4, 1, 1, 6)
+#define SYS_HACR_EL2			sys_reg(3, 4, 1, 1, 7)
+
+#define SYS_TTBR0_EL2			sys_reg(3, 4, 2, 0, 0)
+#define SYS_TTBR1_EL2			sys_reg(3, 4, 2, 0, 1)
+#define SYS_TCR_EL2			sys_reg(3, 4, 2, 0, 2)
+#define SYS_VTTBR_EL2			sys_reg(3, 4, 2, 1, 0)
+#define SYS_VTCR_EL2			sys_reg(3, 4, 2, 1, 2)
+
 #define SYS_ZCR_EL2			sys_reg(3, 4, 1, 2, 0)
 #define SYS_TRFCR_EL2			sys_reg(3, 4, 1, 2, 1)
 #define SYS_DACR32_EL2			sys_reg(3, 4, 3, 0, 0)
@@ -544,14 +560,26 @@
 #define SYS_HAFGRTR_EL2			sys_reg(3, 4, 3, 1, 6)
 #define SYS_SPSR_EL2			sys_reg(3, 4, 4, 0, 0)
 #define SYS_ELR_EL2			sys_reg(3, 4, 4, 0, 1)
+#define SYS_SP_EL1			sys_reg(3, 4, 4, 1, 0)
 #define SYS_IFSR32_EL2			sys_reg(3, 4, 5, 0, 1)
+#define SYS_AFSR0_EL2			sys_reg(3, 4, 5, 1, 0)
+#define SYS_AFSR1_EL2			sys_reg(3, 4, 5, 1, 1)
 #define SYS_ESR_EL2			sys_reg(3, 4, 5, 2, 0)
 #define SYS_VSESR_EL2			sys_reg(3, 4, 5, 2, 3)
 #define SYS_FPEXC32_EL2			sys_reg(3, 4, 5, 3, 0)
 #define SYS_TFSR_EL2			sys_reg(3, 4, 5, 6, 0)
 #define SYS_FAR_EL2			sys_reg(3, 4, 6, 0, 0)
 
-#define SYS_VDISR_EL2			sys_reg(3, 4, 12, 1,  1)
+#define SYS_FAR_EL2			sys_reg(3, 4, 6, 0, 0)
+#define SYS_HPFAR_EL2			sys_reg(3, 4, 6, 0, 4)
+
+#define SYS_MAIR_EL2			sys_reg(3, 4, 10, 2, 0)
+#define SYS_AMAIR_EL2			sys_reg(3, 4, 10, 3, 0)
+
+#define SYS_VBAR_EL2			sys_reg(3, 4, 12, 0, 0)
+#define SYS_RVBAR_EL2			sys_reg(3, 4, 12, 0, 1)
+#define SYS_RMR_EL2			sys_reg(3, 4, 12, 0, 2)
+#define SYS_VDISR_EL2			sys_reg(3, 4, 12, 1, 1)
 #define __SYS__AP0Rx_EL2(x)		sys_reg(3, 4, 12, 8, x)
 #define SYS_ICH_AP0R0_EL2		__SYS__AP0Rx_EL2(0)
 #define SYS_ICH_AP0R1_EL2		__SYS__AP0Rx_EL2(1)
@@ -593,15 +621,24 @@
 #define SYS_ICH_LR14_EL2		__SYS__LR8_EL2(6)
 #define SYS_ICH_LR15_EL2		__SYS__LR8_EL2(7)
 
+#define SYS_CONTEXTIDR_EL2		sys_reg(3, 4, 13, 0, 1)
+#define SYS_TPIDR_EL2			sys_reg(3, 4, 13, 0, 2)
+
+#define SYS_CNTVOFF_EL2			sys_reg(3, 4, 14, 0, 3)
+#define SYS_CNTHCTL_EL2			sys_reg(3, 4, 14, 1, 0)
+
 /* VHE encodings for architectural EL0/1 system registers */
 #define SYS_SCTLR_EL12			sys_reg(3, 5, 1, 0, 0)
 #define SYS_CPACR_EL12			sys_reg(3, 5, 1, 0, 2)
 #define SYS_ZCR_EL12			sys_reg(3, 5, 1, 2, 0)
+
 #define SYS_TTBR0_EL12			sys_reg(3, 5, 2, 0, 0)
 #define SYS_TTBR1_EL12			sys_reg(3, 5, 2, 0, 1)
 #define SYS_TCR_EL12			sys_reg(3, 5, 2, 0, 2)
+
 #define SYS_SPSR_EL12			sys_reg(3, 5, 4, 0, 0)
 #define SYS_ELR_EL12			sys_reg(3, 5, 4, 0, 1)
+
 #define SYS_AFSR0_EL12			sys_reg(3, 5, 5, 1, 0)
 #define SYS_AFSR1_EL12			sys_reg(3, 5, 5, 1, 1)
 #define SYS_ESR_EL12			sys_reg(3, 5, 5, 2, 0)
@@ -619,6 +656,8 @@
 #define SYS_CNTV_CTL_EL02		sys_reg(3, 5, 14, 3, 1)
 #define SYS_CNTV_CVAL_EL02		sys_reg(3, 5, 14, 3, 2)
 
+#define SYS_SP_EL2			sys_reg(3, 6,  4, 1, 0)
+
 /* Common SCTLR_ELx flags. */
 #define SCTLR_ELx_DSSBS	(BIT(44))
 #define SCTLR_ELx_ATA	(BIT(43))
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 4dc2fba316ff..06ca98458815 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -24,6 +24,7 @@
 #include <asm/kvm_emulate.h>
 #include <asm/kvm_hyp.h>
 #include <asm/kvm_mmu.h>
+#include <asm/kvm_nested.h>
 #include <asm/perf_event.h>
 #include <asm/sysreg.h>
 
@@ -105,6 +106,18 @@ static u32 get_ccsidr(u32 csselr)
 	return ccsidr;
 }
 
+static bool access_rw(struct kvm_vcpu *vcpu,
+		      struct sys_reg_params *p,
+		      const struct sys_reg_desc *r)
+{
+	if (p->is_write)
+		vcpu_write_sys_reg(vcpu, p->regval, r->reg);
+	else
+		p->regval = vcpu_read_sys_reg(vcpu, r->reg);
+
+	return true;
+}
+
 /*
  * See note at ARMv7 ARM B1.14.4 (TL;DR: S/W ops are not easily virtualized).
  */
@@ -263,6 +276,14 @@ static bool trap_raz_wi(struct kvm_vcpu *vcpu,
 		return read_zero(vcpu, p);
 }
 
+static bool trap_undef(struct kvm_vcpu *vcpu,
+		       struct sys_reg_params *p,
+		       const struct sys_reg_desc *r)
+{
+	kvm_inject_undefined(vcpu);
+	return false;
+}
+
 /*
  * ARMv8.1 mandates at least a trivial LORegion implementation, where all the
  * RW registers are RES0 (which we can implement as RAZ/WI). On an ARMv8.0
@@ -342,12 +363,9 @@ static bool trap_debug_regs(struct kvm_vcpu *vcpu,
 			    struct sys_reg_params *p,
 			    const struct sys_reg_desc *r)
 {
-	if (p->is_write) {
-		vcpu_write_sys_reg(vcpu, p->regval, r->reg);
+	access_rw(vcpu, p, r);
+	if (p->is_write)
 		vcpu->arch.flags |= KVM_ARM64_DEBUG_DIRTY;
-	} else {
-		p->regval = vcpu_read_sys_reg(vcpu, r->reg);
-	}
 
 	trace_trap_reg(__func__, r->reg, p->is_write, p->regval);
 
@@ -1378,6 +1396,24 @@ static unsigned int mte_visibility(const struct kvm_vcpu *vcpu,
 	.visibility = mte_visibility,		\
 }
 
+static unsigned int el2_visibility(const struct kvm_vcpu *vcpu,
+				   const struct sys_reg_desc *rd)
+{
+	if (vcpu_has_nv(vcpu))
+		return 0;
+
+	return REG_HIDDEN;
+}
+
+#define EL2_REG(name, acc, rst, v) {		\
+	SYS_DESC(SYS_##name),			\
+	.access = acc,				\
+	.reset = rst,				\
+	.reg = name,				\
+	.visibility = el2_visibility,		\
+	.val = v,				\
+}
+
 /* sys_reg_desc initialiser for known cpufeature ID registers */
 #define ID_SANITISED(name) {			\
 	SYS_DESC(SYS_##name),			\
@@ -1411,6 +1447,18 @@ static unsigned int mte_visibility(const struct kvm_vcpu *vcpu,
 	.set_user = set_raz_id_reg,		\
 }
 
+static bool access_sp_el1(struct kvm_vcpu *vcpu,
+			  struct sys_reg_params *p,
+			  const struct sys_reg_desc *r)
+{
+	if (p->is_write)
+		__vcpu_sys_reg(vcpu, SP_EL1) = p->regval;
+	else
+		p->regval = __vcpu_sys_reg(vcpu, SP_EL1);
+
+	return true;
+}
+
 /*
  * Architected system registers.
  * Important: Must be sorted ascending by Op0, Op1, CRn, CRm, Op2
@@ -1825,9 +1873,50 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 	{ PMU_SYS_REG(SYS_PMCCFILTR_EL0), .access = access_pmu_evtyper,
 	  .reset = reset_val, .reg = PMCCFILTR_EL0, .val = 0 },
 
+	EL2_REG(VPIDR_EL2, access_rw, reset_val, 0),
+	EL2_REG(VMPIDR_EL2, access_rw, reset_val, 0),
+	EL2_REG(SCTLR_EL2, access_rw, reset_val, SCTLR_EL2_RES1),
+	EL2_REG(ACTLR_EL2, access_rw, reset_val, 0),
+	EL2_REG(HCR_EL2, access_rw, reset_val, 0),
+	EL2_REG(MDCR_EL2, access_rw, reset_val, 0),
+	EL2_REG(CPTR_EL2, access_rw, reset_val, CPTR_EL2_DEFAULT),
+	EL2_REG(HSTR_EL2, access_rw, reset_val, 0),
+	EL2_REG(HACR_EL2, access_rw, reset_val, 0),
+
+	EL2_REG(TTBR0_EL2, access_rw, reset_val, 0),
+	EL2_REG(TTBR1_EL2, access_rw, reset_val, 0),
+	EL2_REG(TCR_EL2, access_rw, reset_val, TCR_EL2_RES1),
+	EL2_REG(VTTBR_EL2, access_rw, reset_val, 0),
+	EL2_REG(VTCR_EL2, access_rw, reset_val, 0),
+
 	{ SYS_DESC(SYS_DACR32_EL2), NULL, reset_unknown, DACR32_EL2 },
+	EL2_REG(SPSR_EL2, access_rw, reset_val, 0),
+	EL2_REG(ELR_EL2, access_rw, reset_val, 0),
+	{ SYS_DESC(SYS_SP_EL1), access_sp_el1 },
+
 	{ SYS_DESC(SYS_IFSR32_EL2), NULL, reset_unknown, IFSR32_EL2 },
+	EL2_REG(AFSR0_EL2, access_rw, reset_val, 0),
+	EL2_REG(AFSR1_EL2, access_rw, reset_val, 0),
+	EL2_REG(ESR_EL2, access_rw, reset_val, 0),
 	{ SYS_DESC(SYS_FPEXC32_EL2), NULL, reset_val, FPEXC32_EL2, 0x700 },
+
+	EL2_REG(FAR_EL2, access_rw, reset_val, 0),
+	EL2_REG(HPFAR_EL2, access_rw, reset_val, 0),
+
+	EL2_REG(MAIR_EL2, access_rw, reset_val, 0),
+	EL2_REG(AMAIR_EL2, access_rw, reset_val, 0),
+
+	EL2_REG(VBAR_EL2, access_rw, reset_val, 0),
+	EL2_REG(RVBAR_EL2, access_rw, reset_val, 0),
+	{ SYS_DESC(SYS_RMR_EL2), trap_undef },
+
+	EL2_REG(CONTEXTIDR_EL2, access_rw, reset_val, 0),
+	EL2_REG(TPIDR_EL2, access_rw, reset_val, 0),
+
+	EL2_REG(CNTVOFF_EL2, access_rw, reset_val, 0),
+	EL2_REG(CNTHCTL_EL2, access_rw, reset_val, 0),
+
+	EL2_REG(SP_EL2, NULL, reset_unknown, 0),
 };
 
 static bool trap_dbgdidr(struct kvm_vcpu *vcpu,
-- 
2.30.2


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 378+ messages in thread

* [PATCH v6 08/64] KVM: arm64: nv: Reset VMPIDR_EL2 and VPIDR_EL2 to sane values
  2022-01-28 12:18 ` Marc Zyngier
@ 2022-01-28 12:18   ` Marc Zyngier
  -1 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:18 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Jintack Lim, Haibo Xu,
	Ganapatrao Kulkarni, Chase Conklin, Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

From: Christoffer Dall <christoffer.dall@arm.com>

The VMPIDR_EL2 and VPIDR_EL2 are architecturally UNKNOWN at reset, but
let's be nice to a guest hypervisor behaving foolishly and reset these
to something reasonable anyway.
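
For illustration, here is a standalone user-space sketch of the affinity
packing the reset value relies on (the shift macro is a local stand-in for
the kernel's MPIDR_LEVEL_SHIFT(), and the vcpu_id is only an example; the
real code is compute_reset_mpidr() in the diff below):

  #include <stdio.h>
  #include <stdint.h>

  /* Local stand-in: Aff0/Aff1/Aff2 live at bits 0/8/16. */
  #define LEVEL_SHIFT(l)	((l) * 8)

  int main(void)
  {
  	uint32_t vcpu_id = 0x123;	/* example value only */
  	uint64_t mpidr;

  	/* 4 bits of vcpu_id into Aff0, then 8 bits each into Aff1/Aff2 */
  	mpidr  = (uint64_t)(vcpu_id & 0x0f) << LEVEL_SHIFT(0);
  	mpidr |= (uint64_t)((vcpu_id >> 4) & 0xff) << LEVEL_SHIFT(1);
  	mpidr |= (uint64_t)((vcpu_id >> 12) & 0xff) << LEVEL_SHIFT(2);
  	mpidr |= 1ULL << 31;		/* RES1 bit of MPIDR_EL1 */

  	/* prints 0x80001203 for vcpu_id 0x123 */
  	printf("0x%llx\n", (unsigned long long)mpidr);
  	return 0;
  }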

Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/sys_regs.c | 25 +++++++++++++++++++++----
 1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 06ca98458815..25386cb622c5 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -591,7 +591,7 @@ static void reset_actlr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
 	vcpu_write_sys_reg(vcpu, actlr, ACTLR_EL1);
 }
 
-static void reset_mpidr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
+static u64 compute_reset_mpidr(struct kvm_vcpu *vcpu)
 {
 	u64 mpidr;
 
@@ -605,7 +605,24 @@ static void reset_mpidr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
 	mpidr = (vcpu->vcpu_id & 0x0f) << MPIDR_LEVEL_SHIFT(0);
 	mpidr |= ((vcpu->vcpu_id >> 4) & 0xff) << MPIDR_LEVEL_SHIFT(1);
 	mpidr |= ((vcpu->vcpu_id >> 12) & 0xff) << MPIDR_LEVEL_SHIFT(2);
-	vcpu_write_sys_reg(vcpu, (1ULL << 31) | mpidr, MPIDR_EL1);
+	mpidr |= (1ULL << 31);
+
+	return mpidr;
+}
+
+static void reset_mpidr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
+{
+	vcpu_write_sys_reg(vcpu, compute_reset_mpidr(vcpu), MPIDR_EL1);
+}
+
+static void reset_vmpidr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
+{
+	vcpu_write_sys_reg(vcpu, compute_reset_mpidr(vcpu), VMPIDR_EL2);
+}
+
+static void reset_vpidr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
+{
+	vcpu_write_sys_reg(vcpu, read_cpuid_id(), VPIDR_EL2);
 }
 
 static unsigned int pmu_visibility(const struct kvm_vcpu *vcpu,
@@ -1873,8 +1890,8 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 	{ PMU_SYS_REG(SYS_PMCCFILTR_EL0), .access = access_pmu_evtyper,
 	  .reset = reset_val, .reg = PMCCFILTR_EL0, .val = 0 },
 
-	EL2_REG(VPIDR_EL2, access_rw, reset_val, 0),
-	EL2_REG(VMPIDR_EL2, access_rw, reset_val, 0),
+	EL2_REG(VPIDR_EL2, access_rw, reset_vpidr, 0),
+	EL2_REG(VMPIDR_EL2, access_rw, reset_vmpidr, 0),
 	EL2_REG(SCTLR_EL2, access_rw, reset_val, SCTLR_EL2_RES1),
 	EL2_REG(ACTLR_EL2, access_rw, reset_val, 0),
 	EL2_REG(HCR_EL2, access_rw, reset_val, 0),
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 378+ messages in thread

* [PATCH v6 09/64] KVM: arm64: nv: Support virtual EL2 exceptions
  2022-01-28 12:18 ` Marc Zyngier
@ 2022-01-28 12:18   ` Marc Zyngier
  -1 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:18 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Jintack Lim, Haibo Xu,
	Ganapatrao Kulkarni, Chase Conklin, Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

From: Jintack Lim <jintack.lim@linaro.org>

Support injecting exceptions and performing exception returns to and
from virtual EL2.  This must be done entirely in software except when
taking an exception from vEL0 to vEL2 while the virtual HCR_EL2.{E2H,TGE}
== {1,1} (a VHE guest hypervisor).
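
As a rough illustration of that fast path (a standalone sketch with
hypothetical names; the real check is the direct_inject/direct_eret logic
in emulate-nested.c below): no change of translation regime is needed when
the exception is taken from vEL0 under a VHE guest hypervisor, or when the
guest already runs at vEL2.

  #include <stdbool.h>
  #include <stdio.h>

  enum mode { EL0T, EL1T, EL1H, EL2T, EL2H };

  /* Mirrors the condition described above; everything else goes through
   * the software put/inject/load path. */
  bool no_regime_change(enum mode m, bool virt_e2h, bool virt_tge)
  {
  	if (m == EL0T && virt_e2h && virt_tge)
  		return true;
  	return m == EL2T || m == EL2H;
  }

  int main(void)
  {
  	printf("%d\n", no_regime_change(EL0T, true, true));	/* 1 */
  	printf("%d\n", no_regime_change(EL1H, true, true));	/* 0 */
  	return 0;
  }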

Reviewed-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
[maz: switch to common exception injection framework, illegal exception
 return handling]
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_arm.h     |  17 +++
 arch/arm64/include/asm/kvm_emulate.h |  10 ++
 arch/arm64/include/asm/kvm_host.h    |   1 +
 arch/arm64/kvm/Makefile              |   2 +-
 arch/arm64/kvm/emulate-nested.c      | 197 +++++++++++++++++++++++++++
 arch/arm64/kvm/hyp/exception.c       |  49 +++++--
 arch/arm64/kvm/inject_fault.c        |  68 +++++++--
 arch/arm64/kvm/trace_arm.h           |  59 ++++++++
 8 files changed, 382 insertions(+), 21 deletions(-)
 create mode 100644 arch/arm64/kvm/emulate-nested.c

diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index 01d47c5886dc..e6e3aae87a09 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -359,4 +359,21 @@
 #define CPACR_EL1_TTA		(1 << 28)
 #define CPACR_EL1_DEFAULT	(CPACR_EL1_FPEN | CPACR_EL1_ZEN_EL1EN)
 
+#define kvm_mode_names				\
+	{ PSR_MODE_EL0t,	"EL0t" },	\
+	{ PSR_MODE_EL1t,	"EL1t" },	\
+	{ PSR_MODE_EL1h,	"EL1h" },	\
+	{ PSR_MODE_EL2t,	"EL2t" },	\
+	{ PSR_MODE_EL2h,	"EL2h" },	\
+	{ PSR_MODE_EL3t,	"EL3t" },	\
+	{ PSR_MODE_EL3h,	"EL3h" },	\
+	{ PSR_AA32_MODE_USR,	"32-bit USR" },	\
+	{ PSR_AA32_MODE_FIQ,	"32-bit FIQ" },	\
+	{ PSR_AA32_MODE_IRQ,	"32-bit IRQ" },	\
+	{ PSR_AA32_MODE_SVC,	"32-bit SVC" },	\
+	{ PSR_AA32_MODE_ABT,	"32-bit ABT" },	\
+	{ PSR_AA32_MODE_HYP,	"32-bit HYP" },	\
+	{ PSR_AA32_MODE_UND,	"32-bit UND" },	\
+	{ PSR_AA32_MODE_SYS,	"32-bit SYS" }
+
 #endif /* __ARM64_KVM_ARM_H__ */
diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index ea9a130c4b6a..cb9f123d26f3 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -33,6 +33,12 @@ enum exception_type {
 	except_type_serror	= 0x180,
 };
 
+#define kvm_exception_type_names		\
+	{ except_type_sync,	"SYNC"   },	\
+	{ except_type_irq,	"IRQ"    },	\
+	{ except_type_fiq,	"FIQ"    },	\
+	{ except_type_serror,	"SERROR" }
+
 bool kvm_condition_valid32(const struct kvm_vcpu *vcpu);
 void kvm_skip_instr32(struct kvm_vcpu *vcpu);
 
@@ -43,6 +49,10 @@ void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr);
 
 void kvm_vcpu_wfi(struct kvm_vcpu *vcpu);
 
+void kvm_emulate_nested_eret(struct kvm_vcpu *vcpu);
+int kvm_inject_nested_sync(struct kvm_vcpu *vcpu, u64 esr_el2);
+int kvm_inject_nested_irq(struct kvm_vcpu *vcpu);
+
 static __always_inline bool vcpu_el1_is_32bit(struct kvm_vcpu *vcpu)
 {
 	return !(vcpu->arch.hcr_el2 & HCR_RW);
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 15f690c27baf..8fffe2888403 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -467,6 +467,7 @@ struct kvm_vcpu_arch {
 #define KVM_ARM64_EXCEPT_AA64_ELx_SERR	(3 << 9)
 #define KVM_ARM64_EXCEPT_AA64_EL1	(0 << 11)
 #define KVM_ARM64_EXCEPT_AA64_EL2	(1 << 11)
+#define KVM_ARM64_EXCEPT_AA64_EL_MASK	(1 << 11)
 
 #define KVM_ARM64_DEBUG_STATE_SAVE_SPE	(1 << 12) /* Save SPE context if active  */
 #define KVM_ARM64_DEBUG_STATE_SAVE_TRBE	(1 << 13) /* Save TRBE context if active  */
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index 91861fd8b897..b67c4ebd72b1 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -14,7 +14,7 @@ kvm-y += arm.o mmu.o mmio.o psci.o hypercalls.o pvtime.o \
 	 inject_fault.o va_layout.o handle_exit.o \
 	 guest.o debug.o reset.o sys_regs.o \
 	 vgic-sys-reg-v3.o fpsimd.o pmu.o pkvm.o \
-	 arch_timer.o trng.o\
+	 arch_timer.o trng.o emulate-nested.o \
 	 vgic/vgic.o vgic/vgic-init.o \
 	 vgic/vgic-irqfd.o vgic/vgic-v2.o \
 	 vgic/vgic-v3.o vgic/vgic-v4.o \
diff --git a/arch/arm64/kvm/emulate-nested.c b/arch/arm64/kvm/emulate-nested.c
new file mode 100644
index 000000000000..f52cd4458947
--- /dev/null
+++ b/arch/arm64/kvm/emulate-nested.c
@@ -0,0 +1,197 @@
+/*
+ * Copyright (C) 2016 - Linaro and Columbia University
+ * Author: Jintack Lim <jintack.lim@linaro.org>
+ */
+
+#include <linux/kvm.h>
+#include <linux/kvm_host.h>
+
+#include <asm/kvm_emulate.h>
+#include <asm/kvm_nested.h>
+
+#include "hyp/include/hyp/adjust_pc.h"
+
+#include "trace.h"
+
+static u64 kvm_check_illegal_exception_return(struct kvm_vcpu *vcpu, u64 spsr)
+{
+	u64 mode = spsr & PSR_MODE_MASK;
+
+	/*
+	 * Possible causes for an Illegal Exception Return from EL2:
+	 * - trying to return to EL3
+	 * - trying to return to a 32bit EL
+	 * - trying to return to EL1 with HCR_EL2.TGE set
+	 */
+	if (mode == PSR_MODE_EL3t || mode == PSR_MODE_EL3h ||
+	    spsr & PSR_MODE32_BIT ||
+	    (vcpu_el2_tge_is_set(vcpu) && (mode == PSR_MODE_EL1t ||
+					   mode == PSR_MODE_EL1h))) {
+		/*
+		 * The guest is playing with our nerves. Preserve EL, SP,
+		 * masks, flags from the existing PSTATE, and set IL.
+		 * The HW will then generate an Illegal State Exception
+		 * immediately after ERET.
+		 */
+		spsr = *vcpu_cpsr(vcpu);
+
+		spsr &= (PSR_D_BIT | PSR_A_BIT | PSR_I_BIT | PSR_F_BIT |
+			 PSR_N_BIT | PSR_Z_BIT | PSR_C_BIT | PSR_V_BIT |
+			 PSR_MODE_MASK | PSR_MODE32_BIT);
+		spsr |= PSR_IL_BIT;
+	}
+
+	return spsr;
+}
+
+void kvm_emulate_nested_eret(struct kvm_vcpu *vcpu)
+{
+	u64 spsr, elr, mode;
+	bool direct_eret;
+
+	/*
+	 * Going through the whole put/load motions is a waste of time
+	 * if this is a VHE guest hypervisor returning to its own
+	 * userspace, or the hypervisor performing a local exception
+	 * return. No need to save/restore registers, no need to
+	 * switch S2 MMU. Just do the canonical ERET.
+	 */
+	spsr = vcpu_read_sys_reg(vcpu, SPSR_EL2);
+	spsr = kvm_check_illegal_exception_return(vcpu, spsr);
+
+	mode = spsr & (PSR_MODE_MASK | PSR_MODE32_BIT);
+
+	direct_eret  = (mode == PSR_MODE_EL0t &&
+			vcpu_el2_e2h_is_set(vcpu) &&
+			vcpu_el2_tge_is_set(vcpu));
+	direct_eret |= (mode == PSR_MODE_EL2h || mode == PSR_MODE_EL2t);
+
+	if (direct_eret) {
+		*vcpu_pc(vcpu) = vcpu_read_sys_reg(vcpu, ELR_EL2);
+		*vcpu_cpsr(vcpu) = spsr;
+		trace_kvm_nested_eret(vcpu, *vcpu_pc(vcpu), spsr);
+		return;
+	}
+
+	preempt_disable();
+	kvm_arch_vcpu_put(vcpu);
+
+	elr = __vcpu_sys_reg(vcpu, ELR_EL2);
+
+	trace_kvm_nested_eret(vcpu, elr, spsr);
+
+	/*
+	 * Note that the current exception level is always the virtual EL2,
+	 * since we set HCR_EL2.NV bit only when entering the virtual EL2.
+	 */
+	*vcpu_pc(vcpu) = elr;
+	*vcpu_cpsr(vcpu) = spsr;
+
+	kvm_arch_vcpu_load(vcpu, smp_processor_id());
+	preempt_enable();
+}
+
+static void kvm_inject_el2_exception(struct kvm_vcpu *vcpu, u64 esr_el2,
+				     enum exception_type type)
+{
+	trace_kvm_inject_nested_exception(vcpu, esr_el2, type);
+
+	switch (type) {
+	case except_type_sync:
+		vcpu->arch.flags |= KVM_ARM64_EXCEPT_AA64_ELx_SYNC;
+		break;
+	case except_type_irq:
+		vcpu->arch.flags |= KVM_ARM64_EXCEPT_AA64_ELx_IRQ;
+		break;
+	default:
+		WARN_ONCE(1, "Unsupported EL2 exception injection %d\n", type);
+	}
+
+	vcpu->arch.flags |= (KVM_ARM64_EXCEPT_AA64_EL2		|
+			     KVM_ARM64_PENDING_EXCEPTION);
+
+	vcpu_write_sys_reg(vcpu, esr_el2, ESR_EL2);
+}
+
+/*
+ * Emulate taking an exception to EL2.
+ * See ARM ARM J8.1.2 AArch64.TakeException()
+ */
+static int kvm_inject_nested(struct kvm_vcpu *vcpu, u64 esr_el2,
+			     enum exception_type type)
+{
+	u64 pstate, mode;
+	bool direct_inject;
+
+	if (!vcpu_has_nv(vcpu)) {
+		kvm_err("Unexpected call to %s for the non-nesting configuration\n",
+				__func__);
+		return -EINVAL;
+	}
+
+	/*
+	 * As for ERET, we can avoid doing too much on the injection path by
+	 * checking that we either took the exception from a VHE host
+	 * userspace or from vEL2. In these cases, there is no change in
+	 * translation regime (or anything else), so let's do as little as
+	 * possible.
+	 */
+	pstate = *vcpu_cpsr(vcpu);
+	mode = pstate & (PSR_MODE_MASK | PSR_MODE32_BIT);
+
+	direct_inject  = (mode == PSR_MODE_EL0t &&
+			  vcpu_el2_e2h_is_set(vcpu) &&
+			  vcpu_el2_tge_is_set(vcpu));
+	direct_inject |= (mode == PSR_MODE_EL2h || mode == PSR_MODE_EL2t);
+
+	if (direct_inject) {
+		kvm_inject_el2_exception(vcpu, esr_el2, type);
+		return 1;
+	}
+
+	preempt_disable();
+	kvm_arch_vcpu_put(vcpu);
+
+	kvm_inject_el2_exception(vcpu, esr_el2, type);
+
+	/*
+	 * A hard requirement is that a switch between EL1 and EL2
+	 * contexts has to happen between a put/load, so that we can
+	 * pick the correct timer and interrupt configuration, among
+	 * other things.
+	 *
+	 * Make sure the exception actually took place before we load
+	 * the new context.
+	 */
+	__kvm_adjust_pc(vcpu);
+
+	kvm_arch_vcpu_load(vcpu, smp_processor_id());
+	preempt_enable();
+
+	return 1;
+}
+
+int kvm_inject_nested_sync(struct kvm_vcpu *vcpu, u64 esr_el2)
+{
+	return kvm_inject_nested(vcpu, esr_el2, except_type_sync);
+}
+
+int kvm_inject_nested_irq(struct kvm_vcpu *vcpu)
+{
+	/*
+	 * Do not inject an irq if all of the following hold:
+	 *  - the current exception level is EL2,
+	 *  - virtual HCR_EL2.TGE == 0, and
+	 *  - virtual HCR_EL2.IMO == 0.
+	 *
+	 * See Table D1-17 "Physical interrupt target and masking when EL3 is
+	 * not implemented and EL2 is implemented" in ARM DDI 0487C.a.
+	 */
+
+	if (vcpu_is_el2(vcpu) && !vcpu_el2_tge_is_set(vcpu) &&
+	    !(__vcpu_sys_reg(vcpu, HCR_EL2) & HCR_IMO))
+		return 1;
+
+	/* esr_el2 value doesn't matter for exits due to irqs. */
+	return kvm_inject_nested(vcpu, 0, except_type_irq);
+}
diff --git a/arch/arm64/kvm/hyp/exception.c b/arch/arm64/kvm/hyp/exception.c
index 0418399e0a20..93f9c6f97376 100644
--- a/arch/arm64/kvm/hyp/exception.c
+++ b/arch/arm64/kvm/hyp/exception.c
@@ -13,6 +13,7 @@
 #include <hyp/adjust_pc.h>
 #include <linux/kvm_host.h>
 #include <asm/kvm_emulate.h>
+#include <asm/kvm_nested.h>
 
 #if !defined (__KVM_NVHE_HYPERVISOR__) && !defined (__KVM_VHE_HYPERVISOR__)
 #error Hypervisor code only!
@@ -22,7 +23,9 @@ static inline u64 __vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg)
 {
 	u64 val;
 
-	if (__vcpu_read_sys_reg_from_cpu(reg, &val))
+	if (unlikely(vcpu_has_nv(vcpu)))
+		return vcpu_read_sys_reg(vcpu, reg);
+	else if (__vcpu_read_sys_reg_from_cpu(reg, &val))
 		return val;
 
 	return __vcpu_sys_reg(vcpu, reg);
@@ -30,14 +33,24 @@ static inline u64 __vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg)
 
 static inline void __vcpu_write_sys_reg(struct kvm_vcpu *vcpu, u64 val, int reg)
 {
-	if (__vcpu_write_sys_reg_to_cpu(val, reg))
-		return;
-
-	 __vcpu_sys_reg(vcpu, reg) = val;
+	if (unlikely(vcpu_has_nv(vcpu)))
+		vcpu_write_sys_reg(vcpu, val, reg);
+	else if (!__vcpu_write_sys_reg_to_cpu(val, reg))
+		__vcpu_sys_reg(vcpu, reg) = val;
 }
 
-static void __vcpu_write_spsr(struct kvm_vcpu *vcpu, u64 val)
+static void __vcpu_write_spsr(struct kvm_vcpu *vcpu, unsigned long target_mode,
+			      u64 val)
 {
+	if (unlikely(vcpu_has_nv(vcpu))) {
+		if (target_mode == PSR_MODE_EL1h)
+			vcpu_write_sys_reg(vcpu, val, SPSR_EL1);
+		else
+			vcpu_write_sys_reg(vcpu, val, SPSR_EL2);
+
+		return;
+	}
+
 	write_sysreg_el1(val, SYS_SPSR);
 }
 
@@ -97,6 +110,11 @@ static void enter_exception64(struct kvm_vcpu *vcpu, unsigned long target_mode,
 		sctlr = __vcpu_read_sys_reg(vcpu, SCTLR_EL1);
 		__vcpu_write_sys_reg(vcpu, *vcpu_pc(vcpu), ELR_EL1);
 		break;
+	case PSR_MODE_EL2h:
+		vbar = __vcpu_read_sys_reg(vcpu, VBAR_EL2);
+		sctlr = __vcpu_read_sys_reg(vcpu, SCTLR_EL2);
+		__vcpu_write_sys_reg(vcpu, *vcpu_pc(vcpu), ELR_EL2);
+		break;
 	default:
 		/* Don't do that */
 		BUG();
@@ -149,7 +167,7 @@ static void enter_exception64(struct kvm_vcpu *vcpu, unsigned long target_mode,
 	new |= target_mode;
 
 	*vcpu_cpsr(vcpu) = new;
-	__vcpu_write_spsr(vcpu, old);
+	__vcpu_write_spsr(vcpu, target_mode, old);
 }
 
 /*
@@ -320,11 +338,22 @@ static void kvm_inject_exception(struct kvm_vcpu *vcpu)
 		      KVM_ARM64_EXCEPT_AA64_EL1):
 			enter_exception64(vcpu, PSR_MODE_EL1h, except_type_sync);
 			break;
+
+		case (KVM_ARM64_EXCEPT_AA64_ELx_SYNC |
+		      KVM_ARM64_EXCEPT_AA64_EL2):
+			enter_exception64(vcpu, PSR_MODE_EL2h, except_type_sync);
+			break;
+
+		case (KVM_ARM64_EXCEPT_AA64_ELx_IRQ |
+		      KVM_ARM64_EXCEPT_AA64_EL2):
+			enter_exception64(vcpu, PSR_MODE_EL2h, except_type_irq);
+			break;
+
 		default:
 			/*
-			 * Only EL1_SYNC makes sense so far, EL2_{SYNC,IRQ}
-			 * will be implemented at some point. Everything
-			 * else gets silently ignored.
+			 * Only EL1_SYNC and EL2_{SYNC,IRQ} make
+			 * sense so far. Everything else gets silently
+			 * ignored.
 			 */
 			break;
 		}
diff --git a/arch/arm64/kvm/inject_fault.c b/arch/arm64/kvm/inject_fault.c
index b47df73e98d7..81ceee6998cc 100644
--- a/arch/arm64/kvm/inject_fault.c
+++ b/arch/arm64/kvm/inject_fault.c
@@ -12,19 +12,58 @@
 
 #include <linux/kvm_host.h>
 #include <asm/kvm_emulate.h>
+#include <asm/kvm_nested.h>
 #include <asm/esr.h>
 
+static void pend_sync_exception(struct kvm_vcpu *vcpu)
+{
+	vcpu->arch.flags |= (KVM_ARM64_EXCEPT_AA64_ELx_SYNC	|
+			     KVM_ARM64_PENDING_EXCEPTION);
+
+	/* If not nesting, EL1 is the only possible exception target */
+	if (likely(!vcpu_has_nv(vcpu))) {
+		vcpu->arch.flags |= KVM_ARM64_EXCEPT_AA64_EL1;
+		return;
+	}
+
+	/*
+	 * With NV, we need to pick between EL1 and EL2. Note that we
+	 * never deal with a nesting exception here, hence never
+	 * changing context, and the exception itself can be delayed
+	 * until the next entry.
+	 */
+	switch (*vcpu_cpsr(vcpu) & PSR_MODE_MASK) {
+	case PSR_MODE_EL2h:
+	case PSR_MODE_EL2t:
+		vcpu->arch.flags |= KVM_ARM64_EXCEPT_AA64_EL2;
+		break;
+	case PSR_MODE_EL1h:
+	case PSR_MODE_EL1t:
+		vcpu->arch.flags |= KVM_ARM64_EXCEPT_AA64_EL1;
+		break;
+	case PSR_MODE_EL0t:
+		if (vcpu_el2_tge_is_set(vcpu))
+			vcpu->arch.flags |= KVM_ARM64_EXCEPT_AA64_EL2;
+		else
+			vcpu->arch.flags |= KVM_ARM64_EXCEPT_AA64_EL1;
+		break;
+	default:
+		BUG();
+	}
+}
+
+static bool match_target_el(struct kvm_vcpu *vcpu, unsigned long target)
+{
+	return (vcpu->arch.flags & KVM_ARM64_EXCEPT_AA64_EL_MASK) == target;
+}
+
 static void inject_abt64(struct kvm_vcpu *vcpu, bool is_iabt, unsigned long addr)
 {
 	unsigned long cpsr = *vcpu_cpsr(vcpu);
 	bool is_aarch32 = vcpu_mode_is_32bit(vcpu);
 	u32 esr = 0;
 
-	vcpu->arch.flags |= (KVM_ARM64_EXCEPT_AA64_EL1		|
-			     KVM_ARM64_EXCEPT_AA64_ELx_SYNC	|
-			     KVM_ARM64_PENDING_EXCEPTION);
-
-	vcpu_write_sys_reg(vcpu, addr, FAR_EL1);
+	pend_sync_exception(vcpu);
 
 	/*
 	 * Build an {i,d}abort, depending on the level and the
@@ -45,16 +84,22 @@ static void inject_abt64(struct kvm_vcpu *vcpu, bool is_iabt, unsigned long addr
 	if (!is_iabt)
 		esr |= ESR_ELx_EC_DABT_LOW << ESR_ELx_EC_SHIFT;
 
-	vcpu_write_sys_reg(vcpu, esr | ESR_ELx_FSC_EXTABT, ESR_EL1);
+	esr |= ESR_ELx_FSC_EXTABT;
+
+	if (match_target_el(vcpu, KVM_ARM64_EXCEPT_AA64_EL1)) {
+		vcpu_write_sys_reg(vcpu, addr, FAR_EL1);
+		vcpu_write_sys_reg(vcpu, esr, ESR_EL1);
+	} else {
+		vcpu_write_sys_reg(vcpu, addr, FAR_EL2);
+		vcpu_write_sys_reg(vcpu, esr, ESR_EL2);
+	}
 }
 
 static void inject_undef64(struct kvm_vcpu *vcpu)
 {
 	u32 esr = (ESR_ELx_EC_UNKNOWN << ESR_ELx_EC_SHIFT);
 
-	vcpu->arch.flags |= (KVM_ARM64_EXCEPT_AA64_EL1		|
-			     KVM_ARM64_EXCEPT_AA64_ELx_SYNC	|
-			     KVM_ARM64_PENDING_EXCEPTION);
+	pend_sync_exception(vcpu);
 
 	/*
 	 * Build an unknown exception, depending on the instruction
@@ -63,7 +108,10 @@ static void inject_undef64(struct kvm_vcpu *vcpu)
 	if (kvm_vcpu_trap_il_is32bit(vcpu))
 		esr |= ESR_ELx_IL;
 
-	vcpu_write_sys_reg(vcpu, esr, ESR_EL1);
+	if (match_target_el(vcpu, KVM_ARM64_EXCEPT_AA64_EL1))
+		vcpu_write_sys_reg(vcpu, esr, ESR_EL1);
+	else
+		vcpu_write_sys_reg(vcpu, esr, ESR_EL2);
 }
 
 #define DFSR_FSC_EXTABT_LPAE	0x10
diff --git a/arch/arm64/kvm/trace_arm.h b/arch/arm64/kvm/trace_arm.h
index 33e4e7dd2719..f3e46a976125 100644
--- a/arch/arm64/kvm/trace_arm.h
+++ b/arch/arm64/kvm/trace_arm.h
@@ -2,6 +2,7 @@
 #if !defined(_TRACE_ARM_ARM64_KVM_H) || defined(TRACE_HEADER_MULTI_READ)
 #define _TRACE_ARM_ARM64_KVM_H
 
+#include <asm/kvm_emulate.h>
 #include <kvm/arm_arch_timer.h>
 #include <linux/tracepoint.h>
 
@@ -301,6 +302,64 @@ TRACE_EVENT(kvm_timer_emulate,
 		  __entry->timer_idx, __entry->should_fire)
 );
 
+TRACE_EVENT(kvm_nested_eret,
+	TP_PROTO(struct kvm_vcpu *vcpu, unsigned long elr_el2,
+		 unsigned long spsr_el2),
+	TP_ARGS(vcpu, elr_el2, spsr_el2),
+
+	TP_STRUCT__entry(
+		__field(struct kvm_vcpu *,	vcpu)
+		__field(unsigned long,		elr_el2)
+		__field(unsigned long,		spsr_el2)
+		__field(unsigned long,		target_mode)
+		__field(unsigned long,		hcr_el2)
+	),
+
+	TP_fast_assign(
+		__entry->vcpu = vcpu;
+		__entry->elr_el2 = elr_el2;
+		__entry->spsr_el2 = spsr_el2;
+		__entry->target_mode = spsr_el2 & (PSR_MODE_MASK | PSR_MODE32_BIT);
+		__entry->hcr_el2 = __vcpu_sys_reg(vcpu, HCR_EL2);
+	),
+
+	TP_printk("elr_el2: 0x%lx spsr_el2: 0x%08lx (M: %s) hcr_el2: %lx",
+		  __entry->elr_el2, __entry->spsr_el2,
+		  __print_symbolic(__entry->target_mode, kvm_mode_names),
+		  __entry->hcr_el2)
+);
+
+TRACE_EVENT(kvm_inject_nested_exception,
+	TP_PROTO(struct kvm_vcpu *vcpu, u64 esr_el2, int type),
+	TP_ARGS(vcpu, esr_el2, type),
+
+	TP_STRUCT__entry(
+		__field(struct kvm_vcpu *,		vcpu)
+		__field(unsigned long,			esr_el2)
+		__field(int,				type)
+		__field(unsigned long,			spsr_el2)
+		__field(unsigned long,			pc)
+		__field(unsigned long,			source_mode)
+		__field(unsigned long,			hcr_el2)
+	),
+
+	TP_fast_assign(
+		__entry->vcpu = vcpu;
+		__entry->esr_el2 = esr_el2;
+		__entry->type = type;
+		__entry->spsr_el2 = *vcpu_cpsr(vcpu);
+		__entry->pc = *vcpu_pc(vcpu);
+		__entry->source_mode = *vcpu_cpsr(vcpu) & (PSR_MODE_MASK | PSR_MODE32_BIT);
+		__entry->hcr_el2 = __vcpu_sys_reg(vcpu, HCR_EL2);
+	),
+
+	TP_printk("%s: esr_el2 0x%lx elr_el2: 0x%lx spsr_el2: 0x%08lx (M: %s) hcr_el2: %lx",
+		  __print_symbolic(__entry->type, kvm_exception_type_names),
+		  __entry->esr_el2, __entry->pc, __entry->spsr_el2,
+		  __print_symbolic(__entry->source_mode, kvm_mode_names),
+		  __entry->hcr_el2)
+);
+
 #endif /* _TRACE_ARM_ARM64_KVM_H */
 
 #undef TRACE_INCLUDE_PATH
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 378+ messages in thread

* [PATCH v6 09/64] KVM: arm64: nv: Support virtual EL2 exceptions
@ 2022-01-28 12:18   ` Marc Zyngier
  0 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:18 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: kernel-team, Andre Przywara, Christoffer Dall, Chase Conklin,
	Russell King (Oracle),
	mihai.carabas, Ganapatrao Kulkarni

From: Jintack Lim <jintack.lim@linaro.org>

Support injecting exceptions and performing exception returns to and
from virtual EL2.  This must be done entirely in software except when
taking an exception from vEL0 to vEL2 when the virtual HCR_EL2.{E2H,TGE}
== {1,1}  (a VHE guest hypervisor).

Reviewed-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
[maz: switch to common exception injection framework, illegal exeption
 return handling]
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_arm.h     |  17 +++
 arch/arm64/include/asm/kvm_emulate.h |  10 ++
 arch/arm64/include/asm/kvm_host.h    |   1 +
 arch/arm64/kvm/Makefile              |   2 +-
 arch/arm64/kvm/emulate-nested.c      | 197 +++++++++++++++++++++++++++
 arch/arm64/kvm/hyp/exception.c       |  49 +++++--
 arch/arm64/kvm/inject_fault.c        |  68 +++++++--
 arch/arm64/kvm/trace_arm.h           |  59 ++++++++
 8 files changed, 382 insertions(+), 21 deletions(-)
 create mode 100644 arch/arm64/kvm/emulate-nested.c

diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index 01d47c5886dc..e6e3aae87a09 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -359,4 +359,21 @@
 #define CPACR_EL1_TTA		(1 << 28)
 #define CPACR_EL1_DEFAULT	(CPACR_EL1_FPEN | CPACR_EL1_ZEN_EL1EN)
 
+#define kvm_mode_names				\
+	{ PSR_MODE_EL0t,	"EL0t" },	\
+	{ PSR_MODE_EL1t,	"EL1t" },	\
+	{ PSR_MODE_EL1h,	"EL1h" },	\
+	{ PSR_MODE_EL2t,	"EL2t" },	\
+	{ PSR_MODE_EL2h,	"EL2h" },	\
+	{ PSR_MODE_EL3t,	"EL3t" },	\
+	{ PSR_MODE_EL3h,	"EL3h" },	\
+	{ PSR_AA32_MODE_USR,	"32-bit USR" },	\
+	{ PSR_AA32_MODE_FIQ,	"32-bit FIQ" },	\
+	{ PSR_AA32_MODE_IRQ,	"32-bit IRQ" },	\
+	{ PSR_AA32_MODE_SVC,	"32-bit SVC" },	\
+	{ PSR_AA32_MODE_ABT,	"32-bit ABT" },	\
+	{ PSR_AA32_MODE_HYP,	"32-bit HYP" },	\
+	{ PSR_AA32_MODE_UND,	"32-bit UND" },	\
+	{ PSR_AA32_MODE_SYS,	"32-bit SYS" }
+
 #endif /* __ARM64_KVM_ARM_H__ */
diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index ea9a130c4b6a..cb9f123d26f3 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -33,6 +33,12 @@ enum exception_type {
 	except_type_serror	= 0x180,
 };
 
+#define kvm_exception_type_names		\
+	{ except_type_sync,	"SYNC"   },	\
+	{ except_type_irq,	"IRQ"    },	\
+	{ except_type_fiq,	"FIQ"    },	\
+	{ except_type_serror,	"SERROR" }
+
 bool kvm_condition_valid32(const struct kvm_vcpu *vcpu);
 void kvm_skip_instr32(struct kvm_vcpu *vcpu);
 
@@ -43,6 +49,10 @@ void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr);
 
 void kvm_vcpu_wfi(struct kvm_vcpu *vcpu);
 
+void kvm_emulate_nested_eret(struct kvm_vcpu *vcpu);
+int kvm_inject_nested_sync(struct kvm_vcpu *vcpu, u64 esr_el2);
+int kvm_inject_nested_irq(struct kvm_vcpu *vcpu);
+
 static __always_inline bool vcpu_el1_is_32bit(struct kvm_vcpu *vcpu)
 {
 	return !(vcpu->arch.hcr_el2 & HCR_RW);
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 15f690c27baf..8fffe2888403 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -467,6 +467,7 @@ struct kvm_vcpu_arch {
 #define KVM_ARM64_EXCEPT_AA64_ELx_SERR	(3 << 9)
 #define KVM_ARM64_EXCEPT_AA64_EL1	(0 << 11)
 #define KVM_ARM64_EXCEPT_AA64_EL2	(1 << 11)
+#define KVM_ARM64_EXCEPT_AA64_EL_MASK	(1 << 11)
 
 #define KVM_ARM64_DEBUG_STATE_SAVE_SPE	(1 << 12) /* Save SPE context if active  */
 #define KVM_ARM64_DEBUG_STATE_SAVE_TRBE	(1 << 13) /* Save TRBE context if active  */
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index 91861fd8b897..b67c4ebd72b1 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -14,7 +14,7 @@ kvm-y += arm.o mmu.o mmio.o psci.o hypercalls.o pvtime.o \
 	 inject_fault.o va_layout.o handle_exit.o \
 	 guest.o debug.o reset.o sys_regs.o \
 	 vgic-sys-reg-v3.o fpsimd.o pmu.o pkvm.o \
-	 arch_timer.o trng.o\
+	 arch_timer.o trng.o emulate-nested.o \
 	 vgic/vgic.o vgic/vgic-init.o \
 	 vgic/vgic-irqfd.o vgic/vgic-v2.o \
 	 vgic/vgic-v3.o vgic/vgic-v4.o \
diff --git a/arch/arm64/kvm/emulate-nested.c b/arch/arm64/kvm/emulate-nested.c
new file mode 100644
index 000000000000..f52cd4458947
--- /dev/null
+++ b/arch/arm64/kvm/emulate-nested.c
@@ -0,0 +1,197 @@
+/*
+ * Copyright (C) 2016 - Linaro and Columbia University
+ * Author: Jintack Lim <jintack.lim@linaro.org>
+ */
+
+#include <linux/kvm.h>
+#include <linux/kvm_host.h>
+
+#include <asm/kvm_emulate.h>
+#include <asm/kvm_nested.h>
+
+#include "hyp/include/hyp/adjust_pc.h"
+
+#include "trace.h"
+
+static u64 kvm_check_illegal_exception_return(struct kvm_vcpu *vcpu, u64 spsr)
+{
+	u64 mode = spsr & PSR_MODE_MASK;
+
+	/*
+	 * Possible causes for an Illegal Exception Return from EL2:
+	 * - trying to return to EL3
+	 * - trying to return to a 32bit EL
+	 * - trying to return to EL1 with HCR_EL2.TGE set
+	 */
+	if (mode == PSR_MODE_EL3t || mode == PSR_MODE_EL3h ||
+	    spsr & PSR_MODE32_BIT ||
+	    (vcpu_el2_tge_is_set(vcpu) && (mode == PSR_MODE_EL1t ||
+					   mode == PSR_MODE_EL1h))) {
+		/*
+		 * The guest is playing with our nerves. Preserve EL, SP,
+		 * masks, flags from the existing PSTATE, and set IL.
+		 * The HW will then generate an Illegal State Exception
+		 * immediately after ERET.
+		 */
+		spsr = *vcpu_cpsr(vcpu);
+
+		spsr &= (PSR_D_BIT | PSR_A_BIT | PSR_I_BIT | PSR_F_BIT |
+			 PSR_N_BIT | PSR_Z_BIT | PSR_C_BIT | PSR_V_BIT |
+			 PSR_MODE_MASK | PSR_MODE32_BIT);
+		spsr |= PSR_IL_BIT;
+	}
+
+	return spsr;
+}
+
+void kvm_emulate_nested_eret(struct kvm_vcpu *vcpu)
+{
+	u64 spsr, elr, mode;
+	bool direct_eret;
+
+	/*
+	 * Going through the whole put/load motions is a waste of time
+	 * if this is a VHE guest hypervisor returning to its own
+	 * userspace, or the hypervisor performing a local exception
+	 * return. No need to save/restore registers, no need to
+	 * switch S2 MMU. Just do the canonical ERET.
+	 */
+	spsr = vcpu_read_sys_reg(vcpu, SPSR_EL2);
+	spsr = kvm_check_illegal_exception_return(vcpu, spsr);
+
+	mode = spsr & (PSR_MODE_MASK | PSR_MODE32_BIT);
+
+	direct_eret  = (mode == PSR_MODE_EL0t &&
+			vcpu_el2_e2h_is_set(vcpu) &&
+			vcpu_el2_tge_is_set(vcpu));
+	direct_eret |= (mode == PSR_MODE_EL2h || mode == PSR_MODE_EL2t);
+
+	if (direct_eret) {
+		*vcpu_pc(vcpu) = vcpu_read_sys_reg(vcpu, ELR_EL2);
+		*vcpu_cpsr(vcpu) = spsr;
+		trace_kvm_nested_eret(vcpu, *vcpu_pc(vcpu), spsr);
+		return;
+	}
+
+	preempt_disable();
+	kvm_arch_vcpu_put(vcpu);
+
+	elr = __vcpu_sys_reg(vcpu, ELR_EL2);
+
+	trace_kvm_nested_eret(vcpu, elr, spsr);
+
+	/*
+	 * Note that the current exception level is always the virtual EL2,
+	 * since we set HCR_EL2.NV bit only when entering the virtual EL2.
+	 */
+	*vcpu_pc(vcpu) = elr;
+	*vcpu_cpsr(vcpu) = spsr;
+
+	kvm_arch_vcpu_load(vcpu, smp_processor_id());
+	preempt_enable();
+}
+
+static void kvm_inject_el2_exception(struct kvm_vcpu *vcpu, u64 esr_el2,
+				     enum exception_type type)
+{
+	trace_kvm_inject_nested_exception(vcpu, esr_el2, type);
+
+	switch (type) {
+	case except_type_sync:
+		vcpu->arch.flags |= KVM_ARM64_EXCEPT_AA64_ELx_SYNC;
+		break;
+	case except_type_irq:
+		vcpu->arch.flags |= KVM_ARM64_EXCEPT_AA64_ELx_IRQ;
+		break;
+	default:
+		WARN_ONCE(1, "Unsupported EL2 exception injection %d\n", type);
+	}
+
+	vcpu->arch.flags |= (KVM_ARM64_EXCEPT_AA64_EL2		|
+			     KVM_ARM64_PENDING_EXCEPTION);
+
+	vcpu_write_sys_reg(vcpu, esr_el2, ESR_EL2);
+}
+
+/*
+ * Emulate taking an exception to EL2.
+ * See ARM ARM J8.1.2 AArch64.TakeException()
+ */
+static int kvm_inject_nested(struct kvm_vcpu *vcpu, u64 esr_el2,
+			     enum exception_type type)
+{
+	u64 pstate, mode;
+	bool direct_inject;
+
+	if (!vcpu_has_nv(vcpu)) {
+		kvm_err("Unexpected call to %s for the non-nesting configuration\n",
+				__func__);
+		return -EINVAL;
+	}
+
+	/*
+	 * As for ERET, we can avoid doing too much on the injection path by
+	 * checking that we either took the exception from a VHE host
+	 * userspace or from vEL2. In these cases, there is no change in
+	 * translation regime (or anything else), so let's do as little as
+	 * possible.
+	 */
+	pstate = *vcpu_cpsr(vcpu);
+	mode = pstate & (PSR_MODE_MASK | PSR_MODE32_BIT);
+
+	direct_inject  = (mode == PSR_MODE_EL0t &&
+			  vcpu_el2_e2h_is_set(vcpu) &&
+			  vcpu_el2_tge_is_set(vcpu));
+	direct_inject |= (mode == PSR_MODE_EL2h || mode == PSR_MODE_EL2t);
+
+	if (direct_inject) {
+		kvm_inject_el2_exception(vcpu, esr_el2, type);
+		return 1;
+	}
+
+	preempt_disable();
+	kvm_arch_vcpu_put(vcpu);
+
+	kvm_inject_el2_exception(vcpu, esr_el2, type);
+
+	/*
+	 * A hard requirement is that a switch between EL1 and EL2
+	 * contexts has to happen between a put/load, so that we can
+	 * pick the correct timer and interrupt configuration, among
+	 * other things.
+	 *
+	 * Make sure the exception actually took place before we load
+	 * the new context.
+	 */
+	__kvm_adjust_pc(vcpu);
+
+	kvm_arch_vcpu_load(vcpu, smp_processor_id());
+	preempt_enable();
+
+	return 1;
+}
+
+int kvm_inject_nested_sync(struct kvm_vcpu *vcpu, u64 esr_el2)
+{
+	return kvm_inject_nested(vcpu, esr_el2, except_type_sync);
+}
+
+int kvm_inject_nested_irq(struct kvm_vcpu *vcpu)
+{
+	/*
+	 * Do not inject an irq if the:
+	 *  - Current exception level is EL2, and
+	 *  - virtual HCR_EL2.TGE == 0
+	 *  - virtual HCR_EL2.IMO == 0
+	 *
+	 * See Table D1-17 "Physical interrupt target and masking when EL3 is
+	 * not implemented and EL2 is implemented" in ARM DDI 0487C.a.
+	 */
+
+	if (vcpu_is_el2(vcpu) && !vcpu_el2_tge_is_set(vcpu) &&
+	    !(__vcpu_sys_reg(vcpu, HCR_EL2) & HCR_IMO))
+		return 1;
+
+	/* esr_el2 value doesn't matter for exits due to irqs. */
+	return kvm_inject_nested(vcpu, 0, except_type_irq);
+}
diff --git a/arch/arm64/kvm/hyp/exception.c b/arch/arm64/kvm/hyp/exception.c
index 0418399e0a20..93f9c6f97376 100644
--- a/arch/arm64/kvm/hyp/exception.c
+++ b/arch/arm64/kvm/hyp/exception.c
@@ -13,6 +13,7 @@
 #include <hyp/adjust_pc.h>
 #include <linux/kvm_host.h>
 #include <asm/kvm_emulate.h>
+#include <asm/kvm_nested.h>
 
 #if !defined (__KVM_NVHE_HYPERVISOR__) && !defined (__KVM_VHE_HYPERVISOR__)
 #error Hypervisor code only!
@@ -22,7 +23,9 @@ static inline u64 __vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg)
 {
 	u64 val;
 
-	if (__vcpu_read_sys_reg_from_cpu(reg, &val))
+	if (unlikely(vcpu_has_nv(vcpu)))
+		return vcpu_read_sys_reg(vcpu, reg);
+	else if (__vcpu_read_sys_reg_from_cpu(reg, &val))
 		return val;
 
 	return __vcpu_sys_reg(vcpu, reg);
@@ -30,14 +33,24 @@ static inline u64 __vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg)
 
 static inline void __vcpu_write_sys_reg(struct kvm_vcpu *vcpu, u64 val, int reg)
 {
-	if (__vcpu_write_sys_reg_to_cpu(val, reg))
-		return;
-
-	 __vcpu_sys_reg(vcpu, reg) = val;
+	if (unlikely(vcpu_has_nv(vcpu)))
+		vcpu_write_sys_reg(vcpu, val, reg);
+	else if (!__vcpu_write_sys_reg_to_cpu(val, reg))
+		__vcpu_sys_reg(vcpu, reg) = val;
 }
 
-static void __vcpu_write_spsr(struct kvm_vcpu *vcpu, u64 val)
+static void __vcpu_write_spsr(struct kvm_vcpu *vcpu, unsigned long target_mode,
+			      u64 val)
 {
+	if (unlikely(vcpu_has_nv(vcpu))) {
+		if (target_mode == PSR_MODE_EL1h)
+			vcpu_write_sys_reg(vcpu, val, SPSR_EL1);
+		else
+			vcpu_write_sys_reg(vcpu, val, SPSR_EL2);
+
+		return;
+	}
+
 	write_sysreg_el1(val, SYS_SPSR);
 }
 
@@ -97,6 +110,11 @@ static void enter_exception64(struct kvm_vcpu *vcpu, unsigned long target_mode,
 		sctlr = __vcpu_read_sys_reg(vcpu, SCTLR_EL1);
 		__vcpu_write_sys_reg(vcpu, *vcpu_pc(vcpu), ELR_EL1);
 		break;
+	case PSR_MODE_EL2h:
+		vbar = __vcpu_read_sys_reg(vcpu, VBAR_EL2);
+		sctlr = __vcpu_read_sys_reg(vcpu, SCTLR_EL2);
+		__vcpu_write_sys_reg(vcpu, *vcpu_pc(vcpu), ELR_EL2);
+		break;
 	default:
 		/* Don't do that */
 		BUG();
@@ -149,7 +167,7 @@ static void enter_exception64(struct kvm_vcpu *vcpu, unsigned long target_mode,
 	new |= target_mode;
 
 	*vcpu_cpsr(vcpu) = new;
-	__vcpu_write_spsr(vcpu, old);
+	__vcpu_write_spsr(vcpu, target_mode, old);
 }
 
 /*
@@ -320,11 +338,22 @@ static void kvm_inject_exception(struct kvm_vcpu *vcpu)
 		      KVM_ARM64_EXCEPT_AA64_EL1):
 			enter_exception64(vcpu, PSR_MODE_EL1h, except_type_sync);
 			break;
+
+		case (KVM_ARM64_EXCEPT_AA64_ELx_SYNC |
+		      KVM_ARM64_EXCEPT_AA64_EL2):
+			enter_exception64(vcpu, PSR_MODE_EL2h, except_type_sync);
+			break;
+
+		case (KVM_ARM64_EXCEPT_AA64_ELx_IRQ |
+		      KVM_ARM64_EXCEPT_AA64_EL2):
+			enter_exception64(vcpu, PSR_MODE_EL2h, except_type_irq);
+			break;
+
 		default:
 			/*
-			 * Only EL1_SYNC makes sense so far, EL2_{SYNC,IRQ}
-			 * will be implemented at some point. Everything
-			 * else gets silently ignored.
+			 * Only EL1_SYNC and EL2_{SYNC,IRQ} makes
+			 * sense so far. Everything else gets silently
+			 * ignored.
 			 */
 			break;
 		}
diff --git a/arch/arm64/kvm/inject_fault.c b/arch/arm64/kvm/inject_fault.c
index b47df73e98d7..81ceee6998cc 100644
--- a/arch/arm64/kvm/inject_fault.c
+++ b/arch/arm64/kvm/inject_fault.c
@@ -12,19 +12,58 @@
 
 #include <linux/kvm_host.h>
 #include <asm/kvm_emulate.h>
+#include <asm/kvm_nested.h>
 #include <asm/esr.h>
 
+static void pend_sync_exception(struct kvm_vcpu *vcpu)
+{
+	vcpu->arch.flags |= (KVM_ARM64_EXCEPT_AA64_ELx_SYNC	|
+			     KVM_ARM64_PENDING_EXCEPTION);
+
+	/* If not nesting, EL1 is the only possible exception target */
+	if (likely(!vcpu_has_nv(vcpu))) {
+		vcpu->arch.flags |= KVM_ARM64_EXCEPT_AA64_EL1;
+		return;
+	}
+
+	/*
+	 * With NV, we need to pick between EL1 and EL2. Note that we
+	 * never deal with a nesting exception here, hence never
+	 * changing context, and the exception itself can be delayed
+	 * until the next entry.
+	 */
+	switch(*vcpu_cpsr(vcpu) & PSR_MODE_MASK) {
+	case PSR_MODE_EL2h:
+	case PSR_MODE_EL2t:
+		vcpu->arch.flags |= KVM_ARM64_EXCEPT_AA64_EL2;
+		break;
+	case PSR_MODE_EL1h:
+	case PSR_MODE_EL1t:
+		vcpu->arch.flags |= KVM_ARM64_EXCEPT_AA64_EL1;
+		break;
+	case PSR_MODE_EL0t:
+		if (vcpu_el2_tge_is_set(vcpu))
+			vcpu->arch.flags |= KVM_ARM64_EXCEPT_AA64_EL2;
+		else
+			vcpu->arch.flags |= KVM_ARM64_EXCEPT_AA64_EL1;
+		break;
+	default:
+		BUG();
+	}
+}
+
+static bool match_target_el(struct kvm_vcpu *vcpu, unsigned long target)
+{
+	return (vcpu->arch.flags & KVM_ARM64_EXCEPT_AA64_EL_MASK) == target;
+}
+
 static void inject_abt64(struct kvm_vcpu *vcpu, bool is_iabt, unsigned long addr)
 {
 	unsigned long cpsr = *vcpu_cpsr(vcpu);
 	bool is_aarch32 = vcpu_mode_is_32bit(vcpu);
 	u32 esr = 0;
 
-	vcpu->arch.flags |= (KVM_ARM64_EXCEPT_AA64_EL1		|
-			     KVM_ARM64_EXCEPT_AA64_ELx_SYNC	|
-			     KVM_ARM64_PENDING_EXCEPTION);
-
-	vcpu_write_sys_reg(vcpu, addr, FAR_EL1);
+	pend_sync_exception(vcpu);
 
 	/*
 	 * Build an {i,d}abort, depending on the level and the
@@ -45,16 +84,22 @@ static void inject_abt64(struct kvm_vcpu *vcpu, bool is_iabt, unsigned long addr
 	if (!is_iabt)
 		esr |= ESR_ELx_EC_DABT_LOW << ESR_ELx_EC_SHIFT;
 
-	vcpu_write_sys_reg(vcpu, esr | ESR_ELx_FSC_EXTABT, ESR_EL1);
+	esr |= ESR_ELx_FSC_EXTABT;
+
+	if (match_target_el(vcpu, KVM_ARM64_EXCEPT_AA64_EL1)) {
+		vcpu_write_sys_reg(vcpu, addr, FAR_EL1);
+		vcpu_write_sys_reg(vcpu, esr, ESR_EL1);
+	} else {
+		vcpu_write_sys_reg(vcpu, addr, FAR_EL2);
+		vcpu_write_sys_reg(vcpu, esr, ESR_EL2);
+	}
 }
 
 static void inject_undef64(struct kvm_vcpu *vcpu)
 {
 	u32 esr = (ESR_ELx_EC_UNKNOWN << ESR_ELx_EC_SHIFT);
 
-	vcpu->arch.flags |= (KVM_ARM64_EXCEPT_AA64_EL1		|
-			     KVM_ARM64_EXCEPT_AA64_ELx_SYNC	|
-			     KVM_ARM64_PENDING_EXCEPTION);
+	pend_sync_exception(vcpu);
 
 	/*
 	 * Build an unknown exception, depending on the instruction
@@ -63,7 +108,10 @@ static void inject_undef64(struct kvm_vcpu *vcpu)
 	if (kvm_vcpu_trap_il_is32bit(vcpu))
 		esr |= ESR_ELx_IL;
 
-	vcpu_write_sys_reg(vcpu, esr, ESR_EL1);
+	if (match_target_el(vcpu, KVM_ARM64_EXCEPT_AA64_EL1))
+		vcpu_write_sys_reg(vcpu, esr, ESR_EL1);
+	else
+		vcpu_write_sys_reg(vcpu, esr, ESR_EL2);
 }
 
 #define DFSR_FSC_EXTABT_LPAE	0x10
diff --git a/arch/arm64/kvm/trace_arm.h b/arch/arm64/kvm/trace_arm.h
index 33e4e7dd2719..f3e46a976125 100644
--- a/arch/arm64/kvm/trace_arm.h
+++ b/arch/arm64/kvm/trace_arm.h
@@ -2,6 +2,7 @@
 #if !defined(_TRACE_ARM_ARM64_KVM_H) || defined(TRACE_HEADER_MULTI_READ)
 #define _TRACE_ARM_ARM64_KVM_H
 
+#include <asm/kvm_emulate.h>
 #include <kvm/arm_arch_timer.h>
 #include <linux/tracepoint.h>
 
@@ -301,6 +302,64 @@ TRACE_EVENT(kvm_timer_emulate,
 		  __entry->timer_idx, __entry->should_fire)
 );
 
+TRACE_EVENT(kvm_nested_eret,
+	TP_PROTO(struct kvm_vcpu *vcpu, unsigned long elr_el2,
+		 unsigned long spsr_el2),
+	TP_ARGS(vcpu, elr_el2, spsr_el2),
+
+	TP_STRUCT__entry(
+		__field(struct kvm_vcpu *,	vcpu)
+		__field(unsigned long,		elr_el2)
+		__field(unsigned long,		spsr_el2)
+		__field(unsigned long,		target_mode)
+		__field(unsigned long,		hcr_el2)
+	),
+
+	TP_fast_assign(
+		__entry->vcpu = vcpu;
+		__entry->elr_el2 = elr_el2;
+		__entry->spsr_el2 = spsr_el2;
+		__entry->target_mode = spsr_el2 & (PSR_MODE_MASK | PSR_MODE32_BIT);
+		__entry->hcr_el2 = __vcpu_sys_reg(vcpu, HCR_EL2);
+	),
+
+	TP_printk("elr_el2: 0x%lx spsr_el2: 0x%08lx (M: %s) hcr_el2: %lx",
+		  __entry->elr_el2, __entry->spsr_el2,
+		  __print_symbolic(__entry->target_mode, kvm_mode_names),
+		  __entry->hcr_el2)
+);
+
+TRACE_EVENT(kvm_inject_nested_exception,
+	TP_PROTO(struct kvm_vcpu *vcpu, u64 esr_el2, int type),
+	TP_ARGS(vcpu, esr_el2, type),
+
+	TP_STRUCT__entry(
+		__field(struct kvm_vcpu *,		vcpu)
+		__field(unsigned long,			esr_el2)
+		__field(int,				type)
+		__field(unsigned long,			spsr_el2)
+		__field(unsigned long,			pc)
+		__field(unsigned long,			source_mode)
+		__field(unsigned long,			hcr_el2)
+	),
+
+	TP_fast_assign(
+		__entry->vcpu = vcpu;
+		__entry->esr_el2 = esr_el2;
+		__entry->type = type;
+		__entry->spsr_el2 = *vcpu_cpsr(vcpu);
+		__entry->pc = *vcpu_pc(vcpu);
+		__entry->source_mode = *vcpu_cpsr(vcpu) & (PSR_MODE_MASK | PSR_MODE32_BIT);
+		__entry->hcr_el2 = __vcpu_sys_reg(vcpu, HCR_EL2);
+	),
+
+	TP_printk("%s: esr_el2 0x%lx elr_el2: 0x%lx spsr_el2: 0x%08lx (M: %s) hcr_el2: %lx",
+		  __print_symbolic(__entry->type, kvm_exception_type_names),
+		  __entry->esr_el2, __entry->pc, __entry->spsr_el2,
+		  __print_symbolic(__entry->source_mode, kvm_mode_names),
+		  __entry->hcr_el2)
+);
+
 #endif /* _TRACE_ARM_ARM64_KVM_H */
 
 #undef TRACE_INCLUDE_PATH
-- 
2.30.2

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 378+ messages in thread

* [PATCH v6 09/64] KVM: arm64: nv: Support virtual EL2 exceptions
@ 2022-01-28 12:18   ` Marc Zyngier
  0 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:18 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Jintack Lim, Haibo Xu,
	Ganapatrao Kulkarni, Chase Conklin, Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

From: Jintack Lim <jintack.lim@linaro.org>

Support injecting exceptions and performing exception returns to and
from virtual EL2.  This must be done entirely in software except when
taking an exception from vEL0 to vEL2 when the virtual HCR_EL2.{E2H,TGE}
== {1,1}  (a VHE guest hypervisor).

Reviewed-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
[maz: switch to common exception injection framework, illegal exeption
 return handling]
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_arm.h     |  17 +++
 arch/arm64/include/asm/kvm_emulate.h |  10 ++
 arch/arm64/include/asm/kvm_host.h    |   1 +
 arch/arm64/kvm/Makefile              |   2 +-
 arch/arm64/kvm/emulate-nested.c      | 197 +++++++++++++++++++++++++++
 arch/arm64/kvm/hyp/exception.c       |  49 +++++--
 arch/arm64/kvm/inject_fault.c        |  68 +++++++--
 arch/arm64/kvm/trace_arm.h           |  59 ++++++++
 8 files changed, 382 insertions(+), 21 deletions(-)
 create mode 100644 arch/arm64/kvm/emulate-nested.c

diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index 01d47c5886dc..e6e3aae87a09 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -359,4 +359,21 @@
 #define CPACR_EL1_TTA		(1 << 28)
 #define CPACR_EL1_DEFAULT	(CPACR_EL1_FPEN | CPACR_EL1_ZEN_EL1EN)
 
+#define kvm_mode_names				\
+	{ PSR_MODE_EL0t,	"EL0t" },	\
+	{ PSR_MODE_EL1t,	"EL1t" },	\
+	{ PSR_MODE_EL1h,	"EL1h" },	\
+	{ PSR_MODE_EL2t,	"EL2t" },	\
+	{ PSR_MODE_EL2h,	"EL2h" },	\
+	{ PSR_MODE_EL3t,	"EL3t" },	\
+	{ PSR_MODE_EL3h,	"EL3h" },	\
+	{ PSR_AA32_MODE_USR,	"32-bit USR" },	\
+	{ PSR_AA32_MODE_FIQ,	"32-bit FIQ" },	\
+	{ PSR_AA32_MODE_IRQ,	"32-bit IRQ" },	\
+	{ PSR_AA32_MODE_SVC,	"32-bit SVC" },	\
+	{ PSR_AA32_MODE_ABT,	"32-bit ABT" },	\
+	{ PSR_AA32_MODE_HYP,	"32-bit HYP" },	\
+	{ PSR_AA32_MODE_UND,	"32-bit UND" },	\
+	{ PSR_AA32_MODE_SYS,	"32-bit SYS" }
+
 #endif /* __ARM64_KVM_ARM_H__ */
diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index ea9a130c4b6a..cb9f123d26f3 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -33,6 +33,12 @@ enum exception_type {
 	except_type_serror	= 0x180,
 };
 
+#define kvm_exception_type_names		\
+	{ except_type_sync,	"SYNC"   },	\
+	{ except_type_irq,	"IRQ"    },	\
+	{ except_type_fiq,	"FIQ"    },	\
+	{ except_type_serror,	"SERROR" }
+
 bool kvm_condition_valid32(const struct kvm_vcpu *vcpu);
 void kvm_skip_instr32(struct kvm_vcpu *vcpu);
 
@@ -43,6 +49,10 @@ void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr);
 
 void kvm_vcpu_wfi(struct kvm_vcpu *vcpu);
 
+void kvm_emulate_nested_eret(struct kvm_vcpu *vcpu);
+int kvm_inject_nested_sync(struct kvm_vcpu *vcpu, u64 esr_el2);
+int kvm_inject_nested_irq(struct kvm_vcpu *vcpu);
+
 static __always_inline bool vcpu_el1_is_32bit(struct kvm_vcpu *vcpu)
 {
 	return !(vcpu->arch.hcr_el2 & HCR_RW);
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 15f690c27baf..8fffe2888403 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -467,6 +467,7 @@ struct kvm_vcpu_arch {
 #define KVM_ARM64_EXCEPT_AA64_ELx_SERR	(3 << 9)
 #define KVM_ARM64_EXCEPT_AA64_EL1	(0 << 11)
 #define KVM_ARM64_EXCEPT_AA64_EL2	(1 << 11)
+#define KVM_ARM64_EXCEPT_AA64_EL_MASK	(1 << 11)
 
 #define KVM_ARM64_DEBUG_STATE_SAVE_SPE	(1 << 12) /* Save SPE context if active  */
 #define KVM_ARM64_DEBUG_STATE_SAVE_TRBE	(1 << 13) /* Save TRBE context if active  */
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index 91861fd8b897..b67c4ebd72b1 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -14,7 +14,7 @@ kvm-y += arm.o mmu.o mmio.o psci.o hypercalls.o pvtime.o \
 	 inject_fault.o va_layout.o handle_exit.o \
 	 guest.o debug.o reset.o sys_regs.o \
 	 vgic-sys-reg-v3.o fpsimd.o pmu.o pkvm.o \
-	 arch_timer.o trng.o\
+	 arch_timer.o trng.o emulate-nested.o \
 	 vgic/vgic.o vgic/vgic-init.o \
 	 vgic/vgic-irqfd.o vgic/vgic-v2.o \
 	 vgic/vgic-v3.o vgic/vgic-v4.o \
diff --git a/arch/arm64/kvm/emulate-nested.c b/arch/arm64/kvm/emulate-nested.c
new file mode 100644
index 000000000000..f52cd4458947
--- /dev/null
+++ b/arch/arm64/kvm/emulate-nested.c
@@ -0,0 +1,197 @@
+/*
+ * Copyright (C) 2016 - Linaro and Columbia University
+ * Author: Jintack Lim <jintack.lim@linaro.org>
+ */
+
+#include <linux/kvm.h>
+#include <linux/kvm_host.h>
+
+#include <asm/kvm_emulate.h>
+#include <asm/kvm_nested.h>
+
+#include "hyp/include/hyp/adjust_pc.h"
+
+#include "trace.h"
+
+static u64 kvm_check_illegal_exception_return(struct kvm_vcpu *vcpu, u64 spsr)
+{
+	u64 mode = spsr & PSR_MODE_MASK;
+
+	/*
+	 * Possible causes for an Illegal Exception Return from EL2:
+	 * - trying to return to EL3
+	 * - trying to return to a 32bit EL
+	 * - trying to return to EL1 with HCR_EL2.TGE set
+	 */
+	if (mode == PSR_MODE_EL3t || mode == PSR_MODE_EL3h ||
+	    spsr & PSR_MODE32_BIT ||
+	    (vcpu_el2_tge_is_set(vcpu) && (mode == PSR_MODE_EL1t ||
+					   mode == PSR_MODE_EL1h))) {
+		/*
+		 * The guest is playing with our nerves. Preserve EL, SP,
+		 * masks, flags from the existing PSTATE, and set IL.
+		 * The HW will then generate an Illegal State Exception
+		 * immediately after ERET.
+		 */
+		spsr = *vcpu_cpsr(vcpu);
+
+		spsr &= (PSR_D_BIT | PSR_A_BIT | PSR_I_BIT | PSR_F_BIT |
+			 PSR_N_BIT | PSR_Z_BIT | PSR_C_BIT | PSR_V_BIT |
+			 PSR_MODE_MASK | PSR_MODE32_BIT);
+		spsr |= PSR_IL_BIT;
+	}
+
+	return spsr;
+}
+
+void kvm_emulate_nested_eret(struct kvm_vcpu *vcpu)
+{
+	u64 spsr, elr, mode;
+	bool direct_eret;
+
+	/*
+	 * Going through the whole put/load motions is a waste of time
+	 * if this is a VHE guest hypervisor returning to its own
+	 * userspace, or the hypervisor performing a local exception
+	 * return. No need to save/restore registers, no need to
+	 * switch S2 MMU. Just do the canonical ERET.
+	 */
+	spsr = vcpu_read_sys_reg(vcpu, SPSR_EL2);
+	spsr = kvm_check_illegal_exception_return(vcpu, spsr);
+
+	mode = spsr & (PSR_MODE_MASK | PSR_MODE32_BIT);
+
+	direct_eret  = (mode == PSR_MODE_EL0t &&
+			vcpu_el2_e2h_is_set(vcpu) &&
+			vcpu_el2_tge_is_set(vcpu));
+	direct_eret |= (mode == PSR_MODE_EL2h || mode == PSR_MODE_EL2t);
+
+	if (direct_eret) {
+		*vcpu_pc(vcpu) = vcpu_read_sys_reg(vcpu, ELR_EL2);
+		*vcpu_cpsr(vcpu) = spsr;
+		trace_kvm_nested_eret(vcpu, *vcpu_pc(vcpu), spsr);
+		return;
+	}
+
+	preempt_disable();
+	kvm_arch_vcpu_put(vcpu);
+
+	elr = __vcpu_sys_reg(vcpu, ELR_EL2);
+
+	trace_kvm_nested_eret(vcpu, elr, spsr);
+
+	/*
+	 * Note that the current exception level is always the virtual EL2,
+	 * since we only set the HCR_EL2.NV bit when entering the virtual EL2.
+	 */
+	*vcpu_pc(vcpu) = elr;
+	*vcpu_cpsr(vcpu) = spsr;
+
+	kvm_arch_vcpu_load(vcpu, smp_processor_id());
+	preempt_enable();
+}
+
+static void kvm_inject_el2_exception(struct kvm_vcpu *vcpu, u64 esr_el2,
+				     enum exception_type type)
+{
+	trace_kvm_inject_nested_exception(vcpu, esr_el2, type);
+
+	switch (type) {
+	case except_type_sync:
+		vcpu->arch.flags |= KVM_ARM64_EXCEPT_AA64_ELx_SYNC;
+		break;
+	case except_type_irq:
+		vcpu->arch.flags |= KVM_ARM64_EXCEPT_AA64_ELx_IRQ;
+		break;
+	default:
+		WARN_ONCE(1, "Unsupported EL2 exception injection %d\n", type);
+	}
+
+	vcpu->arch.flags |= (KVM_ARM64_EXCEPT_AA64_EL2		|
+			     KVM_ARM64_PENDING_EXCEPTION);
+
+	vcpu_write_sys_reg(vcpu, esr_el2, ESR_EL2);
+}
+
+/*
+ * Emulate taking an exception to EL2.
+ * See ARM ARM J8.1.2 AArch64.TakeException()
+ */
+static int kvm_inject_nested(struct kvm_vcpu *vcpu, u64 esr_el2,
+			     enum exception_type type)
+{
+	u64 pstate, mode;
+	bool direct_inject;
+
+	if (!vcpu_has_nv(vcpu)) {
+		kvm_err("Unexpected call to %s for the non-nesting configuration\n",
+				__func__);
+		return -EINVAL;
+	}
+
+	/*
+	 * As for ERET, we can avoid doing too much on the injection path by
+	 * checking that we either took the exception from a VHE host
+	 * userspace or from vEL2. In these cases, there is no change in
+	 * translation regime (or anything else), so let's do as little as
+	 * possible.
+	 */
+	pstate = *vcpu_cpsr(vcpu);
+	mode = pstate & (PSR_MODE_MASK | PSR_MODE32_BIT);
+
+	direct_inject  = (mode == PSR_MODE_EL0t &&
+			  vcpu_el2_e2h_is_set(vcpu) &&
+			  vcpu_el2_tge_is_set(vcpu));
+	direct_inject |= (mode == PSR_MODE_EL2h || mode == PSR_MODE_EL2t);
+
+	if (direct_inject) {
+		kvm_inject_el2_exception(vcpu, esr_el2, type);
+		return 1;
+	}
+
+	preempt_disable();
+	kvm_arch_vcpu_put(vcpu);
+
+	kvm_inject_el2_exception(vcpu, esr_el2, type);
+
+	/*
+	 * A hard requirement is that a switch between EL1 and EL2
+	 * contexts has to happen between a put/load, so that we can
+	 * pick the correct timer and interrupt configuration, among
+	 * other things.
+	 *
+	 * Make sure the exception actually took place before we load
+	 * the new context.
+	 */
+	__kvm_adjust_pc(vcpu);
+
+	kvm_arch_vcpu_load(vcpu, smp_processor_id());
+	preempt_enable();
+
+	return 1;
+}
+
+int kvm_inject_nested_sync(struct kvm_vcpu *vcpu, u64 esr_el2)
+{
+	return kvm_inject_nested(vcpu, esr_el2, except_type_sync);
+}
+
+int kvm_inject_nested_irq(struct kvm_vcpu *vcpu)
+{
+	/*
+	 * Do not inject an irq if the:
+	 *  - Current exception level is EL2, and
+	 *  - virtual HCR_EL2.TGE == 0
+	 *  - virtual HCR_EL2.IMO == 0
+	 *
+	 * See Table D1-17 "Physical interrupt target and masking when EL3 is
+	 * not implemented and EL2 is implemented" in ARM DDI 0487C.a.
+	 */
+
+	if (vcpu_is_el2(vcpu) && !vcpu_el2_tge_is_set(vcpu) &&
+	    !(__vcpu_sys_reg(vcpu, HCR_EL2) & HCR_IMO))
+		return 1;
+
+	/* esr_el2 value doesn't matter for exits due to irqs. */
+	return kvm_inject_nested(vcpu, 0, except_type_irq);
+}
diff --git a/arch/arm64/kvm/hyp/exception.c b/arch/arm64/kvm/hyp/exception.c
index 0418399e0a20..93f9c6f97376 100644
--- a/arch/arm64/kvm/hyp/exception.c
+++ b/arch/arm64/kvm/hyp/exception.c
@@ -13,6 +13,7 @@
 #include <hyp/adjust_pc.h>
 #include <linux/kvm_host.h>
 #include <asm/kvm_emulate.h>
+#include <asm/kvm_nested.h>
 
 #if !defined (__KVM_NVHE_HYPERVISOR__) && !defined (__KVM_VHE_HYPERVISOR__)
 #error Hypervisor code only!
@@ -22,7 +23,9 @@ static inline u64 __vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg)
 {
 	u64 val;
 
-	if (__vcpu_read_sys_reg_from_cpu(reg, &val))
+	if (unlikely(vcpu_has_nv(vcpu)))
+		return vcpu_read_sys_reg(vcpu, reg);
+	else if (__vcpu_read_sys_reg_from_cpu(reg, &val))
 		return val;
 
 	return __vcpu_sys_reg(vcpu, reg);
@@ -30,14 +33,24 @@ static inline u64 __vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg)
 
 static inline void __vcpu_write_sys_reg(struct kvm_vcpu *vcpu, u64 val, int reg)
 {
-	if (__vcpu_write_sys_reg_to_cpu(val, reg))
-		return;
-
-	 __vcpu_sys_reg(vcpu, reg) = val;
+	if (unlikely(vcpu_has_nv(vcpu)))
+		vcpu_write_sys_reg(vcpu, val, reg);
+	else if (!__vcpu_write_sys_reg_to_cpu(val, reg))
+		__vcpu_sys_reg(vcpu, reg) = val;
 }
 
-static void __vcpu_write_spsr(struct kvm_vcpu *vcpu, u64 val)
+static void __vcpu_write_spsr(struct kvm_vcpu *vcpu, unsigned long target_mode,
+			      u64 val)
 {
+	if (unlikely(vcpu_has_nv(vcpu))) {
+		if (target_mode == PSR_MODE_EL1h)
+			vcpu_write_sys_reg(vcpu, val, SPSR_EL1);
+		else
+			vcpu_write_sys_reg(vcpu, val, SPSR_EL2);
+
+		return;
+	}
+
 	write_sysreg_el1(val, SYS_SPSR);
 }
 
@@ -97,6 +110,11 @@ static void enter_exception64(struct kvm_vcpu *vcpu, unsigned long target_mode,
 		sctlr = __vcpu_read_sys_reg(vcpu, SCTLR_EL1);
 		__vcpu_write_sys_reg(vcpu, *vcpu_pc(vcpu), ELR_EL1);
 		break;
+	case PSR_MODE_EL2h:
+		vbar = __vcpu_read_sys_reg(vcpu, VBAR_EL2);
+		sctlr = __vcpu_read_sys_reg(vcpu, SCTLR_EL2);
+		__vcpu_write_sys_reg(vcpu, *vcpu_pc(vcpu), ELR_EL2);
+		break;
 	default:
 		/* Don't do that */
 		BUG();
@@ -149,7 +167,7 @@ static void enter_exception64(struct kvm_vcpu *vcpu, unsigned long target_mode,
 	new |= target_mode;
 
 	*vcpu_cpsr(vcpu) = new;
-	__vcpu_write_spsr(vcpu, old);
+	__vcpu_write_spsr(vcpu, target_mode, old);
 }
 
 /*
@@ -320,11 +338,22 @@ static void kvm_inject_exception(struct kvm_vcpu *vcpu)
 		      KVM_ARM64_EXCEPT_AA64_EL1):
 			enter_exception64(vcpu, PSR_MODE_EL1h, except_type_sync);
 			break;
+
+		case (KVM_ARM64_EXCEPT_AA64_ELx_SYNC |
+		      KVM_ARM64_EXCEPT_AA64_EL2):
+			enter_exception64(vcpu, PSR_MODE_EL2h, except_type_sync);
+			break;
+
+		case (KVM_ARM64_EXCEPT_AA64_ELx_IRQ |
+		      KVM_ARM64_EXCEPT_AA64_EL2):
+			enter_exception64(vcpu, PSR_MODE_EL2h, except_type_irq);
+			break;
+
 		default:
 			/*
-			 * Only EL1_SYNC makes sense so far, EL2_{SYNC,IRQ}
-			 * will be implemented at some point. Everything
-			 * else gets silently ignored.
+			 * Only EL1_SYNC and EL2_{SYNC,IRQ} make
+			 * sense so far. Everything else gets silently
+			 * ignored.
 			 */
 			break;
 		}
diff --git a/arch/arm64/kvm/inject_fault.c b/arch/arm64/kvm/inject_fault.c
index b47df73e98d7..81ceee6998cc 100644
--- a/arch/arm64/kvm/inject_fault.c
+++ b/arch/arm64/kvm/inject_fault.c
@@ -12,19 +12,58 @@
 
 #include <linux/kvm_host.h>
 #include <asm/kvm_emulate.h>
+#include <asm/kvm_nested.h>
 #include <asm/esr.h>
 
+static void pend_sync_exception(struct kvm_vcpu *vcpu)
+{
+	vcpu->arch.flags |= (KVM_ARM64_EXCEPT_AA64_ELx_SYNC	|
+			     KVM_ARM64_PENDING_EXCEPTION);
+
+	/* If not nesting, EL1 is the only possible exception target */
+	if (likely(!vcpu_has_nv(vcpu))) {
+		vcpu->arch.flags |= KVM_ARM64_EXCEPT_AA64_EL1;
+		return;
+	}
+
+	/*
+	 * With NV, we need to pick between EL1 and EL2. Note that we
+	 * never deal with a nesting exception here, hence never
+	 * changing context, and the exception itself can be delayed
+	 * until the next entry.
+	 */
+	switch(*vcpu_cpsr(vcpu) & PSR_MODE_MASK) {
+	case PSR_MODE_EL2h:
+	case PSR_MODE_EL2t:
+		vcpu->arch.flags |= KVM_ARM64_EXCEPT_AA64_EL2;
+		break;
+	case PSR_MODE_EL1h:
+	case PSR_MODE_EL1t:
+		vcpu->arch.flags |= KVM_ARM64_EXCEPT_AA64_EL1;
+		break;
+	case PSR_MODE_EL0t:
+		if (vcpu_el2_tge_is_set(vcpu))
+			vcpu->arch.flags |= KVM_ARM64_EXCEPT_AA64_EL2;
+		else
+			vcpu->arch.flags |= KVM_ARM64_EXCEPT_AA64_EL1;
+		break;
+	default:
+		BUG();
+	}
+}
+
+static bool match_target_el(struct kvm_vcpu *vcpu, unsigned long target)
+{
+	return (vcpu->arch.flags & KVM_ARM64_EXCEPT_AA64_EL_MASK) == target;
+}
+
 static void inject_abt64(struct kvm_vcpu *vcpu, bool is_iabt, unsigned long addr)
 {
 	unsigned long cpsr = *vcpu_cpsr(vcpu);
 	bool is_aarch32 = vcpu_mode_is_32bit(vcpu);
 	u32 esr = 0;
 
-	vcpu->arch.flags |= (KVM_ARM64_EXCEPT_AA64_EL1		|
-			     KVM_ARM64_EXCEPT_AA64_ELx_SYNC	|
-			     KVM_ARM64_PENDING_EXCEPTION);
-
-	vcpu_write_sys_reg(vcpu, addr, FAR_EL1);
+	pend_sync_exception(vcpu);
 
 	/*
 	 * Build an {i,d}abort, depending on the level and the
@@ -45,16 +84,22 @@ static void inject_abt64(struct kvm_vcpu *vcpu, bool is_iabt, unsigned long addr
 	if (!is_iabt)
 		esr |= ESR_ELx_EC_DABT_LOW << ESR_ELx_EC_SHIFT;
 
-	vcpu_write_sys_reg(vcpu, esr | ESR_ELx_FSC_EXTABT, ESR_EL1);
+	esr |= ESR_ELx_FSC_EXTABT;
+
+	if (match_target_el(vcpu, KVM_ARM64_EXCEPT_AA64_EL1)) {
+		vcpu_write_sys_reg(vcpu, addr, FAR_EL1);
+		vcpu_write_sys_reg(vcpu, esr, ESR_EL1);
+	} else {
+		vcpu_write_sys_reg(vcpu, addr, FAR_EL2);
+		vcpu_write_sys_reg(vcpu, esr, ESR_EL2);
+	}
 }
 
 static void inject_undef64(struct kvm_vcpu *vcpu)
 {
 	u32 esr = (ESR_ELx_EC_UNKNOWN << ESR_ELx_EC_SHIFT);
 
-	vcpu->arch.flags |= (KVM_ARM64_EXCEPT_AA64_EL1		|
-			     KVM_ARM64_EXCEPT_AA64_ELx_SYNC	|
-			     KVM_ARM64_PENDING_EXCEPTION);
+	pend_sync_exception(vcpu);
 
 	/*
 	 * Build an unknown exception, depending on the instruction
@@ -63,7 +108,10 @@ static void inject_undef64(struct kvm_vcpu *vcpu)
 	if (kvm_vcpu_trap_il_is32bit(vcpu))
 		esr |= ESR_ELx_IL;
 
-	vcpu_write_sys_reg(vcpu, esr, ESR_EL1);
+	if (match_target_el(vcpu, KVM_ARM64_EXCEPT_AA64_EL1))
+		vcpu_write_sys_reg(vcpu, esr, ESR_EL1);
+	else
+		vcpu_write_sys_reg(vcpu, esr, ESR_EL2);
 }
 
 #define DFSR_FSC_EXTABT_LPAE	0x10
diff --git a/arch/arm64/kvm/trace_arm.h b/arch/arm64/kvm/trace_arm.h
index 33e4e7dd2719..f3e46a976125 100644
--- a/arch/arm64/kvm/trace_arm.h
+++ b/arch/arm64/kvm/trace_arm.h
@@ -2,6 +2,7 @@
 #if !defined(_TRACE_ARM_ARM64_KVM_H) || defined(TRACE_HEADER_MULTI_READ)
 #define _TRACE_ARM_ARM64_KVM_H
 
+#include <asm/kvm_emulate.h>
 #include <kvm/arm_arch_timer.h>
 #include <linux/tracepoint.h>
 
@@ -301,6 +302,64 @@ TRACE_EVENT(kvm_timer_emulate,
 		  __entry->timer_idx, __entry->should_fire)
 );
 
+TRACE_EVENT(kvm_nested_eret,
+	TP_PROTO(struct kvm_vcpu *vcpu, unsigned long elr_el2,
+		 unsigned long spsr_el2),
+	TP_ARGS(vcpu, elr_el2, spsr_el2),
+
+	TP_STRUCT__entry(
+		__field(struct kvm_vcpu *,	vcpu)
+		__field(unsigned long,		elr_el2)
+		__field(unsigned long,		spsr_el2)
+		__field(unsigned long,		target_mode)
+		__field(unsigned long,		hcr_el2)
+	),
+
+	TP_fast_assign(
+		__entry->vcpu = vcpu;
+		__entry->elr_el2 = elr_el2;
+		__entry->spsr_el2 = spsr_el2;
+		__entry->target_mode = spsr_el2 & (PSR_MODE_MASK | PSR_MODE32_BIT);
+		__entry->hcr_el2 = __vcpu_sys_reg(vcpu, HCR_EL2);
+	),
+
+	TP_printk("elr_el2: 0x%lx spsr_el2: 0x%08lx (M: %s) hcr_el2: %lx",
+		  __entry->elr_el2, __entry->spsr_el2,
+		  __print_symbolic(__entry->target_mode, kvm_mode_names),
+		  __entry->hcr_el2)
+);
+
+TRACE_EVENT(kvm_inject_nested_exception,
+	TP_PROTO(struct kvm_vcpu *vcpu, u64 esr_el2, int type),
+	TP_ARGS(vcpu, esr_el2, type),
+
+	TP_STRUCT__entry(
+		__field(struct kvm_vcpu *,		vcpu)
+		__field(unsigned long,			esr_el2)
+		__field(int,				type)
+		__field(unsigned long,			spsr_el2)
+		__field(unsigned long,			pc)
+		__field(unsigned long,			source_mode)
+		__field(unsigned long,			hcr_el2)
+	),
+
+	TP_fast_assign(
+		__entry->vcpu = vcpu;
+		__entry->esr_el2 = esr_el2;
+		__entry->type = type;
+		__entry->spsr_el2 = *vcpu_cpsr(vcpu);
+		__entry->pc = *vcpu_pc(vcpu);
+		__entry->source_mode = *vcpu_cpsr(vcpu) & (PSR_MODE_MASK | PSR_MODE32_BIT);
+		__entry->hcr_el2 = __vcpu_sys_reg(vcpu, HCR_EL2);
+	),
+
+	TP_printk("%s: esr_el2 0x%lx elr_el2: 0x%lx spsr_el2: 0x%08lx (M: %s) hcr_el2: %lx",
+		  __print_symbolic(__entry->type, kvm_exception_type_names),
+		  __entry->esr_el2, __entry->pc, __entry->spsr_el2,
+		  __print_symbolic(__entry->source_mode, kvm_mode_names),
+		  __entry->hcr_el2)
+);
+
 #endif /* _TRACE_ARM_ARM64_KVM_H */
 
 #undef TRACE_INCLUDE_PATH
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 378+ messages in thread

* [PATCH v6 10/64] KVM: arm64: nv: Inject HVC exceptions to the virtual EL2
  2022-01-28 12:18 ` Marc Zyngier
  (?)
@ 2022-01-28 12:18   ` Marc Zyngier
  -1 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:18 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Jintack Lim, Haibo Xu,
	Ganapatrao Kulkarni, Chase Conklin, Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

From: Jintack Lim <jintack.lim@linaro.org>

As we expect all PSCI calls from the L1 hypervisor to be performed
using SMC when nested virtualization is enabled, it is clear that
all HVC instructions from the VM (including from the virtual EL2)
are supposed to be handled in the virtual EL2.

Forward these to EL2 as required.
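
Concretely, the dispatch this adds boils down to the following (a
condensed sketch of the hunk below, not a new API; note that the exit
ESR, i.e. the HVC EC and immediate, is propagated into the virtual
ESR_EL2 unchanged by kvm_inject_nested_sync()):

	if (vcpu_read_sys_reg(vcpu, HCR_EL2) & HCR_HCD)
		kvm_inject_undefined(vcpu);	/* guest hypervisor disabled HVC */
	else
		kvm_inject_nested_sync(vcpu, kvm_vcpu_get_esr(vcpu));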

Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
[maz: add handling of HCR_EL2.HCD]
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/handle_exit.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index fd2dd26caf91..6bfb5c31cad1 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -16,6 +16,7 @@
 #include <asm/kvm_asm.h>
 #include <asm/kvm_emulate.h>
 #include <asm/kvm_mmu.h>
+#include <asm/kvm_nested.h>
 #include <asm/debug-monitors.h>
 #include <asm/traps.h>
 
@@ -40,6 +41,16 @@ static int handle_hvc(struct kvm_vcpu *vcpu)
 			    kvm_vcpu_hvc_get_imm(vcpu));
 	vcpu->stat.hvc_exit_stat++;
 
+	/* Forward hvc instructions to the virtual EL2 if the guest has EL2. */
+	if (vcpu_has_nv(vcpu)) {
+		if (vcpu_read_sys_reg(vcpu, HCR_EL2) & HCR_HCD)
+			kvm_inject_undefined(vcpu);
+		else
+			kvm_inject_nested_sync(vcpu, kvm_vcpu_get_esr(vcpu));
+
+		return 1;
+	}
+
 	ret = kvm_hvc_call_handler(vcpu);
 	if (ret < 0) {
 		vcpu_set_reg(vcpu, 0, ~0UL);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 378+ messages in thread

* [PATCH v6 11/64] KVM: arm64: nv: Handle trapped ERET from virtual EL2
  2022-01-28 12:18 ` Marc Zyngier
  (?)
@ 2022-01-28 12:18   ` Marc Zyngier
  -1 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:18 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Jintack Lim, Haibo Xu,
	Ganapatrao Kulkarni, Chase Conklin, Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

From: Christoffer Dall <christoffer.dall@arm.com>

When a guest hypervisor running virtual EL2 in EL1 executes an ERET
instruction, we will have set HCR_EL2.NV, which traps ERET to EL2, so
that we can emulate the exception return in software.
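
Since the same EC covers ERET, ERETAA and ERETAB, the handler added
below keys off the ESR_ELx_ERET_ISS_ERET ISS bit (set for the
pointer-authenticated ERETAA/ERETAB forms) to pick between the ptrauth
fallback and the nested exception return emulation. A sketch, mirroring
the hunk:

	if (kvm_vcpu_get_esr(vcpu) & ESR_ELx_ERET_ISS_ERET)
		return kvm_handle_ptrauth(vcpu);	/* ERETAA/ERETAB */

	kvm_emulate_nested_eret(vcpu);			/* plain ERET */
	return 1;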

Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/esr.h     |  4 ++++
 arch/arm64/include/asm/kvm_arm.h |  2 +-
 arch/arm64/kvm/handle_exit.c     | 10 ++++++++++
 3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
index d52a0b269ee8..3574c224889f 100644
--- a/arch/arm64/include/asm/esr.h
+++ b/arch/arm64/include/asm/esr.h
@@ -257,6 +257,10 @@
 		(((e) & ESR_ELx_SYS64_ISS_OP2_MASK) >>		\
 		 ESR_ELx_SYS64_ISS_OP2_SHIFT))
 
+/* ISS field definitions for ERET/ERETAA/ERETAB trapping */
+#define ESR_ELx_ERET_ISS_ERET		0x2
+#define ESR_ELx_ERET_ISS_ERETA		0x1
+
 /*
  * ISS field definitions for floating-point exception traps
  * (FP_EXC_32/FP_EXC_64).
diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index e6e3aae87a09..5acb153a82c8 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -353,7 +353,7 @@
 	ECN(SP_ALIGN), ECN(FP_EXC32), ECN(FP_EXC64), ECN(SERROR), \
 	ECN(BREAKPT_LOW), ECN(BREAKPT_CUR), ECN(SOFTSTP_LOW), \
 	ECN(SOFTSTP_CUR), ECN(WATCHPT_LOW), ECN(WATCHPT_CUR), \
-	ECN(BKPT32), ECN(VECTOR32), ECN(BRK64)
+	ECN(BKPT32), ECN(VECTOR32), ECN(BRK64), ECN(ERET)
 
 #define CPACR_EL1_FPEN		(3 << 20)
 #define CPACR_EL1_TTA		(1 << 28)
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index 6bfb5c31cad1..2bbeed8c9786 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -171,6 +171,15 @@ static int kvm_handle_ptrauth(struct kvm_vcpu *vcpu)
 	return 1;
 }
 
+static int kvm_handle_eret(struct kvm_vcpu *vcpu)
+{
+	if (kvm_vcpu_get_esr(vcpu) & ESR_ELx_ERET_ISS_ERET)
+		return kvm_handle_ptrauth(vcpu);
+
+	kvm_emulate_nested_eret(vcpu);
+	return 1;
+}
+
 static exit_handle_fn arm_exit_handlers[] = {
 	[0 ... ESR_ELx_EC_MAX]	= kvm_handle_unknown_ec,
 	[ESR_ELx_EC_WFx]	= kvm_handle_wfx,
@@ -185,6 +194,7 @@ static exit_handle_fn arm_exit_handlers[] = {
 	[ESR_ELx_EC_SMC64]	= handle_smc,
 	[ESR_ELx_EC_SYS64]	= kvm_handle_sys_reg,
 	[ESR_ELx_EC_SVE]	= handle_sve,
+	[ESR_ELx_EC_ERET]	= kvm_handle_eret,
 	[ESR_ELx_EC_IABT_LOW]	= kvm_handle_guest_abort,
 	[ESR_ELx_EC_DABT_LOW]	= kvm_handle_guest_abort,
 	[ESR_ELx_EC_SOFTSTP_LOW]= kvm_handle_guest_debug,
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 378+ messages in thread

* [PATCH v6 12/64] KVM: arm64: nv: Add non-VHE-EL2->EL1 translation helpers
  2022-01-28 12:18 ` Marc Zyngier
  (?)
@ 2022-01-28 12:18   ` Marc Zyngier
  -1 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:18 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Jintack Lim, Haibo Xu,
	Ganapatrao Kulkarni, Chase Conklin, Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

Some EL2 system registers immediately affect the current execution
of the system, so we need to use their respective EL1 counterparts.
For this we need to define a mapping between the two. In general,
this only affects non-VHE guest hypervisors, as VHE system registers
are compatible with the EL1 counterparts.

These helpers will get used in subsequent patches.
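
As an illustration of how they will be consumed (a sketch borrowing
from the sysreg accessor rework later in the series, not something this
patch adds): a non-VHE guest hypervisor's TCR_EL2 value gets reshaped
into the EL1 layout before being installed in the hardware TCR_EL1,
roughly:

	u64 val = __vcpu_sys_reg(vcpu, TCR_EL2);

	val = translate_tcr_el2_to_tcr_el1(val);
	__vcpu_write_sys_reg_to_cpu(val, TCR_EL1);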

Co-developed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_nested.h | 54 +++++++++++++++++++++++++++++
 1 file changed, 54 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
index fd601ea68d13..5a85be6d8eb3 100644
--- a/arch/arm64/include/asm/kvm_nested.h
+++ b/arch/arm64/include/asm/kvm_nested.h
@@ -2,6 +2,7 @@
 #ifndef __ARM64_KVM_NESTED_H
 #define __ARM64_KVM_NESTED_H
 
+#include <linux/bitfield.h>
 #include <linux/kvm_host.h>
 
 static inline bool vcpu_has_nv(const struct kvm_vcpu *vcpu)
@@ -11,4 +12,57 @@ static inline bool vcpu_has_nv(const struct kvm_vcpu *vcpu)
 		test_bit(KVM_ARM_VCPU_HAS_EL2, vcpu->arch.features));
 }
 
+/* Translation helpers from non-VHE EL2 to EL1 */
+static inline u64 tcr_el2_ps_to_tcr_el1_ips(u64 tcr_el2)
+{
+	return (u64)FIELD_GET(TCR_EL2_PS_MASK, tcr_el2) << TCR_IPS_SHIFT;
+}
+
+static inline u64 translate_tcr_el2_to_tcr_el1(u64 tcr)
+{
+	return TCR_EPD1_MASK |				/* disable TTBR1_EL1 */
+	       ((tcr & TCR_EL2_TBI) ? TCR_TBI0 : 0) |
+	       tcr_el2_ps_to_tcr_el1_ips(tcr) |
+	       (tcr & TCR_EL2_TG0_MASK) |
+	       (tcr & TCR_EL2_ORGN0_MASK) |
+	       (tcr & TCR_EL2_IRGN0_MASK) |
+	       (tcr & TCR_EL2_T0SZ_MASK);
+}
+
+static inline u64 translate_cptr_el2_to_cpacr_el1(u64 cptr_el2)
+{
+	u64 cpacr_el1 = 0;
+
+	if (cptr_el2 & CPTR_EL2_TTA)
+		cpacr_el1 |= CPACR_EL1_TTA;
+	if (!(cptr_el2 & CPTR_EL2_TFP))
+		cpacr_el1 |= CPACR_EL1_FPEN;
+	if (!(cptr_el2 & CPTR_EL2_TZ))
+		cpacr_el1 |= CPACR_EL1_ZEN;
+
+	return cpacr_el1;
+}
+
+static inline u64 translate_sctlr_el2_to_sctlr_el1(u64 val)
+{
+	/* Only preserve the minimal set of bits we support */
+	val &= (SCTLR_ELx_M | SCTLR_ELx_A | SCTLR_ELx_C | SCTLR_ELx_SA |
+		SCTLR_ELx_I | SCTLR_ELx_IESB | SCTLR_ELx_WXN | SCTLR_ELx_EE);
+	val |= SCTLR_EL1_RES1;
+
+	return val;
+}
+
+static inline u64 translate_ttbr0_el2_to_ttbr0_el1(u64 ttbr0)
+{
+	/* Clear the ASID field */
+	return ttbr0 & ~GENMASK_ULL(63, 48);
+}
+
+static inline u64 translate_cnthctl_el2_to_cntkctl_el1(u64 cnthctl)
+{
+	return ((FIELD_GET(CNTHCTL_EL1PCTEN | CNTHCTL_EL1PCEN, cnthctl) << 10) |
+		(cnthctl & (CNTHCTL_EVNTI | CNTHCTL_EVNTDIR | CNTHCTL_EVNTEN)));
+}
+
 #endif /* __ARM64_KVM_NESTED_H */
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 378+ messages in thread

* [PATCH v6 13/64] KVM: arm64: nv: Handle virtual EL2 registers in vcpu_read/write_sys_reg()
  2022-01-28 12:18 ` Marc Zyngier
  (?)
@ 2022-01-28 12:18   ` Marc Zyngier
  -1 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:18 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Jintack Lim, Haibo Xu,
	Ganapatrao Kulkarni, Chase Conklin, Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

KVM internally uses accessor functions when reading or writing the
guest's system registers. This takes care of accessing either the stored
copy or using the "live" EL1 system registers when the host uses VHE.

With the introduction of virtual EL2 we add a bunch of EL2 system
registers, which now must also be taken care of:
- If the guest is running in vEL2, and we access an EL1 sysreg, we must
  revert to the stored version of that, and not use the CPU's copy.
- If the guest is running in vEL1, and we access an EL2 sysreg, we must
  also use the stored version, since the CPU carries the EL1 copy.
- Some EL2 system registers are supposed to affect the current execution
  of the system, so we need to put them into their respective EL1
  counterparts. For this we need to define a mapping between the two.
  This is done using the newly introduced get_el2_to_el1_mapping() helper.
- Some EL2 system registers have a different format than their EL1
  counterpart, so we need to translate them before writing them to the
  CPU. This is done using an (optional) translate function in the map.
- There are the three special registers SP_EL2, SPSR_EL2 and ELR_EL2,
  which need some separate handling (SPSR_EL2 is being handled in a
  separate patch).

All of these cases are now wrapped into the existing accessor functions,
so KVM users need not care whether they are accessing an EL2 or an EL1
register, nor which state the guest is in.

This handles what was formerly known as the "shadow state" dynamically,
without requiring a separate copy for each vCPU EL.
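
For the read path, once the sysregs are loaded on the CPU, the net
effect can be summarised as below (a sketch paraphrasing the hunk; the
write path mirrors it, except that a write to an EL2 register is always
stored to memory as well):

	/*
	 * EL2 register, guest in vEL1          -> in-memory copy
	 * EL2 register, guest in vEL2:
	 *   ELR_EL2                            -> live ELR_EL1
	 *   no EL1 counterpart                 -> in-memory copy
	 *   needs translation and !E2H         -> in-memory copy
	 *   otherwise                          -> live EL1 counterpart
	 * EL1 register, guest in vEL2          -> in-memory copy
	 * EL1 register, guest in vEL1          -> live EL1 register
	 */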

Reviewed-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Co-developed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/sys_regs.c | 142 ++++++++++++++++++++++++++++++++++++--
 1 file changed, 138 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 25386cb622c5..0927e6c345b4 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -65,23 +65,157 @@ static bool write_to_read_only(struct kvm_vcpu *vcpu,
 	return false;
 }
 
+#define PURE_EL2_SYSREG(el2)						\
+	case el2: {							\
+		*el1r = el2;						\
+		return true;						\
+	}
+
+#define MAPPED_EL2_SYSREG(el2, el1, fn)					\
+	case el2: {							\
+		*xlate = fn;						\
+		*el1r = el1;						\
+		return true;						\
+	}
+
+static bool get_el2_to_el1_mapping(unsigned int reg,
+				   unsigned int *el1r, u64 (**xlate)(u64))
+{
+	switch (reg) {
+		PURE_EL2_SYSREG(  VPIDR_EL2	);
+		PURE_EL2_SYSREG(  VMPIDR_EL2	);
+		PURE_EL2_SYSREG(  ACTLR_EL2	);
+		PURE_EL2_SYSREG(  HCR_EL2	);
+		PURE_EL2_SYSREG(  MDCR_EL2	);
+		PURE_EL2_SYSREG(  HSTR_EL2	);
+		PURE_EL2_SYSREG(  HACR_EL2	);
+		PURE_EL2_SYSREG(  VTTBR_EL2	);
+		PURE_EL2_SYSREG(  VTCR_EL2	);
+		PURE_EL2_SYSREG(  RVBAR_EL2	);
+		PURE_EL2_SYSREG(  TPIDR_EL2	);
+		PURE_EL2_SYSREG(  HPFAR_EL2	);
+		PURE_EL2_SYSREG(  ELR_EL2	);
+		PURE_EL2_SYSREG(  SPSR_EL2	);
+		MAPPED_EL2_SYSREG(SCTLR_EL2,   SCTLR_EL1,
+				  translate_sctlr_el2_to_sctlr_el1	     );
+		MAPPED_EL2_SYSREG(CPTR_EL2,    CPACR_EL1,
+				  translate_cptr_el2_to_cpacr_el1	     );
+		MAPPED_EL2_SYSREG(TTBR0_EL2,   TTBR0_EL1,
+				  translate_ttbr0_el2_to_ttbr0_el1	     );
+		MAPPED_EL2_SYSREG(TTBR1_EL2,   TTBR1_EL1,   NULL	     );
+		MAPPED_EL2_SYSREG(TCR_EL2,     TCR_EL1,
+				  translate_tcr_el2_to_tcr_el1		     );
+		MAPPED_EL2_SYSREG(VBAR_EL2,    VBAR_EL1,    NULL	     );
+		MAPPED_EL2_SYSREG(AFSR0_EL2,   AFSR0_EL1,   NULL	     );
+		MAPPED_EL2_SYSREG(AFSR1_EL2,   AFSR1_EL1,   NULL	     );
+		MAPPED_EL2_SYSREG(ESR_EL2,     ESR_EL1,     NULL	     );
+		MAPPED_EL2_SYSREG(FAR_EL2,     FAR_EL1,     NULL	     );
+		MAPPED_EL2_SYSREG(MAIR_EL2,    MAIR_EL1,    NULL	     );
+		MAPPED_EL2_SYSREG(AMAIR_EL2,   AMAIR_EL1,   NULL	     );
+		MAPPED_EL2_SYSREG(CNTHCTL_EL2, CNTKCTL_EL1,
+				  translate_cnthctl_el2_to_cntkctl_el1	     );
+	default:
+		return false;
+	}
+}
+
 u64 vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg)
 {
 	u64 val = 0x8badf00d8badf00d;
+	u64 (*xlate)(u64) = NULL;
+	unsigned int el1r;
+
+	if (!vcpu->arch.sysregs_loaded_on_cpu)
+		goto memory_read;
+
+	if (unlikely(get_el2_to_el1_mapping(reg, &el1r, &xlate))) {
+		if (!is_hyp_ctxt(vcpu))
+			goto memory_read;
+
+		/*
+		 * ELR_EL2 is special cased for now.
+		 */
+		switch (reg) {
+		case ELR_EL2:
+			return read_sysreg_el1(SYS_ELR);
+		}
+
+		/*
+		 * If this register does not have an EL1 counterpart,
+		 * then read the stored EL2 version.
+		 */
+		if (reg == el1r)
+			goto memory_read;
+
+		/*
+		 * If we have a non-VHE guest and the sysreg
+		 * requires translation to be used at EL1, use the
+		 * in-memory copy instead.
+		 */
+		if (!vcpu_el2_e2h_is_set(vcpu) && xlate)
+			goto memory_read;
+
+		/* Get the current version of the EL1 counterpart. */
+		WARN_ON(!__vcpu_read_sys_reg_from_cpu(el1r, &val));
+		return val;
+	}
+
+	/* EL1 register can't be on the CPU if the guest is in vEL2. */
+	if (unlikely(is_hyp_ctxt(vcpu)))
+		goto memory_read;
 
-	if (vcpu->arch.sysregs_loaded_on_cpu &&
-	    __vcpu_read_sys_reg_from_cpu(reg, &val))
+	if (__vcpu_read_sys_reg_from_cpu(reg, &val))
 		return val;
 
+memory_read:
 	return __vcpu_sys_reg(vcpu, reg);
 }
 
 void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, u64 val, int reg)
 {
-	if (vcpu->arch.sysregs_loaded_on_cpu &&
-	    __vcpu_write_sys_reg_to_cpu(val, reg))
+	u64 (*xlate)(u64) = NULL;
+	unsigned int el1r;
+
+	if (!vcpu->arch.sysregs_loaded_on_cpu)
+		goto memory_write;
+
+	if (unlikely(get_el2_to_el1_mapping(reg, &el1r, &xlate))) {
+		if (!is_hyp_ctxt(vcpu))
+			goto memory_write;
+
+		/*
+		 * Always store a copy of the write to memory to avoid having
+		 * to reverse-translate virtual EL2 system registers for a
+		 * non-VHE guest hypervisor.
+		 */
+		__vcpu_sys_reg(vcpu, reg) = val;
+
+		switch (reg) {
+		case ELR_EL2:
+			write_sysreg_el1(val, SYS_ELR);
+			return;
+		}
+
+		/* No EL1 counterpart? We're done here. */
+		if (reg == el1r)
+			return;
+
+		if (!vcpu_el2_e2h_is_set(vcpu) && xlate)
+			val = xlate(val);
+
+		/* Redirect this to the EL1 version of the register. */
+		WARN_ON(!__vcpu_write_sys_reg_to_cpu(val, el1r));
+		return;
+	}
+
+	/* EL1 register can't be on the CPU if the guest is in vEL2. */
+	if (unlikely(is_hyp_ctxt(vcpu)))
+		goto memory_write;
+
+	if (__vcpu_write_sys_reg_to_cpu(val, reg))
 		return;
 
+memory_write:
 	 __vcpu_sys_reg(vcpu, reg) = val;
 }
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 378+ messages in thread

* [PATCH v6 13/64] KVM: arm64: nv: Handle virtual EL2 registers in vcpu_read/write_sys_reg()
@ 2022-01-28 12:18   ` Marc Zyngier
  0 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:18 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Jintack Lim, Haibo Xu,
	Ganapatrao Kulkarni, Chase Conklin, Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

KVM internally uses accessor functions when reading or writing the
guest's system registers. This takes care of accessing either the stored
copy or using the "live" EL1 system registers when the host uses VHE.

With the introduction of virtual EL2 we add a bunch of EL2 system
registers, which now must also be taken care of:
- If the guest is running in vEL2, and we access an EL1 sysreg, we must
  revert to the stored version of that, and not use the CPU's copy.
- If the guest is running in vEL1, and we access an EL2 sysreg, we must
  also use the stored version, since the CPU carries the EL1 copy.
- Some EL2 system registers are supposed to affect the current execution
  of the system, so we need to put them into their respective EL1
  counterparts. For this we need to define a mapping between the two.
  This is done using the newly introduced get_el2_to_el1_mapping() helper.
- Some EL2 system registers have a different format than their EL1
  counterpart, so we need to translate them before writing them to the
  CPU. This is done using an (optional) translate function in the map.
- There are the three special registers SP_EL2, SPSR_EL2 and ELR_EL2,
  which need some separate handling (SPSR_EL2 is being handled in a
  separate patch).

All of these cases are now wrapped into the existing accessor functions,
so KVM users need not care whether they are accessing an EL2 or an EL1
register, nor which state the guest is in.

This handles what was formerly known as the "shadow state" dynamically,
without requiring a separate copy for each vCPU EL.

Reviewed-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Co-developed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/sys_regs.c | 142 ++++++++++++++++++++++++++++++++++++--
 1 file changed, 138 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 25386cb622c5..0927e6c345b4 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -65,23 +65,157 @@ static bool write_to_read_only(struct kvm_vcpu *vcpu,
 	return false;
 }
 
+#define PURE_EL2_SYSREG(el2)						\
+	case el2: {							\
+		*el1r = el2;						\
+		return true;						\
+	}
+
+#define MAPPED_EL2_SYSREG(el2, el1, fn)					\
+	case el2: {							\
+		*xlate = fn;						\
+		*el1r = el1;						\
+		return true;						\
+	}
+
+static bool get_el2_to_el1_mapping(unsigned int reg,
+				   unsigned int *el1r, u64 (**xlate)(u64))
+{
+	switch (reg) {
+		PURE_EL2_SYSREG(  VPIDR_EL2	);
+		PURE_EL2_SYSREG(  VMPIDR_EL2	);
+		PURE_EL2_SYSREG(  ACTLR_EL2	);
+		PURE_EL2_SYSREG(  HCR_EL2	);
+		PURE_EL2_SYSREG(  MDCR_EL2	);
+		PURE_EL2_SYSREG(  HSTR_EL2	);
+		PURE_EL2_SYSREG(  HACR_EL2	);
+		PURE_EL2_SYSREG(  VTTBR_EL2	);
+		PURE_EL2_SYSREG(  VTCR_EL2	);
+		PURE_EL2_SYSREG(  RVBAR_EL2	);
+		PURE_EL2_SYSREG(  TPIDR_EL2	);
+		PURE_EL2_SYSREG(  HPFAR_EL2	);
+		PURE_EL2_SYSREG(  ELR_EL2	);
+		PURE_EL2_SYSREG(  SPSR_EL2	);
+		MAPPED_EL2_SYSREG(SCTLR_EL2,   SCTLR_EL1,
+				  translate_sctlr_el2_to_sctlr_el1	     );
+		MAPPED_EL2_SYSREG(CPTR_EL2,    CPACR_EL1,
+				  translate_cptr_el2_to_cpacr_el1	     );
+		MAPPED_EL2_SYSREG(TTBR0_EL2,   TTBR0_EL1,
+				  translate_ttbr0_el2_to_ttbr0_el1	     );
+		MAPPED_EL2_SYSREG(TTBR1_EL2,   TTBR1_EL1,   NULL	     );
+		MAPPED_EL2_SYSREG(TCR_EL2,     TCR_EL1,
+				  translate_tcr_el2_to_tcr_el1		     );
+		MAPPED_EL2_SYSREG(VBAR_EL2,    VBAR_EL1,    NULL	     );
+		MAPPED_EL2_SYSREG(AFSR0_EL2,   AFSR0_EL1,   NULL	     );
+		MAPPED_EL2_SYSREG(AFSR1_EL2,   AFSR1_EL1,   NULL	     );
+		MAPPED_EL2_SYSREG(ESR_EL2,     ESR_EL1,     NULL	     );
+		MAPPED_EL2_SYSREG(FAR_EL2,     FAR_EL1,     NULL	     );
+		MAPPED_EL2_SYSREG(MAIR_EL2,    MAIR_EL1,    NULL	     );
+		MAPPED_EL2_SYSREG(AMAIR_EL2,   AMAIR_EL1,   NULL	     );
+		MAPPED_EL2_SYSREG(CNTHCTL_EL2, CNTKCTL_EL1,
+				  translate_cnthctl_el2_to_cntkctl_el1	     );
+	default:
+		return false;
+	}
+}
+
 u64 vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg)
 {
 	u64 val = 0x8badf00d8badf00d;
+	u64 (*xlate)(u64) = NULL;
+	unsigned int el1r;
+
+	if (!vcpu->arch.sysregs_loaded_on_cpu)
+		goto memory_read;
+
+	if (unlikely(get_el2_to_el1_mapping(reg, &el1r, &xlate))) {
+		if (!is_hyp_ctxt(vcpu))
+			goto memory_read;
+
+		/*
+		 * ELR_EL2 is special cased for now.
+		 */
+		switch (reg) {
+		case ELR_EL2:
+			return read_sysreg_el1(SYS_ELR);
+		}
+
+		/*
+		 * If this register does not have an EL1 counterpart,
+		 * then read the stored EL2 version.
+		 */
+		if (reg == el1r)
+			goto memory_read;
+
+		/*
+		 * If we have a non-VHE guest and the sysreg
+		 * requires translation to be used at EL1, use the
+		 * in-memory copy instead.
+		 */
+		if (!vcpu_el2_e2h_is_set(vcpu) && xlate)
+			goto memory_read;
+
+		/* Get the current version of the EL1 counterpart. */
+		WARN_ON(!__vcpu_read_sys_reg_from_cpu(el1r, &val));
+		return val;
+	}
+
+	/* EL1 register can't be on the CPU if the guest is in vEL2. */
+	if (unlikely(is_hyp_ctxt(vcpu)))
+		goto memory_read;
 
-	if (vcpu->arch.sysregs_loaded_on_cpu &&
-	    __vcpu_read_sys_reg_from_cpu(reg, &val))
+	if (__vcpu_read_sys_reg_from_cpu(reg, &val))
 		return val;
 
+memory_read:
 	return __vcpu_sys_reg(vcpu, reg);
 }
 
 void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, u64 val, int reg)
 {
-	if (vcpu->arch.sysregs_loaded_on_cpu &&
-	    __vcpu_write_sys_reg_to_cpu(val, reg))
+	u64 (*xlate)(u64) = NULL;
+	unsigned int el1r;
+
+	if (!vcpu->arch.sysregs_loaded_on_cpu)
+		goto memory_write;
+
+	if (unlikely(get_el2_to_el1_mapping(reg, &el1r, &xlate))) {
+		if (!is_hyp_ctxt(vcpu))
+			goto memory_write;
+
+		/*
+		 * Always store a copy of the write to memory to avoid having
+		 * to reverse-translate virtual EL2 system registers for a
+		 * non-VHE guest hypervisor.
+		 */
+		__vcpu_sys_reg(vcpu, reg) = val;
+
+		switch (reg) {
+		case ELR_EL2:
+			write_sysreg_el1(val, SYS_ELR);
+			return;
+		}
+
+		/* No EL1 counterpart? We're done here. */
+		if (reg == el1r)
+			return;
+
+		if (!vcpu_el2_e2h_is_set(vcpu) && xlate)
+			val = xlate(val);
+
+		/* Redirect this to the EL1 version of the register. */
+		WARN_ON(!__vcpu_write_sys_reg_to_cpu(val, el1r));
+		return;
+	}
+
+	/* EL1 register can't be on the CPU if the guest is in vEL2. */
+	if (unlikely(is_hyp_ctxt(vcpu)))
+		goto memory_write;
+
+	if (__vcpu_write_sys_reg_to_cpu(val, reg))
 		return;
 
+memory_write:
 	 __vcpu_sys_reg(vcpu, reg) = val;
 }
 
-- 
2.30.2



* [PATCH v6 14/64] KVM: arm64: nv: Handle SPSR_EL2 specially
  2022-01-28 12:18 ` Marc Zyngier
  (?)
@ 2022-01-28 12:18   ` Marc Zyngier
  -1 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:18 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Jintack Lim, Haibo Xu,
	Ganapatrao Kulkarni, Chase Conklin, Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

SPSR_EL2 needs special attention when running nested on ARMv8.3:

If taking an exception while running at vEL2 (actually EL1), the
HW will update the SPSR_EL1 register with the EL1 mode. We need
to track this in order to make sure that accesses to the virtual
view of SPSR_EL2 are correct.

To do so, we place an illegal value in SPSR_EL1.M, and patch it
accordingly if required when accessing it.
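
For illustration, a condensed walk through the fixup below, using the
usual mode encodings (PSR_MODE_EL1h = 0x5, PSR_MODE_EL2h = 0x9,
CurrentEL_EL2 = 0x8); the authoritative logic is in the patch itself:

	/*
	 * Write side (non-VHE guest hypervisor): the guest sets SPSR_EL2.M
	 * to EL2h (0x9); we keep 0x9 in memory but clear M[3:2] before
	 * writing the hardware SPSR_EL1, leaving the illegal value 0x1.
	 *
	 * Read side:
	 * - M[3:2] == 0: the CPU never touched SPSR_EL1, so return the
	 *   stored copy (0x9);
	 * - M[3:2] != 0: the CPU took a "local" EL1->EL1 exception and wrote
	 *   EL1h (0x5); report (0x5 & ~0xc) | CurrentEL_EL2 = 0x9, i.e. EL2h,
	 *   which is what the vEL2 guest expects to see.
	 */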

Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_emulate.h | 37 ++++++++++++++++++++++++++++
 arch/arm64/kvm/sys_regs.c            | 23 +++++++++++++++--
 2 files changed, 58 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index cb9f123d26f3..54e8eee413eb 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -241,6 +241,43 @@ static inline bool is_hyp_ctxt(const struct kvm_vcpu *vcpu)
 	return __is_hyp_ctxt(&vcpu->arch.ctxt);
 }
 
+static inline u64 __fixup_spsr_el2_write(struct kvm_cpu_context *ctxt, u64 val)
+{
+	if (!__vcpu_el2_e2h_is_set(ctxt)) {
+		/*
+		 * Clear the .M field when writing SPSR to the CPU, so that we
+		 * can detect when the CPU clobbered our SPSR copy during a
+		 * local exception.
+		 */
+		val &= ~0xc;
+	}
+
+	return val;
+}
+
+static inline u64 __fixup_spsr_el2_read(const struct kvm_cpu_context *ctxt, u64 val)
+{
+	if (__vcpu_el2_e2h_is_set(ctxt))
+		return val;
+
+	/*
+	 * SPSR.M == 0 means the CPU has not touched the SPSR, so the
+	 * register still has the value we saved on the last write.
+	 */
+	if ((val & 0xc) == 0)
+		return ctxt_sys_reg(ctxt, SPSR_EL2);
+
+	/*
+	 * Otherwise there was a "local" exception on the CPU,
+	 * which from the guest's point of view was being taken from
+	 * EL2 to EL2, although it actually happened to be from
+	 * EL1 to EL1.
+	 * So we need to fix the .M field in SPSR, to make it look
+	 * like EL2, which is what the guest would expect.
+	 */
+	return (val & ~0x0c) | CurrentEL_EL2;
+}
+
 /*
  * The layout of SPSR for an AArch32 state is different when observed from an
  * AArch64 SPSR_ELx or an AArch32 SPSR_*. This function generates the AArch32
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 0927e6c345b4..ace4a54caef9 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -133,11 +133,14 @@ u64 vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg)
 			goto memory_read;
 
 		/*
-		 * ELR_EL2 is special cased for now.
+		 * ELR_EL2 and SPSR_EL2 are special cased for now.
 		 */
 		switch (reg) {
 		case ELR_EL2:
 			return read_sysreg_el1(SYS_ELR);
+		case SPSR_EL2:
+			val = read_sysreg_el1(SYS_SPSR);
+			return __fixup_spsr_el2_read(&vcpu->arch.ctxt, val);
 		}
 
 		/*
@@ -194,6 +197,10 @@ void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, u64 val, int reg)
 		case ELR_EL2:
 			write_sysreg_el1(val, SYS_ELR);
 			return;
+		case SPSR_EL2:
+			val = __fixup_spsr_el2_write(&vcpu->arch.ctxt, val);
+			write_sysreg_el1(val, SYS_SPSR);
+			return;
 		}
 
 		/* No EL1 counterpart? We're done here. */
@@ -1610,6 +1617,18 @@ static bool access_sp_el1(struct kvm_vcpu *vcpu,
 	return true;
 }
 
+static bool access_spsr_el2(struct kvm_vcpu *vcpu,
+			    struct sys_reg_params *p,
+			    const struct sys_reg_desc *r)
+{
+	if (p->is_write)
+		vcpu_write_sys_reg(vcpu, p->regval, SPSR_EL2);
+	else
+		p->regval = vcpu_read_sys_reg(vcpu, SPSR_EL2);
+
+	return true;
+}
+
 /*
  * Architected system registers.
  * Important: Must be sorted ascending by Op0, Op1, CRn, CRm, Op2
@@ -2041,7 +2060,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 	EL2_REG(VTCR_EL2, access_rw, reset_val, 0),
 
 	{ SYS_DESC(SYS_DACR32_EL2), NULL, reset_unknown, DACR32_EL2 },
-	EL2_REG(SPSR_EL2, access_rw, reset_val, 0),
+	EL2_REG(SPSR_EL2, access_spsr_el2, reset_val, 0),
 	EL2_REG(ELR_EL2, access_rw, reset_val, 0),
 	{ SYS_DESC(SYS_SP_EL1), access_sp_el1},
 
-- 
2.30.2



* [PATCH v6 14/64] KVM: arm64: nv: Handle SPSR_EL2 specially
@ 2022-01-28 12:18   ` Marc Zyngier
  0 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:18 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: kernel-team, Andre Przywara, Christoffer Dall, Chase Conklin,
	Russell King (Oracle),
	mihai.carabas, Ganapatrao Kulkarni

SPSR_EL2 needs special attention when running nested on ARMv8.3:

If taking an exception while running at vEL2 (actually EL1), the
HW will update the SPSR_EL1 register with the EL1 mode. We need
to track this in order to make sure that accesses to the virtual
view of SPSR_EL2 are correct.

To do so, we place an illegal value in SPSR_EL1.M, and patch it
accordingly if required when accessing it.

Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_emulate.h | 37 ++++++++++++++++++++++++++++
 arch/arm64/kvm/sys_regs.c            | 23 +++++++++++++++--
 2 files changed, 58 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index cb9f123d26f3..54e8eee413eb 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -241,6 +241,43 @@ static inline bool is_hyp_ctxt(const struct kvm_vcpu *vcpu)
 	return __is_hyp_ctxt(&vcpu->arch.ctxt);
 }
 
+static inline u64 __fixup_spsr_el2_write(struct kvm_cpu_context *ctxt, u64 val)
+{
+	if (!__vcpu_el2_e2h_is_set(ctxt)) {
+		/*
+		 * Clear the .M field when writing SPSR to the CPU, so that we
+		 * can detect when the CPU clobbered our SPSR copy during a
+		 * local exception.
+		 */
+		val &= ~0xc;
+	}
+
+	return val;
+}
+
+static inline u64 __fixup_spsr_el2_read(const struct kvm_cpu_context *ctxt, u64 val)
+{
+	if (__vcpu_el2_e2h_is_set(ctxt))
+		return val;
+
+	/*
+	 * SPSR.M == 0 means the CPU has not touched the SPSR, so the
+	 * register still has the value we saved on the last write.
+	 */
+	if ((val & 0xc) == 0)
+		return ctxt_sys_reg(ctxt, SPSR_EL2);
+
+	/*
+	 * Otherwise there was a "local" exception on the CPU,
+	 * which from the guest's point of view was being taken from
+	 * EL2 to EL2, although it actually happened to be from
+	 * EL1 to EL1.
+	 * So we need to fix the .M field in SPSR, to make it look
+	 * like EL2, which is what the guest would expect.
+	 */
+	return (val & ~0x0c) | CurrentEL_EL2;
+}
+
 /*
  * The layout of SPSR for an AArch32 state is different when observed from an
  * AArch64 SPSR_ELx or an AArch32 SPSR_*. This function generates the AArch32
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 0927e6c345b4..ace4a54caef9 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -133,11 +133,14 @@ u64 vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg)
 			goto memory_read;
 
 		/*
-		 * ELR_EL2 is special cased for now.
+		 * ELR_EL2 and SPSR_EL2 are special cased for now.
 		 */
 		switch (reg) {
 		case ELR_EL2:
 			return read_sysreg_el1(SYS_ELR);
+		case SPSR_EL2:
+			val = read_sysreg_el1(SYS_SPSR);
+			return __fixup_spsr_el2_read(&vcpu->arch.ctxt, val);
 		}
 
 		/*
@@ -194,6 +197,10 @@ void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, u64 val, int reg)
 		case ELR_EL2:
 			write_sysreg_el1(val, SYS_ELR);
 			return;
+		case SPSR_EL2:
+			val = __fixup_spsr_el2_write(&vcpu->arch.ctxt, val);
+			write_sysreg_el1(val, SYS_SPSR);
+			return;
 		}
 
 		/* No EL1 counterpart? We're done here. */
@@ -1610,6 +1617,18 @@ static bool access_sp_el1(struct kvm_vcpu *vcpu,
 	return true;
 }
 
+static bool access_spsr_el2(struct kvm_vcpu *vcpu,
+			    struct sys_reg_params *p,
+			    const struct sys_reg_desc *r)
+{
+	if (p->is_write)
+		vcpu_write_sys_reg(vcpu, p->regval, SPSR_EL2);
+	else
+		p->regval = vcpu_read_sys_reg(vcpu, SPSR_EL2);
+
+	return true;
+}
+
 /*
  * Architected system registers.
  * Important: Must be sorted ascending by Op0, Op1, CRn, CRm, Op2
@@ -2041,7 +2060,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 	EL2_REG(VTCR_EL2, access_rw, reset_val, 0),
 
 	{ SYS_DESC(SYS_DACR32_EL2), NULL, reset_unknown, DACR32_EL2 },
-	EL2_REG(SPSR_EL2, access_rw, reset_val, 0),
+	EL2_REG(SPSR_EL2, access_spsr_el2, reset_val, 0),
 	EL2_REG(ELR_EL2, access_rw, reset_val, 0),
 	{ SYS_DESC(SYS_SP_EL1), access_sp_el1},
 
-- 
2.30.2


* [PATCH v6 14/64] KVM: arm64: nv: Handle SPSR_EL2 specially
@ 2022-01-28 12:18   ` Marc Zyngier
  0 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:18 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Jintack Lim, Haibo Xu,
	Ganapatrao Kulkarni, Chase Conklin, Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

SPSR_EL2 needs special attention when running nested on ARMv8.3:

If taking an exception while running at vEL2 (actually EL1), the
HW will update the SPSR_EL1 register with the EL1 mode. We need
to track this in order to make sure that accesses to the virtual
view of SPSR_EL2 are correct.

To do so, we place an illegal value in SPSR_EL1.M, and patch it
accordingly if required when accessing it.

Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_emulate.h | 37 ++++++++++++++++++++++++++++
 arch/arm64/kvm/sys_regs.c            | 23 +++++++++++++++--
 2 files changed, 58 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index cb9f123d26f3..54e8eee413eb 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -241,6 +241,43 @@ static inline bool is_hyp_ctxt(const struct kvm_vcpu *vcpu)
 	return __is_hyp_ctxt(&vcpu->arch.ctxt);
 }
 
+static inline u64 __fixup_spsr_el2_write(struct kvm_cpu_context *ctxt, u64 val)
+{
+	if (!__vcpu_el2_e2h_is_set(ctxt)) {
+		/*
+		 * Clear the .M field when writing SPSR to the CPU, so that we
+		 * can detect when the CPU clobbered our SPSR copy during a
+		 * local exception.
+		 */
+		val &= ~0xc;
+	}
+
+	return val;
+}
+
+static inline u64 __fixup_spsr_el2_read(const struct kvm_cpu_context *ctxt, u64 val)
+{
+	if (__vcpu_el2_e2h_is_set(ctxt))
+		return val;
+
+	/*
+	 * SPSR.M == 0 means the CPU has not touched the SPSR, so the
+	 * register still has the value we saved on the last write.
+	 */
+	if ((val & 0xc) == 0)
+		return ctxt_sys_reg(ctxt, SPSR_EL2);
+
+	/*
+	 * Otherwise there was a "local" exception on the CPU,
+	 * which from the guest's point of view was being taken from
+	 * EL2 to EL2, although it actually happened to be from
+	 * EL1 to EL1.
+	 * So we need to fix the .M field in SPSR, to make it look
+	 * like EL2, which is what the guest would expect.
+	 */
+	return (val & ~0x0c) | CurrentEL_EL2;
+}
+
 /*
  * The layout of SPSR for an AArch32 state is different when observed from an
  * AArch64 SPSR_ELx or an AArch32 SPSR_*. This function generates the AArch32
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 0927e6c345b4..ace4a54caef9 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -133,11 +133,14 @@ u64 vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg)
 			goto memory_read;
 
 		/*
-		 * ELR_EL2 is special cased for now.
+		 * ELR_EL2 and SPSR_EL2 are special cased for now.
 		 */
 		switch (reg) {
 		case ELR_EL2:
 			return read_sysreg_el1(SYS_ELR);
+		case SPSR_EL2:
+			val = read_sysreg_el1(SYS_SPSR);
+			return __fixup_spsr_el2_read(&vcpu->arch.ctxt, val);
 		}
 
 		/*
@@ -194,6 +197,10 @@ void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, u64 val, int reg)
 		case ELR_EL2:
 			write_sysreg_el1(val, SYS_ELR);
 			return;
+		case SPSR_EL2:
+			val = __fixup_spsr_el2_write(&vcpu->arch.ctxt, val);
+			write_sysreg_el1(val, SYS_SPSR);
+			return;
 		}
 
 		/* No EL1 counterpart? We're done here. */
@@ -1610,6 +1617,18 @@ static bool access_sp_el1(struct kvm_vcpu *vcpu,
 	return true;
 }
 
+static bool access_spsr_el2(struct kvm_vcpu *vcpu,
+			    struct sys_reg_params *p,
+			    const struct sys_reg_desc *r)
+{
+	if (p->is_write)
+		vcpu_write_sys_reg(vcpu, p->regval, SPSR_EL2);
+	else
+		p->regval = vcpu_read_sys_reg(vcpu, SPSR_EL2);
+
+	return true;
+}
+
 /*
  * Architected system registers.
  * Important: Must be sorted ascending by Op0, Op1, CRn, CRm, Op2
@@ -2041,7 +2060,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 	EL2_REG(VTCR_EL2, access_rw, reset_val, 0),
 
 	{ SYS_DESC(SYS_DACR32_EL2), NULL, reset_unknown, DACR32_EL2 },
-	EL2_REG(SPSR_EL2, access_rw, reset_val, 0),
+	EL2_REG(SPSR_EL2, access_spsr_el2, reset_val, 0),
 	EL2_REG(ELR_EL2, access_rw, reset_val, 0),
 	{ SYS_DESC(SYS_SP_EL1), access_sp_el1},
 
-- 
2.30.2



* [PATCH v6 15/64] KVM: arm64: nv: Handle HCR_EL2.E2H specially
  2022-01-28 12:18 ` Marc Zyngier
  (?)
@ 2022-01-28 12:18   ` Marc Zyngier
  -1 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:18 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Jintack Lim, Haibo Xu,
	Ganapatrao Kulkarni, Chase Conklin, Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

HCR_EL2.E2H is nasty, as a flip of this bit completely changes the way
we deal with a lot of the state. So when the guest flips this bit
(sysregs are live), do the put/load dance so that we have a consistent
state.

Yes, this is slow. Don't do it.
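
From the guest hypervisor's side, this is the kind of access that now
takes the slow path (a sketch only, assuming a Linux guest using the
standard sysreg helpers; HCR_E2H is bit 34 of HCR_EL2):

	/* Flipping E2H while the vcpu's sysregs are loaded on the CPU now
	 * costs a full kvm_arch_vcpu_put()/load() round trip on the host. */
	u64 hcr = read_sysreg(hcr_el2);
	write_sysreg(hcr ^ HCR_E2H, hcr_el2);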

Suggested-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/sys_regs.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index ace4a54caef9..102bc4906723 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -183,9 +183,24 @@ void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, u64 val, int reg)
 		goto memory_write;
 
 	if (unlikely(get_el2_to_el1_mapping(reg, &el1r, &xlate))) {
+		bool need_put_load;
+
 		if (!is_hyp_ctxt(vcpu))
 			goto memory_write;
 
+		/*
+		 * HCR_EL2.E2H is nasty: it changes the way we interpret a
+		 * lot of the EL2 state, so treat it as a full state
+		 * transition.
+		 */
+		need_put_load = ((reg == HCR_EL2) &&
+				 vcpu_el2_e2h_is_set(vcpu) != !!(val & HCR_E2H));
+
+		if (need_put_load) {
+			preempt_disable();
+			kvm_arch_vcpu_put(vcpu);
+		}
+
 		/*
 		 * Always store a copy of the write to memory to avoid having
 		 * to reverse-translate virtual EL2 system registers for a
@@ -193,6 +208,11 @@ void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, u64 val, int reg)
 		 */
 		__vcpu_sys_reg(vcpu, reg) = val;
 
+		if (need_put_load) {
+			kvm_arch_vcpu_load(vcpu, smp_processor_id());
+			preempt_enable();
+		}
+
 		switch (reg) {
 		case ELR_EL2:
 			write_sysreg_el1(val, SYS_ELR);
-- 
2.30.2



* [PATCH v6 15/64] KVM: arm64: nv: Handle HCR_EL2.E2H specially
@ 2022-01-28 12:18   ` Marc Zyngier
  0 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:18 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: kernel-team, Andre Przywara, Christoffer Dall, Chase Conklin,
	Russell King (Oracle),
	mihai.carabas, Ganapatrao Kulkarni

HCR_EL2.E2H is nasty, as a flip of this bit completely changes the way
we deal with a lot of the state. So when the guest flips this bit
(sysregs are live), do the put/load dance so that we have a consistent
state.

Yes, this is slow. Don't do it.

Suggested-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/sys_regs.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index ace4a54caef9..102bc4906723 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -183,9 +183,24 @@ void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, u64 val, int reg)
 		goto memory_write;
 
 	if (unlikely(get_el2_to_el1_mapping(reg, &el1r, &xlate))) {
+		bool need_put_load;
+
 		if (!is_hyp_ctxt(vcpu))
 			goto memory_write;
 
+		/*
+		 * HCR_EL2.E2H is nasty: it changes the way we interpret a
+		 * lot of the EL2 state, so treat it as a full state
+		 * transition.
+		 */
+		need_put_load = ((reg == HCR_EL2) &&
+				 vcpu_el2_e2h_is_set(vcpu) != !!(val & HCR_E2H));
+
+		if (need_put_load) {
+			preempt_disable();
+			kvm_arch_vcpu_put(vcpu);
+		}
+
 		/*
 		 * Always store a copy of the write to memory to avoid having
 		 * to reverse-translate virtual EL2 system registers for a
@@ -193,6 +208,11 @@ void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, u64 val, int reg)
 		 */
 		__vcpu_sys_reg(vcpu, reg) = val;
 
+		if (need_put_load) {
+			kvm_arch_vcpu_load(vcpu, smp_processor_id());
+			preempt_enable();
+		}
+
 		switch (reg) {
 		case ELR_EL2:
 			write_sysreg_el1(val, SYS_ELR);
-- 
2.30.2


* [PATCH v6 15/64] KVM: arm64: nv: Handle HCR_EL2.E2H specially
@ 2022-01-28 12:18   ` Marc Zyngier
  0 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:18 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Jintack Lim, Haibo Xu,
	Ganapatrao Kulkarni, Chase Conklin, Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

HCR_EL2.E2H is nasty, as a flip of this bit completely changes the way
we deal with a lot of the state. So when the guest flips this bit
(sysregs are live), do the put/load dance so that we have a consistent
state.

Yes, this is slow. Don't do it.

Suggested-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/sys_regs.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index ace4a54caef9..102bc4906723 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -183,9 +183,24 @@ void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, u64 val, int reg)
 		goto memory_write;
 
 	if (unlikely(get_el2_to_el1_mapping(reg, &el1r, &xlate))) {
+		bool need_put_load;
+
 		if (!is_hyp_ctxt(vcpu))
 			goto memory_write;
 
+		/*
+		 * HCR_EL2.E2H is nasty: it changes the way we interpret a
+		 * lot of the EL2 state, so treat it as a full state
+		 * transition.
+		 */
+		need_put_load = ((reg == HCR_EL2) &&
+				 vcpu_el2_e2h_is_set(vcpu) != !!(val & HCR_E2H));
+
+		if (need_put_load) {
+			preempt_disable();
+			kvm_arch_vcpu_put(vcpu);
+		}
+
 		/*
 		 * Always store a copy of the write to memory to avoid having
 		 * to reverse-translate virtual EL2 system registers for a
@@ -193,6 +208,11 @@ void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, u64 val, int reg)
 		 */
 		__vcpu_sys_reg(vcpu, reg) = val;
 
+		if (need_put_load) {
+			kvm_arch_vcpu_load(vcpu, smp_processor_id());
+			preempt_enable();
+		}
+
 		switch (reg) {
 		case ELR_EL2:
 			write_sysreg_el1(val, SYS_ELR);
-- 
2.30.2



* [PATCH v6 16/64] KVM: arm64: nv: Save/Restore vEL2 sysregs
  2022-01-28 12:18 ` Marc Zyngier
  (?)
@ 2022-01-28 12:18   ` Marc Zyngier
  -1 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:18 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Jintack Lim, Haibo Xu,
	Ganapatrao Kulkarni, Chase Conklin, Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

Whenever we need to restore the guest's system registers to the CPU, we
now need to take care of the EL2 system registers as well. Most of them
are accessed via traps only, but some have an immediate effect and also
a guest running in VHE mode would expect them to be accessible via their
EL1 encoding, which we do not trap.

For vEL2 we write the virtual EL2 registers with an identical format
directly into their EL1 counterpart, and translate the few registers
that have a different format so that they have the same effect on
execution when running a non-VHE guest hypervisor.
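
To make the non-trapped case concrete, a sketch of guest-hypervisor
code (illustration only, not part of the patch):

	/*
	 * A VHE guest hypervisor (vHCR_EL2.E2H set) accesses what it
	 * believes is TTBR0_EL2 through the redirected EL1 encoding. We do
	 * not trap this, so the value must already be live in the hardware
	 * TTBR0_EL1, which is what the restore path below arranges.
	 */
	u64 ttbr0 = read_sysreg(ttbr0_el1);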

Based on an initial patch from Andre Przywara, rewritten many times
since.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h |   5 +-
 arch/arm64/kvm/hyp/nvhe/sysreg-sr.c        |   2 +-
 arch/arm64/kvm/hyp/vhe/sysreg-sr.c         | 125 ++++++++++++++++++++-
 3 files changed, 127 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h b/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h
index 7ecca8b07851..283f780f5f56 100644
--- a/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h
+++ b/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h
@@ -92,9 +92,10 @@ static inline void __sysreg_restore_user_state(struct kvm_cpu_context *ctxt)
 	write_sysreg(ctxt_sys_reg(ctxt, TPIDRRO_EL0),	tpidrro_el0);
 }
 
-static inline void __sysreg_restore_el1_state(struct kvm_cpu_context *ctxt)
+static inline void __sysreg_restore_el1_state(struct kvm_cpu_context *ctxt,
+					      u64 mpidr)
 {
-	write_sysreg(ctxt_sys_reg(ctxt, MPIDR_EL1),	vmpidr_el2);
+	write_sysreg(mpidr,				vmpidr_el2);
 	write_sysreg(ctxt_sys_reg(ctxt, CSSELR_EL1),	csselr_el1);
 
 	if (has_vhe() ||
diff --git a/arch/arm64/kvm/hyp/nvhe/sysreg-sr.c b/arch/arm64/kvm/hyp/nvhe/sysreg-sr.c
index 29305022bc04..dba101565de3 100644
--- a/arch/arm64/kvm/hyp/nvhe/sysreg-sr.c
+++ b/arch/arm64/kvm/hyp/nvhe/sysreg-sr.c
@@ -28,7 +28,7 @@ void __sysreg_save_state_nvhe(struct kvm_cpu_context *ctxt)
 
 void __sysreg_restore_state_nvhe(struct kvm_cpu_context *ctxt)
 {
-	__sysreg_restore_el1_state(ctxt);
+	__sysreg_restore_el1_state(ctxt, ctxt_sys_reg(ctxt, MPIDR_EL1));
 	__sysreg_restore_common_state(ctxt);
 	__sysreg_restore_user_state(ctxt);
 	__sysreg_restore_el2_return_state(ctxt);
diff --git a/arch/arm64/kvm/hyp/vhe/sysreg-sr.c b/arch/arm64/kvm/hyp/vhe/sysreg-sr.c
index 007a12dd4351..3e26a78d00c5 100644
--- a/arch/arm64/kvm/hyp/vhe/sysreg-sr.c
+++ b/arch/arm64/kvm/hyp/vhe/sysreg-sr.c
@@ -13,6 +13,96 @@
 #include <asm/kvm_asm.h>
 #include <asm/kvm_emulate.h>
 #include <asm/kvm_hyp.h>
+#include <asm/kvm_nested.h>
+
+static void __sysreg_save_vel2_state(struct kvm_cpu_context *ctxt)
+{
+	/* These registers are common with EL1 */
+	ctxt_sys_reg(ctxt, CSSELR_EL1)	= read_sysreg(csselr_el1);
+	ctxt_sys_reg(ctxt, PAR_EL1)	= read_sysreg(par_el1);
+	ctxt_sys_reg(ctxt, TPIDR_EL1)	= read_sysreg(tpidr_el1);
+
+	ctxt_sys_reg(ctxt, ESR_EL2)	= read_sysreg_el1(SYS_ESR);
+	ctxt_sys_reg(ctxt, AFSR0_EL2)	= read_sysreg_el1(SYS_AFSR0);
+	ctxt_sys_reg(ctxt, AFSR1_EL2)	= read_sysreg_el1(SYS_AFSR1);
+	ctxt_sys_reg(ctxt, FAR_EL2)	= read_sysreg_el1(SYS_FAR);
+	ctxt_sys_reg(ctxt, MAIR_EL2)	= read_sysreg_el1(SYS_MAIR);
+	ctxt_sys_reg(ctxt, VBAR_EL2)	= read_sysreg_el1(SYS_VBAR);
+	ctxt_sys_reg(ctxt, CONTEXTIDR_EL2) = read_sysreg_el1(SYS_CONTEXTIDR);
+	ctxt_sys_reg(ctxt, AMAIR_EL2)	= read_sysreg_el1(SYS_AMAIR);
+
+	/*
+	 * In VHE mode those registers are compatible between EL1 and EL2,
+	 * and the guest uses the _EL1 versions on the CPU naturally.
+	 * So we save them into their _EL2 versions here.
+	 * For nVHE mode we trap accesses to those registers, so our
+	 * _EL2 copy in sys_regs[] is always up-to-date and we don't need
+	 * to save anything here.
+	 */
+	if (__vcpu_el2_e2h_is_set(ctxt)) {
+		ctxt_sys_reg(ctxt, SCTLR_EL2)	= read_sysreg_el1(SYS_SCTLR);
+		ctxt_sys_reg(ctxt, CPTR_EL2)	= read_sysreg_el1(SYS_CPACR);
+		ctxt_sys_reg(ctxt, TTBR0_EL2)	= read_sysreg_el1(SYS_TTBR0);
+		ctxt_sys_reg(ctxt, TTBR1_EL2)	= read_sysreg_el1(SYS_TTBR1);
+		ctxt_sys_reg(ctxt, TCR_EL2)	= read_sysreg_el1(SYS_TCR);
+		ctxt_sys_reg(ctxt, CNTHCTL_EL2)	= read_sysreg_el1(SYS_CNTKCTL);
+	}
+
+	ctxt_sys_reg(ctxt, SP_EL2)	= read_sysreg(sp_el1);
+	ctxt_sys_reg(ctxt, ELR_EL2)	= read_sysreg_el1(SYS_ELR);
+	ctxt_sys_reg(ctxt, SPSR_EL2)	= __fixup_spsr_el2_read(ctxt, read_sysreg_el1(SYS_SPSR));
+}
+
+static void __sysreg_restore_vel2_state(struct kvm_cpu_context *ctxt)
+{
+	u64 val;
+
+	/* These registers are common with EL1 */
+	write_sysreg(ctxt_sys_reg(ctxt, CSSELR_EL1),	csselr_el1);
+	write_sysreg(ctxt_sys_reg(ctxt, PAR_EL1),	par_el1);
+	write_sysreg(ctxt_sys_reg(ctxt, TPIDR_EL1),	tpidr_el1);
+
+	write_sysreg(read_cpuid_id(),			vpidr_el2);
+	write_sysreg(ctxt_sys_reg(ctxt, MPIDR_EL1),	vmpidr_el2);
+	write_sysreg_el1(ctxt_sys_reg(ctxt, MAIR_EL2),	SYS_MAIR);
+	write_sysreg_el1(ctxt_sys_reg(ctxt, VBAR_EL2),	SYS_VBAR);
+	write_sysreg_el1(ctxt_sys_reg(ctxt, CONTEXTIDR_EL2),SYS_CONTEXTIDR);
+	write_sysreg_el1(ctxt_sys_reg(ctxt, AMAIR_EL2),	SYS_AMAIR);
+
+	if (__vcpu_el2_e2h_is_set(ctxt)) {
+		/*
+		 * In VHE mode those registers are compatible between
+		 * EL1 and EL2.
+		 */
+		write_sysreg_el1(ctxt_sys_reg(ctxt, SCTLR_EL2),	SYS_SCTLR);
+		write_sysreg_el1(ctxt_sys_reg(ctxt, CPTR_EL2),	SYS_CPACR);
+		write_sysreg_el1(ctxt_sys_reg(ctxt, TTBR0_EL2),	SYS_TTBR0);
+		write_sysreg_el1(ctxt_sys_reg(ctxt, TTBR1_EL2),	SYS_TTBR1);
+		write_sysreg_el1(ctxt_sys_reg(ctxt, TCR_EL2),	SYS_TCR);
+		write_sysreg_el1(ctxt_sys_reg(ctxt, CNTHCTL_EL2), SYS_CNTKCTL);
+	} else {
+		val = translate_sctlr_el2_to_sctlr_el1(ctxt_sys_reg(ctxt, SCTLR_EL2));
+		write_sysreg_el1(val, SYS_SCTLR);
+		val = translate_cptr_el2_to_cpacr_el1(ctxt_sys_reg(ctxt, CPTR_EL2));
+		write_sysreg_el1(val, SYS_CPACR);
+		val = translate_ttbr0_el2_to_ttbr0_el1(ctxt_sys_reg(ctxt, TTBR0_EL2));
+		write_sysreg_el1(val, SYS_TTBR0);
+		val = translate_tcr_el2_to_tcr_el1(ctxt_sys_reg(ctxt, TCR_EL2));
+		write_sysreg_el1(val, SYS_TCR);
+		val = translate_cnthctl_el2_to_cntkctl_el1(ctxt_sys_reg(ctxt, CNTHCTL_EL2));
+		write_sysreg_el1(val, SYS_CNTKCTL);
+	}
+
+	write_sysreg_el1(ctxt_sys_reg(ctxt, ESR_EL2),	SYS_ESR);
+	write_sysreg_el1(ctxt_sys_reg(ctxt, AFSR0_EL2),	SYS_AFSR0);
+	write_sysreg_el1(ctxt_sys_reg(ctxt, AFSR1_EL2),	SYS_AFSR1);
+	write_sysreg_el1(ctxt_sys_reg(ctxt, FAR_EL2),	SYS_FAR);
+	write_sysreg(ctxt_sys_reg(ctxt, SP_EL2),	sp_el1);
+	write_sysreg_el1(ctxt_sys_reg(ctxt, ELR_EL2),	SYS_ELR);
+
+	val = __fixup_spsr_el2_write(ctxt, ctxt_sys_reg(ctxt, SPSR_EL2));
+	write_sysreg_el1(val,	SYS_SPSR);
+}
 
 /*
  * VHE: Host and guest must save mdscr_el1 and sp_el0 (and the PC and
@@ -65,6 +155,7 @@ void kvm_vcpu_load_sysregs_vhe(struct kvm_vcpu *vcpu)
 {
 	struct kvm_cpu_context *guest_ctxt = &vcpu->arch.ctxt;
 	struct kvm_cpu_context *host_ctxt;
+	u64 mpidr;
 
 	host_ctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
 	__sysreg_save_user_state(host_ctxt);
@@ -77,7 +168,29 @@ void kvm_vcpu_load_sysregs_vhe(struct kvm_vcpu *vcpu)
 	 */
 	__sysreg32_restore_state(vcpu);
 	__sysreg_restore_user_state(guest_ctxt);
-	__sysreg_restore_el1_state(guest_ctxt);
+
+	if (unlikely(__is_hyp_ctxt(guest_ctxt))) {
+		__sysreg_restore_vel2_state(guest_ctxt);
+	} else {
+		if (vcpu_has_nv(vcpu)) {
+			/*
+			 * Only set VPIDR_EL2 for nested VMs, as this is the
+			 * only time it changes. We'll restore the MIDR_EL1
+			 * view on put.
+			 */
+			write_sysreg(ctxt_sys_reg(guest_ctxt, VPIDR_EL2), vpidr_el2);
+
+			/*
+			 * As we're restoring a nested guest, set the value
+			 * provided by the guest hypervisor.
+			 */
+			mpidr = ctxt_sys_reg(guest_ctxt, VMPIDR_EL2);
+		} else {
+			mpidr = ctxt_sys_reg(guest_ctxt, MPIDR_EL1);
+		}
+
+		__sysreg_restore_el1_state(guest_ctxt, mpidr);
+	}
 
 	vcpu->arch.sysregs_loaded_on_cpu = true;
 
@@ -103,12 +216,20 @@ void kvm_vcpu_put_sysregs_vhe(struct kvm_vcpu *vcpu)
 	host_ctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
 	deactivate_traps_vhe_put(vcpu);
 
-	__sysreg_save_el1_state(guest_ctxt);
+	if (unlikely(__is_hyp_ctxt(guest_ctxt)))
+		__sysreg_save_vel2_state(guest_ctxt);
+	else
+		__sysreg_save_el1_state(guest_ctxt);
+
 	__sysreg_save_user_state(guest_ctxt);
 	__sysreg32_save_state(vcpu);
 
 	/* Restore host user state */
 	__sysreg_restore_user_state(host_ctxt);
 
+	/* If leaving a nesting guest, restore the MIDR_EL1 default view */
+	if (vcpu_has_nv(vcpu))
+		write_sysreg(read_cpuid_id(),	vpidr_el2);
+
 	vcpu->arch.sysregs_loaded_on_cpu = false;
 }
-- 
2.30.2



* [PATCH v6 16/64] KVM: arm64: nv: Save/Restore vEL2 sysregs
@ 2022-01-28 12:18   ` Marc Zyngier
  0 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:18 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: kernel-team, Andre Przywara, Christoffer Dall, Chase Conklin,
	Russell King (Oracle),
	mihai.carabas, Ganapatrao Kulkarni

Whenever we need to restore the guest's system registers to the CPU, we
now need to take care of the EL2 system registers as well. Most of them
are accessed via traps only, but some have an immediate effect and also
a guest running in VHE mode would expect them to be accessible via their
EL1 encoding, which we do not trap.

For vEL2 we write the virtual EL2 registers with an identical format
directly into their EL1 counterpart, and translate the few registers
that have a different format so that they have the same effect on
execution when running a non-VHE guest hypervisor.

Based on an initial patch from Andre Przywara, rewritten many times
since.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h |   5 +-
 arch/arm64/kvm/hyp/nvhe/sysreg-sr.c        |   2 +-
 arch/arm64/kvm/hyp/vhe/sysreg-sr.c         | 125 ++++++++++++++++++++-
 3 files changed, 127 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h b/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h
index 7ecca8b07851..283f780f5f56 100644
--- a/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h
+++ b/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h
@@ -92,9 +92,10 @@ static inline void __sysreg_restore_user_state(struct kvm_cpu_context *ctxt)
 	write_sysreg(ctxt_sys_reg(ctxt, TPIDRRO_EL0),	tpidrro_el0);
 }
 
-static inline void __sysreg_restore_el1_state(struct kvm_cpu_context *ctxt)
+static inline void __sysreg_restore_el1_state(struct kvm_cpu_context *ctxt,
+					      u64 mpidr)
 {
-	write_sysreg(ctxt_sys_reg(ctxt, MPIDR_EL1),	vmpidr_el2);
+	write_sysreg(mpidr,				vmpidr_el2);
 	write_sysreg(ctxt_sys_reg(ctxt, CSSELR_EL1),	csselr_el1);
 
 	if (has_vhe() ||
diff --git a/arch/arm64/kvm/hyp/nvhe/sysreg-sr.c b/arch/arm64/kvm/hyp/nvhe/sysreg-sr.c
index 29305022bc04..dba101565de3 100644
--- a/arch/arm64/kvm/hyp/nvhe/sysreg-sr.c
+++ b/arch/arm64/kvm/hyp/nvhe/sysreg-sr.c
@@ -28,7 +28,7 @@ void __sysreg_save_state_nvhe(struct kvm_cpu_context *ctxt)
 
 void __sysreg_restore_state_nvhe(struct kvm_cpu_context *ctxt)
 {
-	__sysreg_restore_el1_state(ctxt);
+	__sysreg_restore_el1_state(ctxt, ctxt_sys_reg(ctxt, MPIDR_EL1));
 	__sysreg_restore_common_state(ctxt);
 	__sysreg_restore_user_state(ctxt);
 	__sysreg_restore_el2_return_state(ctxt);
diff --git a/arch/arm64/kvm/hyp/vhe/sysreg-sr.c b/arch/arm64/kvm/hyp/vhe/sysreg-sr.c
index 007a12dd4351..3e26a78d00c5 100644
--- a/arch/arm64/kvm/hyp/vhe/sysreg-sr.c
+++ b/arch/arm64/kvm/hyp/vhe/sysreg-sr.c
@@ -13,6 +13,96 @@
 #include <asm/kvm_asm.h>
 #include <asm/kvm_emulate.h>
 #include <asm/kvm_hyp.h>
+#include <asm/kvm_nested.h>
+
+static void __sysreg_save_vel2_state(struct kvm_cpu_context *ctxt)
+{
+	/* These registers are common with EL1 */
+	ctxt_sys_reg(ctxt, CSSELR_EL1)	= read_sysreg(csselr_el1);
+	ctxt_sys_reg(ctxt, PAR_EL1)	= read_sysreg(par_el1);
+	ctxt_sys_reg(ctxt, TPIDR_EL1)	= read_sysreg(tpidr_el1);
+
+	ctxt_sys_reg(ctxt, ESR_EL2)	= read_sysreg_el1(SYS_ESR);
+	ctxt_sys_reg(ctxt, AFSR0_EL2)	= read_sysreg_el1(SYS_AFSR0);
+	ctxt_sys_reg(ctxt, AFSR1_EL2)	= read_sysreg_el1(SYS_AFSR1);
+	ctxt_sys_reg(ctxt, FAR_EL2)	= read_sysreg_el1(SYS_FAR);
+	ctxt_sys_reg(ctxt, MAIR_EL2)	= read_sysreg_el1(SYS_MAIR);
+	ctxt_sys_reg(ctxt, VBAR_EL2)	= read_sysreg_el1(SYS_VBAR);
+	ctxt_sys_reg(ctxt, CONTEXTIDR_EL2) = read_sysreg_el1(SYS_CONTEXTIDR);
+	ctxt_sys_reg(ctxt, AMAIR_EL2)	= read_sysreg_el1(SYS_AMAIR);
+
+	/*
+	 * In VHE mode those registers are compatible between EL1 and EL2,
+	 * and the guest uses the _EL1 versions on the CPU naturally.
+	 * So we save them into their _EL2 versions here.
+	 * For nVHE mode we trap accesses to those registers, so our
+	 * _EL2 copy in sys_regs[] is always up-to-date and we don't need
+	 * to save anything here.
+	 */
+	if (__vcpu_el2_e2h_is_set(ctxt)) {
+		ctxt_sys_reg(ctxt, SCTLR_EL2)	= read_sysreg_el1(SYS_SCTLR);
+		ctxt_sys_reg(ctxt, CPTR_EL2)	= read_sysreg_el1(SYS_CPACR);
+		ctxt_sys_reg(ctxt, TTBR0_EL2)	= read_sysreg_el1(SYS_TTBR0);
+		ctxt_sys_reg(ctxt, TTBR1_EL2)	= read_sysreg_el1(SYS_TTBR1);
+		ctxt_sys_reg(ctxt, TCR_EL2)	= read_sysreg_el1(SYS_TCR);
+		ctxt_sys_reg(ctxt, CNTHCTL_EL2)	= read_sysreg_el1(SYS_CNTKCTL);
+	}
+
+	ctxt_sys_reg(ctxt, SP_EL2)	= read_sysreg(sp_el1);
+	ctxt_sys_reg(ctxt, ELR_EL2)	= read_sysreg_el1(SYS_ELR);
+	ctxt_sys_reg(ctxt, SPSR_EL2)	= __fixup_spsr_el2_read(ctxt, read_sysreg_el1(SYS_SPSR));
+}
+
+static void __sysreg_restore_vel2_state(struct kvm_cpu_context *ctxt)
+{
+	u64 val;
+
+	/* These registers are common with EL1 */
+	write_sysreg(ctxt_sys_reg(ctxt, CSSELR_EL1),	csselr_el1);
+	write_sysreg(ctxt_sys_reg(ctxt, PAR_EL1),	par_el1);
+	write_sysreg(ctxt_sys_reg(ctxt, TPIDR_EL1),	tpidr_el1);
+
+	write_sysreg(read_cpuid_id(),			vpidr_el2);
+	write_sysreg(ctxt_sys_reg(ctxt, MPIDR_EL1),	vmpidr_el2);
+	write_sysreg_el1(ctxt_sys_reg(ctxt, MAIR_EL2),	SYS_MAIR);
+	write_sysreg_el1(ctxt_sys_reg(ctxt, VBAR_EL2),	SYS_VBAR);
+	write_sysreg_el1(ctxt_sys_reg(ctxt, CONTEXTIDR_EL2),SYS_CONTEXTIDR);
+	write_sysreg_el1(ctxt_sys_reg(ctxt, AMAIR_EL2),	SYS_AMAIR);
+
+	if (__vcpu_el2_e2h_is_set(ctxt)) {
+		/*
+		 * In VHE mode those registers are compatible between
+		 * EL1 and EL2.
+		 */
+		write_sysreg_el1(ctxt_sys_reg(ctxt, SCTLR_EL2),	SYS_SCTLR);
+		write_sysreg_el1(ctxt_sys_reg(ctxt, CPTR_EL2),	SYS_CPACR);
+		write_sysreg_el1(ctxt_sys_reg(ctxt, TTBR0_EL2),	SYS_TTBR0);
+		write_sysreg_el1(ctxt_sys_reg(ctxt, TTBR1_EL2),	SYS_TTBR1);
+		write_sysreg_el1(ctxt_sys_reg(ctxt, TCR_EL2),	SYS_TCR);
+		write_sysreg_el1(ctxt_sys_reg(ctxt, CNTHCTL_EL2), SYS_CNTKCTL);
+	} else {
+		val = translate_sctlr_el2_to_sctlr_el1(ctxt_sys_reg(ctxt, SCTLR_EL2));
+		write_sysreg_el1(val, SYS_SCTLR);
+		val = translate_cptr_el2_to_cpacr_el1(ctxt_sys_reg(ctxt, CPTR_EL2));
+		write_sysreg_el1(val, SYS_CPACR);
+		val = translate_ttbr0_el2_to_ttbr0_el1(ctxt_sys_reg(ctxt, TTBR0_EL2));
+		write_sysreg_el1(val, SYS_TTBR0);
+		val = translate_tcr_el2_to_tcr_el1(ctxt_sys_reg(ctxt, TCR_EL2));
+		write_sysreg_el1(val, SYS_TCR);
+		val = translate_cnthctl_el2_to_cntkctl_el1(ctxt_sys_reg(ctxt, CNTHCTL_EL2));
+		write_sysreg_el1(val, SYS_CNTKCTL);
+	}
+
+	write_sysreg_el1(ctxt_sys_reg(ctxt, ESR_EL2),	SYS_ESR);
+	write_sysreg_el1(ctxt_sys_reg(ctxt, AFSR0_EL2),	SYS_AFSR0);
+	write_sysreg_el1(ctxt_sys_reg(ctxt, AFSR1_EL2),	SYS_AFSR1);
+	write_sysreg_el1(ctxt_sys_reg(ctxt, FAR_EL2),	SYS_FAR);
+	write_sysreg(ctxt_sys_reg(ctxt, SP_EL2),	sp_el1);
+	write_sysreg_el1(ctxt_sys_reg(ctxt, ELR_EL2),	SYS_ELR);
+
+	val = __fixup_spsr_el2_write(ctxt, ctxt_sys_reg(ctxt, SPSR_EL2));
+	write_sysreg_el1(val,	SYS_SPSR);
+}
 
 /*
  * VHE: Host and guest must save mdscr_el1 and sp_el0 (and the PC and
@@ -65,6 +155,7 @@ void kvm_vcpu_load_sysregs_vhe(struct kvm_vcpu *vcpu)
 {
 	struct kvm_cpu_context *guest_ctxt = &vcpu->arch.ctxt;
 	struct kvm_cpu_context *host_ctxt;
+	u64 mpidr;
 
 	host_ctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
 	__sysreg_save_user_state(host_ctxt);
@@ -77,7 +168,29 @@ void kvm_vcpu_load_sysregs_vhe(struct kvm_vcpu *vcpu)
 	 */
 	__sysreg32_restore_state(vcpu);
 	__sysreg_restore_user_state(guest_ctxt);
-	__sysreg_restore_el1_state(guest_ctxt);
+
+	if (unlikely(__is_hyp_ctxt(guest_ctxt))) {
+		__sysreg_restore_vel2_state(guest_ctxt);
+	} else {
+		if (vcpu_has_nv(vcpu)) {
+			/*
+			 * Only set VPIDR_EL2 for nested VMs, as this is the
+			 * only time it changes. We'll restore the MIDR_EL1
+			 * view on put.
+			 */
+			write_sysreg(ctxt_sys_reg(guest_ctxt, VPIDR_EL2), vpidr_el2);
+
+			/*
+			 * As we're restoring a nested guest, set the value
+			 * provided by the guest hypervisor.
+			 */
+			mpidr = ctxt_sys_reg(guest_ctxt, VMPIDR_EL2);
+		} else {
+			mpidr = ctxt_sys_reg(guest_ctxt, MPIDR_EL1);
+		}
+
+		__sysreg_restore_el1_state(guest_ctxt, mpidr);
+	}
 
 	vcpu->arch.sysregs_loaded_on_cpu = true;
 
@@ -103,12 +216,20 @@ void kvm_vcpu_put_sysregs_vhe(struct kvm_vcpu *vcpu)
 	host_ctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
 	deactivate_traps_vhe_put(vcpu);
 
-	__sysreg_save_el1_state(guest_ctxt);
+	if (unlikely(__is_hyp_ctxt(guest_ctxt)))
+		__sysreg_save_vel2_state(guest_ctxt);
+	else
+		__sysreg_save_el1_state(guest_ctxt);
+
 	__sysreg_save_user_state(guest_ctxt);
 	__sysreg32_save_state(vcpu);
 
 	/* Restore host user state */
 	__sysreg_restore_user_state(host_ctxt);
 
+	/* If leaving a nesting guest, restore the MIDR_EL1 default view */
+	if (vcpu_has_nv(vcpu))
+		write_sysreg(read_cpuid_id(),	vpidr_el2);
+
 	vcpu->arch.sysregs_loaded_on_cpu = false;
 }
-- 
2.30.2


* [PATCH v6 16/64] KVM: arm64: nv: Save/Restore vEL2 sysregs
@ 2022-01-28 12:18   ` Marc Zyngier
  0 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:18 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Jintack Lim, Haibo Xu,
	Ganapatrao Kulkarni, Chase Conklin, Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

Whenever we need to restore the guest's system registers to the CPU, we
now need to take care of the EL2 system registers as well. Most of them
are accessed via traps only, but some have an immediate effect and also
a guest running in VHE mode would expect them to be accessible via their
EL1 encoding, which we do not trap.

For vEL2 we write the virtual EL2 registers with an identical format
directly into their EL1 counterpart, and translate the few registers
that have a different format so that they have the same effect on
execution when running a non-VHE guest hypervisor.

Based on an initial patch from Andre Przywara, rewritten many times
since.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h |   5 +-
 arch/arm64/kvm/hyp/nvhe/sysreg-sr.c        |   2 +-
 arch/arm64/kvm/hyp/vhe/sysreg-sr.c         | 125 ++++++++++++++++++++-
 3 files changed, 127 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h b/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h
index 7ecca8b07851..283f780f5f56 100644
--- a/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h
+++ b/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h
@@ -92,9 +92,10 @@ static inline void __sysreg_restore_user_state(struct kvm_cpu_context *ctxt)
 	write_sysreg(ctxt_sys_reg(ctxt, TPIDRRO_EL0),	tpidrro_el0);
 }
 
-static inline void __sysreg_restore_el1_state(struct kvm_cpu_context *ctxt)
+static inline void __sysreg_restore_el1_state(struct kvm_cpu_context *ctxt,
+					      u64 mpidr)
 {
-	write_sysreg(ctxt_sys_reg(ctxt, MPIDR_EL1),	vmpidr_el2);
+	write_sysreg(mpidr,				vmpidr_el2);
 	write_sysreg(ctxt_sys_reg(ctxt, CSSELR_EL1),	csselr_el1);
 
 	if (has_vhe() ||
diff --git a/arch/arm64/kvm/hyp/nvhe/sysreg-sr.c b/arch/arm64/kvm/hyp/nvhe/sysreg-sr.c
index 29305022bc04..dba101565de3 100644
--- a/arch/arm64/kvm/hyp/nvhe/sysreg-sr.c
+++ b/arch/arm64/kvm/hyp/nvhe/sysreg-sr.c
@@ -28,7 +28,7 @@ void __sysreg_save_state_nvhe(struct kvm_cpu_context *ctxt)
 
 void __sysreg_restore_state_nvhe(struct kvm_cpu_context *ctxt)
 {
-	__sysreg_restore_el1_state(ctxt);
+	__sysreg_restore_el1_state(ctxt, ctxt_sys_reg(ctxt, MPIDR_EL1));
 	__sysreg_restore_common_state(ctxt);
 	__sysreg_restore_user_state(ctxt);
 	__sysreg_restore_el2_return_state(ctxt);
diff --git a/arch/arm64/kvm/hyp/vhe/sysreg-sr.c b/arch/arm64/kvm/hyp/vhe/sysreg-sr.c
index 007a12dd4351..3e26a78d00c5 100644
--- a/arch/arm64/kvm/hyp/vhe/sysreg-sr.c
+++ b/arch/arm64/kvm/hyp/vhe/sysreg-sr.c
@@ -13,6 +13,96 @@
 #include <asm/kvm_asm.h>
 #include <asm/kvm_emulate.h>
 #include <asm/kvm_hyp.h>
+#include <asm/kvm_nested.h>
+
+static void __sysreg_save_vel2_state(struct kvm_cpu_context *ctxt)
+{
+	/* These registers are common with EL1 */
+	ctxt_sys_reg(ctxt, CSSELR_EL1)	= read_sysreg(csselr_el1);
+	ctxt_sys_reg(ctxt, PAR_EL1)	= read_sysreg(par_el1);
+	ctxt_sys_reg(ctxt, TPIDR_EL1)	= read_sysreg(tpidr_el1);
+
+	ctxt_sys_reg(ctxt, ESR_EL2)	= read_sysreg_el1(SYS_ESR);
+	ctxt_sys_reg(ctxt, AFSR0_EL2)	= read_sysreg_el1(SYS_AFSR0);
+	ctxt_sys_reg(ctxt, AFSR1_EL2)	= read_sysreg_el1(SYS_AFSR1);
+	ctxt_sys_reg(ctxt, FAR_EL2)	= read_sysreg_el1(SYS_FAR);
+	ctxt_sys_reg(ctxt, MAIR_EL2)	= read_sysreg_el1(SYS_MAIR);
+	ctxt_sys_reg(ctxt, VBAR_EL2)	= read_sysreg_el1(SYS_VBAR);
+	ctxt_sys_reg(ctxt, CONTEXTIDR_EL2) = read_sysreg_el1(SYS_CONTEXTIDR);
+	ctxt_sys_reg(ctxt, AMAIR_EL2)	= read_sysreg_el1(SYS_AMAIR);
+
+	/*
+	 * In VHE mode those registers are compatible between EL1 and EL2,
+	 * and the guest uses the _EL1 versions on the CPU naturally.
+	 * So we save them into their _EL2 versions here.
+	 * For nVHE mode we trap accesses to those registers, so our
+	 * _EL2 copy in sys_regs[] is always up-to-date and we don't need
+	 * to save anything here.
+	 */
+	if (__vcpu_el2_e2h_is_set(ctxt)) {
+		ctxt_sys_reg(ctxt, SCTLR_EL2)	= read_sysreg_el1(SYS_SCTLR);
+		ctxt_sys_reg(ctxt, CPTR_EL2)	= read_sysreg_el1(SYS_CPACR);
+		ctxt_sys_reg(ctxt, TTBR0_EL2)	= read_sysreg_el1(SYS_TTBR0);
+		ctxt_sys_reg(ctxt, TTBR1_EL2)	= read_sysreg_el1(SYS_TTBR1);
+		ctxt_sys_reg(ctxt, TCR_EL2)	= read_sysreg_el1(SYS_TCR);
+		ctxt_sys_reg(ctxt, CNTHCTL_EL2)	= read_sysreg_el1(SYS_CNTKCTL);
+	}
+
+	ctxt_sys_reg(ctxt, SP_EL2)	= read_sysreg(sp_el1);
+	ctxt_sys_reg(ctxt, ELR_EL2)	= read_sysreg_el1(SYS_ELR);
+	ctxt_sys_reg(ctxt, SPSR_EL2)	= __fixup_spsr_el2_read(ctxt, read_sysreg_el1(SYS_SPSR));
+}
+
+static void __sysreg_restore_vel2_state(struct kvm_cpu_context *ctxt)
+{
+	u64 val;
+
+	/* These registers are common with EL1 */
+	write_sysreg(ctxt_sys_reg(ctxt, CSSELR_EL1),	csselr_el1);
+	write_sysreg(ctxt_sys_reg(ctxt, PAR_EL1),	par_el1);
+	write_sysreg(ctxt_sys_reg(ctxt, TPIDR_EL1),	tpidr_el1);
+
+	write_sysreg(read_cpuid_id(),			vpidr_el2);
+	write_sysreg(ctxt_sys_reg(ctxt, MPIDR_EL1),	vmpidr_el2);
+	write_sysreg_el1(ctxt_sys_reg(ctxt, MAIR_EL2),	SYS_MAIR);
+	write_sysreg_el1(ctxt_sys_reg(ctxt, VBAR_EL2),	SYS_VBAR);
+	write_sysreg_el1(ctxt_sys_reg(ctxt, CONTEXTIDR_EL2),SYS_CONTEXTIDR);
+	write_sysreg_el1(ctxt_sys_reg(ctxt, AMAIR_EL2),	SYS_AMAIR);
+
+	if (__vcpu_el2_e2h_is_set(ctxt)) {
+		/*
+		 * In VHE mode those registers are compatible between
+		 * EL1 and EL2.
+		 */
+		write_sysreg_el1(ctxt_sys_reg(ctxt, SCTLR_EL2),	SYS_SCTLR);
+		write_sysreg_el1(ctxt_sys_reg(ctxt, CPTR_EL2),	SYS_CPACR);
+		write_sysreg_el1(ctxt_sys_reg(ctxt, TTBR0_EL2),	SYS_TTBR0);
+		write_sysreg_el1(ctxt_sys_reg(ctxt, TTBR1_EL2),	SYS_TTBR1);
+		write_sysreg_el1(ctxt_sys_reg(ctxt, TCR_EL2),	SYS_TCR);
+		write_sysreg_el1(ctxt_sys_reg(ctxt, CNTHCTL_EL2), SYS_CNTKCTL);
+	} else {
+		val = translate_sctlr_el2_to_sctlr_el1(ctxt_sys_reg(ctxt, SCTLR_EL2));
+		write_sysreg_el1(val, SYS_SCTLR);
+		val = translate_cptr_el2_to_cpacr_el1(ctxt_sys_reg(ctxt, CPTR_EL2));
+		write_sysreg_el1(val, SYS_CPACR);
+		val = translate_ttbr0_el2_to_ttbr0_el1(ctxt_sys_reg(ctxt, TTBR0_EL2));
+		write_sysreg_el1(val, SYS_TTBR0);
+		val = translate_tcr_el2_to_tcr_el1(ctxt_sys_reg(ctxt, TCR_EL2));
+		write_sysreg_el1(val, SYS_TCR);
+		val = translate_cnthctl_el2_to_cntkctl_el1(ctxt_sys_reg(ctxt, CNTHCTL_EL2));
+		write_sysreg_el1(val, SYS_CNTKCTL);
+	}
+
+	write_sysreg_el1(ctxt_sys_reg(ctxt, ESR_EL2),	SYS_ESR);
+	write_sysreg_el1(ctxt_sys_reg(ctxt, AFSR0_EL2),	SYS_AFSR0);
+	write_sysreg_el1(ctxt_sys_reg(ctxt, AFSR1_EL2),	SYS_AFSR1);
+	write_sysreg_el1(ctxt_sys_reg(ctxt, FAR_EL2),	SYS_FAR);
+	write_sysreg(ctxt_sys_reg(ctxt, SP_EL2),	sp_el1);
+	write_sysreg_el1(ctxt_sys_reg(ctxt, ELR_EL2),	SYS_ELR);
+
+	val = __fixup_spsr_el2_write(ctxt, ctxt_sys_reg(ctxt, SPSR_EL2));
+	write_sysreg_el1(val,	SYS_SPSR);
+}
 
 /*
  * VHE: Host and guest must save mdscr_el1 and sp_el0 (and the PC and
@@ -65,6 +155,7 @@ void kvm_vcpu_load_sysregs_vhe(struct kvm_vcpu *vcpu)
 {
 	struct kvm_cpu_context *guest_ctxt = &vcpu->arch.ctxt;
 	struct kvm_cpu_context *host_ctxt;
+	u64 mpidr;
 
 	host_ctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
 	__sysreg_save_user_state(host_ctxt);
@@ -77,7 +168,29 @@ void kvm_vcpu_load_sysregs_vhe(struct kvm_vcpu *vcpu)
 	 */
 	__sysreg32_restore_state(vcpu);
 	__sysreg_restore_user_state(guest_ctxt);
-	__sysreg_restore_el1_state(guest_ctxt);
+
+	if (unlikely(__is_hyp_ctxt(guest_ctxt))) {
+		__sysreg_restore_vel2_state(guest_ctxt);
+	} else {
+		if (vcpu_has_nv(vcpu)) {
+			/*
+			 * Only set VPIDR_EL2 for nested VMs, as this is the
+			 * only time it changes. We'll restore the MIDR_EL1
+			 * view on put.
+			 */
+			write_sysreg(ctxt_sys_reg(guest_ctxt, VPIDR_EL2), vpidr_el2);
+
+			/*
+			 * As we're restoring a nested guest, set the value
+			 * provided by the guest hypervisor.
+			 */
+			mpidr = ctxt_sys_reg(guest_ctxt, VMPIDR_EL2);
+		} else {
+			mpidr = ctxt_sys_reg(guest_ctxt, MPIDR_EL1);
+		}
+
+		__sysreg_restore_el1_state(guest_ctxt, mpidr);
+	}
 
 	vcpu->arch.sysregs_loaded_on_cpu = true;
 
@@ -103,12 +216,20 @@ void kvm_vcpu_put_sysregs_vhe(struct kvm_vcpu *vcpu)
 	host_ctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
 	deactivate_traps_vhe_put(vcpu);
 
-	__sysreg_save_el1_state(guest_ctxt);
+	if (unlikely(__is_hyp_ctxt(guest_ctxt)))
+		__sysreg_save_vel2_state(guest_ctxt);
+	else
+		__sysreg_save_el1_state(guest_ctxt);
+
 	__sysreg_save_user_state(guest_ctxt);
 	__sysreg32_save_state(vcpu);
 
 	/* Restore host user state */
 	__sysreg_restore_user_state(host_ctxt);
 
+	/* If leaving a nesting guest, restore the MIDR_EL1 default view */
+	if (vcpu_has_nv(vcpu))
+		write_sysreg(read_cpuid_id(),	vpidr_el2);
+
 	vcpu->arch.sysregs_loaded_on_cpu = false;
 }
-- 
2.30.2



* [PATCH v6 17/64] KVM: arm64: nv: Emulate PSTATE.M for a guest hypervisor
  2022-01-28 12:18 ` Marc Zyngier
  (?)
@ 2022-01-28 12:18   ` Marc Zyngier
  -1 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:18 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Jintack Lim, Haibo Xu,
	Ganapatrao Kulkarni, Chase Conklin, Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

From: Christoffer Dall <christoffer.dall@arm.com>

We can no longer blindly copy the VCPU's PSTATE into SPSR_EL2 and return
to the guest and vice versa when taking an exception to the hypervisor,
because we emulate virtual EL2 in EL1 and therefore have to translate
the mode field from EL2 to EL1 and vice versa.

This requires keeping track of the state in which we enter the guest,
for which we transiently use a dedicated flag.
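
The translation itself is a pure relabelling of the mode bits. For
illustration, using the PSR_MODE encodings from the arm64 uapi headers:

	/*
	 * PSR_MODE_EL1t = 0x4, PSR_MODE_EL1h = 0x5,
	 * PSR_MODE_EL2t = 0x8, PSR_MODE_EL2h = 0x9.
	 *
	 * On entry, to_hw_pstate() maps the vEL2 modes onto EL1:
	 *	0x8 (EL2t) -> 0x4 (EL1t),  0x9 (EL2h) -> 0x5 (EL1h)
	 * On exit, early_exit_filter() undoes this whenever the
	 * KVM_ARM64_IN_HYP_CONTEXT flag was set on entry:
	 *	0x4 (EL1t) -> 0x8 (EL2t),  0x5 (EL1h) -> 0x9 (EL2h)
	 */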

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_host.h          |  1 +
 arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h | 19 ++++++++++++++++-
 arch/arm64/kvm/hyp/vhe/switch.c            | 24 ++++++++++++++++++++++
 3 files changed, 43 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 8fffe2888403..fa253f08e0fd 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -472,6 +472,7 @@ struct kvm_vcpu_arch {
 #define KVM_ARM64_DEBUG_STATE_SAVE_SPE	(1 << 12) /* Save SPE context if active  */
 #define KVM_ARM64_DEBUG_STATE_SAVE_TRBE	(1 << 13) /* Save TRBE context if active  */
 #define KVM_ARM64_FP_FOREIGN_FPSTATE	(1 << 14)
+#define KVM_ARM64_IN_HYP_CONTEXT	(1 << 15) /* Guest running in HYP context */
 
 #define KVM_GUESTDBG_VALID_MASK (KVM_GUESTDBG_ENABLE | \
 				 KVM_GUESTDBG_USE_SW_BP | \
diff --git a/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h b/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h
index 283f780f5f56..e3689c6ce4cc 100644
--- a/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h
+++ b/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h
@@ -157,9 +157,26 @@ static inline void __sysreg_restore_el1_state(struct kvm_cpu_context *ctxt,
 	write_sysreg_el1(ctxt_sys_reg(ctxt, SPSR_EL1),	SYS_SPSR);
 }
 
+/* Read the VCPU state's PSTATE, but translate (v)EL2 to EL1. */
+static inline u64 to_hw_pstate(const struct kvm_cpu_context *ctxt)
+{
+	u64 mode = ctxt->regs.pstate & (PSR_MODE_MASK | PSR_MODE32_BIT);
+
+	switch (mode) {
+	case PSR_MODE_EL2t:
+		mode = PSR_MODE_EL1t;
+		break;
+	case PSR_MODE_EL2h:
+		mode = PSR_MODE_EL1h;
+		break;
+	}
+
+	return (ctxt->regs.pstate & ~(PSR_MODE_MASK | PSR_MODE32_BIT)) | mode;
+}
+
 static inline void __sysreg_restore_el2_return_state(struct kvm_cpu_context *ctxt)
 {
-	u64 pstate = ctxt->regs.pstate;
+	u64 pstate = to_hw_pstate(ctxt);
 	u64 mode = pstate & PSR_AA32_MODE_MASK;
 
 	/*
diff --git a/arch/arm64/kvm/hyp/vhe/switch.c b/arch/arm64/kvm/hyp/vhe/switch.c
index 11d053fdd604..82ddaebe66de 100644
--- a/arch/arm64/kvm/hyp/vhe/switch.c
+++ b/arch/arm64/kvm/hyp/vhe/switch.c
@@ -113,6 +113,25 @@ static const exit_handler_fn *kvm_get_exit_handler_array(struct kvm_vcpu *vcpu)
 
 static void early_exit_filter(struct kvm_vcpu *vcpu, u64 *exit_code)
 {
+	/*
+	 * If we were in HYP context on entry, adjust the PSTATE view
+	 * so that the usual helpers work correctly.
+	 */
+	if (unlikely(vcpu->arch.flags & KVM_ARM64_IN_HYP_CONTEXT)) {
+		u64 mode = *vcpu_cpsr(vcpu) & (PSR_MODE_MASK | PSR_MODE32_BIT);
+
+		switch (mode) {
+		case PSR_MODE_EL1t:
+			mode = PSR_MODE_EL2t;
+			break;
+		case PSR_MODE_EL1h:
+			mode = PSR_MODE_EL2h;
+			break;
+		}
+
+		*vcpu_cpsr(vcpu) &= ~(PSR_MODE_MASK | PSR_MODE32_BIT);
+		*vcpu_cpsr(vcpu) |= mode;
+	}
 }
 
 /* Switch to the guest for VHE systems running in EL2 */
@@ -147,6 +166,11 @@ static int __kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
 	sysreg_restore_guest_state_vhe(guest_ctxt);
 	__debug_switch_to_guest(vcpu);
 
+	if (is_hyp_ctxt(vcpu))
+		vcpu->arch.flags |= KVM_ARM64_IN_HYP_CONTEXT;
+	else
+		vcpu->arch.flags &= ~KVM_ARM64_IN_HYP_CONTEXT;
+
 	do {
 		/* Jump in the fire! */
 		exit_code = __guest_enter(vcpu);
-- 
2.30.2



* [PATCH v6 18/64] KVM: arm64: nv: Trap EL1 VM register accesses in virtual EL2
  2022-01-28 12:18 ` Marc Zyngier
  (?)
@ 2022-01-28 12:18   ` Marc Zyngier
  -1 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:18 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Jintack Lim, Haibo Xu,
	Ganapatrao Kulkarni, Chase Conklin, Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

From: Christoffer Dall <christoffer.dall@linaro.org>

When running in virtual EL2 mode, we actually run the hardware in EL1
and therefore have to use the EL1 registers to ensure correct operation.

By setting the HCR.TVM and HCR.TRVM bits we ensure that the virtual EL2 mode
doesn't shoot itself in the foot when setting up what it believes to be
a different mode's system register state (for example when preparing to
switch to a VM).

We can leverage the existing sysregs infrastructure to support trapped
accesses to these registers.
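
For reference, a condensed sketch (illustrative only, helper name made up, in
terms of accessors this series already provides) of which registers the traps
cover and when they are turned on:

	/*
	 * Illustrative sketch. Registers covered by HCR_EL2.TVM (writes) and
	 * HCR_EL2.TRVM (reads), per the architecture: SCTLR_EL1, TTBR0_EL1,
	 * TTBR1_EL1, TCR_EL1, ESR_EL1, FAR_EL1, AFSR0_EL1, AFSR1_EL1,
	 * MAIR_EL1, AMAIR_EL1 and CONTEXTIDR_EL1.
	 */
	static u64 vel2_vm_trap_bits(const struct kvm_vcpu *vcpu)
	{
		/* Only a non-VHE guest hypervisor needs its EL1 VM regs emulated. */
		if (vcpu_is_el2(vcpu) && !vcpu_el2_e2h_is_set(vcpu))
			return HCR_TVM | HCR_TRVM;

		return 0;
	}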

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/hyp/include/hyp/switch.h |  4 +---
 arch/arm64/kvm/hyp/nvhe/switch.c        |  2 +-
 arch/arm64/kvm/hyp/vhe/switch.c         |  7 ++++++-
 arch/arm64/kvm/sys_regs.c               | 19 ++++++++++++++++---
 4 files changed, 24 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h b/arch/arm64/kvm/hyp/include/hyp/switch.h
index 58e14f8ead23..49c3b9eb09d7 100644
--- a/arch/arm64/kvm/hyp/include/hyp/switch.h
+++ b/arch/arm64/kvm/hyp/include/hyp/switch.h
@@ -110,10 +110,8 @@ static inline void __deactivate_traps_common(struct kvm_vcpu *vcpu)
 		write_sysreg(0, pmuserenr_el0);
 }
 
-static inline void ___activate_traps(struct kvm_vcpu *vcpu)
+static inline void ___activate_traps(struct kvm_vcpu *vcpu, u64 hcr)
 {
-	u64 hcr = vcpu->arch.hcr_el2;
-
 	if (cpus_have_final_cap(ARM64_WORKAROUND_CAVIUM_TX2_219_TVM))
 		hcr |= HCR_TVM;
 
diff --git a/arch/arm64/kvm/hyp/nvhe/switch.c b/arch/arm64/kvm/hyp/nvhe/switch.c
index 6410d21d8695..61a5627fd456 100644
--- a/arch/arm64/kvm/hyp/nvhe/switch.c
+++ b/arch/arm64/kvm/hyp/nvhe/switch.c
@@ -38,7 +38,7 @@ static void __activate_traps(struct kvm_vcpu *vcpu)
 {
 	u64 val;
 
-	___activate_traps(vcpu);
+	___activate_traps(vcpu, vcpu->arch.hcr_el2);
 	__activate_traps_common(vcpu);
 
 	val = vcpu->arch.cptr_el2;
diff --git a/arch/arm64/kvm/hyp/vhe/switch.c b/arch/arm64/kvm/hyp/vhe/switch.c
index 82ddaebe66de..6ed9e4893a02 100644
--- a/arch/arm64/kvm/hyp/vhe/switch.c
+++ b/arch/arm64/kvm/hyp/vhe/switch.c
@@ -32,9 +32,14 @@ DEFINE_PER_CPU(unsigned long, kvm_hyp_vector);
 
 static void __activate_traps(struct kvm_vcpu *vcpu)
 {
+	u64 hcr = vcpu->arch.hcr_el2;
 	u64 val;
 
-	___activate_traps(vcpu);
+	/* Trap VM sysreg accesses if an EL2 guest is not using VHE. */
+	if (vcpu_is_el2(vcpu) && !vcpu_el2_e2h_is_set(vcpu))
+		hcr |= HCR_TVM | HCR_TRVM;
+
+	___activate_traps(vcpu, hcr);
 
 	val = read_sysreg(cpacr_el1);
 	val |= CPACR_EL1_TTA;
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 102bc4906723..9d3520f1d17a 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -322,8 +322,15 @@ static void get_access_mask(const struct sys_reg_desc *r, u64 *mask, u64 *shift)
 
 /*
  * Generic accessor for VM registers. Only called as long as HCR_TVM
- * is set. If the guest enables the MMU, we stop trapping the VM
- * sys_regs and leave it in complete control of the caches.
+ * is set.
+ *
+ * This is set in two cases: either (1) we're running at vEL2, or (2)
+ * we're running at EL1 and the guest has its MMU off.
+ *
+ * (1) TVM/TRVM is set, as we need to virtualise some of the VM
+ * registers for the guest hypervisor
+ * (2) Once the guest enables the MMU, we stop trapping the VM sys_regs
+ * and leave it in complete control of the caches.
  */
 static bool access_vm_reg(struct kvm_vcpu *vcpu,
 			  struct sys_reg_params *p,
@@ -332,7 +339,13 @@ static bool access_vm_reg(struct kvm_vcpu *vcpu,
 	bool was_enabled = vcpu_has_cache_enabled(vcpu);
 	u64 val, mask, shift;
 
-	BUG_ON(!p->is_write);
+	/* We don't expect TRVM on the host */
+	BUG_ON(!vcpu_is_el2(vcpu) && !p->is_write);
+
+	if (!p->is_write) {
+		p->regval = vcpu_read_sys_reg(vcpu, r->reg);
+		return true;
+	}
 
 	get_access_mask(r, &mask, &shift);
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 378+ messages in thread

* [PATCH v6 19/64] KVM: arm64: nv: Trap SPSR_EL1, ELR_EL1 and VBAR_EL1 from virtual EL2
  2022-01-28 12:18 ` Marc Zyngier
  (?)
@ 2022-01-28 12:18   ` Marc Zyngier
  -1 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:18 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Jintack Lim, Haibo Xu,
	Ganapatrao Kulkarni, Chase Conklin, Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

From: Jintack Lim <jintack.lim@linaro.org>

For the same reason we trap virtual memory register accesses at virtual
EL2, we need to trap SPSR_EL1, ELR_EL1 and VBAR_EL1 accesses. ARMv8.3
introduces the HCR_EL2.NV1 bit, which makes it possible to trap those
register accesses performed at EL1. Do not set this bit until the rest
of the nesting support is complete.
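
For reference, the FEAT_NV controls involved live in HCR_EL2 (architectural
bit positions, shown here for illustration only; the series defines and
enables them separately once the rest of the infrastructure is in place):

	/* Illustrative sketch of the architectural bit positions. */
	#define HCR_NV	(1UL << 42)	/* EL1 accesses to EL2 registers trap instead of UNDEF */
	#define HCR_NV1	(1UL << 43)	/* with NV, also trap SPSR_EL1, ELR_EL1 and VBAR_EL1 */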

Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/sys_regs.c | 29 ++++++++++++++++++++++++++++-
 1 file changed, 28 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 9d3520f1d17a..4f2bcc1e0c25 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1650,6 +1650,30 @@ static bool access_sp_el1(struct kvm_vcpu *vcpu,
 	return true;
 }
 
+static bool access_elr(struct kvm_vcpu *vcpu,
+		       struct sys_reg_params *p,
+		       const struct sys_reg_desc *r)
+{
+	if (p->is_write)
+		vcpu_write_sys_reg(vcpu, p->regval, ELR_EL1);
+	else
+		p->regval = vcpu_read_sys_reg(vcpu, ELR_EL1);
+
+	return true;
+}
+
+static bool access_spsr(struct kvm_vcpu *vcpu,
+			struct sys_reg_params *p,
+			const struct sys_reg_desc *r)
+{
+	if (p->is_write)
+		__vcpu_sys_reg(vcpu, SPSR_EL1) = p->regval;
+	else
+		p->regval = __vcpu_sys_reg(vcpu, SPSR_EL1);
+
+	return true;
+}
+
 static bool access_spsr_el2(struct kvm_vcpu *vcpu,
 			    struct sys_reg_params *p,
 			    const struct sys_reg_desc *r)
@@ -1812,6 +1836,9 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 	PTRAUTH_KEY(APDB),
 	PTRAUTH_KEY(APGA),
 
+	{ SYS_DESC(SYS_SPSR_EL1), access_spsr},
+	{ SYS_DESC(SYS_ELR_EL1), access_elr},
+
 	{ SYS_DESC(SYS_AFSR0_EL1), access_vm_reg, reset_unknown, AFSR0_EL1 },
 	{ SYS_DESC(SYS_AFSR1_EL1), access_vm_reg, reset_unknown, AFSR1_EL1 },
 	{ SYS_DESC(SYS_ESR_EL1), access_vm_reg, reset_unknown, ESR_EL1 },
@@ -1859,7 +1886,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 	{ SYS_DESC(SYS_LORC_EL1), trap_loregion },
 	{ SYS_DESC(SYS_LORID_EL1), trap_loregion },
 
-	{ SYS_DESC(SYS_VBAR_EL1), NULL, reset_val, VBAR_EL1, 0 },
+	{ SYS_DESC(SYS_VBAR_EL1), access_rw, reset_val, VBAR_EL1, 0 },
 	{ SYS_DESC(SYS_DISR_EL1), NULL, reset_val, DISR_EL1, 0 },
 
 	{ SYS_DESC(SYS_ICC_IAR0_EL1), write_to_read_only },
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 378+ messages in thread

* [PATCH v6 20/64] KVM: arm64: nv: Trap CPACR_EL1 access in virtual EL2
  2022-01-28 12:18 ` Marc Zyngier
  (?)
@ 2022-01-28 12:18   ` Marc Zyngier
  -1 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:18 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Jintack Lim, Haibo Xu,
	Ganapatrao Kulkarni, Chase Conklin, Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

From: Jintack Lim <jintack.lim@linaro.org>

For the same reason we trap virtual memory register accesses in virtual
EL2, we trap CPACR_EL1 accesses too; we allow the virtual EL2 mode to
access EL1 system register state instead of the virtual EL2 one.
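
For illustration only (not part of the patch), the control being set below and
the resulting flow when the guest hypervisor touches CPACR_EL1:

	/* CPTR_EL2.TCPAC (bit 31): EL1 accesses to CPACR_EL1 trap to EL2. */
	#define CPTR_EL2_TCPAC	(1U << 31)

	/*
	 * Illustrative flow with TCPAC set for a non-VHE guest hypervisor:
	 *   guest: msr cpacr_el1, x0
	 *     -> trap taken to the host
	 *     -> sys_regs.c routes it to access_rw(), which updates the
	 *        vcpu's stored CPACR_EL1 copy
	 *     -> guest resumes; the hardware register is left untouched
	 */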

Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/hyp/vhe/switch.c | 3 +++
 arch/arm64/kvm/sys_regs.c       | 2 +-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/hyp/vhe/switch.c b/arch/arm64/kvm/hyp/vhe/switch.c
index 6ed9e4893a02..1e6a00e7bfb3 100644
--- a/arch/arm64/kvm/hyp/vhe/switch.c
+++ b/arch/arm64/kvm/hyp/vhe/switch.c
@@ -64,6 +64,9 @@ static void __activate_traps(struct kvm_vcpu *vcpu)
 		__activate_traps_fpsimd32(vcpu);
 	}
 
+	if (vcpu_is_el2(vcpu) && !vcpu_el2_e2h_is_set(vcpu))
+		val |= CPTR_EL2_TCPAC;
+	
 	write_sysreg(val, cpacr_el1);
 
 	write_sysreg(__this_cpu_read(kvm_hyp_vector), vbar_el1);
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 4f2bcc1e0c25..7f074a7f6eb3 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1819,7 +1819,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 
 	{ SYS_DESC(SYS_SCTLR_EL1), access_vm_reg, reset_val, SCTLR_EL1, 0x00C50078 },
 	{ SYS_DESC(SYS_ACTLR_EL1), access_actlr, reset_actlr, ACTLR_EL1 },
-	{ SYS_DESC(SYS_CPACR_EL1), NULL, reset_val, CPACR_EL1, 0 },
+	{ SYS_DESC(SYS_CPACR_EL1), access_rw, reset_val, CPACR_EL1, 0 },
 
 	MTE_REG(RGSR_EL1),
 	MTE_REG(GCR_EL1),
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 378+ messages in thread

* [PATCH v6 21/64] KVM: arm64: nv: Handle PSCI call via smc from the guest
  2022-01-28 12:18 ` Marc Zyngier
  (?)
@ 2022-01-28 12:18   ` Marc Zyngier
  -1 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:18 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Jintack Lim, Haibo Xu,
	Ganapatrao Kulkarni, Chase Conklin, Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

From: Jintack Lim <jintack.lim@linaro.org>

VMs used to execute hvc #0 for PSCI calls when EL3 is not implemented.
However, when we come to provide the virtual EL2 mode to the VM, the
host OS inside the VM calls kvm_call_hyp(), which is also hvc #0, so
it is hard to differentiate between the two from the host hypervisor's
point of view.

So, let the VM execute the smc instruction for PSCI calls. On ARMv8.3,
even if EL3 is not implemented, an smc instruction executed at non-secure
EL1 is trapped to EL2 if HCR_EL2.TSC==1, rather than being treated as
UNDEFINED. The host hypervisor can therefore handle such PSCI calls
without any confusion.
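
As a usage illustration (a sketch, not part of this patch; the helper name is
made up), a guest whose firmware tables advertise the SMC conduit for PSCI
ends up issuing something like the following, which the handle_smc() change
below now routes to kvm_hvc_call_handler():

	#include <linux/arm-smccc.h>
	#include <linux/psci.h>

	/* Illustrative sketch of a guest-side PSCI call via SMC. */
	static u32 guest_psci_version(void)
	{
		struct arm_smccc_res res;

		/* Traps to the host via HCR_EL2.TSC; imm == 0, so treated as PSCI. */
		arm_smccc_smc(PSCI_0_2_FN_PSCI_VERSION, 0, 0, 0, 0, 0, 0, 0, &res);

		return (u32)res.a0;
	}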

Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/handle_exit.c | 24 ++++++++++++++++++++++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index 2bbeed8c9786..0cedef6e0d80 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -62,6 +62,8 @@ static int handle_hvc(struct kvm_vcpu *vcpu)
 
 static int handle_smc(struct kvm_vcpu *vcpu)
 {
+	int ret;
+
 	/*
 	 * "If an SMC instruction executed at Non-secure EL1 is
 	 * trapped to EL2 because HCR_EL2.TSC is 1, the exception is a
@@ -69,10 +71,28 @@ static int handle_smc(struct kvm_vcpu *vcpu)
 	 *
 	 * We need to advance the PC after the trap, as it would
 	 * otherwise return to the same address...
+	 *
+	 * If imm is non-zero, it's not defined, so just skip it.
+	 */
+	if (kvm_vcpu_hvc_get_imm(vcpu)) {
+		vcpu_set_reg(vcpu, 0, ~0UL);
+		kvm_incr_pc(vcpu);
+		return 1;
+	}
+
+	/*
+	 * If imm is zero, it's a psci call.
+	 * Note that on ARMv8.3, even if EL3 is not implemented, SMC executed
+	 * at Non-secure EL1 is trapped to EL2 if HCR_EL2.TSC==1, rather than
+	 * being treated as UNDEFINED.
 	 */
-	vcpu_set_reg(vcpu, 0, ~0UL);
+	ret = kvm_hvc_call_handler(vcpu);
+	if (ret < 0)
+		vcpu_set_reg(vcpu, 0, ~0UL);
+
 	kvm_incr_pc(vcpu);
-	return 1;
+
+	return ret;
 }
 
 /*
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 378+ messages in thread

* [PATCH v6 22/64] KVM: arm64: nv: Respect virtual HCR_EL2.TWX setting
  2022-01-28 12:18 ` Marc Zyngier
  (?)
@ 2022-01-28 12:18   ` Marc Zyngier
  -1 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:18 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Jintack Lim, Haibo Xu,
	Ganapatrao Kulkarni, Chase Conklin, Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

From: Jintack Lim <jintack.lim@linaro.org>

Forward exceptions due to WFI or WFE instructions to the virtual EL2 if
they are not coming from the virtual EL2 and the corresponding virtual
HCR_EL2 bit (TWI for WFI, TWE for WFE) is set.
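
For illustration, the forwarding decision added in nested.c below boils down
to the following predicate (a condensed restatement with a made-up helper
name, not extra functionality):

	/* Illustrative sketch: reflect this WFx trap to the guest hypervisor? */
	static bool wfx_should_forward(u64 vhcr_el2, bool is_wfe, bool from_vel2)
	{
		if (from_vel2)
			return false;	/* vEL2 itself executed WFx: the host handles it */

		return is_wfe ? !!(vhcr_el2 & HCR_TWE) : !!(vhcr_el2 & HCR_TWI);
	}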

Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_nested.h |  2 ++
 arch/arm64/kvm/Makefile             |  2 +-
 arch/arm64/kvm/handle_exit.c        | 11 ++++++++++-
 arch/arm64/kvm/nested.c             | 28 ++++++++++++++++++++++++++++
 4 files changed, 41 insertions(+), 2 deletions(-)
 create mode 100644 arch/arm64/kvm/nested.c

diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
index 5a85be6d8eb3..79d382fa02ea 100644
--- a/arch/arm64/include/asm/kvm_nested.h
+++ b/arch/arm64/include/asm/kvm_nested.h
@@ -65,4 +65,6 @@ static inline u64 translate_cnthctl_el2_to_cntkctl_el1(u64 cnthctl)
 		(cnthctl & (CNTHCTL_EVNTI | CNTHCTL_EVNTDIR | CNTHCTL_EVNTEN)));
 }
 
+int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe);
+
 #endif /* __ARM64_KVM_NESTED_H */
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index b67c4ebd72b1..dbaf42ff65f1 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -14,7 +14,7 @@ kvm-y += arm.o mmu.o mmio.o psci.o hypercalls.o pvtime.o \
 	 inject_fault.o va_layout.o handle_exit.o \
 	 guest.o debug.o reset.o sys_regs.o \
 	 vgic-sys-reg-v3.o fpsimd.o pmu.o pkvm.o \
-	 arch_timer.o trng.o emulate-nested.o \
+	 arch_timer.o trng.o emulate-nested.o nested.o \
 	 vgic/vgic.o vgic/vgic-init.o \
 	 vgic/vgic-irqfd.o vgic/vgic-v2.o \
 	 vgic/vgic-v3.o vgic/vgic-v4.o \
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index 0cedef6e0d80..a1b1bbf3d598 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -119,7 +119,16 @@ static int handle_no_fpsimd(struct kvm_vcpu *vcpu)
  */
 static int kvm_handle_wfx(struct kvm_vcpu *vcpu)
 {
-	if (kvm_vcpu_get_esr(vcpu) & ESR_ELx_WFx_ISS_WFE) {
+	bool is_wfe = !!(kvm_vcpu_get_esr(vcpu) & ESR_ELx_WFx_ISS_WFE);
+
+	if (vcpu_has_nv(vcpu)) {
+		int ret = handle_wfx_nested(vcpu, is_wfe);
+
+		if (ret != -EINVAL)
+			return ret;
+	}
+
+	if (is_wfe) {
 		trace_kvm_wfx_arm64(*vcpu_pc(vcpu), true);
 		vcpu->stat.wfe_exit_stat++;
 		kvm_vcpu_on_spin(vcpu, vcpu_mode_priv(vcpu));
diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
new file mode 100644
index 000000000000..5e1104f8e765
--- /dev/null
+++ b/arch/arm64/kvm/nested.c
@@ -0,0 +1,28 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2017 - Columbia University and Linaro Ltd.
+ * Author: Jintack Lim <jintack.lim@linaro.org>
+ */
+
+#include <linux/kvm.h>
+#include <linux/kvm_host.h>
+
+#include <asm/kvm_emulate.h>
+
+/*
+ * Inject wfx to the virtual EL2 if this is not from the virtual EL2 and
+ * the virtual HCR_EL2.TWX is set. Otherwise, let the host hypervisor
+ * handle this.
+ */
+int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe)
+{
+	u64 hcr_el2 = __vcpu_sys_reg(vcpu, HCR_EL2);
+
+	if (vcpu_is_el2(vcpu))
+		return -EINVAL;
+
+	if ((is_wfe && (hcr_el2 & HCR_TWE)) || (!is_wfe && (hcr_el2 & HCR_TWI)))
+		return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_esr(vcpu));
+
+	return -EINVAL;
+}
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 378+ messages in thread

* [PATCH v6 22/64] KVM: arm64: nv: Respect virtual HCR_EL2.TWX setting
@ 2022-01-28 12:18   ` Marc Zyngier
  0 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:18 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: kernel-team, Andre Przywara, Christoffer Dall, Chase Conklin,
	Russell King (Oracle),
	mihai.carabas, Ganapatrao Kulkarni

From: Jintack Lim <jintack.lim@linaro.org>

Forward exceptions due to WFI or WFE instructions to the virtual EL2 if
they are not coming from the virtual EL2 and virtual HCR_EL2.TWX is set.

Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_nested.h |  2 ++
 arch/arm64/kvm/Makefile             |  2 +-
 arch/arm64/kvm/handle_exit.c        | 11 ++++++++++-
 arch/arm64/kvm/nested.c             | 28 ++++++++++++++++++++++++++++
 4 files changed, 41 insertions(+), 2 deletions(-)
 create mode 100644 arch/arm64/kvm/nested.c

diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
index 5a85be6d8eb3..79d382fa02ea 100644
--- a/arch/arm64/include/asm/kvm_nested.h
+++ b/arch/arm64/include/asm/kvm_nested.h
@@ -65,4 +65,6 @@ static inline u64 translate_cnthctl_el2_to_cntkctl_el1(u64 cnthctl)
 		(cnthctl & (CNTHCTL_EVNTI | CNTHCTL_EVNTDIR | CNTHCTL_EVNTEN)));
 }
 
+int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe);
+
 #endif /* __ARM64_KVM_NESTED_H */
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index b67c4ebd72b1..dbaf42ff65f1 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -14,7 +14,7 @@ kvm-y += arm.o mmu.o mmio.o psci.o hypercalls.o pvtime.o \
 	 inject_fault.o va_layout.o handle_exit.o \
 	 guest.o debug.o reset.o sys_regs.o \
 	 vgic-sys-reg-v3.o fpsimd.o pmu.o pkvm.o \
-	 arch_timer.o trng.o emulate-nested.o \
+	 arch_timer.o trng.o emulate-nested.o nested.o \
 	 vgic/vgic.o vgic/vgic-init.o \
 	 vgic/vgic-irqfd.o vgic/vgic-v2.o \
 	 vgic/vgic-v3.o vgic/vgic-v4.o \
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index 0cedef6e0d80..a1b1bbf3d598 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -119,7 +119,16 @@ static int handle_no_fpsimd(struct kvm_vcpu *vcpu)
  */
 static int kvm_handle_wfx(struct kvm_vcpu *vcpu)
 {
-	if (kvm_vcpu_get_esr(vcpu) & ESR_ELx_WFx_ISS_WFE) {
+	bool is_wfe = !!(kvm_vcpu_get_esr(vcpu) & ESR_ELx_WFx_ISS_WFE);
+
+	if (vcpu_has_nv(vcpu)) {
+		int ret = handle_wfx_nested(vcpu, is_wfe);
+
+		if (ret != -EINVAL)
+			return ret;
+	}
+
+	if (is_wfe) {
 		trace_kvm_wfx_arm64(*vcpu_pc(vcpu), true);
 		vcpu->stat.wfe_exit_stat++;
 		kvm_vcpu_on_spin(vcpu, vcpu_mode_priv(vcpu));
diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
new file mode 100644
index 000000000000..5e1104f8e765
--- /dev/null
+++ b/arch/arm64/kvm/nested.c
@@ -0,0 +1,28 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2017 - Columbia University and Linaro Ltd.
+ * Author: Jintack Lim <jintack.lim@linaro.org>
+ */
+
+#include <linux/kvm.h>
+#include <linux/kvm_host.h>
+
+#include <asm/kvm_emulate.h>
+
+/*
+ * Inject wfx to the virtual EL2 if this is not from the virtual EL2 and
+ * the virtual HCR_EL2.TWX is set. Otherwise, let the host hypervisor
+ * handle this.
+ */
+int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe)
+{
+	u64 hcr_el2 = __vcpu_sys_reg(vcpu, HCR_EL2);
+
+	if (vcpu_is_el2(vcpu))
+		return -EINVAL;
+
+	if ((is_wfe && (hcr_el2 & HCR_TWE)) || (!is_wfe && (hcr_el2 & HCR_TWI)))
+		return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_esr(vcpu));
+
+	return -EINVAL;
+}
-- 
2.30.2

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 378+ messages in thread

* [PATCH v6 22/64] KVM: arm64: nv: Respect virtual HCR_EL2.TWX setting
@ 2022-01-28 12:18   ` Marc Zyngier
  0 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:18 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Jintack Lim, Haibo Xu,
	Ganapatrao Kulkarni, Chase Conklin, Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

From: Jintack Lim <jintack.lim@linaro.org>

Forward exceptions due to WFI or WFE instructions to the virtual EL2 if
they are not coming from the virtual EL2 and virtual HCR_EL2.TWX is set.

Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_nested.h |  2 ++
 arch/arm64/kvm/Makefile             |  2 +-
 arch/arm64/kvm/handle_exit.c        | 11 ++++++++++-
 arch/arm64/kvm/nested.c             | 28 ++++++++++++++++++++++++++++
 4 files changed, 41 insertions(+), 2 deletions(-)
 create mode 100644 arch/arm64/kvm/nested.c

diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
index 5a85be6d8eb3..79d382fa02ea 100644
--- a/arch/arm64/include/asm/kvm_nested.h
+++ b/arch/arm64/include/asm/kvm_nested.h
@@ -65,4 +65,6 @@ static inline u64 translate_cnthctl_el2_to_cntkctl_el1(u64 cnthctl)
 		(cnthctl & (CNTHCTL_EVNTI | CNTHCTL_EVNTDIR | CNTHCTL_EVNTEN)));
 }
 
+int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe);
+
 #endif /* __ARM64_KVM_NESTED_H */
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index b67c4ebd72b1..dbaf42ff65f1 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -14,7 +14,7 @@ kvm-y += arm.o mmu.o mmio.o psci.o hypercalls.o pvtime.o \
 	 inject_fault.o va_layout.o handle_exit.o \
 	 guest.o debug.o reset.o sys_regs.o \
 	 vgic-sys-reg-v3.o fpsimd.o pmu.o pkvm.o \
-	 arch_timer.o trng.o emulate-nested.o \
+	 arch_timer.o trng.o emulate-nested.o nested.o \
 	 vgic/vgic.o vgic/vgic-init.o \
 	 vgic/vgic-irqfd.o vgic/vgic-v2.o \
 	 vgic/vgic-v3.o vgic/vgic-v4.o \
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index 0cedef6e0d80..a1b1bbf3d598 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -119,7 +119,16 @@ static int handle_no_fpsimd(struct kvm_vcpu *vcpu)
  */
 static int kvm_handle_wfx(struct kvm_vcpu *vcpu)
 {
-	if (kvm_vcpu_get_esr(vcpu) & ESR_ELx_WFx_ISS_WFE) {
+	bool is_wfe = !!(kvm_vcpu_get_esr(vcpu) & ESR_ELx_WFx_ISS_WFE);
+
+	if (vcpu_has_nv(vcpu)) {
+		int ret = handle_wfx_nested(vcpu, is_wfe);
+
+		if (ret != -EINVAL)
+			return ret;
+	}
+
+	if (is_wfe) {
 		trace_kvm_wfx_arm64(*vcpu_pc(vcpu), true);
 		vcpu->stat.wfe_exit_stat++;
 		kvm_vcpu_on_spin(vcpu, vcpu_mode_priv(vcpu));
diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
new file mode 100644
index 000000000000..5e1104f8e765
--- /dev/null
+++ b/arch/arm64/kvm/nested.c
@@ -0,0 +1,28 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2017 - Columbia University and Linaro Ltd.
+ * Author: Jintack Lim <jintack.lim@linaro.org>
+ */
+
+#include <linux/kvm.h>
+#include <linux/kvm_host.h>
+
+#include <asm/kvm_emulate.h>
+
+/*
+ * Inject wfx to the virtual EL2 if this is not from the virtual EL2 and
+ * the virtual HCR_EL2.TWX is set. Otherwise, let the host hypervisor
+ * handle this.
+ */
+int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe)
+{
+	u64 hcr_el2 = __vcpu_sys_reg(vcpu, HCR_EL2);
+
+	if (vcpu_is_el2(vcpu))
+		return -EINVAL;
+
+	if ((is_wfe && (hcr_el2 & HCR_TWE)) || (!is_wfe && (hcr_el2 & HCR_TWI)))
+		return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_esr(vcpu));
+
+	return -EINVAL;
+}
-- 
2.30.2


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 378+ messages in thread
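
A minimal sketch, not taken from this series, of the guest-hypervisor side
of this trap: the l1_trap_wfx() name is invented and the HCR_EL2.TWI/TWE
bit positions are those of the Arm ARM. It is the L1 hypervisor setting
these bits from vEL2 that makes handle_wfx_nested() above reinject its L2
guest's WFI/WFE traps into vEL2.

#define HCR_TWE		(1UL << 14)	/* trap WFE to EL2 */
#define HCR_TWI		(1UL << 13)	/* trap WFI to EL2 */

/* Runs in the L1 hypervisor at (virtual) EL2 */
static inline void l1_trap_wfx(void)
{
	unsigned long hcr;

	/* These EL2 sysreg accesses are themselves trapped and emulated by the host */
	asm volatile("mrs %0, hcr_el2" : "=r" (hcr));
	hcr |= HCR_TWI | HCR_TWE;
	asm volatile("msr hcr_el2, %0" : : "r" (hcr));
	asm volatile("isb");
}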

* [PATCH v6 23/64] KVM: arm64: nv: Respect virtual CPTR_EL2.{TFP,FPEN} settings
  2022-01-28 12:18 ` Marc Zyngier
  (?)
@ 2022-01-28 12:18   ` Marc Zyngier
  -1 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:18 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Jintack Lim, Haibo Xu,
	Ganapatrao Kulkarni, Chase Conklin, Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

From: Jintack Lim <jintack.lim@linaro.org>

Forward traps due to FP/ASIMD register accesses to the virtual EL2
if the virtual CPTR_EL2.TFP bit is set (with HCR_EL2.E2H == 0) or
CPTR_EL2.FPEN is configured to do so (with HCR_EL2.E2H == 1).

Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
[maz: account for HCR_EL2.E2H when testing for TFP/FPEN, with
 all the hard work actually being done by Chase Conklin]
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_emulate.h    | 26 +++++++++++++++++++++++++
 arch/arm64/kvm/handle_exit.c            | 16 +++++++++++----
 arch/arm64/kvm/hyp/include/hyp/switch.h |  8 ++++++--
 3 files changed, 44 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 54e8eee413eb..ff8980a39ee8 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -11,12 +11,14 @@
 #ifndef __ARM64_KVM_EMULATE_H__
 #define __ARM64_KVM_EMULATE_H__
 
+#include <linux/bitfield.h>
 #include <linux/kvm_host.h>
 
 #include <asm/debug-monitors.h>
 #include <asm/esr.h>
 #include <asm/kvm_arm.h>
 #include <asm/kvm_hyp.h>
+#include <asm/kvm_nested.h>
 #include <asm/ptrace.h>
 #include <asm/cputype.h>
 #include <asm/virt.h>
@@ -324,6 +326,30 @@ static inline bool vcpu_mode_priv(const struct kvm_vcpu *vcpu)
 	return mode != PSR_MODE_EL0t;
 }
 
+static inline bool guest_hyp_fpsimd_traps_enabled(const struct kvm_vcpu *vcpu)
+{
+	u64 val;
+
+	if (!vcpu_has_nv(vcpu))
+		return false;
+
+	val = vcpu_read_sys_reg(vcpu, CPTR_EL2);
+
+	if (!vcpu_el2_e2h_is_set(vcpu))
+		return (val & CPTR_EL2_TFP);
+
+	switch (FIELD_GET(CPACR_EL1_FPEN, val)) {
+	case 0b00:
+	case 0b10:
+		return true;
+	case 0b01:
+		return vcpu_el2_tge_is_set(vcpu) && !vcpu_is_el2(vcpu);
+	case 0b11:
+	default:		/* GCC is dumb */
+		return false;
+	}
+}
+
 static __always_inline u32 kvm_vcpu_get_esr(const struct kvm_vcpu *vcpu)
 {
 	return vcpu->arch.fault.esr_el2;
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index a1b1bbf3d598..a5c698f188d6 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -96,11 +96,19 @@ static int handle_smc(struct kvm_vcpu *vcpu)
 }
 
 /*
- * Guest access to FP/ASIMD registers are routed to this handler only
- * when the system doesn't support FP/ASIMD.
+ * This handles the cases where the system does not support FP/ASIMD or when
+ * we are running nested virtualization and the guest hypervisor is trapping
+ * FP/ASIMD accesses by its guest.
+ *
+ * All other handling of guest vs. host FP/ASIMD register state is handled in
+ * fixup_guest_exit().
  */
-static int handle_no_fpsimd(struct kvm_vcpu *vcpu)
+static int kvm_handle_fpasimd(struct kvm_vcpu *vcpu)
 {
+	if (guest_hyp_fpsimd_traps_enabled(vcpu))
+		return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_esr(vcpu));
+
+	/* This is the case when the system doesn't support FP/ASIMD. */
 	kvm_inject_undefined(vcpu);
 	return 1;
 }
@@ -231,7 +239,7 @@ static exit_handle_fn arm_exit_handlers[] = {
 	[ESR_ELx_EC_BREAKPT_LOW]= kvm_handle_guest_debug,
 	[ESR_ELx_EC_BKPT32]	= kvm_handle_guest_debug,
 	[ESR_ELx_EC_BRK64]	= kvm_handle_guest_debug,
-	[ESR_ELx_EC_FP_ASIMD]	= handle_no_fpsimd,
+	[ESR_ELx_EC_FP_ASIMD]	= kvm_handle_fpasimd,
 	[ESR_ELx_EC_PAC]	= kvm_handle_ptrauth,
 };
 
diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h b/arch/arm64/kvm/hyp/include/hyp/switch.h
index 49c3b9eb09d7..42c47731f64e 100644
--- a/arch/arm64/kvm/hyp/include/hyp/switch.h
+++ b/arch/arm64/kvm/hyp/include/hyp/switch.h
@@ -166,8 +166,12 @@ static bool kvm_hyp_handle_fpsimd(struct kvm_vcpu *vcpu, u64 *exit_code)
 	sve_guest = vcpu_has_sve(vcpu);
 	esr_ec = kvm_vcpu_trap_get_class(vcpu);
 
-	/* Don't handle SVE traps for non-SVE vcpus here: */
-	if (!sve_guest && esr_ec != ESR_ELx_EC_FP_ASIMD)
+	/*
+	 * Don't handle SVE traps for non-SVE vcpus here. This
+	 * includes NV guests for the time being.
+	 */
+	if (!sve_guest && (esr_ec != ESR_ELx_EC_FP_ASIMD ||
+			   guest_hyp_fpsimd_traps_enabled(vcpu)))
 		return false;
 
 	/* Valid trap.  Switch the context: */
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 378+ messages in thread
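
A companion sketch, also not from the series, of how an L1 hypervisor
running with HCR_EL2.E2H == 0 would arm the trap that
guest_hyp_fpsimd_traps_enabled() honours. The helper name is invented;
CPTR_EL2.TFP is bit 10 in the non-E2H layout per the Arm ARM. With
E2H == 1, the equivalent is programming the CPACR_EL1-format FPEN field
(bits [21:20] of CPTR_EL2) to 0b00 or 0b10, matching the switch statement
in the patch.

#define CPTR_EL2_TFP	(1UL << 10)	/* nVHE layout: trap FP/ASIMD accesses */

/* L1 hypervisor (vEL2, E2H == 0): make its L2 guest's FP/ASIMD use trap */
static inline void l1_trap_fpsimd_nvhe(void)
{
	unsigned long cptr;

	asm volatile("mrs %0, cptr_el2" : "=r" (cptr));
	cptr |= CPTR_EL2_TFP;
	asm volatile("msr cptr_el2, %0" : : "r" (cptr));
	asm volatile("isb");
}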

* [PATCH v6 23/64] KVM: arm64: nv: Respect virtual CPTR_EL2.{TFP, FPEN} settings
@ 2022-01-28 12:18   ` Marc Zyngier
  0 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:18 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: kernel-team, Andre Przywara, Christoffer Dall, Chase Conklin,
	Russell King (Oracle),
	mihai.carabas, Ganapatrao Kulkarni

From: Jintack Lim <jintack.lim@linaro.org>

Forward traps due to FP/ASIMD register accesses to the virtual EL2
if the virtual CPTR_EL2.TFP bit is set (with HCR_EL2.E2H == 0) or
CPTR_EL2.FPEN is configured to do so (with HCR_EL2.E2H == 1).

Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
[maz: account for HCR_EL2.E2H when testing for TFP/FPEN, with
 all the hard work actually being done by Chase Conklin]
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_emulate.h    | 26 +++++++++++++++++++++++++
 arch/arm64/kvm/handle_exit.c            | 16 +++++++++++----
 arch/arm64/kvm/hyp/include/hyp/switch.h |  8 ++++++--
 3 files changed, 44 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 54e8eee413eb..ff8980a39ee8 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -11,12 +11,14 @@
 #ifndef __ARM64_KVM_EMULATE_H__
 #define __ARM64_KVM_EMULATE_H__
 
+#include <linux/bitfield.h>
 #include <linux/kvm_host.h>
 
 #include <asm/debug-monitors.h>
 #include <asm/esr.h>
 #include <asm/kvm_arm.h>
 #include <asm/kvm_hyp.h>
+#include <asm/kvm_nested.h>
 #include <asm/ptrace.h>
 #include <asm/cputype.h>
 #include <asm/virt.h>
@@ -324,6 +326,30 @@ static inline bool vcpu_mode_priv(const struct kvm_vcpu *vcpu)
 	return mode != PSR_MODE_EL0t;
 }
 
+static inline bool guest_hyp_fpsimd_traps_enabled(const struct kvm_vcpu *vcpu)
+{
+	u64 val;
+
+	if (!vcpu_has_nv(vcpu))
+		return false;
+
+	val = vcpu_read_sys_reg(vcpu, CPTR_EL2);
+
+	if (!vcpu_el2_e2h_is_set(vcpu))
+		return (val & CPTR_EL2_TFP);
+
+	switch (FIELD_GET(CPACR_EL1_FPEN, val)) {
+	case 0b00:
+	case 0b10:
+		return true;
+	case 0b01:
+		return vcpu_el2_tge_is_set(vcpu) && !vcpu_is_el2(vcpu);
+	case 0b11:
+	default:		/* GCC is dumb */
+		return false;
+	}
+}
+
 static __always_inline u32 kvm_vcpu_get_esr(const struct kvm_vcpu *vcpu)
 {
 	return vcpu->arch.fault.esr_el2;
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index a1b1bbf3d598..a5c698f188d6 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -96,11 +96,19 @@ static int handle_smc(struct kvm_vcpu *vcpu)
 }
 
 /*
- * Guest access to FP/ASIMD registers are routed to this handler only
- * when the system doesn't support FP/ASIMD.
+ * This handles the cases where the system does not support FP/ASIMD or when
+ * we are running nested virtualization and the guest hypervisor is trapping
+ * FP/ASIMD accesses by its guest.
+ *
+ * All other handling of guest vs. host FP/ASIMD register state is handled in
+ * fixup_guest_exit().
  */
-static int handle_no_fpsimd(struct kvm_vcpu *vcpu)
+static int kvm_handle_fpasimd(struct kvm_vcpu *vcpu)
 {
+	if (guest_hyp_fpsimd_traps_enabled(vcpu))
+		return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_esr(vcpu));
+
+	/* This is the case when the system doesn't support FP/ASIMD. */
 	kvm_inject_undefined(vcpu);
 	return 1;
 }
@@ -231,7 +239,7 @@ static exit_handle_fn arm_exit_handlers[] = {
 	[ESR_ELx_EC_BREAKPT_LOW]= kvm_handle_guest_debug,
 	[ESR_ELx_EC_BKPT32]	= kvm_handle_guest_debug,
 	[ESR_ELx_EC_BRK64]	= kvm_handle_guest_debug,
-	[ESR_ELx_EC_FP_ASIMD]	= handle_no_fpsimd,
+	[ESR_ELx_EC_FP_ASIMD]	= kvm_handle_fpasimd,
 	[ESR_ELx_EC_PAC]	= kvm_handle_ptrauth,
 };
 
diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h b/arch/arm64/kvm/hyp/include/hyp/switch.h
index 49c3b9eb09d7..42c47731f64e 100644
--- a/arch/arm64/kvm/hyp/include/hyp/switch.h
+++ b/arch/arm64/kvm/hyp/include/hyp/switch.h
@@ -166,8 +166,12 @@ static bool kvm_hyp_handle_fpsimd(struct kvm_vcpu *vcpu, u64 *exit_code)
 	sve_guest = vcpu_has_sve(vcpu);
 	esr_ec = kvm_vcpu_trap_get_class(vcpu);
 
-	/* Don't handle SVE traps for non-SVE vcpus here: */
-	if (!sve_guest && esr_ec != ESR_ELx_EC_FP_ASIMD)
+	/*
+	 * Don't handle SVE traps for non-SVE vcpus here. This
+	 * includes NV guests for the time being.
+	 */
+	if (!sve_guest && (esr_ec != ESR_ELx_EC_FP_ASIMD ||
+			   guest_hyp_fpsimd_traps_enabled(vcpu)))
 		return false;
 
 	/* Valid trap.  Switch the context: */
-- 
2.30.2

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 378+ messages in thread

* [PATCH v6 23/64] KVM: arm64: nv: Respect virtual CPTR_EL2.{TFP, FPEN} settings
@ 2022-01-28 12:18   ` Marc Zyngier
  0 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:18 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Jintack Lim, Haibo Xu,
	Ganapatrao Kulkarni, Chase Conklin, Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

From: Jintack Lim <jintack.lim@linaro.org>

Forward traps due to FP/ASIMD register accesses to the virtual EL2
if the virtual CPTR_EL2.TFP bit is set (with HCR_EL2.E2H == 0) or
CPTR_EL2.FPEN is configured to do so (with HCR_EL2.E2H == 1).

Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
[maz: account for HCR_EL2.E2H when testing for TFP/FPEN, with
 all the hard work actually being done by Chase Conklin]
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_emulate.h    | 26 +++++++++++++++++++++++++
 arch/arm64/kvm/handle_exit.c            | 16 +++++++++++----
 arch/arm64/kvm/hyp/include/hyp/switch.h |  8 ++++++--
 3 files changed, 44 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 54e8eee413eb..ff8980a39ee8 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -11,12 +11,14 @@
 #ifndef __ARM64_KVM_EMULATE_H__
 #define __ARM64_KVM_EMULATE_H__
 
+#include <linux/bitfield.h>
 #include <linux/kvm_host.h>
 
 #include <asm/debug-monitors.h>
 #include <asm/esr.h>
 #include <asm/kvm_arm.h>
 #include <asm/kvm_hyp.h>
+#include <asm/kvm_nested.h>
 #include <asm/ptrace.h>
 #include <asm/cputype.h>
 #include <asm/virt.h>
@@ -324,6 +326,30 @@ static inline bool vcpu_mode_priv(const struct kvm_vcpu *vcpu)
 	return mode != PSR_MODE_EL0t;
 }
 
+static inline bool guest_hyp_fpsimd_traps_enabled(const struct kvm_vcpu *vcpu)
+{
+	u64 val;
+
+	if (!vcpu_has_nv(vcpu))
+		return false;
+
+	val = vcpu_read_sys_reg(vcpu, CPTR_EL2);
+
+	if (!vcpu_el2_e2h_is_set(vcpu))
+		return (val & CPTR_EL2_TFP);
+
+	switch (FIELD_GET(CPACR_EL1_FPEN, val)) {
+	case 0b00:
+	case 0b10:
+		return true;
+	case 0b01:
+		return vcpu_el2_tge_is_set(vcpu) && !vcpu_is_el2(vcpu);
+	case 0b11:
+	default:		/* GCC is dumb */
+		return false;
+	}
+}
+
 static __always_inline u32 kvm_vcpu_get_esr(const struct kvm_vcpu *vcpu)
 {
 	return vcpu->arch.fault.esr_el2;
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index a1b1bbf3d598..a5c698f188d6 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -96,11 +96,19 @@ static int handle_smc(struct kvm_vcpu *vcpu)
 }
 
 /*
- * Guest access to FP/ASIMD registers are routed to this handler only
- * when the system doesn't support FP/ASIMD.
+ * This handles the cases where the system does not support FP/ASIMD or when
+ * we are running nested virtualization and the guest hypervisor is trapping
+ * FP/ASIMD accesses by its guest.
+ *
+ * All other handling of guest vs. host FP/ASIMD register state is handled in
+ * fixup_guest_exit().
  */
-static int handle_no_fpsimd(struct kvm_vcpu *vcpu)
+static int kvm_handle_fpasimd(struct kvm_vcpu *vcpu)
 {
+	if (guest_hyp_fpsimd_traps_enabled(vcpu))
+		return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_esr(vcpu));
+
+	/* This is the case when the system doesn't support FP/ASIMD. */
 	kvm_inject_undefined(vcpu);
 	return 1;
 }
@@ -231,7 +239,7 @@ static exit_handle_fn arm_exit_handlers[] = {
 	[ESR_ELx_EC_BREAKPT_LOW]= kvm_handle_guest_debug,
 	[ESR_ELx_EC_BKPT32]	= kvm_handle_guest_debug,
 	[ESR_ELx_EC_BRK64]	= kvm_handle_guest_debug,
-	[ESR_ELx_EC_FP_ASIMD]	= handle_no_fpsimd,
+	[ESR_ELx_EC_FP_ASIMD]	= kvm_handle_fpasimd,
 	[ESR_ELx_EC_PAC]	= kvm_handle_ptrauth,
 };
 
diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h b/arch/arm64/kvm/hyp/include/hyp/switch.h
index 49c3b9eb09d7..42c47731f64e 100644
--- a/arch/arm64/kvm/hyp/include/hyp/switch.h
+++ b/arch/arm64/kvm/hyp/include/hyp/switch.h
@@ -166,8 +166,12 @@ static bool kvm_hyp_handle_fpsimd(struct kvm_vcpu *vcpu, u64 *exit_code)
 	sve_guest = vcpu_has_sve(vcpu);
 	esr_ec = kvm_vcpu_trap_get_class(vcpu);
 
-	/* Don't handle SVE traps for non-SVE vcpus here: */
-	if (!sve_guest && esr_ec != ESR_ELx_EC_FP_ASIMD)
+	/*
+	 * Don't handle SVE traps for non-SVE vcpus here. This
+	 * includes NV guests for the time being.
+	 */
+	if (!sve_guest && (esr_ec != ESR_ELx_EC_FP_ASIMD ||
+			   guest_hyp_fpsimd_traps_enabled(vcpu)))
 		return false;
 
 	/* Valid trap.  Switch the context: */
-- 
2.30.2


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 378+ messages in thread

* [PATCH v6 24/64] KVM: arm64: nv: Respect the virtual HCR_EL2.NV bit setting
  2022-01-28 12:18 ` Marc Zyngier
  (?)
@ 2022-01-28 12:18   ` Marc Zyngier
  -1 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:18 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Jintack Lim, Haibo Xu,
	Ganapatrao Kulkarni, Chase Conklin, Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

From: Jintack Lim <jintack.lim@linaro.org>

Forward traps due to HCR_EL2.NV bit to the virtual EL2 if they are not
coming from the virtual EL2 and the virtual HCR_EL2.NV bit is set.

In addition to EL2 register accesses, setting the NV bit will also make
EL12 register accesses trap to EL2. To emulate this for the virtual EL2,
forward traps due to EL12 register accesses to the virtual EL2 if the
virtual HCR_EL2.NV bit is set.

This is for recursive nested virtualization.

Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
[Moved code to emulate-nested.c]
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_arm.h    |  1 +
 arch/arm64/include/asm/kvm_nested.h |  2 ++
 arch/arm64/kvm/emulate-nested.c     | 27 +++++++++++++++++++++++++++
 arch/arm64/kvm/handle_exit.c        |  7 +++++++
 arch/arm64/kvm/sys_regs.c           | 21 +++++++++++++++++++++
 5 files changed, 58 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index 5acb153a82c8..8043827e7dc0 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -20,6 +20,7 @@
 #define HCR_AMVOFFEN	(UL(1) << 51)
 #define HCR_FIEN	(UL(1) << 47)
 #define HCR_FWB		(UL(1) << 46)
+#define HCR_NV		(UL(1) << 42)
 #define HCR_API		(UL(1) << 41)
 #define HCR_APK		(UL(1) << 40)
 #define HCR_TEA		(UL(1) << 37)
diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
index 79d382fa02ea..37ff6458296d 100644
--- a/arch/arm64/include/asm/kvm_nested.h
+++ b/arch/arm64/include/asm/kvm_nested.h
@@ -66,5 +66,7 @@ static inline u64 translate_cnthctl_el2_to_cntkctl_el1(u64 cnthctl)
 }
 
 int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe);
+extern bool forward_traps(struct kvm_vcpu *vcpu, u64 control_bit);
+extern bool forward_nv_traps(struct kvm_vcpu *vcpu);
 
 #endif /* __ARM64_KVM_NESTED_H */
diff --git a/arch/arm64/kvm/emulate-nested.c b/arch/arm64/kvm/emulate-nested.c
index f52cd4458947..7dd98d6e96e0 100644
--- a/arch/arm64/kvm/emulate-nested.c
+++ b/arch/arm64/kvm/emulate-nested.c
@@ -13,6 +13,26 @@
 
 #include "trace.h"
 
+bool forward_traps(struct kvm_vcpu *vcpu, u64 control_bit)
+{
+	bool control_bit_set;
+
+	if (!vcpu_has_nv(vcpu))
+		return false;
+
+	control_bit_set = __vcpu_sys_reg(vcpu, HCR_EL2) & control_bit;
+	if (!vcpu_is_el2(vcpu) && control_bit_set) {
+		kvm_inject_nested_sync(vcpu, kvm_vcpu_get_esr(vcpu));
+		return true;
+	}
+	return false;
+}
+
+bool forward_nv_traps(struct kvm_vcpu *vcpu)
+{
+	return forward_traps(vcpu, HCR_NV);
+}
+
 static u64 kvm_check_illegal_exception_return(struct kvm_vcpu *vcpu, u64 spsr)
 {
 	u64 mode = spsr & PSR_MODE_MASK;
@@ -49,6 +69,13 @@ void kvm_emulate_nested_eret(struct kvm_vcpu *vcpu)
 	u64 spsr, elr, mode;
 	bool direct_eret;
 
+	/*
+	 * Forward this trap to the virtual EL2 if the virtual
+	 * HCR_EL2.NV bit is set and this is coming from !EL2.
+	 */
+	if (forward_nv_traps(vcpu))
+		return;
+
 	/*
 	 * Going through the whole put/load motions is a waste of time
 	 * if this is a VHE guest hypervisor returning to its own
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index a5c698f188d6..867de65eb766 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -64,6 +64,13 @@ static int handle_smc(struct kvm_vcpu *vcpu)
 {
 	int ret;
 
+	/*
+	 * Forward this trapped smc instruction to the virtual EL2 if
+	 * the guest has asked for it.
+	 */
+	if (forward_traps(vcpu, HCR_TSC))
+		return 1;
+
 	/*
 	 * "If an SMC instruction executed at Non-secure EL1 is
 	 * trapped to EL2 because HCR_EL2.TSC is 1, the exception is a
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 7f074a7f6eb3..ccd063d6cb69 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -267,10 +267,19 @@ static u32 get_ccsidr(u32 csselr)
 	return ccsidr;
 }
 
+static bool el12_reg(struct sys_reg_params *p)
+{
+	/* All *_EL12 registers have Op1=5. */
+	return (p->Op1 == 5);
+}
+
 static bool access_rw(struct kvm_vcpu *vcpu,
 		      struct sys_reg_params *p,
 		      const struct sys_reg_desc *r)
 {
+	if (el12_reg(p) && forward_nv_traps(vcpu))
+		return false;
+
 	if (p->is_write)
 		vcpu_write_sys_reg(vcpu, p->regval, r->reg);
 	else
@@ -339,6 +348,9 @@ static bool access_vm_reg(struct kvm_vcpu *vcpu,
 	bool was_enabled = vcpu_has_cache_enabled(vcpu);
 	u64 val, mask, shift;
 
+	if (el12_reg(p) && forward_nv_traps(vcpu))
+		return false;
+
 	/* We don't expect TRVM on the host */
 	BUG_ON(!vcpu_is_el2(vcpu) && !p->is_write);
 
@@ -1654,6 +1666,9 @@ static bool access_elr(struct kvm_vcpu *vcpu,
 		       struct sys_reg_params *p,
 		       const struct sys_reg_desc *r)
 {
+	if (el12_reg(p) && forward_nv_traps(vcpu))
+		return false;
+
 	if (p->is_write)
 		vcpu_write_sys_reg(vcpu, p->regval, ELR_EL1);
 	else
@@ -1666,6 +1681,9 @@ static bool access_spsr(struct kvm_vcpu *vcpu,
 			struct sys_reg_params *p,
 			const struct sys_reg_desc *r)
 {
+	if (el12_reg(p) && forward_nv_traps(vcpu))
+		return false;
+
 	if (p->is_write)
 		__vcpu_sys_reg(vcpu, SPSR_EL1) = p->regval;
 	else
@@ -1678,6 +1696,9 @@ static bool access_spsr_el2(struct kvm_vcpu *vcpu,
 			    struct sys_reg_params *p,
 			    const struct sys_reg_desc *r)
 {
+	if (el12_reg(p) && forward_nv_traps(vcpu))
+		return false;
+
 	if (p->is_write)
 		vcpu_write_sys_reg(vcpu, p->regval, SPSR_EL2);
 	else
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 378+ messages in thread
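
A small self-contained illustration (not part of the patch; the struct is a
cut-down stand-in for struct sys_reg_params, and the encodings are taken
from the Arm ARM) of why the single Op1 == 5 test in el12_reg() is enough
to identify the *_EL12 aliases:

#include <stdbool.h>

/* Reduced stand-in for the encoding fields of struct sys_reg_params */
struct sysreg_enc { unsigned int Op0, Op1, CRn, CRm, Op2; };

/* Mirrors el12_reg(): every *_EL12 accessor encodes Op1 == 5 */
static bool is_el12(const struct sysreg_enc *e)
{
	return e->Op1 == 5;
}

int main(void)
{
	struct sysreg_enc sctlr_el12 = { 3, 5, 1, 0, 0 };	/* SCTLR_EL12 */
	struct sysreg_enc sctlr_el1  = { 3, 0, 1, 0, 0 };	/* SCTLR_EL1  */

	/* Only the EL12 alias is subject to the NV forwarding added above */
	return (is_el12(&sctlr_el12) && !is_el12(&sctlr_el1)) ? 0 : 1;
}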

* [PATCH v6 24/64] KVM: arm64: nv: Respect the virtual HCR_EL2.NV bit setting
@ 2022-01-28 12:18   ` Marc Zyngier
  0 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:18 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: kernel-team, Andre Przywara, Christoffer Dall, Chase Conklin,
	Russell King (Oracle),
	mihai.carabas, Ganapatrao Kulkarni

From: Jintack Lim <jintack.lim@linaro.org>

Forward traps due to HCR_EL2.NV bit to the virtual EL2 if they are not
coming from the virtual EL2 and the virtual HCR_EL2.NV bit is set.

In addition to EL2 register accesses, setting the NV bit will also make
EL12 register accesses trap to EL2. To emulate this for the virtual EL2,
forward traps due to EL12 register accesses to the virtual EL2 if the
virtual HCR_EL2.NV bit is set.

This is for recursive nested virtualization.

Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
[Moved code to emulate-nested.c]
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_arm.h    |  1 +
 arch/arm64/include/asm/kvm_nested.h |  2 ++
 arch/arm64/kvm/emulate-nested.c     | 27 +++++++++++++++++++++++++++
 arch/arm64/kvm/handle_exit.c        |  7 +++++++
 arch/arm64/kvm/sys_regs.c           | 21 +++++++++++++++++++++
 5 files changed, 58 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index 5acb153a82c8..8043827e7dc0 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -20,6 +20,7 @@
 #define HCR_AMVOFFEN	(UL(1) << 51)
 #define HCR_FIEN	(UL(1) << 47)
 #define HCR_FWB		(UL(1) << 46)
+#define HCR_NV		(UL(1) << 42)
 #define HCR_API		(UL(1) << 41)
 #define HCR_APK		(UL(1) << 40)
 #define HCR_TEA		(UL(1) << 37)
diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
index 79d382fa02ea..37ff6458296d 100644
--- a/arch/arm64/include/asm/kvm_nested.h
+++ b/arch/arm64/include/asm/kvm_nested.h
@@ -66,5 +66,7 @@ static inline u64 translate_cnthctl_el2_to_cntkctl_el1(u64 cnthctl)
 }
 
 int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe);
+extern bool forward_traps(struct kvm_vcpu *vcpu, u64 control_bit);
+extern bool forward_nv_traps(struct kvm_vcpu *vcpu);
 
 #endif /* __ARM64_KVM_NESTED_H */
diff --git a/arch/arm64/kvm/emulate-nested.c b/arch/arm64/kvm/emulate-nested.c
index f52cd4458947..7dd98d6e96e0 100644
--- a/arch/arm64/kvm/emulate-nested.c
+++ b/arch/arm64/kvm/emulate-nested.c
@@ -13,6 +13,26 @@
 
 #include "trace.h"
 
+bool forward_traps(struct kvm_vcpu *vcpu, u64 control_bit)
+{
+	bool control_bit_set;
+
+	if (!vcpu_has_nv(vcpu))
+		return false;
+
+	control_bit_set = __vcpu_sys_reg(vcpu, HCR_EL2) & control_bit;
+	if (!vcpu_is_el2(vcpu) && control_bit_set) {
+		kvm_inject_nested_sync(vcpu, kvm_vcpu_get_esr(vcpu));
+		return true;
+	}
+	return false;
+}
+
+bool forward_nv_traps(struct kvm_vcpu *vcpu)
+{
+	return forward_traps(vcpu, HCR_NV);
+}
+
 static u64 kvm_check_illegal_exception_return(struct kvm_vcpu *vcpu, u64 spsr)
 {
 	u64 mode = spsr & PSR_MODE_MASK;
@@ -49,6 +69,13 @@ void kvm_emulate_nested_eret(struct kvm_vcpu *vcpu)
 	u64 spsr, elr, mode;
 	bool direct_eret;
 
+	/*
+	 * Forward this trap to the virtual EL2 if the virtual
+	 * HCR_EL2.NV bit is set and this is coming from !EL2.
+	 */
+	if (forward_nv_traps(vcpu))
+		return;
+
 	/*
 	 * Going through the whole put/load motions is a waste of time
 	 * if this is a VHE guest hypervisor returning to its own
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index a5c698f188d6..867de65eb766 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -64,6 +64,13 @@ static int handle_smc(struct kvm_vcpu *vcpu)
 {
 	int ret;
 
+	/*
+	 * Forward this trapped smc instruction to the virtual EL2 if
+	 * the guest has asked for it.
+	 */
+	if (forward_traps(vcpu, HCR_TSC))
+		return 1;
+
 	/*
 	 * "If an SMC instruction executed at Non-secure EL1 is
 	 * trapped to EL2 because HCR_EL2.TSC is 1, the exception is a
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 7f074a7f6eb3..ccd063d6cb69 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -267,10 +267,19 @@ static u32 get_ccsidr(u32 csselr)
 	return ccsidr;
 }
 
+static bool el12_reg(struct sys_reg_params *p)
+{
+	/* All *_EL12 registers have Op1=5. */
+	return (p->Op1 == 5);
+}
+
 static bool access_rw(struct kvm_vcpu *vcpu,
 		      struct sys_reg_params *p,
 		      const struct sys_reg_desc *r)
 {
+	if (el12_reg(p) && forward_nv_traps(vcpu))
+		return false;
+
 	if (p->is_write)
 		vcpu_write_sys_reg(vcpu, p->regval, r->reg);
 	else
@@ -339,6 +348,9 @@ static bool access_vm_reg(struct kvm_vcpu *vcpu,
 	bool was_enabled = vcpu_has_cache_enabled(vcpu);
 	u64 val, mask, shift;
 
+	if (el12_reg(p) && forward_nv_traps(vcpu))
+		return false;
+
 	/* We don't expect TRVM on the host */
 	BUG_ON(!vcpu_is_el2(vcpu) && !p->is_write);
 
@@ -1654,6 +1666,9 @@ static bool access_elr(struct kvm_vcpu *vcpu,
 		       struct sys_reg_params *p,
 		       const struct sys_reg_desc *r)
 {
+	if (el12_reg(p) && forward_nv_traps(vcpu))
+		return false;
+
 	if (p->is_write)
 		vcpu_write_sys_reg(vcpu, p->regval, ELR_EL1);
 	else
@@ -1666,6 +1681,9 @@ static bool access_spsr(struct kvm_vcpu *vcpu,
 			struct sys_reg_params *p,
 			const struct sys_reg_desc *r)
 {
+	if (el12_reg(p) && forward_nv_traps(vcpu))
+		return false;
+
 	if (p->is_write)
 		__vcpu_sys_reg(vcpu, SPSR_EL1) = p->regval;
 	else
@@ -1678,6 +1696,9 @@ static bool access_spsr_el2(struct kvm_vcpu *vcpu,
 			    struct sys_reg_params *p,
 			    const struct sys_reg_desc *r)
 {
+	if (el12_reg(p) && forward_nv_traps(vcpu))
+		return false;
+
 	if (p->is_write)
 		vcpu_write_sys_reg(vcpu, p->regval, SPSR_EL2);
 	else
-- 
2.30.2

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 378+ messages in thread

* [PATCH v6 24/64] KVM: arm64: nv: Respect the virtual HCR_EL2.NV bit setting
@ 2022-01-28 12:18   ` Marc Zyngier
  0 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:18 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Jintack Lim, Haibo Xu,
	Ganapatrao Kulkarni, Chase Conklin, Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

From: Jintack Lim <jintack.lim@linaro.org>

Forward traps due to HCR_EL2.NV bit to the virtual EL2 if they are not
coming from the virtual EL2 and the virtual HCR_EL2.NV bit is set.

In addition to EL2 register accesses, setting the NV bit will also make
EL12 register accesses trap to EL2. To emulate this for the virtual EL2,
forward traps due to EL12 register accesses to the virtual EL2 if the
virtual HCR_EL2.NV bit is set.

This is for recursive nested virtualization.

Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
[Moved code to emulate-nested.c]
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_arm.h    |  1 +
 arch/arm64/include/asm/kvm_nested.h |  2 ++
 arch/arm64/kvm/emulate-nested.c     | 27 +++++++++++++++++++++++++++
 arch/arm64/kvm/handle_exit.c        |  7 +++++++
 arch/arm64/kvm/sys_regs.c           | 21 +++++++++++++++++++++
 5 files changed, 58 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index 5acb153a82c8..8043827e7dc0 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -20,6 +20,7 @@
 #define HCR_AMVOFFEN	(UL(1) << 51)
 #define HCR_FIEN	(UL(1) << 47)
 #define HCR_FWB		(UL(1) << 46)
+#define HCR_NV		(UL(1) << 42)
 #define HCR_API		(UL(1) << 41)
 #define HCR_APK		(UL(1) << 40)
 #define HCR_TEA		(UL(1) << 37)
diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
index 79d382fa02ea..37ff6458296d 100644
--- a/arch/arm64/include/asm/kvm_nested.h
+++ b/arch/arm64/include/asm/kvm_nested.h
@@ -66,5 +66,7 @@ static inline u64 translate_cnthctl_el2_to_cntkctl_el1(u64 cnthctl)
 }
 
 int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe);
+extern bool forward_traps(struct kvm_vcpu *vcpu, u64 control_bit);
+extern bool forward_nv_traps(struct kvm_vcpu *vcpu);
 
 #endif /* __ARM64_KVM_NESTED_H */
diff --git a/arch/arm64/kvm/emulate-nested.c b/arch/arm64/kvm/emulate-nested.c
index f52cd4458947..7dd98d6e96e0 100644
--- a/arch/arm64/kvm/emulate-nested.c
+++ b/arch/arm64/kvm/emulate-nested.c
@@ -13,6 +13,26 @@
 
 #include "trace.h"
 
+bool forward_traps(struct kvm_vcpu *vcpu, u64 control_bit)
+{
+	bool control_bit_set;
+
+	if (!vcpu_has_nv(vcpu))
+		return false;
+
+	control_bit_set = __vcpu_sys_reg(vcpu, HCR_EL2) & control_bit;
+	if (!vcpu_is_el2(vcpu) && control_bit_set) {
+		kvm_inject_nested_sync(vcpu, kvm_vcpu_get_esr(vcpu));
+		return true;
+	}
+	return false;
+}
+
+bool forward_nv_traps(struct kvm_vcpu *vcpu)
+{
+	return forward_traps(vcpu, HCR_NV);
+}
+
 static u64 kvm_check_illegal_exception_return(struct kvm_vcpu *vcpu, u64 spsr)
 {
 	u64 mode = spsr & PSR_MODE_MASK;
@@ -49,6 +69,13 @@ void kvm_emulate_nested_eret(struct kvm_vcpu *vcpu)
 	u64 spsr, elr, mode;
 	bool direct_eret;
 
+	/*
+	 * Forward this trap to the virtual EL2 if the virtual
+	 * HCR_EL2.NV bit is set and this is coming from !EL2.
+	 */
+	if (forward_nv_traps(vcpu))
+		return;
+
 	/*
 	 * Going through the whole put/load motions is a waste of time
 	 * if this is a VHE guest hypervisor returning to its own
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index a5c698f188d6..867de65eb766 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -64,6 +64,13 @@ static int handle_smc(struct kvm_vcpu *vcpu)
 {
 	int ret;
 
+	/*
+	 * Forward this trapped smc instruction to the virtual EL2 if
+	 * the guest has asked for it.
+	 */
+	if (forward_traps(vcpu, HCR_TSC))
+		return 1;
+
 	/*
 	 * "If an SMC instruction executed at Non-secure EL1 is
 	 * trapped to EL2 because HCR_EL2.TSC is 1, the exception is a
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 7f074a7f6eb3..ccd063d6cb69 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -267,10 +267,19 @@ static u32 get_ccsidr(u32 csselr)
 	return ccsidr;
 }
 
+static bool el12_reg(struct sys_reg_params *p)
+{
+	/* All *_EL12 registers have Op1=5. */
+	return (p->Op1 == 5);
+}
+
 static bool access_rw(struct kvm_vcpu *vcpu,
 		      struct sys_reg_params *p,
 		      const struct sys_reg_desc *r)
 {
+	if (el12_reg(p) && forward_nv_traps(vcpu))
+		return false;
+
 	if (p->is_write)
 		vcpu_write_sys_reg(vcpu, p->regval, r->reg);
 	else
@@ -339,6 +348,9 @@ static bool access_vm_reg(struct kvm_vcpu *vcpu,
 	bool was_enabled = vcpu_has_cache_enabled(vcpu);
 	u64 val, mask, shift;
 
+	if (el12_reg(p) && forward_nv_traps(vcpu))
+		return false;
+
 	/* We don't expect TRVM on the host */
 	BUG_ON(!vcpu_is_el2(vcpu) && !p->is_write);
 
@@ -1654,6 +1666,9 @@ static bool access_elr(struct kvm_vcpu *vcpu,
 		       struct sys_reg_params *p,
 		       const struct sys_reg_desc *r)
 {
+	if (el12_reg(p) && forward_nv_traps(vcpu))
+		return false;
+
 	if (p->is_write)
 		vcpu_write_sys_reg(vcpu, p->regval, ELR_EL1);
 	else
@@ -1666,6 +1681,9 @@ static bool access_spsr(struct kvm_vcpu *vcpu,
 			struct sys_reg_params *p,
 			const struct sys_reg_desc *r)
 {
+	if (el12_reg(p) && forward_nv_traps(vcpu))
+		return false;
+
 	if (p->is_write)
 		__vcpu_sys_reg(vcpu, SPSR_EL1) = p->regval;
 	else
@@ -1678,6 +1696,9 @@ static bool access_spsr_el2(struct kvm_vcpu *vcpu,
 			    struct sys_reg_params *p,
 			    const struct sys_reg_desc *r)
 {
+	if (el12_reg(p) && forward_nv_traps(vcpu))
+		return false;
+
 	if (p->is_write)
 		vcpu_write_sys_reg(vcpu, p->regval, SPSR_EL2);
 	else
-- 
2.30.2


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 378+ messages in thread

* [PATCH v6 25/64] KVM: arm64: nv: Respect virtual HCR_EL2.TVM and TRVM settings
  2022-01-28 12:18 ` Marc Zyngier
  (?)
@ 2022-01-28 12:18   ` Marc Zyngier
  -1 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:18 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Jintack Lim, Haibo Xu,
	Ganapatrao Kulkarni, Chase Conklin, Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

From: Jintack Lim <jintack.lim@linaro.org>

Forward the EL1 virtual memory register traps to the virtual EL2 if they
are not coming from the virtual EL2 and the virtual HCR_EL2.TVM or TRVM
bit is set.

This is for recursive nested virtualization.

Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/sys_regs.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index ccd063d6cb69..edaf287c7ec9 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -351,6 +351,13 @@ static bool access_vm_reg(struct kvm_vcpu *vcpu,
 	if (el12_reg(p) && forward_nv_traps(vcpu))
 		return false;
 
+	if (!el12_reg(p)) {
+		u64 bit = p->is_write ? HCR_TVM : HCR_TRVM;
+
+		if (forward_traps(vcpu, bit))
+			return false;
+	}
+
 	/* We don't expect TRVM on the host */
 	BUG_ON(!vcpu_is_el2(vcpu) && !p->is_write);
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 378+ messages in thread
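
For reference (this list comes from the Arm ARM description of
HCR_EL2.TVM/TRVM and is not spelled out in the patch): TVM traps EL1 writes
to, and TRVM traps EL1 reads of, the Stage-1 virtual memory control
registers SCTLR_EL1, TTBR0_EL1, TTBR1_EL1, TCR_EL1, ESR_EL1, FAR_EL1,
AFSR0_EL1, AFSR1_EL1, MAIR_EL1, AMAIR_EL1 and CONTEXTIDR_EL1. That is
exactly the set handled by access_vm_reg(), which is why the forwarding
check is added there.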

* [PATCH v6 25/64] KVM: arm64: nv: Respect virtual HCR_EL2.TVM and TRVM settings
@ 2022-01-28 12:18   ` Marc Zyngier
  0 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:18 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: kernel-team, Andre Przywara, Christoffer Dall, Chase Conklin,
	Russell King (Oracle),
	mihai.carabas, Ganapatrao Kulkarni

From: Jintack Lim <jintack.lim@linaro.org>

Forward the EL1 virtual memory register traps to the virtual EL2 if they
are not coming from the virtual EL2 and the virtual HCR_EL2.TVM or TRVM
bit is set.

This is for recursive nested virtualization.

Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/sys_regs.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index ccd063d6cb69..edaf287c7ec9 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -351,6 +351,13 @@ static bool access_vm_reg(struct kvm_vcpu *vcpu,
 	if (el12_reg(p) && forward_nv_traps(vcpu))
 		return false;
 
+	if (!el12_reg(p)) {
+		u64 bit = p->is_write ? HCR_TVM : HCR_TRVM;
+
+		if (forward_traps(vcpu, bit))
+			return false;
+	}
+
 	/* We don't expect TRVM on the host */
 	BUG_ON(!vcpu_is_el2(vcpu) && !p->is_write);
 
-- 
2.30.2

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 378+ messages in thread

* [PATCH v6 25/64] KVM: arm64: nv: Respect virtual HCR_EL2.TVM and TRVM settings
@ 2022-01-28 12:18   ` Marc Zyngier
  0 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:18 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Jintack Lim, Haibo Xu,
	Ganapatrao Kulkarni, Chase Conklin, Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

From: Jintack Lim <jintack.lim@linaro.org>

Forward the EL1 virtual memory register traps to the virtual EL2 if they
are not coming from the virtual EL2 and the virtual HCR_EL2.TVM or TRVM
bit is set.

This is for recursive nested virtualization.

Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/sys_regs.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index ccd063d6cb69..edaf287c7ec9 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -351,6 +351,13 @@ static bool access_vm_reg(struct kvm_vcpu *vcpu,
 	if (el12_reg(p) && forward_nv_traps(vcpu))
 		return false;
 
+	if (!el12_reg(p)) {
+		u64 bit = p->is_write ? HCR_TVM : HCR_TRVM;
+
+		if (forward_traps(vcpu, bit))
+			return false;
+	}
+
 	/* We don't expect TRVM on the host */
 	BUG_ON(!vcpu_is_el2(vcpu) && !p->is_write);
 
-- 
2.30.2


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 378+ messages in thread

* [PATCH v6 26/64] KVM: arm64: nv: Respect the virtual HCR_EL2.NV1 bit setting
  2022-01-28 12:18 ` Marc Zyngier
  (?)
@ 2022-01-28 12:18   ` Marc Zyngier
  -1 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:18 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Jintack Lim, Haibo Xu,
	Ganapatrao Kulkarni, Chase Conklin, Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

From: Jintack Lim <jintack@cs.columbia.edu>

Forward ELR_EL1, SPSR_EL1 and VBAR_EL1 traps to the virtual EL2 if the
virtual HCR_EL2.NV1 bit is set.

This is for recursive nested virtualization.

Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_arm.h    |  1 +
 arch/arm64/include/asm/kvm_nested.h |  1 +
 arch/arm64/kvm/emulate-nested.c     |  5 +++++
 arch/arm64/kvm/sys_regs.c           | 22 +++++++++++++++++++++-
 4 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index 8043827e7dc0..748c2b068d4e 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -20,6 +20,7 @@
 #define HCR_AMVOFFEN	(UL(1) << 51)
 #define HCR_FIEN	(UL(1) << 47)
 #define HCR_FWB		(UL(1) << 46)
+#define HCR_NV1		(UL(1) << 43)
 #define HCR_NV		(UL(1) << 42)
 #define HCR_API		(UL(1) << 41)
 #define HCR_APK		(UL(1) << 40)
diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
index 37ff6458296d..82fc8b6c990b 100644
--- a/arch/arm64/include/asm/kvm_nested.h
+++ b/arch/arm64/include/asm/kvm_nested.h
@@ -68,5 +68,6 @@ static inline u64 translate_cnthctl_el2_to_cntkctl_el1(u64 cnthctl)
 int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe);
 extern bool forward_traps(struct kvm_vcpu *vcpu, u64 control_bit);
 extern bool forward_nv_traps(struct kvm_vcpu *vcpu);
+extern bool forward_nv1_traps(struct kvm_vcpu *vcpu);
 
 #endif /* __ARM64_KVM_NESTED_H */
diff --git a/arch/arm64/kvm/emulate-nested.c b/arch/arm64/kvm/emulate-nested.c
index 7dd98d6e96e0..0109dfd664dd 100644
--- a/arch/arm64/kvm/emulate-nested.c
+++ b/arch/arm64/kvm/emulate-nested.c
@@ -33,6 +33,11 @@ bool forward_nv_traps(struct kvm_vcpu *vcpu)
 	return forward_traps(vcpu, HCR_NV);
 }
 
+bool forward_nv1_traps(struct kvm_vcpu *vcpu)
+{
+	return forward_traps(vcpu, HCR_NV1);
+}
+
 static u64 kvm_check_illegal_exception_return(struct kvm_vcpu *vcpu, u64 spsr)
 {
 	u64 mode = spsr & PSR_MODE_MASK;
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index edaf287c7ec9..31d739d59f67 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -288,6 +288,16 @@ static bool access_rw(struct kvm_vcpu *vcpu,
 	return true;
 }
 
+static bool access_vbar_el1(struct kvm_vcpu *vcpu,
+			    struct sys_reg_params *p,
+			    const struct sys_reg_desc *r)
+{
+	if (forward_nv1_traps(vcpu))
+		return false;
+
+	return access_rw(vcpu, p, r);
+}
+
 /*
  * See note at ARMv7 ARM B1.14.4 (TL;DR: S/W ops are not easily virtualized).
  */
@@ -1669,6 +1679,7 @@ static bool access_sp_el1(struct kvm_vcpu *vcpu,
 	return true;
 }
 
+
 static bool access_elr(struct kvm_vcpu *vcpu,
 		       struct sys_reg_params *p,
 		       const struct sys_reg_desc *r)
@@ -1676,6 +1687,9 @@ static bool access_elr(struct kvm_vcpu *vcpu,
 	if (el12_reg(p) && forward_nv_traps(vcpu))
 		return false;
 
+	if (!el12_reg(p) && forward_nv1_traps(vcpu))
+		return false;
+
 	if (p->is_write)
 		vcpu_write_sys_reg(vcpu, p->regval, ELR_EL1);
 	else
@@ -1691,6 +1705,9 @@ static bool access_spsr(struct kvm_vcpu *vcpu,
 	if (el12_reg(p) && forward_nv_traps(vcpu))
 		return false;
 
+	if (!el12_reg(p) && forward_nv1_traps(vcpu))
+		return false;
+
 	if (p->is_write)
 		__vcpu_sys_reg(vcpu, SPSR_EL1) = p->regval;
 	else
@@ -1706,6 +1723,9 @@ static bool access_spsr_el2(struct kvm_vcpu *vcpu,
 	if (el12_reg(p) && forward_nv_traps(vcpu))
 		return false;
 
+	if (!el12_reg(p) && forward_nv1_traps(vcpu))
+		return false;
+
 	if (p->is_write)
 		vcpu_write_sys_reg(vcpu, p->regval, SPSR_EL2);
 	else
@@ -1914,7 +1934,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 	{ SYS_DESC(SYS_LORC_EL1), trap_loregion },
 	{ SYS_DESC(SYS_LORID_EL1), trap_loregion },
 
-	{ SYS_DESC(SYS_VBAR_EL1), access_rw, reset_val, VBAR_EL1, 0 },
+	{ SYS_DESC(SYS_VBAR_EL1), access_vbar_el1, reset_val, VBAR_EL1, 0 },
 	{ SYS_DESC(SYS_DISR_EL1), NULL, reset_val, DISR_EL1, 0 },
 
 	{ SYS_DESC(SYS_ICC_IAR0_EL1), write_to_read_only },
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 378+ messages in thread
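
Background, per the FEAT_NV description in the Arm ARM (not restated in the
patch): with HCR_EL2.{NV, NV1} == {1, 1}, EL1 accesses to SPSR_EL1, ELR_EL1
and VBAR_EL1 trap to EL2. forward_nv1_traps() reproduces that behaviour for
a guest hypervisor by reinjecting such traps into the virtual EL2 when the
guest's own HCR_EL2.NV1 bit is set.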

* [PATCH v6 26/64] KVM: arm64: nv: Respect the virtual HCR_EL2.NV1 bit setting
@ 2022-01-28 12:18   ` Marc Zyngier
  0 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:18 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: kernel-team, Andre Przywara, Christoffer Dall, Chase Conklin,
	Russell King (Oracle),
	mihai.carabas, Ganapatrao Kulkarni

From: Jintack Lim <jintack@cs.columbia.edu>

Forward ELR_EL1, SPSR_EL1 and VBAR_EL1 traps to the virtual EL2 if the
virtual HCR_EL2.NV1 bit is set.

This is for recursive nested virtualization.

Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_arm.h    |  1 +
 arch/arm64/include/asm/kvm_nested.h |  1 +
 arch/arm64/kvm/emulate-nested.c     |  5 +++++
 arch/arm64/kvm/sys_regs.c           | 22 +++++++++++++++++++++-
 4 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index 8043827e7dc0..748c2b068d4e 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -20,6 +20,7 @@
 #define HCR_AMVOFFEN	(UL(1) << 51)
 #define HCR_FIEN	(UL(1) << 47)
 #define HCR_FWB		(UL(1) << 46)
+#define HCR_NV1		(UL(1) << 43)
 #define HCR_NV		(UL(1) << 42)
 #define HCR_API		(UL(1) << 41)
 #define HCR_APK		(UL(1) << 40)
diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
index 37ff6458296d..82fc8b6c990b 100644
--- a/arch/arm64/include/asm/kvm_nested.h
+++ b/arch/arm64/include/asm/kvm_nested.h
@@ -68,5 +68,6 @@ static inline u64 translate_cnthctl_el2_to_cntkctl_el1(u64 cnthctl)
 int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe);
 extern bool forward_traps(struct kvm_vcpu *vcpu, u64 control_bit);
 extern bool forward_nv_traps(struct kvm_vcpu *vcpu);
+extern bool forward_nv1_traps(struct kvm_vcpu *vcpu);
 
 #endif /* __ARM64_KVM_NESTED_H */
diff --git a/arch/arm64/kvm/emulate-nested.c b/arch/arm64/kvm/emulate-nested.c
index 7dd98d6e96e0..0109dfd664dd 100644
--- a/arch/arm64/kvm/emulate-nested.c
+++ b/arch/arm64/kvm/emulate-nested.c
@@ -33,6 +33,11 @@ bool forward_nv_traps(struct kvm_vcpu *vcpu)
 	return forward_traps(vcpu, HCR_NV);
 }
 
+bool forward_nv1_traps(struct kvm_vcpu *vcpu)
+{
+	return forward_traps(vcpu, HCR_NV1);
+}
+
 static u64 kvm_check_illegal_exception_return(struct kvm_vcpu *vcpu, u64 spsr)
 {
 	u64 mode = spsr & PSR_MODE_MASK;
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index edaf287c7ec9..31d739d59f67 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -288,6 +288,16 @@ static bool access_rw(struct kvm_vcpu *vcpu,
 	return true;
 }
 
+static bool access_vbar_el1(struct kvm_vcpu *vcpu,
+			    struct sys_reg_params *p,
+			    const struct sys_reg_desc *r)
+{
+	if (forward_nv1_traps(vcpu))
+		return false;
+
+	return access_rw(vcpu, p, r);
+}
+
 /*
  * See note at ARMv7 ARM B1.14.4 (TL;DR: S/W ops are not easily virtualized).
  */
@@ -1669,6 +1679,7 @@ static bool access_sp_el1(struct kvm_vcpu *vcpu,
 	return true;
 }
 
+
 static bool access_elr(struct kvm_vcpu *vcpu,
 		       struct sys_reg_params *p,
 		       const struct sys_reg_desc *r)
@@ -1676,6 +1687,9 @@ static bool access_elr(struct kvm_vcpu *vcpu,
 	if (el12_reg(p) && forward_nv_traps(vcpu))
 		return false;
 
+	if (!el12_reg(p) && forward_nv1_traps(vcpu))
+		return false;
+
 	if (p->is_write)
 		vcpu_write_sys_reg(vcpu, p->regval, ELR_EL1);
 	else
@@ -1691,6 +1705,9 @@ static bool access_spsr(struct kvm_vcpu *vcpu,
 	if (el12_reg(p) && forward_nv_traps(vcpu))
 		return false;
 
+	if (!el12_reg(p) && forward_nv1_traps(vcpu))
+		return false;
+
 	if (p->is_write)
 		__vcpu_sys_reg(vcpu, SPSR_EL1) = p->regval;
 	else
@@ -1706,6 +1723,9 @@ static bool access_spsr_el2(struct kvm_vcpu *vcpu,
 	if (el12_reg(p) && forward_nv_traps(vcpu))
 		return false;
 
+	if (!el12_reg(p) && forward_nv1_traps(vcpu))
+		return false;
+
 	if (p->is_write)
 		vcpu_write_sys_reg(vcpu, p->regval, SPSR_EL2);
 	else
@@ -1914,7 +1934,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 	{ SYS_DESC(SYS_LORC_EL1), trap_loregion },
 	{ SYS_DESC(SYS_LORID_EL1), trap_loregion },
 
-	{ SYS_DESC(SYS_VBAR_EL1), access_rw, reset_val, VBAR_EL1, 0 },
+	{ SYS_DESC(SYS_VBAR_EL1), access_vbar_el1, reset_val, VBAR_EL1, 0 },
 	{ SYS_DESC(SYS_DISR_EL1), NULL, reset_val, DISR_EL1, 0 },
 
 	{ SYS_DESC(SYS_ICC_IAR0_EL1), write_to_read_only },
-- 
2.30.2

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 378+ messages in thread

* [PATCH v6 26/64] KVM: arm64: nv: Respect the virtual HCR_EL2.NV1 bit setting
@ 2022-01-28 12:18   ` Marc Zyngier
  0 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:18 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Jintack Lim, Haibo Xu,
	Ganapatrao Kulkarni, Chase Conklin, Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

From: Jintack Lim <jintack@cs.columbia.edu>

Forward ELR_EL1, SPSR_EL1 and VBAR_EL1 traps to the virtual EL2 if the
virtual HCR_EL2.NV1 bit is set.

This is for recursive nested virtualization.

Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_arm.h    |  1 +
 arch/arm64/include/asm/kvm_nested.h |  1 +
 arch/arm64/kvm/emulate-nested.c     |  5 +++++
 arch/arm64/kvm/sys_regs.c           | 22 +++++++++++++++++++++-
 4 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index 8043827e7dc0..748c2b068d4e 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -20,6 +20,7 @@
 #define HCR_AMVOFFEN	(UL(1) << 51)
 #define HCR_FIEN	(UL(1) << 47)
 #define HCR_FWB		(UL(1) << 46)
+#define HCR_NV1		(UL(1) << 43)
 #define HCR_NV		(UL(1) << 42)
 #define HCR_API		(UL(1) << 41)
 #define HCR_APK		(UL(1) << 40)
diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
index 37ff6458296d..82fc8b6c990b 100644
--- a/arch/arm64/include/asm/kvm_nested.h
+++ b/arch/arm64/include/asm/kvm_nested.h
@@ -68,5 +68,6 @@ static inline u64 translate_cnthctl_el2_to_cntkctl_el1(u64 cnthctl)
 int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe);
 extern bool forward_traps(struct kvm_vcpu *vcpu, u64 control_bit);
 extern bool forward_nv_traps(struct kvm_vcpu *vcpu);
+extern bool forward_nv1_traps(struct kvm_vcpu *vcpu);
 
 #endif /* __ARM64_KVM_NESTED_H */
diff --git a/arch/arm64/kvm/emulate-nested.c b/arch/arm64/kvm/emulate-nested.c
index 7dd98d6e96e0..0109dfd664dd 100644
--- a/arch/arm64/kvm/emulate-nested.c
+++ b/arch/arm64/kvm/emulate-nested.c
@@ -33,6 +33,11 @@ bool forward_nv_traps(struct kvm_vcpu *vcpu)
 	return forward_traps(vcpu, HCR_NV);
 }
 
+bool forward_nv1_traps(struct kvm_vcpu *vcpu)
+{
+	return forward_traps(vcpu, HCR_NV1);
+}
+
 static u64 kvm_check_illegal_exception_return(struct kvm_vcpu *vcpu, u64 spsr)
 {
 	u64 mode = spsr & PSR_MODE_MASK;
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index edaf287c7ec9..31d739d59f67 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -288,6 +288,16 @@ static bool access_rw(struct kvm_vcpu *vcpu,
 	return true;
 }
 
+static bool access_vbar_el1(struct kvm_vcpu *vcpu,
+			    struct sys_reg_params *p,
+			    const struct sys_reg_desc *r)
+{
+	if (forward_nv1_traps(vcpu))
+		return false;
+
+	return access_rw(vcpu, p, r);
+}
+
 /*
  * See note at ARMv7 ARM B1.14.4 (TL;DR: S/W ops are not easily virtualized).
  */
@@ -1669,6 +1679,7 @@ static bool access_sp_el1(struct kvm_vcpu *vcpu,
 	return true;
 }
 
+
 static bool access_elr(struct kvm_vcpu *vcpu,
 		       struct sys_reg_params *p,
 		       const struct sys_reg_desc *r)
@@ -1676,6 +1687,9 @@ static bool access_elr(struct kvm_vcpu *vcpu,
 	if (el12_reg(p) && forward_nv_traps(vcpu))
 		return false;
 
+	if (!el12_reg(p) && forward_nv1_traps(vcpu))
+		return false;
+
 	if (p->is_write)
 		vcpu_write_sys_reg(vcpu, p->regval, ELR_EL1);
 	else
@@ -1691,6 +1705,9 @@ static bool access_spsr(struct kvm_vcpu *vcpu,
 	if (el12_reg(p) && forward_nv_traps(vcpu))
 		return false;
 
+	if (!el12_reg(p) && forward_nv1_traps(vcpu))
+		return false;
+
 	if (p->is_write)
 		__vcpu_sys_reg(vcpu, SPSR_EL1) = p->regval;
 	else
@@ -1706,6 +1723,9 @@ static bool access_spsr_el2(struct kvm_vcpu *vcpu,
 	if (el12_reg(p) && forward_nv_traps(vcpu))
 		return false;
 
+	if (!el12_reg(p) && forward_nv1_traps(vcpu))
+		return false;
+
 	if (p->is_write)
 		vcpu_write_sys_reg(vcpu, p->regval, SPSR_EL2);
 	else
@@ -1914,7 +1934,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 	{ SYS_DESC(SYS_LORC_EL1), trap_loregion },
 	{ SYS_DESC(SYS_LORID_EL1), trap_loregion },
 
-	{ SYS_DESC(SYS_VBAR_EL1), access_rw, reset_val, VBAR_EL1, 0 },
+	{ SYS_DESC(SYS_VBAR_EL1), access_vbar_el1, reset_val, VBAR_EL1, 0 },
 	{ SYS_DESC(SYS_DISR_EL1), NULL, reset_val, DISR_EL1, 0 },
 
 	{ SYS_DESC(SYS_ICC_IAR0_EL1), write_to_read_only },
-- 
2.30.2


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 378+ messages in thread

* [PATCH v6 27/64] KVM: arm64: nv: Allow a sysreg to be hidden from userspace only
  2022-01-28 12:18 ` Marc Zyngier
  (?)
@ 2022-01-28 12:18   ` Marc Zyngier
  -1 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:18 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Jintack Lim, Haibo Xu,
	Ganapatrao Kulkarni, Chase Conklin, Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

So far, we never needed to distinguish between registers hidden
from userspace and registers hidden from the guest (they are always
either visible to both, or hidden from both).

With NV, we have the ugly case of the EL{0,1}2 registers, which
are only a view on the EL{0,1} registers. It makes absolutely no
sense to expose them to userspace, since it already has the
canonical view.

Add a new visibility flag (REG_HIDDEN_USER) and a new helper that
checks for it and REG_HIDDEN when checking whether to expose
a sysreg to userspace. Subsequent patches will make use of it.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/sys_regs.c |  6 +++---
 arch/arm64/kvm/sys_regs.h | 12 +++++++++++-
 2 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 31d739d59f67..0c9bbe5eee5e 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -3031,7 +3031,7 @@ int kvm_arm_sys_reg_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg
 		return get_invariant_sys_reg(reg->id, uaddr);
 
 	/* Check for regs disabled by runtime config */
-	if (sysreg_hidden(vcpu, r))
+	if (sysreg_hidden_user(vcpu, r))
 		return -ENOENT;
 
 	if (r->get_user)
@@ -3056,7 +3056,7 @@ int kvm_arm_sys_reg_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg
 		return set_invariant_sys_reg(reg->id, uaddr);
 
 	/* Check for regs disabled by runtime config */
-	if (sysreg_hidden(vcpu, r))
+	if (sysreg_hidden_user(vcpu, r))
 		return -ENOENT;
 
 	if (r->set_user)
@@ -3127,7 +3127,7 @@ static int walk_one_sys_reg(const struct kvm_vcpu *vcpu,
 	if (!(rd->reg || rd->get_user))
 		return 0;
 
-	if (sysreg_hidden(vcpu, rd))
+	if (sysreg_hidden_user(vcpu, rd))
 		return 0;
 
 	if (!copy_reg_to_user(rd, uind))
diff --git a/arch/arm64/kvm/sys_regs.h b/arch/arm64/kvm/sys_regs.h
index cc0cc95a0280..850629f083a3 100644
--- a/arch/arm64/kvm/sys_regs.h
+++ b/arch/arm64/kvm/sys_regs.h
@@ -78,7 +78,8 @@ struct sys_reg_desc {
 };
 
 #define REG_HIDDEN		(1 << 0) /* hidden from userspace and guest */
-#define REG_RAZ			(1 << 1) /* RAZ from userspace and guest */
+#define REG_HIDDEN_USER		(1 << 1) /* hidden from userspace only */
+#define REG_RAZ			(1 << 2) /* RAZ from userspace and guest */
 
 static __printf(2, 3)
 inline void print_sys_reg_msg(const struct sys_reg_params *p,
@@ -138,6 +139,15 @@ static inline bool sysreg_hidden(const struct kvm_vcpu *vcpu,
 	return r->visibility(vcpu, r) & REG_HIDDEN;
 }
 
+static inline bool sysreg_hidden_user(const struct kvm_vcpu *vcpu,
+				      const struct sys_reg_desc *r)
+{
+	if (likely(!r->visibility))
+		return false;
+
+	return r->visibility(vcpu, r) & (REG_HIDDEN | REG_HIDDEN_USER);
+}
+
 static inline bool sysreg_visible_as_raz(const struct kvm_vcpu *vcpu,
 					 const struct sys_reg_desc *r)
 {
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 378+ messages in thread
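
A hypothetical sketch of how the new flag is meant to be used (the helper
name and logic are invented here for illustration; the actual EL2/EL12
descriptors are wired up in later patches of the series). A visibility
callback can now keep a register usable by the guest while refusing to
enumerate it to userspace through KVM_GET_REG_LIST and the one-reg
get/set ioctls:

/* Sketch: hide an EL12 alias from userspace, keep it for NV guests */
static unsigned int el12_alias_visibility(const struct kvm_vcpu *vcpu,
					  const struct sys_reg_desc *rd)
{
	if (vcpu_has_nv(vcpu))
		return REG_HIDDEN_USER;	/* guest sees it, userspace does not */

	return REG_HIDDEN;		/* without NV it does not exist at all */
}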

* [PATCH v6 27/64] KVM: arm64: nv: Allow a sysreg to be hidden from userspace only
@ 2022-01-28 12:18   ` Marc Zyngier
  0 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:18 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: kernel-team, Andre Przywara, Christoffer Dall, Chase Conklin,
	Russell King (Oracle),
	mihai.carabas, Ganapatrao Kulkarni

So far, we never needed to distinguish between registers hidden
from userspace and registers hidden from the guest (they are always
either visible to both, or hidden from both).

With NV, we have the ugly case of the EL{0,1}2 registers, which
are only a view on the EL{0,1} registers. It makes absolutely no
sense to expose them to userspace, since it already has the
canonical view.

Add a new visibility flag (REG_HIDDEN_USER) and a new helper that
checks for it and REG_HIDDEN when checking whether to expose
a sysreg to userspace. Subsequent patches will make use of it.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/sys_regs.c |  6 +++---
 arch/arm64/kvm/sys_regs.h | 12 +++++++++++-
 2 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 31d739d59f67..0c9bbe5eee5e 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -3031,7 +3031,7 @@ int kvm_arm_sys_reg_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg
 		return get_invariant_sys_reg(reg->id, uaddr);
 
 	/* Check for regs disabled by runtime config */
-	if (sysreg_hidden(vcpu, r))
+	if (sysreg_hidden_user(vcpu, r))
 		return -ENOENT;
 
 	if (r->get_user)
@@ -3056,7 +3056,7 @@ int kvm_arm_sys_reg_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg
 		return set_invariant_sys_reg(reg->id, uaddr);
 
 	/* Check for regs disabled by runtime config */
-	if (sysreg_hidden(vcpu, r))
+	if (sysreg_hidden_user(vcpu, r))
 		return -ENOENT;
 
 	if (r->set_user)
@@ -3127,7 +3127,7 @@ static int walk_one_sys_reg(const struct kvm_vcpu *vcpu,
 	if (!(rd->reg || rd->get_user))
 		return 0;
 
-	if (sysreg_hidden(vcpu, rd))
+	if (sysreg_hidden_user(vcpu, rd))
 		return 0;
 
 	if (!copy_reg_to_user(rd, uind))
diff --git a/arch/arm64/kvm/sys_regs.h b/arch/arm64/kvm/sys_regs.h
index cc0cc95a0280..850629f083a3 100644
--- a/arch/arm64/kvm/sys_regs.h
+++ b/arch/arm64/kvm/sys_regs.h
@@ -78,7 +78,8 @@ struct sys_reg_desc {
 };
 
 #define REG_HIDDEN		(1 << 0) /* hidden from userspace and guest */
-#define REG_RAZ			(1 << 1) /* RAZ from userspace and guest */
+#define REG_HIDDEN_USER		(1 << 1) /* hidden from userspace only */
+#define REG_RAZ			(1 << 2) /* RAZ from userspace and guest */
 
 static __printf(2, 3)
 inline void print_sys_reg_msg(const struct sys_reg_params *p,
@@ -138,6 +139,15 @@ static inline bool sysreg_hidden(const struct kvm_vcpu *vcpu,
 	return r->visibility(vcpu, r) & REG_HIDDEN;
 }
 
+static inline bool sysreg_hidden_user(const struct kvm_vcpu *vcpu,
+				      const struct sys_reg_desc *r)
+{
+	if (likely(!r->visibility))
+		return false;
+
+	return r->visibility(vcpu, r) & (REG_HIDDEN | REG_HIDDEN_USER);
+}
+
 static inline bool sysreg_visible_as_raz(const struct kvm_vcpu *vcpu,
 					 const struct sys_reg_desc *r)
 {
-- 
2.30.2

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 378+ messages in thread

* [PATCH v6 27/64] KVM: arm64: nv: Allow a sysreg to be hidden from userspace only
@ 2022-01-28 12:18   ` Marc Zyngier
  0 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:18 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Jintack Lim, Haibo Xu,
	Ganapatrao Kulkarni, Chase Conklin, Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

So far, we never needed to distinguish between registers hidden
from userspace and being hidden from a guest (they are always
either visible to both, or hidden from both).

With NV, we have the ugly case of the EL{0,1}2 registers, which
are only a view on the EL{0,1} registers. It makes absolutely no
sense to expose them to userspace, since it already has the
canonical view.

Add a new visibility flag (REG_HIDDEN_USER) and a new helper that
checks for it and REG_HIDDEN when checking whether to expose
a sysreg to userspace. Subsequent patches will make use of it.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/sys_regs.c |  6 +++---
 arch/arm64/kvm/sys_regs.h | 12 +++++++++++-
 2 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 31d739d59f67..0c9bbe5eee5e 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -3031,7 +3031,7 @@ int kvm_arm_sys_reg_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg
 		return get_invariant_sys_reg(reg->id, uaddr);
 
 	/* Check for regs disabled by runtime config */
-	if (sysreg_hidden(vcpu, r))
+	if (sysreg_hidden_user(vcpu, r))
 		return -ENOENT;
 
 	if (r->get_user)
@@ -3056,7 +3056,7 @@ int kvm_arm_sys_reg_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg
 		return set_invariant_sys_reg(reg->id, uaddr);
 
 	/* Check for regs disabled by runtime config */
-	if (sysreg_hidden(vcpu, r))
+	if (sysreg_hidden_user(vcpu, r))
 		return -ENOENT;
 
 	if (r->set_user)
@@ -3127,7 +3127,7 @@ static int walk_one_sys_reg(const struct kvm_vcpu *vcpu,
 	if (!(rd->reg || rd->get_user))
 		return 0;
 
-	if (sysreg_hidden(vcpu, rd))
+	if (sysreg_hidden_user(vcpu, rd))
 		return 0;
 
 	if (!copy_reg_to_user(rd, uind))
diff --git a/arch/arm64/kvm/sys_regs.h b/arch/arm64/kvm/sys_regs.h
index cc0cc95a0280..850629f083a3 100644
--- a/arch/arm64/kvm/sys_regs.h
+++ b/arch/arm64/kvm/sys_regs.h
@@ -78,7 +78,8 @@ struct sys_reg_desc {
 };
 
 #define REG_HIDDEN		(1 << 0) /* hidden from userspace and guest */
-#define REG_RAZ			(1 << 1) /* RAZ from userspace and guest */
+#define REG_HIDDEN_USER		(1 << 1) /* hidden from userspace only */
+#define REG_RAZ			(1 << 2) /* RAZ from userspace and guest */
 
 static __printf(2, 3)
 inline void print_sys_reg_msg(const struct sys_reg_params *p,
@@ -138,6 +139,15 @@ static inline bool sysreg_hidden(const struct kvm_vcpu *vcpu,
 	return r->visibility(vcpu, r) & REG_HIDDEN;
 }
 
+static inline bool sysreg_hidden_user(const struct kvm_vcpu *vcpu,
+				      const struct sys_reg_desc *r)
+{
+	if (likely(!r->visibility))
+		return false;
+
+	return r->visibility(vcpu, r) & (REG_HIDDEN | REG_HIDDEN_USER);
+}
+
 static inline bool sysreg_visible_as_raz(const struct kvm_vcpu *vcpu,
 					 const struct sys_reg_desc *r)
 {
-- 
2.30.2


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 378+ messages in thread

* [PATCH v6 28/64] KVM: arm64: nv: Emulate EL12 register accesses from the virtual EL2
  2022-01-28 12:18 ` Marc Zyngier
  (?)
@ 2022-01-28 12:18   ` Marc Zyngier
  -1 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:18 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Jintack Lim, Haibo Xu,
	Ganapatrao Kulkarni, Chase Conklin, Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

From: Jintack Lim <jintack.lim@linaro.org>

With the HCR_EL2.NV bit set, accesses to EL12 registers in the virtual
EL2 trap to EL2. Handle those traps just like we do for EL1 registers.

One exception is CNTKCTL_EL12. We don't trap on CNTKCTL_EL1 for non-VHE
virtual EL2 because we don't have to. However, accessing CNTKCTL_EL12
will trap, since it is one of the EL12 registers controlled by the
HCR_EL2.NV bit. Therefore, add a handler for it and don't treat it as a
non-trapped register when preparing a shadow context.

These registers, being only a view on their EL1 counterpart, are
permanently hidden from userspace.

Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
[maz: EL12_REG(), register visibility]
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/sys_regs.c | 37 +++++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 0c9bbe5eee5e..697bf0bca550 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1634,6 +1634,26 @@ static unsigned int el2_visibility(const struct kvm_vcpu *vcpu,
 	.val = v,				\
 }
 
+/*
+ * EL{0,1}2 registers are the EL2 view on an EL0 or EL1 register when
+ * HCR_EL2.E2H==1, and only in the sysreg table for convenience of
+ * handling traps. Given that, they are always hidden from userspace.
+ */
+static unsigned int elx2_visibility(const struct kvm_vcpu *vcpu,
+				    const struct sys_reg_desc *rd)
+{
+	return REG_HIDDEN_USER;
+}
+
+#define EL12_REG(name, acc, rst, v) {		\
+	SYS_DESC(SYS_##name##_EL12),		\
+	.access = acc,				\
+	.reset = rst,				\
+	.reg = name##_EL1,			\
+	.val = v,				\
+	.visibility = elx2_visibility,		\
+}
+
 /* sys_reg_desc initialiser for known cpufeature ID registers */
 #define ID_SANITISED(name) {			\
 	SYS_DESC(SYS_##name),			\
@@ -2194,6 +2214,23 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 	EL2_REG(CNTVOFF_EL2, access_rw, reset_val, 0),
 	EL2_REG(CNTHCTL_EL2, access_rw, reset_val, 0),
 
+	EL12_REG(SCTLR, access_vm_reg, reset_val, 0x00C50078),
+	EL12_REG(CPACR, access_rw, reset_val, 0),
+	EL12_REG(TTBR0, access_vm_reg, reset_unknown, 0),
+	EL12_REG(TTBR1, access_vm_reg, reset_unknown, 0),
+	EL12_REG(TCR, access_vm_reg, reset_val, 0),
+	{ SYS_DESC(SYS_SPSR_EL12), access_spsr},
+	{ SYS_DESC(SYS_ELR_EL12), access_elr},
+	EL12_REG(AFSR0, access_vm_reg, reset_unknown, 0),
+	EL12_REG(AFSR1, access_vm_reg, reset_unknown, 0),
+	EL12_REG(ESR, access_vm_reg, reset_unknown, 0),
+	EL12_REG(FAR, access_vm_reg, reset_unknown, 0),
+	EL12_REG(MAIR, access_vm_reg, reset_unknown, 0),
+	EL12_REG(AMAIR, access_vm_reg, reset_amair_el1, 0),
+	EL12_REG(VBAR, access_rw, reset_val, 0),
+	EL12_REG(CONTEXTIDR, access_vm_reg, reset_val, 0),
+	EL12_REG(CNTKCTL, access_rw, reset_val, 0),
+
 	EL2_REG(SP_EL2, NULL, reset_unknown, 0),
 };
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 378+ messages in thread
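
As a rough model of what the EL12 aliasing above amounts to, the
standalone C sketch below (with made-up storage indices and helper
names, not kernel code) redirects a write to an EL12 alias into the EL1
slot that the vcpu already maintains. This is why the EL12_REG()
entries can simply point .reg at name##_EL1.

	#include <stdint.h>
	#include <stdio.h>

	/* Stand-in indices for the vcpu sysreg array; names are illustrative */
	enum { SCTLR_EL1, TCR_EL1, CNTKCTL_EL1, NR_SYS_REGS };

	static uint64_t vcpu_sys_regs[NR_SYS_REGS];

	/*
	 * An EL12 descriptor only needs to name the EL1 slot it aliases:
	 * accesses from virtual EL2 land in the same storage as EL1 accesses.
	 */
	struct el12_desc {
		const char *name;
		int backing_reg;
	};

	static const struct el12_desc sctlr_el12 = { "SCTLR_EL12", SCTLR_EL1 };

	static void write_el12(const struct el12_desc *d, uint64_t val)
	{
		vcpu_sys_regs[d->backing_reg] = val;	/* same slot as the EL1 view */
	}

	int main(void)
	{
		write_el12(&sctlr_el12, 0x00C50078);
		printf("SCTLR_EL1 now reads %#llx\n",
		       (unsigned long long)vcpu_sys_regs[SCTLR_EL1]);
		return 0;
	}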

* [PATCH v6 29/64] KVM: arm64: nv: Forward debug traps to the nested guest
  2022-01-28 12:18 ` Marc Zyngier
  (?)
@ 2022-01-28 12:18   ` Marc Zyngier
  -1 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:18 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Jintack Lim, Haibo Xu,
	Ganapatrao Kulkarni, Chase Conklin, Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

When handling a debug register trap, first check whether it should be
forwarded to the guest hypervisor instead of being handled locally.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_nested.h | 2 ++
 arch/arm64/kvm/emulate-nested.c     | 9 +++++++--
 arch/arm64/kvm/sys_regs.c           | 3 +++
 3 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
index 82fc8b6c990b..047ca700163b 100644
--- a/arch/arm64/include/asm/kvm_nested.h
+++ b/arch/arm64/include/asm/kvm_nested.h
@@ -66,6 +66,8 @@ static inline u64 translate_cnthctl_el2_to_cntkctl_el1(u64 cnthctl)
 }
 
 int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe);
+extern bool __forward_traps(struct kvm_vcpu *vcpu, unsigned int reg,
+			    u64 control_bit);
 extern bool forward_traps(struct kvm_vcpu *vcpu, u64 control_bit);
 extern bool forward_nv_traps(struct kvm_vcpu *vcpu);
 extern bool forward_nv1_traps(struct kvm_vcpu *vcpu);
diff --git a/arch/arm64/kvm/emulate-nested.c b/arch/arm64/kvm/emulate-nested.c
index 0109dfd664dd..1f6cf8fe9fe3 100644
--- a/arch/arm64/kvm/emulate-nested.c
+++ b/arch/arm64/kvm/emulate-nested.c
@@ -13,14 +13,14 @@
 
 #include "trace.h"
 
-bool forward_traps(struct kvm_vcpu *vcpu, u64 control_bit)
+bool __forward_traps(struct kvm_vcpu *vcpu, unsigned int reg, u64 control_bit)
 {
 	bool control_bit_set;
 
 	if (!vcpu_has_nv(vcpu))
 		return false;
 
-	control_bit_set = __vcpu_sys_reg(vcpu, HCR_EL2) & control_bit;
+	control_bit_set = __vcpu_sys_reg(vcpu, reg) & control_bit;
 	if (!vcpu_is_el2(vcpu) && control_bit_set) {
 		kvm_inject_nested_sync(vcpu, kvm_vcpu_get_esr(vcpu));
 		return true;
@@ -28,6 +28,11 @@ bool forward_traps(struct kvm_vcpu *vcpu, u64 control_bit)
 	return false;
 }
 
+bool forward_traps(struct kvm_vcpu *vcpu, u64 control_bit)
+{
+	return __forward_traps(vcpu, HCR_EL2, control_bit);
+}
+
 bool forward_nv_traps(struct kvm_vcpu *vcpu)
 {
 	return forward_traps(vcpu, HCR_NV);
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 697bf0bca550..3e1f37c507a8 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -566,6 +566,9 @@ static bool trap_debug_regs(struct kvm_vcpu *vcpu,
 			    struct sys_reg_params *p,
 			    const struct sys_reg_desc *r)
 {
+	if (__forward_traps(vcpu, MDCR_EL2, MDCR_EL2_TDA | MDCR_EL2_TDE))
+		return false;
+
 	access_rw(vcpu, p, r);
 	if (p->is_write)
 		vcpu->arch.flags |= KVM_ARM64_DEBUG_DIRTY;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 378+ messages in thread
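
The key change here is that the forwarding predicate is now
parameterised by which virtual control register to consult (HCR_EL2 for
the existing callers, MDCR_EL2 for the debug traps). Below is a
standalone sketch of that decision, using a toy vcpu structure and
illustrative bit positions rather than the architectural ones.

	#include <stdbool.h>
	#include <stdint.h>
	#include <stdio.h>

	/* Illustrative bit positions, not the architectural ones */
	#define MDCR_TDA	(1ULL << 9)
	#define MDCR_TDE	(1ULL << 8)

	enum vreg { VHCR_EL2, VMDCR_EL2, NR_VREGS };

	struct toy_vcpu {
		bool has_nv;		/* NV feature enabled for this vcpu */
		bool is_el2;		/* currently running in virtual EL2 */
		uint64_t vregs[NR_VREGS];
	};

	/* Mirrors __forward_traps(): forward when the guest hypervisor asked for it */
	static bool forward_traps(struct toy_vcpu *v, enum vreg reg, uint64_t bits)
	{
		if (!v->has_nv)
			return false;

		return !v->is_el2 && (v->vregs[reg] & bits);
	}

	int main(void)
	{
		struct toy_vcpu v = { .has_nv = true, .is_el2 = false };

		v.vregs[VMDCR_EL2] = MDCR_TDE;
		printf("forward debug trap: %d\n",
		       forward_traps(&v, VMDCR_EL2, MDCR_TDA | MDCR_TDE));
		return 0;
	}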

* [PATCH v6 30/64] KVM: arm64: nv: Configure HCR_EL2 for nested virtualization
  2022-01-28 12:18 ` Marc Zyngier
  (?)
@ 2022-01-28 12:18   ` Marc Zyngier
  -1 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:18 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Jintack Lim, Haibo Xu,
	Ganapatrao Kulkarni, Chase Conklin, Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

From: Jintack Lim <jintack.lim@linaro.org>

We enable nested virtualization by setting the HCR NV and NV1 bits.

When the virtual E2H bit is set, we can support EL2 register accesses
via EL1 registers from the virtual EL2 by doing trap-and-emulate. A
better alternative, however, is to allow the virtual EL2 to access the
EL2 register state without trapping. This is easily achieved by not
trapping EL1 registers, since those registers already hold the EL2
register state.

Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_arm.h |  1 +
 arch/arm64/kvm/hyp/vhe/switch.c  | 38 +++++++++++++++++++++++++++++---
 2 files changed, 36 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index 748c2b068d4e..d913c3fb5665 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -87,6 +87,7 @@
 			 HCR_BSU_IS | HCR_FB | HCR_TACR | \
 			 HCR_AMO | HCR_SWIO | HCR_TIDCP | HCR_RW | HCR_TLOR | \
 			 HCR_FMO | HCR_IMO | HCR_PTW )
+#define HCR_GUEST_NV_FILTER_FLAGS (HCR_ATA | HCR_API | HCR_APK)
 #define HCR_VIRT_EXCP_MASK (HCR_VSE | HCR_VI | HCR_VF)
 #define HCR_HOST_NVHE_FLAGS (HCR_RW | HCR_API | HCR_APK | HCR_ATA)
 #define HCR_HOST_NVHE_PROTECTED_FLAGS (HCR_HOST_NVHE_FLAGS | HCR_TSC)
diff --git a/arch/arm64/kvm/hyp/vhe/switch.c b/arch/arm64/kvm/hyp/vhe/switch.c
index 1e6a00e7bfb3..28845f907cfc 100644
--- a/arch/arm64/kvm/hyp/vhe/switch.c
+++ b/arch/arm64/kvm/hyp/vhe/switch.c
@@ -35,9 +35,41 @@ static void __activate_traps(struct kvm_vcpu *vcpu)
 	u64 hcr = vcpu->arch.hcr_el2;
 	u64 val;
 
-	/* Trap VM sysreg accesses if an EL2 guest is not using VHE. */
-	if (vcpu_is_el2(vcpu) && !vcpu_el2_e2h_is_set(vcpu))
-		hcr |= HCR_TVM | HCR_TRVM;
+	if (is_hyp_ctxt(vcpu)) {
+		hcr |= HCR_NV;
+
+		if (!vcpu_el2_e2h_is_set(vcpu)) {
+			/*
+			 * For a guest hypervisor on v8.0, trap and emulate
+			 * the EL1 virtual memory control register accesses.
+			 */
+			hcr |= HCR_TVM | HCR_TRVM | HCR_NV1;
+		} else {
+			/*
+			 * For a guest hypervisor on v8.1 (VHE), allow to
+			 * access the EL1 virtual memory control registers
+			 * natively. These accesses are to access EL2 register
+			 * states.
+			 * Note that we still need to respect the virtual
+			 * HCR_EL2 state.
+			 */
+			u64 vhcr_el2 = __vcpu_sys_reg(vcpu, HCR_EL2);
+
+			vhcr_el2 &= ~HCR_GUEST_NV_FILTER_FLAGS;
+
+			/*
+			 * We already set TVM to handle set/way cache maint
+			 * ops traps, this somewhat collides with the nested
+			 * virt trapping for nVHE. So turn this off for now
+			 * here, in the hope that VHE guests won't ever do this.
+			 * TODO: find out whether it's worth to support both
+			 * cases at the same time.
+			 */
+			hcr &= ~HCR_TVM;
+
+			hcr |= vhcr_el2 & (HCR_TVM | HCR_TRVM);
+		}
+	}
 
 	___activate_traps(vcpu, hcr);
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 378+ messages in thread
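
Here is a rough, standalone model of the HCR_EL2 massaging performed in
__activate_traps() above. The bit positions are made up for the sketch
(the real definitions live in kvm_arm.h) and the helper name is
invented; it only illustrates the two branches of the hyp-context case.

	#include <stdbool.h>
	#include <stdint.h>
	#include <stdio.h>

	/* Illustrative bit positions, not the architectural ones */
	#define HCR_NV		(1ULL << 42)
	#define HCR_NV1		(1ULL << 43)
	#define HCR_TVM		(1ULL << 26)
	#define HCR_TRVM	(1ULL << 30)

	/* HCR value used while the vcpu is in a hyp context (invented helper) */
	static uint64_t hyp_ctxt_hcr(uint64_t hcr, uint64_t vhcr_el2, bool e2h)
	{
		hcr |= HCR_NV;

		if (!e2h) {
			/* v8.0 guest hypervisor: trap and emulate EL1 VM regs */
			hcr |= HCR_TVM | HCR_TRVM | HCR_NV1;
		} else {
			/* VHE guest hypervisor: honour its own TVM/TRVM settings */
			hcr &= ~HCR_TVM;
			hcr |= vhcr_el2 & (HCR_TVM | HCR_TRVM);
		}

		return hcr;
	}

	int main(void)
	{
		printf("nVHE guest hypervisor: %#llx\n",
		       (unsigned long long)hyp_ctxt_hcr(0, 0, false));
		printf("VHE guest hypervisor:  %#llx\n",
		       (unsigned long long)hyp_ctxt_hcr(HCR_TVM, HCR_TRVM, true));
		return 0;
	}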

* [PATCH v6 31/64] KVM: arm64: nv: Only toggle cache for virtual EL2 when SCTLR_EL2 changes
  2022-01-28 12:18 ` Marc Zyngier
  (?)
@ 2022-01-28 12:18   ` Marc Zyngier
  -1 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:18 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Jintack Lim, Haibo Xu,
	Ganapatrao Kulkarni, Chase Conklin, Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

From: Christoffer Dall <christoffer.dall@linaro.org>

So far we were flushing almost the entire universe whenever a VM would
load/unload SCTLR_EL1 and the two versions of that register had
different MMU-enabled settings. This turned out to be so slow that it
prevented forward progress for a nested VM, because a scheduler timer
tick interrupt would always be pending by the time we reached the
nested VM.

To avoid this problem, we consider SCTLR_EL2 when evaluating whether
caches are on or off when entering virtual EL2 (because this is the
value that ends up being shadowed onto the hardware EL1 register).

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_mmu.h | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 81839e9a8a24..1b314b2a69bc 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -115,6 +115,7 @@ alternative_cb_end
 #include <asm/cache.h>
 #include <asm/cacheflush.h>
 #include <asm/mmu_context.h>
+#include <asm/kvm_emulate.h>
 
 void kvm_update_va_mask(struct alt_instr *alt,
 			__le32 *origptr, __le32 *updptr, int nr_inst);
@@ -187,7 +188,10 @@ struct kvm;
 
 static inline bool vcpu_has_cache_enabled(struct kvm_vcpu *vcpu)
 {
-	return (vcpu_read_sys_reg(vcpu, SCTLR_EL1) & 0b101) == 0b101;
+	if (vcpu_is_el2(vcpu))
+		return (__vcpu_sys_reg(vcpu, SCTLR_EL2) & 0b101) == 0b101;
+	else
+		return (vcpu_read_sys_reg(vcpu, SCTLR_EL1) & 0b101) == 0b101;
 }
 
 static inline void __clean_dcache_guest_page(void *va, size_t size)
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 378+ messages in thread
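
The 0b101 constant tests SCTLR.M (bit 0) and SCTLR.C (bit 2) together.
A small standalone sketch of the selection between the EL1 and EL2
views, with the relevant bits spelled out (helper names are invented
for the illustration):

	#include <stdbool.h>
	#include <stdint.h>
	#include <stdio.h>

	#define SCTLR_M	(1ULL << 0)	/* MMU enable */
	#define SCTLR_C	(1ULL << 2)	/* data cache enable */

	/* Caches only count as "on" when both the MMU and the D-cache are enabled */
	static bool cache_enabled(uint64_t sctlr)
	{
		return (sctlr & (SCTLR_M | SCTLR_C)) == (SCTLR_M | SCTLR_C);
	}

	/* Pick the view that will actually be shadowed onto the hardware EL1 regs */
	static bool vcpu_cache_enabled(bool vcpu_is_el2, uint64_t sctlr_el2,
				       uint64_t sctlr_el1)
	{
		return cache_enabled(vcpu_is_el2 ? sctlr_el2 : sctlr_el1);
	}

	int main(void)
	{
		/* virtual EL2 has MMU+caches on while EL1 is still off: caches on */
		printf("%d\n", vcpu_cache_enabled(true, SCTLR_M | SCTLR_C, 0));
		return 0;
	}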

* [PATCH v6 32/64] KVM: arm64: nv: Filter out unsupported features from ID regs
  2022-01-28 12:18 ` Marc Zyngier
  (?)
@ 2022-01-28 12:18   ` Marc Zyngier
  -1 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:18 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Jintack Lim, Haibo Xu,
	Ganapatrao Kulkarni, Chase Conklin, Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

As there are a number of features that we either can't support, or
don't want to support right away with NV, let's add some basic
filtering so that we don't advertise silly things to the EL2 guest.

Whilst we are at it, advertise ARMv8.4-TTL as well as ARMv8.5-GTG.

Reviewed-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_nested.h |   6 ++
 arch/arm64/kvm/nested.c             | 157 ++++++++++++++++++++++++++++
 arch/arm64/kvm/sys_regs.c           |   4 +-
 arch/arm64/kvm/sys_regs.h           |   2 +
 4 files changed, 168 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
index 047ca700163b..7d398510fd9d 100644
--- a/arch/arm64/include/asm/kvm_nested.h
+++ b/arch/arm64/include/asm/kvm_nested.h
@@ -72,4 +72,10 @@ extern bool forward_traps(struct kvm_vcpu *vcpu, u64 control_bit);
 extern bool forward_nv_traps(struct kvm_vcpu *vcpu);
 extern bool forward_nv1_traps(struct kvm_vcpu *vcpu);
 
+struct sys_reg_params;
+struct sys_reg_desc;
+
+void access_nested_id_reg(struct kvm_vcpu *v, struct sys_reg_params *p,
+			  const struct sys_reg_desc *r);
+
 #endif /* __ARM64_KVM_NESTED_H */
diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
index 5e1104f8e765..254152cd791e 100644
--- a/arch/arm64/kvm/nested.c
+++ b/arch/arm64/kvm/nested.c
@@ -8,6 +8,10 @@
 #include <linux/kvm_host.h>
 
 #include <asm/kvm_emulate.h>
+#include <asm/kvm_nested.h>
+#include <asm/sysreg.h>
+
+#include "sys_regs.h"
 
 /*
  * Inject wfx to the virtual EL2 if this is not from the virtual EL2 and
@@ -26,3 +30,156 @@ int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe)
 
 	return -EINVAL;
 }
+
+/*
+ * Our emulated CPU doesn't support all the possible features. For the
+ * sake of simplicity (and probably mental sanity), wipe out a number
+ * of feature bits we don't intend to support for the time being.
+ * This list should get updated as new features get added to the NV
+ * support, and as new extensions are added to the architecture.
+ */
+void access_nested_id_reg(struct kvm_vcpu *v, struct sys_reg_params *p,
+			  const struct sys_reg_desc *r)
+{
+	u32 id = sys_reg((u32)r->Op0, (u32)r->Op1,
+			 (u32)r->CRn, (u32)r->CRm, (u32)r->Op2);
+	u64 val, tmp;
+
+	if (!vcpu_has_nv(v))
+		return;
+
+	val = p->regval;
+
+	switch (id) {
+	case SYS_ID_AA64ISAR0_EL1:
+		/* Support everything but O.S. and Range TLBIs */
+		val &= ~(FEATURE(ID_AA64ISAR0_TLB)	|
+			 GENMASK_ULL(27, 24)		|
+			 GENMASK_ULL(3, 0));
+		break;
+
+	case SYS_ID_AA64ISAR1_EL1:
+		/* Support everything but PtrAuth and Spec Invalidation */
+		val &= ~(GENMASK_ULL(63, 56)		|
+			 FEATURE(ID_AA64ISAR1_SPECRES)	|
+			 FEATURE(ID_AA64ISAR1_GPI)	|
+			 FEATURE(ID_AA64ISAR1_GPA)	|
+			 FEATURE(ID_AA64ISAR1_API)	|
+			 FEATURE(ID_AA64ISAR1_APA));
+		break;
+
+	case SYS_ID_AA64PFR0_EL1:
+		/* No AMU, MPAM, S-EL2, RAS or SVE */
+		val &= ~(GENMASK_ULL(55, 52)		|
+			 FEATURE(ID_AA64PFR0_AMU)	|
+			 FEATURE(ID_AA64PFR0_MPAM)	|
+			 FEATURE(ID_AA64PFR0_SEL2)	|
+			 FEATURE(ID_AA64PFR0_RAS)	|
+			 FEATURE(ID_AA64PFR0_SVE)	|
+			 FEATURE(ID_AA64PFR0_EL3)	|
+			 FEATURE(ID_AA64PFR0_EL2));
+		/* 64bit EL2/EL3 only */
+		val |= FIELD_PREP(FEATURE(ID_AA64PFR0_EL2), 0b0001);
+		val |= FIELD_PREP(FEATURE(ID_AA64PFR0_EL3), 0b0001);
+		break;
+
+	case SYS_ID_AA64PFR1_EL1:
+		/* Only support SSBS */
+		val &= FEATURE(ID_AA64PFR1_SSBS);
+		break;
+
+	case SYS_ID_AA64MMFR0_EL1:
+		/* Hide ECV, FGT, ExS, Secure Memory */
+		val &= ~(GENMASK_ULL(63, 43)			|
+			 FEATURE(ID_AA64MMFR0_TGRAN4_2)		|
+			 FEATURE(ID_AA64MMFR0_TGRAN16_2)	|
+			 FEATURE(ID_AA64MMFR0_TGRAN64_2)	|
+			 FEATURE(ID_AA64MMFR0_SNSMEM));
+
+		/* Disallow unsupported S2 page sizes */
+		switch (PAGE_SIZE) {
+		case SZ_64K:
+			val |= FIELD_PREP(FEATURE(ID_AA64MMFR0_TGRAN16_2), 0b0001);
+			fallthrough;
+		case SZ_16K:
+			val |= FIELD_PREP(FEATURE(ID_AA64MMFR0_TGRAN4_2), 0b0001);
+			fallthrough;
+		case SZ_4K:
+			/* Support everything */
+			break;
+		}
+		/*
+		 * Since we can't support a guest S2 page size smaller than
+		 * the host's own page size (due to KVM only populating its
+		 * own S2 using the kernel's page size), advertise the
+		 * limitation using FEAT_GTG.
+		 */
+		switch (PAGE_SIZE) {
+		case SZ_4K:
+			val |= FIELD_PREP(FEATURE(ID_AA64MMFR0_TGRAN4_2), 0b0010);
+			fallthrough;
+		case SZ_16K:
+			val |= FIELD_PREP(FEATURE(ID_AA64MMFR0_TGRAN16_2), 0b0010);
+			fallthrough;
+		case SZ_64K:
+			val |= FIELD_PREP(FEATURE(ID_AA64MMFR0_TGRAN64_2), 0b0010);
+			break;
+		}
+		/* Cap PARange to 40bits */
+		tmp = FIELD_GET(FEATURE(ID_AA64MMFR0_PARANGE), val);
+		if (tmp > 0b0010) {
+			val &= ~FEATURE(ID_AA64MMFR0_PARANGE);
+			val |= FIELD_PREP(FEATURE(ID_AA64MMFR0_PARANGE), 0b0010);
+		}
+		break;
+
+	case SYS_ID_AA64MMFR1_EL1:
+		val &= (FEATURE(ID_AA64MMFR1_PAN)	|
+			FEATURE(ID_AA64MMFR1_LOR)	|
+			FEATURE(ID_AA64MMFR1_HPD)	|
+			FEATURE(ID_AA64MMFR1_VHE)	|
+			FEATURE(ID_AA64MMFR1_VMIDBITS));
+		break;
+
+	case SYS_ID_AA64MMFR2_EL1:
+		val &= ~(FEATURE(ID_AA64MMFR2_EVT)	|
+			 FEATURE(ID_AA64MMFR2_BBM)	|
+			 FEATURE(ID_AA64MMFR2_TTL)	|
+			 GENMASK_ULL(47, 44)		|
+			 FEATURE(ID_AA64MMFR2_ST)	|
+			 FEATURE(ID_AA64MMFR2_CCIDX)	|
+			 FEATURE(ID_AA64MMFR2_LVA));
+
+		/* Force TTL support */
+		val |= FIELD_PREP(FEATURE(ID_AA64MMFR2_TTL), 0b0001);
+		break;
+
+	case SYS_ID_AA64DFR0_EL1:
+		/* Only limited support for PMU, Debug, BPs and WPs */
+		val &= (FEATURE(ID_AA64DFR0_PMSVER)	|
+			FEATURE(ID_AA64DFR0_WRPS)	|
+			FEATURE(ID_AA64DFR0_BRPS)	|
+			FEATURE(ID_AA64DFR0_DEBUGVER));
+
+		/* Cap PMU to ARMv8.1 */
+		tmp = FIELD_GET(FEATURE(ID_AA64DFR0_PMUVER), val);
+		if (tmp > 0b0100) {
+			val &= ~FEATURE(ID_AA64DFR0_PMUVER);
+			val |= FIELD_PREP(FEATURE(ID_AA64DFR0_PMUVER), 0b0100);
+		}
+		/* Cap Debug to ARMv8.1 */
+		tmp = FIELD_GET(FEATURE(ID_AA64DFR0_DEBUGVER), val);
+		if (tmp > 0b0111) {
+			val &= ~FEATURE(ID_AA64DFR0_DEBUGVER);
+			val |= FIELD_PREP(FEATURE(ID_AA64DFR0_DEBUGVER), 0b0111);
+		}
+		break;
+
+	default:
+		/* Unknown register, just wipe it clean */
+		val = 0;
+		break;
+	}
+
+	p->regval = val;
+}
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 3e1f37c507a8..ebcdf2714b73 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1394,8 +1394,10 @@ static bool access_id_reg(struct kvm_vcpu *vcpu,
 			  const struct sys_reg_desc *r)
 {
 	bool raz = sysreg_visible_as_raz(vcpu, r);
+	bool ret = __access_id_reg(vcpu, p, r, raz);
 
-	return __access_id_reg(vcpu, p, r, raz);
+	access_nested_id_reg(vcpu, p, r);
+	return ret;
 }
 
 static bool access_raz_id_reg(struct kvm_vcpu *vcpu,
diff --git a/arch/arm64/kvm/sys_regs.h b/arch/arm64/kvm/sys_regs.h
index 850629f083a3..b82683224250 100644
--- a/arch/arm64/kvm/sys_regs.h
+++ b/arch/arm64/kvm/sys_regs.h
@@ -211,4 +211,6 @@ const struct sys_reg_desc *find_reg_by_id(u64 id,
 	CRn(sys_reg_CRn(reg)), CRm(sys_reg_CRm(reg)),	\
 	Op2(sys_reg_Op2(reg))
 
+#define FEATURE(x)	(GENMASK_ULL(x##_SHIFT + 3, x##_SHIFT))
+
 #endif /* __ARM64_KVM_SYS_REGS_LOCAL_H__ */
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 378+ messages in thread
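
The FEATURE() helper added to sys_regs.h simply builds a 4-bit mask at
a field's shift, and capping a field (as done above for PARange, PMUVer
and DebugVer) amounts to clearing that mask and re-inserting the
ceiling value. A standalone sketch of that pattern follows; the field
layout is simplified and the names are invented for the illustration.

	#include <stdint.h>
	#include <stdio.h>

	/* 4-bit ID register field starting at the given shift, like FEATURE() */
	#define FEATURE_MASK(shift)	(0xfULL << (shift))

	#define TOY_PARANGE_SHIFT	0	/* simplified placement, real one differs */

	/* Clamp a 4-bit ID field to a ceiling, as done for PARange and PMUVer */
	static uint64_t cap_field(uint64_t reg, unsigned int shift, uint64_t max)
	{
		uint64_t field = (reg >> shift) & 0xf;

		if (field > max) {
			reg &= ~FEATURE_MASK(shift);
			reg |= max << shift;
		}

		return reg;
	}

	int main(void)
	{
		uint64_t id = 0x5;	/* PARange beyond 40 bits in this toy layout */

		id = cap_field(id, TOY_PARANGE_SHIFT, 2);	/* cap to the 40bit code */
		printf("capped field: %#llx\n", (unsigned long long)id);
		return 0;
	}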

* [PATCH v6 32/64] KVM: arm64: nv: Filter out unsupported features from ID regs
@ 2022-01-28 12:18   ` Marc Zyngier
  0 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:18 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: kernel-team, Andre Przywara, Christoffer Dall, Chase Conklin,
	Russell King (Oracle),
	mihai.carabas, Ganapatrao Kulkarni

As there is a number of features that we either can't support,
or don't want to support right away with NV, let's add some
basic filtering so that we don't advertize silly things to the
EL2 guest.

Whilst we are at it, advertize ARMv8.4-TTL as well as ARMv8.5-GTG.

Reviewed-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_nested.h |   6 ++
 arch/arm64/kvm/nested.c             | 157 ++++++++++++++++++++++++++++
 arch/arm64/kvm/sys_regs.c           |   4 +-
 arch/arm64/kvm/sys_regs.h           |   2 +
 4 files changed, 168 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
index 047ca700163b..7d398510fd9d 100644
--- a/arch/arm64/include/asm/kvm_nested.h
+++ b/arch/arm64/include/asm/kvm_nested.h
@@ -72,4 +72,10 @@ extern bool forward_traps(struct kvm_vcpu *vcpu, u64 control_bit);
 extern bool forward_nv_traps(struct kvm_vcpu *vcpu);
 extern bool forward_nv1_traps(struct kvm_vcpu *vcpu);
 
+struct sys_reg_params;
+struct sys_reg_desc;
+
+void access_nested_id_reg(struct kvm_vcpu *v, struct sys_reg_params *p,
+			  const struct sys_reg_desc *r);
+
 #endif /* __ARM64_KVM_NESTED_H */
diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
index 5e1104f8e765..254152cd791e 100644
--- a/arch/arm64/kvm/nested.c
+++ b/arch/arm64/kvm/nested.c
@@ -8,6 +8,10 @@
 #include <linux/kvm_host.h>
 
 #include <asm/kvm_emulate.h>
+#include <asm/kvm_nested.h>
+#include <asm/sysreg.h>
+
+#include "sys_regs.h"
 
 /*
  * Inject wfx to the virtual EL2 if this is not from the virtual EL2 and
@@ -26,3 +30,156 @@ int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe)
 
 	return -EINVAL;
 }
+
+/*
+ * Our emulated CPU doesn't support all the possible features. For the
+ * sake of simplicity (and probably mental sanity), wipe out a number
+ * of feature bits we don't intend to support for the time being.
+ * This list should get updated as new features get added to the NV
+ * support, and new extension to the architecture.
+ */
+void access_nested_id_reg(struct kvm_vcpu *v, struct sys_reg_params *p,
+			  const struct sys_reg_desc *r)
+{
+	u32 id = sys_reg((u32)r->Op0, (u32)r->Op1,
+			 (u32)r->CRn, (u32)r->CRm, (u32)r->Op2);
+	u64 val, tmp;
+
+	if (!vcpu_has_nv(v))
+		return;
+
+	val = p->regval;
+
+	switch (id) {
+	case SYS_ID_AA64ISAR0_EL1:
+		/* Support everything but O.S. and Range TLBIs */
+		val &= ~(FEATURE(ID_AA64ISAR0_TLB)	|
+			 GENMASK_ULL(27, 24)		|
+			 GENMASK_ULL(3, 0));
+		break;
+
+	case SYS_ID_AA64ISAR1_EL1:
+		/* Support everything but PtrAuth and Spec Invalidation */
+		val &= ~(GENMASK_ULL(63, 56)		|
+			 FEATURE(ID_AA64ISAR1_SPECRES)	|
+			 FEATURE(ID_AA64ISAR1_GPI)	|
+			 FEATURE(ID_AA64ISAR1_GPA)	|
+			 FEATURE(ID_AA64ISAR1_API)	|
+			 FEATURE(ID_AA64ISAR1_APA));
+		break;
+
+	case SYS_ID_AA64PFR0_EL1:
+		/* No AMU, MPAM, S-EL2, RAS or SVE */
+		val &= ~(GENMASK_ULL(55, 52)		|
+			 FEATURE(ID_AA64PFR0_AMU)	|
+			 FEATURE(ID_AA64PFR0_MPAM)	|
+			 FEATURE(ID_AA64PFR0_SEL2)	|
+			 FEATURE(ID_AA64PFR0_RAS)	|
+			 FEATURE(ID_AA64PFR0_SVE)	|
+			 FEATURE(ID_AA64PFR0_EL3)	|
+			 FEATURE(ID_AA64PFR0_EL2));
+		/* 64bit EL2/EL3 only */
+		val |= FIELD_PREP(FEATURE(ID_AA64PFR0_EL2), 0b0001);
+		val |= FIELD_PREP(FEATURE(ID_AA64PFR0_EL3), 0b0001);
+		break;
+
+	case SYS_ID_AA64PFR1_EL1:
+		/* Only support SSBS */
+		val &= FEATURE(ID_AA64PFR1_SSBS);
+		break;
+
+	case SYS_ID_AA64MMFR0_EL1:
+		/* Hide ECV, FGT, ExS, Secure Memory */
+		val &= ~(GENMASK_ULL(63, 43)			|
+			 FEATURE(ID_AA64MMFR0_TGRAN4_2)		|
+			 FEATURE(ID_AA64MMFR0_TGRAN16_2)	|
+			 FEATURE(ID_AA64MMFR0_TGRAN64_2)	|
+			 FEATURE(ID_AA64MMFR0_SNSMEM));
+
+		/* Disallow unsupported S2 page sizes */
+		switch (PAGE_SIZE) {
+		case SZ_64K:
+			val |= FIELD_PREP(FEATURE(ID_AA64MMFR0_TGRAN16_2), 0b0001);
+			fallthrough;
+		case SZ_16K:
+			val |= FIELD_PREP(FEATURE(ID_AA64MMFR0_TGRAN4_2), 0b0001);
+			fallthrough;
+		case SZ_4K:
+			/* Support everything */
+			break;
+		}
+		/*
+		 * Since we can't support a guest S2 page size smaller than
+		 * the host's own page size (due to KVM only populating its
+		 * own S2 using the kernel's page size), advertise the
+		 * limitation using FEAT_GTG.
+		 */
+		switch (PAGE_SIZE) {
+		case SZ_4K:
+			val |= FIELD_PREP(FEATURE(ID_AA64MMFR0_TGRAN4_2), 0b0010);
+			fallthrough;
+		case SZ_16K:
+			val |= FIELD_PREP(FEATURE(ID_AA64MMFR0_TGRAN16_2), 0b0010);
+			fallthrough;
+		case SZ_64K:
+			val |= FIELD_PREP(FEATURE(ID_AA64MMFR0_TGRAN64_2), 0b0010);
+			break;
+		}
+		/* Cap PARange to 40bits */
+		tmp = FIELD_GET(FEATURE(ID_AA64MMFR0_PARANGE), val);
+		if (tmp > 0b0010) {
+			val &= ~FEATURE(ID_AA64MMFR0_PARANGE);
+			val |= FIELD_PREP(FEATURE(ID_AA64MMFR0_PARANGE), 0b0010);
+		}
+		break;
+
+	case SYS_ID_AA64MMFR1_EL1:
+		val &= (FEATURE(ID_AA64MMFR1_PAN)	|
+			FEATURE(ID_AA64MMFR1_LOR)	|
+			FEATURE(ID_AA64MMFR1_HPD)	|
+			FEATURE(ID_AA64MMFR1_VHE)	|
+			FEATURE(ID_AA64MMFR1_VMIDBITS));
+		break;
+
+	case SYS_ID_AA64MMFR2_EL1:
+		val &= ~(FEATURE(ID_AA64MMFR2_EVT)	|
+			 FEATURE(ID_AA64MMFR2_BBM)	|
+			 FEATURE(ID_AA64MMFR2_TTL)	|
+			 GENMASK_ULL(47, 44)		|
+			 FEATURE(ID_AA64MMFR2_ST)	|
+			 FEATURE(ID_AA64MMFR2_CCIDX)	|
+			 FEATURE(ID_AA64MMFR2_LVA));
+
+		/* Force TTL support */
+		val |= FIELD_PREP(FEATURE(ID_AA64MMFR2_TTL), 0b0001);
+		break;
+
+	case SYS_ID_AA64DFR0_EL1:
+		/* Only limited support for PMU, Debug, BPs and WPs */
+		val &= (FEATURE(ID_AA64DFR0_PMSVER)	|
+			FEATURE(ID_AA64DFR0_WRPS)	|
+			FEATURE(ID_AA64DFR0_BRPS)	|
+			FEATURE(ID_AA64DFR0_DEBUGVER));
+
+		/* Cap PMU to ARMv8.1 */
+		tmp = FIELD_GET(FEATURE(ID_AA64DFR0_PMUVER), val);
+		if (tmp > 0b0100) {
+			val &= ~FEATURE(ID_AA64DFR0_PMUVER);
+			val |= FIELD_PREP(FEATURE(ID_AA64DFR0_PMUVER), 0b0100);
+		}
+		/* Cap Debug to ARMv8.1 */
+		tmp = FIELD_GET(FEATURE(ID_AA64DFR0_DEBUGVER), val);
+		if (tmp > 0b0111) {
+			val &= ~FEATURE(ID_AA64DFR0_DEBUGVER);
+			val |= FIELD_PREP(FEATURE(ID_AA64DFR0_DEBUGVER), 0b0111);
+		}
+		break;
+
+	default:
+		/* Unknown register, just wipe it clean */
+		val = 0;
+		break;
+	}
+
+	p->regval = val;
+}
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 3e1f37c507a8..ebcdf2714b73 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1394,8 +1394,10 @@ static bool access_id_reg(struct kvm_vcpu *vcpu,
 			  const struct sys_reg_desc *r)
 {
 	bool raz = sysreg_visible_as_raz(vcpu, r);
+	bool ret = __access_id_reg(vcpu, p, r, raz);
 
-	return __access_id_reg(vcpu, p, r, raz);
+	access_nested_id_reg(vcpu, p, r);
+	return ret;
 }
 
 static bool access_raz_id_reg(struct kvm_vcpu *vcpu,
diff --git a/arch/arm64/kvm/sys_regs.h b/arch/arm64/kvm/sys_regs.h
index 850629f083a3..b82683224250 100644
--- a/arch/arm64/kvm/sys_regs.h
+++ b/arch/arm64/kvm/sys_regs.h
@@ -211,4 +211,6 @@ const struct sys_reg_desc *find_reg_by_id(u64 id,
 	CRn(sys_reg_CRn(reg)), CRm(sys_reg_CRm(reg)),	\
 	Op2(sys_reg_Op2(reg))
 
+#define FEATURE(x)	(GENMASK_ULL(x##_SHIFT + 3, x##_SHIFT))
+
 #endif /* __ARM64_KVM_SYS_REGS_LOCAL_H__ */
-- 
2.30.2
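
For reference on the FEAT_GTG advertisement above: 0b0001 in a TGRANx_2 field
means "not supported at stage 2" and 0b0010 means "supported", so the two
PAGE_SIZE switches mark every granule smaller than the host's page size as
unsupported and the host's own (and larger) granules as supported. A small
standalone sketch of that same logic, in plain C with a made-up helper name
(not kernel code), behaves as follows:

#include <stdio.h>

#define SZ_4K	0x1000L
#define SZ_16K	0x4000L
#define SZ_64K	0x10000L

/* out[0] = TGRAN4_2, out[1] = TGRAN16_2, out[2] = TGRAN64_2 */
static void gtg_values(long page_size, unsigned int out[3])
{
	out[0] = out[1] = out[2] = 0;

	/* Granules smaller than the host page size: not supported (0b0001) */
	switch (page_size) {
	case SZ_64K:
		out[1] = 0x1;
		/* fallthrough */
	case SZ_16K:
		out[0] = 0x1;
		/* fallthrough */
	case SZ_4K:
		break;
	}

	/* The host page size and anything larger: supported (0b0010) */
	switch (page_size) {
	case SZ_4K:
		out[0] = 0x2;
		/* fallthrough */
	case SZ_16K:
		out[1] = 0x2;
		/* fallthrough */
	case SZ_64K:
		out[2] = 0x2;
		break;
	}
}

int main(void)
{
	unsigned int v[3];

	/* A 64kB host ends up with TGRAN4_2=1, TGRAN16_2=1, TGRAN64_2=2 */
	gtg_values(SZ_64K, v);
	printf("TGRAN4_2=%u TGRAN16_2=%u TGRAN64_2=%u\n", v[0], v[1], v[2]);
	return 0;
}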

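The PARange, PMUVer and DebugVer capping above all follow the same
read-modify-write pattern: FEATURE(x) expands to
GENMASK_ULL(x##_SHIFT + 3, x##_SHIFT), a 4-bit mask at the field's shift,
FIELD_GET() extracts the current value and FIELD_PREP() writes the capped one
back. A self-contained sketch of that pattern with the masking open-coded
(generic field and hypothetical helper name, not the kernel macros):

#include <stdint.h>
#include <stdio.h>

/* Cap a 4-bit ID register field located at 'shift' to at most 'max' */
static uint64_t cap_id_field(uint64_t reg, unsigned int shift, uint64_t max)
{
	uint64_t mask = 0xfULL << shift;	/* what FEATURE() builds */
	uint64_t cur = (reg & mask) >> shift;	/* FIELD_GET() */

	if (cur > max) {
		reg &= ~mask;
		reg |= max << shift;		/* FIELD_PREP() */
	}
	return reg;
}

int main(void)
{
	/* PARange lives in bits [3:0]: 0b0101 (48-bit PA) is capped to 0b0010 (40-bit) */
	uint64_t val = 0x5;

	printf("0x%llx\n", (unsigned long long)cap_id_field(val, 0, 0x2));
	return 0;
}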

^ permalink raw reply related	[flat|nested] 378+ messages in thread

* [PATCH v6 33/64] KVM: arm64: nv: Hide RAS from nested guests
  2022-01-28 12:18 ` Marc Zyngier
  (?)
@ 2022-01-28 12:18   ` Marc Zyngier
  -1 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:18 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Jintack Lim, Haibo Xu,
	Ganapatrao Kulkarni, Chase Conklin, Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

We don't want to expose complicated features to guests until we have
a good grasp on the basic CPU emulation. So let's pretend that RAS
doesn't exist in a nested guest. We already hide the feature bits;
let's now make sure VDISR_EL2 will UNDEF.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/sys_regs.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index ebcdf2714b73..5e8876177ce6 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -2212,6 +2212,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 	EL2_REG(VBAR_EL2, access_rw, reset_val, 0),
 	EL2_REG(RVBAR_EL2, access_rw, reset_val, 0),
 	{ SYS_DESC(SYS_RMR_EL2), trap_undef },
+	{ SYS_DESC(SYS_VDISR_EL2), trap_undef },
 
 	EL2_REG(CONTEXTIDR_EL2, access_rw, reset_val, 0),
 	EL2_REG(TPIDR_EL2, access_rw, reset_val, 0),
-- 
2.30.2
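
For context, trap_undef is an existing sys_regs.c accessor that turns any
guest access to the register into an Undefined Instruction exception, which
is what makes VDISR_EL2 appear unimplemented to the vEL2 guest. A sketch
roughly of that shape (illustrative only, relying on the usual KVM/arm64
headers for kvm_inject_undefined() and the sys_regs types; not a verbatim
copy of the kernel's handler):

static bool undef_access_sketch(struct kvm_vcpu *vcpu,
				struct sys_reg_params *p,
				const struct sys_reg_desc *r)
{
	/*
	 * Reads and writes both take an Undefined Instruction exception,
	 * so the register looks unimplemented to the guest.
	 */
	kvm_inject_undefined(vcpu);
	return false;
}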


^ permalink raw reply related	[flat|nested] 378+ messages in thread

* [PATCH v6 34/64] KVM: arm64: nv: Support multiple nested Stage-2 mmu structures
  2022-01-28 12:18 ` Marc Zyngier
  (?)
@ 2022-01-28 12:18   ` Marc Zyngier
  -1 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:18 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Jintack Lim, Haibo Xu,
	Ganapatrao Kulkarni, Chase Conklin, Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

Add Stage-2 mmu data structures for virtual EL2 and for nested guests.
We don't yet populate shadow Stage-2 page tables, but we now have a
framework for getting to a shadow Stage-2 pgd.

We allocate twice as many Stage-2 mmu structures as there are vcpus,
which is sufficient for each vcpu to run two translation regimes without
having to flush the Stage-2 page tables.

Co-developed-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_host.h   |  29 +++++
 arch/arm64/include/asm/kvm_mmu.h    |   9 ++
 arch/arm64/include/asm/kvm_nested.h |   7 +
 arch/arm64/kvm/arm.c                |  16 ++-
 arch/arm64/kvm/mmu.c                |  29 +++--
 arch/arm64/kvm/nested.c             | 195 ++++++++++++++++++++++++++++
 6 files changed, 275 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index fa253f08e0fd..a15183d0e1bf 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -101,14 +101,43 @@ struct kvm_s2_mmu {
 	int __percpu *last_vcpu_ran;
 
 	struct kvm_arch *arch;
+
+	/*
+	 * For a shadow stage-2 MMU, the virtual vttbr programmed by the guest
+	 * hypervisor.  Unused for kvm_arch->mmu. Set to 1 when the structure
+	 * contains no valid information.
+	 */
+	u64	vttbr;
+
+	/* true when this represents a nested context where virtual HCR_EL2.VM == 1 */
+	bool	nested_stage2_enabled;
+
+	/*
+	 *  0: Nobody is currently using this, check vttbr for validity
+	 * >0: Somebody is actively using this.
+	 */
+	atomic_t refcnt;
 };
 
+static inline bool kvm_s2_mmu_valid(struct kvm_s2_mmu *mmu)
+{
+	return !(mmu->vttbr & 1);
+}
+
 struct kvm_arch_memory_slot {
 };
 
 struct kvm_arch {
 	struct kvm_s2_mmu mmu;
 
+	/*
+	 * Stage 2 paging state for VMs with nested virtualization, using a
+	 * virtual VMID.
+	 */
+	struct kvm_s2_mmu *nested_mmus;
+	size_t nested_mmus_size;
+	int nested_mmus_next;
+
 	/* VTCR_EL2 value for this VM */
 	u64    vtcr;
 
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 1b314b2a69bc..0750d022bbf8 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -116,6 +116,7 @@ alternative_cb_end
 #include <asm/cacheflush.h>
 #include <asm/mmu_context.h>
 #include <asm/kvm_emulate.h>
+#include <asm/kvm_nested.h>
 
 void kvm_update_va_mask(struct alt_instr *alt,
 			__le32 *origptr, __le32 *updptr, int nr_inst);
@@ -161,6 +162,7 @@ int create_hyp_exec_mappings(phys_addr_t phys_addr, size_t size,
 			     void **haddr);
 void free_hyp_pgds(void);
 
+void kvm_unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size);
 void stage2_unmap_vm(struct kvm *kvm);
 int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu);
 void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu);
@@ -296,5 +298,12 @@ static inline struct kvm *kvm_s2_mmu_to_kvm(struct kvm_s2_mmu *mmu)
 {
 	return container_of(mmu->arch, struct kvm, arch);
 }
+
+static inline u64 get_vmid(u64 vttbr)
+{
+	return (vttbr & VTTBR_VMID_MASK(kvm_get_vmid_bits())) >>
+		VTTBR_VMID_SHIFT;
+}
+
 #endif /* __ASSEMBLY__ */
 #endif /* __ARM64_KVM_MMU_H__ */
diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
index 7d398510fd9d..8bb7159f2b6b 100644
--- a/arch/arm64/include/asm/kvm_nested.h
+++ b/arch/arm64/include/asm/kvm_nested.h
@@ -65,6 +65,13 @@ static inline u64 translate_cnthctl_el2_to_cntkctl_el1(u64 cnthctl)
 		(cnthctl & (CNTHCTL_EVNTI | CNTHCTL_EVNTDIR | CNTHCTL_EVNTEN)));
 }
 
+extern void kvm_init_nested(struct kvm *kvm);
+extern int kvm_vcpu_init_nested(struct kvm_vcpu *vcpu);
+extern void kvm_init_nested_s2_mmu(struct kvm_s2_mmu *mmu);
+extern struct kvm_s2_mmu *lookup_s2_mmu(struct kvm *kvm, u64 vttbr, u64 hcr);
+extern void kvm_vcpu_load_hw_mmu(struct kvm_vcpu *vcpu);
+extern void kvm_vcpu_put_hw_mmu(struct kvm_vcpu *vcpu);
+
 int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe);
 extern bool __forward_traps(struct kvm_vcpu *vcpu, unsigned int reg,
 			    u64 control_bit);
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 06ca11e90482..14f85f1e15b2 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -37,6 +37,7 @@
 #include <asm/kvm_arm.h>
 #include <asm/kvm_asm.h>
 #include <asm/kvm_mmu.h>
+#include <asm/kvm_nested.h>
 #include <asm/kvm_emulate.h>
 #include <asm/sections.h>
 
@@ -146,6 +147,8 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 	if (ret)
 		return ret;
 
+	kvm_init_nested(kvm);
+
 	ret = kvm_share_hyp(kvm, kvm + 1);
 	if (ret)
 		goto out_free_stage2_pgd;
@@ -375,6 +378,9 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	struct kvm_s2_mmu *mmu;
 	int *last_ran;
 
+	if (vcpu_has_nv(vcpu))
+		kvm_vcpu_load_hw_mmu(vcpu);
+
 	mmu = vcpu->arch.hw_mmu;
 	last_ran = this_cpu_ptr(mmu->last_vcpu_ran);
 
@@ -423,6 +429,9 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 	kvm_vgic_put(vcpu);
 	kvm_vcpu_pmu_restore_host(vcpu);
 
+	if (vcpu_has_nv(vcpu))
+		kvm_vcpu_put_hw_mmu(vcpu);
+
 	vcpu->cpu = -1;
 }
 
@@ -1122,8 +1131,13 @@ static int kvm_vcpu_set_target(struct kvm_vcpu *vcpu,
 
 	vcpu->arch.target = phys_target;
 
+	/* Prepare for nested if required */
+	ret = kvm_vcpu_init_nested(vcpu);
+
 	/* Now we know what it is, we can reset it. */
-	ret = kvm_reset_vcpu(vcpu);
+	if (!ret)
+		ret = kvm_reset_vcpu(vcpu);
+
 	if (ret) {
 		vcpu->arch.target = -1;
 		bitmap_zero(vcpu->arch.features, KVM_VCPU_MAX_FEATURES);
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index bc2aba953299..55525fd5743d 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -162,7 +162,7 @@ static void invalidate_icache_guest_page(void *va, size_t size)
  * does.
  */
 /**
- * unmap_stage2_range -- Clear stage2 page table entries to unmap a range
+ * __unmap_stage2_range -- Clear stage2 page table entries to unmap a range
  * @mmu:   The KVM stage-2 MMU pointer
  * @start: The intermediate physical base address of the range to unmap
  * @size:  The size of the area to unmap
@@ -185,7 +185,7 @@ static void __unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64
 				   may_block));
 }
 
-static void unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size)
+void kvm_unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size)
 {
 	__unmap_stage2_range(mmu, start, size, true);
 }
@@ -628,7 +628,20 @@ int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu)
 	int cpu, err;
 	struct kvm_pgtable *pgt;
 
+	/*
+	 * If we already have our page tables in place and the
+	 * MMU context is the canonical one, we have a bug somewhere,
+	 * as this is only supposed to ever happen once per VM.
+	 *
+	 * Otherwise, we're building nested page tables, and that's
+	 * probably because userspace called KVM_ARM_VCPU_INIT more
+	 * than once on the same vcpu. Since that's actually legal,
+	 * don't kick up a fuss and leave gracefully.
+	 */
 	if (mmu->pgt != NULL) {
+		if (&kvm->arch.mmu != mmu)
+			return 0;
+
 		kvm_err("kvm_arch already initialized?\n");
 		return -EINVAL;
 	}
@@ -654,6 +667,9 @@ int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu)
 	mmu->pgt = pgt;
 	mmu->pgd_phys = __pa(pgt->pgd);
 	WRITE_ONCE(mmu->vmid.vmid_gen, 0);
+
+	kvm_init_nested_s2_mmu(mmu);
+
 	return 0;
 
 out_destroy_pgtable:
@@ -699,7 +715,7 @@ static void stage2_unmap_memslot(struct kvm *kvm,
 
 		if (!(vma->vm_flags & VM_PFNMAP)) {
 			gpa_t gpa = addr + (vm_start - memslot->userspace_addr);
-			unmap_stage2_range(&kvm->arch.mmu, gpa, vm_end - vm_start);
+			kvm_unmap_stage2_range(&kvm->arch.mmu, gpa, vm_end - vm_start);
 		}
 		hva = vm_end;
 	} while (hva < reg_end);
@@ -1681,11 +1697,6 @@ void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen)
 {
 }
 
-void kvm_arch_flush_shadow_all(struct kvm *kvm)
-{
-	kvm_free_stage2_pgd(&kvm->arch.mmu);
-}
-
 void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
 				   struct kvm_memory_slot *slot)
 {
@@ -1693,7 +1704,7 @@ void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
 	phys_addr_t size = slot->npages << PAGE_SHIFT;
 
 	spin_lock(&kvm->mmu_lock);
-	unmap_stage2_range(&kvm->arch.mmu, gpa, size);
+	kvm_unmap_stage2_range(&kvm->arch.mmu, gpa, size);
 	spin_unlock(&kvm->mmu_lock);
 }
 
diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
index 254152cd791e..bfa2b9229173 100644
--- a/arch/arm64/kvm/nested.c
+++ b/arch/arm64/kvm/nested.c
@@ -7,12 +7,189 @@
 #include <linux/kvm.h>
 #include <linux/kvm_host.h>
 
+#include <asm/kvm_arm.h>
 #include <asm/kvm_emulate.h>
+#include <asm/kvm_mmu.h>
 #include <asm/kvm_nested.h>
 #include <asm/sysreg.h>
 
 #include "sys_regs.h"
 
+void kvm_init_nested(struct kvm *kvm)
+{
+	kvm->arch.nested_mmus = NULL;
+	kvm->arch.nested_mmus_size = 0;
+}
+
+int kvm_vcpu_init_nested(struct kvm_vcpu *vcpu)
+{
+	struct kvm *kvm = vcpu->kvm;
+	struct kvm_s2_mmu *tmp;
+	int num_mmus;
+	int ret = -ENOMEM;
+
+	if (!test_bit(KVM_ARM_VCPU_HAS_EL2, vcpu->arch.features))
+		return 0;
+
+	if (!cpus_have_final_cap(ARM64_HAS_NESTED_VIRT))
+		return -EINVAL;
+
+	mutex_lock(&kvm->lock);
+
+	/*
+	 * Let's treat memory allocation failures as benign: If we fail to
+	 * allocate anything, return an error and keep the allocated array
+	 * alive. Userspace may try to recover by initializing the vcpu
+	 * again, and there is no reason to affect the whole VM for this.
+	 */
+	num_mmus = atomic_read(&kvm->online_vcpus) * 2;
+	tmp = krealloc(kvm->arch.nested_mmus,
+		       num_mmus * sizeof(*kvm->arch.nested_mmus),
+		       GFP_KERNEL_ACCOUNT | __GFP_ZERO);
+	if (tmp) {
+		/*
+		 * If we went through a reallocation, adjust the MMU
+		 * back-pointers in the previously initialised
+		 * pg_table structures.
+		 */
+		if (kvm->arch.nested_mmus != tmp) {
+			int i;
+
+			for (i = 0; i < num_mmus - 2; i++)
+				tmp[i].pgt->mmu = &tmp[i];
+		}
+
+		if (kvm_init_stage2_mmu(kvm, &tmp[num_mmus - 1]) ||
+		    kvm_init_stage2_mmu(kvm, &tmp[num_mmus - 2])) {
+			kvm_free_stage2_pgd(&tmp[num_mmus - 1]);
+			kvm_free_stage2_pgd(&tmp[num_mmus - 2]);
+		} else {
+			kvm->arch.nested_mmus_size = num_mmus;
+			ret = 0;
+		}
+
+		kvm->arch.nested_mmus = tmp;
+	}
+
+	mutex_unlock(&kvm->lock);
+	return ret;
+}
+
+/* Must be called with kvm->lock held */
+struct kvm_s2_mmu *lookup_s2_mmu(struct kvm *kvm, u64 vttbr, u64 hcr)
+{
+	bool nested_stage2_enabled = hcr & HCR_VM;
+	int i;
+
+	/* Don't consider the CnP bit for the vttbr match */
+	vttbr = vttbr & ~VTTBR_CNP_BIT;
+
+	/*
+	 * Two possibilities when looking up a S2 MMU context:
+	 *
+	 * - either S2 is enabled in the guest, and we need a context that
+	 *   is S2-enabled and matches the full VTTBR (VMID+BADDR), which
+	 *   makes it safe from a TLB conflict perspective (a broken guest
+	 *   won't be able to generate them),
+	 *
+	 * - or S2 is disabled, and we need a context that is S2-disabled
+	 *   and matches the VMID only, as all TLBs are tagged by VMID even
+	 *   if S2 translation is disabled.
+	 */
+	for (i = 0; i < kvm->arch.nested_mmus_size; i++) {
+		struct kvm_s2_mmu *mmu = &kvm->arch.nested_mmus[i];
+
+		if (!kvm_s2_mmu_valid(mmu))
+			continue;
+
+		if (nested_stage2_enabled &&
+		    mmu->nested_stage2_enabled &&
+		    vttbr == mmu->vttbr)
+			return mmu;
+
+		if (!nested_stage2_enabled &&
+		    !mmu->nested_stage2_enabled &&
+		    get_vmid(vttbr) == get_vmid(mmu->vttbr))
+			return mmu;
+	}
+	return NULL;
+}
+
+static struct kvm_s2_mmu *get_s2_mmu_nested(struct kvm_vcpu *vcpu)
+{
+	struct kvm *kvm = vcpu->kvm;
+	u64 vttbr = vcpu_read_sys_reg(vcpu, VTTBR_EL2);
+	u64 hcr = vcpu_read_sys_reg(vcpu, HCR_EL2);
+	struct kvm_s2_mmu *s2_mmu;
+	int i;
+
+	s2_mmu = lookup_s2_mmu(kvm, vttbr, hcr);
+	if (s2_mmu)
+		goto out;
+
+	/*
+	 * Make sure we don't always search from the same point, or we
+	 * will always reuse a potentially active context, leaving
+	 * free contexts unused.
+	 */
+	for (i = kvm->arch.nested_mmus_next;
+	     i < (kvm->arch.nested_mmus_size + kvm->arch.nested_mmus_next);
+	     i++) {
+		s2_mmu = &kvm->arch.nested_mmus[i % kvm->arch.nested_mmus_size];
+
+		if (atomic_read(&s2_mmu->refcnt) == 0)
+			break;
+	}
+	BUG_ON(atomic_read(&s2_mmu->refcnt)); /* We have struct MMUs to spare */
+
+	/* Set the scene for the next search */
+	kvm->arch.nested_mmus_next = (i + 1) % kvm->arch.nested_mmus_size;
+
+	if (kvm_s2_mmu_valid(s2_mmu)) {
+		/* Clear the old state */
+		kvm_unmap_stage2_range(s2_mmu, 0, kvm_phys_size(kvm));
+		if (s2_mmu->vmid.vmid_gen)
+			kvm_call_hyp(__kvm_tlb_flush_vmid, s2_mmu);
+	}
+
+	/*
+	 * The virtual VMID (modulo CnP) will be used as a key when matching
+	 * an existing kvm_s2_mmu.
+	 */
+	s2_mmu->vttbr = vttbr & ~VTTBR_CNP_BIT;
+	s2_mmu->nested_stage2_enabled = hcr & HCR_VM;
+
+out:
+	atomic_inc(&s2_mmu->refcnt);
+	return s2_mmu;
+}
+
+void kvm_init_nested_s2_mmu(struct kvm_s2_mmu *mmu)
+{
+	mmu->vttbr = 1;
+	mmu->nested_stage2_enabled = false;
+	atomic_set(&mmu->refcnt, 0);
+}
+
+void kvm_vcpu_load_hw_mmu(struct kvm_vcpu *vcpu)
+{
+	if (is_hyp_ctxt(vcpu)) {
+		vcpu->arch.hw_mmu = &vcpu->kvm->arch.mmu;
+	} else {
+		spin_lock(&vcpu->kvm->mmu_lock);
+		vcpu->arch.hw_mmu = get_s2_mmu_nested(vcpu);
+		spin_unlock(&vcpu->kvm->mmu_lock);
+	}
+}
+
+void kvm_vcpu_put_hw_mmu(struct kvm_vcpu *vcpu)
+{
+	if (vcpu->arch.hw_mmu != &vcpu->kvm->arch.mmu) {
+		atomic_dec(&vcpu->arch.hw_mmu->refcnt);
+		vcpu->arch.hw_mmu = NULL;
+	}
+}
+
 /*
  * Inject wfx to the virtual EL2 if this is not from the virtual EL2 and
  * the virtual HCR_EL2.TWX is set. Otherwise, let the host hypervisor
@@ -31,6 +208,24 @@ int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe)
 	return -EINVAL;
 }
 
+void kvm_arch_flush_shadow_all(struct kvm *kvm)
+{
+	int i;
+
+	for (i = 0; i < kvm->arch.nested_mmus_size; i++) {
+		struct kvm_s2_mmu *mmu = &kvm->arch.nested_mmus[i];
+
+		WARN_ON(atomic_read(&mmu->refcnt));
+
+		if (!atomic_read(&mmu->refcnt))
+			kvm_free_stage2_pgd(mmu);
+	}
+	kfree(kvm->arch.nested_mmus);
+	kvm->arch.nested_mmus = NULL;
+	kvm->arch.nested_mmus_size = 0;
+	kvm_free_stage2_pgd(&kvm->arch.mmu);
+}
+
 /*
  * Our emulated CPU doesn't support all the possible features. For the
  * sake of simplicity (and probably mental sanity), wipe out a number
-- 
2.30.2
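
The lookup key used by lookup_s2_mmu() above is the guest's VTTBR_EL2 with
bit 0 (CnP) masked out; the same bit doubles as the "invalid" marker, which
is why kvm_s2_mmu_valid() treats vttbr == 1 as an unused context. A
standalone illustration of the VMID/BADDR split (assuming 16-bit VMIDs at
bit 48; the helper name below is made up, not the kernel's):

#include <stdint.h>
#include <stdio.h>

#define VTTBR_CNP_BIT		(1ULL << 0)
#define VTTBR_VMID_SHIFT	48
#define VTTBR_VMID_MASK		(0xffffULL << VTTBR_VMID_SHIFT)	/* 16-bit VMIDs assumed */

static uint64_t vmid_of(uint64_t vttbr)
{
	return (vttbr & VTTBR_VMID_MASK) >> VTTBR_VMID_SHIFT;
}

int main(void)
{
	/* A guest VTTBR with VMID 0x2a, some baddr, and CnP set */
	uint64_t vttbr = (0x2aULL << VTTBR_VMID_SHIFT) | 0x8000f000ULL | VTTBR_CNP_BIT;

	/* CnP is masked out before matching, exactly as lookup_s2_mmu() does */
	vttbr &= ~VTTBR_CNP_BIT;

	printf("vmid=0x%llx baddr=0x%llx\n",
	       (unsigned long long)vmid_of(vttbr),
	       (unsigned long long)(vttbr & ~VTTBR_VMID_MASK));
	return 0;
}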

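When no existing context matches, get_s2_mmu_nested() falls back to a
rotating scan: start at nested_mmus_next, take the first entry with a zero
refcount, and remember where to resume so that free contexts get cycled
through rather than the same one being reused every time. A standalone
sketch of that scan over a plain array (hypothetical helper; it returns -1
where the kernel code BUG()s, since spare contexts are guaranteed there):

#include <stdio.h>

static int pick_free_slot(int *next, int size, const int *refcnt)
{
	int i;

	/* Scan at most 'size' slots, starting where the last search stopped */
	for (i = *next; i < size + *next; i++) {
		if (refcnt[i % size] == 0)
			break;
	}
	if (refcnt[i % size] != 0)
		return -1;		/* everything busy */

	*next = (i + 1) % size;		/* resume after this slot next time */
	return i % size;
}

int main(void)
{
	int refcnt[4] = { 0, 1, 1, 1 };
	int next = 3;

	/* Starting at slot 3 (busy), the scan wraps around and lands on slot 0 */
	printf("picked slot %d, next scan starts at %d\n",
	       pick_free_slot(&next, 4, refcnt), next);
	return 0;
}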

^ permalink raw reply related	[flat|nested] 378+ messages in thread

* [PATCH v6 34/64] KVM: arm64: nv: Support multiple nested Stage-2 mmu structures
@ 2022-01-28 12:18   ` Marc Zyngier
  0 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:18 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: kernel-team, Andre Przywara, Christoffer Dall, Chase Conklin,
	Russell King (Oracle),
	mihai.carabas, Ganapatrao Kulkarni

Add Stage-2 mmu data structures for virtual EL2 and for nested guests.
We don't yet populate shadow Stage-2 page tables, but we now have a
framework for getting to a shadow Stage-2 pgd.

We allocate twice the number of vcpus as Stage-2 mmu structures because
that's sufficient for each vcpu running two translation regimes without
having to flush the Stage-2 page tables.

Co-developed-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_host.h   |  29 +++++
 arch/arm64/include/asm/kvm_mmu.h    |   9 ++
 arch/arm64/include/asm/kvm_nested.h |   7 +
 arch/arm64/kvm/arm.c                |  16 ++-
 arch/arm64/kvm/mmu.c                |  29 +++--
 arch/arm64/kvm/nested.c             | 195 ++++++++++++++++++++++++++++
 6 files changed, 275 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index fa253f08e0fd..a15183d0e1bf 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -101,14 +101,43 @@ struct kvm_s2_mmu {
 	int __percpu *last_vcpu_ran;
 
 	struct kvm_arch *arch;
+
+	/*
+	 * For a shadow stage-2 MMU, the virtual vttbr programmed by the guest
+	 * hypervisor.  Unused for kvm_arch->mmu. Set to 1 when the structure
+	 * contains no valid information.
+	 */
+	u64	vttbr;
+
+	/* true when this represents a nested context where virtual HCR_EL2.VM == 1 */
+	bool	nested_stage2_enabled;
+
+	/*
+	 *  0: Nobody is currently using this, check vttbr for validity
+	 * >0: Somebody is actively using this.
+	 */
+	atomic_t refcnt;
 };
 
+static inline bool kvm_s2_mmu_valid(struct kvm_s2_mmu *mmu)
+{
+	return !(mmu->vttbr & 1);
+}
+
 struct kvm_arch_memory_slot {
 };
 
 struct kvm_arch {
 	struct kvm_s2_mmu mmu;
 
+	/*
+	 * Stage 2 paging stage for VMs with nested virtual using a virtual
+	 * VMID.
+	 */
+	struct kvm_s2_mmu *nested_mmus;
+	size_t nested_mmus_size;
+	int nested_mmus_next;
+
 	/* VTCR_EL2 value for this VM */
 	u64    vtcr;
 
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 1b314b2a69bc..0750d022bbf8 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -116,6 +116,7 @@ alternative_cb_end
 #include <asm/cacheflush.h>
 #include <asm/mmu_context.h>
 #include <asm/kvm_emulate.h>
+#include <asm/kvm_nested.h>
 
 void kvm_update_va_mask(struct alt_instr *alt,
 			__le32 *origptr, __le32 *updptr, int nr_inst);
@@ -161,6 +162,7 @@ int create_hyp_exec_mappings(phys_addr_t phys_addr, size_t size,
 			     void **haddr);
 void free_hyp_pgds(void);
 
+void kvm_unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size);
 void stage2_unmap_vm(struct kvm *kvm);
 int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu);
 void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu);
@@ -296,5 +298,12 @@ static inline struct kvm *kvm_s2_mmu_to_kvm(struct kvm_s2_mmu *mmu)
 {
 	return container_of(mmu->arch, struct kvm, arch);
 }
+
+static inline u64 get_vmid(u64 vttbr)
+{
+	return (vttbr & VTTBR_VMID_MASK(kvm_get_vmid_bits())) >>
+		VTTBR_VMID_SHIFT;
+}
+
 #endif /* __ASSEMBLY__ */
 #endif /* __ARM64_KVM_MMU_H__ */
diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
index 7d398510fd9d..8bb7159f2b6b 100644
--- a/arch/arm64/include/asm/kvm_nested.h
+++ b/arch/arm64/include/asm/kvm_nested.h
@@ -65,6 +65,13 @@ static inline u64 translate_cnthctl_el2_to_cntkctl_el1(u64 cnthctl)
 		(cnthctl & (CNTHCTL_EVNTI | CNTHCTL_EVNTDIR | CNTHCTL_EVNTEN)));
 }
 
+extern void kvm_init_nested(struct kvm *kvm);
+extern int kvm_vcpu_init_nested(struct kvm_vcpu *vcpu);
+extern void kvm_init_nested_s2_mmu(struct kvm_s2_mmu *mmu);
+extern struct kvm_s2_mmu *lookup_s2_mmu(struct kvm *kvm, u64 vttbr, u64 hcr);
+extern void kvm_vcpu_load_hw_mmu(struct kvm_vcpu *vcpu);
+extern void kvm_vcpu_put_hw_mmu(struct kvm_vcpu *vcpu);
+
 int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe);
 extern bool __forward_traps(struct kvm_vcpu *vcpu, unsigned int reg,
 			    u64 control_bit);
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 06ca11e90482..14f85f1e15b2 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -37,6 +37,7 @@
 #include <asm/kvm_arm.h>
 #include <asm/kvm_asm.h>
 #include <asm/kvm_mmu.h>
+#include <asm/kvm_nested.h>
 #include <asm/kvm_emulate.h>
 #include <asm/sections.h>
 
@@ -146,6 +147,8 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 	if (ret)
 		return ret;
 
+	kvm_init_nested(kvm);
+
 	ret = kvm_share_hyp(kvm, kvm + 1);
 	if (ret)
 		goto out_free_stage2_pgd;
@@ -375,6 +378,9 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	struct kvm_s2_mmu *mmu;
 	int *last_ran;
 
+	if (vcpu_has_nv(vcpu))
+		kvm_vcpu_load_hw_mmu(vcpu);
+
 	mmu = vcpu->arch.hw_mmu;
 	last_ran = this_cpu_ptr(mmu->last_vcpu_ran);
 
@@ -423,6 +429,9 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 	kvm_vgic_put(vcpu);
 	kvm_vcpu_pmu_restore_host(vcpu);
 
+	if (vcpu_has_nv(vcpu))
+		kvm_vcpu_put_hw_mmu(vcpu);
+
 	vcpu->cpu = -1;
 }
 
@@ -1122,8 +1131,13 @@ static int kvm_vcpu_set_target(struct kvm_vcpu *vcpu,
 
 	vcpu->arch.target = phys_target;
 
+	/* Prepare for nested if required */
+	ret = kvm_vcpu_init_nested(vcpu);
+
 	/* Now we know what it is, we can reset it. */
-	ret = kvm_reset_vcpu(vcpu);
+	if (!ret)
+		ret = kvm_reset_vcpu(vcpu);
+
 	if (ret) {
 		vcpu->arch.target = -1;
 		bitmap_zero(vcpu->arch.features, KVM_VCPU_MAX_FEATURES);
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index bc2aba953299..55525fd5743d 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -162,7 +162,7 @@ static void invalidate_icache_guest_page(void *va, size_t size)
  * does.
  */
 /**
- * unmap_stage2_range -- Clear stage2 page table entries to unmap a range
+ * __unmap_stage2_range -- Clear stage2 page table entries to unmap a range
  * @mmu:   The KVM stage-2 MMU pointer
  * @start: The intermediate physical base address of the range to unmap
  * @size:  The size of the area to unmap
@@ -185,7 +185,7 @@ static void __unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64
 				   may_block));
 }
 
-static void unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size)
+void kvm_unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size)
 {
 	__unmap_stage2_range(mmu, start, size, true);
 }
@@ -628,7 +628,20 @@ int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu)
 	int cpu, err;
 	struct kvm_pgtable *pgt;
 
+	/*
+	 * If we already have our page tables in place, and that the
+	 * MMU context is the canonical one, we have a bug somewhere,
+	 * as this is only supposed to ever happen once per VM.
+	 *
+	 * Otherwise, we're building nested page tables, and that's
+	 * probably because userspace called KVM_ARM_VCPU_INIT more
+	 * than once on the same vcpu. Since that's actually legal,
+	 * don't kick a fuss and leave gracefully.
+	 */
 	if (mmu->pgt != NULL) {
+		if (&kvm->arch.mmu != mmu)
+			return 0;
+
 		kvm_err("kvm_arch already initialized?\n");
 		return -EINVAL;
 	}
@@ -654,6 +667,9 @@ int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu)
 	mmu->pgt = pgt;
 	mmu->pgd_phys = __pa(pgt->pgd);
 	WRITE_ONCE(mmu->vmid.vmid_gen, 0);
+
+	kvm_init_nested_s2_mmu(mmu);
+
 	return 0;
 
 out_destroy_pgtable:
@@ -699,7 +715,7 @@ static void stage2_unmap_memslot(struct kvm *kvm,
 
 		if (!(vma->vm_flags & VM_PFNMAP)) {
 			gpa_t gpa = addr + (vm_start - memslot->userspace_addr);
-			unmap_stage2_range(&kvm->arch.mmu, gpa, vm_end - vm_start);
+			kvm_unmap_stage2_range(&kvm->arch.mmu, gpa, vm_end - vm_start);
 		}
 		hva = vm_end;
 	} while (hva < reg_end);
@@ -1681,11 +1697,6 @@ void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen)
 {
 }
 
-void kvm_arch_flush_shadow_all(struct kvm *kvm)
-{
-	kvm_free_stage2_pgd(&kvm->arch.mmu);
-}
-
 void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
 				   struct kvm_memory_slot *slot)
 {
@@ -1693,7 +1704,7 @@ void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
 	phys_addr_t size = slot->npages << PAGE_SHIFT;
 
 	spin_lock(&kvm->mmu_lock);
-	unmap_stage2_range(&kvm->arch.mmu, gpa, size);
+	kvm_unmap_stage2_range(&kvm->arch.mmu, gpa, size);
 	spin_unlock(&kvm->mmu_lock);
 }
 
diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
index 254152cd791e..bfa2b9229173 100644
--- a/arch/arm64/kvm/nested.c
+++ b/arch/arm64/kvm/nested.c
@@ -7,12 +7,189 @@
 #include <linux/kvm.h>
 #include <linux/kvm_host.h>
 
+#include <asm/kvm_arm.h>
 #include <asm/kvm_emulate.h>
+#include <asm/kvm_mmu.h>
 #include <asm/kvm_nested.h>
 #include <asm/sysreg.h>
 
 #include "sys_regs.h"
 
+void kvm_init_nested(struct kvm *kvm)
+{
+	kvm->arch.nested_mmus = NULL;
+	kvm->arch.nested_mmus_size = 0;
+}
+
+int kvm_vcpu_init_nested(struct kvm_vcpu *vcpu)
+{
+	struct kvm *kvm = vcpu->kvm;
+	struct kvm_s2_mmu *tmp;
+	int num_mmus;
+	int ret = -ENOMEM;
+
+	if (!test_bit(KVM_ARM_VCPU_HAS_EL2, vcpu->arch.features))
+		return 0;
+
+	if (!cpus_have_final_cap(ARM64_HAS_NESTED_VIRT))
+		return -EINVAL;
+
+	mutex_lock(&kvm->lock);
+
+	/*
+	 * Let's treat memory allocation failures as benign: If we fail to
+	 * allocate anything, return an error and keep the allocated array
+	 * alive. Userspace may try to recover by intializing the vcpu
+	 * again, and there is no reason to affect the whole VM for this.
+	 */
+	num_mmus = atomic_read(&kvm->online_vcpus) * 2;
+	tmp = krealloc(kvm->arch.nested_mmus,
+		       num_mmus * sizeof(*kvm->arch.nested_mmus),
+		       GFP_KERNEL_ACCOUNT | __GFP_ZERO);
+	if (tmp) {
+		/*
+		 * If we went through a realocation, adjust the MMU
+		 * back-pointers in the previously initialised
+		 * pg_table structures.
+		 */
+		if (kvm->arch.nested_mmus != tmp) {
+			int i;
+
+			for (i = 0; i < num_mmus - 2; i++)
+				tmp[i].pgt->mmu = &tmp[i];
+		}
+
+		if (kvm_init_stage2_mmu(kvm, &tmp[num_mmus - 1]) ||
+		    kvm_init_stage2_mmu(kvm, &tmp[num_mmus - 2])) {
+			kvm_free_stage2_pgd(&tmp[num_mmus - 1]);
+			kvm_free_stage2_pgd(&tmp[num_mmus - 2]);
+		} else {
+			kvm->arch.nested_mmus_size = num_mmus;
+			ret = 0;
+		}
+
+		kvm->arch.nested_mmus = tmp;
+	}
+
+	mutex_unlock(&kvm->lock);
+	return ret;
+}
+
+/* Must be called with kvm->lock held */
+struct kvm_s2_mmu *lookup_s2_mmu(struct kvm *kvm, u64 vttbr, u64 hcr)
+{
+	bool nested_stage2_enabled = hcr & HCR_VM;
+	int i;
+
+	/* Don't consider the CnP bit for the vttbr match */
+	vttbr = vttbr & ~VTTBR_CNP_BIT;
+
+	/*
+	 * Two possibilities when looking up a S2 MMU context:
+	 *
+	 * - either S2 is enabled in the guest, and we need a context that
+         *   is S2-enabled and matches the full VTTBR (VMID+BADDR), which
+         *   makes it safe from a TLB conflict perspective (a broken guest
+         *   won't be able to generate them),
+	 *
+	 * - or S2 is disabled, and we need a context that is S2-disabled
+         *   and matches the VMID only, as all TLBs are tagged by VMID even
+         *   if S2 translation is disabled.
+	 */
+	for (i = 0; i < kvm->arch.nested_mmus_size; i++) {
+		struct kvm_s2_mmu *mmu = &kvm->arch.nested_mmus[i];
+
+		if (!kvm_s2_mmu_valid(mmu))
+			continue;
+
+		if (nested_stage2_enabled &&
+		    mmu->nested_stage2_enabled &&
+		    vttbr == mmu->vttbr)
+			return mmu;
+
+		if (!nested_stage2_enabled &&
+		    !mmu->nested_stage2_enabled &&
+		    get_vmid(vttbr) == get_vmid(mmu->vttbr))
+			return mmu;
+	}
+	return NULL;
+}
+
+static struct kvm_s2_mmu *get_s2_mmu_nested(struct kvm_vcpu *vcpu)
+{
+	struct kvm *kvm = vcpu->kvm;
+	u64 vttbr = vcpu_read_sys_reg(vcpu, VTTBR_EL2);
+	u64 hcr= vcpu_read_sys_reg(vcpu, HCR_EL2);
+	struct kvm_s2_mmu *s2_mmu;
+	int i;
+
+	s2_mmu = lookup_s2_mmu(kvm, vttbr, hcr);
+	if (s2_mmu)
+		goto out;
+
+	/*
+	 * Make sure we don't always search from the same point, or we
+	 * will always reuse a potentially active context, leaving
+	 * free contexts unused.
+	 */
+	for (i = kvm->arch.nested_mmus_next;
+	     i < (kvm->arch.nested_mmus_size + kvm->arch.nested_mmus_next);
+	     i++) {
+		s2_mmu = &kvm->arch.nested_mmus[i % kvm->arch.nested_mmus_size];
+
+		if (atomic_read(&s2_mmu->refcnt) == 0)
+			break;
+	}
+	BUG_ON(atomic_read(&s2_mmu->refcnt)); /* We have struct MMUs to spare */
+
+	/* Set the scene for the next search */
+	kvm->arch.nested_mmus_next = (i + 1) % kvm->arch.nested_mmus_size;
+
+	if (kvm_s2_mmu_valid(s2_mmu)) {
+		/* Clear the old state */
+		kvm_unmap_stage2_range(s2_mmu, 0, kvm_phys_size(kvm));
+		if (s2_mmu->vmid.vmid_gen)
+			kvm_call_hyp(__kvm_tlb_flush_vmid, s2_mmu);
+	}
+
+	/*
+	 * The virtual VMID (modulo CnP) will be used as a key when matching
+	 * an existing kvm_s2_mmu.
+	 */
+	s2_mmu->vttbr = vttbr & ~VTTBR_CNP_BIT;
+	s2_mmu->nested_stage2_enabled = hcr & HCR_VM;
+
+out:
+	atomic_inc(&s2_mmu->refcnt);
+	return s2_mmu;
+}
+
+void kvm_init_nested_s2_mmu(struct kvm_s2_mmu *mmu)
+{
+	mmu->vttbr = 1;
+	mmu->nested_stage2_enabled = false;
+	atomic_set(&mmu->refcnt, 0);
+}
+
+void kvm_vcpu_load_hw_mmu(struct kvm_vcpu *vcpu)
+{
+	if (is_hyp_ctxt(vcpu)) {
+		vcpu->arch.hw_mmu = &vcpu->kvm->arch.mmu;
+	} else {
+		spin_lock(&vcpu->kvm->mmu_lock);
+		vcpu->arch.hw_mmu = get_s2_mmu_nested(vcpu);
+		spin_unlock(&vcpu->kvm->mmu_lock);
+	}
+}
+
+void kvm_vcpu_put_hw_mmu(struct kvm_vcpu *vcpu)
+{
+	if (vcpu->arch.hw_mmu != &vcpu->kvm->arch.mmu) {
+		atomic_dec(&vcpu->arch.hw_mmu->refcnt);
+		vcpu->arch.hw_mmu = NULL;
+	}
+}
+
 /*
  * Inject wfx to the virtual EL2 if this is not from the virtual EL2 and
  * the virtual HCR_EL2.TWX is set. Otherwise, let the host hypervisor
@@ -31,6 +208,24 @@ int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe)
 	return -EINVAL;
 }
 
+void kvm_arch_flush_shadow_all(struct kvm *kvm)
+{
+	int i;
+
+	for (i = 0; i < kvm->arch.nested_mmus_size; i++) {
+		struct kvm_s2_mmu *mmu = &kvm->arch.nested_mmus[i];
+
+		WARN_ON(atomic_read(&mmu->refcnt));
+
+		if (!atomic_read(&mmu->refcnt))
+			kvm_free_stage2_pgd(mmu);
+	}
+	kfree(kvm->arch.nested_mmus);
+	kvm->arch.nested_mmus = NULL;
+	kvm->arch.nested_mmus_size = 0;
+	kvm_free_stage2_pgd(&kvm->arch.mmu);
+}
+
 /*
  * Our emulated CPU doesn't support all the possible features. For the
  * sake of simplicity (and probably mental sanity), wipe out a number
-- 
2.30.2

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 378+ messages in thread

* [PATCH v6 34/64] KVM: arm64: nv: Support multiple nested Stage-2 mmu structures
@ 2022-01-28 12:18   ` Marc Zyngier
  0 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:18 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Jintack Lim, Haibo Xu,
	Ganapatrao Kulkarni, Chase Conklin, Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

Add Stage-2 mmu data structures for virtual EL2 and for nested guests.
We don't yet populate shadow Stage-2 page tables, but we now have a
framework for getting to a shadow Stage-2 pgd.

We allocate twice the number of vcpus as Stage-2 mmu structures because
that's sufficient for each vcpu running two translation regimes without
having to flush the Stage-2 page tables.

Co-developed-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_host.h   |  29 +++++
 arch/arm64/include/asm/kvm_mmu.h    |   9 ++
 arch/arm64/include/asm/kvm_nested.h |   7 +
 arch/arm64/kvm/arm.c                |  16 ++-
 arch/arm64/kvm/mmu.c                |  29 +++--
 arch/arm64/kvm/nested.c             | 195 ++++++++++++++++++++++++++++
 6 files changed, 275 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index fa253f08e0fd..a15183d0e1bf 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -101,14 +101,43 @@ struct kvm_s2_mmu {
 	int __percpu *last_vcpu_ran;
 
 	struct kvm_arch *arch;
+
+	/*
+	 * For a shadow stage-2 MMU, the virtual vttbr programmed by the guest
+	 * hypervisor.  Unused for kvm_arch->mmu. Set to 1 when the structure
+	 * contains no valid information.
+	 */
+	u64	vttbr;
+
+	/* true when this represents a nested context where virtual HCR_EL2.VM == 1 */
+	bool	nested_stage2_enabled;
+
+	/*
+	 *  0: Nobody is currently using this, check vttbr for validity
+	 * >0: Somebody is actively using this.
+	 */
+	atomic_t refcnt;
 };
 
+static inline bool kvm_s2_mmu_valid(struct kvm_s2_mmu *mmu)
+{
+	return !(mmu->vttbr & 1);
+}
+
 struct kvm_arch_memory_slot {
 };
 
 struct kvm_arch {
 	struct kvm_s2_mmu mmu;
 
+	/*
+	 * Stage 2 paging stage for VMs with nested virtual using a virtual
+	 * VMID.
+	 */
+	struct kvm_s2_mmu *nested_mmus;
+	size_t nested_mmus_size;
+	int nested_mmus_next;
+
 	/* VTCR_EL2 value for this VM */
 	u64    vtcr;
 
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 1b314b2a69bc..0750d022bbf8 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -116,6 +116,7 @@ alternative_cb_end
 #include <asm/cacheflush.h>
 #include <asm/mmu_context.h>
 #include <asm/kvm_emulate.h>
+#include <asm/kvm_nested.h>
 
 void kvm_update_va_mask(struct alt_instr *alt,
 			__le32 *origptr, __le32 *updptr, int nr_inst);
@@ -161,6 +162,7 @@ int create_hyp_exec_mappings(phys_addr_t phys_addr, size_t size,
 			     void **haddr);
 void free_hyp_pgds(void);
 
+void kvm_unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size);
 void stage2_unmap_vm(struct kvm *kvm);
 int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu);
 void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu);
@@ -296,5 +298,12 @@ static inline struct kvm *kvm_s2_mmu_to_kvm(struct kvm_s2_mmu *mmu)
 {
 	return container_of(mmu->arch, struct kvm, arch);
 }
+
+static inline u64 get_vmid(u64 vttbr)
+{
+	return (vttbr & VTTBR_VMID_MASK(kvm_get_vmid_bits())) >>
+		VTTBR_VMID_SHIFT;
+}
+
 #endif /* __ASSEMBLY__ */
 #endif /* __ARM64_KVM_MMU_H__ */
diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
index 7d398510fd9d..8bb7159f2b6b 100644
--- a/arch/arm64/include/asm/kvm_nested.h
+++ b/arch/arm64/include/asm/kvm_nested.h
@@ -65,6 +65,13 @@ static inline u64 translate_cnthctl_el2_to_cntkctl_el1(u64 cnthctl)
 		(cnthctl & (CNTHCTL_EVNTI | CNTHCTL_EVNTDIR | CNTHCTL_EVNTEN)));
 }
 
+extern void kvm_init_nested(struct kvm *kvm);
+extern int kvm_vcpu_init_nested(struct kvm_vcpu *vcpu);
+extern void kvm_init_nested_s2_mmu(struct kvm_s2_mmu *mmu);
+extern struct kvm_s2_mmu *lookup_s2_mmu(struct kvm *kvm, u64 vttbr, u64 hcr);
+extern void kvm_vcpu_load_hw_mmu(struct kvm_vcpu *vcpu);
+extern void kvm_vcpu_put_hw_mmu(struct kvm_vcpu *vcpu);
+
 int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe);
 extern bool __forward_traps(struct kvm_vcpu *vcpu, unsigned int reg,
 			    u64 control_bit);
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 06ca11e90482..14f85f1e15b2 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -37,6 +37,7 @@
 #include <asm/kvm_arm.h>
 #include <asm/kvm_asm.h>
 #include <asm/kvm_mmu.h>
+#include <asm/kvm_nested.h>
 #include <asm/kvm_emulate.h>
 #include <asm/sections.h>
 
@@ -146,6 +147,8 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 	if (ret)
 		return ret;
 
+	kvm_init_nested(kvm);
+
 	ret = kvm_share_hyp(kvm, kvm + 1);
 	if (ret)
 		goto out_free_stage2_pgd;
@@ -375,6 +378,9 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	struct kvm_s2_mmu *mmu;
 	int *last_ran;
 
+	if (vcpu_has_nv(vcpu))
+		kvm_vcpu_load_hw_mmu(vcpu);
+
 	mmu = vcpu->arch.hw_mmu;
 	last_ran = this_cpu_ptr(mmu->last_vcpu_ran);
 
@@ -423,6 +429,9 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 	kvm_vgic_put(vcpu);
 	kvm_vcpu_pmu_restore_host(vcpu);
 
+	if (vcpu_has_nv(vcpu))
+		kvm_vcpu_put_hw_mmu(vcpu);
+
 	vcpu->cpu = -1;
 }
 
@@ -1122,8 +1131,13 @@ static int kvm_vcpu_set_target(struct kvm_vcpu *vcpu,
 
 	vcpu->arch.target = phys_target;
 
+	/* Prepare for nested if required */
+	ret = kvm_vcpu_init_nested(vcpu);
+
 	/* Now we know what it is, we can reset it. */
-	ret = kvm_reset_vcpu(vcpu);
+	if (!ret)
+		ret = kvm_reset_vcpu(vcpu);
+
 	if (ret) {
 		vcpu->arch.target = -1;
 		bitmap_zero(vcpu->arch.features, KVM_VCPU_MAX_FEATURES);
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index bc2aba953299..55525fd5743d 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -162,7 +162,7 @@ static void invalidate_icache_guest_page(void *va, size_t size)
  * does.
  */
 /**
- * unmap_stage2_range -- Clear stage2 page table entries to unmap a range
+ * __unmap_stage2_range -- Clear stage2 page table entries to unmap a range
  * @mmu:   The KVM stage-2 MMU pointer
  * @start: The intermediate physical base address of the range to unmap
  * @size:  The size of the area to unmap
@@ -185,7 +185,7 @@ static void __unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64
 				   may_block));
 }
 
-static void unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size)
+void kvm_unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size)
 {
 	__unmap_stage2_range(mmu, start, size, true);
 }
@@ -628,7 +628,20 @@ int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu)
 	int cpu, err;
 	struct kvm_pgtable *pgt;
 
+	/*
+	 * If we already have our page tables in place, and that the
+	 * MMU context is the canonical one, we have a bug somewhere,
+	 * as this is only supposed to ever happen once per VM.
+	 *
+	 * Otherwise, we're building nested page tables, and that's
+	 * probably because userspace called KVM_ARM_VCPU_INIT more
+	 * than once on the same vcpu. Since that's actually legal,
+	 * don't kick a fuss and leave gracefully.
+	 */
 	if (mmu->pgt != NULL) {
+		if (&kvm->arch.mmu != mmu)
+			return 0;
+
 		kvm_err("kvm_arch already initialized?\n");
 		return -EINVAL;
 	}
@@ -654,6 +667,9 @@ int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu)
 	mmu->pgt = pgt;
 	mmu->pgd_phys = __pa(pgt->pgd);
 	WRITE_ONCE(mmu->vmid.vmid_gen, 0);
+
+	kvm_init_nested_s2_mmu(mmu);
+
 	return 0;
 
 out_destroy_pgtable:
@@ -699,7 +715,7 @@ static void stage2_unmap_memslot(struct kvm *kvm,
 
 		if (!(vma->vm_flags & VM_PFNMAP)) {
 			gpa_t gpa = addr + (vm_start - memslot->userspace_addr);
-			unmap_stage2_range(&kvm->arch.mmu, gpa, vm_end - vm_start);
+			kvm_unmap_stage2_range(&kvm->arch.mmu, gpa, vm_end - vm_start);
 		}
 		hva = vm_end;
 	} while (hva < reg_end);
@@ -1681,11 +1697,6 @@ void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen)
 {
 }
 
-void kvm_arch_flush_shadow_all(struct kvm *kvm)
-{
-	kvm_free_stage2_pgd(&kvm->arch.mmu);
-}
-
 void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
 				   struct kvm_memory_slot *slot)
 {
@@ -1693,7 +1704,7 @@ void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
 	phys_addr_t size = slot->npages << PAGE_SHIFT;
 
 	spin_lock(&kvm->mmu_lock);
-	unmap_stage2_range(&kvm->arch.mmu, gpa, size);
+	kvm_unmap_stage2_range(&kvm->arch.mmu, gpa, size);
 	spin_unlock(&kvm->mmu_lock);
 }
 
diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
index 254152cd791e..bfa2b9229173 100644
--- a/arch/arm64/kvm/nested.c
+++ b/arch/arm64/kvm/nested.c
@@ -7,12 +7,189 @@
 #include <linux/kvm.h>
 #include <linux/kvm_host.h>
 
+#include <asm/kvm_arm.h>
 #include <asm/kvm_emulate.h>
+#include <asm/kvm_mmu.h>
 #include <asm/kvm_nested.h>
 #include <asm/sysreg.h>
 
 #include "sys_regs.h"
 
+void kvm_init_nested(struct kvm *kvm)
+{
+	kvm->arch.nested_mmus = NULL;
+	kvm->arch.nested_mmus_size = 0;
+}
+
+int kvm_vcpu_init_nested(struct kvm_vcpu *vcpu)
+{
+	struct kvm *kvm = vcpu->kvm;
+	struct kvm_s2_mmu *tmp;
+	int num_mmus;
+	int ret = -ENOMEM;
+
+	if (!test_bit(KVM_ARM_VCPU_HAS_EL2, vcpu->arch.features))
+		return 0;
+
+	if (!cpus_have_final_cap(ARM64_HAS_NESTED_VIRT))
+		return -EINVAL;
+
+	mutex_lock(&kvm->lock);
+
+	/*
+	 * Let's treat memory allocation failures as benign: If we fail to
+	 * allocate anything, return an error and keep the allocated array
+	 * alive. Userspace may try to recover by intializing the vcpu
+	 * again, and there is no reason to affect the whole VM for this.
+	 */
+	num_mmus = atomic_read(&kvm->online_vcpus) * 2;
+	tmp = krealloc(kvm->arch.nested_mmus,
+		       num_mmus * sizeof(*kvm->arch.nested_mmus),
+		       GFP_KERNEL_ACCOUNT | __GFP_ZERO);
+	if (tmp) {
+		/*
+		 * If we went through a realocation, adjust the MMU
+		 * back-pointers in the previously initialised
+		 * pg_table structures.
+		 */
+		if (kvm->arch.nested_mmus != tmp) {
+			int i;
+
+			for (i = 0; i < num_mmus - 2; i++)
+				tmp[i].pgt->mmu = &tmp[i];
+		}
+
+		if (kvm_init_stage2_mmu(kvm, &tmp[num_mmus - 1]) ||
+		    kvm_init_stage2_mmu(kvm, &tmp[num_mmus - 2])) {
+			kvm_free_stage2_pgd(&tmp[num_mmus - 1]);
+			kvm_free_stage2_pgd(&tmp[num_mmus - 2]);
+		} else {
+			kvm->arch.nested_mmus_size = num_mmus;
+			ret = 0;
+		}
+
+		kvm->arch.nested_mmus = tmp;
+	}
+
+	mutex_unlock(&kvm->lock);
+	return ret;
+}
+
+/* Must be called with kvm->lock held */
+struct kvm_s2_mmu *lookup_s2_mmu(struct kvm *kvm, u64 vttbr, u64 hcr)
+{
+	bool nested_stage2_enabled = hcr & HCR_VM;
+	int i;
+
+	/* Don't consider the CnP bit for the vttbr match */
+	vttbr = vttbr & ~VTTBR_CNP_BIT;
+
+	/*
+	 * Two possibilities when looking up a S2 MMU context:
+	 *
+	 * - either S2 is enabled in the guest, and we need a context that
+         *   is S2-enabled and matches the full VTTBR (VMID+BADDR), which
+         *   makes it safe from a TLB conflict perspective (a broken guest
+         *   won't be able to generate them),
+	 *
+	 * - or S2 is disabled, and we need a context that is S2-disabled
+         *   and matches the VMID only, as all TLBs are tagged by VMID even
+         *   if S2 translation is disabled.
+	 */
+	for (i = 0; i < kvm->arch.nested_mmus_size; i++) {
+		struct kvm_s2_mmu *mmu = &kvm->arch.nested_mmus[i];
+
+		if (!kvm_s2_mmu_valid(mmu))
+			continue;
+
+		if (nested_stage2_enabled &&
+		    mmu->nested_stage2_enabled &&
+		    vttbr == mmu->vttbr)
+			return mmu;
+
+		if (!nested_stage2_enabled &&
+		    !mmu->nested_stage2_enabled &&
+		    get_vmid(vttbr) == get_vmid(mmu->vttbr))
+			return mmu;
+	}
+	return NULL;
+}
+
+static struct kvm_s2_mmu *get_s2_mmu_nested(struct kvm_vcpu *vcpu)
+{
+	struct kvm *kvm = vcpu->kvm;
+	u64 vttbr = vcpu_read_sys_reg(vcpu, VTTBR_EL2);
+	u64 hcr = vcpu_read_sys_reg(vcpu, HCR_EL2);
+	struct kvm_s2_mmu *s2_mmu;
+	int i;
+
+	s2_mmu = lookup_s2_mmu(kvm, vttbr, hcr);
+	if (s2_mmu)
+		goto out;
+
+	/*
+	 * Make sure we don't always search from the same point, or we
+	 * will always reuse a potentially active context, leaving
+	 * free contexts unused.
+	 */
+	for (i = kvm->arch.nested_mmus_next;
+	     i < (kvm->arch.nested_mmus_size + kvm->arch.nested_mmus_next);
+	     i++) {
+		s2_mmu = &kvm->arch.nested_mmus[i % kvm->arch.nested_mmus_size];
+
+		if (atomic_read(&s2_mmu->refcnt) == 0)
+			break;
+	}
+	BUG_ON(atomic_read(&s2_mmu->refcnt)); /* We have struct MMUs to spare */
+
+	/* Set the scene for the next search */
+	kvm->arch.nested_mmus_next = (i + 1) % kvm->arch.nested_mmus_size;
+
+	if (kvm_s2_mmu_valid(s2_mmu)) {
+		/* Clear the old state */
+		kvm_unmap_stage2_range(s2_mmu, 0, kvm_phys_size(kvm));
+		if (s2_mmu->vmid.vmid_gen)
+			kvm_call_hyp(__kvm_tlb_flush_vmid, s2_mmu);
+	}
+
+	/*
+	 * The virtual VMID (modulo CnP) will be used as a key when matching
+	 * an existing kvm_s2_mmu.
+	 */
+	s2_mmu->vttbr = vttbr & ~VTTBR_CNP_BIT;
+	s2_mmu->nested_stage2_enabled = hcr & HCR_VM;
+
+out:
+	atomic_inc(&s2_mmu->refcnt);
+	return s2_mmu;
+}
+
+void kvm_init_nested_s2_mmu(struct kvm_s2_mmu *mmu)
+{
+	mmu->vttbr = 1;
+	mmu->nested_stage2_enabled = false;
+	atomic_set(&mmu->refcnt, 0);
+}
+
+void kvm_vcpu_load_hw_mmu(struct kvm_vcpu *vcpu)
+{
+	if (is_hyp_ctxt(vcpu)) {
+		vcpu->arch.hw_mmu = &vcpu->kvm->arch.mmu;
+	} else {
+		spin_lock(&vcpu->kvm->mmu_lock);
+		vcpu->arch.hw_mmu = get_s2_mmu_nested(vcpu);
+		spin_unlock(&vcpu->kvm->mmu_lock);
+	}
+}
+
+void kvm_vcpu_put_hw_mmu(struct kvm_vcpu *vcpu)
+{
+	if (vcpu->arch.hw_mmu != &vcpu->kvm->arch.mmu) {
+		atomic_dec(&vcpu->arch.hw_mmu->refcnt);
+		vcpu->arch.hw_mmu = NULL;
+	}
+}
+
 /*
  * Inject wfx to the virtual EL2 if this is not from the virtual EL2 and
  * the virtual HCR_EL2.TWX is set. Otherwise, let the host hypervisor
@@ -31,6 +208,24 @@ int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe)
 	return -EINVAL;
 }
 
+void kvm_arch_flush_shadow_all(struct kvm *kvm)
+{
+	int i;
+
+	for (i = 0; i < kvm->arch.nested_mmus_size; i++) {
+		struct kvm_s2_mmu *mmu = &kvm->arch.nested_mmus[i];
+
+		WARN_ON(atomic_read(&mmu->refcnt));
+
+		if (!atomic_read(&mmu->refcnt))
+			kvm_free_stage2_pgd(mmu);
+	}
+	kfree(kvm->arch.nested_mmus);
+	kvm->arch.nested_mmus = NULL;
+	kvm->arch.nested_mmus_size = 0;
+	kvm_free_stage2_pgd(&kvm->arch.mmu);
+}
+
 /*
  * Our emulated CPU doesn't support all the possible features. For the
  * sake of simplicity (and probably mental sanity), wipe out a number
-- 
2.30.2


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 378+ messages in thread

* [PATCH v6 35/64] KVM: arm64: nv: Implement nested Stage-2 page table walk logic
  2022-01-28 12:18 ` Marc Zyngier
  (?)
@ 2022-01-28 12:18   ` Marc Zyngier
  -1 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:18 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Jintack Lim, Haibo Xu,
	Ganapatrao Kulkarni, Chase Conklin, Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

From: Christoffer Dall <christoffer.dall@linaro.org>

Based on the pseudo-code in the ARM ARM, implement a stage 2 software
page table walker.
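
As a side note for review (not part of the patch): the walk boils down to
peeling off one table index per level from the IPA. A rough, standalone
user-space sketch of that index derivation, deliberately simplified to a
4kB granule (pgshift = 12) and making no claim about the exact form of the
kernel helpers, could look like this:

#include <stdint.h>
#include <stdio.h>

/* Illustration only, not the kernel walker: entry index consumed at
 * 'level' for 'ipa', assuming a 4kB granule (pgshift = 12). */
static uint64_t level_index(uint64_t ipa, int level, unsigned int pgshift)
{
	unsigned int stride = pgshift - 3;	/* 9 bits -> 512 entries per table */
	unsigned int bottom = (3 - level) * stride + pgshift;

	return (ipa >> bottom) & ((1ULL << stride) - 1);
}

int main(void)
{
	uint64_t ipa = 0x40080123000ULL;
	int level;

	for (level = 0; level <= 3; level++)
		printf("level %d: index %llu\n", level,
		       (unsigned long long)level_index(ipa, level, 12));
	return 0;
}

The walker below computes the equivalent byte offset into the table
(index * 8) instead, and additionally validates the translation
configuration and output sizes at each step.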

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
[maz: heavily reworked for future ARMv8.4-TTL support]
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/esr.h        |   1 +
 arch/arm64/include/asm/kvm_arm.h    |   2 +
 arch/arm64/include/asm/kvm_nested.h |  13 ++
 arch/arm64/kvm/nested.c             | 267 ++++++++++++++++++++++++++++
 4 files changed, 283 insertions(+)

diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
index 3574c224889f..b4f6f86a3631 100644
--- a/arch/arm64/include/asm/esr.h
+++ b/arch/arm64/include/asm/esr.h
@@ -130,6 +130,7 @@
 #define ESR_ELx_CM 		(UL(1) << ESR_ELx_CM_SHIFT)
 
 /* ISS field definitions for exceptions taken in to Hyp */
+#define ESR_ELx_FSC_ADDRSZ	(0x00)
 #define ESR_ELx_CV		(UL(1) << 24)
 #define ESR_ELx_COND_SHIFT	(20)
 #define ESR_ELx_COND_MASK	(UL(0xF) << ESR_ELx_COND_SHIFT)
diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index d913c3fb5665..3675879b53c6 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -272,6 +272,8 @@
 #define VTTBR_VMID_SHIFT  (UL(48))
 #define VTTBR_VMID_MASK(size) (_AT(u64, (1 << size) - 1) << VTTBR_VMID_SHIFT)
 
+#define SCTLR_EE	(UL(1) << 25)
+
 /* Hyp System Trap Register */
 #define HSTR_EL2_T(x)	(1 << x)
 
diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
index 8bb7159f2b6b..48cf288ea238 100644
--- a/arch/arm64/include/asm/kvm_nested.h
+++ b/arch/arm64/include/asm/kvm_nested.h
@@ -72,6 +72,19 @@ extern struct kvm_s2_mmu *lookup_s2_mmu(struct kvm *kvm, u64 vttbr, u64 hcr);
 extern void kvm_vcpu_load_hw_mmu(struct kvm_vcpu *vcpu);
 extern void kvm_vcpu_put_hw_mmu(struct kvm_vcpu *vcpu);
 
+struct kvm_s2_trans {
+	phys_addr_t output;
+	unsigned long block_size;
+	bool writable;
+	bool readable;
+	int level;
+	u32 esr;
+	u64 upper_attr;
+};
+
+extern int kvm_walk_nested_s2(struct kvm_vcpu *vcpu, phys_addr_t gipa,
+			      struct kvm_s2_trans *result);
+
 int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe);
 extern bool __forward_traps(struct kvm_vcpu *vcpu, unsigned int reg,
 			    u64 control_bit);
diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
index bfa2b9229173..c2a99b672368 100644
--- a/arch/arm64/kvm/nested.c
+++ b/arch/arm64/kvm/nested.c
@@ -75,6 +75,273 @@ int kvm_vcpu_init_nested(struct kvm_vcpu *vcpu)
 	return ret;
 }
 
+struct s2_walk_info {
+	int	     (*read_desc)(phys_addr_t pa, u64 *desc, void *data);
+	void	     *data;
+	u64	     baddr;
+	unsigned int max_pa_bits;
+	unsigned int pgshift;
+	unsigned int pgsize;
+	unsigned int ps;
+	unsigned int sl;
+	unsigned int t0sz;
+	bool	     be;
+	bool	     el1_aarch32;
+};
+
+static unsigned int ps_to_output_size(unsigned int ps)
+{
+	switch (ps) {
+	case 0: return 32;
+	case 1: return 36;
+	case 2: return 40;
+	case 3: return 42;
+	case 4: return 44;
+	case 5:
+	default:
+		return 48;
+	}
+}
+
+static u32 compute_fsc(int level, u32 fsc)
+{
+	return fsc | (level & 0x3);
+}
+
+static int check_base_s2_limits(struct s2_walk_info *wi,
+				int level, int input_size, int stride)
+{
+	int start_size;
+
+	/* Check translation limits */
+	switch (wi->pgsize) {
+	case SZ_64K:
+		if (level == 0 || (level == 1 && wi->max_pa_bits <= 42))
+			return -EFAULT;
+		break;
+	case SZ_16K:
+		if (level == 0 || (level == 1 && wi->max_pa_bits <= 40))
+			return -EFAULT;
+		break;
+	case SZ_4K:
+		if (level < 0 || (level == 0 && wi->max_pa_bits <= 42))
+			return -EFAULT;
+		break;
+	}
+
+	/* Check input size limits */
+	if (input_size > wi->max_pa_bits &&
+	    (!wi->el1_aarch32 || input_size > 40))
+		return -EFAULT;
+
+	/* Check number of entries in starting level table */
+	start_size = input_size - ((3 - level) * stride + wi->pgshift);
+	if (start_size < 1 || start_size > stride + 4)
+		return -EFAULT;
+
+	return 0;
+}
+
+/* Check if output is within boundaries */
+static int check_output_size(struct s2_walk_info *wi, phys_addr_t output)
+{
+	unsigned int output_size = ps_to_output_size(wi->ps);
+
+	if (output_size > wi->max_pa_bits)
+		output_size = wi->max_pa_bits;
+
+	if (output_size != 48 && (output & GENMASK_ULL(47, output_size)))
+		return -1;
+
+	return 0;
+}
+
+/*
+ * This is essentially a C-version of the pseudo code from the ARM ARM
+ * AArch64.TranslationTableWalk function. I strongly recommend looking at
+ * that pseudocode when trying to understand this.
+ *
+ * Must be called with the kvm->srcu read lock held
+ */
+static int walk_nested_s2_pgd(phys_addr_t ipa,
+			      struct s2_walk_info *wi, struct kvm_s2_trans *out)
+{
+	int first_block_level, level, stride, input_size, base_lower_bound;
+	phys_addr_t base_addr;
+	unsigned int addr_top, addr_bottom;
+	u64 desc;  /* page table entry */
+	int ret;
+	phys_addr_t paddr;
+
+	switch (wi->pgsize) {
+	case SZ_64K:
+	case SZ_16K:
+		level = 3 - wi->sl;
+		first_block_level = 2;
+		break;
+	case SZ_4K:
+		level = 2 - wi->sl;
+		first_block_level = 1;
+		break;
+	default:
+		/* GCC is braindead */
+		unreachable();
+	}
+
+	stride = wi->pgshift - 3;
+	input_size = 64 - wi->t0sz;
+	if (input_size > 48 || input_size < 25)
+		return -EFAULT;
+
+	ret = check_base_s2_limits(wi, level, input_size, stride);
+	if (WARN_ON(ret))
+		return ret;
+
+	base_lower_bound = 3 + input_size - ((3 - level) * stride +
+			   wi->pgshift);
+	base_addr = wi->baddr & GENMASK_ULL(47, base_lower_bound);
+
+	if (check_output_size(wi, base_addr)) {
+		out->esr = compute_fsc(level, ESR_ELx_FSC_ADDRSZ);
+		return 1;
+	}
+
+	addr_top = input_size - 1;
+
+	while (1) {
+		phys_addr_t index;
+
+		addr_bottom = (3 - level) * stride + wi->pgshift;
+		index = (ipa & GENMASK_ULL(addr_top, addr_bottom))
+			>> (addr_bottom - 3);
+
+		paddr = base_addr | index;
+		ret = wi->read_desc(paddr, &desc, wi->data);
+		if (ret < 0)
+			return ret;
+
+		/*
+		 * Handle reversed descriptors if endianness differs between the
+		 * host and the guest hypervisor.
+		 */
+		if (wi->be)
+			desc = be64_to_cpu(desc);
+		else
+			desc = le64_to_cpu(desc);
+
+		/* Check for valid descriptor at this point */
+		if (!(desc & 1) || ((desc & 3) == 1 && level == 3)) {
+			out->esr = compute_fsc(level, ESR_ELx_FSC_FAULT);
+			out->upper_attr = desc;
+			return 1;
+		}
+
+		/* We're at the final level or block translation level */
+		if ((desc & 3) == 1 || level == 3)
+			break;
+
+		if (check_output_size(wi, desc)) {
+			out->esr = compute_fsc(level, ESR_ELx_FSC_ADDRSZ);
+			out->upper_attr = desc;
+			return 1;
+		}
+
+		base_addr = desc & GENMASK_ULL(47, wi->pgshift);
+
+		level += 1;
+		addr_top = addr_bottom - 1;
+	}
+
+	if (level < first_block_level) {
+		out->esr = compute_fsc(level, ESR_ELx_FSC_FAULT);
+		out->upper_attr = desc;
+		return 1;
+	}
+
+	/*
+	 * We don't use the contiguous bit in the stage-2 ptes, so skip check
+	 * for misprogramming of the contiguous bit.
+	 */
+
+	if (check_output_size(wi, desc)) {
+		out->esr = compute_fsc(level, ESR_ELx_FSC_ADDRSZ);
+		out->upper_attr = desc;
+		return 1;
+	}
+
+	if (!(desc & BIT(10))) {
+		out->esr = compute_fsc(level, ESR_ELx_FSC_ACCESS);
+		out->upper_attr = desc;
+		return 1;
+	}
+
+	/* Calculate and return the result */
+	paddr = (desc & GENMASK_ULL(47, addr_bottom)) |
+		(ipa & GENMASK_ULL(addr_bottom - 1, 0));
+	out->output = paddr;
+	out->block_size = 1UL << ((3 - level) * stride + wi->pgshift);
+	out->readable = desc & (0b01 << 6);
+	out->writable = desc & (0b10 << 6);
+	out->level = level;
+	out->upper_attr = desc & GENMASK_ULL(63, 52);
+	return 0;
+}
+
+static int read_guest_s2_desc(phys_addr_t pa, u64 *desc, void *data)
+{
+	struct kvm_vcpu *vcpu = data;
+
+	return kvm_read_guest(vcpu->kvm, pa, desc, sizeof(*desc));
+}
+
+static void vtcr_to_walk_info(u64 vtcr, struct s2_walk_info *wi)
+{
+	wi->t0sz = vtcr & TCR_EL2_T0SZ_MASK;
+
+	switch (vtcr & VTCR_EL2_TG0_MASK) {
+	case VTCR_EL2_TG0_4K:
+		wi->pgshift = 12;	 break;
+	case VTCR_EL2_TG0_16K:
+		wi->pgshift = 14;	 break;
+	case VTCR_EL2_TG0_64K:
+	default:
+		wi->pgshift = 16;	 break;
+	}
+
+	wi->pgsize = 1UL << wi->pgshift;
+	wi->ps = (vtcr & VTCR_EL2_PS_MASK) >> VTCR_EL2_PS_SHIFT;
+	wi->sl = (vtcr & VTCR_EL2_SL0_MASK) >> VTCR_EL2_SL0_SHIFT;
+	wi->max_pa_bits = VTCR_EL2_IPA(vtcr);
+}
+
+int kvm_walk_nested_s2(struct kvm_vcpu *vcpu, phys_addr_t gipa,
+		       struct kvm_s2_trans *result)
+{
+	u64 vtcr = vcpu_read_sys_reg(vcpu, VTCR_EL2);
+	struct s2_walk_info wi;
+	int ret;
+
+	result->esr = 0;
+
+	if (!vcpu_has_nv(vcpu))
+		return 0;
+
+	wi.read_desc = read_guest_s2_desc;
+	wi.data = vcpu;
+	wi.baddr = vcpu_read_sys_reg(vcpu, VTTBR_EL2);
+
+	vtcr_to_walk_info(vtcr, &wi);
+
+	wi.be = vcpu_read_sys_reg(vcpu, SCTLR_EL2) & SCTLR_EE;
+	wi.el1_aarch32 = vcpu_mode_is_32bit(vcpu);
+
+	ret = walk_nested_s2_pgd(gipa, &wi, result);
+	if (ret)
+		result->esr |= (kvm_vcpu_get_esr(vcpu) & ~ESR_ELx_FSC);
+
+	return ret;
+}
+
 /* Must be called with kvm->lock held */
 struct kvm_s2_mmu *lookup_s2_mmu(struct kvm *kvm, u64 vttbr, u64 hcr)
 {
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 378+ messages in thread

* [PATCH v6 36/64] KVM: arm64: nv: Handle shadow stage 2 page faults
  2022-01-28 12:18 ` Marc Zyngier
  (?)
@ 2022-01-28 12:18   ` Marc Zyngier
  -1 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:18 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Jintack Lim, Haibo Xu,
	Ganapatrao Kulkarni, Chase Conklin, Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

If we are faulting on a shadow stage 2 translation, we first walk the
guest hypervisor's stage 2 page table to see if it has a mapping. If
not, we inject a stage 2 page fault to the virtual EL2. Otherwise, we
create a mapping in the shadow stage 2 page table.

Note that we have to deal with two IPAs when we get a shadow stage 2
page fault. One is the address we faulted on, and is in the L2 guest
phys space. The other is from the guest stage-2 page table walk, and is
in the L1 guest phys space.  To differentiate them, we rename variables
so that fault_ipa is used for the former and ipa is used for the latter.
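
Purely as an illustration of that flow (this is not the kernel code: every
type and helper below is a made-up stand-in, and the guest hypervisor's
stage 2 is faked as an identity map of the first 1GB), the handling can be
sketched as:

#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

/* Illustration only -- stand-in types and helpers, not KVM's. */
struct s2_trans { uint64_t output; uint32_t esr; };

/* Pretend walk of the guest hypervisor's stage 2 tables. */
static bool walk_guest_s2(uint64_t l2_ipa, struct s2_trans *out)
{
	if (l2_ipa >= (1ULL << 30)) {
		out->esr = 0x7;		/* translation fault, level 3 (illustrative) */
		return false;
	}
	out->output = l2_ipa;		/* the L1 IPA backing this L2 IPA */
	return true;
}

static void handle_shadow_s2_fault(uint64_t fault_ipa)	/* L2 IPA we faulted on */
{
	struct s2_trans trans;

	if (!walk_guest_s2(fault_ipa, &trans)) {
		/* No guest mapping: make the guest hypervisor deal with it. */
		printf("inject stage 2 fault (esr=%#x) into virtual EL2\n",
		       (unsigned)trans.esr);
		return;
	}

	/* trans.output is the L1 IPA; resolve it to host memory and install
	 * the mapping for fault_ipa in the shadow stage 2 tables. */
	printf("map L2 IPA %#llx via L1 IPA %#llx in the shadow stage 2\n",
	       (unsigned long long)fault_ipa, (unsigned long long)trans.output);
}

int main(void)
{
	handle_shadow_s2_fault(0x8000);		/* mapped by the guest hypervisor */
	handle_shadow_s2_fault(1ULL << 31);	/* unmapped: forwarded to vEL2 */
	return 0;
}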

Co-developed-by: Christoffer Dall <christoffer.dall@linaro.org>
Co-developed-by: Jintack Lim <jintack.lim@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
[maz: rewrote this multiple times...]
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_emulate.h |   6 ++
 arch/arm64/include/asm/kvm_nested.h  |  18 +++++
 arch/arm64/kvm/mmu.c                 | 102 +++++++++++++++++++++++----
 arch/arm64/kvm/nested.c              |  48 +++++++++++++
 4 files changed, 159 insertions(+), 15 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index ff8980a39ee8..dc6eeb0cc8a9 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -602,4 +602,10 @@ static inline bool vcpu_has_feature(struct kvm_vcpu *vcpu, int feature)
 	return test_bit(feature, vcpu->arch.features);
 }
 
+static inline bool kvm_is_shadow_s2_fault(struct kvm_vcpu *vcpu)
+{
+	return (vcpu->arch.hw_mmu != &vcpu->kvm->arch.mmu &&
+		vcpu->arch.hw_mmu->nested_stage2_enabled);
+}
+
 #endif /* __ARM64_KVM_EMULATE_H__ */
diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
index 48cf288ea238..4fad4d3848ce 100644
--- a/arch/arm64/include/asm/kvm_nested.h
+++ b/arch/arm64/include/asm/kvm_nested.h
@@ -82,9 +82,27 @@ struct kvm_s2_trans {
 	u64 upper_attr;
 };
 
+static inline phys_addr_t kvm_s2_trans_output(struct kvm_s2_trans *trans)
+{
+	return trans->output;
+}
+
+static inline unsigned long kvm_s2_trans_size(struct kvm_s2_trans *trans)
+{
+	return trans->block_size;
+}
+
+static inline u32 kvm_s2_trans_esr(struct kvm_s2_trans *trans)
+{
+	return trans->esr;
+}
+
 extern int kvm_walk_nested_s2(struct kvm_vcpu *vcpu, phys_addr_t gipa,
 			      struct kvm_s2_trans *result);
 
+extern int kvm_s2_handle_perm_fault(struct kvm_vcpu *vcpu,
+				    struct kvm_s2_trans *trans);
+extern int kvm_inject_s2_fault(struct kvm_vcpu *vcpu, u64 esr_el2);
 int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe);
 extern bool __forward_traps(struct kvm_vcpu *vcpu, unsigned int reg,
 			    u64 control_bit);
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 55525fd5743d..36f7ecb4f81b 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -969,7 +969,7 @@ static bool fault_supports_stage2_huge_mapping(struct kvm_memory_slot *memslot,
 static unsigned long
 transparent_hugepage_adjust(struct kvm *kvm, struct kvm_memory_slot *memslot,
 			    unsigned long hva, kvm_pfn_t *pfnp,
-			    phys_addr_t *ipap)
+			    phys_addr_t *ipap, phys_addr_t *fault_ipap)
 {
 	kvm_pfn_t pfn = *pfnp;
 
@@ -998,6 +998,7 @@ transparent_hugepage_adjust(struct kvm *kvm, struct kvm_memory_slot *memslot,
 		 * to PG_head and switch the pfn from a tail page to the head
 		 * page accordingly.
 		 */
+		*fault_ipap &= PMD_MASK;
 		*ipap &= PMD_MASK;
 		kvm_release_pfn_clean(pfn);
 		pfn &= ~(PTRS_PER_PMD - 1);
@@ -1080,15 +1081,17 @@ static int sanitise_mte_tags(struct kvm *kvm, kvm_pfn_t pfn,
 }
 
 static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
-			  struct kvm_memory_slot *memslot, unsigned long hva,
-			  unsigned long fault_status)
+			  struct kvm_s2_trans *nested,
+			  struct kvm_memory_slot *memslot,
+			  unsigned long hva, unsigned long fault_status)
 {
 	int ret = 0;
-	bool write_fault, writable, force_pte = false;
+	bool write_fault, writable;
 	bool exec_fault;
 	bool device = false;
 	bool shared;
 	unsigned long mmu_seq;
+	phys_addr_t ipa = fault_ipa;
 	struct kvm *kvm = vcpu->kvm;
 	struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
 	struct vm_area_struct *vma;
@@ -1100,6 +1103,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	unsigned long vma_pagesize, fault_granule;
 	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
 	struct kvm_pgtable *pgt;
+	unsigned long max_map_size = PUD_SIZE;
 
 	fault_granule = 1UL << ARM64_HW_PGTABLE_LEVEL_SHIFT(fault_level);
 	write_fault = kvm_is_write_fault(vcpu);
@@ -1128,7 +1132,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	 * memslots.
 	 */
 	if (logging_active) {
-		force_pte = true;
+		max_map_size = vma_pagesize = PAGE_SIZE;
 		vma_shift = PAGE_SHIFT;
 	} else {
 		vma_shift = get_vma_page_shift(vma, hva);
@@ -1152,7 +1156,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		fallthrough;
 	case CONT_PTE_SHIFT:
 		vma_shift = PAGE_SHIFT;
-		force_pte = true;
+		max_map_size = PAGE_SIZE;
 		fallthrough;
 	case PAGE_SHIFT:
 		break;
@@ -1161,10 +1165,25 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	}
 
 	vma_pagesize = 1UL << vma_shift;
+
+	if (kvm_is_shadow_s2_fault(vcpu)) {
+		ipa = kvm_s2_trans_output(nested);
+
+		/*
+		 * If we're about to create a shadow stage 2 entry, then we
+		 * can only create a block mapping if the guest stage 2 page
+		 * table uses at least as big a mapping.
+		 */
+		max_map_size = min(kvm_s2_trans_size(nested), max_map_size);
+	}
+
+	vma_pagesize = min(vma_pagesize, max_map_size);
+
 	if (vma_pagesize == PMD_SIZE || vma_pagesize == PUD_SIZE)
 		fault_ipa &= ~(vma_pagesize - 1);
 
-	gfn = fault_ipa >> PAGE_SHIFT;
+	gfn = ipa >> PAGE_SHIFT;
+
 	mmap_read_unlock(current->mm);
 
 	/*
@@ -1237,12 +1256,14 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	 * If we are not forced to use page mapping, check if we are
 	 * backed by a THP and thus use block mapping if possible.
 	 */
-	if (vma_pagesize == PAGE_SIZE && !(force_pte || device)) {
+	if (vma_pagesize == PAGE_SIZE &&
+	    !(max_map_size == PAGE_SIZE || device)) {
 		if (fault_status == FSC_PERM && fault_granule > PAGE_SIZE)
 			vma_pagesize = fault_granule;
 		else
 			vma_pagesize = transparent_hugepage_adjust(kvm, memslot,
 								   hva, &pfn,
+								   &ipa,
 								   &fault_ipa);
 	}
 
@@ -1326,8 +1347,10 @@ static void handle_access_fault(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
 int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 {
 	unsigned long fault_status;
-	phys_addr_t fault_ipa;
+	phys_addr_t fault_ipa; /* The address we faulted on */
+	phys_addr_t ipa; /* Always the IPA in the L1 guest phys space */
 	struct kvm_memory_slot *memslot;
+	struct kvm_s2_trans nested_trans;
 	unsigned long hva;
 	bool is_iabt, write_fault, writable;
 	gfn_t gfn;
@@ -1335,7 +1358,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 
 	fault_status = kvm_vcpu_trap_get_fault_type(vcpu);
 
-	fault_ipa = kvm_vcpu_get_fault_ipa(vcpu);
+	ipa = fault_ipa = kvm_vcpu_get_fault_ipa(vcpu);
 	is_iabt = kvm_vcpu_trap_is_iabt(vcpu);
 
 	/* Synchronous External Abort? */
@@ -1356,6 +1379,12 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 	/* Check the stage-2 fault is trans. fault or write fault */
 	if (fault_status != FSC_FAULT && fault_status != FSC_PERM &&
 	    fault_status != FSC_ACCESS) {
+		/*
+		 * We must never see an address size fault on a shadow stage 2
+		 * page table walk, because we would have injected an address
+		 * size fault when we walked the nested s2 page tables and not
+		 * created the shadow entry.
+		 */
 		kvm_err("Unsupported FSC: EC=%#x xFSC=%#lx ESR_EL2=%#lx\n",
 			kvm_vcpu_trap_get_class(vcpu),
 			(unsigned long)kvm_vcpu_trap_get_fault(vcpu),
@@ -1365,7 +1394,49 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 
 	idx = srcu_read_lock(&vcpu->kvm->srcu);
 
-	gfn = fault_ipa >> PAGE_SHIFT;
+	/*
+	 * We may have faulted on a shadow stage 2 page table if we are
+	 * running a nested guest.  In this case, we have to resolve the L2
+	 * IPA to the L1 IPA first, before knowing what kind of memory should
+	 * back the L1 IPA.
+	 *
+	 * If the shadow stage 2 page table walk faults, then we simply inject
+	 * this to the guest and carry on.
+	 */
+	if (kvm_is_shadow_s2_fault(vcpu)) {
+		u32 esr;
+
+		ret = kvm_walk_nested_s2(vcpu, fault_ipa, &nested_trans);
+		esr = kvm_s2_trans_esr(&nested_trans);
+		if (esr)
+			kvm_inject_s2_fault(vcpu, esr);
+		if (ret)
+			goto out_unlock;
+
+		ret = kvm_s2_handle_perm_fault(vcpu, &nested_trans);
+		esr = kvm_s2_trans_esr(&nested_trans);
+		if (esr)
+			kvm_inject_s2_fault(vcpu, esr);
+		if (ret)
+			goto out_unlock;
+
+		ipa = kvm_s2_trans_output(&nested_trans);
+	} else {
+		nested_trans = (struct kvm_s2_trans) {
+			/*
+			 * Default to RWX so that we don't filter
+			 * anything while evaluating the permissions.
+			 */
+			.writable	= true,
+			.readable	= true,
+			.upper_attr	= 0,
+			.output		= ipa,
+			.level		= kvm_vcpu_trap_get_fault_level(vcpu),
+			.esr		= kvm_vcpu_get_esr(vcpu),
+		};
+	}
+
+	gfn = ipa >> PAGE_SHIFT;
 	memslot = gfn_to_memslot(vcpu->kvm, gfn);
 	hva = gfn_to_hva_memslot_prot(memslot, gfn, &writable);
 	write_fault = kvm_is_write_fault(vcpu);
@@ -1409,13 +1480,13 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 		 * faulting VA. This is always 12 bits, irrespective
 		 * of the page size.
 		 */
-		fault_ipa |= kvm_vcpu_get_hfar(vcpu) & ((1 << 12) - 1);
-		ret = io_mem_abort(vcpu, fault_ipa);
+		ipa |= kvm_vcpu_get_hfar(vcpu) & ((1 << 12) - 1);
+		ret = io_mem_abort(vcpu, ipa);
 		goto out_unlock;
 	}
 
 	/* Userspace should not be able to register out-of-bounds IPAs */
-	VM_BUG_ON(fault_ipa >= kvm_phys_size(vcpu->kvm));
+	VM_BUG_ON(ipa >= kvm_phys_size(vcpu->kvm));
 
 	if (fault_status == FSC_ACCESS) {
 		handle_access_fault(vcpu, fault_ipa);
@@ -1423,7 +1494,8 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 		goto out_unlock;
 	}
 
-	ret = user_mem_abort(vcpu, fault_ipa, memslot, hva, fault_status);
+	ret = user_mem_abort(vcpu, fault_ipa, &nested_trans,
+			     memslot, hva, fault_status);
 	if (ret == 0)
 		ret = 1;
 out:
diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
index c2a99b672368..0a9708f776fc 100644
--- a/arch/arm64/kvm/nested.c
+++ b/arch/arm64/kvm/nested.c
@@ -108,6 +108,15 @@ static u32 compute_fsc(int level, u32 fsc)
 	return fsc | (level & 0x3);
 }
 
+static int esr_s2_fault(struct kvm_vcpu *vcpu, int level, u32 fsc)
+{
+	u32 esr;
+
+	esr = kvm_vcpu_get_esr(vcpu) & ~ESR_ELx_FSC;
+	esr |= compute_fsc(level, fsc);
+	return esr;
+}
+
 static int check_base_s2_limits(struct s2_walk_info *wi,
 				int level, int input_size, int stride)
 {
@@ -457,6 +466,45 @@ void kvm_vcpu_put_hw_mmu(struct kvm_vcpu *vcpu)
 	}
 }
 
+/*
+ * Returns non-zero if permission fault is handled by injecting it to the next
+ * level hypervisor.
+ */
+int kvm_s2_handle_perm_fault(struct kvm_vcpu *vcpu, struct kvm_s2_trans *trans)
+{
+	unsigned long fault_status = kvm_vcpu_trap_get_fault_type(vcpu);
+	bool forward_fault = false;
+
+	trans->esr = 0;
+
+	if (fault_status != FSC_PERM)
+		return 0;
+
+	if (kvm_vcpu_trap_is_iabt(vcpu)) {
+		forward_fault = (trans->upper_attr & BIT(54));
+	} else {
+		bool write_fault = kvm_is_write_fault(vcpu);
+
+		forward_fault = ((write_fault && !trans->writable) ||
+				 (!write_fault && !trans->readable));
+	}
+
+	if (forward_fault) {
+		trans->esr = esr_s2_fault(vcpu, trans->level, ESR_ELx_FSC_PERM);
+		return 1;
+	}
+
+	return 0;
+}
+
+int kvm_inject_s2_fault(struct kvm_vcpu *vcpu, u64 esr_el2)
+{
+	vcpu_write_sys_reg(vcpu, vcpu->arch.fault.far_el2, FAR_EL2);
+	vcpu_write_sys_reg(vcpu, vcpu->arch.fault.hpfar_el2, HPFAR_EL2);
+
+	return kvm_inject_nested_sync(vcpu, esr_el2);
+}
+
 /*
  * Inject wfx to the virtual EL2 if this is not from the virtual EL2 and
  * the virtual HCR_EL2.TWX is set. Otherwise, let the host hypervisor
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 378+ messages in thread

* [PATCH v6 36/64] KVM: arm64: nv: Handle shadow stage 2 page faults
@ 2022-01-28 12:18   ` Marc Zyngier
  0 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:18 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Jintack Lim, Haibo Xu,
	Ganapatrao Kulkarni, Chase Conklin, Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

If we are faulting on a shadow stage 2 translation, we first walk the
guest hypervisor's stage 2 page table to see if it has a mapping. If
not, we inject a stage 2 page fault to the virtual EL2. Otherwise, we
create a mapping in the shadow stage 2 page table.

Note that we have to deal with two IPAs when we get a shadow stage 2
page fault. One is the address we faulted on, and is in the L2 guest
phys space. The other is from the guest stage-2 page table walk, and is
in the L1 guest phys space.  To differentiate them, we rename variables
so that fault_ipa is used for the former and ipa is used for the latter.
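
To illustrate the two address spaces in play, here is a minimal,
self-contained sketch (illustration only, not the kernel code: the
walk_guest_s2() helper, its mapping and all values are invented). It
resolves the L2 IPA we faulted on through a pretend guest stage-2
before the resulting L1 IPA is used for the memslot-style lookup:

#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT	12

struct s2_trans {
	uint64_t output;	/* L1 IPA the guest stage-2 maps to */
	int valid;
};

/* Pretend guest hypervisor stage-2: map 1MB of L2 IPA space at 1GB */
static struct s2_trans walk_guest_s2(uint64_t l2_ipa)
{
	struct s2_trans t = { 0 };

	if (l2_ipa >= 0x80000000ULL && l2_ipa < 0x80100000ULL) {
		t.output = 0x40000000ULL + (l2_ipa - 0x80000000ULL);
		t.valid = 1;
	}
	return t;
}

int main(void)
{
	uint64_t fault_ipa = 0x80002345ULL;	/* L2 guest phys space */
	struct s2_trans trans = walk_guest_s2(fault_ipa);
	uint64_t ipa, gfn;

	if (!trans.valid) {
		/* would inject a stage-2 fault into the virtual EL2 */
		printf("no guest S2 mapping for %#llx\n",
		       (unsigned long long)fault_ipa);
		return 0;
	}

	/* ipa is in the L1 guest phys space and backs the memslot lookup */
	ipa = trans.output;
	gfn = ipa >> PAGE_SHIFT;

	printf("fault_ipa=%#llx -> ipa=%#llx (gfn=%#llx)\n",
	       (unsigned long long)fault_ipa, (unsigned long long)ipa,
	       (unsigned long long)gfn);
	return 0;
}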

Co-developed-by: Christoffer Dall <christoffer.dall@linaro.org>
Co-developed-by: Jintack Lim <jintack.lim@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
[maz: rewrote this multiple times...]
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_emulate.h |   6 ++
 arch/arm64/include/asm/kvm_nested.h  |  18 +++++
 arch/arm64/kvm/mmu.c                 | 102 +++++++++++++++++++++++----
 arch/arm64/kvm/nested.c              |  48 +++++++++++++
 4 files changed, 159 insertions(+), 15 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index ff8980a39ee8..dc6eeb0cc8a9 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -602,4 +602,10 @@ static inline bool vcpu_has_feature(struct kvm_vcpu *vcpu, int feature)
 	return test_bit(feature, vcpu->arch.features);
 }
 
+static inline bool kvm_is_shadow_s2_fault(struct kvm_vcpu *vcpu)
+{
+	return (vcpu->arch.hw_mmu != &vcpu->kvm->arch.mmu &&
+		vcpu->arch.hw_mmu->nested_stage2_enabled);
+}
+
 #endif /* __ARM64_KVM_EMULATE_H__ */
diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
index 48cf288ea238..4fad4d3848ce 100644
--- a/arch/arm64/include/asm/kvm_nested.h
+++ b/arch/arm64/include/asm/kvm_nested.h
@@ -82,9 +82,27 @@ struct kvm_s2_trans {
 	u64 upper_attr;
 };
 
+static inline phys_addr_t kvm_s2_trans_output(struct kvm_s2_trans *trans)
+{
+	return trans->output;
+}
+
+static inline unsigned long kvm_s2_trans_size(struct kvm_s2_trans *trans)
+{
+	return trans->block_size;
+}
+
+static inline u32 kvm_s2_trans_esr(struct kvm_s2_trans *trans)
+{
+	return trans->esr;
+}
+
 extern int kvm_walk_nested_s2(struct kvm_vcpu *vcpu, phys_addr_t gipa,
 			      struct kvm_s2_trans *result);
 
+extern int kvm_s2_handle_perm_fault(struct kvm_vcpu *vcpu,
+				    struct kvm_s2_trans *trans);
+extern int kvm_inject_s2_fault(struct kvm_vcpu *vcpu, u64 esr_el2);
 int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe);
 extern bool __forward_traps(struct kvm_vcpu *vcpu, unsigned int reg,
 			    u64 control_bit);
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 55525fd5743d..36f7ecb4f81b 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -969,7 +969,7 @@ static bool fault_supports_stage2_huge_mapping(struct kvm_memory_slot *memslot,
 static unsigned long
 transparent_hugepage_adjust(struct kvm *kvm, struct kvm_memory_slot *memslot,
 			    unsigned long hva, kvm_pfn_t *pfnp,
-			    phys_addr_t *ipap)
+			    phys_addr_t *ipap, phys_addr_t *fault_ipap)
 {
 	kvm_pfn_t pfn = *pfnp;
 
@@ -998,6 +998,7 @@ transparent_hugepage_adjust(struct kvm *kvm, struct kvm_memory_slot *memslot,
 		 * to PG_head and switch the pfn from a tail page to the head
 		 * page accordingly.
 		 */
+		*fault_ipap &= PMD_MASK;
 		*ipap &= PMD_MASK;
 		kvm_release_pfn_clean(pfn);
 		pfn &= ~(PTRS_PER_PMD - 1);
@@ -1080,15 +1081,17 @@ static int sanitise_mte_tags(struct kvm *kvm, kvm_pfn_t pfn,
 }
 
 static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
-			  struct kvm_memory_slot *memslot, unsigned long hva,
-			  unsigned long fault_status)
+			  struct kvm_s2_trans *nested,
+			  struct kvm_memory_slot *memslot,
+			  unsigned long hva, unsigned long fault_status)
 {
 	int ret = 0;
-	bool write_fault, writable, force_pte = false;
+	bool write_fault, writable;
 	bool exec_fault;
 	bool device = false;
 	bool shared;
 	unsigned long mmu_seq;
+	phys_addr_t ipa = fault_ipa;
 	struct kvm *kvm = vcpu->kvm;
 	struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
 	struct vm_area_struct *vma;
@@ -1100,6 +1103,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	unsigned long vma_pagesize, fault_granule;
 	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
 	struct kvm_pgtable *pgt;
+	unsigned long max_map_size = PUD_SIZE;
 
 	fault_granule = 1UL << ARM64_HW_PGTABLE_LEVEL_SHIFT(fault_level);
 	write_fault = kvm_is_write_fault(vcpu);
@@ -1128,7 +1132,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	 * memslots.
 	 */
 	if (logging_active) {
-		force_pte = true;
+		max_map_size = vma_pagesize = PAGE_SIZE;
 		vma_shift = PAGE_SHIFT;
 	} else {
 		vma_shift = get_vma_page_shift(vma, hva);
@@ -1152,7 +1156,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		fallthrough;
 	case CONT_PTE_SHIFT:
 		vma_shift = PAGE_SHIFT;
-		force_pte = true;
+		max_map_size = PAGE_SIZE;
 		fallthrough;
 	case PAGE_SHIFT:
 		break;
@@ -1161,10 +1165,25 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	}
 
 	vma_pagesize = 1UL << vma_shift;
+
+	if (kvm_is_shadow_s2_fault(vcpu)) {
+		ipa = kvm_s2_trans_output(nested);
+
+		/*
+		 * If we're about to create a shadow stage 2 entry, then we
+		 * can only create a block mapping if the guest stage 2 page
+		 * table uses at least as big a mapping.
+		 */
+		max_map_size = min(kvm_s2_trans_size(nested), max_map_size);
+	}
+
+	vma_pagesize = min(vma_pagesize, max_map_size);
+
 	if (vma_pagesize == PMD_SIZE || vma_pagesize == PUD_SIZE)
 		fault_ipa &= ~(vma_pagesize - 1);
 
-	gfn = fault_ipa >> PAGE_SHIFT;
+	gfn = ipa >> PAGE_SHIFT;
+
 	mmap_read_unlock(current->mm);
 
 	/*
@@ -1237,12 +1256,14 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	 * If we are not forced to use page mapping, check if we are
 	 * backed by a THP and thus use block mapping if possible.
 	 */
-	if (vma_pagesize == PAGE_SIZE && !(force_pte || device)) {
+	if (vma_pagesize == PAGE_SIZE &&
+	    !(max_map_size == PAGE_SIZE || device)) {
 		if (fault_status == FSC_PERM && fault_granule > PAGE_SIZE)
 			vma_pagesize = fault_granule;
 		else
 			vma_pagesize = transparent_hugepage_adjust(kvm, memslot,
 								   hva, &pfn,
+								   &ipa,
 								   &fault_ipa);
 	}
 
@@ -1326,8 +1347,10 @@ static void handle_access_fault(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
 int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 {
 	unsigned long fault_status;
-	phys_addr_t fault_ipa;
+	phys_addr_t fault_ipa; /* The address we faulted on */
+	phys_addr_t ipa; /* Always the IPA in the L1 guest phys space */
 	struct kvm_memory_slot *memslot;
+	struct kvm_s2_trans nested_trans;
 	unsigned long hva;
 	bool is_iabt, write_fault, writable;
 	gfn_t gfn;
@@ -1335,7 +1358,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 
 	fault_status = kvm_vcpu_trap_get_fault_type(vcpu);
 
-	fault_ipa = kvm_vcpu_get_fault_ipa(vcpu);
+	ipa = fault_ipa = kvm_vcpu_get_fault_ipa(vcpu);
 	is_iabt = kvm_vcpu_trap_is_iabt(vcpu);
 
 	/* Synchronous External Abort? */
@@ -1356,6 +1379,12 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 	/* Check the stage-2 fault is trans. fault or write fault */
 	if (fault_status != FSC_FAULT && fault_status != FSC_PERM &&
 	    fault_status != FSC_ACCESS) {
+		/*
+		 * We must never see an address size fault on shadow stage 2
+		 * page table walk, because we would have injected an addr
+		 * size fault when we walked the nested s2 page and not
+		 * create the shadow entry.
+		 */
 		kvm_err("Unsupported FSC: EC=%#x xFSC=%#lx ESR_EL2=%#lx\n",
 			kvm_vcpu_trap_get_class(vcpu),
 			(unsigned long)kvm_vcpu_trap_get_fault(vcpu),
@@ -1365,7 +1394,49 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 
 	idx = srcu_read_lock(&vcpu->kvm->srcu);
 
-	gfn = fault_ipa >> PAGE_SHIFT;
+	/*
+	 * We may have faulted on a shadow stage 2 page table if we are
+	 * running a nested guest.  In this case, we have to resolve the L2
+	 * IPA to the L1 IPA first, before knowing what kind of memory should
+	 * back the L1 IPA.
+	 *
+	 * If the shadow stage 2 page table walk faults, then we simply inject
+	 * this to the guest and carry on.
+	 */
+	if (kvm_is_shadow_s2_fault(vcpu)) {
+		u32 esr;
+
+		ret = kvm_walk_nested_s2(vcpu, fault_ipa, &nested_trans);
+		esr = kvm_s2_trans_esr(&nested_trans);
+		if (esr)
+			kvm_inject_s2_fault(vcpu, esr);
+		if (ret)
+			goto out_unlock;
+
+		ret = kvm_s2_handle_perm_fault(vcpu, &nested_trans);
+		esr = kvm_s2_trans_esr(&nested_trans);
+		if (esr)
+			kvm_inject_s2_fault(vcpu, esr);
+		if (ret)
+			goto out_unlock;
+
+		ipa = kvm_s2_trans_output(&nested_trans);
+	} else {
+		nested_trans = (struct kvm_s2_trans) {
+			/*
+			 * Default to RWX so that we don't filter
+			 * anything while evaluating the permissions.
+			 */
+			.writable	= true,
+			.readable	= true,
+			.upper_attr	= 0,
+			.output		= ipa,
+			.level		= kvm_vcpu_trap_get_fault_level(vcpu),
+			.esr		= kvm_vcpu_get_esr(vcpu),
+		};
+	}
+
+	gfn = ipa >> PAGE_SHIFT;
 	memslot = gfn_to_memslot(vcpu->kvm, gfn);
 	hva = gfn_to_hva_memslot_prot(memslot, gfn, &writable);
 	write_fault = kvm_is_write_fault(vcpu);
@@ -1409,13 +1480,13 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 		 * faulting VA. This is always 12 bits, irrespective
 		 * of the page size.
 		 */
-		fault_ipa |= kvm_vcpu_get_hfar(vcpu) & ((1 << 12) - 1);
-		ret = io_mem_abort(vcpu, fault_ipa);
+		ipa |= kvm_vcpu_get_hfar(vcpu) & ((1 << 12) - 1);
+		ret = io_mem_abort(vcpu, ipa);
 		goto out_unlock;
 	}
 
 	/* Userspace should not be able to register out-of-bounds IPAs */
-	VM_BUG_ON(fault_ipa >= kvm_phys_size(vcpu->kvm));
+	VM_BUG_ON(ipa >= kvm_phys_size(vcpu->kvm));
 
 	if (fault_status == FSC_ACCESS) {
 		handle_access_fault(vcpu, fault_ipa);
@@ -1423,7 +1494,8 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 		goto out_unlock;
 	}
 
-	ret = user_mem_abort(vcpu, fault_ipa, memslot, hva, fault_status);
+	ret = user_mem_abort(vcpu, fault_ipa, &nested_trans,
+			     memslot, hva, fault_status);
 	if (ret == 0)
 		ret = 1;
 out:
diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
index c2a99b672368..0a9708f776fc 100644
--- a/arch/arm64/kvm/nested.c
+++ b/arch/arm64/kvm/nested.c
@@ -108,6 +108,15 @@ static u32 compute_fsc(int level, u32 fsc)
 	return fsc | (level & 0x3);
 }
 
+static int esr_s2_fault(struct kvm_vcpu *vcpu, int level, u32 fsc)
+{
+	u32 esr;
+
+	esr = kvm_vcpu_get_esr(vcpu) & ~ESR_ELx_FSC;
+	esr |= compute_fsc(level, fsc);
+	return esr;
+}
+
 static int check_base_s2_limits(struct s2_walk_info *wi,
 				int level, int input_size, int stride)
 {
@@ -457,6 +466,45 @@ void kvm_vcpu_put_hw_mmu(struct kvm_vcpu *vcpu)
 	}
 }
 
+/*
+ * Returns non-zero if permission fault is handled by injecting it to the next
+ * level hypervisor.
+ */
+int kvm_s2_handle_perm_fault(struct kvm_vcpu *vcpu, struct kvm_s2_trans *trans)
+{
+	unsigned long fault_status = kvm_vcpu_trap_get_fault_type(vcpu);
+	bool forward_fault = false;
+
+	trans->esr = 0;
+
+	if (fault_status != FSC_PERM)
+		return 0;
+
+	if (kvm_vcpu_trap_is_iabt(vcpu)) {
+		forward_fault = (trans->upper_attr & BIT(54));
+	} else {
+		bool write_fault = kvm_is_write_fault(vcpu);
+
+		forward_fault = ((write_fault && !trans->writable) ||
+				 (!write_fault && !trans->readable));
+	}
+
+	if (forward_fault) {
+		trans->esr = esr_s2_fault(vcpu, trans->level, ESR_ELx_FSC_PERM);
+		return 1;
+	}
+
+	return 0;
+}
+
+int kvm_inject_s2_fault(struct kvm_vcpu *vcpu, u64 esr_el2)
+{
+	vcpu_write_sys_reg(vcpu, vcpu->arch.fault.far_el2, FAR_EL2);
+	vcpu_write_sys_reg(vcpu, vcpu->arch.fault.hpfar_el2, HPFAR_EL2);
+
+	return kvm_inject_nested_sync(vcpu, esr_el2);
+}
+
 /*
  * Inject wfx to the virtual EL2 if this is not from the virtual EL2 and
  * the virtual HCR_EL2.TWX is set. Otherwise, let the host hypervisor
-- 
2.30.2


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 378+ messages in thread

* [PATCH v6 37/64] KVM: arm64: nv: Restrict S2 RD/WR permissions to match the guest's
  2022-01-28 12:18 ` Marc Zyngier
  (?)
@ 2022-01-28 12:18   ` Marc Zyngier
  -1 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:18 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Jintack Lim, Haibo Xu,
	Ganapatrao Kulkarni, Chase Conklin, Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

When mapping a page in a shadow stage-2, special care must be taken
not to make the mapping more permissive than the guest's own stage-2
(e.g. a writable or readable page when the guest has not granted that
permission).
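
As a rough illustration (standalone sketch only; the struct and helper
names are made up and not part of the patch), the shadow permissions
are the logical AND of what the host would allow and what the guest's
own stage-2 grants:

#include <stdbool.h>
#include <stdio.h>

struct perms { bool r, w, x; };

/* Never be more permissive than the guest's stage-2 */
static struct perms shadow_perms(struct perms host, struct perms guest_s2)
{
	struct perms p = {
		.r = host.r && guest_s2.r,
		.w = host.w && guest_s2.w,
		.x = host.x && guest_s2.x,
	};
	return p;
}

int main(void)
{
	struct perms host  = { .r = true, .w = true,  .x = true };
	struct perms guest = { .r = true, .w = false, .x = true };
	struct perms s = shadow_perms(host, guest);

	/* prints: shadow: r=1 w=0 x=1 */
	printf("shadow: r=%d w=%d x=%d\n", s.r, s.w, s.x);
	return 0;
}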

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_nested.h | 15 +++++++++++++++
 arch/arm64/kvm/mmu.c                | 14 +++++++++++++-
 arch/arm64/kvm/nested.c             |  2 +-
 3 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
index 4fad4d3848ce..f4b846d09d86 100644
--- a/arch/arm64/include/asm/kvm_nested.h
+++ b/arch/arm64/include/asm/kvm_nested.h
@@ -97,6 +97,21 @@ static inline u32 kvm_s2_trans_esr(struct kvm_s2_trans *trans)
 	return trans->esr;
 }
 
+static inline bool kvm_s2_trans_readable(struct kvm_s2_trans *trans)
+{
+	return trans->readable;
+}
+
+static inline bool kvm_s2_trans_writable(struct kvm_s2_trans *trans)
+{
+	return trans->writable;
+}
+
+static inline bool kvm_s2_trans_executable(struct kvm_s2_trans *trans)
+{
+	return !(trans->upper_attr & BIT(54));
+}
+
 extern int kvm_walk_nested_s2(struct kvm_vcpu *vcpu, phys_addr_t gipa,
 			      struct kvm_s2_trans *result);
 
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 36f7ecb4f81b..7c56e1522d3c 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1247,6 +1247,17 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	if (exec_fault && device)
 		return -ENOEXEC;
 
+	/*
+	 * Potentially reduce shadow S2 permissions to match the guest's own
+	 * S2. For exec faults, we'd only reach this point if the guest
+	 * actually allowed it (see kvm_s2_handle_perm_fault).
+	 */
+	if (kvm_is_shadow_s2_fault(vcpu)) {
+		writable &= kvm_s2_trans_writable(nested);
+		if (!kvm_s2_trans_readable(nested))
+			prot &= ~KVM_PGTABLE_PROT_R;
+	}
+
 	spin_lock(&kvm->mmu_lock);
 	pgt = vcpu->arch.hw_mmu->pgt;
 	if (mmu_notifier_retry(kvm, mmu_seq))
@@ -1285,7 +1296,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 
 	if (device)
 		prot |= KVM_PGTABLE_PROT_DEVICE;
-	else if (cpus_have_const_cap(ARM64_HAS_CACHE_DIC))
+	else if (cpus_have_const_cap(ARM64_HAS_CACHE_DIC) &&
+		 kvm_s2_trans_executable(nested))
 		prot |= KVM_PGTABLE_PROT_X;
 
 	/*
diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
index 0a9708f776fc..a74ffb1d2064 100644
--- a/arch/arm64/kvm/nested.c
+++ b/arch/arm64/kvm/nested.c
@@ -481,7 +481,7 @@ int kvm_s2_handle_perm_fault(struct kvm_vcpu *vcpu, struct kvm_s2_trans *trans)
 		return 0;
 
 	if (kvm_vcpu_trap_is_iabt(vcpu)) {
-		forward_fault = (trans->upper_attr & BIT(54));
+		forward_fault = !kvm_s2_trans_executable(trans);
 	} else {
 		bool write_fault = kvm_is_write_fault(vcpu);
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 378+ messages in thread

* [PATCH v6 38/64] KVM: arm64: nv: Unmap/flush shadow stage 2 page tables
  2022-01-28 12:18 ` Marc Zyngier
  (?)
@ 2022-01-28 12:18   ` Marc Zyngier
  -1 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:18 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Jintack Lim, Haibo Xu,
	Ganapatrao Kulkarni, Chase Conklin, Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

From: Christoffer Dall <christoffer.dall@linaro.org>

Unmap/flush shadow stage 2 page tables for the nested VMs as well as the
stage 2 page table for the guest hypervisor.

Note: A bunch of the code in mmu.c relating to MMU notifiers is
currently dealt with in an extremely abrupt way, for example by clearing
out an entire shadow stage-2 table. This will be handled in a more
efficient way using the reverse mapping feature in a later version of
the patch series.
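
The interim pattern can be sketched as follows (standalone toy model
with invented names, not the kernel code): on any host-side change,
visit every valid shadow stage-2 context and drop or write-protect the
whole 0..kvm_phys_size() range, since there is no reverse map yet to
target individual entries:

#include <stdio.h>

#define NR_SHADOW_MMUS	4

struct shadow_mmu {
	int valid;
	unsigned long mapped_pages;
};

static struct shadow_mmu shadow_mmus[NR_SHADOW_MMUS];

/* Coarse invalidation: clear every valid shadow stage-2 entirely */
static void clear_all_shadow_s2(void)
{
	int i;

	for (i = 0; i < NR_SHADOW_MMUS; i++) {
		if (!shadow_mmus[i].valid)
			continue;
		shadow_mmus[i].mapped_pages = 0;
	}
}

int main(void)
{
	int i;

	shadow_mmus[0] = (struct shadow_mmu){ .valid = 1, .mapped_pages = 128 };
	shadow_mmus[2] = (struct shadow_mmu){ .valid = 1, .mapped_pages = 64 };

	clear_all_shadow_s2();	/* e.g. from an MMU notifier callback */

	for (i = 0; i < NR_SHADOW_MMUS; i++)
		printf("mmu %d: valid=%d pages=%lu\n",
		       i, shadow_mmus[i].valid, shadow_mmus[i].mapped_pages);
	return 0;
}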

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_mmu.h    |  3 +++
 arch/arm64/include/asm/kvm_nested.h |  3 +++
 arch/arm64/kvm/mmu.c                | 31 +++++++++++++++++++----
 arch/arm64/kvm/nested.c             | 39 +++++++++++++++++++++++++++++
 4 files changed, 71 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 0750d022bbf8..afad4a27a6f2 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -160,6 +160,8 @@ int create_hyp_io_mappings(phys_addr_t phys_addr, size_t size,
 			   void __iomem **haddr);
 int create_hyp_exec_mappings(phys_addr_t phys_addr, size_t size,
 			     void **haddr);
+void kvm_stage2_flush_range(struct kvm_s2_mmu *mmu,
+			    phys_addr_t addr, phys_addr_t end);
 void free_hyp_pgds(void);
 
 void kvm_unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size);
@@ -168,6 +170,7 @@ int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu);
 void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu);
 int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
 			  phys_addr_t pa, unsigned long size, bool writable);
+void kvm_stage2_wp_range(struct kvm_s2_mmu *mmu, phys_addr_t addr, phys_addr_t end);
 
 int kvm_handle_guest_abort(struct kvm_vcpu *vcpu);
 
diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
index f4b846d09d86..8915bead0633 100644
--- a/arch/arm64/include/asm/kvm_nested.h
+++ b/arch/arm64/include/asm/kvm_nested.h
@@ -118,6 +118,9 @@ extern int kvm_walk_nested_s2(struct kvm_vcpu *vcpu, phys_addr_t gipa,
 extern int kvm_s2_handle_perm_fault(struct kvm_vcpu *vcpu,
 				    struct kvm_s2_trans *trans);
 extern int kvm_inject_s2_fault(struct kvm_vcpu *vcpu, u64 esr_el2);
+extern void kvm_nested_s2_wp(struct kvm *kvm);
+extern void kvm_nested_s2_clear(struct kvm *kvm);
+extern void kvm_nested_s2_flush(struct kvm *kvm);
 int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe);
 extern bool __forward_traps(struct kvm_vcpu *vcpu, unsigned int reg,
 			    u64 control_bit);
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 7c56e1522d3c..b9b11be65009 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -190,13 +190,20 @@ void kvm_unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size)
 	__unmap_stage2_range(mmu, start, size, true);
 }
 
+void kvm_stage2_flush_range(struct kvm_s2_mmu *mmu,
+			    phys_addr_t addr, phys_addr_t end)
+{
+	stage2_apply_range_resched(kvm_s2_mmu_to_kvm(mmu), addr, end, kvm_pgtable_stage2_flush);
+}
+
 static void stage2_flush_memslot(struct kvm *kvm,
 				 struct kvm_memory_slot *memslot)
 {
 	phys_addr_t addr = memslot->base_gfn << PAGE_SHIFT;
 	phys_addr_t end = addr + PAGE_SIZE * memslot->npages;
+	struct kvm_s2_mmu *mmu = &kvm->arch.mmu;
 
-	stage2_apply_range_resched(kvm, addr, end, kvm_pgtable_stage2_flush);
+	kvm_stage2_flush_range(mmu, addr, end);
 }
 
 /**
@@ -219,6 +226,8 @@ static void stage2_flush_vm(struct kvm *kvm)
 	kvm_for_each_memslot(memslot, bkt, slots)
 		stage2_flush_memslot(kvm, memslot);
 
+	kvm_nested_s2_flush(kvm);
+
 	spin_unlock(&kvm->mmu_lock);
 	srcu_read_unlock(&kvm->srcu, idx);
 }
@@ -742,6 +751,8 @@ void stage2_unmap_vm(struct kvm *kvm)
 	kvm_for_each_memslot(memslot, bkt, slots)
 		stage2_unmap_memslot(kvm, memslot);
 
+	kvm_nested_s2_clear(kvm);
+
 	spin_unlock(&kvm->mmu_lock);
 	mmap_read_unlock(current->mm);
 	srcu_read_unlock(&kvm->srcu, idx);
@@ -814,12 +825,12 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
 }
 
 /**
- * stage2_wp_range() - write protect stage2 memory region range
+ * kvm_stage2_wp_range() - write protect stage2 memory region range
  * @mmu:        The KVM stage-2 MMU pointer
  * @addr:	Start address of range
  * @end:	End address of range
  */
-static void stage2_wp_range(struct kvm_s2_mmu *mmu, phys_addr_t addr, phys_addr_t end)
+void kvm_stage2_wp_range(struct kvm_s2_mmu *mmu, phys_addr_t addr, phys_addr_t end)
 {
 	struct kvm *kvm = kvm_s2_mmu_to_kvm(mmu);
 	stage2_apply_range_resched(kvm, addr, end, kvm_pgtable_stage2_wrprotect);
@@ -851,7 +862,8 @@ static void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot)
 	end = (memslot->base_gfn + memslot->npages) << PAGE_SHIFT;
 
 	spin_lock(&kvm->mmu_lock);
-	stage2_wp_range(&kvm->arch.mmu, start, end);
+	kvm_stage2_wp_range(&kvm->arch.mmu, start, end);
+	kvm_nested_s2_wp(kvm);
 	spin_unlock(&kvm->mmu_lock);
 	kvm_flush_remote_tlbs(kvm);
 }
@@ -875,7 +887,7 @@ static void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
 	phys_addr_t start = (base_gfn +  __ffs(mask)) << PAGE_SHIFT;
 	phys_addr_t end = (base_gfn + __fls(mask) + 1) << PAGE_SHIFT;
 
-	stage2_wp_range(&kvm->arch.mmu, start, end);
+	kvm_stage2_wp_range(&kvm->arch.mmu, start, end);
 }
 
 /*
@@ -890,6 +902,7 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
 		gfn_t gfn_offset, unsigned long mask)
 {
 	kvm_mmu_write_protect_pt_masked(kvm, slot, gfn_offset, mask);
+	kvm_nested_s2_wp(kvm);
 }
 
 static void kvm_send_hwpoison_signal(unsigned long address, short lsb)
@@ -1529,6 +1542,7 @@ bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
 			     (range->end - range->start) << PAGE_SHIFT,
 			     range->may_block);
 
+	kvm_nested_s2_clear(kvm);
 	return false;
 }
 
@@ -1560,6 +1574,7 @@ bool kvm_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
 			       PAGE_SIZE, __pfn_to_phys(pfn),
 			       KVM_PGTABLE_PROT_R, NULL);
 
+	kvm_nested_s2_clear(kvm);
 	return false;
 }
 
@@ -1578,6 +1593,11 @@ bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
 					range->start << PAGE_SHIFT);
 	pte = __pte(kpte);
 	return pte_valid(pte) && pte_young(pte);
+
+	/*
+	 * TODO: Handle nested_mmu structures here using the reverse mapping in
+	 * a later version of patch series.
+	 */
 }
 
 bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
@@ -1789,6 +1809,7 @@ void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
 
 	spin_lock(&kvm->mmu_lock);
 	kvm_unmap_stage2_range(&kvm->arch.mmu, gpa, size);
+	kvm_nested_s2_clear(kvm);
 	spin_unlock(&kvm->mmu_lock);
 }
 
diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
index a74ffb1d2064..b39af4d87787 100644
--- a/arch/arm64/kvm/nested.c
+++ b/arch/arm64/kvm/nested.c
@@ -505,6 +505,45 @@ int kvm_inject_s2_fault(struct kvm_vcpu *vcpu, u64 esr_el2)
 	return kvm_inject_nested_sync(vcpu, esr_el2);
 }
 
+/* expects kvm->mmu_lock to be held */
+void kvm_nested_s2_wp(struct kvm *kvm)
+{
+	int i;
+
+	for (i = 0; i < kvm->arch.nested_mmus_size; i++) {
+		struct kvm_s2_mmu *mmu = &kvm->arch.nested_mmus[i];
+
+		if (kvm_s2_mmu_valid(mmu))
+			kvm_stage2_wp_range(mmu, 0, kvm_phys_size(kvm));
+	}
+}
+
+/* expects kvm->mmu_lock to be held */
+void kvm_nested_s2_clear(struct kvm *kvm)
+{
+	int i;
+
+	for (i = 0; i < kvm->arch.nested_mmus_size; i++) {
+		struct kvm_s2_mmu *mmu = &kvm->arch.nested_mmus[i];
+
+		if (kvm_s2_mmu_valid(mmu))
+			kvm_unmap_stage2_range(mmu, 0, kvm_phys_size(kvm));
+	}
+}
+
+/* expects kvm->mmu_lock to be held */
+void kvm_nested_s2_flush(struct kvm *kvm)
+{
+	int i;
+
+	for (i = 0; i < kvm->arch.nested_mmus_size; i++) {
+		struct kvm_s2_mmu *mmu = &kvm->arch.nested_mmus[i];
+
+		if (kvm_s2_mmu_valid(mmu))
+			kvm_stage2_flush_range(mmu, 0, kvm_phys_size(kvm));
+	}
+}
+
 /*
  * Inject wfx to the virtual EL2 if this is not from the virtual EL2 and
  * the virtual HCR_EL2.TWX is set. Otherwise, let the host hypervisor
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 378+ messages in thread

* [PATCH v6 39/64] KVM: arm64: nv: Set a handler for the system instruction traps
  2022-01-28 12:18 ` Marc Zyngier
  (?)
@ 2022-01-28 12:18   ` Marc Zyngier
  -1 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:18 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Jintack Lim, Haibo Xu,
	Ganapatrao Kulkarni, Chase Conklin, Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

From: Jintack Lim <jintack.lim@linaro.org>

When the HCR_EL2.NV bit is set, execution of the EL2 translation regime
address translation instructions and TLB maintenance instructions is
trapped to EL2. In addition, execution of the EL1 translation regime
address translation instructions and TLB maintenance instructions that
are only accessible from EL2 and above is trapped to EL2. In these
cases, ESR_EL2.EC will be set to 0x18.

Rework the system instruction emulation framework to handle potentially
all system instruction traps other than MSR/MRS instructions. Those
system instructions would be AT and TLBI instructions controlled by
HCR_EL2.NV, AT, and TTLB bits.
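
The resulting dispatch can be pictured with a small standalone sketch
(illustration only; the struct and function names are invented and do
not match the kernel's signatures): traps with Op0 == 1 go to the
system instruction table, everything else stays on the MRS/MSR
register emulation path:

#include <stdio.h>

struct sys_params {
	unsigned int op0;
	int is_write;
};

static int emulate_sys_instr(struct sys_params *p)
{
	printf("system instruction (AT/TLBI/DC class), Op0=%u\n", p->op0);
	return 1;
}

static int emulate_sys_reg(struct sys_params *p)
{
	printf("MRS/MSR access, Op0=%u, %s\n", p->op0,
	       p->is_write ? "write" : "read");
	return 1;
}

static int handle_sys(struct sys_params *p)
{
	if (p->op0 == 1)
		return emulate_sys_instr(p);
	return emulate_sys_reg(p);
}

int main(void)
{
	struct sys_params tlbi = { .op0 = 1, .is_write = 1 };
	struct sys_params mrs  = { .op0 = 3, .is_write = 0 };

	handle_sys(&tlbi);
	handle_sys(&mrs);
	return 0;
}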

Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
[maz: squashed two patches together, redispatched various bits around]
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_host.h |  4 +--
 arch/arm64/kvm/handle_exit.c      |  2 +-
 arch/arm64/kvm/sys_regs.c         | 48 +++++++++++++++++++++++++------
 3 files changed, 42 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index a15183d0e1bf..0b887364f994 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -407,7 +407,7 @@ struct kvm_vcpu_arch {
 	/*
 	 * Guest registers we preserve during guest debugging.
 	 *
-	 * These shadow registers are updated by the kvm_handle_sys_reg
+	 * These shadow registers are updated by the kvm_handle_sys
 	 * trap handler if the guest accesses or updates them while we
 	 * are using guest debug.
 	 */
@@ -724,7 +724,7 @@ int kvm_handle_cp14_32(struct kvm_vcpu *vcpu);
 int kvm_handle_cp14_64(struct kvm_vcpu *vcpu);
 int kvm_handle_cp15_32(struct kvm_vcpu *vcpu);
 int kvm_handle_cp15_64(struct kvm_vcpu *vcpu);
-int kvm_handle_sys_reg(struct kvm_vcpu *vcpu);
+int kvm_handle_sys(struct kvm_vcpu *vcpu);
 
 void kvm_reset_sys_regs(struct kvm_vcpu *vcpu);
 
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index 867de65eb766..d135fc7e6883 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -236,7 +236,7 @@ static exit_handle_fn arm_exit_handlers[] = {
 	[ESR_ELx_EC_SMC32]	= handle_smc,
 	[ESR_ELx_EC_HVC64]	= handle_hvc,
 	[ESR_ELx_EC_SMC64]	= handle_smc,
-	[ESR_ELx_EC_SYS64]	= kvm_handle_sys_reg,
+	[ESR_ELx_EC_SYS64]	= kvm_handle_sys,
 	[ESR_ELx_EC_SVE]	= handle_sve,
 	[ESR_ELx_EC_ERET]	= kvm_handle_eret,
 	[ESR_ELx_EC_IABT_LOW]	= kvm_handle_guest_abort,
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 5e8876177ce6..f669618f966b 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1771,10 +1771,6 @@ static bool access_spsr_el2(struct kvm_vcpu *vcpu,
  * more demanding guest...
  */
 static const struct sys_reg_desc sys_reg_descs[] = {
-	{ SYS_DESC(SYS_DC_ISW), access_dcsw },
-	{ SYS_DESC(SYS_DC_CSW), access_dcsw },
-	{ SYS_DESC(SYS_DC_CISW), access_dcsw },
-
 	DBG_BCR_BVR_WCR_WVR_EL1(0),
 	DBG_BCR_BVR_WCR_WVR_EL1(1),
 	{ SYS_DESC(SYS_MDCCINT_EL1), trap_debug_regs, reset_val, MDCCINT_EL1, 0 },
@@ -2240,6 +2236,14 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 	EL2_REG(SP_EL2, NULL, reset_unknown, 0),
 };
 
+#define SYS_INSN_TO_DESC(insn, access_fn, forward_fn)	\
+	{ SYS_DESC((insn)), (access_fn), NULL, 0, 0, NULL, NULL, (forward_fn) }
+static struct sys_reg_desc sys_insn_descs[] = {
+	{ SYS_DESC(SYS_DC_ISW), access_dcsw },
+	{ SYS_DESC(SYS_DC_CSW), access_dcsw },
+	{ SYS_DESC(SYS_DC_CISW), access_dcsw },
+};
+
 static bool trap_dbgdidr(struct kvm_vcpu *vcpu,
 			struct sys_reg_params *p,
 			const struct sys_reg_desc *r)
@@ -2786,6 +2790,24 @@ static int emulate_sys_reg(struct kvm_vcpu *vcpu,
 	return 1;
 }
 
+static int emulate_sys_instr(struct kvm_vcpu *vcpu, struct sys_reg_params *p)
+{
+	const struct sys_reg_desc *r;
+
+	/* Search from the system instruction table. */
+	r = find_reg(p, sys_insn_descs, ARRAY_SIZE(sys_insn_descs));
+
+	if (likely(r)) {
+		perform_access(vcpu, p, r);
+	} else {
+		kvm_err("Unsupported guest sys instruction at: %lx\n",
+			*vcpu_pc(vcpu));
+		print_sys_reg_instr(p);
+		kvm_inject_undefined(vcpu);
+	}
+	return 1;
+}
+
 /**
  * kvm_reset_sys_regs - sets system registers to reset value
  * @vcpu: The VCPU pointer
@@ -2803,10 +2825,11 @@ void kvm_reset_sys_regs(struct kvm_vcpu *vcpu)
 }
 
 /**
- * kvm_handle_sys_reg -- handles a mrs/msr trap on a guest sys_reg access
+ * kvm_handle_sys-- handles a system instruction or mrs/msr instruction trap
+		    on a guest execution
  * @vcpu: The VCPU pointer
  */
-int kvm_handle_sys_reg(struct kvm_vcpu *vcpu)
+int kvm_handle_sys(struct kvm_vcpu *vcpu)
 {
 	struct sys_reg_params params;
 	unsigned long esr = kvm_vcpu_get_esr(vcpu);
@@ -2818,10 +2841,16 @@ int kvm_handle_sys_reg(struct kvm_vcpu *vcpu)
 	params = esr_sys64_to_params(esr);
 	params.regval = vcpu_get_reg(vcpu, Rt);
 
-	ret = emulate_sys_reg(vcpu, &params);
+	if (params.Op0 == 1) {
+		/* System instructions */
+		ret = emulate_sys_instr(vcpu, &params);
+	} else {
+		/* MRS/MSR instructions */
+		ret = emulate_sys_reg(vcpu, &params);
+		if (!params.is_write)
+			vcpu_set_reg(vcpu, Rt, params.regval);
+	}
 
-	if (!params.is_write)
-		vcpu_set_reg(vcpu, Rt, params.regval);
 	return ret;
 }
 
@@ -3237,6 +3266,7 @@ void kvm_sys_reg_table_init(void)
 	BUG_ON(check_sysreg_table(cp15_regs, ARRAY_SIZE(cp15_regs), true));
 	BUG_ON(check_sysreg_table(cp15_64_regs, ARRAY_SIZE(cp15_64_regs), true));
 	BUG_ON(check_sysreg_table(invariant_sys_regs, ARRAY_SIZE(invariant_sys_regs), false));
+	BUG_ON(check_sysreg_table(sys_insn_descs, ARRAY_SIZE(sys_insn_descs), false));
 
 	/* We abuse the reset function to overwrite the table itself. */
 	for (i = 0; i < ARRAY_SIZE(invariant_sys_regs); i++)
-- 
2.30.2



* [PATCH v6 39/64] KVM: arm64: nv: Set a handler for the system instruction traps
@ 2022-01-28 12:18   ` Marc Zyngier
  0 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:18 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: kernel-team, Andre Przywara, Christoffer Dall, Chase Conklin,
	Russell King (Oracle),
	mihai.carabas, Ganapatrao Kulkarni

From: Jintack Lim <jintack.lim@linaro.org>

When the HCR_EL2.NV bit is set, execution of the EL2 translation regime
address translation instructions and TLB maintenance instructions is
trapped to EL2. In addition, execution of the EL1 translation regime
address translation instructions and TLB maintenance instructions that
are only accessible from EL2 and above is trapped to EL2. In these
cases, ESR_EL2.EC is set to 0x18.

Rework the system instruction emulation framework so that it can handle
potentially all system instruction traps other than MSR/MRS accesses.
For now, the instructions of interest are the AT and TLBI instructions
controlled by the HCR_EL2.NV, AT, and TTLB bits.
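
For reference, the resulting dispatch in kvm_handle_sys() (added below)
boils down to the following condensed sketch (tracing trimmed):

	int kvm_handle_sys(struct kvm_vcpu *vcpu)
	{
		unsigned long esr = kvm_vcpu_get_esr(vcpu);
		int Rt = kvm_vcpu_sys_get_rt(vcpu);
		struct sys_reg_params params;
		int ret;

		params = esr_sys64_to_params(esr);
		params.regval = vcpu_get_reg(vcpu, Rt);

		if (params.Op0 == 1) {
			/* Op0 == 1: system instructions (AT, TLBI, DC, ...) */
			ret = emulate_sys_instr(vcpu, &params);
		} else {
			/* anything else is an MRS/MSR system register access */
			ret = emulate_sys_reg(vcpu, &params);
			if (!params.is_write)
				vcpu_set_reg(vcpu, Rt, params.regval);
		}

		return ret;
	}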

Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
[maz: squashed two patches together, redispatched various bits around]
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_host.h |  4 +--
 arch/arm64/kvm/handle_exit.c      |  2 +-
 arch/arm64/kvm/sys_regs.c         | 48 +++++++++++++++++++++++++------
 3 files changed, 42 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index a15183d0e1bf..0b887364f994 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -407,7 +407,7 @@ struct kvm_vcpu_arch {
 	/*
 	 * Guest registers we preserve during guest debugging.
 	 *
-	 * These shadow registers are updated by the kvm_handle_sys_reg
+	 * These shadow registers are updated by the kvm_handle_sys
 	 * trap handler if the guest accesses or updates them while we
 	 * are using guest debug.
 	 */
@@ -724,7 +724,7 @@ int kvm_handle_cp14_32(struct kvm_vcpu *vcpu);
 int kvm_handle_cp14_64(struct kvm_vcpu *vcpu);
 int kvm_handle_cp15_32(struct kvm_vcpu *vcpu);
 int kvm_handle_cp15_64(struct kvm_vcpu *vcpu);
-int kvm_handle_sys_reg(struct kvm_vcpu *vcpu);
+int kvm_handle_sys(struct kvm_vcpu *vcpu);
 
 void kvm_reset_sys_regs(struct kvm_vcpu *vcpu);
 
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index 867de65eb766..d135fc7e6883 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -236,7 +236,7 @@ static exit_handle_fn arm_exit_handlers[] = {
 	[ESR_ELx_EC_SMC32]	= handle_smc,
 	[ESR_ELx_EC_HVC64]	= handle_hvc,
 	[ESR_ELx_EC_SMC64]	= handle_smc,
-	[ESR_ELx_EC_SYS64]	= kvm_handle_sys_reg,
+	[ESR_ELx_EC_SYS64]	= kvm_handle_sys,
 	[ESR_ELx_EC_SVE]	= handle_sve,
 	[ESR_ELx_EC_ERET]	= kvm_handle_eret,
 	[ESR_ELx_EC_IABT_LOW]	= kvm_handle_guest_abort,
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 5e8876177ce6..f669618f966b 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1771,10 +1771,6 @@ static bool access_spsr_el2(struct kvm_vcpu *vcpu,
  * more demanding guest...
  */
 static const struct sys_reg_desc sys_reg_descs[] = {
-	{ SYS_DESC(SYS_DC_ISW), access_dcsw },
-	{ SYS_DESC(SYS_DC_CSW), access_dcsw },
-	{ SYS_DESC(SYS_DC_CISW), access_dcsw },
-
 	DBG_BCR_BVR_WCR_WVR_EL1(0),
 	DBG_BCR_BVR_WCR_WVR_EL1(1),
 	{ SYS_DESC(SYS_MDCCINT_EL1), trap_debug_regs, reset_val, MDCCINT_EL1, 0 },
@@ -2240,6 +2236,14 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 	EL2_REG(SP_EL2, NULL, reset_unknown, 0),
 };
 
+#define SYS_INSN_TO_DESC(insn, access_fn, forward_fn)	\
+	{ SYS_DESC((insn)), (access_fn), NULL, 0, 0, NULL, NULL, (forward_fn) }
+static struct sys_reg_desc sys_insn_descs[] = {
+	{ SYS_DESC(SYS_DC_ISW), access_dcsw },
+	{ SYS_DESC(SYS_DC_CSW), access_dcsw },
+	{ SYS_DESC(SYS_DC_CISW), access_dcsw },
+};
+
 static bool trap_dbgdidr(struct kvm_vcpu *vcpu,
 			struct sys_reg_params *p,
 			const struct sys_reg_desc *r)
@@ -2786,6 +2790,24 @@ static int emulate_sys_reg(struct kvm_vcpu *vcpu,
 	return 1;
 }
 
+static int emulate_sys_instr(struct kvm_vcpu *vcpu, struct sys_reg_params *p)
+{
+	const struct sys_reg_desc *r;
+
+	/* Search the system instruction table. */
+	r = find_reg(p, sys_insn_descs, ARRAY_SIZE(sys_insn_descs));
+
+	if (likely(r)) {
+		perform_access(vcpu, p, r);
+	} else {
+		kvm_err("Unsupported guest sys instruction at: %lx\n",
+			*vcpu_pc(vcpu));
+		print_sys_reg_instr(p);
+		kvm_inject_undefined(vcpu);
+	}
+	return 1;
+}
+
 /**
  * kvm_reset_sys_regs - sets system registers to reset value
  * @vcpu: The VCPU pointer
@@ -2803,10 +2825,11 @@ void kvm_reset_sys_regs(struct kvm_vcpu *vcpu)
 }
 
 /**
- * kvm_handle_sys_reg -- handles a mrs/msr trap on a guest sys_reg access
+ * kvm_handle_sys -- handles a system instruction or mrs/msr instruction
+ *		      trap taken during guest execution
  * @vcpu: The VCPU pointer
  */
-int kvm_handle_sys_reg(struct kvm_vcpu *vcpu)
+int kvm_handle_sys(struct kvm_vcpu *vcpu)
 {
 	struct sys_reg_params params;
 	unsigned long esr = kvm_vcpu_get_esr(vcpu);
@@ -2818,10 +2841,16 @@ int kvm_handle_sys_reg(struct kvm_vcpu *vcpu)
 	params = esr_sys64_to_params(esr);
 	params.regval = vcpu_get_reg(vcpu, Rt);
 
-	ret = emulate_sys_reg(vcpu, &params);
+	if (params.Op0 == 1) {
+		/* System instructions */
+		ret = emulate_sys_instr(vcpu, &params);
+	} else {
+		/* MRS/MSR instructions */
+		ret = emulate_sys_reg(vcpu, &params);
+		if (!params.is_write)
+			vcpu_set_reg(vcpu, Rt, params.regval);
+	}
 
-	if (!params.is_write)
-		vcpu_set_reg(vcpu, Rt, params.regval);
 	return ret;
 }
 
@@ -3237,6 +3266,7 @@ void kvm_sys_reg_table_init(void)
 	BUG_ON(check_sysreg_table(cp15_regs, ARRAY_SIZE(cp15_regs), true));
 	BUG_ON(check_sysreg_table(cp15_64_regs, ARRAY_SIZE(cp15_64_regs), true));
 	BUG_ON(check_sysreg_table(invariant_sys_regs, ARRAY_SIZE(invariant_sys_regs), false));
+	BUG_ON(check_sysreg_table(sys_insn_descs, ARRAY_SIZE(sys_insn_descs), false));
 
 	/* We abuse the reset function to overwrite the table itself. */
 	for (i = 0; i < ARRAY_SIZE(invariant_sys_regs); i++)
-- 
2.30.2



* [PATCH v6 39/64] KVM: arm64: nv: Set a handler for the system instruction traps
@ 2022-01-28 12:18   ` Marc Zyngier
  0 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:18 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Jintack Lim, Haibo Xu,
	Ganapatrao Kulkarni, Chase Conklin, Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

From: Jintack Lim <jintack.lim@linaro.org>

When the HCR_EL2.NV bit is set, execution of the EL2 translation regime
address translation instructions and TLB maintenance instructions is
trapped to EL2. In addition, execution of the EL1 translation regime
address translation instructions and TLB maintenance instructions that
are only accessible from EL2 and above is trapped to EL2. In these
cases, ESR_EL2.EC is set to 0x18.

Rework the system instruction emulation framework so that it can handle
potentially all system instruction traps other than MSR/MRS accesses.
For now, the instructions of interest are the AT and TLBI instructions
controlled by the HCR_EL2.NV, AT, and TTLB bits.

Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
[maz: squashed two patches together, redispatched various bits around]
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_host.h |  4 +--
 arch/arm64/kvm/handle_exit.c      |  2 +-
 arch/arm64/kvm/sys_regs.c         | 48 +++++++++++++++++++++++++------
 3 files changed, 42 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index a15183d0e1bf..0b887364f994 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -407,7 +407,7 @@ struct kvm_vcpu_arch {
 	/*
 	 * Guest registers we preserve during guest debugging.
 	 *
-	 * These shadow registers are updated by the kvm_handle_sys_reg
+	 * These shadow registers are updated by the kvm_handle_sys
 	 * trap handler if the guest accesses or updates them while we
 	 * are using guest debug.
 	 */
@@ -724,7 +724,7 @@ int kvm_handle_cp14_32(struct kvm_vcpu *vcpu);
 int kvm_handle_cp14_64(struct kvm_vcpu *vcpu);
 int kvm_handle_cp15_32(struct kvm_vcpu *vcpu);
 int kvm_handle_cp15_64(struct kvm_vcpu *vcpu);
-int kvm_handle_sys_reg(struct kvm_vcpu *vcpu);
+int kvm_handle_sys(struct kvm_vcpu *vcpu);
 
 void kvm_reset_sys_regs(struct kvm_vcpu *vcpu);
 
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index 867de65eb766..d135fc7e6883 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -236,7 +236,7 @@ static exit_handle_fn arm_exit_handlers[] = {
 	[ESR_ELx_EC_SMC32]	= handle_smc,
 	[ESR_ELx_EC_HVC64]	= handle_hvc,
 	[ESR_ELx_EC_SMC64]	= handle_smc,
-	[ESR_ELx_EC_SYS64]	= kvm_handle_sys_reg,
+	[ESR_ELx_EC_SYS64]	= kvm_handle_sys,
 	[ESR_ELx_EC_SVE]	= handle_sve,
 	[ESR_ELx_EC_ERET]	= kvm_handle_eret,
 	[ESR_ELx_EC_IABT_LOW]	= kvm_handle_guest_abort,
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 5e8876177ce6..f669618f966b 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1771,10 +1771,6 @@ static bool access_spsr_el2(struct kvm_vcpu *vcpu,
  * more demanding guest...
  */
 static const struct sys_reg_desc sys_reg_descs[] = {
-	{ SYS_DESC(SYS_DC_ISW), access_dcsw },
-	{ SYS_DESC(SYS_DC_CSW), access_dcsw },
-	{ SYS_DESC(SYS_DC_CISW), access_dcsw },
-
 	DBG_BCR_BVR_WCR_WVR_EL1(0),
 	DBG_BCR_BVR_WCR_WVR_EL1(1),
 	{ SYS_DESC(SYS_MDCCINT_EL1), trap_debug_regs, reset_val, MDCCINT_EL1, 0 },
@@ -2240,6 +2236,14 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 	EL2_REG(SP_EL2, NULL, reset_unknown, 0),
 };
 
+#define SYS_INSN_TO_DESC(insn, access_fn, forward_fn)	\
+	{ SYS_DESC((insn)), (access_fn), NULL, 0, 0, NULL, NULL, (forward_fn) }
+static struct sys_reg_desc sys_insn_descs[] = {
+	{ SYS_DESC(SYS_DC_ISW), access_dcsw },
+	{ SYS_DESC(SYS_DC_CSW), access_dcsw },
+	{ SYS_DESC(SYS_DC_CISW), access_dcsw },
+};
+
 static bool trap_dbgdidr(struct kvm_vcpu *vcpu,
 			struct sys_reg_params *p,
 			const struct sys_reg_desc *r)
@@ -2786,6 +2790,24 @@ static int emulate_sys_reg(struct kvm_vcpu *vcpu,
 	return 1;
 }
 
+static int emulate_sys_instr(struct kvm_vcpu *vcpu, struct sys_reg_params *p)
+{
+	const struct sys_reg_desc *r;
+
+	/* Search the system instruction table. */
+	r = find_reg(p, sys_insn_descs, ARRAY_SIZE(sys_insn_descs));
+
+	if (likely(r)) {
+		perform_access(vcpu, p, r);
+	} else {
+		kvm_err("Unsupported guest sys instruction at: %lx\n",
+			*vcpu_pc(vcpu));
+		print_sys_reg_instr(p);
+		kvm_inject_undefined(vcpu);
+	}
+	return 1;
+}
+
 /**
  * kvm_reset_sys_regs - sets system registers to reset value
  * @vcpu: The VCPU pointer
@@ -2803,10 +2825,11 @@ void kvm_reset_sys_regs(struct kvm_vcpu *vcpu)
 }
 
 /**
- * kvm_handle_sys_reg -- handles a mrs/msr trap on a guest sys_reg access
+ * kvm_handle_sys -- handles a system instruction or mrs/msr instruction
+ *		      trap taken during guest execution
  * @vcpu: The VCPU pointer
  */
-int kvm_handle_sys_reg(struct kvm_vcpu *vcpu)
+int kvm_handle_sys(struct kvm_vcpu *vcpu)
 {
 	struct sys_reg_params params;
 	unsigned long esr = kvm_vcpu_get_esr(vcpu);
@@ -2818,10 +2841,16 @@ int kvm_handle_sys_reg(struct kvm_vcpu *vcpu)
 	params = esr_sys64_to_params(esr);
 	params.regval = vcpu_get_reg(vcpu, Rt);
 
-	ret = emulate_sys_reg(vcpu, &params);
+	if (params.Op0 == 1) {
+		/* System instructions */
+		ret = emulate_sys_instr(vcpu, &params);
+	} else {
+		/* MRS/MSR instructions */
+		ret = emulate_sys_reg(vcpu, &params);
+		if (!params.is_write)
+			vcpu_set_reg(vcpu, Rt, params.regval);
+	}
 
-	if (!params.is_write)
-		vcpu_set_reg(vcpu, Rt, params.regval);
 	return ret;
 }
 
@@ -3237,6 +3266,7 @@ void kvm_sys_reg_table_init(void)
 	BUG_ON(check_sysreg_table(cp15_regs, ARRAY_SIZE(cp15_regs), true));
 	BUG_ON(check_sysreg_table(cp15_64_regs, ARRAY_SIZE(cp15_64_regs), true));
 	BUG_ON(check_sysreg_table(invariant_sys_regs, ARRAY_SIZE(invariant_sys_regs), false));
+	BUG_ON(check_sysreg_table(sys_insn_descs, ARRAY_SIZE(sys_insn_descs), false));
 
 	/* We abuse the reset function to overwrite the table itself. */
 	for (i = 0; i < ARRAY_SIZE(invariant_sys_regs); i++)
-- 
2.30.2




* [PATCH v6 40/64] KVM: arm64: nv: Trap and emulate AT instructions from virtual EL2
@ 2022-01-28 12:18   ` Marc Zyngier
  -1 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:18 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Jintack Lim, Haibo Xu,
	Ganapatrao Kulkarni, Chase Conklin, Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

From: Jintack Lim <jintack.lim@linaro.org>

When supporting nested virtualization, AT instructions executed by a
guest hypervisor must be trapped and emulated by the host hypervisor:
untrapped AT instructions operating on S1E1 would use the wrong
translation regime (the one used to emulate virtual EL2 in EL1 instead
of virtual EL1), and AT instructions operating on S12 do not work from
EL1 at all.

This patch does several things.

1. List and define all AT system instructions to emulate and document
the emulation design.

2. Implement AT instruction handling logic in EL2. This will be used to
emulate AT instructions executed in the virtual EL2.

AT instruction emulation works by loading the proper processor
context, which depends on the trapped instruction and the virtual
HCR_EL2, into the EL1 virtual memory control registers and then
executing the AT instruction. Note that ctxt->hw_sys_regs is expected
to hold the proper processor context before the handling functions
(__kvm_at_s1e01()/__kvm_at_s1e2()) implemented in this patch are called.
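
Schematically, the helpers added in at.c follow a save, switch,
execute, restore pattern. Below is a condensed, illustrative sketch
(the function name is made up; the real __kvm_at_s1e01() further down
additionally takes the mmu_lock, looks up the shadow stage-2 MMU,
switches VTTBR_EL2/HCR_EL2, and handles the E2H+TGE shortcut and the
PAN variants):

	static void kvm_at_s1e1r_sketch(struct kvm_vcpu *vcpu, u64 vaddr)
	{
		struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
		struct mmu_config config;

		/* 1. Stash the host translation state we are about to clobber */
		__mmu_config_save(&config);

		/* 2. Load the guest hypervisor's vEL1 translation regime */
		write_sysreg_el1(ctxt_sys_reg(ctxt, TTBR0_EL1), SYS_TTBR0);
		write_sysreg_el1(ctxt_sys_reg(ctxt, TTBR1_EL1), SYS_TTBR1);
		write_sysreg_el1(ctxt_sys_reg(ctxt, TCR_EL1),   SYS_TCR);
		write_sysreg_el1(ctxt_sys_reg(ctxt, SCTLR_EL1), SYS_SCTLR);
		isb();

		/* 3. Let the hardware perform the stage-1 walk */
		asm volatile("at s1e1r, %0" : : "r" (vaddr));
		isb();

		/* 4. The result lands in PAR_EL1; reflect it into the vcpu */
		ctxt_sys_reg(ctxt, PAR_EL1) = read_sysreg(par_el1);

		/* 5. Put the host state back */
		__mmu_config_restore(&config);
	}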

3. Emulate AT S1E[01] instructions by issuing the same instructions in
EL2. We set the physical EL1 registers and the NV and NV1 bits as
described in the AT instruction emulation overview.

4. Emulate AT S12E[01] instructions in two steps: First, do the stage-1
translation by reusing the existing AT emulation functions. Second, do
the stage-2 translation by walking the guest hypervisor's stage-2 page
table in software. Record the translation result in PAR_EL1 (see the
condensed sketch after this list).

5. Emulate AT S1E2 instructions by issuing the corresponding S1E1
instructions in EL2. We set the physical EL1 registers and the HCR_EL2
register as described in the AT instruction emulation overview.

6. Forward system instruction traps to the virtual EL2 if the
corresponding virtual AT bit is set in the virtual HCR_EL2.
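
The two-step S12E[01] sequence from point 4 then boils down to the
following (condensed from the handle_s12() helper added below; the
stage-2 abort and permission-check paths are elided):

	u64 par, ipa, va = p->regval;
	struct kvm_s2_trans out = {};

	/* Stage 1: reuse the S1E[01] emulation, the result lands in PAR_EL1 */
	__kvm_at_s1e01(vcpu, op, va);
	par = vcpu_read_sys_reg(vcpu, PAR_EL1);
	if (par & 1)
		return true;	/* stage-1 abort, PAR_EL1 is already set */

	/* Stage 2: walk the guest hypervisor's stage-2 tables in software */
	ipa = (par & GENMASK_ULL(47, 12)) | (va & GENMASK_ULL(11, 0));
	if (kvm_walk_nested_s2(vcpu, ipa, &out) < 0)
		return false;

	/* Success: encode the output address and attributes into PAR_EL1 */
	vcpu_write_sys_reg(vcpu, setup_par_completed(vcpu, &out), PAR_EL1);
	return true;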

  [ Much logic above has been reworked by Marc Zyngier ]

Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
---
 arch/arm64/include/asm/kvm_arm.h |   2 +
 arch/arm64/include/asm/kvm_asm.h |   2 +
 arch/arm64/include/asm/sysreg.h  |  17 +++
 arch/arm64/kvm/Makefile          |   2 +-
 arch/arm64/kvm/at.c              | 219 +++++++++++++++++++++++++++++
 arch/arm64/kvm/hyp/vhe/switch.c  |  13 +-
 arch/arm64/kvm/sys_regs.c        | 229 ++++++++++++++++++++++++++++++-
 7 files changed, 478 insertions(+), 6 deletions(-)
 create mode 100644 arch/arm64/kvm/at.c

diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index 3675879b53c6..aa3bdce1b166 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -20,6 +20,7 @@
 #define HCR_AMVOFFEN	(UL(1) << 51)
 #define HCR_FIEN	(UL(1) << 47)
 #define HCR_FWB		(UL(1) << 46)
+#define HCR_AT		(UL(1) << 44)
 #define HCR_NV1		(UL(1) << 43)
 #define HCR_NV		(UL(1) << 42)
 #define HCR_API		(UL(1) << 41)
@@ -118,6 +119,7 @@
 #define VTCR_EL2_TG0_16K	TCR_TG0_16K
 #define VTCR_EL2_TG0_64K	TCR_TG0_64K
 #define VTCR_EL2_SH0_MASK	TCR_SH0_MASK
+#define VTCR_EL2_SH0_SHIFT	TCR_SH0_SHIFT
 #define VTCR_EL2_SH0_INNER	TCR_SH0_INNER
 #define VTCR_EL2_ORGN0_MASK	TCR_ORGN0_MASK
 #define VTCR_EL2_ORGN0_WBWA	TCR_ORGN0_WBWA
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index d5b0386ef765..e22861ece3c3 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -208,6 +208,8 @@ extern void __kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu, phys_addr_t ipa,
 extern void __kvm_tlb_flush_vmid(struct kvm_s2_mmu *mmu);
 
 extern void __kvm_timer_set_cntvoff(u64 cntvoff);
+extern void __kvm_at_s1e01(struct kvm_vcpu *vcpu, u32 op, u64 vaddr);
+extern void __kvm_at_s1e2(struct kvm_vcpu *vcpu, u32 op, u64 vaddr);
 
 extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
 
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 61c990e5591a..ea17d1eabfc5 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -658,6 +658,23 @@
 
 #define SYS_SP_EL2			sys_reg(3, 6,  4, 1, 0)
 
+/* AT instructions */
+#define AT_Op0 1
+#define AT_CRn 7
+
+#define OP_AT_S1E1R	sys_insn(AT_Op0, 0, AT_CRn, 8, 0)
+#define OP_AT_S1E1W	sys_insn(AT_Op0, 0, AT_CRn, 8, 1)
+#define OP_AT_S1E0R	sys_insn(AT_Op0, 0, AT_CRn, 8, 2)
+#define OP_AT_S1E0W	sys_insn(AT_Op0, 0, AT_CRn, 8, 3)
+#define OP_AT_S1E1RP	sys_insn(AT_Op0, 0, AT_CRn, 9, 0)
+#define OP_AT_S1E1WP	sys_insn(AT_Op0, 0, AT_CRn, 9, 1)
+#define OP_AT_S1E2R	sys_insn(AT_Op0, 4, AT_CRn, 8, 0)
+#define OP_AT_S1E2W	sys_insn(AT_Op0, 4, AT_CRn, 8, 1)
+#define OP_AT_S12E1R	sys_insn(AT_Op0, 4, AT_CRn, 8, 4)
+#define OP_AT_S12E1W	sys_insn(AT_Op0, 4, AT_CRn, 8, 5)
+#define OP_AT_S12E0R	sys_insn(AT_Op0, 4, AT_CRn, 8, 6)
+#define OP_AT_S12E0W	sys_insn(AT_Op0, 4, AT_CRn, 8, 7)
+
 /* Common SCTLR_ELx flags. */
 #define SCTLR_ELx_DSSBS	(BIT(44))
 #define SCTLR_ELx_ATA	(BIT(43))
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index dbaf42ff65f1..b800dcbd157f 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -14,7 +14,7 @@ kvm-y += arm.o mmu.o mmio.o psci.o hypercalls.o pvtime.o \
 	 inject_fault.o va_layout.o handle_exit.o \
 	 guest.o debug.o reset.o sys_regs.o \
 	 vgic-sys-reg-v3.o fpsimd.o pmu.o pkvm.o \
-	 arch_timer.o trng.o emulate-nested.o nested.o \
+	 arch_timer.o trng.o emulate-nested.o nested.o at.o \
 	 vgic/vgic.o vgic/vgic-init.o \
 	 vgic/vgic-irqfd.o vgic/vgic-v2.o \
 	 vgic/vgic-v3.o vgic/vgic-v4.o \
diff --git a/arch/arm64/kvm/at.c b/arch/arm64/kvm/at.c
new file mode 100644
index 000000000000..574c664e984b
--- /dev/null
+++ b/arch/arm64/kvm/at.c
@@ -0,0 +1,219 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2017 - Linaro Ltd
+ * Author: Jintack Lim <jintack.lim@linaro.org>
+ */
+
+#include <asm/kvm_hyp.h>
+#include <asm/kvm_mmu.h>
+
+struct mmu_config {
+	u64	ttbr0;
+	u64	ttbr1;
+	u64	tcr;
+	u64	sctlr;
+	u64	vttbr;
+	u64	vtcr;
+	u64	hcr;
+};
+
+static void __mmu_config_save(struct mmu_config *config)
+{
+	config->ttbr0	= read_sysreg_el1(SYS_TTBR0);
+	config->ttbr1	= read_sysreg_el1(SYS_TTBR1);
+	config->tcr	= read_sysreg_el1(SYS_TCR);
+	config->sctlr	= read_sysreg_el1(SYS_SCTLR);
+	config->vttbr	= read_sysreg(vttbr_el2);
+	config->vtcr	= read_sysreg(vtcr_el2);
+	config->hcr	= read_sysreg(hcr_el2);
+}
+
+static void __mmu_config_restore(struct mmu_config *config)
+{
+	write_sysreg_el1(config->ttbr0,	SYS_TTBR0);
+	write_sysreg_el1(config->ttbr1,	SYS_TTBR1);
+	write_sysreg_el1(config->tcr,	SYS_TCR);
+	write_sysreg_el1(config->sctlr,	SYS_SCTLR);
+	write_sysreg(config->vttbr,	vttbr_el2);
+	write_sysreg(config->vtcr,	vtcr_el2);
+	write_sysreg(config->hcr,	hcr_el2);
+
+	isb();
+}
+
+void __kvm_at_s1e01(struct kvm_vcpu *vcpu, u32 op, u64 vaddr)
+{
+	struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
+	struct mmu_config config;
+	struct kvm_s2_mmu *mmu;
+
+	spin_lock(&vcpu->kvm->mmu_lock);
+
+	/*
+	 * If HCR_EL2.{E2H,TGE} == {1,1}, the MMU context is already
+	 * the right one (as we trapped from vEL2).
+	 */
+	if (vcpu_el2_e2h_is_set(vcpu) && vcpu_el2_tge_is_set(vcpu))
+		goto skip_mmu_switch;
+
+	/*
+	 * FIXME: Obtaining the S2 MMU for a guest is horribly
+	 * racy, and we may not find it (evicted by another vcpu, for
+	 * example).
+	 */
+	mmu = lookup_s2_mmu(vcpu->kvm,
+			    vcpu_read_sys_reg(vcpu, VTTBR_EL2),
+			    vcpu_read_sys_reg(vcpu, HCR_EL2));
+
+	if (WARN_ON(!mmu))
+		goto out;
+
+	/* We've trapped, so everything is live on the CPU. */
+	__mmu_config_save(&config);
+
+	write_sysreg_el1(ctxt_sys_reg(ctxt, TTBR0_EL1),	SYS_TTBR0);
+	write_sysreg_el1(ctxt_sys_reg(ctxt, TTBR1_EL1),	SYS_TTBR1);
+	write_sysreg_el1(ctxt_sys_reg(ctxt, TCR_EL1),	SYS_TCR);
+	write_sysreg_el1(ctxt_sys_reg(ctxt, SCTLR_EL1),	SYS_SCTLR);
+	write_sysreg(kvm_get_vttbr(mmu),		vttbr_el2);
+	/*
+	 * REVISIT: do we need anything from the guest's VTCR_EL2? It
+	 * looks like keeping the host's configuration is the right
+	 * thing to do at this stage (and we could avoid saving/restoring
+	 * it). Keep the host's version for now.
+	 */
+	write_sysreg((config.hcr & ~HCR_TGE) | HCR_VM,	hcr_el2);
+
+	isb();
+
+skip_mmu_switch:
+
+	switch (op) {
+	case OP_AT_S1E1R:
+	case OP_AT_S1E1RP:
+		asm volatile("at s1e1r, %0" : : "r" (vaddr));
+		break;
+	case OP_AT_S1E1W:
+	case OP_AT_S1E1WP:
+		asm volatile("at s1e1w, %0" : : "r" (vaddr));
+		break;
+	case OP_AT_S1E0R:
+		asm volatile("at s1e0r, %0" : : "r" (vaddr));
+		break;
+	case OP_AT_S1E0W:
+		asm volatile("at s1e0w, %0" : : "r" (vaddr));
+		break;
+	default:
+		WARN_ON_ONCE(1);
+		break;
+	}
+
+	isb();
+
+	ctxt_sys_reg(ctxt, PAR_EL1) = read_sysreg(par_el1);
+
+	/*
+	 * Failed? let's leave the building now.
+	 *
+	 * FIXME: how about a failed translation because the shadow S2
+	 * wasn't populated? We may need to perform a SW PTW,
+	 * populating our shadow S2 and retry the instruction.
+	 */
+	if (ctxt_sys_reg(ctxt, PAR_EL1) & 1)
+		goto nopan;
+
+	/* No PAN? No problem. */
+	if (!(*vcpu_cpsr(vcpu) & PSR_PAN_BIT))
+		goto nopan;
+
+	/*
+	 * For PAN-involved AT operations, perform the same
+	 * translation, using EL0 this time.
+	 */
+	switch (op) {
+	case OP_AT_S1E1RP:
+		asm volatile("at s1e0r, %0" : : "r" (vaddr));
+		break;
+	case OP_AT_S1E1WP:
+		asm volatile("at s1e0w, %0" : : "r" (vaddr));
+		break;
+	default:
+		goto nopan;
+	}
+
+	/*
+	 * If the EL0 translation has succeeded, we need to pretend
+	 * the AT operation has failed, as the PAN setting forbids
+	 * such a translation.
+	 *
+	 * FIXME: we hardcode a Level-3 permission fault. We really
+	 * should return the real fault level.
+	 */
+	if (!(read_sysreg(par_el1) & 1))
+		ctxt_sys_reg(ctxt, PAR_EL1) = 0x1f;
+
+nopan:
+	if (!(vcpu_el2_e2h_is_set(vcpu) && vcpu_el2_tge_is_set(vcpu)))
+		__mmu_config_restore(&config);
+
+out:
+	spin_unlock(&vcpu->kvm->mmu_lock);
+}
+
+void __kvm_at_s1e2(struct kvm_vcpu *vcpu, u32 op, u64 vaddr)
+{
+	struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
+	struct mmu_config config;
+	struct kvm_s2_mmu *mmu;
+	u64 val;
+
+	spin_lock(&vcpu->kvm->mmu_lock);
+
+	mmu = &vcpu->kvm->arch.mmu;
+
+	/* We've trapped, so everything is live on the CPU. */
+	__mmu_config_save(&config);
+
+	if (vcpu_el2_e2h_is_set(vcpu)) {
+		write_sysreg_el1(ctxt_sys_reg(ctxt, TTBR0_EL2),	SYS_TTBR0);
+		write_sysreg_el1(ctxt_sys_reg(ctxt, TTBR1_EL2),	SYS_TTBR1);
+		write_sysreg_el1(ctxt_sys_reg(ctxt, TCR_EL2),	SYS_TCR);
+		write_sysreg_el1(ctxt_sys_reg(ctxt, SCTLR_EL2),	SYS_SCTLR);
+
+		val = config.hcr;
+	} else {
+		write_sysreg_el1(ctxt_sys_reg(ctxt, TTBR0_EL2),	SYS_TTBR0);
+		val = translate_tcr_el2_to_tcr_el1(ctxt_sys_reg(ctxt, TCR_EL2));
+		write_sysreg_el1(val, SYS_TCR);
+		val = translate_sctlr_el2_to_sctlr_el1(ctxt_sys_reg(ctxt, SCTLR_EL2));
+		write_sysreg_el1(val, SYS_SCTLR);
+
+		val = config.hcr | HCR_NV | HCR_NV1;
+	}
+
+	write_sysreg(kvm_get_vttbr(mmu),		vttbr_el2);
+	/* FIXME: write S2 MMU VTCR_EL2? */
+	write_sysreg((val & ~HCR_TGE) | HCR_VM,		hcr_el2);
+
+	isb();
+
+	switch (op) {
+	case OP_AT_S1E2R:
+		asm volatile("at s1e1r, %0" : : "r" (vaddr));
+		break;
+	case OP_AT_S1E2W:
+		asm volatile("at s1e1w, %0" : : "r" (vaddr));
+		break;
+	default:
+		WARN_ON_ONCE(1);
+		break;
+	}
+
+	isb();
+
+	/* FIXME: handle failed translation due to shadow S2 */
+	ctxt_sys_reg(ctxt, PAR_EL1) = read_sysreg(par_el1);
+
+	__mmu_config_restore(&config);
+	spin_unlock(&vcpu->kvm->mmu_lock);
+}
diff --git a/arch/arm64/kvm/hyp/vhe/switch.c b/arch/arm64/kvm/hyp/vhe/switch.c
index 28845f907cfc..b7790d3c4122 100644
--- a/arch/arm64/kvm/hyp/vhe/switch.c
+++ b/arch/arm64/kvm/hyp/vhe/switch.c
@@ -41,9 +41,10 @@ static void __activate_traps(struct kvm_vcpu *vcpu)
 		if (!vcpu_el2_e2h_is_set(vcpu)) {
 			/*
 			 * For a guest hypervisor on v8.0, trap and emulate
-			 * the EL1 virtual memory control register accesses.
+			 * the EL1 virtual memory control register accesses
+			 * as well as the AT S1 operations.
 			 */
-			hcr |= HCR_TVM | HCR_TRVM | HCR_NV1;
+			hcr |= HCR_TVM | HCR_TRVM | HCR_AT | HCR_NV1;
 		} else {
 			/*
 			 * For a guest hypervisor on v8.1 (VHE), allow to
@@ -68,6 +69,14 @@ static void __activate_traps(struct kvm_vcpu *vcpu)
 			hcr &= ~HCR_TVM;
 
 			hcr |= vhcr_el2 & (HCR_TVM | HCR_TRVM);
+
+			/*
+			 * If we're using the EL1 translation regime
+			 * (TGE clear), then ensure that AT S1 ops are
+			 * trapped too.
+			 */
+			if (!vcpu_el2_tge_is_set(vcpu))
+				hcr |= HCR_AT;
 		}
 	}
 
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index f669618f966b..7be57e1b7019 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1704,7 +1704,6 @@ static bool access_sp_el1(struct kvm_vcpu *vcpu,
 	return true;
 }
 
-
 static bool access_elr(struct kvm_vcpu *vcpu,
 		       struct sys_reg_params *p,
 		       const struct sys_reg_desc *r)
@@ -2236,12 +2235,236 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 	EL2_REG(SP_EL2, NULL, reset_unknown, 0),
 };
 
-#define SYS_INSN_TO_DESC(insn, access_fn, forward_fn)	\
-	{ SYS_DESC((insn)), (access_fn), NULL, 0, 0, NULL, NULL, (forward_fn) }
+static bool handle_s1e01(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
+			 const struct sys_reg_desc *r)
+{
+	int sys_encoding = sys_insn(p->Op0, p->Op1, p->CRn, p->CRm, p->Op2);
+
+	if (vcpu_has_nv(vcpu) && forward_traps(vcpu, HCR_AT))
+		return false;
+
+	__kvm_at_s1e01(vcpu, sys_encoding, p->regval);
+
+	return true;
+}
+
+static bool handle_s1e2(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
+			const struct sys_reg_desc *r)
+{
+	int sys_encoding = sys_insn(p->Op0, p->Op1, p->CRn, p->CRm, p->Op2);
+
+	if (vcpu_has_nv(vcpu) && forward_nv_traps(vcpu))
+		return false;
+
+	__kvm_at_s1e2(vcpu, sys_encoding, p->regval);
+
+	return true;
+}
+
+static u64 setup_par_aborted(u32 esr)
+{
+	u64 par = 0;
+
+	/* S [9]: fault in the stage 2 translation */
+	par |= (1 << 9);
+	/* FST [6:1]: Fault status code  */
+	par |= (esr << 1);
+	/* F [0]: translation is aborted */
+	par |= 1;
+
+	return par;
+}
+
+static u64 setup_par_completed(struct kvm_vcpu *vcpu, struct kvm_s2_trans *out)
+{
+	u64 par, vtcr_sh0;
+
+	/* F [0]: Translation is completed successfully */
+	par = 0;
+	/* ATTR [63:56] */
+	par |= out->upper_attr;
+	/* PA [47:12] */
+	par |= out->output & GENMASK_ULL(11, 0);
+	/* RES1 [11] */
+	par |= (1UL << 11);
+	/* SH [8:7]: Shareability attribute */
+	vtcr_sh0 = vcpu_read_sys_reg(vcpu, VTCR_EL2) & VTCR_EL2_SH0_MASK;
+	par |= (vtcr_sh0 >> VTCR_EL2_SH0_SHIFT) << 7;
+
+	return par;
+}
+
+static bool handle_s12(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
+		       const struct sys_reg_desc *r, bool write)
+{
+	u64 par, va;
+	u32 esr, op;
+	phys_addr_t ipa;
+	struct kvm_s2_trans out;
+	int ret;
+
+	if (vcpu_has_nv(vcpu) && forward_nv_traps(vcpu))
+		return false;
+
+	/* Do the stage-1 translation */
+	va = p->regval;
+	op = sys_insn(p->Op0, p->Op1, p->CRn, p->CRm, p->Op2);
+	switch (op) {
+	case OP_AT_S12E1R:
+		op = OP_AT_S1E1R;
+		break;
+	case OP_AT_S12E1W:
+		op = OP_AT_S1E1W;
+		break;
+	case OP_AT_S12E0R:
+		op = OP_AT_S1E0R;
+		break;
+	case OP_AT_S12E0W:
+		op = OP_AT_S1E0W;
+		break;
+	default:
+		WARN_ON_ONCE(1);
+		return true;
+	}
+
+	__kvm_at_s1e01(vcpu, op, va);
+	par = vcpu_read_sys_reg(vcpu, PAR_EL1);
+	if (par & 1) {
+		/* The stage-1 translation aborted */
+		return true;
+	}
+
+	/* Do the stage-2 translation */
+	ipa = (par & GENMASK_ULL(47, 12)) | (va & GENMASK_ULL(11, 0));
+	out.esr = 0;
+	ret = kvm_walk_nested_s2(vcpu, ipa, &out);
+	if (ret < 0)
+		return false;
+
+	/* Check if the stage-2 PTW is aborted */
+	if (out.esr) {
+		esr = out.esr;
+		goto s2_trans_abort;
+	}
+
+	/* Check the access permission */
+	if ((!write && !out.readable) || (write && !out.writable)) {
+		esr = ESR_ELx_FSC_PERM;
+		esr |= out.level & 0x3;
+		goto s2_trans_abort;
+	}
+
+	vcpu_write_sys_reg(vcpu, setup_par_completed(vcpu, &out), PAR_EL1);
+	return true;
+
+s2_trans_abort:
+	vcpu_write_sys_reg(vcpu, setup_par_aborted(esr), PAR_EL1);
+	return true;
+}
+
+static bool handle_s12r(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
+			const struct sys_reg_desc *r)
+{
+	return handle_s12(vcpu, p, r, false);
+}
+
+static bool handle_s12w(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
+			const struct sys_reg_desc *r)
+{
+	return handle_s12(vcpu, p, r, true);
+}
+
+/*
+ * AT instruction emulation
+ *
+ * We emulate AT instructions executed in the virtual EL2.
+ * Basic strategy for the stage-1 translation emulation is to load proper
+ * context, which depends on the trapped instruction and the virtual HCR_EL2,
+ * to the EL1 virtual memory control registers and execute S1E[01] instructions
+ * in EL2. See below for more detail.
+ *
+ * For the stage-2 translation, which is necessary for S12E[01] emulation,
+ * we walk the guest hypervisor's stage-2 page table in software.
+ *
+ * The stage-1 translation emulations can be divided into two groups depending
+ * on the translation regime.
+ *
+ * 1. EL2 AT instructions: S1E2x
+ * +-----------------------------------------------------------------------+
+ * |                             |         Setting for the emulation       |
+ * | Virtual HCR_EL2.E2H on trap |-----------------------------------------+
+ * |                             | Phys EL1 regs | Phys NV, NV1 | Phys TGE |
+ * |-----------------------------------------------------------------------|
+ * |             0               |     vEL2      |    (1, 1)    |    0     |
+ * |             1               |     vEL2      |    (0, 0)    |    0     |
+ * +-----------------------------------------------------------------------+
+ *
+ * We emulate the EL2 AT instructions by loading virtual EL2 context
+ * to the EL1 virtual memory control registers and executing corresponding
+ * EL1 AT instructions.
+ *
+ * We set physical NV and NV1 bits to use EL2 page table format for non-VHE
+ * guest hypervisor (i.e. HCR_EL2.E2H == 0). As a VHE guest hypervisor uses the
+ * EL1 page table format, we don't set those bits.
+ *
+ * We should clear physical TGE bit not to use the EL2 translation regime when
+ * the host uses the VHE feature.
+ *
+ *
+ * 2. EL0/EL1 AT instructions: S1E[01]x, S12E1x
+ * +----------------------------------------------------------------------+
+ * |   Virtual HCR_EL2 on trap  |        Setting for the emulation        |
+ * |----------------------------------------------------------------------+
+ * | (vE2H, vTGE) | (vNV, vNV1) | Phys EL1 regs | Phys NV, NV1 | Phys TGE |
+ * |----------------------------------------------------------------------|
+ * |    (0, 0)*   |   (0, 0)    |      vEL1     |    (0, 0)    |    0     |
+ * |    (0, 0)    |   (1, 1)    |      vEL1     |    (1, 1)    |    0     |
+ * |    (1, 1)    |   (0, 0)    |      vEL2     |    (0, 0)    |    0     |
+ * |    (1, 1)    |   (1, 1)    |      vEL2     |    (1, 1)    |    0     |
+ * +----------------------------------------------------------------------+
+ *
+ * *For (0, 0) in the 'Virtual HCR_EL2 on trap' column, it actually means
+ *  (1, 1). Keep them (0, 0) just for the readability.
+ *
+ * We set physical EL1 virtual memory control registers depending on
+ * (vE2H, vTGE) pair. When the pair is (0, 0) where AT instructions are
+ * supposed to use EL0/EL1 translation regime, we load the EL1 registers with
+ * the virtual EL1 registers (i.e. EL1 registers from the guest hypervisor's
+ * point of view). When the pair is (1, 1), however, AT instructions are defined
+ * to apply EL2 translation regime. To emulate this behavior, we load the EL1
+ * registers with the virtual EL2 context. (i.e the shadow registers)
+ *
+ * We respect the virtual NV and NV1 bit for the emulation. When those bits are
+ * set, it means that a guest hypervisor would like to use EL2 page table format
+ * for the EL1 translation regime. We emulate this by setting the physical
+ * NV and NV1 bits.
+ */
+
+#define SYS_INSN(insn, access_fn)					\
+	{								\
+		SYS_DESC(OP_##insn),					\
+		.access = (access_fn),					\
+	}
+
 static struct sys_reg_desc sys_insn_descs[] = {
 	{ SYS_DESC(SYS_DC_ISW), access_dcsw },
+
+	SYS_INSN(AT_S1E1R, handle_s1e01),
+	SYS_INSN(AT_S1E1W, handle_s1e01),
+	SYS_INSN(AT_S1E0R, handle_s1e01),
+	SYS_INSN(AT_S1E0W, handle_s1e01),
+	SYS_INSN(AT_S1E1RP, handle_s1e01),
+	SYS_INSN(AT_S1E1WP, handle_s1e01),
+
 	{ SYS_DESC(SYS_DC_CSW), access_dcsw },
 	{ SYS_DESC(SYS_DC_CISW), access_dcsw },
+
+	SYS_INSN(AT_S1E2R, handle_s1e2),
+	SYS_INSN(AT_S1E2W, handle_s1e2),
+	SYS_INSN(AT_S12E1R, handle_s12r),
+	SYS_INSN(AT_S12E1W, handle_s12w),
+	SYS_INSN(AT_S12E0R, handle_s12r),
+	SYS_INSN(AT_S12E0W, handle_s12w),
 };
 
 static bool trap_dbgdidr(struct kvm_vcpu *vcpu,
-- 
2.30.2



* [PATCH v6 40/64] KVM: arm64: nv: Trap and emulate AT instructions from virtual EL2
@ 2022-01-28 12:18   ` Marc Zyngier
  0 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:18 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: kernel-team, Andre Przywara, Christoffer Dall, Chase Conklin,
	Russell King (Oracle),
	mihai.carabas, Ganapatrao Kulkarni

From: Jintack Lim <jintack.lim@linaro.org>

When supporting nested virtualization, AT instructions executed by a
guest hypervisor must be trapped and emulated by the host hypervisor:
untrapped AT instructions operating on S1E1 would use the wrong
translation regime (the one used to emulate virtual EL2 in EL1 instead
of virtual EL1), and AT instructions operating on S12 do not work from
EL1 at all.

This patch does several things.

1. List and define all AT system instructions to emulate and document
the emulation design.

2. Implement AT instruction handling logic in EL2. This will be used to
emulate AT instructions executed in the virtual EL2.

AT instruction emulation works by loading the proper processor
context, which depends on the trapped instruction and the virtual
HCR_EL2, into the EL1 virtual memory control registers and then
executing the AT instruction. Note that ctxt->hw_sys_regs is expected
to hold the proper processor context before the handling functions
(__kvm_at_s1e01()/__kvm_at_s1e2()) implemented in this patch are called.

3. Emulate AT S1E[01] instructions by issuing the same instructions in
EL2. We set the physical EL1 registers and the NV and NV1 bits as
described in the AT instruction emulation overview.

4. Emulate AT S12E[01] instructions in two steps: First, do the stage-1
translation by reusing the existing AT emulation functions. Second, do
the stage-2 translation by walking the guest hypervisor's stage-2 page
table in software. Record the translation result in PAR_EL1.

5. Emulate AT S1E2 instructions by issuing the corresponding S1E1
instructions in EL2. We set the physical EL1 registers and the HCR_EL2
register as described in the AT instruction emulation overview.

6. Forward system instruction traps to the virtual EL2 if the
corresponding virtual AT bit is set in the virtual HCR_EL2.

  [ Much logic above has been reworked by Marc Zyngier ]

Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
---
 arch/arm64/include/asm/kvm_arm.h |   2 +
 arch/arm64/include/asm/kvm_asm.h |   2 +
 arch/arm64/include/asm/sysreg.h  |  17 +++
 arch/arm64/kvm/Makefile          |   2 +-
 arch/arm64/kvm/at.c              | 219 +++++++++++++++++++++++++++++
 arch/arm64/kvm/hyp/vhe/switch.c  |  13 +-
 arch/arm64/kvm/sys_regs.c        | 229 ++++++++++++++++++++++++++++++-
 7 files changed, 478 insertions(+), 6 deletions(-)
 create mode 100644 arch/arm64/kvm/at.c

diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index 3675879b53c6..aa3bdce1b166 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -20,6 +20,7 @@
 #define HCR_AMVOFFEN	(UL(1) << 51)
 #define HCR_FIEN	(UL(1) << 47)
 #define HCR_FWB		(UL(1) << 46)
+#define HCR_AT		(UL(1) << 44)
 #define HCR_NV1		(UL(1) << 43)
 #define HCR_NV		(UL(1) << 42)
 #define HCR_API		(UL(1) << 41)
@@ -118,6 +119,7 @@
 #define VTCR_EL2_TG0_16K	TCR_TG0_16K
 #define VTCR_EL2_TG0_64K	TCR_TG0_64K
 #define VTCR_EL2_SH0_MASK	TCR_SH0_MASK
+#define VTCR_EL2_SH0_SHIFT	TCR_SH0_SHIFT
 #define VTCR_EL2_SH0_INNER	TCR_SH0_INNER
 #define VTCR_EL2_ORGN0_MASK	TCR_ORGN0_MASK
 #define VTCR_EL2_ORGN0_WBWA	TCR_ORGN0_WBWA
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index d5b0386ef765..e22861ece3c3 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -208,6 +208,8 @@ extern void __kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu, phys_addr_t ipa,
 extern void __kvm_tlb_flush_vmid(struct kvm_s2_mmu *mmu);
 
 extern void __kvm_timer_set_cntvoff(u64 cntvoff);
+extern void __kvm_at_s1e01(struct kvm_vcpu *vcpu, u32 op, u64 vaddr);
+extern void __kvm_at_s1e2(struct kvm_vcpu *vcpu, u32 op, u64 vaddr);
 
 extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
 
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 61c990e5591a..ea17d1eabfc5 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -658,6 +658,23 @@
 
 #define SYS_SP_EL2			sys_reg(3, 6,  4, 1, 0)
 
+/* AT instructions */
+#define AT_Op0 1
+#define AT_CRn 7
+
+#define OP_AT_S1E1R	sys_insn(AT_Op0, 0, AT_CRn, 8, 0)
+#define OP_AT_S1E1W	sys_insn(AT_Op0, 0, AT_CRn, 8, 1)
+#define OP_AT_S1E0R	sys_insn(AT_Op0, 0, AT_CRn, 8, 2)
+#define OP_AT_S1E0W	sys_insn(AT_Op0, 0, AT_CRn, 8, 3)
+#define OP_AT_S1E1RP	sys_insn(AT_Op0, 0, AT_CRn, 9, 0)
+#define OP_AT_S1E1WP	sys_insn(AT_Op0, 0, AT_CRn, 9, 1)
+#define OP_AT_S1E2R	sys_insn(AT_Op0, 4, AT_CRn, 8, 0)
+#define OP_AT_S1E2W	sys_insn(AT_Op0, 4, AT_CRn, 8, 1)
+#define OP_AT_S12E1R	sys_insn(AT_Op0, 4, AT_CRn, 8, 4)
+#define OP_AT_S12E1W	sys_insn(AT_Op0, 4, AT_CRn, 8, 5)
+#define OP_AT_S12E0R	sys_insn(AT_Op0, 4, AT_CRn, 8, 6)
+#define OP_AT_S12E0W	sys_insn(AT_Op0, 4, AT_CRn, 8, 7)
+
 /* Common SCTLR_ELx flags. */
 #define SCTLR_ELx_DSSBS	(BIT(44))
 #define SCTLR_ELx_ATA	(BIT(43))
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index dbaf42ff65f1..b800dcbd157f 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -14,7 +14,7 @@ kvm-y += arm.o mmu.o mmio.o psci.o hypercalls.o pvtime.o \
 	 inject_fault.o va_layout.o handle_exit.o \
 	 guest.o debug.o reset.o sys_regs.o \
 	 vgic-sys-reg-v3.o fpsimd.o pmu.o pkvm.o \
-	 arch_timer.o trng.o emulate-nested.o nested.o \
+	 arch_timer.o trng.o emulate-nested.o nested.o at.o \
 	 vgic/vgic.o vgic/vgic-init.o \
 	 vgic/vgic-irqfd.o vgic/vgic-v2.o \
 	 vgic/vgic-v3.o vgic/vgic-v4.o \
diff --git a/arch/arm64/kvm/at.c b/arch/arm64/kvm/at.c
new file mode 100644
index 000000000000..574c664e984b
--- /dev/null
+++ b/arch/arm64/kvm/at.c
@@ -0,0 +1,219 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2017 - Linaro Ltd
+ * Author: Jintack Lim <jintack.lim@linaro.org>
+ */
+
+#include <asm/kvm_hyp.h>
+#include <asm/kvm_mmu.h>
+
+struct mmu_config {
+	u64	ttbr0;
+	u64	ttbr1;
+	u64	tcr;
+	u64	sctlr;
+	u64	vttbr;
+	u64	vtcr;
+	u64	hcr;
+};
+
+static void __mmu_config_save(struct mmu_config *config)
+{
+	config->ttbr0	= read_sysreg_el1(SYS_TTBR0);
+	config->ttbr1	= read_sysreg_el1(SYS_TTBR1);
+	config->tcr	= read_sysreg_el1(SYS_TCR);
+	config->sctlr	= read_sysreg_el1(SYS_SCTLR);
+	config->vttbr	= read_sysreg(vttbr_el2);
+	config->vtcr	= read_sysreg(vtcr_el2);
+	config->hcr	= read_sysreg(hcr_el2);
+}
+
+static void __mmu_config_restore(struct mmu_config *config)
+{
+	write_sysreg_el1(config->ttbr0,	SYS_TTBR0);
+	write_sysreg_el1(config->ttbr1,	SYS_TTBR1);
+	write_sysreg_el1(config->tcr,	SYS_TCR);
+	write_sysreg_el1(config->sctlr,	SYS_SCTLR);
+	write_sysreg(config->vttbr,	vttbr_el2);
+	write_sysreg(config->vtcr,	vtcr_el2);
+	write_sysreg(config->hcr,	hcr_el2);
+
+	isb();
+}
+
+void __kvm_at_s1e01(struct kvm_vcpu *vcpu, u32 op, u64 vaddr)
+{
+	struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
+	struct mmu_config config;
+	struct kvm_s2_mmu *mmu;
+
+	spin_lock(&vcpu->kvm->mmu_lock);
+
+	/*
+	 * If HCR_EL2.{E2H,TGE} == {1,1}, the MMU context is already
+	 * the right one (as we trapped from vEL2).
+	 */
+	if (vcpu_el2_e2h_is_set(vcpu) && vcpu_el2_tge_is_set(vcpu))
+		goto skip_mmu_switch;
+
+	/*
+	 * FIXME: Obtaining the S2 MMU for a guest is horribly
+	 * racy, and we may not find it (evicted by another vcpu, for
+	 * example).
+	 */
+	mmu = lookup_s2_mmu(vcpu->kvm,
+			    vcpu_read_sys_reg(vcpu, VTTBR_EL2),
+			    vcpu_read_sys_reg(vcpu, HCR_EL2));
+
+	if (WARN_ON(!mmu))
+		goto out;
+
+	/* We've trapped, so everything is live on the CPU. */
+	__mmu_config_save(&config);
+
+	write_sysreg_el1(ctxt_sys_reg(ctxt, TTBR0_EL1),	SYS_TTBR0);
+	write_sysreg_el1(ctxt_sys_reg(ctxt, TTBR1_EL1),	SYS_TTBR1);
+	write_sysreg_el1(ctxt_sys_reg(ctxt, TCR_EL1),	SYS_TCR);
+	write_sysreg_el1(ctxt_sys_reg(ctxt, SCTLR_EL1),	SYS_SCTLR);
+	write_sysreg(kvm_get_vttbr(mmu),		vttbr_el2);
+	/*
+	 * REVISIT: do we need anything from the guest's VTCR_EL2? It
+	 * looks like keeping the host's configuration is the right
+	 * thing to do at this stage (and we could avoid saving/restoring
+	 * it). Keep the host's version for now.
+	 */
+	write_sysreg((config.hcr & ~HCR_TGE) | HCR_VM,	hcr_el2);
+
+	isb();
+
+skip_mmu_switch:
+
+	switch (op) {
+	case OP_AT_S1E1R:
+	case OP_AT_S1E1RP:
+		asm volatile("at s1e1r, %0" : : "r" (vaddr));
+		break;
+	case OP_AT_S1E1W:
+	case OP_AT_S1E1WP:
+		asm volatile("at s1e1w, %0" : : "r" (vaddr));
+		break;
+	case OP_AT_S1E0R:
+		asm volatile("at s1e0r, %0" : : "r" (vaddr));
+		break;
+	case OP_AT_S1E0W:
+		asm volatile("at s1e0w, %0" : : "r" (vaddr));
+		break;
+	default:
+		WARN_ON_ONCE(1);
+		break;
+	}
+
+	isb();
+
+	ctxt_sys_reg(ctxt, PAR_EL1) = read_sysreg(par_el1);
+
+	/*
+	 * Failed? let's leave the building now.
+	 *
+	 * FIXME: how about a failed translation because the shadow S2
+	 * wasn't populated? We may need to perform a SW PTW,
+	 * populating our shadow S2 and retry the instruction.
+	 */
+	if (ctxt_sys_reg(ctxt, PAR_EL1) & 1)
+		goto nopan;
+
+	/* No PAN? No problem. */
+	if (!(*vcpu_cpsr(vcpu) & PSR_PAN_BIT))
+		goto nopan;
+
+	/*
+	 * For PAN-involved AT operations, perform the same
+	 * translation, using EL0 this time.
+	 */
+	switch (op) {
+	case OP_AT_S1E1RP:
+		asm volatile("at s1e0r, %0" : : "r" (vaddr));
+		break;
+	case OP_AT_S1E1WP:
+		asm volatile("at s1e0w, %0" : : "r" (vaddr));
+		break;
+	default:
+		goto nopan;
+	}
+
+	/*
+	 * If the EL0 translation has succeeded, we need to pretend
+	 * the AT operation has failed, as the PAN setting forbids
+	 * such a translation.
+	 *
+	 * FIXME: we hardcode a Level-3 permission fault. We really
+	 * should return the real fault level.
+	 */
+	if (!(read_sysreg(par_el1) & 1))
+		ctxt_sys_reg(ctxt, PAR_EL1) = 0x1f;
+
+nopan:
+	if (!(vcpu_el2_e2h_is_set(vcpu) && vcpu_el2_tge_is_set(vcpu)))
+		__mmu_config_restore(&config);
+
+out:
+	spin_unlock(&vcpu->kvm->mmu_lock);
+}
+
+void __kvm_at_s1e2(struct kvm_vcpu *vcpu, u32 op, u64 vaddr)
+{
+	struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
+	struct mmu_config config;
+	struct kvm_s2_mmu *mmu;
+	u64 val;
+
+	spin_lock(&vcpu->kvm->mmu_lock);
+
+	mmu = &vcpu->kvm->arch.mmu;
+
+	/* We've trapped, so everything is live on the CPU. */
+	__mmu_config_save(&config);
+
+	if (vcpu_el2_e2h_is_set(vcpu)) {
+		write_sysreg_el1(ctxt_sys_reg(ctxt, TTBR0_EL2),	SYS_TTBR0);
+		write_sysreg_el1(ctxt_sys_reg(ctxt, TTBR1_EL2),	SYS_TTBR1);
+		write_sysreg_el1(ctxt_sys_reg(ctxt, TCR_EL2),	SYS_TCR);
+		write_sysreg_el1(ctxt_sys_reg(ctxt, SCTLR_EL2),	SYS_SCTLR);
+
+		val = config.hcr;
+	} else {
+		write_sysreg_el1(ctxt_sys_reg(ctxt, TTBR0_EL2),	SYS_TTBR0);
+		val = translate_tcr_el2_to_tcr_el1(ctxt_sys_reg(ctxt, TCR_EL2));
+		write_sysreg_el1(val, SYS_TCR);
+		val = translate_sctlr_el2_to_sctlr_el1(ctxt_sys_reg(ctxt, SCTLR_EL2));
+		write_sysreg_el1(val, SYS_SCTLR);
+
+		val = config.hcr | HCR_NV | HCR_NV1;
+	}
+
+	write_sysreg(kvm_get_vttbr(mmu),		vttbr_el2);
+	/* FIXME: write S2 MMU VTCR_EL2? */
+	write_sysreg((val & ~HCR_TGE) | HCR_VM,		hcr_el2);
+
+	isb();
+
+	switch (op) {
+	case OP_AT_S1E2R:
+		asm volatile("at s1e1r, %0" : : "r" (vaddr));
+		break;
+	case OP_AT_S1E2W:
+		asm volatile("at s1e1w, %0" : : "r" (vaddr));
+		break;
+	default:
+		WARN_ON_ONCE(1);
+		break;
+	}
+
+	isb();
+
+	/* FIXME: handle failed translation due to shadow S2 */
+	ctxt_sys_reg(ctxt, PAR_EL1) = read_sysreg(par_el1);
+
+	__mmu_config_restore(&config);
+	spin_unlock(&vcpu->kvm->mmu_lock);
+}
diff --git a/arch/arm64/kvm/hyp/vhe/switch.c b/arch/arm64/kvm/hyp/vhe/switch.c
index 28845f907cfc..b7790d3c4122 100644
--- a/arch/arm64/kvm/hyp/vhe/switch.c
+++ b/arch/arm64/kvm/hyp/vhe/switch.c
@@ -41,9 +41,10 @@ static void __activate_traps(struct kvm_vcpu *vcpu)
 		if (!vcpu_el2_e2h_is_set(vcpu)) {
 			/*
 			 * For a guest hypervisor on v8.0, trap and emulate
-			 * the EL1 virtual memory control register accesses.
+			 * the EL1 virtual memory control register accesses
+			 * as well as the AT S1 operations.
 			 */
-			hcr |= HCR_TVM | HCR_TRVM | HCR_NV1;
+			hcr |= HCR_TVM | HCR_TRVM | HCR_AT | HCR_NV1;
 		} else {
 			/*
 			 * For a guest hypervisor on v8.1 (VHE), allow to
@@ -68,6 +69,14 @@ static void __activate_traps(struct kvm_vcpu *vcpu)
 			hcr &= ~HCR_TVM;
 
 			hcr |= vhcr_el2 & (HCR_TVM | HCR_TRVM);
+
+			/*
+			 * If we're using the EL1 translation regime
+			 * (TGE clear), then ensure that AT S1 ops are
+			 * trapped too.
+			 */
+			if (!vcpu_el2_tge_is_set(vcpu))
+				hcr |= HCR_AT;
 		}
 	}
 
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index f669618f966b..7be57e1b7019 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1704,7 +1704,6 @@ static bool access_sp_el1(struct kvm_vcpu *vcpu,
 	return true;
 }
 
-
 static bool access_elr(struct kvm_vcpu *vcpu,
 		       struct sys_reg_params *p,
 		       const struct sys_reg_desc *r)
@@ -2236,12 +2235,236 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 	EL2_REG(SP_EL2, NULL, reset_unknown, 0),
 };
 
-#define SYS_INSN_TO_DESC(insn, access_fn, forward_fn)	\
-	{ SYS_DESC((insn)), (access_fn), NULL, 0, 0, NULL, NULL, (forward_fn) }
+static bool handle_s1e01(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
+			 const struct sys_reg_desc *r)
+{
+	int sys_encoding = sys_insn(p->Op0, p->Op1, p->CRn, p->CRm, p->Op2);
+
+	if (vcpu_has_nv(vcpu) && forward_traps(vcpu, HCR_AT))
+		return false;
+
+	__kvm_at_s1e01(vcpu, sys_encoding, p->regval);
+
+	return true;
+}
+
+static bool handle_s1e2(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
+			const struct sys_reg_desc *r)
+{
+	int sys_encoding = sys_insn(p->Op0, p->Op1, p->CRn, p->CRm, p->Op2);
+
+	if (vcpu_has_nv(vcpu) && forward_nv_traps(vcpu))
+		return false;
+
+	__kvm_at_s1e2(vcpu, sys_encoding, p->regval);
+
+	return true;
+}
+
+static u64 setup_par_aborted(u32 esr)
+{
+	u64 par = 0;
+
+	/* S [9]: fault in the stage 2 translation */
+	par |= (1 << 9);
+	/* FST [6:1]: Fault status code  */
+	par |= (esr << 1);
+	/* F [0]: translation is aborted */
+	par |= 1;
+
+	return par;
+}
+
+static u64 setup_par_completed(struct kvm_vcpu *vcpu, struct kvm_s2_trans *out)
+{
+	u64 par, vtcr_sh0;
+
+	/* F [0]: Translation is completed successfully */
+	par = 0;
+	/* ATTR [63:56] */
+	par |= out->upper_attr;
+	/* PA [47:12] */
+	par |= out->output & GENMASK_ULL(11, 0);
+	/* RES1 [11] */
+	par |= (1UL << 11);
+	/* SH [8:7]: Shareability attribute */
+	vtcr_sh0 = vcpu_read_sys_reg(vcpu, VTCR_EL2) & VTCR_EL2_SH0_MASK;
+	par |= (vtcr_sh0 >> VTCR_EL2_SH0_SHIFT) << 7;
+
+	return par;
+}
+
+static bool handle_s12(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
+		       const struct sys_reg_desc *r, bool write)
+{
+	u64 par, va;
+	u32 esr, op;
+	phys_addr_t ipa;
+	struct kvm_s2_trans out;
+	int ret;
+
+	if (vcpu_has_nv(vcpu) && forward_nv_traps(vcpu))
+		return false;
+
+	/* Do the stage-1 translation */
+	va = p->regval;
+	op = sys_insn(p->Op0, p->Op1, p->CRn, p->CRm, p->Op2);
+	switch (op) {
+	case OP_AT_S12E1R:
+		op = OP_AT_S1E1R;
+		break;
+	case OP_AT_S12E1W:
+		op = OP_AT_S1E1W;
+		break;
+	case OP_AT_S12E0R:
+		op = OP_AT_S1E0R;
+		break;
+	case OP_AT_S12E0W:
+		op = OP_AT_S1E0W;
+		break;
+	default:
+		WARN_ON_ONCE(1);
+		return true;
+	}
+
+	__kvm_at_s1e01(vcpu, op, va);
+	par = vcpu_read_sys_reg(vcpu, PAR_EL1);
+	if (par & 1) {
+		/* The stage-1 translation aborted */
+		return true;
+	}
+
+	/* Do the stage-2 translation */
+	ipa = (par & GENMASK_ULL(47, 12)) | (va & GENMASK_ULL(11, 0));
+	out.esr = 0;
+	ret = kvm_walk_nested_s2(vcpu, ipa, &out);
+	if (ret < 0)
+		return false;
+
+	/* Check if the stage-2 PTW is aborted */
+	if (out.esr) {
+		esr = out.esr;
+		goto s2_trans_abort;
+	}
+
+	/* Check the access permission */
+	if ((!write && !out.readable) || (write && !out.writable)) {
+		esr = ESR_ELx_FSC_PERM;
+		esr |= out.level & 0x3;
+		goto s2_trans_abort;
+	}
+
+	vcpu_write_sys_reg(vcpu, setup_par_completed(vcpu, &out), PAR_EL1);
+	return true;
+
+s2_trans_abort:
+	vcpu_write_sys_reg(vcpu, setup_par_aborted(esr), PAR_EL1);
+	return true;
+}
+
+static bool handle_s12r(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
+			const struct sys_reg_desc *r)
+{
+	return handle_s12(vcpu, p, r, false);
+}
+
+static bool handle_s12w(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
+			const struct sys_reg_desc *r)
+{
+	return handle_s12(vcpu, p, r, true);
+}
+
+/*
+ * AT instruction emulation
+ *
+ * We emulate AT instructions executed in the virtual EL2.
+ * Basic strategy for the stage-1 translation emulation is to load proper
+ * context, which depends on the trapped instruction and the virtual HCR_EL2,
+ * to the EL1 virtual memory control registers and execute S1E[01] instructions
+ * in EL2. See below for more detail.
+ *
+ * For the stage-2 translation, which is necessary for S12E[01] emulation,
+ * we walk the guest hypervisor's stage-2 page table in software.
+ *
+ * The stage-1 translation emulations can be divided into two groups depending
+ * on the translation regime.
+ *
+ * 1. EL2 AT instructions: S1E2x
+ * +-----------------------------------------------------------------------+
+ * |                             |         Setting for the emulation       |
+ * | Virtual HCR_EL2.E2H on trap |-----------------------------------------+
+ * |                             | Phys EL1 regs | Phys NV, NV1 | Phys TGE |
+ * |-----------------------------------------------------------------------|
+ * |             0               |     vEL2      |    (1, 1)    |    0     |
+ * |             1               |     vEL2      |    (0, 0)    |    0     |
+ * +-----------------------------------------------------------------------+
+ *
+ * We emulate the EL2 AT instructions by loading virtual EL2 context
+ * to the EL1 virtual memory control registers and executing corresponding
+ * EL1 AT instructions.
+ *
+ * We set physical NV and NV1 bits to use EL2 page table format for non-VHE
+ * guest hypervisor (i.e. HCR_EL2.E2H == 0). As a VHE guest hypervisor uses the
+ * EL1 page table format, we don't set those bits.
+ *
+ * We should clear physical TGE bit not to use the EL2 translation regime when
+ * the host uses the VHE feature.
+ *
+ *
+ * 2. EL0/EL1 AT instructions: S1E[01]x, S12E1x
+ * +----------------------------------------------------------------------+
+ * |   Virtual HCR_EL2 on trap  |        Setting for the emulation        |
+ * |----------------------------------------------------------------------+
+ * | (vE2H, vTGE) | (vNV, vNV1) | Phys EL1 regs | Phys NV, NV1 | Phys TGE |
+ * |----------------------------------------------------------------------|
+ * |    (0, 0)*   |   (0, 0)    |      vEL1     |    (0, 0)    |    0     |
+ * |    (0, 0)    |   (1, 1)    |      vEL1     |    (1, 1)    |    0     |
+ * |    (1, 1)    |   (0, 0)    |      vEL2     |    (0, 0)    |    0     |
+ * |    (1, 1)    |   (1, 1)    |      vEL2     |    (1, 1)    |    0     |
+ * +----------------------------------------------------------------------+
+ *
+ * *For (0, 0) in the 'Virtual HCR_EL2 on trap' column, it actually means
+ *  (1, 1). Keep them (0, 0) just for the readability.
+ *
+ * We set physical EL1 virtual memory control registers depending on
+ * (vE2H, vTGE) pair. When the pair is (0, 0) where AT instructions are
+ * supposed to use EL0/EL1 translation regime, we load the EL1 registers with
+ * the virtual EL1 registers (i.e. EL1 registers from the guest hypervisor's
+ * point of view). When the pair is (1, 1), however, AT instructions are defined
+ * to apply EL2 translation regime. To emulate this behavior, we load the EL1
+ * registers with the virtual EL2 context. (i.e the shadow registers)
+ *
+ * We respect the virtual NV and NV1 bit for the emulation. When those bits are
+ * set, it means that a guest hypervisor would like to use EL2 page table format
+ * for the EL1 translation regime. We emulate this by setting the physical
+ * NV and NV1 bits.
+ */
+
+#define SYS_INSN(insn, access_fn)					\
+	{								\
+		SYS_DESC(OP_##insn),					\
+		.access = (access_fn),					\
+	}
+
 static struct sys_reg_desc sys_insn_descs[] = {
 	{ SYS_DESC(SYS_DC_ISW), access_dcsw },
+
+	SYS_INSN(AT_S1E1R, handle_s1e01),
+	SYS_INSN(AT_S1E1W, handle_s1e01),
+	SYS_INSN(AT_S1E0R, handle_s1e01),
+	SYS_INSN(AT_S1E0W, handle_s1e01),
+	SYS_INSN(AT_S1E1RP, handle_s1e01),
+	SYS_INSN(AT_S1E1WP, handle_s1e01),
+
 	{ SYS_DESC(SYS_DC_CSW), access_dcsw },
 	{ SYS_DESC(SYS_DC_CISW), access_dcsw },
+
+	SYS_INSN(AT_S1E2R, handle_s1e2),
+	SYS_INSN(AT_S1E2W, handle_s1e2),
+	SYS_INSN(AT_S12E1R, handle_s12r),
+	SYS_INSN(AT_S12E1W, handle_s12w),
+	SYS_INSN(AT_S12E0R, handle_s12r),
+	SYS_INSN(AT_S12E0W, handle_s12w),
 };
 
 static bool trap_dbgdidr(struct kvm_vcpu *vcpu,
-- 
2.30.2



* [PATCH v6 40/64] KVM: arm64: nv: Trap and emulate AT instructions from virtual EL2
@ 2022-01-28 12:18   ` Marc Zyngier
  0 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:18 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Jintack Lim, Haibo Xu,
	Ganapatrao Kulkarni, Chase Conklin, Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

From: Jintack Lim <jintack.lim@linaro.org>

When supporting nested virtualization, AT instructions executed by a
guest hypervisor must be trapped and emulated by the host hypervisor:
untrapped AT instructions operating on S1E1 would use the wrong
translation regime (the one used to emulate virtual EL2 in EL1 instead
of virtual EL1), and AT instructions operating on S12 do not work from
EL1 at all.

This patch does several things.

1. List and define all AT system instructions to emulate and document
the emulation design.

2. Implement AT instruction handling logic in EL2. This will be used to
emulate AT instructions executed in the virtual EL2.

AT instruction emulation works by loading the proper processor
context, which depends on the trapped instruction and the virtual
HCR_EL2, to the EL1 virtual memory control registers and executing AT
instructions. Note that ctxt->hw_sys_regs is expected to have the
proper processor context before calling the handling functions
(__kvm_at_s1e01/__kvm_at_s1e2) implemented in this patch.
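
A condensed sketch of that flow, as implemented in at.c below (locking,
error handling and the PAN special cases omitted; names as in the patch):

  mmu = lookup_s2_mmu(vcpu->kvm, vcpu_read_sys_reg(vcpu, VTTBR_EL2),
                      vcpu_read_sys_reg(vcpu, HCR_EL2));
  __mmu_config_save(&config);           /* stash host TTBRx/TCR/SCTLR/VTTBR/HCR */
  write_sysreg_el1(ctxt_sys_reg(ctxt, TTBR0_EL1), SYS_TTBR0);
  write_sysreg_el1(ctxt_sys_reg(ctxt, TTBR1_EL1), SYS_TTBR1);
  write_sysreg_el1(ctxt_sys_reg(ctxt, TCR_EL1),   SYS_TCR);
  write_sysreg_el1(ctxt_sys_reg(ctxt, SCTLR_EL1), SYS_SCTLR);
  write_sysreg(kvm_get_vttbr(mmu), vttbr_el2);
  write_sysreg((config.hcr & ~HCR_TGE) | HCR_VM, hcr_el2);
  isb();
  asm volatile("at s1e1r, %0" : : "r" (vaddr));   /* or s1e1w/s1e0r/s1e0w */
  isb();
  ctxt_sys_reg(ctxt, PAR_EL1) = read_sysreg(par_el1);
  __mmu_config_restore(&config);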

3. Emulate AT S1E[01] instructions by issuing the same instructions in
EL2. We set the physical EL1 registers and the NV and NV1 bits as
described in the AT instruction emulation overview.

4. Emulate AT S12E[01] instructions in two steps: first, do the stage-1
translation by reusing the existing AT emulation functions. Second, do
the stage-2 translation by walking the guest hypervisor's stage-2 page
table in software, and record the translation result in PAR_EL1 (see
the sketch after this list).

5. Emulate AT S1E2 instructions by issuing the corresponding S1E1
instructions in EL2. We set the physical EL1 registers and the HCR_EL2
register as described in the AT instruction emulation overview.

6. Forward system instruction traps to the virtual EL2 if the
corresponding AT bit is set in the virtual HCR_EL2 (see the sketch
below as well).
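
To make items 4 and 6 concrete, here is a condensed sketch of what the
sys_regs.c handlers below do (error paths omitted; names as in the patch):

  /* item 6: forward the trap to vEL2 if the guest hypervisor asked for it */
  if (vcpu_has_nv(vcpu) && forward_traps(vcpu, HCR_AT))
          return false;

  /* item 4: stage-1 via the real AT instruction... */
  __kvm_at_s1e01(vcpu, op, va);
  par = vcpu_read_sys_reg(vcpu, PAR_EL1);
  if (!(par & 1)) {
          /* ...then stage-2 by walking the guest's stage-2 tables in SW */
          ipa = (par & GENMASK_ULL(47, 12)) | (va & GENMASK_ULL(11, 0));
          kvm_walk_nested_s2(vcpu, ipa, &out);
          vcpu_write_sys_reg(vcpu, setup_par_completed(vcpu, &out), PAR_EL1);
  }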

  [ Much logic above has been reworked by Marc Zyngier ]

Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
---
 arch/arm64/include/asm/kvm_arm.h |   2 +
 arch/arm64/include/asm/kvm_asm.h |   2 +
 arch/arm64/include/asm/sysreg.h  |  17 +++
 arch/arm64/kvm/Makefile          |   2 +-
 arch/arm64/kvm/at.c              | 219 +++++++++++++++++++++++++++++
 arch/arm64/kvm/hyp/vhe/switch.c  |  13 +-
 arch/arm64/kvm/sys_regs.c        | 229 ++++++++++++++++++++++++++++++-
 7 files changed, 478 insertions(+), 6 deletions(-)
 create mode 100644 arch/arm64/kvm/at.c

diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index 3675879b53c6..aa3bdce1b166 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -20,6 +20,7 @@
 #define HCR_AMVOFFEN	(UL(1) << 51)
 #define HCR_FIEN	(UL(1) << 47)
 #define HCR_FWB		(UL(1) << 46)
+#define HCR_AT		(UL(1) << 44)
 #define HCR_NV1		(UL(1) << 43)
 #define HCR_NV		(UL(1) << 42)
 #define HCR_API		(UL(1) << 41)
@@ -118,6 +119,7 @@
 #define VTCR_EL2_TG0_16K	TCR_TG0_16K
 #define VTCR_EL2_TG0_64K	TCR_TG0_64K
 #define VTCR_EL2_SH0_MASK	TCR_SH0_MASK
+#define VTCR_EL2_SH0_SHIFT	TCR_SH0_SHIFT
 #define VTCR_EL2_SH0_INNER	TCR_SH0_INNER
 #define VTCR_EL2_ORGN0_MASK	TCR_ORGN0_MASK
 #define VTCR_EL2_ORGN0_WBWA	TCR_ORGN0_WBWA
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index d5b0386ef765..e22861ece3c3 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -208,6 +208,8 @@ extern void __kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu, phys_addr_t ipa,
 extern void __kvm_tlb_flush_vmid(struct kvm_s2_mmu *mmu);
 
 extern void __kvm_timer_set_cntvoff(u64 cntvoff);
+extern void __kvm_at_s1e01(struct kvm_vcpu *vcpu, u32 op, u64 vaddr);
+extern void __kvm_at_s1e2(struct kvm_vcpu *vcpu, u32 op, u64 vaddr);
 
 extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
 
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 61c990e5591a..ea17d1eabfc5 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -658,6 +658,23 @@
 
 #define SYS_SP_EL2			sys_reg(3, 6,  4, 1, 0)
 
+/* AT instructions */
+#define AT_Op0 1
+#define AT_CRn 7
+
+#define OP_AT_S1E1R	sys_insn(AT_Op0, 0, AT_CRn, 8, 0)
+#define OP_AT_S1E1W	sys_insn(AT_Op0, 0, AT_CRn, 8, 1)
+#define OP_AT_S1E0R	sys_insn(AT_Op0, 0, AT_CRn, 8, 2)
+#define OP_AT_S1E0W	sys_insn(AT_Op0, 0, AT_CRn, 8, 3)
+#define OP_AT_S1E1RP	sys_insn(AT_Op0, 0, AT_CRn, 9, 0)
+#define OP_AT_S1E1WP	sys_insn(AT_Op0, 0, AT_CRn, 9, 1)
+#define OP_AT_S1E2R	sys_insn(AT_Op0, 4, AT_CRn, 8, 0)
+#define OP_AT_S1E2W	sys_insn(AT_Op0, 4, AT_CRn, 8, 1)
+#define OP_AT_S12E1R	sys_insn(AT_Op0, 4, AT_CRn, 8, 4)
+#define OP_AT_S12E1W	sys_insn(AT_Op0, 4, AT_CRn, 8, 5)
+#define OP_AT_S12E0R	sys_insn(AT_Op0, 4, AT_CRn, 8, 6)
+#define OP_AT_S12E0W	sys_insn(AT_Op0, 4, AT_CRn, 8, 7)
+
 /* Common SCTLR_ELx flags. */
 #define SCTLR_ELx_DSSBS	(BIT(44))
 #define SCTLR_ELx_ATA	(BIT(43))
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index dbaf42ff65f1..b800dcbd157f 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -14,7 +14,7 @@ kvm-y += arm.o mmu.o mmio.o psci.o hypercalls.o pvtime.o \
 	 inject_fault.o va_layout.o handle_exit.o \
 	 guest.o debug.o reset.o sys_regs.o \
 	 vgic-sys-reg-v3.o fpsimd.o pmu.o pkvm.o \
-	 arch_timer.o trng.o emulate-nested.o nested.o \
+	 arch_timer.o trng.o emulate-nested.o nested.o at.o \
 	 vgic/vgic.o vgic/vgic-init.o \
 	 vgic/vgic-irqfd.o vgic/vgic-v2.o \
 	 vgic/vgic-v3.o vgic/vgic-v4.o \
diff --git a/arch/arm64/kvm/at.c b/arch/arm64/kvm/at.c
new file mode 100644
index 000000000000..574c664e984b
--- /dev/null
+++ b/arch/arm64/kvm/at.c
@@ -0,0 +1,219 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2017 - Linaro Ltd
+ * Author: Jintack Lim <jintack.lim@linaro.org>
+ */
+
+#include <asm/kvm_hyp.h>
+#include <asm/kvm_mmu.h>
+
+struct mmu_config {
+	u64	ttbr0;
+	u64	ttbr1;
+	u64	tcr;
+	u64	sctlr;
+	u64	vttbr;
+	u64	vtcr;
+	u64	hcr;
+};
+
+static void __mmu_config_save(struct mmu_config *config)
+{
+	config->ttbr0	= read_sysreg_el1(SYS_TTBR0);
+	config->ttbr1	= read_sysreg_el1(SYS_TTBR1);
+	config->tcr	= read_sysreg_el1(SYS_TCR);
+	config->sctlr	= read_sysreg_el1(SYS_SCTLR);
+	config->vttbr	= read_sysreg(vttbr_el2);
+	config->vtcr	= read_sysreg(vtcr_el2);
+	config->hcr	= read_sysreg(hcr_el2);
+}
+
+static void __mmu_config_restore(struct mmu_config *config)
+{
+	write_sysreg_el1(config->ttbr0,	SYS_TTBR0);
+	write_sysreg_el1(config->ttbr1,	SYS_TTBR1);
+	write_sysreg_el1(config->tcr,	SYS_TCR);
+	write_sysreg_el1(config->sctlr,	SYS_SCTLR);
+	write_sysreg(config->vttbr,	vttbr_el2);
+	write_sysreg(config->vtcr,	vtcr_el2);
+	write_sysreg(config->hcr,	hcr_el2);
+
+	isb();
+}
+
+void __kvm_at_s1e01(struct kvm_vcpu *vcpu, u32 op, u64 vaddr)
+{
+	struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
+	struct mmu_config config;
+	struct kvm_s2_mmu *mmu;
+
+	spin_lock(&vcpu->kvm->mmu_lock);
+
+	/*
+	 * If HCR_EL2.{E2H,TGE} == {1,1}, the MMU context is already
+	 * the right one (as we trapped from vEL2).
+	 */
+	if (vcpu_el2_e2h_is_set(vcpu) && vcpu_el2_tge_is_set(vcpu))
+		goto skip_mmu_switch;
+
+	/*
+	 * FIXME: Obtaining the S2 MMU for a guest is horribly
+	 * racy, and we may not find it (evicted by another vcpu, for
+	 * example).
+	 */
+	mmu = lookup_s2_mmu(vcpu->kvm,
+			    vcpu_read_sys_reg(vcpu, VTTBR_EL2),
+			    vcpu_read_sys_reg(vcpu, HCR_EL2));
+
+	if (WARN_ON(!mmu))
+		goto out;
+
+	/* We've trapped, so everything is live on the CPU. */
+	__mmu_config_save(&config);
+
+	write_sysreg_el1(ctxt_sys_reg(ctxt, TTBR0_EL1),	SYS_TTBR0);
+	write_sysreg_el1(ctxt_sys_reg(ctxt, TTBR1_EL1),	SYS_TTBR1);
+	write_sysreg_el1(ctxt_sys_reg(ctxt, TCR_EL1),	SYS_TCR);
+	write_sysreg_el1(ctxt_sys_reg(ctxt, SCTLR_EL1),	SYS_SCTLR);
+	write_sysreg(kvm_get_vttbr(mmu),		vttbr_el2);
+	/*
+	 * REVISIT: do we need anything from the guest's VTCR_EL2? It
+	 * looks like keeping the host's configuration is the right
+	 * thing to do at this stage (and we could avoid saving and
+	 * restoring it). Keep the host's version for now.
+	 */
+	write_sysreg((config.hcr & ~HCR_TGE) | HCR_VM,	hcr_el2);
+
+	isb();
+
+skip_mmu_switch:
+
+	switch (op) {
+	case OP_AT_S1E1R:
+	case OP_AT_S1E1RP:
+		asm volatile("at s1e1r, %0" : : "r" (vaddr));
+		break;
+	case OP_AT_S1E1W:
+	case OP_AT_S1E1WP:
+		asm volatile("at s1e1w, %0" : : "r" (vaddr));
+		break;
+	case OP_AT_S1E0R:
+		asm volatile("at s1e0r, %0" : : "r" (vaddr));
+		break;
+	case OP_AT_S1E0W:
+		asm volatile("at s1e0w, %0" : : "r" (vaddr));
+		break;
+	default:
+		WARN_ON_ONCE(1);
+		break;
+	}
+
+	isb();
+
+	ctxt_sys_reg(ctxt, PAR_EL1) = read_sysreg(par_el1);
+
+	/*
+	 * Failed? let's leave the building now.
+	 *
+	 * FIXME: how about a failed translation because the shadow S2
+	 * wasn't populated? We may need to perform a SW PTW,
+	 * populate our shadow S2, and retry the instruction.
+	 */
+	if (ctxt_sys_reg(ctxt, PAR_EL1) & 1)
+		goto nopan;
+
+	/* No PAN? No problem. */
+	if (!(*vcpu_cpsr(vcpu) & PSR_PAN_BIT))
+		goto nopan;
+
+	/*
+	 * For PAN-involved AT operations, perform the same
+	 * translation, using EL0 this time.
+	 */
+	switch (op) {
+	case OP_AT_S1E1RP:
+		asm volatile("at s1e0r, %0" : : "r" (vaddr));
+		break;
+	case OP_AT_S1E1WP:
+		asm volatile("at s1e0w, %0" : : "r" (vaddr));
+		break;
+	default:
+		goto nopan;
+	}
+
+	/*
+	 * If the EL0 translation has succeeded, we need to pretend
+	 * the AT operation has failed, as the PAN setting forbids
+	 * such a translation.
+	 *
+	 * FIXME: we hardcode a Level-3 permission fault. We really
+	 * should return the real fault level.
+	 */
+	if (!(read_sysreg(par_el1) & 1))
+		ctxt_sys_reg(ctxt, PAR_EL1) = 0x1f;
+
+nopan:
+	if (!(vcpu_el2_e2h_is_set(vcpu) && vcpu_el2_tge_is_set(vcpu)))
+		__mmu_config_restore(&config);
+
+out:
+	spin_unlock(&vcpu->kvm->mmu_lock);
+}
+
+void __kvm_at_s1e2(struct kvm_vcpu *vcpu, u32 op, u64 vaddr)
+{
+	struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
+	struct mmu_config config;
+	struct kvm_s2_mmu *mmu;
+	u64 val;
+
+	spin_lock(&vcpu->kvm->mmu_lock);
+
+	mmu = &vcpu->kvm->arch.mmu;
+
+	/* We've trapped, so everything is live on the CPU. */
+	__mmu_config_save(&config);
+
+	if (vcpu_el2_e2h_is_set(vcpu)) {
+		write_sysreg_el1(ctxt_sys_reg(ctxt, TTBR0_EL2),	SYS_TTBR0);
+		write_sysreg_el1(ctxt_sys_reg(ctxt, TTBR1_EL2),	SYS_TTBR1);
+		write_sysreg_el1(ctxt_sys_reg(ctxt, TCR_EL2),	SYS_TCR);
+		write_sysreg_el1(ctxt_sys_reg(ctxt, SCTLR_EL2),	SYS_SCTLR);
+
+		val = config.hcr;
+	} else {
+		write_sysreg_el1(ctxt_sys_reg(ctxt, TTBR0_EL2),	SYS_TTBR0);
+		val = translate_tcr_el2_to_tcr_el1(ctxt_sys_reg(ctxt, TCR_EL2));
+		write_sysreg_el1(val, SYS_TCR);
+		val = translate_sctlr_el2_to_sctlr_el1(ctxt_sys_reg(ctxt, SCTLR_EL2));
+		write_sysreg_el1(val, SYS_SCTLR);
+
+		val = config.hcr | HCR_NV | HCR_NV1;
+	}
+
+	write_sysreg(kvm_get_vttbr(mmu),		vttbr_el2);
+	/* FIXME: write S2 MMU VTCR_EL2? */
+	write_sysreg((val & ~HCR_TGE) | HCR_VM,		hcr_el2);
+
+	isb();
+
+	switch (op) {
+	case OP_AT_S1E2R:
+		asm volatile("at s1e1r, %0" : : "r" (vaddr));
+		break;
+	case OP_AT_S1E2W:
+		asm volatile("at s1e1w, %0" : : "r" (vaddr));
+		break;
+	default:
+		WARN_ON_ONCE(1);
+		break;
+	}
+
+	isb();
+
+	/* FIXME: handle failed translation due to shadow S2 */
+	ctxt_sys_reg(ctxt, PAR_EL1) = read_sysreg(par_el1);
+
+	__mmu_config_restore(&config);
+	spin_unlock(&vcpu->kvm->mmu_lock);
+}
diff --git a/arch/arm64/kvm/hyp/vhe/switch.c b/arch/arm64/kvm/hyp/vhe/switch.c
index 28845f907cfc..b7790d3c4122 100644
--- a/arch/arm64/kvm/hyp/vhe/switch.c
+++ b/arch/arm64/kvm/hyp/vhe/switch.c
@@ -41,9 +41,10 @@ static void __activate_traps(struct kvm_vcpu *vcpu)
 		if (!vcpu_el2_e2h_is_set(vcpu)) {
 			/*
 			 * For a guest hypervisor on v8.0, trap and emulate
-			 * the EL1 virtual memory control register accesses.
+			 * the EL1 virtual memory control register accesses
+			 * as well as the AT S1 operations.
 			 */
-			hcr |= HCR_TVM | HCR_TRVM | HCR_NV1;
+			hcr |= HCR_TVM | HCR_TRVM | HCR_AT | HCR_NV1;
 		} else {
 			/*
 			 * For a guest hypervisor on v8.1 (VHE), allow to
@@ -68,6 +69,14 @@ static void __activate_traps(struct kvm_vcpu *vcpu)
 			hcr &= ~HCR_TVM;
 
 			hcr |= vhcr_el2 & (HCR_TVM | HCR_TRVM);
+
+			/*
+			 * If we're using the EL1 translation regime
+			 * (TGE clear), then ensure that AT S1 ops are
+			 * trapped too.
+			 */
+			if (!vcpu_el2_tge_is_set(vcpu))
+				hcr |= HCR_AT;
 		}
 	}
 
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index f669618f966b..7be57e1b7019 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1704,7 +1704,6 @@ static bool access_sp_el1(struct kvm_vcpu *vcpu,
 	return true;
 }
 
-
 static bool access_elr(struct kvm_vcpu *vcpu,
 		       struct sys_reg_params *p,
 		       const struct sys_reg_desc *r)
@@ -2236,12 +2235,236 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 	EL2_REG(SP_EL2, NULL, reset_unknown, 0),
 };
 
-#define SYS_INSN_TO_DESC(insn, access_fn, forward_fn)	\
-	{ SYS_DESC((insn)), (access_fn), NULL, 0, 0, NULL, NULL, (forward_fn) }
+static bool handle_s1e01(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
+			 const struct sys_reg_desc *r)
+{
+	int sys_encoding = sys_insn(p->Op0, p->Op1, p->CRn, p->CRm, p->Op2);
+
+	if (vcpu_has_nv(vcpu) && forward_traps(vcpu, HCR_AT))
+		return false;
+
+	__kvm_at_s1e01(vcpu, sys_encoding, p->regval);
+
+	return true;
+}
+
+static bool handle_s1e2(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
+			const struct sys_reg_desc *r)
+{
+	int sys_encoding = sys_insn(p->Op0, p->Op1, p->CRn, p->CRm, p->Op2);
+
+	if (vcpu_has_nv(vcpu) && forward_nv_traps(vcpu))
+		return false;
+
+	__kvm_at_s1e2(vcpu, sys_encoding, p->regval);
+
+	return true;
+}
+
+static u64 setup_par_aborted(u32 esr)
+{
+	u64 par = 0;
+
+	/* S [9]: fault in the stage 2 translation */
+	par |= (1 << 9);
+	/* FST [6:1]: Fault status code  */
+	par |= (esr << 1);
+	/* F [0]: translation is aborted */
+	par |= 1;
+
+	return par;
+}
+
+static u64 setup_par_completed(struct kvm_vcpu *vcpu, struct kvm_s2_trans *out)
+{
+	u64 par, vtcr_sh0;
+
+	/* F [0]: Translation is completed successfully */
+	par = 0;
+	/* ATTR [63:56] */
+	par |= out->upper_attr;
+	/* PA [47:12] */
+	par |= out->output & GENMASK_ULL(47, 12);
+	/* RES1 [11] */
+	par |= (1UL << 11);
+	/* SH [8:7]: Shareability attribute */
+	vtcr_sh0 = vcpu_read_sys_reg(vcpu, VTCR_EL2) & VTCR_EL2_SH0_MASK;
+	par |= (vtcr_sh0 >> VTCR_EL2_SH0_SHIFT) << 7;
+
+	return par;
+}
+
+static bool handle_s12(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
+		       const struct sys_reg_desc *r, bool write)
+{
+	u64 par, va;
+	u32 esr, op;
+	phys_addr_t ipa;
+	struct kvm_s2_trans out;
+	int ret;
+
+	if (vcpu_has_nv(vcpu) && forward_nv_traps(vcpu))
+		return false;
+
+	/* Do the stage-1 translation */
+	va = p->regval;
+	op = sys_insn(p->Op0, p->Op1, p->CRn, p->CRm, p->Op2);
+	switch (op) {
+	case OP_AT_S12E1R:
+		op = OP_AT_S1E1R;
+		break;
+	case OP_AT_S12E1W:
+		op = OP_AT_S1E1W;
+		break;
+	case OP_AT_S12E0R:
+		op = OP_AT_S1E0R;
+		break;
+	case OP_AT_S12E0W:
+		op = OP_AT_S1E0W;
+		break;
+	default:
+		WARN_ON_ONCE(1);
+		return true;
+	}
+
+	__kvm_at_s1e01(vcpu, op, va);
+	par = vcpu_read_sys_reg(vcpu, PAR_EL1);
+	if (par & 1) {
+		/* The stage-1 translation aborted */
+		return true;
+	}
+
+	/* Do the stage-2 translation */
+	ipa = (par & GENMASK_ULL(47, 12)) | (va & GENMASK_ULL(11, 0));
+	out.esr = 0;
+	ret = kvm_walk_nested_s2(vcpu, ipa, &out);
+	if (ret < 0)
+		return false;
+
+	/* Check if the stage-2 PTW is aborted */
+	if (out.esr) {
+		esr = out.esr;
+		goto s2_trans_abort;
+	}
+
+	/* Check the access permission */
+	if ((!write && !out.readable) || (write && !out.writable)) {
+		esr = ESR_ELx_FSC_PERM;
+		esr |= out.level & 0x3;
+		goto s2_trans_abort;
+	}
+
+	vcpu_write_sys_reg(vcpu, setup_par_completed(vcpu, &out), PAR_EL1);
+	return true;
+
+s2_trans_abort:
+	vcpu_write_sys_reg(vcpu, setup_par_aborted(esr), PAR_EL1);
+	return true;
+}
+
+static bool handle_s12r(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
+			const struct sys_reg_desc *r)
+{
+	return handle_s12(vcpu, p, r, false);
+}
+
+static bool handle_s12w(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
+			const struct sys_reg_desc *r)
+{
+	return handle_s12(vcpu, p, r, true);
+}
+
+/*
+ * AT instruction emulation
+ *
+ * We emulate AT instructions executed in the virtual EL2.
+ * Basic strategy for the stage-1 translation emulation is to load proper
+ * context, which depends on the trapped instruction and the virtual HCR_EL2,
+ * to the EL1 virtual memory control registers and execute S1E[01] instructions
+ * in EL2. See below for more detail.
+ *
+ * For the stage-2 translation, which is necessary for S12E[01] emulation,
+ * we walk the guest hypervisor's stage-2 page table in software.
+ *
+ * The stage-1 translation emulations can be divided into two groups depending
+ * on the translation regime.
+ *
+ * 1. EL2 AT instructions: S1E2x
+ * +-----------------------------------------------------------------------+
+ * |                             |         Setting for the emulation       |
+ * | Virtual HCR_EL2.E2H on trap |-----------------------------------------+
+ * |                             | Phys EL1 regs | Phys NV, NV1 | Phys TGE |
+ * |-----------------------------------------------------------------------|
+ * |             0               |     vEL2      |    (1, 1)    |    0     |
+ * |             1               |     vEL2      |    (0, 0)    |    0     |
+ * +-----------------------------------------------------------------------+
+ *
+ * We emulate the EL2 AT instructions by loading virtual EL2 context
+ * to the EL1 virtual memory control registers and executing corresponding
+ * EL1 AT instructions.
+ *
+ * We set the physical NV and NV1 bits to use the EL2 page table format for a
+ * non-VHE guest hypervisor (i.e. HCR_EL2.E2H == 0). As a VHE guest hypervisor
+ * uses the EL1 page table format, we don't set those bits.
+ *
+ * We clear the physical TGE bit so as not to use the EL2 translation regime
+ * when the host uses the VHE feature.
+ *
+ *
+ * 2. EL0/EL1 AT instructions: S1E[01]x, S12E1x
+ * +----------------------------------------------------------------------+
+ * |   Virtual HCR_EL2 on trap  |        Setting for the emulation        |
+ * |----------------------------------------------------------------------+
+ * | (vE2H, vTGE) | (vNV, vNV1) | Phys EL1 regs | Phys NV, NV1 | Phys TGE |
+ * |----------------------------------------------------------------------|
+ * |    (0, 0)*   |   (0, 0)    |      vEL1     |    (0, 0)    |    0     |
+ * |    (0, 0)    |   (1, 1)    |      vEL1     |    (1, 1)    |    0     |
+ * |    (1, 1)    |   (0, 0)    |      vEL2     |    (0, 0)    |    0     |
+ * |    (1, 1)    |   (1, 1)    |      vEL2     |    (1, 1)    |    0     |
+ * +----------------------------------------------------------------------+
+ *
+ * *For (0, 0) in the 'Virtual HCR_EL2 on trap' column, it actually means
+ *  (1, 1). Keep them (0, 0) just for readability.
+ *
+ * We set physical EL1 virtual memory control registers depending on
+ * (vE2H, vTGE) pair. When the pair is (0, 0) where AT instructions are
+ * supposed to use EL0/EL1 translation regime, we load the EL1 registers with
+ * the virtual EL1 registers (i.e. EL1 registers from the guest hypervisor's
+ * point of view). When the pair is (1, 1), however, AT instructions are defined
+ * to apply the EL2 translation regime. To emulate this behavior, we load the
+ * EL1 registers with the virtual EL2 context (i.e. the shadow registers).
+ *
+ * We respect the virtual NV and NV1 bits for the emulation. When those bits are
+ * set, it means that a guest hypervisor would like to use EL2 page table format
+ * for the EL1 translation regime. We emulate this by setting the physical
+ * NV and NV1 bits.
+ */
+
+#define SYS_INSN(insn, access_fn)					\
+	{								\
+		SYS_DESC(OP_##insn),					\
+		.access = (access_fn),					\
+	}
+
 static struct sys_reg_desc sys_insn_descs[] = {
 	{ SYS_DESC(SYS_DC_ISW), access_dcsw },
+
+	SYS_INSN(AT_S1E1R, handle_s1e01),
+	SYS_INSN(AT_S1E1W, handle_s1e01),
+	SYS_INSN(AT_S1E0R, handle_s1e01),
+	SYS_INSN(AT_S1E0W, handle_s1e01),
+	SYS_INSN(AT_S1E1RP, handle_s1e01),
+	SYS_INSN(AT_S1E1WP, handle_s1e01),
+
 	{ SYS_DESC(SYS_DC_CSW), access_dcsw },
 	{ SYS_DESC(SYS_DC_CISW), access_dcsw },
+
+	SYS_INSN(AT_S1E2R, handle_s1e2),
+	SYS_INSN(AT_S1E2W, handle_s1e2),
+	SYS_INSN(AT_S12E1R, handle_s12r),
+	SYS_INSN(AT_S12E1W, handle_s12w),
+	SYS_INSN(AT_S12E0R, handle_s12r),
+	SYS_INSN(AT_S12E0W, handle_s12w),
 };
 
 static bool trap_dbgdidr(struct kvm_vcpu *vcpu,
-- 
2.30.2



^ permalink raw reply related	[flat|nested] 378+ messages in thread

* [PATCH v6 41/64] KVM: arm64: nv: Trap and emulate TLBI instructions from virtual EL2
@ 2022-01-28 12:18   ` Marc Zyngier
  -1 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:18 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Jintack Lim, Haibo Xu,
	Ganapatrao Kulkarni, Chase Conklin, Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

From: Jintack Lim <jintack.lim@linaro.org>

When supporting nested virtualization, a guest hypervisor executing TLBI
instructions must be trapped and emulated by the host hypervisor,
because the guest hypervisor can only affect physical TLB entries
relating to its own execution environment (virtual EL2 in EL1) but not
those of the nested guests, as required by the semantics of the
instructions; TLBI instructions might also result in updates
(invalidations) to shadow page tables.

This patch does several things.

1. List and define all TLBI system instructions to emulate.

2. Emulate the TLBI ALLE2(IS) instruction executed in the virtual EL2. Since
we emulate virtual EL2 in EL1, we invalidate the EL1&0 regime stage 1 TLB
entries with vttbr_el2 set to the VMID of the virtual EL2.

3. Emulate TLBI VAE2* instruction executed in the virtual EL2. Based on the
same principle as TLBI ALLE2 instruction, we can simply emulate those
instructions by executing corresponding VAE1* instructions with the
virtual EL2's VMID assigned by the host hypervisor.

Note that we are able to emulate TLBI ALLE2IS precisely by only
invalidating stage 1 TLB entries via the TLBI VMALLE1IS instruction, but to
make it simple, we reuse the existing function, __kvm_tlb_flush_vmid(),
which invalidates both stage 1 and stage 2 TLB entries.
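
In other words, the handlers added below reduce to roughly this (hedged
sketch; names as in the patch):

  /* TLBI ALLE2(IS): flushing the VMID backing virtual EL2 is sufficient */
  __kvm_tlb_flush_vmid(&vcpu->kvm->arch.mmu);

  /* TLBI VAE2(IS)/VALE2(IS): replay the EL1 flavour under that same VMID */
  __kvm_tlb_vae2is(&vcpu->kvm->arch.mmu, p->regval, sys_encoding);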

4. TLBI ALLE1(IS) instruction invalidates all EL1&0 regime stage 1 and 2
TLB entries (on all PEs in the same Inner Shareable domain). To emulate
these instructions, we first need to clear all the mappings in the
shadow page tables since executing those instructions implies the change
of mappings in the stage 2 page tables maintained by the guest
hypervisor.  We then need to invalidate all EL1&0 regime stage 1 and 2
TLB entries of all VMIDs, which are assigned by the host hypervisor, for
this VM.
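
A sketch of that ALLE1(IS) handling (simplified from handle_alle1is() below):

  spin_lock(&vcpu->kvm->mmu_lock);
  kvm_nested_s2_clear(vcpu->kvm);     /* drop all shadow stage-2 mappings */
  if (mmu->vmid.vmid_gen)
          __kvm_tlb_flush_vmid(mmu);  /* flush the VM's own VMID, if allocated */
  spin_unlock(&vcpu->kvm->mmu_lock);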

5. Based on the same principle as TLBI ALLE1(IS) emulation, we clear the
mappings in the shadow stage-2 page tables and invalidate TLB entries.
But this time we do it only for the current VMID from the guest
hypervisor's perspective, not for all VMIDs.

6. Based on the same principle as TLBI ALLE1(IS) and TLBI VMALLS12E1(IS)
emulation, we clear the mappings in the shadow stage-2 page tables and
invalidate TLB entries. We do it only for one mapping for the current
VMID from the guest hypervisor's view.

7. Forward system instruction traps to the virtual EL2 if a
corresponding bit in the virtual HCR_EL2 is set.

8. Even though a guest hypervisor could be allowed to execute the TLBI
instructions that are accessible at EL1 without trapping them, that would
be wrong; all those TLBI instructions work based on the current VMID, and
when running a guest hypervisor the current VMID is the one for the guest
hypervisor itself, not the one from the virtual vttbr_el2. So letting a
guest hypervisor execute those TLBI instructions results in invalidating
its own TLB entries and leaving the entries it intended to invalidate
untouched.

Therefore we trap and emulate those TLBI instructions. The emulation is
simple; we find a shadow VMID mapped to the virtual vttbr_el2, set it in
the physical vttbr_el2, then execute the same instruction in EL2.
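
A sketch of that path (simplified from handle_tlbi_el1() and
__kvm_tlb_el1_instr() below):

  mmu = lookup_s2_mmu(vcpu->kvm, vcpu_read_sys_reg(vcpu, VTTBR_EL2), HCR_VM);
  if (mmu)
          /* switch to the shadow VMID, then replay the trapped TLBI */
          __kvm_tlb_el1_instr(mmu, p->regval, sys_encoding);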

We don't set HCR_EL2.TTLB bit yet.

  [ Changes performed by Marc Zyngier:

    The TLBI handling code more or less directly executes the same
    instruction that has been trapped (with an EL2->EL1 conversion
    in the case of an EL2 TLBI), but that's unfortunately not enough:

    - TLBIs must be upgraded to the Inner Shareable domain to account
      for vcpu migration, just like we already have with HCR_EL2.FB.

    - The DSB instruction that synchronises these must thus be on
      the Inner Shareable domain as well.

    - Prior to executing the TLBI, we need another DSB ISHST to make
      sure that the update to the page tables is now visible.

      Ordering of system instructions fixed

    - The current TLB invalidation code is pretty buggy, as it assumes a
      page mapping. On the contrary, it is likely that TLB invalidation
      will cover more than a single page, and the size should be decided
      by the guest's configuration (and not the host's).

      Since we don't cache the guest mapping sizes in the shadow PT yet,
      let's assume the worst case (a block mapping) and invalidate that.

      Take this opportunity to fix the decoding of the parameter (it
      isn't a straight IPA).

    - In general, we always emulate local TLB invalidations as being
      upgraded to the Inner Shareable domain so that we can easily
      deal with vcpu migration. This is consistent with the fact that
      we set HCR_EL2.FB when running non-nested VMs.

      So let's emulate TLBI ALLE2 as ALLE2IS.
  ]
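
The resulting ordering pattern in the hyp helpers (see __kvm_tlb_el1_instr()
below) therefore looks roughly like:

  dsb(ishst);                        /* page table updates visible before TLBI */
  __tlb_switch_to_guest(mmu, &cxt);  /* vttbr_el2 now carries the shadow VMID */
  __tlbi(vae1is, val);               /* local op upgraded to Inner Shareable */
  dsb(ish);
  isb();
  __tlb_switch_to_host(&cxt);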

  [ Changes performed by Christoffer Dall:

    Sometimes when we are invalidating the TLB for a certain S2 MMU
    context, this context can also have EL2 context associated with it
    and we have to invalidate this too.
  ]

Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_asm.h |   2 +
 arch/arm64/include/asm/sysreg.h  |  36 +++++
 arch/arm64/kvm/hyp/vhe/switch.c  |   8 +-
 arch/arm64/kvm/hyp/vhe/tlb.c     |  81 +++++++++++
 arch/arm64/kvm/mmu.c             |  19 ++-
 arch/arm64/kvm/sys_regs.c        | 227 +++++++++++++++++++++++++++++++
 6 files changed, 368 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index e22861ece3c3..64ddbecab241 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -206,6 +206,8 @@ extern void __kvm_flush_cpu_context(struct kvm_s2_mmu *mmu);
 extern void __kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu, phys_addr_t ipa,
 				     int level);
 extern void __kvm_tlb_flush_vmid(struct kvm_s2_mmu *mmu);
+extern void __kvm_tlb_vae2is(struct kvm_s2_mmu *mmu, u64 va, u64 sys_encoding);
+extern void __kvm_tlb_el1_instr(struct kvm_s2_mmu *mmu, u64 val, u64 sys_encoding);
 
 extern void __kvm_timer_set_cntvoff(u64 cntvoff);
 extern void __kvm_at_s1e01(struct kvm_vcpu *vcpu, u32 op, u64 vaddr);
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index ea17d1eabfc5..41c3603a3f29 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -675,6 +675,42 @@
 #define OP_AT_S12E0R	sys_insn(AT_Op0, 4, AT_CRn, 8, 6)
 #define OP_AT_S12E0W	sys_insn(AT_Op0, 4, AT_CRn, 8, 7)
 
+/* TLBI instructions */
+#define TLBI_Op0	1
+#define TLBI_Op1_EL1	0	/* Accessible from EL1 or higher */
+#define TLBI_Op1_EL2	4	/* Accessible from EL2 or higher */
+#define TLBI_CRn	8
+#define tlbi_insn_el1(CRm, Op2)	sys_insn(TLBI_Op0, TLBI_Op1_EL1, TLBI_CRn, (CRm), (Op2))
+#define tlbi_insn_el2(CRm, Op2)	sys_insn(TLBI_Op0, TLBI_Op1_EL2, TLBI_CRn, (CRm), (Op2))
+
+#define OP_TLBI_VMALLE1IS	tlbi_insn_el1(3, 0)
+#define OP_TLBI_VAE1IS		tlbi_insn_el1(3, 1)
+#define OP_TLBI_ASIDE1IS	tlbi_insn_el1(3, 2)
+#define OP_TLBI_VAAE1IS		tlbi_insn_el1(3, 3)
+#define OP_TLBI_VALE1IS		tlbi_insn_el1(3, 5)
+#define OP_TLBI_VAALE1IS	tlbi_insn_el1(3, 7)
+#define OP_TLBI_VMALLE1		tlbi_insn_el1(7, 0)
+#define OP_TLBI_VAE1		tlbi_insn_el1(7, 1)
+#define OP_TLBI_ASIDE1		tlbi_insn_el1(7, 2)
+#define OP_TLBI_VAAE1		tlbi_insn_el1(7, 3)
+#define OP_TLBI_VALE1		tlbi_insn_el1(7, 5)
+#define OP_TLBI_VAALE1		tlbi_insn_el1(7, 7)
+
+#define OP_TLBI_IPAS2E1IS	tlbi_insn_el2(0, 1)
+#define OP_TLBI_IPAS2LE1IS	tlbi_insn_el2(0, 5)
+#define OP_TLBI_ALLE2IS		tlbi_insn_el2(3, 0)
+#define OP_TLBI_VAE2IS		tlbi_insn_el2(3, 1)
+#define OP_TLBI_ALLE1IS		tlbi_insn_el2(3, 4)
+#define OP_TLBI_VALE2IS		tlbi_insn_el2(3, 5)
+#define OP_TLBI_VMALLS12E1IS	tlbi_insn_el2(3, 6)
+#define OP_TLBI_IPAS2E1		tlbi_insn_el2(4, 1)
+#define OP_TLBI_IPAS2LE1	tlbi_insn_el2(4, 5)
+#define OP_TLBI_ALLE2		tlbi_insn_el2(7, 0)
+#define OP_TLBI_VAE2		tlbi_insn_el2(7, 1)
+#define OP_TLBI_ALLE1		tlbi_insn_el2(7, 4)
+#define OP_TLBI_VALE2		tlbi_insn_el2(7, 5)
+#define OP_TLBI_VMALLS12E1	tlbi_insn_el2(7, 6)
+
 /* Common SCTLR_ELx flags. */
 #define SCTLR_ELx_DSSBS	(BIT(44))
 #define SCTLR_ELx_ATA	(BIT(43))
diff --git a/arch/arm64/kvm/hyp/vhe/switch.c b/arch/arm64/kvm/hyp/vhe/switch.c
index b7790d3c4122..0e164cc8e913 100644
--- a/arch/arm64/kvm/hyp/vhe/switch.c
+++ b/arch/arm64/kvm/hyp/vhe/switch.c
@@ -44,7 +44,7 @@ static void __activate_traps(struct kvm_vcpu *vcpu)
 			 * the EL1 virtual memory control register accesses
 			 * as well as the AT S1 operations.
 			 */
-			hcr |= HCR_TVM | HCR_TRVM | HCR_AT | HCR_NV1;
+			hcr |= HCR_TVM | HCR_TRVM | HCR_AT | HCR_TTLB | HCR_NV1;
 		} else {
 			/*
 			 * For a guest hypervisor on v8.1 (VHE), allow to
@@ -72,11 +72,11 @@ static void __activate_traps(struct kvm_vcpu *vcpu)
 
 			/*
 			 * If we're using the EL1 translation regime
-			 * (TGE clear), then ensure that AT S1 ops are
-			 * trapped too.
+			 * (TGE clear), then ensure that AT S1 and
+			 * TLBI E1 ops are trapped too.
 			 */
 			if (!vcpu_el2_tge_is_set(vcpu))
-				hcr |= HCR_AT;
+				hcr |= HCR_AT | HCR_TTLB;
 		}
 	}
 
diff --git a/arch/arm64/kvm/hyp/vhe/tlb.c b/arch/arm64/kvm/hyp/vhe/tlb.c
index 24cef9b87f9e..c4389db4cc22 100644
--- a/arch/arm64/kvm/hyp/vhe/tlb.c
+++ b/arch/arm64/kvm/hyp/vhe/tlb.c
@@ -161,3 +161,84 @@ void __kvm_flush_vm_context(void)
 
 	dsb(ish);
 }
+
+void __kvm_tlb_vae2is(struct kvm_s2_mmu *mmu, u64 va, u64 sys_encoding)
+{
+	struct tlb_inv_context cxt;
+
+	dsb(ishst);
+
+	/* Switch to requested VMID */
+	__tlb_switch_to_guest(mmu, &cxt);
+
+	/*
+	 * Execute the EL1 version of TLBI VAE2* instruction, forcing
+	 * an upgrade to the Inner Shareable domain in order to
+	 * perform the invalidation on all CPUs.
+	 */
+	switch (sys_encoding) {
+	case OP_TLBI_VAE2:
+	case OP_TLBI_VAE2IS:
+		__tlbi(vae1is, va);
+		break;
+	case OP_TLBI_VALE2:
+	case OP_TLBI_VALE2IS:
+		__tlbi(vale1is, va);
+		break;
+	default:
+		break;
+	}
+	dsb(ish);
+	isb();
+
+	__tlb_switch_to_host(&cxt);
+}
+
+void __kvm_tlb_el1_instr(struct kvm_s2_mmu *mmu, u64 val, u64 sys_encoding)
+{
+	struct tlb_inv_context cxt;
+
+	dsb(ishst);
+
+	/* Switch to requested VMID */
+	__tlb_switch_to_guest(mmu, &cxt);
+
+	/*
+	 * Execute the same instruction as the guest hypervisor did,
+	 * expanding the scope of local TLB invalidations to the Inner
+	 * Shareable domain so that it takes place on all CPUs. This
+	 * is equivalent to having HCR_EL2.FB set.
+	 */
+	switch (sys_encoding) {
+	case OP_TLBI_VMALLE1:
+	case OP_TLBI_VMALLE1IS:
+		__tlbi(vmalle1is);
+		break;
+	case OP_TLBI_VAE1:
+	case OP_TLBI_VAE1IS:
+		__tlbi(vae1is, val);
+		break;
+	case OP_TLBI_ASIDE1:
+	case OP_TLBI_ASIDE1IS:
+		__tlbi(aside1is, val);
+		break;
+	case OP_TLBI_VAAE1:
+	case OP_TLBI_VAAE1IS:
+		__tlbi(vaae1is, val);
+		break;
+	case OP_TLBI_VALE1:
+	case OP_TLBI_VALE1IS:
+		__tlbi(vale1is, val);
+		break;
+	case OP_TLBI_VAALE1:
+	case OP_TLBI_VAALE1IS:
+		__tlbi(vaale1is, val);
+		break;
+	default:
+		break;
+	}
+	dsb(ish);
+	isb();
+
+	__tlb_switch_to_host(&cxt);
+}
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index b9b11be65009..4b38f2f3c997 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -80,8 +80,25 @@ static bool memslot_is_logging(struct kvm_memory_slot *memslot)
  */
 void kvm_flush_remote_tlbs(struct kvm *kvm)
 {
+	struct kvm_s2_mmu *mmu = &kvm->arch.mmu;
+
 	++kvm->stat.generic.remote_tlb_flush_requests;
-	kvm_call_hyp(__kvm_tlb_flush_vmid, &kvm->arch.mmu);
+
+	if (mmu == &kvm->arch.mmu) {
+		/*
+		 * For a normal (i.e. non-nested) guest, flush entries for the
+		 * given VMID.
+		 */
+		kvm_call_hyp(__kvm_tlb_flush_vmid, mmu);
+	} else {
+		/*
+		 * When supporting nested virtualization, we can have multiple
+		 * VMIDs in play for each VCPU in the VM, so it's really not
+		 * worth it to try to quiesce the system and flush all the
+		 * VMIDs that may be in use, instead just nuke the whole thing.
+		 */
+		kvm_call_hyp(__kvm_flush_vm_context);
+	}
 }
 
 static bool kvm_is_device_pfn(unsigned long pfn)
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 7be57e1b7019..d7441b8ba406 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -2374,6 +2374,205 @@ static bool handle_s12w(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
 	return handle_s12(vcpu, p, r, true);
 }
 
+static bool handle_alle2is(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
+			   const struct sys_reg_desc *r)
+{
+	if (vcpu_has_nv(vcpu) && forward_nv_traps(vcpu))
+		return false;
+
+	/*
+	 * To emulate invalidating all EL2 regime stage 1 TLB entries for all
+	 * PEs, executing TLBI VMALLE1IS is enough. But reuse the existing
+	 * interface for simplicity; invalidating stage 2 entries doesn't
+	 * affect the correctness.
+	 */
+	__kvm_tlb_flush_vmid(&vcpu->kvm->arch.mmu);
+	return true;
+}
+
+static bool handle_vae2is(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
+			  const struct sys_reg_desc *r)
+{
+	int sys_encoding = sys_insn(p->Op0, p->Op1, p->CRn, p->CRm, p->Op2);
+
+	if (vcpu_has_nv(vcpu) && forward_nv_traps(vcpu))
+		return false;
+
+	/*
+	 * Based on the same principle as TLBI ALLE2 instruction
+	 * emulation, we emulate TLBI VAE2* instructions by executing
+	 * corresponding TLBI VAE1* instructions with the virtual
+	 * EL2's VMID assigned by the host hypervisor.
+	 */
+	__kvm_tlb_vae2is(&vcpu->kvm->arch.mmu, p->regval, sys_encoding);
+	return true;
+}
+
+static bool handle_alle1is(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
+			   const struct sys_reg_desc *r)
+{
+	struct kvm_s2_mmu *mmu = &vcpu->kvm->arch.mmu;
+
+	if (vcpu_has_nv(vcpu) && forward_nv_traps(vcpu))
+		return false;
+
+	spin_lock(&vcpu->kvm->mmu_lock);
+
+	/*
+	 * Clear all mappings in the shadow page tables and invalidate the stage
+	 * 1 and 2 TLB entries via kvm_tlb_flush_vmid_ipa().
+	 */
+	kvm_nested_s2_clear(vcpu->kvm);
+
+	if (mmu->vmid.vmid_gen) {
+		/*
+		 * Invalidate the stage 1 and 2 TLB entries for the host OS
+		 * in a VM only if there is one.
+		 */
+		__kvm_tlb_flush_vmid(mmu);
+	}
+
+	spin_unlock(&vcpu->kvm->mmu_lock);
+
+	return true;
+}
+
+static bool handle_vmalls12e1is(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
+				const struct sys_reg_desc *r)
+{
+	struct kvm_s2_mmu *mmu;
+	u64 vttbr;
+
+	if (vcpu_has_nv(vcpu) && forward_nv_traps(vcpu))
+		return false;
+
+	vttbr = vcpu_read_sys_reg(vcpu, VTTBR_EL2);
+	spin_lock(&vcpu->kvm->mmu_lock);
+
+	mmu = lookup_s2_mmu(vcpu->kvm, vttbr, HCR_VM);
+	if (mmu)
+		kvm_unmap_stage2_range(mmu, 0, kvm_phys_size(vcpu->kvm));
+
+	mmu = lookup_s2_mmu(vcpu->kvm, vttbr, 0);
+	if (mmu)
+		kvm_unmap_stage2_range(mmu, 0, kvm_phys_size(vcpu->kvm));
+
+	spin_unlock(&vcpu->kvm->mmu_lock);
+
+	return true;
+}
+
+static bool handle_ipas2e1is(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
+			     const struct sys_reg_desc *r)
+{
+	u64 vttbr = vcpu_read_sys_reg(vcpu, VTTBR_EL2);
+	u64 vtcr = vcpu_read_sys_reg(vcpu, VTCR_EL2);
+	struct kvm_s2_mmu *mmu;
+	u64 base_addr;
+	int max_size;
+
+	if (vcpu_has_nv(vcpu) && forward_nv_traps(vcpu))
+		return false;
+
+	/*
+	 * We drop a number of things from the supplied value:
+	 *
+	 * - NS bit: we're non-secure only.
+	 *
+	 * - TTL field: We already have the granule size from the
+	 *   VTCR_EL2.TG0 field, and the level is only relevant to the
+	 *   guest's S2PT.
+	 *
+	 * - IPA[51:48]: We don't support 52bit IPA just yet...
+	 *
+	 * And of course, adjust the IPA so that it is an actual address.
+	 */
+	base_addr = (p->regval & GENMASK_ULL(35, 0)) << 12;
+
+	/* Compute the maximum extent of the invalidation */
+	switch ((vtcr & VTCR_EL2_TG0_MASK)) {
+	case VTCR_EL2_TG0_4K:
+		max_size = SZ_1G;
+		break;
+	case VTCR_EL2_TG0_16K:
+		max_size = SZ_32M;
+		break;
+	case VTCR_EL2_TG0_64K:
+		/*
+		 * No, we do not support 52bit IPA in nested yet. Once
+		 * we do, this should be 4TB.
+		 */
+		/* FIXME: remove the 52bit PA support from the IDregs */
+		max_size = SZ_512M;
+		break;
+	default:
+		BUG();
+	}
+
+	spin_lock(&vcpu->kvm->mmu_lock);
+
+	mmu = lookup_s2_mmu(vcpu->kvm, vttbr, HCR_VM);
+	if (mmu)
+		kvm_unmap_stage2_range(mmu, base_addr, max_size);
+
+	mmu = lookup_s2_mmu(vcpu->kvm, vttbr, 0);
+	if (mmu)
+		kvm_unmap_stage2_range(mmu, base_addr, max_size);
+
+	spin_unlock(&vcpu->kvm->mmu_lock);
+
+	return true;
+}
+
+static bool handle_tlbi_el1(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
+			    const struct sys_reg_desc *r)
+{
+	u32 sys_encoding = sys_insn(p->Op0, p->Op1, p->CRn, p->CRm, p->Op2);
+
+	if (vcpu_has_nv(vcpu) && forward_traps(vcpu, HCR_TTLB))
+		return false;
+
+	/*
+	 * If we're here, this is because we've trapped on an EL1 TLBI
+	 * instruction that affects the EL1 translation regime while
+	 * we're running in a context that doesn't allow us to let the
+	 * HW do its thing (aka vEL2):
+	 *
+	 * - HCR_EL2.E2H == 0 : a non-VHE guest
+	 * - HCR_EL2.{E2H,TGE} == { 1, 0 } : a VHE guest in guest mode
+	 *
+	 * We don't expect these helpers to ever be called when running
+	 * in a vEL1 context.
+	 */
+
+	WARN_ON(!vcpu_is_el2(vcpu));
+
+	mutex_lock(&vcpu->kvm->lock);
+
+	if ((__vcpu_sys_reg(vcpu, HCR_EL2) & (HCR_E2H | HCR_TGE)) != (HCR_E2H | HCR_TGE)) {
+		u64 virtual_vttbr = vcpu_read_sys_reg(vcpu, VTTBR_EL2);
+		struct kvm_s2_mmu *mmu;
+
+		mmu = lookup_s2_mmu(vcpu->kvm, virtual_vttbr, HCR_VM);
+		if (mmu)
+			__kvm_tlb_el1_instr(mmu, p->regval, sys_encoding);
+
+		mmu = lookup_s2_mmu(vcpu->kvm, virtual_vttbr, 0);
+		if (mmu)
+			__kvm_tlb_el1_instr(mmu, p->regval, sys_encoding);
+	} else {
+		/*
+		 * ARMv8.4-NV allows the guest to change TGE behind
+		 * our back, so we always trap EL1 TLBIs from vEL2...
+		 */
+		__kvm_tlb_el1_instr(&vcpu->kvm->arch.mmu, p->regval, sys_encoding);
+	}
+
+	mutex_unlock(&vcpu->kvm->lock);
+
+	return true;
+}
+
 /*
  * AT instruction emulation
  *
@@ -2459,12 +2658,40 @@ static struct sys_reg_desc sys_insn_descs[] = {
 	{ SYS_DESC(SYS_DC_CSW), access_dcsw },
 	{ SYS_DESC(SYS_DC_CISW), access_dcsw },
 
+	SYS_INSN(TLBI_VMALLE1IS, handle_tlbi_el1),
+	SYS_INSN(TLBI_VAE1IS, handle_tlbi_el1),
+	SYS_INSN(TLBI_ASIDE1IS, handle_tlbi_el1),
+	SYS_INSN(TLBI_VAAE1IS, handle_tlbi_el1),
+	SYS_INSN(TLBI_VALE1IS, handle_tlbi_el1),
+	SYS_INSN(TLBI_VAALE1IS, handle_tlbi_el1),
+	SYS_INSN(TLBI_VMALLE1, handle_tlbi_el1),
+	SYS_INSN(TLBI_VAE1, handle_tlbi_el1),
+	SYS_INSN(TLBI_ASIDE1, handle_tlbi_el1),
+	SYS_INSN(TLBI_VAAE1, handle_tlbi_el1),
+	SYS_INSN(TLBI_VALE1, handle_tlbi_el1),
+	SYS_INSN(TLBI_VAALE1, handle_tlbi_el1),
+
 	SYS_INSN(AT_S1E2R, handle_s1e2),
 	SYS_INSN(AT_S1E2W, handle_s1e2),
 	SYS_INSN(AT_S12E1R, handle_s12r),
 	SYS_INSN(AT_S12E1W, handle_s12w),
 	SYS_INSN(AT_S12E0R, handle_s12r),
 	SYS_INSN(AT_S12E0W, handle_s12w),
+
+	SYS_INSN(TLBI_IPAS2E1IS, handle_ipas2e1is),
+	SYS_INSN(TLBI_IPAS2LE1IS, handle_ipas2e1is),
+	SYS_INSN(TLBI_ALLE2IS, handle_alle2is),
+	SYS_INSN(TLBI_VAE2IS, handle_vae2is),
+	SYS_INSN(TLBI_ALLE1IS, handle_alle1is),
+	SYS_INSN(TLBI_VALE2IS, handle_vae2is),
+	SYS_INSN(TLBI_VMALLS12E1IS, handle_vmalls12e1is),
+	SYS_INSN(TLBI_IPAS2E1, handle_ipas2e1is),
+	SYS_INSN(TLBI_IPAS2LE1, handle_ipas2e1is),
+	SYS_INSN(TLBI_ALLE2, handle_alle2is),
+	SYS_INSN(TLBI_VAE2, handle_vae2is),
+	SYS_INSN(TLBI_ALLE1, handle_alle1is),
+	SYS_INSN(TLBI_VALE2, handle_vae2is),
+	SYS_INSN(TLBI_VMALLS12E1, handle_vmalls12e1is),
 };
 
 static bool trap_dbgdidr(struct kvm_vcpu *vcpu,
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 378+ messages in thread

* [PATCH v6 41/64] KVM: arm64: nv: Trap and emulate TLBI instructions from virtual EL2
@ 2022-01-28 12:18   ` Marc Zyngier
  0 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:18 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: kernel-team, Andre Przywara, Christoffer Dall, Chase Conklin,
	Russell King (Oracle),
	mihai.carabas, Ganapatrao Kulkarni

From: Jintack Lim <jintack.lim@linaro.org>

When supporting nested virtualization a guest hypervisor executing TLBI
instructions must be trapped and emulated by the host hypervisor,
because the guest hypervisor can only affect physical TLB entries
relating to its own execution environment (virtual EL2 in EL1) but not
to the nested guests as required by the semantics of the instructions
and TLBI instructions might also result in updates (invalidations) to
shadow page tables.

This patch does several things.

1. List and define all TLBI system instructions to emulate.

2. Emulate TLBI ALLE2(IS) instruction executed in the virtual EL2. Since
we emulate the virtual EL2 in the EL1, we invalidate EL1&0 regime stage
1 TLB entries with setting vttbr_el2 having the VMID of the virtual EL2.

3. Emulate TLBI VAE2* instruction executed in the virtual EL2. Based on the
same principle as TLBI ALLE2 instruction, we can simply emulate those
instructions by executing corresponding VAE1* instructions with the
virtual EL2's VMID assigned by the host hypervisor.

Note that we are able to emulate TLBI ALLE2IS precisely by only
invalidating stage 1 TLB entries via TLBI VMALL1IS instruction, but to
make it simeple, we reuse the existing function, __kvm_tlb_flush_vmid(),
which invalidates both of stage 1 and 2 TLB entries.

4. TLBI ALLE1(IS) instruction invalidates all EL1&0 regime stage 1 and 2
TLB entries (on all PEs in the same Inner Shareable domain). To emulate
these instructions, we first need to clear all the mappings in the
shadow page tables since executing those instructions implies the change
of mappings in the stage 2 page tables maintained by the guest
hypervisor.  We then need to invalidate all EL1&0 regime stage 1 and 2
TLB entries of all VMIDs, which are assigned by the host hypervisor, for
this VM.

5. Based on the same principle as TLBI ALLE1(IS) emulation, we clear the
mappings in the shadow stage-2 page tables and invalidate TLB entries.
But this time we do it only for the current VMID from the guest
hypervisor's perspective, not for all VMIDs.

6. Based on the same principle as TLBI ALLE1(IS) and TLBI VMALLS12E1(IS)
emulation, we clear the mappings in the shadow stage-2 page tables and
invalidate TLB entries. We do it only for one mapping for the current
VMID from the guest hypervisor's view.

7. Forward system instruction traps to the virtual EL2 if a
corresponding bit in the virtual HCR_EL2 is set.

8. Even though a guest hypervisor can execute TLBI instructions that are
accesible at EL1 without trap, it's wrong; All those TLBI instructions
work based on current VMID, and when running a guest hypervisor current
VMID is the one for itself, not the one from the virtual vttbr_el2. So
letting a guest hypervisor execute those TLBI instructions results in
invalidating its own TLB entries and leaving invalid TLB entries
unhandled.

Therefore we trap and emulate those TLBI instructions. The emulation is
simple; we find a shadow VMID mapped to the virtual vttbr_el2, set it in
the physical vttbr_el2, then execute the same instruction in EL2.

We don't set HCR_EL2.TTLB bit yet.

  [ Changes performed by Marc Zynger:

    The TLBI handling code more or less directly execute the same
    instruction that has been trapped (with an EL2->EL1 conversion
    in the case of an EL2 TLBI), but that's unfortunately not enough:

    - TLBIs must be upgraded to the Inner Shareable domain to account
      for vcpu migration, just like we already have with HCR_EL2.FB.

    - The DSB instruction that synchronises these must thus be on
      the Inner Shareable domain as well.

    - Prior to executing the TLBI, we need another DSB ISHST to make
      sure that the update to the page tables is now visible.

      Ordering of system instructions fixed

    - The current TLB invalidation code is pretty buggy, as it assume a
      page mapping. On the contrary, it is likely that TLB invalidation
      will cover more than a single page, and the size should be decided
      by the guests configuration (and not the host's).

      Since we don't cache the guest mapping sizes in the shadow PT yet,
      let's assume the worse case (a block mapping) and invalidate that.

      Take this opportunity to fix the decoding of the parameter (it
      isn't a straight IPA).

    - In general, we always emulate local TBL invalidations as being
      as upgraded to the Inner Shareable domain so that we can easily
      deal with vcpu migration. This is consistent with the fact that
      we set HCR_EL2.FB when running non-nested VMs.

      So let's emulate TLBI ALLE2 as ALLE2IS.
  ]

  [ Changes performed by Christoffer Dall:

    Sometimes when we are invalidating the TLB for a certain S2 MMU
    context, this context can also have EL2 context associated with it
    and we have to invalidate this too.
  ]

Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_asm.h |   2 +
 arch/arm64/include/asm/sysreg.h  |  36 +++++
 arch/arm64/kvm/hyp/vhe/switch.c  |   8 +-
 arch/arm64/kvm/hyp/vhe/tlb.c     |  81 +++++++++++
 arch/arm64/kvm/mmu.c             |  19 ++-
 arch/arm64/kvm/sys_regs.c        | 227 +++++++++++++++++++++++++++++++
 6 files changed, 368 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index e22861ece3c3..64ddbecab241 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -206,6 +206,8 @@ extern void __kvm_flush_cpu_context(struct kvm_s2_mmu *mmu);
 extern void __kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu, phys_addr_t ipa,
 				     int level);
 extern void __kvm_tlb_flush_vmid(struct kvm_s2_mmu *mmu);
+extern void __kvm_tlb_vae2is(struct kvm_s2_mmu *mmu, u64 va, u64 sys_encoding);
+extern void __kvm_tlb_el1_instr(struct kvm_s2_mmu *mmu, u64 val, u64 sys_encoding);
 
 extern void __kvm_timer_set_cntvoff(u64 cntvoff);
 extern void __kvm_at_s1e01(struct kvm_vcpu *vcpu, u32 op, u64 vaddr);
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index ea17d1eabfc5..41c3603a3f29 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -675,6 +675,42 @@
 #define OP_AT_S12E0R	sys_insn(AT_Op0, 4, AT_CRn, 8, 6)
 #define OP_AT_S12E0W	sys_insn(AT_Op0, 4, AT_CRn, 8, 7)
 
+/* TLBI instructions */
+#define TLBI_Op0	1
+#define TLBI_Op1_EL1	0	/* Accessible from EL1 or higher */
+#define TLBI_Op1_EL2	4	/* Accessible from EL2 or higher */
+#define TLBI_CRn	8
+#define tlbi_insn_el1(CRm, Op2)	sys_insn(TLBI_Op0, TLBI_Op1_EL1, TLBI_CRn, (CRm), (Op2))
+#define tlbi_insn_el2(CRm, Op2)	sys_insn(TLBI_Op0, TLBI_Op1_EL2, TLBI_CRn, (CRm), (Op2))
+
+#define OP_TLBI_VMALLE1IS	tlbi_insn_el1(3, 0)
+#define OP_TLBI_VAE1IS		tlbi_insn_el1(3, 1)
+#define OP_TLBI_ASIDE1IS	tlbi_insn_el1(3, 2)
+#define OP_TLBI_VAAE1IS		tlbi_insn_el1(3, 3)
+#define OP_TLBI_VALE1IS		tlbi_insn_el1(3, 5)
+#define OP_TLBI_VAALE1IS	tlbi_insn_el1(3, 7)
+#define OP_TLBI_VMALLE1		tlbi_insn_el1(7, 0)
+#define OP_TLBI_VAE1		tlbi_insn_el1(7, 1)
+#define OP_TLBI_ASIDE1		tlbi_insn_el1(7, 2)
+#define OP_TLBI_VAAE1		tlbi_insn_el1(7, 3)
+#define OP_TLBI_VALE1		tlbi_insn_el1(7, 5)
+#define OP_TLBI_VAALE1		tlbi_insn_el1(7, 7)
+
+#define OP_TLBI_IPAS2E1IS	tlbi_insn_el2(0, 1)
+#define OP_TLBI_IPAS2LE1IS	tlbi_insn_el2(0, 5)
+#define OP_TLBI_ALLE2IS		tlbi_insn_el2(3, 0)
+#define OP_TLBI_VAE2IS		tlbi_insn_el2(3, 1)
+#define OP_TLBI_ALLE1IS		tlbi_insn_el2(3, 4)
+#define OP_TLBI_VALE2IS		tlbi_insn_el2(3, 5)
+#define OP_TLBI_VMALLS12E1IS	tlbi_insn_el2(3, 6)
+#define OP_TLBI_IPAS2E1		tlbi_insn_el2(4, 1)
+#define OP_TLBI_IPAS2LE1	tlbi_insn_el2(4, 5)
+#define OP_TLBI_ALLE2		tlbi_insn_el2(7, 0)
+#define OP_TLBI_VAE2		tlbi_insn_el2(7, 1)
+#define OP_TLBI_ALLE1		tlbi_insn_el2(7, 4)
+#define OP_TLBI_VALE2		tlbi_insn_el2(7, 5)
+#define OP_TLBI_VMALLS12E1	tlbi_insn_el2(7, 6)
+
 /* Common SCTLR_ELx flags. */
 #define SCTLR_ELx_DSSBS	(BIT(44))
 #define SCTLR_ELx_ATA	(BIT(43))
diff --git a/arch/arm64/kvm/hyp/vhe/switch.c b/arch/arm64/kvm/hyp/vhe/switch.c
index b7790d3c4122..0e164cc8e913 100644
--- a/arch/arm64/kvm/hyp/vhe/switch.c
+++ b/arch/arm64/kvm/hyp/vhe/switch.c
@@ -44,7 +44,7 @@ static void __activate_traps(struct kvm_vcpu *vcpu)
 			 * the EL1 virtual memory control register accesses
 			 * as well as the AT S1 operations.
 			 */
-			hcr |= HCR_TVM | HCR_TRVM | HCR_AT | HCR_NV1;
+			hcr |= HCR_TVM | HCR_TRVM | HCR_AT | HCR_TTLB | HCR_NV1;
 		} else {
 			/*
 			 * For a guest hypervisor on v8.1 (VHE), allow to
@@ -72,11 +72,11 @@ static void __activate_traps(struct kvm_vcpu *vcpu)
 
 			/*
 			 * If we're using the EL1 translation regime
-			 * (TGE clear), then ensure that AT S1 ops are
-			 * trapped too.
+			 * (TGE clear), then ensure that AT S1 and
+			 * TLBI E1 ops are trapped too.
 			 */
 			if (!vcpu_el2_tge_is_set(vcpu))
-				hcr |= HCR_AT;
+				hcr |= HCR_AT | HCR_TTLB;
 		}
 	}
 
diff --git a/arch/arm64/kvm/hyp/vhe/tlb.c b/arch/arm64/kvm/hyp/vhe/tlb.c
index 24cef9b87f9e..c4389db4cc22 100644
--- a/arch/arm64/kvm/hyp/vhe/tlb.c
+++ b/arch/arm64/kvm/hyp/vhe/tlb.c
@@ -161,3 +161,84 @@ void __kvm_flush_vm_context(void)
 
 	dsb(ish);
 }
+
+void __kvm_tlb_vae2is(struct kvm_s2_mmu *mmu, u64 va, u64 sys_encoding)
+{
+	struct tlb_inv_context cxt;
+
+	dsb(ishst);
+
+	/* Switch to requested VMID */
+	__tlb_switch_to_guest(mmu, &cxt);
+
+	/*
+	 * Execute the EL1 version of TLBI VAE2* instruction, forcing
+	 * an upgrade to the Inner Shareable domain in order to
+	 * perform the invalidation on all CPUs.
+	 */
+	switch (sys_encoding) {
+	case OP_TLBI_VAE2:
+	case OP_TLBI_VAE2IS:
+		__tlbi(vae1is, va);
+		break;
+	case OP_TLBI_VALE2:
+	case OP_TLBI_VALE2IS:
+		__tlbi(vale1is, va);
+		break;
+	default:
+		break;
+	}
+	dsb(ish);
+	isb();
+
+	__tlb_switch_to_host(&cxt);
+}
+
+void __kvm_tlb_el1_instr(struct kvm_s2_mmu *mmu, u64 val, u64 sys_encoding)
+{
+	struct tlb_inv_context cxt;
+
+	dsb(ishst);
+
+	/* Switch to requested VMID */
+	__tlb_switch_to_guest(mmu, &cxt);
+
+	/*
+	 * Execute the same instruction as the guest hypervisor did,
+	 * expanding the scope of local TLB invalidations to the Inner
+	 * Shareable domain so that it takes place on all CPUs. This
+	 * is equivalent to having HCR_EL2.FB set.
+	 */
+	switch (sys_encoding) {
+	case OP_TLBI_VMALLE1:
+	case OP_TLBI_VMALLE1IS:
+		__tlbi(vmalle1is);
+		break;
+	case OP_TLBI_VAE1:
+	case OP_TLBI_VAE1IS:
+		__tlbi(vae1is, val);
+		break;
+	case OP_TLBI_ASIDE1:
+	case OP_TLBI_ASIDE1IS:
+		__tlbi(aside1is, val);
+		break;
+	case OP_TLBI_VAAE1:
+	case OP_TLBI_VAAE1IS:
+		__tlbi(vaae1is, val);
+		break;
+	case OP_TLBI_VALE1:
+	case OP_TLBI_VALE1IS:
+		__tlbi(vale1is, val);
+		break;
+	case OP_TLBI_VAALE1:
+	case OP_TLBI_VAALE1IS:
+		__tlbi(vaale1is, val);
+		break;
+	default:
+		break;
+	}
+	dsb(ish);
+	isb();
+
+	__tlb_switch_to_host(&cxt);
+}
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index b9b11be65009..4b38f2f3c997 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -80,8 +80,25 @@ static bool memslot_is_logging(struct kvm_memory_slot *memslot)
  */
 void kvm_flush_remote_tlbs(struct kvm *kvm)
 {
+	struct kvm_s2_mmu *mmu = &kvm->arch.mmu;
+
 	++kvm->stat.generic.remote_tlb_flush_requests;
-	kvm_call_hyp(__kvm_tlb_flush_vmid, &kvm->arch.mmu);
+
+	if (mmu == &kvm->arch.mmu) {
+		/*
+		 * For a normal (i.e. non-nested) guest, flush entries for the
+		 * given VMID *
+		 */
+		kvm_call_hyp(__kvm_tlb_flush_vmid, mmu);
+	} else {
+		/*
+		 * When supporting nested virtualization, we can have multiple
+		 * VMIDs in play for each VCPU in the VM, so it's really not
+		 * worth it to try to quiesce the system and flush all the
+		 * VMIDs that may be in use, instead just nuke the whole thing.
+		 */
+		kvm_call_hyp(__kvm_flush_vm_context);
+	}
 }
 
 static bool kvm_is_device_pfn(unsigned long pfn)
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 7be57e1b7019..d7441b8ba406 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -2374,6 +2374,205 @@ static bool handle_s12w(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
 	return handle_s12(vcpu, p, r, true);
 }
 
+static bool handle_alle2is(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
+			   const struct sys_reg_desc *r)
+{
+	if (vcpu_has_nv(vcpu) && forward_nv_traps(vcpu))
+		return false;
+
+	/*
+	 * To emulate invalidating all EL2 regime stage 1 TLB entries for all
+	 * PEs, executing TLBI VMALLE1IS is enough. But reuse the existing
+	 * interface for the simplicity; invalidating stage 2 entries doesn't
+	 * affect the correctness.
+	 */
+	__kvm_tlb_flush_vmid(&vcpu->kvm->arch.mmu);
+	return true;
+}
+
+static bool handle_vae2is(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
+			  const struct sys_reg_desc *r)
+{
+	int sys_encoding = sys_insn(p->Op0, p->Op1, p->CRn, p->CRm, p->Op2);
+
+	if (vcpu_has_nv(vcpu) && forward_nv_traps(vcpu))
+		return false;
+
+	/*
+	 * Based on the same principle as TLBI ALLE2 instruction
+	 * emulation, we emulate TLBI VAE2* instructions by executing
+	 * corresponding TLBI VAE1* instructions with the virtual
+	 * EL2's VMID assigned by the host hypervisor.
+	 */
+	__kvm_tlb_vae2is(&vcpu->kvm->arch.mmu, p->regval, sys_encoding);
+	return true;
+}
+
+static bool handle_alle1is(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
+			   const struct sys_reg_desc *r)
+{
+	struct kvm_s2_mmu *mmu = &vcpu->kvm->arch.mmu;
+
+	if (vcpu_has_nv(vcpu) && forward_nv_traps(vcpu))
+		return false;
+
+	spin_lock(&vcpu->kvm->mmu_lock);
+
+	/*
+	 * Clear all mappings in the shadow page tables and invalidate the stage
+	 * 1 and 2 TLB entries via kvm_tlb_flush_vmid_ipa().
+	 */
+	kvm_nested_s2_clear(vcpu->kvm);
+
+	if (mmu->vmid.vmid_gen) {
+		/*
+		 * Invalidate the stage 1 and 2 TLB entries for the host OS
+		 * in a VM only if there is one.
+		 */
+		__kvm_tlb_flush_vmid(mmu);
+	}
+
+	spin_unlock(&vcpu->kvm->mmu_lock);
+
+	return true;
+}
+
+static bool handle_vmalls12e1is(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
+				const struct sys_reg_desc *r)
+{
+	struct kvm_s2_mmu *mmu;
+	u64 vttbr;
+
+	if (vcpu_has_nv(vcpu) && forward_nv_traps(vcpu))
+		return false;
+
+	vttbr = vcpu_read_sys_reg(vcpu, VTTBR_EL2);
+	spin_lock(&vcpu->kvm->mmu_lock);
+
+	mmu = lookup_s2_mmu(vcpu->kvm, vttbr, HCR_VM);
+	if (mmu)
+		kvm_unmap_stage2_range(mmu, 0, kvm_phys_size(vcpu->kvm));
+
+	mmu = lookup_s2_mmu(vcpu->kvm, vttbr, 0);
+	if (mmu)
+		kvm_unmap_stage2_range(mmu, 0, kvm_phys_size(vcpu->kvm));
+
+	spin_unlock(&vcpu->kvm->mmu_lock);
+
+	return true;
+}
+
+static bool handle_ipas2e1is(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
+			     const struct sys_reg_desc *r)
+{
+	u64 vttbr = vcpu_read_sys_reg(vcpu, VTTBR_EL2);
+	u64 vtcr = vcpu_read_sys_reg(vcpu, VTCR_EL2);
+	struct kvm_s2_mmu *mmu;
+	u64 base_addr;
+	int max_size;
+
+	if (vcpu_has_nv(vcpu) && forward_nv_traps(vcpu))
+		return false;
+
+	/*
+	 * We drop a number of things from the supplied value:
+	 *
+	 * - NS bit: we're non-secure only.
+	 *
+	 * - TTL field: We already have the granule size from the
+	 *   VTCR_EL2.TG0 field, and the level is only relevant to the
+	 *   guest's S2PT.
+	 *
+	 * - IPA[51:48]: We don't support 52bit IPA just yet...
+	 *
+	 * And of course, adjust the IPA to be on an actual address.
+	 */
+	base_addr = (p->regval & GENMASK_ULL(35, 0)) << 12;
+
+	/* Compute the maximum extent of the invalidation */
+	switch ((vtcr & VTCR_EL2_TG0_MASK)) {
+	case VTCR_EL2_TG0_4K:
+		max_size = SZ_1G;
+		break;
+	case VTCR_EL2_TG0_16K:
+		max_size = SZ_32M;
+		break;
+	case VTCR_EL2_TG0_64K:
+		/*
+		 * No, we do not support 52bit IPA in nested yet. Once
+		 * we do, this should be 4TB.
+		 */
+		/* FIXME: remove the 52bit PA support from the IDregs */
+		max_size = SZ_512M;
+		break;
+	default:
+		BUG();
+	}
+
+	spin_lock(&vcpu->kvm->mmu_lock);
+
+	mmu = lookup_s2_mmu(vcpu->kvm, vttbr, HCR_VM);
+	if (mmu)
+		kvm_unmap_stage2_range(mmu, base_addr, max_size);
+
+	mmu = lookup_s2_mmu(vcpu->kvm, vttbr, 0);
+	if (mmu)
+		kvm_unmap_stage2_range(mmu, base_addr, max_size);
+
+	spin_unlock(&vcpu->kvm->mmu_lock);
+
+	return true;
+}
+
+static bool handle_tlbi_el1(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
+			    const struct sys_reg_desc *r)
+{
+	u32 sys_encoding = sys_insn(p->Op0, p->Op1, p->CRn, p->CRm, p->Op2);
+
+	if (vcpu_has_nv(vcpu) && forward_traps(vcpu, HCR_TTLB))
+		return false;
+
+	/*
+	 * If we're here, this is because we've trapped on an EL1 TLBI
+	 * instruction that affects the EL1 translation regime while
+	 * we're running in a context that doesn't allow us to let the
+	 * HW do its thing (aka vEL2):
+	 *
+	 * - HCR_EL2.E2H == 0 : a non-VHE guest
+	 * - HCR_EL2.{E2H,TGE} == { 1, 0 } : a VHE guest in guest mode
+	 *
+	 * We don't expect these helpers to ever be called when running
+	 * in a vEL1 context.
+	 */
+
+	WARN_ON(!vcpu_is_el2(vcpu));
+
+	mutex_lock(&vcpu->kvm->lock);
+
+	if ((__vcpu_sys_reg(vcpu, HCR_EL2) & (HCR_E2H | HCR_TGE)) != (HCR_E2H | HCR_TGE)) {
+		u64 virtual_vttbr = vcpu_read_sys_reg(vcpu, VTTBR_EL2);
+		struct kvm_s2_mmu *mmu;
+
+		mmu = lookup_s2_mmu(vcpu->kvm, virtual_vttbr, HCR_VM);
+		if (mmu)
+			__kvm_tlb_el1_instr(mmu, p->regval, sys_encoding);
+
+		mmu = lookup_s2_mmu(vcpu->kvm, virtual_vttbr, 0);
+		if (mmu)
+			__kvm_tlb_el1_instr(mmu, p->regval, sys_encoding);
+	} else {
+		/*
+		 * ARMv8.4-NV allows the guest to change TGE behind
+		 * our back, so we always trap EL1 TLBIs from vEL2...
+		 */
+		__kvm_tlb_el1_instr(&vcpu->kvm->arch.mmu, p->regval, sys_encoding);
+	}
+
+	mutex_unlock(&vcpu->kvm->lock);
+
+	return true;
+}
+
 /*
  * AT instruction emulation
  *
@@ -2459,12 +2658,40 @@ static struct sys_reg_desc sys_insn_descs[] = {
 	{ SYS_DESC(SYS_DC_CSW), access_dcsw },
 	{ SYS_DESC(SYS_DC_CISW), access_dcsw },
 
+	SYS_INSN(TLBI_VMALLE1IS, handle_tlbi_el1),
+	SYS_INSN(TLBI_VAE1IS, handle_tlbi_el1),
+	SYS_INSN(TLBI_ASIDE1IS, handle_tlbi_el1),
+	SYS_INSN(TLBI_VAAE1IS, handle_tlbi_el1),
+	SYS_INSN(TLBI_VALE1IS, handle_tlbi_el1),
+	SYS_INSN(TLBI_VAALE1IS, handle_tlbi_el1),
+	SYS_INSN(TLBI_VMALLE1, handle_tlbi_el1),
+	SYS_INSN(TLBI_VAE1, handle_tlbi_el1),
+	SYS_INSN(TLBI_ASIDE1, handle_tlbi_el1),
+	SYS_INSN(TLBI_VAAE1, handle_tlbi_el1),
+	SYS_INSN(TLBI_VALE1, handle_tlbi_el1),
+	SYS_INSN(TLBI_VAALE1, handle_tlbi_el1),
+
 	SYS_INSN(AT_S1E2R, handle_s1e2),
 	SYS_INSN(AT_S1E2W, handle_s1e2),
 	SYS_INSN(AT_S12E1R, handle_s12r),
 	SYS_INSN(AT_S12E1W, handle_s12w),
 	SYS_INSN(AT_S12E0R, handle_s12r),
 	SYS_INSN(AT_S12E0W, handle_s12w),
+
+	SYS_INSN(TLBI_IPAS2E1IS, handle_ipas2e1is),
+	SYS_INSN(TLBI_IPAS2LE1IS, handle_ipas2e1is),
+	SYS_INSN(TLBI_ALLE2IS, handle_alle2is),
+	SYS_INSN(TLBI_VAE2IS, handle_vae2is),
+	SYS_INSN(TLBI_ALLE1IS, handle_alle1is),
+	SYS_INSN(TLBI_VALE2IS, handle_vae2is),
+	SYS_INSN(TLBI_VMALLS12E1IS, handle_vmalls12e1is),
+	SYS_INSN(TLBI_IPAS2E1, handle_ipas2e1is),
+	SYS_INSN(TLBI_IPAS2LE1, handle_ipas2e1is),
+	SYS_INSN(TLBI_ALLE2, handle_alle2is),
+	SYS_INSN(TLBI_VAE2, handle_vae2is),
+	SYS_INSN(TLBI_ALLE1, handle_alle1is),
+	SYS_INSN(TLBI_VALE2, handle_vae2is),
+	SYS_INSN(TLBI_VMALLS12E1, handle_vmalls12e1is),
 };
 
 static bool trap_dbgdidr(struct kvm_vcpu *vcpu,
-- 
2.30.2

^ permalink raw reply related	[flat|nested] 378+ messages in thread

* [PATCH v6 41/64] KVM: arm64: nv: Trap and emulate TLBI instructions from virtual EL2
@ 2022-01-28 12:18   ` Marc Zyngier
  0 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:18 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Jintack Lim, Haibo Xu,
	Ganapatrao Kulkarni, Chase Conklin, Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

From: Jintack Lim <jintack.lim@linaro.org>

When supporting nested virtualization, TLBI instructions executed by a
guest hypervisor must be trapped and emulated by the host hypervisor:
the guest hypervisor can only affect physical TLB entries relating to
its own execution environment (virtual EL2 in EL1), not those of the
nested guests as the semantics of the instructions require, and the
TLBI instructions may also require updates (invalidations) to the
shadow page tables.

This patch does several things.

1. List and define all TLBI system instructions to emulate.
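
As an illustration (not part of the patch), here is a small,
self-contained C sketch of the usual arm64 sys_insn packing (Op0 in
bits [20:19], Op1 in [18:16], CRn in [15:12], CRm in [11:8], Op2 in
[7:5]). The helper below is only a stand-in for the sysreg.h macros
added further down; the point is that an EL1 TLBI and its EL2
counterpart differ only in the Op1 field.

  #include <stdint.h>
  #include <stdio.h>

  /* Illustrative stand-in for the kernel's sys_insn()/tlbi_insn_*() macros */
  static uint32_t sys_insn(uint32_t op0, uint32_t op1, uint32_t crn,
                           uint32_t crm, uint32_t op2)
  {
          return (op0 << 19) | (op1 << 16) | (crn << 12) | (crm << 8) | (op2 << 5);
  }

  int main(void)
  {
          /* TLBI VAE1IS: Op0=1, Op1=0, CRn=8, CRm=3, Op2=1 */
          uint32_t vae1is = sys_insn(1, 0, 8, 3, 1);
          /* TLBI VAE2IS: same encoding except Op1=4 (the EL2 view) */
          uint32_t vae2is = sys_insn(1, 4, 8, 3, 1);

          printf("VAE1IS=%#x VAE2IS=%#x\n", vae1is, vae2is);
          return 0;
  }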

2. Emulate the TLBI ALLE2(IS) instruction executed in virtual EL2. Since
we emulate virtual EL2 in EL1, we invalidate the EL1&0 regime stage 1
TLB entries using the VMID that the host assigned to the virtual EL2 in
vttbr_el2.

3. Emulate the TLBI VAE2* instructions executed in virtual EL2. Following
the same principle as the TLBI ALLE2 emulation, we simply execute the
corresponding TLBI VAE1* instructions with the VMID the host hypervisor
assigned to the virtual EL2.

Note that we could emulate TLBI ALLE2IS precisely by invalidating only
stage 1 TLB entries via the TLBI VMALLE1IS instruction, but to keep it
simple we reuse the existing function, __kvm_tlb_flush_vmid(), which
invalidates both stage 1 and stage 2 TLB entries.

4. The TLBI ALLE1(IS) instruction invalidates all EL1&0 regime stage 1
and 2 TLB entries (on all PEs in the same Inner Shareable domain). To
emulate these instructions, we first clear all the mappings in the
shadow page tables, since executing them implies a change to the
mappings in the stage 2 page tables maintained by the guest hypervisor.
We then invalidate all EL1&0 regime stage 1 and 2 TLB entries for all
the VMIDs the host hypervisor has assigned to this VM.

5. TLBI VMALLS12E1(IS): based on the same principle as the TLBI
ALLE1(IS) emulation, we clear the mappings in the shadow stage-2 page
tables and invalidate the TLB entries, but this time only for the
current VMID from the guest hypervisor's perspective, not for all VMIDs.

6. TLBI IPAS2E1(IS): based on the same principle as the TLBI ALLE1(IS)
and TLBI VMALLS12E1(IS) emulation, we clear the mappings in the shadow
stage-2 page tables and invalidate the TLB entries, but only for the
single mapping targeted by the instruction, for the current VMID from
the guest hypervisor's point of view.
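
To make the IPAS2E1(IS) handling below easier to follow, here is a
minimal, self-contained C sketch (not part of the patch) of the operand
decoding it performs: the IPA travels in bits [35:0] of the register
value, shifted right by 12, and the invalidation is sized for the worst
case of one block mapping of the stage 2 granule. The enum and struct
names are invented for the example; the real handler is
handle_ipas2e1is() in the diff.

  #include <stdint.h>
  #include <stdio.h>

  enum s2_granule { TG_4K, TG_16K, TG_64K };

  struct inv_range { uint64_t base; uint64_t size; };

  /* Simplified model of the decoding done by handle_ipas2e1is() */
  static struct inv_range ipas2_range(uint64_t regval, enum s2_granule tg)
  {
          struct inv_range r;

          /* The IPA is carried in bits [35:0] of the register value */
          r.base = (regval & ((1ULL << 36) - 1)) << 12;

          /* Worst case: one block mapping of the configured granule */
          switch (tg) {
          case TG_4K:   r.size = 1ULL << 30;    break; /* 1GB   */
          case TG_16K:  r.size = 32ULL << 20;   break; /* 32MB  */
          default:      r.size = 512ULL << 20;  break; /* 512MB (64K, no 52bit IPA) */
          }
          return r;
  }

  int main(void)
  {
          struct inv_range r = ipas2_range(0x40000, TG_4K);

          printf("invalidate [%#llx, +%#llx)\n",
                 (unsigned long long)r.base, (unsigned long long)r.size);
          return 0;
  }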

7. Forward system instruction traps to the virtual EL2 if a
corresponding bit in the virtual HCR_EL2 is set.

8. Even though a guest hypervisor can execute the TLBI instructions that
are accessible at EL1 without trapping, doing so would be wrong: those
TLBI instructions operate on the current VMID, and when a guest
hypervisor is running, the current VMID is its own, not the one
described by the virtual vttbr_el2. Letting a guest hypervisor execute
those TLBI instructions would therefore invalidate its own TLB entries
and leave the stale TLB entries of its guests in place.

Therefore we trap and emulate those TLBI instructions. The emulation is
simple: we find the shadow VMID mapped to the virtual vttbr_el2, set it
in the physical vttbr_el2, and then execute the same instruction in EL2.
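
As a rough, self-contained illustration of that lookup (not part of the
patch, and with invented types: struct shadow_ctx and find_shadow() are
placeholders, not the kernel's lookup_s2_mmu()), the idea is simply to
map the VMID field of the guest hypervisor's vttbr_el2 to the
host-allocated VMID the replayed TLBI must use.

  #include <stdint.h>
  #include <stddef.h>
  #include <stdio.h>

  struct shadow_ctx {
          uint16_t guest_vmid;   /* VMID field of the guest's vttbr_el2 */
          uint16_t host_vmid;    /* VMID the host allocated for this context */
  };

  /* Illustrative placeholder for the real lookup_s2_mmu() */
  static const struct shadow_ctx *find_shadow(const struct shadow_ctx *tbl,
                                              size_t n, uint64_t guest_vttbr)
  {
          uint16_t vmid = guest_vttbr >> 48;   /* VMID lives in the top bits */

          for (size_t i = 0; i < n; i++)
                  if (tbl[i].guest_vmid == vmid)
                          return &tbl[i];
          return NULL;
  }

  int main(void)
  {
          const struct shadow_ctx tbl[] = { { .guest_vmid = 1, .host_vmid = 7 } };
          const struct shadow_ctx *s = find_shadow(tbl, 1, 1ULL << 48);

          if (s)
                  printf("replay the TLBI with host VMID %u\n", s->host_vmid);
          return 0;
  }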

We don't set HCR_EL2.TTLB bit yet.

  [ Changes performed by Marc Zyngier:

    The TLBI handling code more or less directly executes the same
    instruction that has been trapped (with an EL2->EL1 conversion
    in the case of an EL2 TLBI), but that's unfortunately not enough:

    - TLBIs must be upgraded to the Inner Shareable domain to account
      for vcpu migration, just like we already have with HCR_EL2.FB.

    - The DSB instruction that synchronises these must thus be on
      the Inner Shareable domain as well.

    - Prior to executing the TLBI, we need another DSB ISHST to make
      sure that the update to the page tables is now visible.

      Ordering of system instructions fixed

    - The current TLB invalidation code is pretty buggy, as it assumes a
      page mapping. In reality, a TLB invalidation is likely to cover
      more than a single page, and the size should be decided by the
      guest's configuration (and not the host's).

      Since we don't cache the guest mapping sizes in the shadow PT yet,
      let's assume the worst case (a block mapping) and invalidate that.

      Take this opportunity to fix the decoding of the parameter (it
      isn't a straight IPA).

    - In general, we always emulate local TLB invalidations as if they
      were upgraded to the Inner Shareable domain so that we can easily
      deal with vcpu migration. This is consistent with the fact that
      we set HCR_EL2.FB when running non-nested VMs.

      So let's emulate TLBI ALLE2 as ALLE2IS.
  ]

  [ Changes performed by Christoffer Dall:

    Sometimes, when we are invalidating the TLB for a certain S2 MMU
    context, this context can also have an EL2 context associated with
    it, and we have to invalidate that too.
  ]

Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_asm.h |   2 +
 arch/arm64/include/asm/sysreg.h  |  36 +++++
 arch/arm64/kvm/hyp/vhe/switch.c  |   8 +-
 arch/arm64/kvm/hyp/vhe/tlb.c     |  81 +++++++++++
 arch/arm64/kvm/mmu.c             |  19 ++-
 arch/arm64/kvm/sys_regs.c        | 227 +++++++++++++++++++++++++++++++
 6 files changed, 368 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index e22861ece3c3..64ddbecab241 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -206,6 +206,8 @@ extern void __kvm_flush_cpu_context(struct kvm_s2_mmu *mmu);
 extern void __kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu, phys_addr_t ipa,
 				     int level);
 extern void __kvm_tlb_flush_vmid(struct kvm_s2_mmu *mmu);
+extern void __kvm_tlb_vae2is(struct kvm_s2_mmu *mmu, u64 va, u64 sys_encoding);
+extern void __kvm_tlb_el1_instr(struct kvm_s2_mmu *mmu, u64 val, u64 sys_encoding);
 
 extern void __kvm_timer_set_cntvoff(u64 cntvoff);
 extern void __kvm_at_s1e01(struct kvm_vcpu *vcpu, u32 op, u64 vaddr);
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index ea17d1eabfc5..41c3603a3f29 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -675,6 +675,42 @@
 #define OP_AT_S12E0R	sys_insn(AT_Op0, 4, AT_CRn, 8, 6)
 #define OP_AT_S12E0W	sys_insn(AT_Op0, 4, AT_CRn, 8, 7)
 
+/* TLBI instructions */
+#define TLBI_Op0	1
+#define TLBI_Op1_EL1	0	/* Accessible from EL1 or higher */
+#define TLBI_Op1_EL2	4	/* Accessible from EL2 or higher */
+#define TLBI_CRn	8
+#define tlbi_insn_el1(CRm, Op2)	sys_insn(TLBI_Op0, TLBI_Op1_EL1, TLBI_CRn, (CRm), (Op2))
+#define tlbi_insn_el2(CRm, Op2)	sys_insn(TLBI_Op0, TLBI_Op1_EL2, TLBI_CRn, (CRm), (Op2))
+
+#define OP_TLBI_VMALLE1IS	tlbi_insn_el1(3, 0)
+#define OP_TLBI_VAE1IS		tlbi_insn_el1(3, 1)
+#define OP_TLBI_ASIDE1IS	tlbi_insn_el1(3, 2)
+#define OP_TLBI_VAAE1IS		tlbi_insn_el1(3, 3)
+#define OP_TLBI_VALE1IS		tlbi_insn_el1(3, 5)
+#define OP_TLBI_VAALE1IS	tlbi_insn_el1(3, 7)
+#define OP_TLBI_VMALLE1		tlbi_insn_el1(7, 0)
+#define OP_TLBI_VAE1		tlbi_insn_el1(7, 1)
+#define OP_TLBI_ASIDE1		tlbi_insn_el1(7, 2)
+#define OP_TLBI_VAAE1		tlbi_insn_el1(7, 3)
+#define OP_TLBI_VALE1		tlbi_insn_el1(7, 5)
+#define OP_TLBI_VAALE1		tlbi_insn_el1(7, 7)
+
+#define OP_TLBI_IPAS2E1IS	tlbi_insn_el2(0, 1)
+#define OP_TLBI_IPAS2LE1IS	tlbi_insn_el2(0, 5)
+#define OP_TLBI_ALLE2IS		tlbi_insn_el2(3, 0)
+#define OP_TLBI_VAE2IS		tlbi_insn_el2(3, 1)
+#define OP_TLBI_ALLE1IS		tlbi_insn_el2(3, 4)
+#define OP_TLBI_VALE2IS		tlbi_insn_el2(3, 5)
+#define OP_TLBI_VMALLS12E1IS	tlbi_insn_el2(3, 6)
+#define OP_TLBI_IPAS2E1		tlbi_insn_el2(4, 1)
+#define OP_TLBI_IPAS2LE1	tlbi_insn_el2(4, 5)
+#define OP_TLBI_ALLE2		tlbi_insn_el2(7, 0)
+#define OP_TLBI_VAE2		tlbi_insn_el2(7, 1)
+#define OP_TLBI_ALLE1		tlbi_insn_el2(7, 4)
+#define OP_TLBI_VALE2		tlbi_insn_el2(7, 5)
+#define OP_TLBI_VMALLS12E1	tlbi_insn_el2(7, 6)
+
 /* Common SCTLR_ELx flags. */
 #define SCTLR_ELx_DSSBS	(BIT(44))
 #define SCTLR_ELx_ATA	(BIT(43))
diff --git a/arch/arm64/kvm/hyp/vhe/switch.c b/arch/arm64/kvm/hyp/vhe/switch.c
index b7790d3c4122..0e164cc8e913 100644
--- a/arch/arm64/kvm/hyp/vhe/switch.c
+++ b/arch/arm64/kvm/hyp/vhe/switch.c
@@ -44,7 +44,7 @@ static void __activate_traps(struct kvm_vcpu *vcpu)
 			 * the EL1 virtual memory control register accesses
 			 * as well as the AT S1 operations.
 			 */
-			hcr |= HCR_TVM | HCR_TRVM | HCR_AT | HCR_NV1;
+			hcr |= HCR_TVM | HCR_TRVM | HCR_AT | HCR_TTLB | HCR_NV1;
 		} else {
 			/*
 			 * For a guest hypervisor on v8.1 (VHE), allow to
@@ -72,11 +72,11 @@ static void __activate_traps(struct kvm_vcpu *vcpu)
 
 			/*
 			 * If we're using the EL1 translation regime
-			 * (TGE clear), then ensure that AT S1 ops are
-			 * trapped too.
+			 * (TGE clear), then ensure that AT S1 and
+			 * TLBI E1 ops are trapped too.
 			 */
 			if (!vcpu_el2_tge_is_set(vcpu))
-				hcr |= HCR_AT;
+				hcr |= HCR_AT | HCR_TTLB;
 		}
 	}
 
diff --git a/arch/arm64/kvm/hyp/vhe/tlb.c b/arch/arm64/kvm/hyp/vhe/tlb.c
index 24cef9b87f9e..c4389db4cc22 100644
--- a/arch/arm64/kvm/hyp/vhe/tlb.c
+++ b/arch/arm64/kvm/hyp/vhe/tlb.c
@@ -161,3 +161,84 @@ void __kvm_flush_vm_context(void)
 
 	dsb(ish);
 }
+
+void __kvm_tlb_vae2is(struct kvm_s2_mmu *mmu, u64 va, u64 sys_encoding)
+{
+	struct tlb_inv_context cxt;
+
+	dsb(ishst);
+
+	/* Switch to requested VMID */
+	__tlb_switch_to_guest(mmu, &cxt);
+
+	/*
+	 * Execute the EL1 version of TLBI VAE2* instruction, forcing
+	 * an upgrade to the Inner Shareable domain in order to
+	 * perform the invalidation on all CPUs.
+	 */
+	switch (sys_encoding) {
+	case OP_TLBI_VAE2:
+	case OP_TLBI_VAE2IS:
+		__tlbi(vae1is, va);
+		break;
+	case OP_TLBI_VALE2:
+	case OP_TLBI_VALE2IS:
+		__tlbi(vale1is, va);
+		break;
+	default:
+		break;
+	}
+	dsb(ish);
+	isb();
+
+	__tlb_switch_to_host(&cxt);
+}
+
+void __kvm_tlb_el1_instr(struct kvm_s2_mmu *mmu, u64 val, u64 sys_encoding)
+{
+	struct tlb_inv_context cxt;
+
+	dsb(ishst);
+
+	/* Switch to requested VMID */
+	__tlb_switch_to_guest(mmu, &cxt);
+
+	/*
+	 * Execute the same instruction as the guest hypervisor did,
+	 * expanding the scope of local TLB invalidations to the Inner
+	 * Shareable domain so that it takes place on all CPUs. This
+	 * is equivalent to having HCR_EL2.FB set.
+	 */
+	switch (sys_encoding) {
+	case OP_TLBI_VMALLE1:
+	case OP_TLBI_VMALLE1IS:
+		__tlbi(vmalle1is);
+		break;
+	case OP_TLBI_VAE1:
+	case OP_TLBI_VAE1IS:
+		__tlbi(vae1is, val);
+		break;
+	case OP_TLBI_ASIDE1:
+	case OP_TLBI_ASIDE1IS:
+		__tlbi(aside1is, val);
+		break;
+	case OP_TLBI_VAAE1:
+	case OP_TLBI_VAAE1IS:
+		__tlbi(vaae1is, val);
+		break;
+	case OP_TLBI_VALE1:
+	case OP_TLBI_VALE1IS:
+		__tlbi(vale1is, val);
+		break;
+	case OP_TLBI_VAALE1:
+	case OP_TLBI_VAALE1IS:
+		__tlbi(vaale1is, val);
+		break;
+	default:
+		break;
+	}
+	dsb(ish);
+	isb();
+
+	__tlb_switch_to_host(&cxt);
+}
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index b9b11be65009..4b38f2f3c997 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -80,8 +80,25 @@ static bool memslot_is_logging(struct kvm_memory_slot *memslot)
  */
 void kvm_flush_remote_tlbs(struct kvm *kvm)
 {
+	struct kvm_s2_mmu *mmu = &kvm->arch.mmu;
+
 	++kvm->stat.generic.remote_tlb_flush_requests;
-	kvm_call_hyp(__kvm_tlb_flush_vmid, &kvm->arch.mmu);
+
+	if (mmu == &kvm->arch.mmu) {
+		/*
+		 * For a normal (i.e. non-nested) guest, flush entries for the
+		 * given VMID.
+		 */
+		kvm_call_hyp(__kvm_tlb_flush_vmid, mmu);
+	} else {
+		/*
+		 * When supporting nested virtualization, we can have multiple
+		 * VMIDs in play for each VCPU in the VM, so it's really not
+		 * worth it to try to quiesce the system and flush all the
+		 * VMIDs that may be in use, instead just nuke the whole thing.
+		 */
+		kvm_call_hyp(__kvm_flush_vm_context);
+	}
 }
 
 static bool kvm_is_device_pfn(unsigned long pfn)
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 7be57e1b7019..d7441b8ba406 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -2374,6 +2374,205 @@ static bool handle_s12w(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
 	return handle_s12(vcpu, p, r, true);
 }
 
+static bool handle_alle2is(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
+			   const struct sys_reg_desc *r)
+{
+	if (vcpu_has_nv(vcpu) && forward_nv_traps(vcpu))
+		return false;
+
+	/*
+	 * To emulate invalidating all EL2 regime stage 1 TLB entries for all
+	 * PEs, executing TLBI VMALLE1IS is enough. But reuse the existing
+	 * interface for simplicity; invalidating stage 2 entries doesn't
+	 * affect correctness.
+	 */
+	__kvm_tlb_flush_vmid(&vcpu->kvm->arch.mmu);
+	return true;
+}
+
+static bool handle_vae2is(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
+			  const struct sys_reg_desc *r)
+{
+	int sys_encoding = sys_insn(p->Op0, p->Op1, p->CRn, p->CRm, p->Op2);
+
+	if (vcpu_has_nv(vcpu) && forward_nv_traps(vcpu))
+		return false;
+
+	/*
+	 * Based on the same principle as TLBI ALLE2 instruction
+	 * emulation, we emulate TLBI VAE2* instructions by executing
+	 * corresponding TLBI VAE1* instructions with the virtual
+	 * EL2's VMID assigned by the host hypervisor.
+	 */
+	__kvm_tlb_vae2is(&vcpu->kvm->arch.mmu, p->regval, sys_encoding);
+	return true;
+}
+
+static bool handle_alle1is(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
+			   const struct sys_reg_desc *r)
+{
+	struct kvm_s2_mmu *mmu = &vcpu->kvm->arch.mmu;
+
+	if (vcpu_has_nv(vcpu) && forward_nv_traps(vcpu))
+		return false;
+
+	spin_lock(&vcpu->kvm->mmu_lock);
+
+	/*
+	 * Clear all mappings in the shadow page tables and invalidate the stage
+	 * 1 and 2 TLB entries via kvm_tlb_flush_vmid_ipa().
+	 */
+	kvm_nested_s2_clear(vcpu->kvm);
+
+	if (mmu->vmid.vmid_gen) {
+		/*
+		 * Invalidate the stage 1 and 2 TLB entries for the host OS
+		 * in a VM only if there is one.
+		 */
+		__kvm_tlb_flush_vmid(mmu);
+	}
+
+	spin_unlock(&vcpu->kvm->mmu_lock);
+
+	return true;
+}
+
+static bool handle_vmalls12e1is(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
+				const struct sys_reg_desc *r)
+{
+	struct kvm_s2_mmu *mmu;
+	u64 vttbr;
+
+	if (vcpu_has_nv(vcpu) && forward_nv_traps(vcpu))
+		return false;
+
+	vttbr = vcpu_read_sys_reg(vcpu, VTTBR_EL2);
+	spin_lock(&vcpu->kvm->mmu_lock);
+
+	mmu = lookup_s2_mmu(vcpu->kvm, vttbr, HCR_VM);
+	if (mmu)
+		kvm_unmap_stage2_range(mmu, 0, kvm_phys_size(vcpu->kvm));
+
+	mmu = lookup_s2_mmu(vcpu->kvm, vttbr, 0);
+	if (mmu)
+		kvm_unmap_stage2_range(mmu, 0, kvm_phys_size(vcpu->kvm));
+
+	spin_unlock(&vcpu->kvm->mmu_lock);
+
+	return true;
+}
+
+static bool handle_ipas2e1is(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
+			     const struct sys_reg_desc *r)
+{
+	u64 vttbr = vcpu_read_sys_reg(vcpu, VTTBR_EL2);
+	u64 vtcr = vcpu_read_sys_reg(vcpu, VTCR_EL2);
+	struct kvm_s2_mmu *mmu;
+	u64 base_addr;
+	int max_size;
+
+	if (vcpu_has_nv(vcpu) && forward_nv_traps(vcpu))
+		return false;
+
+	/*
+	 * We drop a number of things from the supplied value:
+	 *
+	 * - NS bit: we're non-secure only.
+	 *
+	 * - TTL field: We already have the granule size from the
+	 *   VTCR_EL2.TG0 field, and the level is only relevant to the
+	 *   guest's S2PT.
+	 *
+	 * - IPA[51:48]: We don't support 52bit IPA just yet...
+	 *
+	 * And of course, adjust the IPA to be on an actual address.
+	 */
+	base_addr = (p->regval & GENMASK_ULL(35, 0)) << 12;
+
+	/* Compute the maximum extent of the invalidation */
+	switch ((vtcr & VTCR_EL2_TG0_MASK)) {
+	case VTCR_EL2_TG0_4K:
+		max_size = SZ_1G;
+		break;
+	case VTCR_EL2_TG0_16K:
+		max_size = SZ_32M;
+		break;
+	case VTCR_EL2_TG0_64K:
+		/*
+		 * No, we do not support 52bit IPA in nested yet. Once
+		 * we do, this should be 4TB.
+		 */
+		/* FIXME: remove the 52bit PA support from the IDregs */
+		max_size = SZ_512M;
+		break;
+	default:
+		BUG();
+	}
+
+	spin_lock(&vcpu->kvm->mmu_lock);
+
+	mmu = lookup_s2_mmu(vcpu->kvm, vttbr, HCR_VM);
+	if (mmu)
+		kvm_unmap_stage2_range(mmu, base_addr, max_size);
+
+	mmu = lookup_s2_mmu(vcpu->kvm, vttbr, 0);
+	if (mmu)
+		kvm_unmap_stage2_range(mmu, base_addr, max_size);
+
+	spin_unlock(&vcpu->kvm->mmu_lock);
+
+	return true;
+}
+
+static bool handle_tlbi_el1(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
+			    const struct sys_reg_desc *r)
+{
+	u32 sys_encoding = sys_insn(p->Op0, p->Op1, p->CRn, p->CRm, p->Op2);
+
+	if (vcpu_has_nv(vcpu) && forward_traps(vcpu, HCR_TTLB))
+		return false;
+
+	/*
+	 * If we're here, this is because we've trapped on an EL1 TLBI
+	 * instruction that affects the EL1 translation regime while
+	 * we're running in a context that doesn't allow us to let the
+	 * HW do its thing (aka vEL2):
+	 *
+	 * - HCR_EL2.E2H == 0 : a non-VHE guest
+	 * - HCR_EL2.{E2H,TGE} == { 1, 0 } : a VHE guest in guest mode
+	 *
+	 * We don't expect these helpers to ever be called when running
+	 * in a vEL1 context.
+	 */
+
+	WARN_ON(!vcpu_is_el2(vcpu));
+
+	mutex_lock(&vcpu->kvm->lock);
+
+	if ((__vcpu_sys_reg(vcpu, HCR_EL2) & (HCR_E2H | HCR_TGE)) != (HCR_E2H | HCR_TGE)) {
+		u64 virtual_vttbr = vcpu_read_sys_reg(vcpu, VTTBR_EL2);
+		struct kvm_s2_mmu *mmu;
+
+		mmu = lookup_s2_mmu(vcpu->kvm, virtual_vttbr, HCR_VM);
+		if (mmu)
+			__kvm_tlb_el1_instr(mmu, p->regval, sys_encoding);
+
+		mmu = lookup_s2_mmu(vcpu->kvm, virtual_vttbr, 0);
+		if (mmu)
+			__kvm_tlb_el1_instr(mmu, p->regval, sys_encoding);
+	} else {
+		/*
+		 * ARMv8.4-NV allows the guest to change TGE behind
+		 * our back, so we always trap EL1 TLBIs from vEL2...
+		 */
+		__kvm_tlb_el1_instr(&vcpu->kvm->arch.mmu, p->regval, sys_encoding);
+	}
+
+	mutex_unlock(&vcpu->kvm->lock);
+
+	return true;
+}
+
 /*
  * AT instruction emulation
  *
@@ -2459,12 +2658,40 @@ static struct sys_reg_desc sys_insn_descs[] = {
 	{ SYS_DESC(SYS_DC_CSW), access_dcsw },
 	{ SYS_DESC(SYS_DC_CISW), access_dcsw },
 
+	SYS_INSN(TLBI_VMALLE1IS, handle_tlbi_el1),
+	SYS_INSN(TLBI_VAE1IS, handle_tlbi_el1),
+	SYS_INSN(TLBI_ASIDE1IS, handle_tlbi_el1),
+	SYS_INSN(TLBI_VAAE1IS, handle_tlbi_el1),
+	SYS_INSN(TLBI_VALE1IS, handle_tlbi_el1),
+	SYS_INSN(TLBI_VAALE1IS, handle_tlbi_el1),
+	SYS_INSN(TLBI_VMALLE1, handle_tlbi_el1),
+	SYS_INSN(TLBI_VAE1, handle_tlbi_el1),
+	SYS_INSN(TLBI_ASIDE1, handle_tlbi_el1),
+	SYS_INSN(TLBI_VAAE1, handle_tlbi_el1),
+	SYS_INSN(TLBI_VALE1, handle_tlbi_el1),
+	SYS_INSN(TLBI_VAALE1, handle_tlbi_el1),
+
 	SYS_INSN(AT_S1E2R, handle_s1e2),
 	SYS_INSN(AT_S1E2W, handle_s1e2),
 	SYS_INSN(AT_S12E1R, handle_s12r),
 	SYS_INSN(AT_S12E1W, handle_s12w),
 	SYS_INSN(AT_S12E0R, handle_s12r),
 	SYS_INSN(AT_S12E0W, handle_s12w),
+
+	SYS_INSN(TLBI_IPAS2E1IS, handle_ipas2e1is),
+	SYS_INSN(TLBI_IPAS2LE1IS, handle_ipas2e1is),
+	SYS_INSN(TLBI_ALLE2IS, handle_alle2is),
+	SYS_INSN(TLBI_VAE2IS, handle_vae2is),
+	SYS_INSN(TLBI_ALLE1IS, handle_alle1is),
+	SYS_INSN(TLBI_VALE2IS, handle_vae2is),
+	SYS_INSN(TLBI_VMALLS12E1IS, handle_vmalls12e1is),
+	SYS_INSN(TLBI_IPAS2E1, handle_ipas2e1is),
+	SYS_INSN(TLBI_IPAS2LE1, handle_ipas2e1is),
+	SYS_INSN(TLBI_ALLE2, handle_alle2is),
+	SYS_INSN(TLBI_VAE2, handle_vae2is),
+	SYS_INSN(TLBI_ALLE1, handle_alle1is),
+	SYS_INSN(TLBI_VALE2, handle_vae2is),
+	SYS_INSN(TLBI_VMALLS12E1, handle_vmalls12e1is),
 };
 
 static bool trap_dbgdidr(struct kvm_vcpu *vcpu,
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 378+ messages in thread

* [PATCH v6 42/64] KVM: arm64: nv: Fold guest's HCR_EL2 configuration into the host's
  2022-01-28 12:18 ` Marc Zyngier
  (?)
@ 2022-01-28 12:18   ` Marc Zyngier
  -1 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:18 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Jintack Lim, Haibo Xu,
	Ganapatrao Kulkarni, Chase Conklin, Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

When entering an L2 guest (nested virt enabled, but not in hypervisor
context), we need to honor the traps that the L1 guest has asked to be
enabled.

For now, just OR the guest's HCR_EL2 into the host's. We may have to do
some filtering in the future though.
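
As a rough, self-contained illustration of the fold (not part of the
patch; the bit values and the filter mask below are made up and do not
reproduce the architectural HCR_EL2 layout or the kernel's
HCR_GUEST_NV_FILTER_FLAGS definition):

  #include <stdint.h>
  #include <stdio.h>

  int main(void)
  {
          uint64_t host_hcr  = 0x80000000ULL; /* host's baseline trap config (made up) */
          uint64_t guest_hcr = 0x00002005ULL; /* what L1 wrote to its HCR_EL2 (made up) */
          uint64_t nv_filter = 0x00000004ULL; /* bits the host refuses to honor (made up) */

          /* honor L1's requested traps, minus the filtered bits */
          host_hcr |= guest_hcr & ~nv_filter;

          printf("effective HCR_EL2 while running L2: %#llx\n",
                 (unsigned long long)host_hcr);
          return 0;
  }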

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/hyp/vhe/switch.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/arm64/kvm/hyp/vhe/switch.c b/arch/arm64/kvm/hyp/vhe/switch.c
index 0e164cc8e913..5e8eafac27c6 100644
--- a/arch/arm64/kvm/hyp/vhe/switch.c
+++ b/arch/arm64/kvm/hyp/vhe/switch.c
@@ -78,6 +78,11 @@ static void __activate_traps(struct kvm_vcpu *vcpu)
 			if (!vcpu_el2_tge_is_set(vcpu))
 				hcr |= HCR_AT | HCR_TTLB;
 		}
+	} else if (vcpu_has_nv(vcpu)) {
+		u64 vhcr_el2 = __vcpu_sys_reg(vcpu, HCR_EL2);
+
+		vhcr_el2 &= ~HCR_GUEST_NV_FILTER_FLAGS;
+		hcr |= vhcr_el2;
 	}
 
 	___activate_traps(vcpu, hcr);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 378+ messages in thread

* [PATCH v6 43/64] KVM: arm64: nv: arch_timer: Support hyp timer emulation
  2022-01-28 12:18 ` Marc Zyngier
  (?)
@ 2022-01-28 12:18   ` Marc Zyngier
  -1 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:18 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Jintack Lim, Haibo Xu,
	Ganapatrao Kulkarni, Chase Conklin, Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

From: Christoffer Dall <christoffer.dall@arm.com>

Emulating EL2 also means emulating the EL2 timers. To do so, we expand
our timer framework to deal with at most 4 timers. At any given time,
two of them are backed by the HW timers, and the other two are purely
emulated.

Deciding which timer is in which role at any given time is left to a
mapping function, which is called every time we need to make such a
decision.
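
A compact, self-contained sketch of that decision follows (not part of
the patch): the enum and struct below are simplified stand-ins for the
kernel's timer_map, and are only meant to show which pair of timers
gets the hardware in each mode, mirroring the get_timer_map() change in
this patch.

  #include <stdbool.h>
  #include <stdio.h>

  enum kt { T_NONE = -1, T_VTIMER, T_PTIMER, T_HVTIMER, T_HPTIMER };

  struct tmap { enum kt direct_v, direct_p, emul_v, emul_p; };

  static struct tmap pick_map(bool nested, bool in_hyp_ctxt)
  {
          if (nested && in_hyp_ctxt)     /* vEL2: the EL2 timers get the HW */
                  return (struct tmap){ T_HVTIMER, T_HPTIMER, T_VTIMER, T_PTIMER };
          if (nested)                    /* vEL1/0: the EL1 timers get the HW */
                  return (struct tmap){ T_VTIMER, T_PTIMER, T_HVTIMER, T_HPTIMER };
          /* non-nested VHE: both EL1 timers are direct, nothing emulated */
          return (struct tmap){ T_VTIMER, T_PTIMER, T_NONE, T_NONE };
  }

  int main(void)
  {
          struct tmap m = pick_map(true, true);

          printf("direct_v=%d direct_p=%d emul_v=%d emul_p=%d\n",
                 m.direct_v, m.direct_p, m.emul_v, m.emul_p);
          return 0;
  }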

Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
[maz: added CNTVOFF support, general reworking for v4.8]
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_host.h |   4 +
 arch/arm64/kvm/arch_timer.c       | 165 ++++++++++++++++++++++++++++--
 arch/arm64/kvm/sys_regs.c         |   7 +-
 arch/arm64/kvm/trace_arm.h        |   6 +-
 arch/arm64/kvm/vgic/vgic.c        |  15 +++
 include/kvm/arm_arch_timer.h      |   8 +-
 include/kvm/arm_vgic.h            |   1 +
 7 files changed, 194 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 0b887364f994..03833eca3307 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -285,6 +285,10 @@ enum vcpu_sysreg {
 	TPIDR_EL2,	/* EL2 Software Thread ID Register */
 	CNTHCTL_EL2,	/* Counter-timer Hypervisor Control register */
 	SP_EL2,		/* EL2 Stack Pointer */
+	CNTHP_CTL_EL2,
+	CNTHP_CVAL_EL2,
+	CNTHV_CTL_EL2,
+	CNTHV_CVAL_EL2,
 
 	NR_SYS_REGS	/* Nothing after this line! */
 };
diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
index 6e542e2eae32..5e4f93605d36 100644
--- a/arch/arm64/kvm/arch_timer.c
+++ b/arch/arm64/kvm/arch_timer.c
@@ -16,6 +16,7 @@
 #include <asm/arch_timer.h>
 #include <asm/kvm_emulate.h>
 #include <asm/kvm_hyp.h>
+#include <asm/kvm_nested.h>
 
 #include <kvm/arm_vgic.h>
 #include <kvm/arm_arch_timer.h>
@@ -40,6 +41,16 @@ static const struct kvm_irq_level default_vtimer_irq = {
 	.level	= 1,
 };
 
+static const struct kvm_irq_level default_hptimer_irq = {
+	.irq	= 26,
+	.level	= 1,
+};
+
+static const struct kvm_irq_level default_hvtimer_irq = {
+	.irq	= 28,
+	.level	= 1,
+};
+
 static bool kvm_timer_irq_can_fire(struct arch_timer_context *timer_ctx);
 static void kvm_timer_update_irq(struct kvm_vcpu *vcpu, bool new_level,
 				 struct arch_timer_context *timer_ctx);
@@ -51,6 +62,11 @@ static void kvm_arm_timer_write(struct kvm_vcpu *vcpu,
 static u64 kvm_arm_timer_read(struct kvm_vcpu *vcpu,
 			      struct arch_timer_context *timer,
 			      enum kvm_arch_timer_regs treg);
+static bool kvm_arch_timer_get_input_level(int vintid);
+
+static struct irq_ops arch_timer_irq_ops = {
+	.get_input_level = kvm_arch_timer_get_input_level,
+};
 
 u32 timer_get_ctl(struct arch_timer_context *ctxt)
 {
@@ -61,6 +77,10 @@ u32 timer_get_ctl(struct arch_timer_context *ctxt)
 		return __vcpu_sys_reg(vcpu, CNTV_CTL_EL0);
 	case TIMER_PTIMER:
 		return __vcpu_sys_reg(vcpu, CNTP_CTL_EL0);
+	case TIMER_HVTIMER:
+		return __vcpu_sys_reg(vcpu, CNTHV_CTL_EL2);
+	case TIMER_HPTIMER:
+		return __vcpu_sys_reg(vcpu, CNTHP_CTL_EL2);
 	default:
 		WARN_ON(1);
 		return 0;
@@ -76,6 +96,10 @@ u64 timer_get_cval(struct arch_timer_context *ctxt)
 		return __vcpu_sys_reg(vcpu, CNTV_CVAL_EL0);
 	case TIMER_PTIMER:
 		return __vcpu_sys_reg(vcpu, CNTP_CVAL_EL0);
+	case TIMER_HVTIMER:
+		return __vcpu_sys_reg(vcpu, CNTHV_CVAL_EL2);
+	case TIMER_HPTIMER:
+		return __vcpu_sys_reg(vcpu, CNTHP_CVAL_EL2);
 	default:
 		WARN_ON(1);
 		return 0;
@@ -105,6 +129,12 @@ static void timer_set_ctl(struct arch_timer_context *ctxt, u32 ctl)
 	case TIMER_PTIMER:
 		__vcpu_sys_reg(vcpu, CNTP_CTL_EL0) = ctl;
 		break;
+	case TIMER_HVTIMER:
+		__vcpu_sys_reg(vcpu, CNTHV_CTL_EL2) = ctl;
+		break;
+	case TIMER_HPTIMER:
+		__vcpu_sys_reg(vcpu, CNTHP_CTL_EL2) = ctl;
+		break;
 	default:
 		WARN_ON(1);
 	}
@@ -121,6 +151,12 @@ static void timer_set_cval(struct arch_timer_context *ctxt, u64 cval)
 	case TIMER_PTIMER:
 		__vcpu_sys_reg(vcpu, CNTP_CVAL_EL0) = cval;
 		break;
+	case TIMER_HVTIMER:
+		__vcpu_sys_reg(vcpu, CNTHV_CVAL_EL2) = cval;
+		break;
+	case TIMER_HPTIMER:
+		__vcpu_sys_reg(vcpu, CNTHP_CVAL_EL2) = cval;
+		break;
 	default:
 		WARN_ON(1);
 	}
@@ -146,13 +182,27 @@ u64 kvm_phys_timer_read(void)
 
 static void get_timer_map(struct kvm_vcpu *vcpu, struct timer_map *map)
 {
-	if (has_vhe()) {
+	if (vcpu_has_nv(vcpu)) {
+		if (is_hyp_ctxt(vcpu)) {
+			map->direct_vtimer = vcpu_hvtimer(vcpu);
+			map->direct_ptimer = vcpu_hptimer(vcpu);
+			map->emul_vtimer = vcpu_vtimer(vcpu);
+			map->emul_ptimer = vcpu_ptimer(vcpu);
+		} else {
+			map->direct_vtimer = vcpu_vtimer(vcpu);
+			map->direct_ptimer = vcpu_ptimer(vcpu);
+			map->emul_vtimer = vcpu_hvtimer(vcpu);
+			map->emul_ptimer = vcpu_hptimer(vcpu);
+		}
+	} else if (has_vhe()) {
 		map->direct_vtimer = vcpu_vtimer(vcpu);
 		map->direct_ptimer = vcpu_ptimer(vcpu);
+		map->emul_vtimer = NULL;
 		map->emul_ptimer = NULL;
 	} else {
 		map->direct_vtimer = vcpu_vtimer(vcpu);
 		map->direct_ptimer = NULL;
+		map->emul_vtimer = NULL;
 		map->emul_ptimer = vcpu_ptimer(vcpu);
 	}
 
@@ -325,9 +375,11 @@ static bool kvm_timer_should_fire(struct arch_timer_context *timer_ctx)
 
 		switch (index) {
 		case TIMER_VTIMER:
+		case TIMER_HVTIMER:
 			cnt_ctl = read_sysreg_el0(SYS_CNTV_CTL);
 			break;
 		case TIMER_PTIMER:
+		case TIMER_HPTIMER:
 			cnt_ctl = read_sysreg_el0(SYS_CNTP_CTL);
 			break;
 		case NR_KVM_TIMERS:
@@ -358,6 +410,7 @@ bool kvm_timer_is_pending(struct kvm_vcpu *vcpu)
 
 	return kvm_timer_should_fire(map.direct_vtimer) ||
 	       kvm_timer_should_fire(map.direct_ptimer) ||
+	       kvm_timer_should_fire(map.emul_vtimer) ||
 	       kvm_timer_should_fire(map.emul_ptimer);
 }
 
@@ -438,6 +491,7 @@ static void timer_save_state(struct arch_timer_context *ctx)
 
 	switch (index) {
 	case TIMER_VTIMER:
+	case TIMER_HVTIMER:
 		timer_set_ctl(ctx, read_sysreg_el0(SYS_CNTV_CTL));
 		timer_set_cval(ctx, read_sysreg_el0(SYS_CNTV_CVAL));
 
@@ -447,6 +501,7 @@ static void timer_save_state(struct arch_timer_context *ctx)
 
 		break;
 	case TIMER_PTIMER:
+	case TIMER_HPTIMER:
 		timer_set_ctl(ctx, read_sysreg_el0(SYS_CNTP_CTL));
 		timer_set_cval(ctx, read_sysreg_el0(SYS_CNTP_CVAL));
 
@@ -484,6 +539,7 @@ static void kvm_timer_blocking(struct kvm_vcpu *vcpu)
 	 */
 	if (!kvm_timer_irq_can_fire(map.direct_vtimer) &&
 	    !kvm_timer_irq_can_fire(map.direct_ptimer) &&
+	    !kvm_timer_irq_can_fire(map.emul_vtimer) &&
 	    !kvm_timer_irq_can_fire(map.emul_ptimer))
 		return;
 
@@ -517,11 +573,13 @@ static void timer_restore_state(struct arch_timer_context *ctx)
 
 	switch (index) {
 	case TIMER_VTIMER:
+	case TIMER_HVTIMER:
 		write_sysreg_el0(timer_get_cval(ctx), SYS_CNTV_CVAL);
 		isb();
 		write_sysreg_el0(timer_get_ctl(ctx), SYS_CNTV_CTL);
 		break;
 	case TIMER_PTIMER:
+	case TIMER_HPTIMER:
 		write_sysreg_el0(timer_get_cval(ctx), SYS_CNTP_CVAL);
 		isb();
 		write_sysreg_el0(timer_get_ctl(ctx), SYS_CNTP_CTL);
@@ -598,6 +656,42 @@ static void kvm_timer_vcpu_load_nogic(struct kvm_vcpu *vcpu)
 		enable_percpu_irq(host_vtimer_irq, host_vtimer_irq_flags);
 }
 
+static void kvm_timer_vcpu_load_nested_switch(struct kvm_vcpu *vcpu,
+					      struct timer_map *map)
+{
+	int hw, ret;
+
+	if (!irqchip_in_kernel(vcpu->kvm))
+		return;
+
+	/*
+	 * We only ever unmap the vtimer irq on a VHE system that runs nested
+	 * virtualization, in which case we have valid emul_vtimer,
+	 * emul_ptimer, direct_vtimer, and direct_ptimer contexts.
+	 *
+	 * Since this is called from kvm_timer_vcpu_load(), a change between
+	 * vEL2 and vEL1/0 will have just happened, and the timer_map will
+	 * represent this, and therefore we switch the emul/direct mappings
+	 * below.
+	 */
+	hw = kvm_vgic_get_map(vcpu, map->direct_vtimer->irq.irq);
+	if (hw < 0) {
+		kvm_vgic_unmap_phys_irq(vcpu, map->emul_vtimer->irq.irq);
+		kvm_vgic_unmap_phys_irq(vcpu, map->emul_ptimer->irq.irq);
+
+		ret = kvm_vgic_map_phys_irq(vcpu,
+					    map->direct_vtimer->host_timer_irq,
+					    map->direct_vtimer->irq.irq,
+					    &arch_timer_irq_ops);
+		WARN_ON_ONCE(ret);
+		ret = kvm_vgic_map_phys_irq(vcpu,
+					    map->direct_ptimer->host_timer_irq,
+					    map->direct_ptimer->irq.irq,
+					    &arch_timer_irq_ops);
+		WARN_ON_ONCE(ret);
+	}
+}
+
 void kvm_timer_vcpu_load(struct kvm_vcpu *vcpu)
 {
 	struct arch_timer_cpu *timer = vcpu_timer(vcpu);
@@ -609,6 +703,9 @@ void kvm_timer_vcpu_load(struct kvm_vcpu *vcpu)
 	get_timer_map(vcpu, &map);
 
 	if (static_branch_likely(&has_gic_active_state)) {
+		if (vcpu_has_nv(vcpu))
+			kvm_timer_vcpu_load_nested_switch(vcpu, &map);
+
 		kvm_timer_vcpu_load_gic(map.direct_vtimer);
 		if (map.direct_ptimer)
 			kvm_timer_vcpu_load_gic(map.direct_ptimer);
@@ -624,6 +721,8 @@ void kvm_timer_vcpu_load(struct kvm_vcpu *vcpu)
 	if (map.direct_ptimer)
 		timer_restore_state(map.direct_ptimer);
 
+	if (map.emul_vtimer)
+		timer_emulate(map.emul_vtimer);
 	if (map.emul_ptimer)
 		timer_emulate(map.emul_ptimer);
 }
@@ -668,6 +767,8 @@ void kvm_timer_vcpu_put(struct kvm_vcpu *vcpu)
 	 * In any case, we re-schedule the hrtimer for the physical timer when
 	 * coming back to the VCPU thread in kvm_timer_vcpu_load().
 	 */
+	if (map.emul_vtimer)
+		soft_timer_cancel(&map.emul_vtimer->hrtimer);
 	if (map.emul_ptimer)
 		soft_timer_cancel(&map.emul_ptimer->hrtimer);
 
@@ -728,10 +829,14 @@ int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu)
 	 */
 	timer_set_ctl(vcpu_vtimer(vcpu), 0);
 	timer_set_ctl(vcpu_ptimer(vcpu), 0);
+	timer_set_ctl(vcpu_hvtimer(vcpu), 0);
+	timer_set_ctl(vcpu_hptimer(vcpu), 0);
 
 	if (timer->enabled) {
 		kvm_timer_update_irq(vcpu, false, vcpu_vtimer(vcpu));
 		kvm_timer_update_irq(vcpu, false, vcpu_ptimer(vcpu));
+		kvm_timer_update_irq(vcpu, false, vcpu_hvtimer(vcpu));
+		kvm_timer_update_irq(vcpu, false, vcpu_hptimer(vcpu));
 
 		if (irqchip_in_kernel(vcpu->kvm)) {
 			kvm_vgic_reset_mapped_irq(vcpu, map.direct_vtimer->irq.irq);
@@ -740,6 +845,8 @@ int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu)
 		}
 	}
 
+	if (map.emul_vtimer)
+		soft_timer_cancel(&map.emul_vtimer->hrtimer);
 	if (map.emul_ptimer)
 		soft_timer_cancel(&map.emul_ptimer->hrtimer);
 
@@ -770,30 +877,47 @@ void kvm_timer_vcpu_init(struct kvm_vcpu *vcpu)
 	struct arch_timer_cpu *timer = vcpu_timer(vcpu);
 	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
 	struct arch_timer_context *ptimer = vcpu_ptimer(vcpu);
+	struct arch_timer_context *hvtimer = vcpu_hvtimer(vcpu);
+	struct arch_timer_context *hptimer = vcpu_hptimer(vcpu);
 
 	vtimer->vcpu = vcpu;
 	ptimer->vcpu = vcpu;
+	hvtimer->vcpu = vcpu;
+	hptimer->vcpu = vcpu;
 
 	/* Synchronize cntvoff across all vtimers of a VM. */
 	update_vtimer_cntvoff(vcpu, kvm_phys_timer_read());
 	timer_set_offset(ptimer, 0);
+	timer_set_offset(hvtimer, 0);
+	timer_set_offset(hptimer, 0);
 
 	hrtimer_init(&timer->bg_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_HARD);
 	timer->bg_timer.function = kvm_bg_timer_expire;
 
 	hrtimer_init(&vtimer->hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_HARD);
 	hrtimer_init(&ptimer->hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_HARD);
+	hrtimer_init(&hvtimer->hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_HARD);
+	hrtimer_init(&hptimer->hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_HARD);
+
 	vtimer->hrtimer.function = kvm_hrtimer_expire;
 	ptimer->hrtimer.function = kvm_hrtimer_expire;
+	hvtimer->hrtimer.function = kvm_hrtimer_expire;
+	hptimer->hrtimer.function = kvm_hrtimer_expire;
 
 	vtimer->irq.irq = default_vtimer_irq.irq;
 	ptimer->irq.irq = default_ptimer_irq.irq;
+	hvtimer->irq.irq = default_hvtimer_irq.irq;
+	hptimer->irq.irq = default_hptimer_irq.irq;
 
 	vtimer->host_timer_irq = host_vtimer_irq;
 	ptimer->host_timer_irq = host_ptimer_irq;
+	hvtimer->host_timer_irq = host_vtimer_irq;
+	hptimer->host_timer_irq = host_ptimer_irq;
 
 	vtimer->host_timer_irq_flags = host_vtimer_irq_flags;
 	ptimer->host_timer_irq_flags = host_ptimer_irq_flags;
+	hvtimer->host_timer_irq_flags = host_vtimer_irq_flags;
+	hptimer->host_timer_irq_flags = host_ptimer_irq_flags;
 }
 
 static void kvm_timer_init_interrupt(void *info)
@@ -900,6 +1024,10 @@ static u64 kvm_arm_timer_read(struct kvm_vcpu *vcpu,
 		val = kvm_phys_timer_read() - timer_get_offset(timer);
 		break;
 
+	case TIMER_REG_VOFF:
+		val = timer_get_offset(timer);
+		break;
+
 	default:
 		BUG();
 	}
@@ -942,6 +1070,10 @@ static void kvm_arm_timer_write(struct kvm_vcpu *vcpu,
 		timer_set_cval(timer, val);
 		break;
 
+	case TIMER_REG_VOFF:
+		timer_set_offset(timer, val);
+		break;
+
 	default:
 		BUG();
 	}
@@ -1040,10 +1172,6 @@ static const struct irq_domain_ops timer_domain_ops = {
 	.free	= timer_irq_domain_free,
 };
 
-static struct irq_ops arch_timer_irq_ops = {
-	.get_input_level = kvm_arch_timer_get_input_level,
-};
-
 static void kvm_irq_fixup_flags(unsigned int virq, u32 *flags)
 {
 	*flags = irq_get_trigger_type(virq);
@@ -1188,7 +1316,7 @@ void kvm_timer_vcpu_terminate(struct kvm_vcpu *vcpu)
 
 static bool timer_irqs_are_valid(struct kvm_vcpu *vcpu)
 {
-	int vtimer_irq, ptimer_irq, ret;
+	int vtimer_irq, ptimer_irq, hvtimer_irq, hptimer_irq, ret;
 	unsigned long i;
 
 	vtimer_irq = vcpu_vtimer(vcpu)->irq.irq;
@@ -1201,16 +1329,28 @@ static bool timer_irqs_are_valid(struct kvm_vcpu *vcpu)
 	if (ret)
 		return false;
 
+	hvtimer_irq = vcpu_hvtimer(vcpu)->irq.irq;
+	ret = kvm_vgic_set_owner(vcpu, hvtimer_irq, vcpu_hvtimer(vcpu));
+	if (ret)
+		return false;
+
+	hptimer_irq = vcpu_hptimer(vcpu)->irq.irq;
+	ret = kvm_vgic_set_owner(vcpu, hptimer_irq, vcpu_hptimer(vcpu));
+	if (ret)
+		return false;
+
 	kvm_for_each_vcpu(i, vcpu, vcpu->kvm) {
 		if (vcpu_vtimer(vcpu)->irq.irq != vtimer_irq ||
-		    vcpu_ptimer(vcpu)->irq.irq != ptimer_irq)
+		    vcpu_ptimer(vcpu)->irq.irq != ptimer_irq ||
+		    vcpu_hvtimer(vcpu)->irq.irq != hvtimer_irq ||
+		    vcpu_hptimer(vcpu)->irq.irq != hptimer_irq)
 			return false;
 	}
 
 	return true;
 }
 
-bool kvm_arch_timer_get_input_level(int vintid)
+static bool kvm_arch_timer_get_input_level(int vintid)
 {
 	struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
 	struct arch_timer_context *timer;
@@ -1219,6 +1359,10 @@ bool kvm_arch_timer_get_input_level(int vintid)
 		timer = vcpu_vtimer(vcpu);
 	else if (vintid == vcpu_ptimer(vcpu)->irq.irq)
 		timer = vcpu_ptimer(vcpu);
+	else if (vintid == vcpu_hvtimer(vcpu)->irq.irq)
+		timer = vcpu_hvtimer(vcpu);
+	else if (vintid == vcpu_hptimer(vcpu)->irq.irq)
+		timer = vcpu_hptimer(vcpu);
 	else
 		BUG();
 
@@ -1301,6 +1445,7 @@ static void set_timer_irqs(struct kvm *kvm, int vtimer_irq, int ptimer_irq)
 	kvm_for_each_vcpu(i, vcpu, kvm) {
 		vcpu_vtimer(vcpu)->irq.irq = vtimer_irq;
 		vcpu_ptimer(vcpu)->irq.irq = ptimer_irq;
+		/* TODO: Add support for hv/hp timers */
 	}
 }
 
@@ -1311,6 +1456,8 @@ int kvm_arm_timer_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
 	struct arch_timer_context *ptimer = vcpu_ptimer(vcpu);
 	int irq;
 
+	/* TODO: Add support for hv/hp timers */
+
 	if (!irqchip_in_kernel(vcpu->kvm))
 		return -EINVAL;
 
@@ -1343,6 +1490,8 @@ int kvm_arm_timer_get_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
 	struct arch_timer_context *timer;
 	int irq;
 
+	/* TODO: Add support for hv/hp timers */
+
 	switch (attr->attr) {
 	case KVM_ARM_VCPU_TIMER_IRQ_VTIMER:
 		timer = vcpu_vtimer(vcpu);
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index d7441b8ba406..bbc58930a5eb 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1288,6 +1288,11 @@ static bool access_arch_timer(struct kvm_vcpu *vcpu,
 		tmr = TIMER_PTIMER;
 		treg = TIMER_REG_CVAL;
 		break;
+	case SYS_CNTVOFF_EL2:
+		tmr = TIMER_VTIMER;
+		treg = TIMER_REG_VOFF;
+		break;
+
 	default:
 		BUG();
 	}
@@ -2212,7 +2217,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 	EL2_REG(CONTEXTIDR_EL2, access_rw, reset_val, 0),
 	EL2_REG(TPIDR_EL2, access_rw, reset_val, 0),
 
-	EL2_REG(CNTVOFF_EL2, access_rw, reset_val, 0),
+	{ SYS_DESC(SYS_CNTVOFF_EL2), access_arch_timer },
 	EL2_REG(CNTHCTL_EL2, access_rw, reset_val, 0),
 
 	EL12_REG(SCTLR, access_vm_reg, reset_val, 0x00C50078),
diff --git a/arch/arm64/kvm/trace_arm.h b/arch/arm64/kvm/trace_arm.h
index f3e46a976125..6ce5c025218d 100644
--- a/arch/arm64/kvm/trace_arm.h
+++ b/arch/arm64/kvm/trace_arm.h
@@ -206,6 +206,7 @@ TRACE_EVENT(kvm_get_timer_map,
 		__field(	unsigned long,		vcpu_id	)
 		__field(	int,			direct_vtimer	)
 		__field(	int,			direct_ptimer	)
+		__field(	int,			emul_vtimer	)
 		__field(	int,			emul_ptimer	)
 	),
 
@@ -214,14 +215,17 @@ TRACE_EVENT(kvm_get_timer_map,
 		__entry->direct_vtimer		= arch_timer_ctx_index(map->direct_vtimer);
 		__entry->direct_ptimer =
 			(map->direct_ptimer) ? arch_timer_ctx_index(map->direct_ptimer) : -1;
+		__entry->emul_vtimer =
+			(map->emul_vtimer) ? arch_timer_ctx_index(map->emul_vtimer) : -1;
 		__entry->emul_ptimer =
 			(map->emul_ptimer) ? arch_timer_ctx_index(map->emul_ptimer) : -1;
 	),
 
-	TP_printk("VCPU: %ld, dv: %d, dp: %d, ep: %d",
+	TP_printk("VCPU: %ld, dv: %d, dp: %d, ev: %d, ep: %d",
 		  __entry->vcpu_id,
 		  __entry->direct_vtimer,
 		  __entry->direct_ptimer,
+		  __entry->emul_vtimer,
 		  __entry->emul_ptimer)
 );
 
diff --git a/arch/arm64/kvm/vgic/vgic.c b/arch/arm64/kvm/vgic/vgic.c
index 9b98876a8a93..e7fe0447af08 100644
--- a/arch/arm64/kvm/vgic/vgic.c
+++ b/arch/arm64/kvm/vgic/vgic.c
@@ -573,6 +573,21 @@ int kvm_vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, unsigned int vintid)
 	return 0;
 }
 
+int kvm_vgic_get_map(struct kvm_vcpu *vcpu, unsigned int vintid)
+{
+	struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, vintid);
+	unsigned long flags;
+	int ret = -1;
+
+	raw_spin_lock_irqsave(&irq->irq_lock, flags);
+	if (irq->hw)
+		ret = irq->hwintid;
+	raw_spin_unlock_irqrestore(&irq->irq_lock, flags);
+
+	vgic_put_irq(vcpu->kvm, irq);
+	return ret;
+}
+
 /**
  * kvm_vgic_set_owner - Set the owner of an interrupt for a VM
  *
diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
index 51c19381108c..0a76dac8cb6a 100644
--- a/include/kvm/arm_arch_timer.h
+++ b/include/kvm/arm_arch_timer.h
@@ -13,6 +13,8 @@
 enum kvm_arch_timers {
 	TIMER_PTIMER,
 	TIMER_VTIMER,
+	TIMER_HVTIMER,
+	TIMER_HPTIMER,
 	NR_KVM_TIMERS
 };
 
@@ -21,6 +23,7 @@ enum kvm_arch_timer_regs {
 	TIMER_REG_CVAL,
 	TIMER_REG_TVAL,
 	TIMER_REG_CTL,
+	TIMER_REG_VOFF,
 };
 
 struct arch_timer_context {
@@ -47,6 +50,7 @@ struct arch_timer_context {
 struct timer_map {
 	struct arch_timer_context *direct_vtimer;
 	struct arch_timer_context *direct_ptimer;
+	struct arch_timer_context *emul_vtimer;
 	struct arch_timer_context *emul_ptimer;
 };
 
@@ -85,12 +89,12 @@ void kvm_timer_vcpu_put(struct kvm_vcpu *vcpu);
 
 void kvm_timer_init_vhe(void);
 
-bool kvm_arch_timer_get_input_level(int vintid);
-
 #define vcpu_timer(v)	(&(v)->arch.timer_cpu)
 #define vcpu_get_timer(v,t)	(&vcpu_timer(v)->timers[(t)])
 #define vcpu_vtimer(v)	(&(v)->arch.timer_cpu.timers[TIMER_VTIMER])
 #define vcpu_ptimer(v)	(&(v)->arch.timer_cpu.timers[TIMER_PTIMER])
+#define vcpu_hvtimer(v)	(&(v)->arch.timer_cpu.timers[TIMER_HVTIMER])
+#define vcpu_hptimer(v)	(&(v)->arch.timer_cpu.timers[TIMER_HPTIMER])
 
 #define arch_timer_ctx_index(ctx)	((ctx) - vcpu_timer((ctx)->vcpu)->timers)
 
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index bb30a6803d9f..17b651890d22 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -375,6 +375,7 @@ int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int intid,
 int kvm_vgic_map_phys_irq(struct kvm_vcpu *vcpu, unsigned int host_irq,
 			  u32 vintid, struct irq_ops *ops);
 int kvm_vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, unsigned int vintid);
+int kvm_vgic_get_map(struct kvm_vcpu *vcpu, unsigned int vintid);
 bool kvm_vgic_map_is_active(struct kvm_vcpu *vcpu, unsigned int vintid);
 
 int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 378+ messages in thread

* [PATCH v6 43/64] KVM: arm64: nv: arch_timer: Support hyp timer emulation
@ 2022-01-28 12:18   ` Marc Zyngier
  0 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:18 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: kernel-team, Andre Przywara, Christoffer Dall, Chase Conklin,
	Russell King (Oracle),
	mihai.carabas, Ganapatrao Kulkarni

From: Christoffer Dall <christoffer.dall@arm.com>

Emulating EL2 also means emulating the EL2 timers. To do so, we expand
our timer framework to deal with at most 4 timers. At any given time,
two of them are backed by the HW timers, and the other two are purely
emulated.

Deciding which timer is in which role at any given time is left to a
mapping function, which is called every time we need to make such a
decision.

Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
[maz: added CNTVOFF support, general reworking for v4.8]
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_host.h |   4 +
 arch/arm64/kvm/arch_timer.c       | 165 ++++++++++++++++++++++++++++--
 arch/arm64/kvm/sys_regs.c         |   7 +-
 arch/arm64/kvm/trace_arm.h        |   6 +-
 arch/arm64/kvm/vgic/vgic.c        |  15 +++
 include/kvm/arm_arch_timer.h      |   8 +-
 include/kvm/arm_vgic.h            |   1 +
 7 files changed, 194 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 0b887364f994..03833eca3307 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -285,6 +285,10 @@ enum vcpu_sysreg {
 	TPIDR_EL2,	/* EL2 Software Thread ID Register */
 	CNTHCTL_EL2,	/* Counter-timer Hypervisor Control register */
 	SP_EL2,		/* EL2 Stack Pointer */
+	CNTHP_CTL_EL2,
+	CNTHP_CVAL_EL2,
+	CNTHV_CTL_EL2,
+	CNTHV_CVAL_EL2,
 
 	NR_SYS_REGS	/* Nothing after this line! */
 };
diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
index 6e542e2eae32..5e4f93605d36 100644
--- a/arch/arm64/kvm/arch_timer.c
+++ b/arch/arm64/kvm/arch_timer.c
@@ -16,6 +16,7 @@
 #include <asm/arch_timer.h>
 #include <asm/kvm_emulate.h>
 #include <asm/kvm_hyp.h>
+#include <asm/kvm_nested.h>
 
 #include <kvm/arm_vgic.h>
 #include <kvm/arm_arch_timer.h>
@@ -40,6 +41,16 @@ static const struct kvm_irq_level default_vtimer_irq = {
 	.level	= 1,
 };
 
+static const struct kvm_irq_level default_hptimer_irq = {
+	.irq	= 26,
+	.level	= 1,
+};
+
+static const struct kvm_irq_level default_hvtimer_irq = {
+	.irq	= 28,
+	.level	= 1,
+};
+
 static bool kvm_timer_irq_can_fire(struct arch_timer_context *timer_ctx);
 static void kvm_timer_update_irq(struct kvm_vcpu *vcpu, bool new_level,
 				 struct arch_timer_context *timer_ctx);
@@ -51,6 +62,11 @@ static void kvm_arm_timer_write(struct kvm_vcpu *vcpu,
 static u64 kvm_arm_timer_read(struct kvm_vcpu *vcpu,
 			      struct arch_timer_context *timer,
 			      enum kvm_arch_timer_regs treg);
+static bool kvm_arch_timer_get_input_level(int vintid);
+
+static struct irq_ops arch_timer_irq_ops = {
+	.get_input_level = kvm_arch_timer_get_input_level,
+};
 
 u32 timer_get_ctl(struct arch_timer_context *ctxt)
 {
@@ -61,6 +77,10 @@ u32 timer_get_ctl(struct arch_timer_context *ctxt)
 		return __vcpu_sys_reg(vcpu, CNTV_CTL_EL0);
 	case TIMER_PTIMER:
 		return __vcpu_sys_reg(vcpu, CNTP_CTL_EL0);
+	case TIMER_HVTIMER:
+		return __vcpu_sys_reg(vcpu, CNTHV_CTL_EL2);
+	case TIMER_HPTIMER:
+		return __vcpu_sys_reg(vcpu, CNTHP_CTL_EL2);
 	default:
 		WARN_ON(1);
 		return 0;
@@ -76,6 +96,10 @@ u64 timer_get_cval(struct arch_timer_context *ctxt)
 		return __vcpu_sys_reg(vcpu, CNTV_CVAL_EL0);
 	case TIMER_PTIMER:
 		return __vcpu_sys_reg(vcpu, CNTP_CVAL_EL0);
+	case TIMER_HVTIMER:
+		return __vcpu_sys_reg(vcpu, CNTHV_CVAL_EL2);
+	case TIMER_HPTIMER:
+		return __vcpu_sys_reg(vcpu, CNTHP_CVAL_EL2);
 	default:
 		WARN_ON(1);
 		return 0;
@@ -105,6 +129,12 @@ static void timer_set_ctl(struct arch_timer_context *ctxt, u32 ctl)
 	case TIMER_PTIMER:
 		__vcpu_sys_reg(vcpu, CNTP_CTL_EL0) = ctl;
 		break;
+	case TIMER_HVTIMER:
+		__vcpu_sys_reg(vcpu, CNTHV_CTL_EL2) = ctl;
+		break;
+	case TIMER_HPTIMER:
+		__vcpu_sys_reg(vcpu, CNTHP_CTL_EL2) = ctl;
+		break;
 	default:
 		WARN_ON(1);
 	}
@@ -121,6 +151,12 @@ static void timer_set_cval(struct arch_timer_context *ctxt, u64 cval)
 	case TIMER_PTIMER:
 		__vcpu_sys_reg(vcpu, CNTP_CVAL_EL0) = cval;
 		break;
+	case TIMER_HVTIMER:
+		__vcpu_sys_reg(vcpu, CNTHV_CVAL_EL2) = cval;
+		break;
+	case TIMER_HPTIMER:
+		__vcpu_sys_reg(vcpu, CNTHP_CVAL_EL2) = cval;
+		break;
 	default:
 		WARN_ON(1);
 	}
@@ -146,13 +182,27 @@ u64 kvm_phys_timer_read(void)
 
 static void get_timer_map(struct kvm_vcpu *vcpu, struct timer_map *map)
 {
-	if (has_vhe()) {
+	if (vcpu_has_nv(vcpu)) {
+		if (is_hyp_ctxt(vcpu)) {
+			map->direct_vtimer = vcpu_hvtimer(vcpu);
+			map->direct_ptimer = vcpu_hptimer(vcpu);
+			map->emul_vtimer = vcpu_vtimer(vcpu);
+			map->emul_ptimer = vcpu_ptimer(vcpu);
+		} else {
+			map->direct_vtimer = vcpu_vtimer(vcpu);
+			map->direct_ptimer = vcpu_ptimer(vcpu);
+			map->emul_vtimer = vcpu_hvtimer(vcpu);
+			map->emul_ptimer = vcpu_hptimer(vcpu);
+		}
+	} else if (has_vhe()) {
 		map->direct_vtimer = vcpu_vtimer(vcpu);
 		map->direct_ptimer = vcpu_ptimer(vcpu);
+		map->emul_vtimer = NULL;
 		map->emul_ptimer = NULL;
 	} else {
 		map->direct_vtimer = vcpu_vtimer(vcpu);
 		map->direct_ptimer = NULL;
+		map->emul_vtimer = NULL;
 		map->emul_ptimer = vcpu_ptimer(vcpu);
 	}
 
@@ -325,9 +375,11 @@ static bool kvm_timer_should_fire(struct arch_timer_context *timer_ctx)
 
 		switch (index) {
 		case TIMER_VTIMER:
+		case TIMER_HVTIMER:
 			cnt_ctl = read_sysreg_el0(SYS_CNTV_CTL);
 			break;
 		case TIMER_PTIMER:
+		case TIMER_HPTIMER:
 			cnt_ctl = read_sysreg_el0(SYS_CNTP_CTL);
 			break;
 		case NR_KVM_TIMERS:
@@ -358,6 +410,7 @@ bool kvm_timer_is_pending(struct kvm_vcpu *vcpu)
 
 	return kvm_timer_should_fire(map.direct_vtimer) ||
 	       kvm_timer_should_fire(map.direct_ptimer) ||
+	       kvm_timer_should_fire(map.emul_vtimer) ||
 	       kvm_timer_should_fire(map.emul_ptimer);
 }
 
@@ -438,6 +491,7 @@ static void timer_save_state(struct arch_timer_context *ctx)
 
 	switch (index) {
 	case TIMER_VTIMER:
+	case TIMER_HVTIMER:
 		timer_set_ctl(ctx, read_sysreg_el0(SYS_CNTV_CTL));
 		timer_set_cval(ctx, read_sysreg_el0(SYS_CNTV_CVAL));
 
@@ -447,6 +501,7 @@ static void timer_save_state(struct arch_timer_context *ctx)
 
 		break;
 	case TIMER_PTIMER:
+	case TIMER_HPTIMER:
 		timer_set_ctl(ctx, read_sysreg_el0(SYS_CNTP_CTL));
 		timer_set_cval(ctx, read_sysreg_el0(SYS_CNTP_CVAL));
 
@@ -484,6 +539,7 @@ static void kvm_timer_blocking(struct kvm_vcpu *vcpu)
 	 */
 	if (!kvm_timer_irq_can_fire(map.direct_vtimer) &&
 	    !kvm_timer_irq_can_fire(map.direct_ptimer) &&
+	    !kvm_timer_irq_can_fire(map.emul_vtimer) &&
 	    !kvm_timer_irq_can_fire(map.emul_ptimer))
 		return;
 
@@ -517,11 +573,13 @@ static void timer_restore_state(struct arch_timer_context *ctx)
 
 	switch (index) {
 	case TIMER_VTIMER:
+	case TIMER_HVTIMER:
 		write_sysreg_el0(timer_get_cval(ctx), SYS_CNTV_CVAL);
 		isb();
 		write_sysreg_el0(timer_get_ctl(ctx), SYS_CNTV_CTL);
 		break;
 	case TIMER_PTIMER:
+	case TIMER_HPTIMER:
 		write_sysreg_el0(timer_get_cval(ctx), SYS_CNTP_CVAL);
 		isb();
 		write_sysreg_el0(timer_get_ctl(ctx), SYS_CNTP_CTL);
@@ -598,6 +656,42 @@ static void kvm_timer_vcpu_load_nogic(struct kvm_vcpu *vcpu)
 		enable_percpu_irq(host_vtimer_irq, host_vtimer_irq_flags);
 }
 
+static void kvm_timer_vcpu_load_nested_switch(struct kvm_vcpu *vcpu,
+					      struct timer_map *map)
+{
+	int hw, ret;
+
+	if (!irqchip_in_kernel(vcpu->kvm))
+		return;
+
+	/*
+	 * We only ever unmap the vtimer irq on a VHE system that runs nested
+	 * virtualization, in which case we have valid emul_vtimer,
+	 * emul_ptimer, direct_vtimer, and direct_ptimer contexts.
+	 *
+	 * Since this is called from kvm_timer_vcpu_load(), a change between
+	 * vEL2 and vEL1/0 will have just happened, and the timer_map will
+	 * represent this, and therefore we switch the emul/direct mappings
+	 * below.
+	 */
+	hw = kvm_vgic_get_map(vcpu, map->direct_vtimer->irq.irq);
+	if (hw < 0) {
+		kvm_vgic_unmap_phys_irq(vcpu, map->emul_vtimer->irq.irq);
+		kvm_vgic_unmap_phys_irq(vcpu, map->emul_ptimer->irq.irq);
+
+		ret = kvm_vgic_map_phys_irq(vcpu,
+					    map->direct_vtimer->host_timer_irq,
+					    map->direct_vtimer->irq.irq,
+					    &arch_timer_irq_ops);
+		WARN_ON_ONCE(ret);
+		ret = kvm_vgic_map_phys_irq(vcpu,
+					    map->direct_ptimer->host_timer_irq,
+					    map->direct_ptimer->irq.irq,
+					    &arch_timer_irq_ops);
+		WARN_ON_ONCE(ret);
+	}
+}
+
 void kvm_timer_vcpu_load(struct kvm_vcpu *vcpu)
 {
 	struct arch_timer_cpu *timer = vcpu_timer(vcpu);
@@ -609,6 +703,9 @@ void kvm_timer_vcpu_load(struct kvm_vcpu *vcpu)
 	get_timer_map(vcpu, &map);
 
 	if (static_branch_likely(&has_gic_active_state)) {
+		if (vcpu_has_nv(vcpu))
+			kvm_timer_vcpu_load_nested_switch(vcpu, &map);
+
 		kvm_timer_vcpu_load_gic(map.direct_vtimer);
 		if (map.direct_ptimer)
 			kvm_timer_vcpu_load_gic(map.direct_ptimer);
@@ -624,6 +721,8 @@ void kvm_timer_vcpu_load(struct kvm_vcpu *vcpu)
 	if (map.direct_ptimer)
 		timer_restore_state(map.direct_ptimer);
 
+	if (map.emul_vtimer)
+		timer_emulate(map.emul_vtimer);
 	if (map.emul_ptimer)
 		timer_emulate(map.emul_ptimer);
 }
@@ -668,6 +767,8 @@ void kvm_timer_vcpu_put(struct kvm_vcpu *vcpu)
 	 * In any case, we re-schedule the hrtimer for the physical timer when
 	 * coming back to the VCPU thread in kvm_timer_vcpu_load().
 	 */
+	if (map.emul_vtimer)
+		soft_timer_cancel(&map.emul_vtimer->hrtimer);
 	if (map.emul_ptimer)
 		soft_timer_cancel(&map.emul_ptimer->hrtimer);
 
@@ -728,10 +829,14 @@ int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu)
 	 */
 	timer_set_ctl(vcpu_vtimer(vcpu), 0);
 	timer_set_ctl(vcpu_ptimer(vcpu), 0);
+	timer_set_ctl(vcpu_hvtimer(vcpu), 0);
+	timer_set_ctl(vcpu_hptimer(vcpu), 0);
 
 	if (timer->enabled) {
 		kvm_timer_update_irq(vcpu, false, vcpu_vtimer(vcpu));
 		kvm_timer_update_irq(vcpu, false, vcpu_ptimer(vcpu));
+		kvm_timer_update_irq(vcpu, false, vcpu_hvtimer(vcpu));
+		kvm_timer_update_irq(vcpu, false, vcpu_hptimer(vcpu));
 
 		if (irqchip_in_kernel(vcpu->kvm)) {
 			kvm_vgic_reset_mapped_irq(vcpu, map.direct_vtimer->irq.irq);
@@ -740,6 +845,8 @@ int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu)
 		}
 	}
 
+	if (map.emul_vtimer)
+		soft_timer_cancel(&map.emul_vtimer->hrtimer);
 	if (map.emul_ptimer)
 		soft_timer_cancel(&map.emul_ptimer->hrtimer);
 
@@ -770,30 +877,47 @@ void kvm_timer_vcpu_init(struct kvm_vcpu *vcpu)
 	struct arch_timer_cpu *timer = vcpu_timer(vcpu);
 	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
 	struct arch_timer_context *ptimer = vcpu_ptimer(vcpu);
+	struct arch_timer_context *hvtimer = vcpu_hvtimer(vcpu);
+	struct arch_timer_context *hptimer = vcpu_hptimer(vcpu);
 
 	vtimer->vcpu = vcpu;
 	ptimer->vcpu = vcpu;
+	hvtimer->vcpu = vcpu;
+	hptimer->vcpu = vcpu;
 
 	/* Synchronize cntvoff across all vtimers of a VM. */
 	update_vtimer_cntvoff(vcpu, kvm_phys_timer_read());
 	timer_set_offset(ptimer, 0);
+	timer_set_offset(hvtimer, 0);
+	timer_set_offset(hptimer, 0);
 
 	hrtimer_init(&timer->bg_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_HARD);
 	timer->bg_timer.function = kvm_bg_timer_expire;
 
 	hrtimer_init(&vtimer->hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_HARD);
 	hrtimer_init(&ptimer->hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_HARD);
+	hrtimer_init(&hvtimer->hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_HARD);
+	hrtimer_init(&hptimer->hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_HARD);
+
 	vtimer->hrtimer.function = kvm_hrtimer_expire;
 	ptimer->hrtimer.function = kvm_hrtimer_expire;
+	hvtimer->hrtimer.function = kvm_hrtimer_expire;
+	hptimer->hrtimer.function = kvm_hrtimer_expire;
 
 	vtimer->irq.irq = default_vtimer_irq.irq;
 	ptimer->irq.irq = default_ptimer_irq.irq;
+	hvtimer->irq.irq = default_hvtimer_irq.irq;
+	hptimer->irq.irq = default_hptimer_irq.irq;
 
 	vtimer->host_timer_irq = host_vtimer_irq;
 	ptimer->host_timer_irq = host_ptimer_irq;
+	hvtimer->host_timer_irq = host_vtimer_irq;
+	hptimer->host_timer_irq = host_ptimer_irq;
 
 	vtimer->host_timer_irq_flags = host_vtimer_irq_flags;
 	ptimer->host_timer_irq_flags = host_ptimer_irq_flags;
+	hvtimer->host_timer_irq_flags = host_vtimer_irq_flags;
+	hptimer->host_timer_irq_flags = host_ptimer_irq_flags;
 }
 
 static void kvm_timer_init_interrupt(void *info)
@@ -900,6 +1024,10 @@ static u64 kvm_arm_timer_read(struct kvm_vcpu *vcpu,
 		val = kvm_phys_timer_read() - timer_get_offset(timer);
 		break;
 
+	case TIMER_REG_VOFF:
+		val = timer_get_offset(timer);
+		break;
+
 	default:
 		BUG();
 	}
@@ -942,6 +1070,10 @@ static void kvm_arm_timer_write(struct kvm_vcpu *vcpu,
 		timer_set_cval(timer, val);
 		break;
 
+	case TIMER_REG_VOFF:
+		timer_set_offset(timer, val);
+		break;
+
 	default:
 		BUG();
 	}
@@ -1040,10 +1172,6 @@ static const struct irq_domain_ops timer_domain_ops = {
 	.free	= timer_irq_domain_free,
 };
 
-static struct irq_ops arch_timer_irq_ops = {
-	.get_input_level = kvm_arch_timer_get_input_level,
-};
-
 static void kvm_irq_fixup_flags(unsigned int virq, u32 *flags)
 {
 	*flags = irq_get_trigger_type(virq);
@@ -1188,7 +1316,7 @@ void kvm_timer_vcpu_terminate(struct kvm_vcpu *vcpu)
 
 static bool timer_irqs_are_valid(struct kvm_vcpu *vcpu)
 {
-	int vtimer_irq, ptimer_irq, ret;
+	int vtimer_irq, ptimer_irq, hvtimer_irq, hptimer_irq, ret;
 	unsigned long i;
 
 	vtimer_irq = vcpu_vtimer(vcpu)->irq.irq;
@@ -1201,16 +1329,28 @@ static bool timer_irqs_are_valid(struct kvm_vcpu *vcpu)
 	if (ret)
 		return false;
 
+	hvtimer_irq = vcpu_hvtimer(vcpu)->irq.irq;
+	ret = kvm_vgic_set_owner(vcpu, hvtimer_irq, vcpu_hvtimer(vcpu));
+	if (ret)
+		return false;
+
+	hptimer_irq = vcpu_hptimer(vcpu)->irq.irq;
+	ret = kvm_vgic_set_owner(vcpu, hptimer_irq, vcpu_hptimer(vcpu));
+	if (ret)
+		return false;
+
 	kvm_for_each_vcpu(i, vcpu, vcpu->kvm) {
 		if (vcpu_vtimer(vcpu)->irq.irq != vtimer_irq ||
-		    vcpu_ptimer(vcpu)->irq.irq != ptimer_irq)
+		    vcpu_ptimer(vcpu)->irq.irq != ptimer_irq ||
+		    vcpu_hvtimer(vcpu)->irq.irq != hvtimer_irq ||
+		    vcpu_hptimer(vcpu)->irq.irq != hptimer_irq)
 			return false;
 	}
 
 	return true;
 }
 
-bool kvm_arch_timer_get_input_level(int vintid)
+static bool kvm_arch_timer_get_input_level(int vintid)
 {
 	struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
 	struct arch_timer_context *timer;
@@ -1219,6 +1359,10 @@ bool kvm_arch_timer_get_input_level(int vintid)
 		timer = vcpu_vtimer(vcpu);
 	else if (vintid == vcpu_ptimer(vcpu)->irq.irq)
 		timer = vcpu_ptimer(vcpu);
+	else if (vintid == vcpu_hvtimer(vcpu)->irq.irq)
+		timer = vcpu_hvtimer(vcpu);
+	else if (vintid == vcpu_hptimer(vcpu)->irq.irq)
+		timer = vcpu_hptimer(vcpu);
 	else
 		BUG();
 
@@ -1301,6 +1445,7 @@ static void set_timer_irqs(struct kvm *kvm, int vtimer_irq, int ptimer_irq)
 	kvm_for_each_vcpu(i, vcpu, kvm) {
 		vcpu_vtimer(vcpu)->irq.irq = vtimer_irq;
 		vcpu_ptimer(vcpu)->irq.irq = ptimer_irq;
+		/* TODO: Add support for hv/hp timers */
 	}
 }
 
@@ -1311,6 +1456,8 @@ int kvm_arm_timer_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
 	struct arch_timer_context *ptimer = vcpu_ptimer(vcpu);
 	int irq;
 
+	/* TODO: Add support for hv/hp timers */
+
 	if (!irqchip_in_kernel(vcpu->kvm))
 		return -EINVAL;
 
@@ -1343,6 +1490,8 @@ int kvm_arm_timer_get_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
 	struct arch_timer_context *timer;
 	int irq;
 
+	/* TODO: Add support for hv/hp timers */
+
 	switch (attr->attr) {
 	case KVM_ARM_VCPU_TIMER_IRQ_VTIMER:
 		timer = vcpu_vtimer(vcpu);
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index d7441b8ba406..bbc58930a5eb 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1288,6 +1288,11 @@ static bool access_arch_timer(struct kvm_vcpu *vcpu,
 		tmr = TIMER_PTIMER;
 		treg = TIMER_REG_CVAL;
 		break;
+	case SYS_CNTVOFF_EL2:
+		tmr = TIMER_VTIMER;
+		treg = TIMER_REG_VOFF;
+		break;
+
 	default:
 		BUG();
 	}
@@ -2212,7 +2217,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 	EL2_REG(CONTEXTIDR_EL2, access_rw, reset_val, 0),
 	EL2_REG(TPIDR_EL2, access_rw, reset_val, 0),
 
-	EL2_REG(CNTVOFF_EL2, access_rw, reset_val, 0),
+	{ SYS_DESC(SYS_CNTVOFF_EL2), access_arch_timer },
 	EL2_REG(CNTHCTL_EL2, access_rw, reset_val, 0),
 
 	EL12_REG(SCTLR, access_vm_reg, reset_val, 0x00C50078),
diff --git a/arch/arm64/kvm/trace_arm.h b/arch/arm64/kvm/trace_arm.h
index f3e46a976125..6ce5c025218d 100644
--- a/arch/arm64/kvm/trace_arm.h
+++ b/arch/arm64/kvm/trace_arm.h
@@ -206,6 +206,7 @@ TRACE_EVENT(kvm_get_timer_map,
 		__field(	unsigned long,		vcpu_id	)
 		__field(	int,			direct_vtimer	)
 		__field(	int,			direct_ptimer	)
+		__field(	int,			emul_vtimer	)
 		__field(	int,			emul_ptimer	)
 	),
 
@@ -214,14 +215,17 @@ TRACE_EVENT(kvm_get_timer_map,
 		__entry->direct_vtimer		= arch_timer_ctx_index(map->direct_vtimer);
 		__entry->direct_ptimer =
 			(map->direct_ptimer) ? arch_timer_ctx_index(map->direct_ptimer) : -1;
+		__entry->emul_vtimer =
+			(map->emul_vtimer) ? arch_timer_ctx_index(map->emul_vtimer) : -1;
 		__entry->emul_ptimer =
 			(map->emul_ptimer) ? arch_timer_ctx_index(map->emul_ptimer) : -1;
 	),
 
-	TP_printk("VCPU: %ld, dv: %d, dp: %d, ep: %d",
+	TP_printk("VCPU: %ld, dv: %d, dp: %d, ev: %d, ep: %d",
 		  __entry->vcpu_id,
 		  __entry->direct_vtimer,
 		  __entry->direct_ptimer,
+		  __entry->emul_vtimer,
 		  __entry->emul_ptimer)
 );
 
diff --git a/arch/arm64/kvm/vgic/vgic.c b/arch/arm64/kvm/vgic/vgic.c
index 9b98876a8a93..e7fe0447af08 100644
--- a/arch/arm64/kvm/vgic/vgic.c
+++ b/arch/arm64/kvm/vgic/vgic.c
@@ -573,6 +573,21 @@ int kvm_vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, unsigned int vintid)
 	return 0;
 }
 
+int kvm_vgic_get_map(struct kvm_vcpu *vcpu, unsigned int vintid)
+{
+	struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, vintid);
+	unsigned long flags;
+	int ret = -1;
+
+	raw_spin_lock_irqsave(&irq->irq_lock, flags);
+	if (irq->hw)
+		ret = irq->hwintid;
+	raw_spin_unlock_irqrestore(&irq->irq_lock, flags);
+
+	vgic_put_irq(vcpu->kvm, irq);
+	return ret;
+}
+
 /**
  * kvm_vgic_set_owner - Set the owner of an interrupt for a VM
  *
diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
index 51c19381108c..0a76dac8cb6a 100644
--- a/include/kvm/arm_arch_timer.h
+++ b/include/kvm/arm_arch_timer.h
@@ -13,6 +13,8 @@
 enum kvm_arch_timers {
 	TIMER_PTIMER,
 	TIMER_VTIMER,
+	TIMER_HVTIMER,
+	TIMER_HPTIMER,
 	NR_KVM_TIMERS
 };
 
@@ -21,6 +23,7 @@ enum kvm_arch_timer_regs {
 	TIMER_REG_CVAL,
 	TIMER_REG_TVAL,
 	TIMER_REG_CTL,
+	TIMER_REG_VOFF,
 };
 
 struct arch_timer_context {
@@ -47,6 +50,7 @@ struct arch_timer_context {
 struct timer_map {
 	struct arch_timer_context *direct_vtimer;
 	struct arch_timer_context *direct_ptimer;
+	struct arch_timer_context *emul_vtimer;
 	struct arch_timer_context *emul_ptimer;
 };
 
@@ -85,12 +89,12 @@ void kvm_timer_vcpu_put(struct kvm_vcpu *vcpu);
 
 void kvm_timer_init_vhe(void);
 
-bool kvm_arch_timer_get_input_level(int vintid);
-
 #define vcpu_timer(v)	(&(v)->arch.timer_cpu)
 #define vcpu_get_timer(v,t)	(&vcpu_timer(v)->timers[(t)])
 #define vcpu_vtimer(v)	(&(v)->arch.timer_cpu.timers[TIMER_VTIMER])
 #define vcpu_ptimer(v)	(&(v)->arch.timer_cpu.timers[TIMER_PTIMER])
+#define vcpu_hvtimer(v)	(&(v)->arch.timer_cpu.timers[TIMER_HVTIMER])
+#define vcpu_hptimer(v)	(&(v)->arch.timer_cpu.timers[TIMER_HPTIMER])
 
 #define arch_timer_ctx_index(ctx)	((ctx) - vcpu_timer((ctx)->vcpu)->timers)
 
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index bb30a6803d9f..17b651890d22 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -375,6 +375,7 @@ int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int intid,
 int kvm_vgic_map_phys_irq(struct kvm_vcpu *vcpu, unsigned int host_irq,
 			  u32 vintid, struct irq_ops *ops);
 int kvm_vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, unsigned int vintid);
+int kvm_vgic_get_map(struct kvm_vcpu *vcpu, unsigned int vintid);
 bool kvm_vgic_map_is_active(struct kvm_vcpu *vcpu, unsigned int vintid);
 
 int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu);
-- 
2.30.2

^ permalink raw reply related	[flat|nested] 378+ messages in thread

* [PATCH v6 44/64] KVM: arm64: nv: Add handling of EL2-specific timer registers
  2022-01-28 12:18 ` Marc Zyngier
  (?)
@ 2022-01-28 12:18   ` Marc Zyngier
  -1 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:18 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Jintack Lim, Haibo Xu,
	Ganapatrao Kulkarni, Chase Conklin, Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

Add the required handling for EL2 and EL02 registers, as
well as EL1 registers used in the E2H context.
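
For reference, the selection rule for the redirected EL0 encodings can
be written as a stand-alone helper. This is only a sketch, not part of
the patch (the helper name is made up); it restates the logic added to
access_arch_timer() below, using vcpu_is_el2(), vcpu_el2_e2h_is_set()
and the TIMER_HPTIMER context introduced earlier in this series:

	/*
	 * With the guest running at vEL2 and HCR_EL2.E2H set, the
	 * CNTP_*_EL0 encodings really target the EL2 physical timer;
	 * otherwise they keep targeting the EL1 physical timer.
	 */
	static enum kvm_arch_timers cntp_el0_to_timer(struct kvm_vcpu *vcpu)
	{
		if (vcpu_is_el2(vcpu) && vcpu_el2_e2h_is_set(vcpu))
			return TIMER_HPTIMER;

		return TIMER_PTIMER;
	}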

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/sysreg.h |  6 +++
 arch/arm64/kvm/sys_regs.c       | 87 +++++++++++++++++++++++++++++++++
 2 files changed, 93 insertions(+)

diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 41c3603a3f29..ff6d3af8ed34 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -626,6 +626,12 @@
 
 #define SYS_CNTVOFF_EL2			sys_reg(3, 4, 14, 0, 3)
 #define SYS_CNTHCTL_EL2			sys_reg(3, 4, 14, 1, 0)
+#define SYS_CNTHP_TVAL_EL2		sys_reg(3, 4, 14, 2, 0)
+#define SYS_CNTHP_CTL_EL2		sys_reg(3, 4, 14, 2, 1)
+#define SYS_CNTHP_CVAL_EL2		sys_reg(3, 4, 14, 2, 2)
+#define SYS_CNTHV_TVAL_EL2		sys_reg(3, 4, 14, 3, 0)
+#define SYS_CNTHV_CTL_EL2		sys_reg(3, 4, 14, 3, 1)
+#define SYS_CNTHV_CVAL_EL2		sys_reg(3, 4, 14, 3, 2)
 
 /* VHE encodings for architectural EL0/1 system registers */
 #define SYS_SCTLR_EL12			sys_reg(3, 5, 1, 0, 0)
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index bbc58930a5eb..f2452e76156c 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1274,20 +1274,92 @@ static bool access_arch_timer(struct kvm_vcpu *vcpu,
 
 	switch (reg) {
 	case SYS_CNTP_TVAL_EL0:
+		if (vcpu_is_el2(vcpu) && vcpu_el2_e2h_is_set(vcpu))
+			tmr = TIMER_HPTIMER;
+		else
+			tmr = TIMER_PTIMER;
+		treg = TIMER_REG_TVAL;
+		break;
+
 	case SYS_AARCH32_CNTP_TVAL:
+	case SYS_CNTP_TVAL_EL02:
 		tmr = TIMER_PTIMER;
 		treg = TIMER_REG_TVAL;
 		break;
+
+	case SYS_CNTV_TVAL_EL02:
+		tmr = TIMER_VTIMER;
+		treg = TIMER_REG_TVAL;
+		break;
+
+	case SYS_CNTHP_TVAL_EL2:
+		tmr = TIMER_HPTIMER;
+		treg = TIMER_REG_TVAL;
+		break;
+
+	case SYS_CNTHV_TVAL_EL2:
+		tmr = TIMER_HVTIMER;
+		treg = TIMER_REG_TVAL;
+		break;
+
 	case SYS_CNTP_CTL_EL0:
+		if (vcpu_is_el2(vcpu) && vcpu_el2_e2h_is_set(vcpu))
+			tmr = TIMER_HPTIMER;
+		else
+			tmr = TIMER_PTIMER;
+		treg = TIMER_REG_CTL;
+		break;
+
 	case SYS_AARCH32_CNTP_CTL:
+	case SYS_CNTP_CTL_EL02:
 		tmr = TIMER_PTIMER;
 		treg = TIMER_REG_CTL;
 		break;
+
+	case SYS_CNTV_CTL_EL02:
+		tmr = TIMER_VTIMER;
+		treg = TIMER_REG_CTL;
+		break;
+
+	case SYS_CNTHP_CTL_EL2:
+		tmr = TIMER_HPTIMER;
+		treg = TIMER_REG_CTL;
+		break;
+
+	case SYS_CNTHV_CTL_EL2:
+		tmr = TIMER_HVTIMER;
+		treg = TIMER_REG_CTL;
+		break;
+
 	case SYS_CNTP_CVAL_EL0:
+		if (vcpu_is_el2(vcpu) && vcpu_el2_e2h_is_set(vcpu))
+			tmr = TIMER_HPTIMER;
+		else
+			tmr = TIMER_PTIMER;
+		treg = TIMER_REG_CVAL;
+		break;
+
 	case SYS_AARCH32_CNTP_CVAL:
+	case SYS_CNTP_CVAL_EL02:
 		tmr = TIMER_PTIMER;
 		treg = TIMER_REG_CVAL;
 		break;
+
+	case SYS_CNTV_CVAL_EL02:
+		tmr = TIMER_VTIMER;
+		treg = TIMER_REG_CVAL;
+		break;
+
+	case SYS_CNTHP_CVAL_EL2:
+		tmr = TIMER_HPTIMER;
+		treg = TIMER_REG_CVAL;
+		break;
+
+	case SYS_CNTHV_CVAL_EL2:
+		tmr = TIMER_HVTIMER;
+		treg = TIMER_REG_CVAL;
+		break;
+
 	case SYS_CNTVOFF_EL2:
 		tmr = TIMER_VTIMER;
 		treg = TIMER_REG_VOFF;
@@ -2220,6 +2292,13 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 	{ SYS_DESC(SYS_CNTVOFF_EL2), access_arch_timer },
 	EL2_REG(CNTHCTL_EL2, access_rw, reset_val, 0),
 
+	{ SYS_DESC(SYS_CNTHP_TVAL_EL2), access_arch_timer },
+	{ SYS_DESC(SYS_CNTHP_CTL_EL2), access_arch_timer },
+	{ SYS_DESC(SYS_CNTHP_CVAL_EL2), access_arch_timer },
+	{ SYS_DESC(SYS_CNTHV_TVAL_EL2), access_arch_timer },
+	{ SYS_DESC(SYS_CNTHV_CTL_EL2), access_arch_timer },
+	{ SYS_DESC(SYS_CNTHV_CVAL_EL2), access_arch_timer },
+
 	EL12_REG(SCTLR, access_vm_reg, reset_val, 0x00C50078),
 	EL12_REG(CPACR, access_rw, reset_val, 0),
 	EL12_REG(TTBR0, access_vm_reg, reset_unknown, 0),
@@ -2237,6 +2316,14 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 	EL12_REG(CONTEXTIDR, access_vm_reg, reset_val, 0),
 	EL12_REG(CNTKCTL, access_rw, reset_val, 0),
 
+	{ SYS_DESC(SYS_CNTP_TVAL_EL02), access_arch_timer },
+	{ SYS_DESC(SYS_CNTP_CTL_EL02), access_arch_timer },
+	{ SYS_DESC(SYS_CNTP_CVAL_EL02), access_arch_timer },
+
+	{ SYS_DESC(SYS_CNTV_TVAL_EL02), access_arch_timer },
+	{ SYS_DESC(SYS_CNTV_CTL_EL02), access_arch_timer },
+	{ SYS_DESC(SYS_CNTV_CVAL_EL02), access_arch_timer },
+
 	EL2_REG(SP_EL2, NULL, reset_unknown, 0),
 };
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 378+ messages in thread

* [PATCH v6 45/64] KVM: arm64: nv: Load timer before the GIC
  2022-01-28 12:18 ` Marc Zyngier
  (?)
@ 2022-01-28 12:18   ` Marc Zyngier
  -1 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:18 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Jintack Lim, Haibo Xu,
	Ganapatrao Kulkarni, Chase Conklin, Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

In order for vgic_v3_load_nested to be able to observe which timer
interrupts have the HW bit set for the current context, the timers
must have been loaded in the new mode and the right timers mapped
to their corresponding HW IRQs.

At the moment, we load the GIC first, meaning that timer interrupts
injected to an L2 guest will never have the HW bit set (we see the
old configuration).

Swapping the two loads solves this particular problem.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/arm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 14f85f1e15b2..6597cb8f64f9 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -400,8 +400,8 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 
 	vcpu->cpu = cpu;
 
-	kvm_vgic_load(vcpu);
 	kvm_timer_vcpu_load(vcpu);
+	kvm_vgic_load(vcpu);
 	if (has_vhe())
 		kvm_vcpu_load_sysregs_vhe(vcpu);
 	kvm_arch_vcpu_load_fp(vcpu);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 378+ messages in thread

* [PATCH v6 46/64] KVM: arm64: nv: Nested GICv3 Support
  2022-01-28 12:18 ` Marc Zyngier
  (?)
@ 2022-01-28 12:18   ` Marc Zyngier
  -1 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:18 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Jintack Lim, Haibo Xu,
	Ganapatrao Kulkarni, Chase Conklin, Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

From: Jintack Lim <jintack@cs.columbia.edu>

When entering a nested VM, we set up the hypervisor control interface
based on what the guest hypervisor has set. In particular, we inspect
each list register written by the guest hypervisor to see whether its
HW bit is set. If so, we translate the hw irq number from the guest's
point of view into the real hardware irq number, provided a mapping
exists.
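
As an illustration of that translation step, here is a stand-alone
sketch. It is not the code added below and the helper name is made up;
kvm_vgic_get_map() comes from earlier in this series, FIELD_GET() and
FIELD_PREP() are from <linux/bitfield.h>, and the ICH_LR_* fields are
the standard GICv3 list register layout from
<linux/irqchip/arm-gic-v3.h>:

	static u64 shadow_lr_translate_hw(struct kvm_vcpu *vcpu, u64 lr)
	{
		u32 l1_intid = FIELD_GET(ICH_LR_PHYS_ID_MASK, lr);
		int host_intid;

		if (!(lr & ICH_LR_HW))
			return lr;

		/* L1's view of the physical INTID -> host INTID, if mapped */
		host_intid = kvm_vgic_get_map(vcpu, l1_intid);
		if (host_intid < 0)
			return lr & ~ICH_LR_HW; /* no mapping: purely virtual */

		lr &= ~ICH_LR_PHYS_ID_MASK;
		return lr | FIELD_PREP(ICH_LR_PHYS_ID_MASK, (u64)host_intid);
	}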

Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
[Redesigned execution flow around vcpu load/put]
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
[Rewritten to support GICv3 instead of GICv2]
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_emulate.h |   8 +-
 arch/arm64/include/asm/kvm_host.h    |  11 +-
 arch/arm64/include/asm/kvm_nested.h  |   1 +
 arch/arm64/kvm/Makefile              |   2 +-
 arch/arm64/kvm/arm.c                 |  13 +-
 arch/arm64/kvm/nested.c              |  16 +++
 arch/arm64/kvm/sys_regs.c            | 179 +++++++++++++++++++++++++-
 arch/arm64/kvm/vgic/vgic-v3-nested.c | 180 +++++++++++++++++++++++++++
 arch/arm64/kvm/vgic/vgic-v3.c        |  26 ++++
 arch/arm64/kvm/vgic/vgic.c           |  27 ++++
 include/kvm/arm_vgic.h               |  18 +++
 11 files changed, 471 insertions(+), 10 deletions(-)
 create mode 100644 arch/arm64/kvm/vgic/vgic-v3-nested.c

diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index dc6eeb0cc8a9..2128d623a8b3 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -505,7 +505,13 @@ static inline bool kvm_is_write_fault(struct kvm_vcpu *vcpu)
 
 static inline unsigned long kvm_vcpu_get_mpidr_aff(struct kvm_vcpu *vcpu)
 {
-	return vcpu_read_sys_reg(vcpu, MPIDR_EL1) & MPIDR_HWID_BITMASK;
+	/*
+	 * Use the in-memory view for MPIDR_EL1. It can't be changed by the
+	 * guest, and is also accessed from the context of *another* vcpu,
+	 * so anything using some other state (such as the NV state that is
+	 * used by vcpu_read_sys_reg) will eventually go wrong.
+	 */
+	return __vcpu_sys_reg(vcpu, MPIDR_EL1) & MPIDR_HWID_BITMASK;
 }
 
 static inline void kvm_vcpu_set_be(struct kvm_vcpu *vcpu)
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 03833eca3307..6bfdb21c0d8a 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -41,11 +41,12 @@
 
 #define KVM_REQ_SLEEP \
 	KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
-#define KVM_REQ_IRQ_PENDING	KVM_ARCH_REQ(1)
-#define KVM_REQ_VCPU_RESET	KVM_ARCH_REQ(2)
-#define KVM_REQ_RECORD_STEAL	KVM_ARCH_REQ(3)
-#define KVM_REQ_RELOAD_GICv4	KVM_ARCH_REQ(4)
-#define KVM_REQ_RELOAD_PMU	KVM_ARCH_REQ(5)
+#define KVM_REQ_IRQ_PENDING		KVM_ARCH_REQ(1)
+#define KVM_REQ_VCPU_RESET		KVM_ARCH_REQ(2)
+#define KVM_REQ_RECORD_STEAL		KVM_ARCH_REQ(3)
+#define KVM_REQ_RELOAD_GICv4		KVM_ARCH_REQ(4)
+#define KVM_REQ_RELOAD_PMU		KVM_ARCH_REQ(5)
+#define KVM_REQ_GUEST_HYP_IRQ_PENDING	KVM_ARCH_REQ(6)
 
 #define KVM_DIRTY_LOG_MANUAL_CAPS   (KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE | \
 				     KVM_DIRTY_LOG_INITIALLY_SET)
diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
index 8915bead0633..3eef5f60c76d 100644
--- a/arch/arm64/include/asm/kvm_nested.h
+++ b/arch/arm64/include/asm/kvm_nested.h
@@ -71,6 +71,7 @@ extern void kvm_init_nested_s2_mmu(struct kvm_s2_mmu *mmu);
 extern struct kvm_s2_mmu *lookup_s2_mmu(struct kvm *kvm, u64 vttbr, u64 hcr);
 extern void kvm_vcpu_load_hw_mmu(struct kvm_vcpu *vcpu);
 extern void kvm_vcpu_put_hw_mmu(struct kvm_vcpu *vcpu);
+extern void check_nested_vcpu_requests(struct kvm_vcpu *vcpu);
 
 struct kvm_s2_trans {
 	phys_addr_t output;
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index b800dcbd157f..a982fe1b4fc4 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -20,7 +20,7 @@ kvm-y += arm.o mmu.o mmio.o psci.o hypercalls.o pvtime.o \
 	 vgic/vgic-v3.o vgic/vgic-v4.o \
 	 vgic/vgic-mmio.o vgic/vgic-mmio-v2.o \
 	 vgic/vgic-mmio-v3.o vgic/vgic-kvm-device.o \
-	 vgic/vgic-its.o vgic/vgic-debug.o
+	 vgic/vgic-its.o vgic/vgic-debug.o vgic/vgic-v3-nested.o
 
 kvm-$(CONFIG_HW_PERF_EVENTS)  += pmu-emul.o
 
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 6597cb8f64f9..91fb7bc09b8e 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -755,6 +755,8 @@ static void check_vcpu_requests(struct kvm_vcpu *vcpu)
 		if (kvm_check_request(KVM_REQ_RELOAD_PMU, vcpu))
 			kvm_pmu_handle_pmcr(vcpu,
 					    __vcpu_sys_reg(vcpu, PMCR_EL0));
+
+		check_nested_vcpu_requests(vcpu);
 	}
 }
 
@@ -846,9 +848,16 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 		if (!ret)
 			ret = 1;
 
-		update_vmid(&vcpu->arch.hw_mmu->vmid);
-
+		/*
+		 * A nested exception triggered by a vcpu request (such
+		 * as an interrupt injected in a guest hypervisor) can
+		 * change the currently used VMID (by switching to a
+		 * different translation regime). It is thus necessary
+		 * to update the VMID *after* all requests have been
+		 * processed.
+		 */
 		check_vcpu_requests(vcpu);
+		update_vmid(&vcpu->arch.hw_mmu->vmid);
 
 		/*
 		 * Preparing the interrupts to be injected also
diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
index b39af4d87787..39c9b1033f02 100644
--- a/arch/arm64/kvm/nested.c
+++ b/arch/arm64/kvm/nested.c
@@ -580,6 +580,22 @@ void kvm_arch_flush_shadow_all(struct kvm *kvm)
 	kvm_free_stage2_pgd(&kvm->arch.mmu);
 }
 
+bool vgic_state_is_nested(struct kvm_vcpu *vcpu)
+{
+	bool imo = __vcpu_sys_reg(vcpu, HCR_EL2) & HCR_IMO;
+	bool fmo = __vcpu_sys_reg(vcpu, HCR_EL2) & HCR_FMO;
+
+	WARN_ONCE(imo != fmo, "Separate virtual IRQ/FIQ settings not supported\n");
+
+	return vcpu_has_nv(vcpu) && imo && fmo && !is_hyp_ctxt(vcpu);
+}
+
+void check_nested_vcpu_requests(struct kvm_vcpu *vcpu)
+{
+	if (kvm_check_request(KVM_REQ_GUEST_HYP_IRQ_PENDING, vcpu))
+		kvm_inject_nested_irq(vcpu);
+}
+
 /*
  * Our emulated CPU doesn't support all the possible features. For the
  * sake of simplicity (and probably mental sanity), wipe out a number
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index f2452e76156c..a18d845bcff7 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -16,6 +16,8 @@
 #include <linux/printk.h>
 #include <linux/uaccess.h>
 
+#include <linux/irqchip/arm-gic-v3.h>
+
 #include <asm/cacheflush.h>
 #include <asm/cputype.h>
 #include <asm/debug-monitors.h>
@@ -407,6 +409,19 @@ static bool access_actlr(struct kvm_vcpu *vcpu,
 	return true;
 }
 
+/*
+ * The architecture says that non-secure write accesses to this register from
+ * EL1 are trapped to EL2, if either:
+ *  - HCR_EL2.FMO==1, or
+ *  - HCR_EL2.IMO==1
+ */
+static bool sgi_traps_to_vel2(struct kvm_vcpu *vcpu)
+{
+	return (vcpu_has_nv(vcpu) &&
+		!vcpu_is_el2(vcpu) &&
+		!!(vcpu_read_sys_reg(vcpu, HCR_EL2) & (HCR_IMO | HCR_FMO)));
+}
+
 /*
  * Trap handler for the GICv3 SGI generation system register.
  * Forward the request to the VGIC emulation.
@@ -422,6 +437,11 @@ static bool access_gic_sgi(struct kvm_vcpu *vcpu,
 	if (!p->is_write)
 		return read_from_write_only(vcpu, p, r);
 
+	if (sgi_traps_to_vel2(vcpu)) {
+		kvm_inject_nested_sync(vcpu, kvm_vcpu_get_esr(vcpu));
+		return false;
+	}
+
 	/*
 	 * In a system where GICD_CTLR.DS=1, a ICC_SGI0R_EL1 access generates
 	 * Group0 SGIs only, while ICC_SGI1R_EL1 can generate either group,
@@ -465,7 +485,13 @@ static bool access_gic_sre(struct kvm_vcpu *vcpu,
 	if (p->is_write)
 		return ignore_write(vcpu, p);
 
-	p->regval = vcpu->arch.vgic_cpu.vgic_v3.vgic_sre;
+	if (p->Op1 == 4) {	/* ICC_SRE_EL2 */
+		p->regval = (ICC_SRE_EL2_ENABLE | ICC_SRE_EL2_SRE |
+			     ICC_SRE_EL1_DIB | ICC_SRE_EL1_DFB);
+	} else {		/* ICC_SRE_EL1 */
+		p->regval = vcpu->arch.vgic_cpu.vgic_v3.vgic_sre;
+	}
+
 	return true;
 }
 
@@ -1835,6 +1861,122 @@ static bool access_spsr_el2(struct kvm_vcpu *vcpu,
 	return true;
 }
 
+static bool access_gic_apr(struct kvm_vcpu *vcpu,
+			   struct sys_reg_params *p,
+			   const struct sys_reg_desc *r)
+{
+	struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.nested_vgic_v3;
+	u32 index, *base;
+
+	index = r->Op2;
+	if (r->CRm == 8)
+		base = cpu_if->vgic_ap0r;
+	else
+		base = cpu_if->vgic_ap1r;
+
+	if (p->is_write)
+		base[index] = p->regval;
+	else
+		p->regval = base[index];
+
+	return true;
+}
+
+static bool access_gic_hcr(struct kvm_vcpu *vcpu,
+			   struct sys_reg_params *p,
+			   const struct sys_reg_desc *r)
+{
+	struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.nested_vgic_v3;
+
+	if (p->is_write)
+		cpu_if->vgic_hcr = p->regval;
+	else
+		p->regval = cpu_if->vgic_hcr;
+
+	return true;
+}
+
+static bool access_gic_vtr(struct kvm_vcpu *vcpu,
+			   struct sys_reg_params *p,
+			   const struct sys_reg_desc *r)
+{
+	if (p->is_write)
+		return write_to_read_only(vcpu, p, r);
+
+	p->regval = kvm_vgic_global_state.ich_vtr_el2;
+
+	return true;
+}
+
+static bool access_gic_misr(struct kvm_vcpu *vcpu,
+			    struct sys_reg_params *p,
+			    const struct sys_reg_desc *r)
+{
+	if (p->is_write)
+		return write_to_read_only(vcpu, p, r);
+
+	p->regval = vgic_v3_get_misr(vcpu);
+
+	return true;
+}
+
+static bool access_gic_eisr(struct kvm_vcpu *vcpu,
+			    struct sys_reg_params *p,
+			    const struct sys_reg_desc *r)
+{
+	if (p->is_write)
+		return write_to_read_only(vcpu, p, r);
+
+	p->regval = vgic_v3_get_eisr(vcpu);
+
+	return true;
+}
+
+static bool access_gic_elrsr(struct kvm_vcpu *vcpu,
+			     struct sys_reg_params *p,
+			     const struct sys_reg_desc *r)
+{
+	if (p->is_write)
+		return write_to_read_only(vcpu, p, r);
+
+	p->regval = vgic_v3_get_elrsr(vcpu);
+
+	return true;
+}
+
+static bool access_gic_vmcr(struct kvm_vcpu *vcpu,
+			    struct sys_reg_params *p,
+			    const struct sys_reg_desc *r)
+{
+	struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.nested_vgic_v3;
+
+	if (p->is_write)
+		cpu_if->vgic_vmcr = p->regval;
+	else
+		p->regval = cpu_if->vgic_vmcr;
+
+	return true;
+}
+
+static bool access_gic_lr(struct kvm_vcpu *vcpu,
+			  struct sys_reg_params *p,
+			  const struct sys_reg_desc *r)
+{
+	struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.nested_vgic_v3;
+	u32 index;
+
+	index = p->Op2;
+	if (p->CRm == 13)
+		index += 8;
+
+	if (p->is_write)
+		cpu_if->vgic_lr[index] = p->regval;
+	else
+		p->regval = cpu_if->vgic_lr[index];
+
+	return true;
+}
+
 /*
  * Architected system registers.
  * Important: Must be sorted ascending by Op0, Op1, CRn, CRm, Op2
@@ -2286,6 +2428,41 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 	{ SYS_DESC(SYS_RMR_EL2), trap_undef },
 	{ SYS_DESC(SYS_VDISR_EL2), trap_undef },
 
+	{ SYS_DESC(SYS_ICH_AP0R0_EL2), access_gic_apr },
+	{ SYS_DESC(SYS_ICH_AP0R1_EL2), access_gic_apr },
+	{ SYS_DESC(SYS_ICH_AP0R2_EL2), access_gic_apr },
+	{ SYS_DESC(SYS_ICH_AP0R3_EL2), access_gic_apr },
+	{ SYS_DESC(SYS_ICH_AP1R0_EL2), access_gic_apr },
+	{ SYS_DESC(SYS_ICH_AP1R1_EL2), access_gic_apr },
+	{ SYS_DESC(SYS_ICH_AP1R2_EL2), access_gic_apr },
+	{ SYS_DESC(SYS_ICH_AP1R3_EL2), access_gic_apr },
+
+	{ SYS_DESC(SYS_ICC_SRE_EL2), access_gic_sre },
+
+	{ SYS_DESC(SYS_ICH_HCR_EL2), access_gic_hcr },
+	{ SYS_DESC(SYS_ICH_VTR_EL2), access_gic_vtr },
+	{ SYS_DESC(SYS_ICH_MISR_EL2), access_gic_misr },
+	{ SYS_DESC(SYS_ICH_EISR_EL2), access_gic_eisr },
+	{ SYS_DESC(SYS_ICH_ELRSR_EL2), access_gic_elrsr },
+	{ SYS_DESC(SYS_ICH_VMCR_EL2), access_gic_vmcr },
+
+	{ SYS_DESC(SYS_ICH_LR0_EL2), access_gic_lr },
+	{ SYS_DESC(SYS_ICH_LR1_EL2), access_gic_lr },
+	{ SYS_DESC(SYS_ICH_LR2_EL2), access_gic_lr },
+	{ SYS_DESC(SYS_ICH_LR3_EL2), access_gic_lr },
+	{ SYS_DESC(SYS_ICH_LR4_EL2), access_gic_lr },
+	{ SYS_DESC(SYS_ICH_LR5_EL2), access_gic_lr },
+	{ SYS_DESC(SYS_ICH_LR6_EL2), access_gic_lr },
+	{ SYS_DESC(SYS_ICH_LR7_EL2), access_gic_lr },
+	{ SYS_DESC(SYS_ICH_LR8_EL2), access_gic_lr },
+	{ SYS_DESC(SYS_ICH_LR9_EL2), access_gic_lr },
+	{ SYS_DESC(SYS_ICH_LR10_EL2), access_gic_lr },
+	{ SYS_DESC(SYS_ICH_LR11_EL2), access_gic_lr },
+	{ SYS_DESC(SYS_ICH_LR12_EL2), access_gic_lr },
+	{ SYS_DESC(SYS_ICH_LR13_EL2), access_gic_lr },
+	{ SYS_DESC(SYS_ICH_LR14_EL2), access_gic_lr },
+	{ SYS_DESC(SYS_ICH_LR15_EL2), access_gic_lr },
+
 	EL2_REG(CONTEXTIDR_EL2, access_rw, reset_val, 0),
 	EL2_REG(TPIDR_EL2, access_rw, reset_val, 0),
 
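As a side note, the ICH_LR<n>_EL2 descriptors added above all funnel into
access_gic_lr(), which recovers the LR index from the register encoding:
LR0-LR7 are encoded with CRm=12 and LR8-LR15 with CRm=13, with Op2 carrying
the low three bits of the index. A minimal sketch of that index computation,
illustrative only and not taken from the patch:

static unsigned int ich_lr_index(unsigned int crm, unsigned int op2)
{
	/* ICH_LR0..7_EL2 live in CRm=12, ICH_LR8..15_EL2 in CRm=13 */
	return (crm == 13 ? 8 : 0) + op2;
}
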
diff --git a/arch/arm64/kvm/vgic/vgic-v3-nested.c b/arch/arm64/kvm/vgic/vgic-v3-nested.c
new file mode 100644
index 000000000000..ab8ddf490b31
--- /dev/null
+++ b/arch/arm64/kvm/vgic/vgic-v3-nested.c
@@ -0,0 +1,180 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include <linux/cpu.h>
+#include <linux/kvm.h>
+#include <linux/kvm_host.h>
+#include <linux/interrupt.h>
+#include <linux/io.h>
+#include <linux/uaccess.h>
+
+#include <linux/irqchip/arm-gic-v3.h>
+
+#include <asm/kvm_emulate.h>
+#include <asm/kvm_arm.h>
+#include <kvm/arm_vgic.h>
+
+#include "vgic.h"
+
+static inline struct vgic_v3_cpu_if *vcpu_nested_if(struct kvm_vcpu *vcpu)
+{
+	return &vcpu->arch.vgic_cpu.nested_vgic_v3;
+}
+
+static inline struct vgic_v3_cpu_if *vcpu_shadow_if(struct kvm_vcpu *vcpu)
+{
+	return &vcpu->arch.vgic_cpu.shadow_vgic_v3;
+}
+
+static inline bool lr_triggers_eoi(u64 lr)
+{
+	return !(lr & (ICH_LR_STATE | ICH_LR_HW)) && (lr & ICH_LR_EOI);
+}
+
+u16 vgic_v3_get_eisr(struct kvm_vcpu *vcpu)
+{
+	struct vgic_v3_cpu_if *cpu_if = vcpu_nested_if(vcpu);
+	u16 reg = 0;
+	int i;
+
+	for (i = 0; i < kvm_vgic_global_state.nr_lr; i++) {
+		if (lr_triggers_eoi(cpu_if->vgic_lr[i]))
+			reg |= BIT(i);
+	}
+
+	return reg;
+}
+
+u16 vgic_v3_get_elrsr(struct kvm_vcpu *vcpu)
+{
+	struct vgic_v3_cpu_if *cpu_if = vcpu_nested_if(vcpu);
+	u16 reg = 0;
+	int i;
+
+	for (i = 0; i < kvm_vgic_global_state.nr_lr; i++) {
+		if (!(cpu_if->vgic_lr[i] & ICH_LR_STATE))
+			reg |= BIT(i);
+	}
+
+	return reg;
+}
+
+u64 vgic_v3_get_misr(struct kvm_vcpu *vcpu)
+{
+	struct vgic_v3_cpu_if *cpu_if = vcpu_nested_if(vcpu);
+	int nr_lr = kvm_vgic_global_state.nr_lr;
+	u64 reg = 0;
+
+	if (vgic_v3_get_eisr(vcpu))
+		reg |= ICH_MISR_EOI;
+
+	if (cpu_if->vgic_hcr & ICH_HCR_UIE) {
+		int used_lrs;
+
+		used_lrs = nr_lr - hweight16(vgic_v3_get_elrsr(vcpu));
+		if (used_lrs <= 1)
+			reg |= ICH_MISR_U;
+	}
+
+	/* TODO: Support remaining bits in this register */
+	return reg;
+}
+
+/*
+ * For LRs that have the HW bit set, such as timer interrupts, we modify them
+ * to carry the host hardware interrupt number instead of the virtual one
+ * programmed by the guest hypervisor.
+ */
+static void vgic_v3_create_shadow_lr(struct kvm_vcpu *vcpu)
+{
+	struct vgic_v3_cpu_if *cpu_if = vcpu_nested_if(vcpu);
+	struct vgic_v3_cpu_if *s_cpu_if = vcpu_shadow_if(vcpu);
+	struct vgic_irq *irq;
+	int i, used_lrs = 0;
+
+	for (i = 0; i < kvm_vgic_global_state.nr_lr; i++) {
+		u64 lr = cpu_if->vgic_lr[i];
+		int l1_irq;
+
+		if (!(lr & ICH_LR_HW))
+			goto next;
+
+		/* We have the HW bit set */
+		l1_irq = (lr & ICH_LR_PHYS_ID_MASK) >> ICH_LR_PHYS_ID_SHIFT;
+		irq = vgic_get_irq(vcpu->kvm, vcpu, l1_irq);
+
+		if (!irq || !irq->hw) {
+			/* There was no real mapping, so nuke the HW bit */
+			lr &= ~ICH_LR_HW;
+			if (irq)
+				vgic_put_irq(vcpu->kvm, irq);
+			goto next;
+		}
+
+		/* Translate the virtual mapping to the real one */
+		lr &= ~ICH_LR_EOI; /* Why? */
+		lr &= ~ICH_LR_PHYS_ID_MASK;
+		lr |= (u64)irq->hwintid << ICH_LR_PHYS_ID_SHIFT;
+		vgic_put_irq(vcpu->kvm, irq);
+
+next:
+		s_cpu_if->vgic_lr[i] = lr;
+		used_lrs = i + 1;
+	}
+
+	s_cpu_if->used_lrs = used_lrs;
+}
+
+/*
+ * Change the shadow HWIRQ field back to the virtual value before copying over
+ * the entire shadow struct to the nested state.
+ */
+static void vgic_v3_fixup_shadow_lr_state(struct kvm_vcpu *vcpu)
+{
+	struct vgic_v3_cpu_if *cpu_if = vcpu_nested_if(vcpu);
+	struct vgic_v3_cpu_if *s_cpu_if = vcpu_shadow_if(vcpu);
+	int lr;
+
+	for (lr = 0; lr < kvm_vgic_global_state.nr_lr; lr++) {
+		s_cpu_if->vgic_lr[lr] &= ~ICH_LR_PHYS_ID_MASK;
+		s_cpu_if->vgic_lr[lr] |= cpu_if->vgic_lr[lr] & ICH_LR_PHYS_ID_MASK;
+	}
+}
+
+void vgic_v3_load_nested(struct kvm_vcpu *vcpu)
+{
+	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
+
+	vgic_cpu->shadow_vgic_v3 = vgic_cpu->nested_vgic_v3;
+	vgic_v3_create_shadow_lr(vcpu);
+	__vgic_v3_restore_state(vcpu_shadow_if(vcpu));
+}
+
+void vgic_v3_put_nested(struct kvm_vcpu *vcpu)
+{
+	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
+
+	__vgic_v3_save_state(vcpu_shadow_if(vcpu));
+
+	/*
+	 * Translate the shadow state HW fields back to the virtual ones
+	 * before copying the shadow struct back to the nested one.
+	 */
+	vgic_v3_fixup_shadow_lr_state(vcpu);
+	vgic_cpu->nested_vgic_v3 = vgic_cpu->shadow_vgic_v3;
+}
+
+void vgic_v3_handle_nested_maint_irq(struct kvm_vcpu *vcpu)
+{
+	struct vgic_v3_cpu_if *cpu_if = vcpu_nested_if(vcpu);
+
+	/*
+	 * If we exit a nested VM with a pending maintenance interrupt from the
+	 * GIC, then we need to forward this to the guest hypervisor so that it
+	 * can re-sync the appropriate LRs and sample level triggered interrupts
+	 * again.
+	 */
+	if (vgic_state_is_nested(vcpu) &&
+	    (cpu_if->vgic_hcr & ICH_HCR_EN) &&
+	    vgic_v3_get_misr(vcpu))
+		kvm_inject_nested_irq(vcpu);
+}
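
As a side note on vgic_v3_create_shadow_lr() above: for a hardware-mapped LR,
only the pINTID field is rewritten (the patch also clears ICH_LR_EOI there);
everything else the guest hypervisor programmed is preserved. A minimal sketch
of that rewrite, illustrative only, where hwintid stands for the host
interrupt number found via vgic_get_irq():

static u64 shadow_lr_for_hw_irq(u64 guest_lr, u32 hwintid)
{
	/* Replace the guest-programmed pINTID with the host INTID */
	guest_lr &= ~ICH_LR_PHYS_ID_MASK;
	guest_lr |= (u64)hwintid << ICH_LR_PHYS_ID_SHIFT;

	return guest_lr;
}
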
diff --git a/arch/arm64/kvm/vgic/vgic-v3.c b/arch/arm64/kvm/vgic/vgic-v3.c
index a33d4366b326..6bfca28ecced 100644
--- a/arch/arm64/kvm/vgic/vgic-v3.c
+++ b/arch/arm64/kvm/vgic/vgic-v3.c
@@ -8,6 +8,7 @@
 #include <kvm/arm_vgic.h>
 #include <asm/kvm_hyp.h>
 #include <asm/kvm_mmu.h>
+#include <asm/kvm_nested.h>
 #include <asm/kvm_asm.h>
 
 #include "vgic.h"
@@ -277,6 +278,12 @@ void vgic_v3_enable(struct kvm_vcpu *vcpu)
 		vgic_v3->vgic_sre = (ICC_SRE_EL1_DIB |
 				     ICC_SRE_EL1_DFB |
 				     ICC_SRE_EL1_SRE);
+		/*
+		 * If nesting is allowed, force GICv3 onto the nested
+		 * guests as well.
+		 */
+		if (vcpu_has_nv(vcpu))
+			vcpu->arch.vgic_cpu.nested_vgic_v3.vgic_sre = vgic_v3->vgic_sre;
 		vcpu->arch.vgic_cpu.pendbaser = INITIAL_PENDBASER_VALUE;
 	} else {
 		vgic_v3->vgic_sre = 0;
@@ -707,6 +714,13 @@ void vgic_v3_load(struct kvm_vcpu *vcpu)
 {
 	struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
 
+	/*
+	 * vgic_v3_load_nested only affects the LRs in the shadow
+	 * state, so it is fine to pass the nested state around.
+	 */
+	if (vgic_state_is_nested(vcpu))
+		cpu_if = &vcpu->arch.vgic_cpu.nested_vgic_v3;
+
 	/*
 	 * If dealing with a GICv2 emulation on GICv3, VMCR_EL2.VFIQen
 	 * is dependent on ICC_SRE_EL1.SRE, and we have to perform the
@@ -720,6 +734,9 @@ void vgic_v3_load(struct kvm_vcpu *vcpu)
 	if (has_vhe())
 		__vgic_v3_activate_traps(cpu_if);
 
+	if (vgic_state_is_nested(vcpu))
+		vgic_v3_load_nested(vcpu);
+
 	WARN_ON(vgic_v4_load(vcpu));
 }
 
@@ -727,6 +744,9 @@ void vgic_v3_vmcr_sync(struct kvm_vcpu *vcpu)
 {
 	struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
 
+	if (vgic_state_is_nested(vcpu))
+		cpu_if = &vcpu->arch.vgic_cpu.shadow_vgic_v3;
+
 	if (likely(cpu_if->vgic_sre))
 		cpu_if->vgic_vmcr = kvm_call_hyp_ret(__vgic_v3_read_vmcr);
 }
@@ -739,8 +759,14 @@ void vgic_v3_put(struct kvm_vcpu *vcpu)
 
 	vgic_v3_vmcr_sync(vcpu);
 
+	if (vgic_state_is_nested(vcpu))
+		cpu_if = &vcpu->arch.vgic_cpu.shadow_vgic_v3;
+
 	kvm_call_hyp(__vgic_v3_save_aprs, cpu_if);
 
 	if (has_vhe())
 		__vgic_v3_deactivate_traps(cpu_if);
+
+	if (vgic_state_is_nested(vcpu))
+		vgic_v3_put_nested(vcpu);
 }
diff --git a/arch/arm64/kvm/vgic/vgic.c b/arch/arm64/kvm/vgic/vgic.c
index e7fe0447af08..f467ea454c66 100644
--- a/arch/arm64/kvm/vgic/vgic.c
+++ b/arch/arm64/kvm/vgic/vgic.c
@@ -876,6 +876,10 @@ void kvm_vgic_sync_hwstate(struct kvm_vcpu *vcpu)
 {
 	int used_lrs;
 
+	/* If nesting, this is a load/put affair, not flush/sync. */
+	if (vgic_state_is_nested(vcpu))
+		return;
+
 	/* An empty ap_list_head implies used_lrs == 0 */
 	if (list_empty(&vcpu->arch.vgic_cpu.ap_list_head))
 		return;
@@ -920,6 +924,29 @@ void kvm_vgic_flush_hwstate(struct kvm_vcpu *vcpu)
 	    !vgic_supports_direct_msis(vcpu->kvm))
 		return;
 
+	/*
+	 * If in a nested state, we must return early. Two possibilities:
+	 *
+	 * - If we have any pending IRQ for the guest and the guest
+	 *   expects IRQs to be handled in its virtual EL2 mode (the
+	 *   virtual IMO bit is set) and it is not already running in
+	 *   virtual EL2 mode, then we have to emulate an IRQ
+	 *   exception to virtual EL2.
+	 *
+	 *   We do that by placing a request to ourselves which will
+	 *   abort the entry procedure and inject the exception at the
+	 *   beginning of the run loop.
+	 *
+	 * - Otherwise, do exactly *NOTHING*. The guest state is
+	 *   already loaded, and we can carry on with running it.
+	 */
+	if (vgic_state_is_nested(vcpu)) {
+		if (kvm_vgic_vcpu_pending_irq(vcpu))
+			kvm_make_request(KVM_REQ_GUEST_HYP_IRQ_PENDING, vcpu);
+
+		return;
+	}
+
 	DEBUG_SPINLOCK_BUG_ON(!irqs_disabled());
 
 	if (!list_empty(&vcpu->arch.vgic_cpu.ap_list_head)) {
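
Tying the vgic.c hunk above to the arm.c change earlier in the patch: while
the L2 guest's state is loaded, kvm_vgic_flush_hwstate() never touches the
list registers itself; it only posts a request, and the actual injection into
virtual EL2 happens from the run loop, before the VMID is recomputed. A
condensed sketch of the two halves, illustrative only (both exist as separate
functions in the patch):

static void nested_irq_flow_example(struct kvm_vcpu *vcpu)
{
	/* kvm_vgic_flush_hwstate(): only post a request while L2 is loaded */
	if (vgic_state_is_nested(vcpu) && kvm_vgic_vcpu_pending_irq(vcpu))
		kvm_make_request(KVM_REQ_GUEST_HYP_IRQ_PENDING, vcpu);

	/* check_nested_vcpu_requests(), from the run loop: inject at vEL2 */
	if (kvm_check_request(KVM_REQ_GUEST_HYP_IRQ_PENDING, vcpu))
		kvm_inject_nested_irq(vcpu);
}
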
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 17b651890d22..2e27259b451f 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -327,6 +327,15 @@ struct vgic_cpu {
 
 	struct vgic_irq private_irqs[VGIC_NR_PRIVATE_IRQS];
 
+	/* CPU vif control registers for the virtual GICH interface */
+	struct vgic_v3_cpu_if	nested_vgic_v3;
+
+	/*
+	 * The shadow vif control registers loaded into the hardware when
+	 * running a nested L2 guest with the virtual IMO/FMO bits set.
+	 */
+	struct vgic_v3_cpu_if	shadow_vgic_v3;
+
 	raw_spinlock_t ap_list_lock;	/* Protects the ap_list */
 
 	/*
@@ -384,6 +393,13 @@ void kvm_vgic_load(struct kvm_vcpu *vcpu);
 void kvm_vgic_put(struct kvm_vcpu *vcpu);
 void kvm_vgic_vmcr_sync(struct kvm_vcpu *vcpu);
 
+void vgic_v3_load_nested(struct kvm_vcpu *vcpu);
+void vgic_v3_put_nested(struct kvm_vcpu *vcpu);
+void vgic_v3_handle_nested_maint_irq(struct kvm_vcpu *vcpu);
+u16 vgic_v3_get_eisr(struct kvm_vcpu *vcpu);
+u16 vgic_v3_get_elrsr(struct kvm_vcpu *vcpu);
+u64 vgic_v3_get_misr(struct kvm_vcpu *vcpu);
+
 #define irqchip_in_kernel(k)	(!!((k)->arch.vgic.in_kernel))
 #define vgic_initialized(k)	((k)->arch.vgic.initialized)
 #define vgic_ready(k)		((k)->arch.vgic.ready)
@@ -428,4 +444,6 @@ int vgic_v4_load(struct kvm_vcpu *vcpu);
 void vgic_v4_commit(struct kvm_vcpu *vcpu);
 int vgic_v4_put(struct kvm_vcpu *vcpu, bool need_db);
 
+bool vgic_state_is_nested(struct kvm_vcpu *vcpu);
+
 #endif /* __KVM_ARM_VGIC_H */
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 378+ messages in thread

* [PATCH v6 46/64] KVM: arm64: nv: Nested GICv3 Support
@ 2022-01-28 12:18   ` Marc Zyngier
  0 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:18 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Jintack Lim, Haibo Xu,
	Ganapatrao Kulkarni, Chase Conklin, Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

From: Jintack Lim <jintack@cs.columbia.edu>

When entering a nested VM, we set up the hypervisor control interface
based on what the guest hypervisor has set. In particular, we inspect
each list register written by the guest hypervisor to see whether its
HW bit is set. If so, we translate the hw irq number from the guest's
point of view to the real hardware irq number, provided there is a
mapping.

Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
[Redesigned execution flow around vcpu load/put]
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
[Rewritten to support GICv3 instead of GICv2]
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_emulate.h |   8 +-
 arch/arm64/include/asm/kvm_host.h    |  11 +-
 arch/arm64/include/asm/kvm_nested.h  |   1 +
 arch/arm64/kvm/Makefile              |   2 +-
 arch/arm64/kvm/arm.c                 |  13 +-
 arch/arm64/kvm/nested.c              |  16 +++
 arch/arm64/kvm/sys_regs.c            | 179 +++++++++++++++++++++++++-
 arch/arm64/kvm/vgic/vgic-v3-nested.c | 180 +++++++++++++++++++++++++++
 arch/arm64/kvm/vgic/vgic-v3.c        |  26 ++++
 arch/arm64/kvm/vgic/vgic.c           |  27 ++++
 include/kvm/arm_vgic.h               |  18 +++
 11 files changed, 471 insertions(+), 10 deletions(-)
 create mode 100644 arch/arm64/kvm/vgic/vgic-v3-nested.c

diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index dc6eeb0cc8a9..2128d623a8b3 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -505,7 +505,13 @@ static inline bool kvm_is_write_fault(struct kvm_vcpu *vcpu)
 
 static inline unsigned long kvm_vcpu_get_mpidr_aff(struct kvm_vcpu *vcpu)
 {
-	return vcpu_read_sys_reg(vcpu, MPIDR_EL1) & MPIDR_HWID_BITMASK;
+	/*
+	 * Use the in-memory view for MPIDR_EL1. It can't be changed by the
+	 * guest, and is also accessed from the context of *another* vcpu,
+	 * so anything using some other state (such as the NV state that is
+	 * used by vcpu_read_sys_reg) will eventually go wrong.
+	 */
+	return __vcpu_sys_reg(vcpu, MPIDR_EL1) & MPIDR_HWID_BITMASK;
 }
 
 static inline void kvm_vcpu_set_be(struct kvm_vcpu *vcpu)
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 03833eca3307..6bfdb21c0d8a 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -41,11 +41,12 @@
 
 #define KVM_REQ_SLEEP \
 	KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
-#define KVM_REQ_IRQ_PENDING	KVM_ARCH_REQ(1)
-#define KVM_REQ_VCPU_RESET	KVM_ARCH_REQ(2)
-#define KVM_REQ_RECORD_STEAL	KVM_ARCH_REQ(3)
-#define KVM_REQ_RELOAD_GICv4	KVM_ARCH_REQ(4)
-#define KVM_REQ_RELOAD_PMU	KVM_ARCH_REQ(5)
+#define KVM_REQ_IRQ_PENDING		KVM_ARCH_REQ(1)
+#define KVM_REQ_VCPU_RESET		KVM_ARCH_REQ(2)
+#define KVM_REQ_RECORD_STEAL		KVM_ARCH_REQ(3)
+#define KVM_REQ_RELOAD_GICv4		KVM_ARCH_REQ(4)
+#define KVM_REQ_RELOAD_PMU		KVM_ARCH_REQ(5)
+#define KVM_REQ_GUEST_HYP_IRQ_PENDING	KVM_ARCH_REQ(6)
 
 #define KVM_DIRTY_LOG_MANUAL_CAPS   (KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE | \
 				     KVM_DIRTY_LOG_INITIALLY_SET)
diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
index 8915bead0633..3eef5f60c76d 100644
--- a/arch/arm64/include/asm/kvm_nested.h
+++ b/arch/arm64/include/asm/kvm_nested.h
@@ -71,6 +71,7 @@ extern void kvm_init_nested_s2_mmu(struct kvm_s2_mmu *mmu);
 extern struct kvm_s2_mmu *lookup_s2_mmu(struct kvm *kvm, u64 vttbr, u64 hcr);
 extern void kvm_vcpu_load_hw_mmu(struct kvm_vcpu *vcpu);
 extern void kvm_vcpu_put_hw_mmu(struct kvm_vcpu *vcpu);
+extern void check_nested_vcpu_requests(struct kvm_vcpu *vcpu);
 
 struct kvm_s2_trans {
 	phys_addr_t output;
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index b800dcbd157f..a982fe1b4fc4 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -20,7 +20,7 @@ kvm-y += arm.o mmu.o mmio.o psci.o hypercalls.o pvtime.o \
 	 vgic/vgic-v3.o vgic/vgic-v4.o \
 	 vgic/vgic-mmio.o vgic/vgic-mmio-v2.o \
 	 vgic/vgic-mmio-v3.o vgic/vgic-kvm-device.o \
-	 vgic/vgic-its.o vgic/vgic-debug.o
+	 vgic/vgic-its.o vgic/vgic-debug.o vgic/vgic-v3-nested.o
 
 kvm-$(CONFIG_HW_PERF_EVENTS)  += pmu-emul.o
 
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 6597cb8f64f9..91fb7bc09b8e 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -755,6 +755,8 @@ static void check_vcpu_requests(struct kvm_vcpu *vcpu)
 		if (kvm_check_request(KVM_REQ_RELOAD_PMU, vcpu))
 			kvm_pmu_handle_pmcr(vcpu,
 					    __vcpu_sys_reg(vcpu, PMCR_EL0));
+
+		check_nested_vcpu_requests(vcpu);
 	}
 }
 
@@ -846,9 +848,16 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 		if (!ret)
 			ret = 1;
 
-		update_vmid(&vcpu->arch.hw_mmu->vmid);
-
+		/*
+		 * A nested exception triggered by a vcpu request (such
+		 * as an interrupt injected in a guest hypervisor) can
+		 * change the currently used VMID (by switching to a
+		 * different translation regime). It is thus necessary
+		 * to update the VMID *after* all requests have been
+		 * processed.
+		 */
 		check_vcpu_requests(vcpu);
+		update_vmid(&vcpu->arch.hw_mmu->vmid);
 
 		/*
 		 * Preparing the interrupts to be injected also
diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
index b39af4d87787..39c9b1033f02 100644
--- a/arch/arm64/kvm/nested.c
+++ b/arch/arm64/kvm/nested.c
@@ -580,6 +580,22 @@ void kvm_arch_flush_shadow_all(struct kvm *kvm)
 	kvm_free_stage2_pgd(&kvm->arch.mmu);
 }
 
+bool vgic_state_is_nested(struct kvm_vcpu *vcpu)
+{
+	bool imo = __vcpu_sys_reg(vcpu, HCR_EL2) & HCR_IMO;
+	bool fmo = __vcpu_sys_reg(vcpu, HCR_EL2) & HCR_FMO;
+
+	WARN_ONCE(imo != fmo, "Separate virtual IRQ/FIQ settings not supported\n");
+
+	return vcpu_has_nv(vcpu) && imo && fmo && !is_hyp_ctxt(vcpu);
+}
+
+void check_nested_vcpu_requests(struct kvm_vcpu *vcpu)
+{
+	if (kvm_check_request(KVM_REQ_GUEST_HYP_IRQ_PENDING, vcpu))
+		kvm_inject_nested_irq(vcpu);
+}
+
 /*
  * Our emulated CPU doesn't support all the possible features. For the
  * sake of simplicity (and probably mental sanity), wipe out a number
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index f2452e76156c..a18d845bcff7 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -16,6 +16,8 @@
 #include <linux/printk.h>
 #include <linux/uaccess.h>
 
+#include <linux/irqchip/arm-gic-v3.h>
+
 #include <asm/cacheflush.h>
 #include <asm/cputype.h>
 #include <asm/debug-monitors.h>
@@ -407,6 +409,19 @@ static bool access_actlr(struct kvm_vcpu *vcpu,
 	return true;
 }
 
+/*
+ * The architecture says that non-secure write accesses to this register from
+ * EL1 are trapped to EL2, if either:
+ *  - HCR_EL2.FMO==1, or
+ *  - HCR_EL2.IMO==1
+ */
+static bool sgi_traps_to_vel2(struct kvm_vcpu *vcpu)
+{
+	return (vcpu_has_nv(vcpu) &&
+		!vcpu_is_el2(vcpu) &&
+		!!(vcpu_read_sys_reg(vcpu, HCR_EL2) & (HCR_IMO | HCR_FMO)));
+}
+
 /*
  * Trap handler for the GICv3 SGI generation system register.
  * Forward the request to the VGIC emulation.
@@ -422,6 +437,11 @@ static bool access_gic_sgi(struct kvm_vcpu *vcpu,
 	if (!p->is_write)
 		return read_from_write_only(vcpu, p, r);
 
+	if (sgi_traps_to_vel2(vcpu)) {
+		kvm_inject_nested_sync(vcpu, kvm_vcpu_get_esr(vcpu));
+		return false;
+	}
+
 	/*
 	 * In a system where GICD_CTLR.DS=1, a ICC_SGI0R_EL1 access generates
 	 * Group0 SGIs only, while ICC_SGI1R_EL1 can generate either group,
@@ -465,7 +485,13 @@ static bool access_gic_sre(struct kvm_vcpu *vcpu,
 	if (p->is_write)
 		return ignore_write(vcpu, p);
 
-	p->regval = vcpu->arch.vgic_cpu.vgic_v3.vgic_sre;
+	if (p->Op1 == 4) {	/* ICC_SRE_EL2 */
+		p->regval = (ICC_SRE_EL2_ENABLE | ICC_SRE_EL2_SRE |
+			     ICC_SRE_EL1_DIB | ICC_SRE_EL1_DFB);
+	} else {		/* ICC_SRE_EL1 */
+		p->regval = vcpu->arch.vgic_cpu.vgic_v3.vgic_sre;
+	}
+
 	return true;
 }
 
@@ -1835,6 +1861,122 @@ static bool access_spsr_el2(struct kvm_vcpu *vcpu,
 	return true;
 }
 
+static bool access_gic_apr(struct kvm_vcpu *vcpu,
+			   struct sys_reg_params *p,
+			   const struct sys_reg_desc *r)
+{
+	struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.nested_vgic_v3;
+	u32 index, *base;
+
+	index = r->Op2;
+	if (r->CRm == 8)
+		base = cpu_if->vgic_ap0r;
+	else
+		base = cpu_if->vgic_ap1r;
+
+	if (p->is_write)
+		base[index] = p->regval;
+	else
+		p->regval = base[index];
+
+	return true;
+}
+
+static bool access_gic_hcr(struct kvm_vcpu *vcpu,
+			   struct sys_reg_params *p,
+			   const struct sys_reg_desc *r)
+{
+	struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.nested_vgic_v3;
+
+	if (p->is_write)
+		cpu_if->vgic_hcr = p->regval;
+	else
+		p->regval = cpu_if->vgic_hcr;
+
+	return true;
+}
+
+static bool access_gic_vtr(struct kvm_vcpu *vcpu,
+			   struct sys_reg_params *p,
+			   const struct sys_reg_desc *r)
+{
+	if (p->is_write)
+		return write_to_read_only(vcpu, p, r);
+
+	p->regval = kvm_vgic_global_state.ich_vtr_el2;
+
+	return true;
+}
+
+static bool access_gic_misr(struct kvm_vcpu *vcpu,
+			    struct sys_reg_params *p,
+			    const struct sys_reg_desc *r)
+{
+	if (p->is_write)
+		return write_to_read_only(vcpu, p, r);
+
+	p->regval = vgic_v3_get_misr(vcpu);
+
+	return true;
+}
+
+static bool access_gic_eisr(struct kvm_vcpu *vcpu,
+			    struct sys_reg_params *p,
+			    const struct sys_reg_desc *r)
+{
+	if (p->is_write)
+		return write_to_read_only(vcpu, p, r);
+
+	p->regval = vgic_v3_get_eisr(vcpu);
+
+	return true;
+}
+
+static bool access_gic_elrsr(struct kvm_vcpu *vcpu,
+			     struct sys_reg_params *p,
+			     const struct sys_reg_desc *r)
+{
+	if (p->is_write)
+		return write_to_read_only(vcpu, p, r);
+
+	p->regval = vgic_v3_get_elrsr(vcpu);
+
+	return true;
+}
+
+static bool access_gic_vmcr(struct kvm_vcpu *vcpu,
+			    struct sys_reg_params *p,
+			    const struct sys_reg_desc *r)
+{
+	struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.nested_vgic_v3;
+
+	if (p->is_write)
+		cpu_if->vgic_vmcr = p->regval;
+	else
+		p->regval = cpu_if->vgic_vmcr;
+
+	return true;
+}
+
+static bool access_gic_lr(struct kvm_vcpu *vcpu,
+			  struct sys_reg_params *p,
+			  const struct sys_reg_desc *r)
+{
+	struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.nested_vgic_v3;
+	u32 index;
+
+	index = p->Op2;
+	if (p->CRm == 13)
+		index += 8;
+
+	if (p->is_write)
+		cpu_if->vgic_lr[index] = p->regval;
+	else
+		p->regval = cpu_if->vgic_lr[index];
+
+	return true;
+}
+
 /*
  * Architected system registers.
  * Important: Must be sorted ascending by Op0, Op1, CRn, CRm, Op2
@@ -2286,6 +2428,41 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 	{ SYS_DESC(SYS_RMR_EL2), trap_undef },
 	{ SYS_DESC(SYS_VDISR_EL2), trap_undef },
 
+	{ SYS_DESC(SYS_ICH_AP0R0_EL2), access_gic_apr },
+	{ SYS_DESC(SYS_ICH_AP0R1_EL2), access_gic_apr },
+	{ SYS_DESC(SYS_ICH_AP0R2_EL2), access_gic_apr },
+	{ SYS_DESC(SYS_ICH_AP0R3_EL2), access_gic_apr },
+	{ SYS_DESC(SYS_ICH_AP1R0_EL2), access_gic_apr },
+	{ SYS_DESC(SYS_ICH_AP1R1_EL2), access_gic_apr },
+	{ SYS_DESC(SYS_ICH_AP1R2_EL2), access_gic_apr },
+	{ SYS_DESC(SYS_ICH_AP1R3_EL2), access_gic_apr },
+
+	{ SYS_DESC(SYS_ICC_SRE_EL2), access_gic_sre },
+
+	{ SYS_DESC(SYS_ICH_HCR_EL2), access_gic_hcr },
+	{ SYS_DESC(SYS_ICH_VTR_EL2), access_gic_vtr },
+	{ SYS_DESC(SYS_ICH_MISR_EL2), access_gic_misr },
+	{ SYS_DESC(SYS_ICH_EISR_EL2), access_gic_eisr },
+	{ SYS_DESC(SYS_ICH_ELRSR_EL2), access_gic_elrsr },
+	{ SYS_DESC(SYS_ICH_VMCR_EL2), access_gic_vmcr },
+
+	{ SYS_DESC(SYS_ICH_LR0_EL2), access_gic_lr },
+	{ SYS_DESC(SYS_ICH_LR1_EL2), access_gic_lr },
+	{ SYS_DESC(SYS_ICH_LR2_EL2), access_gic_lr },
+	{ SYS_DESC(SYS_ICH_LR3_EL2), access_gic_lr },
+	{ SYS_DESC(SYS_ICH_LR4_EL2), access_gic_lr },
+	{ SYS_DESC(SYS_ICH_LR5_EL2), access_gic_lr },
+	{ SYS_DESC(SYS_ICH_LR6_EL2), access_gic_lr },
+	{ SYS_DESC(SYS_ICH_LR7_EL2), access_gic_lr },
+	{ SYS_DESC(SYS_ICH_LR8_EL2), access_gic_lr },
+	{ SYS_DESC(SYS_ICH_LR9_EL2), access_gic_lr },
+	{ SYS_DESC(SYS_ICH_LR10_EL2), access_gic_lr },
+	{ SYS_DESC(SYS_ICH_LR11_EL2), access_gic_lr },
+	{ SYS_DESC(SYS_ICH_LR12_EL2), access_gic_lr },
+	{ SYS_DESC(SYS_ICH_LR13_EL2), access_gic_lr },
+	{ SYS_DESC(SYS_ICH_LR14_EL2), access_gic_lr },
+	{ SYS_DESC(SYS_ICH_LR15_EL2), access_gic_lr },
+
 	EL2_REG(CONTEXTIDR_EL2, access_rw, reset_val, 0),
 	EL2_REG(TPIDR_EL2, access_rw, reset_val, 0),
 
diff --git a/arch/arm64/kvm/vgic/vgic-v3-nested.c b/arch/arm64/kvm/vgic/vgic-v3-nested.c
new file mode 100644
index 000000000000..ab8ddf490b31
--- /dev/null
+++ b/arch/arm64/kvm/vgic/vgic-v3-nested.c
@@ -0,0 +1,180 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include <linux/cpu.h>
+#include <linux/kvm.h>
+#include <linux/kvm_host.h>
+#include <linux/interrupt.h>
+#include <linux/io.h>
+#include <linux/uaccess.h>
+
+#include <linux/irqchip/arm-gic-v3.h>
+
+#include <asm/kvm_emulate.h>
+#include <asm/kvm_arm.h>
+#include <kvm/arm_vgic.h>
+
+#include "vgic.h"
+
+static inline struct vgic_v3_cpu_if *vcpu_nested_if(struct kvm_vcpu *vcpu)
+{
+	return &vcpu->arch.vgic_cpu.nested_vgic_v3;
+}
+
+static inline struct vgic_v3_cpu_if *vcpu_shadow_if(struct kvm_vcpu *vcpu)
+{
+	return &vcpu->arch.vgic_cpu.shadow_vgic_v3;
+}
+
+static inline bool lr_triggers_eoi(u64 lr)
+{
+	return !(lr & (ICH_LR_STATE | ICH_LR_HW)) && (lr & ICH_LR_EOI);
+}
+
+u16 vgic_v3_get_eisr(struct kvm_vcpu *vcpu)
+{
+	struct vgic_v3_cpu_if *cpu_if = vcpu_nested_if(vcpu);
+	u16 reg = 0;
+	int i;
+
+	for (i = 0; i < kvm_vgic_global_state.nr_lr; i++) {
+		if (lr_triggers_eoi(cpu_if->vgic_lr[i]))
+			reg |= BIT(i);
+	}
+
+	return reg;
+}
+
+u16 vgic_v3_get_elrsr(struct kvm_vcpu *vcpu)
+{
+	struct vgic_v3_cpu_if *cpu_if = vcpu_nested_if(vcpu);
+	u16 reg = 0;
+	int i;
+
+	for (i = 0; i < kvm_vgic_global_state.nr_lr; i++) {
+		if (!(cpu_if->vgic_lr[i] & ICH_LR_STATE))
+			reg |= BIT(i);
+	}
+
+	return reg;
+}
+
+u64 vgic_v3_get_misr(struct kvm_vcpu *vcpu)
+{
+	struct vgic_v3_cpu_if *cpu_if = vcpu_nested_if(vcpu);
+	int nr_lr = kvm_vgic_global_state.nr_lr;
+	u64 reg = 0;
+
+	if (vgic_v3_get_eisr(vcpu))
+		reg |= ICH_MISR_EOI;
+
+	if (cpu_if->vgic_hcr & ICH_HCR_UIE) {
+		int used_lrs;
+
+		used_lrs = nr_lr - hweight16(vgic_v3_get_elrsr(vcpu));
+		if (used_lrs <= 1)
+			reg |= ICH_MISR_U;
+	}
+
+	/* TODO: Support remaining bits in this register */
+	return reg;
+}
+
+/*
+ * For LRs that have the HW bit set, such as timer interrupts, we modify them
+ * to carry the host hardware interrupt number instead of the virtual one
+ * programmed by the guest hypervisor.
+ */
+static void vgic_v3_create_shadow_lr(struct kvm_vcpu *vcpu)
+{
+	struct vgic_v3_cpu_if *cpu_if = vcpu_nested_if(vcpu);
+	struct vgic_v3_cpu_if *s_cpu_if = vcpu_shadow_if(vcpu);
+	struct vgic_irq *irq;
+	int i, used_lrs = 0;
+
+	for (i = 0; i < kvm_vgic_global_state.nr_lr; i++) {
+		u64 lr = cpu_if->vgic_lr[i];
+		int l1_irq;
+
+		if (!(lr & ICH_LR_HW))
+			goto next;
+
+		/* We have the HW bit set */
+		l1_irq = (lr & ICH_LR_PHYS_ID_MASK) >> ICH_LR_PHYS_ID_SHIFT;
+		irq = vgic_get_irq(vcpu->kvm, vcpu, l1_irq);
+
+		if (!irq || !irq->hw) {
+			/* There was no real mapping, so nuke the HW bit */
+			lr &= ~ICH_LR_HW;
+			if (irq)
+				vgic_put_irq(vcpu->kvm, irq);
+			goto next;
+		}
+
+		/* Translate the virtual mapping to the real one */
+		lr &= ~ICH_LR_EOI; /* Why? */
+		lr &= ~ICH_LR_PHYS_ID_MASK;
+		lr |= (u64)irq->hwintid << ICH_LR_PHYS_ID_SHIFT;
+		vgic_put_irq(vcpu->kvm, irq);
+
+next:
+		s_cpu_if->vgic_lr[i] = lr;
+		used_lrs = i + 1;
+	}
+
+	s_cpu_if->used_lrs = used_lrs;
+}
+
+/*
+ * Change the shadow HWIRQ field back to the virtual value before copying over
+ * the entire shadow struct to the nested state.
+ */
+static void vgic_v3_fixup_shadow_lr_state(struct kvm_vcpu *vcpu)
+{
+	struct vgic_v3_cpu_if *cpu_if = vcpu_nested_if(vcpu);
+	struct vgic_v3_cpu_if *s_cpu_if = vcpu_shadow_if(vcpu);
+	int lr;
+
+	for (lr = 0; lr < kvm_vgic_global_state.nr_lr; lr++) {
+		s_cpu_if->vgic_lr[lr] &= ~ICH_LR_PHYS_ID_MASK;
+		s_cpu_if->vgic_lr[lr] |= cpu_if->vgic_lr[lr] & ICH_LR_PHYS_ID_MASK;
+	}
+}
+
+void vgic_v3_load_nested(struct kvm_vcpu *vcpu)
+{
+	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
+
+	vgic_cpu->shadow_vgic_v3 = vgic_cpu->nested_vgic_v3;
+	vgic_v3_create_shadow_lr(vcpu);
+	__vgic_v3_restore_state(vcpu_shadow_if(vcpu));
+}
+
+void vgic_v3_put_nested(struct kvm_vcpu *vcpu)
+{
+	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
+
+	__vgic_v3_save_state(vcpu_shadow_if(vcpu));
+
+	/*
+	 * Translate the shadow state HW fields back to the virtual ones
+	 * before copying the shadow struct back to the nested one.
+	 */
+	vgic_v3_fixup_shadow_lr_state(vcpu);
+	vgic_cpu->nested_vgic_v3 = vgic_cpu->shadow_vgic_v3;
+}
+
+void vgic_v3_handle_nested_maint_irq(struct kvm_vcpu *vcpu)
+{
+	struct vgic_v3_cpu_if *cpu_if = vcpu_nested_if(vcpu);
+
+	/*
+	 * If we exit a nested VM with a pending maintenance interrupt from the
+	 * GIC, then we need to forward this to the guest hypervisor so that it
+	 * can re-sync the appropriate LRs and sample level triggered interrupts
+	 * again.
+	 */
+	if (vgic_state_is_nested(vcpu) &&
+	    (cpu_if->vgic_hcr & ICH_HCR_EN) &&
+	    vgic_v3_get_misr(vcpu))
+		kvm_inject_nested_irq(vcpu);
+}
diff --git a/arch/arm64/kvm/vgic/vgic-v3.c b/arch/arm64/kvm/vgic/vgic-v3.c
index a33d4366b326..6bfca28ecced 100644
--- a/arch/arm64/kvm/vgic/vgic-v3.c
+++ b/arch/arm64/kvm/vgic/vgic-v3.c
@@ -8,6 +8,7 @@
 #include <kvm/arm_vgic.h>
 #include <asm/kvm_hyp.h>
 #include <asm/kvm_mmu.h>
+#include <asm/kvm_nested.h>
 #include <asm/kvm_asm.h>
 
 #include "vgic.h"
@@ -277,6 +278,12 @@ void vgic_v3_enable(struct kvm_vcpu *vcpu)
 		vgic_v3->vgic_sre = (ICC_SRE_EL1_DIB |
 				     ICC_SRE_EL1_DFB |
 				     ICC_SRE_EL1_SRE);
+		/*
+		 * If nesting is allowed, force GICv3 onto the nested
+		 * guests as well.
+		 */
+		if (vcpu_has_nv(vcpu))
+			vcpu->arch.vgic_cpu.nested_vgic_v3.vgic_sre = vgic_v3->vgic_sre;
 		vcpu->arch.vgic_cpu.pendbaser = INITIAL_PENDBASER_VALUE;
 	} else {
 		vgic_v3->vgic_sre = 0;
@@ -707,6 +714,13 @@ void vgic_v3_load(struct kvm_vcpu *vcpu)
 {
 	struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
 
+	/*
+	 * vgic_v3_load_nested only affects the LRs in the shadow
+	 * state, so it is fine to pass the nested state around.
+	 */
+	if (vgic_state_is_nested(vcpu))
+		cpu_if = &vcpu->arch.vgic_cpu.nested_vgic_v3;
+
 	/*
 	 * If dealing with a GICv2 emulation on GICv3, VMCR_EL2.VFIQen
 	 * is dependent on ICC_SRE_EL1.SRE, and we have to perform the
@@ -720,6 +734,9 @@ void vgic_v3_load(struct kvm_vcpu *vcpu)
 	if (has_vhe())
 		__vgic_v3_activate_traps(cpu_if);
 
+	if (vgic_state_is_nested(vcpu))
+		vgic_v3_load_nested(vcpu);
+
 	WARN_ON(vgic_v4_load(vcpu));
 }
 
@@ -727,6 +744,9 @@ void vgic_v3_vmcr_sync(struct kvm_vcpu *vcpu)
 {
 	struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
 
+	if (vgic_state_is_nested(vcpu))
+		cpu_if = &vcpu->arch.vgic_cpu.shadow_vgic_v3;
+
 	if (likely(cpu_if->vgic_sre))
 		cpu_if->vgic_vmcr = kvm_call_hyp_ret(__vgic_v3_read_vmcr);
 }
@@ -739,8 +759,14 @@ void vgic_v3_put(struct kvm_vcpu *vcpu)
 
 	vgic_v3_vmcr_sync(vcpu);
 
+	if (vgic_state_is_nested(vcpu))
+		cpu_if = &vcpu->arch.vgic_cpu.shadow_vgic_v3;
+
 	kvm_call_hyp(__vgic_v3_save_aprs, cpu_if);
 
 	if (has_vhe())
 		__vgic_v3_deactivate_traps(cpu_if);
+
+	if (vgic_state_is_nested(vcpu))
+		vgic_v3_put_nested(vcpu);
 }
diff --git a/arch/arm64/kvm/vgic/vgic.c b/arch/arm64/kvm/vgic/vgic.c
index e7fe0447af08..f467ea454c66 100644
--- a/arch/arm64/kvm/vgic/vgic.c
+++ b/arch/arm64/kvm/vgic/vgic.c
@@ -876,6 +876,10 @@ void kvm_vgic_sync_hwstate(struct kvm_vcpu *vcpu)
 {
 	int used_lrs;
 
+	/* If nesting, this is a load/put affair, not flush/sync. */
+	if (vgic_state_is_nested(vcpu))
+		return;
+
 	/* An empty ap_list_head implies used_lrs == 0 */
 	if (list_empty(&vcpu->arch.vgic_cpu.ap_list_head))
 		return;
@@ -920,6 +924,29 @@ void kvm_vgic_flush_hwstate(struct kvm_vcpu *vcpu)
 	    !vgic_supports_direct_msis(vcpu->kvm))
 		return;
 
+	/*
+	 * If in a nested state, we must return early. Two possibilities:
+	 *
+	 * - If we have any pending IRQ for the guest and the guest
+	 *   expects IRQs to be handled in its virtual EL2 mode (the
+	 *   virtual IMO bit is set) and it is not already running in
+	 *   virtual EL2 mode, then we have to emulate an IRQ
+	 *   exception to virtual EL2.
+	 *
+	 *   We do that by placing a request to ourselves which will
+	 *   abort the entry procedure and inject the exception at the
+	 *   beginning of the run loop.
+	 *
+	 * - Otherwise, do exactly *NOTHING*. The guest state is
+	 *   already loaded, and we can carry on with running it.
+	 */
+	if (vgic_state_is_nested(vcpu)) {
+		if (kvm_vgic_vcpu_pending_irq(vcpu))
+			kvm_make_request(KVM_REQ_GUEST_HYP_IRQ_PENDING, vcpu);
+
+		return;
+	}
+
 	DEBUG_SPINLOCK_BUG_ON(!irqs_disabled());
 
 	if (!list_empty(&vcpu->arch.vgic_cpu.ap_list_head)) {
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 17b651890d22..2e27259b451f 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -327,6 +327,15 @@ struct vgic_cpu {
 
 	struct vgic_irq private_irqs[VGIC_NR_PRIVATE_IRQS];
 
+	/* CPU vif control registers for the virtual GICH interface */
+	struct vgic_v3_cpu_if	nested_vgic_v3;
+
+	/*
+	 * The shadow vif control register loaded to the hardware when
+	 * running a nested L2 guest with the virtual IMO/FMO bit set.
+	 */
+	struct vgic_v3_cpu_if	shadow_vgic_v3;
+
 	raw_spinlock_t ap_list_lock;	/* Protects the ap_list */
 
 	/*
@@ -384,6 +393,13 @@ void kvm_vgic_load(struct kvm_vcpu *vcpu);
 void kvm_vgic_put(struct kvm_vcpu *vcpu);
 void kvm_vgic_vmcr_sync(struct kvm_vcpu *vcpu);
 
+void vgic_v3_load_nested(struct kvm_vcpu *vcpu);
+void vgic_v3_put_nested(struct kvm_vcpu *vcpu);
+void vgic_v3_handle_nested_maint_irq(struct kvm_vcpu *vcpu);
+u16 vgic_v3_get_eisr(struct kvm_vcpu *vcpu);
+u16 vgic_v3_get_elrsr(struct kvm_vcpu *vcpu);
+u64 vgic_v3_get_misr(struct kvm_vcpu *vcpu);
+
 #define irqchip_in_kernel(k)	(!!((k)->arch.vgic.in_kernel))
 #define vgic_initialized(k)	((k)->arch.vgic.initialized)
 #define vgic_ready(k)		((k)->arch.vgic.ready)
@@ -428,4 +444,6 @@ int vgic_v4_load(struct kvm_vcpu *vcpu);
 void vgic_v4_commit(struct kvm_vcpu *vcpu);
 int vgic_v4_put(struct kvm_vcpu *vcpu, bool need_db);
 
+bool vgic_state_is_nested(struct kvm_vcpu *vcpu);
+
 #endif /* __KVM_ARM_VGIC_H */
-- 
2.30.2


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 378+ messages in thread

* [PATCH v6 47/64] KVM: arm64: nv: Don't load the GICv4 context on entering a nested guest
  2022-01-28 12:18 ` Marc Zyngier
  (?)
@ 2022-01-28 12:18   ` Marc Zyngier
  -1 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:18 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Jintack Lim, Haibo Xu,
	Ganapatrao Kulkarni, Chase Conklin, Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

When entering a nested guest (vgic_state_is_nested() == true),
special care must be taken *not* to make the vPE resident, as
these are interrupts targeting the L1 guest, and not any
nested guest.

By not making the vPE resident, we guarantee that the delivery
of a vLPI will result in a doorbell, forcing an exit from the
nested guest and a switch to the L1 guest to handle the interrupt.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/vgic/vgic-v3.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kvm/vgic/vgic-v3.c b/arch/arm64/kvm/vgic/vgic-v3.c
index 6bfca28ecced..39e5c68087ea 100644
--- a/arch/arm64/kvm/vgic/vgic-v3.c
+++ b/arch/arm64/kvm/vgic/vgic-v3.c
@@ -736,8 +736,8 @@ void vgic_v3_load(struct kvm_vcpu *vcpu)
 
 	if (vgic_state_is_nested(vcpu))
 		vgic_v3_load_nested(vcpu);
-
-	WARN_ON(vgic_v4_load(vcpu));
+	else
+		WARN_ON(vgic_v4_load(vcpu));
 }
 
 void vgic_v3_vmcr_sync(struct kvm_vcpu *vcpu)
@@ -755,6 +755,12 @@ void vgic_v3_put(struct kvm_vcpu *vcpu)
 {
 	struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
 
+	/*
+	 * vgic_v4_put will do nothing if we were not resident. This
+	 * covers both the cases where we've blocked (we already have
+	 * done a vgic_v4_put) and when running a nested guest (the
+	 * vPE was never resident in order to generate a doorbell).
+	 */
 	WARN_ON(vgic_v4_put(vcpu, false));
 
 	vgic_v3_vmcr_sync(vcpu);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 378+ messages in thread

* [PATCH v6 48/64] KVM: arm64: nv: vgic: Emulate the HW bit in software
  2022-01-28 12:18 ` Marc Zyngier
  (?)
@ 2022-01-28 12:18   ` Marc Zyngier
  -1 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:18 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Jintack Lim, Haibo Xu,
	Ganapatrao Kulkarni, Chase Conklin, Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

From: Christoffer Dall <christoffer.dall@arm.com>

Should the guest hypervisor use the HW bit in the LRs, we need to
emulate the deactivation performed by the L2 guest and propagate it
into the L1 distributor emulation, which is handled by L0.

It's all good fun.
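
For illustration, this is the sort of LR value a guest hypervisor could
program for a hardware-backed interrupt (the INTIDs below are made-up
example values, not taken from this patch). On L2 exit, the sync code
added here has to notice that such an LR has been deactivated and
reflect that in the L1 distributor state:

  /*
   * L1's view: virtual INTID 27, claimed to be backed by "physical"
   * INTID 27, group 1, pending. That "physical" ID is itself virtual
   * from L0's point of view, hence the need to emulate the
   * deactivation in software.
   */
  u64 lr = ICH_LR_HW | ICH_LR_GROUP | ICH_LR_PENDING_BIT |
           ((u64)27 << ICH_LR_PHYS_ID_SHIFT) |
           (27 & ICH_LR_VIRTUAL_ID_MASK);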

Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_hyp.h     |  2 ++
 arch/arm64/kvm/hyp/vgic-v3-sr.c      |  2 +-
 arch/arm64/kvm/vgic/vgic-v3-nested.c | 32 ++++++++++++++++++++++++++++
 arch/arm64/kvm/vgic/vgic.c           |  6 ++++--
 include/kvm/arm_vgic.h               |  1 +
 5 files changed, 40 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
index 462882f356c7..9dc157a59642 100644
--- a/arch/arm64/include/asm/kvm_hyp.h
+++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -57,6 +57,8 @@ DECLARE_PER_CPU(struct kvm_nvhe_init_params, kvm_init_params);
 
 int __vgic_v2_perform_cpuif_access(struct kvm_vcpu *vcpu);
 
+u64 __gic_v3_get_lr(unsigned int lr);
+
 void __vgic_v3_save_state(struct vgic_v3_cpu_if *cpu_if);
 void __vgic_v3_restore_state(struct vgic_v3_cpu_if *cpu_if);
 void __vgic_v3_activate_traps(struct vgic_v3_cpu_if *cpu_if);
diff --git a/arch/arm64/kvm/hyp/vgic-v3-sr.c b/arch/arm64/kvm/hyp/vgic-v3-sr.c
index 20db2f281cf2..0601d73de11b 100644
--- a/arch/arm64/kvm/hyp/vgic-v3-sr.c
+++ b/arch/arm64/kvm/hyp/vgic-v3-sr.c
@@ -18,7 +18,7 @@
 #define vtr_to_nr_pre_bits(v)		((((u32)(v) >> 26) & 7) + 1)
 #define vtr_to_nr_apr_regs(v)		(1 << (vtr_to_nr_pre_bits(v) - 5))
 
-static u64 __gic_v3_get_lr(unsigned int lr)
+u64 __gic_v3_get_lr(unsigned int lr)
 {
 	switch (lr & 0xf) {
 	case 0:
diff --git a/arch/arm64/kvm/vgic/vgic-v3-nested.c b/arch/arm64/kvm/vgic/vgic-v3-nested.c
index ab8ddf490b31..e88c75e79010 100644
--- a/arch/arm64/kvm/vgic/vgic-v3-nested.c
+++ b/arch/arm64/kvm/vgic/vgic-v3-nested.c
@@ -140,6 +140,38 @@ static void vgic_v3_fixup_shadow_lr_state(struct kvm_vcpu *vcpu)
 	}
 }
 
+void vgic_v3_sync_nested(struct kvm_vcpu *vcpu)
+{
+	struct vgic_v3_cpu_if *cpu_if = vcpu_nested_if(vcpu);
+	struct vgic_v3_cpu_if *s_cpu_if = vcpu_shadow_if(vcpu);
+	struct vgic_irq *irq;
+	int i;
+
+	for (i = 0; i < s_cpu_if->used_lrs; i++) {
+		u64 lr = cpu_if->vgic_lr[i];
+		int l1_irq;
+
+		if (!(lr & ICH_LR_HW) || !(lr & ICH_LR_STATE))
+			continue;
+
+		/*
+		 * If we had a HW lr programmed by the guest hypervisor, we
+		 * need to emulate the HW effect between the guest hypervisor
+		 * and the nested guest.
+		 */
+		l1_irq = (lr & ICH_LR_PHYS_ID_MASK) >> ICH_LR_PHYS_ID_SHIFT;
+		irq = vgic_get_irq(vcpu->kvm, vcpu, l1_irq);
+		if (!irq)
+			continue; /* oh well, the guest hyp is broken */
+
+		lr = __gic_v3_get_lr(i);
+		if (!(lr & ICH_LR_STATE))
+			irq->active = false;
+
+		vgic_put_irq(vcpu->kvm, irq);
+	}
+}
+
 void vgic_v3_load_nested(struct kvm_vcpu *vcpu)
 {
 	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
diff --git a/arch/arm64/kvm/vgic/vgic.c b/arch/arm64/kvm/vgic/vgic.c
index f467ea454c66..92df86726c86 100644
--- a/arch/arm64/kvm/vgic/vgic.c
+++ b/arch/arm64/kvm/vgic/vgic.c
@@ -876,9 +876,11 @@ void kvm_vgic_sync_hwstate(struct kvm_vcpu *vcpu)
 {
 	int used_lrs;
 
-	/* If nesting, this is a load/put affair, not flush/sync. */
-	if (vgic_state_is_nested(vcpu))
+	/* If nesting, emulate the HW effect from L0 to L1 */
+	if (vgic_state_is_nested(vcpu)) {
+		vgic_v3_sync_nested(vcpu);
 		return;
+	}
 
 	/* An empty ap_list_head implies used_lrs == 0 */
 	if (list_empty(&vcpu->arch.vgic_cpu.ap_list_head))
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 2e27259b451f..beeb99dc3cc1 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -393,6 +393,7 @@ void kvm_vgic_load(struct kvm_vcpu *vcpu);
 void kvm_vgic_put(struct kvm_vcpu *vcpu);
 void kvm_vgic_vmcr_sync(struct kvm_vcpu *vcpu);
 
+void vgic_v3_sync_nested(struct kvm_vcpu *vcpu);
 void vgic_v3_load_nested(struct kvm_vcpu *vcpu);
 void vgic_v3_put_nested(struct kvm_vcpu *vcpu);
 void vgic_v3_handle_nested_maint_irq(struct kvm_vcpu *vcpu);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 378+ messages in thread

* [PATCH v6 49/64] KVM: arm64: nv: vgic: Allow userland to set VGIC maintenance IRQ
  2022-01-28 12:18 ` Marc Zyngier
  (?)
@ 2022-01-28 12:18   ` Marc Zyngier
  -1 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:18 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Jintack Lim, Haibo Xu,
	Ganapatrao Kulkarni, Chase Conklin, Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

From: Andre Przywara <andre.przywara@arm.com>

The VGIC maintenance IRQ signals various conditions about the LRs when
the GIC's virtualization extension is in use.
So far we didn't need it, but nested virtualization needs to know about
this interrupt, so add a userland interface to set up the IRQ number.
The architecture mandates that it must be a PPI; on top of that, this
code only exposes a per-device option, so the PPI is the same on all VCPUs.
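
As a minimal sketch of the userspace side (hypothetical VMM code, not
part of this patch; it follows the kernel implementation below, which
reads the vINTID through the addr pointer), assuming gic_fd is the fd
returned by KVM_CREATE_DEVICE for the GICv3 device:

  #include <linux/kvm.h>
  #include <sys/ioctl.h>

  /* PPI 25 is only an example value; any PPI (16..31) is accepted. */
  static int set_vgic_maint_irq(int gic_fd)
  {
          __u32 intid = 25;
          struct kvm_device_attr attr = {
                  .group = KVM_DEV_ARM_VGIC_GRP_MAINT_IRQ,
                  .addr  = (__u64)(unsigned long)&intid,
          };

          return ioctl(gic_fd, KVM_SET_DEVICE_ATTR, &attr);
  }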

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
[added some bits of documentation]
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 .../virt/kvm/devices/arm-vgic-v3.rst          | 12 +++++++++-
 arch/arm64/include/uapi/asm/kvm.h             |  1 +
 arch/arm64/kvm/vgic/vgic-kvm-device.c         | 22 +++++++++++++++++++
 include/kvm/arm_vgic.h                        |  3 +++
 tools/arch/arm/include/uapi/asm/kvm.h         |  1 +
 5 files changed, 38 insertions(+), 1 deletion(-)

diff --git a/Documentation/virt/kvm/devices/arm-vgic-v3.rst b/Documentation/virt/kvm/devices/arm-vgic-v3.rst
index 51e5e5762571..1901e651cc00 100644
--- a/Documentation/virt/kvm/devices/arm-vgic-v3.rst
+++ b/Documentation/virt/kvm/devices/arm-vgic-v3.rst
@@ -284,8 +284,18 @@ Groups:
       |    Aff3    |    Aff2    |    Aff1    |    Aff0    |
 
   Errors:
-
     =======  =============================================
     -EINVAL  vINTID is not multiple of 32 or info field is
 	     not VGIC_LEVEL_INFO_LINE_LEVEL
     =======  =============================================
+
+  KVM_DEV_ARM_VGIC_GRP_MAINT_IRQ
+   Attributes:
+
+    The attr field of kvm_device_attr encodes the following values:
+
+      bits:     | 31   ....    5 | 4  ....  0 |
+      values:   |      RES0      |   vINTID   |
+
+    The vINTID specifies which interrupt is generated when the vGIC
+    must generate a maintenance interrupt. This must be a PPI.
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index 395a4c039bcc..b2811b90fa13 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -346,6 +346,7 @@ struct kvm_arm_copy_mte_tags {
 #define KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS 6
 #define KVM_DEV_ARM_VGIC_GRP_LEVEL_INFO  7
 #define KVM_DEV_ARM_VGIC_GRP_ITS_REGS 8
+#define KVM_DEV_ARM_VGIC_GRP_MAINT_IRQ  9
 #define KVM_DEV_ARM_VGIC_LINE_LEVEL_INFO_SHIFT	10
 #define KVM_DEV_ARM_VGIC_LINE_LEVEL_INFO_MASK \
 			(0x3fffffULL << KVM_DEV_ARM_VGIC_LINE_LEVEL_INFO_SHIFT)
diff --git a/arch/arm64/kvm/vgic/vgic-kvm-device.c b/arch/arm64/kvm/vgic/vgic-kvm-device.c
index c6d52a1fd9c8..c653259f5a4f 100644
--- a/arch/arm64/kvm/vgic/vgic-kvm-device.c
+++ b/arch/arm64/kvm/vgic/vgic-kvm-device.c
@@ -251,6 +251,12 @@ static int vgic_get_common_attr(struct kvm_device *dev,
 			     VGIC_NR_PRIVATE_IRQS, uaddr);
 		break;
 	}
+	case KVM_DEV_ARM_VGIC_GRP_MAINT_IRQ: {
+		u32 __user *uaddr = (u32 __user *)(long)attr->addr;
+
+		r = put_user(dev->kvm->arch.vgic.maint_irq, uaddr);
+		break;
+	}
 	}
 
 	return r;
@@ -637,6 +643,21 @@ static int vgic_v3_set_attr(struct kvm_device *dev,
 		reg = tmp32;
 		return vgic_v3_attr_regs_access(dev, attr, &reg, true);
 	}
+	case KVM_DEV_ARM_VGIC_GRP_MAINT_IRQ: {
+		u32 __user *uaddr = (u32 __user *)(long)attr->addr;
+		u32 val;
+
+		if (get_user(val, uaddr))
+			return -EFAULT;
+
+		/* Must be a PPI. */
+		if ((val >= VGIC_NR_PRIVATE_IRQS) || (val < VGIC_NR_SGIS))
+			return -EINVAL;
+
+		dev->kvm->arch.vgic.maint_irq = val;
+
+		return 0;
+	}
 	case KVM_DEV_ARM_VGIC_GRP_CTRL: {
 		int ret;
 
@@ -722,6 +743,7 @@ static int vgic_v3_has_attr(struct kvm_device *dev,
 	case KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS:
 		return vgic_v3_has_attr_regs(dev, attr);
 	case KVM_DEV_ARM_VGIC_GRP_NR_IRQS:
+	case KVM_DEV_ARM_VGIC_GRP_MAINT_IRQ:
 		return 0;
 	case KVM_DEV_ARM_VGIC_GRP_LEVEL_INFO: {
 		if (((attr->attr & KVM_DEV_ARM_VGIC_LINE_LEVEL_INFO_MASK) >>
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index beeb99dc3cc1..b44db6b013f5 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -240,6 +240,9 @@ struct vgic_dist {
 
 	int			nr_spis;
 
+	/* The GIC maintenance IRQ for nested hypervisors. */
+	u32			maint_irq;
+
 	/* base addresses in guest physical address space: */
 	gpa_t			vgic_dist_base;		/* distributor */
 	union {
diff --git a/tools/arch/arm/include/uapi/asm/kvm.h b/tools/arch/arm/include/uapi/asm/kvm.h
index 03cd7c19a683..d5dd96902817 100644
--- a/tools/arch/arm/include/uapi/asm/kvm.h
+++ b/tools/arch/arm/include/uapi/asm/kvm.h
@@ -246,6 +246,7 @@ struct kvm_vcpu_events {
 #define KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS 6
 #define KVM_DEV_ARM_VGIC_GRP_LEVEL_INFO  7
 #define KVM_DEV_ARM_VGIC_GRP_ITS_REGS	8
+#define KVM_DEV_ARM_VGIC_GRP_MAINT_IRQ	9
 #define KVM_DEV_ARM_VGIC_LINE_LEVEL_INFO_SHIFT	10
 #define KVM_DEV_ARM_VGIC_LINE_LEVEL_INFO_MASK \
 			(0x3fffffULL << KVM_DEV_ARM_VGIC_LINE_LEVEL_INFO_SHIFT)
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 378+ messages in thread

* [PATCH v6 50/64] KVM: arm64: nv: Implement maintenance interrupt forwarding
  2022-01-28 12:18 ` Marc Zyngier
  (?)
@ 2022-01-28 12:18   ` Marc Zyngier
  -1 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:18 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Jintack Lim, Haibo Xu,
	Ganapatrao Kulkarni, Chase Conklin, Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

When we take a maintenance interrupt, we need to decide whether
it was generated by an action of the guest, or whether it is something
that needs to be forwarded to the guest hypervisor.
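
For context, forwarding only comes into play once the guest hypervisor
has enabled its (virtual) GIC cpu interface and asked for maintenance
interrupts, which is what the ICH_HCR_EN/MISR check below captures. A
rough sketch of the L1 side, running at virtual EL2 (hypothetical guest
code, not part of this patch):

  u64 hcr = read_sysreg_s(SYS_ICH_HCR_EL2);

  /* Enable the vCPU interface and ask for underflow maintenance IRQs */
  hcr |= ICH_HCR_EN | ICH_HCR_UIE;
  write_sysreg_s(hcr, SYS_ICH_HCR_EL2);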

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/vgic/vgic-init.c      | 30 ++++++++++++++++++++++++++++
 arch/arm64/kvm/vgic/vgic-v3-nested.c | 25 +++++++++++++++++++----
 2 files changed, 51 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/kvm/vgic/vgic-init.c b/arch/arm64/kvm/vgic/vgic-init.c
index fc00304fe7d8..6485b7935155 100644
--- a/arch/arm64/kvm/vgic/vgic-init.c
+++ b/arch/arm64/kvm/vgic/vgic-init.c
@@ -6,10 +6,12 @@
 #include <linux/uaccess.h>
 #include <linux/interrupt.h>
 #include <linux/cpu.h>
+#include <linux/irq.h>
 #include <linux/kvm_host.h>
 #include <kvm/arm_vgic.h>
 #include <asm/kvm_emulate.h>
 #include <asm/kvm_mmu.h>
+#include <asm/kvm_nested.h>
 #include "vgic.h"
 
 /*
@@ -222,6 +224,16 @@ int kvm_vgic_vcpu_init(struct kvm_vcpu *vcpu)
 	if (!irqchip_in_kernel(vcpu->kvm))
 		return 0;
 
+	if (vcpu_has_nv(vcpu)) {
+		/* FIXME: remove this hack */
+		if (vcpu->kvm->arch.vgic.maint_irq == 0)
+			vcpu->kvm->arch.vgic.maint_irq = kvm_vgic_global_state.maint_irq;
+		ret = kvm_vgic_set_owner(vcpu, vcpu->kvm->arch.vgic.maint_irq,
+					 vcpu);
+		if (ret)
+			return ret;
+	}
+
 	/*
 	 * If we are creating a VCPU with a GICv3 we must also register the
 	 * KVM io device for the redistributor that belongs to this VCPU.
@@ -475,12 +487,23 @@ static int vgic_init_cpu_dying(unsigned int cpu)
 
 static irqreturn_t vgic_maintenance_handler(int irq, void *data)
 {
+	struct kvm_vcpu *vcpu = *(struct kvm_vcpu **)data;
+
 	/*
 	 * We cannot rely on the vgic maintenance interrupt to be
 	 * delivered synchronously. This means we can only use it to
 	 * exit the VM, and we perform the handling of EOIed
 	 * interrupts on the exit path (see vgic_fold_lr_state).
 	 */
+
+	/* If not nested, deactivate */
+	if (!vcpu || !vgic_state_is_nested(vcpu)) {
+		irq_set_irqchip_state(irq, IRQCHIP_STATE_ACTIVE, false);
+		return IRQ_HANDLED;
+	}
+
+	/* Assume nested from now */
+	vgic_v3_handle_nested_maint_irq(vcpu);
 	return IRQ_HANDLED;
 }
 
@@ -579,6 +602,13 @@ int kvm_vgic_hyp_init(void)
 		return ret;
 	}
 
+	ret = irq_set_vcpu_affinity(kvm_vgic_global_state.maint_irq,
+				    kvm_get_running_vcpus());
+	if (ret) {
+		kvm_err("Error setting vcpu affinity\n");
+		goto out_free_irq;
+	}
+
 	ret = cpuhp_setup_state(CPUHP_AP_KVM_ARM_VGIC_INIT_STARTING,
 				"kvm/arm/vgic:starting",
 				vgic_init_cpu_starting, vgic_init_cpu_dying);
diff --git a/arch/arm64/kvm/vgic/vgic-v3-nested.c b/arch/arm64/kvm/vgic/vgic-v3-nested.c
index e88c75e79010..4a35c2be7984 100644
--- a/arch/arm64/kvm/vgic/vgic-v3-nested.c
+++ b/arch/arm64/kvm/vgic/vgic-v3-nested.c
@@ -175,10 +175,20 @@ void vgic_v3_sync_nested(struct kvm_vcpu *vcpu)
 void vgic_v3_load_nested(struct kvm_vcpu *vcpu)
 {
 	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
+	struct vgic_irq *irq;
+	unsigned long flags;
 
 	vgic_cpu->shadow_vgic_v3 = vgic_cpu->nested_vgic_v3;
 	vgic_v3_create_shadow_lr(vcpu);
 	__vgic_v3_restore_state(vcpu_shadow_if(vcpu));
+
+	irq = vgic_get_irq(vcpu->kvm, vcpu, vcpu->kvm->arch.vgic.maint_irq);
+	raw_spin_lock_irqsave(&irq->irq_lock, flags);
+	if (irq->line_level || irq->active)
+		irq_set_irqchip_state(kvm_vgic_global_state.maint_irq,
+				      IRQCHIP_STATE_ACTIVE, true);
+	raw_spin_unlock_irqrestore(&irq->irq_lock, flags);
+	vgic_put_irq(vcpu->kvm, irq);
 }
 
 void vgic_v3_put_nested(struct kvm_vcpu *vcpu)
@@ -193,11 +203,14 @@ void vgic_v3_put_nested(struct kvm_vcpu *vcpu)
 	 */
 	vgic_v3_fixup_shadow_lr_state(vcpu);
 	vgic_cpu->nested_vgic_v3 = vgic_cpu->shadow_vgic_v3;
+	irq_set_irqchip_state(kvm_vgic_global_state.maint_irq,
+			      IRQCHIP_STATE_ACTIVE, false);
 }
 
 void vgic_v3_handle_nested_maint_irq(struct kvm_vcpu *vcpu)
 {
 	struct vgic_v3_cpu_if *cpu_if = vcpu_nested_if(vcpu);
+	bool state;
 
 	/*
 	 * If we exit a nested VM with a pending maintenance interrupt from the
@@ -205,8 +218,12 @@ void vgic_v3_handle_nested_maint_irq(struct kvm_vcpu *vcpu)
 	 * can re-sync the appropriate LRs and sample level triggered interrupts
 	 * again.
 	 */
-	if (vgic_state_is_nested(vcpu) &&
-	    (cpu_if->vgic_hcr & ICH_HCR_EN) &&
-	    vgic_v3_get_misr(vcpu))
-		kvm_inject_nested_irq(vcpu);
+	if (!vgic_state_is_nested(vcpu))
+		return;
+
+	state  = cpu_if->vgic_hcr & ICH_HCR_EN;
+	state &= vgic_v3_get_misr(vcpu);
+
+	kvm_vgic_inject_irq(vcpu->kvm, vcpu->vcpu_id,
+			    vcpu->kvm->arch.vgic.maint_irq, state, vcpu);
 }
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 378+ messages in thread

* [PATCH v6 51/64] KVM: arm64: nv: Add nested GICv3 tracepoints
  2022-01-28 12:18 ` Marc Zyngier
  (?)
@ 2022-01-28 12:18   ` Marc Zyngier
  -1 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:18 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Jintack Lim, Haibo Xu,
	Ganapatrao Kulkarni, Chase Conklin, Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

From: Christoffer Dall <christoffer.dall@arm.com>

Add tracepoints to be able to peek into the shadow LRs used when
running a nested guest.

Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/vgic/vgic-nested-trace.h | 137 ++++++++++++++++++++++++
 arch/arm64/kvm/vgic/vgic-v3-nested.c    |  13 ++-
 2 files changed, 149 insertions(+), 1 deletion(-)
 create mode 100644 arch/arm64/kvm/vgic/vgic-nested-trace.h

diff --git a/arch/arm64/kvm/vgic/vgic-nested-trace.h b/arch/arm64/kvm/vgic/vgic-nested-trace.h
new file mode 100644
index 000000000000..f1a074c791a6
--- /dev/null
+++ b/arch/arm64/kvm/vgic/vgic-nested-trace.h
@@ -0,0 +1,137 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#if !defined(_TRACE_VGIC_NESTED_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_VGIC_NESTED_H
+
+#include <linux/tracepoint.h>
+
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM kvm
+
+#define SLR_ENTRY_VALS(x)							\
+	" ",									\
+	!!(__entry->lrs[x] & ICH_LR_HW),		   			\
+	!!(__entry->lrs[x] & ICH_LR_PENDING_BIT),	   			\
+	!!(__entry->lrs[x] & ICH_LR_ACTIVE_BIT),	   			\
+	__entry->lrs[x] & ICH_LR_VIRTUAL_ID_MASK,				\
+	(__entry->lrs[x] & ICH_LR_PHYS_ID_MASK) >> ICH_LR_PHYS_ID_SHIFT,	\
+	(__entry->orig_lrs[x] & ICH_LR_PHYS_ID_MASK) >> ICH_LR_PHYS_ID_SHIFT
+
+TRACE_EVENT(vgic_create_shadow_lrs,
+	TP_PROTO(struct kvm_vcpu *vcpu, int nr_lr, u64 *lrs, u64 *orig_lrs),
+	TP_ARGS(vcpu, nr_lr, lrs, orig_lrs),
+
+	TP_STRUCT__entry(
+		__field(	int,	nr_lr			)
+		__array(	u64,	lrs,		16	)
+		__array(	u64,	orig_lrs,	16	)
+	),
+
+	TP_fast_assign(
+		__entry->nr_lr		= nr_lr;
+		memcpy(__entry->lrs, lrs, 16 * sizeof(u64));
+		memcpy(__entry->orig_lrs, orig_lrs, 16 * sizeof(u64));
+	),
+
+	TP_printk("nr_lr: %d\n"
+		  "%50sLR[ 0]: HW: %d P: %d: A: %d vINTID: %5llu pINTID: %5llu (%5llu)\n"
+		  "%50sLR[ 1]: HW: %d P: %d: A: %d vINTID: %5llu pINTID: %5llu (%5llu)\n"
+		  "%50sLR[ 2]: HW: %d P: %d: A: %d vINTID: %5llu pINTID: %5llu (%5llu)\n"
+		  "%50sLR[ 3]: HW: %d P: %d: A: %d vINTID: %5llu pINTID: %5llu (%5llu)\n"
+		  "%50sLR[ 4]: HW: %d P: %d: A: %d vINTID: %5llu pINTID: %5llu (%5llu)\n"
+		  "%50sLR[ 5]: HW: %d P: %d: A: %d vINTID: %5llu pINTID: %5llu (%5llu)\n"
+		  "%50sLR[ 6]: HW: %d P: %d: A: %d vINTID: %5llu pINTID: %5llu (%5llu)\n"
+		  "%50sLR[ 7]: HW: %d P: %d: A: %d vINTID: %5llu pINTID: %5llu (%5llu)\n"
+		  "%50sLR[ 8]: HW: %d P: %d: A: %d vINTID: %5llu pINTID: %5llu (%5llu)\n"
+		  "%50sLR[ 9]: HW: %d P: %d: A: %d vINTID: %5llu pINTID: %5llu (%5llu)\n"
+		  "%50sLR[10]: HW: %d P: %d: A: %d vINTID: %5llu pINTID: %5llu (%5llu)\n"
+		  "%50sLR[11]: HW: %d P: %d: A: %d vINTID: %5llu pINTID: %5llu (%5llu)\n"
+		  "%50sLR[12]: HW: %d P: %d: A: %d vINTID: %5llu pINTID: %5llu (%5llu)\n"
+		  "%50sLR[13]: HW: %d P: %d: A: %d vINTID: %5llu pINTID: %5llu (%5llu)\n"
+		  "%50sLR[14]: HW: %d P: %d: A: %d vINTID: %5llu pINTID: %5llu (%5llu)\n"
+		  "%50sLR[15]: HW: %d P: %d: A: %d vINTID: %5llu pINTID: %5llu (%5llu)",
+		  __entry->nr_lr,
+		  SLR_ENTRY_VALS(0), SLR_ENTRY_VALS(1), SLR_ENTRY_VALS(2),
+		  SLR_ENTRY_VALS(3), SLR_ENTRY_VALS(4), SLR_ENTRY_VALS(5),
+		  SLR_ENTRY_VALS(6), SLR_ENTRY_VALS(7), SLR_ENTRY_VALS(8),
+		  SLR_ENTRY_VALS(9), SLR_ENTRY_VALS(10), SLR_ENTRY_VALS(11),
+		  SLR_ENTRY_VALS(12), SLR_ENTRY_VALS(13), SLR_ENTRY_VALS(14),
+		  SLR_ENTRY_VALS(15))
+);
+
+#define LR_ENTRY_VALS(x)							\
+	" ",									\
+	!!(__entry->lrs[x] & ICH_LR_HW),		   			\
+	!!(__entry->lrs[x] & ICH_LR_PENDING_BIT),	   			\
+	!!(__entry->lrs[x] & ICH_LR_ACTIVE_BIT),	   			\
+	__entry->lrs[x] & ICH_LR_VIRTUAL_ID_MASK,				\
+	(__entry->lrs[x] & ICH_LR_PHYS_ID_MASK) >> ICH_LR_PHYS_ID_SHIFT
+
+TRACE_EVENT(vgic_put_nested,
+	TP_PROTO(struct kvm_vcpu *vcpu, int nr_lr, u64 *lrs),
+	TP_ARGS(vcpu, nr_lr, lrs),
+
+	TP_STRUCT__entry(
+		__field(	int,	nr_lr			)
+		__array(	u64,	lrs,		16	)
+	),
+
+	TP_fast_assign(
+		__entry->nr_lr		= nr_lr;
+		memcpy(__entry->lrs, lrs, 16 * sizeof(u64));
+	),
+
+	TP_printk("nr_lr: %d\n"
+		  "%50sLR[ 0]: HW: %d P: %d: A: %d vINTID: %5llu pINTID: %5llu\n"
+		  "%50sLR[ 1]: HW: %d P: %d: A: %d vINTID: %5llu pINTID: %5llu\n"
+		  "%50sLR[ 2]: HW: %d P: %d: A: %d vINTID: %5llu pINTID: %5llu\n"
+		  "%50sLR[ 3]: HW: %d P: %d: A: %d vINTID: %5llu pINTID: %5llu\n"
+		  "%50sLR[ 4]: HW: %d P: %d: A: %d vINTID: %5llu pINTID: %5llu\n"
+		  "%50sLR[ 5]: HW: %d P: %d: A: %d vINTID: %5llu pINTID: %5llu\n"
+		  "%50sLR[ 6]: HW: %d P: %d: A: %d vINTID: %5llu pINTID: %5llu\n"
+		  "%50sLR[ 7]: HW: %d P: %d: A: %d vINTID: %5llu pINTID: %5llu\n"
+		  "%50sLR[ 8]: HW: %d P: %d: A: %d vINTID: %5llu pINTID: %5llu\n"
+		  "%50sLR[ 9]: HW: %d P: %d: A: %d vINTID: %5llu pINTID: %5llu\n"
+		  "%50sLR[10]: HW: %d P: %d: A: %d vINTID: %5llu pINTID: %5llu\n"
+		  "%50sLR[11]: HW: %d P: %d: A: %d vINTID: %5llu pINTID: %5llu\n"
+		  "%50sLR[12]: HW: %d P: %d: A: %d vINTID: %5llu pINTID: %5llu\n"
+		  "%50sLR[13]: HW: %d P: %d: A: %d vINTID: %5llu pINTID: %5llu\n"
+		  "%50sLR[14]: HW: %d P: %d: A: %d vINTID: %5llu pINTID: %5llu\n"
+		  "%50sLR[15]: HW: %d P: %d: A: %d vINTID: %5llu pINTID: %5llu",
+		  __entry->nr_lr,
+		  LR_ENTRY_VALS(0), LR_ENTRY_VALS(1), LR_ENTRY_VALS(2),
+		  LR_ENTRY_VALS(3), LR_ENTRY_VALS(4), LR_ENTRY_VALS(5),
+		  LR_ENTRY_VALS(6), LR_ENTRY_VALS(7), LR_ENTRY_VALS(8),
+		  LR_ENTRY_VALS(9), LR_ENTRY_VALS(10), LR_ENTRY_VALS(11),
+		  LR_ENTRY_VALS(12), LR_ENTRY_VALS(13), LR_ENTRY_VALS(14),
+		  LR_ENTRY_VALS(15))
+);
+
+TRACE_EVENT(vgic_nested_hw_emulate,
+	TP_PROTO(int lr, u64 lr_val, u32 l1_intid),
+	TP_ARGS(lr, lr_val, l1_intid),
+
+	TP_STRUCT__entry(
+		__field(	int,	lr		)
+		__field(	u64,	lr_val		)
+		__field(	u32,	l1_intid	)
+	),
+
+	TP_fast_assign(
+		__entry->lr		= lr;
+		__entry->lr_val		= lr_val;
+		__entry->l1_intid	= l1_intid;
+	),
+
+	TP_printk("lr: %d LR %llx L1 INTID: %u\n",
+		  __entry->lr, __entry->lr_val, __entry->l1_intid)
+);
+
+#endif /* _TRACE_VGIC_NESTED_H */
+
+#undef TRACE_INCLUDE_PATH
+#define TRACE_INCLUDE_PATH vgic/
+#undef TRACE_INCLUDE_FILE
+#define TRACE_INCLUDE_FILE vgic-nested-trace
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/arch/arm64/kvm/vgic/vgic-v3-nested.c b/arch/arm64/kvm/vgic/vgic-v3-nested.c
index 4a35c2be7984..02b0b335f72a 100644
--- a/arch/arm64/kvm/vgic/vgic-v3-nested.c
+++ b/arch/arm64/kvm/vgic/vgic-v3-nested.c
@@ -15,6 +15,9 @@
 
 #include "vgic.h"
 
+#define CREATE_TRACE_POINTS
+#include "vgic-nested-trace.h"
+
 static inline struct vgic_v3_cpu_if *vcpu_nested_if(struct kvm_vcpu *vcpu)
 {
 	return &vcpu->arch.vgic_cpu.nested_vgic_v3;
@@ -121,6 +124,9 @@ static void vgic_v3_create_shadow_lr(struct kvm_vcpu *vcpu)
 		used_lrs = i + 1;
 	}
 
+	trace_vgic_create_shadow_lrs(vcpu, kvm_vgic_global_state.nr_lr,
+				     s_cpu_if->vgic_lr, cpu_if->vgic_lr);
+
 	s_cpu_if->used_lrs = used_lrs;
 }
 
@@ -165,8 +171,10 @@ void vgic_v3_sync_nested(struct kvm_vcpu *vcpu)
 			continue; /* oh well, the guest hyp is broken */
 
 		lr = __gic_v3_get_lr(i);
-		if (!(lr & ICH_LR_STATE))
+		if (!(lr & ICH_LR_STATE)) {
+			trace_vgic_nested_hw_emulate(i, lr, l1_irq);
 			irq->active = false;
+		}
 
 		vgic_put_irq(vcpu->kvm, irq);
 	}
@@ -197,6 +205,9 @@ void vgic_v3_put_nested(struct kvm_vcpu *vcpu)
 
 	__vgic_v3_save_state(vcpu_shadow_if(vcpu));
 
+	trace_vgic_put_nested(vcpu, kvm_vgic_global_state.nr_lr,
+			      vcpu_shadow_if(vcpu)->vgic_lr);
+
 	/*
 	 * Translate the shadow state HW fields back to the virtual ones
 	 * before copying the shadow struct back to the nested one.
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 378+ messages in thread

* [PATCH v6 52/64] KVM: arm64: nv: Allow userspace to request KVM_ARM_VCPU_NESTED_VIRT
  2022-01-28 12:18 ` Marc Zyngier
  (?)
@ 2022-01-28 12:19   ` Marc Zyngier
  -1 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:19 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Jintack Lim, Haibo Xu,
	Ganapatrao Kulkarni, Chase Conklin, Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

Since we're (almost) feature complete, let's allow userspace to
request KVM_ARM_VCPU_NESTED_VIRT by bumping KVM_VCPU_MAX_FEATURES.
We now also advertise the feature to userspace with a new capability.

It's going to be great...
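
For illustration only (this is my sketch, not part of the patch): a minimal
userspace snippet showing how a VMM could consume this. It assumes the uapi
headers from this series are in use (KVM_CAP_ARM_EL2 from this patch, and the
KVM_ARM_VCPU_NESTED_VIRT feature bit introduced earlier in the series), and
that kvm_fd/vm_fd/vcpu_fd have already been set up the usual way.

#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static int vcpu_init_with_nv(int kvm_fd, int vm_fd, int vcpu_fd)
{
	struct kvm_vcpu_init init;

	/* New capability from this patch: does the host support EL2 guests? */
	if (ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_ARM_EL2) <= 0)
		return -1;

	memset(&init, 0, sizeof(init));
	if (ioctl(vm_fd, KVM_ARM_PREFERRED_TARGET, &init))
		return -1;

	/* Feature bit added earlier in this series */
	init.features[0] |= 1U << KVM_ARM_VCPU_NESTED_VIRT;

	return ioctl(vcpu_fd, KVM_ARM_VCPU_INIT, &init);
}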

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_host.h | 2 +-
 arch/arm64/kvm/arm.c              | 3 +++
 include/uapi/linux/kvm.h          | 1 +
 3 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 6bfdb21c0d8a..0ea8cf0eeca7 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -37,7 +37,7 @@
 
 #define KVM_MAX_VCPUS VGIC_V3_MAX_CPUS
 
-#define KVM_VCPU_MAX_FEATURES 7
+#define KVM_VCPU_MAX_FEATURES 8
 
 #define KVM_REQ_SLEEP \
 	KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 91fb7bc09b8e..ac7d89c1e987 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -258,6 +258,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_ARM_EL1_32BIT:
 		r = cpus_have_const_cap(ARM64_HAS_32BIT_EL1);
 		break;
+	case KVM_CAP_ARM_EL2:
+		r = cpus_have_const_cap(ARM64_HAS_NESTED_VIRT);
+		break;
 	case KVM_CAP_GUEST_DEBUG_HW_BPS:
 		r = get_num_brps();
 		break;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 9563d294f181..6950497e5032 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1133,6 +1133,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM 206
 #define KVM_CAP_VM_GPA_BITS 207
 #define KVM_CAP_XSAVE2 208
+#define KVM_CAP_ARM_EL2 209
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 378+ messages in thread

* [PATCH v6 53/64] KVM: arm64: nv: Add handling of ARMv8.4-TTL TLB invalidation
  2022-01-28 12:18 ` Marc Zyngier
  (?)
@ 2022-01-28 12:19   ` Marc Zyngier
  -1 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:19 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Jintack Lim, Haibo Xu,
	Ganapatrao Kulkarni, Chase Conklin, Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

Use the guest-provided TTL information to find out the range of the
required invalidation.
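
Purely as a worked example (not one of the hunks below): the TTL hint packs
the translation granule in bits [3:2] and the level in bits [1:0], so a
guest TLBI carrying a "4K granule, level 2" hint caps the invalidation at a
2MB block, as computed by the new ttl_to_size() helper:

	u8 ttl = (TLBI_TTL_TG_4K << 2) | 2;		/* TG in [3:2], level in [1:0] */
	unsigned long max_size = ttl_to_size(ttl);	/* SZ_2M for this encoding */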

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_nested.h |  1 +
 arch/arm64/kvm/nested.c             | 57 +++++++++++++++++++++
 arch/arm64/kvm/sys_regs.c           | 78 ++++++++++++++++++-----------
 3 files changed, 108 insertions(+), 28 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
index 3eef5f60c76d..8bfbeca2f4cd 100644
--- a/arch/arm64/include/asm/kvm_nested.h
+++ b/arch/arm64/include/asm/kvm_nested.h
@@ -128,6 +128,7 @@ extern bool __forward_traps(struct kvm_vcpu *vcpu, unsigned int reg,
 extern bool forward_traps(struct kvm_vcpu *vcpu, u64 control_bit);
 extern bool forward_nv_traps(struct kvm_vcpu *vcpu);
 extern bool forward_nv1_traps(struct kvm_vcpu *vcpu);
+unsigned int ttl_to_size(u8 ttl);
 
 struct sys_reg_params;
 struct sys_reg_desc;
diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
index 39c9b1033f02..7e4ff98c4e95 100644
--- a/arch/arm64/kvm/nested.c
+++ b/arch/arm64/kvm/nested.c
@@ -351,6 +351,63 @@ int kvm_walk_nested_s2(struct kvm_vcpu *vcpu, phys_addr_t gipa,
 	return ret;
 }
 
+
+unsigned int ttl_to_size(u8 ttl)
+{
+	int level = ttl & 3;
+	int gran = (ttl >> 2) & 3;
+	unsigned int max_size = 0;
+
+	switch (gran) {
+	case TLBI_TTL_TG_4K:
+		switch (level) {
+		case 0:
+			break;
+		case 1:
+			max_size = SZ_1G;
+			break;
+		case 2:
+			max_size = SZ_2M;
+			break;
+		case 3:
+			max_size = SZ_4K;
+			break;
+		}
+		break;
+	case TLBI_TTL_TG_16K:
+		switch (level) {
+		case 0:
+		case 1:
+			break;
+		case 2:
+			max_size = SZ_32M;
+			break;
+		case 3:
+			max_size = SZ_16K;
+			break;
+		}
+		break;
+	case TLBI_TTL_TG_64K:
+		switch (level) {
+		case 0:
+		case 1:
+			/* No 52bit IPA support */
+			break;
+		case 2:
+			max_size = SZ_512M;
+			break;
+		case 3:
+			max_size = SZ_64K;
+			break;
+		}
+		break;
+	default:			/* No size information */
+		break;
+	}
+
+	return max_size;
+}
+
 /* Must be called with kvm->lock held */
 struct kvm_s2_mmu *lookup_s2_mmu(struct kvm *kvm, u64 vttbr, u64 hcr)
 {
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index a18d845bcff7..ebe6fad76b4b 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -2731,14 +2731,54 @@ static bool handle_vmalls12e1is(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
 	return true;
 }
 
+static unsigned long compute_tlb_inval_range(struct kvm_vcpu *vcpu,
+					     struct kvm_s2_mmu *mmu,
+					     u64 val)
+{
+	unsigned long max_size;
+	u8 ttl = 0;
+
+	if (cpus_have_const_cap(ARM64_HAS_ARMv8_4_TTL)) {
+		ttl = FIELD_GET(GENMASK_ULL(47, 44), val);
+	}
+
+	max_size = ttl_to_size(ttl);
+
+	if (!max_size) {
+		u64 vtcr = vcpu_read_sys_reg(vcpu, VTCR_EL2);
+
+		/* Compute the maximum extent of the invalidation */
+		switch ((vtcr & VTCR_EL2_TG0_MASK)) {
+		case VTCR_EL2_TG0_4K:
+			max_size = SZ_1G;
+			break;
+		case VTCR_EL2_TG0_16K:
+			max_size = SZ_32M;
+			break;
+		case VTCR_EL2_TG0_64K:
+			/*
+			 * No, we do not support 52bit IPA in nested yet. Once
+			 * we do, this should be 4TB.
+			 */
+			/* FIXME: remove the 52bit PA support from the IDregs */
+			max_size = SZ_512M;
+			break;
+		default:
+			BUG();
+		}
+	}
+
+	WARN_ON(!max_size);
+	return max_size;
+}
+
 static bool handle_ipas2e1is(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
 			     const struct sys_reg_desc *r)
 {
 	u64 vttbr = vcpu_read_sys_reg(vcpu, VTTBR_EL2);
-	u64 vtcr = vcpu_read_sys_reg(vcpu, VTCR_EL2);
 	struct kvm_s2_mmu *mmu;
 	u64 base_addr;
-	int max_size;
+	unsigned long max_size;
 
 	if (vcpu_has_nv(vcpu) && forward_nv_traps(vcpu))
 		return false;
@@ -2748,45 +2788,27 @@ static bool handle_ipas2e1is(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
 	 *
 	 * - NS bit: we're non-secure only.
 	 *
-	 * - TTL field: We already have the granule size from the
-	 *   VTCR_EL2.TG0 field, and the level is only relevant to the
-	 *   guest's S2PT.
-	 *
 	 * - IPA[51:48]: We don't support 52bit IPA just yet...
 	 *
 	 * And of course, adjust the IPA to be on an actual address.
 	 */
 	base_addr = (p->regval & GENMASK_ULL(35, 0)) << 12;
 
-	/* Compute the maximum extent of the invalidation */
-	switch ((vtcr & VTCR_EL2_TG0_MASK)) {
-	case VTCR_EL2_TG0_4K:
-		max_size = SZ_1G;
-		break;
-	case VTCR_EL2_TG0_16K:
-		max_size = SZ_32M;
-		break;
-	case VTCR_EL2_TG0_64K:
-		/*
-		 * No, we do not support 52bit IPA in nested yet. Once
-		 * we do, this should be 4TB.
-		 */
-		/* FIXME: remove the 52bit PA support from the IDregs */
-		max_size = SZ_512M;
-		break;
-	default:
-		BUG();
-	}
-
 	spin_lock(&vcpu->kvm->mmu_lock);
 
 	mmu = lookup_s2_mmu(vcpu->kvm, vttbr, HCR_VM);
-	if (mmu)
+	if (mmu) {
+		max_size = compute_tlb_inval_range(vcpu, mmu, p->regval);
+		base_addr &= ~(max_size - 1);
 		kvm_unmap_stage2_range(mmu, base_addr, max_size);
+	}
 
 	mmu = lookup_s2_mmu(vcpu->kvm, vttbr, 0);
-	if (mmu)
+	if (mmu) {
+		max_size = compute_tlb_inval_range(vcpu, mmu, p->regval);
+		base_addr &= ~(max_size - 1);
 		kvm_unmap_stage2_range(mmu, base_addr, max_size);
+	}
 
 	spin_unlock(&vcpu->kvm->mmu_lock);
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 378+ messages in thread

* [PATCH v6 54/64] KVM: arm64: nv: Invalidate TLBs based on shadow S2 TTL-like information
  2022-01-28 12:18 ` Marc Zyngier
  (?)
@ 2022-01-28 12:19   ` Marc Zyngier
  -1 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:19 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Jintack Lim, Haibo Xu,
	Ganapatrao Kulkarni, Chase Conklin, Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

In order to be able to make S2 TLB invalidations more performant on NV,
let's use a scheme derived from the ARMv8.4 TTL extension.

If bits [56:55] in the descriptor are non-zero, they indicate a level
which can be used as an invalidation range.
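
A rough sketch of the idea (mine, not one of the hunks below), assuming the
shadow S2 map path tags each descriptor with the level of the guest's own
mapping via the SW bits, so that the TLBI handling can recover it later:

#include <linux/bitfield.h>
#include <asm/kvm_nested.h>	/* KVM_NV_GUEST_MAP_SZ, added by this patch */

/* Stash the guest mapping level (0-3) in descriptor bits [56:55] */
static u64 shadow_s2_tag_level(u64 desc, u8 level)
{
	return desc | FIELD_PREP(KVM_NV_GUEST_MAP_SZ, (u64)level);
}

/* Recover the level from a walked shadow descriptor; 0 means "no hint" */
static u8 shadow_s2_get_level(u64 desc)
{
	return FIELD_GET(KVM_NV_GUEST_MAP_SZ, desc);
}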

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_nested.h |  4 ++
 arch/arm64/kvm/nested.c             | 98 +++++++++++++++++++++++++++++
 arch/arm64/kvm/sys_regs.c           | 13 ++--
 3 files changed, 110 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
index 8bfbeca2f4cd..847b652bd376 100644
--- a/arch/arm64/include/asm/kvm_nested.h
+++ b/arch/arm64/include/asm/kvm_nested.h
@@ -128,6 +128,8 @@ extern bool __forward_traps(struct kvm_vcpu *vcpu, unsigned int reg,
 extern bool forward_traps(struct kvm_vcpu *vcpu, u64 control_bit);
 extern bool forward_nv_traps(struct kvm_vcpu *vcpu);
 extern bool forward_nv1_traps(struct kvm_vcpu *vcpu);
+u8 get_guest_mapping_ttl(struct kvm_vcpu *vcpu, struct kvm_s2_mmu *mmu,
+			 u64 addr);
 unsigned int ttl_to_size(u8 ttl);
 
 struct sys_reg_params;
@@ -136,4 +138,6 @@ struct sys_reg_desc;
 void access_nested_id_reg(struct kvm_vcpu *v, struct sys_reg_params *p,
 			  const struct sys_reg_desc *r);
 
+#define KVM_NV_GUEST_MAP_SZ	(KVM_PGTABLE_PROT_SW1 | KVM_PGTABLE_PROT_SW0)
+
 #endif /* __ARM64_KVM_NESTED_H */
diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
index 7e4ff98c4e95..8e99690cdde1 100644
--- a/arch/arm64/kvm/nested.c
+++ b/arch/arm64/kvm/nested.c
@@ -4,6 +4,7 @@
  * Author: Jintack Lim <jintack.lim@linaro.org>
  */
 
+#include <linux/bitfield.h>
 #include <linux/kvm.h>
 #include <linux/kvm_host.h>
 
@@ -351,6 +352,29 @@ int kvm_walk_nested_s2(struct kvm_vcpu *vcpu, phys_addr_t gipa,
 	return ret;
 }
 
+static int read_host_s2_desc(phys_addr_t pa, u64 *desc, void *data)
+{
+	u64 *va = phys_to_virt(pa);
+
+	*desc = *va;
+
+	return 0;
+}
+
+static int kvm_walk_shadow_s2(struct kvm_s2_mmu *mmu, phys_addr_t gipa,
+			      struct kvm_s2_trans *result)
+{
+	struct s2_walk_info wi = { };
+
+	wi.read_desc = read_host_s2_desc;
+	wi.baddr = mmu->pgd_phys;
+
+	vtcr_to_walk_info(mmu->arch->vtcr, &wi);
+
+	wi.be = IS_ENABLED(CONFIG_CPU_BIG_ENDIAN);
+
+	return walk_nested_s2_pgd(gipa, &wi, result);
+}
 
 unsigned int ttl_to_size(u8 ttl)
 {
@@ -408,6 +432,80 @@ unsigned int ttl_to_size(u8 ttl)
 	return max_size;
 }
 
+/*
+ * Compute the equivalent of the TTL field by parsing the shadow PT.
+ * The granule size is extracted from VTCR_EL2.TG0 while the level is
+ * retrieved from first entry carrying the level as a tag.
+ */
+u8 get_guest_mapping_ttl(struct kvm_vcpu *vcpu, struct kvm_s2_mmu *mmu,
+			 u64 addr)
+{
+	u64 tmp, sz = 0, vtcr = vcpu_read_sys_reg(vcpu, VTCR_EL2);
+	struct kvm_s2_trans out;
+	u8 ttl, level;
+
+	switch (vtcr & VTCR_EL2_TG0_MASK) {
+	case VTCR_EL2_TG0_4K:
+		ttl = (1 << 2);
+		break;
+	case VTCR_EL2_TG0_16K:
+		ttl = (2 << 2);
+		break;
+	case VTCR_EL2_TG0_64K:
+		ttl = (3 << 2);
+		break;
+	default:
+		BUG();
+	}
+
+	tmp = addr;
+
+again:
+	/* Iteratively compute the block sizes for a particular granule size */
+	switch (vtcr & VTCR_EL2_TG0_MASK) {
+	case VTCR_EL2_TG0_4K:
+		if	(sz < SZ_4K)	sz = SZ_4K;
+		else if (sz < SZ_2M)	sz = SZ_2M;
+		else if (sz < SZ_1G)	sz = SZ_1G;
+		else			sz = 0;
+		break;
+	case VTCR_EL2_TG0_16K:
+		if	(sz < SZ_16K)	sz = SZ_16K;
+		else if (sz < SZ_32M)	sz = SZ_32M;
+		else			sz = 0;
+		break;
+	case VTCR_EL2_TG0_64K:
+		if	(sz < SZ_64K)	sz = SZ_64K;
+		else if (sz < SZ_512M)	sz = SZ_512M;
+		else			sz = 0;
+		break;
+	default:
+		BUG();
+	}
+
+	if (sz == 0)
+		return 0;
+
+	tmp &= ~(sz - 1);
+	out = (struct kvm_s2_trans) { };
+	kvm_walk_shadow_s2(mmu, tmp, &out);
+	level = FIELD_GET(KVM_NV_GUEST_MAP_SZ, out.upper_attr);
+	if (!level)
+		goto again;
+
+	ttl |= level;
+
+	/*
+	 * We have now found some level information in the shadow S2. Check
+	 * that the resulting range actually includes the original IPA.
+	 */
+	sz = ttl_to_size(ttl);
+	if (addr < (tmp + sz))
+		return ttl;
+
+	return 0;
+}
+
 /* Must be called with kvm->lock held */
 struct kvm_s2_mmu *lookup_s2_mmu(struct kvm *kvm, u64 vttbr, u64 hcr)
 {
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index ebe6fad76b4b..32b8f4c1074a 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -2736,10 +2736,13 @@ static unsigned long compute_tlb_inval_range(struct kvm_vcpu *vcpu,
 					     u64 val)
 {
 	unsigned long max_size;
-	u8 ttl = 0;
+	u8 ttl;
 
-	if (cpus_have_const_cap(ARM64_HAS_ARMv8_4_TTL)) {
-		ttl = FIELD_GET(GENMASK_ULL(47, 44), val);
+	ttl = FIELD_GET(GENMASK_ULL(47, 44), val);
+
+	if (!(cpus_have_const_cap(ARM64_HAS_ARMv8_4_TTL) && ttl)) {
+		u64 addr = (val & GENMASK_ULL(35, 0)) << 12;
+		ttl = get_guest_mapping_ttl(vcpu, mmu, addr);
 	}
 
 	max_size = ttl_to_size(ttl);
@@ -2783,6 +2786,8 @@ static bool handle_ipas2e1is(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
 	if (vcpu_has_nv(vcpu) && forward_nv_traps(vcpu))
 		return false;
 
+	spin_lock(&vcpu->kvm->mmu_lock);
+
 	/*
 	 * We drop a number of things from the supplied value:
 	 *
@@ -2794,8 +2799,6 @@ static bool handle_ipas2e1is(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
 	 */
 	base_addr = (p->regval & GENMASK_ULL(35, 0)) << 12;
 
-	spin_lock(&vcpu->kvm->mmu_lock);
-
 	mmu = lookup_s2_mmu(vcpu->kvm, vttbr, HCR_VM);
 	if (mmu) {
 		max_size = compute_tlb_inval_range(vcpu, mmu, p->regval);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 378+ messages in thread
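
As a minimal standalone sketch (plain userspace C, not kernel code), the
helper below shows how a TTL hint of the form (granule << 2) | level, as
built by get_guest_mapping_ttl() above, maps back to an invalidation range.
The function name ttl_hint_to_size() and the level/size table are assumptions
derived from the architectural stage-2 block sizes, not the kernel's own
ttl_to_size() implementation.

#include <stdint.h>
#include <stdio.h>

#define SZ_4K	(1UL << 12)
#define SZ_16K	(1UL << 14)
#define SZ_64K	(1UL << 16)
#define SZ_2M	(1UL << 21)
#define SZ_32M	(1UL << 25)
#define SZ_512M	(1UL << 29)
#define SZ_1G	(1UL << 30)

static unsigned long ttl_hint_to_size(uint8_t ttl)
{
	uint8_t granule = ttl >> 2;	/* 1=4K, 2=16K, 3=64K, as in the patch */
	uint8_t level = ttl & 3;

	switch (granule) {
	case 1:	/* 4K granule: levels 1/2/3 -> 1G/2M/4K blocks */
		if (level == 1) return SZ_1G;
		if (level == 2) return SZ_2M;
		if (level == 3) return SZ_4K;
		break;
	case 2:	/* 16K granule: levels 2/3 -> 32M/16K blocks */
		if (level == 2) return SZ_32M;
		if (level == 3) return SZ_16K;
		break;
	case 3:	/* 64K granule: levels 2/3 -> 512M/64K blocks */
		if (level == 2) return SZ_512M;
		if (level == 3) return SZ_64K;
		break;
	}
	return 0;	/* no usable level information encoded */
}

int main(void)
{
	/* 4K granule, level 2 block: expect a 2MB invalidation range */
	printf("range = 0x%lx\n", ttl_hint_to_size((1 << 2) | 2));
	return 0;
}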

* [PATCH v6 55/64] KVM: arm64: nv: Tag shadow S2 entries with nested level
  2022-01-28 12:18 ` Marc Zyngier
  (?)
@ 2022-01-28 12:19   ` Marc Zyngier
  -1 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:19 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Jintack Lim, Haibo Xu,
	Ganapatrao Kulkarni, Chase Conklin, Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

Populate bits [56:55] of the leaf entry with the level provided
by the guest's S2 translation.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_nested.h |  7 +++++++
 arch/arm64/kvm/mmu.c                | 11 +++++++++++
 2 files changed, 18 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
index 847b652bd376..3b5838a2f29e 100644
--- a/arch/arm64/include/asm/kvm_nested.h
+++ b/arch/arm64/include/asm/kvm_nested.h
@@ -5,6 +5,8 @@
 #include <linux/bitfield.h>
 #include <linux/kvm_host.h>
 
+#include <asm/kvm_pgtable.h>
+
 static inline bool vcpu_has_nv(const struct kvm_vcpu *vcpu)
 {
 	return (!__is_defined(__KVM_NVHE_HYPERVISOR__) &&
@@ -140,4 +142,9 @@ void access_nested_id_reg(struct kvm_vcpu *v, struct sys_reg_params *p,
 
 #define KVM_NV_GUEST_MAP_SZ	(KVM_PGTABLE_PROT_SW1 | KVM_PGTABLE_PROT_SW0)
 
+static inline u64 kvm_encode_nested_level(struct kvm_s2_trans *trans)
+{
+	return FIELD_PREP(KVM_NV_GUEST_MAP_SZ, trans->level);
+}
+
 #endif /* __ARM64_KVM_NESTED_H */
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 4b38f2f3c997..6caa48da1b2e 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1281,11 +1281,17 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	 * Potentially reduce shadow S2 permissions to match the guest's own
 	 * S2. For exec faults, we'd only reach this point if the guest
 	 * actually allowed it (see kvm_s2_handle_perm_fault).
+	 *
+	 * Also encode the level of the nested translation in the SW bits of
+	 * the PTE/PMD/PUD. This will be retrieved on TLB invalidation from
+	 * the guest.
 	 */
 	if (kvm_is_shadow_s2_fault(vcpu)) {
 		writable &= kvm_s2_trans_writable(nested);
 		if (!kvm_s2_trans_readable(nested))
 			prot &= ~KVM_PGTABLE_PROT_R;
+
+		prot |= kvm_encode_nested_level(nested);
 	}
 
 	spin_lock(&kvm->mmu_lock);
@@ -1336,6 +1342,11 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	 * kvm_pgtable_stage2_map() should be called to change block size.
 	 */
 	if (fault_status == FSC_PERM && vma_pagesize == fault_granule) {
+		/*
+		 * Drop the SW bits in favour of those stored in the
+		 * PTE, which will be preserved.
+		 */
+		prot &= ~KVM_NV_GUEST_MAP_SZ;
 		ret = kvm_pgtable_stage2_relax_perms(pgt, fault_ipa, prot);
 	} else {
 		ret = kvm_pgtable_stage2_map(pgt, fault_ipa, vma_pagesize,
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 378+ messages in thread
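
As a rough round-trip sketch of this tagging scheme (standalone C, not the
kernel code; the mask below is written out by hand rather than taken from
KVM_PGTABLE_PROT_SW0/SW1), this stores a 2-bit level in bits [56:55] of a
descriptor-like value and reads it back, which is what the shadow S2 fault
path here and the TLBI path in the previous patch do between them:

#include <stdint.h>
#include <stdio.h>

#define MAP_SZ_MASK	(3ULL << 55)	/* bits [56:55] */

static uint64_t encode_level(uint64_t desc, unsigned int level)
{
	return (desc & ~MAP_SZ_MASK) | (((uint64_t)level << 55) & MAP_SZ_MASK);
}

static unsigned int decode_level(uint64_t desc)
{
	return (desc & MAP_SZ_MASK) >> 55;
}

int main(void)
{
	uint64_t desc = 0x40000743ULL;	/* arbitrary descriptor bits */

	desc = encode_level(desc, 2);	/* tag as a level 2 mapping */
	printf("level = %u\n", decode_level(desc));
	return 0;
}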

* [PATCH v6 56/64] KVM: arm64: nv: Add include containing the VNCR_EL2 offsets
  2022-01-28 12:18 ` Marc Zyngier
  (?)
@ 2022-01-28 12:19   ` Marc Zyngier
  -1 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:19 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Jintack Lim, Haibo Xu,
	Ganapatrao Kulkarni, Chase Conklin, Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

VNCR_EL2 points to a page containing a number of system registers
accessed by a guest hypervisor when ARMv8.4-NV is enabled.

Let's document the offsets in that page, as we are going to use
this layout.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/vncr_mapping.h | 74 +++++++++++++++++++++++++++
 1 file changed, 74 insertions(+)
 create mode 100644 arch/arm64/include/asm/vncr_mapping.h

diff --git a/arch/arm64/include/asm/vncr_mapping.h b/arch/arm64/include/asm/vncr_mapping.h
new file mode 100644
index 000000000000..c0a2acd5045c
--- /dev/null
+++ b/arch/arm64/include/asm/vncr_mapping.h
@@ -0,0 +1,74 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * System register offsets in the VNCR page
+ * All offsets are *byte* displacements!
+ */
+
+#ifndef __ARM64_VNCR_MAPPING_H__
+#define __ARM64_VNCR_MAPPING_H__
+
+#define VNCR_VTTBR_EL2          0x020
+#define VNCR_VTCR_EL2           0x040
+#define VNCR_VMPIDR_EL2         0x050
+#define VNCR_CNTVOFF_EL2        0x060
+#define VNCR_HCR_EL2            0x078
+#define VNCR_HSTR_EL2           0x080
+#define VNCR_VPIDR_EL2          0x088
+#define VNCR_TPIDR_EL2          0x090
+#define VNCR_VNCR_EL2           0x0B0
+#define VNCR_CPACR_EL1          0x100
+#define VNCR_CONTEXTIDR_EL1     0x108
+#define VNCR_SCTLR_EL1          0x110
+#define VNCR_ACTLR_EL1          0x118
+#define VNCR_TCR_EL1            0x120
+#define VNCR_AFSR0_EL1          0x128
+#define VNCR_AFSR1_EL1          0x130
+#define VNCR_ESR_EL1            0x138
+#define VNCR_MAIR_EL1           0x140
+#define VNCR_AMAIR_EL1          0x148
+#define VNCR_MDSCR_EL1          0x158
+#define VNCR_SPSR_EL1           0x160
+#define VNCR_CNTV_CVAL_EL0      0x168
+#define VNCR_CNTV_CTL_EL0       0x170
+#define VNCR_CNTP_CVAL_EL0      0x178
+#define VNCR_CNTP_CTL_EL0       0x180
+#define VNCR_SCXTNUM_EL1        0x188
+#define VNCR_TFSR_EL1		0x190
+#define VNCR_ZCR_EL1            0x1E0
+#define VNCR_TTBR0_EL1          0x200
+#define VNCR_TTBR1_EL1          0x210
+#define VNCR_FAR_EL1            0x220
+#define VNCR_ELR_EL1            0x230
+#define VNCR_SP_EL1             0x240
+#define VNCR_VBAR_EL1           0x250
+#define VNCR_ICH_LR0_EL2        0x400
+//      VNCR_ICH_LRN_EL2(n)     VNCR_ICH_LR0_EL2+8*((n) & 7)
+#define VNCR_ICH_AP0R0_EL2      0x480
+//      VNCR_ICH_AP0RN_EL2(n)   VNCR_ICH_AP0R0_EL2+8*((n) & 3)
+#define VNCR_ICH_AP1R0_EL2      0x4A0
+//      VNCR_ICH_AP1RN_EL2(n)   VNCR_ICH_AP1R0_EL2+8*((n) & 3)
+#define VNCR_ICH_HCR_EL2        0x4C0
+#define VNCR_ICH_VMCR_EL2       0x4C8
+#define VNCR_VDISR_EL2          0x500
+#define VNCR_PMBLIMITR_EL1      0x800
+#define VNCR_PMBPTR_EL1         0x810
+#define VNCR_PMBSR_EL1          0x820
+#define VNCR_PMSCR_EL1          0x828
+#define VNCR_PMSEVFR_EL1        0x830
+#define VNCR_PMSICR_EL1         0x838
+#define VNCR_PMSIRR_EL1         0x840
+#define VNCR_PMSLATFR_EL1       0x848
+#define VNCR_TRFCR_EL1          0x880
+#define VNCR_MPAM1_EL1          0x900
+#define VNCR_MPAMHCR_EL2        0x930
+#define VNCR_MPAMVPMV_EL2       0x938
+#define VNCR_MPAMVPM0_EL2       0x940
+#define VNCR_MPAMVPM1_EL2       0x948
+#define VNCR_MPAMVPM2_EL2       0x950
+#define VNCR_MPAMVPM3_EL2       0x958
+#define VNCR_MPAMVPM4_EL2       0x960
+#define VNCR_MPAMVPM5_EL2       0x968
+#define VNCR_MPAMVPM6_EL2       0x970
+#define VNCR_MPAMVPM7_EL2       0x978
+
+#endif /* __ARM64_VNCR_MAPPING_H__ */
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 378+ messages in thread
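
Just to illustrate how these byte offsets would be consumed, here is a hedged
sketch of reading a register from a mapped VNCR page; vncr_read() and the
vncr_page pointer are invented for the example and are not a kernel API:

#include <stdint.h>

#define VNCR_HCR_EL2	0x078	/* byte offset, as in the table above */

/* All offsets are byte displacements of 64-bit registers, so an
 * 8-byte-aligned load at (page base + offset) yields the value the
 * guest hypervisor accesses through VNCR_EL2. */
static inline uint64_t vncr_read(const void *vncr_page, unsigned int offset)
{
	return *(const uint64_t *)((const uint8_t *)vncr_page + offset);
}

/* Usage: uint64_t hcr = vncr_read(vncr_page, VNCR_HCR_EL2); */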

* [PATCH v6 57/64] KVM: arm64: nv: Map VNCR-capable registers to a separate page
  2022-01-28 12:18 ` Marc Zyngier
  (?)
@ 2022-01-28 12:19   ` Marc Zyngier
  -1 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:19 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Jintack Lim, Haibo Xu,
	Ganapatrao Kulkarni, Chase Conklin, Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

With ARMv8.4-NV, registers that can be directly accessed in memory
by the guest have to live at architected offsets in a special page.

Let's annotate the sysreg enum to reflect the offset at which they
are in this page, with a little twist:

If running on HW that doesn't have the ARMv8.4-NV feature, or even
a VM that doesn't use NV, we store all the system registers in the
usual sys_regs array. The only difference with the pre-8.4
situation is that VNCR-capable registers are at a "similar" offset
as in the VNCR page (we can compute the actual offset at compile
time), and that the sys_regs array is both bigger and sparse.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_host.h | 99 ++++++++++++++++++++-----------
 1 file changed, 64 insertions(+), 35 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 0ea8cf0eeca7..3f6ddccae310 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -26,6 +26,7 @@
 #include <asm/fpsimd.h>
 #include <asm/kvm.h>
 #include <asm/kvm_asm.h>
+#include <asm/vncr_mapping.h>
 
 #define __KVM_HAVE_ARCH_INTC_INITIALIZED
 
@@ -180,31 +181,32 @@ struct kvm_vcpu_fault_info {
 	u64 disr_el1;		/* Deferred [SError] Status Register */
 };
 
+/*
+ * VNCR() just places the VNCR-capable registers in the enum after
+ * __VNCR_START__, with the value (after correction) being the offset from the
+ * VNCR base in 8-byte units. As we don't require the enum to be otherwise ordered,
+ * we need the terrible hack below to ensure that we correctly size the
+ * sys_regs array, no matter what.
+ *
+ * The __MAX__ macro has been lifted from Sean Eron Anderson's wonderful
+ * treasure trove of bit hacks:
+ * https://graphics.stanford.edu/~seander/bithacks.html#IntegerMinOrMax
+ */
+#define __MAX__(x,y)	((x) ^ (((x) ^ (y)) & -((x) < (y))))
+#define VNCR(r)						\
+	__before_##r,					\
+	r = __VNCR_START__ + ((VNCR_ ## r) / 8),	\
+	__after_##r = __MAX__(__before_##r - 1, r)
+
 enum vcpu_sysreg {
 	__INVALID_SYSREG__,   /* 0 is reserved as an invalid value */
 	MPIDR_EL1,	/* MultiProcessor Affinity Register */
 	CSSELR_EL1,	/* Cache Size Selection Register */
-	SCTLR_EL1,	/* System Control Register */
-	ACTLR_EL1,	/* Auxiliary Control Register */
-	CPACR_EL1,	/* Coprocessor Access Control */
-	ZCR_EL1,	/* SVE Control */
-	TTBR0_EL1,	/* Translation Table Base Register 0 */
-	TTBR1_EL1,	/* Translation Table Base Register 1 */
-	TCR_EL1,	/* Translation Control Register */
-	ESR_EL1,	/* Exception Syndrome Register */
-	AFSR0_EL1,	/* Auxiliary Fault Status Register 0 */
-	AFSR1_EL1,	/* Auxiliary Fault Status Register 1 */
-	FAR_EL1,	/* Fault Address Register */
-	MAIR_EL1,	/* Memory Attribute Indirection Register */
-	VBAR_EL1,	/* Vector Base Address Register */
-	CONTEXTIDR_EL1,	/* Context ID Register */
 	TPIDR_EL0,	/* Thread ID, User R/W */
 	TPIDRRO_EL0,	/* Thread ID, User R/O */
 	TPIDR_EL1,	/* Thread ID, Privileged */
-	AMAIR_EL1,	/* Aux Memory Attribute Indirection Register */
 	CNTKCTL_EL1,	/* Timer Control Register (EL1) */
 	PAR_EL1,	/* Physical Address Register */
-	MDSCR_EL1,	/* Monitor Debug System Control Register */
 	MDCCINT_EL1,	/* Monitor Debug Comms Channel Interrupt Enable Reg */
 	DISR_EL1,	/* Deferred Interrupt Status Register */
 
@@ -234,20 +236,9 @@ enum vcpu_sysreg {
 	APGAKEYLO_EL1,
 	APGAKEYHI_EL1,
 
-	ELR_EL1,
-	SP_EL1,
-	SPSR_EL1,
-
-	CNTVOFF_EL2,
-	CNTV_CVAL_EL0,
-	CNTV_CTL_EL0,
-	CNTP_CVAL_EL0,
-	CNTP_CTL_EL0,
-
 	/* Memory Tagging Extension registers */
 	RGSR_EL1,	/* Random Allocation Tag Seed Register */
 	GCR_EL1,	/* Tag Control Register */
-	TFSR_EL1,	/* Tag Fault Status Register (EL1) */
 	TFSRE0_EL1,	/* Tag Fault Status Register (EL0) */
 
 	/* 32bit specific registers. */
@@ -257,20 +248,14 @@ enum vcpu_sysreg {
 	DBGVCR32_EL2,	/* Debug Vector Catch Register */
 
 	/* EL2 registers */
-	VPIDR_EL2,	/* Virtualization Processor ID Register */
-	VMPIDR_EL2,	/* Virtualization Multiprocessor ID Register */
 	SCTLR_EL2,	/* System Control Register (EL2) */
 	ACTLR_EL2,	/* Auxiliary Control Register (EL2) */
-	HCR_EL2,	/* Hypervisor Configuration Register */
 	MDCR_EL2,	/* Monitor Debug Configuration Register (EL2) */
 	CPTR_EL2,	/* Architectural Feature Trap Register (EL2) */
-	HSTR_EL2,	/* Hypervisor System Trap Register */
 	HACR_EL2,	/* Hypervisor Auxiliary Control Register */
 	TTBR0_EL2,	/* Translation Table Base Register 0 (EL2) */
 	TTBR1_EL2,	/* Translation Table Base Register 1 (EL2) */
 	TCR_EL2,	/* Translation Control Register (EL2) */
-	VTTBR_EL2,	/* Virtualization Translation Table Base Register */
-	VTCR_EL2,	/* Virtualization Translation Control Register */
 	SPSR_EL2,	/* EL2 saved program status register */
 	ELR_EL2,	/* EL2 exception link register */
 	AFSR0_EL2,	/* Auxiliary Fault Status Register 0 (EL2) */
@@ -283,7 +268,6 @@ enum vcpu_sysreg {
 	VBAR_EL2,	/* Vector Base Address Register (EL2) */
 	RVBAR_EL2,	/* Reset Vector Base Address Register */
 	CONTEXTIDR_EL2,	/* Context ID Register (EL2) */
-	TPIDR_EL2,	/* EL2 Software Thread ID Register */
 	CNTHCTL_EL2,	/* Counter-timer Hypervisor Control register */
 	SP_EL2,		/* EL2 Stack Pointer */
 	CNTHP_CTL_EL2,
@@ -291,6 +275,42 @@ enum vcpu_sysreg {
 	CNTHV_CTL_EL2,
 	CNTHV_CVAL_EL2,
 
+	__VNCR_START__,	/* Any VNCR-capable reg goes after this point */
+
+	VNCR(SCTLR_EL1),/* System Control Register */
+	VNCR(ACTLR_EL1),/* Auxiliary Control Register */
+	VNCR(CPACR_EL1),/* Coprocessor Access Control */
+	VNCR(ZCR_EL1),	/* SVE Control */
+	VNCR(TTBR0_EL1),/* Translation Table Base Register 0 */
+	VNCR(TTBR1_EL1),/* Translation Table Base Register 1 */
+	VNCR(TCR_EL1),	/* Translation Control Register */
+	VNCR(ESR_EL1),	/* Exception Syndrome Register */
+	VNCR(AFSR0_EL1),/* Auxiliary Fault Status Register 0 */
+	VNCR(AFSR1_EL1),/* Auxiliary Fault Status Register 1 */
+	VNCR(FAR_EL1),	/* Fault Address Register */
+	VNCR(MAIR_EL1),	/* Memory Attribute Indirection Register */
+	VNCR(VBAR_EL1),	/* Vector Base Address Register */
+	VNCR(CONTEXTIDR_EL1),	/* Context ID Register */
+	VNCR(AMAIR_EL1),/* Aux Memory Attribute Indirection Register */
+	VNCR(MDSCR_EL1),/* Monitor Debug System Control Register */
+	VNCR(ELR_EL1),
+	VNCR(SP_EL1),
+	VNCR(SPSR_EL1),
+	VNCR(TFSR_EL1),	/* Tag Fault Status Register (EL1) */
+	VNCR(VPIDR_EL2),/* Virtualization Processor ID Register */
+	VNCR(VMPIDR_EL2),/* Virtualization Multiprocessor ID Register */
+	VNCR(HCR_EL2),	/* Hypervisor Configuration Register */
+	VNCR(HSTR_EL2),	/* Hypervisor System Trap Register */
+	VNCR(VTTBR_EL2),/* Virtualization Translation Table Base Register */
+	VNCR(VTCR_EL2),	/* Virtualization Translation Control Register */
+	VNCR(TPIDR_EL2),/* EL2 Software Thread ID Register */
+
+	VNCR(CNTVOFF_EL2),
+	VNCR(CNTV_CVAL_EL0),
+	VNCR(CNTV_CTL_EL0),
+	VNCR(CNTP_CVAL_EL0),
+	VNCR(CNTP_CTL_EL0),
+
 	NR_SYS_REGS	/* Nothing after this line! */
 };
 
@@ -307,6 +327,9 @@ struct kvm_cpu_context {
 	u64 sys_regs[NR_SYS_REGS];
 
 	struct kvm_vcpu *__hyp_running_vcpu;
+
+	/* This pointer has to be 4kB aligned. */
+	u64 *vncr_array;
 };
 
 struct kvm_pmu_events {
@@ -534,7 +557,13 @@ struct kvm_vcpu_arch {
  * for system registers that are never context switched, but only
  * emulated.
  */
-#define __ctxt_sys_reg(c,r)	(&(c)->sys_regs[(r)])
+static inline u64 *__ctxt_sys_reg(const struct kvm_cpu_context *ctxt, int r)
+{
+	if (unlikely(r >= __VNCR_START__ && ctxt->vncr_array))
+		return &ctxt->vncr_array[r - __VNCR_START__];
+
+	return (u64 *)&ctxt->sys_regs[r];
+}
 
 #define ctxt_sys_reg(c,r)	(*__ctxt_sys_reg(c,r))
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 378+ messages in thread
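
The enum trick above can be hard to read, so here is a standalone sketch
(invented names, not the kernel macro itself) showing why the __after_##r
enumerator is needed: a pinned register may land at a lower value than the
running counter, and taking the maximum keeps the implicit counter, and hence
the final array size, monotonic:

#include <stdio.h>

#define __MAX__(x, y)	((x) ^ (((x) ^ (y)) & -((x) < (y))))

#define PIN(r, slot)				\
	__before_##r,				\
	r = START + (slot),			\
	__after_##r = __MAX__(__before_##r - 1, r)

enum demo_reg {
	REG_PLAIN_0,
	REG_PLAIN_1,
	START,			/* plays the role of __VNCR_START__ */
	PIN(REG_A, 12),		/* pinned to START + 12 */
	PIN(REG_B, 4),		/* pinned to START + 4, below REG_A... */
	NR_REGS			/* ...yet still large enough to cover REG_A */
};

int main(void)
{
	/* Prints REG_A=14 REG_B=6 NR_REGS=15: NR_REGS covers the highest slot */
	printf("REG_A=%d REG_B=%d NR_REGS=%d\n", REG_A, REG_B, NR_REGS);
	return 0;
}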

* [PATCH v6 57/64] KVM: arm64: nv: Map VNCR-capable registers to a separate page
@ 2022-01-28 12:19   ` Marc Zyngier
  0 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:19 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: kernel-team, Andre Przywara, Christoffer Dall, Chase Conklin,
	Russell King (Oracle),
	mihai.carabas, Ganapatrao Kulkarni

With ARMv8.4-NV, registers that can be directly accessed in memory
by the guest have to live at architected offsets in a special page.

Let's annotate the sysreg enum to reflect the offset at which they
are in this page, whith a little twist:

If running on HW that doesn't have the ARMv8.4-NV feature, or even
a VM that doesn't use NV, we store all the system registers in the
usual sys_regs array. The only difference with the pre-8.4
situation is that VNCR-capable registers are at a "similar" offset
as in the VNCR page (we can compute the actual offset at compile
time), and that the sys_regs array is both bigger and sparse.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_host.h | 99 ++++++++++++++++++++-----------
 1 file changed, 64 insertions(+), 35 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 0ea8cf0eeca7..3f6ddccae310 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -26,6 +26,7 @@
 #include <asm/fpsimd.h>
 #include <asm/kvm.h>
 #include <asm/kvm_asm.h>
+#include <asm/vncr_mapping.h>
 
 #define __KVM_HAVE_ARCH_INTC_INITIALIZED
 
@@ -180,31 +181,32 @@ struct kvm_vcpu_fault_info {
 	u64 disr_el1;		/* Deferred [SError] Status Register */
 };
 
+/*
+ * VNCR() just places the VNCR_capable registers in the enum after
+ * __VNCR_START__, and the value (after correction) to be an 8-byte offset
+ * from the VNCR base. As we don't require the enum to be otherwise ordered,
+ * we need the terrible hack below to ensure that we correctly size the
+ * sys_regs array, no matter what.
+ *
+ * The __MAX__ macro has been lifted from Sean Eron Anderson's wonderful
+ * treasure trove of bit hacks:
+ * https://graphics.stanford.edu/~seander/bithacks.html#IntegerMinOrMax
+ */
+#define __MAX__(x,y)	((x) ^ (((x) ^ (y)) & -((x) < (y))))
+#define VNCR(r)						\
+	__before_##r,					\
+	r = __VNCR_START__ + ((VNCR_ ## r) / 8),	\
+	__after_##r = __MAX__(__before_##r - 1, r)
+
 enum vcpu_sysreg {
 	__INVALID_SYSREG__,   /* 0 is reserved as an invalid value */
 	MPIDR_EL1,	/* MultiProcessor Affinity Register */
 	CSSELR_EL1,	/* Cache Size Selection Register */
-	SCTLR_EL1,	/* System Control Register */
-	ACTLR_EL1,	/* Auxiliary Control Register */
-	CPACR_EL1,	/* Coprocessor Access Control */
-	ZCR_EL1,	/* SVE Control */
-	TTBR0_EL1,	/* Translation Table Base Register 0 */
-	TTBR1_EL1,	/* Translation Table Base Register 1 */
-	TCR_EL1,	/* Translation Control Register */
-	ESR_EL1,	/* Exception Syndrome Register */
-	AFSR0_EL1,	/* Auxiliary Fault Status Register 0 */
-	AFSR1_EL1,	/* Auxiliary Fault Status Register 1 */
-	FAR_EL1,	/* Fault Address Register */
-	MAIR_EL1,	/* Memory Attribute Indirection Register */
-	VBAR_EL1,	/* Vector Base Address Register */
-	CONTEXTIDR_EL1,	/* Context ID Register */
 	TPIDR_EL0,	/* Thread ID, User R/W */
 	TPIDRRO_EL0,	/* Thread ID, User R/O */
 	TPIDR_EL1,	/* Thread ID, Privileged */
-	AMAIR_EL1,	/* Aux Memory Attribute Indirection Register */
 	CNTKCTL_EL1,	/* Timer Control Register (EL1) */
 	PAR_EL1,	/* Physical Address Register */
-	MDSCR_EL1,	/* Monitor Debug System Control Register */
 	MDCCINT_EL1,	/* Monitor Debug Comms Channel Interrupt Enable Reg */
 	DISR_EL1,	/* Deferred Interrupt Status Register */
 
@@ -234,20 +236,9 @@ enum vcpu_sysreg {
 	APGAKEYLO_EL1,
 	APGAKEYHI_EL1,
 
-	ELR_EL1,
-	SP_EL1,
-	SPSR_EL1,
-
-	CNTVOFF_EL2,
-	CNTV_CVAL_EL0,
-	CNTV_CTL_EL0,
-	CNTP_CVAL_EL0,
-	CNTP_CTL_EL0,
-
 	/* Memory Tagging Extension registers */
 	RGSR_EL1,	/* Random Allocation Tag Seed Register */
 	GCR_EL1,	/* Tag Control Register */
-	TFSR_EL1,	/* Tag Fault Status Register (EL1) */
 	TFSRE0_EL1,	/* Tag Fault Status Register (EL0) */
 
 	/* 32bit specific registers. */
@@ -257,20 +248,14 @@ enum vcpu_sysreg {
 	DBGVCR32_EL2,	/* Debug Vector Catch Register */
 
 	/* EL2 registers */
-	VPIDR_EL2,	/* Virtualization Processor ID Register */
-	VMPIDR_EL2,	/* Virtualization Multiprocessor ID Register */
 	SCTLR_EL2,	/* System Control Register (EL2) */
 	ACTLR_EL2,	/* Auxiliary Control Register (EL2) */
-	HCR_EL2,	/* Hypervisor Configuration Register */
 	MDCR_EL2,	/* Monitor Debug Configuration Register (EL2) */
 	CPTR_EL2,	/* Architectural Feature Trap Register (EL2) */
-	HSTR_EL2,	/* Hypervisor System Trap Register */
 	HACR_EL2,	/* Hypervisor Auxiliary Control Register */
 	TTBR0_EL2,	/* Translation Table Base Register 0 (EL2) */
 	TTBR1_EL2,	/* Translation Table Base Register 1 (EL2) */
 	TCR_EL2,	/* Translation Control Register (EL2) */
-	VTTBR_EL2,	/* Virtualization Translation Table Base Register */
-	VTCR_EL2,	/* Virtualization Translation Control Register */
 	SPSR_EL2,	/* EL2 saved program status register */
 	ELR_EL2,	/* EL2 exception link register */
 	AFSR0_EL2,	/* Auxiliary Fault Status Register 0 (EL2) */
@@ -283,7 +268,6 @@ enum vcpu_sysreg {
 	VBAR_EL2,	/* Vector Base Address Register (EL2) */
 	RVBAR_EL2,	/* Reset Vector Base Address Register */
 	CONTEXTIDR_EL2,	/* Context ID Register (EL2) */
-	TPIDR_EL2,	/* EL2 Software Thread ID Register */
 	CNTHCTL_EL2,	/* Counter-timer Hypervisor Control register */
 	SP_EL2,		/* EL2 Stack Pointer */
 	CNTHP_CTL_EL2,
@@ -291,6 +275,42 @@ enum vcpu_sysreg {
 	CNTHV_CTL_EL2,
 	CNTHV_CVAL_EL2,
 
+	__VNCR_START__,	/* Any VNCR-capable reg goes after this point */
+
+	VNCR(SCTLR_EL1),/* System Control Register */
+	VNCR(ACTLR_EL1),/* Auxiliary Control Register */
+	VNCR(CPACR_EL1),/* Coprocessor Access Control */
+	VNCR(ZCR_EL1),	/* SVE Control */
+	VNCR(TTBR0_EL1),/* Translation Table Base Register 0 */
+	VNCR(TTBR1_EL1),/* Translation Table Base Register 1 */
+	VNCR(TCR_EL1),	/* Translation Control Register */
+	VNCR(ESR_EL1),	/* Exception Syndrome Register */
+	VNCR(AFSR0_EL1),/* Auxiliary Fault Status Register 0 */
+	VNCR(AFSR1_EL1),/* Auxiliary Fault Status Register 1 */
+	VNCR(FAR_EL1),	/* Fault Address Register */
+	VNCR(MAIR_EL1),	/* Memory Attribute Indirection Register */
+	VNCR(VBAR_EL1),	/* Vector Base Address Register */
+	VNCR(CONTEXTIDR_EL1),	/* Context ID Register */
+	VNCR(AMAIR_EL1),/* Aux Memory Attribute Indirection Register */
+	VNCR(MDSCR_EL1),/* Monitor Debug System Control Register */
+	VNCR(ELR_EL1),
+	VNCR(SP_EL1),
+	VNCR(SPSR_EL1),
+	VNCR(TFSR_EL1),	/* Tag Fault Status Register (EL1) */
+	VNCR(VPIDR_EL2),/* Virtualization Processor ID Register */
+	VNCR(VMPIDR_EL2),/* Virtualization Multiprocessor ID Register */
+	VNCR(HCR_EL2),	/* Hypervisor Configuration Register */
+	VNCR(HSTR_EL2),	/* Hypervisor System Trap Register */
+	VNCR(VTTBR_EL2),/* Virtualization Translation Table Base Register */
+	VNCR(VTCR_EL2),	/* Virtualization Translation Control Register */
+	VNCR(TPIDR_EL2),/* EL2 Software Thread ID Register */
+
+	VNCR(CNTVOFF_EL2),
+	VNCR(CNTV_CVAL_EL0),
+	VNCR(CNTV_CTL_EL0),
+	VNCR(CNTP_CVAL_EL0),
+	VNCR(CNTP_CTL_EL0),
+
 	NR_SYS_REGS	/* Nothing after this line! */
 };
 
@@ -307,6 +327,9 @@ struct kvm_cpu_context {
 	u64 sys_regs[NR_SYS_REGS];
 
 	struct kvm_vcpu *__hyp_running_vcpu;
+
+	/* This pointer has to be 4kB aligned. */
+	u64 *vncr_array;
 };
 
 struct kvm_pmu_events {
@@ -534,7 +557,13 @@ struct kvm_vcpu_arch {
  * for system registers that are never context switched, but only
  * emulated.
  */
-#define __ctxt_sys_reg(c,r)	(&(c)->sys_regs[(r)])
+static inline u64 *__ctxt_sys_reg(const struct kvm_cpu_context *ctxt, int r)
+{
+	if (unlikely(r >= __VNCR_START__ && ctxt->vncr_array))
+		return &ctxt->vncr_array[r - __VNCR_START__];
+
+	return (u64 *)&ctxt->sys_regs[r];
+}
 
 #define ctxt_sys_reg(c,r)	(*__ctxt_sys_reg(c,r))
 
-- 
2.30.2

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 378+ messages in thread

* [PATCH v6 57/64] KVM: arm64: nv: Map VNCR-capable registers to a separate page
@ 2022-01-28 12:19   ` Marc Zyngier
  0 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:19 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Jintack Lim, Haibo Xu,
	Ganapatrao Kulkarni, Chase Conklin, Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

With ARMv8.4-NV, registers that can be directly accessed in memory
by the guest have to live at architected offsets in a special page.

Let's annotate the sysreg enum to reflect the offset at which they
are in this page, whith a little twist:

If running on HW that doesn't have the ARMv8.4-NV feature, or even
a VM that doesn't use NV, we store all the system registers in the
usual sys_regs array. The only difference with the pre-8.4
situation is that VNCR-capable registers are at a "similar" offset
as in the VNCR page (we can compute the actual offset at compile
time), and that the sys_regs array is both bigger and sparse.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_host.h | 99 ++++++++++++++++++++-----------
 1 file changed, 64 insertions(+), 35 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 0ea8cf0eeca7..3f6ddccae310 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -26,6 +26,7 @@
 #include <asm/fpsimd.h>
 #include <asm/kvm.h>
 #include <asm/kvm_asm.h>
+#include <asm/vncr_mapping.h>
 
 #define __KVM_HAVE_ARCH_INTC_INITIALIZED
 
@@ -180,31 +181,32 @@ struct kvm_vcpu_fault_info {
 	u64 disr_el1;		/* Deferred [SError] Status Register */
 };
 
+/*
+ * VNCR() just places the VNCR_capable registers in the enum after
+ * __VNCR_START__, and the value (after correction) to be an 8-byte offset
+ * from the VNCR base. As we don't require the enum to be otherwise ordered,
+ * we need the terrible hack below to ensure that we correctly size the
+ * sys_regs array, no matter what.
+ *
+ * The __MAX__ macro has been lifted from Sean Eron Anderson's wonderful
+ * treasure trove of bit hacks:
+ * https://graphics.stanford.edu/~seander/bithacks.html#IntegerMinOrMax
+ */
+#define __MAX__(x,y)	((x) ^ (((x) ^ (y)) & -((x) < (y))))
+#define VNCR(r)						\
+	__before_##r,					\
+	r = __VNCR_START__ + ((VNCR_ ## r) / 8),	\
+	__after_##r = __MAX__(__before_##r - 1, r)
+
 enum vcpu_sysreg {
 	__INVALID_SYSREG__,   /* 0 is reserved as an invalid value */
 	MPIDR_EL1,	/* MultiProcessor Affinity Register */
 	CSSELR_EL1,	/* Cache Size Selection Register */
-	SCTLR_EL1,	/* System Control Register */
-	ACTLR_EL1,	/* Auxiliary Control Register */
-	CPACR_EL1,	/* Coprocessor Access Control */
-	ZCR_EL1,	/* SVE Control */
-	TTBR0_EL1,	/* Translation Table Base Register 0 */
-	TTBR1_EL1,	/* Translation Table Base Register 1 */
-	TCR_EL1,	/* Translation Control Register */
-	ESR_EL1,	/* Exception Syndrome Register */
-	AFSR0_EL1,	/* Auxiliary Fault Status Register 0 */
-	AFSR1_EL1,	/* Auxiliary Fault Status Register 1 */
-	FAR_EL1,	/* Fault Address Register */
-	MAIR_EL1,	/* Memory Attribute Indirection Register */
-	VBAR_EL1,	/* Vector Base Address Register */
-	CONTEXTIDR_EL1,	/* Context ID Register */
 	TPIDR_EL0,	/* Thread ID, User R/W */
 	TPIDRRO_EL0,	/* Thread ID, User R/O */
 	TPIDR_EL1,	/* Thread ID, Privileged */
-	AMAIR_EL1,	/* Aux Memory Attribute Indirection Register */
 	CNTKCTL_EL1,	/* Timer Control Register (EL1) */
 	PAR_EL1,	/* Physical Address Register */
-	MDSCR_EL1,	/* Monitor Debug System Control Register */
 	MDCCINT_EL1,	/* Monitor Debug Comms Channel Interrupt Enable Reg */
 	DISR_EL1,	/* Deferred Interrupt Status Register */
 
@@ -234,20 +236,9 @@ enum vcpu_sysreg {
 	APGAKEYLO_EL1,
 	APGAKEYHI_EL1,
 
-	ELR_EL1,
-	SP_EL1,
-	SPSR_EL1,
-
-	CNTVOFF_EL2,
-	CNTV_CVAL_EL0,
-	CNTV_CTL_EL0,
-	CNTP_CVAL_EL0,
-	CNTP_CTL_EL0,
-
 	/* Memory Tagging Extension registers */
 	RGSR_EL1,	/* Random Allocation Tag Seed Register */
 	GCR_EL1,	/* Tag Control Register */
-	TFSR_EL1,	/* Tag Fault Status Register (EL1) */
 	TFSRE0_EL1,	/* Tag Fault Status Register (EL0) */
 
 	/* 32bit specific registers. */
@@ -257,20 +248,14 @@ enum vcpu_sysreg {
 	DBGVCR32_EL2,	/* Debug Vector Catch Register */
 
 	/* EL2 registers */
-	VPIDR_EL2,	/* Virtualization Processor ID Register */
-	VMPIDR_EL2,	/* Virtualization Multiprocessor ID Register */
 	SCTLR_EL2,	/* System Control Register (EL2) */
 	ACTLR_EL2,	/* Auxiliary Control Register (EL2) */
-	HCR_EL2,	/* Hypervisor Configuration Register */
 	MDCR_EL2,	/* Monitor Debug Configuration Register (EL2) */
 	CPTR_EL2,	/* Architectural Feature Trap Register (EL2) */
-	HSTR_EL2,	/* Hypervisor System Trap Register */
 	HACR_EL2,	/* Hypervisor Auxiliary Control Register */
 	TTBR0_EL2,	/* Translation Table Base Register 0 (EL2) */
 	TTBR1_EL2,	/* Translation Table Base Register 1 (EL2) */
 	TCR_EL2,	/* Translation Control Register (EL2) */
-	VTTBR_EL2,	/* Virtualization Translation Table Base Register */
-	VTCR_EL2,	/* Virtualization Translation Control Register */
 	SPSR_EL2,	/* EL2 saved program status register */
 	ELR_EL2,	/* EL2 exception link register */
 	AFSR0_EL2,	/* Auxiliary Fault Status Register 0 (EL2) */
@@ -283,7 +268,6 @@ enum vcpu_sysreg {
 	VBAR_EL2,	/* Vector Base Address Register (EL2) */
 	RVBAR_EL2,	/* Reset Vector Base Address Register */
 	CONTEXTIDR_EL2,	/* Context ID Register (EL2) */
-	TPIDR_EL2,	/* EL2 Software Thread ID Register */
 	CNTHCTL_EL2,	/* Counter-timer Hypervisor Control register */
 	SP_EL2,		/* EL2 Stack Pointer */
 	CNTHP_CTL_EL2,
@@ -291,6 +275,42 @@ enum vcpu_sysreg {
 	CNTHV_CTL_EL2,
 	CNTHV_CVAL_EL2,
 
+	__VNCR_START__,	/* Any VNCR-capable reg goes after this point */
+
+	VNCR(SCTLR_EL1),/* System Control Register */
+	VNCR(ACTLR_EL1),/* Auxiliary Control Register */
+	VNCR(CPACR_EL1),/* Coprocessor Access Control */
+	VNCR(ZCR_EL1),	/* SVE Control */
+	VNCR(TTBR0_EL1),/* Translation Table Base Register 0 */
+	VNCR(TTBR1_EL1),/* Translation Table Base Register 1 */
+	VNCR(TCR_EL1),	/* Translation Control Register */
+	VNCR(ESR_EL1),	/* Exception Syndrome Register */
+	VNCR(AFSR0_EL1),/* Auxiliary Fault Status Register 0 */
+	VNCR(AFSR1_EL1),/* Auxiliary Fault Status Register 1 */
+	VNCR(FAR_EL1),	/* Fault Address Register */
+	VNCR(MAIR_EL1),	/* Memory Attribute Indirection Register */
+	VNCR(VBAR_EL1),	/* Vector Base Address Register */
+	VNCR(CONTEXTIDR_EL1),	/* Context ID Register */
+	VNCR(AMAIR_EL1),/* Aux Memory Attribute Indirection Register */
+	VNCR(MDSCR_EL1),/* Monitor Debug System Control Register */
+	VNCR(ELR_EL1),
+	VNCR(SP_EL1),
+	VNCR(SPSR_EL1),
+	VNCR(TFSR_EL1),	/* Tag Fault Status Register (EL1) */
+	VNCR(VPIDR_EL2),/* Virtualization Processor ID Register */
+	VNCR(VMPIDR_EL2),/* Virtualization Multiprocessor ID Register */
+	VNCR(HCR_EL2),	/* Hypervisor Configuration Register */
+	VNCR(HSTR_EL2),	/* Hypervisor System Trap Register */
+	VNCR(VTTBR_EL2),/* Virtualization Translation Table Base Register */
+	VNCR(VTCR_EL2),	/* Virtualization Translation Control Register */
+	VNCR(TPIDR_EL2),/* EL2 Software Thread ID Register */
+
+	VNCR(CNTVOFF_EL2),
+	VNCR(CNTV_CVAL_EL0),
+	VNCR(CNTV_CTL_EL0),
+	VNCR(CNTP_CVAL_EL0),
+	VNCR(CNTP_CTL_EL0),
+
 	NR_SYS_REGS	/* Nothing after this line! */
 };
 
@@ -307,6 +327,9 @@ struct kvm_cpu_context {
 	u64 sys_regs[NR_SYS_REGS];
 
 	struct kvm_vcpu *__hyp_running_vcpu;
+
+	/* This pointer has to be 4kB aligned. */
+	u64 *vncr_array;
 };
 
 struct kvm_pmu_events {
@@ -534,7 +557,13 @@ struct kvm_vcpu_arch {
  * for system registers that are never context switched, but only
  * emulated.
  */
-#define __ctxt_sys_reg(c,r)	(&(c)->sys_regs[(r)])
+static inline u64 *__ctxt_sys_reg(const struct kvm_cpu_context *ctxt, int r)
+{
+	if (unlikely(r >= __VNCR_START__ && ctxt->vncr_array))
+		return &ctxt->vncr_array[r - __VNCR_START__];
+
+	return (u64 *)&ctxt->sys_regs[r];
+}
 
 #define ctxt_sys_reg(c,r)	(*__ctxt_sys_reg(c,r))
 
-- 
2.30.2
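
The __MAX__()/VNCR() combination above is dense, so here is a standalone
sketch of what it buys us. This is not part of the patch: VNCR_FOO, VNCR_BAR
and the demo enum are made up for illustration, while the real offsets come
from asm/vncr_mapping.h. Each VNCR() entry jumps to a fixed, offset-derived
value, and the hidden __after_##r enumerator keeps the running counter at the
maximum seen so far, so the final "NR" constant still ends up one past the
highest index actually in use:

#include <stdio.h>

/* Branchless max: if x < y the mask is all ones and the XOR turns x into y;
 * otherwise the mask is zero and x is returned unchanged. */
#define __MAX__(x, y)	((x) ^ (((x) ^ (y)) & -((x) < (y))))

/* Hypothetical VNCR byte offsets, standing in for asm/vncr_mapping.h. */
#define VNCR_FOO	0x10
#define VNCR_BAR	0x08

#define VNCR(r)						\
	__before_##r,					\
	r = __VNCR_START__ + ((VNCR_ ## r) / 8),	\
	__after_##r = __MAX__(__before_##r - 1, r)

enum demo_sysreg {
	__VNCR_START__ = 4,	/* pretend 4 "normal" registers precede us */
	VNCR(FOO),		/* FOO = 4 + 0x10/8 = 6 */
	VNCR(BAR),		/* BAR = 4 + 0x08/8 = 5, i.e. out of order */
	NR_DEMO_REGS,		/* 7: one past the highest index in use (FOO) */
};

int main(void)
{
	printf("FOO=%d BAR=%d NR=%d\n", FOO, BAR, NR_DEMO_REGS);
	return 0;
}

The companion __ctxt_sys_reg() change then redirects any index at or above
__VNCR_START__ into the per-vCPU vncr_array page when one has been allocated,
so the same enum index transparently resolves to either the VNCR page or the
plain sys_regs[] backing store.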


^ permalink raw reply related	[flat|nested] 378+ messages in thread

* [PATCH v6 58/64] KVM: arm64: nv: Move nested vgic state into the sysreg file
  2022-01-28 12:18 ` Marc Zyngier
  (?)
@ 2022-01-28 12:19   ` Marc Zyngier
  -1 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:19 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Jintack Lim, Haibo Xu,
	Ganapatrao Kulkarni, Chase Conklin, Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

The vgic nested state needs to be accessible from the VNCR page, and
thus needs to be part of the normal sysreg file. Let's move it there.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_host.h    |  9 +++
 arch/arm64/kvm/sys_regs.c            | 53 +++++++++++------
 arch/arm64/kvm/vgic/vgic-v3-nested.c | 88 ++++++++++++++--------------
 arch/arm64/kvm/vgic/vgic-v3.c        | 17 ++++--
 arch/arm64/kvm/vgic/vgic.h           | 10 ++++
 include/kvm/arm_vgic.h               |  7 ---
 6 files changed, 110 insertions(+), 74 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 3f6ddccae310..15bbe8ccdefa 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -311,6 +311,15 @@ enum vcpu_sysreg {
 	VNCR(CNTP_CVAL_EL0),
 	VNCR(CNTP_CTL_EL0),
 
+	VNCR(ICH_LR0_EL2),
+	ICH_LR15_EL2 = ICH_LR0_EL2 + 15,
+	VNCR(ICH_AP0R0_EL2),
+	ICH_AP0R3_EL2 = ICH_AP0R0_EL2 + 3,
+	VNCR(ICH_AP1R0_EL2),
+	ICH_AP1R3_EL2 = ICH_AP1R0_EL2 + 3,
+	VNCR(ICH_HCR_EL2),
+	VNCR(ICH_VMCR_EL2),
+
 	NR_SYS_REGS	/* Nothing after this line! */
 };
 
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 32b8f4c1074a..351bcb429f25 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1865,17 +1865,17 @@ static bool access_gic_apr(struct kvm_vcpu *vcpu,
 			   struct sys_reg_params *p,
 			   const struct sys_reg_desc *r)
 {
-	struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.nested_vgic_v3;
-	u32 index, *base;
+	u64 *base;
+	u8 index;
 
 	index = r->Op2;
 	if (r->CRm == 8)
-		base = cpu_if->vgic_ap0r;
+		base = __ctxt_sys_reg(&vcpu->arch.ctxt, ICH_AP0R0_EL2);
 	else
-		base = cpu_if->vgic_ap1r;
+		base = __ctxt_sys_reg(&vcpu->arch.ctxt, ICH_AP1R0_EL2);
 
 	if (p->is_write)
-		base[index] = p->regval;
+		base[index] = lower_32_bits(p->regval);
 	else
 		p->regval = base[index];
 
@@ -1886,12 +1886,10 @@ static bool access_gic_hcr(struct kvm_vcpu *vcpu,
 			   struct sys_reg_params *p,
 			   const struct sys_reg_desc *r)
 {
-	struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.nested_vgic_v3;
-
 	if (p->is_write)
-		cpu_if->vgic_hcr = p->regval;
+		__vcpu_sys_reg(vcpu, ICH_HCR_EL2) = lower_32_bits(p->regval);
 	else
-		p->regval = cpu_if->vgic_hcr;
+		p->regval = __vcpu_sys_reg(vcpu, ICH_HCR_EL2);
 
 	return true;
 }
@@ -1948,12 +1946,19 @@ static bool access_gic_vmcr(struct kvm_vcpu *vcpu,
 			    struct sys_reg_params *p,
 			    const struct sys_reg_desc *r)
 {
-	struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.nested_vgic_v3;
-
 	if (p->is_write)
-		cpu_if->vgic_vmcr = p->regval;
+		__vcpu_sys_reg(vcpu, ICH_VMCR_EL2) = (p->regval	&
+						      (ICH_VMCR_ENG0_MASK	|
+						       ICH_VMCR_ENG1_MASK	|
+						       ICH_VMCR_PMR_MASK	|
+						       ICH_VMCR_BPR0_MASK	|
+						       ICH_VMCR_BPR1_MASK	|
+						       ICH_VMCR_EOIM_MASK	|
+						       ICH_VMCR_CBPR_MASK	|
+						       ICH_VMCR_FIQ_EN_MASK	|
+						       ICH_VMCR_ACK_CTL_MASK));
 	else
-		p->regval = cpu_if->vgic_vmcr;
+		p->regval = __vcpu_sys_reg(vcpu, ICH_VMCR_EL2);
 
 	return true;
 }
@@ -1962,17 +1967,29 @@ static bool access_gic_lr(struct kvm_vcpu *vcpu,
 			  struct sys_reg_params *p,
 			  const struct sys_reg_desc *r)
 {
-	struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.nested_vgic_v3;
 	u32 index;
+	u64 *base;
 
+	base = __ctxt_sys_reg(&vcpu->arch.ctxt, ICH_LR0_EL2);
 	index = p->Op2;
 	if (p->CRm == 13)
 		index += 8;
 
-	if (p->is_write)
-		cpu_if->vgic_lr[index] = p->regval;
-	else
-		p->regval = cpu_if->vgic_lr[index];
+	if (p->is_write) {
+		u64 mask = (ICH_LR_VIRTUAL_ID_MASK	|
+			    ICH_LR_GROUP		|
+			    ICH_LR_HW			|
+			    ICH_LR_STATE);
+
+		if (p->regval & ICH_LR_HW)
+			mask |= ICH_LR_PHYS_ID_MASK;
+		else
+			mask |= ICH_LR_EOI;
+
+		base[index] = p->regval & mask;
+	} else {
+		p->regval = base[index];
+	}
 
 	return true;
 }
diff --git a/arch/arm64/kvm/vgic/vgic-v3-nested.c b/arch/arm64/kvm/vgic/vgic-v3-nested.c
index 02b0b335f72a..5ffffddb5776 100644
--- a/arch/arm64/kvm/vgic/vgic-v3-nested.c
+++ b/arch/arm64/kvm/vgic/vgic-v3-nested.c
@@ -18,11 +18,6 @@
 #define CREATE_TRACE_POINTS
 #include "vgic-nested-trace.h"
 
-static inline struct vgic_v3_cpu_if *vcpu_nested_if(struct kvm_vcpu *vcpu)
-{
-	return &vcpu->arch.vgic_cpu.nested_vgic_v3;
-}
-
 static inline struct vgic_v3_cpu_if *vcpu_shadow_if(struct kvm_vcpu *vcpu)
 {
 	return &vcpu->arch.vgic_cpu.shadow_vgic_v3;
@@ -35,12 +30,11 @@ static inline bool lr_triggers_eoi(u64 lr)
 
 u16 vgic_v3_get_eisr(struct kvm_vcpu *vcpu)
 {
-	struct vgic_v3_cpu_if *cpu_if = vcpu_nested_if(vcpu);
 	u16 reg = 0;
 	int i;
 
 	for (i = 0; i < kvm_vgic_global_state.nr_lr; i++) {
-		if (lr_triggers_eoi(cpu_if->vgic_lr[i]))
+		if (lr_triggers_eoi(__vcpu_sys_reg(vcpu, ICH_LRN(i))))
 			reg |= BIT(i);
 	}
 
@@ -49,12 +43,11 @@ u16 vgic_v3_get_eisr(struct kvm_vcpu *vcpu)
 
 u16 vgic_v3_get_elrsr(struct kvm_vcpu *vcpu)
 {
-	struct vgic_v3_cpu_if *cpu_if = vcpu_nested_if(vcpu);
 	u16 reg = 0;
 	int i;
 
 	for (i = 0; i < kvm_vgic_global_state.nr_lr; i++) {
-		if (!(cpu_if->vgic_lr[i] & ICH_LR_STATE))
+		if (!(__vcpu_sys_reg(vcpu, ICH_LRN(i)) & ICH_LR_STATE))
 			reg |= BIT(i);
 	}
 
@@ -63,14 +56,13 @@ u16 vgic_v3_get_elrsr(struct kvm_vcpu *vcpu)
 
 u64 vgic_v3_get_misr(struct kvm_vcpu *vcpu)
 {
-	struct vgic_v3_cpu_if *cpu_if = vcpu_nested_if(vcpu);
 	int nr_lr = kvm_vgic_global_state.nr_lr;
 	u64 reg = 0;
 
 	if (vgic_v3_get_eisr(vcpu))
 		reg |= ICH_MISR_EOI;
 
-	if (cpu_if->vgic_hcr & ICH_HCR_UIE) {
+	if (__vcpu_sys_reg(vcpu, ICH_HCR_EL2) & ICH_HCR_UIE) {
 		int used_lrs;
 
 		used_lrs = nr_lr - hweight16(vgic_v3_get_elrsr(vcpu));
@@ -89,13 +81,12 @@ u64 vgic_v3_get_misr(struct kvm_vcpu *vcpu)
  */
 static void vgic_v3_create_shadow_lr(struct kvm_vcpu *vcpu)
 {
-	struct vgic_v3_cpu_if *cpu_if = vcpu_nested_if(vcpu);
 	struct vgic_v3_cpu_if *s_cpu_if = vcpu_shadow_if(vcpu);
 	struct vgic_irq *irq;
 	int i, used_lrs = 0;
 
 	for (i = 0; i < kvm_vgic_global_state.nr_lr; i++) {
-		u64 lr = cpu_if->vgic_lr[i];
+		u64 lr = __vcpu_sys_reg(vcpu, ICH_LRN(i));
 		int l1_irq;
 
 		if (!(lr & ICH_LR_HW))
@@ -125,36 +116,20 @@ static void vgic_v3_create_shadow_lr(struct kvm_vcpu *vcpu)
 	}
 
 	trace_vgic_create_shadow_lrs(vcpu, kvm_vgic_global_state.nr_lr,
-				     s_cpu_if->vgic_lr, cpu_if->vgic_lr);
+				     s_cpu_if->vgic_lr,
+				     __ctxt_sys_reg(&vcpu->arch.ctxt, ICH_LR0_EL2));
 
 	s_cpu_if->used_lrs = used_lrs;
 }
 
-/*
- * Change the shadow HWIRQ field back to the virtual value before copying over
- * the entire shadow struct to the nested state.
- */
-static void vgic_v3_fixup_shadow_lr_state(struct kvm_vcpu *vcpu)
-{
-	struct vgic_v3_cpu_if *cpu_if = vcpu_nested_if(vcpu);
-	struct vgic_v3_cpu_if *s_cpu_if = vcpu_shadow_if(vcpu);
-	int lr;
-
-	for (lr = 0; lr < kvm_vgic_global_state.nr_lr; lr++) {
-		s_cpu_if->vgic_lr[lr] &= ~ICH_LR_PHYS_ID_MASK;
-		s_cpu_if->vgic_lr[lr] |= cpu_if->vgic_lr[lr] & ICH_LR_PHYS_ID_MASK;
-	}
-}
-
 void vgic_v3_sync_nested(struct kvm_vcpu *vcpu)
 {
-	struct vgic_v3_cpu_if *cpu_if = vcpu_nested_if(vcpu);
 	struct vgic_v3_cpu_if *s_cpu_if = vcpu_shadow_if(vcpu);
 	struct vgic_irq *irq;
 	int i;
 
 	for (i = 0; i < s_cpu_if->used_lrs; i++) {
-		u64 lr = cpu_if->vgic_lr[i];
+		u64 lr = __vcpu_sys_reg(vcpu, ICH_LRN(i));
 		int l1_irq;
 
 		if (!(lr & ICH_LR_HW) || !(lr & ICH_LR_STATE))
@@ -180,14 +155,27 @@ void vgic_v3_sync_nested(struct kvm_vcpu *vcpu)
 	}
 }
 
+void vgic_v3_create_shadow_state(struct kvm_vcpu *vcpu)
+{
+	struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.shadow_vgic_v3;
+	int i;
+
+	cpu_if->vgic_hcr = __vcpu_sys_reg(vcpu, ICH_HCR_EL2);
+	cpu_if->vgic_vmcr = __vcpu_sys_reg(vcpu, ICH_VMCR_EL2);
+
+	for (i = 0; i < 4; i++) {
+		cpu_if->vgic_ap0r[i] = __vcpu_sys_reg(vcpu, ICH_AP0RN(i));
+		cpu_if->vgic_ap1r[i] = __vcpu_sys_reg(vcpu, ICH_AP1RN(i));
+	}
+
+	vgic_v3_create_shadow_lr(vcpu);
+}
+
 void vgic_v3_load_nested(struct kvm_vcpu *vcpu)
 {
-	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
 	struct vgic_irq *irq;
 	unsigned long flags;
 
-	vgic_cpu->shadow_vgic_v3 = vgic_cpu->nested_vgic_v3;
-	vgic_v3_create_shadow_lr(vcpu);
 	__vgic_v3_restore_state(vcpu_shadow_if(vcpu));
 
 	irq = vgic_get_irq(vcpu->kvm, vcpu, vcpu->kvm->arch.vgic.maint_irq);
@@ -201,26 +189,40 @@ void vgic_v3_load_nested(struct kvm_vcpu *vcpu)
 
 void vgic_v3_put_nested(struct kvm_vcpu *vcpu)
 {
-	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
+	struct vgic_v3_cpu_if *s_cpu_if = vcpu_shadow_if(vcpu);
+	int i;
 
-	__vgic_v3_save_state(vcpu_shadow_if(vcpu));
+	__vgic_v3_save_state(s_cpu_if);
 
-	trace_vgic_put_nested(vcpu, kvm_vgic_global_state.nr_lr,
-			      vcpu_shadow_if(vcpu)->vgic_lr);
+	trace_vgic_put_nested(vcpu, kvm_vgic_global_state.nr_lr, s_cpu_if->vgic_lr);
 
 	/*
 	 * Translate the shadow state HW fields back to the virtual ones
 	 * before copying the shadow struct back to the nested one.
 	 */
-	vgic_v3_fixup_shadow_lr_state(vcpu);
-	vgic_cpu->nested_vgic_v3 = vgic_cpu->shadow_vgic_v3;
+	__vcpu_sys_reg(vcpu, ICH_HCR_EL2) = s_cpu_if->vgic_hcr;
+	__vcpu_sys_reg(vcpu, ICH_VMCR_EL2) = s_cpu_if->vgic_vmcr;
+
+	for (i = 0; i < 4; i++) {
+		__vcpu_sys_reg(vcpu, ICH_AP0RN(i)) = s_cpu_if->vgic_ap0r[i];
+		__vcpu_sys_reg(vcpu, ICH_AP1RN(i)) = s_cpu_if->vgic_ap1r[i];
+	}
+
+	for (i = 0; i < kvm_vgic_global_state.nr_lr; i++) {
+		u64 val = __vcpu_sys_reg(vcpu, ICH_LRN(i));
+
+		val &= ~ICH_LR_STATE;
+		val |= s_cpu_if->vgic_lr[i] & ICH_LR_STATE;
+
+		__vcpu_sys_reg(vcpu, ICH_LRN(i)) = val;
+	}
+
 	irq_set_irqchip_state(kvm_vgic_global_state.maint_irq,
 			      IRQCHIP_STATE_ACTIVE, false);
 }
 
 void vgic_v3_handle_nested_maint_irq(struct kvm_vcpu *vcpu)
 {
-	struct vgic_v3_cpu_if *cpu_if = vcpu_nested_if(vcpu);
 	bool state;
 
 	/*
@@ -232,7 +234,7 @@ void vgic_v3_handle_nested_maint_irq(struct kvm_vcpu *vcpu)
 	if (!vgic_state_is_nested(vcpu))
 		return;
 
-	state  = cpu_if->vgic_hcr & ICH_HCR_EN;
+	state  = __vcpu_sys_reg(vcpu, ICH_HCR_EL2) & ICH_HCR_EN;
 	state &= vgic_v3_get_misr(vcpu);
 
 	kvm_vgic_inject_irq(vcpu->kvm, vcpu->vcpu_id,
diff --git a/arch/arm64/kvm/vgic/vgic-v3.c b/arch/arm64/kvm/vgic/vgic-v3.c
index 39e5c68087ea..f8ed60563587 100644
--- a/arch/arm64/kvm/vgic/vgic-v3.c
+++ b/arch/arm64/kvm/vgic/vgic-v3.c
@@ -280,10 +280,11 @@ void vgic_v3_enable(struct kvm_vcpu *vcpu)
 				     ICC_SRE_EL1_SRE);
 		/*
 		 * If nesting is allowed, force GICv3 onto the nested
-		 * guests as well.
+		 * guests as well by setting the shadow state to the
+		 * same value.
 		 */
 		if (vcpu_has_nv(vcpu))
-			vcpu->arch.vgic_cpu.nested_vgic_v3.vgic_sre = vgic_v3->vgic_sre;
+			vcpu->arch.vgic_cpu.shadow_vgic_v3.vgic_sre = vgic_v3->vgic_sre;
 		vcpu->arch.vgic_cpu.pendbaser = INITIAL_PENDBASER_VALUE;
 	} else {
 		vgic_v3->vgic_sre = 0;
@@ -715,11 +716,15 @@ void vgic_v3_load(struct kvm_vcpu *vcpu)
 	struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
 
 	/*
-	 * vgic_v3_load_nested only affects the LRs in the shadow
-	 * state, so it is fine to pass the nested state around.
+	 * If the vgic is in nested state, populate the shadow state
+	 * from the guest's nested state. As vgic_v3_load_nested()
+	 * will only load LRs, let's deal with the rest of the state
+	 * here as if it were a non-nested state. Cunning.
 	 */
-	if (vgic_state_is_nested(vcpu))
-		cpu_if = &vcpu->arch.vgic_cpu.nested_vgic_v3;
+	if (vgic_state_is_nested(vcpu)) {
+		vgic_v3_create_shadow_state(vcpu);
+		cpu_if = &vcpu->arch.vgic_cpu.shadow_vgic_v3;
+	}
 
 	/*
 	 * If dealing with a GICv2 emulation on GICv3, VMCR_EL2.VFIQen
diff --git a/arch/arm64/kvm/vgic/vgic.h b/arch/arm64/kvm/vgic/vgic.h
index 3fd6c86a7ef3..ffdfe7bd9aea 100644
--- a/arch/arm64/kvm/vgic/vgic.h
+++ b/arch/arm64/kvm/vgic/vgic.h
@@ -323,4 +323,14 @@ void vgic_v4_teardown(struct kvm *kvm);
 void vgic_v4_configure_vsgis(struct kvm *kvm);
 void vgic_v4_get_vlpi_state(struct vgic_irq *irq, bool *val);
 
+void vgic_v3_sync_nested(struct kvm_vcpu *vcpu);
+void vgic_v3_create_shadow_state(struct kvm_vcpu *vcpu);
+void vgic_v3_load_nested(struct kvm_vcpu *vcpu);
+void vgic_v3_put_nested(struct kvm_vcpu *vcpu);
+void vgic_v3_handle_nested_maint_irq(struct kvm_vcpu *vcpu);
+
+#define ICH_LRN(n)	(ICH_LR0_EL2 + (n))
+#define ICH_AP0RN(n)	(ICH_AP0R0_EL2 + (n))
+#define ICH_AP1RN(n)	(ICH_AP1R0_EL2 + (n))
+
 #endif
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index b44db6b013f5..43fef20f1671 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -330,9 +330,6 @@ struct vgic_cpu {
 
 	struct vgic_irq private_irqs[VGIC_NR_PRIVATE_IRQS];
 
-	/* CPU vif control registers for the virtual GICH interface */
-	struct vgic_v3_cpu_if	nested_vgic_v3;
-
 	/*
 	 * The shadow vif control register loaded to the hardware when
 	 * running a nested L2 guest with the virtual IMO/FMO bit set.
@@ -396,10 +393,6 @@ void kvm_vgic_load(struct kvm_vcpu *vcpu);
 void kvm_vgic_put(struct kvm_vcpu *vcpu);
 void kvm_vgic_vmcr_sync(struct kvm_vcpu *vcpu);
 
-void vgic_v3_sync_nested(struct kvm_vcpu *vcpu);
-void vgic_v3_load_nested(struct kvm_vcpu *vcpu);
-void vgic_v3_put_nested(struct kvm_vcpu *vcpu);
-void vgic_v3_handle_nested_maint_irq(struct kvm_vcpu *vcpu);
 u16 vgic_v3_get_eisr(struct kvm_vcpu *vcpu);
 u16 vgic_v3_get_elrsr(struct kvm_vcpu *vcpu);
 u64 vgic_v3_get_misr(struct kvm_vcpu *vcpu);
-- 
2.30.2
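
One detail worth calling out in the enum addition above: VNCR(ICH_LR0_EL2)
only declares the first List Register, and the explicit
ICH_LR15_EL2 = ICH_LR0_EL2 + 15 (plus the equivalent "+ 3" entries for the
APRs) reserves the remaining slots contiguously. That is what lets the
ICH_LRN()/ICH_AP0RN()/ICH_AP1RN() helpers added to vgic.h use plain
arithmetic. A minimal sketch of how the rest of the series consumes that
layout (the helper below is illustrative and not part of the patch;
__vcpu_sys_reg(), ICH_LRN() and ICH_LR_STATE are):

/* Illustrative only: walk the emulated List Registers via the sysreg file. */
static void demo_dump_nested_lrs(struct kvm_vcpu *vcpu)
{
	int i;

	for (i = 0; i < kvm_vgic_global_state.nr_lr; i++) {
		u64 lr = __vcpu_sys_reg(vcpu, ICH_LRN(i));	/* ICH_LR0_EL2 + i */

		if (lr & ICH_LR_STATE)	/* pending and/or active */
			kvm_debug("nested LR%d = %016llx\n", i, lr);
	}
}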


^ permalink raw reply related	[flat|nested] 378+ messages in thread

* [PATCH v6 59/64] KVM: arm64: Add ARMv8.4 Enhanced Nested Virt cpufeature
  2022-01-28 12:18 ` Marc Zyngier
  (?)
@ 2022-01-28 12:19   ` Marc Zyngier
  -1 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:19 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Jintack Lim, Haibo Xu,
	Ganapatrao Kulkarni, Chase Conklin, Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

Add the detection code for the ARMv8.4-NV feature.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_nested.h |  6 ++++++
 arch/arm64/kernel/cpufeature.c      | 10 ++++++++++
 arch/arm64/tools/cpucaps            |  1 +
 3 files changed, 17 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
index 3b5838a2f29e..57b14474a062 100644
--- a/arch/arm64/include/asm/kvm_nested.h
+++ b/arch/arm64/include/asm/kvm_nested.h
@@ -14,6 +14,12 @@ static inline bool vcpu_has_nv(const struct kvm_vcpu *vcpu)
 		test_bit(KVM_ARM_VCPU_HAS_EL2, vcpu->arch.features));
 }
 
+static inline bool vcpu_has_nv2(const struct kvm_vcpu *vcpu)
+{
+	return cpus_have_final_cap(ARM64_HAS_ENHANCED_NESTED_VIRT) &&
+		vcpu_has_nv(vcpu);
+}
+
 /* Translation helpers from non-VHE EL2 to EL1 */
 static inline u64 tcr_el2_ps_to_tcr_el1_ips(u64 tcr_el2)
 {
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 2fa39ce108d0..6e0577d3bf63 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -2029,6 +2029,16 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
 		.field_pos = ID_AA64MMFR2_NV_SHIFT,
 		.min_field_value = 1,
 	},
+	{
+		.desc = "Enhanced Nested Virtualization Support",
+		.capability = ARM64_HAS_ENHANCED_NESTED_VIRT,
+		.type = ARM64_CPUCAP_SYSTEM_FEATURE,
+		.matches = has_nested_virt_support,
+		.sys_reg = SYS_ID_AA64MMFR2_EL1,
+		.sign = FTR_UNSIGNED,
+		.field_pos = ID_AA64MMFR2_NV_SHIFT,
+		.min_field_value = 2,
+	},
 	{
 		.capability = ARM64_HAS_32BIT_EL0_DO_NOT_USE,
 		.type = ARM64_CPUCAP_SYSTEM_FEATURE,
diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps
index a49864b56a07..3f6f567d2811 100644
--- a/arch/arm64/tools/cpucaps
+++ b/arch/arm64/tools/cpucaps
@@ -19,6 +19,7 @@ HAS_DCPODP
 HAS_DCPOP
 HAS_E0PD
 HAS_ECV
+HAS_ENHANCED_NESTED_VIRT
 HAS_EPAN
 HAS_GENERIC_AUTH
 HAS_GENERIC_AUTH_ARCH
-- 
2.30.2
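
Both NV capabilities key off the same ID_AA64MMFR2_EL1.NV field, which is why
the new entry is identical to the existing one apart from min_field_value:
a field value of 1 advertises the base ARMv8.3-NV feature, while 2 or more
advertises the ARMv8.4 enhancement. A rough standalone illustration of the
check (the helper is made up; the real detection goes through
has_nested_virt_support() and the cpufeature framework as shown above):

/* Hypothetical standalone check mirroring min_field_value = 2 above. */
static bool demo_has_enhanced_nv(u64 mmfr2)
{
	u64 nv = (mmfr2 >> ID_AA64MMFR2_NV_SHIFT) & 0xf;	/* 4-bit field */

	return nv >= 2;		/* 0: no NV, 1: FEAT_NV, >= 2: FEAT_NV2 */
}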


^ permalink raw reply related	[flat|nested] 378+ messages in thread

* [PATCH v6 60/64] KVM: arm64: nv: Sync nested timer state with ARMv8.4
  2022-01-28 12:18 ` Marc Zyngier
  (?)
@ 2022-01-28 12:19   ` Marc Zyngier
  -1 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:19 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Jintack Lim, Haibo Xu,
	Ganapatrao Kulkarni, Chase Conklin, Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

From: Christoffer Dall <christoffer.dall@arm.com>

Emulating the ARMv8.4-NV timers is a bit odd, as the timers can
be reconfigured behind our back without the hypervisor even
noticing. In the VHE case, that's an actual regression in the
architecture...

Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/arch_timer.c  | 37 ++++++++++++++++++++++++++++++++++++
 arch/arm64/kvm/arm.c         |  3 +++
 include/kvm/arm_arch_timer.h |  1 +
 3 files changed, 41 insertions(+)

diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
index 5e4f93605d36..2371796b1ab5 100644
--- a/arch/arm64/kvm/arch_timer.c
+++ b/arch/arm64/kvm/arch_timer.c
@@ -785,6 +785,43 @@ void kvm_timer_vcpu_put(struct kvm_vcpu *vcpu)
 	set_cntvoff(0);
 }
 
+void kvm_timer_sync_nested(struct kvm_vcpu *vcpu)
+{
+	if (!is_hyp_ctxt(vcpu))
+		return;
+
+	/*
+	 * Guest hypervisors using ARMv8.4 enhanced nested virt support have
+	 * their EL1 timer register accesses redirected to the VNCR page.
+	 */
+	if (!vcpu_el2_e2h_is_set(vcpu)) {
+		/*
+		 * For a non-VHE guest hypervisor, we update the hardware
+		 * timer registers with the latest value written by the guest
+		 * to the VNCR page and let the hardware take care of the
+		 * rest.
+		 */
+		write_sysreg_el0(__vcpu_sys_reg(vcpu, CNTV_CTL_EL0),  SYS_CNTV_CTL);
+		write_sysreg_el0(__vcpu_sys_reg(vcpu, CNTV_CVAL_EL0), SYS_CNTV_CVAL);
+		write_sysreg_el0(__vcpu_sys_reg(vcpu, CNTP_CTL_EL0),  SYS_CNTP_CTL);
+		write_sysreg_el0(__vcpu_sys_reg(vcpu, CNTP_CVAL_EL0), SYS_CNTP_CVAL);
+	} else {
+		/*
+		 * For a VHE guest hypervisor, the emulated state (which
+		 * is stored in the VNCR page) could have been updated behind
+		 * our back, and we must reset the emulation of the timers.
+		 */
+
+		struct timer_map map;
+		get_timer_map(vcpu, &map);
+
+		soft_timer_cancel(&map.emul_vtimer->hrtimer);
+		soft_timer_cancel(&map.emul_ptimer->hrtimer);
+		timer_emulate(map.emul_vtimer);
+		timer_emulate(map.emul_ptimer);
+	}
+}
+
 /*
  * With a userspace irqchip we have to check if the guest de-asserted the
  * timer and if so, unmask the timer irq signal on the host interrupt
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index ac7d89c1e987..4c47a66eac8c 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -936,6 +936,9 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 		if (static_branch_unlikely(&userspace_irqchip_in_use))
 			kvm_timer_sync_user(vcpu);
 
+		if (vcpu_has_nv2(vcpu))
+			kvm_timer_sync_nested(vcpu);
+
 		kvm_arch_vcpu_ctxsync_fp(vcpu);
 
 		/*
diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
index 0a76dac8cb6a..89b08e5b456e 100644
--- a/include/kvm/arm_arch_timer.h
+++ b/include/kvm/arm_arch_timer.h
@@ -68,6 +68,7 @@ int kvm_timer_hyp_init(bool);
 int kvm_timer_enable(struct kvm_vcpu *vcpu);
 int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu);
 void kvm_timer_vcpu_init(struct kvm_vcpu *vcpu);
+void kvm_timer_sync_nested(struct kvm_vcpu *vcpu);
 void kvm_timer_sync_user(struct kvm_vcpu *vcpu);
 bool kvm_timer_should_notify_user(struct kvm_vcpu *vcpu);
 void kvm_timer_update_run(struct kvm_vcpu *vcpu);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 378+ messages in thread

* [PATCH v6 60/64] KVM: arm64: nv: Sync nested timer state with ARMv8.4
@ 2022-01-28 12:19   ` Marc Zyngier
  0 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:19 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: kernel-team, Andre Przywara, Christoffer Dall, Chase Conklin,
	Russell King (Oracle),
	mihai.carabas, Ganapatrao Kulkarni

From: Christoffer Dall <christoffer.dall@arm.com>

Emulating the ARMv8.4-NV timers is a bit odd, as the timers can
be reconfigured behind our back without the hypervisor even
noticing. In the VHE case, that's an actual regression in the
architecture...

Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/arch_timer.c  | 37 ++++++++++++++++++++++++++++++++++++
 arch/arm64/kvm/arm.c         |  3 +++
 include/kvm/arm_arch_timer.h |  1 +
 3 files changed, 41 insertions(+)

diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
index 5e4f93605d36..2371796b1ab5 100644
--- a/arch/arm64/kvm/arch_timer.c
+++ b/arch/arm64/kvm/arch_timer.c
@@ -785,6 +785,43 @@ void kvm_timer_vcpu_put(struct kvm_vcpu *vcpu)
 	set_cntvoff(0);
 }
 
+void kvm_timer_sync_nested(struct kvm_vcpu *vcpu)
+{
+	if (!is_hyp_ctxt(vcpu))
+		return;
+
+	/*
+	 * Guest hypervisors using ARMv8.4 enhanced nested virt support have
+	 * their EL1 timer register accesses redirected to the VNCR page.
+	 */
+	if (!vcpu_el2_e2h_is_set(vcpu)) {
+		/*
+		 * For a non-VHE guest hypervisor, we update the hardware
+		 * timer registers with the latest value written by the guest
+		 * to the VNCR page and let the hardware take care of the
+		 * rest.
+		 */
+		write_sysreg_el0(__vcpu_sys_reg(vcpu, CNTV_CTL_EL0),  SYS_CNTV_CTL);
+		write_sysreg_el0(__vcpu_sys_reg(vcpu, CNTV_CVAL_EL0), SYS_CNTV_CVAL);
+		write_sysreg_el0(__vcpu_sys_reg(vcpu, CNTP_CTL_EL0),  SYS_CNTP_CTL);
+		write_sysreg_el0(__vcpu_sys_reg(vcpu, CNTP_CVAL_EL0), SYS_CNTP_CVAL);
+	} else {
+		/*
+		 * For a VHE guest hypervisor, the emulated state (which
+		 * is stored in the VNCR page) could have been updated behind
+		 * our back, and we must reset the emulation of the timers.
+		 */
+
+		struct timer_map map;
+		get_timer_map(vcpu, &map);
+
+		soft_timer_cancel(&map.emul_vtimer->hrtimer);
+		soft_timer_cancel(&map.emul_ptimer->hrtimer);
+		timer_emulate(map.emul_vtimer);
+		timer_emulate(map.emul_ptimer);
+	}
+}
+
 /*
  * With a userspace irqchip we have to check if the guest de-asserted the
  * timer and if so, unmask the timer irq signal on the host interrupt
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index ac7d89c1e987..4c47a66eac8c 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -936,6 +936,9 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 		if (static_branch_unlikely(&userspace_irqchip_in_use))
 			kvm_timer_sync_user(vcpu);
 
+		if (vcpu_has_nv2(vcpu))
+			kvm_timer_sync_nested(vcpu);
+
 		kvm_arch_vcpu_ctxsync_fp(vcpu);
 
 		/*
diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
index 0a76dac8cb6a..89b08e5b456e 100644
--- a/include/kvm/arm_arch_timer.h
+++ b/include/kvm/arm_arch_timer.h
@@ -68,6 +68,7 @@ int kvm_timer_hyp_init(bool);
 int kvm_timer_enable(struct kvm_vcpu *vcpu);
 int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu);
 void kvm_timer_vcpu_init(struct kvm_vcpu *vcpu);
+void kvm_timer_sync_nested(struct kvm_vcpu *vcpu);
 void kvm_timer_sync_user(struct kvm_vcpu *vcpu);
 bool kvm_timer_should_notify_user(struct kvm_vcpu *vcpu);
 void kvm_timer_update_run(struct kvm_vcpu *vcpu);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 378+ messages in thread

* [PATCH v6 60/64] KVM: arm64: nv: Sync nested timer state with ARMv8.4
@ 2022-01-28 12:19   ` Marc Zyngier
  0 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:19 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Jintack Lim, Haibo Xu,
	Ganapatrao Kulkarni, Chase Conklin, Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

From: Christoffer Dall <christoffer.dall@arm.com>

Emulating the ARMv8.4-NV timers is a bit odd, as the timers can
be reconfigured behind our back without the hypervisor even
noticing. In the VHE case, that's an actual regression in the
architecture...

Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/arch_timer.c  | 37 ++++++++++++++++++++++++++++++++++++
 arch/arm64/kvm/arm.c         |  3 +++
 include/kvm/arm_arch_timer.h |  1 +
 3 files changed, 41 insertions(+)

diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
index 5e4f93605d36..2371796b1ab5 100644
--- a/arch/arm64/kvm/arch_timer.c
+++ b/arch/arm64/kvm/arch_timer.c
@@ -785,6 +785,43 @@ void kvm_timer_vcpu_put(struct kvm_vcpu *vcpu)
 	set_cntvoff(0);
 }
 
+void kvm_timer_sync_nested(struct kvm_vcpu *vcpu)
+{
+	if (!is_hyp_ctxt(vcpu))
+		return;
+
+	/*
+	 * Guest hypervisors using ARMv8.4 enhanced nested virt support have
+	 * their EL1 timer register accesses redirected to the VNCR page.
+	 */
+	if (!vcpu_el2_e2h_is_set(vcpu)) {
+		/*
+		 * For a non-VHE guest hypervisor, we update the hardware
+		 * timer registers with the latest value written by the guest
+		 * to the VNCR page and let the hardware take care of the
+		 * rest.
+		 */
+		write_sysreg_el0(__vcpu_sys_reg(vcpu, CNTV_CTL_EL0),  SYS_CNTV_CTL);
+		write_sysreg_el0(__vcpu_sys_reg(vcpu, CNTV_CVAL_EL0), SYS_CNTV_CVAL);
+		write_sysreg_el0(__vcpu_sys_reg(vcpu, CNTP_CTL_EL0),  SYS_CNTP_CTL);
+		write_sysreg_el0(__vcpu_sys_reg(vcpu, CNTP_CVAL_EL0), SYS_CNTP_CVAL);
+	} else {
+		/*
+		 * For a VHE guest hypervisor, the emulated state (which
+		 * is stored in the VNCR page) could have been updated behind
+		 * our back, and we must reset the emulation of the timers.
+		 */
+
+		struct timer_map map;
+		get_timer_map(vcpu, &map);
+
+		soft_timer_cancel(&map.emul_vtimer->hrtimer);
+		soft_timer_cancel(&map.emul_ptimer->hrtimer);
+		timer_emulate(map.emul_vtimer);
+		timer_emulate(map.emul_ptimer);
+	}
+}
+
 /*
  * With a userspace irqchip we have to check if the guest de-asserted the
  * timer and if so, unmask the timer irq signal on the host interrupt
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index ac7d89c1e987..4c47a66eac8c 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -936,6 +936,9 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 		if (static_branch_unlikely(&userspace_irqchip_in_use))
 			kvm_timer_sync_user(vcpu);
 
+		if (vcpu_has_nv2(vcpu))
+			kvm_timer_sync_nested(vcpu);
+
 		kvm_arch_vcpu_ctxsync_fp(vcpu);
 
 		/*
diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
index 0a76dac8cb6a..89b08e5b456e 100644
--- a/include/kvm/arm_arch_timer.h
+++ b/include/kvm/arm_arch_timer.h
@@ -68,6 +68,7 @@ int kvm_timer_hyp_init(bool);
 int kvm_timer_enable(struct kvm_vcpu *vcpu);
 int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu);
 void kvm_timer_vcpu_init(struct kvm_vcpu *vcpu);
+void kvm_timer_sync_nested(struct kvm_vcpu *vcpu);
 void kvm_timer_sync_user(struct kvm_vcpu *vcpu);
 bool kvm_timer_should_notify_user(struct kvm_vcpu *vcpu);
 void kvm_timer_update_run(struct kvm_vcpu *vcpu);
-- 
2.30.2



^ permalink raw reply related	[flat|nested] 378+ messages in thread

* [PATCH v6 61/64] KVM: arm64: nv: Allocate VNCR page when required
  2022-01-28 12:18 ` Marc Zyngier
  (?)
@ 2022-01-28 12:19   ` Marc Zyngier
  -1 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:19 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Jintack Lim, Haibo Xu,
	Ganapatrao Kulkarni, Chase Conklin, Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

If running an NV guest on an ARMv8.4-NV capable system, let's
allocate an additional page that will be used by the hypervisor
to fulfill system register accesses.
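
As a rough usage sketch (not part of the patch; it assumes HCR_EL2 is one
of the VNCR-backed registers in this series, which the NV2 handling relies
on), the __ctxt_sys_reg() change in the kvm_host.h hunk means a
VNCR-capable register silently resolves into the new page once it exists:

	u64 *r = __ctxt_sys_reg(&vcpu->arch.ctxt, HCR_EL2);
	/*
	 * On an ARMv8.4-NV system with a vncr_array, r points into that
	 * page - the same memory the CPU will later access directly once
	 * VNCR_EL2 is programmed. Otherwise it falls back to
	 * &ctxt->sys_regs[HCR_EL2].
	 */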

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_host.h |  3 ++-
 arch/arm64/kvm/nested.c           | 10 ++++++++++
 arch/arm64/kvm/reset.c            |  1 +
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 15bbe8ccdefa..5d69992106aa 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -568,7 +568,8 @@ struct kvm_vcpu_arch {
  */
 static inline u64 *__ctxt_sys_reg(const struct kvm_cpu_context *ctxt, int r)
 {
-	if (unlikely(r >= __VNCR_START__ && ctxt->vncr_array))
+	if (unlikely(cpus_have_final_cap(ARM64_HAS_ENHANCED_NESTED_VIRT) &&
+		     r >= __VNCR_START__ && ctxt->vncr_array))
 		return &ctxt->vncr_array[r - __VNCR_START__];
 
 	return (u64 *)&ctxt->sys_regs[r];
diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
index 8e99690cdde1..bbdd4633e665 100644
--- a/arch/arm64/kvm/nested.c
+++ b/arch/arm64/kvm/nested.c
@@ -35,6 +35,14 @@ int kvm_vcpu_init_nested(struct kvm_vcpu *vcpu)
 	if (!cpus_have_final_cap(ARM64_HAS_NESTED_VIRT))
 		return -EINVAL;
 
+	if (cpus_have_final_cap(ARM64_HAS_ENHANCED_NESTED_VIRT)) {
+		if (!vcpu->arch.ctxt.vncr_array)
+			vcpu->arch.ctxt.vncr_array = (u64 *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
+
+		if (!vcpu->arch.ctxt.vncr_array)
+			return -ENOMEM;
+	}
+
 	mutex_lock(&kvm->lock);
 
 	/*
@@ -64,6 +72,8 @@ int kvm_vcpu_init_nested(struct kvm_vcpu *vcpu)
 		    kvm_init_stage2_mmu(kvm, &tmp[num_mmus - 2])) {
 			kvm_free_stage2_pgd(&tmp[num_mmus - 1]);
 			kvm_free_stage2_pgd(&tmp[num_mmus - 2]);
+			free_page((unsigned long)vcpu->arch.ctxt.vncr_array);
+			vcpu->arch.ctxt.vncr_array = NULL;
 		} else {
 			kvm->arch.nested_mmus_size = num_mmus;
 			ret = 0;
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index d19a9aad2d85..f59c00fb53cc 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -161,6 +161,7 @@ void kvm_arm_vcpu_destroy(struct kvm_vcpu *vcpu)
 	if (sve_state)
 		kvm_unshare_hyp(sve_state, sve_state + vcpu_sve_state_size(vcpu));
 	kfree(sve_state);
+	free_page((unsigned long)vcpu->arch.ctxt.vncr_array);
 }
 
 static void kvm_vcpu_reset_sve(struct kvm_vcpu *vcpu)
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 378+ messages in thread

* [PATCH v6 61/64] KVM: arm64: nv: Allocate VNCR page when required
@ 2022-01-28 12:19   ` Marc Zyngier
  0 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:19 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: kernel-team, Andre Przywara, Christoffer Dall, Chase Conklin,
	Russell King (Oracle),
	mihai.carabas, Ganapatrao Kulkarni

If running an NV guest on an ARMv8.4-NV capable system, let's
allocate an additional page that will be used by the hypervisor
to fulfill system register accesses.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_host.h |  3 ++-
 arch/arm64/kvm/nested.c           | 10 ++++++++++
 arch/arm64/kvm/reset.c            |  1 +
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 15bbe8ccdefa..5d69992106aa 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -568,7 +568,8 @@ struct kvm_vcpu_arch {
  */
 static inline u64 *__ctxt_sys_reg(const struct kvm_cpu_context *ctxt, int r)
 {
-	if (unlikely(r >= __VNCR_START__ && ctxt->vncr_array))
+	if (unlikely(cpus_have_final_cap(ARM64_HAS_ENHANCED_NESTED_VIRT) &&
+		     r >= __VNCR_START__ && ctxt->vncr_array))
 		return &ctxt->vncr_array[r - __VNCR_START__];
 
 	return (u64 *)&ctxt->sys_regs[r];
diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
index 8e99690cdde1..bbdd4633e665 100644
--- a/arch/arm64/kvm/nested.c
+++ b/arch/arm64/kvm/nested.c
@@ -35,6 +35,14 @@ int kvm_vcpu_init_nested(struct kvm_vcpu *vcpu)
 	if (!cpus_have_final_cap(ARM64_HAS_NESTED_VIRT))
 		return -EINVAL;
 
+	if (cpus_have_final_cap(ARM64_HAS_ENHANCED_NESTED_VIRT)) {
+		if (!vcpu->arch.ctxt.vncr_array)
+			vcpu->arch.ctxt.vncr_array = (u64 *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
+
+		if (!vcpu->arch.ctxt.vncr_array)
+			return -ENOMEM;
+	}
+
 	mutex_lock(&kvm->lock);
 
 	/*
@@ -64,6 +72,8 @@ int kvm_vcpu_init_nested(struct kvm_vcpu *vcpu)
 		    kvm_init_stage2_mmu(kvm, &tmp[num_mmus - 2])) {
 			kvm_free_stage2_pgd(&tmp[num_mmus - 1]);
 			kvm_free_stage2_pgd(&tmp[num_mmus - 2]);
+			free_page((unsigned long)vcpu->arch.ctxt.vncr_array);
+			vcpu->arch.ctxt.vncr_array = NULL;
 		} else {
 			kvm->arch.nested_mmus_size = num_mmus;
 			ret = 0;
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index d19a9aad2d85..f59c00fb53cc 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -161,6 +161,7 @@ void kvm_arm_vcpu_destroy(struct kvm_vcpu *vcpu)
 	if (sve_state)
 		kvm_unshare_hyp(sve_state, sve_state + vcpu_sve_state_size(vcpu));
 	kfree(sve_state);
+	free_page((unsigned long)vcpu->arch.ctxt.vncr_array);
 }
 
 static void kvm_vcpu_reset_sve(struct kvm_vcpu *vcpu)
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 378+ messages in thread

* [PATCH v6 61/64] KVM: arm64: nv: Allocate VNCR page when required
@ 2022-01-28 12:19   ` Marc Zyngier
  0 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:19 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Jintack Lim, Haibo Xu,
	Ganapatrao Kulkarni, Chase Conklin, Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

If running an NV guest on an ARMv8.4-NV capable system, let's
allocate an additional page that will be used by the hypervisor
to fulfill system register accesses.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_host.h |  3 ++-
 arch/arm64/kvm/nested.c           | 10 ++++++++++
 arch/arm64/kvm/reset.c            |  1 +
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 15bbe8ccdefa..5d69992106aa 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -568,7 +568,8 @@ struct kvm_vcpu_arch {
  */
 static inline u64 *__ctxt_sys_reg(const struct kvm_cpu_context *ctxt, int r)
 {
-	if (unlikely(r >= __VNCR_START__ && ctxt->vncr_array))
+	if (unlikely(cpus_have_final_cap(ARM64_HAS_ENHANCED_NESTED_VIRT) &&
+		     r >= __VNCR_START__ && ctxt->vncr_array))
 		return &ctxt->vncr_array[r - __VNCR_START__];
 
 	return (u64 *)&ctxt->sys_regs[r];
diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
index 8e99690cdde1..bbdd4633e665 100644
--- a/arch/arm64/kvm/nested.c
+++ b/arch/arm64/kvm/nested.c
@@ -35,6 +35,14 @@ int kvm_vcpu_init_nested(struct kvm_vcpu *vcpu)
 	if (!cpus_have_final_cap(ARM64_HAS_NESTED_VIRT))
 		return -EINVAL;
 
+	if (cpus_have_final_cap(ARM64_HAS_ENHANCED_NESTED_VIRT)) {
+		if (!vcpu->arch.ctxt.vncr_array)
+			vcpu->arch.ctxt.vncr_array = (u64 *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
+
+		if (!vcpu->arch.ctxt.vncr_array)
+			return -ENOMEM;
+	}
+
 	mutex_lock(&kvm->lock);
 
 	/*
@@ -64,6 +72,8 @@ int kvm_vcpu_init_nested(struct kvm_vcpu *vcpu)
 		    kvm_init_stage2_mmu(kvm, &tmp[num_mmus - 2])) {
 			kvm_free_stage2_pgd(&tmp[num_mmus - 1]);
 			kvm_free_stage2_pgd(&tmp[num_mmus - 2]);
+			free_page((unsigned long)vcpu->arch.ctxt.vncr_array);
+			vcpu->arch.ctxt.vncr_array = NULL;
 		} else {
 			kvm->arch.nested_mmus_size = num_mmus;
 			ret = 0;
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index d19a9aad2d85..f59c00fb53cc 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -161,6 +161,7 @@ void kvm_arm_vcpu_destroy(struct kvm_vcpu *vcpu)
 	if (sve_state)
 		kvm_unshare_hyp(sve_state, sve_state + vcpu_sve_state_size(vcpu));
 	kfree(sve_state);
+	free_page((unsigned long)vcpu->arch.ctxt.vncr_array);
 }
 
 static void kvm_vcpu_reset_sve(struct kvm_vcpu *vcpu)
-- 
2.30.2



^ permalink raw reply related	[flat|nested] 378+ messages in thread

* [PATCH v6 62/64] KVM: arm64: nv: Enable ARMv8.4-NV support
  2022-01-28 12:18 ` Marc Zyngier
  (?)
@ 2022-01-28 12:19   ` Marc Zyngier
  -1 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:19 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Jintack Lim, Haibo Xu,
	Ganapatrao Kulkarni, Chase Conklin, Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

As all the VNCR-capable system registers are nicely separated
from the rest of the crowd, let's turn HCR_EL2.NV2 on and get
the ball rolling.
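
For illustration only (the offset macro is made up for the example; the
real layout is the architected VNCR one): with NV and NV2 set, and
VNCR_EL2 pointing at the vcpu's vncr_array, an EL2 system register access
from vEL2 no longer traps and turns into an ordinary memory access:

	/*
	 * vEL2 executes:
	 *	mrs	x0, vttbr_el2
	 * and the CPU performs, roughly:
	 */
	u64 val = *(u64 *)((char *)vncr_page + VNCR_VTTBR_EL2_OFFSET);
	/* VNCR_VTTBR_EL2_OFFSET is a hypothetical name for the architected offset */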

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_arm.h     |  1 +
 arch/arm64/include/asm/kvm_emulate.h | 23 +++++++++++++----------
 arch/arm64/include/asm/sysreg.h      |  1 +
 arch/arm64/kvm/hyp/vhe/switch.c      | 14 +++++++++++++-
 4 files changed, 28 insertions(+), 11 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index aa3bdce1b166..19998cf067ce 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -20,6 +20,7 @@
 #define HCR_AMVOFFEN	(UL(1) << 51)
 #define HCR_FIEN	(UL(1) << 47)
 #define HCR_FWB		(UL(1) << 46)
+#define HCR_NV2		(UL(1) << 45)
 #define HCR_AT		(UL(1) << 44)
 #define HCR_NV1		(UL(1) << 43)
 #define HCR_NV		(UL(1) << 42)
diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 2128d623a8b3..7c09d36fd593 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -245,21 +245,24 @@ static inline bool is_hyp_ctxt(const struct kvm_vcpu *vcpu)
 
 static inline u64 __fixup_spsr_el2_write(struct kvm_cpu_context *ctxt, u64 val)
 {
-	if (!__vcpu_el2_e2h_is_set(ctxt)) {
-		/*
-		 * Clear the .M field when writing SPSR to the CPU, so that we
-		 * can detect when the CPU clobbered our SPSR copy during a
-		 * local exception.
-		 */
-		val &= ~0xc;
-	}
+	struct kvm_vcpu *vcpu = container_of(ctxt, struct kvm_vcpu, arch.ctxt);
+
+	if (vcpu_has_nv2(vcpu) || __vcpu_el2_e2h_is_set(ctxt))
+		return val;
 
-	return val;
+	/*
+	 * Clear the .M field when writing SPSR to the CPU, so that we
+	 * can detect when the CPU clobbered our SPSR copy during a
+	 * local exception.
+	 */
+	return val &= ~0xc;
 }
 
 static inline u64 __fixup_spsr_el2_read(const struct kvm_cpu_context *ctxt, u64 val)
 {
-	if (__vcpu_el2_e2h_is_set(ctxt))
+	struct kvm_vcpu *vcpu = container_of(ctxt, struct kvm_vcpu, arch.ctxt);
+
+	if (vcpu_has_nv2(vcpu) || __vcpu_el2_e2h_is_set(ctxt))
 		return val;
 
 	/*
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index ff6d3af8ed34..0a21be5263d9 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -551,6 +551,7 @@
 #define SYS_TCR_EL2			sys_reg(3, 4, 2, 0, 2)
 #define SYS_VTTBR_EL2			sys_reg(3, 4, 2, 1, 0)
 #define SYS_VTCR_EL2			sys_reg(3, 4, 2, 1, 2)
+#define SYS_VNCR_EL2			sys_reg(3, 4, 2, 2, 0)
 
 #define SYS_ZCR_EL2			sys_reg(3, 4, 1, 2, 0)
 #define SYS_TRFCR_EL2			sys_reg(3, 4, 1, 2, 1)
diff --git a/arch/arm64/kvm/hyp/vhe/switch.c b/arch/arm64/kvm/hyp/vhe/switch.c
index 5e8eafac27c6..65120c9027d6 100644
--- a/arch/arm64/kvm/hyp/vhe/switch.c
+++ b/arch/arm64/kvm/hyp/vhe/switch.c
@@ -44,7 +44,13 @@ static void __activate_traps(struct kvm_vcpu *vcpu)
 			 * the EL1 virtual memory control register accesses
 			 * as well as the AT S1 operations.
 			 */
-			hcr |= HCR_TVM | HCR_TRVM | HCR_AT | HCR_TTLB | HCR_NV1;
+			if (vcpu_has_nv2(vcpu)) {
+				hcr &= ~HCR_TVM;
+			} else {
+				hcr |= HCR_TVM | HCR_TRVM | HCR_TTLB;
+			}
+
+			hcr |= HCR_AT | HCR_NV1;
 		} else {
 			/*
 			 * For a guest hypervisor on v8.1 (VHE), allow to
@@ -78,6 +84,12 @@ static void __activate_traps(struct kvm_vcpu *vcpu)
 			if (!vcpu_el2_tge_is_set(vcpu))
 				hcr |= HCR_AT | HCR_TTLB;
 		}
+
+		if (vcpu_has_nv2(vcpu)) {
+			hcr |= HCR_AT | HCR_TTLB | HCR_NV2;
+			write_sysreg_s(vcpu->arch.ctxt.vncr_array,
+				       SYS_VNCR_EL2);
+		}
 	} else if (vcpu_has_nv(vcpu)) {
 		u64 vhcr_el2 = __vcpu_sys_reg(vcpu, HCR_EL2);
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 378+ messages in thread

* [PATCH v6 62/64] KVM: arm64: nv: Enable ARMv8.4-NV support
@ 2022-01-28 12:19   ` Marc Zyngier
  0 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:19 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: kernel-team, Andre Przywara, Christoffer Dall, Chase Conklin,
	Russell King (Oracle),
	mihai.carabas, Ganapatrao Kulkarni

As all the VNCR-capable system registers are nicely separated
from the rest of the crowd, let's turn HCR_EL2.NV2 on and get
the ball rolling.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_arm.h     |  1 +
 arch/arm64/include/asm/kvm_emulate.h | 23 +++++++++++++----------
 arch/arm64/include/asm/sysreg.h      |  1 +
 arch/arm64/kvm/hyp/vhe/switch.c      | 14 +++++++++++++-
 4 files changed, 28 insertions(+), 11 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index aa3bdce1b166..19998cf067ce 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -20,6 +20,7 @@
 #define HCR_AMVOFFEN	(UL(1) << 51)
 #define HCR_FIEN	(UL(1) << 47)
 #define HCR_FWB		(UL(1) << 46)
+#define HCR_NV2		(UL(1) << 45)
 #define HCR_AT		(UL(1) << 44)
 #define HCR_NV1		(UL(1) << 43)
 #define HCR_NV		(UL(1) << 42)
diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 2128d623a8b3..7c09d36fd593 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -245,21 +245,24 @@ static inline bool is_hyp_ctxt(const struct kvm_vcpu *vcpu)
 
 static inline u64 __fixup_spsr_el2_write(struct kvm_cpu_context *ctxt, u64 val)
 {
-	if (!__vcpu_el2_e2h_is_set(ctxt)) {
-		/*
-		 * Clear the .M field when writing SPSR to the CPU, so that we
-		 * can detect when the CPU clobbered our SPSR copy during a
-		 * local exception.
-		 */
-		val &= ~0xc;
-	}
+	struct kvm_vcpu *vcpu = container_of(ctxt, struct kvm_vcpu, arch.ctxt);
+
+	if (vcpu_has_nv2(vcpu) || __vcpu_el2_e2h_is_set(ctxt))
+		return val;
 
-	return val;
+	/*
+	 * Clear the .M field when writing SPSR to the CPU, so that we
+	 * can detect when the CPU clobbered our SPSR copy during a
+	 * local exception.
+	 */
+	return val &= ~0xc;
 }
 
 static inline u64 __fixup_spsr_el2_read(const struct kvm_cpu_context *ctxt, u64 val)
 {
-	if (__vcpu_el2_e2h_is_set(ctxt))
+	struct kvm_vcpu *vcpu = container_of(ctxt, struct kvm_vcpu, arch.ctxt);
+
+	if (vcpu_has_nv2(vcpu) || __vcpu_el2_e2h_is_set(ctxt))
 		return val;
 
 	/*
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index ff6d3af8ed34..0a21be5263d9 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -551,6 +551,7 @@
 #define SYS_TCR_EL2			sys_reg(3, 4, 2, 0, 2)
 #define SYS_VTTBR_EL2			sys_reg(3, 4, 2, 1, 0)
 #define SYS_VTCR_EL2			sys_reg(3, 4, 2, 1, 2)
+#define SYS_VNCR_EL2			sys_reg(3, 4, 2, 2, 0)
 
 #define SYS_ZCR_EL2			sys_reg(3, 4, 1, 2, 0)
 #define SYS_TRFCR_EL2			sys_reg(3, 4, 1, 2, 1)
diff --git a/arch/arm64/kvm/hyp/vhe/switch.c b/arch/arm64/kvm/hyp/vhe/switch.c
index 5e8eafac27c6..65120c9027d6 100644
--- a/arch/arm64/kvm/hyp/vhe/switch.c
+++ b/arch/arm64/kvm/hyp/vhe/switch.c
@@ -44,7 +44,13 @@ static void __activate_traps(struct kvm_vcpu *vcpu)
 			 * the EL1 virtual memory control register accesses
 			 * as well as the AT S1 operations.
 			 */
-			hcr |= HCR_TVM | HCR_TRVM | HCR_AT | HCR_TTLB | HCR_NV1;
+			if (vcpu_has_nv2(vcpu)) {
+				hcr &= ~HCR_TVM;
+			} else {
+				hcr |= HCR_TVM | HCR_TRVM | HCR_TTLB;
+			}
+
+			hcr |= HCR_AT | HCR_NV1;
 		} else {
 			/*
 			 * For a guest hypervisor on v8.1 (VHE), allow to
@@ -78,6 +84,12 @@ static void __activate_traps(struct kvm_vcpu *vcpu)
 			if (!vcpu_el2_tge_is_set(vcpu))
 				hcr |= HCR_AT | HCR_TTLB;
 		}
+
+		if (vcpu_has_nv2(vcpu)) {
+			hcr |= HCR_AT | HCR_TTLB | HCR_NV2;
+			write_sysreg_s(vcpu->arch.ctxt.vncr_array,
+				       SYS_VNCR_EL2);
+		}
 	} else if (vcpu_has_nv(vcpu)) {
 		u64 vhcr_el2 = __vcpu_sys_reg(vcpu, HCR_EL2);
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 378+ messages in thread

* [PATCH v6 62/64] KVM: arm64: nv: Enable ARMv8.4-NV support
@ 2022-01-28 12:19   ` Marc Zyngier
  0 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:19 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Jintack Lim, Haibo Xu,
	Ganapatrao Kulkarni, Chase Conklin, Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

As all the VNCR-capable system registers are nicely separated
from the rest of the crowd, let's turn HCR_EL2.NV2 on and get
the ball rolling.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_arm.h     |  1 +
 arch/arm64/include/asm/kvm_emulate.h | 23 +++++++++++++----------
 arch/arm64/include/asm/sysreg.h      |  1 +
 arch/arm64/kvm/hyp/vhe/switch.c      | 14 +++++++++++++-
 4 files changed, 28 insertions(+), 11 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index aa3bdce1b166..19998cf067ce 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -20,6 +20,7 @@
 #define HCR_AMVOFFEN	(UL(1) << 51)
 #define HCR_FIEN	(UL(1) << 47)
 #define HCR_FWB		(UL(1) << 46)
+#define HCR_NV2		(UL(1) << 45)
 #define HCR_AT		(UL(1) << 44)
 #define HCR_NV1		(UL(1) << 43)
 #define HCR_NV		(UL(1) << 42)
diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 2128d623a8b3..7c09d36fd593 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -245,21 +245,24 @@ static inline bool is_hyp_ctxt(const struct kvm_vcpu *vcpu)
 
 static inline u64 __fixup_spsr_el2_write(struct kvm_cpu_context *ctxt, u64 val)
 {
-	if (!__vcpu_el2_e2h_is_set(ctxt)) {
-		/*
-		 * Clear the .M field when writing SPSR to the CPU, so that we
-		 * can detect when the CPU clobbered our SPSR copy during a
-		 * local exception.
-		 */
-		val &= ~0xc;
-	}
+	struct kvm_vcpu *vcpu = container_of(ctxt, struct kvm_vcpu, arch.ctxt);
+
+	if (vcpu_has_nv2(vcpu) || __vcpu_el2_e2h_is_set(ctxt))
+		return val;
 
-	return val;
+	/*
+	 * Clear the .M field when writing SPSR to the CPU, so that we
+	 * can detect when the CPU clobbered our SPSR copy during a
+	 * local exception.
+	 */
+	return val &= ~0xc;
 }
 
 static inline u64 __fixup_spsr_el2_read(const struct kvm_cpu_context *ctxt, u64 val)
 {
-	if (__vcpu_el2_e2h_is_set(ctxt))
+	struct kvm_vcpu *vcpu = container_of(ctxt, struct kvm_vcpu, arch.ctxt);
+
+	if (vcpu_has_nv2(vcpu) || __vcpu_el2_e2h_is_set(ctxt))
 		return val;
 
 	/*
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index ff6d3af8ed34..0a21be5263d9 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -551,6 +551,7 @@
 #define SYS_TCR_EL2			sys_reg(3, 4, 2, 0, 2)
 #define SYS_VTTBR_EL2			sys_reg(3, 4, 2, 1, 0)
 #define SYS_VTCR_EL2			sys_reg(3, 4, 2, 1, 2)
+#define SYS_VNCR_EL2			sys_reg(3, 4, 2, 2, 0)
 
 #define SYS_ZCR_EL2			sys_reg(3, 4, 1, 2, 0)
 #define SYS_TRFCR_EL2			sys_reg(3, 4, 1, 2, 1)
diff --git a/arch/arm64/kvm/hyp/vhe/switch.c b/arch/arm64/kvm/hyp/vhe/switch.c
index 5e8eafac27c6..65120c9027d6 100644
--- a/arch/arm64/kvm/hyp/vhe/switch.c
+++ b/arch/arm64/kvm/hyp/vhe/switch.c
@@ -44,7 +44,13 @@ static void __activate_traps(struct kvm_vcpu *vcpu)
 			 * the EL1 virtual memory control register accesses
 			 * as well as the AT S1 operations.
 			 */
-			hcr |= HCR_TVM | HCR_TRVM | HCR_AT | HCR_TTLB | HCR_NV1;
+			if (vcpu_has_nv2(vcpu)) {
+				hcr &= ~HCR_TVM;
+			} else {
+				hcr |= HCR_TVM | HCR_TRVM | HCR_TTLB;
+			}
+
+			hcr |= HCR_AT | HCR_NV1;
 		} else {
 			/*
 			 * For a guest hypervisor on v8.1 (VHE), allow to
@@ -78,6 +84,12 @@ static void __activate_traps(struct kvm_vcpu *vcpu)
 			if (!vcpu_el2_tge_is_set(vcpu))
 				hcr |= HCR_AT | HCR_TTLB;
 		}
+
+		if (vcpu_has_nv2(vcpu)) {
+			hcr |= HCR_AT | HCR_TTLB | HCR_NV2;
+			write_sysreg_s(vcpu->arch.ctxt.vncr_array,
+				       SYS_VNCR_EL2);
+		}
 	} else if (vcpu_has_nv(vcpu)) {
 		u64 vhcr_el2 = __vcpu_sys_reg(vcpu, HCR_EL2);
 
-- 
2.30.2



^ permalink raw reply related	[flat|nested] 378+ messages in thread

* [PATCH v6 63/64] KVM: arm64: nv: Fast-track 'InHost' exception returns
  2022-01-28 12:18 ` Marc Zyngier
  (?)
@ 2022-01-28 12:19   ` Marc Zyngier
  -1 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:19 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Jintack Lim, Haibo Xu,
	Ganapatrao Kulkarni, Chase Conklin, Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

A significant part of the ARMv8.3-NV extension is to trap ERET
instructions so that the hypervisor gets a chance to switch
from a vEL2 L1 guest to an EL1 L2 guest.

But this also has the unfortunate consequence of trapping ERET
in unsuspecting circumstances, such as staying at vEL2 (interrupt
handling while being in the guest hypervisor), or returning to host
userspace in the case of a VHE guest.

Although we already make some effort to handle these ERETs more quickly
by not doing the put/load dance, it is still way too far down the
line for it to be efficient enough.

For these cases, it would be ideal to ERET directly, no questions asked.
Of course, we can't do that. But the next best thing is to do it as
early as possible, in fixup_guest_exit(), much as we would handle
FPSIMD exceptions.
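
To make the fast-tracked case concrete (illustrative only), consider a
VHE guest hypervisor (vEL2, with E2H=TGE=1) returning to its own
userspace:

	/*
	 *	msr	elr_el2, x0	// return address
	 *	msr	spsr_el2, x1	// PSR_MODE_EL0t
	 *	eret			// still traps, as HCR_EL2.NV is set
	 *
	 * The ERET now lands in kvm_hyp_handle_eret() straight from
	 * fixup_guest_exit(): SPSR/ELR are fixed up from the EL1 copies
	 * and the guest is re-entered, with no put/load and no S2 switch.
	 */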

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/emulate-nested.c | 29 +++------------------
 arch/arm64/kvm/hyp/vhe/switch.c | 46 +++++++++++++++++++++++++++++++++
 2 files changed, 49 insertions(+), 26 deletions(-)

diff --git a/arch/arm64/kvm/emulate-nested.c b/arch/arm64/kvm/emulate-nested.c
index 1f6cf8fe9fe3..3f3b1296b4ac 100644
--- a/arch/arm64/kvm/emulate-nested.c
+++ b/arch/arm64/kvm/emulate-nested.c
@@ -76,8 +76,7 @@ static u64 kvm_check_illegal_exception_return(struct kvm_vcpu *vcpu, u64 spsr)
 
 void kvm_emulate_nested_eret(struct kvm_vcpu *vcpu)
 {
-	u64 spsr, elr, mode;
-	bool direct_eret;
+	u64 spsr, elr;
 
 	/*
 	 * Forward this trap to the virtual EL2 if the virtual
@@ -86,33 +85,11 @@ void kvm_emulate_nested_eret(struct kvm_vcpu *vcpu)
 	if (forward_nv_traps(vcpu))
 		return;
 
-	/*
-	 * Going through the whole put/load motions is a waste of time
-	 * if this is a VHE guest hypervisor returning to its own
-	 * userspace, or the hypervisor performing a local exception
-	 * return. No need to save/restore registers, no need to
-	 * switch S2 MMU. Just do the canonical ERET.
-	 */
-	spsr = vcpu_read_sys_reg(vcpu, SPSR_EL2);
-	spsr = kvm_check_illegal_exception_return(vcpu, spsr);
-
-	mode = spsr & (PSR_MODE_MASK | PSR_MODE32_BIT);
-
-	direct_eret  = (mode == PSR_MODE_EL0t &&
-			vcpu_el2_e2h_is_set(vcpu) &&
-			vcpu_el2_tge_is_set(vcpu));
-	direct_eret |= (mode == PSR_MODE_EL2h || mode == PSR_MODE_EL2t);
-
-	if (direct_eret) {
-		*vcpu_pc(vcpu) = vcpu_read_sys_reg(vcpu, ELR_EL2);
-		*vcpu_cpsr(vcpu) = spsr;
-		trace_kvm_nested_eret(vcpu, *vcpu_pc(vcpu), spsr);
-		return;
-	}
-
 	preempt_disable();
 	kvm_arch_vcpu_put(vcpu);
 
+	spsr = __vcpu_sys_reg(vcpu, SPSR_EL2);
+	spsr = kvm_check_illegal_exception_return(vcpu, spsr);
 	elr = __vcpu_sys_reg(vcpu, ELR_EL2);
 
 	trace_kvm_nested_eret(vcpu, elr, spsr);
diff --git a/arch/arm64/kvm/hyp/vhe/switch.c b/arch/arm64/kvm/hyp/vhe/switch.c
index 65120c9027d6..f469bf51b4be 100644
--- a/arch/arm64/kvm/hyp/vhe/switch.c
+++ b/arch/arm64/kvm/hyp/vhe/switch.c
@@ -161,6 +161,51 @@ void deactivate_traps_vhe_put(struct kvm_vcpu *vcpu)
 	__deactivate_traps_common(vcpu);
 }
 
+static bool kvm_hyp_handle_eret(struct kvm_vcpu *vcpu, u64 *exit_code)
+{
+	struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
+	u64 spsr, mode;
+
+	/*
+	 * Going through the whole put/load motions is a waste of time
+	 * if this is a VHE guest hypervisor returning to its own
+	 * userspace, or the hypervisor performing a local exception
+	 * return. No need to save/restore registers, no need to
+	 * switch S2 MMU. Just do the canonical ERET.
+	 *
+	 * Unless the trap has to be forwarded further down the line,
+	 * of course...
+	 */
+	if (__vcpu_sys_reg(vcpu, HCR_EL2) & HCR_NV)
+		return false;
+
+	spsr = read_sysreg_el1(SYS_SPSR);
+	spsr = __fixup_spsr_el2_read(ctxt, spsr);
+	mode = spsr & (PSR_MODE_MASK | PSR_MODE32_BIT);
+
+	switch (mode) {
+	case PSR_MODE_EL0t:
+		if (!(vcpu_el2_e2h_is_set(vcpu) && vcpu_el2_tge_is_set(vcpu)))
+			return false;
+		break;
+	case PSR_MODE_EL2t:
+		mode = PSR_MODE_EL1t;
+		break;
+	case PSR_MODE_EL2h:
+		mode = PSR_MODE_EL1h;
+		break;
+	default:
+		return false;
+	}
+
+	spsr = (spsr & ~(PSR_MODE_MASK | PSR_MODE32_BIT)) | mode;
+
+	write_sysreg_el2(spsr, SYS_SPSR);
+	write_sysreg_el2(read_sysreg_el1(SYS_ELR), SYS_ELR);
+
+	return true;
+}
+
 static const exit_handler_fn hyp_exit_handlers[] = {
 	[0 ... ESR_ELx_EC_MAX]		= NULL,
 	[ESR_ELx_EC_CP15_32]		= kvm_hyp_handle_cp15_32,
@@ -170,6 +215,7 @@ static const exit_handler_fn hyp_exit_handlers[] = {
 	[ESR_ELx_EC_IABT_LOW]		= kvm_hyp_handle_iabt_low,
 	[ESR_ELx_EC_DABT_LOW]		= kvm_hyp_handle_dabt_low,
 	[ESR_ELx_EC_PAC]		= kvm_hyp_handle_ptrauth,
+	[ESR_ELx_EC_ERET]		= kvm_hyp_handle_eret,
 };
 
 static const exit_handler_fn *kvm_get_exit_handler_array(struct kvm_vcpu *vcpu)
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 378+ messages in thread

* [PATCH v6 63/64] KVM: arm64: nv: Fast-track 'InHost' exception returns
@ 2022-01-28 12:19   ` Marc Zyngier
  0 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:19 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: kernel-team, Andre Przywara, Christoffer Dall, Chase Conklin,
	Russell King (Oracle),
	mihai.carabas, Ganapatrao Kulkarni

A significant part of the ARMv8.3-NV extension is to trap ERET
instructions so that the hypervisor gets a chance to switch
from a vEL2 L1 guest to an EL1 L2 guest.

But this also has the unfortunate consequence of trapping ERET
in unsuspecting circumstances, such as staying at vEL2 (interrupt
handling while being in the guest hypervisor), or returning to host
userspace in the case of a VHE guest.

Although we already make some effort to handle these ERETs more quickly
by not doing the put/load dance, it is still way too far down the
line for it to be efficient enough.

For these cases, it would be ideal to ERET directly, no questions asked.
Of course, we can't do that. But the next best thing is to do it as
early as possible, in fixup_guest_exit(), much as we would handle
FPSIMD exceptions.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/emulate-nested.c | 29 +++------------------
 arch/arm64/kvm/hyp/vhe/switch.c | 46 +++++++++++++++++++++++++++++++++
 2 files changed, 49 insertions(+), 26 deletions(-)

diff --git a/arch/arm64/kvm/emulate-nested.c b/arch/arm64/kvm/emulate-nested.c
index 1f6cf8fe9fe3..3f3b1296b4ac 100644
--- a/arch/arm64/kvm/emulate-nested.c
+++ b/arch/arm64/kvm/emulate-nested.c
@@ -76,8 +76,7 @@ static u64 kvm_check_illegal_exception_return(struct kvm_vcpu *vcpu, u64 spsr)
 
 void kvm_emulate_nested_eret(struct kvm_vcpu *vcpu)
 {
-	u64 spsr, elr, mode;
-	bool direct_eret;
+	u64 spsr, elr;
 
 	/*
 	 * Forward this trap to the virtual EL2 if the virtual
@@ -86,33 +85,11 @@ void kvm_emulate_nested_eret(struct kvm_vcpu *vcpu)
 	if (forward_nv_traps(vcpu))
 		return;
 
-	/*
-	 * Going through the whole put/load motions is a waste of time
-	 * if this is a VHE guest hypervisor returning to its own
-	 * userspace, or the hypervisor performing a local exception
-	 * return. No need to save/restore registers, no need to
-	 * switch S2 MMU. Just do the canonical ERET.
-	 */
-	spsr = vcpu_read_sys_reg(vcpu, SPSR_EL2);
-	spsr = kvm_check_illegal_exception_return(vcpu, spsr);
-
-	mode = spsr & (PSR_MODE_MASK | PSR_MODE32_BIT);
-
-	direct_eret  = (mode == PSR_MODE_EL0t &&
-			vcpu_el2_e2h_is_set(vcpu) &&
-			vcpu_el2_tge_is_set(vcpu));
-	direct_eret |= (mode == PSR_MODE_EL2h || mode == PSR_MODE_EL2t);
-
-	if (direct_eret) {
-		*vcpu_pc(vcpu) = vcpu_read_sys_reg(vcpu, ELR_EL2);
-		*vcpu_cpsr(vcpu) = spsr;
-		trace_kvm_nested_eret(vcpu, *vcpu_pc(vcpu), spsr);
-		return;
-	}
-
 	preempt_disable();
 	kvm_arch_vcpu_put(vcpu);
 
+	spsr = __vcpu_sys_reg(vcpu, SPSR_EL2);
+	spsr = kvm_check_illegal_exception_return(vcpu, spsr);
 	elr = __vcpu_sys_reg(vcpu, ELR_EL2);
 
 	trace_kvm_nested_eret(vcpu, elr, spsr);
diff --git a/arch/arm64/kvm/hyp/vhe/switch.c b/arch/arm64/kvm/hyp/vhe/switch.c
index 65120c9027d6..f469bf51b4be 100644
--- a/arch/arm64/kvm/hyp/vhe/switch.c
+++ b/arch/arm64/kvm/hyp/vhe/switch.c
@@ -161,6 +161,51 @@ void deactivate_traps_vhe_put(struct kvm_vcpu *vcpu)
 	__deactivate_traps_common(vcpu);
 }
 
+static bool kvm_hyp_handle_eret(struct kvm_vcpu *vcpu, u64 *exit_code)
+{
+	struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
+	u64 spsr, mode;
+
+	/*
+	 * Going through the whole put/load motions is a waste of time
+	 * if this is a VHE guest hypervisor returning to its own
+	 * userspace, or the hypervisor performing a local exception
+	 * return. No need to save/restore registers, no need to
+	 * switch S2 MMU. Just do the canonical ERET.
+	 *
+	 * Unless the trap has to be forwarded further down the line,
+	 * of course...
+	 */
+	if (__vcpu_sys_reg(vcpu, HCR_EL2) & HCR_NV)
+		return false;
+
+	spsr = read_sysreg_el1(SYS_SPSR);
+	spsr = __fixup_spsr_el2_read(ctxt, spsr);
+	mode = spsr & (PSR_MODE_MASK | PSR_MODE32_BIT);
+
+	switch (mode) {
+	case PSR_MODE_EL0t:
+		if (!(vcpu_el2_e2h_is_set(vcpu) && vcpu_el2_tge_is_set(vcpu)))
+			return false;
+		break;
+	case PSR_MODE_EL2t:
+		mode = PSR_MODE_EL1t;
+		break;
+	case PSR_MODE_EL2h:
+		mode = PSR_MODE_EL1h;
+		break;
+	default:
+		return false;
+	}
+
+	spsr = (spsr & ~(PSR_MODE_MASK | PSR_MODE32_BIT)) | mode;
+
+	write_sysreg_el2(spsr, SYS_SPSR);
+	write_sysreg_el2(read_sysreg_el1(SYS_ELR), SYS_ELR);
+
+	return true;
+}
+
 static const exit_handler_fn hyp_exit_handlers[] = {
 	[0 ... ESR_ELx_EC_MAX]		= NULL,
 	[ESR_ELx_EC_CP15_32]		= kvm_hyp_handle_cp15_32,
@@ -170,6 +215,7 @@ static const exit_handler_fn hyp_exit_handlers[] = {
 	[ESR_ELx_EC_IABT_LOW]		= kvm_hyp_handle_iabt_low,
 	[ESR_ELx_EC_DABT_LOW]		= kvm_hyp_handle_dabt_low,
 	[ESR_ELx_EC_PAC]		= kvm_hyp_handle_ptrauth,
+	[ESR_ELx_EC_ERET]		= kvm_hyp_handle_eret,
 };
 
 static const exit_handler_fn *kvm_get_exit_handler_array(struct kvm_vcpu *vcpu)
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 378+ messages in thread

* [PATCH v6 63/64] KVM: arm64: nv: Fast-track 'InHost' exception returns
@ 2022-01-28 12:19   ` Marc Zyngier
  0 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:19 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Jintack Lim, Haibo Xu,
	Ganapatrao Kulkarni, Chase Conklin, Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

A significant part of the ARMv8.3-NV extension is to trap ERET
instructions so that the hypervisor gets a chance to switch
from a vEL2 L1 guest to an EL1 L2 guest.

But this also has the unfortunate consequence of trapping ERET
in unsuspecting circumstances, such as staying at vEL2 (interrupt
handling while being in the guest hypervisor), or returning to host
userspace in the case of a VHE guest.

Although we already make some effort to handle these ERETs more quickly
by not doing the put/load dance, it is still way too far down the
line for it to be efficient enough.

For these cases, it would be ideal to ERET directly, no questions asked.
Of course, we can't do that. But the next best thing is to do it as
early as possible, in fixup_guest_exit(), much as we would handle
FPSIMD exceptions.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/emulate-nested.c | 29 +++------------------
 arch/arm64/kvm/hyp/vhe/switch.c | 46 +++++++++++++++++++++++++++++++++
 2 files changed, 49 insertions(+), 26 deletions(-)

diff --git a/arch/arm64/kvm/emulate-nested.c b/arch/arm64/kvm/emulate-nested.c
index 1f6cf8fe9fe3..3f3b1296b4ac 100644
--- a/arch/arm64/kvm/emulate-nested.c
+++ b/arch/arm64/kvm/emulate-nested.c
@@ -76,8 +76,7 @@ static u64 kvm_check_illegal_exception_return(struct kvm_vcpu *vcpu, u64 spsr)
 
 void kvm_emulate_nested_eret(struct kvm_vcpu *vcpu)
 {
-	u64 spsr, elr, mode;
-	bool direct_eret;
+	u64 spsr, elr;
 
 	/*
 	 * Forward this trap to the virtual EL2 if the virtual
@@ -86,33 +85,11 @@ void kvm_emulate_nested_eret(struct kvm_vcpu *vcpu)
 	if (forward_nv_traps(vcpu))
 		return;
 
-	/*
-	 * Going through the whole put/load motions is a waste of time
-	 * if this is a VHE guest hypervisor returning to its own
-	 * userspace, or the hypervisor performing a local exception
-	 * return. No need to save/restore registers, no need to
-	 * switch S2 MMU. Just do the canonical ERET.
-	 */
-	spsr = vcpu_read_sys_reg(vcpu, SPSR_EL2);
-	spsr = kvm_check_illegal_exception_return(vcpu, spsr);
-
-	mode = spsr & (PSR_MODE_MASK | PSR_MODE32_BIT);
-
-	direct_eret  = (mode == PSR_MODE_EL0t &&
-			vcpu_el2_e2h_is_set(vcpu) &&
-			vcpu_el2_tge_is_set(vcpu));
-	direct_eret |= (mode == PSR_MODE_EL2h || mode == PSR_MODE_EL2t);
-
-	if (direct_eret) {
-		*vcpu_pc(vcpu) = vcpu_read_sys_reg(vcpu, ELR_EL2);
-		*vcpu_cpsr(vcpu) = spsr;
-		trace_kvm_nested_eret(vcpu, *vcpu_pc(vcpu), spsr);
-		return;
-	}
-
 	preempt_disable();
 	kvm_arch_vcpu_put(vcpu);
 
+	spsr = __vcpu_sys_reg(vcpu, SPSR_EL2);
+	spsr = kvm_check_illegal_exception_return(vcpu, spsr);
 	elr = __vcpu_sys_reg(vcpu, ELR_EL2);
 
 	trace_kvm_nested_eret(vcpu, elr, spsr);
diff --git a/arch/arm64/kvm/hyp/vhe/switch.c b/arch/arm64/kvm/hyp/vhe/switch.c
index 65120c9027d6..f469bf51b4be 100644
--- a/arch/arm64/kvm/hyp/vhe/switch.c
+++ b/arch/arm64/kvm/hyp/vhe/switch.c
@@ -161,6 +161,51 @@ void deactivate_traps_vhe_put(struct kvm_vcpu *vcpu)
 	__deactivate_traps_common(vcpu);
 }
 
+static bool kvm_hyp_handle_eret(struct kvm_vcpu *vcpu, u64 *exit_code)
+{
+	struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
+	u64 spsr, mode;
+
+	/*
+	 * Going through the whole put/load motions is a waste of time
+	 * if this is a VHE guest hypervisor returning to its own
+	 * userspace, or the hypervisor performing a local exception
+	 * return. No need to save/restore registers, no need to
+	 * switch S2 MMU. Just do the canonical ERET.
+	 *
+	 * Unless the trap has to be forwarded further down the line,
+	 * of course...
+	 */
+	if (__vcpu_sys_reg(vcpu, HCR_EL2) & HCR_NV)
+		return false;
+
+	spsr = read_sysreg_el1(SYS_SPSR);
+	spsr = __fixup_spsr_el2_read(ctxt, spsr);
+	mode = spsr & (PSR_MODE_MASK | PSR_MODE32_BIT);
+
+	switch (mode) {
+	case PSR_MODE_EL0t:
+		if (!(vcpu_el2_e2h_is_set(vcpu) && vcpu_el2_tge_is_set(vcpu)))
+			return false;
+		break;
+	case PSR_MODE_EL2t:
+		mode = PSR_MODE_EL1t;
+		break;
+	case PSR_MODE_EL2h:
+		mode = PSR_MODE_EL1h;
+		break;
+	default:
+		return false;
+	}
+
+	spsr = (spsr & ~(PSR_MODE_MASK | PSR_MODE32_BIT)) | mode;
+
+	write_sysreg_el2(spsr, SYS_SPSR);
+	write_sysreg_el2(read_sysreg_el1(SYS_ELR), SYS_ELR);
+
+	return true;
+}
+
 static const exit_handler_fn hyp_exit_handlers[] = {
 	[0 ... ESR_ELx_EC_MAX]		= NULL,
 	[ESR_ELx_EC_CP15_32]		= kvm_hyp_handle_cp15_32,
@@ -170,6 +215,7 @@ static const exit_handler_fn hyp_exit_handlers[] = {
 	[ESR_ELx_EC_IABT_LOW]		= kvm_hyp_handle_iabt_low,
 	[ESR_ELx_EC_DABT_LOW]		= kvm_hyp_handle_dabt_low,
 	[ESR_ELx_EC_PAC]		= kvm_hyp_handle_ptrauth,
+	[ESR_ELx_EC_ERET]		= kvm_hyp_handle_eret,
 };
 
 static const exit_handler_fn *kvm_get_exit_handler_array(struct kvm_vcpu *vcpu)
-- 
2.30.2



^ permalink raw reply related	[flat|nested] 378+ messages in thread

* [PATCH v6 64/64] KVM: arm64: nv: Fast-track EL1 TLBIs for VHE guests
  2022-01-28 12:18 ` Marc Zyngier
  (?)
@ 2022-01-28 12:19   ` Marc Zyngier
  -1 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:19 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: Andre Przywara, Christoffer Dall, Jintack Lim, Haibo Xu,
	Ganapatrao Kulkarni, Chase Conklin, Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

Due to the way ARMv8.4-NV suppresses traps when accessing EL2
system registers, we can't track when the guest changes its
HCR_EL2.TGE setting. This means we always trap EL1 TLBIs,
even if they don't affect any guest.

This obviously has a huge impact on performance, as we handle
TLBI traps as a normal exit, and a normal VHE host issues
thousands of TLBIs when booting (and quite a few when running
userspace).

A cheap way to reduce the overhead is to handle the limited
case of {E2H,TGE}=={1,1} as a guest fixup, as we already have
the right mmu configuration in place. Just execute the decoded
instruction right away and return to the guest.
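
To illustrate the case being fast-tracked (a sketch only, not part of the
patch): a VHE guest hypervisor running with {E2H,TGE}=={1,1} and
invalidating one of its own EL1&0 mappings still traps, but can now be
replayed on the spot:

	/*
	 * vEL2 executes:
	 *	tlbi	vae1is, x3
	 * which traps because HCR_EL2.TTLB stays set. The MMU context
	 * currently loaded is already the guest hypervisor's, so the
	 * handler decodes the instruction and replays it in place:
	 */
	__kvm_tlb_el1_instr(NULL, val, instr);	/* NULL mmu: no VMID switch */
	__kvm_skip_instr(vcpu);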

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/hyp/vhe/switch.c | 43 ++++++++++++++++++++++++++++++++-
 arch/arm64/kvm/hyp/vhe/tlb.c    |  6 +++--
 arch/arm64/kvm/sys_regs.c       | 25 ++++++-------------
 3 files changed, 54 insertions(+), 20 deletions(-)

diff --git a/arch/arm64/kvm/hyp/vhe/switch.c b/arch/arm64/kvm/hyp/vhe/switch.c
index f469bf51b4be..686b37579d3b 100644
--- a/arch/arm64/kvm/hyp/vhe/switch.c
+++ b/arch/arm64/kvm/hyp/vhe/switch.c
@@ -161,6 +161,47 @@ void deactivate_traps_vhe_put(struct kvm_vcpu *vcpu)
 	__deactivate_traps_common(vcpu);
 }
 
+static bool kvm_hyp_handle_tlbi_el1(struct kvm_vcpu *vcpu, u64 *exit_code)
+{
+	u32 instr;
+	u64 val;
+
+	/*
+	 * Ideally, we would never trap on EL1 TLB invalidations when the
+	 * guest's HCR_EL2.{E2H,TGE} == {1,1}. But "thanks" to ARMv8.4, we
+	 * don't trap writes to HCR_EL2, meaning that we can't track
+	 * changes to the virtual TGE bit. So we leave HCR_EL2.TTLB set on
+	 * the host. Oopsie...
+	 *
+	 * In order to speed-up EL1 TLBIs from the vEL2 guest when TGE is
+	 * set, try and handle these invalidation as quickly as possible,
+	 * without fully exiting (unless this needs forwarding).
+	 */
+	if (!vcpu_has_nv2(vcpu) ||
+	    !vcpu_is_el2(vcpu) ||
+	    (__vcpu_sys_reg(vcpu, HCR_EL2) & (HCR_E2H | HCR_TGE)) != (HCR_E2H | HCR_TGE))
+		return false;
+
+	instr = esr_sys64_to_sysreg(kvm_vcpu_get_esr(vcpu));
+	if (sys_reg_Op0(instr) != TLBI_Op0 ||
+	    sys_reg_Op1(instr) != TLBI_Op1_EL1)
+		return false;
+
+	val = vcpu_get_reg(vcpu, kvm_vcpu_sys_get_rt(vcpu));
+	__kvm_tlb_el1_instr(NULL, val, instr);
+	__kvm_skip_instr(vcpu);
+
+	return true;
+}
+
+static bool kvm_hyp_handle_sysreg_vhe(struct kvm_vcpu *vcpu, u64 *exit_code)
+{
+	if (kvm_hyp_handle_tlbi_el1(vcpu, exit_code))
+		return true;
+
+	return kvm_hyp_handle_sysreg(vcpu, exit_code);
+}
+
 static bool kvm_hyp_handle_eret(struct kvm_vcpu *vcpu, u64 *exit_code)
 {
 	struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
@@ -209,7 +250,7 @@ static bool kvm_hyp_handle_eret(struct kvm_vcpu *vcpu, u64 *exit_code)
 static const exit_handler_fn hyp_exit_handlers[] = {
 	[0 ... ESR_ELx_EC_MAX]		= NULL,
 	[ESR_ELx_EC_CP15_32]		= kvm_hyp_handle_cp15_32,
-	[ESR_ELx_EC_SYS64]		= kvm_hyp_handle_sysreg,
+	[ESR_ELx_EC_SYS64]		= kvm_hyp_handle_sysreg_vhe,
 	[ESR_ELx_EC_SVE]		= kvm_hyp_handle_fpsimd,
 	[ESR_ELx_EC_FP_ASIMD]		= kvm_hyp_handle_fpsimd,
 	[ESR_ELx_EC_IABT_LOW]		= kvm_hyp_handle_iabt_low,
diff --git a/arch/arm64/kvm/hyp/vhe/tlb.c b/arch/arm64/kvm/hyp/vhe/tlb.c
index c4389db4cc22..beb162468c0b 100644
--- a/arch/arm64/kvm/hyp/vhe/tlb.c
+++ b/arch/arm64/kvm/hyp/vhe/tlb.c
@@ -201,7 +201,8 @@ void __kvm_tlb_el1_instr(struct kvm_s2_mmu *mmu, u64 val, u64 sys_encoding)
 	dsb(ishst);
 
 	/* Switch to requested VMID */
-	__tlb_switch_to_guest(mmu, &cxt);
+	if (mmu)
+		__tlb_switch_to_guest(mmu, &cxt);
 
 	/*
 	 * Execute the same instruction as the guest hypervisor did,
@@ -240,5 +241,6 @@ void __kvm_tlb_el1_instr(struct kvm_s2_mmu *mmu, u64 val, u64 sys_encoding)
 	dsb(ish);
 	isb();
 
-	__tlb_switch_to_host(&cxt);
+	if (mmu)
+		__tlb_switch_to_host(&cxt);
 }
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 351bcb429f25..5023ae566f7f 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -2839,6 +2839,8 @@ static bool handle_tlbi_el1(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
 			    const struct sys_reg_desc *r)
 {
 	u32 sys_encoding = sys_insn(p->Op0, p->Op1, p->CRn, p->CRm, p->Op2);
+	u64 virtual_vttbr = vcpu_read_sys_reg(vcpu, VTTBR_EL2);
+	struct kvm_s2_mmu *mmu;
 
 	if (vcpu_has_nv(vcpu) && forward_traps(vcpu, HCR_TTLB))
 		return false;
@@ -2860,24 +2862,13 @@ static bool handle_tlbi_el1(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
 
 	mutex_lock(&vcpu->kvm->lock);
 
-	if ((__vcpu_sys_reg(vcpu, HCR_EL2) & (HCR_E2H | HCR_TGE)) != (HCR_E2H | HCR_TGE)) {
-		u64 virtual_vttbr = vcpu_read_sys_reg(vcpu, VTTBR_EL2);
-		struct kvm_s2_mmu *mmu;
-
-		mmu = lookup_s2_mmu(vcpu->kvm, virtual_vttbr, HCR_VM);
-		if (mmu)
-			__kvm_tlb_el1_instr(mmu, p->regval, sys_encoding);
+	mmu = lookup_s2_mmu(vcpu->kvm, virtual_vttbr, HCR_VM);
+	if (mmu)
+		__kvm_tlb_el1_instr(mmu, p->regval, sys_encoding);
 
-		mmu = lookup_s2_mmu(vcpu->kvm, virtual_vttbr, 0);
-		if (mmu)
-			__kvm_tlb_el1_instr(mmu, p->regval, sys_encoding);
-	} else {
-		/*
-		 * ARMv8.4-NV allows the guest to change TGE behind
-		 * our back, so we always trap EL1 TLBIs from vEL2...
-		 */
-		__kvm_tlb_el1_instr(&vcpu->kvm->arch.mmu, p->regval, sys_encoding);
-	}
+	mmu = lookup_s2_mmu(vcpu->kvm, virtual_vttbr, 0);
+	if (mmu)
+		__kvm_tlb_el1_instr(mmu, p->regval, sys_encoding);
 
 	mutex_unlock(&vcpu->kvm->lock);
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 378+ messages in thread

* [PATCH v6 64/64] KVM: arm64: nv: Fast-track EL1 TLBIs for VHE guests
@ 2022-01-28 12:19   ` Marc Zyngier
  0 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-01-28 12:19 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, kvm
  Cc: kernel-team, Andre Przywara, Christoffer Dall, Chase Conklin,
	Russell King (Oracle),
	mihai.carabas, Ganapatrao Kulkarni

Due to the way ARMv8.4-NV suppresses traps when accessing EL2
system registers, we can't track when the guest changes its
HCR_EL2.TGE setting. This means we always trap EL1 TLBIs,
even if they don't affect any guest.

This obviously has a huge impact on performance, as we handle
TLBI traps as a normal exit, and a normal VHE host issues
thousands of TLBIs when booting (and quite a few when running
userspace).

A cheap way to reduce the overhead is to handle the limited
case of {E2H,TGE}=={1,1} as a guest fixup, as we already have
the right mmu configuration in place. Just execute the decoded
instruction right away and return to the guest.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/hyp/vhe/switch.c | 43 ++++++++++++++++++++++++++++++++-
 arch/arm64/kvm/hyp/vhe/tlb.c    |  6 +++--
 arch/arm64/kvm/sys_regs.c       | 25 ++++++-------------
 3 files changed, 54 insertions(+), 20 deletions(-)

diff --git a/arch/arm64/kvm/hyp/vhe/switch.c b/arch/arm64/kvm/hyp/vhe/switch.c
index f469bf51b4be..686b37579d3b 100644
--- a/arch/arm64/kvm/hyp/vhe/switch.c
+++ b/arch/arm64/kvm/hyp/vhe/switch.c
@@ -161,6 +161,47 @@ void deactivate_traps_vhe_put(struct kvm_vcpu *vcpu)
 	__deactivate_traps_common(vcpu);
 }
 
+static bool kvm_hyp_handle_tlbi_el1(struct kvm_vcpu *vcpu, u64 *exit_code)
+{
+	u32 instr;
+	u64 val;
+
+	/*
+	 * Ideally, we would never trap on EL1 TLB invalidations when the
+	 * guest's HCR_EL2.{E2H,TGE} == {1,1}. But "thanks" to ARMv8.4, we
+	 * don't trap writes to HCR_EL2, meaning that we can't track
+	 * changes to the virtual TGE bit. So we leave HCR_EL2.TTLB set on
+	 * the host. Oopsie...
+	 *
+	 * In order to speed-up EL1 TLBIs from the vEL2 guest when TGE is
+	 * set, try and handle these invalidation as quickly as possible,
+	 * without fully exiting (unless this needs forwarding).
+	 */
+	if (!vcpu_has_nv2(vcpu) ||
+	    !vcpu_is_el2(vcpu) ||
+	    (__vcpu_sys_reg(vcpu, HCR_EL2) & (HCR_E2H | HCR_TGE)) != (HCR_E2H | HCR_TGE))
+		return false;
+
+	instr = esr_sys64_to_sysreg(kvm_vcpu_get_esr(vcpu));
+	if (sys_reg_Op0(instr) != TLBI_Op0 ||
+	    sys_reg_Op1(instr) != TLBI_Op1_EL1)
+		return false;
+
+	val = vcpu_get_reg(vcpu, kvm_vcpu_sys_get_rt(vcpu));
+	__kvm_tlb_el1_instr(NULL, val, instr);
+	__kvm_skip_instr(vcpu);
+
+	return true;
+}
+
+static bool kvm_hyp_handle_sysreg_vhe(struct kvm_vcpu *vcpu, u64 *exit_code)
+{
+	if (kvm_hyp_handle_tlbi_el1(vcpu, exit_code))
+		return true;
+
+	return kvm_hyp_handle_sysreg(vcpu, exit_code);
+}
+
 static bool kvm_hyp_handle_eret(struct kvm_vcpu *vcpu, u64 *exit_code)
 {
 	struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
@@ -209,7 +250,7 @@ static bool kvm_hyp_handle_eret(struct kvm_vcpu *vcpu, u64 *exit_code)
 static const exit_handler_fn hyp_exit_handlers[] = {
 	[0 ... ESR_ELx_EC_MAX]		= NULL,
 	[ESR_ELx_EC_CP15_32]		= kvm_hyp_handle_cp15_32,
-	[ESR_ELx_EC_SYS64]		= kvm_hyp_handle_sysreg,
+	[ESR_ELx_EC_SYS64]		= kvm_hyp_handle_sysreg_vhe,
 	[ESR_ELx_EC_SVE]		= kvm_hyp_handle_fpsimd,
 	[ESR_ELx_EC_FP_ASIMD]		= kvm_hyp_handle_fpsimd,
 	[ESR_ELx_EC_IABT_LOW]		= kvm_hyp_handle_iabt_low,
diff --git a/arch/arm64/kvm/hyp/vhe/tlb.c b/arch/arm64/kvm/hyp/vhe/tlb.c
index c4389db4cc22..beb162468c0b 100644
--- a/arch/arm64/kvm/hyp/vhe/tlb.c
+++ b/arch/arm64/kvm/hyp/vhe/tlb.c
@@ -201,7 +201,8 @@ void __kvm_tlb_el1_instr(struct kvm_s2_mmu *mmu, u64 val, u64 sys_encoding)
 	dsb(ishst);
 
 	/* Switch to requested VMID */
-	__tlb_switch_to_guest(mmu, &cxt);
+	if (mmu)
+		__tlb_switch_to_guest(mmu, &cxt);
 
 	/*
 	 * Execute the same instruction as the guest hypervisor did,
@@ -240,5 +241,6 @@ void __kvm_tlb_el1_instr(struct kvm_s2_mmu *mmu, u64 val, u64 sys_encoding)
 	dsb(ish);
 	isb();
 
-	__tlb_switch_to_host(&cxt);
+	if (mmu)
+		__tlb_switch_to_host(&cxt);
 }
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 351bcb429f25..5023ae566f7f 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -2839,6 +2839,8 @@ static bool handle_tlbi_el1(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
 			    const struct sys_reg_desc *r)
 {
 	u32 sys_encoding = sys_insn(p->Op0, p->Op1, p->CRn, p->CRm, p->Op2);
+	u64 virtual_vttbr = vcpu_read_sys_reg(vcpu, VTTBR_EL2);
+	struct kvm_s2_mmu *mmu;
 
 	if (vcpu_has_nv(vcpu) && forward_traps(vcpu, HCR_TTLB))
 		return false;
@@ -2860,24 +2862,13 @@ static bool handle_tlbi_el1(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
 
 	mutex_lock(&vcpu->kvm->lock);
 
-	if ((__vcpu_sys_reg(vcpu, HCR_EL2) & (HCR_E2H | HCR_TGE)) != (HCR_E2H | HCR_TGE)) {
-		u64 virtual_vttbr = vcpu_read_sys_reg(vcpu, VTTBR_EL2);
-		struct kvm_s2_mmu *mmu;
-
-		mmu = lookup_s2_mmu(vcpu->kvm, virtual_vttbr, HCR_VM);
-		if (mmu)
-			__kvm_tlb_el1_instr(mmu, p->regval, sys_encoding);
+	mmu = lookup_s2_mmu(vcpu->kvm, virtual_vttbr, HCR_VM);
+	if (mmu)
+		__kvm_tlb_el1_instr(mmu, p->regval, sys_encoding);
 
-		mmu = lookup_s2_mmu(vcpu->kvm, virtual_vttbr, 0);
-		if (mmu)
-			__kvm_tlb_el1_instr(mmu, p->regval, sys_encoding);
-	} else {
-		/*
-		 * ARMv8.4-NV allows the guest to change TGE behind
-		 * our back, so we always trap EL1 TLBIs from vEL2...
-		 */
-		__kvm_tlb_el1_instr(&vcpu->kvm->arch.mmu, p->regval, sys_encoding);
-	}
+	mmu = lookup_s2_mmu(vcpu->kvm, virtual_vttbr, 0);
+	if (mmu)
+		__kvm_tlb_el1_instr(mmu, p->regval, sys_encoding);
 
 	mutex_unlock(&vcpu->kvm->lock);
 
-- 
2.30.2

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 378+ messages in thread

* Re: [PATCH v6 01/64] arm64: Add ARM64_HAS_NESTED_VIRT cpufeature
  2022-01-28 12:18   ` Marc Zyngier
  (?)
@ 2022-02-01 14:22     ` Russell King (Oracle)
  -1 siblings, 0 replies; 378+ messages in thread
From: Russell King (Oracle) @ 2022-02-01 14:22 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, Andre Przywara, Christoffer Dall,
	Jintack Lim, Haibo Xu, Ganapatrao Kulkarni, Chase Conklin,
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

On Fri, Jan 28, 2022 at 12:18:09PM +0000, Marc Zyngier wrote:
> From: Jintack Lim <jintack.lim@linaro.org>
> 
> Add a new ARM64_HAS_NESTED_VIRT feature to indicate that the
> CPU has the ARMv8.3 nested virtualization capability, together
> with the 'kvm-arm.mode=nested' command line option.
> 
> This will be used to support nested virtualization in KVM.
> 
> Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
> [maz: moved the command-line option to kvm-arm.mode]
> Signed-off-by: Marc Zyngier <maz@kernel.org>

Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 378+ messages in thread

* Re: [PATCH v6 07/64] KVM: arm64: nv: Handle HCR_EL2.NV system register traps
  2022-01-28 12:18   ` Marc Zyngier
  (?)
@ 2022-02-01 14:32     ` Russell King (Oracle)
  -1 siblings, 0 replies; 378+ messages in thread
From: Russell King (Oracle) @ 2022-02-01 14:32 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, Andre Przywara, Christoffer Dall,
	Jintack Lim, Haibo Xu, Ganapatrao Kulkarni, Chase Conklin,
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

On Fri, Jan 28, 2022 at 12:18:15PM +0000, Marc Zyngier wrote:
> From: Jintack Lim <jintack.lim@linaro.org>
> 
> ARM v8.3 introduces a new bit in the HCR_EL2, which is the NV bit. When
> this bit is set, accessing EL2 registers in EL1 traps to EL2. In
> addition, executing the following instructions in EL1 will trap to EL2:
> tlbi, at, eret, and msr/mrs instructions to access SP_EL1. Most of the
> instructions that trap to EL2 with the NV bit were undef at EL1 prior to
> ARM v8.3. The only instruction that was not undef is eret.
> 
> This patch sets up a handler for EL2 registers and SP_EL1 register
> accesses at EL1. The host hypervisor keeps those register values in
> memory, and will emulate their behavior.
> 
> This patch doesn't set the NV bit yet. It will be set in a later patch
> once nested virtualization support is completed.
> 
> Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
> [maz: EL2_REG() macros]
> Signed-off-by: Marc Zyngier <maz@kernel.org>

Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>

Thanks!

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 378+ messages in thread

* Re: [PATCH v6 12/64] KVM: arm64: nv: Add non-VHE-EL2->EL1 translation helpers
  2022-01-28 12:18   ` Marc Zyngier
  (?)
@ 2022-02-01 16:37     ` Russell King (Oracle)
  -1 siblings, 0 replies; 378+ messages in thread
From: Russell King (Oracle) @ 2022-02-01 16:37 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, Andre Przywara, Christoffer Dall,
	Jintack Lim, Haibo Xu, Ganapatrao Kulkarni, Chase Conklin,
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

On Fri, Jan 28, 2022 at 12:18:20PM +0000, Marc Zyngier wrote:
> Some EL2 system registers immediately affect the current execution
> of the system, so we need to use their respective EL1 counterparts.
> For this we need to define a mapping between the two. In general,
> this only affects non-VHE guest hypervisors, as VHE system registers
> are compatible with the EL1 counterparts.
> 
> These helpers will get used in subsequent patches.
> 
> Co-developed-by: Andre Przywara <andre.przywara@arm.com>
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> Signed-off-by: Marc Zyngier <maz@kernel.org>

Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>

> ---
>  arch/arm64/include/asm/kvm_nested.h | 54 +++++++++++++++++++++++++++++
>  1 file changed, 54 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
> index fd601ea68d13..5a85be6d8eb3 100644
> --- a/arch/arm64/include/asm/kvm_nested.h
> +++ b/arch/arm64/include/asm/kvm_nested.h
> @@ -2,6 +2,7 @@
>  #ifndef __ARM64_KVM_NESTED_H
>  #define __ARM64_KVM_NESTED_H
>  
> +#include <linux/bitfield.h>
>  #include <linux/kvm_host.h>
>  
>  static inline bool vcpu_has_nv(const struct kvm_vcpu *vcpu)
> @@ -11,4 +12,57 @@ static inline bool vcpu_has_nv(const struct kvm_vcpu *vcpu)
>  		test_bit(KVM_ARM_VCPU_HAS_EL2, vcpu->arch.features));
>  }
>  
> +/* Translation helpers from non-VHE EL2 to EL1 */
> +static inline u64 tcr_el2_ps_to_tcr_el1_ips(u64 tcr_el2)
> +{
> +	return (u64)FIELD_GET(TCR_EL2_PS_MASK, tcr_el2) << TCR_IPS_SHIFT;

I frowned about the use of FIELD_GET() but not FIELD_PREP(), which
would be:

	return FIELD_PREP(TCR_IPS_MASK, FIELD_GET(TCR_EL2_PS_MASK, tcr_el2));

However, I'm not bothered by this beyond frowning!

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 378+ messages in thread
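
A standalone sketch of how the two forms compared above relate. FIELD_GET()
and FIELD_PREP() are the real <linux/bitfield.h> helpers, but the simplified
macro definitions and the SRC_*/DST_* field layouts below are illustrative
stand-ins only (not the kernel implementation, and not the real
TCR_EL2_PS_MASK/TCR_IPS_SHIFT encodings); both forms produce the same value
when the destination mask sits at the shift used by the open-coded version:

#include <stdint.h>
#include <stdio.h>

/*
 * Simplified stand-ins for the <linux/bitfield.h> helpers: FIELD_GET()
 * extracts a field and right-justifies it, FIELD_PREP() takes a
 * right-justified value and shifts it under the destination mask.
 */
#define FIELD_SHIFT(mask)	(__builtin_ctzll(mask))
#define FIELD_GET(mask, reg)	(((reg) & (mask)) >> FIELD_SHIFT(mask))
#define FIELD_PREP(mask, val)	(((uint64_t)(val) << FIELD_SHIFT(mask)) & (mask))

/* Hypothetical field layouts, for demonstration only. */
#define SRC_PS_MASK	(0x7ULL << 16)		/* stand-in for TCR_EL2_PS_MASK */
#define DST_IPS_SHIFT	32			/* stand-in for TCR_IPS_SHIFT */
#define DST_IPS_MASK	(0x7ULL << DST_IPS_SHIFT)

int main(void)
{
	uint64_t tcr_el2 = 0x5ULL << 16;	/* PS field set to 5 */

	/* Open-coded form used in the patch: extract, then shift by hand. */
	uint64_t a = (uint64_t)FIELD_GET(SRC_PS_MASK, tcr_el2) << DST_IPS_SHIFT;

	/* Form suggested in the review: extract, then re-place under a mask. */
	uint64_t b = FIELD_PREP(DST_IPS_MASK, FIELD_GET(SRC_PS_MASK, tcr_el2));

	printf("a=%llx b=%llx\n", (unsigned long long)a, (unsigned long long)b);
	return 0;
}

The FIELD_PREP() form describes the destination by its mask rather than a
bare shift, which is presumably what makes it read better in the review's
eyes; functionally the two are equivalent here.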

* Re: [PATCH v6 13/64] KVM: arm64: nv: Handle virtual EL2 registers in vcpu_read/write_sys_reg()
  2022-01-28 12:18   ` Marc Zyngier
  (?)
@ 2022-02-01 16:40     ` Russell King (Oracle)
  -1 siblings, 0 replies; 378+ messages in thread
From: Russell King (Oracle) @ 2022-02-01 16:40 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, Andre Przywara, Christoffer Dall,
	Jintack Lim, Haibo Xu, Ganapatrao Kulkarni, Chase Conklin,
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

On Fri, Jan 28, 2022 at 12:18:21PM +0000, Marc Zyngier wrote:
> KVM internally uses accessor functions when reading or writing the
> guest's system registers. This takes care of accessing either the stored
> copy or using the "live" EL1 system registers when the host uses VHE.
> 
> With the introduction of virtual EL2 we add a bunch of EL2 system
> registers, which now must also be taken care of:
> - If the guest is running in vEL2, and we access an EL1 sysreg, we must
>   revert to the stored version of that, and not use the CPU's copy.
> - If the guest is running in vEL1, and we access an EL2 sysreg, we must
>   also use the stored version, since the CPU carries the EL1 copy.
> - Some EL2 system registers are supposed to affect the current execution
>   of the system, so we need to put them into their respective EL1
>   counterparts. For this we need to define a mapping between the two.
>   This is done using the newly introduced struct el2_sysreg_map.
> - Some EL2 system registers have a different format than their EL1
>   counterpart, so we need to translate them before writing them to the
>   CPU. This is done using an (optional) translate function in the map.
> - There are the three special registers SP_EL2, SPSR_EL2 and ELR_EL2,
>   which need some separate handling (SPSR_EL2 is being handled in a
>   separate patch).
> 
> All of these cases are now wrapped into the existing accessor functions,
> so KVM users don't need to care whether they access EL2 or EL1
> registers, nor which state the guest is in.
> 
> This handles what was formerly known as the "shadow state" dynamically,
> without requiring a separate copy for each vCPU EL.
> 
> Reviewed-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
> Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
> Co-developed-by: Andre Przywara <andre.przywara@arm.com>
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> Signed-off-by: Marc Zyngier <maz@kernel.org>

Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 378+ messages in thread

* Re: [PATCH v6 14/64] KVM: arm64: nv: Handle SPSR_EL2 specially
  2022-01-28 12:18   ` Marc Zyngier
  (?)
@ 2022-02-01 16:43     ` Russell King (Oracle)
  -1 siblings, 0 replies; 378+ messages in thread
From: Russell King (Oracle) @ 2022-02-01 16:43 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, Andre Przywara, Christoffer Dall,
	Jintack Lim, Haibo Xu, Ganapatrao Kulkarni, Chase Conklin,
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

On Fri, Jan 28, 2022 at 12:18:22PM +0000, Marc Zyngier wrote:
> SPSR_EL2 needs special attention when running nested on ARMv8.3:
> 
> If taking an exception while running at vEL2 (actually EL1), the
> HW will update the SPSR_EL1 register with the EL1 mode. We need
> to track this in order to make sure that accesses to the virtual
> view of SPSR_EL2 are correct.
> 
> To do so, we place an illegal value in SPSR_EL1.M, and patch it
> accordingly if required when accessing it.
> 
> Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
> Signed-off-by: Marc Zyngier <maz@kernel.org>

Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 378+ messages in thread

* Re: [PATCH v6 15/64] KVM: arm64: nv: Handle HCR_EL2.E2H specially
  2022-01-28 12:18   ` Marc Zyngier
  (?)
@ 2022-02-01 16:51     ` Russell King (Oracle)
  -1 siblings, 0 replies; 378+ messages in thread
From: Russell King (Oracle) @ 2022-02-01 16:51 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, Andre Przywara, Christoffer Dall,
	Jintack Lim, Haibo Xu, Ganapatrao Kulkarni, Chase Conklin,
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

On Fri, Jan 28, 2022 at 12:18:23PM +0000, Marc Zyngier wrote:
> HCR_EL2.E2H is nasty, as a flip of this bit completely changes the way
> we deal with a lot of the state. So when the guest flips this bit
> (sysregs are live), do the put/load dance so that we have a consistent
> state.
> 
> Yes, this is slow. Don't do it.

I'd hope this is very unlikely!

> 
> Suggested-by: Alexandru Elisei <alexandru.elisei@arm.com>
> Signed-off-by: Marc Zyngier <maz@kernel.org>

Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 378+ messages in thread

* Re: [PATCH v6 17/64] KVM: arm64: nv: Emulate PSTATE.M for a guest hypervisor
  2022-01-28 12:18   ` Marc Zyngier
  (?)
@ 2022-02-01 18:06     ` Russell King (Oracle)
  -1 siblings, 0 replies; 378+ messages in thread
From: Russell King (Oracle) @ 2022-02-01 18:06 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, Andre Przywara, Christoffer Dall,
	Jintack Lim, Haibo Xu, Ganapatrao Kulkarni, Chase Conklin,
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

On Fri, Jan 28, 2022 at 12:18:25PM +0000, Marc Zyngier wrote:
> From: Christoffer Dall <christoffer.dall@arm.com>
> 
> We can no longer blindly copy the VCPU's PSTATE into SPSR_EL2 and return
> to the guest and vice versa when taking an exception to the hypervisor,
> because we emulate virtual EL2 in EL1 and therefore have to translate
> the mode field from EL2 to EL1 and vice versa.
> 
> This requires keeping track of the state in which we enter the guest, for which
> we transiently use a dedicated flag.
> 
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/include/asm/kvm_host.h          |  1 +
>  arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h | 19 ++++++++++++++++-
>  arch/arm64/kvm/hyp/vhe/switch.c            | 24 ++++++++++++++++++++++
>  3 files changed, 43 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 8fffe2888403..fa253f08e0fd 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -472,6 +472,7 @@ struct kvm_vcpu_arch {
>  #define KVM_ARM64_DEBUG_STATE_SAVE_SPE	(1 << 12) /* Save SPE context if active  */
>  #define KVM_ARM64_DEBUG_STATE_SAVE_TRBE	(1 << 13) /* Save TRBE context if active  */
>  #define KVM_ARM64_FP_FOREIGN_FPSTATE	(1 << 14)
> +#define KVM_ARM64_IN_HYP_CONTEXT	(1 << 15) /* Guest running in HYP context */
>  
>  #define KVM_GUESTDBG_VALID_MASK (KVM_GUESTDBG_ENABLE | \
>  				 KVM_GUESTDBG_USE_SW_BP | \
> diff --git a/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h b/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h
> index 283f780f5f56..e3689c6ce4cc 100644
> --- a/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h
> +++ b/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h
> @@ -157,9 +157,26 @@ static inline void __sysreg_restore_el1_state(struct kvm_cpu_context *ctxt,
>  	write_sysreg_el1(ctxt_sys_reg(ctxt, SPSR_EL1),	SYS_SPSR);
>  }
>  
> +/* Read the VCPU state's PSTATE, but translate (v)EL2 to EL1. */
> +static inline u64 to_hw_pstate(const struct kvm_cpu_context *ctxt)
> +{
> +	u64 mode = ctxt->regs.pstate & (PSR_MODE_MASK | PSR_MODE32_BIT);
> +
> +	switch (mode) {
> +	case PSR_MODE_EL2t:
> +		mode = PSR_MODE_EL1t;
> +		break;
> +	case PSR_MODE_EL2h:
> +		mode = PSR_MODE_EL1h;
> +		break;
> +	}
> +
> +	return (ctxt->regs.pstate & ~(PSR_MODE_MASK | PSR_MODE32_BIT)) | mode;
> +}
> +

Wondering if it makes sense to also have the reverse translation as an
inline function after the above too, so the two translations are
together - but as it's only used (in this patch at least) in switch.c
there probably isn't too much point.

So:

Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 378+ messages in thread
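
To make the suggestion above concrete, here is a minimal sketch of what a
reverse helper could look like next to to_hw_pstate(). The name
to_vcpu_pstate() is invented for illustration and assumes the same headers
and types as the code quoted above; per the commit message, such a mapping
would only be applied on exits where the guest had entered in (v)EL2 context
(tracked with the KVM_ARM64_IN_HYP_CONTEXT flag):

/*
 * Hypothetical inverse of to_hw_pstate(): translate a hardware EL1 mode
 * back into the guest's (v)EL2 mode. Only meaningful when the vcpu was
 * running in hyp context when the exception was taken.
 */
static inline u64 to_vcpu_pstate(u64 hw_pstate)
{
	u64 mode = hw_pstate & (PSR_MODE_MASK | PSR_MODE32_BIT);

	switch (mode) {
	case PSR_MODE_EL1t:
		mode = PSR_MODE_EL2t;
		break;
	case PSR_MODE_EL1h:
		mode = PSR_MODE_EL2h;
		break;
	}

	return (hw_pstate & ~(PSR_MODE_MASK | PSR_MODE32_BIT)) | mode;
}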

* Re: [PATCH v6 18/64] KVM: arm64: nv: Trap EL1 VM register accesses in virtual EL2
  2022-01-28 12:18   ` Marc Zyngier
  (?)
@ 2022-02-01 18:08     ` Russell King (Oracle)
  -1 siblings, 0 replies; 378+ messages in thread
From: Russell King (Oracle) @ 2022-02-01 18:08 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kernel-team, kvm, Andre Przywara, Christoffer Dall,
	Chase Conklin, kvmarm, mihai.carabas, Ganapatrao Kulkarni,
	linux-arm-kernel

On Fri, Jan 28, 2022 at 12:18:26PM +0000, Marc Zyngier wrote:
> From: Christoffer Dall <christoffer.dall@linaro.org>
> 
> When running in virtual EL2 mode, we actually run the hardware in EL1
> and therefore have to use the EL1 registers to ensure correct operation.
> 
> By setting the HCR.TVM and HCR.TRVM we ensure that the virtual EL2 mode
> doesn't shoot itself in the foot when setting up what it believes to be
> a different mode's system register state (for example when preparing to
> switch to a VM).
> 
> We can leverage the existing sysregs infrastructure to support trapped
> accesses to these registers.
> 
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> Signed-off-by: Marc Zyngier <maz@kernel.org>

Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 378+ messages in thread

* Re: [PATCH v6 19/64] KVM: arm64: nv: Trap SPSR_EL1, ELR_EL1 and VBAR_EL1 from virtual EL2
  2022-01-28 12:18   ` Marc Zyngier
  (?)
@ 2022-02-01 18:13     ` Russell King (Oracle)
  -1 siblings, 0 replies; 378+ messages in thread
From: Russell King (Oracle) @ 2022-02-01 18:13 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kernel-team, kvm, Andre Przywara, Christoffer Dall,
	Chase Conklin, kvmarm, mihai.carabas, Ganapatrao Kulkarni,
	linux-arm-kernel

On Fri, Jan 28, 2022 at 12:18:27PM +0000, Marc Zyngier wrote:
> From: Jintack Lim <jintack.lim@linaro.org>
> 
> For the same reason we trap virtual memory register accesses at virtual
> EL2, we need to trap SPSR_EL1, ELR_EL1 and VBAR_EL1 accesses. ARM v8.3
> introduces the HCR_EL2.NV1 bit to be able to trap on those register
> accesses in EL1. Do not set this bit until the whole nesting support is

Maybe:
                 , but will be done in a future patch once nested support
is complete.

> completed.
> 
> Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
> Signed-off-by: Marc Zyngier <maz@kernel.org>

Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 378+ messages in thread

* Re: [PATCH v6 15/64] KVM: arm64: nv: Handle HCR_EL2.E2H specially
  2022-02-01 16:51     ` Russell King (Oracle)
  (?)
@ 2022-02-01 18:17       ` Marc Zyngier
  -1 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-02-01 18:17 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: linux-arm-kernel, kvmarm, kvm, Andre Przywara, Christoffer Dall,
	Jintack Lim, Haibo Xu, Ganapatrao Kulkarni, Chase Conklin,
	James Morse, Suzuki K Poulose, Alexandru Elisei, karl.heubaum,
	mihai.carabas, miguel.luis, kernel-team

On 2022-02-01 16:51, Russell King (Oracle) wrote:
> On Fri, Jan 28, 2022 at 12:18:23PM +0000, Marc Zyngier wrote:
>> HCR_EL2.E2H is nasty, as a flip of this bit completely changes the way
>> we deal with a lot of the state. So when the guest flips this bit
>> (sysregs are live), do the put/load dance so that we have a consistent
>> state.
>> 
>> Yes, this is slow. Don't do it.
> 
> I'd hope this is very unlikely!

A guest OS would probably do it once per CPU bring-up. So I'm
not too bothered about the speed. But that's only one of the
many cases where we need to do this put/load game.

At this stage, we don't care too much. But the last two patches
give you a glimpse of what sort of fine-grained optimisation
we will eventually want to do for this not to suck too much.

But again, this is NV, and it gives a whole new sense to "being
slow".

> 
>> 
>> Suggested-by: Alexandru Elisei <alexandru.elisei@arm.com>
>> Signed-off-by: Marc Zyngier <maz@kernel.org>
> 
> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>

Thanks,

         M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 378+ messages in thread

* Re: [PATCH v6 03/64] KVM: arm64: nv: Reset VCPU to EL2 registers if VCPU nested virt is set
  2022-01-28 12:18   ` Marc Zyngier
  (?)
@ 2022-02-02 11:40     ` Alexandru Elisei
  -1 siblings, 0 replies; 378+ messages in thread
From: Alexandru Elisei @ 2022-02-02 11:40 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, Andre Przywara, Christoffer Dall,
	Jintack Lim, Haibo Xu, Ganapatrao Kulkarni, Chase Conklin,
	Russell King (Oracle),
	James Morse, Suzuki K Poulose, karl.heubaum, mihai.carabas,
	miguel.luis, kernel-team

Hi Marc,

On Fri, Jan 28, 2022 at 12:18:11PM +0000, Marc Zyngier wrote:
> From: Christoffer Dall <christoffer.dall@arm.com>
> 
> Reset the VCPU with PSTATE.M = EL2h when the nested virtualization
> feature is enabled on the VCPU.

Looks good to me:

Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>

Thanks,
Alex

> 
> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
> [maz: rework register reset not to use empty data structures]
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/kvm/reset.c | 29 ++++++++++++++++++++++++-----
>  1 file changed, 24 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
> index ecc40c8cd6f6..d19a9aad2d85 100644
> --- a/arch/arm64/kvm/reset.c
> +++ b/arch/arm64/kvm/reset.c
> @@ -27,6 +27,7 @@
>  #include <asm/kvm_asm.h>
>  #include <asm/kvm_emulate.h>
>  #include <asm/kvm_mmu.h>
> +#include <asm/kvm_nested.h>
>  #include <asm/virt.h>
>  
>  /* Maximum phys_shift supported for any VM on this host */
> @@ -38,6 +39,9 @@ static u32 kvm_ipa_limit;
>  #define VCPU_RESET_PSTATE_EL1	(PSR_MODE_EL1h | PSR_A_BIT | PSR_I_BIT | \
>  				 PSR_F_BIT | PSR_D_BIT)
>  
> +#define VCPU_RESET_PSTATE_EL2	(PSR_MODE_EL2h | PSR_A_BIT | PSR_I_BIT | \
> +				 PSR_F_BIT | PSR_D_BIT)
> +
>  #define VCPU_RESET_PSTATE_SVC	(PSR_AA32_MODE_SVC | PSR_AA32_A_BIT | \
>  				 PSR_AA32_I_BIT | PSR_AA32_F_BIT)
>  
> @@ -188,12 +192,19 @@ static bool vcpu_allowed_register_width(struct kvm_vcpu *vcpu)
>  	unsigned long i;
>  
>  	is32bit = vcpu_has_feature(vcpu, KVM_ARM_VCPU_EL1_32BIT);
> -	if (!cpus_have_const_cap(ARM64_HAS_32BIT_EL1) && is32bit)
> -		return false;
> +	if (is32bit) {
> +		/* The HW must obviously support AArch32 at EL1 */
> +		if (!cpus_have_const_cap(ARM64_HAS_32BIT_EL1))
> +			return false;
>  
> -	/* MTE is incompatible with AArch32 */
> -	if (kvm_has_mte(vcpu->kvm) && is32bit)
> -		return false;
> +		/* MTE is incompatible with AArch32 */
> +		if (kvm_has_mte(vcpu->kvm))
> +			return false;
> +
> +		/* NV is incompatible with AArch32 */
> +		if (vcpu_has_nv(vcpu))
> +			return false;
> +	}
>  
>  	/* Check that the vcpus are either all 32bit or all 64bit */
>  	kvm_for_each_vcpu(i, tmp, vcpu->kvm) {
> @@ -265,10 +276,18 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
>  		goto out;
>  	}
>  
> +	if (vcpu_has_nv(vcpu) &&
> +	    vcpu_has_feature(vcpu, KVM_ARM_VCPU_SVE)) {
> +		ret = -EINVAL;
> +		goto out;
> +	}
> +
>  	switch (vcpu->arch.target) {
>  	default:
>  		if (test_bit(KVM_ARM_VCPU_EL1_32BIT, vcpu->arch.features)) {
>  			pstate = VCPU_RESET_PSTATE_SVC;
> +		} else if (vcpu_has_nv(vcpu)) {
> +			pstate = VCPU_RESET_PSTATE_EL2;
>  		} else {
>  			pstate = VCPU_RESET_PSTATE_EL1;
>  		}
> -- 
> 2.30.2
> 

^ permalink raw reply	[flat|nested] 378+ messages in thread

* Re: [PATCH v6 04/64] KVM: arm64: nv: Allow userspace to set PSR_MODE_EL2x
  2022-01-28 12:18   ` Marc Zyngier
  (?)
@ 2022-02-02 11:53     ` Alexandru Elisei
  -1 siblings, 0 replies; 378+ messages in thread
From: Alexandru Elisei @ 2022-02-02 11:53 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, Andre Przywara, Christoffer Dall,
	Jintack Lim, Haibo Xu, Ganapatrao Kulkarni, Chase Conklin,
	Russell King (Oracle),
	James Morse, Suzuki K Poulose, karl.heubaum, mihai.carabas,
	miguel.luis, kernel-team

Hi Marc,

On Fri, Jan 28, 2022 at 12:18:12PM +0000, Marc Zyngier wrote:
> From: Christoffer Dall <christoffer.dall@linaro.org>
> 
> We were not allowing userspace to set a more privileged mode for the VCPU
> than EL1, but we should allow this when nested virtualization is enabled
> for the VCPU.
> 
> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/kvm/guest.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
> index e116c7767730..db6209622be9 100644
> --- a/arch/arm64/kvm/guest.c
> +++ b/arch/arm64/kvm/guest.c
> @@ -24,6 +24,7 @@
>  #include <asm/fpsimd.h>
>  #include <asm/kvm.h>
>  #include <asm/kvm_emulate.h>
> +#include <asm/kvm_nested.h>
>  #include <asm/sigcontext.h>
>  
>  #include "trace.h"
> @@ -259,6 +260,11 @@ static int set_core_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
>  			if (vcpu_el1_is_32bit(vcpu))
>  				return -EINVAL;
>  			break;
> +		case PSR_MODE_EL2h:
> +		case PSR_MODE_EL2t:
> +			if (vcpu_el1_is_32bit(vcpu) || !vcpu_has_nv(vcpu))

I'm a bit confused about the vcpu_el1_is_32bit() check. The function tests
that HCR_EL2.RW is not set. HCR_EL2.RW is cleared when the
KVM_ARM_VCPU_EL1_32BIT feature is set for the VCPU. But the EL2 and the
32BIT features are incompatible (kvm_reset_vcpu() returns an error when
both are set). Wouldn't checking only !vcpu_has_nv() be enough here?
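
Something along these lines, i.e. relying on kvm_reset_vcpu() to keep
rejecting the NV + 32-bit EL1 combination (sketch only):

		case PSR_MODE_EL2h:
		case PSR_MODE_EL2t:
			if (!vcpu_has_nv(vcpu))
				return -EINVAL;
			break;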

Thanks,
Alex

> +				return -EINVAL;
> +			break;
>  		default:
>  			err = -EINVAL;
>  			goto out;
> -- 
> 2.30.2
> 

^ permalink raw reply	[flat|nested] 378+ messages in thread

* Re: [PATCH v6 06/64] KVM: arm64: nv: Add nested virt VCPU primitives for vEL2 VCPU state
  2022-01-28 12:18   ` Marc Zyngier
  (?)
@ 2022-02-02 12:10     ` Alexandru Elisei
  -1 siblings, 0 replies; 378+ messages in thread
From: Alexandru Elisei @ 2022-02-02 12:10 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, Andre Przywara, Christoffer Dall,
	Jintack Lim, Haibo Xu, Ganapatrao Kulkarni, Chase Conklin,
	Russell King (Oracle),
	James Morse, Suzuki K Poulose, karl.heubaum, mihai.carabas,
	miguel.luis, kernel-team

Hi Marc,

On Fri, Jan 28, 2022 at 12:18:14PM +0000, Marc Zyngier wrote:
> From: Christoffer Dall <christoffer.dall@arm.com>
> 
> When running a nested hypervisor we commonly have to figure out if
> the VCPU is running in the context of a guest hypervisor, the guest
> hypervisor's own guest, or just a normal guest.
> 
> Add convenient primitives for this.
> 
> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/include/asm/kvm_emulate.h | 53 ++++++++++++++++++++++++++++
>  1 file changed, 53 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> index d62405ce3e6d..ea9a130c4b6a 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -178,6 +178,59 @@ static __always_inline void vcpu_set_reg(struct kvm_vcpu *vcpu, u8 reg_num,
>  		vcpu_gp_regs(vcpu)->regs[reg_num] = val;
>  }
>  
> +static inline bool vcpu_is_el2_ctxt(const struct kvm_cpu_context *ctxt)
> +{
> +	switch (ctxt->regs.pstate & (PSR_MODE32_BIT | PSR_MODE_MASK)) {
> +	case PSR_MODE_EL2h:
> +	case PSR_MODE_EL2t:
> +		return true;
> +	default:
> +		return false;
> +	}
> +}
> +
> +static inline bool vcpu_is_el2(const struct kvm_vcpu *vcpu)
> +{
> +	return vcpu_is_el2_ctxt(&vcpu->arch.ctxt);
> +}
> +
> +static inline bool __vcpu_el2_e2h_is_set(const struct kvm_cpu_context *ctxt)
> +{
> +	return ctxt_sys_reg(ctxt, HCR_EL2) & HCR_E2H;
> +}
> +
> +static inline bool vcpu_el2_e2h_is_set(const struct kvm_vcpu *vcpu)
> +{
> +	return __vcpu_el2_e2h_is_set(&vcpu->arch.ctxt);
> +}
> +
> +static inline bool __vcpu_el2_tge_is_set(const struct kvm_cpu_context *ctxt)
> +{
> +	return ctxt_sys_reg(ctxt, HCR_EL2) & HCR_TGE;
> +}
> +
> +static inline bool vcpu_el2_tge_is_set(const struct kvm_vcpu *vcpu)
> +{
> +	return __vcpu_el2_tge_is_set(&vcpu->arch.ctxt);
> +}
> +
> +static inline bool __is_hyp_ctxt(const struct kvm_cpu_context *ctxt)
> +{
> +	/*
> +	 * We are in a hypervisor context if the vcpu mode is EL2 or
> +	 * E2H and TGE bits are set. The latter means we are in the user space
> +	 * of the VHE kernel. ARMv8.1 ARM describes this as 'InHost'
> +	 */
> +	return vcpu_is_el2_ctxt(ctxt) ||
> +		(__vcpu_el2_e2h_is_set(ctxt) && __vcpu_el2_tge_is_set(ctxt)) ||
> +		WARN_ON(__vcpu_el2_tge_is_set(ctxt));

Why the WARN_ON? Wouldn't it be easy for a guest to flood the host's dmesg with
warnings by setting HCR_EL2.{E2H,TGE} = {0, 1} and then repeatedly accessing EL2
registers (for example)?
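
If the warning is worth keeping at all, a rate-limited variant would at least
avoid the flood (sketch only, leaving the rest of the expression untouched):

	return vcpu_is_el2_ctxt(ctxt) ||
		(__vcpu_el2_e2h_is_set(ctxt) && __vcpu_el2_tge_is_set(ctxt)) ||
		WARN_ON_ONCE(__vcpu_el2_tge_is_set(ctxt));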

> +}
> +
> +static inline bool is_hyp_ctxt(const struct kvm_vcpu *vcpu)

When KVM boots at EL2 and FEAT_VHE is not implemented it prints to dmesg:

[    1.352302] kvm [1]: Hyp mode initialized successfully

I take it in that context "Hyp" means "EL2 NVHE", because when E2H,TGE are
set the message changes to:

[    0.037683] kvm [1]: VHE mode initialized successfully

In this file, in the function is_kernel_in_hyp_mode(), "hyp" actually means
that the exception level is EL2. In kvm/arm.c, "hyp" also means EL2 (or
maybe EL2 NVHE?).

Wouldn't it be nice to avoid having "hyp" mean yet another thing and
replace it with "hypervisor" in these two functions?

Thanks,
Alex

> +{
> +	return __is_hyp_ctxt(&vcpu->arch.ctxt);
> +}
> +
>  /*
>   * The layout of SPSR for an AArch32 state is different when observed from an
>   * AArch64 SPSR_ELx or an AArch32 SPSR_*. This function generates the AArch32
> -- 
> 2.30.2
> 

^ permalink raw reply	[flat|nested] 378+ messages in thread

* Re: [PATCH v6 09/64] KVM: arm64: nv: Support virtual EL2 exceptions
  2022-01-28 12:18   ` Marc Zyngier
  (?)
@ 2022-02-02 15:23     ` Alexandru Elisei
  -1 siblings, 0 replies; 378+ messages in thread
From: Alexandru Elisei @ 2022-02-02 15:23 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, Andre Przywara, Christoffer Dall,
	Jintack Lim, Haibo Xu, Ganapatrao Kulkarni, Chase Conklin,
	Russell King (Oracle),
	James Morse, Suzuki K Poulose, karl.heubaum, mihai.carabas,
	miguel.luis, kernel-team

Hi,

On Fri, Jan 28, 2022 at 12:18:17PM +0000, Marc Zyngier wrote:
> From: Jintack Lim <jintack.lim@linaro.org>
> 
> Support injecting exceptions and performing exception returns to and
> from virtual EL2.  This must be done entirely in software except when
> taking an exception from vEL0 to vEL2 when the virtual HCR_EL2.{E2H,TGE}
> == {1,1}  (a VHE guest hypervisor).
> 
> Reviewed-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
> Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
> [maz: switch to common exception injection framework, illegal exception
>  return handling]
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/include/asm/kvm_arm.h     |  17 +++
>  arch/arm64/include/asm/kvm_emulate.h |  10 ++
>  arch/arm64/include/asm/kvm_host.h    |   1 +
>  arch/arm64/kvm/Makefile              |   2 +-
>  arch/arm64/kvm/emulate-nested.c      | 197 +++++++++++++++++++++++++++
>  arch/arm64/kvm/hyp/exception.c       |  49 +++++--
>  arch/arm64/kvm/inject_fault.c        |  68 +++++++--
>  arch/arm64/kvm/trace_arm.h           |  59 ++++++++
>  8 files changed, 382 insertions(+), 21 deletions(-)
>  create mode 100644 arch/arm64/kvm/emulate-nested.c
> 
> diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
> index 01d47c5886dc..e6e3aae87a09 100644
> --- a/arch/arm64/include/asm/kvm_arm.h
> +++ b/arch/arm64/include/asm/kvm_arm.h
> @@ -359,4 +359,21 @@
>  #define CPACR_EL1_TTA		(1 << 28)
>  #define CPACR_EL1_DEFAULT	(CPACR_EL1_FPEN | CPACR_EL1_ZEN_EL1EN)
>  
> +#define kvm_mode_names				\
> +	{ PSR_MODE_EL0t,	"EL0t" },	\
> +	{ PSR_MODE_EL1t,	"EL1t" },	\
> +	{ PSR_MODE_EL1h,	"EL1h" },	\
> +	{ PSR_MODE_EL2t,	"EL2t" },	\
> +	{ PSR_MODE_EL2h,	"EL2h" },	\
> +	{ PSR_MODE_EL3t,	"EL3t" },	\
> +	{ PSR_MODE_EL3h,	"EL3h" },	\
> +	{ PSR_AA32_MODE_USR,	"32-bit USR" },	\
> +	{ PSR_AA32_MODE_FIQ,	"32-bit FIQ" },	\
> +	{ PSR_AA32_MODE_IRQ,	"32-bit IRQ" },	\
> +	{ PSR_AA32_MODE_SVC,	"32-bit SVC" },	\
> +	{ PSR_AA32_MODE_ABT,	"32-bit ABT" },	\
> +	{ PSR_AA32_MODE_HYP,	"32-bit HYP" },	\
> +	{ PSR_AA32_MODE_UND,	"32-bit UND" },	\
> +	{ PSR_AA32_MODE_SYS,	"32-bit SYS" }
> +
>  #endif /* __ARM64_KVM_ARM_H__ */
> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> index ea9a130c4b6a..cb9f123d26f3 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -33,6 +33,12 @@ enum exception_type {
>  	except_type_serror	= 0x180,
>  };
>  
> +#define kvm_exception_type_names		\
> +	{ except_type_sync,	"SYNC"   },	\
> +	{ except_type_irq,	"IRQ"    },	\
> +	{ except_type_fiq,	"FIQ"    },	\
> +	{ except_type_serror,	"SERROR" }
> +
>  bool kvm_condition_valid32(const struct kvm_vcpu *vcpu);
>  void kvm_skip_instr32(struct kvm_vcpu *vcpu);
>  
> @@ -43,6 +49,10 @@ void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr);
>  
>  void kvm_vcpu_wfi(struct kvm_vcpu *vcpu);
>  
> +void kvm_emulate_nested_eret(struct kvm_vcpu *vcpu);
> +int kvm_inject_nested_sync(struct kvm_vcpu *vcpu, u64 esr_el2);
> +int kvm_inject_nested_irq(struct kvm_vcpu *vcpu);
> +
>  static __always_inline bool vcpu_el1_is_32bit(struct kvm_vcpu *vcpu)
>  {
>  	return !(vcpu->arch.hcr_el2 & HCR_RW);
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 15f690c27baf..8fffe2888403 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -467,6 +467,7 @@ struct kvm_vcpu_arch {
>  #define KVM_ARM64_EXCEPT_AA64_ELx_SERR	(3 << 9)
>  #define KVM_ARM64_EXCEPT_AA64_EL1	(0 << 11)
>  #define KVM_ARM64_EXCEPT_AA64_EL2	(1 << 11)
> +#define KVM_ARM64_EXCEPT_AA64_EL_MASK	(1 << 11)
>  
>  #define KVM_ARM64_DEBUG_STATE_SAVE_SPE	(1 << 12) /* Save SPE context if active  */
>  #define KVM_ARM64_DEBUG_STATE_SAVE_TRBE	(1 << 13) /* Save TRBE context if active  */
> diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
> index 91861fd8b897..b67c4ebd72b1 100644
> --- a/arch/arm64/kvm/Makefile
> +++ b/arch/arm64/kvm/Makefile
> @@ -14,7 +14,7 @@ kvm-y += arm.o mmu.o mmio.o psci.o hypercalls.o pvtime.o \
>  	 inject_fault.o va_layout.o handle_exit.o \
>  	 guest.o debug.o reset.o sys_regs.o \
>  	 vgic-sys-reg-v3.o fpsimd.o pmu.o pkvm.o \
> -	 arch_timer.o trng.o\
> +	 arch_timer.o trng.o emulate-nested.o \
>  	 vgic/vgic.o vgic/vgic-init.o \
>  	 vgic/vgic-irqfd.o vgic/vgic-v2.o \
>  	 vgic/vgic-v3.o vgic/vgic-v4.o \
> diff --git a/arch/arm64/kvm/emulate-nested.c b/arch/arm64/kvm/emulate-nested.c
> new file mode 100644
> index 000000000000..f52cd4458947
> --- /dev/null
> +++ b/arch/arm64/kvm/emulate-nested.c
> @@ -0,0 +1,197 @@

Looks like this line:

// SPDX-License-Identifier: GPL-2.0-only

is missing.

> +/*
> + * Copyright (C) 2016 - Linaro and Columbia University
> + * Author: Jintack Lim <jintack.lim@linaro.org>
> + */
> +
> +#include <linux/kvm.h>
> +#include <linux/kvm_host.h>
> +
> +#include <asm/kvm_emulate.h>
> +#include <asm/kvm_nested.h>
> +
> +#include "hyp/include/hyp/adjust_pc.h"
> +
> +#include "trace.h"
> +
> +static u64 kvm_check_illegal_exception_return(struct kvm_vcpu *vcpu, u64 spsr)
> +{
> +	u64 mode = spsr & PSR_MODE_MASK;
> +
> +	/*
> +	 * Possible causes for an Illegal Exception Return from EL2:
> +	 * - trying to return to EL3
> +	 * - trying to return to a 32bit EL
> +	 * - trying to return to EL1 with HCR_EL2.TGE set
> +	 */
> +	if (mode == PSR_MODE_EL3t || mode == PSR_MODE_EL3h ||
> +	    spsr & PSR_MODE32_BIT ||

I take it KVM will not allow an L1 hypervisor to run EL1 or EL0 in 32-bit mode?

> +	    (vcpu_el2_tge_is_set(vcpu) && (mode == PSR_MODE_EL1t ||
> +					   mode == PSR_MODE_EL1h))) {

I think these checks should also be added:

"A return where the value of the saved process state M[4] bit is 0, indicating a
return to AArch64 state, and one of the following is true:

- The M[1] bit is 1.
- The M[3:0] bits are 0b0001.
- The Exception level being returned to is using AArch32 state, as programmed by
  the SCR_EL3.RW or HCR_EL2.RW bits, or as configured from reset."
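
In the patch's terms, the extra conditions could look roughly like this
(sketch only; the helper name is made up and the plumbing into
kvm_check_illegal_exception_return() is omitted):

	static bool aa64_return_mode_is_illegal(struct kvm_vcpu *vcpu, u64 spsr)
	{
		u64 mode = spsr & PSR_MODE_MASK;

		if (spsr & PSR_MODE32_BIT)
			return false;	/* M[4] == 1: AArch32 return, handled separately */

		/* M[1] == 1 or M[3:0] == 0b0001 are not valid AArch64 modes */
		if ((mode & BIT(1)) || mode == 0b0001)
			return true;

		/* Return to an EL1 that is configured for AArch32 (HCR_EL2.RW == 0) */
		if (vcpu_el1_is_32bit(vcpu) &&
		    (mode == PSR_MODE_EL1t || mode == PSR_MODE_EL1h))
			return true;

		return false;
	}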

Thanks,
Alex

> +		/*
> +		 * The guest is playing with our nerves. Preserve EL, SP,
> +		 * masks, flags from the existing PSTATE, and set IL.
> +		 * The HW will then generate an Illegal State Exception
> +		 * immediately after ERET.
> +		 */
> +		spsr = *vcpu_cpsr(vcpu);
> +
> +		spsr &= (PSR_D_BIT | PSR_A_BIT | PSR_I_BIT | PSR_F_BIT |
> +			 PSR_N_BIT | PSR_Z_BIT | PSR_C_BIT | PSR_V_BIT |
> +			 PSR_MODE_MASK | PSR_MODE32_BIT);
> +		spsr |= PSR_IL_BIT;
> +	}
> +
> +	return spsr;
> +}
> +
> +void kvm_emulate_nested_eret(struct kvm_vcpu *vcpu)
> +{
> +	u64 spsr, elr, mode;
> +	bool direct_eret;
> +
> +	/*
> +	 * Going through the whole put/load motions is a waste of time
> +	 * if this is a VHE guest hypervisor returning to its own
> +	 * userspace, or the hypervisor performing a local exception
> +	 * return. No need to save/restore registers, no need to
> +	 * switch S2 MMU. Just do the canonical ERET.
> +	 */
> +	spsr = vcpu_read_sys_reg(vcpu, SPSR_EL2);
> +	spsr = kvm_check_illegal_exception_return(vcpu, spsr);
> +
> +	mode = spsr & (PSR_MODE_MASK | PSR_MODE32_BIT);
> +
> +	direct_eret  = (mode == PSR_MODE_EL0t &&
> +			vcpu_el2_e2h_is_set(vcpu) &&
> +			vcpu_el2_tge_is_set(vcpu));
> +	direct_eret |= (mode == PSR_MODE_EL2h || mode == PSR_MODE_EL2t);
> +
> +	if (direct_eret) {
> +		*vcpu_pc(vcpu) = vcpu_read_sys_reg(vcpu, ELR_EL2);
> +		*vcpu_cpsr(vcpu) = spsr;
> +		trace_kvm_nested_eret(vcpu, *vcpu_pc(vcpu), spsr);
> +		return;
> +	}
> +
> +	preempt_disable();
> +	kvm_arch_vcpu_put(vcpu);
> +
> +	elr = __vcpu_sys_reg(vcpu, ELR_EL2);
> +
> +	trace_kvm_nested_eret(vcpu, elr, spsr);
> +
> +	/*
> +	 * Note that the current exception level is always the virtual EL2,
> +	 * since we set HCR_EL2.NV bit only when entering the virtual EL2.
> +	 */
> +	*vcpu_pc(vcpu) = elr;
> +	*vcpu_cpsr(vcpu) = spsr;
> +
> +	kvm_arch_vcpu_load(vcpu, smp_processor_id());
> +	preempt_enable();
> +}
> +
> +static void kvm_inject_el2_exception(struct kvm_vcpu *vcpu, u64 esr_el2,
> +				     enum exception_type type)
> +{
> +	trace_kvm_inject_nested_exception(vcpu, esr_el2, type);
> +
> +	switch (type) {
> +	case except_type_sync:
> +		vcpu->arch.flags |= KVM_ARM64_EXCEPT_AA64_ELx_SYNC;
> +		break;
> +	case except_type_irq:
> +		vcpu->arch.flags |= KVM_ARM64_EXCEPT_AA64_ELx_IRQ;
> +		break;
> +	default:
> +		WARN_ONCE(1, "Unsupported EL2 exception injection %d\n", type);
> +	}
> +
> +	vcpu->arch.flags |= (KVM_ARM64_EXCEPT_AA64_EL2		|
> +			     KVM_ARM64_PENDING_EXCEPTION);
> +
> +	vcpu_write_sys_reg(vcpu, esr_el2, ESR_EL2);
> +}
> +
> +/*
> + * Emulate taking an exception to EL2.
> + * See ARM ARM J8.1.2 AArch64.TakeException()
> + */
> +static int kvm_inject_nested(struct kvm_vcpu *vcpu, u64 esr_el2,
> +			     enum exception_type type)
> +{
> +	u64 pstate, mode;
> +	bool direct_inject;
> +
> +	if (!vcpu_has_nv(vcpu)) {
> +		kvm_err("Unexpected call to %s for the non-nesting configuration\n",
> +				__func__);
> +		return -EINVAL;
> +	}
> +
> +	/*
> +	 * As for ERET, we can avoid doing too much on the injection path by
> +	 * checking that we either took the exception from a VHE host
> +	 * userspace or from vEL2. In these cases, there is no change in
> +	 * translation regime (or anything else), so let's do as little as
> +	 * possible.
> +	 */
> +	pstate = *vcpu_cpsr(vcpu);
> +	mode = pstate & (PSR_MODE_MASK | PSR_MODE32_BIT);
> +
> +	direct_inject  = (mode == PSR_MODE_EL0t &&
> +			  vcpu_el2_e2h_is_set(vcpu) &&
> +			  vcpu_el2_tge_is_set(vcpu));
> +	direct_inject |= (mode == PSR_MODE_EL2h || mode == PSR_MODE_EL2t);
> +
> +	if (direct_inject) {
> +		kvm_inject_el2_exception(vcpu, esr_el2, type);
> +		return 1;
> +	}
> +
> +	preempt_disable();
> +	kvm_arch_vcpu_put(vcpu);
> +
> +	kvm_inject_el2_exception(vcpu, esr_el2, type);
> +
> +	/*
> +	 * A hard requirement is that a switch between EL1 and EL2
> +	 * contexts has to happen between a put/load, so that we can
> +	 * pick the correct timer and interrupt configuration, among
> +	 * other things.
> +	 *
> +	 * Make sure the exception actually took place before we load
> +	 * the new context.
> +	 */
> +	__kvm_adjust_pc(vcpu);
> +
> +	kvm_arch_vcpu_load(vcpu, smp_processor_id());
> +	preempt_enable();
> +
> +	return 1;
> +}
> +
> +int kvm_inject_nested_sync(struct kvm_vcpu *vcpu, u64 esr_el2)
> +{
> +	return kvm_inject_nested(vcpu, esr_el2, except_type_sync);
> +}
> +
> +int kvm_inject_nested_irq(struct kvm_vcpu *vcpu)
> +{
> +	/*
> +	 * Do not inject an irq if the:
> +	 *  - Current exception level is EL2, and
> +	 *  - virtual HCR_EL2.TGE == 0
> +	 *  - virtual HCR_EL2.IMO == 0
> +	 *
> +	 * See Table D1-17 "Physical interrupt target and masking when EL3 is
> +	 * not implemented and EL2 is implemented" in ARM DDI 0487C.a.
> +	 */
> +
> +	if (vcpu_is_el2(vcpu) && !vcpu_el2_tge_is_set(vcpu) &&
> +	    !(__vcpu_sys_reg(vcpu, HCR_EL2) & HCR_IMO))
> +		return 1;
> +
> +	/* esr_el2 value doesn't matter for exits due to irqs. */
> +	return kvm_inject_nested(vcpu, 0, except_type_irq);
> +}
> diff --git a/arch/arm64/kvm/hyp/exception.c b/arch/arm64/kvm/hyp/exception.c
> index 0418399e0a20..93f9c6f97376 100644
> --- a/arch/arm64/kvm/hyp/exception.c
> +++ b/arch/arm64/kvm/hyp/exception.c
> @@ -13,6 +13,7 @@
>  #include <hyp/adjust_pc.h>
>  #include <linux/kvm_host.h>
>  #include <asm/kvm_emulate.h>
> +#include <asm/kvm_nested.h>
>  
>  #if !defined (__KVM_NVHE_HYPERVISOR__) && !defined (__KVM_VHE_HYPERVISOR__)
>  #error Hypervisor code only!
> @@ -22,7 +23,9 @@ static inline u64 __vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg)
>  {
>  	u64 val;
>  
> -	if (__vcpu_read_sys_reg_from_cpu(reg, &val))
> +	if (unlikely(vcpu_has_nv(vcpu)))
> +		return vcpu_read_sys_reg(vcpu, reg);
> +	else if (__vcpu_read_sys_reg_from_cpu(reg, &val))
>  		return val;
>  
>  	return __vcpu_sys_reg(vcpu, reg);
> @@ -30,14 +33,24 @@ static inline u64 __vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg)
>  
>  static inline void __vcpu_write_sys_reg(struct kvm_vcpu *vcpu, u64 val, int reg)
>  {
> -	if (__vcpu_write_sys_reg_to_cpu(val, reg))
> -		return;
> -
> -	 __vcpu_sys_reg(vcpu, reg) = val;
> +	if (unlikely(vcpu_has_nv(vcpu)))
> +		vcpu_write_sys_reg(vcpu, val, reg);
> +	else if (!__vcpu_write_sys_reg_to_cpu(val, reg))
> +		__vcpu_sys_reg(vcpu, reg) = val;
>  }
>  
> -static void __vcpu_write_spsr(struct kvm_vcpu *vcpu, u64 val)
> +static void __vcpu_write_spsr(struct kvm_vcpu *vcpu, unsigned long target_mode,
> +			      u64 val)
>  {
> +	if (unlikely(vcpu_has_nv(vcpu))) {
> +		if (target_mode == PSR_MODE_EL1h)
> +			vcpu_write_sys_reg(vcpu, val, SPSR_EL1);
> +		else
> +			vcpu_write_sys_reg(vcpu, val, SPSR_EL2);
> +
> +		return;
> +	}
> +
>  	write_sysreg_el1(val, SYS_SPSR);
>  }
>  
> @@ -97,6 +110,11 @@ static void enter_exception64(struct kvm_vcpu *vcpu, unsigned long target_mode,
>  		sctlr = __vcpu_read_sys_reg(vcpu, SCTLR_EL1);
>  		__vcpu_write_sys_reg(vcpu, *vcpu_pc(vcpu), ELR_EL1);
>  		break;
> +	case PSR_MODE_EL2h:
> +		vbar = __vcpu_read_sys_reg(vcpu, VBAR_EL2);
> +		sctlr = __vcpu_read_sys_reg(vcpu, SCTLR_EL2);
> +		__vcpu_write_sys_reg(vcpu, *vcpu_pc(vcpu), ELR_EL2);
> +		break;
>  	default:
>  		/* Don't do that */
>  		BUG();
> @@ -149,7 +167,7 @@ static void enter_exception64(struct kvm_vcpu *vcpu, unsigned long target_mode,
>  	new |= target_mode;
>  
>  	*vcpu_cpsr(vcpu) = new;
> -	__vcpu_write_spsr(vcpu, old);
> +	__vcpu_write_spsr(vcpu, target_mode, old);
>  }
>  
>  /*
> @@ -320,11 +338,22 @@ static void kvm_inject_exception(struct kvm_vcpu *vcpu)
>  		      KVM_ARM64_EXCEPT_AA64_EL1):
>  			enter_exception64(vcpu, PSR_MODE_EL1h, except_type_sync);
>  			break;
> +
> +		case (KVM_ARM64_EXCEPT_AA64_ELx_SYNC |
> +		      KVM_ARM64_EXCEPT_AA64_EL2):
> +			enter_exception64(vcpu, PSR_MODE_EL2h, except_type_sync);
> +			break;
> +
> +		case (KVM_ARM64_EXCEPT_AA64_ELx_IRQ |
> +		      KVM_ARM64_EXCEPT_AA64_EL2):
> +			enter_exception64(vcpu, PSR_MODE_EL2h, except_type_irq);
> +			break;
> +
>  		default:
>  			/*
> -			 * Only EL1_SYNC makes sense so far, EL2_{SYNC,IRQ}
> -			 * will be implemented at some point. Everything
> -			 * else gets silently ignored.
> +			 * Only EL1_SYNC and EL2_{SYNC,IRQ} makes
> +			 * sense so far. Everything else gets silently
> +			 * ignored.
>  			 */
>  			break;
>  		}
> diff --git a/arch/arm64/kvm/inject_fault.c b/arch/arm64/kvm/inject_fault.c
> index b47df73e98d7..81ceee6998cc 100644
> --- a/arch/arm64/kvm/inject_fault.c
> +++ b/arch/arm64/kvm/inject_fault.c
> @@ -12,19 +12,58 @@
>  
>  #include <linux/kvm_host.h>
>  #include <asm/kvm_emulate.h>
> +#include <asm/kvm_nested.h>
>  #include <asm/esr.h>
>  
> +static void pend_sync_exception(struct kvm_vcpu *vcpu)
> +{
> +	vcpu->arch.flags |= (KVM_ARM64_EXCEPT_AA64_ELx_SYNC	|
> +			     KVM_ARM64_PENDING_EXCEPTION);
> +
> +	/* If not nesting, EL1 is the only possible exception target */
> +	if (likely(!vcpu_has_nv(vcpu))) {
> +		vcpu->arch.flags |= KVM_ARM64_EXCEPT_AA64_EL1;
> +		return;
> +	}
> +
> +	/*
> +	 * With NV, we need to pick between EL1 and EL2. Note that we
> +	 * never deal with a nesting exception here, hence never
> +	 * changing context, and the exception itself can be delayed
> +	 * until the next entry.
> +	 */
> +	switch(*vcpu_cpsr(vcpu) & PSR_MODE_MASK) {
> +	case PSR_MODE_EL2h:
> +	case PSR_MODE_EL2t:
> +		vcpu->arch.flags |= KVM_ARM64_EXCEPT_AA64_EL2;
> +		break;
> +	case PSR_MODE_EL1h:
> +	case PSR_MODE_EL1t:
> +		vcpu->arch.flags |= KVM_ARM64_EXCEPT_AA64_EL1;
> +		break;
> +	case PSR_MODE_EL0t:
> +		if (vcpu_el2_tge_is_set(vcpu))
> +			vcpu->arch.flags |= KVM_ARM64_EXCEPT_AA64_EL2;
> +		else
> +			vcpu->arch.flags |= KVM_ARM64_EXCEPT_AA64_EL1;
> +		break;
> +	default:
> +		BUG();
> +	}
> +}
> +
> +static bool match_target_el(struct kvm_vcpu *vcpu, unsigned long target)
> +{
> +	return (vcpu->arch.flags & KVM_ARM64_EXCEPT_AA64_EL_MASK) == target;
> +}
> +
>  static void inject_abt64(struct kvm_vcpu *vcpu, bool is_iabt, unsigned long addr)
>  {
>  	unsigned long cpsr = *vcpu_cpsr(vcpu);
>  	bool is_aarch32 = vcpu_mode_is_32bit(vcpu);
>  	u32 esr = 0;
>  
> -	vcpu->arch.flags |= (KVM_ARM64_EXCEPT_AA64_EL1		|
> -			     KVM_ARM64_EXCEPT_AA64_ELx_SYNC	|
> -			     KVM_ARM64_PENDING_EXCEPTION);
> -
> -	vcpu_write_sys_reg(vcpu, addr, FAR_EL1);
> +	pend_sync_exception(vcpu);
>  
>  	/*
>  	 * Build an {i,d}abort, depending on the level and the
> @@ -45,16 +84,22 @@ static void inject_abt64(struct kvm_vcpu *vcpu, bool is_iabt, unsigned long addr
>  	if (!is_iabt)
>  		esr |= ESR_ELx_EC_DABT_LOW << ESR_ELx_EC_SHIFT;
>  
> -	vcpu_write_sys_reg(vcpu, esr | ESR_ELx_FSC_EXTABT, ESR_EL1);
> +	esr |= ESR_ELx_FSC_EXTABT;
> +
> +	if (match_target_el(vcpu, KVM_ARM64_EXCEPT_AA64_EL1)) {
> +		vcpu_write_sys_reg(vcpu, addr, FAR_EL1);
> +		vcpu_write_sys_reg(vcpu, esr, ESR_EL1);
> +	} else {
> +		vcpu_write_sys_reg(vcpu, addr, FAR_EL2);
> +		vcpu_write_sys_reg(vcpu, esr, ESR_EL2);
> +	}
>  }
>  
>  static void inject_undef64(struct kvm_vcpu *vcpu)
>  {
>  	u32 esr = (ESR_ELx_EC_UNKNOWN << ESR_ELx_EC_SHIFT);
>  
> -	vcpu->arch.flags |= (KVM_ARM64_EXCEPT_AA64_EL1		|
> -			     KVM_ARM64_EXCEPT_AA64_ELx_SYNC	|
> -			     KVM_ARM64_PENDING_EXCEPTION);
> +	pend_sync_exception(vcpu);
>  
>  	/*
>  	 * Build an unknown exception, depending on the instruction
> @@ -63,7 +108,10 @@ static void inject_undef64(struct kvm_vcpu *vcpu)
>  	if (kvm_vcpu_trap_il_is32bit(vcpu))
>  		esr |= ESR_ELx_IL;
>  
> -	vcpu_write_sys_reg(vcpu, esr, ESR_EL1);
> +	if (match_target_el(vcpu, KVM_ARM64_EXCEPT_AA64_EL1))
> +		vcpu_write_sys_reg(vcpu, esr, ESR_EL1);
> +	else
> +		vcpu_write_sys_reg(vcpu, esr, ESR_EL2);
>  }
>  
>  #define DFSR_FSC_EXTABT_LPAE	0x10
> diff --git a/arch/arm64/kvm/trace_arm.h b/arch/arm64/kvm/trace_arm.h
> index 33e4e7dd2719..f3e46a976125 100644
> --- a/arch/arm64/kvm/trace_arm.h
> +++ b/arch/arm64/kvm/trace_arm.h
> @@ -2,6 +2,7 @@
>  #if !defined(_TRACE_ARM_ARM64_KVM_H) || defined(TRACE_HEADER_MULTI_READ)
>  #define _TRACE_ARM_ARM64_KVM_H
>  
> +#include <asm/kvm_emulate.h>
>  #include <kvm/arm_arch_timer.h>
>  #include <linux/tracepoint.h>
>  
> @@ -301,6 +302,64 @@ TRACE_EVENT(kvm_timer_emulate,
>  		  __entry->timer_idx, __entry->should_fire)
>  );
>  
> +TRACE_EVENT(kvm_nested_eret,
> +	TP_PROTO(struct kvm_vcpu *vcpu, unsigned long elr_el2,
> +		 unsigned long spsr_el2),
> +	TP_ARGS(vcpu, elr_el2, spsr_el2),
> +
> +	TP_STRUCT__entry(
> +		__field(struct kvm_vcpu *,	vcpu)
> +		__field(unsigned long,		elr_el2)
> +		__field(unsigned long,		spsr_el2)
> +		__field(unsigned long,		target_mode)
> +		__field(unsigned long,		hcr_el2)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->vcpu = vcpu;
> +		__entry->elr_el2 = elr_el2;
> +		__entry->spsr_el2 = spsr_el2;
> +		__entry->target_mode = spsr_el2 & (PSR_MODE_MASK | PSR_MODE32_BIT);
> +		__entry->hcr_el2 = __vcpu_sys_reg(vcpu, HCR_EL2);
> +	),
> +
> +	TP_printk("elr_el2: 0x%lx spsr_el2: 0x%08lx (M: %s) hcr_el2: %lx",
> +		  __entry->elr_el2, __entry->spsr_el2,
> +		  __print_symbolic(__entry->target_mode, kvm_mode_names),
> +		  __entry->hcr_el2)
> +);
> +
> +TRACE_EVENT(kvm_inject_nested_exception,
> +	TP_PROTO(struct kvm_vcpu *vcpu, u64 esr_el2, int type),
> +	TP_ARGS(vcpu, esr_el2, type),
> +
> +	TP_STRUCT__entry(
> +		__field(struct kvm_vcpu *,		vcpu)
> +		__field(unsigned long,			esr_el2)
> +		__field(int,				type)
> +		__field(unsigned long,			spsr_el2)
> +		__field(unsigned long,			pc)
> +		__field(unsigned long,			source_mode)
> +		__field(unsigned long,			hcr_el2)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->vcpu = vcpu;
> +		__entry->esr_el2 = esr_el2;
> +		__entry->type = type;
> +		__entry->spsr_el2 = *vcpu_cpsr(vcpu);
> +		__entry->pc = *vcpu_pc(vcpu);
> +		__entry->source_mode = *vcpu_cpsr(vcpu) & (PSR_MODE_MASK | PSR_MODE32_BIT);
> +		__entry->hcr_el2 = __vcpu_sys_reg(vcpu, HCR_EL2);
> +	),
> +
> +	TP_printk("%s: esr_el2 0x%lx elr_el2: 0x%lx spsr_el2: 0x%08lx (M: %s) hcr_el2: %lx",
> +		  __print_symbolic(__entry->type, kvm_exception_type_names),
> +		  __entry->esr_el2, __entry->pc, __entry->spsr_el2,
> +		  __print_symbolic(__entry->source_mode, kvm_mode_names),
> +		  __entry->hcr_el2)
> +);
> +
>  #endif /* _TRACE_ARM_ARM64_KVM_H */
>  
>  #undef TRACE_INCLUDE_PATH
> -- 
> 2.30.2
> 

^ permalink raw reply	[flat|nested] 378+ messages in thread

* Re: [PATCH v6 09/64] KVM: arm64: nv: Support virtual EL2 exceptions
@ 2022-02-02 15:23     ` Alexandru Elisei
  0 siblings, 0 replies; 378+ messages in thread
From: Alexandru Elisei @ 2022-02-02 15:23 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kernel-team, kvm, Andre Przywara, Christoffer Dall,
	Chase Conklin, kvmarm, mihai.carabas, Ganapatrao Kulkarni,
	Russell King (Oracle),
	linux-arm-kernel

Hi,

On Fri, Jan 28, 2022 at 12:18:17PM +0000, Marc Zyngier wrote:
> From: Jintack Lim <jintack.lim@linaro.org>
> 
> Support injecting exceptions and performing exception returns to and
> from virtual EL2.  This must be done entirely in software except when
> taking an exception from vEL0 to vEL2 when the virtual HCR_EL2.{E2H,TGE}
> == {1,1}  (a VHE guest hypervisor).
> 
> Reviewed-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
> Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
> [maz: switch to common exception injection framework, illegal exception
>  return handling]
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/include/asm/kvm_arm.h     |  17 +++
>  arch/arm64/include/asm/kvm_emulate.h |  10 ++
>  arch/arm64/include/asm/kvm_host.h    |   1 +
>  arch/arm64/kvm/Makefile              |   2 +-
>  arch/arm64/kvm/emulate-nested.c      | 197 +++++++++++++++++++++++++++
>  arch/arm64/kvm/hyp/exception.c       |  49 +++++--
>  arch/arm64/kvm/inject_fault.c        |  68 +++++++--
>  arch/arm64/kvm/trace_arm.h           |  59 ++++++++
>  8 files changed, 382 insertions(+), 21 deletions(-)
>  create mode 100644 arch/arm64/kvm/emulate-nested.c
> 
> diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
> index 01d47c5886dc..e6e3aae87a09 100644
> --- a/arch/arm64/include/asm/kvm_arm.h
> +++ b/arch/arm64/include/asm/kvm_arm.h
> @@ -359,4 +359,21 @@
>  #define CPACR_EL1_TTA		(1 << 28)
>  #define CPACR_EL1_DEFAULT	(CPACR_EL1_FPEN | CPACR_EL1_ZEN_EL1EN)
>  
> +#define kvm_mode_names				\
> +	{ PSR_MODE_EL0t,	"EL0t" },	\
> +	{ PSR_MODE_EL1t,	"EL1t" },	\
> +	{ PSR_MODE_EL1h,	"EL1h" },	\
> +	{ PSR_MODE_EL2t,	"EL2t" },	\
> +	{ PSR_MODE_EL2h,	"EL2h" },	\
> +	{ PSR_MODE_EL3t,	"EL3t" },	\
> +	{ PSR_MODE_EL3h,	"EL3h" },	\
> +	{ PSR_AA32_MODE_USR,	"32-bit USR" },	\
> +	{ PSR_AA32_MODE_FIQ,	"32-bit FIQ" },	\
> +	{ PSR_AA32_MODE_IRQ,	"32-bit IRQ" },	\
> +	{ PSR_AA32_MODE_SVC,	"32-bit SVC" },	\
> +	{ PSR_AA32_MODE_ABT,	"32-bit ABT" },	\
> +	{ PSR_AA32_MODE_HYP,	"32-bit HYP" },	\
> +	{ PSR_AA32_MODE_UND,	"32-bit UND" },	\
> +	{ PSR_AA32_MODE_SYS,	"32-bit SYS" }
> +
>  #endif /* __ARM64_KVM_ARM_H__ */
> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> index ea9a130c4b6a..cb9f123d26f3 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -33,6 +33,12 @@ enum exception_type {
>  	except_type_serror	= 0x180,
>  };
>  
> +#define kvm_exception_type_names		\
> +	{ except_type_sync,	"SYNC"   },	\
> +	{ except_type_irq,	"IRQ"    },	\
> +	{ except_type_fiq,	"FIQ"    },	\
> +	{ except_type_serror,	"SERROR" }
> +
>  bool kvm_condition_valid32(const struct kvm_vcpu *vcpu);
>  void kvm_skip_instr32(struct kvm_vcpu *vcpu);
>  
> @@ -43,6 +49,10 @@ void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr);
>  
>  void kvm_vcpu_wfi(struct kvm_vcpu *vcpu);
>  
> +void kvm_emulate_nested_eret(struct kvm_vcpu *vcpu);
> +int kvm_inject_nested_sync(struct kvm_vcpu *vcpu, u64 esr_el2);
> +int kvm_inject_nested_irq(struct kvm_vcpu *vcpu);
> +
>  static __always_inline bool vcpu_el1_is_32bit(struct kvm_vcpu *vcpu)
>  {
>  	return !(vcpu->arch.hcr_el2 & HCR_RW);
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 15f690c27baf..8fffe2888403 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -467,6 +467,7 @@ struct kvm_vcpu_arch {
>  #define KVM_ARM64_EXCEPT_AA64_ELx_SERR	(3 << 9)
>  #define KVM_ARM64_EXCEPT_AA64_EL1	(0 << 11)
>  #define KVM_ARM64_EXCEPT_AA64_EL2	(1 << 11)
> +#define KVM_ARM64_EXCEPT_AA64_EL_MASK	(1 << 11)
>  
>  #define KVM_ARM64_DEBUG_STATE_SAVE_SPE	(1 << 12) /* Save SPE context if active  */
>  #define KVM_ARM64_DEBUG_STATE_SAVE_TRBE	(1 << 13) /* Save TRBE context if active  */
> diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
> index 91861fd8b897..b67c4ebd72b1 100644
> --- a/arch/arm64/kvm/Makefile
> +++ b/arch/arm64/kvm/Makefile
> @@ -14,7 +14,7 @@ kvm-y += arm.o mmu.o mmio.o psci.o hypercalls.o pvtime.o \
>  	 inject_fault.o va_layout.o handle_exit.o \
>  	 guest.o debug.o reset.o sys_regs.o \
>  	 vgic-sys-reg-v3.o fpsimd.o pmu.o pkvm.o \
> -	 arch_timer.o trng.o\
> +	 arch_timer.o trng.o emulate-nested.o \
>  	 vgic/vgic.o vgic/vgic-init.o \
>  	 vgic/vgic-irqfd.o vgic/vgic-v2.o \
>  	 vgic/vgic-v3.o vgic/vgic-v4.o \
> diff --git a/arch/arm64/kvm/emulate-nested.c b/arch/arm64/kvm/emulate-nested.c
> new file mode 100644
> index 000000000000..f52cd4458947
> --- /dev/null
> +++ b/arch/arm64/kvm/emulate-nested.c
> @@ -0,0 +1,197 @@

Looks like this line:

// SPDX-License-Identifier: GPL-2.0-only

is missing.
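
With the tag added, the top of the new file would then read (the
copyright lines are unchanged from the hunk below):

// SPDX-License-Identifier: GPL-2.0-only
/*
 * Copyright (C) 2016 - Linaro and Columbia University
 * Author: Jintack Lim <jintack.lim@linaro.org>
 */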

> +/*
> + * Copyright (C) 2016 - Linaro and Columbia University
> + * Author: Jintack Lim <jintack.lim@linaro.org>
> + */
> +
> +#include <linux/kvm.h>
> +#include <linux/kvm_host.h>
> +
> +#include <asm/kvm_emulate.h>
> +#include <asm/kvm_nested.h>
> +
> +#include "hyp/include/hyp/adjust_pc.h"
> +
> +#include "trace.h"
> +
> +static u64 kvm_check_illegal_exception_return(struct kvm_vcpu *vcpu, u64 spsr)
> +{
> +	u64 mode = spsr & PSR_MODE_MASK;
> +
> +	/*
> +	 * Possible causes for an Illegal Exception Return from EL2:
> +	 * - trying to return to EL3
> +	 * - trying to return to a 32bit EL
> +	 * - trying to return to EL1 with HCR_EL2.TGE set
> +	 */
> +	if (mode == PSR_MODE_EL3t || mode == PSR_MODE_EL3h ||
> +	    spsr & PSR_MODE32_BIT ||

I take it KVM will not allow an L1 hypervisor to run EL1 or EL0 in 32-bit mode?

> +	    (vcpu_el2_tge_is_set(vcpu) && (mode == PSR_MODE_EL1t ||
> +					   mode == PSR_MODE_EL1h))) {

I think these checks should also be added:

"A return where the value of the saved process state M[4] bit is 0, indicating a
return to AArch64 state, and one of the following is true:

- The M[1] bit is 1.
- The M[3:0] bits are 0b0001.
- The Exception level being returned to is using AArch32 state, as programmed by
  the SCR_EL3.RW or HCR_EL2.RW bits, or as configured from reset."
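
Something along these lines, perhaps (a rough sketch, not taken from
the patch; using the virtual HCR_EL2.RW bit to decide whether EL1 is
AArch32 is an assumption):

	/* Extra AArch64 illegal-return cases, on top of the ones above */
	if (!(spsr & PSR_MODE32_BIT) &&
	    ((mode & BIT(1)) ||				/* M[1] == 1 */
	     mode == 0b0001 ||				/* M[3:0] == 0b0001 */
	     ((mode == PSR_MODE_EL1t || mode == PSR_MODE_EL1h) &&
	      !(__vcpu_sys_reg(vcpu, HCR_EL2) & HCR_RW))))	/* EL1 is AArch32 */
		spsr |= PSR_IL_BIT;	/* or fold into the existing block */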

Thanks,
Alex

> +		/*
> +		 * The guest is playing with our nerves. Preserve EL, SP,
> +		 * masks, flags from the existing PSTATE, and set IL.
> +		 * The HW will then generate an Illegal State Exception
> +		 * immediately after ERET.
> +		 */
> +		spsr = *vcpu_cpsr(vcpu);
> +
> +		spsr &= (PSR_D_BIT | PSR_A_BIT | PSR_I_BIT | PSR_F_BIT |
> +			 PSR_N_BIT | PSR_Z_BIT | PSR_C_BIT | PSR_V_BIT |
> +			 PSR_MODE_MASK | PSR_MODE32_BIT);
> +		spsr |= PSR_IL_BIT;
> +	}
> +
> +	return spsr;
> +}
> +
> +void kvm_emulate_nested_eret(struct kvm_vcpu *vcpu)
> +{
> +	u64 spsr, elr, mode;
> +	bool direct_eret;
> +
> +	/*
> +	 * Going through the whole put/load motions is a waste of time
> +	 * if this is a VHE guest hypervisor returning to its own
> +	 * userspace, or the hypervisor performing a local exception
> +	 * return. No need to save/restore registers, no need to
> +	 * switch S2 MMU. Just do the canonical ERET.
> +	 */
> +	spsr = vcpu_read_sys_reg(vcpu, SPSR_EL2);
> +	spsr = kvm_check_illegal_exception_return(vcpu, spsr);
> +
> +	mode = spsr & (PSR_MODE_MASK | PSR_MODE32_BIT);
> +
> +	direct_eret  = (mode == PSR_MODE_EL0t &&
> +			vcpu_el2_e2h_is_set(vcpu) &&
> +			vcpu_el2_tge_is_set(vcpu));
> +	direct_eret |= (mode == PSR_MODE_EL2h || mode == PSR_MODE_EL2t);
> +
> +	if (direct_eret) {
> +		*vcpu_pc(vcpu) = vcpu_read_sys_reg(vcpu, ELR_EL2);
> +		*vcpu_cpsr(vcpu) = spsr;
> +		trace_kvm_nested_eret(vcpu, *vcpu_pc(vcpu), spsr);
> +		return;
> +	}
> +
> +	preempt_disable();
> +	kvm_arch_vcpu_put(vcpu);
> +
> +	elr = __vcpu_sys_reg(vcpu, ELR_EL2);
> +
> +	trace_kvm_nested_eret(vcpu, elr, spsr);
> +
> +	/*
> +	 * Note that the current exception level is always the virtual EL2,
> +	 * since we set HCR_EL2.NV bit only when entering the virtual EL2.
> +	 */
> +	*vcpu_pc(vcpu) = elr;
> +	*vcpu_cpsr(vcpu) = spsr;
> +
> +	kvm_arch_vcpu_load(vcpu, smp_processor_id());
> +	preempt_enable();
> +}
> +
> +static void kvm_inject_el2_exception(struct kvm_vcpu *vcpu, u64 esr_el2,
> +				     enum exception_type type)
> +{
> +	trace_kvm_inject_nested_exception(vcpu, esr_el2, type);
> +
> +	switch (type) {
> +	case except_type_sync:
> +		vcpu->arch.flags |= KVM_ARM64_EXCEPT_AA64_ELx_SYNC;
> +		break;
> +	case except_type_irq:
> +		vcpu->arch.flags |= KVM_ARM64_EXCEPT_AA64_ELx_IRQ;
> +		break;
> +	default:
> +		WARN_ONCE(1, "Unsupported EL2 exception injection %d\n", type);
> +	}
> +
> +	vcpu->arch.flags |= (KVM_ARM64_EXCEPT_AA64_EL2		|
> +			     KVM_ARM64_PENDING_EXCEPTION);
> +
> +	vcpu_write_sys_reg(vcpu, esr_el2, ESR_EL2);
> +}
> +
> +/*
> + * Emulate taking an exception to EL2.
> + * See ARM ARM J8.1.2 AArch64.TakeException()
> + */
> +static int kvm_inject_nested(struct kvm_vcpu *vcpu, u64 esr_el2,
> +			     enum exception_type type)
> +{
> +	u64 pstate, mode;
> +	bool direct_inject;
> +
> +	if (!vcpu_has_nv(vcpu)) {
> +		kvm_err("Unexpected call to %s for the non-nesting configuration\n",
> +				__func__);
> +		return -EINVAL;
> +	}
> +
> +	/*
> +	 * As for ERET, we can avoid doing too much on the injection path by
> +	 * checking that we either took the exception from a VHE host
> +	 * userspace or from vEL2. In these cases, there is no change in
> +	 * translation regime (or anything else), so let's do as little as
> +	 * possible.
> +	 */
> +	pstate = *vcpu_cpsr(vcpu);
> +	mode = pstate & (PSR_MODE_MASK | PSR_MODE32_BIT);
> +
> +	direct_inject  = (mode == PSR_MODE_EL0t &&
> +			  vcpu_el2_e2h_is_set(vcpu) &&
> +			  vcpu_el2_tge_is_set(vcpu));
> +	direct_inject |= (mode == PSR_MODE_EL2h || mode == PSR_MODE_EL2t);
> +
> +	if (direct_inject) {
> +		kvm_inject_el2_exception(vcpu, esr_el2, type);
> +		return 1;
> +	}
> +
> +	preempt_disable();
> +	kvm_arch_vcpu_put(vcpu);
> +
> +	kvm_inject_el2_exception(vcpu, esr_el2, type);
> +
> +	/*
> +	 * A hard requirement is that a switch between EL1 and EL2
> +	 * contexts has to happen between a put/load, so that we can
> +	 * pick the correct timer and interrupt configuration, among
> +	 * other things.
> +	 *
> +	 * Make sure the exception actually took place before we load
> +	 * the new context.
> +	 */
> +	__kvm_adjust_pc(vcpu);
> +
> +	kvm_arch_vcpu_load(vcpu, smp_processor_id());
> +	preempt_enable();
> +
> +	return 1;
> +}
> +
> +int kvm_inject_nested_sync(struct kvm_vcpu *vcpu, u64 esr_el2)
> +{
> +	return kvm_inject_nested(vcpu, esr_el2, except_type_sync);
> +}
> +
> +int kvm_inject_nested_irq(struct kvm_vcpu *vcpu)
> +{
> +	/*
> +	 * Do not inject an irq if the:
> +	 *  - Current exception level is EL2, and
> +	 *  - virtual HCR_EL2.TGE == 0
> +	 *  - virtual HCR_EL2.IMO == 0
> +	 *
> +	 * See Table D1-17 "Physical interrupt target and masking when EL3 is
> +	 * not implemented and EL2 is implemented" in ARM DDI 0487C.a.
> +	 */
> +
> +	if (vcpu_is_el2(vcpu) && !vcpu_el2_tge_is_set(vcpu) &&
> +	    !(__vcpu_sys_reg(vcpu, HCR_EL2) & HCR_IMO))
> +		return 1;
> +
> +	/* esr_el2 value doesn't matter for exits due to irqs. */
> +	return kvm_inject_nested(vcpu, 0, except_type_irq);
> +}
> diff --git a/arch/arm64/kvm/hyp/exception.c b/arch/arm64/kvm/hyp/exception.c
> index 0418399e0a20..93f9c6f97376 100644
> --- a/arch/arm64/kvm/hyp/exception.c
> +++ b/arch/arm64/kvm/hyp/exception.c
> @@ -13,6 +13,7 @@
>  #include <hyp/adjust_pc.h>
>  #include <linux/kvm_host.h>
>  #include <asm/kvm_emulate.h>
> +#include <asm/kvm_nested.h>
>  
>  #if !defined (__KVM_NVHE_HYPERVISOR__) && !defined (__KVM_VHE_HYPERVISOR__)
>  #error Hypervisor code only!
> @@ -22,7 +23,9 @@ static inline u64 __vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg)
>  {
>  	u64 val;
>  
> -	if (__vcpu_read_sys_reg_from_cpu(reg, &val))
> +	if (unlikely(vcpu_has_nv(vcpu)))
> +		return vcpu_read_sys_reg(vcpu, reg);
> +	else if (__vcpu_read_sys_reg_from_cpu(reg, &val))
>  		return val;
>  
>  	return __vcpu_sys_reg(vcpu, reg);
> @@ -30,14 +33,24 @@ static inline u64 __vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg)
>  
>  static inline void __vcpu_write_sys_reg(struct kvm_vcpu *vcpu, u64 val, int reg)
>  {
> -	if (__vcpu_write_sys_reg_to_cpu(val, reg))
> -		return;
> -
> -	 __vcpu_sys_reg(vcpu, reg) = val;
> +	if (unlikely(vcpu_has_nv(vcpu)))
> +		vcpu_write_sys_reg(vcpu, val, reg);
> +	else if (!__vcpu_write_sys_reg_to_cpu(val, reg))
> +		__vcpu_sys_reg(vcpu, reg) = val;
>  }
>  
> -static void __vcpu_write_spsr(struct kvm_vcpu *vcpu, u64 val)
> +static void __vcpu_write_spsr(struct kvm_vcpu *vcpu, unsigned long target_mode,
> +			      u64 val)
>  {
> +	if (unlikely(vcpu_has_nv(vcpu))) {
> +		if (target_mode == PSR_MODE_EL1h)
> +			vcpu_write_sys_reg(vcpu, val, SPSR_EL1);
> +		else
> +			vcpu_write_sys_reg(vcpu, val, SPSR_EL2);
> +
> +		return;
> +	}
> +
>  	write_sysreg_el1(val, SYS_SPSR);
>  }
>  
> @@ -97,6 +110,11 @@ static void enter_exception64(struct kvm_vcpu *vcpu, unsigned long target_mode,
>  		sctlr = __vcpu_read_sys_reg(vcpu, SCTLR_EL1);
>  		__vcpu_write_sys_reg(vcpu, *vcpu_pc(vcpu), ELR_EL1);
>  		break;
> +	case PSR_MODE_EL2h:
> +		vbar = __vcpu_read_sys_reg(vcpu, VBAR_EL2);
> +		sctlr = __vcpu_read_sys_reg(vcpu, SCTLR_EL2);
> +		__vcpu_write_sys_reg(vcpu, *vcpu_pc(vcpu), ELR_EL2);
> +		break;
>  	default:
>  		/* Don't do that */
>  		BUG();
> @@ -149,7 +167,7 @@ static void enter_exception64(struct kvm_vcpu *vcpu, unsigned long target_mode,
>  	new |= target_mode;
>  
>  	*vcpu_cpsr(vcpu) = new;
> -	__vcpu_write_spsr(vcpu, old);
> +	__vcpu_write_spsr(vcpu, target_mode, old);
>  }
>  
>  /*
> @@ -320,11 +338,22 @@ static void kvm_inject_exception(struct kvm_vcpu *vcpu)
>  		      KVM_ARM64_EXCEPT_AA64_EL1):
>  			enter_exception64(vcpu, PSR_MODE_EL1h, except_type_sync);
>  			break;
> +
> +		case (KVM_ARM64_EXCEPT_AA64_ELx_SYNC |
> +		      KVM_ARM64_EXCEPT_AA64_EL2):
> +			enter_exception64(vcpu, PSR_MODE_EL2h, except_type_sync);
> +			break;
> +
> +		case (KVM_ARM64_EXCEPT_AA64_ELx_IRQ |
> +		      KVM_ARM64_EXCEPT_AA64_EL2):
> +			enter_exception64(vcpu, PSR_MODE_EL2h, except_type_irq);
> +			break;
> +
>  		default:
>  			/*
> -			 * Only EL1_SYNC makes sense so far, EL2_{SYNC,IRQ}
> -			 * will be implemented at some point. Everything
> -			 * else gets silently ignored.
> +			 * Only EL1_SYNC and EL2_{SYNC,IRQ} make
> +			 * sense so far. Everything else gets silently
> +			 * ignored.
>  			 */
>  			break;
>  		}
> diff --git a/arch/arm64/kvm/inject_fault.c b/arch/arm64/kvm/inject_fault.c
> index b47df73e98d7..81ceee6998cc 100644
> --- a/arch/arm64/kvm/inject_fault.c
> +++ b/arch/arm64/kvm/inject_fault.c
> @@ -12,19 +12,58 @@
>  
>  #include <linux/kvm_host.h>
>  #include <asm/kvm_emulate.h>
> +#include <asm/kvm_nested.h>
>  #include <asm/esr.h>
>  
> +static void pend_sync_exception(struct kvm_vcpu *vcpu)
> +{
> +	vcpu->arch.flags |= (KVM_ARM64_EXCEPT_AA64_ELx_SYNC	|
> +			     KVM_ARM64_PENDING_EXCEPTION);
> +
> +	/* If not nesting, EL1 is the only possible exception target */
> +	if (likely(!vcpu_has_nv(vcpu))) {
> +		vcpu->arch.flags |= KVM_ARM64_EXCEPT_AA64_EL1;
> +		return;
> +	}
> +
> +	/*
> +	 * With NV, we need to pick between EL1 and EL2. Note that we
> +	 * never deal with a nesting exception here, hence never
> +	 * changing context, and the exception itself can be delayed
> +	 * until the next entry.
> +	 */
> +	switch(*vcpu_cpsr(vcpu) & PSR_MODE_MASK) {
> +	case PSR_MODE_EL2h:
> +	case PSR_MODE_EL2t:
> +		vcpu->arch.flags |= KVM_ARM64_EXCEPT_AA64_EL2;
> +		break;
> +	case PSR_MODE_EL1h:
> +	case PSR_MODE_EL1t:
> +		vcpu->arch.flags |= KVM_ARM64_EXCEPT_AA64_EL1;
> +		break;
> +	case PSR_MODE_EL0t:
> +		if (vcpu_el2_tge_is_set(vcpu))
> +			vcpu->arch.flags |= KVM_ARM64_EXCEPT_AA64_EL2;
> +		else
> +			vcpu->arch.flags |= KVM_ARM64_EXCEPT_AA64_EL1;
> +		break;
> +	default:
> +		BUG();
> +	}
> +}
> +
> +static bool match_target_el(struct kvm_vcpu *vcpu, unsigned long target)
> +{
> +	return (vcpu->arch.flags & KVM_ARM64_EXCEPT_AA64_EL_MASK) == target;
> +}
> +
>  static void inject_abt64(struct kvm_vcpu *vcpu, bool is_iabt, unsigned long addr)
>  {
>  	unsigned long cpsr = *vcpu_cpsr(vcpu);
>  	bool is_aarch32 = vcpu_mode_is_32bit(vcpu);
>  	u32 esr = 0;
>  
> -	vcpu->arch.flags |= (KVM_ARM64_EXCEPT_AA64_EL1		|
> -			     KVM_ARM64_EXCEPT_AA64_ELx_SYNC	|
> -			     KVM_ARM64_PENDING_EXCEPTION);
> -
> -	vcpu_write_sys_reg(vcpu, addr, FAR_EL1);
> +	pend_sync_exception(vcpu);
>  
>  	/*
>  	 * Build an {i,d}abort, depending on the level and the
> @@ -45,16 +84,22 @@ static void inject_abt64(struct kvm_vcpu *vcpu, bool is_iabt, unsigned long addr
>  	if (!is_iabt)
>  		esr |= ESR_ELx_EC_DABT_LOW << ESR_ELx_EC_SHIFT;
>  
> -	vcpu_write_sys_reg(vcpu, esr | ESR_ELx_FSC_EXTABT, ESR_EL1);
> +	esr |= ESR_ELx_FSC_EXTABT;
> +
> +	if (match_target_el(vcpu, KVM_ARM64_EXCEPT_AA64_EL1)) {
> +		vcpu_write_sys_reg(vcpu, addr, FAR_EL1);
> +		vcpu_write_sys_reg(vcpu, esr, ESR_EL1);
> +	} else {
> +		vcpu_write_sys_reg(vcpu, addr, FAR_EL2);
> +		vcpu_write_sys_reg(vcpu, esr, ESR_EL2);
> +	}
>  }
>  
>  static void inject_undef64(struct kvm_vcpu *vcpu)
>  {
>  	u32 esr = (ESR_ELx_EC_UNKNOWN << ESR_ELx_EC_SHIFT);
>  
> -	vcpu->arch.flags |= (KVM_ARM64_EXCEPT_AA64_EL1		|
> -			     KVM_ARM64_EXCEPT_AA64_ELx_SYNC	|
> -			     KVM_ARM64_PENDING_EXCEPTION);
> +	pend_sync_exception(vcpu);
>  
>  	/*
>  	 * Build an unknown exception, depending on the instruction
> @@ -63,7 +108,10 @@ static void inject_undef64(struct kvm_vcpu *vcpu)
>  	if (kvm_vcpu_trap_il_is32bit(vcpu))
>  		esr |= ESR_ELx_IL;
>  
> -	vcpu_write_sys_reg(vcpu, esr, ESR_EL1);
> +	if (match_target_el(vcpu, KVM_ARM64_EXCEPT_AA64_EL1))
> +		vcpu_write_sys_reg(vcpu, esr, ESR_EL1);
> +	else
> +		vcpu_write_sys_reg(vcpu, esr, ESR_EL2);
>  }
>  
>  #define DFSR_FSC_EXTABT_LPAE	0x10
> diff --git a/arch/arm64/kvm/trace_arm.h b/arch/arm64/kvm/trace_arm.h
> index 33e4e7dd2719..f3e46a976125 100644
> --- a/arch/arm64/kvm/trace_arm.h
> +++ b/arch/arm64/kvm/trace_arm.h
> @@ -2,6 +2,7 @@
>  #if !defined(_TRACE_ARM_ARM64_KVM_H) || defined(TRACE_HEADER_MULTI_READ)
>  #define _TRACE_ARM_ARM64_KVM_H
>  
> +#include <asm/kvm_emulate.h>
>  #include <kvm/arm_arch_timer.h>
>  #include <linux/tracepoint.h>
>  
> @@ -301,6 +302,64 @@ TRACE_EVENT(kvm_timer_emulate,
>  		  __entry->timer_idx, __entry->should_fire)
>  );
>  
> +TRACE_EVENT(kvm_nested_eret,
> +	TP_PROTO(struct kvm_vcpu *vcpu, unsigned long elr_el2,
> +		 unsigned long spsr_el2),
> +	TP_ARGS(vcpu, elr_el2, spsr_el2),
> +
> +	TP_STRUCT__entry(
> +		__field(struct kvm_vcpu *,	vcpu)
> +		__field(unsigned long,		elr_el2)
> +		__field(unsigned long,		spsr_el2)
> +		__field(unsigned long,		target_mode)
> +		__field(unsigned long,		hcr_el2)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->vcpu = vcpu;
> +		__entry->elr_el2 = elr_el2;
> +		__entry->spsr_el2 = spsr_el2;
> +		__entry->target_mode = spsr_el2 & (PSR_MODE_MASK | PSR_MODE32_BIT);
> +		__entry->hcr_el2 = __vcpu_sys_reg(vcpu, HCR_EL2);
> +	),
> +
> +	TP_printk("elr_el2: 0x%lx spsr_el2: 0x%08lx (M: %s) hcr_el2: %lx",
> +		  __entry->elr_el2, __entry->spsr_el2,
> +		  __print_symbolic(__entry->target_mode, kvm_mode_names),
> +		  __entry->hcr_el2)
> +);
> +
> +TRACE_EVENT(kvm_inject_nested_exception,
> +	TP_PROTO(struct kvm_vcpu *vcpu, u64 esr_el2, int type),
> +	TP_ARGS(vcpu, esr_el2, type),
> +
> +	TP_STRUCT__entry(
> +		__field(struct kvm_vcpu *,		vcpu)
> +		__field(unsigned long,			esr_el2)
> +		__field(int,				type)
> +		__field(unsigned long,			spsr_el2)
> +		__field(unsigned long,			pc)
> +		__field(unsigned long,			source_mode)
> +		__field(unsigned long,			hcr_el2)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->vcpu = vcpu;
> +		__entry->esr_el2 = esr_el2;
> +		__entry->type = type;
> +		__entry->spsr_el2 = *vcpu_cpsr(vcpu);
> +		__entry->pc = *vcpu_pc(vcpu);
> +		__entry->source_mode = *vcpu_cpsr(vcpu) & (PSR_MODE_MASK | PSR_MODE32_BIT);
> +		__entry->hcr_el2 = __vcpu_sys_reg(vcpu, HCR_EL2);
> +	),
> +
> +	TP_printk("%s: esr_el2 0x%lx elr_el2: 0x%lx spsr_el2: 0x%08lx (M: %s) hcr_el2: %lx",
> +		  __print_symbolic(__entry->type, kvm_exception_type_names),
> +		  __entry->esr_el2, __entry->pc, __entry->spsr_el2,
> +		  __print_symbolic(__entry->source_mode, kvm_mode_names),
> +		  __entry->hcr_el2)
> +);
> +
>  #endif /* _TRACE_ARM_ARM64_KVM_H */
>  
>  #undef TRACE_INCLUDE_PATH
> -- 
> 2.30.2
> 

^ permalink raw reply	[flat|nested] 378+ messages in thread

* Re: [PATCH v6 12/64] KVM: arm64: nv: Add non-VHE-EL2->EL1 translation helpers
  2022-01-28 12:18   ` Marc Zyngier
  (?)
@ 2022-02-02 17:08     ` Alexandru Elisei
  -1 siblings, 0 replies; 378+ messages in thread
From: Alexandru Elisei @ 2022-02-02 17:08 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, Andre Przywara, Christoffer Dall,
	Jintack Lim, Haibo Xu, Ganapatrao Kulkarni, Chase Conklin,
	Russell King (Oracle),
	James Morse, Suzuki K Poulose, karl.heubaum, mihai.carabas,
	miguel.luis, kernel-team

Hi Marc,

On Fri, Jan 28, 2022 at 12:18:20PM +0000, Marc Zyngier wrote:
> Some EL2 system registers immediately affect the current execution
> of the system, so we need to use their respective EL1 counterparts.
> For this we need to define a mapping between the two. In general,
> this only affects non-VHE guest hypervisors, as VHE system registers
> are compatible with the EL1 counterparts.
> 
> These helpers will get used in subsequent patches.
> 
> Co-developed-by: Andre Przywara <andre.przywara@arm.com>
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/include/asm/kvm_nested.h | 54 +++++++++++++++++++++++++++++
>  1 file changed, 54 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
> index fd601ea68d13..5a85be6d8eb3 100644
> --- a/arch/arm64/include/asm/kvm_nested.h
> +++ b/arch/arm64/include/asm/kvm_nested.h
> @@ -2,6 +2,7 @@
>  #ifndef __ARM64_KVM_NESTED_H
>  #define __ARM64_KVM_NESTED_H
>  
> +#include <linux/bitfield.h>
>  #include <linux/kvm_host.h>
>  
>  static inline bool vcpu_has_nv(const struct kvm_vcpu *vcpu)
> @@ -11,4 +12,57 @@ static inline bool vcpu_has_nv(const struct kvm_vcpu *vcpu)
>  		test_bit(KVM_ARM_VCPU_HAS_EL2, vcpu->arch.features));
>  }
>  
> +/* Translation helpers from non-VHE EL2 to EL1 */
> +static inline u64 tcr_el2_ps_to_tcr_el1_ips(u64 tcr_el2)
> +{
> +	return (u64)FIELD_GET(TCR_EL2_PS_MASK, tcr_el2) << TCR_IPS_SHIFT;
> +}
> +
> +static inline u64 translate_tcr_el2_to_tcr_el1(u64 tcr)
> +{
> +	return TCR_EPD1_MASK |				/* disable TTBR1_EL1 */
> +	       ((tcr & TCR_EL2_TBI) ? TCR_TBI0 : 0) |
> +	       tcr_el2_ps_to_tcr_el1_ips(tcr) |
> +	       (tcr & TCR_EL2_TG0_MASK) |
> +	       (tcr & TCR_EL2_ORGN0_MASK) |
> +	       (tcr & TCR_EL2_IRGN0_MASK) |
> +	       (tcr & TCR_EL2_T0SZ_MASK);
> +}
> +
> +static inline u64 translate_cptr_el2_to_cpacr_el1(u64 cptr_el2)
> +{
> +	u64 cpacr_el1 = 0;
> +
> +	if (cptr_el2 & CPTR_EL2_TTA)
> +		cpacr_el1 |= CPACR_EL1_TTA;
> +	if (!(cptr_el2 & CPTR_EL2_TFP))
> +		cpacr_el1 |= CPACR_EL1_FPEN;
> +	if (!(cptr_el2 & CPTR_EL2_TZ))
> +		cpacr_el1 |= CPACR_EL1_ZEN;
> +
> +	return cpacr_el1;
> +}
> +
> +static inline u64 translate_sctlr_el2_to_sctlr_el1(u64 val)
> +{
> +	/* Only preserve the minimal set of bits we support */
> +	val &= (SCTLR_ELx_M | SCTLR_ELx_A | SCTLR_ELx_C | SCTLR_ELx_SA |
> +		SCTLR_ELx_I | SCTLR_ELx_IESB | SCTLR_ELx_WXN | SCTLR_ELx_EE);

Checked that the bit positions are the same between SCTLR_EL2 and SCTLR_EL1. I
think the IESB bit (bit 21) should be after the WXN bit (bit 19) to be
consistent; doesn't really matter either way.
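
That is, purely cosmetic:

	val &= (SCTLR_ELx_M | SCTLR_ELx_A | SCTLR_ELx_C | SCTLR_ELx_SA |
		SCTLR_ELx_I | SCTLR_ELx_WXN | SCTLR_ELx_IESB | SCTLR_ELx_EE);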

> +	val |= SCTLR_EL1_RES1;
> +
> +	return val;
> +}
> +
> +static inline u64 translate_ttbr0_el2_to_ttbr0_el1(u64 ttbr0)
> +{
> +	/* Clear the ASID field */
> +	return ttbr0 & ~GENMASK_ULL(63, 48);
> +}
> +
> +static inline u64 translate_cnthctl_el2_to_cntkctl_el1(u64 cnthctl)
> +{
> +	return ((FIELD_GET(CNTHCTL_EL1PCTEN | CNTHCTL_EL1PCEN, cnthctl) << 10) |
> +		(cnthctl & (CNTHCTL_EVNTI | CNTHCTL_EVNTDIR | CNTHCTL_EVNTEN)));

I asked about the field positions in the previous series and this is what you
replied:

> It's a classic one. Remember that we are running VHE, and remapping a
> nVHE view of CNTHCTL_EL2 into the VHE view *for the guest*, and that
> these things are completely shifted around (it has the CNTKCTL_EL1
> format).
>
> For example, on nVHE, CNTHCTL_EL2.EL1PCTEN is bit 0. On nVHE, this is
> bit 10. That's why we have this shift, and that you now need some
> paracetamol.
>
> You can also look at the way we deal with the same stuff in
> kvm_timer_init_vhe()".

Here's how this function is used in vhe/sysreg-sr.c:

static void __sysreg_restore_vel2_state(struct kvm_cpu_context *ctxt)
{
	[..]
	if (__vcpu_el2_e2h_is_set(ctxt)) {
		[..]
	} else {
		[..]
		val = translate_cnthctl_el2_to_cntkctl_el1(ctxt_sys_reg(ctxt, CNTHCTL_EL2));
		write_sysreg_el1(val, SYS_CNTKCTL);
	}
	[..]
}

CNTHCTL_EL2 is a pure EL2 register. The translate function is called when guest
HCR_EL2.E2H is not set, therefore virtual CNTHCTL_EL2 has the non-VHE format.
And the result of the function is used to write to the hardware CNTKCTL_EL1
register (using the CNTKCTL_EL12 encoding), which is different from the
CNTHCTL_EL2 register. CNTKCTL_EL1 also always has the same format regardless of
the value of the HCR_EL2.E2H bit.

I don't understand what the host running with VHE has to do with the translate
function.
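
For reference, taken purely mechanically, here is what the helper
computes (a standalone illustration; the sample usage is made up):

	/* nVHE-format CNTHCTL_EL2: EL1PCTEN is bit 0, EL1PCEN is bit 1 */
	u64 cnthctl = CNTHCTL_EL1PCTEN | CNTHCTL_EL1PCEN;	/* 0x3 */
	u64 val = translate_cnthctl_el2_to_cntkctl_el1(cnthctl);

	/* FIELD_GET() extracts bits [1:0], the shift moves them to [11:10] */
	WARN_ON(val != 0xc00);

So the two EL1 physical counter/timer enable bits end up at bits 10
and 11, and the event stream controls (EVNTI/EVNTDIR/EVNTEN) stay
where they are.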

Thanks,
Alex

> +}
> +
>  #endif /* __ARM64_KVM_NESTED_H */
> -- 
> 2.30.2
> 

^ permalink raw reply	[flat|nested] 378+ messages in thread

* Re: [PATCH v6 16/64] KVM: arm64: nv: Save/Restore vEL2 sysregs
  2022-01-28 12:18   ` Marc Zyngier
  (?)
@ 2022-02-03 15:14     ` Alexandru Elisei
  -1 siblings, 0 replies; 378+ messages in thread
From: Alexandru Elisei @ 2022-02-03 15:14 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, Andre Przywara, Christoffer Dall,
	Jintack Lim, Haibo Xu, Ganapatrao Kulkarni, Chase Conklin,
	Russell King (Oracle),
	James Morse, Suzuki K Poulose, karl.heubaum, mihai.carabas,
	miguel.luis, kernel-team

Hi,

On Fri, Jan 28, 2022 at 12:18:24PM +0000, Marc Zyngier wrote:
> Whenever we need to restore the guest's system registers to the CPU, we
> now need to take care of the EL2 system registers as well. Most of them
> are accessed via traps only, but some have an immediate effect and also
> a guest running in VHE mode would expect them to be accessible via their
> EL1 encoding, which we do not trap.
> 
> For vEL2 we write the virtual EL2 registers with an identical format directly
> into their EL1 counterpart, and translate the few registers that have a
> different format for the same effect on the execution when running a
> non-VHE guest hypervisor.
> 
> Based on an initial patch from Andre Przywara, rewritten many times
> since.
> 
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h |   5 +-
>  arch/arm64/kvm/hyp/nvhe/sysreg-sr.c        |   2 +-
>  arch/arm64/kvm/hyp/vhe/sysreg-sr.c         | 125 ++++++++++++++++++++-
>  3 files changed, 127 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h b/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h
> index 7ecca8b07851..283f780f5f56 100644
> --- a/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h
> +++ b/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h
> @@ -92,9 +92,10 @@ static inline void __sysreg_restore_user_state(struct kvm_cpu_context *ctxt)
>  	write_sysreg(ctxt_sys_reg(ctxt, TPIDRRO_EL0),	tpidrro_el0);
>  }
>  
> -static inline void __sysreg_restore_el1_state(struct kvm_cpu_context *ctxt)
> +static inline void __sysreg_restore_el1_state(struct kvm_cpu_context *ctxt,
> +					      u64 mpidr)
>  {
> -	write_sysreg(ctxt_sys_reg(ctxt, MPIDR_EL1),	vmpidr_el2);
> +	write_sysreg(mpidr,				vmpidr_el2);
>  	write_sysreg(ctxt_sys_reg(ctxt, CSSELR_EL1),	csselr_el1);
>  
>  	if (has_vhe() ||
> diff --git a/arch/arm64/kvm/hyp/nvhe/sysreg-sr.c b/arch/arm64/kvm/hyp/nvhe/sysreg-sr.c
> index 29305022bc04..dba101565de3 100644
> --- a/arch/arm64/kvm/hyp/nvhe/sysreg-sr.c
> +++ b/arch/arm64/kvm/hyp/nvhe/sysreg-sr.c
> @@ -28,7 +28,7 @@ void __sysreg_save_state_nvhe(struct kvm_cpu_context *ctxt)
>  
>  void __sysreg_restore_state_nvhe(struct kvm_cpu_context *ctxt)
>  {
> -	__sysreg_restore_el1_state(ctxt);
> +	__sysreg_restore_el1_state(ctxt, ctxt_sys_reg(ctxt, MPIDR_EL1));
>  	__sysreg_restore_common_state(ctxt);
>  	__sysreg_restore_user_state(ctxt);
>  	__sysreg_restore_el2_return_state(ctxt);
> diff --git a/arch/arm64/kvm/hyp/vhe/sysreg-sr.c b/arch/arm64/kvm/hyp/vhe/sysreg-sr.c
> index 007a12dd4351..3e26a78d00c5 100644
> --- a/arch/arm64/kvm/hyp/vhe/sysreg-sr.c
> +++ b/arch/arm64/kvm/hyp/vhe/sysreg-sr.c
> @@ -13,6 +13,96 @@
>  #include <asm/kvm_asm.h>
>  #include <asm/kvm_emulate.h>
>  #include <asm/kvm_hyp.h>
> +#include <asm/kvm_nested.h>
> +
> +static void __sysreg_save_vel2_state(struct kvm_cpu_context *ctxt)
> +{
> +	/* These registers are common with EL1 */
> +	ctxt_sys_reg(ctxt, CSSELR_EL1)	= read_sysreg(csselr_el1);
> +	ctxt_sys_reg(ctxt, PAR_EL1)	= read_sysreg(par_el1);
> +	ctxt_sys_reg(ctxt, TPIDR_EL1)	= read_sysreg(tpidr_el1);
> +
> +	ctxt_sys_reg(ctxt, ESR_EL2)	= read_sysreg_el1(SYS_ESR);
> +	ctxt_sys_reg(ctxt, AFSR0_EL2)	= read_sysreg_el1(SYS_AFSR0);
> +	ctxt_sys_reg(ctxt, AFSR1_EL2)	= read_sysreg_el1(SYS_AFSR1);
> +	ctxt_sys_reg(ctxt, FAR_EL2)	= read_sysreg_el1(SYS_FAR);
> +	ctxt_sys_reg(ctxt, MAIR_EL2)	= read_sysreg_el1(SYS_MAIR);
> +	ctxt_sys_reg(ctxt, VBAR_EL2)	= read_sysreg_el1(SYS_VBAR);
> +	ctxt_sys_reg(ctxt, CONTEXTIDR_EL2) = read_sysreg_el1(SYS_CONTEXTIDR);
> +	ctxt_sys_reg(ctxt, AMAIR_EL2)	= read_sysreg_el1(SYS_AMAIR);
> +
> +	/*
> +	 * In VHE mode those registers are compatible between EL1 and EL2,
> +	 * and the guest uses the _EL1 versions on the CPU naturally.
> +	 * So we save them into their _EL2 versions here.
> +	 * For nVHE mode we trap accesses to those registers, so our
> +	 * _EL2 copy in sys_regs[] is always up-to-date and we don't need
> +	 * to save anything here.
> +	 */
> +	if (__vcpu_el2_e2h_is_set(ctxt)) {
> +		ctxt_sys_reg(ctxt, SCTLR_EL2)	= read_sysreg_el1(SYS_SCTLR);
> +		ctxt_sys_reg(ctxt, CPTR_EL2)	= read_sysreg_el1(SYS_CPACR);
> +		ctxt_sys_reg(ctxt, TTBR0_EL2)	= read_sysreg_el1(SYS_TTBR0);
> +		ctxt_sys_reg(ctxt, TTBR1_EL2)	= read_sysreg_el1(SYS_TTBR1);
> +		ctxt_sys_reg(ctxt, TCR_EL2)	= read_sysreg_el1(SYS_TCR);
> +		ctxt_sys_reg(ctxt, CNTHCTL_EL2)	= read_sysreg_el1(SYS_CNTKCTL);
> +	}
> +
> +	ctxt_sys_reg(ctxt, SP_EL2)	= read_sysreg(sp_el1);
> +	ctxt_sys_reg(ctxt, ELR_EL2)	= read_sysreg_el1(SYS_ELR);
> +	ctxt_sys_reg(ctxt, SPSR_EL2)	= __fixup_spsr_el2_read(ctxt, read_sysreg_el1(SYS_SPSR));
> +}
> +
> +static void __sysreg_restore_vel2_state(struct kvm_cpu_context *ctxt)
> +{
> +	u64 val;
> +
> +	/* These registers are common with EL1 */
> +	write_sysreg(ctxt_sys_reg(ctxt, CSSELR_EL1),	csselr_el1);
> +	write_sysreg(ctxt_sys_reg(ctxt, PAR_EL1),	par_el1);
> +	write_sysreg(ctxt_sys_reg(ctxt, TPIDR_EL1),	tpidr_el1);
> +
> +	write_sysreg(read_cpuid_id(),			vpidr_el2);

This is sneaky. The pseudocode for accessing MIDR_EL1 is:

if PSTATE.EL == EL0 then
    [..]
elsif PSTATE.EL == EL1 then
    if EL2Enabled() && (!HaveEL(EL3) || SCR_EL3.FGTEn == '1') && HFGRTR_EL2.MIDR_EL1 == '1' then
        AArch64.SystemAccessTrap(EL2, 0x18);
    elsif EL2Enabled() then
        return VPIDR_EL2;
    else
        return MIDR_EL1;
elsif PSTATE.EL == EL2 then
    return MIDR_EL1;
[..]

From the guest's point of view, it is running at virtual EL2, so a read of
MIDR_EL1 returns MIDR_EL1, not what it programmed in VPIDR_EL2.

From the host's point of view, the guest is running in hardware EL1, and a
read of MIDR_EL1 returns the hardware value of the VPIDR_EL2 register.

Because of the above, KVM programs the hardware VPIDR_EL2 with the hardware
MIDR_EL1 value, instead of the L1 hypervisor's virtual VPIDR_EL2 value.

I feel that this deserves a comment because it's not immediately obvious
what is happening, perhaps along the lines "Reading MIDR_EL1 from virtual
EL2 returns the hardware MIDR_EL1 value, not the value that the guest
programmed in virtual VPIDR_EL2".
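
Something along these lines, purely as an illustration (the wording is only a
suggestion, not code from the series):

	/*
	 * A read of MIDR_EL1 from virtual EL2 must return the hardware
	 * MIDR_EL1 value, not what the guest programmed into its virtual
	 * VPIDR_EL2, so load the physical MIDR_EL1 into VPIDR_EL2 here.
	 */
	write_sysreg(read_cpuid_id(),			vpidr_el2);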

> +	write_sysreg(ctxt_sys_reg(ctxt, MPIDR_EL1),	vmpidr_el2);

But here, the guest will always read the value that KVM computed for the VM
in reset_mpidr(), which is why KVM writes the shadow MPIDR value.

> +	write_sysreg_el1(ctxt_sys_reg(ctxt, MAIR_EL2),	SYS_MAIR);
> +	write_sysreg_el1(ctxt_sys_reg(ctxt, VBAR_EL2),	SYS_VBAR);
> +	write_sysreg_el1(ctxt_sys_reg(ctxt, CONTEXTIDR_EL2),SYS_CONTEXTIDR);
> +	write_sysreg_el1(ctxt_sys_reg(ctxt, AMAIR_EL2),	SYS_AMAIR);
> +
> +	if (__vcpu_el2_e2h_is_set(ctxt)) {
> +		/*
> +		 * In VHE mode those registers are compatible between
> +		 * EL1 and EL2.
> +		 */
> +		write_sysreg_el1(ctxt_sys_reg(ctxt, SCTLR_EL2),	SYS_SCTLR);
> +		write_sysreg_el1(ctxt_sys_reg(ctxt, CPTR_EL2),	SYS_CPACR);
> +		write_sysreg_el1(ctxt_sys_reg(ctxt, TTBR0_EL2),	SYS_TTBR0);
> +		write_sysreg_el1(ctxt_sys_reg(ctxt, TTBR1_EL2),	SYS_TTBR1);
> +		write_sysreg_el1(ctxt_sys_reg(ctxt, TCR_EL2),	SYS_TCR);
> +		write_sysreg_el1(ctxt_sys_reg(ctxt, CNTHCTL_EL2), SYS_CNTKCTL);
> +	} else {
> +		val = translate_sctlr_el2_to_sctlr_el1(ctxt_sys_reg(ctxt, SCTLR_EL2));
> +		write_sysreg_el1(val, SYS_SCTLR);
> +		val = translate_cptr_el2_to_cpacr_el1(ctxt_sys_reg(ctxt, CPTR_EL2));
> +		write_sysreg_el1(val, SYS_CPACR);
> +		val = translate_ttbr0_el2_to_ttbr0_el1(ctxt_sys_reg(ctxt, TTBR0_EL2));
> +		write_sysreg_el1(val, SYS_TTBR0);
> +		val = translate_tcr_el2_to_tcr_el1(ctxt_sys_reg(ctxt, TCR_EL2));
> +		write_sysreg_el1(val, SYS_TCR);
> +		val = translate_cnthctl_el2_to_cntkctl_el1(ctxt_sys_reg(ctxt, CNTHCTL_EL2));
> +		write_sysreg_el1(val, SYS_CNTKCTL);
> +	}
> +
> +	write_sysreg_el1(ctxt_sys_reg(ctxt, ESR_EL2),	SYS_ESR);
> +	write_sysreg_el1(ctxt_sys_reg(ctxt, AFSR0_EL2),	SYS_AFSR0);
> +	write_sysreg_el1(ctxt_sys_reg(ctxt, AFSR1_EL2),	SYS_AFSR1);
> +	write_sysreg_el1(ctxt_sys_reg(ctxt, FAR_EL2),	SYS_FAR);
> +	write_sysreg(ctxt_sys_reg(ctxt, SP_EL2),	sp_el1);
> +	write_sysreg_el1(ctxt_sys_reg(ctxt, ELR_EL2),	SYS_ELR);
> +
> +	val = __fixup_spsr_el2_write(ctxt, ctxt_sys_reg(ctxt, SPSR_EL2));
> +	write_sysreg_el1(val,	SYS_SPSR);
> +}
>  
>  /*
>   * VHE: Host and guest must save mdscr_el1 and sp_el0 (and the PC and
> @@ -65,6 +155,7 @@ void kvm_vcpu_load_sysregs_vhe(struct kvm_vcpu *vcpu)
>  {
>  	struct kvm_cpu_context *guest_ctxt = &vcpu->arch.ctxt;
>  	struct kvm_cpu_context *host_ctxt;
> +	u64 mpidr;
>  
>  	host_ctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
>  	__sysreg_save_user_state(host_ctxt);
> @@ -77,7 +168,29 @@ void kvm_vcpu_load_sysregs_vhe(struct kvm_vcpu *vcpu)
>  	 */
>  	__sysreg32_restore_state(vcpu);
>  	__sysreg_restore_user_state(guest_ctxt);
> -	__sysreg_restore_el1_state(guest_ctxt);
> +
> +	if (unlikely(__is_hyp_ctxt(guest_ctxt))) {
> +		__sysreg_restore_vel2_state(guest_ctxt);
> +	} else {
> +		if (vcpu_has_nv(vcpu)) {
> +			/*
> +			 * Only set VPIDR_EL2 for nested VMs, as this is the
> +			 * only time it changes. We'll restore the MIDR_EL1
> +			 * view on put.
> +			 */
> +			write_sysreg(ctxt_sys_reg(guest_ctxt, VPIDR_EL2), vpidr_el2);
> +
> +			/*
> +			 * As we're restoring a nested guest, set the value
> +			 * provided by the guest hypervisor.
> +			 */
> +			mpidr = ctxt_sys_reg(guest_ctxt, VMPIDR_EL2);
> +		} else {
> +			mpidr = ctxt_sys_reg(guest_ctxt, MPIDR_EL1);
> +		}
> +
> +		__sysreg_restore_el1_state(guest_ctxt, mpidr);
> +	}
>  
>  	vcpu->arch.sysregs_loaded_on_cpu = true;
>  
> @@ -103,12 +216,20 @@ void kvm_vcpu_put_sysregs_vhe(struct kvm_vcpu *vcpu)
>  	host_ctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
>  	deactivate_traps_vhe_put(vcpu);
>  
> -	__sysreg_save_el1_state(guest_ctxt);
> +	if (unlikely(__is_hyp_ctxt(guest_ctxt)))
> +		__sysreg_save_vel2_state(guest_ctxt);
> +	else
> +		__sysreg_save_el1_state(guest_ctxt);
> +
>  	__sysreg_save_user_state(guest_ctxt);
>  	__sysreg32_save_state(vcpu);
>  
>  	/* Restore host user state */
>  	__sysreg_restore_user_state(host_ctxt);
>  
> +	/* If leaving a nesting guest, restore MPIDR_EL1 default view */
> +	if (vcpu_has_nv(vcpu))
> +		write_sysreg(read_cpuid_id(),	vpidr_el2);
> +
>  	vcpu->arch.sysregs_loaded_on_cpu = false;

I compared __sysreg_{save,restore}_vel2_state() with
__sysreg_{save,restore}_el1_state(); they access the same registers. I also
checked in the Arm ARM that the registers for which KVM doesn't
differentiate between E2H set and cleared in virtual EL2 have the same
encoding regardless of the value of HCR_EL2.E2H:

Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>

Thanks,
Alex

>  }
> -- 
> 2.30.2
> 

^ permalink raw reply	[flat|nested] 378+ messages in thread

* Re: [PATCH v6 17/64] KVM: arm64: nv: Emulate PSTATE.M for a guest hypervisor
  2022-01-28 12:18   ` Marc Zyngier
  (?)
@ 2022-02-03 15:53     ` Alexandru Elisei
  -1 siblings, 0 replies; 378+ messages in thread
From: Alexandru Elisei @ 2022-02-03 15:53 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, Andre Przywara, Christoffer Dall,
	Jintack Lim, Haibo Xu, Ganapatrao Kulkarni, Chase Conklin,
	Russell King (Oracle),
	James Morse, Suzuki K Poulose, karl.heubaum, mihai.carabas,
	miguel.luis, kernel-team

Hi Marc,

On Fri, Jan 28, 2022 at 12:18:25PM +0000, Marc Zyngier wrote:
> From: Christoffer Dall <christoffer.dall@arm.com>
> 
> We can no longer blindly copy the VCPU's PSTATE into SPSR_EL2 and return
> to the guest and vice versa when taking an exception to the hypervisor,
> because we emulate virtual EL2 in EL1 and therefore have to translate
> the mode field from EL2 to EL1 and vice versa.
> 
> This requires keeping track of the state in which we enter the guest,
> for which we transiently use a dedicated flag.
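
For context only, a hedged sketch of the kind of mode translation being
described (the PSR_MODE_* constants are the kernel's; the actual hook points
and helper name in the patch may differ):

	static u64 vel2_to_hw_pstate(u64 pstate)
	{
		/* Virtual EL2 runs in hardware EL1, so rewrite PSTATE.M */
		switch (pstate & PSR_MODE_MASK) {
		case PSR_MODE_EL2t:
			return (pstate & ~PSR_MODE_MASK) | PSR_MODE_EL1t;
		case PSR_MODE_EL2h:
			return (pstate & ~PSR_MODE_MASK) | PSR_MODE_EL1h;
		default:
			return pstate;
		}
	}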

Looks good to me:

Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>

Thanks,
Alex

^ permalink raw reply	[flat|nested] 378+ messages in thread

* Re: [PATCH v6 18/64] KVM: arm64: nv: Trap EL1 VM register accesses in virtual EL2
  2022-01-28 12:18   ` Marc Zyngier
  (?)
@ 2022-02-03 17:11     ` Alexandru Elisei
  -1 siblings, 0 replies; 378+ messages in thread
From: Alexandru Elisei @ 2022-02-03 17:11 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, Andre Przywara, Christoffer Dall,
	Jintack Lim, Haibo Xu, Ganapatrao Kulkarni, Chase Conklin,
	Russell King (Oracle),
	James Morse, Suzuki K Poulose, karl.heubaum, mihai.carabas,
	miguel.luis, kernel-team

Hi Marc,

On Fri, Jan 28, 2022 at 12:18:26PM +0000, Marc Zyngier wrote:
> From: Christoffer Dall <christoffer.dall@linaro.org>
> 
> When running in virtual EL2 mode, we actually run the hardware in EL1
> and therefore have to use the EL1 registers to ensure correct operation.
> 
> By setting the HCR.TVM and HCR.TRVM we ensure that the virtual EL2 mode
> doesn't shoot itself in the foot when setting up what it believes to be
> a different mode's system register state (for example when preparing to
> switch to a VM).
> 
> We can leverage the existing sysregs infrastructure to support trapped
> accesses to these registers.
> 
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/kvm/hyp/include/hyp/switch.h |  4 +---
>  arch/arm64/kvm/hyp/nvhe/switch.c        |  2 +-
>  arch/arm64/kvm/hyp/vhe/switch.c         |  7 ++++++-
>  arch/arm64/kvm/sys_regs.c               | 19 ++++++++++++++++---
>  4 files changed, 24 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h b/arch/arm64/kvm/hyp/include/hyp/switch.h
> index 58e14f8ead23..49c3b9eb09d7 100644
> --- a/arch/arm64/kvm/hyp/include/hyp/switch.h
> +++ b/arch/arm64/kvm/hyp/include/hyp/switch.h
> @@ -110,10 +110,8 @@ static inline void __deactivate_traps_common(struct kvm_vcpu *vcpu)
>  		write_sysreg(0, pmuserenr_el0);
>  }
>  
> -static inline void ___activate_traps(struct kvm_vcpu *vcpu)
> +static inline void ___activate_traps(struct kvm_vcpu *vcpu, u64 hcr)
>  {
> -	u64 hcr = vcpu->arch.hcr_el2;
> -
>  	if (cpus_have_final_cap(ARM64_WORKAROUND_CAVIUM_TX2_219_TVM))
>  		hcr |= HCR_TVM;
>  
> diff --git a/arch/arm64/kvm/hyp/nvhe/switch.c b/arch/arm64/kvm/hyp/nvhe/switch.c
> index 6410d21d8695..61a5627fd456 100644
> --- a/arch/arm64/kvm/hyp/nvhe/switch.c
> +++ b/arch/arm64/kvm/hyp/nvhe/switch.c
> @@ -38,7 +38,7 @@ static void __activate_traps(struct kvm_vcpu *vcpu)
>  {
>  	u64 val;
>  
> -	___activate_traps(vcpu);
> +	___activate_traps(vcpu, vcpu->arch.hcr_el2);
>  	__activate_traps_common(vcpu);
>  
>  	val = vcpu->arch.cptr_el2;
> diff --git a/arch/arm64/kvm/hyp/vhe/switch.c b/arch/arm64/kvm/hyp/vhe/switch.c
> index 82ddaebe66de..6ed9e4893a02 100644
> --- a/arch/arm64/kvm/hyp/vhe/switch.c
> +++ b/arch/arm64/kvm/hyp/vhe/switch.c
> @@ -32,9 +32,14 @@ DEFINE_PER_CPU(unsigned long, kvm_hyp_vector);
>  
>  static void __activate_traps(struct kvm_vcpu *vcpu)
>  {
> +	u64 hcr = vcpu->arch.hcr_el2;
>  	u64 val;
>  
> -	___activate_traps(vcpu);
> +	/* Trap VM sysreg accesses if an EL2 guest is not using VHE. */
> +	if (vcpu_is_el2(vcpu) && !vcpu_el2_e2h_is_set(vcpu))
> +		hcr |= HCR_TVM | HCR_TRVM;
> +
> +	___activate_traps(vcpu, hcr);
>  
>  	val = read_sysreg(cpacr_el1);
>  	val |= CPACR_EL1_TTA;
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 102bc4906723..9d3520f1d17a 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -322,8 +322,15 @@ static void get_access_mask(const struct sys_reg_desc *r, u64 *mask, u64 *shift)
>  
>  /*
>   * Generic accessor for VM registers. Only called as long as HCR_TVM
> - * is set. If the guest enables the MMU, we stop trapping the VM
> - * sys_regs and leave it in complete control of the caches.
> + * is set.
> + *
> + * This is set in two cases: either (1) we're running at vEL2, or (2)
> + * we're running at EL1 and the guest has its MMU off.
> + *
> + * (1) TVM/TRVM is set, as we need to virtualise some of the VM
> + * registers for the guest hypervisor
> + * (2) Once the guest enables the MMU, we stop trapping the VM sys_regs
> + * and leave it in complete control of the caches.
>   */
>  static bool access_vm_reg(struct kvm_vcpu *vcpu,
>  			  struct sys_reg_params *p,
> @@ -332,7 +339,13 @@ static bool access_vm_reg(struct kvm_vcpu *vcpu,
>  	bool was_enabled = vcpu_has_cache_enabled(vcpu);
>  	u64 val, mask, shift;
>  
> -	BUG_ON(!p->is_write);
> +	/* We don't expect TRVM on the host */

I don't get what that means. Isn't KVM setting HCR_EL2.TRVM to trap reads?

Other than that, the patch looks good:

Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>

Thanks,
Alex

> +	BUG_ON(!vcpu_is_el2(vcpu) && !p->is_write);
> +
> +	if (!p->is_write) {
> +		p->regval = vcpu_read_sys_reg(vcpu, r->reg);
> +		return true;
> +	}
>  
>  	get_access_mask(r, &mask, &shift);
>  
> -- 
> 2.30.2
> 

^ permalink raw reply	[flat|nested] 378+ messages in thread

* Re: [PATCH v6 19/64] KVM: arm64: nv: Trap SPSR_EL1, ELR_EL1 and VBAR_EL1 from virtual EL2
  2022-01-28 12:18   ` Marc Zyngier
  (?)
@ 2022-02-03 17:27     ` Alexandru Elisei
  -1 siblings, 0 replies; 378+ messages in thread
From: Alexandru Elisei @ 2022-02-03 17:27 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, Andre Przywara, Christoffer Dall,
	Jintack Lim, Haibo Xu, Ganapatrao Kulkarni, Chase Conklin,
	Russell King (Oracle),
	James Morse, Suzuki K Poulose, karl.heubaum, mihai.carabas,
	miguel.luis, kernel-team

The previous patch:

"KVM: arm64: nv: Trap EL1 VM register accesses in virtual EL2" -> sets the trap
bits to trap EL1 VM registers.

This patch:

"KVM: arm64: nv: Trap SPSR_EL1, ELR_EL1 and VBAR_EL1 from virtual EL2" -> does
not set the trap bits to trap the registers.

Might be worth changing the subject to something like "Add accessors for
<register list here>" to reflect what the patch does.

Other than that, the patch looks good to me.

Thanks,
Alex

On Fri, Jan 28, 2022 at 12:18:27PM +0000, Marc Zyngier wrote:
> From: Jintack Lim <jintack.lim@linaro.org>
> 
> For the same reason we trap virtual memory register accesses at virtual
> EL2, we need to trap SPSR_EL1, ELR_EL1 and VBAR_EL1 accesses. ARM v8.3
> introduces the HCR_EL2.NV1 bit to be able to trap on those register
> accesses in EL1. Do not set this bit until the whole nesting support is
> completed.
> 
> Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/kvm/sys_regs.c | 29 ++++++++++++++++++++++++++++-
>  1 file changed, 28 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 9d3520f1d17a..4f2bcc1e0c25 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -1650,6 +1650,30 @@ static bool access_sp_el1(struct kvm_vcpu *vcpu,
>  	return true;
>  }
>  
> +static bool access_elr(struct kvm_vcpu *vcpu,
> +		       struct sys_reg_params *p,
> +		       const struct sys_reg_desc *r)
> +{
> +	if (p->is_write)
> +		vcpu_write_sys_reg(vcpu, p->regval, ELR_EL1);
> +	else
> +		p->regval = vcpu_read_sys_reg(vcpu, ELR_EL1);
> +
> +	return true;
> +}
> +
> +static bool access_spsr(struct kvm_vcpu *vcpu,
> +			struct sys_reg_params *p,
> +			const struct sys_reg_desc *r)
> +{
> +	if (p->is_write)
> +		__vcpu_sys_reg(vcpu, SPSR_EL1) = p->regval;
> +	else
> +		p->regval = __vcpu_sys_reg(vcpu, SPSR_EL1);
> +
> +	return true;
> +}
> +
>  static bool access_spsr_el2(struct kvm_vcpu *vcpu,
>  			    struct sys_reg_params *p,
>  			    const struct sys_reg_desc *r)
> @@ -1812,6 +1836,9 @@ static const struct sys_reg_desc sys_reg_descs[] = {
>  	PTRAUTH_KEY(APDB),
>  	PTRAUTH_KEY(APGA),
>  
> +	{ SYS_DESC(SYS_SPSR_EL1), access_spsr},
> +	{ SYS_DESC(SYS_ELR_EL1), access_elr},
> +
>  	{ SYS_DESC(SYS_AFSR0_EL1), access_vm_reg, reset_unknown, AFSR0_EL1 },
>  	{ SYS_DESC(SYS_AFSR1_EL1), access_vm_reg, reset_unknown, AFSR1_EL1 },
>  	{ SYS_DESC(SYS_ESR_EL1), access_vm_reg, reset_unknown, ESR_EL1 },
> @@ -1859,7 +1886,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
>  	{ SYS_DESC(SYS_LORC_EL1), trap_loregion },
>  	{ SYS_DESC(SYS_LORID_EL1), trap_loregion },
>  
> -	{ SYS_DESC(SYS_VBAR_EL1), NULL, reset_val, VBAR_EL1, 0 },
> +	{ SYS_DESC(SYS_VBAR_EL1), access_rw, reset_val, VBAR_EL1, 0 },
>  	{ SYS_DESC(SYS_DISR_EL1), NULL, reset_val, DISR_EL1, 0 },
>  
>  	{ SYS_DESC(SYS_ICC_IAR0_EL1), write_to_read_only },
> -- 
> 2.30.2
> 

^ permalink raw reply	[flat|nested] 378+ messages in thread

* Re: [PATCH v6 09/64] KVM: arm64: nv: Support virtual EL2 exceptions
  2022-02-02 15:23     ` Alexandru Elisei
  (?)
@ 2022-02-03 17:43       ` Marc Zyngier
  -1 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-02-03 17:43 UTC (permalink / raw)
  To: Alexandru Elisei
  Cc: linux-arm-kernel, kvmarm, kvm, Andre Przywara, Christoffer Dall,
	Jintack Lim, Haibo Xu, Ganapatrao Kulkarni, Chase Conklin,
	Russell King (Oracle),
	James Morse, Suzuki K Poulose, karl.heubaum, mihai.carabas,
	miguel.luis, kernel-team

On Wed, 02 Feb 2022 15:23:20 +0000,
Alexandru Elisei <alexandru.elisei@arm.com> wrote:
> 
> Hi,
> 
> On Fri, Jan 28, 2022 at 12:18:17PM +0000, Marc Zyngier wrote:
> > diff --git a/arch/arm64/kvm/emulate-nested.c b/arch/arm64/kvm/emulate-nested.c
> > new file mode 100644
> > index 000000000000..f52cd4458947
> > --- /dev/null
> > +++ b/arch/arm64/kvm/emulate-nested.c
> > @@ -0,0 +1,197 @@
> 
> Looks like this line:
> 
> // SPDX-License-Identifier: GPL-2.0-only
> 
> is missing.

Indeed. I should check all the new files, as they are a bit... off.

> 
> > +/*
> > + * Copyright (C) 2016 - Linaro and Columbia University
> > + * Author: Jintack Lim <jintack.lim@linaro.org>
> > + */
> > +
> > +#include <linux/kvm.h>
> > +#include <linux/kvm_host.h>
> > +
> > +#include <asm/kvm_emulate.h>
> > +#include <asm/kvm_nested.h>
> > +
> > +#include "hyp/include/hyp/adjust_pc.h"
> > +
> > +#include "trace.h"
> > +
> > +static u64 kvm_check_illegal_exception_return(struct kvm_vcpu *vcpu, u64 spsr)
> > +{
> > +	u64 mode = spsr & PSR_MODE_MASK;
> > +
> > +	/*
> > +	 * Possible causes for an Illegal Exception Return from EL2:
> > +	 * - trying to return to EL3
> > +	 * - trying to return to a 32bit EL
> > +	 * - trying to return to EL1 with HCR_EL2.TGE set
> > +	 */
> > +	if (mode == PSR_MODE_EL3t || mode == PSR_MODE_EL3h ||
> > +	    spsr & PSR_MODE32_BIT ||
> 
> I take it KVM will not allow a L1 hypervisor to run EL1 or EL0 in 32
> bit mode?

No, that'd really be a distraction at this stage. I don't expect any
HW supporting NV to support AArch32 at EL1, and if someone really
needs EL0 support (the HW support actually exists), they'll have to
revisit this.

>
> > +	    (vcpu_el2_tge_is_set(vcpu) && (mode == PSR_MODE_EL1t ||
> > +					   mode == PSR_MODE_EL1h))) {
> 
> I think these checks should also be added:
> 
> "A return where the value of the saved process state M[4] bit is 0,
> indicating a return to AArch64 state, and one of the following is
> true:
> 
> - The M[1] bit is 1.
> - The M[3:0] bits are 0b0001.

Definitely should add these two, probably in the form of a switch
enumerating all the possible exception levels rather than checking for
discrete bits that are hard to reason about.

> - The Exception level being returned to is using AArch32 state, as
>   programmed by the SCR_EL3.RW or HCR_EL2.RW bits, or as configured
>   from reset."

That's already caught with the SPSR check above.
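
For the record, a rough and untested sketch of what such a switch could look
like (purely illustrative; the helper name is made up, everything else reuses
definitions already visible in this patch):

	static bool psr_mode_is_legal_target(const struct kvm_vcpu *vcpu, u64 spsr)
	{
		/* AArch32 targets are not supported with NV for now */
		if (spsr & PSR_MODE32_BIT)
			return false;

		switch (spsr & PSR_MODE_MASK) {
		case PSR_MODE_EL2h:
		case PSR_MODE_EL2t:
		case PSR_MODE_EL0t:
			return true;
		case PSR_MODE_EL1h:
		case PSR_MODE_EL1t:
			/* Cannot return to EL1 with HCR_EL2.TGE set */
			return !vcpu_el2_tge_is_set(vcpu);
		default:
			/* EL3, M[3:0] == 0b0001, M[1] == 1, ... */
			return false;
		}
	}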

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 378+ messages in thread

* Re: [PATCH v6 12/64] KVM: arm64: nv: Add non-VHE-EL2->EL1 translation helpers
  2022-02-02 17:08     ` Alexandru Elisei
  (?)
@ 2022-02-03 18:29       ` Marc Zyngier
  -1 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-02-03 18:29 UTC (permalink / raw)
  To: Alexandru Elisei
  Cc: linux-arm-kernel, kvmarm, kvm, Andre Przywara, Christoffer Dall,
	Jintack Lim, Haibo Xu, Ganapatrao Kulkarni, Chase Conklin,
	Russell King (Oracle),
	James Morse, Suzuki K Poulose, karl.heubaum, mihai.carabas,
	miguel.luis, kernel-team

On Wed, 02 Feb 2022 17:08:26 +0000,
Alexandru Elisei <alexandru.elisei@arm.com> wrote:
> 
> Hi Marc,
> 
> On Fri, Jan 28, 2022 at 12:18:20PM +0000, Marc Zyngier wrote:
> > Some EL2 system registers immediately affect the current execution
> > of the system, so we need to use their respective EL1 counterparts.
> > For this we need to define a mapping between the two. In general,
> > this only affects non-VHE guest hypervisors, as VHE system registers
> > are compatible with the EL1 counterparts.
> > 
> > These helpers will get used in subsequent patches.
> > 
> > Co-developed-by: Andre Przywara <andre.przywara@arm.com>
> > Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> > Signed-off-by: Marc Zyngier <maz@kernel.org>
> > ---
> >  arch/arm64/include/asm/kvm_nested.h | 54 +++++++++++++++++++++++++++++
> >  1 file changed, 54 insertions(+)
> > 
> > diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
> > index fd601ea68d13..5a85be6d8eb3 100644
> > --- a/arch/arm64/include/asm/kvm_nested.h
> > +++ b/arch/arm64/include/asm/kvm_nested.h
> > @@ -2,6 +2,7 @@
> >  #ifndef __ARM64_KVM_NESTED_H
> >  #define __ARM64_KVM_NESTED_H
> >  
> > +#include <linux/bitfield.h>
> >  #include <linux/kvm_host.h>
> >  
> >  static inline bool vcpu_has_nv(const struct kvm_vcpu *vcpu)
> > @@ -11,4 +12,57 @@ static inline bool vcpu_has_nv(const struct kvm_vcpu *vcpu)
> >  		test_bit(KVM_ARM_VCPU_HAS_EL2, vcpu->arch.features));
> >  }
> >  
> > +/* Translation helpers from non-VHE EL2 to EL1 */
> > +static inline u64 tcr_el2_ps_to_tcr_el1_ips(u64 tcr_el2)
> > +{
> > +	return (u64)FIELD_GET(TCR_EL2_PS_MASK, tcr_el2) << TCR_IPS_SHIFT;
> > +}
> > +
> > +static inline u64 translate_tcr_el2_to_tcr_el1(u64 tcr)
> > +{
> > +	return TCR_EPD1_MASK |				/* disable TTBR1_EL1 */
> > +	       ((tcr & TCR_EL2_TBI) ? TCR_TBI0 : 0) |
> > +	       tcr_el2_ps_to_tcr_el1_ips(tcr) |
> > +	       (tcr & TCR_EL2_TG0_MASK) |
> > +	       (tcr & TCR_EL2_ORGN0_MASK) |
> > +	       (tcr & TCR_EL2_IRGN0_MASK) |
> > +	       (tcr & TCR_EL2_T0SZ_MASK);
> > +}
> > +
> > +static inline u64 translate_cptr_el2_to_cpacr_el1(u64 cptr_el2)
> > +{
> > +	u64 cpacr_el1 = 0;
> > +
> > +	if (cptr_el2 & CPTR_EL2_TTA)
> > +		cpacr_el1 |= CPACR_EL1_TTA;
> > +	if (!(cptr_el2 & CPTR_EL2_TFP))
> > +		cpacr_el1 |= CPACR_EL1_FPEN;
> > +	if (!(cptr_el2 & CPTR_EL2_TZ))
> > +		cpacr_el1 |= CPACR_EL1_ZEN;
> > +
> > +	return cpacr_el1;
> > +}
> > +
> > +static inline u64 translate_sctlr_el2_to_sctlr_el1(u64 val)
> > +{
> > +	/* Only preserve the minimal set of bits we support */
> > +	val &= (SCTLR_ELx_M | SCTLR_ELx_A | SCTLR_ELx_C | SCTLR_ELx_SA |
> > +		SCTLR_ELx_I | SCTLR_ELx_IESB | SCTLR_ELx_WXN | SCTLR_ELx_EE);
> 
> Checked that the bit positions are the same between SCTLR_EL2 and SCTLR_EL1. I
> think the IESB bit (bit 21) should be after the WXN bit (bit 19) to be
> consistent; doesn't really matter either way.
> 
> > +	val |= SCTLR_EL1_RES1;
> > +
> > +	return val;
> > +}
> > +
> > +static inline u64 translate_ttbr0_el2_to_ttbr0_el1(u64 ttbr0)
> > +{
> > +	/* Clear the ASID field */
> > +	return ttbr0 & ~GENMASK_ULL(63, 48);
> > +}
> > +
> > +static inline u64 translate_cnthctl_el2_to_cntkctl_el1(u64 cnthctl)
> > +{
> > +	return ((FIELD_GET(CNTHCTL_EL1PCTEN | CNTHCTL_EL1PCEN, cnthctl) << 10) |
> > +		(cnthctl & (CNTHCTL_EVNTI | CNTHCTL_EVNTDIR | CNTHCTL_EVNTEN)));
> 
> I asked about the field positions in the previous series and this is what you
> replied:
> 
> > It's a classic one. Remember that we are running VHE, and remapping a
> > nVHE view of CNTHCTL_EL2 into the VHE view *for the guest*, and that
> > these things are completely shifted around (it has the CNTKCTL_EL1
> > format).
> >
> > For example, on nVHE, CNTHCTL_EL2.EL1PCTEN is bit 0. On VHE, this is
> > bit 10. That's why we have this shift, and that you now need some
> > paracetamol.
> >
> > You can also look at the way we deal with the same stuff in
> > kvm_timer_init_vhe()".
> 
> Here's how this function is used in vhe/sysreg-sr.c:
> 
> static void __sysreg_restore_vel2_state(struct kvm_cpu_context *ctxt)
> {
> 	[..]
> 	if (__vcpu_el2_e2h_is_set(ctxt)) {
> 		[..]
> 	} else {
> 		[..]
> 		val = translate_cnthctl_el2_to_cntkctl_el1(ctxt_sys_reg(ctxt, CNTHCTL_EL2));
> 		write_sysreg_el1(val, SYS_CNTKCTL);
> 	}
> 	[..]
> }
> 
> CNTHCTL_EL2 is a pure EL2 register. The translate function is called
> when guest HCR_EL2.E2H is not set, therefore virtual CNTHCTL_EL2 has
> the non-VHE format.  And the result of the function is used to write
> to the hardware CNTKCTL_EL1 register (using the CNTKCTL_EL12
> encoding), which is different from the CNTHCTL_EL2
> register. CNTKCTL_EL1 also always has the same format regardless of
> the value of the HCR_EL2.E2H bit.
> 
> I don't understand what the host running with VHE has to do with the
> translate function.

It's just that I completely misunderstood your question, and that I
also failed to realise that this code is just plain buggy. Apologies
for wasting your time on this.

As it turns out, CNTHCTL_EL2 has *zero* influence on the hypervisor
itself, so messing with it and trying to restore it into CNTKCTL_EL12
is remarkably pointless. It is solely designed to influence the
execution of EL1. Duh.

What it should do is to restore parts of this register *on the host*
so that L1's EL1 is actually influenced by what L1's EL2 has set up
(mostly to handle traps from EL1 to EL2).

To summarise:

- the name of the function is misleading: it should be something like
  'translate_nvhe_cnthctl_el2_to_vhe()'. The function is otherwise
  correct, which is why I was rambling about the bit offsets.

- the location of the save/restore is wrong: it should happen when
  dealing with EL1 instead of EL2

- the register it targets is wrong: it should target CNTHCTL_EL2 (or
  CNTKCTL_EL1 as seen from VHE EL2)
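
Purely to illustrate that direction (an untested sketch; the hook point and
the exact accessors are assumed here, not taken from this series), loading
L1's EL1 state on the VHE host could end up doing something like:

	u64 val = __vcpu_sys_reg(vcpu, CNTHCTL_EL2);

	/* A non-VHE L1 uses the nVHE layout; convert it to the VHE one */
	if (!vcpu_el2_e2h_is_set(vcpu))
		val = translate_cnthctl_el2_to_cntkctl_el1(val);

	/* From VHE EL2, cntkctl_el1 accesses the hardware CNTHCTL_EL2 */
	write_sysreg(val, cntkctl_el1);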

I'll stick a brown paper bag on my head and wear it for the evening.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 378+ messages in thread

* Re: [PATCH v6 19/64] KVM: arm64: nv: Trap SPSR_EL1, ELR_EL1 and VBAR_EL1 from virtual EL2
  2022-01-28 12:18   ` Marc Zyngier
  (?)
@ 2022-02-04 10:58     ` Alexandru Elisei
  -1 siblings, 0 replies; 378+ messages in thread
From: Alexandru Elisei @ 2022-02-04 10:58 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, Andre Przywara, Christoffer Dall,
	Jintack Lim, Haibo Xu, Ganapatrao Kulkarni, Chase Conklin,
	Russell King (Oracle),
	James Morse, Suzuki K Poulose, karl.heubaum, mihai.carabas,
	miguel.luis, kernel-team

Hi Marc,

On Fri, Jan 28, 2022 at 12:18:27PM +0000, Marc Zyngier wrote:
> From: Jintack Lim <jintack.lim@linaro.org>
> 
> For the same reason we trap virtual memory register accesses at virtual
> EL2, we need to trap SPSR_EL1, ELR_EL1 and VBAR_EL1 accesses. ARM v8.3
> introduces the HCR_EL2.NV1 bit to be able to trap on those register
> accesses in EL1. Do not set this bit until the whole nesting support is
> completed.
> 
> Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/kvm/sys_regs.c | 29 ++++++++++++++++++++++++++++-
>  1 file changed, 28 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 9d3520f1d17a..4f2bcc1e0c25 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -1650,6 +1650,30 @@ static bool access_sp_el1(struct kvm_vcpu *vcpu,
>  	return true;
>  }
>  
> +static bool access_elr(struct kvm_vcpu *vcpu,
> +		       struct sys_reg_params *p,
> +		       const struct sys_reg_desc *r)
> +{
> +	if (p->is_write)
> +		vcpu_write_sys_reg(vcpu, p->regval, ELR_EL1);
> +	else
> +		p->regval = vcpu_read_sys_reg(vcpu, ELR_EL1);

Going over the patch again, I was a bit confused why access_elr() uses
vcpu_{read,write}_sys_reg(), but access_spsr() uses __vcpu_sys_reg(). In
the end, vcpu_{read,write}_sys_reg() will write to the shadow copy of the
registers, as the guest is executing at virtual non-VHE EL2, so the two are
equivalent.

It's obviously me nitpicking, but the inconsistency is unexpected. How
about using vcpu_{read,write}_sys_reg() for access_spsr() below?
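
i.e. something like the below (untested, simply mirroring access_elr()):

	static bool access_spsr(struct kvm_vcpu *vcpu,
				struct sys_reg_params *p,
				const struct sys_reg_desc *r)
	{
		if (p->is_write)
			vcpu_write_sys_reg(vcpu, p->regval, SPSR_EL1);
		else
			p->regval = vcpu_read_sys_reg(vcpu, SPSR_EL1);

		return true;
	}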

Thanks,
Alex

> +
> +	return true;
> +}
> +
> +static bool access_spsr(struct kvm_vcpu *vcpu,
> +			struct sys_reg_params *p,
> +			const struct sys_reg_desc *r)
> +{
> +	if (p->is_write)
> +		__vcpu_sys_reg(vcpu, SPSR_EL1) = p->regval;
> +	else
> +		p->regval = __vcpu_sys_reg(vcpu, SPSR_EL1);
> +
> +	return true;
> +}
> +
>  static bool access_spsr_el2(struct kvm_vcpu *vcpu,
>  			    struct sys_reg_params *p,
>  			    const struct sys_reg_desc *r)
> @@ -1812,6 +1836,9 @@ static const struct sys_reg_desc sys_reg_descs[] = {
>  	PTRAUTH_KEY(APDB),
>  	PTRAUTH_KEY(APGA),
>  
> +	{ SYS_DESC(SYS_SPSR_EL1), access_spsr},
> +	{ SYS_DESC(SYS_ELR_EL1), access_elr},
> +
>  	{ SYS_DESC(SYS_AFSR0_EL1), access_vm_reg, reset_unknown, AFSR0_EL1 },
>  	{ SYS_DESC(SYS_AFSR1_EL1), access_vm_reg, reset_unknown, AFSR1_EL1 },
>  	{ SYS_DESC(SYS_ESR_EL1), access_vm_reg, reset_unknown, ESR_EL1 },
> @@ -1859,7 +1886,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
>  	{ SYS_DESC(SYS_LORC_EL1), trap_loregion },
>  	{ SYS_DESC(SYS_LORID_EL1), trap_loregion },
>  
> -	{ SYS_DESC(SYS_VBAR_EL1), NULL, reset_val, VBAR_EL1, 0 },
> +	{ SYS_DESC(SYS_VBAR_EL1), access_rw, reset_val, VBAR_EL1, 0 },
>  	{ SYS_DESC(SYS_DISR_EL1), NULL, reset_val, DISR_EL1, 0 },
>  
>  	{ SYS_DESC(SYS_ICC_IAR0_EL1), write_to_read_only },
> -- 
> 2.30.2
> 

^ permalink raw reply	[flat|nested] 378+ messages in thread

* Re: [PATCH v6 20/64] KVM: arm64: nv: Trap CPACR_EL1 access in virtual EL2
  2022-01-28 12:18   ` Marc Zyngier
  (?)
@ 2022-02-04 11:10     ` Alexandru Elisei
  -1 siblings, 0 replies; 378+ messages in thread
From: Alexandru Elisei @ 2022-02-04 11:10 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, Andre Przywara, Christoffer Dall,
	Jintack Lim, Haibo Xu, Ganapatrao Kulkarni, Chase Conklin,
	Russell King (Oracle),
	James Morse, Suzuki K Poulose, karl.heubaum, mihai.carabas,
	miguel.luis, kernel-team

Hi Marc,

This is also what the Arm ARM recommends for a guest with HCR_EL2.E2H = 0 in
section D5.7.1 "Armv8.3 nested virtualization functionality".

The L1 guest hypervisor running at virtual EL2 is restoring its guest's context
at EL1, but virtual EL2 actually runs at hardware EL1, so the register access
needs to be trapped: an untrapped write would affect the guest hypervisor
itself. And because the L0 host is running with HCR_EL2.E2H = 1, its writes to
CPACR_EL1 are redirected to CPTR_EL2.

Also the reset value is correct, as the CPACR_EL1 fields are either RES0 or
reset to an architecturally UNKNOWN value.
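
To spell the redirection out (an illustration based only on the architected
E2H register remapping, not on anything new in this patch):

	/*
	 * The host runs at EL2 with HCR_EL2.E2H == 1, so this access in
	 * __activate_traps() really targets CPTR_EL2. With CPTR_EL2.TCPAC
	 * set, the L1 hypervisor's own CPACR_EL1 accesses (made from
	 * hardware EL1) trap to the L0 host and end up in access_rw().
	 */
	write_sysreg(val, cpacr_el1);	/* effectively CPTR_EL2 on a VHE host */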

Looks good to me:

Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>

Thanks,
Alex

On Fri, Jan 28, 2022 at 12:18:28PM +0000, Marc Zyngier wrote:
> From: Jintack Lim <jintack.lim@linaro.org>
> 
> For the same reason we trap virtual memory register accesses in virtual
> EL2, we trap CPACR_EL1 access too; We allow the virtual EL2 mode to
> access EL1 system register state instead of the virtual EL2 one.
> 
> Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/kvm/hyp/vhe/switch.c | 3 +++
>  arch/arm64/kvm/sys_regs.c       | 2 +-
>  2 files changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/kvm/hyp/vhe/switch.c b/arch/arm64/kvm/hyp/vhe/switch.c
> index 6ed9e4893a02..1e6a00e7bfb3 100644
> --- a/arch/arm64/kvm/hyp/vhe/switch.c
> +++ b/arch/arm64/kvm/hyp/vhe/switch.c
> @@ -64,6 +64,9 @@ static void __activate_traps(struct kvm_vcpu *vcpu)
>  		__activate_traps_fpsimd32(vcpu);
>  	}
>  
> +	if (vcpu_is_el2(vcpu) && !vcpu_el2_e2h_is_set(vcpu))
> +		val |= CPTR_EL2_TCPAC;
> +	
>  	write_sysreg(val, cpacr_el1);
>  
>  	write_sysreg(__this_cpu_read(kvm_hyp_vector), vbar_el1);
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 4f2bcc1e0c25..7f074a7f6eb3 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -1819,7 +1819,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
>  
>  	{ SYS_DESC(SYS_SCTLR_EL1), access_vm_reg, reset_val, SCTLR_EL1, 0x00C50078 },
>  	{ SYS_DESC(SYS_ACTLR_EL1), access_actlr, reset_actlr, ACTLR_EL1 },
> -	{ SYS_DESC(SYS_CPACR_EL1), NULL, reset_val, CPACR_EL1, 0 },
> +	{ SYS_DESC(SYS_CPACR_EL1), access_rw, reset_val, CPACR_EL1, 0 },
>  
>  	MTE_REG(RGSR_EL1),
>  	MTE_REG(GCR_EL1),
> -- 
> 2.30.2
> 

^ permalink raw reply	[flat|nested] 378+ messages in thread

* Re: [PATCH v6 09/64] KVM: arm64: nv: Support virtual EL2 exceptions
  2022-02-03 17:43       ` Marc Zyngier
  (?)
@ 2022-02-04 11:47         ` Alexandru Elisei
  -1 siblings, 0 replies; 378+ messages in thread
From: Alexandru Elisei @ 2022-02-04 11:47 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, Andre Przywara, Christoffer Dall,
	Jintack Lim, Haibo Xu, Ganapatrao Kulkarni, Chase Conklin,
	Russell King (Oracle),
	James Morse, Suzuki K Poulose, karl.heubaum, mihai.carabas,
	miguel.luis, kernel-team

Hi Marc,

On Thu, Feb 03, 2022 at 05:43:36PM +0000, Marc Zyngier wrote:
> On Wed, 02 Feb 2022 15:23:20 +0000,
> Alexandru Elisei <alexandru.elisei@arm.com> wrote:
> > 
> > Hi,
> > 
> > On Fri, Jan 28, 2022 at 12:18:17PM +0000, Marc Zyngier wrote:
> > > diff --git a/arch/arm64/kvm/emulate-nested.c b/arch/arm64/kvm/emulate-nested.c
> > > new file mode 100644
> > > index 000000000000..f52cd4458947
> > > --- /dev/null
> > > +++ b/arch/arm64/kvm/emulate-nested.c
> > > @@ -0,0 +1,197 @@
> > 
> > Looks like this line:
> > 
> > // SPDX-License-Identifier: GPL-2.0-only
> > 
> > is missing.
> 
> Indeed. I should check all the new files, as they are a bit... off.
> 
> > 
> > > +/*
> > > + * Copyright (C) 2016 - Linaro and Columbia University
> > > + * Author: Jintack Lim <jintack.lim@linaro.org>
> > > + */
> > > +
> > > +#include <linux/kvm.h>
> > > +#include <linux/kvm_host.h>
> > > +
> > > +#include <asm/kvm_emulate.h>
> > > +#include <asm/kvm_nested.h>
> > > +
> > > +#include "hyp/include/hyp/adjust_pc.h"
> > > +
> > > +#include "trace.h"
> > > +
> > > +static u64 kvm_check_illegal_exception_return(struct kvm_vcpu *vcpu, u64 spsr)
> > > +{
> > > +	u64 mode = spsr & PSR_MODE_MASK;
> > > +
> > > +	/*
> > > +	 * Possible causes for an Illegal Exception Return from EL2:
> > > +	 * - trying to return to EL3
> > > +	 * - trying to return to a 32bit EL
> > > +	 * - trying to return to EL1 with HCR_EL2.TGE set
> > > +	 */
> > > +	if (mode == PSR_MODE_EL3t || mode == PSR_MODE_EL3h ||
> > > +	    spsr & PSR_MODE32_BIT ||
> > 
> > I take it KVM will not allow a L1 hypervisor to run EL1 or EL0 in 32
> > bit mode?
> 
> No, that'd really be a distraction at this stage. I don't expect any
> HW supporting NV to support AArch32 at EL1, and if someone really
> needs EL0 support (the HW support actually exists), they'll have to
> revisit this.
> 
> >
> > > +	    (vcpu_el2_tge_is_set(vcpu) && (mode == PSR_MODE_EL1t ||
> > > +					   mode == PSR_MODE_EL1h))) {
> > 
> > I think these checks should also be added:
> > 
> > "A return where the value of the saved process state M[4] bit is 0,
> > indicating a return to AArch64 state, and one of the following is
> > true:
> > 
> > - The M[1] bit is 1.
> > - The M[3:0] bits are 0b0001.
> 
> Definitely should add these two, probably in the form of a switch
> enumerating all the possible exception levels rather than checking for
> discrete bits that are hard to reason about.
> 
> > - The Exception level being returned to is using AArch32 state, as
> >   programmed by the SCR_EL3.RW or HCR_EL2.RW bits, or as configured
> >   from reset."
> 
> That's already caught with the SPSR check above.

Hmm... I don't think so. The illegal condition, according to the snippet, should
be:

!(mode & PSR_MODE32_BIT) && !(__vcpu_sys_reg(vcpu, HCR_EL2) & HCR_RW)

Or, perhaps better, KVM could add an HCR_EL2 accessor that treats the
HCR_EL2.RW bit as RAO/WI, as the architecture requires when EL1 is not capable
of AArch32. That would make the above situation impossible.
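
Spelled out as the switch Marc suggested, the whole set of conditions could
look something like the sketch below (illustrative only: the function name is
invented for this discussion, and AArch32 returns are rejected wholesale since
the series doesn't expose 32bit ELs to NV guests):

static bool vel2_eret_is_illegal(struct kvm_vcpu *vcpu, u64 spsr)
{
	u64 mode = spsr & PSR_MODE_MASK;

	/* No AArch32 guest ELs with NV, so any AArch32 return is illegal */
	if (spsr & PSR_MODE32_BIT)
		return true;

	switch (mode) {
	case PSR_MODE_EL2t:
	case PSR_MODE_EL2h:
	case PSR_MODE_EL0t:
		return false;
	case PSR_MODE_EL1t:
	case PSR_MODE_EL1h:
		/* Return to EL1 with HCR_EL2.TGE set */
		return vcpu_el2_tge_is_set(vcpu);
	default:
		/*
		 * EL3, M[3:0] == 0b0001, M[1] == 1 and all other reserved
		 * encodings end up here.
		 */
		return true;
	}
}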

Thanks,
Alex

> 
> Thanks,
> 
> 	M.
> 
> -- 
> Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 378+ messages in thread

* Re: [PATCH v6 12/64] KVM: arm64: nv: Add non-VHE-EL2->EL1 translation helpers
  2022-02-03 18:29       ` Marc Zyngier
  (?)
@ 2022-02-04 12:05         ` Alexandru Elisei
  -1 siblings, 0 replies; 378+ messages in thread
From: Alexandru Elisei @ 2022-02-04 12:05 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, Andre Przywara, Christoffer Dall,
	Jintack Lim, Haibo Xu, Ganapatrao Kulkarni, Chase Conklin,
	Russell King (Oracle),
	James Morse, Suzuki K Poulose, karl.heubaum, mihai.carabas,
	miguel.luis, kernel-team

Hi Marc,

On Thu, Feb 03, 2022 at 06:29:13PM +0000, Marc Zyngier wrote:
> On Wed, 02 Feb 2022 17:08:26 +0000,
> Alexandru Elisei <alexandru.elisei@arm.com> wrote:
> > 
> > Hi Marc,
> > 
> > On Fri, Jan 28, 2022 at 12:18:20PM +0000, Marc Zyngier wrote:
> > > Some EL2 system registers immediately affect the current execution
> > > of the system, so we need to use their respective EL1 counterparts.
> > > For this we need to define a mapping between the two. In general,
> > > this only affects non-VHE guest hypervisors, as VHE system registers
> > > are compatible with the EL1 counterparts.
> > > 
> > > These helpers will get used in subsequent patches.
> > > 
> > > Co-developed-by: Andre Przywara <andre.przywara@arm.com>
> > > Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> > > Signed-off-by: Marc Zyngier <maz@kernel.org>
> > > ---
> > >  arch/arm64/include/asm/kvm_nested.h | 54 +++++++++++++++++++++++++++++
> > >  1 file changed, 54 insertions(+)
> > > 
> > > diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
> > > index fd601ea68d13..5a85be6d8eb3 100644
> > > --- a/arch/arm64/include/asm/kvm_nested.h
> > > +++ b/arch/arm64/include/asm/kvm_nested.h
> > > @@ -2,6 +2,7 @@
> > >  #ifndef __ARM64_KVM_NESTED_H
> > >  #define __ARM64_KVM_NESTED_H
> > >  
> > > +#include <linux/bitfield.h>
> > >  #include <linux/kvm_host.h>
> > >  
> > >  static inline bool vcpu_has_nv(const struct kvm_vcpu *vcpu)
> > > @@ -11,4 +12,57 @@ static inline bool vcpu_has_nv(const struct kvm_vcpu *vcpu)
> > >  		test_bit(KVM_ARM_VCPU_HAS_EL2, vcpu->arch.features));
> > >  }
> > >  
> > > +/* Translation helpers from non-VHE EL2 to EL1 */
> > > +static inline u64 tcr_el2_ps_to_tcr_el1_ips(u64 tcr_el2)
> > > +{
> > > +	return (u64)FIELD_GET(TCR_EL2_PS_MASK, tcr_el2) << TCR_IPS_SHIFT;
> > > +}
> > > +
> > > +static inline u64 translate_tcr_el2_to_tcr_el1(u64 tcr)
> > > +{
> > > +	return TCR_EPD1_MASK |				/* disable TTBR1_EL1 */
> > > +	       ((tcr & TCR_EL2_TBI) ? TCR_TBI0 : 0) |
> > > +	       tcr_el2_ps_to_tcr_el1_ips(tcr) |
> > > +	       (tcr & TCR_EL2_TG0_MASK) |
> > > +	       (tcr & TCR_EL2_ORGN0_MASK) |
> > > +	       (tcr & TCR_EL2_IRGN0_MASK) |
> > > +	       (tcr & TCR_EL2_T0SZ_MASK);
> > > +}
> > > +
> > > +static inline u64 translate_cptr_el2_to_cpacr_el1(u64 cptr_el2)
> > > +{
> > > +	u64 cpacr_el1 = 0;
> > > +
> > > +	if (cptr_el2 & CPTR_EL2_TTA)
> > > +		cpacr_el1 |= CPACR_EL1_TTA;
> > > +	if (!(cptr_el2 & CPTR_EL2_TFP))
> > > +		cpacr_el1 |= CPACR_EL1_FPEN;
> > > +	if (!(cptr_el2 & CPTR_EL2_TZ))
> > > +		cpacr_el1 |= CPACR_EL1_ZEN;
> > > +
> > > +	return cpacr_el1;
> > > +}
> > > +
> > > +static inline u64 translate_sctlr_el2_to_sctlr_el1(u64 val)
> > > +{
> > > +	/* Only preserve the minimal set of bits we support */
> > > +	val &= (SCTLR_ELx_M | SCTLR_ELx_A | SCTLR_ELx_C | SCTLR_ELx_SA |
> > > +		SCTLR_ELx_I | SCTLR_ELx_IESB | SCTLR_ELx_WXN | SCTLR_ELx_EE);
> > 
> > Checked that the bit positions are the same between SCTLR_EL2 and SCTLR_EL1. I
> > think the IESB bit (bit 21) should be after the WXN bit (bit 19) to be
> > consistent; doesn't really matter either way.
> > 
> > > +	val |= SCTLR_EL1_RES1;
> > > +
> > > +	return val;
> > > +}
> > > +
> > > +static inline u64 translate_ttbr0_el2_to_ttbr0_el1(u64 ttbr0)
> > > +{
> > > +	/* Clear the ASID field */
> > > +	return ttbr0 & ~GENMASK_ULL(63, 48);
> > > +}
> > > +
> > > +static inline u64 translate_cnthctl_el2_to_cntkctl_el1(u64 cnthctl)
> > > +{
> > > +	return ((FIELD_GET(CNTHCTL_EL1PCTEN | CNTHCTL_EL1PCEN, cnthctl) << 10) |
> > > +		(cnthctl & (CNTHCTL_EVNTI | CNTHCTL_EVNTDIR | CNTHCTL_EVNTEN)));
> > 
> > I asked about the field positions in the previous series and this is what you
> > replied:
> > 
> > > It's a classic one. Remember that we are running VHE, and remapping a
> > > nVHE view of CNTHCTL_EL2 into the VHE view *for the guest*, and that
> > > these things are completely shifted around (it has the CNTKCTL_EL1
> > > format).
> > >
> > > For example, on nVHE, CNTHCTL_EL2.EL1PCTEN is bit 0. On nVHE, this is
> > > bit 10. That's why we have this shift, and that you now need some
> > > paracetamol.
> > >
> > > You can also look at the way we deal with the same stuff in
> > > kvm_timer_init_vhe()".
> > 
> > Here's how this function is used in vhe/sysreg-sr.c:
> > 
> > static void __sysreg_restore_vel2_state(struct kvm_cpu_context *ctxt)
> > {
> > 	[..]
> > 	if (__vcpu_el2_e2h_is_set(ctxt)) {
> > 		[..]
> > 	} else {
> > 		[..]
> > 		val = translate_cnthctl_el2_to_cntkctl_el1(ctxt_sys_reg(ctxt, CNTHCTL_EL2));
> > 		write_sysreg_el1(val, SYS_CNTKCTL);
> > 	}
> > 	[..]
> > }
> > 
> > CNTHCTL_EL2 is a pure EL2 register. The translate function is called
> > when guest HCR_EL2.E2H is not set, therefore virtual CNTHCTL_EL2 has
> > the non-VHE format.  And the result of the function is used to write
> > to the hardware CNTKCTL_EL1 register (using the CNTKCTL_EL12
> > encoding), which is different from the CNTHCTL_EL2
> > register. CNTKCTL_EL1 also always has the same format regardless of
> > the value of the HCR_EL2.E2H bit.
> > 
> > I don't understand what the host running with VHE has to do with the
> > translate function.
> 
> It's just that I completely misunderstood your question, and that I
> also failed to realise that this code is just plain buggy. Apologies
> for wasting your time on this.
> 
> As it turns out, CNTHCTL_EL2 has *zero* influence on the hypervisor
> itself, so messing with it and trying to restore it into CNTKCTL_EL12
> is remarkably pointless. It is solely designed to influence the
> execution of EL1. Duh.

Hmm... it looks to me like the EVNTI and EVNTDIR fields in CNTHCTL_EL2 do have
an effect on the hypervisor. Unfortunately, they control the generation of events
from the *physical* counter, while the EVNTI and EVNTDIR fields in CNTKCTL_EL1
control the generation of events from the *virtual* counter.

KVM cannot program CNTKCTL_EL1 to get the same effect as what the L1 hypervisor
intended by programming CNTHCTL_EL2, so I think it must program the
corresponding physical CNTHCTL_EL2 fields when running the L1 guest.
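
Something along those lines, perhaps (just a sketch to illustrate the idea;
the helper name is invented, and where exactly it gets called from is the
open question here; conveniently, the EVNT* fields sit at the same bit
positions in both the nVHE and the VHE layouts):

static void sync_vcpu_event_stream(struct kvm_vcpu *vcpu)
{
	u64 mask = CNTHCTL_EVNTEN | CNTHCTL_EVNTDIR | CNTHCTL_EVNTI;
	u64 guest = __vcpu_sys_reg(vcpu, CNTHCTL_EL2);
	u64 host = read_sysreg(cnthctl_el2);

	/* Mirror the guest hypervisor's event stream setup into HW */
	host &= ~mask;
	host |= guest & mask;
	write_sysreg(host, cnthctl_el2);
}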

Thoughts?

Thanks,
Alex

> 
> What it should do is to restore parts of this register *on the host*
> so that L1's EL1 is actually influenced by what L1's EL2 has set up
> (mostly to handle traps from EL1 to EL2).
> 
> To summarise:
> 
> - the name of the function is misleading: it should be something like
>   'translate_nvhe_cnthctl_el2_to_vhe()'. The function is otherwise
>   correct, and why I was rambling about the bit offsets.
> 
> - the location of the save/restore is wrong: it should happen when
>   dealing with EL1 instead of EL2
> 
> - the register it targets is wrong: it should target CNTHCTL_EL2 (or
>   CNTKCTL_EL1 as seen from VHE EL2)
> 
> I'll stick a brown paper bag on my head and wear it for the evening.
> 
> Thanks,
> 
> 	M.
> 
> -- 
> Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 378+ messages in thread

* Re: [PATCH v6 21/64] KVM: arm64: nv: Handle PSCI call via smc from the guest
  2022-01-28 12:18   ` Marc Zyngier
  (?)
@ 2022-02-04 14:02     ` Alexandru Elisei
  -1 siblings, 0 replies; 378+ messages in thread
From: Alexandru Elisei @ 2022-02-04 14:02 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, Andre Przywara, Christoffer Dall,
	Jintack Lim, Haibo Xu, Ganapatrao Kulkarni, Chase Conklin,
	Russell King (Oracle),
	James Morse, Suzuki K Poulose, karl.heubaum, mihai.carabas,
	miguel.luis, kernel-team

Hi Marc,

The patch looks good to me. I checked kvm_hvc_call_handler(): it returns -1
only when kvm_psci_call() doesn't handle the function, in which case
PSCI_RET_NOT_SUPPORTED must be set by the caller (which is handle_smc()).
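
For reference, the guest-side shape of such a call is just the usual SMCCC
sequence issued with SMC instead of HVC; in the kernel that would go through
something like arm_smccc_smc(), roughly as in this sketch (not taken from the
series):

#include <linux/arm-smccc.h>
#include <uapi/linux/psci.h>

/* Returns the PSCI version, or PSCI_RET_NOT_SUPPORTED (-1, i.e. the ~0UL
 * written back by handle_smc()) if the call isn't handled. */
static int psci_version_via_smc(void)
{
	struct arm_smccc_res res;

	arm_smccc_smc(PSCI_0_2_FN_PSCI_VERSION, 0, 0, 0, 0, 0, 0, 0, &res);
	return (int)res.a0;
}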

Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>

Thanks,
Alex

On Fri, Jan 28, 2022 at 12:18:29PM +0000, Marc Zyngier wrote:
> From: Jintack Lim <jintack.lim@linaro.org>
> 
> VMs used to execute hvc #0 for the psci call if EL3 is not implemented.
> However, when we come to provide the virtual EL2 mode to the VM, the
> host OS inside the VM calls kvm_call_hyp() which is also hvc #0. So,
> it's hard to differentiate between them from the host hypervisor's point
> of view.
> 
> So, let the VM execute smc instruction for the psci call. On ARMv8.3,
> even if EL3 is not implemented, a smc instruction executed at non-secure
> EL1 is trapped to EL2 if HCR_EL2.TSC==1, rather than being treated as
> UNDEFINED. So, the host hypervisor can handle this psci call without any
> confusion.
> 
> Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/kvm/handle_exit.c | 24 ++++++++++++++++++++++--
>  1 file changed, 22 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
> index 2bbeed8c9786..0cedef6e0d80 100644
> --- a/arch/arm64/kvm/handle_exit.c
> +++ b/arch/arm64/kvm/handle_exit.c
> @@ -62,6 +62,8 @@ static int handle_hvc(struct kvm_vcpu *vcpu)
>  
>  static int handle_smc(struct kvm_vcpu *vcpu)
>  {
> +	int ret;
> +
>  	/*
>  	 * "If an SMC instruction executed at Non-secure EL1 is
>  	 * trapped to EL2 because HCR_EL2.TSC is 1, the exception is a
> @@ -69,10 +71,28 @@ static int handle_smc(struct kvm_vcpu *vcpu)
>  	 *
>  	 * We need to advance the PC after the trap, as it would
>  	 * otherwise return to the same address...
> +	 *
> +	 * If imm is non-zero, it's not defined, so just skip it.
> +	 */
> +	if (kvm_vcpu_hvc_get_imm(vcpu)) {
> +		vcpu_set_reg(vcpu, 0, ~0UL);
> +		kvm_incr_pc(vcpu);
> +		return 1;
> +	}
> +
> +	/*
> +	 * If imm is zero, it's a psci call.
> +	 * Note that on ARMv8.3, even if EL3 is not implemented, SMC executed
> +	 * at Non-secure EL1 is trapped to EL2 if HCR_EL2.TSC==1, rather than
> +	 * being treated as UNDEFINED.
>  	 */
> -	vcpu_set_reg(vcpu, 0, ~0UL);
> +	ret = kvm_hvc_call_handler(vcpu);
> +	if (ret < 0)
> +		vcpu_set_reg(vcpu, 0, ~0UL);
> +
>  	kvm_incr_pc(vcpu);
> -	return 1;
> +
> +	return ret;
>  }
>  
>  /*
> -- 
> 2.30.2
> 

^ permalink raw reply	[flat|nested] 378+ messages in thread

* Re: [PATCH v6 22/64] KVM: arm64: nv: Respect virtual HCR_EL2.TWX setting
  2022-01-28 12:18   ` Marc Zyngier
  (?)
@ 2022-02-04 15:40     ` Alexandru Elisei
  -1 siblings, 0 replies; 378+ messages in thread
From: Alexandru Elisei @ 2022-02-04 15:40 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, Andre Przywara, Christoffer Dall,
	Jintack Lim, Haibo Xu, Ganapatrao Kulkarni, Chase Conklin,
	Russell King (Oracle),
	James Morse, Suzuki K Poulose, karl.heubaum, mihai.carabas,
	miguel.luis, kernel-team

Hi Marc,

On Fri, Jan 28, 2022 at 12:18:30PM +0000, Marc Zyngier wrote:
> From: Jintack Lim <jintack.lim@linaro.org>
> 
> Forward exceptions due to WFI or WFE instructions to the virtual EL2 if
> they are not coming from the virtual EL2 and virtual HCR_EL2.TWX is set.
> 
> Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/include/asm/kvm_nested.h |  2 ++
>  arch/arm64/kvm/Makefile             |  2 +-
>  arch/arm64/kvm/handle_exit.c        | 11 ++++++++++-
>  arch/arm64/kvm/nested.c             | 28 ++++++++++++++++++++++++++++
>  4 files changed, 41 insertions(+), 2 deletions(-)
>  create mode 100644 arch/arm64/kvm/nested.c
> 
> diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
> index 5a85be6d8eb3..79d382fa02ea 100644
> --- a/arch/arm64/include/asm/kvm_nested.h
> +++ b/arch/arm64/include/asm/kvm_nested.h
> @@ -65,4 +65,6 @@ static inline u64 translate_cnthctl_el2_to_cntkctl_el1(u64 cnthctl)
>  		(cnthctl & (CNTHCTL_EVNTI | CNTHCTL_EVNTDIR | CNTHCTL_EVNTEN)));
>  }
>  
> +int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe);
> +
>  #endif /* __ARM64_KVM_NESTED_H */
> diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
> index b67c4ebd72b1..dbaf42ff65f1 100644
> --- a/arch/arm64/kvm/Makefile
> +++ b/arch/arm64/kvm/Makefile
> @@ -14,7 +14,7 @@ kvm-y += arm.o mmu.o mmio.o psci.o hypercalls.o pvtime.o \
>  	 inject_fault.o va_layout.o handle_exit.o \
>  	 guest.o debug.o reset.o sys_regs.o \
>  	 vgic-sys-reg-v3.o fpsimd.o pmu.o pkvm.o \
> -	 arch_timer.o trng.o emulate-nested.o \
> +	 arch_timer.o trng.o emulate-nested.o nested.o \
>  	 vgic/vgic.o vgic/vgic-init.o \
>  	 vgic/vgic-irqfd.o vgic/vgic-v2.o \
>  	 vgic/vgic-v3.o vgic/vgic-v4.o \
> diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
> index 0cedef6e0d80..a1b1bbf3d598 100644
> --- a/arch/arm64/kvm/handle_exit.c
> +++ b/arch/arm64/kvm/handle_exit.c
> @@ -119,7 +119,16 @@ static int handle_no_fpsimd(struct kvm_vcpu *vcpu)
>   */
>  static int kvm_handle_wfx(struct kvm_vcpu *vcpu)
>  {
> -	if (kvm_vcpu_get_esr(vcpu) & ESR_ELx_WFx_ISS_WFE) {
> +	bool is_wfe = !!(kvm_vcpu_get_esr(vcpu) & ESR_ELx_WFx_ISS_WFE);
> +
> +	if (vcpu_has_nv(vcpu)) {
> +		int ret = handle_wfx_nested(vcpu, is_wfe);
> +
> +		if (ret != -EINVAL)
> +			return ret;

I find this rather clunky. The common pattern is that a function returns
early when it encounters an error, but here this pattern is reversed:
-EINVAL means that handle_wfx_nested() failed in handling the WFx, so
proceed as usual; conversly, anything but -EINVAL means handle_wfx_nested()
was successful in handling WFx, so exit early from kvm_handle_wfx().

That would be ok by itself, but if we dig deeper, handle_wfx_nested() ends up
calling kvm_inject_nested(), where -EINVAL is actually an error code. Granted,
that should never happen, because kvm_handle_wfx() first checks vcpu_has_nv(),
but still feels like something that could be improved.

Maybe changing handle_wfx_nested() like this would be better:

--- a/arch/arm64/kvm/nested.c
+++ b/arch/arm64/kvm/nested.c
@@ -14,15 +14,18 @@
  * the virtual HCR_EL2.TWX is set. Otherwise, let the host hypervisor
  * handle this.
  */
-int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe)
+bool handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe, int *error)
 {
        u64 hcr_el2 = __vcpu_sys_reg(vcpu, HCR_EL2);
 
+       *error = 0;
        if (vcpu_is_el2(vcpu))
-               return -EINVAL;
+               return false;
 
-       if ((is_wfe && (hcr_el2 & HCR_TWE)) || (!is_wfe && (hcr_el2 & HCR_TWI)))
-               return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_esr(vcpu));
+       if ((is_wfe && (hcr_el2 & HCR_TWE)) || (!is_wfe && (hcr_el2 & HCR_TWI))) {
+               *error = kvm_inject_nested_sync(vcpu, kvm_vcpu_get_esr(vcpu));
+               return true;
+       }
 
-       return -EINVAL;
+       return false;
 }

Now the return value means one thing only (did handle_wfx_nested() handle
the trap?) and we still capture the error code.

Or perhaps folding handle_wfx_nested() into kvm_handle_wfx() would be
preferable.
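
Folded in, it could end up as small as the sketch below (rough and untested,
keeping the existing helpers and just inlining the TWX check; it would
admittedly leave this patch without a reason to introduce nested.c yet):

static int kvm_handle_wfx(struct kvm_vcpu *vcpu)
{
	bool is_wfe = !!(kvm_vcpu_get_esr(vcpu) & ESR_ELx_WFx_ISS_WFE);

	if (vcpu_has_nv(vcpu) && !vcpu_is_el2(vcpu)) {
		u64 hcr_el2 = __vcpu_sys_reg(vcpu, HCR_EL2);

		if ((is_wfe && (hcr_el2 & HCR_TWE)) ||
		    (!is_wfe && (hcr_el2 & HCR_TWI)))
			return kvm_inject_nested_sync(vcpu,
						      kvm_vcpu_get_esr(vcpu));
	}

	/* ... the existing WFE/WFI handling continues unchanged ... */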

What do you think?

Thanks,
Alex

> +	}
> +
> +	if (is_wfe) {
>  		trace_kvm_wfx_arm64(*vcpu_pc(vcpu), true);
>  		vcpu->stat.wfe_exit_stat++;
>  		kvm_vcpu_on_spin(vcpu, vcpu_mode_priv(vcpu));
> diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
> new file mode 100644
> index 000000000000..5e1104f8e765
> --- /dev/null
> +++ b/arch/arm64/kvm/nested.c
> @@ -0,0 +1,28 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (C) 2017 - Columbia University and Linaro Ltd.
> + * Author: Jintack Lim <jintack.lim@linaro.org>
> + */
> +
> +#include <linux/kvm.h>
> +#include <linux/kvm_host.h>
> +
> +#include <asm/kvm_emulate.h>
> +
> +/*
> + * Inject wfx to the virtual EL2 if this is not from the virtual EL2 and
> + * the virtual HCR_EL2.TWX is set. Otherwise, let the host hypervisor
> + * handle this.
> + */
> +int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe)
> +{
> +	u64 hcr_el2 = __vcpu_sys_reg(vcpu, HCR_EL2);
> +
> +	if (vcpu_is_el2(vcpu))
> +		return -EINVAL;
> +
> +	if ((is_wfe && (hcr_el2 & HCR_TWE)) || (!is_wfe && (hcr_el2 & HCR_TWI)))
> +		return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_esr(vcpu));
> +
> +	return -EINVAL;
> +}
> -- 
> 2.30.2
> 

^ permalink raw reply	[flat|nested] 378+ messages in thread

* Re: [PATCH v6 22/64] KVM: arm64: nv: Respect virtual HCR_EL2.TWX setting
@ 2022-02-04 15:40     ` Alexandru Elisei
  0 siblings, 0 replies; 378+ messages in thread
From: Alexandru Elisei @ 2022-02-04 15:40 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kernel-team, kvm, Andre Przywara, Christoffer Dall,
	Chase Conklin, kvmarm, mihai.carabas, Ganapatrao Kulkarni,
	Russell King (Oracle),
	linux-arm-kernel

Hi Marc,

On Fri, Jan 28, 2022 at 12:18:30PM +0000, Marc Zyngier wrote:
> From: Jintack Lim <jintack.lim@linaro.org>
> 
> Forward exceptions due to WFI or WFE instructions to the virtual EL2 if
> they are not coming from the virtual EL2 and virtual HCR_EL2.TWX is set.
> 
> Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/include/asm/kvm_nested.h |  2 ++
>  arch/arm64/kvm/Makefile             |  2 +-
>  arch/arm64/kvm/handle_exit.c        | 11 ++++++++++-
>  arch/arm64/kvm/nested.c             | 28 ++++++++++++++++++++++++++++
>  4 files changed, 41 insertions(+), 2 deletions(-)
>  create mode 100644 arch/arm64/kvm/nested.c
> 
> diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
> index 5a85be6d8eb3..79d382fa02ea 100644
> --- a/arch/arm64/include/asm/kvm_nested.h
> +++ b/arch/arm64/include/asm/kvm_nested.h
> @@ -65,4 +65,6 @@ static inline u64 translate_cnthctl_el2_to_cntkctl_el1(u64 cnthctl)
>  		(cnthctl & (CNTHCTL_EVNTI | CNTHCTL_EVNTDIR | CNTHCTL_EVNTEN)));
>  }
>  
> +int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe);
> +
>  #endif /* __ARM64_KVM_NESTED_H */
> diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
> index b67c4ebd72b1..dbaf42ff65f1 100644
> --- a/arch/arm64/kvm/Makefile
> +++ b/arch/arm64/kvm/Makefile
> @@ -14,7 +14,7 @@ kvm-y += arm.o mmu.o mmio.o psci.o hypercalls.o pvtime.o \
>  	 inject_fault.o va_layout.o handle_exit.o \
>  	 guest.o debug.o reset.o sys_regs.o \
>  	 vgic-sys-reg-v3.o fpsimd.o pmu.o pkvm.o \
> -	 arch_timer.o trng.o emulate-nested.o \
> +	 arch_timer.o trng.o emulate-nested.o nested.o \
>  	 vgic/vgic.o vgic/vgic-init.o \
>  	 vgic/vgic-irqfd.o vgic/vgic-v2.o \
>  	 vgic/vgic-v3.o vgic/vgic-v4.o \
> diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
> index 0cedef6e0d80..a1b1bbf3d598 100644
> --- a/arch/arm64/kvm/handle_exit.c
> +++ b/arch/arm64/kvm/handle_exit.c
> @@ -119,7 +119,16 @@ static int handle_no_fpsimd(struct kvm_vcpu *vcpu)
>   */
>  static int kvm_handle_wfx(struct kvm_vcpu *vcpu)
>  {
> -	if (kvm_vcpu_get_esr(vcpu) & ESR_ELx_WFx_ISS_WFE) {
> +	bool is_wfe = !!(kvm_vcpu_get_esr(vcpu) & ESR_ELx_WFx_ISS_WFE);
> +
> +	if (vcpu_has_nv(vcpu)) {
> +		int ret = handle_wfx_nested(vcpu, is_wfe);
> +
> +		if (ret != -EINVAL)
> +			return ret;

I find this rather clunky. The common pattern is that a function returns
early when it encounters an error, but here this pattern is reversed:
-EINVAL means that handle_wfx_nested() failed in handling the WFx, so
proceed as usual; conversly, anything but -EINVAL means handle_wfx_nested()
was successful in handling WFx, so exit early from kvm_handle_wfx().

That would be ok by itself, but if we dig deeper, handle_wfx_nested() ends up
calling kvm_inject_nested(), where -EINVAL is actually an error code. Granted,
that should never happen, because kvm_handle_wfx() first checks vcpu_has_nv(),
but still feels like something that could be improved.

Maybe changing handle_wfx_nested() like this would be better:

--- a/arch/arm64/kvm/nested.c
+++ b/arch/arm64/kvm/nested.c
@@ -14,15 +14,18 @@
  * the virtual HCR_EL2.TWX is set. Otherwise, let the host hypervisor
  * handle this.
  */
-int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe)
+bool handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe, int *error)
 {
        u64 hcr_el2 = __vcpu_sys_reg(vcpu, HCR_EL2);
 
+       *error = 0;
        if (vcpu_is_el2(vcpu))
-               return -EINVAL;
+               return false;
 
-       if ((is_wfe && (hcr_el2 & HCR_TWE)) || (!is_wfe && (hcr_el2 & HCR_TWI)))
-               return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_esr(vcpu));
+       if ((is_wfe && (hcr_el2 & HCR_TWE)) || (!is_wfe && (hcr_el2 & HCR_TWI))) {
+               *error = kvm_inject_nested_sync(vcpu, kvm_vcpu_get_esr(vcpu));
+               return true;
+       }
 
-       return -EINVAL;
+       return false;
 }

Now the return value means one thing only (did handle_wfx_nested() handle
the trap?) and we still capture the error code.
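
With that change, the call site in kvm_handle_wfx() would presumably end up
looking something like this (untested, just to illustrate):

	if (vcpu_has_nv(vcpu)) {
		int err;

		if (handle_wfx_nested(vcpu, is_wfe, &err))
			return err;
	}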

Or perhaps folding handle_wfx_nested() into kvm_handle_wfx() would be
preferable.
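
Folding it in would boil down to something like this at the top of
kvm_handle_wfx(), before the existing is_wfe handling (rough sketch based
on the hunk above, untested):

	if (vcpu_has_nv(vcpu) && !vcpu_is_el2(vcpu)) {
		u64 hcr_el2 = __vcpu_sys_reg(vcpu, HCR_EL2);

		if ((is_wfe && (hcr_el2 & HCR_TWE)) ||
		    (!is_wfe && (hcr_el2 & HCR_TWI)))
			return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_esr(vcpu));
	}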

What do you think?

Thanks,
Alex

> +	}
> +
> +	if (is_wfe) {
>  		trace_kvm_wfx_arm64(*vcpu_pc(vcpu), true);
>  		vcpu->stat.wfe_exit_stat++;
>  		kvm_vcpu_on_spin(vcpu, vcpu_mode_priv(vcpu));
> diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
> new file mode 100644
> index 000000000000..5e1104f8e765
> --- /dev/null
> +++ b/arch/arm64/kvm/nested.c
> @@ -0,0 +1,28 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (C) 2017 - Columbia University and Linaro Ltd.
> + * Author: Jintack Lim <jintack.lim@linaro.org>
> + */
> +
> +#include <linux/kvm.h>
> +#include <linux/kvm_host.h>
> +
> +#include <asm/kvm_emulate.h>
> +
> +/*
> + * Inject wfx to the virtual EL2 if this is not from the virtual EL2 and
> + * the virtual HCR_EL2.TWX is set. Otherwise, let the host hypervisor
> + * handle this.
> + */
> +int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe)
> +{
> +	u64 hcr_el2 = __vcpu_sys_reg(vcpu, HCR_EL2);
> +
> +	if (vcpu_is_el2(vcpu))
> +		return -EINVAL;
> +
> +	if ((is_wfe && (hcr_el2 & HCR_TWE)) || (!is_wfe && (hcr_el2 & HCR_TWI)))
> +		return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_esr(vcpu));
> +
> +	return -EINVAL;
> +}
> -- 
> 2.30.2
> 
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 378+ messages in thread

* Re: [PATCH v6 22/64] KVM: arm64: nv: Respect virtual HCR_EL2.TWX setting
  2022-02-04 15:40     ` Alexandru Elisei
  (?)
@ 2022-02-04 16:01       ` Alexandru Elisei
  -1 siblings, 0 replies; 378+ messages in thread
From: Alexandru Elisei @ 2022-02-04 16:01 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, Andre Przywara, Christoffer Dall,
	Jintack Lim, Haibo Xu, Ganapatrao Kulkarni, Chase Conklin,
	Russell King (Oracle),
	James Morse, Suzuki K Poulose, karl.heubaum, mihai.carabas,
	miguel.luis, kernel-team

Hi,

On Fri, Feb 04, 2022 at 03:40:15PM +0000, Alexandru Elisei wrote:
> Hi Marc,
> 
> On Fri, Jan 28, 2022 at 12:18:30PM +0000, Marc Zyngier wrote:
> > From: Jintack Lim <jintack.lim@linaro.org>
> > 
> > Forward exceptions due to WFI or WFE instructions to the virtual EL2 if
> > they are not coming from the virtual EL2 and virtual HCR_EL2.TWX is set.
> > 
> > Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
> > Signed-off-by: Marc Zyngier <maz@kernel.org>
> > ---
> >  arch/arm64/include/asm/kvm_nested.h |  2 ++
> >  arch/arm64/kvm/Makefile             |  2 +-
> >  arch/arm64/kvm/handle_exit.c        | 11 ++++++++++-
> >  arch/arm64/kvm/nested.c             | 28 ++++++++++++++++++++++++++++
> >  4 files changed, 41 insertions(+), 2 deletions(-)
> >  create mode 100644 arch/arm64/kvm/nested.c
> > 
> > diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
> > index 5a85be6d8eb3..79d382fa02ea 100644
> > --- a/arch/arm64/include/asm/kvm_nested.h
> > +++ b/arch/arm64/include/asm/kvm_nested.h
> > @@ -65,4 +65,6 @@ static inline u64 translate_cnthctl_el2_to_cntkctl_el1(u64 cnthctl)
> >  		(cnthctl & (CNTHCTL_EVNTI | CNTHCTL_EVNTDIR | CNTHCTL_EVNTEN)));
> >  }
> >  
> > +int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe);
> > +
> >  #endif /* __ARM64_KVM_NESTED_H */
> > diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
> > index b67c4ebd72b1..dbaf42ff65f1 100644
> > --- a/arch/arm64/kvm/Makefile
> > +++ b/arch/arm64/kvm/Makefile
> > @@ -14,7 +14,7 @@ kvm-y += arm.o mmu.o mmio.o psci.o hypercalls.o pvtime.o \
> >  	 inject_fault.o va_layout.o handle_exit.o \
> >  	 guest.o debug.o reset.o sys_regs.o \
> >  	 vgic-sys-reg-v3.o fpsimd.o pmu.o pkvm.o \
> > -	 arch_timer.o trng.o emulate-nested.o \
> > +	 arch_timer.o trng.o emulate-nested.o nested.o \
> >  	 vgic/vgic.o vgic/vgic-init.o \
> >  	 vgic/vgic-irqfd.o vgic/vgic-v2.o \
> >  	 vgic/vgic-v3.o vgic/vgic-v4.o \
> > diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
> > index 0cedef6e0d80..a1b1bbf3d598 100644
> > --- a/arch/arm64/kvm/handle_exit.c
> > +++ b/arch/arm64/kvm/handle_exit.c
> > @@ -119,7 +119,16 @@ static int handle_no_fpsimd(struct kvm_vcpu *vcpu)
> >   */
> >  static int kvm_handle_wfx(struct kvm_vcpu *vcpu)
> >  {
> > -	if (kvm_vcpu_get_esr(vcpu) & ESR_ELx_WFx_ISS_WFE) {
> > +	bool is_wfe = !!(kvm_vcpu_get_esr(vcpu) & ESR_ELx_WFx_ISS_WFE);
> > +
> > +	if (vcpu_has_nv(vcpu)) {
> > +		int ret = handle_wfx_nested(vcpu, is_wfe);
> > +
> > +		if (ret != -EINVAL)
> > +			return ret;
> 
> I find this rather clunky. The common pattern is that a function returns
> early when it encounters an error, but here this pattern is reversed:
> -EINVAL means that handle_wfx_nested() failed in handling the WFx, so
> proceed as usual; conversly, anything but -EINVAL means handle_wfx_nested()
> was successful in handling WFx, so exit early from kvm_handle_wfx().
> 
> That would be ok by itself, but if we dig deeper, handle_wfx_nested() ends up
> calling kvm_inject_nested(), where -EINVAL is actually an error code. Granted,
> that should never happen, because kvm_handle_wfx() first checks vcpu_has_nv(),
> but still feels like something that could be improved.
> 
> Maybe changing handle_wfx_nested() like this would be better:
> [..]

Or change kvm_handle_wfx() to handle the WFx trap like kvm_handle_fpasimd():

	if (guest_wfx_traps_enabled(vcpu))
		return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_esr(vcpu));
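
Where guest_wfx_traps_enabled() would be a new helper (the name is just
something I made up); it could be something along these lines, with the same
logic as the current handle_wfx_nested() (untested):

	static bool guest_wfx_traps_enabled(struct kvm_vcpu *vcpu)
	{
		u64 hcr_el2 = __vcpu_sys_reg(vcpu, HCR_EL2);
		bool is_wfe = !!(kvm_vcpu_get_esr(vcpu) & ESR_ELx_WFx_ISS_WFE);

		if (!vcpu_has_nv(vcpu) || vcpu_is_el2(vcpu))
			return false;

		return (is_wfe && (hcr_el2 & HCR_TWE)) ||
		       (!is_wfe && (hcr_el2 & HCR_TWI));
	}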

Thanks,
Alex

^ permalink raw reply	[flat|nested] 378+ messages in thread

* Re: [PATCH v6 24/64] KVM: arm64: nv: Respect the virtual HCR_EL2.NV bit setting
  2022-01-28 12:18   ` Marc Zyngier
  (?)
@ 2022-02-07 15:33     ` Alexandru Elisei
  -1 siblings, 0 replies; 378+ messages in thread
From: Alexandru Elisei @ 2022-02-07 15:33 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kernel-team, kvm, Andre Przywara, Christoffer Dall,
	Chase Conklin, kvmarm, mihai.carabas, Ganapatrao Kulkarni,
	Russell King (Oracle),
	linux-arm-kernel

Hi Marc,

On Fri, Jan 28, 2022 at 12:18:32PM +0000, Marc Zyngier wrote:
> From: Jintack Lim <jintack.lim@linaro.org>
> 
> Forward traps due to HCR_EL2.NV bit to the virtual EL2 if they are not
> coming from the virtual EL2 and the virtual HCR_EL2.NV bit is set.
> 
> In addition to EL2 register accesses, setting NV bit will also make EL12
> register accesses trap to EL2. To emulate this for the virtual EL2,
> forward traps due to EL12 register accesses to the virtual EL2 if the
> virtual HCR_EL2.NV bit is set.

The patch also adds handling for the HCR_EL2.TSC bit. It might prove useful for
the commit subject and message to reflect that.

Also, HCR_EL2.NV enables trapping of accesses to the *_EL02, *_EL2 and
SP_EL1 registers, as well as trapping the execution of ERET, ERETAA, ERETAB
and certain AT and TLB maintenance instructions. I don't see those
mentioned anywhere.

IMO, the commit message should be reworded to say exactly what is being
forwarded, because as it stands it is very misleading.

> 
> This is for recursive nested virtualization.
> 
> Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
> [Moved code to emulate-nested.c]

What goes in emulate-nested.c and what goes in nested.c?

> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/include/asm/kvm_arm.h    |  1 +
>  arch/arm64/include/asm/kvm_nested.h |  2 ++
>  arch/arm64/kvm/emulate-nested.c     | 27 +++++++++++++++++++++++++++
>  arch/arm64/kvm/handle_exit.c        |  7 +++++++
>  arch/arm64/kvm/sys_regs.c           | 21 +++++++++++++++++++++
>  5 files changed, 58 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
> index 5acb153a82c8..8043827e7dc0 100644
> --- a/arch/arm64/include/asm/kvm_arm.h
> +++ b/arch/arm64/include/asm/kvm_arm.h
> @@ -20,6 +20,7 @@
>  #define HCR_AMVOFFEN	(UL(1) << 51)
>  #define HCR_FIEN	(UL(1) << 47)
>  #define HCR_FWB		(UL(1) << 46)
> +#define HCR_NV		(UL(1) << 42)
>  #define HCR_API		(UL(1) << 41)
>  #define HCR_APK		(UL(1) << 40)
>  #define HCR_TEA		(UL(1) << 37)
> diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
> index 79d382fa02ea..37ff6458296d 100644
> --- a/arch/arm64/include/asm/kvm_nested.h
> +++ b/arch/arm64/include/asm/kvm_nested.h
> @@ -66,5 +66,7 @@ static inline u64 translate_cnthctl_el2_to_cntkctl_el1(u64 cnthctl)
>  }
>  
>  int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe);
> +extern bool forward_traps(struct kvm_vcpu *vcpu, u64 control_bit);
> +extern bool forward_nv_traps(struct kvm_vcpu *vcpu);
>  
>  #endif /* __ARM64_KVM_NESTED_H */
> diff --git a/arch/arm64/kvm/emulate-nested.c b/arch/arm64/kvm/emulate-nested.c
> index f52cd4458947..7dd98d6e96e0 100644
> --- a/arch/arm64/kvm/emulate-nested.c
> +++ b/arch/arm64/kvm/emulate-nested.c
> @@ -13,6 +13,26 @@
>  
>  #include "trace.h"
>  
> +bool forward_traps(struct kvm_vcpu *vcpu, u64 control_bit)
> +{
> +	bool control_bit_set;
> +
> +	if (!vcpu_has_nv(vcpu))
> +		return false;
> +
> +	control_bit_set = __vcpu_sys_reg(vcpu, HCR_EL2) & control_bit;
> +	if (!vcpu_is_el2(vcpu) && control_bit_set) {
> +		kvm_inject_nested_sync(vcpu, kvm_vcpu_get_esr(vcpu));
> +		return true;
> +	}
> +	return false;
> +}
> +
> +bool forward_nv_traps(struct kvm_vcpu *vcpu)
> +{
> +	return forward_traps(vcpu, HCR_NV);
> +}
> +
>  static u64 kvm_check_illegal_exception_return(struct kvm_vcpu *vcpu, u64 spsr)
>  {
>  	u64 mode = spsr & PSR_MODE_MASK;
> @@ -49,6 +69,13 @@ void kvm_emulate_nested_eret(struct kvm_vcpu *vcpu)
>  	u64 spsr, elr, mode;
>  	bool direct_eret;
>  
> +	/*
> +	 * Forward this trap to the virtual EL2 if the virtual
> +	 * HCR_EL2.NV bit is set and this is coming from !EL2.
> +	 */

I was under the impression that Documentation/process/coding-style.rst frowns
upon comments that merely explain what a function does. forward_traps() is
small and simple, so I don't think the comment is needed to understand what
the function does.

> +	if (forward_nv_traps(vcpu))
> +		return;
> +
>  	/*
>  	 * Going through the whole put/load motions is a waste of time
>  	 * if this is a VHE guest hypervisor returning to its own
> diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
> index a5c698f188d6..867de65eb766 100644
> --- a/arch/arm64/kvm/handle_exit.c
> +++ b/arch/arm64/kvm/handle_exit.c
> @@ -64,6 +64,13 @@ static int handle_smc(struct kvm_vcpu *vcpu)
>  {
>  	int ret;
>  
> +	/*
> +	 * Forward this trapped smc instruction to the virtual EL2 if
> +	 * the guest has asked for it.
> +	 */
> +	if (forward_traps(vcpu, HCR_TSC))

Like I've said, this part is not mentioned in the commit message at all.

> +		return 1;
> +
>  	/*
>  	 * "If an SMC instruction executed at Non-secure EL1 is
>  	 * trapped to EL2 because HCR_EL2.TSC is 1, the exception is a
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 7f074a7f6eb3..ccd063d6cb69 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -267,10 +267,19 @@ static u32 get_ccsidr(u32 csselr)
>  	return ccsidr;
>  }
>  
> +static bool el12_reg(struct sys_reg_params *p)
> +{
> +	/* All *_EL12 registers have Op1=5. */
> +	return (p->Op1 == 5);
> +}
> +
>  static bool access_rw(struct kvm_vcpu *vcpu,
>  		      struct sys_reg_params *p,
>  		      const struct sys_reg_desc *r)
>  {
> +	if (el12_reg(p) && forward_nv_traps(vcpu))
> +		return false;
> +
>  	if (p->is_write)
>  		vcpu_write_sys_reg(vcpu, p->regval, r->reg);
>  	else
> @@ -339,6 +348,9 @@ static bool access_vm_reg(struct kvm_vcpu *vcpu,
>  	bool was_enabled = vcpu_has_cache_enabled(vcpu);
>  	u64 val, mask, shift;
>  
> +	if (el12_reg(p) && forward_nv_traps(vcpu))
> +		return false;
> +
>  	/* We don't expect TRVM on the host */
>  	BUG_ON(!vcpu_is_el2(vcpu) && !p->is_write);
>  
> @@ -1654,6 +1666,9 @@ static bool access_elr(struct kvm_vcpu *vcpu,
>  		       struct sys_reg_params *p,
>  		       const struct sys_reg_desc *r)
>  {
> +	if (el12_reg(p) && forward_nv_traps(vcpu))

ELR_EL2 has Op1 = 4, and ELR_EL1 has Op1 = 0, and as far as I can tell there are
no _EL12 variants. Why use el12_reg() here when it always returns false?

> +		return false;
> +
>  	if (p->is_write)
>  		vcpu_write_sys_reg(vcpu, p->regval, ELR_EL1);
>  	else
> @@ -1666,6 +1681,9 @@ static bool access_spsr(struct kvm_vcpu *vcpu,
>  			struct sys_reg_params *p,
>  			const struct sys_reg_desc *r)
>  {
> +	if (el12_reg(p) && forward_nv_traps(vcpu))
> +		return false;
> +
>  	if (p->is_write)
>  		__vcpu_sys_reg(vcpu, SPSR_EL1) = p->regval;
>  	else
> @@ -1678,6 +1696,9 @@ static bool access_spsr_el2(struct kvm_vcpu *vcpu,
>  			    struct sys_reg_params *p,
>  			    const struct sys_reg_desc *r)
>  {
> +	if (el12_reg(p) && forward_nv_traps(vcpu))
> +		return false;

SPSR_EL2 is an EL2 register, so el12_reg() (which checks for Op1 = 5) always
returns false for it. Shouldn't that check simply be:

		if (forward_nv_traps(vcpu))
			return false;

Thanks,
Alex

> +
>  	if (p->is_write)
>  		vcpu_write_sys_reg(vcpu, p->regval, SPSR_EL2);
>  	else
> -- 
> 2.30.2
> 
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 378+ messages in thread

* Re: [PATCH v6 22/64] KVM: arm64: nv: Respect virtual HCR_EL2.TWX setting
  2022-02-04 15:40     ` Alexandru Elisei
  (?)
@ 2022-02-07 15:38       ` Alexandru Elisei
  -1 siblings, 0 replies; 378+ messages in thread
From: Alexandru Elisei @ 2022-02-07 15:38 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kernel-team, kvm, Andre Przywara, Christoffer Dall,
	Chase Conklin, kvmarm, mihai.carabas, Ganapatrao Kulkarni,
	Russell King (Oracle),
	linux-arm-kernel

Hi Marc,

On Fri, Feb 04, 2022 at 03:40:15PM +0000, Alexandru Elisei wrote:
> Hi Marc,
> 
> On Fri, Jan 28, 2022 at 12:18:30PM +0000, Marc Zyngier wrote:
> > From: Jintack Lim <jintack.lim@linaro.org>
> > 
> > Forward exceptions due to WFI or WFE instructions to the virtual EL2 if
> > they are not coming from the virtual EL2 and virtual HCR_EL2.TWX is set.
> > 
> > Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
> > Signed-off-by: Marc Zyngier <maz@kernel.org>
> > ---
> >  arch/arm64/include/asm/kvm_nested.h |  2 ++
> >  arch/arm64/kvm/Makefile             |  2 +-
> >  arch/arm64/kvm/handle_exit.c        | 11 ++++++++++-
> >  arch/arm64/kvm/nested.c             | 28 ++++++++++++++++++++++++++++
> >  4 files changed, 41 insertions(+), 2 deletions(-)
> >  create mode 100644 arch/arm64/kvm/nested.c
> > 
> > diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
> > index 5a85be6d8eb3..79d382fa02ea 100644
> > --- a/arch/arm64/include/asm/kvm_nested.h
> > +++ b/arch/arm64/include/asm/kvm_nested.h
> > @@ -65,4 +65,6 @@ static inline u64 translate_cnthctl_el2_to_cntkctl_el1(u64 cnthctl)
> >  		(cnthctl & (CNTHCTL_EVNTI | CNTHCTL_EVNTDIR | CNTHCTL_EVNTEN)));
> >  }
> >  
> > +int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe);
> > +
> >  #endif /* __ARM64_KVM_NESTED_H */
> > diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
> > index b67c4ebd72b1..dbaf42ff65f1 100644
> > --- a/arch/arm64/kvm/Makefile
> > +++ b/arch/arm64/kvm/Makefile
> > @@ -14,7 +14,7 @@ kvm-y += arm.o mmu.o mmio.o psci.o hypercalls.o pvtime.o \
> >  	 inject_fault.o va_layout.o handle_exit.o \
> >  	 guest.o debug.o reset.o sys_regs.o \
> >  	 vgic-sys-reg-v3.o fpsimd.o pmu.o pkvm.o \
> > -	 arch_timer.o trng.o emulate-nested.o \
> > +	 arch_timer.o trng.o emulate-nested.o nested.o \
> >  	 vgic/vgic.o vgic/vgic-init.o \
> >  	 vgic/vgic-irqfd.o vgic/vgic-v2.o \
> >  	 vgic/vgic-v3.o vgic/vgic-v4.o \
> > diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
> > index 0cedef6e0d80..a1b1bbf3d598 100644
> > --- a/arch/arm64/kvm/handle_exit.c
> > +++ b/arch/arm64/kvm/handle_exit.c
> > @@ -119,7 +119,16 @@ static int handle_no_fpsimd(struct kvm_vcpu *vcpu)
> >   */
> >  static int kvm_handle_wfx(struct kvm_vcpu *vcpu)
> >  {
> > -	if (kvm_vcpu_get_esr(vcpu) & ESR_ELx_WFx_ISS_WFE) {
> > +	bool is_wfe = !!(kvm_vcpu_get_esr(vcpu) & ESR_ELx_WFx_ISS_WFE);
> > +
> > +	if (vcpu_has_nv(vcpu)) {
> > +		int ret = handle_wfx_nested(vcpu, is_wfe);
> > +
> > +		if (ret != -EINVAL)
> > +			return ret;
> 
> I find this rather clunky. The common pattern is that a function returns
> early when it encounters an error, but here this pattern is reversed:
> -EINVAL means that handle_wfx_nested() failed in handling the WFx, so
> proceed as usual; conversly, anything but -EINVAL means handle_wfx_nested()
> was successful in handling WFx, so exit early from kvm_handle_wfx().
> 
> That would be ok by itself, but if we dig deeper, handle_wfx_nested() ends up
> calling kvm_inject_nested(), where -EINVAL is actually an error code. Granted,
> that should never happen, because kvm_handle_wfx() first checks vcpu_has_nv(),
> but still feels like something that could be improved.
> 
> Maybe changing handle_wfx_nested() like this would be better:
> 
> --- a/arch/arm64/kvm/nested.c
> +++ b/arch/arm64/kvm/nested.c
> @@ -14,15 +14,18 @@
>   * the virtual HCR_EL2.TWX is set. Otherwise, let the host hypervisor
>   * handle this.
>   */
> -int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe)
> +bool handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe, int *error)
>  {
>         u64 hcr_el2 = __vcpu_sys_reg(vcpu, HCR_EL2);
>  
> +       *error = 0;
>         if (vcpu_is_el2(vcpu))
> -               return -EINVAL;
> +               return false;
>  
> -       if ((is_wfe && (hcr_el2 & HCR_TWE)) || (!is_wfe && (hcr_el2 & HCR_TWI)))
> -               return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_esr(vcpu));
> +       if ((is_wfe && (hcr_el2 & HCR_TWE)) || (!is_wfe && (hcr_el2 & HCR_TWI))) {
> +               *error = kvm_inject_nested_sync(vcpu, kvm_vcpu_get_esr(vcpu));
> +               return true;
> +       }
>  
> -       return -EINVAL;
> +       return false;
>  }
> 
> Now the return value means one thing only (did handle_wfx_nested() handle
> the trap?) and we still capture the error code.
> 
> Or perhaps folding handle_wfx_nested() into kvm_handle_wfx() would be
> preferable.
> 
> What do you think?

Or kvm_handle_wfx() can be rewritten to use forward_traps(), introduced in the patch
after the next one (#24, "KVM: arm64: nv: Respect the virtual HCR_EL2.NV bit setting"):

static int kvm_handle_wfx(struct kvm_vcpu *vcpu)
{
	bool is_wfe = !!(kvm_vcpu_get_esr(vcpu) & ESR_ELx_WFx_ISS_WFE);

	if (is_wfe && forward_traps(vcpu, HCR_TWE))
		return 1;

	if (!is_wfe && forward_traps(vcpu, HCR_TWI))
		return 1;

	[..]
}
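
The two checks could also be collapsed into one by picking the bit to test,
if that reads better:

	if (forward_traps(vcpu, is_wfe ? HCR_TWE : HCR_TWI))
		return 1;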

Plenty of options to choose from.

Thanks,
Alex

> 
> Thanks,
> Alex
> 
> > +	}
> > +
> > +	if (is_wfe) {
> >  		trace_kvm_wfx_arm64(*vcpu_pc(vcpu), true);
> >  		vcpu->stat.wfe_exit_stat++;
> >  		kvm_vcpu_on_spin(vcpu, vcpu_mode_priv(vcpu));
> > diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
> > new file mode 100644
> > index 000000000000..5e1104f8e765
> > --- /dev/null
> > +++ b/arch/arm64/kvm/nested.c
> > @@ -0,0 +1,28 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/*
> > + * Copyright (C) 2017 - Columbia University and Linaro Ltd.
> > + * Author: Jintack Lim <jintack.lim@linaro.org>
> > + */
> > +
> > +#include <linux/kvm.h>
> > +#include <linux/kvm_host.h>
> > +
> > +#include <asm/kvm_emulate.h>
> > +
> > +/*
> > + * Inject wfx to the virtual EL2 if this is not from the virtual EL2 and
> > + * the virtual HCR_EL2.TWX is set. Otherwise, let the host hypervisor
> > + * handle this.
> > + */
> > +int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe)
> > +{
> > +	u64 hcr_el2 = __vcpu_sys_reg(vcpu, HCR_EL2);
> > +
> > +	if (vcpu_is_el2(vcpu))
> > +		return -EINVAL;
> > +
> > +	if ((is_wfe && (hcr_el2 & HCR_TWE)) || (!is_wfe && (hcr_el2 & HCR_TWI)))
> > +		return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_esr(vcpu));
> > +
> > +	return -EINVAL;
> > +}
> > -- 
> > 2.30.2
> > 
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 378+ messages in thread

* Re: [PATCH v6 25/64] KVM: arm64: nv: Respect virtual HCR_EL2.TVM and TRVM settings
  2022-01-28 12:18   ` Marc Zyngier
  (?)
@ 2022-02-07 16:18     ` Alexandru Elisei
  -1 siblings, 0 replies; 378+ messages in thread
From: Alexandru Elisei @ 2022-02-07 16:18 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kernel-team, kvm, Andre Przywara, Christoffer Dall,
	Chase Conklin, kvmarm, mihai.carabas, Ganapatrao Kulkarni,
	Russell King (Oracle),
	linux-arm-kernel

Hi Marc,

On Fri, Jan 28, 2022 at 12:18:33PM +0000, Marc Zyngier wrote:
> From: Jintack Lim <jintack.lim@linaro.org>
> 
> Forward the EL1 virtual memory register traps to the virtual EL2 if they
> are not coming from the virtual EL2 and the virtual HCR_EL2.TVM or TRVM
> bit is set.
> 
> This is for recursive nested virtualization.
> 
> Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/kvm/sys_regs.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index ccd063d6cb69..edaf287c7ec9 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -351,6 +351,13 @@ static bool access_vm_reg(struct kvm_vcpu *vcpu,
>  	if (el12_reg(p) && forward_nv_traps(vcpu))
>  		return false;
>  
> +	if (!el12_reg(p)) {
> +		u64 bit = p->is_write ? HCR_TVM : HCR_TRVM;
> +
> +		if (forward_traps(vcpu, bit))
> +			return false;

This part of the TVM bit description from the architecture manual (page
D13-3290) got me really stumped for a while:

"When HCR_EL2.TGE is 1, the PE ignores the value of this field for all purposes
other than a direct read of this field".

But I soon realized that the architecture forbids an ERET to EL1 when TGE is
set, so all's good. I wonder why that part was added to the TVM bit
description, though.
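
Just to spell that reasoning out in code, a belt-and-braces variant of the
hunk above could skip the forwarding when the guest hypervisor has TGE set;
this is only a sketch of the reasoning, not something I think the series
actually needs:

	if (!el12_reg(p)) {
		u64 hcr = __vcpu_sys_reg(vcpu, HCR_EL2);
		u64 bit = p->is_write ? HCR_TVM : HCR_TRVM;

		/* TVM/TRVM are ignored when TGE is set; we should never
		 * get here from vEL1 with TGE set in the first place. */
		if (!(hcr & HCR_TGE) && forward_traps(vcpu, bit))
			return false;
	}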

Regardless, the patch looks good to me:

Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>

Thanks,
Alex

> +	}
> +
>  	/* We don't expect TRVM on the host */
>  	BUG_ON(!vcpu_is_el2(vcpu) && !p->is_write);
>  
> -- 
> 2.30.2
> 

^ permalink raw reply	[flat|nested] 378+ messages in thread

* Re: [PATCH v6 26/64] KVM: arm64: nv: Respect the virtual HCR_EL2.NV1 bit setting
  2022-01-28 12:18   ` Marc Zyngier
  (?)
@ 2022-02-07 16:36     ` Alexandru Elisei
  -1 siblings, 0 replies; 378+ messages in thread
From: Alexandru Elisei @ 2022-02-07 16:36 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kernel-team, kvm, Andre Przywara, Christoffer Dall,
	Chase Conklin, kvmarm, mihai.carabas, Ganapatrao Kulkarni,
	Russell King (Oracle),
	linux-arm-kernel

On Fri, Jan 28, 2022 at 12:18:34PM +0000, Marc Zyngier wrote:
> From: Jintack Lim <jintack@cs.columbia.edu>
> 
> Forward ELR_EL1, SPSR_EL1 and VBAR_EL1 traps to the virtual EL2 if the
> virtual HCR_EL2.NV bit is set.

Those registers are trapped when HCR_EL2.{NV,NV1} = {1,1}. They aren't trapped
when only HCR_EL2.NV is set.
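
To make that concrete, a minimal sketch of a helper that only forwards when
both bits are set; this is illustrative only (forward_traps() as posted takes
a single HCR_EL2 bit, so it cannot express the NV && NV1 condition directly):

bool forward_nv1_traps(struct kvm_vcpu *vcpu)
{
	u64 hcr = __vcpu_sys_reg(vcpu, HCR_EL2);

	if (!vcpu_has_nv(vcpu) || vcpu_is_el2(vcpu))
		return false;

	/* Only forward when both NV and NV1 are set in the guest's HCR_EL2 */
	if ((hcr & (HCR_NV | HCR_NV1)) != (HCR_NV | HCR_NV1))
		return false;

	kvm_inject_nested_sync(vcpu, kvm_vcpu_get_esr(vcpu));
	return true;
}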

> 
> This is for recursive nested virtualization.
> 
> Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/include/asm/kvm_arm.h    |  1 +
>  arch/arm64/include/asm/kvm_nested.h |  1 +
>  arch/arm64/kvm/emulate-nested.c     |  5 +++++
>  arch/arm64/kvm/sys_regs.c           | 22 +++++++++++++++++++++-
>  4 files changed, 28 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
> index 8043827e7dc0..748c2b068d4e 100644
> --- a/arch/arm64/include/asm/kvm_arm.h
> +++ b/arch/arm64/include/asm/kvm_arm.h
> @@ -20,6 +20,7 @@
>  #define HCR_AMVOFFEN	(UL(1) << 51)
>  #define HCR_FIEN	(UL(1) << 47)
>  #define HCR_FWB		(UL(1) << 46)
> +#define HCR_NV1		(UL(1) << 43)
>  #define HCR_NV		(UL(1) << 42)
>  #define HCR_API		(UL(1) << 41)
>  #define HCR_APK		(UL(1) << 40)
> diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
> index 37ff6458296d..82fc8b6c990b 100644
> --- a/arch/arm64/include/asm/kvm_nested.h
> +++ b/arch/arm64/include/asm/kvm_nested.h
> @@ -68,5 +68,6 @@ static inline u64 translate_cnthctl_el2_to_cntkctl_el1(u64 cnthctl)
>  int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe);
>  extern bool forward_traps(struct kvm_vcpu *vcpu, u64 control_bit);
>  extern bool forward_nv_traps(struct kvm_vcpu *vcpu);
> +extern bool forward_nv1_traps(struct kvm_vcpu *vcpu);
>  
>  #endif /* __ARM64_KVM_NESTED_H */
> diff --git a/arch/arm64/kvm/emulate-nested.c b/arch/arm64/kvm/emulate-nested.c
> index 7dd98d6e96e0..0109dfd664dd 100644
> --- a/arch/arm64/kvm/emulate-nested.c
> +++ b/arch/arm64/kvm/emulate-nested.c
> @@ -33,6 +33,11 @@ bool forward_nv_traps(struct kvm_vcpu *vcpu)
>  	return forward_traps(vcpu, HCR_NV);
>  }
>  
> +bool forward_nv1_traps(struct kvm_vcpu *vcpu)
> +{
> +	return forward_traps(vcpu, HCR_NV1);
> +}
> +
>  static u64 kvm_check_illegal_exception_return(struct kvm_vcpu *vcpu, u64 spsr)
>  {
>  	u64 mode = spsr & PSR_MODE_MASK;
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index edaf287c7ec9..31d739d59f67 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -288,6 +288,16 @@ static bool access_rw(struct kvm_vcpu *vcpu,
>  	return true;
>  }
>  
> +static bool access_vbar_el1(struct kvm_vcpu *vcpu,
> +			    struct sys_reg_params *p,
> +			    const struct sys_reg_desc *r)
> +{
> +	if (forward_nv1_traps(vcpu))
> +		return false;
> +
> +	return access_rw(vcpu, p, r);
> +}
> +
>  /*
>   * See note at ARMv7 ARM B1.14.4 (TL;DR: S/W ops are not easily virtualized).
>   */
> @@ -1669,6 +1679,7 @@ static bool access_sp_el1(struct kvm_vcpu *vcpu,
>  	return true;
>  }
>  
> +

Hm... extra newline?

Thanks,
Alex

>  static bool access_elr(struct kvm_vcpu *vcpu,
>  		       struct sys_reg_params *p,
>  		       const struct sys_reg_desc *r)
> @@ -1676,6 +1687,9 @@ static bool access_elr(struct kvm_vcpu *vcpu,
>  	if (el12_reg(p) && forward_nv_traps(vcpu))
>  		return false;
>  
> +	if (!el12_reg(p) && forward_nv1_traps(vcpu))
> +		return false;
> +
>  	if (p->is_write)
>  		vcpu_write_sys_reg(vcpu, p->regval, ELR_EL1);
>  	else
> @@ -1691,6 +1705,9 @@ static bool access_spsr(struct kvm_vcpu *vcpu,
>  	if (el12_reg(p) && forward_nv_traps(vcpu))
>  		return false;
>  
> +	if (!el12_reg(p) && forward_nv1_traps(vcpu))
> +		return false;
> +
>  	if (p->is_write)
>  		__vcpu_sys_reg(vcpu, SPSR_EL1) = p->regval;
>  	else
> @@ -1706,6 +1723,9 @@ static bool access_spsr_el2(struct kvm_vcpu *vcpu,
>  	if (el12_reg(p) && forward_nv_traps(vcpu))
>  		return false;
>  
> +	if (!el12_reg(p) && forward_nv1_traps(vcpu))
> +		return false;
> +
>  	if (p->is_write)
>  		vcpu_write_sys_reg(vcpu, p->regval, SPSR_EL2);
>  	else
> @@ -1914,7 +1934,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
>  	{ SYS_DESC(SYS_LORC_EL1), trap_loregion },
>  	{ SYS_DESC(SYS_LORID_EL1), trap_loregion },
>  
> -	{ SYS_DESC(SYS_VBAR_EL1), access_rw, reset_val, VBAR_EL1, 0 },
> +	{ SYS_DESC(SYS_VBAR_EL1), access_vbar_el1, reset_val, VBAR_EL1, 0 },
>  	{ SYS_DESC(SYS_DISR_EL1), NULL, reset_val, DISR_EL1, 0 },
>  
>  	{ SYS_DESC(SYS_ICC_IAR0_EL1), write_to_read_only },
> -- 
> 2.30.2
> 

^ permalink raw reply	[flat|nested] 378+ messages in thread

* Re: [PATCH v6 27/64] KVM: arm64: nv: Allow a sysreg to be hidden from userspace only
  2022-01-28 12:18   ` Marc Zyngier
  (?)
@ 2022-02-08 14:36     ` Alexandru Elisei
  -1 siblings, 0 replies; 378+ messages in thread
From: Alexandru Elisei @ 2022-02-08 14:36 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, Andre Przywara, Christoffer Dall,
	Jintack Lim, Haibo Xu, Ganapatrao Kulkarni, Chase Conklin,
	Russell King (Oracle),
	James Morse, Suzuki K Poulose, karl.heubaum, mihai.carabas,
	miguel.luis, kernel-team

Hi Marc,

On Fri, Jan 28, 2022 at 12:18:35PM +0000, Marc Zyngier wrote:
> So far, we never needed to distinguish between registers hidden
> from userspace and being hidden from a guest (they are always
> either visible to both, or hidden from both).
> 
> With NV, we have the ugly case of the EL{0,1}2 registers, which
                                        ^^^^^^^^
That might be easier to parse if it is rephrased to "EL02 and EL12
registers".

> are only a view on the EL{0,1} registers. It makes absolutely no
> sense to expose them to userspace, since it already has the
> canonical view.
> 
> Add a new visibility flag (REG_HIDDEN_USER) and a new helper that
> checks for it and REG_HIDDEN when checking whether to expose
> a sysreg to userspace. Subsequent patches will make use of it.
> 
> Signed-off-by: Marc Zyngier <maz@kernel.org>

The patch makes sense to me: the _EL02 and _EL12 registers are just aliases
for the EL0 and EL1 registers, so there is no point in exposing both views to
userspace. These registers also only make sense when the VCPU has the EL2
feature, which is not always the case:

Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>

Thanks,
Alex

> ---
>  arch/arm64/kvm/sys_regs.c |  6 +++---
>  arch/arm64/kvm/sys_regs.h | 12 +++++++++++-
>  2 files changed, 14 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 31d739d59f67..0c9bbe5eee5e 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -3031,7 +3031,7 @@ int kvm_arm_sys_reg_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg
>  		return get_invariant_sys_reg(reg->id, uaddr);
>  
>  	/* Check for regs disabled by runtime config */
> -	if (sysreg_hidden(vcpu, r))
> +	if (sysreg_hidden_user(vcpu, r))
>  		return -ENOENT;
>  
>  	if (r->get_user)
> @@ -3056,7 +3056,7 @@ int kvm_arm_sys_reg_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg
>  		return set_invariant_sys_reg(reg->id, uaddr);
>  
>  	/* Check for regs disabled by runtime config */
> -	if (sysreg_hidden(vcpu, r))
> +	if (sysreg_hidden_user(vcpu, r))
>  		return -ENOENT;
>  
>  	if (r->set_user)
> @@ -3127,7 +3127,7 @@ static int walk_one_sys_reg(const struct kvm_vcpu *vcpu,
>  	if (!(rd->reg || rd->get_user))
>  		return 0;
>  
> -	if (sysreg_hidden(vcpu, rd))
> +	if (sysreg_hidden_user(vcpu, rd))
>  		return 0;
>  
>  	if (!copy_reg_to_user(rd, uind))
> diff --git a/arch/arm64/kvm/sys_regs.h b/arch/arm64/kvm/sys_regs.h
> index cc0cc95a0280..850629f083a3 100644
> --- a/arch/arm64/kvm/sys_regs.h
> +++ b/arch/arm64/kvm/sys_regs.h
> @@ -78,7 +78,8 @@ struct sys_reg_desc {
>  };
>  
>  #define REG_HIDDEN		(1 << 0) /* hidden from userspace and guest */
> -#define REG_RAZ			(1 << 1) /* RAZ from userspace and guest */
> +#define REG_HIDDEN_USER		(1 << 1) /* hidden from userspace only */
> +#define REG_RAZ			(1 << 2) /* RAZ from userspace and guest */
>  
>  static __printf(2, 3)
>  inline void print_sys_reg_msg(const struct sys_reg_params *p,
> @@ -138,6 +139,15 @@ static inline bool sysreg_hidden(const struct kvm_vcpu *vcpu,
>  	return r->visibility(vcpu, r) & REG_HIDDEN;
>  }
>  
> +static inline bool sysreg_hidden_user(const struct kvm_vcpu *vcpu,
> +				      const struct sys_reg_desc *r)
> +{
> +	if (likely(!r->visibility))
> +		return false;
> +
> +	return r->visibility(vcpu, r) & (REG_HIDDEN | REG_HIDDEN_USER);
> +}
> +
>  static inline bool sysreg_visible_as_raz(const struct kvm_vcpu *vcpu,
>  					 const struct sys_reg_desc *r)
>  {
> -- 
> 2.30.2
> 

^ permalink raw reply	[flat|nested] 378+ messages in thread

* Re: [PATCH v6 28/64] KVM: arm64: nv: Emulate EL12 register accesses from the virtual EL2
  2022-01-28 12:18   ` Marc Zyngier
  (?)
@ 2022-02-08 15:35     ` Alexandru Elisei
  -1 siblings, 0 replies; 378+ messages in thread
From: Alexandru Elisei @ 2022-02-08 15:35 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, Andre Przywara, Christoffer Dall,
	Jintack Lim, Haibo Xu, Ganapatrao Kulkarni, Chase Conklin,
	Russell King (Oracle),
	James Morse, Suzuki K Poulose, karl.heubaum, mihai.carabas,
	miguel.luis, kernel-team

Hi Marc,

On Fri, Jan 28, 2022 at 12:18:36PM +0000, Marc Zyngier wrote:
> From: Jintack Lim <jintack.lim@linaro.org>
> 
> With HCR_EL2.NV bit set, accesses to EL12 registers in the virtual EL2
> trap to EL2. Handle those traps just like we do for EL1 registers.
> 
> One exception is CNTKCTL_EL12. We don't trap on CNTKCTL_EL1 for non-VHE
> virtual EL2 because we don't have to. However, accessing CNTKCTL_EL12
> will trap since it's one of the EL12 registers controlled by the HCR_EL2.NV
> bit. Therefore, add a handler for it and don't treat it as a
> non-trapping register when preparing a shadow context.
> 
> These registers, being only a view on their EL1 counterpart, are
> permanently hidden from userspace.
> 
> Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
> [maz: EL12_REG(), register visibility]
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/kvm/sys_regs.c | 37 +++++++++++++++++++++++++++++++++++++
>  1 file changed, 37 insertions(+)
> 
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 0c9bbe5eee5e..697bf0bca550 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -1634,6 +1634,26 @@ static unsigned int el2_visibility(const struct kvm_vcpu *vcpu,
>  	.val = v,				\
>  }
>  
> +/*
> + * EL{0,1}2 registers are the EL2 view on an EL0 or EL1 register when
> + * HCR_EL2.E2H==1, and only in the sysreg table for convenience of
> + * handling traps. Given that, they are always hidden from userspace.
> + */
> +static unsigned int elx2_visibility(const struct kvm_vcpu *vcpu,
> +				    const struct sys_reg_desc *rd)
> +{
> +	return REG_HIDDEN_USER;
> +}
> +
> +#define EL12_REG(name, acc, rst, v) {		\
> +	SYS_DESC(SYS_##name##_EL12),		\
> +	.access = acc,				\
> +	.reset = rst,				\
> +	.reg = name##_EL1,			\
> +	.val = v,				\
> +	.visibility = elx2_visibility,		\
> +}
> +
>  /* sys_reg_desc initialiser for known cpufeature ID registers */
>  #define ID_SANITISED(name) {			\
>  	SYS_DESC(SYS_##name),			\
> @@ -2194,6 +2214,23 @@ static const struct sys_reg_desc sys_reg_descs[] = {
>  	EL2_REG(CNTVOFF_EL2, access_rw, reset_val, 0),
>  	EL2_REG(CNTHCTL_EL2, access_rw, reset_val, 0),
>  
> +	EL12_REG(SCTLR, access_vm_reg, reset_val, 0x00C50078),
> +	EL12_REG(CPACR, access_rw, reset_val, 0),
> +	EL12_REG(TTBR0, access_vm_reg, reset_unknown, 0),
> +	EL12_REG(TTBR1, access_vm_reg, reset_unknown, 0),
> +	EL12_REG(TCR, access_vm_reg, reset_val, 0),
> +	{ SYS_DESC(SYS_SPSR_EL12), access_spsr},
> +	{ SYS_DESC(SYS_ELR_EL12), access_elr},
> +	EL12_REG(AFSR0, access_vm_reg, reset_unknown, 0),
> +	EL12_REG(AFSR1, access_vm_reg, reset_unknown, 0),
> +	EL12_REG(ESR, access_vm_reg, reset_unknown, 0),
> +	EL12_REG(FAR, access_vm_reg, reset_unknown, 0),
> +	EL12_REG(MAIR, access_vm_reg, reset_unknown, 0),
> +	EL12_REG(AMAIR, access_vm_reg, reset_amair_el1, 0),
> +	EL12_REG(VBAR, access_rw, reset_val, 0),
> +	EL12_REG(CONTEXTIDR, access_vm_reg, reset_val, 0),
> +	EL12_REG(CNTKCTL, access_rw, reset_val, 0),

Compared against Table D5-48 from ARM DDI 0487G.a (page D5-2768), everything
looks correct to me:

Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>

Thanks,
Alex

> +
>  	EL2_REG(SP_EL2, NULL, reset_unknown, 0),
>  };
>  
> -- 
> 2.30.2
> 

^ permalink raw reply	[flat|nested] 378+ messages in thread

* Re: [PATCH v6 29/64] KVM: arm64: nv: Forward debug traps to the nested guest
  2022-01-28 12:18   ` Marc Zyngier
  (?)
@ 2022-02-09 11:04     ` Alexandru Elisei
  -1 siblings, 0 replies; 378+ messages in thread
From: Alexandru Elisei @ 2022-02-09 11:04 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kernel-team, kvm, Andre Przywara, Christoffer Dall,
	Chase Conklin, kvmarm, mihai.carabas, Ganapatrao Kulkarni,
	Russell King (Oracle),
	linux-arm-kernel

Hi Marc,

On Fri, Jan 28, 2022 at 12:18:37PM +0000, Marc Zyngier wrote:
> On handling a debug trap, check whether we need to forward it to the
> guest before handling it.
> 
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/include/asm/kvm_nested.h | 2 ++
>  arch/arm64/kvm/emulate-nested.c     | 9 +++++++--
>  arch/arm64/kvm/sys_regs.c           | 3 +++
>  3 files changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
> index 82fc8b6c990b..047ca700163b 100644
> --- a/arch/arm64/include/asm/kvm_nested.h
> +++ b/arch/arm64/include/asm/kvm_nested.h
> @@ -66,6 +66,8 @@ static inline u64 translate_cnthctl_el2_to_cntkctl_el1(u64 cnthctl)
>  }
>  
>  int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe);
> +extern bool __forward_traps(struct kvm_vcpu *vcpu, unsigned int reg,
> +			    u64 control_bit);
>  extern bool forward_traps(struct kvm_vcpu *vcpu, u64 control_bit);
>  extern bool forward_nv_traps(struct kvm_vcpu *vcpu);
>  extern bool forward_nv1_traps(struct kvm_vcpu *vcpu);
> diff --git a/arch/arm64/kvm/emulate-nested.c b/arch/arm64/kvm/emulate-nested.c
> index 0109dfd664dd..1f6cf8fe9fe3 100644
> --- a/arch/arm64/kvm/emulate-nested.c
> +++ b/arch/arm64/kvm/emulate-nested.c
> @@ -13,14 +13,14 @@
>  
>  #include "trace.h"
>  
> -bool forward_traps(struct kvm_vcpu *vcpu, u64 control_bit)
> +bool __forward_traps(struct kvm_vcpu *vcpu, unsigned int reg, u64 control_bit)
>  {
>  	bool control_bit_set;
>  
>  	if (!vcpu_has_nv(vcpu))
>  		return false;
>  
> -	control_bit_set = __vcpu_sys_reg(vcpu, HCR_EL2) & control_bit;
> +	control_bit_set = __vcpu_sys_reg(vcpu, reg) & control_bit;
>  	if (!vcpu_is_el2(vcpu) && control_bit_set) {
>  		kvm_inject_nested_sync(vcpu, kvm_vcpu_get_esr(vcpu));
>  		return true;
> @@ -28,6 +28,11 @@ bool forward_traps(struct kvm_vcpu *vcpu, u64 control_bit)
>  	return false;
>  }
>  
> +bool forward_traps(struct kvm_vcpu *vcpu, u64 control_bit)
> +{
> +	return __forward_traps(vcpu, HCR_EL2, control_bit);
> +}
> +
>  bool forward_nv_traps(struct kvm_vcpu *vcpu)
>  {
>  	return forward_traps(vcpu, HCR_NV);
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 697bf0bca550..3e1f37c507a8 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -566,6 +566,9 @@ static bool trap_debug_regs(struct kvm_vcpu *vcpu,
>  			    struct sys_reg_params *p,
>  			    const struct sys_reg_desc *r)
>  {
> +	if (__forward_traps(vcpu, MDCR_EL2, MDCR_EL2_TDA | MDCR_EL2_TDE))
> +		return false;

The description of the MDCR_EL2.TDA field says:

"This field is treated as being 1 for all purposes other than a direct read
when one or more of the following are true:

- MDCR_EL2.TDE == 1
- HCR_EL2.TGE == 1"

Shouldn't we also check for HCR_EL2.TGE == 1 when deciding to forward the trap?
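
Something like this is what I had in mind, just to make the question concrete
(completely untested, and simply reusing forward_traps() with the existing
HCR_TGE define):

	if (__forward_traps(vcpu, MDCR_EL2, MDCR_EL2_TDA | MDCR_EL2_TDE) ||
	    forward_traps(vcpu, HCR_TGE))
		return false;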

Thanks,
Alex

> +
>  	access_rw(vcpu, p, r);
>  	if (p->is_write)
>  		vcpu->arch.flags |= KVM_ARM64_DEBUG_DIRTY;
> -- 
> 2.30.2
> 

^ permalink raw reply	[flat|nested] 378+ messages in thread

* Re: [PATCH v6 30/64] KVM: arm64: nv: Configure HCR_EL2 for nested virtualization
  2022-01-28 12:18   ` Marc Zyngier
@ 2022-02-09 16:41     ` Alexandru Elisei
  -1 siblings, 0 replies; 378+ messages in thread
From: Alexandru Elisei @ 2022-02-09 16:41 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, Andre Przywara, Christoffer Dall,
	Jintack Lim, Haibo Xu, Ganapatrao Kulkarni, Chase Conklin,
	Russell King (Oracle),
	James Morse, Suzuki K Poulose, karl.heubaum, mihai.carabas,
	miguel.luis, kernel-team

Hi,

On Fri, Jan 28, 2022 at 12:18:38PM +0000, Marc Zyngier wrote:
> From: Jintack Lim <jintack.lim@linaro.org>
> 
> We enable nested virtualization by setting the HCR NV and NV1 bit.
> 
> When the virtual E2H bit is set, we can support EL2 register accesses
> via EL1 registers from the virtual EL2 by doing trap-and-emulate. A
> better alternative, however, is to allow the virtual EL2 to access EL2
> register states without trap. This can be easily achieved by not trapping
> EL1 registers since those registers already have EL2 register states.
> 
> Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/include/asm/kvm_arm.h |  1 +
>  arch/arm64/kvm/hyp/vhe/switch.c  | 38 +++++++++++++++++++++++++++++---
>  2 files changed, 36 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
> index 748c2b068d4e..d913c3fb5665 100644
> --- a/arch/arm64/include/asm/kvm_arm.h
> +++ b/arch/arm64/include/asm/kvm_arm.h
> @@ -87,6 +87,7 @@
>  			 HCR_BSU_IS | HCR_FB | HCR_TACR | \
>  			 HCR_AMO | HCR_SWIO | HCR_TIDCP | HCR_RW | HCR_TLOR | \
>  			 HCR_FMO | HCR_IMO | HCR_PTW )
> +#define HCR_GUEST_NV_FILTER_FLAGS (HCR_ATA | HCR_API | HCR_APK)
>  #define HCR_VIRT_EXCP_MASK (HCR_VSE | HCR_VI | HCR_VF)
>  #define HCR_HOST_NVHE_FLAGS (HCR_RW | HCR_API | HCR_APK | HCR_ATA)
>  #define HCR_HOST_NVHE_PROTECTED_FLAGS (HCR_HOST_NVHE_FLAGS | HCR_TSC)
> diff --git a/arch/arm64/kvm/hyp/vhe/switch.c b/arch/arm64/kvm/hyp/vhe/switch.c
> index 1e6a00e7bfb3..28845f907cfc 100644
> --- a/arch/arm64/kvm/hyp/vhe/switch.c
> +++ b/arch/arm64/kvm/hyp/vhe/switch.c
> @@ -35,9 +35,41 @@ static void __activate_traps(struct kvm_vcpu *vcpu)
>  	u64 hcr = vcpu->arch.hcr_el2;
>  	u64 val;
>  
> -	/* Trap VM sysreg accesses if an EL2 guest is not using VHE. */
> -	if (vcpu_is_el2(vcpu) && !vcpu_el2_e2h_is_set(vcpu))
> -		hcr |= HCR_TVM | HCR_TRVM;
> +	if (is_hyp_ctxt(vcpu)) {
> +		hcr |= HCR_NV;
> +
> +		if (!vcpu_el2_e2h_is_set(vcpu)) {
> +			/*
> +			 * For a guest hypervisor on v8.0, trap and emulate

That's confusing, because FEAT_NV is available from v8.3 and newer.

> +			 * the EL1 virtual memory control register accesses.
> +			 */

Wasn't the purpose of a comment to explain why something is done instead of
what it does? The bits are explained in the Arm ARM. How about something like
this instead:

"A hypervisor running in non-VHE mode writes to EL1 the memory control registers
when preparing to run a guest or the host at EL1. Since virtual EL2 is emulated
on top of physical EL1, allowing the L1 guest hypervisor to access the registers
directly would lead to the L1 guest inadvertently corrupting its own state.

The NV1 bit is set to trap accesses to the registers which aren't controlled by
the TVM and TRVM bits, and to change to translation table format used by the
EL1&0 translation regime format to the EL2 translation regime format, which is
what the L1 guest hypervisor running with E2H = 0 expects when it creates the
tables".

> +			hcr |= HCR_TVM | HCR_TRVM | HCR_NV1;
> +		} else {
> +			/*
> +			 * For a guest hypervisor on v8.1 (VHE), allow to
> +			 * access the EL1 virtual memory control registers
> +			 * natively. These accesses are to access EL2 register
> +			 * states.
> +			 * Note that we still need to respect the virtual
> +			 * HCR_EL2 state.
> +			 */
> +			u64 vhcr_el2 = __vcpu_sys_reg(vcpu, HCR_EL2);

Nitpick: I think it would be better to name this variable virtual_hcr_el2,
because vhcr_el2 looks like a valid register name.

Thanks,
Alex

> +
> +			vhcr_el2 &= ~HCR_GUEST_NV_FILTER_FLAGS;
> +
> +			/*
> +			 * We already set TVM to handle set/way cache maint
> +			 * ops traps, this somewhat collides with the nested
> +			 * virt trapping for nVHE. So turn this off for now
> +			 * here, in the hope that VHE guests won't ever do this.
> +			 * TODO: find out whether it's worth to support both
> +			 * cases at the same time.
> +			 */
> +			hcr &= ~HCR_TVM;
> +
> +			hcr |= vhcr_el2 & (HCR_TVM | HCR_TRVM);
> +		}
> +	}
>  
>  	___activate_traps(vcpu, hcr);
>  
> -- 
> 2.30.2
> 

^ permalink raw reply	[flat|nested] 378+ messages in thread

* Re: [PATCH v6 31/64] KVM: arm64: nv: Only toggle cache for virtual EL2 when SCTLR_EL2 changes
  2022-01-28 12:18   ` Marc Zyngier
@ 2022-02-09 16:56     ` Alexandru Elisei
  -1 siblings, 0 replies; 378+ messages in thread
From: Alexandru Elisei @ 2022-02-09 16:56 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, Andre Przywara, Christoffer Dall,
	Jintack Lim, Haibo Xu, Ganapatrao Kulkarni, Chase Conklin,
	Russell King (Oracle),
	James Morse, Suzuki K Poulose, karl.heubaum, mihai.carabas,
	miguel.luis, kernel-team

Hi Marc,

On Fri, Jan 28, 2022 at 12:18:39PM +0000, Marc Zyngier wrote:
> From: Christoffer Dall <christoffer.dall@linaro.org>
> 
> So far we were flushing almost the entire universe whenever a VM would
> load/unload the SCTLR_EL1 and the two versions of that register had
> different MMU enabled settings.  This turned out to be so slow that it
> prevented forward progress for a nested VM, because a scheduler timer
> tick interrupt would always be pending when we reached the nested VM.
> 
> To avoid this problem, we consider the SCTLR_EL2 when evaluating if
> caches are on or off when entering virtual EL2 (because this is the
> value that we end up shadowing onto the hardware EL1 register).
> 
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/include/asm/kvm_mmu.h | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> index 81839e9a8a24..1b314b2a69bc 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -115,6 +115,7 @@ alternative_cb_end
>  #include <asm/cache.h>
>  #include <asm/cacheflush.h>
>  #include <asm/mmu_context.h>
> +#include <asm/kvm_emulate.h>
>  
>  void kvm_update_va_mask(struct alt_instr *alt,
>  			__le32 *origptr, __le32 *updptr, int nr_inst);
> @@ -187,7 +188,10 @@ struct kvm;
>  
>  static inline bool vcpu_has_cache_enabled(struct kvm_vcpu *vcpu)
>  {
> -	return (vcpu_read_sys_reg(vcpu, SCTLR_EL1) & 0b101) == 0b101;
> +	if (vcpu_is_el2(vcpu))
> +		return (__vcpu_sys_reg(vcpu, SCTLR_EL2) & 0b101) == 0b101;
> +	else
> +		return (vcpu_read_sys_reg(vcpu, SCTLR_EL1) & 0b101) == 0b101;

It might be more readable if KVM used defines instead of 0b101, something
like:

 static inline bool vcpu_has_cache_enabled(struct kvm_vcpu *vcpu)
 {
-       return (vcpu_read_sys_reg(vcpu, SCTLR_EL1) & 0b101) == 0b101;
+       u64 cache_bits = SCTLR_ELx_M | SCTLR_ELx_C;
+
+       if (vcpu_is_el2(vcpu))
+               return (__vcpu_sys_reg(vcpu, SCTLR_EL2) & cache_bits) == cache_bits;
+       else
+               return (vcpu_read_sys_reg(vcpu, SCTLR_EL1) & cache_bits) == cache_bits;
 }

Regardless, it is correct to use vcpu_read_sys_reg() for the SCTLR_EL1 case, as
the most recent register value could be on the CPU in the VHE case, instead of
in memory, as is always the case with the SCTLR_EL2 register:

Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>

Thanks,
Alex

>  }
>  
>  static inline void __clean_dcache_guest_page(void *va, size_t size)
> -- 
> 2.30.2
> 

^ permalink raw reply	[flat|nested] 378+ messages in thread

* Re: [PATCH v6 32/64] KVM: arm64: nv: Filter out unsupported features from ID regs
  2022-01-28 12:18   ` Marc Zyngier
@ 2022-02-09 17:33     ` Alexandru Elisei
  -1 siblings, 0 replies; 378+ messages in thread
From: Alexandru Elisei @ 2022-02-09 17:33 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, Andre Przywara, Christoffer Dall,
	Jintack Lim, Haibo Xu, Ganapatrao Kulkarni, Chase Conklin,
	Russell King (Oracle),
	James Morse, Suzuki K Poulose, karl.heubaum, mihai.carabas,
	miguel.luis, kernel-team

Hi,

On Fri, Jan 28, 2022 at 12:18:40PM +0000, Marc Zyngier wrote:
> As there is a number of features that we either can't support,
> or don't want to support right away with NV, let's add some
> basic filtering so that we don't advertise silly things to the
> EL2 guest.
> 
> Whilst we are at it, advertise ARMv8.4-TTL as well as ARMv8.5-GTG.
> 
> Reviewed-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/include/asm/kvm_nested.h |   6 ++
>  arch/arm64/kvm/nested.c             | 157 ++++++++++++++++++++++++++++
>  arch/arm64/kvm/sys_regs.c           |   4 +-
>  arch/arm64/kvm/sys_regs.h           |   2 +
>  4 files changed, 168 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
> index 047ca700163b..7d398510fd9d 100644
> --- a/arch/arm64/include/asm/kvm_nested.h
> +++ b/arch/arm64/include/asm/kvm_nested.h
> @@ -72,4 +72,10 @@ extern bool forward_traps(struct kvm_vcpu *vcpu, u64 control_bit);
>  extern bool forward_nv_traps(struct kvm_vcpu *vcpu);
>  extern bool forward_nv1_traps(struct kvm_vcpu *vcpu);
>  
> +struct sys_reg_params;
> +struct sys_reg_desc;
> +
> +void access_nested_id_reg(struct kvm_vcpu *v, struct sys_reg_params *p,
> +			  const struct sys_reg_desc *r);
> +
>  #endif /* __ARM64_KVM_NESTED_H */
> diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
> index 5e1104f8e765..254152cd791e 100644
> --- a/arch/arm64/kvm/nested.c
> +++ b/arch/arm64/kvm/nested.c
> @@ -8,6 +8,10 @@
>  #include <linux/kvm_host.h>
>  
>  #include <asm/kvm_emulate.h>
> +#include <asm/kvm_nested.h>
> +#include <asm/sysreg.h>
> +
> +#include "sys_regs.h"
>  
>  /*
>   * Inject wfx to the virtual EL2 if this is not from the virtual EL2 and
> @@ -26,3 +30,156 @@ int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe)
>  
>  	return -EINVAL;
>  }
> +
> +/*
> + * Our emulated CPU doesn't support all the possible features. For the
> + * sake of simplicity (and probably mental sanity), wipe out a number
> + * of feature bits we don't intend to support for the time being.
> + * This list should get updated as new features get added to the NV
> + * support, and new extension to the architecture.
> + */
> +void access_nested_id_reg(struct kvm_vcpu *v, struct sys_reg_params *p,
> +			  const struct sys_reg_desc *r)
> +{
> +	u32 id = sys_reg((u32)r->Op0, (u32)r->Op1,
> +			 (u32)r->CRn, (u32)r->CRm, (u32)r->Op2);
> +	u64 val, tmp;
> +
> +	if (!vcpu_has_nv(v))
> +		return;
> +
> +	val = p->regval;
> +
> +	switch (id) {
> +	case SYS_ID_AA64ISAR0_EL1:
> +		/* Support everything but O.S. and Range TLBIs */
> +		val &= ~(FEATURE(ID_AA64ISAR0_TLB)	|
> +			 GENMASK_ULL(27, 24)		|
> +			 GENMASK_ULL(3, 0));
> +		break;
> +
> +	case SYS_ID_AA64ISAR1_EL1:
> +		/* Support everything but PtrAuth and Spec Invalidation */
> +		val &= ~(GENMASK_ULL(63, 56)		|
> +			 FEATURE(ID_AA64ISAR1_SPECRES)	|
> +			 FEATURE(ID_AA64ISAR1_GPI)	|
> +			 FEATURE(ID_AA64ISAR1_GPA)	|
> +			 FEATURE(ID_AA64ISAR1_API)	|
> +			 FEATURE(ID_AA64ISAR1_APA));
> +		break;
> +
> +	case SYS_ID_AA64PFR0_EL1:
> +		/* No AMU, MPAM, S-EL2, RAS or SVE */
> +		val &= ~(GENMASK_ULL(55, 52)		|
> +			 FEATURE(ID_AA64PFR0_AMU)	|
> +			 FEATURE(ID_AA64PFR0_MPAM)	|
> +			 FEATURE(ID_AA64PFR0_SEL2)	|
> +			 FEATURE(ID_AA64PFR0_RAS)	|
> +			 FEATURE(ID_AA64PFR0_SVE)	|
> +			 FEATURE(ID_AA64PFR0_EL3)	|
> +			 FEATURE(ID_AA64PFR0_EL2));
> +		/* 64bit EL2/EL3 only */
> +		val |= FIELD_PREP(FEATURE(ID_AA64PFR0_EL2), 0b0001);
> +		val |= FIELD_PREP(FEATURE(ID_AA64PFR0_EL3), 0b0001);

Since KVM doesn't support virtual EL1 running in AArch32 mode when vcpu_has_nv()
(according to kvm_check_illegal_exception_return()), would it make sense to hide
the feature here?
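
For example, the EL1 field could be forced to AArch64-only in the same way as
the EL2/EL3 fields above (untested sketch; it assumes the FEATURE() helper
works for the ID_AA64PFR0_EL1 field as well):

		/* Sketch only: 64bit-only EL1 when NV is enabled */
		val &= ~FEATURE(ID_AA64PFR0_EL1);
		val |= FIELD_PREP(FEATURE(ID_AA64PFR0_EL1), 0b0001);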

> +		break;
> +
> +	case SYS_ID_AA64PFR1_EL1:
> +		/* Only support SSBS */
> +		val &= FEATURE(ID_AA64PFR1_SSBS);
> +		break;
> +
> +	case SYS_ID_AA64MMFR0_EL1:
> +		/* Hide ECV, FGT, ExS, Secure Memory */
> +		val &= ~(GENMASK_ULL(63, 43)			|
> +			 FEATURE(ID_AA64MMFR0_TGRAN4_2)		|
> +			 FEATURE(ID_AA64MMFR0_TGRAN16_2)	|
> +			 FEATURE(ID_AA64MMFR0_TGRAN64_2)	|
> +			 FEATURE(ID_AA64MMFR0_SNSMEM));
> +
> +		/* Disallow unsupported S2 page sizes */
> +		switch (PAGE_SIZE) {
> +		case SZ_64K:
> +			val |= FIELD_PREP(FEATURE(ID_AA64MMFR0_TGRAN16_2), 0b0001);
> +			fallthrough;
> +		case SZ_16K:
> +			val |= FIELD_PREP(FEATURE(ID_AA64MMFR0_TGRAN4_2), 0b0001);
> +			fallthrough;
> +		case SZ_4K:
> +			/* Support everything */
> +			break;
> +		}
> +		/*
> +		 * Since we can't support a guest S2 page size smaller than
> +		 * the host's own page size (due to KVM only populating its
> +		 * own S2 using the kernel's page size), advertise the
> +		 * limitation using FEAT_GTG.
> +		 */
> +		switch (PAGE_SIZE) {
> +		case SZ_4K:
> +			val |= FIELD_PREP(FEATURE(ID_AA64MMFR0_TGRAN4_2), 0b0010);
> +			fallthrough;
> +		case SZ_16K:
> +			val |= FIELD_PREP(FEATURE(ID_AA64MMFR0_TGRAN16_2), 0b0010);
> +			fallthrough;
> +		case SZ_64K:
> +			val |= FIELD_PREP(FEATURE(ID_AA64MMFR0_TGRAN64_2), 0b0010);
> +			break;
> +		}
> +		/* Cap PARange to 40bits */
> +		tmp = FIELD_GET(FEATURE(ID_AA64MMFR0_PARANGE), val);
> +		if (tmp > 0b0010) {
> +			val &= ~FEATURE(ID_AA64MMFR0_PARANGE);
> +			val |= FIELD_PREP(FEATURE(ID_AA64MMFR0_PARANGE), 0b0010);
> +		}
> +		break;
> +
> +	case SYS_ID_AA64MMFR1_EL1:
> +		val &= (FEATURE(ID_AA64MMFR1_PAN)	|
> +			FEATURE(ID_AA64MMFR1_LOR)	|
> +			FEATURE(ID_AA64MMFR1_HPD)	|
> +			FEATURE(ID_AA64MMFR1_VHE)	|
> +			FEATURE(ID_AA64MMFR1_VMIDBITS));
> +		break;
> +
> +	case SYS_ID_AA64MMFR2_EL1:
> +		val &= ~(FEATURE(ID_AA64MMFR2_EVT)	|
> +			 FEATURE(ID_AA64MMFR2_BBM)	|
> +			 FEATURE(ID_AA64MMFR2_TTL)	|
> +			 GENMASK_ULL(47, 44)		|
> +			 FEATURE(ID_AA64MMFR2_ST)	|
> +			 FEATURE(ID_AA64MMFR2_CCIDX)	|
> +			 FEATURE(ID_AA64MMFR2_LVA));
> +
> +		/* Force TTL support */
> +		val |= FIELD_PREP(FEATURE(ID_AA64MMFR2_TTL), 0b0001);
> +		break;
> +
> +	case SYS_ID_AA64DFR0_EL1:
> +		/* Only limited support for PMU, Debug, BPs and WPs */
> +		val &= (FEATURE(ID_AA64DFR0_PMSVER)	|
> +			FEATURE(ID_AA64DFR0_WRPS)	|
> +			FEATURE(ID_AA64DFR0_BRPS)	|
> +			FEATURE(ID_AA64DFR0_DEBUGVER));
> +
> +		/* Cap PMU to ARMv8.1 */
> +		tmp = FIELD_GET(FEATURE(ID_AA64DFR0_PMUVER), val);
> +		if (tmp > 0b0100) {
> +			val &= ~FEATURE(ID_AA64DFR0_PMUVER);
> +			val |= FIELD_PREP(FEATURE(ID_AA64DFR0_PMUVER), 0b0100);
> +		}
> +		/* Cap Debug to ARMv8.1 */
> +		tmp = FIELD_GET(FEATURE(ID_AA64DFR0_DEBUGVER), val);
> +		if (tmp > 0b0111) {
> +			val &= ~FEATURE(ID_AA64DFR0_DEBUGVER);
> +			val |= FIELD_PREP(FEATURE(ID_AA64DFR0_DEBUGVER), 0b0111);
> +		}
> +		break;
> +
> +	default:
> +		/* Unknown register, just wipe it clean */
> +		val = 0;
> +		break;
> +	}
> +
> +	p->regval = val;
> +}
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 3e1f37c507a8..ebcdf2714b73 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -1394,8 +1394,10 @@ static bool access_id_reg(struct kvm_vcpu *vcpu,
>  			  const struct sys_reg_desc *r)
>  {
>  	bool raz = sysreg_visible_as_raz(vcpu, r);
> +	bool ret = __access_id_reg(vcpu, p, r, raz);
>  
> -	return __access_id_reg(vcpu, p, r, raz);
> +	access_nested_id_reg(vcpu, p, r);

Looks like when the guest tries to *write* to an ID reg, __access_id_reg() above
returns false. It also looks like access_nested_id_reg() assumes that the access
is a read. Shouldn't the call to access_nested_id_reg() be gated by ret == true?
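
I.e. something along these lines (untested):

	bool raz = sysreg_visible_as_raz(vcpu, r);
	bool ret = __access_id_reg(vcpu, p, r, raz);

	if (ret)
		access_nested_id_reg(vcpu, p, r);

	return ret;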

Thanks,
Alex

> +	return ret;
>  }
>  
>  static bool access_raz_id_reg(struct kvm_vcpu *vcpu,
> diff --git a/arch/arm64/kvm/sys_regs.h b/arch/arm64/kvm/sys_regs.h
> index 850629f083a3..b82683224250 100644
> --- a/arch/arm64/kvm/sys_regs.h
> +++ b/arch/arm64/kvm/sys_regs.h
> @@ -211,4 +211,6 @@ const struct sys_reg_desc *find_reg_by_id(u64 id,
>  	CRn(sys_reg_CRn(reg)), CRm(sys_reg_CRm(reg)),	\
>  	Op2(sys_reg_Op2(reg))
>  
> +#define FEATURE(x)	(GENMASK_ULL(x##_SHIFT + 3, x##_SHIFT))
> +
>  #endif /* __ARM64_KVM_SYS_REGS_LOCAL_H__ */
> -- 
> 2.30.2
> 

^ permalink raw reply	[flat|nested] 378+ messages in thread

* Re: [PATCH v6 32/64] KVM: arm64: nv: Filter out unsupported features from ID regs
@ 2022-02-09 17:33     ` Alexandru Elisei
  0 siblings, 0 replies; 378+ messages in thread
From: Alexandru Elisei @ 2022-02-09 17:33 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kernel-team, kvm, Andre Przywara, Christoffer Dall,
	Chase Conklin, kvmarm, mihai.carabas, Ganapatrao Kulkarni,
	Russell King (Oracle),
	linux-arm-kernel

Hi,

On Fri, Jan 28, 2022 at 12:18:40PM +0000, Marc Zyngier wrote:
> As there is a number of features that we either can't support,
> or don't want to support right away with NV, let's add some
> basic filtering so that we don't advertize silly things to the
> EL2 guest.
> 
> Whilst we are at it, advertize ARMv8.4-TTL as well as ARMv8.5-GTG.
> 
> Reviewed-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/include/asm/kvm_nested.h |   6 ++
>  arch/arm64/kvm/nested.c             | 157 ++++++++++++++++++++++++++++
>  arch/arm64/kvm/sys_regs.c           |   4 +-
>  arch/arm64/kvm/sys_regs.h           |   2 +
>  4 files changed, 168 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
> index 047ca700163b..7d398510fd9d 100644
> --- a/arch/arm64/include/asm/kvm_nested.h
> +++ b/arch/arm64/include/asm/kvm_nested.h
> @@ -72,4 +72,10 @@ extern bool forward_traps(struct kvm_vcpu *vcpu, u64 control_bit);
>  extern bool forward_nv_traps(struct kvm_vcpu *vcpu);
>  extern bool forward_nv1_traps(struct kvm_vcpu *vcpu);
>  
> +struct sys_reg_params;
> +struct sys_reg_desc;
> +
> +void access_nested_id_reg(struct kvm_vcpu *v, struct sys_reg_params *p,
> +			  const struct sys_reg_desc *r);
> +
>  #endif /* __ARM64_KVM_NESTED_H */
> diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
> index 5e1104f8e765..254152cd791e 100644
> --- a/arch/arm64/kvm/nested.c
> +++ b/arch/arm64/kvm/nested.c
> @@ -8,6 +8,10 @@
>  #include <linux/kvm_host.h>
>  
>  #include <asm/kvm_emulate.h>
> +#include <asm/kvm_nested.h>
> +#include <asm/sysreg.h>
> +
> +#include "sys_regs.h"
>  
>  /*
>   * Inject wfx to the virtual EL2 if this is not from the virtual EL2 and
> @@ -26,3 +30,156 @@ int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe)
>  
>  	return -EINVAL;
>  }
> +
> +/*
> + * Our emulated CPU doesn't support all the possible features. For the
> + * sake of simplicity (and probably mental sanity), wipe out a number
> + * of feature bits we don't intend to support for the time being.
> + * This list should get updated as new features get added to the NV
> + * support, and new extension to the architecture.
> + */
> +void access_nested_id_reg(struct kvm_vcpu *v, struct sys_reg_params *p,
> +			  const struct sys_reg_desc *r)
> +{
> +	u32 id = sys_reg((u32)r->Op0, (u32)r->Op1,
> +			 (u32)r->CRn, (u32)r->CRm, (u32)r->Op2);
> +	u64 val, tmp;
> +
> +	if (!vcpu_has_nv(v))
> +		return;
> +
> +	val = p->regval;
> +
> +	switch (id) {
> +	case SYS_ID_AA64ISAR0_EL1:
> +		/* Support everything but O.S. and Range TLBIs */
> +		val &= ~(FEATURE(ID_AA64ISAR0_TLB)	|
> +			 GENMASK_ULL(27, 24)		|
> +			 GENMASK_ULL(3, 0));
> +		break;
> +
> +	case SYS_ID_AA64ISAR1_EL1:
> +		/* Support everything but PtrAuth and Spec Invalidation */
> +		val &= ~(GENMASK_ULL(63, 56)		|
> +			 FEATURE(ID_AA64ISAR1_SPECRES)	|
> +			 FEATURE(ID_AA64ISAR1_GPI)	|
> +			 FEATURE(ID_AA64ISAR1_GPA)	|
> +			 FEATURE(ID_AA64ISAR1_API)	|
> +			 FEATURE(ID_AA64ISAR1_APA));
> +		break;
> +
> +	case SYS_ID_AA64PFR0_EL1:
> +		/* No AMU, MPAM, S-EL2, RAS or SVE */
> +		val &= ~(GENMASK_ULL(55, 52)		|
> +			 FEATURE(ID_AA64PFR0_AMU)	|
> +			 FEATURE(ID_AA64PFR0_MPAM)	|
> +			 FEATURE(ID_AA64PFR0_SEL2)	|
> +			 FEATURE(ID_AA64PFR0_RAS)	|
> +			 FEATURE(ID_AA64PFR0_SVE)	|
> +			 FEATURE(ID_AA64PFR0_EL3)	|
> +			 FEATURE(ID_AA64PFR0_EL2));
> +		/* 64bit EL2/EL3 only */
> +		val |= FIELD_PREP(FEATURE(ID_AA64PFR0_EL2), 0b0001);
> +		val |= FIELD_PREP(FEATURE(ID_AA64PFR0_EL3), 0b0001);

Since KVM doesn't support virtual EL1 running in AArch32 mode when vcpu_has_nv()
(according to kvm_check_illegal_exception_return()), would it make sense to hide
the feature here?

> +		break;
> +
> +	case SYS_ID_AA64PFR1_EL1:
> +		/* Only support SSBS */
> +		val &= FEATURE(ID_AA64PFR1_SSBS);
> +		break;
> +
> +	case SYS_ID_AA64MMFR0_EL1:
> +		/* Hide ECV, FGT, ExS, Secure Memory */
> +		val &= ~(GENMASK_ULL(63, 43)			|
> +			 FEATURE(ID_AA64MMFR0_TGRAN4_2)		|
> +			 FEATURE(ID_AA64MMFR0_TGRAN16_2)	|
> +			 FEATURE(ID_AA64MMFR0_TGRAN64_2)	|
> +			 FEATURE(ID_AA64MMFR0_SNSMEM));
> +
> +		/* Disallow unsupported S2 page sizes */
> +		switch (PAGE_SIZE) {
> +		case SZ_64K:
> +			val |= FIELD_PREP(FEATURE(ID_AA64MMFR0_TGRAN16_2), 0b0001);
> +			fallthrough;
> +		case SZ_16K:
> +			val |= FIELD_PREP(FEATURE(ID_AA64MMFR0_TGRAN4_2), 0b0001);
> +			fallthrough;
> +		case SZ_4K:
> +			/* Support everything */
> +			break;
> +		}
> +		/*
> +		 * Since we can't support a guest S2 page size smaller than
> +		 * the host's own page size (due to KVM only populating its
> +		 * own S2 using the kernel's page size), advertise the
> +		 * limitation using FEAT_GTG.
> +		 */
> +		switch (PAGE_SIZE) {
> +		case SZ_4K:
> +			val |= FIELD_PREP(FEATURE(ID_AA64MMFR0_TGRAN4_2), 0b0010);
> +			fallthrough;
> +		case SZ_16K:
> +			val |= FIELD_PREP(FEATURE(ID_AA64MMFR0_TGRAN16_2), 0b0010);
> +			fallthrough;
> +		case SZ_64K:
> +			val |= FIELD_PREP(FEATURE(ID_AA64MMFR0_TGRAN64_2), 0b0010);
> +			break;
> +		}
> +		/* Cap PARange to 40bits */
> +		tmp = FIELD_GET(FEATURE(ID_AA64MMFR0_PARANGE), val);
> +		if (tmp > 0b0010) {
> +			val &= ~FEATURE(ID_AA64MMFR0_PARANGE);
> +			val |= FIELD_PREP(FEATURE(ID_AA64MMFR0_PARANGE), 0b0010);
> +		}
> +		break;
> +
> +	case SYS_ID_AA64MMFR1_EL1:
> +		val &= (FEATURE(ID_AA64MMFR1_PAN)	|
> +			FEATURE(ID_AA64MMFR1_LOR)	|
> +			FEATURE(ID_AA64MMFR1_HPD)	|
> +			FEATURE(ID_AA64MMFR1_VHE)	|
> +			FEATURE(ID_AA64MMFR1_VMIDBITS));
> +		break;
> +
> +	case SYS_ID_AA64MMFR2_EL1:
> +		val &= ~(FEATURE(ID_AA64MMFR2_EVT)	|
> +			 FEATURE(ID_AA64MMFR2_BBM)	|
> +			 FEATURE(ID_AA64MMFR2_TTL)	|
> +			 GENMASK_ULL(47, 44)		|
> +			 FEATURE(ID_AA64MMFR2_ST)	|
> +			 FEATURE(ID_AA64MMFR2_CCIDX)	|
> +			 FEATURE(ID_AA64MMFR2_LVA));
> +
> +		/* Force TTL support */
> +		val |= FIELD_PREP(FEATURE(ID_AA64MMFR2_TTL), 0b0001);
> +		break;
> +
> +	case SYS_ID_AA64DFR0_EL1:
> +		/* Only limited support for PMU, Debug, BPs and WPs */
> +		val &= (FEATURE(ID_AA64DFR0_PMSVER)	|
> +			FEATURE(ID_AA64DFR0_WRPS)	|
> +			FEATURE(ID_AA64DFR0_BRPS)	|
> +			FEATURE(ID_AA64DFR0_DEBUGVER));
> +
> +		/* Cap PMU to ARMv8.1 */
> +		tmp = FIELD_GET(FEATURE(ID_AA64DFR0_PMUVER), val);
> +		if (tmp > 0b0100) {
> +			val &= ~FEATURE(ID_AA64DFR0_PMUVER);
> +			val |= FIELD_PREP(FEATURE(ID_AA64DFR0_PMUVER), 0b0100);
> +		}
> +		/* Cap Debug to ARMv8.1 */
> +		tmp = FIELD_GET(FEATURE(ID_AA64DFR0_DEBUGVER), val);
> +		if (tmp > 0b0111) {
> +			val &= ~FEATURE(ID_AA64DFR0_DEBUGVER);
> +			val |= FIELD_PREP(FEATURE(ID_AA64DFR0_DEBUGVER), 0b0111);
> +		}
> +		break;
> +
> +	default:
> +		/* Unknown register, just wipe it clean */
> +		val = 0;
> +		break;
> +	}
> +
> +	p->regval = val;
> +}
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 3e1f37c507a8..ebcdf2714b73 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -1394,8 +1394,10 @@ static bool access_id_reg(struct kvm_vcpu *vcpu,
>  			  const struct sys_reg_desc *r)
>  {
>  	bool raz = sysreg_visible_as_raz(vcpu, r);
> +	bool ret = __access_id_reg(vcpu, p, r, raz);
>  
> -	return __access_id_reg(vcpu, p, r, raz);
> +	access_nested_id_reg(vcpu, p, r);

Looks like when the guest tries to *write* to an ID reg, __access_id_reg() above
returns false. It also looks like access_nested_id_reg() assumes that the access
is a read. Shouldn't the call to access_nested_id_reg() be gated by ret == true?

Thanks,
Alex

> +	return ret;
>  }
>  
>  static bool access_raz_id_reg(struct kvm_vcpu *vcpu,
> diff --git a/arch/arm64/kvm/sys_regs.h b/arch/arm64/kvm/sys_regs.h
> index 850629f083a3..b82683224250 100644
> --- a/arch/arm64/kvm/sys_regs.h
> +++ b/arch/arm64/kvm/sys_regs.h
> @@ -211,4 +211,6 @@ const struct sys_reg_desc *find_reg_by_id(u64 id,
>  	CRn(sys_reg_CRn(reg)), CRm(sys_reg_CRm(reg)),	\
>  	Op2(sys_reg_Op2(reg))
>  
> +#define FEATURE(x)	(GENMASK_ULL(x##_SHIFT + 3, x##_SHIFT))
> +
>  #endif /* __ARM64_KVM_SYS_REGS_LOCAL_H__ */
> -- 
> 2.30.2
> 
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 378+ messages in thread

* Re: [PATCH v6 32/64] KVM: arm64: nv: Filter out unsupported features from ID regs
@ 2022-02-09 17:33     ` Alexandru Elisei
  0 siblings, 0 replies; 378+ messages in thread
From: Alexandru Elisei @ 2022-02-09 17:33 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, Andre Przywara, Christoffer Dall,
	Jintack Lim, Haibo Xu, Ganapatrao Kulkarni, Chase Conklin,
	Russell King (Oracle),
	James Morse, Suzuki K Poulose, karl.heubaum, mihai.carabas,
	miguel.luis, kernel-team

Hi,

On Fri, Jan 28, 2022 at 12:18:40PM +0000, Marc Zyngier wrote:
> As there is a number of features that we either can't support,
> or don't want to support right away with NV, let's add some
> basic filtering so that we don't advertize silly things to the
> EL2 guest.
> 
> Whilst we are at it, advertize ARMv8.4-TTL as well as ARMv8.5-GTG.
> 
> Reviewed-by: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/include/asm/kvm_nested.h |   6 ++
>  arch/arm64/kvm/nested.c             | 157 ++++++++++++++++++++++++++++
>  arch/arm64/kvm/sys_regs.c           |   4 +-
>  arch/arm64/kvm/sys_regs.h           |   2 +
>  4 files changed, 168 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
> index 047ca700163b..7d398510fd9d 100644
> --- a/arch/arm64/include/asm/kvm_nested.h
> +++ b/arch/arm64/include/asm/kvm_nested.h
> @@ -72,4 +72,10 @@ extern bool forward_traps(struct kvm_vcpu *vcpu, u64 control_bit);
>  extern bool forward_nv_traps(struct kvm_vcpu *vcpu);
>  extern bool forward_nv1_traps(struct kvm_vcpu *vcpu);
>  
> +struct sys_reg_params;
> +struct sys_reg_desc;
> +
> +void access_nested_id_reg(struct kvm_vcpu *v, struct sys_reg_params *p,
> +			  const struct sys_reg_desc *r);
> +
>  #endif /* __ARM64_KVM_NESTED_H */
> diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
> index 5e1104f8e765..254152cd791e 100644
> --- a/arch/arm64/kvm/nested.c
> +++ b/arch/arm64/kvm/nested.c
> @@ -8,6 +8,10 @@
>  #include <linux/kvm_host.h>
>  
>  #include <asm/kvm_emulate.h>
> +#include <asm/kvm_nested.h>
> +#include <asm/sysreg.h>
> +
> +#include "sys_regs.h"
>  
>  /*
>   * Inject wfx to the virtual EL2 if this is not from the virtual EL2 and
> @@ -26,3 +30,156 @@ int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe)
>  
>  	return -EINVAL;
>  }
> +
> +/*
> + * Our emulated CPU doesn't support all the possible features. For the
> + * sake of simplicity (and probably mental sanity), wipe out a number
> + * of feature bits we don't intend to support for the time being.
> + * This list should get updated as new features get added to the NV
> + * support, and new extension to the architecture.
> + */
> +void access_nested_id_reg(struct kvm_vcpu *v, struct sys_reg_params *p,
> +			  const struct sys_reg_desc *r)
> +{
> +	u32 id = sys_reg((u32)r->Op0, (u32)r->Op1,
> +			 (u32)r->CRn, (u32)r->CRm, (u32)r->Op2);
> +	u64 val, tmp;
> +
> +	if (!vcpu_has_nv(v))
> +		return;
> +
> +	val = p->regval;
> +
> +	switch (id) {
> +	case SYS_ID_AA64ISAR0_EL1:
> +		/* Support everything but O.S. and Range TLBIs */
> +		val &= ~(FEATURE(ID_AA64ISAR0_TLB)	|
> +			 GENMASK_ULL(27, 24)		|
> +			 GENMASK_ULL(3, 0));
> +		break;
> +
> +	case SYS_ID_AA64ISAR1_EL1:
> +		/* Support everything but PtrAuth and Spec Invalidation */
> +		val &= ~(GENMASK_ULL(63, 56)		|
> +			 FEATURE(ID_AA64ISAR1_SPECRES)	|
> +			 FEATURE(ID_AA64ISAR1_GPI)	|
> +			 FEATURE(ID_AA64ISAR1_GPA)	|
> +			 FEATURE(ID_AA64ISAR1_API)	|
> +			 FEATURE(ID_AA64ISAR1_APA));
> +		break;
> +
> +	case SYS_ID_AA64PFR0_EL1:
> +		/* No AMU, MPAM, S-EL2, RAS or SVE */
> +		val &= ~(GENMASK_ULL(55, 52)		|
> +			 FEATURE(ID_AA64PFR0_AMU)	|
> +			 FEATURE(ID_AA64PFR0_MPAM)	|
> +			 FEATURE(ID_AA64PFR0_SEL2)	|
> +			 FEATURE(ID_AA64PFR0_RAS)	|
> +			 FEATURE(ID_AA64PFR0_SVE)	|
> +			 FEATURE(ID_AA64PFR0_EL3)	|
> +			 FEATURE(ID_AA64PFR0_EL2));
> +		/* 64bit EL2/EL3 only */
> +		val |= FIELD_PREP(FEATURE(ID_AA64PFR0_EL2), 0b0001);
> +		val |= FIELD_PREP(FEATURE(ID_AA64PFR0_EL3), 0b0001);

Since KVM doesn't support virtual EL1 running in AArch32 mode when vcpu_has_nv()
(according to kvm_check_illegal_exception_return()), would it make sense to hide
the feature here?

> +		break;
> +
> +	case SYS_ID_AA64PFR1_EL1:
> +		/* Only support SSBS */
> +		val &= FEATURE(ID_AA64PFR1_SSBS);
> +		break;
> +
> +	case SYS_ID_AA64MMFR0_EL1:
> +		/* Hide ECV, FGT, ExS, Secure Memory */
> +		val &= ~(GENMASK_ULL(63, 43)			|
> +			 FEATURE(ID_AA64MMFR0_TGRAN4_2)		|
> +			 FEATURE(ID_AA64MMFR0_TGRAN16_2)	|
> +			 FEATURE(ID_AA64MMFR0_TGRAN64_2)	|
> +			 FEATURE(ID_AA64MMFR0_SNSMEM));
> +
> +		/* Disallow unsupported S2 page sizes */
> +		switch (PAGE_SIZE) {
> +		case SZ_64K:
> +			val |= FIELD_PREP(FEATURE(ID_AA64MMFR0_TGRAN16_2), 0b0001);
> +			fallthrough;
> +		case SZ_16K:
> +			val |= FIELD_PREP(FEATURE(ID_AA64MMFR0_TGRAN4_2), 0b0001);
> +			fallthrough;
> +		case SZ_4K:
> +			/* Support everything */
> +			break;
> +		}
> +		/*
> +		 * Since we can't support a guest S2 page size smaller than
> +		 * the host's own page size (due to KVM only populating its
> +		 * own S2 using the kernel's page size), advertise the
> +		 * limitation using FEAT_GTG.
> +		 */
> +		switch (PAGE_SIZE) {
> +		case SZ_4K:
> +			val |= FIELD_PREP(FEATURE(ID_AA64MMFR0_TGRAN4_2), 0b0010);
> +			fallthrough;
> +		case SZ_16K:
> +			val |= FIELD_PREP(FEATURE(ID_AA64MMFR0_TGRAN16_2), 0b0010);
> +			fallthrough;
> +		case SZ_64K:
> +			val |= FIELD_PREP(FEATURE(ID_AA64MMFR0_TGRAN64_2), 0b0010);
> +			break;
> +		}
> +		/* Cap PARange to 40bits */
> +		tmp = FIELD_GET(FEATURE(ID_AA64MMFR0_PARANGE), val);
> +		if (tmp > 0b0010) {
> +			val &= ~FEATURE(ID_AA64MMFR0_PARANGE);
> +			val |= FIELD_PREP(FEATURE(ID_AA64MMFR0_PARANGE), 0b0010);
> +		}
> +		break;
> +
> +	case SYS_ID_AA64MMFR1_EL1:
> +		val &= (FEATURE(ID_AA64MMFR1_PAN)	|
> +			FEATURE(ID_AA64MMFR1_LOR)	|
> +			FEATURE(ID_AA64MMFR1_HPD)	|
> +			FEATURE(ID_AA64MMFR1_VHE)	|
> +			FEATURE(ID_AA64MMFR1_VMIDBITS));
> +		break;
> +
> +	case SYS_ID_AA64MMFR2_EL1:
> +		val &= ~(FEATURE(ID_AA64MMFR2_EVT)	|
> +			 FEATURE(ID_AA64MMFR2_BBM)	|
> +			 FEATURE(ID_AA64MMFR2_TTL)	|
> +			 GENMASK_ULL(47, 44)		|
> +			 FEATURE(ID_AA64MMFR2_ST)	|
> +			 FEATURE(ID_AA64MMFR2_CCIDX)	|
> +			 FEATURE(ID_AA64MMFR2_LVA));
> +
> +		/* Force TTL support */
> +		val |= FIELD_PREP(FEATURE(ID_AA64MMFR2_TTL), 0b0001);
> +		break;
> +
> +	case SYS_ID_AA64DFR0_EL1:
> +		/* Only limited support for PMU, Debug, BPs and WPs */
> +		val &= (FEATURE(ID_AA64DFR0_PMSVER)	|
> +			FEATURE(ID_AA64DFR0_WRPS)	|
> +			FEATURE(ID_AA64DFR0_BRPS)	|
> +			FEATURE(ID_AA64DFR0_DEBUGVER));
> +
> +		/* Cap PMU to ARMv8.1 */
> +		tmp = FIELD_GET(FEATURE(ID_AA64DFR0_PMUVER), val);
> +		if (tmp > 0b0100) {
> +			val &= ~FEATURE(ID_AA64DFR0_PMUVER);
> +			val |= FIELD_PREP(FEATURE(ID_AA64DFR0_PMUVER), 0b0100);
> +		}
> +		/* Cap Debug to ARMv8.1 */
> +		tmp = FIELD_GET(FEATURE(ID_AA64DFR0_DEBUGVER), val);
> +		if (tmp > 0b0111) {
> +			val &= ~FEATURE(ID_AA64DFR0_DEBUGVER);
> +			val |= FIELD_PREP(FEATURE(ID_AA64DFR0_DEBUGVER), 0b0111);
> +		}
> +		break;
> +
> +	default:
> +		/* Unknown register, just wipe it clean */
> +		val = 0;
> +		break;
> +	}
> +
> +	p->regval = val;
> +}
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 3e1f37c507a8..ebcdf2714b73 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -1394,8 +1394,10 @@ static bool access_id_reg(struct kvm_vcpu *vcpu,
>  			  const struct sys_reg_desc *r)
>  {
>  	bool raz = sysreg_visible_as_raz(vcpu, r);
> +	bool ret = __access_id_reg(vcpu, p, r, raz);
>  
> -	return __access_id_reg(vcpu, p, r, raz);
> +	access_nested_id_reg(vcpu, p, r);

Looks like when the guest tries to *write* to an ID reg, __access_id_reg() above
returns false. It also looks like access_nested_id_reg() assumes that the access
is a read. Shouldn't the call to access_nested_id_reg() be gated by ret == true?
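
I.e. something along these lines (untested sketch):

	if (ret)
		access_nested_id_reg(vcpu, p, r);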

Thanks,
Alex

> +	return ret;
>  }
>  
>  static bool access_raz_id_reg(struct kvm_vcpu *vcpu,
> diff --git a/arch/arm64/kvm/sys_regs.h b/arch/arm64/kvm/sys_regs.h
> index 850629f083a3..b82683224250 100644
> --- a/arch/arm64/kvm/sys_regs.h
> +++ b/arch/arm64/kvm/sys_regs.h
> @@ -211,4 +211,6 @@ const struct sys_reg_desc *find_reg_by_id(u64 id,
>  	CRn(sys_reg_CRn(reg)), CRm(sys_reg_CRm(reg)),	\
>  	Op2(sys_reg_Op2(reg))
>  
> +#define FEATURE(x)	(GENMASK_ULL(x##_SHIFT + 3, x##_SHIFT))
> +
>  #endif /* __ARM64_KVM_SYS_REGS_LOCAL_H__ */
> -- 
> 2.30.2
> 


^ permalink raw reply	[flat|nested] 378+ messages in thread

* Re: [PATCH v6 05/64] KVM: arm64: nv: Add EL2 system registers to vcpu context
  2022-01-28 12:18   ` Marc Zyngier
  (?)
@ 2022-02-11 16:35     ` Miguel Luis
  -1 siblings, 0 replies; 378+ messages in thread
From: Miguel Luis @ 2022-02-11 16:35 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, Andre Przywara, Christoffer Dall,
	Jintack Lim, Haibo Xu, Ganapatrao Kulkarni, Chase Conklin,
	Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, Karl Heubaum,
	Mihai Carabas, kernel-team

Hi Marc,

> On 28 Jan 2022, at 11:18, Marc Zyngier <maz@kernel.org> wrote:
> 
> Add the minimal set of EL2 system registers to the vcpu context.
> Nothing uses them just yet.
> 
> Reviewed-by: Andre Przywara <andre.przywara@arm.com>
> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
> arch/arm64/include/asm/kvm_host.h | 33 ++++++++++++++++++++++++++++++-
> 1 file changed, 32 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 115e0e2caf9a..15f690c27baf 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -220,12 +220,43 @@ enum vcpu_sysreg {
> 	TFSR_EL1,	/* Tag Fault Status Register (EL1) */
> 	TFSRE0_EL1,	/* Tag Fault Status Register (EL0) */
> 
> -	/* 32bit specific registers. Keep them at the end of the range */
> +	/* 32bit specific registers. */
> 	DACR32_EL2,	/* Domain Access Control Register */
> 	IFSR32_EL2,	/* Instruction Fault Status Register */
> 	FPEXC32_EL2,	/* Floating-Point Exception Control Register */
> 	DBGVCR32_EL2,	/* Debug Vector Catch Register */
> 
> +	/* EL2 registers */
> +	VPIDR_EL2,	/* Virtualization Processor ID Register */
> +	VMPIDR_EL2,	/* Virtualization Multiprocessor ID Register */
> +	SCTLR_EL2,	/* System Control Register (EL2) */
> +	ACTLR_EL2,	/* Auxiliary Control Register (EL2) */
> +	HCR_EL2,	/* Hypervisor Configuration Register */
> +	MDCR_EL2,	/* Monitor Debug Configuration Register (EL2) */
> +	CPTR_EL2,	/* Architectural Feature Trap Register (EL2) */
> +	HSTR_EL2,	/* Hypervisor System Trap Register */
> +	HACR_EL2,	/* Hypervisor Auxiliary Control Register */
> +	TTBR0_EL2,	/* Translation Table Base Register 0 (EL2) */
> +	TTBR1_EL2,	/* Translation Table Base Register 1 (EL2) */
> +	TCR_EL2,	/* Translation Control Register (EL2) */
> +	VTTBR_EL2,	/* Virtualization Translation Table Base Register */
> +	VTCR_EL2,	/* Virtualization Translation Control Register */
> +	SPSR_EL2,	/* EL2 saved program status register */
> +	ELR_EL2,	/* EL2 exception link register */
> +	AFSR0_EL2,	/* Auxiliary Fault Status Register 0 (EL2) */
> +	AFSR1_EL2,	/* Auxiliary Fault Status Register 1 (EL2) */
> +	ESR_EL2,	/* Exception Syndrome Register (EL2) */
> +	FAR_EL2,	/* Hypervisor IPA Fault Address Register */

The comment for the FAR_EL2 register seems incorrect. As per D13.2.41, FAR_EL2
is the Fault Address Register (EL2); the "Hypervisor IPA Fault Address Register"
is HPFAR_EL2.
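
Presumably the comment should simply read:

	FAR_EL2,	/* Fault Address Register (EL2) */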

Miguel


^ permalink raw reply	[flat|nested] 378+ messages in thread

* Re: [PATCH v6 06/64] KVM: arm64: nv: Add nested virt VCPU primitives for vEL2 VCPU state
  2022-01-28 12:18   ` Marc Zyngier
  (?)
@ 2022-02-14 12:39     ` Miguel Luis
  -1 siblings, 0 replies; 378+ messages in thread
From: Miguel Luis @ 2022-02-14 12:39 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, Andre Przywara, Christoffer Dall,
	Jintack Lim, Haibo Xu, Ganapatrao Kulkarni, Chase Conklin,
	Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, Karl Heubaum,
	Mihai Carabas, kernel-team

Hi Marc,

> On 28 Jan 2022, at 11:18, Marc Zyngier <maz@kernel.org> wrote:
> 
> From: Christoffer Dall <christoffer.dall@arm.com>
> 
> When running a nested hypervisor we commonly have to figure out if
> the VCPU mode is running in the context of a guest hypervisor or guest
> guest, or just a normal guest.
> 
> Add convenient primitives for this.
> 
> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
> arch/arm64/include/asm/kvm_emulate.h | 53 ++++++++++++++++++++++++++++
> 1 file changed, 53 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> index d62405ce3e6d..ea9a130c4b6a 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -178,6 +178,59 @@ static __always_inline void vcpu_set_reg(struct kvm_vcpu *vcpu, u8 reg_num,
> 		vcpu_gp_regs(vcpu)->regs[reg_num] = val;
> }
> 
> +static inline bool vcpu_is_el2_ctxt(const struct kvm_cpu_context *ctxt)
> +{
> +	switch (ctxt->regs.pstate & (PSR_MODE32_BIT | PSR_MODE_MASK)) {
> +	case PSR_MODE_EL2h:
> +	case PSR_MODE_EL2t:
> +		return true;
> +	default:
> +		return false;
> +	}
> +}

PSR_MODE_EL2{h,t} only occupy the least significant nibble, so why include PSR_MODE32_BIT in the condition?

For the scope of this function as is, may I suggest:

	switch (ctxt->regs.pstate & PSR_MODE_MASK) {

which should be sufficient to check whether the context is EL2.

Thank you.

	Miguel

> +
> +static inline bool vcpu_is_el2(const struct kvm_vcpu *vcpu)
> +{
> +	return vcpu_is_el2_ctxt(&vcpu->arch.ctxt);
> +}
> +
> +static inline bool __vcpu_el2_e2h_is_set(const struct kvm_cpu_context *ctxt)
> +{
> +	return ctxt_sys_reg(ctxt, HCR_EL2) & HCR_E2H;
> +}
> +
> +static inline bool vcpu_el2_e2h_is_set(const struct kvm_vcpu *vcpu)
> +{
> +	return __vcpu_el2_e2h_is_set(&vcpu->arch.ctxt);
> +}
> +
> +static inline bool __vcpu_el2_tge_is_set(const struct kvm_cpu_context *ctxt)
> +{
> +	return ctxt_sys_reg(ctxt, HCR_EL2) & HCR_TGE;
> +}
> +
> +static inline bool vcpu_el2_tge_is_set(const struct kvm_vcpu *vcpu)
> +{
> +	return __vcpu_el2_tge_is_set(&vcpu->arch.ctxt);
> +}
> +
> +static inline bool __is_hyp_ctxt(const struct kvm_cpu_context *ctxt)
> +{
> +	/*
> +	 * We are in a hypervisor context if the vcpu mode is EL2 or
> +	 * E2H and TGE bits are set. The latter means we are in the user space
> +	 * of the VHE kernel. ARMv8.1 ARM describes this as 'InHost'
> +	 */
> +	return vcpu_is_el2_ctxt(ctxt) ||
> +		(__vcpu_el2_e2h_is_set(ctxt) && __vcpu_el2_tge_is_set(ctxt)) ||
> +		WARN_ON(__vcpu_el2_tge_is_set(ctxt));
> +}
> +
> +static inline bool is_hyp_ctxt(const struct kvm_vcpu *vcpu)
> +{
> +	return __is_hyp_ctxt(&vcpu->arch.ctxt);
> +}
> +
> /*
>  * The layout of SPSR for an AArch32 state is different when observed from an
>  * AArch64 SPSR_ELx or an AArch32 SPSR_*. This function generates the AArch32
> -- 
> 2.30.2
> 


^ permalink raw reply	[flat|nested] 378+ messages in thread

* Re: [PATCH v6 06/64] KVM: arm64: nv: Add nested virt VCPU primitives for vEL2 VCPU state
  2022-02-14 12:39     ` Miguel Luis
  (?)
@ 2022-02-14 14:20       ` Marc Zyngier
  -1 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-02-14 14:20 UTC (permalink / raw)
  To: Miguel Luis
  Cc: linux-arm-kernel, kvmarm, kvm, Andre Przywara, Christoffer Dall,
	Jintack Lim, Haibo Xu, Ganapatrao Kulkarni, Chase Conklin,
	Russell King (Oracle),
	James Morse, Suzuki K Poulose, Alexandru Elisei, Karl Heubaum,
	Mihai Carabas, kernel-team

On 2022-02-14 12:39, Miguel Luis wrote:
> Hi Marc,
> 
>> On 28 Jan 2022, at 11:18, Marc Zyngier <maz@kernel.org> wrote:
>> 
>> From: Christoffer Dall <christoffer.dall@arm.com>
>> 
>> When running a nested hypervisor we commonly have to figure out if
>> the VCPU mode is running in the context of a guest hypervisor or guest
>> guest, or just a normal guest.
>> 
>> Add convenient primitives for this.
>> 
>> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
>> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
>> Signed-off-by: Marc Zyngier <maz@kernel.org>
>> ---
>> arch/arm64/include/asm/kvm_emulate.h | 53 ++++++++++++++++++++++++++++
>> 1 file changed, 53 insertions(+)
>> 
>> diff --git a/arch/arm64/include/asm/kvm_emulate.h 
>> b/arch/arm64/include/asm/kvm_emulate.h
>> index d62405ce3e6d..ea9a130c4b6a 100644
>> --- a/arch/arm64/include/asm/kvm_emulate.h
>> +++ b/arch/arm64/include/asm/kvm_emulate.h
>> @@ -178,6 +178,59 @@ static __always_inline void vcpu_set_reg(struct 
>> kvm_vcpu *vcpu, u8 reg_num,
>> 		vcpu_gp_regs(vcpu)->regs[reg_num] = val;
>> }
>> 
>> +static inline bool vcpu_is_el2_ctxt(const struct kvm_cpu_context 
>> *ctxt)
>> +{
>> +	switch (ctxt->regs.pstate & (PSR_MODE32_BIT | PSR_MODE_MASK)) {
>> +	case PSR_MODE_EL2h:
>> +	case PSR_MODE_EL2t:
>> +		return true;
>> +	default:
>> +		return false;
>> +	}
>> +}
> 
> PSR_MODE_EL2{h,t} only occupy the least significant nibble, so why
> include PSR_MODE32_BIT in the condition?

Because that's part of the M bits in SPSR, and this has to be
valid on any code path, no matter what execution state the
guest is in. You can't evaluate the M[3:0] on their own *unless*
you have already checked that M[4] is 0 (an AArch32 guest would
have M[4]==1).

Yes, we are so far lucky that AArch32 and AArch64 don't overlap
in their values of SPSR_EL2.M[3:0]. We may run out of luck at
some point.
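
(For illustration, with the current encodings: PSR_MODE_EL2t/EL2h are
0b01000/0b01001, while the AArch32 modes all have M[4]==1 and use M[3:0]
values such as 0b0000 (USR), 0b0001 (FIQ), 0b0011 (SVC), 0b1010 (HYP) or
0b1111 (SYS) -- none of them 0b100x today, hence the "luck".)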

> 
> For the scope of this function as is, may I suggest:
> 
> 	switch (ctxt->regs.pstate & PSR_MODE_MASK) {
> 
> which should be sufficient to check whether the context is EL2.

I don't think this is wise. It makes the code more fragile,
and harder to reason about.

Thanks,

         M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 378+ messages in thread

* Re: [PATCH v6 34/64] KVM: arm64: nv: Support multiple nested Stage-2 mmu structures
  2022-01-28 12:18   ` Marc Zyngier
  (?)
@ 2022-02-16 16:12     ` Alexandru Elisei
  -1 siblings, 0 replies; 378+ messages in thread
From: Alexandru Elisei @ 2022-02-16 16:12 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, Andre Przywara, Christoffer Dall,
	Jintack Lim, Haibo Xu, Ganapatrao Kulkarni, Chase Conklin,
	Russell King (Oracle),
	James Morse, Suzuki K Poulose, karl.heubaum, mihai.carabas,
	miguel.luis, kernel-team

Hi,

The code looks correct to me. Questions about stuff I don't understand and
nitpicks follow.

On Fri, Jan 28, 2022 at 12:18:42PM +0000, Marc Zyngier wrote:
> Add Stage-2 mmu data structures for virtual EL2 and for nested guests.
> We don't yet populate shadow Stage-2 page tables, but we now have a
> framework for getting to a shadow Stage-2 pgd.
> 
> We allocate twice the number of vcpus as Stage-2 mmu structures because
> that's sufficient for each vcpu running two translation regimes without
> having to flush the Stage-2 page tables.

That still leaves room for interpretation. Why is running two translation
regimes per VCPU the target for the allocation? Is it because it's a nice
number? Is it because KVM only supports two translation regimes per VCPU and
going beyond that will make it impossible to run the VM? Is it because, in
practice, KVM doesn't expect most people to run more than that? Is it because
KVM doesn't recommend or hasn't been tested with running more than that?

> 
> Co-developed-by: Christoffer Dall <christoffer.dall@arm.com>
> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/include/asm/kvm_host.h   |  29 +++++
>  arch/arm64/include/asm/kvm_mmu.h    |   9 ++
>  arch/arm64/include/asm/kvm_nested.h |   7 +
>  arch/arm64/kvm/arm.c                |  16 ++-
>  arch/arm64/kvm/mmu.c                |  29 +++--
>  arch/arm64/kvm/nested.c             | 195 ++++++++++++++++++++++++++++
>  6 files changed, 275 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index fa253f08e0fd..a15183d0e1bf 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -101,14 +101,43 @@ struct kvm_s2_mmu {
>  	int __percpu *last_vcpu_ran;
>  
>  	struct kvm_arch *arch;
> +
> +	/*
> +	 * For a shadow stage-2 MMU, the virtual vttbr programmed by the guest
> +	 * hypervisor.  Unused for kvm_arch->mmu. Set to 1 when the structure
> +	 * contains no valid information.

I think the comment contradicts itself: 1 is a valid hardware value for
VTTBR_EL2 (VMID = 0, BADDR = 0, CnP = 1), so either this is not the VTTBR
programmed by the guest hypervisor, or treating the value 1 as invalid is
incorrect.

Seeing how this is only used to hold vmid+baddr, maybe it's worth considering
renaming the field to vmid_baddr to avoid confusion about what it means.

> +	 */
> +	u64	vttbr;
> +
> +	/* true when this represents a nested context where virtual HCR_EL2.VM == 1 */
> +	bool	nested_stage2_enabled;
> +
> +	/*
> +	 *  0: Nobody is currently using this, check vttbr for validity
> +	 * >0: Somebody is actively using this.
> +	 */
> +	atomic_t refcnt;
>  };
>  
> +static inline bool kvm_s2_mmu_valid(struct kvm_s2_mmu *mmu)
> +{
> +	return !(mmu->vttbr & 1);
> +}
> +
>  struct kvm_arch_memory_slot {
>  };
>  
>  struct kvm_arch {
>  	struct kvm_s2_mmu mmu;
>  
> +	/*
> +	 * Stage 2 paging stage for VMs with nested virtual using a virtual
                          ^^^^^
Is that correct, or was it supposed to say "state"? Also, "[..] VMs with nested
virtual using [..]" sounds wrong to me.
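
Maybe something like this would be clearer (just a suggestion):

	/*
	 * Stage 2 paging state for VMs with nested virtualization, using a
	 * virtual VMID.
	 */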

> +	 * VMID.
> +	 */
> +	struct kvm_s2_mmu *nested_mmus;
> +	size_t nested_mmus_size;
> +	int nested_mmus_next;
> +
>  	/* VTCR_EL2 value for this VM */
>  	u64    vtcr;
>  
> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> index 1b314b2a69bc..0750d022bbf8 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -116,6 +116,7 @@ alternative_cb_end
>  #include <asm/cacheflush.h>
>  #include <asm/mmu_context.h>
>  #include <asm/kvm_emulate.h>
> +#include <asm/kvm_nested.h>
>  
>  void kvm_update_va_mask(struct alt_instr *alt,
>  			__le32 *origptr, __le32 *updptr, int nr_inst);
> @@ -161,6 +162,7 @@ int create_hyp_exec_mappings(phys_addr_t phys_addr, size_t size,
>  			     void **haddr);
>  void free_hyp_pgds(void);
>  
> +void kvm_unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size);
>  void stage2_unmap_vm(struct kvm *kvm);
>  int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu);
>  void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu);
> @@ -296,5 +298,12 @@ static inline struct kvm *kvm_s2_mmu_to_kvm(struct kvm_s2_mmu *mmu)
>  {
>  	return container_of(mmu->arch, struct kvm, arch);
>  }
> +
> +static inline u64 get_vmid(u64 vttbr)
> +{
> +	return (vttbr & VTTBR_VMID_MASK(kvm_get_vmid_bits())) >>
> +		VTTBR_VMID_SHIFT;
> +}
> +
>  #endif /* __ASSEMBLY__ */
>  #endif /* __ARM64_KVM_MMU_H__ */
> diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
> index 7d398510fd9d..8bb7159f2b6b 100644
> --- a/arch/arm64/include/asm/kvm_nested.h
> +++ b/arch/arm64/include/asm/kvm_nested.h
> @@ -65,6 +65,13 @@ static inline u64 translate_cnthctl_el2_to_cntkctl_el1(u64 cnthctl)
>  		(cnthctl & (CNTHCTL_EVNTI | CNTHCTL_EVNTDIR | CNTHCTL_EVNTEN)));
>  }
>  
> +extern void kvm_init_nested(struct kvm *kvm);
> +extern int kvm_vcpu_init_nested(struct kvm_vcpu *vcpu);
> +extern void kvm_init_nested_s2_mmu(struct kvm_s2_mmu *mmu);
> +extern struct kvm_s2_mmu *lookup_s2_mmu(struct kvm *kvm, u64 vttbr, u64 hcr);
> +extern void kvm_vcpu_load_hw_mmu(struct kvm_vcpu *vcpu);
> +extern void kvm_vcpu_put_hw_mmu(struct kvm_vcpu *vcpu);
> +
>  int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe);
>  extern bool __forward_traps(struct kvm_vcpu *vcpu, unsigned int reg,
>  			    u64 control_bit);
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 06ca11e90482..14f85f1e15b2 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -37,6 +37,7 @@
>  #include <asm/kvm_arm.h>
>  #include <asm/kvm_asm.h>
>  #include <asm/kvm_mmu.h>
> +#include <asm/kvm_nested.h>
>  #include <asm/kvm_emulate.h>
>  #include <asm/sections.h>
>  
> @@ -146,6 +147,8 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>  	if (ret)
>  		return ret;
>  
> +	kvm_init_nested(kvm);
> +
>  	ret = kvm_share_hyp(kvm, kvm + 1);
>  	if (ret)
>  		goto out_free_stage2_pgd;
> @@ -375,6 +378,9 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>  	struct kvm_s2_mmu *mmu;
>  	int *last_ran;
>  
> +	if (vcpu_has_nv(vcpu))
> +		kvm_vcpu_load_hw_mmu(vcpu);
> +
>  	mmu = vcpu->arch.hw_mmu;
>  	last_ran = this_cpu_ptr(mmu->last_vcpu_ran);
>  
> @@ -423,6 +429,9 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>  	kvm_vgic_put(vcpu);
>  	kvm_vcpu_pmu_restore_host(vcpu);
>  
> +	if (vcpu_has_nv(vcpu))
> +		kvm_vcpu_put_hw_mmu(vcpu);
> +
>  	vcpu->cpu = -1;
>  }
>  
> @@ -1122,8 +1131,13 @@ static int kvm_vcpu_set_target(struct kvm_vcpu *vcpu,
>  
>  	vcpu->arch.target = phys_target;
>  
> +	/* Prepare for nested if required */
> +	ret = kvm_vcpu_init_nested(vcpu);
> +
>  	/* Now we know what it is, we can reset it. */
> -	ret = kvm_reset_vcpu(vcpu);
> +	if (!ret)
> +		ret = kvm_reset_vcpu(vcpu);
> +
>  	if (ret) {
>  		vcpu->arch.target = -1;
>  		bitmap_zero(vcpu->arch.features, KVM_VCPU_MAX_FEATURES);
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index bc2aba953299..55525fd5743d 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -162,7 +162,7 @@ static void invalidate_icache_guest_page(void *va, size_t size)
>   * does.
>   */
>  /**
> - * unmap_stage2_range -- Clear stage2 page table entries to unmap a range
> + * __unmap_stage2_range -- Clear stage2 page table entries to unmap a range
>   * @mmu:   The KVM stage-2 MMU pointer
>   * @start: The intermediate physical base address of the range to unmap
>   * @size:  The size of the area to unmap
> @@ -185,7 +185,7 @@ static void __unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64
>  				   may_block));
>  }
>  
> -static void unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size)
> +void kvm_unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size)
>  {
>  	__unmap_stage2_range(mmu, start, size, true);
>  }
> @@ -628,7 +628,20 @@ int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu)
>  	int cpu, err;
>  	struct kvm_pgtable *pgt;
>  
> +	/*
> +	 * If we already have our page tables in place, and that the
> +	 * MMU context is the canonical one, we have a bug somewhere,
> +	 * as this is only supposed to ever happen once per VM.
> +	 *
> +	 * Otherwise, we're building nested page tables, and that's
> +	 * probably because userspace called KVM_ARM_VCPU_INIT more
> +	 * than once on the same vcpu. Since that's actually legal,
> +	 * don't kick up a fuss and leave gracefully.
> +	 */
>  	if (mmu->pgt != NULL) {
> +		if (&kvm->arch.mmu != mmu)
> +			return 0;
> +
>  		kvm_err("kvm_arch already initialized?\n");
>  		return -EINVAL;
>  	}
> @@ -654,6 +667,9 @@ int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu)
>  	mmu->pgt = pgt;
>  	mmu->pgd_phys = __pa(pgt->pgd);
>  	WRITE_ONCE(mmu->vmid.vmid_gen, 0);
> +
> +	kvm_init_nested_s2_mmu(mmu);

It might make the code self-documenting if kvm_init_nested_s2_mmu() was called
only if mmu != &kvm->arch.mmu (we already have a similar check a few lines above).
Again, not a big deal, and it doesn't affect correctness.
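
E.g. (sketch):

	if (mmu != &kvm->arch.mmu)
		kvm_init_nested_s2_mmu(mmu);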

> +
>  	return 0;
>  
>  out_destroy_pgtable:
> @@ -699,7 +715,7 @@ static void stage2_unmap_memslot(struct kvm *kvm,
>  
>  		if (!(vma->vm_flags & VM_PFNMAP)) {
>  			gpa_t gpa = addr + (vm_start - memslot->userspace_addr);
> -			unmap_stage2_range(&kvm->arch.mmu, gpa, vm_end - vm_start);
> +			kvm_unmap_stage2_range(&kvm->arch.mmu, gpa, vm_end - vm_start);
>  		}
>  		hva = vm_end;
>  	} while (hva < reg_end);
> @@ -1681,11 +1697,6 @@ void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen)
>  {
>  }
>  
> -void kvm_arch_flush_shadow_all(struct kvm *kvm)
> -{
> -	kvm_free_stage2_pgd(&kvm->arch.mmu);
> -}
> -
>  void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
>  				   struct kvm_memory_slot *slot)
>  {
> @@ -1693,7 +1704,7 @@ void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
>  	phys_addr_t size = slot->npages << PAGE_SHIFT;
>  
>  	spin_lock(&kvm->mmu_lock);
> -	unmap_stage2_range(&kvm->arch.mmu, gpa, size);
> +	kvm_unmap_stage2_range(&kvm->arch.mmu, gpa, size);
>  	spin_unlock(&kvm->mmu_lock);
>  }
>  
> diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
> index 254152cd791e..bfa2b9229173 100644
> --- a/arch/arm64/kvm/nested.c
> +++ b/arch/arm64/kvm/nested.c
> @@ -7,12 +7,189 @@
>  #include <linux/kvm.h>
>  #include <linux/kvm_host.h>
>  
> +#include <asm/kvm_arm.h>
>  #include <asm/kvm_emulate.h>
> +#include <asm/kvm_mmu.h>
>  #include <asm/kvm_nested.h>
>  #include <asm/sysreg.h>
>  
>  #include "sys_regs.h"
>  
> +void kvm_init_nested(struct kvm *kvm)
> +{
> +	kvm->arch.nested_mmus = NULL;
> +	kvm->arch.nested_mmus_size = 0;
> +}
> +
> +int kvm_vcpu_init_nested(struct kvm_vcpu *vcpu)
> +{
> +	struct kvm *kvm = vcpu->kvm;
> +	struct kvm_s2_mmu *tmp;
> +	int num_mmus;
> +	int ret = -ENOMEM;
> +
> +	if (!test_bit(KVM_ARM_VCPU_HAS_EL2, vcpu->arch.features))
> +		return 0;
> +
> +	if (!cpus_have_final_cap(ARM64_HAS_NESTED_VIRT))
> +		return -EINVAL;
> +
> +	mutex_lock(&kvm->lock);
> +
> +	/*
> +	 * Let's treat memory allocation failures as benign: If we fail to
> +	 * allocate anything, return an error and keep the allocated array
> +	 * alive. Userspace may try to recover by initializing the vcpu
> +	 * again, and there is no reason to affect the whole VM for this.
> +	 */
> +	num_mmus = atomic_read(&kvm->online_vcpus) * 2;
> +	tmp = krealloc(kvm->arch.nested_mmus,
> +		       num_mmus * sizeof(*kvm->arch.nested_mmus),
> +		       GFP_KERNEL_ACCOUNT | __GFP_ZERO);
> +	if (tmp) {
> +		/*
> +		 * If we went through a reallocation, adjust the MMU
> +		 * back-pointers in the previously initialised
> +		 * pg_table structures.
> +		 */
> +		if (kvm->arch.nested_mmus != tmp) {
> +			int i;
> +
> +			for (i = 0; i < num_mmus - 2; i++)
> +				tmp[i].pgt->mmu = &tmp[i];
> +		}
> +
> +		if (kvm_init_stage2_mmu(kvm, &tmp[num_mmus - 1]) ||
> +		    kvm_init_stage2_mmu(kvm, &tmp[num_mmus - 2])) {
> +			kvm_free_stage2_pgd(&tmp[num_mmus - 1]);
> +			kvm_free_stage2_pgd(&tmp[num_mmus - 2]);
> +		} else {
> +			kvm->arch.nested_mmus_size = num_mmus;
> +			ret = 0;
> +		}
> +
> +		kvm->arch.nested_mmus = tmp;
> +	}
> +
> +	mutex_unlock(&kvm->lock);
> +	return ret;
> +}
> +
> +/* Must be called with kvm->lock held */
> +struct kvm_s2_mmu *lookup_s2_mmu(struct kvm *kvm, u64 vttbr, u64 hcr)
> +{
> +	bool nested_stage2_enabled = hcr & HCR_VM;
> +	int i;
> +
> +	/* Don't consider the CnP bit for the vttbr match */
> +	vttbr = vttbr & ~VTTBR_CNP_BIT;
> +
> +	/*
> +	 * Two possibilities when looking up a S2 MMU context:
> +	 *
> +	 * - either S2 is enabled in the guest, and we need a context that
> +         *   is S2-enabled and matches the full VTTBR (VMID+BADDR), which
> +         *   makes it safe from a TLB conflict perspective (a broken guest
> +         *   won't be able to generate them),
> +	 *
> +	 * - or S2 is disabled, and we need a context that is S2-disabled
> +         *   and matches the VMID only, as all TLBs are tagged by VMID even
> +         *   if S2 translation is disabled.
> +	 */
> +	for (i = 0; i < kvm->arch.nested_mmus_size; i++) {
> +		struct kvm_s2_mmu *mmu = &kvm->arch.nested_mmus[i];
> +
> +		if (!kvm_s2_mmu_valid(mmu))
> +			continue;
> +
> +		if (nested_stage2_enabled &&
> +		    mmu->nested_stage2_enabled &&
> +		    vttbr == mmu->vttbr)
> +			return mmu;
> +
> +		if (!nested_stage2_enabled &&
> +		    !mmu->nested_stage2_enabled &&
> +		    get_vmid(vttbr) == get_vmid(mmu->vttbr))
> +			return mmu;
> +	}
> +	return NULL;
> +}
> +
> +static struct kvm_s2_mmu *get_s2_mmu_nested(struct kvm_vcpu *vcpu)
> +{
> +	struct kvm *kvm = vcpu->kvm;
> +	u64 vttbr = vcpu_read_sys_reg(vcpu, VTTBR_EL2);
> +	u64 hcr = vcpu_read_sys_reg(vcpu, HCR_EL2);
> +	struct kvm_s2_mmu *s2_mmu;
> +	int i;
> +
> +	s2_mmu = lookup_s2_mmu(kvm, vttbr, hcr);
> +	if (s2_mmu)
> +		goto out;
> +
> +	/*
> +	 * Make sure we don't always search from the same point, or we
> +	 * will always reuse a potentially active context, leaving
> +	 * free contexts unused.
> +	 */
> +	for (i = kvm->arch.nested_mmus_next;
> +	     i < (kvm->arch.nested_mmus_size + kvm->arch.nested_mmus_next);
> +	     i++) {
> +		s2_mmu = &kvm->arch.nested_mmus[i % kvm->arch.nested_mmus_size];
> +
> +		if (atomic_read(&s2_mmu->refcnt) == 0)
> +			break;
> +	}
> +	BUG_ON(atomic_read(&s2_mmu->refcnt)); /* We have struct MMUs to spare */
> +
> +	/* Set the scene for the next search */
> +	kvm->arch.nested_mmus_next = (i + 1) % kvm->arch.nested_mmus_size;
> +
> +	if (kvm_s2_mmu_valid(s2_mmu)) {
> +		/* Clear the old state */
> +		kvm_unmap_stage2_range(s2_mmu, 0, kvm_phys_size(kvm));
> +		if (s2_mmu->vmid.vmid_gen)
> +			kvm_call_hyp(__kvm_tlb_flush_vmid, s2_mmu);
> +	}
> +
> +	/*
> +	 * The virtual VMID (modulo CnP) will be used as a key when matching
> +	 * an existing kvm_s2_mmu.
> +	 */
> +	s2_mmu->vttbr = vttbr & ~VTTBR_CNP_BIT;
> +	s2_mmu->nested_stage2_enabled = hcr & HCR_VM;
> +
> +out:
> +	atomic_inc(&s2_mmu->refcnt);
> +	return s2_mmu;
> +}
> +
> +void kvm_init_nested_s2_mmu(struct kvm_s2_mmu *mmu)
> +{
> +	mmu->vttbr = 1;
> +	mmu->nested_stage2_enabled = false;
> +	atomic_set(&mmu->refcnt, 0);
> +}
> +
> +void kvm_vcpu_load_hw_mmu(struct kvm_vcpu *vcpu)
> +{
> +	if (is_hyp_ctxt(vcpu)) {
> +		vcpu->arch.hw_mmu = &vcpu->kvm->arch.mmu;
> +	} else {
> +		spin_lock(&vcpu->kvm->mmu_lock);
> +		vcpu->arch.hw_mmu = get_s2_mmu_nested(vcpu);
> +		spin_unlock(&vcpu->kvm->mmu_lock);
> +	}
> +}
> +
> +void kvm_vcpu_put_hw_mmu(struct kvm_vcpu *vcpu)
> +{
> +	if (vcpu->arch.hw_mmu != &vcpu->kvm->arch.mmu) {
> +		atomic_dec(&vcpu->arch.hw_mmu->refcnt);
> +		vcpu->arch.hw_mmu = NULL;
> +	}
> +}
> +
>  /*
>   * Inject wfx to the virtual EL2 if this is not from the virtual EL2 and
>   * the virtual HCR_EL2.TWX is set. Otherwise, let the host hypervisor
> @@ -31,6 +208,24 @@ int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe)
>  	return -EINVAL;
>  }
>  
> +void kvm_arch_flush_shadow_all(struct kvm *kvm)
> +{
> +	int i;
> +
> +	for (i = 0; i < kvm->arch.nested_mmus_size; i++) {
> +		struct kvm_s2_mmu *mmu = &kvm->arch.nested_mmus[i];
> +
> +		WARN_ON(atomic_read(&mmu->refcnt));
> +
> +		if (!atomic_read(&mmu->refcnt))
> +			kvm_free_stage2_pgd(mmu);
> +	}
> +	kfree(kvm->arch.nested_mmus);
> +	kvm->arch.nested_mmus = NULL;
> +	kvm->arch.nested_mmus_size = 0;
> +	kvm_free_stage2_pgd(&kvm->arch.mmu);
> +}
> +

The code above looks good.

Thanks,
Alex

>  /*
>   * Our emulated CPU doesn't support all the possible features. For the
>   * sake of simplicity (and probably mental sanity), wipe out a number
> -- 
> 2.30.2
> 

^ permalink raw reply	[flat|nested] 378+ messages in thread

* Re: [PATCH v6 34/64] KVM: arm64: nv: Support multiple nested Stage-2 mmu structures
@ 2022-02-16 16:12     ` Alexandru Elisei
  0 siblings, 0 replies; 378+ messages in thread
From: Alexandru Elisei @ 2022-02-16 16:12 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kernel-team, kvm, Andre Przywara, Christoffer Dall,
	Chase Conklin, kvmarm, mihai.carabas, Ganapatrao Kulkarni,
	Russell King (Oracle),
	linux-arm-kernel

Hi,

The code looks correct to me. Questions about stuff I don't understand and
nitpicks follow.

On Fri, Jan 28, 2022 at 12:18:42PM +0000, Marc Zyngier wrote:
> Add Stage-2 mmu data structures for virtual EL2 and for nested guests.
> We don't yet populate shadow Stage-2 page tables, but we now have a
> framework for getting to a shadow Stage-2 pgd.
> 
> We allocate twice the number of vcpus as Stage-2 mmu structures because
> that's sufficient for each vcpu running two translation regimes without
> having to flush the Stage-2 page tables.

That still leaves room for interpretation. Why is running two translation
regimes per VCPU the target for the allocation? Is it because it's a nice
number? Is it because KVM only supports two translation regimes per VCPU and
going beyond that will make it impossible to run the VM? Is it because, in
practice, KVM doesn't expect most people to run more than that? Is it because
KVM doesn't recommend or hasn't been tested with running more than that?

> 
> Co-developed-by: Christoffer Dall <christoffer.dall@arm.com>
> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/include/asm/kvm_host.h   |  29 +++++
>  arch/arm64/include/asm/kvm_mmu.h    |   9 ++
>  arch/arm64/include/asm/kvm_nested.h |   7 +
>  arch/arm64/kvm/arm.c                |  16 ++-
>  arch/arm64/kvm/mmu.c                |  29 +++--
>  arch/arm64/kvm/nested.c             | 195 ++++++++++++++++++++++++++++
>  6 files changed, 275 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index fa253f08e0fd..a15183d0e1bf 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -101,14 +101,43 @@ struct kvm_s2_mmu {
>  	int __percpu *last_vcpu_ran;
>  
>  	struct kvm_arch *arch;
> +
> +	/*
> +	 * For a shadow stage-2 MMU, the virtual vttbr programmed by the guest
> +	 * hypervisor.  Unused for kvm_arch->mmu. Set to 1 when the structure
> +	 * contains no valid information.

I think the comment contradicts itself: 1 is a valid hardware value for
VTTBR_EL2 (VMID = 0, BADDR = 0, CnP = 1), so either this is not the VTTBR
programmed by the guest hypervisor, or treating the value 1 as invalid is
incorrect.

Seeing how this is only used to hold vmid+baddr, maybe it's worth considering
renaming the field to vmid_baddr to avoid confusion about what it means.

> +	 */
> +	u64	vttbr;
> +
> +	/* true when this represents a nested context where virtual HCR_EL2.VM == 1 */
> +	bool	nested_stage2_enabled;
> +
> +	/*
> +	 *  0: Nobody is currently using this, check vttbr for validity
> +	 * >0: Somebody is actively using this.
> +	 */
> +	atomic_t refcnt;
>  };
>  
> +static inline bool kvm_s2_mmu_valid(struct kvm_s2_mmu *mmu)
> +{
> +	return !(mmu->vttbr & 1);
> +}
> +
>  struct kvm_arch_memory_slot {
>  };
>  
>  struct kvm_arch {
>  	struct kvm_s2_mmu mmu;
>  
> +	/*
> +	 * Stage 2 paging stage for VMs with nested virtual using a virtual
                          ^^^^^
Is that correct or was it supposed to say "state"? Also, "[..] VMS with nested
virtual using [..]" sounds wrongs to me.

> +	 * VMID.
> +	 */
> +	struct kvm_s2_mmu *nested_mmus;
> +	size_t nested_mmus_size;
> +	int nested_mmus_next;
> +
>  	/* VTCR_EL2 value for this VM */
>  	u64    vtcr;
>  
> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> index 1b314b2a69bc..0750d022bbf8 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -116,6 +116,7 @@ alternative_cb_end
>  #include <asm/cacheflush.h>
>  #include <asm/mmu_context.h>
>  #include <asm/kvm_emulate.h>
> +#include <asm/kvm_nested.h>
>  
>  void kvm_update_va_mask(struct alt_instr *alt,
>  			__le32 *origptr, __le32 *updptr, int nr_inst);
> @@ -161,6 +162,7 @@ int create_hyp_exec_mappings(phys_addr_t phys_addr, size_t size,
>  			     void **haddr);
>  void free_hyp_pgds(void);
>  
> +void kvm_unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size);
>  void stage2_unmap_vm(struct kvm *kvm);
>  int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu);
>  void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu);
> @@ -296,5 +298,12 @@ static inline struct kvm *kvm_s2_mmu_to_kvm(struct kvm_s2_mmu *mmu)
>  {
>  	return container_of(mmu->arch, struct kvm, arch);
>  }
> +
> +static inline u64 get_vmid(u64 vttbr)
> +{
> +	return (vttbr & VTTBR_VMID_MASK(kvm_get_vmid_bits())) >>
> +		VTTBR_VMID_SHIFT;
> +}
> +
>  #endif /* __ASSEMBLY__ */
>  #endif /* __ARM64_KVM_MMU_H__ */
> diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
> index 7d398510fd9d..8bb7159f2b6b 100644
> --- a/arch/arm64/include/asm/kvm_nested.h
> +++ b/arch/arm64/include/asm/kvm_nested.h
> @@ -65,6 +65,13 @@ static inline u64 translate_cnthctl_el2_to_cntkctl_el1(u64 cnthctl)
>  		(cnthctl & (CNTHCTL_EVNTI | CNTHCTL_EVNTDIR | CNTHCTL_EVNTEN)));
>  }
>  
> +extern void kvm_init_nested(struct kvm *kvm);
> +extern int kvm_vcpu_init_nested(struct kvm_vcpu *vcpu);
> +extern void kvm_init_nested_s2_mmu(struct kvm_s2_mmu *mmu);
> +extern struct kvm_s2_mmu *lookup_s2_mmu(struct kvm *kvm, u64 vttbr, u64 hcr);
> +extern void kvm_vcpu_load_hw_mmu(struct kvm_vcpu *vcpu);
> +extern void kvm_vcpu_put_hw_mmu(struct kvm_vcpu *vcpu);
> +
>  int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe);
>  extern bool __forward_traps(struct kvm_vcpu *vcpu, unsigned int reg,
>  			    u64 control_bit);
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 06ca11e90482..14f85f1e15b2 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -37,6 +37,7 @@
>  #include <asm/kvm_arm.h>
>  #include <asm/kvm_asm.h>
>  #include <asm/kvm_mmu.h>
> +#include <asm/kvm_nested.h>
>  #include <asm/kvm_emulate.h>
>  #include <asm/sections.h>
>  
> @@ -146,6 +147,8 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>  	if (ret)
>  		return ret;
>  
> +	kvm_init_nested(kvm);
> +
>  	ret = kvm_share_hyp(kvm, kvm + 1);
>  	if (ret)
>  		goto out_free_stage2_pgd;
> @@ -375,6 +378,9 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>  	struct kvm_s2_mmu *mmu;
>  	int *last_ran;
>  
> +	if (vcpu_has_nv(vcpu))
> +		kvm_vcpu_load_hw_mmu(vcpu);
> +
>  	mmu = vcpu->arch.hw_mmu;
>  	last_ran = this_cpu_ptr(mmu->last_vcpu_ran);
>  
> @@ -423,6 +429,9 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>  	kvm_vgic_put(vcpu);
>  	kvm_vcpu_pmu_restore_host(vcpu);
>  
> +	if (vcpu_has_nv(vcpu))
> +		kvm_vcpu_put_hw_mmu(vcpu);
> +
>  	vcpu->cpu = -1;
>  }
>  
> @@ -1122,8 +1131,13 @@ static int kvm_vcpu_set_target(struct kvm_vcpu *vcpu,
>  
>  	vcpu->arch.target = phys_target;
>  
> +	/* Prepare for nested if required */
> +	ret = kvm_vcpu_init_nested(vcpu);
> +
>  	/* Now we know what it is, we can reset it. */
> -	ret = kvm_reset_vcpu(vcpu);
> +	if (!ret)
> +		ret = kvm_reset_vcpu(vcpu);
> +
>  	if (ret) {
>  		vcpu->arch.target = -1;
>  		bitmap_zero(vcpu->arch.features, KVM_VCPU_MAX_FEATURES);
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index bc2aba953299..55525fd5743d 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -162,7 +162,7 @@ static void invalidate_icache_guest_page(void *va, size_t size)
>   * does.
>   */
>  /**
> - * unmap_stage2_range -- Clear stage2 page table entries to unmap a range
> + * __unmap_stage2_range -- Clear stage2 page table entries to unmap a range
>   * @mmu:   The KVM stage-2 MMU pointer
>   * @start: The intermediate physical base address of the range to unmap
>   * @size:  The size of the area to unmap
> @@ -185,7 +185,7 @@ static void __unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64
>  				   may_block));
>  }
>  
> -static void unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size)
> +void kvm_unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size)
>  {
>  	__unmap_stage2_range(mmu, start, size, true);
>  }
> @@ -628,7 +628,20 @@ int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu)
>  	int cpu, err;
>  	struct kvm_pgtable *pgt;
>  
> +	/*
> +	 * If we already have our page tables in place, and that the
> +	 * MMU context is the canonical one, we have a bug somewhere,
> +	 * as this is only supposed to ever happen once per VM.
> +	 *
> +	 * Otherwise, we're building nested page tables, and that's
> +	 * probably because userspace called KVM_ARM_VCPU_INIT more
> +	 * than once on the same vcpu. Since that's actually legal,
> +	 * don't kick a fuss and leave gracefully.
> +	 */
>  	if (mmu->pgt != NULL) {
> +		if (&kvm->arch.mmu != mmu)
> +			return 0;
> +
>  		kvm_err("kvm_arch already initialized?\n");
>  		return -EINVAL;
>  	}
> @@ -654,6 +667,9 @@ int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu)
>  	mmu->pgt = pgt;
>  	mmu->pgd_phys = __pa(pgt->pgd);
>  	WRITE_ONCE(mmu->vmid.vmid_gen, 0);
> +
> +	kvm_init_nested_s2_mmu(mmu);

It might make the code more self-documenting if kvm_init_nested_s2_mmu() were
called only when mmu != &kvm->arch.mmu (we already have a similar check a few
lines above). Again, not a big deal, and it doesn't affect correctness.

> +
>  	return 0;
>  
>  out_destroy_pgtable:
> @@ -699,7 +715,7 @@ static void stage2_unmap_memslot(struct kvm *kvm,
>  
>  		if (!(vma->vm_flags & VM_PFNMAP)) {
>  			gpa_t gpa = addr + (vm_start - memslot->userspace_addr);
> -			unmap_stage2_range(&kvm->arch.mmu, gpa, vm_end - vm_start);
> +			kvm_unmap_stage2_range(&kvm->arch.mmu, gpa, vm_end - vm_start);
>  		}
>  		hva = vm_end;
>  	} while (hva < reg_end);
> @@ -1681,11 +1697,6 @@ void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen)
>  {
>  }
>  
> -void kvm_arch_flush_shadow_all(struct kvm *kvm)
> -{
> -	kvm_free_stage2_pgd(&kvm->arch.mmu);
> -}
> -
>  void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
>  				   struct kvm_memory_slot *slot)
>  {
> @@ -1693,7 +1704,7 @@ void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
>  	phys_addr_t size = slot->npages << PAGE_SHIFT;
>  
>  	spin_lock(&kvm->mmu_lock);
> -	unmap_stage2_range(&kvm->arch.mmu, gpa, size);
> +	kvm_unmap_stage2_range(&kvm->arch.mmu, gpa, size);
>  	spin_unlock(&kvm->mmu_lock);
>  }
>  
> diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
> index 254152cd791e..bfa2b9229173 100644
> --- a/arch/arm64/kvm/nested.c
> +++ b/arch/arm64/kvm/nested.c
> @@ -7,12 +7,189 @@
>  #include <linux/kvm.h>
>  #include <linux/kvm_host.h>
>  
> +#include <asm/kvm_arm.h>
>  #include <asm/kvm_emulate.h>
> +#include <asm/kvm_mmu.h>
>  #include <asm/kvm_nested.h>
>  #include <asm/sysreg.h>
>  
>  #include "sys_regs.h"
>  
> +void kvm_init_nested(struct kvm *kvm)
> +{
> +	kvm->arch.nested_mmus = NULL;
> +	kvm->arch.nested_mmus_size = 0;
> +}
> +
> +int kvm_vcpu_init_nested(struct kvm_vcpu *vcpu)
> +{
> +	struct kvm *kvm = vcpu->kvm;
> +	struct kvm_s2_mmu *tmp;
> +	int num_mmus;
> +	int ret = -ENOMEM;
> +
> +	if (!test_bit(KVM_ARM_VCPU_HAS_EL2, vcpu->arch.features))
> +		return 0;
> +
> +	if (!cpus_have_final_cap(ARM64_HAS_NESTED_VIRT))
> +		return -EINVAL;
> +
> +	mutex_lock(&kvm->lock);
> +
> +	/*
> +	 * Let's treat memory allocation failures as benign: If we fail to
> +	 * allocate anything, return an error and keep the allocated array
> +	 * alive. Userspace may try to recover by intializing the vcpu
> +	 * again, and there is no reason to affect the whole VM for this.
> +	 */
> +	num_mmus = atomic_read(&kvm->online_vcpus) * 2;
> +	tmp = krealloc(kvm->arch.nested_mmus,
> +		       num_mmus * sizeof(*kvm->arch.nested_mmus),
> +		       GFP_KERNEL_ACCOUNT | __GFP_ZERO);
> +	if (tmp) {
> +		/*
> +		 * If we went through a realocation, adjust the MMU
> +		 * back-pointers in the previously initialised
> +		 * pg_table structures.
> +		 */
> +		if (kvm->arch.nested_mmus != tmp) {
> +			int i;
> +
> +			for (i = 0; i < num_mmus - 2; i++)
> +				tmp[i].pgt->mmu = &tmp[i];
> +		}
> +
> +		if (kvm_init_stage2_mmu(kvm, &tmp[num_mmus - 1]) ||
> +		    kvm_init_stage2_mmu(kvm, &tmp[num_mmus - 2])) {
> +			kvm_free_stage2_pgd(&tmp[num_mmus - 1]);
> +			kvm_free_stage2_pgd(&tmp[num_mmus - 2]);
> +		} else {
> +			kvm->arch.nested_mmus_size = num_mmus;
> +			ret = 0;
> +		}
> +
> +		kvm->arch.nested_mmus = tmp;
> +	}
> +
> +	mutex_unlock(&kvm->lock);
> +	return ret;
> +}
> +
> +/* Must be called with kvm->lock held */
> +struct kvm_s2_mmu *lookup_s2_mmu(struct kvm *kvm, u64 vttbr, u64 hcr)
> +{
> +	bool nested_stage2_enabled = hcr & HCR_VM;
> +	int i;
> +
> +	/* Don't consider the CnP bit for the vttbr match */
> +	vttbr = vttbr & ~VTTBR_CNP_BIT;
> +
> +	/*
> +	 * Two possibilities when looking up a S2 MMU context:
> +	 *
> +	 * - either S2 is enabled in the guest, and we need a context that
> +         *   is S2-enabled and matches the full VTTBR (VMID+BADDR), which
> +         *   makes it safe from a TLB conflict perspective (a broken guest
> +         *   won't be able to generate them),
> +	 *
> +	 * - or S2 is disabled, and we need a context that is S2-disabled
> +         *   and matches the VMID only, as all TLBs are tagged by VMID even
> +         *   if S2 translation is disabled.
> +	 */
> +	for (i = 0; i < kvm->arch.nested_mmus_size; i++) {
> +		struct kvm_s2_mmu *mmu = &kvm->arch.nested_mmus[i];
> +
> +		if (!kvm_s2_mmu_valid(mmu))
> +			continue;
> +
> +		if (nested_stage2_enabled &&
> +		    mmu->nested_stage2_enabled &&
> +		    vttbr == mmu->vttbr)
> +			return mmu;
> +
> +		if (!nested_stage2_enabled &&
> +		    !mmu->nested_stage2_enabled &&
> +		    get_vmid(vttbr) == get_vmid(mmu->vttbr))
> +			return mmu;
> +	}
> +	return NULL;
> +}
> +
> +static struct kvm_s2_mmu *get_s2_mmu_nested(struct kvm_vcpu *vcpu)
> +{
> +	struct kvm *kvm = vcpu->kvm;
> +	u64 vttbr = vcpu_read_sys_reg(vcpu, VTTBR_EL2);
> +	u64 hcr= vcpu_read_sys_reg(vcpu, HCR_EL2);
> +	struct kvm_s2_mmu *s2_mmu;
> +	int i;
> +
> +	s2_mmu = lookup_s2_mmu(kvm, vttbr, hcr);
> +	if (s2_mmu)
> +		goto out;
> +
> +	/*
> +	 * Make sure we don't always search from the same point, or we
> +	 * will always reuse a potentially active context, leaving
> +	 * free contexts unused.
> +	 */
> +	for (i = kvm->arch.nested_mmus_next;
> +	     i < (kvm->arch.nested_mmus_size + kvm->arch.nested_mmus_next);
> +	     i++) {
> +		s2_mmu = &kvm->arch.nested_mmus[i % kvm->arch.nested_mmus_size];
> +
> +		if (atomic_read(&s2_mmu->refcnt) == 0)
> +			break;
> +	}
> +	BUG_ON(atomic_read(&s2_mmu->refcnt)); /* We have struct MMUs to spare */
> +
> +	/* Set the scene for the next search */
> +	kvm->arch.nested_mmus_next = (i + 1) % kvm->arch.nested_mmus_size;
> +
> +	if (kvm_s2_mmu_valid(s2_mmu)) {
> +		/* Clear the old state */
> +		kvm_unmap_stage2_range(s2_mmu, 0, kvm_phys_size(kvm));
> +		if (s2_mmu->vmid.vmid_gen)
> +			kvm_call_hyp(__kvm_tlb_flush_vmid, s2_mmu);
> +	}
> +
> +	/*
> +	 * The virtual VMID (modulo CnP) will be used as a key when matching
> +	 * an existing kvm_s2_mmu.
> +	 */
> +	s2_mmu->vttbr = vttbr & ~VTTBR_CNP_BIT;
> +	s2_mmu->nested_stage2_enabled = hcr & HCR_VM;
> +
> +out:
> +	atomic_inc(&s2_mmu->refcnt);
> +	return s2_mmu;
> +}
> +
> +void kvm_init_nested_s2_mmu(struct kvm_s2_mmu *mmu)
> +{
> +	mmu->vttbr = 1;
> +	mmu->nested_stage2_enabled = false;
> +	atomic_set(&mmu->refcnt, 0);
> +}
> +
> +void kvm_vcpu_load_hw_mmu(struct kvm_vcpu *vcpu)
> +{
> +	if (is_hyp_ctxt(vcpu)) {
> +		vcpu->arch.hw_mmu = &vcpu->kvm->arch.mmu;
> +	} else {
> +		spin_lock(&vcpu->kvm->mmu_lock);
> +		vcpu->arch.hw_mmu = get_s2_mmu_nested(vcpu);
> +		spin_unlock(&vcpu->kvm->mmu_lock);
> +	}
> +}
> +
> +void kvm_vcpu_put_hw_mmu(struct kvm_vcpu *vcpu)
> +{
> +	if (vcpu->arch.hw_mmu != &vcpu->kvm->arch.mmu) {
> +		atomic_dec(&vcpu->arch.hw_mmu->refcnt);
> +		vcpu->arch.hw_mmu = NULL;
> +	}
> +}
> +
>  /*
>   * Inject wfx to the virtual EL2 if this is not from the virtual EL2 and
>   * the virtual HCR_EL2.TWX is set. Otherwise, let the host hypervisor
> @@ -31,6 +208,24 @@ int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe)
>  	return -EINVAL;
>  }
>  
> +void kvm_arch_flush_shadow_all(struct kvm *kvm)
> +{
> +	int i;
> +
> +	for (i = 0; i < kvm->arch.nested_mmus_size; i++) {
> +		struct kvm_s2_mmu *mmu = &kvm->arch.nested_mmus[i];
> +
> +		WARN_ON(atomic_read(&mmu->refcnt));
> +
> +		if (!atomic_read(&mmu->refcnt))
> +			kvm_free_stage2_pgd(mmu);
> +	}
> +	kfree(kvm->arch.nested_mmus);
> +	kvm->arch.nested_mmus = NULL;
> +	kvm->arch.nested_mmus_size = 0;
> +	kvm_free_stage2_pgd(&kvm->arch.mmu);
> +}
> +

The code above looks good.

Thanks,
Alex

>  /*
>   * Our emulated CPU doesn't support all the possible features. For the
>   * sake of simplicity (and probably mental sanity), wipe out a number
> -- 
> 2.30.2
> 

^ permalink raw reply	[flat|nested] 378+ messages in thread

* Re: [PATCH v6 34/64] KVM: arm64: nv: Support multiple nested Stage-2 mmu structures
@ 2022-02-16 16:12     ` Alexandru Elisei
  0 siblings, 0 replies; 378+ messages in thread
From: Alexandru Elisei @ 2022-02-16 16:12 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, Andre Przywara, Christoffer Dall,
	Jintack Lim, Haibo Xu, Ganapatrao Kulkarni, Chase Conklin,
	Russell King (Oracle),
	James Morse, Suzuki K Poulose, karl.heubaum, mihai.carabas,
	miguel.luis, kernel-team

Hi,

The code looks correct to me. Questions about stuff I don't understand and
nitpicks follow.

On Fri, Jan 28, 2022 at 12:18:42PM +0000, Marc Zyngier wrote:
> Add Stage-2 mmu data structures for virtual EL2 and for nested guests.
> We don't yet populate shadow Stage-2 page tables, but we now have a
> framework for getting to a shadow Stage-2 pgd.
> 
> We allocate twice the number of vcpus as Stage-2 mmu structures because
> that's sufficient for each vcpu running two translation regimes without
> having to flush the Stage-2 page tables.

That still leaves room for interpretation. Why is running two translation
regimes per VCPU the target for the allocation? Is it because it's a nice
number? Is it because KVM only supports two translation regimes per VCPU and
going beyond that will make it impossible to run the VM? Is it because, in
practice, KVM doesn't expect most people to run more than that? Is it because
KVM doesn't recommend or hasn't been tested with running more than that?

> 
> Co-developed-by: Christoffer Dall <christoffer.dall@arm.com>
> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/include/asm/kvm_host.h   |  29 +++++
>  arch/arm64/include/asm/kvm_mmu.h    |   9 ++
>  arch/arm64/include/asm/kvm_nested.h |   7 +
>  arch/arm64/kvm/arm.c                |  16 ++-
>  arch/arm64/kvm/mmu.c                |  29 +++--
>  arch/arm64/kvm/nested.c             | 195 ++++++++++++++++++++++++++++
>  6 files changed, 275 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index fa253f08e0fd..a15183d0e1bf 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -101,14 +101,43 @@ struct kvm_s2_mmu {
>  	int __percpu *last_vcpu_ran;
>  
>  	struct kvm_arch *arch;
> +
> +	/*
> +	 * For a shadow stage-2 MMU, the virtual vttbr programmed by the guest
> +	 * hypervisor.  Unused for kvm_arch->mmu. Set to 1 when the structure
> +	 * contains no valid information.

I think the comment contradicts itself: 1 is a valid hardware value for
VTTBR_EL2 (VMID = 0, BADDR = 0, CnP = 1), so either this is not the VTTBR
programmed by the guest hypervisor, or treating the value 1 as invalid is
incorrect.

Seeing how this is only used to hold vmid+baddr, maybe it's worth considering
renaming the field to vmid_baddr to avoid confusion about what it means.
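
Something like this is what I have in mind for the rename (only a sketch on
top of this patch, and "vmid_baddr" is just a name I made up):

	/*
	 * For a shadow stage-2 MMU, the VMID+BADDR part of the VTTBR_EL2
	 * programmed by the guest hypervisor, with CnP masked out. Unused
	 * for kvm_arch->mmu. Set to 1 when the structure contains no
	 * valid information.
	 */
	u64	vmid_baddr;

with kvm_s2_mmu_valid() checking !(mmu->vmid_baddr & 1) like it does today.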

> +	 */
> +	u64	vttbr;
> +
> +	/* true when this represents a nested context where virtual HCR_EL2.VM == 1 */
> +	bool	nested_stage2_enabled;
> +
> +	/*
> +	 *  0: Nobody is currently using this, check vttbr for validity
> +	 * >0: Somebody is actively using this.
> +	 */
> +	atomic_t refcnt;
>  };
>  
> +static inline bool kvm_s2_mmu_valid(struct kvm_s2_mmu *mmu)
> +{
> +	return !(mmu->vttbr & 1);
> +}
> +
>  struct kvm_arch_memory_slot {
>  };
>  
>  struct kvm_arch {
>  	struct kvm_s2_mmu mmu;
>  
> +	/*
> +	 * Stage 2 paging stage for VMs with nested virtual using a virtual
                          ^^^^^
Is that correct or was it supposed to say "state"? Also, "[..] VMs with nested
virtual using [..]" sounds wrong to me.

> +	 * VMID.
> +	 */
> +	struct kvm_s2_mmu *nested_mmus;
> +	size_t nested_mmus_size;
> +	int nested_mmus_next;
> +
>  	/* VTCR_EL2 value for this VM */
>  	u64    vtcr;
>  
> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> index 1b314b2a69bc..0750d022bbf8 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -116,6 +116,7 @@ alternative_cb_end
>  #include <asm/cacheflush.h>
>  #include <asm/mmu_context.h>
>  #include <asm/kvm_emulate.h>
> +#include <asm/kvm_nested.h>
>  
>  void kvm_update_va_mask(struct alt_instr *alt,
>  			__le32 *origptr, __le32 *updptr, int nr_inst);
> @@ -161,6 +162,7 @@ int create_hyp_exec_mappings(phys_addr_t phys_addr, size_t size,
>  			     void **haddr);
>  void free_hyp_pgds(void);
>  
> +void kvm_unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size);
>  void stage2_unmap_vm(struct kvm *kvm);
>  int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu);
>  void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu);
> @@ -296,5 +298,12 @@ static inline struct kvm *kvm_s2_mmu_to_kvm(struct kvm_s2_mmu *mmu)
>  {
>  	return container_of(mmu->arch, struct kvm, arch);
>  }
> +
> +static inline u64 get_vmid(u64 vttbr)
> +{
> +	return (vttbr & VTTBR_VMID_MASK(kvm_get_vmid_bits())) >>
> +		VTTBR_VMID_SHIFT;
> +}
> +
>  #endif /* __ASSEMBLY__ */
>  #endif /* __ARM64_KVM_MMU_H__ */
> diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
> index 7d398510fd9d..8bb7159f2b6b 100644
> --- a/arch/arm64/include/asm/kvm_nested.h
> +++ b/arch/arm64/include/asm/kvm_nested.h
> @@ -65,6 +65,13 @@ static inline u64 translate_cnthctl_el2_to_cntkctl_el1(u64 cnthctl)
>  		(cnthctl & (CNTHCTL_EVNTI | CNTHCTL_EVNTDIR | CNTHCTL_EVNTEN)));
>  }
>  
> +extern void kvm_init_nested(struct kvm *kvm);
> +extern int kvm_vcpu_init_nested(struct kvm_vcpu *vcpu);
> +extern void kvm_init_nested_s2_mmu(struct kvm_s2_mmu *mmu);
> +extern struct kvm_s2_mmu *lookup_s2_mmu(struct kvm *kvm, u64 vttbr, u64 hcr);
> +extern void kvm_vcpu_load_hw_mmu(struct kvm_vcpu *vcpu);
> +extern void kvm_vcpu_put_hw_mmu(struct kvm_vcpu *vcpu);
> +
>  int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe);
>  extern bool __forward_traps(struct kvm_vcpu *vcpu, unsigned int reg,
>  			    u64 control_bit);
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 06ca11e90482..14f85f1e15b2 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -37,6 +37,7 @@
>  #include <asm/kvm_arm.h>
>  #include <asm/kvm_asm.h>
>  #include <asm/kvm_mmu.h>
> +#include <asm/kvm_nested.h>
>  #include <asm/kvm_emulate.h>
>  #include <asm/sections.h>
>  
> @@ -146,6 +147,8 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>  	if (ret)
>  		return ret;
>  
> +	kvm_init_nested(kvm);
> +
>  	ret = kvm_share_hyp(kvm, kvm + 1);
>  	if (ret)
>  		goto out_free_stage2_pgd;
> @@ -375,6 +378,9 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>  	struct kvm_s2_mmu *mmu;
>  	int *last_ran;
>  
> +	if (vcpu_has_nv(vcpu))
> +		kvm_vcpu_load_hw_mmu(vcpu);
> +
>  	mmu = vcpu->arch.hw_mmu;
>  	last_ran = this_cpu_ptr(mmu->last_vcpu_ran);
>  
> @@ -423,6 +429,9 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>  	kvm_vgic_put(vcpu);
>  	kvm_vcpu_pmu_restore_host(vcpu);
>  
> +	if (vcpu_has_nv(vcpu))
> +		kvm_vcpu_put_hw_mmu(vcpu);
> +
>  	vcpu->cpu = -1;
>  }
>  
> @@ -1122,8 +1131,13 @@ static int kvm_vcpu_set_target(struct kvm_vcpu *vcpu,
>  
>  	vcpu->arch.target = phys_target;
>  
> +	/* Prepare for nested if required */
> +	ret = kvm_vcpu_init_nested(vcpu);
> +
>  	/* Now we know what it is, we can reset it. */
> -	ret = kvm_reset_vcpu(vcpu);
> +	if (!ret)
> +		ret = kvm_reset_vcpu(vcpu);
> +
>  	if (ret) {
>  		vcpu->arch.target = -1;
>  		bitmap_zero(vcpu->arch.features, KVM_VCPU_MAX_FEATURES);
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index bc2aba953299..55525fd5743d 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -162,7 +162,7 @@ static void invalidate_icache_guest_page(void *va, size_t size)
>   * does.
>   */
>  /**
> - * unmap_stage2_range -- Clear stage2 page table entries to unmap a range
> + * __unmap_stage2_range -- Clear stage2 page table entries to unmap a range
>   * @mmu:   The KVM stage-2 MMU pointer
>   * @start: The intermediate physical base address of the range to unmap
>   * @size:  The size of the area to unmap
> @@ -185,7 +185,7 @@ static void __unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64
>  				   may_block));
>  }
>  
> -static void unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size)
> +void kvm_unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size)
>  {
>  	__unmap_stage2_range(mmu, start, size, true);
>  }
> @@ -628,7 +628,20 @@ int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu)
>  	int cpu, err;
>  	struct kvm_pgtable *pgt;
>  
> +	/*
> +	 * If we already have our page tables in place, and that the
> +	 * MMU context is the canonical one, we have a bug somewhere,
> +	 * as this is only supposed to ever happen once per VM.
> +	 *
> +	 * Otherwise, we're building nested page tables, and that's
> +	 * probably because userspace called KVM_ARM_VCPU_INIT more
> +	 * than once on the same vcpu. Since that's actually legal,
> +	 * don't kick a fuss and leave gracefully.
> +	 */
>  	if (mmu->pgt != NULL) {
> +		if (&kvm->arch.mmu != mmu)
> +			return 0;
> +
>  		kvm_err("kvm_arch already initialized?\n");
>  		return -EINVAL;
>  	}
> @@ -654,6 +667,9 @@ int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu)
>  	mmu->pgt = pgt;
>  	mmu->pgd_phys = __pa(pgt->pgd);
>  	WRITE_ONCE(mmu->vmid.vmid_gen, 0);
> +
> +	kvm_init_nested_s2_mmu(mmu);

It might make the code more self-documenting if kvm_init_nested_s2_mmu() were
called only when mmu != &kvm->arch.mmu (we already have a similar check a few
lines above). Again, not a big deal, and it doesn't affect correctness.
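
Something along these lines is what I mean (untested sketch, reusing the
surrounding lines from this hunk):

	mmu->pgt = pgt;
	mmu->pgd_phys = __pa(pgt->pgd);
	WRITE_ONCE(mmu->vmid.vmid_gen, 0);

	/* Only the nested contexts need the extra initialisation */
	if (mmu != &kvm->arch.mmu)
		kvm_init_nested_s2_mmu(mmu);

	return 0;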

> +
>  	return 0;
>  
>  out_destroy_pgtable:
> @@ -699,7 +715,7 @@ static void stage2_unmap_memslot(struct kvm *kvm,
>  
>  		if (!(vma->vm_flags & VM_PFNMAP)) {
>  			gpa_t gpa = addr + (vm_start - memslot->userspace_addr);
> -			unmap_stage2_range(&kvm->arch.mmu, gpa, vm_end - vm_start);
> +			kvm_unmap_stage2_range(&kvm->arch.mmu, gpa, vm_end - vm_start);
>  		}
>  		hva = vm_end;
>  	} while (hva < reg_end);
> @@ -1681,11 +1697,6 @@ void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen)
>  {
>  }
>  
> -void kvm_arch_flush_shadow_all(struct kvm *kvm)
> -{
> -	kvm_free_stage2_pgd(&kvm->arch.mmu);
> -}
> -
>  void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
>  				   struct kvm_memory_slot *slot)
>  {
> @@ -1693,7 +1704,7 @@ void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
>  	phys_addr_t size = slot->npages << PAGE_SHIFT;
>  
>  	spin_lock(&kvm->mmu_lock);
> -	unmap_stage2_range(&kvm->arch.mmu, gpa, size);
> +	kvm_unmap_stage2_range(&kvm->arch.mmu, gpa, size);
>  	spin_unlock(&kvm->mmu_lock);
>  }
>  
> diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
> index 254152cd791e..bfa2b9229173 100644
> --- a/arch/arm64/kvm/nested.c
> +++ b/arch/arm64/kvm/nested.c
> @@ -7,12 +7,189 @@
>  #include <linux/kvm.h>
>  #include <linux/kvm_host.h>
>  
> +#include <asm/kvm_arm.h>
>  #include <asm/kvm_emulate.h>
> +#include <asm/kvm_mmu.h>
>  #include <asm/kvm_nested.h>
>  #include <asm/sysreg.h>
>  
>  #include "sys_regs.h"
>  
> +void kvm_init_nested(struct kvm *kvm)
> +{
> +	kvm->arch.nested_mmus = NULL;
> +	kvm->arch.nested_mmus_size = 0;
> +}
> +
> +int kvm_vcpu_init_nested(struct kvm_vcpu *vcpu)
> +{
> +	struct kvm *kvm = vcpu->kvm;
> +	struct kvm_s2_mmu *tmp;
> +	int num_mmus;
> +	int ret = -ENOMEM;
> +
> +	if (!test_bit(KVM_ARM_VCPU_HAS_EL2, vcpu->arch.features))
> +		return 0;
> +
> +	if (!cpus_have_final_cap(ARM64_HAS_NESTED_VIRT))
> +		return -EINVAL;
> +
> +	mutex_lock(&kvm->lock);
> +
> +	/*
> +	 * Let's treat memory allocation failures as benign: If we fail to
> +	 * allocate anything, return an error and keep the allocated array
> +	 * alive. Userspace may try to recover by intializing the vcpu
> +	 * again, and there is no reason to affect the whole VM for this.
> +	 */
> +	num_mmus = atomic_read(&kvm->online_vcpus) * 2;
> +	tmp = krealloc(kvm->arch.nested_mmus,
> +		       num_mmus * sizeof(*kvm->arch.nested_mmus),
> +		       GFP_KERNEL_ACCOUNT | __GFP_ZERO);
> +	if (tmp) {
> +		/*
> +		 * If we went through a realocation, adjust the MMU
> +		 * back-pointers in the previously initialised
> +		 * pg_table structures.
> +		 */
> +		if (kvm->arch.nested_mmus != tmp) {
> +			int i;
> +
> +			for (i = 0; i < num_mmus - 2; i++)
> +				tmp[i].pgt->mmu = &tmp[i];
> +		}
> +
> +		if (kvm_init_stage2_mmu(kvm, &tmp[num_mmus - 1]) ||
> +		    kvm_init_stage2_mmu(kvm, &tmp[num_mmus - 2])) {
> +			kvm_free_stage2_pgd(&tmp[num_mmus - 1]);
> +			kvm_free_stage2_pgd(&tmp[num_mmus - 2]);
> +		} else {
> +			kvm->arch.nested_mmus_size = num_mmus;
> +			ret = 0;
> +		}
> +
> +		kvm->arch.nested_mmus = tmp;
> +	}
> +
> +	mutex_unlock(&kvm->lock);
> +	return ret;
> +}
> +
> +/* Must be called with kvm->lock held */
> +struct kvm_s2_mmu *lookup_s2_mmu(struct kvm *kvm, u64 vttbr, u64 hcr)
> +{
> +	bool nested_stage2_enabled = hcr & HCR_VM;
> +	int i;
> +
> +	/* Don't consider the CnP bit for the vttbr match */
> +	vttbr = vttbr & ~VTTBR_CNP_BIT;
> +
> +	/*
> +	 * Two possibilities when looking up a S2 MMU context:
> +	 *
> +	 * - either S2 is enabled in the guest, and we need a context that
> +         *   is S2-enabled and matches the full VTTBR (VMID+BADDR), which
> +         *   makes it safe from a TLB conflict perspective (a broken guest
> +         *   won't be able to generate them),
> +	 *
> +	 * - or S2 is disabled, and we need a context that is S2-disabled
> +         *   and matches the VMID only, as all TLBs are tagged by VMID even
> +         *   if S2 translation is disabled.
> +	 */
> +	for (i = 0; i < kvm->arch.nested_mmus_size; i++) {
> +		struct kvm_s2_mmu *mmu = &kvm->arch.nested_mmus[i];
> +
> +		if (!kvm_s2_mmu_valid(mmu))
> +			continue;
> +
> +		if (nested_stage2_enabled &&
> +		    mmu->nested_stage2_enabled &&
> +		    vttbr == mmu->vttbr)
> +			return mmu;
> +
> +		if (!nested_stage2_enabled &&
> +		    !mmu->nested_stage2_enabled &&
> +		    get_vmid(vttbr) == get_vmid(mmu->vttbr))
> +			return mmu;
> +	}
> +	return NULL;
> +}
> +
> +static struct kvm_s2_mmu *get_s2_mmu_nested(struct kvm_vcpu *vcpu)
> +{
> +	struct kvm *kvm = vcpu->kvm;
> +	u64 vttbr = vcpu_read_sys_reg(vcpu, VTTBR_EL2);
> +	u64 hcr= vcpu_read_sys_reg(vcpu, HCR_EL2);
> +	struct kvm_s2_mmu *s2_mmu;
> +	int i;
> +
> +	s2_mmu = lookup_s2_mmu(kvm, vttbr, hcr);
> +	if (s2_mmu)
> +		goto out;
> +
> +	/*
> +	 * Make sure we don't always search from the same point, or we
> +	 * will always reuse a potentially active context, leaving
> +	 * free contexts unused.
> +	 */
> +	for (i = kvm->arch.nested_mmus_next;
> +	     i < (kvm->arch.nested_mmus_size + kvm->arch.nested_mmus_next);
> +	     i++) {
> +		s2_mmu = &kvm->arch.nested_mmus[i % kvm->arch.nested_mmus_size];
> +
> +		if (atomic_read(&s2_mmu->refcnt) == 0)
> +			break;
> +	}
> +	BUG_ON(atomic_read(&s2_mmu->refcnt)); /* We have struct MMUs to spare */
> +
> +	/* Set the scene for the next search */
> +	kvm->arch.nested_mmus_next = (i + 1) % kvm->arch.nested_mmus_size;
> +
> +	if (kvm_s2_mmu_valid(s2_mmu)) {
> +		/* Clear the old state */
> +		kvm_unmap_stage2_range(s2_mmu, 0, kvm_phys_size(kvm));
> +		if (s2_mmu->vmid.vmid_gen)
> +			kvm_call_hyp(__kvm_tlb_flush_vmid, s2_mmu);
> +	}
> +
> +	/*
> +	 * The virtual VMID (modulo CnP) will be used as a key when matching
> +	 * an existing kvm_s2_mmu.
> +	 */
> +	s2_mmu->vttbr = vttbr & ~VTTBR_CNP_BIT;
> +	s2_mmu->nested_stage2_enabled = hcr & HCR_VM;
> +
> +out:
> +	atomic_inc(&s2_mmu->refcnt);
> +	return s2_mmu;
> +}
> +
> +void kvm_init_nested_s2_mmu(struct kvm_s2_mmu *mmu)
> +{
> +	mmu->vttbr = 1;
> +	mmu->nested_stage2_enabled = false;
> +	atomic_set(&mmu->refcnt, 0);
> +}
> +
> +void kvm_vcpu_load_hw_mmu(struct kvm_vcpu *vcpu)
> +{
> +	if (is_hyp_ctxt(vcpu)) {
> +		vcpu->arch.hw_mmu = &vcpu->kvm->arch.mmu;
> +	} else {
> +		spin_lock(&vcpu->kvm->mmu_lock);
> +		vcpu->arch.hw_mmu = get_s2_mmu_nested(vcpu);
> +		spin_unlock(&vcpu->kvm->mmu_lock);
> +	}
> +}
> +
> +void kvm_vcpu_put_hw_mmu(struct kvm_vcpu *vcpu)
> +{
> +	if (vcpu->arch.hw_mmu != &vcpu->kvm->arch.mmu) {
> +		atomic_dec(&vcpu->arch.hw_mmu->refcnt);
> +		vcpu->arch.hw_mmu = NULL;
> +	}
> +}
> +
>  /*
>   * Inject wfx to the virtual EL2 if this is not from the virtual EL2 and
>   * the virtual HCR_EL2.TWX is set. Otherwise, let the host hypervisor
> @@ -31,6 +208,24 @@ int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe)
>  	return -EINVAL;
>  }
>  
> +void kvm_arch_flush_shadow_all(struct kvm *kvm)
> +{
> +	int i;
> +
> +	for (i = 0; i < kvm->arch.nested_mmus_size; i++) {
> +		struct kvm_s2_mmu *mmu = &kvm->arch.nested_mmus[i];
> +
> +		WARN_ON(atomic_read(&mmu->refcnt));
> +
> +		if (!atomic_read(&mmu->refcnt))
> +			kvm_free_stage2_pgd(mmu);
> +	}
> +	kfree(kvm->arch.nested_mmus);
> +	kvm->arch.nested_mmus = NULL;
> +	kvm->arch.nested_mmus_size = 0;
> +	kvm_free_stage2_pgd(&kvm->arch.mmu);
> +}
> +

The code above looks good.

Thanks,
Alex

>  /*
>   * Our emulated CPU doesn't support all the possible features. For the
>   * sake of simplicity (and probably mental sanity), wipe out a number
> -- 
> 2.30.2
> 


^ permalink raw reply	[flat|nested] 378+ messages in thread

* Re: [PATCH v6 36/64] KVM: arm64: nv: Handle shadow stage 2 page faults
  2022-01-28 12:18   ` Marc Zyngier
  (?)
@ 2022-02-17 15:23     ` Alexandru Elisei
  -1 siblings, 0 replies; 378+ messages in thread
From: Alexandru Elisei @ 2022-02-17 15:23 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, Andre Przywara, Christoffer Dall,
	Jintack Lim, Haibo Xu, Ganapatrao Kulkarni, Chase Conklin,
	Russell King (Oracle),
	James Morse, Suzuki K Poulose, karl.heubaum, mihai.carabas,
	miguel.luis, kernel-team

Hi,

On Fri, Jan 28, 2022 at 12:18:44PM +0000, Marc Zyngier wrote:
> If we are faulting on a shadow stage 2 translation, we first walk the
> guest hypervisor's stage 2 page table to see if it has a mapping. If
> not, we inject a stage 2 page fault to the virtual EL2. Otherwise, we
> create a mapping in the shadow stage 2 page table.
> 
> Note that we have to deal with two IPAs when we got a shadow stage 2
> page fault. One is the address we faulted on, and is in the L2 guest
> phys space. The other is from the guest stage-2 page table walk, and is
> in the L1 guest phys space.  To differentiate them, we rename variables
> so that fault_ipa is used for the former and ipa is used for the latter.
> 
> Co-developed-by: Christoffer Dall <christoffer.dall@linaro.org>
> Co-developed-by: Jintack Lim <jintack.lim@linaro.org>
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
> [maz: rewrote this multiple times...]
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/include/asm/kvm_emulate.h |   6 ++
>  arch/arm64/include/asm/kvm_nested.h  |  18 +++++
>  arch/arm64/kvm/mmu.c                 | 102 +++++++++++++++++++++++----
>  arch/arm64/kvm/nested.c              |  48 +++++++++++++
>  4 files changed, 159 insertions(+), 15 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> index ff8980a39ee8..dc6eeb0cc8a9 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -602,4 +602,10 @@ static inline bool vcpu_has_feature(struct kvm_vcpu *vcpu, int feature)
>  	return test_bit(feature, vcpu->arch.features);
>  }
>  
> +static inline bool kvm_is_shadow_s2_fault(struct kvm_vcpu *vcpu)
> +{
> +	return (vcpu->arch.hw_mmu != &vcpu->kvm->arch.mmu &&
> +		vcpu->arch.hw_mmu->nested_stage2_enabled);
> +}
> +
>  #endif /* __ARM64_KVM_EMULATE_H__ */
> diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
> index 48cf288ea238..4fad4d3848ce 100644
> --- a/arch/arm64/include/asm/kvm_nested.h
> +++ b/arch/arm64/include/asm/kvm_nested.h
> @@ -82,9 +82,27 @@ struct kvm_s2_trans {
>  	u64 upper_attr;
>  };
>  
> +static inline phys_addr_t kvm_s2_trans_output(struct kvm_s2_trans *trans)
> +{
> +	return trans->output;
> +}
> +
> +static inline unsigned long kvm_s2_trans_size(struct kvm_s2_trans *trans)
> +{
> +	return trans->block_size;
> +}
> +
> +static inline u32 kvm_s2_trans_esr(struct kvm_s2_trans *trans)
> +{
> +	return trans->esr;
> +}
> +
>  extern int kvm_walk_nested_s2(struct kvm_vcpu *vcpu, phys_addr_t gipa,
>  			      struct kvm_s2_trans *result);
>  
> +extern int kvm_s2_handle_perm_fault(struct kvm_vcpu *vcpu,
> +				    struct kvm_s2_trans *trans);
> +extern int kvm_inject_s2_fault(struct kvm_vcpu *vcpu, u64 esr_el2);
>  int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe);
>  extern bool __forward_traps(struct kvm_vcpu *vcpu, unsigned int reg,
>  			    u64 control_bit);
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 55525fd5743d..36f7ecb4f81b 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -969,7 +969,7 @@ static bool fault_supports_stage2_huge_mapping(struct kvm_memory_slot *memslot,
>  static unsigned long
>  transparent_hugepage_adjust(struct kvm *kvm, struct kvm_memory_slot *memslot,
>  			    unsigned long hva, kvm_pfn_t *pfnp,
> -			    phys_addr_t *ipap)
> +			    phys_addr_t *ipap, phys_addr_t *fault_ipap)
>  {
>  	kvm_pfn_t pfn = *pfnp;
>  
> @@ -998,6 +998,7 @@ transparent_hugepage_adjust(struct kvm *kvm, struct kvm_memory_slot *memslot,
>  		 * to PG_head and switch the pfn from a tail page to the head
>  		 * page accordingly.
>  		 */
> +		*fault_ipap &= PMD_MASK;
>  		*ipap &= PMD_MASK;
>  		kvm_release_pfn_clean(pfn);
>  		pfn &= ~(PTRS_PER_PMD - 1);
> @@ -1080,15 +1081,17 @@ static int sanitise_mte_tags(struct kvm *kvm, kvm_pfn_t pfn,
>  }
>  
>  static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> -			  struct kvm_memory_slot *memslot, unsigned long hva,
> -			  unsigned long fault_status)
> +			  struct kvm_s2_trans *nested,
> +			  struct kvm_memory_slot *memslot,
> +			  unsigned long hva, unsigned long fault_status)
>  {
>  	int ret = 0;
> -	bool write_fault, writable, force_pte = false;
> +	bool write_fault, writable;
>  	bool exec_fault;
>  	bool device = false;
>  	bool shared;
>  	unsigned long mmu_seq;
> +	phys_addr_t ipa = fault_ipa;
>  	struct kvm *kvm = vcpu->kvm;
>  	struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
>  	struct vm_area_struct *vma;
> @@ -1100,6 +1103,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	unsigned long vma_pagesize, fault_granule;
>  	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
>  	struct kvm_pgtable *pgt;
> +	unsigned long max_map_size = PUD_SIZE;
>  
>  	fault_granule = 1UL << ARM64_HW_PGTABLE_LEVEL_SHIFT(fault_level);
>  	write_fault = kvm_is_write_fault(vcpu);
> @@ -1128,7 +1132,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	 * memslots.
>  	 */
>  	if (logging_active) {
> -		force_pte = true;
> +		max_map_size = vma_pagesize = PAGE_SIZE;

I don't think it's needed to set vma_pagesize here, because...

>  		vma_shift = PAGE_SHIFT;
>  	} else {
>  		vma_shift = get_vma_page_shift(vma, hva);
> @@ -1152,7 +1156,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  		fallthrough;
>  	case CONT_PTE_SHIFT:
>  		vma_shift = PAGE_SHIFT;
> -		force_pte = true;
> +		max_map_size = PAGE_SIZE;
>  		fallthrough;
>  	case PAGE_SHIFT:
>  		break;
> @@ -1161,10 +1165,25 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	}
>  
>  	vma_pagesize = 1UL << vma_shift;

...it's computed here based on vma_shift.
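
In other words, the logging_active branch could simply be (sketch only; it
should be equivalent, since vma_pagesize is derived from vma_shift below):

	if (logging_active) {
		max_map_size = PAGE_SIZE;
		vma_shift = PAGE_SHIFT;
	} else {
		vma_shift = get_vma_page_shift(vma, hva);
	}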

> +
> +	if (kvm_is_shadow_s2_fault(vcpu)) {
> +		ipa = kvm_s2_trans_output(nested);
> +
> +		/*
> +		 * If we're about to create a shadow stage 2 entry, then we
> +		 * can only create a block mapping if the guest stage 2 page
> +		 * table uses at least as big a mapping.
> +		 */
> +		max_map_size = min(kvm_s2_trans_size(nested), max_map_size);
> +	}
> +
> +	vma_pagesize = min(vma_pagesize, max_map_size);

I find the usage of max_map_size slightly confusing, because when max_map_size
is PAGE_SIZE, vma_pagesize is also PAGE_SIZE, so max_map_size doesn't affect
the result of the min() function above. In all other cases, max_map_size is
PUD_SIZE (the maximum possible value), so the resulting page size is only
influenced by kvm_s2_trans_size() and the vma_pagesize.

Wouldn't it make more sense to rework the hunk above as:

	if (kvm_is_shadow_s2_fault(vcpu)) {
		ipa = kvm_s2_trans_output(nested);
		vma_pagesize = min(kvm_s2_trans_size(nested), vma_pagesize);
	}

and remove the max_map_size local variable entirely? Or is there something I'm
missing here?

> +
>  	if (vma_pagesize == PMD_SIZE || vma_pagesize == PUD_SIZE)
>  		fault_ipa &= ~(vma_pagesize - 1);
>  
> -	gfn = fault_ipa >> PAGE_SHIFT;
> +	gfn = ipa >> PAGE_SHIFT;
> +
>  	mmap_read_unlock(current->mm);
>  
>  	/*
> @@ -1237,12 +1256,14 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	 * If we are not forced to use page mapping, check if we are
>  	 * backed by a THP and thus use block mapping if possible.
>  	 */
> -	if (vma_pagesize == PAGE_SIZE && !(force_pte || device)) {
> +	if (vma_pagesize == PAGE_SIZE &&
> +	    !(max_map_size == PAGE_SIZE || device)) {
>  		if (fault_status == FSC_PERM && fault_granule > PAGE_SIZE)
>  			vma_pagesize = fault_granule;
>  		else
>  			vma_pagesize = transparent_hugepage_adjust(kvm, memslot,
>  								   hva, &pfn,
> +								   &ipa,

I don't understand this. The local variable ipa (L1 guest's virtual stage 2
output) is never used after this point. kvm_pgtable_stage2_{map,relax_perms}
below uses fault_ipa (which is the correct address to use).

>  								   &fault_ipa);
>  	}
>  
> @@ -1326,8 +1347,10 @@ static void handle_access_fault(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
>  int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
>  {
>  	unsigned long fault_status;
> -	phys_addr_t fault_ipa;
> +	phys_addr_t fault_ipa; /* The address we faulted on */
> +	phys_addr_t ipa; /* Always the IPA in the L1 guest phys space */
>  	struct kvm_memory_slot *memslot;
> +	struct kvm_s2_trans nested_trans;
>  	unsigned long hva;
>  	bool is_iabt, write_fault, writable;
>  	gfn_t gfn;
> @@ -1335,7 +1358,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
>  
>  	fault_status = kvm_vcpu_trap_get_fault_type(vcpu);
>  
> -	fault_ipa = kvm_vcpu_get_fault_ipa(vcpu);
> +	ipa = fault_ipa = kvm_vcpu_get_fault_ipa(vcpu);
>  	is_iabt = kvm_vcpu_trap_is_iabt(vcpu);
>  
>  	/* Synchronous External Abort? */
> @@ -1356,6 +1379,12 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
>  	/* Check the stage-2 fault is trans. fault or write fault */
>  	if (fault_status != FSC_FAULT && fault_status != FSC_PERM &&
>  	    fault_status != FSC_ACCESS) {
> +		/*
> +		 * We must never see an address size fault on shadow stage 2
> +		 * page table walk, because we would have injected an addr
> +		 * size fault when we walked the nested s2 page and not
> +		 * create the shadow entry.
> +		 */
>  		kvm_err("Unsupported FSC: EC=%#x xFSC=%#lx ESR_EL2=%#lx\n",
>  			kvm_vcpu_trap_get_class(vcpu),
>  			(unsigned long)kvm_vcpu_trap_get_fault(vcpu),
> @@ -1365,7 +1394,49 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
>  
>  	idx = srcu_read_lock(&vcpu->kvm->srcu);
>  
> -	gfn = fault_ipa >> PAGE_SHIFT;
> +	/*
> +	 * We may have faulted on a shadow stage 2 page table if we are
> +	 * running a nested guest.  In this case, we have to resolve the L2
> +	 * IPA to the L1 IPA first, before knowing what kind of memory should
> +	 * back the L1 IPA.
> +	 *
> +	 * If the shadow stage 2 page table walk faults, then we simply inject
> +	 * this to the guest and carry on.
> +	 */
> +	if (kvm_is_shadow_s2_fault(vcpu)) {
> +		u32 esr;
> +
> +		ret = kvm_walk_nested_s2(vcpu, fault_ipa, &nested_trans);
> +		esr = kvm_s2_trans_esr(&nested_trans);
> +		if (esr)
> +			kvm_inject_s2_fault(vcpu, esr);

This code reads very strangely, because ret has the same semantic meaning as esr:
they are both non-zero when the software stage 2 walker encounters a virtual
stage 2 fault.

I think something like this would read a lot better:

		ret = kvm_walk_nested_s2(vcpu, fault_ipa, &nested);
		if (ret) {
			esr = kvm_s2_trans_esr(&nested_trans);
			kvm_inject_s2_fault(vcpu, esr);
			goto out_unlock;
		}
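
The permission fault check just below could then follow the same pattern
(again just a sketch; as far as I can tell it is equivalent, since the esr is
only non-zero when the fault is forwarded):

		ret = kvm_s2_handle_perm_fault(vcpu, &nested_trans);
		if (ret) {
			esr = kvm_s2_trans_esr(&nested_trans);
			kvm_inject_s2_fault(vcpu, esr);
			goto out_unlock;
		}

		ipa = kvm_s2_trans_output(&nested_trans);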

> +		if (ret)
> +			goto out_unlock;
> +
> +		ret = kvm_s2_handle_perm_fault(vcpu, &nested_trans);
> +		esr = kvm_s2_trans_esr(&nested_trans);
> +		if (esr)
> +			kvm_inject_s2_fault(vcpu, esr);
> +		if (ret)
> +			goto out_unlock;
> +
> +		ipa = kvm_s2_trans_output(&nested_trans);
> +	} else {
> +		nested_trans = (struct kvm_s2_trans) {
> +			/*
> +			 * Default to RWX so that we don't filter
> +			 * anything while evaluating the permissions.
> +			 */
> +			.writable	= true,
> +			.readable	= true,
> +			.upper_attr	= 0,
> +			.output		= ipa,
> +			.level		= kvm_vcpu_trap_get_fault_level(vcpu),
> +			.esr		= kvm_vcpu_get_esr(vcpu),
> +		};

I don't think this is needed, user_mem_abort() reads nested_trans only if
kvm_is_shadow_s2_fault().

> +	}
> +
> +	gfn = ipa >> PAGE_SHIFT;
>  	memslot = gfn_to_memslot(vcpu->kvm, gfn);
>  	hva = gfn_to_hva_memslot_prot(memslot, gfn, &writable);
>  	write_fault = kvm_is_write_fault(vcpu);
> @@ -1409,13 +1480,13 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
>  		 * faulting VA. This is always 12 bits, irrespective
>  		 * of the page size.
>  		 */
> -		fault_ipa |= kvm_vcpu_get_hfar(vcpu) & ((1 << 12) - 1);
> -		ret = io_mem_abort(vcpu, fault_ipa);
> +		ipa |= kvm_vcpu_get_hfar(vcpu) & ((1 << 12) - 1);
> +		ret = io_mem_abort(vcpu, ipa);
>  		goto out_unlock;
>  	}
>  
>  	/* Userspace should not be able to register out-of-bounds IPAs */
> -	VM_BUG_ON(fault_ipa >= kvm_phys_size(vcpu->kvm));
> +	VM_BUG_ON(ipa >= kvm_phys_size(vcpu->kvm));
>  
>  	if (fault_status == FSC_ACCESS) {
>  		handle_access_fault(vcpu, fault_ipa);
> @@ -1423,7 +1494,8 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
>  		goto out_unlock;
>  	}
>  
> -	ret = user_mem_abort(vcpu, fault_ipa, memslot, hva, fault_status);
> +	ret = user_mem_abort(vcpu, fault_ipa, &nested_trans,
> +			     memslot, hva, fault_status);
>  	if (ret == 0)
>  		ret = 1;
>  out:

I've tried to follow the nested fault logic in kvm_handle_guest_abort(), but I
found that extremely difficult because it's not clear what checks and error
conditions apply to the nested fault case versus the regular case.

For example, when handling this condition:

if (kvm_is_error_hva(hva) || (write_fault && !writable))

I think the second part (write_fault && !writable) is always false in the nested
case, because kvm_s2_handle_perm_fault() already has this check.

This part:

		if (kvm_is_error_hva(hva) && kvm_vcpu_dabt_is_cm(vcpu)) {
			kvm_incr_pc(vcpu);
			ret = 1;
			goto out_unlock;
		}

should the DABT be reflected back to the L1 guest? At this point we know that
fault_ipa is mapped to ipa in the virtual stage 2.

And this:

	if (fault_status == FSC_ACCESS) {
		handle_access_fault(vcpu, fault_ipa);
		ret = 1;
		goto out_unlock;
	}

I couldn't figure out if kvm_walk_nested_s2() already handles that case.

IMO, it might make kvm_handle_guest_abort() a lot easier to understand if the
two paths were clearly separated into distinct functions, one that handles the
nested abort case, and one that handles the normal case. On the other hand,
that might make maintaining the code a lot harder.
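
Roughly, after the common checks at the top of kvm_handle_guest_abort(),
something like this (the helper names are made up, just to illustrate the
idea):

	if (kvm_is_shadow_s2_fault(vcpu))
		ret = handle_shadow_s2_abort(vcpu, fault_ipa);
	else
		ret = handle_direct_s2_abort(vcpu, fault_ipa);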

What do you think?

Thanks,
Alex

> diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
> index c2a99b672368..0a9708f776fc 100644
> --- a/arch/arm64/kvm/nested.c
> +++ b/arch/arm64/kvm/nested.c
> @@ -108,6 +108,15 @@ static u32 compute_fsc(int level, u32 fsc)
>  	return fsc | (level & 0x3);
>  }
>  
> +static int esr_s2_fault(struct kvm_vcpu *vcpu, int level, u32 fsc)
> +{
> +	u32 esr;
> +
> +	esr = kvm_vcpu_get_esr(vcpu) & ~ESR_ELx_FSC;
> +	esr |= compute_fsc(level, fsc);
> +	return esr;
> +}
> +
>  static int check_base_s2_limits(struct s2_walk_info *wi,
>  				int level, int input_size, int stride)
>  {
> @@ -457,6 +466,45 @@ void kvm_vcpu_put_hw_mmu(struct kvm_vcpu *vcpu)
>  	}
>  }
>  
> +/*
> + * Returns non-zero if permission fault is handled by injecting it to the next
> + * level hypervisor.
> + */
> +int kvm_s2_handle_perm_fault(struct kvm_vcpu *vcpu, struct kvm_s2_trans *trans)
> +{
> +	unsigned long fault_status = kvm_vcpu_trap_get_fault_type(vcpu);
> +	bool forward_fault = false;
> +
> +	trans->esr = 0;
> +
> +	if (fault_status != FSC_PERM)
> +		return 0;
> +
> +	if (kvm_vcpu_trap_is_iabt(vcpu)) {
> +		forward_fault = (trans->upper_attr & BIT(54));
> +	} else {
> +		bool write_fault = kvm_is_write_fault(vcpu);
> +
> +		forward_fault = ((write_fault && !trans->writable) ||
> +				 (!write_fault && !trans->readable));
> +	}
> +
> +	if (forward_fault) {
> +		trans->esr = esr_s2_fault(vcpu, trans->level, ESR_ELx_FSC_PERM);
> +		return 1;
> +	}
> +
> +	return 0;
> +}
> +
> +int kvm_inject_s2_fault(struct kvm_vcpu *vcpu, u64 esr_el2)
> +{
> +	vcpu_write_sys_reg(vcpu, vcpu->arch.fault.far_el2, FAR_EL2);
> +	vcpu_write_sys_reg(vcpu, vcpu->arch.fault.hpfar_el2, HPFAR_EL2);
> +
> +	return kvm_inject_nested_sync(vcpu, esr_el2);
> +}
> +
>  /*
>   * Inject wfx to the virtual EL2 if this is not from the virtual EL2 and
>   * the virtual HCR_EL2.TWX is set. Otherwise, let the host hypervisor
> -- 
> 2.30.2
> 

^ permalink raw reply	[flat|nested] 378+ messages in thread

* Re: [PATCH v6 36/64] KVM: arm64: nv: Handle shadow stage 2 page faults
@ 2022-02-17 15:23     ` Alexandru Elisei
  0 siblings, 0 replies; 378+ messages in thread
From: Alexandru Elisei @ 2022-02-17 15:23 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kernel-team, kvm, Andre Przywara, Christoffer Dall,
	Chase Conklin, kvmarm, mihai.carabas, Ganapatrao Kulkarni,
	Russell King (Oracle),
	linux-arm-kernel

Hi,

On Fri, Jan 28, 2022 at 12:18:44PM +0000, Marc Zyngier wrote:
> If we are faulting on a shadow stage 2 translation, we first walk the
> guest hypervisor's stage 2 page table to see if it has a mapping. If
> not, we inject a stage 2 page fault to the virtual EL2. Otherwise, we
> create a mapping in the shadow stage 2 page table.
> 
> Note that we have to deal with two IPAs when we got a shadow stage 2
> page fault. One is the address we faulted on, and is in the L2 guest
> phys space. The other is from the guest stage-2 page table walk, and is
> in the L1 guest phys space.  To differentiate them, we rename variables
> so that fault_ipa is used for the former and ipa is used for the latter.
> 
> Co-developed-by: Christoffer Dall <christoffer.dall@linaro.org>
> Co-developed-by: Jintack Lim <jintack.lim@linaro.org>
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
> [maz: rewrote this multiple times...]
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/include/asm/kvm_emulate.h |   6 ++
>  arch/arm64/include/asm/kvm_nested.h  |  18 +++++
>  arch/arm64/kvm/mmu.c                 | 102 +++++++++++++++++++++++----
>  arch/arm64/kvm/nested.c              |  48 +++++++++++++
>  4 files changed, 159 insertions(+), 15 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> index ff8980a39ee8..dc6eeb0cc8a9 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -602,4 +602,10 @@ static inline bool vcpu_has_feature(struct kvm_vcpu *vcpu, int feature)
>  	return test_bit(feature, vcpu->arch.features);
>  }
>  
> +static inline bool kvm_is_shadow_s2_fault(struct kvm_vcpu *vcpu)
> +{
> +	return (vcpu->arch.hw_mmu != &vcpu->kvm->arch.mmu &&
> +		vcpu->arch.hw_mmu->nested_stage2_enabled);
> +}
> +
>  #endif /* __ARM64_KVM_EMULATE_H__ */
> diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
> index 48cf288ea238..4fad4d3848ce 100644
> --- a/arch/arm64/include/asm/kvm_nested.h
> +++ b/arch/arm64/include/asm/kvm_nested.h
> @@ -82,9 +82,27 @@ struct kvm_s2_trans {
>  	u64 upper_attr;
>  };
>  
> +static inline phys_addr_t kvm_s2_trans_output(struct kvm_s2_trans *trans)
> +{
> +	return trans->output;
> +}
> +
> +static inline unsigned long kvm_s2_trans_size(struct kvm_s2_trans *trans)
> +{
> +	return trans->block_size;
> +}
> +
> +static inline u32 kvm_s2_trans_esr(struct kvm_s2_trans *trans)
> +{
> +	return trans->esr;
> +}
> +
>  extern int kvm_walk_nested_s2(struct kvm_vcpu *vcpu, phys_addr_t gipa,
>  			      struct kvm_s2_trans *result);
>  
> +extern int kvm_s2_handle_perm_fault(struct kvm_vcpu *vcpu,
> +				    struct kvm_s2_trans *trans);
> +extern int kvm_inject_s2_fault(struct kvm_vcpu *vcpu, u64 esr_el2);
>  int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe);
>  extern bool __forward_traps(struct kvm_vcpu *vcpu, unsigned int reg,
>  			    u64 control_bit);
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 55525fd5743d..36f7ecb4f81b 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -969,7 +969,7 @@ static bool fault_supports_stage2_huge_mapping(struct kvm_memory_slot *memslot,
>  static unsigned long
>  transparent_hugepage_adjust(struct kvm *kvm, struct kvm_memory_slot *memslot,
>  			    unsigned long hva, kvm_pfn_t *pfnp,
> -			    phys_addr_t *ipap)
> +			    phys_addr_t *ipap, phys_addr_t *fault_ipap)
>  {
>  	kvm_pfn_t pfn = *pfnp;
>  
> @@ -998,6 +998,7 @@ transparent_hugepage_adjust(struct kvm *kvm, struct kvm_memory_slot *memslot,
>  		 * to PG_head and switch the pfn from a tail page to the head
>  		 * page accordingly.
>  		 */
> +		*fault_ipap &= PMD_MASK;
>  		*ipap &= PMD_MASK;
>  		kvm_release_pfn_clean(pfn);
>  		pfn &= ~(PTRS_PER_PMD - 1);
> @@ -1080,15 +1081,17 @@ static int sanitise_mte_tags(struct kvm *kvm, kvm_pfn_t pfn,
>  }
>  
>  static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> -			  struct kvm_memory_slot *memslot, unsigned long hva,
> -			  unsigned long fault_status)
> +			  struct kvm_s2_trans *nested,
> +			  struct kvm_memory_slot *memslot,
> +			  unsigned long hva, unsigned long fault_status)
>  {
>  	int ret = 0;
> -	bool write_fault, writable, force_pte = false;
> +	bool write_fault, writable;
>  	bool exec_fault;
>  	bool device = false;
>  	bool shared;
>  	unsigned long mmu_seq;
> +	phys_addr_t ipa = fault_ipa;
>  	struct kvm *kvm = vcpu->kvm;
>  	struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
>  	struct vm_area_struct *vma;
> @@ -1100,6 +1103,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	unsigned long vma_pagesize, fault_granule;
>  	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
>  	struct kvm_pgtable *pgt;
> +	unsigned long max_map_size = PUD_SIZE;
>  
>  	fault_granule = 1UL << ARM64_HW_PGTABLE_LEVEL_SHIFT(fault_level);
>  	write_fault = kvm_is_write_fault(vcpu);
> @@ -1128,7 +1132,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	 * memslots.
>  	 */
>  	if (logging_active) {
> -		force_pte = true;
> +		max_map_size = vma_pagesize = PAGE_SIZE;

I don't think it's needed to set vma_pagesize here, because...

>  		vma_shift = PAGE_SHIFT;
>  	} else {
>  		vma_shift = get_vma_page_shift(vma, hva);
> @@ -1152,7 +1156,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  		fallthrough;
>  	case CONT_PTE_SHIFT:
>  		vma_shift = PAGE_SHIFT;
> -		force_pte = true;
> +		max_map_size = PAGE_SIZE;
>  		fallthrough;
>  	case PAGE_SHIFT:
>  		break;
> @@ -1161,10 +1165,25 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	}
>  
>  	vma_pagesize = 1UL << vma_shift;

...it's computed here based on vma_shift.

> +
> +	if (kvm_is_shadow_s2_fault(vcpu)) {
> +		ipa = kvm_s2_trans_output(nested);
> +
> +		/*
> +		 * If we're about to create a shadow stage 2 entry, then we
> +		 * can only create a block mapping if the guest stage 2 page
> +		 * table uses at least as big a mapping.
> +		 */
> +		max_map_size = min(kvm_s2_trans_size(nested), max_map_size);
> +	}
> +
> +	vma_pagesize = min(vma_pagesize, max_map_size);

I find the usage of max_map_size slightly confusing, because when max_map_size
is PAGE_SIZE, vma_pagesize is also PAGE_SIZE, so max_map_size doesn't affect
the result of the min() function above. In all other cases, max_map_size is
PUD_SIZE (the maximum possible value), so the resulting page size is only
influenced by kvm_s2_trans_size() and the vma_pagesize.

Wouldn't it make more sense to rework the hunk above as:

	if (kvm_is_shadow_s2_fault(vcpu)) {
		ipa = kvm_s2_trans_output(nested);
		vma_pagesize = min(kvm_s2_trans_size(nested), vma_pagesize);
	}

and remove the max_map_size local variable entirely? Or is there something I'm
missing here?

> +
>  	if (vma_pagesize == PMD_SIZE || vma_pagesize == PUD_SIZE)
>  		fault_ipa &= ~(vma_pagesize - 1);
>  
> -	gfn = fault_ipa >> PAGE_SHIFT;
> +	gfn = ipa >> PAGE_SHIFT;
> +
>  	mmap_read_unlock(current->mm);
>  
>  	/*
> @@ -1237,12 +1256,14 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	 * If we are not forced to use page mapping, check if we are
>  	 * backed by a THP and thus use block mapping if possible.
>  	 */
> -	if (vma_pagesize == PAGE_SIZE && !(force_pte || device)) {
> +	if (vma_pagesize == PAGE_SIZE &&
> +	    !(max_map_size == PAGE_SIZE || device)) {
>  		if (fault_status == FSC_PERM && fault_granule > PAGE_SIZE)
>  			vma_pagesize = fault_granule;
>  		else
>  			vma_pagesize = transparent_hugepage_adjust(kvm, memslot,
>  								   hva, &pfn,
> +								   &ipa,

I don't understand this. The local variable ipa (L1 guest's virtual stage 2
output) is never used after this point. kvm_pgtable_stage2_{map,relax_perms}
below uses fault_ipa (which is the correct address to use).

>  								   &fault_ipa);
>  	}
>  
> @@ -1326,8 +1347,10 @@ static void handle_access_fault(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
>  int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
>  {
>  	unsigned long fault_status;
> -	phys_addr_t fault_ipa;
> +	phys_addr_t fault_ipa; /* The address we faulted on */
> +	phys_addr_t ipa; /* Always the IPA in the L1 guest phys space */
>  	struct kvm_memory_slot *memslot;
> +	struct kvm_s2_trans nested_trans;
>  	unsigned long hva;
>  	bool is_iabt, write_fault, writable;
>  	gfn_t gfn;
> @@ -1335,7 +1358,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
>  
>  	fault_status = kvm_vcpu_trap_get_fault_type(vcpu);
>  
> -	fault_ipa = kvm_vcpu_get_fault_ipa(vcpu);
> +	ipa = fault_ipa = kvm_vcpu_get_fault_ipa(vcpu);
>  	is_iabt = kvm_vcpu_trap_is_iabt(vcpu);
>  
>  	/* Synchronous External Abort? */
> @@ -1356,6 +1379,12 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
>  	/* Check the stage-2 fault is trans. fault or write fault */
>  	if (fault_status != FSC_FAULT && fault_status != FSC_PERM &&
>  	    fault_status != FSC_ACCESS) {
> +		/*
> +		 * We must never see an address size fault on shadow stage 2
> +		 * page table walk, because we would have injected an addr
> +		 * size fault when we walked the nested s2 page and not
> +		 * create the shadow entry.
> +		 */
>  		kvm_err("Unsupported FSC: EC=%#x xFSC=%#lx ESR_EL2=%#lx\n",
>  			kvm_vcpu_trap_get_class(vcpu),
>  			(unsigned long)kvm_vcpu_trap_get_fault(vcpu),
> @@ -1365,7 +1394,49 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
>  
>  	idx = srcu_read_lock(&vcpu->kvm->srcu);
>  
> -	gfn = fault_ipa >> PAGE_SHIFT;
> +	/*
> +	 * We may have faulted on a shadow stage 2 page table if we are
> +	 * running a nested guest.  In this case, we have to resolve the L2
> +	 * IPA to the L1 IPA first, before knowing what kind of memory should
> +	 * back the L1 IPA.
> +	 *
> +	 * If the shadow stage 2 page table walk faults, then we simply inject
> +	 * this to the guest and carry on.
> +	 */
> +	if (kvm_is_shadow_s2_fault(vcpu)) {
> +		u32 esr;
> +
> +		ret = kvm_walk_nested_s2(vcpu, fault_ipa, &nested_trans);
> +		esr = kvm_s2_trans_esr(&nested_trans);
> +		if (esr)
> +			kvm_inject_s2_fault(vcpu, esr);

This code reads very strangely, because ret has the same semantic meaning as esr:
they are both non-zero when the software stage 2 walker encounters a virtual
stage 2 fault.

I think something like this would read a lot better:

		ret = kvm_walk_nested_s2(vcpu, fault_ipa, &nested);
		if (ret) {
			esr = kvm_s2_trans_esr(&nested_trans);
			kvm_inject_s2_fault(vcpu, esr);
			goto out_unlock;
		}

> +		if (ret)
> +			goto out_unlock;
> +
> +		ret = kvm_s2_handle_perm_fault(vcpu, &nested_trans);
> +		esr = kvm_s2_trans_esr(&nested_trans);
> +		if (esr)
> +			kvm_inject_s2_fault(vcpu, esr);
> +		if (ret)
> +			goto out_unlock;
> +
> +		ipa = kvm_s2_trans_output(&nested_trans);
> +	} else {
> +		nested_trans = (struct kvm_s2_trans) {
> +			/*
> +			 * Default to RWX so that we don't filter
> +			 * anything while evaluating the permissions.
> +			 */
> +			.writable	= true,
> +			.readable	= true,
> +			.upper_attr	= 0,
> +			.output		= ipa,
> +			.level		= kvm_vcpu_trap_get_fault_level(vcpu),
> +			.esr		= kvm_vcpu_get_esr(vcpu),
> +		};

I don't think this is needed, user_mem_abort() reads nested_trans only if
kvm_is_shadow_s2_fault().

> +	}
> +
> +	gfn = ipa >> PAGE_SHIFT;
>  	memslot = gfn_to_memslot(vcpu->kvm, gfn);
>  	hva = gfn_to_hva_memslot_prot(memslot, gfn, &writable);
>  	write_fault = kvm_is_write_fault(vcpu);
> @@ -1409,13 +1480,13 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
>  		 * faulting VA. This is always 12 bits, irrespective
>  		 * of the page size.
>  		 */
> -		fault_ipa |= kvm_vcpu_get_hfar(vcpu) & ((1 << 12) - 1);
> -		ret = io_mem_abort(vcpu, fault_ipa);
> +		ipa |= kvm_vcpu_get_hfar(vcpu) & ((1 << 12) - 1);
> +		ret = io_mem_abort(vcpu, ipa);
>  		goto out_unlock;
>  	}
>  
>  	/* Userspace should not be able to register out-of-bounds IPAs */
> -	VM_BUG_ON(fault_ipa >= kvm_phys_size(vcpu->kvm));
> +	VM_BUG_ON(ipa >= kvm_phys_size(vcpu->kvm));
>  
>  	if (fault_status == FSC_ACCESS) {
>  		handle_access_fault(vcpu, fault_ipa);
> @@ -1423,7 +1494,8 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
>  		goto out_unlock;
>  	}
>  
> -	ret = user_mem_abort(vcpu, fault_ipa, memslot, hva, fault_status);
> +	ret = user_mem_abort(vcpu, fault_ipa, &nested_trans,
> +			     memslot, hva, fault_status);
>  	if (ret == 0)
>  		ret = 1;
>  out:

I've tried to follow the nested fault logic in kvm_handle_guest_abort(), but I
found that extremely difficult because it's not clear what checks and error
conditions apply to the nested fault case versus the regular case.

For example, when handling this condition:

if (kvm_is_error_hva(hva) || (write_fault && !writable))

I think the second part (write_fault && !writable) is always false in the nested
case, because kvm_s2_handle_perm_fault() already has this check.

This part:

		if (kvm_is_error_hva(hva) && kvm_vcpu_dabt_is_cm(vcpu)) {
			kvm_incr_pc(vcpu);
			ret = 1;
			goto out_unlock;
		}

should the DABT be reflected back to the L1 guest? At this point we know that
fault_ipa is mapped to ipa in the virtual stage 2.

And this:

	if (fault_status == FSC_ACCESS) {
		handle_access_fault(vcpu, fault_ipa);
		ret = 1;
		goto out_unlock;
	}

I couldn't figure out if kvm_walk_nested_s2() already handles that case.

IMO, it might make kvm_handle_guest_abort() a lot easier to understand if the
two paths were clearly separated into distinct functions, one that handles the
nested abort case and one that handles the normal case. On the other hand, that
might make maintaining the code a lot harder.
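
Just to illustrate, a very rough (completely untested) sketch of what I have in
mind, where handle_shadow_s2_abort() is a made-up name and the helpers keep the
signatures from this patch:

	static int handle_shadow_s2_abort(struct kvm_vcpu *vcpu,
					  phys_addr_t fault_ipa,
					  struct kvm_s2_trans *nested,
					  phys_addr_t *ipa)
	{
		int ret;

		/* Walk the guest hypervisor's stage 2 */
		ret = kvm_walk_nested_s2(vcpu, fault_ipa, nested);
		if (ret) {
			kvm_inject_s2_fault(vcpu, kvm_s2_trans_esr(nested));
			return ret;
		}

		/* Check the guest's own S2 permissions before going further */
		ret = kvm_s2_handle_perm_fault(vcpu, nested);
		if (ret) {
			kvm_inject_s2_fault(vcpu, kvm_s2_trans_esr(nested));
			return ret;
		}

		*ipa = kvm_s2_trans_output(nested);
		return 0;
	}

so that kvm_handle_guest_abort() is left with something like:

	if (kvm_is_shadow_s2_fault(vcpu)) {
		ret = handle_shadow_s2_abort(vcpu, fault_ipa, &nested_trans, &ipa);
		if (ret)
			goto out_unlock;
	}

That's not quite a full split into two top-level functions, but it would at
least keep the nested-only logic in one place.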

What do you think?

Thanks,
Alex

> diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
> index c2a99b672368..0a9708f776fc 100644
> --- a/arch/arm64/kvm/nested.c
> +++ b/arch/arm64/kvm/nested.c
> @@ -108,6 +108,15 @@ static u32 compute_fsc(int level, u32 fsc)
>  	return fsc | (level & 0x3);
>  }
>  
> +static int esr_s2_fault(struct kvm_vcpu *vcpu, int level, u32 fsc)
> +{
> +	u32 esr;
> +
> +	esr = kvm_vcpu_get_esr(vcpu) & ~ESR_ELx_FSC;
> +	esr |= compute_fsc(level, fsc);
> +	return esr;
> +}
> +
>  static int check_base_s2_limits(struct s2_walk_info *wi,
>  				int level, int input_size, int stride)
>  {
> @@ -457,6 +466,45 @@ void kvm_vcpu_put_hw_mmu(struct kvm_vcpu *vcpu)
>  	}
>  }
>  
> +/*
> + * Returns non-zero if permission fault is handled by injecting it to the next
> + * level hypervisor.
> + */
> +int kvm_s2_handle_perm_fault(struct kvm_vcpu *vcpu, struct kvm_s2_trans *trans)
> +{
> +	unsigned long fault_status = kvm_vcpu_trap_get_fault_type(vcpu);
> +	bool forward_fault = false;
> +
> +	trans->esr = 0;
> +
> +	if (fault_status != FSC_PERM)
> +		return 0;
> +
> +	if (kvm_vcpu_trap_is_iabt(vcpu)) {
> +		forward_fault = (trans->upper_attr & BIT(54));
> +	} else {
> +		bool write_fault = kvm_is_write_fault(vcpu);
> +
> +		forward_fault = ((write_fault && !trans->writable) ||
> +				 (!write_fault && !trans->readable));
> +	}
> +
> +	if (forward_fault) {
> +		trans->esr = esr_s2_fault(vcpu, trans->level, ESR_ELx_FSC_PERM);
> +		return 1;
> +	}
> +
> +	return 0;
> +}
> +
> +int kvm_inject_s2_fault(struct kvm_vcpu *vcpu, u64 esr_el2)
> +{
> +	vcpu_write_sys_reg(vcpu, vcpu->arch.fault.far_el2, FAR_EL2);
> +	vcpu_write_sys_reg(vcpu, vcpu->arch.fault.hpfar_el2, HPFAR_EL2);
> +
> +	return kvm_inject_nested_sync(vcpu, esr_el2);
> +}
> +
>  /*
>   * Inject wfx to the virtual EL2 if this is not from the virtual EL2 and
>   * the virtual HCR_EL2.TWX is set. Otherwise, let the host hypervisor
> -- 
> 2.30.2
> 
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 378+ messages in thread

* Re: [PATCH v6 36/64] KVM: arm64: nv: Handle shadow stage 2 page faults
@ 2022-02-17 15:23     ` Alexandru Elisei
  0 siblings, 0 replies; 378+ messages in thread
From: Alexandru Elisei @ 2022-02-17 15:23 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, Andre Przywara, Christoffer Dall,
	Jintack Lim, Haibo Xu, Ganapatrao Kulkarni, Chase Conklin,
	Russell King (Oracle),
	James Morse, Suzuki K Poulose, karl.heubaum, mihai.carabas,
	miguel.luis, kernel-team

Hi,

On Fri, Jan 28, 2022 at 12:18:44PM +0000, Marc Zyngier wrote:
> If we are faulting on a shadow stage 2 translation, we first walk the
> guest hypervisor's stage 2 page table to see if it has a mapping. If
> not, we inject a stage 2 page fault to the virtual EL2. Otherwise, we
> create a mapping in the shadow stage 2 page table.
> 
> Note that we have to deal with two IPAs when we got a shadow stage 2
> page fault. One is the address we faulted on, and is in the L2 guest
> phys space. The other is from the guest stage-2 page table walk, and is
> in the L1 guest phys space.  To differentiate them, we rename variables
> so that fault_ipa is used for the former and ipa is used for the latter.
> 
> Co-developed-by: Christoffer Dall <christoffer.dall@linaro.org>
> Co-developed-by: Jintack Lim <jintack.lim@linaro.org>
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
> [maz: rewrote this multiple times...]
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/include/asm/kvm_emulate.h |   6 ++
>  arch/arm64/include/asm/kvm_nested.h  |  18 +++++
>  arch/arm64/kvm/mmu.c                 | 102 +++++++++++++++++++++++----
>  arch/arm64/kvm/nested.c              |  48 +++++++++++++
>  4 files changed, 159 insertions(+), 15 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> index ff8980a39ee8..dc6eeb0cc8a9 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -602,4 +602,10 @@ static inline bool vcpu_has_feature(struct kvm_vcpu *vcpu, int feature)
>  	return test_bit(feature, vcpu->arch.features);
>  }
>  
> +static inline bool kvm_is_shadow_s2_fault(struct kvm_vcpu *vcpu)
> +{
> +	return (vcpu->arch.hw_mmu != &vcpu->kvm->arch.mmu &&
> +		vcpu->arch.hw_mmu->nested_stage2_enabled);
> +}
> +
>  #endif /* __ARM64_KVM_EMULATE_H__ */
> diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
> index 48cf288ea238..4fad4d3848ce 100644
> --- a/arch/arm64/include/asm/kvm_nested.h
> +++ b/arch/arm64/include/asm/kvm_nested.h
> @@ -82,9 +82,27 @@ struct kvm_s2_trans {
>  	u64 upper_attr;
>  };
>  
> +static inline phys_addr_t kvm_s2_trans_output(struct kvm_s2_trans *trans)
> +{
> +	return trans->output;
> +}
> +
> +static inline unsigned long kvm_s2_trans_size(struct kvm_s2_trans *trans)
> +{
> +	return trans->block_size;
> +}
> +
> +static inline u32 kvm_s2_trans_esr(struct kvm_s2_trans *trans)
> +{
> +	return trans->esr;
> +}
> +
>  extern int kvm_walk_nested_s2(struct kvm_vcpu *vcpu, phys_addr_t gipa,
>  			      struct kvm_s2_trans *result);
>  
> +extern int kvm_s2_handle_perm_fault(struct kvm_vcpu *vcpu,
> +				    struct kvm_s2_trans *trans);
> +extern int kvm_inject_s2_fault(struct kvm_vcpu *vcpu, u64 esr_el2);
>  int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe);
>  extern bool __forward_traps(struct kvm_vcpu *vcpu, unsigned int reg,
>  			    u64 control_bit);
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 55525fd5743d..36f7ecb4f81b 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -969,7 +969,7 @@ static bool fault_supports_stage2_huge_mapping(struct kvm_memory_slot *memslot,
>  static unsigned long
>  transparent_hugepage_adjust(struct kvm *kvm, struct kvm_memory_slot *memslot,
>  			    unsigned long hva, kvm_pfn_t *pfnp,
> -			    phys_addr_t *ipap)
> +			    phys_addr_t *ipap, phys_addr_t *fault_ipap)
>  {
>  	kvm_pfn_t pfn = *pfnp;
>  
> @@ -998,6 +998,7 @@ transparent_hugepage_adjust(struct kvm *kvm, struct kvm_memory_slot *memslot,
>  		 * to PG_head and switch the pfn from a tail page to the head
>  		 * page accordingly.
>  		 */
> +		*fault_ipap &= PMD_MASK;
>  		*ipap &= PMD_MASK;
>  		kvm_release_pfn_clean(pfn);
>  		pfn &= ~(PTRS_PER_PMD - 1);
> @@ -1080,15 +1081,17 @@ static int sanitise_mte_tags(struct kvm *kvm, kvm_pfn_t pfn,
>  }
>  
>  static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> -			  struct kvm_memory_slot *memslot, unsigned long hva,
> -			  unsigned long fault_status)
> +			  struct kvm_s2_trans *nested,
> +			  struct kvm_memory_slot *memslot,
> +			  unsigned long hva, unsigned long fault_status)
>  {
>  	int ret = 0;
> -	bool write_fault, writable, force_pte = false;
> +	bool write_fault, writable;
>  	bool exec_fault;
>  	bool device = false;
>  	bool shared;
>  	unsigned long mmu_seq;
> +	phys_addr_t ipa = fault_ipa;
>  	struct kvm *kvm = vcpu->kvm;
>  	struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
>  	struct vm_area_struct *vma;
> @@ -1100,6 +1103,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	unsigned long vma_pagesize, fault_granule;
>  	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
>  	struct kvm_pgtable *pgt;
> +	unsigned long max_map_size = PUD_SIZE;
>  
>  	fault_granule = 1UL << ARM64_HW_PGTABLE_LEVEL_SHIFT(fault_level);
>  	write_fault = kvm_is_write_fault(vcpu);
> @@ -1128,7 +1132,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	 * memslots.
>  	 */
>  	if (logging_active) {
> -		force_pte = true;
> +		max_map_size = vma_pagesize = PAGE_SIZE;

I don't think it's needed to set vma_pagesize here, because...

>  		vma_shift = PAGE_SHIFT;
>  	} else {
>  		vma_shift = get_vma_page_shift(vma, hva);
> @@ -1152,7 +1156,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  		fallthrough;
>  	case CONT_PTE_SHIFT:
>  		vma_shift = PAGE_SHIFT;
> -		force_pte = true;
> +		max_map_size = PAGE_SIZE;
>  		fallthrough;
>  	case PAGE_SHIFT:
>  		break;
> @@ -1161,10 +1165,25 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	}
>  
>  	vma_pagesize = 1UL << vma_shift;

...it's computed here based on vma_shift.

> +
> +	if (kvm_is_shadow_s2_fault(vcpu)) {
> +		ipa = kvm_s2_trans_output(nested);
> +
> +		/*
> +		 * If we're about to create a shadow stage 2 entry, then we
> +		 * can only create a block mapping if the guest stage 2 page
> +		 * table uses at least as big a mapping.
> +		 */
> +		max_map_size = min(kvm_s2_trans_size(nested), max_map_size);
> +	}
> +
> +	vma_pagesize = min(vma_pagesize, max_map_size);

I find the usage of max_map_size slightly confusing, because when max_map_size
is PAGE_SIZE, vma_pagesize is also PAGE_SIZE, so max_map_size doesn't affect
the result of the min() function above. In all other cases, max_map_size is
PUD_SIZE (the maximum possible value), so the resulting page size is only
influenced by kvm_s2_trans_size() and the vma_pagesize.

Wouldn't it make more sense to rework the hunk above as:

	if (kvm_is_shadow_s2_fault(vcpu)) {
		ipa = kvm_s2_trans_output(nested);
		vma_pagesize = min(kvm_s2_trans_size(nested), vma_pagesize);
	}

and remove the max_map_size local variable entirely? Or is there something I'm
missing here?

> +
>  	if (vma_pagesize == PMD_SIZE || vma_pagesize == PUD_SIZE)
>  		fault_ipa &= ~(vma_pagesize - 1);
>  
> -	gfn = fault_ipa >> PAGE_SHIFT;
> +	gfn = ipa >> PAGE_SHIFT;
> +
>  	mmap_read_unlock(current->mm);
>  
>  	/*
> @@ -1237,12 +1256,14 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	 * If we are not forced to use page mapping, check if we are
>  	 * backed by a THP and thus use block mapping if possible.
>  	 */
> -	if (vma_pagesize == PAGE_SIZE && !(force_pte || device)) {
> +	if (vma_pagesize == PAGE_SIZE &&
> +	    !(max_map_size == PAGE_SIZE || device)) {
>  		if (fault_status == FSC_PERM && fault_granule > PAGE_SIZE)
>  			vma_pagesize = fault_granule;
>  		else
>  			vma_pagesize = transparent_hugepage_adjust(kvm, memslot,
>  								   hva, &pfn,
> +								   &ipa,

I don't understand this. The local variable ipa (the L1 guest's virtual stage 2
output) is never used after this point. kvm_pgtable_stage2_{map,relax_perms}
below use fault_ipa (which is the correct address to use).
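
If that's the case, then (sketch, untested) the extra parameter could be
dropped again and the call would go back to something like:

	vma_pagesize = transparent_hugepage_adjust(kvm, memslot,
						   hva, &pfn,
						   &fault_ipa);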

>  								   &fault_ipa);
>  	}
>  
> @@ -1326,8 +1347,10 @@ static void handle_access_fault(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
>  int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
>  {
>  	unsigned long fault_status;
> -	phys_addr_t fault_ipa;
> +	phys_addr_t fault_ipa; /* The address we faulted on */
> +	phys_addr_t ipa; /* Always the IPA in the L1 guest phys space */
>  	struct kvm_memory_slot *memslot;
> +	struct kvm_s2_trans nested_trans;
>  	unsigned long hva;
>  	bool is_iabt, write_fault, writable;
>  	gfn_t gfn;
> @@ -1335,7 +1358,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
>  
>  	fault_status = kvm_vcpu_trap_get_fault_type(vcpu);
>  
> -	fault_ipa = kvm_vcpu_get_fault_ipa(vcpu);
> +	ipa = fault_ipa = kvm_vcpu_get_fault_ipa(vcpu);
>  	is_iabt = kvm_vcpu_trap_is_iabt(vcpu);
>  
>  	/* Synchronous External Abort? */
> @@ -1356,6 +1379,12 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
>  	/* Check the stage-2 fault is trans. fault or write fault */
>  	if (fault_status != FSC_FAULT && fault_status != FSC_PERM &&
>  	    fault_status != FSC_ACCESS) {
> +		/*
> +		 * We must never see an address size fault on shadow stage 2
> +		 * page table walk, because we would have injected an addr
> +		 * size fault when we walked the nested s2 page and not
> +		 * create the shadow entry.
> +		 */
>  		kvm_err("Unsupported FSC: EC=%#x xFSC=%#lx ESR_EL2=%#lx\n",
>  			kvm_vcpu_trap_get_class(vcpu),
>  			(unsigned long)kvm_vcpu_trap_get_fault(vcpu),
> @@ -1365,7 +1394,49 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
>  
>  	idx = srcu_read_lock(&vcpu->kvm->srcu);
>  
> -	gfn = fault_ipa >> PAGE_SHIFT;
> +	/*
> +	 * We may have faulted on a shadow stage 2 page table if we are
> +	 * running a nested guest.  In this case, we have to resolve the L2
> +	 * IPA to the L1 IPA first, before knowing what kind of memory should
> +	 * back the L1 IPA.
> +	 *
> +	 * If the shadow stage 2 page table walk faults, then we simply inject
> +	 * this to the guest and carry on.
> +	 */
> +	if (kvm_is_shadow_s2_fault(vcpu)) {
> +		u32 esr;
> +
> +		ret = kvm_walk_nested_s2(vcpu, fault_ipa, &nested_trans);
> +		esr = kvm_s2_trans_esr(&nested_trans);
> +		if (esr)
> +			kvm_inject_s2_fault(vcpu, esr);

This code reads very strangely, because ret has the same semantic meaning as
esr: they are both non-zero when the software stage 2 walker encounters a
virtual stage 2 fault.

I think something like this would read a lot better:

		ret = kvm_walk_nested_s2(vcpu, fault_ipa, &nested);
		if (ret) {
			esr = kvm_s2_trans_esr(&nested_trans);
			kvm_inject_s2_fault(vcpu, esr);
			goto out_unlock;
		}

> +		if (ret)
> +			goto out_unlock;
> +
> +		ret = kvm_s2_handle_perm_fault(vcpu, &nested_trans);
> +		esr = kvm_s2_trans_esr(&nested_trans);
> +		if (esr)
> +			kvm_inject_s2_fault(vcpu, esr);
> +		if (ret)
> +			goto out_unlock;
> +
> +		ipa = kvm_s2_trans_output(&nested_trans);
> +	} else {
> +		nested_trans = (struct kvm_s2_trans) {
> +			/*
> +			 * Default to RWX so that we don't filter
> +			 * anything while evaluating the permissions.
> +			 */
> +			.writable	= true,
> +			.readable	= true,
> +			.upper_attr	= 0,
> +			.output		= ipa,
> +			.level		= kvm_vcpu_trap_get_fault_level(vcpu),
> +			.esr		= kvm_vcpu_get_esr(vcpu),
> +		};

I don't think this is needed, because user_mem_abort() reads nested_trans only
if kvm_is_shadow_s2_fault() is true.

> +	}
> +
> +	gfn = ipa >> PAGE_SHIFT;
>  	memslot = gfn_to_memslot(vcpu->kvm, gfn);
>  	hva = gfn_to_hva_memslot_prot(memslot, gfn, &writable);
>  	write_fault = kvm_is_write_fault(vcpu);
> @@ -1409,13 +1480,13 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
>  		 * faulting VA. This is always 12 bits, irrespective
>  		 * of the page size.
>  		 */
> -		fault_ipa |= kvm_vcpu_get_hfar(vcpu) & ((1 << 12) - 1);
> -		ret = io_mem_abort(vcpu, fault_ipa);
> +		ipa |= kvm_vcpu_get_hfar(vcpu) & ((1 << 12) - 1);
> +		ret = io_mem_abort(vcpu, ipa);
>  		goto out_unlock;
>  	}
>  
>  	/* Userspace should not be able to register out-of-bounds IPAs */
> -	VM_BUG_ON(fault_ipa >= kvm_phys_size(vcpu->kvm));
> +	VM_BUG_ON(ipa >= kvm_phys_size(vcpu->kvm));
>  
>  	if (fault_status == FSC_ACCESS) {
>  		handle_access_fault(vcpu, fault_ipa);
> @@ -1423,7 +1494,8 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
>  		goto out_unlock;
>  	}
>  
> -	ret = user_mem_abort(vcpu, fault_ipa, memslot, hva, fault_status);
> +	ret = user_mem_abort(vcpu, fault_ipa, &nested_trans,
> +			     memslot, hva, fault_status);
>  	if (ret == 0)
>  		ret = 1;
>  out:

I've tried to follow the nested fault logic in kvm_handle_guest_abort(), but I
found that extremely difficult because it's not clear what checks and error
conditions apply to the nested fault case versus the regular case.

For example, when handling this condition:

if (kvm_is_error_hva(hva) || (write_fault && !writable))

I think the second part (write_fault && !writable) is always false in the nested
case, because kvm_s2_handle_perm_fault() already has this check.

This part:

		if (kvm_is_error_hva(hva) && kvm_vcpu_dabt_is_cm(vcpu)) {
			kvm_incr_pc(vcpu);
			ret = 1;
			goto out_unlock;
		}

should the DABT be reflected back to the L1 guest? At this point we know that
fault_ipa is mapped to ipa in the virtual stage 2.

And this:

	if (fault_status == FSC_ACCESS) {
		handle_access_fault(vcpu, fault_ipa);
		ret = 1;
		goto out_unlock;
	}

I couldn't figure out if kvm_walk_nested_s2() already handles that case.

IMO, it might make kvm_handle_guest_abort() a lot easier to understand if the
two paths were clearly separated into distinct functions, one that handles the
nested abort case and one that handles the normal case. On the other hand, that
might make maintaining the code a lot harder.

What do you think?

Thanks,
Alex

> diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
> index c2a99b672368..0a9708f776fc 100644
> --- a/arch/arm64/kvm/nested.c
> +++ b/arch/arm64/kvm/nested.c
> @@ -108,6 +108,15 @@ static u32 compute_fsc(int level, u32 fsc)
>  	return fsc | (level & 0x3);
>  }
>  
> +static int esr_s2_fault(struct kvm_vcpu *vcpu, int level, u32 fsc)
> +{
> +	u32 esr;
> +
> +	esr = kvm_vcpu_get_esr(vcpu) & ~ESR_ELx_FSC;
> +	esr |= compute_fsc(level, fsc);
> +	return esr;
> +}
> +
>  static int check_base_s2_limits(struct s2_walk_info *wi,
>  				int level, int input_size, int stride)
>  {
> @@ -457,6 +466,45 @@ void kvm_vcpu_put_hw_mmu(struct kvm_vcpu *vcpu)
>  	}
>  }
>  
> +/*
> + * Returns non-zero if permission fault is handled by injecting it to the next
> + * level hypervisor.
> + */
> +int kvm_s2_handle_perm_fault(struct kvm_vcpu *vcpu, struct kvm_s2_trans *trans)
> +{
> +	unsigned long fault_status = kvm_vcpu_trap_get_fault_type(vcpu);
> +	bool forward_fault = false;
> +
> +	trans->esr = 0;
> +
> +	if (fault_status != FSC_PERM)
> +		return 0;
> +
> +	if (kvm_vcpu_trap_is_iabt(vcpu)) {
> +		forward_fault = (trans->upper_attr & BIT(54));
> +	} else {
> +		bool write_fault = kvm_is_write_fault(vcpu);
> +
> +		forward_fault = ((write_fault && !trans->writable) ||
> +				 (!write_fault && !trans->readable));
> +	}
> +
> +	if (forward_fault) {
> +		trans->esr = esr_s2_fault(vcpu, trans->level, ESR_ELx_FSC_PERM);
> +		return 1;
> +	}
> +
> +	return 0;
> +}
> +
> +int kvm_inject_s2_fault(struct kvm_vcpu *vcpu, u64 esr_el2)
> +{
> +	vcpu_write_sys_reg(vcpu, vcpu->arch.fault.far_el2, FAR_EL2);
> +	vcpu_write_sys_reg(vcpu, vcpu->arch.fault.hpfar_el2, HPFAR_EL2);
> +
> +	return kvm_inject_nested_sync(vcpu, esr_el2);
> +}
> +
>  /*
>   * Inject wfx to the virtual EL2 if this is not from the virtual EL2 and
>   * the virtual HCR_EL2.TWX is set. Otherwise, let the host hypervisor
> -- 
> 2.30.2
> 

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 378+ messages in thread

* Re: [PATCH v6 37/64] KVM: arm64: nv: Restrict S2 RD/WR permissions to match the guest's
  2022-01-28 12:18   ` Marc Zyngier
  (?)
@ 2022-02-17 16:29     ` Alexandru Elisei
  -1 siblings, 0 replies; 378+ messages in thread
From: Alexandru Elisei @ 2022-02-17 16:29 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, Andre Przywara, Christoffer Dall,
	Jintack Lim, Haibo Xu, Ganapatrao Kulkarni, Chase Conklin,
	Russell King (Oracle),
	James Morse, Suzuki K Poulose, karl.heubaum, mihai.carabas,
	miguel.luis, kernel-team

Hi,

On Fri, Jan 28, 2022 at 12:18:45PM +0000, Marc Zyngier wrote:
> When mapping a page in a shadow stage-2, special care must be
> taken not to be more permissive than the guest is (writable or
> readable page when the guest hasn't set that permission).
> 
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/include/asm/kvm_nested.h | 15 +++++++++++++++
>  arch/arm64/kvm/mmu.c                | 14 +++++++++++++-
>  arch/arm64/kvm/nested.c             |  2 +-
>  3 files changed, 29 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
> index 4fad4d3848ce..f4b846d09d86 100644
> --- a/arch/arm64/include/asm/kvm_nested.h
> +++ b/arch/arm64/include/asm/kvm_nested.h
> @@ -97,6 +97,21 @@ static inline u32 kvm_s2_trans_esr(struct kvm_s2_trans *trans)
>  	return trans->esr;
>  }
>  
> +static inline bool kvm_s2_trans_readable(struct kvm_s2_trans *trans)
> +{
> +	return trans->readable;
> +}
> +
> +static inline bool kvm_s2_trans_writable(struct kvm_s2_trans *trans)
> +{
> +	return trans->writable;
> +}
> +
> +static inline bool kvm_s2_trans_executable(struct kvm_s2_trans *trans)
> +{
> +	return !(trans->upper_attr & BIT(54));
> +}
> +
>  extern int kvm_walk_nested_s2(struct kvm_vcpu *vcpu, phys_addr_t gipa,
>  			      struct kvm_s2_trans *result);
>  
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 36f7ecb4f81b..7c56e1522d3c 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1247,6 +1247,17 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	if (exec_fault && device)
>  		return -ENOEXEC;
>  
> +	/*
> +	 * Potentially reduce shadow S2 permissions to match the guest's own
> +	 * S2. For exec faults, we'd only reach this point if the guest
> +	 * actually allowed it (see kvm_s2_handle_perm_fault).
> +	 */
> +	if (kvm_is_shadow_s2_fault(vcpu)) {
> +		writable &= kvm_s2_trans_writable(nested);

I was a bit confused about writable being true when write_fault is false. After
some digging, it turns out that hva_to_pfn() opportunistically makes writable
true, even for read faults.

> +		if (!kvm_s2_trans_readable(nested))
> +			prot &= ~KVM_PGTABLE_PROT_R;

The local variable "prot" is initialized to KVM_PGTABLE_PROT_R, so this check
makes sense.
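
(For reference, from the previous patch user_mem_abort() starts out with:

	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;

so clearing PROT_R here does end up with a mapping that has no read permission,
as intended.)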

> +	}
> +
>  	spin_lock(&kvm->mmu_lock);
>  	pgt = vcpu->arch.hw_mmu->pgt;
>  	if (mmu_notifier_retry(kvm, mmu_seq))
> @@ -1285,7 +1296,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  
>  	if (device)
>  		prot |= KVM_PGTABLE_PROT_DEVICE;
> -	else if (cpus_have_const_cap(ARM64_HAS_CACHE_DIC))
> +	else if (cpus_have_const_cap(ARM64_HAS_CACHE_DIC) &&
> +		 kvm_s2_trans_executable(nested))
>  		prot |= KVM_PGTABLE_PROT_X;
>  
>  	/*
> diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
> index 0a9708f776fc..a74ffb1d2064 100644
> --- a/arch/arm64/kvm/nested.c
> +++ b/arch/arm64/kvm/nested.c
> @@ -481,7 +481,7 @@ int kvm_s2_handle_perm_fault(struct kvm_vcpu *vcpu, struct kvm_s2_trans *trans)
>  		return 0;
>  
>  	if (kvm_vcpu_trap_is_iabt(vcpu)) {
> -		forward_fault = (trans->upper_attr & BIT(54));
> +		forward_fault = !kvm_s2_trans_executable(trans);
>  	} else {
>  		bool write_fault = kvm_is_write_fault(vcpu);

The patch looks good to me:

Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>

Thanks,
Alex

^ permalink raw reply	[flat|nested] 378+ messages in thread

* Re: [PATCH v6 37/64] KVM: arm64: nv: Restrict S2 RD/WR permissions to match the guest's
@ 2022-02-17 16:29     ` Alexandru Elisei
  0 siblings, 0 replies; 378+ messages in thread
From: Alexandru Elisei @ 2022-02-17 16:29 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kernel-team, kvm, Andre Przywara, Christoffer Dall,
	Chase Conklin, kvmarm, mihai.carabas, Ganapatrao Kulkarni,
	Russell King (Oracle),
	linux-arm-kernel

Hi,

On Fri, Jan 28, 2022 at 12:18:45PM +0000, Marc Zyngier wrote:
> When mapping a page in a shadow stage-2, special care must be
> taken not to be more permissive than the guest is (writable or
> readable page when the guest hasn't set that permission).
> 
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/include/asm/kvm_nested.h | 15 +++++++++++++++
>  arch/arm64/kvm/mmu.c                | 14 +++++++++++++-
>  arch/arm64/kvm/nested.c             |  2 +-
>  3 files changed, 29 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
> index 4fad4d3848ce..f4b846d09d86 100644
> --- a/arch/arm64/include/asm/kvm_nested.h
> +++ b/arch/arm64/include/asm/kvm_nested.h
> @@ -97,6 +97,21 @@ static inline u32 kvm_s2_trans_esr(struct kvm_s2_trans *trans)
>  	return trans->esr;
>  }
>  
> +static inline bool kvm_s2_trans_readable(struct kvm_s2_trans *trans)
> +{
> +	return trans->readable;
> +}
> +
> +static inline bool kvm_s2_trans_writable(struct kvm_s2_trans *trans)
> +{
> +	return trans->writable;
> +}
> +
> +static inline bool kvm_s2_trans_executable(struct kvm_s2_trans *trans)
> +{
> +	return !(trans->upper_attr & BIT(54));
> +}
> +
>  extern int kvm_walk_nested_s2(struct kvm_vcpu *vcpu, phys_addr_t gipa,
>  			      struct kvm_s2_trans *result);
>  
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 36f7ecb4f81b..7c56e1522d3c 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1247,6 +1247,17 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	if (exec_fault && device)
>  		return -ENOEXEC;
>  
> +	/*
> +	 * Potentially reduce shadow S2 permissions to match the guest's own
> +	 * S2. For exec faults, we'd only reach this point if the guest
> +	 * actually allowed it (see kvm_s2_handle_perm_fault).
> +	 */
> +	if (kvm_is_shadow_s2_fault(vcpu)) {
> +		writable &= kvm_s2_trans_writable(nested);

I was a bit confused about writable being true when write_fault is false. After
some digging, it turns out that hva_to_pfn() opportunistically makes writable
true, even for read faults.

> +		if (!kvm_s2_trans_readable(nested))
> +			prot &= ~KVM_PGTABLE_PROT_R;

The local variable "prot" is initialized to KVM_PGTABLE_PROT_R, so this check
makes sense.

> +	}
> +
>  	spin_lock(&kvm->mmu_lock);
>  	pgt = vcpu->arch.hw_mmu->pgt;
>  	if (mmu_notifier_retry(kvm, mmu_seq))
> @@ -1285,7 +1296,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  
>  	if (device)
>  		prot |= KVM_PGTABLE_PROT_DEVICE;
> -	else if (cpus_have_const_cap(ARM64_HAS_CACHE_DIC))
> +	else if (cpus_have_const_cap(ARM64_HAS_CACHE_DIC) &&
> +		 kvm_s2_trans_executable(nested))
>  		prot |= KVM_PGTABLE_PROT_X;
>  
>  	/*
> diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
> index 0a9708f776fc..a74ffb1d2064 100644
> --- a/arch/arm64/kvm/nested.c
> +++ b/arch/arm64/kvm/nested.c
> @@ -481,7 +481,7 @@ int kvm_s2_handle_perm_fault(struct kvm_vcpu *vcpu, struct kvm_s2_trans *trans)
>  		return 0;
>  
>  	if (kvm_vcpu_trap_is_iabt(vcpu)) {
> -		forward_fault = (trans->upper_attr & BIT(54));
> +		forward_fault = !kvm_s2_trans_executable(trans);
>  	} else {
>  		bool write_fault = kvm_is_write_fault(vcpu);

The patch looks good to me:

Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>

Thanks,
Alex
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 378+ messages in thread

* Re: [PATCH v6 37/64] KVM: arm64: nv: Restrict S2 RD/WR permissions to match the guest's
@ 2022-02-17 16:29     ` Alexandru Elisei
  0 siblings, 0 replies; 378+ messages in thread
From: Alexandru Elisei @ 2022-02-17 16:29 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, Andre Przywara, Christoffer Dall,
	Jintack Lim, Haibo Xu, Ganapatrao Kulkarni, Chase Conklin,
	Russell King (Oracle),
	James Morse, Suzuki K Poulose, karl.heubaum, mihai.carabas,
	miguel.luis, kernel-team

Hi,

On Fri, Jan 28, 2022 at 12:18:45PM +0000, Marc Zyngier wrote:
> When mapping a page in a shadow stage-2, special care must be
> taken not to be more permissive than the guest is (writable or
> readable page when the guest hasn't set that permission).
> 
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/include/asm/kvm_nested.h | 15 +++++++++++++++
>  arch/arm64/kvm/mmu.c                | 14 +++++++++++++-
>  arch/arm64/kvm/nested.c             |  2 +-
>  3 files changed, 29 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
> index 4fad4d3848ce..f4b846d09d86 100644
> --- a/arch/arm64/include/asm/kvm_nested.h
> +++ b/arch/arm64/include/asm/kvm_nested.h
> @@ -97,6 +97,21 @@ static inline u32 kvm_s2_trans_esr(struct kvm_s2_trans *trans)
>  	return trans->esr;
>  }
>  
> +static inline bool kvm_s2_trans_readable(struct kvm_s2_trans *trans)
> +{
> +	return trans->readable;
> +}
> +
> +static inline bool kvm_s2_trans_writable(struct kvm_s2_trans *trans)
> +{
> +	return trans->writable;
> +}
> +
> +static inline bool kvm_s2_trans_executable(struct kvm_s2_trans *trans)
> +{
> +	return !(trans->upper_attr & BIT(54));
> +}
> +
>  extern int kvm_walk_nested_s2(struct kvm_vcpu *vcpu, phys_addr_t gipa,
>  			      struct kvm_s2_trans *result);
>  
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 36f7ecb4f81b..7c56e1522d3c 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1247,6 +1247,17 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	if (exec_fault && device)
>  		return -ENOEXEC;
>  
> +	/*
> +	 * Potentially reduce shadow S2 permissions to match the guest's own
> +	 * S2. For exec faults, we'd only reach this point if the guest
> +	 * actually allowed it (see kvm_s2_handle_perm_fault).
> +	 */
> +	if (kvm_is_shadow_s2_fault(vcpu)) {
> +		writable &= kvm_s2_trans_writable(nested);

I was a bit confused about writable being true when write_fault is false. After
some digging, it turns out that hva_to_pfn() opportunistically makes writable
true, even for read faults.

> +		if (!kvm_s2_trans_readable(nested))
> +			prot &= ~KVM_PGTABLE_PROT_R;

The local variable "prot" is initialized to KVM_PGTABLE_PROT_R, so this check
makes sense.

> +	}
> +
>  	spin_lock(&kvm->mmu_lock);
>  	pgt = vcpu->arch.hw_mmu->pgt;
>  	if (mmu_notifier_retry(kvm, mmu_seq))
> @@ -1285,7 +1296,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  
>  	if (device)
>  		prot |= KVM_PGTABLE_PROT_DEVICE;
> -	else if (cpus_have_const_cap(ARM64_HAS_CACHE_DIC))
> +	else if (cpus_have_const_cap(ARM64_HAS_CACHE_DIC) &&
> +		 kvm_s2_trans_executable(nested))
>  		prot |= KVM_PGTABLE_PROT_X;
>  
>  	/*
> diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
> index 0a9708f776fc..a74ffb1d2064 100644
> --- a/arch/arm64/kvm/nested.c
> +++ b/arch/arm64/kvm/nested.c
> @@ -481,7 +481,7 @@ int kvm_s2_handle_perm_fault(struct kvm_vcpu *vcpu, struct kvm_s2_trans *trans)
>  		return 0;
>  
>  	if (kvm_vcpu_trap_is_iabt(vcpu)) {
> -		forward_fault = (trans->upper_attr & BIT(54));
> +		forward_fault = !kvm_s2_trans_executable(trans);
>  	} else {
>  		bool write_fault = kvm_is_write_fault(vcpu);

The patch looks good to me:

Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>

Thanks,
Alex

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 378+ messages in thread

* Re: [PATCH v6 38/64] KVM: arm64: nv: Unmap/flush shadow stage 2 page tables
  2022-01-28 12:18   ` Marc Zyngier
  (?)
@ 2022-02-22 16:13     ` Alexandru Elisei
  -1 siblings, 0 replies; 378+ messages in thread
From: Alexandru Elisei @ 2022-02-22 16:13 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, Andre Przywara, Christoffer Dall,
	Jintack Lim, Haibo Xu, Ganapatrao Kulkarni, Chase Conklin,
	Russell King (Oracle),
	James Morse, Suzuki K Poulose, karl.heubaum, mihai.carabas,
	miguel.luis, kernel-team

Hi,

On Fri, Jan 28, 2022 at 12:18:46PM +0000, Marc Zyngier wrote:
> From: Christoffer Dall <christoffer.dall@linaro.org>
> 
> Unmap/flush shadow stage 2 page tables for the nested VMs as well as the
> stage 2 page table for the guest hypervisor.
> 
> Note: A bunch of the code in mmu.c relating to MMU notifiers is
> currently dealt with in an extremely abrupt way, for example by clearing
> out an entire shadow stage-2 table. This will be handled in a more
> efficient way using the reverse mapping feature in a later version of
> the patch series.
> 
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/include/asm/kvm_mmu.h    |  3 +++
>  arch/arm64/include/asm/kvm_nested.h |  3 +++
>  arch/arm64/kvm/mmu.c                | 31 +++++++++++++++++++----
>  arch/arm64/kvm/nested.c             | 39 +++++++++++++++++++++++++++++
>  4 files changed, 71 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> index 0750d022bbf8..afad4a27a6f2 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -160,6 +160,8 @@ int create_hyp_io_mappings(phys_addr_t phys_addr, size_t size,
>  			   void __iomem **haddr);
>  int create_hyp_exec_mappings(phys_addr_t phys_addr, size_t size,
>  			     void **haddr);
> +void kvm_stage2_flush_range(struct kvm_s2_mmu *mmu,
> +			    phys_addr_t addr, phys_addr_t end);
>  void free_hyp_pgds(void);
>  
>  void kvm_unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size);
> @@ -168,6 +170,7 @@ int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu);
>  void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu);
>  int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
>  			  phys_addr_t pa, unsigned long size, bool writable);
> +void kvm_stage2_wp_range(struct kvm_s2_mmu *mmu, phys_addr_t addr, phys_addr_t end);
>  
>  int kvm_handle_guest_abort(struct kvm_vcpu *vcpu);
>  
> diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
> index f4b846d09d86..8915bead0633 100644
> --- a/arch/arm64/include/asm/kvm_nested.h
> +++ b/arch/arm64/include/asm/kvm_nested.h
> @@ -118,6 +118,9 @@ extern int kvm_walk_nested_s2(struct kvm_vcpu *vcpu, phys_addr_t gipa,
>  extern int kvm_s2_handle_perm_fault(struct kvm_vcpu *vcpu,
>  				    struct kvm_s2_trans *trans);
>  extern int kvm_inject_s2_fault(struct kvm_vcpu *vcpu, u64 esr_el2);
> +extern void kvm_nested_s2_wp(struct kvm *kvm);
> +extern void kvm_nested_s2_clear(struct kvm *kvm);

Why is the function that removes all the entries from kvm->arch.mmu called
unmap_stage2 and the function that removes all the entries from the nested mmus
called s2_clear? Would be nice if the latter also used the verb unmap instead of
clear, to make the code easier to understand.
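
I.e. something like (name made up, to match kvm_unmap_stage2_range()):

	extern void kvm_nested_s2_unmap(struct kvm *kvm);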

> +extern void kvm_nested_s2_flush(struct kvm *kvm);
>  int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe);
>  extern bool __forward_traps(struct kvm_vcpu *vcpu, unsigned int reg,
>  			    u64 control_bit);
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 7c56e1522d3c..b9b11be65009 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -190,13 +190,20 @@ void kvm_unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size)
>  	__unmap_stage2_range(mmu, start, size, true);
>  }
>  
> +void kvm_stage2_flush_range(struct kvm_s2_mmu *mmu,
> +			    phys_addr_t addr, phys_addr_t end)
> +{
> +	stage2_apply_range_resched(kvm_s2_mmu_to_kvm(mmu), addr, end, kvm_pgtable_stage2_flush);
> +}
> +
>  static void stage2_flush_memslot(struct kvm *kvm,
>  				 struct kvm_memory_slot *memslot)
>  {
>  	phys_addr_t addr = memslot->base_gfn << PAGE_SHIFT;
>  	phys_addr_t end = addr + PAGE_SIZE * memslot->npages;
> +	struct kvm_s2_mmu *mmu = &kvm->arch.mmu;
>  
> -	stage2_apply_range_resched(kvm, addr, end, kvm_pgtable_stage2_flush);
> +	kvm_stage2_flush_range(mmu, addr, end);
>  }
>  
>  /**
> @@ -219,6 +226,8 @@ static void stage2_flush_vm(struct kvm *kvm)
>  	kvm_for_each_memslot(memslot, bkt, slots)
>  		stage2_flush_memslot(kvm, memslot);
>  
> +	kvm_nested_s2_flush(kvm);
> +
>  	spin_unlock(&kvm->mmu_lock);
>  	srcu_read_unlock(&kvm->srcu, idx);
>  }
> @@ -742,6 +751,8 @@ void stage2_unmap_vm(struct kvm *kvm)
>  	kvm_for_each_memslot(memslot, bkt, slots)
>  		stage2_unmap_memslot(kvm, memslot);
>  
> +	kvm_nested_s2_clear(kvm);
> +
>  	spin_unlock(&kvm->mmu_lock);
>  	mmap_read_unlock(current->mm);
>  	srcu_read_unlock(&kvm->srcu, idx);
> @@ -814,12 +825,12 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
>  }
>  
>  /**
> - * stage2_wp_range() - write protect stage2 memory region range
> + * kvm_stage2_wp_range() - write protect stage2 memory region range
>   * @mmu:        The KVM stage-2 MMU pointer
>   * @addr:	Start address of range
>   * @end:	End address of range
>   */
> -static void stage2_wp_range(struct kvm_s2_mmu *mmu, phys_addr_t addr, phys_addr_t end)
> +void kvm_stage2_wp_range(struct kvm_s2_mmu *mmu, phys_addr_t addr, phys_addr_t end)
>  {
>  	struct kvm *kvm = kvm_s2_mmu_to_kvm(mmu);
>  	stage2_apply_range_resched(kvm, addr, end, kvm_pgtable_stage2_wrprotect);
> @@ -851,7 +862,8 @@ static void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot)
>  	end = (memslot->base_gfn + memslot->npages) << PAGE_SHIFT;
>  
>  	spin_lock(&kvm->mmu_lock);
> -	stage2_wp_range(&kvm->arch.mmu, start, end);
> +	kvm_stage2_wp_range(&kvm->arch.mmu, start, end);
> +	kvm_nested_s2_wp(kvm);
>  	spin_unlock(&kvm->mmu_lock);
>  	kvm_flush_remote_tlbs(kvm);
>  }
> @@ -875,7 +887,7 @@ static void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
>  	phys_addr_t start = (base_gfn +  __ffs(mask)) << PAGE_SHIFT;
>  	phys_addr_t end = (base_gfn + __fls(mask) + 1) << PAGE_SHIFT;
>  
> -	stage2_wp_range(&kvm->arch.mmu, start, end);
> +	kvm_stage2_wp_range(&kvm->arch.mmu, start, end);
>  }
>  
>  /*
> @@ -890,6 +902,7 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
>  		gfn_t gfn_offset, unsigned long mask)
>  {
>  	kvm_mmu_write_protect_pt_masked(kvm, slot, gfn_offset, mask);
> +	kvm_nested_s2_wp(kvm);
>  }
>  
>  static void kvm_send_hwpoison_signal(unsigned long address, short lsb)
> @@ -1529,6 +1542,7 @@ bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
>  			     (range->end - range->start) << PAGE_SHIFT,
>  			     range->may_block);
>  
> +	kvm_nested_s2_clear(kvm);
>  	return false;
>  }
>  
> @@ -1560,6 +1574,7 @@ bool kvm_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
>  			       PAGE_SIZE, __pfn_to_phys(pfn),
>  			       KVM_PGTABLE_PROT_R, NULL);
>  
> +	kvm_nested_s2_clear(kvm);
>  	return false;
>  }
>  
> @@ -1578,6 +1593,11 @@ bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
>  					range->start << PAGE_SHIFT);
>  	pte = __pte(kpte);
>  	return pte_valid(pte) && pte_young(pte);
> +
> +	/*
> +	 * TODO: Handle nested_mmu structures here using the reverse mapping in
> +	 * a later version of patch series.
> +	 */

Hm... does this mean that at the moment KVM cannot age a page in a nested MMU?

>  }
>  
>  bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
> @@ -1789,6 +1809,7 @@ void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
>  
>  	spin_lock(&kvm->mmu_lock);
>  	kvm_unmap_stage2_range(&kvm->arch.mmu, gpa, size);
> +	kvm_nested_s2_clear(kvm);
>  	spin_unlock(&kvm->mmu_lock);
>  }
>  
> diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
> index a74ffb1d2064..b39af4d87787 100644
> --- a/arch/arm64/kvm/nested.c
> +++ b/arch/arm64/kvm/nested.c
> @@ -505,6 +505,45 @@ int kvm_inject_s2_fault(struct kvm_vcpu *vcpu, u64 esr_el2)
>  	return kvm_inject_nested_sync(vcpu, esr_el2);
>  }
>  
> +/* expects kvm->mmu_lock to be held */

The function could do a lockdep_assert_held(&kvm->mmu_lock), which would serve
the double purpose of documenting that the lock should be held and also
verifying that that is indeed the case when lockdep checking is enabled.
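
Something like this (sketch), instead of the comment:

	void kvm_nested_s2_wp(struct kvm *kvm)
	{
		int i;

		lockdep_assert_held(&kvm->mmu_lock);

		for (i = 0; i < kvm->arch.nested_mmus_size; i++) {
			struct kvm_s2_mmu *mmu = &kvm->arch.nested_mmus[i];

			if (kvm_s2_mmu_valid(mmu))
				kvm_stage2_wp_range(mmu, 0, kvm_phys_size(kvm));
		}
	}

(and the same for the clear/flush variants).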

> +void kvm_nested_s2_wp(struct kvm *kvm)
> +{
> +	int i;
> +
> +	for (i = 0; i < kvm->arch.nested_mmus_size; i++) {
> +		struct kvm_s2_mmu *mmu = &kvm->arch.nested_mmus[i];
> +
> +		if (kvm_s2_mmu_valid(mmu))
> +			kvm_stage2_wp_range(mmu, 0, kvm_phys_size(kvm));

I'm not sure about KVM/arm64's plans to add support for the dirty ring, but
doing this means that there is a chance that duplicate pfns are added to the
ring if the same pfn is present in multiple MMUs.

Thanks,
Alex

> +	}
> +}
> +
> +/* expects kvm->mmu_lock to be held */
> +void kvm_nested_s2_clear(struct kvm *kvm)
> +{
> +	int i;
> +
> +	for (i = 0; i < kvm->arch.nested_mmus_size; i++) {
> +		struct kvm_s2_mmu *mmu = &kvm->arch.nested_mmus[i];
> +
> +		if (kvm_s2_mmu_valid(mmu))
> +			kvm_unmap_stage2_range(mmu, 0, kvm_phys_size(kvm));
> +	}
> +}
> +
> +/* expects kvm->mmu_lock to be held */
> +void kvm_nested_s2_flush(struct kvm *kvm)
> +{
> +	int i;
> +
> +	for (i = 0; i < kvm->arch.nested_mmus_size; i++) {
> +		struct kvm_s2_mmu *mmu = &kvm->arch.nested_mmus[i];
> +
> +		if (kvm_s2_mmu_valid(mmu))
> +			kvm_stage2_flush_range(mmu, 0, kvm_phys_size(kvm));
> +	}
> +}
> +
>  /*
>   * Inject wfx to the virtual EL2 if this is not from the virtual EL2 and
>   * the virtual HCR_EL2.TWX is set. Otherwise, let the host hypervisor
> -- 
> 2.30.2
> 

^ permalink raw reply	[flat|nested] 378+ messages in thread

* Re: [PATCH v6 38/64] KVM: arm64: nv: Unmap/flush shadow stage 2 page tables
@ 2022-02-22 16:13     ` Alexandru Elisei
  0 siblings, 0 replies; 378+ messages in thread
From: Alexandru Elisei @ 2022-02-22 16:13 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kernel-team, kvm, Andre Przywara, Christoffer Dall,
	Chase Conklin, kvmarm, mihai.carabas, Ganapatrao Kulkarni,
	Russell King (Oracle),
	linux-arm-kernel

Hi,

On Fri, Jan 28, 2022 at 12:18:46PM +0000, Marc Zyngier wrote:
> From: Christoffer Dall <christoffer.dall@linaro.org>
> 
> Unmap/flush shadow stage 2 page tables for the nested VMs as well as the
> stage 2 page table for the guest hypervisor.
> 
> Note: A bunch of the code in mmu.c relating to MMU notifiers is
> currently dealt with in an extremely abrupt way, for example by clearing
> out an entire shadow stage-2 table. This will be handled in a more
> efficient way using the reverse mapping feature in a later version of
> the patch series.
> 
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/include/asm/kvm_mmu.h    |  3 +++
>  arch/arm64/include/asm/kvm_nested.h |  3 +++
>  arch/arm64/kvm/mmu.c                | 31 +++++++++++++++++++----
>  arch/arm64/kvm/nested.c             | 39 +++++++++++++++++++++++++++++
>  4 files changed, 71 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> index 0750d022bbf8..afad4a27a6f2 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -160,6 +160,8 @@ int create_hyp_io_mappings(phys_addr_t phys_addr, size_t size,
>  			   void __iomem **haddr);
>  int create_hyp_exec_mappings(phys_addr_t phys_addr, size_t size,
>  			     void **haddr);
> +void kvm_stage2_flush_range(struct kvm_s2_mmu *mmu,
> +			    phys_addr_t addr, phys_addr_t end);
>  void free_hyp_pgds(void);
>  
>  void kvm_unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size);
> @@ -168,6 +170,7 @@ int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu);
>  void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu);
>  int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
>  			  phys_addr_t pa, unsigned long size, bool writable);
> +void kvm_stage2_wp_range(struct kvm_s2_mmu *mmu, phys_addr_t addr, phys_addr_t end);
>  
>  int kvm_handle_guest_abort(struct kvm_vcpu *vcpu);
>  
> diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
> index f4b846d09d86..8915bead0633 100644
> --- a/arch/arm64/include/asm/kvm_nested.h
> +++ b/arch/arm64/include/asm/kvm_nested.h
> @@ -118,6 +118,9 @@ extern int kvm_walk_nested_s2(struct kvm_vcpu *vcpu, phys_addr_t gipa,
>  extern int kvm_s2_handle_perm_fault(struct kvm_vcpu *vcpu,
>  				    struct kvm_s2_trans *trans);
>  extern int kvm_inject_s2_fault(struct kvm_vcpu *vcpu, u64 esr_el2);
> +extern void kvm_nested_s2_wp(struct kvm *kvm);
> +extern void kvm_nested_s2_clear(struct kvm *kvm);

Why is the function that removes all the entries from kvm->arch.mmu called
unmap_stage2 and the function that removes all the entries from the nested mmus
called s2_clear? Would be nice if the latter also used the verb unmap instead of
clear, to make the code easier to understand.

> +extern void kvm_nested_s2_flush(struct kvm *kvm);
>  int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe);
>  extern bool __forward_traps(struct kvm_vcpu *vcpu, unsigned int reg,
>  			    u64 control_bit);
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 7c56e1522d3c..b9b11be65009 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -190,13 +190,20 @@ void kvm_unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size)
>  	__unmap_stage2_range(mmu, start, size, true);
>  }
>  
> +void kvm_stage2_flush_range(struct kvm_s2_mmu *mmu,
> +			    phys_addr_t addr, phys_addr_t end)
> +{
> +	stage2_apply_range_resched(kvm_s2_mmu_to_kvm(mmu), addr, end, kvm_pgtable_stage2_flush);
> +}
> +
>  static void stage2_flush_memslot(struct kvm *kvm,
>  				 struct kvm_memory_slot *memslot)
>  {
>  	phys_addr_t addr = memslot->base_gfn << PAGE_SHIFT;
>  	phys_addr_t end = addr + PAGE_SIZE * memslot->npages;
> +	struct kvm_s2_mmu *mmu = &kvm->arch.mmu;
>  
> -	stage2_apply_range_resched(kvm, addr, end, kvm_pgtable_stage2_flush);
> +	kvm_stage2_flush_range(mmu, addr, end);
>  }
>  
>  /**
> @@ -219,6 +226,8 @@ static void stage2_flush_vm(struct kvm *kvm)
>  	kvm_for_each_memslot(memslot, bkt, slots)
>  		stage2_flush_memslot(kvm, memslot);
>  
> +	kvm_nested_s2_flush(kvm);
> +
>  	spin_unlock(&kvm->mmu_lock);
>  	srcu_read_unlock(&kvm->srcu, idx);
>  }
> @@ -742,6 +751,8 @@ void stage2_unmap_vm(struct kvm *kvm)
>  	kvm_for_each_memslot(memslot, bkt, slots)
>  		stage2_unmap_memslot(kvm, memslot);
>  
> +	kvm_nested_s2_clear(kvm);
> +
>  	spin_unlock(&kvm->mmu_lock);
>  	mmap_read_unlock(current->mm);
>  	srcu_read_unlock(&kvm->srcu, idx);
> @@ -814,12 +825,12 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
>  }
>  
>  /**
> - * stage2_wp_range() - write protect stage2 memory region range
> + * kvm_stage2_wp_range() - write protect stage2 memory region range
>   * @mmu:        The KVM stage-2 MMU pointer
>   * @addr:	Start address of range
>   * @end:	End address of range
>   */
> -static void stage2_wp_range(struct kvm_s2_mmu *mmu, phys_addr_t addr, phys_addr_t end)
> +void kvm_stage2_wp_range(struct kvm_s2_mmu *mmu, phys_addr_t addr, phys_addr_t end)
>  {
>  	struct kvm *kvm = kvm_s2_mmu_to_kvm(mmu);
>  	stage2_apply_range_resched(kvm, addr, end, kvm_pgtable_stage2_wrprotect);
> @@ -851,7 +862,8 @@ static void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot)
>  	end = (memslot->base_gfn + memslot->npages) << PAGE_SHIFT;
>  
>  	spin_lock(&kvm->mmu_lock);
> -	stage2_wp_range(&kvm->arch.mmu, start, end);
> +	kvm_stage2_wp_range(&kvm->arch.mmu, start, end);
> +	kvm_nested_s2_wp(kvm);
>  	spin_unlock(&kvm->mmu_lock);
>  	kvm_flush_remote_tlbs(kvm);
>  }
> @@ -875,7 +887,7 @@ static void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
>  	phys_addr_t start = (base_gfn +  __ffs(mask)) << PAGE_SHIFT;
>  	phys_addr_t end = (base_gfn + __fls(mask) + 1) << PAGE_SHIFT;
>  
> -	stage2_wp_range(&kvm->arch.mmu, start, end);
> +	kvm_stage2_wp_range(&kvm->arch.mmu, start, end);
>  }
>  
>  /*
> @@ -890,6 +902,7 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
>  		gfn_t gfn_offset, unsigned long mask)
>  {
>  	kvm_mmu_write_protect_pt_masked(kvm, slot, gfn_offset, mask);
> +	kvm_nested_s2_wp(kvm);
>  }
>  
>  static void kvm_send_hwpoison_signal(unsigned long address, short lsb)
> @@ -1529,6 +1542,7 @@ bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
>  			     (range->end - range->start) << PAGE_SHIFT,
>  			     range->may_block);
>  
> +	kvm_nested_s2_clear(kvm);
>  	return false;
>  }
>  
> @@ -1560,6 +1574,7 @@ bool kvm_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
>  			       PAGE_SIZE, __pfn_to_phys(pfn),
>  			       KVM_PGTABLE_PROT_R, NULL);
>  
> +	kvm_nested_s2_clear(kvm);
>  	return false;
>  }
>  
> @@ -1578,6 +1593,11 @@ bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
>  					range->start << PAGE_SHIFT);
>  	pte = __pte(kpte);
>  	return pte_valid(pte) && pte_young(pte);
> +
> +	/*
> +	 * TODO: Handle nested_mmu structures here using the reverse mapping in
> +	 * a later version of patch series.
> +	 */

Hm... does this mean that at the moment KVM cannot age a page in a nested MMU?

>  }
>  
>  bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
> @@ -1789,6 +1809,7 @@ void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
>  
>  	spin_lock(&kvm->mmu_lock);
>  	kvm_unmap_stage2_range(&kvm->arch.mmu, gpa, size);
> +	kvm_nested_s2_clear(kvm);
>  	spin_unlock(&kvm->mmu_lock);
>  }
>  
> diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
> index a74ffb1d2064..b39af4d87787 100644
> --- a/arch/arm64/kvm/nested.c
> +++ b/arch/arm64/kvm/nested.c
> @@ -505,6 +505,45 @@ int kvm_inject_s2_fault(struct kvm_vcpu *vcpu, u64 esr_el2)
>  	return kvm_inject_nested_sync(vcpu, esr_el2);
>  }
>  
> +/* expects kvm->mmu_lock to be held */

The function could do a lockdep_assert_held(&kvm->mmu_lock), which would serve
the double purpose of documenting that the lock should be held and also
verifying that that is indeed the case when lockdep checking is enabled.

> +void kvm_nested_s2_wp(struct kvm *kvm)
> +{
> +	int i;
> +
> +	for (i = 0; i < kvm->arch.nested_mmus_size; i++) {
> +		struct kvm_s2_mmu *mmu = &kvm->arch.nested_mmus[i];
> +
> +		if (kvm_s2_mmu_valid(mmu))
> +			kvm_stage2_wp_range(mmu, 0, kvm_phys_size(kvm));

I'm not sure about KVM/arm64's plans to add support for the dirty ring, but
doing this means that there is a chance that duplicate pfns are added to the
ring if the same pfn is present in multiple MMUs.

Thanks,
Alex

> +	}
> +}
> +
> +/* expects kvm->mmu_lock to be held */
> +void kvm_nested_s2_clear(struct kvm *kvm)
> +{
> +	int i;
> +
> +	for (i = 0; i < kvm->arch.nested_mmus_size; i++) {
> +		struct kvm_s2_mmu *mmu = &kvm->arch.nested_mmus[i];
> +
> +		if (kvm_s2_mmu_valid(mmu))
> +			kvm_unmap_stage2_range(mmu, 0, kvm_phys_size(kvm));
> +	}
> +}
> +
> +/* expects kvm->mmu_lock to be held */
> +void kvm_nested_s2_flush(struct kvm *kvm)
> +{
> +	int i;
> +
> +	for (i = 0; i < kvm->arch.nested_mmus_size; i++) {
> +		struct kvm_s2_mmu *mmu = &kvm->arch.nested_mmus[i];
> +
> +		if (kvm_s2_mmu_valid(mmu))
> +			kvm_stage2_flush_range(mmu, 0, kvm_phys_size(kvm));
> +	}
> +}
> +
>  /*
>   * Inject wfx to the virtual EL2 if this is not from the virtual EL2 and
>   * the virtual HCR_EL2.TWX is set. Otherwise, let the host hypervisor
> -- 
> 2.30.2
> 
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 378+ messages in thread

* Re: [PATCH v6 38/64] KVM: arm64: nv: Unmap/flush shadow stage 2 page tables
@ 2022-02-22 16:13     ` Alexandru Elisei
  0 siblings, 0 replies; 378+ messages in thread
From: Alexandru Elisei @ 2022-02-22 16:13 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, Andre Przywara, Christoffer Dall,
	Jintack Lim, Haibo Xu, Ganapatrao Kulkarni, Chase Conklin,
	Russell King (Oracle),
	James Morse, Suzuki K Poulose, karl.heubaum, mihai.carabas,
	miguel.luis, kernel-team

Hi,

On Fri, Jan 28, 2022 at 12:18:46PM +0000, Marc Zyngier wrote:
> From: Christoffer Dall <christoffer.dall@linaro.org>
> 
> Unmap/flush shadow stage 2 page tables for the nested VMs as well as the
> stage 2 page table for the guest hypervisor.
> 
> Note: A bunch of the code in mmu.c relating to MMU notifiers is
> currently dealt with in an extremely abrupt way, for example by clearing
> out an entire shadow stage-2 table. This will be handled in a more
> efficient way using the reverse mapping feature in a later version of
> the patch series.
> 
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/include/asm/kvm_mmu.h    |  3 +++
>  arch/arm64/include/asm/kvm_nested.h |  3 +++
>  arch/arm64/kvm/mmu.c                | 31 +++++++++++++++++++----
>  arch/arm64/kvm/nested.c             | 39 +++++++++++++++++++++++++++++
>  4 files changed, 71 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> index 0750d022bbf8..afad4a27a6f2 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -160,6 +160,8 @@ int create_hyp_io_mappings(phys_addr_t phys_addr, size_t size,
>  			   void __iomem **haddr);
>  int create_hyp_exec_mappings(phys_addr_t phys_addr, size_t size,
>  			     void **haddr);
> +void kvm_stage2_flush_range(struct kvm_s2_mmu *mmu,
> +			    phys_addr_t addr, phys_addr_t end);
>  void free_hyp_pgds(void);
>  
>  void kvm_unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size);
> @@ -168,6 +170,7 @@ int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu);
>  void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu);
>  int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
>  			  phys_addr_t pa, unsigned long size, bool writable);
> +void kvm_stage2_wp_range(struct kvm_s2_mmu *mmu, phys_addr_t addr, phys_addr_t end);
>  
>  int kvm_handle_guest_abort(struct kvm_vcpu *vcpu);
>  
> diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
> index f4b846d09d86..8915bead0633 100644
> --- a/arch/arm64/include/asm/kvm_nested.h
> +++ b/arch/arm64/include/asm/kvm_nested.h
> @@ -118,6 +118,9 @@ extern int kvm_walk_nested_s2(struct kvm_vcpu *vcpu, phys_addr_t gipa,
>  extern int kvm_s2_handle_perm_fault(struct kvm_vcpu *vcpu,
>  				    struct kvm_s2_trans *trans);
>  extern int kvm_inject_s2_fault(struct kvm_vcpu *vcpu, u64 esr_el2);
> +extern void kvm_nested_s2_wp(struct kvm *kvm);
> +extern void kvm_nested_s2_clear(struct kvm *kvm);

Why is the function that removes all the entries from kvm->arch.mmu called
unmap_stage2 and the function that removes all the entries from the nested mmus
called s2_clear? Would be nice if the latter also used the verb unmap instead of
clear, to make the code easier to understand.

> +extern void kvm_nested_s2_flush(struct kvm *kvm);
>  int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe);
>  extern bool __forward_traps(struct kvm_vcpu *vcpu, unsigned int reg,
>  			    u64 control_bit);
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 7c56e1522d3c..b9b11be65009 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -190,13 +190,20 @@ void kvm_unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size)
>  	__unmap_stage2_range(mmu, start, size, true);
>  }
>  
> +void kvm_stage2_flush_range(struct kvm_s2_mmu *mmu,
> +			    phys_addr_t addr, phys_addr_t end)
> +{
> +	stage2_apply_range_resched(kvm_s2_mmu_to_kvm(mmu), addr, end, kvm_pgtable_stage2_flush);
> +}
> +
>  static void stage2_flush_memslot(struct kvm *kvm,
>  				 struct kvm_memory_slot *memslot)
>  {
>  	phys_addr_t addr = memslot->base_gfn << PAGE_SHIFT;
>  	phys_addr_t end = addr + PAGE_SIZE * memslot->npages;
> +	struct kvm_s2_mmu *mmu = &kvm->arch.mmu;
>  
> -	stage2_apply_range_resched(kvm, addr, end, kvm_pgtable_stage2_flush);
> +	kvm_stage2_flush_range(mmu, addr, end);
>  }
>  
>  /**
> @@ -219,6 +226,8 @@ static void stage2_flush_vm(struct kvm *kvm)
>  	kvm_for_each_memslot(memslot, bkt, slots)
>  		stage2_flush_memslot(kvm, memslot);
>  
> +	kvm_nested_s2_flush(kvm);
> +
>  	spin_unlock(&kvm->mmu_lock);
>  	srcu_read_unlock(&kvm->srcu, idx);
>  }
> @@ -742,6 +751,8 @@ void stage2_unmap_vm(struct kvm *kvm)
>  	kvm_for_each_memslot(memslot, bkt, slots)
>  		stage2_unmap_memslot(kvm, memslot);
>  
> +	kvm_nested_s2_clear(kvm);
> +
>  	spin_unlock(&kvm->mmu_lock);
>  	mmap_read_unlock(current->mm);
>  	srcu_read_unlock(&kvm->srcu, idx);
> @@ -814,12 +825,12 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
>  }
>  
>  /**
> - * stage2_wp_range() - write protect stage2 memory region range
> + * kvm_stage2_wp_range() - write protect stage2 memory region range
>   * @mmu:        The KVM stage-2 MMU pointer
>   * @addr:	Start address of range
>   * @end:	End address of range
>   */
> -static void stage2_wp_range(struct kvm_s2_mmu *mmu, phys_addr_t addr, phys_addr_t end)
> +void kvm_stage2_wp_range(struct kvm_s2_mmu *mmu, phys_addr_t addr, phys_addr_t end)
>  {
>  	struct kvm *kvm = kvm_s2_mmu_to_kvm(mmu);
>  	stage2_apply_range_resched(kvm, addr, end, kvm_pgtable_stage2_wrprotect);
> @@ -851,7 +862,8 @@ static void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot)
>  	end = (memslot->base_gfn + memslot->npages) << PAGE_SHIFT;
>  
>  	spin_lock(&kvm->mmu_lock);
> -	stage2_wp_range(&kvm->arch.mmu, start, end);
> +	kvm_stage2_wp_range(&kvm->arch.mmu, start, end);
> +	kvm_nested_s2_wp(kvm);
>  	spin_unlock(&kvm->mmu_lock);
>  	kvm_flush_remote_tlbs(kvm);
>  }
> @@ -875,7 +887,7 @@ static void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
>  	phys_addr_t start = (base_gfn +  __ffs(mask)) << PAGE_SHIFT;
>  	phys_addr_t end = (base_gfn + __fls(mask) + 1) << PAGE_SHIFT;
>  
> -	stage2_wp_range(&kvm->arch.mmu, start, end);
> +	kvm_stage2_wp_range(&kvm->arch.mmu, start, end);
>  }
>  
>  /*
> @@ -890,6 +902,7 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
>  		gfn_t gfn_offset, unsigned long mask)
>  {
>  	kvm_mmu_write_protect_pt_masked(kvm, slot, gfn_offset, mask);
> +	kvm_nested_s2_wp(kvm);
>  }
>  
>  static void kvm_send_hwpoison_signal(unsigned long address, short lsb)
> @@ -1529,6 +1542,7 @@ bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
>  			     (range->end - range->start) << PAGE_SHIFT,
>  			     range->may_block);
>  
> +	kvm_nested_s2_clear(kvm);
>  	return false;
>  }
>  
> @@ -1560,6 +1574,7 @@ bool kvm_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
>  			       PAGE_SIZE, __pfn_to_phys(pfn),
>  			       KVM_PGTABLE_PROT_R, NULL);
>  
> +	kvm_nested_s2_clear(kvm);
>  	return false;
>  }
>  
> @@ -1578,6 +1593,11 @@ bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
>  					range->start << PAGE_SHIFT);
>  	pte = __pte(kpte);
>  	return pte_valid(pte) && pte_young(pte);
> +
> +	/*
> +	 * TODO: Handle nested_mmu structures here using the reverse mapping in
> +	 * a later version of patch series.
> +	 */

Hm... does this mean that at the moment KVM cannot age a page in a nested MMU?

>  }
>  
>  bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
> @@ -1789,6 +1809,7 @@ void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
>  
>  	spin_lock(&kvm->mmu_lock);
>  	kvm_unmap_stage2_range(&kvm->arch.mmu, gpa, size);
> +	kvm_nested_s2_clear(kvm);
>  	spin_unlock(&kvm->mmu_lock);
>  }
>  
> diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
> index a74ffb1d2064..b39af4d87787 100644
> --- a/arch/arm64/kvm/nested.c
> +++ b/arch/arm64/kvm/nested.c
> @@ -505,6 +505,45 @@ int kvm_inject_s2_fault(struct kvm_vcpu *vcpu, u64 esr_el2)
>  	return kvm_inject_nested_sync(vcpu, esr_el2);
>  }
>  
> +/* expects kvm->mmu_lock to be held */

The function could do a lockdep_assert_held(&kvm->mmu_lock), which would serve
the double purpose of documenting that the lock must be held and of verifying
that it actually is when lockdep checking is enabled.
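
Something like this, as an untested sketch on top of this patch (the same
assertion would apply to kvm_nested_s2_clear() and kvm_nested_s2_flush()):

	void kvm_nested_s2_wp(struct kvm *kvm)
	{
		int i;

		lockdep_assert_held(&kvm->mmu_lock);

		for (i = 0; i < kvm->arch.nested_mmus_size; i++) {
			struct kvm_s2_mmu *mmu = &kvm->arch.nested_mmus[i];

			if (kvm_s2_mmu_valid(mmu))
				kvm_stage2_wp_range(mmu, 0, kvm_phys_size(kvm));
		}
	}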

> +void kvm_nested_s2_wp(struct kvm *kvm)
> +{
> +	int i;
> +
> +	for (i = 0; i < kvm->arch.nested_mmus_size; i++) {
> +		struct kvm_s2_mmu *mmu = &kvm->arch.nested_mmus[i];
> +
> +		if (kvm_s2_mmu_valid(mmu))
> +			kvm_stage2_wp_range(mmu, 0, kvm_phys_size(kvm));

I'm not sure about KVM/arm64's plans to add support for the dirty ring, but doing
this means that there is a chance that duplicate pfns are added to the ring if
the same pfn is present in multiple MMUs.

Thanks,
Alex

> +	}
> +}
> +
> +/* expects kvm->mmu_lock to be held */
> +void kvm_nested_s2_clear(struct kvm *kvm)
> +{
> +	int i;
> +
> +	for (i = 0; i < kvm->arch.nested_mmus_size; i++) {
> +		struct kvm_s2_mmu *mmu = &kvm->arch.nested_mmus[i];
> +
> +		if (kvm_s2_mmu_valid(mmu))
> +			kvm_unmap_stage2_range(mmu, 0, kvm_phys_size(kvm));
> +	}
> +}
> +
> +/* expects kvm->mmu_lock to be held */
> +void kvm_nested_s2_flush(struct kvm *kvm)
> +{
> +	int i;
> +
> +	for (i = 0; i < kvm->arch.nested_mmus_size; i++) {
> +		struct kvm_s2_mmu *mmu = &kvm->arch.nested_mmus[i];
> +
> +		if (kvm_s2_mmu_valid(mmu))
> +			kvm_stage2_flush_range(mmu, 0, kvm_phys_size(kvm));
> +	}
> +}
> +
>  /*
>   * Inject wfx to the virtual EL2 if this is not from the virtual EL2 and
>   * the virtual HCR_EL2.TWX is set. Otherwise, let the host hypervisor
> -- 
> 2.30.2
> 

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 378+ messages in thread

* Re: [PATCH v6 39/64] KVM: arm64: nv: Set a handler for the system instruction traps
  2022-01-28 12:18   ` Marc Zyngier
  (?)
@ 2022-02-24 11:59     ` Alexandru Elisei
  -1 siblings, 0 replies; 378+ messages in thread
From: Alexandru Elisei @ 2022-02-24 11:59 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, Andre Przywara, Christoffer Dall,
	Jintack Lim, Haibo Xu, Ganapatrao Kulkarni, Chase Conklin,
	Russell King (Oracle),
	James Morse, Suzuki K Poulose, karl.heubaum, mihai.carabas,
	miguel.luis, kernel-team

Hi,

On Fri, Jan 28, 2022 at 12:18:47PM +0000, Marc Zyngier wrote:
> From: Jintack Lim <jintack.lim@linaro.org>
> 
> When HCR.NV bit is set, execution of the EL2 translation regime address
> translation instructions and TLB maintenance instructions are trapped to
> EL2. In addition, execution of the EL1 translation regime address
> translation instructions and TLB maintenance instructions that are only
> accessible from EL2 and above are trapped to EL2. In these cases,
> ESR_EL2.EC will be set to 0x18.
> 
> Rework the system instruction emulation framework to handle potentially
> all system instruction traps other than MSR/MRS instructions. Those
> system instructions would be AT and TLBI instructions controlled by
> HCR_EL2.NV, AT, and TTLB bits.
> 
> Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
> [maz: squashed two patches together, redispatched various bits around]
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/include/asm/kvm_host.h |  4 +--
>  arch/arm64/kvm/handle_exit.c      |  2 +-
>  arch/arm64/kvm/sys_regs.c         | 48 +++++++++++++++++++++++++------
>  3 files changed, 42 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index a15183d0e1bf..0b887364f994 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -407,7 +407,7 @@ struct kvm_vcpu_arch {
>  	/*
>  	 * Guest registers we preserve during guest debugging.
>  	 *
> -	 * These shadow registers are updated by the kvm_handle_sys_reg
> +	 * These shadow registers are updated by the kvm_handle_sys
>  	 * trap handler if the guest accesses or updates them while we
>  	 * are using guest debug.
>  	 */
> @@ -724,7 +724,7 @@ int kvm_handle_cp14_32(struct kvm_vcpu *vcpu);
>  int kvm_handle_cp14_64(struct kvm_vcpu *vcpu);
>  int kvm_handle_cp15_32(struct kvm_vcpu *vcpu);
>  int kvm_handle_cp15_64(struct kvm_vcpu *vcpu);
> -int kvm_handle_sys_reg(struct kvm_vcpu *vcpu);
> +int kvm_handle_sys(struct kvm_vcpu *vcpu);
>  
>  void kvm_reset_sys_regs(struct kvm_vcpu *vcpu);
>  
> diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
> index 867de65eb766..d135fc7e6883 100644
> --- a/arch/arm64/kvm/handle_exit.c
> +++ b/arch/arm64/kvm/handle_exit.c
> @@ -236,7 +236,7 @@ static exit_handle_fn arm_exit_handlers[] = {
>  	[ESR_ELx_EC_SMC32]	= handle_smc,
>  	[ESR_ELx_EC_HVC64]	= handle_hvc,
>  	[ESR_ELx_EC_SMC64]	= handle_smc,
> -	[ESR_ELx_EC_SYS64]	= kvm_handle_sys_reg,
> +	[ESR_ELx_EC_SYS64]	= kvm_handle_sys,
>  	[ESR_ELx_EC_SVE]	= handle_sve,
>  	[ESR_ELx_EC_ERET]	= kvm_handle_eret,
>  	[ESR_ELx_EC_IABT_LOW]	= kvm_handle_guest_abort,
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 5e8876177ce6..f669618f966b 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -1771,10 +1771,6 @@ static bool access_spsr_el2(struct kvm_vcpu *vcpu,
>   * more demanding guest...
>   */
>  static const struct sys_reg_desc sys_reg_descs[] = {
> -	{ SYS_DESC(SYS_DC_ISW), access_dcsw },
> -	{ SYS_DESC(SYS_DC_CSW), access_dcsw },
> -	{ SYS_DESC(SYS_DC_CISW), access_dcsw },
> -
>  	DBG_BCR_BVR_WCR_WVR_EL1(0),
>  	DBG_BCR_BVR_WCR_WVR_EL1(1),
>  	{ SYS_DESC(SYS_MDCCINT_EL1), trap_debug_regs, reset_val, MDCCINT_EL1, 0 },
> @@ -2240,6 +2236,14 @@ static const struct sys_reg_desc sys_reg_descs[] = {
>  	EL2_REG(SP_EL2, NULL, reset_unknown, 0),
>  };
>  
> +#define SYS_INSN_TO_DESC(insn, access_fn, forward_fn)	\
> +	{ SYS_DESC((insn)), (access_fn), NULL, 0, 0, NULL, NULL, (forward_fn) }

This macro looks out of place: it's not used anywhere in this patch, and the
next patch deletes it.

> +static struct sys_reg_desc sys_insn_descs[] = {
> +	{ SYS_DESC(SYS_DC_ISW), access_dcsw },
> +	{ SYS_DESC(SYS_DC_CSW), access_dcsw },
> +	{ SYS_DESC(SYS_DC_CISW), access_dcsw },
> +};
> +
>  static bool trap_dbgdidr(struct kvm_vcpu *vcpu,
>  			struct sys_reg_params *p,
>  			const struct sys_reg_desc *r)
> @@ -2786,6 +2790,24 @@ static int emulate_sys_reg(struct kvm_vcpu *vcpu,
>  	return 1;
>  }
>  
> +static int emulate_sys_instr(struct kvm_vcpu *vcpu, struct sys_reg_params *p)
> +{
> +	const struct sys_reg_desc *r;
> +
> +	/* Search from the system instruction table. */
> +	r = find_reg(p, sys_insn_descs, ARRAY_SIZE(sys_insn_descs));
> +
> +	if (likely(r)) {
> +		perform_access(vcpu, p, r);
> +	} else {
> +		kvm_err("Unsupported guest sys instruction at: %lx\n",
> +			*vcpu_pc(vcpu));
> +		print_sys_reg_instr(p);
> +		kvm_inject_undefined(vcpu);
> +	}
> +	return 1;
> +}
> +
>  /**
>   * kvm_reset_sys_regs - sets system registers to reset value
>   * @vcpu: The VCPU pointer
> @@ -2803,10 +2825,11 @@ void kvm_reset_sys_regs(struct kvm_vcpu *vcpu)
>  }
>  
>  /**
> - * kvm_handle_sys_reg -- handles a mrs/msr trap on a guest sys_reg access
> + * kvm_handle_sys-- handles a system instruction or mrs/msr instruction trap
> +		    on a guest execution
>   * @vcpu: The VCPU pointer
>   */
> -int kvm_handle_sys_reg(struct kvm_vcpu *vcpu)
> +int kvm_handle_sys(struct kvm_vcpu *vcpu)
>  {
>  	struct sys_reg_params params;
>  	unsigned long esr = kvm_vcpu_get_esr(vcpu);
> @@ -2818,10 +2841,16 @@ int kvm_handle_sys_reg(struct kvm_vcpu *vcpu)
>  	params = esr_sys64_to_params(esr);
>  	params.regval = vcpu_get_reg(vcpu, Rt);
>  
> -	ret = emulate_sys_reg(vcpu, &params);
> +	if (params.Op0 == 1) {
> +		/* System instructions */
> +		ret = emulate_sys_instr(vcpu, &params);
> +	} else {

This doesn't look right: according to ARM DDI 0487G.a, page C5-393, Op0 =
{0,2} represents instruction classes different from system register
accesses, yet KVM puts them in the same bucket as system register traps.

May I suggest this change:

	if (params.Op0 == 3) {
		/* do emulate_sys_reg() */
	} else {
		/* do emulate_sys_instr() */
	}
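
Fleshed out (still just a sketch, reusing the helpers introduced by this
patch), that would look like:

	if (params.Op0 == 3) {
		/* MRS/MSR: system register access */
		ret = emulate_sys_reg(vcpu, &params);
		if (!params.is_write)
			vcpu_set_reg(vcpu, Rt, params.regval);
	} else {
		/* Everything else goes through the system instruction table */
		ret = emulate_sys_instr(vcpu, &params);
	}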

Thanks,
Alex

> +		/* MRS/MSR instructions */
> +		ret = emulate_sys_reg(vcpu, &params);
> +		if (!params.is_write)
> +			vcpu_set_reg(vcpu, Rt, params.regval);
> +	}
>  
> -	if (!params.is_write)
> -		vcpu_set_reg(vcpu, Rt, params.regval);
>  	return ret;
>  }
>  
> @@ -3237,6 +3266,7 @@ void kvm_sys_reg_table_init(void)
>  	BUG_ON(check_sysreg_table(cp15_regs, ARRAY_SIZE(cp15_regs), true));
>  	BUG_ON(check_sysreg_table(cp15_64_regs, ARRAY_SIZE(cp15_64_regs), true));
>  	BUG_ON(check_sysreg_table(invariant_sys_regs, ARRAY_SIZE(invariant_sys_regs), false));
> +	BUG_ON(check_sysreg_table(sys_insn_descs, ARRAY_SIZE(sys_insn_descs), false));
>  
>  	/* We abuse the reset function to overwrite the table itself. */
>  	for (i = 0; i < ARRAY_SIZE(invariant_sys_regs); i++)
> -- 
> 2.30.2
> 

^ permalink raw reply	[flat|nested] 378+ messages in thread

* Re: [PATCH v6 34/64] KVM: arm64: nv: Support multiple nested Stage-2 mmu structures
  2022-01-28 12:18   ` Marc Zyngier
  (?)
@ 2022-02-24 14:25     ` Alexandru Elisei
  -1 siblings, 0 replies; 378+ messages in thread
From: Alexandru Elisei @ 2022-02-24 14:25 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, Andre Przywara, Christoffer Dall,
	Jintack Lim, Haibo Xu, Ganapatrao Kulkarni, Chase Conklin,
	Russell King (Oracle),
	James Morse, Suzuki K Poulose, karl.heubaum, mihai.carabas,
	miguel.luis, kernel-team

Hi,

On Fri, Jan 28, 2022 at 12:18:42PM +0000, Marc Zyngier wrote:
> Add Stage-2 mmu data structures for virtual EL2 and for nested guests.
> We don't yet populate shadow Stage-2 page tables, but we now have a
> framework for getting to a shadow Stage-2 pgd.
> 
> We allocate twice the number of vcpus as Stage-2 mmu structures because
> that's sufficient for each vcpu running two translation regimes without
> having to flush the Stage-2 page tables.
> 
> Co-developed-by: Christoffer Dall <christoffer.dall@arm.com>
> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/include/asm/kvm_host.h   |  29 +++++
>  arch/arm64/include/asm/kvm_mmu.h    |   9 ++
>  arch/arm64/include/asm/kvm_nested.h |   7 +
>  arch/arm64/kvm/arm.c                |  16 ++-
>  arch/arm64/kvm/mmu.c                |  29 +++--
>  arch/arm64/kvm/nested.c             | 195 ++++++++++++++++++++++++++++
>  6 files changed, 275 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index fa253f08e0fd..a15183d0e1bf 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -101,14 +101,43 @@ struct kvm_s2_mmu {
>  	int __percpu *last_vcpu_ran;
>  
>  	struct kvm_arch *arch;
> +
> +	/*
> +	 * For a shadow stage-2 MMU, the virtual vttbr programmed by the guest
> +	 * hypervisor.  Unused for kvm_arch->mmu. Set to 1 when the structure
> +	 * contains no valid information.
> +	 */
> +	u64	vttbr;
> +
> +	/* true when this represents a nested context where virtual HCR_EL2.VM == 1 */
> +	bool	nested_stage2_enabled;
> +
> +	/*
> +	 *  0: Nobody is currently using this, check vttbr for validity
> +	 * >0: Somebody is actively using this.
> +	 */
> +	atomic_t refcnt;
>  };
>  
> +static inline bool kvm_s2_mmu_valid(struct kvm_s2_mmu *mmu)
> +{
> +	return !(mmu->vttbr & 1);
> +}
> +
>  struct kvm_arch_memory_slot {
>  };
>  
>  struct kvm_arch {
>  	struct kvm_s2_mmu mmu;
>  
> +	/*
> +	 * Stage 2 paging state for VMs with nested virtualization, using a
> +	 * virtual VMID.
> +	 */
> +	struct kvm_s2_mmu *nested_mmus;
> +	size_t nested_mmus_size;
> +	int nested_mmus_next;
> +
>  	/* VTCR_EL2 value for this VM */
>  	u64    vtcr;
>  
> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> index 1b314b2a69bc..0750d022bbf8 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -116,6 +116,7 @@ alternative_cb_end
>  #include <asm/cacheflush.h>
>  #include <asm/mmu_context.h>
>  #include <asm/kvm_emulate.h>
> +#include <asm/kvm_nested.h>
>  
>  void kvm_update_va_mask(struct alt_instr *alt,
>  			__le32 *origptr, __le32 *updptr, int nr_inst);
> @@ -161,6 +162,7 @@ int create_hyp_exec_mappings(phys_addr_t phys_addr, size_t size,
>  			     void **haddr);
>  void free_hyp_pgds(void);
>  
> +void kvm_unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size);
>  void stage2_unmap_vm(struct kvm *kvm);
>  int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu);
>  void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu);
> @@ -296,5 +298,12 @@ static inline struct kvm *kvm_s2_mmu_to_kvm(struct kvm_s2_mmu *mmu)
>  {
>  	return container_of(mmu->arch, struct kvm, arch);
>  }
> +
> +static inline u64 get_vmid(u64 vttbr)
> +{
> +	return (vttbr & VTTBR_VMID_MASK(kvm_get_vmid_bits())) >>
> +		VTTBR_VMID_SHIFT;
> +}
> +
>  #endif /* __ASSEMBLY__ */
>  #endif /* __ARM64_KVM_MMU_H__ */
> diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
> index 7d398510fd9d..8bb7159f2b6b 100644
> --- a/arch/arm64/include/asm/kvm_nested.h
> +++ b/arch/arm64/include/asm/kvm_nested.h
> @@ -65,6 +65,13 @@ static inline u64 translate_cnthctl_el2_to_cntkctl_el1(u64 cnthctl)
>  		(cnthctl & (CNTHCTL_EVNTI | CNTHCTL_EVNTDIR | CNTHCTL_EVNTEN)));
>  }
>  
> +extern void kvm_init_nested(struct kvm *kvm);
> +extern int kvm_vcpu_init_nested(struct kvm_vcpu *vcpu);
> +extern void kvm_init_nested_s2_mmu(struct kvm_s2_mmu *mmu);
> +extern struct kvm_s2_mmu *lookup_s2_mmu(struct kvm *kvm, u64 vttbr, u64 hcr);
> +extern void kvm_vcpu_load_hw_mmu(struct kvm_vcpu *vcpu);
> +extern void kvm_vcpu_put_hw_mmu(struct kvm_vcpu *vcpu);
> +
>  int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe);
>  extern bool __forward_traps(struct kvm_vcpu *vcpu, unsigned int reg,
>  			    u64 control_bit);
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 06ca11e90482..14f85f1e15b2 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -37,6 +37,7 @@
>  #include <asm/kvm_arm.h>
>  #include <asm/kvm_asm.h>
>  #include <asm/kvm_mmu.h>
> +#include <asm/kvm_nested.h>
>  #include <asm/kvm_emulate.h>
>  #include <asm/sections.h>
>  
> @@ -146,6 +147,8 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>  	if (ret)
>  		return ret;
>  
> +	kvm_init_nested(kvm);
> +
>  	ret = kvm_share_hyp(kvm, kvm + 1);
>  	if (ret)
>  		goto out_free_stage2_pgd;
> @@ -375,6 +378,9 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>  	struct kvm_s2_mmu *mmu;
>  	int *last_ran;
>  
> +	if (vcpu_has_nv(vcpu))
> +		kvm_vcpu_load_hw_mmu(vcpu);
> +
>  	mmu = vcpu->arch.hw_mmu;
>  	last_ran = this_cpu_ptr(mmu->last_vcpu_ran);
>  
> @@ -423,6 +429,9 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>  	kvm_vgic_put(vcpu);
>  	kvm_vcpu_pmu_restore_host(vcpu);
>  
> +	if (vcpu_has_nv(vcpu))
> +		kvm_vcpu_put_hw_mmu(vcpu);
> +
>  	vcpu->cpu = -1;
>  }
>  
> @@ -1122,8 +1131,13 @@ static int kvm_vcpu_set_target(struct kvm_vcpu *vcpu,
>  
>  	vcpu->arch.target = phys_target;
>  
> +	/* Prepare for nested if required */
> +	ret = kvm_vcpu_init_nested(vcpu);
> +
>  	/* Now we know what it is, we can reset it. */
> -	ret = kvm_reset_vcpu(vcpu);
> +	if (!ret)
> +		ret = kvm_reset_vcpu(vcpu);
> +
>  	if (ret) {
>  		vcpu->arch.target = -1;
>  		bitmap_zero(vcpu->arch.features, KVM_VCPU_MAX_FEATURES);
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index bc2aba953299..55525fd5743d 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -162,7 +162,7 @@ static void invalidate_icache_guest_page(void *va, size_t size)
>   * does.
>   */
>  /**
> - * unmap_stage2_range -- Clear stage2 page table entries to unmap a range
> + * __unmap_stage2_range -- Clear stage2 page table entries to unmap a range
>   * @mmu:   The KVM stage-2 MMU pointer
>   * @start: The intermediate physical base address of the range to unmap
>   * @size:  The size of the area to unmap
> @@ -185,7 +185,7 @@ static void __unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64
>  				   may_block));
>  }
>  
> -static void unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size)
> +void kvm_unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size)
>  {
>  	__unmap_stage2_range(mmu, start, size, true);
>  }
> @@ -628,7 +628,20 @@ int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu)
>  	int cpu, err;
>  	struct kvm_pgtable *pgt;
>  
> +	/*
> +	 * If we already have our page tables in place, and that the
> +	 * MMU context is the canonical one, we have a bug somewhere,
> +	 * as this is only supposed to ever happen once per VM.
> +	 *
> +	 * Otherwise, we're building nested page tables, and that's
> +	 * probably because userspace called KVM_ARM_VCPU_INIT more
> +	 * than once on the same vcpu. Since that's actually legal,
> +	 * don't kick a fuss and leave gracefully.
> +	 */
>  	if (mmu->pgt != NULL) {
> +		if (&kvm->arch.mmu != mmu)
> +			return 0;
> +
>  		kvm_err("kvm_arch already initialized?\n");
>  		return -EINVAL;
>  	}
> @@ -654,6 +667,9 @@ int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu)
>  	mmu->pgt = pgt;
>  	mmu->pgd_phys = __pa(pgt->pgd);
>  	WRITE_ONCE(mmu->vmid.vmid_gen, 0);
> +
> +	kvm_init_nested_s2_mmu(mmu);
> +
>  	return 0;
>  
>  out_destroy_pgtable:
> @@ -699,7 +715,7 @@ static void stage2_unmap_memslot(struct kvm *kvm,
>  
>  		if (!(vma->vm_flags & VM_PFNMAP)) {
>  			gpa_t gpa = addr + (vm_start - memslot->userspace_addr);
> -			unmap_stage2_range(&kvm->arch.mmu, gpa, vm_end - vm_start);
> +			kvm_unmap_stage2_range(&kvm->arch.mmu, gpa, vm_end - vm_start);
>  		}
>  		hva = vm_end;
>  	} while (hva < reg_end);
> @@ -1681,11 +1697,6 @@ void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen)
>  {
>  }
>  
> -void kvm_arch_flush_shadow_all(struct kvm *kvm)
> -{
> -	kvm_free_stage2_pgd(&kvm->arch.mmu);
> -}
> -
>  void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
>  				   struct kvm_memory_slot *slot)
>  {
> @@ -1693,7 +1704,7 @@ void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
>  	phys_addr_t size = slot->npages << PAGE_SHIFT;
>  
>  	spin_lock(&kvm->mmu_lock);
> -	unmap_stage2_range(&kvm->arch.mmu, gpa, size);
> +	kvm_unmap_stage2_range(&kvm->arch.mmu, gpa, size);
>  	spin_unlock(&kvm->mmu_lock);
>  }
>  
> diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
> index 254152cd791e..bfa2b9229173 100644
> --- a/arch/arm64/kvm/nested.c
> +++ b/arch/arm64/kvm/nested.c
> @@ -7,12 +7,189 @@
>  #include <linux/kvm.h>
>  #include <linux/kvm_host.h>
>  
> +#include <asm/kvm_arm.h>
>  #include <asm/kvm_emulate.h>
> +#include <asm/kvm_mmu.h>
>  #include <asm/kvm_nested.h>
>  #include <asm/sysreg.h>
>  
>  #include "sys_regs.h"
>  
> +void kvm_init_nested(struct kvm *kvm)
> +{
> +	kvm->arch.nested_mmus = NULL;
> +	kvm->arch.nested_mmus_size = 0;
> +}
> +
> +int kvm_vcpu_init_nested(struct kvm_vcpu *vcpu)
> +{
> +	struct kvm *kvm = vcpu->kvm;
> +	struct kvm_s2_mmu *tmp;
> +	int num_mmus;
> +	int ret = -ENOMEM;
> +
> +	if (!test_bit(KVM_ARM_VCPU_HAS_EL2, vcpu->arch.features))
> +		return 0;
> +
> +	if (!cpus_have_final_cap(ARM64_HAS_NESTED_VIRT))
> +		return -EINVAL;
> +
> +	mutex_lock(&kvm->lock);
> +
> +	/*
> +	 * Let's treat memory allocation failures as benign: If we fail to
> +	 * allocate anything, return an error and keep the allocated array
> +	 * alive. Userspace may try to recover by initializing the vcpu
> +	 * again, and there is no reason to affect the whole VM for this.
> +	 */
> +	num_mmus = atomic_read(&kvm->online_vcpus) * 2;
> +	tmp = krealloc(kvm->arch.nested_mmus,
> +		       num_mmus * sizeof(*kvm->arch.nested_mmus),
> +		       GFP_KERNEL_ACCOUNT | __GFP_ZERO);
> +	if (tmp) {
> +		/*
> +		 * If we went through a reallocation, adjust the MMU
> +		 * back-pointers in the previously initialised
> +		 * pg_table structures.
> +		 */
> +		if (kvm->arch.nested_mmus != tmp) {
> +			int i;
> +
> +			for (i = 0; i < num_mmus - 2; i++)
> +				tmp[i].pgt->mmu = &tmp[i];
> +		}
> +
> +		if (kvm_init_stage2_mmu(kvm, &tmp[num_mmus - 1]) ||
> +		    kvm_init_stage2_mmu(kvm, &tmp[num_mmus - 2])) {
> +			kvm_free_stage2_pgd(&tmp[num_mmus - 1]);
> +			kvm_free_stage2_pgd(&tmp[num_mmus - 2]);
> +		} else {
> +			kvm->arch.nested_mmus_size = num_mmus;
> +			ret = 0;
> +		}
> +
> +		kvm->arch.nested_mmus = tmp;
> +	}
> +
> +	mutex_unlock(&kvm->lock);
> +	return ret;
> +}
> +
> +/* Must be called with kvm->lock held */
                               ^^^^
Either that should be kvm->mmu_lock, or lookup_s2_mmu() is used incorrectly
in this patch.
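
Assuming the mmu_lock is indeed the lock that is meant, a lockdep annotation
would both fix the comment and catch misuse at runtime, e.g. (sketch):

	-/* Must be called with kvm->lock held */
	+/* Must be called with kvm->mmu_lock held */
	 struct kvm_s2_mmu *lookup_s2_mmu(struct kvm *kvm, u64 vttbr, u64 hcr)
	 {
	 	bool nested_stage2_enabled = hcr & HCR_VM;
	 	int i;
	 
	+	lockdep_assert_held(&kvm->mmu_lock);
	+
	 	/* Don't consider the CnP bit for the vttbr match */
	 	vttbr = vttbr & ~VTTBR_CNP_BIT;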

Thanks,
Alex

> +struct kvm_s2_mmu *lookup_s2_mmu(struct kvm *kvm, u64 vttbr, u64 hcr)
> +{
> +	bool nested_stage2_enabled = hcr & HCR_VM;
> +	int i;
> +
> +	/* Don't consider the CnP bit for the vttbr match */
> +	vttbr = vttbr & ~VTTBR_CNP_BIT;
> +
> +	/*
> +	 * Two possibilities when looking up a S2 MMU context:
> +	 *
> +	 * - either S2 is enabled in the guest, and we need a context that
> +         *   is S2-enabled and matches the full VTTBR (VMID+BADDR), which
> +         *   makes it safe from a TLB conflict perspective (a broken guest
> +         *   won't be able to generate them),
> +	 *
> +	 * - or S2 is disabled, and we need a context that is S2-disabled
> +         *   and matches the VMID only, as all TLBs are tagged by VMID even
> +         *   if S2 translation is disabled.
> +	 */
> +	for (i = 0; i < kvm->arch.nested_mmus_size; i++) {
> +		struct kvm_s2_mmu *mmu = &kvm->arch.nested_mmus[i];
> +
> +		if (!kvm_s2_mmu_valid(mmu))
> +			continue;
> +
> +		if (nested_stage2_enabled &&
> +		    mmu->nested_stage2_enabled &&
> +		    vttbr == mmu->vttbr)
> +			return mmu;
> +
> +		if (!nested_stage2_enabled &&
> +		    !mmu->nested_stage2_enabled &&
> +		    get_vmid(vttbr) == get_vmid(mmu->vttbr))
> +			return mmu;
> +	}
> +	return NULL;
> +}
> +
> +static struct kvm_s2_mmu *get_s2_mmu_nested(struct kvm_vcpu *vcpu)
> +{
> +	struct kvm *kvm = vcpu->kvm;
> +	u64 vttbr = vcpu_read_sys_reg(vcpu, VTTBR_EL2);
> +	u64 hcr= vcpu_read_sys_reg(vcpu, HCR_EL2);
> +	struct kvm_s2_mmu *s2_mmu;
> +	int i;
> +
> +	s2_mmu = lookup_s2_mmu(kvm, vttbr, hcr);
> +	if (s2_mmu)
> +		goto out;
> +
> +	/*
> +	 * Make sure we don't always search from the same point, or we
> +	 * will always reuse a potentially active context, leaving
> +	 * free contexts unused.
> +	 */
> +	for (i = kvm->arch.nested_mmus_next;
> +	     i < (kvm->arch.nested_mmus_size + kvm->arch.nested_mmus_next);
> +	     i++) {
> +		s2_mmu = &kvm->arch.nested_mmus[i % kvm->arch.nested_mmus_size];
> +
> +		if (atomic_read(&s2_mmu->refcnt) == 0)
> +			break;
> +	}
> +	BUG_ON(atomic_read(&s2_mmu->refcnt)); /* We have struct MMUs to spare */
> +
> +	/* Set the scene for the next search */
> +	kvm->arch.nested_mmus_next = (i + 1) % kvm->arch.nested_mmus_size;
> +
> +	if (kvm_s2_mmu_valid(s2_mmu)) {
> +		/* Clear the old state */
> +		kvm_unmap_stage2_range(s2_mmu, 0, kvm_phys_size(kvm));
> +		if (s2_mmu->vmid.vmid_gen)
> +			kvm_call_hyp(__kvm_tlb_flush_vmid, s2_mmu);
> +	}
> +
> +	/*
> +	 * The virtual VMID (modulo CnP) will be used as a key when matching
> +	 * an existing kvm_s2_mmu.
> +	 */
> +	s2_mmu->vttbr = vttbr & ~VTTBR_CNP_BIT;
> +	s2_mmu->nested_stage2_enabled = hcr & HCR_VM;
> +
> +out:
> +	atomic_inc(&s2_mmu->refcnt);
> +	return s2_mmu;
> +}
> +
> +void kvm_init_nested_s2_mmu(struct kvm_s2_mmu *mmu)
> +{
> +	mmu->vttbr = 1;
> +	mmu->nested_stage2_enabled = false;
> +	atomic_set(&mmu->refcnt, 0);
> +}
> +
> +void kvm_vcpu_load_hw_mmu(struct kvm_vcpu *vcpu)
> +{
> +	if (is_hyp_ctxt(vcpu)) {
> +		vcpu->arch.hw_mmu = &vcpu->kvm->arch.mmu;
> +	} else {
> +		spin_lock(&vcpu->kvm->mmu_lock);
> +		vcpu->arch.hw_mmu = get_s2_mmu_nested(vcpu);
> +		spin_unlock(&vcpu->kvm->mmu_lock);
> +	}
> +}
> +
> +void kvm_vcpu_put_hw_mmu(struct kvm_vcpu *vcpu)
> +{
> +	if (vcpu->arch.hw_mmu != &vcpu->kvm->arch.mmu) {
> +		atomic_dec(&vcpu->arch.hw_mmu->refcnt);
> +		vcpu->arch.hw_mmu = NULL;
> +	}
> +}
> +
>  /*
>   * Inject wfx to the virtual EL2 if this is not from the virtual EL2 and
>   * the virtual HCR_EL2.TWX is set. Otherwise, let the host hypervisor
> @@ -31,6 +208,24 @@ int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe)
>  	return -EINVAL;
>  }
>  
> +void kvm_arch_flush_shadow_all(struct kvm *kvm)
> +{
> +	int i;
> +
> +	for (i = 0; i < kvm->arch.nested_mmus_size; i++) {
> +		struct kvm_s2_mmu *mmu = &kvm->arch.nested_mmus[i];
> +
> +		WARN_ON(atomic_read(&mmu->refcnt));
> +
> +		if (!atomic_read(&mmu->refcnt))
> +			kvm_free_stage2_pgd(mmu);
> +	}
> +	kfree(kvm->arch.nested_mmus);
> +	kvm->arch.nested_mmus = NULL;
> +	kvm->arch.nested_mmus_size = 0;
> +	kvm_free_stage2_pgd(&kvm->arch.mmu);
> +}
> +
>  /*
>   * Our emulated CPU doesn't support all the possible features. For the
>   * sake of simplicity (and probably mental sanity), wipe out a number
> -- 
> 2.30.2
> 

^ permalink raw reply	[flat|nested] 378+ messages in thread

* Re: [PATCH v6 34/64] KVM: arm64: nv: Support multiple nested Stage-2 mmu structures
@ 2022-02-24 14:25     ` Alexandru Elisei
  0 siblings, 0 replies; 378+ messages in thread
From: Alexandru Elisei @ 2022-02-24 14:25 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, Andre Przywara, Christoffer Dall,
	Jintack Lim, Haibo Xu, Ganapatrao Kulkarni, Chase Conklin,
	Russell King (Oracle),
	James Morse, Suzuki K Poulose, karl.heubaum, mihai.carabas,
	miguel.luis, kernel-team

Hi,

On Fri, Jan 28, 2022 at 12:18:42PM +0000, Marc Zyngier wrote:
> Add Stage-2 mmu data structures for virtual EL2 and for nested guests.
> We don't yet populate shadow Stage-2 page tables, but we now have a
> framework for getting to a shadow Stage-2 pgd.
> 
> We allocate twice the number of vcpus as Stage-2 mmu structures because
> that's sufficient for each vcpu running two translation regimes without
> having to flush the Stage-2 page tables.
> 
> Co-developed-by: Christoffer Dall <christoffer.dall@arm.com>
> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/include/asm/kvm_host.h   |  29 +++++
>  arch/arm64/include/asm/kvm_mmu.h    |   9 ++
>  arch/arm64/include/asm/kvm_nested.h |   7 +
>  arch/arm64/kvm/arm.c                |  16 ++-
>  arch/arm64/kvm/mmu.c                |  29 +++--
>  arch/arm64/kvm/nested.c             | 195 ++++++++++++++++++++++++++++
>  6 files changed, 275 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index fa253f08e0fd..a15183d0e1bf 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -101,14 +101,43 @@ struct kvm_s2_mmu {
>  	int __percpu *last_vcpu_ran;
>  
>  	struct kvm_arch *arch;
> +
> +	/*
> +	 * For a shadow stage-2 MMU, the virtual vttbr programmed by the guest
> +	 * hypervisor.  Unused for kvm_arch->mmu. Set to 1 when the structure
> +	 * contains no valid information.
> +	 */
> +	u64	vttbr;
> +
> +	/* true when this represents a nested context where virtual HCR_EL2.VM == 1 */
> +	bool	nested_stage2_enabled;
> +
> +	/*
> +	 *  0: Nobody is currently using this, check vttbr for validity
> +	 * >0: Somebody is actively using this.
> +	 */
> +	atomic_t refcnt;
>  };
>  
> +static inline bool kvm_s2_mmu_valid(struct kvm_s2_mmu *mmu)
> +{
> +	return !(mmu->vttbr & 1);
> +}
> +
>  struct kvm_arch_memory_slot {
>  };
>  
>  struct kvm_arch {
>  	struct kvm_s2_mmu mmu;
>  
> +	/*
> +	 * Stage 2 paging state for VMs with nested virt using a virtual
> +	 * VMID.
> +	 */
> +	struct kvm_s2_mmu *nested_mmus;
> +	size_t nested_mmus_size;
> +	int nested_mmus_next;
> +
>  	/* VTCR_EL2 value for this VM */
>  	u64    vtcr;
>  
> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> index 1b314b2a69bc..0750d022bbf8 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -116,6 +116,7 @@ alternative_cb_end
>  #include <asm/cacheflush.h>
>  #include <asm/mmu_context.h>
>  #include <asm/kvm_emulate.h>
> +#include <asm/kvm_nested.h>
>  
>  void kvm_update_va_mask(struct alt_instr *alt,
>  			__le32 *origptr, __le32 *updptr, int nr_inst);
> @@ -161,6 +162,7 @@ int create_hyp_exec_mappings(phys_addr_t phys_addr, size_t size,
>  			     void **haddr);
>  void free_hyp_pgds(void);
>  
> +void kvm_unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size);
>  void stage2_unmap_vm(struct kvm *kvm);
>  int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu);
>  void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu);
> @@ -296,5 +298,12 @@ static inline struct kvm *kvm_s2_mmu_to_kvm(struct kvm_s2_mmu *mmu)
>  {
>  	return container_of(mmu->arch, struct kvm, arch);
>  }
> +
> +static inline u64 get_vmid(u64 vttbr)
> +{
> +	return (vttbr & VTTBR_VMID_MASK(kvm_get_vmid_bits())) >>
> +		VTTBR_VMID_SHIFT;
> +}
> +
>  #endif /* __ASSEMBLY__ */
>  #endif /* __ARM64_KVM_MMU_H__ */
> diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
> index 7d398510fd9d..8bb7159f2b6b 100644
> --- a/arch/arm64/include/asm/kvm_nested.h
> +++ b/arch/arm64/include/asm/kvm_nested.h
> @@ -65,6 +65,13 @@ static inline u64 translate_cnthctl_el2_to_cntkctl_el1(u64 cnthctl)
>  		(cnthctl & (CNTHCTL_EVNTI | CNTHCTL_EVNTDIR | CNTHCTL_EVNTEN)));
>  }
>  
> +extern void kvm_init_nested(struct kvm *kvm);
> +extern int kvm_vcpu_init_nested(struct kvm_vcpu *vcpu);
> +extern void kvm_init_nested_s2_mmu(struct kvm_s2_mmu *mmu);
> +extern struct kvm_s2_mmu *lookup_s2_mmu(struct kvm *kvm, u64 vttbr, u64 hcr);
> +extern void kvm_vcpu_load_hw_mmu(struct kvm_vcpu *vcpu);
> +extern void kvm_vcpu_put_hw_mmu(struct kvm_vcpu *vcpu);
> +
>  int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe);
>  extern bool __forward_traps(struct kvm_vcpu *vcpu, unsigned int reg,
>  			    u64 control_bit);
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 06ca11e90482..14f85f1e15b2 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -37,6 +37,7 @@
>  #include <asm/kvm_arm.h>
>  #include <asm/kvm_asm.h>
>  #include <asm/kvm_mmu.h>
> +#include <asm/kvm_nested.h>
>  #include <asm/kvm_emulate.h>
>  #include <asm/sections.h>
>  
> @@ -146,6 +147,8 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>  	if (ret)
>  		return ret;
>  
> +	kvm_init_nested(kvm);
> +
>  	ret = kvm_share_hyp(kvm, kvm + 1);
>  	if (ret)
>  		goto out_free_stage2_pgd;
> @@ -375,6 +378,9 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>  	struct kvm_s2_mmu *mmu;
>  	int *last_ran;
>  
> +	if (vcpu_has_nv(vcpu))
> +		kvm_vcpu_load_hw_mmu(vcpu);
> +
>  	mmu = vcpu->arch.hw_mmu;
>  	last_ran = this_cpu_ptr(mmu->last_vcpu_ran);
>  
> @@ -423,6 +429,9 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>  	kvm_vgic_put(vcpu);
>  	kvm_vcpu_pmu_restore_host(vcpu);
>  
> +	if (vcpu_has_nv(vcpu))
> +		kvm_vcpu_put_hw_mmu(vcpu);
> +
>  	vcpu->cpu = -1;
>  }
>  
> @@ -1122,8 +1131,13 @@ static int kvm_vcpu_set_target(struct kvm_vcpu *vcpu,
>  
>  	vcpu->arch.target = phys_target;
>  
> +	/* Prepare for nested if required */
> +	ret = kvm_vcpu_init_nested(vcpu);
> +
>  	/* Now we know what it is, we can reset it. */
> -	ret = kvm_reset_vcpu(vcpu);
> +	if (!ret)
> +		ret = kvm_reset_vcpu(vcpu);
> +
>  	if (ret) {
>  		vcpu->arch.target = -1;
>  		bitmap_zero(vcpu->arch.features, KVM_VCPU_MAX_FEATURES);
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index bc2aba953299..55525fd5743d 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -162,7 +162,7 @@ static void invalidate_icache_guest_page(void *va, size_t size)
>   * does.
>   */
>  /**
> - * unmap_stage2_range -- Clear stage2 page table entries to unmap a range
> + * __unmap_stage2_range -- Clear stage2 page table entries to unmap a range
>   * @mmu:   The KVM stage-2 MMU pointer
>   * @start: The intermediate physical base address of the range to unmap
>   * @size:  The size of the area to unmap
> @@ -185,7 +185,7 @@ static void __unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64
>  				   may_block));
>  }
>  
> -static void unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size)
> +void kvm_unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size)
>  {
>  	__unmap_stage2_range(mmu, start, size, true);
>  }
> @@ -628,7 +628,20 @@ int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu)
>  	int cpu, err;
>  	struct kvm_pgtable *pgt;
>  
> +	/*
> +	 * If we already have our page tables in place, and the
> +	 * MMU context is the canonical one, we have a bug somewhere,
> +	 * as this is only supposed to ever happen once per VM.
> +	 *
> +	 * Otherwise, we're building nested page tables, and that's
> +	 * probably because userspace called KVM_ARM_VCPU_INIT more
> +	 * than once on the same vcpu. Since that's actually legal,
> +	 * don't kick up a fuss and leave gracefully.
> +	 */
>  	if (mmu->pgt != NULL) {
> +		if (&kvm->arch.mmu != mmu)
> +			return 0;
> +
>  		kvm_err("kvm_arch already initialized?\n");
>  		return -EINVAL;
>  	}
> @@ -654,6 +667,9 @@ int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu)
>  	mmu->pgt = pgt;
>  	mmu->pgd_phys = __pa(pgt->pgd);
>  	WRITE_ONCE(mmu->vmid.vmid_gen, 0);
> +
> +	kvm_init_nested_s2_mmu(mmu);
> +
>  	return 0;
>  
>  out_destroy_pgtable:
> @@ -699,7 +715,7 @@ static void stage2_unmap_memslot(struct kvm *kvm,
>  
>  		if (!(vma->vm_flags & VM_PFNMAP)) {
>  			gpa_t gpa = addr + (vm_start - memslot->userspace_addr);
> -			unmap_stage2_range(&kvm->arch.mmu, gpa, vm_end - vm_start);
> +			kvm_unmap_stage2_range(&kvm->arch.mmu, gpa, vm_end - vm_start);
>  		}
>  		hva = vm_end;
>  	} while (hva < reg_end);
> @@ -1681,11 +1697,6 @@ void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen)
>  {
>  }
>  
> -void kvm_arch_flush_shadow_all(struct kvm *kvm)
> -{
> -	kvm_free_stage2_pgd(&kvm->arch.mmu);
> -}
> -
>  void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
>  				   struct kvm_memory_slot *slot)
>  {
> @@ -1693,7 +1704,7 @@ void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
>  	phys_addr_t size = slot->npages << PAGE_SHIFT;
>  
>  	spin_lock(&kvm->mmu_lock);
> -	unmap_stage2_range(&kvm->arch.mmu, gpa, size);
> +	kvm_unmap_stage2_range(&kvm->arch.mmu, gpa, size);
>  	spin_unlock(&kvm->mmu_lock);
>  }
>  
> diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
> index 254152cd791e..bfa2b9229173 100644
> --- a/arch/arm64/kvm/nested.c
> +++ b/arch/arm64/kvm/nested.c
> @@ -7,12 +7,189 @@
>  #include <linux/kvm.h>
>  #include <linux/kvm_host.h>
>  
> +#include <asm/kvm_arm.h>
>  #include <asm/kvm_emulate.h>
> +#include <asm/kvm_mmu.h>
>  #include <asm/kvm_nested.h>
>  #include <asm/sysreg.h>
>  
>  #include "sys_regs.h"
>  
> +void kvm_init_nested(struct kvm *kvm)
> +{
> +	kvm->arch.nested_mmus = NULL;
> +	kvm->arch.nested_mmus_size = 0;
> +}
> +
> +int kvm_vcpu_init_nested(struct kvm_vcpu *vcpu)
> +{
> +	struct kvm *kvm = vcpu->kvm;
> +	struct kvm_s2_mmu *tmp;
> +	int num_mmus;
> +	int ret = -ENOMEM;
> +
> +	if (!test_bit(KVM_ARM_VCPU_HAS_EL2, vcpu->arch.features))
> +		return 0;
> +
> +	if (!cpus_have_final_cap(ARM64_HAS_NESTED_VIRT))
> +		return -EINVAL;
> +
> +	mutex_lock(&kvm->lock);
> +
> +	/*
> +	 * Let's treat memory allocation failures as benign: If we fail to
> +	 * allocate anything, return an error and keep the allocated array
> +	 * alive. Userspace may try to recover by initializing the vcpu
> +	 * again, and there is no reason to affect the whole VM for this.
> +	 */
> +	num_mmus = atomic_read(&kvm->online_vcpus) * 2;
> +	tmp = krealloc(kvm->arch.nested_mmus,
> +		       num_mmus * sizeof(*kvm->arch.nested_mmus),
> +		       GFP_KERNEL_ACCOUNT | __GFP_ZERO);
> +	if (tmp) {
> +		/*
> +	 * If we went through a reallocation, adjust the MMU
> +		 * back-pointers in the previously initialised
> +		 * pg_table structures.
> +		 */
> +		if (kvm->arch.nested_mmus != tmp) {
> +			int i;
> +
> +			for (i = 0; i < num_mmus - 2; i++)
> +				tmp[i].pgt->mmu = &tmp[i];
> +		}
> +
> +		if (kvm_init_stage2_mmu(kvm, &tmp[num_mmus - 1]) ||
> +		    kvm_init_stage2_mmu(kvm, &tmp[num_mmus - 2])) {
> +			kvm_free_stage2_pgd(&tmp[num_mmus - 1]);
> +			kvm_free_stage2_pgd(&tmp[num_mmus - 2]);
> +		} else {
> +			kvm->arch.nested_mmus_size = num_mmus;
> +			ret = 0;
> +		}
> +
> +		kvm->arch.nested_mmus = tmp;
> +	}
> +
> +	mutex_unlock(&kvm->lock);
> +	return ret;
> +}
> +
> +/* Must be called with kvm->lock held */
                               ^^^^
Either that should be kvm->mmu_lock, or lookup_s2_mmu() is used incorrectly
in this patch.

Thanks,
Alex

> +struct kvm_s2_mmu *lookup_s2_mmu(struct kvm *kvm, u64 vttbr, u64 hcr)
> +{
> +	bool nested_stage2_enabled = hcr & HCR_VM;
> +	int i;
> +
> +	/* Don't consider the CnP bit for the vttbr match */
> +	vttbr = vttbr & ~VTTBR_CNP_BIT;
> +
> +	/*
> +	 * Two possibilities when looking up a S2 MMU context:
> +	 *
> +	 * - either S2 is enabled in the guest, and we need a context that
> +         *   is S2-enabled and matches the full VTTBR (VMID+BADDR), which
> +         *   makes it safe from a TLB conflict perspective (a broken guest
> +         *   won't be able to generate them),
> +	 *
> +	 * - or S2 is disabled, and we need a context that is S2-disabled
> +         *   and matches the VMID only, as all TLBs are tagged by VMID even
> +         *   if S2 translation is disabled.
> +	 */
> +	for (i = 0; i < kvm->arch.nested_mmus_size; i++) {
> +		struct kvm_s2_mmu *mmu = &kvm->arch.nested_mmus[i];
> +
> +		if (!kvm_s2_mmu_valid(mmu))
> +			continue;
> +
> +		if (nested_stage2_enabled &&
> +		    mmu->nested_stage2_enabled &&
> +		    vttbr == mmu->vttbr)
> +			return mmu;
> +
> +		if (!nested_stage2_enabled &&
> +		    !mmu->nested_stage2_enabled &&
> +		    get_vmid(vttbr) == get_vmid(mmu->vttbr))
> +			return mmu;
> +	}
> +	return NULL;
> +}
> +
> +static struct kvm_s2_mmu *get_s2_mmu_nested(struct kvm_vcpu *vcpu)
> +{
> +	struct kvm *kvm = vcpu->kvm;
> +	u64 vttbr = vcpu_read_sys_reg(vcpu, VTTBR_EL2);
> +	u64 hcr = vcpu_read_sys_reg(vcpu, HCR_EL2);
> +	struct kvm_s2_mmu *s2_mmu;
> +	int i;
> +
> +	s2_mmu = lookup_s2_mmu(kvm, vttbr, hcr);
> +	if (s2_mmu)
> +		goto out;
> +
> +	/*
> +	 * Make sure we don't always search from the same point, or we
> +	 * will always reuse a potentially active context, leaving
> +	 * free contexts unused.
> +	 */
> +	for (i = kvm->arch.nested_mmus_next;
> +	     i < (kvm->arch.nested_mmus_size + kvm->arch.nested_mmus_next);
> +	     i++) {
> +		s2_mmu = &kvm->arch.nested_mmus[i % kvm->arch.nested_mmus_size];
> +
> +		if (atomic_read(&s2_mmu->refcnt) == 0)
> +			break;
> +	}
> +	BUG_ON(atomic_read(&s2_mmu->refcnt)); /* We have struct MMUs to spare */
> +
> +	/* Set the scene for the next search */
> +	kvm->arch.nested_mmus_next = (i + 1) % kvm->arch.nested_mmus_size;
> +
> +	if (kvm_s2_mmu_valid(s2_mmu)) {
> +		/* Clear the old state */
> +		kvm_unmap_stage2_range(s2_mmu, 0, kvm_phys_size(kvm));
> +		if (s2_mmu->vmid.vmid_gen)
> +			kvm_call_hyp(__kvm_tlb_flush_vmid, s2_mmu);
> +	}
> +
> +	/*
> +	 * The virtual VMID (modulo CnP) will be used as a key when matching
> +	 * an existing kvm_s2_mmu.
> +	 */
> +	s2_mmu->vttbr = vttbr & ~VTTBR_CNP_BIT;
> +	s2_mmu->nested_stage2_enabled = hcr & HCR_VM;
> +
> +out:
> +	atomic_inc(&s2_mmu->refcnt);
> +	return s2_mmu;
> +}
> +
> +void kvm_init_nested_s2_mmu(struct kvm_s2_mmu *mmu)
> +{
> +	mmu->vttbr = 1;
> +	mmu->nested_stage2_enabled = false;
> +	atomic_set(&mmu->refcnt, 0);
> +}
> +
> +void kvm_vcpu_load_hw_mmu(struct kvm_vcpu *vcpu)
> +{
> +	if (is_hyp_ctxt(vcpu)) {
> +		vcpu->arch.hw_mmu = &vcpu->kvm->arch.mmu;
> +	} else {
> +		spin_lock(&vcpu->kvm->mmu_lock);
> +		vcpu->arch.hw_mmu = get_s2_mmu_nested(vcpu);
> +		spin_unlock(&vcpu->kvm->mmu_lock);
> +	}
> +}
> +
> +void kvm_vcpu_put_hw_mmu(struct kvm_vcpu *vcpu)
> +{
> +	if (vcpu->arch.hw_mmu != &vcpu->kvm->arch.mmu) {
> +		atomic_dec(&vcpu->arch.hw_mmu->refcnt);
> +		vcpu->arch.hw_mmu = NULL;
> +	}
> +}
> +
>  /*
>   * Inject wfx to the virtual EL2 if this is not from the virtual EL2 and
>   * the virtual HCR_EL2.TWX is set. Otherwise, let the host hypervisor
> @@ -31,6 +208,24 @@ int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe)
>  	return -EINVAL;
>  }
>  
> +void kvm_arch_flush_shadow_all(struct kvm *kvm)
> +{
> +	int i;
> +
> +	for (i = 0; i < kvm->arch.nested_mmus_size; i++) {
> +		struct kvm_s2_mmu *mmu = &kvm->arch.nested_mmus[i];
> +
> +		WARN_ON(atomic_read(&mmu->refcnt));
> +
> +		if (!atomic_read(&mmu->refcnt))
> +			kvm_free_stage2_pgd(mmu);
> +	}
> +	kfree(kvm->arch.nested_mmus);
> +	kvm->arch.nested_mmus = NULL;
> +	kvm->arch.nested_mmus_size = 0;
> +	kvm_free_stage2_pgd(&kvm->arch.mmu);
> +}
> +
>  /*
>   * Our emulated CPU doesn't support all the possible features. For the
>   * sake of simplicity (and probably mental sanity), wipe out a number
> -- 
> 2.30.2
> 

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 378+ messages in thread

* Re: [PATCH v6 40/64] KVM: arm64: nv: Trap and emulate AT instructions from virtual EL2
  2022-01-28 12:18   ` Marc Zyngier
  (?)
@ 2022-02-24 15:39     ` Alexandru Elisei
  -1 siblings, 0 replies; 378+ messages in thread
From: Alexandru Elisei @ 2022-02-24 15:39 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, Andre Przywara, Christoffer Dall,
	Jintack Lim, Haibo Xu, Ganapatrao Kulkarni, Chase Conklin,
	Russell King (Oracle),
	James Morse, Suzuki K Poulose, karl.heubaum, mihai.carabas,
	miguel.luis, kernel-team

Hi,

On Fri, Jan 28, 2022 at 12:18:48PM +0000, Marc Zyngier wrote:
> From: Jintack Lim <jintack.lim@linaro.org>
> 
> When supporting nested virtualization a guest hypervisor executing AT
> instructions must be trapped and emulated by the host hypervisor,
> because untrapped AT instructions operating on S1E1 will use the wrong
> translation regieme (the one used to emulate virtual EL2 in EL1 instead

s/regieme/regime/

> of virtual EL1) and AT instructions operating on S12 will not work from
> EL1.
> 
> This patch does several things.

I think this is a good hint that the patch can be split into several
patches. The size of the patch plus the emulation logic makes this
patch rather tedious to review.

> 
> 1. List and define all AT system instructions to emulate and document
> the emulation design.
> 
> 2. Implement AT instruction handling logic in EL2. This will be used to
> emulate AT instructions executed in the virtual EL2.
> 
> AT instruction emulation works by loading the proper processor
> context, which depends on the trapped instruction and the virtual
> HCR_EL2, to the EL1 virtual memory control registers and executing AT
> instructions. Note that ctxt->hw_sys_regs is expected to have the
> proper processor context before calling the handling
> function(__kvm_at_insn) implemented in this patch.
> 
> 4. Emulate AT S1E[01] instructions by issuing the same instructions in

Hmm... where's point number 3?

> EL2. We set the physical EL1 registers, NV and NV1 bits as described in
> the AT instruction emulation overview.
> 
> 5. Emulate AT A12E[01] instructions in two steps: First, do the stage-1
                ^
I'm guessing that's AT S12E[01].

> translation by reusing the existing AT emulation functions.  Second, do
> the stage-2 translation by walking the guest hypervisor's stage-2 page
> table in software. Record the translation result to PAR_EL1.
> 
> 6. Emulate AT S1E2 instructions by issuing the corresponding S1E1
> instructions in EL2. We set the physical EL1 registers and the HCR_EL2
> register as described in the AT instruction emulation overview.
> 
> 7. Forward system instruction traps to the virtual EL2 if the corresponding
> virtual AT bit is set in the virtual HCR_EL2.

Looks like points 4-7 make good candidates for individual patches.

> 
>   [ Much logic above has been reworked by Marc Zyngier ]
> 
> Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
> ---
>  arch/arm64/include/asm/kvm_arm.h |   2 +
>  arch/arm64/include/asm/kvm_asm.h |   2 +
>  arch/arm64/include/asm/sysreg.h  |  17 +++
>  arch/arm64/kvm/Makefile          |   2 +-
>  arch/arm64/kvm/at.c              | 219 +++++++++++++++++++++++++++++
>  arch/arm64/kvm/hyp/vhe/switch.c  |  13 +-
>  arch/arm64/kvm/sys_regs.c        | 229 ++++++++++++++++++++++++++++++-
>  7 files changed, 478 insertions(+), 6 deletions(-)
>  create mode 100644 arch/arm64/kvm/at.c
> 
> diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
> index 3675879b53c6..aa3bdce1b166 100644
> --- a/arch/arm64/include/asm/kvm_arm.h
> +++ b/arch/arm64/include/asm/kvm_arm.h
> @@ -20,6 +20,7 @@
>  #define HCR_AMVOFFEN	(UL(1) << 51)
>  #define HCR_FIEN	(UL(1) << 47)
>  #define HCR_FWB		(UL(1) << 46)
> +#define HCR_AT		(UL(1) << 44)
>  #define HCR_NV1		(UL(1) << 43)
>  #define HCR_NV		(UL(1) << 42)
>  #define HCR_API		(UL(1) << 41)
> @@ -118,6 +119,7 @@
>  #define VTCR_EL2_TG0_16K	TCR_TG0_16K
>  #define VTCR_EL2_TG0_64K	TCR_TG0_64K
>  #define VTCR_EL2_SH0_MASK	TCR_SH0_MASK
> +#define VTCR_EL2_SH0_SHIFT	TCR_SH0_SHIFT
>  #define VTCR_EL2_SH0_INNER	TCR_SH0_INNER
>  #define VTCR_EL2_ORGN0_MASK	TCR_ORGN0_MASK
>  #define VTCR_EL2_ORGN0_WBWA	TCR_ORGN0_WBWA
> diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
> index d5b0386ef765..e22861ece3c3 100644
> --- a/arch/arm64/include/asm/kvm_asm.h
> +++ b/arch/arm64/include/asm/kvm_asm.h
> @@ -208,6 +208,8 @@ extern void __kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu, phys_addr_t ipa,
>  extern void __kvm_tlb_flush_vmid(struct kvm_s2_mmu *mmu);
>  
>  extern void __kvm_timer_set_cntvoff(u64 cntvoff);
> +extern void __kvm_at_s1e01(struct kvm_vcpu *vcpu, u32 op, u64 vaddr);
> +extern void __kvm_at_s1e2(struct kvm_vcpu *vcpu, u32 op, u64 vaddr);
>  
>  extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
>  
> diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
> index 61c990e5591a..ea17d1eabfc5 100644
> --- a/arch/arm64/include/asm/sysreg.h
> +++ b/arch/arm64/include/asm/sysreg.h
> @@ -658,6 +658,23 @@
>  
>  #define SYS_SP_EL2			sys_reg(3, 6,  4, 1, 0)
>  
> +/* AT instructions */
> +#define AT_Op0 1
> +#define AT_CRn 7
> +
> +#define OP_AT_S1E1R	sys_insn(AT_Op0, 0, AT_CRn, 8, 0)
> +#define OP_AT_S1E1W	sys_insn(AT_Op0, 0, AT_CRn, 8, 1)
> +#define OP_AT_S1E0R	sys_insn(AT_Op0, 0, AT_CRn, 8, 2)
> +#define OP_AT_S1E0W	sys_insn(AT_Op0, 0, AT_CRn, 8, 3)
> +#define OP_AT_S1E1RP	sys_insn(AT_Op0, 0, AT_CRn, 9, 0)
> +#define OP_AT_S1E1WP	sys_insn(AT_Op0, 0, AT_CRn, 9, 1)
> +#define OP_AT_S1E2R	sys_insn(AT_Op0, 4, AT_CRn, 8, 0)
> +#define OP_AT_S1E2W	sys_insn(AT_Op0, 4, AT_CRn, 8, 1)
> +#define OP_AT_S12E1R	sys_insn(AT_Op0, 4, AT_CRn, 8, 4)
> +#define OP_AT_S12E1W	sys_insn(AT_Op0, 4, AT_CRn, 8, 5)
> +#define OP_AT_S12E0R	sys_insn(AT_Op0, 4, AT_CRn, 8, 6)
> +#define OP_AT_S12E0W	sys_insn(AT_Op0, 4, AT_CRn, 8, 7)
> +
>  /* Common SCTLR_ELx flags. */
>  #define SCTLR_ELx_DSSBS	(BIT(44))
>  #define SCTLR_ELx_ATA	(BIT(43))
> diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
> index dbaf42ff65f1..b800dcbd157f 100644
> --- a/arch/arm64/kvm/Makefile
> +++ b/arch/arm64/kvm/Makefile
> @@ -14,7 +14,7 @@ kvm-y += arm.o mmu.o mmio.o psci.o hypercalls.o pvtime.o \
>  	 inject_fault.o va_layout.o handle_exit.o \
>  	 guest.o debug.o reset.o sys_regs.o \
>  	 vgic-sys-reg-v3.o fpsimd.o pmu.o pkvm.o \
> -	 arch_timer.o trng.o emulate-nested.o nested.o \
> +	 arch_timer.o trng.o emulate-nested.o nested.o at.o \
>  	 vgic/vgic.o vgic/vgic-init.o \
>  	 vgic/vgic-irqfd.o vgic/vgic-v2.o \
>  	 vgic/vgic-v3.o vgic/vgic-v4.o \
> diff --git a/arch/arm64/kvm/at.c b/arch/arm64/kvm/at.c
> new file mode 100644
> index 000000000000..574c664e984b
> --- /dev/null
> +++ b/arch/arm64/kvm/at.c
> @@ -0,0 +1,219 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (C) 2017 - Linaro Ltd
> + * Author: Jintack Lim <jintack.lim@linaro.org>
> + */
> +
> +#include <asm/kvm_hyp.h>
> +#include <asm/kvm_mmu.h>
> +
> +struct mmu_config {
> +	u64	ttbr0;
> +	u64	ttbr1;
> +	u64	tcr;
> +	u64	sctlr;
> +	u64	vttbr;
> +	u64	vtcr;
> +	u64	hcr;
> +};
> +
> +static void __mmu_config_save(struct mmu_config *config)
> +{
> +	config->ttbr0	= read_sysreg_el1(SYS_TTBR0);
> +	config->ttbr1	= read_sysreg_el1(SYS_TTBR1);
> +	config->tcr	= read_sysreg_el1(SYS_TCR);
> +	config->sctlr	= read_sysreg_el1(SYS_SCTLR);
> +	config->vttbr	= read_sysreg(vttbr_el2);
> +	config->vtcr	= read_sysreg(vtcr_el2);

KVM saves VTCR_EL2, but the register is never changed between
__mmu_config_{save,restore} sequences. Another comment about this below.
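
If that stays true once the VTCR_EL2 question below is settled, the field and
its save/restore could simply be dropped. Rough sketch of what would remain
(assuming nothing else grows a dependency on it):

	struct mmu_config {
		u64	ttbr0;
		u64	ttbr1;
		u64	tcr;
		u64	sctlr;
		u64	vttbr;
		u64	hcr;
	};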

> +	config->hcr	= read_sysreg(hcr_el2);
> +}
> +
> +static void __mmu_config_restore(struct mmu_config *config)
> +{
> +	write_sysreg_el1(config->ttbr0,	SYS_TTBR0);
> +	write_sysreg_el1(config->ttbr1,	SYS_TTBR1);
> +	write_sysreg_el1(config->tcr,	SYS_TCR);
> +	write_sysreg_el1(config->sctlr,	SYS_SCTLR);
> +	write_sysreg(config->vttbr,	vttbr_el2);
> +	write_sysreg(config->vtcr,	vtcr_el2);
> +	write_sysreg(config->hcr,	hcr_el2);
> +
> +	isb();
> +}
> +
> +void __kvm_at_s1e01(struct kvm_vcpu *vcpu, u32 op, u64 vaddr)
> +{
> +	struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
> +	struct mmu_config config;
> +	struct kvm_s2_mmu *mmu;
> +
> +	spin_lock(&vcpu->kvm->mmu_lock);
> +
> +	/*
> +	 * If HCR_EL2.{E2H,TGE} == {1,1}, the MMU context is already
> +	 * the right one (as we trapped from vEL2).
> +	 */
> +	if (vcpu_el2_e2h_is_set(vcpu) && vcpu_el2_tge_is_set(vcpu))
> +		goto skip_mmu_switch;
> +
> +	/*
> +	 * FIXME: Obtaining the S2 MMU for a guest guest is horribly
> +	 * racy, and we may not find it (evicted by another vcpu, for
> +	 * example).
> +	 */

I think the "horribly racy" part deserves some elaboration. As far as I can tell,
get_s2_mmu_nested() and lookup_s2_mmu() are always called with kvm->mmu_lock
held.

I suppose "evicted by another vcpu" means that get_s2_mmu_nested() decided
to reuse the MMU that KVM needs, in which case it's impossible to execute
the AT instruction, as there is no shadow stage 2 for the context.

I wonder if that's something that KVM should try to avoid. One obvious
solution would be never to reuse MMUs, but that comes at the cost of
allowing an L1 guest to eat up the L0 host's memory with kvm_s2_mmu structs
by creating many L2 guests.

Another solution would be to have a software stage 1 translation table
walker which populates the shadow S2 during the walk, if and only if the
IPA is present in the virtual stage 2 with the right permissions.

What do you think?

> +	mmu = lookup_s2_mmu(vcpu->kvm,
> +			    vcpu_read_sys_reg(vcpu, VTTBR_EL2),
> +			    vcpu_read_sys_reg(vcpu, HCR_EL2));
> +
> +	if (WARN_ON(!mmu))
> +		goto out;

If I'm not mistaken, in this case, guest PAR_EL1 is left untouched by KVM.
Shouldn't KVM set PAR_EL1 to SYS_PAR_EL1_F so the L1 guest doesn't treat
the stale PAR_EL1 value as valid?
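
Something along these lines, maybe (just a sketch, reusing the accessors the
function already has at hand):

	if (WARN_ON(!mmu)) {
		/* Report a failed translation instead of leaving stale data */
		ctxt_sys_reg(ctxt, PAR_EL1) = SYS_PAR_EL1_F;
		goto out;
	}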

> +
> +	/* We've trapped, so everything is live on the CPU. */
> +	__mmu_config_save(&config);
> +
> +	write_sysreg_el1(ctxt_sys_reg(ctxt, TTBR0_EL1),	SYS_TTBR0);
> +	write_sysreg_el1(ctxt_sys_reg(ctxt, TTBR1_EL1),	SYS_TTBR1);
> +	write_sysreg_el1(ctxt_sys_reg(ctxt, TCR_EL1),	SYS_TCR);
> +	write_sysreg_el1(ctxt_sys_reg(ctxt, SCTLR_EL1),	SYS_SCTLR);
> +	write_sysreg(kvm_get_vttbr(mmu),		vttbr_el2);
> +	/*
> +	 * REVISIT: do we need anything from the guest's VTCR_EL2? It
> +	 * looks like keeping the host's configuration is the right
> +	 * thing to do at this stage (and we could avoid saving/restoring
> +	 * it). Keep the host's version for now.
> +	 */

I also don't think it's necessary to load the L1 guest's VTCR_EL2 register.
The register controls virtual stage 2, which is never used because KVM will
always use the shadow stage 2.

> +	write_sysreg((config.hcr & ~HCR_TGE) | HCR_VM,	hcr_el2);
> +
> +	isb();
> +
> +skip_mmu_switch:
> +
> +	switch (op) {
> +	case OP_AT_S1E1R:
> +	case OP_AT_S1E1RP:
> +		asm volatile("at s1e1r, %0" : : "r" (vaddr));
> +		break;
> +	case OP_AT_S1E1W:
> +	case OP_AT_S1E1WP:
> +		asm volatile("at s1e1w, %0" : : "r" (vaddr));
> +		break;
> +	case OP_AT_S1E0R:
> +		asm volatile("at s1e0r, %0" : : "r" (vaddr));
> +		break;
> +	case OP_AT_S1E0W:
> +		asm volatile("at s1e0w, %0" : : "r" (vaddr));
> +		break;
> +	default:
> +		WARN_ON_ONCE(1);
> +		break;
> +	}
> +
> +	isb();
> +
> +	ctxt_sys_reg(ctxt, PAR_EL1) = read_sysreg(par_el1);
> +
> +	/*
> +	 * Failed? let's leave the building now.
> +	 *
> +	 * FIXME: how about a failed translation because the shadow S2
> +	 * wasn't populated? We may need to perform a SW PTW,
> +	 * populating our shadow S2 and retry the instruction.
> +	 */
> +	if (ctxt_sys_reg(ctxt, PAR_EL1) & 1)
> +		goto nopan;
> +
> +	/* No PAN? No problem. */
> +	if (!(*vcpu_cpsr(vcpu) & PSR_PAN_BIT))
> +		goto nopan;
> +
> +	/*
> +	 * For PAN-involved AT operations, perform the same
> +	 * translation, using EL0 this time.
> +	 */

The description for FEAT_PAN is:

"When the value of this PAN state bit is 1, any privileged data access from
EL1, or EL2 when HCR_EL2.E2H is 1, to a virtual memory address that is
accessible to data accesses at EL0, generates a Permission fault."

I assume KVM executes the AT to make sure there is a valid translation
for the guest virtual address, right?

> +	switch (op) {
> +	case OP_AT_S1E1RP:
> +		asm volatile("at s1e0r, %0" : : "r" (vaddr));
> +		break;
> +	case OP_AT_S1E1WP:
> +		asm volatile("at s1e0w, %0" : : "r" (vaddr));
> +		break;
> +	default:
> +		goto nopan;
> +	}
> +
> +	/*
> +	 * If the EL0 translation has succeeded, we need to pretend
> +	 * the AT operation has failed, as the PAN setting forbids
> +	 * such a translation.

Hmm.. according to the description of FEAT_PAN, the AT translation fails
because of PAN=1 when CurrentEL=EL2 && HCR_EL2.E2H=1. So if the VCPU is at
virtual EL2 and virtual HCR_EL2.E2H=0, then it is allowed to succeed.

Thanks,
Alex

> +	 *
> +	 * FIXME: we hardcode a Level-3 permission fault. We really
> +	 * should return the real fault level.
> +	 */
> +	if (!(read_sysreg(par_el1) & 1))
> +		ctxt_sys_reg(ctxt, PAR_EL1) = 0x1f;
> +
> +nopan:
> +	if (!(vcpu_el2_e2h_is_set(vcpu) && vcpu_el2_tge_is_set(vcpu)))
> +		__mmu_config_restore(&config);
> +
> +out:
> +	spin_unlock(&vcpu->kvm->mmu_lock);
> +}
> +
> +void __kvm_at_s1e2(struct kvm_vcpu *vcpu, u32 op, u64 vaddr)
> +{
> +	struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
> +	struct mmu_config config;
> +	struct kvm_s2_mmu *mmu;
> +	u64 val;
> +
> +	spin_lock(&vcpu->kvm->mmu_lock);
> +
> +	mmu = &vcpu->kvm->arch.mmu;
> +
> +	/* We've trapped, so everything is live on the CPU. */
> +	__mmu_config_save(&config);
> +
> +	if (vcpu_el2_e2h_is_set(vcpu)) {
> +		write_sysreg_el1(ctxt_sys_reg(ctxt, TTBR0_EL2),	SYS_TTBR0);
> +		write_sysreg_el1(ctxt_sys_reg(ctxt, TTBR1_EL2),	SYS_TTBR1);
> +		write_sysreg_el1(ctxt_sys_reg(ctxt, TCR_EL2),	SYS_TCR);
> +		write_sysreg_el1(ctxt_sys_reg(ctxt, SCTLR_EL2),	SYS_SCTLR);
> +
> +		val = config.hcr;
> +	} else {
> +		write_sysreg_el1(ctxt_sys_reg(ctxt, TTBR0_EL2),	SYS_TTBR0);
> +		val = translate_tcr_el2_to_tcr_el1(ctxt_sys_reg(ctxt, TCR_EL2));
> +		write_sysreg_el1(val, SYS_TCR);
> +		val = translate_sctlr_el2_to_sctlr_el1(ctxt_sys_reg(ctxt, SCTLR_EL2));
> +		write_sysreg_el1(val, SYS_SCTLR);
> +
> +		val = config.hcr | HCR_NV | HCR_NV1;
> +	}
> +
> +	write_sysreg(kvm_get_vttbr(mmu),		vttbr_el2);
> +	/* FIXME: write S2 MMU VTCR_EL2? */
> +	write_sysreg((val & ~HCR_TGE) | HCR_VM,		hcr_el2);
> +
> +	isb();
> +
> +	switch (op) {
> +	case OP_AT_S1E2R:
> +		asm volatile("at s1e1r, %0" : : "r" (vaddr));
> +		break;
> +	case OP_AT_S1E2W:
> +		asm volatile("at s1e1w, %0" : : "r" (vaddr));
> +		break;
> +	default:
> +		WARN_ON_ONCE(1);
> +		break;
> +	}
> +
> +	isb();
> +
> +	/* FIXME: handle failed translation due to shadow S2 */
> +	ctxt_sys_reg(ctxt, PAR_EL1) = read_sysreg(par_el1);
> +
> +	__mmu_config_restore(&config);
> +	spin_unlock(&vcpu->kvm->mmu_lock);
> +}
> diff --git a/arch/arm64/kvm/hyp/vhe/switch.c b/arch/arm64/kvm/hyp/vhe/switch.c
> index 28845f907cfc..b7790d3c4122 100644
> --- a/arch/arm64/kvm/hyp/vhe/switch.c
> +++ b/arch/arm64/kvm/hyp/vhe/switch.c
> @@ -41,9 +41,10 @@ static void __activate_traps(struct kvm_vcpu *vcpu)
>  		if (!vcpu_el2_e2h_is_set(vcpu)) {
>  			/*
>  			 * For a guest hypervisor on v8.0, trap and emulate
> -			 * the EL1 virtual memory control register accesses.
> +			 * the EL1 virtual memory control register accesses
> +			 * as well as the AT S1 operations.
>  			 */
> -			hcr |= HCR_TVM | HCR_TRVM | HCR_NV1;
> +			hcr |= HCR_TVM | HCR_TRVM | HCR_AT | HCR_NV1;
>  		} else {
>  			/*
>  			 * For a guest hypervisor on v8.1 (VHE), allow to
> @@ -68,6 +69,14 @@ static void __activate_traps(struct kvm_vcpu *vcpu)
>  			hcr &= ~HCR_TVM;
>  
>  			hcr |= vhcr_el2 & (HCR_TVM | HCR_TRVM);
> +
> +			/*
> +			 * If we're using the EL1 translation regime
> +			 * (TGE clear), then ensure that AT S1 ops are
> +			 * trapped too.
> +			 */
> +			if (!vcpu_el2_tge_is_set(vcpu))
> +				hcr |= HCR_AT;
>  		}
>  	}
>  
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index f669618f966b..7be57e1b7019 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -1704,7 +1704,6 @@ static bool access_sp_el1(struct kvm_vcpu *vcpu,
>  	return true;
>  }
>  
> -
>  static bool access_elr(struct kvm_vcpu *vcpu,
>  		       struct sys_reg_params *p,
>  		       const struct sys_reg_desc *r)
> @@ -2236,12 +2235,236 @@ static const struct sys_reg_desc sys_reg_descs[] = {
>  	EL2_REG(SP_EL2, NULL, reset_unknown, 0),
>  };
>  
> -#define SYS_INSN_TO_DESC(insn, access_fn, forward_fn)	\
> -	{ SYS_DESC((insn)), (access_fn), NULL, 0, 0, NULL, NULL, (forward_fn) }
> +static bool handle_s1e01(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
> +			 const struct sys_reg_desc *r)
> +{
> +	int sys_encoding = sys_insn(p->Op0, p->Op1, p->CRn, p->CRm, p->Op2);
> +
> +	if (vcpu_has_nv(vcpu) && forward_traps(vcpu, HCR_AT))
> +		return false;
> +
> +	__kvm_at_s1e01(vcpu, sys_encoding, p->regval);
> +
> +	return true;
> +}
> +
> +static bool handle_s1e2(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
> +			const struct sys_reg_desc *r)
> +{
> +	int sys_encoding = sys_insn(p->Op0, p->Op1, p->CRn, p->CRm, p->Op2);
> +
> +	if (vcpu_has_nv(vcpu) && forward_nv_traps(vcpu))
> +		return false;
> +
> +	__kvm_at_s1e2(vcpu, sys_encoding, p->regval);
> +
> +	return true;
> +}
> +
> +static u64 setup_par_aborted(u32 esr)
> +{
> +	u64 par = 0;
> +
> +	/* S [9]: fault in the stage 2 translation */
> +	par |= (1 << 9);
> +	/* FST [6:1]: Fault status code  */
> +	par |= (esr << 1);
> +	/* F [0]: translation is aborted */
> +	par |= 1;
> +
> +	return par;
> +}
> +
> +static u64 setup_par_completed(struct kvm_vcpu *vcpu, struct kvm_s2_trans *out)
> +{
> +	u64 par, vtcr_sh0;
> +
> +	/* F [0]: Translation is completed successfully */
> +	par = 0;
> +	/* ATTR [63:56] */
> +	par |= out->upper_attr;
> +	/* PA [47:12] */
> +	par |= out->output & GENMASK_ULL(11, 0);
> +	/* RES1 [11] */
> +	par |= (1UL << 11);
> +	/* SH [8:7]: Shareability attribute */
> +	vtcr_sh0 = vcpu_read_sys_reg(vcpu, VTCR_EL2) & VTCR_EL2_SH0_MASK;
> +	par |= (vtcr_sh0 >> VTCR_EL2_SH0_SHIFT) << 7;
> +
> +	return par;
> +}
> +
> +static bool handle_s12(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
> +		       const struct sys_reg_desc *r, bool write)
> +{
> +	u64 par, va;
> +	u32 esr, op;
> +	phys_addr_t ipa;
> +	struct kvm_s2_trans out;
> +	int ret;
> +
> +	if (vcpu_has_nv(vcpu) && forward_nv_traps(vcpu))
> +		return false;
> +
> +	/* Do the stage-1 translation */
> +	va = p->regval;
> +	op = sys_insn(p->Op0, p->Op1, p->CRn, p->CRm, p->Op2);
> +	switch (op) {
> +	case OP_AT_S12E1R:
> +		op = OP_AT_S1E1R;
> +		break;
> +	case OP_AT_S12E1W:
> +		op = OP_AT_S1E1W;
> +		break;
> +	case OP_AT_S12E0R:
> +		op = OP_AT_S1E0R;
> +		break;
> +	case OP_AT_S12E0W:
> +		op = OP_AT_S1E0W;
> +		break;
> +	default:
> +		WARN_ON_ONCE(1);
> +		return true;
> +	}
> +
> +	__kvm_at_s1e01(vcpu, op, va);
> +	par = vcpu_read_sys_reg(vcpu, PAR_EL1);
> +	if (par & 1) {
> +		/* The stage-1 translation aborted */
> +		return true;
> +	}
> +
> +	/* Do the stage-2 translation */
> +	ipa = (par & GENMASK_ULL(47, 12)) | (va & GENMASK_ULL(11, 0));
> +	out.esr = 0;
> +	ret = kvm_walk_nested_s2(vcpu, ipa, &out);
> +	if (ret < 0)
> +		return false;
> +
> +	/* Check if the stage-2 PTW is aborted */
> +	if (out.esr) {
> +		esr = out.esr;
> +		goto s2_trans_abort;
> +	}
> +
> +	/* Check the access permission */
> +	if ((!write && !out.readable) || (write && !out.writable)) {
> +		esr = ESR_ELx_FSC_PERM;
> +		esr |= out.level & 0x3;
> +		goto s2_trans_abort;
> +	}
> +
> +	vcpu_write_sys_reg(vcpu, setup_par_completed(vcpu, &out), PAR_EL1);
> +	return true;
> +
> +s2_trans_abort:
> +	vcpu_write_sys_reg(vcpu, setup_par_aborted(esr), PAR_EL1);
> +	return true;
> +}
> +
> +static bool handle_s12r(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
> +			const struct sys_reg_desc *r)
> +{
> +	return handle_s12(vcpu, p, r, false);
> +}
> +
> +static bool handle_s12w(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
> +			const struct sys_reg_desc *r)
> +{
> +	return handle_s12(vcpu, p, r, true);
> +}
> +
> +/*
> + * AT instruction emulation
> + *
> + * We emulate AT instructions executed in the virtual EL2.
> + * Basic strategy for the stage-1 translation emulation is to load proper
> + * context, which depends on the trapped instruction and the virtual HCR_EL2,
> + * to the EL1 virtual memory control registers and execute S1E[01] instructions
> + * in EL2. See below for more detail.
> + *
> + * For the stage-2 translation, which is necessary for S12E[01] emulation,
> + * we walk the guest hypervisor's stage-2 page table in software.
> + *
> + * The stage-1 translation emulations can be divided into two groups depending
> + * on the translation regime.
> + *
> + * 1. EL2 AT instructions: S1E2x
> + * +-----------------------------------------------------------------------+
> + * |                             |         Setting for the emulation       |
> + * | Virtual HCR_EL2.E2H on trap |-----------------------------------------+
> + * |                             | Phys EL1 regs | Phys NV, NV1 | Phys TGE |
> + * |-----------------------------------------------------------------------|
> + * |             0               |     vEL2      |    (1, 1)    |    0     |
> + * |             1               |     vEL2      |    (0, 0)    |    0     |
> + * +-----------------------------------------------------------------------+
> + *
> + * We emulate the EL2 AT instructions by loading virtual EL2 context
> + * to the EL1 virtual memory control registers and executing corresponding
> + * EL1 AT instructions.
> + *
> + * We set physical NV and NV1 bits to use EL2 page table format for non-VHE
> + * guest hypervisor (i.e. HCR_EL2.E2H == 0). As a VHE guest hypervisor uses the
> + * EL1 page table format, we don't set those bits.
> + *
> + * We should clear physical TGE bit not to use the EL2 translation regime when
> + * the host uses the VHE feature.
> + *
> + *
> + * 2. EL0/EL1 AT instructions: S1E[01]x, S12E1x
> + * +----------------------------------------------------------------------+
> + * |   Virtual HCR_EL2 on trap  |        Setting for the emulation        |
> + * |----------------------------------------------------------------------+
> + * | (vE2H, vTGE) | (vNV, vNV1) | Phys EL1 regs | Phys NV, NV1 | Phys TGE |
> + * |----------------------------------------------------------------------|
> + * |    (0, 0)*   |   (0, 0)    |      vEL1     |    (0, 0)    |    0     |
> + * |    (0, 0)    |   (1, 1)    |      vEL1     |    (1, 1)    |    0     |
> + * |    (1, 1)    |   (0, 0)    |      vEL2     |    (0, 0)    |    0     |
> + * |    (1, 1)    |   (1, 1)    |      vEL2     |    (1, 1)    |    0     |
> + * +----------------------------------------------------------------------+
> + *
> + * *For (0, 0) in the 'Virtual HCR_EL2 on trap' column, it actually means
> + *  (1, 1). Keep them (0, 0) just for the readability.
> + *
> + * We set physical EL1 virtual memory control registers depending on
> + * (vE2H, vTGE) pair. When the pair is (0, 0) where AT instructions are
> + * supposed to use EL0/EL1 translation regime, we load the EL1 registers with
> + * the virtual EL1 registers (i.e. EL1 registers from the guest hypervisor's
> + * point of view). When the pair is (1, 1), however, AT instructions are defined
> + * to apply EL2 translation regime. To emulate this behavior, we load the EL1
> + * registers with the virtual EL2 context. (i.e the shadow registers)
> + *
> + * We respect the virtual NV and NV1 bit for the emulation. When those bits are
> + * set, it means that a guest hypervisor would like to use EL2 page table format
> + * for the EL1 translation regime. We emulate this by setting the physical
> + * NV and NV1 bits.
> + */
> +
> +#define SYS_INSN(insn, access_fn)					\
> +	{								\
> +		SYS_DESC(OP_##insn),					\
> +		.access = (access_fn),					\
> +	}
> +
>  static struct sys_reg_desc sys_insn_descs[] = {
>  	{ SYS_DESC(SYS_DC_ISW), access_dcsw },
> +
> +	SYS_INSN(AT_S1E1R, handle_s1e01),
> +	SYS_INSN(AT_S1E1W, handle_s1e01),
> +	SYS_INSN(AT_S1E0R, handle_s1e01),
> +	SYS_INSN(AT_S1E0W, handle_s1e01),
> +	SYS_INSN(AT_S1E1RP, handle_s1e01),
> +	SYS_INSN(AT_S1E1WP, handle_s1e01),
> +
>  	{ SYS_DESC(SYS_DC_CSW), access_dcsw },
>  	{ SYS_DESC(SYS_DC_CISW), access_dcsw },
> +
> +	SYS_INSN(AT_S1E2R, handle_s1e2),
> +	SYS_INSN(AT_S1E2W, handle_s1e2),
> +	SYS_INSN(AT_S12E1R, handle_s12r),
> +	SYS_INSN(AT_S12E1W, handle_s12w),
> +	SYS_INSN(AT_S12E0R, handle_s12r),
> +	SYS_INSN(AT_S12E0W, handle_s12w),
>  };
>  
>  static bool trap_dbgdidr(struct kvm_vcpu *vcpu,
> -- 
> 2.30.2
> 

^ permalink raw reply	[flat|nested] 378+ messages in thread

* Re: [PATCH v6 40/64] KVM: arm64: nv: Trap and emulate AT instructions from virtual EL2
@ 2022-02-24 15:39     ` Alexandru Elisei
  0 siblings, 0 replies; 378+ messages in thread
From: Alexandru Elisei @ 2022-02-24 15:39 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kernel-team, kvm, Andre Przywara, Christoffer Dall,
	Chase Conklin, kvmarm, mihai.carabas, Ganapatrao Kulkarni,
	Russell King (Oracle),
	linux-arm-kernel

Hi,

On Fri, Jan 28, 2022 at 12:18:48PM +0000, Marc Zyngier wrote:
> From: Jintack Lim <jintack.lim@linaro.org>
> 
> When supporting nested virtualization a guest hypervisor executing AT
> instructions must be trapped and emulated by the host hypervisor,
> because untrapped AT instructions operating on S1E1 will use the wrong
> translation regieme (the one used to emulate virtual EL2 in EL1 instead

s/regieme/regime/

> of virtual EL1) and AT instructions operating on S12 will not work from
> EL1.
> 
> This patch does several things.

I think this is a good hint that the patch can be split into several
patches. The size of the patch plus the emulation logic makes this
patch rather tedious to review.

> 
> 1. List and define all AT system instructions to emulate and document
> the emulation design.
> 
> 2. Implement AT instruction handling logic in EL2. This will be used to
> emulate AT instructions executed in the virtual EL2.
> 
> AT instruction emulation works by loading the proper processor
> context, which depends on the trapped instruction and the virtual
> HCR_EL2, to the EL1 virtual memory control registers and executing AT
> instructions. Note that ctxt->hw_sys_regs is expected to have the
> proper processor context before calling the handling
> function(__kvm_at_insn) implemented in this patch.
> 
> 4. Emulate AT S1E[01] instructions by issuing the same instructions in

Hmm... where's point number 3?

> EL2. We set the physical EL1 registers, NV and NV1 bits as described in
> the AT instruction emulation overview.
> 
> 5. Emulate AT A12E[01] instructions in two steps: First, do the stage-1
                ^
I'm guessing that's AT S12E[01].

> translation by reusing the existing AT emulation functions.  Second, do
> the stage-2 translation by walking the guest hypervisor's stage-2 page
> table in software. Record the translation result to PAR_EL1.
> 
> 6. Emulate AT S1E2 instructions by issuing the corresponding S1E1
> instructions in EL2. We set the physical EL1 registers and the HCR_EL2
> register as described in the AT instruction emulation overview.
> 
> 7. Forward system instruction traps to the virtual EL2 if the corresponding
> virtual AT bit is set in the virtual HCR_EL2.

Looks like points 4-7 make good candidates for individual patches.

> 
>   [ Much logic above has been reworked by Marc Zyngier ]
> 
> Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
> ---
>  arch/arm64/include/asm/kvm_arm.h |   2 +
>  arch/arm64/include/asm/kvm_asm.h |   2 +
>  arch/arm64/include/asm/sysreg.h  |  17 +++
>  arch/arm64/kvm/Makefile          |   2 +-
>  arch/arm64/kvm/at.c              | 219 +++++++++++++++++++++++++++++
>  arch/arm64/kvm/hyp/vhe/switch.c  |  13 +-
>  arch/arm64/kvm/sys_regs.c        | 229 ++++++++++++++++++++++++++++++-
>  7 files changed, 478 insertions(+), 6 deletions(-)
>  create mode 100644 arch/arm64/kvm/at.c
> 
> diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
> index 3675879b53c6..aa3bdce1b166 100644
> --- a/arch/arm64/include/asm/kvm_arm.h
> +++ b/arch/arm64/include/asm/kvm_arm.h
> @@ -20,6 +20,7 @@
>  #define HCR_AMVOFFEN	(UL(1) << 51)
>  #define HCR_FIEN	(UL(1) << 47)
>  #define HCR_FWB		(UL(1) << 46)
> +#define HCR_AT		(UL(1) << 44)
>  #define HCR_NV1		(UL(1) << 43)
>  #define HCR_NV		(UL(1) << 42)
>  #define HCR_API		(UL(1) << 41)
> @@ -118,6 +119,7 @@
>  #define VTCR_EL2_TG0_16K	TCR_TG0_16K
>  #define VTCR_EL2_TG0_64K	TCR_TG0_64K
>  #define VTCR_EL2_SH0_MASK	TCR_SH0_MASK
> +#define VTCR_EL2_SH0_SHIFT	TCR_SH0_SHIFT
>  #define VTCR_EL2_SH0_INNER	TCR_SH0_INNER
>  #define VTCR_EL2_ORGN0_MASK	TCR_ORGN0_MASK
>  #define VTCR_EL2_ORGN0_WBWA	TCR_ORGN0_WBWA
> diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
> index d5b0386ef765..e22861ece3c3 100644
> --- a/arch/arm64/include/asm/kvm_asm.h
> +++ b/arch/arm64/include/asm/kvm_asm.h
> @@ -208,6 +208,8 @@ extern void __kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu, phys_addr_t ipa,
>  extern void __kvm_tlb_flush_vmid(struct kvm_s2_mmu *mmu);
>  
>  extern void __kvm_timer_set_cntvoff(u64 cntvoff);
> +extern void __kvm_at_s1e01(struct kvm_vcpu *vcpu, u32 op, u64 vaddr);
> +extern void __kvm_at_s1e2(struct kvm_vcpu *vcpu, u32 op, u64 vaddr);
>  
>  extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
>  
> diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
> index 61c990e5591a..ea17d1eabfc5 100644
> --- a/arch/arm64/include/asm/sysreg.h
> +++ b/arch/arm64/include/asm/sysreg.h
> @@ -658,6 +658,23 @@
>  
>  #define SYS_SP_EL2			sys_reg(3, 6,  4, 1, 0)
>  
> +/* AT instructions */
> +#define AT_Op0 1
> +#define AT_CRn 7
> +
> +#define OP_AT_S1E1R	sys_insn(AT_Op0, 0, AT_CRn, 8, 0)
> +#define OP_AT_S1E1W	sys_insn(AT_Op0, 0, AT_CRn, 8, 1)
> +#define OP_AT_S1E0R	sys_insn(AT_Op0, 0, AT_CRn, 8, 2)
> +#define OP_AT_S1E0W	sys_insn(AT_Op0, 0, AT_CRn, 8, 3)
> +#define OP_AT_S1E1RP	sys_insn(AT_Op0, 0, AT_CRn, 9, 0)
> +#define OP_AT_S1E1WP	sys_insn(AT_Op0, 0, AT_CRn, 9, 1)
> +#define OP_AT_S1E2R	sys_insn(AT_Op0, 4, AT_CRn, 8, 0)
> +#define OP_AT_S1E2W	sys_insn(AT_Op0, 4, AT_CRn, 8, 1)
> +#define OP_AT_S12E1R	sys_insn(AT_Op0, 4, AT_CRn, 8, 4)
> +#define OP_AT_S12E1W	sys_insn(AT_Op0, 4, AT_CRn, 8, 5)
> +#define OP_AT_S12E0R	sys_insn(AT_Op0, 4, AT_CRn, 8, 6)
> +#define OP_AT_S12E0W	sys_insn(AT_Op0, 4, AT_CRn, 8, 7)
> +
>  /* Common SCTLR_ELx flags. */
>  #define SCTLR_ELx_DSSBS	(BIT(44))
>  #define SCTLR_ELx_ATA	(BIT(43))
> diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
> index dbaf42ff65f1..b800dcbd157f 100644
> --- a/arch/arm64/kvm/Makefile
> +++ b/arch/arm64/kvm/Makefile
> @@ -14,7 +14,7 @@ kvm-y += arm.o mmu.o mmio.o psci.o hypercalls.o pvtime.o \
>  	 inject_fault.o va_layout.o handle_exit.o \
>  	 guest.o debug.o reset.o sys_regs.o \
>  	 vgic-sys-reg-v3.o fpsimd.o pmu.o pkvm.o \
> -	 arch_timer.o trng.o emulate-nested.o nested.o \
> +	 arch_timer.o trng.o emulate-nested.o nested.o at.o \
>  	 vgic/vgic.o vgic/vgic-init.o \
>  	 vgic/vgic-irqfd.o vgic/vgic-v2.o \
>  	 vgic/vgic-v3.o vgic/vgic-v4.o \
> diff --git a/arch/arm64/kvm/at.c b/arch/arm64/kvm/at.c
> new file mode 100644
> index 000000000000..574c664e984b
> --- /dev/null
> +++ b/arch/arm64/kvm/at.c
> @@ -0,0 +1,219 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (C) 2017 - Linaro Ltd
> + * Author: Jintack Lim <jintack.lim@linaro.org>
> + */
> +
> +#include <asm/kvm_hyp.h>
> +#include <asm/kvm_mmu.h>
> +
> +struct mmu_config {
> +	u64	ttbr0;
> +	u64	ttbr1;
> +	u64	tcr;
> +	u64	sctlr;
> +	u64	vttbr;
> +	u64	vtcr;
> +	u64	hcr;
> +};
> +
> +static void __mmu_config_save(struct mmu_config *config)
> +{
> +	config->ttbr0	= read_sysreg_el1(SYS_TTBR0);
> +	config->ttbr1	= read_sysreg_el1(SYS_TTBR1);
> +	config->tcr	= read_sysreg_el1(SYS_TCR);
> +	config->sctlr	= read_sysreg_el1(SYS_SCTLR);
> +	config->vttbr	= read_sysreg(vttbr_el2);
> +	config->vtcr	= read_sysreg(vtcr_el2);

KVM saves VTCR_EL2, but the register is never changed between
__mmu_config_{save,restore} sequences. Another comment about this below.

> +	config->hcr	= read_sysreg(hcr_el2);
> +}
> +
> +static void __mmu_config_restore(struct mmu_config *config)
> +{
> +	write_sysreg_el1(config->ttbr0,	SYS_TTBR0);
> +	write_sysreg_el1(config->ttbr1,	SYS_TTBR1);
> +	write_sysreg_el1(config->tcr,	SYS_TCR);
> +	write_sysreg_el1(config->sctlr,	SYS_SCTLR);
> +	write_sysreg(config->vttbr,	vttbr_el2);
> +	write_sysreg(config->vtcr,	vtcr_el2);
> +	write_sysreg(config->hcr,	hcr_el2);
> +
> +	isb();
> +}
> +
> +void __kvm_at_s1e01(struct kvm_vcpu *vcpu, u32 op, u64 vaddr)
> +{
> +	struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
> +	struct mmu_config config;
> +	struct kvm_s2_mmu *mmu;
> +
> +	spin_lock(&vcpu->kvm->mmu_lock);
> +
> +	/*
> +	 * If HCR_EL2.{E2H,TGE} == {1,1}, the MMU context is already
> +	 * the right one (as we trapped from vEL2).
> +	 */
> +	if (vcpu_el2_e2h_is_set(vcpu) && vcpu_el2_tge_is_set(vcpu))
> +		goto skip_mmu_switch;
> +
> +	/*
> +	 * FIXME: Obtaining the S2 MMU for a guest guest is horribly
> +	 * racy, and we may not find it (evicted by another vcpu, for
> +	 * example).
> +	 */

I think the "horribly racy" part deserves some elaboration. As far as I can tell,
get_s2_mmu_nested() and lookup_s2_mmu() are always called with kvm->mmu_lock
held.

I suppose "evicted by another vcpu" means that get_s2_mmu_nested() decided
to reuse the MMU that KVM needs, in which case it's impossible to execute
the AT instruction, as there is no shadow stage 2 for the context.

I wonder if that's something that KVM should try to avoid. One obvious
solution would be never to reuse MMUs, but that comes at the cost of
allowing an L1 guest to eat up the L0 host's memory with kvm_s2_mmu structs
by creating many L2 guests.

Another solution would be to have a software stage 1 translation table
walker which populates the shadow S2 during the walk, if and only if the
IPA is present in the virtual stage 2 with the right permissions.

What do you think?

> +	mmu = lookup_s2_mmu(vcpu->kvm,
> +			    vcpu_read_sys_reg(vcpu, VTTBR_EL2),
> +			    vcpu_read_sys_reg(vcpu, HCR_EL2));
> +
> +	if (WARN_ON(!mmu))
> +		goto out;

If I'm not mistaken, in this case, guest PAR_EL1 is left untouched by KVM.
Shouldn't KVM set PAR_EL1 to SYS_PAR_EL1_F so the L1 guest doesn't treat
the stale PAR_EL1 value as valid?

> +
> +	/* We've trapped, so everything is live on the CPU. */
> +	__mmu_config_save(&config);
> +
> +	write_sysreg_el1(ctxt_sys_reg(ctxt, TTBR0_EL1),	SYS_TTBR0);
> +	write_sysreg_el1(ctxt_sys_reg(ctxt, TTBR1_EL1),	SYS_TTBR1);
> +	write_sysreg_el1(ctxt_sys_reg(ctxt, TCR_EL1),	SYS_TCR);
> +	write_sysreg_el1(ctxt_sys_reg(ctxt, SCTLR_EL1),	SYS_SCTLR);
> +	write_sysreg(kvm_get_vttbr(mmu),		vttbr_el2);
> +	/*
> +	 * REVISIT: do we need anything from the guest's VTCR_EL2? It
> +	 * looks like keeping the host's configuration is the right
> +	 * thing to do at this stage (and we could avoid saving/restoring
> +	 * it). Keep the host's version for now.
> +	 */

I also don't think it's necessary to load the L1 guest's VTCR_EL2 register.
The register controls virtual stage 2, which is never used because KVM will
always use the shadow stage 2.

> +	write_sysreg((config.hcr & ~HCR_TGE) | HCR_VM,	hcr_el2);
> +
> +	isb();
> +
> +skip_mmu_switch:
> +
> +	switch (op) {
> +	case OP_AT_S1E1R:
> +	case OP_AT_S1E1RP:
> +		asm volatile("at s1e1r, %0" : : "r" (vaddr));
> +		break;
> +	case OP_AT_S1E1W:
> +	case OP_AT_S1E1WP:
> +		asm volatile("at s1e1w, %0" : : "r" (vaddr));
> +		break;
> +	case OP_AT_S1E0R:
> +		asm volatile("at s1e0r, %0" : : "r" (vaddr));
> +		break;
> +	case OP_AT_S1E0W:
> +		asm volatile("at s1e0w, %0" : : "r" (vaddr));
> +		break;
> +	default:
> +		WARN_ON_ONCE(1);
> +		break;
> +	}
> +
> +	isb();
> +
> +	ctxt_sys_reg(ctxt, PAR_EL1) = read_sysreg(par_el1);
> +
> +	/*
> +	 * Failed? let's leave the building now.
> +	 *
> +	 * FIXME: how about a failed translation because the shadow S2
> +	 * wasn't populated? We may need to perform a SW PTW,
> +	 * populating our shadow S2 and retry the instruction.
> +	 */
> +	if (ctxt_sys_reg(ctxt, PAR_EL1) & 1)
> +		goto nopan;
> +
> +	/* No PAN? No problem. */
> +	if (!(*vcpu_cpsr(vcpu) & PSR_PAN_BIT))
> +		goto nopan;
> +
> +	/*
> +	 * For PAN-involved AT operations, perform the same
> +	 * translation, using EL0 this time.
> +	 */

The description for FEAT_PAN is:

"When the value of this PAN state bit is 1, any privileged data access from
EL1, or EL2 when HCR_EL2.E2H is 1, to a virtual memory address that is
accessible to data accesses at EL0, generates a Permission fault."

I assume KVM executes the AT to make sure there is a valid translation
for the guest virtual address, right?

> +	switch (op) {
> +	case OP_AT_S1E1RP:
> +		asm volatile("at s1e0r, %0" : : "r" (vaddr));
> +		break;
> +	case OP_AT_S1E1WP:
> +		asm volatile("at s1e0w, %0" : : "r" (vaddr));
> +		break;
> +	default:
> +		goto nopan;
> +	}
> +
> +	/*
> +	 * If the EL0 translation has succeeded, we need to pretend
> +	 * the AT operation has failed, as the PAN setting forbids
> +	 * such a translation.

Hmm.. according to the description of FEAT_PAN, the AT translation fails
because of PAN=1 when CurrentEL=EL2 && HCR_EL2.E2H=1. So if the VCPU is at
virtual EL2 and virtual HCR_EL2.E2H=0, then it is allowed to succeed.

Thanks,
Alex

> +	 *
> +	 * FIXME: we hardcode a Level-3 permission fault. We really
> +	 * should return the real fault level.
> +	 */
> +	if (!(read_sysreg(par_el1) & 1))
> +		ctxt_sys_reg(ctxt, PAR_EL1) = 0x1f;
> +
> +nopan:
> +	if (!(vcpu_el2_e2h_is_set(vcpu) && vcpu_el2_tge_is_set(vcpu)))
> +		__mmu_config_restore(&config);
> +
> +out:
> +	spin_unlock(&vcpu->kvm->mmu_lock);
> +}
> +
> +void __kvm_at_s1e2(struct kvm_vcpu *vcpu, u32 op, u64 vaddr)
> +{
> +	struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
> +	struct mmu_config config;
> +	struct kvm_s2_mmu *mmu;
> +	u64 val;
> +
> +	spin_lock(&vcpu->kvm->mmu_lock);
> +
> +	mmu = &vcpu->kvm->arch.mmu;
> +
> +	/* We've trapped, so everything is live on the CPU. */
> +	__mmu_config_save(&config);
> +
> +	if (vcpu_el2_e2h_is_set(vcpu)) {
> +		write_sysreg_el1(ctxt_sys_reg(ctxt, TTBR0_EL2),	SYS_TTBR0);
> +		write_sysreg_el1(ctxt_sys_reg(ctxt, TTBR1_EL2),	SYS_TTBR1);
> +		write_sysreg_el1(ctxt_sys_reg(ctxt, TCR_EL2),	SYS_TCR);
> +		write_sysreg_el1(ctxt_sys_reg(ctxt, SCTLR_EL2),	SYS_SCTLR);
> +
> +		val = config.hcr;
> +	} else {
> +		write_sysreg_el1(ctxt_sys_reg(ctxt, TTBR0_EL2),	SYS_TTBR0);
> +		val = translate_tcr_el2_to_tcr_el1(ctxt_sys_reg(ctxt, TCR_EL2));
> +		write_sysreg_el1(val, SYS_TCR);
> +		val = translate_sctlr_el2_to_sctlr_el1(ctxt_sys_reg(ctxt, SCTLR_EL2));
> +		write_sysreg_el1(val, SYS_SCTLR);
> +
> +		val = config.hcr | HCR_NV | HCR_NV1;
> +	}
> +
> +	write_sysreg(kvm_get_vttbr(mmu),		vttbr_el2);
> +	/* FIXME: write S2 MMU VTCR_EL2? */
> +	write_sysreg((val & ~HCR_TGE) | HCR_VM,		hcr_el2);
> +
> +	isb();
> +
> +	switch (op) {
> +	case OP_AT_S1E2R:
> +		asm volatile("at s1e1r, %0" : : "r" (vaddr));
> +		break;
> +	case OP_AT_S1E2W:
> +		asm volatile("at s1e1w, %0" : : "r" (vaddr));
> +		break;
> +	default:
> +		WARN_ON_ONCE(1);
> +		break;
> +	}
> +
> +	isb();
> +
> +	/* FIXME: handle failed translation due to shadow S2 */
> +	ctxt_sys_reg(ctxt, PAR_EL1) = read_sysreg(par_el1);
> +
> +	__mmu_config_restore(&config);
> +	spin_unlock(&vcpu->kvm->mmu_lock);
> +}
> diff --git a/arch/arm64/kvm/hyp/vhe/switch.c b/arch/arm64/kvm/hyp/vhe/switch.c
> index 28845f907cfc..b7790d3c4122 100644
> --- a/arch/arm64/kvm/hyp/vhe/switch.c
> +++ b/arch/arm64/kvm/hyp/vhe/switch.c
> @@ -41,9 +41,10 @@ static void __activate_traps(struct kvm_vcpu *vcpu)
>  		if (!vcpu_el2_e2h_is_set(vcpu)) {
>  			/*
>  			 * For a guest hypervisor on v8.0, trap and emulate
> -			 * the EL1 virtual memory control register accesses.
> +			 * the EL1 virtual memory control register accesses
> +			 * as well as the AT S1 operations.
>  			 */
> -			hcr |= HCR_TVM | HCR_TRVM | HCR_NV1;
> +			hcr |= HCR_TVM | HCR_TRVM | HCR_AT | HCR_NV1;
>  		} else {
>  			/*
>  			 * For a guest hypervisor on v8.1 (VHE), allow to
> @@ -68,6 +69,14 @@ static void __activate_traps(struct kvm_vcpu *vcpu)
>  			hcr &= ~HCR_TVM;
>  
>  			hcr |= vhcr_el2 & (HCR_TVM | HCR_TRVM);
> +
> +			/*
> +			 * If we're using the EL1 translation regime
> +			 * (TGE clear), then ensure that AT S1 ops are
> +			 * trapped too.
> +			 */
> +			if (!vcpu_el2_tge_is_set(vcpu))
> +				hcr |= HCR_AT;
>  		}
>  	}
>  
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index f669618f966b..7be57e1b7019 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -1704,7 +1704,6 @@ static bool access_sp_el1(struct kvm_vcpu *vcpu,
>  	return true;
>  }
>  
> -
>  static bool access_elr(struct kvm_vcpu *vcpu,
>  		       struct sys_reg_params *p,
>  		       const struct sys_reg_desc *r)
> @@ -2236,12 +2235,236 @@ static const struct sys_reg_desc sys_reg_descs[] = {
>  	EL2_REG(SP_EL2, NULL, reset_unknown, 0),
>  };
>  
> -#define SYS_INSN_TO_DESC(insn, access_fn, forward_fn)	\
> -	{ SYS_DESC((insn)), (access_fn), NULL, 0, 0, NULL, NULL, (forward_fn) }
> +static bool handle_s1e01(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
> +			 const struct sys_reg_desc *r)
> +{
> +	int sys_encoding = sys_insn(p->Op0, p->Op1, p->CRn, p->CRm, p->Op2);
> +
> +	if (vcpu_has_nv(vcpu) && forward_traps(vcpu, HCR_AT))
> +		return false;
> +
> +	__kvm_at_s1e01(vcpu, sys_encoding, p->regval);
> +
> +	return true;
> +}
> +
> +static bool handle_s1e2(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
> +			const struct sys_reg_desc *r)
> +{
> +	int sys_encoding = sys_insn(p->Op0, p->Op1, p->CRn, p->CRm, p->Op2);
> +
> +	if (vcpu_has_nv(vcpu) && forward_nv_traps(vcpu))
> +		return false;
> +
> +	__kvm_at_s1e2(vcpu, sys_encoding, p->regval);
> +
> +	return true;
> +}
> +
> +static u64 setup_par_aborted(u32 esr)
> +{
> +	u64 par = 0;
> +
> +	/* S [9]: fault in the stage 2 translation */
> +	par |= (1 << 9);
> +	/* FST [6:1]: Fault status code  */
> +	par |= (esr << 1);
> +	/* F [0]: translation is aborted */
> +	par |= 1;
> +
> +	return par;
> +}
> +
> +static u64 setup_par_completed(struct kvm_vcpu *vcpu, struct kvm_s2_trans *out)
> +{
> +	u64 par, vtcr_sh0;
> +
> +	/* F [0]: Translation is completed successfully */
> +	par = 0;
> +	/* ATTR [63:56] */
> +	par |= out->upper_attr;
> +	/* PA [47:12] */
> +	par |= out->output & GENMASK_ULL(47, 12);
> +	/* RES1 [11] */
> +	par |= (1UL << 11);
> +	/* SH [8:7]: Shareability attribute */
> +	vtcr_sh0 = vcpu_read_sys_reg(vcpu, VTCR_EL2) & VTCR_EL2_SH0_MASK;
> +	par |= (vtcr_sh0 >> VTCR_EL2_SH0_SHIFT) << 7;
> +
> +	return par;
> +}
> +
> +static bool handle_s12(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
> +		       const struct sys_reg_desc *r, bool write)
> +{
> +	u64 par, va;
> +	u32 esr, op;
> +	phys_addr_t ipa;
> +	struct kvm_s2_trans out;
> +	int ret;
> +
> +	if (vcpu_has_nv(vcpu) && forward_nv_traps(vcpu))
> +		return false;
> +
> +	/* Do the stage-1 translation */
> +	va = p->regval;
> +	op = sys_insn(p->Op0, p->Op1, p->CRn, p->CRm, p->Op2);
> +	switch (op) {
> +	case OP_AT_S12E1R:
> +		op = OP_AT_S1E1R;
> +		break;
> +	case OP_AT_S12E1W:
> +		op = OP_AT_S1E1W;
> +		break;
> +	case OP_AT_S12E0R:
> +		op = OP_AT_S1E0R;
> +		break;
> +	case OP_AT_S12E0W:
> +		op = OP_AT_S1E0W;
> +		break;
> +	default:
> +		WARN_ON_ONCE(1);
> +		return true;
> +	}
> +
> +	__kvm_at_s1e01(vcpu, op, va);
> +	par = vcpu_read_sys_reg(vcpu, PAR_EL1);
> +	if (par & 1) {
> +		/* The stage-1 translation aborted */
> +		return true;
> +	}
> +
> +	/* Do the stage-2 translation */
> +	ipa = (par & GENMASK_ULL(47, 12)) | (va & GENMASK_ULL(11, 0));
> +	out.esr = 0;
> +	ret = kvm_walk_nested_s2(vcpu, ipa, &out);
> +	if (ret < 0)
> +		return false;
> +
> +	/* Check if the stage-2 PTW is aborted */
> +	if (out.esr) {
> +		esr = out.esr;
> +		goto s2_trans_abort;
> +	}
> +
> +	/* Check the access permission */
> +	if ((!write && !out.readable) || (write && !out.writable)) {
> +		esr = ESR_ELx_FSC_PERM;
> +		esr |= out.level & 0x3;
> +		goto s2_trans_abort;
> +	}
> +
> +	vcpu_write_sys_reg(vcpu, setup_par_completed(vcpu, &out), PAR_EL1);
> +	return true;
> +
> +s2_trans_abort:
> +	vcpu_write_sys_reg(vcpu, setup_par_aborted(esr), PAR_EL1);
> +	return true;
> +}
> +
> +static bool handle_s12r(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
> +			const struct sys_reg_desc *r)
> +{
> +	return handle_s12(vcpu, p, r, false);
> +}
> +
> +static bool handle_s12w(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
> +			const struct sys_reg_desc *r)
> +{
> +	return handle_s12(vcpu, p, r, true);
> +}
> +
> +/*
> + * AT instruction emulation
> + *
> + * We emulate AT instructions executed in the virtual EL2.
> + * Basic strategy for the stage-1 translation emulation is to load proper
> + * context, which depends on the trapped instruction and the virtual HCR_EL2,
> + * to the EL1 virtual memory control registers and execute S1E[01] instructions
> + * in EL2. See below for more detail.
> + *
> + * For the stage-2 translation, which is necessary for S12E[01] emulation,
> + * we walk the guest hypervisor's stage-2 page table in software.
> + *
> + * The stage-1 translation emulations can be divided into two groups depending
> + * on the translation regime.
> + *
> + * 1. EL2 AT instructions: S1E2x
> + * +-----------------------------------------------------------------------+
> + * |                             |         Setting for the emulation       |
> + * | Virtual HCR_EL2.E2H on trap |-----------------------------------------+
> + * |                             | Phys EL1 regs | Phys NV, NV1 | Phys TGE |
> + * |-----------------------------------------------------------------------|
> + * |             0               |     vEL2      |    (1, 1)    |    0     |
> + * |             1               |     vEL2      |    (0, 0)    |    0     |
> + * +-----------------------------------------------------------------------+
> + *
> + * We emulate the EL2 AT instructions by loading virtual EL2 context
> + * to the EL1 virtual memory control registers and executing corresponding
> + * EL1 AT instructions.
> + *
> + * We set physical NV and NV1 bits to use EL2 page table format for non-VHE
> + * guest hypervisor (i.e. HCR_EL2.E2H == 0). As a VHE guest hypervisor uses the
> + * EL1 page table format, we don't set those bits.
> + *
> + * We should clear physical TGE bit not to use the EL2 translation regime when
> + * the host uses the VHE feature.
> + *
> + *
> + * 2. EL0/EL1 AT instructions: S1E[01]x, S12E1x
> + * +----------------------------------------------------------------------+
> + * |   Virtual HCR_EL2 on trap  |        Setting for the emulation        |
> + * |----------------------------------------------------------------------+
> + * | (vE2H, vTGE) | (vNV, vNV1) | Phys EL1 regs | Phys NV, NV1 | Phys TGE |
> + * |----------------------------------------------------------------------|
> + * |    (0, 0)*   |   (0, 0)    |      vEL1     |    (0, 0)    |    0     |
> + * |    (0, 0)    |   (1, 1)    |      vEL1     |    (1, 1)    |    0     |
> + * |    (1, 1)    |   (0, 0)    |      vEL2     |    (0, 0)    |    0     |
> + * |    (1, 1)    |   (1, 1)    |      vEL2     |    (1, 1)    |    0     |
> + * +----------------------------------------------------------------------+
> + *
> + * *For (0, 0) in the 'Virtual HCR_EL2 on trap' column, it actually means
> + *  (1, 1). Keep them (0, 0) just for the readability.
> + *
> + * We set physical EL1 virtual memory control registers depending on
> + * (vE2H, vTGE) pair. When the pair is (0, 0) where AT instructions are
> + * supposed to use EL0/EL1 translation regime, we load the EL1 registers with
> + * the virtual EL1 registers (i.e. EL1 registers from the guest hypervisor's
> + * point of view). When the pair is (1, 1), however, AT instructions are defined
> + * to apply EL2 translation regime. To emulate this behavior, we load the EL1
> + * registers with the virtual EL2 context. (i.e the shadow registers)
> + *
> + * We respect the virtual NV and NV1 bit for the emulation. When those bits are
> + * set, it means that a guest hypervisor would like to use EL2 page table format
> + * for the EL1 translation regime. We emulate this by setting the physical
> + * NV and NV1 bits.
> + */
> +
> +#define SYS_INSN(insn, access_fn)					\
> +	{								\
> +		SYS_DESC(OP_##insn),					\
> +		.access = (access_fn),					\
> +	}
> +
>  static struct sys_reg_desc sys_insn_descs[] = {
>  	{ SYS_DESC(SYS_DC_ISW), access_dcsw },
> +
> +	SYS_INSN(AT_S1E1R, handle_s1e01),
> +	SYS_INSN(AT_S1E1W, handle_s1e01),
> +	SYS_INSN(AT_S1E0R, handle_s1e01),
> +	SYS_INSN(AT_S1E0W, handle_s1e01),
> +	SYS_INSN(AT_S1E1RP, handle_s1e01),
> +	SYS_INSN(AT_S1E1WP, handle_s1e01),
> +
>  	{ SYS_DESC(SYS_DC_CSW), access_dcsw },
>  	{ SYS_DESC(SYS_DC_CISW), access_dcsw },
> +
> +	SYS_INSN(AT_S1E2R, handle_s1e2),
> +	SYS_INSN(AT_S1E2W, handle_s1e2),
> +	SYS_INSN(AT_S12E1R, handle_s12r),
> +	SYS_INSN(AT_S12E1W, handle_s12w),
> +	SYS_INSN(AT_S12E0R, handle_s12r),
> +	SYS_INSN(AT_S12E0W, handle_s12w),
>  };
>  
>  static bool trap_dbgdidr(struct kvm_vcpu *vcpu,
> -- 
> 2.30.2
> 
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 378+ messages in thread

* Re: [PATCH v6 41/64] KVM: arm64: nv: Trap and emulate TLBI instructions from virtual EL2
  2022-01-28 12:18   ` Marc Zyngier
  (?)
@ 2022-02-24 15:56     ` Alexandru Elisei
  -1 siblings, 0 replies; 378+ messages in thread
From: Alexandru Elisei @ 2022-02-24 15:56 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, Andre Przywara, Christoffer Dall,
	Jintack Lim, Haibo Xu, Ganapatrao Kulkarni, Chase Conklin,
	Russell King (Oracle),
	James Morse, Suzuki K Poulose, karl.heubaum, mihai.carabas,
	miguel.luis, kernel-team

Hi,

On Fri, Jan 28, 2022 at 12:18:49PM +0000, Marc Zyngier wrote:
> From: Jintack Lim <jintack.lim@linaro.org>
> 
> When supporting nested virtualization a guest hypervisor executing TLBI
> instructions must be trapped and emulated by the host hypervisor,
> because the guest hypervisor can only affect physical TLB entries
> relating to its own execution environment (virtual EL2 in EL1) but not
> to the nested guests as required by the semantics of the instructions
> and TLBI instructions might also result in updates (invalidations) to
> shadow page tables.
> 
> This patch does several things.

Sounds like a great reason to split this into several patches!

> 
> 1. List and define all TLBI system instructions to emulate.
> 
> 2. Emulate TLBI ALLE2(IS) instruction executed in the virtual EL2. Since
> we emulate the virtual EL2 in EL1, we invalidate EL1&0 regime stage
> 1 TLB entries by setting vttbr_el2 to use the VMID of the virtual EL2.
> 
> 3. Emulate TLBI VAE2* instruction executed in the virtual EL2. Based on the
> same principle as TLBI ALLE2 instruction, we can simply emulate those
> instructions by executing corresponding VAE1* instructions with the
> virtual EL2's VMID assigned by the host hypervisor.
> 
> Note that we are able to emulate TLBI ALLE2IS precisely by only
> invalidating stage 1 TLB entries via the TLBI VMALLE1IS instruction, but to
> make it simple, we reuse the existing function, __kvm_tlb_flush_vmid(),
> which invalidates both stage 1 and stage 2 TLB entries.
> 
> 4. TLBI ALLE1(IS) instruction invalidates all EL1&0 regime stage 1 and 2
> TLB entries (on all PEs in the same Inner Shareable domain). To emulate
> these instructions, we first need to clear all the mappings in the
> shadow page tables since executing those instructions implies the change
> of mappings in the stage 2 page tables maintained by the guest
> hypervisor.  We then need to invalidate all EL1&0 regime stage 1 and 2
> TLB entries of all VMIDs, which are assigned by the host hypervisor, for
> this VM.
> 
> 5. Based on the same principle as TLBI ALLE1(IS) emulation, we clear the
> mappings in the shadow stage-2 page tables and invalidate TLB entries.
> But this time we do it only for the current VMID from the guest
> hypervisor's perspective, not for all VMIDs.
> 
> 6. Based on the same principle as TLBI ALLE1(IS) and TLBI VMALLS12E1(IS)
> emulation, we clear the mappings in the shadow stage-2 page tables and
> invalidate TLB entries. We do it only for one mapping for the current
> VMID from the guest hypervisor's view.
> 
> 7. Forward system instruction traps to the virtual EL2 if a
> corresponding bit in the virtual HCR_EL2 is set.
> 
> 8. Even though a guest hypervisor can execute TLBI instructions that are
> accessible at EL1 without trap, it's wrong; all those TLBI instructions
> work based on current VMID, and when running a guest hypervisor current
> VMID is the one for itself, not the one from the virtual vttbr_el2. So
> letting a guest hypervisor execute those TLBI instructions results in
> invalidating its own TLB entries and leaving invalid TLB entries
> unhandled.

The points above look like a great starting point for turning this patch
into several.

Thanks,
Alex

> 
> Therefore we trap and emulate those TLBI instructions. The emulation is
> simple; we find a shadow VMID mapped to the virtual vttbr_el2, set it in
> the physical vttbr_el2, then execute the same instruction in EL2.
> 
> We don't set HCR_EL2.TTLB bit yet.
> 
>   [ Changes performed by Marc Zyngier:
> 
>     The TLBI handling code more or less directly executes the same
>     instruction that has been trapped (with an EL2->EL1 conversion
>     in the case of an EL2 TLBI), but that's unfortunately not enough:
> 
>     - TLBIs must be upgraded to the Inner Shareable domain to account
>       for vcpu migration, just like we already have with HCR_EL2.FB.
> 
>     - The DSB instruction that synchronises these must thus be on
>       the Inner Shareable domain as well.
> 
>     - Prior to executing the TLBI, we need another DSB ISHST to make
>       sure that the update to the page tables is now visible.
> 
>       Ordering of system instructions fixed
> 
>     - The current TLB invalidation code is pretty buggy, as it assumes a
>       page mapping. On the contrary, it is likely that TLB invalidation
>       will cover more than a single page, and the size should be decided
>       by the guest's configuration (and not the host's).
> 
>       Since we don't cache the guest mapping sizes in the shadow PT yet,
>       let's assume the worst case (a block mapping) and invalidate that.
> 
>       Take this opportunity to fix the decoding of the parameter (it
>       isn't a straight IPA).
> 
>     - In general, we always emulate local TLB invalidations as being
>       upgraded to the Inner Shareable domain so that we can easily
>       deal with vcpu migration. This is consistent with the fact that
>       we set HCR_EL2.FB when running non-nested VMs.
> 
>       So let's emulate TLBI ALLE2 as ALLE2IS.
>   ]
> 
>   [ Changes performed by Christoffer Dall:
> 
>     Sometimes when we are invalidating the TLB for a certain S2 MMU
>     context, this context can also have EL2 context associated with it
>     and we have to invalidate this too.
>   ]
> 
> Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/include/asm/kvm_asm.h |   2 +
>  arch/arm64/include/asm/sysreg.h  |  36 +++++
>  arch/arm64/kvm/hyp/vhe/switch.c  |   8 +-
>  arch/arm64/kvm/hyp/vhe/tlb.c     |  81 +++++++++++
>  arch/arm64/kvm/mmu.c             |  19 ++-
>  arch/arm64/kvm/sys_regs.c        | 227 +++++++++++++++++++++++++++++++
>  6 files changed, 368 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
> index e22861ece3c3..64ddbecab241 100644
> --- a/arch/arm64/include/asm/kvm_asm.h
> +++ b/arch/arm64/include/asm/kvm_asm.h
> @@ -206,6 +206,8 @@ extern void __kvm_flush_cpu_context(struct kvm_s2_mmu *mmu);
>  extern void __kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu, phys_addr_t ipa,
>  				     int level);
>  extern void __kvm_tlb_flush_vmid(struct kvm_s2_mmu *mmu);
> +extern void __kvm_tlb_vae2is(struct kvm_s2_mmu *mmu, u64 va, u64 sys_encoding);
> +extern void __kvm_tlb_el1_instr(struct kvm_s2_mmu *mmu, u64 val, u64 sys_encoding);
>  
>  extern void __kvm_timer_set_cntvoff(u64 cntvoff);
>  extern void __kvm_at_s1e01(struct kvm_vcpu *vcpu, u32 op, u64 vaddr);
> diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
> index ea17d1eabfc5..41c3603a3f29 100644
> --- a/arch/arm64/include/asm/sysreg.h
> +++ b/arch/arm64/include/asm/sysreg.h
> @@ -675,6 +675,42 @@
>  #define OP_AT_S12E0R	sys_insn(AT_Op0, 4, AT_CRn, 8, 6)
>  #define OP_AT_S12E0W	sys_insn(AT_Op0, 4, AT_CRn, 8, 7)
>  
> +/* TLBI instructions */
> +#define TLBI_Op0	1
> +#define TLBI_Op1_EL1	0	/* Accessible from EL1 or higher */
> +#define TLBI_Op1_EL2	4	/* Accessible from EL2 or higher */
> +#define TLBI_CRn	8
> +#define tlbi_insn_el1(CRm, Op2)	sys_insn(TLBI_Op0, TLBI_Op1_EL1, TLBI_CRn, (CRm), (Op2))
> +#define tlbi_insn_el2(CRm, Op2)	sys_insn(TLBI_Op0, TLBI_Op1_EL2, TLBI_CRn, (CRm), (Op2))
> +
> +#define OP_TLBI_VMALLE1IS	tlbi_insn_el1(3, 0)
> +#define OP_TLBI_VAE1IS		tlbi_insn_el1(3, 1)
> +#define OP_TLBI_ASIDE1IS	tlbi_insn_el1(3, 2)
> +#define OP_TLBI_VAAE1IS		tlbi_insn_el1(3, 3)
> +#define OP_TLBI_VALE1IS		tlbi_insn_el1(3, 5)
> +#define OP_TLBI_VAALE1IS	tlbi_insn_el1(3, 7)
> +#define OP_TLBI_VMALLE1		tlbi_insn_el1(7, 0)
> +#define OP_TLBI_VAE1		tlbi_insn_el1(7, 1)
> +#define OP_TLBI_ASIDE1		tlbi_insn_el1(7, 2)
> +#define OP_TLBI_VAAE1		tlbi_insn_el1(7, 3)
> +#define OP_TLBI_VALE1		tlbi_insn_el1(7, 5)
> +#define OP_TLBI_VAALE1		tlbi_insn_el1(7, 7)
> +
> +#define OP_TLBI_IPAS2E1IS	tlbi_insn_el2(0, 1)
> +#define OP_TLBI_IPAS2LE1IS	tlbi_insn_el2(0, 5)
> +#define OP_TLBI_ALLE2IS		tlbi_insn_el2(3, 0)
> +#define OP_TLBI_VAE2IS		tlbi_insn_el2(3, 1)
> +#define OP_TLBI_ALLE1IS		tlbi_insn_el2(3, 4)
> +#define OP_TLBI_VALE2IS		tlbi_insn_el2(3, 5)
> +#define OP_TLBI_VMALLS12E1IS	tlbi_insn_el2(3, 6)
> +#define OP_TLBI_IPAS2E1		tlbi_insn_el2(4, 1)
> +#define OP_TLBI_IPAS2LE1	tlbi_insn_el2(4, 5)
> +#define OP_TLBI_ALLE2		tlbi_insn_el2(7, 0)
> +#define OP_TLBI_VAE2		tlbi_insn_el2(7, 1)
> +#define OP_TLBI_ALLE1		tlbi_insn_el2(7, 4)
> +#define OP_TLBI_VALE2		tlbi_insn_el2(7, 5)
> +#define OP_TLBI_VMALLS12E1	tlbi_insn_el2(7, 6)
> +
>  /* Common SCTLR_ELx flags. */
>  #define SCTLR_ELx_DSSBS	(BIT(44))
>  #define SCTLR_ELx_ATA	(BIT(43))
> diff --git a/arch/arm64/kvm/hyp/vhe/switch.c b/arch/arm64/kvm/hyp/vhe/switch.c
> index b7790d3c4122..0e164cc8e913 100644
> --- a/arch/arm64/kvm/hyp/vhe/switch.c
> +++ b/arch/arm64/kvm/hyp/vhe/switch.c
> @@ -44,7 +44,7 @@ static void __activate_traps(struct kvm_vcpu *vcpu)
>  			 * the EL1 virtual memory control register accesses
>  			 * as well as the AT S1 operations.
>  			 */
> -			hcr |= HCR_TVM | HCR_TRVM | HCR_AT | HCR_NV1;
> +			hcr |= HCR_TVM | HCR_TRVM | HCR_AT | HCR_TTLB | HCR_NV1;
>  		} else {
>  			/*
>  			 * For a guest hypervisor on v8.1 (VHE), allow to
> @@ -72,11 +72,11 @@ static void __activate_traps(struct kvm_vcpu *vcpu)
>  
>  			/*
>  			 * If we're using the EL1 translation regime
> -			 * (TGE clear), then ensure that AT S1 ops are
> -			 * trapped too.
> +			 * (TGE clear), then ensure that AT S1 and
> +			 * TLBI E1 ops are trapped too.
>  			 */
>  			if (!vcpu_el2_tge_is_set(vcpu))
> -				hcr |= HCR_AT;
> +				hcr |= HCR_AT | HCR_TTLB;
>  		}
>  	}
>  
> diff --git a/arch/arm64/kvm/hyp/vhe/tlb.c b/arch/arm64/kvm/hyp/vhe/tlb.c
> index 24cef9b87f9e..c4389db4cc22 100644
> --- a/arch/arm64/kvm/hyp/vhe/tlb.c
> +++ b/arch/arm64/kvm/hyp/vhe/tlb.c
> @@ -161,3 +161,84 @@ void __kvm_flush_vm_context(void)
>  
>  	dsb(ish);
>  }
> +
> +void __kvm_tlb_vae2is(struct kvm_s2_mmu *mmu, u64 va, u64 sys_encoding)
> +{
> +	struct tlb_inv_context cxt;
> +
> +	dsb(ishst);
> +
> +	/* Switch to requested VMID */
> +	__tlb_switch_to_guest(mmu, &cxt);
> +
> +	/*
> +	 * Execute the EL1 version of TLBI VAE2* instruction, forcing
> +	 * an upgrade to the Inner Shareable domain in order to
> +	 * perform the invalidation on all CPUs.
> +	 */
> +	switch (sys_encoding) {
> +	case OP_TLBI_VAE2:
> +	case OP_TLBI_VAE2IS:
> +		__tlbi(vae1is, va);
> +		break;
> +	case OP_TLBI_VALE2:
> +	case OP_TLBI_VALE2IS:
> +		__tlbi(vale1is, va);
> +		break;
> +	default:
> +		break;
> +	}
> +	dsb(ish);
> +	isb();
> +
> +	__tlb_switch_to_host(&cxt);
> +}
> +
> +void __kvm_tlb_el1_instr(struct kvm_s2_mmu *mmu, u64 val, u64 sys_encoding)
> +{
> +	struct tlb_inv_context cxt;
> +
> +	dsb(ishst);
> +
> +	/* Switch to requested VMID */
> +	__tlb_switch_to_guest(mmu, &cxt);
> +
> +	/*
> +	 * Execute the same instruction as the guest hypervisor did,
> +	 * expanding the scope of local TLB invalidations to the Inner
> +	 * Shareable domain so that it takes place on all CPUs. This
> +	 * is equivalent to having HCR_EL2.FB set.
> +	 */
> +	switch (sys_encoding) {
> +	case OP_TLBI_VMALLE1:
> +	case OP_TLBI_VMALLE1IS:
> +		__tlbi(vmalle1is);
> +		break;
> +	case OP_TLBI_VAE1:
> +	case OP_TLBI_VAE1IS:
> +		__tlbi(vae1is, val);
> +		break;
> +	case OP_TLBI_ASIDE1:
> +	case OP_TLBI_ASIDE1IS:
> +		__tlbi(aside1is, val);
> +		break;
> +	case OP_TLBI_VAAE1:
> +	case OP_TLBI_VAAE1IS:
> +		__tlbi(vaae1is, val);
> +		break;
> +	case OP_TLBI_VALE1:
> +	case OP_TLBI_VALE1IS:
> +		__tlbi(vale1is, val);
> +		break;
> +	case OP_TLBI_VAALE1:
> +	case OP_TLBI_VAALE1IS:
> +		__tlbi(vaale1is, val);
> +		break;
> +	default:
> +		break;
> +	}
> +	dsb(ish);
> +	isb();
> +
> +	__tlb_switch_to_host(&cxt);
> +}
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index b9b11be65009..4b38f2f3c997 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -80,8 +80,25 @@ static bool memslot_is_logging(struct kvm_memory_slot *memslot)
>   */
>  void kvm_flush_remote_tlbs(struct kvm *kvm)
>  {
> +	struct kvm_s2_mmu *mmu = &kvm->arch.mmu;
> +
>  	++kvm->stat.generic.remote_tlb_flush_requests;
> -	kvm_call_hyp(__kvm_tlb_flush_vmid, &kvm->arch.mmu);
> +
> +	if (mmu == &kvm->arch.mmu) {
> +		/*
> +		 * For a normal (i.e. non-nested) guest, flush entries for the
> +		 * given VMID.
> +		 */
> +		kvm_call_hyp(__kvm_tlb_flush_vmid, mmu);
> +	} else {
> +		/*
> +		 * When supporting nested virtualization, we can have multiple
> +		 * VMIDs in play for each VCPU in the VM, so it's really not
> +		 * worth it to try to quiesce the system and flush all the
> +		 * VMIDs that may be in use, instead just nuke the whole thing.
> +		 */
> +		kvm_call_hyp(__kvm_flush_vm_context);
> +	}
>  }
>  
>  static bool kvm_is_device_pfn(unsigned long pfn)
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 7be57e1b7019..d7441b8ba406 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -2374,6 +2374,205 @@ static bool handle_s12w(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
>  	return handle_s12(vcpu, p, r, true);
>  }
>  
> +static bool handle_alle2is(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
> +			   const struct sys_reg_desc *r)
> +{
> +	if (vcpu_has_nv(vcpu) && forward_nv_traps(vcpu))
> +		return false;
> +
> +	/*
> +	 * To emulate invalidating all EL2 regime stage 1 TLB entries for all
> +	 * PEs, executing TLBI VMALLE1IS is enough. But reuse the existing
> +	 * interface for the simplicity; invalidating stage 2 entries doesn't
> +	 * affect the correctness.
> +	 */
> +	__kvm_tlb_flush_vmid(&vcpu->kvm->arch.mmu);
> +	return true;
> +}
> +
> +static bool handle_vae2is(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
> +			  const struct sys_reg_desc *r)
> +{
> +	int sys_encoding = sys_insn(p->Op0, p->Op1, p->CRn, p->CRm, p->Op2);
> +
> +	if (vcpu_has_nv(vcpu) && forward_nv_traps(vcpu))
> +		return false;
> +
> +	/*
> +	 * Based on the same principle as TLBI ALLE2 instruction
> +	 * emulation, we emulate TLBI VAE2* instructions by executing
> +	 * corresponding TLBI VAE1* instructions with the virtual
> +	 * EL2's VMID assigned by the host hypervisor.
> +	 */
> +	__kvm_tlb_vae2is(&vcpu->kvm->arch.mmu, p->regval, sys_encoding);
> +	return true;
> +}
> +
> +static bool handle_alle1is(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
> +			   const struct sys_reg_desc *r)
> +{
> +	struct kvm_s2_mmu *mmu = &vcpu->kvm->arch.mmu;
> +
> +	if (vcpu_has_nv(vcpu) && forward_nv_traps(vcpu))
> +		return false;
> +
> +	spin_lock(&vcpu->kvm->mmu_lock);
> +
> +	/*
> +	 * Clear all mappings in the shadow page tables and invalidate the stage
> +	 * 1 and 2 TLB entries via kvm_tlb_flush_vmid_ipa().
> +	 */
> +	kvm_nested_s2_clear(vcpu->kvm);
> +
> +	if (mmu->vmid.vmid_gen) {
> +		/*
> +		 * Invalidate the stage 1 and 2 TLB entries for the host OS
> +		 * in a VM only if there is one.
> +		 */
> +		__kvm_tlb_flush_vmid(mmu);
> +	}
> +
> +	spin_unlock(&vcpu->kvm->mmu_lock);
> +
> +	return true;
> +}
> +
> +static bool handle_vmalls12e1is(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
> +				const struct sys_reg_desc *r)
> +{
> +	struct kvm_s2_mmu *mmu;
> +	u64 vttbr;
> +
> +	if (vcpu_has_nv(vcpu) && forward_nv_traps(vcpu))
> +		return false;
> +
> +	vttbr = vcpu_read_sys_reg(vcpu, VTTBR_EL2);
> +	spin_lock(&vcpu->kvm->mmu_lock);
> +
> +	mmu = lookup_s2_mmu(vcpu->kvm, vttbr, HCR_VM);
> +	if (mmu)
> +		kvm_unmap_stage2_range(mmu, 0, kvm_phys_size(vcpu->kvm));
> +
> +	mmu = lookup_s2_mmu(vcpu->kvm, vttbr, 0);
> +	if (mmu)
> +		kvm_unmap_stage2_range(mmu, 0, kvm_phys_size(vcpu->kvm));
> +
> +	spin_unlock(&vcpu->kvm->mmu_lock);
> +
> +	return true;
> +}
> +
> +static bool handle_ipas2e1is(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
> +			     const struct sys_reg_desc *r)
> +{
> +	u64 vttbr = vcpu_read_sys_reg(vcpu, VTTBR_EL2);
> +	u64 vtcr = vcpu_read_sys_reg(vcpu, VTCR_EL2);
> +	struct kvm_s2_mmu *mmu;
> +	u64 base_addr;
> +	int max_size;
> +
> +	if (vcpu_has_nv(vcpu) && forward_nv_traps(vcpu))
> +		return false;
> +
> +	/*
> +	 * We drop a number of things from the supplied value:
> +	 *
> +	 * - NS bit: we're non-secure only.
> +	 *
> +	 * - TTL field: We already have the granule size from the
> +	 *   VTCR_EL2.TG0 field, and the level is only relevant to the
> +	 *   guest's S2PT.
> +	 *
> +	 * - IPA[51:48]: We don't support 52bit IPA just yet...
> +	 *
> +	 * And of course, adjust the IPA to be on an actual address.
> +	 */
> +	base_addr = (p->regval & GENMASK_ULL(35, 0)) << 12;
> +
> +	/* Compute the maximum extent of the invalidation */
> +	switch ((vtcr & VTCR_EL2_TG0_MASK)) {
> +	case VTCR_EL2_TG0_4K:
> +		max_size = SZ_1G;
> +		break;
> +	case VTCR_EL2_TG0_16K:
> +		max_size = SZ_32M;
> +		break;
> +	case VTCR_EL2_TG0_64K:
> +		/*
> +		 * No, we do not support 52bit IPA in nested yet. Once
> +		 * we do, this should be 4TB.
> +		 */
> +		/* FIXME: remove the 52bit PA support from the IDregs */
> +		max_size = SZ_512M;
> +		break;
> +	default:
> +		BUG();
> +	}
> +
> +	spin_lock(&vcpu->kvm->mmu_lock);
> +
> +	mmu = lookup_s2_mmu(vcpu->kvm, vttbr, HCR_VM);
> +	if (mmu)
> +		kvm_unmap_stage2_range(mmu, base_addr, max_size);
> +
> +	mmu = lookup_s2_mmu(vcpu->kvm, vttbr, 0);
> +	if (mmu)
> +		kvm_unmap_stage2_range(mmu, base_addr, max_size);
> +
> +	spin_unlock(&vcpu->kvm->mmu_lock);
> +
> +	return true;
> +}
> +
> +static bool handle_tlbi_el1(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
> +			    const struct sys_reg_desc *r)
> +{
> +	u32 sys_encoding = sys_insn(p->Op0, p->Op1, p->CRn, p->CRm, p->Op2);
> +
> +	if (vcpu_has_nv(vcpu) && forward_traps(vcpu, HCR_TTLB))
> +		return false;
> +
> +	/*
> +	 * If we're here, this is because we've trapped on a EL1 TLBI
> +	 * instruction that affects the EL1 translation regime while
> +	 * we're running in a context that doesn't allow us to let the
> +	 * HW do its thing (aka vEL2):
> +	 *
> +	 * - HCR_EL2.E2H == 0 : a non-VHE guest
> +	 * - HCR_EL2.{E2H,TGE} == { 1, 0 } : a VHE guest in guest mode
> +	 *
> +	 * We don't expect these helpers to ever be called when running
> +	 * in a vEL1 context.
> +	 */
> +
> +	WARN_ON(!vcpu_is_el2(vcpu));
> +
> +	mutex_lock(&vcpu->kvm->lock);
> +
> +	if ((__vcpu_sys_reg(vcpu, HCR_EL2) & (HCR_E2H | HCR_TGE)) != (HCR_E2H | HCR_TGE)) {
> +		u64 virtual_vttbr = vcpu_read_sys_reg(vcpu, VTTBR_EL2);
> +		struct kvm_s2_mmu *mmu;
> +
> +		mmu = lookup_s2_mmu(vcpu->kvm, virtual_vttbr, HCR_VM);
> +		if (mmu)
> +			__kvm_tlb_el1_instr(mmu, p->regval, sys_encoding);
> +
> +		mmu = lookup_s2_mmu(vcpu->kvm, virtual_vttbr, 0);
> +		if (mmu)
> +			__kvm_tlb_el1_instr(mmu, p->regval, sys_encoding);
> +	} else {
> +		/*
> +		 * ARMv8.4-NV allows the guest to change TGE behind
> +		 * our back, so we always trap EL1 TLBIs from vEL2...
> +		 */
> +		__kvm_tlb_el1_instr(&vcpu->kvm->arch.mmu, p->regval, sys_encoding);
> +	}
> +
> +	mutex_unlock(&vcpu->kvm->lock);
> +
> +	return true;
> +}
> +
>  /*
>   * AT instruction emulation
>   *
> @@ -2459,12 +2658,40 @@ static struct sys_reg_desc sys_insn_descs[] = {
>  	{ SYS_DESC(SYS_DC_CSW), access_dcsw },
>  	{ SYS_DESC(SYS_DC_CISW), access_dcsw },
>  
> +	SYS_INSN(TLBI_VMALLE1IS, handle_tlbi_el1),
> +	SYS_INSN(TLBI_VAE1IS, handle_tlbi_el1),
> +	SYS_INSN(TLBI_ASIDE1IS, handle_tlbi_el1),
> +	SYS_INSN(TLBI_VAAE1IS, handle_tlbi_el1),
> +	SYS_INSN(TLBI_VALE1IS, handle_tlbi_el1),
> +	SYS_INSN(TLBI_VAALE1IS, handle_tlbi_el1),
> +	SYS_INSN(TLBI_VMALLE1, handle_tlbi_el1),
> +	SYS_INSN(TLBI_VAE1, handle_tlbi_el1),
> +	SYS_INSN(TLBI_ASIDE1, handle_tlbi_el1),
> +	SYS_INSN(TLBI_VAAE1, handle_tlbi_el1),
> +	SYS_INSN(TLBI_VALE1, handle_tlbi_el1),
> +	SYS_INSN(TLBI_VAALE1, handle_tlbi_el1),
> +
>  	SYS_INSN(AT_S1E2R, handle_s1e2),
>  	SYS_INSN(AT_S1E2W, handle_s1e2),
>  	SYS_INSN(AT_S12E1R, handle_s12r),
>  	SYS_INSN(AT_S12E1W, handle_s12w),
>  	SYS_INSN(AT_S12E0R, handle_s12r),
>  	SYS_INSN(AT_S12E0W, handle_s12w),
> +
> +	SYS_INSN(TLBI_IPAS2E1IS, handle_ipas2e1is),
> +	SYS_INSN(TLBI_IPAS2LE1IS, handle_ipas2e1is),
> +	SYS_INSN(TLBI_ALLE2IS, handle_alle2is),
> +	SYS_INSN(TLBI_VAE2IS, handle_vae2is),
> +	SYS_INSN(TLBI_ALLE1IS, handle_alle1is),
> +	SYS_INSN(TLBI_VALE2IS, handle_vae2is),
> +	SYS_INSN(TLBI_VMALLS12E1IS, handle_vmalls12e1is),
> +	SYS_INSN(TLBI_IPAS2E1, handle_ipas2e1is),
> +	SYS_INSN(TLBI_IPAS2LE1, handle_ipas2e1is),
> +	SYS_INSN(TLBI_ALLE2, handle_alle2is),
> +	SYS_INSN(TLBI_VAE2, handle_vae2is),
> +	SYS_INSN(TLBI_ALLE1, handle_alle1is),
> +	SYS_INSN(TLBI_VALE2, handle_vae2is),
> +	SYS_INSN(TLBI_VMALLS12E1, handle_vmalls12e1is),
>  };
>  
>  static bool trap_dbgdidr(struct kvm_vcpu *vcpu,
> -- 
> 2.30.2
> 

^ permalink raw reply	[flat|nested] 378+ messages in thread

* Re: [PATCH v6 41/64] KVM: arm64: nv: Trap and emulate TLBI instructions from virtual EL2
@ 2022-02-24 15:56     ` Alexandru Elisei
  0 siblings, 0 replies; 378+ messages in thread
From: Alexandru Elisei @ 2022-02-24 15:56 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kernel-team, kvm, Andre Przywara, Christoffer Dall,
	Chase Conklin, kvmarm, mihai.carabas, Ganapatrao Kulkarni,
	Russell King (Oracle),
	linux-arm-kernel

Hi,

On Fri, Jan 28, 2022 at 12:18:49PM +0000, Marc Zyngier wrote:
> From: Jintack Lim <jintack.lim@linaro.org>
> 
> When supporting nested virtualization, a guest hypervisor executing TLBI
> instructions must be trapped and emulated by the host hypervisor, because
> the guest hypervisor can only affect physical TLB entries relating to its
> own execution environment (virtual EL2 in EL1), not those of the nested
> guests, as the semantics of the instructions require. In addition, TLBI
> instructions might also result in updates (invalidations) to the shadow
> page tables.
> 
> This patch does several things.

Sounds like a great reason to split this into several patches!

> 
> 1. List and define all TLBI system instructions to emulate.
> 
> 2. Emulate the TLBI ALLE2(IS) instruction executed in the virtual EL2. Since
> we emulate the virtual EL2 in EL1, we invalidate the EL1&0 regime stage 1
> TLB entries by setting vttbr_el2 to the VMID assigned to the virtual EL2.
> 
> 3. Emulate TLBI VAE2* instructions executed in the virtual EL2. Based on the
> same principle as TLBI ALLE2 instruction, we can simply emulate those
> instructions by executing corresponding VAE1* instructions with the
> virtual EL2's VMID assigned by the host hypervisor.
> 
> Note that we are able to emulate TLBI ALLE2IS precisely by only
> invalidating stage 1 TLB entries via the TLBI VMALLE1IS instruction, but to
> make it simple, we reuse the existing function, __kvm_tlb_flush_vmid(),
> which invalidates both stage 1 and stage 2 TLB entries.
> 
> 4. TLBI ALLE1(IS) instruction invalidates all EL1&0 regime stage 1 and 2
> TLB entries (on all PEs in the same Inner Shareable domain). To emulate
> these instructions, we first need to clear all the mappings in the
> shadow page tables since executing those instructions implies the change
> of mappings in the stage 2 page tables maintained by the guest
> hypervisor.  We then need to invalidate all EL1&0 regime stage 1 and 2
> TLB entries of all VMIDs, which are assigned by the host hypervisor, for
> this VM.
> 
> 5. Based on the same principle as TLBI ALLE1(IS) emulation, we clear the
> mappings in the shadow stage-2 page tables and invalidate TLB entries.
> But this time we do it only for the current VMID from the guest
> hypervisor's perspective, not for all VMIDs.
> 
> 6. Based on the same principle as TLBI ALLE1(IS) and TLBI VMALLS12E1(IS)
> emulation, we clear the mappings in the shadow stage-2 page tables and
> invalidate TLB entries. We do it only for one mapping for the current
> VMID from the guest hypervisor's view.
> 
> 7. Forward system instruction traps to the virtual EL2 if a
> corresponding bit in the virtual HCR_EL2 is set.
> 
> 8. Even though a guest hypervisor can execute the TLBI instructions that are
> accessible at EL1 without a trap, doing so is wrong: all those TLBI
> instructions operate on the current VMID, and when running a guest
> hypervisor the current VMID is its own, not the one derived from the virtual
> vttbr_el2. So letting a guest hypervisor execute those TLBI instructions
> results in invalidating its own TLB entries, while the stale entries it
> actually meant to invalidate are left in place.

The points above look like a great starting point for turning this patch
into several.

Thanks,
Alex

> 
> Therefore we trap and emulate those TLBI instructions. The emulation is
> simple; we find a shadow VMID mapped to the virtual vttbr_el2, set it in
> the physical vttbr_el2, then execute the same instruction in EL2.
> 
> We don't set HCR_EL2.TTLB bit yet.
> 
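
Concretely, for the EL1 TLBIs trapped while the guest is at vEL2, the
emulation described above boils down to something like the sketch below
(simplified from handle_tlbi_el1() further down in the patch; the
E2H+TGE case and error handling are left out):

	/* sketch only -- not the literal patch code */
	u64 vttbr = vcpu_read_sys_reg(vcpu, VTTBR_EL2);
	struct kvm_s2_mmu *mmu;

	/* find the shadow S2 context matching the guest's vttbr_el2 */
	mmu = lookup_s2_mmu(vcpu->kvm, vttbr, HCR_VM);
	if (mmu)
		/*
		 * switch the HW to the shadow VMID, replay the trapped
		 * TLBI as its Inner Shareable variant, switch back
		 */
		__kvm_tlb_el1_instr(mmu, p->regval, sys_encoding);
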
>   [ Changes performed by Marc Zyngier:
> 
>     The TLBI handling code more or less directly executes the same
>     instruction that has been trapped (with an EL2->EL1 conversion
>     in the case of an EL2 TLBI), but that's unfortunately not enough:
> 
>     - TLBIs must be upgraded to the Inner Shareable domain to account
>       for vcpu migration, just like we already have with HCR_EL2.FB.
> 
>     - The DSB instruction that synchronises these must thus be on
>       the Inner Shareable domain as well.
> 
>     - Prior to executing the TLBI, we need another DSB ISHST to make
>       sure that the update to the page tables is now visible.
> 
>       Ordering of system instructions fixed
> 
>     - The current TLB invalidation code is pretty buggy, as it assumes a
>       page mapping. On the contrary, it is likely that TLB invalidation
>       will cover more than a single page, and the size should be decided
>       by the guest's configuration (and not the host's).
> 
>       Since we don't cache the guest mapping sizes in the shadow PT yet,
>       let's assume the worst case (a block mapping) and invalidate that
>       (the per-granule sizes are worked out in the note after this
>       changelog).
> 
>       Take this opportunity to fix the decoding of the parameter (it
>       isn't a straight IPA).
> 
>     - In general, we always emulate local TLB invalidations as being
>       upgraded to the Inner Shareable domain so that we can easily
>       deal with vcpu migration. This is consistent with the fact that
>       we set HCR_EL2.FB when running non-nested VMs.
> 
>       So let's emulate TLBI ALLE2 as ALLE2IS.
>   ]
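
For reference, the worst-case sizes in handle_ipas2e1is() below come from
the largest stage-2 block mapping each granule allows: a 4kB granule has a
level-1 block of 2^30 bytes (1GB), a 16kB granule has a level-2 block of
2^(14+11) bytes (32MB), and a 64kB granule has a level-2 block of
2^(16+13) bytes (512MB); with 52bit PAs a 64kB level-1 block would cover
4TB, hence the comment about that case. The address being invalidated is
bits [35:0] of Xt shifted up by 12, for instance (made-up value, just to
show the decoding):

	/* TLBI IPAS2E1IS with Xt == 0x87654321 */
	base_addr = (0x87654321ULL & GENMASK_ULL(35, 0)) << 12; /* 0x87654321000 */
	/* with VTCR_EL2.TG0 == 4K, unmap [base_addr, base_addr + SZ_1G) */
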
> 
>   [ Changes performed by Christoffer Dall:
> 
>     Sometimes when we are invalidating the TLB for a certain S2 MMU
>     context, this context can also have EL2 context associated with it
>     and we have to invalidate this too.
>   ]
> 
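
Related to this, a pattern that repeats in the sys_regs.c handlers below is
that the shadow context is looked up twice for the same virtual VTTBR,
apparently keyed on the vHCR_EL2.VM bit, so that both flavours get
invalidated (copied from handle_vmalls12e1is() for illustration):

	mmu = lookup_s2_mmu(vcpu->kvm, vttbr, HCR_VM);
	if (mmu)
		kvm_unmap_stage2_range(mmu, 0, kvm_phys_size(vcpu->kvm));

	mmu = lookup_s2_mmu(vcpu->kvm, vttbr, 0);
	if (mmu)
		kvm_unmap_stage2_range(mmu, 0, kvm_phys_size(vcpu->kvm));
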
> Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/include/asm/kvm_asm.h |   2 +
>  arch/arm64/include/asm/sysreg.h  |  36 +++++
>  arch/arm64/kvm/hyp/vhe/switch.c  |   8 +-
>  arch/arm64/kvm/hyp/vhe/tlb.c     |  81 +++++++++++
>  arch/arm64/kvm/mmu.c             |  19 ++-
>  arch/arm64/kvm/sys_regs.c        | 227 +++++++++++++++++++++++++++++++
>  6 files changed, 368 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
> index e22861ece3c3..64ddbecab241 100644
> --- a/arch/arm64/include/asm/kvm_asm.h
> +++ b/arch/arm64/include/asm/kvm_asm.h
> @@ -206,6 +206,8 @@ extern void __kvm_flush_cpu_context(struct kvm_s2_mmu *mmu);
>  extern void __kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu, phys_addr_t ipa,
>  				     int level);
>  extern void __kvm_tlb_flush_vmid(struct kvm_s2_mmu *mmu);
> +extern void __kvm_tlb_vae2is(struct kvm_s2_mmu *mmu, u64 va, u64 sys_encoding);
> +extern void __kvm_tlb_el1_instr(struct kvm_s2_mmu *mmu, u64 val, u64 sys_encoding);
>  
>  extern void __kvm_timer_set_cntvoff(u64 cntvoff);
>  extern void __kvm_at_s1e01(struct kvm_vcpu *vcpu, u32 op, u64 vaddr);
> diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
> index ea17d1eabfc5..41c3603a3f29 100644
> --- a/arch/arm64/include/asm/sysreg.h
> +++ b/arch/arm64/include/asm/sysreg.h
> @@ -675,6 +675,42 @@
>  #define OP_AT_S12E0R	sys_insn(AT_Op0, 4, AT_CRn, 8, 6)
>  #define OP_AT_S12E0W	sys_insn(AT_Op0, 4, AT_CRn, 8, 7)
>  
> +/* TLBI instructions */
> +#define TLBI_Op0	1
> +#define TLBI_Op1_EL1	0	/* Accessible from EL1 or higher */
> +#define TLBI_Op1_EL2	4	/* Accessible from EL2 or higher */
> +#define TLBI_CRn	8
> +#define tlbi_insn_el1(CRm, Op2)	sys_insn(TLBI_Op0, TLBI_Op1_EL1, TLBI_CRn, (CRm), (Op2))
> +#define tlbi_insn_el2(CRm, Op2)	sys_insn(TLBI_Op0, TLBI_Op1_EL2, TLBI_CRn, (CRm), (Op2))
> +
> +#define OP_TLBI_VMALLE1IS	tlbi_insn_el1(3, 0)
> +#define OP_TLBI_VAE1IS		tlbi_insn_el1(3, 1)
> +#define OP_TLBI_ASIDE1IS	tlbi_insn_el1(3, 2)
> +#define OP_TLBI_VAAE1IS		tlbi_insn_el1(3, 3)
> +#define OP_TLBI_VALE1IS		tlbi_insn_el1(3, 5)
> +#define OP_TLBI_VAALE1IS	tlbi_insn_el1(3, 7)
> +#define OP_TLBI_VMALLE1		tlbi_insn_el1(7, 0)
> +#define OP_TLBI_VAE1		tlbi_insn_el1(7, 1)
> +#define OP_TLBI_ASIDE1		tlbi_insn_el1(7, 2)
> +#define OP_TLBI_VAAE1		tlbi_insn_el1(7, 3)
> +#define OP_TLBI_VALE1		tlbi_insn_el1(7, 5)
> +#define OP_TLBI_VAALE1		tlbi_insn_el1(7, 7)
> +
> +#define OP_TLBI_IPAS2E1IS	tlbi_insn_el2(0, 1)
> +#define OP_TLBI_IPAS2LE1IS	tlbi_insn_el2(0, 5)
> +#define OP_TLBI_ALLE2IS		tlbi_insn_el2(3, 0)
> +#define OP_TLBI_VAE2IS		tlbi_insn_el2(3, 1)
> +#define OP_TLBI_ALLE1IS		tlbi_insn_el2(3, 4)
> +#define OP_TLBI_VALE2IS		tlbi_insn_el2(3, 5)
> +#define OP_TLBI_VMALLS12E1IS	tlbi_insn_el2(3, 6)
> +#define OP_TLBI_IPAS2E1		tlbi_insn_el2(4, 1)
> +#define OP_TLBI_IPAS2LE1	tlbi_insn_el2(4, 5)
> +#define OP_TLBI_ALLE2		tlbi_insn_el2(7, 0)
> +#define OP_TLBI_VAE2		tlbi_insn_el2(7, 1)
> +#define OP_TLBI_ALLE1		tlbi_insn_el2(7, 4)
> +#define OP_TLBI_VALE2		tlbi_insn_el2(7, 5)
> +#define OP_TLBI_VMALLS12E1	tlbi_insn_el2(7, 6)
> +
>  /* Common SCTLR_ELx flags. */
>  #define SCTLR_ELx_DSSBS	(BIT(44))
>  #define SCTLR_ELx_ATA	(BIT(43))
> diff --git a/arch/arm64/kvm/hyp/vhe/switch.c b/arch/arm64/kvm/hyp/vhe/switch.c
> index b7790d3c4122..0e164cc8e913 100644
> --- a/arch/arm64/kvm/hyp/vhe/switch.c
> +++ b/arch/arm64/kvm/hyp/vhe/switch.c
> @@ -44,7 +44,7 @@ static void __activate_traps(struct kvm_vcpu *vcpu)
>  			 * the EL1 virtual memory control register accesses
>  			 * as well as the AT S1 operations.
>  			 */
> -			hcr |= HCR_TVM | HCR_TRVM | HCR_AT | HCR_NV1;
> +			hcr |= HCR_TVM | HCR_TRVM | HCR_AT | HCR_TTLB | HCR_NV1;
>  		} else {
>  			/*
>  			 * For a guest hypervisor on v8.1 (VHE), allow to
> @@ -72,11 +72,11 @@ static void __activate_traps(struct kvm_vcpu *vcpu)
>  
>  			/*
>  			 * If we're using the EL1 translation regime
> -			 * (TGE clear), then ensure that AT S1 ops are
> -			 * trapped too.
> +			 * (TGE clear), then ensure that AT S1 and
> +			 * TLBI E1 ops are trapped too.
>  			 */
>  			if (!vcpu_el2_tge_is_set(vcpu))
> -				hcr |= HCR_AT;
> +				hcr |= HCR_AT | HCR_TTLB;
>  		}
>  	}
>  
> diff --git a/arch/arm64/kvm/hyp/vhe/tlb.c b/arch/arm64/kvm/hyp/vhe/tlb.c
> index 24cef9b87f9e..c4389db4cc22 100644
> --- a/arch/arm64/kvm/hyp/vhe/tlb.c
> +++ b/arch/arm64/kvm/hyp/vhe/tlb.c
> @@ -161,3 +161,84 @@ void __kvm_flush_vm_context(void)
>  
>  	dsb(ish);
>  }
> +
> +void __kvm_tlb_vae2is(struct kvm_s2_mmu *mmu, u64 va, u64 sys_encoding)
> +{
> +	struct tlb_inv_context cxt;
> +
> +	dsb(ishst);
> +
> +	/* Switch to requested VMID */
> +	__tlb_switch_to_guest(mmu, &cxt);
> +
> +	/*
> +	 * Execute the EL1 version of TLBI VAE2* instruction, forcing
> +	 * an upgrade to the Inner Shareable domain in order to
> +	 * perform the invalidation on all CPUs.
> +	 */
> +	switch (sys_encoding) {
> +	case OP_TLBI_VAE2:
> +	case OP_TLBI_VAE2IS:
> +		__tlbi(vae1is, va);
> +		break;
> +	case OP_TLBI_VALE2:
> +	case OP_TLBI_VALE2IS:
> +		__tlbi(vale1is, va);
> +		break;
> +	default:
> +		break;
> +	}
> +	dsb(ish);
> +	isb();
> +
> +	__tlb_switch_to_host(&cxt);
> +}
> +
> +void __kvm_tlb_el1_instr(struct kvm_s2_mmu *mmu, u64 val, u64 sys_encoding)
> +{
> +	struct tlb_inv_context cxt;
> +
> +	dsb(ishst);
> +
> +	/* Switch to requested VMID */
> +	__tlb_switch_to_guest(mmu, &cxt);
> +
> +	/*
> +	 * Execute the same instruction as the guest hypervisor did,
> +	 * expanding the scope of local TLB invalidations to the Inner
> +	 * Shareable domain so that it takes place on all CPUs. This
> +	 * is equivalent to having HCR_EL2.FB set.
> +	 */
> +	switch (sys_encoding) {
> +	case OP_TLBI_VMALLE1:
> +	case OP_TLBI_VMALLE1IS:
> +		__tlbi(vmalle1is);
> +		break;
> +	case OP_TLBI_VAE1:
> +	case OP_TLBI_VAE1IS:
> +		__tlbi(vae1is, val);
> +		break;
> +	case OP_TLBI_ASIDE1:
> +	case OP_TLBI_ASIDE1IS:
> +		__tlbi(aside1is, val);
> +		break;
> +	case OP_TLBI_VAAE1:
> +	case OP_TLBI_VAAE1IS:
> +		__tlbi(vaae1is, val);
> +		break;
> +	case OP_TLBI_VALE1:
> +	case OP_TLBI_VALE1IS:
> +		__tlbi(vale1is, val);
> +		break;
> +	case OP_TLBI_VAALE1:
> +	case OP_TLBI_VAALE1IS:
> +		__tlbi(vaale1is, val);
> +		break;
> +	default:
> +		break;
> +	}
> +	dsb(ish);
> +	isb();
> +
> +	__tlb_switch_to_host(&cxt);
> +}
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index b9b11be65009..4b38f2f3c997 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -80,8 +80,25 @@ static bool memslot_is_logging(struct kvm_memory_slot *memslot)
>   */
>  void kvm_flush_remote_tlbs(struct kvm *kvm)
>  {
> +	struct kvm_s2_mmu *mmu = &kvm->arch.mmu;
> +
>  	++kvm->stat.generic.remote_tlb_flush_requests;
> -	kvm_call_hyp(__kvm_tlb_flush_vmid, &kvm->arch.mmu);
> +
> +	if (mmu == &kvm->arch.mmu) {
> +		/*
> +		 * For a normal (i.e. non-nested) guest, flush entries for the
> +		 * given VMID.
> +		 */
> +		kvm_call_hyp(__kvm_tlb_flush_vmid, mmu);
> +	} else {
> +		/*
> +		 * When supporting nested virtualization, we can have multiple
> +		 * VMIDs in play for each VCPU in the VM, so it's really not
> +		 * worth it to try to quiesce the system and flush all the
> +		 * VMIDs that may be in use, instead just nuke the whole thing.
> +		 */
> +		kvm_call_hyp(__kvm_flush_vm_context);
> +	}
>  }
>  
>  static bool kvm_is_device_pfn(unsigned long pfn)
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 7be57e1b7019..d7441b8ba406 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -2374,6 +2374,205 @@ static bool handle_s12w(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
>  	return handle_s12(vcpu, p, r, true);
>  }
>  
> +static bool handle_alle2is(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
> +			   const struct sys_reg_desc *r)
> +{
> +	if (vcpu_has_nv(vcpu) && forward_nv_traps(vcpu))
> +		return false;
> +
> +	/*
> +	 * To emulate invalidating all EL2 regime stage 1 TLB entries for all
> +	 * PEs, executing TLBI VMALLE1IS is enough. But we reuse the existing
> +	 * interface for simplicity; invalidating stage 2 entries doesn't
> +	 * affect the correctness.
> +	 */
> +	__kvm_tlb_flush_vmid(&vcpu->kvm->arch.mmu);
> +	return true;
> +}
> +
> +static bool handle_vae2is(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
> +			  const struct sys_reg_desc *r)
> +{
> +	int sys_encoding = sys_insn(p->Op0, p->Op1, p->CRn, p->CRm, p->Op2);
> +
> +	if (vcpu_has_nv(vcpu) && forward_nv_traps(vcpu))
> +		return false;
> +
> +	/*
> +	 * Based on the same principle as TLBI ALLE2 instruction
> +	 * emulation, we emulate TLBI VAE2* instructions by executing
> +	 * corresponding TLBI VAE1* instructions with the virtual
> +	 * EL2's VMID assigned by the host hypervisor.
> +	 */
> +	__kvm_tlb_vae2is(&vcpu->kvm->arch.mmu, p->regval, sys_encoding);
> +	return true;
> +}
> +
> +static bool handle_alle1is(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
> +			   const struct sys_reg_desc *r)
> +{
> +	struct kvm_s2_mmu *mmu = &vcpu->kvm->arch.mmu;
> +
> +	if (vcpu_has_nv(vcpu) && forward_nv_traps(vcpu))
> +		return false;
> +
> +	spin_lock(&vcpu->kvm->mmu_lock);
> +
> +	/*
> +	 * Clear all mappings in the shadow page tables and invalidate the stage
> +	 * 1 and 2 TLB entries via kvm_tlb_flush_vmid_ipa().
> +	 */
> +	kvm_nested_s2_clear(vcpu->kvm);
> +
> +	if (mmu->vmid.vmid_gen) {
> +		/*
> +		 * Invalidate the stage 1 and 2 TLB entries for the host OS
> +		 * in a VM only if there is one.
> +		 */
> +		__kvm_tlb_flush_vmid(mmu);
> +	}
> +
> +	spin_unlock(&vcpu->kvm->mmu_lock);
> +
> +	return true;
> +}
> +
> +static bool handle_vmalls12e1is(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
> +				const struct sys_reg_desc *r)
> +{
> +	struct kvm_s2_mmu *mmu;
> +	u64 vttbr;
> +
> +	if (vcpu_has_nv(vcpu) && forward_nv_traps(vcpu))
> +		return false;
> +
> +	vttbr = vcpu_read_sys_reg(vcpu, VTTBR_EL2);
> +	spin_lock(&vcpu->kvm->mmu_lock);
> +
> +	mmu = lookup_s2_mmu(vcpu->kvm, vttbr, HCR_VM);
> +	if (mmu)
> +		kvm_unmap_stage2_range(mmu, 0, kvm_phys_size(vcpu->kvm));
> +
> +	mmu = lookup_s2_mmu(vcpu->kvm, vttbr, 0);
> +	if (mmu)
> +		kvm_unmap_stage2_range(mmu, 0, kvm_phys_size(vcpu->kvm));
> +
> +	spin_unlock(&vcpu->kvm->mmu_lock);
> +
> +	return true;
> +}
> +
> +static bool handle_ipas2e1is(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
> +			     const struct sys_reg_desc *r)
> +{
> +	u64 vttbr = vcpu_read_sys_reg(vcpu, VTTBR_EL2);
> +	u64 vtcr = vcpu_read_sys_reg(vcpu, VTCR_EL2);
> +	struct kvm_s2_mmu *mmu;
> +	u64 base_addr;
> +	int max_size;
> +
> +	if (vcpu_has_nv(vcpu) && forward_nv_traps(vcpu))
> +		return false;
> +
> +	/*
> +	 * We drop a number of things from the supplied value:
> +	 *
> +	 * - NS bit: we're non-secure only.
> +	 *
> +	 * - TTL field: We already have the granule size from the
> +	 *   VTCR_EL2.TG0 field, and the level is only relevant to the
> +	 *   guest's S2PT.
> +	 *
> +	 * - IPA[51:48]: We don't support 52bit IPA just yet...
> +	 *
> +	 * And of course, adjust the IPA to be on an actual address.
> +	 */
> +	base_addr = (p->regval & GENMASK_ULL(35, 0)) << 12;
> +
> +	/* Compute the maximum extent of the invalidation */
> +	switch ((vtcr & VTCR_EL2_TG0_MASK)) {
> +	case VTCR_EL2_TG0_4K:
> +		max_size = SZ_1G;
> +		break;
> +	case VTCR_EL2_TG0_16K:
> +		max_size = SZ_32M;
> +		break;
> +	case VTCR_EL2_TG0_64K:
> +		/*
> +		 * No, we do not support 52bit IPA in nested yet. Once
> +		 * we do, this should be 4TB.
> +		 */
> +		/* FIXME: remove the 52bit PA support from the IDregs */
> +		max_size = SZ_512M;
> +		break;
> +	default:
> +		BUG();
> +	}
> +
> +	spin_lock(&vcpu->kvm->mmu_lock);
> +
> +	mmu = lookup_s2_mmu(vcpu->kvm, vttbr, HCR_VM);
> +	if (mmu)
> +		kvm_unmap_stage2_range(mmu, base_addr, max_size);
> +
> +	mmu = lookup_s2_mmu(vcpu->kvm, vttbr, 0);
> +	if (mmu)
> +		kvm_unmap_stage2_range(mmu, base_addr, max_size);
> +
> +	spin_unlock(&vcpu->kvm->mmu_lock);
> +
> +	return true;
> +}
> +
> +static bool handle_tlbi_el1(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
> +			    const struct sys_reg_desc *r)
> +{
> +	u32 sys_encoding = sys_insn(p->Op0, p->Op1, p->CRn, p->CRm, p->Op2);
> +
> +	if (vcpu_has_nv(vcpu) && forward_traps(vcpu, HCR_TTLB))
> +		return false;
> +
> +	/*
> +	 * If we're here, this is because we've trapped on an EL1 TLBI
> +	 * instruction that affects the EL1 translation regime while
> +	 * we're running in a context that doesn't allow us to let the
> +	 * HW do its thing (aka vEL2):
> +	 *
> +	 * - HCR_EL2.E2H == 0 : a non-VHE guest
> +	 * - HCR_EL2.{E2H,TGE} == { 1, 0 } : a VHE guest in guest mode
> +	 *
> +	 * We don't expect these helpers to ever be called when running
> +	 * in a vEL1 context.
> +	 */
> +
> +	WARN_ON(!vcpu_is_el2(vcpu));
> +
> +	mutex_lock(&vcpu->kvm->lock);
> +
> +	if ((__vcpu_sys_reg(vcpu, HCR_EL2) & (HCR_E2H | HCR_TGE)) != (HCR_E2H | HCR_TGE)) {
> +		u64 virtual_vttbr = vcpu_read_sys_reg(vcpu, VTTBR_EL2);
> +		struct kvm_s2_mmu *mmu;
> +
> +		mmu = lookup_s2_mmu(vcpu->kvm, virtual_vttbr, HCR_VM);
> +		if (mmu)
> +			__kvm_tlb_el1_instr(mmu, p->regval, sys_encoding);
> +
> +		mmu = lookup_s2_mmu(vcpu->kvm, virtual_vttbr, 0);
> +		if (mmu)
> +			__kvm_tlb_el1_instr(mmu, p->regval, sys_encoding);
> +	} else {
> +		/*
> +		 * ARMv8.4-NV allows the guest to change TGE behind
> +		 * our back, so we always trap EL1 TLBIs from vEL2...
> +		 */
> +		__kvm_tlb_el1_instr(&vcpu->kvm->arch.mmu, p->regval, sys_encoding);
> +	}
> +
> +	mutex_unlock(&vcpu->kvm->lock);
> +
> +	return true;
> +}
> +
>  /*
>   * AT instruction emulation
>   *
> @@ -2459,12 +2658,40 @@ static struct sys_reg_desc sys_insn_descs[] = {
>  	{ SYS_DESC(SYS_DC_CSW), access_dcsw },
>  	{ SYS_DESC(SYS_DC_CISW), access_dcsw },
>  
> +	SYS_INSN(TLBI_VMALLE1IS, handle_tlbi_el1),
> +	SYS_INSN(TLBI_VAE1IS, handle_tlbi_el1),
> +	SYS_INSN(TLBI_ASIDE1IS, handle_tlbi_el1),
> +	SYS_INSN(TLBI_VAAE1IS, handle_tlbi_el1),
> +	SYS_INSN(TLBI_VALE1IS, handle_tlbi_el1),
> +	SYS_INSN(TLBI_VAALE1IS, handle_tlbi_el1),
> +	SYS_INSN(TLBI_VMALLE1, handle_tlbi_el1),
> +	SYS_INSN(TLBI_VAE1, handle_tlbi_el1),
> +	SYS_INSN(TLBI_ASIDE1, handle_tlbi_el1),
> +	SYS_INSN(TLBI_VAAE1, handle_tlbi_el1),
> +	SYS_INSN(TLBI_VALE1, handle_tlbi_el1),
> +	SYS_INSN(TLBI_VAALE1, handle_tlbi_el1),
> +
>  	SYS_INSN(AT_S1E2R, handle_s1e2),
>  	SYS_INSN(AT_S1E2W, handle_s1e2),
>  	SYS_INSN(AT_S12E1R, handle_s12r),
>  	SYS_INSN(AT_S12E1W, handle_s12w),
>  	SYS_INSN(AT_S12E0R, handle_s12r),
>  	SYS_INSN(AT_S12E0W, handle_s12w),
> +
> +	SYS_INSN(TLBI_IPAS2E1IS, handle_ipas2e1is),
> +	SYS_INSN(TLBI_IPAS2LE1IS, handle_ipas2e1is),
> +	SYS_INSN(TLBI_ALLE2IS, handle_alle2is),
> +	SYS_INSN(TLBI_VAE2IS, handle_vae2is),
> +	SYS_INSN(TLBI_ALLE1IS, handle_alle1is),
> +	SYS_INSN(TLBI_VALE2IS, handle_vae2is),
> +	SYS_INSN(TLBI_VMALLS12E1IS, handle_vmalls12e1is),
> +	SYS_INSN(TLBI_IPAS2E1, handle_ipas2e1is),
> +	SYS_INSN(TLBI_IPAS2LE1, handle_ipas2e1is),
> +	SYS_INSN(TLBI_ALLE2, handle_alle2is),
> +	SYS_INSN(TLBI_VAE2, handle_vae2is),
> +	SYS_INSN(TLBI_ALLE1, handle_alle1is),
> +	SYS_INSN(TLBI_VALE2, handle_vae2is),
> +	SYS_INSN(TLBI_VMALLS12E1, handle_vmalls12e1is),
>  };
>  
>  static bool trap_dbgdidr(struct kvm_vcpu *vcpu,
> -- 
> 2.30.2
> 
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 378+ messages in thread

* Re: [PATCH v6 42/64] KVM: arm64: nv: Fold guest's HCR_EL2 configuration into the host's
  2022-01-28 12:18   ` Marc Zyngier
@ 2022-02-25 13:45     ` Alexandru Elisei
  -1 siblings, 0 replies; 378+ messages in thread
From: Alexandru Elisei @ 2022-02-25 13:45 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, Andre Przywara, Christoffer Dall,
	Jintack Lim, Haibo Xu, Ganapatrao Kulkarni, Chase Conklin,
	Russell King (Oracle),
	James Morse, Suzuki K Poulose, karl.heubaum, mihai.carabas,
	miguel.luis, kernel-team

Hi,

On Fri, Jan 28, 2022 at 12:18:50PM +0000, Marc Zyngier wrote:
> When entering a L2 guest (nested virt enabled, but not in hypervisor
> context), we need to honor the traps the L1 guest has asked enabled.
> 
> For now, just OR the guest's HCR_EL2 into the host's. We may have to do
> some filtering in the future though.

Hmm... looks to me like the filtering is already implemented via the
HCR_GUEST_NV_FILTER_FLAGS. Or am I misunderstanding something?

> 
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/kvm/hyp/vhe/switch.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/arch/arm64/kvm/hyp/vhe/switch.c b/arch/arm64/kvm/hyp/vhe/switch.c
> index 0e164cc8e913..5e8eafac27c6 100644
> --- a/arch/arm64/kvm/hyp/vhe/switch.c
> +++ b/arch/arm64/kvm/hyp/vhe/switch.c
> @@ -78,6 +78,11 @@ static void __activate_traps(struct kvm_vcpu *vcpu)
>  			if (!vcpu_el2_tge_is_set(vcpu))
>  				hcr |= HCR_AT | HCR_TTLB;
>  		}
> +	} else if (vcpu_has_nv(vcpu)) {
> +		u64 vhcr_el2 = __vcpu_sys_reg(vcpu, HCR_EL2);
> +
> +		vhcr_el2 &= ~HCR_GUEST_NV_FILTER_FLAGS;


> +		hcr |= vhcr_el2;

This makes sense, we only want the guest to add extra traps on top of what
KVM already traps, not to remove traps from what KVM has configured.

However, HCR_EL2.FIEN (bit 47) disables traps when the bit is 1. Shouldn't
it be part of the HCR_GUEST_NV_FILTER_FLAGS?
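
Concretely, the suggestion amounts to something like this (illustrative
only, with FIEN written as a raw bit in case there is no #define for it):

	u64 vhcr_el2 = __vcpu_sys_reg(vcpu, HCR_EL2);

	/*
	 * FIEN == 1 *disables* a trap, so OR-ing it in would let the L1
	 * guest remove a trap rather than add one; filter it out too.
	 */
	vhcr_el2 &= ~(HCR_GUEST_NV_FILTER_FLAGS | BIT(47) /* FIEN */);
	hcr |= vhcr_el2;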

Thanks,
Alex

>  	}
>  
>  	___activate_traps(vcpu, hcr);
> -- 
> 2.30.2
> 

^ permalink raw reply	[flat|nested] 378+ messages in thread

* Re: [PATCH v6 43/64] KVM: arm64: nv: arch_timer: Support hyp timer emulation
  2022-01-28 12:18   ` Marc Zyngier
@ 2022-03-07 14:52     ` Alexandru Elisei
  -1 siblings, 0 replies; 378+ messages in thread
From: Alexandru Elisei @ 2022-03-07 14:52 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, Andre Przywara, Christoffer Dall,
	Jintack Lim, Haibo Xu, Ganapatrao Kulkarni, Chase Conklin,
	Russell King (Oracle),
	James Morse, Suzuki K Poulose, karl.heubaum, mihai.carabas,
	miguel.luis, kernel-team

Hi,

I was under the impression that KVM's nested virtualization doesn't support
AArch32 in the guest, so why is the subject about hyp mode aarch32 timers?

On Fri, Jan 28, 2022 at 12:18:51PM +0000, Marc Zyngier wrote:
> From: Christoffer Dall <christoffer.dall@arm.com>
> 
> Emulating EL2 also means emulating the EL2 timers. To do so, we expand
> our timer framework to deal with at most 4 timers. At any given time,
> two timers are using the HW timers, and the two others are purely
> emulated.
> 
> The role of deciding which is which at any given time is left to a
> mapping function which is called every time we need to make such a
> decision.
> 
> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
> [maz: added CNTVOFF support, general reworking for v4.8]
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/include/asm/kvm_host.h |   4 +
>  arch/arm64/kvm/arch_timer.c       | 165 ++++++++++++++++++++++++++++--
>  arch/arm64/kvm/sys_regs.c         |   7 +-
>  arch/arm64/kvm/trace_arm.h        |   6 +-
>  arch/arm64/kvm/vgic/vgic.c        |  15 +++
>  include/kvm/arm_arch_timer.h      |   8 +-
>  include/kvm/arm_vgic.h            |   1 +
>  7 files changed, 194 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 0b887364f994..03833eca3307 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -285,6 +285,10 @@ enum vcpu_sysreg {
>  	TPIDR_EL2,	/* EL2 Software Thread ID Register */
>  	CNTHCTL_EL2,	/* Counter-timer Hypervisor Control register */
>  	SP_EL2,		/* EL2 Stack Pointer */
> +	CNTHP_CTL_EL2,
> +	CNTHP_CVAL_EL2,
> +	CNTHV_CTL_EL2,
> +	CNTHV_CVAL_EL2,
>  
>  	NR_SYS_REGS	/* Nothing after this line! */
>  };
> diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
> index 6e542e2eae32..5e4f93605d36 100644
> --- a/arch/arm64/kvm/arch_timer.c
> +++ b/arch/arm64/kvm/arch_timer.c
> @@ -16,6 +16,7 @@
>  #include <asm/arch_timer.h>
>  #include <asm/kvm_emulate.h>
>  #include <asm/kvm_hyp.h>
> +#include <asm/kvm_nested.h>
>  
>  #include <kvm/arm_vgic.h>
>  #include <kvm/arm_arch_timer.h>
> @@ -40,6 +41,16 @@ static const struct kvm_irq_level default_vtimer_irq = {
>  	.level	= 1,
>  };
>  
> +static const struct kvm_irq_level default_hptimer_irq = {
> +	.irq	= 26,
> +	.level	= 1,
> +};
> +
> +static const struct kvm_irq_level default_hvtimer_irq = {
> +	.irq	= 28,
> +	.level	= 1,
> +};
> +
>  static bool kvm_timer_irq_can_fire(struct arch_timer_context *timer_ctx);
>  static void kvm_timer_update_irq(struct kvm_vcpu *vcpu, bool new_level,
>  				 struct arch_timer_context *timer_ctx);
> @@ -51,6 +62,11 @@ static void kvm_arm_timer_write(struct kvm_vcpu *vcpu,
>  static u64 kvm_arm_timer_read(struct kvm_vcpu *vcpu,
>  			      struct arch_timer_context *timer,
>  			      enum kvm_arch_timer_regs treg);
> +static bool kvm_arch_timer_get_input_level(int vintid);
> +
> +static struct irq_ops arch_timer_irq_ops = {
> +	.get_input_level = kvm_arch_timer_get_input_level,
> +};
>  
>  u32 timer_get_ctl(struct arch_timer_context *ctxt)
>  {
> @@ -61,6 +77,10 @@ u32 timer_get_ctl(struct arch_timer_context *ctxt)
>  		return __vcpu_sys_reg(vcpu, CNTV_CTL_EL0);
>  	case TIMER_PTIMER:
>  		return __vcpu_sys_reg(vcpu, CNTP_CTL_EL0);
> +	case TIMER_HVTIMER:
> +		return __vcpu_sys_reg(vcpu, CNTHV_CTL_EL2);
> +	case TIMER_HPTIMER:
> +		return __vcpu_sys_reg(vcpu, CNTHP_CTL_EL2);
>  	default:
>  		WARN_ON(1);
>  		return 0;
> @@ -76,6 +96,10 @@ u64 timer_get_cval(struct arch_timer_context *ctxt)
>  		return __vcpu_sys_reg(vcpu, CNTV_CVAL_EL0);
>  	case TIMER_PTIMER:
>  		return __vcpu_sys_reg(vcpu, CNTP_CVAL_EL0);
> +	case TIMER_HVTIMER:
> +		return __vcpu_sys_reg(vcpu, CNTHV_CVAL_EL2);
> +	case TIMER_HPTIMER:
> +		return __vcpu_sys_reg(vcpu, CNTHP_CVAL_EL2);
>  	default:
>  		WARN_ON(1);
>  		return 0;
> @@ -105,6 +129,12 @@ static void timer_set_ctl(struct arch_timer_context *ctxt, u32 ctl)
>  	case TIMER_PTIMER:
>  		__vcpu_sys_reg(vcpu, CNTP_CTL_EL0) = ctl;
>  		break;
> +	case TIMER_HVTIMER:
> +		__vcpu_sys_reg(vcpu, CNTHV_CTL_EL2) = ctl;
> +		break;
> +	case TIMER_HPTIMER:
> +		__vcpu_sys_reg(vcpu, CNTHP_CTL_EL2) = ctl;
> +		break;
>  	default:
>  		WARN_ON(1);
>  	}
> @@ -121,6 +151,12 @@ static void timer_set_cval(struct arch_timer_context *ctxt, u64 cval)
>  	case TIMER_PTIMER:
>  		__vcpu_sys_reg(vcpu, CNTP_CVAL_EL0) = cval;
>  		break;
> +	case TIMER_HVTIMER:
> +		__vcpu_sys_reg(vcpu, CNTHV_CVAL_EL2) = cval;
> +		break;
> +	case TIMER_HPTIMER:
> +		__vcpu_sys_reg(vcpu, CNTHP_CVAL_EL2) = cval;
> +		break;
>  	default:
>  		WARN_ON(1);
>  	}
> @@ -146,13 +182,27 @@ u64 kvm_phys_timer_read(void)
>  
>  static void get_timer_map(struct kvm_vcpu *vcpu, struct timer_map *map)
>  {
> -	if (has_vhe()) {
> +	if (vcpu_has_nv(vcpu)) {
> +		if (is_hyp_ctxt(vcpu)) {
> +			map->direct_vtimer = vcpu_hvtimer(vcpu);
> +			map->direct_ptimer = vcpu_hptimer(vcpu);
> +			map->emul_vtimer = vcpu_vtimer(vcpu);
> +			map->emul_ptimer = vcpu_ptimer(vcpu);
> +		} else {
> +			map->direct_vtimer = vcpu_vtimer(vcpu);
> +			map->direct_ptimer = vcpu_ptimer(vcpu);
> +			map->emul_vtimer = vcpu_hvtimer(vcpu);
> +			map->emul_ptimer = vcpu_hptimer(vcpu);
> +		}

It would be nice to explain why when the guest is running in virtual EL2, the
hp/hv timers are passthrough, while when the guest is in virtual EL1, the
virtual and physical timers are passthrough. I suppose it's because at EL2, the
guest is expected to use the hp/hv timers, and at EL1 the virtual and physical
timers, right?
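
For what it's worth, the mapping that get_timer_map() ends up with is:

	guest context       HW-backed (direct)          emulated
	vEL2 (hyp ctxt)     CNTHV_EL2 / CNTHP_EL2       CNTV_EL0 / CNTP_EL0
	vEL1 / vEL0         CNTV_EL0 / CNTP_EL0         CNTHV_EL2 / CNTHP_EL2

which matches the supposition above: the pair the guest is expected to
program at its current exception level is backed by the hardware timers,
while the other pair only has to fire eventually and can presumably live
on a software timer.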

> +	} else if (has_vhe()) {
>  		map->direct_vtimer = vcpu_vtimer(vcpu);
>  		map->direct_ptimer = vcpu_ptimer(vcpu);
> +		map->emul_vtimer = NULL;
>  		map->emul_ptimer = NULL;
>  	} else {
>  		map->direct_vtimer = vcpu_vtimer(vcpu);
>  		map->direct_ptimer = NULL;
> +		map->emul_vtimer = NULL;
>  		map->emul_ptimer = vcpu_ptimer(vcpu);
>  	}
>  
> @@ -325,9 +375,11 @@ static bool kvm_timer_should_fire(struct arch_timer_context *timer_ctx)
>  
>  		switch (index) {
>  		case TIMER_VTIMER:
> +		case TIMER_HVTIMER:
>  			cnt_ctl = read_sysreg_el0(SYS_CNTV_CTL);
>  			break;
>  		case TIMER_PTIMER:
> +		case TIMER_HPTIMER:
>  			cnt_ctl = read_sysreg_el0(SYS_CNTP_CTL);
>  			break;
>  		case NR_KVM_TIMERS:
> @@ -358,6 +410,7 @@ bool kvm_timer_is_pending(struct kvm_vcpu *vcpu)
>  
>  	return kvm_timer_should_fire(map.direct_vtimer) ||
>  	       kvm_timer_should_fire(map.direct_ptimer) ||
> +	       kvm_timer_should_fire(map.emul_vtimer) ||
>  	       kvm_timer_should_fire(map.emul_ptimer);
>  }
>  
> @@ -438,6 +491,7 @@ static void timer_save_state(struct arch_timer_context *ctx)
>  
>  	switch (index) {
>  	case TIMER_VTIMER:
> +	case TIMER_HVTIMER:
>  		timer_set_ctl(ctx, read_sysreg_el0(SYS_CNTV_CTL));
>  		timer_set_cval(ctx, read_sysreg_el0(SYS_CNTV_CVAL));
>  
> @@ -447,6 +501,7 @@ static void timer_save_state(struct arch_timer_context *ctx)
>  
>  		break;
>  	case TIMER_PTIMER:
> +	case TIMER_HPTIMER:
>  		timer_set_ctl(ctx, read_sysreg_el0(SYS_CNTP_CTL));
>  		timer_set_cval(ctx, read_sysreg_el0(SYS_CNTP_CVAL));
>  
> @@ -484,6 +539,7 @@ static void kvm_timer_blocking(struct kvm_vcpu *vcpu)
>  	 */
>  	if (!kvm_timer_irq_can_fire(map.direct_vtimer) &&
>  	    !kvm_timer_irq_can_fire(map.direct_ptimer) &&
> +	    !kvm_timer_irq_can_fire(map.emul_vtimer) &&
>  	    !kvm_timer_irq_can_fire(map.emul_ptimer))
>  		return;
>  
> @@ -517,11 +573,13 @@ static void timer_restore_state(struct arch_timer_context *ctx)
>  
>  	switch (index) {
>  	case TIMER_VTIMER:
> +	case TIMER_HVTIMER:
>  		write_sysreg_el0(timer_get_cval(ctx), SYS_CNTV_CVAL);
>  		isb();
>  		write_sysreg_el0(timer_get_ctl(ctx), SYS_CNTV_CTL);
>  		break;
>  	case TIMER_PTIMER:
> +	case TIMER_HPTIMER:
>  		write_sysreg_el0(timer_get_cval(ctx), SYS_CNTP_CVAL);
>  		isb();
>  		write_sysreg_el0(timer_get_ctl(ctx), SYS_CNTP_CTL);
> @@ -598,6 +656,42 @@ static void kvm_timer_vcpu_load_nogic(struct kvm_vcpu *vcpu)
>  		enable_percpu_irq(host_vtimer_irq, host_vtimer_irq_flags);
>  }
>  
> +static void kvm_timer_vcpu_load_nested_switch(struct kvm_vcpu *vcpu,
> +					      struct timer_map *map)
> +{
> +	int hw, ret;
> +
> +	if (!irqchip_in_kernel(vcpu->kvm))
> +		return;
> +
> +	/*
> +	 * We only ever unmap the vtimer irq on a VHE system that runs nested
> +	 * virtualization, in which case we have both a valid emul_vtimer,
> +	 * emul_ptimer, direct_vtimer, and direct_ptimer.
> +	 *
> +	 * Since this is called from kvm_timer_vcpu_load(), a change between
> +	 * vEL2 and vEL1/0 will have just happened, and the timer_map will

I can think of at least two cases when the above doesn't hold:

- when a VCPU resets another VCPU.
- when the guest changes the value of the HCR_EL2.E2H field.

> +	 * represent this, and therefore we switch the emul/direct mappings
> +	 * below.
> +	 */
> +	hw = kvm_vgic_get_map(vcpu, map->direct_vtimer->irq.irq);
> +	if (hw < 0) {

Looking at kvm_vgic_get_map(), it returns -1 when the interrupt is not directly
mapped to a physical interrupt. This only happens when the VCPU has transitioned
from EL2 to EL1 or vice versa between a vcpu put/load pair. I think it would
make the code a lot more understandable if the check were wrapped in a helper
function, something like kvm_timer_context_switched().
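
Something like this, perhaps (completely untested, and the name is just the
placeholder suggested above):

	static bool kvm_timer_context_switched(struct kvm_vcpu *vcpu,
					       struct timer_map *map)
	{
		/*
		 * If the direct vtimer irq has lost its HW mapping, the
		 * vcpu has changed between vEL2 and vEL1/0 while it was
		 * put, so the direct/emulated roles need to be swapped.
		 */
		return kvm_vgic_get_map(vcpu, map->direct_vtimer->irq.irq) < 0;
	}

and then kvm_timer_vcpu_load_nested_switch() would only do the unmap/remap
dance when it returns true.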

Thanks,
Alex

> +		kvm_vgic_unmap_phys_irq(vcpu, map->emul_vtimer->irq.irq);
> +		kvm_vgic_unmap_phys_irq(vcpu, map->emul_ptimer->irq.irq);
> +
> +		ret = kvm_vgic_map_phys_irq(vcpu,
> +					    map->direct_vtimer->host_timer_irq,
> +					    map->direct_vtimer->irq.irq,
> +					    &arch_timer_irq_ops);
> +		WARN_ON_ONCE(ret);
> +		ret = kvm_vgic_map_phys_irq(vcpu,
> +					    map->direct_ptimer->host_timer_irq,
> +					    map->direct_ptimer->irq.irq,
> +					    &arch_timer_irq_ops);
> +		WARN_ON_ONCE(ret);
> +	}
> +}
> +
>  void kvm_timer_vcpu_load(struct kvm_vcpu *vcpu)
>  {
>  	struct arch_timer_cpu *timer = vcpu_timer(vcpu);
> @@ -609,6 +703,9 @@ void kvm_timer_vcpu_load(struct kvm_vcpu *vcpu)
>  	get_timer_map(vcpu, &map);
>  
>  	if (static_branch_likely(&has_gic_active_state)) {
> +		if (vcpu_has_nv(vcpu))
> +			kvm_timer_vcpu_load_nested_switch(vcpu, &map);
> +
>  		kvm_timer_vcpu_load_gic(map.direct_vtimer);
>  		if (map.direct_ptimer)
>  			kvm_timer_vcpu_load_gic(map.direct_ptimer);
> @@ -624,6 +721,8 @@ void kvm_timer_vcpu_load(struct kvm_vcpu *vcpu)
>  	if (map.direct_ptimer)
>  		timer_restore_state(map.direct_ptimer);
>  
> +	if (map.emul_vtimer)
> +		timer_emulate(map.emul_vtimer);
>  	if (map.emul_ptimer)
>  		timer_emulate(map.emul_ptimer);
>  }
> @@ -668,6 +767,8 @@ void kvm_timer_vcpu_put(struct kvm_vcpu *vcpu)
>  	 * In any case, we re-schedule the hrtimer for the physical timer when
>  	 * coming back to the VCPU thread in kvm_timer_vcpu_load().
>  	 */
> +	if (map.emul_vtimer)
> +		soft_timer_cancel(&map.emul_vtimer->hrtimer);
>  	if (map.emul_ptimer)
>  		soft_timer_cancel(&map.emul_ptimer->hrtimer);
>  
> @@ -728,10 +829,14 @@ int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu)
>  	 */
>  	timer_set_ctl(vcpu_vtimer(vcpu), 0);
>  	timer_set_ctl(vcpu_ptimer(vcpu), 0);
> +	timer_set_ctl(vcpu_hvtimer(vcpu), 0);
> +	timer_set_ctl(vcpu_hptimer(vcpu), 0);
>  
>  	if (timer->enabled) {
>  		kvm_timer_update_irq(vcpu, false, vcpu_vtimer(vcpu));
>  		kvm_timer_update_irq(vcpu, false, vcpu_ptimer(vcpu));
> +		kvm_timer_update_irq(vcpu, false, vcpu_hvtimer(vcpu));
> +		kvm_timer_update_irq(vcpu, false, vcpu_hptimer(vcpu));
>  
>  		if (irqchip_in_kernel(vcpu->kvm)) {
>  			kvm_vgic_reset_mapped_irq(vcpu, map.direct_vtimer->irq.irq);
> @@ -740,6 +845,8 @@ int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu)
>  		}
>  	}
>  
> +	if (map.emul_vtimer)
> +		soft_timer_cancel(&map.emul_vtimer->hrtimer);
>  	if (map.emul_ptimer)
>  		soft_timer_cancel(&map.emul_ptimer->hrtimer);
>  
> @@ -770,30 +877,47 @@ void kvm_timer_vcpu_init(struct kvm_vcpu *vcpu)
>  	struct arch_timer_cpu *timer = vcpu_timer(vcpu);
>  	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
>  	struct arch_timer_context *ptimer = vcpu_ptimer(vcpu);
> +	struct arch_timer_context *hvtimer = vcpu_hvtimer(vcpu);
> +	struct arch_timer_context *hptimer = vcpu_hptimer(vcpu);
>  
>  	vtimer->vcpu = vcpu;
>  	ptimer->vcpu = vcpu;
> +	hvtimer->vcpu = vcpu;
> +	hptimer->vcpu = vcpu;
>  
>  	/* Synchronize cntvoff across all vtimers of a VM. */
>  	update_vtimer_cntvoff(vcpu, kvm_phys_timer_read());
>  	timer_set_offset(ptimer, 0);
> +	timer_set_offset(hvtimer, 0);
> +	timer_set_offset(hptimer, 0);
>  
>  	hrtimer_init(&timer->bg_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_HARD);
>  	timer->bg_timer.function = kvm_bg_timer_expire;
>  
>  	hrtimer_init(&vtimer->hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_HARD);
>  	hrtimer_init(&ptimer->hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_HARD);
> +	hrtimer_init(&hvtimer->hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_HARD);
> +	hrtimer_init(&hptimer->hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_HARD);
> +
>  	vtimer->hrtimer.function = kvm_hrtimer_expire;
>  	ptimer->hrtimer.function = kvm_hrtimer_expire;
> +	hvtimer->hrtimer.function = kvm_hrtimer_expire;
> +	hptimer->hrtimer.function = kvm_hrtimer_expire;
>  
>  	vtimer->irq.irq = default_vtimer_irq.irq;
>  	ptimer->irq.irq = default_ptimer_irq.irq;
> +	hvtimer->irq.irq = default_hvtimer_irq.irq;
> +	hptimer->irq.irq = default_hptimer_irq.irq;
>  
>  	vtimer->host_timer_irq = host_vtimer_irq;
>  	ptimer->host_timer_irq = host_ptimer_irq;
> +	hvtimer->host_timer_irq = host_vtimer_irq;
> +	hptimer->host_timer_irq = host_ptimer_irq;
>  
>  	vtimer->host_timer_irq_flags = host_vtimer_irq_flags;
>  	ptimer->host_timer_irq_flags = host_ptimer_irq_flags;
> +	hvtimer->host_timer_irq_flags = host_vtimer_irq_flags;
> +	hptimer->host_timer_irq_flags = host_ptimer_irq_flags;
>  }
>  
>  static void kvm_timer_init_interrupt(void *info)
> @@ -900,6 +1024,10 @@ static u64 kvm_arm_timer_read(struct kvm_vcpu *vcpu,
>  		val = kvm_phys_timer_read() - timer_get_offset(timer);
>  		break;
>  
> +	case TIMER_REG_VOFF:
> +		val = timer_get_offset(timer);
> +		break;
> +
>  	default:
>  		BUG();
>  	}
> @@ -942,6 +1070,10 @@ static void kvm_arm_timer_write(struct kvm_vcpu *vcpu,
>  		timer_set_cval(timer, val);
>  		break;
>  
> +	case TIMER_REG_VOFF:
> +		timer_set_offset(timer, val);
> +		break;
> +
>  	default:
>  		BUG();
>  	}
> @@ -1040,10 +1172,6 @@ static const struct irq_domain_ops timer_domain_ops = {
>  	.free	= timer_irq_domain_free,
>  };
>  
> -static struct irq_ops arch_timer_irq_ops = {
> -	.get_input_level = kvm_arch_timer_get_input_level,
> -};
> -
>  static void kvm_irq_fixup_flags(unsigned int virq, u32 *flags)
>  {
>  	*flags = irq_get_trigger_type(virq);
> @@ -1188,7 +1316,7 @@ void kvm_timer_vcpu_terminate(struct kvm_vcpu *vcpu)
>  
>  static bool timer_irqs_are_valid(struct kvm_vcpu *vcpu)
>  {
> -	int vtimer_irq, ptimer_irq, ret;
> +	int vtimer_irq, ptimer_irq, hvtimer_irq, hptimer_irq, ret;
>  	unsigned long i;
>  
>  	vtimer_irq = vcpu_vtimer(vcpu)->irq.irq;
> @@ -1201,16 +1329,28 @@ static bool timer_irqs_are_valid(struct kvm_vcpu *vcpu)
>  	if (ret)
>  		return false;
>  
> +	hvtimer_irq = vcpu_hvtimer(vcpu)->irq.irq;
> +	ret = kvm_vgic_set_owner(vcpu, hvtimer_irq, vcpu_hvtimer(vcpu));
> +	if (ret)
> +		return false;
> +
> +	hptimer_irq = vcpu_hptimer(vcpu)->irq.irq;
> +	ret = kvm_vgic_set_owner(vcpu, hptimer_irq, vcpu_hptimer(vcpu));
> +	if (ret)
> +		return false;
> +
>  	kvm_for_each_vcpu(i, vcpu, vcpu->kvm) {
>  		if (vcpu_vtimer(vcpu)->irq.irq != vtimer_irq ||
> -		    vcpu_ptimer(vcpu)->irq.irq != ptimer_irq)
> +		    vcpu_ptimer(vcpu)->irq.irq != ptimer_irq ||
> +		    vcpu_hvtimer(vcpu)->irq.irq != hvtimer_irq ||
> +		    vcpu_hptimer(vcpu)->irq.irq != hptimer_irq)
>  			return false;
>  	}
>  
>  	return true;
>  }
>  
> -bool kvm_arch_timer_get_input_level(int vintid)
> +static bool kvm_arch_timer_get_input_level(int vintid)
>  {
>  	struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
>  	struct arch_timer_context *timer;
> @@ -1219,6 +1359,10 @@ bool kvm_arch_timer_get_input_level(int vintid)
>  		timer = vcpu_vtimer(vcpu);
>  	else if (vintid == vcpu_ptimer(vcpu)->irq.irq)
>  		timer = vcpu_ptimer(vcpu);
> +	else if (vintid == vcpu_hvtimer(vcpu)->irq.irq)
> +		timer = vcpu_hvtimer(vcpu);
> +	else if (vintid == vcpu_hptimer(vcpu)->irq.irq)
> +		timer = vcpu_hptimer(vcpu);
>  	else
>  		BUG();
>  
> @@ -1301,6 +1445,7 @@ static void set_timer_irqs(struct kvm *kvm, int vtimer_irq, int ptimer_irq)
>  	kvm_for_each_vcpu(i, vcpu, kvm) {
>  		vcpu_vtimer(vcpu)->irq.irq = vtimer_irq;
>  		vcpu_ptimer(vcpu)->irq.irq = ptimer_irq;
> +		/* TODO: Add support for hv/hp timers */
>  	}
>  }
>  
> @@ -1311,6 +1456,8 @@ int kvm_arm_timer_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
>  	struct arch_timer_context *ptimer = vcpu_ptimer(vcpu);
>  	int irq;
>  
> +	/* TODO: Add support for hv/hp timers */
> +
>  	if (!irqchip_in_kernel(vcpu->kvm))
>  		return -EINVAL;
>  
> @@ -1343,6 +1490,8 @@ int kvm_arm_timer_get_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
>  	struct arch_timer_context *timer;
>  	int irq;
>  
> +	/* TODO: Add support for hv/hp timers */
> +
>  	switch (attr->attr) {
>  	case KVM_ARM_VCPU_TIMER_IRQ_VTIMER:
>  		timer = vcpu_vtimer(vcpu);
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index d7441b8ba406..bbc58930a5eb 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -1288,6 +1288,11 @@ static bool access_arch_timer(struct kvm_vcpu *vcpu,
>  		tmr = TIMER_PTIMER;
>  		treg = TIMER_REG_CVAL;
>  		break;
> +	case SYS_CNTVOFF_EL2:
> +		tmr = TIMER_VTIMER;
> +		treg = TIMER_REG_VOFF;
> +		break;
> +
>  	default:
>  		BUG();
>  	}
> @@ -2212,7 +2217,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
>  	EL2_REG(CONTEXTIDR_EL2, access_rw, reset_val, 0),
>  	EL2_REG(TPIDR_EL2, access_rw, reset_val, 0),
>  
> -	EL2_REG(CNTVOFF_EL2, access_rw, reset_val, 0),
> +	{ SYS_DESC(SYS_CNTVOFF_EL2), access_arch_timer },
>  	EL2_REG(CNTHCTL_EL2, access_rw, reset_val, 0),
>  
>  	EL12_REG(SCTLR, access_vm_reg, reset_val, 0x00C50078),
> diff --git a/arch/arm64/kvm/trace_arm.h b/arch/arm64/kvm/trace_arm.h
> index f3e46a976125..6ce5c025218d 100644
> --- a/arch/arm64/kvm/trace_arm.h
> +++ b/arch/arm64/kvm/trace_arm.h
> @@ -206,6 +206,7 @@ TRACE_EVENT(kvm_get_timer_map,
>  		__field(	unsigned long,		vcpu_id	)
>  		__field(	int,			direct_vtimer	)
>  		__field(	int,			direct_ptimer	)
> +		__field(	int,			emul_vtimer	)
>  		__field(	int,			emul_ptimer	)
>  	),
>  
> @@ -214,14 +215,17 @@ TRACE_EVENT(kvm_get_timer_map,
>  		__entry->direct_vtimer		= arch_timer_ctx_index(map->direct_vtimer);
>  		__entry->direct_ptimer =
>  			(map->direct_ptimer) ? arch_timer_ctx_index(map->direct_ptimer) : -1;
> +		__entry->emul_vtimer =
> +			(map->emul_vtimer) ? arch_timer_ctx_index(map->emul_vtimer) : -1;
>  		__entry->emul_ptimer =
>  			(map->emul_ptimer) ? arch_timer_ctx_index(map->emul_ptimer) : -1;
>  	),
>  
> -	TP_printk("VCPU: %ld, dv: %d, dp: %d, ep: %d",
> +	TP_printk("VCPU: %ld, dv: %d, dp: %d, ev: %d, ep: %d",
>  		  __entry->vcpu_id,
>  		  __entry->direct_vtimer,
>  		  __entry->direct_ptimer,
> +		  __entry->emul_vtimer,
>  		  __entry->emul_ptimer)
>  );
>  
> diff --git a/arch/arm64/kvm/vgic/vgic.c b/arch/arm64/kvm/vgic/vgic.c
> index 9b98876a8a93..e7fe0447af08 100644
> --- a/arch/arm64/kvm/vgic/vgic.c
> +++ b/arch/arm64/kvm/vgic/vgic.c
> @@ -573,6 +573,21 @@ int kvm_vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, unsigned int vintid)
>  	return 0;
>  }
>  
> +int kvm_vgic_get_map(struct kvm_vcpu *vcpu, unsigned int vintid)
> +{
> +	struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, vintid);
> +	unsigned long flags;
> +	int ret = -1;
> +
> +	raw_spin_lock_irqsave(&irq->irq_lock, flags);
> +	if (irq->hw)
> +		ret = irq->hwintid;
> +	raw_spin_unlock_irqrestore(&irq->irq_lock, flags);
> +
> +	vgic_put_irq(vcpu->kvm, irq);
> +	return ret;
> +}
> +
>  /**
>   * kvm_vgic_set_owner - Set the owner of an interrupt for a VM
>   *
> diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
> index 51c19381108c..0a76dac8cb6a 100644
> --- a/include/kvm/arm_arch_timer.h
> +++ b/include/kvm/arm_arch_timer.h
> @@ -13,6 +13,8 @@
>  enum kvm_arch_timers {
>  	TIMER_PTIMER,
>  	TIMER_VTIMER,
> +	TIMER_HVTIMER,
> +	TIMER_HPTIMER,
>  	NR_KVM_TIMERS
>  };
>  
> @@ -21,6 +23,7 @@ enum kvm_arch_timer_regs {
>  	TIMER_REG_CVAL,
>  	TIMER_REG_TVAL,
>  	TIMER_REG_CTL,
> +	TIMER_REG_VOFF,
>  };
>  
>  struct arch_timer_context {
> @@ -47,6 +50,7 @@ struct arch_timer_context {
>  struct timer_map {
>  	struct arch_timer_context *direct_vtimer;
>  	struct arch_timer_context *direct_ptimer;
> +	struct arch_timer_context *emul_vtimer;
>  	struct arch_timer_context *emul_ptimer;
>  };
>  
> @@ -85,12 +89,12 @@ void kvm_timer_vcpu_put(struct kvm_vcpu *vcpu);
>  
>  void kvm_timer_init_vhe(void);
>  
> -bool kvm_arch_timer_get_input_level(int vintid);
> -
>  #define vcpu_timer(v)	(&(v)->arch.timer_cpu)
>  #define vcpu_get_timer(v,t)	(&vcpu_timer(v)->timers[(t)])
>  #define vcpu_vtimer(v)	(&(v)->arch.timer_cpu.timers[TIMER_VTIMER])
>  #define vcpu_ptimer(v)	(&(v)->arch.timer_cpu.timers[TIMER_PTIMER])
> +#define vcpu_hvtimer(v)	(&(v)->arch.timer_cpu.timers[TIMER_HVTIMER])
> +#define vcpu_hptimer(v)	(&(v)->arch.timer_cpu.timers[TIMER_HPTIMER])
>  
>  #define arch_timer_ctx_index(ctx)	((ctx) - vcpu_timer((ctx)->vcpu)->timers)
>  
> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
> index bb30a6803d9f..17b651890d22 100644
> --- a/include/kvm/arm_vgic.h
> +++ b/include/kvm/arm_vgic.h
> @@ -375,6 +375,7 @@ int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int intid,
>  int kvm_vgic_map_phys_irq(struct kvm_vcpu *vcpu, unsigned int host_irq,
>  			  u32 vintid, struct irq_ops *ops);
>  int kvm_vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, unsigned int vintid);
> +int kvm_vgic_get_map(struct kvm_vcpu *vcpu, unsigned int vintid);
>  bool kvm_vgic_map_is_active(struct kvm_vcpu *vcpu, unsigned int vintid);
>  
>  int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu);
> -- 
> 2.30.2
> 

^ permalink raw reply	[flat|nested] 378+ messages in thread

* Re: [PATCH v6 43/64] KVM: arm64: nv: arch_timer: Support hyp timer emulation
@ 2022-03-07 14:52     ` Alexandru Elisei
  0 siblings, 0 replies; 378+ messages in thread
From: Alexandru Elisei @ 2022-03-07 14:52 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, Andre Przywara, Christoffer Dall,
	Jintack Lim, Haibo Xu, Ganapatrao Kulkarni, Chase Conklin,
	Russell King (Oracle),
	James Morse, Suzuki K Poulose, karl.heubaum, mihai.carabas,
	miguel.luis, kernel-team

Hi,

I was under the impression that KVM's nested virtualization doesn't support
AArch32 in the guest, so why is the subject about hyp mode AArch32 timers?

On Fri, Jan 28, 2022 at 12:18:51PM +0000, Marc Zyngier wrote:
> From: Christoffer Dall <christoffer.dall@arm.com>
> 
> Emulating EL2 also means emulating the EL2 timers. To do so, we expand
> our timer framework to deal with at most 4 timers. At any given time,
> two timers are using the HW timers, and the two others are purely
> emulated.
> 
> The role of deciding which is which at any given time is left to a
> mapping function which is called every time we need to make such a
> decision.
> 
> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
> [maz: added CNTVOFF support, general reworking for v4.8]
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/include/asm/kvm_host.h |   4 +
>  arch/arm64/kvm/arch_timer.c       | 165 ++++++++++++++++++++++++++++--
>  arch/arm64/kvm/sys_regs.c         |   7 +-
>  arch/arm64/kvm/trace_arm.h        |   6 +-
>  arch/arm64/kvm/vgic/vgic.c        |  15 +++
>  include/kvm/arm_arch_timer.h      |   8 +-
>  include/kvm/arm_vgic.h            |   1 +
>  7 files changed, 194 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 0b887364f994..03833eca3307 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -285,6 +285,10 @@ enum vcpu_sysreg {
>  	TPIDR_EL2,	/* EL2 Software Thread ID Register */
>  	CNTHCTL_EL2,	/* Counter-timer Hypervisor Control register */
>  	SP_EL2,		/* EL2 Stack Pointer */
> +	CNTHP_CTL_EL2,
> +	CNTHP_CVAL_EL2,
> +	CNTHV_CTL_EL2,
> +	CNTHV_CVAL_EL2,
>  
>  	NR_SYS_REGS	/* Nothing after this line! */
>  };
> diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
> index 6e542e2eae32..5e4f93605d36 100644
> --- a/arch/arm64/kvm/arch_timer.c
> +++ b/arch/arm64/kvm/arch_timer.c
> @@ -16,6 +16,7 @@
>  #include <asm/arch_timer.h>
>  #include <asm/kvm_emulate.h>
>  #include <asm/kvm_hyp.h>
> +#include <asm/kvm_nested.h>
>  
>  #include <kvm/arm_vgic.h>
>  #include <kvm/arm_arch_timer.h>
> @@ -40,6 +41,16 @@ static const struct kvm_irq_level default_vtimer_irq = {
>  	.level	= 1,
>  };
>  
> +static const struct kvm_irq_level default_hptimer_irq = {
> +	.irq	= 26,
> +	.level	= 1,
> +};
> +
> +static const struct kvm_irq_level default_hvtimer_irq = {
> +	.irq	= 28,
> +	.level	= 1,
> +};
> +
>  static bool kvm_timer_irq_can_fire(struct arch_timer_context *timer_ctx);
>  static void kvm_timer_update_irq(struct kvm_vcpu *vcpu, bool new_level,
>  				 struct arch_timer_context *timer_ctx);
> @@ -51,6 +62,11 @@ static void kvm_arm_timer_write(struct kvm_vcpu *vcpu,
>  static u64 kvm_arm_timer_read(struct kvm_vcpu *vcpu,
>  			      struct arch_timer_context *timer,
>  			      enum kvm_arch_timer_regs treg);
> +static bool kvm_arch_timer_get_input_level(int vintid);
> +
> +static struct irq_ops arch_timer_irq_ops = {
> +	.get_input_level = kvm_arch_timer_get_input_level,
> +};
>  
>  u32 timer_get_ctl(struct arch_timer_context *ctxt)
>  {
> @@ -61,6 +77,10 @@ u32 timer_get_ctl(struct arch_timer_context *ctxt)
>  		return __vcpu_sys_reg(vcpu, CNTV_CTL_EL0);
>  	case TIMER_PTIMER:
>  		return __vcpu_sys_reg(vcpu, CNTP_CTL_EL0);
> +	case TIMER_HVTIMER:
> +		return __vcpu_sys_reg(vcpu, CNTHV_CTL_EL2);
> +	case TIMER_HPTIMER:
> +		return __vcpu_sys_reg(vcpu, CNTHP_CTL_EL2);
>  	default:
>  		WARN_ON(1);
>  		return 0;
> @@ -76,6 +96,10 @@ u64 timer_get_cval(struct arch_timer_context *ctxt)
>  		return __vcpu_sys_reg(vcpu, CNTV_CVAL_EL0);
>  	case TIMER_PTIMER:
>  		return __vcpu_sys_reg(vcpu, CNTP_CVAL_EL0);
> +	case TIMER_HVTIMER:
> +		return __vcpu_sys_reg(vcpu, CNTHV_CVAL_EL2);
> +	case TIMER_HPTIMER:
> +		return __vcpu_sys_reg(vcpu, CNTHP_CVAL_EL2);
>  	default:
>  		WARN_ON(1);
>  		return 0;
> @@ -105,6 +129,12 @@ static void timer_set_ctl(struct arch_timer_context *ctxt, u32 ctl)
>  	case TIMER_PTIMER:
>  		__vcpu_sys_reg(vcpu, CNTP_CTL_EL0) = ctl;
>  		break;
> +	case TIMER_HVTIMER:
> +		__vcpu_sys_reg(vcpu, CNTHV_CTL_EL2) = ctl;
> +		break;
> +	case TIMER_HPTIMER:
> +		__vcpu_sys_reg(vcpu, CNTHP_CTL_EL2) = ctl;
> +		break;
>  	default:
>  		WARN_ON(1);
>  	}
> @@ -121,6 +151,12 @@ static void timer_set_cval(struct arch_timer_context *ctxt, u64 cval)
>  	case TIMER_PTIMER:
>  		__vcpu_sys_reg(vcpu, CNTP_CVAL_EL0) = cval;
>  		break;
> +	case TIMER_HVTIMER:
> +		__vcpu_sys_reg(vcpu, CNTHV_CVAL_EL2) = cval;
> +		break;
> +	case TIMER_HPTIMER:
> +		__vcpu_sys_reg(vcpu, CNTHP_CVAL_EL2) = cval;
> +		break;
>  	default:
>  		WARN_ON(1);
>  	}
> @@ -146,13 +182,27 @@ u64 kvm_phys_timer_read(void)
>  
>  static void get_timer_map(struct kvm_vcpu *vcpu, struct timer_map *map)
>  {
> -	if (has_vhe()) {
> +	if (vcpu_has_nv(vcpu)) {
> +		if (is_hyp_ctxt(vcpu)) {
> +			map->direct_vtimer = vcpu_hvtimer(vcpu);
> +			map->direct_ptimer = vcpu_hptimer(vcpu);
> +			map->emul_vtimer = vcpu_vtimer(vcpu);
> +			map->emul_ptimer = vcpu_ptimer(vcpu);
> +		} else {
> +			map->direct_vtimer = vcpu_vtimer(vcpu);
> +			map->direct_ptimer = vcpu_ptimer(vcpu);
> +			map->emul_vtimer = vcpu_hvtimer(vcpu);
> +			map->emul_ptimer = vcpu_hptimer(vcpu);
> +		}

It would be nice to explain why, when the guest is running in virtual EL2, the
hp/hv timers are the passthrough ones, while when the guest is in virtual EL1,
the virtual and physical timers are passthrough. I suppose it's because at EL2
the guest is expected to use the hp/hv timers, and at EL1 the virtual and
physical timers, right?

> +	} else if (has_vhe()) {
>  		map->direct_vtimer = vcpu_vtimer(vcpu);
>  		map->direct_ptimer = vcpu_ptimer(vcpu);
> +		map->emul_vtimer = NULL;
>  		map->emul_ptimer = NULL;
>  	} else {
>  		map->direct_vtimer = vcpu_vtimer(vcpu);
>  		map->direct_ptimer = NULL;
> +		map->emul_vtimer = NULL;
>  		map->emul_ptimer = vcpu_ptimer(vcpu);
>  	}
>  
> @@ -325,9 +375,11 @@ static bool kvm_timer_should_fire(struct arch_timer_context *timer_ctx)
>  
>  		switch (index) {
>  		case TIMER_VTIMER:
> +		case TIMER_HVTIMER:
>  			cnt_ctl = read_sysreg_el0(SYS_CNTV_CTL);
>  			break;
>  		case TIMER_PTIMER:
> +		case TIMER_HPTIMER:
>  			cnt_ctl = read_sysreg_el0(SYS_CNTP_CTL);
>  			break;
>  		case NR_KVM_TIMERS:
> @@ -358,6 +410,7 @@ bool kvm_timer_is_pending(struct kvm_vcpu *vcpu)
>  
>  	return kvm_timer_should_fire(map.direct_vtimer) ||
>  	       kvm_timer_should_fire(map.direct_ptimer) ||
> +	       kvm_timer_should_fire(map.emul_vtimer) ||
>  	       kvm_timer_should_fire(map.emul_ptimer);
>  }
>  
> @@ -438,6 +491,7 @@ static void timer_save_state(struct arch_timer_context *ctx)
>  
>  	switch (index) {
>  	case TIMER_VTIMER:
> +	case TIMER_HVTIMER:
>  		timer_set_ctl(ctx, read_sysreg_el0(SYS_CNTV_CTL));
>  		timer_set_cval(ctx, read_sysreg_el0(SYS_CNTV_CVAL));
>  
> @@ -447,6 +501,7 @@ static void timer_save_state(struct arch_timer_context *ctx)
>  
>  		break;
>  	case TIMER_PTIMER:
> +	case TIMER_HPTIMER:
>  		timer_set_ctl(ctx, read_sysreg_el0(SYS_CNTP_CTL));
>  		timer_set_cval(ctx, read_sysreg_el0(SYS_CNTP_CVAL));
>  
> @@ -484,6 +539,7 @@ static void kvm_timer_blocking(struct kvm_vcpu *vcpu)
>  	 */
>  	if (!kvm_timer_irq_can_fire(map.direct_vtimer) &&
>  	    !kvm_timer_irq_can_fire(map.direct_ptimer) &&
> +	    !kvm_timer_irq_can_fire(map.emul_vtimer) &&
>  	    !kvm_timer_irq_can_fire(map.emul_ptimer))
>  		return;
>  
> @@ -517,11 +573,13 @@ static void timer_restore_state(struct arch_timer_context *ctx)
>  
>  	switch (index) {
>  	case TIMER_VTIMER:
> +	case TIMER_HVTIMER:
>  		write_sysreg_el0(timer_get_cval(ctx), SYS_CNTV_CVAL);
>  		isb();
>  		write_sysreg_el0(timer_get_ctl(ctx), SYS_CNTV_CTL);
>  		break;
>  	case TIMER_PTIMER:
> +	case TIMER_HPTIMER:
>  		write_sysreg_el0(timer_get_cval(ctx), SYS_CNTP_CVAL);
>  		isb();
>  		write_sysreg_el0(timer_get_ctl(ctx), SYS_CNTP_CTL);
> @@ -598,6 +656,42 @@ static void kvm_timer_vcpu_load_nogic(struct kvm_vcpu *vcpu)
>  		enable_percpu_irq(host_vtimer_irq, host_vtimer_irq_flags);
>  }
>  
> +static void kvm_timer_vcpu_load_nested_switch(struct kvm_vcpu *vcpu,
> +					      struct timer_map *map)
> +{
> +	int hw, ret;
> +
> +	if (!irqchip_in_kernel(vcpu->kvm))
> +		return;
> +
> +	/*
> +	 * We only ever unmap the vtimer irq on a VHE system that runs nested
> +	 * virtualization, in which case we have both a valid emul_vtimer,
> +	 * emul_ptimer, direct_vtimer, and direct_ptimer.
> +	 *
> +	 * Since this is called from kvm_timer_vcpu_load(), a change between
> +	 * vEL2 and vEL1/0 will have just happened, and the timer_map will

I can think of at least two cases when the above doesn't hold:

- when a VCPU resets another VCPU.
- when the guest changes the value of the HCR_EL2.E2H field.

> +	 * represent this, and therefore we switch the emul/direct mappings
> +	 * below.
> +	 */
> +	hw = kvm_vgic_get_map(vcpu, map->direct_vtimer->irq.irq);
> +	if (hw < 0) {

Looking at kvm_vgic_get_map(), it returns -1 when the interrupt is not directly
mapped to a physical interrupt. This only happens when the VCPU has transitioned
from EL2 to EL1 or vice versa between a vcpu put/load pair. I think it would
make the code a lot more understandable if the check were wrapped in a helper
function, something like kvm_timer_context_switched().

Thanks,
Alex

> +		kvm_vgic_unmap_phys_irq(vcpu, map->emul_vtimer->irq.irq);
> +		kvm_vgic_unmap_phys_irq(vcpu, map->emul_ptimer->irq.irq);
> +
> +		ret = kvm_vgic_map_phys_irq(vcpu,
> +					    map->direct_vtimer->host_timer_irq,
> +					    map->direct_vtimer->irq.irq,
> +					    &arch_timer_irq_ops);
> +		WARN_ON_ONCE(ret);
> +		ret = kvm_vgic_map_phys_irq(vcpu,
> +					    map->direct_ptimer->host_timer_irq,
> +					    map->direct_ptimer->irq.irq,
> +					    &arch_timer_irq_ops);
> +		WARN_ON_ONCE(ret);
> +	}
> +}
> +
>  void kvm_timer_vcpu_load(struct kvm_vcpu *vcpu)
>  {
>  	struct arch_timer_cpu *timer = vcpu_timer(vcpu);
> @@ -609,6 +703,9 @@ void kvm_timer_vcpu_load(struct kvm_vcpu *vcpu)
>  	get_timer_map(vcpu, &map);
>  
>  	if (static_branch_likely(&has_gic_active_state)) {
> +		if (vcpu_has_nv(vcpu))
> +			kvm_timer_vcpu_load_nested_switch(vcpu, &map);
> +
>  		kvm_timer_vcpu_load_gic(map.direct_vtimer);
>  		if (map.direct_ptimer)
>  			kvm_timer_vcpu_load_gic(map.direct_ptimer);
> @@ -624,6 +721,8 @@ void kvm_timer_vcpu_load(struct kvm_vcpu *vcpu)
>  	if (map.direct_ptimer)
>  		timer_restore_state(map.direct_ptimer);
>  
> +	if (map.emul_vtimer)
> +		timer_emulate(map.emul_vtimer);
>  	if (map.emul_ptimer)
>  		timer_emulate(map.emul_ptimer);
>  }
> @@ -668,6 +767,8 @@ void kvm_timer_vcpu_put(struct kvm_vcpu *vcpu)
>  	 * In any case, we re-schedule the hrtimer for the physical timer when
>  	 * coming back to the VCPU thread in kvm_timer_vcpu_load().
>  	 */
> +	if (map.emul_vtimer)
> +		soft_timer_cancel(&map.emul_vtimer->hrtimer);
>  	if (map.emul_ptimer)
>  		soft_timer_cancel(&map.emul_ptimer->hrtimer);
>  
> @@ -728,10 +829,14 @@ int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu)
>  	 */
>  	timer_set_ctl(vcpu_vtimer(vcpu), 0);
>  	timer_set_ctl(vcpu_ptimer(vcpu), 0);
> +	timer_set_ctl(vcpu_hvtimer(vcpu), 0);
> +	timer_set_ctl(vcpu_hptimer(vcpu), 0);
>  
>  	if (timer->enabled) {
>  		kvm_timer_update_irq(vcpu, false, vcpu_vtimer(vcpu));
>  		kvm_timer_update_irq(vcpu, false, vcpu_ptimer(vcpu));
> +		kvm_timer_update_irq(vcpu, false, vcpu_hvtimer(vcpu));
> +		kvm_timer_update_irq(vcpu, false, vcpu_hptimer(vcpu));
>  
>  		if (irqchip_in_kernel(vcpu->kvm)) {
>  			kvm_vgic_reset_mapped_irq(vcpu, map.direct_vtimer->irq.irq);
> @@ -740,6 +845,8 @@ int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu)
>  		}
>  	}
>  
> +	if (map.emul_vtimer)
> +		soft_timer_cancel(&map.emul_vtimer->hrtimer);
>  	if (map.emul_ptimer)
>  		soft_timer_cancel(&map.emul_ptimer->hrtimer);
>  
> @@ -770,30 +877,47 @@ void kvm_timer_vcpu_init(struct kvm_vcpu *vcpu)
>  	struct arch_timer_cpu *timer = vcpu_timer(vcpu);
>  	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
>  	struct arch_timer_context *ptimer = vcpu_ptimer(vcpu);
> +	struct arch_timer_context *hvtimer = vcpu_hvtimer(vcpu);
> +	struct arch_timer_context *hptimer = vcpu_hptimer(vcpu);
>  
>  	vtimer->vcpu = vcpu;
>  	ptimer->vcpu = vcpu;
> +	hvtimer->vcpu = vcpu;
> +	hptimer->vcpu = vcpu;
>  
>  	/* Synchronize cntvoff across all vtimers of a VM. */
>  	update_vtimer_cntvoff(vcpu, kvm_phys_timer_read());
>  	timer_set_offset(ptimer, 0);
> +	timer_set_offset(hvtimer, 0);
> +	timer_set_offset(hptimer, 0);
>  
>  	hrtimer_init(&timer->bg_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_HARD);
>  	timer->bg_timer.function = kvm_bg_timer_expire;
>  
>  	hrtimer_init(&vtimer->hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_HARD);
>  	hrtimer_init(&ptimer->hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_HARD);
> +	hrtimer_init(&hvtimer->hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_HARD);
> +	hrtimer_init(&hptimer->hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_HARD);
> +
>  	vtimer->hrtimer.function = kvm_hrtimer_expire;
>  	ptimer->hrtimer.function = kvm_hrtimer_expire;
> +	hvtimer->hrtimer.function = kvm_hrtimer_expire;
> +	hptimer->hrtimer.function = kvm_hrtimer_expire;
>  
>  	vtimer->irq.irq = default_vtimer_irq.irq;
>  	ptimer->irq.irq = default_ptimer_irq.irq;
> +	hvtimer->irq.irq = default_hvtimer_irq.irq;
> +	hptimer->irq.irq = default_hptimer_irq.irq;
>  
>  	vtimer->host_timer_irq = host_vtimer_irq;
>  	ptimer->host_timer_irq = host_ptimer_irq;
> +	hvtimer->host_timer_irq = host_vtimer_irq;
> +	hptimer->host_timer_irq = host_ptimer_irq;
>  
>  	vtimer->host_timer_irq_flags = host_vtimer_irq_flags;
>  	ptimer->host_timer_irq_flags = host_ptimer_irq_flags;
> +	hvtimer->host_timer_irq_flags = host_vtimer_irq_flags;
> +	hptimer->host_timer_irq_flags = host_ptimer_irq_flags;
>  }
>  
>  static void kvm_timer_init_interrupt(void *info)
> @@ -900,6 +1024,10 @@ static u64 kvm_arm_timer_read(struct kvm_vcpu *vcpu,
>  		val = kvm_phys_timer_read() - timer_get_offset(timer);
>  		break;
>  
> +	case TIMER_REG_VOFF:
> +		val = timer_get_offset(timer);
> +		break;
> +
>  	default:
>  		BUG();
>  	}
> @@ -942,6 +1070,10 @@ static void kvm_arm_timer_write(struct kvm_vcpu *vcpu,
>  		timer_set_cval(timer, val);
>  		break;
>  
> +	case TIMER_REG_VOFF:
> +		timer_set_offset(timer, val);
> +		break;
> +
>  	default:
>  		BUG();
>  	}
> @@ -1040,10 +1172,6 @@ static const struct irq_domain_ops timer_domain_ops = {
>  	.free	= timer_irq_domain_free,
>  };
>  
> -static struct irq_ops arch_timer_irq_ops = {
> -	.get_input_level = kvm_arch_timer_get_input_level,
> -};
> -
>  static void kvm_irq_fixup_flags(unsigned int virq, u32 *flags)
>  {
>  	*flags = irq_get_trigger_type(virq);
> @@ -1188,7 +1316,7 @@ void kvm_timer_vcpu_terminate(struct kvm_vcpu *vcpu)
>  
>  static bool timer_irqs_are_valid(struct kvm_vcpu *vcpu)
>  {
> -	int vtimer_irq, ptimer_irq, ret;
> +	int vtimer_irq, ptimer_irq, hvtimer_irq, hptimer_irq, ret;
>  	unsigned long i;
>  
>  	vtimer_irq = vcpu_vtimer(vcpu)->irq.irq;
> @@ -1201,16 +1329,28 @@ static bool timer_irqs_are_valid(struct kvm_vcpu *vcpu)
>  	if (ret)
>  		return false;
>  
> +	hvtimer_irq = vcpu_hvtimer(vcpu)->irq.irq;
> +	ret = kvm_vgic_set_owner(vcpu, hvtimer_irq, vcpu_hvtimer(vcpu));
> +	if (ret)
> +		return false;
> +
> +	hptimer_irq = vcpu_hptimer(vcpu)->irq.irq;
> +	ret = kvm_vgic_set_owner(vcpu, hptimer_irq, vcpu_hptimer(vcpu));
> +	if (ret)
> +		return false;
> +
>  	kvm_for_each_vcpu(i, vcpu, vcpu->kvm) {
>  		if (vcpu_vtimer(vcpu)->irq.irq != vtimer_irq ||
> -		    vcpu_ptimer(vcpu)->irq.irq != ptimer_irq)
> +		    vcpu_ptimer(vcpu)->irq.irq != ptimer_irq ||
> +		    vcpu_hvtimer(vcpu)->irq.irq != hvtimer_irq ||
> +		    vcpu_hptimer(vcpu)->irq.irq != hptimer_irq)
>  			return false;
>  	}
>  
>  	return true;
>  }
>  
> -bool kvm_arch_timer_get_input_level(int vintid)
> +static bool kvm_arch_timer_get_input_level(int vintid)
>  {
>  	struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
>  	struct arch_timer_context *timer;
> @@ -1219,6 +1359,10 @@ bool kvm_arch_timer_get_input_level(int vintid)
>  		timer = vcpu_vtimer(vcpu);
>  	else if (vintid == vcpu_ptimer(vcpu)->irq.irq)
>  		timer = vcpu_ptimer(vcpu);
> +	else if (vintid == vcpu_hvtimer(vcpu)->irq.irq)
> +		timer = vcpu_hvtimer(vcpu);
> +	else if (vintid == vcpu_hptimer(vcpu)->irq.irq)
> +		timer = vcpu_hptimer(vcpu);
>  	else
>  		BUG();
>  
> @@ -1301,6 +1445,7 @@ static void set_timer_irqs(struct kvm *kvm, int vtimer_irq, int ptimer_irq)
>  	kvm_for_each_vcpu(i, vcpu, kvm) {
>  		vcpu_vtimer(vcpu)->irq.irq = vtimer_irq;
>  		vcpu_ptimer(vcpu)->irq.irq = ptimer_irq;
> +		/* TODO: Add support for hv/hp timers */
>  	}
>  }
>  
> @@ -1311,6 +1456,8 @@ int kvm_arm_timer_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
>  	struct arch_timer_context *ptimer = vcpu_ptimer(vcpu);
>  	int irq;
>  
> +	/* TODO: Add support for hv/hp timers */
> +
>  	if (!irqchip_in_kernel(vcpu->kvm))
>  		return -EINVAL;
>  
> @@ -1343,6 +1490,8 @@ int kvm_arm_timer_get_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
>  	struct arch_timer_context *timer;
>  	int irq;
>  
> +	/* TODO: Add support for hv/hp timers */
> +
>  	switch (attr->attr) {
>  	case KVM_ARM_VCPU_TIMER_IRQ_VTIMER:
>  		timer = vcpu_vtimer(vcpu);
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index d7441b8ba406..bbc58930a5eb 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -1288,6 +1288,11 @@ static bool access_arch_timer(struct kvm_vcpu *vcpu,
>  		tmr = TIMER_PTIMER;
>  		treg = TIMER_REG_CVAL;
>  		break;
> +	case SYS_CNTVOFF_EL2:
> +		tmr = TIMER_VTIMER;
> +		treg = TIMER_REG_VOFF;
> +		break;
> +
>  	default:
>  		BUG();
>  	}
> @@ -2212,7 +2217,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
>  	EL2_REG(CONTEXTIDR_EL2, access_rw, reset_val, 0),
>  	EL2_REG(TPIDR_EL2, access_rw, reset_val, 0),
>  
> -	EL2_REG(CNTVOFF_EL2, access_rw, reset_val, 0),
> +	{ SYS_DESC(SYS_CNTVOFF_EL2), access_arch_timer },
>  	EL2_REG(CNTHCTL_EL2, access_rw, reset_val, 0),
>  
>  	EL12_REG(SCTLR, access_vm_reg, reset_val, 0x00C50078),
> diff --git a/arch/arm64/kvm/trace_arm.h b/arch/arm64/kvm/trace_arm.h
> index f3e46a976125..6ce5c025218d 100644
> --- a/arch/arm64/kvm/trace_arm.h
> +++ b/arch/arm64/kvm/trace_arm.h
> @@ -206,6 +206,7 @@ TRACE_EVENT(kvm_get_timer_map,
>  		__field(	unsigned long,		vcpu_id	)
>  		__field(	int,			direct_vtimer	)
>  		__field(	int,			direct_ptimer	)
> +		__field(	int,			emul_vtimer	)
>  		__field(	int,			emul_ptimer	)
>  	),
>  
> @@ -214,14 +215,17 @@ TRACE_EVENT(kvm_get_timer_map,
>  		__entry->direct_vtimer		= arch_timer_ctx_index(map->direct_vtimer);
>  		__entry->direct_ptimer =
>  			(map->direct_ptimer) ? arch_timer_ctx_index(map->direct_ptimer) : -1;
> +		__entry->emul_vtimer =
> +			(map->emul_vtimer) ? arch_timer_ctx_index(map->emul_vtimer) : -1;
>  		__entry->emul_ptimer =
>  			(map->emul_ptimer) ? arch_timer_ctx_index(map->emul_ptimer) : -1;
>  	),
>  
> -	TP_printk("VCPU: %ld, dv: %d, dp: %d, ep: %d",
> +	TP_printk("VCPU: %ld, dv: %d, dp: %d, ev: %d, ep: %d",
>  		  __entry->vcpu_id,
>  		  __entry->direct_vtimer,
>  		  __entry->direct_ptimer,
> +		  __entry->emul_vtimer,
>  		  __entry->emul_ptimer)
>  );
>  
> diff --git a/arch/arm64/kvm/vgic/vgic.c b/arch/arm64/kvm/vgic/vgic.c
> index 9b98876a8a93..e7fe0447af08 100644
> --- a/arch/arm64/kvm/vgic/vgic.c
> +++ b/arch/arm64/kvm/vgic/vgic.c
> @@ -573,6 +573,21 @@ int kvm_vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, unsigned int vintid)
>  	return 0;
>  }
>  
> +int kvm_vgic_get_map(struct kvm_vcpu *vcpu, unsigned int vintid)
> +{
> +	struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, vintid);
> +	unsigned long flags;
> +	int ret = -1;
> +
> +	raw_spin_lock_irqsave(&irq->irq_lock, flags);
> +	if (irq->hw)
> +		ret = irq->hwintid;
> +	raw_spin_unlock_irqrestore(&irq->irq_lock, flags);
> +
> +	vgic_put_irq(vcpu->kvm, irq);
> +	return ret;
> +}
> +
>  /**
>   * kvm_vgic_set_owner - Set the owner of an interrupt for a VM
>   *
> diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
> index 51c19381108c..0a76dac8cb6a 100644
> --- a/include/kvm/arm_arch_timer.h
> +++ b/include/kvm/arm_arch_timer.h
> @@ -13,6 +13,8 @@
>  enum kvm_arch_timers {
>  	TIMER_PTIMER,
>  	TIMER_VTIMER,
> +	TIMER_HVTIMER,
> +	TIMER_HPTIMER,
>  	NR_KVM_TIMERS
>  };
>  
> @@ -21,6 +23,7 @@ enum kvm_arch_timer_regs {
>  	TIMER_REG_CVAL,
>  	TIMER_REG_TVAL,
>  	TIMER_REG_CTL,
> +	TIMER_REG_VOFF,
>  };
>  
>  struct arch_timer_context {
> @@ -47,6 +50,7 @@ struct arch_timer_context {
>  struct timer_map {
>  	struct arch_timer_context *direct_vtimer;
>  	struct arch_timer_context *direct_ptimer;
> +	struct arch_timer_context *emul_vtimer;
>  	struct arch_timer_context *emul_ptimer;
>  };
>  
> @@ -85,12 +89,12 @@ void kvm_timer_vcpu_put(struct kvm_vcpu *vcpu);
>  
>  void kvm_timer_init_vhe(void);
>  
> -bool kvm_arch_timer_get_input_level(int vintid);
> -
>  #define vcpu_timer(v)	(&(v)->arch.timer_cpu)
>  #define vcpu_get_timer(v,t)	(&vcpu_timer(v)->timers[(t)])
>  #define vcpu_vtimer(v)	(&(v)->arch.timer_cpu.timers[TIMER_VTIMER])
>  #define vcpu_ptimer(v)	(&(v)->arch.timer_cpu.timers[TIMER_PTIMER])
> +#define vcpu_hvtimer(v)	(&(v)->arch.timer_cpu.timers[TIMER_HVTIMER])
> +#define vcpu_hptimer(v)	(&(v)->arch.timer_cpu.timers[TIMER_HPTIMER])
>  
>  #define arch_timer_ctx_index(ctx)	((ctx) - vcpu_timer((ctx)->vcpu)->timers)
>  
> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
> index bb30a6803d9f..17b651890d22 100644
> --- a/include/kvm/arm_vgic.h
> +++ b/include/kvm/arm_vgic.h
> @@ -375,6 +375,7 @@ int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int intid,
>  int kvm_vgic_map_phys_irq(struct kvm_vcpu *vcpu, unsigned int host_irq,
>  			  u32 vintid, struct irq_ops *ops);
>  int kvm_vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, unsigned int vintid);
> +int kvm_vgic_get_map(struct kvm_vcpu *vcpu, unsigned int vintid);
>  bool kvm_vgic_map_is_active(struct kvm_vcpu *vcpu, unsigned int vintid);
>  
>  int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu);
> -- 
> 2.30.2
> 
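
As a reading aid for the hunks quoted above: the patch grows the per-vcpu
timer array to four contexts (TIMER_HVTIMER and TIMER_HPTIMER join the
existing vtimer/ptimer) and teaches struct timer_map about an emulated
vtimer. The sketch below only illustrates the mapping the commit message
describes; it is not the code from the patch (the real get_timer_map() in
arch_timer.c also handles the no-GIC and userspace-irqchip cases), and the
vcpu_has_nv()/is_hyp_ctxt() helpers are assumed from earlier patches in
this series.

/*
 * Simplified illustration only: when the vcpu runs in virtual EL2, the
 * EL2 timers own the hardware timers and the EL1/EL0 ones are emulated;
 * otherwise the usual EL1/EL0 mapping applies.
 */
static void example_get_timer_map(struct kvm_vcpu *vcpu, struct timer_map *map)
{
	if (vcpu_has_nv(vcpu) && is_hyp_ctxt(vcpu)) {
		map->direct_vtimer = vcpu_hvtimer(vcpu);
		map->direct_ptimer = vcpu_hptimer(vcpu);
		map->emul_vtimer   = vcpu_vtimer(vcpu);
		map->emul_ptimer   = vcpu_ptimer(vcpu);
	} else {
		map->direct_vtimer = vcpu_vtimer(vcpu);
		map->direct_ptimer = vcpu_ptimer(vcpu);
		map->emul_vtimer   = vcpu_hvtimer(vcpu);
		map->emul_ptimer   = vcpu_hptimer(vcpu);
	}
}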

^ permalink raw reply	[flat|nested] 378+ messages in thread

* Re: [PATCH v6 43/64] KVM: arm64: nv: arch_timer: Support hyp timer emulation
  2022-01-28 12:18   ` Marc Zyngier
  (?)
@ 2022-03-07 15:23     ` Alexandru Elisei
  -1 siblings, 0 replies; 378+ messages in thread
From: Alexandru Elisei @ 2022-03-07 15:23 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, Andre Przywara, Christoffer Dall,
	Jintack Lim, Haibo Xu, Ganapatrao Kulkarni, Chase Conklin,
	Russell King (Oracle),
	James Morse, Suzuki K Poulose, karl.heubaum, mihai.carabas,
	miguel.luis, kernel-team

Hi,

On Fri, Jan 28, 2022 at 12:18:51PM +0000, Marc Zyngier wrote:
> From: Christoffer Dall <christoffer.dall@arm.com>
> 
> Emulating EL2 also means emulating the EL2 timers. To do so, we expand
> our timer framework to deal with at most 4 timers. At any given time,
> two timers are using the HW timers, and the two others are purely
> emulated.
> 
> The role of deciding which is which at any given time is left to a
> mapping function which is called every time we need to make such a
> decision.
> 
> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
> [maz: added CNTVOFF support, general reworking for v4.8]
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/include/asm/kvm_host.h |   4 +
>  arch/arm64/kvm/arch_timer.c       | 165 ++++++++++++++++++++++++++++--
>  arch/arm64/kvm/sys_regs.c         |   7 +-
>  arch/arm64/kvm/trace_arm.h        |   6 +-
>  arch/arm64/kvm/vgic/vgic.c        |  15 +++
>  include/kvm/arm_arch_timer.h      |   8 +-
>  include/kvm/arm_vgic.h            |   1 +
>  7 files changed, 194 insertions(+), 12 deletions(-)
> 
[..]
> @@ -1301,6 +1445,7 @@ static void set_timer_irqs(struct kvm *kvm, int vtimer_irq, int ptimer_irq)
>  	kvm_for_each_vcpu(i, vcpu, kvm) {
>  		vcpu_vtimer(vcpu)->irq.irq = vtimer_irq;
>  		vcpu_ptimer(vcpu)->irq.irq = ptimer_irq;
> +		/* TODO: Add support for hv/hp timers */
>  	}
>  }
>  
> @@ -1311,6 +1456,8 @@ int kvm_arm_timer_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
>  	struct arch_timer_context *ptimer = vcpu_ptimer(vcpu);
>  	int irq;
>  
> +	/* TODO: Add support for hv/hp timers */

Is the patch unfinished?

Thanks,
Alex

^ permalink raw reply	[flat|nested] 378+ messages in thread

* Re: [PATCH v6 43/64] KVM: arm64: nv: arch_timer: Support hyp timer emulation
  2022-03-07 15:23     ` Alexandru Elisei
  (?)
@ 2022-03-07 15:44       ` Marc Zyngier
  -1 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-03-07 15:44 UTC (permalink / raw)
  To: Alexandru Elisei
  Cc: linux-arm-kernel, kvmarm, kvm, Andre Przywara, Christoffer Dall,
	Jintack Lim, Haibo Xu, Ganapatrao Kulkarni, Chase Conklin,
	Russell King (Oracle),
	James Morse, Suzuki K Poulose, karl.heubaum, mihai.carabas,
	miguel.luis, kernel-team

On 2022-03-07 15:23, Alexandru Elisei wrote:
> Hi,
> 
> On Fri, Jan 28, 2022 at 12:18:51PM +0000, Marc Zyngier wrote:
>> From: Christoffer Dall <christoffer.dall@arm.com>
>> 
>> Emulating EL2 also means emulating the EL2 timers. To do so, we expand
>> our timer framework to deal with at most 4 timers. At any given time,
>> two timers are using the HW timers, and the two others are purely
>> emulated.
>> 
>> The role of deciding which is which at any given time is left to a
>> mapping function which is called every time we need to make such a
>> decision.
>> 
>> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
>> [maz: added CNTVOFF support, general reworking for v4.8]
>> Signed-off-by: Marc Zyngier <maz@kernel.org>
>> ---
>>  arch/arm64/include/asm/kvm_host.h |   4 +
>>  arch/arm64/kvm/arch_timer.c       | 165 
>> ++++++++++++++++++++++++++++--
>>  arch/arm64/kvm/sys_regs.c         |   7 +-
>>  arch/arm64/kvm/trace_arm.h        |   6 +-
>>  arch/arm64/kvm/vgic/vgic.c        |  15 +++
>>  include/kvm/arm_arch_timer.h      |   8 +-
>>  include/kvm/arm_vgic.h            |   1 +
>>  7 files changed, 194 insertions(+), 12 deletions(-)
>> 
> [..]
>> @@ -1301,6 +1445,7 @@ static void set_timer_irqs(struct kvm *kvm, int 
>> vtimer_irq, int ptimer_irq)
>>  	kvm_for_each_vcpu(i, vcpu, kvm) {
>>  		vcpu_vtimer(vcpu)->irq.irq = vtimer_irq;
>>  		vcpu_ptimer(vcpu)->irq.irq = ptimer_irq;
>> +		/* TODO: Add support for hv/hp timers */
>>  	}
>>  }
>> 
>> @@ -1311,6 +1456,8 @@ int kvm_arm_timer_set_attr(struct kvm_vcpu 
>> *vcpu, struct kvm_device_attr *attr)
>>  	struct arch_timer_context *ptimer = vcpu_ptimer(vcpu);
>>  	int irq;
>> 
>> +	/* TODO: Add support for hv/hp timers */
> 
> Is the patch unfinished?

Just like the rest of the kernel.

         M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 378+ messages in thread

* Re: [PATCH v6 43/64] KVM: arm64: nv: arch_timer: Support hyp timer emulation
  2022-03-07 14:52     ` Alexandru Elisei
  (?)
@ 2022-03-07 15:48       ` Marc Zyngier
  -1 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-03-07 15:48 UTC (permalink / raw)
  To: Alexandru Elisei
  Cc: linux-arm-kernel, kvmarm, kvm, Andre Przywara, Christoffer Dall,
	Jintack Lim, Haibo Xu, Ganapatrao Kulkarni, Chase Conklin,
	Russell King (Oracle),
	James Morse, Suzuki K Poulose, karl.heubaum, mihai.carabas,
	miguel.luis, kernel-team

On 2022-03-07 14:52, Alexandru Elisei wrote:
> Hi,
> 
> I was under the impression that KVM's nested virtualization doesn't 
> support
> AArch32 in the guest, why is the subject about hyp mode aarch32 timers?

Where did you see *ANY* mention of AArch32?

Or is that a very roundabout way to object to the 'hyp' name?
If that's the case, just apply a mental 's/hyp/el2/' substitution.

         M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 378+ messages in thread

* Re: [PATCH v6 44/64] KVM: arm64: nv: Add handling of EL2-specific timer registers
  2022-01-28 12:18   ` Marc Zyngier
  (?)
@ 2022-03-07 16:01     ` Alexandru Elisei
  -1 siblings, 0 replies; 378+ messages in thread
From: Alexandru Elisei @ 2022-03-07 16:01 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, Andre Przywara, Christoffer Dall,
	Jintack Lim, Haibo Xu, Ganapatrao Kulkarni, Chase Conklin,
	Russell King (Oracle),
	James Morse, Suzuki K Poulose, karl.heubaum, mihai.carabas,
	miguel.luis, kernel-team

Hi,

On Fri, Jan 28, 2022 at 12:18:52PM +0000, Marc Zyngier wrote:
> Add the required handling for EL2 and EL02 registers, as
> well as EL1 registers used in the E2H context.
> 
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/include/asm/sysreg.h |  6 +++
>  arch/arm64/kvm/sys_regs.c       | 87 +++++++++++++++++++++++++++++++++
>  2 files changed, 93 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
> index 41c3603a3f29..ff6d3af8ed34 100644
> --- a/arch/arm64/include/asm/sysreg.h
> +++ b/arch/arm64/include/asm/sysreg.h
> @@ -626,6 +626,12 @@
>  
>  #define SYS_CNTVOFF_EL2			sys_reg(3, 4, 14, 0, 3)
>  #define SYS_CNTHCTL_EL2			sys_reg(3, 4, 14, 1, 0)
> +#define SYS_CNTHP_TVAL_EL2		sys_reg(3, 4, 14, 2, 0)
> +#define SYS_CNTHP_CTL_EL2		sys_reg(3, 4, 14, 2, 1)
> +#define SYS_CNTHP_CVAL_EL2		sys_reg(3, 4, 14, 2, 2)
> +#define SYS_CNTHV_TVAL_EL2		sys_reg(3, 4, 14, 3, 0)
> +#define SYS_CNTHV_CTL_EL2		sys_reg(3, 4, 14, 3, 1)
> +#define SYS_CNTHV_CVAL_EL2		sys_reg(3, 4, 14, 3, 2)
>  
>  /* VHE encodings for architectural EL0/1 system registers */
>  #define SYS_SCTLR_EL12			sys_reg(3, 5, 1, 0, 0)
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index bbc58930a5eb..f2452e76156c 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -1274,20 +1274,92 @@ static bool access_arch_timer(struct kvm_vcpu *vcpu,
>  
>  	switch (reg) {
>  	case SYS_CNTP_TVAL_EL0:
> +		if (vcpu_is_el2(vcpu) && vcpu_el2_e2h_is_set(vcpu))

According to ARM DDI 0487G.a, page D13-4157, if PSTATE is EL0 then accesses are
also redirected to CNTHP_TVAL_EL2 (under certain conditions).

> +			tmr = TIMER_HPTIMER;
> +		else
> +			tmr = TIMER_PTIMER;

What happens when the guest hypervisor has enabled traps for this register?
Shouldn't those also be forwarded?
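
For reference, the forwarding being asked about could look roughly like the
sketch below. This is not code from the series: the helper name is made up,
it assumes the CNTHCTL_EL2 trap bit layout used when E2H is clear, and it
relies on __vcpu_sys_reg()/kvm_inject_nested_sync() as used elsewhere in
these patches.

/*
 * Hypothetical sketch only: if the L1 hypervisor traps EL1 physical timer
 * accesses (CNTHCTL_EL2.EL1PCEN == 0 with E2H == 0), reflect the trap into
 * virtual EL2 instead of emulating the access at L0.
 */
static bool forward_timer_trap_to_vel2(struct kvm_vcpu *vcpu)
{
	u64 cnthctl = __vcpu_sys_reg(vcpu, CNTHCTL_EL2);

	if (vcpu_has_nv(vcpu) && !vcpu_is_el2(vcpu) &&
	    !(cnthctl & CNTHCTL_EL1PCEN)) {
		kvm_inject_nested_sync(vcpu, kvm_vcpu_get_esr(vcpu));
		return true;
	}

	return false;
}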

> +		treg = TIMER_REG_TVAL;
> +		break;
> +
>  	case SYS_AARCH32_CNTP_TVAL:
> +	case SYS_CNTP_TVAL_EL02:

Shouldn't KVM inject an undefined exception if PSTATE is EL2 and virtual
HCR_EL2.E2H = 0? Shouldn't KVM forward the trap if PSTATE is EL1 and virtual
HCR_EL2.NV = 1? The same applies to all EL02 encodings.
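
A minimal sketch of the UNDEF side of that suggestion (hypothetical, not
from the patch; it reuses the vcpu_el2_e2h_is_set() helper this series
already provides) could sit at the top of access_arch_timer():

/*
 * Hypothetical sketch only: the *_EL02 encodings are only meaningful from
 * virtual EL2 when the guest's HCR_EL2.E2H is set; otherwise inject an
 * UNDEF back into the guest and treat the access as handled.
 */
static bool undef_el02_without_e2h(struct kvm_vcpu *vcpu)
{
	if (vcpu_is_el2(vcpu) && !vcpu_el2_e2h_is_set(vcpu)) {
		kvm_inject_undefined(vcpu);
		return true;	/* access rejected */
	}

	return false;
}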

>  		tmr = TIMER_PTIMER;
>  		treg = TIMER_REG_TVAL;
>  		break;
> +
> +	case SYS_CNTV_TVAL_EL02:
> +		tmr = TIMER_VTIMER;
> +		treg = TIMER_REG_TVAL;
> +		break;
> +
> +	case SYS_CNTHP_TVAL_EL2:
> +		tmr = TIMER_HPTIMER;
> +		treg = TIMER_REG_TVAL;
> +		break;
> +
> +	case SYS_CNTHV_TVAL_EL2:
> +		tmr = TIMER_HVTIMER;
> +		treg = TIMER_REG_TVAL;

Shouldn't KVM inject an undefined exception if guest PSTATE is EL1 and virtual
HCR_EL2.NV = 0? Shouldn't KVM forward the trap if PSTATE is EL1 and
HCR_EL2.NV = 1? Same for the other _EL2 registers.

Thanks,
Alex

> +		break;
> +
>  	case SYS_CNTP_CTL_EL0:
> +		if (vcpu_is_el2(vcpu) && vcpu_el2_e2h_is_set(vcpu))
> +			tmr = TIMER_HPTIMER;
> +		else
> +			tmr = TIMER_PTIMER;
> +		treg = TIMER_REG_CTL;
> +		break;
> +
>  	case SYS_AARCH32_CNTP_CTL:
> +	case SYS_CNTP_CTL_EL02:
>  		tmr = TIMER_PTIMER;
>  		treg = TIMER_REG_CTL;
>  		break;
> +
> +	case SYS_CNTV_CTL_EL02:
> +		tmr = TIMER_VTIMER;
> +		treg = TIMER_REG_CTL;
> +		break;
> +
> +	case SYS_CNTHP_CTL_EL2:
> +		tmr = TIMER_HPTIMER;
> +		treg = TIMER_REG_CTL;
> +		break;
> +
> +	case SYS_CNTHV_CTL_EL2:
> +		tmr = TIMER_HVTIMER;
> +		treg = TIMER_REG_CTL;
> +		break;
> +
>  	case SYS_CNTP_CVAL_EL0:
> +		if (vcpu_is_el2(vcpu) && vcpu_el2_e2h_is_set(vcpu))
> +			tmr = TIMER_HPTIMER;
> +		else
> +			tmr = TIMER_PTIMER;
> +		treg = TIMER_REG_CVAL;
> +		break;
> +
>  	case SYS_AARCH32_CNTP_CVAL:
> +	case SYS_CNTP_CVAL_EL02:
>  		tmr = TIMER_PTIMER;
>  		treg = TIMER_REG_CVAL;
>  		break;
> +
> +	case SYS_CNTV_CVAL_EL02:
> +		tmr = TIMER_VTIMER;
> +		treg = TIMER_REG_CVAL;
> +		break;
> +
> +	case SYS_CNTHP_CVAL_EL2:
> +		tmr = TIMER_HPTIMER;
> +		treg = TIMER_REG_CVAL;
> +		break;
> +
> +	case SYS_CNTHV_CVAL_EL2:
> +		tmr = TIMER_HVTIMER;
> +		treg = TIMER_REG_CVAL;
> +		break;
> +
>  	case SYS_CNTVOFF_EL2:
>  		tmr = TIMER_VTIMER;
>  		treg = TIMER_REG_VOFF;
> @@ -2220,6 +2292,13 @@ static const struct sys_reg_desc sys_reg_descs[] = {
>  	{ SYS_DESC(SYS_CNTVOFF_EL2), access_arch_timer },
>  	EL2_REG(CNTHCTL_EL2, access_rw, reset_val, 0),
>  
> +	{ SYS_DESC(SYS_CNTHP_TVAL_EL2), access_arch_timer },
> +	{ SYS_DESC(SYS_CNTHP_CTL_EL2), access_arch_timer },
> +	{ SYS_DESC(SYS_CNTHP_CVAL_EL2), access_arch_timer },
> +	{ SYS_DESC(SYS_CNTHV_TVAL_EL2), access_arch_timer },
> +	{ SYS_DESC(SYS_CNTHV_CTL_EL2), access_arch_timer },
> +	{ SYS_DESC(SYS_CNTHV_CVAL_EL2), access_arch_timer },
> +
>  	EL12_REG(SCTLR, access_vm_reg, reset_val, 0x00C50078),
>  	EL12_REG(CPACR, access_rw, reset_val, 0),
>  	EL12_REG(TTBR0, access_vm_reg, reset_unknown, 0),
> @@ -2237,6 +2316,14 @@ static const struct sys_reg_desc sys_reg_descs[] = {
>  	EL12_REG(CONTEXTIDR, access_vm_reg, reset_val, 0),
>  	EL12_REG(CNTKCTL, access_rw, reset_val, 0),
>  
> +	{ SYS_DESC(SYS_CNTP_TVAL_EL02), access_arch_timer },
> +	{ SYS_DESC(SYS_CNTP_CTL_EL02), access_arch_timer },
> +	{ SYS_DESC(SYS_CNTP_CVAL_EL02), access_arch_timer },
> +
> +	{ SYS_DESC(SYS_CNTV_TVAL_EL02), access_arch_timer },
> +	{ SYS_DESC(SYS_CNTV_CTL_EL02), access_arch_timer },
> +	{ SYS_DESC(SYS_CNTV_CVAL_EL02), access_arch_timer },
> +
>  	EL2_REG(SP_EL2, NULL, reset_unknown, 0),
>  };
>  
> -- 
> 2.30.2
> 

^ permalink raw reply	[flat|nested] 378+ messages in thread

* Re: [PATCH v6 43/64] KVM: arm64: nv: arch_timer: Support hyp timer emulation
  2022-03-07 15:44       ` Marc Zyngier
  (?)
@ 2022-03-07 16:24         ` Alexandru Elisei
  -1 siblings, 0 replies; 378+ messages in thread
From: Alexandru Elisei @ 2022-03-07 16:24 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, Andre Przywara, Christoffer Dall,
	Jintack Lim, Haibo Xu, Ganapatrao Kulkarni, Chase Conklin,
	Russell King (Oracle),
	James Morse, Suzuki K Poulose, karl.heubaum, mihai.carabas,
	miguel.luis, kernel-team

Hi,

On Mon, Mar 07, 2022 at 03:44:43PM +0000, Marc Zyngier wrote:
> On 2022-03-07 15:23, Alexandru Elisei wrote:
> > Hi,
> > 
> > On Fri, Jan 28, 2022 at 12:18:51PM +0000, Marc Zyngier wrote:
> > > From: Christoffer Dall <christoffer.dall@arm.com>
> > > 
> > > Emulating EL2 also means emulating the EL2 timers. To do so, we expand
> > > our timer framework to deal with at most 4 timers. At any given time,
> > > two timers are using the HW timers, and the two others are purely
> > > emulated.
> > > 
> > > The role of deciding which is which at any given time is left to a
> > > mapping function which is called every time we need to make such a
> > > decision.
> > > 
> > > Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
> > > [maz: added CNTVOFF support, general reworking for v4.8]
> > > Signed-off-by: Marc Zyngier <maz@kernel.org>
> > > ---
> > >  arch/arm64/include/asm/kvm_host.h |   4 +
> > >  arch/arm64/kvm/arch_timer.c       | 165
> > > ++++++++++++++++++++++++++++--
> > >  arch/arm64/kvm/sys_regs.c         |   7 +-
> > >  arch/arm64/kvm/trace_arm.h        |   6 +-
> > >  arch/arm64/kvm/vgic/vgic.c        |  15 +++
> > >  include/kvm/arm_arch_timer.h      |   8 +-
> > >  include/kvm/arm_vgic.h            |   1 +
> > >  7 files changed, 194 insertions(+), 12 deletions(-)
> > > 
> > [..]
> > > @@ -1301,6 +1445,7 @@ static void set_timer_irqs(struct kvm *kvm,
> > > int vtimer_irq, int ptimer_irq)
> > >  	kvm_for_each_vcpu(i, vcpu, kvm) {
> > >  		vcpu_vtimer(vcpu)->irq.irq = vtimer_irq;
> > >  		vcpu_ptimer(vcpu)->irq.irq = ptimer_irq;
> > > +		/* TODO: Add support for hv/hp timers */
> > >  	}
> > >  }
> > > 
> > > @@ -1311,6 +1456,8 @@ int kvm_arm_timer_set_attr(struct kvm_vcpu
> > > *vcpu, struct kvm_device_attr *attr)
> > >  	struct arch_timer_context *ptimer = vcpu_ptimer(vcpu);
> > >  	int irq;
> > > 
> > > +	/* TODO: Add support for hv/hp timers */
> > 
> > Is the patch unfinished?
> 
> Just like the rest of the kernel.

That doesn't really answer my question. What I was asking is whether this is
the patch that is intended to be merged, or if you still want to add handling
of those timers before merging it.

Thanks,
Alex

> 
>         M.
> -- 
> Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 378+ messages in thread

* Re: [PATCH v6 43/64] KVM: arm64: nv: arch_timer: Support hyp timer emulation
  2022-03-07 15:48       ` Marc Zyngier
  (?)
@ 2022-03-07 16:28         ` Alexandru Elisei
  -1 siblings, 0 replies; 378+ messages in thread
From: Alexandru Elisei @ 2022-03-07 16:28 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, Andre Przywara, Christoffer Dall,
	Jintack Lim, Haibo Xu, Ganapatrao Kulkarni, Chase Conklin,
	Russell King (Oracle),
	James Morse, Suzuki K Poulose, karl.heubaum, mihai.carabas,
	miguel.luis, kernel-team

Hi,

On Mon, Mar 07, 2022 at 03:48:19PM +0000, Marc Zyngier wrote:
> On 2022-03-07 14:52, Alexandru Elisei wrote:
> > Hi,
> > 
> > I was under the impression that KVM's nested virtualization doesn't
> > support
> > AArch32 in the guest, why is the subject about hyp mode aarch32 timers?
> 
> Where did you see *ANY* mention of AArch32?

I saw an implicit mention of AArch32 when the commit message used the term
"hyp", which is the name of an AArch32 execution mode.

> 
> Or is that a very roundabout way to object to the 'hyp' name?

Bingo.

> If that's the case, just apply a mental 's/hyp/el2/' substitution.

I'm a bit confused about that. Is that something that anyone reading the patch
should apply mentally when reading the patch, or is it something that you plan
to change in the commit subject?

Thanks,
Alex

^ permalink raw reply	[flat|nested] 378+ messages in thread

* Re: [PATCH v6 43/64] KVM: arm64: nv: arch_timer: Support hyp timer emulation
  2022-03-07 16:24         ` Alexandru Elisei
  (?)
@ 2022-03-07 16:40           ` Marc Zyngier
  -1 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-03-07 16:40 UTC (permalink / raw)
  To: Alexandru Elisei
  Cc: linux-arm-kernel, kvmarm, kvm, Andre Przywara, Christoffer Dall,
	Jintack Lim, Haibo Xu, Ganapatrao Kulkarni, Chase Conklin,
	Russell King (Oracle),
	James Morse, Suzuki K Poulose, karl.heubaum, mihai.carabas,
	miguel.luis, kernel-team

On Mon, 07 Mar 2022 16:24:58 +0000,
Alexandru Elisei <alexandru.elisei@arm.com> wrote:
> 
> Hi,
> 
> On Mon, Mar 07, 2022 at 03:44:43PM +0000, Marc Zyngier wrote:
> > On 2022-03-07 15:23, Alexandru Elisei wrote:
> > > Hi,
> > > 
> > > On Fri, Jan 28, 2022 at 12:18:51PM +0000, Marc Zyngier wrote:
> > > > From: Christoffer Dall <christoffer.dall@arm.com>
> > > > 
> > > > Emulating EL2 also means emulating the EL2 timers. To do so, we expand
> > > > our timer framework to deal with at most 4 timers. At any given time,
> > > > two timers are using the HW timers, and the two others are purely
> > > > emulated.
> > > > 
> > > > The role of deciding which is which at any given time is left to a
> > > > mapping function which is called every time we need to make such a
> > > > decision.
> > > > 
> > > > Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
> > > > [maz: added CNTVOFF support, general reworking for v4.8]
> > > > Signed-off-by: Marc Zyngier <maz@kernel.org>
> > > > ---
> > > >  arch/arm64/include/asm/kvm_host.h |   4 +
> > > >  arch/arm64/kvm/arch_timer.c       | 165
> > > > ++++++++++++++++++++++++++++--
> > > >  arch/arm64/kvm/sys_regs.c         |   7 +-
> > > >  arch/arm64/kvm/trace_arm.h        |   6 +-
> > > >  arch/arm64/kvm/vgic/vgic.c        |  15 +++
> > > >  include/kvm/arm_arch_timer.h      |   8 +-
> > > >  include/kvm/arm_vgic.h            |   1 +
> > > >  7 files changed, 194 insertions(+), 12 deletions(-)
> > > > 
> > > [..]
> > > > @@ -1301,6 +1445,7 @@ static void set_timer_irqs(struct kvm *kvm,
> > > > int vtimer_irq, int ptimer_irq)
> > > >  	kvm_for_each_vcpu(i, vcpu, kvm) {
> > > >  		vcpu_vtimer(vcpu)->irq.irq = vtimer_irq;
> > > >  		vcpu_ptimer(vcpu)->irq.irq = ptimer_irq;
> > > > +		/* TODO: Add support for hv/hp timers */
> > > >  	}
> > > >  }
> > > > 
> > > > @@ -1311,6 +1456,8 @@ int kvm_arm_timer_set_attr(struct kvm_vcpu
> > > > *vcpu, struct kvm_device_attr *attr)
> > > >  	struct arch_timer_context *ptimer = vcpu_ptimer(vcpu);
> > > >  	int irq;
> > > > 
> > > > +	/* TODO: Add support for hv/hp timers */
> > > 
> > > Is the patch unfinished?
> > 
> > Just like the rest of the kernel.
> 
> That doesn't really answer my question. What I was asking is whether this
> is the patch that is intended to be merged, or if you still want to add
> handling of those timers before merging it.

It depends what people's expectations are.

If you want full support for everything the architecture has, plus all
the features that KVM has, such as save/restore (which is what this
comment is about), before things can go in, then the answer probably is
that I should go and erase that code as quickly as possible.

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 378+ messages in thread

* Re: [PATCH v6 43/64] KVM: arm64: nv: arch_timer: Support hyp timer emulation
@ 2022-03-07 16:40           ` Marc Zyngier
  0 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-03-07 16:40 UTC (permalink / raw)
  To: Alexandru Elisei
  Cc: linux-arm-kernel, kvmarm, kvm, Andre Przywara, Christoffer Dall,
	Jintack Lim, Haibo Xu, Ganapatrao Kulkarni, Chase Conklin,
	Russell King (Oracle),
	James Morse, Suzuki K Poulose, karl.heubaum, mihai.carabas,
	miguel.luis, kernel-team

On Mon, 07 Mar 2022 16:24:58 +0000,
Alexandru Elisei <alexandru.elisei@arm.com> wrote:
> 
> Hi,
> 
> On Mon, Mar 07, 2022 at 03:44:43PM +0000, Marc Zyngier wrote:
> > On 2022-03-07 15:23, Alexandru Elisei wrote:
> > > Hi,
> > > 
> > > On Fri, Jan 28, 2022 at 12:18:51PM +0000, Marc Zyngier wrote:
> > > > From: Christoffer Dall <christoffer.dall@arm.com>
> > > > 
> > > > Emulating EL2 also means emulating the EL2 timers. To do so, we expand
> > > > our timer framework to deal with at most 4 timers. At any given time,
> > > > two timers are using the HW timers, and the two others are purely
> > > > emulated.
> > > > 
> > > > The role of deciding which is which at any given time is left to a
> > > > mapping function which is called every time we need to make such a
> > > > decision.
> > > > 
> > > > Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
> > > > [maz: added CNTVOFF support, general reworking for v4.8]
> > > > Signed-off-by: Marc Zyngier <maz@kernel.org>
> > > > ---
> > > >  arch/arm64/include/asm/kvm_host.h |   4 +
> > > >  arch/arm64/kvm/arch_timer.c       | 165
> > > > ++++++++++++++++++++++++++++--
> > > >  arch/arm64/kvm/sys_regs.c         |   7 +-
> > > >  arch/arm64/kvm/trace_arm.h        |   6 +-
> > > >  arch/arm64/kvm/vgic/vgic.c        |  15 +++
> > > >  include/kvm/arm_arch_timer.h      |   8 +-
> > > >  include/kvm/arm_vgic.h            |   1 +
> > > >  7 files changed, 194 insertions(+), 12 deletions(-)
> > > > 
> > > [..]
> > > > @@ -1301,6 +1445,7 @@ static void set_timer_irqs(struct kvm *kvm,
> > > > int vtimer_irq, int ptimer_irq)
> > > >  	kvm_for_each_vcpu(i, vcpu, kvm) {
> > > >  		vcpu_vtimer(vcpu)->irq.irq = vtimer_irq;
> > > >  		vcpu_ptimer(vcpu)->irq.irq = ptimer_irq;
> > > > +		/* TODO: Add support for hv/hp timers */
> > > >  	}
> > > >  }
> > > > 
> > > > @@ -1311,6 +1456,8 @@ int kvm_arm_timer_set_attr(struct kvm_vcpu
> > > > *vcpu, struct kvm_device_attr *attr)
> > > >  	struct arch_timer_context *ptimer = vcpu_ptimer(vcpu);
> > > >  	int irq;
> > > > 
> > > > +	/* TODO: Add support for hv/hp timers */
> > > 
> > > Is the patch unfinished?
> > 
> > Just like the rest of the kernel.
> 
> That doesn't really answer my question. What I was asking if this is
> the patch that is intended to be merged or if you still want to add
> handling of those timers before merging it.

It depends what people's expectations are.

If you want full support for everything the architecture has, plus all
the features that KVM has such as save restore (which is what we this
comment is about) before things can go in, then the answer probably is
that I should go and erase that code as quickly as possible.

	M.

-- 
Without deviation from the norm, progress is not possible.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 378+ messages in thread

* Re: [PATCH v6 43/64] KVM: arm64: nv: arch_timer: Support hyp timer emulation
  2022-03-07 16:28         ` Alexandru Elisei
  (?)
@ 2022-03-07 16:52           ` Marc Zyngier
  -1 siblings, 0 replies; 378+ messages in thread
From: Marc Zyngier @ 2022-03-07 16:52 UTC (permalink / raw)
  To: Alexandru Elisei
  Cc: linux-arm-kernel, kvmarm, kvm, Andre Przywara, Christoffer Dall,
	Jintack Lim, Haibo Xu, Ganapatrao Kulkarni, Chase Conklin,
	Russell King (Oracle),
	James Morse, Suzuki K Poulose, karl.heubaum, mihai.carabas,
	miguel.luis, kernel-team

On Mon, 07 Mar 2022 16:28:44 +0000,
Alexandru Elisei <alexandru.elisei@arm.com> wrote:
> 
> Hi,
> 
> On Mon, Mar 07, 2022 at 03:48:19PM +0000, Marc Zyngier wrote:
> > On 2022-03-07 14:52, Alexandru Elisei wrote:
> > > Hi,
> > > 
> > > I was under the impression that KVM's nested virtualization doesn't
> > > support
> > > AArch32 in the guest, why is the subject about hyp mode aarch32 timers?
> > 
> > Where did you see *ANY* mention of AArch32?
> 
> I saw an implicit mention of aarch32 when the commit message used
> the term "hyp", which is the name of an aarch32 execution mode.
> 
> > 
> > Or is that a very roundabout way to object to the 'hyp' name?
> 
> Bingo.
> 
> > If that's the case, just apply a mental 's/hyp/el2/' substitution.
> 
> I'm a bit confused about that. Is that something that anyone reading
> the patch should apply mentally when reading the patch, or is it
> something that you plan to change in the commit subject?

Big picture:

maz@hot-poop:~/arm-platforms$ git grep -i hyp arch/arm64/|wc -l
1701
maz@hot-poop:~/arm-platforms$ git grep -i el2 arch/arm64/|wc -l
1008

Are we going to also repaint all these 'hyp' references?

I really appreciate all the hard work you are putting in carefully
reviewing the code. I *really* do. But bickering on this really
doesn't help, and I know you understand exactly what this subject line
means (you've been reviewing KVM code for long enough, and won't fool
anyone).

The point you are trying to make really is moot. Everybody understands
that HYP means EL2. I'd even argue that it is clearer than EL2,
because it indicates that we're running at EL2 with the role of a
hypervisor, which isn't that clear when running with VHE.

And if they don't understand that, it unfortunately means they can't
really review this code.

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 378+ messages in thread

* Re: [PATCH v6 43/64] KVM: arm64: nv: arch_timer: Support hyp timer emulation
  2022-03-07 16:52           ` Marc Zyngier
  (?)
@ 2022-03-07 17:13             ` Alexandru Elisei
  -1 siblings, 0 replies; 378+ messages in thread
From: Alexandru Elisei @ 2022-03-07 17:13 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvmarm, kvm, Andre Przywara, Christoffer Dall,
	Jintack Lim, Haibo Xu, Ganapatrao Kulkarni, Chase Conklin,
	Russell King (Oracle),
	James Morse, Suzuki K Poulose, karl.heubaum, mihai.carabas,
	miguel.luis, kernel-team

Hi,

On Mon, Mar 07, 2022 at 04:52:15PM +0000, Marc Zyngier wrote:
> On Mon, 07 Mar 2022 16:28:44 +0000,
> Alexandru Elisei <alexandru.elisei@arm.com> wrote:
> > 
> > Hi,
> > 
> > On Mon, Mar 07, 2022 at 03:48:19PM +0000, Marc Zyngier wrote:
> > > On 2022-03-07 14:52, Alexandru Elisei wrote:
> > > > Hi,
> > > > 
> > > > I was under the impression that KVM's nested virtualization doesn't
> > > > support
> > > > AArch32 in the guest, why is the subject about hyp mode aarch32 timers?
> > > 
> > > Where did you see *ANY* mention of AArch32?
> > 
> > I saw an implicit mention of aarch32 when the commit message used
> > the term "hyp", which is the name of an aarch32 execution mode.
> > 
> > > 
> > > Or is that a very roundabout way to object to the 'hyp' name?
> > 
> > Bingo.
> > 
> > > If that's the case, just apply a mental 's/hyp/el2/' substitution.
> > 
> > I'm a bit confused about that. Is that something that anyone reading
> > the patch should apply mentally when reading the patch, or is it
> > something that you plan to change in the commit subject?
> 
> Big picture:
> 
> maz@hot-poop:~/arm-platforms$ git grep -i hyp arch/arm64/|wc -l
> 1701
> maz@hot-poop:~/arm-platforms$ git grep -i el2 arch/arm64/|wc -l
> 1008
> 
> Are we going to also repaint all these 'hyp' references?

I didn't say, nor suggest that. I suggested that *this* commit could be changed
to use the architectural terminology, which I assumed is the most natural
terminology to use in a patch series whose goal is to emulate the
architectural EL2. Obviously, opinions vary.

> 
> I really appreciate all the hard work you are putting in carefully
> reviewing the code. I *really* do. But bickering on this really
> doesn't help, and I know you understand exactly what this subject line
> means (you've been reviewing KVM code for long enough, and won't fool
> anyone).
> 
> The point you are trying to make really is moot. Everybody understands
> that HYP means EL2. I'd even argue that it is clearer than EL2,
> because it indicates that we're running at EL2 with the role of a
> hypervisor, which isn't that clear when running with VHE.

So does HYP mean EL2, or EL2 with the role of a hypervisor? Because when booting
KVM on a machine without FEAT_VHE, dmesg uses "hyp mode" to refer to EL2 without
FEAT_VHE. And is_hyp_ctxt(), which this series adds, is also true for EL0 with
HCR_EL2.TGE set.
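
Concretely, from memory (so not the exact code in the series), the
helper looks more or less like this:

	/* rough sketch, not the series' exact implementation */
	static inline bool is_hyp_ctxt(const struct kvm_vcpu *vcpu)
	{
		/* vEL2 itself, or EL0 with {E2H, TGE} set */
		return vcpu_is_el2(vcpu) ||
		       (vcpu_el2_e2h_is_set(vcpu) && vcpu_el2_tge_is_set(vcpu));
	}

which is why I find "hyp" a bit of a stretch as a name for the second
case.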

I've given these examples (and others) before; in the end it's up to you how
precise you want the terminology to be and how easy to understand you want to
make the code.

Thanks,
Alex

^ permalink raw reply	[flat|nested] 378+ messages in thread

* Re: [PATCH v6 60/64] KVM: arm64: nv: Sync nested timer state with ARMv8.4
  2022-01-28 12:19   ` Marc Zyngier
  (?)
@ 2022-04-01 17:51     ` Chase Conklin
  -1 siblings, 0 replies; 378+ messages in thread
From: Chase Conklin @ 2022-04-01 17:51 UTC (permalink / raw)
  To: maz
  Cc: alexandru.elisei, andre.przywara, chase.conklin,
	christoffer.dall, gankulkarni, haibo.xu, james.morse, jintack,
	karl.heubaum, kernel-team, kvm, kvmarm, linux-arm-kernel, linux,
	miguel.luis, mihai.carabas, suzuki.poulose

Hi Marc,

On Fri, 28 Jan 2022 12:19:08 +0000, Marc Zyngier wrote:
> From: Christoffer Dall <christoffer.dall at arm.com>
>
> Emulating the ARMv8.4-NV timers is a bit odd, as the timers can
> be reconfigured behind our back without the hypervisor even
> noticing. In the VHE case, that's an actual regression in the
> architecture...

In addition to that, I believe that the vEL2's view of CNTy_CTL_ELx.ISTATUS can
get out of sync with the corresponding timer conditions. Currently, the values
are kept in NVMem and updated only during a put of a vCPU.

I'd like to say that this could be fixed by updating the NVMem copies on each
entry into vEL2, but that doesn't prevent them from getting out of sync while
the vEL2 is still running. Provided that the host takes a timer interrupt
whenever a vEL2 timer condition is satisfied, the host should have a chance to
update the NVMem copy before the vEL2 can see an out-of-sync value. Even so, I
think there is still a small window where vEL2 can read the NVMem copy after
the timer condition is met but before the host timer interrupt fires. In
practice, that might not be a huge issue.

The only other option I can see is to trap the accesses (which for the virtual
timer requires FEAT_ECV). At least that would prevent the timers from being
configured behind the host's back...
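
For the first option, the entry-time refresh I have in mind is roughly
the sketch below (untested; kvm_timer_refresh_nested_istatus() is just
a name I made up, and it leans on the timer_get_*/timer_set_ctl helpers
in arch_timer.c plus the timer_map from the patch below):

	static void kvm_timer_refresh_nested_istatus(struct kvm_vcpu *vcpu)
	{
		struct arch_timer_context *emul[2];
		struct timer_map map;
		u64 now = kvm_phys_timer_read();
		int i;

		if (!is_hyp_ctxt(vcpu) || !vcpu_el2_e2h_is_set(vcpu))
			return;

		get_timer_map(vcpu, &map);
		emul[0] = map.emul_vtimer;
		emul[1] = map.emul_ptimer;

		for (i = 0; i < 2; i++) {
			u64 ctl = timer_get_ctl(emul[i]);

			/* ISTATUS is the timer condition: CVAL <= (counter - offset) */
			if ((ctl & ARCH_TIMER_CTRL_ENABLE) &&
			    timer_get_cval(emul[i]) <= now - timer_get_offset(emul[i]))
				ctl |= ARCH_TIMER_CTRL_IT_STAT;
			else
				ctl &= ~ARCH_TIMER_CTRL_IT_STAT;

			timer_set_ctl(emul[i], ctl);
		}
	}

called on each entry while is_hyp_ctxt() is true. That narrows the
window but, as said above, doesn't close it.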

Thanks,
Chase

>
> Signed-off-by: Christoffer Dall <christoffer.dall at arm.com>
> Signed-off-by: Marc Zyngier <maz at kernel.org>
> ---
>  arch/arm64/kvm/arch_timer.c  | 37 ++++++++++++++++++++++++++++++++++++
>  arch/arm64/kvm/arm.c         |  3 +++
>  include/kvm/arm_arch_timer.h |  1 +
>  3 files changed, 41 insertions(+)
>
> diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
> index 5e4f93605d36..2371796b1ab5 100644
> --- a/arch/arm64/kvm/arch_timer.c
> +++ b/arch/arm64/kvm/arch_timer.c
> @@ -785,6 +785,43 @@ void kvm_timer_vcpu_put(struct kvm_vcpu *vcpu)
>  	set_cntvoff(0);
>  }
>  
> +void kvm_timer_sync_nested(struct kvm_vcpu *vcpu)
> +{
> +	if (!is_hyp_ctxt(vcpu))
> +		return;
> +
> +	/*
> +	 * Guest hypervisors using ARMv8.4 enhanced nested virt support have
> +	 * their EL1 timer register accesses redirected to the VNCR page.
> +	 */
> +	if (!vcpu_el2_e2h_is_set(vcpu)) {
> +		/*
> +		 * For a non-VHE guest hypervisor, we update the hardware
> +		 * timer registers with the latest value written by the guest
> +		 * to the VNCR page and let the hardware take care of the
> +		 * rest.
> +		 */
> +		write_sysreg_el0(__vcpu_sys_reg(vcpu, CNTV_CTL_EL0),  SYS_CNTV_CTL);
> +		write_sysreg_el0(__vcpu_sys_reg(vcpu, CNTV_CVAL_EL0), SYS_CNTV_CVAL);
> +		write_sysreg_el0(__vcpu_sys_reg(vcpu, CNTP_CTL_EL0),  SYS_CNTP_CTL);
> +		write_sysreg_el0(__vcpu_sys_reg(vcpu, CNTP_CVAL_EL0), SYS_CNTP_CVAL);
> +	} else {
> +		/*
> +		 * For a VHE guest hypervisor, the emulated state (which
> +		 * is stored in the VNCR page) could have been updated behind
> +		 * our back, and we must reset the emulation of the timers.
> +		 */
> +
> +		struct timer_map map;
> +		get_timer_map(vcpu, &map);
> +
> +		soft_timer_cancel(&map.emul_vtimer->hrtimer);
> +		soft_timer_cancel(&map.emul_ptimer->hrtimer);
> +		timer_emulate(map.emul_vtimer);
> +		timer_emulate(map.emul_ptimer);
> +	}
> +}
> +
>  /*
>   * With a userspace irqchip we have to check if the guest de-asserted the
>   * timer and if so, unmask the timer irq signal on the host interrupt
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index ac7d89c1e987..4c47a66eac8c 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -936,6 +936,9 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
>  		if (static_branch_unlikely(&userspace_irqchip_in_use))
>  			kvm_timer_sync_user(vcpu);
>  
> +		if (vcpu_has_nv2(vcpu))
> +			kvm_timer_sync_nested(vcpu);
> +
>  		kvm_arch_vcpu_ctxsync_fp(vcpu);
>  
>  		/*
> diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
> index 0a76dac8cb6a..89b08e5b456e 100644
> --- a/include/kvm/arm_arch_timer.h
> +++ b/include/kvm/arm_arch_timer.h
> @@ -68,6 +68,7 @@ int kvm_timer_hyp_init(bool);
>  int kvm_timer_enable(struct kvm_vcpu *vcpu);
>  int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu);
>  void kvm_timer_vcpu_init(struct kvm_vcpu *vcpu);
> +void kvm_timer_sync_nested(struct kvm_vcpu *vcpu);
>  void kvm_timer_sync_user(struct kvm_vcpu *vcpu);
>  bool kvm_timer_should_notify_user(struct kvm_vcpu *vcpu);
>  void kvm_timer_update_run(struct kvm_vcpu *vcpu);
> -- 
> 2.30.2


^ permalink raw reply	[flat|nested] 378+ messages in thread

end of thread, other threads:[~2022-04-01 18:20 UTC | newest]

Thread overview: 378+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-28 12:18 [PATCH v6 00/64] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support Marc Zyngier
2022-01-28 12:18 ` Marc Zyngier
2022-01-28 12:18 ` Marc Zyngier
2022-01-28 12:18 ` [PATCH v6 01/64] arm64: Add ARM64_HAS_NESTED_VIRT cpufeature Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-02-01 14:22   ` Russell King (Oracle)
2022-02-01 14:22     ` Russell King (Oracle)
2022-02-01 14:22     ` Russell King (Oracle)
2022-01-28 12:18 ` [PATCH v6 02/64] KVM: arm64: nv: Introduce nested virtualization VCPU feature Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-01-28 12:18 ` [PATCH v6 03/64] KVM: arm64: nv: Reset VCPU to EL2 registers if VCPU nested virt is set Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-02-02 11:40   ` Alexandru Elisei
2022-02-02 11:40     ` Alexandru Elisei
2022-02-02 11:40     ` Alexandru Elisei
2022-01-28 12:18 ` [PATCH v6 04/64] KVM: arm64: nv: Allow userspace to set PSR_MODE_EL2x Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-02-02 11:53   ` Alexandru Elisei
2022-02-02 11:53     ` Alexandru Elisei
2022-02-02 11:53     ` Alexandru Elisei
2022-01-28 12:18 ` [PATCH v6 05/64] KVM: arm64: nv: Add EL2 system registers to vcpu context Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-02-11 16:35   ` Miguel Luis
2022-02-11 16:35     ` Miguel Luis
2022-02-11 16:35     ` Miguel Luis
2022-01-28 12:18 ` [PATCH v6 06/64] KVM: arm64: nv: Add nested virt VCPU primitives for vEL2 VCPU state Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-02-02 12:10   ` Alexandru Elisei
2022-02-02 12:10     ` Alexandru Elisei
2022-02-02 12:10     ` Alexandru Elisei
2022-02-14 12:39   ` Miguel Luis
2022-02-14 12:39     ` Miguel Luis
2022-02-14 12:39     ` Miguel Luis
2022-02-14 14:20     ` Marc Zyngier
2022-02-14 14:20       ` Marc Zyngier
2022-02-14 14:20       ` Marc Zyngier
2022-01-28 12:18 ` [PATCH v6 07/64] KVM: arm64: nv: Handle HCR_EL2.NV system register traps Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-02-01 14:32   ` Russell King (Oracle)
2022-02-01 14:32     ` Russell King (Oracle)
2022-02-01 14:32     ` Russell King (Oracle)
2022-01-28 12:18 ` [PATCH v6 08/64] KVM: arm64: nv: Reset VMPIDR_EL2 and VPIDR_EL2 to sane values Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-01-28 12:18 ` [PATCH v6 09/64] KVM: arm64: nv: Support virtual EL2 exceptions Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-02-02 15:23   ` Alexandru Elisei
2022-02-02 15:23     ` Alexandru Elisei
2022-02-02 15:23     ` Alexandru Elisei
2022-02-03 17:43     ` Marc Zyngier
2022-02-03 17:43       ` Marc Zyngier
2022-02-03 17:43       ` Marc Zyngier
2022-02-04 11:47       ` Alexandru Elisei
2022-02-04 11:47         ` Alexandru Elisei
2022-02-04 11:47         ` Alexandru Elisei
2022-01-28 12:18 ` [PATCH v6 10/64] KVM: arm64: nv: Inject HVC exceptions to the virtual EL2 Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-01-28 12:18 ` [PATCH v6 11/64] KVM: arm64: nv: Handle trapped ERET from " Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-01-28 12:18 ` [PATCH v6 12/64] KVM: arm64: nv: Add non-VHE-EL2->EL1 translation helpers Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-02-01 16:37   ` Russell King (Oracle)
2022-02-01 16:37     ` Russell King (Oracle)
2022-02-01 16:37     ` Russell King (Oracle)
2022-02-02 17:08   ` Alexandru Elisei
2022-02-02 17:08     ` Alexandru Elisei
2022-02-02 17:08     ` Alexandru Elisei
2022-02-03 18:29     ` Marc Zyngier
2022-02-03 18:29       ` Marc Zyngier
2022-02-03 18:29       ` Marc Zyngier
2022-02-04 12:05       ` Alexandru Elisei
2022-02-04 12:05         ` Alexandru Elisei
2022-02-04 12:05         ` Alexandru Elisei
2022-01-28 12:18 ` [PATCH v6 13/64] KVM: arm64: nv: Handle virtual EL2 registers in vcpu_read/write_sys_reg() Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-02-01 16:40   ` Russell King (Oracle)
2022-02-01 16:40     ` Russell King (Oracle)
2022-02-01 16:40     ` Russell King (Oracle)
2022-01-28 12:18 ` [PATCH v6 14/64] KVM: arm64: nv: Handle SPSR_EL2 specially Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-02-01 16:43   ` Russell King (Oracle)
2022-02-01 16:43     ` Russell King (Oracle)
2022-02-01 16:43     ` Russell King (Oracle)
2022-01-28 12:18 ` [PATCH v6 15/64] KVM: arm64: nv: Handle HCR_EL2.E2H specially Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-02-01 16:51   ` Russell King (Oracle)
2022-02-01 16:51     ` Russell King (Oracle)
2022-02-01 16:51     ` Russell King (Oracle)
2022-02-01 18:17     ` Marc Zyngier
2022-02-01 18:17       ` Marc Zyngier
2022-02-01 18:17       ` Marc Zyngier
2022-01-28 12:18 ` [PATCH v6 16/64] KVM: arm64: nv: Save/Restore vEL2 sysregs Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-02-03 15:14   ` Alexandru Elisei
2022-02-03 15:14     ` Alexandru Elisei
2022-02-03 15:14     ` Alexandru Elisei
2022-01-28 12:18 ` [PATCH v6 17/64] KVM: arm64: nv: Emulate PSTATE.M for a guest hypervisor Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-02-01 18:06   ` Russell King (Oracle)
2022-02-01 18:06     ` Russell King (Oracle)
2022-02-01 18:06     ` Russell King (Oracle)
2022-02-03 15:53   ` Alexandru Elisei
2022-02-03 15:53     ` Alexandru Elisei
2022-02-03 15:53     ` Alexandru Elisei
2022-01-28 12:18 ` [PATCH v6 18/64] KVM: arm64: nv: Trap EL1 VM register accesses in virtual EL2 Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-02-01 18:08   ` Russell King (Oracle)
2022-02-01 18:08     ` Russell King (Oracle)
2022-02-01 18:08     ` Russell King (Oracle)
2022-02-03 17:11   ` Alexandru Elisei
2022-02-03 17:11     ` Alexandru Elisei
2022-02-03 17:11     ` Alexandru Elisei
2022-01-28 12:18 ` [PATCH v6 19/64] KVM: arm64: nv: Trap SPSR_EL1, ELR_EL1 and VBAR_EL1 from " Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-02-01 18:13   ` Russell King (Oracle)
2022-02-01 18:13     ` Russell King (Oracle)
2022-02-01 18:13     ` Russell King (Oracle)
2022-02-03 17:27   ` Alexandru Elisei
2022-02-03 17:27     ` Alexandru Elisei
2022-02-03 17:27     ` Alexandru Elisei
2022-02-04 10:58   ` Alexandru Elisei
2022-02-04 10:58     ` Alexandru Elisei
2022-02-04 10:58     ` Alexandru Elisei
2022-01-28 12:18 ` [PATCH v6 20/64] KVM: arm64: nv: Trap CPACR_EL1 access in " Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-02-04 11:10   ` Alexandru Elisei
2022-02-04 11:10     ` Alexandru Elisei
2022-02-04 11:10     ` Alexandru Elisei
2022-01-28 12:18 ` [PATCH v6 21/64] KVM: arm64: nv: Handle PSCI call via smc from the guest Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-02-04 14:02   ` Alexandru Elisei
2022-02-04 14:02     ` Alexandru Elisei
2022-02-04 14:02     ` Alexandru Elisei
2022-01-28 12:18 ` [PATCH v6 22/64] KVM: arm64: nv: Respect virtual HCR_EL2.TWX setting Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-02-04 15:40   ` Alexandru Elisei
2022-02-04 15:40     ` Alexandru Elisei
2022-02-04 15:40     ` Alexandru Elisei
2022-02-04 16:01     ` Alexandru Elisei
2022-02-04 16:01       ` Alexandru Elisei
2022-02-04 16:01       ` Alexandru Elisei
2022-02-07 15:38     ` Alexandru Elisei
2022-02-07 15:38       ` Alexandru Elisei
2022-02-07 15:38       ` Alexandru Elisei
2022-01-28 12:18 ` [PATCH v6 23/64] KVM: arm64: nv: Respect virtual CPTR_EL2.{TFP,FPEN} settings Marc Zyngier
2022-01-28 12:18   ` [PATCH v6 23/64] KVM: arm64: nv: Respect virtual CPTR_EL2.{TFP, FPEN} settings Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-01-28 12:18 ` [PATCH v6 24/64] KVM: arm64: nv: Respect the virtual HCR_EL2.NV bit setting Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-02-07 15:33   ` Alexandru Elisei
2022-02-07 15:33     ` Alexandru Elisei
2022-02-07 15:33     ` Alexandru Elisei
2022-01-28 12:18 ` [PATCH v6 25/64] KVM: arm64: nv: Respect virtual HCR_EL2.TVM and TRVM settings Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-02-07 16:18   ` Alexandru Elisei
2022-02-07 16:18     ` Alexandru Elisei
2022-02-07 16:18     ` Alexandru Elisei
2022-01-28 12:18 ` [PATCH v6 26/64] KVM: arm64: nv: Respect the virtual HCR_EL2.NV1 bit setting Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-02-07 16:36   ` Alexandru Elisei
2022-02-07 16:36     ` Alexandru Elisei
2022-02-07 16:36     ` Alexandru Elisei
2022-01-28 12:18 ` [PATCH v6 27/64] KVM: arm64: nv: Allow a sysreg to be hidden from userspace only Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-02-08 14:36   ` Alexandru Elisei
2022-02-08 14:36     ` Alexandru Elisei
2022-02-08 14:36     ` Alexandru Elisei
2022-01-28 12:18 ` [PATCH v6 28/64] KVM: arm64: nv: Emulate EL12 register accesses from the virtual EL2 Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-02-08 15:35   ` Alexandru Elisei
2022-02-08 15:35     ` Alexandru Elisei
2022-02-08 15:35     ` Alexandru Elisei
2022-01-28 12:18 ` [PATCH v6 29/64] KVM: arm64: nv: Forward debug traps to the nested guest Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-02-09 11:04   ` Alexandru Elisei
2022-02-09 11:04     ` Alexandru Elisei
2022-02-09 11:04     ` Alexandru Elisei
2022-01-28 12:18 ` [PATCH v6 30/64] KVM: arm64: nv: Configure HCR_EL2 for nested virtualization Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-02-09 16:41   ` Alexandru Elisei
2022-02-09 16:41     ` Alexandru Elisei
2022-02-09 16:41     ` Alexandru Elisei
2022-01-28 12:18 ` [PATCH v6 31/64] KVM: arm64: nv: Only toggle cache for virtual EL2 when SCTLR_EL2 changes Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-02-09 16:56   ` Alexandru Elisei
2022-02-09 16:56     ` Alexandru Elisei
2022-02-09 16:56     ` Alexandru Elisei
2022-01-28 12:18 ` [PATCH v6 32/64] KVM: arm64: nv: Filter out unsupported features from ID regs Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-02-09 17:33   ` Alexandru Elisei
2022-02-09 17:33     ` Alexandru Elisei
2022-02-09 17:33     ` Alexandru Elisei
2022-01-28 12:18 ` [PATCH v6 33/64] KVM: arm64: nv: Hide RAS from nested guests Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-01-28 12:18 ` [PATCH v6 34/64] KVM: arm64: nv: Support multiple nested Stage-2 mmu structures Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-02-16 16:12   ` Alexandru Elisei
2022-02-16 16:12     ` Alexandru Elisei
2022-02-16 16:12     ` Alexandru Elisei
2022-02-24 14:25   ` Alexandru Elisei
2022-02-24 14:25     ` Alexandru Elisei
2022-02-24 14:25     ` Alexandru Elisei
2022-01-28 12:18 ` [PATCH v6 35/64] KVM: arm64: nv: Implement nested Stage-2 page table walk logic Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-01-28 12:18 ` [PATCH v6 36/64] KVM: arm64: nv: Handle shadow stage 2 page faults Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-02-17 15:23   ` Alexandru Elisei
2022-02-17 15:23     ` Alexandru Elisei
2022-02-17 15:23     ` Alexandru Elisei
2022-01-28 12:18 ` [PATCH v6 37/64] KVM: arm64: nv: Restrict S2 RD/WR permissions to match the guest's Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-02-17 16:29   ` Alexandru Elisei
2022-02-17 16:29     ` Alexandru Elisei
2022-02-17 16:29     ` Alexandru Elisei
2022-01-28 12:18 ` [PATCH v6 38/64] KVM: arm64: nv: Unmap/flush shadow stage 2 page tables Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-02-22 16:13   ` Alexandru Elisei
2022-02-22 16:13     ` Alexandru Elisei
2022-02-22 16:13     ` Alexandru Elisei
2022-01-28 12:18 ` [PATCH v6 39/64] KVM: arm64: nv: Set a handler for the system instruction traps Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-02-24 11:59   ` Alexandru Elisei
2022-02-24 11:59     ` Alexandru Elisei
2022-02-24 11:59     ` Alexandru Elisei
2022-01-28 12:18 ` [PATCH v6 40/64] KVM: arm64: nv: Trap and emulate AT instructions from virtual EL2 Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-02-24 15:39   ` Alexandru Elisei
2022-02-24 15:39     ` Alexandru Elisei
2022-02-24 15:39     ` Alexandru Elisei
2022-01-28 12:18 ` [PATCH v6 41/64] KVM: arm64: nv: Trap and emulate TLBI " Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-02-24 15:56   ` Alexandru Elisei
2022-02-24 15:56     ` Alexandru Elisei
2022-02-24 15:56     ` Alexandru Elisei
2022-01-28 12:18 ` [PATCH v6 42/64] KVM: arm64: nv: Fold guest's HCR_EL2 configuration into the host's Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-02-25 13:45   ` Alexandru Elisei
2022-02-25 13:45     ` Alexandru Elisei
2022-02-25 13:45     ` Alexandru Elisei
2022-01-28 12:18 ` [PATCH v6 43/64] KVM: arm64: nv: arch_timer: Support hyp timer emulation Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-03-07 14:52   ` Alexandru Elisei
2022-03-07 14:52     ` Alexandru Elisei
2022-03-07 14:52     ` Alexandru Elisei
2022-03-07 15:48     ` Marc Zyngier
2022-03-07 15:48       ` Marc Zyngier
2022-03-07 15:48       ` Marc Zyngier
2022-03-07 16:28       ` Alexandru Elisei
2022-03-07 16:28         ` Alexandru Elisei
2022-03-07 16:28         ` Alexandru Elisei
2022-03-07 16:52         ` Marc Zyngier
2022-03-07 16:52           ` Marc Zyngier
2022-03-07 16:52           ` Marc Zyngier
2022-03-07 17:13           ` Alexandru Elisei
2022-03-07 17:13             ` Alexandru Elisei
2022-03-07 17:13             ` Alexandru Elisei
2022-03-07 15:23   ` Alexandru Elisei
2022-03-07 15:23     ` Alexandru Elisei
2022-03-07 15:23     ` Alexandru Elisei
2022-03-07 15:44     ` Marc Zyngier
2022-03-07 15:44       ` Marc Zyngier
2022-03-07 15:44       ` Marc Zyngier
2022-03-07 16:24       ` Alexandru Elisei
2022-03-07 16:24         ` Alexandru Elisei
2022-03-07 16:24         ` Alexandru Elisei
2022-03-07 16:40         ` Marc Zyngier
2022-03-07 16:40           ` Marc Zyngier
2022-03-07 16:40           ` Marc Zyngier
2022-01-28 12:18 ` [PATCH v6 44/64] KVM: arm64: nv: Add handling of EL2-specific timer registers Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-03-07 16:01   ` Alexandru Elisei
2022-03-07 16:01     ` Alexandru Elisei
2022-03-07 16:01     ` Alexandru Elisei
2022-01-28 12:18 ` [PATCH v6 45/64] KVM: arm64: nv: Load timer before the GIC Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-01-28 12:18 ` [PATCH v6 46/64] KVM: arm64: nv: Nested GICv3 Support Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-01-28 12:18 ` [PATCH v6 47/64] KVM: arm64: nv: Don't load the GICv4 context on entering a nested guest Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-01-28 12:18 ` [PATCH v6 48/64] KVM: arm64: nv: vgic: Emulate the HW bit in software Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-01-28 12:18 ` [PATCH v6 49/64] KVM: arm64: nv: vgic: Allow userland to set VGIC maintenance IRQ Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-01-28 12:18 ` [PATCH v6 50/64] KVM: arm64: nv: Implement maintenance interrupt forwarding Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-01-28 12:18 ` [PATCH v6 51/64] KVM: arm64: nv: Add nested GICv3 tracepoints Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-01-28 12:18   ` Marc Zyngier
2022-01-28 12:19 ` [PATCH v6 52/64] KVM: arm64: nv: Allow userspace to request KVM_ARM_VCPU_NESTED_VIRT Marc Zyngier
2022-01-28 12:19   ` Marc Zyngier
2022-01-28 12:19   ` Marc Zyngier
2022-01-28 12:19 ` [PATCH v6 53/64] KVM: arm64: nv: Add handling of ARMv8.4-TTL TLB invalidation Marc Zyngier
2022-01-28 12:19   ` Marc Zyngier
2022-01-28 12:19   ` Marc Zyngier
2022-01-28 12:19 ` [PATCH v6 54/64] KVM: arm64: nv: Invalidate TLBs based on shadow S2 TTL-like information Marc Zyngier
2022-01-28 12:19   ` Marc Zyngier
2022-01-28 12:19   ` Marc Zyngier
2022-01-28 12:19 ` [PATCH v6 55/64] KVM: arm64: nv: Tag shadow S2 entries with nested level Marc Zyngier
2022-01-28 12:19   ` Marc Zyngier
2022-01-28 12:19   ` Marc Zyngier
2022-01-28 12:19 ` [PATCH v6 56/64] KVM: arm64: nv: Add include containing the VNCR_EL2 offsets Marc Zyngier
2022-01-28 12:19   ` Marc Zyngier
2022-01-28 12:19   ` Marc Zyngier
2022-01-28 12:19 ` [PATCH v6 57/64] KVM: arm64: nv: Map VNCR-capable registers to a separate page Marc Zyngier
2022-01-28 12:19   ` Marc Zyngier
2022-01-28 12:19   ` Marc Zyngier
2022-01-28 12:19 ` [PATCH v6 58/64] KVM: arm64: nv: Move nested vgic state into the sysreg file Marc Zyngier
2022-01-28 12:19   ` Marc Zyngier
2022-01-28 12:19   ` Marc Zyngier
2022-01-28 12:19 ` [PATCH v6 59/64] KVM: arm64: Add ARMv8.4 Enhanced Nested Virt cpufeature Marc Zyngier
2022-01-28 12:19   ` Marc Zyngier
2022-01-28 12:19   ` Marc Zyngier
2022-01-28 12:19 ` [PATCH v6 60/64] KVM: arm64: nv: Sync nested timer state with ARMv8.4 Marc Zyngier
2022-01-28 12:19   ` Marc Zyngier
2022-01-28 12:19   ` Marc Zyngier
2022-04-01 17:51   ` Chase Conklin
2022-04-01 17:51     ` Chase Conklin
2022-04-01 17:51     ` Chase Conklin
2022-01-28 12:19 ` [PATCH v6 61/64] KVM: arm64: nv: Allocate VNCR page when required Marc Zyngier
2022-01-28 12:19   ` Marc Zyngier
2022-01-28 12:19   ` Marc Zyngier
2022-01-28 12:19 ` [PATCH v6 62/64] KVM: arm64: nv: Enable ARMv8.4-NV support Marc Zyngier
2022-01-28 12:19   ` Marc Zyngier
2022-01-28 12:19   ` Marc Zyngier
2022-01-28 12:19 ` [PATCH v6 63/64] KVM: arm64: nv: Fast-track 'InHost' exception returns Marc Zyngier
2022-01-28 12:19   ` Marc Zyngier
2022-01-28 12:19   ` Marc Zyngier
2022-01-28 12:19 ` [PATCH v6 64/64] KVM: arm64: nv: Fast-track EL1 TLBIs for VHE guests Marc Zyngier
2022-01-28 12:19   ` Marc Zyngier
2022-01-28 12:19   ` Marc Zyngier
