* [RFC 00/55] Nested Virtualization on KVM/ARM
@ 2017-01-09  6:23 Jintack Lim
  2017-01-09  6:23 ` [RFC 01/55] arm64: Add missing TCR hw defines Jintack Lim
                   ` (56 more replies)
  0 siblings, 57 replies; 111+ messages in thread
From: Jintack Lim @ 2017-01-09  6:23 UTC (permalink / raw)
  To: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, vladimir.murzin, suzuki.poulose,
	mark.rutland, james.morse, lorenzo.pieralisi, kevin.brodsky,
	wcohen, shankerd, geoff, andre.przywara, eric.auger, anna-maria,
	shihwei, linux-arm-kernel, kvmarm, kvm, linux-kernel
  Cc: jintack

Nested virtualization is the ability to run a virtual machine inside another
virtual machine. In other words, it’s about running a hypervisor (the guest
hypervisor) on top of another hypervisor (the host hypervisor).

This series supports nested virtualization on arm64. ARM recently announced an
extension (ARMv8.3) which has support for nested virtualization[1]. This series
is based on the ARMv8.3 specification.

Supporting nested virtualization means that the hypervisor provides VMs not
only with the usual EL0/EL1 execution environment, but also with the
virtualization extensions, including an EL2 execution environment. Once the
host hypervisor provides these execution environments to its VMs, the guest
hypervisor can naturally run its own VMs (nested VMs).

To support nested virtualization on ARM the hypervisor must emulate a virtual
execution environment consisting of EL2, EL1, and EL0, as the guest hypervisor
will run in a virtual EL2 mode.  Normally KVM/ARM only emulates a VM
supporting EL1/EL0 running in their respective native CPU modes, but with
nested virtualization we deprivilege the guest hypervisor and emulate a
virtual EL2 execution mode in EL1, using the hardware features provided by
ARMv8.3 to trap EL2 operations to EL1. To do that, the host hypervisor needs
to manage EL2 register state for the guest hypervisor, and shadow EL1
register state that reflects the EL2 register state, in order to run the
guest hypervisor in EL1. See patches 6 through 10 for this.
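
As a rough sketch of what this trapping setup amounts to (illustrative
only: HCR_NV below reflects the architectural HCR_EL2.NV bit position,
and nested_virt_in_use() is an assumed helper, not a name from these
patches):

    #define HCR_NV	(UL(1) << 42)	/* HCR_EL2.NV (ARMv8.3) */

    static void setup_nested_traps(struct kvm_vcpu *vcpu)
    {
            /* Trap EL2 register accesses from (deprivileged) EL1 to the host */
            if (nested_virt_in_use(vcpu))
                    vcpu->arch.hcr_el2 |= HCR_NV;
    }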

For memory virtualization, the biggest issue is that we now have more than two
stages of translation when running nested VMs. We choose to merge two stage-2
page tables (one from the guest hypervisor and the other from the host
hypervisor) and create shadow stage-2 page tables, which have mappings from the
nested VM’s physical addresses to the machine physical addresses. Stage-1
translation is still done by the hardware, as it is for normal VMs.
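
Conceptually, establishing one shadow stage-2 mapping looks like the
sketch below (helper names here are invented for illustration; the real
logic lives in mmu-nested.c later in the series):

    static int build_shadow_s2_mapping(struct kvm_vcpu *vcpu,
                                       phys_addr_t nested_ipa,
                                       phys_addr_t *mach_pa)
    {
            phys_addr_t l2_ipa;  /* address in the guest hypervisor's IPA space */
            int ret;

            /* Walk the guest hypervisor's stage-2 tables in software. */
            ret = walk_guest_stage2(vcpu, nested_ipa, &l2_ipa);
            if (ret)
                    return ret;  /* the guest hypervisor gets a stage-2 fault */

            /* Then translate through the host's stage 2 for this VM. */
            return host_stage2_translate(vcpu->kvm, l2_ipa, mach_pa);
    }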

To provide VGIC support to the guest hypervisor, we emulate the GIC
virtualization extensions using trap-and-emulate to a virtual GIC Hypervisor
Control Interface.  Furthermore, we can still use the GIC VE hardware features
to deliver virtual interrupts to the nested VM, by directly mapping the GIC
VCPU interface to the nested VM and switching the content of the GIC Hypervisor
Control Interface when alternating between a nested VM and a normal VM.  See
patches 25 through 32, and 50 through 52 for more information.
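
That switching can be pictured roughly as follows (a sketch under assumed
names; nested_vm_running() and nested_vgic_v2 are illustrative, not actual
identifiers from the patches):

    /* Pick which GICH register image to program into hardware on entry. */
    static struct vgic_v2_cpu_if *live_gich_image(struct kvm_vcpu *vcpu)
    {
            struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;

            /*
             * A nested VM runs with the list registers that its guest
             * hypervisor prepared through the emulated GIC Hypervisor
             * Control Interface.
             */
            if (nested_vm_running(vcpu))
                    return &vgic_cpu->nested_vgic_v2;

            return &vgic_cpu->vgic_v2;
    }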

For timer virtualization, the guest hypervisor expects to have access to the
EL2 physical timer, the EL1 physical timer and the virtual timer. So, the host
hypervisor needs to provide all of them. The virtual timer is always available
to VMs. The EL1 physical timer is available to VMs via my previous patch
series[2].
The EL2 physical timer is not supported yet in this RFC. We plan to support
this as it is required to run other guest hypervisors such as Xen.

Even though this work is not complete (see limitations below), I'd appreciate
early feedback on this RFC. Specifically, I'm interested in:
- Is it better to have a kernel config or to make it configurable at runtime?
- I wonder if the data structure for memory management makes sense.
- What architecture version do we support for the guest hypervisor, and how?
  For example, do we always support all architecture versions or the same
  architecture as the underlying hardware platform? Or is it better
  to make it configurable from the userspace?
- Initial comments on the overall design?

This patch series is based on kvm-arm-for-4.9-rc7 with the patch series to provide
VMs with the EL1 physical timer[2].

Git: https://github.com/columbia/nesting-pub/tree/rfc-v1

Testing:
We have tested this on ARMv8.0 (Applied Micro X-Gene)[3] since ARMv8.3 hardware
is not available yet. We have paravirtualized the guest hypervisor to trap to
EL2 as specified in the ARMv8.3 specification, using the hvc instruction. We
plan to test this on an ARMv8.3 model, and will post the results and a v2 if
necessary.

Limitations:
- This patch series only supports arm64, not arm. All the patches compile on
  arm, but I haven't tried to boot normal VMs on it.
- The guest hypervisor with VHE (ARMv8.1) is not supported in this RFC. I have
  patches for that, but they need to be cleaned up.
- Recursive nesting (i.e. emulating ARMv8.3 in the VM) is not tested yet.
- Other hypervisors (such as Xen) on KVM are not tested.

TODO:
- Test booting normal VMs on the arm architecture
- Test this on ARMv8.3 model
- Support the guest hypervisor with VHE
- Provide the guest hypervisor with the EL2 physical timer
- Run other hypervisors such as Xen on KVM

[1] https://www.community.arm.com/processors/b/blog/posts/armv8-a-architecture-2016-additions
[2] https://lists.cs.columbia.edu/pipermail/kvmarm/2016-December/022825.html
[3] https://www.cloudlab.us/hardware.php#utah

Christoffer Dall (27):
  arm64: Add missing TCR hw defines
  KVM: arm64: Add nesting config option
  KVM: arm64: Add KVM nesting feature
  KVM: arm64: Allow userspace to set PSR_MODE_EL2x
  KVM: arm64: Add vcpu_mode_el2 primitive to support nesting
  KVM: arm/arm64: Add virtual EL2 state emulation framework
  KVM: arm64: Set virtual EL2 context depending on the guest exception
    level
  KVM: arm64: Set shadow EL1 registers for virtual EL2 execution
  KVM: arm64: Synchronize EL1 system registers on virtual EL2 entry and
    exit
  KVM: arm64: Trap EL1 VM register accesses in virtual EL2
  KVM: arm/arm64: Add VGIC data structures for the nesting
  KVM: arm/arm64: Inject maintenance interrupts to the guest hypervisor
  KVM: arm/arm64: Remove unused params in mmu functions
  KVM: arm/arm64: Abstract stage-2 MMU state into a separate structure
  KVM: arm/arm64: Support mmu for the virtual EL2 execution
  KVM: arm64: Invalidate virtual EL2 TLB entries when needed
  KVM: arm64: Setup vttbr_el2 on each VM entry
  KVM: arm/arm64: Make mmu functions non-static
  KVM: arm/arm64: Unmap/flush shadow stage 2 page tables
  KVM: arm64: Implement nested Stage-2 page table walk logic
  KVM: arm/arm64: Handle shadow stage 2 page faults
  KVM: arm/arm64: Move kvm_is_write_fault to header file
  KVM: arm64: KVM: Inject stage-2 page faults
  KVM: arm64: Add more info to the S2 translation result
  KVM: arm/arm64: Forward the guest hypervisor's stage 2 permission
    faults
  KVM: arm64: Emulate TLBI instruction
  KVM: arm64: Fixes to toggle_cache for nesting

Jintack Lim (28):
  KVM: arm64: Add EL2 execution context for nesting
  KVM: arm64: Emulate taking an exception to the guest hypervisor
  KVM: arm64: Handle EL2 register access traps
  KVM: arm64: Handle eret instruction traps
  KVM: arm64: Take account of system instruction traps
  KVM: arm64: Forward VM reg traps to the guest hypervisor
  KVM: arm64: Trap SPSR_EL1, ELR_EL1 and VBAR_EL1 in virtual EL2
  KVM: arm64: Forward traps due to HCR_EL2.NV1 bit to the guest
    hypervisor
  KVM: arm64: Trap CPACR_EL1 access in virtual EL2
  KVM: arm64: Forward CPACR_EL1 traps to the guest hypervisor
  KVM: arm64: Forward HVC instruction to the guest hypervisor
  KVM: arm64: Handle PSCI call from the guest
  KVM: arm64: Forward WFX to the guest hypervisor
  KVM: arm64: Forward FP exceptions to the guest hypervisor
  KVM: arm/arm64: Let vcpu thread modify its own active state
  KVM: arm/arm64: Emulate GICH interface on GICv2
  KVM: arm/arm64: Prepare vgic state for the nested VM
  KVM: arm/arm64: Set up the prepared vgic state
  KVM: arm/arm64: Inject irqs to the guest hypervisor
  KVM: arm/arm64: register GICH iodev for the guest hypervisor
  KVM: arm/arm64: Add mmu context for the nesting
  KVM: arm/arm64: Handle vttbr_el2 write operation from the guest
    hypervisor
  KVM: arm/arm64: Abstract kvm_phys_addr_ioremap() function
  KVM: arm64: Expose physical address of vcpu interface
  KVM: arm/arm64: Create a vcpu mapping for the nested VM
  KVM: arm64: Reflect shadow VMPIDR_EL2 value to MPIDR_EL1
  KVM: arm/arm64: Adjust virtual offset considering nesting
  KVM: arm64: Enable nested virtualization

 arch/arm/include/asm/kvm_asm.h         |   7 +-
 arch/arm/include/asm/kvm_emulate.h     |  54 ++++
 arch/arm/include/asm/kvm_host.h        |  34 ++-
 arch/arm/include/asm/kvm_mmu.h         |  39 +++
 arch/arm/kvm/arm.c                     |  79 ++++--
 arch/arm/kvm/hyp/switch.c              |   3 +-
 arch/arm/kvm/hyp/tlb.c                 |  15 +-
 arch/arm/kvm/mmio.c                    |  12 +-
 arch/arm/kvm/mmu.c                     | 386 +++++++++++++++++--------
 arch/arm64/include/asm/esr.h           |   2 +
 arch/arm64/include/asm/kvm_arm.h       |   3 +
 arch/arm64/include/asm/kvm_asm.h       |   7 +-
 arch/arm64/include/asm/kvm_coproc.h    |   2 +-
 arch/arm64/include/asm/kvm_emulate.h   |  68 +++++
 arch/arm64/include/asm/kvm_host.h      |  96 ++++++-
 arch/arm64/include/asm/kvm_mmu.h       | 110 +++++++-
 arch/arm64/include/asm/kvm_nested.h    |   7 +
 arch/arm64/include/asm/pgtable-hwdef.h |   6 +
 arch/arm64/include/uapi/asm/kvm.h      |   7 +
 arch/arm64/kernel/asm-offsets.c        |   1 +
 arch/arm64/kvm/Kconfig                 |   6 +
 arch/arm64/kvm/Makefile                |   7 +-
 arch/arm64/kvm/context.c               | 212 ++++++++++++++
 arch/arm64/kvm/emulate-nested.c        |  66 +++++
 arch/arm64/kvm/guest.c                 |   2 +
 arch/arm64/kvm/handle_exit.c           |  62 +++-
 arch/arm64/kvm/handle_exit_nested.c    |  51 ++++
 arch/arm64/kvm/hyp/entry.S             |  14 +
 arch/arm64/kvm/hyp/hyp-entry.S         |   2 +-
 arch/arm64/kvm/hyp/switch.c            |  15 +-
 arch/arm64/kvm/hyp/sysreg-sr.c         | 109 +++----
 arch/arm64/kvm/hyp/tlb.c               |  16 +-
 arch/arm64/kvm/mmu-nested.c            | 501 +++++++++++++++++++++++++++++++++
 arch/arm64/kvm/reset.c                 |   8 +
 arch/arm64/kvm/sys_regs.c              | 287 ++++++++++++++++++-
 arch/arm64/kvm/sys_regs.h              |   7 +
 arch/arm64/kvm/trace.h                 |  43 ++-
 include/kvm/arm_vgic.h                 |  36 ++-
 virt/kvm/arm/arch_timer.c              |   3 +-
 virt/kvm/arm/hyp/timer-sr.c            |   5 +-
 virt/kvm/arm/hyp/vgic-v2-sr.c          |  15 +-
 virt/kvm/arm/vgic/vgic-init.c          |   3 +
 virt/kvm/arm/vgic/vgic-mmio.c          |  11 +-
 virt/kvm/arm/vgic/vgic-v2-nested.c     | 346 +++++++++++++++++++++++
 virt/kvm/arm/vgic/vgic-v2.c            |  13 +
 virt/kvm/arm/vgic/vgic.c               |  23 ++
 virt/kvm/arm/vgic/vgic.h               |  17 ++
 47 files changed, 2542 insertions(+), 276 deletions(-)
 create mode 100644 arch/arm64/include/asm/kvm_nested.h
 create mode 100644 arch/arm64/kvm/context.c
 create mode 100644 arch/arm64/kvm/emulate-nested.c
 create mode 100644 arch/arm64/kvm/handle_exit_nested.c
 create mode 100644 arch/arm64/kvm/mmu-nested.c
 create mode 100644 virt/kvm/arm/vgic/vgic-v2-nested.c

-- 
1.9.1


* [RFC 01/55] arm64: Add missing TCR hw defines
  2017-01-09  6:23 [RFC 00/55] Nested Virtualization on KVM/ARM Jintack Lim
@ 2017-01-09  6:23 ` Jintack Lim
  2017-01-09  6:23 ` [RFC 02/55] KVM: arm64: Add nesting config option Jintack Lim
                   ` (55 subsequent siblings)
  56 siblings, 0 replies; 111+ messages in thread
From: Jintack Lim @ 2017-01-09  6:23 UTC (permalink / raw)
  To: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, vladimir.murzin, suzuki.poulose,
	mark.rutland, james.morse, lorenzo.pieralisi, kevin.brodsky,
	wcohen, shankerd, geoff, andre.przywara, eric.auger, anna-maria,
	shihwei, linux-arm-kernel, kvmarm, kvm, linux-kernel
  Cc: jintack

From: Christoffer Dall <christoffer.dall@linaro.org>

Some bits of the TCR weren't defined and since we're about to use these
in KVM, add these defines.
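
For reference, a quick illustration of how such a define gets consumed (the
real users come later in the series, e.g. when constructing a shadow
TCR_EL1; the helper below is only a sketch):

    /* Extract the IPS (intermediate physical address size) field. */
    static inline unsigned int tcr_ips(u64 tcr)
    {
            return (tcr & TCR_IPS_MASK) >> TCR_IPS_SHIFT;
    }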

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
---
 arch/arm64/include/asm/pgtable-hwdef.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/arm64/include/asm/pgtable-hwdef.h b/arch/arm64/include/asm/pgtable-hwdef.h
index eb0c2bd..d26cab7 100644
--- a/arch/arm64/include/asm/pgtable-hwdef.h
+++ b/arch/arm64/include/asm/pgtable-hwdef.h
@@ -272,9 +272,15 @@
 #define TCR_TG1_4K		(UL(2) << TCR_TG1_SHIFT)
 #define TCR_TG1_64K		(UL(3) << TCR_TG1_SHIFT)
 
+#define TCR_IPS_SHIFT		32
+#define TCR_IPS_MASK		(UL(7) << TCR_IPS_SHIFT)
+
 #define TCR_ASID16		(UL(1) << 36)
 #define TCR_TBI0		(UL(1) << 37)
 #define TCR_HA			(UL(1) << 39)
 #define TCR_HD			(UL(1) << 40)
 
+#define TCR_EPD1		(UL(1) << 23)
+#define TCR_EPD0		(UL(1) << 7)
+
 #endif
-- 
1.9.1


* [RFC 02/55] KVM: arm64: Add nesting config option
  2017-01-09  6:23 [RFC 00/55] Nested Virtualization on KVM/ARM Jintack Lim
  2017-01-09  6:23 ` [RFC 01/55] arm64: Add missing TCR hw defines Jintack Lim
@ 2017-01-09  6:23 ` Jintack Lim
  2017-01-09  6:23 ` [RFC 03/55] KVM: arm64: Add KVM nesting feature Jintack Lim
                   ` (54 subsequent siblings)
  56 siblings, 0 replies; 111+ messages in thread
From: Jintack Lim @ 2017-01-09  6:23 UTC (permalink / raw)
  To: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, vladimir.murzin, suzuki.poulose,
	mark.rutland, james.morse, lorenzo.pieralisi, kevin.brodsky,
	wcohen, shankerd, geoff, andre.przywara, eric.auger, anna-maria,
	shihwei, linux-arm-kernel, kvmarm, kvm, linux-kernel
  Cc: jintack

From: Christoffer Dall <christoffer.dall@linaro.org>

Add an option that allows nested hypervisor support.
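
With this applied, a host kernel opts in via a config fragment along these
lines (surrounding KVM options shown for context):

    CONFIG_VIRTUALIZATION=y
    CONFIG_KVM=y
    CONFIG_KVM_ARM_NESTED_HYP=y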

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
---
 arch/arm64/kvm/Kconfig | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index 6eaf12c..37263ff 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -57,6 +57,12 @@ config KVM_ARM_PMU
 	  Adds support for a virtual Performance Monitoring Unit (PMU) in
 	  virtual machines.
 
+config KVM_ARM_NESTED_HYP
+	bool "Nested Virtualization"
+	depends on KVM
+	---help---
+	  Support nested hypervisors in VMs.
+
 source drivers/vhost/Kconfig
 
 endif # VIRTUALIZATION
-- 
1.9.1


* [RFC 03/55] KVM: arm64: Add KVM nesting feature
  2017-01-09  6:23 [RFC 00/55] Nested Virtualization on KVM/ARM Jintack Lim
  2017-01-09  6:23 ` [RFC 01/55] arm64: Add missing TCR hw defines Jintack Lim
  2017-01-09  6:23 ` [RFC 02/55] KVM: arm64: Add nesting config option Jintack Lim
@ 2017-01-09  6:23 ` Jintack Lim
  2017-01-09  6:24 ` [RFC 04/55] KVM: arm64: Allow userspace to set PSR_MODE_EL2x Jintack Lim
                   ` (53 subsequent siblings)
  56 siblings, 0 replies; 111+ messages in thread
From: Jintack Lim @ 2017-01-09  6:23 UTC (permalink / raw)
  To: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, vladimir.murzin, suzuki.poulose,
	mark.rutland, james.morse, lorenzo.pieralisi, kevin.brodsky,
	wcohen, shankerd, geoff, andre.przywara, eric.auger, anna-maria,
	shihwei, linux-arm-kernel, kvmarm, kvm, linux-kernel
  Cc: jintack

From: Christoffer Dall <christoffer.dall@linaro.org>

Set the initial exception level of the guest to EL2 if nested
virtualization feature is enabled.
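
From userspace, the feature would be requested at VCPU init time roughly as
follows (a sketch against this RFC's uapi; error handling omitted):

    struct kvm_vcpu_init init;

    ioctl(vm_fd, KVM_ARM_PREFERRED_TARGET, &init);
    init.features[0] |= 1 << KVM_ARM_VCPU_NESTED_VIRT;
    ioctl(vcpu_fd, KVM_ARM_VCPU_INIT, &init);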

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
---
 arch/arm64/include/asm/kvm_host.h | 2 +-
 arch/arm64/include/uapi/asm/kvm.h | 1 +
 arch/arm64/kvm/reset.c            | 8 ++++++++
 3 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index e505038..c0c8b02 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -41,7 +41,7 @@
 
 #define KVM_MAX_VCPUS VGIC_V3_MAX_CPUS
 
-#define KVM_VCPU_MAX_FEATURES 4
+#define KVM_VCPU_MAX_FEATURES 5
 
 #define KVM_REQ_VCPU_EXIT	8
 
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index 3051f86..78117bf 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -97,6 +97,7 @@ struct kvm_regs {
 #define KVM_ARM_VCPU_EL1_32BIT		1 /* CPU running a 32bit VM */
 #define KVM_ARM_VCPU_PSCI_0_2		2 /* CPU uses PSCI v0.2 */
 #define KVM_ARM_VCPU_PMU_V3		3 /* Support guest PMUv3 */
+#define KVM_ARM_VCPU_NESTED_VIRT	4 /* Support nested virtual EL2 */
 
 struct kvm_vcpu_init {
 	__u32 target;
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index 74322c2..e6b0b20 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -41,6 +41,11 @@
 			PSR_F_BIT | PSR_D_BIT),
 };
 
+static const struct kvm_regs default_regs_reset_el2 = {
+	.regs.pstate = (PSR_MODE_EL2h | PSR_A_BIT | PSR_I_BIT |
+			PSR_F_BIT | PSR_D_BIT),
+};
+
 static const struct kvm_regs default_regs_reset32 = {
 	.regs.pstate = (COMPAT_PSR_MODE_SVC | COMPAT_PSR_A_BIT |
 			COMPAT_PSR_I_BIT | COMPAT_PSR_F_BIT),
@@ -124,6 +129,9 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
 			if (!cpu_has_32bit_el1())
 				return -EINVAL;
 			cpu_reset = &default_regs_reset32;
+		} else if (test_bit(KVM_ARM_VCPU_NESTED_VIRT,
+				    vcpu->arch.features)) {
+			cpu_reset = &default_regs_reset_el2;
 		} else {
 			cpu_reset = &default_regs_reset;
 		}
-- 
1.9.1


* [RFC 04/55] KVM: arm64: Allow userspace to set PSR_MODE_EL2x
  2017-01-09  6:23 [RFC 00/55] Nested Virtualization on KVM/ARM Jintack Lim
                   ` (2 preceding siblings ...)
  2017-01-09  6:23 ` [RFC 03/55] KVM: arm64: Add KVM nesting feature Jintack Lim
@ 2017-01-09  6:24 ` Jintack Lim
  2017-01-09  6:24 ` [RFC 05/55] KVM: arm64: Add vcpu_mode_el2 primitive to support nesting Jintack Lim
                   ` (52 subsequent siblings)
  56 siblings, 0 replies; 111+ messages in thread
From: Jintack Lim @ 2017-01-09  6:24 UTC (permalink / raw)
  To: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, vladimir.murzin, suzuki.poulose,
	mark.rutland, james.morse, lorenzo.pieralisi, kevin.brodsky,
	wcohen, shankerd, geoff, andre.przywara, eric.auger, anna-maria,
	shihwei, linux-arm-kernel, kvmarm, kvm, linux-kernel
  Cc: jintack

From: Christoffer Dall <christoffer.dall@linaro.org>

We were not allowing userspace to set a more privileged mode for the VCPU
than EL1, but now that we support nesting with a virtual EL2 mode, do
allow this!
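
For example, a VMM could now set the pstate core register to an EL2 mode
(a sketch using the standard ONE_REG core-register encoding):

    __u64 pstate = PSR_MODE_EL2h | PSR_A_BIT | PSR_I_BIT |
                   PSR_F_BIT | PSR_D_BIT;
    struct kvm_one_reg reg = {
            .id   = KVM_REG_ARM64 | KVM_REG_SIZE_U64 | KVM_REG_ARM_CORE |
                    KVM_REG_ARM_CORE_REG(regs.pstate),
            .addr = (__u64)&pstate,
    };

    ioctl(vcpu_fd, KVM_SET_ONE_REG, &reg);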

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
---
 arch/arm64/kvm/guest.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index 3f9e157..6b9f38a 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -117,6 +117,8 @@ static int set_core_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
 		case PSR_MODE_EL0t:
 		case PSR_MODE_EL1t:
 		case PSR_MODE_EL1h:
+		case PSR_MODE_EL2h:
+		case PSR_MODE_EL2t:
 			break;
 		default:
 			err = -EINVAL;
-- 
1.9.1


* [RFC 05/55] KVM: arm64: Add vcpu_mode_el2 primitive to support nesting
  2017-01-09  6:23 [RFC 00/55] Nested Virtualization on KVM/ARM Jintack Lim
                   ` (3 preceding siblings ...)
  2017-01-09  6:24 ` [RFC 04/55] KVM: arm64: Allow userspace to set PSR_MODE_EL2x Jintack Lim
@ 2017-01-09  6:24 ` Jintack Lim
  2017-01-09  6:24 ` [RFC 06/55] KVM: arm64: Add EL2 execution context for nesting Jintack Lim
                   ` (51 subsequent siblings)
  56 siblings, 0 replies; 111+ messages in thread
From: Jintack Lim @ 2017-01-09  6:24 UTC (permalink / raw)
  To: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, vladimir.murzin, suzuki.poulose,
	mark.rutland, james.morse, lorenzo.pieralisi, kevin.brodsky,
	wcohen, shankerd, geoff, andre.przywara, eric.auger, anna-maria,
	shihwei, linux-arm-kernel, kvmarm, kvm, linux-kernel
  Cc: jintack

From: Christoffer Dall <christoffer.dall@linaro.org>

When running a nested hypervisor we occasionally have to figure out if
the mode we are switching into is the virtual EL2 mode or a regular
EL0/1 mode.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
---
 arch/arm/include/asm/kvm_emulate.h   |  6 ++++++
 arch/arm64/include/asm/kvm_emulate.h | 12 ++++++++++++
 2 files changed, 18 insertions(+)

diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
index 9a8a45a..399cd75e 100644
--- a/arch/arm/include/asm/kvm_emulate.h
+++ b/arch/arm/include/asm/kvm_emulate.h
@@ -77,6 +77,12 @@ static inline bool vcpu_mode_is_32bit(const struct kvm_vcpu *vcpu)
 	return 1;
 }
 
+/* We don't support nesting on arm */
+static inline bool vcpu_mode_el2(const struct kvm_vcpu *vcpu)
+{
+	return false;
+}
+
 static inline unsigned long *vcpu_pc(struct kvm_vcpu *vcpu)
 {
 	return &vcpu->arch.ctxt.gp_regs.usr_regs.ARM_pc;
diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index f5ea0ba..830be2e 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -143,6 +143,18 @@ static inline bool vcpu_mode_priv(const struct kvm_vcpu *vcpu)
 	return mode != PSR_MODE_EL0t;
 }
 
+static inline bool vcpu_mode_el2(const struct kvm_vcpu *vcpu)
+{
+	u32 mode;
+
+	if (vcpu_mode_is_32bit(vcpu))
+		return false;
+
+	mode = *vcpu_cpsr(vcpu) & PSR_MODE_MASK;
+
+	return mode == PSR_MODE_EL2h || mode == PSR_MODE_EL2t;
+}
+
 static inline u32 kvm_vcpu_get_hsr(const struct kvm_vcpu *vcpu)
 {
 	return vcpu->arch.fault.esr_el2;
-- 
1.9.1


* [RFC 06/55] KVM: arm64: Add EL2 execution context for nesting
  2017-01-09  6:23 [RFC 00/55] Nested Virtualization on KVM/ARM Jintack Lim
                   ` (4 preceding siblings ...)
  2017-01-09  6:24 ` [RFC 05/55] KVM: arm64: Add vcpu_mode_el2 primitive to support nesting Jintack Lim
@ 2017-01-09  6:24 ` Jintack Lim
  2017-02-22 11:10   ` Christoffer Dall
  2017-01-09  6:24 ` [RFC 07/55] KVM: arm/arm64: Add virtual EL2 state emulation framework Jintack Lim
                   ` (50 subsequent siblings)
  56 siblings, 1 reply; 111+ messages in thread
From: Jintack Lim @ 2017-01-09  6:24 UTC (permalink / raw)
  To: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, vladimir.murzin, suzuki.poulose,
	mark.rutland, james.morse, lorenzo.pieralisi, kevin.brodsky,
	wcohen, shankerd, geoff, andre.przywara, eric.auger, anna-maria,
	shihwei, linux-arm-kernel, kvmarm, kvm, linux-kernel
  Cc: jintack

With nested virtualization support, the context of the guest includes
EL2 register state. The host manages a set of virtual EL2 registers.  In
addition, the guest hypervisor, which is supposed to run in EL2, is now
deprivileged and runs in EL1. So the host also manages a set of shadow
system registers to be able to run the guest hypervisor in EL1.

Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
 arch/arm64/include/asm/kvm_host.h | 54 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 54 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index c0c8b02..ed78d73 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -146,6 +146,42 @@ enum vcpu_sysreg {
 	NR_SYS_REGS	/* Nothing after this line! */
 };
 
+enum el2_regs {
+	ELR_EL2,
+	SPSR_EL2,
+	SP_EL2,
+	AMAIR_EL2,
+	MAIR_EL2,
+	TCR_EL2,
+	TTBR0_EL2,
+	VTCR_EL2,
+	VTTBR_EL2,
+	VMPIDR_EL2,
+	VPIDR_EL2,      /* 10 */
+	MDCR_EL2,
+	CNTHCTL_EL2,
+	CNTHP_CTL_EL2,
+	CNTHP_CVAL_EL2,
+	CNTHP_TVAL_EL2,
+	CNTVOFF_EL2,
+	ACTLR_EL2,
+	AFSR0_EL2,
+	AFSR1_EL2,
+	CPTR_EL2,       /* 20 */
+	ESR_EL2,
+	FAR_EL2,
+	HACR_EL2,
+	HCR_EL2,
+	HPFAR_EL2,
+	HSTR_EL2,
+	RMR_EL2,
+	RVBAR_EL2,
+	SCTLR_EL2,
+	TPIDR_EL2,      /* 30 */
+	VBAR_EL2,
+	NR_EL2_REGS     /* Nothing after this line! */
+};
+
 /* 32bit mapping */
 #define c0_MPIDR	(MPIDR_EL1 * 2)	/* MultiProcessor ID Register */
 #define c0_CSSELR	(CSSELR_EL1 * 2)/* Cache Size Selection Register */
@@ -193,6 +229,23 @@ struct kvm_cpu_context {
 		u64 sys_regs[NR_SYS_REGS];
 		u32 copro[NR_COPRO_REGS];
 	};
+
+	u64 el2_regs[NR_EL2_REGS];         /* only used for nesting */
+	u64 shadow_sys_regs[NR_SYS_REGS];  /* only used for virtual EL2 */
+
+	/*
+	 * hw_* will be used when switching to a VM. They point to either
+	 * the virtual EL2 or EL1/EL0 context depending on vcpu mode.
+	 */
+
+	/* pointing shadow_sys_regs or sys_regs */
+	u64 *hw_sys_regs;
+
+	/* copy of either gp_regs.sp_el1 or el2_regs[SP_EL2] */
+	u64 hw_sp_el1;
+
+	/* pstate written to SPSR_EL2 */
+	u64 hw_pstate;
 };
 
 typedef struct kvm_cpu_context kvm_cpu_context_t;
@@ -277,6 +330,7 @@ struct kvm_vcpu_arch {
 
 #define vcpu_gp_regs(v)		(&(v)->arch.ctxt.gp_regs)
 #define vcpu_sys_reg(v,r)	((v)->arch.ctxt.sys_regs[(r)])
+#define vcpu_el2_reg(v, r)	((v)->arch.ctxt.el2_regs[(r)])
 /*
  * CP14 and CP15 live in the same array, as they are backed by the
  * same system registers.
-- 
1.9.1


* [RFC 07/55] KVM: arm/arm64: Add virtual EL2 state emulation framework
  2017-01-09  6:23 [RFC 00/55] Nested Virtualization on KVM/ARM Jintack Lim
                   ` (5 preceding siblings ...)
  2017-01-09  6:24 ` [RFC 06/55] KVM: arm64: Add EL2 execution context for nesting Jintack Lim
@ 2017-01-09  6:24 ` Jintack Lim
  2017-02-22 11:12   ` Christoffer Dall
  2017-06-01 20:05   ` Bandan Das
  2017-01-09  6:24 ` [RFC 08/55] KVM: arm64: Set virtual EL2 context depending on the guest exception level Jintack Lim
                   ` (49 subsequent siblings)
  56 siblings, 2 replies; 111+ messages in thread
From: Jintack Lim @ 2017-01-09  6:24 UTC (permalink / raw)
  To: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, vladimir.murzin, suzuki.poulose,
	mark.rutland, james.morse, lorenzo.pieralisi, kevin.brodsky,
	wcohen, shankerd, geoff, andre.przywara, eric.auger, anna-maria,
	shihwei, linux-arm-kernel, kvmarm, kvm, linux-kernel
  Cc: jintack

From: Christoffer Dall <christoffer.dall@linaro.org>

Add a framework to set up the guest's context depending on the guest's
exception level. A chosen context is written to hardware in the lowvisor.
We don't set the virtual EL2 context yet.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
---
 arch/arm/include/asm/kvm_emulate.h   |   4 ++
 arch/arm/kvm/arm.c                   |   5 ++
 arch/arm64/include/asm/kvm_emulate.h |   4 ++
 arch/arm64/kvm/Makefile              |   2 +-
 arch/arm64/kvm/context.c             |  49 ++++++++++++++++
 arch/arm64/kvm/hyp/sysreg-sr.c       | 109 +++++++++++++++++++----------------
 6 files changed, 122 insertions(+), 51 deletions(-)
 create mode 100644 arch/arm64/kvm/context.c

diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
index 399cd75e..0a03b7d 100644
--- a/arch/arm/include/asm/kvm_emulate.h
+++ b/arch/arm/include/asm/kvm_emulate.h
@@ -47,6 +47,10 @@ static inline void vcpu_set_reg(struct kvm_vcpu *vcpu, u8 reg_num,
 void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long addr);
 void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr);
 
+static inline void kvm_arm_setup_shadow_state(struct kvm_vcpu *vcpu) { };
+static inline void kvm_arm_restore_shadow_state(struct kvm_vcpu *vcpu) { };
+static inline void kvm_arm_init_cpu_context(kvm_cpu_context_t *cpu_ctxt) { };
+
 static inline bool kvm_condition_valid(const struct kvm_vcpu *vcpu)
 {
 	return kvm_condition_valid32(vcpu);
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index d2dfa32..436bf5a 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -41,6 +41,7 @@
 #include <asm/virt.h>
 #include <asm/kvm_arm.h>
 #include <asm/kvm_asm.h>
+#include <asm/kvm_hyp.h>
 #include <asm/kvm_mmu.h>
 #include <asm/kvm_emulate.h>
 #include <asm/kvm_coproc.h>
@@ -646,6 +647,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		}
 
 		kvm_arm_setup_debug(vcpu);
+		kvm_arm_setup_shadow_state(vcpu);
 
 		/**************************************************************
 		 * Enter the guest
@@ -662,6 +664,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		 * Back from guest
 		 *************************************************************/
 
+		kvm_arm_restore_shadow_state(vcpu);
 		kvm_arm_clear_debug(vcpu);
 
 		/*
@@ -1369,6 +1372,8 @@ static int init_hyp_mode(void)
 			kvm_err("Cannot map host CPU state: %d\n", err);
 			goto out_err;
 		}
+
+		kvm_arm_init_cpu_context(cpu_ctxt);
 	}
 
 	kvm_info("Hyp mode initialized successfully\n");
diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 830be2e..8892c82 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -42,6 +42,10 @@
 void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long addr);
 void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr);
 
+void kvm_arm_setup_shadow_state(struct kvm_vcpu *vcpu);
+void kvm_arm_restore_shadow_state(struct kvm_vcpu *vcpu);
+void kvm_arm_init_cpu_context(kvm_cpu_context_t *cpu_ctxt);
+
 static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
 {
 	vcpu->arch.hcr_el2 = HCR_GUEST_FLAGS;
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index d50a82a..7811d27 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -16,7 +16,7 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/e
 kvm-$(CONFIG_KVM_ARM_HOST) += $(ARM)/arm.o $(ARM)/mmu.o $(ARM)/mmio.o
 kvm-$(CONFIG_KVM_ARM_HOST) += $(ARM)/psci.o $(ARM)/perf.o
 
-kvm-$(CONFIG_KVM_ARM_HOST) += inject_fault.o regmap.o
+kvm-$(CONFIG_KVM_ARM_HOST) += inject_fault.o regmap.o context.o
 kvm-$(CONFIG_KVM_ARM_HOST) += hyp.o hyp-init.o handle_exit.o
 kvm-$(CONFIG_KVM_ARM_HOST) += guest.o debug.o reset.o sys_regs.o sys_regs_generic_v8.o
 kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/aarch32.o
diff --git a/arch/arm64/kvm/context.c b/arch/arm64/kvm/context.c
new file mode 100644
index 0000000..320afc6
--- /dev/null
+++ b/arch/arm64/kvm/context.c
@@ -0,0 +1,49 @@
+/*
+ * Copyright (C) 2016 - Linaro Ltd.
+ * Author: Christoffer Dall <christoffer.dall@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/kvm_host.h>
+#include <asm/kvm_emulate.h>
+
+/**
+ * kvm_arm_setup_shadow_state -- prepare shadow state based on emulated mode
+ * @vcpu: The VCPU pointer
+ */
+void kvm_arm_setup_shadow_state(struct kvm_vcpu *vcpu)
+{
+	struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
+
+	ctxt->hw_pstate = *vcpu_cpsr(vcpu);
+	ctxt->hw_sys_regs = ctxt->sys_regs;
+	ctxt->hw_sp_el1 = ctxt->gp_regs.sp_el1;
+}
+
+/**
+ * kvm_arm_restore_shadow_state -- write back shadow state from guest
+ * @vcpu: The VCPU pointer
+ */
+void kvm_arm_restore_shadow_state(struct kvm_vcpu *vcpu)
+{
+	struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
+
+	*vcpu_cpsr(vcpu) = ctxt->hw_pstate;
+	ctxt->gp_regs.sp_el1 = ctxt->hw_sp_el1;
+}
+
+void kvm_arm_init_cpu_context(kvm_cpu_context_t *cpu_ctxt)
+{
+	cpu_ctxt->hw_sys_regs = &cpu_ctxt->sys_regs[0];
+}
diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
index 9341376..f2a1b32 100644
--- a/arch/arm64/kvm/hyp/sysreg-sr.c
+++ b/arch/arm64/kvm/hyp/sysreg-sr.c
@@ -19,6 +19,7 @@
 #include <linux/kvm_host.h>
 
 #include <asm/kvm_asm.h>
+#include <asm/kvm_emulate.h>
 #include <asm/kvm_hyp.h>
 
 /* Yes, this does nothing, on purpose */
@@ -33,37 +34,41 @@ static void __hyp_text __sysreg_do_nothing(struct kvm_cpu_context *ctxt) { }
 
 static void __hyp_text __sysreg_save_common_state(struct kvm_cpu_context *ctxt)
 {
-	ctxt->sys_regs[ACTLR_EL1]	= read_sysreg(actlr_el1);
-	ctxt->sys_regs[TPIDR_EL0]	= read_sysreg(tpidr_el0);
-	ctxt->sys_regs[TPIDRRO_EL0]	= read_sysreg(tpidrro_el0);
-	ctxt->sys_regs[TPIDR_EL1]	= read_sysreg(tpidr_el1);
-	ctxt->sys_regs[MDSCR_EL1]	= read_sysreg(mdscr_el1);
+	u64 *sys_regs = kern_hyp_va(ctxt->hw_sys_regs);
+
+	sys_regs[ACTLR_EL1]	= read_sysreg(actlr_el1);
+	sys_regs[TPIDR_EL0]	= read_sysreg(tpidr_el0);
+	sys_regs[TPIDRRO_EL0]	= read_sysreg(tpidrro_el0);
+	sys_regs[TPIDR_EL1]	= read_sysreg(tpidr_el1);
+	sys_regs[MDSCR_EL1]	= read_sysreg(mdscr_el1);
 	ctxt->gp_regs.regs.sp		= read_sysreg(sp_el0);
 	ctxt->gp_regs.regs.pc		= read_sysreg_el2(elr);
-	ctxt->gp_regs.regs.pstate	= read_sysreg_el2(spsr);
+	ctxt->hw_pstate			= read_sysreg_el2(spsr);
 }
 
 static void __hyp_text __sysreg_save_state(struct kvm_cpu_context *ctxt)
 {
-	ctxt->sys_regs[MPIDR_EL1]	= read_sysreg(vmpidr_el2);
-	ctxt->sys_regs[CSSELR_EL1]	= read_sysreg(csselr_el1);
-	ctxt->sys_regs[SCTLR_EL1]	= read_sysreg_el1(sctlr);
-	ctxt->sys_regs[CPACR_EL1]	= read_sysreg_el1(cpacr);
-	ctxt->sys_regs[TTBR0_EL1]	= read_sysreg_el1(ttbr0);
-	ctxt->sys_regs[TTBR1_EL1]	= read_sysreg_el1(ttbr1);
-	ctxt->sys_regs[TCR_EL1]		= read_sysreg_el1(tcr);
-	ctxt->sys_regs[ESR_EL1]		= read_sysreg_el1(esr);
-	ctxt->sys_regs[AFSR0_EL1]	= read_sysreg_el1(afsr0);
-	ctxt->sys_regs[AFSR1_EL1]	= read_sysreg_el1(afsr1);
-	ctxt->sys_regs[FAR_EL1]		= read_sysreg_el1(far);
-	ctxt->sys_regs[MAIR_EL1]	= read_sysreg_el1(mair);
-	ctxt->sys_regs[VBAR_EL1]	= read_sysreg_el1(vbar);
-	ctxt->sys_regs[CONTEXTIDR_EL1]	= read_sysreg_el1(contextidr);
-	ctxt->sys_regs[AMAIR_EL1]	= read_sysreg_el1(amair);
-	ctxt->sys_regs[CNTKCTL_EL1]	= read_sysreg_el1(cntkctl);
-	ctxt->sys_regs[PAR_EL1]		= read_sysreg(par_el1);
-
-	ctxt->gp_regs.sp_el1		= read_sysreg(sp_el1);
+	u64 *sys_regs = kern_hyp_va(ctxt->hw_sys_regs);
+
+	sys_regs[MPIDR_EL1]	= read_sysreg(vmpidr_el2);
+	sys_regs[CSSELR_EL1]	= read_sysreg(csselr_el1);
+	sys_regs[SCTLR_EL1]	= read_sysreg_el1(sctlr);
+	sys_regs[CPACR_EL1]	= read_sysreg_el1(cpacr);
+	sys_regs[TTBR0_EL1]	= read_sysreg_el1(ttbr0);
+	sys_regs[TTBR1_EL1]	= read_sysreg_el1(ttbr1);
+	sys_regs[TCR_EL1]	= read_sysreg_el1(tcr);
+	sys_regs[ESR_EL1]	= read_sysreg_el1(esr);
+	sys_regs[AFSR0_EL1]	= read_sysreg_el1(afsr0);
+	sys_regs[AFSR1_EL1]	= read_sysreg_el1(afsr1);
+	sys_regs[FAR_EL1]	= read_sysreg_el1(far);
+	sys_regs[MAIR_EL1]	= read_sysreg_el1(mair);
+	sys_regs[VBAR_EL1]	= read_sysreg_el1(vbar);
+	sys_regs[CONTEXTIDR_EL1]	= read_sysreg_el1(contextidr);
+	sys_regs[AMAIR_EL1]	= read_sysreg_el1(amair);
+	sys_regs[CNTKCTL_EL1]	= read_sysreg_el1(cntkctl);
+	sys_regs[PAR_EL1]		= read_sysreg(par_el1);
+
+	ctxt->hw_sp_el1			= read_sysreg(sp_el1);
 	ctxt->gp_regs.elr_el1		= read_sysreg_el1(elr);
 	ctxt->gp_regs.spsr[KVM_SPSR_EL1]= read_sysreg_el1(spsr);
 }
@@ -86,37 +91,41 @@ void __hyp_text __sysreg_save_guest_state(struct kvm_cpu_context *ctxt)
 
 static void __hyp_text __sysreg_restore_common_state(struct kvm_cpu_context *ctxt)
 {
-	write_sysreg(ctxt->sys_regs[ACTLR_EL1],	  actlr_el1);
-	write_sysreg(ctxt->sys_regs[TPIDR_EL0],	  tpidr_el0);
-	write_sysreg(ctxt->sys_regs[TPIDRRO_EL0], tpidrro_el0);
-	write_sysreg(ctxt->sys_regs[TPIDR_EL1],	  tpidr_el1);
-	write_sysreg(ctxt->sys_regs[MDSCR_EL1],	  mdscr_el1);
+	u64 *sys_regs = kern_hyp_va(ctxt->hw_sys_regs);
+
+	write_sysreg(sys_regs[ACTLR_EL1],	  actlr_el1);
+	write_sysreg(sys_regs[TPIDR_EL0],	  tpidr_el0);
+	write_sysreg(sys_regs[TPIDRRO_EL0],	tpidrro_el0);
+	write_sysreg(sys_regs[TPIDR_EL1],	  tpidr_el1);
+	write_sysreg(sys_regs[MDSCR_EL1],	  mdscr_el1);
 	write_sysreg(ctxt->gp_regs.regs.sp,	  sp_el0);
 	write_sysreg_el2(ctxt->gp_regs.regs.pc,	  elr);
-	write_sysreg_el2(ctxt->gp_regs.regs.pstate, spsr);
+	write_sysreg_el2(ctxt->hw_pstate,	  spsr);
 }
 
 static void __hyp_text __sysreg_restore_state(struct kvm_cpu_context *ctxt)
 {
-	write_sysreg(ctxt->sys_regs[MPIDR_EL1],		vmpidr_el2);
-	write_sysreg(ctxt->sys_regs[CSSELR_EL1],	csselr_el1);
-	write_sysreg_el1(ctxt->sys_regs[SCTLR_EL1],	sctlr);
-	write_sysreg_el1(ctxt->sys_regs[CPACR_EL1],	cpacr);
-	write_sysreg_el1(ctxt->sys_regs[TTBR0_EL1],	ttbr0);
-	write_sysreg_el1(ctxt->sys_regs[TTBR1_EL1],	ttbr1);
-	write_sysreg_el1(ctxt->sys_regs[TCR_EL1],	tcr);
-	write_sysreg_el1(ctxt->sys_regs[ESR_EL1],	esr);
-	write_sysreg_el1(ctxt->sys_regs[AFSR0_EL1],	afsr0);
-	write_sysreg_el1(ctxt->sys_regs[AFSR1_EL1],	afsr1);
-	write_sysreg_el1(ctxt->sys_regs[FAR_EL1],	far);
-	write_sysreg_el1(ctxt->sys_regs[MAIR_EL1],	mair);
-	write_sysreg_el1(ctxt->sys_regs[VBAR_EL1],	vbar);
-	write_sysreg_el1(ctxt->sys_regs[CONTEXTIDR_EL1],contextidr);
-	write_sysreg_el1(ctxt->sys_regs[AMAIR_EL1],	amair);
-	write_sysreg_el1(ctxt->sys_regs[CNTKCTL_EL1], 	cntkctl);
-	write_sysreg(ctxt->sys_regs[PAR_EL1],		par_el1);
-
-	write_sysreg(ctxt->gp_regs.sp_el1,		sp_el1);
+	u64 *sys_regs = kern_hyp_va(ctxt->hw_sys_regs);
+
+	write_sysreg(sys_regs[MPIDR_EL1],	vmpidr_el2);
+	write_sysreg(sys_regs[CSSELR_EL1],	csselr_el1);
+	write_sysreg_el1(sys_regs[SCTLR_EL1],	sctlr);
+	write_sysreg_el1(sys_regs[CPACR_EL1],	cpacr);
+	write_sysreg_el1(sys_regs[TTBR0_EL1],	ttbr0);
+	write_sysreg_el1(sys_regs[TTBR1_EL1],	ttbr1);
+	write_sysreg_el1(sys_regs[TCR_EL1],	tcr);
+	write_sysreg_el1(sys_regs[ESR_EL1],	esr);
+	write_sysreg_el1(sys_regs[AFSR0_EL1],	afsr0);
+	write_sysreg_el1(sys_regs[AFSR1_EL1],	afsr1);
+	write_sysreg_el1(sys_regs[FAR_EL1],	far);
+	write_sysreg_el1(sys_regs[MAIR_EL1],	mair);
+	write_sysreg_el1(sys_regs[VBAR_EL1],	vbar);
+	write_sysreg_el1(sys_regs[CONTEXTIDR_EL1], contextidr);
+	write_sysreg_el1(sys_regs[AMAIR_EL1],	amair);
+	write_sysreg_el1(sys_regs[CNTKCTL_EL1], cntkctl);
+	write_sysreg(sys_regs[PAR_EL1],		par_el1);
+
+	write_sysreg(ctxt->hw_sp_el1,			sp_el1);
 	write_sysreg_el1(ctxt->gp_regs.elr_el1,		elr);
 	write_sysreg_el1(ctxt->gp_regs.spsr[KVM_SPSR_EL1],spsr);
 }
-- 
1.9.1


* [RFC 08/55] KVM: arm64: Set virtual EL2 context depending on the guest exception level
  2017-01-09  6:23 [RFC 00/55] Nested Virtualization on KVM/ARM Jintack Lim
                   ` (6 preceding siblings ...)
  2017-01-09  6:24 ` [RFC 07/55] KVM: arm/arm64: Add virtual EL2 state emulation framework Jintack Lim
@ 2017-01-09  6:24 ` Jintack Lim
  2017-02-22 11:14   ` Christoffer Dall
  2017-06-01 20:22   ` Bandan Das
  2017-01-09  6:24 ` [RFC 09/55] KVM: arm64: Set shadow EL1 registers for virtual EL2 execution Jintack Lim
                   ` (48 subsequent siblings)
  56 siblings, 2 replies; 111+ messages in thread
From: Jintack Lim @ 2017-01-09  6:24 UTC (permalink / raw)
  To: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, vladimir.murzin, suzuki.poulose,
	mark.rutland, james.morse, lorenzo.pieralisi, kevin.brodsky,
	wcohen, shankerd, geoff, andre.przywara, eric.auger, anna-maria,
	shihwei, linux-arm-kernel, kvmarm, kvm, linux-kernel
  Cc: jintack

From: Christoffer Dall <christoffer.dall@linaro.org>

Set up the virtual EL2 context in hardware if the guest exception level
is EL2.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
---
 arch/arm64/kvm/context.c | 32 ++++++++++++++++++++++++++------
 1 file changed, 26 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/kvm/context.c b/arch/arm64/kvm/context.c
index 320afc6..acb4b1e 100644
--- a/arch/arm64/kvm/context.c
+++ b/arch/arm64/kvm/context.c
@@ -25,10 +25,25 @@
 void kvm_arm_setup_shadow_state(struct kvm_vcpu *vcpu)
 {
 	struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
+	if (unlikely(vcpu_mode_el2(vcpu))) {
+		ctxt->hw_pstate = *vcpu_cpsr(vcpu) & ~PSR_MODE_MASK;
 
-	ctxt->hw_pstate = *vcpu_cpsr(vcpu);
-	ctxt->hw_sys_regs = ctxt->sys_regs;
-	ctxt->hw_sp_el1 = ctxt->gp_regs.sp_el1;
+		/*
+		 * We emulate virtual EL2 mode in hardware EL1 mode using the
+		 * same stack pointer mode as the guest expects.
+		 */
+		if ((*vcpu_cpsr(vcpu) & PSR_MODE_MASK) == PSR_MODE_EL2h)
+			ctxt->hw_pstate |= PSR_MODE_EL1h;
+		else
+			ctxt->hw_pstate |= PSR_MODE_EL1t;
+
+		ctxt->hw_sys_regs = ctxt->shadow_sys_regs;
+		ctxt->hw_sp_el1 = ctxt->el2_regs[SP_EL2];
+	} else {
+		ctxt->hw_pstate = *vcpu_cpsr(vcpu);
+		ctxt->hw_sys_regs = ctxt->sys_regs;
+		ctxt->hw_sp_el1 = ctxt->gp_regs.sp_el1;
+	}
 }
 
 /**
@@ -38,9 +53,14 @@ void kvm_arm_setup_shadow_state(struct kvm_vcpu *vcpu)
 void kvm_arm_restore_shadow_state(struct kvm_vcpu *vcpu)
 {
 	struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
-
-	*vcpu_cpsr(vcpu) = ctxt->hw_pstate;
-	ctxt->gp_regs.sp_el1 = ctxt->hw_sp_el1;
+	if (unlikely(vcpu_mode_el2(vcpu))) {
+		*vcpu_cpsr(vcpu) &= PSR_MODE_MASK;
+		*vcpu_cpsr(vcpu) |= ctxt->hw_pstate & ~PSR_MODE_MASK;
+		ctxt->el2_regs[SP_EL2] = ctxt->hw_sp_el1;
+	} else {
+		*vcpu_cpsr(vcpu) = ctxt->hw_pstate;
+		ctxt->gp_regs.sp_el1 = ctxt->hw_sp_el1;
+	}
 }
 
 void kvm_arm_init_cpu_context(kvm_cpu_context_t *cpu_ctxt)
-- 
1.9.1


* [RFC 09/55] KVM: arm64: Set shadow EL1 registers for virtual EL2 execution
  2017-01-09  6:23 [RFC 00/55] Nested Virtualization on KVM/ARM Jintack Lim
                   ` (7 preceding siblings ...)
  2017-01-09  6:24 ` [RFC 08/55] KVM: arm64: Set virtual EL2 context depending on the guest exception level Jintack Lim
@ 2017-01-09  6:24 ` Jintack Lim
  2017-02-22 11:19   ` Christoffer Dall
  2017-01-09  6:24 ` [RFC 10/55] KVM: arm64: Synchronize EL1 system registers on virtual EL2 entry and exit Jintack Lim
                   ` (47 subsequent siblings)
  56 siblings, 1 reply; 111+ messages in thread
From: Jintack Lim @ 2017-01-09  6:24 UTC (permalink / raw)
  To: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, vladimir.murzin, suzuki.poulose,
	mark.rutland, james.morse, lorenzo.pieralisi, kevin.brodsky,
	wcohen, shankerd, geoff, andre.przywara, eric.auger, anna-maria,
	shihwei, linux-arm-kernel, kvmarm, kvm, linux-kernel
  Cc: jintack

From: Christoffer Dall <christoffer.dall@linaro.org>

When entering virtual EL2, we need to reflect virtual EL2 register
state into the corresponding shadow EL1 registers. We can simply copy
them if their formats are identical.  Otherwise, we need to convert the
EL2 register state to EL1 register state.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
---
 arch/arm64/kvm/context.c | 71 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 71 insertions(+)

diff --git a/arch/arm64/kvm/context.c b/arch/arm64/kvm/context.c
index acb4b1e..2e9e386 100644
--- a/arch/arm64/kvm/context.c
+++ b/arch/arm64/kvm/context.c
@@ -17,6 +17,76 @@
 
 #include <linux/kvm_host.h>
 #include <asm/kvm_emulate.h>
+#include <asm/esr.h>
+
+struct el1_el2_map {
+	enum vcpu_sysreg	el1;
+	enum el2_regs		el2;
+};
+
+/*
+ * List of EL2 registers which can be directly applied to EL1 registers to
+ * emulate running EL2 in EL1.  The EL1 registers here must either be trapped
+ * or paravirtualized in EL1.
+ */
+static const struct el1_el2_map el1_el2_map[] = {
+	{ AMAIR_EL1, AMAIR_EL2 },
+	{ MAIR_EL1, MAIR_EL2 },
+	{ TTBR0_EL1, TTBR0_EL2 },
+	{ ACTLR_EL1, ACTLR_EL2 },
+	{ AFSR0_EL1, AFSR0_EL2 },
+	{ AFSR1_EL1, AFSR1_EL2 },
+	{ SCTLR_EL1, SCTLR_EL2 },
+	{ VBAR_EL1, VBAR_EL2 },
+};
+
+static inline u64 tcr_el2_ips_to_tcr_el1_ps(u64 tcr_el2)
+{
+	return ((tcr_el2 & TCR_EL2_PS_MASK) >> TCR_EL2_PS_SHIFT)
+		<< TCR_IPS_SHIFT;
+}
+
+static inline u64 cptr_el2_to_cpacr_el1(u64 cptr_el2)
+{
+	u64 cpacr_el1 = 0;
+
+	if (!(cptr_el2 & CPTR_EL2_TFP))
+		cpacr_el1 |= CPACR_EL1_FPEN;
+	if (cptr_el2 & CPTR_EL2_TTA)
+		cpacr_el1 |= CPACR_EL1_TTA;
+
+	return cpacr_el1;
+}
+
+static void create_shadow_el1_sysregs(struct kvm_vcpu *vcpu)
+{
+	u64 *s_sys_regs = vcpu->arch.ctxt.shadow_sys_regs;
+	u64 *el2_regs = vcpu->arch.ctxt.el2_regs;
+	u64 tcr_el2;
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(el1_el2_map); i++) {
+		const struct el1_el2_map *map = &el1_el2_map[i];
+
+		s_sys_regs[map->el1] = el2_regs[map->el2];
+	}
+
+	tcr_el2 = el2_regs[TCR_EL2];
+	s_sys_regs[TCR_EL1] =
+		TCR_EPD1 |	/* disable TTBR1_EL1 */
+		((tcr_el2 & TCR_EL2_TBI) ? TCR_TBI0 : 0) |
+		tcr_el2_ips_to_tcr_el1_ps(tcr_el2) |
+		(tcr_el2 & TCR_EL2_TG0_MASK) |
+		(tcr_el2 & TCR_EL2_ORGN0_MASK) |
+		(tcr_el2 & TCR_EL2_IRGN0_MASK) |
+		(tcr_el2 & TCR_EL2_T0SZ_MASK);
+
+	/* Rely on separate VMID for VA context, always use ASID 0 */
+	s_sys_regs[TTBR0_EL1] &= ~GENMASK_ULL(63, 48);
+	s_sys_regs[TTBR1_EL1] = 0;
+
+	s_sys_regs[CPACR_EL1] = cptr_el2_to_cpacr_el1(el2_regs[CPTR_EL2]);
+}
 
 /**
  * kvm_arm_setup_shadow_state -- prepare shadow state based on emulated mode
@@ -37,6 +107,7 @@ void kvm_arm_setup_shadow_state(struct kvm_vcpu *vcpu)
 		else
 			ctxt->hw_pstate |= PSR_MODE_EL1t;
 
+		create_shadow_el1_sysregs(vcpu);
 		ctxt->hw_sys_regs = ctxt->shadow_sys_regs;
 		ctxt->hw_sp_el1 = ctxt->el2_regs[SP_EL2];
 	} else {
-- 
1.9.1


* [RFC 10/55] KVM: arm64: Synchronize EL1 system registers on virtual EL2 entry and exit
  2017-01-09  6:23 [RFC 00/55] Nested Virtualization on KVM/ARM Jintack Lim
                   ` (8 preceding siblings ...)
  2017-01-09  6:24 ` [RFC 09/55] KVM: arm64: Set shadow EL1 registers for virtual EL2 execution Jintack Lim
@ 2017-01-09  6:24 ` Jintack Lim
  2017-06-06 20:16   ` Bandan Das
  2017-01-09  6:24 ` [RFC 11/55] KVM: arm64: Emulate taking an exception to the guest hypervisor Jintack Lim
                   ` (46 subsequent siblings)
  56 siblings, 1 reply; 111+ messages in thread
From: Jintack Lim @ 2017-01-09  6:24 UTC (permalink / raw)
  To: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, vladimir.murzin, suzuki.poulose,
	mark.rutland, james.morse, lorenzo.pieralisi, kevin.brodsky,
	wcohen, shankerd, geoff, andre.przywara, eric.auger, anna-maria,
	shihwei, linux-arm-kernel, kvmarm, kvm, linux-kernel
  Cc: jintack

From: Christoffer Dall <christoffer.dall@linaro.org>

When running in virtual EL2 we use the shadow EL1 system register array
for the save/restore process, so that hardware and especially the memory
subsystem behaves as code written for EL2 expects while really running
in EL1.

This works great for EL1 system register accesses that we trap, because
these accesses will be written into the virtual state for the EL1 system
registers used when eventually switching the VCPU mode to EL1.

However, there is a collection of EL1 system registers which we do not
trap, and as a consequence all save/restore operations of these
registers happen locally in the shadow array, with no benefit to
software actually running in virtual EL1 at all.

To fix this, simply synchronize the shadow and real EL1 state for these
registers on entry/exit to/from virtual EL2 state.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
---
 arch/arm64/kvm/context.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 47 insertions(+)

diff --git a/arch/arm64/kvm/context.c b/arch/arm64/kvm/context.c
index 2e9e386..0025dd9 100644
--- a/arch/arm64/kvm/context.c
+++ b/arch/arm64/kvm/context.c
@@ -88,6 +88,51 @@ static void create_shadow_el1_sysregs(struct kvm_vcpu *vcpu)
 	s_sys_regs[CPACR_EL1] = cptr_el2_to_cpacr_el1(el2_regs[CPTR_EL2]);
 }
 
+/*
+ * List of EL1 registers which we allow the virtual EL2 mode to access
+ * directly without trapping and which haven't been paravirtualized.
+ *
+ * CNTKCTL_EL1 should probably not be copied but accessed via trap, because
+ * the guest hypervisor running in EL1 can be affected by event streams
+ * configured via CNTKCTL_EL1, which it does not expect. We don't have a
+ * mechanism to trap on CNTKCTL_EL1 as of now (v8.3), so keep it in here instead.
+ */
+static const int el1_non_trap_regs[] = {
+	CNTKCTL_EL1,
+	CSSELR_EL1,
+	PAR_EL1,
+	TPIDR_EL0,
+	TPIDR_EL1,
+	TPIDRRO_EL0
+};
+
+/**
+ * sync_shadow_el1_state - Going to/from the virtual EL2 state, sync state
+ * @vcpu:	The VCPU pointer
+ * @setup:	True, if on the way to the guest (called from setup)
+ *		False, if returning from the guest (called from restore)
+ *
+ * Some EL1 registers are accessed directly by the virtual EL2 mode because
+ * they in no way affect execution state in virtual EL2.   However, we must
+ * still ensure that virtual EL2 observes the same state of the EL1 registers
+ * as the normal VM's EL1 mode, so copy this state as needed on setup/restore.
+ */
+static void sync_shadow_el1_state(struct kvm_vcpu *vcpu, bool setup)
+{
+	u64 *sys_regs = vcpu->arch.ctxt.sys_regs;
+	u64 *s_sys_regs = vcpu->arch.ctxt.shadow_sys_regs;
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(el1_non_trap_regs); i++) {
+		const int sr = el1_non_trap_regs[i];
+
+		if (setup)
+			s_sys_regs[sr] = sys_regs[sr];
+		else
+			sys_regs[sr] = s_sys_regs[sr];
+	}
+}
+
 /**
  * kvm_arm_setup_shadow_state -- prepare shadow state based on emulated mode
  * @vcpu: The VCPU pointer
@@ -107,6 +152,7 @@ void kvm_arm_setup_shadow_state(struct kvm_vcpu *vcpu)
 		else
 			ctxt->hw_pstate |= PSR_MODE_EL1t;
 
+		sync_shadow_el1_state(vcpu, true);
 		create_shadow_el1_sysregs(vcpu);
 		ctxt->hw_sys_regs = ctxt->shadow_sys_regs;
 		ctxt->hw_sp_el1 = ctxt->el2_regs[SP_EL2];
@@ -125,6 +171,7 @@ void kvm_arm_restore_shadow_state(struct kvm_vcpu *vcpu)
 {
 	struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
 	if (unlikely(vcpu_mode_el2(vcpu))) {
+		sync_shadow_el1_state(vcpu, false);
 		*vcpu_cpsr(vcpu) &= PSR_MODE_MASK;
 		*vcpu_cpsr(vcpu) |= ctxt->hw_pstate & ~PSR_MODE_MASK;
 		ctxt->el2_regs[SP_EL2] = ctxt->hw_sp_el1;
-- 
1.9.1


* [RFC 11/55] KVM: arm64: Emulate taking an exception to the guest hypervisor
  2017-01-09  6:23 [RFC 00/55] Nested Virtualization on KVM/ARM Jintack Lim
                   ` (9 preceding siblings ...)
  2017-01-09  6:24 ` [RFC 10/55] KVM: arm64: Synchronize EL1 system registers on virtual EL2 entry and exit Jintack Lim
@ 2017-01-09  6:24 ` Jintack Lim
  2017-02-22 11:28   ` Christoffer Dall
  2017-06-06 20:21   ` Bandan Das
  2017-01-09  6:24 ` [RFC 12/55] KVM: arm64: Handle EL2 register access traps Jintack Lim
                   ` (45 subsequent siblings)
  56 siblings, 2 replies; 111+ messages in thread
From: Jintack Lim @ 2017-01-09  6:24 UTC (permalink / raw)
  To: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, vladimir.murzin, suzuki.poulose,
	mark.rutland, james.morse, lorenzo.pieralisi, kevin.brodsky,
	wcohen, shankerd, geoff, andre.przywara, eric.auger, anna-maria,
	shihwei, linux-arm-kernel, kvmarm, kvm, linux-kernel
  Cc: jintack

Emulate taking an exception to the guest hypervisor running in
virtual EL2, as described in ARM ARM AArch64.TakeException().

Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
---
 arch/arm/include/asm/kvm_emulate.h   | 14 ++++++++
 arch/arm64/include/asm/kvm_emulate.h | 19 +++++++++++
 arch/arm64/kvm/Makefile              |  2 ++
 arch/arm64/kvm/emulate-nested.c      | 66 ++++++++++++++++++++++++++++++++++++
 arch/arm64/kvm/trace.h               | 20 +++++++++++
 5 files changed, 121 insertions(+)
 create mode 100644 arch/arm64/kvm/emulate-nested.c

diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
index 0a03b7d..0fa2f5a 100644
--- a/arch/arm/include/asm/kvm_emulate.h
+++ b/arch/arm/include/asm/kvm_emulate.h
@@ -47,6 +47,20 @@ static inline void vcpu_set_reg(struct kvm_vcpu *vcpu, u8 reg_num,
 void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long addr);
 void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr);
 
+static inline int kvm_inject_nested_sync(struct kvm_vcpu *vcpu, u64 esr_el2)
+{
+	kvm_err("Unexpected call to %s for the non-nesting configuration\n",
+		 __func__);
+	return -EINVAL;
+}
+
+static inline int kvm_inject_nested_irq(struct kvm_vcpu *vcpu)
+{
+	kvm_err("Unexpected call to %s for the non-nesting configuration\n",
+		 __func__);
+	return -EINVAL;
+}
+
 static inline void kvm_arm_setup_shadow_state(struct kvm_vcpu *vcpu) { };
 static inline void kvm_arm_restore_shadow_state(struct kvm_vcpu *vcpu) { };
 static inline void kvm_arm_init_cpu_context(kvm_cpu_context_t *cpu_ctxt) { };
diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 8892c82..0987ee4 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -42,6 +42,25 @@
 void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long addr);
 void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr);
 
+#ifdef CONFIG_KVM_ARM_NESTED_HYP
+int kvm_inject_nested_sync(struct kvm_vcpu *vcpu, u64 esr_el2);
+int kvm_inject_nested_irq(struct kvm_vcpu *vcpu);
+#else
+static inline int kvm_inject_nested_sync(struct kvm_vcpu *vcpu, u64 esr_el2)
+{
+	kvm_err("Unexpected call to %s for the non-nesting configuration\n",
+		 __func__);
+	return -EINVAL;
+}
+
+static inline int kvm_inject_nested_irq(struct kvm_vcpu *vcpu)
+{
+	kvm_err("Unexpected call to %s for the non-nesting configuration\n",
+		 __func__);
+	return -EINVAL;
+}
+#endif
+
 void kvm_arm_setup_shadow_state(struct kvm_vcpu *vcpu);
 void kvm_arm_restore_shadow_state(struct kvm_vcpu *vcpu);
 void kvm_arm_init_cpu_context(kvm_cpu_context_t *cpu_ctxt);
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index 7811d27..b342bdd 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -34,3 +34,5 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic/vgic-its.o
 kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/irqchip.o
 kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/arch_timer.o
 kvm-$(CONFIG_KVM_ARM_PMU) += $(KVM)/arm/pmu.o
+
+kvm-$(CONFIG_KVM_ARM_NESTED_HYP) += emulate-nested.o
diff --git a/arch/arm64/kvm/emulate-nested.c b/arch/arm64/kvm/emulate-nested.c
new file mode 100644
index 0000000..59d147f
--- /dev/null
+++ b/arch/arm64/kvm/emulate-nested.c
@@ -0,0 +1,66 @@
+/*
+ * Copyright (C) 2016 - Columbia University
+ * Author: Jintack Lim <jintack@cs.columbia.edu>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/kvm.h>
+#include <linux/kvm_host.h>
+
+#include <asm/kvm_emulate.h>
+
+#include "trace.h"
+
+#define	EL2_EXCEPT_SYNC_OFFSET	0x400
+#define	EL2_EXCEPT_ASYNC_OFFSET	0x480
+
+
+/*
+ *  Emulate taking an exception. See ARM ARM J8.1.2 AArch64.TakeException()
+ */
+static int kvm_inject_nested(struct kvm_vcpu *vcpu, u64 esr_el2,
+			     int exception_offset)
+{
+	int ret = 1;
+	kvm_cpu_context_t *ctxt = &vcpu->arch.ctxt;
+
+	/* We don't inject an exception recursively to virtual EL2 */
+	if (vcpu_mode_el2(vcpu))
+		BUG();
+
+	ctxt->el2_regs[SPSR_EL2] = *vcpu_cpsr(vcpu);
+	ctxt->el2_regs[ELR_EL2] = *vcpu_pc(vcpu);
+	ctxt->el2_regs[ESR_EL2] = esr_el2;
+
+	/* On an exception, PSTATE.SP = 1 */
+	*vcpu_cpsr(vcpu) = PSR_MODE_EL2h;
+	*vcpu_cpsr(vcpu) |= (PSR_A_BIT | PSR_F_BIT | PSR_I_BIT | PSR_D_BIT);
+	*vcpu_pc(vcpu) = ctxt->el2_regs[VBAR_EL2] + exception_offset;
+
+	trace_kvm_inject_nested_exception(vcpu, esr_el2, *vcpu_pc(vcpu));
+
+	return ret;
+}
+
+int kvm_inject_nested_sync(struct kvm_vcpu *vcpu, u64 esr_el2)
+{
+	return kvm_inject_nested(vcpu, esr_el2, EL2_EXCEPT_SYNC_OFFSET);
+}
+
+int kvm_inject_nested_irq(struct kvm_vcpu *vcpu)
+{
+	u64 esr_el2 = kvm_vcpu_get_hsr(vcpu);
+	/* Only IRQ and FIQ are supported, and neither updates ESR_EL2. */
+	return kvm_inject_nested(vcpu, esr_el2, EL2_EXCEPT_ASYNC_OFFSET);
+}
diff --git a/arch/arm64/kvm/trace.h b/arch/arm64/kvm/trace.h
index 7fb0008..7c86cfb 100644
--- a/arch/arm64/kvm/trace.h
+++ b/arch/arm64/kvm/trace.h
@@ -167,6 +167,26 @@
 );
 
 
+TRACE_EVENT(kvm_inject_nested_exception,
+	TP_PROTO(struct kvm_vcpu *vcpu, unsigned long esr_el2,
+		 unsigned long pc),
+	TP_ARGS(vcpu, esr_el2, pc),
+
+	TP_STRUCT__entry(
+		__field(struct kvm_vcpu *,	vcpu)
+		__field(unsigned long,		esr_el2)
+		__field(unsigned long,		pc)
+	),
+
+	TP_fast_assign(
+		__entry->vcpu = vcpu;
+		__entry->esr_el2 = esr_el2;
+		__entry->pc = pc;
+	),
+
+	TP_printk("vcpu: %p, inject exception to vEL2: ESR_EL2 0x%lx, vector: 0x%016lx",
+		  __entry->vcpu, __entry->esr_el2, __entry->pc)
+);
 #endif /* _TRACE_ARM64_KVM_H */
 
 #undef TRACE_INCLUDE_PATH
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [RFC 12/55] KVM: arm64: Handle EL2 register access traps
  2017-01-09  6:23 [RFC 00/55] Nested Virtualization on KVM/ARM Jintack Lim
                   ` (10 preceding siblings ...)
  2017-01-09  6:24 ` [RFC 11/55] KVM: arm64: Emulate taking an exception to the guest hypervisor Jintack Lim
@ 2017-01-09  6:24 ` Jintack Lim
  2017-02-22 11:30   ` Christoffer Dall
  2017-02-22 11:31   ` Christoffer Dall
  2017-01-09  6:24 ` [RFC 13/55] KVM: arm64: Handle eret instruction traps Jintack Lim
                   ` (44 subsequent siblings)
  56 siblings, 2 replies; 111+ messages in thread
From: Jintack Lim @ 2017-01-09  6:24 UTC (permalink / raw)
  To: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, vladimir.murzin, suzuki.poulose,
	mark.rutland, james.morse, lorenzo.pieralisi, kevin.brodsky,
	wcohen, shankerd, geoff, andre.przywara, eric.auger, anna-maria,
	shihwei, linux-arm-kernel, kvmarm, kvm, linux-kernel
  Cc: jintack

ARM v8.3 introduces a new bit in HCR_EL2, the NV bit. When this bit is
set, accessing EL2 registers from EL1 traps to EL2. In addition,
executing the following instructions in EL1 traps to EL2: tlbi and at
instructions that are undefined when executed in EL1, the eret
instruction, and msr/mrs instructions accessing SP_EL1.

This patch handles traps due to accessing EL2 registers in EL1.  The
host hypervisor keeps EL2 register values in memory, and will use them
to emulate the behavior that the guest hypervisor expects from the
hardware.

Subsequent patches will handle other kinds of traps.
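
Not part of the patch, but as a standalone toy model of the idea (all
names are invented here): a trapped access is satisfied entirely from
the vcpu's in-memory copy, and the hardware register is never touched.

#include <stdint.h>
#include <stdio.h>

enum { TOY_HCR_EL2, TOY_VBAR_EL2, TOY_NR_EL2_REGS };

struct toy_vcpu {
	uint64_t el2_regs[TOY_NR_EL2_REGS];
};

/* Mirrors the shape of trap_el2_reg(): a read returns the stored
 * value, a write updates it. */
static void emulate_el2_access(struct toy_vcpu *v, int reg,
			       int is_write, uint64_t *gpr)
{
	if (is_write)
		v->el2_regs[reg] = *gpr;
	else
		*gpr = v->el2_regs[reg];
}

int main(void)
{
	struct toy_vcpu v = { { 0 } };
	uint64_t x0 = 0x80000000;

	emulate_el2_access(&v, TOY_HCR_EL2, 1, &x0); /* msr hcr_el2, x0 */
	x0 = 0;
	emulate_el2_access(&v, TOY_HCR_EL2, 0, &x0); /* mrs x0, hcr_el2 */
	printf("virtual HCR_EL2 = %#llx\n", (unsigned long long)x0);
	return 0;
}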

Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
---
 arch/arm64/kvm/sys_regs.c | 119 ++++++++++++++++++++++++++++++++++++++++++++++
 arch/arm64/kvm/sys_regs.h |   7 +++
 2 files changed, 126 insertions(+)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 7cef94f..4158f2f 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -873,6 +873,18 @@ static bool access_cntp_cval(struct kvm_vcpu *vcpu,
 	return true;
 }
 
+static bool trap_el2_reg(struct kvm_vcpu *vcpu,
+			 struct sys_reg_params *p,
+			 const struct sys_reg_desc *r)
+{
+	if (!p->is_write)
+		p->regval = vcpu_el2_reg(vcpu, r->reg);
+	else
+		vcpu_el2_reg(vcpu, r->reg) = p->regval;
+
+	return true;
+}
+
 /*
  * Architected system registers.
  * Important: Must be sorted ascending by Op0, Op1, CRn, CRm, Op2
@@ -1163,15 +1175,122 @@ static bool access_cntp_cval(struct kvm_vcpu *vcpu,
 	{ Op0(0b11), Op1(0b011), CRn(0b1110), CRm(0b1111), Op2(0b111),
 	  access_pmu_evtyper, reset_val, PMCCFILTR_EL0, 0 },
 
+	/* VPIDR_EL2 */
+	{ Op0(0b11), Op1(0b100), CRn(0b0000), CRm(0b0000), Op2(0b000),
+	  trap_el2_reg, reset_el2_val, VPIDR_EL2, 0 },
+	/* VMPIDR_EL2 */
+	{ Op0(0b11), Op1(0b100), CRn(0b0000), CRm(0b0000), Op2(0b101),
+	  trap_el2_reg, reset_el2_val, VMPIDR_EL2, 0 },
+
+	/* SCTLR_EL2 */
+	{ Op0(0b11), Op1(0b100), CRn(0b0001), CRm(0b0000), Op2(0b000),
+	  trap_el2_reg, reset_el2_val, SCTLR_EL2, 0 },
+	/* ACTLR_EL2 */
+	{ Op0(0b11), Op1(0b100), CRn(0b0001), CRm(0b0000), Op2(0b001),
+	  trap_el2_reg, reset_el2_val, ACTLR_EL2, 0 },
+	/* HCR_EL2 */
+	{ Op0(0b11), Op1(0b100), CRn(0b0001), CRm(0b0001), Op2(0b000),
+	  trap_el2_reg, reset_el2_val, HCR_EL2, 0 },
+	/* MDCR_EL2 */
+	{ Op0(0b11), Op1(0b100), CRn(0b0001), CRm(0b0001), Op2(0b001),
+	  trap_el2_reg, reset_el2_val, MDCR_EL2, 0 },
+	/* CPTR_EL2 */
+	{ Op0(0b11), Op1(0b100), CRn(0b0001), CRm(0b0001), Op2(0b010),
+	  trap_el2_reg, reset_el2_val, CPTR_EL2, 0 },
+	/* HSTR_EL2 */
+	{ Op0(0b11), Op1(0b100), CRn(0b0001), CRm(0b0001), Op2(0b011),
+	  trap_el2_reg, reset_el2_val, HSTR_EL2, 0 },
+	/* HACR_EL2 */
+	{ Op0(0b11), Op1(0b100), CRn(0b0001), CRm(0b0001), Op2(0b111),
+	  trap_el2_reg, reset_el2_val, HACR_EL2, 0 },
+
+	/* TTBR0_EL2 */
+	{ Op0(0b11), Op1(0b100), CRn(0b0010), CRm(0b0000), Op2(0b000),
+	  trap_el2_reg, reset_el2_val, TTBR0_EL2, 0 },
+	/* TCR_EL2 */
+	{ Op0(0b11), Op1(0b100), CRn(0b0010), CRm(0b0000), Op2(0b010),
+	  trap_el2_reg, reset_el2_val, TCR_EL2, 0 },
+	/* VTTBR_EL2 */
+	{ Op0(0b11), Op1(0b100), CRn(0b0010), CRm(0b0001), Op2(0b000),
+	  trap_el2_reg, reset_el2_val, VTTBR_EL2, 0 },
+	/* VTCR_EL2 */
+	{ Op0(0b11), Op1(0b100), CRn(0b0010), CRm(0b0001), Op2(0b010),
+	  trap_el2_reg, reset_el2_val, VTCR_EL2, 0 },
+
 	/* DACR32_EL2 */
 	{ Op0(0b11), Op1(0b100), CRn(0b0011), CRm(0b0000), Op2(0b000),
 	  NULL, reset_unknown, DACR32_EL2 },
+
+	/* SPSR_EL2 */
+	{ Op0(0b11), Op1(0b100), CRn(0b0100), CRm(0b0000), Op2(0b000),
+	  trap_el2_reg, reset_el2_val, SPSR_EL2, 0 },
+	/* ELR_EL2 */
+	{ Op0(0b11), Op1(0b100), CRn(0b0100), CRm(0b0000), Op2(0b001),
+	  trap_el2_reg, reset_el2_val, ELR_EL2, 0 },
+	/* SP_EL1 */
+	{ Op0(0b11), Op1(0b100), CRn(0b0100), CRm(0b0001), Op2(0b000),
+	  trap_el2_reg },
+
 	/* IFSR32_EL2 */
 	{ Op0(0b11), Op1(0b100), CRn(0b0101), CRm(0b0000), Op2(0b001),
 	  NULL, reset_unknown, IFSR32_EL2 },
+	/* AFSR0_EL2 */
+	{ Op0(0b11), Op1(0b100), CRn(0b0101), CRm(0b0001), Op2(0b000),
+	  trap_el2_reg, reset_el2_val, AFSR0_EL2, 0 },
+	/* AFSR1_EL2 */
+	{ Op0(0b11), Op1(0b100), CRn(0b0101), CRm(0b0001), Op2(0b001),
+	  trap_el2_reg, reset_el2_val, AFSR1_EL2, 0 },
+	/* ESR_EL2 */
+	{ Op0(0b11), Op1(0b100), CRn(0b0101), CRm(0b0010), Op2(0b000),
+	  trap_el2_reg, reset_el2_val, ESR_EL2, 0 },
 	/* FPEXC32_EL2 */
 	{ Op0(0b11), Op1(0b100), CRn(0b0101), CRm(0b0011), Op2(0b000),
 	  NULL, reset_val, FPEXC32_EL2, 0x70 },
+
+	/* FAR_EL2 */
+	{ Op0(0b11), Op1(0b100), CRn(0b0110), CRm(0b0000), Op2(0b000),
+	  trap_el2_reg, reset_el2_val, FAR_EL2, 0 },
+	/* HPFAR_EL2 */
+	{ Op0(0b11), Op1(0b100), CRn(0b0110), CRm(0b0000), Op2(0b100),
+	  trap_el2_reg, reset_el2_val, HPFAR_EL2, 0 },
+
+	/* MAIR_EL2 */
+	{ Op0(0b11), Op1(0b100), CRn(0b1010), CRm(0b0010), Op2(0b000),
+	  trap_el2_reg, reset_el2_val, MAIR_EL2, 0 },
+	/* AMAIR_EL2 */
+	{ Op0(0b11), Op1(0b100), CRn(0b1010), CRm(0b0011), Op2(0b000),
+	  trap_el2_reg, reset_el2_val, AMAIR_EL2, 0 },
+
+	/* VBAR_EL2 */
+	{ Op0(0b11), Op1(0b100), CRn(0b1100), CRm(0b0000), Op2(0b000),
+	  trap_el2_reg, reset_el2_val, VBAR_EL2, 0 },
+	/* RVBAR_EL2 */
+	{ Op0(0b11), Op1(0b100), CRn(0b1100), CRm(0b0000), Op2(0b001),
+	  trap_el2_reg, reset_el2_val, RVBAR_EL2, 0 },
+	/* RMR_EL2 */
+	{ Op0(0b11), Op1(0b100), CRn(0b1100), CRm(0b0000), Op2(0b010),
+	  trap_el2_reg, reset_el2_val, RMR_EL2, 0 },
+
+	/* TPIDR_EL2 */
+	{ Op0(0b11), Op1(0b100), CRn(0b1101), CRm(0b0000), Op2(0b010),
+	  trap_el2_reg, reset_el2_val, TPIDR_EL2, 0 },
+
+	/* CNTVOFF_EL2 */
+	{ Op0(0b11), Op1(0b100), CRn(0b1110), CRm(0b0000), Op2(0b011),
+	  trap_el2_reg, reset_el2_val, CNTVOFF_EL2, 0 },
+	/* CNTHCTL_EL2 */
+	{ Op0(0b11), Op1(0b100), CRn(0b1110), CRm(0b0001), Op2(0b000),
+	  trap_el2_reg, reset_el2_val, CNTHCTL_EL2, 0 },
+	/* CNTHP_TVAL_EL2 */
+	{ Op0(0b11), Op1(0b100), CRn(0b1110), CRm(0b0010), Op2(0b000),
+	  trap_el2_reg, reset_el2_val, CNTHP_TVAL_EL2, 0 },
+	/* CNTHP_CTL_EL2 */
+	{ Op0(0b11), Op1(0b100), CRn(0b1110), CRm(0b0010), Op2(0b001),
+	  trap_el2_reg, reset_el2_val, CNTHP_CTL_EL2, 0 },
+	/* CNTHP_CVAL_EL2 */
+	{ Op0(0b11), Op1(0b100), CRn(0b1110), CRm(0b0010), Op2(0b010),
+	  trap_el2_reg, reset_el2_val, CNTHP_CVAL_EL2, 0 },
+
 };
 
 static bool trap_dbgidr(struct kvm_vcpu *vcpu,
diff --git a/arch/arm64/kvm/sys_regs.h b/arch/arm64/kvm/sys_regs.h
index dbbb01c..181290f 100644
--- a/arch/arm64/kvm/sys_regs.h
+++ b/arch/arm64/kvm/sys_regs.h
@@ -117,6 +117,13 @@ static inline void reset_val(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r
 	vcpu_sys_reg(vcpu, r->reg) = r->val;
 }
 
+static inline void reset_el2_val(struct kvm_vcpu *vcpu,
+				 const struct sys_reg_desc *r)
+{
+	BUG_ON(r->reg >= NR_EL2_REGS);
+	vcpu_el2_reg(vcpu, r->reg) = r->val;
+}
+
 static inline int cmp_sys_reg(const struct sys_reg_desc *i1,
 			      const struct sys_reg_desc *i2)
 {
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [RFC 13/55] KVM: arm64: Handle eret instruction traps
  2017-01-09  6:23 [RFC 00/55] Nested Virtualization on KVM/ARM Jintack Lim
                   ` (11 preceding siblings ...)
  2017-01-09  6:24 ` [RFC 12/55] KVM: arm64: Handle EL2 register access traps Jintack Lim
@ 2017-01-09  6:24 ` Jintack Lim
  2017-01-09  6:24 ` [RFC 14/55] KVM: arm64: Take account of system " Jintack Lim
                   ` (43 subsequent siblings)
  56 siblings, 0 replies; 111+ messages in thread
From: Jintack Lim @ 2017-01-09  6:24 UTC (permalink / raw)
  To: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, vladimir.murzin, suzuki.poulose,
	mark.rutland, james.morse, lorenzo.pieralisi, kevin.brodsky,
	wcohen, shankerd, geoff, andre.przywara, eric.auger, anna-maria,
	shihwei, linux-arm-kernel, kvmarm, kvm, linux-kernel
  Cc: jintack

When the HCR_EL2.NV bit is set, executing the eret instruction in the
guest hypervisor traps to EL2 with EC code 0x1A. Emulate the eret by
loading the PC and PSTATE from the virtual ELR_EL2 and SPSR_EL2, so
that the state the guest expects is restored to the hardware on the
next guest entry.
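
As a standalone toy model of what this emulation amounts to (invented
types, not kernel code):

#include <stdint.h>
#include <stdio.h>

/* The return state comes from the vcpu's in-memory virtual
 * ELR_EL2/SPSR_EL2 copies and is written into the PC/PSTATE that
 * will be loaded into hardware on the next guest entry. */
struct toy_vcpu {
	uint64_t pc, pstate;		/* loaded on guest entry */
	uint64_t elr_el2, spsr_el2;	/* virtual EL2 state in memory */
};

static void emulate_eret(struct toy_vcpu *v)
{
	v->pc = v->elr_el2;
	v->pstate = v->spsr_el2;
}

int main(void)
{
	struct toy_vcpu v = {
		.elr_el2  = 0xffff000008081000ULL,
		.spsr_el2 = 0x3c5,	/* EL1h, DAIF masked */
	};

	emulate_eret(&v);
	printf("pc=%#llx pstate=%#llx\n",
	       (unsigned long long)v.pc, (unsigned long long)v.pstate);
	return 0;
}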

Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
---
 arch/arm64/include/asm/esr.h |  1 +
 arch/arm64/kvm/handle_exit.c | 12 ++++++++++++
 arch/arm64/kvm/trace.h       | 21 +++++++++++++++++++++
 3 files changed, 34 insertions(+)

diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
index d14c478..f32e3a7 100644
--- a/arch/arm64/include/asm/esr.h
+++ b/arch/arm64/include/asm/esr.h
@@ -42,6 +42,7 @@
 #define ESR_ELx_EC_HVC64	(0x16)
 #define ESR_ELx_EC_SMC64	(0x17)
 #define ESR_ELx_EC_SYS64	(0x18)
+#define ESR_ELx_EC_ERET		(0x1A)
 /* Unallocated EC: 0x19 - 0x1E */
 #define ESR_ELx_EC_IMP_DEF	(0x1f)
 #define ESR_ELx_EC_IABT_LOW	(0x20)
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index a204adf..4e4a915 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -125,6 +125,17 @@ static int kvm_handle_guest_debug(struct kvm_vcpu *vcpu, struct kvm_run *run)
 	return ret;
 }
 
+static int kvm_handle_eret(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	trace_kvm_nested_eret(vcpu, vcpu_el2_reg(vcpu, ELR_EL2),
+			      vcpu_el2_reg(vcpu, SPSR_EL2));
+
+	*vcpu_pc(vcpu) = vcpu_el2_reg(vcpu, ELR_EL2);
+	*vcpu_cpsr(vcpu) = vcpu_el2_reg(vcpu, SPSR_EL2);
+
+	return 1;
+}
+
 static exit_handle_fn arm_exit_handlers[] = {
 	[ESR_ELx_EC_WFx]	= kvm_handle_wfx,
 	[ESR_ELx_EC_CP15_32]	= kvm_handle_cp15_32,
@@ -137,6 +148,7 @@ static int kvm_handle_guest_debug(struct kvm_vcpu *vcpu, struct kvm_run *run)
 	[ESR_ELx_EC_HVC64]	= handle_hvc,
 	[ESR_ELx_EC_SMC64]	= handle_smc,
 	[ESR_ELx_EC_SYS64]	= kvm_handle_sys_reg,
+	[ESR_ELx_EC_ERET]	= kvm_handle_eret,
 	[ESR_ELx_EC_IABT_LOW]	= kvm_handle_guest_abort,
 	[ESR_ELx_EC_DABT_LOW]	= kvm_handle_guest_abort,
 	[ESR_ELx_EC_SOFTSTP_LOW]= kvm_handle_guest_debug,
diff --git a/arch/arm64/kvm/trace.h b/arch/arm64/kvm/trace.h
index 7c86cfb..5f40987 100644
--- a/arch/arm64/kvm/trace.h
+++ b/arch/arm64/kvm/trace.h
@@ -187,6 +187,27 @@
 	TP_printk("vcpu: %p, inject exception to vEL2: ESR_EL2 0x%lx, vector: 0x%016lx",
 		  __entry->vcpu, __entry->esr_el2, __entry->pc)
 );
+
+TRACE_EVENT(kvm_nested_eret,
+	TP_PROTO(struct kvm_vcpu *vcpu, unsigned long elr_el2,
+		 unsigned long spsr_el2),
+	TP_ARGS(vcpu, elr_el2, spsr_el2),
+
+	TP_STRUCT__entry(
+		__field(struct kvm_vcpu *,	vcpu)
+		__field(unsigned long,		elr_el2)
+		__field(unsigned long,		spsr_el2)
+	),
+
+	TP_fast_assign(
+		__entry->vcpu = vcpu;
+		__entry->elr_el2 = elr_el2;
+		__entry->spsr_el2 = spsr_el2;
+	),
+
+	TP_printk("vcpu: %p, eret to elr_el2: 0x%016lx, with spsr_el2: 0x%08lx",
+		  __entry->vcpu, __entry->elr_el2, __entry->spsr_el2)
+);
 #endif /* _TRACE_ARM64_KVM_H */
 
 #undef TRACE_INCLUDE_PATH
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [RFC 14/55] KVM: arm64: Take account of system instruction traps
  2017-01-09  6:23 [RFC 00/55] Nested Virtualization on KVM/ARM Jintack Lim
                   ` (12 preceding siblings ...)
  2017-01-09  6:24 ` [RFC 13/55] KVM: arm64: Handle eret instruction traps Jintack Lim
@ 2017-01-09  6:24 ` Jintack Lim
  2017-02-22 11:34   ` Christoffer Dall
  2017-01-09  6:24 ` [RFC 15/55] KVM: arm64: Trap EL1 VM register accesses in virtual EL2 Jintack Lim
                   ` (42 subsequent siblings)
  56 siblings, 1 reply; 111+ messages in thread
From: Jintack Lim @ 2017-01-09  6:24 UTC (permalink / raw)
  To: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, vladimir.murzin, suzuki.poulose,
	mark.rutland, james.morse, lorenzo.pieralisi, kevin.brodsky,
	wcohen, shankerd, geoff, andre.przywara, eric.auger, anna-maria,
	shihwei, linux-arm-kernel, kvmarm, kvm, linux-kernel
  Cc: jintack

When the HCR_EL2.NV bit is set, execution of the EL2 translation regime
Address Translation instructions and TLB maintenance instructions is
trapped to EL2. In addition, execution of the EL1 translation regime
Address Translation instructions and TLB maintenance instructions that
are only accessible from EL2 and above is trapped to EL2. In these
cases, ESR_EL2.EC will be set to 0x18.

Take account of this and handle system instructions as well as MRS/MSR
instructions in the handler. Change the handler name to reflect this.

Emulation of those system instructions is left for future patches; for
now they are treated as undefined.
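
For reference, the routing below keys off the Op0 field decoded from
the ESR_EL2 ISS. A standalone sketch (the bit positions follow my
reading of the ARMv8 ISS encoding for EC=0x18; double-check against
the ARM ARM):

#include <stdint.h>
#include <stdio.h>

/* Op0 lives in ISS bits [21:20]; Op0 == 1 encodes system
 * instructions (tlbi, at, ...), other values encode MRS/MSR. */
static unsigned int iss_op0(uint32_t esr)
{
	return (esr >> 20) & 0x3;
}

static const char *route_trap(uint32_t esr)
{
	return iss_op0(esr) == 1 ? "emulate_sys_instr" : "emulate_sys_reg";
}

int main(void)
{
	/* Hypothetical ISS values; only the Op0 bits matter here */
	printf("%s\n", route_trap(1u << 20));	/* Op0 = 1 */
	printf("%s\n", route_trap(3u << 20));	/* Op0 = 3 */
	return 0;
}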

Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
---
 arch/arm64/include/asm/kvm_coproc.h |  2 +-
 arch/arm64/kvm/handle_exit.c        |  2 +-
 arch/arm64/kvm/sys_regs.c           | 49 ++++++++++++++++++++++++++++++++-----
 arch/arm64/kvm/trace.h              |  2 +-
 4 files changed, 46 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_coproc.h b/arch/arm64/include/asm/kvm_coproc.h
index 0b52377..1b3d21b 100644
--- a/arch/arm64/include/asm/kvm_coproc.h
+++ b/arch/arm64/include/asm/kvm_coproc.h
@@ -43,7 +43,7 @@ void kvm_register_target_sys_reg_table(unsigned int target,
 int kvm_handle_cp14_64(struct kvm_vcpu *vcpu, struct kvm_run *run);
 int kvm_handle_cp15_32(struct kvm_vcpu *vcpu, struct kvm_run *run);
 int kvm_handle_cp15_64(struct kvm_vcpu *vcpu, struct kvm_run *run);
-int kvm_handle_sys_reg(struct kvm_vcpu *vcpu, struct kvm_run *run);
+int kvm_handle_sys(struct kvm_vcpu *vcpu, struct kvm_run *run);
 
 #define kvm_coproc_table_init kvm_sys_reg_table_init
 void kvm_sys_reg_table_init(void);
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index 4e4a915..a891684 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -147,7 +147,7 @@ static int kvm_handle_eret(struct kvm_vcpu *vcpu, struct kvm_run *run)
 	[ESR_ELx_EC_SMC32]	= handle_smc,
 	[ESR_ELx_EC_HVC64]	= handle_hvc,
 	[ESR_ELx_EC_SMC64]	= handle_smc,
-	[ESR_ELx_EC_SYS64]	= kvm_handle_sys_reg,
+	[ESR_ELx_EC_SYS64]	= kvm_handle_sys,
 	[ESR_ELx_EC_ERET]	= kvm_handle_eret,
 	[ESR_ELx_EC_IABT_LOW]	= kvm_handle_guest_abort,
 	[ESR_ELx_EC_DABT_LOW]	= kvm_handle_guest_abort,
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 4158f2f..202f64d 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1903,6 +1903,36 @@ static int emulate_sys_reg(struct kvm_vcpu *vcpu,
 	return 1;
 }
 
+static int emulate_tlbi(struct kvm_vcpu *vcpu,
+			     struct sys_reg_params *params)
+{
+	/* TODO: support tlbi instruction emulation */
+	kvm_inject_undefined(vcpu);
+	return 1;
+}
+
+static int emulate_at(struct kvm_vcpu *vcpu,
+			     struct sys_reg_params *params)
+{
+	/* TODO: support address translation instruction emulation */
+	kvm_inject_undefined(vcpu);
+	return 1;
+}
+
+static int emulate_sys_instr(struct kvm_vcpu *vcpu,
+			     struct sys_reg_params *params)
+{
+	/* TLB maintenance instructions */
+	if (params->CRn == 0b1000)
+		return emulate_tlbi(vcpu, params);
+	/* Address Translation instructions */
+	if (params->CRn == 0b0111 && params->CRm == 0b1000)
+		return emulate_at(vcpu, params);
+	/* Anything else is not handled yet; treat it as undefined */
+	kvm_inject_undefined(vcpu);
+	return 1;
+}
+
 static void reset_sys_reg_descs(struct kvm_vcpu *vcpu,
 			      const struct sys_reg_desc *table, size_t num)
 {
@@ -1914,18 +1944,19 @@ static void reset_sys_reg_descs(struct kvm_vcpu *vcpu,
 }
 
 /**
- * kvm_handle_sys_reg -- handles a mrs/msr trap on a guest sys_reg access
+ * kvm_handle_sys -- handles a trap on a guest system instruction or
+ *		      mrs/msr register access
  * @vcpu: The VCPU pointer
  * @run:  The kvm_run struct
  */
-int kvm_handle_sys_reg(struct kvm_vcpu *vcpu, struct kvm_run *run)
+int kvm_handle_sys(struct kvm_vcpu *vcpu, struct kvm_run *run)
 {
 	struct sys_reg_params params;
 	unsigned long esr = kvm_vcpu_get_hsr(vcpu);
 	int Rt = (esr >> 5) & 0x1f;
 	int ret;
 
-	trace_kvm_handle_sys_reg(esr);
+	trace_kvm_handle_sys(esr);
 
 	params.is_aarch32 = false;
 	params.is_32bit = false;
@@ -1937,10 +1968,16 @@ int kvm_handle_sys_reg(struct kvm_vcpu *vcpu, struct kvm_run *run)
 	params.regval = vcpu_get_reg(vcpu, Rt);
 	params.is_write = !(esr & 1);
 
-	ret = emulate_sys_reg(vcpu, &params);
+	if (params.Op0 == 1) {
+		/* System instructions */
+		ret = emulate_sys_instr(vcpu, &params);
+	} else {
+		/* MRS/MSR instructions */
+		ret = emulate_sys_reg(vcpu, &params);
+		if (!params.is_write)
+			vcpu_set_reg(vcpu, Rt, params.regval);
+	}
 
-	if (!params.is_write)
-		vcpu_set_reg(vcpu, Rt, params.regval);
 	return ret;
 }
 
diff --git a/arch/arm64/kvm/trace.h b/arch/arm64/kvm/trace.h
index 5f40987..192708e 100644
--- a/arch/arm64/kvm/trace.h
+++ b/arch/arm64/kvm/trace.h
@@ -134,7 +134,7 @@
 	TP_printk("%s %s reg %d (0x%08llx)", __entry->fn,  __entry->is_write?"write to":"read from", __entry->reg, __entry->write_value)
 );
 
-TRACE_EVENT(kvm_handle_sys_reg,
+TRACE_EVENT(kvm_handle_sys,
 	TP_PROTO(unsigned long hsr),
 	TP_ARGS(hsr),
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [RFC 15/55] KVM: arm64: Trap EL1 VM register accesses in virtual EL2
  2017-01-09  6:23 [RFC 00/55] Nested Virtualization on KVM/ARM Jintack Lim
                   ` (13 preceding siblings ...)
  2017-01-09  6:24 ` [RFC 14/55] KVM: arm64: Take account of system " Jintack Lim
@ 2017-01-09  6:24 ` Jintack Lim
  2017-01-09  6:24 ` [RFC 16/55] KVM: arm64: Forward VM reg traps to the guest hypervisor Jintack Lim
                   ` (41 subsequent siblings)
  56 siblings, 0 replies; 111+ messages in thread
From: Jintack Lim @ 2017-01-09  6:24 UTC (permalink / raw)
  To: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, vladimir.murzin, suzuki.poulose,
	mark.rutland, james.morse, lorenzo.pieralisi, kevin.brodsky,
	wcohen, shankerd, geoff, andre.przywara, eric.auger, anna-maria,
	shihwei, linux-arm-kernel, kvmarm, kvm, linux-kernel
  Cc: jintack

From: Christoffer Dall <christoffer.dall@linaro.org>

When running in virtual EL2 mode, we actually run the hardware in EL1
and therefore have to use the EL1 registers to ensure correct operation.

By setting HCR_EL2.TVM and HCR_EL2.TRVM we ensure that the virtual EL2
mode doesn't shoot itself in the foot when setting up what it believes
to be a different mode's system register state (for example when
preparing to switch to a VM).

We can leverage the existing sysregs infrastructure to support trapped
accesses to these registers.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
---
 arch/arm64/kvm/hyp/switch.c | 2 ++
 arch/arm64/kvm/sys_regs.c   | 7 ++++++-
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 83037cd..c05c48f 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -82,6 +82,8 @@ static void __hyp_text __activate_traps(struct kvm_vcpu *vcpu)
 		write_sysreg(1 << 30, fpexc32_el2);
 		isb();
 	}
+	if (vcpu_mode_el2(vcpu))
+		val |= HCR_TVM | HCR_TRVM;
 	write_sysreg(val, hcr_el2);
 	/* Trap on AArch32 cp15 c15 accesses (EL1 or EL0) */
 	write_sysreg(1 << 15, hstr_el2);
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 202f64d..b8e993a 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -101,7 +101,12 @@ static bool access_vm_reg(struct kvm_vcpu *vcpu,
 {
 	bool was_enabled = vcpu_has_cache_enabled(vcpu);
 
-	BUG_ON(!p->is_write);
+	BUG_ON(!vcpu_mode_el2(vcpu) && !p->is_write);
+
+	if (!p->is_write) {
+		p->regval = vcpu_sys_reg(vcpu, r->reg);
+		return true;
+	}
 
 	if (!p->is_aarch32) {
 		vcpu_sys_reg(vcpu, r->reg) = p->regval;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [RFC 16/55] KVM: arm64: Forward VM reg traps to the guest hypervisor
  2017-01-09  6:23 [RFC 00/55] Nested Virtualization on KVM/ARM Jintack Lim
                   ` (14 preceding siblings ...)
  2017-01-09  6:24 ` [RFC 15/55] KVM: arm64: Trap EL1 VM register accesses in virtual EL2 Jintack Lim
@ 2017-01-09  6:24 ` Jintack Lim
  2017-02-22 11:39   ` Christoffer Dall
  2017-01-09  6:24 ` [RFC 17/55] KVM: arm64: Trap SPSR_EL1, ELR_EL1 and VBAR_EL1 in virtual EL2 Jintack Lim
                   ` (40 subsequent siblings)
  56 siblings, 1 reply; 111+ messages in thread
From: Jintack Lim @ 2017-01-09  6:24 UTC (permalink / raw)
  To: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, vladimir.murzin, suzuki.poulose,
	mark.rutland, james.morse, lorenzo.pieralisi, kevin.brodsky,
	wcohen, shankerd, geoff, andre.przywara, eric.auger, anna-maria,
	shihwei, linux-arm-kernel, kvmarm, kvm, linux-kernel
  Cc: jintack

Forward virtual memory register traps to the guest hypervisor if it
has set the corresponding bits in the virtual HCR_EL2.
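
The decision reduces to a small predicate over the virtual HCR_EL2. A
standalone restatement (toy bit definitions, not the kernel's headers):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define TOY_HCR_TVM	(1ULL << 26)	/* trap writes to VM regs */
#define TOY_HCR_TRVM	(1ULL << 30)	/* trap reads of VM regs */

/* True when a VM-register trap taken while running the nested VM
 * must be reinjected into the guest hypervisor. */
static bool must_forward_vm_trap(uint64_t virt_hcr, bool in_virtual_el2,
				 bool is_write)
{
	if (in_virtual_el2)	/* trap from the guest hypervisor itself */
		return false;
	if (is_write)
		return virt_hcr & TOY_HCR_TVM;
	return virt_hcr & TOY_HCR_TRVM;
}

int main(void)
{
	printf("%d\n", must_forward_vm_trap(TOY_HCR_TVM, false, true));
	printf("%d\n", must_forward_vm_trap(TOY_HCR_TVM, true, true));
	return 0;
}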

Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
---
 arch/arm64/kvm/sys_regs.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index b8e993a..0f5d21b 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -90,6 +90,23 @@ static bool access_dcsw(struct kvm_vcpu *vcpu,
 	return true;
 }
 
+static bool forward_vm_traps(struct kvm_vcpu *vcpu, struct sys_reg_params *p)
+{
+	u64 hcr_el2 = vcpu_el2_reg(vcpu, HCR_EL2);
+
+	/* A trap taken from virtual EL2 is handled by the host hypervisor */
+	if (vcpu_mode_el2(vcpu))
+		return false;
+
+	/* If the guest wants to trap on R/W operation, forward this trap */
+	if ((hcr_el2 & HCR_TVM) && p->is_write)
+		return true;
+	else if ((hcr_el2 & HCR_TRVM) && !p->is_write)
+		return true;
+
+	return false;
+}
+
 /*
  * Generic accessor for VM registers. Only called as long as HCR_TVM
  * is set. If the guest enables the MMU, we stop trapping the VM
@@ -101,6 +118,9 @@ static bool access_vm_reg(struct kvm_vcpu *vcpu,
 {
 	bool was_enabled = vcpu_has_cache_enabled(vcpu);
 
+	if (forward_vm_traps(vcpu, p))
+		return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
+
 	BUG_ON(!vcpu_mode_el2(vcpu) && !p->is_write);
 
 	if (!p->is_write) {
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [RFC 17/55] KVM: arm64: Trap SPSR_EL1, ELR_EL1 and VBAR_EL1 in virtual EL2
  2017-01-09  6:23 [RFC 00/55] Nested Virtualization on KVM/ARM Jintack Lim
                   ` (15 preceding siblings ...)
  2017-01-09  6:24 ` [RFC 16/55] KVM: arm64: Forward VM reg traps to the guest hypervisor Jintack Lim
@ 2017-01-09  6:24 ` Jintack Lim
  2017-02-22 11:40   ` Christoffer Dall
  2017-01-09  6:24 ` [RFC 18/55] KVM: arm64: Forward traps due to HCR_EL2.NV1 bit to the guest hypervisor Jintack Lim
                   ` (39 subsequent siblings)
  56 siblings, 1 reply; 111+ messages in thread
From: Jintack Lim @ 2017-01-09  6:24 UTC (permalink / raw)
  To: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, vladimir.murzin, suzuki.poulose,
	mark.rutland, james.morse, lorenzo.pieralisi, kevin.brodsky,
	wcohen, shankerd, geoff, andre.przywara, eric.auger, anna-maria,
	shihwei, linux-arm-kernel, kvmarm, kvm, linux-kernel
  Cc: jintack

For the same reason we trap virtual memory register accesses in virtual
EL2, we need to trap SPSR_EL1, ELR_EL1 and VBAR_EL1 accesses. ARM v8.3
introduces the HCR_EL2.NV1 bit to be able to trap on those register
accesses in EL1. Do not set this bit until the whole nesting support is
complete.

Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
---
 arch/arm64/kvm/sys_regs.c | 41 ++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 40 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 0f5d21b..19d6a6e 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -898,6 +898,38 @@ static bool access_cntp_cval(struct kvm_vcpu *vcpu,
 	return true;
 }
 
+static inline void access_rw(struct sys_reg_params *p, u64 *sysreg)
+{
+	if (!p->is_write)
+		p->regval = *sysreg;
+	else
+		*sysreg = p->regval;
+}
+
+static bool access_elr(struct kvm_vcpu *vcpu,
+		struct sys_reg_params *p,
+		const struct sys_reg_desc *r)
+{
+	access_rw(p, &vcpu->arch.ctxt.gp_regs.elr_el1);
+	return true;
+}
+
+static bool access_spsr(struct kvm_vcpu *vcpu,
+		struct sys_reg_params *p,
+		const struct sys_reg_desc *r)
+{
+	access_rw(p, &vcpu->arch.ctxt.gp_regs.spsr[KVM_SPSR_EL1]);
+	return true;
+}
+
+static bool access_vbar(struct kvm_vcpu *vcpu,
+		struct sys_reg_params *p,
+		const struct sys_reg_desc *r)
+{
+	access_rw(p, &vcpu_sys_reg(vcpu, r->reg));
+	return true;
+}
+
 static bool trap_el2_reg(struct kvm_vcpu *vcpu,
 			 struct sys_reg_params *p,
 			 const struct sys_reg_desc *r)
@@ -1013,6 +1045,13 @@ static bool trap_el2_reg(struct kvm_vcpu *vcpu,
 	{ Op0(0b11), Op1(0b000), CRn(0b0010), CRm(0b0000), Op2(0b010),
 	  access_vm_reg, reset_val, TCR_EL1, 0 },
 
+	/* SPSR_EL1 */
+	{ Op0(0b11), Op1(0b000), CRn(0b0100), CRm(0b0000), Op2(0b000),
+	  access_spsr},
+	/* ELR_EL1 */
+	{ Op0(0b11), Op1(0b000), CRn(0b0100), CRm(0b0000), Op2(0b001),
+	  access_elr},
+
 	/* AFSR0_EL1 */
 	{ Op0(0b11), Op1(0b000), CRn(0b0101), CRm(0b0001), Op2(0b000),
 	  access_vm_reg, reset_unknown, AFSR0_EL1 },
@@ -1045,7 +1084,7 @@ static bool trap_el2_reg(struct kvm_vcpu *vcpu,
 
 	/* VBAR_EL1 */
 	{ Op0(0b11), Op1(0b000), CRn(0b1100), CRm(0b0000), Op2(0b000),
-	  NULL, reset_val, VBAR_EL1, 0 },
+	  access_vbar, reset_val, VBAR_EL1, 0 },
 
 	/* ICC_SGI1R_EL1 */
 	{ Op0(0b11), Op1(0b000), CRn(0b1100), CRm(0b1011), Op2(0b101),
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [RFC 18/55] KVM: arm64: Forward traps due to HCR_EL2.NV1 bit to the guest hypervisor
  2017-01-09  6:23 [RFC 00/55] Nested Virtualization on KVM/ARM Jintack Lim
                   ` (16 preceding siblings ...)
  2017-01-09  6:24 ` [RFC 17/55] KVM: arm64: Trap SPSR_EL1, ELR_EL1 and VBAR_EL1 in virtual EL2 Jintack Lim
@ 2017-01-09  6:24 ` Jintack Lim
  2017-02-22 11:41   ` Christoffer Dall
  2017-01-09  6:24 ` [RFC 19/55] KVM: arm64: Trap CPACR_EL1 access in virtual EL2 Jintack Lim
                   ` (38 subsequent siblings)
  56 siblings, 1 reply; 111+ messages in thread
From: Jintack Lim @ 2017-01-09  6:24 UTC (permalink / raw)
  To: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, vladimir.murzin, suzuki.poulose,
	mark.rutland, james.morse, lorenzo.pieralisi, kevin.brodsky,
	wcohen, shankerd, geoff, andre.przywara, eric.auger, anna-maria,
	shihwei, linux-arm-kernel, kvmarm, kvm, linux-kernel
  Cc: jintack

Forward ELR_EL1, SPSR_EL1 and VBAR_EL1 traps to the guest hypervisor if
it has set the NV1 bit in the virtual HCR_EL2. The guest hypervisor
would set this NV1 bit to run a hypervisor in its VM (i.e. another
level of nested hypervisor).

Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
---
 arch/arm64/include/asm/kvm_arm.h |  1 +
 arch/arm64/kvm/sys_regs.c        | 17 +++++++++++++++++
 2 files changed, 18 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index 2a2752b..feded61 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -23,6 +23,7 @@
 #include <asm/types.h>
 
 /* Hyp Configuration Register (HCR) bits */
+#define HCR_NV1		(UL(1) << 43)
 #define HCR_E2H		(UL(1) << 34)
 #define HCR_ID		(UL(1) << 33)
 #define HCR_CD		(UL(1) << 32)
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 19d6a6e..59f9cc6 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -906,10 +906,21 @@ static inline void access_rw(struct sys_reg_params *p, u64 *sysreg)
 		*sysreg = p->regval;
 }
 
+static bool forward_nv1_traps(struct kvm_vcpu *vcpu, struct sys_reg_params *p)
+{
+	if (!vcpu_mode_el2(vcpu) && (vcpu_el2_reg(vcpu, HCR_EL2) & HCR_NV1))
+		return true;
+
+	return false;
+}
+
 static bool access_elr(struct kvm_vcpu *vcpu,
 		struct sys_reg_params *p,
 		const struct sys_reg_desc *r)
 {
+	if (forward_nv1_traps(vcpu, p))
+		return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
+
 	access_rw(p, &vcpu->arch.ctxt.gp_regs.elr_el1);
 	return true;
 }
@@ -918,6 +929,9 @@ static bool access_spsr(struct kvm_vcpu *vcpu,
 		struct sys_reg_params *p,
 		const struct sys_reg_desc *r)
 {
+	if (forward_nv1_traps(vcpu, p))
+		return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
+
 	access_rw(p, &vcpu->arch.ctxt.gp_regs.spsr[KVM_SPSR_EL1]);
 	return true;
 }
@@ -926,6 +940,9 @@ static bool access_vbar(struct kvm_vcpu *vcpu,
 		struct sys_reg_params *p,
 		const struct sys_reg_desc *r)
 {
+	if (forward_nv1_traps(vcpu, p))
+		return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
+
 	access_rw(p, &vcpu_sys_reg(vcpu, r->reg));
 	return true;
 }
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [RFC 19/55] KVM: arm64: Trap CPACR_EL1 access in virtual EL2
  2017-01-09  6:23 [RFC 00/55] Nested Virtualization on KVM/ARM Jintack Lim
                   ` (17 preceding siblings ...)
  2017-01-09  6:24 ` [RFC 18/55] KVM: arm64: Forward traps due to HCR_EL2.NV1 bit to the guest hypervisor Jintack Lim
@ 2017-01-09  6:24 ` Jintack Lim
  2017-01-09  6:24 ` [RFC 20/55] KVM: arm64: Forward CPACR_EL1 traps to the guest hypervisor Jintack Lim
                   ` (37 subsequent siblings)
  56 siblings, 0 replies; 111+ messages in thread
From: Jintack Lim @ 2017-01-09  6:24 UTC (permalink / raw)
  To: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, vladimir.murzin, suzuki.poulose,
	mark.rutland, james.morse, lorenzo.pieralisi, kevin.brodsky,
	wcohen, shankerd, geoff, andre.przywara, eric.auger, anna-maria,
	shihwei, linux-arm-kernel, kvmarm, kvm, linux-kernel
  Cc: jintack

For the same reason we trap virtual memory register accesses in virtual
EL2, we trap CPACR_EL1 access too. Basically, we don't want the guest
hypervisor to access the real CPACR_EL1, which is used to emulate
virtual EL2. Instead, we want it to access the virtual CPACR_EL1, which
is used to run software in EL0/EL1 from the guest hypervisor's
perspective.

Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
---
 arch/arm64/kvm/hyp/switch.c | 10 +++++++---
 arch/arm64/kvm/sys_regs.c   | 10 +++++++++-
 2 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index c05c48f..b7c8c30 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -41,7 +41,8 @@ bool __hyp_text __fpsimd_enabled(void)
 	return __fpsimd_is_enabled()();
 }
 
-static void __hyp_text __activate_traps_vhe(void)
+static void __hyp_text __activate_traps_vhe(struct kvm_vcpu *vcpu)
+
 {
 	u64 val;
 
@@ -53,12 +54,15 @@ static void __hyp_text __activate_traps_vhe(void)
 	write_sysreg(__kvm_hyp_vector, vbar_el1);
 }
 
-static void __hyp_text __activate_traps_nvhe(void)
+static void __hyp_text __activate_traps_nvhe(struct kvm_vcpu *vcpu)
+
 {
 	u64 val;
 
 	val = CPTR_EL2_DEFAULT;
 	val |= CPTR_EL2_TTA | CPTR_EL2_TFP;
+	if (vcpu_mode_el2(vcpu))
+		val |= CPTR_EL2_TCPAC;
 	write_sysreg(val, cptr_el2);
 }
 
@@ -90,7 +94,7 @@ static void __hyp_text __activate_traps(struct kvm_vcpu *vcpu)
 	/* Make sure we trap PMU access from EL0 to EL2 */
 	write_sysreg(ARMV8_PMU_USERENR_MASK, pmuserenr_el0);
 	write_sysreg(vcpu->arch.mdcr_el2, mdcr_el2);
-	__activate_traps_arch()();
+	__activate_traps_arch()(vcpu);
 }
 
 static void __hyp_text __deactivate_traps_vhe(void)
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 59f9cc6..321ecbc 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -947,6 +947,14 @@ static bool access_vbar(struct kvm_vcpu *vcpu,
 	return true;
 }
 
+static bool access_cpacr(struct kvm_vcpu *vcpu,
+		struct sys_reg_params *p,
+		const struct sys_reg_desc *r)
+{
+	access_rw(p, &vcpu_sys_reg(vcpu, r->reg));
+	return true;
+}
+
 static bool trap_el2_reg(struct kvm_vcpu *vcpu,
 			 struct sys_reg_params *p,
 			 const struct sys_reg_desc *r)
@@ -1051,7 +1059,7 @@ static bool trap_el2_reg(struct kvm_vcpu *vcpu,
 	  access_vm_reg, reset_val, SCTLR_EL1, 0x00C50078 },
 	/* CPACR_EL1 */
 	{ Op0(0b11), Op1(0b000), CRn(0b0001), CRm(0b0000), Op2(0b010),
-	  NULL, reset_val, CPACR_EL1, 0 },
+	  access_cpacr, reset_val, CPACR_EL1, 0 },
 	/* TTBR0_EL1 */
 	{ Op0(0b11), Op1(0b000), CRn(0b0010), CRm(0b0000), Op2(0b000),
 	  access_vm_reg, reset_unknown, TTBR0_EL1 },
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [RFC 20/55] KVM: arm64: Forward CPACR_EL1 traps to the guest hypervisor
  2017-01-09  6:23 [RFC 00/55] Nested Virtualization on KVM/ARM Jintack Lim
                   ` (18 preceding siblings ...)
  2017-01-09  6:24 ` [RFC 19/55] KVM: arm64: Trap CPACR_EL1 access in virtual EL2 Jintack Lim
@ 2017-01-09  6:24 ` Jintack Lim
  2017-01-09  6:24 ` [RFC 21/55] KVM: arm64: Forward HVC instruction " Jintack Lim
                   ` (36 subsequent siblings)
  56 siblings, 0 replies; 111+ messages in thread
From: Jintack Lim @ 2017-01-09  6:24 UTC (permalink / raw)
  To: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, vladimir.murzin, suzuki.poulose,
	mark.rutland, james.morse, lorenzo.pieralisi, kevin.brodsky,
	wcohen, shankerd, geoff, andre.przywara, eric.auger, anna-maria,
	shihwei, linux-arm-kernel, kvmarm, kvm, linux-kernel
  Cc: jintack

Forward CPACR_EL1 traps to the guest hypervisor if it has configured the
virtual CPTR_EL2 to do so.

Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
---
 arch/arm64/kvm/sys_regs.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 321ecbc..e66f40d 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -951,6 +951,11 @@ static bool access_cpacr(struct kvm_vcpu *vcpu,
 		struct sys_reg_params *p,
 		const struct sys_reg_desc *r)
 {
+	/* Forward this trap to the guest hypervisor if it asked for it */
+	if (!vcpu_mode_el2(vcpu) &&
+	    (vcpu_el2_reg(vcpu, CPTR_EL2) & CPTR_EL2_TCPAC))
+		return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
+
 	access_rw(p, &vcpu_sys_reg(vcpu, r->reg));
 	return true;
 }
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [RFC 21/55] KVM: arm64: Forward HVC instruction to the guest hypervisor
  2017-01-09  6:23 [RFC 00/55] Nested Virtualization on KVM/ARM Jintack Lim
                   ` (19 preceding siblings ...)
  2017-01-09  6:24 ` [RFC 20/55] KVM: arm64: Forward CPACR_EL1 traps to the guest hypervisor Jintack Lim
@ 2017-01-09  6:24 ` Jintack Lim
  2017-02-22 11:47   ` Christoffer Dall
  2017-01-09  6:24 ` [RFC 22/55] KVM: arm64: Handle PSCI call from the guest Jintack Lim
                   ` (35 subsequent siblings)
  56 siblings, 1 reply; 111+ messages in thread
From: Jintack Lim @ 2017-01-09  6:24 UTC (permalink / raw)
  To: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, vladimir.murzin, suzuki.poulose,
	mark.rutland, james.morse, lorenzo.pieralisi, kevin.brodsky,
	wcohen, shankerd, geoff, andre.przywara, eric.auger, anna-maria,
	shihwei, linux-arm-kernel, kvmarm, kvm, linux-kernel
  Cc: jintack

Forward exceptions due to hvc instruction to the guest hypervisor.

Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
---
 arch/arm64/include/asm/kvm_nested.h |  5 +++++
 arch/arm64/kvm/Makefile             |  1 +
 arch/arm64/kvm/handle_exit.c        | 11 +++++++++++
 arch/arm64/kvm/handle_exit_nested.c | 27 +++++++++++++++++++++++++++
 4 files changed, 44 insertions(+)
 create mode 100644 arch/arm64/include/asm/kvm_nested.h
 create mode 100644 arch/arm64/kvm/handle_exit_nested.c

diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
new file mode 100644
index 0000000..620b4d3
--- /dev/null
+++ b/arch/arm64/include/asm/kvm_nested.h
@@ -0,0 +1,5 @@
+#ifndef __ARM64_KVM_NESTED_H__
+#define __ARM64_KVM_NESTED_H__
+
+int handle_hvc_nested(struct kvm_vcpu *vcpu);
+#endif
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index b342bdd..9c35e9a 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -35,4 +35,5 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/irqchip.o
 kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/arch_timer.o
 kvm-$(CONFIG_KVM_ARM_PMU) += $(KVM)/arm/pmu.o
 
+kvm-$(CONFIG_KVM_ARM_NESTED_HYP) += handle_exit_nested.o
 kvm-$(CONFIG_KVM_ARM_NESTED_HYP) += emulate-nested.o
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index a891684..208be16 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -29,6 +29,10 @@
 #include <asm/kvm_mmu.h>
 #include <asm/kvm_psci.h>
 
+#ifdef CONFIG_KVM_ARM_NESTED_HYP
+#include <asm/kvm_nested.h>
+#endif
+
 #define CREATE_TRACE_POINTS
 #include "trace.h"
 
@@ -42,6 +46,13 @@ static int handle_hvc(struct kvm_vcpu *vcpu, struct kvm_run *run)
 			    kvm_vcpu_hvc_get_imm(vcpu));
 	vcpu->stat.hvc_exit_stat++;
 
+#ifdef CONFIG_KVM_ARM_NESTED_HYP
+	/* -EINVAL means the hvc is not forwarded to the guest hypervisor;
+	 * any other return value, success or error, is final. */
+	ret = handle_hvc_nested(vcpu);
+	if (ret != -EINVAL)
+		return ret;
+#endif
 	ret = kvm_psci_call(vcpu);
 	if (ret < 0) {
 		kvm_inject_undefined(vcpu);
diff --git a/arch/arm64/kvm/handle_exit_nested.c b/arch/arm64/kvm/handle_exit_nested.c
new file mode 100644
index 0000000..a6ce23b
--- /dev/null
+++ b/arch/arm64/kvm/handle_exit_nested.c
@@ -0,0 +1,27 @@
+/*
+ * Copyright (C) 2016 - Columbia University
+ * Author: Jintack Lim <jintack@cs.columbia.edu>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/kvm.h>
+#include <linux/kvm_host.h>
+
+#include <asm/kvm_emulate.h>
+
+/* We forward all hvc instruction to the guest hypervisor. */
+int handle_hvc_nested(struct kvm_vcpu *vcpu)
+{
+	return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
+}
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [RFC 22/55] KVM: arm64: Handle PSCI call from the guest
  2017-01-09  6:23 [RFC 00/55] Nested Virtualization on KVM/ARM Jintack Lim
                   ` (20 preceding siblings ...)
  2017-01-09  6:24 ` [RFC 21/55] KVM: arm64: Forward HVC instruction " Jintack Lim
@ 2017-01-09  6:24 ` Jintack Lim
  2017-01-09  6:24 ` [RFC 23/55] KVM: arm64: Forward WFX to the guest hypervisor Jintack Lim
                   ` (34 subsequent siblings)
  56 siblings, 0 replies; 111+ messages in thread
From: Jintack Lim @ 2017-01-09  6:24 UTC (permalink / raw)
  To: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, vladimir.murzin, suzuki.poulose,
	mark.rutland, james.morse, lorenzo.pieralisi, kevin.brodsky,
	wcohen, shankerd, geoff, andre.przywara, eric.auger, anna-maria,
	shihwei, linux-arm-kernel, kvmarm, kvm, linux-kernel
  Cc: jintack

VMs used to execute hvc #0 for PSCI calls. However, once we provide
virtual EL2 to a VM, the host OS inside that VM also issues hvc #0 via
kvm_call_hyp, so the host hypervisor cannot tell the two apart.

So, let the VM execute smc for PSCI calls instead. On ARMv8.3, even if
EL3 is not implemented, an smc instruction executed at non-secure EL1
is trapped to EL2 if HCR_EL2.TSC==1, rather than being treated as
UNDEFINED, so the host hypervisor can handle the PSCI call without any
ambiguity.
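
From the VM's side a PSCI call then looks roughly like the sketch
below. This is illustrative only: it assumes the PSCI v0.2 SMC32
PSCI_VERSION function ID (0x84000000) and GCC-style AArch64 inline asm,
and it is only meaningful when executed at guest EL1 (an smc from
userspace is undefined); the main() is just so it compiles.

#include <stdint.h>
#include <stdio.h>

#define PSCI_0_2_FN_PSCI_VERSION	0x84000000u

/* Issue a PSCI call via smc #0; with HCR_EL2.TSC set, the host
 * hypervisor traps this even when EL3 is not implemented. */
static uint64_t psci_smc_call(uint64_t fn)
{
	register uint64_t x0 asm("x0") = fn;

	asm volatile("smc #0" : "+r" (x0) : : "memory");
	return x0;
}

int main(void)
{
	printf("PSCI_VERSION = %#llx\n",
	       (unsigned long long)psci_smc_call(PSCI_0_2_FN_PSCI_VERSION));
	return 0;
}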

Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
---
 arch/arm64/kvm/handle_exit.c | 23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index 208be16..ce6d2ef 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -64,8 +64,27 @@ static int handle_hvc(struct kvm_vcpu *vcpu, struct kvm_run *run)
 
 static int handle_smc(struct kvm_vcpu *vcpu, struct kvm_run *run)
 {
-	kvm_inject_undefined(vcpu);
-	return 1;
+	int ret;
+
+	/* If imm is non-zero, it's not defined */
+	if (kvm_vcpu_hvc_get_imm(vcpu)) {
+		kvm_inject_undefined(vcpu);
+		return 1;
+	}
+
+	/*
+	 * If imm is zero, it's a psci call.
+	 * Note that on ARMv8.3, even if EL3 is not implemented, SMC executed
+	 * at Non-secure EL1 is trapped to EL2 if HCR_EL2.TSC==1, rather than
+	 * being treated as UNDEFINED.
+	 */
+	ret = kvm_psci_call(vcpu);
+	if (ret < 0) {
+		kvm_inject_undefined(vcpu);
+		return 1;
+	}
+
+	return ret;
 }
 
 /**
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [RFC 23/55] KVM: arm64: Forward WFX to the guest hypervisor
  2017-01-09  6:23 [RFC 00/55] Nested Virtualization on KVM/ARM Jintack Lim
                   ` (21 preceding siblings ...)
  2017-01-09  6:24 ` [RFC 22/55] KVM: arm64: Handle PSCI call from the guest Jintack Lim
@ 2017-01-09  6:24 ` Jintack Lim
  2017-01-09  6:24 ` [RFC 24/55] KVM: arm64: Forward FP exceptions " Jintack Lim
                   ` (33 subsequent siblings)
  56 siblings, 0 replies; 111+ messages in thread
From: Jintack Lim @ 2017-01-09  6:24 UTC (permalink / raw)
  To: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, vladimir.murzin, suzuki.poulose,
	mark.rutland, james.morse, lorenzo.pieralisi, kevin.brodsky,
	wcohen, shankerd, geoff, andre.przywara, eric.auger, anna-maria,
	shihwei, linux-arm-kernel, kvmarm, kvm, linux-kernel
  Cc: jintack

Forward exceptions due to WFI or WFE to the guest hypervisor if the
guest hypervisor has set the corresponding virtual HCR_EL2.TWx bit
(TWI or TWE).

Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
---
 arch/arm64/include/asm/kvm_nested.h |  1 +
 arch/arm64/kvm/handle_exit.c        | 11 ++++++++++-
 arch/arm64/kvm/handle_exit_nested.c | 18 ++++++++++++++++++
 3 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
index 620b4d3..8d36935 100644
--- a/arch/arm64/include/asm/kvm_nested.h
+++ b/arch/arm64/include/asm/kvm_nested.h
@@ -2,4 +2,5 @@
 #define __ARM64_KVM_NESTED_H__
 
 int handle_hvc_nested(struct kvm_vcpu *vcpu);
+int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe);
 #endif
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index ce6d2ef..046fdf8 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -101,7 +101,16 @@ static int handle_smc(struct kvm_vcpu *vcpu, struct kvm_run *run)
  */
 static int kvm_handle_wfx(struct kvm_vcpu *vcpu, struct kvm_run *run)
 {
-	if (kvm_vcpu_get_hsr(vcpu) & ESR_ELx_WFx_ISS_WFE) {
+	bool is_wfe = !!(kvm_vcpu_get_hsr(vcpu) & ESR_ELx_WFx_ISS_WFE);
+#ifdef CONFIG_KVM_ARM_NESTED_HYP
+	/* -EINVAL means the wfx is not forwarded to the guest hypervisor;
+	 * any other return value, success or error, is final. */
+	int ret = handle_wfx_nested(vcpu, is_wfe);
+
+	if (ret != -EINVAL)
+		return ret;
+#endif
+	if (is_wfe) {
 		trace_kvm_wfx_arm64(*vcpu_pc(vcpu), true);
 		vcpu->stat.wfe_exit_stat++;
 		kvm_vcpu_on_spin(vcpu);
diff --git a/arch/arm64/kvm/handle_exit_nested.c b/arch/arm64/kvm/handle_exit_nested.c
index a6ce23b..871ecfc 100644
--- a/arch/arm64/kvm/handle_exit_nested.c
+++ b/arch/arm64/kvm/handle_exit_nested.c
@@ -25,3 +25,21 @@ int handle_hvc_nested(struct kvm_vcpu *vcpu)
 {
 	return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
 }
+
+/*
+ * Inject wfx to the nested hypervisor if this is from the nested VM and
+ * the virtual HCR_EL2.TWX is set. Otherwise, let the host hypervisor
+ * handle this.
+ */
+int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe)
+{
+	u64 hcr_el2 = vcpu_el2_reg(vcpu, HCR_EL2);
+
+	if (vcpu_mode_el2(vcpu))
+		return -EINVAL;
+
+	if ((is_wfe && (hcr_el2 & HCR_TWE)) || (!is_wfe && (hcr_el2 & HCR_TWI)))
+		return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
+
+	return -EINVAL;
+}
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [RFC 24/55] KVM: arm64: Forward FP exceptions to the guest hypervisor
  2017-01-09  6:23 [RFC 00/55] Nested Virtualization on KVM/ARM Jintack Lim
                   ` (22 preceding siblings ...)
  2017-01-09  6:24 ` [RFC 23/55] KVM: arm64: Forward WFX to the guest hypervisor Jintack Lim
@ 2017-01-09  6:24 ` Jintack Lim
  2017-01-09  6:24 ` [RFC 25/55] KVM: arm/arm64: Let vcpu thread modify its own active state Jintack Lim
                   ` (32 subsequent siblings)
  56 siblings, 0 replies; 111+ messages in thread
From: Jintack Lim @ 2017-01-09  6:24 UTC (permalink / raw)
  To: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, vladimir.murzin, suzuki.poulose,
	mark.rutland, james.morse, lorenzo.pieralisi, kevin.brodsky,
	wcohen, shankerd, geoff, andre.przywara, eric.auger, anna-maria,
	shihwei, linux-arm-kernel, kvmarm, kvm, linux-kernel
  Cc: jintack

Forward exceptions due to floating-point register accesses to the guest
hypervisor if it has set the virtual CPTR_EL2.TFP bit.
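
In C, the decision made by the new assembly could be restated roughly
as follows (toy names; the authoritative version is the
__fpsimd_guest_trap code below):

#include <stdint.h>
#include <stdio.h>

#define TOY_CPTR_TFP	(1u << 10)	/* CPTR_EL2.TFP */

/* On an FP/ASIMD trap: if the guest hypervisor asked to trap FP
 * accesses (virtual CPTR_EL2.TFP set), take the full exit path so
 * the trap can be reinjected; otherwise switch FP state and resume. */
static const char *fp_trap_action(uint32_t virt_cptr)
{
	if (virt_cptr & TOY_CPTR_TFP)
		return "exit and forward the trap to the guest hypervisor";
	return "restore guest FP registers and resume";
}

int main(void)
{
	printf("%s\n", fp_trap_action(TOY_CPTR_TFP));
	printf("%s\n", fp_trap_action(0));
	return 0;
}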

Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
---
 arch/arm64/include/asm/kvm_nested.h |  1 +
 arch/arm64/kernel/asm-offsets.c     |  1 +
 arch/arm64/kvm/handle_exit.c        |  3 +++
 arch/arm64/kvm/handle_exit_nested.c |  6 ++++++
 arch/arm64/kvm/hyp/entry.S          | 14 ++++++++++++++
 arch/arm64/kvm/hyp/hyp-entry.S      |  2 +-
 6 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
index 8d36935..54c5ce5 100644
--- a/arch/arm64/include/asm/kvm_nested.h
+++ b/arch/arm64/include/asm/kvm_nested.h
@@ -3,4 +3,5 @@
 
 int handle_hvc_nested(struct kvm_vcpu *vcpu);
 int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe);
+int kvm_handle_fp_asimd(struct kvm_vcpu *vcpu, struct kvm_run *run);
 #endif
diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
index 4a2f0f0..b635f1a 100644
--- a/arch/arm64/kernel/asm-offsets.c
+++ b/arch/arm64/kernel/asm-offsets.c
@@ -131,6 +131,7 @@ int main(void)
   DEFINE(CPU_FP_REGS,		offsetof(struct kvm_regs, fp_regs));
   DEFINE(VCPU_FPEXC32_EL2,	offsetof(struct kvm_vcpu, arch.ctxt.sys_regs[FPEXC32_EL2]));
   DEFINE(VCPU_HOST_CONTEXT,	offsetof(struct kvm_vcpu, arch.host_cpu_context));
+  DEFINE(VIRTUAL_CPTR_EL2,	offsetof(struct kvm_vcpu, arch.ctxt.el2_regs[CPTR_EL2]));
 #endif
 #ifdef CONFIG_CPU_PM
   DEFINE(CPU_SUSPEND_SZ,	sizeof(struct cpu_suspend_ctx));
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index 046fdf8..308f5c5 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -195,6 +195,9 @@ static int kvm_handle_eret(struct kvm_vcpu *vcpu, struct kvm_run *run)
 	[ESR_ELx_EC_BREAKPT_LOW]= kvm_handle_guest_debug,
 	[ESR_ELx_EC_BKPT32]	= kvm_handle_guest_debug,
 	[ESR_ELx_EC_BRK64]	= kvm_handle_guest_debug,
+#ifdef CONFIG_KVM_ARM_NESTED_HYP
+	[ESR_ELx_EC_FP_ASIMD]	= kvm_handle_fp_asimd,
+#endif
 };
 
 static exit_handle_fn kvm_get_exit_handler(struct kvm_vcpu *vcpu)
diff --git a/arch/arm64/kvm/handle_exit_nested.c b/arch/arm64/kvm/handle_exit_nested.c
index 871ecfc..7544c6d 100644
--- a/arch/arm64/kvm/handle_exit_nested.c
+++ b/arch/arm64/kvm/handle_exit_nested.c
@@ -43,3 +43,9 @@ int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe)
 
 	return -EINVAL;
 }
+
+/* This is only called when virtual CPTR_EL2.TFP bit is set. */
+int kvm_handle_fp_asimd(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
+}
diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
index 12ee62d..a76f102 100644
--- a/arch/arm64/kvm/hyp/entry.S
+++ b/arch/arm64/kvm/hyp/entry.S
@@ -158,6 +158,20 @@ abort_guest_exit_end:
 1:	ret
 ENDPROC(__guest_exit)
 
+ENTRY(__fpsimd_guest_trap)
+#ifdef CONFIG_KVM_ARM_NESTED_HYP
+// If the virtual CPTR_EL2.TFP is set, forward the trap to the nested hyp.
+	mrs	x1, tpidr_el2
+	ldr	x0, [x1, #VIRTUAL_CPTR_EL2]
+	and 	x0, x0, #CPTR_EL2_TFP
+	cbnz	x0, 1f
+#endif
+	b	__fpsimd_guest_restore
+1:
+	mov	x0, #ARM_EXCEPTION_TRAP
+	b	__guest_exit
+ENDPROC(__fpsimd_guest_trap)
+
 ENTRY(__fpsimd_guest_restore)
 	stp	x2, x3, [sp, #-16]!
 	stp	x4, lr, [sp, #-16]!
diff --git a/arch/arm64/kvm/hyp/hyp-entry.S b/arch/arm64/kvm/hyp/hyp-entry.S
index 4e92399..d83494b 100644
--- a/arch/arm64/kvm/hyp/hyp-entry.S
+++ b/arch/arm64/kvm/hyp/hyp-entry.S
@@ -108,7 +108,7 @@ el1_trap:
 
 	/* Guest accessed VFP/SIMD registers, save host, restore Guest */
 	cmp	x0, #ESR_ELx_EC_FP_ASIMD
-	b.eq	__fpsimd_guest_restore
+	b.eq	__fpsimd_guest_trap
 
 	mrs	x1, tpidr_el2
 	mov	x0, #ARM_EXCEPTION_TRAP
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [RFC 25/55] KVM: arm/arm64: Let vcpu thread modify its own active state
  2017-01-09  6:23 [RFC 00/55] Nested Virtualization on KVM/ARM Jintack Lim
                   ` (23 preceding siblings ...)
  2017-01-09  6:24 ` [RFC 24/55] KVM: arm64: Forward FP exceptions " Jintack Lim
@ 2017-01-09  6:24 ` Jintack Lim
  2017-02-22 12:27   ` Christoffer Dall
  2017-01-09  6:24 ` [RFC 26/55] KVM: arm/arm64: Add VGIC data structures for the nesting Jintack Lim
                   ` (31 subsequent siblings)
  56 siblings, 1 reply; 111+ messages in thread
From: Jintack Lim @ 2017-01-09  6:24 UTC (permalink / raw)
  To: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, vladimir.murzin, suzuki.poulose,
	mark.rutland, james.morse, lorenzo.pieralisi, kevin.brodsky,
	wcohen, shankerd, geoff, andre.przywara, eric.auger, anna-maria,
	shihwei, linux-arm-kernel, kvmarm, kvm, linux-kernel
  Cc: jintack

Currently, if a VCPU thread tries to change its own active state while
the IRQ is already on an AP list, it will loop forever waiting for
itself to stop running. Since the VCPU thread has already synced the LR
state back to the struct vgic_irq at this point, let it modify its own
state safely.
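
Schematically, the ownership test this patch relaxes looks like the
following (a standalone restatement with invented field names, not the
kernel structures):

#include <stdbool.h>
#include <stdio.h>

struct toy_irq {
	int owner_vcpu;			/* -1: not on anyone's AP list */
	int owner_running_on_cpu;	/* -1: owner not running */
};

/* Before this patch only the first and third tests existed, so a
 * VCPU waiting on its own IRQ could never make progress. Letting
 * the owner through is safe because it has already synced the LR
 * state back by the time it gets here. */
static bool must_wait(const struct toy_irq *irq, int self_vcpu)
{
	return irq->owner_vcpu != -1 &&		/* state may be in an LR */
	       irq->owner_vcpu != self_vcpu &&	/* not our own state */
	       irq->owner_running_on_cpu != -1;	/* owner is running */
}

int main(void)
{
	struct toy_irq irq = { .owner_vcpu = 0, .owner_running_on_cpu = 2 };

	printf("other vcpu waits: %d\n", must_wait(&irq, 1));
	printf("owning vcpu waits: %d\n", must_wait(&irq, 0));
	return 0;
}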

Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
---
 virt/kvm/arm/vgic/vgic-mmio.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/virt/kvm/arm/vgic/vgic-mmio.c b/virt/kvm/arm/vgic/vgic-mmio.c
index ebe1b9f..049c570 100644
--- a/virt/kvm/arm/vgic/vgic-mmio.c
+++ b/virt/kvm/arm/vgic/vgic-mmio.c
@@ -192,9 +192,9 @@ static void vgic_mmio_change_active(struct kvm_vcpu *vcpu, struct vgic_irq *irq,
 	 * If this virtual IRQ was written into a list register, we
 	 * have to make sure the CPU that runs the VCPU thread has
 	 * synced back LR state to the struct vgic_irq.  We can only
-	 * know this for sure, when either this irq is not assigned to
+	 * know this for sure, when this irq is not assigned to
 	 * anyone's AP list anymore, or the VCPU thread is not
-	 * running on any CPUs.
+	 * running on any CPUs, or current thread is the VCPU thread.
 	 *
 	 * In the opposite case, we know the VCPU thread may be on its
 	 * way back from the guest and still has to sync back this
@@ -202,6 +202,7 @@ static void vgic_mmio_change_active(struct kvm_vcpu *vcpu, struct vgic_irq *irq,
 	 * other thread sync back the IRQ.
 	 */
 	while (irq->vcpu && /* IRQ may have state in an LR somewhere */
+	       irq->vcpu != vcpu && /* Current thread is not the VCPU thread */
 	       irq->vcpu->cpu != -1) /* VCPU thread is running */
 		cond_resched_lock(&irq->irq_lock);
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [RFC 26/55] KVM: arm/arm64: Add VGIC data structures for the nesting
  2017-01-09  6:23 [RFC 00/55] Nested Virtualization on KVM/ARM Jintack Lim
                   ` (24 preceding siblings ...)
  2017-01-09  6:24 ` [RFC 25/55] KVM: arm/arm64: Let vcpu thread modify its own active state Jintack Lim
@ 2017-01-09  6:24 ` Jintack Lim
  2017-01-09  6:24 ` [RFC 27/55] KVM: arm/arm64: Emulate GICH interface on GICv2 Jintack Lim
                   ` (30 subsequent siblings)
  56 siblings, 0 replies; 111+ messages in thread
From: Jintack Lim @ 2017-01-09  6:24 UTC (permalink / raw)
  To: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, vladimir.murzin, suzuki.poulose,
	mark.rutland, james.morse, lorenzo.pieralisi, kevin.brodsky,
	wcohen, shankerd, geoff, andre.przywara, eric.auger, anna-maria,
	shihwei, linux-arm-kernel, kvmarm, kvm, linux-kernel
  Cc: jintack

From: Christoffer Dall <christoffer.dall@linaro.org>

This adds a couple of extra data structures:

The nested_vgic_vX structures contain the data manipulated by the guest
hypervisor when it faults/traps on accesses to the GICH_ interface.

The shadow_vgic_vX arrays contain the shadow copies of the LRs.  That
is, it is a modified version of the nested_vgic_vX->vgic_lr.  The reason
why we need a modified version is that for interrupts with the HW bit
set (those for the timer) the interrupt number must be that of the host
hardware number, and not the virtual one programmed by the guest
hypervisor.

The hw_vX_cpu_if pointers point to the registers that the lowvisor (EL2)
code actually copied into hardware when switching to the guest, so at
init time we set:

vgic_cpu->hw_v2_cpu_if = &vcpu->arch.vgic_cpu.vgic_v2;

And we should change the vgic-sr function to read the LRs from the
hw_v2_lr pointer.
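
As a sketch of the LR shadowing this enables (toy code; the field
layout roughly follows the GICv2 GICH_LR register, with the physical
interrupt ID in bits [19:10], so verify against the GIC spec):

#include <stdint.h>
#include <stdio.h>

#define TOY_LR_HW	(1u << 31)	/* LR is linked to a hw intid */

/* Build the shadow copy of a guest-hypervisor LR: for LRs with the
 * HW bit set (e.g. the timer), the physical-ID field must name the
 * host's hardware interrupt, not the one the guest hypervisor
 * programmed. */
static uint32_t shadow_lr(uint32_t nested_lr, uint32_t host_hw_intid)
{
	if (!(nested_lr & TOY_LR_HW))
		return nested_lr;

	nested_lr &= ~(0x3ffu << 10);		/* clear physical ID */
	return nested_lr | ((host_hw_intid & 0x3ffu) << 10);
}

int main(void)
{
	/* guest hypervisor programmed physical intid 27; host uses 26 */
	uint32_t lr = TOY_LR_HW | (27u << 10) | 27u;

	printf("shadow lr = %#x\n", shadow_lr(lr, 26));
	return 0;
}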

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
---
 include/kvm/arm_vgic.h | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 002f092..9a9cb27 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -246,6 +246,26 @@ struct vgic_cpu {
 	unsigned int used_lrs;
 	struct vgic_irq private_irqs[VGIC_NR_PRIVATE_IRQS];
 
+	/* CPU vif control registers for the virtual GICH interface */
+	union {
+		struct vgic_v2_cpu_if	nested_vgic_v2;
+		struct vgic_v3_cpu_if	nested_vgic_v3;
+	};
+
+	/*
+	 * The shadow vif control register loaded to the hardware when
+	 * running a nested L2 guest with the virtual IMO bit set.
+	 */
+	union {
+		struct vgic_v2_cpu_if	shadow_vgic_v2;
+		struct vgic_v3_cpu_if	shadow_vgic_v3;
+	};
+
+	union {
+		struct vgic_v2_cpu_if	*hw_v2_cpu_if;
+		struct vgic_v3_cpu_if	*hw_v3_cpu_if;
+	};
+
 	spinlock_t ap_list_lock;	/* Protects the ap_list */
 
 	/*
-- 
1.9.1

* [RFC 27/55] KVM: arm/arm64: Emulate GICH interface on GICv2
  2017-01-09  6:23 [RFC 00/55] Nested Virtualization on KVM/ARM Jintack Lim
                   ` (25 preceding siblings ...)
  2017-01-09  6:24 ` [RFC 26/55] KVM: arm/arm64: Add VGIC data structures for the nesting Jintack Lim
@ 2017-01-09  6:24 ` Jintack Lim
  2017-02-22 13:06   ` Christoffer Dall
  2017-01-09  6:24 ` [RFC 28/55] KVM: arm/arm64: Prepare vgic state for the nested VM Jintack Lim
                   ` (29 subsequent siblings)
  56 siblings, 1 reply; 111+ messages in thread
From: Jintack Lim @ 2017-01-09  6:24 UTC (permalink / raw)
  To: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, vladimir.murzin, suzuki.poulose,
	mark.rutland, james.morse, lorenzo.pieralisi, kevin.brodsky,
	wcohen, shankerd, geoff, andre.przywara, eric.auger, anna-maria,
	shihwei, linux-arm-kernel, kvmarm, kvm, linux-kernel
  Cc: jintack

Emulate GICH interface accesses from the guest hypervisor.

Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
Signed-off-by: Shih-Wei Li <shihwei@cs.columbia.edu>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
 arch/arm64/kvm/Makefile            |   1 +
 virt/kvm/arm/vgic/vgic-v2-nested.c | 207 +++++++++++++++++++++++++++++++++++++
 2 files changed, 208 insertions(+)
 create mode 100644 virt/kvm/arm/vgic/vgic-v2-nested.c

diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index 9c35e9a..8573faf 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -37,3 +37,4 @@ kvm-$(CONFIG_KVM_ARM_PMU) += $(KVM)/arm/pmu.o
 
 kvm-$(CONFIG_KVM_ARM_NESTED_HYP) += handle_exit_nested.o
 kvm-$(CONFIG_KVM_ARM_NESTED_HYP) += emulate-nested.o
+kvm-$(CONFIG_KVM_ARM_NESTED_HYP) += $(KVM)/arm/vgic/vgic-v2-nested.o
diff --git a/virt/kvm/arm/vgic/vgic-v2-nested.c b/virt/kvm/arm/vgic/vgic-v2-nested.c
new file mode 100644
index 0000000..b13128e
--- /dev/null
+++ b/virt/kvm/arm/vgic/vgic-v2-nested.c
@@ -0,0 +1,207 @@
+#include <linux/cpu.h>
+#include <linux/kvm.h>
+#include <linux/kvm_host.h>
+#include <linux/interrupt.h>
+#include <linux/io.h>
+#include <linux/uaccess.h>
+
+#include <linux/irqchip/arm-gic.h>
+
+#include <asm/kvm_emulate.h>
+#include <asm/kvm_arm.h>
+#include <asm/kvm_mmu.h>
+#include <kvm/arm_vgic.h>
+
+#include "vgic.h"
+#include "vgic-mmio.h"
+
+static inline struct vgic_v2_cpu_if *vcpu_nested_if(struct kvm_vcpu *vcpu)
+{
+	return &vcpu->arch.vgic_cpu.nested_vgic_v2;
+}
+
+static inline struct vgic_v2_cpu_if *vcpu_shadow_if(struct kvm_vcpu *vcpu)
+{
+	return &vcpu->arch.vgic_cpu.shadow_vgic_v2;
+}
+
+static unsigned long vgic_mmio_read_v2_vtr(struct kvm_vcpu *vcpu,
+					   gpa_t addr, unsigned int len)
+{
+	u32 reg;
+
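+	/*
+	 * GICH_VTR: ListRegs [5:0] holds the number of implemented LRs
+	 * minus one; PREbits [28:26] and PRIbits [31:29] each encode the
+	 * number of implemented bits minus one, so 0b100 advertises five
+	 * preemption and five priority bits.
+	 */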
+	reg = kvm_vgic_global_state.nr_lr - 1;
+	reg |= 0b100 << 26;
+	reg |= 0b100 << 29;
+
+	return reg;
+}
+
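+/*
+ * An LR requests an EOI maintenance interrupt when it has no pending
+ * or active state, is not a hardware interrupt, and has its EOI bit
+ * set.
+ */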
+static inline bool lr_triggers_eoi(u32 lr)
+{
+	return !(lr & (GICH_LR_STATE | GICH_LR_HW)) && (lr & GICH_LR_EOI);
+}
+
+static unsigned long get_eisr(struct kvm_vcpu *vcpu, bool upper_reg)
+{
+	struct vgic_v2_cpu_if *cpu_if = vcpu_nested_if(vcpu);
+	int max_lr = upper_reg ? 64 : 32;
+	int min_lr = upper_reg ? 32 : 0;
+	int nr_lr = min(kvm_vgic_global_state.nr_lr, max_lr);
+	int i;
+	u32 reg = 0;
+
+	for (i = min_lr; i < nr_lr; i++) {
+		if (lr_triggers_eoi(cpu_if->vgic_lr[i]))
+			reg |= BIT(i - min_lr);
+	}
+
+	return reg;
+}
+
+static unsigned long vgic_mmio_read_v2_eisr0(struct kvm_vcpu *vcpu,
+					     gpa_t addr, unsigned int len)
+{
+	return get_eisr(vcpu, false);
+}
+
+static unsigned long vgic_mmio_read_v2_eisr1(struct kvm_vcpu *vcpu,
+					     gpa_t addr, unsigned int len)
+{
+	return get_eisr(vcpu, true);
+}
+
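+/*
+ * GICH_ELRSRn: a set bit means the corresponding LR's state field is
+ * neither pending nor active, so the LR is free for reuse.
+ */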
+static u32 get_elrsr(struct kvm_vcpu *vcpu, bool upper_reg)
+{
+	struct vgic_v2_cpu_if *cpu_if = vcpu_nested_if(vcpu);
+	int max_lr = upper_reg ? 64 : 32;
+	int min_lr = upper_reg ? 32 : 0;
+	int nr_lr = min(kvm_vgic_global_state.nr_lr, max_lr);
+	u32 reg = 0;
+	int i;
+
+	for (i = min_lr; i < nr_lr; i++) {
+		if (!(cpu_if->vgic_lr[i] & GICH_LR_STATE))
+			reg |= BIT(i - min_lr);
+	}
+
+	return reg;
+}
+
+static unsigned long vgic_mmio_read_v2_elrsr0(struct kvm_vcpu *vcpu,
+					      gpa_t addr, unsigned int len)
+{
+	return get_elrsr(vcpu, false);
+}
+
+static unsigned long vgic_mmio_read_v2_elrsr1(struct kvm_vcpu *vcpu,
+					      gpa_t addr, unsigned int len)
+{
+	return get_elrsr(vcpu, true);
+}
+
+static unsigned long vgic_mmio_read_v2_misr(struct kvm_vcpu *vcpu,
+					    gpa_t addr, unsigned int len)
+{
+	struct vgic_v2_cpu_if *cpu_if = vcpu_nested_if(vcpu);
+	int nr_lr = kvm_vgic_global_state.nr_lr;
+	u32 reg = 0;
+
+	if (vgic_mmio_read_v2_eisr0(vcpu, addr, len) ||
+			vgic_mmio_read_v2_eisr1(vcpu, addr, len))
+		reg |= GICH_MISR_EOI;
+
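+	/*
+	 * GICH_MISR.U (underflow) is signalled when GICH_HCR.UIE is set
+	 * and at most one LR still holds a valid (pending or active)
+	 * interrupt.
+	 */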
+	if (cpu_if->vgic_hcr & GICH_HCR_UIE) {
+		u32 elrsr0 = vgic_mmio_read_v2_elrsr0(vcpu, addr, len);
+		u32 elrsr1 = vgic_mmio_read_v2_elrsr1(vcpu, addr, len);
+		int used_lrs;
+
+		used_lrs = nr_lr - (hweight32(elrsr0) + hweight32(elrsr1));
+		if (used_lrs <= 1)
+			reg |= GICH_MISR_U;
+	}
+
+	/* TODO: Support remaining bits in this register */
+	return reg;
+}
+
+static unsigned long vgic_mmio_read_v2_gich(struct kvm_vcpu *vcpu,
+					    gpa_t addr, unsigned int len)
+{
+	struct vgic_v2_cpu_if *cpu_if = vcpu_nested_if(vcpu);
+	u32 value;
+
+	switch (addr & 0xfff) {
+	case GICH_HCR:
+		value = cpu_if->vgic_hcr;
+		break;
+	case GICH_VMCR:
+		value = cpu_if->vgic_vmcr;
+		break;
+	case GICH_APR:
+		value = cpu_if->vgic_apr;
+		break;
+	case GICH_LR0 ... (GICH_LR0 + 4 * (VGIC_V2_MAX_LRS - 1)):
+		value = cpu_if->vgic_lr[(addr & 0xff) >> 2];
+		break;
+	default:
+		return 0;
+	}
+
+	return value;
+}
+
+static void vgic_mmio_write_v2_gich(struct kvm_vcpu *vcpu,
+				    gpa_t addr, unsigned int len,
+				    unsigned long val)
+{
+	struct vgic_v2_cpu_if *cpu_if = vcpu_nested_if(vcpu);
+
+	switch (addr & 0xfff) {
+	case GICH_HCR:
+		cpu_if->vgic_hcr = val;
+		break;
+	case GICH_VMCR:
+		cpu_if->vgic_vmcr = val;
+		break;
+	case GICH_APR:
+		cpu_if->vgic_apr = val;
+		break;
+	case GICH_LR0 ... (GICH_LR0 + 4 * (VGIC_V2_MAX_LRS - 1)):
+		cpu_if->vgic_lr[(addr & 0xff) >> 2] = val;
+		break;
+	}
+}
+
+static const struct vgic_register_region vgic_v2_gich_registers[] = {
+	REGISTER_DESC_WITH_LENGTH(GICH_HCR,
+		vgic_mmio_read_v2_gich, vgic_mmio_write_v2_gich, 4,
+		VGIC_ACCESS_32bit),
+	REGISTER_DESC_WITH_LENGTH(GICH_VTR,
+		vgic_mmio_read_v2_vtr, vgic_mmio_write_wi, 4,
+		VGIC_ACCESS_32bit),
+	REGISTER_DESC_WITH_LENGTH(GICH_VMCR,
+		vgic_mmio_read_v2_gich, vgic_mmio_write_v2_gich, 4,
+		VGIC_ACCESS_32bit),
+	REGISTER_DESC_WITH_LENGTH(GICH_MISR,
+		vgic_mmio_read_v2_misr, vgic_mmio_write_wi, 4,
+		VGIC_ACCESS_32bit),
+	REGISTER_DESC_WITH_LENGTH(GICH_EISR0,
+		vgic_mmio_read_v2_eisr0, vgic_mmio_write_wi, 4,
+		VGIC_ACCESS_32bit),
+	REGISTER_DESC_WITH_LENGTH(GICH_EISR1,
+		vgic_mmio_read_v2_eisr1, vgic_mmio_write_wi, 4,
+		VGIC_ACCESS_32bit),
+	REGISTER_DESC_WITH_LENGTH(GICH_ELRSR0,
+		vgic_mmio_read_v2_elrsr0, vgic_mmio_write_wi, 4,
+		VGIC_ACCESS_32bit),
+	REGISTER_DESC_WITH_LENGTH(GICH_ELRSR1,
+		vgic_mmio_read_v2_elrsr1, vgic_mmio_write_wi, 4,
+		VGIC_ACCESS_32bit),
+	REGISTER_DESC_WITH_LENGTH(GICH_APR,
+		vgic_mmio_read_v2_gich, vgic_mmio_write_v2_gich, 4,
+		VGIC_ACCESS_32bit),
+	REGISTER_DESC_WITH_LENGTH(GICH_LR0,
+		vgic_mmio_read_v2_gich, vgic_mmio_write_v2_gich,
+		4 * VGIC_V2_MAX_LRS, VGIC_ACCESS_32bit),
+};
-- 
1.9.1

* [RFC 28/55] KVM: arm/arm64: Prepare vgic state for the nested VM
  2017-01-09  6:23 [RFC 00/55] Nested Virtualization on KVM/ARM Jintack Lim
                   ` (26 preceding siblings ...)
  2017-01-09  6:24 ` [RFC 27/55] KVM: arm/arm64: Emulate GICH interface on GICv2 Jintack Lim
@ 2017-01-09  6:24 ` Jintack Lim
  2017-02-22 13:12   ` Christoffer Dall
  2017-01-09  6:24 ` [RFC 29/55] KVM: arm/arm64: Set up the prepared vgic state Jintack Lim
                   ` (28 subsequent siblings)
  56 siblings, 1 reply; 111+ messages in thread
From: Jintack Lim @ 2017-01-09  6:24 UTC (permalink / raw)
  To: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, vladimir.murzin, suzuki.poulose,
	mark.rutland, james.morse, lorenzo.pieralisi, kevin.brodsky,
	wcohen, shankerd, geoff, andre.przywara, eric.auger, anna-maria,
	shihwei, linux-arm-kernel, kvmarm, kvm, linux-kernel
  Cc: jintack

When entering a nested VM, we set up the hypervisor control interface
based on what the guest hypervisor has programmed. In particular, we
inspect each list register written by the guest hypervisor to see
whether its HW bit is set. If so, we translate the hardware IRQ number
from the guest's point of view to the real hardware IRQ number, if a
mapping exists.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
---
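As a worked example (interrupt numbers illustrative only): if the guest
hypervisor programs an LR with the HW bit set and a physical interrupt
ID of 27 for its virtual timer, the shadow LR keeps the HW bit but
carries irq->hwintid, the interrupt number the host actually receives
from the hardware, in the physical ID field.
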
 arch/arm/include/asm/kvm_emulate.h   |  5 ++
 arch/arm64/include/asm/kvm_emulate.h |  5 ++
 arch/arm64/kvm/context.c             |  4 ++
 include/kvm/arm_vgic.h               |  8 +++
 virt/kvm/arm/vgic/vgic-init.c        |  3 ++
 virt/kvm/arm/vgic/vgic-v2-nested.c   | 99 ++++++++++++++++++++++++++++++++++++
 virt/kvm/arm/vgic/vgic.h             | 11 ++++
 7 files changed, 135 insertions(+)

diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
index 0fa2f5a..05d5906 100644
--- a/arch/arm/include/asm/kvm_emulate.h
+++ b/arch/arm/include/asm/kvm_emulate.h
@@ -101,6 +101,11 @@ static inline bool vcpu_mode_el2(const struct kvm_vcpu *vcpu)
 	return false;
 }
 
+static inline bool vcpu_el2_imo_is_set(const struct kvm_vcpu *vcpu)
+{
+	return false;
+}
+
 static inline unsigned long *vcpu_pc(struct kvm_vcpu *vcpu)
 {
 	return &vcpu->arch.ctxt.gp_regs.usr_regs.ARM_pc;
diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 0987ee4..a9c993f 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -178,6 +178,11 @@ static inline bool vcpu_mode_el2(const struct kvm_vcpu *vcpu)
 	return mode == PSR_MODE_EL2h || mode == PSR_MODE_EL2t;
 }
 
+static inline bool vcpu_el2_imo_is_set(const struct kvm_vcpu *vcpu)
+{
+	return (vcpu_el2_reg(vcpu, HCR_EL2) & HCR_IMO);
+}
+
 static inline u32 kvm_vcpu_get_hsr(const struct kvm_vcpu *vcpu)
 {
 	return vcpu->arch.fault.esr_el2;
diff --git a/arch/arm64/kvm/context.c b/arch/arm64/kvm/context.c
index 0025dd9..7a94c9d 100644
--- a/arch/arm64/kvm/context.c
+++ b/arch/arm64/kvm/context.c
@@ -161,6 +161,8 @@ void kvm_arm_setup_shadow_state(struct kvm_vcpu *vcpu)
 		ctxt->hw_sys_regs = ctxt->sys_regs;
 		ctxt->hw_sp_el1 = ctxt->gp_regs.sp_el1;
 	}
+
+	vgic_v2_setup_shadow_state(vcpu);
 }
 
 /**
@@ -179,6 +181,8 @@ void kvm_arm_restore_shadow_state(struct kvm_vcpu *vcpu)
 		*vcpu_cpsr(vcpu) = ctxt->hw_pstate;
 		ctxt->gp_regs.sp_el1 = ctxt->hw_sp_el1;
 	}
+
+	vgic_v2_restore_shadow_state(vcpu);
 }
 
 void kvm_arm_init_cpu_context(kvm_cpu_context_t *cpu_ctxt)
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 9a9cb27..484f6b1 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -312,6 +312,14 @@ int kvm_vgic_inject_mapped_irq(struct kvm *kvm, int cpuid, unsigned int intid,
 
 int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu);
 
+#ifdef CONFIG_KVM_ARM_NESTED_HYP
+void vgic_v2_setup_shadow_state(struct kvm_vcpu *vcpu);
+void vgic_v2_restore_shadow_state(struct kvm_vcpu *vcpu);
+#else
+static inline void vgic_v2_setup_shadow_state(struct kvm_vcpu *vcpu) { }
+static inline void vgic_v2_restore_shadow_state(struct kvm_vcpu *vcpu) { }
+#endif
+
 #define irqchip_in_kernel(k)	(!!((k)->arch.vgic.in_kernel))
 #define vgic_initialized(k)	((k)->arch.vgic.initialized)
 #define vgic_ready(k)		((k)->arch.vgic.ready)
diff --git a/virt/kvm/arm/vgic/vgic-init.c b/virt/kvm/arm/vgic/vgic-init.c
index 8cebfbc..06ab8a5 100644
--- a/virt/kvm/arm/vgic/vgic-init.c
+++ b/virt/kvm/arm/vgic/vgic-init.c
@@ -216,6 +216,9 @@ static void kvm_vgic_vcpu_init(struct kvm_vcpu *vcpu)
 			irq->config = VGIC_CONFIG_LEVEL;
 		}
 	}
+
+	vgic_init_nested(vcpu);
+
 	if (kvm_vgic_global_state.type == VGIC_V2)
 		vgic_v2_enable(vcpu);
 	else
diff --git a/virt/kvm/arm/vgic/vgic-v2-nested.c b/virt/kvm/arm/vgic/vgic-v2-nested.c
index b13128e..a992da5 100644
--- a/virt/kvm/arm/vgic/vgic-v2-nested.c
+++ b/virt/kvm/arm/vgic/vgic-v2-nested.c
@@ -205,3 +205,102 @@ static void vgic_mmio_write_v2_gich(struct kvm_vcpu *vcpu,
 		vgic_mmio_read_v2_gich, vgic_mmio_write_v2_gich,
 		4 * VGIC_V2_MAX_LRS, VGIC_ACCESS_32bit),
 };
+
+/*
+ * For LRs which have HW bit set such as timer interrupts, we modify them to
+ * have the host hardware interrupt number instead of the virtual one programmed
+ * by the guest hypervisor.
+ */
+static void vgic_v2_create_shadow_lr(struct kvm_vcpu *vcpu)
+{
+	int i;
+	struct vgic_v2_cpu_if *cpu_if = vcpu_nested_if(vcpu);
+	struct vgic_v2_cpu_if *s_cpu_if = vcpu_shadow_if(vcpu);
+	struct vgic_irq *irq;
+
+	int nr_lr = kvm_vgic_global_state.nr_lr;
+
+	for (i = 0; i < nr_lr; i++) {
+		u32 lr = cpu_if->vgic_lr[i];
+		int l1_irq;
+
+		if (!(lr & GICH_LR_HW))
+			goto next;
+
+		/* We have the HW bit set */
+		l1_irq = (lr & GICH_LR_PHYSID_CPUID) >>
+			GICH_LR_PHYSID_CPUID_SHIFT;
+		irq = vgic_get_irq(vcpu->kvm, vcpu, l1_irq);
+
+		if (!irq->hw) {
+			/* There was no real mapping, so nuke the HW bit */
+			lr &= ~GICH_LR_HW;
+			vgic_put_irq(vcpu->kvm, irq);
+			goto next;
+		}
+
+		/* Translate the virtual mapping to the real one */
+		lr &= ~GICH_LR_EOI;
+		lr &= ~GICH_LR_PHYSID_CPUID;
+		lr |= irq->hwintid << GICH_LR_PHYSID_CPUID_SHIFT;
+		vgic_put_irq(vcpu->kvm, irq);
+
+next:
+		s_cpu_if->vgic_lr[i] = lr;
+	}
+}
+
+/*
+ * Change the shadow HWIRQ field back to the virtual value before copying over
+ * the entire shadow struct to the nested state.
+ */
+static void vgic_v2_restore_shadow_lr(struct kvm_vcpu *vcpu)
+{
+	struct vgic_v2_cpu_if *cpu_if = vcpu_nested_if(vcpu);
+	struct vgic_v2_cpu_if *s_cpu_if = vcpu_shadow_if(vcpu);
+	int nr_lr = kvm_vgic_global_state.nr_lr;
+	int lr;
+
+	for (lr = 0; lr < nr_lr; lr++) {
+		s_cpu_if->vgic_lr[lr] &= ~GICH_LR_PHYSID_CPUID;
+		s_cpu_if->vgic_lr[lr] |= cpu_if->vgic_lr[lr] &
+			GICH_LR_PHYSID_CPUID;
+	}
+}
+
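+/*
+ * Pick the CPU interface state to load into hardware on the next
+ * entry: the shadow copy when we are about to run a nested VM on
+ * behalf of the guest hypervisor (virtual IMO set, not in virtual
+ * EL2), otherwise the regular vgic_v2 state.
+ */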
+void vgic_v2_setup_shadow_state(struct kvm_vcpu *vcpu)
+{
+	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
+	struct vgic_v2_cpu_if *cpu_if;
+
+	if (vcpu_el2_imo_is_set(vcpu) && !vcpu_mode_el2(vcpu)) {
+		vgic_cpu->shadow_vgic_v2 = vgic_cpu->nested_vgic_v2;
+		vgic_v2_create_shadow_lr(vcpu);
+		cpu_if = vcpu_shadow_if(vcpu);
+	} else {
+		cpu_if = &vgic_cpu->vgic_v2;
+	}
+
+	vgic_cpu->hw_v2_cpu_if = cpu_if;
+}
+
+void vgic_v2_restore_shadow_state(struct kvm_vcpu *vcpu)
+{
+	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
+
+	/* Not using shadow state: Nothing to do... */
+	if (vgic_cpu->hw_v2_cpu_if == &vgic_cpu->vgic_v2)
+		return;
+
+	/*
+	 * Translate the shadow state HW fields back to the virtual ones
+	 * before copying the shadow struct back to the nested one.
+	 */
+	vgic_v2_restore_shadow_lr(vcpu);
+	vgic_cpu->nested_vgic_v2 = vgic_cpu->shadow_vgic_v2;
+}
+
+void vgic_init_nested(struct kvm_vcpu *vcpu)
+{
+	vgic_v2_setup_shadow_state(vcpu);
+}
diff --git a/virt/kvm/arm/vgic/vgic.h b/virt/kvm/arm/vgic/vgic.h
index 9d9e014..2aef680 100644
--- a/virt/kvm/arm/vgic/vgic.h
+++ b/virt/kvm/arm/vgic/vgic.h
@@ -120,4 +120,15 @@ static inline int vgic_its_inject_msi(struct kvm *kvm, struct kvm_msi *msi)
 int vgic_lazy_init(struct kvm *kvm);
 int vgic_init(struct kvm *kvm);
 
+#ifdef CONFIG_KVM_ARM_NESTED_HYP
+void vgic_init_nested(struct kvm_vcpu *vcpu);
+#else
+static inline void vgic_init_nested(struct kvm_vcpu *vcpu)
+{
+	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
+
+	vgic_cpu->hw_v2_cpu_if = &vgic_cpu->vgic_v2;
+}
+#endif
+
 #endif
-- 
1.9.1

* [RFC 29/55] KVM: arm/arm64: Set up the prepared vgic state
  2017-01-09  6:23 [RFC 00/55] Nested Virtualization on KVM/ARM Jintack Lim
                   ` (27 preceding siblings ...)
  2017-01-09  6:24 ` [RFC 28/55] KVM: arm/arm64: Prepare vgic state for the nested VM Jintack Lim
@ 2017-01-09  6:24 ` Jintack Lim
  2017-01-09  6:24 ` [RFC 30/55] KVM: arm/arm64: Inject irqs to the guest hypervisor Jintack Lim
                   ` (27 subsequent siblings)
  56 siblings, 0 replies; 111+ messages in thread
From: Jintack Lim @ 2017-01-09  6:24 UTC (permalink / raw)
  To: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, vladimir.murzin, suzuki.poulose,
	mark.rutland, james.morse, lorenzo.pieralisi, kevin.brodsky,
	wcohen, shankerd, geoff, andre.przywara, eric.auger, anna-maria,
	shihwei, linux-arm-kernel, kvmarm, kvm, linux-kernel
  Cc: jintack

Now that the vgic state is properly prepared and pointed to by
hw_v2_cpu_if, use it when manipulating the vgic.

Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
---
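Note that hyp (EL2) code runs under its own VA mapping, so the pointer
installed by the kernel must be translated before being dereferenced;
the __hyp_get_cpu_if() helper added below is essentially:

	return kern_hyp_va(vcpu->arch.vgic_cpu.hw_v2_cpu_if);
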
 virt/kvm/arm/hyp/vgic-v2-sr.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/virt/kvm/arm/hyp/vgic-v2-sr.c b/virt/kvm/arm/hyp/vgic-v2-sr.c
index c8aeb7b..5d4898f 100644
--- a/virt/kvm/arm/hyp/vgic-v2-sr.c
+++ b/virt/kvm/arm/hyp/vgic-v2-sr.c
@@ -22,10 +22,15 @@
 #include <asm/kvm_emulate.h>
 #include <asm/kvm_hyp.h>
 
+static __hyp_text struct vgic_v2_cpu_if *__hyp_get_cpu_if(struct kvm_vcpu *vcpu)
+{
+	return kern_hyp_va(vcpu->arch.vgic_cpu.hw_v2_cpu_if);
+}
+
 static void __hyp_text save_maint_int_state(struct kvm_vcpu *vcpu,
 					    void __iomem *base)
 {
-	struct vgic_v2_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v2;
+	struct vgic_v2_cpu_if *cpu_if = __hyp_get_cpu_if(vcpu);
 	int nr_lr = (kern_hyp_va(&kvm_vgic_global_state))->nr_lr;
 	u32 eisr0, eisr1;
 	int i;
@@ -67,7 +72,7 @@ static void __hyp_text save_maint_int_state(struct kvm_vcpu *vcpu,
 
 static void __hyp_text save_elrsr(struct kvm_vcpu *vcpu, void __iomem *base)
 {
-	struct vgic_v2_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v2;
+	struct vgic_v2_cpu_if *cpu_if = __hyp_get_cpu_if(vcpu);
 	int nr_lr = (kern_hyp_va(&kvm_vgic_global_state))->nr_lr;
 	u32 elrsr0, elrsr1;
 
@@ -86,7 +91,7 @@ static void __hyp_text save_elrsr(struct kvm_vcpu *vcpu, void __iomem *base)
 
 static void __hyp_text save_lrs(struct kvm_vcpu *vcpu, void __iomem *base)
 {
-	struct vgic_v2_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v2;
+	struct vgic_v2_cpu_if *cpu_if = __hyp_get_cpu_if(vcpu);
 	int nr_lr = (kern_hyp_va(&kvm_vgic_global_state))->nr_lr;
 	int i;
 
@@ -107,7 +112,7 @@ static void __hyp_text save_lrs(struct kvm_vcpu *vcpu, void __iomem *base)
 void __hyp_text __vgic_v2_save_state(struct kvm_vcpu *vcpu)
 {
 	struct kvm *kvm = kern_hyp_va(vcpu->kvm);
-	struct vgic_v2_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v2;
+	struct vgic_v2_cpu_if *cpu_if = __hyp_get_cpu_if(vcpu);
 	struct vgic_dist *vgic = &kvm->arch.vgic;
 	void __iomem *base = kern_hyp_va(vgic->vctrl_base);
 
@@ -138,7 +143,7 @@ void __hyp_text __vgic_v2_save_state(struct kvm_vcpu *vcpu)
 void __hyp_text __vgic_v2_restore_state(struct kvm_vcpu *vcpu)
 {
 	struct kvm *kvm = kern_hyp_va(vcpu->kvm);
-	struct vgic_v2_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v2;
+	struct vgic_v2_cpu_if *cpu_if = __hyp_get_cpu_if(vcpu);
 	struct vgic_dist *vgic = &kvm->arch.vgic;
 	void __iomem *base = kern_hyp_va(vgic->vctrl_base);
 	int nr_lr = (kern_hyp_va(&kvm_vgic_global_state))->nr_lr;
-- 
1.9.1

* [RFC 30/55] KVM: arm/arm64: Inject irqs to the guest hypervisor
  2017-01-09  6:23 [RFC 00/55] Nested Virtualization on KVM/ARM Jintack Lim
                   ` (28 preceding siblings ...)
  2017-01-09  6:24 ` [RFC 29/55] KVM: arm/arm64: Set up the prepared vgic state Jintack Lim
@ 2017-01-09  6:24 ` Jintack Lim
  2017-02-22 13:16   ` Christoffer Dall
  2017-01-09  6:24 ` [RFC 31/55] KVM: arm/arm64: Inject maintenance interrupts " Jintack Lim
                   ` (26 subsequent siblings)
  56 siblings, 1 reply; 111+ messages in thread
From: Jintack Lim @ 2017-01-09  6:24 UTC (permalink / raw)
  To: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, vladimir.murzin, suzuki.poulose,
	mark.rutland, james.morse, lorenzo.pieralisi, kevin.brodsky,
	wcohen, shankerd, geoff, andre.przywara, eric.auger, anna-maria,
	shihwei, linux-arm-kernel, kvmarm, kvm, linux-kernel
  Cc: jintack

If we have a pending IRQ for the guest and the guest expects IRQs
to be handled in its virtual EL2 mode (the virtual IMO bit is set)
and it is not already running in virtual EL2 mode, then we have to
emulate an IRQ exception.

Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
 virt/kvm/arm/vgic/vgic.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
index 6440b56..4a98654 100644
--- a/virt/kvm/arm/vgic/vgic.c
+++ b/virt/kvm/arm/vgic/vgic.c
@@ -17,6 +17,7 @@
 #include <linux/kvm.h>
 #include <linux/kvm_host.h>
 #include <linux/list_sort.h>
+#include <asm/kvm_emulate.h>
 
 #include "vgic.h"
 
@@ -652,6 +653,28 @@ static void vgic_flush_lr_state(struct kvm_vcpu *vcpu)
 	/* Nuke remaining LRs */
 	for ( ; count < kvm_vgic_global_state.nr_lr; count++)
 		vgic_clear_lr(vcpu, count);
+
+	/*
+	 * If we have any pending IRQ for the guest and the guest expects IRQs
+	 * to be handled in its virtual EL2 mode (the virtual IMO bit is set)
+	 * and it is not already running in virtual EL2 mode, then we have to
+	 * emulate a virtual IRQ exception. Note that a pending IRQ here
+	 * means an IRQ whose state is pending but not active.
+	 */
+	if (vcpu_el2_imo_is_set(vcpu) && !vcpu_mode_el2(vcpu)) {
+		bool pending = false;
+
+		list_for_each_entry(irq, &vgic_cpu->ap_list_head, ap_list) {
+			spin_lock(&irq->irq_lock);
+			pending = irq->pending && irq->enabled && !irq->active;
+			spin_unlock(&irq->irq_lock);
+
+			if (pending) {
+				kvm_inject_nested_irq(vcpu);
+				break;
+			}
+		}
+	}
 }
 
 /* Sync back the hardware VGIC state into our emulation after a guest's run. */
-- 
1.9.1

* [RFC 31/55] KVM: arm/arm64: Inject maintenance interrupts to the guest hypervisor
  2017-01-09  6:23 [RFC 00/55] Nested Virtualization on KVM/ARM Jintack Lim
                   ` (29 preceding siblings ...)
  2017-01-09  6:24 ` [RFC 30/55] KVM: arm/arm64: Inject irqs to the guest hypervisor Jintack Lim
@ 2017-01-09  6:24 ` Jintack Lim
  2017-02-22 13:19   ` Christoffer Dall
  2017-01-09  6:24 ` [RFC 32/55] KVM: arm/arm64: register GICH iodev for " Jintack Lim
                   ` (25 subsequent siblings)
  56 siblings, 1 reply; 111+ messages in thread
From: Jintack Lim @ 2017-01-09  6:24 UTC (permalink / raw)
  To: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, vladimir.murzin, suzuki.poulose,
	mark.rutland, james.morse, lorenzo.pieralisi, kevin.brodsky,
	wcohen, shankerd, geoff, andre.przywara, eric.auger, anna-maria,
	shihwei, linux-arm-kernel, kvmarm, kvm, linux-kernel
  Cc: jintack

From: Christoffer Dall <christoffer.dall@linaro.org>

If we exit a nested VM with a pending maintenance interrupt from the
GIC, then we need to forward this to the guest hypervisor so that it can
re-sync the appropriate LRs and sample level triggered interrupts again.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
---
 arch/arm64/kvm/context.c           |  3 +++
 include/kvm/arm_vgic.h             |  2 ++
 virt/kvm/arm/vgic/vgic-v2-nested.c | 16 ++++++++++++++++
 3 files changed, 21 insertions(+)

diff --git a/arch/arm64/kvm/context.c b/arch/arm64/kvm/context.c
index 7a94c9d..a93ffe4 100644
--- a/arch/arm64/kvm/context.c
+++ b/arch/arm64/kvm/context.c
@@ -140,6 +140,9 @@ static void sync_shadow_el1_state(struct kvm_vcpu *vcpu, bool setup)
 void kvm_arm_setup_shadow_state(struct kvm_vcpu *vcpu)
 {
 	struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
+
+	vgic_handle_nested_maint_irq(vcpu);
+
 	if (unlikely(vcpu_mode_el2(vcpu))) {
 		ctxt->hw_pstate = *vcpu_cpsr(vcpu) & ~PSR_MODE_MASK;
 
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 484f6b1..fc882d6 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -315,9 +315,11 @@ int kvm_vgic_inject_mapped_irq(struct kvm *kvm, int cpuid, unsigned int intid,
 #ifdef CONFIG_KVM_ARM_NESTED_HYP
 void vgic_v2_setup_shadow_state(struct kvm_vcpu *vcpu);
 void vgic_v2_restore_shadow_state(struct kvm_vcpu *vcpu);
+void vgic_handle_nested_maint_irq(struct kvm_vcpu *vcpu);
 #else
 static inline void vgic_v2_setup_shadow_state(struct kvm_vcpu *vcpu) { }
 static inline void vgic_v2_restore_shadow_state(struct kvm_vcpu *vcpu) { }
+static inline void vgic_handle_nested_maint_irq(struct kvm_vcpu *vcpu) { }
 #endif
 
 #define irqchip_in_kernel(k)	(!!((k)->arch.vgic.in_kernel))
diff --git a/virt/kvm/arm/vgic/vgic-v2-nested.c b/virt/kvm/arm/vgic/vgic-v2-nested.c
index a992da5..85f646b 100644
--- a/virt/kvm/arm/vgic/vgic-v2-nested.c
+++ b/virt/kvm/arm/vgic/vgic-v2-nested.c
@@ -300,6 +300,22 @@ void vgic_v2_restore_shadow_state(struct kvm_vcpu *vcpu)
 	vgic_cpu->nested_vgic_v2 = vgic_cpu->shadow_vgic_v2;
 }
 
+void vgic_handle_nested_maint_irq(struct kvm_vcpu *vcpu)
+{
+	struct vgic_v2_cpu_if *cpu_if = vcpu_nested_if(vcpu);
+
+	/*
+	 * If we exit a nested VM with a pending maintenance interrupt from the
+	 * GIC, then we need to forward this to the guest hypervisor so that it
+	 * can re-sync the appropriate LRs and sample level triggered interrupts
+	 * again.
+	 */
+	if (vcpu_el2_imo_is_set(vcpu) && !vcpu_mode_el2(vcpu) &&
+	    (cpu_if->vgic_hcr & GICH_HCR_EN) &&
+	    vgic_mmio_read_v2_misr(vcpu, 0, 0))
+		kvm_inject_nested_irq(vcpu);
+}
+
 void vgic_init_nested(struct kvm_vcpu *vcpu)
 {
 	vgic_v2_setup_shadow_state(vcpu);
-- 
1.9.1

* [RFC 32/55] KVM: arm/arm64: register GICH iodev for the guest hypervisor
  2017-01-09  6:23 [RFC 00/55] Nested Virtualization on KVM/ARM Jintack Lim
                   ` (30 preceding siblings ...)
  2017-01-09  6:24 ` [RFC 31/55] KVM: arm/arm64: Inject maintenance interrupts " Jintack Lim
@ 2017-01-09  6:24 ` Jintack Lim
  2017-02-22 13:21   ` Christoffer Dall
  2017-01-09  6:24 ` [RFC 33/55] KVM: arm/arm64: Remove unused params in mmu functions Jintack Lim
                   ` (24 subsequent siblings)
  56 siblings, 1 reply; 111+ messages in thread
From: Jintack Lim @ 2017-01-09  6:24 UTC (permalink / raw)
  To: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, vladimir.murzin, suzuki.poulose,
	mark.rutland, james.morse, lorenzo.pieralisi, kevin.brodsky,
	wcohen, shankerd, geoff, andre.przywara, eric.auger, anna-maria,
	shihwei, linux-arm-kernel, kvmarm, kvm, linux-kernel
  Cc: jintack

Register an I/O device for virtual interface control block (GICH)
accesses from the guest hypervisor.

TODO: Get the GICH address from DT; it is hardcoded for now.

Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
---
 arch/arm64/include/uapi/asm/kvm.h  |  6 ++++++
 include/kvm/arm_vgic.h             |  5 ++++-
 virt/kvm/arm/vgic/vgic-mmio.c      |  6 ++++++
 virt/kvm/arm/vgic/vgic-v2-nested.c | 24 ++++++++++++++++++++++++
 virt/kvm/arm/vgic/vgic-v2.c        |  7 +++++++
 virt/kvm/arm/vgic/vgic.h           |  6 ++++++
 6 files changed, 53 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index 78117bf..3995d3d 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -99,6 +99,12 @@ struct kvm_regs {
 #define KVM_ARM_VCPU_PMU_V3		3 /* Support guest PMUv3 */
 #define KVM_ARM_VCPU_NESTED_VIRT	4 /* Support nested virtual EL2 */
 
+/* FIXME: This should come from DT */
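+/* 0x08030000 is the GICH base in QEMU's mach-virt memory map. */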
+#ifdef CONFIG_KVM_ARM_NESTED_HYP
+#define KVM_VGIC_V2_GICH_BASE          0x08030000
+#define KVM_VGIC_V2_GICH_SIZE          0x2000
+#endif
+
 struct kvm_vcpu_init {
 	__u32 target;
 	__u32 features[7];
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index fc882d6..5bda20c 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -125,7 +125,8 @@ enum iodev_type {
 	IODEV_CPUIF,
 	IODEV_DIST,
 	IODEV_REDIST,
-	IODEV_ITS
+	IODEV_ITS,
+	IODEV_GICH,
 };
 
 struct vgic_io_device {
@@ -198,6 +199,8 @@ struct vgic_dist {
 
 	struct vgic_io_device	dist_iodev;
 
+	struct vgic_io_device	hyp_iodev;
+
 	bool			has_its;
 
 	/*
diff --git a/virt/kvm/arm/vgic/vgic-mmio.c b/virt/kvm/arm/vgic/vgic-mmio.c
index 049c570..2e4097d 100644
--- a/virt/kvm/arm/vgic/vgic-mmio.c
+++ b/virt/kvm/arm/vgic/vgic-mmio.c
@@ -512,6 +512,9 @@ static int dispatch_mmio_read(struct kvm_vcpu *vcpu, struct kvm_io_device *dev,
 	case IODEV_ITS:
 		data = region->its_read(vcpu->kvm, iodev->its, addr, len);
 		break;
+	case IODEV_GICH:
+		data = region->read(vcpu, addr, len);
+		break;
 	}
 
 	vgic_data_host_to_mmio_bus(val, len, data);
@@ -543,6 +546,9 @@ static int dispatch_mmio_write(struct kvm_vcpu *vcpu, struct kvm_io_device *dev,
 	case IODEV_ITS:
 		region->its_write(vcpu->kvm, iodev->its, addr, len, data);
 		break;
+	case IODEV_GICH:
+		region->write(vcpu, addr, len, data);
+		break;
 	}
 
 	return 0;
diff --git a/virt/kvm/arm/vgic/vgic-v2-nested.c b/virt/kvm/arm/vgic/vgic-v2-nested.c
index 85f646b..cb55324 100644
--- a/virt/kvm/arm/vgic/vgic-v2-nested.c
+++ b/virt/kvm/arm/vgic/vgic-v2-nested.c
@@ -206,6 +206,30 @@ static void vgic_mmio_write_v2_gich(struct kvm_vcpu *vcpu,
 		4 * VGIC_V2_MAX_LRS, VGIC_ACCESS_32bit),
 };
 
+int vgic_register_gich_iodev(struct kvm *kvm, struct vgic_dist *dist)
+{
+	struct vgic_io_device *io_device = &kvm->arch.vgic.hyp_iodev;
+	int ret = 0;
+	unsigned int len;
+
+	len = KVM_VGIC_V2_GICH_SIZE;
+
+	io_device->regions = vgic_v2_gich_registers;
+	io_device->nr_regions = ARRAY_SIZE(vgic_v2_gich_registers);
+	kvm_iodevice_init(&io_device->dev, &kvm_io_gic_ops);
+
+	io_device->base_addr = KVM_VGIC_V2_GICH_BASE;
+	io_device->iodev_type = IODEV_GICH;
+	io_device->redist_vcpu = NULL;
+
+	mutex_lock(&kvm->slots_lock);
+	ret = kvm_io_bus_register_dev(kvm, KVM_MMIO_BUS, KVM_VGIC_V2_GICH_BASE,
+			len, &io_device->dev);
+	mutex_unlock(&kvm->slots_lock);
+
+	return ret;
+}
+
 /*
  * For LRs which have HW bit set such as timer interrupts, we modify them to
  * have the host hardware interrupt number instead of the virtual one programmed
diff --git a/virt/kvm/arm/vgic/vgic-v2.c b/virt/kvm/arm/vgic/vgic-v2.c
index 9bab867..b8b73fd 100644
--- a/virt/kvm/arm/vgic/vgic-v2.c
+++ b/virt/kvm/arm/vgic/vgic-v2.c
@@ -280,6 +280,13 @@ int vgic_v2_map_resources(struct kvm *kvm)
 		goto out;
 	}
 
+	/* Register the virtual GICH interface on the kvm io bus */
+	ret = vgic_register_gich_iodev(kvm, dist);
+	if (ret) {
+		kvm_err("Unable to register VGIC GICH regions\n");
+		goto out;
+	}
+
 	if (!static_branch_unlikely(&vgic_v2_cpuif_trap)) {
 		ret = kvm_phys_addr_ioremap(kvm, dist->vgic_cpu_base,
 					    kvm_vgic_global_state.vcpu_base,
diff --git a/virt/kvm/arm/vgic/vgic.h b/virt/kvm/arm/vgic/vgic.h
index 2aef680..11d61a7 100644
--- a/virt/kvm/arm/vgic/vgic.h
+++ b/virt/kvm/arm/vgic/vgic.h
@@ -121,8 +121,14 @@ static inline int vgic_its_inject_msi(struct kvm *kvm, struct kvm_msi *msi)
 int vgic_init(struct kvm *kvm);
 
 #ifdef CONFIG_KVM_ARM_NESTED_HYP
+int vgic_register_gich_iodev(struct kvm *kvm, struct vgic_dist *dist);
 void vgic_init_nested(struct kvm_vcpu *vcpu);
 #else
+static inline int vgic_register_gich_iodev(struct kvm *kvm,
+		struct vgic_dist *dist)
+{
+	return 0;
+}
 static inline void vgic_init_nested(struct kvm_vcpu *vcpu)
 {
 	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
-- 
1.9.1

* [RFC 33/55] KVM: arm/arm64: Remove unused params in mmu functions
  2017-01-09  6:23 [RFC 00/55] Nested Virtualization on KVM/ARM Jintack Lim
                   ` (31 preceding siblings ...)
  2017-01-09  6:24 ` [RFC 32/55] KVM: arm/arm64: register GICH iodev for " Jintack Lim
@ 2017-01-09  6:24 ` Jintack Lim
  2017-01-09  6:24 ` [RFC 34/55] KVM: arm/arm64: Abstract stage-2 MMU state into a separate structure Jintack Lim
                   ` (23 subsequent siblings)
  56 siblings, 0 replies; 111+ messages in thread
From: Jintack Lim @ 2017-01-09  6:24 UTC (permalink / raw)
  To: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, vladimir.murzin, suzuki.poulose,
	mark.rutland, james.morse, lorenzo.pieralisi, kevin.brodsky,
	wcohen, shankerd, geoff, andre.przywara, eric.auger, anna-maria,
	shihwei, linux-arm-kernel, kvmarm, kvm, linux-kernel
  Cc: jintack

From: Christoffer Dall <christoffer.dall@linaro.org>

The stage2_flush_xxx functions take a pointer to the kvm struct as
their first parameter, but it is never used. Clean this up before
modifying the mmu code for nested virtualization support.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
---
 arch/arm/kvm/mmu.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index a5265ed..57cb671 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -300,7 +300,7 @@ static void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size)
 	} while (pgd++, addr = next, addr != end);
 }
 
-static void stage2_flush_ptes(struct kvm *kvm, pmd_t *pmd,
+static void stage2_flush_ptes(pmd_t *pmd,
 			      phys_addr_t addr, phys_addr_t end)
 {
 	pte_t *pte;
@@ -312,7 +312,7 @@ static void stage2_flush_ptes(struct kvm *kvm, pmd_t *pmd,
 	} while (pte++, addr += PAGE_SIZE, addr != end);
 }
 
-static void stage2_flush_pmds(struct kvm *kvm, pud_t *pud,
+static void stage2_flush_pmds(pud_t *pud,
 			      phys_addr_t addr, phys_addr_t end)
 {
 	pmd_t *pmd;
@@ -325,12 +325,12 @@ static void stage2_flush_pmds(struct kvm *kvm, pud_t *pud,
 			if (pmd_thp_or_huge(*pmd))
 				kvm_flush_dcache_pmd(*pmd);
 			else
-				stage2_flush_ptes(kvm, pmd, addr, next);
+				stage2_flush_ptes(pmd, addr, next);
 		}
 	} while (pmd++, addr = next, addr != end);
 }
 
-static void stage2_flush_puds(struct kvm *kvm, pgd_t *pgd,
+static void stage2_flush_puds(pgd_t *pgd,
 			      phys_addr_t addr, phys_addr_t end)
 {
 	pud_t *pud;
@@ -343,7 +343,7 @@ static void stage2_flush_puds(struct kvm *kvm, pgd_t *pgd,
 			if (stage2_pud_huge(*pud))
 				kvm_flush_dcache_pud(*pud);
 			else
-				stage2_flush_pmds(kvm, pud, addr, next);
+				stage2_flush_pmds(pud, addr, next);
 		}
 	} while (pud++, addr = next, addr != end);
 }
@@ -359,7 +359,7 @@ static void stage2_flush_memslot(struct kvm *kvm,
 	pgd = kvm->arch.pgd + stage2_pgd_index(addr);
 	do {
 		next = stage2_pgd_addr_end(addr, end);
-		stage2_flush_puds(kvm, pgd, addr, next);
+		stage2_flush_puds(pgd, addr, next);
 	} while (pgd++, addr = next, addr != end);
 }
 
-- 
1.9.1

* [RFC 34/55] KVM: arm/arm64: Abstract stage-2 MMU state into a separate structure
  2017-01-09  6:23 [RFC 00/55] Nested Virtualization on KVM/ARM Jintack Lim
                   ` (32 preceding siblings ...)
  2017-01-09  6:24 ` [RFC 33/55] KVM: arm/arm64: Remove unused params in mmu functions Jintack Lim
@ 2017-01-09  6:24 ` Jintack Lim
  2017-01-09  6:24 ` [RFC 35/55] KVM: arm/arm64: Support mmu for the virtual EL2 execution Jintack Lim
                   ` (22 subsequent siblings)
  56 siblings, 0 replies; 111+ messages in thread
From: Jintack Lim @ 2017-01-09  6:24 UTC (permalink / raw)
  To: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, vladimir.murzin, suzuki.poulose,
	mark.rutland, james.morse, lorenzo.pieralisi, kevin.brodsky,
	wcohen, shankerd, geoff, andre.przywara, eric.auger, anna-maria,
	shihwei, linux-arm-kernel, kvmarm, kvm, linux-kernel
  Cc: jintack

From: Christoffer Dall <christoffer.dall@linaro.org>

Abstract stage-2 MMU state into a separate structure and change all
callers referring to page tables, VMIDs, and the VTTBR to use this new
indirection.

This is about to become very handy when using shadow stage-2 page
tables.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
---
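With this indirection in place a vcpu can transparently run on either
the VM's canonical stage-2 state or, later in the series, a shadow
stage-2. A rough sketch (lookup_shadow_mmu() is a hypothetical
placeholder, not part of this patch):

	/* Normal guest: use the VM's canonical stage-2 state */
	vcpu->arch.hw_mmu = &vcpu->kvm->arch.mmu;

	/* Nested guest (later patches): switch to a shadow stage-2 */
	vcpu->arch.hw_mmu = lookup_shadow_mmu(vcpu);	/* hypothetical */
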
 arch/arm/include/asm/kvm_asm.h    |   7 +-
 arch/arm/include/asm/kvm_host.h   |  26 ++++---
 arch/arm/kvm/arm.c                |  34 +++++----
 arch/arm/kvm/hyp/switch.c         |   5 +-
 arch/arm/kvm/hyp/tlb.c            |  18 ++---
 arch/arm/kvm/mmu.c                | 146 +++++++++++++++++++++-----------------
 arch/arm64/include/asm/kvm_asm.h  |   7 +-
 arch/arm64/include/asm/kvm_host.h |  10 ++-
 arch/arm64/kvm/hyp/switch.c       |   5 +-
 arch/arm64/kvm/hyp/tlb.c          |  20 +++---
 10 files changed, 159 insertions(+), 119 deletions(-)

diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
index 8ef0538..36e3856 100644
--- a/arch/arm/include/asm/kvm_asm.h
+++ b/arch/arm/include/asm/kvm_asm.h
@@ -57,6 +57,7 @@
 #ifndef __ASSEMBLY__
 struct kvm;
 struct kvm_vcpu;
+struct kvm_s2_mmu;
 
 extern char __kvm_hyp_init[];
 extern char __kvm_hyp_init_end[];
@@ -64,9 +65,9 @@
 extern char __kvm_hyp_vector[];
 
 extern void __kvm_flush_vm_context(void);
-extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa);
-extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
-extern void __kvm_tlb_flush_local_vmid(struct kvm_vcpu *vcpu);
+extern void __kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu, phys_addr_t ipa);
+extern void __kvm_tlb_flush_vmid(struct kvm_s2_mmu *mmu);
+extern void __kvm_tlb_flush_local_vmid(struct kvm_s2_mmu *mmu);
 
 extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
 
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index d5423ab..f84a59c 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -53,9 +53,21 @@
 int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
 void kvm_reset_coprocs(struct kvm_vcpu *vcpu);
 
-struct kvm_arch {
-	/* VTTBR value associated with below pgd and vmid */
+struct kvm_s2_mmu {
+	/* The VMID generation used for the virt. memory system */
+	u64    vmid_gen;
+	u32    vmid;
+
+	/* Stage-2 page table */
+	pgd_t *pgd;
+
+	/* VTTBR value associated with above pgd and vmid */
 	u64    vttbr;
+};
+
+struct kvm_arch {
+	/* Stage 2 paging state for the VM */
+	struct kvm_s2_mmu mmu;
 
 	/* The last vcpu id that ran on each physical CPU */
 	int __percpu *last_vcpu_ran;
@@ -68,13 +80,6 @@ struct kvm_arch {
 	 * here.
 	 */
 
-	/* The VMID generation used for the virt. memory system */
-	u64    vmid_gen;
-	u32    vmid;
-
-	/* Stage-2 page table */
-	pgd_t *pgd;
-
 	/* Interrupt controller */
 	struct vgic_dist	vgic;
 	int max_vcpus;
@@ -188,6 +193,9 @@ struct kvm_vcpu_arch {
 
 	/* Detect first run of a vcpu */
 	bool has_run_once;
+
+	/* Stage 2 paging state used by the hardware on next switch */
+	struct kvm_s2_mmu *hw_mmu;
 };
 
 struct kvm_vm_stat {
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 436bf5a..eb3e709 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -139,7 +139,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 	kvm_timer_init(kvm);
 
 	/* Mark the initial VMID generation invalid */
-	kvm->arch.vmid_gen = 0;
+	kvm->arch.mmu.vmid_gen = 0;
 
 	/* The maximum number of VCPUs is limited by the host's GIC model */
 	kvm->arch.max_vcpus = vgic_present ?
@@ -321,6 +321,8 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
 
 	kvm_arm_reset_debug_ptr(vcpu);
 
+	vcpu->arch.hw_mmu = &vcpu->kvm->arch.mmu;
+
 	return 0;
 }
 
@@ -335,7 +337,7 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	 * over-invalidation doesn't affect correctness.
 	 */
 	if (*last_ran != vcpu->vcpu_id) {
-		kvm_call_hyp(__kvm_tlb_flush_local_vmid, vcpu);
+		kvm_call_hyp(__kvm_tlb_flush_local_vmid, &vcpu->kvm->arch.mmu);
 		*last_ran = vcpu->vcpu_id;
 	}
 
@@ -423,25 +425,26 @@ void force_vm_exit(const cpumask_t *mask)
  * VMID for the new generation, we must flush necessary caches and TLBs on all
  * CPUs.
  */
-static bool need_new_vmid_gen(struct kvm *kvm)
+static bool need_new_vmid_gen(struct kvm_s2_mmu *mmu)
 {
-	return unlikely(kvm->arch.vmid_gen != atomic64_read(&kvm_vmid_gen));
+	return unlikely(mmu->vmid_gen != atomic64_read(&kvm_vmid_gen));
 }
 
 /**
  * update_vttbr - Update the VTTBR with a valid VMID before the guest runs
- * @kvm	The guest that we are about to run
+ * @kvm:	The guest that we are about to run
+ * @mmu:	The stage-2 translation context to update
  *
  * Called from kvm_arch_vcpu_ioctl_run before entering the guest to ensure the
  * VM has a valid VMID, otherwise assigns a new one and flushes corresponding
  * caches and TLBs.
  */
-static void update_vttbr(struct kvm *kvm)
+static void update_vttbr(struct kvm *kvm, struct kvm_s2_mmu *mmu)
 {
 	phys_addr_t pgd_phys;
 	u64 vmid;
 
-	if (!need_new_vmid_gen(kvm))
+	if (!need_new_vmid_gen(mmu))
 		return;
 
 	spin_lock(&kvm_vmid_lock);
@@ -451,7 +454,7 @@ static void update_vttbr(struct kvm *kvm)
 	 * already allocated a valid vmid for this vm, then this vcpu should
 	 * use the same vmid.
 	 */
-	if (!need_new_vmid_gen(kvm)) {
+	if (!need_new_vmid_gen(mmu)) {
 		spin_unlock(&kvm_vmid_lock);
 		return;
 	}
@@ -475,16 +478,17 @@ static void update_vttbr(struct kvm *kvm)
 		kvm_call_hyp(__kvm_flush_vm_context);
 	}
 
-	kvm->arch.vmid_gen = atomic64_read(&kvm_vmid_gen);
-	kvm->arch.vmid = kvm_next_vmid;
+	mmu->vmid_gen = atomic64_read(&kvm_vmid_gen);
+	mmu->vmid = kvm_next_vmid;
 	kvm_next_vmid++;
 	kvm_next_vmid &= (1 << kvm_vmid_bits) - 1;
 
 	/* update vttbr to be used with the new vmid */
-	pgd_phys = virt_to_phys(kvm->arch.pgd);
+	pgd_phys = virt_to_phys(mmu->pgd);
 	BUG_ON(pgd_phys & ~VTTBR_BADDR_MASK);
-	vmid = ((u64)(kvm->arch.vmid) << VTTBR_VMID_SHIFT) & VTTBR_VMID_MASK(kvm_vmid_bits);
-	kvm->arch.vttbr = pgd_phys | vmid;
+	vmid = ((u64)(mmu->vmid) << VTTBR_VMID_SHIFT) &
+	       VTTBR_VMID_MASK(kvm_vmid_bits);
+	mmu->vttbr = pgd_phys | vmid;
 
 	spin_unlock(&kvm_vmid_lock);
 }
@@ -611,7 +615,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		 */
 		cond_resched();
 
-		update_vttbr(vcpu->kvm);
+		update_vttbr(vcpu->kvm, vcpu->arch.hw_mmu);
 
 		if (vcpu->arch.power_off || vcpu->arch.pause)
 			vcpu_sleep(vcpu);
@@ -636,7 +640,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 			run->exit_reason = KVM_EXIT_INTR;
 		}
 
-		if (ret <= 0 || need_new_vmid_gen(vcpu->kvm) ||
+		if (ret <= 0 || need_new_vmid_gen(vcpu->arch.hw_mmu) ||
 			vcpu->arch.power_off || vcpu->arch.pause) {
 			local_irq_enable();
 			kvm_pmu_sync_hwstate(vcpu);
diff --git a/arch/arm/kvm/hyp/switch.c b/arch/arm/kvm/hyp/switch.c
index 92678b7..6f99de1 100644
--- a/arch/arm/kvm/hyp/switch.c
+++ b/arch/arm/kvm/hyp/switch.c
@@ -73,8 +73,9 @@ static void __hyp_text __deactivate_traps(struct kvm_vcpu *vcpu)
 
 static void __hyp_text __activate_vm(struct kvm_vcpu *vcpu)
 {
-	struct kvm *kvm = kern_hyp_va(vcpu->kvm);
-	write_sysreg(kvm->arch.vttbr, VTTBR);
+	struct kvm_s2_mmu *mmu = kern_hyp_va(vcpu->arch.hw_mmu);
+
+	write_sysreg(mmu->vttbr, VTTBR);
 	write_sysreg(vcpu->arch.midr, VPIDR);
 }
 
diff --git a/arch/arm/kvm/hyp/tlb.c b/arch/arm/kvm/hyp/tlb.c
index 6d810af..56f0a49 100644
--- a/arch/arm/kvm/hyp/tlb.c
+++ b/arch/arm/kvm/hyp/tlb.c
@@ -34,13 +34,13 @@
  * As v7 does not support flushing per IPA, just nuke the whole TLB
  * instead, ignoring the ipa value.
  */
-void __hyp_text __kvm_tlb_flush_vmid(struct kvm *kvm)
+void __hyp_text __kvm_tlb_flush_vmid(struct kvm_s2_mmu *mmu)
 {
 	dsb(ishst);
 
 	/* Switch to requested VMID */
-	kvm = kern_hyp_va(kvm);
-	write_sysreg(kvm->arch.vttbr, VTTBR);
+	mmu = kern_hyp_va(mmu);
+	write_sysreg(mmu->vttbr, VTTBR);
 	isb();
 
 	write_sysreg(0, TLBIALLIS);
@@ -50,17 +50,17 @@ void __hyp_text __kvm_tlb_flush_vmid(struct kvm *kvm)
 	write_sysreg(0, VTTBR);
 }
 
-void __hyp_text __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa)
+void __hyp_text __kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu,
+					 phys_addr_t ipa)
 {
-	__kvm_tlb_flush_vmid(kvm);
+	__kvm_tlb_flush_vmid(mmu);
 }
 
-void __hyp_text __kvm_tlb_flush_local_vmid(struct kvm_vcpu *vcpu)
+void __hyp_text __kvm_tlb_flush_local_vmid(struct kvm_s2_mmu *mmu)
 {
-	struct kvm *kvm = kern_hyp_va(kern_hyp_va(vcpu)->kvm);
-
 	/* Switch to requested VMID */
-	write_sysreg(kvm->arch.vttbr, VTTBR);
+	mmu = kern_hyp_va(mmu);
+	write_sysreg(mmu->vttbr, VTTBR);
 	isb();
 
 	write_sysreg(0, TLBIALL);
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 57cb671..a27a204 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -63,9 +63,9 @@ void kvm_flush_remote_tlbs(struct kvm *kvm)
 	kvm_call_hyp(__kvm_tlb_flush_vmid, kvm);
 }
 
-static void kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa)
+static void kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu, phys_addr_t ipa)
 {
-	kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, kvm, ipa);
+	kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, ipa);
 }
 
 /*
@@ -102,13 +102,14 @@ static bool kvm_is_device_pfn(unsigned long pfn)
  * Function clears a PMD entry, flushes addr 1st and 2nd stage TLBs. Marks all
  * pages in the range dirty.
  */
-static void stage2_dissolve_pmd(struct kvm *kvm, phys_addr_t addr, pmd_t *pmd)
+static void stage2_dissolve_pmd(struct kvm_s2_mmu *mmu, phys_addr_t addr,
+				pmd_t *pmd)
 {
 	if (!pmd_thp_or_huge(*pmd))
 		return;
 
 	pmd_clear(pmd);
-	kvm_tlb_flush_vmid_ipa(kvm, addr);
+	kvm_tlb_flush_vmid_ipa(mmu, addr);
 	put_page(virt_to_page(pmd));
 }
 
@@ -144,31 +145,34 @@ static void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc)
 	return p;
 }
 
-static void clear_stage2_pgd_entry(struct kvm *kvm, pgd_t *pgd, phys_addr_t addr)
+static void clear_stage2_pgd_entry(struct kvm_s2_mmu *mmu,
+				   pgd_t *pgd, phys_addr_t addr)
 {
 	pud_t *pud_table __maybe_unused = stage2_pud_offset(pgd, 0UL);
 	stage2_pgd_clear(pgd);
-	kvm_tlb_flush_vmid_ipa(kvm, addr);
+	kvm_tlb_flush_vmid_ipa(mmu, addr);
 	stage2_pud_free(pud_table);
 	put_page(virt_to_page(pgd));
 }
 
-static void clear_stage2_pud_entry(struct kvm *kvm, pud_t *pud, phys_addr_t addr)
+static void clear_stage2_pud_entry(struct kvm_s2_mmu *mmu,
+				   pud_t *pud, phys_addr_t addr)
 {
 	pmd_t *pmd_table __maybe_unused = stage2_pmd_offset(pud, 0);
 	VM_BUG_ON(stage2_pud_huge(*pud));
 	stage2_pud_clear(pud);
-	kvm_tlb_flush_vmid_ipa(kvm, addr);
+	kvm_tlb_flush_vmid_ipa(mmu, addr);
 	stage2_pmd_free(pmd_table);
 	put_page(virt_to_page(pud));
 }
 
-static void clear_stage2_pmd_entry(struct kvm *kvm, pmd_t *pmd, phys_addr_t addr)
+static void clear_stage2_pmd_entry(struct kvm_s2_mmu *mmu,
+				   pmd_t *pmd, phys_addr_t addr)
 {
 	pte_t *pte_table = pte_offset_kernel(pmd, 0);
 	VM_BUG_ON(pmd_thp_or_huge(*pmd));
 	pmd_clear(pmd);
-	kvm_tlb_flush_vmid_ipa(kvm, addr);
+	kvm_tlb_flush_vmid_ipa(mmu, addr);
 	pte_free_kernel(NULL, pte_table);
 	put_page(virt_to_page(pmd));
 }
@@ -193,7 +197,7 @@ static void clear_stage2_pmd_entry(struct kvm *kvm, pmd_t *pmd, phys_addr_t addr
  * the corresponding TLBs, we call kvm_flush_dcache_p*() to make sure
  * the IO subsystem will never hit in the cache.
  */
-static void unmap_stage2_ptes(struct kvm *kvm, pmd_t *pmd,
+static void unmap_stage2_ptes(struct kvm_s2_mmu *mmu, pmd_t *pmd,
 		       phys_addr_t addr, phys_addr_t end)
 {
 	phys_addr_t start_addr = addr;
@@ -205,7 +209,7 @@ static void unmap_stage2_ptes(struct kvm *kvm, pmd_t *pmd,
 			pte_t old_pte = *pte;
 
 			kvm_set_pte(pte, __pte(0));
-			kvm_tlb_flush_vmid_ipa(kvm, addr);
+			kvm_tlb_flush_vmid_ipa(mmu, addr);
 
 			/* No need to invalidate the cache for device mappings */
 			if (!kvm_is_device_pfn(pte_pfn(old_pte)))
@@ -216,10 +220,10 @@ static void unmap_stage2_ptes(struct kvm *kvm, pmd_t *pmd,
 	} while (pte++, addr += PAGE_SIZE, addr != end);
 
 	if (stage2_pte_table_empty(start_pte))
-		clear_stage2_pmd_entry(kvm, pmd, start_addr);
+		clear_stage2_pmd_entry(mmu, pmd, start_addr);
 }
 
-static void unmap_stage2_pmds(struct kvm *kvm, pud_t *pud,
+static void unmap_stage2_pmds(struct kvm_s2_mmu *mmu, pud_t *pud,
 		       phys_addr_t addr, phys_addr_t end)
 {
 	phys_addr_t next, start_addr = addr;
@@ -233,22 +237,22 @@ static void unmap_stage2_pmds(struct kvm *kvm, pud_t *pud,
 				pmd_t old_pmd = *pmd;
 
 				pmd_clear(pmd);
-				kvm_tlb_flush_vmid_ipa(kvm, addr);
+				kvm_tlb_flush_vmid_ipa(mmu, addr);
 
 				kvm_flush_dcache_pmd(old_pmd);
 
 				put_page(virt_to_page(pmd));
 			} else {
-				unmap_stage2_ptes(kvm, pmd, addr, next);
+				unmap_stage2_ptes(mmu, pmd, addr, next);
 			}
 		}
 	} while (pmd++, addr = next, addr != end);
 
 	if (stage2_pmd_table_empty(start_pmd))
-		clear_stage2_pud_entry(kvm, pud, start_addr);
+		clear_stage2_pud_entry(mmu, pud, start_addr);
 }
 
-static void unmap_stage2_puds(struct kvm *kvm, pgd_t *pgd,
+static void unmap_stage2_puds(struct kvm_s2_mmu *mmu, pgd_t *pgd,
 		       phys_addr_t addr, phys_addr_t end)
 {
 	phys_addr_t next, start_addr = addr;
@@ -262,17 +266,17 @@ static void unmap_stage2_puds(struct kvm *kvm, pgd_t *pgd,
 				pud_t old_pud = *pud;
 
 				stage2_pud_clear(pud);
-				kvm_tlb_flush_vmid_ipa(kvm, addr);
+				kvm_tlb_flush_vmid_ipa(mmu, addr);
 				kvm_flush_dcache_pud(old_pud);
 				put_page(virt_to_page(pud));
 			} else {
-				unmap_stage2_pmds(kvm, pud, addr, next);
+				unmap_stage2_pmds(mmu, pud, addr, next);
 			}
 		}
 	} while (pud++, addr = next, addr != end);
 
 	if (stage2_pud_table_empty(start_pud))
-		clear_stage2_pgd_entry(kvm, pgd, start_addr);
+		clear_stage2_pgd_entry(mmu, pgd, start_addr);
 }
 
 /**
@@ -286,17 +290,18 @@ static void unmap_stage2_puds(struct kvm *kvm, pgd_t *pgd,
  * destroying the VM), otherwise another faulting VCPU may come in and mess
  * with things behind our backs.
  */
-static void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size)
+static void unmap_stage2_range(struct kvm_s2_mmu *mmu,
+			       phys_addr_t start, u64 size)
 {
 	pgd_t *pgd;
 	phys_addr_t addr = start, end = start + size;
 	phys_addr_t next;
 
-	pgd = kvm->arch.pgd + stage2_pgd_index(addr);
+	pgd = mmu->pgd + stage2_pgd_index(addr);
 	do {
 		next = stage2_pgd_addr_end(addr, end);
 		if (!stage2_pgd_none(*pgd))
-			unmap_stage2_puds(kvm, pgd, addr, next);
+			unmap_stage2_puds(mmu, pgd, addr, next);
 	} while (pgd++, addr = next, addr != end);
 }
 
@@ -348,7 +353,7 @@ static void stage2_flush_puds(pgd_t *pgd,
 	} while (pud++, addr = next, addr != end);
 }
 
-static void stage2_flush_memslot(struct kvm *kvm,
+static void stage2_flush_memslot(struct kvm_s2_mmu *mmu,
 				 struct kvm_memory_slot *memslot)
 {
 	phys_addr_t addr = memslot->base_gfn << PAGE_SHIFT;
@@ -356,7 +361,7 @@ static void stage2_flush_memslot(struct kvm *kvm,
 	phys_addr_t next;
 	pgd_t *pgd;
 
-	pgd = kvm->arch.pgd + stage2_pgd_index(addr);
+	pgd = mmu->pgd + stage2_pgd_index(addr);
 	do {
 		next = stage2_pgd_addr_end(addr, end);
 		stage2_flush_puds(pgd, addr, next);
@@ -381,7 +386,7 @@ static void stage2_flush_vm(struct kvm *kvm)
 
 	slots = kvm_memslots(kvm);
 	kvm_for_each_memslot(memslot, slots)
-		stage2_flush_memslot(kvm, memslot);
+		stage2_flush_memslot(&kvm->arch.mmu, memslot);
 
 	spin_unlock(&kvm->mmu_lock);
 	srcu_read_unlock(&kvm->srcu, idx);
@@ -733,8 +738,9 @@ int create_hyp_io_mappings(void *from, void *to, phys_addr_t phys_addr)
 int kvm_alloc_stage2_pgd(struct kvm *kvm)
 {
 	pgd_t *pgd;
+	struct kvm_s2_mmu *mmu = &kvm->arch.mmu;
 
-	if (kvm->arch.pgd != NULL) {
+	if (mmu->pgd != NULL) {
 		kvm_err("kvm_arch already initialized?\n");
 		return -EINVAL;
 	}
@@ -744,11 +750,12 @@ int kvm_alloc_stage2_pgd(struct kvm *kvm)
 	if (!pgd)
 		return -ENOMEM;
 
-	kvm->arch.pgd = pgd;
+	mmu->pgd = pgd;
+
 	return 0;
 }
 
-static void stage2_unmap_memslot(struct kvm *kvm,
+static void stage2_unmap_memslot(struct kvm_s2_mmu *mmu,
 				 struct kvm_memory_slot *memslot)
 {
 	hva_t hva = memslot->userspace_addr;
@@ -783,7 +790,7 @@ static void stage2_unmap_memslot(struct kvm *kvm,
 
 		if (!(vma->vm_flags & VM_PFNMAP)) {
 			gpa_t gpa = addr + (vm_start - memslot->userspace_addr);
-			unmap_stage2_range(kvm, gpa, vm_end - vm_start);
+			unmap_stage2_range(mmu, gpa, vm_end - vm_start);
 		}
 		hva = vm_end;
 	} while (hva < reg_end);
@@ -807,7 +814,7 @@ void stage2_unmap_vm(struct kvm *kvm)
 
 	slots = kvm_memslots(kvm);
 	kvm_for_each_memslot(memslot, slots)
-		stage2_unmap_memslot(kvm, memslot);
+		stage2_unmap_memslot(&kvm->arch.mmu, memslot);
 
 	spin_unlock(&kvm->mmu_lock);
 	srcu_read_unlock(&kvm->srcu, idx);
@@ -826,22 +833,25 @@ void stage2_unmap_vm(struct kvm *kvm)
  */
 void kvm_free_stage2_pgd(struct kvm *kvm)
 {
-	if (kvm->arch.pgd == NULL)
+	struct kvm_s2_mmu *mmu = &kvm->arch.mmu;
+
+	if (mmu->pgd == NULL)
 		return;
 
-	unmap_stage2_range(kvm, 0, KVM_PHYS_SIZE);
+	unmap_stage2_range(mmu, 0, KVM_PHYS_SIZE);
 	/* Free the HW pgd, one page at a time */
-	free_pages_exact(kvm->arch.pgd, S2_PGD_SIZE);
-	kvm->arch.pgd = NULL;
+	free_pages_exact(mmu->pgd, S2_PGD_SIZE);
+	mmu->pgd = NULL;
 }
 
-static pud_t *stage2_get_pud(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
+static pud_t *stage2_get_pud(struct kvm_s2_mmu *mmu,
+			     struct kvm_mmu_memory_cache *cache,
 			     phys_addr_t addr)
 {
 	pgd_t *pgd;
 	pud_t *pud;
 
-	pgd = kvm->arch.pgd + stage2_pgd_index(addr);
+	pgd = mmu->pgd + stage2_pgd_index(addr);
 	if (WARN_ON(stage2_pgd_none(*pgd))) {
 		if (!cache)
 			return NULL;
@@ -853,13 +863,14 @@ static pud_t *stage2_get_pud(struct kvm *kvm, struct kvm_mmu_memory_cache *cache
 	return stage2_pud_offset(pgd, addr);
 }
 
-static pmd_t *stage2_get_pmd(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
+static pmd_t *stage2_get_pmd(struct kvm_s2_mmu *mmu,
+			     struct kvm_mmu_memory_cache *cache,
 			     phys_addr_t addr)
 {
 	pud_t *pud;
 	pmd_t *pmd;
 
-	pud = stage2_get_pud(kvm, cache, addr);
+	pud = stage2_get_pud(mmu, cache, addr);
 	if (stage2_pud_none(*pud)) {
 		if (!cache)
 			return NULL;
@@ -871,12 +882,13 @@ static pmd_t *stage2_get_pmd(struct kvm *kvm, struct kvm_mmu_memory_cache *cache
 	return stage2_pmd_offset(pud, addr);
 }
 
-static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache
+static int stage2_set_pmd_huge(struct kvm_s2_mmu *mmu,
+			       struct kvm_mmu_memory_cache
 			       *cache, phys_addr_t addr, const pmd_t *new_pmd)
 {
 	pmd_t *pmd, old_pmd;
 
-	pmd = stage2_get_pmd(kvm, cache, addr);
+	pmd = stage2_get_pmd(mmu, cache, addr);
 	VM_BUG_ON(!pmd);
 
 	/*
@@ -893,7 +905,7 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache
 	old_pmd = *pmd;
 	if (pmd_present(old_pmd)) {
 		pmd_clear(pmd);
-		kvm_tlb_flush_vmid_ipa(kvm, addr);
+		kvm_tlb_flush_vmid_ipa(mmu, addr);
 	} else {
 		get_page(virt_to_page(pmd));
 	}
@@ -902,7 +914,8 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache
 	return 0;
 }
 
-static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
+static int stage2_set_pte(struct kvm_s2_mmu *mmu,
+			  struct kvm_mmu_memory_cache *cache,
 			  phys_addr_t addr, const pte_t *new_pte,
 			  unsigned long flags)
 {
@@ -914,7 +927,7 @@ static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
 	VM_BUG_ON(logging_active && !cache);
 
 	/* Create stage-2 page table mapping - Levels 0 and 1 */
-	pmd = stage2_get_pmd(kvm, cache, addr);
+	pmd = stage2_get_pmd(mmu, cache, addr);
 	if (!pmd) {
 		/*
 		 * Ignore calls from kvm_set_spte_hva for unallocated
@@ -928,7 +941,7 @@ static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
 	 * allocate page.
 	 */
 	if (logging_active)
-		stage2_dissolve_pmd(kvm, addr, pmd);
+		stage2_dissolve_pmd(mmu, addr, pmd);
 
 	/* Create stage-2 page mappings - Level 2 */
 	if (pmd_none(*pmd)) {
@@ -948,7 +961,7 @@ static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
 	old_pte = *pte;
 	if (pte_present(old_pte)) {
 		kvm_set_pte(pte, __pte(0));
-		kvm_tlb_flush_vmid_ipa(kvm, addr);
+		kvm_tlb_flush_vmid_ipa(mmu, addr);
 	} else {
 		get_page(virt_to_page(pte));
 	}
@@ -1008,7 +1021,7 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
 		if (ret)
 			goto out;
 		spin_lock(&kvm->mmu_lock);
-		ret = stage2_set_pte(kvm, &cache, addr, &pte,
+		ret = stage2_set_pte(&kvm->arch.mmu, &cache, addr, &pte,
 						KVM_S2PTE_FLAG_IS_IOMAP);
 		spin_unlock(&kvm->mmu_lock);
 		if (ret)
@@ -1146,12 +1159,13 @@ static void  stage2_wp_puds(pgd_t *pgd, phys_addr_t addr, phys_addr_t end)
  * @addr:	Start address of range
  * @end:	End address of range
  */
-static void stage2_wp_range(struct kvm *kvm, phys_addr_t addr, phys_addr_t end)
+static void stage2_wp_range(struct kvm *kvm, struct kvm_s2_mmu *mmu,
+			    phys_addr_t addr, phys_addr_t end)
 {
 	pgd_t *pgd;
 	phys_addr_t next;
 
-	pgd = kvm->arch.pgd + stage2_pgd_index(addr);
+	pgd = mmu->pgd + stage2_pgd_index(addr);
 	do {
 		/*
 		 * Release kvm_mmu_lock periodically if the memory region is
@@ -1190,7 +1204,7 @@ void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot)
 	phys_addr_t end = (memslot->base_gfn + memslot->npages) << PAGE_SHIFT;
 
 	spin_lock(&kvm->mmu_lock);
-	stage2_wp_range(kvm, start, end);
+	stage2_wp_range(kvm, &kvm->arch.mmu, start, end);
 	spin_unlock(&kvm->mmu_lock);
 	kvm_flush_remote_tlbs(kvm);
 }
@@ -1214,7 +1228,7 @@ static void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
 	phys_addr_t start = (base_gfn +  __ffs(mask)) << PAGE_SHIFT;
 	phys_addr_t end = (base_gfn + __fls(mask) + 1) << PAGE_SHIFT;
 
-	stage2_wp_range(kvm, start, end);
+	stage2_wp_range(kvm, &kvm->arch.mmu, start, end);
 }
 
 /*
@@ -1253,6 +1267,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	bool fault_ipa_uncached;
 	bool logging_active = memslot_is_logging(memslot);
 	unsigned long flags = 0;
+	struct kvm_s2_mmu *mmu = vcpu->arch.hw_mmu;
 
 	write_fault = kvm_is_write_fault(vcpu);
 	if (fault_status == FSC_PERM && !write_fault) {
@@ -1347,7 +1362,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 			kvm_set_pfn_dirty(pfn);
 		}
 		coherent_cache_guest_page(vcpu, pfn, PMD_SIZE, fault_ipa_uncached);
-		ret = stage2_set_pmd_huge(kvm, memcache, fault_ipa, &new_pmd);
+		ret = stage2_set_pmd_huge(mmu, memcache, fault_ipa, &new_pmd);
 	} else {
 		pte_t new_pte = pfn_pte(pfn, mem_type);
 
@@ -1357,7 +1372,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 			mark_page_dirty(kvm, gfn);
 		}
 		coherent_cache_guest_page(vcpu, pfn, PAGE_SIZE, fault_ipa_uncached);
-		ret = stage2_set_pte(kvm, memcache, fault_ipa, &new_pte, flags);
+		ret = stage2_set_pte(mmu, memcache, fault_ipa, &new_pte, flags);
 	}
 
 out_unlock:
@@ -1385,7 +1400,7 @@ static void handle_access_fault(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
 
 	spin_lock(&vcpu->kvm->mmu_lock);
 
-	pmd = stage2_get_pmd(vcpu->kvm, NULL, fault_ipa);
+	pmd = stage2_get_pmd(vcpu->arch.hw_mmu, NULL, fault_ipa);
 	if (!pmd || pmd_none(*pmd))	/* Nothing there */
 		goto out;
 
@@ -1553,7 +1568,7 @@ static int handle_hva_to_gpa(struct kvm *kvm,
 
 static int kvm_unmap_hva_handler(struct kvm *kvm, gpa_t gpa, void *data)
 {
-	unmap_stage2_range(kvm, gpa, PAGE_SIZE);
+	unmap_stage2_range(&kvm->arch.mmu, gpa, PAGE_SIZE);
 	return 0;
 }
 
@@ -1561,7 +1576,7 @@ int kvm_unmap_hva(struct kvm *kvm, unsigned long hva)
 {
 	unsigned long end = hva + PAGE_SIZE;
 
-	if (!kvm->arch.pgd)
+	if (!kvm->arch.mmu.pgd)
 		return 0;
 
 	trace_kvm_unmap_hva(hva);
@@ -1572,7 +1587,7 @@ int kvm_unmap_hva(struct kvm *kvm, unsigned long hva)
 int kvm_unmap_hva_range(struct kvm *kvm,
 			unsigned long start, unsigned long end)
 {
-	if (!kvm->arch.pgd)
+	if (!kvm->arch.mmu.pgd)
 		return 0;
 
 	trace_kvm_unmap_hva_range(start, end);
@@ -1591,7 +1606,7 @@ static int kvm_set_spte_handler(struct kvm *kvm, gpa_t gpa, void *data)
 	 * therefore stage2_set_pte() never needs to clear out a huge PMD
 	 * through this calling path.
 	 */
-	stage2_set_pte(kvm, NULL, gpa, pte, 0);
+	stage2_set_pte(&kvm->arch.mmu, NULL, gpa, pte, 0);
 	return 0;
 }
 
@@ -1601,7 +1616,7 @@ void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte)
 	unsigned long end = hva + PAGE_SIZE;
 	pte_t stage2_pte;
 
-	if (!kvm->arch.pgd)
+	if (!kvm->arch.mmu.pgd)
 		return;
 
 	trace_kvm_set_spte_hva(hva);
@@ -1614,7 +1629,7 @@ static int kvm_age_hva_handler(struct kvm *kvm, gpa_t gpa, void *data)
 	pmd_t *pmd;
 	pte_t *pte;
 
-	pmd = stage2_get_pmd(kvm, NULL, gpa);
+	pmd = stage2_get_pmd(&kvm->arch.mmu, NULL, gpa);
 	if (!pmd || pmd_none(*pmd))	/* Nothing there */
 		return 0;
 
@@ -1633,7 +1648,7 @@ static int kvm_test_age_hva_handler(struct kvm *kvm, gpa_t gpa, void *data)
 	pmd_t *pmd;
 	pte_t *pte;
 
-	pmd = stage2_get_pmd(kvm, NULL, gpa);
+	pmd = stage2_get_pmd(&kvm->arch.mmu, NULL, gpa);
 	if (!pmd || pmd_none(*pmd))	/* Nothing there */
 		return 0;
 
@@ -1864,9 +1879,10 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
 
 	spin_lock(&kvm->mmu_lock);
 	if (ret)
-		unmap_stage2_range(kvm, mem->guest_phys_addr, mem->memory_size);
+		unmap_stage2_range(&kvm->arch.mmu, mem->guest_phys_addr,
+				   mem->memory_size);
 	else
-		stage2_flush_memslot(kvm, memslot);
+		stage2_flush_memslot(&kvm->arch.mmu, memslot);
 	spin_unlock(&kvm->mmu_lock);
 	return ret;
 }
@@ -1907,7 +1923,7 @@ void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
 	phys_addr_t size = slot->npages << PAGE_SHIFT;
 
 	spin_lock(&kvm->mmu_lock);
-	unmap_stage2_range(kvm, gpa, size);
+	unmap_stage2_range(&kvm->arch.mmu, gpa, size);
 	spin_unlock(&kvm->mmu_lock);
 }
 
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index ec3553eb..ed8139f 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -44,6 +44,7 @@
 #ifndef __ASSEMBLY__
 struct kvm;
 struct kvm_vcpu;
+struct kvm_s2_mmu;
 
 extern char __kvm_hyp_init[];
 extern char __kvm_hyp_init_end[];
@@ -52,9 +53,9 @@
 extern char __kvm_hyp_vector[];
 
 extern void __kvm_flush_vm_context(void);
-extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa);
-extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
-extern void __kvm_tlb_flush_local_vmid(struct kvm_vcpu *vcpu);
+extern void __kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu, phys_addr_t ipa);
+extern void __kvm_tlb_flush_vmid(struct kvm_s2_mmu *mmu);
+extern void __kvm_tlb_flush_local_vmid(struct kvm_s2_mmu *mmu);
 
 extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
 
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index ed78d73..954d6de 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -50,7 +50,7 @@
 int kvm_arch_dev_ioctl_check_extension(struct kvm *kvm, long ext);
 void __extended_idmap_trampoline(phys_addr_t boot_pgd, phys_addr_t idmap_start);
 
-struct kvm_arch {
+struct kvm_s2_mmu {
 	/* The VMID generation used for the virt. memory system */
 	u64    vmid_gen;
 	u32    vmid;
@@ -61,6 +61,11 @@ struct kvm_arch {
 
 	/* VTTBR value associated with above pgd and vmid */
 	u64    vttbr;
+};
+
+struct kvm_arch {
+	/* Stage 2 paging state for the VM */
+	struct kvm_s2_mmu mmu;
 
 	/* The last vcpu id that ran on each physical CPU */
 	int __percpu *last_vcpu_ran;
@@ -326,6 +331,9 @@ struct kvm_vcpu_arch {
 
 	/* Detect first run of a vcpu */
 	bool has_run_once;
+
+	/* Stage 2 paging state used by the hardware on next switch */
+	struct kvm_s2_mmu *hw_mmu;
 };
 
 #define vcpu_gp_regs(v)		(&(v)->arch.ctxt.gp_regs)
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index b7c8c30..3207009a 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -135,8 +135,9 @@ static void __hyp_text __deactivate_traps(struct kvm_vcpu *vcpu)
 
 static void __hyp_text __activate_vm(struct kvm_vcpu *vcpu)
 {
-	struct kvm *kvm = kern_hyp_va(vcpu->kvm);
-	write_sysreg(kvm->arch.vttbr, vttbr_el2);
+	struct kvm_s2_mmu *mmu = kern_hyp_va(vcpu->arch.hw_mmu);
+
+	write_sysreg(mmu->vttbr, vttbr_el2);
 }
 
 static void __hyp_text __deactivate_vm(struct kvm_vcpu *vcpu)
diff --git a/arch/arm64/kvm/hyp/tlb.c b/arch/arm64/kvm/hyp/tlb.c
index 88e2f2b..71a62ea 100644
--- a/arch/arm64/kvm/hyp/tlb.c
+++ b/arch/arm64/kvm/hyp/tlb.c
@@ -17,13 +17,14 @@
 
 #include <asm/kvm_hyp.h>
 
-void __hyp_text __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa)
+void __hyp_text __kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu,
+					 phys_addr_t ipa)
 {
 	dsb(ishst);
 
 	/* Switch to requested VMID */
-	kvm = kern_hyp_va(kvm);
-	write_sysreg(kvm->arch.vttbr, vttbr_el2);
+	mmu = kern_hyp_va(mmu);
+	write_sysreg(mmu->vttbr, vttbr_el2);
 	isb();
 
 	/*
@@ -48,13 +49,13 @@ void __hyp_text __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa)
 	write_sysreg(0, vttbr_el2);
 }
 
-void __hyp_text __kvm_tlb_flush_vmid(struct kvm *kvm)
+void __hyp_text __kvm_tlb_flush_vmid(struct kvm_s2_mmu *mmu)
 {
 	dsb(ishst);
 
 	/* Switch to requested VMID */
-	kvm = kern_hyp_va(kvm);
-	write_sysreg(kvm->arch.vttbr, vttbr_el2);
+	mmu = kern_hyp_va(mmu);
+	write_sysreg(mmu->vttbr, vttbr_el2);
 	isb();
 
 	asm volatile("tlbi vmalls12e1is" : : );
@@ -64,12 +65,11 @@ void __hyp_text __kvm_tlb_flush_vmid(struct kvm *kvm)
 	write_sysreg(0, vttbr_el2);
 }
 
-void __hyp_text __kvm_tlb_flush_local_vmid(struct kvm_vcpu *vcpu)
+void __hyp_text __kvm_tlb_flush_local_vmid(struct kvm_s2_mmu *mmu)
 {
-	struct kvm *kvm = kern_hyp_va(kern_hyp_va(vcpu)->kvm);
-
 	/* Switch to requested VMID */
-	write_sysreg(kvm->arch.vttbr, vttbr_el2);
+	mmu = kern_hyp_va(mmu);
+	write_sysreg(mmu->vttbr, vttbr_el2);
 	isb();
 
 	asm volatile("tlbi vmalle1" : : );
-- 
1.9.1


* [RFC 35/55] KVM: arm/arm64: Support mmu for the virtual EL2 execution
  2017-01-09  6:23 [RFC 00/55] Nested Virtualization on KVM/ARM Jintack Lim
                   ` (33 preceding siblings ...)
  2017-01-09  6:24 ` [RFC 34/55] KVM: arm/arm64: Abstract stage-2 MMU state into a separate structure Jintack Lim
@ 2017-01-09  6:24 ` Jintack Lim
  2017-02-22 13:38   ` Christoffer Dall
  2017-01-09  6:24 ` [RFC 36/55] KVM: arm64: Invalidate virtual EL2 TLB entries when needed Jintack Lim
                   ` (21 subsequent siblings)
  56 siblings, 1 reply; 111+ messages in thread
From: Jintack Lim @ 2017-01-09  6:24 UTC (permalink / raw)
  To: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, vladimir.murzin, suzuki.poulose,
	mark.rutland, james.morse, lorenzo.pieralisi, kevin.brodsky,
	wcohen, shankerd, geoff, andre.przywara, eric.auger, anna-maria,
	shihwei, linux-arm-kernel, kvmarm, kvm, linux-kernel
  Cc: jintack

From: Christoffer Dall <christoffer.dall@linaro.org>

When running a guest hypervisor in virtual EL2, the translation context
has to be separate from the rest of the system, including the guest
EL1/0 translation regime, so we allocate a separate VMID for this mode.

Since we now have two different vttbr values due to the separate VMIDs,
it is racy to keep a single vttbr value in a shared structure
(kvm_s2_mmu) and use it across multiple vcpus. Instead, keep the vttbr
value per vcpu.

Hypercalls that flush the TLB now take the vttbr as a parameter instead
of the mmu, since the mmu structure no longer holds a vttbr.
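
To illustrate, here is a minimal stand-alone sketch of how the per-vcpu
vttbr value is derived (simplified user-space types; make_vttbr() and
vttbr_for_vcpu() are illustrative names, not the kernel API -- the real
code is kvm_get_vttbr() and vcpu_get_active_vmid() in the patch below):

#include <stdint.h>
#include <stdbool.h>

#define VTTBR_VMID_SHIFT	48
#define VTTBR_VMID_MASK(bits)	(((1ULL << (bits)) - 1) << VTTBR_VMID_SHIFT)

struct s2_vmid { uint64_t vmid_gen; uint32_t vmid; };
struct s2_mmu  { struct s2_vmid vmid, el2_vmid; uint64_t pgd_phys; };

/* Compose a VTTBR value: stage-2 pgd base address plus the VMID field. */
static uint64_t make_vttbr(const struct s2_vmid *v, const struct s2_mmu *mmu,
			   unsigned int vmid_bits)
{
	uint64_t vmid_field = ((uint64_t)v->vmid << VTTBR_VMID_SHIFT) &
			      VTTBR_VMID_MASK(vmid_bits);

	return mmu->pgd_phys | vmid_field;
}

/*
 * Pick the EL2 VMID when the vcpu runs in virtual EL2, the EL1/0 VMID
 * otherwise; the result is what ends up in the per-vcpu hw_vttbr.
 */
static uint64_t vttbr_for_vcpu(const struct s2_mmu *mmu, bool virtual_el2,
			       unsigned int vmid_bits)
{
	const struct s2_vmid *v = virtual_el2 ? &mmu->el2_vmid : &mmu->vmid;

	return make_vttbr(v, mmu, vmid_bits);
}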

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
---
 arch/arm/include/asm/kvm_asm.h       |  6 ++--
 arch/arm/include/asm/kvm_emulate.h   |  4 +++
 arch/arm/include/asm/kvm_host.h      | 14 ++++++---
 arch/arm/include/asm/kvm_mmu.h       | 11 +++++++
 arch/arm/kvm/arm.c                   | 60 +++++++++++++++++++-----------------
 arch/arm/kvm/hyp/switch.c            |  4 +--
 arch/arm/kvm/hyp/tlb.c               | 15 ++++-----
 arch/arm/kvm/mmu.c                   |  9 ++++--
 arch/arm64/include/asm/kvm_asm.h     |  6 ++--
 arch/arm64/include/asm/kvm_emulate.h |  8 +++++
 arch/arm64/include/asm/kvm_host.h    | 14 ++++++---
 arch/arm64/include/asm/kvm_mmu.h     | 11 +++++++
 arch/arm64/kvm/hyp/switch.c          |  4 +--
 arch/arm64/kvm/hyp/tlb.c             | 16 ++++------
 14 files changed, 112 insertions(+), 70 deletions(-)

diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
index 36e3856..aa214f7 100644
--- a/arch/arm/include/asm/kvm_asm.h
+++ b/arch/arm/include/asm/kvm_asm.h
@@ -65,9 +65,9 @@
 extern char __kvm_hyp_vector[];
 
 extern void __kvm_flush_vm_context(void);
-extern void __kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu, phys_addr_t ipa);
-extern void __kvm_tlb_flush_vmid(struct kvm_s2_mmu *mmu);
-extern void __kvm_tlb_flush_local_vmid(struct kvm_s2_mmu *mmu);
+extern void __kvm_tlb_flush_vmid_ipa(u64 vttbr, phys_addr_t ipa);
+extern void __kvm_tlb_flush_vmid(u64 vttbr);
+extern void __kvm_tlb_flush_local_vmid(u64 vttbr);
 
 extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
 
diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
index 05d5906..6285f4f 100644
--- a/arch/arm/include/asm/kvm_emulate.h
+++ b/arch/arm/include/asm/kvm_emulate.h
@@ -305,4 +305,8 @@ static inline unsigned long vcpu_data_host_to_guest(struct kvm_vcpu *vcpu,
 	}
 }
 
+static inline struct kvm_s2_vmid *vcpu_get_active_vmid(struct kvm_vcpu *vcpu)
+{
+	return &vcpu->kvm->arch.mmu.vmid;
+}
 #endif /* __ARM_KVM_EMULATE_H__ */
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index f84a59c..da45394 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -53,16 +53,18 @@
 int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
 void kvm_reset_coprocs(struct kvm_vcpu *vcpu);
 
-struct kvm_s2_mmu {
+struct kvm_s2_vmid {
 	/* The VMID generation used for the virt. memory system */
 	u64    vmid_gen;
 	u32    vmid;
+};
+
+struct kvm_s2_mmu {
+	struct kvm_s2_vmid vmid;
+	struct kvm_s2_vmid el2_vmid;
 
 	/* Stage-2 page table */
 	pgd_t *pgd;
-
-	/* VTTBR value associated with above pgd and vmid */
-	u64    vttbr;
 };
 
 struct kvm_arch {
@@ -196,6 +198,9 @@ struct kvm_vcpu_arch {
 
 	/* Stage 2 paging state used by the hardware on next switch */
 	struct kvm_s2_mmu *hw_mmu;
+
+	/* VTTBR value used by the hardware on next switch */
+	u64 hw_vttbr;
 };
 
 struct kvm_vm_stat {
@@ -242,6 +247,7 @@ static inline void kvm_arch_mmu_notifier_invalidate_page(struct kvm *kvm,
 {
 }
 
+unsigned int get_kvm_vmid_bits(void);
 struct kvm_vcpu *kvm_arm_get_running_vcpu(void);
 struct kvm_vcpu __percpu **kvm_get_running_vcpus(void);
 void kvm_arm_halt_guest(struct kvm *kvm);
diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index 74a44727..1b3309c 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -230,6 +230,17 @@ static inline unsigned int kvm_get_vmid_bits(void)
 	return 8;
 }
 
+static inline u64 kvm_get_vttbr(struct kvm_s2_vmid *vmid,
+				struct kvm_s2_mmu *mmu)
+{
+	u64 vmid_field, baddr;
+
+	baddr = virt_to_phys(mmu->pgd);
+	vmid_field = ((u64)vmid->vmid << VTTBR_VMID_SHIFT) &
+		VTTBR_VMID_MASK(get_kvm_vmid_bits());
+	return baddr | vmid_field;
+}
+
 #endif	/* !__ASSEMBLY__ */
 
 #endif /* __ARM_KVM_MMU_H__ */
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index eb3e709..aa8771d 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -75,6 +75,11 @@ static void kvm_arm_set_running_vcpu(struct kvm_vcpu *vcpu)
 	__this_cpu_write(kvm_arm_running_vcpu, vcpu);
 }
 
+unsigned int get_kvm_vmid_bits(void)
+{
+	return kvm_vmid_bits;
+}
+
 /**
  * kvm_arm_get_running_vcpu - get the vcpu running on the current CPU.
  * Must be called from non-preemptible context
@@ -139,7 +144,8 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 	kvm_timer_init(kvm);
 
 	/* Mark the initial VMID generation invalid */
-	kvm->arch.mmu.vmid_gen = 0;
+	kvm->arch.mmu.vmid.vmid_gen = 0;
+	kvm->arch.mmu.el2_vmid.vmid_gen = 0;
 
 	/* The maximum number of VCPUs is limited by the host's GIC model */
 	kvm->arch.max_vcpus = vgic_present ?
@@ -312,6 +318,8 @@ void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu)
 
 int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
 {
+	struct kvm_s2_mmu *mmu = &vcpu->kvm->arch.mmu;
+
 	/* Force users to call KVM_ARM_VCPU_INIT */
 	vcpu->arch.target = -1;
 	bitmap_zero(vcpu->arch.features, KVM_VCPU_MAX_FEATURES);
@@ -321,7 +329,8 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
 
 	kvm_arm_reset_debug_ptr(vcpu);
 
-	vcpu->arch.hw_mmu = &vcpu->kvm->arch.mmu;
+	vcpu->arch.hw_mmu = mmu;
+	vcpu->arch.hw_vttbr = kvm_get_vttbr(&mmu->vmid, mmu);
 
 	return 0;
 }
@@ -337,7 +346,10 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	 * over-invalidation doesn't affect correctness.
 	 */
 	if (*last_ran != vcpu->vcpu_id) {
-		kvm_call_hyp(__kvm_tlb_flush_local_vmid, &vcpu->kvm->arch.mmu);
+		struct kvm_s2_mmu *mmu = &vcpu->kvm->arch.mmu;
+		u64 vttbr = kvm_get_vttbr(&mmu->vmid, mmu);
+
+		kvm_call_hyp(__kvm_tlb_flush_local_vmid, vttbr);
 		*last_ran = vcpu->vcpu_id;
 	}
 
@@ -415,36 +427,33 @@ void force_vm_exit(const cpumask_t *mask)
 
 /**
  * need_new_vmid_gen - check that the VMID is still valid
- * @kvm: The VM's VMID to check
+ * @vmid: The VMID to check
  *
  * return true if there is a new generation of VMIDs being used
  *
- * The hardware supports only 256 values with the value zero reserved for the
- * host, so we check if an assigned value belongs to a previous generation,
- * which which requires us to assign a new value. If we're the first to use a
- * VMID for the new generation, we must flush necessary caches and TLBs on all
- * CPUs.
+ * The hardware supports a limited set of values with the value zero reserved
+ * for the host, so we check if an assigned value belongs to a previous
+ * generation, which requires us to assign a new value. If we're the
+ * first to use a VMID for the new generation, we must flush necessary caches
+ * and TLBs on all CPUs.
  */
-static bool need_new_vmid_gen(struct kvm_s2_mmu *mmu)
+static bool need_new_vmid_gen(struct kvm_s2_vmid *vmid)
 {
-	return unlikely(mmu->vmid_gen != atomic64_read(&kvm_vmid_gen));
+	return unlikely(vmid->vmid_gen != atomic64_read(&kvm_vmid_gen));
 }
 
 /**
  * update_vttbr - Update the VTTBR with a valid VMID before the guest runs
  * @kvm:	The guest that we are about to run
- * @mmu:	The stage-2 translation context to update
+ * @vmid:	The stage-2 VMID information struct
  *
  * Called from kvm_arch_vcpu_ioctl_run before entering the guest to ensure the
  * VM has a valid VMID, otherwise assigns a new one and flushes corresponding
  * caches and TLBs.
  */
-static void update_vttbr(struct kvm *kvm, struct kvm_s2_mmu *mmu)
+static void update_vttbr(struct kvm *kvm, struct kvm_s2_vmid *vmid)
 {
-	phys_addr_t pgd_phys;
-	u64 vmid;
-
-	if (!need_new_vmid_gen(mmu))
+	if (!need_new_vmid_gen(vmid))
 		return;
 
 	spin_lock(&kvm_vmid_lock);
@@ -454,7 +463,7 @@ static void update_vttbr(struct kvm *kvm, struct kvm_s2_mmu *mmu)
 	 * already allocated a valid vmid for this vm, then this vcpu should
 	 * use the same vmid.
 	 */
-	if (!need_new_vmid_gen(mmu)) {
+	if (!need_new_vmid_gen(vmid)) {
 		spin_unlock(&kvm_vmid_lock);
 		return;
 	}
@@ -478,18 +487,11 @@ static void update_vttbr(struct kvm *kvm, struct kvm_s2_mmu *mmu)
 		kvm_call_hyp(__kvm_flush_vm_context);
 	}
 
-	mmu->vmid_gen = atomic64_read(&kvm_vmid_gen);
-	mmu->vmid = kvm_next_vmid;
+	vmid->vmid_gen = atomic64_read(&kvm_vmid_gen);
+	vmid->vmid = kvm_next_vmid;
 	kvm_next_vmid++;
 	kvm_next_vmid &= (1 << kvm_vmid_bits) - 1;
 
-	/* update vttbr to be used with the new vmid */
-	pgd_phys = virt_to_phys(mmu->pgd);
-	BUG_ON(pgd_phys & ~VTTBR_BADDR_MASK);
-	vmid = ((u64)(mmu->vmid) << VTTBR_VMID_SHIFT) &
-	       VTTBR_VMID_MASK(kvm_vmid_bits);
-	mmu->vttbr = pgd_phys | vmid;
-
 	spin_unlock(&kvm_vmid_lock);
 }
 
@@ -615,7 +617,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		 */
 		cond_resched();
 
-		update_vttbr(vcpu->kvm, vcpu->arch.hw_mmu);
+		update_vttbr(vcpu->kvm, vcpu_get_active_vmid(vcpu));
 
 		if (vcpu->arch.power_off || vcpu->arch.pause)
 			vcpu_sleep(vcpu);
@@ -640,7 +642,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 			run->exit_reason = KVM_EXIT_INTR;
 		}
 
-		if (ret <= 0 || need_new_vmid_gen(vcpu->arch.hw_mmu) ||
+		if (ret <= 0 || need_new_vmid_gen(vcpu_get_active_vmid(vcpu)) ||
 			vcpu->arch.power_off || vcpu->arch.pause) {
 			local_irq_enable();
 			kvm_pmu_sync_hwstate(vcpu);
diff --git a/arch/arm/kvm/hyp/switch.c b/arch/arm/kvm/hyp/switch.c
index 6f99de1..65d0b5b 100644
--- a/arch/arm/kvm/hyp/switch.c
+++ b/arch/arm/kvm/hyp/switch.c
@@ -73,9 +73,7 @@ static void __hyp_text __deactivate_traps(struct kvm_vcpu *vcpu)
 
 static void __hyp_text __activate_vm(struct kvm_vcpu *vcpu)
 {
-	struct kvm_s2_mmu *mmu = kern_hyp_va(vcpu->arch.hw_mmu);
-
-	write_sysreg(mmu->vttbr, VTTBR);
+	write_sysreg(vcpu->arch.hw_vttbr, VTTBR);
 	write_sysreg(vcpu->arch.midr, VPIDR);
 }
 
diff --git a/arch/arm/kvm/hyp/tlb.c b/arch/arm/kvm/hyp/tlb.c
index 56f0a49..562ad0b 100644
--- a/arch/arm/kvm/hyp/tlb.c
+++ b/arch/arm/kvm/hyp/tlb.c
@@ -34,13 +34,12 @@
  * As v7 does not support flushing per IPA, just nuke the whole TLB
  * instead, ignoring the ipa value.
  */
-void __hyp_text __kvm_tlb_flush_vmid(struct kvm_s2_mmu *mmu)
+void __hyp_text __kvm_tlb_flush_vmid(u64 vttbr)
 {
 	dsb(ishst);
 
 	/* Switch to requested VMID */
-	mmu = kern_hyp_va(mmu);
-	write_sysreg(mmu->vttbr, VTTBR);
+	write_sysreg(vttbr, VTTBR);
 	isb();
 
 	write_sysreg(0, TLBIALLIS);
@@ -50,17 +49,15 @@ void __hyp_text __kvm_tlb_flush_vmid(struct kvm_s2_mmu *mmu)
 	write_sysreg(0, VTTBR);
 }
 
-void __hyp_text __kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu,
-					 phys_addr_t ipa)
+void __hyp_text __kvm_tlb_flush_vmid_ipa(u64 vttbr, phys_addr_t ipa)
 {
-	__kvm_tlb_flush_vmid(mmu);
+	__kvm_tlb_flush_vmid(vttbr);
 }
 
-void __hyp_text __kvm_tlb_flush_local_vmid(struct kvm_s2_mmu *mmu)
+void __hyp_text __kvm_tlb_flush_local_vmid(u64 vttbr)
 {
 	/* Switch to requested VMID */
-	mmu = kern_hyp_va(mmu);
-	write_sysreg(mmu->vttbr, VTTBR);
+	write_sysreg(vttbr, VTTBR);
 	isb();
 
 	write_sysreg(0, TLBIALL);
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index a27a204..5ca3a04 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -60,12 +60,17 @@ static bool memslot_is_logging(struct kvm_memory_slot *memslot)
  */
 void kvm_flush_remote_tlbs(struct kvm *kvm)
 {
-	kvm_call_hyp(__kvm_tlb_flush_vmid, kvm);
+	struct kvm_s2_mmu *mmu = &kvm->arch.mmu;
+	u64 vttbr = kvm_get_vttbr(&mmu->vmid, mmu);
+
+	kvm_call_hyp(__kvm_tlb_flush_vmid, vttbr);
 }
 
 static void kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu, phys_addr_t ipa)
 {
-	kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, ipa);
+	u64 vttbr = kvm_get_vttbr(&mmu->vmid, mmu);
+
+	kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, vttbr, ipa);
 }
 
 /*
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index ed8139f..27dce47 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -53,9 +53,9 @@
 extern char __kvm_hyp_vector[];
 
 extern void __kvm_flush_vm_context(void);
-extern void __kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu, phys_addr_t ipa);
-extern void __kvm_tlb_flush_vmid(struct kvm_s2_mmu *mmu);
-extern void __kvm_tlb_flush_local_vmid(struct kvm_s2_mmu *mmu);
+extern void __kvm_tlb_flush_vmid_ipa(u64 vttbr, phys_addr_t ipa);
+extern void __kvm_tlb_flush_vmid(u64 vttbr);
+extern void __kvm_tlb_flush_local_vmid(u64 vttbr);
 
 extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
 
diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index a9c993f..94068e7 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -363,4 +363,12 @@ static inline unsigned long vcpu_data_host_to_guest(struct kvm_vcpu *vcpu,
 	return data;		/* Leave LE untouched */
 }
 
+static inline struct kvm_s2_vmid *vcpu_get_active_vmid(struct kvm_vcpu *vcpu)
+{
+	if (unlikely(vcpu_mode_el2(vcpu)))
+		return &vcpu->kvm->arch.mmu.el2_vmid;
+
+	return &vcpu->kvm->arch.mmu.vmid;
+}
+
 #endif /* __ARM64_KVM_EMULATE_H__ */
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 954d6de..b33d35d 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -50,17 +50,19 @@
 int kvm_arch_dev_ioctl_check_extension(struct kvm *kvm, long ext);
 void __extended_idmap_trampoline(phys_addr_t boot_pgd, phys_addr_t idmap_start);
 
-struct kvm_s2_mmu {
+struct kvm_s2_vmid {
 	/* The VMID generation used for the virt. memory system */
 	u64    vmid_gen;
 	u32    vmid;
+};
+
+struct kvm_s2_mmu {
+	struct kvm_s2_vmid vmid;
+	struct kvm_s2_vmid el2_vmid;
 
 	/* 1-level 2nd stage table and lock */
 	spinlock_t pgd_lock;
 	pgd_t *pgd;
-
-	/* VTTBR value associated with above pgd and vmid */
-	u64    vttbr;
 };
 
 struct kvm_arch {
@@ -334,6 +336,9 @@ struct kvm_vcpu_arch {
 
 	/* Stage 2 paging state used by the hardware on next switch */
 	struct kvm_s2_mmu *hw_mmu;
+
+	/* VTTBR value used by the hardware on next switch */
+	u64 hw_vttbr;
 };
 
 #define vcpu_gp_regs(v)		(&(v)->arch.ctxt.gp_regs)
@@ -391,6 +396,7 @@ static inline void kvm_arch_mmu_notifier_invalidate_page(struct kvm *kvm,
 {
 }
 
+unsigned int get_kvm_vmid_bits(void);
 struct kvm_vcpu *kvm_arm_get_running_vcpu(void);
 struct kvm_vcpu * __percpu *kvm_get_running_vcpus(void);
 void kvm_arm_halt_guest(struct kvm *kvm);
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 6f72fe8..e3455c4 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -314,5 +314,16 @@ static inline unsigned int kvm_get_vmid_bits(void)
 	return (cpuid_feature_extract_unsigned_field(reg, ID_AA64MMFR1_VMIDBITS_SHIFT) == 2) ? 16 : 8;
 }
 
+static inline u64 kvm_get_vttbr(struct kvm_s2_vmid *vmid,
+				struct kvm_s2_mmu *mmu)
+{
+	u64 vmid_field, baddr;
+
+	baddr = virt_to_phys(mmu->pgd);
+	vmid_field = ((u64)vmid->vmid << VTTBR_VMID_SHIFT) &
+		VTTBR_VMID_MASK(get_kvm_vmid_bits());
+	return baddr | vmid_field;
+}
+
 #endif /* __ASSEMBLY__ */
 #endif /* __ARM64_KVM_MMU_H__ */
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 3207009a..c80b2ae 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -135,9 +135,7 @@ static void __hyp_text __deactivate_traps(struct kvm_vcpu *vcpu)
 
 static void __hyp_text __activate_vm(struct kvm_vcpu *vcpu)
 {
-	struct kvm_s2_mmu *mmu = kern_hyp_va(vcpu->arch.hw_mmu);
-
-	write_sysreg(mmu->vttbr, vttbr_el2);
+	write_sysreg(vcpu->arch.hw_vttbr, vttbr_el2);
 }
 
 static void __hyp_text __deactivate_vm(struct kvm_vcpu *vcpu)
diff --git a/arch/arm64/kvm/hyp/tlb.c b/arch/arm64/kvm/hyp/tlb.c
index 71a62ea..82350e7 100644
--- a/arch/arm64/kvm/hyp/tlb.c
+++ b/arch/arm64/kvm/hyp/tlb.c
@@ -17,14 +17,12 @@
 
 #include <asm/kvm_hyp.h>
 
-void __hyp_text __kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu,
-					 phys_addr_t ipa)
+void __hyp_text __kvm_tlb_flush_vmid_ipa(u64 vttbr, phys_addr_t ipa)
 {
 	dsb(ishst);
 
 	/* Switch to requested VMID */
-	mmu = kern_hyp_va(mmu);
-	write_sysreg(mmu->vttbr, vttbr_el2);
+	write_sysreg(vttbr, vttbr_el2);
 	isb();
 
 	/*
@@ -49,13 +47,12 @@ void __hyp_text __kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu,
 	write_sysreg(0, vttbr_el2);
 }
 
-void __hyp_text __kvm_tlb_flush_vmid(struct kvm_s2_mmu *mmu)
+void __hyp_text __kvm_tlb_flush_vmid(u64 vttbr)
 {
 	dsb(ishst);
 
 	/* Switch to requested VMID */
-	mmu = kern_hyp_va(mmu);
-	write_sysreg(mmu->vttbr, vttbr_el2);
+	write_sysreg(vttbr, vttbr_el2);
 	isb();
 
 	asm volatile("tlbi vmalls12e1is" : : );
@@ -65,11 +62,10 @@ void __hyp_text __kvm_tlb_flush_vmid(struct kvm_s2_mmu *mmu)
 	write_sysreg(0, vttbr_el2);
 }
 
-void __hyp_text __kvm_tlb_flush_local_vmid(struct kvm_s2_mmu *mmu)
+void __hyp_text __kvm_tlb_flush_local_vmid(u64 vttbr)
 {
 	/* Switch to requested VMID */
-	mmu = kern_hyp_va(mmu);
-	write_sysreg(mmu->vttbr, vttbr_el2);
+	write_sysreg(vttbr, vttbr_el2);
 	isb();
 
 	asm volatile("tlbi vmalle1" : : );
-- 
1.9.1


* [RFC 36/55] KVM: arm64: Invalidate virtual EL2 TLB entries when needed
  2017-01-09  6:23 [RFC 00/55] Nested Virtualization on KVM/ARM Jintack Lim
                   ` (34 preceding siblings ...)
  2017-01-09  6:24 ` [RFC 35/55] KVM: arm/arm64: Support mmu for the virtual EL2 execution Jintack Lim
@ 2017-01-09  6:24 ` Jintack Lim
  2017-01-09  6:24 ` [RFC 37/55] KVM: arm64: Setup vttbr_el2 on each VM entry Jintack Lim
                   ` (20 subsequent siblings)
  56 siblings, 0 replies; 111+ messages in thread
From: Jintack Lim @ 2017-01-09  6:24 UTC (permalink / raw)
  To: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, vladimir.murzin, suzuki.poulose,
	mark.rutland, james.morse, lorenzo.pieralisi, kevin.brodsky,
	wcohen, shankerd, geoff, andre.przywara, eric.auger, anna-maria,
	shihwei, linux-arm-kernel, kvmarm, kvm, linux-kernel
  Cc: jintack

From: Christoffer Dall <christoffer.dall@linaro.org>

When we invalidate the TLB for a certain stage-2 MMU context, that
context can also have an EL2 translation context (a separate VMID)
associated with it, and we have to invalidate that one as well.
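
The idea in a minimal stand-alone form (simplified types; the names
are illustrative stand-ins, not the kernel API -- the real code issues
the __kvm_tlb_flush_local_vmid hypercall once per VMID):

#include <stdint.h>

struct s2_vmid { uint32_t vmid; };
struct s2_mmu  { struct s2_vmid vmid, el2_vmid; uint64_t pgd_phys; };

/* Stand-in for the __kvm_tlb_flush_local_vmid() hypercall. */
static void hyp_tlb_flush_vmid(uint64_t vttbr) { (void)vttbr; }

static uint64_t vttbr_of(const struct s2_mmu *mmu, const struct s2_vmid *v)
{
	return mmu->pgd_phys | ((uint64_t)v->vmid << 48);	/* simplified */
}

/*
 * Invalidate the EL1/0 VMID for this context, and also the EL2 VMID
 * when the context has one in use.
 */
static void flush_s2_context(const struct s2_mmu *mmu)
{
	hyp_tlb_flush_vmid(vttbr_of(mmu, &mmu->vmid));
	if (mmu->el2_vmid.vmid)
		hyp_tlb_flush_vmid(vttbr_of(mmu, &mmu->el2_vmid));
}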

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
---
 arch/arm/kvm/arm.c |  6 ++++++
 arch/arm/kvm/mmu.c | 16 ++++++++++++++++
 2 files changed, 22 insertions(+)

diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index aa8771d..371b38e7 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -350,6 +350,12 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 		u64 vttbr = kvm_get_vttbr(&mmu->vmid, mmu);
 
 		kvm_call_hyp(__kvm_tlb_flush_local_vmid, vttbr);
+#ifdef CONFIG_KVM_ARM_NESTED_HYP
+		if (mmu->el2_vmid.vmid) {
+			vttbr = kvm_get_vttbr(&mmu->el2_vmid, mmu);
+			kvm_call_hyp(__kvm_tlb_flush_local_vmid, vttbr);
+		}
+#endif
 		*last_ran = vcpu->vcpu_id;
 	}
 
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 5ca3a04..56358fa 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -60,10 +60,20 @@ static bool memslot_is_logging(struct kvm_memory_slot *memslot)
  */
 void kvm_flush_remote_tlbs(struct kvm *kvm)
 {
+#ifndef CONFIG_KVM_ARM_NESTED_HYP
 	struct kvm_s2_mmu *mmu = &kvm->arch.mmu;
 	u64 vttbr = kvm_get_vttbr(&mmu->vmid, mmu);
 
 	kvm_call_hyp(__kvm_tlb_flush_vmid, vttbr);
+#else
+	/*
+	 * When supporting nested virtualization, we can have multiple VMIDs
+	 * in play for each VCPU in the VM, so it's really not worth it to try
+	 * to quiesce the system and flush all the VMIDs that may be in use,
+	 * instead just nuke the whole thing.
+	 */
+	kvm_call_hyp(__kvm_flush_vm_context);
+#endif
 }
 
 static void kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu, phys_addr_t ipa)
@@ -71,6 +81,12 @@ static void kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu, phys_addr_t ipa)
 	u64 vttbr = kvm_get_vttbr(&mmu->vmid, mmu);
 
 	kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, vttbr, ipa);
+#ifdef CONFIG_KVM_ARM_NESTED_HYP
+	if (!mmu->el2_vmid.vmid)
+		return; /* only if this mmu has el2 context */
+	vttbr = kvm_get_vttbr(&mmu->el2_vmid, mmu);
+	kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, vttbr, ipa);
+#endif
 }
 
 /*
-- 
1.9.1


* [RFC 37/55] KVM: arm64: Setup vttbr_el2 on each VM entry
  2017-01-09  6:23 [RFC 00/55] Nested Virtualization on KVM/ARM Jintack Lim
                   ` (35 preceding siblings ...)
  2017-01-09  6:24 ` [RFC 36/55] KVM: arm64: Invalidate virtual EL2 TLB entries when needed Jintack Lim
@ 2017-01-09  6:24 ` Jintack Lim
  2017-01-09  6:24 ` [RFC 38/55] KVM: arm/arm64: Make mmu functions non-static Jintack Lim
                   ` (19 subsequent siblings)
  56 siblings, 0 replies; 111+ messages in thread
From: Jintack Lim @ 2017-01-09  6:24 UTC (permalink / raw)
  To: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, vladimir.murzin, suzuki.poulose,
	mark.rutland, james.morse, lorenzo.pieralisi, kevin.brodsky,
	wcohen, shankerd, geoff, andre.przywara, eric.auger, anna-maria,
	shihwei, linux-arm-kernel, kvmarm, kvm, linux-kernel
  Cc: jintack

From: Christoffer Dall <christoffer.dall@linaro.org>

Now that the vttbr value will differ depending on the vcpu's exception
level, we set it on each VM entry.

We only have one mmu instance at this point, but there will be
multiple of them when we run nested VMs.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
---
 arch/arm64/kvm/context.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/arch/arm64/kvm/context.c b/arch/arm64/kvm/context.c
index a93ffe4..b2c0220 100644
--- a/arch/arm64/kvm/context.c
+++ b/arch/arm64/kvm/context.c
@@ -18,6 +18,7 @@
 #include <linux/kvm_host.h>
 #include <asm/kvm_emulate.h>
 #include <asm/esr.h>
+#include <asm/kvm_mmu.h>
 
 struct el1_el2_map {
 	enum vcpu_sysreg	el1;
@@ -88,6 +89,15 @@ static void create_shadow_el1_sysregs(struct kvm_vcpu *vcpu)
 	s_sys_regs[CPACR_EL1] = cptr_el2_to_cpacr_el1(el2_regs[CPTR_EL2]);
 }
 
+static void setup_s2_mmu(struct kvm_vcpu *vcpu)
+{
+	struct kvm_s2_mmu *mmu = &vcpu->kvm->arch.mmu;
+	struct kvm_s2_vmid *vmid = vcpu_get_active_vmid(vcpu);
+
+	vcpu->arch.hw_vttbr = kvm_get_vttbr(vmid, mmu);
+	vcpu->arch.hw_mmu = mmu;
+}
+
 /*
  * List of EL1 registers which we allow the virtual EL2 mode to access
  * directly without trapping and which haven't been paravirtualized.
@@ -166,6 +176,8 @@ void kvm_arm_setup_shadow_state(struct kvm_vcpu *vcpu)
 	}
 
 	vgic_v2_setup_shadow_state(vcpu);
+
+	setup_s2_mmu(vcpu);
 }
 
 /**
-- 
1.9.1


* [RFC 38/55] KVM: arm/arm64: Make mmu functions non-static
  2017-01-09  6:23 [RFC 00/55] Nested Virtualization on KVM/ARM Jintack Lim
                   ` (36 preceding siblings ...)
  2017-01-09  6:24 ` [RFC 37/55] KVM: arm64: Setup vttbr_el2 on each VM entry Jintack Lim
@ 2017-01-09  6:24 ` Jintack Lim
  2017-01-09  6:24 ` [RFC 39/55] KVM: arm/arm64: Add mmu context for the nesting Jintack Lim
                   ` (18 subsequent siblings)
  56 siblings, 0 replies; 111+ messages in thread
From: Jintack Lim @ 2017-01-09  6:24 UTC (permalink / raw)
  To: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, vladimir.murzin, suzuki.poulose,
	mark.rutland, james.morse, lorenzo.pieralisi, kevin.brodsky,
	wcohen, shankerd, geoff, andre.przywara, eric.auger, anna-maria,
	shihwei, linux-arm-kernel, kvmarm, kvm, linux-kernel
  Cc: jintack

From: Christoffer Dall <christoffer.dall@linaro.org>

Make the mmu functions non-static so that we can reuse them to support
the stage-2 MMU contexts of nested VMs.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
---
 arch/arm/kvm/mmu.c               | 90 +++++++++++++++++++++++-----------------
 arch/arm64/include/asm/kvm_mmu.h |  9 ++++
 2 files changed, 61 insertions(+), 38 deletions(-)

diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 56358fa..98b42e8 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -301,7 +301,7 @@ static void unmap_stage2_puds(struct kvm_s2_mmu *mmu, pgd_t *pgd,
 }
 
 /**
- * unmap_stage2_range -- Clear stage2 page table entries to unmap a range
+ * kvm_unmap_stage2_range -- Clear stage2 page table entries to unmap a range
  * @kvm:   The VM pointer
  * @start: The intermediate physical base address of the range to unmap
  * @size:  The size of the area to unmap
@@ -311,8 +311,7 @@ static void unmap_stage2_puds(struct kvm_s2_mmu *mmu, pgd_t *pgd,
  * destroying the VM), otherwise another faulting VCPU may come in and mess
  * with things behind our backs.
  */
-static void unmap_stage2_range(struct kvm_s2_mmu *mmu,
-			       phys_addr_t start, u64 size)
+void kvm_unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size)
 {
 	pgd_t *pgd;
 	phys_addr_t addr = start, end = start + size;
@@ -374,11 +373,10 @@ static void stage2_flush_puds(pgd_t *pgd,
 	} while (pud++, addr = next, addr != end);
 }
 
-static void stage2_flush_memslot(struct kvm_s2_mmu *mmu,
-				 struct kvm_memory_slot *memslot)
+void kvm_stage2_flush_range(struct kvm_s2_mmu *mmu,
+			    phys_addr_t start, phys_addr_t end)
 {
-	phys_addr_t addr = memslot->base_gfn << PAGE_SHIFT;
-	phys_addr_t end = addr + PAGE_SIZE * memslot->npages;
+	phys_addr_t addr = start;
 	phys_addr_t next;
 	pgd_t *pgd;
 
@@ -389,6 +387,15 @@ static void stage2_flush_memslot(struct kvm_s2_mmu *mmu,
 	} while (pgd++, addr = next, addr != end);
 }
 
+static void stage2_flush_memslot(struct kvm_s2_mmu *mmu,
+				 struct kvm_memory_slot *memslot)
+{
+	phys_addr_t start = memslot->base_gfn << PAGE_SHIFT;
+	phys_addr_t end = start + PAGE_SIZE * memslot->npages;
+
+	kvm_stage2_flush_range(mmu, start, end);
+}
+
 /**
  * stage2_flush_vm - Invalidate cache for pages mapped in stage 2
  * @kvm: The struct kvm pointer
@@ -745,21 +752,9 @@ int create_hyp_io_mappings(void *from, void *to, phys_addr_t phys_addr)
 				     __phys_to_pfn(phys_addr), PAGE_HYP_DEVICE);
 }
 
-/**
- * kvm_alloc_stage2_pgd - allocate level-1 table for stage-2 translation.
- * @kvm:	The KVM struct pointer for the VM.
- *
- * Allocates only the stage-2 HW PGD level table(s) (can support either full
- * 40-bit input addresses or limited to 32-bit input addresses). Clears the
- * allocated pages.
- *
- * Note we don't need locking here as this is only called when the VM is
- * created, which can only be done once.
- */
-int kvm_alloc_stage2_pgd(struct kvm *kvm)
+int __kvm_alloc_stage2_pgd(struct kvm_s2_mmu *mmu)
 {
 	pgd_t *pgd;
-	struct kvm_s2_mmu *mmu = &kvm->arch.mmu;
 
 	if (mmu->pgd != NULL) {
 		kvm_err("kvm_arch already initialized?\n");
@@ -776,6 +771,22 @@ int kvm_alloc_stage2_pgd(struct kvm *kvm)
 	return 0;
 }
 
+/**
+ * kvm_alloc_stage2_pgd - allocate level-1 table for stage-2 translation.
+ * @kvm:	The KVM struct pointer for the VM.
+ *
+ * Allocates only the stage-2 HW PGD level table(s) (can support either full
+ * 40-bit input addresses or limited to 32-bit input addresses). Clears the
+ * allocated pages.
+ *
+ * Note we don't need locking here as this is only called when the VM is
+ * created, which can only be done once.
+ */
+int kvm_alloc_stage2_pgd(struct kvm *kvm)
+{
+	return __kvm_alloc_stage2_pgd(&kvm->arch.mmu);
+}
+
 static void stage2_unmap_memslot(struct kvm_s2_mmu *mmu,
 				 struct kvm_memory_slot *memslot)
 {
@@ -811,7 +822,7 @@ static void stage2_unmap_memslot(struct kvm_s2_mmu *mmu,
 
 		if (!(vma->vm_flags & VM_PFNMAP)) {
 			gpa_t gpa = addr + (vm_start - memslot->userspace_addr);
-			unmap_stage2_range(mmu, gpa, vm_end - vm_start);
+			kvm_unmap_stage2_range(mmu, gpa, vm_end - vm_start);
 		}
 		hva = vm_end;
 	} while (hva < reg_end);
@@ -841,6 +852,17 @@ void stage2_unmap_vm(struct kvm *kvm)
 	srcu_read_unlock(&kvm->srcu, idx);
 }
 
+void __kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu)
+{
+	if (mmu->pgd == NULL)
+		return;
+
+	kvm_unmap_stage2_range(mmu, 0, KVM_PHYS_SIZE);
+	/* Free the HW pgd, one page at a time */
+	free_pages_exact(mmu->pgd, S2_PGD_SIZE);
+	mmu->pgd = NULL;
+}
+
 /**
  * kvm_free_stage2_pgd - free all stage-2 tables
  * @kvm:	The KVM struct pointer for the VM.
@@ -854,15 +876,7 @@ void stage2_unmap_vm(struct kvm *kvm)
  */
 void kvm_free_stage2_pgd(struct kvm *kvm)
 {
-	struct kvm_s2_mmu *mmu = &kvm->arch.mmu;
-
-	if (mmu->pgd == NULL)
-		return;
-
-	unmap_stage2_range(mmu, 0, KVM_PHYS_SIZE);
-	/* Free the HW pgd, one page at a time */
-	free_pages_exact(mmu->pgd, S2_PGD_SIZE);
-	mmu->pgd = NULL;
+	__kvm_free_stage2_pgd(&kvm->arch.mmu);
 }
 
 static pud_t *stage2_get_pud(struct kvm_s2_mmu *mmu,
@@ -1175,13 +1189,13 @@ static void  stage2_wp_puds(pgd_t *pgd, phys_addr_t addr, phys_addr_t end)
 }
 
 /**
- * stage2_wp_range() - write protect stage2 memory region range
+ * kvm_stage2_wp_range() - write protect stage2 memory region range
  * @kvm:	The KVM pointer
  * @addr:	Start address of range
  * @end:	End address of range
  */
-static void stage2_wp_range(struct kvm *kvm, struct kvm_s2_mmu *mmu,
-			    phys_addr_t addr, phys_addr_t end)
+void kvm_stage2_wp_range(struct kvm *kvm, struct kvm_s2_mmu *mmu,
+			 phys_addr_t addr, phys_addr_t end)
 {
 	pgd_t *pgd;
 	phys_addr_t next;
@@ -1225,7 +1239,7 @@ void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot)
 	phys_addr_t end = (memslot->base_gfn + memslot->npages) << PAGE_SHIFT;
 
 	spin_lock(&kvm->mmu_lock);
-	stage2_wp_range(kvm, &kvm->arch.mmu, start, end);
+	kvm_stage2_wp_range(kvm, &kvm->arch.mmu, start, end);
 	spin_unlock(&kvm->mmu_lock);
 	kvm_flush_remote_tlbs(kvm);
 }
@@ -1249,7 +1263,7 @@ static void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
 	phys_addr_t start = (base_gfn +  __ffs(mask)) << PAGE_SHIFT;
 	phys_addr_t end = (base_gfn + __fls(mask) + 1) << PAGE_SHIFT;
 
-	stage2_wp_range(kvm, &kvm->arch.mmu, start, end);
+	kvm_stage2_wp_range(kvm, &kvm->arch.mmu, start, end);
 }
 
 /*
@@ -1589,7 +1603,7 @@ static int handle_hva_to_gpa(struct kvm *kvm,
 
 static int kvm_unmap_hva_handler(struct kvm *kvm, gpa_t gpa, void *data)
 {
-	unmap_stage2_range(&kvm->arch.mmu, gpa, PAGE_SIZE);
+	kvm_unmap_stage2_range(&kvm->arch.mmu, gpa, PAGE_SIZE);
 	return 0;
 }
 
@@ -1900,7 +1914,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
 
 	spin_lock(&kvm->mmu_lock);
 	if (ret)
-		unmap_stage2_range(&kvm->arch.mmu, mem->guest_phys_addr,
+		kvm_unmap_stage2_range(&kvm->arch.mmu, mem->guest_phys_addr,
 				   mem->memory_size);
 	else
 		stage2_flush_memslot(&kvm->arch.mmu, memslot);
@@ -1944,7 +1958,7 @@ void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
 	phys_addr_t size = slot->npages << PAGE_SHIFT;
 
 	spin_lock(&kvm->mmu_lock);
-	unmap_stage2_range(&kvm->arch.mmu, gpa, size);
+	kvm_unmap_stage2_range(&kvm->arch.mmu, gpa, size);
 	spin_unlock(&kvm->mmu_lock);
 }
 
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index e3455c4..a504162 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -145,9 +145,18 @@ static inline unsigned long __kern_hyp_va(unsigned long v)
 
 void stage2_unmap_vm(struct kvm *kvm);
 int kvm_alloc_stage2_pgd(struct kvm *kvm);
+int __kvm_alloc_stage2_pgd(struct kvm_s2_mmu *mmu);
 void kvm_free_stage2_pgd(struct kvm *kvm);
+void __kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu);
 int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
 			  phys_addr_t pa, unsigned long size, bool writable);
+void kvm_unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start,
+			    u64 size);
+void kvm_stage2_wp_range(struct kvm *kvm, struct kvm_s2_mmu *mmu,
+			 phys_addr_t addr, phys_addr_t end);
+void kvm_stage2_flush_range(struct kvm_s2_mmu *mmu,
+			    phys_addr_t start, phys_addr_t end);
+
 
 int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run);
 
-- 
1.9.1


* [RFC 39/55] KVM: arm/arm64: Add mmu context for the nesting
  2017-01-09  6:23 [RFC 00/55] Nested Virtualization on KVM/ARM Jintack Lim
                   ` (37 preceding siblings ...)
  2017-01-09  6:24 ` [RFC 38/55] KVM: arm/arm64: Make mmu functions non-static Jintack Lim
@ 2017-01-09  6:24 ` Jintack Lim
  2017-02-22 13:34   ` Christoffer Dall
  2017-01-09  6:24 ` [RFC 40/55] KVM: arm/arm64: Handle vttbr_el2 write operation from the guest hypervisor Jintack Lim
                   ` (17 subsequent siblings)
  56 siblings, 1 reply; 111+ messages in thread
From: Jintack Lim @ 2017-01-09  6:24 UTC (permalink / raw)
  To: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, vladimir.murzin, suzuki.poulose,
	mark.rutland, james.morse, lorenzo.pieralisi, kevin.brodsky,
	wcohen, shankerd, geoff, andre.przywara, eric.auger, anna-maria,
	shihwei, linux-arm-kernel, kvmarm, kvm, linux-kernel
  Cc: jintack

Add the shadow stage-2 MMU context to be used for nesting, but don't
do anything with it yet.

The host hypervisor maintains an mmu structure for each nested VM. When
entering a nested VM, the host hypervisor looks up the nested VM's mmu
using the vmid as a key. Note that this vmid is the one assigned by the
guest hypervisor.

Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
---
 arch/arm/include/asm/kvm_host.h      |  3 ++
 arch/arm/kvm/arm.c                   |  1 +
 arch/arm64/include/asm/kvm_emulate.h | 13 ++++-----
 arch/arm64/include/asm/kvm_host.h    | 19 +++++++++++++
 arch/arm64/include/asm/kvm_mmu.h     | 31 ++++++++++++++++++++
 arch/arm64/kvm/Makefile              |  1 +
 arch/arm64/kvm/context.c             |  2 +-
 arch/arm64/kvm/mmu-nested.c          | 55 ++++++++++++++++++++++++++++++++++++
 8 files changed, 116 insertions(+), 9 deletions(-)
 create mode 100644 arch/arm64/kvm/mmu-nested.c

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index da45394..fbde48d 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -82,6 +82,9 @@ struct kvm_arch {
 	 * here.
 	 */
 
+	/* Never used on arm but added to be compatible with arm64 */
+	struct list_head nested_mmu_list;
+
 	/* Interrupt controller */
 	struct vgic_dist	vgic;
 	int max_vcpus;
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 371b38e7..147df97 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -146,6 +146,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 	/* Mark the initial VMID generation invalid */
 	kvm->arch.mmu.vmid.vmid_gen = 0;
 	kvm->arch.mmu.el2_vmid.vmid_gen = 0;
+	INIT_LIST_HEAD(&kvm->arch.nested_mmu_list);
 
 	/* The maximum number of VCPUs is limited by the host's GIC model */
 	kvm->arch.max_vcpus = vgic_present ?
diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 94068e7..abad676 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -183,6 +183,11 @@ static inline bool vcpu_el2_imo_is_set(const struct kvm_vcpu *vcpu)
 	return (vcpu_el2_reg(vcpu, HCR_EL2) & HCR_IMO);
 }
 
+static inline bool vcpu_nested_stage2_enabled(const struct kvm_vcpu *vcpu)
+{
+	return (vcpu_el2_reg(vcpu, HCR_EL2) & HCR_VM);
+}
+
 static inline u32 kvm_vcpu_get_hsr(const struct kvm_vcpu *vcpu)
 {
 	return vcpu->arch.fault.esr_el2;
@@ -363,12 +368,4 @@ static inline unsigned long vcpu_data_host_to_guest(struct kvm_vcpu *vcpu,
 	return data;		/* Leave LE untouched */
 }
 
-static inline struct kvm_s2_vmid *vcpu_get_active_vmid(struct kvm_vcpu *vcpu)
-{
-	if (unlikely(vcpu_mode_el2(vcpu)))
-		return &vcpu->kvm->arch.mmu.el2_vmid;
-
-	return &vcpu->kvm->arch.mmu.vmid;
-}
-
 #endif /* __ARM64_KVM_EMULATE_H__ */
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index b33d35d..23e2267 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -65,6 +65,22 @@ struct kvm_s2_mmu {
 	pgd_t *pgd;
 };
 
+/* Per nested VM mmu structure */
+struct kvm_nested_s2_mmu {
+	struct kvm_s2_mmu mmu;
+
+	/*
+	 * The vttbr value set by the guest hypervisor for this nested VM.
+	 * vmid field is used as a key to search for this mmu structure among
+	 * all nested VM mmu structures by the host hypervisor.
+	 * baddr field is used to determine if we need to unmap stage 2
+	 * shadow page tables.
+	 */
+	u64 virtual_vttbr;
+
+	struct list_head list;
+};
+
 struct kvm_arch {
 	/* Stage 2 paging state for the VM */
 	struct kvm_s2_mmu mmu;
@@ -80,6 +96,9 @@ struct kvm_arch {
 
 	/* Timer */
 	struct arch_timer_kvm	timer;
+
+	/* Stage 2 shadow paging contexts for nested L2 VM */
+	struct list_head nested_mmu_list;
 };
 
 #define KVM_NR_MEM_OBJS     40
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index a504162..d1ef650 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -112,6 +112,7 @@
 #include <asm/cacheflush.h>
 #include <asm/mmu_context.h>
 #include <asm/pgtable.h>
+#include <asm/kvm_emulate.h>
 
 static inline unsigned long __kern_hyp_va(unsigned long v)
 {
@@ -323,6 +324,21 @@ static inline unsigned int kvm_get_vmid_bits(void)
 	return (cpuid_feature_extract_unsigned_field(reg, ID_AA64MMFR1_VMIDBITS_SHIFT) == 2) ? 16 : 8;
 }
 
+#ifdef CONFIG_KVM_ARM_NESTED_HYP
+struct kvm_nested_s2_mmu *get_nested_mmu(struct kvm_vcpu *vcpu, u64 vttbr);
+struct kvm_s2_mmu *vcpu_get_active_s2_mmu(struct kvm_vcpu *vcpu);
+#else
+static inline struct kvm_nested_s2_mmu *get_nested_mmu(struct kvm_vcpu *vcpu,
+						       u64 vttbr)
+{
+	return NULL;
+}
+static inline struct kvm_s2_mmu *vcpu_get_active_s2_mmu(struct kvm_vcpu *vcpu)
+{
+	return &vcpu->kvm->arch.mmu;
+}
+#endif
+
 static inline u64 kvm_get_vttbr(struct kvm_s2_vmid *vmid,
 				struct kvm_s2_mmu *mmu)
 {
@@ -334,5 +350,20 @@ static inline u64 kvm_get_vttbr(struct kvm_s2_vmid *vmid,
 	return baddr | vmid_field;
 }
 
+static inline u64 get_vmid(u64 vttbr)
+{
+	return (vttbr & VTTBR_VMID_MASK(get_kvm_vmid_bits())) >> VTTBR_VMID_SHIFT;
+}
+
+static inline struct kvm_s2_vmid *vcpu_get_active_vmid(struct kvm_vcpu *vcpu)
+{
+	struct kvm_s2_mmu *mmu = vcpu_get_active_s2_mmu(vcpu);
+
+	if (unlikely(vcpu_mode_el2(vcpu)))
+		return &mmu->el2_vmid;
+	else
+		return &mmu->vmid;
+}
+
 #endif /* __ASSEMBLY__ */
 #endif /* __ARM64_KVM_MMU_H__ */
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index 8573faf..b0b1074 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -36,5 +36,6 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/arch_timer.o
 kvm-$(CONFIG_KVM_ARM_PMU) += $(KVM)/arm/pmu.o
 
 kvm-$(CONFIG_KVM_ARM_NESTED_HYP) += handle_exit_nested.o
+kvm-$(CONFIG_KVM_ARM_NESTED_HYP) += mmu-nested.o
 kvm-$(CONFIG_KVM_ARM_NESTED_HYP) += emulate-nested.o
 kvm-$(CONFIG_KVM_ARM_NESTED_HYP) += $(KVM)/arm/vgic/vgic-v2-nested.o
diff --git a/arch/arm64/kvm/context.c b/arch/arm64/kvm/context.c
index b2c0220..9ebc38f 100644
--- a/arch/arm64/kvm/context.c
+++ b/arch/arm64/kvm/context.c
@@ -91,7 +91,7 @@ static void create_shadow_el1_sysregs(struct kvm_vcpu *vcpu)
 
 static void setup_s2_mmu(struct kvm_vcpu *vcpu)
 {
-	struct kvm_s2_mmu *mmu = &vcpu->kvm->arch.mmu;
+	struct kvm_s2_mmu *mmu = vcpu_get_active_s2_mmu(vcpu);
 	struct kvm_s2_vmid *vmid = vcpu_get_active_vmid(vcpu);
 
 	vcpu->arch.hw_vttbr = kvm_get_vttbr(vmid, mmu);
diff --git a/arch/arm64/kvm/mmu-nested.c b/arch/arm64/kvm/mmu-nested.c
new file mode 100644
index 0000000..d52078f
--- /dev/null
+++ b/arch/arm64/kvm/mmu-nested.c
@@ -0,0 +1,55 @@
+/*
+ * Copyright (C) 2016 - Columbia University
+ * Author: Jintack Lim <jintack@cs.columbia.edu>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/kvm_host.h>
+
+#include <asm/kvm_arm.h>
+#include <asm/kvm_emulate.h>
+#include <asm/kvm_mmu.h>
+#include <asm/kvm_nested.h>
+
+struct kvm_nested_s2_mmu *get_nested_mmu(struct kvm_vcpu *vcpu, u64 vttbr)
+{
+	struct kvm_nested_s2_mmu *mmu;
+	u64 target_vmid = get_vmid(vttbr);
+	struct list_head *nested_mmu_list = &vcpu->kvm->arch.nested_mmu_list;
+
+	list_for_each_entry_rcu(mmu, nested_mmu_list, list) {
+		u64 vmid = get_vmid(mmu->virtual_vttbr);
+
+		if (target_vmid == vmid)
+			return mmu;
+	}
+	return NULL;
+}
+
+struct kvm_s2_mmu *vcpu_get_active_s2_mmu(struct kvm_vcpu *vcpu)
+{
+	struct kvm_nested_s2_mmu *nested_mmu;
+
+	/* If we are NOT entering the nested VM, return mmu in kvm_arch */
+	if (vcpu_mode_el2(vcpu) || !vcpu_nested_stage2_enabled(vcpu))
+		return &vcpu->kvm->arch.mmu;
+
+	/* Otherwise, search for nested_mmu in the list */
+	nested_mmu = get_nested_mmu(vcpu, vcpu_el2_reg(vcpu, VTTBR_EL2));
+
+	/* When this function is called, nested_mmu should be in the list */
+	BUG_ON(!nested_mmu);
+
+	return &nested_mmu->mmu;
+}
-- 
1.9.1


* [RFC 40/55] KVM: arm/arm64: Handle vttbr_el2 write operation from the guest hypervisor
  2017-01-09  6:23 [RFC 00/55] Nested Virtualization on KVM/ARM Jintack Lim
                   ` (38 preceding siblings ...)
  2017-01-09  6:24 ` [RFC 39/55] KVM: arm/arm64: Add mmu context for the nesting Jintack Lim
@ 2017-01-09  6:24 ` Jintack Lim
  2017-02-22 17:59   ` Christoffer Dall
  2017-01-09  6:24 ` [RFC 41/55] KVM: arm/arm64: Unmap/flush shadow stage 2 page tables Jintack Lim
                   ` (16 subsequent siblings)
  56 siblings, 1 reply; 111+ messages in thread
From: Jintack Lim @ 2017-01-09  6:24 UTC (permalink / raw)
  To: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, vladimir.murzin, suzuki.poulose,
	mark.rutland, james.morse, lorenzo.pieralisi, kevin.brodsky,
	wcohen, shankerd, geoff, andre.przywara, eric.auger, anna-maria,
	shihwei, linux-arm-kernel, kvmarm, kvm, linux-kernel
  Cc: jintack

Each nested VM is supposed to have an mmu (i.e. a shadow stage-2 page
table), and we create it when the guest hypervisor writes to vttbr_el2
with a new vmid.

In case the guest hypervisor writes to vttbr_el2 with an existing vmid,
we check whether the base address has changed. If so, what we have in
the shadow page table is no longer valid, so unmap it.
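
The overall flow in a stand-alone sketch (simplified types and
illustrative names; the real code is handle_vttbr_update() and
create_nested_mmu() in the patch below, which also take the
mmu_list_lock and allocate the shadow pgd):

#include <stdint.h>
#include <stdlib.h>

struct nested_mmu { uint64_t virtual_vttbr; struct nested_mmu *next; };
struct vm { struct nested_mmu *nested_list; };

static uint64_t vmid_of(uint64_t vttbr) { return vttbr >> 48; }	/* simplified */

/* Stand-in for kvm_nested_s2_unmap(): drop all shadow stage-2 mappings. */
static void unmap_all_shadow_tables(struct vm *vm) { (void)vm; }

static struct nested_mmu *find_nested_mmu(struct vm *vm, uint64_t vttbr)
{
	struct nested_mmu *m;

	for (m = vm->nested_list; m; m = m->next)
		if (vmid_of(m->virtual_vttbr) == vmid_of(vttbr))
			return m;
	return NULL;
}

static struct nested_mmu *handle_vttbr_write(struct vm *vm, uint64_t vttbr)
{
	struct nested_mmu *m = find_nested_mmu(vm, vttbr);

	if (!m) {
		/* New vmid: allocate a fresh shadow stage-2 context. */
		m = calloc(1, sizeof(*m));
		if (!m)
			return NULL;
		m->next = vm->nested_list;
		vm->nested_list = m;
	} else if (m->virtual_vttbr != vttbr) {
		/*
		 * Known vmid but a different value (e.g. a new base
		 * address): the cached shadow mappings are stale.
		 */
		unmap_all_shadow_tables(vm);
	}

	m->virtual_vttbr = vttbr;
	return m;
}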

Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
---
 arch/arm/include/asm/kvm_host.h   |  1 +
 arch/arm/kvm/arm.c                |  1 +
 arch/arm64/include/asm/kvm_host.h |  1 +
 arch/arm64/include/asm/kvm_mmu.h  |  6 ++++
 arch/arm64/kvm/mmu-nested.c       | 71 +++++++++++++++++++++++++++++++++++++++
 arch/arm64/kvm/sys_regs.c         | 15 ++++++++-
 6 files changed, 94 insertions(+), 1 deletion(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index fbde48d..ebf2810 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -84,6 +84,7 @@ struct kvm_arch {
 
 	/* Never used on arm but added to be compatible with arm64 */
 	struct list_head nested_mmu_list;
+	spinlock_t mmu_list_lock;
 
 	/* Interrupt controller */
 	struct vgic_dist	vgic;
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 147df97..6fa5754 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -147,6 +147,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 	kvm->arch.mmu.vmid.vmid_gen = 0;
 	kvm->arch.mmu.el2_vmid.vmid_gen = 0;
 	INIT_LIST_HEAD(&kvm->arch.nested_mmu_list);
+	spin_lock_init(&kvm->arch.mmu_list_lock);
 
 	/* The maximum number of VCPUs is limited by the host's GIC model */
 	kvm->arch.max_vcpus = vgic_present ?
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 23e2267..52eea76 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -99,6 +99,7 @@ struct kvm_arch {
 
 	/* Stage 2 shadow paging contexts for nested L2 VM */
 	struct list_head nested_mmu_list;
+	spinlock_t mmu_list_lock;
 };
 
 #define KVM_NR_MEM_OBJS     40
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index d1ef650..fdc9327 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -327,6 +327,7 @@ static inline unsigned int kvm_get_vmid_bits(void)
 #ifdef CONFIG_KVM_ARM_NESTED_HYP
 struct kvm_nested_s2_mmu *get_nested_mmu(struct kvm_vcpu *vcpu, u64 vttbr);
 struct kvm_s2_mmu *vcpu_get_active_s2_mmu(struct kvm_vcpu *vcpu);
+bool handle_vttbr_update(struct kvm_vcpu *vcpu, u64 vttbr);
 #else
 static inline struct kvm_nested_s2_mmu *get_nested_mmu(struct kvm_vcpu *vcpu,
 						       u64 vttbr)
@@ -337,6 +338,11 @@ static inline struct kvm_s2_mmu *vcpu_get_active_s2_mmu(struct kvm_vcpu *vcpu)
 {
 	return &vcpu->kvm->arch.mmu;
 }
+
+static inline bool handle_vttbr_update(struct kvm_vcpu *vcpu, u64 vttbr)
+{
+	return false;
+}
 #endif
 
 static inline u64 kvm_get_vttbr(struct kvm_s2_vmid *vmid,
diff --git a/arch/arm64/kvm/mmu-nested.c b/arch/arm64/kvm/mmu-nested.c
index d52078f..0811d94 100644
--- a/arch/arm64/kvm/mmu-nested.c
+++ b/arch/arm64/kvm/mmu-nested.c
@@ -53,3 +53,74 @@ struct kvm_s2_mmu *vcpu_get_active_s2_mmu(struct kvm_vcpu *vcpu)
 
 	return &nested_mmu->mmu;
 }
+
+static struct kvm_nested_s2_mmu *create_nested_mmu(struct kvm_vcpu *vcpu,
+						   u64 vttbr)
+{
+	struct kvm_nested_s2_mmu *nested_mmu, *tmp_mmu;
+	struct list_head *nested_mmu_list = &vcpu->kvm->arch.nested_mmu_list;
+	bool need_free = false;
+	int ret;
+
+	nested_mmu = kzalloc(sizeof(struct kvm_nested_s2_mmu), GFP_KERNEL);
+	if (!nested_mmu)
+		return NULL;
+
+	ret = __kvm_alloc_stage2_pgd(&nested_mmu->mmu);
+	if (ret) {
+		kfree(nested_mmu);
+		return NULL;
+	}
+
+	spin_lock(&vcpu->kvm->arch.mmu_list_lock);
+	tmp_mmu = get_nested_mmu(vcpu, vttbr);
+	if (!tmp_mmu)
+		list_add_rcu(&nested_mmu->list, nested_mmu_list);
+	else /* Somebody already created and put a new nested_mmu to the list */
+		need_free = true;
+	spin_unlock(&vcpu->kvm->arch.mmu_list_lock);
+
+	if (need_free) {
+		__kvm_free_stage2_pgd(&nested_mmu->mmu);
+		kfree(nested_mmu);
+		nested_mmu = tmp_mmu;
+	}
+
+	return nested_mmu;
+}
+
+static void kvm_nested_s2_unmap(struct kvm_vcpu *vcpu)
+{
+	struct kvm_nested_s2_mmu *nested_mmu;
+	struct list_head *nested_mmu_list = &vcpu->kvm->arch.nested_mmu_list;
+
+	list_for_each_entry_rcu(nested_mmu, nested_mmu_list, list)
+		kvm_unmap_stage2_range(&nested_mmu->mmu, 0, KVM_PHYS_SIZE);
+}
+
+bool handle_vttbr_update(struct kvm_vcpu *vcpu, u64 vttbr)
+{
+	struct kvm_nested_s2_mmu *nested_mmu;
+
+	/* TODO: see if we can relax this special case for a zero VTTBR */
+	if (!vttbr)
+		return true;
+
+	nested_mmu = (struct kvm_nested_s2_mmu *)get_nested_mmu(vcpu, vttbr);
+	if (!nested_mmu) {
+		nested_mmu = create_nested_mmu(vcpu, vttbr);
+		if (!nested_mmu)
+			return false;
+	} else {
+		/*
+		 * Unmap the shadow page table if vttbr_el2 is
+		 * changed to a different value.
+		 */
+		if (vttbr != nested_mmu->virtual_vttbr)
+			kvm_nested_s2_unmap(vcpu);
+	}
+
+	nested_mmu->virtual_vttbr = vttbr;
+
+	return true;
+}
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index e66f40d..ddb641c 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -960,6 +960,19 @@ static bool access_cpacr(struct kvm_vcpu *vcpu,
 	return true;
 }
 
+static bool access_vttbr(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
+			 const struct sys_reg_desc *r)
+{
+	u64 vttbr = p->regval;
+
+	if (!p->is_write) {
+		p->regval = vcpu_el2_reg(vcpu, r->reg);
+		return true;
+	}
+
+	return handle_vttbr_update(vcpu, vttbr);
+}
+
 static bool trap_el2_reg(struct kvm_vcpu *vcpu,
 			 struct sys_reg_params *p,
 			 const struct sys_reg_desc *r)
@@ -1306,7 +1319,7 @@ static bool trap_el2_reg(struct kvm_vcpu *vcpu,
 	  trap_el2_reg, reset_el2_val, TCR_EL2, 0 },
 	/* VTTBR_EL2 */
 	{ Op0(0b11), Op1(0b100), CRn(0b0010), CRm(0b0001), Op2(0b000),
-	  trap_el2_reg, reset_el2_val, VTTBR_EL2, 0 },
+	  access_vttbr, reset_el2_val, VTTBR_EL2, 0 },
 	/* VTCR_EL2 */
 	{ Op0(0b11), Op1(0b100), CRn(0b0010), CRm(0b0001), Op2(0b010),
 	  trap_el2_reg, reset_el2_val, VTCR_EL2, 0 },
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [RFC 41/55] KVM: arm/arm64: Unmap/flush shadow stage 2 page tables
  2017-01-09  6:23 [RFC 00/55] Nested Virtualization on KVM/ARM Jintack Lim
                   ` (39 preceding siblings ...)
  2017-01-09  6:24 ` [RFC 40/55] KVM: arm/arm64: Handle vttbr_el2 write operation from the guest hypervisor Jintack Lim
@ 2017-01-09  6:24 ` Jintack Lim
  2017-02-22 18:09   ` Christoffer Dall
  2017-01-09  6:24 ` [RFC 42/55] KVM: arm64: Implement nested Stage-2 page table walk logic Jintack Lim
                   ` (15 subsequent siblings)
  56 siblings, 1 reply; 111+ messages in thread
From: Jintack Lim @ 2017-01-09  6:24 UTC (permalink / raw)
  To: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, vladimir.murzin, suzuki.poulose,
	mark.rutland, james.morse, lorenzo.pieralisi, kevin.brodsky,
	wcohen, shankerd, geoff, andre.przywara, eric.auger, anna-maria,
	shihwei, linux-arm-kernel, kvmarm, kvm, linux-kernel
  Cc: jintack

From: Christoffer Dall <christoffer.dall@linaro.org>

Unmap/flush shadow stage 2 page tables for the nested VMs as well as the
stage 2 page table for the guest hypervisor.

Note: A bunch of the code in mmu.c relating to MMU notifiers is
currently dealt with in an extremely abrupt way, for example by clearing
out an entire shadow stage-2 table.  We could probably do better with
some sort of rmap structure; a rough sketch follows.
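
One possible shape for such an rmap (a purely hypothetical sketch; none
of these types or helpers exist in this series) would associate each L1
IPA range with the shadow table entries that cache it, so a notifier
could invalidate just the affected mappings instead of a whole table:

struct nested_s2_rmap_entry {
	struct list_head list;
	struct kvm_nested_s2_mmu *nested_mmu;	/* shadow table owner */
	phys_addr_t l1_ipa;	/* L1 IPA backing ...                  */
	phys_addr_t l2_ipa;	/* ... this L2 IPA in the shadow table */
	unsigned long size;	/* size of the mapping                 */
};

/* Hypothetical: invalidate only shadow entries backed by [start, end) */
static void nested_s2_rmap_unmap_range(struct list_head *rmap,
				       phys_addr_t start, phys_addr_t end)
{
	struct nested_s2_rmap_entry *e;

	list_for_each_entry(e, rmap, list) {
		if (e->l1_ipa + e->size <= start || e->l1_ipa >= end)
			continue;	/* no overlap with [start, end) */
		kvm_unmap_stage2_range(&e->nested_mmu->mmu,
				       e->l2_ipa, e->size);
	}
}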

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
---
 arch/arm/include/asm/kvm_mmu.h   |  7 ++++
 arch/arm/kvm/arm.c               |  6 ++-
 arch/arm/kvm/mmu.c               | 11 +++++
 arch/arm64/include/asm/kvm_mmu.h | 13 ++++++
 arch/arm64/kvm/mmu-nested.c      | 90 ++++++++++++++++++++++++++++++++++++----
 5 files changed, 117 insertions(+), 10 deletions(-)

diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index 1b3309c..ae3aa39 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -230,6 +230,13 @@ static inline unsigned int kvm_get_vmid_bits(void)
 	return 8;
 }
 
+static inline void kvm_nested_s2_unmap(struct kvm_vcpu *vcpu) { }
+static inline int kvm_nested_s2_init(struct kvm_vcpu *vcpu) { return 0; }
+static inline void kvm_nested_s2_teardown(struct kvm_vcpu *vcpu) { }
+static inline void kvm_nested_s2_all_vcpus_wp(struct kvm *kvm) { }
+static inline void kvm_nested_s2_all_vcpus_unmap(struct kvm *kvm) { }
+static inline void kvm_nested_s2_all_vcpus_flush(struct kvm *kvm) { }
+
 static inline u64 kvm_get_vttbr(struct kvm_s2_vmid *vmid,
 				struct kvm_s2_mmu *mmu)
 {
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 6fa5754..dc2795f 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -191,6 +191,7 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
 
 	for (i = 0; i < KVM_MAX_VCPUS; ++i) {
 		if (kvm->vcpus[i]) {
+			kvm_nested_s2_teardown(kvm->vcpus[i]);
 			kvm_arch_vcpu_free(kvm->vcpus[i]);
 			kvm->vcpus[i] = NULL;
 		}
@@ -333,6 +334,7 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
 
 	vcpu->arch.hw_mmu = mmu;
 	vcpu->arch.hw_vttbr = kvm_get_vttbr(&mmu->vmid, mmu);
+	kvm_nested_s2_init(vcpu);
 
 	return 0;
 }
@@ -871,8 +873,10 @@ static int kvm_arch_vcpu_ioctl_vcpu_init(struct kvm_vcpu *vcpu,
 	 * Ensure a rebooted VM will fault in RAM pages and detect if the
 	 * guest MMU is turned off and flush the caches as needed.
 	 */
-	if (vcpu->arch.has_run_once)
+	if (vcpu->arch.has_run_once) {
 		stage2_unmap_vm(vcpu->kvm);
+		kvm_nested_s2_unmap(vcpu);
+	}
 
 	vcpu_reset_hcr(vcpu);
 
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 98b42e8..1677a87 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -416,6 +416,8 @@ static void stage2_flush_vm(struct kvm *kvm)
 	kvm_for_each_memslot(memslot, slots)
 		stage2_flush_memslot(&kvm->arch.mmu, memslot);
 
+	kvm_nested_s2_all_vcpus_flush(kvm);
+
 	spin_unlock(&kvm->mmu_lock);
 	srcu_read_unlock(&kvm->srcu, idx);
 }
@@ -1240,6 +1242,7 @@ void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot)
 
 	spin_lock(&kvm->mmu_lock);
 	kvm_stage2_wp_range(kvm, &kvm->arch.mmu, start, end);
+	kvm_nested_s2_all_vcpus_wp(kvm);
 	spin_unlock(&kvm->mmu_lock);
 	kvm_flush_remote_tlbs(kvm);
 }
@@ -1278,6 +1281,7 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
 		gfn_t gfn_offset, unsigned long mask)
 {
 	kvm_mmu_write_protect_pt_masked(kvm, slot, gfn_offset, mask);
+	kvm_nested_s2_all_vcpus_wp(kvm);
 }
 
 static void coherent_cache_guest_page(struct kvm_vcpu *vcpu, kvm_pfn_t pfn,
@@ -1604,6 +1608,7 @@ static int handle_hva_to_gpa(struct kvm *kvm,
 static int kvm_unmap_hva_handler(struct kvm *kvm, gpa_t gpa, void *data)
 {
 	kvm_unmap_stage2_range(&kvm->arch.mmu, gpa, PAGE_SIZE);
+	kvm_nested_s2_all_vcpus_unmap(kvm);
 	return 0;
 }
 
@@ -1642,6 +1647,7 @@ static int kvm_set_spte_handler(struct kvm *kvm, gpa_t gpa, void *data)
 	 * through this calling path.
 	 */
 	stage2_set_pte(&kvm->arch.mmu, NULL, gpa, pte, 0);
+	kvm_nested_s2_all_vcpus_unmap(kvm);
 	return 0;
 }
 
@@ -1675,6 +1681,8 @@ static int kvm_age_hva_handler(struct kvm *kvm, gpa_t gpa, void *data)
 	if (pte_none(*pte))
 		return 0;
 
+	/* TODO: Handle nested_mmu structures here as well */
+
 	return stage2_ptep_test_and_clear_young(pte);
 }
 
@@ -1694,6 +1702,8 @@ static int kvm_test_age_hva_handler(struct kvm *kvm, gpa_t gpa, void *data)
 	if (!pte_none(*pte))		/* Just a page... */
 		return pte_young(*pte);
 
+	/* TODO: Handle nested_mmu structures here as well */
+
 	return 0;
 }
 
@@ -1959,6 +1969,7 @@ void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
 
 	spin_lock(&kvm->mmu_lock);
 	kvm_unmap_stage2_range(&kvm->arch.mmu, gpa, size);
+	kvm_nested_s2_all_vcpus_unmap(kvm);
 	spin_unlock(&kvm->mmu_lock);
 }
 
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index fdc9327..e4d5d54 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -328,6 +328,12 @@ static inline unsigned int kvm_get_vmid_bits(void)
 struct kvm_nested_s2_mmu *get_nested_mmu(struct kvm_vcpu *vcpu, u64 vttbr);
 struct kvm_s2_mmu *vcpu_get_active_s2_mmu(struct kvm_vcpu *vcpu);
 bool handle_vttbr_update(struct kvm_vcpu *vcpu, u64 vttbr);
+void kvm_nested_s2_unmap(struct kvm_vcpu *vcpu);
+int kvm_nested_s2_init(struct kvm_vcpu *vcpu);
+void kvm_nested_s2_teardown(struct kvm_vcpu *vcpu);
+void kvm_nested_s2_all_vcpus_wp(struct kvm *kvm);
+void kvm_nested_s2_all_vcpus_unmap(struct kvm *kvm);
+void kvm_nested_s2_all_vcpus_flush(struct kvm *kvm);
 #else
 static inline struct kvm_nested_s2_mmu *get_nested_mmu(struct kvm_vcpu *vcpu,
 						       u64 vttbr)
@@ -343,6 +349,13 @@ static inline bool handle_vttbr_update(struct kvm_vcpu *vcpu, u64 vttbr)
 {
 	return false;
 }
+
+static inline void kvm_nested_s2_unmap(struct kvm_vcpu *vcpu) { }
+static inline int kvm_nested_s2_init(struct kvm_vcpu *vcpu) { return 0; }
+static inline void kvm_nested_s2_teardown(struct kvm_vcpu *vcpu) { }
+static inline void kvm_nested_s2_all_vcpus_wp(struct kvm *kvm) { }
+static inline void kvm_nested_s2_all_vcpus_unmap(struct kvm *kvm) { }
+static inline void kvm_nested_s2_all_vcpus_flush(struct kvm *kvm) { }
 #endif
 
 static inline u64 kvm_get_vttbr(struct kvm_s2_vmid *vmid,
diff --git a/arch/arm64/kvm/mmu-nested.c b/arch/arm64/kvm/mmu-nested.c
index 0811d94..b22b78c 100644
--- a/arch/arm64/kvm/mmu-nested.c
+++ b/arch/arm64/kvm/mmu-nested.c
@@ -1,6 +1,7 @@
 /*
  * Copyright (C) 2016 - Columbia University
  * Author: Jintack Lim <jintack@cs.columbia.edu>
+ * Author: Christoffer Dall <cdall@cs.columbia.edu>
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License version 2 as
@@ -22,6 +23,86 @@
 #include <asm/kvm_mmu.h>
 #include <asm/kvm_nested.h>
 
+
+/* expects kvm->mmu_lock to be held */
+void kvm_nested_s2_all_vcpus_wp(struct kvm *kvm)
+{
+	int i;
+	struct kvm_vcpu *vcpu;
+	struct kvm_nested_s2_mmu *nested_mmu;
+	struct list_head *nested_mmu_list;
+
+	kvm_for_each_vcpu(i, vcpu, kvm) {
+		if (need_resched() || spin_needbreak(&kvm->mmu_lock))
+			cond_resched_lock(&kvm->mmu_lock);
+
+		nested_mmu_list = &vcpu->kvm->arch.nested_mmu_list;
+		list_for_each_entry_rcu(nested_mmu, nested_mmu_list, list)
+			kvm_stage2_wp_range(kvm, &nested_mmu->mmu,
+				    0, KVM_PHYS_SIZE);
+	}
+}
+
+/* expects kvm->mmu_lock to be held */
+void kvm_nested_s2_all_vcpus_unmap(struct kvm *kvm)
+{
+	int i;
+	struct kvm_vcpu *vcpu;
+	struct kvm_nested_s2_mmu *nested_mmu;
+	struct list_head *nested_mmu_list;
+
+	kvm_for_each_vcpu(i, vcpu, kvm) {
+		if (need_resched() || spin_needbreak(&kvm->mmu_lock))
+			cond_resched_lock(&kvm->mmu_lock);
+
+		nested_mmu_list = &vcpu->kvm->arch.nested_mmu_list;
+		list_for_each_entry_rcu(nested_mmu, nested_mmu_list, list)
+			kvm_unmap_stage2_range(&nested_mmu->mmu,
+				       0, KVM_PHYS_SIZE);
+	}
+}
+
+void kvm_nested_s2_all_vcpus_flush(struct kvm *kvm)
+{
+	int i;
+	struct kvm_vcpu *vcpu;
+	struct kvm_nested_s2_mmu *nested_mmu;
+	struct list_head *nested_mmu_list;
+
+	kvm_for_each_vcpu(i, vcpu, kvm) {
+		if (need_resched() || spin_needbreak(&kvm->mmu_lock))
+			cond_resched_lock(&kvm->mmu_lock);
+
+		nested_mmu_list = &vcpu->kvm->arch.nested_mmu_list;
+		list_for_each_entry_rcu(nested_mmu, nested_mmu_list, list)
+			kvm_stage2_flush_range(&nested_mmu->mmu,
+				       0, KVM_PHYS_SIZE);
+	}
+}
+
+void kvm_nested_s2_unmap(struct kvm_vcpu *vcpu)
+{
+	struct kvm_nested_s2_mmu *nested_mmu;
+	struct list_head *nested_mmu_list = &vcpu->kvm->arch.nested_mmu_list;
+
+	list_for_each_entry_rcu(nested_mmu, nested_mmu_list, list)
+		kvm_unmap_stage2_range(&nested_mmu->mmu, 0, KVM_PHYS_SIZE);
+}
+
+int kvm_nested_s2_init(struct kvm_vcpu *vcpu)
+{
+	return 0;
+}
+
+void kvm_nested_s2_teardown(struct kvm_vcpu *vcpu)
+{
+	struct kvm_nested_s2_mmu *nested_mmu;
+	struct list_head *nested_mmu_list = &vcpu->kvm->arch.nested_mmu_list;
+
+	list_for_each_entry_rcu(nested_mmu, nested_mmu_list, list)
+		__kvm_free_stage2_pgd(&nested_mmu->mmu);
+}
+
 struct kvm_nested_s2_mmu *get_nested_mmu(struct kvm_vcpu *vcpu, u64 vttbr)
 {
 	struct kvm_nested_s2_mmu *mmu;
@@ -89,15 +170,6 @@ static struct kvm_nested_s2_mmu *create_nested_mmu(struct kvm_vcpu *vcpu,
 	return nested_mmu;
 }
 
-static void kvm_nested_s2_unmap(struct kvm_vcpu *vcpu)
-{
-	struct kvm_nested_s2_mmu *nested_mmu;
-	struct list_head *nested_mmu_list = &vcpu->kvm->arch.nested_mmu_list;
-
-	list_for_each_entry_rcu(nested_mmu, nested_mmu_list, list)
-		kvm_unmap_stage2_range(&nested_mmu->mmu, 0, KVM_PHYS_SIZE);
-}
-
 bool handle_vttbr_update(struct kvm_vcpu *vcpu, u64 vttbr)
 {
 	struct kvm_nested_s2_mmu *nested_mmu;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [RFC 42/55] KVM: arm64: Implement nested Stage-2 page table walk logic
  2017-01-09  6:23 [RFC 00/55] Nested Virtualization on KVM/ARM Jintack Lim
                   ` (40 preceding siblings ...)
  2017-01-09  6:24 ` [RFC 41/55] KVM: arm/arm64: Unmap/flush shadow stage 2 page tables Jintack Lim
@ 2017-01-09  6:24 ` Jintack Lim
  2017-01-09  6:24 ` [RFC 43/55] KVM: arm/arm64: Handle shadow stage 2 page faults Jintack Lim
                   ` (14 subsequent siblings)
  56 siblings, 0 replies; 111+ messages in thread
From: Jintack Lim @ 2017-01-09  6:24 UTC (permalink / raw)
  To: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, vladimir.murzin, suzuki.poulose,
	mark.rutland, james.morse, lorenzo.pieralisi, kevin.brodsky,
	wcohen, shankerd, geoff, andre.przywara, eric.auger, anna-maria,
	shihwei, linux-arm-kernel, kvmarm, kvm, linux-kernel
  Cc: jintack

From: Christoffer Dall <christoffer.dall@linaro.org>

Based on the pseudo-code in the ARM ARM, implement a stage 2 software
page table walker.
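
To make the walker's index arithmetic concrete, here is a standalone
userspace illustration (not part of the patch) for a 4K granule, where
pgshift = 12 and stride = pgshift - 3 = 9 bits per level; at each level
the walker extracts IPA bits [addr_top:addr_bottom] and shifts them
right by (addr_bottom - 3) to get a byte offset into the table of
8-byte descriptors:

#include <stdio.h>
#include <stdint.h>

#define GENMASK_ULL(h, l) \
	((~0ULL << (l)) & (~0ULL >> (63 - (h))))

int main(void)
{
	uint64_t ipa = 0x40201000ULL;		/* arbitrary example IPA */
	unsigned int pgshift = 12, stride = pgshift - 3;
	unsigned int input_size = 39;		/* 64 - T0SZ, T0SZ = 25 */
	unsigned int addr_top = input_size - 1, addr_bottom;
	int level;

	for (level = 1; level <= 3; level++) {	/* 4K, SL0 = 1: start at 1 */
		uint64_t off;

		addr_bottom = (3 - level) * stride + pgshift;
		off = (ipa & GENMASK_ULL(addr_top, addr_bottom))
			>> (addr_bottom - 3);
		printf("level %d: descriptor byte offset 0x%llx\n",
		       level, (unsigned long long)off);
		addr_top = addr_bottom - 1;
	}
	return 0;
}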

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
---
 arch/arm/include/asm/kvm_mmu.h   |  11 ++
 arch/arm64/include/asm/kvm_arm.h |   1 +
 arch/arm64/include/asm/kvm_mmu.h |  13 +++
 arch/arm64/kvm/mmu-nested.c      | 223 +++++++++++++++++++++++++++++++++++++++
 4 files changed, 248 insertions(+)

diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index ae3aa39..ab41a10 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -230,6 +230,17 @@ static inline unsigned int kvm_get_vmid_bits(void)
 	return 8;
 }
 
+struct kvm_s2_trans {
+	phys_addr_t output;
+	phys_addr_t block_size;
+};
+
+static inline int kvm_walk_nested_s2(struct kvm_vcpu *vcpu, phys_addr_t gipa,
+				     struct kvm_s2_trans *result)
+{
+	return 0;
+}
+
 static inline void kvm_nested_s2_unmap(struct kvm_vcpu *vcpu) { }
 static inline int kvm_nested_s2_init(struct kvm_vcpu *vcpu) { return 0; }
 static inline void kvm_nested_s2_teardown(struct kvm_vcpu *vcpu) { }
diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index feded61..f9addf3 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -103,6 +103,7 @@
 #define VTCR_EL2_RES1		(1 << 31)
 #define VTCR_EL2_HD		(1 << 22)
 #define VTCR_EL2_HA		(1 << 21)
+#define VTCR_EL2_PS_SHIFT	TCR_EL2_PS_SHIFT
 #define VTCR_EL2_PS_MASK	TCR_EL2_PS_MASK
 #define VTCR_EL2_TG0_MASK	TCR_TG0_MASK
 #define VTCR_EL2_TG0_4K		TCR_TG0_4K
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index e4d5d54..bf94f0c 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -324,10 +324,17 @@ static inline unsigned int kvm_get_vmid_bits(void)
 	return (cpuid_feature_extract_unsigned_field(reg, ID_AA64MMFR1_VMIDBITS_SHIFT) == 2) ? 16 : 8;
 }
 
+struct kvm_s2_trans {
+	phys_addr_t output;
+	phys_addr_t block_size;
+};
+
 #ifdef CONFIG_KVM_ARM_NESTED_HYP
 struct kvm_nested_s2_mmu *get_nested_mmu(struct kvm_vcpu *vcpu, u64 vttbr);
 struct kvm_s2_mmu *vcpu_get_active_s2_mmu(struct kvm_vcpu *vcpu);
 bool handle_vttbr_update(struct kvm_vcpu *vcpu, u64 vttbr);
+int kvm_walk_nested_s2(struct kvm_vcpu *vcpu, phys_addr_t gipa,
+		       struct kvm_s2_trans *result);
 void kvm_nested_s2_unmap(struct kvm_vcpu *vcpu);
 int kvm_nested_s2_init(struct kvm_vcpu *vcpu);
 void kvm_nested_s2_teardown(struct kvm_vcpu *vcpu);
@@ -350,6 +357,12 @@ static inline bool handle_vttbr_update(struct kvm_vcpu *vcpu, u64 vttbr)
 	return false;
 }
 
+static inline int kvm_walk_nested_s2(struct kvm_vcpu *vcpu, phys_addr_t gipa,
+				     struct kvm_s2_trans *result)
+{
+	return 0;
+}
+
 static inline void kvm_nested_s2_unmap(struct kvm_vcpu *vcpu) { }
 static inline int kvm_nested_s2_init(struct kvm_vcpu *vcpu) { return 0; }
 static inline void kvm_nested_s2_teardown(struct kvm_vcpu *vcpu) { }
diff --git a/arch/arm64/kvm/mmu-nested.c b/arch/arm64/kvm/mmu-nested.c
index b22b78c..a2fab41 100644
--- a/arch/arm64/kvm/mmu-nested.c
+++ b/arch/arm64/kvm/mmu-nested.c
@@ -23,6 +23,229 @@
 #include <asm/kvm_mmu.h>
 #include <asm/kvm_nested.h>
 
+struct s2_walk_info {
+	unsigned int pgshift;
+	unsigned int pgsize;
+	unsigned int ps;
+	unsigned int sl;
+	unsigned int t0sz;
+};
+
+static unsigned int ps_to_output_size(unsigned int ps)
+{
+	switch (ps) {
+	case 0: return 32;
+	case 1: return 36;
+	case 2: return 40;
+	case 3: return 42;
+	case 4: return 44;
+	case 5:
+	default:
+		return 48;
+	}
+}
+
+static unsigned int pa_max(void)
+{
+	u64 parange = read_sysreg(id_aa64mmfr0_el1) & 7;
+
+	return ps_to_output_size(parange);
+}
+
+static int vcpu_inject_s2_trans_fault(struct kvm_vcpu *vcpu, gpa_t ipa,
+				      int level)
+{
+	/* TODO: Implement */
+	return -EFAULT;
+}
+
+static int vcpu_inject_s2_addr_sz_fault(struct kvm_vcpu *vcpu, gpa_t ipa,
+					int level)
+{
+	/* TODO: Implement */
+	return -EFAULT;
+}
+
+static int vcpu_inject_s2_access_flag_fault(struct kvm_vcpu *vcpu, gpa_t ipa,
+					    int level)
+{
+	/* TODO: Implement */
+	return -EFAULT;
+}
+
+static int check_base_s2_limits(struct kvm_vcpu *vcpu, struct s2_walk_info *wi,
+				int level, int input_size, int stride)
+{
+	int start_size;
+
+	/* Check translation limits */
+	switch (wi->pgsize) {
+	case SZ_64K:
+		if (level == 0 || (level == 1 && pa_max() <= 42))
+			return -EFAULT;
+		break;
+	case SZ_16K:
+		if (level == 0 || (level == 1 && pa_max() <= 40))
+			return -EFAULT;
+		break;
+	case SZ_4K:
+		if (level < 0 || (level == 0 && pa_max() <= 42))
+			return -EFAULT;
+		break;
+	}
+
+	/* Check input size limits */
+	if (input_size > pa_max() &&
+	    (!vcpu_mode_is_32bit(vcpu) || input_size > 40))
+		return -EFAULT;
+
+	/* Check number of entries in starting level table */
+	start_size = input_size - ((3 - level) * stride + wi->pgshift);
+	if (start_size < 1 || start_size > stride + 4)
+		return -EFAULT;
+
+	return 0;
+}
+
+/* Check if output is within boundaries */
+static int check_output_size(struct kvm_vcpu *vcpu, struct s2_walk_info *wi,
+			     phys_addr_t output)
+{
+	unsigned int output_size = ps_to_output_size(wi->ps);
+
+	if (output_size > pa_max())
+		output_size = pa_max();
+
+	if (output_size != 48 && (output & GENMASK_ULL(47, output_size)))
+		return -1;
+
+	return 0;
+}
+
+/*
+ * This is essentially a C version of the pseudocode from the ARM ARM
+ * AArch64.TranslationTableWalk function.  I strongly recommend looking at
+ * that pseudocode when trying to understand this.
+ *
+ * Must be called with the kvm->srcu read lock held.
+ */
+static int walk_nested_s2_pgd(struct kvm_vcpu *vcpu, phys_addr_t ipa,
+			      struct s2_walk_info *wi, struct kvm_s2_trans *out)
+{
+	u64 vttbr = vcpu->arch.ctxt.el2_regs[VTTBR_EL2];
+	int first_block_level, level, stride, input_size, base_lower_bound;
+	phys_addr_t base_addr;
+	unsigned int addr_top, addr_bottom;
+	u64 desc;  /* page table entry */
+	int ret;
+	phys_addr_t paddr;
+
+	switch (wi->pgsize) {
+	case SZ_64K:
+	case SZ_16K:
+		level = 3 - wi->sl;
+		first_block_level = 2;
+		break;
+	case SZ_4K:
+		level = 2 - wi->sl;
+		first_block_level = 1;
+		break;
+	default:
+		/* Unreachable: pgsize is always 4K, 16K or 64K; silence GCC */
+		WARN(1, "Page size is none of 4K, 16K or 64K");
+	}
+
+	stride = wi->pgshift - 3;
+	input_size = 64 - wi->t0sz;
+	if (input_size > 48 || input_size < 25)
+		return -EFAULT;
+
+	ret = check_base_s2_limits(vcpu, wi, level, input_size, stride);
+	if (WARN_ON(ret))
+		return ret;
+
+	if (check_output_size(vcpu, wi, vttbr))
+		return vcpu_inject_s2_addr_sz_fault(vcpu, ipa, level);
+
+	base_lower_bound = 3 + input_size - ((3 - level) * stride +
+			   wi->pgshift);
+	base_addr = vttbr & GENMASK_ULL(47, base_lower_bound);
+
+	addr_top = input_size - 1;
+
+	while (1) {
+		phys_addr_t index;
+
+		addr_bottom = (3 - level) * stride + wi->pgshift;
+		index = (ipa & GENMASK_ULL(addr_top, addr_bottom))
+			>> (addr_bottom - 3);
+
+		paddr = base_addr | index;
+		ret = kvm_read_guest(vcpu->kvm, paddr, &desc, sizeof(desc));
+		if (ret < 0)
+			return ret;
+
+		/* Check for valid descriptor at this point */
+		if (!(desc & 1) || ((desc & 3) == 1 && level == 3))
+			return vcpu_inject_s2_trans_fault(vcpu, ipa, level);
+
+		/* We're at the final level or block translation level */
+		if ((desc & 3) == 1 || level == 3)
+			break;
+
+		if (check_output_size(vcpu, wi, desc))
+			return vcpu_inject_s2_addr_sz_fault(vcpu, ipa, level);
+
+		base_addr = desc & GENMASK_ULL(47, wi->pgshift);
+
+		level += 1;
+		addr_top = addr_bottom - 1;
+	}
+
+	if (level < first_block_level)
+		return vcpu_inject_s2_trans_fault(vcpu, ipa, level);
+
+	/* TODO: Consider checking contiguous bit setting */
+
+	if (check_output_size(vcpu, wi, desc))
+		return vcpu_inject_s2_addr_sz_fault(vcpu, ipa, level);
+
+	if (!(desc & BIT(10)))
+		return vcpu_inject_s2_access_flag_fault(vcpu, ipa, level);
+
+	/* Calculate and return the result */
+	paddr = (desc & GENMASK_ULL(47, addr_bottom)) |
+		(ipa & GENMASK_ULL(addr_bottom - 1, 0));
+	out->output = paddr;
+	out->block_size = 1UL << ((3 - level) * stride + wi->pgshift);
+	return 0;
+}
+
+int kvm_walk_nested_s2(struct kvm_vcpu *vcpu, phys_addr_t gipa,
+		       struct kvm_s2_trans *result)
+{
+	u64 vtcr = vcpu->arch.ctxt.el2_regs[VTCR_EL2];
+	struct s2_walk_info wi;
+
+	wi.t0sz = vtcr & TCR_EL2_T0SZ_MASK;
+
+	switch (vtcr & VTCR_EL2_TG0_MASK) {
+	case VTCR_EL2_TG0_4K:
+		wi.pgshift = 12;	 break;
+	case VTCR_EL2_TG0_16K:
+		wi.pgshift = 14;	 break;
+	case VTCR_EL2_TG0_64K:
+	default:
+		wi.pgshift = 16;	 break;
+	}
+	wi.pgsize = 1UL << wi.pgshift;
+	wi.ps = (vtcr & VTCR_EL2_PS_MASK) >> VTCR_EL2_PS_SHIFT;
+	wi.sl = (vtcr & VTCR_EL2_SL0_MASK) >> VTCR_EL2_SL0_SHIFT;
+
+	/* TODO: Reverse descriptor byte order if SCTLR_EL2.EE == 1 */
+
+	return walk_nested_s2_pgd(vcpu, gipa, &wi, result);
+}
 
 /* expects kvm->mmu_lock to be held */
 void kvm_nested_s2_all_vcpus_wp(struct kvm *kvm)
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [RFC 43/55] KVM: arm/arm64: Handle shadow stage 2 page faults
  2017-01-09  6:23 [RFC 00/55] Nested Virtualization on KVM/ARM Jintack Lim
                   ` (41 preceding siblings ...)
  2017-01-09  6:24 ` [RFC 42/55] KVM: arm64: Implement nested Stage-2 page table walk logic Jintack Lim
@ 2017-01-09  6:24 ` Jintack Lim
  2017-01-09  6:24 ` [RFC 44/55] KVM: arm/arm64: Move kvm_is_write_fault to header file Jintack Lim
                   ` (13 subsequent siblings)
  56 siblings, 0 replies; 111+ messages in thread
From: Jintack Lim @ 2017-01-09  6:24 UTC (permalink / raw)
  To: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, vladimir.murzin, suzuki.poulose,
	mark.rutland, james.morse, lorenzo.pieralisi, kevin.brodsky,
	wcohen, shankerd, geoff, andre.przywara, eric.auger, anna-maria,
	shihwei, linux-arm-kernel, kvmarm, kvm, linux-kernel
  Cc: jintack

From: Christoffer Dall <christoffer.dall@linaro.org>

If we are faulting on a shadow stage 2 translation, we have to take
extra care in faulting in a page, because we have to collapse the two
levels of stage 2 paging by walking the L2-to-L1 stage 2 page tables in
software.

This approach tries to integrate as much as possible with the existing
code.
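
In outline, the abort path below first resolves the L2 IPA through the
guest hypervisor's stage 2 and only then consults the memslots (a
condensed sketch of the logic in this patch, with locking and error
handling elided):

	phys_addr_t fault_ipa = kvm_vcpu_get_fault_ipa(vcpu);	/* L2 IPA */
	phys_addr_t ipa = fault_ipa;
	struct kvm_s2_trans nested_trans;

	if (kvm_is_shadow_s2_fault(vcpu)) {
		/* Walk the guest hypervisor's stage 2 tables in software */
		if (kvm_walk_nested_s2(vcpu, fault_ipa, &nested_trans))
			goto out_unlock; /* fault injected into the L1 guest */
		ipa = nested_trans.output;	/* now an L1 IPA */
	}

	/* From here on, everything operates on the L1 IPA as before */
	gfn = ipa >> PAGE_SHIFT;
	memslot = gfn_to_memslot(vcpu->kvm, gfn);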

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
---
 arch/arm/include/asm/kvm_emulate.h   |  7 ++++
 arch/arm/kvm/mmio.c                  | 12 +++---
 arch/arm/kvm/mmu.c                   | 75 ++++++++++++++++++++++++++++--------
 arch/arm64/include/asm/kvm_emulate.h |  9 +++++
 4 files changed, 82 insertions(+), 21 deletions(-)

diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
index 6285f4f..dfc53ce 100644
--- a/arch/arm/include/asm/kvm_emulate.h
+++ b/arch/arm/include/asm/kvm_emulate.h
@@ -309,4 +309,11 @@ static inline struct kvm_s2_vmid *vcpu_get_active_vmid(struct kvm_vcpu *vcpu)
 {
 	return &vcpu->kvm->arch.mmu.vmid;
 }
+
+/* The 32-bit arm architecture does not support nested virtualization */
+static inline bool kvm_is_shadow_s2_fault(struct kvm_vcpu *vcpu)
+{
+	return false;
+}
+
 #endif /* __ARM_KVM_EMULATE_H__ */
diff --git a/arch/arm/kvm/mmio.c b/arch/arm/kvm/mmio.c
index b6e715f..a1009c2 100644
--- a/arch/arm/kvm/mmio.c
+++ b/arch/arm/kvm/mmio.c
@@ -153,7 +153,7 @@ static int decode_hsr(struct kvm_vcpu *vcpu, bool *is_write, int *len)
 }
 
 int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run,
-		 phys_addr_t fault_ipa)
+		 phys_addr_t ipa)
 {
 	unsigned long data;
 	unsigned long rt;
@@ -182,22 +182,22 @@ int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run,
 		data = vcpu_data_guest_to_host(vcpu, vcpu_get_reg(vcpu, rt),
 					       len);
 
-		trace_kvm_mmio(KVM_TRACE_MMIO_WRITE, len, fault_ipa, data);
+		trace_kvm_mmio(KVM_TRACE_MMIO_WRITE, len, ipa, data);
 		kvm_mmio_write_buf(data_buf, len, data);
 
-		ret = kvm_io_bus_write(vcpu, KVM_MMIO_BUS, fault_ipa, len,
+		ret = kvm_io_bus_write(vcpu, KVM_MMIO_BUS, ipa, len,
 				       data_buf);
 	} else {
 		trace_kvm_mmio(KVM_TRACE_MMIO_READ_UNSATISFIED, len,
-			       fault_ipa, 0);
+			       ipa, 0);
 
-		ret = kvm_io_bus_read(vcpu, KVM_MMIO_BUS, fault_ipa, len,
+		ret = kvm_io_bus_read(vcpu, KVM_MMIO_BUS, ipa, len,
 				      data_buf);
 	}
 
 	/* Now prepare kvm_run for the potential return to userland. */
 	run->mmio.is_write	= is_write;
-	run->mmio.phys_addr	= fault_ipa;
+	run->mmio.phys_addr	= ipa;
 	run->mmio.len		= len;
 
 	if (!ret) {
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 1677a87..710ae60 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -1072,10 +1072,10 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
 	return ret;
 }
 
-static bool transparent_hugepage_adjust(kvm_pfn_t *pfnp, phys_addr_t *ipap)
+static bool transparent_hugepage_adjust(kvm_pfn_t *pfnp, gfn_t gfn,
+					phys_addr_t *ipap)
 {
 	kvm_pfn_t pfn = *pfnp;
-	gfn_t gfn = *ipap >> PAGE_SHIFT;
 
 	if (PageTransCompoundMap(pfn_to_page(pfn))) {
 		unsigned long mask;
@@ -1291,13 +1291,15 @@ static void coherent_cache_guest_page(struct kvm_vcpu *vcpu, kvm_pfn_t pfn,
 }
 
 static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
-			  struct kvm_memory_slot *memslot, unsigned long hva,
-			  unsigned long fault_status)
+			  struct kvm_s2_trans *nested,
+			  struct kvm_memory_slot *memslot,
+			  unsigned long hva, unsigned long fault_status)
 {
 	int ret;
 	bool write_fault, writable, hugetlb = false, force_pte = false;
 	unsigned long mmu_seq;
-	gfn_t gfn = fault_ipa >> PAGE_SHIFT;
+	phys_addr_t ipa = fault_ipa;
+	gfn_t gfn;
 	struct kvm *kvm = vcpu->kvm;
 	struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
 	struct vm_area_struct *vma;
@@ -1323,9 +1325,23 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		return -EFAULT;
 	}
 
-	if (is_vm_hugetlb_page(vma) && !logging_active) {
+	if (kvm_is_shadow_s2_fault(vcpu)) {
+		ipa = nested->output;
+
+		/*
+		 * If we're about to create a shadow stage 2 entry, then we
+		 * can only create huge mappings if the guest hypervisor also
+		 * uses a huge mapping.
+		 */
+		if (nested->block_size != PMD_SIZE)
+			force_pte = true;
+	}
+	gfn = ipa >> PAGE_SHIFT;
+
+
+	if (!force_pte && is_vm_hugetlb_page(vma) && !logging_active) {
 		hugetlb = true;
-		gfn = (fault_ipa & PMD_MASK) >> PAGE_SHIFT;
+		gfn = (ipa & PMD_MASK) >> PAGE_SHIFT;
 	} else {
 		/*
 		 * Pages belonging to memslots that don't have the same
@@ -1389,7 +1405,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		goto out_unlock;
 
 	if (!hugetlb && !force_pte)
-		hugetlb = transparent_hugepage_adjust(&pfn, &fault_ipa);
+		hugetlb = transparent_hugepage_adjust(&pfn, gfn, &fault_ipa);
 
 	fault_ipa_uncached = memslot->flags & KVM_MEMSLOT_INCOHERENT;
 
@@ -1435,6 +1451,12 @@ static void handle_access_fault(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
 	kvm_pfn_t pfn;
 	bool pfn_valid = false;
 
+	/*
+	 * TODO: Lookup nested S2 pgtable entry and if the access flag is set,
+	 * then inject an access fault to the guest and invalidate the shadow
+	 * entry.
+	 */
+
 	trace_kvm_access_fault(fault_ipa);
 
 	spin_lock(&vcpu->kvm->mmu_lock);
@@ -1478,8 +1500,10 @@ static void handle_access_fault(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
 int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
 {
 	unsigned long fault_status;
-	phys_addr_t fault_ipa;
+	phys_addr_t fault_ipa; /* The address we faulted on */
+	phys_addr_t ipa; /* Always the IPA in the L1 guest phys space */
 	struct kvm_memory_slot *memslot;
+	struct kvm_s2_trans nested_trans;
 	unsigned long hva;
 	bool is_iabt, write_fault, writable;
 	gfn_t gfn;
@@ -1491,7 +1515,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		return 1;
 	}
 
-	fault_ipa = kvm_vcpu_get_fault_ipa(vcpu);
+	ipa = fault_ipa = kvm_vcpu_get_fault_ipa(vcpu);
 
 	trace_kvm_guest_fault(*vcpu_pc(vcpu), kvm_vcpu_get_hsr(vcpu),
 			      kvm_vcpu_get_hfar(vcpu), fault_ipa);
@@ -1500,6 +1524,10 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
 	fault_status = kvm_vcpu_trap_get_fault_type(vcpu);
 	if (fault_status != FSC_FAULT && fault_status != FSC_PERM &&
 	    fault_status != FSC_ACCESS) {
+		/*
+		 * TODO: Report address size faults from an L2 IPA which
+		 * exceeds KVM_PHYS_SIZE to the L1 hypervisor.
+		 */
 		kvm_err("Unsupported FSC: EC=%#x xFSC=%#lx ESR_EL2=%#lx\n",
 			kvm_vcpu_trap_get_class(vcpu),
 			(unsigned long)kvm_vcpu_trap_get_fault(vcpu),
@@ -1509,7 +1537,23 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
 
 	idx = srcu_read_lock(&vcpu->kvm->srcu);
 
-	gfn = fault_ipa >> PAGE_SHIFT;
+	/*
+	 * We may have faulted on a shadow stage 2 page table if we are
+	 * running a nested guest.  In this case, we have to resolve the L2
+	 * IPA to the L1 IPA first, before knowing what kind of memory should
+	 * back the L1 IPA.
+	 *
+	 * If the shadow stage 2 page table walk faults, then we simply inject
+	 * this to the guest and carry on.
+	 */
+	if (kvm_is_shadow_s2_fault(vcpu)) {
+		ret = kvm_walk_nested_s2(vcpu, fault_ipa, &nested_trans);
+		if (ret)
+			goto out_unlock;
+		ipa = nested_trans.output;
+	}
+
+	gfn = ipa >> PAGE_SHIFT;
 	memslot = gfn_to_memslot(vcpu->kvm, gfn);
 	hva = gfn_to_hva_memslot_prot(memslot, gfn, &writable);
 	write_fault = kvm_is_write_fault(vcpu);
@@ -1543,13 +1587,13 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		 * faulting VA. This is always 12 bits, irrespective
 		 * of the page size.
 		 */
-		fault_ipa |= kvm_vcpu_get_hfar(vcpu) & ((1 << 12) - 1);
-		ret = io_mem_abort(vcpu, run, fault_ipa);
+		ipa |= kvm_vcpu_get_hfar(vcpu) & ((1 << 12) - 1);
+		ret = io_mem_abort(vcpu, run, ipa);
 		goto out_unlock;
 	}
 
 	/* Userspace should not be able to register out-of-bounds IPAs */
-	VM_BUG_ON(fault_ipa >= KVM_PHYS_SIZE);
+	VM_BUG_ON(ipa >= KVM_PHYS_SIZE);
 
 	if (fault_status == FSC_ACCESS) {
 		handle_access_fault(vcpu, fault_ipa);
@@ -1557,7 +1601,8 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		goto out_unlock;
 	}
 
-	ret = user_mem_abort(vcpu, fault_ipa, memslot, hva, fault_status);
+	ret = user_mem_abort(vcpu, fault_ipa, &nested_trans,
+			     memslot, hva, fault_status);
 	if (ret == 0)
 		ret = 1;
 out_unlock:
diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index abad676..2994410 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -368,4 +368,13 @@ static inline unsigned long vcpu_data_host_to_guest(struct kvm_vcpu *vcpu,
 	return data;		/* Leave LE untouched */
 }
 
+static inline bool kvm_is_shadow_s2_fault(struct kvm_vcpu *vcpu)
+{
+#ifdef CONFIG_KVM_ARM_NESTED_HYP
+	return (!vcpu_mode_el2(vcpu)) && vcpu_nested_stage2_enabled(vcpu);
+#else
+	return false;
+#endif
+}
+
 #endif /* __ARM64_KVM_EMULATE_H__ */
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [RFC 44/55] KVM: arm/arm64: Move kvm_is_write_fault to header file
  2017-01-09  6:23 [RFC 00/55] Nested Virtualization on KVM/ARM Jintack Lim
                   ` (42 preceding siblings ...)
  2017-01-09  6:24 ` [RFC 43/55] KVM: arm/arm64: Handle shadow stage 2 page faults Jintack Lim
@ 2017-01-09  6:24 ` Jintack Lim
  2017-01-09  6:24 ` [RFC 45/55] KVM: arm64: KVM: Inject stage-2 page faults Jintack Lim
                   ` (12 subsequent siblings)
  56 siblings, 0 replies; 111+ messages in thread
From: Jintack Lim @ 2017-01-09  6:24 UTC (permalink / raw)
  To: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, vladimir.murzin, suzuki.poulose,
	mark.rutland, james.morse, lorenzo.pieralisi, kevin.brodsky,
	wcohen, shankerd, geoff, andre.przywara, eric.auger, anna-maria,
	shihwei, linux-arm-kernel, kvmarm, kvm, linux-kernel
  Cc: jintack

From: Christoffer Dall <christoffer.dall@linaro.org>

Move this little function to the header files for arm/arm64 so other
code can make use of it directly.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
---
 arch/arm/include/asm/kvm_emulate.h   | 8 ++++++++
 arch/arm/kvm/mmu.c                   | 8 --------
 arch/arm64/include/asm/kvm_emulate.h | 8 ++++++++
 3 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
index dfc53ce..dde5335 100644
--- a/arch/arm/include/asm/kvm_emulate.h
+++ b/arch/arm/include/asm/kvm_emulate.h
@@ -235,6 +235,14 @@ static inline u8 kvm_vcpu_trap_get_fault_type(struct kvm_vcpu *vcpu)
 	return kvm_vcpu_get_hsr(vcpu) & HSR_FSC_TYPE;
 }
 
+static inline bool kvm_is_write_fault(struct kvm_vcpu *vcpu)
+{
+	if (kvm_vcpu_trap_is_iabt(vcpu))
+		return false;
+
+	return kvm_vcpu_dabt_iswrite(vcpu);
+}
+
 static inline u32 kvm_vcpu_hvc_get_imm(struct kvm_vcpu *vcpu)
 {
 	return kvm_vcpu_get_hsr(vcpu) & HSR_HVC_IMM_MASK;
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 710ae60..abdf345 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -1113,14 +1113,6 @@ static bool transparent_hugepage_adjust(kvm_pfn_t *pfnp, gfn_t gfn,
 	return false;
 }
 
-static bool kvm_is_write_fault(struct kvm_vcpu *vcpu)
-{
-	if (kvm_vcpu_trap_is_iabt(vcpu))
-		return false;
-
-	return kvm_vcpu_dabt_iswrite(vcpu);
-}
-
 /**
  * stage2_wp_ptes - write protect PMD range
  * @pmd:	pointer to pmd entry
diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 2994410..17f4855 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -285,6 +285,14 @@ static inline u8 kvm_vcpu_trap_get_fault_type(const struct kvm_vcpu *vcpu)
 	return kvm_vcpu_get_hsr(vcpu) & ESR_ELx_FSC_TYPE;
 }
 
+static inline bool kvm_is_write_fault(struct kvm_vcpu *vcpu)
+{
+	if (kvm_vcpu_trap_is_iabt(vcpu))
+		return false;
+
+	return kvm_vcpu_dabt_iswrite(vcpu);
+}
+
 static inline unsigned long kvm_vcpu_get_mpidr_aff(struct kvm_vcpu *vcpu)
 {
 	return vcpu_sys_reg(vcpu, MPIDR_EL1) & MPIDR_HWID_BITMASK;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [RFC 45/55] KVM: arm64: KVM: Inject stage-2 page faults
  2017-01-09  6:23 [RFC 00/55] Nested Virtualization on KVM/ARM Jintack Lim
                   ` (43 preceding siblings ...)
  2017-01-09  6:24 ` [RFC 44/55] KVM: arm/arm64: Move kvm_is_write_fault to header file Jintack Lim
@ 2017-01-09  6:24 ` Jintack Lim
  2017-01-09  6:24 ` [RFC 46/55] KVM: arm64: Add more info to the S2 translation result Jintack Lim
                   ` (11 subsequent siblings)
  56 siblings, 0 replies; 111+ messages in thread
From: Jintack Lim @ 2017-01-09  6:24 UTC (permalink / raw)
  To: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, vladimir.murzin, suzuki.poulose,
	mark.rutland, james.morse, lorenzo.pieralisi, kevin.brodsky,
	wcohen, shankerd, geoff, andre.przywara, eric.auger, anna-maria,
	shihwei, linux-arm-kernel, kvmarm, kvm, linux-kernel
  Cc: jintack

From: Christoffer Dall <christoffer.dall@linaro.org>

Inject stage-2 page faults to the guest hypervisor.
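
For reference, the injection helpers below only rewrite the FSC field
of the ESR; for instance, a stage-2 translation fault at level 2 works
out as follows (FSC encodings per the ARM ARM):

	esr = kvm_vcpu_get_hsr(vcpu) & ~ESR_ELx_FSC;	/* keep ISS, drop FSC */
	esr |= ESR_ELx_FSC_FAULT;	/* 0b000100: translation fault ...   */
	esr |= 2;			/* ... at level 2 -> FSC = 0b000110  */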

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
---
 arch/arm64/include/asm/esr.h |  1 +
 arch/arm64/kvm/mmu-nested.c  | 30 ++++++++++++++++++++++++------
 2 files changed, 25 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
index f32e3a7..6104e31 100644
--- a/arch/arm64/include/asm/esr.h
+++ b/arch/arm64/include/asm/esr.h
@@ -107,6 +107,7 @@
 #define ESR_ELx_CM 		(UL(1) << 8)
 
 /* ISS field definitions for exceptions taken in to Hyp */
+#define ESR_ELx_FSC_ADDRSZ	(0x00)
 #define ESR_ELx_CV		(UL(1) << 24)
 #define ESR_ELx_COND_SHIFT	(20)
 #define ESR_ELx_COND_MASK	(UL(0xF) << ESR_ELx_COND_SHIFT)
diff --git a/arch/arm64/kvm/mmu-nested.c b/arch/arm64/kvm/mmu-nested.c
index a2fab41..b161b55 100644
--- a/arch/arm64/kvm/mmu-nested.c
+++ b/arch/arm64/kvm/mmu-nested.c
@@ -55,22 +55,40 @@ static unsigned int pa_max(void)
 static int vcpu_inject_s2_trans_fault(struct kvm_vcpu *vcpu, gpa_t ipa,
 				      int level)
 {
-	/* TODO: Implement */
-	return -EFAULT;
+	u32 esr;
+
+	vcpu->arch.ctxt.el2_regs[FAR_EL2] = vcpu->arch.fault.far_el2;
+	vcpu->arch.ctxt.el2_regs[HPFAR_EL2] = vcpu->arch.fault.hpfar_el2;
+	esr = kvm_vcpu_get_hsr(vcpu) & ~ESR_ELx_FSC;
+	esr |= ESR_ELx_FSC_FAULT;
+	esr |= level & 0x3;
+	return kvm_inject_nested_sync(vcpu, esr);
 }
 
 static int vcpu_inject_s2_addr_sz_fault(struct kvm_vcpu *vcpu, gpa_t ipa,
 					int level)
 {
-	/* TODO: Implement */
-	return -EFAULT;
+	u32 esr;
+
+	vcpu->arch.ctxt.el2_regs[FAR_EL2] = vcpu->arch.fault.far_el2;
+	vcpu->arch.ctxt.el2_regs[HPFAR_EL2] = vcpu->arch.fault.hpfar_el2;
+	esr = kvm_vcpu_get_hsr(vcpu) & ~ESR_ELx_FSC;
+	esr |= ESR_ELx_FSC_ADDRSZ;
+	esr |= level & 0x3;
+	return kvm_inject_nested_sync(vcpu, esr);
 }
 
 static int vcpu_inject_s2_access_flag_fault(struct kvm_vcpu *vcpu, gpa_t ipa,
 					    int level)
 {
-	/* TODO: Implement */
-	return -EFAULT;
+	u32 esr;
+
+	vcpu->arch.ctxt.el2_regs[FAR_EL2] = vcpu->arch.fault.far_el2;
+	vcpu->arch.ctxt.el2_regs[HPFAR_EL2] = vcpu->arch.fault.hpfar_el2;
+	esr = kvm_vcpu_get_hsr(vcpu) & ~ESR_ELx_FSC;
+	esr |= ESR_ELx_FSC_ACCESS;
+	esr |= level & 0x3;
+	return kvm_inject_nested_sync(vcpu, esr);
 }
 
 static int check_base_s2_limits(struct kvm_vcpu *vcpu, struct s2_walk_info *wi,
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [RFC 46/55] KVM: arm64: Add more info to the S2 translation result
  2017-01-09  6:23 [RFC 00/55] Nested Virtualization on KVM/ARM Jintack Lim
                   ` (44 preceding siblings ...)
  2017-01-09  6:24 ` [RFC 45/55] KVM: arm64: KVM: Inject stage-2 page faults Jintack Lim
@ 2017-01-09  6:24 ` Jintack Lim
  2017-01-09  6:24 ` [RFC 47/55] KVM: arm/arm64: Forward the guest hypervisor's stage 2 permission faults Jintack Lim
                   ` (10 subsequent siblings)
  56 siblings, 0 replies; 111+ messages in thread
From: Jintack Lim @ 2017-01-09  6:24 UTC (permalink / raw)
  To: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, vladimir.murzin, suzuki.poulose,
	mark.rutland, james.morse, lorenzo.pieralisi, kevin.brodsky,
	wcohen, shankerd, geoff, andre.przywara, eric.auger, anna-maria,
	shihwei, linux-arm-kernel, kvmarm, kvm, linux-kernel
  Cc: jintack

From: Christoffer Dall <christoffer.dall@linaro.org>

When translating an L2 IPA to an L1 IPA, we sometimes need to know at
which level this translation occurred and what the resulting permissions
were, so populate the translation result structure with these additional
fields.
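
For reference, the new flags are read straight out of the stage-2
descriptor's S2AP field, bits [7:6] of the descriptor in the Armv8
translation table format:

	/* S2AP[0] (bit 6) set => readable, S2AP[1] (bit 7) set => writable */
	out->readable = desc & (0b01 << 6);
	out->writable = desc & (0b10 << 6);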

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
---
 arch/arm64/include/asm/kvm_mmu.h | 3 +++
 arch/arm64/kvm/mmu-nested.c      | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index bf94f0c..2ac603d 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -327,6 +327,9 @@ static inline unsigned int kvm_get_vmid_bits(void)
 struct kvm_s2_trans {
 	phys_addr_t output;
 	phys_addr_t block_size;
+	bool writable;
+	bool readable;
+	int level;
 };
 
 #ifdef CONFIG_KVM_ARM_NESTED_HYP
diff --git a/arch/arm64/kvm/mmu-nested.c b/arch/arm64/kvm/mmu-nested.c
index b161b55..b579d23 100644
--- a/arch/arm64/kvm/mmu-nested.c
+++ b/arch/arm64/kvm/mmu-nested.c
@@ -236,6 +236,9 @@ static int walk_nested_s2_pgd(struct kvm_vcpu *vcpu, phys_addr_t ipa,
 		(ipa & GENMASK_ULL(addr_bottom - 1, 0));
 	out->output = paddr;
 	out->block_size = 1UL << ((3 - level) * stride + wi->pgshift);
+	out->readable = desc & (0b01 << 6);
+	out->writable = desc & (0b10 << 6);
+	out->level = level;
 	return 0;
 }
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [RFC 47/55] KVM: arm/arm64: Forward the guest hypervisor's stage 2 permission faults
  2017-01-09  6:23 [RFC 00/55] Nested Virtualization on KVM/ARM Jintack Lim
                   ` (45 preceding siblings ...)
  2017-01-09  6:24 ` [RFC 46/55] KVM: arm64: Add more info to the S2 translation result Jintack Lim
@ 2017-01-09  6:24 ` Jintack Lim
  2017-02-22 18:15   ` Christoffer Dall
  2017-01-09  6:24 ` [RFC 48/55] KVM: arm64: Emulate TLBI instruction Jintack Lim
                   ` (9 subsequent siblings)
  56 siblings, 1 reply; 111+ messages in thread
From: Jintack Lim @ 2017-01-09  6:24 UTC (permalink / raw)
  To: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, vladimir.murzin, suzuki.poulose,
	mark.rutland, james.morse, lorenzo.pieralisi, kevin.brodsky,
	wcohen, shankerd, geoff, andre.przywara, eric.auger, anna-maria,
	shihwei, linux-arm-kernel, kvmarm, kvm, linux-kernel
  Cc: jintack

From: Christoffer Dall <christoffer.dall@linaro.org>

When faulting on a shadow stage 2 page table, we have to check whether
the fault was a permission fault and, if so, whether that fault needs to
be handled by the guest hypervisor first, because the guest hypervisor
may have created a less permissive S2 entry than the operation required.
For example, if the guest hypervisor mapped a page read-only at its
stage 2 and the nested VM writes to it, the resulting permission fault
belongs to the guest hypervisor even if the host could map the page
writable.

Check if this is the case, and inject a fault into the guest hypervisor
if it is.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
---
 arch/arm/include/asm/kvm_mmu.h   |  7 +++++++
 arch/arm/kvm/mmu.c               |  5 +++++
 arch/arm64/include/asm/kvm_mmu.h |  9 +++++++++
 arch/arm64/kvm/mmu-nested.c      | 33 +++++++++++++++++++++++++++++++++
 4 files changed, 54 insertions(+)

diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index ab41a10..0d106ae 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -241,6 +241,13 @@ static inline int kvm_walk_nested_s2(struct kvm_vcpu *vcpu, phys_addr_t gipa,
 	return 0;
 }
 
+static inline int kvm_s2_handle_perm_fault(struct kvm_vcpu *vcpu,
+					   phys_addr_t fault_ipa,
+					   struct kvm_s2_trans *trans)
+{
+	return 0;
+}
+
 static inline void kvm_nested_s2_unmap(struct kvm_vcpu *vcpu) { }
 static inline int kvm_nested_s2_init(struct kvm_vcpu *vcpu) { return 0; }
 static inline void kvm_nested_s2_teardown(struct kvm_vcpu *vcpu) { }
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index abdf345..68fc8e8 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -1542,6 +1542,11 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		ret = kvm_walk_nested_s2(vcpu, fault_ipa, &nested_trans);
 		if (ret)
 			goto out_unlock;
+
+		ret = kvm_s2_handle_perm_fault(vcpu, fault_ipa, &nested_trans);
+		if (ret)
+			goto out_unlock;
+
 		ipa = nested_trans.output;
 	}
 
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 2ac603d..2086296 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -338,6 +338,8 @@ struct kvm_s2_trans {
 bool handle_vttbr_update(struct kvm_vcpu *vcpu, u64 vttbr);
 int kvm_walk_nested_s2(struct kvm_vcpu *vcpu, phys_addr_t gipa,
 		       struct kvm_s2_trans *result);
+int kvm_s2_handle_perm_fault(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
+			     struct kvm_s2_trans *trans);
 void kvm_nested_s2_unmap(struct kvm_vcpu *vcpu);
 int kvm_nested_s2_init(struct kvm_vcpu *vcpu);
 void kvm_nested_s2_teardown(struct kvm_vcpu *vcpu);
@@ -366,6 +368,13 @@ static inline int kvm_walk_nested_s2(struct kvm_vcpu *vcpu, phys_addr_t gipa,
 	return 0;
 }
 
+static inline int kvm_s2_handle_perm_fault(struct kvm_vcpu *vcpu,
+					   phys_addr_t fault_ipa,
+					   struct kvm_s2_trans *trans)
+{
+	return 0;
+}
+
 static inline void kvm_nested_s2_unmap(struct kvm_vcpu *vcpu) { }
 static inline int kvm_nested_s2_init(struct kvm_vcpu *vcpu) { return 0; }
 static inline void kvm_nested_s2_teardown(struct kvm_vcpu *vcpu) { }
diff --git a/arch/arm64/kvm/mmu-nested.c b/arch/arm64/kvm/mmu-nested.c
index b579d23..65ad0da 100644
--- a/arch/arm64/kvm/mmu-nested.c
+++ b/arch/arm64/kvm/mmu-nested.c
@@ -52,6 +52,19 @@ static unsigned int pa_max(void)
 	return ps_to_output_size(parange);
 }
 
+static int vcpu_inject_s2_perm_fault(struct kvm_vcpu *vcpu, gpa_t ipa,
+				     int level)
+{
+	u32 esr;
+
+	vcpu->arch.ctxt.el2_regs[FAR_EL2] = vcpu->arch.fault.far_el2;
+	vcpu->arch.ctxt.el2_regs[HPFAR_EL2] = vcpu->arch.fault.hpfar_el2;
+	esr = kvm_vcpu_get_hsr(vcpu) & ~ESR_ELx_FSC;
+	esr |= ESR_ELx_FSC_PERM;
+	esr |= level & 0x3;
+	return kvm_inject_nested_sync(vcpu, esr);
+}
+
 static int vcpu_inject_s2_trans_fault(struct kvm_vcpu *vcpu, gpa_t ipa,
 				      int level)
 {
@@ -268,6 +281,26 @@ int kvm_walk_nested_s2(struct kvm_vcpu *vcpu, phys_addr_t gipa,
 	return walk_nested_s2_pgd(vcpu, gipa, &wi, result);
 }
 
+/*
+ * Returns non-zero if permission fault is handled by injecting it to the next
+ * level hypervisor.
+ */
+int kvm_s2_handle_perm_fault(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
+			     struct kvm_s2_trans *trans)
+{
+	unsigned long fault_status = kvm_vcpu_trap_get_fault_type(vcpu);
+	bool write_fault = kvm_is_write_fault(vcpu);
+
+	if (fault_status != FSC_PERM)
+		return 0;
+
+	if ((write_fault && !trans->writable) ||
+	    (!write_fault && !trans->readable))
+		return vcpu_inject_s2_perm_fault(vcpu, fault_ipa, trans->level);
+
+	return 0;
+}
+
 /* expects kvm->mmu_lock to be held */
 void kvm_nested_s2_all_vcpus_wp(struct kvm *kvm)
 {
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [RFC 48/55] KVM: arm64: Emulate TLBI instruction
  2017-01-09  6:23 [RFC 00/55] Nested Virtualization on KVM/ARM Jintack Lim
                   ` (46 preceding siblings ...)
  2017-01-09  6:24 ` [RFC 47/55] KVM: arm/arm64: Forward the guest hypervisor's stage 2 permission faults Jintack Lim
@ 2017-01-09  6:24 ` Jintack Lim
  2017-01-09  6:24 ` [RFC 49/55] KVM: arm64: Fixes to toggle_cache for nesting Jintack Lim
                   ` (8 subsequent siblings)
  56 siblings, 0 replies; 111+ messages in thread
From: Jintack Lim @ 2017-01-09  6:24 UTC (permalink / raw)
  To: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, vladimir.murzin, suzuki.poulose,
	mark.rutland, james.morse, lorenzo.pieralisi, kevin.brodsky,
	wcohen, shankerd, geoff, andre.przywara, eric.auger, anna-maria,
	shihwei, linux-arm-kernel, kvmarm, kvm, linux-kernel
  Cc: jintack

From: Christoffer Dall <christoffer.dall@linaro.org>

Currently, we flush ALL shadow stage-2 page tables whenever the guest
hypervisor executes a TLBI instruction. We may be able to do this more
efficiently by considering the guest hypervisor's vttbr_el2 value, but
leave that for now.
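
For example, a vttbr_el2-aware variant might unmap only the shadow
table whose virtual VTTBR matches the one currently in use (a
hypothetical sketch, not part of this patch):

static void kvm_nested_s2_unmap_current(struct kvm_vcpu *vcpu)
{
	u64 vttbr = vcpu->arch.ctxt.el2_regs[VTTBR_EL2];
	struct kvm_nested_s2_mmu *nested_mmu;

	list_for_each_entry_rcu(nested_mmu,
				&vcpu->kvm->arch.nested_mmu_list, list) {
		if (nested_mmu->virtual_vttbr == vttbr)
			kvm_unmap_stage2_range(&nested_mmu->mmu,
					       0, KVM_PHYS_SIZE);
	}
}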

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
---
 arch/arm64/kvm/sys_regs.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index ddb641c..b0a057d 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -2013,8 +2013,14 @@ static int emulate_sys_reg(struct kvm_vcpu *vcpu,
 static int emulate_tlbi(struct kvm_vcpu *vcpu,
 			     struct sys_reg_params *params)
 {
-	/* TODO: support tlbi instruction emulation*/
-	kvm_inject_undefined(vcpu);
+	/*
+	 * We unmap ALL stage-2 page tables on any tlbi instruction.
+	 * This could be made more efficient by decoding the exact
+	 * tlbi instruction.
+	 */
+	stage2_unmap_vm(vcpu->kvm);
+	kvm_nested_s2_unmap(vcpu);
+
 	return 1;
 }
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [RFC 49/55] KVM: arm64: Fixes to toggle_cache for nesting
  2017-01-09  6:23 [RFC 00/55] Nested Virtualization on KVM/ARM Jintack Lim
                   ` (47 preceding siblings ...)
  2017-01-09  6:24 ` [RFC 48/55] KVM: arm64: Emulate TLBI instruction Jintack Lim
@ 2017-01-09  6:24 ` Jintack Lim
  2017-01-09  6:24 ` [RFC 50/55] KVM: arm/arm64: Abstract kvm_phys_addr_ioremap() function Jintack Lim
                   ` (7 subsequent siblings)
  56 siblings, 0 replies; 111+ messages in thread
From: Jintack Lim @ 2017-01-09  6:24 UTC (permalink / raw)
  To: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, vladimir.murzin, suzuki.poulose,
	mark.rutland, james.morse, lorenzo.pieralisi, kevin.brodsky,
	wcohen, shankerd, geoff, andre.przywara, eric.auger, anna-maria,
	shihwei, linux-arm-kernel, kvmarm, kvm, linux-kernel
  Cc: jintack

From: Christoffer Dall <christoffer.dall@linaro.org>

So far we were flushing almost the entire universe whenever a VM would
load/unload the SCTLR_EL1 and the two versions of that register had
different MMU enabled settings.  This turned out to be so slow that it
prevented forward progress for a nested VM, because a scheduler timer
tick interrupt would always be pending when we reached the nested VM.

To avoid this problem, we consider the SCTLR_EL2 when evaluating if
caches are on or off when entering virtual EL2 (because this is the
value that we end up shadowing onto the hardware EL1 register).

We also reduce the scope of the flush operation to only flush shadow
stage 2 page table state of the particular VCPU toggling the caches
instead of the shadow stage 2 state of all possible VCPUs.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
---
 arch/arm/kvm/mmu.c               | 31 ++++++++++++++++++++++++++++++-
 arch/arm64/include/asm/kvm_mmu.h |  7 ++++++-
 2 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 68fc8e8..344bc01 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -422,6 +422,35 @@ static void stage2_flush_vm(struct kvm *kvm)
 	srcu_read_unlock(&kvm->srcu, idx);
 }
 
+/**
+ * Same as above, but only flushes shadow state for a specific vcpu
+ */
+static void stage2_flush_vcpu(struct kvm_vcpu *vcpu)
+{
+	struct kvm *kvm = vcpu->kvm;
+	struct kvm_memslots *slots;
+	struct kvm_memory_slot *memslot;
+	int idx;
+	struct kvm_nested_s2_mmu __maybe_unused *nested_mmu;
+
+	idx = srcu_read_lock(&kvm->srcu);
+	spin_lock(&kvm->mmu_lock);
+
+	slots = kvm_memslots(kvm);
+	kvm_for_each_memslot(memslot, slots)
+		stage2_flush_memslot(&kvm->arch.mmu, memslot);
+
+#ifdef CONFIG_KVM_ARM_NESTED_HYP
+	list_for_each_entry_rcu(nested_mmu, &vcpu->kvm->arch.nested_mmu_list,
+				list) {
+		kvm_stage2_flush_range(&nested_mmu->mmu, 0, KVM_PHYS_SIZE);
+	}
+#endif
+
+	spin_unlock(&kvm->mmu_lock);
+	srcu_read_unlock(&kvm->srcu, idx);
+}
+
 static void clear_hyp_pgd_entry(pgd_t *pgd)
 {
 	pud_t *pud_table __maybe_unused = pud_offset(pgd, 0UL);
@@ -2074,7 +2103,7 @@ void kvm_toggle_cache(struct kvm_vcpu *vcpu, bool was_enabled)
 	 * Clean + invalidate does the trick always.
 	 */
 	if (now_enabled != was_enabled)
-		stage2_flush_vm(vcpu->kvm);
+		stage2_flush_vcpu(vcpu);
 
 	/* Caches are now on, stop trapping VM ops (until a S/W op) */
 	if (now_enabled)
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 2086296..7754f3e 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -241,7 +241,12 @@ static inline bool kvm_page_empty(void *ptr)
 
 static inline bool vcpu_has_cache_enabled(struct kvm_vcpu *vcpu)
 {
-	return (vcpu_sys_reg(vcpu, SCTLR_EL1) & 0b101) == 0b101;
+	u32 mode = vcpu->arch.ctxt.gp_regs.regs.pstate & PSR_MODE_MASK;
+
+	if (mode != PSR_MODE_EL2h && mode != PSR_MODE_EL2t)
+		return (vcpu_sys_reg(vcpu, SCTLR_EL1) & 0b101) == 0b101;
+	else
+		return (vcpu_el2_reg(vcpu, SCTLR_EL2) & 0b101) == 0b101;
 }
 
 static inline void __coherent_cache_guest_page(struct kvm_vcpu *vcpu,
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [RFC 50/55] KVM: arm/arm64: Abstract kvm_phys_addr_ioremap() function
  2017-01-09  6:23 [RFC 00/55] Nested Virtualization on KVM/ARM Jintack Lim
                   ` (48 preceding siblings ...)
  2017-01-09  6:24 ` [RFC 49/55] KVM: arm64: Fixes to toggle_cache for nesting Jintack Lim
@ 2017-01-09  6:24 ` Jintack Lim
  2017-01-09  6:24 ` [RFC 51/55] KVM: arm64: Expose physical address of vcpu interface Jintack Lim
                   ` (6 subsequent siblings)
  56 siblings, 0 replies; 111+ messages in thread
From: Jintack Lim @ 2017-01-09  6:24 UTC (permalink / raw)
  To: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, vladimir.murzin, suzuki.poulose,
	mark.rutland, james.morse, lorenzo.pieralisi, kevin.brodsky,
	wcohen, shankerd, geoff, andre.przywara, eric.auger, anna-maria,
	shihwei, linux-arm-kernel, kvmarm, kvm, linux-kernel
  Cc: jintack

The original kvm_phys_addr_ioremap function always operates on the VM's
own mmu context. However, it would be very useful to reuse this function
for a nested mmu context. Therefore, create a function named
__kvm_phys_addr_ioremap which takes the mmu as an argument, and have
kvm_phys_addr_ioremap call it with the VM's mmu context.
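
As an illustrative caller (hypothetical variable names; vgic_vcpu_base()
is only introduced in the next patch), mapping the hardware GIC VCPU
interface into a nested VM's stage 2 might then look like:

	/* 'vcpu_base_ipa' stands for the L2 IPA chosen by the guest
	 * hypervisor and is assumed here for illustration. */
	ret = __kvm_phys_addr_ioremap(vcpu->kvm, &nested_mmu->mmu,
				      vcpu_base_ipa, vgic_vcpu_base(),
				      KVM_VGIC_V2_CPU_SIZE, true);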

Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
---
 arch/arm/kvm/mmu.c               | 18 +++++++++++++-----
 arch/arm64/include/asm/kvm_mmu.h |  3 +++
 2 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 344bc01..2cd6a19 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -1058,15 +1058,16 @@ static int stage2_pmdp_test_and_clear_young(pmd_t *pmd)
 }
 
 /**
- * kvm_phys_addr_ioremap - map a device range to guest IPA
+ * __kvm_phys_addr_ioremap - map a device range to guest IPA
  *
  * @kvm:	The KVM pointer
  * @guest_ipa:	The IPA at which to insert the mapping
  * @pa:		The physical address of the device
  * @size:	The size of the mapping
  */
-int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
-			  phys_addr_t pa, unsigned long size, bool writable)
+int __kvm_phys_addr_ioremap(struct kvm *kvm, struct kvm_s2_mmu *mmu,
+			    phys_addr_t guest_ipa, phys_addr_t pa,
+			    unsigned long size, bool writable)
 {
 	phys_addr_t addr, end;
 	int ret = 0;
@@ -1087,8 +1088,8 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
 		if (ret)
 			goto out;
 		spin_lock(&kvm->mmu_lock);
-		ret = stage2_set_pte(&kvm->arch.mmu, &cache, addr, &pte,
-						KVM_S2PTE_FLAG_IS_IOMAP);
+		ret = stage2_set_pte(mmu, &cache, addr, &pte,
+				     KVM_S2PTE_FLAG_IS_IOMAP);
 		spin_unlock(&kvm->mmu_lock);
 		if (ret)
 			goto out;
@@ -1101,6 +1102,13 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
 	return ret;
 }
 
+int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
+			  phys_addr_t pa, unsigned long size, bool writable)
+{
+	return __kvm_phys_addr_ioremap(kvm, &kvm->arch.mmu, guest_ipa, pa,
+				       size, writable);
+}
+
 static bool transparent_hugepage_adjust(kvm_pfn_t *pfnp, gfn_t gfn,
 					phys_addr_t *ipap)
 {
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 7754f3e..ec9e5e9 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -149,6 +149,9 @@ static inline unsigned long __kern_hyp_va(unsigned long v)
 int __kvm_alloc_stage2_pgd(struct kvm_s2_mmu *mmu);
 void kvm_free_stage2_pgd(struct kvm *kvm);
 void __kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu);
+int __kvm_phys_addr_ioremap(struct kvm *kvm, struct kvm_s2_mmu *mmu,
+			    phys_addr_t guest_ipa, phys_addr_t pa,
+			    unsigned long size, bool writable);
 int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
 			  phys_addr_t pa, unsigned long size, bool writable);
 void kvm_unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start,
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [RFC 51/55] KVM: arm64: Expose physical address of vcpu interface
  2017-01-09  6:23 [RFC 00/55] Nested Virtualization on KVM/ARM Jintack Lim
                   ` (49 preceding siblings ...)
  2017-01-09  6:24 ` [RFC 50/55] KVM: arm/arm64: Abstract kvm_phys_addr_ioremap() function Jintack Lim
@ 2017-01-09  6:24 ` Jintack Lim
  2017-01-09  6:24 ` [RFC 52/55] KVM: arm/arm64: Create a vcpu mapping for the nested VM Jintack Lim
                   ` (5 subsequent siblings)
  56 siblings, 0 replies; 111+ messages in thread
From: Jintack Lim @ 2017-01-09  6:24 UTC (permalink / raw)
  To: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, vladimir.murzin, suzuki.poulose,
	mark.rutland, james.morse, lorenzo.pieralisi, kevin.brodsky,
	wcohen, shankerd, geoff, andre.przywara, eric.auger, anna-maria,
	shihwei, linux-arm-kernel, kvmarm, kvm, linux-kernel
  Cc: jintack

Expose the physical address of the vgic virtual cpu interface. This will
be used to map the virtual cpu interface into a nested VM.

Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
---
 include/kvm/arm_vgic.h      | 1 +
 virt/kvm/arm/vgic/vgic-v2.c | 6 ++++++
 2 files changed, 7 insertions(+)

diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 5bda20c..05c7811 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -331,6 +331,7 @@ static inline void vgic_handle_nested_maint_irq(struct kvm_vcpu *vcpu) { }
 #define vgic_valid_spi(k, i)	(((i) >= VGIC_NR_PRIVATE_IRQS) && \
 			((i) < (k)->arch.vgic.nr_spis + VGIC_NR_PRIVATE_IRQS))
 
+phys_addr_t vgic_vcpu_base(void);
 bool kvm_vcpu_has_pending_irqs(struct kvm_vcpu *vcpu);
 void kvm_vgic_sync_hwstate(struct kvm_vcpu *vcpu);
 void kvm_vgic_flush_hwstate(struct kvm_vcpu *vcpu);
diff --git a/virt/kvm/arm/vgic/vgic-v2.c b/virt/kvm/arm/vgic/vgic-v2.c
index b8b73fd..5d85041 100644
--- a/virt/kvm/arm/vgic/vgic-v2.c
+++ b/virt/kvm/arm/vgic/vgic-v2.c
@@ -386,3 +386,9 @@ int vgic_v2_probe(const struct gic_kvm_info *info)
 
 	return ret;
 }
+
+/* Return physical address of vgic virtual cpu interface */
+phys_addr_t vgic_vcpu_base(void)
+{
+	return kvm_vgic_global_state.vcpu_base;
+}
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [RFC 52/55] KVM: arm/arm64: Create a vcpu mapping for the nested VM
  2017-01-09  6:23 [RFC 00/55] Nested Virtualization on KVM/ARM Jintack Lim
                   ` (50 preceding siblings ...)
  2017-01-09  6:24 ` [RFC 51/55] KVM: arm64: Expose physical address of vcpu interface Jintack Lim
@ 2017-01-09  6:24 ` Jintack Lim
  2017-01-09  6:24 ` [RFC 53/55] KVM: arm64: Reflect shadow VMPIDR_EL2 value to MPIDR_EL1 Jintack Lim
                   ` (4 subsequent siblings)
  56 siblings, 0 replies; 111+ messages in thread
From: Jintack Lim @ 2017-01-09  6:24 UTC (permalink / raw)
  To: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, vladimir.murzin, suzuki.poulose,
	mark.rutland, james.morse, lorenzo.pieralisi, kevin.brodsky,
	wcohen, shankerd, geoff, andre.przywara, eric.auger, anna-maria,
	shihwei, linux-arm-kernel, kvmarm, kvm, linux-kernel
  Cc: jintack

Create a mapping from the nested VM's cpu interface to the hardware
virtual cpu interface. This allows the nested VM to access the virtual
cpu interface directly.

Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
---
 arch/arm/include/asm/kvm_mmu.h   |  3 +++
 arch/arm/kvm/mmu.c               |  5 +++++
 arch/arm64/include/asm/kvm_mmu.h |  5 +++++
 arch/arm64/kvm/mmu-nested.c      | 26 ++++++++++++++++++++++++++
 4 files changed, 39 insertions(+)

diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index 0d106ae..048a021 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -254,6 +254,9 @@ static inline void kvm_nested_s2_teardown(struct kvm_vcpu *vcpu) { }
 static inline void kvm_nested_s2_all_vcpus_wp(struct kvm *kvm) { }
 static inline void kvm_nested_s2_all_vcpus_unmap(struct kvm *kvm) { }
 static inline void kvm_nested_s2_all_vcpus_flush(struct kvm *kvm) { }
+static inline int kvm_nested_mmio_ondemand(struct kvm_vcpu *vcpu,
+					   phys_addr_t fault_ipa,
+					   phys_addr_t ipa) { return 0; }
 
 static inline u64 kvm_get_vttbr(struct kvm_s2_vmid *vmid,
 				struct kvm_s2_mmu *mmu)
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 2cd6a19..f7c2911 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -1615,6 +1615,11 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
 			goto out_unlock;
 		}
 
+		if (kvm_nested_mmio_ondemand(vcpu, fault_ipa, ipa)) {
+			ret = 1;
+			goto out_unlock;
+		}
+
 		/*
 		 * The IPA is reported as [MAX:12], so we need to
 		 * complement it with the bottom 12 bits from the
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index ec9e5e9..ee80a58 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -354,6 +354,8 @@ int kvm_s2_handle_perm_fault(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 void kvm_nested_s2_all_vcpus_wp(struct kvm *kvm);
 void kvm_nested_s2_all_vcpus_unmap(struct kvm *kvm);
 void kvm_nested_s2_all_vcpus_flush(struct kvm *kvm);
+int kvm_nested_mmio_ondemand(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
+			     phys_addr_t ipa);
 #else
 static inline struct kvm_nested_s2_mmu *get_nested_mmu(struct kvm_vcpu *vcpu,
 						       u64 vttbr)
@@ -389,6 +391,9 @@ static inline void kvm_nested_s2_teardown(struct kvm_vcpu *vcpu) { }
 static inline void kvm_nested_s2_all_vcpus_wp(struct kvm *kvm) { }
 static inline void kvm_nested_s2_all_vcpus_unmap(struct kvm *kvm) { }
 static inline void kvm_nested_s2_all_vcpus_flush(struct kvm *kvm) { }
+static inline int kvm_nested_mmio_ondemand(struct kvm_vcpu *vcpu,
+					   phys_addr_t fault_ipa,
+					   phys_addr_t ipa) { return 0; }
 #endif
 
 static inline u64 kvm_get_vttbr(struct kvm_s2_vmid *vmid,
diff --git a/arch/arm64/kvm/mmu-nested.c b/arch/arm64/kvm/mmu-nested.c
index 65ad0da..bce0042 100644
--- a/arch/arm64/kvm/mmu-nested.c
+++ b/arch/arm64/kvm/mmu-nested.c
@@ -473,3 +473,29 @@ bool handle_vttbr_update(struct kvm_vcpu *vcpu, u64 vttbr)
 
 	return true;
 }
+
+/*
+ * vcpu interface address. This address is supposed to come from the guest's
+ * device tree via QEMU. Here we just hardcode it, but this should be fixed.
+ */
+#define NESTED_VCPU_IF_ADDR	0x08010000
+int kvm_nested_mmio_ondemand(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
+			     phys_addr_t ipa)
+{
+	int ret = 0;
+	phys_addr_t vcpu_base = vgic_vcpu_base();
+
+	/* Return if this fault is not from a nested VM */
+	if (vcpu->arch.hw_mmu == &vcpu->kvm->arch.mmu)
+		return ret;
+
+	if (ipa == NESTED_VCPU_IF_ADDR)  {
+		ret = __kvm_phys_addr_ioremap(vcpu->kvm, vcpu->arch.hw_mmu,
+					      fault_ipa, vcpu_base,
+					      KVM_VGIC_V2_CPU_SIZE, true);
+		if (!ret)
+			ret = 1;
+	}
+
+	return ret;
+}
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [RFC 53/55] KVM: arm64: Reflect shadow VMPIDR_EL2 value to MPIDR_EL1
  2017-01-09  6:23 [RFC 00/55] Nested Virtualization on KVM/ARM Jintack Lim
                   ` (51 preceding siblings ...)
  2017-01-09  6:24 ` [RFC 52/55] KVM: arm/arm64: Create a vcpu mapping for the nested VM Jintack Lim
@ 2017-01-09  6:24 ` Jintack Lim
  2017-01-09  6:24 ` [RFC 54/55] KVM: arm/arm64: Adjust virtual offset considering nesting Jintack Lim
                   ` (3 subsequent siblings)
  56 siblings, 0 replies; 111+ messages in thread
From: Jintack Lim @ 2017-01-09  6:24 UTC (permalink / raw)
  To: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, vladimir.murzin, suzuki.poulose,
	mark.rutland, james.morse, lorenzo.pieralisi, kevin.brodsky,
	wcohen, shankerd, geoff, andre.przywara, eric.auger, anna-maria,
	shihwei, linux-arm-kernel, kvmarm, kvm, linux-kernel
  Cc: jintack

A non-secure EL0 or EL1 read of MPIDR_EL1 should return the value of
VMPIDR_EL2. We emulate this by copying the virtual VMPIDR_EL2 value to
MPIDR_EL1 when entering the VM's EL0 or EL1.

Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
---
 arch/arm64/kvm/context.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/arm64/kvm/context.c b/arch/arm64/kvm/context.c
index 9ebc38f..dd79b0e 100644
--- a/arch/arm64/kvm/context.c
+++ b/arch/arm64/kvm/context.c
@@ -173,6 +173,12 @@ void kvm_arm_setup_shadow_state(struct kvm_vcpu *vcpu)
 		ctxt->hw_pstate = *vcpu_cpsr(vcpu);
 		ctxt->hw_sys_regs = ctxt->sys_regs;
 		ctxt->hw_sp_el1 = ctxt->gp_regs.sp_el1;
+
+		/*
+		 * A non-secure EL0 or EL1 read of MPIDR_EL1 returns
+		 * the value of VMPIDR_EL2.
+		 */
+		ctxt->hw_sys_regs[MPIDR_EL1] = ctxt->el2_regs[VMPIDR_EL2];
 	}
 
 	vgic_v2_setup_shadow_state(vcpu);
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [RFC 54/55] KVM: arm/arm64: Adjust virtual offset considering nesting
  2017-01-09  6:23 [RFC 00/55] Nested Virtualization on KVM/ARM Jintack Lim
                   ` (52 preceding siblings ...)
  2017-01-09  6:24 ` [RFC 53/55] KVM: arm64: Reflect shadow VMPIDR_EL2 value to MPIDR_EL1 Jintack Lim
@ 2017-01-09  6:24 ` Jintack Lim
  2017-02-22 19:28   ` Christoffer Dall
  2017-01-09  6:24 ` [RFC 55/55] KVM: arm64: Enable nested virtualization Jintack Lim
                   ` (2 subsequent siblings)
  56 siblings, 1 reply; 111+ messages in thread
From: Jintack Lim @ 2017-01-09  6:24 UTC (permalink / raw)
  To: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, vladimir.murzin, suzuki.poulose,
	mark.rutland, james.morse, lorenzo.pieralisi, kevin.brodsky,
	wcohen, shankerd, geoff, andre.przywara, eric.auger, anna-maria,
	shihwei, linux-arm-kernel, kvmarm, kvm, linux-kernel
  Cc: jintack

The guest hypervisor sets cntvoff_el2 for its VM (i.e. the nested VM).
Note that the physical/virtual counter value, from the guest hypervisor's
point of view, is already offset by the virtual offset set by the host
hypervisor. Therefore, the correct offset we need to write to cntvoff_el2
is the sum of the offset the host hypervisor initially has for the VM and
the virtual offset the guest hypervisor sets for the nested VM.
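
In other words (a sketch of the arithmetic only, using the names from the
patch below):

	/*
	 * The virtual counter value the nested VM observes is:
	 *   cntvct = cntpct - cntvoff_el2
	 * so the offset actually programmed into the hardware must be:
	 *   cntvoff_el2 = kvm->arch.timer.cntvoff   (host's offset for the VM)
	 *               + vcpu_el2_reg(vcpu, CNTVOFF_EL2)
	 *                                           (guest hypervisor's offset
	 *                                            for the nested VM)
	 */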

Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
---
 arch/arm/include/asm/kvm_emulate.h   | 6 ++++++
 arch/arm64/include/asm/kvm_emulate.h | 6 ++++++
 virt/kvm/arm/arch_timer.c            | 3 ++-
 virt/kvm/arm/hyp/timer-sr.c          | 5 ++++-
 4 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
index dde5335..c7a690f 100644
--- a/arch/arm/include/asm/kvm_emulate.h
+++ b/arch/arm/include/asm/kvm_emulate.h
@@ -324,4 +324,10 @@ static inline bool kvm_is_shadow_s2_fault(struct kvm_vcpu *vcpu)
 	return false;
 }
 
+/* Return the guest hypervisor's cntvoff value */
+static inline u64 kvm_get_vcntvoff(struct kvm_vcpu *vcpu)
+{
+	return 0;
+}
+
 #endif /* __ARM_KVM_EMULATE_H__ */
diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 17f4855..0aaa4ca 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -385,4 +385,10 @@ static inline bool kvm_is_shadow_s2_fault(struct kvm_vcpu *vcpu)
 #endif
 }
 
+/* Return the guest hypervisor's cntvoff value */
+static inline u64 kvm_get_vcntvoff(struct kvm_vcpu *vcpu)
+{
+	return vcpu_el2_reg(vcpu, CNTVOFF_EL2);
+}
+
 #endif /* __ARM64_KVM_EMULATE_H__ */
diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
index 7a161f8..e393939 100644
--- a/virt/kvm/arm/arch_timer.c
+++ b/virt/kvm/arm/arch_timer.c
@@ -24,6 +24,7 @@
 
 #include <clocksource/arm_arch_timer.h>
 #include <asm/arch_timer.h>
+#include <asm/kvm_emulate.h>
 
 #include <kvm/arm_vgic.h>
 #include <kvm/arm_arch_timer.h>
@@ -102,7 +103,7 @@ static u64 kvm_timer_cntvoff(struct kvm_vcpu *vcpu,
 			     struct arch_timer_context *timer_ctx)
 {
 	if (timer_ctx == vcpu_vtimer(vcpu))
-		return vcpu->kvm->arch.timer.cntvoff;
+		return vcpu->kvm->arch.timer.cntvoff + kvm_get_vcntvoff(vcpu);
 
 	return 0;
 }
diff --git a/virt/kvm/arm/hyp/timer-sr.c b/virt/kvm/arm/hyp/timer-sr.c
index 4bbd36c..66dab01 100644
--- a/virt/kvm/arm/hyp/timer-sr.c
+++ b/virt/kvm/arm/hyp/timer-sr.c
@@ -20,6 +20,7 @@
 #include <linux/kvm_host.h>
 
 #include <asm/kvm_hyp.h>
+#include <asm/kvm_emulate.h>
 
 /* vcpu is already in the HYP VA space */
 void __hyp_text __timer_save_state(struct kvm_vcpu *vcpu)
@@ -49,6 +50,7 @@ void __hyp_text __timer_restore_state(struct kvm_vcpu *vcpu)
 	struct kvm *kvm = kern_hyp_va(vcpu->kvm);
 	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
 	u64 val;
+	u64 cntvoff;
 
 	/*
 	 * Disallow physical timer access for the guest
@@ -60,7 +62,8 @@ void __hyp_text __timer_restore_state(struct kvm_vcpu *vcpu)
 	write_sysreg(val, cnthctl_el2);
 
 	if (vtimer->enabled) {
-		write_sysreg(kvm->arch.timer.cntvoff, cntvoff_el2);
+		cntvoff = kvm->arch.timer.cntvoff + kvm_get_vcntvoff(vcpu);
+		write_sysreg(cntvoff, cntvoff_el2);
 		write_sysreg_el0(vtimer->cnt_cval, cntv_cval);
 		isb();
 		write_sysreg_el0(vtimer->cnt_ctl, cntv_ctl);
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [RFC 55/55] KVM: arm64: Enable nested virtualization
  2017-01-09  6:23 [RFC 00/55] Nested Virtualization on KVM/ARM Jintack Lim
                   ` (53 preceding siblings ...)
  2017-01-09  6:24 ` [RFC 54/55] KVM: arm/arm64: Adjust virtual offset considering nesting Jintack Lim
@ 2017-01-09  6:24 ` Jintack Lim
  2017-01-09 15:05 ` [RFC 00/55] Nested Virtualization on KVM/ARM David Hildenbrand
  2017-02-22 18:23 ` Christoffer Dall
  56 siblings, 0 replies; 111+ messages in thread
From: Jintack Lim @ 2017-01-09  6:24 UTC (permalink / raw)
  To: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, vladimir.murzin, suzuki.poulose,
	mark.rutland, james.morse, lorenzo.pieralisi, kevin.brodsky,
	wcohen, shankerd, geoff, andre.przywara, eric.auger, anna-maria,
	shihwei, linux-arm-kernel, kvmarm, kvm, linux-kernel
  Cc: jintack

Now that everything is ready, we enable nested virtualization by setting
the HCR_EL2 NV and NV1 bits.

Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
---
 arch/arm64/include/asm/kvm_arm.h | 1 +
 arch/arm64/kvm/hyp/switch.c      | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index f9addf3..ab8b93b 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -24,6 +24,7 @@
 
 /* Hyp Configuration Register (HCR) bits */
 #define HCR_NV1		(UL(1) << 43)
+#define HCR_NV		(UL(1) << 42)
 #define HCR_E2H		(UL(1) << 34)
 #define HCR_ID		(UL(1) << 33)
 #define HCR_CD		(UL(1) << 32)
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index c80b2ae..df7b88d 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -87,7 +87,7 @@ static void __hyp_text __activate_traps(struct kvm_vcpu *vcpu)
 		isb();
 	}
 	if (vcpu_mode_el2(vcpu))
-		val |= HCR_TVM | HCR_TRVM;
+		val |= HCR_TVM | HCR_TRVM | HCR_NV | HCR_NV1;
 	write_sysreg(val, hcr_el2);
 	/* Trap on AArch32 cp15 c15 accesses (EL1 or EL0) */
 	write_sysreg(1 << 15, hstr_el2);
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 111+ messages in thread

* Re: [RFC 00/55] Nested Virtualization on KVM/ARM
  2017-01-09  6:23 [RFC 00/55] Nested Virtualization on KVM/ARM Jintack Lim
                   ` (54 preceding siblings ...)
  2017-01-09  6:24 ` [RFC 55/55] KVM: arm64: Enable nested virtualization Jintack Lim
@ 2017-01-09 15:05 ` David Hildenbrand
  2017-01-10 16:18   ` Jintack Lim
  2017-02-22 18:23 ` Christoffer Dall
  56 siblings, 1 reply; 111+ messages in thread
From: David Hildenbrand @ 2017-01-09 15:05 UTC (permalink / raw)
  To: Jintack Lim, christoffer.dall, marc.zyngier, pbonzini, rkrcmar,
	linux, catalin.marinas, will.deacon, vladimir.murzin,
	suzuki.poulose, mark.rutland, james.morse, lorenzo.pieralisi,
	kevin.brodsky, wcohen, shankerd, geoff, andre.przywara,
	eric.auger, anna-maria, shihwei, linux-arm-kernel, kvmarm, kvm,
	linux-kernel


> Even though this work is not complete (see limitations below), I'd appreciate
> early feedback on this RFC. Specifically, I'm interested in:
> - Is it better to have a kernel config or to make it configurable at runtime?

x86 and s390x have a kernel module parameter (nested) that can only be
changed when loading the module and should default to false. So the
admin explicitly has to enable it. Maybe going the same path makes
sense.
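
For reference, a minimal sketch of what that could look like (the
parameter name matches x86/s390x; the exact placement is just an
assumption):

	#include <linux/moduleparam.h>

	/* Sketch: opt-in switch for nested virt, false by default. */
	static bool nested;
	module_param(nested, bool, 0444);
	MODULE_PARM_DESC(nested, "Enable nested virtualization support");

Init code can then simply bail out early when 'nested' is false.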

-- 

David

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC 00/55] Nested Virtualization on KVM/ARM
  2017-01-09 15:05 ` [RFC 00/55] Nested Virtualization on KVM/ARM David Hildenbrand
@ 2017-01-10 16:18   ` Jintack Lim
  0 siblings, 0 replies; 111+ messages in thread
From: Jintack Lim @ 2017-01-10 16:18 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Christoffer Dall, Marc Zyngier, Paolo Bonzini, rkrcmar, linux,
	Catalin Marinas, will.deacon, vladimir.murzin, Suzuki K Poulose,
	mark.rutland, james.morse, lorenzo.pieralisi, kevin.brodsky,
	wcohen, shankerd, geoff, andre.przywara, eric.auger, anna-maria,
	Shih-Wei Li, linux-arm-kernel, kvmarm, KVM General, linux-kernel

On Mon, Jan 9, 2017 at 10:05 AM, David Hildenbrand <david@redhat.com> wrote:
>
>> Even though this work is not complete (see limitations below), I'd
>> appreciate
>> early feedback on this RFC. Specifically, I'm interested in:
>> - Is it better to have a kernel config or to make it configurable at
>> runtime?
>
>
> x86 and s390x have a kernel module parameter (nested) that can only be
> changed when loading the module and should default to false. So the
> admin explicitly has to enable it. Maybe going the same path makes
> sense.

I think that makes sense. Thanks!

>
> --
>
> David
>

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC 06/55] KVM: arm64: Add EL2 execution context for nesting
  2017-01-09  6:24 ` [RFC 06/55] KVM: arm64: Add EL2 execution context for nesting Jintack Lim
@ 2017-02-22 11:10   ` Christoffer Dall
  2017-06-26 14:33     ` Jintack Lim
  0 siblings, 1 reply; 111+ messages in thread
From: Christoffer Dall @ 2017-02-22 11:10 UTC (permalink / raw)
  To: Jintack Lim
  Cc: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, vladimir.murzin, suzuki.poulose,
	mark.rutland, james.morse, lorenzo.pieralisi, kevin.brodsky,
	wcohen, shankerd, geoff, andre.przywara, eric.auger, anna-maria,
	shihwei, linux-arm-kernel, kvmarm, kvm, linux-kernel

On Mon, Jan 09, 2017 at 01:24:02AM -0500, Jintack Lim wrote:
> With the nested virtualization support, the context of the guest
> includes EL2 register states. The host manages a set of virtual EL2
> registers.  In addition to that, the guest hypervisor supposed to run in
> EL2 is now deprivileged and runs in EL1. So, the host also manages a set
> of shadow system registers to be able to run the guest hypervisor in
> EL1.
> 
> Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> ---
>  arch/arm64/include/asm/kvm_host.h | 54 +++++++++++++++++++++++++++++++++++++++
>  1 file changed, 54 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index c0c8b02..ed78d73 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -146,6 +146,42 @@ enum vcpu_sysreg {
>  	NR_SYS_REGS	/* Nothing after this line! */
>  };
>  
> +enum el2_regs {
> +	ELR_EL2,
> +	SPSR_EL2,
> +	SP_EL2,
> +	AMAIR_EL2,
> +	MAIR_EL2,
> +	TCR_EL2,
> +	TTBR0_EL2,
> +	VTCR_EL2,
> +	VTTBR_EL2,
> +	VMPIDR_EL2,
> +	VPIDR_EL2,      /* 10 */
> +	MDCR_EL2,
> +	CNTHCTL_EL2,
> +	CNTHP_CTL_EL2,
> +	CNTHP_CVAL_EL2,
> +	CNTHP_TVAL_EL2,
> +	CNTVOFF_EL2,
> +	ACTLR_EL2,
> +	AFSR0_EL2,
> +	AFSR1_EL2,
> +	CPTR_EL2,       /* 20 */
> +	ESR_EL2,
> +	FAR_EL2,
> +	HACR_EL2,
> +	HCR_EL2,
> +	HPFAR_EL2,
> +	HSTR_EL2,
> +	RMR_EL2,
> +	RVBAR_EL2,
> +	SCTLR_EL2,
> +	TPIDR_EL2,      /* 30 */
> +	VBAR_EL2,
> +	NR_EL2_REGS     /* Nothing after this line! */
> +};

Why do we have a separate enum and array for the EL2 regs and not simply
expand vcpu_sysreg?
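
Roughly like the sketch below, just to illustrate the idea; the exact
placement would of course need discussion:

	enum vcpu_sysreg {
		/* ... existing EL0/EL1 registers ... */
		/* EL2 registers could simply continue the list: */
		ELR_EL2,
		SPSR_EL2,
		SP_EL2,
		/* ... and so on ... */
		NR_SYS_REGS	/* Nothing after this line! */
	};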

> +
>  /* 32bit mapping */
>  #define c0_MPIDR	(MPIDR_EL1 * 2)	/* MultiProcessor ID Register */
>  #define c0_CSSELR	(CSSELR_EL1 * 2)/* Cache Size Selection Register */
> @@ -193,6 +229,23 @@ struct kvm_cpu_context {
>  		u64 sys_regs[NR_SYS_REGS];
>  		u32 copro[NR_COPRO_REGS];
>  	};
> +
> +	u64 el2_regs[NR_EL2_REGS];         /* only used for nesting */
> +	u64 shadow_sys_regs[NR_SYS_REGS];  /* only used for virtual EL2 */
> +
> +	/*
> +	 * hw_* will be used when switching to a VM. They point to either
> +	 * the virtual EL2 or EL1/EL0 context depending on vcpu mode.

don't they either point to the shadow sys regs or to the normal EL1
sysregs?

> +	 */
> +
> +	/* pointing shadow_sys_regs or sys_regs */

that's what this comment seems to indicate, so there's some duplication
here.

> +	u64 *hw_sys_regs;
> +
> +	/* copy of either gp_regs.sp_el1 or el2_regs[SP_EL2] */
> +	u64 hw_sp_el1;
> +
> +	/* pstate written to SPSR_EL2 */
> +	u64 hw_pstate;
>  };
>  
>  typedef struct kvm_cpu_context kvm_cpu_context_t;
> @@ -277,6 +330,7 @@ struct kvm_vcpu_arch {
>  
>  #define vcpu_gp_regs(v)		(&(v)->arch.ctxt.gp_regs)
>  #define vcpu_sys_reg(v,r)	((v)->arch.ctxt.sys_regs[(r)])
> +#define vcpu_el2_reg(v, r)	((v)->arch.ctxt.el2_regs[(r)])
>  /*
>   * CP14 and CP15 live in the same array, as they are backed by the
>   * same system registers.
> -- 
> 1.9.1
> 
> 

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC 07/55] KVM: arm/arm64: Add virtual EL2 state emulation framework
  2017-01-09  6:24 ` [RFC 07/55] KVM: arm/arm64: Add virtual EL2 state emulation framework Jintack Lim
@ 2017-02-22 11:12   ` Christoffer Dall
  2017-06-01 20:05   ` Bandan Das
  1 sibling, 0 replies; 111+ messages in thread
From: Christoffer Dall @ 2017-02-22 11:12 UTC (permalink / raw)
  To: Jintack Lim
  Cc: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, vladimir.murzin, suzuki.poulose,
	mark.rutland, james.morse, lorenzo.pieralisi, kevin.brodsky,
	wcohen, shankerd, geoff, andre.przywara, eric.auger, anna-maria,
	shihwei, linux-arm-kernel, kvmarm, kvm, linux-kernel

On Mon, Jan 09, 2017 at 01:24:03AM -0500, Jintack Lim wrote:
> From: Christoffer Dall <christoffer.dall@linaro.org>
> 
> Add a framework to set up the guest's context depending on the guest's
> exception level. A chosen context is written to hardware in the lowvisor.
> We don't set the virtual EL2 context yet.

We need to improve this commit message.

I think this commit is just preparing to be able to switch between
using the normal EL1 sysreg state and using the shadow sysregs to
emulate virtual EL2, but without any functional change so far.

Is that correct?

> 
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
> ---
>  arch/arm/include/asm/kvm_emulate.h   |   4 ++
>  arch/arm/kvm/arm.c                   |   5 ++
>  arch/arm64/include/asm/kvm_emulate.h |   4 ++
>  arch/arm64/kvm/Makefile              |   2 +-
>  arch/arm64/kvm/context.c             |  49 ++++++++++++++++
>  arch/arm64/kvm/hyp/sysreg-sr.c       | 109 +++++++++++++++++++----------------
>  6 files changed, 122 insertions(+), 51 deletions(-)
>  create mode 100644 arch/arm64/kvm/context.c
> 
> diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
> index 399cd75e..0a03b7d 100644
> --- a/arch/arm/include/asm/kvm_emulate.h
> +++ b/arch/arm/include/asm/kvm_emulate.h
> @@ -47,6 +47,10 @@ static inline void vcpu_set_reg(struct kvm_vcpu *vcpu, u8 reg_num,
>  void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long addr);
>  void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr);
>  
> +static inline void kvm_arm_setup_shadow_state(struct kvm_vcpu *vcpu) { };
> +static inline void kvm_arm_restore_shadow_state(struct kvm_vcpu *vcpu) { };
> +static inline void kvm_arm_init_cpu_context(kvm_cpu_context_t *cpu_ctxt) { };
> +
>  static inline bool kvm_condition_valid(const struct kvm_vcpu *vcpu)
>  {
>  	return kvm_condition_valid32(vcpu);
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index d2dfa32..436bf5a 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -41,6 +41,7 @@
>  #include <asm/virt.h>
>  #include <asm/kvm_arm.h>
>  #include <asm/kvm_asm.h>
> +#include <asm/kvm_hyp.h>
>  #include <asm/kvm_mmu.h>
>  #include <asm/kvm_emulate.h>
>  #include <asm/kvm_coproc.h>
> @@ -646,6 +647,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  		}
>  
>  		kvm_arm_setup_debug(vcpu);
> +		kvm_arm_setup_shadow_state(vcpu);
>  
>  		/**************************************************************
>  		 * Enter the guest
> @@ -662,6 +664,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  		 * Back from guest
>  		 *************************************************************/
>  
> +		kvm_arm_restore_shadow_state(vcpu);
>  		kvm_arm_clear_debug(vcpu);
>  
>  		/*
> @@ -1369,6 +1372,8 @@ static int init_hyp_mode(void)
>  			kvm_err("Cannot map host CPU state: %d\n", err);
>  			goto out_err;
>  		}
> +
> +		kvm_arm_init_cpu_context(cpu_ctxt);
>  	}
>  
>  	kvm_info("Hyp mode initialized successfully\n");
> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> index 830be2e..8892c82 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -42,6 +42,10 @@
>  void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long addr);
>  void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr);
>  
> +void kvm_arm_setup_shadow_state(struct kvm_vcpu *vcpu);
> +void kvm_arm_restore_shadow_state(struct kvm_vcpu *vcpu);
> +void kvm_arm_init_cpu_context(kvm_cpu_context_t *cpu_ctxt);
> +
>  static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
>  {
>  	vcpu->arch.hcr_el2 = HCR_GUEST_FLAGS;
> diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
> index d50a82a..7811d27 100644
> --- a/arch/arm64/kvm/Makefile
> +++ b/arch/arm64/kvm/Makefile
> @@ -16,7 +16,7 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/e
>  kvm-$(CONFIG_KVM_ARM_HOST) += $(ARM)/arm.o $(ARM)/mmu.o $(ARM)/mmio.o
>  kvm-$(CONFIG_KVM_ARM_HOST) += $(ARM)/psci.o $(ARM)/perf.o
>  
> -kvm-$(CONFIG_KVM_ARM_HOST) += inject_fault.o regmap.o
> +kvm-$(CONFIG_KVM_ARM_HOST) += inject_fault.o regmap.o context.o
>  kvm-$(CONFIG_KVM_ARM_HOST) += hyp.o hyp-init.o handle_exit.o
>  kvm-$(CONFIG_KVM_ARM_HOST) += guest.o debug.o reset.o sys_regs.o sys_regs_generic_v8.o
>  kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/aarch32.o
> diff --git a/arch/arm64/kvm/context.c b/arch/arm64/kvm/context.c
> new file mode 100644
> index 0000000..320afc6
> --- /dev/null
> +++ b/arch/arm64/kvm/context.c
> @@ -0,0 +1,49 @@
> +/*
> + * Copyright (C) 2016 - Linaro Ltd.
> + * Author: Christoffer Dall <christoffer.dall@linaro.org>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include <linux/kvm_host.h>
> +#include <asm/kvm_emulate.h>
> +
> +/**
> + * kvm_arm_setup_shadow_state -- prepare shadow state based on emulated mode
> + * @vcpu: The VCPU pointer
> + */
> +void kvm_arm_setup_shadow_state(struct kvm_vcpu *vcpu)
> +{
> +	struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
> +
> +	ctxt->hw_pstate = *vcpu_cpsr(vcpu);
> +	ctxt->hw_sys_regs = ctxt->sys_regs;
> +	ctxt->hw_sp_el1 = ctxt->gp_regs.sp_el1;
> +}
> +
> +/**
> + * kvm_arm_restore_shadow_state -- write back shadow state from guest
> + * @vcpu: The VCPU pointer
> + */
> +void kvm_arm_restore_shadow_state(struct kvm_vcpu *vcpu)
> +{
> +	struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
> +
> +	*vcpu_cpsr(vcpu) = ctxt->hw_pstate;
> +	ctxt->gp_regs.sp_el1 = ctxt->hw_sp_el1;
> +}
> +
> +void kvm_arm_init_cpu_context(kvm_cpu_context_t *cpu_ctxt)
> +{
> +	cpu_ctxt->hw_sys_regs = &cpu_ctxt->sys_regs[0];
> +}
> diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
> index 9341376..f2a1b32 100644
> --- a/arch/arm64/kvm/hyp/sysreg-sr.c
> +++ b/arch/arm64/kvm/hyp/sysreg-sr.c
> @@ -19,6 +19,7 @@
>  #include <linux/kvm_host.h>
>  
>  #include <asm/kvm_asm.h>
> +#include <asm/kvm_emulate.h>
>  #include <asm/kvm_hyp.h>
>  
>  /* Yes, this does nothing, on purpose */
> @@ -33,37 +34,41 @@ static void __hyp_text __sysreg_do_nothing(struct kvm_cpu_context *ctxt) { }
>  
>  static void __hyp_text __sysreg_save_common_state(struct kvm_cpu_context *ctxt)
>  {
> -	ctxt->sys_regs[ACTLR_EL1]	= read_sysreg(actlr_el1);
> -	ctxt->sys_regs[TPIDR_EL0]	= read_sysreg(tpidr_el0);
> -	ctxt->sys_regs[TPIDRRO_EL0]	= read_sysreg(tpidrro_el0);
> -	ctxt->sys_regs[TPIDR_EL1]	= read_sysreg(tpidr_el1);
> -	ctxt->sys_regs[MDSCR_EL1]	= read_sysreg(mdscr_el1);
> +	u64 *sys_regs = kern_hyp_va(ctxt->hw_sys_regs);
> +
> +	sys_regs[ACTLR_EL1]	= read_sysreg(actlr_el1);
> +	sys_regs[TPIDR_EL0]	= read_sysreg(tpidr_el0);
> +	sys_regs[TPIDRRO_EL0]	= read_sysreg(tpidrro_el0);
> +	sys_regs[TPIDR_EL1]	= read_sysreg(tpidr_el1);
> +	sys_regs[MDSCR_EL1]	= read_sysreg(mdscr_el1);
>  	ctxt->gp_regs.regs.sp		= read_sysreg(sp_el0);
>  	ctxt->gp_regs.regs.pc		= read_sysreg_el2(elr);
> -	ctxt->gp_regs.regs.pstate	= read_sysreg_el2(spsr);
> +	ctxt->hw_pstate			= read_sysreg_el2(spsr);
>  }
>  
>  static void __hyp_text __sysreg_save_state(struct kvm_cpu_context *ctxt)
>  {
> -	ctxt->sys_regs[MPIDR_EL1]	= read_sysreg(vmpidr_el2);
> -	ctxt->sys_regs[CSSELR_EL1]	= read_sysreg(csselr_el1);
> -	ctxt->sys_regs[SCTLR_EL1]	= read_sysreg_el1(sctlr);
> -	ctxt->sys_regs[CPACR_EL1]	= read_sysreg_el1(cpacr);
> -	ctxt->sys_regs[TTBR0_EL1]	= read_sysreg_el1(ttbr0);
> -	ctxt->sys_regs[TTBR1_EL1]	= read_sysreg_el1(ttbr1);
> -	ctxt->sys_regs[TCR_EL1]		= read_sysreg_el1(tcr);
> -	ctxt->sys_regs[ESR_EL1]		= read_sysreg_el1(esr);
> -	ctxt->sys_regs[AFSR0_EL1]	= read_sysreg_el1(afsr0);
> -	ctxt->sys_regs[AFSR1_EL1]	= read_sysreg_el1(afsr1);
> -	ctxt->sys_regs[FAR_EL1]		= read_sysreg_el1(far);
> -	ctxt->sys_regs[MAIR_EL1]	= read_sysreg_el1(mair);
> -	ctxt->sys_regs[VBAR_EL1]	= read_sysreg_el1(vbar);
> -	ctxt->sys_regs[CONTEXTIDR_EL1]	= read_sysreg_el1(contextidr);
> -	ctxt->sys_regs[AMAIR_EL1]	= read_sysreg_el1(amair);
> -	ctxt->sys_regs[CNTKCTL_EL1]	= read_sysreg_el1(cntkctl);
> -	ctxt->sys_regs[PAR_EL1]		= read_sysreg(par_el1);
> -
> -	ctxt->gp_regs.sp_el1		= read_sysreg(sp_el1);
> +	u64 *sys_regs = kern_hyp_va(ctxt->hw_sys_regs);
> +
> +	sys_regs[MPIDR_EL1]	= read_sysreg(vmpidr_el2);
> +	sys_regs[CSSELR_EL1]	= read_sysreg(csselr_el1);
> +	sys_regs[SCTLR_EL1]	= read_sysreg_el1(sctlr);
> +	sys_regs[CPACR_EL1]	= read_sysreg_el1(cpacr);
> +	sys_regs[TTBR0_EL1]	= read_sysreg_el1(ttbr0);
> +	sys_regs[TTBR1_EL1]	= read_sysreg_el1(ttbr1);
> +	sys_regs[TCR_EL1]	= read_sysreg_el1(tcr);
> +	sys_regs[ESR_EL1]	= read_sysreg_el1(esr);
> +	sys_regs[AFSR0_EL1]	= read_sysreg_el1(afsr0);
> +	sys_regs[AFSR1_EL1]	= read_sysreg_el1(afsr1);
> +	sys_regs[FAR_EL1]	= read_sysreg_el1(far);
> +	sys_regs[MAIR_EL1]	= read_sysreg_el1(mair);
> +	sys_regs[VBAR_EL1]	= read_sysreg_el1(vbar);
> +	sys_regs[CONTEXTIDR_EL1]	= read_sysreg_el1(contextidr);
> +	sys_regs[AMAIR_EL1]	= read_sysreg_el1(amair);
> +	sys_regs[CNTKCTL_EL1]	= read_sysreg_el1(cntkctl);
> +	sys_regs[PAR_EL1]		= read_sysreg(par_el1);
> +
> +	ctxt->hw_sp_el1			= read_sysreg(sp_el1);
>  	ctxt->gp_regs.elr_el1		= read_sysreg_el1(elr);
>  	ctxt->gp_regs.spsr[KVM_SPSR_EL1]= read_sysreg_el1(spsr);
>  }
> @@ -86,37 +91,41 @@ void __hyp_text __sysreg_save_guest_state(struct kvm_cpu_context *ctxt)
>  
>  static void __hyp_text __sysreg_restore_common_state(struct kvm_cpu_context *ctxt)
>  {
> -	write_sysreg(ctxt->sys_regs[ACTLR_EL1],	  actlr_el1);
> -	write_sysreg(ctxt->sys_regs[TPIDR_EL0],	  tpidr_el0);
> -	write_sysreg(ctxt->sys_regs[TPIDRRO_EL0], tpidrro_el0);
> -	write_sysreg(ctxt->sys_regs[TPIDR_EL1],	  tpidr_el1);
> -	write_sysreg(ctxt->sys_regs[MDSCR_EL1],	  mdscr_el1);
> +	u64 *sys_regs = kern_hyp_va(ctxt->hw_sys_regs);
> +
> +	write_sysreg(sys_regs[ACTLR_EL1],	  actlr_el1);
> +	write_sysreg(sys_regs[TPIDR_EL0],	  tpidr_el0);
> +	write_sysreg(sys_regs[TPIDRRO_EL0],	tpidrro_el0);
> +	write_sysreg(sys_regs[TPIDR_EL1],	  tpidr_el1);
> +	write_sysreg(sys_regs[MDSCR_EL1],	  mdscr_el1);
>  	write_sysreg(ctxt->gp_regs.regs.sp,	  sp_el0);
>  	write_sysreg_el2(ctxt->gp_regs.regs.pc,	  elr);
> -	write_sysreg_el2(ctxt->gp_regs.regs.pstate, spsr);
> +	write_sysreg_el2(ctxt->hw_pstate,	  spsr);
>  }
>  
>  static void __hyp_text __sysreg_restore_state(struct kvm_cpu_context *ctxt)
>  {
> -	write_sysreg(ctxt->sys_regs[MPIDR_EL1],		vmpidr_el2);
> -	write_sysreg(ctxt->sys_regs[CSSELR_EL1],	csselr_el1);
> -	write_sysreg_el1(ctxt->sys_regs[SCTLR_EL1],	sctlr);
> -	write_sysreg_el1(ctxt->sys_regs[CPACR_EL1],	cpacr);
> -	write_sysreg_el1(ctxt->sys_regs[TTBR0_EL1],	ttbr0);
> -	write_sysreg_el1(ctxt->sys_regs[TTBR1_EL1],	ttbr1);
> -	write_sysreg_el1(ctxt->sys_regs[TCR_EL1],	tcr);
> -	write_sysreg_el1(ctxt->sys_regs[ESR_EL1],	esr);
> -	write_sysreg_el1(ctxt->sys_regs[AFSR0_EL1],	afsr0);
> -	write_sysreg_el1(ctxt->sys_regs[AFSR1_EL1],	afsr1);
> -	write_sysreg_el1(ctxt->sys_regs[FAR_EL1],	far);
> -	write_sysreg_el1(ctxt->sys_regs[MAIR_EL1],	mair);
> -	write_sysreg_el1(ctxt->sys_regs[VBAR_EL1],	vbar);
> -	write_sysreg_el1(ctxt->sys_regs[CONTEXTIDR_EL1],contextidr);
> -	write_sysreg_el1(ctxt->sys_regs[AMAIR_EL1],	amair);
> -	write_sysreg_el1(ctxt->sys_regs[CNTKCTL_EL1], 	cntkctl);
> -	write_sysreg(ctxt->sys_regs[PAR_EL1],		par_el1);
> -
> -	write_sysreg(ctxt->gp_regs.sp_el1,		sp_el1);
> +	u64 *sys_regs = kern_hyp_va(ctxt->hw_sys_regs);
> +
> +	write_sysreg(sys_regs[MPIDR_EL1],	vmpidr_el2);
> +	write_sysreg(sys_regs[CSSELR_EL1],	csselr_el1);
> +	write_sysreg_el1(sys_regs[SCTLR_EL1],	sctlr);
> +	write_sysreg_el1(sys_regs[CPACR_EL1],	cpacr);
> +	write_sysreg_el1(sys_regs[TTBR0_EL1],	ttbr0);
> +	write_sysreg_el1(sys_regs[TTBR1_EL1],	ttbr1);
> +	write_sysreg_el1(sys_regs[TCR_EL1],	tcr);
> +	write_sysreg_el1(sys_regs[ESR_EL1],	esr);
> +	write_sysreg_el1(sys_regs[AFSR0_EL1],	afsr0);
> +	write_sysreg_el1(sys_regs[AFSR1_EL1],	afsr1);
> +	write_sysreg_el1(sys_regs[FAR_EL1],	far);
> +	write_sysreg_el1(sys_regs[MAIR_EL1],	mair);
> +	write_sysreg_el1(sys_regs[VBAR_EL1],	vbar);
> +	write_sysreg_el1(sys_regs[CONTEXTIDR_EL1], contextidr);
> +	write_sysreg_el1(sys_regs[AMAIR_EL1],	amair);
> +	write_sysreg_el1(sys_regs[CNTKCTL_EL1], cntkctl);
> +	write_sysreg(sys_regs[PAR_EL1],		par_el1);
> +
> +	write_sysreg(ctxt->hw_sp_el1,			sp_el1);
>  	write_sysreg_el1(ctxt->gp_regs.elr_el1,		elr);
>  	write_sysreg_el1(ctxt->gp_regs.spsr[KVM_SPSR_EL1],spsr);
>  }
> -- 
> 1.9.1
> 
> 

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC 08/55] KVM: arm64: Set virtual EL2 context depending on the guest exception level
  2017-01-09  6:24 ` [RFC 08/55] KVM: arm64: Set virtual EL2 context depending on the guest exception level Jintack Lim
@ 2017-02-22 11:14   ` Christoffer Dall
  2017-06-01 20:22   ` Bandan Das
  1 sibling, 0 replies; 111+ messages in thread
From: Christoffer Dall @ 2017-02-22 11:14 UTC (permalink / raw)
  To: Jintack Lim
  Cc: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, vladimir.murzin, suzuki.poulose,
	mark.rutland, james.morse, lorenzo.pieralisi, kevin.brodsky,
	wcohen, shankerd, geoff, andre.przywara, eric.auger, anna-maria,
	shihwei, linux-arm-kernel, kvmarm, kvm, linux-kernel

On Mon, Jan 09, 2017 at 01:24:04AM -0500, Jintack Lim wrote:
> From: Christoffer Dall <christoffer.dall@linaro.org>
> 
> Set up virtual EL2 context to hardware if the guest exception level is
> EL2.
> 
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
> ---
>  arch/arm64/kvm/context.c | 32 ++++++++++++++++++++++++++------
>  1 file changed, 26 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/arm64/kvm/context.c b/arch/arm64/kvm/context.c
> index 320afc6..acb4b1e 100644
> --- a/arch/arm64/kvm/context.c
> +++ b/arch/arm64/kvm/context.c
> @@ -25,10 +25,25 @@
>  void kvm_arm_setup_shadow_state(struct kvm_vcpu *vcpu)
>  {
>  	struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
> +	if (unlikely(vcpu_mode_el2(vcpu))) {
> +		ctxt->hw_pstate = *vcpu_cpsr(vcpu) & ~PSR_MODE_MASK;
>  
> -	ctxt->hw_pstate = *vcpu_cpsr(vcpu);
> -	ctxt->hw_sys_regs = ctxt->sys_regs;
> -	ctxt->hw_sp_el1 = ctxt->gp_regs.sp_el1;
> +		/*
> +		 * We emulate virtual EL2 mode in hardware EL1 mode using the
> +		 * same stack pointer mode as the guest expects.
> +		 */

I think this comment should either be deleted or explain why this works
as opposed to stating the obvious.  How about:

		/*
		 * We can emulate the guest's configuration of which
		 * stack pointer to use when executing in virtual EL2 by
		 * using the equivalent feature in EL1 to point to
		 * either the EL1 or EL0 stack pointer.
		 */

> +		if ((*vcpu_cpsr(vcpu) & PSR_MODE_MASK) == PSR_MODE_EL2h)
> +			ctxt->hw_pstate |= PSR_MODE_EL1h;
> +		else
> +			ctxt->hw_pstate |= PSR_MODE_EL1t;
> +
> +		ctxt->hw_sys_regs = ctxt->shadow_sys_regs;
> +		ctxt->hw_sp_el1 = ctxt->el2_regs[SP_EL2];
> +	} else {
> +		ctxt->hw_pstate = *vcpu_cpsr(vcpu);
> +		ctxt->hw_sys_regs = ctxt->sys_regs;
> +		ctxt->hw_sp_el1 = ctxt->gp_regs.sp_el1;
> +	}
>  }
>  
>  /**
> @@ -38,9 +53,14 @@ void kvm_arm_setup_shadow_state(struct kvm_vcpu *vcpu)
>  void kvm_arm_restore_shadow_state(struct kvm_vcpu *vcpu)
>  {
>  	struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
> -
> -	*vcpu_cpsr(vcpu) = ctxt->hw_pstate;
> -	ctxt->gp_regs.sp_el1 = ctxt->hw_sp_el1;
> +	if (unlikely(vcpu_mode_el2(vcpu))) {
> +		*vcpu_cpsr(vcpu) &= PSR_MODE_MASK;
> +		*vcpu_cpsr(vcpu) |= ctxt->hw_pstate & ~PSR_MODE_MASK;
> +		ctxt->el2_regs[SP_EL2] = ctxt->hw_sp_el1;
> +	} else {
> +		*vcpu_cpsr(vcpu) = ctxt->hw_pstate;
> +		ctxt->gp_regs.sp_el1 = ctxt->hw_sp_el1;
> +	}
>  }
>  
>  void kvm_arm_init_cpu_context(kvm_cpu_context_t *cpu_ctxt)
> -- 
> 1.9.1
> 
> 

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC 09/55] KVM: arm64: Set shadow EL1 registers for virtual EL2 execution
  2017-01-09  6:24 ` [RFC 09/55] KVM: arm64: Set shadow EL1 registers for virtual EL2 execution Jintack Lim
@ 2017-02-22 11:19   ` Christoffer Dall
  0 siblings, 0 replies; 111+ messages in thread
From: Christoffer Dall @ 2017-02-22 11:19 UTC (permalink / raw)
  To: Jintack Lim
  Cc: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, vladimir.murzin, suzuki.poulose,
	mark.rutland, james.morse, lorenzo.pieralisi, kevin.brodsky,
	wcohen, shankerd, geoff, andre.przywara, eric.auger, anna-maria,
	shihwei, linux-arm-kernel, kvmarm, kvm, linux-kernel

On Mon, Jan 09, 2017 at 01:24:05AM -0500, Jintack Lim wrote:
> From: Christoffer Dall <christoffer.dall@linaro.org>
> 
> When entering virtual EL2, we need to reflect virtual EL2 register
> states to corresponding shadow EL1 registers. We can simply copy them if
> their formats are identical.  Otherwise, we need to convert EL2 register
> state to EL1 register state.
> 
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
> ---
>  arch/arm64/kvm/context.c | 71 ++++++++++++++++++++++++++++++++++++++++++++++++

Looking at this again, I'm not sure 'context.c' is a very meaningful
name.

>  1 file changed, 71 insertions(+)
> 
> diff --git a/arch/arm64/kvm/context.c b/arch/arm64/kvm/context.c
> index acb4b1e..2e9e386 100644
> --- a/arch/arm64/kvm/context.c
> +++ b/arch/arm64/kvm/context.c
> @@ -17,6 +17,76 @@
>  
>  #include <linux/kvm_host.h>
>  #include <asm/kvm_emulate.h>
> +#include <asm/esr.h>
> +
> +struct el1_el2_map {
> +	enum vcpu_sysreg	el1;
> +	enum el2_regs		el2;
> +};
> +
> +/*
> + * List of EL2 registers which can be directly applied to EL1 registers to
> + * emulate running EL2 in EL1.  The EL1 registers here must either be trapped
> + * or paravirtualized in EL1.

This series doesn't deal with paravirtualization but only targets 8.3, so
we should clean up references to paravirtualization.

> + */
> +static const struct el1_el2_map el1_el2_map[] = {
> +	{ AMAIR_EL1, AMAIR_EL2 },
> +	{ MAIR_EL1, MAIR_EL2 },
> +	{ TTBR0_EL1, TTBR0_EL2 },
> +	{ ACTLR_EL1, ACTLR_EL2 },
> +	{ AFSR0_EL1, AFSR0_EL2 },
> +	{ AFSR1_EL1, AFSR1_EL2 },
> +	{ SCTLR_EL1, SCTLR_EL2 },
> +	{ VBAR_EL1, VBAR_EL2 },
> +};
> +
> +static inline u64 tcr_el2_ips_to_tcr_el1_ps(u64 tcr_el2)
> +{
> +	return ((tcr_el2 & TCR_EL2_PS_MASK) >> TCR_EL2_PS_SHIFT)
> +		<< TCR_IPS_SHIFT;
> +}
> +
> +static inline u64 cptr_el2_to_cpacr_el1(u64 cptr_el2)
> +{
> +	u64 cpacr_el1 = 0;
> +
> +	if (!(cptr_el2 & CPTR_EL2_TFP))
> +		cpacr_el1 |= CPACR_EL1_FPEN;
> +	if (cptr_el2 & CPTR_EL2_TTA)
> +		cpacr_el1 |= CPACR_EL1_TTA;
> +
> +	return cpacr_el1;
> +}
> +
> +static void create_shadow_el1_sysregs(struct kvm_vcpu *vcpu)
> +{
> +	u64 *s_sys_regs = vcpu->arch.ctxt.shadow_sys_regs;
> +	u64 *el2_regs = vcpu->arch.ctxt.el2_regs;
> +	u64 tcr_el2;
> +	int i;
> +
> +	for (i = 0; i < ARRAY_SIZE(el1_el2_map); i++) {
> +		const struct el1_el2_map *map = &el1_el2_map[i];
> +
> +		s_sys_regs[map->el1] = el2_regs[map->el2];
> +	}
> +
> +	tcr_el2 = el2_regs[TCR_EL2];
> +	s_sys_regs[TCR_EL1] =
> +		TCR_EPD1 |	/* disable TTBR1_EL1 */
> +		((tcr_el2 & TCR_EL2_TBI) ? TCR_TBI0 : 0) |
> +		tcr_el2_ips_to_tcr_el1_ps(tcr_el2) |
> +		(tcr_el2 & TCR_EL2_TG0_MASK) |
> +		(tcr_el2 & TCR_EL2_ORGN0_MASK) |
> +		(tcr_el2 & TCR_EL2_IRGN0_MASK) |
> +		(tcr_el2 & TCR_EL2_T0SZ_MASK);
> +
> +	/* Rely on separate VMID for VA context, always use ASID 0 */
> +	s_sys_regs[TTBR0_EL1] &= ~GENMASK_ULL(63, 48);
> +	s_sys_regs[TTBR1_EL1] = 0;
> +
> +	s_sys_regs[CPACR_EL1] = cptr_el2_to_cpacr_el1(el2_regs[CPTR_EL2]);
> +}
>  
>  /**
>   * kvm_arm_setup_shadow_state -- prepare shadow state based on emulated mode
> @@ -37,6 +107,7 @@ void kvm_arm_setup_shadow_state(struct kvm_vcpu *vcpu)
>  		else
>  			ctxt->hw_pstate |= PSR_MODE_EL1t;
>  
> +		create_shadow_el1_sysregs(vcpu);
>  		ctxt->hw_sys_regs = ctxt->shadow_sys_regs;
>  		ctxt->hw_sp_el1 = ctxt->el2_regs[SP_EL2];
>  	} else {
> -- 
> 1.9.1
> 
> 

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC 11/55] KVM: arm64: Emulate taking an exception to the guest hypervisor
  2017-01-09  6:24 ` [RFC 11/55] KVM: arm64: Emulate taking an exception to the guest hypervisor Jintack Lim
@ 2017-02-22 11:28   ` Christoffer Dall
  2017-06-06 20:21   ` Bandan Das
  1 sibling, 0 replies; 111+ messages in thread
From: Christoffer Dall @ 2017-02-22 11:28 UTC (permalink / raw)
  To: Jintack Lim
  Cc: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, vladimir.murzin, suzuki.poulose,
	mark.rutland, james.morse, lorenzo.pieralisi, kevin.brodsky,
	wcohen, shankerd, geoff, andre.przywara, eric.auger, anna-maria,
	shihwei, linux-arm-kernel, kvmarm, kvm, linux-kernel

On Mon, Jan 09, 2017 at 01:24:07AM -0500, Jintack Lim wrote:
> Emulate taking an exception to the guest hypervisor running in the
> virtual EL2 as described in ARM ARM AArch64.TakeException().

I would rename the subject and change the description of this patch to
talk about injecting exceptions to virtual EL2 as opposed to talking
about the guest hypervisor.

> 
> Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
> ---
>  arch/arm/include/asm/kvm_emulate.h   | 14 ++++++++
>  arch/arm64/include/asm/kvm_emulate.h | 19 +++++++++++
>  arch/arm64/kvm/Makefile              |  2 ++
>  arch/arm64/kvm/emulate-nested.c      | 66 ++++++++++++++++++++++++++++++++++++
>  arch/arm64/kvm/trace.h               | 20 +++++++++++
>  5 files changed, 121 insertions(+)
>  create mode 100644 arch/arm64/kvm/emulate-nested.c
> 
> diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
> index 0a03b7d..0fa2f5a 100644
> --- a/arch/arm/include/asm/kvm_emulate.h
> +++ b/arch/arm/include/asm/kvm_emulate.h
> @@ -47,6 +47,20 @@ static inline void vcpu_set_reg(struct kvm_vcpu *vcpu, u8 reg_num,
>  void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long addr);
>  void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr);
>  
> +static inline int kvm_inject_nested_sync(struct kvm_vcpu *vcpu, u64 esr_el2)
> +{
> +	kvm_err("Unexpected call to %s for the non-nesting configuration\n",
> +		 __func__);
> +	return -EINVAL;
> +}
> +
> +static inline int kvm_inject_nested_irq(struct kvm_vcpu *vcpu)
> +{
> +	kvm_err("Unexpected call to %s for the non-nesting configuration\n",
> +		 __func__);
> +	return -EINVAL;
> +}
> +
>  static inline void kvm_arm_setup_shadow_state(struct kvm_vcpu *vcpu) { };
>  static inline void kvm_arm_restore_shadow_state(struct kvm_vcpu *vcpu) { };
>  static inline void kvm_arm_init_cpu_context(kvm_cpu_context_t *cpu_ctxt) { };
> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> index 8892c82..0987ee4 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -42,6 +42,25 @@
>  void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long addr);
>  void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr);
>  
> +#ifdef CONFIG_KVM_ARM_NESTED_HYP
> +int kvm_inject_nested_sync(struct kvm_vcpu *vcpu, u64 esr_el2);
> +int kvm_inject_nested_irq(struct kvm_vcpu *vcpu);
> +#else
> +static inline int kvm_inject_nested_sync(struct kvm_vcpu *vcpu, u64 esr_el2)
> +{
> +	kvm_err("Unexpected call to %s for the non-nesting configuration\n",
> +		 __func__);
> +	return -EINVAL;
> +}
> +
> +static inline int kvm_inject_nested_irq(struct kvm_vcpu *vcpu)
> +{
> +	kvm_err("Unexpected call to %s for the non-nesting configuration\n",
> +		 __func__);
> +	return -EINVAL;
> +}
> +#endif
> +
>  void kvm_arm_setup_shadow_state(struct kvm_vcpu *vcpu);
>  void kvm_arm_restore_shadow_state(struct kvm_vcpu *vcpu);
>  void kvm_arm_init_cpu_context(kvm_cpu_context_t *cpu_ctxt);
> diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
> index 7811d27..b342bdd 100644
> --- a/arch/arm64/kvm/Makefile
> +++ b/arch/arm64/kvm/Makefile
> @@ -34,3 +34,5 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic/vgic-its.o
>  kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/irqchip.o
>  kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/arch_timer.o
>  kvm-$(CONFIG_KVM_ARM_PMU) += $(KVM)/arm/pmu.o
> +
> +kvm-$(CONFIG_KVM_ARM_NESTED_HYP) += emulate-nested.o
> diff --git a/arch/arm64/kvm/emulate-nested.c b/arch/arm64/kvm/emulate-nested.c
> new file mode 100644
> index 0000000..59d147f
> --- /dev/null
> +++ b/arch/arm64/kvm/emulate-nested.c
> @@ -0,0 +1,66 @@
> +/*
> + * Copyright (C) 2016 - Columbia University
> + * Author: Jintack Lim <jintack@cs.columbia.edu>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include <linux/kvm.h>
> +#include <linux/kvm_host.h>
> +
> +#include <asm/kvm_emulate.h>
> +
> +#include "trace.h"
> +
> +#define	EL2_EXCEPT_SYNC_OFFSET	0x400
> +#define	EL2_EXCEPT_ASYNC_OFFSET	0x480

I don't like the 'EXCEPT' word here.  Don't we have other defines in the
kernel with more appropriate naming schemes we can rely on?

> +
> +
> +/*
> + *  Emulate taking an exception. See ARM ARM J8.1.2 AArch64.TakeException()
> + */
> +static int kvm_inject_nested(struct kvm_vcpu *vcpu, u64 esr_el2,
> +			     int exception_offset)
> +{
> +	int ret = 1;
> +	kvm_cpu_context_t *ctxt = &vcpu->arch.ctxt;
> +
> +	/* We don't inject an exception recursively to virtual EL2 */
> +	if (vcpu_mode_el2(vcpu))
> +		BUG();

Why not?

> +
> +	ctxt->el2_regs[SPSR_EL2] = *vcpu_cpsr(vcpu);
> +	ctxt->el2_regs[ELR_EL2] = *vcpu_pc(vcpu);
> +	ctxt->el2_regs[ESR_EL2] = esr_el2;
> +
> +	/* On an exception, PSTATE.SP = 1 */

You can probably lose this comment.

> +	*vcpu_cpsr(vcpu) = PSR_MODE_EL2h;
> +	*vcpu_cpsr(vcpu) |=  (PSR_A_BIT | PSR_F_BIT | PSR_I_BIT | PSR_D_BIT);
> +	*vcpu_pc(vcpu) = ctxt->el2_regs[VBAR_EL2] + exception_offset;
> +
> +	trace_kvm_inject_nested_exception(vcpu, esr_el2, *vcpu_pc(vcpu));
> +
> +	return ret;
> +}
> +
> +int kvm_inject_nested_sync(struct kvm_vcpu *vcpu, u64 esr_el2)
> +{
> +	return kvm_inject_nested(vcpu, esr_el2, EL2_EXCEPT_SYNC_OFFSET);
> +}
> +
> +int kvm_inject_nested_irq(struct kvm_vcpu *vcpu)
> +{
> +	u64 esr_el2 = kvm_vcpu_get_hsr(vcpu);
> +	/* We supports only IRQ and FIQ, so the esr_el2 is not updated. */

I don't understand this comment.

I think you need some whitespace here before the comment, and to give a
little more context about why we can reuse the ESR register on the vcpu
struct here.

Also 's/supports/support/'

> +	return kvm_inject_nested(vcpu, esr_el2, EL2_EXCEPT_ASYNC_OFFSET);
> +}
> diff --git a/arch/arm64/kvm/trace.h b/arch/arm64/kvm/trace.h
> index 7fb0008..7c86cfb 100644
> --- a/arch/arm64/kvm/trace.h
> +++ b/arch/arm64/kvm/trace.h
> @@ -167,6 +167,26 @@
>  );
>  
>  
> +TRACE_EVENT(kvm_inject_nested_exception,
> +	TP_PROTO(struct kvm_vcpu *vcpu, unsigned long esr_el2,
> +		 unsigned long pc),
> +	TP_ARGS(vcpu, esr_el2, pc),
> +
> +	TP_STRUCT__entry(
> +		__field(struct kvm_vcpu *,	vcpu)
> +		__field(unsigned long,		esr_el2)
> +		__field(unsigned long,		pc)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->vcpu = vcpu;
> +		__entry->esr_el2 = esr_el2;
> +		__entry->pc = pc;
> +	),
> +
> +	TP_printk("vcpu: %p, inject exception to vEL2: ESR_EL2 0x%lx, vector: 0x%016lx",
> +		  __entry->vcpu, __entry->esr_el2, __entry->pc)
> +);
>  #endif /* _TRACE_ARM64_KVM_H */
>  
>  #undef TRACE_INCLUDE_PATH
> -- 
> 1.9.1
> 
> 

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC 12/55] KVM: arm64: Handle EL2 register access traps
  2017-01-09  6:24 ` [RFC 12/55] KVM: arm64: Handle EL2 register access traps Jintack Lim
@ 2017-02-22 11:30   ` Christoffer Dall
  2017-02-22 11:31   ` Christoffer Dall
  1 sibling, 0 replies; 111+ messages in thread
From: Christoffer Dall @ 2017-02-22 11:30 UTC (permalink / raw)
  To: Jintack Lim
  Cc: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, vladimir.murzin, suzuki.poulose,
	mark.rutland, james.morse, lorenzo.pieralisi, kevin.brodsky,
	wcohen, shankerd, geoff, andre.przywara, eric.auger, anna-maria,
	shihwei, linux-arm-kernel, kvmarm, kvm, linux-kernel

On Mon, Jan 09, 2017 at 01:24:08AM -0500, Jintack Lim wrote:
> ARM v8.3 introduces a new bit in the HCR_EL2, which is the NV bit. When
> this bit is set, accessing EL2 registers in EL1 traps to EL2. In
> addition, executing following instructions in EL1 will trap to EL2 -

the following:

So these instructions trap:
 - tlbi
 - at
 - eret
 - mrs/msr accessing sp_el1

And they would previously undef at EL1, but now trap to EL2?

> tlbi and at instructions which are undefined when executed in EL1, eret
> instruction, msr/mrs instructions to access SP_EL1.

this is a bit confusing to read.

> 
> This patch handles traps due to accessing EL2 registers in EL1.  The
> host hypervisor keeps EL2 register values in memory, and will use them
> to emulate the behavior that the guest hypervisor expects from the
> hardware.
> 
> Subsequent patches will handle other kinds of traps.
> 
> Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
> ---
>  arch/arm64/kvm/sys_regs.c | 119 ++++++++++++++++++++++++++++++++++++++++++++++
>  arch/arm64/kvm/sys_regs.h |   7 +++
>  2 files changed, 126 insertions(+)
> 
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 7cef94f..4158f2f 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -873,6 +873,18 @@ static bool access_cntp_cval(struct kvm_vcpu *vcpu,
>  	return true;
>  }
>  
> +static bool trap_el2_reg(struct kvm_vcpu *vcpu,
> +			 struct sys_reg_params *p,
> +			 const struct sys_reg_desc *r)
> +{
> +	if (!p->is_write)
> +		p->regval = vcpu_el2_reg(vcpu, r->reg);
> +	else
> +		vcpu_el2_reg(vcpu, r->reg) = p->regval;
> +
> +	return true;
> +}
> +
>  /*
>   * Architected system registers.
>   * Important: Must be sorted ascending by Op0, Op1, CRn, CRm, Op2
> @@ -1163,15 +1175,122 @@ static bool access_cntp_cval(struct kvm_vcpu *vcpu,
>  	{ Op0(0b11), Op1(0b011), CRn(0b1110), CRm(0b1111), Op2(0b111),
>  	  access_pmu_evtyper, reset_val, PMCCFILTR_EL0, 0 },
>  
> +	/* VPIDR_EL2 */
> +	{ Op0(0b11), Op1(0b100), CRn(0b0000), CRm(0b0000), Op2(0b000),
> +	  trap_el2_reg, reset_el2_val, VPIDR_EL2, 0 },
> +	/* VMPIDR_EL2 */
> +	{ Op0(0b11), Op1(0b100), CRn(0b0000), CRm(0b0000), Op2(0b101),
> +	  trap_el2_reg, reset_el2_val, VMPIDR_EL2, 0 },
> +
> +	/* SCTLR_EL2 */
> +	{ Op0(0b11), Op1(0b100), CRn(0b0001), CRm(0b0000), Op2(0b000),
> +	  trap_el2_reg, reset_el2_val, SCTLR_EL2, 0 },
> +	/* ACTLR_EL2 */
> +	{ Op0(0b11), Op1(0b100), CRn(0b0001), CRm(0b0000), Op2(0b001),
> +	  trap_el2_reg, reset_el2_val, ACTLR_EL2, 0 },
> +	/* HCR_EL2 */
> +	{ Op0(0b11), Op1(0b100), CRn(0b0001), CRm(0b0001), Op2(0b000),
> +	  trap_el2_reg, reset_el2_val, HCR_EL2, 0 },
> +	/* MDCR_EL2 */
> +	{ Op0(0b11), Op1(0b100), CRn(0b0001), CRm(0b0001), Op2(0b001),
> +	  trap_el2_reg, reset_el2_val, MDCR_EL2, 0 },
> +	/* CPTR_EL2 */
> +	{ Op0(0b11), Op1(0b100), CRn(0b0001), CRm(0b0001), Op2(0b010),
> +	  trap_el2_reg, reset_el2_val, CPTR_EL2, 0 },
> +	/* HSTR_EL2 */
> +	{ Op0(0b11), Op1(0b100), CRn(0b0001), CRm(0b0001), Op2(0b011),
> +	  trap_el2_reg, reset_el2_val, HSTR_EL2, 0 },
> +	/* HACR_EL2 */
> +	{ Op0(0b11), Op1(0b100), CRn(0b0001), CRm(0b0001), Op2(0b111),
> +	  trap_el2_reg, reset_el2_val, HACR_EL2, 0 },
> +
> +	/* TTBR0_EL2 */
> +	{ Op0(0b11), Op1(0b100), CRn(0b0010), CRm(0b0000), Op2(0b000),
> +	  trap_el2_reg, reset_el2_val, TTBR0_EL2, 0 },
> +	/* TCR_EL2 */
> +	{ Op0(0b11), Op1(0b100), CRn(0b0010), CRm(0b0000), Op2(0b010),
> +	  trap_el2_reg, reset_el2_val, TCR_EL2, 0 },
> +	/* VTTBR_EL2 */
> +	{ Op0(0b11), Op1(0b100), CRn(0b0010), CRm(0b0001), Op2(0b000),
> +	  trap_el2_reg, reset_el2_val, VTTBR_EL2, 0 },
> +	/* VTCR_EL2 */
> +	{ Op0(0b11), Op1(0b100), CRn(0b0010), CRm(0b0001), Op2(0b010),
> +	  trap_el2_reg, reset_el2_val, VTCR_EL2, 0 },
> +
>  	/* DACR32_EL2 */
>  	{ Op0(0b11), Op1(0b100), CRn(0b0011), CRm(0b0000), Op2(0b000),
>  	  NULL, reset_unknown, DACR32_EL2 },
> +
> +	/* SPSR_EL2 */
> +	{ Op0(0b11), Op1(0b100), CRn(0b0100), CRm(0b0000), Op2(0b000),
> +	  trap_el2_reg, reset_el2_val, SPSR_EL2, 0 },
> +	/* ELR_EL2 */
> +	{ Op0(0b11), Op1(0b100), CRn(0b0100), CRm(0b0000), Op2(0b001),
> +	  trap_el2_reg, reset_el2_val, ELR_EL2, 0 },
> +	/* SP_EL1 */
> +	{ Op0(0b11), Op1(0b100), CRn(0b0100), CRm(0b0001), Op2(0b000),
> +	  trap_el2_reg },
> +
>  	/* IFSR32_EL2 */
>  	{ Op0(0b11), Op1(0b100), CRn(0b0101), CRm(0b0000), Op2(0b001),
>  	  NULL, reset_unknown, IFSR32_EL2 },
> +	/* AFSR0_EL2 */
> +	{ Op0(0b11), Op1(0b100), CRn(0b0101), CRm(0b0001), Op2(0b000),
> +	  trap_el2_reg, reset_el2_val, AFSR0_EL2, 0 },
> +	/* AFSR1_EL2 */
> +	{ Op0(0b11), Op1(0b100), CRn(0b0101), CRm(0b0001), Op2(0b001),
> +	  trap_el2_reg, reset_el2_val, AFSR1_EL2, 0 },
> +	/* ESR_EL2 */
> +	{ Op0(0b11), Op1(0b100), CRn(0b0101), CRm(0b0010), Op2(0b000),
> +	  trap_el2_reg, reset_el2_val, ESR_EL2, 0 },
>  	/* FPEXC32_EL2 */
>  	{ Op0(0b11), Op1(0b100), CRn(0b0101), CRm(0b0011), Op2(0b000),
>  	  NULL, reset_val, FPEXC32_EL2, 0x70 },
> +
> +	/* FAR_EL2 */
> +	{ Op0(0b11), Op1(0b100), CRn(0b0110), CRm(0b0000), Op2(0b000),
> +	  trap_el2_reg, reset_el2_val, FAR_EL2, 0 },
> +	/* HPFAR_EL2 */
> +	{ Op0(0b11), Op1(0b100), CRn(0b0110), CRm(0b0000), Op2(0b100),
> +	  trap_el2_reg, reset_el2_val, HPFAR_EL2, 0 },
> +
> +	/* MAIR_EL2 */
> +	{ Op0(0b11), Op1(0b100), CRn(0b1010), CRm(0b0010), Op2(0b000),
> +	  trap_el2_reg, reset_el2_val, MAIR_EL2, 0 },
> +	/* AMAIR_EL2 */
> +	{ Op0(0b11), Op1(0b100), CRn(0b1010), CRm(0b0011), Op2(0b000),
> +	  trap_el2_reg, reset_el2_val, AMAIR_EL2, 0 },
> +
> +	/* VBAR_EL2 */
> +	{ Op0(0b11), Op1(0b100), CRn(0b1100), CRm(0b0000), Op2(0b000),
> +	  trap_el2_reg, reset_el2_val, VBAR_EL2, 0 },
> +	/* RVBAR_EL2 */
> +	{ Op0(0b11), Op1(0b100), CRn(0b1100), CRm(0b0000), Op2(0b001),
> +	  trap_el2_reg, reset_el2_val, RVBAR_EL2, 0 },
> +	/* RMR_EL2 */
> +	{ Op0(0b11), Op1(0b100), CRn(0b1100), CRm(0b0000), Op2(0b010),
> +	  trap_el2_reg, reset_el2_val, RMR_EL2, 0 },
> +
> +	/* TPIDR_EL2 */
> +	{ Op0(0b11), Op1(0b100), CRn(0b1101), CRm(0b0000), Op2(0b010),
> +	  trap_el2_reg, reset_el2_val, TPIDR_EL2, 0 },
> +
> +	/* CNTVOFF_EL2 */
> +	{ Op0(0b11), Op1(0b100), CRn(0b1110), CRm(0b0000), Op2(0b011),
> +	  trap_el2_reg, reset_el2_val, CNTVOFF_EL2, 0 },
> +	/* CNTHCTL_EL2 */
> +	{ Op0(0b11), Op1(0b100), CRn(0b1110), CRm(0b0001), Op2(0b000),
> +	  trap_el2_reg, reset_el2_val, CNTHCTL_EL2, 0 },
> +	/* CNTHP_TVAL_EL2 */
> +	{ Op0(0b11), Op1(0b100), CRn(0b1110), CRm(0b0010), Op2(0b000),
> +	  trap_el2_reg, reset_el2_val, CNTHP_TVAL_EL2, 0 },
> +	/* CNTHP_CTL_EL2 */
> +	{ Op0(0b11), Op1(0b100), CRn(0b1110), CRm(0b0010), Op2(0b001),
> +	  trap_el2_reg, reset_el2_val, CNTHP_CTL_EL2, 0 },
> +	/* CNTHP_CVAL_EL2 */
> +	{ Op0(0b11), Op1(0b100), CRn(0b1110), CRm(0b0010), Op2(0b010),
> +	  trap_el2_reg, reset_el2_val, CNTHP_CVAL_EL2, 0 },
> +
>  };
>  
>  static bool trap_dbgidr(struct kvm_vcpu *vcpu,
> diff --git a/arch/arm64/kvm/sys_regs.h b/arch/arm64/kvm/sys_regs.h
> index dbbb01c..181290f 100644
> --- a/arch/arm64/kvm/sys_regs.h
> +++ b/arch/arm64/kvm/sys_regs.h
> @@ -117,6 +117,13 @@ static inline void reset_val(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r
>  	vcpu_sys_reg(vcpu, r->reg) = r->val;
>  }
>  
> +static inline void reset_el2_val(struct kvm_vcpu *vcpu,
> +				 const struct sys_reg_desc *r)
> +{
> +	BUG_ON(r->reg >= NR_EL2_REGS);
> +	vcpu_el2_reg(vcpu, r->reg) = r->val;
> +}
> +
>  static inline int cmp_sys_reg(const struct sys_reg_desc *i1,
>  			      const struct sys_reg_desc *i2)
>  {
> -- 
> 1.9.1
> 
> 

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC 12/55] KVM: arm64: Handle EL2 register access traps
  2017-01-09  6:24 ` [RFC 12/55] KVM: arm64: Handle EL2 register access traps Jintack Lim
  2017-02-22 11:30   ` Christoffer Dall
@ 2017-02-22 11:31   ` Christoffer Dall
  1 sibling, 0 replies; 111+ messages in thread
From: Christoffer Dall @ 2017-02-22 11:31 UTC (permalink / raw)
  To: Jintack Lim
  Cc: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, vladimir.murzin, suzuki.poulose,
	mark.rutland, james.morse, lorenzo.pieralisi, kevin.brodsky,
	wcohen, shankerd, geoff, andre.przywara, eric.auger, anna-maria,
	shihwei, linux-arm-kernel, kvmarm, kvm, linux-kernel

On Mon, Jan 09, 2017 at 01:24:08AM -0500, Jintack Lim wrote:
> ARMv8.3 introduces a new bit in HCR_EL2, the NV bit. When this bit is
> set, accessing EL2 registers in EL1 traps to EL2. In addition,
> executing the following instructions in EL1 will trap to EL2: tlbi and
> at instructions that are undefined when executed in EL1, the eret
> instruction, and msr/mrs instructions that access SP_EL1.
> 
> This patch handles traps due to accessing EL2 registers in EL1.  The
> host hypervisor keeps EL2 register values in memory, and will use them
> to emulate the behavior that the guest hypervisor expects from the
> hardware.

This patch just sets up the handlers but doesn't actually enable the NV
feature, right?
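
The enabling side would presumably be a one-line change wherever
HCR_EL2 is set up for the guest: a minimal sketch, assuming an HCR_NV
define at the ARMv8.3 architectural bit position and a hypothetical
nested_virt_in_use() helper (neither is part of this patch):

#define HCR_NV		(UL(1) << 42)	/* assumed: ARMv8.3 HCR_EL2.NV */

static u64 compute_guest_hcr(struct kvm_vcpu *vcpu)
{
	u64 hcr = HCR_GUEST_FLAGS;

	/* Trap EL2 register accesses from the deprivileged guest hypervisor */
	if (nested_virt_in_use(vcpu))
		hcr |= HCR_NV;

	return hcr;
}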

> 
> Subsequent patches will handle other kinds of traps.
> 
> Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
> ---
>  arch/arm64/kvm/sys_regs.c | 119 ++++++++++++++++++++++++++++++++++++++++++++++
>  arch/arm64/kvm/sys_regs.h |   7 +++
>  2 files changed, 126 insertions(+)
> 
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 7cef94f..4158f2f 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -873,6 +873,18 @@ static bool access_cntp_cval(struct kvm_vcpu *vcpu,
>  	return true;
>  }
>  
> +static bool trap_el2_reg(struct kvm_vcpu *vcpu,
> +			 struct sys_reg_params *p,
> +			 const struct sys_reg_desc *r)
> +{
> +	if (!p->is_write)
> +		p->regval = vcpu_el2_reg(vcpu, r->reg);
> +	else
> +		vcpu_el2_reg(vcpu, r->reg) = p->regval;
> +
> +	return true;
> +}
> +
>  /*
>   * Architected system registers.
>   * Important: Must be sorted ascending by Op0, Op1, CRn, CRm, Op2
> @@ -1163,15 +1175,122 @@ static bool access_cntp_cval(struct kvm_vcpu *vcpu,
>  	{ Op0(0b11), Op1(0b011), CRn(0b1110), CRm(0b1111), Op2(0b111),
>  	  access_pmu_evtyper, reset_val, PMCCFILTR_EL0, 0 },
>  
> +	/* VPIDR_EL2 */
> +	{ Op0(0b11), Op1(0b100), CRn(0b0000), CRm(0b0000), Op2(0b000),
> +	  trap_el2_reg, reset_el2_val, VPIDR_EL2, 0 },
> +	/* VMPIDR_EL2 */
> +	{ Op0(0b11), Op1(0b100), CRn(0b0000), CRm(0b0000), Op2(0b101),
> +	  trap_el2_reg, reset_el2_val, VMPIDR_EL2, 0 },
> +
> +	/* SCTLR_EL2 */
> +	{ Op0(0b11), Op1(0b100), CRn(0b0001), CRm(0b0000), Op2(0b000),
> +	  trap_el2_reg, reset_el2_val, SCTLR_EL2, 0 },
> +	/* ACTLR_EL2 */
> +	{ Op0(0b11), Op1(0b100), CRn(0b0001), CRm(0b0000), Op2(0b001),
> +	  trap_el2_reg, reset_el2_val, ACTLR_EL2, 0 },
> +	/* HCR_EL2 */
> +	{ Op0(0b11), Op1(0b100), CRn(0b0001), CRm(0b0001), Op2(0b000),
> +	  trap_el2_reg, reset_el2_val, HCR_EL2, 0 },
> +	/* MDCR_EL2 */
> +	{ Op0(0b11), Op1(0b100), CRn(0b0001), CRm(0b0001), Op2(0b001),
> +	  trap_el2_reg, reset_el2_val, MDCR_EL2, 0 },
> +	/* CPTR_EL2 */
> +	{ Op0(0b11), Op1(0b100), CRn(0b0001), CRm(0b0001), Op2(0b010),
> +	  trap_el2_reg, reset_el2_val, CPTR_EL2, 0 },
> +	/* HSTR_EL2 */
> +	{ Op0(0b11), Op1(0b100), CRn(0b0001), CRm(0b0001), Op2(0b011),
> +	  trap_el2_reg, reset_el2_val, HSTR_EL2, 0 },
> +	/* HACR_EL2 */
> +	{ Op0(0b11), Op1(0b100), CRn(0b0001), CRm(0b0001), Op2(0b111),
> +	  trap_el2_reg, reset_el2_val, HACR_EL2, 0 },
> +
> +	/* TTBR0_EL2 */
> +	{ Op0(0b11), Op1(0b100), CRn(0b0010), CRm(0b0000), Op2(0b000),
> +	  trap_el2_reg, reset_el2_val, TTBR0_EL2, 0 },
> +	/* TCR_EL2 */
> +	{ Op0(0b11), Op1(0b100), CRn(0b0010), CRm(0b0000), Op2(0b010),
> +	  trap_el2_reg, reset_el2_val, TCR_EL2, 0 },
> +	/* VTTBR_EL2 */
> +	{ Op0(0b11), Op1(0b100), CRn(0b0010), CRm(0b0001), Op2(0b000),
> +	  trap_el2_reg, reset_el2_val, VTTBR_EL2, 0 },
> +	/* VTCR_EL2 */
> +	{ Op0(0b11), Op1(0b100), CRn(0b0010), CRm(0b0001), Op2(0b010),
> +	  trap_el2_reg, reset_el2_val, VTCR_EL2, 0 },
> +
>  	/* DACR32_EL2 */
>  	{ Op0(0b11), Op1(0b100), CRn(0b0011), CRm(0b0000), Op2(0b000),
>  	  NULL, reset_unknown, DACR32_EL2 },
> +
> +	/* SPSR_EL2 */
> +	{ Op0(0b11), Op1(0b100), CRn(0b0100), CRm(0b0000), Op2(0b000),
> +	  trap_el2_reg, reset_el2_val, SPSR_EL2, 0 },
> +	/* ELR_EL2 */
> +	{ Op0(0b11), Op1(0b100), CRn(0b0100), CRm(0b0000), Op2(0b001),
> +	  trap_el2_reg, reset_el2_val, ELR_EL2, 0 },
> +	/* SP_EL1 */
> +	{ Op0(0b11), Op1(0b100), CRn(0b0100), CRm(0b0001), Op2(0b000),
> +	  trap_el2_reg },
> +
>  	/* IFSR32_EL2 */
>  	{ Op0(0b11), Op1(0b100), CRn(0b0101), CRm(0b0000), Op2(0b001),
>  	  NULL, reset_unknown, IFSR32_EL2 },
> +	/* AFSR0_EL2 */
> +	{ Op0(0b11), Op1(0b100), CRn(0b0101), CRm(0b0001), Op2(0b000),
> +	  trap_el2_reg, reset_el2_val, AFSR0_EL2, 0 },
> +	/* AFSR1_EL2 */
> +	{ Op0(0b11), Op1(0b100), CRn(0b0101), CRm(0b0001), Op2(0b001),
> +	  trap_el2_reg, reset_el2_val, AFSR1_EL2, 0 },
> +	/* ESR_EL2 */
> +	{ Op0(0b11), Op1(0b100), CRn(0b0101), CRm(0b0010), Op2(0b000),
> +	  trap_el2_reg, reset_el2_val, ESR_EL2, 0 },
>  	/* FPEXC32_EL2 */
>  	{ Op0(0b11), Op1(0b100), CRn(0b0101), CRm(0b0011), Op2(0b000),
>  	  NULL, reset_val, FPEXC32_EL2, 0x70 },
> +
> +	/* FAR_EL2 */
> +	{ Op0(0b11), Op1(0b100), CRn(0b0110), CRm(0b0000), Op2(0b000),
> +	  trap_el2_reg, reset_el2_val, FAR_EL2, 0 },
> +	/* HPFAR_EL2 */
> +	{ Op0(0b11), Op1(0b100), CRn(0b0110), CRm(0b0000), Op2(0b100),
> +	  trap_el2_reg, reset_el2_val, HPFAR_EL2, 0 },
> +
> +	/* MAIR_EL2 */
> +	{ Op0(0b11), Op1(0b100), CRn(0b1010), CRm(0b0010), Op2(0b000),
> +	  trap_el2_reg, reset_el2_val, MAIR_EL2, 0 },
> +	/* AMAIR_EL2 */
> +	{ Op0(0b11), Op1(0b100), CRn(0b1010), CRm(0b0011), Op2(0b000),
> +	  trap_el2_reg, reset_el2_val, AMAIR_EL2, 0 },
> +
> +	/* VBAR_EL2 */
> +	{ Op0(0b11), Op1(0b100), CRn(0b1100), CRm(0b0000), Op2(0b000),
> +	  trap_el2_reg, reset_el2_val, VBAR_EL2, 0 },
> +	/* RVBAR_EL2 */
> +	{ Op0(0b11), Op1(0b100), CRn(0b1100), CRm(0b0000), Op2(0b001),
> +	  trap_el2_reg, reset_el2_val, RVBAR_EL2, 0 },
> +	/* RMR_EL2 */
> +	{ Op0(0b11), Op1(0b100), CRn(0b1100), CRm(0b0000), Op2(0b010),
> +	  trap_el2_reg, reset_el2_val, RMR_EL2, 0 },
> +
> +	/* TPIDR_EL2 */
> +	{ Op0(0b11), Op1(0b100), CRn(0b1101), CRm(0b0000), Op2(0b010),
> +	  trap_el2_reg, reset_el2_val, TPIDR_EL2, 0 },
> +
> +	/* CNTVOFF_EL2 */
> +	{ Op0(0b11), Op1(0b100), CRn(0b1110), CRm(0b0000), Op2(0b011),
> +	  trap_el2_reg, reset_el2_val, CNTVOFF_EL2, 0 },
> +	/* CNTHCTL_EL2 */
> +	{ Op0(0b11), Op1(0b100), CRn(0b1110), CRm(0b0001), Op2(0b000),
> +	  trap_el2_reg, reset_el2_val, CNTHCTL_EL2, 0 },
> +	/* CNTHP_TVAL_EL2 */
> +	{ Op0(0b11), Op1(0b100), CRn(0b1110), CRm(0b0010), Op2(0b000),
> +	  trap_el2_reg, reset_el2_val, CNTHP_TVAL_EL2, 0 },
> +	/* CNTHP_CTL_EL2 */
> +	{ Op0(0b11), Op1(0b100), CRn(0b1110), CRm(0b0010), Op2(0b001),
> +	  trap_el2_reg, reset_el2_val, CNTHP_CTL_EL2, 0 },
> +	/* CNTHP_CVAL_EL2 */
> +	{ Op0(0b11), Op1(0b100), CRn(0b1110), CRm(0b0010), Op2(0b010),
> +	  trap_el2_reg, reset_el2_val, CNTHP_CVAL_EL2, 0 },
> +
>  };
>  
>  static bool trap_dbgidr(struct kvm_vcpu *vcpu,
> diff --git a/arch/arm64/kvm/sys_regs.h b/arch/arm64/kvm/sys_regs.h
> index dbbb01c..181290f 100644
> --- a/arch/arm64/kvm/sys_regs.h
> +++ b/arch/arm64/kvm/sys_regs.h
> @@ -117,6 +117,13 @@ static inline void reset_val(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r
>  	vcpu_sys_reg(vcpu, r->reg) = r->val;
>  }
>  
> +static inline void reset_el2_val(struct kvm_vcpu *vcpu,
> +				 const struct sys_reg_desc *r)
> +{
> +	BUG_ON(r->reg >= NR_EL2_REGS);
> +	vcpu_el2_reg(vcpu, r->reg) = r->val;
> +}
> +
>  static inline int cmp_sys_reg(const struct sys_reg_desc *i1,
>  			      const struct sys_reg_desc *i2)
>  {
> -- 
> 1.9.1
> 
> 

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC 14/55] KVM: arm64: Take account of system instruction traps
  2017-01-09  6:24 ` [RFC 14/55] KVM: arm64: Take account of system " Jintack Lim
@ 2017-02-22 11:34   ` Christoffer Dall
  0 siblings, 0 replies; 111+ messages in thread
From: Christoffer Dall @ 2017-02-22 11:34 UTC (permalink / raw)
  To: Jintack Lim
  Cc: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, vladimir.murzin, suzuki.poulose,
	mark.rutland, james.morse, lorenzo.pieralisi, kevin.brodsky,
	wcohen, shankerd, geoff, andre.przywara, eric.auger, anna-maria,
	shihwei, linux-arm-kernel, kvmarm, kvm, linux-kernel

On Mon, Jan 09, 2017 at 01:24:10AM -0500, Jintack Lim wrote:
> When the HCR.NV bit is set, execution of the EL2 translation regime
> Address Translation instructions and TLB maintenance instructions is
> trapped to EL2. In addition, execution of the EL1 translation regime
> Address Translation instructions and TLB maintenance instructions that
> are only accessible from EL2 and above is trapped to EL2. In these
> cases, ESR_EL2.EC will be set to 0x18.
> 
> Take account of this and handle system instructions as well as MRS/MSR
> instructions in the handler. Change the handler name to reflect this.
> 
> Emulation of those system instructions is to be done.

Is it going to be done in later patches in this series or left as an
exercise for the reader?
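
Whenever that happens, the tlbi side will presumably need to invalidate
whatever shadow stage-2 state was built from the guest hypervisor's
stage-2 tables. A coarse sketch, assuming a kvm_nested_s2_clear()
helper that does not exist in this series:

static int emulate_tlbi(struct kvm_vcpu *vcpu,
			struct sys_reg_params *params)
{
	/*
	 * Coarse but safe: drop all shadow stage-2 mappings derived
	 * from the guest hypervisor's tables, whatever tlbi variant
	 * was executed.
	 */
	kvm_nested_s2_clear(vcpu->kvm);
	return 1;
}

Per-VA and per-VMID variants could refine this later; correctness only
requires never invalidating less than the instruction asked for.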

> 
> Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
> ---
>  arch/arm64/include/asm/kvm_coproc.h |  2 +-
>  arch/arm64/kvm/handle_exit.c        |  2 +-
>  arch/arm64/kvm/sys_regs.c           | 49 ++++++++++++++++++++++++++++++++-----
>  arch/arm64/kvm/trace.h              |  2 +-
>  4 files changed, 46 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_coproc.h b/arch/arm64/include/asm/kvm_coproc.h
> index 0b52377..1b3d21b 100644
> --- a/arch/arm64/include/asm/kvm_coproc.h
> +++ b/arch/arm64/include/asm/kvm_coproc.h
> @@ -43,7 +43,7 @@ void kvm_register_target_sys_reg_table(unsigned int target,
>  int kvm_handle_cp14_64(struct kvm_vcpu *vcpu, struct kvm_run *run);
>  int kvm_handle_cp15_32(struct kvm_vcpu *vcpu, struct kvm_run *run);
>  int kvm_handle_cp15_64(struct kvm_vcpu *vcpu, struct kvm_run *run);
> -int kvm_handle_sys_reg(struct kvm_vcpu *vcpu, struct kvm_run *run);
> +int kvm_handle_sys(struct kvm_vcpu *vcpu, struct kvm_run *run);
>  
>  #define kvm_coproc_table_init kvm_sys_reg_table_init
>  void kvm_sys_reg_table_init(void);
> diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
> index 4e4a915..a891684 100644
> --- a/arch/arm64/kvm/handle_exit.c
> +++ b/arch/arm64/kvm/handle_exit.c
> @@ -147,7 +147,7 @@ static int kvm_handle_eret(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  	[ESR_ELx_EC_SMC32]	= handle_smc,
>  	[ESR_ELx_EC_HVC64]	= handle_hvc,
>  	[ESR_ELx_EC_SMC64]	= handle_smc,
> -	[ESR_ELx_EC_SYS64]	= kvm_handle_sys_reg,
> +	[ESR_ELx_EC_SYS64]	= kvm_handle_sys,
>  	[ESR_ELx_EC_ERET]	= kvm_handle_eret,
>  	[ESR_ELx_EC_IABT_LOW]	= kvm_handle_guest_abort,
>  	[ESR_ELx_EC_DABT_LOW]	= kvm_handle_guest_abort,
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 4158f2f..202f64d 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -1903,6 +1903,36 @@ static int emulate_sys_reg(struct kvm_vcpu *vcpu,
>  	return 1;
>  }
>  
> +static int emulate_tlbi(struct kvm_vcpu *vcpu,
> +			     struct sys_reg_params *params)
> +{
> +	/* TODO: support tlbi instruction emulation */
> +	kvm_inject_undefined(vcpu);
> +	return 1;
> +}
> +
> +static int emulate_at(struct kvm_vcpu *vcpu,
> +			     struct sys_reg_params *params)
> +{
> +	/* TODO: support address translation instruction emulation */
> +	kvm_inject_undefined(vcpu);
> +	return 1;
> +}
> +
> +static int emulate_sys_instr(struct kvm_vcpu *vcpu,
> +			     struct sys_reg_params *params)
> +{
> +	int ret = 1;	/* unrecognized encodings are simply skipped */
> +
> +	/* TLB maintenance instructions */
> +	if (params->CRn == 0b1000)
> +		ret = emulate_tlbi(vcpu, params);
> +	/* Address Translation instructions */
> +	else if (params->CRn == 0b0111 && params->CRm == 0b1000)
> +		ret = emulate_at(vcpu, params);
> +	return ret;
> +}
> +
>  static void reset_sys_reg_descs(struct kvm_vcpu *vcpu,
>  			      const struct sys_reg_desc *table, size_t num)
>  {
> @@ -1914,18 +1944,19 @@ static void reset_sys_reg_descs(struct kvm_vcpu *vcpu,
>  }
>  
>  /**
> - * kvm_handle_sys_reg -- handles a mrs/msr trap on a guest sys_reg access
> + * kvm_handle_sys -- handles a system instruction or mrs/msr instruction
> + *		      trap on guest execution
>   * @vcpu: The VCPU pointer
>   * @run:  The kvm_run struct
>   */
> -int kvm_handle_sys_reg(struct kvm_vcpu *vcpu, struct kvm_run *run)
> +int kvm_handle_sys(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  {
>  	struct sys_reg_params params;
>  	unsigned long esr = kvm_vcpu_get_hsr(vcpu);
>  	int Rt = (esr >> 5) & 0x1f;
>  	int ret;
>  
> -	trace_kvm_handle_sys_reg(esr);
> +	trace_kvm_handle_sys(esr);
>  
>  	params.is_aarch32 = false;
>  	params.is_32bit = false;
> @@ -1937,10 +1968,16 @@ int kvm_handle_sys_reg(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  	params.regval = vcpu_get_reg(vcpu, Rt);
>  	params.is_write = !(esr & 1);
>  
> -	ret = emulate_sys_reg(vcpu, &params);
> +	if (params.Op0 == 1) {
> +		/* System instructions */
> +		ret = emulate_sys_instr(vcpu, &params);
> +	} else {
> +		/* MRS/MSR instructions */
> +		ret = emulate_sys_reg(vcpu, &params);
> +		if (!params.is_write)
> +			vcpu_set_reg(vcpu, Rt, params.regval);
> +	}
>  
> -	if (!params.is_write)
> -		vcpu_set_reg(vcpu, Rt, params.regval);
>  	return ret;
>  }
>  
> diff --git a/arch/arm64/kvm/trace.h b/arch/arm64/kvm/trace.h
> index 5f40987..192708e 100644
> --- a/arch/arm64/kvm/trace.h
> +++ b/arch/arm64/kvm/trace.h
> @@ -134,7 +134,7 @@
>  	TP_printk("%s %s reg %d (0x%08llx)", __entry->fn,  __entry->is_write?"write to":"read from", __entry->reg, __entry->write_value)
>  );
>  
> -TRACE_EVENT(kvm_handle_sys_reg,
> +TRACE_EVENT(kvm_handle_sys,
>  	TP_PROTO(unsigned long hsr),
>  	TP_ARGS(hsr),
>  
> -- 
> 1.9.1
> 
> 

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC 16/55] KVM: arm64: Forward VM reg traps to the guest hypervisor
  2017-01-09  6:24 ` [RFC 16/55] KVM: arm64: Forward VM reg traps to the guest hypervisor Jintack Lim
@ 2017-02-22 11:39   ` Christoffer Dall
  0 siblings, 0 replies; 111+ messages in thread
From: Christoffer Dall @ 2017-02-22 11:39 UTC (permalink / raw)
  To: Jintack Lim
  Cc: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, vladimir.murzin, suzuki.poulose,
	mark.rutland, james.morse, lorenzo.pieralisi, kevin.brodsky,
	wcohen, shankerd, geoff, andre.przywara, eric.auger, anna-maria,
	shihwei, linux-arm-kernel, kvmarm, kvm, linux-kernel

On Mon, Jan 09, 2017 at 01:24:12AM -0500, Jintack Lim wrote:
> Forward virtual memory register traps to the guest hypervisor
> if it has set the corresponding bits in the virtual HCR_EL2.

I was a bit confused about the subject of this patch.  I would recommend
calling it something like
"Respect virtual HCR_EL2.TVM and HCR_EL2.TRVM settings"

> 
> Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
> ---
>  arch/arm64/kvm/sys_regs.c | 20 ++++++++++++++++++++
>  1 file changed, 20 insertions(+)
> 
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index b8e993a..0f5d21b 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -90,6 +90,23 @@ static bool access_dcsw(struct kvm_vcpu *vcpu,
>  	return true;
>  }
>  
> +static bool forward_vm_traps(struct kvm_vcpu *vcpu, struct sys_reg_params *p)
> +{
> +	u64 hcr_el2 = vcpu_el2_reg(vcpu, HCR_EL2);
> +
> +	/* If this is a trap from the virtual EL2, the host handles it */
> +	if (vcpu_mode_el2(vcpu))
> +		return false;
> +
> +	/* If the guest wants to trap on R/W operation, forward this trap */
> +	if ((hcr_el2 & HCR_TVM) && p->is_write)
> +		return true;
> +	else if ((hcr_el2 & HCR_TRVM) && !p->is_write)
> +		return true;
> +
> +	return false;
> +}
> +
>  /*
>   * Generic accessor for VM registers. Only called as long as HCR_TVM
>   * is set. If the guest enables the MMU, we stop trapping the VM
> @@ -101,6 +118,9 @@ static bool access_vm_reg(struct kvm_vcpu *vcpu,
>  {
>  	bool was_enabled = vcpu_has_cache_enabled(vcpu);
>  
> +	if (forward_vm_traps(vcpu, p))
> +		return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
> +
>  	BUG_ON(!vcpu_mode_el2(vcpu) && !p->is_write);
>  
>  	if (!p->is_write) {
> -- 
> 1.9.1
> 
> 

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC 17/55] KVM: arm64: Trap SPSR_EL1, ELR_EL1 and VBAR_EL1 in virtual EL2
  2017-01-09  6:24 ` [RFC 17/55] KVM: arm64: Trap SPSR_EL1, ELR_EL1 and VBAR_EL1 in virtual EL2 Jintack Lim
@ 2017-02-22 11:40   ` Christoffer Dall
  0 siblings, 0 replies; 111+ messages in thread
From: Christoffer Dall @ 2017-02-22 11:40 UTC (permalink / raw)
  To: Jintack Lim
  Cc: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, vladimir.murzin, suzuki.poulose,
	mark.rutland, james.morse, lorenzo.pieralisi, kevin.brodsky,
	wcohen, shankerd, geoff, andre.przywara, eric.auger, anna-maria,
	shihwei, linux-arm-kernel, kvmarm, kvm, linux-kernel

On Mon, Jan 09, 2017 at 01:24:13AM -0500, Jintack Lim wrote:
> For the same reason that we trap virtual memory register accesses in
> virtual EL2, we need to trap SPSR_EL1, ELR_EL1 and VBAR_EL1 accesses.
> ARMv8.3 introduces the HCR_EL2.NV1 bit, which makes it possible to trap
> those register accesses in EL1. Do not set this bit until the whole
> nesting support is complete.

You'll only enable this feature for a non-VHE guest hypervisor, right?
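
Presumably yes: a VHE guest hypervisor accesses its guest's EL1 state
through the _EL12 encodings, so NV1 only makes sense while the virtual
HCR_EL2.E2H bit is clear. A sketch of that check, where
nested_virt_in_use() is an assumed helper (vcpu_el2_reg() and HCR_E2H
are from this series):

static bool vcpu_needs_nv1(struct kvm_vcpu *vcpu)
{
	/* NV1 only applies to a non-VHE guest hypervisor */
	return nested_virt_in_use(vcpu) &&
	       !(vcpu_el2_reg(vcpu, HCR_EL2) & HCR_E2H);
}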

> 
> Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
> ---
>  arch/arm64/kvm/sys_regs.c | 41 ++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 40 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 0f5d21b..19d6a6e 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -898,6 +898,38 @@ static bool access_cntp_cval(struct kvm_vcpu *vcpu,
>  	return true;
>  }
>  
> +static inline void access_rw(struct sys_reg_params *p, u64 *sysreg)
> +{
> +	if (!p->is_write)
> +		p->regval = *sysreg;
> +	else
> +		*sysreg = p->regval;
> +}
> +
> +static bool access_elr(struct kvm_vcpu *vcpu,
> +		struct sys_reg_params *p,
> +		const struct sys_reg_desc *r)
> +{
> +	access_rw(p, &vcpu->arch.ctxt.gp_regs.elr_el1);
> +	return true;
> +}
> +
> +static bool access_spsr(struct kvm_vcpu *vcpu,
> +		struct sys_reg_params *p,
> +		const struct sys_reg_desc *r)
> +{
> +	access_rw(p, &vcpu->arch.ctxt.gp_regs.spsr[KVM_SPSR_EL1]);
> +	return true;
> +}
> +
> +static bool access_vbar(struct kvm_vcpu *vcpu,
> +		struct sys_reg_params *p,
> +		const struct sys_reg_desc *r)
> +{
> +	access_rw(p, &vcpu_sys_reg(vcpu, r->reg));
> +	return true;
> +}
> +
>  static bool trap_el2_reg(struct kvm_vcpu *vcpu,
>  			 struct sys_reg_params *p,
>  			 const struct sys_reg_desc *r)
> @@ -1013,6 +1045,13 @@ static bool trap_el2_reg(struct kvm_vcpu *vcpu,
>  	{ Op0(0b11), Op1(0b000), CRn(0b0010), CRm(0b0000), Op2(0b010),
>  	  access_vm_reg, reset_val, TCR_EL1, 0 },
>  
> +	/* SPSR_EL1 */
> +	{ Op0(0b11), Op1(0b000), CRn(0b0100), CRm(0b0000), Op2(0b000),
> +	  access_spsr},
> +	/* ELR_EL1 */
> +	{ Op0(0b11), Op1(0b000), CRn(0b0100), CRm(0b0000), Op2(0b001),
> +	  access_elr},
> +
>  	/* AFSR0_EL1 */
>  	{ Op0(0b11), Op1(0b000), CRn(0b0101), CRm(0b0001), Op2(0b000),
>  	  access_vm_reg, reset_unknown, AFSR0_EL1 },
> @@ -1045,7 +1084,7 @@ static bool trap_el2_reg(struct kvm_vcpu *vcpu,
>  
>  	/* VBAR_EL1 */
>  	{ Op0(0b11), Op1(0b000), CRn(0b1100), CRm(0b0000), Op2(0b000),
> -	  NULL, reset_val, VBAR_EL1, 0 },
> +	  access_vbar, reset_val, VBAR_EL1, 0 },
>  
>  	/* ICC_SGI1R_EL1 */
>  	{ Op0(0b11), Op1(0b000), CRn(0b1100), CRm(0b1011), Op2(0b101),
> -- 
> 1.9.1
> 
> 

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC 18/55] KVM: arm64: Forward traps due to HCR_EL2.NV1 bit to the guest hypervisor
  2017-01-09  6:24 ` [RFC 18/55] KVM: arm64: Forward traps due to HCR_EL2.NV1 bit to the guest hypervisor Jintack Lim
@ 2017-02-22 11:41   ` Christoffer Dall
  0 siblings, 0 replies; 111+ messages in thread
From: Christoffer Dall @ 2017-02-22 11:41 UTC (permalink / raw)
  To: Jintack Lim
  Cc: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, vladimir.murzin, suzuki.poulose,
	mark.rutland, james.morse, lorenzo.pieralisi, kevin.brodsky,
	wcohen, shankerd, geoff, andre.przywara, eric.auger, anna-maria,
	shihwei, linux-arm-kernel, kvmarm, kvm, linux-kernel

On Mon, Jan 09, 2017 at 01:24:14AM -0500, Jintack Lim wrote:
> Forward ELR_EL1, SPSR_EL1 and VBAR_EL1 traps to the guest hypervisor if
> it has set the NV1 bit in the virtual HCR_EL2. The guest hypervisor
> would set this NV1 bit to run a hypervisor in its VM (i.e. another
> level of nested hypervisor).

Ah, so this is recursively supporting the NV1 bit?

> 
> Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
> ---
>  arch/arm64/include/asm/kvm_arm.h |  1 +
>  arch/arm64/kvm/sys_regs.c        | 17 +++++++++++++++++
>  2 files changed, 18 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
> index 2a2752b..feded61 100644
> --- a/arch/arm64/include/asm/kvm_arm.h
> +++ b/arch/arm64/include/asm/kvm_arm.h
> @@ -23,6 +23,7 @@
>  #include <asm/types.h>
>  
>  /* Hyp Configuration Register (HCR) bits */
> +#define HCR_NV1		(UL(1) << 43)
>  #define HCR_E2H		(UL(1) << 34)
>  #define HCR_ID		(UL(1) << 33)
>  #define HCR_CD		(UL(1) << 32)
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 19d6a6e..59f9cc6 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -906,10 +906,21 @@ static inline void access_rw(struct sys_reg_params *p, u64 *sysreg)
>  		*sysreg = p->regval;
>  }
>  
> +static bool forward_nv1_traps(struct kvm_vcpu *vcpu, struct sys_reg_params *p)
> +{
> +	if (!vcpu_mode_el2(vcpu) && (vcpu_el2_reg(vcpu, HCR_EL2) & HCR_NV1))
> +		return true;
> +
> +	return false;
> +}
> +
>  static bool access_elr(struct kvm_vcpu *vcpu,
>  		struct sys_reg_params *p,
>  		const struct sys_reg_desc *r)
>  {
> +	if (forward_nv1_traps(vcpu, p))
> +		return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
> +
>  	access_rw(p, &vcpu->arch.ctxt.gp_regs.elr_el1);
>  	return true;
>  }
> @@ -918,6 +929,9 @@ static bool access_spsr(struct kvm_vcpu *vcpu,
>  		struct sys_reg_params *p,
>  		const struct sys_reg_desc *r)
>  {
> +	if (forward_nv1_traps(vcpu, p))
> +		return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
> +
>  	access_rw(p, &vcpu->arch.ctxt.gp_regs.spsr[KVM_SPSR_EL1]);
>  	return true;
>  }
> @@ -926,6 +940,9 @@ static bool access_vbar(struct kvm_vcpu *vcpu,
>  		struct sys_reg_params *p,
>  		const struct sys_reg_desc *r)
>  {
> +	if (forward_nv1_traps(vcpu, p))
> +		return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
> +
>  	access_rw(p, &vcpu_sys_reg(vcpu, r->reg));
>  	return true;
>  }
> -- 
> 1.9.1
> 
> 

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC 21/55] KVM: arm64: Forward HVC instruction to the guest hypervisor
  2017-01-09  6:24 ` [RFC 21/55] KVM: arm64: Forward HVC instruction " Jintack Lim
@ 2017-02-22 11:47   ` Christoffer Dall
  2017-06-26 15:21     ` Jintack Lim
  0 siblings, 1 reply; 111+ messages in thread
From: Christoffer Dall @ 2017-02-22 11:47 UTC (permalink / raw)
  To: Jintack Lim
  Cc: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, vladimir.murzin, suzuki.poulose,
	mark.rutland, james.morse, lorenzo.pieralisi, kevin.brodsky,
	wcohen, shankerd, geoff, andre.przywara, eric.auger, anna-maria,
	shihwei, linux-arm-kernel, kvmarm, kvm, linux-kernel

On Mon, Jan 09, 2017 at 01:24:17AM -0500, Jintack Lim wrote:
> Forward exceptions due to hvc instruction to the guest hypervisor.
> 
> Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
> ---
>  arch/arm64/include/asm/kvm_nested.h |  5 +++++
>  arch/arm64/kvm/Makefile             |  1 +
>  arch/arm64/kvm/handle_exit.c        | 11 +++++++++++
>  arch/arm64/kvm/handle_exit_nested.c | 27 +++++++++++++++++++++++++++
>  4 files changed, 44 insertions(+)
>  create mode 100644 arch/arm64/include/asm/kvm_nested.h
>  create mode 100644 arch/arm64/kvm/handle_exit_nested.c
> 
> diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
> new file mode 100644
> index 0000000..620b4d3
> --- /dev/null
> +++ b/arch/arm64/include/asm/kvm_nested.h
> @@ -0,0 +1,5 @@
> +#ifndef __ARM64_KVM_NESTED_H__
> +#define __ARM64_KVM_NESTED_H__
> +
> +int handle_hvc_nested(struct kvm_vcpu *vcpu);
> +#endif
> diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
> index b342bdd..9c35e9a 100644
> --- a/arch/arm64/kvm/Makefile
> +++ b/arch/arm64/kvm/Makefile
> @@ -35,4 +35,5 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/irqchip.o
>  kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/arch_timer.o
>  kvm-$(CONFIG_KVM_ARM_PMU) += $(KVM)/arm/pmu.o
>  
> +kvm-$(CONFIG_KVM_ARM_NESTED_HYP) += handle_exit_nested.o
>  kvm-$(CONFIG_KVM_ARM_NESTED_HYP) += emulate-nested.o
> diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
> index a891684..208be16 100644
> --- a/arch/arm64/kvm/handle_exit.c
> +++ b/arch/arm64/kvm/handle_exit.c
> @@ -29,6 +29,10 @@
>  #include <asm/kvm_mmu.h>
>  #include <asm/kvm_psci.h>
>  
> +#ifdef CONFIG_KVM_ARM_NESTED_HYP
> +#include <asm/kvm_nested.h>
> +#endif
> +
>  #define CREATE_TRACE_POINTS
>  #include "trace.h"
>  
> @@ -42,6 +46,13 @@ static int handle_hvc(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  			    kvm_vcpu_hvc_get_imm(vcpu));
>  	vcpu->stat.hvc_exit_stat++;
>  
> +#ifdef CONFIG_KVM_ARM_NESTED_HYP
> +	ret = handle_hvc_nested(vcpu);
> +
> +	/* -EINVAL means the nested code did not handle this hvc */
> +	if (ret != -EINVAL)
> +		return ret;
> +#endif
>  	ret = kvm_psci_call(vcpu);
>  	if (ret < 0) {
>  		kvm_inject_undefined(vcpu);
> diff --git a/arch/arm64/kvm/handle_exit_nested.c b/arch/arm64/kvm/handle_exit_nested.c
> new file mode 100644
> index 0000000..a6ce23b
> --- /dev/null
> +++ b/arch/arm64/kvm/handle_exit_nested.c
> @@ -0,0 +1,27 @@
> +/*
> + * Copyright (C) 2016 - Columbia University
> + * Author: Jintack Lim <jintack@cs.columbia.edu>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include <linux/kvm.h>
> +#include <linux/kvm_host.h>
> +
> +#include <asm/kvm_emulate.h>
> +
> +/* We forward all hvc instruction to the guest hypervisor. */
> +int handle_hvc_nested(struct kvm_vcpu *vcpu)
> +{
> +	return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
> +}

I don't understand the logic here or in the caller above.  Do we really
forward *all* hvc calls to the guest hypervisor now, so that we no
longer support any hypercalls from the VM?  That seems a little rough
and probably requires some more discussion.
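
One way to keep host hypercalls working would be to forward only the
HVCs that were not executed in virtual EL2 itself, and let the guest
hypervisor's own HVCs fall through to the normal PSCI path. A sketch of
that policy, using the patch's convention that -EINVAL means "not
handled here":

int handle_hvc_nested(struct kvm_vcpu *vcpu)
{
	/* The guest hypervisor's own HVCs are hypercalls to the host */
	if (vcpu_mode_el2(vcpu))
		return -EINVAL;

	/* Everything below virtual EL2 belongs to the guest hypervisor */
	return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
}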

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC 25/55] KVM: arm/arm64: Let vcpu thread modify its own active state
  2017-01-09  6:24 ` [RFC 25/55] KVM: arm/arm64: Let vcpu thread modify its own active state Jintack Lim
@ 2017-02-22 12:27   ` Christoffer Dall
  0 siblings, 0 replies; 111+ messages in thread
From: Christoffer Dall @ 2017-02-22 12:27 UTC (permalink / raw)
  To: Jintack Lim
  Cc: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, vladimir.murzin, suzuki.poulose,
	mark.rutland, james.morse, lorenzo.pieralisi, kevin.brodsky,
	wcohen, shankerd, geoff, andre.przywara, eric.auger, anna-maria,
	shihwei, linux-arm-kernel, kvmarm, kvm, linux-kernel

On Mon, Jan 09, 2017 at 01:24:21AM -0500, Jintack Lim wrote:
> Currently, if a vcpu thread tries to change its own active state when
> the irq is already in the AP list, it'll loop forever. Since the VCPU
> thread has already synced back the LR state to the struct vgic_irq, let
> it modify its own state safely.
> 
> Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
> ---
>  virt/kvm/arm/vgic/vgic-mmio.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/virt/kvm/arm/vgic/vgic-mmio.c b/virt/kvm/arm/vgic/vgic-mmio.c
> index ebe1b9f..049c570 100644
> --- a/virt/kvm/arm/vgic/vgic-mmio.c
> +++ b/virt/kvm/arm/vgic/vgic-mmio.c
> @@ -192,9 +192,9 @@ static void vgic_mmio_change_active(struct kvm_vcpu *vcpu, struct vgic_irq *irq,
>  	 * If this virtual IRQ was written into a list register, we
>  	 * have to make sure the CPU that runs the VCPU thread has
>  	 * synced back LR state to the struct vgic_irq.  We can only
> -	 * know this for sure, when either this irq is not assigned to
> +	 * know this for sure, when this irq is not assigned to
>  	 * anyone's AP list anymore, or the VCPU thread is not
> -	 * running on any CPUs.
> +	 * running on any CPUs, or current thread is the VCPU thread.
>  	 *
>  	 * In the opposite case, we know the VCPU thread may be on its
>  	 * way back from the guest and still has to sync back this
> @@ -202,6 +202,7 @@ static void vgic_mmio_change_active(struct kvm_vcpu *vcpu, struct vgic_irq *irq,
>  	 * other thread sync back the IRQ.
>  	 */
>  	while (irq->vcpu && /* IRQ may have state in an LR somewhere */
> +	       irq->vcpu != vcpu && /* Current thread is not the VCPU thread */
>  	       irq->vcpu->cpu != -1) /* VCPU thread is running */
>  		cond_resched_lock(&irq->irq_lock);
>  
> -- 
> 1.9.1
> 
> 

This seems to be an independent fix, so please send it outside of this
series as an individual patch.

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC 27/55] KVM: arm/arm64: Emulate GICH interface on GICv2
  2017-01-09  6:24 ` [RFC 27/55] KVM: arm/arm64: Emulate GICH interface on GICv2 Jintack Lim
@ 2017-02-22 13:06   ` Christoffer Dall
  0 siblings, 0 replies; 111+ messages in thread
From: Christoffer Dall @ 2017-02-22 13:06 UTC (permalink / raw)
  To: Jintack Lim
  Cc: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, vladimir.murzin, suzuki.poulose,
	mark.rutland, james.morse, lorenzo.pieralisi, kevin.brodsky,
	wcohen, shankerd, geoff, andre.przywara, eric.auger, anna-maria,
	shihwei, linux-arm-kernel, kvmarm, kvm, linux-kernel

On Mon, Jan 09, 2017 at 01:24:23AM -0500, Jintack Lim wrote:
> Emulate GICH interface accesses from the guest hypervisor.
> 
> Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
> Signed-off-by: Shih-Wei Li <shihwei@cs.columbia.edu>
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> ---
>  arch/arm64/kvm/Makefile            |   1 +
>  virt/kvm/arm/vgic/vgic-v2-nested.c | 207 +++++++++++++++++++++++++++++++++++++
>  2 files changed, 208 insertions(+)
>  create mode 100644 virt/kvm/arm/vgic/vgic-v2-nested.c
> 
> diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
> index 9c35e9a..8573faf 100644
> --- a/arch/arm64/kvm/Makefile
> +++ b/arch/arm64/kvm/Makefile
> @@ -37,3 +37,4 @@ kvm-$(CONFIG_KVM_ARM_PMU) += $(KVM)/arm/pmu.o
>  
>  kvm-$(CONFIG_KVM_ARM_NESTED_HYP) += handle_exit_nested.o
>  kvm-$(CONFIG_KVM_ARM_NESTED_HYP) += emulate-nested.o
> +kvm-$(CONFIG_KVM_ARM_NESTED_HYP) += $(KVM)/arm/vgic/vgic-v2-nested.o
> diff --git a/virt/kvm/arm/vgic/vgic-v2-nested.c b/virt/kvm/arm/vgic/vgic-v2-nested.c
> new file mode 100644
> index 0000000..b13128e
> --- /dev/null
> +++ b/virt/kvm/arm/vgic/vgic-v2-nested.c
> @@ -0,0 +1,207 @@
> +#include <linux/cpu.h>
> +#include <linux/kvm.h>
> +#include <linux/kvm_host.h>
> +#include <linux/interrupt.h>
> +#include <linux/io.h>
> +#include <linux/uaccess.h>
> +
> +#include <linux/irqchip/arm-gic.h>
> +
> +#include <asm/kvm_emulate.h>
> +#include <asm/kvm_arm.h>
> +#include <asm/kvm_mmu.h>
> +#include <kvm/arm_vgic.h>
> +
> +#include "vgic.h"
> +#include "vgic-mmio.h"
> +
> +static inline struct vgic_v2_cpu_if *vcpu_nested_if(struct kvm_vcpu *vcpu)
> +{
> +	return &vcpu->arch.vgic_cpu.nested_vgic_v2;
> +}
> +
> +static inline struct vgic_v2_cpu_if *vcpu_shadow_if(struct kvm_vcpu *vcpu)
> +{
> +	return &vcpu->arch.vgic_cpu.shadow_vgic_v2;
> +}
> +
> +static unsigned long vgic_mmio_read_v2_vtr(struct kvm_vcpu *vcpu,
> +					   gpa_t addr, unsigned int len)
> +{
> +	u32 reg;
> +
> +	reg = kvm_vgic_global_state.nr_lr - 1;
> +	reg |= 0b100 << 26;
> +	reg |= 0b100 << 29;

Pure magic?  Can we have some defines?  Have you checked whether the
existing header file already has defines for this?
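
For reference, the GICv2 GICH_VTR layout is ListRegs in bits [4:0],
PREbits in [28:26] and PRIbits in [31:29], so the 0b100 values encode
five preemption/priority bits (32 levels). A sketch with assumed define
names (mainline only defines GICH_VTR itself):

#define GICH_VTR_PREbits_SHIFT	26	/* assumed name */
#define GICH_VTR_PRIbits_SHIFT	29	/* assumed name */

	reg  = kvm_vgic_global_state.nr_lr - 1;		/* ListRegs */
	reg |= (5 - 1) << GICH_VTR_PREbits_SHIFT;	/* 5 preemption bits */
	reg |= (5 - 1) << GICH_VTR_PRIbits_SHIFT;	/* 5 priority bits */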

> +
> +	return reg;
> +}
> +
> +static inline bool lr_triggers_eoi(u32 lr)
> +{
> +	return !(lr & (GICH_LR_STATE | GICH_LR_HW)) && (lr & GICH_LR_EOI);
> +}
> +
> +static unsigned long get_eisr(struct kvm_vcpu *vcpu, bool upper_reg)
> +{
> +	struct vgic_v2_cpu_if *cpu_if = vcpu_nested_if(vcpu);
> +	int max_lr = upper_reg ? 64 : 32;
> +	int min_lr = upper_reg ? 32 : 0;
> +	int nr_lr = min(kvm_vgic_global_state.nr_lr, max_lr);
> +	int i;
> +	u32 reg = 0;

So the assumption here is that we can only emulate a virtual GICH
interface with the same number of LRs that the hardware has, yes?  Can
you document this assumption in the commit message and explain how we
deal with nr_lr for all this logic based on that?

> +
> +	for (i = min_lr; i < nr_lr; i++) {
> +		if (lr_triggers_eoi(cpu_if->vgic_lr[i]))
> +			reg |= BIT(i - min_lr);
> +	}
> +
> +	return reg;
> +}
> +
> +static unsigned long vgic_mmio_read_v2_eisr0(struct kvm_vcpu *vcpu,
> +					     gpa_t addr, unsigned int len)
> +{
> +	return get_eisr(vcpu, false);
> +}
> +
> +static unsigned long vgic_mmio_read_v2_eisr1(struct kvm_vcpu *vcpu,
> +					     gpa_t addr, unsigned int len)
> +{
> +	return get_eisr(vcpu, true);
> +}
> +
> +static u32 get_elrsr(struct kvm_vcpu *vcpu, bool upper_reg)
> +{
> +	struct vgic_v2_cpu_if *cpu_if = vcpu_nested_if(vcpu);
> +	int max_lr = upper_reg ? 64 : 32;
> +	int min_lr = upper_reg ? 32 : 0;
> +	int nr_lr = min(kvm_vgic_global_state.nr_lr, max_lr);
> +	u32 reg = 0;
> +	int i;
> +
> +	for (i = min_lr; i < nr_lr; i++) {
> +		if (!(cpu_if->vgic_lr[i] & GICH_LR_STATE))
> +			reg |= BIT(i - min_lr);
> +	}
> +
> +	return reg;
> +}
> +
> +static unsigned long vgic_mmio_read_v2_elrsr0(struct kvm_vcpu *vcpu,
> +					      gpa_t addr, unsigned int len)
> +{
> +	return get_elrsr(vcpu, false);
> +}
> +
> +static unsigned long vgic_mmio_read_v2_elrsr1(struct kvm_vcpu *vcpu,
> +					      gpa_t addr, unsigned int len)
> +{
> +	return get_elrsr(vcpu, true);
> +}
> +
> +static unsigned long vgic_mmio_read_v2_misr(struct kvm_vcpu *vcpu,
> +					    gpa_t addr, unsigned int len)
> +{
> +	struct vgic_v2_cpu_if *cpu_if = vcpu_nested_if(vcpu);
> +	int nr_lr = kvm_vgic_global_state.nr_lr;
> +	u32 reg = 0;
> +
> +	if (vgic_mmio_read_v2_eisr0(vcpu, addr, len) ||
> +			vgic_mmio_read_v2_eisr1(vcpu, addr, len))
> +		reg |= GICH_MISR_EOI;
> +
> +	if (cpu_if->vgic_hcr & GICH_HCR_UIE) {
> +		u32 elrsr0 = vgic_mmio_read_v2_elrsr0(vcpu, addr, len);
> +		u32 elrsr1 = vgic_mmio_read_v2_elrsr1(vcpu, addr, len);
> +		int used_lrs;
> +
> +		used_lrs = nr_lr - (hweight32(elrsr0) + hweight32(elrsr1));
> +		if (used_lrs <= 1)
> +			reg |= GICH_MISR_U;
> +	}
> +
> +	/* TODO: Support remaining bits in this register */

Is this going to happen in this series?  Why don't we just do it here?

> +	return reg;
> +}
> +
> +static unsigned long vgic_mmio_read_v2_gich(struct kvm_vcpu *vcpu,
> +					    gpa_t addr, unsigned int len)
> +{
> +	struct vgic_v2_cpu_if *cpu_if = vcpu_nested_if(vcpu);
> +	u32 value;
> +
> +	switch (addr & 0xfff) {
> +	case GICH_HCR:
> +		value = cpu_if->vgic_hcr;
> +		break;
> +	case GICH_VMCR:
> +		value = cpu_if->vgic_vmcr;
> +		break;
> +	case GICH_APR:
> +		value = cpu_if->vgic_apr;
> +		break;
> +	case GICH_LR0 ... (GICH_LR0 + 4 * (VGIC_V2_MAX_LRS - 1)):
> +		value = cpu_if->vgic_lr[(addr & 0xff) >> 2];
> +		break;
> +	default:
> +		return 0;
> +	}
> +
> +	return value;
> +}
> +
> +static void vgic_mmio_write_v2_gich(struct kvm_vcpu *vcpu,
> +				    gpa_t addr, unsigned int len,
> +				    unsigned long val)
> +{
> +	struct vgic_v2_cpu_if *cpu_if = vcpu_nested_if(vcpu);
> +
> +	switch (addr & 0xfff) {
> +	case GICH_HCR:
> +		cpu_if->vgic_hcr = val;
> +		break;
> +	case GICH_VMCR:
> +		cpu_if->vgic_vmcr = val;
> +		break;
> +	case GICH_APR:
> +		cpu_if->vgic_apr = val;
> +		break;
> +	case GICH_LR0 ... (GICH_LR0 + 4 * (VGIC_V2_MAX_LRS - 1)):
> +		cpu_if->vgic_lr[(addr & 0xff) >> 2] = val;

Don't you need to check if we actually support this particular LR?
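
A bounds check against the LR count the hardware actually implements
would be straightforward, e.g. (sketch only):

	case GICH_LR0 ... (GICH_LR0 + 4 * (VGIC_V2_MAX_LRS - 1)): {
		int n = (addr & 0xff) >> 2;

		/* Ignore writes to LRs the hardware does not implement */
		if (n < kvm_vgic_global_state.nr_lr)
			cpu_if->vgic_lr[n] = val;
		break;
	}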

> +		break;
> +	}
> +}
> +
> +static const struct vgic_register_region vgic_v2_gich_registers[] = {
> +	REGISTER_DESC_WITH_LENGTH(GICH_HCR,
> +		vgic_mmio_read_v2_gich, vgic_mmio_write_v2_gich, 4,
> +		VGIC_ACCESS_32bit),
> +	REGISTER_DESC_WITH_LENGTH(GICH_VTR,
> +		vgic_mmio_read_v2_vtr, vgic_mmio_write_wi, 4,
> +		VGIC_ACCESS_32bit),
> +	REGISTER_DESC_WITH_LENGTH(GICH_VMCR,
> +		vgic_mmio_read_v2_gich, vgic_mmio_write_v2_gich, 4,
> +		VGIC_ACCESS_32bit),
> +	REGISTER_DESC_WITH_LENGTH(GICH_MISR,
> +		vgic_mmio_read_v2_misr, vgic_mmio_write_wi, 4,
> +		VGIC_ACCESS_32bit),
> +	REGISTER_DESC_WITH_LENGTH(GICH_EISR0,
> +		vgic_mmio_read_v2_eisr0, vgic_mmio_write_wi, 4,
> +		VGIC_ACCESS_32bit),
> +	REGISTER_DESC_WITH_LENGTH(GICH_EISR1,
> +		vgic_mmio_read_v2_eisr1, vgic_mmio_write_wi, 4,
> +		VGIC_ACCESS_32bit),
> +	REGISTER_DESC_WITH_LENGTH(GICH_ELRSR0,
> +		vgic_mmio_read_v2_elrsr0, vgic_mmio_write_wi, 4,
> +		VGIC_ACCESS_32bit),
> +	REGISTER_DESC_WITH_LENGTH(GICH_ELRSR1,
> +		vgic_mmio_read_v2_elrsr1, vgic_mmio_write_wi, 4,
> +		VGIC_ACCESS_32bit),
> +	REGISTER_DESC_WITH_LENGTH(GICH_APR,
> +		vgic_mmio_read_v2_gich, vgic_mmio_write_v2_gich, 4,
> +		VGIC_ACCESS_32bit),
> +	REGISTER_DESC_WITH_LENGTH(GICH_LR0,
> +		vgic_mmio_read_v2_gich, vgic_mmio_write_v2_gich,
> +		4 * VGIC_V2_MAX_LRS, VGIC_ACCESS_32bit),
> +};
> -- 
> 1.9.1
> 
> 

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC 28/55] KVM: arm/arm64: Prepare vgic state for the nested VM
  2017-01-09  6:24 ` [RFC 28/55] KVM: arm/arm64: Prepare vgic state for the nested VM Jintack Lim
@ 2017-02-22 13:12   ` Christoffer Dall
  0 siblings, 0 replies; 111+ messages in thread
From: Christoffer Dall @ 2017-02-22 13:12 UTC (permalink / raw)
  To: Jintack Lim
  Cc: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, vladimir.murzin, suzuki.poulose,
	mark.rutland, james.morse, lorenzo.pieralisi, kevin.brodsky,
	wcohen, shankerd, geoff, andre.przywara, eric.auger, anna-maria,
	shihwei, linux-arm-kernel, kvmarm, kvm, linux-kernel

On Mon, Jan 09, 2017 at 01:24:24AM -0500, Jintack Lim wrote:
> When entering a nested VM, we set up the hypervisor control interface
> based on what the guest hypervisor has set. In particular, we check
> each list register written by the guest hypervisor to see whether the
> HW bit is set.  If so, we translate the hw irq number from the guest's
> point of view to the real hardware irq number, if there is a mapping.

Does that really always work?

Are there not some assumptions that the virtual device the guest
hypervisor is mapping the virtual IRQ to also exists as an equivalent
device with some connected state on the host?

Thanks,
-Christoffer

> 
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
> ---
>  arch/arm/include/asm/kvm_emulate.h   |  5 ++
>  arch/arm64/include/asm/kvm_emulate.h |  5 ++
>  arch/arm64/kvm/context.c             |  4 ++
>  include/kvm/arm_vgic.h               |  8 +++
>  virt/kvm/arm/vgic/vgic-init.c        |  3 ++
>  virt/kvm/arm/vgic/vgic-v2-nested.c   | 99 ++++++++++++++++++++++++++++++++++++
>  virt/kvm/arm/vgic/vgic.h             | 11 ++++
>  7 files changed, 135 insertions(+)
> 
> diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
> index 0fa2f5a..05d5906 100644
> --- a/arch/arm/include/asm/kvm_emulate.h
> +++ b/arch/arm/include/asm/kvm_emulate.h
> @@ -101,6 +101,11 @@ static inline bool vcpu_mode_el2(const struct kvm_vcpu *vcpu)
>  	return false;
>  }
>  
> +static inline bool vcpu_el2_imo_is_set(const struct kvm_vcpu *vcpu)
> +{
> +	return false;
> +}
> +
>  static inline unsigned long *vcpu_pc(struct kvm_vcpu *vcpu)
>  {
>  	return &vcpu->arch.ctxt.gp_regs.usr_regs.ARM_pc;
> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> index 0987ee4..a9c993f 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -178,6 +178,11 @@ static inline bool vcpu_mode_el2(const struct kvm_vcpu *vcpu)
>  	return mode == PSR_MODE_EL2h || mode == PSR_MODE_EL2t;
>  }
>  
> +static inline bool vcpu_el2_imo_is_set(const struct kvm_vcpu *vcpu)
> +{
> +	return (vcpu_el2_reg(vcpu, HCR_EL2) & HCR_IMO);
> +}
> +
>  static inline u32 kvm_vcpu_get_hsr(const struct kvm_vcpu *vcpu)
>  {
>  	return vcpu->arch.fault.esr_el2;
> diff --git a/arch/arm64/kvm/context.c b/arch/arm64/kvm/context.c
> index 0025dd9..7a94c9d 100644
> --- a/arch/arm64/kvm/context.c
> +++ b/arch/arm64/kvm/context.c
> @@ -161,6 +161,8 @@ void kvm_arm_setup_shadow_state(struct kvm_vcpu *vcpu)
>  		ctxt->hw_sys_regs = ctxt->sys_regs;
>  		ctxt->hw_sp_el1 = ctxt->gp_regs.sp_el1;
>  	}
> +
> +	vgic_v2_setup_shadow_state(vcpu);
>  }
>  
>  /**
> @@ -179,6 +181,8 @@ void kvm_arm_restore_shadow_state(struct kvm_vcpu *vcpu)
>  		*vcpu_cpsr(vcpu) = ctxt->hw_pstate;
>  		ctxt->gp_regs.sp_el1 = ctxt->hw_sp_el1;
>  	}
> +
> +	vgic_v2_restore_shadow_state(vcpu);
>  }
>  
>  void kvm_arm_init_cpu_context(kvm_cpu_context_t *cpu_ctxt)
> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
> index 9a9cb27..484f6b1 100644
> --- a/include/kvm/arm_vgic.h
> +++ b/include/kvm/arm_vgic.h
> @@ -312,6 +312,14 @@ int kvm_vgic_inject_mapped_irq(struct kvm *kvm, int cpuid, unsigned int intid,
>  
>  int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu);
>  
> +#ifdef CONFIG_KVM_ARM_NESTED_HYP
> +void vgic_v2_setup_shadow_state(struct kvm_vcpu *vcpu);
> +void vgic_v2_restore_shadow_state(struct kvm_vcpu *vcpu);
> +#else
> +static inline void vgic_v2_setup_shadow_state(struct kvm_vcpu *vcpu) { }
> +static inline void vgic_v2_restore_shadow_state(struct kvm_vcpu *vcpu) { }
> +#endif
> +
>  #define irqchip_in_kernel(k)	(!!((k)->arch.vgic.in_kernel))
>  #define vgic_initialized(k)	((k)->arch.vgic.initialized)
>  #define vgic_ready(k)		((k)->arch.vgic.ready)
> diff --git a/virt/kvm/arm/vgic/vgic-init.c b/virt/kvm/arm/vgic/vgic-init.c
> index 8cebfbc..06ab8a5 100644
> --- a/virt/kvm/arm/vgic/vgic-init.c
> +++ b/virt/kvm/arm/vgic/vgic-init.c
> @@ -216,6 +216,9 @@ static void kvm_vgic_vcpu_init(struct kvm_vcpu *vcpu)
>  			irq->config = VGIC_CONFIG_LEVEL;
>  		}
>  	}
> +
> +	vgic_init_nested(vcpu);
> +
>  	if (kvm_vgic_global_state.type == VGIC_V2)
>  		vgic_v2_enable(vcpu);
>  	else
> diff --git a/virt/kvm/arm/vgic/vgic-v2-nested.c b/virt/kvm/arm/vgic/vgic-v2-nested.c
> index b13128e..a992da5 100644
> --- a/virt/kvm/arm/vgic/vgic-v2-nested.c
> +++ b/virt/kvm/arm/vgic/vgic-v2-nested.c
> @@ -205,3 +205,102 @@ static void vgic_mmio_write_v2_gich(struct kvm_vcpu *vcpu,
>  		vgic_mmio_read_v2_gich, vgic_mmio_write_v2_gich,
>  		4 * VGIC_V2_MAX_LRS, VGIC_ACCESS_32bit),
>  };
> +
> +/*
> + * For LRs which have HW bit set such as timer interrupts, we modify them to
> + * have the host hardware interrupt number instead of the virtual one programmed
> + * by the guest hypervisor.
> + */
> +static void vgic_v2_create_shadow_lr(struct kvm_vcpu *vcpu)
> +{
> +	int i;
> +	struct vgic_v2_cpu_if *cpu_if = vcpu_nested_if(vcpu);
> +	struct vgic_v2_cpu_if *s_cpu_if = vcpu_shadow_if(vcpu);
> +	struct vgic_irq *irq;
> +
> +	int nr_lr = kvm_vgic_global_state.nr_lr;
> +
> +	for (i = 0; i < nr_lr; i++) {
> +		u32 lr = cpu_if->vgic_lr[i];
> +		int l1_irq;
> +
> +		if (!(lr & GICH_LR_HW))
> +			goto next;
> +
> +		/* We have the HW bit set */
> +		l1_irq = (lr & GICH_LR_PHYSID_CPUID) >>
> +			GICH_LR_PHYSID_CPUID_SHIFT;
> +		irq = vgic_get_irq(vcpu->kvm, vcpu, l1_irq);
> +
> +		if (!irq->hw) {
> +			/* There was no real mapping, so nuke the HW bit */
> +			lr &= ~GICH_LR_HW;
> +			vgic_put_irq(vcpu->kvm, irq);
> +			goto next;
> +		}
> +
> +		/* Translate the virtual mapping to the real one */
> +		lr &= ~GICH_LR_EOI;
> +		lr &= ~GICH_LR_PHYSID_CPUID;
> +		lr |= irq->hwintid << GICH_LR_PHYSID_CPUID_SHIFT;
> +		vgic_put_irq(vcpu->kvm, irq);
> +
> +next:
> +		s_cpu_if->vgic_lr[i] = lr;
> +	}
> +}
> +
> +/*
> + * Change the shadow HWIRQ field back to the virtual value before copying over
> + * the entire shadow struct to the nested state.
> + */
> +static void vgic_v2_restore_shadow_lr(struct kvm_vcpu *vcpu)
> +{
> +	struct vgic_v2_cpu_if *cpu_if = vcpu_nested_if(vcpu);
> +	struct vgic_v2_cpu_if *s_cpu_if = vcpu_shadow_if(vcpu);
> +	int nr_lr = kvm_vgic_global_state.nr_lr;
> +	int lr;
> +
> +	for (lr = 0; lr < nr_lr; lr++) {
> +		s_cpu_if->vgic_lr[lr] &= ~GICH_LR_PHYSID_CPUID;
> +		s_cpu_if->vgic_lr[lr] |= cpu_if->vgic_lr[lr] &
> +			GICH_LR_PHYSID_CPUID;
> +	}
> +}
> +
> +void vgic_v2_setup_shadow_state(struct kvm_vcpu *vcpu)
> +{
> +	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
> +	struct vgic_v2_cpu_if *cpu_if;
> +
> +	if (vcpu_el2_imo_is_set(vcpu) && !vcpu_mode_el2(vcpu)) {
> +		vgic_cpu->shadow_vgic_v2 = vgic_cpu->nested_vgic_v2;
> +		vgic_v2_create_shadow_lr(vcpu);
> +		cpu_if = vcpu_shadow_if(vcpu);
> +	} else {
> +		cpu_if = &vgic_cpu->vgic_v2;
> +	}
> +
> +	vgic_cpu->hw_v2_cpu_if = cpu_if;
> +}
> +
> +void vgic_v2_restore_shadow_state(struct kvm_vcpu *vcpu)
> +{
> +	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
> +
> +	/* Not using shadow state: Nothing to do... */
> +	if (vgic_cpu->hw_v2_cpu_if == &vgic_cpu->vgic_v2)
> +		return;
> +
> +	/*
> +	 * Translate the shadow state HW fields back to the virtual ones
> +	 * before copying the shadow struct back to the nested one.
> +	 */
> +	vgic_v2_restore_shadow_lr(vcpu);
> +	vgic_cpu->nested_vgic_v2 = vgic_cpu->shadow_vgic_v2;
> +}
> +
> +void vgic_init_nested(struct kvm_vcpu *vcpu)
> +{
> +	vgic_v2_setup_shadow_state(vcpu);
> +}
> diff --git a/virt/kvm/arm/vgic/vgic.h b/virt/kvm/arm/vgic/vgic.h
> index 9d9e014..2aef680 100644
> --- a/virt/kvm/arm/vgic/vgic.h
> +++ b/virt/kvm/arm/vgic/vgic.h
> @@ -120,4 +120,15 @@ static inline int vgic_its_inject_msi(struct kvm *kvm, struct kvm_msi *msi)
>  int vgic_lazy_init(struct kvm *kvm);
>  int vgic_init(struct kvm *kvm);
>  
> +#ifdef CONFIG_KVM_ARM_NESTED_HYP
> +void vgic_init_nested(struct kvm_vcpu *vcpu);
> +#else
> +static inline void vgic_init_nested(struct kvm_vcpu *vcpu)
> +{
> +	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
> +
> +	vgic_cpu->hw_v2_cpu_if = &vgic_cpu->vgic_v2;
> +}
> +#endif
> +
>  #endif
> -- 
> 1.9.1
> 
> 

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC 30/55] KVM: arm/arm64: Inject irqs to the guest hypervisor
  2017-01-09  6:24 ` [RFC 30/55] KVM: arm/arm64: Inject irqs to the guest hypervisor Jintack Lim
@ 2017-02-22 13:16   ` Christoffer Dall
  0 siblings, 0 replies; 111+ messages in thread
From: Christoffer Dall @ 2017-02-22 13:16 UTC (permalink / raw)
  To: Jintack Lim
  Cc: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, vladimir.murzin, suzuki.poulose,
	mark.rutland, james.morse, lorenzo.pieralisi, kevin.brodsky,
	wcohen, shankerd, geoff, andre.przywara, eric.auger, anna-maria,
	shihwei, linux-arm-kernel, kvmarm, kvm, linux-kernel

On Mon, Jan 09, 2017 at 01:24:26AM -0500, Jintack Lim wrote:
> If we have a pending IRQ for the guest and the guest expects IRQs
> to be handled in its virtual EL2 mode (the virtual IMO bit is set)
> and it is not already running in virtual EL2 mode, then we have to
> emulate an IRQ exception.
> 
> Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> ---
>  virt/kvm/arm/vgic/vgic.c | 23 +++++++++++++++++++++++
>  1 file changed, 23 insertions(+)
> 
> diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
> index 6440b56..4a98654 100644
> --- a/virt/kvm/arm/vgic/vgic.c
> +++ b/virt/kvm/arm/vgic/vgic.c
> @@ -17,6 +17,7 @@
>  #include <linux/kvm.h>
>  #include <linux/kvm_host.h>
>  #include <linux/list_sort.h>
> +#include <asm/kvm_emulate.h>
>  
>  #include "vgic.h"
>  
> @@ -652,6 +653,28 @@ static void vgic_flush_lr_state(struct kvm_vcpu *vcpu)
>  	/* Nuke remaining LRs */
>  	for ( ; count < kvm_vgic_global_state.nr_lr; count++)
>  		vgic_clear_lr(vcpu, count);
> +
> +	/*
> +	 * If we have any pending IRQ for the guest and the guest expects IRQs
> +	 * to be handled in its virtual EL2 mode (the virtual IMO bit is set)
> +	 * and it is not already running in virtual EL2 mode, then we have to
> +	 * emulate a virtual IRQ exception. Note that a pending IRQ here
> +	 * means an IRQ whose state is pending but not active.
> +	 */
> +	if (vcpu_el2_imo_is_set(vcpu) && !vcpu_mode_el2(vcpu)) {

Is this correct?

Shouldn't you also inject this to virtual EL2 even when virtual EL2 is
already running as long as the PSTATE.I bit is clear?
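
In other words, something along these lines (a sketch of the suggested
condition; vcpu_cpsr() and PSR_I_BIT are the existing arm64 helpers):

static bool nested_irq_deliverable(struct kvm_vcpu *vcpu)
{
	if (!vcpu_el2_imo_is_set(vcpu))
		return false;

	/* Inside virtual EL2 itself, delivery is gated by PSTATE.I */
	if (vcpu_mode_el2(vcpu))
		return !(*vcpu_cpsr(vcpu) & PSR_I_BIT);

	return true;
}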

> +		bool pending = false;
> +
> +		list_for_each_entry(irq, &vgic_cpu->ap_list_head, ap_list) {

You need to take a lock when iterating over this list.

> +			spin_lock(&irq->irq_lock);
> +			pending = irq->pending && irq->enabled && !irq->active;
> +			spin_unlock(&irq->irq_lock);
> +
> +			if (pending) {
> +				kvm_inject_nested_irq(vcpu);
> +				break;
> +			}
> +		}

I would prefer to see this check that loops over the AP list as a
separate function that you call, like vgic_vcpu_has_pending_irq.
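
Such a helper, taking the AP-list lock that the loop above is missing,
might look like this (sketch; the field names follow the structures
already used in this series):

static bool vgic_vcpu_has_pending_irq(struct kvm_vcpu *vcpu)
{
	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
	struct vgic_irq *irq;
	bool pending = false;

	spin_lock(&vgic_cpu->ap_list_lock);
	list_for_each_entry(irq, &vgic_cpu->ap_list_head, ap_list) {
		spin_lock(&irq->irq_lock);
		pending = irq->pending && irq->enabled && !irq->active;
		spin_unlock(&irq->irq_lock);
		if (pending)
			break;
	}
	spin_unlock(&vgic_cpu->ap_list_lock);

	return pending;
}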

> +	}
>  }
>  
>  /* Sync back the hardware VGIC state into our emulation after a guest's run. */
> -- 
> 1.9.1
> 
> 

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC 31/55] KVM: arm/arm64: Inject maintenance interrupts to the guest hypervisor
  2017-01-09  6:24 ` [RFC 31/55] KVM: arm/arm64: Inject maintenance interrupts " Jintack Lim
@ 2017-02-22 13:19   ` Christoffer Dall
  0 siblings, 0 replies; 111+ messages in thread
From: Christoffer Dall @ 2017-02-22 13:19 UTC (permalink / raw)
  To: Jintack Lim
  Cc: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, vladimir.murzin, suzuki.poulose,
	mark.rutland, james.morse, lorenzo.pieralisi, kevin.brodsky,
	wcohen, shankerd, geoff, andre.przywara, eric.auger, anna-maria,
	shihwei, linux-arm-kernel, kvmarm, kvm, linux-kernel

On Mon, Jan 09, 2017 at 01:24:27AM -0500, Jintack Lim wrote:
> From: Christoffer Dall <christoffer.dall@linaro.org>
> 
> If we exit a nested VM with a pending maintenance interrupt from the
> GIC, then we need to forward this to the guest hypervisor so that it can
> re-sync the appropriate LRs and sample level triggered interrupts again.
> 
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
> ---
>  arch/arm64/kvm/context.c           |  3 +++
>  include/kvm/arm_vgic.h             |  2 ++
>  virt/kvm/arm/vgic/vgic-v2-nested.c | 16 ++++++++++++++++
>  3 files changed, 21 insertions(+)
> 
> diff --git a/arch/arm64/kvm/context.c b/arch/arm64/kvm/context.c
> index 7a94c9d..a93ffe4 100644
> --- a/arch/arm64/kvm/context.c
> +++ b/arch/arm64/kvm/context.c
> @@ -140,6 +140,9 @@ static void sync_shadow_el1_state(struct kvm_vcpu *vcpu, bool setup)
>  void kvm_arm_setup_shadow_state(struct kvm_vcpu *vcpu)
>  {
>  	struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
> +
> +	vgic_handle_nested_maint_irq(vcpu);
> +

It feels like I stuck this in some random place where it would work, but
now it looks weird to call a vgic function from the shadow_el1_state
function.  Can we find a more appropriate place to put it?

>  	if (unlikely(vcpu_mode_el2(vcpu))) {
>  		ctxt->hw_pstate = *vcpu_cpsr(vcpu) & ~PSR_MODE_MASK;
>  
> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
> index 484f6b1..fc882d6 100644
> --- a/include/kvm/arm_vgic.h
> +++ b/include/kvm/arm_vgic.h
> @@ -315,9 +315,11 @@ int kvm_vgic_inject_mapped_irq(struct kvm *kvm, int cpuid, unsigned int intid,
>  #ifdef CONFIG_KVM_ARM_NESTED_HYP
>  void vgic_v2_setup_shadow_state(struct kvm_vcpu *vcpu);
>  void vgic_v2_restore_shadow_state(struct kvm_vcpu *vcpu);
> +void vgic_handle_nested_maint_irq(struct kvm_vcpu *vcpu);
>  #else
>  static inline void vgic_v2_setup_shadow_state(struct kvm_vcpu *vcpu) { }
>  static inline void vgic_v2_restore_shadow_state(struct kvm_vcpu *vcpu) { }
> +static inline void vgic_handle_nested_maint_irq(struct kvm_vcpu *vcpu) { }
>  #endif
>  
>  #define irqchip_in_kernel(k)	(!!((k)->arch.vgic.in_kernel))
> diff --git a/virt/kvm/arm/vgic/vgic-v2-nested.c b/virt/kvm/arm/vgic/vgic-v2-nested.c
> index a992da5..85f646b 100644
> --- a/virt/kvm/arm/vgic/vgic-v2-nested.c
> +++ b/virt/kvm/arm/vgic/vgic-v2-nested.c
> @@ -300,6 +300,22 @@ void vgic_v2_restore_shadow_state(struct kvm_vcpu *vcpu)
>  	vgic_cpu->nested_vgic_v2 = vgic_cpu->shadow_vgic_v2;
>  }
>  
> +void vgic_handle_nested_maint_irq(struct kvm_vcpu *vcpu)
> +{
> +	struct vgic_v2_cpu_if *cpu_if = vcpu_nested_if(vcpu);
> +
> +	/*
> +	 * If we exit a nested VM with a pending maintenance interrupt from the
> +	 * GIC, then we need to forward this to the guest hypervisor so that it
> +	 * can re-sync the appropriate LRs and sample level triggered interrupts
> +	 * again.
> +	 */
> +	if (vcpu_el2_imo_is_set(vcpu) && !vcpu_mode_el2(vcpu) &&

I think I made the same mistake here, that we shouldn't check if we're
in EL2 or not, because I don't think the GIC cares.  I think we should
check if EL2 can accept interrupts (i.e. PSTATE.I is clear if it is in
EL2 or IMO is set if not).
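
I.e. roughly (untested):

	bool el2_can_take_irqs;

	if (vcpu_mode_el2(vcpu))
		el2_can_take_irqs = !(*vcpu_cpsr(vcpu) & PSR_I_BIT);
	else
		el2_can_take_irqs = vcpu_el2_imo_is_set(vcpu);

	if (el2_can_take_irqs &&
	    (cpu_if->vgic_hcr & GICH_HCR_EN) &&
	    vgic_mmio_read_v2_misr(vcpu, 0, 0))
		kvm_inject_nested_irq(vcpu);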

> +	    (cpu_if->vgic_hcr & GICH_HCR_EN) &&
> +	    vgic_mmio_read_v2_misr(vcpu, 0, 0))

what are the zeroes?  They look dodgy.

> +		kvm_inject_nested_irq(vcpu);
> +}
> +
>  void vgic_init_nested(struct kvm_vcpu *vcpu)
>  {
>  	vgic_v2_setup_shadow_state(vcpu);
> -- 
> 1.9.1
> 
> 

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC 32/55] KVM: arm/arm64: register GICH iodev for the guest hypervisor
  2017-01-09  6:24 ` [RFC 32/55] KVM: arm/arm64: register GICH iodev for " Jintack Lim
@ 2017-02-22 13:21   ` Christoffer Dall
  0 siblings, 0 replies; 111+ messages in thread
From: Christoffer Dall @ 2017-02-22 13:21 UTC (permalink / raw)
  To: Jintack Lim
  Cc: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, vladimir.murzin, suzuki.poulose,
	mark.rutland, james.morse, lorenzo.pieralisi, kevin.brodsky,
	wcohen, shankerd, geoff, andre.przywara, eric.auger, anna-maria,
	shihwei, linux-arm-kernel, kvmarm, kvm, linux-kernel

On Mon, Jan 09, 2017 at 01:24:28AM -0500, Jintack Lim wrote:
> Register a device for the virtual interface control block(GICH) access
> from the guest hypervisor.
> 
> TODO: Get GICH address from DT, which is hardcoded now.
> 

It's not so much about the DT as it is about adding an API for userspace
to tell KVM where to place it; userspace can then add the required info
to the DT/ACPI as needed.

This is obviously something we have to address sooner as opposed to
later.
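
I would expect this to mirror the existing dist/cpu placement, e.g.
something along these lines (sketch only, the attribute value is made
up):

	/* arch/arm64/include/uapi/asm/kvm.h */
	#define KVM_VGIC_V2_ADDR_TYPE_GICH	2

so that userspace can place the GICH frame via
KVM_DEV_ARM_VGIC_GRP_ADDR just like it places the distributor and CPU
interface today.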

Thanks,
-Christoffer

> Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
> ---
>  arch/arm64/include/uapi/asm/kvm.h  |  6 ++++++
>  include/kvm/arm_vgic.h             |  5 ++++-
>  virt/kvm/arm/vgic/vgic-mmio.c      |  6 ++++++
>  virt/kvm/arm/vgic/vgic-v2-nested.c | 24 ++++++++++++++++++++++++
>  virt/kvm/arm/vgic/vgic-v2.c        |  7 +++++++
>  virt/kvm/arm/vgic/vgic.h           |  6 ++++++
>  6 files changed, 53 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
> index 78117bf..3995d3d 100644
> --- a/arch/arm64/include/uapi/asm/kvm.h
> +++ b/arch/arm64/include/uapi/asm/kvm.h
> @@ -99,6 +99,12 @@ struct kvm_regs {
>  #define KVM_ARM_VCPU_PMU_V3		3 /* Support guest PMUv3 */
>  #define KVM_ARM_VCPU_NESTED_VIRT	4 /* Support nested virtual EL2 */
>  
> +/* FIXME: This should come from DT */
> +#ifdef CONFIG_KVM_ARM_NESTED_HYP
> +#define KVM_VGIC_V2_GICH_BASE          0x08030000
> +#define KVM_VGIC_V2_GICH_SIZE          0x2000
> +#endif
> +
>  struct kvm_vcpu_init {
>  	__u32 target;
>  	__u32 features[7];
> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
> index fc882d6..5bda20c 100644
> --- a/include/kvm/arm_vgic.h
> +++ b/include/kvm/arm_vgic.h
> @@ -125,7 +125,8 @@ enum iodev_type {
>  	IODEV_CPUIF,
>  	IODEV_DIST,
>  	IODEV_REDIST,
> -	IODEV_ITS
> +	IODEV_ITS,
> +	IODEV_GICH,
>  };
>  
>  struct vgic_io_device {
> @@ -198,6 +199,8 @@ struct vgic_dist {
>  
>  	struct vgic_io_device	dist_iodev;
>  
> +	struct vgic_io_device	hyp_iodev;
> +
>  	bool			has_its;
>  
>  	/*
> diff --git a/virt/kvm/arm/vgic/vgic-mmio.c b/virt/kvm/arm/vgic/vgic-mmio.c
> index 049c570..2e4097d 100644
> --- a/virt/kvm/arm/vgic/vgic-mmio.c
> +++ b/virt/kvm/arm/vgic/vgic-mmio.c
> @@ -512,6 +512,9 @@ static int dispatch_mmio_read(struct kvm_vcpu *vcpu, struct kvm_io_device *dev,
>  	case IODEV_ITS:
>  		data = region->its_read(vcpu->kvm, iodev->its, addr, len);
>  		break;
> +	case IODEV_GICH:
> +		data = region->read(vcpu, addr, len);
> +		break;
>  	}
>  
>  	vgic_data_host_to_mmio_bus(val, len, data);
> @@ -543,6 +546,9 @@ static int dispatch_mmio_write(struct kvm_vcpu *vcpu, struct kvm_io_device *dev,
>  	case IODEV_ITS:
>  		region->its_write(vcpu->kvm, iodev->its, addr, len, data);
>  		break;
> +	case IODEV_GICH:
> +		region->write(vcpu, addr, len, data);
> +		break;
>  	}
>  
>  	return 0;
> diff --git a/virt/kvm/arm/vgic/vgic-v2-nested.c b/virt/kvm/arm/vgic/vgic-v2-nested.c
> index 85f646b..cb55324 100644
> --- a/virt/kvm/arm/vgic/vgic-v2-nested.c
> +++ b/virt/kvm/arm/vgic/vgic-v2-nested.c
> @@ -206,6 +206,30 @@ static void vgic_mmio_write_v2_gich(struct kvm_vcpu *vcpu,
>  		4 * VGIC_V2_MAX_LRS, VGIC_ACCESS_32bit),
>  };
>  
> +int vgic_register_gich_iodev(struct kvm *kvm, struct vgic_dist *dist)
> +{
> +	struct vgic_io_device *io_device = &kvm->arch.vgic.hyp_iodev;
> +	int ret = 0;
> +	unsigned int len;
> +
> +	len = KVM_VGIC_V2_GICH_SIZE;
> +
> +	io_device->regions = vgic_v2_gich_registers;
> +	io_device->nr_regions = ARRAY_SIZE(vgic_v2_gich_registers);
> +	kvm_iodevice_init(&io_device->dev, &kvm_io_gic_ops);
> +
> +	io_device->base_addr = KVM_VGIC_V2_GICH_BASE;
> +	io_device->iodev_type = IODEV_GICH;
> +	io_device->redist_vcpu = NULL;
> +
> +	mutex_lock(&kvm->slots_lock);
> +	ret = kvm_io_bus_register_dev(kvm, KVM_MMIO_BUS, KVM_VGIC_V2_GICH_BASE,
> +			len, &io_device->dev);
> +	mutex_unlock(&kvm->slots_lock);
> +
> +	return ret;
> +}
> +
>  /*
>   * For LRs which have HW bit set such as timer interrupts, we modify them to
>   * have the host hardware interrupt number instead of the virtual one programmed
> diff --git a/virt/kvm/arm/vgic/vgic-v2.c b/virt/kvm/arm/vgic/vgic-v2.c
> index 9bab867..b8b73fd 100644
> --- a/virt/kvm/arm/vgic/vgic-v2.c
> +++ b/virt/kvm/arm/vgic/vgic-v2.c
> @@ -280,6 +280,13 @@ int vgic_v2_map_resources(struct kvm *kvm)
>  		goto out;
>  	}
>  
> +	/* Register virtual GICH interface to kvm io bus */
> +	ret = vgic_register_gich_iodev(kvm, dist);
> +	if (ret) {
> +		kvm_err("Unable to register VGIC GICH regions\n");
> +		goto out;
> +	}
> +
>  	if (!static_branch_unlikely(&vgic_v2_cpuif_trap)) {
>  		ret = kvm_phys_addr_ioremap(kvm, dist->vgic_cpu_base,
>  					    kvm_vgic_global_state.vcpu_base,
> diff --git a/virt/kvm/arm/vgic/vgic.h b/virt/kvm/arm/vgic/vgic.h
> index 2aef680..11d61a7 100644
> --- a/virt/kvm/arm/vgic/vgic.h
> +++ b/virt/kvm/arm/vgic/vgic.h
> @@ -121,8 +121,14 @@ static inline int vgic_its_inject_msi(struct kvm *kvm, struct kvm_msi *msi)
>  int vgic_init(struct kvm *kvm);
>  
>  #ifdef CONFIG_KVM_ARM_NESTED_HYP
> +int vgic_register_gich_iodev(struct kvm *kvm, struct vgic_dist *dist);
>  void vgic_init_nested(struct kvm_vcpu *vcpu);
>  #else
> +static inline int vgic_register_gich_iodev(struct kvm *kvm,
> +		struct vgic_dist *dist)
> +{
> +	return 0;
> +}
>  static inline void vgic_init_nested(struct kvm_vcpu *vcpu)
>  {
>  	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
> -- 
> 1.9.1
> 
> 

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC 39/55] KVM: arm/arm64: Add mmu context for the nesting
  2017-01-09  6:24 ` [RFC 39/55] KVM: arm/arm64: Add mmu context for the nesting Jintack Lim
@ 2017-02-22 13:34   ` Christoffer Dall
  0 siblings, 0 replies; 111+ messages in thread
From: Christoffer Dall @ 2017-02-22 13:34 UTC (permalink / raw)
  To: Jintack Lim
  Cc: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, vladimir.murzin, suzuki.poulose,
	mark.rutland, james.morse, lorenzo.pieralisi, kevin.brodsky,
	wcohen, shankerd, geoff, andre.przywara, eric.auger, anna-maria,
	shihwei, linux-arm-kernel, kvmarm, kvm, linux-kernel

On Mon, Jan 09, 2017 at 01:24:35AM -0500, Jintack Lim wrote:
> Add the shadow stage-2 MMU context to be used for the nesting, but don't
> do anything with it yet.
> 
> The host hypervisor maintains mmu structures for each nested VM. When
> entering a nested VM, the host hypervisor searches for the nested VM's
> mmu using vmid as a key. Note that this vmid is from the guest
> hypervisor's point of view.

I feel like I'm missing some overall design description or rationale for
why this is needed.  Can you expand on this commit message a bit?

> 
> Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
> ---
>  arch/arm/include/asm/kvm_host.h      |  3 ++
>  arch/arm/kvm/arm.c                   |  1 +
>  arch/arm64/include/asm/kvm_emulate.h | 13 ++++-----
>  arch/arm64/include/asm/kvm_host.h    | 19 +++++++++++++
>  arch/arm64/include/asm/kvm_mmu.h     | 31 ++++++++++++++++++++
>  arch/arm64/kvm/Makefile              |  1 +
>  arch/arm64/kvm/context.c             |  2 +-
>  arch/arm64/kvm/mmu-nested.c          | 55 ++++++++++++++++++++++++++++++++++++
>  8 files changed, 116 insertions(+), 9 deletions(-)
>  create mode 100644 arch/arm64/kvm/mmu-nested.c
> 
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index da45394..fbde48d 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -82,6 +82,9 @@ struct kvm_arch {
>  	 * here.
>  	 */
>  
> +	/* Never used on arm but added to be compatible with arm64 */
> +	struct list_head nested_mmu_list;
> +
>  	/* Interrupt controller */
>  	struct vgic_dist	vgic;
>  	int max_vcpus;
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index 371b38e7..147df97 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -146,6 +146,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>  	/* Mark the initial VMID generation invalid */
>  	kvm->arch.mmu.vmid.vmid_gen = 0;
>  	kvm->arch.mmu.el2_vmid.vmid_gen = 0;
> +	INIT_LIST_HEAD(&kvm->arch.nested_mmu_list);
>  
>  	/* The maximum number of VCPUs is limited by the host's GIC model */
>  	kvm->arch.max_vcpus = vgic_present ?
> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> index 94068e7..abad676 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -183,6 +183,11 @@ static inline bool vcpu_el2_imo_is_set(const struct kvm_vcpu *vcpu)
>  	return (vcpu_el2_reg(vcpu, HCR_EL2) & HCR_IMO);
>  }
>  
> +static inline bool vcpu_nested_stage2_enabled(const struct kvm_vcpu *vcpu)
> +{
> +	return (vcpu_el2_reg(vcpu, HCR_EL2) & HCR_VM);
> +}
> +
>  static inline u32 kvm_vcpu_get_hsr(const struct kvm_vcpu *vcpu)
>  {
>  	return vcpu->arch.fault.esr_el2;
> @@ -363,12 +368,4 @@ static inline unsigned long vcpu_data_host_to_guest(struct kvm_vcpu *vcpu,
>  	return data;		/* Leave LE untouched */
>  }
>  
> -static inline struct kvm_s2_vmid *vcpu_get_active_vmid(struct kvm_vcpu *vcpu)
> -{
> -	if (unlikely(vcpu_mode_el2(vcpu)))
> -		return &vcpu->kvm->arch.mmu.el2_vmid;
> -
> -	return &vcpu->kvm->arch.mmu.vmid;
> -}
> -
>  #endif /* __ARM64_KVM_EMULATE_H__ */
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index b33d35d..23e2267 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -65,6 +65,22 @@ struct kvm_s2_mmu {
>  	pgd_t *pgd;
>  };
>  
> +/* Per nested VM mmu structure */
> +struct kvm_nested_s2_mmu {
> +	struct kvm_s2_mmu mmu;
> +
> +	/*
> +	 * The vttbr value set by the guest hypervisor for this nested VM.
> +	 * vmid field is used as a key to search for this mmu structure among
> +	 * all nested VM mmu structures by the host hypervisor.
> +	 * baddr field is used to determine if we need to unmap stage 2
> +	 * shadow page tables.
> +	 */

I don't really understand this comment in isolation - especially not the
baddr part.

> +	u64 virtual_vttbr;
> +
> +	struct list_head list;
> +};
> +
>  struct kvm_arch {
>  	/* Stage 2 paging state for the VM */
>  	struct kvm_s2_mmu mmu;
> @@ -80,6 +96,9 @@ struct kvm_arch {
>  
>  	/* Timer */
>  	struct arch_timer_kvm	timer;
> +
> +	/* Stage 2 shadow paging contexts for nested L2 VM */
> +	struct list_head nested_mmu_list;
>  };
>  
>  #define KVM_NR_MEM_OBJS     40
> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> index a504162..d1ef650 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -112,6 +112,7 @@
>  #include <asm/cacheflush.h>
>  #include <asm/mmu_context.h>
>  #include <asm/pgtable.h>
> +#include <asm/kvm_emulate.h>
>  
>  static inline unsigned long __kern_hyp_va(unsigned long v)
>  {
> @@ -323,6 +324,21 @@ static inline unsigned int kvm_get_vmid_bits(void)
>  	return (cpuid_feature_extract_unsigned_field(reg, ID_AA64MMFR1_VMIDBITS_SHIFT) == 2) ? 16 : 8;
>  }
>  
> +#ifdef CONFIG_KVM_ARM_NESTED_HYP
> +struct kvm_nested_s2_mmu *get_nested_mmu(struct kvm_vcpu *vcpu, u64 vttbr);
> +struct kvm_s2_mmu *vcpu_get_active_s2_mmu(struct kvm_vcpu *vcpu);
> +#else
> +static inline struct kvm_nested_s2_mmu *get_nested_mmu(struct kvm_vcpu *vcpu,
> +						       u64 vttbr)
> +{
> +	return NULL;
> +}
> +static inline struct kvm_s2_mmu *vcpu_get_active_s2_mmu(struct kvm_vcpu *vcpu)
> +{
> +	return &vcpu->kvm->arch.mmu;
> +}
> +#endif
> +
>  static inline u64 kvm_get_vttbr(struct kvm_s2_vmid *vmid,
>  				struct kvm_s2_mmu *mmu)
>  {
> @@ -334,5 +350,20 @@ static inline u64 kvm_get_vttbr(struct kvm_s2_vmid *vmid,
>  	return baddr | vmid_field;
>  }
>  
> +static inline u64 get_vmid(u64 vttbr)
> +{
> +	return (vttbr & VTTBR_VMID_MASK(get_kvm_vmid_bits()))>>VTTBR_VMID_SHIFT;

whitespacealertbetweentheshiftmarker.
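
i.e.:

	return (vttbr & VTTBR_VMID_MASK(get_kvm_vmid_bits())) >> VTTBR_VMID_SHIFT;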

> +}
> +
> +static inline struct kvm_s2_vmid *vcpu_get_active_vmid(struct kvm_vcpu *vcpu)
> +{
> +	struct kvm_s2_mmu *mmu = vcpu_get_active_s2_mmu(vcpu);
> +
> +	if (unlikely(vcpu_mode_el2(vcpu)))
> +		return &mmu->el2_vmid;
> +	else
> +		return &mmu->vmid;
> +}
> +
>  #endif /* __ASSEMBLY__ */
>  #endif /* __ARM64_KVM_MMU_H__ */
> diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
> index 8573faf..b0b1074 100644
> --- a/arch/arm64/kvm/Makefile
> +++ b/arch/arm64/kvm/Makefile
> @@ -36,5 +36,6 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/arch_timer.o
>  kvm-$(CONFIG_KVM_ARM_PMU) += $(KVM)/arm/pmu.o
>  
>  kvm-$(CONFIG_KVM_ARM_NESTED_HYP) += handle_exit_nested.o
> +kvm-$(CONFIG_KVM_ARM_NESTED_HYP) += mmu-nested.o
>  kvm-$(CONFIG_KVM_ARM_NESTED_HYP) += emulate-nested.o
>  kvm-$(CONFIG_KVM_ARM_NESTED_HYP) += $(KVM)/arm/vgic/vgic-v2-nested.o
> diff --git a/arch/arm64/kvm/context.c b/arch/arm64/kvm/context.c
> index b2c0220..9ebc38f 100644
> --- a/arch/arm64/kvm/context.c
> +++ b/arch/arm64/kvm/context.c
> @@ -91,7 +91,7 @@ static void create_shadow_el1_sysregs(struct kvm_vcpu *vcpu)
>  
>  static void setup_s2_mmu(struct kvm_vcpu *vcpu)
>  {
> -	struct kvm_s2_mmu *mmu = &vcpu->kvm->arch.mmu;
> +	struct kvm_s2_mmu *mmu = vcpu_get_active_s2_mmu(vcpu);
>  	struct kvm_s2_vmid *vmid = vcpu_get_active_vmid(vcpu);
>  
>  	vcpu->arch.hw_vttbr = kvm_get_vttbr(vmid, mmu);
> diff --git a/arch/arm64/kvm/mmu-nested.c b/arch/arm64/kvm/mmu-nested.c
> new file mode 100644
> index 0000000..d52078f
> --- /dev/null
> +++ b/arch/arm64/kvm/mmu-nested.c
> @@ -0,0 +1,55 @@
> +/*
> + * Copyright (C) 2016 - Columbia University
> + * Author: Jintack Lim <jintack@cs.columbia.edu>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include <linux/kvm_host.h>
> +
> +#include <asm/kvm_arm.h>
> +#include <asm/kvm_emulate.h>
> +#include <asm/kvm_mmu.h>
> +#include <asm/kvm_nested.h>
> +
> +struct kvm_nested_s2_mmu *get_nested_mmu(struct kvm_vcpu *vcpu, u64 vttbr)
> +{
> +	struct kvm_nested_s2_mmu *mmu;
> +	u64 target_vmid = get_vmid(vttbr);
> +	struct list_head *nested_mmu_list = &vcpu->kvm->arch.nested_mmu_list;
> +
> +	list_for_each_entry_rcu(mmu, nested_mmu_list, list) {
> +		u64 vmid = get_vmid(mmu->virtual_vttbr);
> +
> +		if (target_vmid == vmid)

why is it sufficient to just look at the VMID without also considering
the baddr?

> +			return mmu;
> +	}
> +	return NULL;
> +}
> +
> +struct kvm_s2_mmu *vcpu_get_active_s2_mmu(struct kvm_vcpu *vcpu)
> +{
> +	struct kvm_nested_s2_mmu *nested_mmu;
> +
> +	/* If we are NOT entering the nested VM, return mmu in kvm_arch */

this comment doesn't add any info that isn't already clear from the code

> +	if (vcpu_mode_el2(vcpu) || !vcpu_nested_stage2_enabled(vcpu))
> +		return &vcpu->kvm->arch.mmu;
> +
> +	/* Otherwise, search for nested_mmu in the list */
> +	nested_mmu = get_nested_mmu(vcpu, vcpu_el2_reg(vcpu, VTTBR_EL2));
> +
> +	/* When this function is called, nested_mmu should be in the list */
> +	BUG_ON(!nested_mmu);

can you provide a slightly stronger rationale behind why this BUG_ON
should never fire - I don't feel convinced right now.

> +
> +	return &nested_mmu->mmu;
> +}
> -- 
> 1.9.1
> 
> 

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC 35/55] KVM: arm/arm64: Support mmu for the virtual EL2 execution
  2017-01-09  6:24 ` [RFC 35/55] KVM: arm/arm64: Support mmu for the virtual EL2 execution Jintack Lim
@ 2017-02-22 13:38   ` Christoffer Dall
  0 siblings, 0 replies; 111+ messages in thread
From: Christoffer Dall @ 2017-02-22 13:38 UTC (permalink / raw)
  To: Jintack Lim
  Cc: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, vladimir.murzin, suzuki.poulose,
	mark.rutland, james.morse, lorenzo.pieralisi, kevin.brodsky,
	wcohen, shankerd, geoff, andre.przywara, eric.auger, anna-maria,
	shihwei, linux-arm-kernel, kvmarm, kvm, linux-kernel

On Mon, Jan 09, 2017 at 01:24:31AM -0500, Jintack Lim wrote:
> From: Christoffer Dall <christoffer.dall@linaro.org>
> 
> When running a guest hypervisor in virtual EL2, the translation context
> has to be separate from the rest of the system, including the guest
> EL1/0 translation regime, so we allocate a separate VMID for this mode.
> 
> Considering that we have two different vttbr values due to separate
> VMIDs, it's racy to keep a vttbr value in a struct (kvm_s2_mmu) and
> share it between multiple vcpus. So, keep the vttbr value per vcpu.
> 
> Hypercalls to flush tlb now have vttbr as a parameter instead of mmu,
> since mmu structure does not have vttbr any more.
> 
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
> ---
>  arch/arm/include/asm/kvm_asm.h       |  6 ++--
>  arch/arm/include/asm/kvm_emulate.h   |  4 +++
>  arch/arm/include/asm/kvm_host.h      | 14 ++++++---
>  arch/arm/include/asm/kvm_mmu.h       | 11 +++++++
>  arch/arm/kvm/arm.c                   | 60 +++++++++++++++++++-----------------
>  arch/arm/kvm/hyp/switch.c            |  4 +--
>  arch/arm/kvm/hyp/tlb.c               | 15 ++++-----
>  arch/arm/kvm/mmu.c                   |  9 ++++--
>  arch/arm64/include/asm/kvm_asm.h     |  6 ++--
>  arch/arm64/include/asm/kvm_emulate.h |  8 +++++
>  arch/arm64/include/asm/kvm_host.h    | 14 ++++++---
>  arch/arm64/include/asm/kvm_mmu.h     | 11 +++++++
>  arch/arm64/kvm/hyp/switch.c          |  4 +--
>  arch/arm64/kvm/hyp/tlb.c             | 16 ++++------
>  14 files changed, 112 insertions(+), 70 deletions(-)
> 
> diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
> index 36e3856..aa214f7 100644
> --- a/arch/arm/include/asm/kvm_asm.h
> +++ b/arch/arm/include/asm/kvm_asm.h
> @@ -65,9 +65,9 @@
>  extern char __kvm_hyp_vector[];
>  
>  extern void __kvm_flush_vm_context(void);
> -extern void __kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu, phys_addr_t ipa);
> -extern void __kvm_tlb_flush_vmid(struct kvm_s2_mmu *mmu);
> -extern void __kvm_tlb_flush_local_vmid(struct kvm_s2_mmu *mmu);
> +extern void __kvm_tlb_flush_vmid_ipa(u64 vttbr, phys_addr_t ipa);
> +extern void __kvm_tlb_flush_vmid(u64 vttbr);
> +extern void __kvm_tlb_flush_local_vmid(u64 vttbr);
>  
>  extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
>  
> diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
> index 05d5906..6285f4f 100644
> --- a/arch/arm/include/asm/kvm_emulate.h
> +++ b/arch/arm/include/asm/kvm_emulate.h
> @@ -305,4 +305,8 @@ static inline unsigned long vcpu_data_host_to_guest(struct kvm_vcpu *vcpu,
>  	}
>  }
>  
> +static inline struct kvm_s2_vmid *vcpu_get_active_vmid(struct kvm_vcpu *vcpu)
> +{
> +	return &vcpu->kvm->arch.mmu.vmid;
> +}
>  #endif /* __ARM_KVM_EMULATE_H__ */
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index f84a59c..da45394 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -53,16 +53,18 @@
>  int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
>  void kvm_reset_coprocs(struct kvm_vcpu *vcpu);
>  
> -struct kvm_s2_mmu {
> +struct kvm_s2_vmid {
>  	/* The VMID generation used for the virt. memory system */
>  	u64    vmid_gen;
>  	u32    vmid;
> +};
> +
> +struct kvm_s2_mmu {
> +	struct kvm_s2_vmid vmid;
> +	struct kvm_s2_vmid el2_vmid;

So this is subtle:  We use struct kvm_s2_mmu for the stage-2 context
used for the L1 VM, and for the L2 VM as well, right?  But only in the
first case can the el2_vmid have any valid meaning, and it's simply
ignored in other contexts.

Not sure if we can improve on this data structure design, but we could
at least add a comment on this somewhere.
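
Maybe just something like:

	struct kvm_s2_mmu {
		struct kvm_s2_vmid vmid;

		/*
		 * Only meaningful for the L1 VM's stage 2 context
		 * (kvm->arch.mmu); ignored for the shadow stage 2
		 * contexts embedded in struct kvm_nested_s2_mmu.
		 */
		struct kvm_s2_vmid el2_vmid;
		...
	};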

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC 40/55] KVM: arm/arm64: Handle vttbr_el2 write operation from the guest hypervisor
  2017-01-09  6:24 ` [RFC 40/55] KVM: arm/arm64: Handle vttbr_el2 write operation from the guest hypervisor Jintack Lim
@ 2017-02-22 17:59   ` Christoffer Dall
  0 siblings, 0 replies; 111+ messages in thread
From: Christoffer Dall @ 2017-02-22 17:59 UTC (permalink / raw)
  To: Jintack Lim
  Cc: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, vladimir.murzin, suzuki.poulose,
	mark.rutland, james.morse, lorenzo.pieralisi, kevin.brodsky,
	wcohen, shankerd, geoff, andre.przywara, eric.auger, anna-maria,
	shihwei, linux-arm-kernel, kvmarm, kvm, linux-kernel

On Mon, Jan 09, 2017 at 01:24:36AM -0500, Jintack Lim wrote:
> Each nested VM is supposed to have a mmu (i.e. shadow stage-2 page

to have a 'struct kvm_mmu' ?

> table), and we create it when the guest hypervisor writes to vttbr_el2
> with a new vmid.

I think the commit message should also mention that you maintain a list
of seen nested stage 2 translation contexts and associated shadow page
tables.

> 
> In case the guest hypervisor writes to vttbr_el2 with existing vmid, we
> check if the base address is changed. If so, then what we have in the
> shadow page table is not valid any more. So ummap it.

unmap?  We clear the entire shadow stage 2 page table, right?

> 
> Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
> ---
>  arch/arm/include/asm/kvm_host.h   |  1 +
>  arch/arm/kvm/arm.c                |  1 +
>  arch/arm64/include/asm/kvm_host.h |  1 +
>  arch/arm64/include/asm/kvm_mmu.h  |  6 ++++
>  arch/arm64/kvm/mmu-nested.c       | 71 +++++++++++++++++++++++++++++++++++++++
>  arch/arm64/kvm/sys_regs.c         | 15 ++++++++-
>  6 files changed, 94 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index fbde48d..ebf2810 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -84,6 +84,7 @@ struct kvm_arch {
>  
>  	/* Never used on arm but added to be compatible with arm64 */
>  	struct list_head nested_mmu_list;
> +	spinlock_t mmu_list_lock;
>  
>  	/* Interrupt controller */
>  	struct vgic_dist	vgic;
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index 147df97..6fa5754 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -147,6 +147,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>  	kvm->arch.mmu.vmid.vmid_gen = 0;
>  	kvm->arch.mmu.el2_vmid.vmid_gen = 0;
>  	INIT_LIST_HEAD(&kvm->arch.nested_mmu_list);
> +	spin_lock_init(&kvm->arch.mmu_list_lock);
>  
>  	/* The maximum number of VCPUs is limited by the host's GIC model */
>  	kvm->arch.max_vcpus = vgic_present ?
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 23e2267..52eea76 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -99,6 +99,7 @@ struct kvm_arch {
>  
>  	/* Stage 2 shadow paging contexts for nested L2 VM */
>  	struct list_head nested_mmu_list;
> +	spinlock_t mmu_list_lock;

I'm wondering if we really need the separate spin lock or if we could
just grab the KVM mutex?

>  };
>  
>  #define KVM_NR_MEM_OBJS     40
> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> index d1ef650..fdc9327 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -327,6 +327,7 @@ static inline unsigned int kvm_get_vmid_bits(void)
>  #ifdef CONFIG_KVM_ARM_NESTED_HYP
>  struct kvm_nested_s2_mmu *get_nested_mmu(struct kvm_vcpu *vcpu, u64 vttbr);
>  struct kvm_s2_mmu *vcpu_get_active_s2_mmu(struct kvm_vcpu *vcpu);
> +bool handle_vttbr_update(struct kvm_vcpu *vcpu, u64 vttbr);
>  #else
>  static inline struct kvm_nested_s2_mmu *get_nested_mmu(struct kvm_vcpu *vcpu,
>  						       u64 vttbr)
> @@ -337,6 +338,11 @@ static inline struct kvm_s2_mmu *vcpu_get_active_s2_mmu(struct kvm_vcpu *vcpu)
>  {
>  	return &vcpu->kvm->arch.mmu;
>  }
> +
> +static inline bool handle_vttbr_update(struct kvm_vcpu *vcpu, u64 vttbr)
> +{
> +	return false;
> +}
>  #endif
>  
>  static inline u64 kvm_get_vttbr(struct kvm_s2_vmid *vmid,
> diff --git a/arch/arm64/kvm/mmu-nested.c b/arch/arm64/kvm/mmu-nested.c
> index d52078f..0811d94 100644
> --- a/arch/arm64/kvm/mmu-nested.c
> +++ b/arch/arm64/kvm/mmu-nested.c
> @@ -53,3 +53,74 @@ struct kvm_s2_mmu *vcpu_get_active_s2_mmu(struct kvm_vcpu *vcpu)
>  
>  	return &nested_mmu->mmu;
>  }
> +
> +static struct kvm_nested_s2_mmu *create_nested_mmu(struct kvm_vcpu *vcpu,
> +						   u64 vttbr)
> +{
> +	struct kvm_nested_s2_mmu *nested_mmu, *tmp_mmu;
> +	struct list_head *nested_mmu_list = &vcpu->kvm->arch.nested_mmu_list;
> +	bool need_free = false;
> +	int ret;
> +
> +	nested_mmu = kzalloc(sizeof(struct kvm_nested_s2_mmu), GFP_KERNEL);
> +	if (!nested_mmu)
> +		return NULL;
> +
> +	ret = __kvm_alloc_stage2_pgd(&nested_mmu->mmu);
> +	if (ret) {
> +		kfree(nested_mmu);
> +		return NULL;
> +	}
> +
> +	spin_lock(&vcpu->kvm->arch.mmu_list_lock);
> +	tmp_mmu = get_nested_mmu(vcpu, vttbr);
> +	if (!tmp_mmu)
> +		list_add_rcu(&nested_mmu->list, nested_mmu_list);
> +	else /* Somebody already created and put a new nested_mmu to the list */
> +		need_free = true;
> +	spin_unlock(&vcpu->kvm->arch.mmu_list_lock);
> +
> +	if (need_free) {
> +		__kvm_free_stage2_pgd(&nested_mmu->mmu);
> +		kfree(nested_mmu);
> +		nested_mmu = tmp_mmu;
> +	}
> +
> +	return nested_mmu;
> +}
> +
> +static void kvm_nested_s2_unmap(struct kvm_vcpu *vcpu)
> +{
> +	struct kvm_nested_s2_mmu *nested_mmu;
> +	struct list_head *nested_mmu_list = &vcpu->kvm->arch.nested_mmu_list;
> +
> +	list_for_each_entry_rcu(nested_mmu, nested_mmu_list, list)
> +		kvm_unmap_stage2_range(&nested_mmu->mmu, 0, KVM_PHYS_SIZE);
> +}
> +
> +bool handle_vttbr_update(struct kvm_vcpu *vcpu, u64 vttbr)
> +{
> +	struct kvm_nested_s2_mmu *nested_mmu;
> +
> +	/* See if we can relax this */

huh?

> +	if (!vttbr)

why is this a special case?

Theoretically an IPA of zero and VMID zero could be a valid page table
base pointer, right?

I'm guessing because the guest hypervisor occasionally writes zero into
VTTBR_EL2, for example when not using stage 2 translation, so perhaps
what you need to do is to defer creating a new nested mmu structure
until you actually enter the VM with stage 2 paging enabled?

> +		return true;
> +
> +	nested_mmu = (struct kvm_nested_s2_mmu *)get_nested_mmu(vcpu, vttbr);
> +	if (!nested_mmu) {
> +		nested_mmu = create_nested_mmu(vcpu, vttbr);
> +		if (!nested_mmu)
> +			return false;

I'm wondering if this can be simplified by having get_nested_mmu look up
and allocate the struct, renaming the current lookup-only function to
lookup_nested_mmu?  This caller looks racy, even though it isn't, which
would be improved by my suggestion as well.
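
Then this caller could shrink to just (sketch):

	nested_mmu = get_nested_mmu(vcpu, vttbr);
	if (!nested_mmu)
		return false;

with the lookup-vs-create race handled in a single place inside
get_nested_mmu().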

> +	} else {
> +		/*
> +		 * unmap the shadow page table if vttbr_el2 is

While the function is called unmap, what we really do is clear/flush
the shadow stage 2 page table.

> +		 * changed to different value
> +		 */
> +		if (vttbr != nested_mmu->virtual_vttbr)
> +			kvm_nested_s2_unmap(vcpu);
> +	}
> +
> +	nested_mmu->virtual_vttbr = vttbr;
> +
> +	return true;
> +}
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index e66f40d..ddb641c 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -960,6 +960,19 @@ static bool access_cpacr(struct kvm_vcpu *vcpu,
>  	return true;
>  }
>  
> +static bool access_vttbr(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
> +			 const struct sys_reg_desc *r)
> +{
> +	u64 vttbr = p->regval;
> +
> +	if (!p->is_write) {
> +		p->regval = vcpu_el2_reg(vcpu, r->reg);
> +		return true;
> +	}
> +
> +	return handle_vttbr_update(vcpu, vttbr);
> +}
> +
>  static bool trap_el2_reg(struct kvm_vcpu *vcpu,
>  			 struct sys_reg_params *p,
>  			 const struct sys_reg_desc *r)
> @@ -1306,7 +1319,7 @@ static bool trap_el2_reg(struct kvm_vcpu *vcpu,
>  	  trap_el2_reg, reset_el2_val, TCR_EL2, 0 },
>  	/* VTTBR_EL2 */
>  	{ Op0(0b11), Op1(0b100), CRn(0b0010), CRm(0b0001), Op2(0b000),
> -	  trap_el2_reg, reset_el2_val, VTTBR_EL2, 0 },
> +	  access_vttbr, reset_el2_val, VTTBR_EL2, 0 },
>  	/* VTCR_EL2 */
>  	{ Op0(0b11), Op1(0b100), CRn(0b0010), CRm(0b0001), Op2(0b010),
>  	  trap_el2_reg, reset_el2_val, VTCR_EL2, 0 },
> -- 
> 1.9.1
> 
> 

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC 41/55] KVM: arm/arm64: Unmap/flush shadow stage 2 page tables
  2017-01-09  6:24 ` [RFC 41/55] KVM: arm/arm64: Unmap/flush shadow stage 2 page tables Jintack Lim
@ 2017-02-22 18:09   ` Christoffer Dall
  0 siblings, 0 replies; 111+ messages in thread
From: Christoffer Dall @ 2017-02-22 18:09 UTC (permalink / raw)
  To: Jintack Lim
  Cc: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, vladimir.murzin, suzuki.poulose,
	mark.rutland, james.morse, lorenzo.pieralisi, kevin.brodsky,
	wcohen, shankerd, geoff, andre.przywara, eric.auger, anna-maria,
	shihwei, linux-arm-kernel, kvmarm, kvm, linux-kernel

On Mon, Jan 09, 2017 at 01:24:37AM -0500, Jintack Lim wrote:
> From: Christoffer Dall <christoffer.dall@linaro.org>
> 
> Unmap/flush shadow stage 2 page tables for the nested VMs as well as the
> stage 2 page table for the guest hypervisor.
> 
> Note: A bunch of the code in mmu.c relating to MMU notifiers is
> currently dealt with in an extremely abrupt way, for example by clearing
> out an entire shadow stage-2 table.  Probably we can do smarter with
> some sort of rmap structure.

I think we need to do better than this patch for merging something
upstream.  At least the current approach will not perform well if we run
more than one guest hypervisor on the system.

> 
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
> ---
>  arch/arm/include/asm/kvm_mmu.h   |  7 ++++
>  arch/arm/kvm/arm.c               |  6 ++-
>  arch/arm/kvm/mmu.c               | 11 +++++
>  arch/arm64/include/asm/kvm_mmu.h | 13 ++++++
>  arch/arm64/kvm/mmu-nested.c      | 90 ++++++++++++++++++++++++++++++++++++----
>  5 files changed, 117 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
> index 1b3309c..ae3aa39 100644
> --- a/arch/arm/include/asm/kvm_mmu.h
> +++ b/arch/arm/include/asm/kvm_mmu.h
> @@ -230,6 +230,13 @@ static inline unsigned int kvm_get_vmid_bits(void)
>  	return 8;
>  }
>  
> +static inline void kvm_nested_s2_unmap(struct kvm_vcpu *vcpu) { }
> +static inline int kvm_nested_s2_init(struct kvm_vcpu *vcpu) { return 0; }
> +static inline void kvm_nested_s2_teardown(struct kvm_vcpu *vcpu) { }
> +static inline void kvm_nested_s2_all_vcpus_wp(struct kvm *kvm) { }
> +static inline void kvm_nested_s2_all_vcpus_unmap(struct kvm *kvm) { }
> +static inline void kvm_nested_s2_all_vcpus_flush(struct kvm *kvm) { }
> +
>  static inline u64 kvm_get_vttbr(struct kvm_s2_vmid *vmid,
>  				struct kvm_s2_mmu *mmu)
>  {
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index 6fa5754..dc2795f 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -191,6 +191,7 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
>  
>  	for (i = 0; i < KVM_MAX_VCPUS; ++i) {
>  		if (kvm->vcpus[i]) {
> +			kvm_nested_s2_teardown(kvm->vcpus[i]);
>  			kvm_arch_vcpu_free(kvm->vcpus[i]);
>  			kvm->vcpus[i] = NULL;
>  		}
> @@ -333,6 +334,7 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
>  
>  	vcpu->arch.hw_mmu = mmu;
>  	vcpu->arch.hw_vttbr = kvm_get_vttbr(&mmu->vmid, mmu);
> +	kvm_nested_s2_init(vcpu);
>  
>  	return 0;
>  }
> @@ -871,8 +873,10 @@ static int kvm_arch_vcpu_ioctl_vcpu_init(struct kvm_vcpu *vcpu,
>  	 * Ensure a rebooted VM will fault in RAM pages and detect if the
>  	 * guest MMU is turned off and flush the caches as needed.
>  	 */
> -	if (vcpu->arch.has_run_once)
> +	if (vcpu->arch.has_run_once) {
>  		stage2_unmap_vm(vcpu->kvm);
> +		kvm_nested_s2_unmap(vcpu);
> +	}
>  
>  	vcpu_reset_hcr(vcpu);
>  
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index 98b42e8..1677a87 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -416,6 +416,8 @@ static void stage2_flush_vm(struct kvm *kvm)
>  	kvm_for_each_memslot(memslot, slots)
>  		stage2_flush_memslot(&kvm->arch.mmu, memslot);
>  
> +	kvm_nested_s2_all_vcpus_flush(kvm);
> +
>  	spin_unlock(&kvm->mmu_lock);
>  	srcu_read_unlock(&kvm->srcu, idx);
>  }
> @@ -1240,6 +1242,7 @@ void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot)
>  
>  	spin_lock(&kvm->mmu_lock);
>  	kvm_stage2_wp_range(kvm, &kvm->arch.mmu, start, end);
> +	kvm_nested_s2_all_vcpus_wp(kvm);
>  	spin_unlock(&kvm->mmu_lock);
>  	kvm_flush_remote_tlbs(kvm);
>  }
> @@ -1278,6 +1281,7 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
>  		gfn_t gfn_offset, unsigned long mask)
>  {
>  	kvm_mmu_write_protect_pt_masked(kvm, slot, gfn_offset, mask);
> +	kvm_nested_s2_all_vcpus_wp(kvm);
>  }
>  
>  static void coherent_cache_guest_page(struct kvm_vcpu *vcpu, kvm_pfn_t pfn,
> @@ -1604,6 +1608,7 @@ static int handle_hva_to_gpa(struct kvm *kvm,
>  static int kvm_unmap_hva_handler(struct kvm *kvm, gpa_t gpa, void *data)
>  {
>  	kvm_unmap_stage2_range(&kvm->arch.mmu, gpa, PAGE_SIZE);
> +	kvm_nested_s2_all_vcpus_unmap(kvm);
>  	return 0;
>  }
>  
> @@ -1642,6 +1647,7 @@ static int kvm_set_spte_handler(struct kvm *kvm, gpa_t gpa, void *data)
>  	 * through this calling path.
>  	 */
>  	stage2_set_pte(&kvm->arch.mmu, NULL, gpa, pte, 0);
> +	kvm_nested_s2_all_vcpus_unmap(kvm);
>  	return 0;
>  }
>  
> @@ -1675,6 +1681,8 @@ static int kvm_age_hva_handler(struct kvm *kvm, gpa_t gpa, void *data)
>  	if (pte_none(*pte))
>  		return 0;
>  
> +	/* TODO: Handle nested_mmu structures here as well */
> +
>  	return stage2_ptep_test_and_clear_young(pte);
>  }
>  
> @@ -1694,6 +1702,8 @@ static int kvm_test_age_hva_handler(struct kvm *kvm, gpa_t gpa, void *data)
>  	if (!pte_none(*pte))		/* Just a page... */
>  		return pte_young(*pte);
>  
> +	/* TODO: Handle nested_mmu structures here as well */

These TODOs should be addressed somehow as well.

> +
>  	return 0;
>  }
>  
> @@ -1959,6 +1969,7 @@ void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
>  
>  	spin_lock(&kvm->mmu_lock);
>  	kvm_unmap_stage2_range(&kvm->arch.mmu, gpa, size);
> +	kvm_nested_s2_all_vcpus_unmap(kvm);
>  	spin_unlock(&kvm->mmu_lock);
>  }
>  
> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> index fdc9327..e4d5d54 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -328,6 +328,12 @@ static inline unsigned int kvm_get_vmid_bits(void)
>  struct kvm_nested_s2_mmu *get_nested_mmu(struct kvm_vcpu *vcpu, u64 vttbr);
>  struct kvm_s2_mmu *vcpu_get_active_s2_mmu(struct kvm_vcpu *vcpu);
>  bool handle_vttbr_update(struct kvm_vcpu *vcpu, u64 vttbr);
> +void kvm_nested_s2_unmap(struct kvm_vcpu *vcpu);
> +int kvm_nested_s2_init(struct kvm_vcpu *vcpu);
> +void kvm_nested_s2_teardown(struct kvm_vcpu *vcpu);
> +void kvm_nested_s2_all_vcpus_wp(struct kvm *kvm);
> +void kvm_nested_s2_all_vcpus_unmap(struct kvm *kvm);
> +void kvm_nested_s2_all_vcpus_flush(struct kvm *kvm);
>  #else
>  static inline struct kvm_nested_s2_mmu *get_nested_mmu(struct kvm_vcpu *vcpu,
>  						       u64 vttbr)
> @@ -343,6 +349,13 @@ static inline bool handle_vttbr_update(struct kvm_vcpu *vcpu, u64 vttbr)
>  {
>  	return false;
>  }
> +
> +static inline void kvm_nested_s2_unmap(struct kvm_vcpu *vcpu) { }
> +static inline int kvm_nested_s2_init(struct kvm_vcpu *vcpu) { return 0; }
> +static inline void kvm_nested_s2_teardown(struct kvm_vcpu *vcpu) { }
> +static inline void kvm_nested_s2_all_vcpus_wp(struct kvm *kvm) { }
> +static inline void kvm_nested_s2_all_vcpus_unmap(struct kvm *kvm) { }
> +static inline void kvm_nested_s2_all_vcpus_flush(struct kvm *kvm) { }
>  #endif
>  
>  static inline u64 kvm_get_vttbr(struct kvm_s2_vmid *vmid,
> diff --git a/arch/arm64/kvm/mmu-nested.c b/arch/arm64/kvm/mmu-nested.c
> index 0811d94..b22b78c 100644
> --- a/arch/arm64/kvm/mmu-nested.c
> +++ b/arch/arm64/kvm/mmu-nested.c
> @@ -1,6 +1,7 @@
>  /*
>   * Copyright (C) 2016 - Columbia University
>   * Author: Jintack Lim <jintack@cs.columbia.edu>
> + * Author: Christoffer Dall <cdall@cs.columbia.edu>
>   *
>   * This program is free software; you can redistribute it and/or modify
>   * it under the terms of the GNU General Public License version 2 as
> @@ -22,6 +23,86 @@
>  #include <asm/kvm_mmu.h>
>  #include <asm/kvm_nested.h>
>  
> +
> +/* expects kvm->mmu_lock to be held */
> +void kvm_nested_s2_all_vcpus_wp(struct kvm *kvm)
> +{
> +	int i;
> +	struct kvm_vcpu *vcpu;
> +	struct kvm_nested_s2_mmu *nested_mmu;
> +	struct list_head *nested_mmu_list;
> +
> +	kvm_for_each_vcpu(i, vcpu, kvm) {
> +		if (need_resched() || spin_needbreak(&kvm->mmu_lock))
> +			cond_resched_lock(&kvm->mmu_lock);
> +
> +		nested_mmu_list = &vcpu->kvm->arch.nested_mmu_list;
> +		list_for_each_entry_rcu(nested_mmu, nested_mmu_list, list)
> +			kvm_stage2_wp_range(kvm, &nested_mmu->mmu,
> +				    0, KVM_PHYS_SIZE);
> +	}
> +}
> +
> +/* expects kvm->mmu_lock to be held */
> +void kvm_nested_s2_all_vcpus_unmap(struct kvm *kvm)
> +{
> +	int i;
> +	struct kvm_vcpu *vcpu;
> +	struct kvm_nested_s2_mmu *nested_mmu;
> +	struct list_head *nested_mmu_list;
> +
> +	kvm_for_each_vcpu(i, vcpu, kvm) {
> +		if (need_resched() || spin_needbreak(&kvm->mmu_lock))
> +			cond_resched_lock(&kvm->mmu_lock);
> +
> +		nested_mmu_list = &vcpu->kvm->arch.nested_mmu_list;
> +		list_for_each_entry_rcu(nested_mmu, nested_mmu_list, list)
> +			kvm_unmap_stage2_range(&nested_mmu->mmu,
> +				       0, KVM_PHYS_SIZE);
> +	}
> +}
> +
> +void kvm_nested_s2_all_vcpus_flush(struct kvm *kvm)
> +{
> +	int i;
> +	struct kvm_vcpu *vcpu;
> +	struct kvm_nested_s2_mmu *nested_mmu;
> +	struct list_head *nested_mmu_list;
> +
> +	kvm_for_each_vcpu(i, vcpu, kvm) {
> +		if (need_resched() || spin_needbreak(&kvm->mmu_lock))
> +			cond_resched_lock(&kvm->mmu_lock);
> +
> +		nested_mmu_list = &vcpu->kvm->arch.nested_mmu_list;
> +		list_for_each_entry_rcu(nested_mmu, nested_mmu_list, list)
> +			kvm_stage2_flush_range(&nested_mmu->mmu,
> +				       0, KVM_PHYS_SIZE);
> +	}
> +}
> +
> +void kvm_nested_s2_unmap(struct kvm_vcpu *vcpu)
> +{
> +	struct kvm_nested_s2_mmu *nested_mmu;
> +	struct list_head *nested_mmu_list = &vcpu->kvm->arch.nested_mmu_list;
> +
> +	list_for_each_entry_rcu(nested_mmu, nested_mmu_list, list)
> +		kvm_unmap_stage2_range(&nested_mmu->mmu, 0, KVM_PHYS_SIZE);
> +}

Did we change the functionality of this function in this patch as well
or are we just moving it around?  I can't really tell.


Thanks,
-Christoffer

> +
> +int kvm_nested_s2_init(struct kvm_vcpu *vcpu)
> +{
> +	return 0;
> +}
> +
> +void kvm_nested_s2_teardown(struct kvm_vcpu *vcpu)
> +{
> +	struct kvm_nested_s2_mmu *nested_mmu;
> +	struct list_head *nested_mmu_list = &vcpu->kvm->arch.nested_mmu_list;
> +
> +	list_for_each_entry_rcu(nested_mmu, nested_mmu_list, list)
> +		__kvm_free_stage2_pgd(&nested_mmu->mmu);
> +}
> +
>  struct kvm_nested_s2_mmu *get_nested_mmu(struct kvm_vcpu *vcpu, u64 vttbr)
>  {
>  	struct kvm_nested_s2_mmu *mmu;
> @@ -89,15 +170,6 @@ static struct kvm_nested_s2_mmu *create_nested_mmu(struct kvm_vcpu *vcpu,
>  	return nested_mmu;
>  }
>  
> -static void kvm_nested_s2_unmap(struct kvm_vcpu *vcpu)
> -{
> -	struct kvm_nested_s2_mmu *nested_mmu;
> -	struct list_head *nested_mmu_list = &vcpu->kvm->arch.nested_mmu_list;
> -
> -	list_for_each_entry_rcu(nested_mmu, nested_mmu_list, list)
> -		kvm_unmap_stage2_range(&nested_mmu->mmu, 0, KVM_PHYS_SIZE);
> -}
> -
>  bool handle_vttbr_update(struct kvm_vcpu *vcpu, u64 vttbr)
>  {
>  	struct kvm_nested_s2_mmu *nested_mmu;
> -- 
> 1.9.1
> 
> 

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC 47/55] KVM: arm/arm64: Forward the guest hypervisor's stage 2 permission faults
  2017-01-09  6:24 ` [RFC 47/55] KVM: arm/arm64: Forward the guest hypervisor's stage 2 permission faults Jintack Lim
@ 2017-02-22 18:15   ` Christoffer Dall
  0 siblings, 0 replies; 111+ messages in thread
From: Christoffer Dall @ 2017-02-22 18:15 UTC (permalink / raw)
  To: Jintack Lim
  Cc: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, vladimir.murzin, suzuki.poulose,
	mark.rutland, james.morse, lorenzo.pieralisi, kevin.brodsky,
	wcohen, shankerd, geoff, andre.przywara, eric.auger, anna-maria,
	shihwei, linux-arm-kernel, kvmarm, kvm, linux-kernel

On Mon, Jan 09, 2017 at 01:24:43AM -0500, Jintack Lim wrote:
> From: Christoffer Dall <christoffer.dall@linaro.org>
> 
> When faulting on a shadow stage 2 page table we have to check if the
> fault was a permission fault and if so, if that fault needs to be
> handled by the guest hypervisor before us, in case the guest hypervisor
> has created a less permissive S2 entry than the operation required.

So I was a bit brief here.

We can have discrepancies between the nested stage 2 page table and the
shadow one in a couple of cases.  For example, the guest hypervisor can
mark a page writable but the host hypervisor maps the page read-only in
the shadow page table, if using something like KSM on the host level.
In this case, a write fault is handled directly by the host hypervisor.
But we could also simply have a read-only page mapped read-only in both
tables, in which case the host hypervisor cannot do anything else than
telling the guest hypervisor about the fault.

Can you incorporate that into the commit message?

Thanks,
-Christoffer

> 
> Check if this is the case, and inject a fault if it is.
> 
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
> ---
>  arch/arm/include/asm/kvm_mmu.h   |  7 +++++++
>  arch/arm/kvm/mmu.c               |  5 +++++
>  arch/arm64/include/asm/kvm_mmu.h |  9 +++++++++
>  arch/arm64/kvm/mmu-nested.c      | 33 +++++++++++++++++++++++++++++++++
>  4 files changed, 54 insertions(+)
> 
> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
> index ab41a10..0d106ae 100644
> --- a/arch/arm/include/asm/kvm_mmu.h
> +++ b/arch/arm/include/asm/kvm_mmu.h
> @@ -241,6 +241,13 @@ static inline int kvm_walk_nested_s2(struct kvm_vcpu *vcpu, phys_addr_t gipa,
>  	return 0;
>  }
>  
> +static inline int kvm_s2_handle_perm_fault(struct kvm_vcpu *vcpu,
> +					   phys_addr_t fault_ipa,
> +					   struct kvm_s2_trans *trans)
> +{
> +	return 0;
> +}
> +
>  static inline void kvm_nested_s2_unmap(struct kvm_vcpu *vcpu) { }
>  static inline int kvm_nested_s2_init(struct kvm_vcpu *vcpu) { return 0; }
>  static inline void kvm_nested_s2_teardown(struct kvm_vcpu *vcpu) { }
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index abdf345..68fc8e8 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -1542,6 +1542,11 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  		ret = kvm_walk_nested_s2(vcpu, fault_ipa, &nested_trans);
>  		if (ret)
>  			goto out_unlock;
> +
> +		ret = kvm_s2_handle_perm_fault(vcpu, fault_ipa, &nested_trans);
> +		if (ret)
> +			goto out_unlock;
> +
>  		ipa = nested_trans.output;
>  	}
>  
> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> index 2ac603d..2086296 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -338,6 +338,8 @@ struct kvm_s2_trans {
>  bool handle_vttbr_update(struct kvm_vcpu *vcpu, u64 vttbr);
>  int kvm_walk_nested_s2(struct kvm_vcpu *vcpu, phys_addr_t gipa,
>  		       struct kvm_s2_trans *result);
> +int kvm_s2_handle_perm_fault(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> +			     struct kvm_s2_trans *trans);
>  void kvm_nested_s2_unmap(struct kvm_vcpu *vcpu);
>  int kvm_nested_s2_init(struct kvm_vcpu *vcpu);
>  void kvm_nested_s2_teardown(struct kvm_vcpu *vcpu);
> @@ -366,6 +368,13 @@ static inline int kvm_walk_nested_s2(struct kvm_vcpu *vcpu, phys_addr_t gipa,
>  	return 0;
>  }
>  
> +static inline int kvm_s2_handle_perm_fault(struct kvm_vcpu *vcpu,
> +					   phys_addr_t fault_ipa,
> +					   struct kvm_s2_trans *trans)
> +{
> +	return 0;
> +}
> +
>  static inline void kvm_nested_s2_unmap(struct kvm_vcpu *vcpu) { }
>  static inline int kvm_nested_s2_init(struct kvm_vcpu *vcpu) { return 0; }
>  static inline void kvm_nested_s2_teardown(struct kvm_vcpu *vcpu) { }
> diff --git a/arch/arm64/kvm/mmu-nested.c b/arch/arm64/kvm/mmu-nested.c
> index b579d23..65ad0da 100644
> --- a/arch/arm64/kvm/mmu-nested.c
> +++ b/arch/arm64/kvm/mmu-nested.c
> @@ -52,6 +52,19 @@ static unsigned int pa_max(void)
>  	return ps_to_output_size(parange);
>  }
>  
> +static int vcpu_inject_s2_perm_fault(struct kvm_vcpu *vcpu, gpa_t ipa,
> +				     int level)
> +{
> +	u32 esr;
> +
> +	vcpu->arch.ctxt.el2_regs[FAR_EL2] = vcpu->arch.fault.far_el2;
> +	vcpu->arch.ctxt.el2_regs[HPFAR_EL2] = vcpu->arch.fault.hpfar_el2;
> +	esr = kvm_vcpu_get_hsr(vcpu) & ~ESR_ELx_FSC;
> +	esr |= ESR_ELx_FSC_PERM;
> +	esr |= level & 0x3;
> +	return kvm_inject_nested_sync(vcpu, esr);
> +}
> +
>  static int vcpu_inject_s2_trans_fault(struct kvm_vcpu *vcpu, gpa_t ipa,
>  				      int level)
>  {
> @@ -268,6 +281,26 @@ int kvm_walk_nested_s2(struct kvm_vcpu *vcpu, phys_addr_t gipa,
>  	return walk_nested_s2_pgd(vcpu, gipa, &wi, result);
>  }
>  
> +/*
> + * Returns non-zero if permission fault is handled by injecting it to the next
> + * level hypervisor.
> + */
> +int kvm_s2_handle_perm_fault(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> +			     struct kvm_s2_trans *trans)
> +{
> +	unsigned long fault_status = kvm_vcpu_trap_get_fault_type(vcpu);
> +	bool write_fault = kvm_is_write_fault(vcpu);
> +
> +	if (fault_status != FSC_PERM)
> +		return 0;
> +
> +	if ((write_fault && !trans->writable) ||
> +	    (!write_fault && !trans->readable))
> +		return vcpu_inject_s2_perm_fault(vcpu, fault_ipa, trans->level);
> +
> +	return 0;
> +}
> +
>  /* expects kvm->mmu_lock to be held */
>  void kvm_nested_s2_all_vcpus_wp(struct kvm *kvm)
>  {
> -- 
> 1.9.1
> 
> 

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC 00/55] Nested Virtualization on KVM/ARM
  2017-01-09  6:23 [RFC 00/55] Nested Virtualization on KVM/ARM Jintack Lim
                   ` (55 preceding siblings ...)
  2017-01-09 15:05 ` [RFC 00/55] Nested Virtualization on KVM/ARM David Hildenbrand
@ 2017-02-22 18:23 ` Christoffer Dall
  2017-02-24 10:28   ` Jintack Lim
  56 siblings, 1 reply; 111+ messages in thread
From: Christoffer Dall @ 2017-02-22 18:23 UTC (permalink / raw)
  To: Jintack Lim
  Cc: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, vladimir.murzin, suzuki.poulose,
	mark.rutland, james.morse, lorenzo.pieralisi, kevin.brodsky,
	wcohen, shankerd, geoff, andre.przywara, eric.auger, anna-maria,
	shihwei, linux-arm-kernel, kvmarm, kvm, linux-kernel

Hi Jintack,


On Mon, Jan 09, 2017 at 01:23:56AM -0500, Jintack Lim wrote:
> Nested virtualization is the ability to run a virtual machine inside another
> virtual machine. In other words, it’s about running a hypervisor (the guest
> hypervisor) on top of another hypervisor (the host hypervisor).
> 
> This series supports nested virtualization on arm64. ARM recently announced an
> extension (ARMv8.3) which has support for nested virtualization[1]. This series
> is based on the ARMv8.3 specification.
> 
> Supporting nested virtualization means that the hypervisor provides not only
> EL0/EL1 execution environment with VMs as it usually does, but also the
> virtualization extensions including EL2 execution environment with the VMs.
> Once the host hypervisor provides those execution environment with the VMs,
> then the guest hypervisor can run its own VMs (nested VMs) naturally.
> 
> To support nested virtualization on ARM the hypervisor must emulate a virtual
> execution environment consisting of EL2, EL1, and EL0, as the guest hypervisor
> will run in a virtual EL2 mode.  Normally KVM/ARM only emulated a VM supporting
> EL1/0 running in their respective native CPU modes, but with nested
> virtualization we deprivilege the guest hypervisor and emulate a virtual EL2
> execution mode in EL1 using the hardware features provided by ARMv8.3 to trap
> EL2 operations to EL1. To do that the host hypervisor needs to manage EL2
> register state for the guest hypervisor, and shadow EL1 register state that
> reflects the EL2 register state to run the guest hypervisor in EL1. See patch 6
> through 10 for this.
> 
> For memory virtualization, the biggest issue is that we now have more than two
> stages of translation when running nested VMs. We choose to merge two stage-2
> page tables (one from the guest hypervisor and the other from the host
> hypervisor) and create shadow stage-2 page tables, which have mappings from the
> nested VM’s physical addresses to the machine physical addresses. Stage-1
> translation is done by the hardware as is done for the normal VMs.
> 
> To provide VGIC support to the guest hypervisor, we emulate the GIC
> virtualization extensions using trap-and-emulate to a virtual GIC Hypervisor
> Control Interface.  Furthermore, we can still use the GIC VE hardware features
> to deliver virtual interrupts to the nested VM, by directly mapping the GIC
> VCPU interface to the nested VM and switching the content of the GIC Hypervisor
> Control interface when alternating between a nested VM and a normal VM.  See
> patches 25 through 32, and 50 through 52 for more information.
> 
> For timer virtualization, the guest hypervisor expects to have access to the
> EL2 physical timer, the EL1 physical timer and the virtual timer. So, the host
> hypervisor needs to provide all of them. The virtual timer is always available
> to VMs. The physical timer is available to VMs via my previous patch series[3].
> The EL2 physical timer is not supported yet in this RFC. We plan to support
> this as it is required to run other guest hypervisors such as Xen.
> 
> Even though this work is not complete (see limitations below), I'd appreciate
> early feedback on this RFC. Specifically, I'm interested in:
> - Is it better to have a kernel config or to make it configurable at runtime?
> - I wonder if the data structure for memory management makes sense.
> - What architecture version do we support for the guest hypervisor, and how?
>   For example, do we always support all architecture versions or the same
>   architecture as the underlying hardware platform? Or is it better
>   to make it configurable from the userspace?
> - Initial comments on the overall design?
> 
> This patch series is based on kvm-arm-for-4.9-rc7 with the patch series to provide
> VMs with the EL1 physical timer[2].
> 
> Git: https://github.com/columbia/nesting-pub/tree/rfc-v1
> 
> Testing:
> We have tested this on ARMv8.0 (Applied Micro X-Gene)[3] since ARMv8.3 hardware
> is not available yet. We have paravirtualized the guest hypervisor to trap to
> EL2 as specified in ARMv8.3 specification using hvc instruction. We plan to
> test this on ARMv8.3 model, and will post the result and v2 if necessary.
> 
> Limitations:
> - This patch series only supports arm64, not arm. All the patches compile on
>   arm, but I haven't try to boot normal VMs on it.
> - The guest hypervisor with VHE (ARMv8.1) is not supported in this RFC. I have
>   patches for that, but they need to be cleaned up.
> - Recursive nesting (i.e. emulating ARMv8.3 in the VM) is not tested yet.
> - Other hypervisors (such as Xen) on KVM are not tested.
> 
> TODO:
> - Test to boot normal VMs on arm architecture
> - Test this on ARMv8.3 model
> - Support the guest hypervisor with VHE
> - Provide the guest hypervisor with the EL2 physical timer
> - Run other hypervisors such as Xen on KVM
> 

I have a couple of overall questions and comments on this series:

First, I think we should make sure that the series actually works with
v8.3 on the model using both VHE and non-VHE for the host hypervisor.

Second, this patch set is pretty large overall and it would be great if
we could split it up into some slightly more manageable bits.  I'm not
exactly sure how to do that, but perhaps we can rework it so that we add bits
of framework (CPU, memory, interrupt, timers) as individual series, and
finally we plug all the logic together with the current flow.  What do
you think?

Third, we should follow the feedback from David about not using a kernel
config option.  I'm afraid that some code will bitrot too fast if gated
by a kernel config option, so a runtime parameter and using static keys
where relevant seems like a better approach to me.  But since KVM/ARM is
not loaded as a module, this would have to be a kernel cmdline
parameter.  What do people think?
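
Just to illustrate what I mean, the runtime switch could be wired up
with something like this (sketch, parameter name made up):

	static bool nested_virt_enabled;

	static int __init early_nested_virt_cfg(char *buf)
	{
		return strtobool(buf, &nested_virt_enabled);
	}
	early_param("kvm-arm.nested", early_nested_virt_cfg);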

Fourth, there are some places where we have hard-coded information (like
the location of the GICH/GICV interfaces) which have to be fixed by
adding the required userspace interfaces.

Fifth, the ordering of the patches needs a bit of love. I think it's
important that we build the whole infrastructure first, but leave it
completely disabled until the end, and only then plug in the userspace
capability to create a nested VM.  So for example, I would expect patch
03 to be the last patch in the series.

Overall though, this is a massive amount of work, and it's awesome that
you were able to pull it together to a pretty nice initial RFC!

Thanks!
-Christoffer

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC 54/55] KVM: arm/arm64: Adjust virtual offset considering nesting
  2017-01-09  6:24 ` [RFC 54/55] KVM: arm/arm64: Adjust virtual offset considering nesting Jintack Lim
@ 2017-02-22 19:28   ` Christoffer Dall
  0 siblings, 0 replies; 111+ messages in thread
From: Christoffer Dall @ 2017-02-22 19:28 UTC (permalink / raw)
  To: Jintack Lim
  Cc: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, vladimir.murzin, suzuki.poulose,
	mark.rutland, james.morse, lorenzo.pieralisi, kevin.brodsky,
	wcohen, shankerd, geoff, andre.przywara, eric.auger, anna-maria,
	shihwei, linux-arm-kernel, kvmarm, kvm, linux-kernel

Hi Jintack,

On Mon, Jan 09, 2017 at 01:24:50AM -0500, Jintack Lim wrote:
> The guest hypervisor sets cntvoff_el2 for its VM (i.e. nested VM).  Note
> that physical/virtual counter value in the guest hypervisor's point of
> view is already offsetted by the virtual offset set by the host
> hypervisor.  Therefore, the correct offset we need to write to the
> cntvoff_el2 is the sum of offset the host hypervisor initially has for
> the VM and virtual offset the guest hypervisor sets for the nested VM.

This appears to be the only timer patch in the series.  Should we not
also expose the EL2 timer to the VM and emulate that in software?

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC 00/55] Nested Virtualization on KVM/ARM
  2017-02-22 18:23 ` Christoffer Dall
@ 2017-02-24 10:28   ` Jintack Lim
  0 siblings, 0 replies; 111+ messages in thread
From: Jintack Lim @ 2017-02-24 10:28 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: Christoffer Dall, Marc Zyngier, Paolo Bonzini,
	Radim Krčmář,
	linux, Catalin Marinas, Will Deacon, vladimir.murzin,
	Suzuki K Poulose, mark.rutland, james.morse, lorenzo.pieralisi,
	kevin.brodsky, wcohen, shankerd, geoff, Andre Przywara,
	Eric Auger, anna-maria, Shih-Wei Li, arm-mail-list, kvmarm,
	KVM General, lkml - Kernel Mailing List

[My previous reply had an HTML subpart, which made the e-mail look
terrible and got it rejected by the mailing lists. So, I'm sending it
again. Sorry for the inconvenience.]

Hi Christoffer,

On Wed, Feb 22, 2017 at 1:23 PM, Christoffer Dall <cdall@linaro.org> wrote:
> Hi Jintack,
>
>
> On Mon, Jan 09, 2017 at 01:23:56AM -0500, Jintack Lim wrote:
>> Nested virtualization is the ability to run a virtual machine inside another
>> virtual machine. In other words, it’s about running a hypervisor (the guest
>> hypervisor) on top of another hypervisor (the host hypervisor).
>>
>> This series supports nested virtualization on arm64. ARM recently announced an
>> extension (ARMv8.3) which has support for nested virtualization[1]. This series
>> is based on the ARMv8.3 specification.
>>
>> Supporting nested virtualization means that the hypervisor provides not only
>> EL0/EL1 execution environment with VMs as it usually does, but also the
>> virtualization extensions including EL2 execution environment with the VMs.
>> Once the host hypervisor provides those execution environment with the VMs,
>> then the guest hypervisor can run its own VMs (nested VMs) naturally.
>>
>> To support nested virtualization on ARM the hypervisor must emulate a virtual
>> execution environment consisting of EL2, EL1, and EL0, as the guest hypervisor
>> will run in a virtual EL2 mode.  Normally KVM/ARM only emulated a VM supporting
>> EL1/0 running in their respective native CPU modes, but with nested
>> virtualization we deprivilege the guest hypervisor and emulate a virtual EL2
>> execution mode in EL1 using the hardware features provided by ARMv8.3 to trap
>> EL2 operations to EL1. To do that the host hypervisor needs to manage EL2
>> register state for the guest hypervisor, and shadow EL1 register state that
>> reflects the EL2 register state to run the guest hypervisor in EL1. See patch 6
>> through 10 for this.
>>
>> For memory virtualization, the biggest issue is that we now have more than two
>> stages of translation when running nested VMs. We choose to merge two stage-2
>> page tables (one from the guest hypervisor and the other from the host
>> hypervisor) and create shadow stage-2 page tables, which have mappings from the
>> nested VM’s physical addresses to the machine physical addresses. Stage-1
>> translation is done by the hardware as is done for the normal VMs.
>>
>> To provide VGIC support to the guest hypervisor, we emulate the GIC
>> virtualization extensions using trap-and-emulate to a virtual GIC Hypervisor
>> Control Interface.  Furthermore, we can still use the GIC VE hardware features
>> to deliver virtual interrupts to the nested VM, by directly mapping the GIC
>> VCPU interface to the nested VM and switching the content of the GIC Hypervisor
>> Control interface when alternating between a nested VM and a normal VM.  See
>> patches 25 through 32, and 50 through 52 for more information.
>>
>> For timer virtualization, the guest hypervisor expects to have access to the
>> EL2 physical timer, the EL1 physical timer and the virtual timer. So, the host
>> hypervisor needs to provide all of them. The virtual timer is always available
>> to VMs. The physical timer is available to VMs via my previous patch series[3].
>> The EL2 physical timer is not supported yet in this RFC. We plan to support
>> this as it is required to run other guest hypervisors such as Xen.
>>
>> Even though this work is not complete (see limitations below), I'd appreciate
>> early feedback on this RFC. Specifically, I'm interested in:
>> - Is it better to have a kernel config or to make it configurable at runtime?
>> - I wonder if the data structure for memory management makes sense.
>> - What architecture version do we support for the guest hypervisor, and how?
>>   For example, do we always support all architecture versions or the same
>>   architecture as the underlying hardware platform? Or is it better
>>   to make it configurable from userspace?
>> - Initial comments on the overall design?
>>
>> This patch series is based on kvm-arm-for-4.9-rc7 with the patch series to provide
>> VMs with the EL1 physical timer[2].
>>
>> Git: https://github.com/columbia/nesting-pub/tree/rfc-v1
>>
>> Testing:
>> We have tested this on ARMv8.0 (Applied Micro X-Gene)[3] since ARMv8.3 hardware
>> is not available yet. We have paravirtualized the guest hypervisor to trap to
>> EL2 as specified in the ARMv8.3 specification using the hvc instruction. We plan
>> to test this on an ARMv8.3 model, and will post the result and v2 if necessary.
>>
>> Limitations:
>> - This patch series only supports arm64, not arm. All the patches compile on
>>   arm, but I haven't tried to boot normal VMs on it.
>> - The guest hypervisor with VHE (ARMv8.1) is not supported in this RFC. I have
>>   patches for that, but they need to be cleaned up.
>> - Recursive nesting (i.e. emulating ARMv8.3 in the VM) is not tested yet.
>> - Other hypervisors (such as Xen) on KVM are not tested.
>>
>> TODO:
>> - Test booting normal VMs on the arm architecture
>> - Test this on an ARMv8.3 model
>> - Support the guest hypervisor with VHE
>> - Provide the guest hypervisor with the EL2 physical timer
>> - Run other hypervisors such as Xen on KVM
>>
>
> I have a couple of overall questions and comments on this series:
>
> First, I think we should make sure that the series actually works with
> v8.3 on the model using both VHE and non-VHE for the host hypervisor.

I agree. Will send out v2 once I make this work with v8.3 model.

>
> Second, this patch set is pretty large overall and it would be great if
> we could split it up into some slightly more manageable bits.  I'm not
> exactly sure how to do that, but perhaps we can rework it so that we add bits
> of framework (CPU, memory, interrupt, timers) as individual series, and
> finally we plug all the logic together with the current flow.  What do
> you think?

I think it sounds great. I can start with CPU patch series first.

>
> Third, we should follow the feedback from David about not using a kernel
> config option.  I'm afraid that some code will bitrot too fast if guided
> by a kernel config option, so a runtime parameter and using static keys
> where relevant seems like a better approach to me.  But since KVM/ARM is
> not loaded as a module, this would have to be a kernel cmdline
> parameter.  What do people think?
>
> Fourth, there are some places where we have hard-coded information (like
> the location of the GICH/GICV interfaces) which have to be fixed by
> adding the required userspace interfaces.

Right. I'll fix them and I'll provide a link which has userspace
changes for this nesting work in the cover letter.

>
> Fifth, the ordering of the patches needs a bit of love. I think it's
> important that we build the whole infrastructure first, but leave it
> completely disabled until the end, and then we plug in all the
> capabilities of userspace to create a nested VM in the end.  So for
> example, I would expect that patch 03 would be the last patch in the
> series.

Ah, I got it. I'll reorder patches accordingly.

>
> Overall though, this is a massive amount of work, and it's awesome that
> you were able to pull it together to a pretty nice initial RFC!

Thanks a lot for your help and reviews. I'll address individual reviews soon :)

Thanks,
Jintack

>
> Thanks!
> -Christoffer
>

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC 07/55] KVM: arm/arm64: Add virtual EL2 state emulation framework
  2017-01-09  6:24 ` [RFC 07/55] KVM: arm/arm64: Add virtual EL2 state emulation framework Jintack Lim
  2017-02-22 11:12   ` Christoffer Dall
@ 2017-06-01 20:05   ` Bandan Das
  2017-06-02 11:51     ` Christoffer Dall
  1 sibling, 1 reply; 111+ messages in thread
From: Bandan Das @ 2017-06-01 20:05 UTC (permalink / raw)
  To: Jintack Lim
  Cc: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, vladimir.murzin, suzuki.poulose,
	mark.rutland, james.morse, lorenzo.pieralisi, kevin.brodsky,
	wcohen, shankerd, geoff, andre.przywara, eric.auger, anna-maria,
	shihwei, linux-arm-kernel, kvmarm, kvm, linux-kernel

Jintack Lim <jintack@cs.columbia.edu> writes:
...
> +/**
> + * kvm_arm_setup_shadow_state -- prepare shadow state based on emulated mode
> + * @vcpu: The VCPU pointer
> + */
> +void kvm_arm_setup_shadow_state(struct kvm_vcpu *vcpu)
> +{
> +	struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
> +
> +	ctxt->hw_pstate = *vcpu_cpsr(vcpu);
> +	ctxt->hw_sys_regs = ctxt->sys_regs;
> +	ctxt->hw_sp_el1 = ctxt->gp_regs.sp_el1;
> +}
> +
> +/**
> + * kvm_arm_restore_shadow_state -- write back shadow state from guest
> + * @vcpu: The VCPU pointer
> + */
> +void kvm_arm_restore_shadow_state(struct kvm_vcpu *vcpu)
> +{
> +	struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
> +
> +	*vcpu_cpsr(vcpu) = ctxt->hw_pstate;
> +	ctxt->gp_regs.sp_el1 = ctxt->hw_sp_el1;
> +}
> +
> +void kvm_arm_init_cpu_context(kvm_cpu_context_t *cpu_ctxt)
> +{
> +	cpu_ctxt->hw_sys_regs = &cpu_ctxt->sys_regs[0];
> +}


IIUC, the *_shadow_state() functions will set hw_* pointers to
either point to the "real" state or the shadow state to manage L2 ?
Maybe, it might make sense to make these function names a little more
generic since they are not dealing with setting the shadow state
alone.

> diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
> index 9341376..f2a1b32 100644
> --- a/arch/arm64/kvm/hyp/sysreg-sr.c
> +++ b/arch/arm64/kvm/hyp/sysreg-sr.c
> @@ -19,6 +19,7 @@
>  #include <linux/kvm_host.h>
>  
>  #include <asm/kvm_asm.h>
> +#include <asm/kvm_emulate.h>
>  #include <asm/kvm_hyp.h>
>  
>  /* Yes, this does nothing, on purpose */
> @@ -33,37 +34,41 @@ static void __hyp_text __sysreg_do_nothing(struct kvm_cpu_context *ctxt) { }
>  
>  static void __hyp_text __sysreg_save_common_state(struct kvm_cpu_context *ctxt)
>  {
> -	ctxt->sys_regs[ACTLR_EL1]	= read_sysreg(actlr_el1);
> -	ctxt->sys_regs[TPIDR_EL0]	= read_sysreg(tpidr_el0);
> -	ctxt->sys_regs[TPIDRRO_EL0]	= read_sysreg(tpidrro_el0);
> -	ctxt->sys_regs[TPIDR_EL1]	= read_sysreg(tpidr_el1);
> -	ctxt->sys_regs[MDSCR_EL1]	= read_sysreg(mdscr_el1);
> +	u64 *sys_regs = kern_hyp_va(ctxt->hw_sys_regs);
> +
> +	sys_regs[ACTLR_EL1]	= read_sysreg(actlr_el1);
> +	sys_regs[TPIDR_EL0]	= read_sysreg(tpidr_el0);
> +	sys_regs[TPIDRRO_EL0]	= read_sysreg(tpidrro_el0);
> +	sys_regs[TPIDR_EL1]	= read_sysreg(tpidr_el1);
> +	sys_regs[MDSCR_EL1]	= read_sysreg(mdscr_el1);
>  	ctxt->gp_regs.regs.sp		= read_sysreg(sp_el0);
>  	ctxt->gp_regs.regs.pc		= read_sysreg_el2(elr);
> -	ctxt->gp_regs.regs.pstate	= read_sysreg_el2(spsr);
> +	ctxt->hw_pstate			= read_sysreg_el2(spsr);
>  }
>  
>  static void __hyp_text __sysreg_save_state(struct kvm_cpu_context *ctxt)
>  {
> -	ctxt->sys_regs[MPIDR_EL1]	= read_sysreg(vmpidr_el2);
> -	ctxt->sys_regs[CSSELR_EL1]	= read_sysreg(csselr_el1);
> -	ctxt->sys_regs[SCTLR_EL1]	= read_sysreg_el1(sctlr);
> -	ctxt->sys_regs[CPACR_EL1]	= read_sysreg_el1(cpacr);
> -	ctxt->sys_regs[TTBR0_EL1]	= read_sysreg_el1(ttbr0);
> -	ctxt->sys_regs[TTBR1_EL1]	= read_sysreg_el1(ttbr1);
> -	ctxt->sys_regs[TCR_EL1]		= read_sysreg_el1(tcr);
> -	ctxt->sys_regs[ESR_EL1]		= read_sysreg_el1(esr);
> -	ctxt->sys_regs[AFSR0_EL1]	= read_sysreg_el1(afsr0);
> -	ctxt->sys_regs[AFSR1_EL1]	= read_sysreg_el1(afsr1);
> -	ctxt->sys_regs[FAR_EL1]		= read_sysreg_el1(far);
> -	ctxt->sys_regs[MAIR_EL1]	= read_sysreg_el1(mair);
> -	ctxt->sys_regs[VBAR_EL1]	= read_sysreg_el1(vbar);
> -	ctxt->sys_regs[CONTEXTIDR_EL1]	= read_sysreg_el1(contextidr);
> -	ctxt->sys_regs[AMAIR_EL1]	= read_sysreg_el1(amair);
> -	ctxt->sys_regs[CNTKCTL_EL1]	= read_sysreg_el1(cntkctl);
> -	ctxt->sys_regs[PAR_EL1]		= read_sysreg(par_el1);
> -
> -	ctxt->gp_regs.sp_el1		= read_sysreg(sp_el1);
> +	u64 *sys_regs = kern_hyp_va(ctxt->hw_sys_regs);
> +
> +	sys_regs[MPIDR_EL1]	= read_sysreg(vmpidr_el2);
> +	sys_regs[CSSELR_EL1]	= read_sysreg(csselr_el1);
> +	sys_regs[SCTLR_EL1]	= read_sysreg_el1(sctlr);
> +	sys_regs[CPACR_EL1]	= read_sysreg_el1(cpacr);
> +	sys_regs[TTBR0_EL1]	= read_sysreg_el1(ttbr0);
> +	sys_regs[TTBR1_EL1]	= read_sysreg_el1(ttbr1);
> +	sys_regs[TCR_EL1]	= read_sysreg_el1(tcr);
> +	sys_regs[ESR_EL1]	= read_sysreg_el1(esr);
> +	sys_regs[AFSR0_EL1]	= read_sysreg_el1(afsr0);
> +	sys_regs[AFSR1_EL1]	= read_sysreg_el1(afsr1);
> +	sys_regs[FAR_EL1]	= read_sysreg_el1(far);
> +	sys_regs[MAIR_EL1]	= read_sysreg_el1(mair);
> +	sys_regs[VBAR_EL1]	= read_sysreg_el1(vbar);
> +	sys_regs[CONTEXTIDR_EL1]	= read_sysreg_el1(contextidr);
> +	sys_regs[AMAIR_EL1]	= read_sysreg_el1(amair);
> +	sys_regs[CNTKCTL_EL1]	= read_sysreg_el1(cntkctl);
> +	sys_regs[PAR_EL1]		= read_sysreg(par_el1);
> +
> +	ctxt->hw_sp_el1			= read_sysreg(sp_el1);
>  	ctxt->gp_regs.elr_el1		= read_sysreg_el1(elr);
>  	ctxt->gp_regs.spsr[KVM_SPSR_EL1]= read_sysreg_el1(spsr);
>  }
> @@ -86,37 +91,41 @@ void __hyp_text __sysreg_save_guest_state(struct kvm_cpu_context *ctxt)
>  
>  static void __hyp_text __sysreg_restore_common_state(struct kvm_cpu_context *ctxt)
>  {
> -	write_sysreg(ctxt->sys_regs[ACTLR_EL1],	  actlr_el1);
> -	write_sysreg(ctxt->sys_regs[TPIDR_EL0],	  tpidr_el0);
> -	write_sysreg(ctxt->sys_regs[TPIDRRO_EL0], tpidrro_el0);
> -	write_sysreg(ctxt->sys_regs[TPIDR_EL1],	  tpidr_el1);
> -	write_sysreg(ctxt->sys_regs[MDSCR_EL1],	  mdscr_el1);
> +	u64 *sys_regs = kern_hyp_va(ctxt->hw_sys_regs);
> +
> +	write_sysreg(sys_regs[ACTLR_EL1],	  actlr_el1);
> +	write_sysreg(sys_regs[TPIDR_EL0],	  tpidr_el0);
> +	write_sysreg(sys_regs[TPIDRRO_EL0],	tpidrro_el0);
> +	write_sysreg(sys_regs[TPIDR_EL1],	  tpidr_el1);
> +	write_sysreg(sys_regs[MDSCR_EL1],	  mdscr_el1);
>  	write_sysreg(ctxt->gp_regs.regs.sp,	  sp_el0);
>  	write_sysreg_el2(ctxt->gp_regs.regs.pc,	  elr);
> -	write_sysreg_el2(ctxt->gp_regs.regs.pstate, spsr);
> +	write_sysreg_el2(ctxt->hw_pstate,	  spsr);
>  }
>  
>  static void __hyp_text __sysreg_restore_state(struct kvm_cpu_context *ctxt)
>  {
> -	write_sysreg(ctxt->sys_regs[MPIDR_EL1],		vmpidr_el2);
> -	write_sysreg(ctxt->sys_regs[CSSELR_EL1],	csselr_el1);
> -	write_sysreg_el1(ctxt->sys_regs[SCTLR_EL1],	sctlr);
> -	write_sysreg_el1(ctxt->sys_regs[CPACR_EL1],	cpacr);
> -	write_sysreg_el1(ctxt->sys_regs[TTBR0_EL1],	ttbr0);
> -	write_sysreg_el1(ctxt->sys_regs[TTBR1_EL1],	ttbr1);
> -	write_sysreg_el1(ctxt->sys_regs[TCR_EL1],	tcr);
> -	write_sysreg_el1(ctxt->sys_regs[ESR_EL1],	esr);
> -	write_sysreg_el1(ctxt->sys_regs[AFSR0_EL1],	afsr0);
> -	write_sysreg_el1(ctxt->sys_regs[AFSR1_EL1],	afsr1);
> -	write_sysreg_el1(ctxt->sys_regs[FAR_EL1],	far);
> -	write_sysreg_el1(ctxt->sys_regs[MAIR_EL1],	mair);
> -	write_sysreg_el1(ctxt->sys_regs[VBAR_EL1],	vbar);
> -	write_sysreg_el1(ctxt->sys_regs[CONTEXTIDR_EL1],contextidr);
> -	write_sysreg_el1(ctxt->sys_regs[AMAIR_EL1],	amair);
> -	write_sysreg_el1(ctxt->sys_regs[CNTKCTL_EL1], 	cntkctl);
> -	write_sysreg(ctxt->sys_regs[PAR_EL1],		par_el1);
> -
> -	write_sysreg(ctxt->gp_regs.sp_el1,		sp_el1);
> +	u64 *sys_regs = kern_hyp_va(ctxt->hw_sys_regs);
> +
> +	write_sysreg(sys_regs[MPIDR_EL1],	vmpidr_el2);
> +	write_sysreg(sys_regs[CSSELR_EL1],	csselr_el1);
> +	write_sysreg_el1(sys_regs[SCTLR_EL1],	sctlr);
> +	write_sysreg_el1(sys_regs[CPACR_EL1],	cpacr);
> +	write_sysreg_el1(sys_regs[TTBR0_EL1],	ttbr0);
> +	write_sysreg_el1(sys_regs[TTBR1_EL1],	ttbr1);
> +	write_sysreg_el1(sys_regs[TCR_EL1],	tcr);
> +	write_sysreg_el1(sys_regs[ESR_EL1],	esr);
> +	write_sysreg_el1(sys_regs[AFSR0_EL1],	afsr0);
> +	write_sysreg_el1(sys_regs[AFSR1_EL1],	afsr1);
> +	write_sysreg_el1(sys_regs[FAR_EL1],	far);
> +	write_sysreg_el1(sys_regs[MAIR_EL1],	mair);
> +	write_sysreg_el1(sys_regs[VBAR_EL1],	vbar);
> +	write_sysreg_el1(sys_regs[CONTEXTIDR_EL1], contextidr);
> +	write_sysreg_el1(sys_regs[AMAIR_EL1],	amair);
> +	write_sysreg_el1(sys_regs[CNTKCTL_EL1], cntkctl);
> +	write_sysreg(sys_regs[PAR_EL1],		par_el1);
> +
> +	write_sysreg(ctxt->hw_sp_el1,			sp_el1);
>  	write_sysreg_el1(ctxt->gp_regs.elr_el1,		elr);
>  	write_sysreg_el1(ctxt->gp_regs.spsr[KVM_SPSR_EL1],spsr);
>  }

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC 08/55] KVM: arm64: Set virtual EL2 context depending on the guest exception level
  2017-01-09  6:24 ` [RFC 08/55] KVM: arm64: Set virtual EL2 context depending on the guest exception level Jintack Lim
  2017-02-22 11:14   ` Christoffer Dall
@ 2017-06-01 20:22   ` Bandan Das
  2017-06-02  8:48     ` Marc Zyngier
  1 sibling, 1 reply; 111+ messages in thread
From: Bandan Das @ 2017-06-01 20:22 UTC (permalink / raw)
  To: Jintack Lim
  Cc: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, vladimir.murzin, suzuki.poulose,
	mark.rutland, james.morse, lorenzo.pieralisi, kevin.brodsky,
	wcohen, shankerd, geoff, andre.przywara, eric.auger, anna-maria,
	shihwei, linux-arm-kernel, kvmarm, kvm, linux-kernel

Jintack Lim <jintack@cs.columbia.edu> writes:

> From: Christoffer Dall <christoffer.dall@linaro.org>
>
> Set up the virtual EL2 context in hardware if the guest exception level is
> EL2.
>
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
> ---
>  arch/arm64/kvm/context.c | 32 ++++++++++++++++++++++++++------
>  1 file changed, 26 insertions(+), 6 deletions(-)
>
> diff --git a/arch/arm64/kvm/context.c b/arch/arm64/kvm/context.c
> index 320afc6..acb4b1e 100644
> --- a/arch/arm64/kvm/context.c
> +++ b/arch/arm64/kvm/context.c
> @@ -25,10 +25,25 @@
>  void kvm_arm_setup_shadow_state(struct kvm_vcpu *vcpu)
>  {
>  	struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
> +	if (unlikely(vcpu_mode_el2(vcpu))) {
> +		ctxt->hw_pstate = *vcpu_cpsr(vcpu) & ~PSR_MODE_MASK;
>  
> -	ctxt->hw_pstate = *vcpu_cpsr(vcpu);
> -	ctxt->hw_sys_regs = ctxt->sys_regs;
> -	ctxt->hw_sp_el1 = ctxt->gp_regs.sp_el1;
> +		/*
> +		 * We emulate virtual EL2 mode in hardware EL1 mode using the
> +		 * same stack pointer mode as the guest expects.
> +		 */
> +		if ((*vcpu_cpsr(vcpu) & PSR_MODE_MASK) == PSR_MODE_EL2h)
> +			ctxt->hw_pstate |= PSR_MODE_EL1h;
> +		else
> +			ctxt->hw_pstate |= PSR_MODE_EL1t;
> +

I see vcpu_mode_el2() does
return mode == PSR_MODE_EL2h || mode == PSR_MODE_EL2t;

I can't seem to find this; what's the difference between
the modes PSR_MODE_EL2h/EL2t?

Bandan

> +		ctxt->hw_sys_regs = ctxt->shadow_sys_regs;
> +		ctxt->hw_sp_el1 = ctxt->el2_regs[SP_EL2];
> +	} else {
> +		ctxt->hw_pstate = *vcpu_cpsr(vcpu);
> +		ctxt->hw_sys_regs = ctxt->sys_regs;
> +		ctxt->hw_sp_el1 = ctxt->gp_regs.sp_el1;
> +	}
>  }
>  
>  /**
> @@ -38,9 +53,14 @@ void kvm_arm_setup_shadow_state(struct kvm_vcpu *vcpu)
>  void kvm_arm_restore_shadow_state(struct kvm_vcpu *vcpu)
>  {
>  	struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
> -
> -	*vcpu_cpsr(vcpu) = ctxt->hw_pstate;
> -	ctxt->gp_regs.sp_el1 = ctxt->hw_sp_el1;
> +	if (unlikely(vcpu_mode_el2(vcpu))) {
> +		*vcpu_cpsr(vcpu) &= PSR_MODE_MASK;
> +		*vcpu_cpsr(vcpu) |= ctxt->hw_pstate & ~PSR_MODE_MASK;
> +		ctxt->el2_regs[SP_EL2] = ctxt->hw_sp_el1;
> +	} else {
> +		*vcpu_cpsr(vcpu) = ctxt->hw_pstate;
> +		ctxt->gp_regs.sp_el1 = ctxt->hw_sp_el1;
> +	}
>  }
>  
>  void kvm_arm_init_cpu_context(kvm_cpu_context_t *cpu_ctxt)

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC 08/55] KVM: arm64: Set virtual EL2 context depending on the guest exception level
  2017-06-01 20:22   ` Bandan Das
@ 2017-06-02  8:48     ` Marc Zyngier
  0 siblings, 0 replies; 111+ messages in thread
From: Marc Zyngier @ 2017-06-02  8:48 UTC (permalink / raw)
  To: Bandan Das, Jintack Lim
  Cc: christoffer.dall, pbonzini, rkrcmar, linux, catalin.marinas,
	will.deacon, vladimir.murzin, suzuki.poulose, mark.rutland,
	james.morse, lorenzo.pieralisi, kevin.brodsky, wcohen, shankerd,
	geoff, andre.przywara, eric.auger, anna-maria, shihwei,
	linux-arm-kernel, kvmarm, kvm, linux-kernel

On 01/06/17 21:22, Bandan Das wrote:
> Jintack Lim <jintack@cs.columbia.edu> writes:
> 
>> From: Christoffer Dall <christoffer.dall@linaro.org>
>>
>> Set up the virtual EL2 context in hardware if the guest exception level is
>> EL2.
>>
>> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
>> Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
>> ---
>>  arch/arm64/kvm/context.c | 32 ++++++++++++++++++++++++++------
>>  1 file changed, 26 insertions(+), 6 deletions(-)
>>
>> diff --git a/arch/arm64/kvm/context.c b/arch/arm64/kvm/context.c
>> index 320afc6..acb4b1e 100644
>> --- a/arch/arm64/kvm/context.c
>> +++ b/arch/arm64/kvm/context.c
>> @@ -25,10 +25,25 @@
>>  void kvm_arm_setup_shadow_state(struct kvm_vcpu *vcpu)
>>  {
>>  	struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
>> +	if (unlikely(vcpu_mode_el2(vcpu))) {
>> +		ctxt->hw_pstate = *vcpu_cpsr(vcpu) & ~PSR_MODE_MASK;
>>  
>> -	ctxt->hw_pstate = *vcpu_cpsr(vcpu);
>> -	ctxt->hw_sys_regs = ctxt->sys_regs;
>> -	ctxt->hw_sp_el1 = ctxt->gp_regs.sp_el1;
>> +		/*
>> +		 * We emulate virtual EL2 mode in hardware EL1 mode using the
>> +		 * same stack pointer mode as the guest expects.
>> +		 */
>> +		if ((*vcpu_cpsr(vcpu) & PSR_MODE_MASK) == PSR_MODE_EL2h)
>> +			ctxt->hw_pstate |= PSR_MODE_EL1h;
>> +		else
>> +			ctxt->hw_pstate |= PSR_MODE_EL1t;
>> +
> 
> I see vcpu_mode_el2() does
> return mode == PSR_MODE_EL2h || mode == PSR_MODE_EL2t;
> 
> I can't seem to find this; what's the difference between
> the modes PSR_MODE_EL2h/EL2t?

The difference is the stack pointer that gets used. When the CPU is at
ELxh, it uses SP_ELx. When at ELxt, it uses SP_EL0 (the userspace
stack pointer). See the definition of SPSR_EL2 in the ARMv8 ARM.
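
In code terms, a minimal sketch of the mapping the hunk above performs
(the PSR_MODE_* constants are real; the helper itself is only
illustrative, not part of the series):

	/* The h/t suffix encodes the stack pointer selection (PSTATE.SP):
	 * ELxh uses SP_ELx, ELxt uses SP_EL0. Deprivileging virtual EL2
	 * to hardware EL1 preserves that selection. */
	static inline unsigned long vel2_to_hw_mode(unsigned long cpsr)
	{
		if ((cpsr & PSR_MODE_MASK) == PSR_MODE_EL2h)
			return PSR_MODE_EL1h;	/* dedicated stack pointer */
		return PSR_MODE_EL1t;		/* shared SP_EL0 */
	}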

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC 07/55] KVM: arm/arm64: Add virtual EL2 state emulation framework
  2017-06-01 20:05   ` Bandan Das
@ 2017-06-02 11:51     ` Christoffer Dall
  2017-06-02 17:36       ` Bandan Das
  0 siblings, 1 reply; 111+ messages in thread
From: Christoffer Dall @ 2017-06-02 11:51 UTC (permalink / raw)
  To: Bandan Das
  Cc: Jintack Lim, christoffer.dall, marc.zyngier, pbonzini, rkrcmar,
	linux, catalin.marinas, will.deacon, vladimir.murzin,
	suzuki.poulose, mark.rutland, james.morse, lorenzo.pieralisi,
	kevin.brodsky, wcohen, shankerd, geoff, andre.przywara,
	eric.auger, anna-maria, shihwei, linux-arm-kernel, kvmarm, kvm,
	linux-kernel

On Thu, Jun 01, 2017 at 04:05:49PM -0400, Bandan Das wrote:
> Jintack Lim <jintack@cs.columbia.edu> writes:
> ...
> > +/**
> > + * kvm_arm_setup_shadow_state -- prepare shadow state based on emulated mode
> > + * @vcpu: The VCPU pointer
> > + */
> > +void kvm_arm_setup_shadow_state(struct kvm_vcpu *vcpu)
> > +{
> > +	struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
> > +
> > +	ctxt->hw_pstate = *vcpu_cpsr(vcpu);
> > +	ctxt->hw_sys_regs = ctxt->sys_regs;
> > +	ctxt->hw_sp_el1 = ctxt->gp_regs.sp_el1;
> > +}
> > +
> > +/**
> > + * kvm_arm_restore_shadow_state -- write back shadow state from guest
> > + * @vcpu: The VCPU pointer
> > + */
> > +void kvm_arm_restore_shadow_state(struct kvm_vcpu *vcpu)
> > +{
> > +	struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
> > +
> > +	*vcpu_cpsr(vcpu) = ctxt->hw_pstate;
> > +	ctxt->gp_regs.sp_el1 = ctxt->hw_sp_el1;
> > +}
> > +
> > +void kvm_arm_init_cpu_context(kvm_cpu_context_t *cpu_ctxt)
> > +{
> > +	cpu_ctxt->hw_sys_regs = &cpu_ctxt->sys_regs[0];
> > +}
> 
> 
> IIUC, the *_shadow_state() functions will set hw_* pointers to
> either point to the "real" state or the shadow state to manage L2 ?
> Maybe, it might make sense to make these function names a little more
> generic since they are not dealing with setting the shadow state
> alone.
> 

The notion of 'shadow state' is borrowed from shadow page tables, in
which you always load some 'shadow copy' of the 'real value' into the
hardware, so the shadow state is the one that's used for execution by
the hardware.

The shadow state may be the same as the VCPU's EL1 state, for example,
or it may be a modified version of the VCPU's EL2 state.

If you have better suggestions for naming, we're open to that though.

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC 07/55] KVM: arm/arm64: Add virtual EL2 state emulation framework
  2017-06-02 11:51     ` Christoffer Dall
@ 2017-06-02 17:36       ` Bandan Das
  2017-06-02 19:06         ` Christoffer Dall
  0 siblings, 1 reply; 111+ messages in thread
From: Bandan Das @ 2017-06-02 17:36 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: Jintack Lim, christoffer.dall, marc.zyngier, pbonzini, rkrcmar,
	linux, catalin.marinas, will.deacon, vladimir.murzin,
	suzuki.poulose, mark.rutland, james.morse, lorenzo.pieralisi,
	kevin.brodsky, wcohen, shankerd, geoff, andre.przywara,
	eric.auger, anna-maria, shihwei, linux-arm-kernel, kvmarm, kvm,
	linux-kernel

Christoffer Dall <cdall@linaro.org> writes:

> On Thu, Jun 01, 2017 at 04:05:49PM -0400, Bandan Das wrote:
>> Jintack Lim <jintack@cs.columbia.edu> writes:
>> ...
>> > +/**
>> > + * kvm_arm_setup_shadow_state -- prepare shadow state based on emulated mode
>> > + * @vcpu: The VCPU pointer
>> > + */
>> > +void kvm_arm_setup_shadow_state(struct kvm_vcpu *vcpu)
>> > +{
>> > +	struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
>> > +
>> > +	ctxt->hw_pstate = *vcpu_cpsr(vcpu);
>> > +	ctxt->hw_sys_regs = ctxt->sys_regs;
>> > +	ctxt->hw_sp_el1 = ctxt->gp_regs.sp_el1;
>> > +}
>> > +
>> > +/**
>> > + * kvm_arm_restore_shadow_state -- write back shadow state from guest
>> > + * @vcpu: The VCPU pointer
>> > + */
>> > +void kvm_arm_restore_shadow_state(struct kvm_vcpu *vcpu)
>> > +{
>> > +	struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
>> > +
>> > +	*vcpu_cpsr(vcpu) = ctxt->hw_pstate;
>> > +	ctxt->gp_regs.sp_el1 = ctxt->hw_sp_el1;
>> > +}
>> > +
>> > +void kvm_arm_init_cpu_context(kvm_cpu_context_t *cpu_ctxt)
>> > +{
>> > +	cpu_ctxt->hw_sys_regs = &cpu_ctxt->sys_regs[0];
>> > +}
>> 
>> 
>> IIUC, the *_shadow_state() functions will set hw_* pointers to
>> either point to the "real" state or the shadow state to manage L2 ?
>> Maybe, it might make sense to make these function names a little more
>> generic since they are not dealing with setting the shadow state
>> alone.
>> 
>
> The notion of 'shadow state' is borrowed from shadow page tables, in
> which you always load some 'shadow copy' of the 'real value' into the
> hardware, so the shadow state is the one that's used for execution by
> the hardware.
>
> The shadow state may be the same as the VCPU's EL1 state, for example,
> or it may be a modified version of the VCPU's EL2 state.

Yes, it can be the same. Although, as you said above, "shadow" conventionally
refers to the latter. When it's pointing to EL1 state, it's not really
shadow state anymore.

> If you have better suggestions for naming, we're open to that though.
>

Oh nothing specifically, I just felt like "shadow" in the function name
could be confusing. Borrowing from kvm_arm_init_cpu_context(), 
how about kvm_arm_setup/restore_cpu_context()  ?

BTW, on a separate note, we might as well do away with the typedef and
use struct kvm_cpu_context directly.

> Thanks,
> -Christoffer

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC 07/55] KVM: arm/arm64: Add virtual EL2 state emulation framework
  2017-06-02 17:36       ` Bandan Das
@ 2017-06-02 19:06         ` Christoffer Dall
  2017-06-02 19:25           ` Bandan Das
  0 siblings, 1 reply; 111+ messages in thread
From: Christoffer Dall @ 2017-06-02 19:06 UTC (permalink / raw)
  To: Bandan Das
  Cc: Jintack Lim, christoffer.dall, marc.zyngier, pbonzini, rkrcmar,
	linux, catalin.marinas, will.deacon, vladimir.murzin,
	suzuki.poulose, mark.rutland, james.morse, lorenzo.pieralisi,
	kevin.brodsky, wcohen, shankerd, geoff, andre.przywara,
	eric.auger, anna-maria, shihwei, linux-arm-kernel, kvmarm, kvm,
	linux-kernel

On Fri, Jun 02, 2017 at 01:36:23PM -0400, Bandan Das wrote:
> Christoffer Dall <cdall@linaro.org> writes:
> 
> > On Thu, Jun 01, 2017 at 04:05:49PM -0400, Bandan Das wrote:
> >> Jintack Lim <jintack@cs.columbia.edu> writes:
> >> ...
> >> > +/**
> >> > + * kvm_arm_setup_shadow_state -- prepare shadow state based on emulated mode
> >> > + * @vcpu: The VCPU pointer
> >> > + */
> >> > +void kvm_arm_setup_shadow_state(struct kvm_vcpu *vcpu)
> >> > +{
> >> > +	struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
> >> > +
> >> > +	ctxt->hw_pstate = *vcpu_cpsr(vcpu);
> >> > +	ctxt->hw_sys_regs = ctxt->sys_regs;
> >> > +	ctxt->hw_sp_el1 = ctxt->gp_regs.sp_el1;
> >> > +}
> >> > +
> >> > +/**
> >> > + * kvm_arm_restore_shadow_state -- write back shadow state from guest
> >> > + * @vcpu: The VCPU pointer
> >> > + */
> >> > +void kvm_arm_restore_shadow_state(struct kvm_vcpu *vcpu)
> >> > +{
> >> > +	struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
> >> > +
> >> > +	*vcpu_cpsr(vcpu) = ctxt->hw_pstate;
> >> > +	ctxt->gp_regs.sp_el1 = ctxt->hw_sp_el1;
> >> > +}
> >> > +
> >> > +void kvm_arm_init_cpu_context(kvm_cpu_context_t *cpu_ctxt)
> >> > +{
> >> > +	cpu_ctxt->hw_sys_regs = &cpu_ctxt->sys_regs[0];
> >> > +}
> >> 
> >> 
> >> IIUC, the *_shadow_state() functions will set hw_* pointers to
> >> either point to the "real" state or the shadow state to manage L2 ?
> >> Maybe, it might make sense to make these function names a little more
> >> generic since they are not dealing with setting the shadow state
> >> alone.
> >> 
> >
> > The notion of 'shadow state' is borrowed from shadow page tables, in
> > which you always load some 'shadow copy' of the 'real value' into the
> > hardware, so the shadow state is the one that's used for execution by
> > the hardware.
> >
> > The shadow state may be the same as the VCPU's EL1 state, for example,
> > or it may be a modified version of the VCPU's EL2 state.
> 
> Yes, it can be the same. Although, as you said above, "shadow" conventionally
> refers to the latter.

That's not what I said.  I said shadow is the thing you use in the
hardware, which may be the same, and may be something different.  The
important point is that it is what gets used by the hardware, and
that it's decoupled, not necessarily different, from the virtual
state.

> When it's pointing to EL1 state, it's not really
> shadow state anymore.
> 

You can argue it both ways, in the end, all that's important is whether
or not it's clear what the functions do.

> > If you have better suggestions for naming, we're open to that though.
> >
> 
> Oh nothing specifically, I just felt like "shadow" in the function name
> could be confusing. Borrowing from kvm_arm_init_cpu_context(), 
> how about kvm_arm_setup/restore_cpu_context()  ?

I have no objection to these names.

> 
> BTW, on a separate note, we might as well do away with the typedef and
> use struct kvm_cpu_context directly.
> 
I don't think it's worth changing the code just for that, but if you
feel it's a significant cleanup, you can send a patch with a good
argument for why it's worth changing in the commit message.

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC 07/55] KVM: arm/arm64: Add virtual EL2 state emulation framework
  2017-06-02 19:06         ` Christoffer Dall
@ 2017-06-02 19:25           ` Bandan Das
  0 siblings, 0 replies; 111+ messages in thread
From: Bandan Das @ 2017-06-02 19:25 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: Jintack Lim, christoffer.dall, marc.zyngier, pbonzini, rkrcmar,
	linux, catalin.marinas, will.deacon, vladimir.murzin,
	suzuki.poulose, mark.rutland, james.morse, lorenzo.pieralisi,
	kevin.brodsky, wcohen, shankerd, geoff, andre.przywara,
	eric.auger, anna-maria, shihwei, linux-arm-kernel, kvmarm, kvm,
	linux-kernel

Christoffer Dall <cdall@linaro.org> writes:

> On Fri, Jun 02, 2017 at 01:36:23PM -0400, Bandan Das wrote:
>> Christoffer Dall <cdall@linaro.org> writes:
>> 
>> > On Thu, Jun 01, 2017 at 04:05:49PM -0400, Bandan Das wrote:
>> >> Jintack Lim <jintack@cs.columbia.edu> writes:
>> >> ...
>> >> > +/**
>> >> > + * kvm_arm_setup_shadow_state -- prepare shadow state based on emulated mode
>> >> > + * @vcpu: The VCPU pointer
>> >> > + */
>> >> > +void kvm_arm_setup_shadow_state(struct kvm_vcpu *vcpu)
>> >> > +{
>> >> > +	struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
>> >> > +
>> >> > +	ctxt->hw_pstate = *vcpu_cpsr(vcpu);
>> >> > +	ctxt->hw_sys_regs = ctxt->sys_regs;
>> >> > +	ctxt->hw_sp_el1 = ctxt->gp_regs.sp_el1;
>> >> > +}
>> >> > +
>> >> > +/**
>> >> > + * kvm_arm_restore_shadow_state -- write back shadow state from guest
>> >> > + * @vcpu: The VCPU pointer
>> >> > + */
>> >> > +void kvm_arm_restore_shadow_state(struct kvm_vcpu *vcpu)
>> >> > +{
>> >> > +	struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
>> >> > +
>> >> > +	*vcpu_cpsr(vcpu) = ctxt->hw_pstate;
>> >> > +	ctxt->gp_regs.sp_el1 = ctxt->hw_sp_el1;
>> >> > +}
>> >> > +
>> >> > +void kvm_arm_init_cpu_context(kvm_cpu_context_t *cpu_ctxt)
>> >> > +{
>> >> > +	cpu_ctxt->hw_sys_regs = &cpu_ctxt->sys_regs[0];
>> >> > +}
>> >> 
>> >> 
>> >> IIUC, the *_shadow_state() functions will set hw_* pointers to
>> >> either point to the "real" state or the shadow state to manage L2 ?
>> >> Maybe, it might make sense to make these function names a little more
>> >> generic since they are not dealing with setting the shadow state
>> >> alone.
>> >> 
>> >
>> > The notion of 'shadow state' is borrowed from shadow page tables, in
>> > which you always load some 'shadow copy' of the 'real value' into the
>> > hardware, so the shadow state is the one that's used for execution by
>> > the hardware.
>> >
>> > The shadow state may be the same as the VCPU's EL1 state, for example,
> >> > or it may be a modified version of the VCPU's EL2 state.
>> 
>> Yes, it can be the same. Although, as you said above, "shadow" conventionally
>> refers to the latter.
>
> That's not what I said.  I said shadow is the thing you use in the
> hardware, which may be the same, and may be something different.  The
> important point being, that it is what gets used by the hardware, and
> that it's decoupled, not necessarily different, from the virtual
> state.

I was referring to your first paragraph. And conventionally, in the context of
shadow page tables, it is always different.

>> When it's pointing to EL1 state, it's not really
>> shadow state anymore.
>> 
>
> You can argue it both ways, in the end, all that's important is whether
> or not it's clear what the functions do.
>
>> > If you have better suggestions for naming, we're open to that though.
>> >
>> 
>> Oh nothing specifically, I just felt like "shadow" in the function name
>> could be confusing. Borrowing from kvm_arm_init_cpu_context(), 
>> how about kvm_arm_setup/restore_cpu_context()  ?
>
> I have no objection to these names.
>
>> 
>> BTW, on a separate note, we might as well do away with the typedef and
>> use struct kvm_cpu_context directly.
>> 
> I don't think it's worth changing the code just for that, but if you
> feel it's a significant cleanup, you can send a patch with a good
> argument for why it's worth changing in the commit message.

Sure! The cleanup itself is not part of this series, but sticking to one
of the two forms within this patch is. As for the argument, typedefs for
structs are discouraged by the kernel coding style.
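
Concretely, the cleanup amounts to no more than, e.g.:

	/* preferred by the kernel coding style: */
	void kvm_arm_init_cpu_context(struct kvm_cpu_context *cpu_ctxt);

	/* rather than the typedef'd form used in the series: */
	void kvm_arm_init_cpu_context(kvm_cpu_context_t *cpu_ctxt);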

> Thanks,
> -Christoffer

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC 10/55] KVM: arm64: Synchronize EL1 system registers on virtual EL2 entry and exit
  2017-01-09  6:24 ` [RFC 10/55] KVM: arm64: Synchronize EL1 system registers on virtual EL2 entry and exit Jintack Lim
@ 2017-06-06 20:16   ` Bandan Das
  2017-06-07  4:26     ` Jintack Lim
  0 siblings, 1 reply; 111+ messages in thread
From: Bandan Das @ 2017-06-06 20:16 UTC (permalink / raw)
  To: Jintack Lim
  Cc: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, vladimir.murzin, suzuki.poulose,
	mark.rutland, james.morse, lorenzo.pieralisi, kevin.brodsky,
	wcohen, shankerd, geoff, andre.przywara, eric.auger, anna-maria,
	shihwei, linux-arm-kernel, kvmarm, kvm, linux-kernel

Jintack Lim <jintack@cs.columbia.edu> writes:

> From: Christoffer Dall <christoffer.dall@linaro.org>
>
> When running in virtual EL2 we use the shadow EL1 system register array
> for the save/restore process, so that hardware and especially the memory
> subsystem behaves as code written for EL2 expects while really running
> in EL1.
>
> This works great for EL1 system register accesses that we trap, because
> these accesses will be written into the virtual state for the EL1 system
> registers used when eventually switching the VCPU mode to EL1.
>
> However, there was a collection of EL1 system registers which we do not
> trap, and as a consequence all save/restore operations of these
> registers were happening locally in the shadow array, with no benefit to
> software actually running in virtual EL1 at all.
>
> To fix this, simply synchronize the shadow and real EL1 state for these
> registers on entry/exit to/from virtual EL2 state.
>
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
> ---
>  arch/arm64/kvm/context.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 47 insertions(+)
>
> diff --git a/arch/arm64/kvm/context.c b/arch/arm64/kvm/context.c
> index 2e9e386..0025dd9 100644
> --- a/arch/arm64/kvm/context.c
> +++ b/arch/arm64/kvm/context.c
> @@ -88,6 +88,51 @@ static void create_shadow_el1_sysregs(struct kvm_vcpu *vcpu)
>  	s_sys_regs[CPACR_EL1] = cptr_el2_to_cpacr_el1(el2_regs[CPTR_EL2]);
>  }
>  
> +/*
> + * List of EL1 registers which we allow the virtual EL2 mode to access
> + * directly without trapping and which haven't been paravirtualized.
> + *
> + * CNTKCTL_EL1 should probably not be copied but accessed via trap, because
> + * the guest hypervisor running in EL1 can be affected by event streams
> + * configured via CNTKCTL_EL1, which it does not expect. Since we don't have
> + * a mechanism to trap on CNTKCTL_EL1 as of now (v8.3), keep it in here instead.
> + */
> +static const int el1_non_trap_regs[] = {
> +	CNTKCTL_EL1,
> +	CSSELR_EL1,
> +	PAR_EL1,
> +	TPIDR_EL0,
> +	TPIDR_EL1,
> +	TPIDRRO_EL0
> +};
> +

Do we trap on all register accesses in the non-nested case, plus
all accesses to the memory access registers? I am trying to
understand how we decide which registers to trap on. For example,
shouldn't accesses to CSSELR_EL1, the cache size selection register,
be trapped?

Bandan


> +/**
> + * sync_shadow_el1_state - Going to/from the virtual EL2 state, sync state
> + * @vcpu:	The VCPU pointer
> + * @setup:	True, if on the way to the guest (called from setup)
> + *		False, if returning from the guest (called from restore)
> + *
> + * Some EL1 registers are accessed directly by the virtual EL2 mode because
> + * they in no way affect execution state in virtual EL2.   However, we must
> + * still ensure that virtual EL2 observes the same state of the EL1 registers
> + * as the normal VM's EL1 mode, so copy this state as needed on setup/restore.
> + */
> +static void sync_shadow_el1_state(struct kvm_vcpu *vcpu, bool setup)
> +{
> +	u64 *sys_regs = vcpu->arch.ctxt.sys_regs;
> +	u64 *s_sys_regs = vcpu->arch.ctxt.shadow_sys_regs;
> +	int i;
> +
> +	for (i = 0; i < ARRAY_SIZE(el1_non_trap_regs); i++) {
> +		const int sr = el1_non_trap_regs[i];
> +
> +		if (setup)
> +			s_sys_regs[sr] = sys_regs[sr];
> +		else
> +			sys_regs[sr] = s_sys_regs[sr];
> +	}
> +}
> +
>  /**
>   * kvm_arm_setup_shadow_state -- prepare shadow state based on emulated mode
>   * @vcpu: The VCPU pointer
> @@ -107,6 +152,7 @@ void kvm_arm_setup_shadow_state(struct kvm_vcpu *vcpu)
>  		else
>  			ctxt->hw_pstate |= PSR_MODE_EL1t;
>  
> +		sync_shadow_el1_state(vcpu, true);
>  		create_shadow_el1_sysregs(vcpu);
>  		ctxt->hw_sys_regs = ctxt->shadow_sys_regs;
>  		ctxt->hw_sp_el1 = ctxt->el2_regs[SP_EL2];
> @@ -125,6 +171,7 @@ void kvm_arm_restore_shadow_state(struct kvm_vcpu *vcpu)
>  {
>  	struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
>  	if (unlikely(vcpu_mode_el2(vcpu))) {
> +		sync_shadow_el1_state(vcpu, false);
>  		*vcpu_cpsr(vcpu) &= PSR_MODE_MASK;
>  		*vcpu_cpsr(vcpu) |= ctxt->hw_pstate & ~PSR_MODE_MASK;
>  		ctxt->el2_regs[SP_EL2] = ctxt->hw_sp_el1;

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC 11/55] KVM: arm64: Emulate taking an exception to the guest hypervisor
  2017-01-09  6:24 ` [RFC 11/55] KVM: arm64: Emulate taking an exception to the guest hypervisor Jintack Lim
  2017-02-22 11:28   ` Christoffer Dall
@ 2017-06-06 20:21   ` Bandan Das
  2017-06-06 20:38     ` Jintack Lim
  1 sibling, 1 reply; 111+ messages in thread
From: Bandan Das @ 2017-06-06 20:21 UTC (permalink / raw)
  To: Jintack Lim
  Cc: christoffer.dall, marc.zyngier, pbonzini, rkrcmar, linux,
	catalin.marinas, will.deacon, vladimir.murzin, suzuki.poulose,
	mark.rutland, james.morse, lorenzo.pieralisi, kevin.brodsky,
	wcohen, shankerd, geoff, andre.przywara, eric.auger, anna-maria,
	shihwei, linux-arm-kernel, kvmarm, kvm, linux-kernel

Jintack Lim <jintack@cs.columbia.edu> writes:

> Emulate taking an exception to the guest hypervisor running in the
> virtual EL2 as described in ARM ARM AArch64.TakeException().

ARM newbie here, I keep thinking of ARM ARM as a typo ;)
...
> +static inline int kvm_inject_nested_sync(struct kvm_vcpu *vcpu, u64 esr_el2)
> +{
> +	kvm_err("Unexpected call to %s for the non-nesting configuration\n",
> +		 __func__);
> +	return -EINVAL;
> +}
> +
> +static inline int kvm_inject_nested_irq(struct kvm_vcpu *vcpu)
> +{
> +	kvm_err("Unexpected call to %s for the non-nesting configuration\n",
> +		 __func__);
> +	return -EINVAL;
> +}
> +

I see these function stubs for aarch32 in the patches. I don't see how they
can actually be called though. Is this because eventually, there will be
a virtual el2 mode for aarch32 ?

Bandan

>  static inline void kvm_arm_setup_shadow_state(struct kvm_vcpu *vcpu) { };
>  static inline void kvm_arm_restore_shadow_state(struct kvm_vcpu *vcpu) { };
>  static inline void kvm_arm_init_cpu_context(kvm_cpu_context_t *cpu_ctxt) { };
> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> index 8892c82..0987ee4 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -42,6 +42,25 @@
>  void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long addr);
>  void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr);
>  
> +#ifdef CONFIG_KVM_ARM_NESTED_HYP
> +int kvm_inject_nested_sync(struct kvm_vcpu *vcpu, u64 esr_el2);
> +int kvm_inject_nested_irq(struct kvm_vcpu *vcpu);
> +#else
> +static inline int kvm_inject_nested_sync(struct kvm_vcpu *vcpu, u64 esr_el2)
> +{
> +	kvm_err("Unexpected call to %s for the non-nesting configuration\n",
> +		 __func__);
> +	return -EINVAL;
> +}
> +
> +static inline int kvm_inject_nested_irq(struct kvm_vcpu *vcpu)
> +{
> +	kvm_err("Unexpected call to %s for the non-nesting configuration\n",
> +		 __func__);
> +	return -EINVAL;
> +}
> +#endif
> +
>  void kvm_arm_setup_shadow_state(struct kvm_vcpu *vcpu);
>  void kvm_arm_restore_shadow_state(struct kvm_vcpu *vcpu);
>  void kvm_arm_init_cpu_context(kvm_cpu_context_t *cpu_ctxt);
> diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
> index 7811d27..b342bdd 100644
> --- a/arch/arm64/kvm/Makefile
> +++ b/arch/arm64/kvm/Makefile
> @@ -34,3 +34,5 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic/vgic-its.o
>  kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/irqchip.o
>  kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/arch_timer.o
>  kvm-$(CONFIG_KVM_ARM_PMU) += $(KVM)/arm/pmu.o
> +
> +kvm-$(CONFIG_KVM_ARM_NESTED_HYP) += emulate-nested.o
> diff --git a/arch/arm64/kvm/emulate-nested.c b/arch/arm64/kvm/emulate-nested.c
> new file mode 100644
> index 0000000..59d147f
> --- /dev/null
> +++ b/arch/arm64/kvm/emulate-nested.c
> @@ -0,0 +1,66 @@
> +/*
> + * Copyright (C) 2016 - Columbia University
> + * Author: Jintack Lim <jintack@cs.columbia.edu>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include <linux/kvm.h>
> +#include <linux/kvm_host.h>
> +
> +#include <asm/kvm_emulate.h>
> +
> +#include "trace.h"
> +
> +#define	EL2_EXCEPT_SYNC_OFFSET	0x400
> +#define	EL2_EXCEPT_ASYNC_OFFSET	0x480
> +
> +
> +/*
> + *  Emulate taking an exception. See ARM ARM J8.1.2 AArch64.TakeException()
> + */
> +static int kvm_inject_nested(struct kvm_vcpu *vcpu, u64 esr_el2,
> +			     int exception_offset)
> +{
> +	int ret = 1;
> +	kvm_cpu_context_t *ctxt = &vcpu->arch.ctxt;
> +
> +	/* We don't inject an exception recursively to virtual EL2 */
> +	if (vcpu_mode_el2(vcpu))
> +		BUG();
> +
> +	ctxt->el2_regs[SPSR_EL2] = *vcpu_cpsr(vcpu);
> +	ctxt->el2_regs[ELR_EL2] = *vcpu_pc(vcpu);
> +	ctxt->el2_regs[ESR_EL2] = esr_el2;
> +
> +	/* On an exception, PSTATE.SP = 1 */
> +	*vcpu_cpsr(vcpu) = PSR_MODE_EL2h;
> +	*vcpu_cpsr(vcpu) |=  (PSR_A_BIT | PSR_F_BIT | PSR_I_BIT | PSR_D_BIT);
> +	*vcpu_pc(vcpu) = ctxt->el2_regs[VBAR_EL2] + exception_offset;
> +
> +	trace_kvm_inject_nested_exception(vcpu, esr_el2, *vcpu_pc(vcpu));
> +
> +	return ret;
> +}
> +
> +int kvm_inject_nested_sync(struct kvm_vcpu *vcpu, u64 esr_el2)
> +{
> +	return kvm_inject_nested(vcpu, esr_el2, EL2_EXCEPT_SYNC_OFFSET);
> +}
> +
> +int kvm_inject_nested_irq(struct kvm_vcpu *vcpu)
> +{
> +	u64 esr_el2 = kvm_vcpu_get_hsr(vcpu);
> +	/* We support only IRQ and FIQ, so esr_el2 is not updated. */
> +	return kvm_inject_nested(vcpu, esr_el2, EL2_EXCEPT_ASYNC_OFFSET);
> +}
> diff --git a/arch/arm64/kvm/trace.h b/arch/arm64/kvm/trace.h
> index 7fb0008..7c86cfb 100644
> --- a/arch/arm64/kvm/trace.h
> +++ b/arch/arm64/kvm/trace.h
> @@ -167,6 +167,26 @@
>  );
>  
>  
> +TRACE_EVENT(kvm_inject_nested_exception,
> +	TP_PROTO(struct kvm_vcpu *vcpu, unsigned long esr_el2,
> +		 unsigned long pc),
> +	TP_ARGS(vcpu, esr_el2, pc),
> +
> +	TP_STRUCT__entry(
> +		__field(struct kvm_vcpu *,	vcpu)
> +		__field(unsigned long,		esr_el2)
> +		__field(unsigned long,		pc)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->vcpu = vcpu;
> +		__entry->esr_el2 = esr_el2;
> +		__entry->pc = pc;
> +	),
> +
> +	TP_printk("vcpu: %p, inject exception to vEL2: ESR_EL2 0x%lx, vector: 0x%016lx",
> +		  __entry->vcpu, __entry->esr_el2, __entry->pc)
> +);
>  #endif /* _TRACE_ARM64_KVM_H */
>  
>  #undef TRACE_INCLUDE_PATH

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC 11/55] KVM: arm64: Emulate taking an exception to the guest hypervisor
  2017-06-06 20:21   ` Bandan Das
@ 2017-06-06 20:38     ` Jintack Lim
  2017-06-06 22:07       ` Bandan Das
  0 siblings, 1 reply; 111+ messages in thread
From: Jintack Lim @ 2017-06-06 20:38 UTC (permalink / raw)
  To: Bandan Das
  Cc: KVM General, Catalin Marinas, Will Deacon, kvmarm, Shih-Wei Li,
	lorenzo.pieralisi, linux, arm-mail-list, Marc Zyngier,
	Andre Przywara, kevin.brodsky, wcohen, anna-maria, geoff,
	lkml - Kernel Mailing List, Paolo Bonzini, Jintack Lim

Hi Bandan,

On Tue, Jun 6, 2017 at 4:21 PM, Bandan Das <bsd@redhat.com> wrote:
> Jintack Lim <jintack@cs.columbia.edu> writes:
>
>> Emulate taking an exception to the guest hypervisor running in the
>> virtual EL2 as described in ARM ARM AArch64.TakeException().
>
> ARM newbie here, I keep thinking of ARM ARM as a typo ;)

ARM ARM means ARM Architecture Reference Manual :)

> ...
>> +static inline int kvm_inject_nested_sync(struct kvm_vcpu *vcpu, u64 esr_el2)
>> +{
>> +     kvm_err("Unexpected call to %s for the non-nesting configuration\n",
>> +              __func__);
>> +     return -EINVAL;
>> +}
>> +
>> +static inline int kvm_inject_nested_irq(struct kvm_vcpu *vcpu)
>> +{
>> +     kvm_err("Unexpected call to %s for the non-nesting configuration\n",
>> +              __func__);
>> +     return -EINVAL;
>> +}
>> +
>
> I see these function stubs for aarch32 in the patches. I don't see how they
> can actually be called though. Is this because eventually, there will be
> a virtual el2 mode for aarch32 ?

The current RFC doesn't support nested virtualization on the 32-bit arm
architecture, so those functions will never be called. They are there
only for compilation.

Thanks,
Jintack

>
> Bandan
>
>>  static inline void kvm_arm_setup_shadow_state(struct kvm_vcpu *vcpu) { };
>>  static inline void kvm_arm_restore_shadow_state(struct kvm_vcpu *vcpu) { };
>>  static inline void kvm_arm_init_cpu_context(kvm_cpu_context_t *cpu_ctxt) { };
>> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
>> index 8892c82..0987ee4 100644
>> --- a/arch/arm64/include/asm/kvm_emulate.h
>> +++ b/arch/arm64/include/asm/kvm_emulate.h
>> @@ -42,6 +42,25 @@
>>  void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long addr);
>>  void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr);
>>
>> +#ifdef CONFIG_KVM_ARM_NESTED_HYP
>> +int kvm_inject_nested_sync(struct kvm_vcpu *vcpu, u64 esr_el2);
>> +int kvm_inject_nested_irq(struct kvm_vcpu *vcpu);
>> +#else
>> +static inline int kvm_inject_nested_sync(struct kvm_vcpu *vcpu, u64 esr_el2)
>> +{
>> +     kvm_err("Unexpected call to %s for the non-nesting configuration\n",
>> +              __func__);
>> +     return -EINVAL;
>> +}
>> +
>> +static inline int kvm_inject_nested_irq(struct kvm_vcpu *vcpu)
>> +{
>> +     kvm_err("Unexpected call to %s for the non-nesting configuration\n",
>> +              __func__);
>> +     return -EINVAL;
>> +}
>> +#endif
>> +
>>  void kvm_arm_setup_shadow_state(struct kvm_vcpu *vcpu);
>>  void kvm_arm_restore_shadow_state(struct kvm_vcpu *vcpu);
>>  void kvm_arm_init_cpu_context(kvm_cpu_context_t *cpu_ctxt);
>> diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
>> index 7811d27..b342bdd 100644
>> --- a/arch/arm64/kvm/Makefile
>> +++ b/arch/arm64/kvm/Makefile
>> @@ -34,3 +34,5 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic/vgic-its.o
>>  kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/irqchip.o
>>  kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/arch_timer.o
>>  kvm-$(CONFIG_KVM_ARM_PMU) += $(KVM)/arm/pmu.o
>> +
>> +kvm-$(CONFIG_KVM_ARM_NESTED_HYP) += emulate-nested.o
>> diff --git a/arch/arm64/kvm/emulate-nested.c b/arch/arm64/kvm/emulate-nested.c
>> new file mode 100644
>> index 0000000..59d147f
>> --- /dev/null
>> +++ b/arch/arm64/kvm/emulate-nested.c
>> @@ -0,0 +1,66 @@
>> +/*
>> + * Copyright (C) 2016 - Columbia University
>> + * Author: Jintack Lim <jintack@cs.columbia.edu>
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License version 2 as
>> + * published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
>> + */
>> +
>> +#include <linux/kvm.h>
>> +#include <linux/kvm_host.h>
>> +
>> +#include <asm/kvm_emulate.h>
>> +
>> +#include "trace.h"
>> +
>> +#define      EL2_EXCEPT_SYNC_OFFSET  0x400
>> +#define      EL2_EXCEPT_ASYNC_OFFSET 0x480
>> +
>> +
>> +/*
>> + *  Emulate taking an exception. See ARM ARM J8.1.2 AArch64.TakeException()
>> + */
>> +static int kvm_inject_nested(struct kvm_vcpu *vcpu, u64 esr_el2,
>> +                          int exception_offset)
>> +{
>> +     int ret = 1;
>> +     kvm_cpu_context_t *ctxt = &vcpu->arch.ctxt;
>> +
>> +     /* We don't inject an exception recursively to virtual EL2 */
>> +     if (vcpu_mode_el2(vcpu))
>> +             BUG();
>> +
>> +     ctxt->el2_regs[SPSR_EL2] = *vcpu_cpsr(vcpu);
>> +     ctxt->el2_regs[ELR_EL2] = *vcpu_pc(vcpu);
>> +     ctxt->el2_regs[ESR_EL2] = esr_el2;
>> +
>> +     /* On an exception, PSTATE.SP = 1 */
>> +     *vcpu_cpsr(vcpu) = PSR_MODE_EL2h;
>> +     *vcpu_cpsr(vcpu) |=  (PSR_A_BIT | PSR_F_BIT | PSR_I_BIT | PSR_D_BIT);
>> +     *vcpu_pc(vcpu) = ctxt->el2_regs[VBAR_EL2] + exception_offset;
>> +
>> +     trace_kvm_inject_nested_exception(vcpu, esr_el2, *vcpu_pc(vcpu));
>> +
>> +     return ret;
>> +}
>> +
>> +int kvm_inject_nested_sync(struct kvm_vcpu *vcpu, u64 esr_el2)
>> +{
>> +     return kvm_inject_nested(vcpu, esr_el2, EL2_EXCEPT_SYNC_OFFSET);
>> +}
>> +
>> +int kvm_inject_nested_irq(struct kvm_vcpu *vcpu)
>> +{
>> +     u64 esr_el2 = kvm_vcpu_get_hsr(vcpu);
>> +     /* We support only IRQ and FIQ, so esr_el2 is not updated. */
>> +     return kvm_inject_nested(vcpu, esr_el2, EL2_EXCEPT_ASYNC_OFFSET);
>> +}
>> diff --git a/arch/arm64/kvm/trace.h b/arch/arm64/kvm/trace.h
>> index 7fb0008..7c86cfb 100644
>> --- a/arch/arm64/kvm/trace.h
>> +++ b/arch/arm64/kvm/trace.h
>> @@ -167,6 +167,26 @@
>>  );
>>
>>
>> +TRACE_EVENT(kvm_inject_nested_exception,
>> +     TP_PROTO(struct kvm_vcpu *vcpu, unsigned long esr_el2,
>> +              unsigned long pc),
>> +     TP_ARGS(vcpu, esr_el2, pc),
>> +
>> +     TP_STRUCT__entry(
>> +             __field(struct kvm_vcpu *,      vcpu)
>> +             __field(unsigned long,          esr_el2)
>> +             __field(unsigned long,          pc)
>> +     ),
>> +
>> +     TP_fast_assign(
>> +             __entry->vcpu = vcpu;
>> +             __entry->esr_el2 = esr_el2;
>> +             __entry->pc = pc;
>> +     ),
>> +
>> +     TP_printk("vcpu: %p, inject exception to vEL2: ESR_EL2 0x%lx, vector: 0x%016lx",
>> +               __entry->vcpu, __entry->esr_el2, __entry->pc)
>> +);
>>  #endif /* _TRACE_ARM64_KVM_H */
>>
>>  #undef TRACE_INCLUDE_PATH
> _______________________________________________
> kvmarm mailing list
> kvmarm@lists.cs.columbia.edu
> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC 11/55] KVM: arm64: Emulate taking an exception to the guest hypervisor
  2017-06-06 20:38     ` Jintack Lim
@ 2017-06-06 22:07       ` Bandan Das
  2017-06-06 23:16         ` Jintack Lim
  0 siblings, 1 reply; 111+ messages in thread
From: Bandan Das @ 2017-06-06 22:07 UTC (permalink / raw)
  To: Jintack Lim
  Cc: KVM General, Catalin Marinas, Will Deacon, kvmarm, Shih-Wei Li,
	lorenzo.pieralisi, linux, arm-mail-list, Marc Zyngier,
	Andre Przywara, kevin.brodsky, wcohen, anna-maria, geoff,
	lkml - Kernel Mailing List, Paolo Bonzini, Jintack Lim

Hi Jintack,

Jintack Lim <jintack.lim@linaro.org> writes:

> Hi Bandan,
>
> On Tue, Jun 6, 2017 at 4:21 PM, Bandan Das <bsd@redhat.com> wrote:
>> Jintack Lim <jintack@cs.columbia.edu> writes:
>>
>>> Emulate taking an exception to the guest hypervisor running in the
>>> virtual EL2 as described in ARM ARM AArch64.TakeException().
>>
>> ARM newbie here, I keep thinking of ARM ARM as a typo ;)
>
> ARM ARM means ARM Architecture Reference Manual :)
>
>> ...
>>> +static inline int kvm_inject_nested_sync(struct kvm_vcpu *vcpu, u64 esr_el2)
>>> +{
>>> +     kvm_err("Unexpected call to %s for the non-nesting configuration\n",
>>> +              __func__);
>>> +     return -EINVAL;
>>> +}
>>> +
>>> +static inline int kvm_inject_nested_irq(struct kvm_vcpu *vcpu)
>>> +{
>>> +     kvm_err("Unexpected call to %s for the non-nesting configuration\n",
>>> +              __func__);
>>> +     return -EINVAL;
>>> +}
>>> +
>>
>> I see these function stubs for aarch32 in the patches. I don't see how they
>> can actually be called though. Is this because eventually, there will be
>> a virtual el2 mode for aarch32 ?
>
> The current RFC doesn't support nested virtualization on the 32-bit arm
> architecture, so those functions will never be called. They are there
> only for compilation.

Do you mean that compilation will fail? It seems these functions are
defined separately in 32-bit and 64-bit specific header files. Or is it that
the 64-bit compilation also depends on the 32-bit header file?

Bandan

> Thanks,
> Jintack
>
>>
>> Bandan
>>
>>>  static inline void kvm_arm_setup_shadow_state(struct kvm_vcpu *vcpu) { };
>>>  static inline void kvm_arm_restore_shadow_state(struct kvm_vcpu *vcpu) { };
>>>  static inline void kvm_arm_init_cpu_context(kvm_cpu_context_t *cpu_ctxt) { };
>>> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
>>> index 8892c82..0987ee4 100644
>>> --- a/arch/arm64/include/asm/kvm_emulate.h
>>> +++ b/arch/arm64/include/asm/kvm_emulate.h
>>> @@ -42,6 +42,25 @@
>>>  void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long addr);
>>>  void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr);
>>>
>>> +#ifdef CONFIG_KVM_ARM_NESTED_HYP
>>> +int kvm_inject_nested_sync(struct kvm_vcpu *vcpu, u64 esr_el2);
>>> +int kvm_inject_nested_irq(struct kvm_vcpu *vcpu);
>>> +#else
>>> +static inline int kvm_inject_nested_sync(struct kvm_vcpu *vcpu, u64 esr_el2)
>>> +{
>>> +     kvm_err("Unexpected call to %s for the non-nesting configuration\n",
>>> +              __func__);
>>> +     return -EINVAL;
>>> +}
>>> +
>>> +static inline int kvm_inject_nested_irq(struct kvm_vcpu *vcpu)
>>> +{
>>> +     kvm_err("Unexpected call to %s for the non-nesting configuration\n",
>>> +              __func__);
>>> +     return -EINVAL;
>>> +}
>>> +#endif
>>> +
>>>  void kvm_arm_setup_shadow_state(struct kvm_vcpu *vcpu);
>>>  void kvm_arm_restore_shadow_state(struct kvm_vcpu *vcpu);
>>>  void kvm_arm_init_cpu_context(kvm_cpu_context_t *cpu_ctxt);
>>> diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
>>> index 7811d27..b342bdd 100644
>>> --- a/arch/arm64/kvm/Makefile
>>> +++ b/arch/arm64/kvm/Makefile
>>> @@ -34,3 +34,5 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic/vgic-its.o
>>>  kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/irqchip.o
>>>  kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/arch_timer.o
>>>  kvm-$(CONFIG_KVM_ARM_PMU) += $(KVM)/arm/pmu.o
>>> +
>>> +kvm-$(CONFIG_KVM_ARM_NESTED_HYP) += emulate-nested.o
>>> diff --git a/arch/arm64/kvm/emulate-nested.c b/arch/arm64/kvm/emulate-nested.c
>>> new file mode 100644
>>> index 0000000..59d147f
>>> --- /dev/null
>>> +++ b/arch/arm64/kvm/emulate-nested.c
>>> @@ -0,0 +1,66 @@
>>> +/*
>>> + * Copyright (C) 2016 - Columbia University
>>> + * Author: Jintack Lim <jintack@cs.columbia.edu>
>>> + *
>>> + * This program is free software; you can redistribute it and/or modify
>>> + * it under the terms of the GNU General Public License version 2 as
>>> + * published by the Free Software Foundation.
>>> + *
>>> + * This program is distributed in the hope that it will be useful,
>>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>>> + * GNU General Public License for more details.
>>> + *
>>> + * You should have received a copy of the GNU General Public License
>>> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
>>> + */
>>> +
>>> +#include <linux/kvm.h>
>>> +#include <linux/kvm_host.h>
>>> +
>>> +#include <asm/kvm_emulate.h>
>>> +
>>> +#include "trace.h"
>>> +
>>> +#define      EL2_EXCEPT_SYNC_OFFSET  0x400
>>> +#define      EL2_EXCEPT_ASYNC_OFFSET 0x480
>>> +
>>> +
>>> +/*
>>> + *  Emulate taking an exception. See ARM ARM J8.1.2 AArch64.TakeException()
>>> + */
>>> +static int kvm_inject_nested(struct kvm_vcpu *vcpu, u64 esr_el2,
>>> +                          int exception_offset)
>>> +{
>>> +     int ret = 1;
>>> +     kvm_cpu_context_t *ctxt = &vcpu->arch.ctxt;
>>> +
>>> +     /* We don't inject an exception recursively to virtual EL2 */
>>> +     if (vcpu_mode_el2(vcpu))
>>> +             BUG();
>>> +
>>> +     ctxt->el2_regs[SPSR_EL2] = *vcpu_cpsr(vcpu);
>>> +     ctxt->el2_regs[ELR_EL2] = *vcpu_pc(vcpu);
>>> +     ctxt->el2_regs[ESR_EL2] = esr_el2;
>>> +
>>> +     /* On an exception, PSTATE.SP = 1 */
>>> +     *vcpu_cpsr(vcpu) = PSR_MODE_EL2h;
>>> +     *vcpu_cpsr(vcpu) |=  (PSR_A_BIT | PSR_F_BIT | PSR_I_BIT | PSR_D_BIT);
>>> +     *vcpu_pc(vcpu) = ctxt->el2_regs[VBAR_EL2] + exception_offset;
>>> +
>>> +     trace_kvm_inject_nested_exception(vcpu, esr_el2, *vcpu_pc(vcpu));
>>> +
>>> +     return ret;
>>> +}
>>> +
>>> +int kvm_inject_nested_sync(struct kvm_vcpu *vcpu, u64 esr_el2)
>>> +{
>>> +     return kvm_inject_nested(vcpu, esr_el2, EL2_EXCEPT_SYNC_OFFSET);
>>> +}
>>> +
>>> +int kvm_inject_nested_irq(struct kvm_vcpu *vcpu)
>>> +{
>>> +     u64 esr_el2 = kvm_vcpu_get_hsr(vcpu);
>>> +     /* We support only IRQ and FIQ, so esr_el2 is not updated. */
>>> +     return kvm_inject_nested(vcpu, esr_el2, EL2_EXCEPT_ASYNC_OFFSET);
>>> +}
>>> diff --git a/arch/arm64/kvm/trace.h b/arch/arm64/kvm/trace.h
>>> index 7fb0008..7c86cfb 100644
>>> --- a/arch/arm64/kvm/trace.h
>>> +++ b/arch/arm64/kvm/trace.h
>>> @@ -167,6 +167,26 @@
>>>  );
>>>
>>>
>>> +TRACE_EVENT(kvm_inject_nested_exception,
>>> +     TP_PROTO(struct kvm_vcpu *vcpu, unsigned long esr_el2,
>>> +              unsigned long pc),
>>> +     TP_ARGS(vcpu, esr_el2, pc),
>>> +
>>> +     TP_STRUCT__entry(
>>> +             __field(struct kvm_vcpu *,      vcpu)
>>> +             __field(unsigned long,          esr_el2)
>>> +             __field(unsigned long,          pc)
>>> +     ),
>>> +
>>> +     TP_fast_assign(
>>> +             __entry->vcpu = vcpu;
>>> +             __entry->esr_el2 = esr_el2;
>>> +             __entry->pc = pc;
>>> +     ),
>>> +
>>> +     TP_printk("vcpu: %p, inject exception to vEL2: ESR_EL2 0x%lx, vector: 0x%016lx",
>>> +               __entry->vcpu, __entry->esr_el2, __entry->pc)
>>> +);
>>>  #endif /* _TRACE_ARM64_KVM_H */
>>>
>>>  #undef TRACE_INCLUDE_PATH

* Re: [RFC 11/55] KVM: arm64: Emulate taking an exception to the guest hypervisor
  2017-06-06 22:07       ` Bandan Das
@ 2017-06-06 23:16         ` Jintack Lim
  2017-06-07 17:21           ` Bandan Das
  0 siblings, 1 reply; 111+ messages in thread
From: Jintack Lim @ 2017-06-06 23:16 UTC (permalink / raw)
  To: Bandan Das
  Cc: KVM General, Catalin Marinas, Will Deacon, kvmarm, Shih-Wei Li,
	lorenzo.pieralisi, linux, arm-mail-list, Marc Zyngier,
	Andre Przywara, kevin.brodsky, wcohen, anna-maria, geoff,
	lkml - Kernel Mailing List, Paolo Bonzini

On Tue, Jun 6, 2017 at 6:07 PM, Bandan Das <bsd@redhat.com> wrote:
> Hi Jintack,
>
> Jintack Lim <jintack.lim@linaro.org> writes:
>
>> Hi Bandan,
>>
>> On Tue, Jun 6, 2017 at 4:21 PM, Bandan Das <bsd@redhat.com> wrote:
>>> Jintack Lim <jintack@cs.columbia.edu> writes:
>>>
>>>> Emulate taking an exception to the guest hypervisor running in the
>>>> virtual EL2 as described in ARM ARM AArch64.TakeException().
>>>
>>> ARM newbie here, I keep thinking of ARM ARM as a typo ;)
>>
>> ARM ARM means ARM Architecture Reference Manual :)
>>
>>> ...
>>>> +static inline int kvm_inject_nested_sync(struct kvm_vcpu *vcpu, u64 esr_el2)
>>>> +{
>>>> +     kvm_err("Unexpected call to %s for the non-nesting configuration\n",
>>>> +              __func__);
>>>> +     return -EINVAL;
>>>> +}
>>>> +
>>>> +static inline int kvm_inject_nested_irq(struct kvm_vcpu *vcpu)
>>>> +{
>>>> +     kvm_err("Unexpected call to %s for the non-nesting configuration\n",
>>>> +              __func__);
>>>> +     return -EINVAL;
>>>> +}
>>>> +
>>>
>>> I see these function stubs for aarch32 in the patches. I don't see how they
>>> can actually be called, though. Is this because eventually there will be
>>> a virtual el2 mode for aarch32?
>>
>> The current RFC doesn't support nested virtualization on the 32-bit arm
>> architecture, and those functions will never be called. Those functions
>> are there only so that the shared code compiles.
>
> Do you mean that compilation will fail?

Compilation on the 32-bit arm architecture will fail without them.

> It seems these functions are
> defined separately in 32-bit and 64-bit specific header files. Or is it that
> the 64-bit compilation also depends on the 32-bit header file?

It's only for the 32-bit architecture. For example, kvm_inject_nested_irq()
is called in virt/kvm/arm/vgic/vgic.c, which is shared between the 32-bit
and 64-bit builds.
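
To make the dependency concrete, the shape is roughly this (an
illustrative sketch, not the exact upstream call site):

	/* virt/kvm/arm/vgic/vgic.c -- compiled for both 32-bit and 64-bit */
	static void vgic_deliver_to_vel2(struct kvm_vcpu *vcpu)
	{
		/*
		 * Only reachable with nested virt on arm64. On 32-bit arm
		 * this call resolves to the inline stub returning -EINVAL,
		 * but the symbol must exist for the shared file to build.
		 */
		kvm_inject_nested_irq(vcpu);
	}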

>
> Bandan
>
>> Thanks,
>> Jintack
>>
>>>
>>> Bandan
>>>
>>>>  static inline void kvm_arm_setup_shadow_state(struct kvm_vcpu *vcpu) { };
>>>>  static inline void kvm_arm_restore_shadow_state(struct kvm_vcpu *vcpu) { };
>>>>  static inline void kvm_arm_init_cpu_context(kvm_cpu_context_t *cpu_ctxt) { };
>>>> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
>>>> index 8892c82..0987ee4 100644
>>>> --- a/arch/arm64/include/asm/kvm_emulate.h
>>>> +++ b/arch/arm64/include/asm/kvm_emulate.h
>>>> @@ -42,6 +42,25 @@
>>>>  void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long addr);
>>>>  void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr);
>>>>
>>>> +#ifdef CONFIG_KVM_ARM_NESTED_HYP
>>>> +int kvm_inject_nested_sync(struct kvm_vcpu *vcpu, u64 esr_el2);
>>>> +int kvm_inject_nested_irq(struct kvm_vcpu *vcpu);
>>>> +#else
>>>> +static inline int kvm_inject_nested_sync(struct kvm_vcpu *vcpu, u64 esr_el2)
>>>> +{
>>>> +     kvm_err("Unexpected call to %s for the non-nesting configuration\n",
>>>> +              __func__);
>>>> +     return -EINVAL;
>>>> +}
>>>> +
>>>> +static inline int kvm_inject_nested_irq(struct kvm_vcpu *vcpu)
>>>> +{
>>>> +     kvm_err("Unexpected call to %s for the non-nesting configuration\n",
>>>> +              __func__);
>>>> +     return -EINVAL;
>>>> +}
>>>> +#endif
>>>> +
>>>>  void kvm_arm_setup_shadow_state(struct kvm_vcpu *vcpu);
>>>>  void kvm_arm_restore_shadow_state(struct kvm_vcpu *vcpu);
>>>>  void kvm_arm_init_cpu_context(kvm_cpu_context_t *cpu_ctxt);
>>>> diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
>>>> index 7811d27..b342bdd 100644
>>>> --- a/arch/arm64/kvm/Makefile
>>>> +++ b/arch/arm64/kvm/Makefile
>>>> @@ -34,3 +34,5 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic/vgic-its.o
>>>>  kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/irqchip.o
>>>>  kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/arch_timer.o
>>>>  kvm-$(CONFIG_KVM_ARM_PMU) += $(KVM)/arm/pmu.o
>>>> +
>>>> +kvm-$(CONFIG_KVM_ARM_NESTED_HYP) += emulate-nested.o
>>>> diff --git a/arch/arm64/kvm/emulate-nested.c b/arch/arm64/kvm/emulate-nested.c
>>>> new file mode 100644
>>>> index 0000000..59d147f
>>>> --- /dev/null
>>>> +++ b/arch/arm64/kvm/emulate-nested.c
>>>> @@ -0,0 +1,66 @@
>>>> +/*
>>>> + * Copyright (C) 2016 - Columbia University
>>>> + * Author: Jintack Lim <jintack@cs.columbia.edu>
>>>> + *
>>>> + * This program is free software; you can redistribute it and/or modify
>>>> + * it under the terms of the GNU General Public License version 2 as
>>>> + * published by the Free Software Foundation.
>>>> + *
>>>> + * This program is distributed in the hope that it will be useful,
>>>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>>>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>>>> + * GNU General Public License for more details.
>>>> + *
>>>> + * You should have received a copy of the GNU General Public License
>>>> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
>>>> + */
>>>> +
>>>> +#include <linux/kvm.h>
>>>> +#include <linux/kvm_host.h>
>>>> +
>>>> +#include <asm/kvm_emulate.h>
>>>> +
>>>> +#include "trace.h"
>>>> +
>>>> +#define      EL2_EXCEPT_SYNC_OFFSET  0x400
>>>> +#define      EL2_EXCEPT_ASYNC_OFFSET 0x480
>>>> +
>>>> +
>>>> +/*
>>>> + *  Emulate taking an exception. See ARM ARM J8.1.2 AArch64.TakeException()
>>>> + */
>>>> +static int kvm_inject_nested(struct kvm_vcpu *vcpu, u64 esr_el2,
>>>> +                          int exception_offset)
>>>> +{
>>>> +     int ret = 1;
>>>> +     kvm_cpu_context_t *ctxt = &vcpu->arch.ctxt;
>>>> +
>>>> +     /* We don't inject an exception recursively to virtual EL2 */
>>>> +     if (vcpu_mode_el2(vcpu))
>>>> +             BUG();
>>>> +
>>>> +     ctxt->el2_regs[SPSR_EL2] = *vcpu_cpsr(vcpu);
>>>> +     ctxt->el2_regs[ELR_EL2] = *vcpu_pc(vcpu);
>>>> +     ctxt->el2_regs[ESR_EL2] = esr_el2;
>>>> +
>>>> +     /* On an exception, PSTATE.SP = 1 */
>>>> +     *vcpu_cpsr(vcpu) = PSR_MODE_EL2h;
>>>> +     *vcpu_cpsr(vcpu) |=  (PSR_A_BIT | PSR_F_BIT | PSR_I_BIT | PSR_D_BIT);
>>>> +     *vcpu_pc(vcpu) = ctxt->el2_regs[VBAR_EL2] + exception_offset;
>>>> +
>>>> +     trace_kvm_inject_nested_exception(vcpu, esr_el2, *vcpu_pc(vcpu));
>>>> +
>>>> +     return ret;
>>>> +}
>>>> +
>>>> +int kvm_inject_nested_sync(struct kvm_vcpu *vcpu, u64 esr_el2)
>>>> +{
>>>> +     return kvm_inject_nested(vcpu, esr_el2, EL2_EXCEPT_SYNC_OFFSET);
>>>> +}
>>>> +
>>>> +int kvm_inject_nested_irq(struct kvm_vcpu *vcpu)
>>>> +{
>>>> +     u64 esr_el2 = kvm_vcpu_get_hsr(vcpu);
>>>> +     /* We support only IRQ and FIQ, so esr_el2 is not updated. */
>>>> +     return kvm_inject_nested(vcpu, esr_el2, EL2_EXCEPT_ASYNC_OFFSET);
>>>> +}
>>>> diff --git a/arch/arm64/kvm/trace.h b/arch/arm64/kvm/trace.h
>>>> index 7fb0008..7c86cfb 100644
>>>> --- a/arch/arm64/kvm/trace.h
>>>> +++ b/arch/arm64/kvm/trace.h
>>>> @@ -167,6 +167,26 @@
>>>>  );
>>>>
>>>>
>>>> +TRACE_EVENT(kvm_inject_nested_exception,
>>>> +     TP_PROTO(struct kvm_vcpu *vcpu, unsigned long esr_el2,
>>>> +              unsigned long pc),
>>>> +     TP_ARGS(vcpu, esr_el2, pc),
>>>> +
>>>> +     TP_STRUCT__entry(
>>>> +             __field(struct kvm_vcpu *,      vcpu)
>>>> +             __field(unsigned long,          esr_el2)
>>>> +             __field(unsigned long,          pc)
>>>> +     ),
>>>> +
>>>> +     TP_fast_assign(
>>>> +             __entry->vcpu = vcpu;
>>>> +             __entry->esr_el2 = esr_el2;
>>>> +             __entry->pc = pc;
>>>> +     ),
>>>> +
>>>> +     TP_printk("vcpu: %p, inject exception to vEL2: ESR_EL2 0x%lx, vector: 0x%016lx",
>>>> +               __entry->vcpu, __entry->esr_el2, __entry->pc)
>>>> +);
>>>>  #endif /* _TRACE_ARM64_KVM_H */
>>>>
>>>>  #undef TRACE_INCLUDE_PATH

* Re: [RFC 10/55] KVM: arm64: Synchronize EL1 system registers on virtual EL2 entry and exit
  2017-06-06 20:16   ` Bandan Das
@ 2017-06-07  4:26     ` Jintack Lim
  0 siblings, 0 replies; 111+ messages in thread
From: Jintack Lim @ 2017-06-07  4:26 UTC (permalink / raw)
  To: Bandan Das
  Cc: KVM General, Catalin Marinas, Will Deacon, kvmarm, Shih-Wei Li,
	lorenzo.pieralisi, linux, arm-mail-list, Marc Zyngier,
	Andre Przywara, kevin.brodsky, wcohen, anna-maria, geoff,
	lkml - Kernel Mailing List, Paolo Bonzini, Jintack Lim

Hi Bandan,

On Tue, Jun 6, 2017 at 4:16 PM, Bandan Das <bsd@redhat.com> wrote:
> Jintack Lim <jintack@cs.columbia.edu> writes:
>
>> From: Christoffer Dall <christoffer.dall@linaro.org>
>>
>> When running in virtual EL2 we use the shadow EL1 system register array
>> for the save/restore process, so that hardware and especially the memory
>> subsystem behaves as code written for EL2 expects while really running
>> in EL1.
>>
>> This works great for EL1 system register accesses that we trap, because
>> these accesses will be written into the virtual state for the EL1 system
>> registers used when eventually switching the VCPU mode to EL1.
>>
>> However, there was a collection of EL1 system registers which we do not
>> trap, and as a consequence all save/restore operations of these
>> registers were happening locally in the shadow array, with no benefit to
>> software actually running in virtual EL1 at all.
>>
>> To fix this, simply synchronize the shadow and real EL1 state for these
>> registers on entry/exit to/from virtual EL2 state.
>>
>> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
>> Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
>> ---
>>  arch/arm64/kvm/context.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 47 insertions(+)
>>
>> diff --git a/arch/arm64/kvm/context.c b/arch/arm64/kvm/context.c
>> index 2e9e386..0025dd9 100644
>> --- a/arch/arm64/kvm/context.c
>> +++ b/arch/arm64/kvm/context.c
>> @@ -88,6 +88,51 @@ static void create_shadow_el1_sysregs(struct kvm_vcpu *vcpu)
>>       s_sys_regs[CPACR_EL1] = cptr_el2_to_cpacr_el1(el2_regs[CPTR_EL2]);
>>  }
>>
>> +/*
>> + * List of EL1 registers which we allow the virtual EL2 mode to access
>> + * directly without trapping and which haven't been paravirtualized.
>> + *
>> + * Probably CNTKCTL_EL1 should not be copied but accessed via trap, because
>> + * the guest hypervisor running in EL1 can be affected by event streams
>> + * configured via CNTKCTL_EL1, which it does not expect. Since we don't have
>> + * a mechanism to trap on CNTKCTL_EL1 as of now (v8.3), keep it in here instead.
>> + */
>> +static const int el1_non_trap_regs[] = {
>> +     CNTKCTL_EL1,
>> +     CSSELR_EL1,
>> +     PAR_EL1,
>> +     TPIDR_EL0,
>> +     TPIDR_EL1,
>> +     TPIDRRO_EL0
>> +};
>> +
>
> Do we trap on all register accesses in the non-nested case +
> all accesses to the memory access registers? I am trying to
> understand how we decide which registers to trap on. For example,
> shouldn't accesses to CSSELR_EL1, the cache size selection register,
> be trapped?

So the principle is that we trap an EL1 register access from the
virtual EL2 if letting it through untrapped would make the guest
hypervisor's execution differ from what it would be if it really
ran in EL2. (e.g. an untrapped write to TTBR0_EL1 from the guest
hypervisor would change the guest hypervisor's own page table
base, whereas for a hypervisor really running in EL2 that operation
only affects the software running in EL1, not the hypervisor itself.)
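
As a sketch of the principle (illustrative only -- this is not the
actual handler in the series, though vcpu_mode_el2() and vcpu_sys_reg()
are real helpers from it):

	static bool access_ttbr0_el1(struct kvm_vcpu *vcpu, u64 val)
	{
		/*
		 * The trapped write lands in the virtual EL1 state
		 * (sys_regs[]), not in the shadow registers
		 * (shadow_sys_regs[]) that hardware EL1 is running the
		 * guest hypervisor on, so the guest hypervisor's own
		 * translation regime is unaffected -- just as it would
		 * be if it really ran in EL2.
		 */
		vcpu_sys_reg(vcpu, TTBR0_EL1) = val;
		return true;
	}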

For the non-nested case, this patch does not introduce any additional
traps since there's no virtual EL2 state.

In the CSSELR_EL1 case, the register can be used at any exception level
other than EL0, and its behavior is the same whether it is executed in
EL1 or EL2. In other words, the guest hypervisor can interact with
this register in EL1 just the way a non-nesting hypervisor would in
EL2.

Thanks,
Jintack

>
> Bandan
>
>
>> +/**
>> + * sync_shadow_el1_state - Going to/from the virtual EL2 state, sync state
>> + * @vcpu:    The VCPU pointer
>> + * @setup:   True, if on the way to the guest (called from setup)
>> + *           False, if returning from the guest (called from restore)
>> + *
>> + * Some EL1 registers are accessed directly by the virtual EL2 mode because
>> + * they in no way affect execution state in virtual EL2.   However, we must
>> + * still ensure that virtual EL2 observes the same state of the EL1 registers
>> + * as the normal VM's EL1 mode, so copy this state as needed on setup/restore.
>> + */
>> +static void sync_shadow_el1_state(struct kvm_vcpu *vcpu, bool setup)
>> +{
>> +     u64 *sys_regs = vcpu->arch.ctxt.sys_regs;
>> +     u64 *s_sys_regs = vcpu->arch.ctxt.shadow_sys_regs;
>> +     int i;
>> +
>> +     for (i = 0; i < ARRAY_SIZE(el1_non_trap_regs); i++) {
>> +             const int sr = el1_non_trap_regs[i];
>> +
>> +             if (setup)
>> +                     s_sys_regs[sr] = sys_regs[sr];
>> +             else
>> +                     sys_regs[sr] = s_sys_regs[sr];
>> +     }
>> +}
>> +
>>  /**
>>   * kvm_arm_setup_shadow_state -- prepare shadow state based on emulated mode
>>   * @vcpu: The VCPU pointer
>> @@ -107,6 +152,7 @@ void kvm_arm_setup_shadow_state(struct kvm_vcpu *vcpu)
>>               else
>>                       ctxt->hw_pstate |= PSR_MODE_EL1t;
>>
>> +             sync_shadow_el1_state(vcpu, true);
>>               create_shadow_el1_sysregs(vcpu);
>>               ctxt->hw_sys_regs = ctxt->shadow_sys_regs;
>>               ctxt->hw_sp_el1 = ctxt->el2_regs[SP_EL2];
>> @@ -125,6 +171,7 @@ void kvm_arm_restore_shadow_state(struct kvm_vcpu *vcpu)
>>  {
>>       struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
>>       if (unlikely(vcpu_mode_el2(vcpu))) {
>> +             sync_shadow_el1_state(vcpu, false);
>>               *vcpu_cpsr(vcpu) &= PSR_MODE_MASK;
>>               *vcpu_cpsr(vcpu) |= ctxt->hw_pstate & ~PSR_MODE_MASK;
>>               ctxt->el2_regs[SP_EL2] = ctxt->hw_sp_el1;

* Re: [RFC 11/55] KVM: arm64: Emulate taking an exception to the guest hypervisor
  2017-06-06 23:16         ` Jintack Lim
@ 2017-06-07 17:21           ` Bandan Das
  0 siblings, 0 replies; 111+ messages in thread
From: Bandan Das @ 2017-06-07 17:21 UTC (permalink / raw)
  To: Jintack Lim
  Cc: KVM General, Catalin Marinas, Will Deacon, kvmarm, Shih-Wei Li,
	lorenzo.pieralisi, linux, arm-mail-list, Marc Zyngier,
	Andre Przywara, kevin.brodsky, wcohen, anna-maria, geoff,
	lkml - Kernel Mailing List, Paolo Bonzini

Jintack Lim <jintack.lim@linaro.org> writes:

> Compilation on the 32-bit arm architecture will fail without them.
...
>> It seems these functions are
>> defined separately in 32-bit and 64-bit specific header files. Or is it that
>> the 64-bit compilation also depends on the 32-bit header file?
>
> It's only for the 32-bit architecture. For example, kvm_inject_nested_irq()
> is called in virt/kvm/arm/vgic/vgic.c, which is shared between the 32-bit
> and 64-bit builds.

Ah, that's the catch! Thanks for clearing this up!

>>
>> Bandan
>>
>>> Thanks,
>>> Jintack
>>>
>>>>
>>>> Bandan
>>>>
>>>>>  static inline void kvm_arm_setup_shadow_state(struct kvm_vcpu *vcpu) { };
>>>>>  static inline void kvm_arm_restore_shadow_state(struct kvm_vcpu *vcpu) { };
>>>>>  static inline void kvm_arm_init_cpu_context(kvm_cpu_context_t *cpu_ctxt) { };
>>>>> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
>>>>> index 8892c82..0987ee4 100644
>>>>> --- a/arch/arm64/include/asm/kvm_emulate.h
>>>>> +++ b/arch/arm64/include/asm/kvm_emulate.h
>>>>> @@ -42,6 +42,25 @@
>>>>>  void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long addr);
>>>>>  void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr);
>>>>>
>>>>> +#ifdef CONFIG_KVM_ARM_NESTED_HYP
>>>>> +int kvm_inject_nested_sync(struct kvm_vcpu *vcpu, u64 esr_el2);
>>>>> +int kvm_inject_nested_irq(struct kvm_vcpu *vcpu);
>>>>> +#else
>>>>> +static inline int kvm_inject_nested_sync(struct kvm_vcpu *vcpu, u64 esr_el2)
>>>>> +{
>>>>> +     kvm_err("Unexpected call to %s for the non-nesting configuration\n",
>>>>> +              __func__);
>>>>> +     return -EINVAL;
>>>>> +}
>>>>> +
>>>>> +static inline int kvm_inject_nested_irq(struct kvm_vcpu *vcpu)
>>>>> +{
>>>>> +     kvm_err("Unexpected call to %s for the non-nesting configuration\n",
>>>>> +              __func__);
>>>>> +     return -EINVAL;
>>>>> +}
>>>>> +#endif
>>>>> +
>>>>>  void kvm_arm_setup_shadow_state(struct kvm_vcpu *vcpu);
>>>>>  void kvm_arm_restore_shadow_state(struct kvm_vcpu *vcpu);
>>>>>  void kvm_arm_init_cpu_context(kvm_cpu_context_t *cpu_ctxt);
>>>>> diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
>>>>> index 7811d27..b342bdd 100644
>>>>> --- a/arch/arm64/kvm/Makefile
>>>>> +++ b/arch/arm64/kvm/Makefile
>>>>> @@ -34,3 +34,5 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic/vgic-its.o
>>>>>  kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/irqchip.o
>>>>>  kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/arch_timer.o
>>>>>  kvm-$(CONFIG_KVM_ARM_PMU) += $(KVM)/arm/pmu.o
>>>>> +
>>>>> +kvm-$(CONFIG_KVM_ARM_NESTED_HYP) += emulate-nested.o
>>>>> diff --git a/arch/arm64/kvm/emulate-nested.c b/arch/arm64/kvm/emulate-nested.c
>>>>> new file mode 100644
>>>>> index 0000000..59d147f
>>>>> --- /dev/null
>>>>> +++ b/arch/arm64/kvm/emulate-nested.c
>>>>> @@ -0,0 +1,66 @@
>>>>> +/*
>>>>> + * Copyright (C) 2016 - Columbia University
>>>>> + * Author: Jintack Lim <jintack@cs.columbia.edu>
>>>>> + *
>>>>> + * This program is free software; you can redistribute it and/or modify
>>>>> + * it under the terms of the GNU General Public License version 2 as
>>>>> + * published by the Free Software Foundation.
>>>>> + *
>>>>> + * This program is distributed in the hope that it will be useful,
>>>>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>>>>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>>>>> + * GNU General Public License for more details.
>>>>> + *
>>>>> + * You should have received a copy of the GNU General Public License
>>>>> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
>>>>> + */
>>>>> +
>>>>> +#include <linux/kvm.h>
>>>>> +#include <linux/kvm_host.h>
>>>>> +
>>>>> +#include <asm/kvm_emulate.h>
>>>>> +
>>>>> +#include "trace.h"
>>>>> +
>>>>> +#define      EL2_EXCEPT_SYNC_OFFSET  0x400
>>>>> +#define      EL2_EXCEPT_ASYNC_OFFSET 0x480
>>>>> +
>>>>> +
>>>>> +/*
>>>>> + *  Emulate taking an exception. See ARM ARM J8.1.2 AArch64.TakeException()
>>>>> + */
>>>>> +static int kvm_inject_nested(struct kvm_vcpu *vcpu, u64 esr_el2,
>>>>> +                          int exception_offset)
>>>>> +{
>>>>> +     int ret = 1;
>>>>> +     kvm_cpu_context_t *ctxt = &vcpu->arch.ctxt;
>>>>> +
>>>>> +     /* We don't inject an exception recursively to virtual EL2 */
>>>>> +     if (vcpu_mode_el2(vcpu))
>>>>> +             BUG();
>>>>> +
>>>>> +     ctxt->el2_regs[SPSR_EL2] = *vcpu_cpsr(vcpu);
>>>>> +     ctxt->el2_regs[ELR_EL2] = *vcpu_pc(vcpu);
>>>>> +     ctxt->el2_regs[ESR_EL2] = esr_el2;
>>>>> +
>>>>> +     /* On an exception, PSTATE.SP = 1 */
>>>>> +     *vcpu_cpsr(vcpu) = PSR_MODE_EL2h;
>>>>> +     *vcpu_cpsr(vcpu) |=  (PSR_A_BIT | PSR_F_BIT | PSR_I_BIT | PSR_D_BIT);
>>>>> +     *vcpu_pc(vcpu) = ctxt->el2_regs[VBAR_EL2] + exception_offset;
>>>>> +
>>>>> +     trace_kvm_inject_nested_exception(vcpu, esr_el2, *vcpu_pc(vcpu));
>>>>> +
>>>>> +     return ret;
>>>>> +}
>>>>> +
>>>>> +int kvm_inject_nested_sync(struct kvm_vcpu *vcpu, u64 esr_el2)
>>>>> +{
>>>>> +     return kvm_inject_nested(vcpu, esr_el2, EL2_EXCEPT_SYNC_OFFSET);
>>>>> +}
>>>>> +
>>>>> +int kvm_inject_nested_irq(struct kvm_vcpu *vcpu)
>>>>> +{
>>>>> +     u64 esr_el2 = kvm_vcpu_get_hsr(vcpu);
>>>>> +     /* We support only IRQ and FIQ, so esr_el2 is not updated. */
>>>>> +     return kvm_inject_nested(vcpu, esr_el2, EL2_EXCEPT_ASYNC_OFFSET);
>>>>> +}
>>>>> diff --git a/arch/arm64/kvm/trace.h b/arch/arm64/kvm/trace.h
>>>>> index 7fb0008..7c86cfb 100644
>>>>> --- a/arch/arm64/kvm/trace.h
>>>>> +++ b/arch/arm64/kvm/trace.h
>>>>> @@ -167,6 +167,26 @@
>>>>>  );
>>>>>
>>>>>
>>>>> +TRACE_EVENT(kvm_inject_nested_exception,
>>>>> +     TP_PROTO(struct kvm_vcpu *vcpu, unsigned long esr_el2,
>>>>> +              unsigned long pc),
>>>>> +     TP_ARGS(vcpu, esr_el2, pc),
>>>>> +
>>>>> +     TP_STRUCT__entry(
>>>>> +             __field(struct kvm_vcpu *,      vcpu)
>>>>> +             __field(unsigned long,          esr_el2)
>>>>> +             __field(unsigned long,          pc)
>>>>> +     ),
>>>>> +
>>>>> +     TP_fast_assign(
>>>>> +             __entry->vcpu = vcpu;
>>>>> +             __entry->esr_el2 = esr_el2;
>>>>> +             __entry->pc = pc;
>>>>> +     ),
>>>>> +
>>>>> +     TP_printk("vcpu: %p, inject exception to vEL2: ESR_EL2 0x%lx, vector: 0x%016lx",
>>>>> +               __entry->vcpu, __entry->esr_el2, __entry->pc)
>>>>> +);
>>>>>  #endif /* _TRACE_ARM64_KVM_H */
>>>>>
>>>>>  #undef TRACE_INCLUDE_PATH

* Re: [RFC 06/55] KVM: arm64: Add EL2 execution context for nesting
  2017-02-22 11:10   ` Christoffer Dall
@ 2017-06-26 14:33     ` Jintack Lim
  2017-07-03  9:03       ` Christoffer Dall
  0 siblings, 1 reply; 111+ messages in thread
From: Jintack Lim @ 2017-06-26 14:33 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: Christoffer Dall, Marc Zyngier, Paolo Bonzini,
	Radim Krčmář,
	linux, Catalin Marinas, Will Deacon, vladimir.murzin,
	Suzuki K Poulose, mark.rutland, james.morse, lorenzo.pieralisi,
	kevin.brodsky, wcohen, shankerd, geoff, Andre Przywara,
	Eric Auger, anna-maria, Shih-Wei Li, arm-mail-list, kvmarm,
	KVM General, lkml - Kernel Mailing List, Jintack Lim

Hi Christoffer,

On Wed, Feb 22, 2017 at 6:10 AM, Christoffer Dall <cdall@linaro.org> wrote:
> On Mon, Jan 09, 2017 at 01:24:02AM -0500, Jintack Lim wrote:
>> With the nested virtualization support, the context of the guest
>> includes EL2 register states. The host manages a set of virtual EL2
>> registers.  In addition to that, the guest hypervisor, which is supposed to
>> run in EL2, is now deprivileged and runs in EL1. So, the host also manages a set
>> of shadow system registers to be able to run the guest hypervisor in
>> EL1.
>>
>> Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
>> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
>> ---
>>  arch/arm64/include/asm/kvm_host.h | 54 +++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 54 insertions(+)
>>
>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>> index c0c8b02..ed78d73 100644
>> --- a/arch/arm64/include/asm/kvm_host.h
>> +++ b/arch/arm64/include/asm/kvm_host.h
>> @@ -146,6 +146,42 @@ enum vcpu_sysreg {
>>       NR_SYS_REGS     /* Nothing after this line! */
>>  };
>>
>> +enum el2_regs {
>> +     ELR_EL2,
>> +     SPSR_EL2,
>> +     SP_EL2,
>> +     AMAIR_EL2,
>> +     MAIR_EL2,
>> +     TCR_EL2,
>> +     TTBR0_EL2,
>> +     VTCR_EL2,
>> +     VTTBR_EL2,
>> +     VMPIDR_EL2,
>> +     VPIDR_EL2,      /* 10 */
>> +     MDCR_EL2,
>> +     CNTHCTL_EL2,
>> +     CNTHP_CTL_EL2,
>> +     CNTHP_CVAL_EL2,
>> +     CNTHP_TVAL_EL2,
>> +     CNTVOFF_EL2,
>> +     ACTLR_EL2,
>> +     AFSR0_EL2,
>> +     AFSR1_EL2,
>> +     CPTR_EL2,       /* 20 */
>> +     ESR_EL2,
>> +     FAR_EL2,
>> +     HACR_EL2,
>> +     HCR_EL2,
>> +     HPFAR_EL2,
>> +     HSTR_EL2,
>> +     RMR_EL2,
>> +     RVBAR_EL2,
>> +     SCTLR_EL2,
>> +     TPIDR_EL2,      /* 30 */
>> +     VBAR_EL2,
>> +     NR_EL2_REGS     /* Nothing after this line! */
>> +};
>
> Why do we have a separate enum and array for the EL2 regs and not simply
> expand vcpu_sysreg?

We can expand vcpu_sysreg for the EL2 system registers. For SP_EL2,
SPSR_EL2, and ELR_EL2, where is a good place to locate them?
The SP_EL1, SPSR_EL1, and ELR_EL1 registers are saved in the kvm_regs
structure instead of sysregs[], so I wonder whether it's better to put them in
kvm_regs, too.

BTW, what's the reason that those EL1 registers are in kvm_regs
instead of sysregs[] in the first place?

>
>> +
>>  /* 32bit mapping */
>>  #define c0_MPIDR     (MPIDR_EL1 * 2) /* MultiProcessor ID Register */
>>  #define c0_CSSELR    (CSSELR_EL1 * 2)/* Cache Size Selection Register */
>> @@ -193,6 +229,23 @@ struct kvm_cpu_context {
>>               u64 sys_regs[NR_SYS_REGS];
>>               u32 copro[NR_COPRO_REGS];
>>       };
>> +
>> +     u64 el2_regs[NR_EL2_REGS];         /* only used for nesting */
>> +     u64 shadow_sys_regs[NR_SYS_REGS];  /* only used for virtual EL2 */
>> +
>> +     /*
>> +      * hw_* will be used when switching to a VM. They point to either
>> +      * the virtual EL2 or EL1/EL0 context depending on vcpu mode.
>
> don't they either point to the shadow sys regs or the normal EL1
> sysregs?

Ah, this is a general comment for all three members below.

>
>> +      */
>> +
>> +     /* pointing shadow_sys_regs or sys_regs */
>
> that's what this comment seems to indicate, so there's some duplication
> here.

And this comment is for hw_sys_regs specifically.

>
>> +     u64 *hw_sys_regs;
>> +
>> +     /* copy of either gp_regs.sp_el1 or el2_regs[SP_EL2] */
>> +     u64 hw_sp_el1;
>> +
>> +     /* pstate written to SPSR_EL2 */
>> +     u64 hw_pstate;
>>  };
>>
>>  typedef struct kvm_cpu_context kvm_cpu_context_t;
>> @@ -277,6 +330,7 @@ struct kvm_vcpu_arch {
>>
>>  #define vcpu_gp_regs(v)              (&(v)->arch.ctxt.gp_regs)
>>  #define vcpu_sys_reg(v,r)    ((v)->arch.ctxt.sys_regs[(r)])
>> +#define vcpu_el2_reg(v, r)   ((v)->arch.ctxt.el2_regs[(r)])
>>  /*
>>   * CP14 and CP15 live in the same array, as they are backed by the
>>   * same system registers.
>> --
>> 1.9.1
>>
>>
>


* Re: [RFC 21/55] KVM: arm64: Forward HVC instruction to the guest hypervisor
  2017-02-22 11:47   ` Christoffer Dall
@ 2017-06-26 15:21     ` Jintack Lim
  2017-07-03  9:08       ` Christoffer Dall
  0 siblings, 1 reply; 111+ messages in thread
From: Jintack Lim @ 2017-06-26 15:21 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: Christoffer Dall, Marc Zyngier, Paolo Bonzini,
	Radim Krčmář,
	linux, Catalin Marinas, Will Deacon, vladimir.murzin,
	Suzuki K Poulose, mark.rutland, james.morse, lorenzo.pieralisi,
	kevin.brodsky, wcohen, shankerd, geoff, Andre Przywara,
	Eric Auger, anna-maria, Shih-Wei Li, arm-mail-list, kvmarm,
	KVM General, lkml - Kernel Mailing List

On Wed, Feb 22, 2017 at 6:47 AM, Christoffer Dall <cdall@linaro.org> wrote:
> On Mon, Jan 09, 2017 at 01:24:17AM -0500, Jintack Lim wrote:
>> Forward exceptions due to hvc instruction to the guest hypervisor.
>>
>> Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
>> ---
>>  arch/arm64/include/asm/kvm_nested.h |  5 +++++
>>  arch/arm64/kvm/Makefile             |  1 +
>>  arch/arm64/kvm/handle_exit.c        | 11 +++++++++++
>>  arch/arm64/kvm/handle_exit_nested.c | 27 +++++++++++++++++++++++++++
>>  4 files changed, 44 insertions(+)
>>  create mode 100644 arch/arm64/include/asm/kvm_nested.h
>>  create mode 100644 arch/arm64/kvm/handle_exit_nested.c
>>
>> diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
>> new file mode 100644
>> index 0000000..620b4d3
>> --- /dev/null
>> +++ b/arch/arm64/include/asm/kvm_nested.h
>> @@ -0,0 +1,5 @@
>> +#ifndef __ARM64_KVM_NESTED_H__
>> +#define __ARM64_KVM_NESTED_H__
>> +
>> +int handle_hvc_nested(struct kvm_vcpu *vcpu);
>> +#endif
>> diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
>> index b342bdd..9c35e9a 100644
>> --- a/arch/arm64/kvm/Makefile
>> +++ b/arch/arm64/kvm/Makefile
>> @@ -35,4 +35,5 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/irqchip.o
>>  kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/arch_timer.o
>>  kvm-$(CONFIG_KVM_ARM_PMU) += $(KVM)/arm/pmu.o
>>
>> +kvm-$(CONFIG_KVM_ARM_NESTED_HYP) += handle_exit_nested.o
>>  kvm-$(CONFIG_KVM_ARM_NESTED_HYP) += emulate-nested.o
>> diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
>> index a891684..208be16 100644
>> --- a/arch/arm64/kvm/handle_exit.c
>> +++ b/arch/arm64/kvm/handle_exit.c
>> @@ -29,6 +29,10 @@
>>  #include <asm/kvm_mmu.h>
>>  #include <asm/kvm_psci.h>
>>
>> +#ifdef CONFIG_KVM_ARM_NESTED_HYP
>> +#include <asm/kvm_nested.h>
>> +#endif
>> +
>>  #define CREATE_TRACE_POINTS
>>  #include "trace.h"
>>
>> @@ -42,6 +46,13 @@ static int handle_hvc(struct kvm_vcpu *vcpu, struct kvm_run *run)
>>                           kvm_vcpu_hvc_get_imm(vcpu));
>>       vcpu->stat.hvc_exit_stat++;
>>
>> +#ifdef CONFIG_KVM_ARM_NESTED_HYP
>> +     ret = handle_hvc_nested(vcpu);
>> +     if (ret < 0 && ret != -EINVAL)
>> +             return ret;
>> +     else if (ret >= 0)
>> +             return ret;
>> +#endif
>>       ret = kvm_psci_call(vcpu);
>>       if (ret < 0) {
>>               kvm_inject_undefined(vcpu);
>> diff --git a/arch/arm64/kvm/handle_exit_nested.c b/arch/arm64/kvm/handle_exit_nested.c
>> new file mode 100644
>> index 0000000..a6ce23b
>> --- /dev/null
>> +++ b/arch/arm64/kvm/handle_exit_nested.c
>> @@ -0,0 +1,27 @@
>> +/*
>> + * Copyright (C) 2016 - Columbia University
>> + * Author: Jintack Lim <jintack@cs.columbia.edu>
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License version 2 as
>> + * published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
>> + */
>> +
>> +#include <linux/kvm.h>
>> +#include <linux/kvm_host.h>
>> +
>> +#include <asm/kvm_emulate.h>
>> +
>> +/* We forward all hvc instruction to the guest hypervisor. */
>> +int handle_hvc_nested(struct kvm_vcpu *vcpu)
>> +{
>> +     return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
>> +}
>
> I don't understand the logic here or in the caller above.  Do we really
> forward *all* hvc calls to the guest hypervisor now, so that we no
> longer support any hypercalls from the VM?  That seems a little rough
> and probably requires some more discussions.

So I think if we run a VM with EL2 support, then all hvc calls
from the VM should be forwarded to the virtual EL2.

I may be missing something obvious, so can you (or anyone) come up with
cases where the host hypervisor needs to directly handle an hvc from a
VM with EL2 support?

Thanks,
Jintack

>
> Thanks,
> -Christoffer
>


* Re: [RFC 06/55] KVM: arm64: Add EL2 execution context for nesting
  2017-06-26 14:33     ` Jintack Lim
@ 2017-07-03  9:03       ` Christoffer Dall
  2017-07-03  9:32         ` Marc Zyngier
  0 siblings, 1 reply; 111+ messages in thread
From: Christoffer Dall @ 2017-07-03  9:03 UTC (permalink / raw)
  To: Jintack Lim
  Cc: Christoffer Dall, Marc Zyngier, Paolo Bonzini,
	Radim Krčmář,
	linux, Catalin Marinas, Will Deacon, vladimir.murzin,
	Suzuki K Poulose, mark.rutland, james.morse, lorenzo.pieralisi,
	kevin.brodsky, wcohen, shankerd, geoff, Andre Przywara,
	Eric Auger, anna-maria, Shih-Wei Li, arm-mail-list, kvmarm,
	KVM General, lkml - Kernel Mailing List, Jintack Lim

On Mon, Jun 26, 2017 at 10:33:23AM -0400, Jintack Lim wrote:
> Hi Christoffer,
> 
> On Wed, Feb 22, 2017 at 6:10 AM, Christoffer Dall <cdall@linaro.org> wrote:
> > On Mon, Jan 09, 2017 at 01:24:02AM -0500, Jintack Lim wrote:
> >> With the nested virtualization support, the context of the guest
> >> includes EL2 register states. The host manages a set of virtual EL2
> >> registers.  In addition to that, the guest hypervisor, which is supposed to
> >> run in EL2, is now deprivileged and runs in EL1. So, the host also manages a set
> >> of shadow system registers to be able to run the guest hypervisor in
> >> EL1.
> >>
> >> Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
> >> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> >> ---
> >>  arch/arm64/include/asm/kvm_host.h | 54 +++++++++++++++++++++++++++++++++++++++
> >>  1 file changed, 54 insertions(+)
> >>
> >> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> >> index c0c8b02..ed78d73 100644
> >> --- a/arch/arm64/include/asm/kvm_host.h
> >> +++ b/arch/arm64/include/asm/kvm_host.h
> >> @@ -146,6 +146,42 @@ enum vcpu_sysreg {
> >>       NR_SYS_REGS     /* Nothing after this line! */
> >>  };
> >>
> >> +enum el2_regs {
> >> +     ELR_EL2,
> >> +     SPSR_EL2,
> >> +     SP_EL2,
> >> +     AMAIR_EL2,
> >> +     MAIR_EL2,
> >> +     TCR_EL2,
> >> +     TTBR0_EL2,
> >> +     VTCR_EL2,
> >> +     VTTBR_EL2,
> >> +     VMPIDR_EL2,
> >> +     VPIDR_EL2,      /* 10 */
> >> +     MDCR_EL2,
> >> +     CNTHCTL_EL2,
> >> +     CNTHP_CTL_EL2,
> >> +     CNTHP_CVAL_EL2,
> >> +     CNTHP_TVAL_EL2,
> >> +     CNTVOFF_EL2,
> >> +     ACTLR_EL2,
> >> +     AFSR0_EL2,
> >> +     AFSR1_EL2,
> >> +     CPTR_EL2,       /* 20 */
> >> +     ESR_EL2,
> >> +     FAR_EL2,
> >> +     HACR_EL2,
> >> +     HCR_EL2,
> >> +     HPFAR_EL2,
> >> +     HSTR_EL2,
> >> +     RMR_EL2,
> >> +     RVBAR_EL2,
> >> +     SCTLR_EL2,
> >> +     TPIDR_EL2,      /* 30 */
> >> +     VBAR_EL2,
> >> +     NR_EL2_REGS     /* Nothing after this line! */
> >> +};
> >
> > Why do we have a separate enum and array for the EL2 regs and not simply
> > expand vcpu_sysreg?
> 
> We can expand vcpu_sysreg for the EL2 system registers. For SP_EL2,
> SPSR_EL2, and ELR_EL2, where is a good place to locate them?
> The SP_EL1, SPSR_EL1, and ELR_EL1 registers are saved in the kvm_regs
> structure instead of sysregs[], so I wonder whether it's better to put them in
> kvm_regs, too.
> 
> BTW, what's the reason that those EL1 registers are in kvm_regs
> instead of sysregs[] in the first place?
> 

This has mostly to do with the way we export things to userspace, and
with historical reasons.

So we should either expand kvm_regs with the non-sysregs EL2 registers
and expand sys_regs with the EL2 sysregs, or we should put everything
EL2 into an EL2 array.  I feel like the first solution will fit more
nicely into the current design, but I don't have a very strong
preference.

You should look at the KVM_{GET,SET}_ONE_REG API definition and think
about how your choice will fit with this.
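
For reference, the userspace side of that API looks roughly like this
(a sketch; SCTLR_EL1 is just an example encoding, and vcpu_fd is assumed
to be an open vcpu file descriptor):

	#include <linux/kvm.h>		/* struct kvm_one_reg, KVM_GET_ONE_REG */
	#include <asm/kvm.h>		/* ARM64_SYS_REG() */
	#include <sys/ioctl.h>

	__u64 val;
	struct kvm_one_reg reg = {
		/* ARM64_SYS_REG(op0, op1, CRn, CRm, op2); SCTLR_EL1 here */
		.id   = ARM64_SYS_REG(3, 0, 1, 0, 0),
		.addr = (__u64)&val,
	};

	if (ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg))
		/* handle the error */;

Whatever we pick for the EL2 state has to map cleanly onto those 64-bit
register indexes.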

Marc, any preference?

Thanks,
-Christoffer


* Re: [RFC 21/55] KVM: arm64: Forward HVC instruction to the guest hypervisor
  2017-06-26 15:21     ` Jintack Lim
@ 2017-07-03  9:08       ` Christoffer Dall
  2017-07-03  9:31         ` Andrew Jones
  2017-07-03 13:29         ` Jintack Lim
  0 siblings, 2 replies; 111+ messages in thread
From: Christoffer Dall @ 2017-07-03  9:08 UTC (permalink / raw)
  To: Jintack Lim
  Cc: Christoffer Dall, Marc Zyngier, Paolo Bonzini,
	Radim Krčmář,
	linux, Catalin Marinas, Will Deacon, vladimir.murzin,
	Suzuki K Poulose, mark.rutland, james.morse, lorenzo.pieralisi,
	kevin.brodsky, wcohen, shankerd, geoff, Andre Przywara,
	Eric Auger, anna-maria, Shih-Wei Li, arm-mail-list, kvmarm,
	KVM General, lkml - Kernel Mailing List

On Mon, Jun 26, 2017 at 11:21:25AM -0400, Jintack Lim wrote:
> On Wed, Feb 22, 2017 at 6:47 AM, Christoffer Dall <cdall@linaro.org> wrote:
> > On Mon, Jan 09, 2017 at 01:24:17AM -0500, Jintack Lim wrote:
> >> Forward exceptions due to hvc instruction to the guest hypervisor.
> >>
> >> Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
> >> ---
> >>  arch/arm64/include/asm/kvm_nested.h |  5 +++++
> >>  arch/arm64/kvm/Makefile             |  1 +
> >>  arch/arm64/kvm/handle_exit.c        | 11 +++++++++++
> >>  arch/arm64/kvm/handle_exit_nested.c | 27 +++++++++++++++++++++++++++
> >>  4 files changed, 44 insertions(+)
> >>  create mode 100644 arch/arm64/include/asm/kvm_nested.h
> >>  create mode 100644 arch/arm64/kvm/handle_exit_nested.c
> >>
> >> diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
> >> new file mode 100644
> >> index 0000000..620b4d3
> >> --- /dev/null
> >> +++ b/arch/arm64/include/asm/kvm_nested.h
> >> @@ -0,0 +1,5 @@
> >> +#ifndef __ARM64_KVM_NESTED_H__
> >> +#define __ARM64_KVM_NESTED_H__
> >> +
> >> +int handle_hvc_nested(struct kvm_vcpu *vcpu);
> >> +#endif
> >> diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
> >> index b342bdd..9c35e9a 100644
> >> --- a/arch/arm64/kvm/Makefile
> >> +++ b/arch/arm64/kvm/Makefile
> >> @@ -35,4 +35,5 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/irqchip.o
> >>  kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/arch_timer.o
> >>  kvm-$(CONFIG_KVM_ARM_PMU) += $(KVM)/arm/pmu.o
> >>
> >> +kvm-$(CONFIG_KVM_ARM_NESTED_HYP) += handle_exit_nested.o
> >>  kvm-$(CONFIG_KVM_ARM_NESTED_HYP) += emulate-nested.o
> >> diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
> >> index a891684..208be16 100644
> >> --- a/arch/arm64/kvm/handle_exit.c
> >> +++ b/arch/arm64/kvm/handle_exit.c
> >> @@ -29,6 +29,10 @@
> >>  #include <asm/kvm_mmu.h>
> >>  #include <asm/kvm_psci.h>
> >>
> >> +#ifdef CONFIG_KVM_ARM_NESTED_HYP
> >> +#include <asm/kvm_nested.h>
> >> +#endif
> >> +
> >>  #define CREATE_TRACE_POINTS
> >>  #include "trace.h"
> >>
> >> @@ -42,6 +46,13 @@ static int handle_hvc(struct kvm_vcpu *vcpu, struct kvm_run *run)
> >>                           kvm_vcpu_hvc_get_imm(vcpu));
> >>       vcpu->stat.hvc_exit_stat++;
> >>
> >> +#ifdef CONFIG_KVM_ARM_NESTED_HYP
> >> +     ret = handle_hvc_nested(vcpu);
> >> +     if (ret < 0 && ret != -EINVAL)
> >> +             return ret;
> >> +     else if (ret >= 0)
> >> +             return ret;
> >> +#endif
> >>       ret = kvm_psci_call(vcpu);
> >>       if (ret < 0) {
> >>               kvm_inject_undefined(vcpu);
> >> diff --git a/arch/arm64/kvm/handle_exit_nested.c b/arch/arm64/kvm/handle_exit_nested.c
> >> new file mode 100644
> >> index 0000000..a6ce23b
> >> --- /dev/null
> >> +++ b/arch/arm64/kvm/handle_exit_nested.c
> >> @@ -0,0 +1,27 @@
> >> +/*
> >> + * Copyright (C) 2016 - Columbia University
> >> + * Author: Jintack Lim <jintack@cs.columbia.edu>
> >> + *
> >> + * This program is free software; you can redistribute it and/or modify
> >> + * it under the terms of the GNU General Public License version 2 as
> >> + * published by the Free Software Foundation.
> >> + *
> >> + * This program is distributed in the hope that it will be useful,
> >> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> >> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> >> + * GNU General Public License for more details.
> >> + *
> >> + * You should have received a copy of the GNU General Public License
> >> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
> >> + */
> >> +
> >> +#include <linux/kvm.h>
> >> +#include <linux/kvm_host.h>
> >> +
> >> +#include <asm/kvm_emulate.h>
> >> +
> >> +/* We forward all hvc instruction to the guest hypervisor. */
> >> +int handle_hvc_nested(struct kvm_vcpu *vcpu)
> >> +{
> >> +     return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
> >> +}
> >
> > I don't understand the logic here or in the caller above.  Do we really
> > > forward *all* hvc calls to the guest hypervisor now, so that we no
> > longer support any hypercalls from the VM?  That seems a little rough
> > and probably requires some more discussions.
> 
> So I think if we run a VM with EL2 support, then all hvc calls
> from the VM should be forwarded to the virtual EL2.

But do we actually check if the guest has EL2 here?  It seems you call
handle_hvc_nested unconditionally when you have
CONFIG_KVM_ARM_NESTED_HYP.  I think that's what threw me off when first
reading your patch.

> 
> I may be missing something obvious, so can you (or anyone) come up with
> cases where the host hypervisor needs to directly handle an hvc from a
> VM with EL2 support?
> 

So I'm a little unsure what to say here.  On one hand you are absolutely
correct, that architecturally if we emulated virtual EL2, then all
hypercalls are handled by the virtual EL2 (even hypercalls from virtual
EL2 which should become self-hypercalls).

On the other hand, an enlightened guest may want to use hypercalls to
the hypervisor for some reason, but that would require some numbering
scheme to separate the two concepts.
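
Something of this shape, purely as a sketch (is_host_hypercall() and
nested_virt_in_use() are made-up names for the idea, not existing code):

	static int handle_hvc(struct kvm_vcpu *vcpu, struct kvm_run *run)
	{
		unsigned long fn = vcpu_get_reg(vcpu, 0);

		/*
		 * Hypothetical split: function IDs in some reserved range
		 * are serviced by the host hypervisor; everything else is
		 * re-injected into the virtual EL2 when the guest has one.
		 */
		if (nested_virt_in_use(vcpu) && !is_host_hypercall(fn))
			return kvm_inject_nested_sync(vcpu,
						      kvm_vcpu_get_hsr(vcpu));

		return kvm_psci_call(vcpu);
	}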

Do we currently have support for the guest to use SMC calls for PSCI
when it has virtual EL2?

Thanks,
-Christoffer


* Re: [RFC 21/55] KVM: arm64: Forward HVC instruction to the guest hypervisor
  2017-07-03  9:08       ` Christoffer Dall
@ 2017-07-03  9:31         ` Andrew Jones
  2017-07-03  9:51           ` Christoffer Dall
  2017-07-03 13:29         ` Jintack Lim
  1 sibling, 1 reply; 111+ messages in thread
From: Andrew Jones @ 2017-07-03  9:31 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: Jintack Lim, Christoffer Dall, Marc Zyngier, Paolo Bonzini,
	Radim Krčmář,
	linux, Catalin Marinas, Will Deacon, vladimir.murzin,
	Suzuki K Poulose, mark.rutland, james.morse, lorenzo.pieralisi,
	kevin.brodsky, wcohen, shankerd, geoff, Andre Przywara,
	Eric Auger, anna-maria, Shih-Wei Li, arm-mail-list, kvmarm,
	KVM General, lkml - Kernel Mailing List

On Mon, Jul 03, 2017 at 11:08:50AM +0200, Christoffer Dall wrote:
> On Mon, Jun 26, 2017 at 11:21:25AM -0400, Jintack Lim wrote:
> > On Wed, Feb 22, 2017 at 6:47 AM, Christoffer Dall <cdall@linaro.org> wrote:
> > > On Mon, Jan 09, 2017 at 01:24:17AM -0500, Jintack Lim wrote:
> > >> Forward exceptions due to hvc instruction to the guest hypervisor.
> > >>
> > >> Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
> > >> ---
> > >>  arch/arm64/include/asm/kvm_nested.h |  5 +++++
> > >>  arch/arm64/kvm/Makefile             |  1 +
> > >>  arch/arm64/kvm/handle_exit.c        | 11 +++++++++++
> > >>  arch/arm64/kvm/handle_exit_nested.c | 27 +++++++++++++++++++++++++++
> > >>  4 files changed, 44 insertions(+)
> > >>  create mode 100644 arch/arm64/include/asm/kvm_nested.h
> > >>  create mode 100644 arch/arm64/kvm/handle_exit_nested.c
> > >>
> > >> diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
> > >> new file mode 100644
> > >> index 0000000..620b4d3
> > >> --- /dev/null
> > >> +++ b/arch/arm64/include/asm/kvm_nested.h
> > >> @@ -0,0 +1,5 @@
> > >> +#ifndef __ARM64_KVM_NESTED_H__
> > >> +#define __ARM64_KVM_NESTED_H__
> > >> +
> > >> +int handle_hvc_nested(struct kvm_vcpu *vcpu);
> > >> +#endif
> > >> diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
> > >> index b342bdd..9c35e9a 100644
> > >> --- a/arch/arm64/kvm/Makefile
> > >> +++ b/arch/arm64/kvm/Makefile
> > >> @@ -35,4 +35,5 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/irqchip.o
> > >>  kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/arch_timer.o
> > >>  kvm-$(CONFIG_KVM_ARM_PMU) += $(KVM)/arm/pmu.o
> > >>
> > >> +kvm-$(CONFIG_KVM_ARM_NESTED_HYP) += handle_exit_nested.o
> > >>  kvm-$(CONFIG_KVM_ARM_NESTED_HYP) += emulate-nested.o
> > >> diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
> > >> index a891684..208be16 100644
> > >> --- a/arch/arm64/kvm/handle_exit.c
> > >> +++ b/arch/arm64/kvm/handle_exit.c
> > >> @@ -29,6 +29,10 @@
> > >>  #include <asm/kvm_mmu.h>
> > >>  #include <asm/kvm_psci.h>
> > >>
> > >> +#ifdef CONFIG_KVM_ARM_NESTED_HYP
> > >> +#include <asm/kvm_nested.h>
> > >> +#endif
> > >> +
> > >>  #define CREATE_TRACE_POINTS
> > >>  #include "trace.h"
> > >>
> > >> @@ -42,6 +46,13 @@ static int handle_hvc(struct kvm_vcpu *vcpu, struct kvm_run *run)
> > >>                           kvm_vcpu_hvc_get_imm(vcpu));
> > >>       vcpu->stat.hvc_exit_stat++;
> > >>
> > >> +#ifdef CONFIG_KVM_ARM_NESTED_HYP
> > >> +     ret = handle_hvc_nested(vcpu);
> > >> +     if (ret < 0 && ret != -EINVAL)
> > >> +             return ret;
> > >> +     else if (ret >= 0)
> > >> +             return ret;
> > >> +#endif
> > >>       ret = kvm_psci_call(vcpu);
> > >>       if (ret < 0) {
> > >>               kvm_inject_undefined(vcpu);
> > >> diff --git a/arch/arm64/kvm/handle_exit_nested.c b/arch/arm64/kvm/handle_exit_nested.c
> > >> new file mode 100644
> > >> index 0000000..a6ce23b
> > >> --- /dev/null
> > >> +++ b/arch/arm64/kvm/handle_exit_nested.c
> > >> @@ -0,0 +1,27 @@
> > >> +/*
> > >> + * Copyright (C) 2016 - Columbia University
> > >> + * Author: Jintack Lim <jintack@cs.columbia.edu>
> > >> + *
> > >> + * This program is free software; you can redistribute it and/or modify
> > >> + * it under the terms of the GNU General Public License version 2 as
> > >> + * published by the Free Software Foundation.
> > >> + *
> > >> + * This program is distributed in the hope that it will be useful,
> > >> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > >> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > >> + * GNU General Public License for more details.
> > >> + *
> > >> + * You should have received a copy of the GNU General Public License
> > >> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
> > >> + */
> > >> +
> > >> +#include <linux/kvm.h>
> > >> +#include <linux/kvm_host.h>
> > >> +
> > >> +#include <asm/kvm_emulate.h>
> > >> +
> > >> +/* We forward all hvc instruction to the guest hypervisor. */
> > >> +int handle_hvc_nested(struct kvm_vcpu *vcpu)
> > >> +{
> > >> +     return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
> > >> +}
> > >
> > > I don't understand the logic here or in the caller above.  Do we really
> > > forward *all* hvc calls to the guest hypervisor now, so that we no
> > > longer support any hypercalls from the VM?  That seems a little rough
> > > and probably requires some more discussions.
> > 
> > So I think if we run a VM with EL2 support, then all hvc calls
> > from the VM should be forwarded to the virtual EL2.
> 
> But do we actually check if the guest has EL2 here?  It seems you call
> handle_hvc_nested unconditionally when you have
> CONFIG_KVM_ARM_NESTED_HYP.  I think that's what threw me off when first
> reading your patch.
> 
> > 
> > I may be missing something obvious, so can you (or anyone) come up with
> > cases where the host hypervisor needs to directly handle an hvc from a
> > VM with EL2 support?
> > 
> 
> So I'm a little unsure what to say here.  On one hand you are absolutely
> correct, that architecturally if we emulated virtual EL2, then all
> hypercalls are handled by the virtual EL2 (even hypercalls from virtual
> EL2 which should become self-hypercalls).
> 
> On the other hand, an enlightened guest may want to use hypercalls to
> the hypervisor for some reason, but that would require some numbering
> scheme to separate the two concepts.

Yes, I've been thinking that a generic KVM vcpu needs to be enlightened,
and to use a hypercall to get the host cpu's errata. If we head down that
road, then even a vcpu emulating EL2 would need to be able to do this.

> 
> Do we currently have support for the guest to use SMC calls for PSCI
> when it has virtual EL2?

Yup, that's already supported by QEMU and the guest kernel.

Thanks,
drew


* Re: [RFC 06/55] KVM: arm64: Add EL2 execution context for nesting
  2017-07-03  9:03       ` Christoffer Dall
@ 2017-07-03  9:32         ` Marc Zyngier
  2017-07-03  9:54           ` Christoffer Dall
  0 siblings, 1 reply; 111+ messages in thread
From: Marc Zyngier @ 2017-07-03  9:32 UTC (permalink / raw)
  To: Christoffer Dall, Jintack Lim
  Cc: Christoffer Dall, Paolo Bonzini, Radim Krčmář,
	linux, Catalin Marinas, Will Deacon, vladimir.murzin,
	Suzuki K Poulose, mark.rutland, james.morse, lorenzo.pieralisi,
	kevin.brodsky, wcohen, shankerd, geoff, Andre Przywara,
	Eric Auger, anna-maria, Shih-Wei Li, arm-mail-list, kvmarm,
	KVM General, lkml - Kernel Mailing List, Jintack Lim

On 03/07/17 10:03, Christoffer Dall wrote:
> On Mon, Jun 26, 2017 at 10:33:23AM -0400, Jintack Lim wrote:
>> Hi Christoffer,
>>
>> On Wed, Feb 22, 2017 at 6:10 AM, Christoffer Dall <cdall@linaro.org> wrote:
>>> On Mon, Jan 09, 2017 at 01:24:02AM -0500, Jintack Lim wrote:
>>>> With the nested virtualization support, the context of the guest
>>>> includes EL2 register states. The host manages a set of virtual EL2
>>>> registers.  In addition to that, the guest hypervisor, which is supposed to
>>>> run in EL2, is now deprivileged and runs in EL1. So, the host also manages a set
>>>> of shadow system registers to be able to run the guest hypervisor in
>>>> EL1.
>>>>
>>>> Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
>>>> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
>>>> ---
>>>>  arch/arm64/include/asm/kvm_host.h | 54 +++++++++++++++++++++++++++++++++++++++
>>>>  1 file changed, 54 insertions(+)
>>>>
>>>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>>>> index c0c8b02..ed78d73 100644
>>>> --- a/arch/arm64/include/asm/kvm_host.h
>>>> +++ b/arch/arm64/include/asm/kvm_host.h
>>>> @@ -146,6 +146,42 @@ enum vcpu_sysreg {
>>>>       NR_SYS_REGS     /* Nothing after this line! */
>>>>  };
>>>>
>>>> +enum el2_regs {
>>>> +     ELR_EL2,
>>>> +     SPSR_EL2,
>>>> +     SP_EL2,
>>>> +     AMAIR_EL2,
>>>> +     MAIR_EL2,
>>>> +     TCR_EL2,
>>>> +     TTBR0_EL2,
>>>> +     VTCR_EL2,
>>>> +     VTTBR_EL2,
>>>> +     VMPIDR_EL2,
>>>> +     VPIDR_EL2,      /* 10 */
>>>> +     MDCR_EL2,
>>>> +     CNTHCTL_EL2,
>>>> +     CNTHP_CTL_EL2,
>>>> +     CNTHP_CVAL_EL2,
>>>> +     CNTHP_TVAL_EL2,
>>>> +     CNTVOFF_EL2,
>>>> +     ACTLR_EL2,
>>>> +     AFSR0_EL2,
>>>> +     AFSR1_EL2,
>>>> +     CPTR_EL2,       /* 20 */
>>>> +     ESR_EL2,
>>>> +     FAR_EL2,
>>>> +     HACR_EL2,
>>>> +     HCR_EL2,
>>>> +     HPFAR_EL2,
>>>> +     HSTR_EL2,
>>>> +     RMR_EL2,
>>>> +     RVBAR_EL2,
>>>> +     SCTLR_EL2,
>>>> +     TPIDR_EL2,      /* 30 */
>>>> +     VBAR_EL2,
>>>> +     NR_EL2_REGS     /* Nothing after this line! */
>>>> +};
>>>
>>> Why do we have a separate enum and array for the EL2 regs and not simply
>>> expand vcpu_sysreg?
>>
>> We can expand vcpu_sysreg for the EL2 system registers. For SP_EL2,
>> SPSR_EL2, and ELR_EL2, where is a good place to locate them?
>> SP_EL1, SPSR_EL1, and ELR_EL1 registers are saved in the kvm_regs
>> structure instead of sysregs[], so I wonder if it's better to put them in
>> kvm_regs, too.
>>
>> BTW, what's the reason that those EL1 registers are in kvm_regs
>> instead of sysregs[] in the first place?
>>
> 
> This has mostly to do with the way we export things to userspace, and
> for historical reasons.
> 
> So we should either expand kvm_regs with the non-sysregs EL2 registers
> and expand sys_regs with the EL2 sysregs, or we should put everything
> EL2 into an EL2 array.  I feel like the first solution will fit more
> nicely into the current design, but I don't have a very strong
> preference.
> 
> You should look at the KVM_{GET,SET}_ONE_REG API definition and think
> about how your choice will fit with this.
> 
> Marc, any preference?

My worry is that by changing kvm_regs, we're touching a userspace
visible structure. I'm not sure we can avoid it, but I'd like to avoid
putting too much there (SPSR_EL2 and ELR_EL2 should be enough). I just
had a panic moment when realizing that this structure is not versioned,
but the whole ONE_REG API seems to save us from a complete disaster.

Overall, having kvm_regs as a UAPI visible thing in retrospect strikes
me as a dangerous design, as we cannot easily expand it. Maybe we should
consider having a kvm_regs_v2 that embeds kvm_regs, and not directly
expose it to userspace (but instead expose the indexes in that
structure)? Userspace that knows how to deal with EL2 will use the new
indexes, while existing SW will carry on using the EL1/EL0 version.
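
As a rough sketch of that idea (the kvm_regs_v2 name and which EL2
registers land in it are illustrative here, not something this series
defines):

#include <asm/kvm.h>	/* struct kvm_regs stays frozen UAPI */

/*
 * Hypothetical versioned structure: the existing kvm_regs layout is
 * embedded unchanged, the EL2 additions live after it, and userspace
 * only ever sees ONE_REG indexes pointing into it.
 */
struct kvm_regs_v2 {
	struct kvm_regs	regs;		/* existing EL0/EL1 state */
	__u64		elr_el2;
	__u64		spsr_el2;
	__u64		sp_el2;
};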

sysregs are easier to deal with, as they are visible through their
encoding, and we can place them anywhere we want. sys_regs is as good a
location as any.

Thoughts?

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC 21/55] KVM: arm64: Forward HVC instruction to the guest hypervisor
  2017-07-03  9:31         ` Andrew Jones
@ 2017-07-03  9:51           ` Christoffer Dall
  2017-07-03 12:03             ` Will Deacon
  0 siblings, 1 reply; 111+ messages in thread
From: Christoffer Dall @ 2017-07-03  9:51 UTC (permalink / raw)
  To: Andrew Jones
  Cc: Jintack Lim, Christoffer Dall, Marc Zyngier, Paolo Bonzini,
	Radim Krčmář,
	linux, Catalin Marinas, Will Deacon, vladimir.murzin,
	Suzuki K Poulose, mark.rutland, james.morse, lorenzo.pieralisi,
	kevin.brodsky, wcohen, shankerd, geoff, Andre Przywara,
	Eric Auger, anna-maria, Shih-Wei Li, arm-mail-list, kvmarm,
	KVM General, lkml - Kernel Mailing List

On Mon, Jul 03, 2017 at 11:31:56AM +0200, Andrew Jones wrote:
> On Mon, Jul 03, 2017 at 11:08:50AM +0200, Christoffer Dall wrote:
> > On Mon, Jun 26, 2017 at 11:21:25AM -0400, Jintack Lim wrote:
> > > On Wed, Feb 22, 2017 at 6:47 AM, Christoffer Dall <cdall@linaro.org> wrote:
> > > > On Mon, Jan 09, 2017 at 01:24:17AM -0500, Jintack Lim wrote:
> > > >> Forward exceptions due to the hvc instruction to the guest hypervisor.
> > > >>
> > > >> Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
> > > >> ---
> > > >>  arch/arm64/include/asm/kvm_nested.h |  5 +++++
> > > >>  arch/arm64/kvm/Makefile             |  1 +
> > > >>  arch/arm64/kvm/handle_exit.c        | 11 +++++++++++
> > > >>  arch/arm64/kvm/handle_exit_nested.c | 27 +++++++++++++++++++++++++++
> > > >>  4 files changed, 44 insertions(+)
> > > >>  create mode 100644 arch/arm64/include/asm/kvm_nested.h
> > > >>  create mode 100644 arch/arm64/kvm/handle_exit_nested.c
> > > >>
> > > >> diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
> > > >> new file mode 100644
> > > >> index 0000000..620b4d3
> > > >> --- /dev/null
> > > >> +++ b/arch/arm64/include/asm/kvm_nested.h
> > > >> @@ -0,0 +1,5 @@
> > > >> +#ifndef __ARM64_KVM_NESTED_H__
> > > >> +#define __ARM64_KVM_NESTED_H__
> > > >> +
> > > >> +int handle_hvc_nested(struct kvm_vcpu *vcpu);
> > > >> +#endif
> > > >> diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
> > > >> index b342bdd..9c35e9a 100644
> > > >> --- a/arch/arm64/kvm/Makefile
> > > >> +++ b/arch/arm64/kvm/Makefile
> > > >> @@ -35,4 +35,5 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/irqchip.o
> > > >>  kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/arch_timer.o
> > > >>  kvm-$(CONFIG_KVM_ARM_PMU) += $(KVM)/arm/pmu.o
> > > >>
> > > >> +kvm-$(CONFIG_KVM_ARM_NESTED_HYP) += handle_exit_nested.o
> > > >>  kvm-$(CONFIG_KVM_ARM_NESTED_HYP) += emulate-nested.o
> > > >> diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
> > > >> index a891684..208be16 100644
> > > >> --- a/arch/arm64/kvm/handle_exit.c
> > > >> +++ b/arch/arm64/kvm/handle_exit.c
> > > >> @@ -29,6 +29,10 @@
> > > >>  #include <asm/kvm_mmu.h>
> > > >>  #include <asm/kvm_psci.h>
> > > >>
> > > >> +#ifdef CONFIG_KVM_ARM_NESTED_HYP
> > > >> +#include <asm/kvm_nested.h>
> > > >> +#endif
> > > >> +
> > > >>  #define CREATE_TRACE_POINTS
> > > >>  #include "trace.h"
> > > >>
> > > >> @@ -42,6 +46,13 @@ static int handle_hvc(struct kvm_vcpu *vcpu, struct kvm_run *run)
> > > >>                           kvm_vcpu_hvc_get_imm(vcpu));
> > > >>       vcpu->stat.hvc_exit_stat++;
> > > >>
> > > >> +#ifdef CONFIG_KVM_ARM_NESTED_HYP
> > > >> +     ret = handle_hvc_nested(vcpu);
> > > >> +     if (ret < 0 && ret != -EINVAL)
> > > >> +             return ret;
> > > >> +     else if (ret >= 0)
> > > >> +             return ret;
> > > >> +#endif
> > > >>       ret = kvm_psci_call(vcpu);
> > > >>       if (ret < 0) {
> > > >>               kvm_inject_undefined(vcpu);
> > > >> diff --git a/arch/arm64/kvm/handle_exit_nested.c b/arch/arm64/kvm/handle_exit_nested.c
> > > >> new file mode 100644
> > > >> index 0000000..a6ce23b
> > > >> --- /dev/null
> > > >> +++ b/arch/arm64/kvm/handle_exit_nested.c
> > > >> @@ -0,0 +1,27 @@
> > > >> +/*
> > > >> + * Copyright (C) 2016 - Columbia University
> > > >> + * Author: Jintack Lim <jintack@cs.columbia.edu>
> > > >> + *
> > > >> + * This program is free software; you can redistribute it and/or modify
> > > >> + * it under the terms of the GNU General Public License version 2 as
> > > >> + * published by the Free Software Foundation.
> > > >> + *
> > > >> + * This program is distributed in the hope that it will be useful,
> > > >> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > >> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > > >> + * GNU General Public License for more details.
> > > >> + *
> > > >> + * You should have received a copy of the GNU General Public License
> > > >> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
> > > >> + */
> > > >> +
> > > >> +#include <linux/kvm.h>
> > > >> +#include <linux/kvm_host.h>
> > > >> +
> > > >> +#include <asm/kvm_emulate.h>
> > > >> +
> > > >> +/* We forward all hvc instructions to the guest hypervisor. */
> > > >> +int handle_hvc_nested(struct kvm_vcpu *vcpu)
> > > >> +{
> > > >> +     return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
> > > >> +}
> > > >
> > > > I don't understand the logic here or in the caller above.  Do we really
> > > > forward *all" hvc calls to the guest hypervisor now, so that we no
> > > > longer support any hypercalls from the VM?  That seems a little rough
> > > > and probably requires some more discussions.
> > > 
> > > So I think if we run a VM with EL2 support, then all hvc calls
> > > from the VM should be forwarded to the virtual EL2.
> > 
> > But do we actually check if the guest has EL2 here?  It seems you call
> > handle_hvc_nested unconditionally when you have
> > CONFIG_KVM_ARM_NESTED_HYP.  I think that's what threw me off when first
> > reading your patch.
> > 
> > > 
> > > I may be missing something obvious, so can you (or anyone) come up with some
> > > cases where the host hypervisor needs to directly handle hvc from the
> > > VM with EL2 support?
> > > 
> > 
> > So I'm a little unsure what to say here.  On one hand you are absolutely
> > correct that, architecturally, if we emulate virtual EL2, then all
> > hypercalls are handled by the virtual EL2 (even hypercalls from virtual
> > EL2, which should become self-hypercalls).
> > 
> > On the other hand, an enlightened guest may want to use hypercalls to
> > the hypervisor for some reason, but that would require some numbering
> > scheme to separate the two concepts.
> 
> Yes, I've been thinking that a generic KVM vcpu needs to be enlightened,
> and to use a hypercall to get the host cpu's errata. If we head down that
> road, then even a vcpu emulating EL2 would need to be able to do this.
> 

We could use SMC calls here as well, as the "conduit", as I believe the
ARM folks are calling it.  We just need to agree somewhere (across
hypervisors preferably), that when you have virtual EL2, everything is
via SMC (even upcalls to a host hypervisor), and otherwise it's via HVC.
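
A minimal sketch of that convention, assuming a hypothetical
nested_virt_in_use() predicate for "this vcpu has virtual EL2" (error
handling elided):

static int handle_hvc(struct kvm_vcpu *vcpu, struct kvm_run *run)
{
	/* With virtual EL2, every HVC belongs to the guest hypervisor. */
	if (nested_virt_in_use(vcpu))
		return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));

	return kvm_psci_call(vcpu);	/* existing HVC-based services */
}

static int handle_smc(struct kvm_vcpu *vcpu, struct kvm_run *run)
{
	/* With virtual EL2, SMC becomes the conduit to the host. */
	if (nested_virt_in_use(vcpu))
		return kvm_psci_call(vcpu);

	kvm_inject_undefined(vcpu);	/* existing behaviour for EL1 guests */
	return 1;
}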

> > 
> > Do we currently have support for the guest to use SMC calls for PSCI
> > when it has virtual EL2?
> 
> Yup, that's already supported by QEMU and the guest kernel.
> 
Yes, and the KVM support follows this patch in the series as it turns
out (but given the time since I looked at this series last, I forgot).


Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC 06/55] KVM: arm64: Add EL2 execution context for nesting
  2017-07-03  9:32         ` Marc Zyngier
@ 2017-07-03  9:54           ` Christoffer Dall
  2017-07-03 14:44             ` Jintack Lim
  0 siblings, 1 reply; 111+ messages in thread
From: Christoffer Dall @ 2017-07-03  9:54 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Jintack Lim, Christoffer Dall, Paolo Bonzini,
	Radim Krčmář,
	linux, Catalin Marinas, Will Deacon, vladimir.murzin,
	Suzuki K Poulose, mark.rutland, james.morse, lorenzo.pieralisi,
	kevin.brodsky, wcohen, shankerd, geoff, Andre Przywara,
	Eric Auger, anna-maria, Shih-Wei Li, arm-mail-list, kvmarm,
	KVM General, lkml - Kernel Mailing List, Jintack Lim

On Mon, Jul 03, 2017 at 10:32:45AM +0100, Marc Zyngier wrote:
> On 03/07/17 10:03, Christoffer Dall wrote:
> > On Mon, Jun 26, 2017 at 10:33:23AM -0400, Jintack Lim wrote:
> >> Hi Christoffer,
> >>
> >> On Wed, Feb 22, 2017 at 6:10 AM, Christoffer Dall <cdall@linaro.org> wrote:
> >>> On Mon, Jan 09, 2017 at 01:24:02AM -0500, Jintack Lim wrote:
> >>>> With the nested virtualization support, the context of the guest
> >>>> includes EL2 register states. The host manages a set of virtual EL2
> >>>> registers.  In addition to that, the guest hypervisor, supposed to run in
> >>>> EL2, is now deprivileged and runs in EL1. So, the host also manages a set
> >>>> of shadow system registers to be able to run the guest hypervisor in
> >>>> EL1.
> >>>>
> >>>> Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
> >>>> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> >>>> ---
> >>>>  arch/arm64/include/asm/kvm_host.h | 54 +++++++++++++++++++++++++++++++++++++++
> >>>>  1 file changed, 54 insertions(+)
> >>>>
> >>>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> >>>> index c0c8b02..ed78d73 100644
> >>>> --- a/arch/arm64/include/asm/kvm_host.h
> >>>> +++ b/arch/arm64/include/asm/kvm_host.h
> >>>> @@ -146,6 +146,42 @@ enum vcpu_sysreg {
> >>>>       NR_SYS_REGS     /* Nothing after this line! */
> >>>>  };
> >>>>
> >>>> +enum el2_regs {
> >>>> +     ELR_EL2,
> >>>> +     SPSR_EL2,
> >>>> +     SP_EL2,
> >>>> +     AMAIR_EL2,
> >>>> +     MAIR_EL2,
> >>>> +     TCR_EL2,
> >>>> +     TTBR0_EL2,
> >>>> +     VTCR_EL2,
> >>>> +     VTTBR_EL2,
> >>>> +     VMPIDR_EL2,
> >>>> +     VPIDR_EL2,      /* 10 */
> >>>> +     MDCR_EL2,
> >>>> +     CNTHCTL_EL2,
> >>>> +     CNTHP_CTL_EL2,
> >>>> +     CNTHP_CVAL_EL2,
> >>>> +     CNTHP_TVAL_EL2,
> >>>> +     CNTVOFF_EL2,
> >>>> +     ACTLR_EL2,
> >>>> +     AFSR0_EL2,
> >>>> +     AFSR1_EL2,
> >>>> +     CPTR_EL2,       /* 20 */
> >>>> +     ESR_EL2,
> >>>> +     FAR_EL2,
> >>>> +     HACR_EL2,
> >>>> +     HCR_EL2,
> >>>> +     HPFAR_EL2,
> >>>> +     HSTR_EL2,
> >>>> +     RMR_EL2,
> >>>> +     RVBAR_EL2,
> >>>> +     SCTLR_EL2,
> >>>> +     TPIDR_EL2,      /* 30 */
> >>>> +     VBAR_EL2,
> >>>> +     NR_EL2_REGS     /* Nothing after this line! */
> >>>> +};
> >>>
> >>> Why do we have a separate enum and array for the EL2 regs and not simply
> >>> expand vcpu_sysreg?
> >>
> >> We can expand vcpu_sysreg for the EL2 system registers. For SP_EL2,
> >> SPSR_EL2, and ELR_EL2, where is a good place to locate them?
> >> SP_EL1, SPSR_EL1, and ELR_EL1 registers are saved in the kvm_regs
> >> structure instead of sysregs[], so I wonder if it's better to put them in
> >> kvm_regs, too.
> >>
> >> BTW, what's the reason that those EL1 registers are in kvm_regs
> >> instead of sysregs[] in the first place?
> >>
> > 
> > This has mostly to do with the way we export things to userspace, and
> > for historical reasons.
> > 
> > So we should either expand kvm_regs with the non-sysregs EL2 registers
> > and expand sys_regs with the EL2 sysregs, or we should put everything
> > EL2 into an EL2 array.  I feel like the first solution will fit more
> > nicely into the current design, but I don't have a very strong
> > preference.
> > 
> > You should look at the KVM_{GET,SET}_ONE_REG API definition and think
> > about how your choice will fit with this.
> > 
> > Marc, any preference?
> 
> My worry is that by changing kvm_regs, we're touching a userspace
> visible structure. I'm not sure we can avoid it, but I'd like to avoid
> putting too much there (SPSR_EL2 and ELR_EL2 should be enough). I just
> had a panic moment when realizing that this structure is not versioned,
> but the whole ONE_REG API seems to save us from a complete disaster.
> 
> Overall, having kvm_regs as a UAPI visible thing in retrospect strikes
> me as a dangerous design, as we cannot easily expand it. Maybe we should
> consider having a kvm_regs_v2 that embeds kvm_regs, and not directly
> expose it to userspace (but instead expose the indexes in that
> structure)? Userspace that knows how to deal with EL2 will use the new
> indexes, while existing SW will carry on using the EL1/EL0 version.

We definitely cannot expand kvm_regs; that would lead to all sorts of
potential errors, as you correctly point out.

So we probably need something like that, or simply let it stay the way
it is for now, and add el2_core_regs as a separate thing to the vcpu and
only expose the indexes and encoding for those registers?
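
For the sysreg half of that, the ONE_REG index can carry the
architectural encoding directly; for instance, from userspace, and
assuming the virtual EL2 sysregs end up exposed this way (the HCR_EL2
op0/op1/CRn/CRm/op2 values below are the architectural ones, the rest
is the standard KVM_GET_ONE_REG flow):

#include <sys/ioctl.h>
#include <linux/kvm.h>	/* ARM64_SYS_REG(), struct kvm_one_reg */

/* HCR_EL2 is op0=3, op1=4, CRn=1, CRm=1, op2=0 in the ARM ARM. */
#define REG_HCR_EL2	ARM64_SYS_REG(3, 4, 1, 1, 0)

static int get_virtual_hcr_el2(int vcpu_fd, __u64 *val)
{
	struct kvm_one_reg reg = {
		.id   = REG_HCR_EL2,
		.addr = (__u64)val,
	};

	/* Reads the virtual HCR_EL2 that the host maintains. */
	return ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg);
}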

> 
> sysregs are easier to deal with, as they are visible through their
> encoding, and we can place them anywhere we want. sys_regs is as good a
> location as any.
> 

Agreed.

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC 21/55] KVM: arm64: Forward HVC instruction to the guest hypervisor
  2017-07-03  9:51           ` Christoffer Dall
@ 2017-07-03 12:03             ` Will Deacon
  2017-07-03 12:35               ` Marc Zyngier
  0 siblings, 1 reply; 111+ messages in thread
From: Will Deacon @ 2017-07-03 12:03 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: Andrew Jones, Jintack Lim, Christoffer Dall, Marc Zyngier,
	Paolo Bonzini, Radim Krčmář,
	linux, Catalin Marinas, vladimir.murzin, Suzuki K Poulose,
	mark.rutland, james.morse, lorenzo.pieralisi, kevin.brodsky,
	wcohen, shankerd, geoff, Andre Przywara, Eric Auger, anna-maria,
	Shih-Wei Li, arm-mail-list, kvmarm, KVM General,
	lkml - Kernel Mailing List

On Mon, Jul 03, 2017 at 11:51:26AM +0200, Christoffer Dall wrote:
> On Mon, Jul 03, 2017 at 11:31:56AM +0200, Andrew Jones wrote:
> > On Mon, Jul 03, 2017 at 11:08:50AM +0200, Christoffer Dall wrote:
> > > On Mon, Jun 26, 2017 at 11:21:25AM -0400, Jintack Lim wrote:
> > > > On Wed, Feb 22, 2017 at 6:47 AM, Christoffer Dall <cdall@linaro.org> wrote:
> > > > > On Mon, Jan 09, 2017 at 01:24:17AM -0500, Jintack Lim wrote:
> > > > >> +/* We forward all hvc instructions to the guest hypervisor. */
> > > > >> +int handle_hvc_nested(struct kvm_vcpu *vcpu)
> > > > >> +{
> > > > >> +     return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
> > > > >> +}
> > > > >
> > > > > I don't understand the logic here or in the caller above.  Do we really
> > > > > forward *all" hvc calls to the guest hypervisor now, so that we no
> > > > > longer support any hypercalls from the VM?  That seems a little rough
> > > > > and probably requires some more discussions.
> > > > 
> > > > So I think if we run a VM with EL2 support, then all hvc calls
> > > > from the VM should be forwarded to the virtual EL2.
> > > 
> > > But do we actually check if the guest has EL2 here?  It seems you call
> > > handle_hvc_nested unconditionally when you have
> > > CONFIG_KVM_ARM_NESTED_HYP.  I think that's what threw me off when first
> > > reading your patch.
> > > 
> > > > 
> > > > I may be missing something obvious, so can you (or anyone) come up with some
> > > > cases where the host hypervisor needs to directly handle hvc from the
> > > > VM with EL2 support?
> > > > 
> > > 
> > > So I'm a little unsure what to say here.  On one hand you are absolutely
> > > correct that, architecturally, if we emulate virtual EL2, then all
> > > hypercalls are handled by the virtual EL2 (even hypercalls from virtual
> > > EL2, which should become self-hypercalls).
> > > 
> > > On the other hand, an enlightened guest may want to use hypercalls to
> > > the hypervisor for some reason, but that would require some numbering
> > > scheme to separate the two concepts.
> > 
> > Yes, I've been thinking that a generic KVM vcpu needs to be enlightened,
> > and to use a hypercall to get the host cpu's errata. If we head down that
> > road, then even a vcpu emulating EL2 would need to be able to do this.
> > 
> 
> We could use SMC calls here as well, as the "conduit", as I believe the
> ARM folks are calling it.  We just need to agree somewhere (across
> hypervisors preferably), that when you have virtual EL2, everything is
> via SMC (even upcalls to a host hypervisor), and otherwise it's via HVC.

Does that mean you require the CPU to implement EL3 if you want to use
nested virtualisation?

Will

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC 21/55] KVM: arm64: Forward HVC instruction to the guest hypervisor
  2017-07-03 12:03             ` Will Deacon
@ 2017-07-03 12:35               ` Marc Zyngier
  0 siblings, 0 replies; 111+ messages in thread
From: Marc Zyngier @ 2017-07-03 12:35 UTC (permalink / raw)
  To: Will Deacon, Christoffer Dall
  Cc: Andrew Jones, Jintack Lim, Christoffer Dall, Paolo Bonzini,
	Radim Krčmář,
	linux, Catalin Marinas, vladimir.murzin, Suzuki K Poulose,
	mark.rutland, james.morse, lorenzo.pieralisi, kevin.brodsky,
	wcohen, shankerd, geoff, Andre Przywara, Eric Auger, anna-maria,
	Shih-Wei Li, arm-mail-list, kvmarm, KVM General,
	lkml - Kernel Mailing List

On 03/07/17 13:03, Will Deacon wrote:
> On Mon, Jul 03, 2017 at 11:51:26AM +0200, Christoffer Dall wrote:
>> On Mon, Jul 03, 2017 at 11:31:56AM +0200, Andrew Jones wrote:
>>> On Mon, Jul 03, 2017 at 11:08:50AM +0200, Christoffer Dall wrote:
>>>> On Mon, Jun 26, 2017 at 11:21:25AM -0400, Jintack Lim wrote:
>>>>> On Wed, Feb 22, 2017 at 6:47 AM, Christoffer Dall <cdall@linaro.org> wrote:
>>>>>> On Mon, Jan 09, 2017 at 01:24:17AM -0500, Jintack Lim wrote:
>>>>>>> +/* We forward all hvc instructions to the guest hypervisor. */
>>>>>>> +int handle_hvc_nested(struct kvm_vcpu *vcpu)
>>>>>>> +{
>>>>>>> +     return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
>>>>>>> +}
>>>>>>
>>>>>> I don't understand the logic here or in the caller above.  Do we really
>>>>>> forward *all" hvc calls to the guest hypervisor now, so that we no
>>>>>> longer support any hypercalls from the VM?  That seems a little rough
>>>>>> and probably requires some more discussions.
>>>>>
>>>>> So I think if we run a VM with EL2 support, then all hvc calls
>>>>> from the VM should be forwarded to the virtual EL2.
>>>>
>>>> But do we actually check if the guest has EL2 here?  It seems you call
>>>> handle_hvc_nested unconditionally when you have
>>>> CONFIG_KVM_ARM_NESTED_HYP.  I think that's what threw me off when first
>>>> reading your patch.
>>>>
>>>>>
>>>>> I may be missing something obvious, so can you (or anyone) come up with some
>>>>> cases where the host hypervisor needs to directly handle hvc from the
>>>>> VM with EL2 support?
>>>>>
>>>>
>>>> So I'm a little unsure what to say here.  On one hand you are absolutely
>>>> correct that, architecturally, if we emulate virtual EL2, then all
>>>> hypercalls are handled by the virtual EL2 (even hypercalls from virtual
>>>> EL2, which should become self-hypercalls).
>>>>
>>>> On the other hand, an enlightened guest may want to use hypercalls to
>>>> the hypervisor for some reason, but that would require some numbering
>>>> scheme to separate the two concepts.
>>>
>>> Yes, I've been thinking that a generic KVM vcpu needs to be enlightened,
>>> and to use a hypercall to get the host cpu's errata. If we head down that
>>> road, then even a vcpu emulating EL2 would need to be able to do this.
>>>
>>
>> We could use SMC calls here as well, as the "conduit", as I believe the
>> ARM folks are calling it.  We just need to agree somewhere (across
>> hypervisors preferably), that when you have virtual EL2, everything is
>> via SMC (even upcalls to a host hypervisor), and otherwise it's via HVC.
> 
> Does that mean you require the CPU to implement EL3 if you want to use
> nested virtualisation?

The 8.3 spec has relaxed the use of SMC for the non-root hypervisor,
where the top-level hypervisor can trap SMCs from nested hypervisors,
irrespective of EL3 being implemented. It still cannot trap SMCs from an
EL1 guest if EL3 is not implemented, though...
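
Concretely, that relaxation means the host can unconditionally set the
SMC trap bit for a vcpu with virtual EL2; a sketch, reusing the
hypothetical nested_virt_in_use() helper from the earlier sketch:

static void vcpu_setup_smc_trap(struct kvm_vcpu *vcpu)
{
	/*
	 * HCR_EL2.TSC routes the guest hypervisor's SMCs to the host,
	 * with or without EL3 on the CPU (per the ARMv8.3 relaxation).
	 */
	if (nested_virt_in_use(vcpu))
		vcpu->arch.hcr_el2 |= HCR_TSC;
}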

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC 21/55] KVM: arm64: Forward HVC instruction to the guest hypervisor
  2017-07-03  9:08       ` Christoffer Dall
  2017-07-03  9:31         ` Andrew Jones
@ 2017-07-03 13:29         ` Jintack Lim
  1 sibling, 0 replies; 111+ messages in thread
From: Jintack Lim @ 2017-07-03 13:29 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: Christoffer Dall, Marc Zyngier, Paolo Bonzini,
	Radim Krčmář,
	linux, Catalin Marinas, Will Deacon, vladimir.murzin,
	Suzuki K Poulose, mark.rutland, james.morse, lorenzo.pieralisi,
	kevin.brodsky, wcohen, shankerd, geoff, Andre Przywara,
	Eric Auger, anna-maria, Shih-Wei Li, arm-mail-list, kvmarm,
	KVM General, lkml - Kernel Mailing List

On Mon, Jul 3, 2017 at 5:08 AM, Christoffer Dall <cdall@linaro.org> wrote:
> On Mon, Jun 26, 2017 at 11:21:25AM -0400, Jintack Lim wrote:
>> On Wed, Feb 22, 2017 at 6:47 AM, Christoffer Dall <cdall@linaro.org> wrote:
>> > On Mon, Jan 09, 2017 at 01:24:17AM -0500, Jintack Lim wrote:
>> >> Forward exceptions due to the hvc instruction to the guest hypervisor.
>> >>
>> >> Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
>> >> ---
>> >>  arch/arm64/include/asm/kvm_nested.h |  5 +++++
>> >>  arch/arm64/kvm/Makefile             |  1 +
>> >>  arch/arm64/kvm/handle_exit.c        | 11 +++++++++++
>> >>  arch/arm64/kvm/handle_exit_nested.c | 27 +++++++++++++++++++++++++++
>> >>  4 files changed, 44 insertions(+)
>> >>  create mode 100644 arch/arm64/include/asm/kvm_nested.h
>> >>  create mode 100644 arch/arm64/kvm/handle_exit_nested.c
>> >>
>> >> diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
>> >> new file mode 100644
>> >> index 0000000..620b4d3
>> >> --- /dev/null
>> >> +++ b/arch/arm64/include/asm/kvm_nested.h
>> >> @@ -0,0 +1,5 @@
>> >> +#ifndef __ARM64_KVM_NESTED_H__
>> >> +#define __ARM64_KVM_NESTED_H__
>> >> +
>> >> +int handle_hvc_nested(struct kvm_vcpu *vcpu);
>> >> +#endif
>> >> diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
>> >> index b342bdd..9c35e9a 100644
>> >> --- a/arch/arm64/kvm/Makefile
>> >> +++ b/arch/arm64/kvm/Makefile
>> >> @@ -35,4 +35,5 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/irqchip.o
>> >>  kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/arch_timer.o
>> >>  kvm-$(CONFIG_KVM_ARM_PMU) += $(KVM)/arm/pmu.o
>> >>
>> >> +kvm-$(CONFIG_KVM_ARM_NESTED_HYP) += handle_exit_nested.o
>> >>  kvm-$(CONFIG_KVM_ARM_NESTED_HYP) += emulate-nested.o
>> >> diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
>> >> index a891684..208be16 100644
>> >> --- a/arch/arm64/kvm/handle_exit.c
>> >> +++ b/arch/arm64/kvm/handle_exit.c
>> >> @@ -29,6 +29,10 @@
>> >>  #include <asm/kvm_mmu.h>
>> >>  #include <asm/kvm_psci.h>
>> >>
>> >> +#ifdef CONFIG_KVM_ARM_NESTED_HYP
>> >> +#include <asm/kvm_nested.h>
>> >> +#endif
>> >> +
>> >>  #define CREATE_TRACE_POINTS
>> >>  #include "trace.h"
>> >>
>> >> @@ -42,6 +46,13 @@ static int handle_hvc(struct kvm_vcpu *vcpu, struct kvm_run *run)
>> >>                           kvm_vcpu_hvc_get_imm(vcpu));
>> >>       vcpu->stat.hvc_exit_stat++;
>> >>
>> >> +#ifdef CONFIG_KVM_ARM_NESTED_HYP
>> >> +     ret = handle_hvc_nested(vcpu);
>> >> +     if (ret < 0 && ret != -EINVAL)
>> >> +             return ret;
>> >> +     else if (ret >= 0)
>> >> +             return ret;
>> >> +#endif
>> >>       ret = kvm_psci_call(vcpu);
>> >>       if (ret < 0) {
>> >>               kvm_inject_undefined(vcpu);
>> >> diff --git a/arch/arm64/kvm/handle_exit_nested.c b/arch/arm64/kvm/handle_exit_nested.c
>> >> new file mode 100644
>> >> index 0000000..a6ce23b
>> >> --- /dev/null
>> >> +++ b/arch/arm64/kvm/handle_exit_nested.c
>> >> @@ -0,0 +1,27 @@
>> >> +/*
>> >> + * Copyright (C) 2016 - Columbia University
>> >> + * Author: Jintack Lim <jintack@cs.columbia.edu>
>> >> + *
>> >> + * This program is free software; you can redistribute it and/or modify
>> >> + * it under the terms of the GNU General Public License version 2 as
>> >> + * published by the Free Software Foundation.
>> >> + *
>> >> + * This program is distributed in the hope that it will be useful,
>> >> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> >> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> >> + * GNU General Public License for more details.
>> >> + *
>> >> + * You should have received a copy of the GNU General Public License
>> >> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
>> >> + */
>> >> +
>> >> +#include <linux/kvm.h>
>> >> +#include <linux/kvm_host.h>
>> >> +
>> >> +#include <asm/kvm_emulate.h>
>> >> +
>> >> +/* We forward all hvc instructions to the guest hypervisor. */
>> >> +int handle_hvc_nested(struct kvm_vcpu *vcpu)
>> >> +{
>> >> +     return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
>> >> +}
>> >
>> > I don't understand the logic here or in the caller above.  Do we really
>> > forward *all" hvc calls to the guest hypervisor now, so that we no
>> > longer support any hypercalls from the VM?  That seems a little rough
>> > and probably requires some more discussions.
>>
>> So I think if we run a VM with EL2 support, then all hvc calls
>> from the VM should be forwarded to the virtual EL2.
>
> But do we actually check if the guest has EL2 here?  It seems you call
> handle_hvc_nested unconditionally when you have
> CONFIG_KVM_ARM_NESTED_HYP.  I think that's what threw me off when first
> reading your patch.

You're right. We should check it first.
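
For reference, a minimal form of that guard might be the following,
with vcpu_has_nested_virt() standing in for whatever test of the
nesting vcpu feature the series settles on; returning -EINVAL keeps
the caller's existing fallthrough to kvm_psci_call():

int handle_hvc_nested(struct kvm_vcpu *vcpu)
{
	if (!vcpu_has_nested_virt(vcpu))	/* hypothetical feature test */
		return -EINVAL;			/* not for virtual EL2 */

	return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
}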

>
>>
>> I may be missing something obvious, so can you (or anyone) come up with some
>> cases where the host hypervisor needs to directly handle hvc from the
>> VM with EL2 support?
>>
>
> So I'm a little unsure what to say here.  On one hand you are absolutely
> correct that, architecturally, if we emulate virtual EL2, then all
> hypercalls are handled by the virtual EL2 (even hypercalls from virtual
> EL2, which should become self-hypercalls).
>
> On the other hand, an enlightened guest may want to use hypercalls to
> the hypervisor for some reason, but that would require some numbering
> scheme to separate the two concepts.
>
> Do we currently have support for the guest to use SMC calls for PSCI
> when it has virtual EL2?

Yes, we do in "[RFC,22/55] KVM: arm64: Handle PSCI call from the
guest" as you figured out.

>
> Thanks,
> -Christoffer

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC 06/55] KVM: arm64: Add EL2 execution context for nesting
  2017-07-03  9:54           ` Christoffer Dall
@ 2017-07-03 14:44             ` Jintack Lim
  2017-07-03 15:30               ` Christoffer Dall
  0 siblings, 1 reply; 111+ messages in thread
From: Jintack Lim @ 2017-07-03 14:44 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: Marc Zyngier, Christoffer Dall, Paolo Bonzini,
	Radim Krčmář,
	linux, Catalin Marinas, Will Deacon, vladimir.murzin,
	Suzuki K Poulose, mark.rutland, james.morse, lorenzo.pieralisi,
	kevin.brodsky, wcohen, shankerd, geoff, Andre Przywara,
	Eric Auger, anna-maria, Shih-Wei Li, arm-mail-list, kvmarm,
	KVM General, lkml - Kernel Mailing List

Thanks Christoffer and Marc,

On Mon, Jul 3, 2017 at 5:54 AM, Christoffer Dall <cdall@linaro.org> wrote:
> On Mon, Jul 03, 2017 at 10:32:45AM +0100, Marc Zyngier wrote:
>> On 03/07/17 10:03, Christoffer Dall wrote:
>> > On Mon, Jun 26, 2017 at 10:33:23AM -0400, Jintack Lim wrote:
>> >> Hi Christoffer,
>> >>
>> >> On Wed, Feb 22, 2017 at 6:10 AM, Christoffer Dall <cdall@linaro.org> wrote:
>> >>> On Mon, Jan 09, 2017 at 01:24:02AM -0500, Jintack Lim wrote:
>> >>>> With the nested virtualization support, the context of the guest
>> >>>> includes EL2 register states. The host manages a set of virtual EL2
>> >>>> registers.  In addition to that, the guest hypervisor, supposed to run in
>> >>>> EL2, is now deprivileged and runs in EL1. So, the host also manages a set
>> >>>> of shadow system registers to be able to run the guest hypervisor in
>> >>>> EL1.
>> >>>>
>> >>>> Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
>> >>>> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
>> >>>> ---
>> >>>>  arch/arm64/include/asm/kvm_host.h | 54 +++++++++++++++++++++++++++++++++++++++
>> >>>>  1 file changed, 54 insertions(+)
>> >>>>
>> >>>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>> >>>> index c0c8b02..ed78d73 100644
>> >>>> --- a/arch/arm64/include/asm/kvm_host.h
>> >>>> +++ b/arch/arm64/include/asm/kvm_host.h
>> >>>> @@ -146,6 +146,42 @@ enum vcpu_sysreg {
>> >>>>       NR_SYS_REGS     /* Nothing after this line! */
>> >>>>  };
>> >>>>
>> >>>> +enum el2_regs {
>> >>>> +     ELR_EL2,
>> >>>> +     SPSR_EL2,
>> >>>> +     SP_EL2,
>> >>>> +     AMAIR_EL2,
>> >>>> +     MAIR_EL2,
>> >>>> +     TCR_EL2,
>> >>>> +     TTBR0_EL2,
>> >>>> +     VTCR_EL2,
>> >>>> +     VTTBR_EL2,
>> >>>> +     VMPIDR_EL2,
>> >>>> +     VPIDR_EL2,      /* 10 */
>> >>>> +     MDCR_EL2,
>> >>>> +     CNTHCTL_EL2,
>> >>>> +     CNTHP_CTL_EL2,
>> >>>> +     CNTHP_CVAL_EL2,
>> >>>> +     CNTHP_TVAL_EL2,
>> >>>> +     CNTVOFF_EL2,
>> >>>> +     ACTLR_EL2,
>> >>>> +     AFSR0_EL2,
>> >>>> +     AFSR1_EL2,
>> >>>> +     CPTR_EL2,       /* 20 */
>> >>>> +     ESR_EL2,
>> >>>> +     FAR_EL2,
>> >>>> +     HACR_EL2,
>> >>>> +     HCR_EL2,
>> >>>> +     HPFAR_EL2,
>> >>>> +     HSTR_EL2,
>> >>>> +     RMR_EL2,
>> >>>> +     RVBAR_EL2,
>> >>>> +     SCTLR_EL2,
>> >>>> +     TPIDR_EL2,      /* 30 */
>> >>>> +     VBAR_EL2,
>> >>>> +     NR_EL2_REGS     /* Nothing after this line! */
>> >>>> +};
>> >>>
>> >>> Why do we have a separate enum and array for the EL2 regs and not simply
>> >>> expand vcpu_sysreg?
>> >>
>> >> We can expand vcpu_sysreg for the EL2 system registers. For SP_EL2,
>> >> SPSR_EL2, and ELR_EL2, where is a good place to locate them?
>> >> SP_EL1, SPSR_EL1, and ELR_EL1 registers are saved in the kvm_regs
>> >> structure instead of sysregs[], so I wonder if it's better to put them in
>> >> kvm_regs, too.
>> >>
>> >> BTW, what's the reason that those EL1 registers are in kvm_regs
>> >> instead of sysregs[] in the first place?
>> >>
>> >
>> > This has mostly to do with the way we export things to userspace, and
>> > for historical reasons.
>> >
>> > So we should either expand kvm_regs with the non-sysregs EL2 registers
>> > and expand sys_regs with the EL2 sysregs, or we should put everything
>> > EL2 into an EL2 array.  I feel like the first solution will fit more
>> > nicely into the current design, but I don't have a very strong
>> > preference.
>> >
>> > You should look at the KVM_{GET,SET}_ONE_REG API definition and think
>> > about how your choice will fit with this.
>> >
>> > Marc, any preference?
>>
>> My worry is that by changing kvm_regs, we're touching a userspace
>> visible structure. I'm not sure we can avoid it, but I'd like to avoid
>> putting too much there (SPSR_EL2 and ELR_EL2 should be enough). I just
>> had a panic moment when realizing that this structure is not versioned,
>> but the whole ONE_REG API seems to save us from a complete disaster.
>>
>> Overall, having kvm_regs as a UAPI visible thing in retrospect strikes
>> me as a dangerous design, as we cannot easily expand it. Maybe we should
>> consider having a kvm_regs_v2 that embeds kvm_regs, and not directly
>> expose it to userspace (but instead expose the indexes in that
>> structure)? Userspace that knows how to deal with EL2 will use the new
>> indexes, while existing SW will carry on using the EL1/EL0 version.
>
> > We definitely cannot expand kvm_regs; that would lead to all sorts of
> potential errors, as you correctly point out.

Ok. I didn't know that kvm_regs is exposed to userspace.

>
> So we probably need something like that, or simply let it stay the way
> it is for now, and add el2_core_regs as a separate thing to the vcpu and
> only expose the indexes and encoding for those registers?
>

Sounds good to me.

So, expand sys_regs with the EL2 sysregs and put the special-purpose
registers (the term used in the ARM ARM), such as SPSR_EL2,
ELR_EL2 and SP_EL2, into el2_core_regs or el2_special_regs, right?

>>
>> sysregs are easier to deal with, as they are visible through their
>> encoding, and we can place them anywhere we want. sys_regs is as good a
>> location as any.
>>
>
> Agreed.
>
> Thanks,
> -Christoffer

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC 06/55] KVM: arm64: Add EL2 execution context for nesting
  2017-07-03 14:44             ` Jintack Lim
@ 2017-07-03 15:30               ` Christoffer Dall
  0 siblings, 0 replies; 111+ messages in thread
From: Christoffer Dall @ 2017-07-03 15:30 UTC (permalink / raw)
  To: Jintack Lim
  Cc: Marc Zyngier, Christoffer Dall, Paolo Bonzini,
	Radim Krčmář,
	linux, Catalin Marinas, Will Deacon, vladimir.murzin,
	Suzuki K Poulose, mark.rutland, james.morse, lorenzo.pieralisi,
	kevin.brodsky, wcohen, shankerd, geoff, Andre Przywara,
	Eric Auger, anna-maria, Shih-Wei Li, arm-mail-list, kvmarm,
	KVM General, lkml - Kernel Mailing List

On Mon, Jul 03, 2017 at 10:44:51AM -0400, Jintack Lim wrote:
> Thanks Christoffer and Marc,
> 
> On Mon, Jul 3, 2017 at 5:54 AM, Christoffer Dall <cdall@linaro.org> wrote:
> > On Mon, Jul 03, 2017 at 10:32:45AM +0100, Marc Zyngier wrote:
> >> On 03/07/17 10:03, Christoffer Dall wrote:
> >> > On Mon, Jun 26, 2017 at 10:33:23AM -0400, Jintack Lim wrote:
> >> >> Hi Christoffer,
> >> >>
> >> >> On Wed, Feb 22, 2017 at 6:10 AM, Christoffer Dall <cdall@linaro.org> wrote:
> >> >>> On Mon, Jan 09, 2017 at 01:24:02AM -0500, Jintack Lim wrote:
> >> >>>> With the nested virtualization support, the context of the guest
> >> >>>> includes EL2 register states. The host manages a set of virtual EL2
> >> >>>> registers.  In addition to that, the guest hypervisor, supposed to run in
> >> >>>> EL2, is now deprivileged and runs in EL1. So, the host also manages a set
> >> >>>> of shadow system registers to be able to run the guest hypervisor in
> >> >>>> EL1.
> >> >>>>
> >> >>>> Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
> >> >>>> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> >> >>>> ---
> >> >>>>  arch/arm64/include/asm/kvm_host.h | 54 +++++++++++++++++++++++++++++++++++++++
> >> >>>>  1 file changed, 54 insertions(+)
> >> >>>>
> >> >>>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> >> >>>> index c0c8b02..ed78d73 100644
> >> >>>> --- a/arch/arm64/include/asm/kvm_host.h
> >> >>>> +++ b/arch/arm64/include/asm/kvm_host.h
> >> >>>> @@ -146,6 +146,42 @@ enum vcpu_sysreg {
> >> >>>>       NR_SYS_REGS     /* Nothing after this line! */
> >> >>>>  };
> >> >>>>
> >> >>>> +enum el2_regs {
> >> >>>> +     ELR_EL2,
> >> >>>> +     SPSR_EL2,
> >> >>>> +     SP_EL2,
> >> >>>> +     AMAIR_EL2,
> >> >>>> +     MAIR_EL2,
> >> >>>> +     TCR_EL2,
> >> >>>> +     TTBR0_EL2,
> >> >>>> +     VTCR_EL2,
> >> >>>> +     VTTBR_EL2,
> >> >>>> +     VMPIDR_EL2,
> >> >>>> +     VPIDR_EL2,      /* 10 */
> >> >>>> +     MDCR_EL2,
> >> >>>> +     CNTHCTL_EL2,
> >> >>>> +     CNTHP_CTL_EL2,
> >> >>>> +     CNTHP_CVAL_EL2,
> >> >>>> +     CNTHP_TVAL_EL2,
> >> >>>> +     CNTVOFF_EL2,
> >> >>>> +     ACTLR_EL2,
> >> >>>> +     AFSR0_EL2,
> >> >>>> +     AFSR1_EL2,
> >> >>>> +     CPTR_EL2,       /* 20 */
> >> >>>> +     ESR_EL2,
> >> >>>> +     FAR_EL2,
> >> >>>> +     HACR_EL2,
> >> >>>> +     HCR_EL2,
> >> >>>> +     HPFAR_EL2,
> >> >>>> +     HSTR_EL2,
> >> >>>> +     RMR_EL2,
> >> >>>> +     RVBAR_EL2,
> >> >>>> +     SCTLR_EL2,
> >> >>>> +     TPIDR_EL2,      /* 30 */
> >> >>>> +     VBAR_EL2,
> >> >>>> +     NR_EL2_REGS     /* Nothing after this line! */
> >> >>>> +};
> >> >>>
> >> >>> Why do we have a separate enum and array for the EL2 regs and not simply
> >> >>> expand vcpu_sysreg?
> >> >>
> >> >> We can expand vcpu_sysreg for the EL2 system registers. For SP_EL2,
> >> >> SPSR_EL2, and ELR_EL2, where is a good place to locate them?
> >> >> SP_EL1, SPSR_EL1, and ELR_EL1 registers are saved in the kvm_regs
> >> >> structure instead of sysregs[], so I wonder if it's better to put them in
> >> >> kvm_regs, too.
> >> >>
> >> >> BTW, what's the reason that those EL1 registers are in kvm_regs
> >> >> instead of sysregs[] in the first place?
> >> >>
> >> >
> >> > This has mostly to do with the way we export things to userspace, and
> >> > for historical reasons.
> >> >
> >> > So we should either expand kvm_regs with the non-sysregs EL2 registers
> >> > and expand sys_regs with the EL2 sysregs, or we should put everything
> >> > EL2 into an EL2 array.  I feel like the first solution will fit more
> >> > nicely into the current design, but I don't have a very strong
> >> > preference.
> >> >
> >> > You should look at the KVM_{GET,SET}_ONE_REG API definition and think
> >> > about how your choice will fit with this.
> >> >
> >> > Marc, any preference?
> >>
> >> My worry is that by changing kvm_regs, we're touching a userspace
> >> visible structure. I'm not sure we can avoid it, but I'd like to avoid
> >> putting too much there (SPSR_EL2 and ELR_EL2 should be enough). I just
> >> had a panic moment when realizing that this structure is not versioned,
> >> but the whole ONE_REG API seems to save us from a complete disaster.
> >>
> >> Overall, having kvm_regs as a UAPI visible thing in retrospect strikes
> >> me as a dangerous design, as we cannot easily expand it. Maybe we should
> >> consider having a kvm_regs_v2 that embeds kvm_regs, and not directly
> >> expose it to userspace (but instead expose the indexes in that
> >> structure)? Userspace that knows how to deal with EL2 will use the new
> >> indexes, while existing SW will carry on using the EL1/EL0 version.
> >
> > We definitely cannot expand kvm_regs; that would lead to all sorts of
> > potential errors, as you correctly point out.
> 
> Ok. I didn't know that kvm_regs is exposed to userspace.
> 
> >
> > So we probably need something like that, or simply let it stay the way
> > it is for now, and add el2_core_regs as a separate thing to the vcpu and
> > only expose the indexes and encoding for those registers?
> >
> 
> Sounds good to me.
> 
> So, expand sys_regs with the EL2 sysregs and put the special-purpose
> registers (the term used in the ARM ARM), such as SPSR_EL2,
> ELR_EL2 and SP_EL2, into el2_core_regs or el2_special_regs, right?
> 

el2_special_regs, yes.

Thanks,
-Christoffer
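
For concreteness, the agreed split might end up looking roughly like
the sketch below; the el2_special_regs name follows the discussion
above, while the field layout and the NR_ constant are illustrative:

enum el2_special_regs {
	ELR_EL2,
	SPSR_EL2,
	SP_EL2,
	NR_EL2_SPECIAL_REGS	/* Nothing after this line! */
};

struct kvm_cpu_context {
	struct kvm_regs	gp_regs;	/* unchanged UAPI part */
	u64	sys_regs[NR_SYS_REGS];	/* now also holds *_EL2 sysregs */
	u64	el2_special_regs[NR_EL2_SPECIAL_REGS];
};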

^ permalink raw reply	[flat|nested] 111+ messages in thread

end of thread, other threads:[~2017-07-03 15:30 UTC | newest]

Thread overview: 111+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-01-09  6:23 [RFC 00/55] Nested Virtualization on KVM/ARM Jintack Lim
2017-01-09  6:23 ` [RFC 01/55] arm64: Add missing TCR hw defines Jintack Lim
2017-01-09  6:23 ` [RFC 02/55] KVM: arm64: Add nesting config option Jintack Lim
2017-01-09  6:23 ` [RFC 03/55] KVM: arm64: Add KVM nesting feature Jintack Lim
2017-01-09  6:24 ` [RFC 04/55] KVM: arm64: Allow userspace to set PSR_MODE_EL2x Jintack Lim
2017-01-09  6:24 ` [RFC 05/55] KVM: arm64: Add vcpu_mode_el2 primitive to support nesting Jintack Lim
2017-01-09  6:24 ` [RFC 06/55] KVM: arm64: Add EL2 execution context for nesting Jintack Lim
2017-02-22 11:10   ` Christoffer Dall
2017-06-26 14:33     ` Jintack Lim
2017-07-03  9:03       ` Christoffer Dall
2017-07-03  9:32         ` Marc Zyngier
2017-07-03  9:54           ` Christoffer Dall
2017-07-03 14:44             ` Jintack Lim
2017-07-03 15:30               ` Christoffer Dall
2017-01-09  6:24 ` [RFC 07/55] KVM: arm/arm64: Add virtual EL2 state emulation framework Jintack Lim
2017-02-22 11:12   ` Christoffer Dall
2017-06-01 20:05   ` Bandan Das
2017-06-02 11:51     ` Christoffer Dall
2017-06-02 17:36       ` Bandan Das
2017-06-02 19:06         ` Christoffer Dall
2017-06-02 19:25           ` Bandan Das
2017-01-09  6:24 ` [RFC 08/55] KVM: arm64: Set virtual EL2 context depending on the guest exception level Jintack Lim
2017-02-22 11:14   ` Christoffer Dall
2017-06-01 20:22   ` Bandan Das
2017-06-02  8:48     ` Marc Zyngier
2017-01-09  6:24 ` [RFC 09/55] KVM: arm64: Set shadow EL1 registers for virtual EL2 execution Jintack Lim
2017-02-22 11:19   ` Christoffer Dall
2017-01-09  6:24 ` [RFC 10/55] KVM: arm64: Synchronize EL1 system registers on virtual EL2 entry and exit Jintack Lim
2017-06-06 20:16   ` Bandan Das
2017-06-07  4:26     ` Jintack Lim
2017-01-09  6:24 ` [RFC 11/55] KVM: arm64: Emulate taking an exception to the guest hypervisor Jintack Lim
2017-02-22 11:28   ` Christoffer Dall
2017-06-06 20:21   ` Bandan Das
2017-06-06 20:38     ` Jintack Lim
2017-06-06 22:07       ` Bandan Das
2017-06-06 23:16         ` Jintack Lim
2017-06-07 17:21           ` Bandan Das
2017-01-09  6:24 ` [RFC 12/55] KVM: arm64: Handle EL2 register access traps Jintack Lim
2017-02-22 11:30   ` Christoffer Dall
2017-02-22 11:31   ` Christoffer Dall
2017-01-09  6:24 ` [RFC 13/55] KVM: arm64: Handle eret instruction traps Jintack Lim
2017-01-09  6:24 ` [RFC 14/55] KVM: arm64: Take account of system " Jintack Lim
2017-02-22 11:34   ` Christoffer Dall
2017-01-09  6:24 ` [RFC 15/55] KVM: arm64: Trap EL1 VM register accesses in virtual EL2 Jintack Lim
2017-01-09  6:24 ` [RFC 16/55] KVM: arm64: Forward VM reg traps to the guest hypervisor Jintack Lim
2017-02-22 11:39   ` Christoffer Dall
2017-01-09  6:24 ` [RFC 17/55] KVM: arm64: Trap SPSR_EL1, ELR_EL1 and VBAR_EL1 in virtual EL2 Jintack Lim
2017-02-22 11:40   ` Christoffer Dall
2017-01-09  6:24 ` [RFC 18/55] KVM: arm64: Forward traps due to HCR_EL2.NV1 bit to the guest hypervisor Jintack Lim
2017-02-22 11:41   ` Christoffer Dall
2017-01-09  6:24 ` [RFC 19/55] KVM: arm64: Trap CPACR_EL1 access in virtual EL2 Jintack Lim
2017-01-09  6:24 ` [RFC 20/55] KVM: arm64: Forward CPACR_EL1 traps to the guest hypervisor Jintack Lim
2017-01-09  6:24 ` [RFC 21/55] KVM: arm64: Forward HVC instruction " Jintack Lim
2017-02-22 11:47   ` Christoffer Dall
2017-06-26 15:21     ` Jintack Lim
2017-07-03  9:08       ` Christoffer Dall
2017-07-03  9:31         ` Andrew Jones
2017-07-03  9:51           ` Christoffer Dall
2017-07-03 12:03             ` Will Deacon
2017-07-03 12:35               ` Marc Zyngier
2017-07-03 13:29         ` Jintack Lim
2017-01-09  6:24 ` [RFC 22/55] KVM: arm64: Handle PSCI call from the guest Jintack Lim
2017-01-09  6:24 ` [RFC 23/55] KVM: arm64: Forward WFX to the guest hypervisor Jintack Lim
2017-01-09  6:24 ` [RFC 24/55] KVM: arm64: Forward FP exceptions " Jintack Lim
2017-01-09  6:24 ` [RFC 25/55] KVM: arm/arm64: Let vcpu thread modify its own active state Jintack Lim
2017-02-22 12:27   ` Christoffer Dall
2017-01-09  6:24 ` [RFC 26/55] KVM: arm/arm64: Add VGIC data structures for the nesting Jintack Lim
2017-01-09  6:24 ` [RFC 27/55] KVM: arm/arm64: Emulate GICH interface on GICv2 Jintack Lim
2017-02-22 13:06   ` Christoffer Dall
2017-01-09  6:24 ` [RFC 28/55] KVM: arm/arm64: Prepare vgic state for the nested VM Jintack Lim
2017-02-22 13:12   ` Christoffer Dall
2017-01-09  6:24 ` [RFC 29/55] KVM: arm/arm64: Set up the prepared vgic state Jintack Lim
2017-01-09  6:24 ` [RFC 30/55] KVM: arm/arm64: Inject irqs to the guest hypervisor Jintack Lim
2017-02-22 13:16   ` Christoffer Dall
2017-01-09  6:24 ` [RFC 31/55] KVM: arm/arm64: Inject maintenance interrupts " Jintack Lim
2017-02-22 13:19   ` Christoffer Dall
2017-01-09  6:24 ` [RFC 32/55] KVM: arm/arm64: register GICH iodev for " Jintack Lim
2017-02-22 13:21   ` Christoffer Dall
2017-01-09  6:24 ` [RFC 33/55] KVM: arm/arm64: Remove unused params in mmu functions Jintack Lim
2017-01-09  6:24 ` [RFC 34/55] KVM: arm/arm64: Abstract stage-2 MMU state into a separate structure Jintack Lim
2017-01-09  6:24 ` [RFC 35/55] KVM: arm/arm64: Support mmu for the virtual EL2 execution Jintack Lim
2017-02-22 13:38   ` Christoffer Dall
2017-01-09  6:24 ` [RFC 36/55] KVM: arm64: Invalidate virtual EL2 TLB entries when needed Jintack Lim
2017-01-09  6:24 ` [RFC 37/55] KVM: arm64: Setup vttbr_el2 on each VM entry Jintack Lim
2017-01-09  6:24 ` [RFC 38/55] KVM: arm/arm64: Make mmu functions non-static Jintack Lim
2017-01-09  6:24 ` [RFC 39/55] KVM: arm/arm64: Add mmu context for the nesting Jintack Lim
2017-02-22 13:34   ` Christoffer Dall
2017-01-09  6:24 ` [RFC 40/55] KVM: arm/arm64: Handle vttbr_el2 write operation from the guest hypervisor Jintack Lim
2017-02-22 17:59   ` Christoffer Dall
2017-01-09  6:24 ` [RFC 41/55] KVM: arm/arm64: Unmap/flush shadow stage 2 page tables Jintack Lim
2017-02-22 18:09   ` Christoffer Dall
2017-01-09  6:24 ` [RFC 42/55] KVM: arm64: Implement nested Stage-2 page table walk logic Jintack Lim
2017-01-09  6:24 ` [RFC 43/55] KVM: arm/arm64: Handle shadow stage 2 page faults Jintack Lim
2017-01-09  6:24 ` [RFC 44/55] KVM: arm/arm64: Move kvm_is_write_fault to header file Jintack Lim
2017-01-09  6:24 ` [RFC 45/55] KVM: arm64: KVM: Inject stage-2 page faults Jintack Lim
2017-01-09  6:24 ` [RFC 46/55] KVM: arm64: Add more info to the S2 translation result Jintack Lim
2017-01-09  6:24 ` [RFC 47/55] KVM: arm/arm64: Forward the guest hypervisor's stage 2 permission faults Jintack Lim
2017-02-22 18:15   ` Christoffer Dall
2017-01-09  6:24 ` [RFC 48/55] KVM: arm64: Emulate TLBI instruction Jintack Lim
2017-01-09  6:24 ` [RFC 49/55] KVM: arm64: Fixes to toggle_cache for nesting Jintack Lim
2017-01-09  6:24 ` [RFC 50/55] KVM: arm/arm64: Abstract kvm_phys_addr_ioremap() function Jintack Lim
2017-01-09  6:24 ` [RFC 51/55] KVM: arm64: Expose physical address of vcpu interface Jintack Lim
2017-01-09  6:24 ` [RFC 52/55] KVM: arm/arm64: Create a vcpu mapping for the nested VM Jintack Lim
2017-01-09  6:24 ` [RFC 53/55] KVM: arm64: Reflect shadow VMPIDR_EL2 value to MPIDR_EL1 Jintack Lim
2017-01-09  6:24 ` [RFC 54/55] KVM: arm/arm64: Adjust virtual offset considering nesting Jintack Lim
2017-02-22 19:28   ` Christoffer Dall
2017-01-09  6:24 ` [RFC 55/55] KVM: arm64: Enable nested virtualization Jintack Lim
2017-01-09 15:05 ` [RFC 00/55] Nested Virtualization on KVM/ARM David Hildenbrand
2017-01-10 16:18   ` Jintack Lim
2017-02-22 18:23 ` Christoffer Dall
2017-02-24 10:28   ` Jintack Lim

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).