kvmarm.lists.cs.columbia.edu archive mirror
* [RFC PATCH v4 00/39] KVM: arm64: Add Statistical Profiling Extension (SPE) support
@ 2021-08-25 16:17 Alexandru Elisei
  2021-08-25 16:17 ` [RFC PATCH v4 01/39] KVM: arm64: Make lock_all_vcpus() available to the rest of KVM Alexandru Elisei
                   ` (39 more replies)
  0 siblings, 40 replies; 47+ messages in thread
From: Alexandru Elisei @ 2021-08-25 16:17 UTC (permalink / raw)
  To: maz, james.morse, suzuki.poulose, linux-arm-kernel, kvmarm, will,
	linux-kernel

This is v4 of the SPE series posted at [1]. v2 can be found at [2], and the
original series at [3].

Statistical Profiling Extension (SPE) is an optional feature added in
ARMv8.2. It allows sampling at regular intervals of the operations executed
by the PE and storing a record of each operation in a memory buffer. A high
level overview of the extension is presented in an article on arm.com [4].

This is another complete rewrite of the series, and nothing is set in
stone. If you think of a better way to do things, please suggest it.


Features added
==============

The rewrite enabled me to add support for several features not
present in the previous iteration:

- Support for heterogeneous systems, where only some of the CPUs support SPE.
  This is accomplished via the KVM_ARM_VCPU_SUPPORTED_CPUS VCPU ioctl.

- Support for VM migration with the KVM_ARM_VCPU_SPE_CTRL(KVM_ARM_VCPU_SPE_STOP)
  VCPU ioctl.

- The requirement for userspace to mlock() the guest memory has been removed,
  and now userspace can make changes to memory contents after the memory is
  mapped at stage 2.

- Better debugging of guest memory pinning by printing a warning when we
  get an unexpected read or write fault. This has already proven very useful
  and helped me catch several bugs during development. Many thanks to James,
  who suggested it when reviewing v3.


Missing features
================

I've tried to keep the series as small as possible to make it easier to review,
while implementing the core functionality needed for the SPE emulation. As such,
I've chosen not to implement several features:

- Host profiling a guest which has the SPE feature bit set (see open
  questions).

- No errata workarounds have been implemented yet, and there are quite a few of
  them for Neoverse N1 and Neoverse V1.

- Disabling CONFIG_NUMA_BALANCING is a hack to get KVM SPE to work, and I am
  investigating other ways to get around automatic NUMA balancing, like
  requiring userspace to disable it via set_mempolicy(). I am also going to
  look at how VFIO gets around it. Suggestions welcome.

- There's plenty of room for optimization. Off the top of my head: using
  block mappings at stage 2, batching the pinning of pages (similar to what
  VFIO does), optimizing the way KVM keeps track of pinned pages (using a
  linked list triples the memory usage), context-switching the SPE registers
  on vcpu_load/vcpu_put on VHE if the host is not profiling, locking
  optimizations, etc.

- ...and others. I'm sure I'm missing at least a few things which are
  important for someone.


Known issues
============

This is an RFC, so keep in mind that there will almost certainly be scary
bugs. For example, below is a list of known issues which don't affect the
correctness of the emulation, and which I'm planning to fix in a future
iteration:

- With CONFIG_PROVE_LOCKING=y, lockdep complains about lock contention when
  the VCPU executes the dcache clean pending ops.

- With CONFIG_PROVE_LOCKING=y, KVM will hit a BUG at
  kvm_lock_all_vcpus()->mutex_trylock(&vcpu->mutex) with more than 48
  VCPUs.

This BUG statement can also be triggered with mainline. To reproduce it,
compile kvmtool from this branch [5] and follow the instructions in the
kvmtool commit message.

One workaround could be to stop trying to lock all VCPUs when locking a
memslot and instead document that no VCPUs must run before the ioctl
completes, otherwise bad things might happen to the VM.


Open questions
==============

1. Implementing support for host profiling a guest with the SPE feature
means setting the profiling buffer owning regime to EL2. While that is in
effect, PMBIDR_EL1.P will equal 1. This has two consequences: if the guest
probes SPE during this time, the driver will fail; and the guest will be
able to determine when it is profiled. I see two options here:

- Do not allow the host's userspace to profile a guest where at least one
  VCPU has SPE.

- Document the effects somewhere and let userspace do whatever it likes.

No preference for either.

2. Userspace is not allowed to profile a CPU event (one not bound to a task)
if !perf_allow_cpu(). It is my understanding that this is for security
reasons, as we don't want a task to profile another task. Because a VM
will only be able to profile itself, I don't think it's necessary to
restrict the VM in any way based on perf_allow_cpu(), like we do with
perfmon_capable() and physical timer timestamps.

3. How to handle guest-triggered SPE SErrors. Right now the Linux SPE driver
doesn't do anything special for SErrors reported by SPE, and I've done the
same when a guest manages to trigger one. Should I do something more?
Disabling SPE emulation for the entire VM is one option.


Summary of the patches
======================

Below is a short summary of the patches. For a more detailed explanation of
how SPE works, please see version 3 of the series [1].

Patches 1-11 implement the memslot locking functionality.

Patch 12 implements the KVM_ARM_VCPU_SUPPORTED_CPUS ioctl.

Patches 13-14 are preparatory for KVM SPE.

Patches 15-19 make it possible for KVM to deny running an SPE-enabled VCPU
on CPUs which don't have the SPE hardware.

Patches 20-22 add the userspace interface to configure SPE.

Patches 23-32 implement context switching of the SPE registers.

Patch 33 allows a guest to use physical timestamps only if the VMM is
perfmon_capable().

Patch 34 adds the emulation for the SPE buffer management interrupts.

Patches 35-37 add the userspace API to temporarily stop profiling so
memory can be unlocked and the VM migrated.

Patch 38 is a hack to get KVM SPE emulation going on NUMA systems (like the
Altra server which I used for testing).

Patch 39 finally enables SPE.


Testing
=======

Testing was done on an Altra server with two sockets, both populated.

The Linux patches are based on v5.14-rc5 and can also be found on gitlab
[6].

For testing, I've used an SPE enabled version of kvmtool, which can be
found at [7]; the kvmtool patches will also be sent upstream. To test the
SPE_STOP API, I used a special version of kvmtool which starts the guest in
one of the stopped states; that can be found at [8] (compile from a
different commit if a different state and/or transition is desired).

Finally, in the VM I used a defconfig Linux guest compiled from v5.15-rc5 and
some kvm-unit-tests patches which I wrote to test SPE [9].

All tests were run three times: once with VHE enabled, once in NVHE mode
(kvm-arm.mode=nvhe) and once in protected mode (kvm-arm.mode=protected).

The first test that I ran was the kvm-unit-tests test. This is also the
test that I used to check that KVM_ARM_VCPU_SPE_STOP_{TRAP,EXIT,RESUME}
works correctly with kvmtool.

Then I profiled iperf3 in the guest (16 VCPUs to limit the size of perf.data,
32GiB memory), while concurrently profiling in the host. This is the command
that I used:

# perf record -ae arm_spe/ts_enable=1,pa_enable=1,pct_enable=1/ -- iperf3 -c 127.0.0.1 -t 30

Everything looked right to me and I didn't see any kernel warnings or bugs.

[1] https://lore.kernel.org/linux-arm-kernel/20201027172705.15181-1-alexandru.elisei@arm.com/
[2] https://www.spinics.net/lists/arm-kernel/msg776228.html
[3] https://lists.cs.columbia.edu/pipermail/kvmarm/2019-February/034887.html
[4] https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/statistical-profiling-extension-for-armv8-a
[5] https://gitlab.arm.com/linux-arm/kvmtool-ae/-/tree/vgic-lock-all-vcpus-lockdep-bug-v1
[6] https://gitlab.arm.com/linux-arm/linux-ae/-/tree/kvm-spe-v4
[7] https://gitlab.arm.com/linux-arm/kvmtool-ae/-/tree/kvm-spe-v4
[8] https://gitlab.arm.com/linux-arm/kvmtool-ae/-/tree/kvm-spe-v4-spe-stop-tests
[9] https://gitlab.arm.com/linux-arm/kvm-unit-tests-ae/-/tree/kvm-spe-v4

Alexandru Elisei (35):
  KVM: arm64: Make lock_all_vcpus() available to the rest of KVM
  KVM: arm64: Add lock/unlock memslot user API
  KVM: arm64: Implement the memslot lock/unlock functionality
  KVM: arm64: Defer CMOs for locked memslots until a VCPU is run
  KVM: arm64: Perform CMOs on locked memslots when userspace resets
    VCPUs
  KVM: arm64: Delay tag scrubbing for locked memslots until a VCPU runs
  KVM: arm64: Unlock memslots after stage 2 tables are freed
  KVM: arm64: Deny changes to locked memslots
  KVM: Add kvm_warn{,_ratelimited} macros
  KVM: arm64: Print a warning for unexpected faults on locked memslots
  KVM: arm64: Allow userspace to lock and unlock memslots
  KVM: arm64: Add the KVM_ARM_VCPU_SUPPORTED_CPUS VCPU ioctl
  KVM: arm64: Add CONFIG_KVM_ARM_SPE Kconfig option
  KVM: arm64: Add SPE capability and VCPU feature
  drivers/perf: Expose the cpumask of CPUs that support SPE
  KVM: arm64: Make SPE available when at least one CPU supports it
  KVM: arm64: Set the VCPU SPE feature bit when SPE is available
  KVM: arm64: Expose SPE version to guests
  KVM: arm64: Do not emulate SPE on CPUs which don't have SPE
  KVM: arm64: debug: Configure MDCR_EL2 when a VCPU has SPE
  KVM: arm64: Move the write to MDCR_EL2 out of
    __activate_traps_common()
  KVM: arm64: VHE: Change MDCR_EL2 at world switch if VCPU has SPE
  KVM: arm64: Add SPE system registers to VCPU context
  KVM: arm64: nVHE: Save PMSCR_EL1 to the host context
  KVM: arm64: Rename DEBUG_STATE_SAVE_SPE -> DEBUG_SAVE_SPE_BUFFER flags
  KVM: arm64: nVHE: Context switch SPE state if VCPU has SPE
  KVM: arm64: VHE: Context switch SPE state if VCPU has SPE
  KVM: arm64: Save/restore PMSNEVFR_EL1 on VCPU put/load
  KVM: arm64: Allow guest to use physical timestamps if
    perfmon_capable()
  KVM: arm64: Emulate SPE buffer management interrupt
  KVM: arm64: Add an userspace API to stop a VCPU profiling
  KVM: arm64: Implement userspace API to stop a VCPU profiling
  KVM: arm64: Add PMSIDR_EL1 to the SPE register context
  KVM: arm64: Make CONFIG_KVM_ARM_SPE depend on !CONFIG_NUMA_BALANCING
  KVM: arm64: Allow userspace to enable SPE for guests

Sudeep Holla (4):
  KVM: arm64: Add a new VCPU device control group for SPE
  KVM: arm64: Add SPE VCPU device attribute to set the interrupt number
  KVM: arm64: Add SPE VCPU device attribute to initialize SPE
  KVM: arm64: VHE: Clear MDCR_EL2.E2PB in vcpu_put()

 Documentation/virt/kvm/api.rst          |  87 ++++-
 Documentation/virt/kvm/devices/vcpu.rst |  76 ++++
 arch/arm64/include/asm/kvm_arm.h        |   1 +
 arch/arm64/include/asm/kvm_host.h       |  75 +++-
 arch/arm64/include/asm/kvm_hyp.h        |  51 ++-
 arch/arm64/include/asm/kvm_mmu.h        |   8 +
 arch/arm64/include/asm/kvm_spe.h        |  96 ++++++
 arch/arm64/include/asm/sysreg.h         |   3 +
 arch/arm64/include/uapi/asm/kvm.h       |  11 +
 arch/arm64/kvm/Kconfig                  |   8 +
 arch/arm64/kvm/Makefile                 |   1 +
 arch/arm64/kvm/arm.c                    | 140 +++++++-
 arch/arm64/kvm/debug.c                  |  55 ++-
 arch/arm64/kvm/guest.c                  |  10 +
 arch/arm64/kvm/hyp/include/hyp/spe-sr.h |  32 ++
 arch/arm64/kvm/hyp/include/hyp/switch.h |   1 -
 arch/arm64/kvm/hyp/nvhe/Makefile        |   1 +
 arch/arm64/kvm/hyp/nvhe/debug-sr.c      |  24 +-
 arch/arm64/kvm/hyp/nvhe/spe-sr.c        | 133 +++++++
 arch/arm64/kvm/hyp/nvhe/switch.c        |  35 +-
 arch/arm64/kvm/hyp/vhe/Makefile         |   1 +
 arch/arm64/kvm/hyp/vhe/spe-sr.c         | 193 +++++++++++
 arch/arm64/kvm/hyp/vhe/switch.c         |  43 ++-
 arch/arm64/kvm/hyp/vhe/sysreg-sr.c      |   2 +-
 arch/arm64/kvm/mmu.c                    | 441 +++++++++++++++++++++++-
 arch/arm64/kvm/reset.c                  |  23 ++
 arch/arm64/kvm/spe.c                    | 381 ++++++++++++++++++++
 arch/arm64/kvm/sys_regs.c               |  77 ++++-
 arch/arm64/kvm/vgic/vgic-init.c         |   4 +-
 arch/arm64/kvm/vgic/vgic-its.c          |   8 +-
 arch/arm64/kvm/vgic/vgic-kvm-device.c   |  50 +--
 arch/arm64/kvm/vgic/vgic.h              |   3 -
 drivers/perf/arm_spe_pmu.c              |  30 +-
 include/linux/kvm_host.h                |   4 +
 include/linux/perf/arm_pmu.h            |   7 +
 include/uapi/linux/kvm.h                |  13 +
 36 files changed, 1983 insertions(+), 145 deletions(-)
 create mode 100644 arch/arm64/include/asm/kvm_spe.h
 create mode 100644 arch/arm64/kvm/hyp/include/hyp/spe-sr.h
 create mode 100644 arch/arm64/kvm/hyp/nvhe/spe-sr.c
 create mode 100644 arch/arm64/kvm/hyp/vhe/spe-sr.c
 create mode 100644 arch/arm64/kvm/spe.c

-- 
2.33.0

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [RFC PATCH v4 01/39] KVM: arm64: Make lock_all_vcpus() available to the rest of KVM
  2021-08-25 16:17 [RFC PATCH v4 00/39] KVM: arm64: Add Statistical Profiling Extension (SPE) support Alexandru Elisei
@ 2021-08-25 16:17 ` Alexandru Elisei
  2021-09-22 10:39   ` Suzuki K Poulose
  2021-08-25 16:17 ` [RFC PATCH v4 02/39] KVM: arm64: Add lock/unlock memslot user API Alexandru Elisei
                   ` (38 subsequent siblings)
  39 siblings, 1 reply; 47+ messages in thread
From: Alexandru Elisei @ 2021-08-25 16:17 UTC (permalink / raw)
  To: maz, james.morse, suzuki.poulose, linux-arm-kernel, kvmarm, will,
	linux-kernel

The VGIC code uses the lock_all_vcpus() function to make sure no VCPUs are
run while it fiddles with the global VGIC state. Move the declarations of
lock_all_vcpus() and the corresponding unlock function into asm/kvm_host.h,
where they can be reused by other parts of KVM/arm64, and rename the functions
to kvm_{lock,unlock}_all_vcpus() to make them more generic.

Because the scope of the code potentially using the functions has
increased, add a lockdep check that the kvm->lock is held by the caller.
Holding the lock is necessary because otherwise userspace would be able to
create new VCPUs and run them while the existing VCPUs are locked.

No functional change intended.
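
For reference, a caller outside the VGIC is expected to follow the same
pattern as the existing VGIC users (see vgic_its_ctrl() below). A minimal
sketch, where do_protected_update() is only a placeholder for whatever the
caller does with the global state:

static int do_with_all_vcpus_locked(struct kvm *kvm)
{
	int ret;

	/* Take kvm->lock first; kvm_lock_all_vcpus() asserts that it is held. */
	mutex_lock(&kvm->lock);

	if (!kvm_lock_all_vcpus(kvm)) {
		/* At least one VCPU mutex is held, so a VCPU may be running. */
		mutex_unlock(&kvm->lock);
		return -EBUSY;
	}

	/* No VCPU can enter the guest until the VCPU mutexes are dropped. */
	ret = do_protected_update(kvm);

	kvm_unlock_all_vcpus(kvm);
	mutex_unlock(&kvm->lock);

	return ret;
}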

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 arch/arm64/include/asm/kvm_host.h     |  3 ++
 arch/arm64/kvm/arm.c                  | 41 ++++++++++++++++++++++
 arch/arm64/kvm/vgic/vgic-init.c       |  4 +--
 arch/arm64/kvm/vgic/vgic-its.c        |  8 ++---
 arch/arm64/kvm/vgic/vgic-kvm-device.c | 50 ++++-----------------------
 arch/arm64/kvm/vgic/vgic.h            |  3 --
 6 files changed, 56 insertions(+), 53 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 41911585ae0c..797083203603 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -601,6 +601,9 @@ int __kvm_arm_vcpu_set_events(struct kvm_vcpu *vcpu,
 void kvm_arm_halt_guest(struct kvm *kvm);
 void kvm_arm_resume_guest(struct kvm *kvm);
 
+bool kvm_lock_all_vcpus(struct kvm *kvm);
+void kvm_unlock_all_vcpus(struct kvm *kvm);
+
 #ifndef __KVM_NVHE_HYPERVISOR__
 #define kvm_call_hyp_nvhe(f, ...)						\
 	({								\
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index e9a2b8f27792..ddace63528f1 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -647,6 +647,47 @@ void kvm_arm_resume_guest(struct kvm *kvm)
 	}
 }
 
+/* unlocks vcpus from @vcpu_lock_idx and smaller */
+static void unlock_vcpus(struct kvm *kvm, int vcpu_lock_idx)
+{
+	struct kvm_vcpu *tmp_vcpu;
+
+	for (; vcpu_lock_idx >= 0; vcpu_lock_idx--) {
+		tmp_vcpu = kvm_get_vcpu(kvm, vcpu_lock_idx);
+		mutex_unlock(&tmp_vcpu->mutex);
+	}
+}
+
+void kvm_unlock_all_vcpus(struct kvm *kvm)
+{
+	lockdep_assert_held(&kvm->lock);
+	unlock_vcpus(kvm, atomic_read(&kvm->online_vcpus) - 1);
+}
+
+/* Returns true if all vcpus were locked, false otherwise */
+bool kvm_lock_all_vcpus(struct kvm *kvm)
+{
+	struct kvm_vcpu *tmp_vcpu;
+	int c;
+
+	lockdep_assert_held(&kvm->lock);
+
+	/*
+	 * Any time a vcpu is run, vcpu_load is called which tries to grab the
+	 * vcpu->mutex.  By grabbing the vcpu->mutex of all VCPUs we ensure that
+	 * no other VCPUs are run and it is safe to fiddle with KVM global
+	 * state.
+	 */
+	kvm_for_each_vcpu(c, tmp_vcpu, kvm) {
+		if (!mutex_trylock(&tmp_vcpu->mutex)) {
+			unlock_vcpus(kvm, c - 1);
+			return false;
+		}
+	}
+
+	return true;
+}
+
 static void vcpu_req_sleep(struct kvm_vcpu *vcpu)
 {
 	struct rcuwait *wait = kvm_arch_vcpu_get_wait(vcpu);
diff --git a/arch/arm64/kvm/vgic/vgic-init.c b/arch/arm64/kvm/vgic/vgic-init.c
index 340c51d87677..6a85aa064a6c 100644
--- a/arch/arm64/kvm/vgic/vgic-init.c
+++ b/arch/arm64/kvm/vgic/vgic-init.c
@@ -87,7 +87,7 @@ int kvm_vgic_create(struct kvm *kvm, u32 type)
 		return -ENODEV;
 
 	ret = -EBUSY;
-	if (!lock_all_vcpus(kvm))
+	if (!kvm_lock_all_vcpus(kvm))
 		return ret;
 
 	kvm_for_each_vcpu(i, vcpu, kvm) {
@@ -117,7 +117,7 @@ int kvm_vgic_create(struct kvm *kvm, u32 type)
 		INIT_LIST_HEAD(&kvm->arch.vgic.rd_regions);
 
 out_unlock:
-	unlock_all_vcpus(kvm);
+	kvm_unlock_all_vcpus(kvm);
 	return ret;
 }
 
diff --git a/arch/arm64/kvm/vgic/vgic-its.c b/arch/arm64/kvm/vgic/vgic-its.c
index 61728c543eb9..3a336a678cb8 100644
--- a/arch/arm64/kvm/vgic/vgic-its.c
+++ b/arch/arm64/kvm/vgic/vgic-its.c
@@ -2005,7 +2005,7 @@ static int vgic_its_attr_regs_access(struct kvm_device *dev,
 		goto out;
 	}
 
-	if (!lock_all_vcpus(dev->kvm)) {
+	if (!kvm_lock_all_vcpus(dev->kvm)) {
 		ret = -EBUSY;
 		goto out;
 	}
@@ -2023,7 +2023,7 @@ static int vgic_its_attr_regs_access(struct kvm_device *dev,
 	} else {
 		*reg = region->its_read(dev->kvm, its, addr, len);
 	}
-	unlock_all_vcpus(dev->kvm);
+	kvm_unlock_all_vcpus(dev->kvm);
 out:
 	mutex_unlock(&dev->kvm->lock);
 	return ret;
@@ -2668,7 +2668,7 @@ static int vgic_its_ctrl(struct kvm *kvm, struct vgic_its *its, u64 attr)
 	mutex_lock(&kvm->lock);
 	mutex_lock(&its->its_lock);
 
-	if (!lock_all_vcpus(kvm)) {
+	if (!kvm_lock_all_vcpus(kvm)) {
 		mutex_unlock(&its->its_lock);
 		mutex_unlock(&kvm->lock);
 		return -EBUSY;
@@ -2686,7 +2686,7 @@ static int vgic_its_ctrl(struct kvm *kvm, struct vgic_its *its, u64 attr)
 		break;
 	}
 
-	unlock_all_vcpus(kvm);
+	kvm_unlock_all_vcpus(kvm);
 	mutex_unlock(&its->its_lock);
 	mutex_unlock(&kvm->lock);
 	return ret;
diff --git a/arch/arm64/kvm/vgic/vgic-kvm-device.c b/arch/arm64/kvm/vgic/vgic-kvm-device.c
index 7740995de982..c2f95d124cbc 100644
--- a/arch/arm64/kvm/vgic/vgic-kvm-device.c
+++ b/arch/arm64/kvm/vgic/vgic-kvm-device.c
@@ -298,44 +298,6 @@ int vgic_v2_parse_attr(struct kvm_device *dev, struct kvm_device_attr *attr,
 	return 0;
 }
 
-/* unlocks vcpus from @vcpu_lock_idx and smaller */
-static void unlock_vcpus(struct kvm *kvm, int vcpu_lock_idx)
-{
-	struct kvm_vcpu *tmp_vcpu;
-
-	for (; vcpu_lock_idx >= 0; vcpu_lock_idx--) {
-		tmp_vcpu = kvm_get_vcpu(kvm, vcpu_lock_idx);
-		mutex_unlock(&tmp_vcpu->mutex);
-	}
-}
-
-void unlock_all_vcpus(struct kvm *kvm)
-{
-	unlock_vcpus(kvm, atomic_read(&kvm->online_vcpus) - 1);
-}
-
-/* Returns true if all vcpus were locked, false otherwise */
-bool lock_all_vcpus(struct kvm *kvm)
-{
-	struct kvm_vcpu *tmp_vcpu;
-	int c;
-
-	/*
-	 * Any time a vcpu is run, vcpu_load is called which tries to grab the
-	 * vcpu->mutex.  By grabbing the vcpu->mutex of all VCPUs we ensure
-	 * that no other VCPUs are run and fiddle with the vgic state while we
-	 * access it.
-	 */
-	kvm_for_each_vcpu(c, tmp_vcpu, kvm) {
-		if (!mutex_trylock(&tmp_vcpu->mutex)) {
-			unlock_vcpus(kvm, c - 1);
-			return false;
-		}
-	}
-
-	return true;
-}
-
 /**
  * vgic_v2_attr_regs_access - allows user space to access VGIC v2 state
  *
@@ -366,7 +328,7 @@ static int vgic_v2_attr_regs_access(struct kvm_device *dev,
 	if (ret)
 		goto out;
 
-	if (!lock_all_vcpus(dev->kvm)) {
+	if (!kvm_lock_all_vcpus(dev->kvm)) {
 		ret = -EBUSY;
 		goto out;
 	}
@@ -383,7 +345,7 @@ static int vgic_v2_attr_regs_access(struct kvm_device *dev,
 		break;
 	}
 
-	unlock_all_vcpus(dev->kvm);
+	kvm_unlock_all_vcpus(dev->kvm);
 out:
 	mutex_unlock(&dev->kvm->lock);
 	return ret;
@@ -532,7 +494,7 @@ static int vgic_v3_attr_regs_access(struct kvm_device *dev,
 		goto out;
 	}
 
-	if (!lock_all_vcpus(dev->kvm)) {
+	if (!kvm_lock_all_vcpus(dev->kvm)) {
 		ret = -EBUSY;
 		goto out;
 	}
@@ -582,7 +544,7 @@ static int vgic_v3_attr_regs_access(struct kvm_device *dev,
 		break;
 	}
 
-	unlock_all_vcpus(dev->kvm);
+	kvm_unlock_all_vcpus(dev->kvm);
 out:
 	mutex_unlock(&dev->kvm->lock);
 	return ret;
@@ -637,12 +599,12 @@ static int vgic_v3_set_attr(struct kvm_device *dev,
 		case KVM_DEV_ARM_VGIC_SAVE_PENDING_TABLES:
 			mutex_lock(&dev->kvm->lock);
 
-			if (!lock_all_vcpus(dev->kvm)) {
+			if (!kvm_lock_all_vcpus(dev->kvm)) {
 				mutex_unlock(&dev->kvm->lock);
 				return -EBUSY;
 			}
 			ret = vgic_v3_save_pending_tables(dev->kvm);
-			unlock_all_vcpus(dev->kvm);
+			kvm_unlock_all_vcpus(dev->kvm);
 			mutex_unlock(&dev->kvm->lock);
 			return ret;
 		}
diff --git a/arch/arm64/kvm/vgic/vgic.h b/arch/arm64/kvm/vgic/vgic.h
index dc1f3d1657ee..0511618c89f6 100644
--- a/arch/arm64/kvm/vgic/vgic.h
+++ b/arch/arm64/kvm/vgic/vgic.h
@@ -252,9 +252,6 @@ int vgic_init(struct kvm *kvm);
 void vgic_debug_init(struct kvm *kvm);
 void vgic_debug_destroy(struct kvm *kvm);
 
-bool lock_all_vcpus(struct kvm *kvm);
-void unlock_all_vcpus(struct kvm *kvm);
-
 static inline int vgic_v3_max_apr_idx(struct kvm_vcpu *vcpu)
 {
 	struct vgic_cpu *cpu_if = &vcpu->arch.vgic_cpu;
-- 
2.33.0

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [RFC PATCH v4 02/39] KVM: arm64: Add lock/unlock memslot user API
  2021-08-25 16:17 [RFC PATCH v4 00/39] KVM: arm64: Add Statistical Profiling Extension (SPE) support Alexandru Elisei
  2021-08-25 16:17 ` [RFC PATCH v4 01/39] KVM: arm64: Make lock_all_vcpus() available to the rest of KVM Alexandru Elisei
@ 2021-08-25 16:17 ` Alexandru Elisei
  2021-10-18  9:04   ` Suzuki K Poulose
  2021-08-25 16:17 ` [RFC PATCH v4 03/39] KVM: arm64: Implement the memslot lock/unlock functionality Alexandru Elisei
                   ` (37 subsequent siblings)
  39 siblings, 1 reply; 47+ messages in thread
From: Alexandru Elisei @ 2021-08-25 16:17 UTC (permalink / raw)
  To: maz, james.morse, suzuki.poulose, linux-arm-kernel, kvmarm, will,
	linux-kernel

Stage 2 faults triggered by the profiling buffer attempting to write to
memory are reported by the SPE hardware by asserting a buffer management
event interrupt. Interrupts are by their nature asynchronous, which means
that the guest might have changed its stage 1 translation tables since the
attempted write. SPE reports the guest virtual address that caused the data
abort, not the IPA, which means that KVM would have to walk the guest's
stage 1 tables to find the IPA. Using the AT instruction to walk the
guest's tables in hardware is not an option because it doesn't report the
IPA in the case of a stage 2 fault on a stage 1 table walk.

Avoid both issues by pre-mapping the guest memory at stage 2. This is being
done by adding a capability that allows the user to pin the memory backing
a memslot. The same capability can be used to unlock a memslot, which
unpins the pages associated with the memslot, but doesn't unmap the IPA
range from stage 2; in this case, the addresses will be unmapped from stage
2 via the MMU notifiers when the process' address space changes.

For now, the capability doesn't actually do anything other than checking
that the usage is correct; the memory operations will be added in future
patches.
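
As a rough illustration of how userspace is expected to use the new
capability (a sketch only; the slot number and the read+write permissions
below are arbitrary example values, and vm_fd is an already created VM file
descriptor):

#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static int lock_memslot_rw(int vm_fd)
{
	struct kvm_enable_cap cap;

	memset(&cap, 0, sizeof(cap));
	cap.cap = KVM_CAP_ARM_LOCK_USER_MEMORY_REGION;
	cap.flags = KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_LOCK;
	/* args[0] = memory slot number, args[1] = mapping permissions. */
	cap.args[0] = 0;
	cap.args[1] = KVM_ARM_LOCK_MEM_READ | KVM_ARM_LOCK_MEM_WRITE;

	/* KVM_ENABLE_CAP is a VM ioctl for this capability. */
	return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
}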

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 Documentation/virt/kvm/api.rst   | 56 +++++++++++++++++++++++
 arch/arm64/include/asm/kvm_mmu.h |  3 ++
 arch/arm64/kvm/arm.c             | 42 ++++++++++++++++--
 arch/arm64/kvm/mmu.c             | 76 ++++++++++++++++++++++++++++++++
 include/uapi/linux/kvm.h         |  8 ++++
 5 files changed, 181 insertions(+), 4 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index dae68e68ca23..741327ef06b0 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6682,6 +6682,62 @@ MAP_SHARED mmap will result in an -EINVAL return.
 When enabled the VMM may make use of the ``KVM_ARM_MTE_COPY_TAGS`` ioctl to
 perform a bulk copy of tags to/from the guest.
 
+7.29 KVM_CAP_ARM_LOCK_USER_MEMORY_REGION
+----------------------------------------
+
+:Architectures: arm64
+:Target: VM
+:Parameters: flags is one of KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_LOCK or
+                     KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_UNLOCK
+             args[0] is the slot number
+             args[1] specifies the permissions when the memslot is locked or if
+                     all memslots should be unlocked
+
+The presence of this capability indicates that KVM supports locking the memory
+associated with the memslot, and unlocking a previously locked memslot.
+
+The 'flags' parameter is defined as follows:
+
+7.29.1 KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_LOCK
+-------------------------------------------------
+
+:Capability: 'flags' parameter to KVM_CAP_ARM_LOCK_USER_MEMORY_REGION
+:Architectures: arm64
+:Target: VM
+:Parameters: args[0] contains the memory slot number
+             args[1] contains the permissions for the locked memory:
+                     KVM_ARM_LOCK_MEMORY_READ (mandatory) to map it with
+                     read permissions and KVM_ARM_LOCK_MEMORY_WRITE
+                     (optional) with write permissions
+:Returns: 0 on success; negative error code on failure
+
+Enabling this capability causes the memory described by the memslot to be
+pinned in the process address space and the corresponding stage 2 IPA range
+mapped at stage 2. The permissions specified in args[1] apply to both
+mappings. The memory pinned with this capability counts towards the max
+locked memory limit for the current process.
+
+The capability must be enabled before any VCPUs have run. The virtual memory
+range described by the memslot must be mapped in the userspace process without
+any gaps. It is considered an error if write permissions are specified for a
+memslot which logs dirty pages.
+
+7.29.2 KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_UNLOCK
+---------------------------------------------------
+
+:Capability: 'flags' parameter to KVM_CAP_ARM_LOCK_USER_MEMORY_REGION
+:Architectures: arm64
+:Target: VM
+:Parameters: args[0] contains the memory slot number
+             args[1] optionally contains the flag KVM_ARM_UNLOCK_MEM_ALL,
+                     which unlocks all previously locked memslots.
+:Returns: 0 on success; negative error code on failure
+
+Enabling this capability causes the memory pinned when locking the memslot
+specified in args[0] to be unpinned, or, optionally, the memory associated
+with all locked memslots, to be unpinned. The IPA range is not unmapped
+from stage 2.
+
 8. Other capabilities.
 ======================
 
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index b52c5c4b9a3d..ef079b5eb475 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -216,6 +216,9 @@ static inline void __invalidate_icache_guest_page(void *va, size_t size)
 void kvm_set_way_flush(struct kvm_vcpu *vcpu);
 void kvm_toggle_cache(struct kvm_vcpu *vcpu, bool was_enabled);
 
+int kvm_mmu_lock_memslot(struct kvm *kvm, u64 slot, u64 flags);
+int kvm_mmu_unlock_memslot(struct kvm *kvm, u64 slot, u64 flags);
+
 static inline unsigned int kvm_get_vmid_bits(void)
 {
 	int reg = read_sanitised_ftr_reg(SYS_ID_AA64MMFR1_EL1);
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index ddace63528f1..57ac97b30b3d 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -80,16 +80,43 @@ int kvm_arch_check_processor_compat(void *opaque)
 	return 0;
 }
 
+static int kvm_arm_lock_memslot_supported(void)
+{
+	return 0;
+}
+
+static int kvm_lock_user_memory_region_ioctl(struct kvm *kvm,
+					     struct kvm_enable_cap *cap)
+{
+	u64 slot, flags;
+	u32 action;
+
+	if (cap->args[2] || cap->args[3])
+		return -EINVAL;
+
+	slot = cap->args[0];
+	flags = cap->args[1];
+	action = cap->flags;
+
+	switch (action) {
+	case KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_LOCK:
+		return kvm_mmu_lock_memslot(kvm, slot, flags);
+	case KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_UNLOCK:
+		return kvm_mmu_unlock_memslot(kvm, slot, flags);
+	default:
+		return -EINVAL;
+	}
+}
+
 int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 			    struct kvm_enable_cap *cap)
 {
 	int r;
 
-	if (cap->flags)
-		return -EINVAL;
-
 	switch (cap->cap) {
 	case KVM_CAP_ARM_NISV_TO_USER:
+		if (cap->flags)
+			return -EINVAL;
 		r = 0;
 		kvm->arch.return_nisv_io_abort_to_user = true;
 		break;
@@ -99,6 +126,11 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 		r = 0;
 		kvm->arch.mte_enabled = true;
 		break;
+	case KVM_CAP_ARM_LOCK_USER_MEMORY_REGION:
+		if (!kvm_arm_lock_memslot_supported())
+			return -EINVAL;
+		r = kvm_lock_user_memory_region_ioctl(kvm, cap);
+		break;
 	default:
 		r = -EINVAL;
 		break;
@@ -166,7 +198,6 @@ vm_fault_t kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf)
 	return VM_FAULT_SIGBUS;
 }
 
-
 /**
  * kvm_arch_destroy_vm - destroy the VM data structure
  * @kvm:	pointer to the KVM struct
@@ -274,6 +305,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_ARM_PTRAUTH_GENERIC:
 		r = system_has_full_ptr_auth();
 		break;
+	case KVM_CAP_ARM_LOCK_USER_MEMORY_REGION:
+		r = kvm_arm_lock_memslot_supported();
+		break;
 	default:
 		r = 0;
 	}
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 0625bf2353c2..689b24cb0f10 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1244,6 +1244,82 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 	return ret;
 }
 
+int kvm_mmu_lock_memslot(struct kvm *kvm, u64 slot, u64 flags)
+{
+	struct kvm_memory_slot *memslot;
+	struct kvm_vcpu *vcpu;
+	int i, ret;
+
+	if (slot >= KVM_MEM_SLOTS_NUM)
+		return -EINVAL;
+
+	if (!(flags & KVM_ARM_LOCK_MEM_READ))
+		return -EINVAL;
+
+	mutex_lock(&kvm->lock);
+
+	if (!kvm_lock_all_vcpus(kvm)) {
+		ret = -EBUSY;
+		goto out;
+	}
+
+	kvm_for_each_vcpu(i, vcpu, kvm) {
+		if (vcpu->arch.has_run_once) {
+			ret = -EBUSY;
+			goto out_unlock_vcpus;
+		}
+	}
+
+	mutex_lock(&kvm->slots_lock);
+
+	memslot = id_to_memslot(kvm_memslots(kvm), slot);
+	if (!memslot) {
+		ret = -EINVAL;
+		goto out_unlock_slots;
+	}
+	if ((flags & KVM_ARM_LOCK_MEM_WRITE) &&
+	    ((memslot->flags & KVM_MEM_READONLY) || memslot->dirty_bitmap)) {
+		ret = -EPERM;
+		goto out_unlock_slots;
+	}
+
+	ret = -EINVAL;
+
+out_unlock_slots:
+	mutex_unlock(&kvm->slots_lock);
+out_unlock_vcpus:
+	kvm_unlock_all_vcpus(kvm);
+out:
+	mutex_unlock(&kvm->lock);
+	return ret;
+}
+
+int kvm_mmu_unlock_memslot(struct kvm *kvm, u64 slot, u64 flags)
+{
+	struct kvm_memory_slot *memslot;
+	int ret;
+
+	if (flags & KVM_ARM_UNLOCK_MEM_ALL)
+		return -EINVAL;
+
+	if (slot >= KVM_MEM_SLOTS_NUM)
+		return -EINVAL;
+
+	mutex_lock(&kvm->slots_lock);
+
+	memslot = id_to_memslot(kvm_memslots(kvm), slot);
+	if (!memslot) {
+		ret = -EINVAL;
+		goto out_unlock_slots;
+	}
+
+	ret = -EINVAL;
+
+out_unlock_slots:
+	mutex_unlock(&kvm->slots_lock);
+	return ret;
+}
+
 bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
 {
 	if (!kvm->arch.mmu.pgt)
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index d9e4aabcb31a..bcf62c7bdd2d 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1112,6 +1112,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_BINARY_STATS_FD 203
 #define KVM_CAP_EXIT_ON_EMULATION_FAILURE 204
 #define KVM_CAP_ARM_MTE 205
+#define KVM_CAP_ARM_LOCK_USER_MEMORY_REGION 206
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -1459,6 +1460,13 @@ struct kvm_s390_ucas_mapping {
 #define KVM_PPC_SVM_OFF		  _IO(KVMIO,  0xb3)
 #define KVM_ARM_MTE_COPY_TAGS	  _IOR(KVMIO,  0xb4, struct kvm_arm_copy_mte_tags)
 
+/* Used by KVM_CAP_ARM_LOCK_USER_MEMORY_REGION */
+#define KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_LOCK	(1 << 0)
+#define   KVM_ARM_LOCK_MEM_READ				(1 << 0)
+#define   KVM_ARM_LOCK_MEM_WRITE			(1 << 1)
+#define KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_UNLOCK	(1 << 1)
+#define   KVM_ARM_UNLOCK_MEM_ALL			(1 << 0)
+
 /* ioctl for vm fd */
 #define KVM_CREATE_DEVICE	  _IOWR(KVMIO,  0xe0, struct kvm_create_device)
 
-- 
2.33.0

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [RFC PATCH v4 03/39] KVM: arm64: Implement the memslot lock/unlock functionality
  2021-08-25 16:17 [RFC PATCH v4 00/39] KVM: arm64: Add Statistical Profiling Extension (SPE) support Alexandru Elisei
  2021-08-25 16:17 ` [RFC PATCH v4 01/39] KVM: arm64: Make lock_all_vcpus() available to the rest of KVM Alexandru Elisei
  2021-08-25 16:17 ` [RFC PATCH v4 02/39] KVM: arm64: Add lock/unlock memslot user API Alexandru Elisei
@ 2021-08-25 16:17 ` Alexandru Elisei
  2021-08-25 16:17 ` [RFC PATCH v4 04/39] KVM: arm64: Defer CMOs for locked memslots until a VCPU is run Alexandru Elisei
                   ` (36 subsequent siblings)
  39 siblings, 0 replies; 47+ messages in thread
From: Alexandru Elisei @ 2021-08-25 16:17 UTC (permalink / raw)
  To: maz, james.morse, suzuki.poulose, linux-arm-kernel, kvmarm, will,
	linux-kernel

Pin memory in the process address space and map it in the stage 2 tables as
a result of userspace enabling the KVM_CAP_ARM_LOCK_USER_MEMORY_REGION
capability; and unpin it from the process address space when the capability
is used with the KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_UNLOCK flag.

The current implementation has two drawbacks which will be fixed in future
patches:

- The dcache maintenance is done when the memslot is locked, which means
  that it is possible that memory changes made by userspace after the ioctl
  completes won't be visible to a guest running with the MMU off.

- Tag scrubbing is done when the memslot is locked. If the MTE capability
  is enabled after the ioctl, the guest will be able to access unsanitised
  pages. This is prevented by forbidding userspace to enable the MTE
  capability if any memslots are locked.

Only PAGE_SIZE mappings are supported at stage 2.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 Documentation/virt/kvm/api.rst    |   8 +-
 arch/arm64/include/asm/kvm_host.h |  11 ++
 arch/arm64/kvm/arm.c              |  22 ++++
 arch/arm64/kvm/mmu.c              | 211 +++++++++++++++++++++++++++++-
 4 files changed, 245 insertions(+), 7 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 741327ef06b0..5aa251df7077 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6717,10 +6717,10 @@ mapped at stage 2. The permissions specified in args[1] apply to both
 mappings. The memory pinned with this capability counts towards the max
 locked memory limit for the current process.
 
-The capability must be enabled before any VCPUs have run. The virtual memory
-range described by the memslot must be mapped in the userspace process without
-any gaps. It is considered an error if write permissions are specified for a
-memslot which logs dirty pages.
+The capability must be enabled before any VCPUs have run. The entire virtual
+memory range described by the memslot must be mapped by the userspace process.
+It is considered an error if write permissions are specified for a memslot which
+logs dirty pages.
 
 7.29.2 KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_UNLOCK
 ---------------------------------------------------
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 797083203603..97ff3ed5d4b7 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -98,7 +98,18 @@ struct kvm_s2_mmu {
 	struct kvm_arch *arch;
 };
 
+#define KVM_MEMSLOT_LOCK_READ		(1 << 0)
+#define KVM_MEMSLOT_LOCK_WRITE		(1 << 1)
+#define KVM_MEMSLOT_LOCK_MASK		0x3
+
+struct kvm_memory_slot_page {
+	struct list_head list;
+	struct page *page;
+};
+
 struct kvm_arch_memory_slot {
+	struct kvm_memory_slot_page pages;
+	u32 flags;
 };
 
 struct kvm_arch {
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 57ac97b30b3d..efb3501c6016 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -108,6 +108,25 @@ static int kvm_lock_user_memory_region_ioctl(struct kvm *kvm,
 	}
 }
 
+static bool kvm_arm_has_locked_memslots(struct kvm *kvm)
+{
+	struct kvm_memslots *slots = kvm_memslots(kvm);
+	struct kvm_memory_slot *memslot;
+	bool has_locked_memslots = false;
+	int idx;
+
+	idx = srcu_read_lock(&kvm->srcu);
+	kvm_for_each_memslot(memslot, slots) {
+		if (memslot->arch.flags & KVM_MEMSLOT_LOCK_MASK) {
+			has_locked_memslots = true;
+			break;
+		}
+	}
+	srcu_read_unlock(&kvm->srcu, idx);
+
+	return has_locked_memslots;
+}
+
 int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 			    struct kvm_enable_cap *cap)
 {
@@ -123,6 +142,9 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 	case KVM_CAP_ARM_MTE:
 		if (!system_supports_mte() || kvm->created_vcpus)
 			return -EINVAL;
+		if (kvm_arm_lock_memslot_supported() &&
+		    kvm_arm_has_locked_memslots(kvm))
+			return -EPERM;
 		r = 0;
 		kvm->arch.mte_enabled = true;
 		break;
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 689b24cb0f10..59c2bfef2fd1 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -72,6 +72,11 @@ static bool memslot_is_logging(struct kvm_memory_slot *memslot)
 	return memslot->dirty_bitmap && !(memslot->flags & KVM_MEM_READONLY);
 }
 
+static bool memslot_is_locked(struct kvm_memory_slot *memslot)
+{
+	return memslot->arch.flags & KVM_MEMSLOT_LOCK_MASK;
+}
+
 /**
  * kvm_flush_remote_tlbs() - flush all VM TLB entries for v7/8
  * @kvm:	pointer to kvm structure.
@@ -722,6 +727,10 @@ static bool fault_supports_stage2_huge_mapping(struct kvm_memory_slot *memslot,
 	if (map_size == PAGE_SIZE)
 		return true;
 
+	/* Allow only PAGE_SIZE mappings for locked memslots */
+	if (memslot_is_locked(memslot))
+		return false;
+
 	size = memslot->npages * PAGE_SIZE;
 
 	gpa_start = memslot->base_gfn << PAGE_SHIFT;
@@ -1244,6 +1253,159 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 	return ret;
 }
 
+static int try_rlimit_memlock(unsigned long npages)
+{
+	unsigned long lock_limit;
+	bool has_lock_cap;
+	int ret = 0;
+
+	has_lock_cap = capable(CAP_IPC_LOCK);
+	if (has_lock_cap)
+		goto out;
+
+	lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
+
+	mmap_read_lock(current->mm);
+	if (npages + current->mm->locked_vm > lock_limit)
+		ret = -ENOMEM;
+	mmap_read_unlock(current->mm);
+
+out:
+	return ret;
+}
+
+static void unpin_memslot_pages(struct kvm_memory_slot *memslot, bool writable)
+{
+	struct kvm_memory_slot_page *entry, *tmp;
+
+	list_for_each_entry_safe(entry, tmp, &memslot->arch.pages.list, list) {
+		if (writable)
+			set_page_dirty_lock(entry->page);
+		unpin_user_page(entry->page);
+		kfree(entry);
+	}
+}
+
+static int lock_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot,
+			u64 flags)
+{
+	struct kvm_mmu_memory_cache cache = { 0, __GFP_ZERO, NULL, };
+	struct kvm_memory_slot_page *page_entry;
+	bool writable = flags & KVM_ARM_LOCK_MEM_WRITE;
+	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
+	struct kvm_pgtable *pgt = kvm->arch.mmu.pgt;
+	struct vm_area_struct *vma;
+	unsigned long npages = memslot->npages;
+	unsigned int pin_flags = FOLL_LONGTERM;
+	unsigned long i, hva, ipa, mmu_seq;
+	int ret;
+
+	ret = try_rlimit_memlock(npages);
+	if (ret)
+		return -ENOMEM;
+
+	INIT_LIST_HEAD(&memslot->arch.pages.list);
+
+	if (writable) {
+		prot |= KVM_PGTABLE_PROT_W;
+		pin_flags |= FOLL_WRITE;
+	}
+
+	hva = memslot->userspace_addr;
+	ipa = memslot->base_gfn << PAGE_SHIFT;
+
+	mmu_seq = kvm->mmu_notifier_seq;
+	smp_rmb();
+
+	for (i = 0; i < npages; i++) {
+		page_entry = kzalloc(sizeof(*page_entry), GFP_KERNEL);
+		if (!page_entry) {
+			unpin_memslot_pages(memslot, writable);
+			ret = -ENOMEM;
+			goto out_err;
+		}
+
+		mmap_read_lock(current->mm);
+		ret = pin_user_pages(hva, 1, pin_flags, &page_entry->page, &vma);
+		if (ret != 1) {
+			mmap_read_unlock(current->mm);
+			unpin_memslot_pages(memslot, writable);
+			ret = -ENOMEM;
+			goto out_err;
+		}
+		if (kvm_has_mte(kvm)) {
+			if (vma->vm_flags & VM_SHARED) {
+				ret = -EFAULT;
+			} else {
+				ret = sanitise_mte_tags(kvm,
+					page_to_pfn(page_entry->page),
+					PAGE_SIZE);
+			}
+			if (ret) {
+				mmap_read_unlock(current->mm);
+				goto out_err;
+			}
+		}
+		mmap_read_unlock(current->mm);
+
+		ret = kvm_mmu_topup_memory_cache(&cache, kvm_mmu_cache_min_pages(kvm));
+		if (ret) {
+			unpin_memslot_pages(memslot, writable);
+			goto out_err;
+		}
+
+		spin_lock(&kvm->mmu_lock);
+		if (mmu_notifier_retry(kvm, mmu_seq)) {
+			spin_unlock(&kvm->mmu_lock);
+			unpin_memslot_pages(memslot, writable);
+			ret = -EAGAIN;
+			goto out_err;
+		}
+
+		ret = kvm_pgtable_stage2_map(pgt, ipa, PAGE_SIZE,
+					     page_to_phys(page_entry->page),
+					     prot, &cache);
+		spin_unlock(&kvm->mmu_lock);
+
+		if (ret) {
+			kvm_pgtable_stage2_unmap(pgt, memslot->base_gfn << PAGE_SHIFT,
+						 i << PAGE_SHIFT);
+			unpin_memslot_pages(memslot, writable);
+			goto out_err;
+		}
+		list_add(&page_entry->list, &memslot->arch.pages.list);
+
+		hva += PAGE_SIZE;
+		ipa += PAGE_SIZE;
+	}
+
+
+	/*
+	 * Even though we've checked the limit at the start, we can still exceed
+	 * it if userspace locked other pages in the meantime or if the
+	 * CAP_IPC_LOCK capability has been revoked.
+	 */
+	ret = account_locked_vm(current->mm, npages, true);
+	if (ret) {
+		kvm_pgtable_stage2_unmap(pgt, memslot->base_gfn << PAGE_SHIFT,
+					 npages << PAGE_SHIFT);
+		unpin_memslot_pages(memslot, writable);
+		goto out_err;
+	}
+
+	memslot->arch.flags = KVM_MEMSLOT_LOCK_READ;
+	if (writable)
+		memslot->arch.flags |= KVM_MEMSLOT_LOCK_WRITE;
+
+	kvm_mmu_free_memory_cache(&cache);
+
+	return 0;
+
+out_err:
+	kvm_mmu_free_memory_cache(&cache);
+	return ret;
+}
+
 int kvm_mmu_lock_memslot(struct kvm *kvm, u64 slot, u64 flags)
 {
 	struct kvm_memory_slot *memslot;
@@ -1283,7 +1445,12 @@ int kvm_mmu_lock_memslot(struct kvm *kvm, u64 slot, u64 flags)
 		goto out_unlock_slots;
 	}
 
-	ret = -EINVAL;
+	if (memslot_is_locked(memslot)) {
+		ret = -EBUSY;
+		goto out_unlock_slots;
+	}
+
+	ret = lock_memslot(kvm, memslot, flags);
 
 out_unlock_slots:
 	mutex_unlock(&kvm->slots_lock);
@@ -1294,13 +1461,51 @@ int kvm_mmu_lock_memslot(struct kvm *kvm, u64 slot, u64 flags)
 	return ret;
 }
 
+static int unlock_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot)
+{
+	bool writable = memslot->arch.flags & KVM_MEMSLOT_LOCK_WRITE;
+	unsigned long npages = memslot->npages;
+
+	unpin_memslot_pages(memslot, writable);
+	account_locked_vm(current->mm, npages, false);
+
+	memslot->arch.flags &= ~KVM_MEMSLOT_LOCK_MASK;
+
+	return 0;
+}
+
+static int unlock_all_memslots(struct kvm *kvm)
+{
+	struct kvm_memory_slot *memslot;
+	int err, ret;
+
+	mutex_lock(&kvm->slots_lock);
+
+	ret = 0;
+	kvm_for_each_memslot(memslot, kvm_memslots(kvm)) {
+		if (!memslot_is_locked(memslot))
+			continue;
+		err = unlock_memslot(kvm, memslot);
+		if (err) {
+			kvm_err("error unlocking memslot %u: %d\n",
+				memslot->id, err);
+			/* Continue to try unlocking the rest of the slots */
+			ret = err;
+		}
+	}
+
+	mutex_unlock(&kvm->slots_lock);
+
+	return ret;
+}
+
 int kvm_mmu_unlock_memslot(struct kvm *kvm, u64 slot, u64 flags)
 {
 	struct kvm_memory_slot *memslot;
 	int ret;
 
 	if (flags & KVM_ARM_UNLOCK_MEM_ALL)
-		return -EINVAL;
+		return unlock_all_memslots(kvm);
 
 	if (slot >= KVM_MEM_SLOTS_NUM)
 		return -EINVAL;
@@ -1313,7 +1518,7 @@ int kvm_mmu_unlock_memslot(struct kvm *kvm, u64 slot, u64 flags)
 		goto out_unlock_slots;
 	}
 
-	ret = -EINVAL;
+	ret = unlock_memslot(kvm, memslot);
 
 out_unlock_slots:
 	mutex_unlock(&kvm->slots_lock);
-- 
2.33.0

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [RFC PATCH v4 04/39] KVM: arm64: Defer CMOs for locked memslots until a VCPU is run
  2021-08-25 16:17 [RFC PATCH v4 00/39] KVM: arm64: Add Statistical Profiling Extension (SPE) support Alexandru Elisei
                   ` (2 preceding siblings ...)
  2021-08-25 16:17 ` [RFC PATCH v4 03/39] KVM: arm64: Implement the memslot lock/unlock functionality Alexandru Elisei
@ 2021-08-25 16:17 ` Alexandru Elisei
  2021-10-17 11:43   ` Marc Zyngier
  2021-08-25 16:17 ` [RFC PATCH v4 05/39] KVM: arm64: Perform CMOs on locked memslots when userspace resets VCPUs Alexandru Elisei
                   ` (35 subsequent siblings)
  39 siblings, 1 reply; 47+ messages in thread
From: Alexandru Elisei @ 2021-08-25 16:17 UTC (permalink / raw)
  To: maz, james.morse, suzuki.poulose, linux-arm-kernel, kvmarm, will,
	linux-kernel

KVM relies on doing dcache maintenance on stage 2 faults to present to a
guest running with the MMU off the same view of memory as userspace. For
locked memslots, KVM so far has done the dcache maintenance when a memslot
is locked, but that leaves KVM in a rather awkward position: what userspace
writes to guest memory after the memslot is locked, but before a VCPU is
run, might not be visible to the guest.

Fix this by deferring the dcache maintenance until the first VCPU is run.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 arch/arm64/include/asm/kvm_host.h |  7 ++++
 arch/arm64/include/asm/kvm_mmu.h  |  5 +++
 arch/arm64/kvm/arm.c              |  3 ++
 arch/arm64/kvm/mmu.c              | 56 ++++++++++++++++++++++++++++---
 4 files changed, 67 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 97ff3ed5d4b7..ed67f914d169 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -112,6 +112,10 @@ struct kvm_arch_memory_slot {
 	u32 flags;
 };
 
+/* kvm->arch.mmu_pending_ops flags */
+#define KVM_LOCKED_MEMSLOT_FLUSH_DCACHE	0
+#define KVM_MAX_MMU_PENDING_OPS		1
+
 struct kvm_arch {
 	struct kvm_s2_mmu mmu;
 
@@ -135,6 +139,9 @@ struct kvm_arch {
 	 */
 	bool return_nisv_io_abort_to_user;
 
+	/* Defer MMU operations until a VCPU is run. */
+	unsigned long mmu_pending_ops;
+
 	/*
 	 * VM-wide PMU filter, implemented as a bitmap and big enough for
 	 * up to 2^10 events (ARMv8.0) or 2^16 events (ARMv8.1+).
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index ef079b5eb475..525c223e769f 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -219,6 +219,11 @@ void kvm_toggle_cache(struct kvm_vcpu *vcpu, bool was_enabled);
 int kvm_mmu_lock_memslot(struct kvm *kvm, u64 slot, u64 flags);
 int kvm_mmu_unlock_memslot(struct kvm *kvm, u64 slot, u64 flags);
 
+#define kvm_mmu_has_pending_ops(kvm)	\
+	(!bitmap_empty(&(kvm)->arch.mmu_pending_ops, KVM_MAX_MMU_PENDING_OPS))
+
+void kvm_mmu_perform_pending_ops(struct kvm *kvm);
+
 static inline unsigned int kvm_get_vmid_bits(void)
 {
 	int reg = read_sanitised_ftr_reg(SYS_ID_AA64MMFR1_EL1);
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index efb3501c6016..144c982912d8 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -829,6 +829,9 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 	if (unlikely(!kvm_vcpu_initialized(vcpu)))
 		return -ENOEXEC;
 
+	if (unlikely(kvm_mmu_has_pending_ops(vcpu->kvm)))
+		kvm_mmu_perform_pending_ops(vcpu->kvm);
+
 	ret = kvm_vcpu_first_run_init(vcpu);
 	if (ret)
 		return ret;
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 59c2bfef2fd1..94fa08f3d9d3 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1253,6 +1253,41 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 	return ret;
 }
 
+/*
+ * It's safe to do the CMOs when the first VCPU is run because:
+ * - VCPUs cannot run until mmu_pending_ops is cleared.
+ * - Memslots cannot be modified because we hold the kvm->slots_lock.
+ *
+ * It's safe to periodically release the mmu_lock because:
+ * - VCPUs cannot run.
+ * - Any changes to the stage 2 tables triggered by the MMU notifiers also take
+ *   the mmu_lock, which means accesses will be serialized.
+ * - Stage 2 tables cannot be freed from under us as long as at least one VCPU
+ *   is live, which means that the VM will be live.
+ */
+void kvm_mmu_perform_pending_ops(struct kvm *kvm)
+{
+	struct kvm_memory_slot *memslot;
+
+	mutex_lock(&kvm->slots_lock);
+	if (!kvm_mmu_has_pending_ops(kvm))
+		goto out_unlock;
+
+	if (test_bit(KVM_LOCKED_MEMSLOT_FLUSH_DCACHE, &kvm->arch.mmu_pending_ops)) {
+		kvm_for_each_memslot(memslot, kvm_memslots(kvm)) {
+			if (!memslot_is_locked(memslot))
+				continue;
+			stage2_flush_memslot(kvm, memslot);
+		}
+	}
+
+	bitmap_zero(&kvm->arch.mmu_pending_ops, KVM_MAX_MMU_PENDING_OPS);
+
+out_unlock:
+	mutex_unlock(&kvm->slots_lock);
+	return;
+}
+
 static int try_rlimit_memlock(unsigned long npages)
 {
 	unsigned long lock_limit;
@@ -1293,7 +1328,8 @@ static int lock_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot,
 	struct kvm_memory_slot_page *page_entry;
 	bool writable = flags & KVM_ARM_LOCK_MEM_WRITE;
 	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
-	struct kvm_pgtable *pgt = kvm->arch.mmu.pgt;
+	struct kvm_pgtable pgt;
+	struct kvm_pgtable_mm_ops mm_ops;
 	struct vm_area_struct *vma;
 	unsigned long npages = memslot->npages;
 	unsigned int pin_flags = FOLL_LONGTERM;
@@ -1311,6 +1347,16 @@ static int lock_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot,
 		pin_flags |= FOLL_WRITE;
 	}
 
+	/*
+	 * Make a copy of the stage 2 translation table struct to remove the
+	 * dcache callback so we can postpone the cache maintenance operations
+	 * until the first VCPU is run.
+	 */
+	mm_ops = *kvm->arch.mmu.pgt->mm_ops;
+	mm_ops.dcache_clean_inval_poc = NULL;
+	pgt = *kvm->arch.mmu.pgt;
+	pgt.mm_ops = &mm_ops;
+
 	hva = memslot->userspace_addr;
 	ipa = memslot->base_gfn << PAGE_SHIFT;
 
@@ -1362,13 +1408,13 @@ static int lock_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot,
 			goto out_err;
 		}
 
-		ret = kvm_pgtable_stage2_map(pgt, ipa, PAGE_SIZE,
+		ret = kvm_pgtable_stage2_map(&pgt, ipa, PAGE_SIZE,
 					     page_to_phys(page_entry->page),
 					     prot, &cache);
 		spin_unlock(&kvm->mmu_lock);
 
 		if (ret) {
-			kvm_pgtable_stage2_unmap(pgt, memslot->base_gfn << PAGE_SHIFT,
+			kvm_pgtable_stage2_unmap(&pgt, memslot->base_gfn << PAGE_SHIFT,
 						 i << PAGE_SHIFT);
 			unpin_memslot_pages(memslot, writable);
 			goto out_err;
@@ -1387,7 +1433,7 @@ static int lock_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot,
 	 */
 	ret = account_locked_vm(current->mm, npages, true);
 	if (ret) {
-		kvm_pgtable_stage2_unmap(pgt, memslot->base_gfn << PAGE_SHIFT,
+		kvm_pgtable_stage2_unmap(&pgt, memslot->base_gfn << PAGE_SHIFT,
 					 npages << PAGE_SHIFT);
 		unpin_memslot_pages(memslot, writable);
 		goto out_err;
@@ -1397,6 +1443,8 @@ static int lock_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot,
 	if (writable)
 		memslot->arch.flags |= KVM_MEMSLOT_LOCK_WRITE;
 
+	set_bit(KVM_LOCKED_MEMSLOT_FLUSH_DCACHE, &kvm->arch.mmu_pending_ops);
+
 	kvm_mmu_free_memory_cache(&cache);
 
 	return 0;
-- 
2.33.0

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [RFC PATCH v4 05/39] KVM: arm64: Perform CMOs on locked memslots when userspace resets VCPUs
  2021-08-25 16:17 [RFC PATCH v4 00/39] KVM: arm64: Add Statistical Profiling Extension (SPE) support Alexandru Elisei
                   ` (3 preceding siblings ...)
  2021-08-25 16:17 ` [RFC PATCH v4 04/39] KVM: arm64: Defer CMOs for locked memslots until a VCPU is run Alexandru Elisei
@ 2021-08-25 16:17 ` Alexandru Elisei
  2021-08-25 16:17 ` [RFC PATCH v4 06/39] KVM: arm64: Delay tag scrubbing for locked memslots until a VCPU runs Alexandru Elisei
                   ` (34 subsequent siblings)
  39 siblings, 0 replies; 47+ messages in thread
From: Alexandru Elisei @ 2021-08-25 16:17 UTC (permalink / raw)
  To: maz, james.morse, suzuki.poulose, linux-arm-kernel, kvmarm, will,
	linux-kernel

Userspace resets a VCPU that has already run by means of a
KVM_ARM_VCPU_INIT ioctl. This is usually done after a VM shutdown and
before the same VM is rebooted, and during this interval the VM memory can
be modified by userspace (for example, to copy the original guest kernel
image). In this situation, KVM unmaps the entire stage 2 to trigger stage 2
faults, which ensures that the guest has the same view of memory as the
host's userspace.

Unmapping stage 2 is not an option for locked memslots, so instead do the
cache maintenance the first time a VCPU is run, similar to what KVM does
when a memslot is locked.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 arch/arm64/include/asm/kvm_host.h |  3 ++-
 arch/arm64/kvm/mmu.c              | 13 ++++++++++++-
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index ed67f914d169..68905bd47f85 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -114,7 +114,8 @@ struct kvm_arch_memory_slot {
 
 /* kvm->arch.mmu_pending_ops flags */
 #define KVM_LOCKED_MEMSLOT_FLUSH_DCACHE	0
-#define KVM_MAX_MMU_PENDING_OPS		1
+#define KVM_LOCKED_MEMSLOT_INVAL_ICACHE	1
+#define KVM_MAX_MMU_PENDING_OPS		2
 
 struct kvm_arch {
 	struct kvm_s2_mmu mmu;
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 94fa08f3d9d3..f1f8a87550d1 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -560,8 +560,16 @@ void stage2_unmap_vm(struct kvm *kvm)
 	spin_lock(&kvm->mmu_lock);
 
 	slots = kvm_memslots(kvm);
-	kvm_for_each_memslot(memslot, slots)
+	kvm_for_each_memslot(memslot, slots) {
+		if (memslot_is_locked(memslot)) {
+			set_bit(KVM_LOCKED_MEMSLOT_FLUSH_DCACHE,
+				&kvm->arch.mmu_pending_ops);
+			set_bit(KVM_LOCKED_MEMSLOT_INVAL_ICACHE,
+				&kvm->arch.mmu_pending_ops);
+			continue;
+		}
 		stage2_unmap_memslot(kvm, memslot);
+	}
 
 	spin_unlock(&kvm->mmu_lock);
 	mmap_read_unlock(current->mm);
@@ -1281,6 +1289,9 @@ void kvm_mmu_perform_pending_ops(struct kvm *kvm)
 		}
 	}
 
+	if (test_bit(KVM_LOCKED_MEMSLOT_INVAL_ICACHE, &kvm->arch.mmu_pending_ops))
+		icache_inval_all_pou();
+
 	bitmap_zero(&kvm->arch.mmu_pending_ops, KVM_MAX_MMU_PENDING_OPS);
 
 out_unlock:
-- 
2.33.0

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [RFC PATCH v4 06/39] KVM: arm64: Delay tag scrubbing for locked memslots until a VCPU runs
  2021-08-25 16:17 [RFC PATCH v4 00/39] KVM: arm64: Add Statistical Profiling Extension (SPE) support Alexandru Elisei
                   ` (4 preceding siblings ...)
  2021-08-25 16:17 ` [RFC PATCH v4 05/39] KVM: arm64: Perform CMOs on locked memslots when userspace resets VCPUs Alexandru Elisei
@ 2021-08-25 16:17 ` Alexandru Elisei
  2021-08-25 16:17 ` [RFC PATCH v4 07/39] KVM: arm64: Unlock memslots after stage 2 tables are freed Alexandru Elisei
                   ` (33 subsequent siblings)
  39 siblings, 0 replies; 47+ messages in thread
From: Alexandru Elisei @ 2021-08-25 16:17 UTC (permalink / raw)
  To: maz, james.morse, suzuki.poulose, linux-arm-kernel, kvmarm, will,
	linux-kernel

When an MTE-enabled guest first accesses a physical page, that page must be
scrubbed for tags. This is normally done by KVM on a translation fault, but
with locked memslots we will not get translation faults. So far, this has
been handled by forbidding userspace to enable the MTE capability after
locking a memslot.

Remove this constraint by deferring tag cleaning until the first VCPU is
run, similar to how KVM handles cache maintenance operations.

When userspace resets a VCPU, KVM again performs cache maintenance
operations on locked memslots because userspace might have modified the
guest memory. Clean the tags the next time a VCPU is run for the same
reason.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 arch/arm64/include/asm/kvm_host.h |  7 ++-
 arch/arm64/include/asm/kvm_mmu.h  |  2 +-
 arch/arm64/kvm/arm.c              | 29 ++--------
 arch/arm64/kvm/mmu.c              | 92 ++++++++++++++++++++++++++-----
 4 files changed, 87 insertions(+), 43 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 68905bd47f85..a57f33368a3e 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -113,9 +113,10 @@ struct kvm_arch_memory_slot {
 };
 
 /* kvm->arch.mmu_pending_ops flags */
-#define KVM_LOCKED_MEMSLOT_FLUSH_DCACHE	0
-#define KVM_LOCKED_MEMSLOT_INVAL_ICACHE	1
-#define KVM_MAX_MMU_PENDING_OPS		2
+#define KVM_LOCKED_MEMSLOT_FLUSH_DCACHE		0
+#define KVM_LOCKED_MEMSLOT_INVAL_ICACHE		1
+#define KVM_LOCKED_MEMSLOT_SANITISE_TAGS	2
+#define KVM_MAX_MMU_PENDING_OPS			3
 
 struct kvm_arch {
 	struct kvm_s2_mmu mmu;
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 525c223e769f..9fcdd2580f6e 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -222,7 +222,7 @@ int kvm_mmu_unlock_memslot(struct kvm *kvm, u64 slot, u64 flags);
 #define kvm_mmu_has_pending_ops(kvm)	\
 	(!bitmap_empty(&(kvm)->arch.mmu_pending_ops, KVM_MAX_MMU_PENDING_OPS))
 
-void kvm_mmu_perform_pending_ops(struct kvm *kvm);
+int kvm_mmu_perform_pending_ops(struct kvm *kvm);
 
 static inline unsigned int kvm_get_vmid_bits(void)
 {
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 144c982912d8..c47e96ae4f7c 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -108,25 +108,6 @@ static int kvm_lock_user_memory_region_ioctl(struct kvm *kvm,
 	}
 }
 
-static bool kvm_arm_has_locked_memslots(struct kvm *kvm)
-{
-	struct kvm_memslots *slots = kvm_memslots(kvm);
-	struct kvm_memory_slot *memslot;
-	bool has_locked_memslots = false;
-	int idx;
-
-	idx = srcu_read_lock(&kvm->srcu);
-	kvm_for_each_memslot(memslot, slots) {
-		if (memslot->arch.flags & KVM_MEMSLOT_LOCK_MASK) {
-			has_locked_memslots = true;
-			break;
-		}
-	}
-	srcu_read_unlock(&kvm->srcu, idx);
-
-	return has_locked_memslots;
-}
-
 int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 			    struct kvm_enable_cap *cap)
 {
@@ -142,9 +123,6 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 	case KVM_CAP_ARM_MTE:
 		if (!system_supports_mte() || kvm->created_vcpus)
 			return -EINVAL;
-		if (kvm_arm_lock_memslot_supported() &&
-		    kvm_arm_has_locked_memslots(kvm))
-			return -EPERM;
 		r = 0;
 		kvm->arch.mte_enabled = true;
 		break;
@@ -829,8 +807,11 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 	if (unlikely(!kvm_vcpu_initialized(vcpu)))
 		return -ENOEXEC;
 
-	if (unlikely(kvm_mmu_has_pending_ops(vcpu->kvm)))
-		kvm_mmu_perform_pending_ops(vcpu->kvm);
+	if (unlikely(kvm_mmu_has_pending_ops(vcpu->kvm))) {
+		ret = kvm_mmu_perform_pending_ops(vcpu->kvm);
+		if (ret)
+			return ret;
+	}
 
 	ret = kvm_vcpu_first_run_init(vcpu);
 	if (ret)
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index f1f8a87550d1..cd44b6f2c53e 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -566,6 +566,10 @@ void stage2_unmap_vm(struct kvm *kvm)
 				&kvm->arch.mmu_pending_ops);
 			set_bit(KVM_LOCKED_MEMSLOT_INVAL_ICACHE,
 				&kvm->arch.mmu_pending_ops);
+			if (kvm_has_mte(kvm)) {
+				set_bit(KVM_LOCKED_MEMSLOT_SANITISE_TAGS,
+					&kvm->arch.mmu_pending_ops);
+			}
 			continue;
 		}
 		stage2_unmap_memslot(kvm, memslot);
@@ -909,6 +913,58 @@ static int sanitise_mte_tags(struct kvm *kvm, kvm_pfn_t pfn,
 	return 0;
 }
 
+static int sanitise_mte_tags_memslot(struct kvm *kvm,
+				     struct kvm_memory_slot *memslot)
+{
+	unsigned long hva, slot_size, slot_end;
+	struct kvm_memory_slot_page *entry;
+	struct page *page;
+	int ret = 0;
+
+	if (!kvm_has_mte(kvm))
+		return 0;
+
+	hva = memslot->userspace_addr;
+	slot_size = memslot->npages << PAGE_SHIFT;
+	slot_end = hva + slot_size;
+
+	/* First check that the VMAs spanning the memslot are not shared... */
+	do {
+		struct vm_area_struct *vma;
+
+		vma = find_vma_intersection(current->mm, hva, slot_end);
+		/* The VMAs spanning the memslot must be contiguous. */
+		if (!vma) {
+			ret = -EFAULT;
+			goto out;
+		}
+		/*
+		 * VM_SHARED mappings are not allowed with MTE to avoid races
+		 * when updating the PG_mte_tagged page flag, see
+		 * sanitise_mte_tags for more details.
+		 */
+		if (kvm_has_mte(kvm) && vma->vm_flags & VM_SHARED) {
+			ret = -EFAULT;
+			goto out;
+		}
+		hva = min(slot_end, vma->vm_end);
+	} while (hva < slot_end);
+
+	/* ... then clear the tags. */
+	list_for_each_entry(entry, &memslot->arch.pages.list, list) {
+		page = entry->page;
+		if (!test_bit(PG_mte_tagged, &page->flags)) {
+			mte_clear_page_tags(page_address(page));
+			set_bit(PG_mte_tagged, &page->flags);
+		}
+	}
+
+out:
+	mmap_read_unlock(current->mm);
+
+	return ret;
+}
+
 static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 			  struct kvm_memory_slot *memslot, unsigned long hva,
 			  unsigned long fault_status)
@@ -1273,14 +1329,28 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
  * - Stage 2 tables cannot be freed from under us as long as at least one VCPU
  *   is live, which means that the VM will be live.
  */
-void kvm_mmu_perform_pending_ops(struct kvm *kvm)
+int kvm_mmu_perform_pending_ops(struct kvm *kvm)
 {
 	struct kvm_memory_slot *memslot;
+	int ret = 0;
 
 	mutex_lock(&kvm->slots_lock);
 	if (!kvm_mmu_has_pending_ops(kvm))
 		goto out_unlock;
 
+	if (test_bit(KVM_LOCKED_MEMSLOT_SANITISE_TAGS, &kvm->arch.mmu_pending_ops) &&
+	    kvm_has_mte(kvm)) {
+		kvm_for_each_memslot(memslot, kvm_memslots(kvm)) {
+			if (!memslot_is_locked(memslot))
+				continue;
+			mmap_read_lock(current->mm);
+			ret = sanitise_mte_tags_memslot(kvm, memslot);
+			mmap_read_unlock(current->mm);
+			if (ret)
+				goto out_unlock;
+		}
+	}
+
 	if (test_bit(KVM_LOCKED_MEMSLOT_FLUSH_DCACHE, &kvm->arch.mmu_pending_ops)) {
 		kvm_for_each_memslot(memslot, kvm_memslots(kvm)) {
 			if (!memslot_is_locked(memslot))
@@ -1296,7 +1366,7 @@ void kvm_mmu_perform_pending_ops(struct kvm *kvm)
 
 out_unlock:
 	mutex_unlock(&kvm->slots_lock);
-	return;
+	return ret;
 }
 
 static int try_rlimit_memlock(unsigned long npages)
@@ -1390,19 +1460,6 @@ static int lock_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot,
 			ret = -ENOMEM;
 			goto out_err;
 		}
-		if (kvm_has_mte(kvm)) {
-			if (vma->vm_flags & VM_SHARED) {
-				ret = -EFAULT;
-			} else {
-				ret = sanitise_mte_tags(kvm,
-					page_to_pfn(page_entry->page),
-					PAGE_SIZE);
-			}
-			if (ret) {
-				mmap_read_unlock(current->mm);
-				goto out_err;
-			}
-		}
 		mmap_read_unlock(current->mm);
 
 		ret = kvm_mmu_topup_memory_cache(&cache, kvm_mmu_cache_min_pages(kvm));
@@ -1455,6 +1512,11 @@ static int lock_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot,
 		memslot->arch.flags |= KVM_MEMSLOT_LOCK_WRITE;
 
 	set_bit(KVM_LOCKED_MEMSLOT_FLUSH_DCACHE, &kvm->arch.mmu_pending_ops);
+	/*
+	 * MTE might be enabled after we lock the memslot, set it here
+	 * unconditionally.
+	 */
+	set_bit(KVM_LOCKED_MEMSLOT_SANITISE_TAGS, &kvm->arch.mmu_pending_ops);
 
 	kvm_mmu_free_memory_cache(&cache);
 
-- 
2.33.0


* [RFC PATCH v4 07/39] KVM: arm64: Unlock memslots after stage 2 tables are freed
  2021-08-25 16:17 [RFC PATCH v4 00/39] KVM: arm64: Add Statistical Profiling Extension (SPE) support Alexandru Elisei
                   ` (5 preceding siblings ...)
  2021-08-25 16:17 ` [RFC PATCH v4 06/39] KVM: arm64: Delay tag scrubbing for locked memslots until a VCPU runs Alexandru Elisei
@ 2021-08-25 16:17 ` Alexandru Elisei
  2021-08-25 16:17 ` [RFC PATCH v4 08/39] KVM: arm64: Deny changes to locked memslots Alexandru Elisei
                   ` (32 subsequent siblings)
  39 siblings, 0 replies; 47+ messages in thread
From: Alexandru Elisei @ 2021-08-25 16:17 UTC (permalink / raw)
  To: maz, james.morse, suzuki.poulose, linux-arm-kernel, kvmarm, will,
	linux-kernel

Unpin the backing pages mapped at stage 2 after the stage 2 translation
tables are destroyed.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 arch/arm64/kvm/mmu.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index cd44b6f2c53e..27b7befd4fa9 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1907,6 +1907,7 @@ void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen)
 void kvm_arch_flush_shadow_all(struct kvm *kvm)
 {
 	kvm_free_stage2_pgd(&kvm->arch.mmu);
+	unlock_all_memslots(kvm);
 }
 
 void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
-- 
2.33.0


* [RFC PATCH v4 08/39] KVM: arm64: Deny changes to locked memslots
  2021-08-25 16:17 [RFC PATCH v4 00/39] KVM: arm64: Add Statistical Profiling Extension (SPE) support Alexandru Elisei
                   ` (6 preceding siblings ...)
  2021-08-25 16:17 ` [RFC PATCH v4 07/39] KVM: arm64: Unlock memslots after stage 2 tables are freed Alexandru Elisei
@ 2021-08-25 16:17 ` Alexandru Elisei
  2021-08-25 16:17 ` [RFC PATCH v4 09/39] KVM: Add kvm_warn{,_ratelimited} macros Alexandru Elisei
                   ` (31 subsequent siblings)
  39 siblings, 0 replies; 47+ messages in thread
From: Alexandru Elisei @ 2021-08-25 16:17 UTC (permalink / raw)
  To: maz, james.morse, suzuki.poulose, linux-arm-kernel, kvmarm, will,
	linux-kernel

Forbid userspace from making changes to a locked memslot. If userspace
wants to modify a locked memslot, it must unlock it first.

One special case is allowed: memslots locked for read, but not for write,
can have dirty page logging turned on.
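
As an illustration, turning on dirty page logging for such a memslot is a
plain flags-only update of the existing region. The sketch below assumes the
slot has already been locked for reads with the lock API added earlier in
this series; everything else about the region must stay unchanged:

  #include <linux/kvm.h>
  #include <sys/ioctl.h>

  /* Sketch: enable dirty logging on a memslot that is locked for reads. */
  static int enable_dirty_logging(int vm_fd, __u32 slot, __u64 gpa,
                                  __u64 size, __u64 hva)
  {
          struct kvm_userspace_memory_region region = {
                  .slot = slot,
                  .flags = KVM_MEM_LOG_DIRTY_PAGES, /* the only change allowed */
                  .guest_phys_addr = gpa,
                  .memory_size = size,
                  .userspace_addr = hva,
          };

          /* Any other change to a locked memslot now fails with -EBUSY. */
          return ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &region);
  }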

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 arch/arm64/kvm/mmu.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 27b7befd4fa9..3ab8eba808ae 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1842,8 +1842,23 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
 {
 	hva_t hva = mem->userspace_addr;
 	hva_t reg_end = hva + mem->memory_size;
+	struct kvm_memory_slot *old;
 	int ret = 0;
 
+	/*
+	 * Forbid all changes to locked memslots with the exception of turning
+	 * on dirty page logging for memslots locked only for reads.
+	 */
+	old = id_to_memslot(kvm_memslots(kvm), memslot->id);
+	if (old && memslot_is_locked(old)) {
+		if (change == KVM_MR_FLAGS_ONLY &&
+		    memslot_is_logging(memslot) &&
+		    !(old->arch.flags & KVM_MEMSLOT_LOCK_WRITE))
+			memcpy(&memslot->arch, &old->arch, sizeof(old->arch));
+		else
+			return -EBUSY;
+	}
+
 	if (change != KVM_MR_CREATE && change != KVM_MR_MOVE &&
 			change != KVM_MR_FLAGS_ONLY)
 		return 0;
-- 
2.33.0


* [RFC PATCH v4 09/39] KVM: Add kvm_warn{,_ratelimited} macros
  2021-08-25 16:17 [RFC PATCH v4 00/39] KVM: arm64: Add Statistical Profiling Extension (SPE) support Alexandru Elisei
                   ` (7 preceding siblings ...)
  2021-08-25 16:17 ` [RFC PATCH v4 08/39] KVM: arm64: Deny changes to locked memslots Alexandru Elisei
@ 2021-08-25 16:17 ` Alexandru Elisei
  2021-08-25 16:17 ` [RFC PATCH v4 10/39] KVM: arm64: Print a warning for unexpected faults on locked memslots Alexandru Elisei
                   ` (30 subsequent siblings)
  39 siblings, 0 replies; 47+ messages in thread
From: Alexandru Elisei @ 2021-08-25 16:17 UTC (permalink / raw)
  To: maz, james.morse, suzuki.poulose, linux-arm-kernel, kvmarm, will,
	linux-kernel

Add the kvm_warn() and kvm_warn_ratelimited() macros to print a kernel
warning.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 include/linux/kvm_host.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index ae7735b490b4..ada5019ad93d 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -605,6 +605,10 @@ struct kvm {
 
 #define kvm_err(fmt, ...) \
 	pr_err("kvm [%i]: " fmt, task_pid_nr(current), ## __VA_ARGS__)
+#define kvm_warn(fmt, ...) \
+	pr_warn("kvm [%i]: " fmt, task_pid_nr(current), ## __VA_ARGS__)
+#define kvm_warn_ratelimited(fmt, ...) \
+	pr_warn_ratelimited("kvm [%i]: " fmt, task_pid_nr(current), ## __VA_ARGS__)
 #define kvm_info(fmt, ...) \
 	pr_info("kvm [%i]: " fmt, task_pid_nr(current), ## __VA_ARGS__)
 #define kvm_debug(fmt, ...) \
-- 
2.33.0


* [RFC PATCH v4 10/39] KVM: arm64: Print a warning for unexpected faults on locked memslots
  2021-08-25 16:17 [RFC PATCH v4 00/39] KVM: arm64: Add Statistical Profiling Extension (SPE) support Alexandru Elisei
                   ` (8 preceding siblings ...)
  2021-08-25 16:17 ` [RFC PATCH v4 09/39] KVM: Add kvm_warn{,_ratelimited} macros Alexandru Elisei
@ 2021-08-25 16:17 ` Alexandru Elisei
  2021-08-25 16:17 ` [RFC PATCH v4 11/39] KVM: arm64: Allow userspace to lock and unlock memslots Alexandru Elisei
                   ` (29 subsequent siblings)
  39 siblings, 0 replies; 47+ messages in thread
From: Alexandru Elisei @ 2021-08-25 16:17 UTC (permalink / raw)
  To: maz, james.morse, suzuki.poulose, linux-arm-kernel, kvmarm, will,
	linux-kernel

When userspace unmaps a VMA backing a memslot, the corresponding stage 2
address range gets unmapped via the MMU notifiers. This makes it possible
to get stage 2 faults on a locked memslot, which might not be what
userspace wants because the purpose of locking a memslot is to avoid stage
2 faults in the first place.

Addresses can also be unmapped from stage 2 for other reasons, such as bugs
in the implementation of the lock memslot API, however unlikely that might
seem.

Let's try to make debugging easier by printing a warning when this happens.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 arch/arm64/kvm/mmu.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 3ab8eba808ae..d66d89c18045 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1298,6 +1298,27 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 	/* Userspace should not be able to register out-of-bounds IPAs */
 	VM_BUG_ON(fault_ipa >= kvm_phys_size(vcpu->kvm));
 
+	if (memslot_is_locked(memslot)) {
+		const char *fault_type_str;
+
+		if (kvm_vcpu_trap_is_exec_fault(vcpu))
+			goto handle_fault;
+
+		if (fault_status == FSC_ACCESS)
+			fault_type_str = "access";
+		else if (write_fault && (memslot->arch.flags & KVM_MEMSLOT_LOCK_WRITE))
+			fault_type_str = "write";
+		else if (!write_fault)
+			fault_type_str = "read";
+		else
+			goto handle_fault;
+
+		kvm_warn_ratelimited("Unexpected L2 %s fault on locked memslot %d: IPA=%#llx, ESR_EL2=%#08x\n",
+				     fault_type_str, memslot->id, fault_ipa,
+				     kvm_vcpu_get_esr(vcpu));
+	}
+
+handle_fault:
 	if (fault_status == FSC_ACCESS) {
 		handle_access_fault(vcpu, fault_ipa);
 		ret = 1;
-- 
2.33.0


* [RFC PATCH v4 11/39] KVM: arm64: Allow userspace to lock and unlock memslots
  2021-08-25 16:17 [RFC PATCH v4 00/39] KVM: arm64: Add Statistical Profiling Extension (SPE) support Alexandru Elisei
                   ` (9 preceding siblings ...)
  2021-08-25 16:17 ` [RFC PATCH v4 10/39] KVM: arm64: Print a warning for unexpected faults on locked memslots Alexandru Elisei
@ 2021-08-25 16:17 ` Alexandru Elisei
  2021-08-25 16:17 ` [RFC PATCH v4 12/39] KVM: arm64: Add the KVM_ARM_VCPU_SUPPORTED_CPUS VCPU ioctl Alexandru Elisei
                   ` (28 subsequent siblings)
  39 siblings, 0 replies; 47+ messages in thread
From: Alexandru Elisei @ 2021-08-25 16:17 UTC (permalink / raw)
  To: maz, james.morse, suzuki.poulose, linux-arm-kernel, kvmarm, will,
	linux-kernel

Now that the ioctls have been implemented, allow userspace to lock and
unlock memslots.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 arch/arm64/kvm/arm.c | 9 +--------
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index c47e96ae4f7c..4bd4b8b082a4 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -80,11 +80,6 @@ int kvm_arch_check_processor_compat(void *opaque)
 	return 0;
 }
 
-static int kvm_arm_lock_memslot_supported(void)
-{
-	return 0;
-}
-
 static int kvm_lock_user_memory_region_ioctl(struct kvm *kvm,
 					     struct kvm_enable_cap *cap)
 {
@@ -127,8 +122,6 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 		kvm->arch.mte_enabled = true;
 		break;
 	case KVM_CAP_ARM_LOCK_USER_MEMORY_REGION:
-		if (!kvm_arm_lock_memslot_supported())
-			return -EINVAL;
 		r = kvm_lock_user_memory_region_ioctl(kvm, cap);
 		break;
 	default:
@@ -306,7 +299,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 		r = system_has_full_ptr_auth();
 		break;
 	case KVM_CAP_ARM_LOCK_USER_MEMORY_REGION:
-		r = kvm_arm_lock_memslot_supported();
+		r = 1;
 		break;
 	default:
 		r = 0;
-- 
2.33.0


* [RFC PATCH v4 12/39] KVM: arm64: Add the KVM_ARM_VCPU_SUPPORTED_CPUS VCPU ioctl
  2021-08-25 16:17 [RFC PATCH v4 00/39] KVM: arm64: Add Statistical Profiling Extension (SPE) support Alexandru Elisei
                   ` (10 preceding siblings ...)
  2021-08-25 16:17 ` [RFC PATCH v4 11/39] KVM: arm64: Allow userspace to lock and unlock memslots Alexandru Elisei
@ 2021-08-25 16:17 ` Alexandru Elisei
  2021-08-25 16:17 ` [RFC PATCH v4 13/39] KVM: arm64: Add CONFIG_KVM_ARM_SPE Kconfig option Alexandru Elisei
                   ` (27 subsequent siblings)
  39 siblings, 0 replies; 47+ messages in thread
From: Alexandru Elisei @ 2021-08-25 16:17 UTC (permalink / raw)
  To: maz, james.morse, suzuki.poulose, linux-arm-kernel, kvmarm, will,
	linux-kernel

The ioctl is used to specify a list of physical CPUs on which the VCPU is
allowed to run. The ioctl introduces no constraints on the VCPU scheduling,
and userspace is expected to manage the VCPU affinity. Attempting to run
the VCPU on a CPU not present in the list will result in KVM_RUN returning
-ENOEXEC.

The expectation is that this ioctl will be used by KVM to prevent errors,
like accesses to undefined registers, when emulating VCPU features for
which hardware support is present only on a subset of the CPUs present in
the system.
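
For illustration, a minimal userspace sketch (assuming the uapi additions
from this patch are in the installed headers, and that the VCPU has not run
yet):

  #include <linux/kvm.h>
  #include <sys/ioctl.h>

  /* Sketch: restrict a VCPU to physical CPUs 0-3 before the first KVM_RUN. */
  static int restrict_vcpu_cpus(int vcpu_fd)
  {
          /* Same format as lib/bitmap.c::bitmap_parselist(). */
          const char *cpulist = "0-3";

          return ioctl(vcpu_fd, KVM_ARM_VCPU_SUPPORTED_CPUS, cpulist);
  }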

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 Documentation/virt/kvm/api.rst    | 22 ++++++++++++++++++++--
 arch/arm64/include/asm/kvm_host.h |  3 +++
 arch/arm64/kvm/arm.c              | 31 +++++++++++++++++++++++++++++++
 include/uapi/linux/kvm.h          |  4 ++++
 4 files changed, 58 insertions(+), 2 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 5aa251df7077..994faa24690a 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -396,8 +396,10 @@ Errors:
 
   =======    ==============================================================
   EINTR      an unmasked signal is pending
-  ENOEXEC    the vcpu hasn't been initialized or the guest tried to execute
-             instructions from device memory (arm64)
+  ENOEXEC    the vcpu hasn't been initialized, the guest tried to execute
+             instructions from device memory (arm64) or the vcpu has been
+             scheduled on a cpu not in the list specified by
+             KVM_ARM_VCPU_SUPPORTED_CPUS (arm64).
   ENOSYS     data abort outside memslots with no syndrome info and
              KVM_CAP_ARM_NISV_TO_USER not enabled (arm64)
   EPERM      SVE feature set but not finalized (arm64)
@@ -5293,6 +5295,22 @@ the trailing ``'\0'``, is indicated by ``name_size`` in the header.
 The Stats Data block contains an array of 64-bit values in the same order
 as the descriptors in Descriptors block.
 
+4.134 KVM_ARM_VCPU_SUPPORTED_CPUS
+---------------------------------
+
+:Capability: KVM_CAP_ARM_SUPPORTED_CPUS
+:Architectures: arm64
+:Type: vcpu ioctl
+:Parameters: const char * representing a range of supported CPUs
+:Returns: 0 on success, < 0 on error
+
+Specifies a list of physical CPUs on which the VCPU can run. KVM will not make
+any attempts to prevent the VCPU from being scheduled on a CPU which is not
+present in the list; when that happens, KVM_RUN will return -ENOEXEC.
+
+The format for the range of supported CPUs is specified in the comment for
+the function lib/bitmap.c::bitmap_parselist().
+
 5. The kvm_run structure
 ========================
 
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index a57f33368a3e..1f3b46a6df81 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -396,6 +396,9 @@ struct kvm_vcpu_arch {
 	 * see kvm_vcpu_load_sysregs_vhe and kvm_vcpu_put_sysregs_vhe. */
 	bool sysregs_loaded_on_cpu;
 
+	cpumask_t supported_cpus;
+	bool cpu_not_supported;
+
 	/* Guest PV state */
 	struct {
 		u64 last_steal;
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 4bd4b8b082a4..e8a7c0c3a086 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -301,6 +301,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_ARM_LOCK_USER_MEMORY_REGION:
 		r = 1;
 		break;
+	case KVM_CAP_ARM_VCPU_SUPPORTED_CPUS:
+		r = 1;
+		break;
 	default:
 		r = 0;
 	}
@@ -456,6 +459,10 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	if (vcpu_has_ptrauth(vcpu))
 		vcpu_ptrauth_disable(vcpu);
 	kvm_arch_vcpu_load_debug_state_flags(vcpu);
+
+	if (!cpumask_empty(&vcpu->arch.supported_cpus) &&
+	    !cpumask_test_cpu(smp_processor_id(), &vcpu->arch.supported_cpus))
+		vcpu->arch.cpu_not_supported = true;
 }
 
 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
@@ -844,6 +851,13 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 		 */
 		preempt_disable();
 
+		if (unlikely(vcpu->arch.cpu_not_supported)) {
+			vcpu->arch.cpu_not_supported = false;
+			ret = -ENOEXEC;
+			preempt_enable();
+			continue;
+		}
+
 		kvm_pmu_flush_hwstate(vcpu);
 
 		local_irq_disable();
@@ -1361,6 +1375,23 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
 
 		return kvm_arm_vcpu_set_events(vcpu, &events);
 	}
+	case KVM_ARM_VCPU_SUPPORTED_CPUS: {
+		char *cpulist;
+
+		r = -ENOEXEC;
+		if (unlikely(vcpu->arch.has_run_once))
+			break;
+
+		cpulist = strndup_user((const char __user *)argp, PAGE_SIZE);
+		if (IS_ERR(cpulist)) {
+			r = PTR_ERR(cpulist);
+			break;
+		}
+
+		r = cpulist_parse(cpulist, &vcpu->arch.supported_cpus);
+		kfree(cpulist);
+		break;
+	}
 	case KVM_ARM_VCPU_FINALIZE: {
 		int what;
 
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index bcf62c7bdd2d..e5acc925c528 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1113,6 +1113,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_EXIT_ON_EMULATION_FAILURE 204
 #define KVM_CAP_ARM_MTE 205
 #define KVM_CAP_ARM_LOCK_USER_MEMORY_REGION 206
+#define KVM_CAP_ARM_VCPU_SUPPORTED_CPUS 207
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -1594,6 +1595,9 @@ struct kvm_enc_region {
 #define KVM_S390_NORMAL_RESET	_IO(KVMIO,   0xc3)
 #define KVM_S390_CLEAR_RESET	_IO(KVMIO,   0xc4)
 
+/* Available with KVM_CAP_ARM_VCPU_SUPPORTED_CPUS */
+#define KVM_ARM_VCPU_SUPPORTED_CPUS    _IOW(KVMIO, 0xc5, const char *)
+
 struct kvm_s390_pv_sec_parm {
 	__u64 origin;
 	__u64 length;
-- 
2.33.0


* [RFC PATCH v4 13/39] KVM: arm64: Add CONFIG_KVM_ARM_SPE Kconfig option
  2021-08-25 16:17 [RFC PATCH v4 00/39] KVM: arm64: Add Statistical Profiling Extension (SPE) support Alexandru Elisei
                   ` (11 preceding siblings ...)
  2021-08-25 16:17 ` [RFC PATCH v4 12/39] KVM: arm64: Add the KVM_ARM_VCPU_SUPPORTED_CPUS VCPU ioctl Alexandru Elisei
@ 2021-08-25 16:17 ` Alexandru Elisei
  2021-08-25 16:17 ` [RFC PATCH v4 14/39] KVM: arm64: Add SPE capability and VCPU feature Alexandru Elisei
                   ` (26 subsequent siblings)
  39 siblings, 0 replies; 47+ messages in thread
From: Alexandru Elisei @ 2021-08-25 16:17 UTC (permalink / raw)
  To: maz, james.morse, suzuki.poulose, linux-arm-kernel, kvmarm, will,
	linux-kernel

Add a new configuration option that will be used for KVM SPE emulation.
CONFIG_KVM_ARM_SPE depends on the SPE driver being builtin because:

1. The cpumask of physical CPUs that support SPE will be used by KVM to
   emulate SPE on heterogeneous systems.

2. KVM will rely on the SPE driver enabling the SPE interrupt at the GIC
   level.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 arch/arm64/kvm/Kconfig | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index a4eba0908bfa..c6ad5a05efb3 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -46,6 +46,14 @@ if KVM
 
 source "virt/kvm/Kconfig"
 
+config KVM_ARM_SPE
+	bool "Virtual Statistical Profiling Extension (SPE) support"
+	depends on ARM_SPE_PMU=y
+	default y
+	help
+	  Adds support for Statistical Profiling Extension (SPE) in virtual
+	  machines.
+
 endif # KVM
 
 endif # VIRTUALIZATION
-- 
2.33.0


* [RFC PATCH v4 14/39] KVM: arm64: Add SPE capability and VCPU feature
  2021-08-25 16:17 [RFC PATCH v4 00/39] KVM: arm64: Add Statistical Profiling Extension (SPE) support Alexandru Elisei
                   ` (12 preceding siblings ...)
  2021-08-25 16:17 ` [RFC PATCH v4 13/39] KVM: arm64: Add CONFIG_KVM_ARM_SPE Kconfig option Alexandru Elisei
@ 2021-08-25 16:17 ` Alexandru Elisei
  2021-08-25 16:17 ` [RFC PATCH v4 15/39] drivers/perf: Expose the cpumask of CPUs that support SPE Alexandru Elisei
                   ` (25 subsequent siblings)
  39 siblings, 0 replies; 47+ messages in thread
From: Alexandru Elisei @ 2021-08-25 16:17 UTC (permalink / raw)
  To: maz, james.morse, suzuki.poulose, linux-arm-kernel, kvmarm, will,
	linux-kernel

Add the KVM_CAP_ARM_SPE capability which allows userspace to discover if
SPE emulation is available. Add the KVM_ARM_VCPU_SPE feature which
enables the emulation for a VCPU.

Both are disabled for now.
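
For reference, the intended usage from a VMM is sketched below (hypothetical
snippet based on the uapi additions above; with this patch the capability
still reports 0, so the SPE branch is never taken yet):

  #include <linux/kvm.h>
  #include <sys/ioctl.h>

  /* Sketch: request SPE for a VCPU if the host advertises the capability. */
  static int vcpu_init_with_spe(int vm_fd, int vcpu_fd)
  {
          struct kvm_vcpu_init init = { 0 };

          if (ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_ARM_SPE) <= 0)
                  return -1;      /* SPE emulation not available */

          ioctl(vm_fd, KVM_ARM_PREFERRED_TARGET, &init);
          init.features[0] |= 1 << KVM_ARM_VCPU_SPE;

          /* Fails if KVM refuses the SPE feature for this VCPU. */
          return ioctl(vcpu_fd, KVM_ARM_VCPU_INIT, &init);
  }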

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 Documentation/virt/kvm/api.rst    | 9 +++++++++
 arch/arm64/include/uapi/asm/kvm.h | 1 +
 arch/arm64/kvm/arm.c              | 3 +++
 include/uapi/linux/kvm.h          | 1 +
 4 files changed, 14 insertions(+)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 994faa24690a..68fada258b80 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -7315,3 +7315,12 @@ The argument to KVM_ENABLE_CAP is also a bitmask, and must be a subset
 of the result of KVM_CHECK_EXTENSION.  KVM will forward to userspace
 the hypercalls whose corresponding bit is in the argument, and return
 ENOSYS for the others.
+
+8.35 KVM_CAP_ARM_SPE
+--------------------
+
+:Architectures: arm64
+
+This capability indicates that Statistical Profiling Extension (SPE) emulation
+is available in KVM. SPE emulation is enabled for each VCPU which has the
+feature bit KVM_ARM_VCPU_SPE set.
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index b3edde68bc3e..9f0a8ea50ea9 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -106,6 +106,7 @@ struct kvm_regs {
 #define KVM_ARM_VCPU_SVE		4 /* enable SVE for this CPU */
 #define KVM_ARM_VCPU_PTRAUTH_ADDRESS	5 /* VCPU uses address authentication */
 #define KVM_ARM_VCPU_PTRAUTH_GENERIC	6 /* VCPU uses generic authentication */
+#define KVM_ARM_VCPU_SPE		7 /* enable SPE for this CPU */
 
 struct kvm_vcpu_init {
 	__u32 target;
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index e8a7c0c3a086..22544eb367f3 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -304,6 +304,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_ARM_VCPU_SUPPORTED_CPUS:
 		r = 1;
 		break;
+	case KVM_CAP_ARM_SPE:
+		r = 0;
+		break;
 	default:
 		r = 0;
 	}
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index e5acc925c528..930ef91f7916 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1114,6 +1114,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_ARM_MTE 205
 #define KVM_CAP_ARM_LOCK_USER_MEMORY_REGION 206
 #define KVM_CAP_ARM_VCPU_SUPPORTED_CPUS 207
+#define KVM_CAP_ARM_SPE 208
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
-- 
2.33.0


* [RFC PATCH v4 15/39] drivers/perf: Expose the cpumask of CPUs that support SPE
  2021-08-25 16:17 [RFC PATCH v4 00/39] KVM: arm64: Add Statistical Profiling Extension (SPE) support Alexandru Elisei
                   ` (13 preceding siblings ...)
  2021-08-25 16:17 ` [RFC PATCH v4 14/39] KVM: arm64: Add SPE capability and VCPU feature Alexandru Elisei
@ 2021-08-25 16:17 ` Alexandru Elisei
  2021-08-25 16:17 ` [RFC PATCH v4 16/39] KVM: arm64: Make SPE available when at least one CPU supports it Alexandru Elisei
                   ` (24 subsequent siblings)
  39 siblings, 0 replies; 47+ messages in thread
From: Alexandru Elisei @ 2021-08-25 16:17 UTC (permalink / raw)
  To: maz, james.morse, suzuki.poulose, linux-arm-kernel, kvmarm, will,
	linux-kernel

KVM SPE emulation will require the list of CPUs that have hardware support
for SPE. Add a function that returns the cpumask of supported CPUs
discovered by the driver during probing.

CC: Will Deacon <will@kernel.org>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 drivers/perf/arm_spe_pmu.c   | 30 ++++++++++++++++++------------
 include/linux/perf/arm_pmu.h |  7 +++++++
 2 files changed, 25 insertions(+), 12 deletions(-)

diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c
index d44bcc29d99c..40749665f102 100644
--- a/drivers/perf/arm_spe_pmu.c
+++ b/drivers/perf/arm_spe_pmu.c
@@ -50,7 +50,6 @@ struct arm_spe_pmu_buf {
 struct arm_spe_pmu {
 	struct pmu				pmu;
 	struct platform_device			*pdev;
-	cpumask_t				supported_cpus;
 	struct hlist_node			hotplug_node;
 
 	int					irq; /* PPI */
@@ -72,6 +71,8 @@ struct arm_spe_pmu {
 	struct perf_output_handle __percpu	*handle;
 };
 
+static cpumask_t supported_cpus;
+
 #define to_spe_pmu(p) (container_of(p, struct arm_spe_pmu, pmu))
 
 /* Convert a free-running index from perf into an SPE buffer offset */
@@ -234,9 +235,7 @@ static const struct attribute_group arm_spe_pmu_format_group = {
 static ssize_t cpumask_show(struct device *dev,
 			    struct device_attribute *attr, char *buf)
 {
-	struct arm_spe_pmu *spe_pmu = dev_get_drvdata(dev);
-
-	return cpumap_print_to_pagebuf(true, buf, &spe_pmu->supported_cpus);
+	return cpumap_print_to_pagebuf(true, buf, &supported_cpus);
 }
 static DEVICE_ATTR_RO(cpumask);
 
@@ -677,7 +676,7 @@ static int arm_spe_pmu_event_init(struct perf_event *event)
 		return -ENOENT;
 
 	if (event->cpu >= 0 &&
-	    !cpumask_test_cpu(event->cpu, &spe_pmu->supported_cpus))
+	    !cpumask_test_cpu(event->cpu, &supported_cpus))
 		return -ENOENT;
 
 	if (arm_spe_event_to_pmsevfr(event) & arm_spe_pmsevfr_res0(spe_pmu->pmsver))
@@ -797,11 +796,10 @@ static void arm_spe_pmu_stop(struct perf_event *event, int flags)
 static int arm_spe_pmu_add(struct perf_event *event, int flags)
 {
 	int ret = 0;
-	struct arm_spe_pmu *spe_pmu = to_spe_pmu(event->pmu);
 	struct hw_perf_event *hwc = &event->hw;
 	int cpu = event->cpu == -1 ? smp_processor_id() : event->cpu;
 
-	if (!cpumask_test_cpu(cpu, &spe_pmu->supported_cpus))
+	if (!cpumask_test_cpu(cpu, &supported_cpus))
 		return -ENOENT;
 
 	hwc->state = PERF_HES_UPTODATE | PERF_HES_STOPPED;
@@ -883,6 +881,13 @@ static void arm_spe_pmu_free_aux(void *aux)
 	kfree(buf);
 }
 
+#if IS_BUILTIN(CONFIG_ARM_SPE_PMU)
+const cpumask_t *arm_spe_pmu_supported_cpus(void)
+{
+	return &supported_cpus;
+}
+#endif
+
 /* Initialisation and teardown functions */
 static int arm_spe_pmu_perf_init(struct arm_spe_pmu *spe_pmu)
 {
@@ -1039,7 +1044,7 @@ static void __arm_spe_pmu_dev_probe(void *info)
 
 	dev_info(dev,
 		 "probed for CPUs %*pbl [max_record_sz %u, align %u, features 0x%llx]\n",
-		 cpumask_pr_args(&spe_pmu->supported_cpus),
+		 cpumask_pr_args(&supported_cpus),
 		 spe_pmu->max_record_sz, spe_pmu->align, spe_pmu->features);
 
 	spe_pmu->features |= SPE_PMU_FEAT_DEV_PROBED;
@@ -1083,7 +1088,7 @@ static int arm_spe_pmu_cpu_startup(unsigned int cpu, struct hlist_node *node)
 	struct arm_spe_pmu *spe_pmu;
 
 	spe_pmu = hlist_entry_safe(node, struct arm_spe_pmu, hotplug_node);
-	if (!cpumask_test_cpu(cpu, &spe_pmu->supported_cpus))
+	if (!cpumask_test_cpu(cpu, &supported_cpus))
 		return 0;
 
 	__arm_spe_pmu_setup_one(spe_pmu);
@@ -1095,7 +1100,7 @@ static int arm_spe_pmu_cpu_teardown(unsigned int cpu, struct hlist_node *node)
 	struct arm_spe_pmu *spe_pmu;
 
 	spe_pmu = hlist_entry_safe(node, struct arm_spe_pmu, hotplug_node);
-	if (!cpumask_test_cpu(cpu, &spe_pmu->supported_cpus))
+	if (!cpumask_test_cpu(cpu, &supported_cpus))
 		return 0;
 
 	__arm_spe_pmu_stop_one(spe_pmu);
@@ -1105,7 +1110,7 @@ static int arm_spe_pmu_cpu_teardown(unsigned int cpu, struct hlist_node *node)
 static int arm_spe_pmu_dev_init(struct arm_spe_pmu *spe_pmu)
 {
 	int ret;
-	cpumask_t *mask = &spe_pmu->supported_cpus;
+	cpumask_t *mask = &supported_cpus;
 
 	/* Make sure we probe the hardware on a relevant CPU */
 	ret = smp_call_function_any(mask,  __arm_spe_pmu_dev_probe, spe_pmu, 1);
@@ -1151,7 +1156,7 @@ static int arm_spe_pmu_irq_probe(struct arm_spe_pmu *spe_pmu)
 		return -EINVAL;
 	}
 
-	if (irq_get_percpu_devid_partition(irq, &spe_pmu->supported_cpus)) {
+	if (irq_get_percpu_devid_partition(irq, &supported_cpus)) {
 		dev_err(&pdev->dev, "failed to get PPI partition (%d)\n", irq);
 		return -EINVAL;
 	}
@@ -1216,6 +1221,7 @@ static int arm_spe_pmu_device_probe(struct platform_device *pdev)
 	arm_spe_pmu_dev_teardown(spe_pmu);
 out_free_handle:
 	free_percpu(spe_pmu->handle);
+	cpumask_clear(&supported_cpus);
 	return ret;
 }
 
diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h
index 505480217cf1..3121d95c575b 100644
--- a/include/linux/perf/arm_pmu.h
+++ b/include/linux/perf/arm_pmu.h
@@ -8,6 +8,7 @@
 #ifndef __ARM_PMU_H__
 #define __ARM_PMU_H__
 
+#include <linux/cpumask.h>
 #include <linux/interrupt.h>
 #include <linux/perf_event.h>
 #include <linux/platform_device.h>
@@ -177,4 +178,10 @@ void armpmu_free_irq(int irq, int cpu);
 
 #define ARMV8_SPE_PDEV_NAME "arm,spe-v1"
 
+#if IS_BUILTIN(CONFIG_ARM_SPE_PMU)
+extern const cpumask_t *arm_spe_pmu_supported_cpus(void);
+#else
+static inline const cpumask_t *arm_spe_pmu_supported_cpus(void) { return NULL; };
+#endif
+
 #endif /* __ARM_PMU_H__ */
-- 
2.33.0


* [RFC PATCH v4 16/39] KVM: arm64: Make SPE available when at least one CPU supports it
  2021-08-25 16:17 [RFC PATCH v4 00/39] KVM: arm64: Add Statistical Profiling Extension (SPE) support Alexandru Elisei
                   ` (14 preceding siblings ...)
  2021-08-25 16:17 ` [RFC PATCH v4 15/39] drivers/perf: Expose the cpumask of CPUs that support SPE Alexandru Elisei
@ 2021-08-25 16:17 ` Alexandru Elisei
  2021-08-25 16:17 ` [RFC PATCH v4 17/39] KVM: arm64: Set the VCPU SPE feature bit when SPE is available Alexandru Elisei
                   ` (23 subsequent siblings)
  39 siblings, 0 replies; 47+ messages in thread
From: Alexandru Elisei @ 2021-08-25 16:17 UTC (permalink / raw)
  To: maz, james.morse, suzuki.poulose, linux-arm-kernel, kvmarm, will,
	linux-kernel

KVM SPE emulation requires at least one physical CPU to support SPE.
Initialize the cpumask of PEs that support SPE the first time
KVM_CAP_ARM_SPE is queried or when the first virtual machine is created,
whichever comes first.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 arch/arm64/include/asm/kvm_spe.h | 26 ++++++++++++++++++++++++++
 arch/arm64/kvm/Makefile          |  1 +
 arch/arm64/kvm/arm.c             |  4 ++++
 arch/arm64/kvm/spe.c             | 32 ++++++++++++++++++++++++++++++++
 4 files changed, 63 insertions(+)
 create mode 100644 arch/arm64/include/asm/kvm_spe.h
 create mode 100644 arch/arm64/kvm/spe.c

diff --git a/arch/arm64/include/asm/kvm_spe.h b/arch/arm64/include/asm/kvm_spe.h
new file mode 100644
index 000000000000..328115ce0b48
--- /dev/null
+++ b/arch/arm64/include/asm/kvm_spe.h
@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2021 - ARM Ltd
+ */
+
+#ifndef __ARM64_KVM_SPE_H__
+#define __ARM64_KVM_SPE_H__
+
+#ifdef CONFIG_KVM_ARM_SPE
+DECLARE_STATIC_KEY_FALSE(kvm_spe_available);
+
+static __always_inline bool kvm_supports_spe(void)
+{
+	return static_branch_likely(&kvm_spe_available);
+}
+
+void kvm_spe_init_supported_cpus(void);
+void kvm_spe_vm_init(struct kvm *kvm);
+#else
+#define kvm_supports_spe()	(false)
+
+static inline void kvm_spe_init_supported_cpus(void) {}
+static inline void kvm_spe_vm_init(struct kvm *kvm) {}
+#endif /* CONFIG_KVM_ARM_SPE */
+
+#endif /* __ARM64_KVM_SPE_H__ */
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index 989bb5dad2c8..86092a0f8367 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -25,3 +25,4 @@ kvm-y := $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o \
 	 vgic/vgic-its.o vgic/vgic-debug.o
 
 kvm-$(CONFIG_HW_PERF_EVENTS)  += pmu-emul.o
+kvm-$(CONFIG_KVM_ARM_SPE)     += spe.o
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 22544eb367f3..82cb7b5b3b45 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -35,6 +35,7 @@
 #include <asm/kvm_arm.h>
 #include <asm/kvm_asm.h>
 #include <asm/kvm_mmu.h>
+#include <asm/kvm_spe.h>
 #include <asm/kvm_emulate.h>
 #include <asm/sections.h>
 
@@ -180,6 +181,8 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 
 	set_default_spectre(kvm);
 
+	kvm_spe_vm_init(kvm);
+
 	return ret;
 out_free_stage2_pgd:
 	kvm_free_stage2_pgd(&kvm->arch.mmu);
@@ -305,6 +308,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 		r = 1;
 		break;
 	case KVM_CAP_ARM_SPE:
+		kvm_spe_init_supported_cpus();
 		r = 0;
 		break;
 	default:
diff --git a/arch/arm64/kvm/spe.c b/arch/arm64/kvm/spe.c
new file mode 100644
index 000000000000..83f92245f881
--- /dev/null
+++ b/arch/arm64/kvm/spe.c
@@ -0,0 +1,32 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2021 - ARM Ltd
+ */
+
+#include <linux/cpumask.h>
+#include <linux/kvm_host.h>
+#include <linux/perf/arm_pmu.h>
+
+#include <asm/kvm_spe.h>
+
+DEFINE_STATIC_KEY_FALSE(kvm_spe_available);
+
+static const cpumask_t *supported_cpus;
+
+void kvm_spe_init_supported_cpus(void)
+{
+	if (likely(supported_cpus))
+		return;
+
+	supported_cpus = arm_spe_pmu_supported_cpus();
+	BUG_ON(!supported_cpus);
+
+	if (!cpumask_empty(supported_cpus))
+		static_branch_enable(&kvm_spe_available);
+}
+
+void kvm_spe_vm_init(struct kvm *kvm)
+{
+	/* Set supported_cpus if it isn't already initialized. */
+	kvm_spe_init_supported_cpus();
+}
-- 
2.33.0


* [RFC PATCH v4 17/39] KVM: arm64: Set the VCPU SPE feature bit when SPE is available
  2021-08-25 16:17 [RFC PATCH v4 00/39] KVM: arm64: Add Statistical Profiling Extension (SPE) support Alexandru Elisei
                   ` (15 preceding siblings ...)
  2021-08-25 16:17 ` [RFC PATCH v4 16/39] KVM: arm64: Make SPE available when at least one CPU supports it Alexandru Elisei
@ 2021-08-25 16:17 ` Alexandru Elisei
  2021-08-25 16:17 ` [RFC PATCH v4 18/39] KVM: arm64: Expose SPE version to guests Alexandru Elisei
                   ` (22 subsequent siblings)
  39 siblings, 0 replies; 47+ messages in thread
From: Alexandru Elisei @ 2021-08-25 16:17 UTC (permalink / raw)
  To: maz, james.morse, suzuki.poulose, linux-arm-kernel, kvmarm, will,
	linux-kernel

Check that KVM SPE emulation is supported before allowing userspace to set
the KVM_ARM_VCPU_SPE feature.

According to ARM DDI 0487G.a, page D9-2946, the Profiling Buffer is disabled
if the owning Exception level is 32-bit. Reject the SPE feature if the
VCPU's execution state at EL1 is AArch32.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 arch/arm64/include/asm/kvm_host.h |  3 +++
 arch/arm64/kvm/reset.c            | 23 +++++++++++++++++++++++
 2 files changed, 26 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 1f3b46a6df81..948adb152104 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -807,6 +807,9 @@ bool kvm_arm_vcpu_is_finalized(struct kvm_vcpu *vcpu);
 #define kvm_vcpu_has_pmu(vcpu)					\
 	(test_bit(KVM_ARM_VCPU_PMU_V3, (vcpu)->arch.features))
 
+#define kvm_vcpu_has_spe(vcpu)					\
+	(test_bit(KVM_ARM_VCPU_SPE, (vcpu)->arch.features))
+
 int kvm_trng_call(struct kvm_vcpu *vcpu);
 #ifdef CONFIG_KVM
 extern phys_addr_t hyp_mem_base;
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index cba7872d69a8..17b9f1b29c24 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -27,6 +27,7 @@
 #include <asm/kvm_asm.h>
 #include <asm/kvm_emulate.h>
 #include <asm/kvm_mmu.h>
+#include <asm/kvm_spe.h>
 #include <asm/virt.h>
 
 /* Maximum phys_shift supported for any VM on this host */
@@ -189,6 +190,21 @@ static bool vcpu_allowed_register_width(struct kvm_vcpu *vcpu)
 	return true;
 }
 
+static int kvm_vcpu_enable_spe(struct kvm_vcpu *vcpu)
+{
+	if (!kvm_supports_spe())
+		return -EINVAL;
+
+	/*
+	 * The Profiling Buffer is disabled if the owning Exception level is
+	 * aarch32.
+	 */
+	if (vcpu_has_feature(vcpu, KVM_ARM_VCPU_EL1_32BIT))
+		return -EINVAL;
+
+	return 0;
+}
+
 /**
  * kvm_reset_vcpu - sets core registers and sys_regs to reset value
  * @vcpu: The VCPU pointer
@@ -245,6 +261,13 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
 		goto out;
 	}
 
+	if (kvm_vcpu_has_spe(vcpu)) {
+		if (kvm_vcpu_enable_spe(vcpu)) {
+			ret = -EINVAL;
+			goto out;
+		}
+	}
+
 	switch (vcpu->arch.target) {
 	default:
 		if (test_bit(KVM_ARM_VCPU_EL1_32BIT, vcpu->arch.features)) {
-- 
2.33.0


* [RFC PATCH v4 18/39] KVM: arm64: Expose SPE version to guests
  2021-08-25 16:17 [RFC PATCH v4 00/39] KVM: arm64: Add Statistical Profiling Extension (SPE) support Alexandru Elisei
                   ` (16 preceding siblings ...)
  2021-08-25 16:17 ` [RFC PATCH v4 17/39] KVM: arm64: Set the VCPU SPE feature bit when SPE is available Alexandru Elisei
@ 2021-08-25 16:17 ` Alexandru Elisei
  2021-08-25 16:17 ` [RFC PATCH v4 19/39] KVM: arm64: Do not emulate SPE on CPUs which don't have SPE Alexandru Elisei
                   ` (21 subsequent siblings)
  39 siblings, 0 replies; 47+ messages in thread
From: Alexandru Elisei @ 2021-08-25 16:17 UTC (permalink / raw)
  To: maz, james.morse, suzuki.poulose, linux-arm-kernel, kvmarm, will,
	linux-kernel

Set the ID_AA64DFR0_EL1.PMSVer field to a non-zero value if the VCPU SPE
feature is set. The SPE version is capped at FEAT_SPEv1p1 because KVM doesn't
yet implement freezing of PMU event counters on an SPE buffer management
event.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 arch/arm64/kvm/sys_regs.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index f6f126eb6ac1..ab7370b7a44b 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1070,8 +1070,10 @@ static u64 read_id_reg(const struct kvm_vcpu *vcpu,
 		val = cpuid_feature_cap_perfmon_field(val,
 						      ID_AA64DFR0_PMUVER_SHIFT,
 						      kvm_vcpu_has_pmu(vcpu) ? ID_AA64DFR0_PMUVER_8_4 : 0);
-		/* Hide SPE from guests */
-		val &= ~FEATURE(ID_AA64DFR0_PMSVER);
+		/* Limit guests to SPE for ARMv8.3 */
+		val = cpuid_feature_cap_perfmon_field(val,
+						      ID_AA64DFR0_PMSVER_SHIFT,
+						      kvm_vcpu_has_spe(vcpu) ? ID_AA64DFR0_PMSVER_8_3 : 0);
 		break;
 	case SYS_ID_DFR0_EL1:
 		/* Limit guests to PMUv3 for ARMv8.4 */
-- 
2.33.0


* [RFC PATCH v4 19/39] KVM: arm64: Do not emulate SPE on CPUs which don't have SPE
  2021-08-25 16:17 [RFC PATCH v4 00/39] KVM: arm64: Add Statistical Profiling Extension (SPE) support Alexandru Elisei
                   ` (17 preceding siblings ...)
  2021-08-25 16:17 ` [RFC PATCH v4 18/39] KVM: arm64: Expose SPE version to guests Alexandru Elisei
@ 2021-08-25 16:17 ` Alexandru Elisei
  2021-08-25 16:17 ` [RFC PATCH v4 20/39] KVM: arm64: Add a new VCPU device control group for SPE Alexandru Elisei
                   ` (20 subsequent siblings)
  39 siblings, 0 replies; 47+ messages in thread
From: Alexandru Elisei @ 2021-08-25 16:17 UTC (permalink / raw)
  To: maz, james.morse, suzuki.poulose, linux-arm-kernel, kvmarm, will,
	linux-kernel

The kernel allows heterogeneous systems where FEAT_SPE is not present on
all CPUs. This presents a challenge for KVM, as it will have to touch the
SPE registers when emulating SPE for a guest, and those accesses will cause
an undefined exception if SPE is not present on the CPU.

Avoid this situation by comparing the cpumask of the physical CPUs that
support SPE with the CPU list provided by userspace via the
KVM_ARM_VCPU_SUPPORTED_CPUS ioctl, and by refusing to run the VCPU if there
is a mismatch.
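
The expected userspace flow on such a system is sketched below (hypothetical:
the sysfs path assumes the SPE PMU is registered as arm_spe_0, and error
handling is kept minimal):

  #include <stdio.h>
  #include <string.h>
  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  /* Sketch: allow the VCPU only on the CPUs advertised by the SPE driver. */
  static int match_vcpu_to_spe_cpus(int vcpu_fd)
  {
          char cpulist[256];
          FILE *f;

          f = fopen("/sys/bus/event_source/devices/arm_spe_0/cpumask", "r");
          if (!f)
                  return -1;
          if (!fgets(cpulist, sizeof(cpulist), f)) {
                  fclose(f);
                  return -1;
          }
          fclose(f);
          cpulist[strcspn(cpulist, "\n")] = '\0';

          /*
           * The VMM is still responsible for pinning the VCPU thread to
           * these CPUs, e.g. with sched_setaffinity().
           */
          return ioctl(vcpu_fd, KVM_ARM_VCPU_SUPPORTED_CPUS, cpulist);
  }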

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 arch/arm64/include/asm/kvm_spe.h |  2 ++
 arch/arm64/kvm/arm.c             |  3 +++
 arch/arm64/kvm/spe.c             | 12 ++++++++++++
 3 files changed, 17 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_spe.h b/arch/arm64/include/asm/kvm_spe.h
index 328115ce0b48..ed67ddbf8132 100644
--- a/arch/arm64/include/asm/kvm_spe.h
+++ b/arch/arm64/include/asm/kvm_spe.h
@@ -16,11 +16,13 @@ static __always_inline bool kvm_supports_spe(void)
 
 void kvm_spe_init_supported_cpus(void);
 void kvm_spe_vm_init(struct kvm *kvm);
+int kvm_spe_check_supported_cpus(struct kvm_vcpu *vcpu);
 #else
 #define kvm_supports_spe()	(false)
 
 static inline void kvm_spe_init_supported_cpus(void) {}
 static inline void kvm_spe_vm_init(struct kvm *kvm) {}
+static inline int kvm_spe_check_supported_cpus(struct kvm_vcpu *vcpu) { return -ENOEXEC; }
 #endif /* CONFIG_KVM_ARM_SPE */
 
 #endif /* __ARM64_KVM_SPE_H__ */
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 82cb7b5b3b45..8f7025f2e4a0 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -633,6 +633,9 @@ static int kvm_vcpu_first_run_init(struct kvm_vcpu *vcpu)
 	if (!kvm_arm_vcpu_is_finalized(vcpu))
 		return -EPERM;
 
+	if (kvm_vcpu_has_spe(vcpu) && kvm_spe_check_supported_cpus(vcpu))
+		return -EPERM;
+
 	vcpu->arch.has_run_once = true;
 
 	kvm_arm_vcpu_init_debug(vcpu);
diff --git a/arch/arm64/kvm/spe.c b/arch/arm64/kvm/spe.c
index 83f92245f881..8d2afc137151 100644
--- a/arch/arm64/kvm/spe.c
+++ b/arch/arm64/kvm/spe.c
@@ -30,3 +30,15 @@ void kvm_spe_vm_init(struct kvm *kvm)
 	/* Set supported_cpus if it isn't already initialized. */
 	kvm_spe_init_supported_cpus();
 }
+
+int kvm_spe_check_supported_cpus(struct kvm_vcpu *vcpu)
+{
+	/* SPE is supported on all CPUs, we don't care about the VCPU mask */
+	if (cpumask_equal(supported_cpus, cpu_possible_mask))
+		return 0;
+
+	if (!cpumask_subset(&vcpu->arch.supported_cpus, supported_cpus))
+		return -ENOEXEC;
+
+	return 0;
+}
-- 
2.33.0


* [RFC PATCH v4 20/39] KVM: arm64: Add a new VCPU device control group for SPE
  2021-08-25 16:17 [RFC PATCH v4 00/39] KVM: arm64: Add Statistical Profiling Extension (SPE) support Alexandru Elisei
                   ` (18 preceding siblings ...)
  2021-08-25 16:17 ` [RFC PATCH v4 19/39] KVM: arm64: Do not emulate SPE on CPUs which don't have SPE Alexandru Elisei
@ 2021-08-25 16:17 ` Alexandru Elisei
  2021-08-25 16:17 ` [RFC PATCH v4 21/39] KVM: arm64: Add SPE VCPU device attribute to set the interrupt number Alexandru Elisei
                   ` (19 subsequent siblings)
  39 siblings, 0 replies; 47+ messages in thread
From: Alexandru Elisei @ 2021-08-25 16:17 UTC (permalink / raw)
  To: maz, james.morse, suzuki.poulose, linux-arm-kernel, kvmarm, will,
	linux-kernel
  Cc: Sudeep Holla

From: Sudeep Holla <sudeep.holla@arm.com>

Add a new VCPU device control group to control various aspects of KVM's SPE
emulation. Functionality will be added in later patches.
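
For context, the group is driven with the standard VCPU device attribute
ioctls. A minimal sketch of how userspace will probe it (the attribute id is
purely illustrative, and at this point in the series every attribute still
returns -ENXIO):

  #include <linux/kvm.h>
  #include <sys/ioctl.h>

  /* Sketch: check whether the SPE control group handles a given attribute. */
  static int spe_attr_supported(int vcpu_fd, __u64 attr_id)
  {
          struct kvm_device_attr attr = {
                  .group = KVM_ARM_VCPU_SPE_CTRL,
                  .attr  = attr_id,
          };

          /* Returns 0 if the attribute exists, fails with ENXIO otherwise. */
          return ioctl(vcpu_fd, KVM_HAS_DEVICE_ATTR, &attr);
  }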

[ Alexandru E: Rewrote patch ]

Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 Documentation/virt/kvm/devices/vcpu.rst |  5 +++++
 arch/arm64/include/asm/kvm_spe.h        | 20 ++++++++++++++++++++
 arch/arm64/include/uapi/asm/kvm.h       |  1 +
 arch/arm64/kvm/guest.c                  | 10 ++++++++++
 arch/arm64/kvm/spe.c                    | 15 +++++++++++++++
 5 files changed, 51 insertions(+)

diff --git a/Documentation/virt/kvm/devices/vcpu.rst b/Documentation/virt/kvm/devices/vcpu.rst
index 2acec3b9ef65..85399c005197 100644
--- a/Documentation/virt/kvm/devices/vcpu.rst
+++ b/Documentation/virt/kvm/devices/vcpu.rst
@@ -161,3 +161,8 @@ Specifies the base address of the stolen time structure for this VCPU. The
 base address must be 64 byte aligned and exist within a valid guest memory
 region. See Documentation/virt/kvm/arm/pvtime.rst for more information
 including the layout of the stolen time structure.
+
+4. GROUP: KVM_ARM_VCPU_SPE_CTRL
+===============================
+
+:Architectures: ARM64
diff --git a/arch/arm64/include/asm/kvm_spe.h b/arch/arm64/include/asm/kvm_spe.h
index ed67ddbf8132..ce0f5b3f2027 100644
--- a/arch/arm64/include/asm/kvm_spe.h
+++ b/arch/arm64/include/asm/kvm_spe.h
@@ -17,12 +17,32 @@ static __always_inline bool kvm_supports_spe(void)
 void kvm_spe_init_supported_cpus(void);
 void kvm_spe_vm_init(struct kvm *kvm);
 int kvm_spe_check_supported_cpus(struct kvm_vcpu *vcpu);
+
+int kvm_spe_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr);
+int kvm_spe_get_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr);
+int kvm_spe_has_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr);
 #else
 #define kvm_supports_spe()	(false)
 
 static inline void kvm_spe_init_supported_cpus(void) {}
 static inline void kvm_spe_vm_init(struct kvm *kvm) {}
 static inline int kvm_spe_check_supported_cpus(struct kvm_vcpu *vcpu) { return -ENOEXEC; }
+
+static inline int kvm_spe_set_attr(struct kvm_vcpu *vcpu,
+				   struct kvm_device_attr *attr)
+{
+	return -ENXIO;
+}
+static inline int kvm_spe_get_attr(struct kvm_vcpu *vcpu,
+				   struct kvm_device_attr *attr)
+{
+	return -ENXIO;
+}
+static inline int kvm_spe_has_attr(struct kvm_vcpu *vcpu,
+				   struct kvm_device_attr *attr)
+{
+	return -ENXIO;
+}
 #endif /* CONFIG_KVM_ARM_SPE */
 
 #endif /* __ARM64_KVM_SPE_H__ */
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index 9f0a8ea50ea9..7159a1e23da2 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -368,6 +368,7 @@ struct kvm_arm_copy_mte_tags {
 #define   KVM_ARM_VCPU_TIMER_IRQ_PTIMER		1
 #define KVM_ARM_VCPU_PVTIME_CTRL	2
 #define   KVM_ARM_VCPU_PVTIME_IPA	0
+#define KVM_ARM_VCPU_SPE_CTRL		3
 
 /* KVM_IRQ_LINE irq field index values */
 #define KVM_ARM_IRQ_VCPU2_SHIFT		28
diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index 1dfb83578277..316110b5dd95 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -24,6 +24,7 @@
 #include <asm/fpsimd.h>
 #include <asm/kvm.h>
 #include <asm/kvm_emulate.h>
+#include <asm/kvm_spe.h>
 #include <asm/sigcontext.h>
 
 #include "trace.h"
@@ -962,6 +963,9 @@ int kvm_arm_vcpu_arch_set_attr(struct kvm_vcpu *vcpu,
 	case KVM_ARM_VCPU_PVTIME_CTRL:
 		ret = kvm_arm_pvtime_set_attr(vcpu, attr);
 		break;
+	case KVM_ARM_VCPU_SPE_CTRL:
+		ret = kvm_spe_set_attr(vcpu, attr);
+		break;
 	default:
 		ret = -ENXIO;
 		break;
@@ -985,6 +989,9 @@ int kvm_arm_vcpu_arch_get_attr(struct kvm_vcpu *vcpu,
 	case KVM_ARM_VCPU_PVTIME_CTRL:
 		ret = kvm_arm_pvtime_get_attr(vcpu, attr);
 		break;
+	case KVM_ARM_VCPU_SPE_CTRL:
+		ret = kvm_spe_get_attr(vcpu, attr);
+		break;
 	default:
 		ret = -ENXIO;
 		break;
@@ -1008,6 +1015,9 @@ int kvm_arm_vcpu_arch_has_attr(struct kvm_vcpu *vcpu,
 	case KVM_ARM_VCPU_PVTIME_CTRL:
 		ret = kvm_arm_pvtime_has_attr(vcpu, attr);
 		break;
+	case KVM_ARM_VCPU_SPE_CTRL:
+		ret = kvm_spe_has_attr(vcpu, attr);
+		break;
 	default:
 		ret = -ENXIO;
 		break;
diff --git a/arch/arm64/kvm/spe.c b/arch/arm64/kvm/spe.c
index 8d2afc137151..56a3fdb35623 100644
--- a/arch/arm64/kvm/spe.c
+++ b/arch/arm64/kvm/spe.c
@@ -42,3 +42,18 @@ int kvm_spe_check_supported_cpus(struct kvm_vcpu *vcpu)
 
 	return 0;
 }
+
+int kvm_spe_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
+{
+	return -ENXIO;
+}
+
+int kvm_spe_get_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
+{
+	return -ENXIO;
+}
+
+int kvm_spe_has_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
+{
+	return -ENXIO;
+}
-- 
2.33.0


* [RFC PATCH v4 21/39] KVM: arm64: Add SPE VCPU device attribute to set the interrupt number
  2021-08-25 16:17 [RFC PATCH v4 00/39] KVM: arm64: Add Statistical Profiling Extension (SPE) support Alexandru Elisei
                   ` (19 preceding siblings ...)
  2021-08-25 16:17 ` [RFC PATCH v4 20/39] KVM: arm64: Add a new VCPU device control group for SPE Alexandru Elisei
@ 2021-08-25 16:17 ` Alexandru Elisei
  2021-08-25 16:17 ` [RFC PATCH v4 22/39] KVM: arm64: Add SPE VCPU device attribute to initialize SPE Alexandru Elisei
                   ` (18 subsequent siblings)
  39 siblings, 0 replies; 47+ messages in thread
From: Alexandru Elisei @ 2021-08-25 16:17 UTC (permalink / raw)
  To: maz, james.morse, suzuki.poulose, linux-arm-kernel, kvmarm, will,
	linux-kernel
  Cc: Sudeep Holla

From: Sudeep Holla <sudeep.holla@arm.com>

Add KVM_ARM_VCPU_SPE_CTRL(KVM_ARM_VCPU_SPE_IRQ) to allow the user to set
the interrupt number for the buffer management interrupt.
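
A minimal userspace sketch, assuming the uapi additions from this patch (the
PPI number below is only an example; the kernel-side checks decide what is
accepted):

  #include <linux/kvm.h>
  #include <sys/ioctl.h>

  /* Sketch: set the buffer management interrupt; must be done on every
   * VCPU, with the same interrupt number. */
  static int spe_set_irq(int vcpu_fd, int irq)
  {
          struct kvm_device_attr attr = {
                  .group = KVM_ARM_VCPU_SPE_CTRL,
                  .attr  = KVM_ARM_VCPU_SPE_IRQ,
                  .addr  = (__u64)(unsigned long)&irq,
          };

          return ioctl(vcpu_fd, KVM_SET_DEVICE_ATTR, &attr);
  }

  /* e.g. spe_set_irq(vcpu_fd, 21); -- 21 is a PPI INTID, chosen arbitrarily */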

[ Alexandru E: Split from "KVM: arm64: Add a new VCPU device control group
               for SPE" ]

Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 Documentation/virt/kvm/devices/vcpu.rst | 19 ++++++
 arch/arm64/include/asm/kvm_host.h       |  2 +
 arch/arm64/include/asm/kvm_spe.h        | 10 +++
 arch/arm64/include/uapi/asm/kvm.h       |  1 +
 arch/arm64/kvm/spe.c                    | 86 +++++++++++++++++++++++++
 5 files changed, 118 insertions(+)

diff --git a/Documentation/virt/kvm/devices/vcpu.rst b/Documentation/virt/kvm/devices/vcpu.rst
index 85399c005197..05821d40849f 100644
--- a/Documentation/virt/kvm/devices/vcpu.rst
+++ b/Documentation/virt/kvm/devices/vcpu.rst
@@ -166,3 +166,22 @@ including the layout of the stolen time structure.
 ===============================
 
 :Architectures: ARM64
+
+4.1 ATTRIBUTE: KVM_ARM_VCPU_SPE_IRQ
+-----------------------------------
+
+:Parameters: in kvm_device_attr.addr the address for the Profiling Buffer
+             management interrupt number as a pointer to an int
+
+Returns:
+
+	 =======  ======================================================
+	 -EBUSY   Interrupt number already set for this VCPU
+	 -EFAULT  Error accessing the buffer management interrupt number
+	 -EINVAL  Invalid interrupt number
+	 -ENXIO   SPE not supported or not properly configured
+	 =======  ======================================================
+
+Specifies the Profiling Buffer management interrupt number. The interrupt number
+must be a PPI and the interrupt number must be the same for each VCPU. SPE
+emulation requires an in-kernel vGIC implementation.
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 948adb152104..7b957e439b3d 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -26,6 +26,7 @@
 #include <asm/fpsimd.h>
 #include <asm/kvm.h>
 #include <asm/kvm_asm.h>
+#include <asm/kvm_spe.h>
 #include <asm/thread_info.h>
 
 #define __KVM_HAVE_ARCH_INTC_INITIALIZED
@@ -353,6 +354,7 @@ struct kvm_vcpu_arch {
 	struct vgic_cpu vgic_cpu;
 	struct arch_timer_cpu timer_cpu;
 	struct kvm_pmu pmu;
+	struct kvm_vcpu_spe spe;
 
 	/*
 	 * Anything that is not used directly from assembly code goes
diff --git a/arch/arm64/include/asm/kvm_spe.h b/arch/arm64/include/asm/kvm_spe.h
index ce0f5b3f2027..2fe11868719d 100644
--- a/arch/arm64/include/asm/kvm_spe.h
+++ b/arch/arm64/include/asm/kvm_spe.h
@@ -6,6 +6,8 @@
 #ifndef __ARM64_KVM_SPE_H__
 #define __ARM64_KVM_SPE_H__
 
+#include <linux/kvm.h>
+
 #ifdef CONFIG_KVM_ARM_SPE
 DECLARE_STATIC_KEY_FALSE(kvm_spe_available);
 
@@ -14,6 +16,11 @@ static __always_inline bool kvm_supports_spe(void)
 	return static_branch_likely(&kvm_spe_available);
 }
 
+struct kvm_vcpu_spe {
+	bool initialized;	/* SPE initialized for the VCPU */
+	int irq_num;		/* Buffer management interrupt number */
+};
+
 void kvm_spe_init_supported_cpus(void);
 void kvm_spe_vm_init(struct kvm *kvm);
 int kvm_spe_check_supported_cpus(struct kvm_vcpu *vcpu);
@@ -24,6 +31,9 @@ int kvm_spe_has_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr);
 #else
 #define kvm_supports_spe()	(false)
 
+struct kvm_vcpu_spe {
+};
+
 static inline void kvm_spe_init_supported_cpus(void) {}
 static inline void kvm_spe_vm_init(struct kvm *kvm) {}
 static inline int kvm_spe_check_supported_cpus(struct kvm_vcpu *vcpu) { return -ENOEXEC; }
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index 7159a1e23da2..c55d94a1a8f5 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -369,6 +369,7 @@ struct kvm_arm_copy_mte_tags {
 #define KVM_ARM_VCPU_PVTIME_CTRL	2
 #define   KVM_ARM_VCPU_PVTIME_IPA	0
 #define KVM_ARM_VCPU_SPE_CTRL		3
+#define   KVM_ARM_VCPU_SPE_IRQ		0
 
 /* KVM_IRQ_LINE irq field index values */
 #define KVM_ARM_IRQ_VCPU2_SHIFT		28
diff --git a/arch/arm64/kvm/spe.c b/arch/arm64/kvm/spe.c
index 56a3fdb35623..2fdb42e27675 100644
--- a/arch/arm64/kvm/spe.c
+++ b/arch/arm64/kvm/spe.c
@@ -43,17 +43,103 @@ int kvm_spe_check_supported_cpus(struct kvm_vcpu *vcpu)
 	return 0;
 }
 
+static bool kvm_vcpu_supports_spe(struct kvm_vcpu *vcpu)
+{
+	if (!kvm_supports_spe())
+		return false;
+
+	if (!kvm_vcpu_has_spe(vcpu))
+		return false;
+
+	if (!irqchip_in_kernel(vcpu->kvm))
+		return false;
+
+	return true;
+}
+
+static bool kvm_spe_irq_is_valid(struct kvm *kvm, int irq)
+{
+	struct kvm_vcpu *vcpu;
+	int i;
+
+	if (!irq_is_ppi(irq))
+		return false;
+
+	kvm_for_each_vcpu(i, vcpu, kvm) {
+		if (!vcpu->arch.spe.irq_num)
+			continue;
+
+		if (vcpu->arch.spe.irq_num != irq)
+			return false;
+	}
+
+	return true;
+}
+
 int kvm_spe_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
 {
+	if (!kvm_vcpu_supports_spe(vcpu))
+		return -ENXIO;
+
+	if (vcpu->arch.spe.initialized)
+		return -EBUSY;
+
+	switch (attr->attr) {
+	case KVM_ARM_VCPU_SPE_IRQ: {
+		int __user *uaddr = (int __user *)(long)attr->addr;
+		int irq;
+
+		if (vcpu->arch.spe.irq_num)
+			return -EBUSY;
+
+		if (get_user(irq, uaddr))
+			return -EFAULT;
+
+		if (!kvm_spe_irq_is_valid(vcpu->kvm, irq))
+			return -EINVAL;
+
+		kvm_debug("Set KVM_ARM_VCPU_SPE_IRQ: %d\n", irq);
+		vcpu->arch.spe.irq_num = irq;
+		return 0;
+	}
+	}
+
 	return -ENXIO;
 }
 
 int kvm_spe_get_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
 {
+	if (!kvm_vcpu_supports_spe(vcpu))
+		return -ENXIO;
+
+	switch (attr->attr) {
+	case KVM_ARM_VCPU_SPE_IRQ: {
+		int __user *uaddr = (int __user *)(long)attr->addr;
+		int irq;
+
+		if (!vcpu->arch.spe.irq_num)
+			return -ENXIO;
+
+		irq = vcpu->arch.spe.irq_num;
+		if (put_user(irq, uaddr))
+			return -EFAULT;
+
+		return 0;
+	}
+	}
+
 	return -ENXIO;
 }
 
 int kvm_spe_has_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
 {
+	if (!kvm_vcpu_supports_spe(vcpu))
+		return -ENXIO;
+
+	switch(attr->attr) {
+	case KVM_ARM_VCPU_SPE_IRQ:
+		return 0;
+	}
+
 	return -ENXIO;
 }
-- 
2.33.0
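
A minimal userspace sketch of how the new attribute might be driven, for
illustration only (the helper name and the PPI number 21 below are made up,
not mandated by the patch): every VCPU gets the same PPI via
KVM_SET_DEVICE_ATTR on its file descriptor, before SPE is initialized with
the INIT attribute added in the next patch.

    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    /* Hypothetical helper: program the Profiling Buffer management PPI. */
    static int vcpu_set_spe_irq(int vcpu_fd, int intid)
    {
            struct kvm_device_attr attr = {
                    .group = KVM_ARM_VCPU_SPE_CTRL,
                    .attr  = KVM_ARM_VCPU_SPE_IRQ,
                    .addr  = (__u64)(unsigned long)&intid,
            };

            /* Fails with EINVAL unless 16 <= intid < 32, i.e. a PPI. */
            return ioctl(vcpu_fd, KVM_SET_DEVICE_ATTR, &attr);
    }

    /* e.g. vcpu_set_spe_irq(vcpu_fd, 21) on every VCPU */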


* [RFC PATCH v4 22/39] KVM: arm64: Add SPE VCPU device attribute to initialize SPE
  2021-08-25 16:17 [RFC PATCH v4 00/39] KVM: arm64: Add Statistical Profiling Extension (SPE) support Alexandru Elisei
                   ` (20 preceding siblings ...)
  2021-08-25 16:17 ` [RFC PATCH v4 21/39] KVM: arm64: Add SPE VCPU device attribute to set the interrupt number Alexandru Elisei
@ 2021-08-25 16:17 ` Alexandru Elisei
  2021-08-25 16:17 ` [RFC PATCH v4 23/39] KVM: arm64: VHE: Clear MDCR_EL2.E2PB in vcpu_put() Alexandru Elisei
                   ` (17 subsequent siblings)
  39 siblings, 0 replies; 47+ messages in thread
From: Alexandru Elisei @ 2021-08-25 16:17 UTC (permalink / raw)
  To: maz, james.morse, suzuki.poulose, linux-arm-kernel, kvmarm, will,
	linux-kernel
  Cc: Sudeep Holla

From: Sudeep Holla <sudeep.holla@arm.com>

Add KVM_ARM_VCPU_SPE_CTRL(KVM_ARM_VCPU_SPE_INIT) VCPU ioctl to initialize
SPE. Initialization can only be done once for a VCPU. If the feature bit is
set, then SPE must be initialized before the VCPU can be run.

[ Alexandru E: Split from "KVM: arm64: Add a new VCPU device control group
	       for SPE" ]

Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 Documentation/virt/kvm/devices/vcpu.rst | 16 ++++++++++++++
 arch/arm64/include/asm/kvm_spe.h        |  4 ++--
 arch/arm64/include/uapi/asm/kvm.h       |  1 +
 arch/arm64/kvm/arm.c                    |  7 ++++--
 arch/arm64/kvm/spe.c                    | 29 ++++++++++++++++++++++++-
 5 files changed, 52 insertions(+), 5 deletions(-)

diff --git a/Documentation/virt/kvm/devices/vcpu.rst b/Documentation/virt/kvm/devices/vcpu.rst
index 05821d40849f..c275c320e500 100644
--- a/Documentation/virt/kvm/devices/vcpu.rst
+++ b/Documentation/virt/kvm/devices/vcpu.rst
@@ -185,3 +185,19 @@ Returns:
 Specifies the Profiling Buffer management interrupt number. The interrupt number
 must be a PPI and the interrupt number must be the same for each VCPU. SPE
 emulation requires an in-kernel vGIC implementation.
+
+4.2 ATTRIBUTE: KVM_ARM_VCPU_SPE_INIT
+------------------------------------
+
+:Parameters: no additional parameter in kvm_device_attr.addr
+
+Returns:
+
+	 =======  ============================================
+	 -EBUSY   SPE already initialized for this VCPU
+	 -ENXIO   SPE not supported or not properly configured
+	 =======  ============================================
+
+Request initialization of the Statistical Profiling Extension for this VCPU.
+Must be done after initializing the in-kernel irqchip and after setting the
+Profiling Buffer management interrupt number for the VCPU.
diff --git a/arch/arm64/include/asm/kvm_spe.h b/arch/arm64/include/asm/kvm_spe.h
index 2fe11868719d..2217b821ab37 100644
--- a/arch/arm64/include/asm/kvm_spe.h
+++ b/arch/arm64/include/asm/kvm_spe.h
@@ -23,7 +23,7 @@ struct kvm_vcpu_spe {
 
 void kvm_spe_init_supported_cpus(void);
 void kvm_spe_vm_init(struct kvm *kvm);
-int kvm_spe_check_supported_cpus(struct kvm_vcpu *vcpu);
+int kvm_spe_vcpu_first_run_init(struct kvm_vcpu *vcpu);
 
 int kvm_spe_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr);
 int kvm_spe_get_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr);
@@ -36,7 +36,7 @@ struct kvm_vcpu_spe {
 
 static inline void kvm_spe_init_supported_cpus(void) {}
 static inline void kvm_spe_vm_init(struct kvm *kvm) {}
-static inline int kvm_spe_check_supported_cpus(struct kvm_vcpu *vcpu) { return -ENOEXEC; }
+static inline int kvm_spe_vcpu_first_run_init(struct kvm_vcpu *vcpu) { return -ENOEXEC; }
 
 static inline int kvm_spe_set_attr(struct kvm_vcpu *vcpu,
 				   struct kvm_device_attr *attr)
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index c55d94a1a8f5..d4c0b53a5fb2 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -370,6 +370,7 @@ struct kvm_arm_copy_mte_tags {
 #define   KVM_ARM_VCPU_PVTIME_IPA	0
 #define KVM_ARM_VCPU_SPE_CTRL		3
 #define   KVM_ARM_VCPU_SPE_IRQ		0
+#define   KVM_ARM_VCPU_SPE_INIT		1
 
 /* KVM_IRQ_LINE irq field index values */
 #define KVM_ARM_IRQ_VCPU2_SHIFT		28
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 8f7025f2e4a0..6af7ef26d2c1 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -633,8 +633,11 @@ static int kvm_vcpu_first_run_init(struct kvm_vcpu *vcpu)
 	if (!kvm_arm_vcpu_is_finalized(vcpu))
 		return -EPERM;
 
-	if (kvm_vcpu_has_spe(vcpu) && kvm_spe_check_supported_cpus(vcpu))
-		return -EPERM;
+	if (kvm_vcpu_has_spe(vcpu)) {
+		ret = kvm_spe_vcpu_first_run_init(vcpu);
+		if (ret)
+			return ret;
+	}
 
 	vcpu->arch.has_run_once = true;
 
diff --git a/arch/arm64/kvm/spe.c b/arch/arm64/kvm/spe.c
index 2fdb42e27675..801ceb66a3d0 100644
--- a/arch/arm64/kvm/spe.c
+++ b/arch/arm64/kvm/spe.c
@@ -31,7 +31,7 @@ void kvm_spe_vm_init(struct kvm *kvm)
 	kvm_spe_init_supported_cpus();
 }
 
-int kvm_spe_check_supported_cpus(struct kvm_vcpu *vcpu)
+static int kvm_spe_check_supported_cpus(struct kvm_vcpu *vcpu)
 {
 	/* SPE is supported on all CPUs, we don't care about the VCPU mask */
 	if (cpumask_equal(supported_cpus, cpu_possible_mask))
@@ -43,6 +43,20 @@ int kvm_spe_check_supported_cpus(struct kvm_vcpu *vcpu)
 	return 0;
 }
 
+int kvm_spe_vcpu_first_run_init(struct kvm_vcpu *vcpu)
+{
+	int ret;
+
+	ret = kvm_spe_check_supported_cpus(vcpu);
+	if (ret)
+		return ret;
+
+	if (!vcpu->arch.spe.initialized)
+		return -EPERM;
+
+	return 0;
+}
+
 static bool kvm_vcpu_supports_spe(struct kvm_vcpu *vcpu)
 {
 	if (!kvm_supports_spe())
@@ -102,6 +116,18 @@ int kvm_spe_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
 		vcpu->arch.spe.irq_num = irq;
 		return 0;
 	}
+	case KVM_ARM_VCPU_SPE_INIT:
+		if (!vcpu->arch.spe.irq_num)
+			return -ENXIO;
+
+		if (!vgic_initialized(vcpu->kvm))
+			return -ENXIO;
+
+		if (kvm_vgic_set_owner(vcpu, vcpu->arch.spe.irq_num, &vcpu->arch.spe))
+			return -ENXIO;
+
+		vcpu->arch.spe.initialized = true;
+		return 0;
 	}
 
 	return -ENXIO;
@@ -138,6 +164,7 @@ int kvm_spe_has_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
 
 	switch(attr->attr) {
 	case KVM_ARM_VCPU_SPE_IRQ:
+	case KVM_ARM_VCPU_SPE_INIT:
 		return 0;
 	}
 
-- 
2.33.0
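
Not part of the patch, but for clarity, one valid ordering implied by the
checks above for a VMM (assuming a GICv3 in-kernel irqchip; the helper is
hypothetical):

    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    /*
     * 1. KVM_CREATE_DEVICE(KVM_DEV_TYPE_ARM_VGIC_V3)      - in-kernel irqchip
     * 2. KVM_ARM_VCPU_SPE_CTRL(KVM_ARM_VCPU_SPE_IRQ)      - same PPI on every VCPU
     * 3. KVM_DEV_ARM_VGIC_GRP_CTRL(KVM_DEV_ARM_VGIC_CTRL_INIT)
     * 4. KVM_ARM_VCPU_SPE_CTRL(KVM_ARM_VCPU_SPE_INIT)     - every VCPU
     * 5. KVM_RUN
     *
     * Repeating step 4, or touching the IRQ after it, fails with EBUSY.
     */
    static int vcpu_init_spe(int vcpu_fd)
    {
            struct kvm_device_attr attr = {
                    .group = KVM_ARM_VCPU_SPE_CTRL,
                    .attr  = KVM_ARM_VCPU_SPE_INIT,
            };

            return ioctl(vcpu_fd, KVM_SET_DEVICE_ATTR, &attr);
    }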


* [RFC PATCH v4 23/39] KVM: arm64: VHE: Clear MDCR_EL2.E2PB in vcpu_put()
  2021-08-25 16:17 [RFC PATCH v4 00/39] KVM: arm64: Add Statistical Profiling Extension (SPE) support Alexandru Elisei
                   ` (21 preceding siblings ...)
  2021-08-25 16:17 ` [RFC PATCH v4 22/39] KVM: arm64: Add SPE VCPU device attribute to initialize SPE Alexandru Elisei
@ 2021-08-25 16:17 ` Alexandru Elisei
  2021-08-25 16:18 ` [RFC PATCH v4 24/39] KVM: arm64: debug: Configure MDCR_EL2 when a VCPU has SPE Alexandru Elisei
                   ` (16 subsequent siblings)
  39 siblings, 0 replies; 47+ messages in thread
From: Alexandru Elisei @ 2021-08-25 16:17 UTC (permalink / raw)
  To: maz, james.morse, suzuki.poulose, linux-arm-kernel, kvmarm, will,
	linux-kernel
  Cc: Sudeep Holla

From: Sudeep Holla <sudeep.holla@arm.com>

On VHE systems, the kernel executes at EL2 and configures the profiling
buffer to use the EL2&0 translation regime and to trap accesses from the
guest by clearing MDCR_EL2.E2PB. In vcpu_put(), KVM includes the E2PB field
in the mask it ANDs with MDCR_EL2, which preserves the field's value. This
has been correct so far, since MDCR_EL2.E2PB has the same value (0b00) for
all VMs, as set by kvm_arm_setup_mdcr_el2().

However, this will change when KVM enables support for SPE in guests. For
such guests KVM will configure the profiling buffer to use the EL1&0
translation regime, a setting that must not be preserved for the host
running at EL2. Let's avoid this situation by explicitly clearing E2PB in
vcpu_put().

[ Alexandru E: Reworded commit ]

Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 arch/arm64/kvm/hyp/vhe/switch.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/arm64/kvm/hyp/vhe/switch.c b/arch/arm64/kvm/hyp/vhe/switch.c
index b3229924d243..86d4c8c33f3e 100644
--- a/arch/arm64/kvm/hyp/vhe/switch.c
+++ b/arch/arm64/kvm/hyp/vhe/switch.c
@@ -95,9 +95,7 @@ void deactivate_traps_vhe_put(void)
 {
 	u64 mdcr_el2 = read_sysreg(mdcr_el2);
 
-	mdcr_el2 &= MDCR_EL2_HPMN_MASK |
-		    MDCR_EL2_E2PB_MASK << MDCR_EL2_E2PB_SHIFT |
-		    MDCR_EL2_TPMS;
+	mdcr_el2 &= MDCR_EL2_HPMN_MASK | MDCR_EL2_TPMS;
 
 	write_sysreg(mdcr_el2, mdcr_el2);
 
-- 
2.33.0
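
For reference, a short summary of the arithmetic (derived from the MDCR_EL2
field definitions used elsewhere in this series, not new behaviour):

    /*
     * E2PB is a 2-bit field at MDCR_EL2[13:12] (MDCR_EL2_E2PB_MASK = 0x3,
     * MDCR_EL2_E2PB_SHIFT = 12). The old mask,
     *
     *     MDCR_EL2_HPMN_MASK | MDCR_EL2_E2PB_MASK << MDCR_EL2_E2PB_SHIFT |
     *     MDCR_EL2_TPMS
     *
     * kept whatever E2PB value was live across the "&=", so a guest setting
     * of EL1&0 would have survived into host context. With the E2PB bits
     * dropped from the mask, the field is cleared to 0b00 and the Profiling
     * Buffer is back under the EL2&0 regime used by the VHE host.
     */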


* [RFC PATCH v4 24/39] KVM: arm64: debug: Configure MDCR_EL2 when a VCPU has SPE
  2021-08-25 16:17 [RFC PATCH v4 00/39] KVM: arm64: Add Statistical Profiling Extension (SPE) support Alexandru Elisei
                   ` (22 preceding siblings ...)
  2021-08-25 16:17 ` [RFC PATCH v4 23/39] KVM: arm64: VHE: Clear MDCR_EL2.E2PB in vcpu_put() Alexandru Elisei
@ 2021-08-25 16:18 ` Alexandru Elisei
  2021-08-25 16:18 ` [RFC PATCH v4 25/39] KVM: arm64: Move the write to MDCR_EL2 out of __activate_traps_common() Alexandru Elisei
                   ` (15 subsequent siblings)
  39 siblings, 0 replies; 47+ messages in thread
From: Alexandru Elisei @ 2021-08-25 16:18 UTC (permalink / raw)
  To: maz, james.morse, suzuki.poulose, linux-arm-kernel, kvmarm, will,
	linux-kernel

Allow the guest running at EL1 to use SPE when that feature is enabled for
the VCPU by setting the profiling buffer owning translation regime to EL1&0
and disabling traps to the profiling control registers. Keep trapping
accesses to the buffer control registers because that's needed to emulate
the buffer management interrupt.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 arch/arm64/include/asm/kvm_arm.h |  1 +
 arch/arm64/kvm/debug.c           | 23 +++++++++++++++++++----
 2 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index d436831dd706..d939da6f54dc 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -285,6 +285,7 @@
 #define MDCR_EL2_TPMS		(1 << 14)
 #define MDCR_EL2_E2PB_MASK	(UL(0x3))
 #define MDCR_EL2_E2PB_SHIFT	(UL(12))
+#define MDCR_EL2_E2PB_EL1_TRAP	(UL(2))
 #define MDCR_EL2_TDRA		(1 << 11)
 #define MDCR_EL2_TDOSA		(1 << 10)
 #define MDCR_EL2_TDA		(1 << 9)
diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c
index d5e79d7ee6e9..64e8211366b6 100644
--- a/arch/arm64/kvm/debug.c
+++ b/arch/arm64/kvm/debug.c
@@ -77,24 +77,39 @@ void kvm_arm_init_debug(void)
  *  - Performance monitors (MDCR_EL2_TPM/MDCR_EL2_TPMCR)
  *  - Debug ROM Address (MDCR_EL2_TDRA)
  *  - OS related registers (MDCR_EL2_TDOSA)
- *  - Statistical profiler (MDCR_EL2_TPMS/MDCR_EL2_E2PB)
  *  - Self-hosted Trace Filter controls (MDCR_EL2_TTRF)
  *  - Self-hosted Trace (MDCR_EL2_TTRF/MDCR_EL2_E2TB)
  */
 static void kvm_arm_setup_mdcr_el2(struct kvm_vcpu *vcpu)
 {
 	/*
-	 * This also clears MDCR_EL2_E2PB_MASK and MDCR_EL2_E2TB_MASK
-	 * to disable guest access to the profiling and trace buffers
+	 * This also clears MDCR_EL2_E2TB_MASK to disable guest access to the
+	 * trace buffers.
 	 */
 	vcpu->arch.mdcr_el2 = __this_cpu_read(mdcr_el2) & MDCR_EL2_HPMN_MASK;
 	vcpu->arch.mdcr_el2 |= (MDCR_EL2_TPM |
-				MDCR_EL2_TPMS |
 				MDCR_EL2_TTRF |
 				MDCR_EL2_TPMCR |
 				MDCR_EL2_TDRA |
 				MDCR_EL2_TDOSA);
 
+	if (kvm_supports_spe() && kvm_vcpu_has_spe(vcpu)) {
+		/*
+		 * Use EL1&0 for the profiling buffer translation regime and
+		 * trap accesses to the buffer control registers; leave
+		 * MDCR_EL2.TPMS unset and do not trap accesses to the profiling
+		 * control registers.
+		 */
+		vcpu->arch.mdcr_el2 |= MDCR_EL2_E2PB_EL1_TRAP << MDCR_EL2_E2PB_SHIFT;
+	} else {
+		/*
+		 * Trap accesses to the profiling control registers; leave
+		 * MDCR_EL2.E2PB unset and use the EL2&0 translation regime for
+		 * the profiling buffer.
+		 */
+		vcpu->arch.mdcr_el2 |= MDCR_EL2_TPMS;
+	}
+
 	/* Is the VM being debugged by userspace? */
 	if (vcpu->guest_debug)
 		/* Route all software debug exceptions to EL2 */
-- 
2.33.0
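
An informal summary of the MDCR_EL2.E2PB encodings the comments above rely on
(paraphrased from the architecture; check the Arm ARM for the authoritative
wording):

    /*
     * E2PB = 0b00: buffer uses the EL2(&0) translation regime, EL1 accesses
     *              to the buffer control registers trap to EL2 (non-SPE case).
     * E2PB = 0b10: buffer uses the EL1&0 regime, EL1 accesses to the buffer
     *              control registers still trap (MDCR_EL2_E2PB_EL1_TRAP above).
     * E2PB = 0b11: buffer uses the EL1&0 regime, no trapping.
     *
     * Trapping of the profiling control registers (PMSCR_EL1 and friends) is
     * governed by MDCR_EL2.TPMS, which is why the SPE case leaves TPMS clear.
     */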


* [RFC PATCH v4 25/39] KVM: arm64: Move the write to MDCR_EL2 out of __activate_traps_common()
  2021-08-25 16:17 [RFC PATCH v4 00/39] KVM: arm64: Add Statistical Profiling Extension (SPE) support Alexandru Elisei
                   ` (23 preceding siblings ...)
  2021-08-25 16:18 ` [RFC PATCH v4 24/39] KVM: arm64: debug: Configure MDCR_EL2 when a VCPU has SPE Alexandru Elisei
@ 2021-08-25 16:18 ` Alexandru Elisei
  2021-08-25 16:18 ` [RFC PATCH v4 26/39] KVM: arm64: VHE: Change MDCR_EL2 at world switch if VCPU has SPE Alexandru Elisei
                   ` (14 subsequent siblings)
  39 siblings, 0 replies; 47+ messages in thread
From: Alexandru Elisei @ 2021-08-25 16:18 UTC (permalink / raw)
  To: maz, james.morse, suzuki.poulose, linux-arm-kernel, kvmarm, will,
	linux-kernel

To run a guest with SPE, MDCR_EL2 must be configured such that the buffer
owning regime is EL1&0. With VHE enabled, the host runs at EL2, and writing
the guest's MDCR_EL2 value as early as vcpu_load() would switch the owning
regime to EL1&0 well before the guest actually runs, creating an extended
profiling blackout window for the host.

Move the MDCR_EL2 write out of __activate_traps_common() to prepare for
executing it later in the run loop in the VHE case. This also makes
__activate_traps_common() look more like __deactivate_traps_common(), which
does not touch the register.

No functional change intended.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 arch/arm64/kvm/hyp/include/hyp/switch.h | 1 -
 arch/arm64/kvm/hyp/nvhe/switch.c        | 2 ++
 arch/arm64/kvm/hyp/vhe/switch.c         | 2 ++
 3 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h b/arch/arm64/kvm/hyp/include/hyp/switch.h
index e4a2f295a394..5084a54d012e 100644
--- a/arch/arm64/kvm/hyp/include/hyp/switch.h
+++ b/arch/arm64/kvm/hyp/include/hyp/switch.h
@@ -92,7 +92,6 @@ static inline void __activate_traps_common(struct kvm_vcpu *vcpu)
 		write_sysreg(0, pmselr_el0);
 		write_sysreg(ARMV8_PMU_USERENR_MASK, pmuserenr_el0);
 	}
-	write_sysreg(vcpu->arch.mdcr_el2, mdcr_el2);
 }
 
 static inline void __deactivate_traps_common(void)
diff --git a/arch/arm64/kvm/hyp/nvhe/switch.c b/arch/arm64/kvm/hyp/nvhe/switch.c
index f7af9688c1f7..0c70d897a493 100644
--- a/arch/arm64/kvm/hyp/nvhe/switch.c
+++ b/arch/arm64/kvm/hyp/nvhe/switch.c
@@ -41,6 +41,8 @@ static void __activate_traps(struct kvm_vcpu *vcpu)
 	___activate_traps(vcpu);
 	__activate_traps_common(vcpu);
 
+	write_sysreg(vcpu->arch.mdcr_el2, mdcr_el2);
+
 	val = CPTR_EL2_DEFAULT;
 	val |= CPTR_EL2_TTA | CPTR_EL2_TAM;
 	if (!update_fp_enabled(vcpu)) {
diff --git a/arch/arm64/kvm/hyp/vhe/switch.c b/arch/arm64/kvm/hyp/vhe/switch.c
index 86d4c8c33f3e..983ba1570d72 100644
--- a/arch/arm64/kvm/hyp/vhe/switch.c
+++ b/arch/arm64/kvm/hyp/vhe/switch.c
@@ -89,6 +89,8 @@ NOKPROBE_SYMBOL(__deactivate_traps);
 void activate_traps_vhe_load(struct kvm_vcpu *vcpu)
 {
 	__activate_traps_common(vcpu);
+
+	write_sysreg(vcpu->arch.mdcr_el2, mdcr_el2);
 }
 
 void deactivate_traps_vhe_put(void)
-- 
2.33.0


* [RFC PATCH v4 26/39] KVM: arm64: VHE: Change MDCR_EL2 at world switch if VCPU has SPE
  2021-08-25 16:17 [RFC PATCH v4 00/39] KVM: arm64: Add Statistical Profiling Extension (SPE) support Alexandru Elisei
                   ` (24 preceding siblings ...)
  2021-08-25 16:18 ` [RFC PATCH v4 25/39] KVM: arm64: Move the write to MDCR_EL2 out of __activate_traps_common() Alexandru Elisei
@ 2021-08-25 16:18 ` Alexandru Elisei
  2021-08-25 16:18 ` [RFC PATCH v4 27/39] KVM: arm64: Add SPE system registers to VCPU context Alexandru Elisei
                   ` (13 subsequent siblings)
  39 siblings, 0 replies; 47+ messages in thread
From: Alexandru Elisei @ 2021-08-25 16:18 UTC (permalink / raw)
  To: maz, james.morse, suzuki.poulose, linux-arm-kernel, kvmarm, will,
	linux-kernel

When a VCPU has the SPE feature, MDCR_EL2 sets the buffer owning regime to
EL1&0. Write the guest's MDCR_EL2 value as late as possible and restore the
host's value as soon as possible at each world switch to make the profiling
blackout window as small as possible for the host.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 arch/arm64/include/asm/kvm_hyp.h   |  2 +-
 arch/arm64/kvm/debug.c             | 14 +++++++++++--
 arch/arm64/kvm/hyp/vhe/switch.c    | 33 +++++++++++++++++++++++-------
 arch/arm64/kvm/hyp/vhe/sysreg-sr.c |  2 +-
 4 files changed, 40 insertions(+), 11 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
index 9d60b3006efc..657d0c94cf82 100644
--- a/arch/arm64/include/asm/kvm_hyp.h
+++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -95,7 +95,7 @@ void __sve_restore_state(void *sve_pffr, u32 *fpsr);
 
 #ifndef __KVM_NVHE_HYPERVISOR__
 void activate_traps_vhe_load(struct kvm_vcpu *vcpu);
-void deactivate_traps_vhe_put(void);
+void deactivate_traps_vhe_put(struct kvm_vcpu *vcpu);
 #endif
 
 u64 __guest_enter(struct kvm_vcpu *vcpu);
diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c
index 64e8211366b6..70712cd85f32 100644
--- a/arch/arm64/kvm/debug.c
+++ b/arch/arm64/kvm/debug.c
@@ -249,9 +249,19 @@ void kvm_arm_setup_debug(struct kvm_vcpu *vcpu)
 		vcpu->arch.flags |= KVM_ARM64_DEBUG_DIRTY;
 
 	/* Write mdcr_el2 changes since vcpu_load on VHE systems */
-	if (has_vhe() && orig_mdcr_el2 != vcpu->arch.mdcr_el2)
-		write_sysreg(vcpu->arch.mdcr_el2, mdcr_el2);
+	if (has_vhe()) {
+		/*
+		 * MDCR_EL2 can modify the SPE buffer owning regime, defer the
+		 * write until the VCPU is run.
+		 */
+		if (kvm_vcpu_has_spe(vcpu))
+			goto out;
+
+		if (orig_mdcr_el2 != vcpu->arch.mdcr_el2)
+			write_sysreg(vcpu->arch.mdcr_el2, mdcr_el2);
+	}
 
+out:
 	trace_kvm_arm_set_dreg32("MDSCR_EL1", vcpu_read_sys_reg(vcpu, MDSCR_EL1));
 }
 
diff --git a/arch/arm64/kvm/hyp/vhe/switch.c b/arch/arm64/kvm/hyp/vhe/switch.c
index 983ba1570d72..ec4e179d56ae 100644
--- a/arch/arm64/kvm/hyp/vhe/switch.c
+++ b/arch/arm64/kvm/hyp/vhe/switch.c
@@ -31,12 +31,29 @@ DEFINE_PER_CPU(struct kvm_host_data, kvm_host_data);
 DEFINE_PER_CPU(struct kvm_cpu_context, kvm_hyp_ctxt);
 DEFINE_PER_CPU(unsigned long, kvm_hyp_vector);
 
+static void __restore_host_mdcr_el2(struct kvm_vcpu *vcpu)
+{
+	u64 mdcr_el2;
+
+	mdcr_el2 = read_sysreg(mdcr_el2);
+	mdcr_el2 &= MDCR_EL2_HPMN_MASK | MDCR_EL2_TPMS;
+	write_sysreg(mdcr_el2, mdcr_el2);
+}
+
+static void __restore_guest_mdcr_el2(struct kvm_vcpu *vcpu)
+{
+	write_sysreg(vcpu->arch.mdcr_el2, mdcr_el2);
+}
+
 static void __activate_traps(struct kvm_vcpu *vcpu)
 {
 	u64 val;
 
 	___activate_traps(vcpu);
 
+	if (kvm_vcpu_has_spe(vcpu))
+		__restore_guest_mdcr_el2(vcpu);
+
 	val = read_sysreg(cpacr_el1);
 	val |= CPACR_EL1_TTA;
 	val &= ~CPACR_EL1_ZEN;
@@ -81,7 +98,11 @@ static void __deactivate_traps(struct kvm_vcpu *vcpu)
 	 */
 	asm(ALTERNATIVE("nop", "isb", ARM64_WORKAROUND_SPECULATIVE_AT));
 
+	if (kvm_vcpu_has_spe(vcpu))
+		__restore_host_mdcr_el2(vcpu);
+
 	write_sysreg(CPACR_EL1_DEFAULT, cpacr_el1);
+
 	write_sysreg(vectors, vbar_el1);
 }
 NOKPROBE_SYMBOL(__deactivate_traps);
@@ -90,16 +111,14 @@ void activate_traps_vhe_load(struct kvm_vcpu *vcpu)
 {
 	__activate_traps_common(vcpu);
 
-	write_sysreg(vcpu->arch.mdcr_el2, mdcr_el2);
+	if (!kvm_vcpu_has_spe(vcpu))
+		__restore_guest_mdcr_el2(vcpu);
 }
 
-void deactivate_traps_vhe_put(void)
+void deactivate_traps_vhe_put(struct kvm_vcpu *vcpu)
 {
-	u64 mdcr_el2 = read_sysreg(mdcr_el2);
-
-	mdcr_el2 &= MDCR_EL2_HPMN_MASK | MDCR_EL2_TPMS;
-
-	write_sysreg(mdcr_el2, mdcr_el2);
+	if (!kvm_vcpu_has_spe(vcpu))
+		__restore_host_mdcr_el2(vcpu);
 
 	__deactivate_traps_common();
 }
diff --git a/arch/arm64/kvm/hyp/vhe/sysreg-sr.c b/arch/arm64/kvm/hyp/vhe/sysreg-sr.c
index 2a0b8c88d74f..007a12dd4351 100644
--- a/arch/arm64/kvm/hyp/vhe/sysreg-sr.c
+++ b/arch/arm64/kvm/hyp/vhe/sysreg-sr.c
@@ -101,7 +101,7 @@ void kvm_vcpu_put_sysregs_vhe(struct kvm_vcpu *vcpu)
 	struct kvm_cpu_context *host_ctxt;
 
 	host_ctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
-	deactivate_traps_vhe_put();
+	deactivate_traps_vhe_put(vcpu);
 
 	__sysreg_save_el1_state(guest_ctxt);
 	__sysreg_save_user_state(guest_ctxt);
-- 
2.33.0
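
Putting this together with the previous patch, the points where MDCR_EL2
changes hands on a VHE system with an SPE VCPU look roughly like this (a
summary of the diffs above, not new behaviour; the function names are the
ones touched by the series):

    /*
     *   vcpu_load()
     *     activate_traps_vhe_load()      - guest mdcr_el2 not written yet
     *   kvm_arm_setup_debug()            - mdcr_el2 write deferred as well
     *   __kvm_vcpu_run()
     *     __activate_traps()             - __restore_guest_mdcr_el2()
     *     ... guest runs, buffer owned by the EL1&0 regime ...
     *     __deactivate_traps()           - __restore_host_mdcr_el2()
     *   vcpu_put()
     *     deactivate_traps_vhe_put()     - nothing left to do for SPE VCPUs
     *
     * VCPUs without SPE keep the old behaviour: the guest value is written at
     * vcpu_load()/kvm_arm_setup_debug() and the host value restored at
     * vcpu_put().
     */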


* [RFC PATCH v4 27/39] KVM: arm64: Add SPE system registers to VCPU context
  2021-08-25 16:17 [RFC PATCH v4 00/39] KVM: arm64: Add Statistical Profiling Extension (SPE) support Alexandru Elisei
                   ` (25 preceding siblings ...)
  2021-08-25 16:18 ` [RFC PATCH v4 26/39] KVM: arm64: VHE: Change MDCR_EL2 at world switch if VCPU has SPE Alexandru Elisei
@ 2021-08-25 16:18 ` Alexandru Elisei
  2021-08-25 16:18 ` [RFC PATCH v4 28/39] KVM: arm64: nVHE: Save PMSCR_EL1 to the host context Alexandru Elisei
                   ` (12 subsequent siblings)
  39 siblings, 0 replies; 47+ messages in thread
From: Alexandru Elisei @ 2021-08-25 16:18 UTC (permalink / raw)
  To: maz, james.morse, suzuki.poulose, linux-arm-kernel, kvmarm, will,
	linux-kernel

Add the SPE system registers to the VCPU context. Omitted are
PMBIDR_EL1, which cannot be trapped, and PMSIDR_EL1, which is a read-only
register. The registers that KVM traps are stored in the sys_regs array
on a write, and returned on a read; complete emulation and save/restore
for all registers on world switch will be added in future patches.

KVM exposes FEAT_SPEv1p1 to guests in the ID_AA64DFR0_EL1 register and
doesn't trap accesses to the profiling control registers. If the hardware
supports FEAT_SPEv1p2, the guest will be able to access the PMSNEVFR_EL1
register, which is UNDEFINED for FEAT_SPEv1p1. This mismatch is in line
with how the architecture already treats PMBIDR_EL1: the register is
UNDEFINED if SPE is missing, but a VCPU without the SPE feature can still
read it because there is no (easy) way for KVM to trap accesses to the
register.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 arch/arm64/include/asm/kvm_host.h | 12 +++++++
 arch/arm64/include/asm/kvm_spe.h  |  7 ++++
 arch/arm64/kvm/spe.c              | 10 ++++++
 arch/arm64/kvm/sys_regs.c         | 54 ++++++++++++++++++++++++-------
 4 files changed, 71 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 7b957e439b3d..4c0d3d5ba285 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -237,6 +237,18 @@ enum vcpu_sysreg {
 	TFSR_EL1,	/* Tag Fault Status Register (EL1) */
 	TFSRE0_EL1,	/* Tag Fault Status Register (EL0) */
 
+	/* Statistical Profiling Extension Registers. */
+	PMSCR_EL1,      /* Statistical Profiling Control Register */
+	PMSICR_EL1,     /* Sampling Interval Counter Register */
+	PMSIRR_EL1,     /* Sampling Interval Reload Register */
+	PMSFCR_EL1,     /* Sampling Filter Control Register */
+	PMSEVFR_EL1,    /* Sampling Event Filter Register */
+	PMSLATFR_EL1,   /* Sampling Latency Filter Register */
+	PMBLIMITR_EL1,  /* Profiling Buffer Limit Address Register */
+	PMBPTR_EL1,     /* Profiling Buffer Write Pointer Register */
+	PMBSR_EL1,      /* Profiling Buffer Status/syndrome Register */
+	PMSCR_EL2,	/* Statistical Profiling Control Register, EL2 */
+
 	/* 32bit specific registers. Keep them at the end of the range */
 	DACR32_EL2,	/* Domain Access Control Register */
 	IFSR32_EL2,	/* Instruction Fault Status Register */
diff --git a/arch/arm64/include/asm/kvm_spe.h b/arch/arm64/include/asm/kvm_spe.h
index 2217b821ab37..934eedb0de46 100644
--- a/arch/arm64/include/asm/kvm_spe.h
+++ b/arch/arm64/include/asm/kvm_spe.h
@@ -25,9 +25,13 @@ void kvm_spe_init_supported_cpus(void);
 void kvm_spe_vm_init(struct kvm *kvm);
 int kvm_spe_vcpu_first_run_init(struct kvm_vcpu *vcpu);
 
+void kvm_spe_write_sysreg(struct kvm_vcpu *vcpu, int reg, u64 val);
+u64 kvm_spe_read_sysreg(struct kvm_vcpu *vcpu, int reg);
+
 int kvm_spe_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr);
 int kvm_spe_get_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr);
 int kvm_spe_has_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr);
+
 #else
 #define kvm_supports_spe()	(false)
 
@@ -38,6 +42,9 @@ static inline void kvm_spe_init_supported_cpus(void) {}
 static inline void kvm_spe_vm_init(struct kvm *kvm) {}
 static inline int kvm_spe_vcpu_first_run_init(struct kvm_vcpu *vcpu) { return -ENOEXEC; }
 
+static inline void kvm_spe_write_sysreg(struct kvm_vcpu *vcpu, int reg, u64 val) {}
+static inline u64 kvm_spe_read_sysreg(struct kvm_vcpu *vcpu, int reg) { return 0; }
+
 static inline int kvm_spe_set_attr(struct kvm_vcpu *vcpu,
 				   struct kvm_device_attr *attr)
 {
diff --git a/arch/arm64/kvm/spe.c b/arch/arm64/kvm/spe.c
index 801ceb66a3d0..f760ccd8306a 100644
--- a/arch/arm64/kvm/spe.c
+++ b/arch/arm64/kvm/spe.c
@@ -57,6 +57,16 @@ int kvm_spe_vcpu_first_run_init(struct kvm_vcpu *vcpu)
 	return 0;
 }
 
+void kvm_spe_write_sysreg(struct kvm_vcpu *vcpu, int reg, u64 val)
+{
+	__vcpu_sys_reg(vcpu, reg) = val;
+}
+
+u64 kvm_spe_read_sysreg(struct kvm_vcpu *vcpu, int reg)
+{
+	return __vcpu_sys_reg(vcpu, reg);
+}
+
 static bool kvm_vcpu_supports_spe(struct kvm_vcpu *vcpu)
 {
 	if (!kvm_supports_spe())
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index ab7370b7a44b..843822be5695 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -594,6 +594,33 @@ static void reset_mpidr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
 	vcpu_write_sys_reg(vcpu, (1ULL << 31) | mpidr, MPIDR_EL1);
 }
 
+static unsigned int spe_visibility(const struct kvm_vcpu *vcpu,
+				  const struct sys_reg_desc *r)
+{
+	if (kvm_vcpu_has_spe(vcpu))
+		return 0;
+
+	return REG_HIDDEN;
+}
+
+static bool access_spe_reg(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
+			   const struct sys_reg_desc *r)
+{
+	int reg = r->reg;
+	u64 val = p->regval;
+
+	if (reg < PMBLIMITR_EL1) {
+		print_sys_reg_msg(p, "Unsupported guest SPE register access at: %lx [%08lx]\n",
+				  *vcpu_pc(vcpu), *vcpu_cpsr(vcpu));
+	}
+
+	if (p->is_write)
+		kvm_spe_write_sysreg(vcpu, reg, val);
+	else
+		p->regval = kvm_spe_read_sysreg(vcpu, reg);
+
+	return true;
+}
+
 static unsigned int pmu_visibility(const struct kvm_vcpu *vcpu,
 				   const struct sys_reg_desc *r)
 {
@@ -956,6 +983,10 @@ static bool access_pmuserenr(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
 	{ PMU_SYS_REG(SYS_PMEVTYPERn_EL0(n)),				\
 	  .access = access_pmu_evtyper, .reg = (PMEVTYPER0_EL0 + n), }
 
+#define SPE_SYS_REG(r)							\
+	SYS_DESC(r), .access = access_spe_reg, .reset = reset_val,	\
+	.val = 0, .visibility = spe_visibility
+
 static bool undef_access(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
 			 const struct sys_reg_desc *r)
 {
@@ -1530,18 +1561,17 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 	{ SYS_DESC(SYS_FAR_EL1), access_vm_reg, reset_unknown, FAR_EL1 },
 	{ SYS_DESC(SYS_PAR_EL1), NULL, reset_unknown, PAR_EL1 },
 
-	{ SYS_DESC(SYS_PMSCR_EL1), undef_access },
-	{ SYS_DESC(SYS_PMSNEVFR_EL1), undef_access },
-	{ SYS_DESC(SYS_PMSICR_EL1), undef_access },
-	{ SYS_DESC(SYS_PMSIRR_EL1), undef_access },
-	{ SYS_DESC(SYS_PMSFCR_EL1), undef_access },
-	{ SYS_DESC(SYS_PMSEVFR_EL1), undef_access },
-	{ SYS_DESC(SYS_PMSLATFR_EL1), undef_access },
-	{ SYS_DESC(SYS_PMSIDR_EL1), undef_access },
-	{ SYS_DESC(SYS_PMBLIMITR_EL1), undef_access },
-	{ SYS_DESC(SYS_PMBPTR_EL1), undef_access },
-	{ SYS_DESC(SYS_PMBSR_EL1), undef_access },
-	/* PMBIDR_EL1 is not trapped */
+	{ SPE_SYS_REG(SYS_PMSCR_EL1), .reg = PMSCR_EL1 },
+	{ SPE_SYS_REG(SYS_PMSICR_EL1), .reg = PMSICR_EL1 },
+	{ SPE_SYS_REG(SYS_PMSIRR_EL1), .reg = PMSIRR_EL1 },
+	{ SPE_SYS_REG(SYS_PMSFCR_EL1), .reg = PMSFCR_EL1 },
+	{ SPE_SYS_REG(SYS_PMSEVFR_EL1), .reg = PMSEVFR_EL1 },
+	{ SPE_SYS_REG(SYS_PMSLATFR_EL1), .reg = PMSLATFR_EL1 },
+	{ SPE_SYS_REG(SYS_PMSIDR_EL1), .reset = NULL },
+	{ SPE_SYS_REG(SYS_PMBLIMITR_EL1), .reg = PMBLIMITR_EL1 },
+	{ SPE_SYS_REG(SYS_PMBPTR_EL1), .reg = PMBPTR_EL1 },
+	{ SPE_SYS_REG(SYS_PMBSR_EL1), .reg = PMBSR_EL1 },
+	/* PMBIDR_EL1 and PMSCR_EL2 are not trapped */
 
 	{ PMU_SYS_REG(SYS_PMINTENSET_EL1),
 	  .access = access_pminten, .reg = PMINTENSET_EL1 },
-- 
2.33.0
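
Assuming these sys_reg_descs entries are reachable through the usual
KVM_GET_ONE_REG/KVM_SET_ONE_REG plumbing when the VCPU has SPE (hidden
registers are not listed for other VCPUs), userspace could save one of the
registers along these lines; the encoding is the architectural one for
PMSCR_EL1 (op0=3, op1=0, CRn=9, CRm=9, op2=0) and the helper is only an
illustration:

    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    static int vcpu_get_pmscr_el1(int vcpu_fd, uint64_t *val)
    {
            struct kvm_one_reg reg = {
                    .id   = ARM64_SYS_REG(3, 0, 9, 9, 0),   /* PMSCR_EL1 */
                    .addr = (uint64_t)(unsigned long)val,
            };

            return ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg);
    }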


* [RFC PATCH v4 28/39] KVM: arm64: nVHE: Save PMSCR_EL1 to the host context
  2021-08-25 16:17 [RFC PATCH v4 00/39] KVM: arm64: Add Statistical Profiling Extension (SPE) support Alexandru Elisei
                   ` (26 preceding siblings ...)
  2021-08-25 16:18 ` [RFC PATCH v4 27/39] KVM: arm64: Add SPE system registers to VCPU context Alexandru Elisei
@ 2021-08-25 16:18 ` Alexandru Elisei
  2021-08-25 16:18 ` [RFC PATCH v4 29/39] KVM: arm64: Rename DEBUG_STATE_SAVE_SPE -> DEBUG_SAVE_SPE_BUFFER flags Alexandru Elisei
                   ` (11 subsequent siblings)
  39 siblings, 0 replies; 47+ messages in thread
From: Alexandru Elisei @ 2021-08-25 16:18 UTC (permalink / raw)
  To: maz, james.morse, suzuki.poulose, linux-arm-kernel, kvmarm, will,
	linux-kernel

The SPE registers are now part of the KVM register context, use the host
context to save the value of PMSCR_EL1 instead of a dedicated field in
host_debug_state.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 arch/arm64/include/asm/kvm_host.h  |  2 --
 arch/arm64/include/asm/kvm_hyp.h   |  6 ++++--
 arch/arm64/kvm/hyp/nvhe/debug-sr.c | 10 ++++++----
 arch/arm64/kvm/hyp/nvhe/switch.c   |  4 ++--
 4 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 4c0d3d5ba285..351b77dc7732 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -356,8 +356,6 @@ struct kvm_vcpu_arch {
 	struct {
 		/* {Break,watch}point registers */
 		struct kvm_guest_debug_arch regs;
-		/* Statistical profiling extension */
-		u64 pmscr_el1;
 		/* Self-hosted trace */
 		u64 trfcr_el1;
 	} host_debug_state;
diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
index 657d0c94cf82..48619c2c0dc6 100644
--- a/arch/arm64/include/asm/kvm_hyp.h
+++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -84,8 +84,10 @@ void __debug_switch_to_guest(struct kvm_vcpu *vcpu);
 void __debug_switch_to_host(struct kvm_vcpu *vcpu);
 
 #ifdef __KVM_NVHE_HYPERVISOR__
-void __debug_save_host_buffers_nvhe(struct kvm_vcpu *vcpu);
-void __debug_restore_host_buffers_nvhe(struct kvm_vcpu *vcpu);
+void __debug_save_host_buffers_nvhe(struct kvm_vcpu *vcpu,
+				    struct kvm_cpu_context *host_ctxt);
+void __debug_restore_host_buffers_nvhe(struct kvm_vcpu *vcpu,
+				       struct kvm_cpu_context *host_ctxt);
 #endif
 
 void __fpsimd_save_state(struct user_fpsimd_state *fp_regs);
diff --git a/arch/arm64/kvm/hyp/nvhe/debug-sr.c b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
index 7d3f25868cae..6db58722f045 100644
--- a/arch/arm64/kvm/hyp/nvhe/debug-sr.c
+++ b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
@@ -81,11 +81,12 @@ static void __debug_restore_trace(u64 trfcr_el1)
 	write_sysreg_s(trfcr_el1, SYS_TRFCR_EL1);
 }
 
-void __debug_save_host_buffers_nvhe(struct kvm_vcpu *vcpu)
+void __debug_save_host_buffers_nvhe(struct kvm_vcpu *vcpu,
+				    struct kvm_cpu_context *host_ctxt)
 {
 	/* Disable and flush SPE data generation */
 	if (vcpu->arch.flags & KVM_ARM64_DEBUG_STATE_SAVE_SPE)
-		__debug_save_spe(&vcpu->arch.host_debug_state.pmscr_el1);
+		__debug_save_spe(__ctxt_sys_reg(host_ctxt, PMSCR_EL1));
 	/* Disable and flush Self-Hosted Trace generation */
 	if (vcpu->arch.flags & KVM_ARM64_DEBUG_STATE_SAVE_TRBE)
 		__debug_save_trace(&vcpu->arch.host_debug_state.trfcr_el1);
@@ -96,10 +97,11 @@ void __debug_switch_to_guest(struct kvm_vcpu *vcpu)
 	__debug_switch_to_guest_common(vcpu);
 }
 
-void __debug_restore_host_buffers_nvhe(struct kvm_vcpu *vcpu)
+void __debug_restore_host_buffers_nvhe(struct kvm_vcpu *vcpu,
+				       struct kvm_cpu_context *host_ctxt)
 {
 	if (vcpu->arch.flags & KVM_ARM64_DEBUG_STATE_SAVE_SPE)
-		__debug_restore_spe(vcpu->arch.host_debug_state.pmscr_el1);
+		__debug_restore_spe(ctxt_sys_reg(host_ctxt, PMSCR_EL1));
 	if (vcpu->arch.flags & KVM_ARM64_DEBUG_STATE_SAVE_TRBE)
 		__debug_restore_trace(vcpu->arch.host_debug_state.trfcr_el1);
 }
diff --git a/arch/arm64/kvm/hyp/nvhe/switch.c b/arch/arm64/kvm/hyp/nvhe/switch.c
index 0c70d897a493..04d654e71a6e 100644
--- a/arch/arm64/kvm/hyp/nvhe/switch.c
+++ b/arch/arm64/kvm/hyp/nvhe/switch.c
@@ -200,7 +200,7 @@ int __kvm_vcpu_run(struct kvm_vcpu *vcpu)
 	 * translation regime to EL2 (via MDCR_EL2_E2PB == 0) and
 	 * before we load guest Stage1.
 	 */
-	__debug_save_host_buffers_nvhe(vcpu);
+	__debug_save_host_buffers_nvhe(vcpu, host_ctxt);
 
 	__kvm_adjust_pc(vcpu);
 
@@ -248,7 +248,7 @@ int __kvm_vcpu_run(struct kvm_vcpu *vcpu)
 	 * This must come after restoring the host sysregs, since a non-VHE
 	 * system may enable SPE here and make use of the TTBRs.
 	 */
-	__debug_restore_host_buffers_nvhe(vcpu);
+	__debug_restore_host_buffers_nvhe(vcpu, host_ctxt);
 
 	if (pmu_switch_needed)
 		__pmu_switch_to_host(host_ctxt);
-- 
2.33.0


* [RFC PATCH v4 29/39] KVM: arm64: Rename DEBUG_STATE_SAVE_SPE -> DEBUG_SAVE_SPE_BUFFER flags
  2021-08-25 16:17 [RFC PATCH v4 00/39] KVM: arm64: Add Statistical Profiling Extension (SPE) support Alexandru Elisei
                   ` (27 preceding siblings ...)
  2021-08-25 16:18 ` [RFC PATCH v4 28/39] KVM: arm64: nVHE: Save PMSCR_EL1 to the host context Alexandru Elisei
@ 2021-08-25 16:18 ` Alexandru Elisei
  2021-08-25 16:18 ` [RFC PATCH v4 30/39] KVM: arm64: nVHE: Context switch SPE state if VCPU has SPE Alexandru Elisei
                   ` (10 subsequent siblings)
  39 siblings, 0 replies; 47+ messages in thread
From: Alexandru Elisei @ 2021-08-25 16:18 UTC (permalink / raw)
  To: maz, james.morse, suzuki.poulose, linux-arm-kernel, kvmarm, will,
	linux-kernel

When the KVM_ARM64_DEBUG_STATE_SAVE_SPE flag is set, KVM stops profiling
and drains the buffer before switching to the guest (if the buffer is
enabled), then re-enables profiling on the return to the host. Rename the
flag to KVM_ARM64_DEBUG_SAVE_SPE_BUFFER to avoid confusion with what an
SPE-enabled VCPU does, which is to save and restore the full SPE state on
every world switch, not just part of it some of the time. The new name also
matches the function __debug_save_host_buffers_nvhe(), which uses the flag
to decide if the buffer should be drained.

Similar treatment was applied to KVM_ARM64_DEBUG_STATE_SAVE_TRBE, which was
renamed to KVM_ARM64_DEBUG_SAVE_TRBE_BUFFER, for consistency and to better
reflect what it is doing.

CC: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 arch/arm64/include/asm/kvm_host.h  | 24 ++++++++++++------------
 arch/arm64/kvm/debug.c             | 11 ++++++-----
 arch/arm64/kvm/hyp/nvhe/debug-sr.c |  8 ++++----
 3 files changed, 22 insertions(+), 21 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 351b77dc7732..e704847a7645 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -439,18 +439,18 @@ struct kvm_vcpu_arch {
 })
 
 /* vcpu_arch flags field values: */
-#define KVM_ARM64_DEBUG_DIRTY		(1 << 0)
-#define KVM_ARM64_FP_ENABLED		(1 << 1) /* guest FP regs loaded */
-#define KVM_ARM64_FP_HOST		(1 << 2) /* host FP regs loaded */
-#define KVM_ARM64_HOST_SVE_IN_USE	(1 << 3) /* backup for host TIF_SVE */
-#define KVM_ARM64_HOST_SVE_ENABLED	(1 << 4) /* SVE enabled for EL0 */
-#define KVM_ARM64_GUEST_HAS_SVE		(1 << 5) /* SVE exposed to guest */
-#define KVM_ARM64_VCPU_SVE_FINALIZED	(1 << 6) /* SVE config completed */
-#define KVM_ARM64_GUEST_HAS_PTRAUTH	(1 << 7) /* PTRAUTH exposed to guest */
-#define KVM_ARM64_PENDING_EXCEPTION	(1 << 8) /* Exception pending */
-#define KVM_ARM64_EXCEPT_MASK		(7 << 9) /* Target EL/MODE */
-#define KVM_ARM64_DEBUG_STATE_SAVE_SPE	(1 << 12) /* Save SPE context if active  */
-#define KVM_ARM64_DEBUG_STATE_SAVE_TRBE	(1 << 13) /* Save TRBE context if active  */
+#define KVM_ARM64_DEBUG_DIRTY			(1 << 0)
+#define KVM_ARM64_FP_ENABLED			(1 << 1) /* guest FP regs loaded */
+#define KVM_ARM64_FP_HOST			(1 << 2) /* host FP regs loaded */
+#define KVM_ARM64_HOST_SVE_IN_USE		(1 << 3) /* backup for host TIF_SVE */
+#define KVM_ARM64_HOST_SVE_ENABLED		(1 << 4) /* SVE enabled for EL0 */
+#define KVM_ARM64_GUEST_HAS_SVE			(1 << 5) /* SVE exposed to guest */
+#define KVM_ARM64_VCPU_SVE_FINALIZED		(1 << 6) /* SVE config completed */
+#define KVM_ARM64_GUEST_HAS_PTRAUTH		(1 << 7) /* PTRAUTH exposed to guest */
+#define KVM_ARM64_PENDING_EXCEPTION		(1 << 8) /* Exception pending */
+#define KVM_ARM64_EXCEPT_MASK			(7 << 9) /* Target EL/MODE */
+#define KVM_ARM64_DEBUG_SAVE_SPE_BUFFER		(1 << 12) /* Save SPE buffer if active  */
+#define KVM_ARM64_DEBUG_SAVE_TRBE_BUFFER	(1 << 13) /* Save TRBE buffer if active  */
 
 #define KVM_GUESTDBG_VALID_MASK (KVM_GUESTDBG_ENABLE | \
 				 KVM_GUESTDBG_USE_SW_BP | \
diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c
index 70712cd85f32..6e5fc1887215 100644
--- a/arch/arm64/kvm/debug.c
+++ b/arch/arm64/kvm/debug.c
@@ -299,22 +299,23 @@ void kvm_arch_vcpu_load_debug_state_flags(struct kvm_vcpu *vcpu)
 		return;
 
 	dfr0 = read_sysreg(id_aa64dfr0_el1);
+
 	/*
 	 * If SPE is present on this CPU and is available at current EL,
-	 * we may need to check if the host state needs to be saved.
+	 * we may need to check if the host buffer needs to be drained.
 	 */
 	if (cpuid_feature_extract_unsigned_field(dfr0, ID_AA64DFR0_PMSVER_SHIFT) &&
 	    !(read_sysreg_s(SYS_PMBIDR_EL1) & BIT(SYS_PMBIDR_EL1_P_SHIFT)))
-		vcpu->arch.flags |= KVM_ARM64_DEBUG_STATE_SAVE_SPE;
+		vcpu->arch.flags |= KVM_ARM64_DEBUG_SAVE_SPE_BUFFER;
 
 	/* Check if we have TRBE implemented and available at the host */
 	if (cpuid_feature_extract_unsigned_field(dfr0, ID_AA64DFR0_TRBE_SHIFT) &&
 	    !(read_sysreg_s(SYS_TRBIDR_EL1) & TRBIDR_PROG))
-		vcpu->arch.flags |= KVM_ARM64_DEBUG_STATE_SAVE_TRBE;
+		vcpu->arch.flags |= KVM_ARM64_DEBUG_SAVE_TRBE_BUFFER;
 }
 
 void kvm_arch_vcpu_put_debug_state_flags(struct kvm_vcpu *vcpu)
 {
-	vcpu->arch.flags &= ~(KVM_ARM64_DEBUG_STATE_SAVE_SPE |
-			      KVM_ARM64_DEBUG_STATE_SAVE_TRBE);
+	vcpu->arch.flags &= ~(KVM_ARM64_DEBUG_SAVE_SPE_BUFFER |
+			      KVM_ARM64_DEBUG_SAVE_TRBE_BUFFER);
 }
diff --git a/arch/arm64/kvm/hyp/nvhe/debug-sr.c b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
index 6db58722f045..186b90b5fd20 100644
--- a/arch/arm64/kvm/hyp/nvhe/debug-sr.c
+++ b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
@@ -85,10 +85,10 @@ void __debug_save_host_buffers_nvhe(struct kvm_vcpu *vcpu,
 				    struct kvm_cpu_context *host_ctxt)
 {
 	/* Disable and flush SPE data generation */
-	if (vcpu->arch.flags & KVM_ARM64_DEBUG_STATE_SAVE_SPE)
+	if (vcpu->arch.flags & KVM_ARM64_DEBUG_SAVE_SPE_BUFFER)
 		__debug_save_spe(__ctxt_sys_reg(host_ctxt, PMSCR_EL1));
 	/* Disable and flush Self-Hosted Trace generation */
-	if (vcpu->arch.flags & KVM_ARM64_DEBUG_STATE_SAVE_TRBE)
+	if (vcpu->arch.flags & KVM_ARM64_DEBUG_SAVE_TRBE_BUFFER)
 		__debug_save_trace(&vcpu->arch.host_debug_state.trfcr_el1);
 }
 
@@ -100,9 +100,9 @@ void __debug_switch_to_guest(struct kvm_vcpu *vcpu)
 void __debug_restore_host_buffers_nvhe(struct kvm_vcpu *vcpu,
 				       struct kvm_cpu_context *host_ctxt)
 {
-	if (vcpu->arch.flags & KVM_ARM64_DEBUG_STATE_SAVE_SPE)
+	if (vcpu->arch.flags & KVM_ARM64_DEBUG_SAVE_SPE_BUFFER)
 		__debug_restore_spe(ctxt_sys_reg(host_ctxt, PMSCR_EL1));
-	if (vcpu->arch.flags & KVM_ARM64_DEBUG_STATE_SAVE_TRBE)
+	if (vcpu->arch.flags & KVM_ARM64_DEBUG_SAVE_TRBE_BUFFER)
 		__debug_restore_trace(vcpu->arch.host_debug_state.trfcr_el1);
 }
 
-- 
2.33.0


* [RFC PATCH v4 30/39] KVM: arm64: nVHE: Context switch SPE state if VCPU has SPE
  2021-08-25 16:17 [RFC PATCH v4 00/39] KVM: arm64: Add Statistical Profiling Extension (SPE) support Alexandru Elisei
                   ` (28 preceding siblings ...)
  2021-08-25 16:18 ` [RFC PATCH v4 29/39] KVM: arm64: Rename DEBUG_STATE_SAVE_SPE -> DEBUG_SAVE_SPE_BUFFER flags Alexandru Elisei
@ 2021-08-25 16:18 ` Alexandru Elisei
  2021-08-25 16:18 ` [RFC PATCH v4 31/39] KVM: arm64: VHE: " Alexandru Elisei
                   ` (9 subsequent siblings)
  39 siblings, 0 replies; 47+ messages in thread
From: Alexandru Elisei @ 2021-08-25 16:18 UTC (permalink / raw)
  To: maz, james.morse, suzuki.poulose, linux-arm-kernel, kvmarm, will,
	linux-kernel

For non-VHE systems, make the SPE register state part of the context that
is saved and restored at each world switch. The SPE buffer management
interrupt will be handled in a later patch.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 arch/arm64/include/asm/kvm_hyp.h        | 19 ++++++
 arch/arm64/kvm/hyp/include/hyp/spe-sr.h | 32 +++++++++
 arch/arm64/kvm/hyp/nvhe/Makefile        |  1 +
 arch/arm64/kvm/hyp/nvhe/debug-sr.c      |  6 +-
 arch/arm64/kvm/hyp/nvhe/spe-sr.c        | 87 +++++++++++++++++++++++++
 arch/arm64/kvm/hyp/nvhe/switch.c        | 29 +++++++--
 6 files changed, 165 insertions(+), 9 deletions(-)
 create mode 100644 arch/arm64/kvm/hyp/include/hyp/spe-sr.h
 create mode 100644 arch/arm64/kvm/hyp/nvhe/spe-sr.c

diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
index 48619c2c0dc6..06e77a739458 100644
--- a/arch/arm64/include/asm/kvm_hyp.h
+++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -88,6 +88,25 @@ void __debug_save_host_buffers_nvhe(struct kvm_vcpu *vcpu,
 				    struct kvm_cpu_context *host_ctxt);
 void __debug_restore_host_buffers_nvhe(struct kvm_vcpu *vcpu,
 				       struct kvm_cpu_context *host_ctxt);
+#ifdef CONFIG_KVM_ARM_SPE
+void __spe_save_host_state_nvhe(struct kvm_vcpu *vcpu,
+				struct kvm_cpu_context *host_ctxt);
+void __spe_save_guest_state_nvhe(struct kvm_vcpu *vcpu,
+				 struct kvm_cpu_context *guest_ctxt);
+void __spe_restore_host_state_nvhe(struct kvm_vcpu *vcpu,
+				   struct kvm_cpu_context *host_ctxt);
+void __spe_restore_guest_state_nvhe(struct kvm_vcpu *vcpu,
+				    struct kvm_cpu_context *guest_ctxt);
+#else
+static inline void __spe_save_host_state_nvhe(struct kvm_vcpu *vcpu,
+					struct kvm_cpu_context *host_ctxt) {}
+static inline void __spe_save_guest_state_nvhe(struct kvm_vcpu *vcpu,
+					struct kvm_cpu_context *guest_ctxt) {}
+static inline void __spe_restore_host_state_nvhe(struct kvm_vcpu *vcpu,
+					struct kvm_cpu_context *host_ctxt) {}
+static inline void __spe_restore_guest_state_nvhe(struct kvm_vcpu *vcpu,
+					struct kvm_cpu_context *guest_ctxt) {}
+#endif
 #endif
 
 void __fpsimd_save_state(struct user_fpsimd_state *fp_regs);
diff --git a/arch/arm64/kvm/hyp/include/hyp/spe-sr.h b/arch/arm64/kvm/hyp/include/hyp/spe-sr.h
new file mode 100644
index 000000000000..d5f8f3ffc7d4
--- /dev/null
+++ b/arch/arm64/kvm/hyp/include/hyp/spe-sr.h
@@ -0,0 +1,32 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2021 - ARM Ltd
+ * Author: Alexandru Elisei <alexandru.elisei@arm.com>
+ */
+
+#ifndef __ARM64_KVM_HYP_SPE_SR_H__
+#define __ARM64_KVM_HYP_SPE_SR_H__
+
+#include <linux/kvm_host.h>
+
+#include <asm/kvm_asm.h>
+
+static inline void __spe_save_common_state(struct kvm_cpu_context *ctxt)
+{
+	ctxt_sys_reg(ctxt, PMSICR_EL1) = read_sysreg_s(SYS_PMSICR_EL1);
+	ctxt_sys_reg(ctxt, PMSIRR_EL1) = read_sysreg_s(SYS_PMSIRR_EL1);
+	ctxt_sys_reg(ctxt, PMSFCR_EL1) = read_sysreg_s(SYS_PMSFCR_EL1);
+	ctxt_sys_reg(ctxt, PMSEVFR_EL1) = read_sysreg_s(SYS_PMSEVFR_EL1);
+	ctxt_sys_reg(ctxt, PMSLATFR_EL1) = read_sysreg_s(SYS_PMSLATFR_EL1);
+}
+
+static inline void __spe_restore_common_state(struct kvm_cpu_context *ctxt)
+{
+	write_sysreg_s(ctxt_sys_reg(ctxt, PMSICR_EL1), SYS_PMSICR_EL1);
+	write_sysreg_s(ctxt_sys_reg(ctxt, PMSIRR_EL1), SYS_PMSIRR_EL1);
+	write_sysreg_s(ctxt_sys_reg(ctxt, PMSFCR_EL1), SYS_PMSFCR_EL1);
+	write_sysreg_s(ctxt_sys_reg(ctxt, PMSEVFR_EL1), SYS_PMSEVFR_EL1);
+	write_sysreg_s(ctxt_sys_reg(ctxt, PMSLATFR_EL1), SYS_PMSLATFR_EL1);
+}
+
+#endif /* __ARM64_KVM_HYP_SPE_SR_H__ */
diff --git a/arch/arm64/kvm/hyp/nvhe/Makefile b/arch/arm64/kvm/hyp/nvhe/Makefile
index 5df6193fc430..37dca45d85d5 100644
--- a/arch/arm64/kvm/hyp/nvhe/Makefile
+++ b/arch/arm64/kvm/hyp/nvhe/Makefile
@@ -15,6 +15,7 @@ lib-objs := $(addprefix ../../../lib/, $(lib-objs))
 obj-y := timer-sr.o sysreg-sr.o debug-sr.o switch.o tlb.o hyp-init.o host.o \
 	 hyp-main.o hyp-smp.o psci-relay.o early_alloc.o stub.o page_alloc.o \
 	 cache.o setup.o mm.o mem_protect.o
+obj-$(CONFIG_KVM_ARM_SPE) += spe-sr.o
 obj-y += ../vgic-v3-sr.o ../aarch32.o ../vgic-v2-cpuif-proxy.o ../entry.o \
 	 ../fpsimd.o ../hyp-entry.o ../exception.o ../pgtable.o
 obj-y += $(lib-objs)
diff --git a/arch/arm64/kvm/hyp/nvhe/debug-sr.c b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
index 186b90b5fd20..1622615954b2 100644
--- a/arch/arm64/kvm/hyp/nvhe/debug-sr.c
+++ b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
@@ -85,7 +85,8 @@ void __debug_save_host_buffers_nvhe(struct kvm_vcpu *vcpu,
 				    struct kvm_cpu_context *host_ctxt)
 {
 	/* Disable and flush SPE data generation */
-	if (vcpu->arch.flags & KVM_ARM64_DEBUG_SAVE_SPE_BUFFER)
+	if (!kvm_vcpu_has_spe(vcpu) &&
+	    vcpu->arch.flags & KVM_ARM64_DEBUG_SAVE_SPE_BUFFER)
 		__debug_save_spe(__ctxt_sys_reg(host_ctxt, PMSCR_EL1));
 	/* Disable and flush Self-Hosted Trace generation */
 	if (vcpu->arch.flags & KVM_ARM64_DEBUG_SAVE_TRBE_BUFFER)
@@ -100,7 +101,8 @@ void __debug_switch_to_guest(struct kvm_vcpu *vcpu)
 void __debug_restore_host_buffers_nvhe(struct kvm_vcpu *vcpu,
 				       struct kvm_cpu_context *host_ctxt)
 {
-	if (vcpu->arch.flags & KVM_ARM64_DEBUG_SAVE_SPE_BUFFER)
+	if (!kvm_vcpu_has_spe(vcpu) &&
+	    vcpu->arch.flags & KVM_ARM64_DEBUG_SAVE_SPE_BUFFER)
 		__debug_restore_spe(ctxt_sys_reg(host_ctxt, PMSCR_EL1));
 	if (vcpu->arch.flags & KVM_ARM64_DEBUG_SAVE_TRBE_BUFFER)
 		__debug_restore_trace(vcpu->arch.host_debug_state.trfcr_el1);
diff --git a/arch/arm64/kvm/hyp/nvhe/spe-sr.c b/arch/arm64/kvm/hyp/nvhe/spe-sr.c
new file mode 100644
index 000000000000..46e47c9fd08f
--- /dev/null
+++ b/arch/arm64/kvm/hyp/nvhe/spe-sr.c
@@ -0,0 +1,87 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2021 - ARM Ltd
+ * Author: Alexandru Elisei <alexandru.elisei@arm.com>
+ */
+
+#include <linux/kvm_host.h>
+
+#include <asm/kvm_hyp.h>
+
+#include <hyp/spe-sr.h>
+
+/*
+ * The owning exception level remains unchanged from EL1 during the world switch,
+ * which means that profiling is disabled for as long as we execute at EL2. KVM
+ * does not need to explicitly disable profiling, like it does when the VCPU
+ * does not have SPE and the buffer owning exception level changes, nor does it
+ * need to do any synchronization around sysreg save/restore.
+ */
+
+void __spe_save_host_state_nvhe(struct kvm_vcpu *vcpu,
+				struct kvm_cpu_context *host_ctxt)
+{
+	u64 pmblimitr;
+
+	pmblimitr = read_sysreg_s(SYS_PMBLIMITR_EL1);
+	if (pmblimitr & BIT(SYS_PMBLIMITR_EL1_E_SHIFT)) {
+		psb_csync();
+		dsb(nsh);
+		/*
+		 * The buffer performs indirect writes to system registers, a
+		 * context synchronization event is needed before the new
+		 * PMBPTR_EL1 value is visible to subsequent direct reads.
+		 */
+		isb();
+	}
+
+	ctxt_sys_reg(host_ctxt, PMBPTR_EL1) = read_sysreg_s(SYS_PMBPTR_EL1);
+	ctxt_sys_reg(host_ctxt, PMBSR_EL1) = read_sysreg_s(SYS_PMBSR_EL1);
+	ctxt_sys_reg(host_ctxt, PMBLIMITR_EL1) = pmblimitr;
+	ctxt_sys_reg(host_ctxt, PMSCR_EL1) = read_sysreg_s(SYS_PMSCR_EL1);
+	ctxt_sys_reg(host_ctxt, PMSCR_EL2) = read_sysreg_el2(SYS_PMSCR);
+
+	__spe_save_common_state(host_ctxt);
+}
+
+void __spe_save_guest_state_nvhe(struct kvm_vcpu *vcpu,
+				 struct kvm_cpu_context *guest_ctxt)
+{
+	if (read_sysreg_s(SYS_PMBLIMITR_EL1) & BIT(SYS_PMBLIMITR_EL1_E_SHIFT)) {
+		psb_csync();
+		dsb(nsh);
+		/* Ensure hardware updates to PMBPTR_EL1 are visible. */
+		isb();
+	}
+
+	ctxt_sys_reg(guest_ctxt, PMBPTR_EL1) = read_sysreg_s(SYS_PMBPTR_EL1);
+	ctxt_sys_reg(guest_ctxt, PMBSR_EL1) = read_sysreg_s(SYS_PMBSR_EL1);
+	/* PMBLIMITR_EL1 is updated only on a trapped write. */
+	ctxt_sys_reg(guest_ctxt, PMSCR_EL1) = read_sysreg_s(SYS_PMSCR_EL1);
+
+	__spe_save_common_state(guest_ctxt);
+}
+
+void __spe_restore_host_state_nvhe(struct kvm_vcpu *vcpu,
+				   struct kvm_cpu_context *host_ctxt)
+{
+	__spe_restore_common_state(host_ctxt);
+
+	write_sysreg_s(ctxt_sys_reg(host_ctxt, PMBPTR_EL1), SYS_PMBPTR_EL1);
+	write_sysreg_s(ctxt_sys_reg(host_ctxt, PMBSR_EL1), SYS_PMBSR_EL1);
+	write_sysreg_s(ctxt_sys_reg(host_ctxt, PMBLIMITR_EL1), SYS_PMBLIMITR_EL1);
+	write_sysreg_s(ctxt_sys_reg(host_ctxt, PMSCR_EL1), SYS_PMSCR_EL1);
+	write_sysreg_el2(ctxt_sys_reg(host_ctxt, PMSCR_EL2), SYS_PMSCR);
+}
+
+void __spe_restore_guest_state_nvhe(struct kvm_vcpu *vcpu,
+				    struct kvm_cpu_context *guest_ctxt)
+{
+	__spe_restore_common_state(guest_ctxt);
+
+	write_sysreg_s(ctxt_sys_reg(guest_ctxt, PMBPTR_EL1), SYS_PMBPTR_EL1);
+	write_sysreg_s(ctxt_sys_reg(guest_ctxt, PMBSR_EL1), SYS_PMBSR_EL1);
+	write_sysreg_s(ctxt_sys_reg(guest_ctxt, PMBLIMITR_EL1), SYS_PMBLIMITR_EL1);
+	write_sysreg_s(ctxt_sys_reg(guest_ctxt, PMSCR_EL1), SYS_PMSCR_EL1);
+	write_sysreg_el2(0, SYS_PMSCR);
+}
diff --git a/arch/arm64/kvm/hyp/nvhe/switch.c b/arch/arm64/kvm/hyp/nvhe/switch.c
index 04d654e71a6e..62ef2a5789ba 100644
--- a/arch/arm64/kvm/hyp/nvhe/switch.c
+++ b/arch/arm64/kvm/hyp/nvhe/switch.c
@@ -194,12 +194,16 @@ int __kvm_vcpu_run(struct kvm_vcpu *vcpu)
 
 	__sysreg_save_state_nvhe(host_ctxt);
 	/*
-	 * We must flush and disable the SPE buffer for nVHE, as
-	 * the translation regime(EL1&0) is going to be loaded with
-	 * that of the guest. And we must do this before we change the
-	 * translation regime to EL2 (via MDCR_EL2_E2PB == 0) and
-	 * before we load guest Stage1.
+	 * If the VCPU has the SPE feature bit set, then we save the host's SPE
+	 * context.
+	 *
+	 * Otherwise, we only flush and disable the SPE buffer for nVHE, as the
+	 * translation regime(EL1&0) is going to be loaded with that of the
+	 * guest. And we must do this before we change the translation regime to
+	 * EL2 (via MDCR_EL2_E2PB == 0) and before we load guest Stage1.
 	 */
+	if (kvm_vcpu_has_spe(vcpu))
+		__spe_save_host_state_nvhe(vcpu, host_ctxt);
 	__debug_save_host_buffers_nvhe(vcpu, host_ctxt);
 
 	__kvm_adjust_pc(vcpu);
@@ -218,6 +222,9 @@ int __kvm_vcpu_run(struct kvm_vcpu *vcpu)
 	__load_guest_stage2(kern_hyp_va(vcpu->arch.hw_mmu));
 	__activate_traps(vcpu);
 
+	if (kvm_vcpu_has_spe(vcpu))
+		__spe_restore_guest_state_nvhe(vcpu, guest_ctxt);
+
 	__hyp_vgic_restore_state(vcpu);
 	__timer_enable_traps(vcpu);
 
@@ -232,6 +239,10 @@ int __kvm_vcpu_run(struct kvm_vcpu *vcpu)
 
 	__sysreg_save_state_nvhe(guest_ctxt);
 	__sysreg32_save_state(vcpu);
+
+	if (kvm_vcpu_has_spe(vcpu))
+		__spe_save_guest_state_nvhe(vcpu, guest_ctxt);
+
 	__timer_disable_traps(vcpu);
 	__hyp_vgic_save_state(vcpu);
 
@@ -244,10 +255,14 @@ int __kvm_vcpu_run(struct kvm_vcpu *vcpu)
 		__fpsimd_save_fpexc32(vcpu);
 
 	__debug_switch_to_host(vcpu);
+
 	/*
-	 * This must come after restoring the host sysregs, since a non-VHE
-	 * system may enable SPE here and make use of the TTBRs.
+	 * Restoring the host context must come after restoring the host
+	 * sysregs, since a non-VHE system may enable SPE here and make use of
+	 * the TTBRs.
 	 */
+	if (kvm_vcpu_has_spe(vcpu))
+		__spe_restore_host_state_nvhe(vcpu, host_ctxt);
 	__debug_restore_host_buffers_nvhe(vcpu, host_ctxt);
 
 	if (pmu_switch_needed)
-- 
2.33.0


* [RFC PATCH v4 31/39] KVM: arm64: VHE: Context switch SPE state if VCPU has SPE
  2021-08-25 16:17 [RFC PATCH v4 00/39] KVM: arm64: Add Statistical Profiling Extension (SPE) support Alexandru Elisei
                   ` (29 preceding siblings ...)
  2021-08-25 16:18 ` [RFC PATCH v4 30/39] KVM: arm64: nVHE: Context switch SPE state if VCPU has SPE Alexandru Elisei
@ 2021-08-25 16:18 ` Alexandru Elisei
  2021-08-25 16:18 ` [RFC PATCH v4 32/39] KVM: arm64: Save/restore PMSNEVFR_EL1 on VCPU put/load Alexandru Elisei
                   ` (8 subsequent siblings)
  39 siblings, 0 replies; 47+ messages in thread
From: Alexandru Elisei @ 2021-08-25 16:18 UTC (permalink / raw)
  To: maz, james.morse, suzuki.poulose, linux-arm-kernel, kvmarm, will,
	linux-kernel

Similar to the non-VHE case, save and restore the SPE register state at
each world switch for VHE enabled systems if the VCPU has the SPE
feature.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 arch/arm64/include/asm/kvm_hyp.h |  24 +++++-
 arch/arm64/include/asm/sysreg.h  |   2 +
 arch/arm64/kvm/hyp/vhe/Makefile  |   1 +
 arch/arm64/kvm/hyp/vhe/spe-sr.c  | 128 +++++++++++++++++++++++++++++++
 arch/arm64/kvm/hyp/vhe/switch.c  |   8 ++
 5 files changed, 161 insertions(+), 2 deletions(-)
 create mode 100644 arch/arm64/kvm/hyp/vhe/spe-sr.c

diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
index 06e77a739458..03bc51049996 100644
--- a/arch/arm64/include/asm/kvm_hyp.h
+++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -106,8 +106,28 @@ static inline void __spe_restore_host_state_nvhe(struct kvm_vcpu *vcpu,
 					struct kvm_cpu_context *host_ctxt) {}
 static inline void __spe_restore_guest_state_nvhe(struct kvm_vcpu *vcpu,
 					struct kvm_cpu_context *guest_ctxt) {}
-#endif
-#endif
+#endif /* CONFIG_KVM_ARM_SPE */
+#else
+#ifdef CONFIG_KVM_ARM_SPE
+void __spe_save_host_state_vhe(struct kvm_vcpu *vcpu,
+			       struct kvm_cpu_context *host_ctxt);
+void __spe_save_guest_state_vhe(struct kvm_vcpu *vcpu,
+				struct kvm_cpu_context *guest_ctxt);
+void __spe_restore_host_state_vhe(struct kvm_vcpu *vcpu,
+				  struct kvm_cpu_context *host_ctxt);
+void __spe_restore_guest_state_vhe(struct kvm_vcpu *vcpu,
+				   struct kvm_cpu_context *guest_ctxt);
+#else
+static inline void __spe_save_host_state_vhe(struct kvm_vcpu *vcpu,
+					struct kvm_cpu_context *host_ctxt) {}
+static inline void __spe_save_guest_state_vhe(struct kvm_vcpu *vcpu,
+					struct kvm_cpu_context *guest_ctxt) {}
+static inline void __spe_restore_host_state_vhe(struct kvm_vcpu *vcpu,
+					struct kvm_cpu_context *host_ctxt) {}
+static inline void __spe_restore_guest_state_vhe(struct kvm_vcpu *vcpu,
+					struct kvm_cpu_context *guest_ctxt) {}
+#endif /* CONFIG_KVM_ARM_SPE */
+#endif /* __KVM_NVHE_HYPERVISOR__ */
 
 void __fpsimd_save_state(struct user_fpsimd_state *fp_regs);
 void __fpsimd_restore_state(struct user_fpsimd_state *fp_regs);
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 7b9c3acba684..b2d691bc1049 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -267,6 +267,8 @@
 #define SYS_PMSCR_EL1_TS_SHIFT		5
 #define SYS_PMSCR_EL1_PCT_SHIFT		6
 
+#define SYS_PMSCR_EL12			sys_reg(3, 5, 9, 9, 0)
+
 #define SYS_PMSCR_EL2			sys_reg(3, 4, 9, 9, 0)
 #define SYS_PMSCR_EL2_E0HSPE_SHIFT	0
 #define SYS_PMSCR_EL2_E2SPE_SHIFT	1
diff --git a/arch/arm64/kvm/hyp/vhe/Makefile b/arch/arm64/kvm/hyp/vhe/Makefile
index 96bec0ecf9dd..7cb4a9e5ceb0 100644
--- a/arch/arm64/kvm/hyp/vhe/Makefile
+++ b/arch/arm64/kvm/hyp/vhe/Makefile
@@ -7,5 +7,6 @@ asflags-y := -D__KVM_VHE_HYPERVISOR__
 ccflags-y := -D__KVM_VHE_HYPERVISOR__
 
 obj-y := timer-sr.o sysreg-sr.o debug-sr.o switch.o tlb.o
+obj-$(CONFIG_KVM_ARM_SPE) += spe-sr.o
 obj-y += ../vgic-v3-sr.o ../aarch32.o ../vgic-v2-cpuif-proxy.o ../entry.o \
 	 ../fpsimd.o ../hyp-entry.o ../exception.o
diff --git a/arch/arm64/kvm/hyp/vhe/spe-sr.c b/arch/arm64/kvm/hyp/vhe/spe-sr.c
new file mode 100644
index 000000000000..00eab9e2ec60
--- /dev/null
+++ b/arch/arm64/kvm/hyp/vhe/spe-sr.c
@@ -0,0 +1,128 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2021 - ARM Ltd
+ */
+
+#include <linux/kvm_host.h>
+
+#include <asm/kvm_hyp.h>
+#include <asm/kprobes.h>
+
+#include <hyp/spe-sr.h>
+
+/*
+ * Disable host profiling, drain the buffer and save the host SPE context.
+ * Extra care must be taken because profiling might be in progress.
+ */
+void __spe_save_host_state_vhe(struct kvm_vcpu *vcpu,
+			       struct kvm_cpu_context *host_ctxt)
+{
+	u64 pmblimitr, pmscr_el2;
+
+	/* Disable profiling while the SPE context is being switched. */
+	pmscr_el2 = read_sysreg_el2(SYS_PMSCR);
+	write_sysreg_el2(0, SYS_PMSCR);
+	isb();
+
+	pmblimitr = read_sysreg_s(SYS_PMBLIMITR_EL1);
+	if (pmblimitr & BIT(SYS_PMBLIMITR_EL1_E_SHIFT)) {
+		psb_csync();
+		dsb(nsh);
+		/* Ensure hardware updates to PMBPTR_EL1 are visible. */
+		isb();
+	}
+
+	ctxt_sys_reg(host_ctxt, PMBPTR_EL1) = read_sysreg_s(SYS_PMBPTR_EL1);
+	ctxt_sys_reg(host_ctxt, PMBSR_EL1) = read_sysreg_s(SYS_PMBSR_EL1);
+	ctxt_sys_reg(host_ctxt, PMBLIMITR_EL1) = pmblimitr;
+	ctxt_sys_reg(host_ctxt, PMSCR_EL2) = pmscr_el2;
+
+	__spe_save_common_state(host_ctxt);
+}
+NOKPROBE_SYMBOL(__spe_save_host_state_vhe);
+
+/*
+ * Drain the guest's buffer and save the SPE state. Profiling is disabled
+ * because we're at EL2 and the buffer owning exception level is EL1.
+ */
+void __spe_save_guest_state_vhe(struct kvm_vcpu *vcpu,
+				struct kvm_cpu_context *guest_ctxt)
+{
+	u64 pmblimitr;
+
+	/*
+	 * We're at EL2 and the buffer owning regime is EL1, which means that
+	 * profiling is disabled. After we disable traps and restore the host's
+	 * MDCR_EL2, profiling will remain disabled because we've disabled it
+	 * via PMSCR_EL2 when we saved the host's SPE state. All that's needed
+	 * here is to drain the buffer.
+	 */
+	pmblimitr = read_sysreg_s(SYS_PMBLIMITR_EL1);
+	if (pmblimitr & BIT(SYS_PMBLIMITR_EL1_E_SHIFT)) {
+		psb_csync();
+		dsb(nsh);
+		/* Ensure hardware updates to PMBPTR_EL1 are visible. */
+		isb();
+	}
+
+	ctxt_sys_reg(guest_ctxt, PMBPTR_EL1) = read_sysreg_s(SYS_PMBPTR_EL1);
+	ctxt_sys_reg(guest_ctxt, PMBSR_EL1) = read_sysreg_s(SYS_PMBSR_EL1);
+	/* PMBLIMITR_EL1 is updated only on a trapped write. */
+	ctxt_sys_reg(guest_ctxt, PMSCR_EL1) = read_sysreg_el1(SYS_PMSCR);
+
+	__spe_save_common_state(guest_ctxt);
+}
+NOKPROBE_SYMBOL(__spe_save_guest_state_vhe);
+
+/*
+ * Restore the host SPE context. Special care must be taken because we're
+ * potentially resuming a profiling session which was stopped when we saved the
+ * host SPE register state.
+ */
+void __spe_restore_host_state_vhe(struct kvm_vcpu *vcpu,
+				  struct kvm_cpu_context *host_ctxt)
+{
+	__spe_restore_common_state(host_ctxt);
+
+	write_sysreg_s(ctxt_sys_reg(host_ctxt, PMBPTR_EL1), SYS_PMBPTR_EL1);
+	write_sysreg_s(ctxt_sys_reg(host_ctxt, PMBLIMITR_EL1), SYS_PMBLIMITR_EL1);
+	write_sysreg_s(ctxt_sys_reg(host_ctxt, PMBSR_EL1), SYS_PMBSR_EL1);
+
+	/*
+	 * Make sure the buffer pointer and limit are updated first, so we don't end
+	 * up in a situation where profiling is enabled and the buffer uses the
+	 * values programmed by the guest.
+	 *
+	 * This also serves to make sure the write to MDCR_EL2 which changes the
+	 * buffer owning Exception level is visible.
+	 *
+	 * After the synchronization, profiling is still disabled at EL2,
+	 * because we cleared PMSCR_EL2 when we saved the host context.
+	 */
+	isb();
+
+	write_sysreg_el2(ctxt_sys_reg(host_ctxt, PMSCR_EL2), SYS_PMSCR);
+}
+NOKPROBE_SYMBOL(__spe_restore_host_state_vhe);
+
+/*
+ * Restore the guest SPE context while profiling is disabled at EL2.
+ */
+void __spe_restore_guest_state_vhe(struct kvm_vcpu *vcpu,
+				   struct kvm_cpu_context *guest_ctxt)
+{
+	__spe_restore_common_state(guest_ctxt);
+
+	/*
+	 * No synchronization needed here. Profiling is disabled at EL2 because
+	 * PMSCR_EL2 has been cleared when saving the host's context, and the
+	 * buffer has already been drained.
+	 */
+
+	write_sysreg_s(ctxt_sys_reg(guest_ctxt, PMBPTR_EL1), SYS_PMBPTR_EL1);
+	write_sysreg_s(ctxt_sys_reg(guest_ctxt, PMBSR_EL1), SYS_PMBSR_EL1);
+	write_sysreg_s(ctxt_sys_reg(guest_ctxt, PMBLIMITR_EL1), SYS_PMBLIMITR_EL1);
+	write_sysreg_el1(ctxt_sys_reg(guest_ctxt, PMSCR_EL1), SYS_PMSCR);
+	/* PMSCR_EL2 has been cleared when saving the host state. */
+}
+NOKPROBE_SYMBOL(__spe_restore_guest_state_vhe);
diff --git a/arch/arm64/kvm/hyp/vhe/switch.c b/arch/arm64/kvm/hyp/vhe/switch.c
index ec4e179d56ae..46da018f4a5a 100644
--- a/arch/arm64/kvm/hyp/vhe/switch.c
+++ b/arch/arm64/kvm/hyp/vhe/switch.c
@@ -135,6 +135,8 @@ static int __kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
 	guest_ctxt = &vcpu->arch.ctxt;
 
 	sysreg_save_host_state_vhe(host_ctxt);
+	if (kvm_vcpu_has_spe(vcpu))
+		__spe_save_host_state_vhe(vcpu, host_ctxt);
 
 	/*
 	 * ARM erratum 1165522 requires us to configure both stage 1 and
@@ -153,6 +155,8 @@ static int __kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
 	__kvm_adjust_pc(vcpu);
 
 	sysreg_restore_guest_state_vhe(guest_ctxt);
+	if (kvm_vcpu_has_spe(vcpu))
+		__spe_restore_guest_state_vhe(vcpu, guest_ctxt);
 	__debug_switch_to_guest(vcpu);
 
 	do {
@@ -163,10 +167,14 @@ static int __kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
 	} while (fixup_guest_exit(vcpu, &exit_code));
 
 	sysreg_save_guest_state_vhe(guest_ctxt);
+	if (kvm_vcpu_has_spe(vcpu))
+		__spe_save_guest_state_vhe(vcpu, guest_ctxt);
 
 	__deactivate_traps(vcpu);
 
 	sysreg_restore_host_state_vhe(host_ctxt);
+	if (kvm_vcpu_has_spe(vcpu))
+		__spe_restore_host_state_vhe(vcpu, host_ctxt);
 
 	if (vcpu->arch.flags & KVM_ARM64_FP_ENABLED)
 		__fpsimd_save_fpexc32(vcpu);
-- 
2.33.0


* [RFC PATCH v4 32/39] KVM: arm64: Save/restore PMSNEVFR_EL1 on VCPU put/load
  2021-08-25 16:17 [RFC PATCH v4 00/39] KVM: arm64: Add Statistical Profiling Extension (SPE) support Alexandru Elisei
                   ` (30 preceding siblings ...)
  2021-08-25 16:18 ` [RFC PATCH v4 31/39] KVM: arm64: VHE: " Alexandru Elisei
@ 2021-08-25 16:18 ` Alexandru Elisei
  2021-08-25 16:18 ` [RFC PATCH v4 33/39] KVM: arm64: Allow guest to use physical timestamps if perfmon_capable() Alexandru Elisei
                   ` (7 subsequent siblings)
  39 siblings, 0 replies; 47+ messages in thread
From: Alexandru Elisei @ 2021-08-25 16:18 UTC (permalink / raw)
  To: maz, james.morse, suzuki.poulose, linux-arm-kernel, kvmarm, will,
	linux-kernel

FEAT_SPEv1p2 introduced a new register, PMSNEVFR_EL1. The SPE driver does not
use the register, so it is enough to save it to the guest context on vcpu_put()
and restore it on vcpu_load(): the host will not touch it, and the value
programmed by the guest doesn't affect the host.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 arch/arm64/include/asm/kvm_host.h |  1 +
 arch/arm64/include/asm/kvm_spe.h  |  6 ++++++
 arch/arm64/include/asm/sysreg.h   |  1 +
 arch/arm64/kvm/arm.c              |  2 ++
 arch/arm64/kvm/spe.c              | 29 +++++++++++++++++++++++++++++
 arch/arm64/kvm/sys_regs.c         |  1 +
 6 files changed, 40 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index e704847a7645..66f0b999cb5f 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -239,6 +239,7 @@ enum vcpu_sysreg {
 
        /* Statistical Profiling Extension Registers. */
 	PMSCR_EL1,      /* Statistical Profiling Control Register */
+	PMSNEVFR_EL1,   /* Sampling Inverted Event Filter Register */
 	PMSICR_EL1,     /* Sampling Interval Counter Register */
 	PMSIRR_EL1,     /* Sampling Interval Reload Register */
 	PMSFCR_EL1,     /* Sampling Filter Control Register */
diff --git a/arch/arm64/include/asm/kvm_spe.h b/arch/arm64/include/asm/kvm_spe.h
index 934eedb0de46..6b8d4cf2cd37 100644
--- a/arch/arm64/include/asm/kvm_spe.h
+++ b/arch/arm64/include/asm/kvm_spe.h
@@ -28,6 +28,9 @@ int kvm_spe_vcpu_first_run_init(struct kvm_vcpu *vcpu);
 void kvm_spe_write_sysreg(struct kvm_vcpu *vcpu, int reg, u64 val);
 u64 kvm_spe_read_sysreg(struct kvm_vcpu *vcpu, int reg);
 
+void kvm_spe_vcpu_load(struct kvm_vcpu *vcpu);
+void kvm_spe_vcpu_put(struct kvm_vcpu *vcpu);
+
 int kvm_spe_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr);
 int kvm_spe_get_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr);
 int kvm_spe_has_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr);
@@ -45,6 +48,9 @@ static inline int kvm_spe_vcpu_first_run_init(struct kvm_vcpu *vcpu) { return -E
 static inline void kvm_spe_write_sysreg(struct kvm_vcpu *vcpu, int reg, u64 val) {}
 static inline u64 kvm_spe_read_sysreg(struct kvm_vcpu *vcpu, int reg) { return 0; }
 
+static inline void kvm_spe_vcpu_load(struct kvm_vcpu *vcpu) {}
+static inline void kvm_spe_vcpu_put(struct kvm_vcpu *vcpu) {}
+
 static inline int kvm_spe_set_attr(struct kvm_vcpu *vcpu,
 				   struct kvm_device_attr *attr)
 {
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index b2d691bc1049..cedab4494a71 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -921,6 +921,7 @@
 
 #define ID_AA64DFR0_PMSVER_8_2		0x1
 #define ID_AA64DFR0_PMSVER_8_3		0x2
+#define ID_AA64DFR0_PMSVER_8_7		0x3
 
 #define ID_DFR0_PERFMON_SHIFT		24
 
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 6af7ef26d2c1..cd64894b286e 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -466,6 +466,7 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	if (vcpu_has_ptrauth(vcpu))
 		vcpu_ptrauth_disable(vcpu);
 	kvm_arch_vcpu_load_debug_state_flags(vcpu);
+	kvm_spe_vcpu_load(vcpu);
 
 	if (!cpumask_empty(&vcpu->arch.supported_cpus) &&
 	    !cpumask_test_cpu(smp_processor_id(), &vcpu->arch.supported_cpus))
@@ -474,6 +475,7 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 
 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 {
+	kvm_spe_vcpu_put(vcpu);
 	kvm_arch_vcpu_put_debug_state_flags(vcpu);
 	kvm_arch_vcpu_put_fp(vcpu);
 	if (has_vhe())
diff --git a/arch/arm64/kvm/spe.c b/arch/arm64/kvm/spe.c
index f760ccd8306a..711736c49f63 100644
--- a/arch/arm64/kvm/spe.c
+++ b/arch/arm64/kvm/spe.c
@@ -67,6 +67,35 @@ u64 kvm_spe_read_sysreg(struct kvm_vcpu *vcpu, int reg)
 	return __vcpu_sys_reg(vcpu, reg);
 }
 
+static unsigned int kvm_spe_get_pmsver(void)
+{
+	u64 dfr0 = read_sysreg(id_aa64dfr0_el1);
+
+	return cpuid_feature_extract_unsigned_field(dfr0, ID_AA64DFR0_PMSVER_SHIFT);
+}
+
+void kvm_spe_vcpu_load(struct kvm_vcpu *vcpu)
+{
+	if (!kvm_vcpu_has_spe(vcpu))
+		return;
+
+	if (kvm_spe_get_pmsver() < ID_AA64DFR0_PMSVER_8_7)
+		return;
+
+	write_sysreg_s(__vcpu_sys_reg(vcpu, PMSNEVFR_EL1), SYS_PMSNEVFR_EL1);
+}
+
+void kvm_spe_vcpu_put(struct kvm_vcpu *vcpu)
+{
+	if (!kvm_vcpu_has_spe(vcpu))
+		return;
+
+	if (kvm_spe_get_pmsver() < ID_AA64DFR0_PMSVER_8_7)
+		return;
+
+	__vcpu_sys_reg(vcpu, PMSNEVFR_EL1) = read_sysreg_s(SYS_PMSNEVFR_EL1);
+}
+
 static bool kvm_vcpu_supports_spe(struct kvm_vcpu *vcpu)
 {
 	if (!kvm_supports_spe())
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 843822be5695..064742cee425 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1562,6 +1562,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 	{ SYS_DESC(SYS_PAR_EL1), NULL, reset_unknown, PAR_EL1 },
 
 	{ SPE_SYS_REG(SYS_PMSCR_EL1), .reg = PMSCR_EL1 },
+	{ SPE_SYS_REG(SYS_PMSNEVFR_EL1), .reg = PMSNEVFR_EL1 },
 	{ SPE_SYS_REG(SYS_PMSICR_EL1), .reg = PMSICR_EL1 },
 	{ SPE_SYS_REG(SYS_PMSIRR_EL1), .reg = PMSIRR_EL1 },
 	{ SPE_SYS_REG(SYS_PMSFCR_EL1), .reg = PMSFCR_EL1 },
-- 
2.33.0


* [RFC PATCH v4 33/39] KVM: arm64: Allow guest to use physical timestamps if perfmon_capable()
  2021-08-25 16:17 [RFC PATCH v4 00/39] KVM: arm64: Add Statistical Profiling Extension (SPE) support Alexandru Elisei
                   ` (31 preceding siblings ...)
  2021-08-25 16:18 ` [RFC PATCH v4 32/39] KVM: arm64: Save/restore PMSNEVFR_EL1 on VCPU put/load Alexandru Elisei
@ 2021-08-25 16:18 ` Alexandru Elisei
  2021-08-25 16:18 ` [RFC PATCH v4 34/39] KVM: arm64: Emulate SPE buffer management interrupt Alexandru Elisei
                   ` (6 subsequent siblings)
  39 siblings, 0 replies; 47+ messages in thread
From: Alexandru Elisei @ 2021-08-25 16:18 UTC (permalink / raw)
  To: maz, james.morse, suzuki.poulose, linux-arm-kernel, kvmarm, will,
	linux-kernel

The SPE driver allows userspace to use physical timestamps for records only
if the process is perfmon_capable(). Do the same for a virtual machine with
the SPE feature.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
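Note: below is a rough userspace sketch, not part of the patch, showing how a
VMM might check ahead of time whether the guest will be allowed to use physical
timestamps. On the kernel side perfmon_capable() is true for CAP_PERFMON or
CAP_SYS_ADMIN, so the check below (using libcap, link with -lcap; CAP_PERFMON is
hardcoded to 38 for older capability headers) only mirrors what KVM decides at
VM creation time:

#include <stdbool.h>
#include <sys/capability.h>

#ifndef CAP_PERFMON
#define CAP_PERFMON	38
#endif

/* Best-effort userspace mirror of the kernel's perfmon_capable() check. */
static bool vmm_perfmon_capable(void)
{
	cap_t caps = cap_get_proc();
	cap_flag_value_t perfmon = CAP_CLEAR, sys_admin = CAP_CLEAR;

	if (!caps)
		return false;

	cap_get_flag(caps, CAP_PERFMON, CAP_EFFECTIVE, &perfmon);
	cap_get_flag(caps, CAP_SYS_ADMIN, CAP_EFFECTIVE, &sys_admin);
	cap_free(caps);

	return perfmon == CAP_SET || sys_admin == CAP_SET;
}

Because the value is recorded when the VM is created, dropping CAP_PERFMON after
KVM_CREATE_VM does not take physical timestamps away from the guest.
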
 arch/arm64/include/asm/kvm_host.h |  2 ++
 arch/arm64/include/asm/kvm_spe.h  |  7 +++++++
 arch/arm64/kvm/hyp/nvhe/spe-sr.c  |  2 +-
 arch/arm64/kvm/hyp/vhe/spe-sr.c   |  2 +-
 arch/arm64/kvm/spe.c              | 14 ++++++++++++++
 5 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 66f0b999cb5f..cc46f1406196 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -157,6 +157,8 @@ struct kvm_arch {
 
 	/* Memory Tagging Extension enabled for the guest */
 	bool mte_enabled;
+
+	struct kvm_spe spe;
 };
 
 struct kvm_vcpu_fault_info {
diff --git a/arch/arm64/include/asm/kvm_spe.h b/arch/arm64/include/asm/kvm_spe.h
index 6b8d4cf2cd37..272f1eec64f2 100644
--- a/arch/arm64/include/asm/kvm_spe.h
+++ b/arch/arm64/include/asm/kvm_spe.h
@@ -21,6 +21,10 @@ struct kvm_vcpu_spe {
 	int irq_num;		/* Buffer management interrut number */
 };
 
+struct kvm_spe {
+	bool perfmon_capable;	/* Is the VM perfmon_capable()? */
+};
+
 void kvm_spe_init_supported_cpus(void);
 void kvm_spe_vm_init(struct kvm *kvm);
 int kvm_spe_vcpu_first_run_init(struct kvm_vcpu *vcpu);
@@ -41,6 +45,9 @@ int kvm_spe_has_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr);
 struct kvm_vcpu_spe {
 };
 
+struct kvm_spe {
+};
+
 static inline void kvm_spe_init_supported_cpus(void) {}
 static inline void kvm_spe_vm_init(struct kvm *kvm) {}
 static inline int kvm_spe_vcpu_first_run_init(struct kvm_vcpu *vcpu) { return -ENOEXEC; }
diff --git a/arch/arm64/kvm/hyp/nvhe/spe-sr.c b/arch/arm64/kvm/hyp/nvhe/spe-sr.c
index 46e47c9fd08f..4f6579daddb5 100644
--- a/arch/arm64/kvm/hyp/nvhe/spe-sr.c
+++ b/arch/arm64/kvm/hyp/nvhe/spe-sr.c
@@ -83,5 +83,5 @@ void __spe_restore_guest_state_nvhe(struct kvm_vcpu *vcpu,
 	write_sysreg_s(ctxt_sys_reg(guest_ctxt, PMBSR_EL1), SYS_PMBSR_EL1);
 	write_sysreg_s(ctxt_sys_reg(guest_ctxt, PMBLIMITR_EL1), SYS_PMBLIMITR_EL1);
 	write_sysreg_s(ctxt_sys_reg(guest_ctxt, PMSCR_EL1), SYS_PMSCR_EL1);
-	write_sysreg_el2(0, SYS_PMSCR);
+	write_sysreg_el2(ctxt_sys_reg(guest_ctxt, PMSCR_EL2), SYS_PMSCR);
 }
diff --git a/arch/arm64/kvm/hyp/vhe/spe-sr.c b/arch/arm64/kvm/hyp/vhe/spe-sr.c
index 00eab9e2ec60..f557ac64a1cc 100644
--- a/arch/arm64/kvm/hyp/vhe/spe-sr.c
+++ b/arch/arm64/kvm/hyp/vhe/spe-sr.c
@@ -21,7 +21,7 @@ void __spe_save_host_state_vhe(struct kvm_vcpu *vcpu,
 
 	/* Disable profiling while the SPE context is being switched. */
 	pmscr_el2 = read_sysreg_el2(SYS_PMSCR);
-	write_sysreg_el2(0, SYS_PMSCR);
+	write_sysreg_el2(__vcpu_sys_reg(vcpu, PMSCR_EL2), SYS_PMSCR);
 	isb();
 
 	pmblimitr = read_sysreg_s(SYS_PMBLIMITR_EL1);
diff --git a/arch/arm64/kvm/spe.c b/arch/arm64/kvm/spe.c
index 711736c49f63..054bb16bbd79 100644
--- a/arch/arm64/kvm/spe.c
+++ b/arch/arm64/kvm/spe.c
@@ -3,6 +3,7 @@
  * Copyright (C) 2021 - ARM Ltd
  */
 
+#include <linux/capability.h>
 #include <linux/cpumask.h>
 #include <linux/kvm_host.h>
 #include <linux/perf/arm_pmu.h>
@@ -29,6 +30,16 @@ void kvm_spe_vm_init(struct kvm *kvm)
 {
 	/* Set supported_cpus if it isn't already initialized. */
 	kvm_spe_init_supported_cpus();
+
+	/*
+	 * Allow the guest to use the physical timer for timestamps only if the
+	 * VMM is perfmon_capable(), similar to what the SPE driver allows.
+	 *
+	 * CAP_PERFMON can be changed during the lifetime of the VM, so record
+	 * its value when the VM is created to avoid situations where only some
+	 * VCPUs allow physical timer timestamps, while others don't.
+	 */
+	kvm->arch.spe.perfmon_capable = perfmon_capable();
 }
 
 static int kvm_spe_check_supported_cpus(struct kvm_vcpu *vcpu)
@@ -54,6 +65,9 @@ int kvm_spe_vcpu_first_run_init(struct kvm_vcpu *vcpu)
 	if (!vcpu->arch.spe.initialized)
 		return -EPERM;
 
+	if (vcpu->kvm->arch.spe.perfmon_capable)
+		__vcpu_sys_reg(vcpu, PMSCR_EL2) = BIT(SYS_PMSCR_EL1_PCT_SHIFT);
+
 	return 0;
 }
 
-- 
2.33.0


* [RFC PATCH v4 34/39] KVM: arm64: Emulate SPE buffer management interrupt
  2021-08-25 16:17 [RFC PATCH v4 00/39] KVM: arm64: Add Statistical Profiling Extension (SPE) support Alexandru Elisei
                   ` (32 preceding siblings ...)
  2021-08-25 16:18 ` [RFC PATCH v4 33/39] KVM: arm64: Allow guest to use physical timestamps if perfmon_capable() Alexandru Elisei
@ 2021-08-25 16:18 ` Alexandru Elisei
  2021-08-25 16:18 ` [RFC PATCH v4 35/39] KVM: arm64: Add an userspace API to stop a VCPU profiling Alexandru Elisei
                   ` (5 subsequent siblings)
  39 siblings, 0 replies; 47+ messages in thread
From: Alexandru Elisei @ 2021-08-25 16:18 UTC (permalink / raw)
  To: maz, james.morse, suzuki.poulose, linux-arm-kernel, kvmarm, will,
	linux-kernel

A profiling buffer management interrupt is asserted when the buffer fills,
on a fault or on an external abort. The service bit, PMBSR_EL1.S, is set as
long as SPE asserts this interrupt. The interrupt can also be asserted
following a direct write to PMBSR_EL1 that sets the bit. The SPE hardware
stops asserting the interrupt only when the service bit is cleared.

KVM emulates the interrupt by reading the value of the service bit on
each guest exit to determine if the SPE hardware asserted the interrupt
(for example, if the buffer was full). Writes to the buffer registers are
trapped, to determine when the interrupt should be cleared or when the
guest wants to explicitly assert the interrupt by setting the service bit.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
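Note: for context, here is a simplified, hypothetical sketch (the function name
is made up, and the real arm_spe_pmu handler also drains and restarts the
buffer) of what a guest driver's buffer management interrupt handler does. The
final write to PMBSR_EL1, which clears the service bit, is the trapped access
that lets KVM deassert the virtual interrupt:

#include <linux/bits.h>
#include <linux/interrupt.h>

#include <asm/barrier.h>
#include <asm/sysreg.h>

static irqreturn_t spe_buf_irq_handler(int irq, void *dev)
{
	u64 pmbsr = read_sysreg_s(SYS_PMBSR_EL1);

	/* Not our interrupt if the service bit is clear. */
	if (!(pmbsr & BIT(SYS_PMBSR_EL1_S_SHIFT)))
		return IRQ_NONE;

	/* Handle the event: buffer full, stage 1 fault or external abort. */

	/*
	 * Clearing the service bit is what stops the interrupt from being
	 * asserted; with this patch, the write is trapped and KVM lowers the
	 * virtual interrupt.
	 */
	write_sysreg_s(0, SYS_PMBSR_EL1);
	isb();

	return IRQ_HANDLED;
}
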
 arch/arm64/include/asm/kvm_spe.h |  4 ++
 arch/arm64/kvm/arm.c             |  3 ++
 arch/arm64/kvm/hyp/nvhe/spe-sr.c | 28 +++++++++++--
 arch/arm64/kvm/hyp/vhe/spe-sr.c  | 17 ++++++--
 arch/arm64/kvm/spe.c             | 72 ++++++++++++++++++++++++++++++++
 5 files changed, 117 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_spe.h b/arch/arm64/include/asm/kvm_spe.h
index 272f1eec64f2..d7d7b9e243de 100644
--- a/arch/arm64/include/asm/kvm_spe.h
+++ b/arch/arm64/include/asm/kvm_spe.h
@@ -19,6 +19,8 @@ static __always_inline bool kvm_supports_spe(void)
 struct kvm_vcpu_spe {
 	bool initialized;	/* SPE initialized for the VCPU */
 	int irq_num;		/* Buffer management interrut number */
+	bool irq_level;		/* 'true' if the interrupt is asserted at the VGIC */
+	bool hwirq_level;	/* 'true' if the SPE hardware is asserting the interrupt */
 };
 
 struct kvm_spe {
@@ -28,6 +30,7 @@ struct kvm_spe {
 void kvm_spe_init_supported_cpus(void);
 void kvm_spe_vm_init(struct kvm *kvm);
 int kvm_spe_vcpu_first_run_init(struct kvm_vcpu *vcpu);
+void kvm_spe_sync_hwstate(struct kvm_vcpu *vcpu);
 
 void kvm_spe_write_sysreg(struct kvm_vcpu *vcpu, int reg, u64 val);
 u64 kvm_spe_read_sysreg(struct kvm_vcpu *vcpu, int reg);
@@ -51,6 +54,7 @@ struct kvm_spe {
 static inline void kvm_spe_init_supported_cpus(void) {}
 static inline void kvm_spe_vm_init(struct kvm *kvm) {}
 static inline int kvm_spe_vcpu_first_run_init(struct kvm_vcpu *vcpu) { return -ENOEXEC; }
+static inline void kvm_spe_sync_hwstate(struct kvm_vcpu *vcpu) {}
 
 static inline void kvm_spe_write_sysreg(struct kvm_vcpu *vcpu, int reg, u64 val) {}
 static inline u64 kvm_spe_read_sysreg(struct kvm_vcpu *vcpu, int reg) { return 0; }
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index cd64894b286e..ec449bc5f811 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -949,6 +949,9 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 		 */
 		kvm_pmu_sync_hwstate(vcpu);
 
+		if (kvm_supports_spe() && kvm_vcpu_has_spe(vcpu))
+			kvm_spe_sync_hwstate(vcpu);
+
 		/*
 		 * Sync the vgic state before syncing the timer state because
 		 * the timer code needs to know if the virtual timer
diff --git a/arch/arm64/kvm/hyp/nvhe/spe-sr.c b/arch/arm64/kvm/hyp/nvhe/spe-sr.c
index 4f6579daddb5..b74131486a75 100644
--- a/arch/arm64/kvm/hyp/nvhe/spe-sr.c
+++ b/arch/arm64/kvm/hyp/nvhe/spe-sr.c
@@ -47,6 +47,8 @@ void __spe_save_host_state_nvhe(struct kvm_vcpu *vcpu,
 void __spe_save_guest_state_nvhe(struct kvm_vcpu *vcpu,
 				 struct kvm_cpu_context *guest_ctxt)
 {
+	u64 pmbsr;
+
 	if (read_sysreg_s(SYS_PMBLIMITR_EL1) & BIT(SYS_PMBLIMITR_EL1_E_SHIFT)) {
 		psb_csync();
 		dsb(nsh);
@@ -55,7 +57,22 @@ void __spe_save_guest_state_nvhe(struct kvm_vcpu *vcpu,
 	}
 
 	ctxt_sys_reg(guest_ctxt, PMBPTR_EL1) = read_sysreg_s(SYS_PMBPTR_EL1);
-	ctxt_sys_reg(guest_ctxt, PMBSR_EL1) = read_sysreg_s(SYS_PMBSR_EL1);
+	/*
+	 * We need to differentiate between the hardware asserting the interrupt
+	 * and the guest setting the service bit as a result of a direct
+	 * register write, hence the extra field in the spe struct.
+	 *
+	 * The PMBSR_EL1 register is not directly accessed by the guest; KVM
+	 * needs to update the in-memory copy when the hardware asserts the
+	 * interrupt as that's the only case when KVM will show the guest a
+	 * value which is different from what the guest last wrote to the
+	 * register.
+	 */
+	pmbsr = read_sysreg_s(SYS_PMBSR_EL1);
+	if (pmbsr & BIT(SYS_PMBSR_EL1_S_SHIFT)) {
+		ctxt_sys_reg(guest_ctxt, PMBSR_EL1) = pmbsr;
+		vcpu->arch.spe.hwirq_level = true;
+	}
 	/* PMBLIMITR_EL1 is updated only on a trapped write. */
 	ctxt_sys_reg(guest_ctxt, PMSCR_EL1) = read_sysreg_s(SYS_PMSCR_EL1);
 
@@ -80,8 +97,13 @@ void __spe_restore_guest_state_nvhe(struct kvm_vcpu *vcpu,
 	__spe_restore_common_state(guest_ctxt);
 
 	write_sysreg_s(ctxt_sys_reg(guest_ctxt, PMBPTR_EL1), SYS_PMBPTR_EL1);
-	write_sysreg_s(ctxt_sys_reg(guest_ctxt, PMBSR_EL1), SYS_PMBSR_EL1);
-	write_sysreg_s(ctxt_sys_reg(guest_ctxt, PMBLIMITR_EL1), SYS_PMBLIMITR_EL1);
+	/* The buffer management interrupt is virtual. */
+	write_sysreg_s(0, SYS_PMBSR_EL1);
+	/* The buffer is disabled when the interrupt is asserted. */
+	if (vcpu->arch.spe.irq_level)
+		write_sysreg_s(0, SYS_PMBLIMITR_EL1);
+	else
+		write_sysreg_s(ctxt_sys_reg(guest_ctxt, PMBLIMITR_EL1), SYS_PMBLIMITR_EL1);
 	write_sysreg_s(ctxt_sys_reg(guest_ctxt, PMSCR_EL1), SYS_PMSCR_EL1);
 	write_sysreg_el2(ctxt_sys_reg(guest_ctxt, PMSCR_EL2), SYS_PMSCR);
 }
diff --git a/arch/arm64/kvm/hyp/vhe/spe-sr.c b/arch/arm64/kvm/hyp/vhe/spe-sr.c
index f557ac64a1cc..ea4b3b69bb32 100644
--- a/arch/arm64/kvm/hyp/vhe/spe-sr.c
+++ b/arch/arm64/kvm/hyp/vhe/spe-sr.c
@@ -48,7 +48,7 @@ NOKPROBE_SYMBOL(__spe_save_host_state_vhe);
 void __spe_save_guest_state_vhe(struct kvm_vcpu *vcpu,
 				struct kvm_cpu_context *guest_ctxt)
 {
-	u64 pmblimitr;
+	u64 pmblimitr, pmbsr;
 
 	/*
 	 * We're at EL2 and the buffer owning regime is EL1, which means that
@@ -66,7 +66,11 @@ void __spe_save_guest_state_vhe(struct kvm_vcpu *vcpu,
 	}
 
 	ctxt_sys_reg(guest_ctxt, PMBPTR_EL1) = read_sysreg_s(SYS_PMBPTR_EL1);
-	ctxt_sys_reg(guest_ctxt, PMBSR_EL1) = read_sysreg_s(SYS_PMBSR_EL1);
+	pmbsr = read_sysreg_s(SYS_PMBSR_EL1);
+	if (pmbsr & BIT(SYS_PMBSR_EL1_S_SHIFT)) {
+		ctxt_sys_reg(guest_ctxt, PMBSR_EL1) = pmbsr;
+		vcpu->arch.spe.hwirq_level = true;
+	}
 	/* PMBLIMITR_EL1 is updated only on a trapped write. */
 	ctxt_sys_reg(guest_ctxt, PMSCR_EL1) = read_sysreg_el1(SYS_PMSCR);
 
@@ -120,8 +124,13 @@ void __spe_restore_guest_state_vhe(struct kvm_vcpu *vcpu,
 	 */
 
 	write_sysreg_s(ctxt_sys_reg(guest_ctxt, PMBPTR_EL1), SYS_PMBPTR_EL1);
-	write_sysreg_s(ctxt_sys_reg(guest_ctxt, PMBSR_EL1), SYS_PMBSR_EL1);
-	write_sysreg_s(ctxt_sys_reg(guest_ctxt, PMBLIMITR_EL1), SYS_PMBLIMITR_EL1);
+	/* The buffer management interrupt is virtual. */
+	write_sysreg_s(0, SYS_PMBSR_EL1);
+	/* The buffer is disabled when the interrupt is asserted. */
+	if (vcpu->arch.spe.irq_level)
+		write_sysreg_s(0, SYS_PMBLIMITR_EL1);
+	else
+		write_sysreg_s(ctxt_sys_reg(guest_ctxt, PMBLIMITR_EL1), SYS_PMBLIMITR_EL1);
 	write_sysreg_el1(ctxt_sys_reg(guest_ctxt, PMSCR_EL1), SYS_PMSCR);
 	/* PMSCR_EL2 has been cleared when saving the host state. */
 }
diff --git a/arch/arm64/kvm/spe.c b/arch/arm64/kvm/spe.c
index 054bb16bbd79..5b69501dc3da 100644
--- a/arch/arm64/kvm/spe.c
+++ b/arch/arm64/kvm/spe.c
@@ -71,9 +71,81 @@ int kvm_spe_vcpu_first_run_init(struct kvm_vcpu *vcpu)
 	return 0;
 }
 
+static void kvm_spe_update_irq(struct kvm_vcpu *vcpu, bool level)
+{
+	struct kvm_vcpu_spe *spe = &vcpu->arch.spe;
+	int ret;
+
+	if (spe->irq_level == level)
+		return;
+
+	spe->irq_level = level;
+	ret = kvm_vgic_inject_irq(vcpu->kvm, vcpu->vcpu_id, spe->irq_num,
+				  level, spe);
+	WARN_ON(ret);
+}
+
+static __printf(2, 3)
+void print_buf_warn(struct kvm_vcpu *vcpu, char *fmt, ...)
+{
+	va_list va;
+
+	va_start(va, fmt);
+	kvm_warn_ratelimited("%pV [PMBSR=0x%016llx, PMBPTR=0x%016llx, PMBLIMITR=0x%016llx]\n",
+			    &(struct va_format){ fmt, &va },
+			    __vcpu_sys_reg(vcpu, PMBSR_EL1),
+			    __vcpu_sys_reg(vcpu, PMBPTR_EL1),
+			    __vcpu_sys_reg(vcpu, PMBLIMITR_EL1));
+	va_end(va);
+}
+
+static void kvm_spe_inject_ext_abt(struct kvm_vcpu *vcpu)
+{
+	__vcpu_sys_reg(vcpu, PMBSR_EL1) = BIT(SYS_PMBSR_EL1_EA_SHIFT) |
+					  BIT(SYS_PMBSR_EL1_S_SHIFT);
+	__vcpu_sys_reg(vcpu, PMBSR_EL1) |= SYS_PMBSR_EL1_EC_FAULT_S1;
+	/* Synchronous External Abort, not on translation table walk. */
+	__vcpu_sys_reg(vcpu, PMBSR_EL1) |= 0x10 << SYS_PMBSR_EL1_FAULT_FSC_SHIFT;
+}
+
+void kvm_spe_sync_hwstate(struct kvm_vcpu *vcpu)
+{
+	struct kvm_vcpu_spe *spe = &vcpu->arch.spe;
+	u64 pmbsr, pmbsr_ec;
+
+	if (!spe->hwirq_level)
+		return;
+	spe->hwirq_level = false;
+
+	pmbsr = __vcpu_sys_reg(vcpu, PMBSR_EL1);
+	pmbsr_ec = pmbsr & (SYS_PMBSR_EL1_EC_MASK << SYS_PMBSR_EL1_EC_SHIFT);
+
+	switch (pmbsr_ec) {
+	case SYS_PMBSR_EL1_EC_FAULT_S2:
+		print_buf_warn(vcpu, "SPE stage 2 data abort");
+		kvm_spe_inject_ext_abt(vcpu);
+		break;
+	case SYS_PMBSR_EL1_EC_FAULT_S1:
+	case SYS_PMBSR_EL1_EC_BUF:
+		/*
+		 * These two exception syndromes are entirely up to the guest to
+		 * figure out, leave PMBSR_EL1 unchanged.
+		 */
+		break;
+	default:
+		print_buf_warn(vcpu, "SPE unknown buffer syndrome");
+		kvm_spe_inject_ext_abt(vcpu);
+	}
+
+	kvm_spe_update_irq(vcpu, true);
+}
+
 void kvm_spe_write_sysreg(struct kvm_vcpu *vcpu, int reg, u64 val)
 {
 	__vcpu_sys_reg(vcpu, reg) = val;
+
+	if (reg == PMBSR_EL1)
+		kvm_spe_update_irq(vcpu, val & BIT(SYS_PMBSR_EL1_S_SHIFT));
 }
 
 u64 kvm_spe_read_sysreg(struct kvm_vcpu *vcpu, int reg)
-- 
2.33.0


* [RFC PATCH v4 35/39] KVM: arm64: Add an userspace API to stop a VCPU profiling
  2021-08-25 16:17 [RFC PATCH v4 00/39] KVM: arm64: Add Statistical Profiling Extension (SPE) support Alexandru Elisei
                   ` (33 preceding siblings ...)
  2021-08-25 16:18 ` [RFC PATCH v4 34/39] KVM: arm64: Emulate SPE buffer management interrupt Alexandru Elisei
@ 2021-08-25 16:18 ` Alexandru Elisei
  2021-08-25 16:18 ` [RFC PATCH v4 36/39] KVM: arm64: Implement " Alexandru Elisei
                   ` (4 subsequent siblings)
  39 siblings, 0 replies; 47+ messages in thread
From: Alexandru Elisei @ 2021-08-25 16:18 UTC (permalink / raw)
  To: maz, james.morse, suzuki.poulose, linux-arm-kernel, kvmarm, will,
	linux-kernel

Add the KVM_ARM_VCPU_SPE_CTRL(KVM_ARM_VCPU_SPE_STOP) VCPU attribute to
allow userspace to request that KVM disables profiling for that VCPU. The
ioctl does nothing yet.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
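Note: a minimal example, not part of the patch, of how a VMM might drive the new
attribute. It assumes the standard KVM_SET_DEVICE_ATTR ioctl on the VCPU file
descriptor; vcpu_fd, the helper name and the error handling are placeholders:

#include <linux/kvm.h>
#include <sys/ioctl.h>

static int vcpu_spe_stop_ctrl(int vcpu_fd, int flag)
{
	struct kvm_device_attr attr = {
		.group	= KVM_ARM_VCPU_SPE_CTRL,
		.attr	= KVM_ARM_VCPU_SPE_STOP,
		.addr	= (__u64)(unsigned long)&flag,
	};

	return ioctl(vcpu_fd, KVM_SET_DEVICE_ATTR, &attr);
}

/*
 * Before write-protecting memory for dirty tracking:
 *	vcpu_spe_stop_ctrl(vcpu_fd, KVM_ARM_VCPU_SPE_STOP_EXIT);
 * After migration completes or is aborted:
 *	vcpu_spe_stop_ctrl(vcpu_fd, KVM_ARM_VCPU_SPE_RESUME);
 */
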
 Documentation/virt/kvm/devices/vcpu.rst | 36 +++++++++++++++++++++++++
 arch/arm64/include/uapi/asm/kvm.h       |  4 +++
 arch/arm64/kvm/spe.c                    | 23 +++++++++++++---
 3 files changed, 60 insertions(+), 3 deletions(-)

diff --git a/Documentation/virt/kvm/devices/vcpu.rst b/Documentation/virt/kvm/devices/vcpu.rst
index c275c320e500..b4e38261b00f 100644
--- a/Documentation/virt/kvm/devices/vcpu.rst
+++ b/Documentation/virt/kvm/devices/vcpu.rst
@@ -201,3 +201,39 @@ Returns:
 Request initialization of the Statistical Profiling Extension for this VCPU.
 Must be done after initializaing the in-kernel irqchip and after setting the
 Profiling Buffer management interrupt number for the VCPU.
+
+4.3 ATTRIBUTE: KVM_ARM_VCPU_SPE_STOP
+------------------------------------
+
+:Parameters: in kvm_device_attr.addr the address to the flag that specifies
+             what KVM should do when the guest enables profiling
+
+The flag must be exactly one of:
+
+- KVM_ARM_VCPU_SPE_STOP_TRAP: trap all register accesses and ignore the guest
+  trying to enable profiling.
+- KVM_ARM_VCPU_SPE_STOP_EXIT: exit to userspace when the guest tries to enable
+  profiling.
+- KVM_ARM_VCPU_SPE_RESUME: resume profiling, if it was previously stopped using
+  this attribute.
+
+If KVM detects that a vcpu is trying to run with SPE enabled when
+KVM_ARM_VCPU_SPE_STOP_EXIT is set, KVM_RUN will return without entering the guest
+with kvm_run.exit_reason equal to KVM_EXIT_FAIL_ENTRY, and the fail_entry struct
+will be zeroed.
+
+Returns:
+
+	 =======  ============================================
+	 -EAGAIN  SPE not initialized
+	 -EFAULT  Error accessing the flag
+	 -EINVAL  Invalid flag
+	 -ENXIO   SPE not supported or not properly configured
+	 =======  ============================================
+
+Request that KVM disables SPE for the given vcpu. This can be useful for
+migration, which relies on tracking dirty pages by write-protecting memory, but
+breaks SPE in the guest as KVM does not handle buffer stage 2 faults.
+
+The attribute must be set after SPE has been initialized successfully. It can be
+set multiple times, with the latest value overwriting the previous one.
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index d4c0b53a5fb2..75a5113f610e 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -371,6 +371,10 @@ struct kvm_arm_copy_mte_tags {
 #define KVM_ARM_VCPU_SPE_CTRL		3
 #define   KVM_ARM_VCPU_SPE_IRQ		0
 #define   KVM_ARM_VCPU_SPE_INIT		1
+#define   KVM_ARM_VCPU_SPE_STOP		2
+#define     KVM_ARM_VCPU_SPE_STOP_TRAP		(1 << 0)
+#define     KVM_ARM_VCPU_SPE_STOP_EXIT		(1 << 1)
+#define     KVM_ARM_VCPU_SPE_RESUME		(1 << 2)
 
 /* KVM_IRQ_LINE irq field index values */
 #define KVM_ARM_IRQ_VCPU2_SHIFT		28
diff --git a/arch/arm64/kvm/spe.c b/arch/arm64/kvm/spe.c
index 5b69501dc3da..2630e777fe1d 100644
--- a/arch/arm64/kvm/spe.c
+++ b/arch/arm64/kvm/spe.c
@@ -220,14 +220,14 @@ int kvm_spe_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
 	if (!kvm_vcpu_supports_spe(vcpu))
 		return -ENXIO;
 
-	if (vcpu->arch.spe.initialized)
-		return -EBUSY;
-
 	switch (attr->attr) {
 	case KVM_ARM_VCPU_SPE_IRQ: {
 		int __user *uaddr = (int __user *)(long)attr->addr;
 		int irq;
 
+		if (vcpu->arch.spe.initialized)
+			return -EBUSY;
+
 		if (vcpu->arch.spe.irq_num)
 			return -EBUSY;
 
@@ -248,11 +248,27 @@ int kvm_spe_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
 		if (!vgic_initialized(vcpu->kvm))
 			return -ENXIO;
 
+		if (vcpu->arch.spe.initialized)
+			return -EBUSY;
+
 		if (kvm_vgic_set_owner(vcpu, vcpu->arch.spe.irq_num, &vcpu->arch.spe))
 			return -ENXIO;
 
 		vcpu->arch.spe.initialized = true;
 		return 0;
+	case KVM_ARM_VCPU_SPE_STOP: {
+		int __user *uaddr = (int __user *)(long)attr->addr;
+		int flags;
+
+		if (!vcpu->arch.spe.initialized)
+			return -EAGAIN;
+
+		if (get_user(flags, uaddr))
+			return -EFAULT;
+
+		if (!flags)
+			return -EINVAL;
+	}
 	}
 
 	return -ENXIO;
@@ -290,6 +306,7 @@ int kvm_spe_has_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
 	switch(attr->attr) {
 	case KVM_ARM_VCPU_SPE_IRQ:
 	case KVM_ARM_VCPU_SPE_INIT:
+	case KVM_ARM_VCPU_SPE_STOP:
 		return 0;
 	}
 
-- 
2.33.0


* [RFC PATCH v4 36/39] KVM: arm64: Implement userspace API to stop a VCPU profiling
  2021-08-25 16:17 [RFC PATCH v4 00/39] KVM: arm64: Add Statistical Profiling Extension (SPE) support Alexandru Elisei
                   ` (34 preceding siblings ...)
  2021-08-25 16:18 ` [RFC PATCH v4 35/39] KVM: arm64: Add an userspace API to stop a VCPU profiling Alexandru Elisei
@ 2021-08-25 16:18 ` Alexandru Elisei
  2021-08-25 16:18 ` [RFC PATCH v4 37/39] KVM: arm64: Add PMSIDR_EL1 to the SPE register context Alexandru Elisei
                   ` (3 subsequent siblings)
  39 siblings, 0 replies; 47+ messages in thread
From: Alexandru Elisei @ 2021-08-25 16:18 UTC (permalink / raw)
  To: maz, james.morse, suzuki.poulose, linux-arm-kernel, kvmarm, will,
	linux-kernel

When userspace requests, via the KVM_ARM_VCPU_SPE_STOP attribute, that a VCPU
is no longer allowed to profile, keep all the register state in memory, trap
accesses to all the SPE registers, not just the buffer registers, and don't
copy any of this shadow state to the hardware.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
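Note: a rough sketch, not part of the patch, of how a VMM's run loop might
consume the new exit; vcpu_fd, run (the mmap'ed struct kvm_run) and the helper
name are placeholders:

#include <errno.h>
#include <linux/kvm.h>
#include <sys/ioctl.h>

static int vcpu_run_once(int vcpu_fd, struct kvm_run *run)
{
	/* KVM_RUN returns with errno == EAGAIN for this exit. */
	if (ioctl(vcpu_fd, KVM_RUN, 0) < 0 && errno != EINTR && errno != EAGAIN)
		return -errno;

	if (run->exit_reason == KVM_EXIT_FAIL_ENTRY &&
	    run->fail_entry.hardware_entry_failure_reason == KVM_EXIT_FAIL_ENTRY_SPE) {
		/*
		 * The guest enabled profiling while KVM_ARM_VCPU_SPE_STOP_EXIT
		 * was set; the VMM can abort the migration and resume profiling
		 * with KVM_ARM_VCPU_SPE_RESUME, or pause the VCPU until
		 * migration completes.
		 */
	}

	return 0;
}
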
 arch/arm64/include/asm/kvm_hyp.h   |  2 +
 arch/arm64/include/asm/kvm_spe.h   | 14 +++++++
 arch/arm64/include/uapi/asm/kvm.h  |  3 ++
 arch/arm64/kvm/arm.c               |  9 ++++
 arch/arm64/kvm/debug.c             | 13 ++++--
 arch/arm64/kvm/hyp/nvhe/debug-sr.c |  4 +-
 arch/arm64/kvm/hyp/nvhe/spe-sr.c   | 24 +++++++++++
 arch/arm64/kvm/hyp/vhe/spe-sr.c    | 56 +++++++++++++++++++++++++
 arch/arm64/kvm/spe.c               | 67 ++++++++++++++++++++++++++++++
 arch/arm64/kvm/sys_regs.c          |  2 +-
 10 files changed, 188 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
index 03bc51049996..ce365427b483 100644
--- a/arch/arm64/include/asm/kvm_hyp.h
+++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -86,8 +86,10 @@ void __debug_switch_to_host(struct kvm_vcpu *vcpu);
 #ifdef __KVM_NVHE_HYPERVISOR__
 void __debug_save_host_buffers_nvhe(struct kvm_vcpu *vcpu,
 				    struct kvm_cpu_context *host_ctxt);
+void __debug_save_spe(u64 *pmscr_el1);
 void __debug_restore_host_buffers_nvhe(struct kvm_vcpu *vcpu,
 				       struct kvm_cpu_context *host_ctxt);
+void __debug_restore_spe(u64 pmscr_el1);
 #ifdef CONFIG_KVM_ARM_SPE
 void __spe_save_host_state_nvhe(struct kvm_vcpu *vcpu,
 				struct kvm_cpu_context *host_ctxt);
diff --git a/arch/arm64/include/asm/kvm_spe.h b/arch/arm64/include/asm/kvm_spe.h
index d7d7b9e243de..f51561e3b43f 100644
--- a/arch/arm64/include/asm/kvm_spe.h
+++ b/arch/arm64/include/asm/kvm_spe.h
@@ -16,13 +16,23 @@ static __always_inline bool kvm_supports_spe(void)
 	return static_branch_likely(&kvm_spe_available);
 }
 
+/* Guest profiling disabled by the user. */
+#define KVM_VCPU_SPE_STOP_USER		(1 << 0)
+/* Stop profiling and exit to userspace when guest starts profiling. */
+#define KVM_VCPU_SPE_STOP_USER_EXIT	(1 << 1)
+
 struct kvm_vcpu_spe {
 	bool initialized;	/* SPE initialized for the VCPU */
 	int irq_num;		/* Buffer management interrut number */
 	bool irq_level;		/* 'true' if the interrupt is asserted at the VGIC */
 	bool hwirq_level;	/* 'true' if the SPE hardware is asserting the interrupt */
+	u64 flags;
 };
 
+#define kvm_spe_profiling_stopped(vcpu)					\
+	(((vcpu)->arch.spe.flags & KVM_VCPU_SPE_STOP_USER) ||		\
+	 ((vcpu)->arch.spe.flags & KVM_VCPU_SPE_STOP_USER_EXIT))
+
 struct kvm_spe {
 	bool perfmon_capable;	/* Is the VM perfmon_capable()? */
 };
@@ -31,6 +41,7 @@ void kvm_spe_init_supported_cpus(void);
 void kvm_spe_vm_init(struct kvm *kvm);
 int kvm_spe_vcpu_first_run_init(struct kvm_vcpu *vcpu);
 void kvm_spe_sync_hwstate(struct kvm_vcpu *vcpu);
+bool kvm_spe_exit_to_user(struct kvm_vcpu *vcpu);
 
 void kvm_spe_write_sysreg(struct kvm_vcpu *vcpu, int reg, u64 val);
 u64 kvm_spe_read_sysreg(struct kvm_vcpu *vcpu, int reg);
@@ -48,6 +59,8 @@ int kvm_spe_has_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr);
 struct kvm_vcpu_spe {
 };
 
+#define kvm_spe_profiling_stopped(vcpu)		(false)
+
 struct kvm_spe {
 };
 
@@ -55,6 +68,7 @@ static inline void kvm_spe_init_supported_cpus(void) {}
 static inline void kvm_spe_vm_init(struct kvm *kvm) {}
 static inline int kvm_spe_vcpu_first_run_init(struct kvm_vcpu *vcpu) { return -ENOEXEC; }
 static inline void kvm_spe_sync_hwstate(struct kvm_vcpu *vcpu) {}
+static inline bool kvm_spe_exit_to_user(struct kvm_vcpu *vcpu) { return false; }
 
 static inline void kvm_spe_write_sysreg(struct kvm_vcpu *vcpu, int reg, u64 val) {}
 static inline u64 kvm_spe_read_sysreg(struct kvm_vcpu *vcpu, int reg) { return 0; }
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index 75a5113f610e..63599ee39a7b 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -376,6 +376,9 @@ struct kvm_arm_copy_mte_tags {
 #define     KVM_ARM_VCPU_SPE_STOP_EXIT		(1 << 1)
 #define     KVM_ARM_VCPU_SPE_RESUME		(1 << 2)
 
+/* run->fail_entry.hardware_entry_failure_reason codes. */
+#define KVM_EXIT_FAIL_ENTRY_SPE		(1 << 0)
+
 /* KVM_IRQ_LINE irq field index values */
 #define KVM_ARM_IRQ_VCPU2_SHIFT		28
 #define KVM_ARM_IRQ_VCPU2_MASK		0xf
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index ec449bc5f811..b7aae25bb9da 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -873,6 +873,15 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 			continue;
 		}
 
+		if (unlikely(kvm_spe_exit_to_user(vcpu))) {
+			run->exit_reason = KVM_EXIT_FAIL_ENTRY;
+			run->fail_entry.hardware_entry_failure_reason
+				= KVM_EXIT_FAIL_ENTRY_SPE;
+			ret = -EAGAIN;
+			preempt_enable();
+			continue;
+		}
+
 		kvm_pmu_flush_hwstate(vcpu);
 
 		local_irq_disable();
diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c
index 6e5fc1887215..6a4277a23bbb 100644
--- a/arch/arm64/kvm/debug.c
+++ b/arch/arm64/kvm/debug.c
@@ -96,11 +96,18 @@ static void kvm_arm_setup_mdcr_el2(struct kvm_vcpu *vcpu)
 	if (kvm_supports_spe() && kvm_vcpu_has_spe(vcpu)) {
 		/*
 		 * Use EL1&0 for the profiling buffer translation regime and
-		 * trap accesses to the buffer control registers; leave
-		 * MDCR_EL2.TPMS unset and do not trap accesses to the profiling
-		 * control registers.
+		 * trap accesses to the buffer control registers; if profiling
+		 * is stopped, also set MDCR_EL2.TPMS to trap accesses to the
+		 * rest of the registers, otherwise leave it clear.
+		 *
+		 * Leaving MDCR_EL2.E2PB unset, like we do when the VCPU does not
+		 * have SPE, means that the PMBIDR_EL1.P (which KVM does not
+		 * trap) will be set and the guest will detect SPE as being
+		 * unavailable.
 		 */
 		vcpu->arch.mdcr_el2 |= MDCR_EL2_E2PB_EL1_TRAP << MDCR_EL2_E2PB_SHIFT;
+		if (kvm_spe_profiling_stopped(vcpu))
+			vcpu->arch.mdcr_el2 |= MDCR_EL2_TPMS;
 	} else {
 		/*
 		 * Trap accesses to the profiling control registers; leave
diff --git a/arch/arm64/kvm/hyp/nvhe/debug-sr.c b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
index 1622615954b2..944972de0944 100644
--- a/arch/arm64/kvm/hyp/nvhe/debug-sr.c
+++ b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
@@ -14,7 +14,7 @@
 #include <asm/kvm_hyp.h>
 #include <asm/kvm_mmu.h>
 
-static void __debug_save_spe(u64 *pmscr_el1)
+void __debug_save_spe(u64 *pmscr_el1)
 {
 	u64 reg;
 
@@ -40,7 +40,7 @@ static void __debug_save_spe(u64 *pmscr_el1)
 	dsb(nsh);
 }
 
-static void __debug_restore_spe(u64 pmscr_el1)
+void __debug_restore_spe(u64 pmscr_el1)
 {
 	if (!pmscr_el1)
 		return;
diff --git a/arch/arm64/kvm/hyp/nvhe/spe-sr.c b/arch/arm64/kvm/hyp/nvhe/spe-sr.c
index b74131486a75..8ed03aa4f965 100644
--- a/arch/arm64/kvm/hyp/nvhe/spe-sr.c
+++ b/arch/arm64/kvm/hyp/nvhe/spe-sr.c
@@ -23,6 +23,11 @@ void __spe_save_host_state_nvhe(struct kvm_vcpu *vcpu,
 {
 	u64 pmblimitr;
 
+	if (kvm_spe_profiling_stopped(vcpu)) {
+		__debug_save_spe(__ctxt_sys_reg(host_ctxt, PMSCR_EL1));
+		return;
+	}
+
 	pmblimitr = read_sysreg_s(SYS_PMBLIMITR_EL1);
 	if (pmblimitr & BIT(SYS_PMBLIMITR_EL1_E_SHIFT)) {
 		psb_csync();
@@ -49,6 +54,13 @@ void __spe_save_guest_state_nvhe(struct kvm_vcpu *vcpu,
 {
 	u64 pmbsr;
 
+	/*
+	 * Profiling is stopped and all register accesses are trapped, nothing
+	 * to save here.
+	 */
+	if (kvm_spe_profiling_stopped(vcpu))
+		return;
+
 	if (read_sysreg_s(SYS_PMBLIMITR_EL1) & BIT(SYS_PMBLIMITR_EL1_E_SHIFT)) {
 		psb_csync();
 		dsb(nsh);
@@ -82,6 +94,11 @@ void __spe_save_guest_state_nvhe(struct kvm_vcpu *vcpu,
 void __spe_restore_host_state_nvhe(struct kvm_vcpu *vcpu,
 				   struct kvm_cpu_context *host_ctxt)
 {
+	if (kvm_spe_profiling_stopped(vcpu)) {
+		__debug_restore_spe(ctxt_sys_reg(host_ctxt, PMSCR_EL1));
+		return;
+	}
+
 	__spe_restore_common_state(host_ctxt);
 
 	write_sysreg_s(ctxt_sys_reg(host_ctxt, PMBPTR_EL1), SYS_PMBPTR_EL1);
@@ -94,6 +111,13 @@ void __spe_restore_host_state_nvhe(struct kvm_vcpu *vcpu,
 void __spe_restore_guest_state_nvhe(struct kvm_vcpu *vcpu,
 				    struct kvm_cpu_context *guest_ctxt)
 {
+	/*
+	 * Profiling is stopped and all register accesses are trapped, nothing
+	 * to restore here.
+	 */
+	if (kvm_spe_profiling_stopped(vcpu))
+		return;
+
 	__spe_restore_common_state(guest_ctxt);
 
 	write_sysreg_s(ctxt_sys_reg(guest_ctxt, PMBPTR_EL1), SYS_PMBPTR_EL1);
diff --git a/arch/arm64/kvm/hyp/vhe/spe-sr.c b/arch/arm64/kvm/hyp/vhe/spe-sr.c
index ea4b3b69bb32..024a4c0618cc 100644
--- a/arch/arm64/kvm/hyp/vhe/spe-sr.c
+++ b/arch/arm64/kvm/hyp/vhe/spe-sr.c
@@ -10,6 +10,34 @@
 
 #include <hyp/spe-sr.h>
 
+static void __spe_save_host_buffer(u64 *pmscr_el2)
+{
+	u64 pmblimitr;
+
+	/* Disable guest profiling. */
+	write_sysreg_el1(0, SYS_PMSCR);
+
+	pmblimitr = read_sysreg_s(SYS_PMBLIMITR_EL1);
+	if (!(pmblimitr & BIT(SYS_PMBLIMITR_EL1_E_SHIFT))) {
+		*pmscr_el2 = 0;
+		return;
+	}
+
+	*pmscr_el2 = read_sysreg_el2(SYS_PMSCR);
+
+	/* Disable profiling at EL2 so we can drain the buffer. */
+	write_sysreg_el2(0, SYS_PMSCR);
+	isb();
+
+	/*
+	 * We're going to change the buffer owning exception level when we
+	 * activate traps, drain the buffer now.
+	 */
+	psb_csync();
+	dsb(nsh);
+}
+NOKPROBE_SYMBOL(__spe_save_host_buffer);
+
 /*
  * Disable host profiling, drain the buffer and save the host SPE context.
  * Extra care must be taken because profiling might be in progress.
@@ -19,6 +47,11 @@ void __spe_save_host_state_vhe(struct kvm_vcpu *vcpu,
 {
 	u64 pmblimitr, pmscr_el2;
 
+	if (kvm_spe_profiling_stopped(vcpu)) {
+		__spe_save_host_buffer(__ctxt_sys_reg(host_ctxt, PMSCR_EL2));
+		return;
+	}
+
 	/* Disable profiling while the SPE context is being switched. */
 	pmscr_el2 = read_sysreg_el2(SYS_PMSCR);
 	write_sysreg_el2(__vcpu_sys_reg(vcpu, PMSCR_EL2), SYS_PMSCR);
@@ -50,6 +83,9 @@ void __spe_save_guest_state_vhe(struct kvm_vcpu *vcpu,
 {
 	u64 pmblimitr, pmbsr;
 
+	if (kvm_spe_profiling_stopped(vcpu))
+		return;
+
 	/*
 	 * We're at EL2 and the buffer owning regime is EL1, which means that
 	 * profiling is disabled. After we disable traps and restore the host's
@@ -78,6 +114,18 @@ void __spe_save_guest_state_vhe(struct kvm_vcpu *vcpu,
 }
 NOKPROBE_SYMBOL(__spe_save_guest_state_vhe);
 
+static void __spe_restore_host_buffer(u64 pmscr_el2)
+{
+	if (!pmscr_el2)
+		return;
+
+	/* Synchronize MDCR_EL2 write. */
+	isb();
+
+	write_sysreg_el2(pmscr_el2, SYS_PMSCR);
+}
+NOKPROBE_SYMBOL(__spe_restore_host_buffer);
+
 /*
  * Restore the host SPE context. Special care must be taken because we're
  * potentially resuming a profiling session which was stopped when we saved the
@@ -86,6 +134,11 @@ NOKPROBE_SYMBOL(__spe_save_guest_state_vhe);
 void __spe_restore_host_state_vhe(struct kvm_vcpu *vcpu,
 				  struct kvm_cpu_context *host_ctxt)
 {
+	if (kvm_spe_profiling_stopped(vcpu)) {
+		__spe_restore_host_buffer(ctxt_sys_reg(host_ctxt, PMSCR_EL2));
+		return;
+	}
+
 	__spe_restore_common_state(host_ctxt);
 
 	write_sysreg_s(ctxt_sys_reg(host_ctxt, PMBPTR_EL1), SYS_PMBPTR_EL1);
@@ -115,6 +168,9 @@ NOKPROBE_SYMBOL(__spe_restore_host_state_vhe);
 void __spe_restore_guest_state_vhe(struct kvm_vcpu *vcpu,
 				   struct kvm_cpu_context *guest_ctxt)
 {
+	if (kvm_spe_profiling_stopped(vcpu))
+		return;
+
 	__spe_restore_common_state(guest_ctxt);
 
 	/*
diff --git a/arch/arm64/kvm/spe.c b/arch/arm64/kvm/spe.c
index 2630e777fe1d..69ca731ba9d3 100644
--- a/arch/arm64/kvm/spe.c
+++ b/arch/arm64/kvm/spe.c
@@ -140,6 +140,28 @@ void kvm_spe_sync_hwstate(struct kvm_vcpu *vcpu)
 	kvm_spe_update_irq(vcpu, true);
 }
 
+static bool kvm_spe_buffer_enabled(struct kvm_vcpu *vcpu)
+{
+	return !vcpu->arch.spe.irq_level  &&
+		(__vcpu_sys_reg(vcpu, PMBLIMITR_EL1) & BIT(SYS_PMBLIMITR_EL1_E_SHIFT));
+}
+
+bool kvm_spe_exit_to_user(struct kvm_vcpu *vcpu)
+{
+	u64 pmscr_enabled_mask = BIT(SYS_PMSCR_EL1_E0SPE_SHIFT) |
+				 BIT(SYS_PMSCR_EL1_E1SPE_SHIFT);
+
+	if (!(vcpu->arch.spe.flags & KVM_VCPU_SPE_STOP_USER_EXIT))
+		return false;
+
+	/*
+	 * We don't trap the guest dropping to EL0, so exit even if profiling is
+	 * disabled at EL1, but enabled at EL0.
+	 */
+	return kvm_spe_buffer_enabled(vcpu) &&
+		(__vcpu_sys_reg(vcpu, PMSCR_EL1) & pmscr_enabled_mask);
+}
+
 void kvm_spe_write_sysreg(struct kvm_vcpu *vcpu, int reg, u64 val)
 {
 	__vcpu_sys_reg(vcpu, reg) = val;
@@ -215,6 +237,31 @@ static bool kvm_spe_irq_is_valid(struct kvm *kvm, int irq)
 	return true;
 }
 
+static int kvm_spe_stop_user(struct kvm_vcpu *vcpu, int flags)
+{
+	struct kvm_vcpu_spe *spe = &vcpu->arch.spe;
+
+	if (flags & KVM_ARM_VCPU_SPE_STOP_TRAP) {
+		if (flags & ~KVM_ARM_VCPU_SPE_STOP_TRAP)
+			return -EINVAL;
+		spe->flags = KVM_VCPU_SPE_STOP_USER;
+	}
+
+	if (flags & KVM_ARM_VCPU_SPE_STOP_EXIT) {
+		if (flags & ~KVM_ARM_VCPU_SPE_STOP_EXIT)
+			return -EINVAL;
+		spe->flags = KVM_VCPU_SPE_STOP_USER_EXIT;
+	}
+
+	if (flags & KVM_ARM_VCPU_SPE_RESUME) {
+		if (flags & ~KVM_ARM_VCPU_SPE_RESUME)
+			return -EINVAL;
+		spe->flags = 0;
+	}
+
+	return 0;
+}
+
 int kvm_spe_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
 {
 	if (!kvm_vcpu_supports_spe(vcpu))
@@ -268,6 +315,8 @@ int kvm_spe_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
 
 		if (!flags)
 			return -EINVAL;
+
+		return kvm_spe_stop_user(vcpu, flags);
 	}
 	}
 
@@ -293,6 +342,24 @@ int kvm_spe_get_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
 
 		return 0;
 	}
+	case KVM_ARM_VCPU_SPE_STOP: {
+		int __user *uaddr = (int __user *)(long)attr->addr;
+		struct kvm_vcpu_spe *spe = &vcpu->arch.spe;
+		int flag = 0;
+
+		if (!vcpu->arch.spe.initialized)
+			return -EAGAIN;
+
+		if (spe->flags & KVM_VCPU_SPE_STOP_USER)
+			flag = KVM_ARM_VCPU_SPE_STOP_TRAP;
+		else if (spe->flags & KVM_VCPU_SPE_STOP_USER_EXIT)
+			flag = KVM_ARM_VCPU_SPE_STOP_EXIT;
+
+		if (put_user(flag, uaddr))
+			return -EFAULT;
+
+		return 0;
+	}
 	}
 
 	return -ENXIO;
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 064742cee425..cc711b081f31 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -608,7 +608,7 @@ static bool access_spe_reg(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
 {	int reg = r->reg;
 	u64 val = p->regval;
 
-	if (reg < PMBLIMITR_EL1) {
+	if (reg < PMBLIMITR_EL1 && !kvm_spe_profiling_stopped(vcpu)) {
 		print_sys_reg_msg(p, "Unsupported guest SPE register access at: %lx [%08lx]\n",
 				  *vcpu_pc(vcpu), *vcpu_cpsr(vcpu));
 	}
-- 
2.33.0


* [RFC PATCH v4 37/39] KVM: arm64: Add PMSIDR_EL1 to the SPE register context
  2021-08-25 16:17 [RFC PATCH v4 00/39] KVM: arm64: Add Statistical Profiling Extension (SPE) support Alexandru Elisei
                   ` (35 preceding siblings ...)
  2021-08-25 16:18 ` [RFC PATCH v4 36/39] KVM: arm64: Implement " Alexandru Elisei
@ 2021-08-25 16:18 ` Alexandru Elisei
  2021-08-25 16:18 ` [RFC PATCH v4 38/39] KVM: arm64: Make CONFIG_KVM_ARM_SPE depend on !CONFIG_NUMA_BALANCING Alexandru Elisei
                   ` (2 subsequent siblings)
  39 siblings, 0 replies; 47+ messages in thread
From: Alexandru Elisei @ 2021-08-25 16:18 UTC (permalink / raw)
  To: maz, james.morse, suzuki.poulose, linux-arm-kernel, kvmarm, will,
	linux-kernel

PMSIDR_EL1 is not part of the VCPU register context because the profiling
control registers were not trapped and the register is read-only. With the
introduction of the KVM_ARM_VCPU_SPE_STOP API, KVM will start trapping
accesses to the profiling control registers, so add PMSIDR_EL1 to the VCPU
register context to prevent undefined exceptions in the guest.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 arch/arm64/include/asm/kvm_host.h |  1 +
 arch/arm64/kvm/sys_regs.c         | 22 +++++++++++++++++++---
 2 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index cc46f1406196..f866c4556ff9 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -247,6 +247,7 @@ enum vcpu_sysreg {
 	PMSFCR_EL1,     /* Sampling Filter Control Register */
 	PMSEVFR_EL1,    /* Sampling Event Filter Register */
 	PMSLATFR_EL1,   /* Sampling Latency Filter Register */
+	PMSIDR_EL1,	/* Sampling Profiling ID Register */
 	PMBLIMITR_EL1,  /* Profiling Buffer Limit Address Register */
 	PMBPTR_EL1,     /* Profiling Buffer Write Pointer Register */
 	PMBSR_EL1,      /* Profiling Buffer Status/syndrome Register */
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index cc711b081f31..1a85a0cedbec 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -603,6 +603,18 @@ static unsigned int spe_visibility(const struct kvm_vcpu *vcpu,
 	return REG_HIDDEN;
 }
 
+static void reset_pmsidr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
+{
+	/*
+	 * When SPE is stopped by userspace, the guest reads the in-memory value
+	 * of the register. When SPE is resumed, accesses to the control
+	 * registers are not trapped and the guest reads the hardware
+	 * value. Reset PMSIDR_EL1 to the hardware value to avoid mismatches
+	 * between the two.
+	 */
+	vcpu_write_sys_reg(vcpu, read_sysreg_s(SYS_PMSIDR_EL1), PMSIDR_EL1);
+}
+
 static bool access_spe_reg(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
 			   const struct sys_reg_desc *r)
 {
 	int reg = r->reg;
@@ -613,10 +625,14 @@ static bool access_spe_reg(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
 				  *vcpu_pc(vcpu), *vcpu_cpsr(vcpu));
 	}
 
-	if (p->is_write)
+	if (p->is_write) {
+		if (reg == PMSIDR_EL1)
+			return write_to_read_only(vcpu, p, r);
+
 		kvm_spe_write_sysreg(vcpu, reg, val);
-	else
+	} else {
 		p->regval = kvm_spe_read_sysreg(vcpu, reg);
+	}
 
 	return true;
 }
@@ -1568,7 +1584,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 	{ SPE_SYS_REG(SYS_PMSFCR_EL1), .reg = PMSFCR_EL1 },
 	{ SPE_SYS_REG(SYS_PMSEVFR_EL1), .reg = PMSEVFR_EL1 },
 	{ SPE_SYS_REG(SYS_PMSLATFR_EL1), .reg = PMSLATFR_EL1 },
-	{ SPE_SYS_REG(SYS_PMSIDR_EL1), .reset = NULL },
+	{ SPE_SYS_REG(SYS_PMSIDR_EL1), .reset = reset_pmsidr, .reg = PMSIDR_EL1 },
 	{ SPE_SYS_REG(SYS_PMBLIMITR_EL1), .reg = PMBLIMITR_EL1 },
 	{ SPE_SYS_REG(SYS_PMBPTR_EL1), .reg = PMBPTR_EL1 },
 	{ SPE_SYS_REG(SYS_PMBSR_EL1), .reg = PMBSR_EL1 },
-- 
2.33.0

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [RFC PATCH v4 38/39] KVM: arm64: Make CONFIG_KVM_ARM_SPE depend on !CONFIG_NUMA_BALANCING
  2021-08-25 16:17 [RFC PATCH v4 00/39] KVM: arm64: Add Statistical Profiling Extension (SPE) support Alexandru Elisei
                   ` (36 preceding siblings ...)
  2021-08-25 16:18 ` [RFC PATCH v4 37/39] KVM: arm64: Add PMSIDR_EL1 to the SPE register context Alexandru Elisei
@ 2021-08-25 16:18 ` Alexandru Elisei
  2021-08-25 16:18 ` [RFC PATCH v4 39/39] KVM: arm64: Allow userspace to enable SPE for guests Alexandru Elisei
  2021-09-22 10:11 ` [RFC PATCH v4 00/39] KVM: arm64: Add Statistical Profiling Extension (SPE) support Suzuki K Poulose
  39 siblings, 0 replies; 47+ messages in thread
From: Alexandru Elisei @ 2021-08-25 16:18 UTC (permalink / raw)
  To: maz, james.morse, suzuki.poulose, linux-arm-kernel, kvmarm, will,
	linux-kernel

Automatic NUMA balancing is a performance strategy that Linux uses to
reduce the cost associated with memory accesses by having a task use
the memory closest to the NUMA node where the task is executing. This is
accomplished by triggering periodic page faults to examine the memory
location that a task uses and to decide whether page migration is necessary.

The periodic page faults that drive automatic NUMA balancing are triggered
by clearing permissions on certain pages from the task's address space.
Clearing the permissions invokes mmu_notifier_invalidate_range_start(),
which causes guest memory to be unmapped from stage 2. As a result,
SPE can start reporting stage 2 faults, which KVM has no way of handling.

Make CONFIG_KVM_ARM_SPE depend on !CONFIG_NUMA_BALANCING to keep SPE usable
for a guest.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 arch/arm64/kvm/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index c6ad5a05efb3..1ea34eb29fb4 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -48,7 +48,7 @@ source "virt/kvm/Kconfig"
 
 config KVM_ARM_SPE
 	bool "Virtual Statistical Profiling Extension (SPE) support"
-	depends on ARM_SPE_PMU=y
+	depends on ARM_SPE_PMU=y && !NUMA_BALANCING
 	default y
 	help
 	  Adds support for Statistical Profiling Extension (SPE) in virtual
-- 
2.33.0

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [RFC PATCH v4 39/39] KVM: arm64: Allow userspace to enable SPE for guests
  2021-08-25 16:17 [RFC PATCH v4 00/39] KVM: arm64: Add Statistical Profiling Extension (SPE) support Alexandru Elisei
                   ` (37 preceding siblings ...)
  2021-08-25 16:18 ` [RFC PATCH v4 38/39] KVM: arm64: Make CONFIG_KVM_ARM_SPE depend on !CONFIG_NUMA_BALANCING Alexandru Elisei
@ 2021-08-25 16:18 ` Alexandru Elisei
  2021-09-22 10:11 ` [RFC PATCH v4 00/39] KVM: arm64: Add Statistical Profiling Extension (SPE) support Suzuki K Poulose
  39 siblings, 0 replies; 47+ messages in thread
From: Alexandru Elisei @ 2021-08-25 16:18 UTC (permalink / raw)
  To: maz, james.morse, suzuki.poulose, linux-arm-kernel, kvmarm, will,
	linux-kernel

Everything is in place to emulate SPE for a guest; allow userspace to set
the VCPU feature.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 arch/arm64/include/asm/kvm_host.h | 2 +-
 arch/arm64/kvm/arm.c              | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index f866c4556ff9..040da3b0cf2b 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -39,7 +39,7 @@
 
 #define KVM_MAX_VCPUS VGIC_V3_MAX_CPUS
 
-#define KVM_VCPU_MAX_FEATURES 7
+#define KVM_VCPU_MAX_FEATURES 8
 
 #define KVM_REQ_SLEEP \
 	KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index b7aae25bb9da..8016b98a8ac3 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -309,7 +309,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 		break;
 	case KVM_CAP_ARM_SPE:
 		kvm_spe_init_supported_cpus();
-		r = 0;
+		r = kvm_supports_spe();
 		break;
 	default:
 		r = 0;
-- 
2.33.0

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH v4 00/39] KVM: arm64: Add Statistical Profiling Extension (SPE) support
  2021-08-25 16:17 [RFC PATCH v4 00/39] KVM: arm64: Add Statistical Profiling Extension (SPE) support Alexandru Elisei
                   ` (38 preceding siblings ...)
  2021-08-25 16:18 ` [RFC PATCH v4 39/39] KVM: arm64: Allow userspace to enable SPE for guests Alexandru Elisei
@ 2021-09-22 10:11 ` Suzuki K Poulose
  2021-09-23 15:12   ` Alexandru Elisei
  39 siblings, 1 reply; 47+ messages in thread
From: Suzuki K Poulose @ 2021-09-22 10:11 UTC (permalink / raw)
  To: Alexandru Elisei, maz, james.morse, linux-arm-kernel, kvmarm,
	will, linux-kernel

On 25/08/2021 17:17, Alexandru Elisei wrote:
> This is v4 of the SPE series posted at [1]. v2 can be found at [2], and the
> original series at [3].
> 
> Statistical Profiling Extension (SPE) is an optional feature added in
> ARMv8.2. It allows sampling at regular intervals of the operations executed
> by the PE and storing a record of each operation in a memory buffer. A high
> level overview of the extension is presented in an article on arm.com [4].
> 
> This is another complete rewrite of the series, and nothing is set in
> stone. If you think of a better way to do things, please suggest it.
> 
> 
> Features added
> ==============
> 
> The rewrite enabled me to add support for several features not
> present in the previous iteration:
> 
> - Support for heterogeneous systems, where only some of the CPUs support SPE.
>    This is accomplished via the KVM_ARM_VCPU_SUPPORTED_CPUS VCPU ioctl.
> 
> - Support for VM migration with the KVM_ARM_VCPU_SPE_CTRL(KVM_ARM_VCPU_SPE_STOP)
>    VCPU ioctl.
> 
> - The requirement for userspace to mlock() the guest memory has been removed,
>    and now userspace can make changes to memory contents after the memory is
>    mapped at stage 2.
> 
> - Better debugging of guest memory pinning by printing a warning when we
>    get an unexpected read or write fault. This helped me catch several bugs
>    during development, it has already proven very useful. Many thanks to
>    James who suggested when reviewing v3.
> 
> 
> Missing features
> ================
> 
> I've tried to keep the series as small as possible to make it easier to review,
> while implementing the core functionality needed for the SPE emulation. As such,
> I've chosen to not implement several features:
> 
> - Host profiling a guest which has the SPE feature bit set (see open
>    questions).
> 
> - No errata workarounds have been implemented yet, and there are quite a few of
>    them for Neoverse N1 and Neoverse V1.
> 
> - Disabling CONFIG_NUMA_BALANCING is a hack to get KVM SPE to work and I am
>    investigating other ways to get around automatic numa balancing, like
>    requiring userspace to disable it via set_mempolicy(). I am also going to
>    look at how VFIO gets around it. Suggestions welcome.
> 
> - There's plenty of room for optimization. Off the top of my head, using
>    block mappings at stage 2, batch pinning of pages (similar to what VFIO
>    does), optimize the way KVM keeps track of pinned pages (using a linked
>    list triples the memory usage), context-switch the SPE registers on
>    vcpu_load/vcpu_put on VHE if the host is not profiling, locking
>    optimizations, etc, etc.
> 
> - ...and others. I'm sure I'm missing at least a few things which are
>    important for someone.
> 
> 
> Known issues
> ============
> 
> This is an RFC, so keep in mind that almost definitely there will be scary
> bugs. For example, below is a list of known issues which don't affect the
> correctness of the emulation, and which I'm planning to fix in a future
> iteration:
> 
> - With CONFIG_PROVE_LOCKING=y, lockdep complains about lock contention when
>    the VCPU executes the dcache clean pending ops.
> 
> - With CONFIG_PROVE_LOCKING=y, KVM will hit a BUG at
>    kvm_lock_all_vcpus()->mutex_trylock(&vcpu->mutex) with more than 48
>    VCPUs.
> 
> This BUG statement can also be triggered with mainline. To reproduce it,
> compile kvmtool from this branch [5] and follow the instruction in the
> kvmtool commit message.
> 
> One workaround could be to stop trying to lock all VCPUs when locking a
> memslot and document the fact that it is required that no VCPUs are run
> before the ioctl completes, otherwise bad things might happen to the VM.
> 
> 
> Open questions
> ==============
> 
> 1. Implementing support for host profiling a guest with the SPE feature
> means setting the profiling buffer owning regime to EL2. While that is in
> effect,  PMBIDR_EL1.P will equal 1. This has two consequences: if the guest
> probes SPE during this time, the driver will fail; and the guest will be
> able to determine when it is profiled. I see two options here:

This doesn't mean that EL2 owns the SPE. It only tells you that a
higher exception level owns the SPE. It could as well be EL3 (e.g.,
MDCR_EL3.NSPB == 0 or 1). So I think this is architecturally correct,
as long as we trap the guest accesses to the other SPE registers and
inject an UNDEF.
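
Roughly, the trap-and-UNDEF handling suggested here could look like the
sketch below; kvm_inject_undefined() is the existing KVM/arm64 helper,
while the handler name (and where it gets wired up) is a placeholder,
not code from this series:

	/* Guest accessed an SPE register while a higher EL owns the
	 * profiling buffer: the access UNDEFs, do not advance the PC. */
	static bool deny_spe_access(struct kvm_vcpu *vcpu,
				    struct sys_reg_params *p,
				    const struct sys_reg_desc *r)
	{
		kvm_inject_undefined(vcpu);
		return false;
	}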


Thanks
Suzuki
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH v4 01/39] KVM: arm64: Make lock_all_vcpus() available to the rest of KVM
  2021-08-25 16:17 ` [RFC PATCH v4 01/39] KVM: arm64: Make lock_all_vcpus() available to the rest of KVM Alexandru Elisei
@ 2021-09-22 10:39   ` Suzuki K Poulose
  0 siblings, 0 replies; 47+ messages in thread
From: Suzuki K Poulose @ 2021-09-22 10:39 UTC (permalink / raw)
  To: Alexandru Elisei, maz, james.morse, linux-arm-kernel, kvmarm,
	will, linux-kernel

On 25/08/2021 17:17, Alexandru Elisei wrote:
> The VGIC code uses the lock_all_vcpus() function to make sure no VCPUs are
> run while it fiddles with the global VGIC state. Move the declaration of
> lock_all_vcpus() and the corresponding unlock function into asm/kvm_host.h
> where it can be reused by other parts of KVM/arm64 and rename the functions
> to kvm_{lock,unlock}_all_vcpus() to make them more generic.
> 
> Because the scope of the code potentially using the functions has
> increased, add a lockdep check that the kvm->lock is held by the caller.
> Holding the lock is necessary because otherwise userspace would be able to
> create new VCPUs and run them while the existing VCPUs are locked.
> 
> No functional change intended.
> 
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>


LGTM,

Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH v4 00/39] KVM: arm64: Add Statistical Profiling Extension (SPE) support
  2021-09-22 10:11 ` [RFC PATCH v4 00/39] KVM: arm64: Add Statistical Profiling Extension (SPE) support Suzuki K Poulose
@ 2021-09-23 15:12   ` Alexandru Elisei
  0 siblings, 0 replies; 47+ messages in thread
From: Alexandru Elisei @ 2021-09-23 15:12 UTC (permalink / raw)
  To: Suzuki K Poulose, maz, james.morse, linux-arm-kernel, kvmarm,
	will, linux-kernel

Hi Suzuki,

Thank you for having a look!

On 9/22/21 11:11, Suzuki K Poulose wrote:
> On 25/08/2021 17:17, Alexandru Elisei wrote:
>> This is v4 of the SPE series posted at [1]. v2 can be found at [2], and the
>> original series at [3].
>>
>> Statistical Profiling Extension (SPE) is an optional feature added in
>> ARMv8.2. It allows sampling at regular intervals of the operations executed
>> by the PE and storing a record of each operation in a memory buffer. A high
>> level overview of the extension is presented in an article on arm.com [4].
>>
>> This is another complete rewrite of the series, and nothing is set in
>> stone. If you think of a better way to do things, please suggest it.
>>
>>
>> Features added
>> ==============
>>
>> The rewrite enabled me to add support for several features not
>> present in the previous iteration:
>>
>> - Support for heterogeneous systems, where only some of the CPUs support SPE.
>>    This is accomplished via the KVM_ARM_VCPU_SUPPORTED_CPUS VCPU ioctl.
>>
>> - Support for VM migration with the KVM_ARM_VCPU_SPE_CTRL(KVM_ARM_VCPU_SPE_STOP)
>>    VCPU ioctl.
>>
>> - The requirement for userspace to mlock() the guest memory has been removed,
>>    and now userspace can make changes to memory contents after the memory is
>>    mapped at stage 2.
>>
>> - Better debugging of guest memory pinning by printing a warning when we
>>    get an unexpected read or write fault. This helped me catch several bugs
>>    during development, it has already proven very useful. Many thanks to
>>    James who suggested when reviewing v3.
>>
>>
>> Missing features
>> ================
>>
>> I've tried to keep the series as small as possible to make it easier to review,
>> while implementing the core functionality needed for the SPE emulation. As such,
>> I've chosen to not implement several features:
>>
>> - Host profiling a guest which has the SPE feature bit set (see open
>>    questions).
>>
>> - No errata workarounds have been implemented yet, and there are quite a few of
>>    them for Neoverse N1 and Neoverse V1.
>>
>> - Disabling CONFIG_NUMA_BALANCING is a hack to get KVM SPE to work and I am
>>    investigating other ways to get around automatic numa balancing, like
>>    requiring userspace to disable it via set_mempolicy(). I am also going to
>>    look at how VFIO gets around it. Suggestions welcome.
>>
>> - There's plenty of room for optimization. Off the top of my head, using
>>    block mappings at stage 2, batch pinning of pages (similar to what VFIO
>>    does), optimize the way KVM keeps track of pinned pages (using a linked
>>    list triples the memory usage), context-switch the SPE registers on
>>    vcpu_load/vcpu_put on VHE if the host is not profiling, locking
>>    optimizations, etc, etc.
>>
>> - ...and others. I'm sure I'm missing at least a few things which are
>>    important for someone.
>>
>>
>> Known issues
>> ============
>>
>> This is an RFC, so keep in mind that almost definitely there will be scary
>> bugs. For example, below is a list of known issues which don't affect the
>> correctness of the emulation, and which I'm planning to fix in a future
>> iteration:
>>
>> - With CONFIG_PROVE_LOCKING=y, lockdep complains about lock contention when
>>    the VCPU executes the dcache clean pending ops.
>>
>> - With CONFIG_PROVE_LOCKING=y, KVM will hit a BUG at
>>    kvm_lock_all_vcpus()->mutex_trylock(&vcpu->mutex) with more than 48
>>    VCPUs.
>>
>> This BUG statement can also be triggered with mainline. To reproduce it,
>> compile kvmtool from this branch [5] and follow the instruction in the
>> kvmtool commit message.
>>
>> One workaround could be to stop trying to lock all VCPUs when locking a
>> memslot and document the fact that it is required that no VCPUs are run
>> before the ioctl completes, otherwise bad things might happen to the VM.
>>
>>
>> Open questions
>> ==============
>>
>> 1. Implementing support for host profiling a guest with the SPE feature
>> means setting the profiling buffer owning regime to EL2. While that is in
>> effect,  PMBIDR_EL1.P will equal 1. This has two consequences: if the guest
>> probes SPE during this time, the driver will fail; and the guest will be
>> able to determine when it is profiled. I see two options here:
>
> This doesn't mean the EL2 is owning the SPE. It only tells you that a
> higher level EL is owning the SPE. It could as well be EL3. (e.g, MDCR_EL3.NSPB
> == 0 or 1). So I think this is architecturally correct,
> as long as we trap the guest access to other SPE registers and inject
> an UNDEF.

You are right, I was wrong about the guest being able to determine when it is
profiled; I forgot that PMBIDR_EL1.P can also be 1 if EL3 owns the buffer.

But I don't understand why you are suggesting that KVM inject an UNDEF in this
case. I was thinking that when the host is profiling the guest, KVM can store guest
writes to the SPE registers in the shadow copy of the registers. On a world switch
to the guest, KVM will not restore the registers onto the hardware, so as not to
interfere with the profiling operation performed by the host. When profiling ends,
KVM can then let the guest use SPE again by restoring the guest register values
onto the hardware at every world switch and setting the buffer owning regime to
EL1 again.
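
Roughly, the flow described above might look like this
(host_is_profiling_guest() and restore_guest_spe_state() are placeholder
names, not helpers from this series):

	static void spe_vcpu_load(struct kvm_vcpu *vcpu)
	{
		if (host_is_profiling_guest(vcpu)) {
			/* Leave the hardware to the host profiler; guest
			 * accesses are trapped and served from the shadow
			 * copies of the SPE registers. */
			return;
		}

		/* Host is not profiling: restore the guest copies and give
		 * the buffer owning regime back to EL1. */
		restore_guest_spe_state(vcpu);
	}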

I put this as an open question because if the guest sees PMBIDR_EL1.P set when the
guest SPE driver probes, the driver will likely decide that SPE is not available.
But the VM has been created with SPE enabled because the host userspace wants the
guest to have access to SPE, and the host userspace might not realize that the act
of profiling a guest can make SPE unusable by that guest. I am inclined now to let
the host's userspace do whatever it wishes and allow it to profile a guest if it
so desires, and mention this possible side effect in the KVM documentation, as it
might be surprising for someone who isn't familiar with the inner workings of
KVM's SPE emulation.

Thanks,

Alex

>
>
> Thanks
> Suzuki
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH v4 04/39] KVM: arm64: Defer CMOs for locked memslots until a VCPU is run
  2021-08-25 16:17 ` [RFC PATCH v4 04/39] KVM: arm64: Defer CMOs for locked memslots until a VCPU is run Alexandru Elisei
@ 2021-10-17 11:43   ` Marc Zyngier
  2021-10-18 12:56     ` Alexandru Elisei
  0 siblings, 1 reply; 47+ messages in thread
From: Marc Zyngier @ 2021-10-17 11:43 UTC (permalink / raw)
  To: Alexandru Elisei; +Cc: linux-kernel, will, kvmarm, linux-arm-kernel

On Wed, 25 Aug 2021 17:17:40 +0100,
Alexandru Elisei <alexandru.elisei@arm.com> wrote:
> 
> KVM relies on doing dcache maintenance on stage 2 faults to present to a
> gueste running with the MMU off the same view of memory as userspace. For
> locked memslots, KVM so far has done the dcache maintenance when a memslot
> is locked, but that leaves KVM in a rather awkward position: what userspace
> writes to guest memory after the memslot is locked, but before a VCPU is
> run, might not be visible to the guest.
> 
> Fix this by deferring the dcache maintenance until the first VCPU is run.
> 
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> ---
>  arch/arm64/include/asm/kvm_host.h |  7 ++++
>  arch/arm64/include/asm/kvm_mmu.h  |  5 +++
>  arch/arm64/kvm/arm.c              |  3 ++
>  arch/arm64/kvm/mmu.c              | 56 ++++++++++++++++++++++++++++---
>  4 files changed, 67 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 97ff3ed5d4b7..ed67f914d169 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -112,6 +112,10 @@ struct kvm_arch_memory_slot {
>  	u32 flags;
>  };
>  
> +/* kvm->arch.mmu_pending_ops flags */
> +#define KVM_LOCKED_MEMSLOT_FLUSH_DCACHE	0
> +#define KVM_MAX_MMU_PENDING_OPS		1
> +
>  struct kvm_arch {
>  	struct kvm_s2_mmu mmu;
>  
> @@ -135,6 +139,9 @@ struct kvm_arch {
>  	 */
>  	bool return_nisv_io_abort_to_user;
>  
> +	/* Defer MMU operations until a VCPU is run. */
> +	unsigned long mmu_pending_ops;

This has a funny taste of VM-wide requests...

> +
>  	/*
>  	 * VM-wide PMU filter, implemented as a bitmap and big enough for
>  	 * up to 2^10 events (ARMv8.0) or 2^16 events (ARMv8.1+).
> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> index ef079b5eb475..525c223e769f 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -219,6 +219,11 @@ void kvm_toggle_cache(struct kvm_vcpu *vcpu, bool was_enabled);
>  int kvm_mmu_lock_memslot(struct kvm *kvm, u64 slot, u64 flags);
>  int kvm_mmu_unlock_memslot(struct kvm *kvm, u64 slot, u64 flags);
>  
> +#define kvm_mmu_has_pending_ops(kvm)	\
> +	(!bitmap_empty(&(kvm)->arch.mmu_pending_ops, KVM_MAX_MMU_PENDING_OPS))
> +
> +void kvm_mmu_perform_pending_ops(struct kvm *kvm);
> +
>  static inline unsigned int kvm_get_vmid_bits(void)
>  {
>  	int reg = read_sanitised_ftr_reg(SYS_ID_AA64MMFR1_EL1);
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index efb3501c6016..144c982912d8 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -829,6 +829,9 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
>  	if (unlikely(!kvm_vcpu_initialized(vcpu)))
>  		return -ENOEXEC;
>  
> +	if (unlikely(kvm_mmu_has_pending_ops(vcpu->kvm)))
> +		kvm_mmu_perform_pending_ops(vcpu->kvm);
> +
>  	ret = kvm_vcpu_first_run_init(vcpu);

Is there any reason why this isn't done as part of the 'first run'
handling? I am refactoring that part to remove as many things as
possible from the fast path, and would love not to go back to piling
more stuff here.

Or do you expect this to happen more than once per VM (despite what
the comment says)?

>  	if (ret)
>  		return ret;
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 59c2bfef2fd1..94fa08f3d9d3 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1253,6 +1253,41 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
>  	return ret;
>  }
>  
> +/*
> + * It's safe to do the CMOs when the first VCPU is run because:
> + * - VCPUs cannot run until mmu_cmo_needed is cleared.
> + * - Memslots cannot be modified because we hold the kvm->slots_lock.

It would be good to document the expected locking order for this kind
of stuff.

> + *
> + * It's safe to periodically release the mmu_lock because:
> + * - VCPUs cannot run.
> + * - Any changes to the stage 2 tables triggered by the MMU notifiers also take
> + *   the mmu_lock, which means accesses will be serialized.
> + * - Stage 2 tables cannot be freed from under us as long as at least one VCPU
> + *   is live, which means that the VM will be live.
> + */
> +void kvm_mmu_perform_pending_ops(struct kvm *kvm)
> +{
> +	struct kvm_memory_slot *memslot;
> +
> +	mutex_lock(&kvm->slots_lock);
> +	if (!kvm_mmu_has_pending_ops(kvm))
> +		goto out_unlock;
> +
> +	if (test_bit(KVM_LOCKED_MEMSLOT_FLUSH_DCACHE, &kvm->arch.mmu_pending_ops)) {
> +		kvm_for_each_memslot(memslot, kvm_memslots(kvm)) {
> +			if (!memslot_is_locked(memslot))
> +				continue;
> +			stage2_flush_memslot(kvm, memslot);
> +		}
> +	}
> +
> +	bitmap_zero(&kvm->arch.mmu_pending_ops, KVM_MAX_MMU_PENDING_OPS);

clear_bit() instead? I understand that you want to support multiple
ops, but this looks odd.

> +
> +out_unlock:
> +	mutex_unlock(&kvm->slots_lock);
> +	return;
> +}
> +
>  static int try_rlimit_memlock(unsigned long npages)
>  {
>  	unsigned long lock_limit;
> @@ -1293,7 +1328,8 @@ static int lock_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot,
>  	struct kvm_memory_slot_page *page_entry;
>  	bool writable = flags & KVM_ARM_LOCK_MEM_WRITE;
>  	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
> -	struct kvm_pgtable *pgt = kvm->arch.mmu.pgt;
> +	struct kvm_pgtable pgt;
> +	struct kvm_pgtable_mm_ops mm_ops;
>  	struct vm_area_struct *vma;
>  	unsigned long npages = memslot->npages;
>  	unsigned int pin_flags = FOLL_LONGTERM;
> @@ -1311,6 +1347,16 @@ static int lock_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot,
>  		pin_flags |= FOLL_WRITE;
>  	}
>  
> +	/*
> +	 * Make a copy of the stage 2 translation table struct to remove the
> +	 * dcache callback so we can postpone the cache maintenance operations
> +	 * until the first VCPU is run.
> +	 */
> +	mm_ops = *kvm->arch.mmu.pgt->mm_ops;
> +	mm_ops.dcache_clean_inval_poc = NULL;
> +	pgt = *kvm->arch.mmu.pgt;
> +	pgt.mm_ops = &mm_ops;

Huhuh... Can't really say I'm in love with this. Are you trying to
avoid a double dcache clean to PoC? Is this a performance or a
correctness issue?

> +
>  	hva = memslot->userspace_addr;
>  	ipa = memslot->base_gfn << PAGE_SHIFT;
>  
> @@ -1362,13 +1408,13 @@ static int lock_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot,
>  			goto out_err;
>  		}
>  
> -		ret = kvm_pgtable_stage2_map(pgt, ipa, PAGE_SIZE,
> +		ret = kvm_pgtable_stage2_map(&pgt, ipa, PAGE_SIZE,
>  					     page_to_phys(page_entry->page),
>  					     prot, &cache);
>  		spin_unlock(&kvm->mmu_lock);
>  
>  		if (ret) {
> -			kvm_pgtable_stage2_unmap(pgt, memslot->base_gfn << PAGE_SHIFT,
> +			kvm_pgtable_stage2_unmap(&pgt, memslot->base_gfn << PAGE_SHIFT,
>  						 i << PAGE_SHIFT);
>  			unpin_memslot_pages(memslot, writable);
>  			goto out_err;
> @@ -1387,7 +1433,7 @@ static int lock_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot,
>  	 */
>  	ret = account_locked_vm(current->mm, npages, true);
>  	if (ret) {
> -		kvm_pgtable_stage2_unmap(pgt, memslot->base_gfn << PAGE_SHIFT,
> +		kvm_pgtable_stage2_unmap(&pgt, memslot->base_gfn << PAGE_SHIFT,
>  					 npages << PAGE_SHIFT);
>  		unpin_memslot_pages(memslot, writable);
>  		goto out_err;
> @@ -1397,6 +1443,8 @@ static int lock_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot,
>  	if (writable)
>  		memslot->arch.flags |= KVM_MEMSLOT_LOCK_WRITE;
>  
> +	set_bit(KVM_LOCKED_MEMSLOT_FLUSH_DCACHE, &kvm->arch.mmu_pending_ops);
> +
>  	kvm_mmu_free_memory_cache(&cache);
>  
>  	return 0;

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH v4 02/39] KVM: arm64: Add lock/unlock memslot user API
  2021-08-25 16:17 ` [RFC PATCH v4 02/39] KVM: arm64: Add lock/unlock memslot user API Alexandru Elisei
@ 2021-10-18  9:04   ` Suzuki K Poulose
  2021-10-18 14:09     ` Alexandru Elisei
  0 siblings, 1 reply; 47+ messages in thread
From: Suzuki K Poulose @ 2021-10-18  9:04 UTC (permalink / raw)
  To: Alexandru Elisei, maz, james.morse, linux-arm-kernel, kvmarm,
	will, linux-kernel

On 25/08/2021 17:17, Alexandru Elisei wrote:
> Stage 2 faults triggered by the profiling buffer attempting to write to
> memory are reported by the SPE hardware by asserting a buffer management
> event interrupt. Interrupts are by their nature asynchronous, which means
> that the guest might have changed its stage 1 translation tables since the
> attempted write. SPE reports the guest virtual address that caused the data
> abort, not the IPA, which means that KVM would have to walk the guest's
> stage 1 tables to find the IPA. Using the AT instruction to walk the
> guest's tables in hardware is not an option because it doesn't report the
> IPA in the case of a stage 2 fault on a stage 1 table walk.
> 
> Avoid both issues by pre-mapping the guest memory at stage 2. This is being
> done by adding a capability that allows the user to pin the memory backing
> a memslot. The same capability can be used to unlock a memslot, which
> unpins the pages associated with the memslot, but doesn't unmap the IPA
> range from stage 2; in this case, the addresses will be unmapped from stage
> 2 via the MMU notifiers when the process' address space changes.
> 
> For now, the capability doesn't actually do anything other than checking
> that the usage is correct; the memory operations will be added in future
> patches.
> 
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> ---
>   Documentation/virt/kvm/api.rst   | 56 +++++++++++++++++++++++
>   arch/arm64/include/asm/kvm_mmu.h |  3 ++
>   arch/arm64/kvm/arm.c             | 42 ++++++++++++++++--
>   arch/arm64/kvm/mmu.c             | 76 ++++++++++++++++++++++++++++++++
>   include/uapi/linux/kvm.h         |  8 ++++
>   5 files changed, 181 insertions(+), 4 deletions(-)
> 
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index dae68e68ca23..741327ef06b0 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -6682,6 +6682,62 @@ MAP_SHARED mmap will result in an -EINVAL return.
>   When enabled the VMM may make use of the ``KVM_ARM_MTE_COPY_TAGS`` ioctl to
>   perform a bulk copy of tags to/from the guest.
>   
> +7.29 KVM_CAP_ARM_LOCK_USER_MEMORY_REGION
> +----------------------------------------
> +
> +:Architectures: arm64
> +:Target: VM
> +:Parameters: flags is one of KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_LOCK or
> +                     KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_UNLOCK
> +             args[0] is the slot number
> +             args[1] specifies the permisions when the memslot is locked or if
> +                     all memslots should be unlocked
> +
> +The presence of this capability indicates that KVM supports locking the memory
> +associated with the memslot, and unlocking a previously locked memslot.
> +
> +The 'flags' parameter is defined as follows:
> +
> +7.29.1 KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_LOCK
> +-------------------------------------------------
> +
> +:Capability: 'flags' parameter to KVM_CAP_ARM_LOCK_USER_MEMORY_REGION
> +:Architectures: arm64
> +:Target: VM
> +:Parameters: args[0] contains the memory slot number
> +             args[1] contains the permissions for the locked memory:
> +                     KVM_ARM_LOCK_MEMORY_READ (mandatory) to map it with
> +                     read permissions and KVM_ARM_LOCK_MEMORY_WRITE
> +                     (optional) with write permissions
> +:Returns: 0 on success; negative error code on failure
> +
> +Enabling this capability causes the memory described by the memslot to be
> +pinned in the process address space and the corresponding stage 2 IPA range
> +mapped at stage 2. The permissions specified in args[1] apply to both
> +mappings. The memory pinned with this capability counts towards the max
> +locked memory limit for the current process.
> +
> +The capability must be enabled before any VCPUs have run. The virtual memory
> +range described by the memslot must be mapped in the userspace process without
> +any gaps. It is considered an error if write permissions are specified for a
> +memslot which logs dirty pages.
> +
> +7.29.2 KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_UNLOCK
> +---------------------------------------------------
> +
> +:Capability: 'flags' parameter to KVM_CAP_ARM_LOCK_USER_MEMORY_REGION
> +:Architectures: arm64
> +:Target: VM
> +:Parameters: args[0] contains the memory slot number
> +             args[1] optionally contains the flag KVM_ARM_UNLOCK_MEM_ALL,
> +                     which unlocks all previously locked memslots.
> +:Returns: 0 on success; negative error code on failure
> +
> +Enabling this capability causes the memory pinned when locking the memslot
> +specified in args[0] to be unpinned, or, optionally, the memory associated
> +with all locked memslots, to be unpinned. The IPA range is not unmapped
> +from stage 2.
> +
>   8. Other capabilities.
>   ======================
>   
> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> index b52c5c4b9a3d..ef079b5eb475 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -216,6 +216,9 @@ static inline void __invalidate_icache_guest_page(void *va, size_t size)
>   void kvm_set_way_flush(struct kvm_vcpu *vcpu);
>   void kvm_toggle_cache(struct kvm_vcpu *vcpu, bool was_enabled);
>   
> +int kvm_mmu_lock_memslot(struct kvm *kvm, u64 slot, u64 flags);
> +int kvm_mmu_unlock_memslot(struct kvm *kvm, u64 slot, u64 flags);
> +
>   static inline unsigned int kvm_get_vmid_bits(void)
>   {
>   	int reg = read_sanitised_ftr_reg(SYS_ID_AA64MMFR1_EL1);
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index ddace63528f1..57ac97b30b3d 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -80,16 +80,43 @@ int kvm_arch_check_processor_compat(void *opaque)
>   	return 0;
>   }
>   
> +static int kvm_arm_lock_memslot_supported(void)
> +{
> +	return 0;
> +}
> +
> +static int kvm_lock_user_memory_region_ioctl(struct kvm *kvm,
> +					     struct kvm_enable_cap *cap)
> +{
> +	u64 slot, flags;
> +	u32 action;
> +
> +	if (cap->args[2] || cap->args[3])
> +		return -EINVAL;
> +
> +	slot = cap->args[0];
> +	flags = cap->args[1];

nit: ^^ Please could we rename "flags" => "perm" (ission) ?

> +	action = cap->flags;

We already have cap->flags, and using args[1] as flags (which is indeed
a permission by definition) is confusing.

Suzuki
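
(For reference, a minimal userspace sketch of the capability under review,
using the constant names from the quoted documentation; the UAPI is still
RFC, so the names and layout may change, and the error handling is
illustrative only.)

	#include <err.h>
	#include <sys/ioctl.h>
	#include <linux/kvm.h>

	/* Lock the memory backing memslot 'slot' and map it at stage 2 with
	 * read and write permissions. The LOCK_USER_MEMORY_REGION constants
	 * come from the RFC patch, not from a released linux/kvm.h. */
	static void lock_memslot(int vm_fd, __u64 slot)
	{
		struct kvm_enable_cap cap = {
			.cap   = KVM_CAP_ARM_LOCK_USER_MEMORY_REGION,
			.flags = KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_LOCK,
			.args  = { slot, KVM_ARM_LOCK_MEMORY_READ |
					 KVM_ARM_LOCK_MEMORY_WRITE },
		};

		if (ioctl(vm_fd, KVM_ENABLE_CAP, &cap) < 0)
			err(1, "KVM_ENABLE_CAP(lock memslot %llu)", slot);
	}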
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH v4 04/39] KVM: arm64: Defer CMOs for locked memslots until a VCPU is run
  2021-10-17 11:43   ` Marc Zyngier
@ 2021-10-18 12:56     ` Alexandru Elisei
  0 siblings, 0 replies; 47+ messages in thread
From: Alexandru Elisei @ 2021-10-18 12:56 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: linux-kernel, will, kvmarm, linux-arm-kernel

Hi Marc,

On Sun, Oct 17, 2021 at 12:43:29PM +0100, Marc Zyngier wrote:
> On Wed, 25 Aug 2021 17:17:40 +0100,
> Alexandru Elisei <alexandru.elisei@arm.com> wrote:
> > 
> > KVM relies on doing dcache maintenance on stage 2 faults to present to a
> > guest running with the MMU off the same view of memory as userspace. For
> > locked memslots, KVM so far has done the dcache maintenance when a memslot
> > is locked, but that leaves KVM in a rather awkward position: what userspace
> > writes to guest memory after the memslot is locked, but before a VCPU is
> > run, might not be visible to the guest.
> > 
> > Fix this by deferring the dcache maintenance until the first VCPU is run.
> > 
> > Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> > ---
> >  arch/arm64/include/asm/kvm_host.h |  7 ++++
> >  arch/arm64/include/asm/kvm_mmu.h  |  5 +++
> >  arch/arm64/kvm/arm.c              |  3 ++
> >  arch/arm64/kvm/mmu.c              | 56 ++++++++++++++++++++++++++++---
> >  4 files changed, 67 insertions(+), 4 deletions(-)
> > 
> > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > index 97ff3ed5d4b7..ed67f914d169 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -112,6 +112,10 @@ struct kvm_arch_memory_slot {
> >  	u32 flags;
> >  };
> >  
> > +/* kvm->arch.mmu_pending_ops flags */
> > +#define KVM_LOCKED_MEMSLOT_FLUSH_DCACHE	0
> > +#define KVM_MAX_MMU_PENDING_OPS		1
> > +
> >  struct kvm_arch {
> >  	struct kvm_s2_mmu mmu;
> >  
> > @@ -135,6 +139,9 @@ struct kvm_arch {
> >  	 */
> >  	bool return_nisv_io_abort_to_user;
> >  
> > +	/* Defer MMU operations until a VCPU is run. */
> > +	unsigned long mmu_pending_ops;
> 
> This has a funny taste of VM-wide requests...

I guess you can look at it that way, although the exact semantics of a VM-wide
request are very vague to me.

> 
> > +
> >  	/*
> >  	 * VM-wide PMU filter, implemented as a bitmap and big enough for
> >  	 * up to 2^10 events (ARMv8.0) or 2^16 events (ARMv8.1+).
> > diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> > index ef079b5eb475..525c223e769f 100644
> > --- a/arch/arm64/include/asm/kvm_mmu.h
> > +++ b/arch/arm64/include/asm/kvm_mmu.h
> > @@ -219,6 +219,11 @@ void kvm_toggle_cache(struct kvm_vcpu *vcpu, bool was_enabled);
> >  int kvm_mmu_lock_memslot(struct kvm *kvm, u64 slot, u64 flags);
> >  int kvm_mmu_unlock_memslot(struct kvm *kvm, u64 slot, u64 flags);
> >  
> > +#define kvm_mmu_has_pending_ops(kvm)	\
> > +	(!bitmap_empty(&(kvm)->arch.mmu_pending_ops, KVM_MAX_MMU_PENDING_OPS))
> > +
> > +void kvm_mmu_perform_pending_ops(struct kvm *kvm);
> > +
> >  static inline unsigned int kvm_get_vmid_bits(void)
> >  {
> >  	int reg = read_sanitised_ftr_reg(SYS_ID_AA64MMFR1_EL1);
> > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> > index efb3501c6016..144c982912d8 100644
> > --- a/arch/arm64/kvm/arm.c
> > +++ b/arch/arm64/kvm/arm.c
> > @@ -829,6 +829,9 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
> >  	if (unlikely(!kvm_vcpu_initialized(vcpu)))
> >  		return -ENOEXEC;
> >  
> > +	if (unlikely(kvm_mmu_has_pending_ops(vcpu->kvm)))
> > +		kvm_mmu_perform_pending_ops(vcpu->kvm);
> > +
> >  	ret = kvm_vcpu_first_run_init(vcpu);
> 
> Is there any reason why this isn't done as part of the 'first run'
> handling? I am refactoring that part to remove as many things as
> possible from the fast path, and would love not to go back to piling
> more stuff here.
> 
> Or do you expect this to happen more than once per VM (despite what
> the comment says)?

In theory, it can happen more than once per VM and KVM should allow it to
happen. I believe there is at least one valid use case for it: if userspace
unlocks a memslot to perform migration (so KVM can write-protect stage 2),
decides to cancel migration, then locks the memslot again without destroying the
VM.

However, I didn't take this scenario into account in this series
(kvm_mmu_lock_memslot returns an error if one VCPU has vcpu->arch.has_run_once
true). I'll put it on my todo list for the next iteration.
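
Sketched with the RFC constants from patch 02 (names may still change), the
unlock step in that scenario would be something like:

	/* Unlock the memslot so KVM can write-protect stage 2 for dirty
	 * logging; lock it again later if migration is cancelled. */
	struct kvm_enable_cap cap = {
		.cap   = KVM_CAP_ARM_LOCK_USER_MEMORY_REGION,
		.flags = KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_UNLOCK,
		.args  = { slot },
	};

	ioctl(vm_fd, KVM_ENABLE_CAP, &cap);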

> 
> >  	if (ret)
> >  		return ret;
> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index 59c2bfef2fd1..94fa08f3d9d3 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -1253,6 +1253,41 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
> >  	return ret;
> >  }
> >  
> > +/*
> > + * It's safe to do the CMOs when the first VCPU is run because:
> > + * - VCPUs cannot run until mmu_cmo_needed is cleared.
> > + * - Memslots cannot be modified because we hold the kvm->slots_lock.
> 
> It would be good to document the expected locking order for this kind
> of stuff.

I created this locking scheme with the assumption that a memslot is locked
before any VCPUs have run. I need to reconsider that for the use case I
mentioned above. I do agree that this needs to be very well documented.

I also want to try to get rid of lock_all_vcpus (which was
renamed to kvm_lock_all_vcpus in this series), because it triggers a lockdep
bug which is also present in upstream KVM. Copied and pasted from the cover
letter (I got the number of VCPUs wrong, 47 is enough to trigger it):

"With CONFIG_PROVE_LOCKING=y, KVM will hit a BUG at
kvm_lock_all_vcpus()->mutex_trylock(&vcpu->mutex) with more than 48
VCPUs.

This BUG statement can also be triggered with mainline. To reproduce it,
compile kvmtool from this branch [1] and follow the instruction in the
kvmtool commit message".

[1] https://gitlab.arm.com/linux-arm/kvmtool-ae/-/tree/vgic-lock-all-vcpus-lockdep-bug-v1

Here's the dmesg on my rockpro64, with mainline Linux v5.15-rc6, with the
kvmtool patch above:

$ ./vm run -c47 -m256 -k ../kvm-unit-tests/arm/selftest.flat --nodefaults -p "setup smp=47 mem=256"

[..]
[  136.888002] BUG: MAX_LOCK_DEPTH too low!
[  136.888372] turning off the locking correctness validator.
[  136.888859] depth: 48  max: 48!
[  136.889140] 48 locks held by vm/401:
[  136.889461]  #0: ffff00000d621248 (&kvm->lock){+.+.}-{3:3}, at: vgic_v3_attr_regs_access+0x9c/0x250
[  136.890288]  #1: ffff0000098d80c8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x2c/0xf0
[  136.891032]  #2: ffff00000b9980c8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x2c/0xf0
[  136.891774]  #3: ffff00000b8440c8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x2c/0xf0
[  136.892516]  #4: ffff00000d6c00c8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x2c/0xf0
[  136.893259]  #5: ffff00000aa600c8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x2c/0xf0
[  136.894001]  #6: ffff0000099540c8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x2c/0xf0
[  136.894743]  #7: ffff00000b8f00c8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x2c/0xf0
[  136.895486]  #8: ffff00000b9e00c8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x2c/0xf0
[  136.896228]  #9: ffff00000b8e40c8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x2c/0xf0
[  136.896970]  #10: ffff00000b8dc0c8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x2c/0xf0
[  136.897719]  #11: ffff00000b8d40c8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x2c/0xf0
[  136.898468]  #12: ffff00000b9cc0c8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x2c/0xf0
[  136.899219]  #13: ffff00000ca940c8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x2c/0xf0
[  136.899968]  #14: ffff00000c8f40c8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x2c/0xf0
[  136.900717]  #15: ffff00000baa40c8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x2c/0xf0
[  136.901465]  #16: ffff00000e98c0c8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x2c/0xf0
[  136.902214]  #17: ffff00000e9dc0c8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x2c/0xf0
[  136.902965]  #18: ffff00000aacc0c8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x2c/0xf0
[  136.903713]  #19: ffff0000098c80c8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x2c/0xf0
[  136.904462]  #20: ffff00000ab940c8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x2c/0xf0
[  136.905211]  #21: ffff00000aba40c8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x2c/0xf0
[  136.905961]  #22: ffff00000e9e80c8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x2c/0xf0
[  136.906710]  #23: ffff00000e9ec0c8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x2c/0xf0
[  136.907460]  #24: ffff00000d4d80c8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x2c/0xf0
[  136.908209]  #25: ffff00000d4dc0c8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x2c/0xf0
[  136.908959]  #26: ffff00000ab980c8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x2c/0xf0
[  136.909708]  #27: ffff00000ab9c0c8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x2c/0xf0
[  136.910458]  #28: ffff00000ba9c0c8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x2c/0xf0
[  136.911206]  #29: ffff00000b9dc0c8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x2c/0xf0
[  136.911955]  #30: ffff00000abb40c8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x2c/0xf0
[  136.912704]  #31: ffff00000aab80c8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x2c/0xf0
[  136.913454]  #32: ffff00000b9a00c8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x2c/0xf0
[  136.914203]  #33: ffff00000b9e80c8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x2c/0xf0
[  136.914952]  #34: ffff00000ba000c8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x2c/0xf0
[  136.915701]  #35: ffff00000b8a80c8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x2c/0xf0
[  136.916451]  #36: ffff00000ca080c8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x2c/0xf0
[  136.917199]  #37: ffff00000aba80c8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x2c/0xf0
[  136.917950]  #38: ffff00000abac0c8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x2c/0xf0
[  136.918700]  #39: ffff00000c9e80c8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x2c/0xf0
[  136.919450]  #40: ffff00000c9ec0c8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x2c/0xf0
[  136.920199]  #41: ffff00000d4e00c8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x2c/0xf0
[  136.920948]  #42: ffff00000d4e40c8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x2c/0xf0
[  136.921697]  #43: ffff00000ca0c0c8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x2c/0xf0
[  136.922446]  #44: ffff00000abf00c8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x2c/0xf0
[  136.923195]  #45: ffff00000abf40c8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x2c/0xf0
[  136.923945]  #46: ffff00000abf80c8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x2c/0xf0
[  136.924693]  #47: ffff00000abfc0c8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x2c/0xf0
[  136.925443] INFO: lockdep is turned off.
[  136.925795] CPU: 5 PID: 401 Comm: vm Not tainted 5.15.0-rc6 #103
[  136.926331] Hardware name: Pine64 RockPro64 v2.0 (DT)
[  136.926781] Call trace:
[  136.927000]  dump_backtrace+0x0/0x1a0
[  136.927332]  show_stack+0x18/0x24
[  136.927630]  dump_stack_lvl+0x8c/0xb8
[  136.927960]  dump_stack+0x18/0x34
[  136.928258]  __lock_acquire+0xbc8/0x20cc
[  136.928611]  lock_acquire.part.0+0xe0/0x230
[  136.928987]  lock_acquire+0x68/0x84
[  136.929301]  mutex_trylock+0xa8/0xf0
[  136.929623]  lock_all_vcpus+0x2c/0xf0
[  136.929953]  vgic_v3_attr_regs_access+0xb4/0x250
[  136.930366]  vgic_v3_get_attr+0x128/0x1e4
[  136.930726]  kvm_device_ioctl+0xd8/0x120
[  136.931078]  __arm64_sys_ioctl+0xa8/0xf0
[  136.931432]  invoke_syscall+0x48/0x114
[  136.931771]  el0_svc_common.constprop.0+0xfc/0x11c
[  136.932200]  do_el0_svc+0x28/0x90
[  136.932499]  el0_svc+0x54/0x130
[  136.932782]  el0t_64_sync_handler+0xa4/0x130
[  136.933164]  el0t_64_sync+0x1a0/0x1a4

The test completes correctly, and the lockdep splat is benign (bumping
include/linux/sched.h::MAX_LOCK_DEPTH removed the splat), but I would still like
to try to avoid it if possible.

> 
> > + *
> > + * It's safe to periodically release the mmu_lock because:
> > + * - VCPUs cannot run.
> > + * - Any changes to the stage 2 tables triggered by the MMU notifiers also take
> > + *   the mmu_lock, which means accesses will be serialized.
> > + * - Stage 2 tables cannot be freed from under us as long as at least one VCPU
> > + *   is live, which means that the VM will be live.
> > + */
> > +void kvm_mmu_perform_pending_ops(struct kvm *kvm)
> > +{
> > +	struct kvm_memory_slot *memslot;
> > +
> > +	mutex_lock(&kvm->slots_lock);
> > +	if (!kvm_mmu_has_pending_ops(kvm))
> > +		goto out_unlock;
> > +
> > +	if (test_bit(KVM_LOCKED_MEMSLOT_FLUSH_DCACHE, &kvm->arch.mmu_pending_ops)) {
> > +		kvm_for_each_memslot(memslot, kvm_memslots(kvm)) {
> > +			if (!memslot_is_locked(memslot))
> > +				continue;
> > +			stage2_flush_memslot(kvm, memslot);
> > +		}
> > +	}
> > +
> > +	bitmap_zero(&kvm->arch.mmu_pending_ops, KVM_MAX_MMU_PENDING_OPS);
> 
> clear_bit() instead? I understand that you want to support multiple
> ops, but this looks odd.

I can do that.

> 
> > +
> > +out_unlock:
> > +	mutex_unlock(&kvm->slots_lock);
> > +	return;
> > +}
> > +
> >  static int try_rlimit_memlock(unsigned long npages)
> >  {
> >  	unsigned long lock_limit;
> > @@ -1293,7 +1328,8 @@ static int lock_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot,
> >  	struct kvm_memory_slot_page *page_entry;
> >  	bool writable = flags & KVM_ARM_LOCK_MEM_WRITE;
> >  	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
> > -	struct kvm_pgtable *pgt = kvm->arch.mmu.pgt;
> > +	struct kvm_pgtable pgt;
> > +	struct kvm_pgtable_mm_ops mm_ops;
> >  	struct vm_area_struct *vma;
> >  	unsigned long npages = memslot->npages;
> >  	unsigned int pin_flags = FOLL_LONGTERM;
> > @@ -1311,6 +1347,16 @@ static int lock_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot,
> >  		pin_flags |= FOLL_WRITE;
> >  	}
> >  
> > +	/*
> > +	 * Make a copy of the stage 2 translation table struct to remove the
> > +	 * dcache callback so we can postpone the cache maintenance operations
> > +	 * until the first VCPU is run.
> > +	 */
> > +	mm_ops = *kvm->arch.mmu.pgt->mm_ops;
> > +	mm_ops.dcache_clean_inval_poc = NULL;
> > +	pgt = *kvm->arch.mmu.pgt;
> > +	pgt.mm_ops = &mm_ops;
> 
> Huhuh... Can't really say I'm in love with this. Are you trying to
> avoid a double dcache clean to PoC? Is this a performance or a
> correctness issue?

I am trying to avoid a double dcache clean to PoC for performance reasons. I
haven't measured it in any way, but I believe that doing dcache clean to PoC for
VMs with big amounts of memory will have a noticeable impact on performance. I
can run some tests and come back with a number if you wish.

My intention here was to create a copy of the kvm->arch.mmu.pgt without the
dcache clean to PoC function pointer, and I agree that it doesn't look too
pleasing. I can temporarily assign NULL to
kvm->arch.mmu.pgt->mm_ops->dcache_clean_inval_poc if that makes it more
palatable. I'm open to suggestions.

One other option, if this looks too unpleasant and the solution is too
convoluted, is to leave the double dcache operation in place and figure out a
way to optimize it after the series has progressed some more.
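
(For what it's worth, the temporary-NULL alternative mentioned above might
look like the fragment below. It assumes, as in this series, that
kvm->slots_lock is held and no VCPU has run yet, so nothing else can invoke
the callback while it is cleared, and it borrows the (void *, size_t)
callback signature from the mainline kvm_pgtable_mm_ops definition.)

	struct kvm_pgtable_mm_ops *mm_ops = kvm->arch.mmu.pgt->mm_ops;
	void (*dcache_cb)(void *addr, size_t size) =
		mm_ops->dcache_clean_inval_poc;

	/* Drop the dcache callback while mapping the locked memslot... */
	mm_ops->dcache_clean_inval_poc = NULL;

	/* ... kvm_pgtable_stage2_map() the pinned pages here ... */

	/* ... and restore it before anyone else uses the page tables. */
	mm_ops->dcache_clean_inval_poc = dcache_cb;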

Thanks for having a look!

Alex

> 
> > +
> >  	hva = memslot->userspace_addr;
> >  	ipa = memslot->base_gfn << PAGE_SHIFT;
> >  
> > @@ -1362,13 +1408,13 @@ static int lock_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot,
> >  			goto out_err;
> >  		}
> >  
> > -		ret = kvm_pgtable_stage2_map(pgt, ipa, PAGE_SIZE,
> > +		ret = kvm_pgtable_stage2_map(&pgt, ipa, PAGE_SIZE,
> >  					     page_to_phys(page_entry->page),
> >  					     prot, &cache);
> >  		spin_unlock(&kvm->mmu_lock);
> >  
> >  		if (ret) {
> > -			kvm_pgtable_stage2_unmap(pgt, memslot->base_gfn << PAGE_SHIFT,
> > +			kvm_pgtable_stage2_unmap(&pgt, memslot->base_gfn << PAGE_SHIFT,
> >  						 i << PAGE_SHIFT);
> >  			unpin_memslot_pages(memslot, writable);
> >  			goto out_err;
> > @@ -1387,7 +1433,7 @@ static int lock_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot,
> >  	 */
> >  	ret = account_locked_vm(current->mm, npages, true);
> >  	if (ret) {
> > -		kvm_pgtable_stage2_unmap(pgt, memslot->base_gfn << PAGE_SHIFT,
> > +		kvm_pgtable_stage2_unmap(&pgt, memslot->base_gfn << PAGE_SHIFT,
> >  					 npages << PAGE_SHIFT);
> >  		unpin_memslot_pages(memslot, writable);
> >  		goto out_err;
> > @@ -1397,6 +1443,8 @@ static int lock_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot,
> >  	if (writable)
> >  		memslot->arch.flags |= KVM_MEMSLOT_LOCK_WRITE;
> >  
> > +	set_bit(KVM_LOCKED_MEMSLOT_FLUSH_DCACHE, &kvm->arch.mmu_pending_ops);
> > +
> >  	kvm_mmu_free_memory_cache(&cache);
> >  
> >  	return 0;
> 
> Thanks,
> 
> 	M.
> 
> -- 
> Without deviation from the norm, progress is not possible.
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC PATCH v4 02/39] KVM: arm64: Add lock/unlock memslot user API
  2021-10-18  9:04   ` Suzuki K Poulose
@ 2021-10-18 14:09     ` Alexandru Elisei
  0 siblings, 0 replies; 47+ messages in thread
From: Alexandru Elisei @ 2021-10-18 14:09 UTC (permalink / raw)
  To: Suzuki K Poulose; +Cc: maz, linux-kernel, will, kvmarm, linux-arm-kernel

Hi Suzuki,

On Mon, Oct 18, 2021 at 10:04:37AM +0100, Suzuki K Poulose wrote:
> On 25/08/2021 17:17, Alexandru Elisei wrote:
> > Stage 2 faults triggered by the profiling buffer attempting to write to
> > memory are reported by the SPE hardware by asserting a buffer management
> > event interrupt. Interrupts are by their nature asynchronous, which means
> > that the guest might have changed its stage 1 translation tables since the
> > attempted write. SPE reports the guest virtual address that caused the data
> > abort, not the IPA, which means that KVM would have to walk the guest's
> > stage 1 tables to find the IPA. Using the AT instruction to walk the
> > guest's tables in hardware is not an option because it doesn't report the
> > IPA in the case of a stage 2 fault on a stage 1 table walk.
> > 
> > Avoid both issues by pre-mapping the guest memory at stage 2. This is being
> > done by adding a capability that allows the user to pin the memory backing
> > a memslot. The same capability can be used to unlock a memslot, which
> > unpins the pages associated with the memslot, but doesn't unmap the IPA
> > range from stage 2; in this case, the addresses will be unmapped from stage
> > 2 via the MMU notifiers when the process' address space changes.
> > 
> > For now, the capability doesn't actually do anything other than checking
> > that the usage is correct; the memory operations will be added in future
> > patches.
> > 
> > Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> > ---
> >   Documentation/virt/kvm/api.rst   | 56 +++++++++++++++++++++++
> >   arch/arm64/include/asm/kvm_mmu.h |  3 ++
> >   arch/arm64/kvm/arm.c             | 42 ++++++++++++++++--
> >   arch/arm64/kvm/mmu.c             | 76 ++++++++++++++++++++++++++++++++
> >   include/uapi/linux/kvm.h         |  8 ++++
> >   5 files changed, 181 insertions(+), 4 deletions(-)
> > 
> > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > index dae68e68ca23..741327ef06b0 100644
> > --- a/Documentation/virt/kvm/api.rst
> > +++ b/Documentation/virt/kvm/api.rst
> > @@ -6682,6 +6682,62 @@ MAP_SHARED mmap will result in an -EINVAL return.
> >   When enabled the VMM may make use of the ``KVM_ARM_MTE_COPY_TAGS`` ioctl to
> >   perform a bulk copy of tags to/from the guest.
> > +7.29 KVM_CAP_ARM_LOCK_USER_MEMORY_REGION
> > +----------------------------------------
> > +
> > +:Architectures: arm64
> > +:Target: VM
> > +:Parameters: flags is one of KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_LOCK or
> > +                     KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_UNLOCK
> > +             args[0] is the slot number
> > +             args[1] specifies the permisions when the memslot is locked or if
> > +                     all memslots should be unlocked
> > +
> > +The presence of this capability indicates that KVM supports locking the memory
> > +associated with the memslot, and unlocking a previously locked memslot.
> > +
> > +The 'flags' parameter is defined as follows:
> > +
> > +7.29.1 KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_LOCK
> > +-------------------------------------------------
> > +
> > +:Capability: 'flags' parameter to KVM_CAP_ARM_LOCK_USER_MEMORY_REGION
> > +:Architectures: arm64
> > +:Target: VM
> > +:Parameters: args[0] contains the memory slot number
> > +             args[1] contains the permissions for the locked memory:
> > +                     KVM_ARM_LOCK_MEMORY_READ (mandatory) to map it with
> > +                     read permissions and KVM_ARM_LOCK_MEMORY_WRITE
> > +                     (optional) with write permissions
> > +:Returns: 0 on success; negative error code on failure
> > +
> > +Enabling this capability causes the memory described by the memslot to be
> > +pinned in the process address space and the corresponding stage 2 IPA range
> > +mapped at stage 2. The permissions specified in args[1] apply to both
> > +mappings. The memory pinned with this capability counts towards the max
> > +locked memory limit for the current process.
> > +
> > +The capability must be enabled before any VCPUs have run. The virtual memory
> > +range described by the memslot must be mapped in the userspace process without
> > +any gaps. It is considered an error if write permissions are specified for a
> > +memslot which logs dirty pages.
> > +
> > +7.29.2 KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_UNLOCK
> > +---------------------------------------------------
> > +
> > +:Capability: 'flags' parameter to KVM_CAP_ARM_LOCK_USER_MEMORY_REGION
> > +:Architectures: arm64
> > +:Target: VM
> > +:Parameters: args[0] contains the memory slot number
> > +             args[1] optionally contains the flag KVM_ARM_UNLOCK_MEM_ALL,
> > +                     which unlocks all previously locked memslots.
> > +:Returns: 0 on success; negative error code on failure
> > +
> > +Enabling this capability causes the memory pinned when locking the memslot
> > +specified in args[0] to be unpinned or, if KVM_ARM_UNLOCK_MEM_ALL is set,
> > +the memory associated with all locked memslots to be unpinned. The
> > +corresponding IPA ranges are not unmapped from stage 2.
> > +
> >   8. Other capabilities.
> >   ======================
> > diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> > index b52c5c4b9a3d..ef079b5eb475 100644
> > --- a/arch/arm64/include/asm/kvm_mmu.h
> > +++ b/arch/arm64/include/asm/kvm_mmu.h
> > @@ -216,6 +216,9 @@ static inline void __invalidate_icache_guest_page(void *va, size_t size)
> >   void kvm_set_way_flush(struct kvm_vcpu *vcpu);
> >   void kvm_toggle_cache(struct kvm_vcpu *vcpu, bool was_enabled);
> > +int kvm_mmu_lock_memslot(struct kvm *kvm, u64 slot, u64 flags);
> > +int kvm_mmu_unlock_memslot(struct kvm *kvm, u64 slot, u64 flags);
> > +
> >   static inline unsigned int kvm_get_vmid_bits(void)
> >   {
> >   	int reg = read_sanitised_ftr_reg(SYS_ID_AA64MMFR1_EL1);
> > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> > index ddace63528f1..57ac97b30b3d 100644
> > --- a/arch/arm64/kvm/arm.c
> > +++ b/arch/arm64/kvm/arm.c
> > @@ -80,16 +80,43 @@ int kvm_arch_check_processor_compat(void *opaque)
> >   	return 0;
> >   }
> > +static int kvm_arm_lock_memslot_supported(void)
> > +{
> > +	return 0;
> > +}
> > +
> > +static int kvm_lock_user_memory_region_ioctl(struct kvm *kvm,
> > +					     struct kvm_enable_cap *cap)
> > +{
> > +	u64 slot, flags;
> > +	u32 action;
> > +
> > +	if (cap->args[2] || cap->args[3])
> > +		return -EINVAL;
> > +
> > +	slot = cap->args[0];
> > +	flags = cap->args[1];
> 
> nit: ^^ Please could we rename "flags" => "perm" (ission) ?
> 
> > +	action = cap->flags;
> 
> We already have cap->flags, and using args[1] as flags (which is, by
> definition, a permission) is confusing.

Yes, that's a good idea, will do.

Thanks,
Alex
> 
> Suzuki
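
To make the interface documented in the quoted patch concrete, below is a
minimal userspace sketch of how a VMM might lock and unlock memslots through
KVM_ENABLE_CAP on the VM file descriptor. It assumes the uapi names proposed
by this RFC (KVM_CAP_ARM_LOCK_USER_MEMORY_REGION, its LOCK/UNLOCK flags,
KVM_ARM_LOCK_MEMORY_READ/WRITE and KVM_ARM_UNLOCK_MEM_ALL) are present in
<linux/kvm.h>; since this is an RFC, those names and their semantics may
still change. Only KVM_ENABLE_CAP and struct kvm_enable_cap are established
KVM interfaces.

#include <linux/kvm.h>
#include <stdbool.h>
#include <string.h>
#include <sys/ioctl.h>

/*
 * Sketch only: the KVM_CAP_ARM_LOCK_USER_MEMORY_REGION constants used
 * below are the names proposed by this RFC series, not mainline uapi.
 * vm_fd is an open KVM VM file descriptor and 'slot' a memslot already
 * registered with KVM_SET_USER_MEMORY_REGION.
 */
static int lock_memslot(int vm_fd, __u64 slot, bool writable)
{
	struct kvm_enable_cap cap;

	memset(&cap, 0, sizeof(cap));
	cap.cap = KVM_CAP_ARM_LOCK_USER_MEMORY_REGION;
	cap.flags = KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_LOCK;
	cap.args[0] = slot;			/* memslot to pin */
	cap.args[1] = KVM_ARM_LOCK_MEMORY_READ;	/* read is mandatory */
	if (writable)
		cap.args[1] |= KVM_ARM_LOCK_MEMORY_WRITE;

	/* Must be issued before any VCPU has run. */
	return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
}

static int unlock_all_memslots(int vm_fd)
{
	struct kvm_enable_cap cap;

	memset(&cap, 0, sizeof(cap));
	cap.cap = KVM_CAP_ARM_LOCK_USER_MEMORY_REGION;
	cap.flags = KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_UNLOCK;
	/* args[0] left at 0: KVM_ARM_UNLOCK_MEM_ALL unpins every locked slot. */
	cap.args[1] = KVM_ARM_UNLOCK_MEM_ALL;

	return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
}

Per the documentation added by the patch, locking must happen before any
VCPU has run, write permission cannot be combined with dirty page logging
on the same slot, and the pinned pages count towards the locked memory
limit of the current process.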

end of thread

Thread overview: 47+ messages
2021-08-25 16:17 [RFC PATCH v4 00/39] KVM: arm64: Add Statistical Profiling Extension (SPE) support Alexandru Elisei
2021-08-25 16:17 ` [RFC PATCH v4 01/39] KVM: arm64: Make lock_all_vcpus() available to the rest of KVM Alexandru Elisei
2021-09-22 10:39   ` Suzuki K Poulose
2021-08-25 16:17 ` [RFC PATCH v4 02/39] KVM: arm64: Add lock/unlock memslot user API Alexandru Elisei
2021-10-18  9:04   ` Suzuki K Poulose
2021-10-18 14:09     ` Alexandru Elisei
2021-08-25 16:17 ` [RFC PATCH v4 03/39] KVM: arm64: Implement the memslot lock/unlock functionality Alexandru Elisei
2021-08-25 16:17 ` [RFC PATCH v4 04/39] KVM: arm64: Defer CMOs for locked memslots until a VCPU is run Alexandru Elisei
2021-10-17 11:43   ` Marc Zyngier
2021-10-18 12:56     ` Alexandru Elisei
2021-08-25 16:17 ` [RFC PATCH v4 05/39] KVM: arm64: Perform CMOs on locked memslots when userspace resets VCPUs Alexandru Elisei
2021-08-25 16:17 ` [RFC PATCH v4 06/39] KVM: arm64: Delay tag scrubbing for locked memslots until a VCPU runs Alexandru Elisei
2021-08-25 16:17 ` [RFC PATCH v4 07/39] KVM: arm64: Unlock memslots after stage 2 tables are freed Alexandru Elisei
2021-08-25 16:17 ` [RFC PATCH v4 08/39] KVM: arm64: Deny changes to locked memslots Alexandru Elisei
2021-08-25 16:17 ` [RFC PATCH v4 09/39] KVM: Add kvm_warn{,_ratelimited} macros Alexandru Elisei
2021-08-25 16:17 ` [RFC PATCH v4 10/39] KVM: arm64: Print a warning for unexpected faults on locked memslots Alexandru Elisei
2021-08-25 16:17 ` [RFC PATCH v4 11/39] KVM: arm64: Allow userspace to lock and unlock memslots Alexandru Elisei
2021-08-25 16:17 ` [RFC PATCH v4 12/39] KVM: arm64: Add the KVM_ARM_VCPU_SUPPORTED_CPUS VCPU ioctl Alexandru Elisei
2021-08-25 16:17 ` [RFC PATCH v4 13/39] KVM: arm64: Add CONFIG_KVM_ARM_SPE Kconfig option Alexandru Elisei
2021-08-25 16:17 ` [RFC PATCH v4 14/39] KVM: arm64: Add SPE capability and VCPU feature Alexandru Elisei
2021-08-25 16:17 ` [RFC PATCH v4 15/39] drivers/perf: Expose the cpumask of CPUs that support SPE Alexandru Elisei
2021-08-25 16:17 ` [RFC PATCH v4 16/39] KVM: arm64: Make SPE available when at least one CPU supports it Alexandru Elisei
2021-08-25 16:17 ` [RFC PATCH v4 17/39] KVM: arm64: Set the VCPU SPE feature bit when SPE is available Alexandru Elisei
2021-08-25 16:17 ` [RFC PATCH v4 18/39] KVM: arm64: Expose SPE version to guests Alexandru Elisei
2021-08-25 16:17 ` [RFC PATCH v4 19/39] KVM: arm64: Do not emulate SPE on CPUs which don't have SPE Alexandru Elisei
2021-08-25 16:17 ` [RFC PATCH v4 20/39] KVM: arm64: Add a new VCPU device control group for SPE Alexandru Elisei
2021-08-25 16:17 ` [RFC PATCH v4 21/39] KVM: arm64: Add SPE VCPU device attribute to set the interrupt number Alexandru Elisei
2021-08-25 16:17 ` [RFC PATCH v4 22/39] KVM: arm64: Add SPE VCPU device attribute to initialize SPE Alexandru Elisei
2021-08-25 16:17 ` [RFC PATCH v4 23/39] KVM: arm64: VHE: Clear MDCR_EL2.E2PB in vcpu_put() Alexandru Elisei
2021-08-25 16:18 ` [RFC PATCH v4 24/39] KVM: arm64: debug: Configure MDCR_EL2 when a VCPU has SPE Alexandru Elisei
2021-08-25 16:18 ` [RFC PATCH v4 25/39] KVM: arm64: Move the write to MDCR_EL2 out of __activate_traps_common() Alexandru Elisei
2021-08-25 16:18 ` [RFC PATCH v4 26/39] KVM: arm64: VHE: Change MDCR_EL2 at world switch if VCPU has SPE Alexandru Elisei
2021-08-25 16:18 ` [RFC PATCH v4 27/39] KVM: arm64: Add SPE system registers to VCPU context Alexandru Elisei
2021-08-25 16:18 ` [RFC PATCH v4 28/39] KVM: arm64: nVHE: Save PMSCR_EL1 to the host context Alexandru Elisei
2021-08-25 16:18 ` [RFC PATCH v4 29/39] KVM: arm64: Rename DEBUG_STATE_SAVE_SPE -> DEBUG_SAVE_SPE_BUFFER flags Alexandru Elisei
2021-08-25 16:18 ` [RFC PATCH v4 30/39] KVM: arm64: nVHE: Context switch SPE state if VCPU has SPE Alexandru Elisei
2021-08-25 16:18 ` [RFC PATCH v4 31/39] KVM: arm64: VHE: " Alexandru Elisei
2021-08-25 16:18 ` [RFC PATCH v4 32/39] KVM: arm64: Save/restore PMSNEVFR_EL1 on VCPU put/load Alexandru Elisei
2021-08-25 16:18 ` [RFC PATCH v4 33/39] KVM: arm64: Allow guest to use physical timestamps if perfmon_capable() Alexandru Elisei
2021-08-25 16:18 ` [RFC PATCH v4 34/39] KVM: arm64: Emulate SPE buffer management interrupt Alexandru Elisei
2021-08-25 16:18 ` [RFC PATCH v4 35/39] KVM: arm64: Add an userspace API to stop a VCPU profiling Alexandru Elisei
2021-08-25 16:18 ` [RFC PATCH v4 36/39] KVM: arm64: Implement " Alexandru Elisei
2021-08-25 16:18 ` [RFC PATCH v4 37/39] KVM: arm64: Add PMSIDR_EL1 to the SPE register context Alexandru Elisei
2021-08-25 16:18 ` [RFC PATCH v4 38/39] KVM: arm64: Make CONFIG_KVM_ARM_SPE depend on !CONFIG_NUMA_BALANCING Alexandru Elisei
2021-08-25 16:18 ` [RFC PATCH v4 39/39] KVM: arm64: Allow userspace to enable SPE for guests Alexandru Elisei
2021-09-22 10:11 ` [RFC PATCH v4 00/39] KVM: arm64: Add Statistical Profiling Extension (SPE) support Suzuki K Poulose
2021-09-23 15:12   ` Alexandru Elisei
