* [PATCH 0/4] KVM: arm64: Improve PMU support on heterogeneous systems
@ 2021-11-15 16:50 ` Alexandru Elisei
  0 siblings, 0 replies; 26+ messages in thread
From: Alexandru Elisei @ 2021-11-15 16:50 UTC (permalink / raw)
  To: maz, james.morse, suzuki.poulose, will, mark.rutland,
	linux-arm-kernel, kvmarm

(CC'ing Peter Maydell in case this might be of interest to qemu)

The series can be found at [1], and the kvmtool support at [2].

At the moment, the experience of running a virtual machine with a PMU on a
heterogeneous system (where there are different PMUs) varies: it just works
if the VCPUs run only on the correct physical CPUs, it doesn't work at all
if the VCPUs run only on the incorrect physical CPUs, and something doesn't
look right if the VCPUs run some of the time on the correct physical CPUs
and some of the time on the incorrect ones. The
reason for this behaviour is that KVM creates perf events to emulate a
guest PMU, and the choice of PMU that is used to create these events is
left entirely up to the perf core system, based on the hardware probe
order. The first PMU to register with perf (via perf_pmu_register()) is the
PMU that will always be chosen when creating the events (in
perf_event_create_kernel_counter() -> perf_event_alloc() ->
perf_event_init()).
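The effect of that probe-order dependency can be sketched with a small
userspace simulation (not kernel code; the fake_* names and the simplified
list handling are made up for illustration): whichever PMU registers first
is the one every KVM-created event lands on.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of perf's PMU selection for kernel counters: registered
 * PMUs sit on a list in probe order, and event creation walks the list
 * from the head, so the first PMU to register always wins. */
struct fake_pmu {
	const char *name;
	struct fake_pmu *next;
};

static struct fake_pmu *pmu_list;

/* Stand-in for perf_pmu_register(): later registrations go behind
 * earlier ones. */
static void fake_pmu_register(struct fake_pmu *pmu)
{
	struct fake_pmu **p = &pmu_list;

	while (*p)
		p = &(*p)->next;
	pmu->next = NULL;
	*p = pmu;
}

/* Stand-in for the perf_init_event() outcome described above: every
 * event is created on the head of the list, i.e. the PMU that was
 * registered first, regardless of which CPU the event targets. */
static struct fake_pmu *fake_init_event(void)
{
	return pmu_list;
}
```

Flipping the registration order flips which PMU all events end up on, which
is exactly why the DTB node order or asynchronous probing changes the
observed behaviour.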

Let's take the example of a rockpro64 board: CPUs 0-3 are Cortex-A53 (the
little cores), CPUs 4-5 are Cortex-A72 (the big cores), and each group has
its own PMU. When running the pmu-cycle-counter test from kvm-unit-tests
on the little cores, everything works as expected:

$ taskset -c 0-3 ./vm run -c1 -m64 --nodefaults -k arm/pmu.flat -p "cycle-counter 0" --pmu
[..]
PASS: pmu: cycle-counter: Monotonically increasing cycle count
[..]
PASS: pmu: cycle-counter: Cycle/instruction ratio
SUMMARY: 2 tests

But when running the same test on the big cores:

$ taskset -c 4-5 ./vm run -c1 -m64 --nodefaults -k arm/pmu.flat -p "cycle-counter 0" --pmu
[..]
FAIL: pmu: cycle-counter: Monotonically increasing cycle count
[..]
FAIL: pmu: cycle-counter: Cycle/instruction ratio
SUMMARY: 2 tests, 2 unexpected failures

The same behaviour is exhibited when running under qemu.

The test passes on the little cores in that particular setup because the
little cores are the "correct" cores: the PMU that perf chooses to create
the events on is the PMU associated with the little cores. The test fails
on the big cores because the events cannot be scheduled in, as the PMU is
associated with a different set of cores (merge_sched_in() exits early
because event_filter_match() returns false).
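The filtering that makes the big-core run fail can be modelled with another
small sketch (again userspace-only; the fake_* names and the bitmask
representation of the cpumask are illustrative assumptions): an event
carries the set of CPUs its PMU supports, and scheduling it on any other
CPU is silently rejected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of the event_filter_match() check: an event created on a
 * given PMU only counts on the CPUs in that PMU's supported mask (bit n
 * set means CPU n can run the event). On any other CPU the event is
 * never scheduled in, so the guest counter simply stops moving. */
struct fake_event {
	uint64_t supported_cpus;
};

static bool fake_event_filter_match(const struct fake_event *event, int cpu)
{
	return event->supported_cpus & (1ULL << cpu);
}
```

With the little-core PMU's mask (CPUs 0-3), the event matches on CPU 2 but
not on CPU 4, mirroring the PASS/FAIL split in the test runs above.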

It gets even more unpredictable, as the order in which the PMUs are probed
during boot dictates which PMU is chosen for creating the events, and the
probe order can change if, for example, the order of the PMU nodes in the
DTB changes, or if the kernel is booted with asynchronous driver probing
for the armv8-pmu driver. A user can end up in a situation where pinning
the VM on a set of CPUs works just fine, and after a reboot doesn't work
anymore, without any kind of explanation or hints of why it stopped
working.

All of this is not ideal from the user's perspective, and this series aims
to improve that by adding a new PMU attribute which can be used to tell KVM
exactly on which PMU the events for a VCPU should be created. The contract
is that the user is still responsible for pinning the VCPUs to the
corresponding CPUs, and KVM will refuse to run a VCPU on a CPU with a
different PMU.
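From userspace, setting the attribute boils down to one KVM_SET_DEVICE_ATTR
ioctl per VCPU. A minimal sketch of building that request (the attribute
constants are taken from the arm64 uapi header and this series; the struct
mirrors struct kvm_device_attr from <linux/kvm.h>; the ioctl itself is only
shown in the comment, since issuing it needs /dev/kvm on an arm64 host):

```c
#include <assert.h>
#include <stdint.h>

/* From arch/arm64/include/uapi/asm/kvm.h and this series. */
#define KVM_ARM_VCPU_PMU_V3_CTRL	0
#define KVM_ARM_VCPU_PMU_V3_SET_PMU	3

/* Mirrors struct kvm_device_attr from <linux/kvm.h>. */
struct kvm_device_attr {
	uint32_t flags;
	uint32_t group;
	uint64_t attr;
	uint64_t addr;	/* userspace pointer to an int holding the PMU id */
};

/* Build the attribute that asks KVM to create this VCPU's events on the
 * PMU whose id *pmu_type holds; the id comes from the "type" file of
 * the desired PMU under /sys/bus/event_source/devices/. The caller
 * would then do: ioctl(vcpu_fd, KVM_SET_DEVICE_ATTR, &attr); */
static struct kvm_device_attr make_set_pmu_attr(const int *pmu_type)
{
	struct kvm_device_attr attr = {
		.group = KVM_ARM_VCPU_PMU_V3_CTRL,
		.attr  = KVM_ARM_VCPU_PMU_V3_SET_PMU,
		.addr  = (uint64_t)(uintptr_t)pmu_type,
	};
	return attr;
}
```

This has to happen before KVM_ARM_VCPU_PMU_V3_INIT, and pinning the VCPU
thread to the matching cores remains the caller's job.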

With this series on top of kvmtool support for KVM_ARM_VCPU_PMU_V3_SET_PMU
attribute [2], running the same test as above on the little cores, then on
the big cores:

$ taskset -c 0-3 ./vm run -c1 -m64 --nodefaults -k arm/pmu.flat -p "cycle-counter 0" --pmu
[..]
PASS: pmu: cycle-counter: Monotonically increasing cycle count
[..]
PASS: pmu: cycle-counter: Cycle/instruction ratio
SUMMARY: 2 tests

$ taskset -c 4-5 ./vm run -c1 -m64 --nodefaults -k arm/pmu.flat -p "cycle-counter 0" --pmu
[..]
PASS: pmu: cycle-counter: Monotonically increasing cycle count
[..]
PASS: pmu: cycle-counter: Cycle/instruction ratio
SUMMARY: 2 tests

We get a saner behaviour, which is reproducible across reboots, regardless
of the probe order.

And this is what happens if the VCPU is run on a physical CPU with a
different PMU than the one that was set by userspace:

$ taskset -c 3-4 ./vm run -c1 -m64 --nodefaults -k arm/pmu.flat -p "cycle-counter 0" --pmu
KVM_RUN failed: Exec format error

kvmtool sets the PMU for all VCPUs from the main thread; the main thread
runs on a little core (CPU3), but the VCPU thread is scheduled on a big
core (CPU4); the VCPU's PMU doesn't match the physical CPU's PMU, and KVM
returns -ENOEXEC from KVM_RUN.

[1] https://gitlab.arm.com/linux-arm/linux-ae/-/tree/pmu-big-little-fix-v1
[2] https://gitlab.arm.com/linux-arm/kvmtool-ae/-/tree/pmu-big-little-fix-v1

Alexandru Elisei (4):
  perf: Fix wrong name in comment for struct perf_cpu_context
  KVM: arm64: Keep a list of probed PMUs
  KVM: arm64: Add KVM_ARM_VCPU_PMU_V3_SET_PMU attribute
  KVM: arm64: Refuse to run VCPU if the PMU doesn't match the physical
    CPU

 Documentation/virt/kvm/api.rst          |  5 ++-
 Documentation/virt/kvm/devices/vcpu.rst | 26 +++++++++++
 arch/arm64/include/asm/kvm_host.h       |  3 ++
 arch/arm64/include/uapi/asm/kvm.h       |  1 +
 arch/arm64/kvm/arm.c                    | 15 +++++++
 arch/arm64/kvm/pmu-emul.c               | 58 +++++++++++++++++++++++--
 include/kvm/arm_pmu.h                   |  6 +++
 include/linux/perf_event.h              |  2 +-
 tools/arch/arm64/include/uapi/asm/kvm.h |  1 +
 9 files changed, 110 insertions(+), 7 deletions(-)

-- 
2.33.1

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

* [PATCH 1/4] perf: Fix wrong name in comment for struct perf_cpu_context
  2021-11-15 16:50 ` Alexandru Elisei
@ 2021-11-15 16:50   ` Alexandru Elisei
  -1 siblings, 0 replies; 26+ messages in thread
From: Alexandru Elisei @ 2021-11-15 16:50 UTC (permalink / raw)
  To: maz, james.morse, suzuki.poulose, will, mark.rutland,
	linux-arm-kernel, kvmarm
  Cc: Thomas Gleixner, Ingo Molnar

Commit 0793a61d4df8 ("performance counters: core code") added the perf
subsystem (then called Performance Counters) to Linux, creating the struct
perf_cpu_context. The comment for the struct referred to it as a "struct
perf_counter_cpu_context".

Commit cdd6c482c9ff ("perf: Do the big rename: Performance Counters ->
Performance Events") changed the comment to refer to a "struct
perf_event_cpu_context", which was still the wrong name for the struct.

Change the comment to say "struct perf_cpu_context".

CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 include/linux/perf_event.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 0dcfd265beed..14132570ea5d 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -862,7 +862,7 @@ struct perf_event_context {
 #define PERF_NR_CONTEXTS	4
 
 /**
- * struct perf_event_cpu_context - per cpu event context structure
+ * struct perf_cpu_context - per cpu event context structure
  */
 struct perf_cpu_context {
 	struct perf_event_context	ctx;
-- 
2.33.1

* [PATCH 2/4] KVM: arm64: Keep a list of probed PMUs
  2021-11-15 16:50 ` Alexandru Elisei
@ 2021-11-15 16:50   ` Alexandru Elisei
  -1 siblings, 0 replies; 26+ messages in thread
From: Alexandru Elisei @ 2021-11-15 16:50 UTC (permalink / raw)
  To: maz, james.morse, suzuki.poulose, will, mark.rutland,
	linux-arm-kernel, kvmarm

The ARM PMU driver calls kvm_host_pmu_init() after probing to tell KVM that
a hardware PMU is available for guest emulation. Heterogeneous systems can
have more than one PMU present, and the callback gets called multiple
times, once for each of them. Keep track of all the PMUs available to KVM,
as we're going to need them later.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 arch/arm64/kvm/pmu-emul.c | 26 ++++++++++++++++++++++++--
 include/kvm/arm_pmu.h     |  5 +++++
 2 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
index a5e4bbf5e68f..dab335d17409 100644
--- a/arch/arm64/kvm/pmu-emul.c
+++ b/arch/arm64/kvm/pmu-emul.c
@@ -7,6 +7,7 @@
 #include <linux/cpu.h>
 #include <linux/kvm.h>
 #include <linux/kvm_host.h>
+#include <linux/list.h>
 #include <linux/perf_event.h>
 #include <linux/perf/arm_pmu.h>
 #include <linux/uaccess.h>
@@ -14,6 +15,9 @@
 #include <kvm/arm_pmu.h>
 #include <kvm/arm_vgic.h>
 
+static LIST_HEAD(arm_pmus);
+static DEFINE_MUTEX(arm_pmus_lock);
+
 static void kvm_pmu_create_perf_event(struct kvm_vcpu *vcpu, u64 select_idx);
 static void kvm_pmu_update_pmc_chained(struct kvm_vcpu *vcpu, u64 select_idx);
 static void kvm_pmu_stop_counter(struct kvm_vcpu *vcpu, struct kvm_pmc *pmc);
@@ -742,9 +746,27 @@ void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu, u64 data,
 
 void kvm_host_pmu_init(struct arm_pmu *pmu)
 {
-	if (pmu->pmuver != 0 && pmu->pmuver != ID_AA64DFR0_PMUVER_IMP_DEF &&
-	    !kvm_arm_support_pmu_v3() && !is_protected_kvm_enabled())
+	struct arm_pmu_entry *entry;
+
+	if (pmu->pmuver == 0 || pmu->pmuver == ID_AA64DFR0_PMUVER_IMP_DEF ||
+	    is_protected_kvm_enabled())
+		return;
+
+	/* Protect list changes against asynchronous driver probing. */
+	mutex_lock(&arm_pmus_lock);
+
+	entry = kmalloc(sizeof(*entry), GFP_KERNEL);
+	if (!entry)
+		goto out_unlock;
+
+	if (list_empty(&arm_pmus))
 		static_branch_enable(&kvm_arm_pmu_available);
+
+	entry->arm_pmu = pmu;
+	list_add_tail(&entry->entry, &arm_pmus);
+
+out_unlock:
+	mutex_unlock(&arm_pmus_lock);
 }
 
 static int kvm_pmu_probe_pmuver(void)
diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h
index 90f21898aad8..e249c5f172aa 100644
--- a/include/kvm/arm_pmu.h
+++ b/include/kvm/arm_pmu.h
@@ -36,6 +36,11 @@ struct kvm_pmu {
 	struct irq_work overflow_work;
 };
 
+struct arm_pmu_entry {
+	struct list_head entry;
+	struct arm_pmu *arm_pmu;
+};
+
 #define kvm_arm_pmu_irq_initialized(v)	((v)->arch.pmu.irq_num >= VGIC_NR_SGIS)
 u64 kvm_pmu_get_counter_value(struct kvm_vcpu *vcpu, u64 select_idx);
 void kvm_pmu_set_counter_value(struct kvm_vcpu *vcpu, u64 select_idx, u64 val);
-- 
2.33.1

* [PATCH 3/4] KVM: arm64: Add KVM_ARM_VCPU_PMU_V3_SET_PMU attribute
  2021-11-15 16:50 ` Alexandru Elisei
@ 2021-11-15 16:50   ` Alexandru Elisei
  -1 siblings, 0 replies; 26+ messages in thread
From: Alexandru Elisei @ 2021-11-15 16:50 UTC (permalink / raw)
  To: maz, james.morse, suzuki.poulose, will, mark.rutland,
	linux-arm-kernel, kvmarm

When KVM creates an event and there are more than one PMUs present on the
system, perf_init_event() will go through the list of available PMUs and
will choose the first one that can create the event. The order of the PMUs
in the PMU list depends on the probe order, which can change under various
circumstances, for example if the order of the PMU nodes change in the DTB
or if asynchronous driver probing is enabled on the kernel command line
(with the driver_async_probe=armv8-pmu option).

Another consequence of this approach is that, on heterogeneous systems,
all virtual machines that KVM creates will use the same PMU. This might
cause unexpected behaviour for userspace: when a VCPU is executing on
the physical CPU that uses this PMU, PMU events in the guest work
correctly; but when the same VCPU executes on another CPU, PMU events in
the guest will suddenly stop counting.

Fortunately, the perf core allows the user to specify on which PMU to create an
event by using the perf_event_attr->type field, which is used by
perf_init_event() as an index in the radix tree of available PMUs.

Add the KVM_ARM_VCPU_PMU_V3_SET_PMU VCPU attribute (in the
KVM_ARM_VCPU_PMU_V3_CTRL group) to allow userspace to specify the arm_pmu
that KVM will use when
creating events for that VCPU. KVM will make no attempt to run the VCPU on
the physical CPUs that share this PMU, leaving it up to userspace to
manage the VCPU threads' affinity accordingly.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 Documentation/virt/kvm/devices/vcpu.rst | 25 ++++++++++++++++++++
 arch/arm64/include/uapi/asm/kvm.h       |  1 +
 arch/arm64/kvm/pmu-emul.c               | 31 +++++++++++++++++++++++--
 include/kvm/arm_pmu.h                   |  1 +
 tools/arch/arm64/include/uapi/asm/kvm.h |  1 +
 5 files changed, 57 insertions(+), 2 deletions(-)

diff --git a/Documentation/virt/kvm/devices/vcpu.rst b/Documentation/virt/kvm/devices/vcpu.rst
index 60a29972d3f1..59ac382af59a 100644
--- a/Documentation/virt/kvm/devices/vcpu.rst
+++ b/Documentation/virt/kvm/devices/vcpu.rst
@@ -104,6 +104,31 @@ hardware event. Filtering event 0x1E (CHAIN) has no effect either, as it
 isn't strictly speaking an event. Filtering the cycle counter is possible
 using event 0x11 (CPU_CYCLES).
 
+1.4 ATTRIBUTE: KVM_ARM_VCPU_PMU_V3_SET_PMU
+------------------------------------------
+
+:Parameters: in kvm_device_attr.addr the address to an int representing the PMU
+             identifier.
+
+:Returns:
+
+	 =======  ===============================================
+	 -EBUSY   PMUv3 already initialized
+	 -EFAULT  Error accessing the PMU identifier
+	 -EINVAL  PMU not found or PMU name longer than PAGE_SIZE
+	 -ENODEV  PMUv3 not supported or GIC not initialized
+	 -ENOMEM  Could not allocate memory
+	 =======  ===============================================
+
+Request that the VCPU uses the specified hardware PMU when creating guest events
+for the purpose of PMU emulation. The PMU identifier can be read from the "type"
+file for the desired PMU instance under /sys/devices (or, equivalently,
+/sys/bus/event_source). This attribute is particularly useful on heterogeneous
+systems where there are at least two PMUs on the system.
+
+Note that KVM will not make any attempts to run the VCPU on the physical CPUs
+associated with the PMU specified by this attribute. This is entirely left to
+userspace.
 
 2. GROUP: KVM_ARM_VCPU_TIMER_CTRL
 =================================
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index b3edde68bc3e..1d0a0a2a9711 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -362,6 +362,7 @@ struct kvm_arm_copy_mte_tags {
 #define   KVM_ARM_VCPU_PMU_V3_IRQ	0
 #define   KVM_ARM_VCPU_PMU_V3_INIT	1
 #define   KVM_ARM_VCPU_PMU_V3_FILTER	2
+#define   KVM_ARM_VCPU_PMU_V3_SET_PMU	3
 #define KVM_ARM_VCPU_TIMER_CTRL		1
 #define   KVM_ARM_VCPU_TIMER_IRQ_VTIMER		0
 #define   KVM_ARM_VCPU_TIMER_IRQ_PTIMER		1
diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
index dab335d17409..53cedeb5dbf6 100644
--- a/arch/arm64/kvm/pmu-emul.c
+++ b/arch/arm64/kvm/pmu-emul.c
@@ -602,6 +602,7 @@ static bool kvm_pmu_counter_is_enabled(struct kvm_vcpu *vcpu, u64 select_idx)
 static void kvm_pmu_create_perf_event(struct kvm_vcpu *vcpu, u64 select_idx)
 {
 	struct kvm_pmu *pmu = &vcpu->arch.pmu;
+	struct arm_pmu *arm_pmu = pmu->arm_pmu;
 	struct kvm_pmc *pmc;
 	struct perf_event *event;
 	struct perf_event_attr attr;
@@ -637,8 +638,7 @@ static void kvm_pmu_create_perf_event(struct kvm_vcpu *vcpu, u64 select_idx)
 		return;
 
 	memset(&attr, 0, sizeof(struct perf_event_attr));
-	attr.type = PERF_TYPE_RAW;
-	attr.size = sizeof(attr);
+	attr.type = arm_pmu ? arm_pmu->pmu.type : PERF_TYPE_RAW;
 	attr.pinned = 1;
 	attr.disabled = !kvm_pmu_counter_is_enabled(vcpu, pmc->idx);
 	attr.exclude_user = data & ARMV8_PMU_EXCLUDE_EL0 ? 1 : 0;
@@ -941,6 +941,23 @@ static bool pmu_irq_is_valid(struct kvm *kvm, int irq)
 	return true;
 }
 
+static int kvm_arm_pmu_v3_set_pmu(struct kvm_vcpu *vcpu, int pmu_id)
+{
+	struct kvm_pmu *kvm_pmu = &vcpu->arch.pmu;
+	struct arm_pmu_entry *entry;
+	struct arm_pmu *arm_pmu;
+
+	list_for_each_entry(entry, &arm_pmus, entry) {
+		arm_pmu = entry->arm_pmu;
+		if (arm_pmu->pmu.type == pmu_id) {
+			kvm_pmu->arm_pmu = arm_pmu;
+			return 0;
+		}
+	}
+
+	return -ENXIO;
+}
+
 int kvm_arm_pmu_v3_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
 {
 	if (!kvm_vcpu_has_pmu(vcpu))
@@ -1027,6 +1044,15 @@ int kvm_arm_pmu_v3_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
 
 		return 0;
 	}
+	case KVM_ARM_VCPU_PMU_V3_SET_PMU: {
+		int __user *uaddr = (int __user *)(long)attr->addr;
+		int pmu_id;
+
+		if (get_user(pmu_id, uaddr))
+			return -EFAULT;
+
+		return kvm_arm_pmu_v3_set_pmu(vcpu, pmu_id);
+	}
 	case KVM_ARM_VCPU_PMU_V3_INIT:
 		return kvm_arm_pmu_v3_init(vcpu);
 	}
@@ -1064,6 +1090,7 @@ int kvm_arm_pmu_v3_has_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
 	case KVM_ARM_VCPU_PMU_V3_IRQ:
 	case KVM_ARM_VCPU_PMU_V3_INIT:
 	case KVM_ARM_VCPU_PMU_V3_FILTER:
+	case KVM_ARM_VCPU_PMU_V3_SET_PMU:
 		if (kvm_vcpu_has_pmu(vcpu))
 			return 0;
 	}
diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h
index e249c5f172aa..ab3046a8f9bb 100644
--- a/include/kvm/arm_pmu.h
+++ b/include/kvm/arm_pmu.h
@@ -34,6 +34,7 @@ struct kvm_pmu {
 	bool created;
 	bool irq_level;
 	struct irq_work overflow_work;
+	struct arm_pmu *arm_pmu;
 };
 
 struct arm_pmu_entry {
diff --git a/tools/arch/arm64/include/uapi/asm/kvm.h b/tools/arch/arm64/include/uapi/asm/kvm.h
index b3edde68bc3e..1d0a0a2a9711 100644
--- a/tools/arch/arm64/include/uapi/asm/kvm.h
+++ b/tools/arch/arm64/include/uapi/asm/kvm.h
@@ -362,6 +362,7 @@ struct kvm_arm_copy_mte_tags {
 #define   KVM_ARM_VCPU_PMU_V3_IRQ	0
 #define   KVM_ARM_VCPU_PMU_V3_INIT	1
 #define   KVM_ARM_VCPU_PMU_V3_FILTER	2
+#define   KVM_ARM_VCPU_PMU_V3_SET_PMU	3
 #define KVM_ARM_VCPU_TIMER_CTRL		1
 #define   KVM_ARM_VCPU_TIMER_IRQ_VTIMER		0
 #define   KVM_ARM_VCPU_TIMER_IRQ_PTIMER		1
-- 
2.33.1

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 3/4] KVM: arm64: Add KVM_ARM_VCPU_PMU_V3_SET_PMU attribute
@ 2021-11-15 16:50   ` Alexandru Elisei
  0 siblings, 0 replies; 26+ messages in thread
From: Alexandru Elisei @ 2021-11-15 16:50 UTC (permalink / raw)
  To: maz, james.morse, suzuki.poulose, will, mark.rutland,
	linux-arm-kernel, kvmarm
  Cc: peter.maydell

When KVM creates an event and there are more than one PMUs present on the
system, perf_init_event() will go through the list of available PMUs and
will choose the first one that can create the event. The order of the PMUs
in the PMU list depends on the probe order, which can change under various
circumstances, for example if the order of the PMU nodes change in the DTB
or if asynchronous driver probing is enabled on the kernel command line
(with the driver_async_probe=armv8-pmu option).

Another consequence of this approach is that, on heteregeneous systems,
all virtual machines that KVM creates will use the same PMU. This might
cause unexpected behaviour for userspace: when a VCPU is executing on
the physical CPU that uses this PMU, PMU events in the guest work
correctly; but when the same VCPU executes on another CPU, PMU events in
the guest will suddenly stop counting.

Fortunately, perf core allows user to specify on which PMU to create an
event by using the perf_event_attr->type field, which is used by
perf_init_event() as an index in the radix tree of available PMUs.

Add the KVM_ARM_VCPU_PMU_V3_CTRL(KVM_ARM_VCPU_PMU_V3_SET_PMU) VCPU
attribute to allow userspace to specify the arm_pmu that KVM will use when
creating events for that VCPU. KVM will make no attempt to run the VCPU on
the physical CPUs that share this PMU, leaving it up to userspace to
manage the VCPU threads' affinity accordingly.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 Documentation/virt/kvm/devices/vcpu.rst | 25 ++++++++++++++++++++
 arch/arm64/include/uapi/asm/kvm.h       |  1 +
 arch/arm64/kvm/pmu-emul.c               | 31 +++++++++++++++++++++++--
 include/kvm/arm_pmu.h                   |  1 +
 tools/arch/arm64/include/uapi/asm/kvm.h |  1 +
 5 files changed, 57 insertions(+), 2 deletions(-)

diff --git a/Documentation/virt/kvm/devices/vcpu.rst b/Documentation/virt/kvm/devices/vcpu.rst
index 60a29972d3f1..59ac382af59a 100644
--- a/Documentation/virt/kvm/devices/vcpu.rst
+++ b/Documentation/virt/kvm/devices/vcpu.rst
@@ -104,6 +104,31 @@ hardware event. Filtering event 0x1E (CHAIN) has no effect either, as it
 isn't strictly speaking an event. Filtering the cycle counter is possible
 using event 0x11 (CPU_CYCLES).
 
+1.4 ATTRIBUTE: KVM_ARM_VCPU_PMU_V3_SET_PMU
+------------------------------------------
+
+:Parameters: in kvm_device_attr.addr the address to an int representing the PMU
+             identifier.
+
+:Returns:
+
+	 =======  ===============================================
+	 -EBUSY   PMUv3 already initialized
+	 -EFAULT  Error accessing the PMU identifier
+	 -EINVAL  PMU not found or PMU name longer than PAGE_SIZE
+	 -ENODEV  PMUv3 not supported or GIC not initialized
+	 -ENOMEM  Could not allocate memory
+	 =======  ===============================================
+
+Request that the VCPU use the specified hardware PMU when creating guest events
+for the purpose of PMU emulation. The PMU identifier can be read from the "type"
+file for the desired PMU instance under /sys/devices (or, equivalently,
+/sys/bus/event_source). This attribute is particularly useful on heterogeneous
+systems where there are at least two PMUs on the system.
+
+Note that KVM will not make any attempts to run the VCPU on the physical CPUs
+associated with the PMU specified by this attribute. This is entirely left to
+userspace.
 
 2. GROUP: KVM_ARM_VCPU_TIMER_CTRL
 =================================
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index b3edde68bc3e..1d0a0a2a9711 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -362,6 +362,7 @@ struct kvm_arm_copy_mte_tags {
 #define   KVM_ARM_VCPU_PMU_V3_IRQ	0
 #define   KVM_ARM_VCPU_PMU_V3_INIT	1
 #define   KVM_ARM_VCPU_PMU_V3_FILTER	2
+#define   KVM_ARM_VCPU_PMU_V3_SET_PMU	3
 #define KVM_ARM_VCPU_TIMER_CTRL		1
 #define   KVM_ARM_VCPU_TIMER_IRQ_VTIMER		0
 #define   KVM_ARM_VCPU_TIMER_IRQ_PTIMER		1
diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
index dab335d17409..53cedeb5dbf6 100644
--- a/arch/arm64/kvm/pmu-emul.c
+++ b/arch/arm64/kvm/pmu-emul.c
@@ -602,6 +602,7 @@ static bool kvm_pmu_counter_is_enabled(struct kvm_vcpu *vcpu, u64 select_idx)
 static void kvm_pmu_create_perf_event(struct kvm_vcpu *vcpu, u64 select_idx)
 {
 	struct kvm_pmu *pmu = &vcpu->arch.pmu;
+	struct arm_pmu *arm_pmu = pmu->arm_pmu;
 	struct kvm_pmc *pmc;
 	struct perf_event *event;
 	struct perf_event_attr attr;
@@ -637,8 +638,7 @@ static void kvm_pmu_create_perf_event(struct kvm_vcpu *vcpu, u64 select_idx)
 		return;
 
 	memset(&attr, 0, sizeof(struct perf_event_attr));
-	attr.type = PERF_TYPE_RAW;
-	attr.size = sizeof(attr);
+	attr.type = arm_pmu ? arm_pmu->pmu.type : PERF_TYPE_RAW;
 	attr.pinned = 1;
 	attr.disabled = !kvm_pmu_counter_is_enabled(vcpu, pmc->idx);
 	attr.exclude_user = data & ARMV8_PMU_EXCLUDE_EL0 ? 1 : 0;
@@ -941,6 +941,23 @@ static bool pmu_irq_is_valid(struct kvm *kvm, int irq)
 	return true;
 }
 
+static int kvm_arm_pmu_v3_set_pmu(struct kvm_vcpu *vcpu, int pmu_id)
+{
+	struct kvm_pmu *kvm_pmu = &vcpu->arch.pmu;
+	struct arm_pmu_entry *entry;
+	struct arm_pmu *arm_pmu;
+
+	list_for_each_entry(entry, &arm_pmus, entry) {
+		arm_pmu = entry->arm_pmu;
+		if (arm_pmu->pmu.type == pmu_id) {
+			kvm_pmu->arm_pmu = arm_pmu;
+			return 0;
+		}
+	}
+
+	return -ENXIO;
+}
+
 int kvm_arm_pmu_v3_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
 {
 	if (!kvm_vcpu_has_pmu(vcpu))
@@ -1027,6 +1044,15 @@ int kvm_arm_pmu_v3_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
 
 		return 0;
 	}
+	case KVM_ARM_VCPU_PMU_V3_SET_PMU: {
+		int __user *uaddr = (int __user *)(long)attr->addr;
+		int pmu_id;
+
+		if (get_user(pmu_id, uaddr))
+			return -EFAULT;
+
+		return kvm_arm_pmu_v3_set_pmu(vcpu, pmu_id);
+	}
 	case KVM_ARM_VCPU_PMU_V3_INIT:
 		return kvm_arm_pmu_v3_init(vcpu);
 	}
@@ -1064,6 +1090,7 @@ int kvm_arm_pmu_v3_has_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
 	case KVM_ARM_VCPU_PMU_V3_IRQ:
 	case KVM_ARM_VCPU_PMU_V3_INIT:
 	case KVM_ARM_VCPU_PMU_V3_FILTER:
+	case KVM_ARM_VCPU_PMU_V3_SET_PMU:
 		if (kvm_vcpu_has_pmu(vcpu))
 			return 0;
 	}
diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h
index e249c5f172aa..ab3046a8f9bb 100644
--- a/include/kvm/arm_pmu.h
+++ b/include/kvm/arm_pmu.h
@@ -34,6 +34,7 @@ struct kvm_pmu {
 	bool created;
 	bool irq_level;
 	struct irq_work overflow_work;
+	struct arm_pmu *arm_pmu;
 };
 
 struct arm_pmu_entry {
diff --git a/tools/arch/arm64/include/uapi/asm/kvm.h b/tools/arch/arm64/include/uapi/asm/kvm.h
index b3edde68bc3e..1d0a0a2a9711 100644
--- a/tools/arch/arm64/include/uapi/asm/kvm.h
+++ b/tools/arch/arm64/include/uapi/asm/kvm.h
@@ -362,6 +362,7 @@ struct kvm_arm_copy_mte_tags {
 #define   KVM_ARM_VCPU_PMU_V3_IRQ	0
 #define   KVM_ARM_VCPU_PMU_V3_INIT	1
 #define   KVM_ARM_VCPU_PMU_V3_FILTER	2
+#define   KVM_ARM_VCPU_PMU_V3_SET_PMU	3
 #define KVM_ARM_VCPU_TIMER_CTRL		1
 #define   KVM_ARM_VCPU_TIMER_IRQ_VTIMER		0
 #define   KVM_ARM_VCPU_TIMER_IRQ_PTIMER		1
-- 
2.33.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 4/4] KVM: arm64: Refuse to run VCPU if the PMU doesn't match the physical CPU
  2021-11-15 16:50 ` Alexandru Elisei
@ 2021-11-15 16:50   ` Alexandru Elisei
  -1 siblings, 0 replies; 26+ messages in thread
From: Alexandru Elisei @ 2021-11-15 16:50 UTC (permalink / raw)
  To: maz, james.morse, suzuki.poulose, will, mark.rutland,
	linux-arm-kernel, kvmarm

Userspace can assign a PMU to a VCPU with the KVM_ARM_VCPU_PMU_V3_SET_PMU
device ioctl. If the VCPU is scheduled on a physical CPU which has a
different PMU, the perf events needed to emulate a guest PMU won't be
scheduled in and the guest performance counters will stop counting. Treat
this as a userspace error and refuse to run the VCPU in this situation.

The VCPU is flagged as being scheduled on the wrong CPU in vcpu_load(), but
the flag is cleared when KVM_RUN enters the non-preemptible section rather
than in vcpu_put(). This is done on purpose so that the error condition is
communicated to userspace as soon as possible; otherwise, a vcpu_load() on
the wrong CPU followed by a vcpu_put() could clear the flag.

Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 Documentation/virt/kvm/api.rst          |  5 +++--
 Documentation/virt/kvm/devices/vcpu.rst |  3 ++-
 arch/arm64/include/asm/kvm_host.h       |  3 +++
 arch/arm64/kvm/arm.c                    | 15 +++++++++++++++
 arch/arm64/kvm/pmu-emul.c               |  1 +
 5 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index aeeb071c7688..5bbad8318ea5 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -396,8 +396,9 @@ Errors:
 
   =======    ==============================================================
   EINTR      an unmasked signal is pending
-  ENOEXEC    the vcpu hasn't been initialized or the guest tried to execute
-             instructions from device memory (arm64)
+  ENOEXEC    the vcpu hasn't been initialized, the guest tried to execute
+             instructions from device memory (arm64) or the vcpu PMU is
+             different from the physical cpu PMU (arm64).
   ENOSYS     data abort outside memslots with no syndrome info and
              KVM_CAP_ARM_NISV_TO_USER not enabled (arm64)
   EPERM      SVE feature set but not finalized (arm64)
diff --git a/Documentation/virt/kvm/devices/vcpu.rst b/Documentation/virt/kvm/devices/vcpu.rst
index 59ac382af59a..ca0da34da889 100644
--- a/Documentation/virt/kvm/devices/vcpu.rst
+++ b/Documentation/virt/kvm/devices/vcpu.rst
@@ -128,7 +128,8 @@ systems where there are at least two PMUs on the system.
 
 Note that KVM will not make any attempts to run the VCPU on the physical CPUs
 associated with the PMU specified by this attribute. This is entirely left to
-userspace.
+userspace. However, if the VCPU is scheduled on a CPU which has a different PMU,
+then KVM_RUN will return with the error code ENOEXEC.
 
 2. GROUP: KVM_ARM_VCPU_TIMER_CTRL
 =================================
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 2a5f7f38006f..ae2083b41d8a 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -385,6 +385,9 @@ struct kvm_vcpu_arch {
 		u64 last_steal;
 		gpa_t base;
 	} steal;
+
+	cpumask_var_t supported_cpus;
+	bool cpu_not_supported;
 };
 
 /* Pointer to the vcpu's SVE FFR for sve_{save,load}_state() */
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 2f03cbfefe67..5dbfd18c4e37 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -320,6 +320,9 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
 
 	vcpu->arch.mmu_page_cache.gfp_zero = __GFP_ZERO;
 
+	if (!zalloc_cpumask_var(&vcpu->arch.supported_cpus, GFP_KERNEL))
+		return -ENOMEM;
+
 	/* Set up the timer */
 	kvm_timer_vcpu_init(vcpu);
 
@@ -347,6 +350,7 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
 	if (vcpu->arch.has_run_once && unlikely(!irqchip_in_kernel(vcpu->kvm)))
 		static_branch_dec(&userspace_irqchip_in_use);
 
+	free_cpumask_var(vcpu->arch.supported_cpus);
 	kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_cache);
 	kvm_timer_vcpu_terminate(vcpu);
 	kvm_pmu_vcpu_destroy(vcpu);
@@ -425,6 +429,10 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	if (vcpu_has_ptrauth(vcpu))
 		vcpu_ptrauth_disable(vcpu);
 	kvm_arch_vcpu_load_debug_state_flags(vcpu);
+
+	if (!cpumask_empty(vcpu->arch.supported_cpus) &&
+	    !cpumask_test_cpu(smp_processor_id(), vcpu->arch.supported_cpus))
+		vcpu->arch.cpu_not_supported = true;
 }
 
 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
@@ -815,6 +823,13 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 		 */
 		preempt_disable();
 
+		if (unlikely(vcpu->arch.cpu_not_supported)) {
+			vcpu->arch.cpu_not_supported = false;
+			ret = -ENOEXEC;
+			preempt_enable();
+			continue;
+		}
+
 		kvm_pmu_flush_hwstate(vcpu);
 
 		local_irq_disable();
diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
index 53cedeb5dbf6..957a6d0cfa56 100644
--- a/arch/arm64/kvm/pmu-emul.c
+++ b/arch/arm64/kvm/pmu-emul.c
@@ -951,6 +951,7 @@ static int kvm_arm_pmu_v3_set_pmu(struct kvm_vcpu *vcpu, int pmu_id)
 		arm_pmu = entry->arm_pmu;
 		if (arm_pmu->pmu.type == pmu_id) {
 			kvm_pmu->arm_pmu = arm_pmu;
+			cpumask_copy(vcpu->arch.supported_cpus, &arm_pmu->supported_cpus);
 			return 0;
 		}
 	}
-- 
2.33.1

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/4] KVM: arm64: Add KVM_ARM_VCPU_PMU_V3_SET_PMU attribute
  2021-11-15 16:50   ` Alexandru Elisei
@ 2021-11-21 19:11     ` Marc Zyngier
  -1 siblings, 0 replies; 26+ messages in thread
From: Marc Zyngier @ 2021-11-21 19:11 UTC (permalink / raw)
  To: Alexandru Elisei; +Cc: will, kvmarm, linux-arm-kernel

On Mon, 15 Nov 2021 16:50:40 +0000,
Alexandru Elisei <alexandru.elisei@arm.com> wrote:
> 
> When KVM creates an event and there are more than one PMUs present on the
> system, perf_init_event() will go through the list of available PMUs and
> will choose the first one that can create the event. The order of the PMUs
> in the PMU list depends on the probe order, which can change under various
> circumstances, for example if the order of the PMU nodes change in the DTB
> or if asynchronous driver probing is enabled on the kernel command line
> (with the driver_async_probe=armv8-pmu option).
> 
> Another consequence of this approach is that, on heteregeneous systems,
> all virtual machines that KVM creates will use the same PMU. This might
> cause unexpected behaviour for userspace: when a VCPU is executing on
> the physical CPU that uses this PMU, PMU events in the guest work
> correctly; but when the same VCPU executes on another CPU, PMU events in
> the guest will suddenly stop counting.
> 
> Fortunately, perf core allows user to specify on which PMU to create an
> event by using the perf_event_attr->type field, which is used by
> perf_init_event() as an index in the radix tree of available PMUs.
> 
> Add the KVM_ARM_VCPU_PMU_V3_CTRL(KVM_ARM_VCPU_PMU_V3_SET_PMU) VCPU
> attribute to allow userspace to specify the arm_pmu that KVM will use when
> creating events for that VCPU. KVM will make no attempt to run the VCPU on
> the physical CPUs that share this PMU, leaving it up to userspace to
> manage the VCPU threads' affinity accordingly.
> 
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> ---
>  Documentation/virt/kvm/devices/vcpu.rst | 25 ++++++++++++++++++++
>  arch/arm64/include/uapi/asm/kvm.h       |  1 +
>  arch/arm64/kvm/pmu-emul.c               | 31 +++++++++++++++++++++++--
>  include/kvm/arm_pmu.h                   |  1 +
>  tools/arch/arm64/include/uapi/asm/kvm.h |  1 +
>  5 files changed, 57 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/virt/kvm/devices/vcpu.rst b/Documentation/virt/kvm/devices/vcpu.rst
> index 60a29972d3f1..59ac382af59a 100644
> --- a/Documentation/virt/kvm/devices/vcpu.rst
> +++ b/Documentation/virt/kvm/devices/vcpu.rst
> @@ -104,6 +104,31 @@ hardware event. Filtering event 0x1E (CHAIN) has no effect either, as it
>  isn't strictly speaking an event. Filtering the cycle counter is possible
>  using event 0x11 (CPU_CYCLES).
>  
> +1.4 ATTRIBUTE: KVM_ARM_VCPU_PMU_V3_SET_PMU
> +------------------------------------------
> +
> +:Parameters: in kvm_device_attr.addr the address to an int representing the PMU
> +             identifier.
> +
> +:Returns:
> +
> +	 =======  ===============================================
> +	 -EBUSY   PMUv3 already initialized
> +	 -EFAULT  Error accessing the PMU identifier
> +	 -EINVAL  PMU not found or PMU name longer than PAGE_SIZE
> +	 -ENODEV  PMUv3 not supported or GIC not initialized
> +	 -ENOMEM  Could not allocate memory
> +	 =======  ===============================================
> +
> +Request that the VCPU uses the specified hardware PMU when creating guest events
> +for the purpose of PMU emulation. The PMU identifier can be read from the "type"
> +file for the desired PMU instance under /sys/devices (or, equivalent,
> +/sys/bus/even_source). This attribute is particularly useful on heterogeneous
> +systems where there are at least two PMUs on the system.

nit: CPU PMUs. A number of systems have 'uncore' PMUs which KVM
totally ignores.

> +
> +Note that KVM will not make any attempts to run the VCPU on the physical CPUs
> +associated with the PMU specified by this attribute. This is entirely left to
> +userspace.
>  
>  2. GROUP: KVM_ARM_VCPU_TIMER_CTRL
>  =================================
> diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
> index b3edde68bc3e..1d0a0a2a9711 100644
> --- a/arch/arm64/include/uapi/asm/kvm.h
> +++ b/arch/arm64/include/uapi/asm/kvm.h
> @@ -362,6 +362,7 @@ struct kvm_arm_copy_mte_tags {
>  #define   KVM_ARM_VCPU_PMU_V3_IRQ	0
>  #define   KVM_ARM_VCPU_PMU_V3_INIT	1
>  #define   KVM_ARM_VCPU_PMU_V3_FILTER	2
> +#define   KVM_ARM_VCPU_PMU_V3_SET_PMU	3
>  #define KVM_ARM_VCPU_TIMER_CTRL		1
>  #define   KVM_ARM_VCPU_TIMER_IRQ_VTIMER		0
>  #define   KVM_ARM_VCPU_TIMER_IRQ_PTIMER		1
> diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
> index dab335d17409..53cedeb5dbf6 100644
> --- a/arch/arm64/kvm/pmu-emul.c
> +++ b/arch/arm64/kvm/pmu-emul.c
> @@ -602,6 +602,7 @@ static bool kvm_pmu_counter_is_enabled(struct kvm_vcpu *vcpu, u64 select_idx)
>  static void kvm_pmu_create_perf_event(struct kvm_vcpu *vcpu, u64 select_idx)
>  {
>  	struct kvm_pmu *pmu = &vcpu->arch.pmu;
> +	struct arm_pmu *arm_pmu = pmu->arm_pmu;
>  	struct kvm_pmc *pmc;
>  	struct perf_event *event;
>  	struct perf_event_attr attr;
> @@ -637,8 +638,7 @@ static void kvm_pmu_create_perf_event(struct kvm_vcpu *vcpu, u64 select_idx)
>  		return;
>  
>  	memset(&attr, 0, sizeof(struct perf_event_attr));
> -	attr.type = PERF_TYPE_RAW;
> -	attr.size = sizeof(attr);
> +	attr.type = arm_pmu ? arm_pmu->pmu.type : PERF_TYPE_RAW;
>  	attr.pinned = 1;
>  	attr.disabled = !kvm_pmu_counter_is_enabled(vcpu, pmc->idx);
>  	attr.exclude_user = data & ARMV8_PMU_EXCLUDE_EL0 ? 1 : 0;
> @@ -941,6 +941,23 @@ static bool pmu_irq_is_valid(struct kvm *kvm, int irq)
>  	return true;
>  }
>  
> +static int kvm_arm_pmu_v3_set_pmu(struct kvm_vcpu *vcpu, int pmu_id)
> +{
> +	struct kvm_pmu *kvm_pmu = &vcpu->arch.pmu;
> +	struct arm_pmu_entry *entry;
> +	struct arm_pmu *arm_pmu;
> +
> +	list_for_each_entry(entry, &arm_pmus, entry) {
> +		arm_pmu = entry->arm_pmu;
> +		if (arm_pmu->pmu.type == pmu_id) {
> +			kvm_pmu->arm_pmu = arm_pmu;
> +			return 0;
> +		}
> +	}

How does this work when a new CPU gets hotplugged on, bringing a new
PMU type along? It doesn't seem safe to parse this list without any
locking.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 4/4] KVM: arm64: Refuse to run VCPU if the PMU doesn't match the physical CPU
  2021-11-15 16:50   ` Alexandru Elisei
@ 2021-11-21 19:35     ` Marc Zyngier
  -1 siblings, 0 replies; 26+ messages in thread
From: Marc Zyngier @ 2021-11-21 19:35 UTC (permalink / raw)
  To: Alexandru Elisei; +Cc: will, kvmarm, linux-arm-kernel

On Mon, 15 Nov 2021 16:50:41 +0000,
Alexandru Elisei <alexandru.elisei@arm.com> wrote:
> 
> Userspace can assign a PMU to a VCPU with the KVM_ARM_VCPU_PMU_V3_SET_PMU
> device ioctl. If the VCPU is scheduled on a physical CPU which has a
> different PMU, the perf events needed to emulate a guest PMU won't be
> scheduled in and the guest performance counters will stop counting. Treat
> it as an userspace error and refuse to run the VCPU in this situation.
> 
> The VCPU is flagged as being scheduled on the wrong CPU in vcpu_load(), but
> the flag is cleared when the KVM_RUN enters the non-preemptible section
> instead of in vcpu_put(); this has been done on purpose so the error
> condition is communicated as soon as possible to userspace, otherwise
> vcpu_load() on the wrong CPU followed by a vcpu_put() could clear the flag.

Can we make this something orthogonal to the PMU, and get userspace to
pick an affinity mask independently of instantiating a PMU? I can
imagine this would also be useful for SPE on asymmetric systems.

> Suggested-by: Marc Zyngier <maz@kernel.org>
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> ---
>  Documentation/virt/kvm/api.rst          |  5 +++--
>  Documentation/virt/kvm/devices/vcpu.rst |  3 ++-
>  arch/arm64/include/asm/kvm_host.h       |  3 +++
>  arch/arm64/kvm/arm.c                    | 15 +++++++++++++++
>  arch/arm64/kvm/pmu-emul.c               |  1 +
>  5 files changed, 24 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index aeeb071c7688..5bbad8318ea5 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -396,8 +396,9 @@ Errors:
>  
>    =======    ==============================================================
>    EINTR      an unmasked signal is pending
> -  ENOEXEC    the vcpu hasn't been initialized or the guest tried to execute
> -             instructions from device memory (arm64)
> +  ENOEXEC    the vcpu hasn't been initialized, the guest tried to execute
> +             instructions from device memory (arm64) or the vcpu PMU is
> +             different from the physical cpu PMU (arm64).
>    ENOSYS     data abort outside memslots with no syndrome info and
>               KVM_CAP_ARM_NISV_TO_USER not enabled (arm64)
>    EPERM      SVE feature set but not finalized (arm64)
> diff --git a/Documentation/virt/kvm/devices/vcpu.rst b/Documentation/virt/kvm/devices/vcpu.rst
> index 59ac382af59a..ca0da34da889 100644
> --- a/Documentation/virt/kvm/devices/vcpu.rst
> +++ b/Documentation/virt/kvm/devices/vcpu.rst
> @@ -128,7 +128,8 @@ systems where there are at least two PMUs on the system.
>  
>  Note that KVM will not make any attempts to run the VCPU on the physical CPUs
>  associated with the PMU specified by this attribute. This is entirely left to
> -userspace.
> +userspace. However, if the VCPU is scheduled on a CPU which has a different PMU,
> +then KVM_RUN will return with the error code ENOEXEC.
>  
>  2. GROUP: KVM_ARM_VCPU_TIMER_CTRL
>  =================================
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 2a5f7f38006f..ae2083b41d8a 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -385,6 +385,9 @@ struct kvm_vcpu_arch {
>  		u64 last_steal;
>  		gpa_t base;
>  	} steal;
> +
> +	cpumask_var_t supported_cpus;
> +	bool cpu_not_supported;

Can this just be made a vcpu flag instead?

>  };
>  
>  /* Pointer to the vcpu's SVE FFR for sve_{save,load}_state() */
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 2f03cbfefe67..5dbfd18c4e37 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -320,6 +320,9 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
>  
>  	vcpu->arch.mmu_page_cache.gfp_zero = __GFP_ZERO;
>  
> +	if (!zalloc_cpumask_var(&vcpu->arch.supported_cpus, GFP_KERNEL))
> +		return -ENOMEM;
> +
>  	/* Set up the timer */
>  	kvm_timer_vcpu_init(vcpu);
>  
> @@ -347,6 +350,7 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
>  	if (vcpu->arch.has_run_once && unlikely(!irqchip_in_kernel(vcpu->kvm)))
>  		static_branch_dec(&userspace_irqchip_in_use);
>  
> +	free_cpumask_var(vcpu->arch.supported_cpus);
>  	kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_cache);
>  	kvm_timer_vcpu_terminate(vcpu);
>  	kvm_pmu_vcpu_destroy(vcpu);
> @@ -425,6 +429,10 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>  	if (vcpu_has_ptrauth(vcpu))
>  		vcpu_ptrauth_disable(vcpu);
>  	kvm_arch_vcpu_load_debug_state_flags(vcpu);
> +
> +	if (!cpumask_empty(vcpu->arch.supported_cpus) &&

How about initialising the cpumask to cpu_possible_mask, avoiding the
cpumask_empty check?

> +	    !cpumask_test_cpu(smp_processor_id(), vcpu->arch.supported_cpus))
> +		vcpu->arch.cpu_not_supported = true;

I have the feeling this would actually better be implemented as a
request, but there may be some surgery required for this.

>  }
>  
>  void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
> @@ -815,6 +823,13 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
>  		 */
>  		preempt_disable();
>  
> +		if (unlikely(vcpu->arch.cpu_not_supported)) {
> +			vcpu->arch.cpu_not_supported = false;
> +			ret = -ENOEXEC;
> +			preempt_enable();

How about populating run->fail_entry with some information? I bet this
would be useful, if only as a debugging tool.

> +			continue;
> +		}
> +
>  		kvm_pmu_flush_hwstate(vcpu);
>  
>  		local_irq_disable();
> diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
> index 53cedeb5dbf6..957a6d0cfa56 100644
> --- a/arch/arm64/kvm/pmu-emul.c
> +++ b/arch/arm64/kvm/pmu-emul.c
> @@ -951,6 +951,7 @@ static int kvm_arm_pmu_v3_set_pmu(struct kvm_vcpu *vcpu, int pmu_id)
>  		arm_pmu = entry->arm_pmu;
>  		if (arm_pmu->pmu.type == pmu_id) {
>  			kvm_pmu->arm_pmu = arm_pmu;
> +			cpumask_copy(vcpu->arch.supported_cpus, &arm_pmu->supported_cpus);
>  			return 0;
>  		}
>  	}

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/4] KVM: arm64: Add KVM_ARM_VCPU_PMU_V3_SET_PMU attribute
  2021-11-21 19:11     ` Marc Zyngier
@ 2021-11-22 11:29       ` Alexandru Elisei
  -1 siblings, 0 replies; 26+ messages in thread
From: Alexandru Elisei @ 2021-11-22 11:29 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: will, kvmarm, linux-arm-kernel

Hi Marc,

Thanks for having a look!

On Sun, Nov 21, 2021 at 07:11:30PM +0000, Marc Zyngier wrote:
> On Mon, 15 Nov 2021 16:50:40 +0000,
> Alexandru Elisei <alexandru.elisei@arm.com> wrote:
> > 
> > When KVM creates an event and there are more than one PMUs present on the
> > system, perf_init_event() will go through the list of available PMUs and
> > will choose the first one that can create the event. The order of the PMUs
> > in the PMU list depends on the probe order, which can change under various
> > circumstances, for example if the order of the PMU nodes changes in the DTB
> > or if asynchronous driver probing is enabled on the kernel command line
> > (with the driver_async_probe=armv8-pmu option).
> > 
> > Another consequence of this approach is that, on heterogeneous systems,
> > all virtual machines that KVM creates will use the same PMU. This might
> > cause unexpected behaviour for userspace: when a VCPU is executing on
> > the physical CPU that uses this PMU, PMU events in the guest work
> > correctly; but when the same VCPU executes on another CPU, PMU events in
> > the guest will suddenly stop counting.
> > 
> > Fortunately, perf core allows the user to specify on which PMU to create an
> > event by using the perf_event_attr->type field, which is used by
> > perf_init_event() as an index in the radix tree of available PMUs.
> > 
> > Add the KVM_ARM_VCPU_PMU_V3_CTRL(KVM_ARM_VCPU_PMU_V3_SET_PMU) VCPU
> > attribute to allow userspace to specify the arm_pmu that KVM will use when
> > creating events for that VCPU. KVM will make no attempt to run the VCPU on
> > the physical CPUs that share this PMU, leaving it up to userspace to
> > manage the VCPU threads' affinity accordingly.
> > 
> > Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> > ---
> >  Documentation/virt/kvm/devices/vcpu.rst | 25 ++++++++++++++++++++
> >  arch/arm64/include/uapi/asm/kvm.h       |  1 +
> >  arch/arm64/kvm/pmu-emul.c               | 31 +++++++++++++++++++++++--
> >  include/kvm/arm_pmu.h                   |  1 +
> >  tools/arch/arm64/include/uapi/asm/kvm.h |  1 +
> >  5 files changed, 57 insertions(+), 2 deletions(-)
> > 
> > diff --git a/Documentation/virt/kvm/devices/vcpu.rst b/Documentation/virt/kvm/devices/vcpu.rst
> > index 60a29972d3f1..59ac382af59a 100644
> > --- a/Documentation/virt/kvm/devices/vcpu.rst
> > +++ b/Documentation/virt/kvm/devices/vcpu.rst
> > @@ -104,6 +104,31 @@ hardware event. Filtering event 0x1E (CHAIN) has no effect either, as it
> >  isn't strictly speaking an event. Filtering the cycle counter is possible
> >  using event 0x11 (CPU_CYCLES).
> >  
> > +1.4 ATTRIBUTE: KVM_ARM_VCPU_PMU_V3_SET_PMU
> > +------------------------------------------
> > +
> > +:Parameters: in kvm_device_attr.addr the address to an int representing the PMU
> > +             identifier.
> > +
> > +:Returns:
> > +
> > +	 =======  ===============================================
> > +	 -EBUSY   PMUv3 already initialized
> > +	 -EFAULT  Error accessing the PMU identifier
> > +	 -EINVAL  PMU not found or PMU name longer than PAGE_SIZE
> > +	 -ENODEV  PMUv3 not supported or GIC not initialized
> > +	 -ENOMEM  Could not allocate memory
> > +	 =======  ===============================================
> > +
> > +Request that the VCPU uses the specified hardware PMU when creating guest events
> > +for the purpose of PMU emulation. The PMU identifier can be read from the "type"
> > +file for the desired PMU instance under /sys/devices (or, equivalently,
> > +/sys/bus/event_source). This attribute is particularly useful on heterogeneous
> > +systems where there are at least two PMUs on the system.
> 
> nit: CPU PMUs. A number of systems have 'uncore' PMUs which KVM
> totally ignores.

Sure, will change.

> 
> > +
> > +Note that KVM will not make any attempts to run the VCPU on the physical CPUs
> > +associated with the PMU specified by this attribute. This is entirely left to
> > +userspace.
> >  
> >  2. GROUP: KVM_ARM_VCPU_TIMER_CTRL
> >  =================================
> > diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
> > index b3edde68bc3e..1d0a0a2a9711 100644
> > --- a/arch/arm64/include/uapi/asm/kvm.h
> > +++ b/arch/arm64/include/uapi/asm/kvm.h
> > @@ -362,6 +362,7 @@ struct kvm_arm_copy_mte_tags {
> >  #define   KVM_ARM_VCPU_PMU_V3_IRQ	0
> >  #define   KVM_ARM_VCPU_PMU_V3_INIT	1
> >  #define   KVM_ARM_VCPU_PMU_V3_FILTER	2
> > +#define   KVM_ARM_VCPU_PMU_V3_SET_PMU	3
> >  #define KVM_ARM_VCPU_TIMER_CTRL		1
> >  #define   KVM_ARM_VCPU_TIMER_IRQ_VTIMER		0
> >  #define   KVM_ARM_VCPU_TIMER_IRQ_PTIMER		1
> > diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
> > index dab335d17409..53cedeb5dbf6 100644
> > --- a/arch/arm64/kvm/pmu-emul.c
> > +++ b/arch/arm64/kvm/pmu-emul.c
> > @@ -602,6 +602,7 @@ static bool kvm_pmu_counter_is_enabled(struct kvm_vcpu *vcpu, u64 select_idx)
> >  static void kvm_pmu_create_perf_event(struct kvm_vcpu *vcpu, u64 select_idx)
> >  {
> >  	struct kvm_pmu *pmu = &vcpu->arch.pmu;
> > +	struct arm_pmu *arm_pmu = pmu->arm_pmu;
> >  	struct kvm_pmc *pmc;
> >  	struct perf_event *event;
> >  	struct perf_event_attr attr;
> > @@ -637,8 +638,7 @@ static void kvm_pmu_create_perf_event(struct kvm_vcpu *vcpu, u64 select_idx)
> >  		return;
> >  
> >  	memset(&attr, 0, sizeof(struct perf_event_attr));
> > -	attr.type = PERF_TYPE_RAW;
> > -	attr.size = sizeof(attr);
> > +	attr.type = arm_pmu ? arm_pmu->pmu.type : PERF_TYPE_RAW;
> >  	attr.pinned = 1;
> >  	attr.disabled = !kvm_pmu_counter_is_enabled(vcpu, pmc->idx);
> >  	attr.exclude_user = data & ARMV8_PMU_EXCLUDE_EL0 ? 1 : 0;
> > @@ -941,6 +941,23 @@ static bool pmu_irq_is_valid(struct kvm *kvm, int irq)
> >  	return true;
> >  }
> >  
> > +static int kvm_arm_pmu_v3_set_pmu(struct kvm_vcpu *vcpu, int pmu_id)
> > +{
> > +	struct kvm_pmu *kvm_pmu = &vcpu->arch.pmu;
> > +	struct arm_pmu_entry *entry;
> > +	struct arm_pmu *arm_pmu;
> > +
> > +	list_for_each_entry(entry, &arm_pmus, entry) {
> > +		arm_pmu = entry->arm_pmu;
> > +		if (arm_pmu->pmu.type == pmu_id) {
> > +			kvm_pmu->arm_pmu = arm_pmu;
> > +			return 0;
> > +		}
> > +	}
> 
> How does this work when a new CPU gets hotplugged on, bringing a new
> PMU type along? It doesn't seem safe to parse this list without any
> locking.

It wouldn't work at all. I missed the fact that hot-plugging a CPU means
writing to the list of PMUs after the initial driver probing, which happens
at boot. This needs to be protected against concurrent writes; I will fix
it.
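
One way the fix could look (a sketch only, not from the series — the lock
name is an assumption, and the merged code may well structure this
differently, for example by reusing an existing mutex):

```c
/* Sketch: serialise walkers of the global arm_pmus list against
 * hotplug-time additions. arm_pmus_lock is a hypothetical name. */
static DEFINE_MUTEX(arm_pmus_lock);

static int kvm_arm_pmu_v3_set_pmu(struct kvm_vcpu *vcpu, int pmu_id)
{
	struct kvm_pmu *kvm_pmu = &vcpu->arch.pmu;
	struct arm_pmu_entry *entry;
	int ret = -EINVAL;	/* PMU not found, per the documented errors */

	mutex_lock(&arm_pmus_lock);
	list_for_each_entry(entry, &arm_pmus, entry) {
		struct arm_pmu *arm_pmu = entry->arm_pmu;

		if (arm_pmu->pmu.type == pmu_id) {
			kvm_pmu->arm_pmu = arm_pmu;
			ret = 0;
			break;
		}
	}
	mutex_unlock(&arm_pmus_lock);

	return ret;
}
```

The hotplug path that appends to arm_pmus would take the same mutex around
its list_add_tail().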

> 
> Thanks,
> 
> 	M.
> 
> -- 
> Without deviation from the norm, progress is not possible.
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 4/4] KVM: arm64: Refuse to run VCPU if the PMU doesn't match the physical CPU
  2021-11-21 19:35     ` Marc Zyngier
@ 2021-11-22 12:12       ` Alexandru Elisei
  -1 siblings, 0 replies; 26+ messages in thread
From: Alexandru Elisei @ 2021-11-22 12:12 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: will, kvmarm, linux-arm-kernel

Hi Marc,

On Sun, Nov 21, 2021 at 07:35:13PM +0000, Marc Zyngier wrote:
> On Mon, 15 Nov 2021 16:50:41 +0000,
> Alexandru Elisei <alexandru.elisei@arm.com> wrote:
> > 
> > Userspace can assign a PMU to a VCPU with the KVM_ARM_VCPU_PMU_V3_SET_PMU
> > device ioctl. If the VCPU is scheduled on a physical CPU which has a
> > different PMU, the perf events needed to emulate a guest PMU won't be
> > scheduled in and the guest performance counters will stop counting. Treat
> > it as a userspace error and refuse to run the VCPU in this situation.
> > 
> > The VCPU is flagged as being scheduled on the wrong CPU in vcpu_load(), but
> > the flag is cleared when the KVM_RUN ioctl enters the non-preemptible section
> > instead of in vcpu_put(); this has been done on purpose so the error
> > condition is communicated as soon as possible to userspace, otherwise
> > vcpu_load() on the wrong CPU followed by a vcpu_put() could clear the flag.
> 
> Can we make this something orthogonal to the PMU, and get userspace to
> pick an affinity mask independently of instantiating a PMU? I can
> imagine this would also be useful for SPE on asymmetric systems.

I actually went this way for the latest version of the SPE series [1] and
dropped the explicit userspace ioctl in favor of this mechanism.

The expectation is that userspace already knows which CPUs are associated
with the chosen PMU (or SPE) when setting the PMU for the VCPU, and having
userspace set it explicitly via an ioctl looks like an unnecessary step to
me. I don't see other use cases for an explicit ioctl outside of the above
two situations (if userspace wants a VCPU to run only on specific CPUs, it
can use thread affinity for that), so I decided to drop it.

For reference, this is what the ioctl looked like in the previous version of
the SPE series [2], in case you want to see how it would look.

[1] https://www.spinics.net/lists/arm-kernel/msg934211.html
[2] https://www.spinics.net/lists/arm-kernel/msg917229.html
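
For readers reconstructing the flow from this thread, the userspace side
being described — look up a CPU PMU's perf "type" id in sysfs, then hand it
to the VCPU via the new attribute — might be sketched as below. This is not
code from the series: the helper names are invented, the attribute value is
taken from the quoted patch, and error handling is minimal.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Values from the quoted arm64 uapi patch; guarded in case the
 * installed headers predate this series. */
#ifndef KVM_ARM_VCPU_PMU_V3_CTRL
#define KVM_ARM_VCPU_PMU_V3_CTRL	0
#endif
#ifndef KVM_ARM_VCPU_PMU_V3_SET_PMU
#define KVM_ARM_VCPU_PMU_V3_SET_PMU	3
#endif

/* Parse a perf "type" id from a sysfs-style file; returns -1 on error. */
int read_type_file(const char *path)
{
	FILE *f = fopen(path, "r");
	int type = -1;

	if (!f)
		return -1;
	if (fscanf(f, "%d", &type) != 1)
		type = -1;
	fclose(f);
	return type;
}

/* Look up the type id of a named CPU PMU instance; the instance names
 * under /sys/bus/event_source/devices vary by platform. */
int read_pmu_type(const char *pmu_name)
{
	char path[256];

	snprintf(path, sizeof(path),
		 "/sys/bus/event_source/devices/%s/type", pmu_name);
	return read_type_file(path);
}

/* Assign that PMU to a VCPU; vcpu_fd comes from KVM_CREATE_VCPU. */
int vcpu_set_pmu(int vcpu_fd, int pmu_type)
{
	struct kvm_device_attr attr = {
		.group = KVM_ARM_VCPU_PMU_V3_CTRL,
		.attr  = KVM_ARM_VCPU_PMU_V3_SET_PMU,
		.addr  = (unsigned long)&pmu_type,
	};

	return ioctl(vcpu_fd, KVM_SET_DEVICE_ATTR, &attr);
}
```

The VMM would call vcpu_set_pmu() before KVM_ARM_VCPU_PMU_V3_INIT, and
separately pin the VCPU thread to the CPUs backing the chosen PMU, since KVM
leaves affinity entirely to userspace.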

> 
> > Suggested-by: Marc Zyngier <maz@kernel.org>
> > Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> > ---
> >  Documentation/virt/kvm/api.rst          |  5 +++--
> >  Documentation/virt/kvm/devices/vcpu.rst |  3 ++-
> >  arch/arm64/include/asm/kvm_host.h       |  3 +++
> >  arch/arm64/kvm/arm.c                    | 15 +++++++++++++++
> >  arch/arm64/kvm/pmu-emul.c               |  1 +
> >  5 files changed, 24 insertions(+), 3 deletions(-)
> > 
> > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > index aeeb071c7688..5bbad8318ea5 100644
> > --- a/Documentation/virt/kvm/api.rst
> > +++ b/Documentation/virt/kvm/api.rst
> > @@ -396,8 +396,9 @@ Errors:
> >  
> >    =======    ==============================================================
> >    EINTR      an unmasked signal is pending
> > -  ENOEXEC    the vcpu hasn't been initialized or the guest tried to execute
> > -             instructions from device memory (arm64)
> > +  ENOEXEC    the vcpu hasn't been initialized, the guest tried to execute
> > +             instructions from device memory (arm64) or the vcpu PMU is
> > +             different from the physical cpu PMU (arm64).
> >    ENOSYS     data abort outside memslots with no syndrome info and
> >               KVM_CAP_ARM_NISV_TO_USER not enabled (arm64)
> >    EPERM      SVE feature set but not finalized (arm64)
> > diff --git a/Documentation/virt/kvm/devices/vcpu.rst b/Documentation/virt/kvm/devices/vcpu.rst
> > index 59ac382af59a..ca0da34da889 100644
> > --- a/Documentation/virt/kvm/devices/vcpu.rst
> > +++ b/Documentation/virt/kvm/devices/vcpu.rst
> > @@ -128,7 +128,8 @@ systems where there are at least two PMUs on the system.
> >  
> >  Note that KVM will not make any attempts to run the VCPU on the physical CPUs
> >  associated with the PMU specified by this attribute. This is entirely left to
> > -userspace.
> > +userspace. However, if the VCPU is scheduled on a CPU which has a different PMU,
> > +then KVM_RUN will return with the error code ENOEXEC.
> >  
> >  2. GROUP: KVM_ARM_VCPU_TIMER_CTRL
> >  =================================
> > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > index 2a5f7f38006f..ae2083b41d8a 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -385,6 +385,9 @@ struct kvm_vcpu_arch {
> >  		u64 last_steal;
> >  		gpa_t base;
> >  	} steal;
> > +
> > +	cpumask_var_t supported_cpus;
> > +	bool cpu_not_supported;
> 
> Can this just be made a vcpu flag instead?

Sure, that can also work.

> 
> >  };
> >  
> >  /* Pointer to the vcpu's SVE FFR for sve_{save,load}_state() */
> > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> > index 2f03cbfefe67..5dbfd18c4e37 100644
> > --- a/arch/arm64/kvm/arm.c
> > +++ b/arch/arm64/kvm/arm.c
> > @@ -320,6 +320,9 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
> >  
> >  	vcpu->arch.mmu_page_cache.gfp_zero = __GFP_ZERO;
> >  
> > +	if (!zalloc_cpumask_var(&vcpu->arch.supported_cpus, GFP_KERNEL))
> > +		return -ENOMEM;
> > +
> >  	/* Set up the timer */
> >  	kvm_timer_vcpu_init(vcpu);
> >  
> > @@ -347,6 +350,7 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
> >  	if (vcpu->arch.has_run_once && unlikely(!irqchip_in_kernel(vcpu->kvm)))
> >  		static_branch_dec(&userspace_irqchip_in_use);
> >  
> > +	free_cpumask_var(vcpu->arch.supported_cpus);
> >  	kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_cache);
> >  	kvm_timer_vcpu_terminate(vcpu);
> >  	kvm_pmu_vcpu_destroy(vcpu);
> > @@ -425,6 +429,10 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
> >  	if (vcpu_has_ptrauth(vcpu))
> >  		vcpu_ptrauth_disable(vcpu);
> >  	kvm_arch_vcpu_load_debug_state_flags(vcpu);
> > +
> > +	if (!cpumask_empty(vcpu->arch.supported_cpus) &&
> 
> How about initialising the cpumask to cpu_possible_mask, avoiding the
> cpumask_empty check?

That's a great idea, I will change it.

> 
> > +	    !cpumask_test_cpu(smp_processor_id(), vcpu->arch.supported_cpus))
> > +		vcpu->arch.cpu_not_supported = true;
> 
> I have the feeling this would actually better be implemented as a
> request, but there may be some surgery required for this.

I can give it a go, see how it looks.

> 
> >  }
> >  
> >  void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
> > @@ -815,6 +823,13 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
> >  		 */
> >  		preempt_disable();
> >  
> > +		if (unlikely(vcpu->arch.cpu_not_supported)) {
> > +			vcpu->arch.cpu_not_supported = false;
> > +			ret = -ENOEXEC;
> > +			preempt_enable();
> 
> How about populating run->fail_entry with some information? I bet this
> would be useful, if only as a debugging tool.

That's another great idea, will definitely add it.

Thanks,
Alex

> 
> > +			continue;
> > +		}
> > +
> >  		kvm_pmu_flush_hwstate(vcpu);
> >  
> >  		local_irq_disable();
> > diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
> > index 53cedeb5dbf6..957a6d0cfa56 100644
> > --- a/arch/arm64/kvm/pmu-emul.c
> > +++ b/arch/arm64/kvm/pmu-emul.c
> > @@ -951,6 +951,7 @@ static int kvm_arm_pmu_v3_set_pmu(struct kvm_vcpu *vcpu, int pmu_id)
> >  		arm_pmu = entry->arm_pmu;
> >  		if (arm_pmu->pmu.type == pmu_id) {
> >  			kvm_pmu->arm_pmu = arm_pmu;
> > +			cpumask_copy(vcpu->arch.supported_cpus, &arm_pmu->supported_cpus);
> >  			return 0;
> >  		}
> >  	}
> 
> Thanks,
> 
> 	M.
> 
> -- 
> Without deviation from the norm, progress is not possible.
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

* Re: [PATCH 4/4] KVM: arm64: Refuse to run VCPU if the PMU doesn't match the physical CPU
  2021-11-22 12:12       ` Alexandru Elisei
@ 2021-11-22 14:21         ` Marc Zyngier
  -1 siblings, 0 replies; 26+ messages in thread
From: Marc Zyngier @ 2021-11-22 14:21 UTC (permalink / raw)
  To: Alexandru Elisei; +Cc: will, kvmarm, linux-arm-kernel

On Mon, 22 Nov 2021 12:12:17 +0000,
Alexandru Elisei <alexandru.elisei@arm.com> wrote:
> 
> Hi Marc,
> 
> On Sun, Nov 21, 2021 at 07:35:13PM +0000, Marc Zyngier wrote:
> > On Mon, 15 Nov 2021 16:50:41 +0000,
> > Alexandru Elisei <alexandru.elisei@arm.com> wrote:
> > > 
> > > Userspace can assign a PMU to a VCPU with the KVM_ARM_VCPU_PMU_V3_SET_PMU
> > > device ioctl. If the VCPU is scheduled on a physical CPU which has a
> > > different PMU, the perf events needed to emulate a guest PMU won't be
> > > scheduled in and the guest performance counters will stop counting. Treat
> > > it as an userspace error and refuse to run the VCPU in this situation.
> > > 
> > > The VCPU is flagged as being scheduled on the wrong CPU in vcpu_load(), but
> > > the flag is cleared when the KVM_RUN enters the non-preemptible section
> > > instead of in vcpu_put(); this has been done on purpose so the error
> > > condition is communicated as soon as possible to userspace, otherwise
> > > vcpu_load() on the wrong CPU followed by a vcpu_put() could clear the flag.
> > 
> > Can we make this something orthogonal to the PMU, and get userspace to
> > pick an affinity mask independently of instantiating a PMU? I can
> > imagine this would also be useful for SPE on asymmetric systems.
> 
> I actually went this way for the latest version of the SPE series [1] and
> dropped the explicit userspace ioctl in favor of this mechanism.
> 
> The expectation is that userspace already knows which CPUs are associated
> with the chosen PMU (or SPE) when setting the PMU for the VCPU, and having
> userspace set it explicitely via an ioctl looks like an unnecessary step to
> me. I don't see other usecases of an explicit ioctl outside of the above
> two situation (if userspace wants a VCPU to run only on specific CPUs, it
> can use thread affinity for that), so I decided to drop it.

My problem with that is that if you have (for whatever reason) a set
of affinities that are not strictly identical for both PMU and SPE,
and expose both of these to a guest, what do you choose?

As long as you have a single affinity set to take care of, you're
good. It is when you have several ones that it becomes ugly (as with
anything involving asymmetric CPUs).

	M.

-- 
Without deviation from the norm, progress is not possible.

* Re: [PATCH 4/4] KVM: arm64: Refuse to run VCPU if the PMU doesn't match the physical CPU
  2021-11-22 14:21         ` Marc Zyngier
@ 2021-11-22 14:43           ` Alexandru Elisei
  -1 siblings, 0 replies; 26+ messages in thread
From: Alexandru Elisei @ 2021-11-22 14:43 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: will, kvmarm, linux-arm-kernel

Hi Marc,

On Mon, Nov 22, 2021 at 02:21:00PM +0000, Marc Zyngier wrote:
> On Mon, 22 Nov 2021 12:12:17 +0000,
> Alexandru Elisei <alexandru.elisei@arm.com> wrote:
> > 
> > Hi Marc,
> > 
> > On Sun, Nov 21, 2021 at 07:35:13PM +0000, Marc Zyngier wrote:
> > > On Mon, 15 Nov 2021 16:50:41 +0000,
> > > Alexandru Elisei <alexandru.elisei@arm.com> wrote:
> > > > 
> > > > Userspace can assign a PMU to a VCPU with the KVM_ARM_VCPU_PMU_V3_SET_PMU
> > > > device ioctl. If the VCPU is scheduled on a physical CPU which has a
> > > > different PMU, the perf events needed to emulate a guest PMU won't be
> > > > scheduled in and the guest performance counters will stop counting. Treat
> > > > > it as a userspace error and refuse to run the VCPU in this situation.
> > > > 
> > > > The VCPU is flagged as being scheduled on the wrong CPU in vcpu_load(), but
> > > > the flag is cleared when the KVM_RUN enters the non-preemptible section
> > > > instead of in vcpu_put(); this has been done on purpose so the error
> > > > condition is communicated as soon as possible to userspace, otherwise
> > > > vcpu_load() on the wrong CPU followed by a vcpu_put() could clear the flag.
> > > 
> > > Can we make this something orthogonal to the PMU, and get userspace to
> > > pick an affinity mask independently of instantiating a PMU? I can
> > > imagine this would also be useful for SPE on asymmetric systems.
> > 
> > I actually went this way for the latest version of the SPE series [1] and
> > dropped the explicit userspace ioctl in favor of this mechanism.
> > 
> > The expectation is that userspace already knows which CPUs are associated
> > with the chosen PMU (or SPE) when setting the PMU for the VCPU, and having
> > userspace set it explicitly via an ioctl looks like an unnecessary step to
> > me. I don't see other usecases of an explicit ioctl outside of the above
> > two situations (if userspace wants a VCPU to run only on specific CPUs, it
> > can use thread affinity for that), so I decided to drop it.
> 
> My problem with that is that if you have (for whatever reason) a set
> of affinities that are not strictly identical for both PMU and SPE,
> and expose both of these to a guest, what do you choose?
> 
> As long as you have a single affinity set to take care of, you're
> good. It is when you have several ones that it becomes ugly (as with
> anything involving asymmetric CPUs).

I thought about this when I decided to do it this way. My solution was to do
a cpumask_and() with the existing VCPU cpumask when setting a VCPU feature
that requires it, and to return an error if the result is an empty cpumask,
because that means userspace is requesting a combination of VCPU features
that is not supported by the hardware.

Going with the other solution (the user sets the cpumask via an ioctl), KVM
would still have to check against certain combinations of VCPU features (for
SPE it's mandatory, so KVM doesn't trigger an undefined exception; we could
skip the check for the PMU, but then what do we gain from the ioctl if KVM
doesn't check that it matches the PMU?), so I don't think we lose anything
by going with the implicit cpumask.

What do you think?

Thanks,
Alex

> 
> 	M.
> 
> -- 
> Without deviation from the norm, progress is not possible.

* Re: [PATCH 4/4] KVM: arm64: Refuse to run VCPU if the PMU doesn't match the physical CPU
  2021-11-22 14:43           ` Alexandru Elisei
@ 2021-12-06 10:15             ` Marc Zyngier
  -1 siblings, 0 replies; 26+ messages in thread
From: Marc Zyngier @ 2021-12-06 10:15 UTC (permalink / raw)
  To: Alexandru Elisei; +Cc: will, kvmarm, linux-arm-kernel

On Mon, 22 Nov 2021 14:43:17 +0000,
Alexandru Elisei <alexandru.elisei@arm.com> wrote:
> 
> Hi Marc,
> 
> On Mon, Nov 22, 2021 at 02:21:00PM +0000, Marc Zyngier wrote:
> > On Mon, 22 Nov 2021 12:12:17 +0000,
> > Alexandru Elisei <alexandru.elisei@arm.com> wrote:
> > > 
> > > Hi Marc,
> > > 
> > > On Sun, Nov 21, 2021 at 07:35:13PM +0000, Marc Zyngier wrote:
> > > > On Mon, 15 Nov 2021 16:50:41 +0000,
> > > > Alexandru Elisei <alexandru.elisei@arm.com> wrote:
> > > > > 
> > > > > Userspace can assign a PMU to a VCPU with the KVM_ARM_VCPU_PMU_V3_SET_PMU
> > > > > device ioctl. If the VCPU is scheduled on a physical CPU which has a
> > > > > different PMU, the perf events needed to emulate a guest PMU won't be
> > > > > scheduled in and the guest performance counters will stop counting. Treat
> > > > > > it as a userspace error and refuse to run the VCPU in this situation.
> > > > > 
> > > > > The VCPU is flagged as being scheduled on the wrong CPU in vcpu_load(), but
> > > > > the flag is cleared when the KVM_RUN enters the non-preemptible section
> > > > > instead of in vcpu_put(); this has been done on purpose so the error
> > > > > condition is communicated as soon as possible to userspace, otherwise
> > > > > vcpu_load() on the wrong CPU followed by a vcpu_put() could clear the flag.
> > > > 
> > > > Can we make this something orthogonal to the PMU, and get userspace to
> > > > pick an affinity mask independently of instantiating a PMU? I can
> > > > imagine this would also be useful for SPE on asymmetric systems.
> > > 
> > > I actually went this way for the latest version of the SPE series [1] and
> > > dropped the explicit userspace ioctl in favor of this mechanism.
> > > 
> > > The expectation is that userspace already knows which CPUs are associated
> > > with the chosen PMU (or SPE) when setting the PMU for the VCPU, and having
> > > > userspace set it explicitly via an ioctl looks like an unnecessary step to
> > > me. I don't see other usecases of an explicit ioctl outside of the above
> > > > two situations (if userspace wants a VCPU to run only on specific CPUs, it
> > > can use thread affinity for that), so I decided to drop it.
> > 
> > My problem with that is that if you have (for whatever reason) a set
> > of affinities that are not strictly identical for both PMU and SPE,
> > and expose both of these to a guest, what do you choose?
> > 
> > As long as you have a single affinity set to take care of, you're
> > good. It is when you have several ones that it becomes ugly (as with
> > anything involving asymmetric CPUs).
> 
> I thought about it when I decided to do it this way, my solution was to do
> a cpumask_and() with the existing VCPU cpumask when setting a VCPU feature
> that requires it, and returning an error if we get an empty cpumask,
> because userspace is requesting a combination of VCPU features that is not
> supported by the hardware.

So every new asymmetric feature would come with its own potential
affinity mask, and KVM would track the restriction of that affinity. I
guess that because it can only converge to zero, this is safe by
design...

One thing I want to make sure is that we can evaluate the mask very
early on, and reduce the overhead of that evaluation.

> Going with the other solution (user sets the cpumask via an ioctl), KVM
> would still have to check against certain combinations of VCPU features
> (for SPE it's mandatory, so KVM doesn't trigger an undefined exception, we
> could skip the check for PMU, but then what do we gain from the ioctl if
> KVM doesn't check that it matches the PMU?), so I don't think we lose
> anything by going with the implicit cpumask.
> 
> What do you think?

OK, fair enough. Please respin the series (I had a bunch of minor
comments), and I'll have another look.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

* Re: [PATCH 4/4] KVM: arm64: Refuse to run VCPU if the PMU doesn't match the physical CPU
  2021-12-06 10:15             ` Marc Zyngier
@ 2021-12-06 10:26               ` Alexandru Elisei
  -1 siblings, 0 replies; 26+ messages in thread
From: Alexandru Elisei @ 2021-12-06 10:26 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: will, kvmarm, linux-arm-kernel

Hi Marc,

On Mon, Dec 06, 2021 at 10:15:31AM +0000, Marc Zyngier wrote:
> On Mon, 22 Nov 2021 14:43:17 +0000,
> Alexandru Elisei <alexandru.elisei@arm.com> wrote:
> > 
> > Hi Marc,
> > 
> > On Mon, Nov 22, 2021 at 02:21:00PM +0000, Marc Zyngier wrote:
> > > On Mon, 22 Nov 2021 12:12:17 +0000,
> > > Alexandru Elisei <alexandru.elisei@arm.com> wrote:
> > > > 
> > > > Hi Marc,
> > > > 
> > > > On Sun, Nov 21, 2021 at 07:35:13PM +0000, Marc Zyngier wrote:
> > > > > On Mon, 15 Nov 2021 16:50:41 +0000,
> > > > > Alexandru Elisei <alexandru.elisei@arm.com> wrote:
> > > > > > 
> > > > > > Userspace can assign a PMU to a VCPU with the KVM_ARM_VCPU_PMU_V3_SET_PMU
> > > > > > device ioctl. If the VCPU is scheduled on a physical CPU which has a
> > > > > > different PMU, the perf events needed to emulate a guest PMU won't be
> > > > > > scheduled in and the guest performance counters will stop counting. Treat
> > > > > > it as a userspace error and refuse to run the VCPU in this situation.
> > > > > > 
> > > > > > The VCPU is flagged as being scheduled on the wrong CPU in vcpu_load(), but
> > > > > > the flag is cleared when the KVM_RUN enters the non-preemptible section
> > > > > > instead of in vcpu_put(); this has been done on purpose so the error
> > > > > > condition is communicated as soon as possible to userspace, otherwise
> > > > > > vcpu_load() on the wrong CPU followed by a vcpu_put() could clear the flag.
> > > > > 
> > > > > Can we make this something orthogonal to the PMU, and get userspace to
> > > > > pick an affinity mask independently of instantiating a PMU? I can
> > > > > imagine this would also be useful for SPE on asymmetric systems.
> > > > 
> > > > I actually went this way for the latest version of the SPE series [1] and
> > > > dropped the explicit userspace ioctl in favor of this mechanism.
> > > > 
> > > > The expectation is that userspace already knows which CPUs are associated
> > > > with the chosen PMU (or SPE) when setting the PMU for the VCPU, and having
> > > > userspace set it explicitly via an ioctl looks like an unnecessary step to
> > > > me. I don't see other use cases for an explicit ioctl outside of the above
> > > > two situations (if userspace wants a VCPU to run only on specific CPUs, it
> > > > can use thread affinity for that), so I decided to drop it.
> > > 
> > > My problem with that is that if you have (for whatever reason) a set
> > > of affinities that are not strictly identical for both PMU and SPE,
> > > and expose both of these to a guest, what do you choose?
> > > 
> > > As long as you have a single affinity set to take care of, you're
> > > good. It is when you have several ones that it becomes ugly (as with
> > > anything involving asymmetric CPUs).
> > 
> > I thought about it when I decided to do it this way; my solution was to do
> > a cpumask_and() with the existing VCPU cpumask when setting a VCPU feature
> > that requires it, and returning an error if we get an empty cpumask,
> > because userspace is requesting a combination of VCPU features that is not
> > supported by the hardware.
> 
> So every new asymmetric feature would come with its own potential
> affinity mask, and KVM would track the restriction of that affinity. I
> guess that because it can only converge to zero, this is safe by
> design...
> 
> One thing I want to make sure is that we can evaluate the mask very
> early on, and reduce the overhead of that evaluation.

I don't think the check can be made any sooner than when the feature bit is set,
which is what I am proposing :)

> 
> > Going with the other solution (user sets the cpumask via an ioctl), KVM
> > would still have to check against certain combinations of VCPU features
> > (for SPE it's mandatory, so KVM doesn't trigger an undefined exception, we
> > could skip the check for PMU, but then what do we gain from the ioctl if
> > KVM doesn't check that it matches the PMU?), so I don't think we lose
> > anything by going with the implicit cpumask.
> > 
> > What do you think?
> 
> OK, fair enough. Please respin the series (I had a bunch of minor
> comments), and I'll have another look.

Great, thanks!

Alex

> 
> Thanks,
> 
> 	M.
> 
> -- 
> Without deviation from the norm, progress is not possible.
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2021-12-06 10:28 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-11-15 16:50 [PATCH 0/4] KVM: arm64: Improve PMU support on heterogeneous systems Alexandru Elisei
2021-11-15 16:50 ` Alexandru Elisei
2021-11-15 16:50 ` [PATCH 1/4] perf: Fix wrong name in comment for struct perf_cpu_context Alexandru Elisei
2021-11-15 16:50   ` Alexandru Elisei
2021-11-15 16:50 ` [PATCH 2/4] KVM: arm64: Keep a list of probed PMUs Alexandru Elisei
2021-11-15 16:50   ` Alexandru Elisei
2021-11-15 16:50 ` [PATCH 3/4] KVM: arm64: Add KVM_ARM_VCPU_PMU_V3_SET_PMU attribute Alexandru Elisei
2021-11-15 16:50   ` Alexandru Elisei
2021-11-21 19:11   ` Marc Zyngier
2021-11-21 19:11     ` Marc Zyngier
2021-11-22 11:29     ` Alexandru Elisei
2021-11-22 11:29       ` Alexandru Elisei
2021-11-15 16:50 ` [PATCH 4/4] KVM: arm64: Refuse to run VCPU if the PMU doesn't match the physical CPU Alexandru Elisei
2021-11-15 16:50   ` Alexandru Elisei
2021-11-21 19:35   ` Marc Zyngier
2021-11-21 19:35     ` Marc Zyngier
2021-11-22 12:12     ` Alexandru Elisei
2021-11-22 12:12       ` Alexandru Elisei
2021-11-22 14:21       ` Marc Zyngier
2021-11-22 14:21         ` Marc Zyngier
2021-11-22 14:43         ` Alexandru Elisei
2021-11-22 14:43           ` Alexandru Elisei
2021-12-06 10:15           ` Marc Zyngier
2021-12-06 10:15             ` Marc Zyngier
2021-12-06 10:26             ` Alexandru Elisei
2021-12-06 10:26               ` Alexandru Elisei
