KVM Archive on lore.kernel.org
* [PATCH v2 00/18] arm64: KVM: add SPE profiling support
@ 2019-12-20 14:30 Andrew Murray
  2019-12-20 14:30 ` [PATCH v2 01/18] dt-bindings: ARM SPE: highlight the need for PPI partitions on heterogeneous systems Andrew Murray
                   ` (19 more replies)
  0 siblings, 20 replies; 78+ messages in thread
From: Andrew Murray @ 2019-12-20 14:30 UTC (permalink / raw)
  To: Marc Zyngier, Catalin Marinas, Will Deacon
  Cc: Sudeep Holla, kvmarm, linux-arm-kernel, kvm, linux-kernel, Mark Rutland

This series implements support for allowing KVM guests to use the Arm
Statistical Profiling Extension (SPE).

It has been tested on a model to ensure that both host and guest can
simultaneously use SPE with valid data. E.g.

$ perf record -e arm_spe/ts_enable=1,pa_enable=1,pct_enable=1/ \
        dd if=/dev/zero of=/dev/null count=1000
$ perf report --dump-raw-trace > spe_buf.txt

As we save and restore the SPE context, the guest can access the SPE
registers directly, thus in this version of the series we remove the
trapping and emulation.

In the previous version of this series, when KVM SPE wasn't supported
(e.g. via CONFIG_KVM_ARM_SPE) we were able to return a value of 0 to
all reads of the SPE registers. As we can no longer do this, there isn't
a mechanism to prevent the guest from using SPE, so I'm keen for
feedback on the best way of resolving this.

It appears necessary to pin the entire guest memory in order to provide
guest SPE access - otherwise it is possible for the guest to receive
Stage-2 faults.

The last two extra patches are for the kvmtool if someone wants to play
with it.

Changes since v1:
	- Rebased on v5.5-rc2
	- Renamed kvm_spe structure 'irq' member to 'irq_num'
	- Added irq_level to kvm_spe structure
	- Clear PMBSR service bit on save to avoid spurious interrupts
	- Update kvmtool headers to 5.4
	- Enabled SPE in KVM init features
	- No longer trap and emulate
	- Add support for guest/host exclusion flags
	- Fix virq support for SPE
	- Adjusted sysreg_elx_s macros with merged clang build support

Andrew Murray (4):
  KVM: arm64: don't trap Statistical Profiling controls to EL2
  perf: arm_spe: Add KVM structure for obtaining IRQ info
  KVM: arm64: spe: Provide guest virtual interrupts for SPE
  perf: arm_spe: Handle guest/host exclusion flags

Sudeep Holla (12):
  dt-bindings: ARM SPE: highlight the need for PPI partitions on
    heterogeneous systems
  arm64: KVM: reset E2PB correctly in MDCR_EL2 when exiting the
    guest(VHE)
  arm64: KVM: define SPE data structure for each vcpu
  arm64: KVM: add SPE system registers to sys_reg_descs
  arm64: KVM/VHE: enable the use PMSCR_EL12 on VHE systems
  arm64: KVM: split debug save restore across vm/traps activation
  arm64: KVM/debug: drop pmscr_el1 and use sys_regs[PMSCR_EL1] in
    kvm_cpu_context
  arm64: KVM: add support to save/restore SPE profiling buffer controls
  arm64: KVM: enable conditional save/restore full SPE profiling buffer
    controls
  arm64: KVM/debug: use EL1&0 stage 1 translation regime
  KVM: arm64: add a new vcpu device control group for SPEv1
  KVM: arm64: enable SPE support
  KVMTOOL: update_headers: Sync kvm UAPI headers with linux v5.5-rc2
  KVMTOOL: kvm: add a vcpu feature for SPEv1 support

 .../devicetree/bindings/arm/spe-pmu.txt       |   5 +-
 Documentation/virt/kvm/devices/vcpu.txt       |  28 +++
 arch/arm64/include/asm/kvm_host.h             |  18 +-
 arch/arm64/include/asm/kvm_hyp.h              |   6 +-
 arch/arm64/include/asm/sysreg.h               |   1 +
 arch/arm64/include/uapi/asm/kvm.h             |   4 +
 arch/arm64/kvm/Kconfig                        |   7 +
 arch/arm64/kvm/Makefile                       |   1 +
 arch/arm64/kvm/debug.c                        |   2 -
 arch/arm64/kvm/guest.c                        |   6 +
 arch/arm64/kvm/hyp/debug-sr.c                 | 105 +++++---
 arch/arm64/kvm/hyp/switch.c                   |  18 +-
 arch/arm64/kvm/reset.c                        |   3 +
 arch/arm64/kvm/sys_regs.c                     |  11 +
 drivers/perf/arm_spe_pmu.c                    |  26 ++
 include/kvm/arm_spe.h                         |  82 ++++++
 include/uapi/linux/kvm.h                      |   1 +
 virt/kvm/arm/arm.c                            |  10 +-
 virt/kvm/arm/spe.c                            | 234 ++++++++++++++++++
 19 files changed, 521 insertions(+), 47 deletions(-)
 create mode 100644 include/kvm/arm_spe.h
 create mode 100644 virt/kvm/arm/spe.c

-- 
2.21.0



* [PATCH v2 01/18] dt-bindings: ARM SPE: highlight the need for PPI partitions on heterogeneous systems
  2019-12-20 14:30 [PATCH v2 00/18] arm64: KVM: add SPE profiling support Andrew Murray
@ 2019-12-20 14:30 ` Andrew Murray
  2019-12-20 14:30 ` [PATCH v2 02/18] arm64: KVM: reset E2PB correctly in MDCR_EL2 when exiting the guest(VHE) Andrew Murray
                   ` (18 subsequent siblings)
  19 siblings, 0 replies; 78+ messages in thread
From: Andrew Murray @ 2019-12-20 14:30 UTC (permalink / raw)
  To: Marc Zyngier, Catalin Marinas, Will Deacon
  Cc: Sudeep Holla, kvmarm, linux-arm-kernel, kvm, linux-kernel, Mark Rutland

From: Sudeep Holla <sudeep.holla@arm.com>

It's not entirely clear from the binding document that the only way to
express ARM SPE affined to a subset of CPUs on a heterogeneous system
is through the use of PPI partitions available in the interrupt
controller bindings.

Let's make it clear.

Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Andrew Murray <andrew.murray@arm.com>
---
 Documentation/devicetree/bindings/arm/spe-pmu.txt | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/Documentation/devicetree/bindings/arm/spe-pmu.txt b/Documentation/devicetree/bindings/arm/spe-pmu.txt
index 93372f2a7df9..4f4815800f6e 100644
--- a/Documentation/devicetree/bindings/arm/spe-pmu.txt
+++ b/Documentation/devicetree/bindings/arm/spe-pmu.txt
@@ -9,8 +9,9 @@ performance sample data using an in-memory trace buffer.
 	       "arm,statistical-profiling-extension-v1"
 
 - interrupts : Exactly 1 PPI must be listed. For heterogeneous systems where
-               SPE is only supported on a subset of the CPUs, please consult
-	       the arm,gic-v3 binding for details on describing a PPI partition.
+               SPE is only supported on a subset of the CPUs, a PPI partition
+	       described in the arm,gic-v3 binding must be used to describe
+	       the set of CPUs this interrupt is affine to.
 
 ** Example:
 
-- 
2.21.0



* [PATCH v2 02/18] arm64: KVM: reset E2PB correctly in MDCR_EL2 when exiting the guest(VHE)
  2019-12-20 14:30 [PATCH v2 00/18] arm64: KVM: add SPE profiling support Andrew Murray
  2019-12-20 14:30 ` [PATCH v2 01/18] dt-bindings: ARM SPE: highlight the need for PPI partitions on heterogeneous systems Andrew Murray
@ 2019-12-20 14:30 ` Andrew Murray
  2019-12-21 13:12   ` Marc Zyngier
  2019-12-20 14:30 ` [PATCH v2 03/18] arm64: KVM: define SPE data structure for each vcpu Andrew Murray
                   ` (17 subsequent siblings)
  19 siblings, 1 reply; 78+ messages in thread
From: Andrew Murray @ 2019-12-20 14:30 UTC (permalink / raw)
  To: Marc Zyngier, Catalin Marinas, Will Deacon
  Cc: Sudeep Holla, kvmarm, linux-arm-kernel, kvm, linux-kernel, Mark Rutland

From: Sudeep Holla <sudeep.holla@arm.com>

On VHE systems, the reset value of MDCR_EL2.E2PB is b00, which makes
the profiling buffer use the EL2 stage 1 translations. However, if
guests are allowed to use profiling buffers, changing the E2PB setting,
we need to ensure we restore MDCR_EL2.E2PB=b00 on exit. Currently we
just do a bitwise '&' with MDCR_EL2_E2PB_MASK, which retains the value.

So fix it by clearing all the bits in E2PB.
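
As a rough illustration of the masking issue described above (a standalone
sketch, not kernel code: the field values are assumed for the example and
the helper names mdcr_put_old/mdcr_put_new/e2pb_field are hypothetical):

```c
#include <stdint.h>

/* Field layout assumed for this sketch only. */
#define MDCR_EL2_HPMN_MASK	0x1FULL
#define MDCR_EL2_E2PB_MASK	0x3ULL
#define MDCR_EL2_E2PB_SHIFT	12
#define MDCR_EL2_TPMS		(1ULL << 14)

/* Old behaviour: E2PB is part of the keep-mask, so a guest value survives. */
static uint64_t mdcr_put_old(uint64_t mdcr_el2)
{
	return mdcr_el2 & (MDCR_EL2_HPMN_MASK |
			   MDCR_EL2_E2PB_MASK << MDCR_EL2_E2PB_SHIFT |
			   MDCR_EL2_TPMS);
}

/* New behaviour: E2PB is left out of the keep-mask, so it is cleared to b00. */
static uint64_t mdcr_put_new(uint64_t mdcr_el2)
{
	return mdcr_el2 & (MDCR_EL2_HPMN_MASK | MDCR_EL2_TPMS);
}

/* Extract the E2PB field for inspection. */
static uint64_t e2pb_field(uint64_t mdcr_el2)
{
	return (mdcr_el2 >> MDCR_EL2_E2PB_SHIFT) & MDCR_EL2_E2PB_MASK;
}
```

With E2PB set to b11 on entry, the old mask leaves b11 in place while the
new mask yields b00; HPMN and TPMS are preserved in both cases.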

Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Andrew Murray <andrew.murray@arm.com>
---
 arch/arm64/kvm/hyp/switch.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 72fbbd86eb5e..250f13910882 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -228,9 +228,7 @@ void deactivate_traps_vhe_put(void)
 {
 	u64 mdcr_el2 = read_sysreg(mdcr_el2);
 
-	mdcr_el2 &= MDCR_EL2_HPMN_MASK |
-		    MDCR_EL2_E2PB_MASK << MDCR_EL2_E2PB_SHIFT |
-		    MDCR_EL2_TPMS;
+	mdcr_el2 &= MDCR_EL2_HPMN_MASK | MDCR_EL2_TPMS;
 
 	write_sysreg(mdcr_el2, mdcr_el2);
 
-- 
2.21.0



* [PATCH v2 03/18] arm64: KVM: define SPE data structure for each vcpu
  2019-12-20 14:30 [PATCH v2 00/18] arm64: KVM: add SPE profiling support Andrew Murray
  2019-12-20 14:30 ` [PATCH v2 01/18] dt-bindings: ARM SPE: highlight the need for PPI partitions on heterogeneous systems Andrew Murray
  2019-12-20 14:30 ` [PATCH v2 02/18] arm64: KVM: reset E2PB correctly in MDCR_EL2 when exiting the guest(VHE) Andrew Murray
@ 2019-12-20 14:30 ` Andrew Murray
  2019-12-21 13:19   ` Marc Zyngier
  2019-12-20 14:30 ` [PATCH v2 04/18] arm64: KVM: add SPE system registers to sys_reg_descs Andrew Murray
                   ` (16 subsequent siblings)
  19 siblings, 1 reply; 78+ messages in thread
From: Andrew Murray @ 2019-12-20 14:30 UTC (permalink / raw)
  To: Marc Zyngier, Catalin Marinas, Will Deacon
  Cc: Sudeep Holla, kvmarm, linux-arm-kernel, kvm, linux-kernel, Mark Rutland

From: Sudeep Holla <sudeep.holla@arm.com>

In order to support virtual SPE for guests, define some basic structs.
This feature depends on the host having hardware with SPE support.

Since we can support this only on ARM64, add a separate config symbol
for the same.

Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
[ Add irq_level, rename irq to irq_num for kvm_spe ]
Signed-off-by: Andrew Murray <andrew.murray@arm.com>
---
 arch/arm64/include/asm/kvm_host.h |  2 ++
 arch/arm64/kvm/Kconfig            |  7 +++++++
 include/kvm/arm_spe.h             | 19 +++++++++++++++++++
 3 files changed, 28 insertions(+)
 create mode 100644 include/kvm/arm_spe.h

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index c61260cf63c5..f5dcff912645 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -35,6 +35,7 @@
 #include <kvm/arm_vgic.h>
 #include <kvm/arm_arch_timer.h>
 #include <kvm/arm_pmu.h>
+#include <kvm/arm_spe.h>
 
 #define KVM_MAX_VCPUS VGIC_V3_MAX_CPUS
 
@@ -302,6 +303,7 @@ struct kvm_vcpu_arch {
 	struct vgic_cpu vgic_cpu;
 	struct arch_timer_cpu timer_cpu;
 	struct kvm_pmu pmu;
+	struct kvm_spe spe;
 
 	/*
 	 * Anything that is not used directly from assembly code goes
diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index a475c68cbfec..af5be2c57dcb 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -35,6 +35,7 @@ config KVM
 	select HAVE_KVM_EVENTFD
 	select HAVE_KVM_IRQFD
 	select KVM_ARM_PMU if HW_PERF_EVENTS
+	select KVM_ARM_SPE if (HW_PERF_EVENTS && ARM_SPE_PMU)
 	select HAVE_KVM_MSI
 	select HAVE_KVM_IRQCHIP
 	select HAVE_KVM_IRQ_ROUTING
@@ -61,6 +62,12 @@ config KVM_ARM_PMU
 	  Adds support for a virtual Performance Monitoring Unit (PMU) in
 	  virtual machines.
 
+config KVM_ARM_SPE
+	bool
+	---help---
+	  Adds support for a virtual Statistical Profiling Extension(SPE) in
+	  virtual machines.
+
 config KVM_INDIRECT_VECTORS
        def_bool KVM && (HARDEN_BRANCH_PREDICTOR || HARDEN_EL2_VECTORS)
 
diff --git a/include/kvm/arm_spe.h b/include/kvm/arm_spe.h
new file mode 100644
index 000000000000..48d118fdb174
--- /dev/null
+++ b/include/kvm/arm_spe.h
@@ -0,0 +1,19 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2019 ARM Ltd.
+ */
+
+#ifndef __ASM_ARM_KVM_SPE_H
+#define __ASM_ARM_KVM_SPE_H
+
+#include <uapi/linux/kvm.h>
+#include <linux/kvm_host.h>
+
+struct kvm_spe {
+	int irq_num;
+	bool ready; /* indicates that SPE KVM instance is ready for use */
+	bool created; /* SPE KVM instance is created, may not be ready yet */
+	bool irq_level;
+};
+
+#endif /* __ASM_ARM_KVM_SPE_H */
-- 
2.21.0



* [PATCH v2 04/18] arm64: KVM: add SPE system registers to sys_reg_descs
  2019-12-20 14:30 [PATCH v2 00/18] arm64: KVM: add SPE profiling support Andrew Murray
                   ` (2 preceding siblings ...)
  2019-12-20 14:30 ` [PATCH v2 03/18] arm64: KVM: define SPE data structure for each vcpu Andrew Murray
@ 2019-12-20 14:30 ` Andrew Murray
  2019-12-20 14:30 ` [PATCH v2 05/18] arm64: KVM/VHE: enable the use PMSCR_EL12 on VHE systems Andrew Murray
                   ` (15 subsequent siblings)
  19 siblings, 0 replies; 78+ messages in thread
From: Andrew Murray @ 2019-12-20 14:30 UTC (permalink / raw)
  To: Marc Zyngier, Catalin Marinas, Will Deacon
  Cc: Sudeep Holla, kvmarm, linux-arm-kernel, kvm, linux-kernel, Mark Rutland

From: Sudeep Holla <sudeep.holla@arm.com>

Add the Statistical Profiling Extension (SPE) Profiling Buffer control
registers so that we can provide initial register values and use the
sys_regs structure as a store for our SPE context.

Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
[ Reword commit, remove access/reset handlers, defer kvm_arm_support_spe_v1 ]
Signed-off-by: Andrew Murray <andrew.murray@arm.com>
---
 arch/arm64/include/asm/kvm_host.h | 12 ++++++++++++
 arch/arm64/kvm/sys_regs.c         | 11 +++++++++++
 2 files changed, 23 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index f5dcff912645..9eb85f14df90 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -145,6 +145,18 @@ enum vcpu_sysreg {
 	MDCCINT_EL1,	/* Monitor Debug Comms Channel Interrupt Enable Reg */
 	DISR_EL1,	/* Deferred Interrupt Status Register */
 
+	/* Statistical Profiling Extension Registers */
+	PMSCR_EL1,
+	PMSICR_EL1,
+	PMSIRR_EL1,
+	PMSFCR_EL1,
+	PMSEVFR_EL1,
+	PMSLATFR_EL1,
+	PMSIDR_EL1,
+	PMBLIMITR_EL1,
+	PMBPTR_EL1,
+	PMBSR_EL1,
+
 	/* Performance Monitors Registers */
 	PMCR_EL0,	/* Control Register */
 	PMSELR_EL0,	/* Event Counter Selection Register */
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 46822afc57e0..955b157f9cc5 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1506,6 +1506,17 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 	{ SYS_DESC(SYS_FAR_EL1), access_vm_reg, reset_unknown, FAR_EL1 },
 	{ SYS_DESC(SYS_PAR_EL1), NULL, reset_unknown, PAR_EL1 },
 
+	{ SYS_DESC(SYS_PMSCR_EL1), NULL, reset_val, PMSCR_EL1, 0 },
+	{ SYS_DESC(SYS_PMSICR_EL1), NULL, reset_val, PMSICR_EL1, 0 },
+	{ SYS_DESC(SYS_PMSIRR_EL1), NULL, reset_val, PMSIRR_EL1, 0 },
+	{ SYS_DESC(SYS_PMSFCR_EL1), NULL, reset_val, PMSFCR_EL1, 0 },
+	{ SYS_DESC(SYS_PMSEVFR_EL1), NULL, reset_val, PMSEVFR_EL1, 0 },
+	{ SYS_DESC(SYS_PMSLATFR_EL1), NULL, reset_val, PMSLATFR_EL1, 0 },
+	{ SYS_DESC(SYS_PMSIDR_EL1), NULL, reset_val, PMSIDR_EL1, 0 },
+	{ SYS_DESC(SYS_PMBLIMITR_EL1), NULL, reset_val, PMBLIMITR_EL1, 0 },
+	{ SYS_DESC(SYS_PMBPTR_EL1), NULL, reset_val, PMBPTR_EL1, 0 },
+	{ SYS_DESC(SYS_PMBSR_EL1), NULL, reset_val, PMBSR_EL1, 0 },
+
 	{ SYS_DESC(SYS_PMINTENSET_EL1), access_pminten, reset_unknown, PMINTENSET_EL1 },
 	{ SYS_DESC(SYS_PMINTENCLR_EL1), access_pminten, NULL, PMINTENSET_EL1 },
 
-- 
2.21.0



* [PATCH v2 05/18] arm64: KVM/VHE: enable the use PMSCR_EL12 on VHE systems
  2019-12-20 14:30 [PATCH v2 00/18] arm64: KVM: add SPE profiling support Andrew Murray
                   ` (3 preceding siblings ...)
  2019-12-20 14:30 ` [PATCH v2 04/18] arm64: KVM: add SPE system registers to sys_reg_descs Andrew Murray
@ 2019-12-20 14:30 ` Andrew Murray
  2019-12-20 14:30 ` [PATCH v2 06/18] arm64: KVM: split debug save restore across vm/traps activation Andrew Murray
                   ` (14 subsequent siblings)
  19 siblings, 0 replies; 78+ messages in thread
From: Andrew Murray @ 2019-12-20 14:30 UTC (permalink / raw)
  To: Marc Zyngier, Catalin Marinas, Will Deacon
  Cc: Sudeep Holla, kvmarm, linux-arm-kernel, kvm, linux-kernel, Mark Rutland

From: Sudeep Holla <sudeep.holla@arm.com>

Currently, we are just using PMSCR_EL1 in the host for non-VHE systems.
We already have the {read,write}_sysreg_el*() accessors for accessing
particular ELs' sysregs in the presence of VHE.

Let's just define PMSCR_EL12 and start making use of it here, which will
access the right register on both VHE and non-VHE systems. This change
is required to add SPE guest support on VHE systems.

Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Andrew Murray <andrew.murray@arm.com>
---
 arch/arm64/include/asm/sysreg.h | 1 +
 arch/arm64/kvm/hyp/debug-sr.c   | 6 +++---
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 6e919fafb43d..6c0b0ad97688 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -468,6 +468,7 @@
 #define SYS_AFSR1_EL12			sys_reg(3, 5, 5, 1, 1)
 #define SYS_ESR_EL12			sys_reg(3, 5, 5, 2, 0)
 #define SYS_FAR_EL12			sys_reg(3, 5, 6, 0, 0)
+#define SYS_PMSCR_EL12			sys_reg(3, 5, 9, 9, 0)
 #define SYS_MAIR_EL12			sys_reg(3, 5, 10, 2, 0)
 #define SYS_AMAIR_EL12			sys_reg(3, 5, 10, 3, 0)
 #define SYS_VBAR_EL12			sys_reg(3, 5, 12, 0, 0)
diff --git a/arch/arm64/kvm/hyp/debug-sr.c b/arch/arm64/kvm/hyp/debug-sr.c
index 0fc9872a1467..98be2f11c16c 100644
--- a/arch/arm64/kvm/hyp/debug-sr.c
+++ b/arch/arm64/kvm/hyp/debug-sr.c
@@ -108,8 +108,8 @@ static void __hyp_text __debug_save_spe_nvhe(u64 *pmscr_el1)
 		return;
 
 	/* Yes; save the control register and disable data generation */
-	*pmscr_el1 = read_sysreg_s(SYS_PMSCR_EL1);
-	write_sysreg_s(0, SYS_PMSCR_EL1);
+	*pmscr_el1 = read_sysreg_el1(SYS_PMSCR);
+	write_sysreg_el1(0, SYS_PMSCR);
 	isb();
 
 	/* Now drain all buffered data to memory */
@@ -126,7 +126,7 @@ static void __hyp_text __debug_restore_spe_nvhe(u64 pmscr_el1)
 	isb();
 
 	/* Re-enable data generation */
-	write_sysreg_s(pmscr_el1, SYS_PMSCR_EL1);
+	write_sysreg_el1(pmscr_el1, SYS_PMSCR);
 }
 
 static void __hyp_text __debug_save_state(struct kvm_vcpu *vcpu,
-- 
2.21.0



* [PATCH v2 06/18] arm64: KVM: split debug save restore across vm/traps activation
  2019-12-20 14:30 [PATCH v2 00/18] arm64: KVM: add SPE profiling support Andrew Murray
                   ` (4 preceding siblings ...)
  2019-12-20 14:30 ` [PATCH v2 05/18] arm64: KVM/VHE: enable the use PMSCR_EL12 on VHE systems Andrew Murray
@ 2019-12-20 14:30 ` Andrew Murray
  2019-12-20 14:30 ` [PATCH v2 07/18] arm64: KVM/debug: drop pmscr_el1 and use sys_regs[PMSCR_EL1] in kvm_cpu_context Andrew Murray
                   ` (13 subsequent siblings)
  19 siblings, 0 replies; 78+ messages in thread
From: Andrew Murray @ 2019-12-20 14:30 UTC (permalink / raw)
  To: Marc Zyngier, Catalin Marinas, Will Deacon
  Cc: Sudeep Holla, kvmarm, linux-arm-kernel, kvm, linux-kernel, Mark Rutland

From: Sudeep Holla <sudeep.holla@arm.com>

If we enable the profiling buffer controls at EL1 to generate a trap
exception to EL2, the profiling buffer also changes to use the EL1&0
stage 1 translation regime in the case of VHE. To support SPE both in
the guest and host, we need to first stop profiling and flush the
profiling buffers before we activate/switch vm or enable/disable the
traps.

In preparation to do that, let's split the debug save restore
functionality into 4 steps:
1. debug_save_host_context - saves the host context
2. debug_restore_guest_context - restores the guest context
3. debug_save_guest_context - saves the guest context
4. debug_restore_host_context - restores the host context

Let's rename the existing __debug_switch_to_{host,guest} to make sure
they are aligned to the above, and just add placeholders for the new
ones added here, as we need them to support SPE in guests.
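
For orientation, the ordering that results around the trap activation can
be sketched as a standalone toy model (not kernel code: every function
here is a hypothetical stub that only records its name, and the sysreg,
timer and vgic steps of the real run loop are omitted):

```c
#include <string.h>

/* Log of the steps taken, in call order, separated by ';'. */
static char order[128];

static void record(const char *step)
{
	strcat(order, step);
	strcat(order, ";");
}

/* Hypothetical stubs standing in for the hyp functions of the same name. */
static void debug_save_host_context(void)     { record("save_host"); }
static void activate_traps(void)              { record("activate_traps"); }
static void debug_restore_guest_context(void) { record("restore_guest"); }
static void run_guest(void)                   { record("run"); }
static void debug_save_guest_context(void)    { record("save_guest"); }
static void deactivate_traps(void)            { record("deactivate_traps"); }
static void debug_restore_host_context(void)  { record("restore_host"); }

/* One world switch: host state is saved before traps are activated,
 * guest state is saved before they are deactivated. */
static const char *world_switch(void)
{
	order[0] = '\0';
	debug_save_host_context();
	activate_traps();
	debug_restore_guest_context();
	run_guest();
	debug_save_guest_context();
	deactivate_traps();
	debug_restore_host_context();
	return order;
}
```

The point of the split is that each save now happens on the far side of
the trap (de)activation from its matching restore.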

Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Andrew Murray <andrew.murray@arm.com>
---
 arch/arm64/include/asm/kvm_hyp.h |  6 ++++--
 arch/arm64/kvm/hyp/debug-sr.c    | 25 ++++++++++++++++---------
 arch/arm64/kvm/hyp/switch.c      | 12 ++++++++----
 3 files changed, 28 insertions(+), 15 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
index 97f21cc66657..011e7963f772 100644
--- a/arch/arm64/include/asm/kvm_hyp.h
+++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -69,8 +69,10 @@ void sysreg_restore_guest_state_vhe(struct kvm_cpu_context *ctxt);
 void __sysreg32_save_state(struct kvm_vcpu *vcpu);
 void __sysreg32_restore_state(struct kvm_vcpu *vcpu);
 
-void __debug_switch_to_guest(struct kvm_vcpu *vcpu);
-void __debug_switch_to_host(struct kvm_vcpu *vcpu);
+void __debug_save_host_context(struct kvm_vcpu *vcpu);
+void __debug_restore_guest_context(struct kvm_vcpu *vcpu);
+void __debug_save_guest_context(struct kvm_vcpu *vcpu);
+void __debug_restore_host_context(struct kvm_vcpu *vcpu);
 
 void __fpsimd_save_state(struct user_fpsimd_state *fp_regs);
 void __fpsimd_restore_state(struct user_fpsimd_state *fp_regs);
diff --git a/arch/arm64/kvm/hyp/debug-sr.c b/arch/arm64/kvm/hyp/debug-sr.c
index 98be2f11c16c..c803daebd596 100644
--- a/arch/arm64/kvm/hyp/debug-sr.c
+++ b/arch/arm64/kvm/hyp/debug-sr.c
@@ -168,20 +168,13 @@ static void __hyp_text __debug_restore_state(struct kvm_vcpu *vcpu,
 	write_sysreg(ctxt->sys_regs[MDCCINT_EL1], mdccint_el1);
 }
 
-void __hyp_text __debug_switch_to_guest(struct kvm_vcpu *vcpu)
+void __hyp_text __debug_restore_guest_context(struct kvm_vcpu *vcpu)
 {
 	struct kvm_cpu_context *host_ctxt;
 	struct kvm_cpu_context *guest_ctxt;
 	struct kvm_guest_debug_arch *host_dbg;
 	struct kvm_guest_debug_arch *guest_dbg;
 
-	/*
-	 * Non-VHE: Disable and flush SPE data generation
-	 * VHE: The vcpu can run, but it can't hide.
-	 */
-	if (!has_vhe())
-		__debug_save_spe_nvhe(&vcpu->arch.host_debug_state.pmscr_el1);
-
 	if (!(vcpu->arch.flags & KVM_ARM64_DEBUG_DIRTY))
 		return;
 
@@ -194,7 +187,7 @@ void __hyp_text __debug_switch_to_guest(struct kvm_vcpu *vcpu)
 	__debug_restore_state(vcpu, guest_dbg, guest_ctxt);
 }
 
-void __hyp_text __debug_switch_to_host(struct kvm_vcpu *vcpu)
+void __hyp_text __debug_restore_host_context(struct kvm_vcpu *vcpu)
 {
 	struct kvm_cpu_context *host_ctxt;
 	struct kvm_cpu_context *guest_ctxt;
@@ -218,6 +211,20 @@ void __hyp_text __debug_switch_to_host(struct kvm_vcpu *vcpu)
 	vcpu->arch.flags &= ~KVM_ARM64_DEBUG_DIRTY;
 }
 
+void __hyp_text __debug_save_host_context(struct kvm_vcpu *vcpu)
+{
+	/*
+	 * Non-VHE: Disable and flush SPE data generation
+	 * VHE: The vcpu can run, but it can't hide.
+	 */
+	if (!has_vhe())
+		__debug_save_spe_nvhe(&vcpu->arch.host_debug_state.pmscr_el1);
+}
+
+void __hyp_text __debug_save_guest_context(struct kvm_vcpu *vcpu)
+{
+}
+
 u32 __hyp_text __kvm_get_mdcr_el2(void)
 {
 	return read_sysreg(mdcr_el2);
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 250f13910882..67b7c160f65b 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -626,6 +626,7 @@ int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
 	guest_ctxt = &vcpu->arch.ctxt;
 
 	sysreg_save_host_state_vhe(host_ctxt);
+	__debug_save_host_context(vcpu);
 
 	/*
 	 * ARM erratum 1165522 requires us to configure both stage 1 and
@@ -642,7 +643,7 @@ int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
 	__activate_traps(vcpu);
 
 	sysreg_restore_guest_state_vhe(guest_ctxt);
-	__debug_switch_to_guest(vcpu);
+	__debug_restore_guest_context(vcpu);
 
 	__set_guest_arch_workaround_state(vcpu);
 
@@ -656,6 +657,7 @@ int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
 	__set_host_arch_workaround_state(vcpu);
 
 	sysreg_save_guest_state_vhe(guest_ctxt);
+	__debug_save_guest_context(vcpu);
 
 	__deactivate_traps(vcpu);
 
@@ -664,7 +666,7 @@ int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
 	if (vcpu->arch.flags & KVM_ARM64_FP_ENABLED)
 		__fpsimd_save_fpexc32(vcpu);
 
-	__debug_switch_to_host(vcpu);
+	__debug_restore_host_context(vcpu);
 
 	return exit_code;
 }
@@ -698,6 +700,7 @@ int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu)
 	pmu_switch_needed = __pmu_switch_to_guest(host_ctxt);
 
 	__sysreg_save_state_nvhe(host_ctxt);
+	__debug_save_host_context(vcpu);
 
 	/*
 	 * We must restore the 32-bit state before the sysregs, thanks
@@ -716,7 +719,7 @@ int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu)
 	__hyp_vgic_restore_state(vcpu);
 	__timer_enable_traps(vcpu);
 
-	__debug_switch_to_guest(vcpu);
+	__debug_restore_guest_context(vcpu);
 
 	__set_guest_arch_workaround_state(vcpu);
 
@@ -730,6 +733,7 @@ int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu)
 	__set_host_arch_workaround_state(vcpu);
 
 	__sysreg_save_state_nvhe(guest_ctxt);
+	__debug_save_guest_context(vcpu);
 	__sysreg32_save_state(vcpu);
 	__timer_disable_traps(vcpu);
 	__hyp_vgic_save_state(vcpu);
@@ -746,7 +750,7 @@ int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu)
 	 * This must come after restoring the host sysregs, since a non-VHE
 	 * system may enable SPE here and make use of the TTBRs.
 	 */
-	__debug_switch_to_host(vcpu);
+	__debug_restore_host_context(vcpu);
 
 	if (pmu_switch_needed)
 		__pmu_switch_to_host(host_ctxt);
-- 
2.21.0



* [PATCH v2 07/18] arm64: KVM/debug: drop pmscr_el1 and use sys_regs[PMSCR_EL1] in kvm_cpu_context
  2019-12-20 14:30 [PATCH v2 00/18] arm64: KVM: add SPE profiling support Andrew Murray
                   ` (5 preceding siblings ...)
  2019-12-20 14:30 ` [PATCH v2 06/18] arm64: KVM: split debug save restore across vm/traps activation Andrew Murray
@ 2019-12-20 14:30 ` Andrew Murray
  2019-12-20 14:30 ` [PATCH v2 08/18] arm64: KVM: add support to save/restore SPE profiling buffer controls Andrew Murray
                   ` (12 subsequent siblings)
  19 siblings, 0 replies; 78+ messages in thread
From: Andrew Murray @ 2019-12-20 14:30 UTC (permalink / raw)
  To: Marc Zyngier, Catalin Marinas, Will Deacon
  Cc: Sudeep Holla, kvmarm, linux-arm-kernel, kvm, linux-kernel, Mark Rutland

From: Sudeep Holla <sudeep.holla@arm.com>

kvm_cpu_context now has support to stash the complete SPE buffer control
context. We no longer need the pmscr_el1 field in kvm_vcpu_arch, so it
can be dropped.

Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Andrew Murray <andrew.murray@arm.com>
---
 arch/arm64/include/asm/kvm_host.h |  2 --
 arch/arm64/kvm/hyp/debug-sr.c     | 26 +++++++++++++++-----------
 2 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 9eb85f14df90..333c6491bec7 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -307,8 +307,6 @@ struct kvm_vcpu_arch {
 	struct {
 		/* {Break,watch}point registers */
 		struct kvm_guest_debug_arch regs;
-		/* Statistical profiling extension */
-		u64 pmscr_el1;
 	} host_debug_state;
 
 	/* VGIC state */
diff --git a/arch/arm64/kvm/hyp/debug-sr.c b/arch/arm64/kvm/hyp/debug-sr.c
index c803daebd596..8a70a493345e 100644
--- a/arch/arm64/kvm/hyp/debug-sr.c
+++ b/arch/arm64/kvm/hyp/debug-sr.c
@@ -85,19 +85,19 @@
 	default:	write_debug(ptr[0], reg, 0);			\
 	}
 
-static void __hyp_text __debug_save_spe_nvhe(u64 *pmscr_el1)
+static void __hyp_text __debug_save_spe_nvhe(struct kvm_cpu_context *ctxt)
 {
 	u64 reg;
 
 	/* Clear pmscr in case of early return */
-	*pmscr_el1 = 0;
+	ctxt->sys_regs[PMSCR_EL1] = 0;
 
 	/* SPE present on this CPU? */
 	if (!cpuid_feature_extract_unsigned_field(read_sysreg(id_aa64dfr0_el1),
 						  ID_AA64DFR0_PMSVER_SHIFT))
 		return;
 
-	/* Yes; is it owned by EL3? */
+	/* Yes; is it owned by higher EL? */
 	reg = read_sysreg_s(SYS_PMBIDR_EL1);
 	if (reg & BIT(SYS_PMBIDR_EL1_P_SHIFT))
 		return;
@@ -108,7 +108,7 @@ static void __hyp_text __debug_save_spe_nvhe(u64 *pmscr_el1)
 		return;
 
 	/* Yes; save the control register and disable data generation */
-	*pmscr_el1 = read_sysreg_el1(SYS_PMSCR);
+	ctxt->sys_regs[PMSCR_EL1] = read_sysreg_el1(SYS_PMSCR);
 	write_sysreg_el1(0, SYS_PMSCR);
 	isb();
 
@@ -117,16 +117,16 @@ static void __hyp_text __debug_save_spe_nvhe(u64 *pmscr_el1)
 	dsb(nsh);
 }
 
-static void __hyp_text __debug_restore_spe_nvhe(u64 pmscr_el1)
+static void __hyp_text __debug_restore_spe_nvhe(struct kvm_cpu_context *ctxt)
 {
-	if (!pmscr_el1)
+	if (!ctxt->sys_regs[PMSCR_EL1])
 		return;
 
 	/* The host page table is installed, but not yet synchronised */
 	isb();
 
 	/* Re-enable data generation */
-	write_sysreg_el1(pmscr_el1, SYS_PMSCR);
+	write_sysreg_el1(ctxt->sys_regs[PMSCR_EL1], SYS_PMSCR);
 }
 
 static void __hyp_text __debug_save_state(struct kvm_vcpu *vcpu,
@@ -194,14 +194,15 @@ void __hyp_text __debug_restore_host_context(struct kvm_vcpu *vcpu)
 	struct kvm_guest_debug_arch *host_dbg;
 	struct kvm_guest_debug_arch *guest_dbg;
 
+	host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
+	guest_ctxt = &vcpu->arch.ctxt;
+
 	if (!has_vhe())
-		__debug_restore_spe_nvhe(vcpu->arch.host_debug_state.pmscr_el1);
+		__debug_restore_spe_nvhe(host_ctxt);
 
 	if (!(vcpu->arch.flags & KVM_ARM64_DEBUG_DIRTY))
 		return;
 
-	host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
-	guest_ctxt = &vcpu->arch.ctxt;
 	host_dbg = &vcpu->arch.host_debug_state.regs;
 	guest_dbg = kern_hyp_va(vcpu->arch.debug_ptr);
 
@@ -217,8 +218,11 @@ void __hyp_text __debug_save_host_context(struct kvm_vcpu *vcpu)
 	 * Non-VHE: Disable and flush SPE data generation
 	 * VHE: The vcpu can run, but it can't hide.
 	 */
+	struct kvm_cpu_context *host_ctxt;
+
+	host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
 	if (!has_vhe())
-		__debug_save_spe_nvhe(&vcpu->arch.host_debug_state.pmscr_el1);
+		__debug_save_spe_nvhe(host_ctxt);
 }
 
 void __hyp_text __debug_save_guest_context(struct kvm_vcpu *vcpu)
-- 
2.21.0



* [PATCH v2 08/18] arm64: KVM: add support to save/restore SPE profiling buffer controls
  2019-12-20 14:30 [PATCH v2 00/18] arm64: KVM: add SPE profiling support Andrew Murray
                   ` (6 preceding siblings ...)
  2019-12-20 14:30 ` [PATCH v2 07/18] arm64: KVM/debug: drop pmscr_el1 and use sys_regs[PMSCR_EL1] in kvm_cpu_context Andrew Murray
@ 2019-12-20 14:30 ` Andrew Murray
  2019-12-21 13:57   ` Marc Zyngier
  2019-12-20 14:30 ` [PATCH v2 09/18] arm64: KVM: enable conditional save/restore full " Andrew Murray
                   ` (11 subsequent siblings)
  19 siblings, 1 reply; 78+ messages in thread
From: Andrew Murray @ 2019-12-20 14:30 UTC (permalink / raw)
  To: Marc Zyngier, Catalin Marinas, Will Deacon
  Cc: Sudeep Holla, kvmarm, linux-arm-kernel, kvm, linux-kernel, Mark Rutland

From: Sudeep Holla <sudeep.holla@arm.com>

Currently, since we don't support profiling using SPE in the guests,
we just save PMSCR_EL1, flush the profiling buffers and disable
sampling. However, in order to support simultaneous sampling both in
the host and guests, we need to save and restore the complete SPE
profiling buffer controls' context.

Let's add support for this and keep it disabled for now.
We can enable it conditionally only if guests are allowed to use
SPE.

Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
[ Clear PMBSR bit when saving state to prevent spurious interrupts ]
Signed-off-by: Andrew Murray <andrew.murray@arm.com>
---
 arch/arm64/kvm/hyp/debug-sr.c | 51 +++++++++++++++++++++++++++++------
 1 file changed, 43 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/kvm/hyp/debug-sr.c b/arch/arm64/kvm/hyp/debug-sr.c
index 8a70a493345e..12429b212a3a 100644
--- a/arch/arm64/kvm/hyp/debug-sr.c
+++ b/arch/arm64/kvm/hyp/debug-sr.c
@@ -85,7 +85,8 @@
 	default:	write_debug(ptr[0], reg, 0);			\
 	}
 
-static void __hyp_text __debug_save_spe_nvhe(struct kvm_cpu_context *ctxt)
+static void __hyp_text
+__debug_save_spe_nvhe(struct kvm_cpu_context *ctxt, bool full_ctxt)
 {
 	u64 reg;
 
@@ -102,22 +103,46 @@ static void __hyp_text __debug_save_spe_nvhe(struct kvm_cpu_context *ctxt)
 	if (reg & BIT(SYS_PMBIDR_EL1_P_SHIFT))
 		return;
 
-	/* No; is the host actually using the thing? */
-	reg = read_sysreg_s(SYS_PMBLIMITR_EL1);
-	if (!(reg & BIT(SYS_PMBLIMITR_EL1_E_SHIFT)))
+	/* Save the control register and disable data generation */
+	ctxt->sys_regs[PMSCR_EL1] = read_sysreg_el1(SYS_PMSCR);
+
+	if (!ctxt->sys_regs[PMSCR_EL1])
 		return;
 
 	/* Yes; save the control register and disable data generation */
-	ctxt->sys_regs[PMSCR_EL1] = read_sysreg_el1(SYS_PMSCR);
 	write_sysreg_el1(0, SYS_PMSCR);
 	isb();
 
 	/* Now drain all buffered data to memory */
 	psb_csync();
 	dsb(nsh);
+
+	if (!full_ctxt)
+		return;
+
+	ctxt->sys_regs[PMBLIMITR_EL1] = read_sysreg_s(SYS_PMBLIMITR_EL1);
+	write_sysreg_s(0, SYS_PMBLIMITR_EL1);
+
+	/*
+	 * As PMBSR is conditionally restored when returning to the host we
+	 * must ensure the service bit is unset here to prevent a spurious
+	 * host SPE interrupt from being raised.
+	 */
+	ctxt->sys_regs[PMBSR_EL1] = read_sysreg_s(SYS_PMBSR_EL1);
+	write_sysreg_s(0, SYS_PMBSR_EL1);
+
+	isb();
+
+	ctxt->sys_regs[PMSICR_EL1] = read_sysreg_s(SYS_PMSICR_EL1);
+	ctxt->sys_regs[PMSIRR_EL1] = read_sysreg_s(SYS_PMSIRR_EL1);
+	ctxt->sys_regs[PMSFCR_EL1] = read_sysreg_s(SYS_PMSFCR_EL1);
+	ctxt->sys_regs[PMSEVFR_EL1] = read_sysreg_s(SYS_PMSEVFR_EL1);
+	ctxt->sys_regs[PMSLATFR_EL1] = read_sysreg_s(SYS_PMSLATFR_EL1);
+	ctxt->sys_regs[PMBPTR_EL1] = read_sysreg_s(SYS_PMBPTR_EL1);
 }
 
-static void __hyp_text __debug_restore_spe_nvhe(struct kvm_cpu_context *ctxt)
+static void __hyp_text
+__debug_restore_spe_nvhe(struct kvm_cpu_context *ctxt, bool full_ctxt)
 {
 	if (!ctxt->sys_regs[PMSCR_EL1])
 		return;
@@ -126,6 +151,16 @@ static void __hyp_text __debug_restore_spe_nvhe(struct kvm_cpu_context *ctxt)
 	isb();
 
 	/* Re-enable data generation */
+	if (full_ctxt) {
+		write_sysreg_s(ctxt->sys_regs[PMBPTR_EL1], SYS_PMBPTR_EL1);
+		write_sysreg_s(ctxt->sys_regs[PMBLIMITR_EL1], SYS_PMBLIMITR_EL1);
+		write_sysreg_s(ctxt->sys_regs[PMSFCR_EL1], SYS_PMSFCR_EL1);
+		write_sysreg_s(ctxt->sys_regs[PMSEVFR_EL1], SYS_PMSEVFR_EL1);
+		write_sysreg_s(ctxt->sys_regs[PMSLATFR_EL1], SYS_PMSLATFR_EL1);
+		write_sysreg_s(ctxt->sys_regs[PMSIRR_EL1], SYS_PMSIRR_EL1);
+		write_sysreg_s(ctxt->sys_regs[PMSICR_EL1], SYS_PMSICR_EL1);
+		write_sysreg_s(ctxt->sys_regs[PMBSR_EL1], SYS_PMBSR_EL1);
+	}
 	write_sysreg_el1(ctxt->sys_regs[PMSCR_EL1], SYS_PMSCR);
 }
 
@@ -198,7 +233,7 @@ void __hyp_text __debug_restore_host_context(struct kvm_vcpu *vcpu)
 	guest_ctxt = &vcpu->arch.ctxt;
 
 	if (!has_vhe())
-		__debug_restore_spe_nvhe(host_ctxt);
+		__debug_restore_spe_nvhe(host_ctxt, false);
 
 	if (!(vcpu->arch.flags & KVM_ARM64_DEBUG_DIRTY))
 		return;
@@ -222,7 +257,7 @@ void __hyp_text __debug_save_host_context(struct kvm_vcpu *vcpu)
 
 	host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
 	if (!has_vhe())
-		__debug_save_spe_nvhe(host_ctxt);
+		__debug_save_spe_nvhe(host_ctxt, false);
 }
 
 void __hyp_text __debug_save_guest_context(struct kvm_vcpu *vcpu)
-- 
2.21.0


* [PATCH v2 09/18] arm64: KVM: enable conditional save/restore full SPE profiling buffer controls
  2019-12-20 14:30 [PATCH v2 00/18] arm64: KVM: add SPE profiling support Andrew Murray
                   ` (7 preceding siblings ...)
  2019-12-20 14:30 ` [PATCH v2 08/18] arm64: KVM: add support to save/restore SPE profiling buffer controls Andrew Murray
@ 2019-12-20 14:30 ` " Andrew Murray
  2019-12-20 18:06   ` Mark Rutland
  2019-12-21 14:13   ` Marc Zyngier
  2019-12-20 14:30 ` [PATCH v2 10/18] arm64: KVM/debug: use EL1&0 stage 1 translation regime Andrew Murray
                   ` (10 subsequent siblings)
  19 siblings, 2 replies; 78+ messages in thread
From: Andrew Murray @ 2019-12-20 14:30 UTC (permalink / raw)
  To: Marc Zyngier, Catalin Marinas, Will Deacon
  Cc: Sudeep Holla, kvmarm, linux-arm-kernel, kvm, linux-kernel, Mark Rutland

From: Sudeep Holla <sudeep.holla@arm.com>

Now that we can save/restore the full SPE controls, we can enable it
if SPE is set up and ready to use in KVM. It's supported in KVM only if
all the CPUs in the system support SPE.

However, to support heterogeneous systems, we need to move the check
for whether the host supports SPE and do a partial save/restore.

Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Andrew Murray <andrew.murray@arm.com>
---
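Not part of the patch: a small sketch of the decision the reworked
save paths make, assuming PMSVer sits in ID_AA64DFR0_EL1 bits [35:32]
and that a non-zero value means SPE is implemented. vcpu_model,
spe_switch and the two *_save_mode() helpers are hypothetical names
used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PMSVER_SHIFT	32	/* ID_AA64DFR0_EL1.PMSVer, bits [35:32] */

/* Extract a 4-bit unsigned ID register field. */
static inline unsigned int extract_unsigned_field(uint64_t reg, int shift)
{
	return (unsigned int)((reg >> shift) & 0xf);
}

/* Hypothetical stand-in for what kvm_arm_spe_v1_ready() looks at. */
struct vcpu_model {
	bool spe_ready;
};

enum spe_switch { SPE_SKIP, SPE_PARTIAL, SPE_FULL };

/* Host: skip without SPE on this CPU, else full only if the guest is ready. */
static inline enum spe_switch host_save_mode(uint64_t dfr0,
					     const struct vcpu_model *v)
{
	assert(v);

	if (!extract_unsigned_field(dfr0, PMSVER_SHIFT))
		return SPE_SKIP;

	return v->spe_ready ? SPE_FULL : SPE_PARTIAL;
}

/* Guest: state is only saved when SPE is ready for this vcpu. */
static inline enum spe_switch guest_save_mode(const struct vcpu_model *v)
{
	assert(v);

	return v->spe_ready ? SPE_FULL : SPE_SKIP;
}
```

The host therefore keeps the old "disable and flush" behaviour when the
guest has no SPE, and only pays for the full switch when it does.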
 arch/arm64/kvm/hyp/debug-sr.c | 33 ++++++++++++++++-----------------
 include/kvm/arm_spe.h         |  6 ++++++
 2 files changed, 22 insertions(+), 17 deletions(-)

diff --git a/arch/arm64/kvm/hyp/debug-sr.c b/arch/arm64/kvm/hyp/debug-sr.c
index 12429b212a3a..d8d857067e6d 100644
--- a/arch/arm64/kvm/hyp/debug-sr.c
+++ b/arch/arm64/kvm/hyp/debug-sr.c
@@ -86,18 +86,13 @@
 	}
 
 static void __hyp_text
-__debug_save_spe_nvhe(struct kvm_cpu_context *ctxt, bool full_ctxt)
+__debug_save_spe_context(struct kvm_cpu_context *ctxt, bool full_ctxt)
 {
 	u64 reg;
 
 	/* Clear pmscr in case of early return */
 	ctxt->sys_regs[PMSCR_EL1] = 0;
 
-	/* SPE present on this CPU? */
-	if (!cpuid_feature_extract_unsigned_field(read_sysreg(id_aa64dfr0_el1),
-						  ID_AA64DFR0_PMSVER_SHIFT))
-		return;
-
 	/* Yes; is it owned by higher EL? */
 	reg = read_sysreg_s(SYS_PMBIDR_EL1);
 	if (reg & BIT(SYS_PMBIDR_EL1_P_SHIFT))
@@ -142,7 +137,7 @@ __debug_save_spe_nvhe(struct kvm_cpu_context *ctxt, bool full_ctxt)
 }
 
 static void __hyp_text
-__debug_restore_spe_nvhe(struct kvm_cpu_context *ctxt, bool full_ctxt)
+__debug_restore_spe_context(struct kvm_cpu_context *ctxt, bool full_ctxt)
 {
 	if (!ctxt->sys_regs[PMSCR_EL1])
 		return;
@@ -210,11 +205,14 @@ void __hyp_text __debug_restore_guest_context(struct kvm_vcpu *vcpu)
 	struct kvm_guest_debug_arch *host_dbg;
 	struct kvm_guest_debug_arch *guest_dbg;
 
+	host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
+	guest_ctxt = &vcpu->arch.ctxt;
+
+	__debug_restore_spe_context(guest_ctxt, kvm_arm_spe_v1_ready(vcpu));
+
 	if (!(vcpu->arch.flags & KVM_ARM64_DEBUG_DIRTY))
 		return;
 
-	host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
-	guest_ctxt = &vcpu->arch.ctxt;
 	host_dbg = &vcpu->arch.host_debug_state.regs;
 	guest_dbg = kern_hyp_va(vcpu->arch.debug_ptr);
 
@@ -232,8 +230,7 @@ void __hyp_text __debug_restore_host_context(struct kvm_vcpu *vcpu)
 	host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
 	guest_ctxt = &vcpu->arch.ctxt;
 
-	if (!has_vhe())
-		__debug_restore_spe_nvhe(host_ctxt, false);
+	__debug_restore_spe_context(host_ctxt, kvm_arm_spe_v1_ready(vcpu));
 
 	if (!(vcpu->arch.flags & KVM_ARM64_DEBUG_DIRTY))
 		return;
@@ -249,19 +246,21 @@ void __hyp_text __debug_restore_host_context(struct kvm_vcpu *vcpu)
 
 void __hyp_text __debug_save_host_context(struct kvm_vcpu *vcpu)
 {
-	/*
-	 * Non-VHE: Disable and flush SPE data generation
-	 * VHE: The vcpu can run, but it can't hide.
-	 */
 	struct kvm_cpu_context *host_ctxt;
 
 	host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
-	if (!has_vhe())
-		__debug_save_spe_nvhe(host_ctxt, false);
+	if (cpuid_feature_extract_unsigned_field(read_sysreg(id_aa64dfr0_el1),
+						 ID_AA64DFR0_PMSVER_SHIFT))
+		__debug_save_spe_context(host_ctxt, kvm_arm_spe_v1_ready(vcpu));
 }
 
 void __hyp_text __debug_save_guest_context(struct kvm_vcpu *vcpu)
 {
+	bool kvm_spe_ready = kvm_arm_spe_v1_ready(vcpu);
+
+	/* SPE present on this vCPU? */
+	if (kvm_spe_ready)
+		__debug_save_spe_context(&vcpu->arch.ctxt, kvm_spe_ready);
 }
 
 u32 __hyp_text __kvm_get_mdcr_el2(void)
diff --git a/include/kvm/arm_spe.h b/include/kvm/arm_spe.h
index 48d118fdb174..30c40b1bc385 100644
--- a/include/kvm/arm_spe.h
+++ b/include/kvm/arm_spe.h
@@ -16,4 +16,10 @@ struct kvm_spe {
 	bool irq_level;
 };
 
+#ifdef CONFIG_KVM_ARM_SPE
+#define kvm_arm_spe_v1_ready(v)		((v)->arch.spe.ready)
+#else
+#define kvm_arm_spe_v1_ready(v)		(false)
+#endif /* CONFIG_KVM_ARM_SPE */
+
 #endif /* __ASM_ARM_KVM_SPE_H */
-- 
2.21.0


* [PATCH v2 10/18] arm64: KVM/debug: use EL1&0 stage 1 translation regime
  2019-12-20 14:30 [PATCH v2 00/18] arm64: KVM: add SPE profiling support Andrew Murray
                   ` (8 preceding siblings ...)
  2019-12-20 14:30 ` [PATCH v2 09/18] arm64: KVM: enable conditional save/restore full " Andrew Murray
@ 2019-12-20 14:30 ` Andrew Murray
  2019-12-22 10:34   ` Marc Zyngier
  2019-12-20 14:30 ` [PATCH v2 11/18] KVM: arm64: don't trap Statistical Profiling controls to EL2 Andrew Murray
                   ` (9 subsequent siblings)
  19 siblings, 1 reply; 78+ messages in thread
From: Andrew Murray @ 2019-12-20 14:30 UTC (permalink / raw)
  To: Marc Zyngier, Catalin Marinas, Will Deacon
  Cc: Sudeep Holla, kvmarm, linux-arm-kernel, kvm, linux-kernel, Mark Rutland

From: Sudeep Holla <sudeep.holla@arm.com>

Now that we have all the save/restore mechanisms in place, let's switch
the translation regime used by the profiling buffer from EL2 stage 1 to
EL1&0 stage 1 on VHE systems.

Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
[ Reword commit, don't trap to EL2 ]
Signed-off-by: Andrew Murray <andrew.murray@arm.com>
---
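Not part of the patch: a short sketch of the MDCR_EL2 update, assuming
E2PB occupies bits [13:12] and that the value 0b11 selects the EL1&0
translation regime for the profiling buffer without trapping the buffer
controls (my reading of the architecture, not verified here). The
helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MDCR_EL2_E2PB_SHIFT	12	/* E2PB is MDCR_EL2[13:12] */
#define MDCR_EL2_E2PB_MASK	0x3ULL

/* Explicitly parenthesised form. */
static inline uint64_t mdcr_with_guest_buffer(uint64_t mdcr)
{
	return mdcr | (MDCR_EL2_E2PB_MASK << MDCR_EL2_E2PB_SHIFT);
}

/* The expression as written in the patch: '<<' binds tighter than '|'. */
static inline uint64_t mdcr_as_written(uint64_t mdcr)
{
	return mdcr | 3 << MDCR_EL2_E2PB_SHIFT;
}
```

Both helpers produce the same value; the parenthesised form just makes
the precedence obvious to the reader.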
 arch/arm64/kvm/hyp/switch.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 67b7c160f65b..6c153b79829b 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -100,6 +100,7 @@ static void activate_traps_vhe(struct kvm_vcpu *vcpu)
 
 	write_sysreg(val, cpacr_el1);
 
+	write_sysreg(vcpu->arch.mdcr_el2 | 3 << MDCR_EL2_E2PB_SHIFT, mdcr_el2);
 	write_sysreg(kvm_get_hyp_vector(), vbar_el1);
 }
 NOKPROBE_SYMBOL(activate_traps_vhe);
@@ -117,6 +118,7 @@ static void __hyp_text __activate_traps_nvhe(struct kvm_vcpu *vcpu)
 		__activate_traps_fpsimd32(vcpu);
 	}
 
+	write_sysreg(vcpu->arch.mdcr_el2 | 3 << MDCR_EL2_E2PB_SHIFT, mdcr_el2);
 	write_sysreg(val, cptr_el2);
 
 	if (cpus_have_const_cap(ARM64_WORKAROUND_1319367)) {
-- 
2.21.0


* [PATCH v2 11/18] KVM: arm64: don't trap Statistical Profiling controls to EL2
  2019-12-20 14:30 [PATCH v2 00/18] arm64: KVM: add SPE profiling support Andrew Murray
                   ` (9 preceding siblings ...)
  2019-12-20 14:30 ` [PATCH v2 10/18] arm64: KVM/debug: use EL1&0 stage 1 translation regime Andrew Murray
@ 2019-12-20 14:30 ` Andrew Murray
  2019-12-20 18:08   ` Mark Rutland
  2019-12-22 10:42   ` Marc Zyngier
  2019-12-20 14:30 ` [PATCH v2 12/18] KVM: arm64: add a new vcpu device control group for SPEv1 Andrew Murray
                   ` (8 subsequent siblings)
  19 siblings, 2 replies; 78+ messages in thread
From: Andrew Murray @ 2019-12-20 14:30 UTC (permalink / raw)
  To: Marc Zyngier, Catalin Marinas, Will Deacon
  Cc: Sudeep Holla, kvmarm, linux-arm-kernel, kvm, linux-kernel, Mark Rutland

As we now save/restore the profiler state, there is no need to trap
accesses to the statistical profiling controls. Let's stop setting the
MDCR_EL2_TPMS bit.

Signed-off-by: Andrew Murray <andrew.murray@arm.com>
---
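Not part of the patch: a sketch of how the trap mask changes, assuming
the MDCR_EL2 bit positions below (TPMS at bit 14, TDRA 11, TDOSA 10,
TPM 6, TPMCR 5, HPMN in bits [4:0]). setup_mdcr() is a hypothetical
helper; the trap_spe argument exists only to compare old and new
behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MDCR_EL2_TPMS		(1ULL << 14)
#define MDCR_EL2_TDRA		(1ULL << 11)
#define MDCR_EL2_TDOSA		(1ULL << 10)
#define MDCR_EL2_TPM		(1ULL << 6)
#define MDCR_EL2_TPMCR		(1ULL << 5)
#define MDCR_EL2_HPMN_MASK	0x1fULL

/* Keep the host HPMN value and add the traps that KVM always wants. */
static inline uint64_t setup_mdcr(uint64_t host_mdcr, bool trap_spe)
{
	uint64_t mdcr = host_mdcr & MDCR_EL2_HPMN_MASK;

	mdcr |= MDCR_EL2_TPM | MDCR_EL2_TPMCR |
		MDCR_EL2_TDRA | MDCR_EL2_TDOSA;
	if (trap_spe)		/* behaviour before this patch */
		mdcr |= MDCR_EL2_TPMS;

	return mdcr;
}
```

After this patch the equivalent of trap_spe is always false, so bit 14
stays clear in vcpu->arch.mdcr_el2.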
 arch/arm64/kvm/debug.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c
index 43487f035385..07ca783e7d9e 100644
--- a/arch/arm64/kvm/debug.c
+++ b/arch/arm64/kvm/debug.c
@@ -88,7 +88,6 @@ void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu)
  *  - Performance monitors (MDCR_EL2_TPM/MDCR_EL2_TPMCR)
  *  - Debug ROM Address (MDCR_EL2_TDRA)
  *  - OS related registers (MDCR_EL2_TDOSA)
- *  - Statistical profiler (MDCR_EL2_TPMS/MDCR_EL2_E2PB)
  *
  * Additionally, KVM only traps guest accesses to the debug registers if
  * the guest is not actively using them (see the KVM_ARM64_DEBUG_DIRTY
@@ -111,7 +110,6 @@ void kvm_arm_setup_debug(struct kvm_vcpu *vcpu)
 	 */
 	vcpu->arch.mdcr_el2 = __this_cpu_read(mdcr_el2) & MDCR_EL2_HPMN_MASK;
 	vcpu->arch.mdcr_el2 |= (MDCR_EL2_TPM |
-				MDCR_EL2_TPMS |
 				MDCR_EL2_TPMCR |
 				MDCR_EL2_TDRA |
 				MDCR_EL2_TDOSA);
-- 
2.21.0


* [PATCH v2 12/18] KVM: arm64: add a new vcpu device control group for SPEv1
  2019-12-20 14:30 [PATCH v2 00/18] arm64: KVM: add SPE profiling support Andrew Murray
                   ` (10 preceding siblings ...)
  2019-12-20 14:30 ` [PATCH v2 11/18] KVM: arm64: don't trap Statistical Profiling controls to EL2 Andrew Murray
@ 2019-12-20 14:30 ` Andrew Murray
  2019-12-22 11:03   ` Marc Zyngier
  2019-12-20 14:30 ` [PATCH v2 13/18] perf: arm_spe: Add KVM structure for obtaining IRQ info Andrew Murray
                   ` (7 subsequent siblings)
  19 siblings, 1 reply; 78+ messages in thread
From: Andrew Murray @ 2019-12-20 14:30 UTC (permalink / raw)
  To: Marc Zyngier, Catalin Marinas, Will Deacon
  Cc: Sudeep Holla, kvmarm, linux-arm-kernel, kvm, linux-kernel, Mark Rutland

From: Sudeep Holla <sudeep.holla@arm.com>

To configure the virtual SPEv1 overflow interrupt number, we use the
vcpu kvm_device ioctl, encapsulating the KVM_ARM_VCPU_SPE_V1_IRQ
attribute within the KVM_ARM_VCPU_SPE_V1_CTRL group.

After configuring the SPEv1, call the vcpu ioctl with attribute
KVM_ARM_VCPU_SPE_V1_INIT to initialize the SPEv1.

Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Andrew Murray <andrew.murray@arm.com>
---
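Not part of the patch: a userspace sketch of how a VMM might drive the
new attributes, assuming a vcpu fd that was created with the
KVM_ARM_VCPU_SPE_V1 feature and an already initialized in-kernel vgic.
The group and attribute numbers are the ones proposed in this patch and
are defined locally in case the installed headers lack them; the helper
names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Values proposed by this patch; not assumed to be in installed headers. */
#ifndef KVM_ARM_VCPU_SPE_V1_CTRL
#define KVM_ARM_VCPU_SPE_V1_CTRL	3
#define KVM_ARM_VCPU_SPE_V1_IRQ		0
#define KVM_ARM_VCPU_SPE_V1_INIT	1
#endif

/* Attribute that carries a pointer to the overflow interrupt number. */
static inline struct kvm_device_attr spe_irq_attr(int *irq)
{
	struct kvm_device_attr attr = {
		.group = KVM_ARM_VCPU_SPE_V1_CTRL,
		.attr = KVM_ARM_VCPU_SPE_V1_IRQ,
		.addr = (uint64_t)(uintptr_t)irq,
	};

	assert(irq != NULL);
	return attr;
}

/* Attribute that requests initialization; it takes no parameter. */
static inline struct kvm_device_attr spe_init_attr(void)
{
	struct kvm_device_attr attr = {
		.group = KVM_ARM_VCPU_SPE_V1_CTRL,
		.attr = KVM_ARM_VCPU_SPE_V1_INIT,
	};

	return attr;
}

/* Set the IRQ, then initialize. Returns 0, or -1 with errno set. */
static inline int spe_setup_vcpu(int vcpu_fd, int irq)
{
	struct kvm_device_attr attr = spe_irq_attr(&irq);

	if (ioctl(vcpu_fd, KVM_SET_DEVICE_ATTR, &attr) < 0)
		return -1;

	attr = spe_init_attr();
	return ioctl(vcpu_fd, KVM_SET_DEVICE_ATTR, &attr);
}
```

A caller would use something like spe_setup_vcpu(vcpu_fd, 21) for every
vcpu, where 21 is only an example value in the PPI range 16..31; the
same number has to be used for all vcpus of the VM.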
 Documentation/virt/kvm/devices/vcpu.txt |  28 ++++
 arch/arm64/include/asm/kvm_host.h       |   2 +-
 arch/arm64/include/uapi/asm/kvm.h       |   4 +
 arch/arm64/kvm/Makefile                 |   1 +
 arch/arm64/kvm/guest.c                  |   6 +
 arch/arm64/kvm/reset.c                  |   3 +
 include/kvm/arm_spe.h                   |  45 +++++++
 include/uapi/linux/kvm.h                |   1 +
 virt/kvm/arm/arm.c                      |   1 +
 virt/kvm/arm/spe.c                      | 163 ++++++++++++++++++++++++
 10 files changed, 253 insertions(+), 1 deletion(-)
 create mode 100644 virt/kvm/arm/spe.c

diff --git a/Documentation/virt/kvm/devices/vcpu.txt b/Documentation/virt/kvm/devices/vcpu.txt
index 6f3bd64a05b0..cefad056d677 100644
--- a/Documentation/virt/kvm/devices/vcpu.txt
+++ b/Documentation/virt/kvm/devices/vcpu.txt
@@ -74,3 +74,31 @@ Specifies the base address of the stolen time structure for this VCPU. The
 base address must be 64 byte aligned and exist within a valid guest memory
 region. See Documentation/virt/kvm/arm/pvtime.txt for more information
 including the layout of the stolen time structure.
+
+4. GROUP: KVM_ARM_VCPU_SPE_V1_CTRL
+Architectures: ARM64
+
+4.1. ATTRIBUTE: KVM_ARM_VCPU_SPE_V1_IRQ
+Parameters: in kvm_device_attr.addr the address for SPE buffer overflow interrupt
+	    is a pointer to an int
+Returns: -EBUSY: The SPE overflow interrupt is already set
+         -ENXIO: The overflow interrupt is not set when attempting to get it
+         -ENODEV: SPEv1 not supported
+         -EINVAL: Invalid SPE overflow interrupt number supplied or
+                  trying to set the IRQ number without using an in-kernel
+                  irqchip.
+
+A value describing the SPEv1 (Statistical Profiling Extension v1) overflow
+interrupt number for this vcpu. This interrupt must be a PPI, and the interrupt
+type and number must be the same for each vcpu.
+
+4.2 ATTRIBUTE: KVM_ARM_VCPU_SPE_V1_INIT
+Parameters: no additional parameter in kvm_device_attr.addr
+Returns: -ENODEV: SPEv1 not supported or GIC not initialized
+         -ENXIO: SPEv1 not properly configured or in-kernel irqchip not
+                 configured as required prior to calling this attribute
+         -EBUSY: SPEv1 already initialized
+
+Request the initialization of the SPEv1.  If using the SPEv1 with an in-kernel
+virtual GIC implementation, this must be done after initializing the in-kernel
+irqchip.
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 333c6491bec7..d00f450dc4cd 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -39,7 +39,7 @@
 
 #define KVM_MAX_VCPUS VGIC_V3_MAX_CPUS
 
-#define KVM_VCPU_MAX_FEATURES 7
+#define KVM_VCPU_MAX_FEATURES 8
 
 #define KVM_REQ_SLEEP \
 	KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index 820e5751ada7..905a73f30079 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -106,6 +106,7 @@ struct kvm_regs {
 #define KVM_ARM_VCPU_SVE		4 /* enable SVE for this CPU */
 #define KVM_ARM_VCPU_PTRAUTH_ADDRESS	5 /* VCPU uses address authentication */
 #define KVM_ARM_VCPU_PTRAUTH_GENERIC	6 /* VCPU uses generic authentication */
+#define KVM_ARM_VCPU_SPE_V1		7 /* Support guest SPEv1 */
 
 struct kvm_vcpu_init {
 	__u32 target;
@@ -326,6 +327,9 @@ struct kvm_vcpu_events {
 #define   KVM_ARM_VCPU_TIMER_IRQ_PTIMER		1
 #define KVM_ARM_VCPU_PVTIME_CTRL	2
 #define   KVM_ARM_VCPU_PVTIME_IPA	0
+#define KVM_ARM_VCPU_SPE_V1_CTRL	3
+#define   KVM_ARM_VCPU_SPE_V1_IRQ	0
+#define   KVM_ARM_VCPU_SPE_V1_INIT	1
 
 /* KVM_IRQ_LINE irq field index values */
 #define KVM_ARM_IRQ_VCPU2_SHIFT		28
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index 5ffbdc39e780..526f3bf09321 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -37,3 +37,4 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic/vgic-debug.o
 kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/irqchip.o
 kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/arch_timer.o
 kvm-$(CONFIG_KVM_ARM_PMU) += $(KVM)/arm/pmu.o
+kvm-$(CONFIG_KVM_ARM_SPE) += $(KVM)/arm/spe.o
diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index 2fff06114a8f..50fea538b8bd 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -874,6 +874,9 @@ int kvm_arm_vcpu_arch_set_attr(struct kvm_vcpu *vcpu,
 		break;
 	case KVM_ARM_VCPU_PVTIME_CTRL:
 		ret = kvm_arm_pvtime_set_attr(vcpu, attr);
+		break;
+	case KVM_ARM_VCPU_SPE_V1_CTRL:
+		ret = kvm_arm_spe_v1_set_attr(vcpu, attr);
 		break;
 	default:
 		ret = -ENXIO;
@@ -897,6 +900,9 @@ int kvm_arm_vcpu_arch_get_attr(struct kvm_vcpu *vcpu,
 		break;
 	case KVM_ARM_VCPU_PVTIME_CTRL:
 		ret = kvm_arm_pvtime_get_attr(vcpu, attr);
+		break;
+	case KVM_ARM_VCPU_SPE_V1_CTRL:
+		ret = kvm_arm_spe_v1_get_attr(vcpu, attr);
 		break;
 	default:
 		ret = -ENXIO;
@@ -920,6 +926,9 @@ int kvm_arm_vcpu_arch_has_attr(struct kvm_vcpu *vcpu,
 		break;
 	case KVM_ARM_VCPU_PVTIME_CTRL:
 		ret = kvm_arm_pvtime_has_attr(vcpu, attr);
+		break;
+	case KVM_ARM_VCPU_SPE_V1_CTRL:
+		ret = kvm_arm_spe_v1_has_attr(vcpu, attr);
 		break;
 	default:
 		ret = -ENXIO;
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index f4a8ae918827..cf17aff1489d 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -80,6 +80,9 @@ int kvm_arch_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_ARM_INJECT_SERROR_ESR:
 		r = cpus_have_const_cap(ARM64_HAS_RAS_EXTN);
 		break;
+	case KVM_CAP_ARM_SPE_V1:
+		r = kvm_arm_support_spe_v1();
+		break;
 	case KVM_CAP_SET_GUEST_DEBUG:
 	case KVM_CAP_VCPU_ATTRIBUTES:
 		r = 1;
diff --git a/include/kvm/arm_spe.h b/include/kvm/arm_spe.h
index 30c40b1bc385..d1f3c564dfd0 100644
--- a/include/kvm/arm_spe.h
+++ b/include/kvm/arm_spe.h
@@ -8,6 +8,7 @@
 
 #include <uapi/linux/kvm.h>
 #include <linux/kvm_host.h>
+#include <linux/cpufeature.h>
 
 struct kvm_spe {
 	int irq_num;
@@ -18,8 +19,52 @@ struct kvm_spe {
 
 #ifdef CONFIG_KVM_ARM_SPE
 #define kvm_arm_spe_v1_ready(v)		((v)->arch.spe.ready)
+#define kvm_arm_spe_irq_initialized(v)		\
+	((v)->arch.spe.irq_num >= VGIC_NR_SGIS &&	\
+	(v)->arch.spe.irq_num <= VGIC_MAX_PRIVATE)
+
+static inline bool kvm_arm_support_spe_v1(void)
+{
+	u64 dfr0 = read_sanitised_ftr_reg(SYS_ID_AA64DFR0_EL1);
+
+	return !!cpuid_feature_extract_unsigned_field(dfr0,
+						      ID_AA64DFR0_PMSVER_SHIFT);
+}
+
+int kvm_arm_spe_v1_set_attr(struct kvm_vcpu *vcpu,
+			    struct kvm_device_attr *attr);
+int kvm_arm_spe_v1_get_attr(struct kvm_vcpu *vcpu,
+			    struct kvm_device_attr *attr);
+int kvm_arm_spe_v1_has_attr(struct kvm_vcpu *vcpu,
+			    struct kvm_device_attr *attr);
+int kvm_arm_spe_v1_enable(struct kvm_vcpu *vcpu);
 #else
 #define kvm_arm_spe_v1_ready(v)		(false)
+#define kvm_arm_support_spe_v1()	(false)
+#define kvm_arm_spe_irq_initialized(v)	(false)
+
+static inline int kvm_arm_spe_v1_set_attr(struct kvm_vcpu *vcpu,
+					  struct kvm_device_attr *attr)
+{
+	return -ENXIO;
+}
+
+static inline int kvm_arm_spe_v1_get_attr(struct kvm_vcpu *vcpu,
+					  struct kvm_device_attr *attr)
+{
+	return -ENXIO;
+}
+
+static inline int kvm_arm_spe_v1_has_attr(struct kvm_vcpu *vcpu,
+					  struct kvm_device_attr *attr)
+{
+	return -ENXIO;
+}
+
+static inline int kvm_arm_spe_v1_enable(struct kvm_vcpu *vcpu)
+{
+	return 0;
+}
 #endif /* CONFIG_KVM_ARM_SPE */
 
 #endif /* __ASM_ARM_KVM_SPE_H */
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index f0a16b4adbbd..1a362c230e4a 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1009,6 +1009,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_PPC_GUEST_DEBUG_SSTEP 176
 #define KVM_CAP_ARM_NISV_TO_USER 177
 #define KVM_CAP_ARM_INJECT_EXT_DABT 178
+#define KVM_CAP_ARM_SPE_V1 179
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index 12e0280291ce..340d2388ee2c 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -22,6 +22,7 @@
 #include <trace/events/kvm.h>
 #include <kvm/arm_pmu.h>
 #include <kvm/arm_psci.h>
+#include <kvm/arm_spe.h>
 
 #define CREATE_TRACE_POINTS
 #include "trace.h"
diff --git a/virt/kvm/arm/spe.c b/virt/kvm/arm/spe.c
new file mode 100644
index 000000000000..83ac2cce2cc3
--- /dev/null
+++ b/virt/kvm/arm/spe.c
@@ -0,0 +1,163 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2019 ARM Ltd.
+ */
+
+#include <linux/cpu.h>
+#include <linux/kvm.h>
+#include <linux/kvm_host.h>
+#include <linux/uaccess.h>
+#include <asm/kvm_emulate.h>
+#include <kvm/arm_spe.h>
+#include <kvm/arm_vgic.h>
+
+int kvm_arm_spe_v1_enable(struct kvm_vcpu *vcpu)
+{
+	if (!vcpu->arch.spe.created)
+		return 0;
+
+	/*
+	 * A valid interrupt configuration for the SPE requires a properly
+	 * configured interrupt number when using an in-kernel irqchip.
+	 */
+	if (irqchip_in_kernel(vcpu->kvm)) {
+		int irq = vcpu->arch.spe.irq_num;
+
+		if (!kvm_arm_spe_irq_initialized(vcpu))
+			return -EINVAL;
+
+		if (!irq_is_ppi(irq))
+			return -EINVAL;
+	}
+
+	vcpu->arch.spe.ready = true;
+
+	return 0;
+}
+
+static int kvm_arm_spe_v1_init(struct kvm_vcpu *vcpu)
+{
+	if (!kvm_arm_support_spe_v1())
+		return -ENODEV;
+
+	if (!test_bit(KVM_ARM_VCPU_SPE_V1, vcpu->arch.features))
+		return -ENXIO;
+
+	if (vcpu->arch.spe.created)
+		return -EBUSY;
+
+	if (irqchip_in_kernel(vcpu->kvm)) {
+		int ret;
+
+		/*
+		 * If using the SPE with an in-kernel virtual GIC
+		 * implementation, we require the GIC to be already
+		 * initialized when initializing the SPE.
+		 */
+		if (!vgic_initialized(vcpu->kvm))
+			return -ENODEV;
+
+		ret = kvm_vgic_set_owner(vcpu, vcpu->arch.spe.irq_num,
+					 &vcpu->arch.spe);
+		if (ret)
+			return ret;
+	}
+
+	vcpu->arch.spe.created = true;
+	return 0;
+}
+
+/*
+ * For one VM the interrupt type must be the same for each vcpu.
+ * The SPE overflow interrupt can only be a PPI, so the interrupt
+ * number must also be the same for all vcpus.
+ */
+static bool spe_irq_is_valid(struct kvm *kvm, int irq)
+{
+	int i;
+	struct kvm_vcpu *vcpu;
+
+	kvm_for_each_vcpu(i, vcpu, kvm) {
+		if (!kvm_arm_spe_irq_initialized(vcpu))
+			continue;
+
+		if (vcpu->arch.spe.irq_num != irq)
+			return false;
+	}
+
+	return true;
+}
+
+int kvm_arm_spe_v1_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
+{
+	switch (attr->attr) {
+	case KVM_ARM_VCPU_SPE_V1_IRQ: {
+		int __user *uaddr = (int __user *)(long)attr->addr;
+		int irq;
+
+		if (!irqchip_in_kernel(vcpu->kvm))
+			return -EINVAL;
+
+		if (!test_bit(KVM_ARM_VCPU_SPE_V1, vcpu->arch.features))
+			return -ENODEV;
+
+		if (get_user(irq, uaddr))
+			return -EFAULT;
+
+		/* The SPE overflow interrupt can be a PPI only */
+		if (!(irq_is_ppi(irq)))
+			return -EINVAL;
+
+		if (!spe_irq_is_valid(vcpu->kvm, irq))
+			return -EINVAL;
+
+		if (kvm_arm_spe_irq_initialized(vcpu))
+			return -EBUSY;
+
+		kvm_debug("Set kvm ARM SPE irq: %d\n", irq);
+		vcpu->arch.spe.irq_num = irq;
+		return 0;
+	}
+	case KVM_ARM_VCPU_SPE_V1_INIT:
+		return kvm_arm_spe_v1_init(vcpu);
+	}
+
+	return -ENXIO;
+}
+
+int kvm_arm_spe_v1_get_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
+{
+	switch (attr->attr) {
+	case KVM_ARM_VCPU_SPE_V1_IRQ: {
+		int __user *uaddr = (int __user *)(long)attr->addr;
+		int irq;
+
+		if (!irqchip_in_kernel(vcpu->kvm))
+			return -EINVAL;
+
+		if (!test_bit(KVM_ARM_VCPU_SPE_V1, vcpu->arch.features))
+			return -ENODEV;
+
+		if (!kvm_arm_spe_irq_initialized(vcpu))
+			return -ENXIO;
+
+		irq = vcpu->arch.spe.irq_num;
+		return put_user(irq, uaddr);
+	}
+	}
+
+	return -ENXIO;
+}
+
+int kvm_arm_spe_v1_has_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
+{
+	switch (attr->attr) {
+	case KVM_ARM_VCPU_SPE_V1_IRQ:
+	case KVM_ARM_VCPU_SPE_V1_INIT:
+		if (kvm_arm_support_spe_v1() &&
+		    test_bit(KVM_ARM_VCPU_SPE_V1, vcpu->arch.features))
+			return 0;
+	}
+
+	return -ENXIO;
+}
-- 
2.21.0


* [PATCH v2 13/18] perf: arm_spe: Add KVM structure for obtaining IRQ info
  2019-12-20 14:30 [PATCH v2 00/18] arm64: KVM: add SPE profiling support Andrew Murray
                   ` (11 preceding siblings ...)
  2019-12-20 14:30 ` [PATCH v2 12/18] KVM: arm64: add a new vcpu device control group for SPEv1 Andrew Murray
@ 2019-12-20 14:30 ` Andrew Murray
  2019-12-22 11:24   ` Marc Zyngier
  2019-12-20 14:30 ` [PATCH v2 14/18] KVM: arm64: spe: Provide guest virtual interrupts for SPE Andrew Murray
                   ` (6 subsequent siblings)
  19 siblings, 1 reply; 78+ messages in thread
From: Andrew Murray @ 2019-12-20 14:30 UTC (permalink / raw)
  To: Marc Zyngier, Catalin Marinas, Will Deacon
  Cc: Sudeep Holla, kvmarm, linux-arm-kernel, kvm, linux-kernel, Mark Rutland

KVM requires knowledge of the physical SPE IRQ number so that it can
associate it with any virtual IRQ for guests that require SPE emulation.

Let's create a structure to hold this information and an accessor that
KVM can use to retrieve it.

We expect that each SPE device will have the same physical PPI number
and warn when this is not the case.

Signed-off-by: Andrew Murray <andrew.murray@arm.com>
---
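Not part of the patch: a tiny sketch of the "same PPI everywhere" check,
assuming that 0 means "not yet populated". spe_kvm_info_model and
spe_info_populate() are hypothetical names; the return value stands in
for the WARN_ON_ONCE() in the driver.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct arm_spe_kvm_info. */
struct spe_kvm_info_model {
	int physical_irq;
};

/* Record the IRQ; report true when it conflicts with an earlier one. */
static inline bool spe_info_populate(struct spe_kvm_info_model *info, int irq)
{
	bool mismatch;

	assert(info);

	mismatch = info->physical_irq != 0 && info->physical_irq != irq;
	info->physical_irq = irq;

	return mismatch;
}
```

Note that, as in the patch, the last probed device wins even when a
mismatch is reported.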
 drivers/perf/arm_spe_pmu.c | 23 +++++++++++++++++++++++
 include/kvm/arm_spe.h      |  6 ++++++
 2 files changed, 29 insertions(+)

diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c
index 4e4984a55cd1..2d24af4cfcab 100644
--- a/drivers/perf/arm_spe_pmu.c
+++ b/drivers/perf/arm_spe_pmu.c
@@ -34,6 +34,9 @@
 #include <linux/smp.h>
 #include <linux/vmalloc.h>
 
+#include <linux/kvm_host.h>
+#include <kvm/arm_spe.h>
+
 #include <asm/barrier.h>
 #include <asm/cpufeature.h>
 #include <asm/mmu.h>
@@ -1127,6 +1130,24 @@ static void arm_spe_pmu_dev_teardown(struct arm_spe_pmu *spe_pmu)
 	free_percpu_irq(spe_pmu->irq, spe_pmu->handle);
 }
 
+#ifdef CONFIG_KVM_ARM_SPE
+static struct arm_spe_kvm_info arm_spe_kvm_info;
+
+struct arm_spe_kvm_info *arm_spe_get_kvm_info(void)
+{
+	return &arm_spe_kvm_info;
+}
+
+static void arm_spe_populate_kvm_info(struct arm_spe_pmu *spe_pmu)
+{
+	WARN_ON_ONCE(arm_spe_kvm_info.physical_irq != 0 &&
+		     arm_spe_kvm_info.physical_irq != spe_pmu->irq);
+	arm_spe_kvm_info.physical_irq = spe_pmu->irq;
+}
+#else
+static void arm_spe_populate_kvm_info(struct arm_spe_pmu *spe_pmu) {}
+#endif
+
 /* Driver and device probing */
 static int arm_spe_pmu_irq_probe(struct arm_spe_pmu *spe_pmu)
 {
@@ -1149,6 +1170,8 @@ static int arm_spe_pmu_irq_probe(struct arm_spe_pmu *spe_pmu)
 	}
 
 	spe_pmu->irq = irq;
+	arm_spe_populate_kvm_info(spe_pmu);
+
 	return 0;
 }
 
diff --git a/include/kvm/arm_spe.h b/include/kvm/arm_spe.h
index d1f3c564dfd0..9c65130d726d 100644
--- a/include/kvm/arm_spe.h
+++ b/include/kvm/arm_spe.h
@@ -17,6 +17,12 @@ struct kvm_spe {
 	bool irq_level;
 };
 
+struct arm_spe_kvm_info {
+	int physical_irq;
+};
+
+struct arm_spe_kvm_info *arm_spe_get_kvm_info(void);
+
 #ifdef CONFIG_KVM_ARM_SPE
 #define kvm_arm_spe_v1_ready(v)		((v)->arch.spe.ready)
 #define kvm_arm_spe_irq_initialized(v)		\
-- 
2.21.0


* [PATCH v2 14/18] KVM: arm64: spe: Provide guest virtual interrupts for SPE
  2019-12-20 14:30 [PATCH v2 00/18] arm64: KVM: add SPE profiling support Andrew Murray
                   ` (12 preceding siblings ...)
  2019-12-20 14:30 ` [PATCH v2 13/18] perf: arm_spe: Add KVM structure for obtaining IRQ info Andrew Murray
@ 2019-12-20 14:30 ` Andrew Murray
  2019-12-22 12:07   ` Marc Zyngier
  2019-12-20 14:30 ` [PATCH v2 15/18] perf: arm_spe: Handle guest/host exclusion flags Andrew Murray
                   ` (5 subsequent siblings)
  19 siblings, 1 reply; 78+ messages in thread
From: Andrew Murray @ 2019-12-20 14:30 UTC (permalink / raw)
  To: Marc Zyngier, Catalin Marinas, Will Deacon
  Cc: Sudeep Holla, kvmarm, linux-arm-kernel, kvm, linux-kernel, Mark Rutland

Upon the exit of a guest, let's determine if the SPE device has generated
an interrupt - if so we'll inject a virtual interrupt to the guest.

Upon the entry and exit of a guest we'll also update the state of the
physical IRQ such that it is active when a guest interrupt is pending
and the guest is running.

Finally we map the physical IRQ to the virtual IRQ such that the guest
can deactivate the interrupt when it handles the interrupt.

Signed-off-by: Andrew Murray <andrew.murray@arm.com>
---
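Not part of the patch: a sketch of the level tracking done on guest
entry and exit, assuming the service bit is PMBSR_EL1 bit 17. spe_model,
spe_sync() and spe_phys_active() are hypothetical names, and a counter
replaces the call to kvm_vgic_inject_irq().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PMBSR_S_SHIFT	17	/* PMBSR_EL1.S, the service bit */

struct spe_model {
	bool irq_level;		/* last level reported to the vgic */
	int injections;		/* stands in for kvm_vgic_inject_irq() */
};

/* Exit path: report the line to the vgic only when its level changes. */
static inline void spe_sync(struct spe_model *spe, uint64_t saved_pmbsr)
{
	bool service = (saved_pmbsr >> PMBSR_S_SHIFT) & 1;

	assert(spe);

	if (spe->irq_level == service)
		return;

	spe->irq_level = service;
	spe->injections++;
}

/* Entry path: the physical IRQ is active while the guest has work pending. */
static inline bool spe_phys_active(const struct spe_model *spe,
				   bool vgic_map_active)
{
	assert(spe);

	return vgic_map_active || spe->irq_level;
}
```

Repeated exits with an unchanged service bit do not inject again, which
is why the level is cached in struct kvm_spe.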
 include/kvm/arm_spe.h |  6 ++++
 virt/kvm/arm/arm.c    |  5 ++-
 virt/kvm/arm/spe.c    | 71 +++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 81 insertions(+), 1 deletion(-)

diff --git a/include/kvm/arm_spe.h b/include/kvm/arm_spe.h
index 9c65130d726d..91b2214f543a 100644
--- a/include/kvm/arm_spe.h
+++ b/include/kvm/arm_spe.h
@@ -37,6 +37,9 @@ static inline bool kvm_arm_support_spe_v1(void)
 						      ID_AA64DFR0_PMSVER_SHIFT);
 }
 
+void kvm_spe_flush_hwstate(struct kvm_vcpu *vcpu);
+void kvm_spe_sync_hwstate(struct kvm_vcpu *vcpu);
+
 int kvm_arm_spe_v1_set_attr(struct kvm_vcpu *vcpu,
 			    struct kvm_device_attr *attr);
 int kvm_arm_spe_v1_get_attr(struct kvm_vcpu *vcpu,
@@ -49,6 +52,9 @@ int kvm_arm_spe_v1_enable(struct kvm_vcpu *vcpu);
 #define kvm_arm_support_spe_v1()	(false)
 #define kvm_arm_spe_irq_initialized(v)	(false)
 
+static inline void kvm_spe_flush_hwstate(struct kvm_vcpu *vcpu) {}
+static inline void kvm_spe_sync_hwstate(struct kvm_vcpu *vcpu) {}
+
 static inline int kvm_arm_spe_v1_set_attr(struct kvm_vcpu *vcpu,
 					  struct kvm_device_attr *attr)
 {
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index 340d2388ee2c..a66085c8e785 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -741,6 +741,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		preempt_disable();
 
 		kvm_pmu_flush_hwstate(vcpu);
+		kvm_spe_flush_hwstate(vcpu);
 
 		local_irq_disable();
 
@@ -782,6 +783,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		    kvm_request_pending(vcpu)) {
 			vcpu->mode = OUTSIDE_GUEST_MODE;
 			isb(); /* Ensure work in x_flush_hwstate is committed */
+			kvm_spe_sync_hwstate(vcpu);
 			kvm_pmu_sync_hwstate(vcpu);
 			if (static_branch_unlikely(&userspace_irqchip_in_use))
 				kvm_timer_sync_hwstate(vcpu);
@@ -816,11 +818,12 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		kvm_arm_clear_debug(vcpu);
 
 		/*
-		 * We must sync the PMU state before the vgic state so
+		 * We must sync the PMU and SPE state before the vgic state so
 		 * that the vgic can properly sample the updated state of the
 		 * interrupt line.
 		 */
 		kvm_pmu_sync_hwstate(vcpu);
+		kvm_spe_sync_hwstate(vcpu);
 
 		/*
 		 * Sync the vgic state before syncing the timer state because
diff --git a/virt/kvm/arm/spe.c b/virt/kvm/arm/spe.c
index 83ac2cce2cc3..097ed39014e4 100644
--- a/virt/kvm/arm/spe.c
+++ b/virt/kvm/arm/spe.c
@@ -35,6 +35,68 @@ int kvm_arm_spe_v1_enable(struct kvm_vcpu *vcpu)
 	return 0;
 }
 
+static inline void set_spe_irq_phys_active(struct arm_spe_kvm_info *info,
+					   bool active)
+{
+	int r;
+	r = irq_set_irqchip_state(info->physical_irq, IRQCHIP_STATE_ACTIVE,
+				  active);
+	WARN_ON(r);
+}
+
+void kvm_spe_flush_hwstate(struct kvm_vcpu *vcpu)
+{
+	struct kvm_spe *spe = &vcpu->arch.spe;
+	bool phys_active = false;
+	struct arm_spe_kvm_info *info = arm_spe_get_kvm_info();
+
+	if (!kvm_arm_spe_v1_ready(vcpu))
+		return;
+
+	if (irqchip_in_kernel(vcpu->kvm))
+		phys_active = kvm_vgic_map_is_active(vcpu, spe->irq_num);
+
+	phys_active |= spe->irq_level;
+
+	set_spe_irq_phys_active(info, phys_active);
+}
+
+void kvm_spe_sync_hwstate(struct kvm_vcpu *vcpu)
+{
+	struct kvm_spe *spe = &vcpu->arch.spe;
+	u64 pmbsr;
+	int r;
+	bool service;
+	struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
+	struct arm_spe_kvm_info *info = arm_spe_get_kvm_info();
+
+	if (!kvm_arm_spe_v1_ready(vcpu))
+		return;
+
+	set_spe_irq_phys_active(info, false);
+
+	pmbsr = ctxt->sys_regs[PMBSR_EL1];
+	service = !!(pmbsr & BIT(SYS_PMBSR_EL1_S_SHIFT));
+	if (spe->irq_level == service)
+		return;
+
+	spe->irq_level = service;
+
+	if (likely(irqchip_in_kernel(vcpu->kvm))) {
+		r = kvm_vgic_inject_irq(vcpu->kvm, vcpu->vcpu_id,
+					spe->irq_num, service, spe);
+		WARN_ON(r);
+	}
+}
+
+static inline bool kvm_arch_arm_spe_v1_get_input_level(int vintid)
+{
+	struct kvm_vcpu *vcpu = kvm_arm_get_running_vcpu();
+	struct kvm_spe *spe = &vcpu->arch.spe;
+
+	return spe->irq_level;
+}
+
 static int kvm_arm_spe_v1_init(struct kvm_vcpu *vcpu)
 {
 	if (!kvm_arm_support_spe_v1())
@@ -48,6 +110,7 @@ static int kvm_arm_spe_v1_init(struct kvm_vcpu *vcpu)
 
 	if (irqchip_in_kernel(vcpu->kvm)) {
 		int ret;
+		struct arm_spe_kvm_info *info;
 
 		/*
 		 * If using the SPE with an in-kernel virtual GIC
@@ -57,10 +120,18 @@ static int kvm_arm_spe_v1_init(struct kvm_vcpu *vcpu)
 		if (!vgic_initialized(vcpu->kvm))
 			return -ENODEV;
 
+		info = arm_spe_get_kvm_info();
+		if (!info->physical_irq)
+			return -ENODEV;
+
 		ret = kvm_vgic_set_owner(vcpu, vcpu->arch.spe.irq_num,
 					 &vcpu->arch.spe);
 		if (ret)
 			return ret;
+
+		ret = kvm_vgic_map_phys_irq(vcpu, info->physical_irq,
+					    vcpu->arch.spe.irq_num,
+					    kvm_arch_arm_spe_v1_get_input_level);
 	}
 
 	vcpu->arch.spe.created = true;
-- 
2.21.0
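
The level tracking done by kvm_spe_sync_hwstate() in the patch above can be
sketched in isolation. This is a minimal userspace sketch, not the kernel
code: struct spe_state, mock_inject_irq() and spe_sync_level() are
hypothetical names, and the service bit position (17) is an assumption based
on SYS_PMBSR_EL1_S_SHIFT. It shows the interrupt level being derived from the
saved PMBSR_EL1 value and forwarded only when it changes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed position of the PMBSR_EL1.S (service) bit. */
#define PMBSR_EL1_S_SHIFT	17

/* Hypothetical stand-in for the per-vCPU SPE state. */
struct spe_state {
	bool irq_level;			/* last level reported to the vgic */
	unsigned int injections;	/* calls made to the mock inject hook */
};

/* Mock of the vgic injection call; it only counts invocations. */
static void mock_inject_irq(struct spe_state *spe, bool level)
{
	(void)level;
	spe->injections++;
}

/*
 * Derive the interrupt level from the saved PMBSR_EL1 value and
 * forward it only when it differs from the cached level.
 * Returns true when the level changed.
 */
static bool spe_sync_level(struct spe_state *spe, uint64_t pmbsr)
{
	bool service = (pmbsr >> PMBSR_EL1_S_SHIFT) & 1;

	if (spe->irq_level == service)
		return false;

	spe->irq_level = service;
	mock_inject_irq(spe, service);
	return true;
}
```

Caching the level keeps repeated exits with an unchanged service bit from
re-injecting the same interrupt state.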



* [PATCH v2 15/18] perf: arm_spe: Handle guest/host exclusion flags
  2019-12-20 14:30 [PATCH v2 00/18] arm64: KVM: add SPE profiling support Andrew Murray
                   ` (13 preceding siblings ...)
  2019-12-20 14:30 ` [PATCH v2 14/18] KVM: arm64: spe: Provide guest virtual interrupts for SPE Andrew Murray
@ 2019-12-20 14:30 ` Andrew Murray
  2019-12-20 18:10   ` Mark Rutland
  2019-12-22 12:10   ` Marc Zyngier
  2019-12-20 14:30 ` [PATCH v2 16/18] KVM: arm64: enable SPE support Andrew Murray
                   ` (4 subsequent siblings)
  19 siblings, 2 replies; 78+ messages in thread
From: Andrew Murray @ 2019-12-20 14:30 UTC (permalink / raw)
  To: Marc Zyngier, Catalin Marinas, Will Deacon
  Cc: Sudeep Holla, kvmarm, linux-arm-kernel, kvm, linux-kernel, Mark Rutland

A side effect of supporting the SPE in guests is that we prevent the
host from collecting data whilst inside a guest thus creating a black-out
window. This occurs because instead of emulating the SPE, we share it
with our guests.

Let's accurately describe our capabilities by rejecting events that
clear exclude_guest or set exclude_host.

Signed-off-by: Andrew Murray <andrew.murray@arm.com>
---
 drivers/perf/arm_spe_pmu.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c
index 2d24af4cfcab..3703dbf459de 100644
--- a/drivers/perf/arm_spe_pmu.c
+++ b/drivers/perf/arm_spe_pmu.c
@@ -679,6 +679,9 @@ static int arm_spe_pmu_event_init(struct perf_event *event)
 	if (attr->exclude_idle)
 		return -EOPNOTSUPP;
 
+	if (!attr->exclude_guest || attr->exclude_host)
+		return -EOPNOTSUPP;
+
 	/*
 	 * Feedback-directed frequency throttling doesn't work when we
 	 * have a buffer of samples. We'd need to manually count the
-- 
2.21.0
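
The flag check added by the patch above can be exercised on its own. The
following is a minimal sketch under stated assumptions: struct spe_event_attr
and spe_check_exclusion() are hypothetical stand-ins for the two relevant
perf_event_attr bits and the test in arm_spe_pmu_event_init(). Of the four
flag combinations, only "exclude guest, include host" is accepted.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical subset of the perf event attributes relevant here. */
struct spe_event_attr {
	bool exclude_guest;
	bool exclude_host;
};

/*
 * Accept only events that profile the host and skip the guest,
 * which is the one combination a shared SPE can honour.
 */
static int spe_check_exclusion(const struct spe_event_attr *attr)
{
	if (!attr->exclude_guest || attr->exclude_host)
		return -EOPNOTSUPP;

	return 0;
}
```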



* [PATCH v2 16/18] KVM: arm64: enable SPE support
  2019-12-20 14:30 [PATCH v2 00/18] arm64: KVM: add SPE profiling support Andrew Murray
                   ` (14 preceding siblings ...)
  2019-12-20 14:30 ` [PATCH v2 15/18] perf: arm_spe: Handle guest/host exclusion flags Andrew Murray
@ 2019-12-20 14:30 ` Andrew Murray
  2019-12-20 14:30 ` [PATCH v2 17/18, KVMTOOL] update_headers: Sync kvm UAPI headers with linux v5.5-rc2 Andrew Murray
                   ` (3 subsequent siblings)
  19 siblings, 0 replies; 78+ messages in thread
From: Andrew Murray @ 2019-12-20 14:30 UTC (permalink / raw)
  To: Marc Zyngier, Catalin Marinas, Will Deacon
  Cc: Sudeep Holla, kvmarm, linux-arm-kernel, kvm, linux-kernel, Mark Rutland

From: Sudeep Holla <sudeep.holla@arm.com>

We have all the bits and pieces in place to enable SPE for guests, so
let's enable it.

Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Andrew Murray <andrew.murray@arm.com>
---
 virt/kvm/arm/arm.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index a66085c8e785..fb3ad0835255 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -611,6 +611,10 @@ static int kvm_vcpu_first_run_init(struct kvm_vcpu *vcpu)
 		return ret;
 
 	ret = kvm_arm_pmu_v3_enable(vcpu);
+	if (ret)
+		return ret;
+
+	ret = kvm_arm_spe_v1_enable(vcpu);
 
 	return ret;
 }
-- 
2.21.0



* [PATCH v2 17/18, KVMTOOL] update_headers: Sync kvm UAPI headers with linux v5.5-rc2
  2019-12-20 14:30 [PATCH v2 00/18] arm64: KVM: add SPE profiling support Andrew Murray
                   ` (15 preceding siblings ...)
  2019-12-20 14:30 ` [PATCH v2 16/18] KVM: arm64: enable SPE support Andrew Murray
@ 2019-12-20 14:30 ` Andrew Murray
  2019-12-20 14:30 ` [PATCH v2 18/18, KVMTOOL] kvm: add a vcpu feature for SPEv1 support Andrew Murray
                   ` (2 subsequent siblings)
  19 siblings, 0 replies; 78+ messages in thread
From: Andrew Murray @ 2019-12-20 14:30 UTC (permalink / raw)
  To: Marc Zyngier, Catalin Marinas, Will Deacon
  Cc: Sudeep Holla, kvmarm, linux-arm-kernel, kvm, linux-kernel, Mark Rutland

The local copies of the kvm user API headers are getting stale.

In preparation for some arch-specific updates, this patch reflects
a re-run of util/update_headers.sh to pull in upstream updates from
linux v5.5-rc2.

Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
[ Update headers to v5.5-rc2 ]
Signed-off-by: Andrew Murray <andrew.murray@arm.com>
---
 arm/aarch32/include/asm/kvm.h |  7 +++++--
 arm/aarch64/include/asm/kvm.h | 13 +++++++++++--
 include/linux/kvm.h           | 18 ++++++++++++++++++
 powerpc/include/asm/kvm.h     |  3 +++
 4 files changed, 37 insertions(+), 4 deletions(-)

diff --git a/arm/aarch32/include/asm/kvm.h b/arm/aarch32/include/asm/kvm.h
index a4217c1a5d01..03cd7c19a683 100644
--- a/arm/aarch32/include/asm/kvm.h
+++ b/arm/aarch32/include/asm/kvm.h
@@ -131,8 +131,9 @@ struct kvm_vcpu_events {
 	struct {
 		__u8 serror_pending;
 		__u8 serror_has_esr;
+		__u8 ext_dabt_pending;
 		/* Align it to 8 bytes */
-		__u8 pad[6];
+		__u8 pad[5];
 		__u64 serror_esr;
 	} exception;
 	__u32 reserved[12];
@@ -266,8 +267,10 @@ struct kvm_vcpu_events {
 #define   KVM_DEV_ARM_ITS_CTRL_RESET		4
 
 /* KVM_IRQ_LINE irq field index values */
+#define KVM_ARM_IRQ_VCPU2_SHIFT		28
+#define KVM_ARM_IRQ_VCPU2_MASK		0xf
 #define KVM_ARM_IRQ_TYPE_SHIFT		24
-#define KVM_ARM_IRQ_TYPE_MASK		0xff
+#define KVM_ARM_IRQ_TYPE_MASK		0xf
 #define KVM_ARM_IRQ_VCPU_SHIFT		16
 #define KVM_ARM_IRQ_VCPU_MASK		0xff
 #define KVM_ARM_IRQ_NUM_SHIFT		0
diff --git a/arm/aarch64/include/asm/kvm.h b/arm/aarch64/include/asm/kvm.h
index 9a507716ae2f..905a73f30079 100644
--- a/arm/aarch64/include/asm/kvm.h
+++ b/arm/aarch64/include/asm/kvm.h
@@ -106,6 +106,7 @@ struct kvm_regs {
 #define KVM_ARM_VCPU_SVE		4 /* enable SVE for this CPU */
 #define KVM_ARM_VCPU_PTRAUTH_ADDRESS	5 /* VCPU uses address authentication */
 #define KVM_ARM_VCPU_PTRAUTH_GENERIC	6 /* VCPU uses generic authentication */
+#define KVM_ARM_VCPU_SPE_V1		7 /* Support guest SPEv1 */
 
 struct kvm_vcpu_init {
 	__u32 target;
@@ -164,8 +165,9 @@ struct kvm_vcpu_events {
 	struct {
 		__u8 serror_pending;
 		__u8 serror_has_esr;
+		__u8 ext_dabt_pending;
 		/* Align it to 8 bytes */
-		__u8 pad[6];
+		__u8 pad[5];
 		__u64 serror_esr;
 	} exception;
 	__u32 reserved[12];
@@ -323,10 +325,17 @@ struct kvm_vcpu_events {
 #define KVM_ARM_VCPU_TIMER_CTRL		1
 #define   KVM_ARM_VCPU_TIMER_IRQ_VTIMER		0
 #define   KVM_ARM_VCPU_TIMER_IRQ_PTIMER		1
+#define KVM_ARM_VCPU_PVTIME_CTRL	2
+#define   KVM_ARM_VCPU_PVTIME_IPA	0
+#define KVM_ARM_VCPU_SPE_V1_CTRL	3
+#define   KVM_ARM_VCPU_SPE_V1_IRQ	0
+#define   KVM_ARM_VCPU_SPE_V1_INIT	1
 
 /* KVM_IRQ_LINE irq field index values */
+#define KVM_ARM_IRQ_VCPU2_SHIFT		28
+#define KVM_ARM_IRQ_VCPU2_MASK		0xf
 #define KVM_ARM_IRQ_TYPE_SHIFT		24
-#define KVM_ARM_IRQ_TYPE_MASK		0xff
+#define KVM_ARM_IRQ_TYPE_MASK		0xf
 #define KVM_ARM_IRQ_VCPU_SHIFT		16
 #define KVM_ARM_IRQ_VCPU_MASK		0xff
 #define KVM_ARM_IRQ_NUM_SHIFT		0
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 5e3f12d5359e..1a362c230e4a 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -235,6 +235,7 @@ struct kvm_hyperv_exit {
 #define KVM_EXIT_S390_STSI        25
 #define KVM_EXIT_IOAPIC_EOI       26
 #define KVM_EXIT_HYPERV           27
+#define KVM_EXIT_ARM_NISV         28
 
 /* For KVM_EXIT_INTERNAL_ERROR */
 /* Emulate instruction failed. */
@@ -243,6 +244,8 @@ struct kvm_hyperv_exit {
 #define KVM_INTERNAL_ERROR_SIMUL_EX	2
 /* Encounter unexpected vm-exit due to delivery event. */
 #define KVM_INTERNAL_ERROR_DELIVERY_EV	3
+/* Encounter unexpected vm-exit reason */
+#define KVM_INTERNAL_ERROR_UNEXPECTED_EXIT_REASON	4
 
 /* for KVM_RUN, returned by mmap(vcpu_fd, offset=0) */
 struct kvm_run {
@@ -392,6 +395,11 @@ struct kvm_run {
 		} eoi;
 		/* KVM_EXIT_HYPERV */
 		struct kvm_hyperv_exit hyperv;
+		/* KVM_EXIT_ARM_NISV */
+		struct {
+			__u64 esr_iss;
+			__u64 fault_ipa;
+		} arm_nisv;
 		/* Fix the size of the union. */
 		char padding[256];
 	};
@@ -996,6 +1004,12 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_ARM_PTRAUTH_ADDRESS 171
 #define KVM_CAP_ARM_PTRAUTH_GENERIC 172
 #define KVM_CAP_PMU_EVENT_FILTER 173
+#define KVM_CAP_ARM_IRQ_LINE_LAYOUT_2 174
+#define KVM_CAP_HYPERV_DIRECT_TLBFLUSH 175
+#define KVM_CAP_PPC_GUEST_DEBUG_SSTEP 176
+#define KVM_CAP_ARM_NISV_TO_USER 177
+#define KVM_CAP_ARM_INJECT_EXT_DABT 178
+#define KVM_CAP_ARM_SPE_V1 179
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -1142,6 +1156,7 @@ struct kvm_dirty_tlb {
 #define KVM_REG_S390		0x5000000000000000ULL
 #define KVM_REG_ARM64		0x6000000000000000ULL
 #define KVM_REG_MIPS		0x7000000000000000ULL
+#define KVM_REG_RISCV		0x8000000000000000ULL
 
 #define KVM_REG_SIZE_SHIFT	52
 #define KVM_REG_SIZE_MASK	0x00f0000000000000ULL
@@ -1222,6 +1237,8 @@ enum kvm_device_type {
 #define KVM_DEV_TYPE_ARM_VGIC_ITS	KVM_DEV_TYPE_ARM_VGIC_ITS
 	KVM_DEV_TYPE_XIVE,
 #define KVM_DEV_TYPE_XIVE		KVM_DEV_TYPE_XIVE
+	KVM_DEV_TYPE_ARM_PV_TIME,
+#define KVM_DEV_TYPE_ARM_PV_TIME	KVM_DEV_TYPE_ARM_PV_TIME
 	KVM_DEV_TYPE_MAX,
 };
 
@@ -1332,6 +1349,7 @@ struct kvm_s390_ucas_mapping {
 #define KVM_PPC_GET_CPU_CHAR	  _IOR(KVMIO,  0xb1, struct kvm_ppc_cpu_char)
 /* Available with KVM_CAP_PMU_EVENT_FILTER */
 #define KVM_SET_PMU_EVENT_FILTER  _IOW(KVMIO,  0xb2, struct kvm_pmu_event_filter)
+#define KVM_PPC_SVM_OFF		  _IO(KVMIO,  0xb3)
 
 /* ioctl for vm fd */
 #define KVM_CREATE_DEVICE	  _IOWR(KVMIO,  0xe0, struct kvm_create_device)
diff --git a/powerpc/include/asm/kvm.h b/powerpc/include/asm/kvm.h
index b0f72dea8b11..264e266a85bf 100644
--- a/powerpc/include/asm/kvm.h
+++ b/powerpc/include/asm/kvm.h
@@ -667,6 +667,8 @@ struct kvm_ppc_cpu_char {
 
 /* PPC64 eXternal Interrupt Controller Specification */
 #define KVM_DEV_XICS_GRP_SOURCES	1	/* 64-bit source attributes */
+#define KVM_DEV_XICS_GRP_CTRL		2
+#define   KVM_DEV_XICS_NR_SERVERS	1
 
 /* Layout of 64-bit source attribute values */
 #define  KVM_XICS_DESTINATION_SHIFT	0
@@ -683,6 +685,7 @@ struct kvm_ppc_cpu_char {
 #define KVM_DEV_XIVE_GRP_CTRL		1
 #define   KVM_DEV_XIVE_RESET		1
 #define   KVM_DEV_XIVE_EQ_SYNC		2
+#define   KVM_DEV_XIVE_NR_SERVERS	3
 #define KVM_DEV_XIVE_GRP_SOURCE		2	/* 64-bit source identifier */
 #define KVM_DEV_XIVE_GRP_SOURCE_CONFIG	3	/* 64-bit source identifier */
 #define KVM_DEV_XIVE_GRP_EQ_CONFIG	4	/* 64-bit EQ identifier */
-- 
2.21.0
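
The KVM_IRQ_LINE field layout touched by the header sync above can be
illustrated with a small encoder and decoder. This is a sketch, not kvmtool
code: the shifts and masks are copied from the hunks, KVM_ARM_IRQ_NUM_MASK
(0xffff) is not shown there and is assumed, and irq_line_encode() and
irq_line_vcpu_index() are hypothetical helpers. It also assumes that the
VCPU2 field extends the vCPU index in units of 256, which is my understanding
of KVM_CAP_ARM_IRQ_LINE_LAYOUT_2.

```c
#include <stdint.h>

/* Field layout copied from the header hunks above. */
#define KVM_ARM_IRQ_VCPU2_SHIFT	28
#define KVM_ARM_IRQ_VCPU2_MASK	0xf
#define KVM_ARM_IRQ_TYPE_SHIFT	24
#define KVM_ARM_IRQ_TYPE_MASK	0xf
#define KVM_ARM_IRQ_VCPU_SHIFT	16
#define KVM_ARM_IRQ_VCPU_MASK	0xff
#define KVM_ARM_IRQ_NUM_SHIFT	0
#define KVM_ARM_IRQ_NUM_MASK	0xffff	/* not in the hunks; assumed */

/* Pack type, vCPU index and interrupt number into the irq field. */
static uint32_t irq_line_encode(uint32_t type, uint32_t vcpu_idx, uint32_t num)
{
	uint32_t vcpu  = vcpu_idx & KVM_ARM_IRQ_VCPU_MASK;
	uint32_t vcpu2 = (vcpu_idx >> 8) & KVM_ARM_IRQ_VCPU2_MASK;

	return (vcpu2 << KVM_ARM_IRQ_VCPU2_SHIFT) |
	       ((type & KVM_ARM_IRQ_TYPE_MASK) << KVM_ARM_IRQ_TYPE_SHIFT) |
	       (vcpu << KVM_ARM_IRQ_VCPU_SHIFT) |
	       ((num & KVM_ARM_IRQ_NUM_MASK) << KVM_ARM_IRQ_NUM_SHIFT);
}

/* Recover the full vCPU index from both vCPU fields. */
static uint32_t irq_line_vcpu_index(uint32_t irq)
{
	uint32_t vcpu  = (irq >> KVM_ARM_IRQ_VCPU_SHIFT) & KVM_ARM_IRQ_VCPU_MASK;
	uint32_t vcpu2 = (irq >> KVM_ARM_IRQ_VCPU2_SHIFT) & KVM_ARM_IRQ_VCPU2_MASK;

	return vcpu2 * (KVM_ARM_IRQ_VCPU_MASK + 1) + vcpu;
}
```

Narrowing the TYPE mask from 0xff to 0xf is what frees bits 28-31 for the
new VCPU2 field.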



* [PATCH v2 18/18, KVMTOOL] kvm: add a vcpu feature for SPEv1 support
  2019-12-20 14:30 [PATCH v2 00/18] arm64: KVM: add SPE profiling support Andrew Murray
                   ` (16 preceding siblings ...)
  2019-12-20 14:30 ` [PATCH v2 17/18, KVMTOOL] update_headers: Sync kvm UAPI headers with linux v5.5-rc2 Andrew Murray
@ 2019-12-20 14:30 ` Andrew Murray
  2019-12-20 17:55 ` [PATCH v2 00/18] arm64: KVM: add SPE profiling support Mark Rutland
  2019-12-21 10:48 ` Marc Zyngier
  19 siblings, 0 replies; 78+ messages in thread
From: Andrew Murray @ 2019-12-20 14:30 UTC (permalink / raw)
  To: Marc Zyngier, Catalin Marinas, Will Deacon
  Cc: Sudeep Holla, kvmarm, linux-arm-kernel, kvm, linux-kernel, Mark Rutland

This adds a runtime option to kvmtool to enable Statistical Profiling
Extension version 1 (SPEv1) support in the guest kernel. The --spe
command-line option is required to enable it.

Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
[ Add SPE to init features ]
Signed-off-by: Andrew Murray <andrew.murray@arm.com>
---
 Makefile                                  |  2 +-
 arm/aarch64/arm-cpu.c                     |  2 +
 arm/aarch64/include/kvm/kvm-config-arch.h |  2 +
 arm/aarch64/include/kvm/kvm-cpu-arch.h    |  3 +-
 arm/aarch64/kvm-cpu.c                     |  4 ++
 arm/include/arm-common/kvm-config-arch.h  |  1 +
 arm/include/arm-common/spe.h              |  4 ++
 arm/spe.c                                 | 81 +++++++++++++++++++++++
 8 files changed, 97 insertions(+), 2 deletions(-)
 create mode 100644 arm/include/arm-common/spe.h
 create mode 100644 arm/spe.c

diff --git a/Makefile b/Makefile
index 3862112c5ec6..04dddb3e7699 100644
--- a/Makefile
+++ b/Makefile
@@ -158,7 +158,7 @@ endif
 # ARM
 OBJS_ARM_COMMON		:= arm/fdt.o arm/gic.o arm/gicv2m.o arm/ioport.o \
 			   arm/kvm.o arm/kvm-cpu.o arm/pci.o arm/timer.o \
-			   arm/pmu.o
+			   arm/pmu.o arm/spe.o
 HDRS_ARM_COMMON		:= arm/include
 ifeq ($(ARCH), arm)
 	DEFINES		+= -DCONFIG_ARM
diff --git a/arm/aarch64/arm-cpu.c b/arm/aarch64/arm-cpu.c
index d7572b7790b1..6ccea033f361 100644
--- a/arm/aarch64/arm-cpu.c
+++ b/arm/aarch64/arm-cpu.c
@@ -6,6 +6,7 @@
 #include "arm-common/gic.h"
 #include "arm-common/timer.h"
 #include "arm-common/pmu.h"
+#include "arm-common/spe.h"
 
 #include <linux/byteorder.h>
 #include <linux/types.h>
@@ -17,6 +18,7 @@ static void generate_fdt_nodes(void *fdt, struct kvm *kvm)
 	gic__generate_fdt_nodes(fdt, kvm->cfg.arch.irqchip);
 	timer__generate_fdt_nodes(fdt, kvm, timer_interrupts);
 	pmu__generate_fdt_nodes(fdt, kvm);
+	spe__generate_fdt_nodes(fdt, kvm);
 }
 
 static int arm_cpu__vcpu_init(struct kvm_cpu *vcpu)
diff --git a/arm/aarch64/include/kvm/kvm-config-arch.h b/arm/aarch64/include/kvm/kvm-config-arch.h
index 04be43dfa9b2..9968e1666de5 100644
--- a/arm/aarch64/include/kvm/kvm-config-arch.h
+++ b/arm/aarch64/include/kvm/kvm-config-arch.h
@@ -6,6 +6,8 @@
 			"Run AArch32 guest"),				\
 	OPT_BOOLEAN('\0', "pmu", &(cfg)->has_pmuv3,			\
 			"Create PMUv3 device"),				\
+	OPT_BOOLEAN('\0', "spe", &(cfg)->has_spev1,			\
+			"Create SPEv1 device"),				\
 	OPT_U64('\0', "kaslr-seed", &(cfg)->kaslr_seed,			\
 			"Specify random seed for Kernel Address Space "	\
 			"Layout Randomization (KASLR)"),
diff --git a/arm/aarch64/include/kvm/kvm-cpu-arch.h b/arm/aarch64/include/kvm/kvm-cpu-arch.h
index 8dfb82ecbc37..080183fa4f81 100644
--- a/arm/aarch64/include/kvm/kvm-cpu-arch.h
+++ b/arm/aarch64/include/kvm/kvm-cpu-arch.h
@@ -8,7 +8,8 @@
 #define ARM_VCPU_FEATURE_FLAGS(kvm, cpuid)	{				\
 	[0] = ((!!(cpuid) << KVM_ARM_VCPU_POWER_OFF) |				\
 	       (!!(kvm)->cfg.arch.aarch32_guest << KVM_ARM_VCPU_EL1_32BIT) |	\
-	       (!!(kvm)->cfg.arch.has_pmuv3 << KVM_ARM_VCPU_PMU_V3))		\
+	       (!!(kvm)->cfg.arch.has_pmuv3 << KVM_ARM_VCPU_PMU_V3) |		\
+	       (!!(kvm)->cfg.arch.has_spev1 << KVM_ARM_VCPU_SPE_V1))		\
 }
 
 #define ARM_MPIDR_HWID_BITMASK	0xFF00FFFFFFUL
diff --git a/arm/aarch64/kvm-cpu.c b/arm/aarch64/kvm-cpu.c
index 9f3e8586880c..90c2e1784e97 100644
--- a/arm/aarch64/kvm-cpu.c
+++ b/arm/aarch64/kvm-cpu.c
@@ -140,6 +140,10 @@ void kvm_cpu__select_features(struct kvm *kvm, struct kvm_vcpu_init *init)
 	/* Enable SVE if available */
 	if (kvm__supports_extension(kvm, KVM_CAP_ARM_SVE))
 		init->features[0] |= 1UL << KVM_ARM_VCPU_SVE;
+
+	/* Enable SPE if available */
+	if (kvm__supports_extension(kvm, KVM_CAP_ARM_SPE_V1))
+		init->features[0] |= 1UL << KVM_ARM_VCPU_SPE_V1;
 }
 
 int kvm_cpu__configure_features(struct kvm_cpu *vcpu)
diff --git a/arm/include/arm-common/kvm-config-arch.h b/arm/include/arm-common/kvm-config-arch.h
index 5734c46ab9e6..742733e289af 100644
--- a/arm/include/arm-common/kvm-config-arch.h
+++ b/arm/include/arm-common/kvm-config-arch.h
@@ -9,6 +9,7 @@ struct kvm_config_arch {
 	bool		virtio_trans_pci;
 	bool		aarch32_guest;
 	bool		has_pmuv3;
+	bool		has_spev1;
 	u64		kaslr_seed;
 	enum irqchip_type irqchip;
 	u64		fw_addr;
diff --git a/arm/include/arm-common/spe.h b/arm/include/arm-common/spe.h
new file mode 100644
index 000000000000..bcfa40877f6f
--- /dev/null
+++ b/arm/include/arm-common/spe.h
@@ -0,0 +1,4 @@
+
+#define KVM_ARM_SPEV1_PPI			21
+
+void spe__generate_fdt_nodes(void *fdt, struct kvm *kvm);
diff --git a/arm/spe.c b/arm/spe.c
new file mode 100644
index 000000000000..ec03b01a3866
--- /dev/null
+++ b/arm/spe.c
@@ -0,0 +1,81 @@
+#include "kvm/fdt.h"
+#include "kvm/kvm.h"
+#include "kvm/kvm-cpu.h"
+#include "kvm/util.h"
+
+#include "arm-common/gic.h"
+#include "arm-common/spe.h"
+
+#ifdef CONFIG_ARM64
+static int set_spe_attr(struct kvm *kvm, int vcpu_idx,
+			struct kvm_device_attr *attr)
+{
+	int ret, fd;
+
+	fd = kvm->cpus[vcpu_idx]->vcpu_fd;
+
+	ret = ioctl(fd, KVM_HAS_DEVICE_ATTR, attr);
+	if (!ret) {
+		ret = ioctl(fd, KVM_SET_DEVICE_ATTR, attr);
+		if (ret)
+			pr_err("SPE KVM_SET_DEVICE_ATTR failed (%d)\n", ret);
+	} else {
+		pr_err("Unsupported SPE on vcpu%d\n", vcpu_idx);
+	}
+
+	return ret;
+}
+
+void spe__generate_fdt_nodes(void *fdt, struct kvm *kvm)
+{
+	const char compatible[] = "arm,statistical-profiling-extension-v1";
+	int irq = KVM_ARM_SPEV1_PPI;
+	int i, ret;
+
+	u32 cpu_mask = (((1 << kvm->nrcpus) - 1) << GIC_FDT_IRQ_PPI_CPU_SHIFT) \
+		       & GIC_FDT_IRQ_PPI_CPU_MASK;
+	u32 irq_prop[] = {
+		cpu_to_fdt32(GIC_FDT_IRQ_TYPE_PPI),
+		cpu_to_fdt32(irq - 16),
+		cpu_to_fdt32(cpu_mask | IRQ_TYPE_LEVEL_HIGH),
+	};
+
+	if (!kvm->cfg.arch.has_spev1)
+		return;
+
+	if (!kvm__supports_extension(kvm, KVM_CAP_ARM_SPE_V1)) {
+		pr_info("SPE unsupported\n");
+		return;
+	}
+
+	for (i = 0; i < kvm->nrcpus; i++) {
+		struct kvm_device_attr spe_attr;
+
+		spe_attr = (struct kvm_device_attr){
+			.group	= KVM_ARM_VCPU_SPE_V1_CTRL,
+			.addr	= (u64)(unsigned long)&irq,
+			.attr	= KVM_ARM_VCPU_SPE_V1_IRQ,
+		};
+
+		ret = set_spe_attr(kvm, i, &spe_attr);
+		if (ret)
+			return;
+
+		spe_attr = (struct kvm_device_attr){
+			.group	= KVM_ARM_VCPU_SPE_V1_CTRL,
+			.attr	= KVM_ARM_VCPU_SPE_V1_INIT,
+		};
+
+		ret = set_spe_attr(kvm, i, &spe_attr);
+		if (ret)
+			return;
+	}
+
+	_FDT(fdt_begin_node(fdt, "spe"));
+	_FDT(fdt_property(fdt, "compatible", compatible, sizeof(compatible)));
+	_FDT(fdt_property(fdt, "interrupts", irq_prop, sizeof(irq_prop)));
+	_FDT(fdt_end_node(fdt));
+}
+#else
+void spe__generate_fdt_nodes(void *fdt, struct kvm *kvm) { }
+#endif
-- 
2.21.0
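
The "interrupts" property built by spe__generate_fdt_nodes() above can be
worked through by hand. The sketch below is illustrative only: the GIC_FDT_*
and IRQ_TYPE_LEVEL_HIGH values are assumptions taken from the GIC devicetree
binding, struct spe_irq_cells and spe_make_irq_cells() are hypothetical, and
the cells are kept in host byte order (the patch wraps them in
cpu_to_fdt32()). Unlike the patch, the sketch clamps the CPU count to the
8-bit mask field before shifting.

```c
#include <stdint.h>

/* Values assumed from the GIC devicetree binding. */
#define GIC_FDT_IRQ_TYPE_PPI		1
#define GIC_FDT_IRQ_PPI_CPU_SHIFT	8
#define GIC_FDT_IRQ_PPI_CPU_MASK	(0xffu << GIC_FDT_IRQ_PPI_CPU_SHIFT)
#define IRQ_TYPE_LEVEL_HIGH		4

/* Hypothetical container for the three "interrupts" cells (host endian). */
struct spe_irq_cells {
	uint32_t type;
	uint32_t number;
	uint32_t flags;
};

static struct spe_irq_cells spe_make_irq_cells(unsigned int intid,
					       unsigned int nrcpus)
{
	/* The CPU mask field is 8 bits wide, so clamp before shifting. */
	uint32_t cpus = nrcpus >= 8 ? 0xffu : (1u << nrcpus) - 1;
	uint32_t cpu_mask = (cpus << GIC_FDT_IRQ_PPI_CPU_SHIFT) &
			    GIC_FDT_IRQ_PPI_CPU_MASK;
	struct spe_irq_cells cells = {
		.type	= GIC_FDT_IRQ_TYPE_PPI,
		.number	= intid - 16,	/* PPIs occupy INTIDs 16..31 */
		.flags	= cpu_mask | IRQ_TYPE_LEVEL_HIGH,
	};

	return cells;
}
```

With KVM_ARM_SPEV1_PPI (21) and four vCPUs this gives PPI number 5 and flags
0xf04.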



* Re: [PATCH v2 00/18] arm64: KVM: add SPE profiling support
  2019-12-20 14:30 [PATCH v2 00/18] arm64: KVM: add SPE profiling support Andrew Murray
                   ` (17 preceding siblings ...)
  2019-12-20 14:30 ` [PATCH v2 18/18, KVMTOOL] kvm: add a vcpu feature for SPEv1 support Andrew Murray
@ 2019-12-20 17:55 ` Mark Rutland
  2019-12-24 12:54   ` Andrew Murray
  2019-12-21 10:48 ` Marc Zyngier
  19 siblings, 1 reply; 78+ messages in thread
From: Mark Rutland @ 2019-12-20 17:55 UTC (permalink / raw)
  To: Andrew Murray
  Cc: Marc Zyngier, Catalin Marinas, Will Deacon, Sudeep Holla, kvmarm,
	linux-arm-kernel, kvm, linux-kernel

Hi Andrew,

On Fri, Dec 20, 2019 at 02:30:07PM +0000, Andrew Murray wrote:
> This series implements support for allowing KVM guests to use the Arm
> Statistical Profiling Extension (SPE).
> 
> It has been tested on a model to ensure that both host and guest can
> simultaneously use SPE with valid data. E.g.
> 
> $ perf record -e arm_spe/ts_enable=1,pa_enable=1,pct_enable=1/ \
>         dd if=/dev/zero of=/dev/null count=1000
> $ perf report --dump-raw-trace > spe_buf.txt

What happens if I run perf record on the VMM, or on the CPU(s) that the
VMM is running on? i.e.

$ perf record -e arm_spe/ts_enable=1,pa_enable=1,pct_enable=1/ \
        lkvm ${OPTIONS_FOR_GUEST_USING_SPE}

... or:

$ perf record -a -c 0 -e arm_spe/ts_enable=1,pa_enable=1,pct_enable=1/ \
        sleep 1000 &
$ taskset -c 0 lkvm ${OPTIONS_FOR_GUEST_USING_SPE} &

> As we save and restore the SPE context, the guest can access the SPE
> registers directly, thus in this version of the series we remove the
> trapping and emulation.
> 
> In the previous series of this support, when KVM SPE isn't supported
> (e.g. via CONFIG_KVM_ARM_SPE) we were able to return a value of 0 to
> all reads of the SPE registers - as we can no longer do this there isn't
> a mechanism to prevent the guest from using SPE - thus I'm keen for
> feedback on the best way of resolving this.

When not providing SPE to the guest, surely we should be trapping the
registers and injecting an UNDEF?

What happens today, without these patches?

> It appears necessary to pin the entire guest memory in order to provide
> guest SPE access - otherwise it is possible for the guest to receive
> Stage-2 faults.

AFAICT these patches do not implement this. I assume that's what you're
trying to point out here, but I just want to make sure that's explicit.

Maybe this is a reason to trap+emulate if there's something more
sensible that hyp can do if it sees a Stage-2 fault.

Thanks,
Mark.


* Re: [PATCH v2 09/18] arm64: KVM: enable conditional save/restore full SPE profiling buffer controls
  2019-12-20 14:30 ` [PATCH v2 09/18] arm64: KVM: enable conditional save/restore full " Andrew Murray
@ 2019-12-20 18:06   ` Mark Rutland
  2019-12-24 12:15     ` Andrew Murray
  2019-12-21 14:13   ` Marc Zyngier
  1 sibling, 1 reply; 78+ messages in thread
From: Mark Rutland @ 2019-12-20 18:06 UTC (permalink / raw)
  To: Andrew Murray
  Cc: Marc Zyngier, Catalin Marinas, Will Deacon, Sudeep Holla, kvmarm,
	linux-arm-kernel, kvm, linux-kernel

On Fri, Dec 20, 2019 at 02:30:16PM +0000, Andrew Murray wrote:
> From: Sudeep Holla <sudeep.holla@arm.com>
> 
> Now that we can save/restore the full SPE controls, we can enable it
> if SPE is setup and ready to use in KVM. It's supported in KVM only if
> all the CPUs in the system support SPE.
> 
> However to support heterogeneous systems, we need to move the check if
> host supports SPE and do a partial save/restore.

I don't think that it makes sense to support this for heterogeneous
systems, given their SPE capabilities and IMP DEF details will differ.

Is there some way we can limit this to homogeneous systems?

Thanks,
Mark.

> 
> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
> Signed-off-by: Andrew Murray <andrew.murray@arm.com>
> ---
>  arch/arm64/kvm/hyp/debug-sr.c | 33 ++++++++++++++++-----------------
>  include/kvm/arm_spe.h         |  6 ++++++
>  2 files changed, 22 insertions(+), 17 deletions(-)
> 
> diff --git a/arch/arm64/kvm/hyp/debug-sr.c b/arch/arm64/kvm/hyp/debug-sr.c
> index 12429b212a3a..d8d857067e6d 100644
> --- a/arch/arm64/kvm/hyp/debug-sr.c
> +++ b/arch/arm64/kvm/hyp/debug-sr.c
> @@ -86,18 +86,13 @@
>  	}
>  
>  static void __hyp_text
> -__debug_save_spe_nvhe(struct kvm_cpu_context *ctxt, bool full_ctxt)
> +__debug_save_spe_context(struct kvm_cpu_context *ctxt, bool full_ctxt)
>  {
>  	u64 reg;
>  
>  	/* Clear pmscr in case of early return */
>  	ctxt->sys_regs[PMSCR_EL1] = 0;
>  
> -	/* SPE present on this CPU? */
> -	if (!cpuid_feature_extract_unsigned_field(read_sysreg(id_aa64dfr0_el1),
> -						  ID_AA64DFR0_PMSVER_SHIFT))
> -		return;
> -
>  	/* Yes; is it owned by higher EL? */
>  	reg = read_sysreg_s(SYS_PMBIDR_EL1);
>  	if (reg & BIT(SYS_PMBIDR_EL1_P_SHIFT))
> @@ -142,7 +137,7 @@ __debug_save_spe_nvhe(struct kvm_cpu_context *ctxt, bool full_ctxt)
>  }
>  
>  static void __hyp_text
> -__debug_restore_spe_nvhe(struct kvm_cpu_context *ctxt, bool full_ctxt)
> +__debug_restore_spe_context(struct kvm_cpu_context *ctxt, bool full_ctxt)
>  {
>  	if (!ctxt->sys_regs[PMSCR_EL1])
>  		return;
> @@ -210,11 +205,14 @@ void __hyp_text __debug_restore_guest_context(struct kvm_vcpu *vcpu)
>  	struct kvm_guest_debug_arch *host_dbg;
>  	struct kvm_guest_debug_arch *guest_dbg;
>  
> +	host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
> +	guest_ctxt = &vcpu->arch.ctxt;
> +
> +	__debug_restore_spe_context(guest_ctxt, kvm_arm_spe_v1_ready(vcpu));
> +
>  	if (!(vcpu->arch.flags & KVM_ARM64_DEBUG_DIRTY))
>  		return;
>  
> -	host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
> -	guest_ctxt = &vcpu->arch.ctxt;
>  	host_dbg = &vcpu->arch.host_debug_state.regs;
>  	guest_dbg = kern_hyp_va(vcpu->arch.debug_ptr);
>  
> @@ -232,8 +230,7 @@ void __hyp_text __debug_restore_host_context(struct kvm_vcpu *vcpu)
>  	host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
>  	guest_ctxt = &vcpu->arch.ctxt;
>  
> -	if (!has_vhe())
> -		__debug_restore_spe_nvhe(host_ctxt, false);
> +	__debug_restore_spe_context(host_ctxt, kvm_arm_spe_v1_ready(vcpu));
>  
>  	if (!(vcpu->arch.flags & KVM_ARM64_DEBUG_DIRTY))
>  		return;
> @@ -249,19 +246,21 @@ void __hyp_text __debug_restore_host_context(struct kvm_vcpu *vcpu)
>  
>  void __hyp_text __debug_save_host_context(struct kvm_vcpu *vcpu)
>  {
> -	/*
> -	 * Non-VHE: Disable and flush SPE data generation
> -	 * VHE: The vcpu can run, but it can't hide.
> -	 */
>  	struct kvm_cpu_context *host_ctxt;
>  
>  	host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
> -	if (!has_vhe())
> -		__debug_save_spe_nvhe(host_ctxt, false);
> +	if (cpuid_feature_extract_unsigned_field(read_sysreg(id_aa64dfr0_el1),
> +						 ID_AA64DFR0_PMSVER_SHIFT))
> +		__debug_save_spe_context(host_ctxt, kvm_arm_spe_v1_ready(vcpu));
>  }
>  
>  void __hyp_text __debug_save_guest_context(struct kvm_vcpu *vcpu)
>  {
> +	bool kvm_spe_ready = kvm_arm_spe_v1_ready(vcpu);
> +
> +	/* SPE present on this vCPU? */
> +	if (kvm_spe_ready)
> +		__debug_save_spe_context(&vcpu->arch.ctxt, kvm_spe_ready);
>  }
>  
>  u32 __hyp_text __kvm_get_mdcr_el2(void)
> diff --git a/include/kvm/arm_spe.h b/include/kvm/arm_spe.h
> index 48d118fdb174..30c40b1bc385 100644
> --- a/include/kvm/arm_spe.h
> +++ b/include/kvm/arm_spe.h
> @@ -16,4 +16,10 @@ struct kvm_spe {
>  	bool irq_level;
>  };
>  
> +#ifdef CONFIG_KVM_ARM_SPE
> +#define kvm_arm_spe_v1_ready(v)		((v)->arch.spe.ready)
> +#else
> +#define kvm_arm_spe_v1_ready(v)		(false)
> +#endif /* CONFIG_KVM_ARM_SPE */
> +
>  #endif /* __ASM_ARM_KVM_SPE_H */
> -- 
> 2.21.0
> 


* Re: [PATCH v2 11/18] KVM: arm64: don't trap Statistical Profiling controls to EL2
  2019-12-20 14:30 ` [PATCH v2 11/18] KVM: arm64: don't trap Statistical Profiling controls to EL2 Andrew Murray
@ 2019-12-20 18:08   ` Mark Rutland
  2019-12-22 10:42   ` Marc Zyngier
  1 sibling, 0 replies; 78+ messages in thread
From: Mark Rutland @ 2019-12-20 18:08 UTC (permalink / raw)
  To: Andrew Murray
  Cc: Marc Zyngier, Catalin Marinas, Will Deacon, Sudeep Holla, kvmarm,
	linux-arm-kernel, kvm, linux-kernel

On Fri, Dec 20, 2019 at 02:30:18PM +0000, Andrew Murray wrote:
> As we now save/restore the profiler state there is no need to trap
> accesses to the statistical profiling controls. Let's unset the
> _TPMS bit.
> 
> Signed-off-by: Andrew Murray <andrew.murray@arm.com>
> ---
>  arch/arm64/kvm/debug.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c
> index 43487f035385..07ca783e7d9e 100644
> --- a/arch/arm64/kvm/debug.c
> +++ b/arch/arm64/kvm/debug.c
> @@ -88,7 +88,6 @@ void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu)
>   *  - Performance monitors (MDCR_EL2_TPM/MDCR_EL2_TPMCR)
>   *  - Debug ROM Address (MDCR_EL2_TDRA)
>   *  - OS related registers (MDCR_EL2_TDOSA)
> - *  - Statistical profiler (MDCR_EL2_TPMS/MDCR_EL2_E2PB)
>   *
>   * Additionally, KVM only traps guest accesses to the debug registers if
>   * the guest is not actively using them (see the KVM_ARM64_DEBUG_DIRTY
> @@ -111,7 +110,6 @@ void kvm_arm_setup_debug(struct kvm_vcpu *vcpu)
>  	 */
>  	vcpu->arch.mdcr_el2 = __this_cpu_read(mdcr_el2) & MDCR_EL2_HPMN_MASK;
>  	vcpu->arch.mdcr_el2 |= (MDCR_EL2_TPM |
> -				MDCR_EL2_TPMS |
>  				MDCR_EL2_TPMCR |
>  				MDCR_EL2_TDRA |
>  				MDCR_EL2_TDOSA);

I think that this should be conditional on some vcpu feature flag.

If nothing else, this could break existing migration cases otherwise.

Thanks,
Mark.
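
One way to picture the conditional trapping suggested above is a helper that
sets MDCR_EL2.TPMS only for vCPUs without the SPE feature. This is a rough
sketch, not a proposed patch: vcpu_mdcr_el2() and its vcpu_has_spe parameter
are hypothetical, the bit positions are assumed from the architecture, and
E2PB handling and UNDEF injection are left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* MDCR_EL2 fields; positions assumed from the architecture. */
#define MDCR_EL2_HPMN_MASK	UINT64_C(0x1f)
#define MDCR_EL2_TPMCR		(UINT64_C(1) << 5)
#define MDCR_EL2_TPM		(UINT64_C(1) << 6)
#define MDCR_EL2_TDOSA		(UINT64_C(1) << 10)
#define MDCR_EL2_TDRA		(UINT64_C(1) << 11)
#define MDCR_EL2_TPMS		(UINT64_C(1) << 14)

/*
 * Build the guest MDCR_EL2 value: keep HPMN from the host, always trap
 * the PMU and debug groups, and trap the profiling controls only when
 * the vCPU was not given SPE.
 */
static uint64_t vcpu_mdcr_el2(uint64_t host_mdcr, bool vcpu_has_spe)
{
	uint64_t mdcr = host_mdcr & MDCR_EL2_HPMN_MASK;

	mdcr |= MDCR_EL2_TPM | MDCR_EL2_TPMCR | MDCR_EL2_TDRA | MDCR_EL2_TDOSA;

	if (!vcpu_has_spe)
		mdcr |= MDCR_EL2_TPMS;

	return mdcr;
}
```

Guests created without the feature would then keep today's trapping
behaviour.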


* Re: [PATCH v2 15/18] perf: arm_spe: Handle guest/host exclusion flags
  2019-12-20 14:30 ` [PATCH v2 15/18] perf: arm_spe: Handle guest/host exclusion flags Andrew Murray
@ 2019-12-20 18:10   ` Mark Rutland
  2019-12-22 12:10   ` Marc Zyngier
  1 sibling, 0 replies; 78+ messages in thread
From: Mark Rutland @ 2019-12-20 18:10 UTC (permalink / raw)
  To: Andrew Murray
  Cc: Marc Zyngier, Catalin Marinas, Will Deacon, Sudeep Holla, kvmarm,
	linux-arm-kernel, kvm, linux-kernel

On Fri, Dec 20, 2019 at 02:30:22PM +0000, Andrew Murray wrote:
> A side effect of supporting the SPE in guests is that we prevent the
> host from collecting data whilst inside a guest thus creating a black-out
> window. This occurs because instead of emulating the SPE, we share it
> with our guests.

We used to permit this; do we know if anyone is using it?

Thanks,
Mark.

> Let's accurately describe our capabilities by rejecting events that
> clear exclude_guest or set exclude_host.
> 
> Signed-off-by: Andrew Murray <andrew.murray@arm.com>
> ---
>  drivers/perf/arm_spe_pmu.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c
> index 2d24af4cfcab..3703dbf459de 100644
> --- a/drivers/perf/arm_spe_pmu.c
> +++ b/drivers/perf/arm_spe_pmu.c
> @@ -679,6 +679,9 @@ static int arm_spe_pmu_event_init(struct perf_event *event)
>  	if (attr->exclude_idle)
>  		return -EOPNOTSUPP;
>  
> +	if (!attr->exclude_guest || attr->exclude_host)
> +		return -EOPNOTSUPP;
> +
>  	/*
>  	 * Feedback-directed frequency throttling doesn't work when we
>  	 * have a buffer of samples. We'd need to manually count the
> -- 
> 2.21.0
> 


* Re: [PATCH v2 00/18] arm64: KVM: add SPE profiling support
  2019-12-20 14:30 [PATCH v2 00/18] arm64: KVM: add SPE profiling support Andrew Murray
                   ` (18 preceding siblings ...)
  2019-12-20 17:55 ` [PATCH v2 00/18] arm64: KVM: add SPE profiling support Mark Rutland
@ 2019-12-21 10:48 ` Marc Zyngier
  2019-12-22 12:22   ` Marc Zyngier
  19 siblings, 1 reply; 78+ messages in thread
From: Marc Zyngier @ 2019-12-21 10:48 UTC (permalink / raw)
  To: Andrew Murray
  Cc: Catalin Marinas, Mark Rutland, kvm, linux-kernel, Sudeep Holla,
	kvmarm, linux-arm-kernel, will

[fixing email addresses]

Hi Andrew,

On 2019-12-20 14:30, Andrew Murray wrote:
> This series implements support for allowing KVM guests to use the Arm
> Statistical Profiling Extension (SPE).

Thanks for this. In future, please Cc me and Will on email addresses
we can actually read.

> It has been tested on a model to ensure that both host and guest can
> simultaneously use SPE with valid data. E.g.
>
> $ perf record -e arm_spe/ts_enable=1,pa_enable=1,pct_enable=1/ \
>         dd if=/dev/zero of=/dev/null count=1000
> $ perf report --dump-raw-trace > spe_buf.txt
>
> As we save and restore the SPE context, the guest can access the SPE
> registers directly, thus in this version of the series we remove the
> trapping and emulation.
>
> In the previous series of this support, when KVM SPE isn't supported
> (e.g. via CONFIG_KVM_ARM_SPE) we were able to return a value of 0 to
> all reads of the SPE registers - as we can no longer do this there isn't
> a mechanism to prevent the guest from using SPE - thus I'm keen for
> feedback on the best way of resolving this.

Surely there is a way to conditionally trap SPE registers, right? You
should still be able to do this if SPE is not configured for a given
guest (as we do for other features such as PtrAuth).

> It appears necessary to pin the entire guest memory in order to provide
> guest SPE access - otherwise it is possible for the guest to receive
> Stage-2 faults.

Really? How can the guest receive a stage-2 fault? This doesn't fit what
I understand of the ARMv8 exception model. Or do you mean a SPE interrupt
describing a S2 fault?

And this is not just pinning the memory either. You have to ensure that
all S2 page tables are created ahead of SPE being able to DMA to guest
memory. This may have some impacts on the THP code...

I'll have a look at the actual series ASAP (but that's not very soon).

Thanks,

         M.
-- 
Jazz is not dead. It just smells funny...

* Re: [PATCH v2 02/18] arm64: KVM: reset E2PB correctly in MDCR_EL2 when exiting the guest(VHE)
  2019-12-20 14:30 ` [PATCH v2 02/18] arm64: KVM: reset E2PB correctly in MDCR_EL2 when exiting the guest(VHE) Andrew Murray
@ 2019-12-21 13:12   ` Marc Zyngier
  2019-12-24 10:29     ` Andrew Murray
  0 siblings, 1 reply; 78+ messages in thread
From: Marc Zyngier @ 2019-12-21 13:12 UTC (permalink / raw)
  To: Andrew Murray
  Cc: will, Catalin Marinas, kvm, linux-kernel, Sudeep Holla, kvmarm,
	linux-arm-kernel

On Fri, 20 Dec 2019 14:30:09 +0000
Andrew Murray <andrew.murray@arm.com> wrote:

> From: Sudeep Holla <sudeep.holla@arm.com>
> 
> On VHE systems, the reset value for MDCR_EL2.E2PB=b00 which defaults
> to profiling buffer using the EL2 stage 1 translations. 

Does the reset value actually matter here? I don't see it being
specific to VHE systems, and all we're trying to achieve is to restore
the SPE configuration to a state where it can be used by the host.

> However if the
> guest are allowed to use profiling buffers changing E2PB settings, we

How can the guest be allowed to change E2PB settings? Or do you mean
here that allowing the guest to use SPE will mandate changes of the
E2PB settings, and that we'd better restore the hypervisor state once
we exit?

> need to ensure we resume back MDCR_EL2.E2PB=b00. Currently we just
> do bitwise '&' with MDCR_EL2_E2PB_MASK which will retain the value.
> 
> So fix it by clearing all the bits in E2PB.
> 
> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
> Signed-off-by: Andrew Murray <andrew.murray@arm.com>
> ---
>  arch/arm64/kvm/hyp/switch.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index 72fbbd86eb5e..250f13910882 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -228,9 +228,7 @@ void deactivate_traps_vhe_put(void)
>  {
>  	u64 mdcr_el2 = read_sysreg(mdcr_el2);
>  
> -	mdcr_el2 &= MDCR_EL2_HPMN_MASK |
> -		    MDCR_EL2_E2PB_MASK << MDCR_EL2_E2PB_SHIFT |
> -		    MDCR_EL2_TPMS;
> +	mdcr_el2 &= MDCR_EL2_HPMN_MASK | MDCR_EL2_TPMS;
>  
>  	write_sysreg(mdcr_el2, mdcr_el2);
>  

I'm OK with this change, but I believe the commit message could use
some tidying up.
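
To make the masking point concrete, here is a minimal standalone sketch
(not the kernel code). The constants are local stand-ins that are assumed
to mirror the MDCR_EL2 layout (HPMN in bits [4:0], E2PB in bits [13:12],
TPMS at bit 14), and both function names are hypothetical.

```c
#include <stdint.h>

/* Local stand-ins, assumed to mirror the MDCR_EL2 field layout. */
#define HPMN_MASK	UINT64_C(0x1f)
#define E2PB_SHIFT	12
#define E2PB_MASK	UINT64_C(0x3)
#define TPMS		(UINT64_C(1) << 14)

/* Old form: E2PB is part of the keep-mask, so its value is retained. */
static uint64_t mdcr_put_old(uint64_t mdcr)
{
	return mdcr & (HPMN_MASK | (E2PB_MASK << E2PB_SHIFT) | TPMS);
}

/* New form: E2PB is not in the keep-mask, so it is cleared to b00. */
static uint64_t mdcr_put_new(uint64_t mdcr)
{
	return mdcr & (HPMN_MASK | TPMS);
}
```

With E2PB set to b11 on entry, the old form hands b11 back to the host
while the new form hands back b00, which is the difference the commit
message is trying to describe.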

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

* Re: [PATCH v2 03/18] arm64: KVM: define SPE data structure for each vcpu
  2019-12-20 14:30 ` [PATCH v2 03/18] arm64: KVM: define SPE data structure for each vcpu Andrew Murray
@ 2019-12-21 13:19   ` Marc Zyngier
  2019-12-24 12:01     ` Andrew Murray
  0 siblings, 1 reply; 78+ messages in thread
From: Marc Zyngier @ 2019-12-21 13:19 UTC (permalink / raw)
  To: Andrew Murray
  Cc: will, Catalin Marinas, kvm, linux-kernel, Sudeep Holla, kvmarm,
	linux-arm-kernel

On Fri, 20 Dec 2019 14:30:10 +0000
Andrew Murray <andrew.murray@arm.com> wrote:

> From: Sudeep Holla <sudeep.holla@arm.com>
> 
> In order to support virtual SPE for guest, so define some basic structs.
> This features depends on host having hardware with SPE support.
> 
> Since we can support this only on ARM64, add a separate config symbol
> for the same.
> 
> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
> [ Add irq_level, rename irq to irq_num for kvm_spe ]
> Signed-off-by: Andrew Murray <andrew.murray@arm.com>
> ---
>  arch/arm64/include/asm/kvm_host.h |  2 ++
>  arch/arm64/kvm/Kconfig            |  7 +++++++
>  include/kvm/arm_spe.h             | 19 +++++++++++++++++++
>  3 files changed, 28 insertions(+)
>  create mode 100644 include/kvm/arm_spe.h
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index c61260cf63c5..f5dcff912645 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -35,6 +35,7 @@
>  #include <kvm/arm_vgic.h>
>  #include <kvm/arm_arch_timer.h>
>  #include <kvm/arm_pmu.h>
> +#include <kvm/arm_spe.h>
>  
>  #define KVM_MAX_VCPUS VGIC_V3_MAX_CPUS
>  
> @@ -302,6 +303,7 @@ struct kvm_vcpu_arch {
>  	struct vgic_cpu vgic_cpu;
>  	struct arch_timer_cpu timer_cpu;
>  	struct kvm_pmu pmu;
> +	struct kvm_spe spe;
>  
>  	/*
>  	 * Anything that is not used directly from assembly code goes
> diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
> index a475c68cbfec..af5be2c57dcb 100644
> --- a/arch/arm64/kvm/Kconfig
> +++ b/arch/arm64/kvm/Kconfig
> @@ -35,6 +35,7 @@ config KVM
>  	select HAVE_KVM_EVENTFD
>  	select HAVE_KVM_IRQFD
>  	select KVM_ARM_PMU if HW_PERF_EVENTS
> +	select KVM_ARM_SPE if (HW_PERF_EVENTS && ARM_SPE_PMU)
>  	select HAVE_KVM_MSI
>  	select HAVE_KVM_IRQCHIP
>  	select HAVE_KVM_IRQ_ROUTING
> @@ -61,6 +62,12 @@ config KVM_ARM_PMU
>  	  Adds support for a virtual Performance Monitoring Unit (PMU) in
>  	  virtual machines.
>  
> +config KVM_ARM_SPE
> +	bool
> +	---help---
> +	  Adds support for a virtual Statistical Profiling Extension(SPE) in
> +	  virtual machines.
> +
>  config KVM_INDIRECT_VECTORS
>         def_bool KVM && (HARDEN_BRANCH_PREDICTOR || HARDEN_EL2_VECTORS)
>  
> diff --git a/include/kvm/arm_spe.h b/include/kvm/arm_spe.h
> new file mode 100644
> index 000000000000..48d118fdb174
> --- /dev/null
> +++ b/include/kvm/arm_spe.h
> @@ -0,0 +1,19 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2019 ARM Ltd.
> + */
> +
> +#ifndef __ASM_ARM_KVM_SPE_H
> +#define __ASM_ARM_KVM_SPE_H
> +
> +#include <uapi/linux/kvm.h>
> +#include <linux/kvm_host.h>

I don't believe these are required at this stage.

> +
> +struct kvm_spe {
> +	int irq_num;

'irq' was the right name *if* this represents a Linux irq. If this
instead represents a guest PPI, then it should be named 'intid'.

In either case, please document what this represents.

> +	bool ready; /* indicates that SPE KVM instance is ready for use */
> +	bool created; /* SPE KVM instance is created, may not be ready yet */
> +	bool irq_level;

What does this represent? The state of the interrupt on the host? The
guest? Something else? Also, please consider grouping related fields
together.

> +};

If you've added a config option that controls the selection of the SPE
feature, why doesn't this result in an empty structure when it isn't
selected?

> +
> +#endif /* __ASM_ARM_KVM_SPE_H */

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v2 08/18] arm64: KVM: add support to save/restore SPE profiling buffer controls
  2019-12-20 14:30 ` [PATCH v2 08/18] arm64: KVM: add support to save/restore SPE profiling buffer controls Andrew Murray
@ 2019-12-21 13:57   ` Marc Zyngier
  2019-12-24 10:49     ` Andrew Murray
  0 siblings, 1 reply; 78+ messages in thread
From: Marc Zyngier @ 2019-12-21 13:57 UTC (permalink / raw)
  To: Andrew Murray
  Cc: Catalin Marinas, Will Deacon, kvm, linux-kernel, Sudeep Holla,
	kvmarm, linux-arm-kernel

On Fri, 20 Dec 2019 14:30:15 +0000
Andrew Murray <andrew.murray@arm.com> wrote:

> From: Sudeep Holla <sudeep.holla@arm.com>
> 
> Currently since we don't support profiling using SPE in the guests,
> we just save the PMSCR_EL1, flush the profiling buffers and disable
> sampling. However in order to support simultaneous sampling both in

Is the sampling actually simultaneous? I don't believe so (the whole
series would be much simpler if it was).

> the host and guests, we need to save and reatore the complete SPE

s/reatore/restore/

> profiling buffer controls' context.
> 
> Let's add the support for the same and keep it disabled for now.
> We can enable it conditionally only if guests are allowed to use
> SPE.
> 
> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
> [ Clear PMBSR bit when saving state to prevent spurious interrupts ]
> Signed-off-by: Andrew Murray <andrew.murray@arm.com>
> ---
>  arch/arm64/kvm/hyp/debug-sr.c | 51 +++++++++++++++++++++++++++++------
>  1 file changed, 43 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/arm64/kvm/hyp/debug-sr.c b/arch/arm64/kvm/hyp/debug-sr.c
> index 8a70a493345e..12429b212a3a 100644
> --- a/arch/arm64/kvm/hyp/debug-sr.c
> +++ b/arch/arm64/kvm/hyp/debug-sr.c
> @@ -85,7 +85,8 @@
>  	default:	write_debug(ptr[0], reg, 0);			\
>  	}
>  
> -static void __hyp_text __debug_save_spe_nvhe(struct kvm_cpu_context *ctxt)
> +static void __hyp_text
> +__debug_save_spe_nvhe(struct kvm_cpu_context *ctxt, bool full_ctxt)

nit: don't split lines like this if you can avoid it. You can put the
full_ctxt parameter on a separate line instead.

>  {
>  	u64 reg;
>  
> @@ -102,22 +103,46 @@ static void __hyp_text __debug_save_spe_nvhe(struct kvm_cpu_context *ctxt)
>  	if (reg & BIT(SYS_PMBIDR_EL1_P_SHIFT))
>  		return;
>  
> -	/* No; is the host actually using the thing? */
> -	reg = read_sysreg_s(SYS_PMBLIMITR_EL1);
> -	if (!(reg & BIT(SYS_PMBLIMITR_EL1_E_SHIFT)))
> +	/* Save the control register and disable data generation */
> +	ctxt->sys_regs[PMSCR_EL1] = read_sysreg_el1(SYS_PMSCR);
> +
> +	if (!ctxt->sys_regs[PMSCR_EL1])

Shouldn't you check the enable bits instead of relying on the whole
thing being zero?
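
A minimal sketch of what checking the enable bits could look like. It is
standalone rather than kernel code: the bit positions are local stand-ins
that assume E0SPE is bit 0 and E1SPE is bit 1 of PMSCR_EL1, and the helper
name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins; assumed PMSCR_EL1 enable bit positions. */
#define PMSCR_E0SPE	(UINT64_C(1) << 0)
#define PMSCR_E1SPE	(UINT64_C(1) << 1)

/* True only if sampling is enabled at EL0 or EL1. */
static bool spe_sampling_enabled(uint64_t pmscr)
{
	return (pmscr & (PMSCR_E0SPE | PMSCR_E1SPE)) != 0;
}
```

A register value with only configuration bits set (timestamps, for
example) is non-zero yet has sampling disabled, so this test and a plain
"is the register zero" test can disagree.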

>  		return;
>  
>  	/* Yes; save the control register and disable data generation */
> -	ctxt->sys_regs[PMSCR_EL1] = read_sysreg_el1(SYS_PMSCR);

You've already saved the control register...

>  	write_sysreg_el1(0, SYS_PMSCR);
>  	isb();
>  
>  	/* Now drain all buffered data to memory */
>  	psb_csync();
>  	dsb(nsh);
> +
> +	if (!full_ctxt)
> +		return;
> +
> +	ctxt->sys_regs[PMBLIMITR_EL1] = read_sysreg_s(SYS_PMBLIMITR_EL1);
> +	write_sysreg_s(0, SYS_PMBLIMITR_EL1);
> +
> +	/*
> +	 * As PMBSR is conditionally restored when returning to the host we
> +	 * must ensure the service bit is unset here to prevent a spurious
> +	 * host SPE interrupt from being raised.
> +	 */
> +	ctxt->sys_regs[PMBSR_EL1] = read_sysreg_s(SYS_PMBSR_EL1);
> +	write_sysreg_s(0, SYS_PMBSR_EL1);
> +
> +	isb();
> +
> +	ctxt->sys_regs[PMSICR_EL1] = read_sysreg_s(SYS_PMSICR_EL1);
> +	ctxt->sys_regs[PMSIRR_EL1] = read_sysreg_s(SYS_PMSIRR_EL1);
> +	ctxt->sys_regs[PMSFCR_EL1] = read_sysreg_s(SYS_PMSFCR_EL1);
> +	ctxt->sys_regs[PMSEVFR_EL1] = read_sysreg_s(SYS_PMSEVFR_EL1);
> +	ctxt->sys_regs[PMSLATFR_EL1] = read_sysreg_s(SYS_PMSLATFR_EL1);
> +	ctxt->sys_regs[PMBPTR_EL1] = read_sysreg_s(SYS_PMBPTR_EL1);
>  }
>  
> -static void __hyp_text __debug_restore_spe_nvhe(struct kvm_cpu_context *ctxt)
> +static void __hyp_text
> +__debug_restore_spe_nvhe(struct kvm_cpu_context *ctxt, bool full_ctxt)
>  {
>  	if (!ctxt->sys_regs[PMSCR_EL1])
>  		return;
> @@ -126,6 +151,16 @@ static void __hyp_text __debug_restore_spe_nvhe(struct kvm_cpu_context *ctxt)
>  	isb();
>  
>  	/* Re-enable data generation */
> +	if (full_ctxt) {
> +		write_sysreg_s(ctxt->sys_regs[PMBPTR_EL1], SYS_PMBPTR_EL1);
> +		write_sysreg_s(ctxt->sys_regs[PMBLIMITR_EL1], SYS_PMBLIMITR_EL1);
> +		write_sysreg_s(ctxt->sys_regs[PMSFCR_EL1], SYS_PMSFCR_EL1);
> +		write_sysreg_s(ctxt->sys_regs[PMSEVFR_EL1], SYS_PMSEVFR_EL1);
> +		write_sysreg_s(ctxt->sys_regs[PMSLATFR_EL1], SYS_PMSLATFR_EL1);
> +		write_sysreg_s(ctxt->sys_regs[PMSIRR_EL1], SYS_PMSIRR_EL1);
> +		write_sysreg_s(ctxt->sys_regs[PMSICR_EL1], SYS_PMSICR_EL1);
> +		write_sysreg_s(ctxt->sys_regs[PMBSR_EL1], SYS_PMBSR_EL1);
> +	}
>  	write_sysreg_el1(ctxt->sys_regs[PMSCR_EL1], SYS_PMSCR);
>  }
>  
> @@ -198,7 +233,7 @@ void __hyp_text __debug_restore_host_context(struct kvm_vcpu *vcpu)
>  	guest_ctxt = &vcpu->arch.ctxt;
>  
>  	if (!has_vhe())
> -		__debug_restore_spe_nvhe(host_ctxt);
> +		__debug_restore_spe_nvhe(host_ctxt, false);
>  
>  	if (!(vcpu->arch.flags & KVM_ARM64_DEBUG_DIRTY))
>  		return;
> @@ -222,7 +257,7 @@ void __hyp_text __debug_save_host_context(struct kvm_vcpu *vcpu)
>  
>  	host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
>  	if (!has_vhe())
> -		__debug_save_spe_nvhe(host_ctxt);
> +		__debug_save_spe_nvhe(host_ctxt, false);
>  }
>  
>  void __hyp_text __debug_save_guest_context(struct kvm_vcpu *vcpu)

So all of this is for non-VHE. What happens in the VHE case?

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v2 09/18] arm64: KVM: enable conditional save/restore full SPE profiling buffer controls
  2019-12-20 14:30 ` [PATCH v2 09/18] arm64: KVM: enable conditional save/restore full " Andrew Murray
  2019-12-20 18:06   ` Mark Rutland
@ 2019-12-21 14:13   ` Marc Zyngier
  2020-01-07 15:13     ` Andrew Murray
  2020-01-10 10:54     ` Andrew Murray
  1 sibling, 2 replies; 78+ messages in thread
From: Marc Zyngier @ 2019-12-21 14:13 UTC (permalink / raw)
  To: Andrew Murray
  Cc: Catalin Marinas, Mark Rutland, will, Sudeep Holla, kvm, kvmarm,
	linux-arm-kernel, linux-kernel

On Fri, 20 Dec 2019 14:30:16 +0000
Andrew Murray <andrew.murray@arm.com> wrote:

[somehow managed not to do a reply all, re-sending]

> From: Sudeep Holla <sudeep.holla@arm.com>
> 
> Now that we can save/restore the full SPE controls, we can enable it
> if SPE is setup and ready to use in KVM. It's supported in KVM only if
> all the CPUs in the system supports SPE.
> 
> However to support heterogenous systems, we need to move the check if
> host supports SPE and do a partial save/restore.

No. Let's just not go down that path. For now, KVM on heterogeneous
systems does not get SPE. If SPE has been enabled on a guest and a CPU
comes up without SPE, this CPU should fail to boot (same as exposing a
feature to userspace).

> 
> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
> Signed-off-by: Andrew Murray <andrew.murray@arm.com>
> ---
>  arch/arm64/kvm/hyp/debug-sr.c | 33 ++++++++++++++++-----------------
>  include/kvm/arm_spe.h         |  6 ++++++
>  2 files changed, 22 insertions(+), 17 deletions(-)
> 
> diff --git a/arch/arm64/kvm/hyp/debug-sr.c b/arch/arm64/kvm/hyp/debug-sr.c
> index 12429b212a3a..d8d857067e6d 100644
> --- a/arch/arm64/kvm/hyp/debug-sr.c
> +++ b/arch/arm64/kvm/hyp/debug-sr.c
> @@ -86,18 +86,13 @@
>  	}
>  
>  static void __hyp_text
> -__debug_save_spe_nvhe(struct kvm_cpu_context *ctxt, bool full_ctxt)
> +__debug_save_spe_context(struct kvm_cpu_context *ctxt, bool full_ctxt)
>  {
>  	u64 reg;
>  
>  	/* Clear pmscr in case of early return */
>  	ctxt->sys_regs[PMSCR_EL1] = 0;
>  
> -	/* SPE present on this CPU? */
> -	if (!cpuid_feature_extract_unsigned_field(read_sysreg(id_aa64dfr0_el1),
> -						  ID_AA64DFR0_PMSVER_SHIFT))
> -		return;
> -
>  	/* Yes; is it owned by higher EL? */
>  	reg = read_sysreg_s(SYS_PMBIDR_EL1);
>  	if (reg & BIT(SYS_PMBIDR_EL1_P_SHIFT))
> @@ -142,7 +137,7 @@ __debug_save_spe_nvhe(struct kvm_cpu_context *ctxt, bool full_ctxt)
>  }
>  
>  static void __hyp_text
> -__debug_restore_spe_nvhe(struct kvm_cpu_context *ctxt, bool full_ctxt)
> +__debug_restore_spe_context(struct kvm_cpu_context *ctxt, bool full_ctxt)
>  {
>  	if (!ctxt->sys_regs[PMSCR_EL1])
>  		return;
> @@ -210,11 +205,14 @@ void __hyp_text __debug_restore_guest_context(struct kvm_vcpu *vcpu)
>  	struct kvm_guest_debug_arch *host_dbg;
>  	struct kvm_guest_debug_arch *guest_dbg;
>  
> +	host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
> +	guest_ctxt = &vcpu->arch.ctxt;
> +
> +	__debug_restore_spe_context(guest_ctxt, kvm_arm_spe_v1_ready(vcpu));
> +
>  	if (!(vcpu->arch.flags & KVM_ARM64_DEBUG_DIRTY))
>  		return;
>  
> -	host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
> -	guest_ctxt = &vcpu->arch.ctxt;
>  	host_dbg = &vcpu->arch.host_debug_state.regs;
>  	guest_dbg = kern_hyp_va(vcpu->arch.debug_ptr);
>  
> @@ -232,8 +230,7 @@ void __hyp_text __debug_restore_host_context(struct kvm_vcpu *vcpu)
>  	host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
>  	guest_ctxt = &vcpu->arch.ctxt;
>  
> -	if (!has_vhe())
> -		__debug_restore_spe_nvhe(host_ctxt, false);
> +	__debug_restore_spe_context(host_ctxt, kvm_arm_spe_v1_ready(vcpu));

So you now do an unconditional save/restore on the exit path for VHE as
well? Even if the host isn't using the SPE HW? That's not acceptable
as, in most cases, only the host /or/ the guest will use SPE. Here, you
put a measurable overhead on each exit.

If the host is not using SPE, then the restore/save should happen in
vcpu_load/vcpu_put. Only if the host is using SPE should you do
something in the run loop. Of course, this only applies to VHE and
non-VHE must switch eagerly.
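
The policy above can be written down as a small decision function. This
is only an illustrative sketch: the enum, the function name and the
"host_uses_spe" input are hypothetical, and how the host's use of SPE
would actually be detected is left open.

```c
#include <stdbool.h>

enum spe_switch_point {
	SPE_SWITCH_LOAD_PUT,	/* lazily, in vcpu_load()/vcpu_put() */
	SPE_SWITCH_RUN_LOOP,	/* eagerly, on every guest entry/exit */
};

/* Pick where the SPE context should be switched. */
static enum spe_switch_point spe_pick_switch_point(bool has_vhe,
						   bool host_uses_spe)
{
	/* Non-VHE has no choice: it must switch eagerly. */
	if (!has_vhe)
		return SPE_SWITCH_RUN_LOOP;

	/* VHE only pays the per-exit cost when the host profiles too. */
	return host_uses_spe ? SPE_SWITCH_RUN_LOOP : SPE_SWITCH_LOAD_PUT;
}
```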

>  
>  	if (!(vcpu->arch.flags & KVM_ARM64_DEBUG_DIRTY))
>  		return;
> @@ -249,19 +246,21 @@ void __hyp_text __debug_restore_host_context(struct kvm_vcpu *vcpu)
>  
>  void __hyp_text __debug_save_host_context(struct kvm_vcpu *vcpu)
>  {
> -	/*
> -	 * Non-VHE: Disable and flush SPE data generation
> -	 * VHE: The vcpu can run, but it can't hide.
> -	 */
>  	struct kvm_cpu_context *host_ctxt;
>  
>  	host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
> -	if (!has_vhe())
> -		__debug_save_spe_nvhe(host_ctxt, false);
> +	if (cpuid_feature_extract_unsigned_field(read_sysreg(id_aa64dfr0_el1),
> +						 ID_AA64DFR0_PMSVER_SHIFT))
> +		__debug_save_spe_context(host_ctxt, kvm_arm_spe_v1_ready(vcpu));
>  }
>  
>  void __hyp_text __debug_save_guest_context(struct kvm_vcpu *vcpu)
>  {
> +	bool kvm_spe_ready = kvm_arm_spe_v1_ready(vcpu);
> +
> +	/* SPE present on this vCPU? */
> +	if (kvm_spe_ready)
> +		__debug_save_spe_context(&vcpu->arch.ctxt, kvm_spe_ready);
>  }
>  
>  u32 __hyp_text __kvm_get_mdcr_el2(void)
> diff --git a/include/kvm/arm_spe.h b/include/kvm/arm_spe.h
> index 48d118fdb174..30c40b1bc385 100644
> --- a/include/kvm/arm_spe.h
> +++ b/include/kvm/arm_spe.h
> @@ -16,4 +16,10 @@ struct kvm_spe {
>  	bool irq_level;
>  };
>  
> +#ifdef CONFIG_KVM_ARM_SPE
> +#define kvm_arm_spe_v1_ready(v)		((v)->arch.spe.ready)
> +#else
> +#define kvm_arm_spe_v1_ready(v)		(false)
> +#endif /* CONFIG_KVM_ARM_SPE */
> +
>  #endif /* __ASM_ARM_KVM_SPE_H */

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v2 10/18] arm64: KVM/debug: use EL1&0 stage 1 translation regime
  2019-12-20 14:30 ` [PATCH v2 10/18] arm64: KVM/debug: use EL1&0 stage 1 translation regime Andrew Murray
@ 2019-12-22 10:34   ` Marc Zyngier
  2019-12-24 11:11     ` Andrew Murray
  2020-01-13 16:31     ` Andrew Murray
  0 siblings, 2 replies; 78+ messages in thread
From: Marc Zyngier @ 2019-12-22 10:34 UTC (permalink / raw)
  To: Andrew Murray
  Cc: Catalin Marinas, Will Deacon, kvm, linux-kernel, Sudeep Holla,
	kvmarm, linux-arm-kernel

On Fri, 20 Dec 2019 14:30:17 +0000,
Andrew Murray <andrew.murray@arm.com> wrote:
> 
> From: Sudeep Holla <sudeep.holla@arm.com>
> 
> Now that we have all the save/restore mechanism in place, lets enable
> the translation regime used by buffer from EL2 stage 1 to EL1 stage 1
> on VHE systems.
> 
> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
> [ Reword commit, don't trap to EL2 ]

Not trapping to EL2 for the case where we don't allow SPE in the
guest is not acceptable.

> Signed-off-by: Andrew Murray <andrew.murray@arm.com>
> ---
>  arch/arm64/kvm/hyp/switch.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index 67b7c160f65b..6c153b79829b 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -100,6 +100,7 @@ static void activate_traps_vhe(struct kvm_vcpu *vcpu)
>  
>  	write_sysreg(val, cpacr_el1);
>  
> +	write_sysreg(vcpu->arch.mdcr_el2 | 3 << MDCR_EL2_E2PB_SHIFT, mdcr_el2);
>  	write_sysreg(kvm_get_hyp_vector(), vbar_el1);
>  }
>  NOKPROBE_SYMBOL(activate_traps_vhe);
> @@ -117,6 +118,7 @@ static void __hyp_text __activate_traps_nvhe(struct kvm_vcpu *vcpu)
>  		__activate_traps_fpsimd32(vcpu);
>  	}
>  
> +	write_sysreg(vcpu->arch.mdcr_el2 | 3 << MDCR_EL2_E2PB_SHIFT, mdcr_el2);

There is a _MASK macro that can replace this '3', and is in keeping
with the rest of the code.
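
For illustration, a standalone sketch of the same operation with the
literal replaced by a mask. The two macros are local stand-ins assumed to
match MDCR_EL2_E2PB_MASK and MDCR_EL2_E2PB_SHIFT (a two-bit field at bit
12), and the helper name is hypothetical.

```c
#include <stdint.h>

/* Local stand-ins for the E2PB field definition (assumed values). */
#define E2PB_SHIFT	12
#define E2PB_MASK	UINT64_C(0x3)

/* Set E2PB to b11 without touching any other field. */
static uint64_t mdcr_with_el1_buffer(uint64_t mdcr)
{
	return mdcr | (E2PB_MASK << E2PB_SHIFT);
}
```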

It still remains that it looks like the wrong place to do this, and
vcpu_load seems much better. Why should you write to mdcr_el2 on each
entry to the guest, since you know whether it has SPE enabled at the
point where it gets scheduled?

	M.

-- 
Jazz is not dead, it just smells funny.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v2 11/18] KVM: arm64: don't trap Statistical Profiling controls to EL2
  2019-12-20 14:30 ` [PATCH v2 11/18] KVM: arm64: don't trap Statistical Profiling controls to EL2 Andrew Murray
  2019-12-20 18:08   ` Mark Rutland
@ 2019-12-22 10:42   ` Marc Zyngier
  2019-12-23 11:56     ` Andrew Murray
  1 sibling, 1 reply; 78+ messages in thread
From: Marc Zyngier @ 2019-12-22 10:42 UTC (permalink / raw)
  To: Andrew Murray
  Cc: Marc Zyngier, Catalin Marinas, Will Deacon, kvm, linux-kernel,
	Sudeep Holla, kvmarm, linux-arm-kernel

On Fri, 20 Dec 2019 14:30:18 +0000,
Andrew Murray <andrew.murray@arm.com> wrote:
> 
> As we now save/restore the profiler state there is no need to trap
> accesses to the statistical profiling controls. Let's unset the
> _TPMS bit.
> 
> Signed-off-by: Andrew Murray <andrew.murray@arm.com>
> ---
>  arch/arm64/kvm/debug.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c
> index 43487f035385..07ca783e7d9e 100644
> --- a/arch/arm64/kvm/debug.c
> +++ b/arch/arm64/kvm/debug.c
> @@ -88,7 +88,6 @@ void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu)
>   *  - Performance monitors (MDCR_EL2_TPM/MDCR_EL2_TPMCR)
>   *  - Debug ROM Address (MDCR_EL2_TDRA)
>   *  - OS related registers (MDCR_EL2_TDOSA)
> - *  - Statistical profiler (MDCR_EL2_TPMS/MDCR_EL2_E2PB)
>   *
>   * Additionally, KVM only traps guest accesses to the debug registers if
>   * the guest is not actively using them (see the KVM_ARM64_DEBUG_DIRTY
> @@ -111,7 +110,6 @@ void kvm_arm_setup_debug(struct kvm_vcpu *vcpu)
>  	 */
>  	vcpu->arch.mdcr_el2 = __this_cpu_read(mdcr_el2) & MDCR_EL2_HPMN_MASK;
>  	vcpu->arch.mdcr_el2 |= (MDCR_EL2_TPM |
> -				MDCR_EL2_TPMS |

No. This is an *optional* feature (the guest may not be presented
with the SPE feature, or the support may simply not be compiled in).

If the guest is not allowed to see the feature, for whichever reason,
the traps *must* be enabled and handled.
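
As a rough sketch of what "conditional" means here, assuming TPMS sits at
bit 14 of MDCR_EL2. The macro is a local stand-in, the helper name and
the "guest_has_spe" input are hypothetical, and E2PB handling is
deliberately not modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in; assumed position of the SPE trap bit in MDCR_EL2. */
#define TPMS	(UINT64_C(1) << 14)

/* Trap SPE register accesses unless this guest was given SPE. */
static uint64_t mdcr_spe_traps(uint64_t mdcr, bool guest_has_spe)
{
	if (guest_has_spe)
		return mdcr & ~TPMS;

	return mdcr | TPMS;
}
```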

	M.

-- 
Jazz is not dead, it just smells funny.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v2 12/18] KVM: arm64: add a new vcpu device control group for SPEv1
  2019-12-20 14:30 ` [PATCH v2 12/18] KVM: arm64: add a new vcpu device control group for SPEv1 Andrew Murray
@ 2019-12-22 11:03   ` Marc Zyngier
  2019-12-24 12:30     ` Andrew Murray
  0 siblings, 1 reply; 78+ messages in thread
From: Marc Zyngier @ 2019-12-22 11:03 UTC (permalink / raw)
  To: Andrew Murray
  Cc: Catalin Marinas, Will Deacon, kvm, linux-kernel, Sudeep Holla,
	kvmarm, linux-arm-kernel

On Fri, 20 Dec 2019 14:30:19 +0000,
Andrew Murray <andrew.murray@arm.com> wrote:
> 
> From: Sudeep Holla <sudeep.holla@arm.com>
> 
> To configure the virtual SPEv1 overflow interrupt number, we use the
> vcpu kvm_device ioctl, encapsulating the KVM_ARM_VCPU_SPE_V1_IRQ
> attribute within the KVM_ARM_VCPU_SPE_V1_CTRL group.
> 
> After configuring the SPEv1, call the vcpu ioctl with attribute
> KVM_ARM_VCPU_SPE_V1_INIT to initialize the SPEv1.
> 
> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
> Signed-off-by: Andrew Murray <andrew.murray@arm.com>
> ---
>  Documentation/virt/kvm/devices/vcpu.txt |  28 ++++
>  arch/arm64/include/asm/kvm_host.h       |   2 +-
>  arch/arm64/include/uapi/asm/kvm.h       |   4 +
>  arch/arm64/kvm/Makefile                 |   1 +
>  arch/arm64/kvm/guest.c                  |   6 +
>  arch/arm64/kvm/reset.c                  |   3 +
>  include/kvm/arm_spe.h                   |  45 +++++++
>  include/uapi/linux/kvm.h                |   1 +
>  virt/kvm/arm/arm.c                      |   1 +
>  virt/kvm/arm/spe.c                      | 163 ++++++++++++++++++++++++
>  10 files changed, 253 insertions(+), 1 deletion(-)
>  create mode 100644 virt/kvm/arm/spe.c
> 
> diff --git a/Documentation/virt/kvm/devices/vcpu.txt b/Documentation/virt/kvm/devices/vcpu.txt
> index 6f3bd64a05b0..cefad056d677 100644
> --- a/Documentation/virt/kvm/devices/vcpu.txt
> +++ b/Documentation/virt/kvm/devices/vcpu.txt
> @@ -74,3 +74,31 @@ Specifies the base address of the stolen time structure for this VCPU. The
>  base address must be 64 byte aligned and exist within a valid guest memory
>  region. See Documentation/virt/kvm/arm/pvtime.txt for more information
>  including the layout of the stolen time structure.
> +
> +4. GROUP: KVM_ARM_VCPU_SPE_V1_CTRL
> +Architectures: ARM64
> +
> +4.1. ATTRIBUTE: KVM_ARM_VCPU_SPE_V1_IRQ
> +Parameters: in kvm_device_attr.addr the address for SPE buffer overflow interrupt
> +	    is a pointer to an int
> +Returns: -EBUSY: The SPE overflow interrupt is already set
> +         -ENXIO: The overflow interrupt not set when attempting to get it
> +         -ENODEV: SPEv1 not supported
> +         -EINVAL: Invalid SPE overflow interrupt number supplied or
> +                  trying to set the IRQ number without using an in-kernel
> +                  irqchip.
> +
> +A value describing the SPEv1 (Statistical Profiling Extension v1) overflow
> +interrupt number for this vcpu. This interrupt should be PPI and the interrupt
> +type and number must be same for each vcpu.
> +
> +4.2 ATTRIBUTE: KVM_ARM_VCPU_SPE_V1_INIT
> +Parameters: no additional parameter in kvm_device_attr.addr
> +Returns: -ENODEV: SPEv1 not supported or GIC not initialized
> +         -ENXIO: SPEv1 not properly configured or in-kernel irqchip not
> +                 configured as required prior to calling this attribute
> +         -EBUSY: SPEv1 already initialized
> +
> +Request the initialization of the SPEv1.  If using the SPEv1 with an in-kernel
> +virtual GIC implementation, this must be done after initializing the in-kernel
> +irqchip.
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 333c6491bec7..d00f450dc4cd 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -39,7 +39,7 @@
>  
>  #define KVM_MAX_VCPUS VGIC_V3_MAX_CPUS
>  
> -#define KVM_VCPU_MAX_FEATURES 7
> +#define KVM_VCPU_MAX_FEATURES 8
>  
>  #define KVM_REQ_SLEEP \
>  	KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
> diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
> index 820e5751ada7..905a73f30079 100644
> --- a/arch/arm64/include/uapi/asm/kvm.h
> +++ b/arch/arm64/include/uapi/asm/kvm.h
> @@ -106,6 +106,7 @@ struct kvm_regs {
>  #define KVM_ARM_VCPU_SVE		4 /* enable SVE for this CPU */
>  #define KVM_ARM_VCPU_PTRAUTH_ADDRESS	5 /* VCPU uses address authentication */
>  #define KVM_ARM_VCPU_PTRAUTH_GENERIC	6 /* VCPU uses generic authentication */
> +#define KVM_ARM_VCPU_SPE_V1		7 /* Support guest SPEv1 */
>  
>  struct kvm_vcpu_init {
>  	__u32 target;
> @@ -326,6 +327,9 @@ struct kvm_vcpu_events {
>  #define   KVM_ARM_VCPU_TIMER_IRQ_PTIMER		1
>  #define KVM_ARM_VCPU_PVTIME_CTRL	2
>  #define   KVM_ARM_VCPU_PVTIME_IPA	0
> +#define KVM_ARM_VCPU_SPE_V1_CTRL	3
> +#define   KVM_ARM_VCPU_SPE_V1_IRQ	0
> +#define   KVM_ARM_VCPU_SPE_V1_INIT	1
>  
>  /* KVM_IRQ_LINE irq field index values */
>  #define KVM_ARM_IRQ_VCPU2_SHIFT		28
> diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
> index 5ffbdc39e780..526f3bf09321 100644
> --- a/arch/arm64/kvm/Makefile
> +++ b/arch/arm64/kvm/Makefile
> @@ -37,3 +37,4 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic/vgic-debug.o
>  kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/irqchip.o
>  kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/arch_timer.o
>  kvm-$(CONFIG_KVM_ARM_PMU) += $(KVM)/arm/pmu.o
> +kvm-$(CONFIG_KVM_ARM_SPE) += $(KVM)/arm/spe.o
> diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
> index 2fff06114a8f..50fea538b8bd 100644
> --- a/arch/arm64/kvm/guest.c
> +++ b/arch/arm64/kvm/guest.c
> @@ -874,6 +874,8 @@ int kvm_arm_vcpu_arch_set_attr(struct kvm_vcpu *vcpu,
>  		break;
>  	case KVM_ARM_VCPU_PVTIME_CTRL:
>  		ret = kvm_arm_pvtime_set_attr(vcpu, attr);
> +	case KVM_ARM_VCPU_SPE_V1_CTRL:
> +		ret = kvm_arm_spe_v1_set_attr(vcpu, attr);
>  		break;
>  	default:
>  		ret = -ENXIO;
> @@ -897,6 +899,8 @@ int kvm_arm_vcpu_arch_get_attr(struct kvm_vcpu *vcpu,
>  		break;
>  	case KVM_ARM_VCPU_PVTIME_CTRL:
>  		ret = kvm_arm_pvtime_get_attr(vcpu, attr);
> +	case KVM_ARM_VCPU_SPE_V1_CTRL:
> +		ret = kvm_arm_spe_v1_get_attr(vcpu, attr);
>  		break;
>  	default:
>  		ret = -ENXIO;
> @@ -920,6 +924,8 @@ int kvm_arm_vcpu_arch_has_attr(struct kvm_vcpu *vcpu,
>  		break;
>  	case KVM_ARM_VCPU_PVTIME_CTRL:
>  		ret = kvm_arm_pvtime_has_attr(vcpu, attr);
> +	case KVM_ARM_VCPU_SPE_V1_CTRL:
> +		ret = kvm_arm_spe_v1_has_attr(vcpu, attr);
>  		break;
>  	default:
>  		ret = -ENXIO;
> diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
> index f4a8ae918827..cf17aff1489d 100644
> --- a/arch/arm64/kvm/reset.c
> +++ b/arch/arm64/kvm/reset.c
> @@ -80,6 +80,9 @@ int kvm_arch_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>  	case KVM_CAP_ARM_INJECT_SERROR_ESR:
>  		r = cpus_have_const_cap(ARM64_HAS_RAS_EXTN);
>  		break;
> +	case KVM_CAP_ARM_SPE_V1:
> +		r = kvm_arm_support_spe_v1();
> +		break;
>  	case KVM_CAP_SET_GUEST_DEBUG:
>  	case KVM_CAP_VCPU_ATTRIBUTES:
>  		r = 1;
> diff --git a/include/kvm/arm_spe.h b/include/kvm/arm_spe.h
> index 30c40b1bc385..d1f3c564dfd0 100644
> --- a/include/kvm/arm_spe.h
> +++ b/include/kvm/arm_spe.h
> @@ -8,6 +8,7 @@
>  
>  #include <uapi/linux/kvm.h>
>  #include <linux/kvm_host.h>
> +#include <linux/cpufeature.h>
>  
>  struct kvm_spe {
>  	int irq_num;
> @@ -18,8 +19,52 @@ struct kvm_spe {
>  
>  #ifdef CONFIG_KVM_ARM_SPE
>  #define kvm_arm_spe_v1_ready(v)		((v)->arch.spe.ready)
> +#define kvm_arm_spe_irq_initialized(v)		\
> +	((v)->arch.spe.irq_num >= VGIC_NR_SGIS &&	\
> +	(v)->arch.spe.irq_num <= VGIC_MAX_PRIVATE)

This is buggy, as it accepts 32 as a valid interrupt (which obviously
isn't a PPI). Having fixed it, this is a duplicate of irq_is_ppi().

And that's where we can finally confirm that 'irq_num' is a GIC
INTID. Please name it as such.
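
To make the range question concrete, here is a minimal sketch of a PPI
check written with a half-open upper bound. It assumes the usual GIC
INTID layout (SGIs 0-15, PPIs 16-31, SPIs from 32). The constants and
the helper name are invented for this example; they are not the
kernel's macros or irq_is_ppi() itself.

```c
#include <stdbool.h>

/* Assumed GIC INTID layout: SGIs are 0..15, PPIs are 16..31. */
#define SKETCH_NR_SGIS		16
#define SKETCH_NR_PRIVATE_IRQS	32

/*
 * Hypothetical helper: true only for INTIDs in [16, 32).
 * The exclusive upper bound means INTID 32 (the first SPI) is rejected.
 */
static bool sketch_intid_is_ppi(int intid)
{
	return intid >= SKETCH_NR_SGIS && intid < SKETCH_NR_PRIVATE_IRQS;
}
```

Writing the bound as "< number of private interrupts" rather than
"<= some maximum" leaves less room for an off-by-one, whatever the
maximum is defined to be.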

> +
> +static inline bool kvm_arm_support_spe_v1(void)
> +{
> +	u64 dfr0 = read_sanitised_ftr_reg(SYS_ID_AA64DFR0_EL1);
> +
> +	return !!cpuid_feature_extract_unsigned_field(dfr0,
> +						      ID_AA64DFR0_PMSVER_SHIFT);
> +}
> +
> +int kvm_arm_spe_v1_set_attr(struct kvm_vcpu *vcpu,
> +			    struct kvm_device_attr *attr);
> +int kvm_arm_spe_v1_get_attr(struct kvm_vcpu *vcpu,
> +			    struct kvm_device_attr *attr);
> +int kvm_arm_spe_v1_has_attr(struct kvm_vcpu *vcpu,
> +			    struct kvm_device_attr *attr);
> +int kvm_arm_spe_v1_enable(struct kvm_vcpu *vcpu);
>  #else
>  #define kvm_arm_spe_v1_ready(v)		(false)
> +#define kvm_arm_support_spe_v1()	(false)
> +#define kvm_arm_spe_irq_initialized(v)	(false)
> +
> +static inline int kvm_arm_spe_v1_set_attr(struct kvm_vcpu *vcpu,
> +					  struct kvm_device_attr *attr)
> +{
> +	return -ENXIO;
> +}
> +
> +static inline int kvm_arm_spe_v1_get_attr(struct kvm_vcpu *vcpu,
> +					  struct kvm_device_attr *attr)
> +{
> +	return -ENXIO;
> +}
> +
> +static inline int kvm_arm_spe_v1_has_attr(struct kvm_vcpu *vcpu,
> +					  struct kvm_device_attr *attr)
> +{
> +	return -ENXIO;
> +}
> +
> +static inline int kvm_arm_spe_v1_enable(struct kvm_vcpu *vcpu)
> +{
> +	return 0;
> +}
>  #endif /* CONFIG_KVM_ARM_SPE */
>  
>  #endif /* __ASM_ARM_KVM_SPE_H */
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index f0a16b4adbbd..1a362c230e4a 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1009,6 +1009,7 @@ struct kvm_ppc_resize_hpt {
>  #define KVM_CAP_PPC_GUEST_DEBUG_SSTEP 176
>  #define KVM_CAP_ARM_NISV_TO_USER 177
>  #define KVM_CAP_ARM_INJECT_EXT_DABT 178
> +#define KVM_CAP_ARM_SPE_V1 179
>  
>  #ifdef KVM_CAP_IRQ_ROUTING
>  
> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> index 12e0280291ce..340d2388ee2c 100644
> --- a/virt/kvm/arm/arm.c
> +++ b/virt/kvm/arm/arm.c
> @@ -22,6 +22,7 @@
>  #include <trace/events/kvm.h>
>  #include <kvm/arm_pmu.h>
>  #include <kvm/arm_psci.h>
> +#include <kvm/arm_spe.h>
>  
>  #define CREATE_TRACE_POINTS
>  #include "trace.h"
> diff --git a/virt/kvm/arm/spe.c b/virt/kvm/arm/spe.c
> new file mode 100644
> index 000000000000..83ac2cce2cc3
> --- /dev/null
> +++ b/virt/kvm/arm/spe.c
> @@ -0,0 +1,163 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2019 ARM Ltd.
> + */
> +
> +#include <linux/cpu.h>
> +#include <linux/kvm.h>
> +#include <linux/kvm_host.h>
> +#include <linux/uaccess.h>
> +#include <asm/kvm_emulate.h>
> +#include <kvm/arm_spe.h>
> +#include <kvm/arm_vgic.h>
> +
> +int kvm_arm_spe_v1_enable(struct kvm_vcpu *vcpu)
> +{
> +	if (!vcpu->arch.spe.created)
> +		return 0;

Shouldn't it be an error to enable something that doesn't exist?

> +
> +	/*
> +	 * A valid interrupt configuration for the SPE is either to have a

either?

> +	 * properly configured interrupt number and using an in-kernel irqchip.
> +	 */
> +	if (irqchip_in_kernel(vcpu->kvm)) {
> +		int irq = vcpu->arch.spe.irq_num;
> +
> +		if (!kvm_arm_spe_irq_initialized(vcpu))
> +			return -EINVAL;
> +
> +		if (!irq_is_ppi(irq))
> +			return -EINVAL;
> +	}
> +
> +	vcpu->arch.spe.ready = true;

And SPE is then ready even when we don't have an in-kernel irqchip?
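
One possible reading of the three questions above, as a standalone
sketch: enabling fails if nothing was created, and the device only
becomes ready with an in-kernel irqchip and a PPI INTID. The struct,
the function name and the choice of error codes are assumptions made
for illustration, not the series' code. Whether "not created" should
really be an error depends on how the caller invokes this for vcpus
that never asked for SPE.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-vcpu SPE state. */
struct sketch_spe {
	bool created;
	bool ready;
	int intid;
};

static int sketch_spe_enable(struct sketch_spe *spe, bool irqchip_in_kernel)
{
	/* Nothing was created, so there is nothing to enable. */
	if (!spe->created)
		return -ENODEV;

	/* Without an in-kernel irqchip the interrupt cannot be delivered. */
	if (!irqchip_in_kernel)
		return -EINVAL;

	/* Only a PPI (INTID 16..31) is accepted. */
	if (spe->intid < 16 || spe->intid >= 32)
		return -EINVAL;

	spe->ready = true;
	return 0;
}
```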

> +
> +	return 0;
> +}
> +
> +static int kvm_arm_spe_v1_init(struct kvm_vcpu *vcpu)
> +{
> +	if (!kvm_arm_support_spe_v1())
> +		return -ENODEV;
> +
> +	if (!test_bit(KVM_ARM_VCPU_SPE_V1, vcpu->arch.features))
> +		return -ENXIO;
> +
> +	if (vcpu->arch.spe.created)
> +		return -EBUSY;
> +
> +	if (irqchip_in_kernel(vcpu->kvm)) {
> +		int ret;
> +
> +		/*
> +		 * If using the SPE with an in-kernel virtual GIC
> +		 * implementation, we require the GIC to be already
> +		 * initialized when initializing the SPE.
> +		 */
> +		if (!vgic_initialized(vcpu->kvm))
> +			return -ENODEV;
> +
> +		ret = kvm_vgic_set_owner(vcpu, vcpu->arch.spe.irq_num,
> +					 &vcpu->arch.spe);
> +		if (ret)
> +			return ret;
> +	}
> +
> +	vcpu->arch.spe.created = true;

Same problem.

> +	return 0;
> +}
> +
> +/*
> + * For one VM the interrupt type must be same for each vcpu.
> + * As a PPI, the interrupt number is the same for all vcpus,
> + * while as an SPI it must be a separate number per vcpu.

Why do you want to support SPIs at all? And it isn't what
kvm_arm_spe_irq_initialized claims to be doing.

> + */
> +static bool spe_irq_is_valid(struct kvm *kvm, int irq)
> +{
> +	int i;
> +	struct kvm_vcpu *vcpu;
> +
> +	kvm_for_each_vcpu(i, vcpu, kvm) {
> +		if (!kvm_arm_spe_irq_initialized(vcpu))
> +			continue;
> +
> +		if (vcpu->arch.spe.irq_num != irq)
> +			return false;
> +	}
> +
> +	return true;
> +}
> +
> +int kvm_arm_spe_v1_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
> +{
> +	switch (attr->attr) {
> +	case KVM_ARM_VCPU_SPE_V1_IRQ: {
> +		int __user *uaddr = (int __user *)(long)attr->addr;
> +		int irq;
> +
> +		if (!irqchip_in_kernel(vcpu->kvm))
> +			return -EINVAL;

Here, you forbid setting the IRQ for a VM that doesn't have an
in-kernel irqchip. And yet below, you're happy to initialise it.

> +
> +		if (!test_bit(KVM_ARM_VCPU_SPE_V1, vcpu->arch.features))
> +			return -ENODEV;
> +
> +		if (get_user(irq, uaddr))
> +			return -EFAULT;
> +
> +		/* The SPE overflow interrupt can be a PPI only */
> +		if (!(irq_is_ppi(irq)))
> +			return -EINVAL;

Ah, so you do know about irq_is_ppi()...

> +
> +		if (!spe_irq_is_valid(vcpu->kvm, irq))

But why don't you fold the interrupt validity checks in this helper?
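
A sketch of what folding the checks into one helper could look like.
It is deliberately detached from the kernel types: the per-vcpu INTIDs
are passed as a plain array, and 0 stands for "not configured yet".
Both of those, and the helper name, are assumptions for this example.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_INTID_UNSET	0	/* assumed marker for "no INTID set" */

/*
 * Hypothetical single validity helper: the candidate must be a PPI
 * (INTID 16..31) and must match every vcpu that already has one set.
 */
static bool sketch_spe_intid_is_valid(const int *vcpu_intids, size_t nr_vcpus,
				      int intid)
{
	size_t i;

	if (intid < 16 || intid >= 32)
		return false;

	for (i = 0; i < nr_vcpus; i++) {
		if (vcpu_intids[i] == SKETCH_INTID_UNSET)
			continue;

		if (vcpu_intids[i] != intid)
			return false;
	}

	return true;
}
```

With this shape the set_attr path would make a single call instead of
checking the PPI range and the cross-vcpu consistency separately.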

> +			return -EINVAL;
> +
> +		if (kvm_arm_spe_irq_initialized(vcpu))
> +			return -EBUSY;
> +
> +		kvm_debug("Set kvm ARM SPE irq: %d\n", irq);
> +		vcpu->arch.spe.irq_num = irq;
> +		return 0;
> +	}
> +	case KVM_ARM_VCPU_SPE_V1_INIT:
> +		return kvm_arm_spe_v1_init(vcpu);
> +	}
> +
> +	return -ENXIO;
> +}
> +
> +int kvm_arm_spe_v1_get_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
> +{
> +	switch (attr->attr) {
> +	case KVM_ARM_VCPU_SPE_V1_IRQ: {
> +		int __user *uaddr = (int __user *)(long)attr->addr;
> +		int irq;
> +
> +		if (!irqchip_in_kernel(vcpu->kvm))
> +			return -EINVAL;
> +
> +		if (!test_bit(KVM_ARM_VCPU_SPE_V1, vcpu->arch.features))
> +			return -ENODEV;
> +
> +		if (!kvm_arm_spe_irq_initialized(vcpu))
> +			return -ENXIO;
> +
> +		irq = vcpu->arch.spe.irq_num;
> +		return put_user(irq, uaddr);
> +	}
> +	}
> +
> +	return -ENXIO;
> +}
> +
> +int kvm_arm_spe_v1_has_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
> +{
> +	switch (attr->attr) {
> +	case KVM_ARM_VCPU_SPE_V1_IRQ:
> +	case KVM_ARM_VCPU_SPE_V1_INIT:
> +		if (kvm_arm_support_spe_v1() &&
> +		    test_bit(KVM_ARM_VCPU_SPE_V1, vcpu->arch.features))
> +			return 0;

It is interesting that all the user interface is designed with the
feature being optional in mind, and yet you've removed all the
necessary handling code.

	M.

-- 
Jazz is not dead, it just smells funny.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v2 13/18] perf: arm_spe: Add KVM structure for obtaining IRQ info
  2019-12-20 14:30 ` [PATCH v2 13/18] perf: arm_spe: Add KVM structure for obtaining IRQ info Andrew Murray
@ 2019-12-22 11:24   ` Marc Zyngier
  2019-12-24 12:35     ` Andrew Murray
  0 siblings, 1 reply; 78+ messages in thread
From: Marc Zyngier @ 2019-12-22 11:24 UTC (permalink / raw)
  To: Andrew Murray
  Cc: Catalin Marinas, Will Deacon, kvm, linux-kernel, Sudeep Holla,
	kvmarm, linux-arm-kernel

On Fri, 20 Dec 2019 14:30:20 +0000,
Andrew Murray <andrew.murray@arm.com> wrote:
> 
> KVM requires knowledge of the physical SPE IRQ number such that it can
> associate it with any virtual IRQ for guests that require SPE emulation.

This is at best extremely odd. The only reason for KVM to obtain this
IRQ number is if it has exclusive access to the device.  This
obviously isn't the case, as this device is shared between host and
guest.

> Let's create a structure to hold this information and an accessor that
> KVM can use to retrieve this information.
> 
> We expect that each SPE device will have the same physical PPI number
> and thus will warn when this is not the case.
> 
> Signed-off-by: Andrew Murray <andrew.murray@arm.com>
> ---
>  drivers/perf/arm_spe_pmu.c | 23 +++++++++++++++++++++++
>  include/kvm/arm_spe.h      |  6 ++++++
>  2 files changed, 29 insertions(+)
> 
> diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c
> index 4e4984a55cd1..2d24af4cfcab 100644
> --- a/drivers/perf/arm_spe_pmu.c
> +++ b/drivers/perf/arm_spe_pmu.c
> @@ -34,6 +34,9 @@
>  #include <linux/smp.h>
>  #include <linux/vmalloc.h>
>  
> +#include <linux/kvm_host.h>
> +#include <kvm/arm_spe.h>
> +
>  #include <asm/barrier.h>
>  #include <asm/cpufeature.h>
>  #include <asm/mmu.h>
> @@ -1127,6 +1130,24 @@ static void arm_spe_pmu_dev_teardown(struct arm_spe_pmu *spe_pmu)
>  	free_percpu_irq(spe_pmu->irq, spe_pmu->handle);
>  }
>  
> +#ifdef CONFIG_KVM_ARM_SPE
> +static struct arm_spe_kvm_info arm_spe_kvm_info;
> +
> +struct arm_spe_kvm_info *arm_spe_get_kvm_info(void)
> +{
> +	return &arm_spe_kvm_info;
> +}

How does this work when SPE is built as a module?

> +
> +static void arm_spe_populate_kvm_info(struct arm_spe_pmu *spe_pmu)
> +{
> +	WARN_ON_ONCE(arm_spe_kvm_info.physical_irq != 0 &&
> +		     arm_spe_kvm_info.physical_irq != spe_pmu->irq);
> +	arm_spe_kvm_info.physical_irq = spe_pmu->irq;

What does 'physical' mean here? It's an IRQ in the Linux sense, so
it's already some random number that bears no relation to anything
'physical'.

> +}
> +#else
> +static void arm_spe_populate_kvm_info(struct arm_spe_pmu *spe_pmu) {}
> +#endif
> +
>  /* Driver and device probing */
>  static int arm_spe_pmu_irq_probe(struct arm_spe_pmu *spe_pmu)
>  {
> @@ -1149,6 +1170,8 @@ static int arm_spe_pmu_irq_probe(struct arm_spe_pmu *spe_pmu)
>  	}
>  
>  	spe_pmu->irq = irq;
> +	arm_spe_populate_kvm_info(spe_pmu);
> +
>  	return 0;
>  }
>  
> diff --git a/include/kvm/arm_spe.h b/include/kvm/arm_spe.h
> index d1f3c564dfd0..9c65130d726d 100644
> --- a/include/kvm/arm_spe.h
> +++ b/include/kvm/arm_spe.h
> @@ -17,6 +17,12 @@ struct kvm_spe {
>  	bool irq_level;
>  };
>  
> +struct arm_spe_kvm_info {
> +	int physical_irq;
> +};
> +
> +struct arm_spe_kvm_info *arm_spe_get_kvm_info(void);
> +
>  #ifdef CONFIG_KVM_ARM_SPE
>  #define kvm_arm_spe_v1_ready(v)		((v)->arch.spe.ready)
>  #define kvm_arm_spe_irq_initialized(v)		\

	M.

-- 
Jazz is not dead, it just smells funny.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v2 14/18] KVM: arm64: spe: Provide guest virtual interrupts for SPE
  2019-12-20 14:30 ` [PATCH v2 14/18] KVM: arm64: spe: Provide guest virtual interrupts for SPE Andrew Murray
@ 2019-12-22 12:07   ` Marc Zyngier
  2019-12-24 11:50     ` Andrew Murray
  0 siblings, 1 reply; 78+ messages in thread
From: Marc Zyngier @ 2019-12-22 12:07 UTC (permalink / raw)
  To: Andrew Murray
  Cc: Marc Zyngier, Catalin Marinas, Will Deacon, kvm, linux-kernel,
	Sudeep Holla, kvmarm, linux-arm-kernel

On Fri, 20 Dec 2019 14:30:21 +0000,
Andrew Murray <andrew.murray@arm.com> wrote:
> 
> Upon the exit of a guest, let's determine if the SPE device has generated
> an interrupt - if so we'll inject a virtual interrupt to the guest.
> 
> Upon the entry and exit of a guest we'll also update the state of the
> physical IRQ such that it is active when a guest interrupt is pending
> and the guest is running.
> 
> Finally we map the physical IRQ to the virtual IRQ such that the guest
> can deactivate the interrupt when it handles the interrupt.
> 
> Signed-off-by: Andrew Murray <andrew.murray@arm.com>
> ---
>  include/kvm/arm_spe.h |  6 ++++
>  virt/kvm/arm/arm.c    |  5 ++-
>  virt/kvm/arm/spe.c    | 71 +++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 81 insertions(+), 1 deletion(-)
> 
> diff --git a/include/kvm/arm_spe.h b/include/kvm/arm_spe.h
> index 9c65130d726d..91b2214f543a 100644
> --- a/include/kvm/arm_spe.h
> +++ b/include/kvm/arm_spe.h
> @@ -37,6 +37,9 @@ static inline bool kvm_arm_support_spe_v1(void)
>  						      ID_AA64DFR0_PMSVER_SHIFT);
>  }
>  
> +void kvm_spe_flush_hwstate(struct kvm_vcpu *vcpu);
> +inline void kvm_spe_sync_hwstate(struct kvm_vcpu *vcpu);
> +
>  int kvm_arm_spe_v1_set_attr(struct kvm_vcpu *vcpu,
>  			    struct kvm_device_attr *attr);
>  int kvm_arm_spe_v1_get_attr(struct kvm_vcpu *vcpu,
> @@ -49,6 +52,9 @@ int kvm_arm_spe_v1_enable(struct kvm_vcpu *vcpu);
>  #define kvm_arm_support_spe_v1()	(false)
>  #define kvm_arm_spe_irq_initialized(v)	(false)
>  
> +static inline void kvm_spe_flush_hwstate(struct kvm_vcpu *vcpu) {}
> +static inline void kvm_spe_sync_hwstate(struct kvm_vcpu *vcpu) {}
> +
>  static inline int kvm_arm_spe_v1_set_attr(struct kvm_vcpu *vcpu,
>  					  struct kvm_device_attr *attr)
>  {
> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> index 340d2388ee2c..a66085c8e785 100644
> --- a/virt/kvm/arm/arm.c
> +++ b/virt/kvm/arm/arm.c
> @@ -741,6 +741,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  		preempt_disable();
>  
>  		kvm_pmu_flush_hwstate(vcpu);
> +		kvm_spe_flush_hwstate(vcpu);
>  
>  		local_irq_disable();
>  
> @@ -782,6 +783,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  		    kvm_request_pending(vcpu)) {
>  			vcpu->mode = OUTSIDE_GUEST_MODE;
>  			isb(); /* Ensure work in x_flush_hwstate is committed */
> +			kvm_spe_sync_hwstate(vcpu);
>  			kvm_pmu_sync_hwstate(vcpu);
>  			if (static_branch_unlikely(&userspace_irqchip_in_use))
>  				kvm_timer_sync_hwstate(vcpu);
> @@ -816,11 +818,12 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  		kvm_arm_clear_debug(vcpu);
>  
>  		/*
> -		 * We must sync the PMU state before the vgic state so
> +		 * We must sync the PMU and SPE state before the vgic state so
>  		 * that the vgic can properly sample the updated state of the
>  		 * interrupt line.
>  		 */
>  		kvm_pmu_sync_hwstate(vcpu);
> +		kvm_spe_sync_hwstate(vcpu);

The *HUGE* difference is that the PMU is purely a virtual interrupt,
while you're trying to deal with a HW interrupt here.

>  
>  		/*
>  		 * Sync the vgic state before syncing the timer state because
> diff --git a/virt/kvm/arm/spe.c b/virt/kvm/arm/spe.c
> index 83ac2cce2cc3..097ed39014e4 100644
> --- a/virt/kvm/arm/spe.c
> +++ b/virt/kvm/arm/spe.c
> @@ -35,6 +35,68 @@ int kvm_arm_spe_v1_enable(struct kvm_vcpu *vcpu)
>  	return 0;
>  }
>  
> +static inline void set_spe_irq_phys_active(struct arm_spe_kvm_info *info,
> +					   bool active)
> +{
> +	int r;
> +	r = irq_set_irqchip_state(info->physical_irq, IRQCHIP_STATE_ACTIVE,
> +				  active);
> +	WARN_ON(r);
> +}
> +
> +void kvm_spe_flush_hwstate(struct kvm_vcpu *vcpu)
> +{
> +	struct kvm_spe *spe = &vcpu->arch.spe;
> +	bool phys_active = false;
> +	struct arm_spe_kvm_info *info = arm_spe_get_kvm_info();
> +
> +	if (!kvm_arm_spe_v1_ready(vcpu))
> +		return;
> +
> +	if (irqchip_in_kernel(vcpu->kvm))
> +		phys_active = kvm_vgic_map_is_active(vcpu, spe->irq_num);
> +
> +	phys_active |= spe->irq_level;
> +
> +	set_spe_irq_phys_active(info, phys_active);

So you're happy to mess with the HW interrupt state even when you
don't have a HW irqchip? If you are going to copy paste the timer code
here, you'd need to support it all the way (no, don't).

> +}
> +
> +void kvm_spe_sync_hwstate(struct kvm_vcpu *vcpu)
> +{
> +	struct kvm_spe *spe = &vcpu->arch.spe;
> +	u64 pmbsr;
> +	int r;
> +	bool service;
> +	struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
> +	struct arm_spe_kvm_info *info = arm_spe_get_kvm_info();
> +
> +	if (!kvm_arm_spe_v1_ready(vcpu))
> +		return;
> +
> +	set_spe_irq_phys_active(info, false);
> +
> +	pmbsr = ctxt->sys_regs[PMBSR_EL1];
> +	service = !!(pmbsr & BIT(SYS_PMBSR_EL1_S_SHIFT));
> +	if (spe->irq_level == service)
> +		return;
> +
> +	spe->irq_level = service;
> +
> +	if (likely(irqchip_in_kernel(vcpu->kvm))) {
> +		r = kvm_vgic_inject_irq(vcpu->kvm, vcpu->vcpu_id,
> +					spe->irq_num, service, spe);
> +		WARN_ON(r);
> +	}
> +}
> +
> +static inline bool kvm_arch_arm_spe_v1_get_input_level(int vintid)
> +{
> +	struct kvm_vcpu *vcpu = kvm_arm_get_running_vcpu();
> +	struct kvm_spe *spe = &vcpu->arch.spe;
> +
> +	return spe->irq_level;
> +}

This isn't what such a callback is for. It is supposed to sample the
HW, and nothing else.

> +
>  static int kvm_arm_spe_v1_init(struct kvm_vcpu *vcpu)
>  {
>  	if (!kvm_arm_support_spe_v1())
> @@ -48,6 +110,7 @@ static int kvm_arm_spe_v1_init(struct kvm_vcpu *vcpu)
>  
>  	if (irqchip_in_kernel(vcpu->kvm)) {
>  		int ret;
> +		struct arm_spe_kvm_info *info;
>  
>  		/*
>  		 * If using the SPE with an in-kernel virtual GIC
> @@ -57,10 +120,18 @@ static int kvm_arm_spe_v1_init(struct kvm_vcpu *vcpu)
>  		if (!vgic_initialized(vcpu->kvm))
>  			return -ENODEV;
>  
> +		info = arm_spe_get_kvm_info();
> +		if (!info->physical_irq)
> +			return -ENODEV;
> +
>  		ret = kvm_vgic_set_owner(vcpu, vcpu->arch.spe.irq_num,
>  					 &vcpu->arch.spe);
>  		if (ret)
>  			return ret;
> +
> +		ret = kvm_vgic_map_phys_irq(vcpu, info->physical_irq,
> +					    vcpu->arch.spe.irq_num,
> +					    kvm_arch_arm_spe_v1_get_input_level);

You're mapping the interrupt into the guest, and yet you have never
forwarded the interrupt in the first place. All this flow is only going
to wreck the host driver as soon as an interrupt occurs.

I think you should rethink the interrupt handling altogether. It would
make more sense if the interrupt was actually completely
virtualized. If you can isolate the guest state and compute the
interrupt state in SW (and from the above, it seems that you can),
then you shouldn't mess with the whole forwarding *at all*, as it
isn't designed for devices shared between host and guests.
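
To illustrate the "compute the interrupt state in SW" idea, a minimal
sketch follows. It derives the virtual line level from the guest's
saved PMBSR_EL1 value and reports when the level changed, which is the
point at which a purely virtual interrupt would be raised or lowered.
The bit position (17 for PMBSR_EL1.S), the struct and the function
name are assumptions for this example, and no physical interrupt state
is touched.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PMBSR_S_SHIFT	17	/* assumed position of PMBSR_EL1.S */

/* Hypothetical per-vcpu record of the virtual SPE interrupt line. */
struct sketch_spe_irq {
	bool level;
};

/*
 * Recompute the virtual line level from the saved guest PMBSR_EL1.
 * Returns true when the level changed, i.e. when the virtual
 * interrupt controller would need to be told about the new level.
 */
static bool sketch_spe_update_level(struct sketch_spe_irq *irq,
				    uint64_t guest_pmbsr)
{
	bool service = (guest_pmbsr >> SKETCH_PMBSR_S_SHIFT) & 1;

	if (irq->level == service)
		return false;

	irq->level = service;
	return true;
}
```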

	M.

-- 
Jazz is not dead, it just smells funny.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v2 15/18] perf: arm_spe: Handle guest/host exclusion flags
  2019-12-20 14:30 ` [PATCH v2 15/18] perf: arm_spe: Handle guest/host exclusion flags Andrew Murray
  2019-12-20 18:10   ` Mark Rutland
@ 2019-12-22 12:10   ` Marc Zyngier
  2019-12-23 12:10     ` Andrew Murray
  1 sibling, 1 reply; 78+ messages in thread
From: Marc Zyngier @ 2019-12-22 12:10 UTC (permalink / raw)
  To: Andrew Murray
  Cc: Marc Zyngier, Catalin Marinas, Will Deacon, kvm, linux-kernel,
	Sudeep Holla, kvmarm, linux-arm-kernel

On Fri, 20 Dec 2019 14:30:22 +0000,
Andrew Murray <andrew.murray@arm.com> wrote:
> 
> A side effect of supporting the SPE in guests is that we prevent the
> host from collecting data whilst inside a guest thus creating a black-out
> window. This occurs because instead of emulating the SPE, we share it
> with our guests.
> 
> Let's accurately describe our capabilities by using the perf exclude
> flags to prevent !exclude_guest and exclude_host flags from being used.
> 
> Signed-off-by: Andrew Murray <andrew.murray@arm.com>
> ---
>  drivers/perf/arm_spe_pmu.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c
> index 2d24af4cfcab..3703dbf459de 100644
> --- a/drivers/perf/arm_spe_pmu.c
> +++ b/drivers/perf/arm_spe_pmu.c
> @@ -679,6 +679,9 @@ static int arm_spe_pmu_event_init(struct perf_event *event)
>  	if (attr->exclude_idle)
>  		return -EOPNOTSUPP;
>  
> +	if (!attr->exclude_guest || attr->exclude_host)
> +		return -EOPNOTSUPP;
> +

I have the opposite approach. If the host decides to profile the
guest, why should that be denied? If there is a black hole, it should
take place in the guest. Today, the host does expect this to work, and
there is no way that we unconditionally allow it to regress.

	M.

-- 
Jazz is not dead, it just smells funny.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v2 00/18] arm64: KVM: add SPE profiling support
  2019-12-21 10:48 ` Marc Zyngier
@ 2019-12-22 12:22   ` Marc Zyngier
  2019-12-24 12:56     ` Andrew Murray
  0 siblings, 1 reply; 78+ messages in thread
From: Marc Zyngier @ 2019-12-22 12:22 UTC (permalink / raw)
  To: Andrew Murray
  Cc: kvm, Catalin Marinas, linux-kernel, Sudeep Holla, will, kvmarm,
	linux-arm-kernel

On Sat, 21 Dec 2019 10:48:16 +0000,
Marc Zyngier <maz@kernel.org> wrote:
> 
> [fixing email addresses]
> 
> Hi Andrew,
> 
> On 2019-12-20 14:30, Andrew Murray wrote:
> > This series implements support for allowing KVM guests to use the Arm
> > Statistical Profiling Extension (SPE).
> 
> Thanks for this. In future, please Cc me and Will on email addresses
> we can actually read.
> 
> > It has been tested on a model to ensure that both host and guest can
> > simultaneously use SPE with valid data. E.g.
> > 
> > $ perf record -e arm_spe/ts_enable=1,pa_enable=1,pct_enable=1/ \
> >         dd if=/dev/zero of=/dev/null count=1000
> > $ perf report --dump-raw-trace > spe_buf.txt
> > 
> > As we save and restore the SPE context, the guest can access the SPE
> > registers directly, thus in this version of the series we remove the
> > trapping and emulation.
> > 
> > In the previous series of this support, when KVM SPE isn't
> > supported (e.g. via CONFIG_KVM_ARM_SPE) we were able to return a
> > value of 0 to all reads of the SPE registers - as we can no longer
> > do this there isn't a mechanism to prevent the guest from using
> > SPE - thus I'm keen for feedback on the best way of resolving
> > this.
> 
> Surely there is a way to conditionally trap SPE registers, right? You
> should still be able to do this if SPE is not configured for a given
> guest (as we do for other feature such as PtrAuth).
> 
> > It appears necessary to pin the entire guest memory in order to
> > provide guest SPE access - otherwise it is possible for the guest
> > to receive Stage-2 faults.
> 
> Really? How can the guest receive a stage-2 fault? This doesn't fit
> what I understand of the ARMv8 exception model. Or do you mean a SPE
> interrupt describing a S2 fault?
> 
> And this is not just pinning the memory either. You have to ensure that
> all S2 page tables are created ahead of SPE being able to DMA to guest
> memory. This may have some impacts on the THP code...
> 
> I'll have a look at the actual series ASAP (but that's not very soon).

I found some time to go through the series, and there is clearly a lot
of work left to do:

- There is nothing here to handle memory pinning whatsoever. If it
  works, it is only thanks to some side effect.

- The missing trapping is deeply worrying. Given that this is an
  optional feature, you cannot just let the guest do whatever it wants
  in an uncontrolled manner.

- The interrupt handling is busted. You mix concepts picked from both
  the PMU and the timer code, while the SPE device doesn't behave like
  either of these two (it is neither a fully emulated device, nor a
  device that is exclusively owned by a guest at any given time).

I expect some level of discussion on the list including at least Will
and myself before you respin this.

	M.

-- 
Jazz is not dead, it just smells funny.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v2 11/18] KVM: arm64: don't trap Statistical Profiling controls to EL2
  2019-12-22 10:42   ` Marc Zyngier
@ 2019-12-23 11:56     ` Andrew Murray
  2019-12-23 12:05       ` Marc Zyngier
  0 siblings, 1 reply; 78+ messages in thread
From: Andrew Murray @ 2019-12-23 11:56 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Marc Zyngier, Catalin Marinas, Will Deacon, kvm, linux-kernel,
	Sudeep Holla, kvmarm, linux-arm-kernel

On Sun, Dec 22, 2019 at 10:42:05AM +0000, Marc Zyngier wrote:
> On Fri, 20 Dec 2019 14:30:18 +0000,
> Andrew Murray <andrew.murray@arm.com> wrote:
> > 
> > As we now save/restore the profiler state there is no need to trap
> > accesses to the statistical profiling controls. Let's unset the
> > _TPMS bit.
> > 
> > Signed-off-by: Andrew Murray <andrew.murray@arm.com>
> > ---
> >  arch/arm64/kvm/debug.c | 2 --
> >  1 file changed, 2 deletions(-)
> > 
> > diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c
> > index 43487f035385..07ca783e7d9e 100644
> > --- a/arch/arm64/kvm/debug.c
> > +++ b/arch/arm64/kvm/debug.c
> > @@ -88,7 +88,6 @@ void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu)
> >   *  - Performance monitors (MDCR_EL2_TPM/MDCR_EL2_TPMCR)
> >   *  - Debug ROM Address (MDCR_EL2_TDRA)
> >   *  - OS related registers (MDCR_EL2_TDOSA)
> > - *  - Statistical profiler (MDCR_EL2_TPMS/MDCR_EL2_E2PB)
> >   *
> >   * Additionally, KVM only traps guest accesses to the debug registers if
> >   * the guest is not actively using them (see the KVM_ARM64_DEBUG_DIRTY
> > @@ -111,7 +110,6 @@ void kvm_arm_setup_debug(struct kvm_vcpu *vcpu)
> >  	 */
> >  	vcpu->arch.mdcr_el2 = __this_cpu_read(mdcr_el2) & MDCR_EL2_HPMN_MASK;
> >  	vcpu->arch.mdcr_el2 |= (MDCR_EL2_TPM |
> > -				MDCR_EL2_TPMS |
> 
> No. This is an *optional* feature (the guest could not be presented
> with the SPE feature, or the the support simply not be compiled in).
> 
> If the guest is not allowed to see the feature, for whichever reason,
> the traps *must* be enabled and handled.

I'll update this (and similar) to trap such registers when we don't support
SPE in the guest.

My original concern in the cover letter was in how to prevent the guest
from attempting to use these registers in the first place - I think the
solution I was looking for is to trap-and-emulate ID_AA64DFR0_EL1 such that
the PMSVer bits indicate that SPE is not emulated.
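
As a rough sketch of that idea: the value returned to the guest for an
emulated ID_AA64DFR0_EL1 read would have the PMSVer field zeroed
whenever the guest is not given SPE. The field position (bits [35:32])
is taken as an assumption here, and the function name is made up for
this example.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PMSVER_SHIFT	32	/* assumed position of PMSVer */
#define SKETCH_PMSVER_MASK	(UINT64_C(0xf) << SKETCH_PMSVER_SHIFT)

/*
 * Hypothetical helper: compute the ID_AA64DFR0_EL1 value a guest
 * should see. PMSVer == 0 advertises that SPE is not implemented.
 */
static uint64_t sketch_guest_id_aa64dfr0(uint64_t host_val, bool guest_has_spe)
{
	if (guest_has_spe)
		return host_val;

	return host_val & ~SKETCH_PMSVER_MASK;
}
```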

Thanks,

Andrew Murray


> 
> 	M.
> 
> -- 
> Jazz is not dead, it just smells funny.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v2 11/18] KVM: arm64: don't trap Statistical Profiling controls to EL2
  2019-12-23 11:56     ` Andrew Murray
@ 2019-12-23 12:05       ` Marc Zyngier
  2019-12-23 12:10         ` Andrew Murray
  0 siblings, 1 reply; 78+ messages in thread
From: Marc Zyngier @ 2019-12-23 12:05 UTC (permalink / raw)
  To: Andrew Murray
  Cc: Catalin Marinas, kvm, linux-kernel, Sudeep Holla, kvmarm,
	linux-arm-kernel

On 2019-12-23 11:56, Andrew Murray wrote:
> On Sun, Dec 22, 2019 at 10:42:05AM +0000, Marc Zyngier wrote:
>> On Fri, 20 Dec 2019 14:30:18 +0000,
>> Andrew Murray <andrew.murray@arm.com> wrote:
>> >
>> > As we now save/restore the profiler state there is no need to trap
>> > accesses to the statistical profiling controls. Let's unset the
>> > _TPMS bit.
>> >
>> > Signed-off-by: Andrew Murray <andrew.murray@arm.com>
>> > ---
>> >  arch/arm64/kvm/debug.c | 2 --
>> >  1 file changed, 2 deletions(-)
>> >
>> > diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c
>> > index 43487f035385..07ca783e7d9e 100644
>> > --- a/arch/arm64/kvm/debug.c
>> > +++ b/arch/arm64/kvm/debug.c
>> > @@ -88,7 +88,6 @@ void kvm_arm_reset_debug_ptr(struct kvm_vcpu 
>> *vcpu)
>> >   *  - Performance monitors (MDCR_EL2_TPM/MDCR_EL2_TPMCR)
>> >   *  - Debug ROM Address (MDCR_EL2_TDRA)
>> >   *  - OS related registers (MDCR_EL2_TDOSA)
>> > - *  - Statistical profiler (MDCR_EL2_TPMS/MDCR_EL2_E2PB)
>> >   *
>> >   * Additionally, KVM only traps guest accesses to the debug 
>> registers if
>> >   * the guest is not actively using them (see the 
>> KVM_ARM64_DEBUG_DIRTY
>> > @@ -111,7 +110,6 @@ void kvm_arm_setup_debug(struct kvm_vcpu 
>> *vcpu)
>> >  	 */
>> >  	vcpu->arch.mdcr_el2 = __this_cpu_read(mdcr_el2) & 
>> MDCR_EL2_HPMN_MASK;
>> >  	vcpu->arch.mdcr_el2 |= (MDCR_EL2_TPM |
>> > -				MDCR_EL2_TPMS |
>>
>> No. This is an *optional* feature (the guest could not be presented
>> with the SPE feature, or the the support simply not be compiled in).
>>
>> If the guest is not allowed to see the feature, for whichever 
>> reason,
>> the traps *must* be enabled and handled.
>
> I'll update this (and similar) to trap such registers when we don't 
> support
> SPE in the guest.
>
> My original concern in the cover letter was in how to prevent the 
> guest
> from attempting to use these registers in the first place - I think 
> the
> solution I was looking for is to trap-and-emulate ID_AA64DFR0_EL1 
> such that
> the PMSVer bits indicate that SPE is not emulated.

That, and active trapping of the SPE system registers resulting in 
injection
of an UNDEF into the offending guest.
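
The trapping side of this could be sketched as below: the trap bit
stays set unless the guest has actually been given SPE. The bit
position used for MDCR_EL2.TPMS (14) is an assumption, the function
name is invented, and the buffer-control trapping governed by
MDCR_EL2.E2PB is left out of the sketch entirely. Handling the trap
(injecting the UNDEF) is likewise not shown.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MDCR_EL2_TPMS	(UINT64_C(1) << 14)	/* assumed bit position */

/*
 * Hypothetical helper: decide the SPE trap bit for a vcpu.
 * Guests without SPE keep trapping, so that accesses can be
 * answered with an UNDEF; guests with SPE get direct access.
 */
static uint64_t sketch_mdcr_el2_for_guest(uint64_t mdcr, bool guest_has_spe)
{
	if (guest_has_spe)
		return mdcr & ~SKETCH_MDCR_EL2_TPMS;

	return mdcr | SKETCH_MDCR_EL2_TPMS;
}
```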

Thanks,

         M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v2 15/18] perf: arm_spe: Handle guest/host exclusion flags
  2019-12-22 12:10   ` Marc Zyngier
@ 2019-12-23 12:10     ` Andrew Murray
  2019-12-23 12:18       ` Marc Zyngier
  0 siblings, 1 reply; 78+ messages in thread
From: Andrew Murray @ 2019-12-23 12:10 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Marc Zyngier, Catalin Marinas, Will Deacon, kvm, linux-kernel,
	Sudeep Holla, kvmarm, linux-arm-kernel

On Sun, Dec 22, 2019 at 12:10:52PM +0000, Marc Zyngier wrote:
> On Fri, 20 Dec 2019 14:30:22 +0000,
> Andrew Murray <andrew.murray@arm.com> wrote:
> > 
> > A side effect of supporting the SPE in guests is that we prevent the
> > host from collecting data whilst inside a guest thus creating a black-out
> > window. This occurs because instead of emulating the SPE, we share it
> > with our guests.
> > 
> > Let's accurately describe our capabilities by using the perf exclude
> > flags to prevent !exclude_guest and exclude_host flags from being used.
> > 
> > Signed-off-by: Andrew Murray <andrew.murray@arm.com>
> > ---
> >  drivers/perf/arm_spe_pmu.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c
> > index 2d24af4cfcab..3703dbf459de 100644
> > --- a/drivers/perf/arm_spe_pmu.c
> > +++ b/drivers/perf/arm_spe_pmu.c
> > @@ -679,6 +679,9 @@ static int arm_spe_pmu_event_init(struct perf_event *event)
> >  	if (attr->exclude_idle)
> >  		return -EOPNOTSUPP;
> >  
> > +	if (!attr->exclude_guest || attr->exclude_host)
> > +		return -EOPNOTSUPP;
> > +
> 
> I have the opposite approach. If the host decides to profile the
> guest, why should that be denied? If there is a black hole, it should
> take place in the guest. Today, the host does expect this to work, and
> there is no way that we unconditionally allow it to regress.

That seems reasonable.

Upon entering the guest we'd have to detect if the host is using SPE, and if
so choose not to restore the guest registers. Instead we'd have to trap them
and let the guest read/write emulated values until the host has finished with
SPE - at which time we could restore the guest SPE registers to hardware.
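
Put as a decision made on each guest entry, the proposal could be
sketched like this. The enum, the function name and the two boolean
inputs are all hypothetical; how the host's use of SPE would actually
be detected is not addressed here.

```c
#include <stdbool.h>

/* Hypothetical outcomes of the per-entry decision. */
enum sketch_spe_mode {
	SKETCH_SPE_UNDEF,	/* guest has no SPE: trap and inject UNDEF */
	SKETCH_SPE_EMULATE,	/* host owns the HW: trap, use shadow values */
	SKETCH_SPE_DIRECT,	/* restore guest registers, no trapping */
};

static enum sketch_spe_mode sketch_spe_mode_on_entry(bool guest_has_spe,
						     bool host_using_spe)
{
	if (!guest_has_spe)
		return SKETCH_SPE_UNDEF;

	/* The host's profiling session takes priority over the guest's. */
	if (host_using_spe)
		return SKETCH_SPE_EMULATE;

	return SKETCH_SPE_DIRECT;
}
```

In this sketch the black-out window ends up on the guest side, since
the guest only sees emulated register values while the host is
profiling.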

Does that approach make sense?

Thanks,

Andrew Murray

> 
> 	M.
> 
> -- 
> Jazz is not dead, it just smells funny.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v2 11/18] KVM: arm64: don't trap Statistical Profiling controls to EL2
  2019-12-23 12:05       ` Marc Zyngier
@ 2019-12-23 12:10         ` Andrew Murray
  2020-01-09 17:25           ` Andrew Murray
  0 siblings, 1 reply; 78+ messages in thread
From: Andrew Murray @ 2019-12-23 12:10 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Catalin Marinas, kvm, linux-kernel, Sudeep Holla, kvmarm,
	linux-arm-kernel

On Mon, Dec 23, 2019 at 12:05:12PM +0000, Marc Zyngier wrote:
> On 2019-12-23 11:56, Andrew Murray wrote:
> > On Sun, Dec 22, 2019 at 10:42:05AM +0000, Marc Zyngier wrote:
> > > On Fri, 20 Dec 2019 14:30:18 +0000,
> > > Andrew Murray <andrew.murray@arm.com> wrote:
> > > >
> > > > As we now save/restore the profiler state there is no need to trap
> > > > accesses to the statistical profiling controls. Let's unset the
> > > > _TPMS bit.
> > > >
> > > > Signed-off-by: Andrew Murray <andrew.murray@arm.com>
> > > > ---
> > > >  arch/arm64/kvm/debug.c | 2 --
> > > >  1 file changed, 2 deletions(-)
> > > >
> > > > diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c
> > > > index 43487f035385..07ca783e7d9e 100644
> > > > --- a/arch/arm64/kvm/debug.c
> > > > +++ b/arch/arm64/kvm/debug.c
> > > > @@ -88,7 +88,6 @@ void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu)
> > > >   *  - Performance monitors (MDCR_EL2_TPM/MDCR_EL2_TPMCR)
> > > >   *  - Debug ROM Address (MDCR_EL2_TDRA)
> > > >   *  - OS related registers (MDCR_EL2_TDOSA)
> > > > - *  - Statistical profiler (MDCR_EL2_TPMS/MDCR_EL2_E2PB)
> > > >   *
> > > >   * Additionally, KVM only traps guest accesses to the debug registers if
> > > >   * the guest is not actively using them (see the KVM_ARM64_DEBUG_DIRTY
> > > > @@ -111,7 +110,6 @@ void kvm_arm_setup_debug(struct kvm_vcpu *vcpu)
> > > >  	 */
> > > >  	vcpu->arch.mdcr_el2 = __this_cpu_read(mdcr_el2) & MDCR_EL2_HPMN_MASK;
> > > >  	vcpu->arch.mdcr_el2 |= (MDCR_EL2_TPM |
> > > > -				MDCR_EL2_TPMS |
> > > 
> > > No. This is an *optional* feature (the guest could not be presented
> > > with the SPE feature, or the support simply not be compiled in).
> > > 
> > > If the guest is not allowed to see the feature, for whichever reason,
> > > the traps *must* be enabled and handled.
> > 
> > I'll update this (and similar) to trap such registers when we don't support
> > SPE in the guest.
> > 
> > My original concern in the cover letter was in how to prevent the guest
> > from attempting to use these registers in the first place - I think the
> > solution I was looking for is to trap-and-emulate ID_AA64DFR0_EL1 such that
> > the PMSVer bits indicate that SPE is not emulated.
> 
> That, and active trapping of the SPE system registers resulting in injection
> of an UNDEF into the offending guest.

Yes that's no problem.
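
For reference, a minimal sketch of the ID register masking discussed above.
It assumes PMSVer is the 4-bit field at ID_AA64DFR0_EL1[35:32] (matching
ID_AA64DFR0_PMSVER_SHIFT in the kernel) and that a value of zero advertises
"no SPE". The helper name guest_id_aa64dfr0() is hypothetical and is not
part of this series.

```c
#include <stdint.h>

/* Assumption: PMSVer lives in ID_AA64DFR0_EL1[35:32]; 0 means no SPE. */
#define ID_AA64DFR0_PMSVER_SHIFT 32
#define ID_AA64DFR0_PMSVER_MASK  (UINT64_C(0xf) << ID_AA64DFR0_PMSVER_SHIFT)

/*
 * Hypothetical helper: the value a trapped guest read of ID_AA64DFR0_EL1
 * would return. When the guest has no SPE, the PMSVer field is cleared
 * and every other field is passed through unchanged.
 */
static uint64_t guest_id_aa64dfr0(uint64_t host_val, int guest_has_spe)
{
	if (guest_has_spe)
		return host_val;

	return host_val & ~ID_AA64DFR0_PMSVER_MASK;
}
```

A guest that still pokes the SPE registers after seeing PMSVer == 0 would
then take the trap and get the UNDEF mentioned above.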

Thanks,

Andrew Murray

> 
> Thanks,
> 
>         M.
> -- 
> Jazz is not dead. It just smells funny...

* Re: [PATCH v2 15/18] perf: arm_spe: Handle guest/host exclusion flags
  2019-12-23 12:10     ` Andrew Murray
@ 2019-12-23 12:18       ` Marc Zyngier
  0 siblings, 0 replies; 78+ messages in thread
From: Marc Zyngier @ 2019-12-23 12:18 UTC (permalink / raw)
  To: Andrew Murray
  Cc: Marc Zyngier, Catalin Marinas, Will Deacon, kvm, linux-kernel,
	Sudeep Holla, kvmarm, linux-arm-kernel

On 2019-12-23 12:10, Andrew Murray wrote:
> On Sun, Dec 22, 2019 at 12:10:52PM +0000, Marc Zyngier wrote:
>> On Fri, 20 Dec 2019 14:30:22 +0000,
>> Andrew Murray <andrew.murray@arm.com> wrote:
>> >
>> > A side effect of supporting the SPE in guests is that we prevent the
>> > host from collecting data whilst inside a guest thus creating a black-out
>> > window. This occurs because instead of emulating the SPE, we share it
>> > with our guests.
>> >
>> > Let's accurately describe our capabilities by using the perf exclude
>> > flags to prevent !exclude_guest and exclude_host flags from being used.
>> >
>> > Signed-off-by: Andrew Murray <andrew.murray@arm.com>
>> > ---
>> >  drivers/perf/arm_spe_pmu.c | 3 +++
>> >  1 file changed, 3 insertions(+)
>> >
>> > diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c
>> > index 2d24af4cfcab..3703dbf459de 100644
>> > --- a/drivers/perf/arm_spe_pmu.c
>> > +++ b/drivers/perf/arm_spe_pmu.c
>> > @@ -679,6 +679,9 @@ static int arm_spe_pmu_event_init(struct perf_event *event)
>> >  	if (attr->exclude_idle)
>> >  		return -EOPNOTSUPP;
>> >
>> > +	if (!attr->exclude_guest || attr->exclude_host)
>> > +		return -EOPNOTSUPP;
>> > +
>>
>> I have the opposite approach. If the host decides to profile the
>> guest, why should that be denied? If there is a black hole, it should
>> take place in the guest. Today, the host does expect this to work, and
>> there is no way that we unconditionally allow it to regress.
>
> That seems reasonable.
>
> Upon entering the guest we'd have to detect if the host is using SPE, and if
> so choose not to restore the guest registers. Instead we'd have to trap them
> and let the guest read/write emulated values until the host has finished with
> SPE - at which time we could restore the guest SPE registers to hardware.
>
> Does that approach make sense?

Yes, this would be much better. All of this can be found out at vcpu_load()
time, and once you've moved most of the SPE sysreg handling there, it will
just follow the normal scheduling flow.
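
To make that concrete, here is a rough sketch of the decision as it might be
taken once per vcpu_load(). It is only an illustration under stated
assumptions: "host is using SPE" is approximated by PMBLIMITR_EL1.E (bit 0)
being set, trapping is expressed through MDCR_EL2.TPMS (bit 14), and direct
guest access through MDCR_EL2.E2PB == 0b11 (bits [13:12]). The function
spe_mdcr_bits() is a made-up name, not something in the series.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions, mirroring the kernel's definitions. */
#define PMBLIMITR_EL1_E     (UINT64_C(1) << 0)   /* profiling buffer enabled */
#define MDCR_EL2_TPMS       (UINT64_C(1) << 14)  /* trap SPE controls to EL2 */
#define MDCR_EL2_E2PB_MASK  UINT64_C(0x3)
#define MDCR_EL2_E2PB_SHIFT 12

/*
 * Hypothetical helper: compute the SPE-related MDCR_EL2 bits for a vcpu.
 *
 * - Guest without SPE, or host currently profiling: keep E2PB at b00 and
 *   set TPMS so guest accesses trap (to be emulated or to inject UNDEF).
 * - Otherwise: hand the buffer to the EL1&0 regime without trapping.
 */
static uint64_t spe_mdcr_bits(bool guest_has_spe, uint64_t host_pmblimitr)
{
	bool host_active = host_pmblimitr & PMBLIMITR_EL1_E;

	if (!guest_has_spe || host_active)
		return MDCR_EL2_TPMS;

	return MDCR_EL2_E2PB_MASK << MDCR_EL2_E2PB_SHIFT;
}
```

The result would be OR-ed into vcpu->arch.mdcr_el2 once, rather than being
recomputed on every guest entry.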

         M.
-- 
Jazz is not dead. It just smells funny...

* Re: [PATCH v2 02/18] arm64: KVM: reset E2PB correctly in MDCR_EL2 when exiting the guest(VHE)
  2019-12-21 13:12   ` Marc Zyngier
@ 2019-12-24 10:29     ` Andrew Murray
  2020-01-02 16:21       ` Andrew Murray
  0 siblings, 1 reply; 78+ messages in thread
From: Andrew Murray @ 2019-12-24 10:29 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: will, Catalin Marinas, kvm, linux-kernel, Sudeep Holla, kvmarm,
	linux-arm-kernel

On Sat, Dec 21, 2019 at 01:12:14PM +0000, Marc Zyngier wrote:
> On Fri, 20 Dec 2019 14:30:09 +0000
> Andrew Murray <andrew.murray@arm.com> wrote:
> 
> > From: Sudeep Holla <sudeep.holla@arm.com>
> > 
> > On VHE systems, the reset value for MDCR_EL2.E2PB=b00 which defaults
> > to profiling buffer using the EL2 stage 1 translations. 
> 
> Does the reset value actually matter here? I don't see it being
> specific to VHE systems, and all we're trying to achieve is to restore
> the SPE configuration to a state where it can be used by the host.
> 
> > However if the
> > guest are allowed to use profiling buffers changing E2PB settings, we
> 
> How can the guest be allowed to change E2PB settings? Or do you mean
> here that allowing the guest to use SPE will mandate changes of the
> E2PB settings, and that we'd better restore the hypervisor state once
> we exit?
> 
> > need to ensure we resume back MDCR_EL2.E2PB=b00. Currently we just
> > do bitwise '&' with MDCR_EL2_E2PB_MASK which will retain the value.
> > 
> > So fix it by clearing all the bits in E2PB.
> > 
> > Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
> > Signed-off-by: Andrew Murray <andrew.murray@arm.com>
> > ---
> >  arch/arm64/kvm/hyp/switch.c | 4 +---
> >  1 file changed, 1 insertion(+), 3 deletions(-)
> > 
> > diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> > index 72fbbd86eb5e..250f13910882 100644
> > --- a/arch/arm64/kvm/hyp/switch.c
> > +++ b/arch/arm64/kvm/hyp/switch.c
> > @@ -228,9 +228,7 @@ void deactivate_traps_vhe_put(void)
> >  {
> >  	u64 mdcr_el2 = read_sysreg(mdcr_el2);
> >  
> > -	mdcr_el2 &= MDCR_EL2_HPMN_MASK |
> > -		    MDCR_EL2_E2PB_MASK << MDCR_EL2_E2PB_SHIFT |
> > -		    MDCR_EL2_TPMS;
> > +	mdcr_el2 &= MDCR_EL2_HPMN_MASK | MDCR_EL2_TPMS;
> >  
> >  	write_sysreg(mdcr_el2, mdcr_el2);
> >  
> 
> I'm OK with this change, but I believe the commit message could use
> some tidying up.

No problem, I'll update the commit message.
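
For the updated commit message, the effect of the change can be shown with a
small standalone sketch. The macro values below are assumptions copied from
how I understand kvm_arm.h (HPMN in bits [4:0], E2PB in bits [13:12], TPMS at
bit 14); the two function names are invented for the comparison.

```c
#include <stdint.h>

/* Assumed field layout of MDCR_EL2. */
#define MDCR_EL2_HPMN_MASK  UINT64_C(0x1f)
#define MDCR_EL2_E2PB_MASK  UINT64_C(0x3)
#define MDCR_EL2_E2PB_SHIFT 12
#define MDCR_EL2_TPMS       (UINT64_C(1) << 14)

/* Before the patch: E2PB is in the keep-mask, so a guest-time value survives. */
static uint64_t mdcr_on_exit_old(uint64_t mdcr)
{
	return mdcr & (MDCR_EL2_HPMN_MASK |
		       MDCR_EL2_E2PB_MASK << MDCR_EL2_E2PB_SHIFT |
		       MDCR_EL2_TPMS);
}

/* After the patch: E2PB is dropped from the keep-mask and reads back as b00. */
static uint64_t mdcr_on_exit_new(uint64_t mdcr)
{
	return mdcr & (MDCR_EL2_HPMN_MASK | MDCR_EL2_TPMS);
}
```

With E2PB set to b11 while the guest runs, the old mask leaves the host with
the EL1&0 regime still selected; the new mask returns it to the EL2 setting.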

Thanks,

Andrew Murray

> 
> Thanks,
> 
> 	M.
> -- 
> Jazz is not dead. It just smells funny...

* Re: [PATCH v2 08/18] arm64: KVM: add support to save/restore SPE profiling buffer controls
  2019-12-21 13:57   ` Marc Zyngier
@ 2019-12-24 10:49     ` Andrew Murray
  2019-12-24 15:17       ` Andrew Murray
  0 siblings, 1 reply; 78+ messages in thread
From: Andrew Murray @ 2019-12-24 10:49 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Catalin Marinas, Will Deacon, kvm, linux-kernel, Sudeep Holla,
	kvmarm, linux-arm-kernel

On Sat, Dec 21, 2019 at 01:57:55PM +0000, Marc Zyngier wrote:
> On Fri, 20 Dec 2019 14:30:15 +0000
> Andrew Murray <andrew.murray@arm.com> wrote:
> 
> > From: Sudeep Holla <sudeep.holla@arm.com>
> > 
> > Currently since we don't support profiling using SPE in the guests,
> > we just save the PMSCR_EL1, flush the profiling buffers and disable
> > sampling. However in order to support simultaneous sampling both in
> 
> Is the sampling actually simultaneous? I don't believe so (the whole
> series would be much simpler if it was).

No, the SPE is used by either the guest or the host at any one time. I guess
the term "simultaneous" was used to refer to the illusion given to both guest
and host that they are able to use it whenever they like. I'll update
the commit message to drop the magic.
 

> 
> > the host and guests, we need to save and reatore the complete SPE
> 
> s/reatore/restore/

Noted.


> 
> > profiling buffer controls' context.
> > 
> > Let's add the support for the same and keep it disabled for now.
> > We can enable it conditionally only if guests are allowed to use
> > SPE.
> > 
> > Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
> > [ Clear PMBSR bit when saving state to prevent spurious interrupts ]
> > Signed-off-by: Andrew Murray <andrew.murray@arm.com>
> > ---
> >  arch/arm64/kvm/hyp/debug-sr.c | 51 +++++++++++++++++++++++++++++------
> >  1 file changed, 43 insertions(+), 8 deletions(-)
> > 
> > diff --git a/arch/arm64/kvm/hyp/debug-sr.c b/arch/arm64/kvm/hyp/debug-sr.c
> > index 8a70a493345e..12429b212a3a 100644
> > --- a/arch/arm64/kvm/hyp/debug-sr.c
> > +++ b/arch/arm64/kvm/hyp/debug-sr.c
> > @@ -85,7 +85,8 @@
> >  	default:	write_debug(ptr[0], reg, 0);			\
> >  	}
> >  
> > -static void __hyp_text __debug_save_spe_nvhe(struct kvm_cpu_context *ctxt)
> > +static void __hyp_text
> > +__debug_save_spe_nvhe(struct kvm_cpu_context *ctxt, bool full_ctxt)
> 
> nit: don't split lines like this if you can avoid it. You can put the
> full_ctxt parameter on a separate line instead.

Yes understood.


> 
> >  {
> >  	u64 reg;
> >  
> > @@ -102,22 +103,46 @@ static void __hyp_text __debug_save_spe_nvhe(struct kvm_cpu_context *ctxt)
> >  	if (reg & BIT(SYS_PMBIDR_EL1_P_SHIFT))
> >  		return;
> >  
> > -	/* No; is the host actually using the thing? */
> > -	reg = read_sysreg_s(SYS_PMBLIMITR_EL1);
> > -	if (!(reg & BIT(SYS_PMBLIMITR_EL1_E_SHIFT)))
> > +	/* Save the control register and disable data generation */
> > +	ctxt->sys_regs[PMSCR_EL1] = read_sysreg_el1(SYS_PMSCR);
> > +
> > +	if (!ctxt->sys_regs[PMSCR_EL1])
> 
> Shouldn't you check the enable bits instead of relying on the whole
> thing being zero?

Yes that would make more sense (E1SPE and E0SPE).
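
Something along these lines, as a minimal sketch. It assumes E0SPE and E1SPE
are bits 0 and 1 of PMSCR_EL1; the helper name spe_sampling_enabled() is
hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed positions of the sampling enable bits in PMSCR_EL1. */
#define PMSCR_EL1_E0SPE (UINT64_C(1) << 0)
#define PMSCR_EL1_E1SPE (UINT64_C(1) << 1)

/*
 * Hypothetical helper: true only if sampling is enabled at EL0 or EL1.
 * Other control bits (for example the timestamp enable) make the register
 * non-zero without enabling sampling, which a plain !pmscr test would miss.
 */
static bool spe_sampling_enabled(uint64_t pmscr)
{
	return pmscr & (PMSCR_EL1_E0SPE | PMSCR_EL1_E1SPE);
}
```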

I feel that this check makes an assumption about the guest/host SPE
driver... What happens if the SPE driver writes to some SPE registers
but doesn't enable PMSCR? If the guest is also using SPE then those
writes will be lost, and when the host returns and the SPE driver enables
SPE it won't work.

With a quick look at the SPE driver I'm not sure this will happen, but
even so it makes me nervous relying on these assumptions. I wonder if
this risk is present in other devices?


> 
> >  		return;
> >  
> >  	/* Yes; save the control register and disable data generation */
> > -	ctxt->sys_regs[PMSCR_EL1] = read_sysreg_el1(SYS_PMSCR);
> 
> You've already saved the control register...

I'll remove that.


> 
> >  	write_sysreg_el1(0, SYS_PMSCR);
> >  	isb();
> >  
> >  	/* Now drain all buffered data to memory */
> >  	psb_csync();
> >  	dsb(nsh);
> > +
> > +	if (!full_ctxt)
> > +		return;
> > +
> > +	ctxt->sys_regs[PMBLIMITR_EL1] = read_sysreg_s(SYS_PMBLIMITR_EL1);
> > +	write_sysreg_s(0, SYS_PMBLIMITR_EL1);
> > +
> > +	/*
> > +	 * As PMBSR is conditionally restored when returning to the host we
> > +	 * must ensure the service bit is unset here to prevent a spurious
> > +	 * host SPE interrupt from being raised.
> > +	 */
> > +	ctxt->sys_regs[PMBSR_EL1] = read_sysreg_s(SYS_PMBSR_EL1);
> > +	write_sysreg_s(0, SYS_PMBSR_EL1);
> > +
> > +	isb();
> > +
> > +	ctxt->sys_regs[PMSICR_EL1] = read_sysreg_s(SYS_PMSICR_EL1);
> > +	ctxt->sys_regs[PMSIRR_EL1] = read_sysreg_s(SYS_PMSIRR_EL1);
> > +	ctxt->sys_regs[PMSFCR_EL1] = read_sysreg_s(SYS_PMSFCR_EL1);
> > +	ctxt->sys_regs[PMSEVFR_EL1] = read_sysreg_s(SYS_PMSEVFR_EL1);
> > +	ctxt->sys_regs[PMSLATFR_EL1] = read_sysreg_s(SYS_PMSLATFR_EL1);
> > +	ctxt->sys_regs[PMBPTR_EL1] = read_sysreg_s(SYS_PMBPTR_EL1);
> >  }
> >  
> > -static void __hyp_text __debug_restore_spe_nvhe(struct kvm_cpu_context *ctxt)
> > +static void __hyp_text
> > +__debug_restore_spe_nvhe(struct kvm_cpu_context *ctxt, bool full_ctxt)
> >  {
> >  	if (!ctxt->sys_regs[PMSCR_EL1])
> >  		return;
> > @@ -126,6 +151,16 @@ static void __hyp_text __debug_restore_spe_nvhe(struct kvm_cpu_context *ctxt)
> >  	isb();
> >  
> >  	/* Re-enable data generation */
> > +	if (full_ctxt) {
> > +		write_sysreg_s(ctxt->sys_regs[PMBPTR_EL1], SYS_PMBPTR_EL1);
> > +		write_sysreg_s(ctxt->sys_regs[PMBLIMITR_EL1], SYS_PMBLIMITR_EL1);
> > +		write_sysreg_s(ctxt->sys_regs[PMSFCR_EL1], SYS_PMSFCR_EL1);
> > +		write_sysreg_s(ctxt->sys_regs[PMSEVFR_EL1], SYS_PMSEVFR_EL1);
> > +		write_sysreg_s(ctxt->sys_regs[PMSLATFR_EL1], SYS_PMSLATFR_EL1);
> > +		write_sysreg_s(ctxt->sys_regs[PMSIRR_EL1], SYS_PMSIRR_EL1);
> > +		write_sysreg_s(ctxt->sys_regs[PMSICR_EL1], SYS_PMSICR_EL1);
> > +		write_sysreg_s(ctxt->sys_regs[PMBSR_EL1], SYS_PMBSR_EL1);
> > +	}
> >  	write_sysreg_el1(ctxt->sys_regs[PMSCR_EL1], SYS_PMSCR);
> >  }
> >  
> > @@ -198,7 +233,7 @@ void __hyp_text __debug_restore_host_context(struct kvm_vcpu *vcpu)
> >  	guest_ctxt = &vcpu->arch.ctxt;
> >  
> >  	if (!has_vhe())
> > -		__debug_restore_spe_nvhe(host_ctxt);
> > +		__debug_restore_spe_nvhe(host_ctxt, false);
> >  
> >  	if (!(vcpu->arch.flags & KVM_ARM64_DEBUG_DIRTY))
> >  		return;
> > @@ -222,7 +257,7 @@ void __hyp_text __debug_save_host_context(struct kvm_vcpu *vcpu)
> >  
> >  	host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
> >  	if (!has_vhe())
> > -		__debug_save_spe_nvhe(host_ctxt);
> > +		__debug_save_spe_nvhe(host_ctxt, false);
> >  }
> >  
> >  void __hyp_text __debug_save_guest_context(struct kvm_vcpu *vcpu)
> 
> So all of this is for non-VHE. What happens in the VHE case?

By the end of the series this ends up in __debug_save_host_context, which is
called for both VHE and nVHE. On the re-spin I'll make this less confusing.

Thanks,

Andrew Murray

> 
> 	M.
> -- 
> Jazz is not dead. It just smells funny...

* Re: [PATCH v2 10/18] arm64: KVM/debug: use EL1&0 stage 1 translation regime
  2019-12-22 10:34   ` Marc Zyngier
@ 2019-12-24 11:11     ` Andrew Murray
  2020-01-13 16:31     ` Andrew Murray
  1 sibling, 0 replies; 78+ messages in thread
From: Andrew Murray @ 2019-12-24 11:11 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Catalin Marinas, Will Deacon, kvm, linux-kernel, Sudeep Holla,
	kvmarm, linux-arm-kernel

On Sun, Dec 22, 2019 at 10:34:55AM +0000, Marc Zyngier wrote:
> On Fri, 20 Dec 2019 14:30:17 +0000,
> Andrew Murray <andrew.murray@arm.com> wrote:
> > 
> > From: Sudeep Holla <sudeep.holla@arm.com>
> > 
> > Now that we have all the save/restore mechanism in place, let's enable
> > the translation regime used by buffer from EL2 stage 1 to EL1 stage 1
> > on VHE systems.
> > 
> > Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
> > [ Reword commit, don't trap to EL2 ]
> 
> Not trapping to EL2 for the case where we don't allow SPE in the
> guest is not acceptable.

Yes understood (because of this I had meant to send the series as RFC btw).


> 
> > Signed-off-by: Andrew Murray <andrew.murray@arm.com>
> > ---
> >  arch/arm64/kvm/hyp/switch.c | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> > index 67b7c160f65b..6c153b79829b 100644
> > --- a/arch/arm64/kvm/hyp/switch.c
> > +++ b/arch/arm64/kvm/hyp/switch.c
> > @@ -100,6 +100,7 @@ static void activate_traps_vhe(struct kvm_vcpu *vcpu)
> >  
> >  	write_sysreg(val, cpacr_el1);
> >  
> > +	write_sysreg(vcpu->arch.mdcr_el2 | 3 << MDCR_EL2_E2PB_SHIFT, mdcr_el2);
> >  	write_sysreg(kvm_get_hyp_vector(), vbar_el1);
> >  }
> >  NOKPROBE_SYMBOL(activate_traps_vhe);
> > @@ -117,6 +118,7 @@ static void __hyp_text __activate_traps_nvhe(struct kvm_vcpu *vcpu)
> >  		__activate_traps_fpsimd32(vcpu);
> >  	}
> >  
> > +	write_sysreg(vcpu->arch.mdcr_el2 | 3 << MDCR_EL2_E2PB_SHIFT, mdcr_el2);
> 
> There is a _MASK macro that can replace this '3', and is in keeping
> with the rest of the code.

OK.


> 
> It still remains that it looks like the wrong place to do this, and
> vcpu_load seems much better. Why should you write to mdcr_el2 on each
> entry to the guest, since you know whether it has SPE enabled at the
> point where it gets scheduled?

Yes OK, I'll move what I can to vcpu_load.

Thanks,

Andrew Murray


> 
> 	M.
> 
> -- 
> Jazz is not dead, it just smells funny.

* Re: [PATCH v2 14/18] KVM: arm64: spe: Provide guest virtual interrupts for SPE
  2019-12-22 12:07   ` Marc Zyngier
@ 2019-12-24 11:50     ` Andrew Murray
  2019-12-24 12:42       ` Marc Zyngier
  0 siblings, 1 reply; 78+ messages in thread
From: Andrew Murray @ 2019-12-24 11:50 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Marc Zyngier, Catalin Marinas, Will Deacon, kvm, linux-kernel,
	Sudeep Holla, kvmarm, linux-arm-kernel

On Sun, Dec 22, 2019 at 12:07:50PM +0000, Marc Zyngier wrote:
> On Fri, 20 Dec 2019 14:30:21 +0000,
> Andrew Murray <andrew.murray@arm.com> wrote:
> > 
> > Upon the exit of a guest, let's determine if the SPE device has generated
> > an interrupt - if so we'll inject a virtual interrupt to the guest.
> > 
> > Upon the entry and exit of a guest we'll also update the state of the
> > physical IRQ such that it is active when a guest interrupt is pending
> > and the guest is running.
> > 
> > Finally we map the physical IRQ to the virtual IRQ such that the guest
> > can deactivate the interrupt when it handles the interrupt.
> > 
> > Signed-off-by: Andrew Murray <andrew.murray@arm.com>
> > ---
> >  include/kvm/arm_spe.h |  6 ++++
> >  virt/kvm/arm/arm.c    |  5 ++-
> >  virt/kvm/arm/spe.c    | 71 +++++++++++++++++++++++++++++++++++++++++++
> >  3 files changed, 81 insertions(+), 1 deletion(-)
> > 
> > diff --git a/include/kvm/arm_spe.h b/include/kvm/arm_spe.h
> > index 9c65130d726d..91b2214f543a 100644
> > --- a/include/kvm/arm_spe.h
> > +++ b/include/kvm/arm_spe.h
> > @@ -37,6 +37,9 @@ static inline bool kvm_arm_support_spe_v1(void)
> >  						      ID_AA64DFR0_PMSVER_SHIFT);
> >  }
> >  
> > +void kvm_spe_flush_hwstate(struct kvm_vcpu *vcpu);
> > +inline void kvm_spe_sync_hwstate(struct kvm_vcpu *vcpu);
> > +
> >  int kvm_arm_spe_v1_set_attr(struct kvm_vcpu *vcpu,
> >  			    struct kvm_device_attr *attr);
> >  int kvm_arm_spe_v1_get_attr(struct kvm_vcpu *vcpu,
> > @@ -49,6 +52,9 @@ int kvm_arm_spe_v1_enable(struct kvm_vcpu *vcpu);
> >  #define kvm_arm_support_spe_v1()	(false)
> >  #define kvm_arm_spe_irq_initialized(v)	(false)
> >  
> > +static inline void kvm_spe_flush_hwstate(struct kvm_vcpu *vcpu) {}
> > +static inline void kvm_spe_sync_hwstate(struct kvm_vcpu *vcpu) {}
> > +
> >  static inline int kvm_arm_spe_v1_set_attr(struct kvm_vcpu *vcpu,
> >  					  struct kvm_device_attr *attr)
> >  {
> > diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> > index 340d2388ee2c..a66085c8e785 100644
> > --- a/virt/kvm/arm/arm.c
> > +++ b/virt/kvm/arm/arm.c
> > @@ -741,6 +741,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> >  		preempt_disable();
> >  
> >  		kvm_pmu_flush_hwstate(vcpu);
> > +		kvm_spe_flush_hwstate(vcpu);
> >  
> >  		local_irq_disable();
> >  
> > @@ -782,6 +783,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> >  		    kvm_request_pending(vcpu)) {
> >  			vcpu->mode = OUTSIDE_GUEST_MODE;
> >  			isb(); /* Ensure work in x_flush_hwstate is committed */
> > +			kvm_spe_sync_hwstate(vcpu);
> >  			kvm_pmu_sync_hwstate(vcpu);
> >  			if (static_branch_unlikely(&userspace_irqchip_in_use))
> >  				kvm_timer_sync_hwstate(vcpu);
> > @@ -816,11 +818,12 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> >  		kvm_arm_clear_debug(vcpu);
> >  
> >  		/*
> > -		 * We must sync the PMU state before the vgic state so
> > +		 * We must sync the PMU and SPE state before the vgic state so
> >  		 * that the vgic can properly sample the updated state of the
> >  		 * interrupt line.
> >  		 */
> >  		kvm_pmu_sync_hwstate(vcpu);
> > +		kvm_spe_sync_hwstate(vcpu);
> 
> The *HUGE* difference is that the PMU is purely a virtual interrupt,
> while you're trying to deal with a HW interrupt here.
> 
> >  
> >  		/*
> >  		 * Sync the vgic state before syncing the timer state because
> > diff --git a/virt/kvm/arm/spe.c b/virt/kvm/arm/spe.c
> > index 83ac2cce2cc3..097ed39014e4 100644
> > --- a/virt/kvm/arm/spe.c
> > +++ b/virt/kvm/arm/spe.c
> > @@ -35,6 +35,68 @@ int kvm_arm_spe_v1_enable(struct kvm_vcpu *vcpu)
> >  	return 0;
> >  }
> >  
> > +static inline void set_spe_irq_phys_active(struct arm_spe_kvm_info *info,
> > +					   bool active)
> > +{
> > +	int r;
> > +	r = irq_set_irqchip_state(info->physical_irq, IRQCHIP_STATE_ACTIVE,
> > +				  active);
> > +	WARN_ON(r);
> > +}
> > +
> > +void kvm_spe_flush_hwstate(struct kvm_vcpu *vcpu)
> > +{
> > +	struct kvm_spe *spe = &vcpu->arch.spe;
> > +	bool phys_active = false;
> > +	struct arm_spe_kvm_info *info = arm_spe_get_kvm_info();
> > +
> > +	if (!kvm_arm_spe_v1_ready(vcpu))
> > +		return;
> > +
> > +	if (irqchip_in_kernel(vcpu->kvm))
> > +		phys_active = kvm_vgic_map_is_active(vcpu, spe->irq_num);
> > +
> > +	phys_active |= spe->irq_level;
> > +
> > +	set_spe_irq_phys_active(info, phys_active);
> 
> So you're happy to mess with the HW interrupt state even when you
> don't have a HW irqchip? If you are going to copy paste the timer code
> here, you'd need to support it all the way (no, don't).
> 
> > +}
> > +
> > +void kvm_spe_sync_hwstate(struct kvm_vcpu *vcpu)
> > +{
> > +	struct kvm_spe *spe = &vcpu->arch.spe;
> > +	u64 pmbsr;
> > +	int r;
> > +	bool service;
> > +	struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
> > +	struct arm_spe_kvm_info *info = arm_spe_get_kvm_info();
> > +
> > +	if (!kvm_arm_spe_v1_ready(vcpu))
> > +		return;
> > +
> > +	set_spe_irq_phys_active(info, false);
> > +
> > +	pmbsr = ctxt->sys_regs[PMBSR_EL1];
> > +	service = !!(pmbsr & BIT(SYS_PMBSR_EL1_S_SHIFT));
> > +	if (spe->irq_level == service)
> > +		return;
> > +
> > +	spe->irq_level = service;
> > +
> > +	if (likely(irqchip_in_kernel(vcpu->kvm))) {
> > +		r = kvm_vgic_inject_irq(vcpu->kvm, vcpu->vcpu_id,
> > +					spe->irq_num, service, spe);
> > +		WARN_ON(r);
> > +	}
> > +}
> > +
> > +static inline bool kvm_arch_arm_spe_v1_get_input_level(int vintid)
> > +{
> > +	struct kvm_vcpu *vcpu = kvm_arm_get_running_vcpu();
> > +	struct kvm_spe *spe = &vcpu->arch.spe;
> > +
> > +	return spe->irq_level;
> > +}
> 
> This isn't what such a callback is for. It is supposed to sample the
> HW, and nothing else.
> 
> > +
> >  static int kvm_arm_spe_v1_init(struct kvm_vcpu *vcpu)
> >  {
> >  	if (!kvm_arm_support_spe_v1())
> > @@ -48,6 +110,7 @@ static int kvm_arm_spe_v1_init(struct kvm_vcpu *vcpu)
> >  
> >  	if (irqchip_in_kernel(vcpu->kvm)) {
> >  		int ret;
> > +		struct arm_spe_kvm_info *info;
> >  
> >  		/*
> >  		 * If using the SPE with an in-kernel virtual GIC
> > @@ -57,10 +120,18 @@ static int kvm_arm_spe_v1_init(struct kvm_vcpu *vcpu)
> >  		if (!vgic_initialized(vcpu->kvm))
> >  			return -ENODEV;
> >  
> > +		info = arm_spe_get_kvm_info();
> > +		if (!info->physical_irq)
> > +			return -ENODEV;
> > +
> >  		ret = kvm_vgic_set_owner(vcpu, vcpu->arch.spe.irq_num,
> >  					 &vcpu->arch.spe);
> >  		if (ret)
> >  			return ret;
> > +
> > +		ret = kvm_vgic_map_phys_irq(vcpu, info->physical_irq,
> > +					    vcpu->arch.spe.irq_num,
> > +					    kvm_arch_arm_spe_v1_get_input_level);
> 
> You're mapping the interrupt into the guest, and yet you have never
> forwarded the interrupt in the first place. All this flow is only going
> to wreck the host driver as soon as an interrupt occurs.
> 
> I think you should rethink the interrupt handling altogether. It would
> make more sense if the interrupt was actually completely
> virtualized. If you can isolate the guest state and compute the
> interrupt state in SW (and from the above, it seems that you can),
> then you shouldn't mess with the whole forwarding *at all*, as it
> isn't designed for devices shared between host and guests.

Yes, it's possible to read the PMBSR_EL1 service bit (SYS_PMBSR_EL1_S_SHIFT)
and determine if SPE wants service. If I understand correctly, you're
suggesting that on entry/exit to the guest we determine this and inject an
interrupt to the guest, and that we also remove the kvm_vgic_map_phys_irq
mapping to the physical interrupt?

My understanding was that I needed knowledge of the physical SPE interrupt
number so that I could prevent the host SPE driver from getting spurious
interrupts due to guest use of the SPE. 
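
To check my understanding of the purely software approach, here is a small
sketch of how the virtual line level could be derived from the saved guest
PMBSR_EL1 alone. It assumes the service bit is bit 17 (SYS_PMBSR_EL1_S_SHIFT);
struct spe_irq_state and spe_update_irq_level() are illustrative names only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed position of the service (S) bit in PMBSR_EL1. */
#define PMBSR_EL1_S_SHIFT 17

/* Illustrative stand-in for the irq_level member of struct kvm_spe. */
struct spe_irq_state {
	bool irq_level;
};

/*
 * Recompute the guest's virtual SPE interrupt level from the PMBSR_EL1
 * value saved on guest exit. Returns true when the level changed, i.e.
 * when the new level would need to be injected through the vgic.
 */
static bool spe_update_irq_level(struct spe_irq_state *s, uint64_t saved_pmbsr)
{
	bool service = saved_pmbsr & (UINT64_C(1) << PMBSR_EL1_S_SHIFT);

	if (s->irq_level == service)
		return false;

	s->irq_level = service;
	return true;
}
```

Nothing here touches the physical interrupt, which is the part I'm still
unsure about with respect to the host driver.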

Thanks,

Andrew Murray

> 
> 	M.
> 
> -- 
> Jazz is not dead, it just smells funny.

* Re: [PATCH v2 03/18] arm64: KVM: define SPE data structure for each vcpu
  2019-12-21 13:19   ` Marc Zyngier
@ 2019-12-24 12:01     ` Andrew Murray
  0 siblings, 0 replies; 78+ messages in thread
From: Andrew Murray @ 2019-12-24 12:01 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: will, Catalin Marinas, kvm, linux-kernel, Sudeep Holla, kvmarm,
	linux-arm-kernel

On Sat, Dec 21, 2019 at 01:19:36PM +0000, Marc Zyngier wrote:
> On Fri, 20 Dec 2019 14:30:10 +0000
> Andrew Murray <andrew.murray@arm.com> wrote:
> 
> > From: Sudeep Holla <sudeep.holla@arm.com>
> > 
> > In order to support virtual SPE for guests, define some basic structs.
> > This feature depends on the host having hardware with SPE support.
> > 
> > Since we can support this only on ARM64, add a separate config symbol
> > for the same.
> > 
> > Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
> > [ Add irq_level, rename irq to irq_num for kvm_spe ]
> > Signed-off-by: Andrew Murray <andrew.murray@arm.com>
> > ---
> >  arch/arm64/include/asm/kvm_host.h |  2 ++
> >  arch/arm64/kvm/Kconfig            |  7 +++++++
> >  include/kvm/arm_spe.h             | 19 +++++++++++++++++++
> >  3 files changed, 28 insertions(+)
> >  create mode 100644 include/kvm/arm_spe.h
> > 
> > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > index c61260cf63c5..f5dcff912645 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -35,6 +35,7 @@
> >  #include <kvm/arm_vgic.h>
> >  #include <kvm/arm_arch_timer.h>
> >  #include <kvm/arm_pmu.h>
> > +#include <kvm/arm_spe.h>
> >  
> >  #define KVM_MAX_VCPUS VGIC_V3_MAX_CPUS
> >  
> > @@ -302,6 +303,7 @@ struct kvm_vcpu_arch {
> >  	struct vgic_cpu vgic_cpu;
> >  	struct arch_timer_cpu timer_cpu;
> >  	struct kvm_pmu pmu;
> > +	struct kvm_spe spe;
> >  
> >  	/*
> >  	 * Anything that is not used directly from assembly code goes
> > diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
> > index a475c68cbfec..af5be2c57dcb 100644
> > --- a/arch/arm64/kvm/Kconfig
> > +++ b/arch/arm64/kvm/Kconfig
> > @@ -35,6 +35,7 @@ config KVM
> >  	select HAVE_KVM_EVENTFD
> >  	select HAVE_KVM_IRQFD
> >  	select KVM_ARM_PMU if HW_PERF_EVENTS
> > +	select KVM_ARM_SPE if (HW_PERF_EVENTS && ARM_SPE_PMU)
> >  	select HAVE_KVM_MSI
> >  	select HAVE_KVM_IRQCHIP
> >  	select HAVE_KVM_IRQ_ROUTING
> > @@ -61,6 +62,12 @@ config KVM_ARM_PMU
> >  	  Adds support for a virtual Performance Monitoring Unit (PMU) in
> >  	  virtual machines.
> >  
> > +config KVM_ARM_SPE
> > +	bool
> > +	---help---
> > +	  Adds support for a virtual Statistical Profiling Extension(SPE) in
> > +	  virtual machines.
> > +
> >  config KVM_INDIRECT_VECTORS
> >         def_bool KVM && (HARDEN_BRANCH_PREDICTOR || HARDEN_EL2_VECTORS)
> >  
> > diff --git a/include/kvm/arm_spe.h b/include/kvm/arm_spe.h
> > new file mode 100644
> > index 000000000000..48d118fdb174
> > --- /dev/null
> > +++ b/include/kvm/arm_spe.h
> > @@ -0,0 +1,19 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * Copyright (C) 2019 ARM Ltd.
> > + */
> > +
> > +#ifndef __ASM_ARM_KVM_SPE_H
> > +#define __ASM_ARM_KVM_SPE_H
> > +
> > +#include <uapi/linux/kvm.h>
> > +#include <linux/kvm_host.h>
> 
> I don't believe these are required at this stage.
> 
> > +
> > +struct kvm_spe {
> > +	int irq_num;
> 
> 'irq' was the right name *if* this represents a Linux irq. If this
> instead represents a guest PPI, then it should be named 'intid'.
> 
> In either case, please document what this represents.
> 
> > +	bool ready; /* indicates that SPE KVM instance is ready for use */
> > +	bool created; /* SPE KVM instance is created, may not be ready yet */
> > +	bool irq_level;
> 
> What does this represent? The state of the interrupt on the host? The
> guest? Something else? Also, please consider grouping related fields
> together.

It should be the state of the interrupt on the guest.

> 
> > +};
> 
> If you've added a config option that controls the selection of the SPE
> feature, why doesn't this result in an empty structure when it isn't
> selected?

OK, all noted.

Andrew Murray

> 
> > +
> > +#endif /* __ASM_ARM_KVM_SPE_H */
> 
> Thanks,
> 
> 	M.
> -- 
> Jazz is not dead. It just smells funny...

* Re: [PATCH v2 09/18] arm64: KVM: enable conditional save/restore full SPE profiling buffer controls
  2019-12-20 18:06   ` Mark Rutland
@ 2019-12-24 12:15     ` Andrew Murray
  0 siblings, 0 replies; 78+ messages in thread
From: Andrew Murray @ 2019-12-24 12:15 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Marc Zyngier, Catalin Marinas, Will Deacon, Sudeep Holla, kvmarm,
	linux-arm-kernel, kvm, linux-kernel

On Fri, Dec 20, 2019 at 06:06:58PM +0000, Mark Rutland wrote:
> On Fri, Dec 20, 2019 at 02:30:16PM +0000, Andrew Murray wrote:
> > From: Sudeep Holla <sudeep.holla@arm.com>
> > 
> > Now that we can save/restore the full SPE controls, we can enable it
> > if SPE is set up and ready to use in KVM. It's supported in KVM only if
> > all the CPUs in the system support SPE.
> > 
> > However to support heterogeneous systems, we need to move the check if
> > host supports SPE and do a partial save/restore.
> 
> I don't think that it makes sense to support this for heterogeneous
> systems, given their SPE capabilities and IMP DEF details will differ.
> 
> Is there some way we can limit this to homogeneous systems?

No problem, I'll see how to limit this.

Thanks,

Andrew Murray

> 
> Thanks,
> Mark.
> 
> > 
> > Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
> > Signed-off-by: Andrew Murray <andrew.murray@arm.com>
> > ---
> >  arch/arm64/kvm/hyp/debug-sr.c | 33 ++++++++++++++++-----------------
> >  include/kvm/arm_spe.h         |  6 ++++++
> >  2 files changed, 22 insertions(+), 17 deletions(-)
> > 
> > diff --git a/arch/arm64/kvm/hyp/debug-sr.c b/arch/arm64/kvm/hyp/debug-sr.c
> > index 12429b212a3a..d8d857067e6d 100644
> > --- a/arch/arm64/kvm/hyp/debug-sr.c
> > +++ b/arch/arm64/kvm/hyp/debug-sr.c
> > @@ -86,18 +86,13 @@
> >  	}
> >  
> >  static void __hyp_text
> > -__debug_save_spe_nvhe(struct kvm_cpu_context *ctxt, bool full_ctxt)
> > +__debug_save_spe_context(struct kvm_cpu_context *ctxt, bool full_ctxt)
> >  {
> >  	u64 reg;
> >  
> >  	/* Clear pmscr in case of early return */
> >  	ctxt->sys_regs[PMSCR_EL1] = 0;
> >  
> > -	/* SPE present on this CPU? */
> > -	if (!cpuid_feature_extract_unsigned_field(read_sysreg(id_aa64dfr0_el1),
> > -						  ID_AA64DFR0_PMSVER_SHIFT))
> > -		return;
> > -
> >  	/* Yes; is it owned by higher EL? */
> >  	reg = read_sysreg_s(SYS_PMBIDR_EL1);
> >  	if (reg & BIT(SYS_PMBIDR_EL1_P_SHIFT))
> > @@ -142,7 +137,7 @@ __debug_save_spe_nvhe(struct kvm_cpu_context *ctxt, bool full_ctxt)
> >  }
> >  
> >  static void __hyp_text
> > -__debug_restore_spe_nvhe(struct kvm_cpu_context *ctxt, bool full_ctxt)
> > +__debug_restore_spe_context(struct kvm_cpu_context *ctxt, bool full_ctxt)
> >  {
> >  	if (!ctxt->sys_regs[PMSCR_EL1])
> >  		return;
> > @@ -210,11 +205,14 @@ void __hyp_text __debug_restore_guest_context(struct kvm_vcpu *vcpu)
> >  	struct kvm_guest_debug_arch *host_dbg;
> >  	struct kvm_guest_debug_arch *guest_dbg;
> >  
> > +	host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
> > +	guest_ctxt = &vcpu->arch.ctxt;
> > +
> > +	__debug_restore_spe_context(guest_ctxt, kvm_arm_spe_v1_ready(vcpu));
> > +
> >  	if (!(vcpu->arch.flags & KVM_ARM64_DEBUG_DIRTY))
> >  		return;
> >  
> > -	host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
> > -	guest_ctxt = &vcpu->arch.ctxt;
> >  	host_dbg = &vcpu->arch.host_debug_state.regs;
> >  	guest_dbg = kern_hyp_va(vcpu->arch.debug_ptr);
> >  
> > @@ -232,8 +230,7 @@ void __hyp_text __debug_restore_host_context(struct kvm_vcpu *vcpu)
> >  	host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
> >  	guest_ctxt = &vcpu->arch.ctxt;
> >  
> > -	if (!has_vhe())
> > -		__debug_restore_spe_nvhe(host_ctxt, false);
> > +	__debug_restore_spe_context(host_ctxt, kvm_arm_spe_v1_ready(vcpu));
> >  
> >  	if (!(vcpu->arch.flags & KVM_ARM64_DEBUG_DIRTY))
> >  		return;
> > @@ -249,19 +246,21 @@ void __hyp_text __debug_restore_host_context(struct kvm_vcpu *vcpu)
> >  
> >  void __hyp_text __debug_save_host_context(struct kvm_vcpu *vcpu)
> >  {
> > -	/*
> > -	 * Non-VHE: Disable and flush SPE data generation
> > -	 * VHE: The vcpu can run, but it can't hide.
> > -	 */
> >  	struct kvm_cpu_context *host_ctxt;
> >  
> >  	host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
> > -	if (!has_vhe())
> > -		__debug_save_spe_nvhe(host_ctxt, false);
> > +	if (cpuid_feature_extract_unsigned_field(read_sysreg(id_aa64dfr0_el1),
> > +						 ID_AA64DFR0_PMSVER_SHIFT))
> > +		__debug_save_spe_context(host_ctxt, kvm_arm_spe_v1_ready(vcpu));
> >  }
> >  
> >  void __hyp_text __debug_save_guest_context(struct kvm_vcpu *vcpu)
> >  {
> > +	bool kvm_spe_ready = kvm_arm_spe_v1_ready(vcpu);
> > +
> > +	/* SPE present on this vCPU? */
> > +	if (kvm_spe_ready)
> > +		__debug_save_spe_context(&vcpu->arch.ctxt, kvm_spe_ready);
> >  }
> >  
> >  u32 __hyp_text __kvm_get_mdcr_el2(void)
> > diff --git a/include/kvm/arm_spe.h b/include/kvm/arm_spe.h
> > index 48d118fdb174..30c40b1bc385 100644
> > --- a/include/kvm/arm_spe.h
> > +++ b/include/kvm/arm_spe.h
> > @@ -16,4 +16,10 @@ struct kvm_spe {
> >  	bool irq_level;
> >  };
> >  
> > +#ifdef CONFIG_KVM_ARM_SPE
> > +#define kvm_arm_spe_v1_ready(v)		((v)->arch.spe.ready)
> > +#else
> > +#define kvm_arm_spe_v1_ready(v)		(false)
> > +#endif /* CONFIG_KVM_ARM_SPE */
> > +
> >  #endif /* __ASM_ARM_KVM_SPE_H */
> > -- 
> > 2.21.0
> > 


* Re: [PATCH v2 12/18] KVM: arm64: add a new vcpu device control group for SPEv1
  2019-12-22 11:03   ` Marc Zyngier
@ 2019-12-24 12:30     ` Andrew Murray
  0 siblings, 0 replies; 78+ messages in thread
From: Andrew Murray @ 2019-12-24 12:30 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Catalin Marinas, Will Deacon, kvm, linux-kernel, Sudeep Holla,
	kvmarm, linux-arm-kernel

On Sun, Dec 22, 2019 at 11:03:04AM +0000, Marc Zyngier wrote:
> On Fri, 20 Dec 2019 14:30:19 +0000,
> Andrew Murray <andrew.murray@arm.com> wrote:
> > 
> > From: Sudeep Holla <sudeep.holla@arm.com>
> > 
> > To configure the virtual SPEv1 overflow interrupt number, we use the
> > vcpu kvm_device ioctl, encapsulating the KVM_ARM_VCPU_SPE_V1_IRQ
> > attribute within the KVM_ARM_VCPU_SPE_V1_CTRL group.
> > 
> > After configuring the SPEv1, call the vcpu ioctl with attribute
> > KVM_ARM_VCPU_SPE_V1_INIT to initialize the SPEv1.
> > 
> > Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
> > Signed-off-by: Andrew Murray <andrew.murray@arm.com>
> > ---
> >  Documentation/virt/kvm/devices/vcpu.txt |  28 ++++
> >  arch/arm64/include/asm/kvm_host.h       |   2 +-
> >  arch/arm64/include/uapi/asm/kvm.h       |   4 +
> >  arch/arm64/kvm/Makefile                 |   1 +
> >  arch/arm64/kvm/guest.c                  |   6 +
> >  arch/arm64/kvm/reset.c                  |   3 +
> >  include/kvm/arm_spe.h                   |  45 +++++++
> >  include/uapi/linux/kvm.h                |   1 +
> >  virt/kvm/arm/arm.c                      |   1 +
> >  virt/kvm/arm/spe.c                      | 163 ++++++++++++++++++++++++
> >  10 files changed, 253 insertions(+), 1 deletion(-)
> >  create mode 100644 virt/kvm/arm/spe.c
> > 
> > diff --git a/Documentation/virt/kvm/devices/vcpu.txt b/Documentation/virt/kvm/devices/vcpu.txt
> > index 6f3bd64a05b0..cefad056d677 100644
> > --- a/Documentation/virt/kvm/devices/vcpu.txt
> > +++ b/Documentation/virt/kvm/devices/vcpu.txt
> > @@ -74,3 +74,31 @@ Specifies the base address of the stolen time structure for this VCPU. The
> >  base address must be 64 byte aligned and exist within a valid guest memory
> >  region. See Documentation/virt/kvm/arm/pvtime.txt for more information
> >  including the layout of the stolen time structure.
> > +
> > +4. GROUP: KVM_ARM_VCPU_SPE_V1_CTRL
> > +Architectures: ARM64
> > +
> > +4.1. ATTRIBUTE: KVM_ARM_VCPU_SPE_V1_IRQ
> > +Parameters: in kvm_device_attr.addr the address for SPE buffer overflow interrupt
> > +	    is a pointer to an int
> > +Returns: -EBUSY: The SPE overflow interrupt is already set
> > +         -ENXIO: The overflow interrupt not set when attempting to get it
> > +         -ENODEV: SPEv1 not supported
> > +         -EINVAL: Invalid SPE overflow interrupt number supplied or
> > +                  trying to set the IRQ number without using an in-kernel
> > +                  irqchip.
> > +
> > +A value describing the SPEv1 (Statistical Profiling Extension v1) overflow
> > +interrupt number for this vcpu. This interrupt should be PPI and the interrupt
> > +type and number must be same for each vcpu.
> > +
> > +4.2 ATTRIBUTE: KVM_ARM_VCPU_SPE_V1_INIT
> > +Parameters: no additional parameter in kvm_device_attr.addr
> > +Returns: -ENODEV: SPEv1 not supported or GIC not initialized
> > +         -ENXIO: SPEv1 not properly configured or in-kernel irqchip not
> > +                 configured as required prior to calling this attribute
> > +         -EBUSY: SPEv1 already initialized
> > +
> > +Request the initialization of the SPEv1.  If using the SPEv1 with an in-kernel
> > +virtual GIC implementation, this must be done after initializing the in-kernel
> > +irqchip.
> > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > index 333c6491bec7..d00f450dc4cd 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -39,7 +39,7 @@
> >  
> >  #define KVM_MAX_VCPUS VGIC_V3_MAX_CPUS
> >  
> > -#define KVM_VCPU_MAX_FEATURES 7
> > +#define KVM_VCPU_MAX_FEATURES 8
> >  
> >  #define KVM_REQ_SLEEP \
> >  	KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
> > diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
> > index 820e5751ada7..905a73f30079 100644
> > --- a/arch/arm64/include/uapi/asm/kvm.h
> > +++ b/arch/arm64/include/uapi/asm/kvm.h
> > @@ -106,6 +106,7 @@ struct kvm_regs {
> >  #define KVM_ARM_VCPU_SVE		4 /* enable SVE for this CPU */
> >  #define KVM_ARM_VCPU_PTRAUTH_ADDRESS	5 /* VCPU uses address authentication */
> >  #define KVM_ARM_VCPU_PTRAUTH_GENERIC	6 /* VCPU uses generic authentication */
> > +#define KVM_ARM_VCPU_SPE_V1		7 /* Support guest SPEv1 */
> >  
> >  struct kvm_vcpu_init {
> >  	__u32 target;
> > @@ -326,6 +327,9 @@ struct kvm_vcpu_events {
> >  #define   KVM_ARM_VCPU_TIMER_IRQ_PTIMER		1
> >  #define KVM_ARM_VCPU_PVTIME_CTRL	2
> >  #define   KVM_ARM_VCPU_PVTIME_IPA	0
> > +#define KVM_ARM_VCPU_SPE_V1_CTRL	3
> > +#define   KVM_ARM_VCPU_SPE_V1_IRQ	0
> > +#define   KVM_ARM_VCPU_SPE_V1_INIT	1
> >  
> >  /* KVM_IRQ_LINE irq field index values */
> >  #define KVM_ARM_IRQ_VCPU2_SHIFT		28
> > diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
> > index 5ffbdc39e780..526f3bf09321 100644
> > --- a/arch/arm64/kvm/Makefile
> > +++ b/arch/arm64/kvm/Makefile
> > @@ -37,3 +37,4 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic/vgic-debug.o
> >  kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/irqchip.o
> >  kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/arch_timer.o
> >  kvm-$(CONFIG_KVM_ARM_PMU) += $(KVM)/arm/pmu.o
> > +kvm-$(CONFIG_KVM_ARM_SPE) += $(KVM)/arm/spe.o
> > diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
> > index 2fff06114a8f..50fea538b8bd 100644
> > --- a/arch/arm64/kvm/guest.c
> > +++ b/arch/arm64/kvm/guest.c
> > @@ -874,6 +874,8 @@ int kvm_arm_vcpu_arch_set_attr(struct kvm_vcpu *vcpu,
> >  		break;
> >  	case KVM_ARM_VCPU_PVTIME_CTRL:
> >  		ret = kvm_arm_pvtime_set_attr(vcpu, attr);
> > +	case KVM_ARM_VCPU_SPE_V1_CTRL:
> > +		ret = kvm_arm_spe_v1_set_attr(vcpu, attr);
> >  		break;
> >  	default:
> >  		ret = -ENXIO;
> > @@ -897,6 +899,8 @@ int kvm_arm_vcpu_arch_get_attr(struct kvm_vcpu *vcpu,
> >  		break;
> >  	case KVM_ARM_VCPU_PVTIME_CTRL:
> >  		ret = kvm_arm_pvtime_get_attr(vcpu, attr);
> > +	case KVM_ARM_VCPU_SPE_V1_CTRL:
> > +		ret = kvm_arm_spe_v1_get_attr(vcpu, attr);
> >  		break;
> >  	default:
> >  		ret = -ENXIO;
> > @@ -920,6 +924,8 @@ int kvm_arm_vcpu_arch_has_attr(struct kvm_vcpu *vcpu,
> >  		break;
> >  	case KVM_ARM_VCPU_PVTIME_CTRL:
> >  		ret = kvm_arm_pvtime_has_attr(vcpu, attr);
> > +	case KVM_ARM_VCPU_SPE_V1_CTRL:
> > +		ret = kvm_arm_spe_v1_has_attr(vcpu, attr);
> >  		break;
> >  	default:
> >  		ret = -ENXIO;
> > diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
> > index f4a8ae918827..cf17aff1489d 100644
> > --- a/arch/arm64/kvm/reset.c
> > +++ b/arch/arm64/kvm/reset.c
> > @@ -80,6 +80,9 @@ int kvm_arch_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> >  	case KVM_CAP_ARM_INJECT_SERROR_ESR:
> >  		r = cpus_have_const_cap(ARM64_HAS_RAS_EXTN);
> >  		break;
> > +	case KVM_CAP_ARM_SPE_V1:
> > +		r = kvm_arm_support_spe_v1();
> > +		break;
> >  	case KVM_CAP_SET_GUEST_DEBUG:
> >  	case KVM_CAP_VCPU_ATTRIBUTES:
> >  		r = 1;
> > diff --git a/include/kvm/arm_spe.h b/include/kvm/arm_spe.h
> > index 30c40b1bc385..d1f3c564dfd0 100644
> > --- a/include/kvm/arm_spe.h
> > +++ b/include/kvm/arm_spe.h
> > @@ -8,6 +8,7 @@
> >  
> >  #include <uapi/linux/kvm.h>
> >  #include <linux/kvm_host.h>
> > +#include <linux/cpufeature.h>
> >  
> >  struct kvm_spe {
> >  	int irq_num;
> > @@ -18,8 +19,52 @@ struct kvm_spe {
> >  
> >  #ifdef CONFIG_KVM_ARM_SPE
> >  #define kvm_arm_spe_v1_ready(v)		((v)->arch.spe.ready)
> > +#define kvm_arm_spe_irq_initialized(v)		\
> > +	((v)->arch.spe.irq_num >= VGIC_NR_SGIS &&	\
> > +	(v)->arch.spe.irq_num <= VGIC_MAX_PRIVATE)
> 
> This is buggy, as it accepts 32 as a valid interrupt (which obviously
> isn't a PPI). Having fixed it, this is a duplicate of irq_is_ppi().

I'll replace that line with irq_is_ppi.
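
For reference, a standalone sketch of the range check that a PPI test
implies. It relies only on the GIC INTID layout (SGIs 0-15, PPIs 16-31, SPIs
from 32). The macro and function names are local to this example and are not
the kernel's definitions.

```c
#include <stdbool.h>

/* GIC INTID layout: SGIs are 0-15, PPIs are 16-31, SPIs start at 32. */
#define NR_SGIS		16
#define NR_PPIS		16
#define NR_PRIVATE_IRQS	(NR_SGIS + NR_PPIS)

/* Half-open range [16, 32): INTID 32 is the first SPI and is rejected. */
static bool intid_is_ppi(int intid)
{
	return intid >= NR_SGIS && intid < NR_PRIVATE_IRQS;
}
```

Writing the upper bound as a strict comparison against the number of private
interrupts keeps 32 out without needing a separate "max" constant.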

> 
> And that's where we can finally confirm that 'irq_num' is a GIC
> INTID. Please name it as such.

OK.


> 
> > +
> > +static inline bool kvm_arm_support_spe_v1(void)
> > +{
> > +	u64 dfr0 = read_sanitised_ftr_reg(SYS_ID_AA64DFR0_EL1);
> > +
> > +	return !!cpuid_feature_extract_unsigned_field(dfr0,
> > +						      ID_AA64DFR0_PMSVER_SHIFT);
> > +}
> > +
> > +int kvm_arm_spe_v1_set_attr(struct kvm_vcpu *vcpu,
> > +			    struct kvm_device_attr *attr);
> > +int kvm_arm_spe_v1_get_attr(struct kvm_vcpu *vcpu,
> > +			    struct kvm_device_attr *attr);
> > +int kvm_arm_spe_v1_has_attr(struct kvm_vcpu *vcpu,
> > +			    struct kvm_device_attr *attr);
> > +int kvm_arm_spe_v1_enable(struct kvm_vcpu *vcpu);
> >  #else
> >  #define kvm_arm_spe_v1_ready(v)		(false)
> > +#define kvm_arm_support_spe_v1()	(false)
> > +#define kvm_arm_spe_irq_initialized(v)	(false)
> > +
> > +static inline int kvm_arm_spe_v1_set_attr(struct kvm_vcpu *vcpu,
> > +					  struct kvm_device_attr *attr)
> > +{
> > +	return -ENXIO;
> > +}
> > +
> > +static inline int kvm_arm_spe_v1_get_attr(struct kvm_vcpu *vcpu,
> > +					  struct kvm_device_attr *attr)
> > +{
> > +	return -ENXIO;
> > +}
> > +
> > +static inline int kvm_arm_spe_v1_has_attr(struct kvm_vcpu *vcpu,
> > +					  struct kvm_device_attr *attr)
> > +{
> > +	return -ENXIO;
> > +}
> > +
> > +static inline int kvm_arm_spe_v1_enable(struct kvm_vcpu *vcpu)
> > +{
> > +	return 0;
> > +}
> >  #endif /* CONFIG_KVM_ARM_SPE */
> >  
> >  #endif /* __ASM_ARM_KVM_SPE_H */
> > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> > index f0a16b4adbbd..1a362c230e4a 100644
> > --- a/include/uapi/linux/kvm.h
> > +++ b/include/uapi/linux/kvm.h
> > @@ -1009,6 +1009,7 @@ struct kvm_ppc_resize_hpt {
> >  #define KVM_CAP_PPC_GUEST_DEBUG_SSTEP 176
> >  #define KVM_CAP_ARM_NISV_TO_USER 177
> >  #define KVM_CAP_ARM_INJECT_EXT_DABT 178
> > +#define KVM_CAP_ARM_SPE_V1 179
> >  
> >  #ifdef KVM_CAP_IRQ_ROUTING
> >  
> > diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> > index 12e0280291ce..340d2388ee2c 100644
> > --- a/virt/kvm/arm/arm.c
> > +++ b/virt/kvm/arm/arm.c
> > @@ -22,6 +22,7 @@
> >  #include <trace/events/kvm.h>
> >  #include <kvm/arm_pmu.h>
> >  #include <kvm/arm_psci.h>
> > +#include <kvm/arm_spe.h>
> >  
> >  #define CREATE_TRACE_POINTS
> >  #include "trace.h"
> > diff --git a/virt/kvm/arm/spe.c b/virt/kvm/arm/spe.c
> > new file mode 100644
> > index 000000000000..83ac2cce2cc3
> > --- /dev/null
> > +++ b/virt/kvm/arm/spe.c
> > @@ -0,0 +1,163 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * Copyright (C) 2019 ARM Ltd.
> > + */
> > +
> > +#include <linux/cpu.h>
> > +#include <linux/kvm.h>
> > +#include <linux/kvm_host.h>
> > +#include <linux/uaccess.h>
> > +#include <asm/kvm_emulate.h>
> > +#include <kvm/arm_spe.h>
> > +#include <kvm/arm_vgic.h>
> > +
> > +int kvm_arm_spe_v1_enable(struct kvm_vcpu *vcpu)
> > +{
> > +	if (!vcpu->arch.spe.created)
> > +		return 0;
> 
> Shouldn't it be an error to enable something that doesn't exist?

It looks like this has adopted the same approach as kvm_arm_pmu_v3_enable. In
kvm_vcpu_first_run_init we attempt to enable pmu v3 and spe - without first
checking they exist.


> 
> > +
> > +	/*
> > +	 * A valid interrupt configuration for the SPE is either to have a
> 
> either?
> 
> > +	 * properly configured interrupt number and using an in-kernel irqchip.
> > +	 */
> > +	if (irqchip_in_kernel(vcpu->kvm)) {
> > +		int irq = vcpu->arch.spe.irq_num;
> > +
> > +		if (!kvm_arm_spe_irq_initialized(vcpu))
> > +			return -EINVAL;
> > +
> > +		if (!irq_is_ppi(irq))
> > +			return -EINVAL;
> > +	}
> > +
> > +	vcpu->arch.spe.ready = true;
> 
> And SPE is then ready even when we don't have an in-kernel irqchip?

I recall in Sudeep's previous patchset that you suggested we don't support
SPE without an in-kernel irqchip. I'll update to reflect that feedback.
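
Something along these lines is what I have in mind, shown as a simplified
userspace sketch rather than the real code. The struct, its field names and
the choice of -EINVAL are assumptions for illustration only.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the per-vcpu SPE state. */
struct spe_state {
	bool created;	/* set by the INIT attribute */
	bool ready;	/* set once the vcpu may use SPE */
	int intid;	/* overflow interrupt, expected to be a PPI */
};

static int spe_enable(struct spe_state *spe, bool irqchip_in_kernel)
{
	/* Nothing to enable if userspace never created the device. */
	if (!spe->created)
		return 0;

	/* No in-kernel irqchip: refuse rather than silently become ready. */
	if (!irqchip_in_kernel)
		return -EINVAL;

	/* PPIs are INTIDs 16-31. */
	if (spe->intid < 16 || spe->intid >= 32)
		return -EINVAL;

	spe->ready = true;
	return 0;
}
```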


> 
> > +
> > +	return 0;
> > +}
> > +
> > +static int kvm_arm_spe_v1_init(struct kvm_vcpu *vcpu)
> > +{
> > +	if (!kvm_arm_support_spe_v1())
> > +		return -ENODEV;
> > +
> > +	if (!test_bit(KVM_ARM_VCPU_SPE_V1, vcpu->arch.features))
> > +		return -ENXIO;
> > +
> > +	if (vcpu->arch.spe.created)
> > +		return -EBUSY;
> > +
> > +	if (irqchip_in_kernel(vcpu->kvm)) {
> > +		int ret;
> > +
> > +		/*
> > +		 * If using the SPE with an in-kernel virtual GIC
> > +		 * implementation, we require the GIC to be already
> > +		 * initialized when initializing the SPE.
> > +		 */
> > +		if (!vgic_initialized(vcpu->kvm))
> > +			return -ENODEV;
> > +
> > +		ret = kvm_vgic_set_owner(vcpu, vcpu->arch.spe.irq_num,
> > +					 &vcpu->arch.spe);
> > +		if (ret)
> > +			return ret;
> > +	}
> > +
> > +	vcpu->arch.spe.created = true;
> 
> Same problem.
> 
> > +	return 0;
> > +}
> > +
> > +/*
> > + * For one VM the interrupt type must be same for each vcpu.
> > + * As a PPI, the interrupt number is the same for all vcpus,
> > + * while as an SPI it must be a separate number per vcpu.
> 
> Why do you want to support SPIs at all? And it isn't what
> kvm_arm_spe_irq_initialized claims to be doing.

I don't think we do. I think it's expected that the SPE interrupt is a PPI,
at least that's what the SPE driver expects. I'll simplify this.


> 
> > + */
> > +static bool spe_irq_is_valid(struct kvm *kvm, int irq)
> > +{
> > +	int i;
> > +	struct kvm_vcpu *vcpu;
> > +
> > +	kvm_for_each_vcpu(i, vcpu, kvm) {
> > +		if (!kvm_arm_spe_irq_initialized(vcpu))
> > +			continue;
> > +
> > +		if (vcpu->arch.spe.irq_num != irq)
> > +			return false;
> > +	}
> > +
> > +	return true;
> > +}
> > +
> > +int kvm_arm_spe_v1_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
> > +{
> > +	switch (attr->attr) {
> > +	case KVM_ARM_VCPU_SPE_V1_IRQ: {
> > +		int __user *uaddr = (int __user *)(long)attr->addr;
> > +		int irq;
> > +
> > +		if (!irqchip_in_kernel(vcpu->kvm))
> > +			return -EINVAL;
> 
> Here, you forbid setting the IRQ for a VM that doesn't have an
> in-kernel irqchip. And yet below, you're happy to initialise it.
> 
> > +
> > +		if (!test_bit(KVM_ARM_VCPU_SPE_V1, vcpu->arch.features))
> > +			return -ENODEV;
> > +
> > +		if (get_user(irq, uaddr))
> > +			return -EFAULT;
> > +
> > +		/* The SPE overflow interrupt can be a PPI only */
> > +		if (!(irq_is_ppi(irq)))
> > +			return -EINVAL;
> 
> Ah, so you do know about irq_is_ppi()...
> 
> > +
> > +		if (!spe_irq_is_valid(vcpu->kvm, irq))
> 
> But why don't you fold the interrupt validity checks in this helper?

That makes sense.
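
A possible shape for the folded helper, as a standalone sketch. The array of
per-vcpu INTIDs and the "0 means unset" convention are assumptions made for
this example, not what the series uses.

```c
#include <stdbool.h>
#include <stddef.h>

#define INTID_UNSET 0	/* assumption: 0 means "not configured yet" */

/*
 * Hypothetical helper: 'configured' lists the INTID already chosen by each
 * vcpu of the VM (INTID_UNSET if none). A new INTID is valid only if it is
 * a PPI (16-31) and matches every vcpu that has already been configured.
 */
static bool spe_intid_is_valid(const int *configured, size_t nvcpus, int intid)
{
	size_t i;

	if (intid < 16 || intid >= 32)
		return false;

	for (i = 0; i < nvcpus; i++) {
		if (configured[i] == INTID_UNSET)
			continue;
		if (configured[i] != intid)
			return false;
	}

	return true;
}
```

With both checks in one place, the set_attr path only needs a single call
before testing for -EBUSY.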


> 
> > +			return -EINVAL;
> > +
> > +		if (kvm_arm_spe_irq_initialized(vcpu))
> > +			return -EBUSY;
> > +
> > +		kvm_debug("Set kvm ARM SPE irq: %d\n", irq);
> > +		vcpu->arch.spe.irq_num = irq;
> > +		return 0;
> > +	}
> > +	case KVM_ARM_VCPU_SPE_V1_INIT:
> > +		return kvm_arm_spe_v1_init(vcpu);
> > +	}
> > +
> > +	return -ENXIO;
> > +}
> > +
> > +int kvm_arm_spe_v1_get_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
> > +{
> > +	switch (attr->attr) {
> > +	case KVM_ARM_VCPU_SPE_V1_IRQ: {
> > +		int __user *uaddr = (int __user *)(long)attr->addr;
> > +		int irq;
> > +
> > +		if (!irqchip_in_kernel(vcpu->kvm))
> > +			return -EINVAL;
> > +
> > +		if (!test_bit(KVM_ARM_VCPU_SPE_V1, vcpu->arch.features))
> > +			return -ENODEV;
> > +
> > +		if (!kvm_arm_spe_irq_initialized(vcpu))
> > +			return -ENXIO;
> > +
> > +		irq = vcpu->arch.spe.irq_num;
> > +		return put_user(irq, uaddr);
> > +	}
> > +	}
> > +
> > +	return -ENXIO;
> > +}
> > +
> > +int kvm_arm_spe_v1_has_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
> > +{
> > +	switch (attr->attr) {
> > +	case KVM_ARM_VCPU_SPE_V1_IRQ:
> > +	case KVM_ARM_VCPU_SPE_V1_INIT:
> > +		if (kvm_arm_support_spe_v1() &&
> > +		    test_bit(KVM_ARM_VCPU_SPE_V1, vcpu->arch.features))
> > +			return 0;
> 
> It is interesting that all the user interface is designed with the
> feature being optional in mind, and yet you've removed all the
> necessary handling code.

Yeah I broke that, I'll fix it.

Thanks,

Andrew Murray


> 
> 	M.
> 
> -- 
> Jazz is not dead, it just smells funny.


* Re: [PATCH v2 13/18] perf: arm_spe: Add KVM structure for obtaining IRQ info
  2019-12-22 11:24   ` Marc Zyngier
@ 2019-12-24 12:35     ` Andrew Murray
  0 siblings, 0 replies; 78+ messages in thread
From: Andrew Murray @ 2019-12-24 12:35 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Catalin Marinas, Will Deacon, kvm, linux-kernel, Sudeep Holla,
	kvmarm, linux-arm-kernel

On Sun, Dec 22, 2019 at 11:24:13AM +0000, Marc Zyngier wrote:
> On Fri, 20 Dec 2019 14:30:20 +0000,
> Andrew Murray <andrew.murray@arm.com> wrote:
> > 
> > KVM requires knowledge of the physical SPE IRQ number such that it can
> > associate it with any virtual IRQ for guests that require SPE emulation.
> 
> This is at best extremely odd. The only reason for KVM to obtain this
> IRQ number is if it has exclusive access to the device.  This
> obviously isn't the case, as this device is shared between host and
> guest.

This was an attempt to set the interrupt as active such that the host SPE
driver doesn't get spurious interrupts due to guest SPE activity. Though
let's save the discussion for patch 14.


> 
> > Let's create a structure to hold this information and an accessor that
> > KVM can use to retrieve this information.
> > 
> > We expect that each SPE device will have the same physical PPI number
> > and thus will warn when this is not the case.
> > 
> > Signed-off-by: Andrew Murray <andrew.murray@arm.com>
> > ---
> >  drivers/perf/arm_spe_pmu.c | 23 +++++++++++++++++++++++
> >  include/kvm/arm_spe.h      |  6 ++++++
> >  2 files changed, 29 insertions(+)
> > 
> > diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c
> > index 4e4984a55cd1..2d24af4cfcab 100644
> > --- a/drivers/perf/arm_spe_pmu.c
> > +++ b/drivers/perf/arm_spe_pmu.c
> > @@ -34,6 +34,9 @@
> >  #include <linux/smp.h>
> >  #include <linux/vmalloc.h>
> >  
> > +#include <linux/kvm_host.h>
> > +#include <kvm/arm_spe.h>
> > +
> >  #include <asm/barrier.h>
> >  #include <asm/cpufeature.h>
> >  #include <asm/mmu.h>
> > @@ -1127,6 +1130,24 @@ static void arm_spe_pmu_dev_teardown(struct arm_spe_pmu *spe_pmu)
> >  	free_percpu_irq(spe_pmu->irq, spe_pmu->handle);
> >  }
> >  
> > +#ifdef CONFIG_KVM_ARM_SPE
> > +static struct arm_spe_kvm_info arm_spe_kvm_info;
> > +
> > +struct arm_spe_kvm_info *arm_spe_get_kvm_info(void)
> > +{
> > +	return &arm_spe_kvm_info;
> > +}
> 
> How does this work when SPE is built as a module?
> 
> > +
> > +static void arm_spe_populate_kvm_info(struct arm_spe_pmu *spe_pmu)
> > +{
> > +	WARN_ON_ONCE(arm_spe_kvm_info.physical_irq != 0 &&
> > +		     arm_spe_kvm_info.physical_irq != spe_pmu->irq);
> > +	arm_spe_kvm_info.physical_irq = spe_pmu->irq;
> 
> > What does 'physical' mean here? It's an IRQ in the Linux sense, so
> it's already some random number that bears no relation to anything
> 'physical'.

It's some random number relating to the SPE device as opposed to the virtual
SPE device.

Thanks,

Andrew Murray

> 
> > +}
> > +#else
> > +static void arm_spe_populate_kvm_info(struct arm_spe_pmu *spe_pmu) {}
> > +#endif
> > +
> >  /* Driver and device probing */
> >  static int arm_spe_pmu_irq_probe(struct arm_spe_pmu *spe_pmu)
> >  {
> > @@ -1149,6 +1170,8 @@ static int arm_spe_pmu_irq_probe(struct arm_spe_pmu *spe_pmu)
> >  	}
> >  
> >  	spe_pmu->irq = irq;
> > +	arm_spe_populate_kvm_info(spe_pmu);
> > +
> >  	return 0;
> >  }
> >  
> > diff --git a/include/kvm/arm_spe.h b/include/kvm/arm_spe.h
> > index d1f3c564dfd0..9c65130d726d 100644
> > --- a/include/kvm/arm_spe.h
> > +++ b/include/kvm/arm_spe.h
> > @@ -17,6 +17,12 @@ struct kvm_spe {
> >  	bool irq_level;
> >  };
> >  
> > +struct arm_spe_kvm_info {
> > +	int physical_irq;
> > +};
> > +
> > +struct arm_spe_kvm_info *arm_spe_get_kvm_info(void);
> > +
> >  #ifdef CONFIG_KVM_ARM_SPE
> >  #define kvm_arm_spe_v1_ready(v)		((v)->arch.spe.ready)
> >  #define kvm_arm_spe_irq_initialized(v)		\
> 
> 	M.
> 
> -- 
> Jazz is not dead, it just smells funny.


* Re: [PATCH v2 14/18] KVM: arm64: spe: Provide guest virtual interrupts for SPE
  2019-12-24 11:50     ` Andrew Murray
@ 2019-12-24 12:42       ` Marc Zyngier
  2019-12-24 13:08         ` Andrew Murray
  0 siblings, 1 reply; 78+ messages in thread
From: Marc Zyngier @ 2019-12-24 12:42 UTC (permalink / raw)
  To: Andrew Murray
  Cc: Marc Zyngier, Catalin Marinas, Will Deacon, kvm, linux-kernel,
	Sudeep Holla, kvmarm, linux-arm-kernel

On 2019-12-24 11:50, Andrew Murray wrote:
> On Sun, Dec 22, 2019 at 12:07:50PM +0000, Marc Zyngier wrote:
>> On Fri, 20 Dec 2019 14:30:21 +0000,
>> Andrew Murray <andrew.murray@arm.com> wrote:
>> >
>> > Upon the exit of a guest, let's determine if the SPE device has generated
>> > an interrupt - if so we'll inject a virtual interrupt to the guest.
>> >
>> > Upon the entry and exit of a guest we'll also update the state of the
>> > physical IRQ such that it is active when a guest interrupt is pending
>> > and the guest is running.
>> >
>> > Finally we map the physical IRQ to the virtual IRQ such that the guest
>> > can deactivate the interrupt when it handles the interrupt.
>> >
>> > Signed-off-by: Andrew Murray <andrew.murray@arm.com>
>> > ---
>> >  include/kvm/arm_spe.h |  6 ++++
>> >  virt/kvm/arm/arm.c    |  5 ++-
>> >  virt/kvm/arm/spe.c    | 71 +++++++++++++++++++++++++++++++++++++++++++
>> >  3 files changed, 81 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/include/kvm/arm_spe.h b/include/kvm/arm_spe.h
>> > index 9c65130d726d..91b2214f543a 100644
>> > --- a/include/kvm/arm_spe.h
>> > +++ b/include/kvm/arm_spe.h
>> > @@ -37,6 +37,9 @@ static inline bool kvm_arm_support_spe_v1(void)
>> >  						      ID_AA64DFR0_PMSVER_SHIFT);
>> >  }
>> >
>> > +void kvm_spe_flush_hwstate(struct kvm_vcpu *vcpu);
>> > +inline void kvm_spe_sync_hwstate(struct kvm_vcpu *vcpu);
>> > +
>> >  int kvm_arm_spe_v1_set_attr(struct kvm_vcpu *vcpu,
>> >  			    struct kvm_device_attr *attr);
>> >  int kvm_arm_spe_v1_get_attr(struct kvm_vcpu *vcpu,
>> > @@ -49,6 +52,9 @@ int kvm_arm_spe_v1_enable(struct kvm_vcpu *vcpu);
>> >  #define kvm_arm_support_spe_v1()	(false)
>> >  #define kvm_arm_spe_irq_initialized(v)	(false)
>> >
>> > +static inline void kvm_spe_flush_hwstate(struct kvm_vcpu *vcpu) {}
>> > +static inline void kvm_spe_sync_hwstate(struct kvm_vcpu *vcpu) {}
>> > +
>> >  static inline int kvm_arm_spe_v1_set_attr(struct kvm_vcpu *vcpu,
>> >  					  struct kvm_device_attr *attr)
>> >  {
>> > diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
>> > index 340d2388ee2c..a66085c8e785 100644
>> > --- a/virt/kvm/arm/arm.c
>> > +++ b/virt/kvm/arm/arm.c
>> > @@ -741,6 +741,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>> >  		preempt_disable();
>> >
>> >  		kvm_pmu_flush_hwstate(vcpu);
>> > +		kvm_spe_flush_hwstate(vcpu);
>> >
>> >  		local_irq_disable();
>> >
>> > @@ -782,6 +783,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>> >  		    kvm_request_pending(vcpu)) {
>> >  			vcpu->mode = OUTSIDE_GUEST_MODE;
>> >  			isb(); /* Ensure work in x_flush_hwstate is committed */
>> > +			kvm_spe_sync_hwstate(vcpu);
>> >  			kvm_pmu_sync_hwstate(vcpu);
>> >  			if (static_branch_unlikely(&userspace_irqchip_in_use))
>> >  				kvm_timer_sync_hwstate(vcpu);
>> > @@ -816,11 +818,12 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>> >  		kvm_arm_clear_debug(vcpu);
>> >
>> >  		/*
>> > -		 * We must sync the PMU state before the vgic state so
>> > +		 * We must sync the PMU and SPE state before the vgic state so
>> >  		 * that the vgic can properly sample the updated state of the
>> >  		 * interrupt line.
>> >  		 */
>> >  		kvm_pmu_sync_hwstate(vcpu);
>> > +		kvm_spe_sync_hwstate(vcpu);
>>
>> The *HUGE* difference is that the PMU is purely a virtual interrupt,
>> while you're trying to deal with a HW interrupt here.
>>
>> >
>> >  		/*
>> >  		 * Sync the vgic state before syncing the timer state because
>> > diff --git a/virt/kvm/arm/spe.c b/virt/kvm/arm/spe.c
>> > index 83ac2cce2cc3..097ed39014e4 100644
>> > --- a/virt/kvm/arm/spe.c
>> > +++ b/virt/kvm/arm/spe.c
>> > @@ -35,6 +35,68 @@ int kvm_arm_spe_v1_enable(struct kvm_vcpu *vcpu)
>> >  	return 0;
>> >  }
>> >
>> > +static inline void set_spe_irq_phys_active(struct arm_spe_kvm_info *info,
>> > +					   bool active)
>> > +{
>> > +	int r;
>> > +	r = irq_set_irqchip_state(info->physical_irq, IRQCHIP_STATE_ACTIVE,
>> > +				  active);
>> > +	WARN_ON(r);
>> > +}
>> > +
>> > +void kvm_spe_flush_hwstate(struct kvm_vcpu *vcpu)
>> > +{
>> > +	struct kvm_spe *spe = &vcpu->arch.spe;
>> > +	bool phys_active = false;
>> > +	struct arm_spe_kvm_info *info = arm_spe_get_kvm_info();
>> > +
>> > +	if (!kvm_arm_spe_v1_ready(vcpu))
>> > +		return;
>> > +
>> > +	if (irqchip_in_kernel(vcpu->kvm))
>> > +		phys_active = kvm_vgic_map_is_active(vcpu, spe->irq_num);
>> > +
>> > +	phys_active |= spe->irq_level;
>> > +
>> > +	set_spe_irq_phys_active(info, phys_active);
>>
>> So you're happy to mess with the HW interrupt state even when you
>> don't have a HW irqchip? If you are going to copy paste the timer code
>> here, you'd need to support it all the way (no, don't).
>>
>> > +}
>> > +
>> > +void kvm_spe_sync_hwstate(struct kvm_vcpu *vcpu)
>> > +{
>> > +	struct kvm_spe *spe = &vcpu->arch.spe;
>> > +	u64 pmbsr;
>> > +	int r;
>> > +	bool service;
>> > +	struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
>> > +	struct arm_spe_kvm_info *info = arm_spe_get_kvm_info();
>> > +
>> > +	if (!kvm_arm_spe_v1_ready(vcpu))
>> > +		return;
>> > +
>> > +	set_spe_irq_phys_active(info, false);
>> > +
>> > +	pmbsr = ctxt->sys_regs[PMBSR_EL1];
>> > +	service = !!(pmbsr & BIT(SYS_PMBSR_EL1_S_SHIFT));
>> > +	if (spe->irq_level == service)
>> > +		return;
>> > +
>> > +	spe->irq_level = service;
>> > +
>> > +	if (likely(irqchip_in_kernel(vcpu->kvm))) {
>> > +		r = kvm_vgic_inject_irq(vcpu->kvm, vcpu->vcpu_id,
>> > +					spe->irq_num, service, spe);
>> > +		WARN_ON(r);
>> > +	}
>> > +}
>> > +
>> > +static inline bool kvm_arch_arm_spe_v1_get_input_level(int vintid)
>> > +{
>> > +	struct kvm_vcpu *vcpu = kvm_arm_get_running_vcpu();
>> > +	struct kvm_spe *spe = &vcpu->arch.spe;
>> > +
>> > +	return spe->irq_level;
>> > +}
>>
>> This isn't what such a callback is for. It is supposed to sample the
>> HW, and nothing else.
>>
>> > +
>> >  static int kvm_arm_spe_v1_init(struct kvm_vcpu *vcpu)
>> >  {
>> >  	if (!kvm_arm_support_spe_v1())
>> > @@ -48,6 +110,7 @@ static int kvm_arm_spe_v1_init(struct kvm_vcpu *vcpu)
>> >
>> >  	if (irqchip_in_kernel(vcpu->kvm)) {
>> >  		int ret;
>> > +		struct arm_spe_kvm_info *info;
>> >
>> >  		/*
>> >  		 * If using the SPE with an in-kernel virtual GIC
>> > @@ -57,10 +120,18 @@ static int kvm_arm_spe_v1_init(struct kvm_vcpu *vcpu)
>> >  		if (!vgic_initialized(vcpu->kvm))
>> >  			return -ENODEV;
>> >
>> > +		info = arm_spe_get_kvm_info();
>> > +		if (!info->physical_irq)
>> > +			return -ENODEV;
>> > +
>> >  		ret = kvm_vgic_set_owner(vcpu, vcpu->arch.spe.irq_num,
>> >  					 &vcpu->arch.spe);
>> >  		if (ret)
>> >  			return ret;
>> > +
>> > +		ret = kvm_vgic_map_phys_irq(vcpu, info->physical_irq,
>> > +					    vcpu->arch.spe.irq_num,
>> > +					    kvm_arch_arm_spe_v1_get_input_level);
>>
>> You're mapping the interrupt in the guest, and yet you have never
>> forwarded the interrupt in the first place. All this flow is only going
>> to wreck the host driver as soon as an interrupt occurs.
>>
>> I think you should rethink the interrupt handling altogether. It would
>> make more sense if the interrupt was actually completely
>> virtualized. If you can isolate the guest state and compute the
>> interrupt state in SW (and from the above, it seems that you can),
>> then you shouldn't mess with the whole forwarding *at all*, as it
>> isn't designed for devices shared between host and guests.
>
> Yes it's possible to read SYS_PMBSR_EL1_S_SHIFT and determine if SPE wants
> service. If I understand correctly, you're suggesting on entry/exit to the
> guest we determine this and inject an interrupt to the guest. As well as
> removing the kvm_vgic_map_phys_irq mapping to the physical interrupt?

The mapping only makes sense for devices that have their interrupt
forwarded to a vcpu, where the expected flow is that the interrupt
is taken on the host with a normal interrupt handler and then
injected in the guest (you still have to manage the active state
though). The basic assumption is that such a device is entirely
owned by KVM.

Here, you're abusing the mapping interface: you don't have an
interrupt handler (the host SPE driver owns it), the interrupt
isn't forwarded, and yet you're messing with the active state.
None of that is expected, and you are in uncharted territory
as far as KVM is concerned.

What bothers me the most is that this looks a lot like a previous
implementation of the timers, and we had all the problems in the
world to keep track of the interrupt state *and* have a reasonable
level of performance (hitting the redistributor on the fast path
is a performance killer).

> My understanding was that I needed knowledge of the physical SPE interrupt
> number so that I could prevent the host SPE driver from getting spurious
> interrupts due to guest use of the SPE.

You can't completely rule out the host getting interrupted. Even if you
set PMBSR_EL1.S to zero, there is no guarantee that the host will not
observe the interrupt anyway (the GIC architecture doesn't tell you how
quickly it will be retired, if ever). The host driver already checks for
this anyway.

What you need to ensure is that PMBSR_EL1.S being set on guest entry
doesn't immediately kick you out of the guest and prevent forward
progress. This is why you need to manage the active state.

The real question is: how quickly do you want to react to a SPE
interrupt firing while in a guest?

If you want to take it into account as soon as it fires, then you need
to eagerly save/restore the active state together with the SPE state on
each entry/exit, and performance will suffer. This is what you are
currently doing.

If you're OK with evaluating the interrupt status on exit, but without
the interrupt itself causing an exit, then you can simply manage it
as a purely virtual interrupt, and just deal with the active state
in load/put (set the interrupt as active on load, clear it on put).

Given that SPE interrupts always indicate that profiling has stopped,
this only affects the size of the black hole, and I'm inclined to do
the latter.
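
To make the second option concrete, here is a minimal user-space sketch of
the "purely virtual interrupt" flow, under stated assumptions. It is not
kernel code: struct spe_virq_model and the model_*() helpers are
hypothetical names made up for illustration, and model_inject() merely
stands in for a call such as kvm_vgic_inject_irq(). The only architectural
fact relied upon is that PMBSR_EL1.S is bit 17.

```c
#include <stdbool.h>
#include <stdint.h>

/* PMBSR_EL1.S (the "service" bit) lives at bit 17. */
#define PMBSR_EL1_S_SHIFT 17

/* Hypothetical per-vcpu model; not the kernel's struct kvm_spe. */
struct spe_virq_model {
	bool irq_level;       /* last level presented to the virtual GIC */
	bool phys_active;     /* modelled active state of the physical IRQ */
	unsigned injections;  /* how many times the virtual level changed */
};

/* Hypothetical stand-in for injecting into the vgic: records the level. */
static void model_inject(struct spe_virq_model *spe, bool level)
{
	spe->irq_level = level;
	spe->injections++;
}

/* vcpu load: mark the physical interrupt active so it cannot fire. */
static void model_vcpu_load(struct spe_virq_model *spe)
{
	spe->phys_active = true;
}

/* vcpu put: hand the physical interrupt back to the host driver. */
static void model_vcpu_put(struct spe_virq_model *spe)
{
	spe->phys_active = false;
}

/*
 * Guest exit: derive the interrupt level from the saved guest PMBSR_EL1
 * and only touch the virtual GIC when the level actually changed.
 */
static void model_sync_on_exit(struct spe_virq_model *spe, uint64_t guest_pmbsr)
{
	bool service = (guest_pmbsr >> PMBSR_EL1_S_SHIFT) & 1;

	if (spe->irq_level == service)
		return;

	model_inject(spe, service);
}
```

The point of the sketch is that nothing on the entry/exit path touches the
physical interrupt state; that only happens in the (much rarer) load/put
pair, and the level is evaluated in software at exit time.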

         M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v2 00/18] arm64: KVM: add SPE profiling support
  2019-12-20 17:55 ` [PATCH v2 00/18] arm64: KVM: add SPE profiling support Mark Rutland
@ 2019-12-24 12:54   ` Andrew Murray
  0 siblings, 0 replies; 78+ messages in thread
From: Andrew Murray @ 2019-12-24 12:54 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Marc Zyngier, Catalin Marinas, Will Deacon, Sudeep Holla, kvmarm,
	linux-arm-kernel, kvm, linux-kernel

On Fri, Dec 20, 2019 at 05:55:25PM +0000, Mark Rutland wrote:
> Hi Andrew,
> 
> On Fri, Dec 20, 2019 at 02:30:07PM +0000, Andrew Murray wrote:
> > This series implements support for allowing KVM guests to use the Arm
> > Statistical Profiling Extension (SPE).
> > 
> > It has been tested on a model to ensure that both host and guest can
> > simultaneously use SPE with valid data. E.g.
> > 
> > $ perf record -e arm_spe/ts_enable=1,pa_enable=1,pct_enable=1/ \
> >         dd if=/dev/zero of=/dev/null count=1000
> > $ perf report --dump-raw-trace > spe_buf.txt
> 
> What happens if I run perf record on the VMM, or on the CPU(s) that the
> VMM is running on? i.e.
> 
> $ perf record -e arm_spe/ts_enable=1,pa_enable=1,pct_enable=1/ \
>         lkvm ${OPTIONS_FOR_GUEST_USING_SPE}
> 

By default perf excludes the guest, so this works as expected, just recording
activity of the process when it is outside the guest. (perf report appears
to give valid output).

Patch 15 currently prevents using perf to record inside the guest.


> ... or:
> 
> $ perf record -a -c 0 -e arm_spe/ts_enable=1,pa_enable=1,pct_enable=1/ \
>         sleep 1000 &
> $ taskset -c 0 lkvm ${OPTIONS_FOR_GUEST_USING_SPE} &
> 
> > As we save and restore the SPE context, the guest can access the SPE
> > registers directly, thus in this version of the series we remove the
> > trapping and emulation.
> > 
> > In the previous series of this support, when KVM SPE isn't supported
> > (e.g. via CONFIG_KVM_ARM_SPE) we were able to return a value of 0 to
> > all reads of the SPE registers - as we can no longer do this there isn't
> > a mechanism to prevent the guest from using SPE - thus I'm keen for
> > feedback on the best way of resolving this.
> 
> When not providing SPE to the guest, surely we should be trapping the
> registers and injecting an UNDEF?

Yes we should, I'll update the series.


> 
> What happens today, without these patches?
> 

Prior to this series MDCR_EL2_TPMS is set and E2PB is unset resulting in all
SPE registers being trapped (with NULL handlers).
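
As a rough illustration of that trap configuration (a sketch only, not what
the series does verbatim): MDCR_EL2.TPMS is bit 14 and MDCR_EL2.E2PB is
bits [13:12]. The helper name model_mdcr_for_guest() and the guest_has_spe
flag below are hypothetical, and the choice of E2PB=0b11 for a guest that
owns SPE is my assumption of how the untrapped case would be programmed.

```c
#include <stdbool.h>
#include <stdint.h>

/* MDCR_EL2 fields that control trapping of the SPE registers. */
#define MDCR_EL2_TPMS       (UINT64_C(1) << 14)
#define MDCR_EL2_E2PB_SHIFT 12
#define MDCR_EL2_E2PB_MASK  (UINT64_C(3) << MDCR_EL2_E2PB_SHIFT)

/*
 * Hypothetical helper: compute the MDCR_EL2 value to use while running a
 * guest, depending on whether SPE is exposed to that guest.
 */
static uint64_t model_mdcr_for_guest(uint64_t mdcr, bool guest_has_spe)
{
	if (guest_has_spe) {
		/* Assumption: no trapping, buffer owned by the EL1&0 regime. */
		mdcr &= ~MDCR_EL2_TPMS;
		mdcr |= MDCR_EL2_E2PB_MASK;
	} else {
		/* Behaviour prior to the series: TPMS set, E2PB clear. */
		mdcr |= MDCR_EL2_TPMS;
		mdcr &= ~MDCR_EL2_E2PB_MASK;
	}
	return mdcr;
}
```

With guest_has_spe false, every guest access to the SPE registers traps to
EL2, which is where an UNDEF could be injected.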


> > It appears necessary to pin the entire guest memory in order to provide
> > guest SPE access - otherwise it is possible for the guest to receive
> > Stage-2 faults.
> 
> AFAICT these patches do not implement this. I assume that's what you're
> trying to point out here, but I just want to make sure that's explicit.

That's right.


> 
> Maybe this is a reason to trap+emulate if there's something more
> sensible that hyp can do if it sees a Stage-2 fault.

Yes it's not really clear to me at the moment what to do about this.

Thanks,

Andrew Murray

> 
> Thanks,
> Mark.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v2 00/18] arm64: KVM: add SPE profiling support
  2019-12-22 12:22   ` Marc Zyngier
@ 2019-12-24 12:56     ` Andrew Murray
  0 siblings, 0 replies; 78+ messages in thread
From: Andrew Murray @ 2019-12-24 12:56 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvm, Catalin Marinas, linux-kernel, Sudeep Holla, will, kvmarm,
	linux-arm-kernel

On Sun, Dec 22, 2019 at 12:22:10PM +0000, Marc Zyngier wrote:
> On Sat, 21 Dec 2019 10:48:16 +0000,
> Marc Zyngier <maz@kernel.org> wrote:
> > 
> > [fixing email addresses]
> > 
> > Hi Andrew,
> > 
> > On 2019-12-20 14:30, Andrew Murray wrote:
> > > This series implements support for allowing KVM guests to use the Arm
> > > Statistical Profiling Extension (SPE).
> > 
> > Thanks for this. In future, please Cc me and Will on email addresses
> > we can actually read.
> > 
> > > It has been tested on a model to ensure that both host and guest can
> > > simultaneously use SPE with valid data. E.g.
> > > 
> > > $ perf record -e arm_spe/ts_enable=1,pa_enable=1,pct_enable=1/ \
> > >         dd if=/dev/zero of=/dev/null count=1000
> > > $ perf report --dump-raw-trace > spe_buf.txt
> > > 
> > > As we save and restore the SPE context, the guest can access the SPE
> > > registers directly, thus in this version of the series we remove the
> > > trapping and emulation.
> > > 
> > > In the previous series of this support, when KVM SPE isn't
> > > supported (e.g. via CONFIG_KVM_ARM_SPE) we were able to return a
> > > value of 0 to all reads of the SPE registers - as we can no longer
> > > do this there isn't a mechanism to prevent the guest from using
> > > SPE - thus I'm keen for feedback on the best way of resolving
> > > this.
> > 
> > Surely there is a way to conditionally trap SPE registers, right? You
> > should still be able to do this if SPE is not configured for a given
> > > guest (as we do for other features such as PtrAuth).
> > 
> > > It appears necessary to pin the entire guest memory in order to
> > > provide guest SPE access - otherwise it is possible for the guest
> > > to receive Stage-2 faults.
> > 
> > Really? How can the guest receive a stage-2 fault? This doesn't fit
> > what I understand of the ARMv8 exception model. Or do you mean a SPE
> > interrupt describing a S2 fault?

Yes the latter.


> > 
> > And this is not just pinning the memory either. You have to ensure that
> > all S2 page tables are created ahead of SPE being able to DMA to guest
> > memory. This may have some impacts on the THP code...
> > 
> > I'll have a look at the actual series ASAP (but that's not very soon).
> 
> I found some time to go through the series, and there is clearly a lot
> of work left to do:
> 
> - There is nothing here to handle memory pinning whatsoever. If it
>   works, it is only thanks to some side effect.
> 
> - The missing trapping is deeply worrying. Given that this is an
>   optional feature, you cannot just let the guest do whatever it wants
>   in an uncontrolled manner.

Yes I'll add this.


> 
> - The interrupt handling is busted. You mix concepts picked from both
>   the PMU and the timer code, while the SPE device doesn't behave like
>   either of these two (it is neither a fully emulated device, nor a
>   device that is exclusively owned by a guest at any given time).
> 
> I expect some level of discussion on the list including at least Will
> and myself before you respin this.

Thanks for the quick feedback.

Andrew Murray

> 
> 	M.
> 
> -- 
> Jazz is not dead, it just smells funny.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v2 14/18] KVM: arm64: spe: Provide guest virtual interrupts for SPE
  2019-12-24 12:42       ` Marc Zyngier
@ 2019-12-24 13:08         ` Andrew Murray
  2019-12-24 13:22           ` Marc Zyngier
  0 siblings, 1 reply; 78+ messages in thread
From: Andrew Murray @ 2019-12-24 13:08 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Marc Zyngier, Catalin Marinas, Will Deacon, kvm, linux-kernel,
	Sudeep Holla, kvmarm, linux-arm-kernel

On Tue, Dec 24, 2019 at 12:42:02PM +0000, Marc Zyngier wrote:
> On 2019-12-24 11:50, Andrew Murray wrote:
> > On Sun, Dec 22, 2019 at 12:07:50PM +0000, Marc Zyngier wrote:
> > > On Fri, 20 Dec 2019 14:30:21 +0000,
> > > Andrew Murray <andrew.murray@arm.com> wrote:
> > > >
> > > > Upon the exit of a guest, let's determine if the SPE device has generated
> > > > an interrupt - if so we'll inject a virtual interrupt to the guest.
> > > >
> > > > Upon the entry and exit of a guest we'll also update the state of the
> > > > physical IRQ such that it is active when a guest interrupt is pending
> > > > and the guest is running.
> > > >
> > > > Finally we map the physical IRQ to the virtual IRQ such that the guest
> > > > can deactivate the interrupt when it handles the interrupt.
> > > >
> > > > Signed-off-by: Andrew Murray <andrew.murray@arm.com>
> > > > ---
> > > >  include/kvm/arm_spe.h |  6 ++++
> > > >  virt/kvm/arm/arm.c    |  5 ++-
> > > >  virt/kvm/arm/spe.c    | 71
> > > +++++++++++++++++++++++++++++++++++++++++++
> > > >  3 files changed, 81 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/include/kvm/arm_spe.h b/include/kvm/arm_spe.h
> > > > index 9c65130d726d..91b2214f543a 100644
> > > > --- a/include/kvm/arm_spe.h
> > > > +++ b/include/kvm/arm_spe.h
> > > > @@ -37,6 +37,9 @@ static inline bool kvm_arm_support_spe_v1(void)
> > > >  						      ID_AA64DFR0_PMSVER_SHIFT);
> > > >  }
> > > >
> > > > +void kvm_spe_flush_hwstate(struct kvm_vcpu *vcpu);
> > > > +inline void kvm_spe_sync_hwstate(struct kvm_vcpu *vcpu);
> > > > +
> > > >  int kvm_arm_spe_v1_set_attr(struct kvm_vcpu *vcpu,
> > > >  			    struct kvm_device_attr *attr);
> > > >  int kvm_arm_spe_v1_get_attr(struct kvm_vcpu *vcpu,
> > > > @@ -49,6 +52,9 @@ int kvm_arm_spe_v1_enable(struct kvm_vcpu
> > > *vcpu);
> > > >  #define kvm_arm_support_spe_v1()	(false)
> > > >  #define kvm_arm_spe_irq_initialized(v)	(false)
> > > >
> > > > +static inline void kvm_spe_flush_hwstate(struct kvm_vcpu *vcpu)
> > > {}
> > > > +static inline void kvm_spe_sync_hwstate(struct kvm_vcpu *vcpu) {}
> > > > +
> > > >  static inline int kvm_arm_spe_v1_set_attr(struct kvm_vcpu *vcpu,
> > > >  					  struct kvm_device_attr *attr)
> > > >  {
> > > > diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> > > > index 340d2388ee2c..a66085c8e785 100644
> > > > --- a/virt/kvm/arm/arm.c
> > > > +++ b/virt/kvm/arm/arm.c
> > > > @@ -741,6 +741,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu
> > > *vcpu, struct kvm_run *run)
> > > >  		preempt_disable();
> > > >
> > > >  		kvm_pmu_flush_hwstate(vcpu);
> > > > +		kvm_spe_flush_hwstate(vcpu);
> > > >
> > > >  		local_irq_disable();
> > > >
> > > > @@ -782,6 +783,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu
> > > *vcpu, struct kvm_run *run)
> > > >  		    kvm_request_pending(vcpu)) {
> > > >  			vcpu->mode = OUTSIDE_GUEST_MODE;
> > > >  			isb(); /* Ensure work in x_flush_hwstate is committed */
> > > > +			kvm_spe_sync_hwstate(vcpu);
> > > >  			kvm_pmu_sync_hwstate(vcpu);
> > > >  			if (static_branch_unlikely(&userspace_irqchip_in_use))
> > > >  				kvm_timer_sync_hwstate(vcpu);
> > > > @@ -816,11 +818,12 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu
> > > *vcpu, struct kvm_run *run)
> > > >  		kvm_arm_clear_debug(vcpu);
> > > >
> > > >  		/*
> > > > -		 * We must sync the PMU state before the vgic state so
> > > > +		 * We must sync the PMU and SPE state before the vgic state so
> > > >  		 * that the vgic can properly sample the updated state of the
> > > >  		 * interrupt line.
> > > >  		 */
> > > >  		kvm_pmu_sync_hwstate(vcpu);
> > > > +		kvm_spe_sync_hwstate(vcpu);
> > > 
> > > The *HUGE* difference is that the PMU is purely a virtual interrupt,
> > > while you're trying to deal with a HW interrupt here.
> > > 
> > > >
> > > >  		/*
> > > >  		 * Sync the vgic state before syncing the timer state because
> > > > diff --git a/virt/kvm/arm/spe.c b/virt/kvm/arm/spe.c
> > > > index 83ac2cce2cc3..097ed39014e4 100644
> > > > --- a/virt/kvm/arm/spe.c
> > > > +++ b/virt/kvm/arm/spe.c
> > > > @@ -35,6 +35,68 @@ int kvm_arm_spe_v1_enable(struct kvm_vcpu
> > > *vcpu)
> > > >  	return 0;
> > > >  }
> > > >
> > > > +static inline void set_spe_irq_phys_active(struct
> > > arm_spe_kvm_info *info,
> > > > +					   bool active)
> > > > +{
> > > > +	int r;
> > > > +	r = irq_set_irqchip_state(info->physical_irq,
> > > IRQCHIP_STATE_ACTIVE,
> > > > +				  active);
> > > > +	WARN_ON(r);
> > > > +}
> > > > +
> > > > +void kvm_spe_flush_hwstate(struct kvm_vcpu *vcpu)
> > > > +{
> > > > +	struct kvm_spe *spe = &vcpu->arch.spe;
> > > > +	bool phys_active = false;
> > > > +	struct arm_spe_kvm_info *info = arm_spe_get_kvm_info();
> > > > +
> > > > +	if (!kvm_arm_spe_v1_ready(vcpu))
> > > > +		return;
> > > > +
> > > > +	if (irqchip_in_kernel(vcpu->kvm))
> > > > +		phys_active = kvm_vgic_map_is_active(vcpu, spe->irq_num);
> > > > +
> > > > +	phys_active |= spe->irq_level;
> > > > +
> > > > +	set_spe_irq_phys_active(info, phys_active);
> > > 
> > > So you're happy to mess with the HW interrupt state even when you
> > > don't have a HW irqchip? If you are going to copy paste the timer code
> > > here, you'd need to support it all the way (no, don't).
> > > 
> > > > +}
> > > > +
> > > > +void kvm_spe_sync_hwstate(struct kvm_vcpu *vcpu)
> > > > +{
> > > > +	struct kvm_spe *spe = &vcpu->arch.spe;
> > > > +	u64 pmbsr;
> > > > +	int r;
> > > > +	bool service;
> > > > +	struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
> > > > +	struct arm_spe_kvm_info *info = arm_spe_get_kvm_info();
> > > > +
> > > > +	if (!kvm_arm_spe_v1_ready(vcpu))
> > > > +		return;
> > > > +
> > > > +	set_spe_irq_phys_active(info, false);
> > > > +
> > > > +	pmbsr = ctxt->sys_regs[PMBSR_EL1];
> > > > +	service = !!(pmbsr & BIT(SYS_PMBSR_EL1_S_SHIFT));
> > > > +	if (spe->irq_level == service)
> > > > +		return;
> > > > +
> > > > +	spe->irq_level = service;
> > > > +
> > > > +	if (likely(irqchip_in_kernel(vcpu->kvm))) {
> > > > +		r = kvm_vgic_inject_irq(vcpu->kvm, vcpu->vcpu_id,
> > > > +					spe->irq_num, service, spe);
> > > > +		WARN_ON(r);
> > > > +	}
> > > > +}
> > > > +
> > > > +static inline bool kvm_arch_arm_spe_v1_get_input_level(int
> > > vintid)
> > > > +{
> > > > +	struct kvm_vcpu *vcpu = kvm_arm_get_running_vcpu();
> > > > +	struct kvm_spe *spe = &vcpu->arch.spe;
> > > > +
> > > > +	return spe->irq_level;
> > > > +}
> > > 
> > > This isn't what such a callback is for. It is supposed to sample the
> > > HW, and nothing else.
> > > 
> > > > +
> > > >  static int kvm_arm_spe_v1_init(struct kvm_vcpu *vcpu)
> > > >  {
> > > >  	if (!kvm_arm_support_spe_v1())
> > > > @@ -48,6 +110,7 @@ static int kvm_arm_spe_v1_init(struct kvm_vcpu
> > > *vcpu)
> > > >
> > > >  	if (irqchip_in_kernel(vcpu->kvm)) {
> > > >  		int ret;
> > > > +		struct arm_spe_kvm_info *info;
> > > >
> > > >  		/*
> > > >  		 * If using the SPE with an in-kernel virtual GIC
> > > > @@ -57,10 +120,18 @@ static int kvm_arm_spe_v1_init(struct
> > > kvm_vcpu *vcpu)
> > > >  		if (!vgic_initialized(vcpu->kvm))
> > > >  			return -ENODEV;
> > > >
> > > > +		info = arm_spe_get_kvm_info();
> > > > +		if (!info->physical_irq)
> > > > +			return -ENODEV;
> > > > +
> > > >  		ret = kvm_vgic_set_owner(vcpu, vcpu->arch.spe.irq_num,
> > > >  					 &vcpu->arch.spe);
> > > >  		if (ret)
> > > >  			return ret;
> > > > +
> > > > +		ret = kvm_vgic_map_phys_irq(vcpu, info->physical_irq,
> > > > +					    vcpu->arch.spe.irq_num,
> > > > +					    kvm_arch_arm_spe_v1_get_input_level);
> > > 
> > > You're mapping the interrupt into the guest, and yet you have never
> > > forwarded the interrupt in the first place. All this flow is only going
> > > to wreck the host driver as soon as an interrupt occurs.
> > > 
> > > I think you should rethink the interrupt handling altogether. It would
> > > make more sense if the interrupt was actually completely
> > > virtualized. If you can isolate the guest state and compute the
> > > interrupt state in SW (and from the above, it seems that you can),
> > > then you shouldn't mess with the whole forwarding *at all*, as it
> > > isn't designed for devices shared between host and guests.
> > 
> > Yes it's possible to read SYS_PMBSR_EL1_S_SHIFT and determine if SPE wants
> > service. If I understand correctly, you're suggesting on entry/exit to the
> > guest we determine this and inject an interrupt to the guest. As well as
> > removing the kvm_vgic_map_phys_irq mapping to the physical interrupt?
> 
> The mapping only makes sense for devices that have their interrupt
> forwarded to a vcpu, where the expected flow is that the interrupt
> is taken on the host with a normal interrupt handler and then
> injected in the guest (you still have to manage the active state
> though). The basic assumption is that such a device is entirely
> owned by KVM.

Though the mapping does mean that if the guest handles the guest SPE
interrupt it doesn't have to wait for a guest exit before having the
SPE interrupt evaluated again (i.e. another SPE interrupt won't cause
a guest exit) - thus increasing the size of any black hole.


> 
> Here, you're abusing the mapping interface: you don't have an
> interrupt handler (the host SPE driver owns it), the interrupt
> isn't forwarded, and yet you're messing with the active state.
> None of that is expected, and you are in uncharted territory
> as far as KVM is concerned.
> 
> What bothers me the most is that this looks a lot like a previous
> implementation of the timers, and we had all the problems in the
> world to keep track of the interrupt state *and* have a reasonable
> level of performance (hitting the redistributor on the fast path
> is a performance killer).
> 
> > My understanding was that I needed knowledge of the physical SPE interrupt
> > number so that I could prevent the host SPE driver from getting spurious
> > interrupts due to guest use of the SPE.
> 
> You can't completely rule out the host getting interrupted. Even if you set
> PMBSR_EL1.S to zero, there is no guarantee that the host will not observe
> the interrupt anyway (the GIC architecture doesn't tell you how quickly
> it will be retired, if ever). The host driver already checks for this
> anyway.
> 
> What you need to ensure is that PMBSR_EL1.S being set on guest entry
> doesn't immediately kick you out of the guest and prevent forward
> progress. This is why you need to manage the active state.
> 
> The real question is: how quickly do you want to react to a SPE
> interrupt firing while in a guest?
> 
> If you want to take it into account as soon as it fires, then you need
> to eagerly save/restore the active state together with the SPE state on
> each entry/exit, and performance will suffer. This is what you are
> currently doing.
> 
> If you're OK with evaluating the interrupt status on exit, but without
> the interrupt itself causing an exit, then you can simply manage it
> as a purely virtual interrupt, and just deal with the active state
> in load/put (set the interrupt as active on load, clear it on put).

This does feel like the pragmatic approach - a larger black hole in exchange
for performance. I imagine the black hole would be naturally reduced on
machines with high workloads.

I'll refine the series to take this approach.

> 
> Given that SPE interrupts always indicate that profiling has stopped,

and faults :|

Thanks,

Andrew Murray

> this only affects the size of the black hole, and I'm inclined to do
> the latter.
> 
>         M.
> -- 
> Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v2 14/18] KVM: arm64: spe: Provide guest virtual interrupts for SPE
  2019-12-24 13:08         ` Andrew Murray
@ 2019-12-24 13:22           ` Marc Zyngier
  2019-12-24 13:36             ` Andrew Murray
  0 siblings, 1 reply; 78+ messages in thread
From: Marc Zyngier @ 2019-12-24 13:22 UTC (permalink / raw)
  To: Andrew Murray
  Cc: Marc Zyngier, Catalin Marinas, Will Deacon, kvm, linux-kernel,
	Sudeep Holla, kvmarm, linux-arm-kernel

On 2019-12-24 13:08, Andrew Murray wrote:
> On Tue, Dec 24, 2019 at 12:42:02PM +0000, Marc Zyngier wrote:
>> On 2019-12-24 11:50, Andrew Murray wrote:
>> > On Sun, Dec 22, 2019 at 12:07:50PM +0000, Marc Zyngier wrote:
>> > > On Fri, 20 Dec 2019 14:30:21 +0000,
>> > > Andrew Murray <andrew.murray@arm.com> wrote:
>> > > >
>> > > > Upon the exit of a guest, let's determine if the SPE device has generated
>> > > > an interrupt - if so we'll inject a virtual interrupt to the guest.
>> > > >
>> > > > Upon the entry and exit of a guest we'll also update the state of the
>> > > > physical IRQ such that it is active when a guest interrupt is pending
>> > > > and the guest is running.
>> > > >
>> > > > Finally we map the physical IRQ to the virtual IRQ such that the guest
>> > > > can deactivate the interrupt when it handles the interrupt.
>> > > >
>> > > > Signed-off-by: Andrew Murray <andrew.murray@arm.com>
>> > > > ---
>> > > >  include/kvm/arm_spe.h |  6 ++++
>> > > >  virt/kvm/arm/arm.c    |  5 ++-
>> > > >  virt/kvm/arm/spe.c    | 71
>> > > +++++++++++++++++++++++++++++++++++++++++++
>> > > >  3 files changed, 81 insertions(+), 1 deletion(-)
>> > > >
>> > > > diff --git a/include/kvm/arm_spe.h b/include/kvm/arm_spe.h
>> > > > index 9c65130d726d..91b2214f543a 100644
>> > > > --- a/include/kvm/arm_spe.h
>> > > > +++ b/include/kvm/arm_spe.h
>> > > > @@ -37,6 +37,9 @@ static inline bool 
>> kvm_arm_support_spe_v1(void)
>> > > >  						      ID_AA64DFR0_PMSVER_SHIFT);
>> > > >  }
>> > > >
>> > > > +void kvm_spe_flush_hwstate(struct kvm_vcpu *vcpu);
>> > > > +inline void kvm_spe_sync_hwstate(struct kvm_vcpu *vcpu);
>> > > > +
>> > > >  int kvm_arm_spe_v1_set_attr(struct kvm_vcpu *vcpu,
>> > > >  			    struct kvm_device_attr *attr);
>> > > >  int kvm_arm_spe_v1_get_attr(struct kvm_vcpu *vcpu,
>> > > > @@ -49,6 +52,9 @@ int kvm_arm_spe_v1_enable(struct kvm_vcpu
>> > > *vcpu);
>> > > >  #define kvm_arm_support_spe_v1()	(false)
>> > > >  #define kvm_arm_spe_irq_initialized(v)	(false)
>> > > >
>> > > > +static inline void kvm_spe_flush_hwstate(struct kvm_vcpu 
>> *vcpu)
>> > > {}
>> > > > +static inline void kvm_spe_sync_hwstate(struct kvm_vcpu 
>> *vcpu) {}
>> > > > +
>> > > >  static inline int kvm_arm_spe_v1_set_attr(struct kvm_vcpu 
>> *vcpu,
>> > > >  					  struct kvm_device_attr *attr)
>> > > >  {
>> > > > diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
>> > > > index 340d2388ee2c..a66085c8e785 100644
>> > > > --- a/virt/kvm/arm/arm.c
>> > > > +++ b/virt/kvm/arm/arm.c
>> > > > @@ -741,6 +741,7 @@ int kvm_arch_vcpu_ioctl_run(struct 
>> kvm_vcpu
>> > > *vcpu, struct kvm_run *run)
>> > > >  		preempt_disable();
>> > > >
>> > > >  		kvm_pmu_flush_hwstate(vcpu);
>> > > > +		kvm_spe_flush_hwstate(vcpu);
>> > > >
>> > > >  		local_irq_disable();
>> > > >
>> > > > @@ -782,6 +783,7 @@ int kvm_arch_vcpu_ioctl_run(struct 
>> kvm_vcpu
>> > > *vcpu, struct kvm_run *run)
>> > > >  		    kvm_request_pending(vcpu)) {
>> > > >  			vcpu->mode = OUTSIDE_GUEST_MODE;
>> > > >  			isb(); /* Ensure work in x_flush_hwstate is committed */
>> > > > +			kvm_spe_sync_hwstate(vcpu);
>> > > >  			kvm_pmu_sync_hwstate(vcpu);
>> > > >  			if (static_branch_unlikely(&userspace_irqchip_in_use))
>> > > >  				kvm_timer_sync_hwstate(vcpu);
>> > > > @@ -816,11 +818,12 @@ int kvm_arch_vcpu_ioctl_run(struct 
>> kvm_vcpu
>> > > *vcpu, struct kvm_run *run)
>> > > >  		kvm_arm_clear_debug(vcpu);
>> > > >
>> > > >  		/*
>> > > > -		 * We must sync the PMU state before the vgic state so
>> > > > +		 * We must sync the PMU and SPE state before the vgic state 
>> so
>> > > >  		 * that the vgic can properly sample the updated state of 
>> the
>> > > >  		 * interrupt line.
>> > > >  		 */
>> > > >  		kvm_pmu_sync_hwstate(vcpu);
>> > > > +		kvm_spe_sync_hwstate(vcpu);
>> > >
>> > > The *HUGE* difference is that the PMU is purely a virtual interrupt,
>> > > while you're trying to deal with a HW interrupt here.
>> > >
>> > > >
>> > > >  		/*
>> > > >  		 * Sync the vgic state before syncing the timer state 
>> because
>> > > > diff --git a/virt/kvm/arm/spe.c b/virt/kvm/arm/spe.c
>> > > > index 83ac2cce2cc3..097ed39014e4 100644
>> > > > --- a/virt/kvm/arm/spe.c
>> > > > +++ b/virt/kvm/arm/spe.c
>> > > > @@ -35,6 +35,68 @@ int kvm_arm_spe_v1_enable(struct kvm_vcpu
>> > > *vcpu)
>> > > >  	return 0;
>> > > >  }
>> > > >
>> > > > +static inline void set_spe_irq_phys_active(struct
>> > > arm_spe_kvm_info *info,
>> > > > +					   bool active)
>> > > > +{
>> > > > +	int r;
>> > > > +	r = irq_set_irqchip_state(info->physical_irq,
>> > > IRQCHIP_STATE_ACTIVE,
>> > > > +				  active);
>> > > > +	WARN_ON(r);
>> > > > +}
>> > > > +
>> > > > +void kvm_spe_flush_hwstate(struct kvm_vcpu *vcpu)
>> > > > +{
>> > > > +	struct kvm_spe *spe = &vcpu->arch.spe;
>> > > > +	bool phys_active = false;
>> > > > +	struct arm_spe_kvm_info *info = arm_spe_get_kvm_info();
>> > > > +
>> > > > +	if (!kvm_arm_spe_v1_ready(vcpu))
>> > > > +		return;
>> > > > +
>> > > > +	if (irqchip_in_kernel(vcpu->kvm))
>> > > > +		phys_active = kvm_vgic_map_is_active(vcpu, spe->irq_num);
>> > > > +
>> > > > +	phys_active |= spe->irq_level;
>> > > > +
>> > > > +	set_spe_irq_phys_active(info, phys_active);
>> > >
>> > > So you're happy to mess with the HW interrupt state even when you
>> > > don't have a HW irqchip? If you are going to copy paste the timer code
>> > > here, you'd need to support it all the way (no, don't).
>> > >
>> > > > +}
>> > > > +
>> > > > +void kvm_spe_sync_hwstate(struct kvm_vcpu *vcpu)
>> > > > +{
>> > > > +	struct kvm_spe *spe = &vcpu->arch.spe;
>> > > > +	u64 pmbsr;
>> > > > +	int r;
>> > > > +	bool service;
>> > > > +	struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
>> > > > +	struct arm_spe_kvm_info *info = arm_spe_get_kvm_info();
>> > > > +
>> > > > +	if (!kvm_arm_spe_v1_ready(vcpu))
>> > > > +		return;
>> > > > +
>> > > > +	set_spe_irq_phys_active(info, false);
>> > > > +
>> > > > +	pmbsr = ctxt->sys_regs[PMBSR_EL1];
>> > > > +	service = !!(pmbsr & BIT(SYS_PMBSR_EL1_S_SHIFT));
>> > > > +	if (spe->irq_level == service)
>> > > > +		return;
>> > > > +
>> > > > +	spe->irq_level = service;
>> > > > +
>> > > > +	if (likely(irqchip_in_kernel(vcpu->kvm))) {
>> > > > +		r = kvm_vgic_inject_irq(vcpu->kvm, vcpu->vcpu_id,
>> > > > +					spe->irq_num, service, spe);
>> > > > +		WARN_ON(r);
>> > > > +	}
>> > > > +}
>> > > > +
>> > > > +static inline bool kvm_arch_arm_spe_v1_get_input_level(int
>> > > vintid)
>> > > > +{
>> > > > +	struct kvm_vcpu *vcpu = kvm_arm_get_running_vcpu();
>> > > > +	struct kvm_spe *spe = &vcpu->arch.spe;
>> > > > +
>> > > > +	return spe->irq_level;
>> > > > +}
>> > >
>> > > This isn't what such a callback is for. It is supposed to sample the
>> > > HW, and nothing else.
>> > >
>> > > > +
>> > > >  static int kvm_arm_spe_v1_init(struct kvm_vcpu *vcpu)
>> > > >  {
>> > > >  	if (!kvm_arm_support_spe_v1())
>> > > > @@ -48,6 +110,7 @@ static int kvm_arm_spe_v1_init(struct 
>> kvm_vcpu
>> > > *vcpu)
>> > > >
>> > > >  	if (irqchip_in_kernel(vcpu->kvm)) {
>> > > >  		int ret;
>> > > > +		struct arm_spe_kvm_info *info;
>> > > >
>> > > >  		/*
>> > > >  		 * If using the SPE with an in-kernel virtual GIC
>> > > > @@ -57,10 +120,18 @@ static int kvm_arm_spe_v1_init(struct
>> > > kvm_vcpu *vcpu)
>> > > >  		if (!vgic_initialized(vcpu->kvm))
>> > > >  			return -ENODEV;
>> > > >
>> > > > +		info = arm_spe_get_kvm_info();
>> > > > +		if (!info->physical_irq)
>> > > > +			return -ENODEV;
>> > > > +
>> > > >  		ret = kvm_vgic_set_owner(vcpu, vcpu->arch.spe.irq_num,
>> > > >  					 &vcpu->arch.spe);
>> > > >  		if (ret)
>> > > >  			return ret;
>> > > > +
>> > > > +		ret = kvm_vgic_map_phys_irq(vcpu, info->physical_irq,
>> > > > +					    vcpu->arch.spe.irq_num,
>> > > > +					    kvm_arch_arm_spe_v1_get_input_level);
>> > >
>> > > You're mapping the interrupt into the guest, and yet you have never
>> > > forwarded the interrupt in the first place. All this flow is only going
>> > > to wreck the host driver as soon as an interrupt occurs.
>> > >
>> > > I think you should rethink the interrupt handling altogether. It would
>> > > make more sense if the interrupt was actually completely
>> > > virtualized. If you can isolate the guest state and compute the
>> > > interrupt state in SW (and from the above, it seems that you can),
>> > > then you shouldn't mess with the whole forwarding *at all*, as it
>> > > isn't designed for devices shared between host and guests.
>> >
>> > Yes it's possible to read SYS_PMBSR_EL1_S_SHIFT and determine if SPE wants
>> > service. If I understand correctly, you're suggesting on entry/exit to the
>> > guest we determine this and inject an interrupt to the guest. As well as
>> > removing the kvm_vgic_map_phys_irq mapping to the physical interrupt?
>>
>> The mapping only makes sense for devices that have their interrupt
>> forwarded to a vcpu, where the expected flow is that the interrupt
>> is taken on the host with a normal interrupt handler and then
>> injected in the guest (you still have to manage the active state
>> though). The basic assumption is that such a device is entirely
>> owned by KVM.
>
> Though the mapping does mean that if the guest handles the guest SPE
> interrupt it doesn't have to wait for a guest exit before having the
> SPE interrupt evaluated again (i.e. another SPE interrupt won't cause
> a guest exit) - thus increasing the size of any black hole.

Sure. It still remains that your use case is outside of the scope of
this internal API.

>> Here, you're abusing the mapping interface: you don't have an
>> interrupt handler (the host SPE driver owns it), the interrupt
>> isn't forwarded, and yet you're messing with the active state.
>> None of that is expected, and you are in uncharted territory
>> as far as KVM is concerned.
>>
>> What bothers me the most is that this looks a lot like a previous
>> implementation of the timers, and we had all the problems in the
>> world to keep track of the interrupt state *and* have a reasonable
>> level of performance (hitting the redistributor on the fast path
>> is a performance killer).
>>
>> > My understanding was that I needed knowledge of the physical SPE interrupt
>> > number so that I could prevent the host SPE driver from getting spurious
>> > interrupts due to guest use of the SPE.
>>
>> You can't completely rule out the host getting interrupted. Even if you set
>> PMBSR_EL1.S to zero, there is no guarantee that the host will not observe
>> the interrupt anyway (the GIC architecture doesn't tell you how quickly
>> it will be retired, if ever). The host driver already checks for this
>> anyway.
>>
>> What you need to ensure is that PMBSR_EL1.S being set on guest entry
>> doesn't immediately kick you out of the guest and prevent forward
>> progress. This is why you need to manage the active state.
>>
>> The real question is: how quickly do you want to react to a SPE
>> interrupt firing while in a guest?
>>
>> If you want to take it into account as soon as it fires, then you need
>> to eagerly save/restore the active state together with the SPE state on
>> each entry/exit, and performance will suffer. This is what you are
>> currently doing.
>>
>> If you're OK with evaluating the interrupt status on exit, but without
>> the interrupt itself causing an exit, then you can simply manage it
>> as a purely virtual interrupt, and just deal with the active state
>> in load/put (set the interrupt as active on load, clear it on put).
>
> This does feel like the pragmatic approach - a larger black hole in 
> exchange
> for performance. I imagine the blackhole would be naturally reduced 
> on
> machines with high workloads.

Why? I don't see the relation between how busy the vcpu is and the size
of the blackhole. It is strictly a function of the frequency of exits.

         M.

>
> I'll refine the series to take this approach.
>
>>
>> Given that SPE interrupts always indicate that profiling has stopped,
>
> and faults :|
>
> Thanks,
>
> Andrew Murray
>
>> this only affects the size of the black hole, and I'm inclined to do
>> the latter.
>>
>>         M.
>> --
>> Jazz is not dead. It just smells funny...

-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v2 14/18] KVM: arm64: spe: Provide guest virtual interrupts for SPE
  2019-12-24 13:22           ` Marc Zyngier
@ 2019-12-24 13:36             ` Andrew Murray
  2019-12-24 13:46               ` Marc Zyngier
  0 siblings, 1 reply; 78+ messages in thread
From: Andrew Murray @ 2019-12-24 13:36 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Marc Zyngier, Catalin Marinas, Will Deacon, kvm, linux-kernel,
	Sudeep Holla, kvmarm, linux-arm-kernel

On Tue, Dec 24, 2019 at 01:22:46PM +0000, Marc Zyngier wrote:
> On 2019-12-24 13:08, Andrew Murray wrote:
> > On Tue, Dec 24, 2019 at 12:42:02PM +0000, Marc Zyngier wrote:
> > > On 2019-12-24 11:50, Andrew Murray wrote:
> > > > On Sun, Dec 22, 2019 at 12:07:50PM +0000, Marc Zyngier wrote:
> > > > > On Fri, 20 Dec 2019 14:30:21 +0000,
> > > > > Andrew Murray <andrew.murray@arm.com> wrote:
> > > > > >
> > > > > > > Upon the exit of a guest, let's determine if the SPE device has generated
> > > > > > > an interrupt - if so we'll inject a virtual interrupt to the guest.
> > > > > > >
> > > > > > > Upon the entry and exit of a guest we'll also update the state of the
> > > > > > > physical IRQ such that it is active when a guest interrupt is pending
> > > > > > > and the guest is running.
> > > > > > >
> > > > > > > Finally we map the physical IRQ to the virtual IRQ such that the guest
> > > > > > > can deactivate the interrupt when it handles the interrupt.
> > > > > >
> > > > > > Signed-off-by: Andrew Murray <andrew.murray@arm.com>
> > > > > > ---
> > > > > >  include/kvm/arm_spe.h |  6 ++++
> > > > > >  virt/kvm/arm/arm.c    |  5 ++-
> > > > > > >  virt/kvm/arm/spe.c    | 71 +++++++++++++++++++++++++++++++++++++++++++
> > > > > >  3 files changed, 81 insertions(+), 1 deletion(-)
> > > > > >
> > > > > > diff --git a/include/kvm/arm_spe.h b/include/kvm/arm_spe.h
> > > > > > index 9c65130d726d..91b2214f543a 100644
> > > > > > --- a/include/kvm/arm_spe.h
> > > > > > +++ b/include/kvm/arm_spe.h
> > > > > > @@ -37,6 +37,9 @@ static inline bool
> > > kvm_arm_support_spe_v1(void)
> > > > > >  						      ID_AA64DFR0_PMSVER_SHIFT);
> > > > > >  }
> > > > > >
> > > > > > +void kvm_spe_flush_hwstate(struct kvm_vcpu *vcpu);
> > > > > > +inline void kvm_spe_sync_hwstate(struct kvm_vcpu *vcpu);
> > > > > > +
> > > > > >  int kvm_arm_spe_v1_set_attr(struct kvm_vcpu *vcpu,
> > > > > >  			    struct kvm_device_attr *attr);
> > > > > >  int kvm_arm_spe_v1_get_attr(struct kvm_vcpu *vcpu,
> > > > > > @@ -49,6 +52,9 @@ int kvm_arm_spe_v1_enable(struct kvm_vcpu
> > > > > *vcpu);
> > > > > >  #define kvm_arm_support_spe_v1()	(false)
> > > > > >  #define kvm_arm_spe_irq_initialized(v)	(false)
> > > > > >
> > > > > > > +static inline void kvm_spe_flush_hwstate(struct kvm_vcpu *vcpu) {}
> > > > > > > +static inline void kvm_spe_sync_hwstate(struct kvm_vcpu *vcpu) {}
> > > > > > > +
> > > > > > >  static inline int kvm_arm_spe_v1_set_attr(struct kvm_vcpu *vcpu,
> > > > > >  					  struct kvm_device_attr *attr)
> > > > > >  {
> > > > > > diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> > > > > > index 340d2388ee2c..a66085c8e785 100644
> > > > > > --- a/virt/kvm/arm/arm.c
> > > > > > +++ b/virt/kvm/arm/arm.c
> > > > > > @@ -741,6 +741,7 @@ int kvm_arch_vcpu_ioctl_run(struct
> > > kvm_vcpu
> > > > > *vcpu, struct kvm_run *run)
> > > > > >  		preempt_disable();
> > > > > >
> > > > > >  		kvm_pmu_flush_hwstate(vcpu);
> > > > > > +		kvm_spe_flush_hwstate(vcpu);
> > > > > >
> > > > > >  		local_irq_disable();
> > > > > >
> > > > > > @@ -782,6 +783,7 @@ int kvm_arch_vcpu_ioctl_run(struct
> > > kvm_vcpu
> > > > > *vcpu, struct kvm_run *run)
> > > > > >  		    kvm_request_pending(vcpu)) {
> > > > > >  			vcpu->mode = OUTSIDE_GUEST_MODE;
> > > > > >  			isb(); /* Ensure work in x_flush_hwstate is committed */
> > > > > > +			kvm_spe_sync_hwstate(vcpu);
> > > > > >  			kvm_pmu_sync_hwstate(vcpu);
> > > > > >  			if (static_branch_unlikely(&userspace_irqchip_in_use))
> > > > > >  				kvm_timer_sync_hwstate(vcpu);
> > > > > > @@ -816,11 +818,12 @@ int kvm_arch_vcpu_ioctl_run(struct
> > > kvm_vcpu
> > > > > *vcpu, struct kvm_run *run)
> > > > > >  		kvm_arm_clear_debug(vcpu);
> > > > > >
> > > > > >  		/*
> > > > > > -		 * We must sync the PMU state before the vgic state so
> > > > > > > +		 * We must sync the PMU and SPE state before the vgic state so
> > > > > > >  		 * that the vgic can properly sample the updated state of the
> > > > > >  		 * interrupt line.
> > > > > >  		 */
> > > > > >  		kvm_pmu_sync_hwstate(vcpu);
> > > > > > +		kvm_spe_sync_hwstate(vcpu);
> > > > >
> > > > > > The *HUGE* difference is that the PMU is purely a virtual interrupt,
> > > > > > while you're trying to deal with a HW interrupt here.
> > > > >
> > > > > >
> > > > > >  		/*
> > > > > > >  		 * Sync the vgic state before syncing the timer state because
> > > > > > diff --git a/virt/kvm/arm/spe.c b/virt/kvm/arm/spe.c
> > > > > > index 83ac2cce2cc3..097ed39014e4 100644
> > > > > > --- a/virt/kvm/arm/spe.c
> > > > > > +++ b/virt/kvm/arm/spe.c
> > > > > > @@ -35,6 +35,68 @@ int kvm_arm_spe_v1_enable(struct kvm_vcpu
> > > > > *vcpu)
> > > > > >  	return 0;
> > > > > >  }
> > > > > >
> > > > > > > +static inline void set_spe_irq_phys_active(struct arm_spe_kvm_info *info,
> > > > > > +					   bool active)
> > > > > > +{
> > > > > > +	int r;
> > > > > > > +	r = irq_set_irqchip_state(info->physical_irq, IRQCHIP_STATE_ACTIVE,
> > > > > > +				  active);
> > > > > > +	WARN_ON(r);
> > > > > > +}
> > > > > > +
> > > > > > +void kvm_spe_flush_hwstate(struct kvm_vcpu *vcpu)
> > > > > > +{
> > > > > > +	struct kvm_spe *spe = &vcpu->arch.spe;
> > > > > > +	bool phys_active = false;
> > > > > > +	struct arm_spe_kvm_info *info = arm_spe_get_kvm_info();
> > > > > > +
> > > > > > +	if (!kvm_arm_spe_v1_ready(vcpu))
> > > > > > +		return;
> > > > > > +
> > > > > > +	if (irqchip_in_kernel(vcpu->kvm))
> > > > > > +		phys_active = kvm_vgic_map_is_active(vcpu, spe->irq_num);
> > > > > > +
> > > > > > +	phys_active |= spe->irq_level;
> > > > > > +
> > > > > > +	set_spe_irq_phys_active(info, phys_active);
> > > > >
> > > > > > So you're happy to mess with the HW interrupt state even when you
> > > > > > don't have a HW irqchip? If you are going to copy paste the timer code
> > > > > > here, you'd need to support it all the way (no, don't).
> > > > >
> > > > > > +}
> > > > > > +
> > > > > > +void kvm_spe_sync_hwstate(struct kvm_vcpu *vcpu)
> > > > > > +{
> > > > > > +	struct kvm_spe *spe = &vcpu->arch.spe;
> > > > > > +	u64 pmbsr;
> > > > > > +	int r;
> > > > > > +	bool service;
> > > > > > +	struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
> > > > > > +	struct arm_spe_kvm_info *info = arm_spe_get_kvm_info();
> > > > > > +
> > > > > > +	if (!kvm_arm_spe_v1_ready(vcpu))
> > > > > > +		return;
> > > > > > +
> > > > > > +	set_spe_irq_phys_active(info, false);
> > > > > > +
> > > > > > +	pmbsr = ctxt->sys_regs[PMBSR_EL1];
> > > > > > +	service = !!(pmbsr & BIT(SYS_PMBSR_EL1_S_SHIFT));
> > > > > > +	if (spe->irq_level == service)
> > > > > > +		return;
> > > > > > +
> > > > > > +	spe->irq_level = service;
> > > > > > +
> > > > > > +	if (likely(irqchip_in_kernel(vcpu->kvm))) {
> > > > > > +		r = kvm_vgic_inject_irq(vcpu->kvm, vcpu->vcpu_id,
> > > > > > +					spe->irq_num, service, spe);
> > > > > > +		WARN_ON(r);
> > > > > > +	}
> > > > > > +}
> > > > > > +
> > > > > > > +static inline bool kvm_arch_arm_spe_v1_get_input_level(int vintid)
> > > > > > +{
> > > > > > +	struct kvm_vcpu *vcpu = kvm_arm_get_running_vcpu();
> > > > > > +	struct kvm_spe *spe = &vcpu->arch.spe;
> > > > > > +
> > > > > > +	return spe->irq_level;
> > > > > > +}
> > > > >
> > > > > > This isn't what such a callback is for. It is supposed to sample the
> > > > > > HW, and nothing else.
> > > > >
> > > > > > +
> > > > > >  static int kvm_arm_spe_v1_init(struct kvm_vcpu *vcpu)
> > > > > >  {
> > > > > >  	if (!kvm_arm_support_spe_v1())
> > > > > > @@ -48,6 +110,7 @@ static int kvm_arm_spe_v1_init(struct
> > > kvm_vcpu
> > > > > *vcpu)
> > > > > >
> > > > > >  	if (irqchip_in_kernel(vcpu->kvm)) {
> > > > > >  		int ret;
> > > > > > +		struct arm_spe_kvm_info *info;
> > > > > >
> > > > > >  		/*
> > > > > >  		 * If using the SPE with an in-kernel virtual GIC
> > > > > > @@ -57,10 +120,18 @@ static int kvm_arm_spe_v1_init(struct
> > > > > kvm_vcpu *vcpu)
> > > > > >  		if (!vgic_initialized(vcpu->kvm))
> > > > > >  			return -ENODEV;
> > > > > >
> > > > > > +		info = arm_spe_get_kvm_info();
> > > > > > +		if (!info->physical_irq)
> > > > > > +			return -ENODEV;
> > > > > > +
> > > > > >  		ret = kvm_vgic_set_owner(vcpu, vcpu->arch.spe.irq_num,
> > > > > >  					 &vcpu->arch.spe);
> > > > > >  		if (ret)
> > > > > >  			return ret;
> > > > > > +
> > > > > > +		ret = kvm_vgic_map_phys_irq(vcpu, info->physical_irq,
> > > > > > +					    vcpu->arch.spe.irq_num,
> > > > > > +					    kvm_arch_arm_spe_v1_get_input_level);
> > > > >
> > > > > > You're mapping the interrupt into the guest, and yet you have never
> > > > > > forwarded the interrupt in the first place. All this flow is only going
> > > > > > to wreck the host driver as soon as an interrupt occurs.
> > > > > >
> > > > > > I think you should rethink the interrupt handling altogether. It would
> > > > > > make more sense if the interrupt was actually completely
> > > > > > virtualized. If you can isolate the guest state and compute the
> > > > > > interrupt state in SW (and from the above, it seems that you can),
> > > > > > then you shouldn't mess with the whole forwarding *at all*, as it
> > > > > > isn't designed for devices shared between host and guests.
> > > >
> > > > > Yes it's possible to read SYS_PMBSR_EL1_S_SHIFT and determine if SPE
> > > > > wants service. If I understand correctly, you're suggesting on entry/exit
> > > > > to the guest we determine this and inject an interrupt to the guest. As
> > > > > well as removing the kvm_vgic_map_phys_irq mapping to the physical
> > > > > interrupt?
> > > 
> > > The mapping only makes sense for devices that have their interrupt
> > > forwarded to a vcpu, where the expected flow is that the interrupt
> > > is taken on the host with a normal interrupt handler and then
> > > injected in the guest (you still have to manage the active state
> > > though). The basic assumption is that such a device is entirely
> > > owned by KVM.
> > 
> > Though the mapping does mean that if the guest handles the guest SPE
> > interrupt it doesn't have to wait for a guest exit before having the
> > SPE interrupt evaluated again (i.e. another SPE interrupt won't cause
> > a guest exit) - thus increasing the size of any black hole.
> 
> Sure. It still remains that your use case is outside of the scope of
> this internal API.
> 
> > > Here, you're abusing the mapping interface: you don't have an
> > > interrupt handler (the host SPE driver owns it), the interrupt
> > > isn't forwarded, and yet you're messing with the active state.
> > > None of that is expected, and you are in uncharted territory
> > > as far as KVM is concerned.
> > > 
> > > What bothers me the most is that this looks a lot like a previous
> > > implementation of the timers, and we had all the problems in the
> > > world to keep track of the interrupt state *and* have a reasonable
> > > level of performance (hitting the redistributor on the fast path
> > > is a performance killer).
> > > 
> > > > > My understanding was that I needed knowledge of the physical SPE
> > > > > interrupt number so that I could prevent the host SPE driver from
> > > > > getting spurious interrupts due to guest use of the SPE.
> > > > 
> > > > You can't completely rule out the host getting interrupted. Even if
> > > > you set PMBSR_EL1.S to zero, there is no guarantee that the host will
> > > > not observe the interrupt anyway (the GIC architecture doesn't tell
> > > > you how quickly it will be retired, if ever). The host driver already
> > > > checks for this anyway.
> > > 
> > > What you need to ensure is that PMBSR_EL1.S being set on guest entry
> > > doesn't immediately kick you out of the guest and prevent forward
> > > progress. This is why you need to manage the active state.
> > > 
> > > The real question is: how quickly do you want to react to a SPE
> > > interrupt firing while in a guest?
> > > 
> > > > If you want to take it into account as soon as it fires, then you
> > > > need to eagerly save/restore the active state together with the SPE
> > > > state on each entry/exit, and performance will suffer. This is what
> > > > you are currently doing.
> > > > 
> > > > If you're OK with evaluating the interrupt status on exit, but
> > > > without the interrupt itself causing an exit, then you can simply
> > > > manage it as a purely virtual interrupt, and just deal with the
> > > > active state in load/put (set the interrupt as active on load, clear
> > > > it on put).
> > > 
> > > This does feel like the pragmatic approach - a larger black hole in
> > > exchange for performance. I imagine the blackhole would be naturally
> > > reduced on machines with high workloads.
> 
> Why? I don't see the relation between how busy the vcpu is and the size
> of the blackhole. It is strictly a function of the frequency of exits.

Indeed, my assumption was that the busier a system is, the more
interrupts it takes, leading to more exits, more frequent SPE interrupt
evaluation, and thus a smaller black hole.

Thanks,

Andrew Murray

> 
>         M.
> 
> > 
> > I'll refine the series to take this approach.
> > 
> > > 
> > > > Given that SPE interrupts always indicate that profiling has stopped,
> > 
> > and faults :|
> > 
> > Thanks,
> > 
> > Andrew Murray
> > 
> > > this only affects the size of the black hole, and I'm inclined to do
> > > the latter.
> > > 
> > >         M.
> > > --
> > > Jazz is not dead. It just smells funny...
> 
> -- 
> Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v2 14/18] KVM: arm64: spe: Provide guest virtual interrupts for SPE
  2019-12-24 13:36             ` Andrew Murray
@ 2019-12-24 13:46               ` Marc Zyngier
  0 siblings, 0 replies; 78+ messages in thread
From: Marc Zyngier @ 2019-12-24 13:46 UTC (permalink / raw)
  To: Andrew Murray
  Cc: Marc Zyngier, Catalin Marinas, Will Deacon, kvm, linux-kernel,
	Sudeep Holla, kvmarm, linux-arm-kernel

On 2019-12-24 13:36, Andrew Murray wrote:
> On Tue, Dec 24, 2019 at 01:22:46PM +0000, Marc Zyngier wrote:
>> On 2019-12-24 13:08, Andrew Murray wrote:

[...]

>> > This does feel like the pragmatic approach - a larger black hole in
>> > exchange for performance. I imagine the blackhole would be naturally
>> > reduced on machines with high workloads.
>>
>> Why? I don't see the relation between how busy the vcpu is and the size
>> of the blackhole. It is strictly a function of the frequency of exits.
>
> Indeed, my assumption being that the busier a system is the more
> interrupts, thus leading to more exits and so an increased frequency of
> SPE interrupt evaluation and thus smaller black hole.

On a GICv4-enabled system, this isn't true anymore. My bet is that
people won't use SPE to optimize IO-oriented workloads, but more CPU
intensive workloads (that don't necessarily exit at all).

But never mind. Let's start with this approach, as it is simple and easy
to verify. If the black hole aspect becomes problematic, we know how
to reduce it (at the expense of entry/exit performance).
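
For readers following along, the "purely virtual interrupt" approach
agreed on here could be modelled roughly as below. This is a toy,
user-space sketch and not kernel code: every type and helper (toy_vcpu,
toy_spe_load, toy_spe_put, toy_spe_sync) is hypothetical, and the
position of PMBSR_EL1.S is taken as bit 17 as an assumption. The
physical interrupt is only marked active/inactive at load/put, and the
virtual level is recomputed in software on each exit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: none of these names are KVM or vgic interfaces. */
#define TOY_PMBSR_S (UINT64_C(1) << 17) /* assumed bit position of PMBSR_EL1.S */

struct toy_vcpu {
	uint64_t guest_pmbsr;    /* saved guest copy of PMBSR_EL1 */
	bool virq_level;         /* level last reported to the virtual GIC */
	bool phys_irq_active;    /* modelled active state of the physical IRQ */
	unsigned int injections; /* number of level changes injected */
};

/* Load: mark the physical IRQ active so it cannot kick us out of the guest. */
static void toy_spe_load(struct toy_vcpu *v)
{
	assert(v);
	v->phys_irq_active = true;
}

/* Put: hand the physical IRQ back to the host driver. */
static void toy_spe_put(struct toy_vcpu *v)
{
	assert(v);
	v->phys_irq_active = false;
}

/* Exit: evaluate the interrupt level from the saved guest state only. */
static void toy_spe_sync(struct toy_vcpu *v)
{
	bool service = (v->guest_pmbsr & TOY_PMBSR_S) != 0;

	if (service == v->virq_level)
		return; /* no level change, nothing to inject */

	v->virq_level = service;
	v->injections++;
}
```

Note that nothing here touches the interrupt controller between load
and put, which is where the entry/exit cost saving comes from; the
price is that a pending SPE interrupt is only noticed at the next exit.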

         M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v2 08/18] arm64: KVM: add support to save/restore SPE profiling buffer controls
  2019-12-24 10:49     ` Andrew Murray
@ 2019-12-24 15:17       ` Andrew Murray
  2019-12-24 15:48         ` Marc Zyngier
  0 siblings, 1 reply; 78+ messages in thread
From: Andrew Murray @ 2019-12-24 15:17 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvm, Catalin Marinas, linux-kernel, Sudeep Holla, Will Deacon,
	kvmarm, linux-arm-kernel

On Tue, Dec 24, 2019 at 10:49:30AM +0000, Andrew Murray wrote:
> On Sat, Dec 21, 2019 at 01:57:55PM +0000, Marc Zyngier wrote:
> > On Fri, 20 Dec 2019 14:30:15 +0000
> > Andrew Murray <andrew.murray@arm.com> wrote:
> > 
> > > From: Sudeep Holla <sudeep.holla@arm.com>
> > > 
> > > Currently since we don't support profiling using SPE in the guests,
> > > we just save the PMSCR_EL1, flush the profiling buffers and disable
> > > sampling. However in order to support simultaneous sampling both in
> > 
> > Is the sampling actually simultaneous? I don't believe so (the whole
> > series would be much simpler if it was).
> 
> No, the SPE is used by either the guest or host at any one time. I guess
> the term simultaneous was used to refer to the illusion given to both guest
> and host that they are able to use it whenever they like. I'll update
> the commit message to drop the magic.
>  
> 
> > 
> > > the host and guests, we need to save and reatore the complete SPE
> > 
> > s/reatore/restore/
> 
> Noted.
> 
> 
> > 
> > > profiling buffer controls' context.
> > > 
> > > Let's add the support for the same and keep it disabled for now.
> > > We can enable it conditionally only if guests are allowed to use
> > > SPE.
> > > 
> > > Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
> > > [ Clear PMBSR bit when saving state to prevent spurious interrupts ]
> > > Signed-off-by: Andrew Murray <andrew.murray@arm.com>
> > > ---
> > >  arch/arm64/kvm/hyp/debug-sr.c | 51 +++++++++++++++++++++++++++++------
> > >  1 file changed, 43 insertions(+), 8 deletions(-)
> > > 
> > > diff --git a/arch/arm64/kvm/hyp/debug-sr.c b/arch/arm64/kvm/hyp/debug-sr.c
> > > index 8a70a493345e..12429b212a3a 100644
> > > --- a/arch/arm64/kvm/hyp/debug-sr.c
> > > +++ b/arch/arm64/kvm/hyp/debug-sr.c
> > > @@ -85,7 +85,8 @@
> > >  	default:	write_debug(ptr[0], reg, 0);			\
> > >  	}
> > >  
> > > -static void __hyp_text __debug_save_spe_nvhe(struct kvm_cpu_context *ctxt)
> > > +static void __hyp_text
> > > +__debug_save_spe_nvhe(struct kvm_cpu_context *ctxt, bool full_ctxt)
> > 
> > nit: don't split lines like this if you can avoid it. You can put the
> > full_ctxt parameter on a separate line instead.
> 
> Yes understood.
> 
> 
> > 
> > >  {
> > >  	u64 reg;
> > >  
> > > @@ -102,22 +103,46 @@ static void __hyp_text __debug_save_spe_nvhe(struct kvm_cpu_context *ctxt)
> > >  	if (reg & BIT(SYS_PMBIDR_EL1_P_SHIFT))
> > >  		return;
> > >  
> > > -	/* No; is the host actually using the thing? */
> > > -	reg = read_sysreg_s(SYS_PMBLIMITR_EL1);
> > > -	if (!(reg & BIT(SYS_PMBLIMITR_EL1_E_SHIFT)))
> > > +	/* Save the control register and disable data generation */
> > > +	ctxt->sys_regs[PMSCR_EL1] = read_sysreg_el1(SYS_PMSCR);
> > > +
> > > +	if (!ctxt->sys_regs[PMSCR_EL1])
> > 
> > Shouldn't you check the enable bits instead of relying on the whole
> > thing being zero?
> 
> Yes that would make more sense (E1SPE and E0SPE).
> 
> I feel that this check makes an assumption about the guest/host SPE
> driver... What happens if the SPE driver writes to some SPE registers
> but doesn't enable PMSCR? If the guest is also using SPE then those
> writes will be lost, when the host returns and the SPE driver enables
> SPE it won't work.
> 
> With a quick look at the SPE driver I'm not sure this will happen, but
> even so it makes me nervous relying on these assumptions. I wonder if
> this risk is present in other devices?

In fact, this may be a good reason to trap the SPE registers - this would
allow you to conditionally save/restore based on a dirty bit. It would
also allow you to re-evaluate the SPE interrupt (for example when the guest
clears the status register) and thus potentially reduce any black hole.
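
To make the dirty-bit idea concrete, here is a minimal user-space
sketch. It is only an illustration of the suggestion above, not a
proposal for the series: the register enum, toy_hw array and the
toy_trap_write/toy_restore_guest helpers are all made-up names, and a
plain array stands in for the real system registers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of SPE registers, for illustration only. */
enum toy_spe_reg { TOY_PMSCR, TOY_PMBLIMITR, TOY_PMBPTR, TOY_NR_SPE_REGS };

struct toy_spe_ctx {
	uint64_t regs[TOY_NR_SPE_REGS];
	bool dirty; /* set once the guest has written any SPE register */
};

static uint64_t toy_hw[TOY_NR_SPE_REGS]; /* stand-in for the hardware registers */

/* Trap path: emulate the guest write and remember that state now exists. */
static void toy_trap_write(struct toy_spe_ctx *g, enum toy_spe_reg r, uint64_t val)
{
	assert(r < TOY_NR_SPE_REGS);
	g->regs[r] = val;
	g->dirty = true;
}

/* Entry path: returns how many registers were written to "hardware". */
static unsigned int toy_restore_guest(const struct toy_spe_ctx *g)
{
	unsigned int i;

	if (!g->dirty)
		return 0; /* guest never touched SPE, skip the restore */

	for (i = 0; i < TOY_NR_SPE_REGS; i++)
		toy_hw[i] = g->regs[i];

	return TOY_NR_SPE_REGS;
}
```

The saving only materialises for guests that never touch SPE; a guest
that does use it pays for a trap on every register access.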

Thanks,

Andrew Murray

> 
> 
> > 
> > >  		return;
> > >  
> > >  	/* Yes; save the control register and disable data generation */
> > > -	ctxt->sys_regs[PMSCR_EL1] = read_sysreg_el1(SYS_PMSCR);
> > 
> > You've already saved the control register...
> 
> I'll remove that.
> 
> 
> > 
> > >  	write_sysreg_el1(0, SYS_PMSCR);
> > >  	isb();
> > >  
> > >  	/* Now drain all buffered data to memory */
> > >  	psb_csync();
> > >  	dsb(nsh);
> > > +
> > > +	if (!full_ctxt)
> > > +		return;
> > > +
> > > +	ctxt->sys_regs[PMBLIMITR_EL1] = read_sysreg_s(SYS_PMBLIMITR_EL1);
> > > +	write_sysreg_s(0, SYS_PMBLIMITR_EL1);
> > > +
> > > +	/*
> > > +	 * As PMBSR is conditionally restored when returning to the host we
> > > +	 * must ensure the service bit is unset here to prevent a spurious
> > > +	 * host SPE interrupt from being raised.
> > > +	 */
> > > +	ctxt->sys_regs[PMBSR_EL1] = read_sysreg_s(SYS_PMBSR_EL1);
> > > +	write_sysreg_s(0, SYS_PMBSR_EL1);
> > > +
> > > +	isb();
> > > +
> > > +	ctxt->sys_regs[PMSICR_EL1] = read_sysreg_s(SYS_PMSICR_EL1);
> > > +	ctxt->sys_regs[PMSIRR_EL1] = read_sysreg_s(SYS_PMSIRR_EL1);
> > > +	ctxt->sys_regs[PMSFCR_EL1] = read_sysreg_s(SYS_PMSFCR_EL1);
> > > +	ctxt->sys_regs[PMSEVFR_EL1] = read_sysreg_s(SYS_PMSEVFR_EL1);
> > > +	ctxt->sys_regs[PMSLATFR_EL1] = read_sysreg_s(SYS_PMSLATFR_EL1);
> > > +	ctxt->sys_regs[PMBPTR_EL1] = read_sysreg_s(SYS_PMBPTR_EL1);
> > >  }
> > >  
> > > -static void __hyp_text __debug_restore_spe_nvhe(struct kvm_cpu_context *ctxt)
> > > +static void __hyp_text
> > > +__debug_restore_spe_nvhe(struct kvm_cpu_context *ctxt, bool full_ctxt)
> > >  {
> > >  	if (!ctxt->sys_regs[PMSCR_EL1])
> > >  		return;
> > > @@ -126,6 +151,16 @@ static void __hyp_text __debug_restore_spe_nvhe(struct kvm_cpu_context *ctxt)
> > >  	isb();
> > >  
> > >  	/* Re-enable data generation */
> > > +	if (full_ctxt) {
> > > +		write_sysreg_s(ctxt->sys_regs[PMBPTR_EL1], SYS_PMBPTR_EL1);
> > > +		write_sysreg_s(ctxt->sys_regs[PMBLIMITR_EL1], SYS_PMBLIMITR_EL1);
> > > +		write_sysreg_s(ctxt->sys_regs[PMSFCR_EL1], SYS_PMSFCR_EL1);
> > > +		write_sysreg_s(ctxt->sys_regs[PMSEVFR_EL1], SYS_PMSEVFR_EL1);
> > > +		write_sysreg_s(ctxt->sys_regs[PMSLATFR_EL1], SYS_PMSLATFR_EL1);
> > > +		write_sysreg_s(ctxt->sys_regs[PMSIRR_EL1], SYS_PMSIRR_EL1);
> > > +		write_sysreg_s(ctxt->sys_regs[PMSICR_EL1], SYS_PMSICR_EL1);
> > > +		write_sysreg_s(ctxt->sys_regs[PMBSR_EL1], SYS_PMBSR_EL1);
> > > +	}
> > >  	write_sysreg_el1(ctxt->sys_regs[PMSCR_EL1], SYS_PMSCR);
> > >  }
> > >  
> > > @@ -198,7 +233,7 @@ void __hyp_text __debug_restore_host_context(struct kvm_vcpu *vcpu)
> > >  	guest_ctxt = &vcpu->arch.ctxt;
> > >  
> > >  	if (!has_vhe())
> > > -		__debug_restore_spe_nvhe(host_ctxt);
> > > +		__debug_restore_spe_nvhe(host_ctxt, false);
> > >  
> > >  	if (!(vcpu->arch.flags & KVM_ARM64_DEBUG_DIRTY))
> > >  		return;
> > > @@ -222,7 +257,7 @@ void __hyp_text __debug_save_host_context(struct kvm_vcpu *vcpu)
> > >  
> > >  	host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
> > >  	if (!has_vhe())
> > > -		__debug_save_spe_nvhe(host_ctxt);
> > > +		__debug_save_spe_nvhe(host_ctxt, false);
> > >  }
> > >  
> > >  void __hyp_text __debug_save_guest_context(struct kvm_vcpu *vcpu)
> > 
> > So all of this is for non-VHE. What happens in the VHE case?
> 
> By the end of the series this ends up in __debug_save_host_context which is
> called for both VHE/nVHE - on the re-spin I'll make it not look so confusing.
> 
> Thanks,
> 
> Andrew Murray
> 
> > 
> > 	M.
> > -- 
> > Jazz is not dead. It just smells funny...
> _______________________________________________
> kvmarm mailing list
> kvmarm@lists.cs.columbia.edu
> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v2 08/18] arm64: KVM: add support to save/restore SPE profiling buffer controls
  2019-12-24 15:17       ` Andrew Murray
@ 2019-12-24 15:48         ` Marc Zyngier
  0 siblings, 0 replies; 78+ messages in thread
From: Marc Zyngier @ 2019-12-24 15:48 UTC (permalink / raw)
  To: Andrew Murray
  Cc: kvm, Catalin Marinas, linux-kernel, Sudeep Holla, Will Deacon,
	kvmarm, linux-arm-kernel

On Tue, 24 Dec 2019 15:17:39 +0000,
Andrew Murray <andrew.murray@arm.com> wrote:
> 
> On Tue, Dec 24, 2019 at 10:49:30AM +0000, Andrew Murray wrote:
> > On Sat, Dec 21, 2019 at 01:57:55PM +0000, Marc Zyngier wrote:
> > > On Fri, 20 Dec 2019 14:30:15 +0000
> > > Andrew Murray <andrew.murray@arm.com> wrote:
> > > 
> > > > From: Sudeep Holla <sudeep.holla@arm.com>
> > > > 
> > > > Currently since we don't support profiling using SPE in the guests,
> > > > we just save the PMSCR_EL1, flush the profiling buffers and disable
> > > > sampling. However in order to support simultaneous sampling both in
> > > 
> > > Is the sampling actually simultaneous? I don't believe so (the whole
> > > series would be much simpler if it was).
> > 
> > > No, the SPE is used by either the guest or host at any one time. I guess
> > > the term simultaneous was used to refer to the illusion given to both guest
> > > and host that they are able to use it whenever they like. I'll update
> > > the commit message to drop the magic.
> >  
> > 
> > > 
> > > > the host and guests, we need to save and reatore the complete SPE
> > > 
> > > s/reatore/restore/
> > 
> > Noted.
> > 
> > 
> > > 
> > > > profiling buffer controls' context.
> > > > 
> > > > Let's add the support for the same and keep it disabled for now.
> > > > We can enable it conditionally only if guests are allowed to use
> > > > SPE.
> > > > 
> > > > Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
> > > > [ Clear PMBSR bit when saving state to prevent spurious interrupts ]
> > > > Signed-off-by: Andrew Murray <andrew.murray@arm.com>
> > > > ---
> > > >  arch/arm64/kvm/hyp/debug-sr.c | 51 +++++++++++++++++++++++++++++------
> > > >  1 file changed, 43 insertions(+), 8 deletions(-)
> > > > 
> > > > diff --git a/arch/arm64/kvm/hyp/debug-sr.c b/arch/arm64/kvm/hyp/debug-sr.c
> > > > index 8a70a493345e..12429b212a3a 100644
> > > > --- a/arch/arm64/kvm/hyp/debug-sr.c
> > > > +++ b/arch/arm64/kvm/hyp/debug-sr.c
> > > > @@ -85,7 +85,8 @@
> > > >  	default:	write_debug(ptr[0], reg, 0);			\
> > > >  	}
> > > >  
> > > > -static void __hyp_text __debug_save_spe_nvhe(struct kvm_cpu_context *ctxt)
> > > > +static void __hyp_text
> > > > +__debug_save_spe_nvhe(struct kvm_cpu_context *ctxt, bool full_ctxt)
> > > 
> > > nit: don't split lines like this if you can avoid it. You can put the
> > > full_ctxt parameter on a separate line instead.
> > 
> > Yes understood.
> > 
> > 
> > > 
> > > >  {
> > > >  	u64 reg;
> > > >  
> > > > @@ -102,22 +103,46 @@ static void __hyp_text __debug_save_spe_nvhe(struct kvm_cpu_context *ctxt)
> > > >  	if (reg & BIT(SYS_PMBIDR_EL1_P_SHIFT))
> > > >  		return;
> > > >  
> > > > -	/* No; is the host actually using the thing? */
> > > > -	reg = read_sysreg_s(SYS_PMBLIMITR_EL1);
> > > > -	if (!(reg & BIT(SYS_PMBLIMITR_EL1_E_SHIFT)))
> > > > +	/* Save the control register and disable data generation */
> > > > +	ctxt->sys_regs[PMSCR_EL1] = read_sysreg_el1(SYS_PMSCR);
> > > > +
> > > > +	if (!ctxt->sys_regs[PMSCR_EL1])
> > > 
> > > Shouldn't you check the enable bits instead of relying on the whole
> > > thing being zero?
> > 
> > Yes that would make more sense (E1SPE and E0SPE).
> > 
> > I feel that this check makes an assumption about the guest/host SPE
> > driver... What happens if the SPE driver writes to some SPE registers
> > but doesn't enable PMSCR? If the guest is also using SPE then those
> > writes will be lost, when the host returns and the SPE driver enables
> > SPE it won't work.
> >
> > With a quick look at the SPE driver I'm not sure this will happen, but
> > even so it makes me nervous relying on these assumptions. I wonder if
> > this risk is present in other devices?

As a rule of thumb, you should always save whatever you're about to
overwrite if the registers are not under exclusive control of KVM. No
exception.

So if the guest is willing to use SPE *and* it isn't enabled on
the host, these registers have to be saved on vcpu_load() and restored
on vcpu_put().

If SPE is enabled on the host, then trapping has to be enabled, and no
tracing occurs in the guest for this time slice.
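
As a rough illustration of the rule above, the load/put decision could
be modelled like this. It is a user-space sketch under stated
assumptions, not the actual implementation: the toy_* types and
functions are hypothetical, only two registers are modelled, and the
E0SPE/E1SPE enable bits of PMSCR_EL1 are assumed to be bits 0 and 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model: every name below is hypothetical, not a KVM interface. */
#define TOY_PMSCR_E0SPE (UINT64_C(1) << 0) /* assumed PMSCR_EL1.E0SPE */
#define TOY_PMSCR_E1SPE (UINT64_C(1) << 1) /* assumed PMSCR_EL1.E1SPE */

enum toy_spe_owner { TOY_SPE_OWNER_HOST, TOY_SPE_OWNER_GUEST };

struct toy_spe_hw {
	uint64_t pmscr;
	uint64_t pmbptr;
};

struct toy_spe_vcpu {
	struct toy_spe_hw host_saved; /* host values parked while the guest owns SPE */
	struct toy_spe_hw guest;      /* guest values while the vcpu is not loaded */
	enum toy_spe_owner owner;
};

static bool toy_host_spe_enabled(const struct toy_spe_hw *hw)
{
	return (hw->pmscr & (TOY_PMSCR_E0SPE | TOY_PMSCR_E1SPE)) != 0;
}

/* Load: give SPE to the guest only if the host is not profiling. */
static void toy_vcpu_load(struct toy_spe_vcpu *v, struct toy_spe_hw *hw)
{
	assert(v && hw);
	if (toy_host_spe_enabled(hw)) {
		/* Host keeps SPE; guest accesses would trap for this time slice. */
		v->owner = TOY_SPE_OWNER_HOST;
		return;
	}
	v->host_saved = *hw; /* always save what is about to be overwritten */
	*hw = v->guest;
	v->owner = TOY_SPE_OWNER_GUEST;
}

/* Put: park the guest state and restore whatever the host had. */
static void toy_vcpu_put(struct toy_spe_vcpu *v, struct toy_spe_hw *hw)
{
	assert(v && hw);
	if (v->owner != TOY_SPE_OWNER_GUEST)
		return;
	v->guest = *hw;
	*hw = v->host_saved;
	v->owner = TOY_SPE_OWNER_HOST;
}
```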

> In fact, this may be a good reason to trap the SPE registers - this would
> allow you to conditionally save/restore based on a dirty bit. It would
> also allow you to re-evaluate the SPE interrupt (for example when the guest
> clears the status register) and thus potentially reduce any black hole.

I don't see what trapping buys you in the expected case (where the
guest is tracing and the host isn't). To clear PMBSR_EL1.S, you first
need to know that an interrupt has fired. So this brings you exactly
nothing in this particular case, and just adds overhead for everything
else. The whole point of the architecture is that in the non-contended
case, we can give SPE to the guest and mostly forget about it.

I strongly suggest that you start with the simplest possible,
non-broken implementation. It doesn't matter if the black holes last for
seconds for now. Once you have something that looks reasonable, we can
evaluate how to improve on it by throwing actual HW and workloads at
it.

	M.

-- 
Jazz is not dead, it just smells funny.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v2 02/18] arm64: KVM: reset E2PB correctly in MDCR_EL2 when exiting the guest(VHE)
  2019-12-24 10:29     ` Andrew Murray
@ 2020-01-02 16:21       ` Andrew Murray
  0 siblings, 0 replies; 78+ messages in thread
From: Andrew Murray @ 2020-01-02 16:21 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvm, Catalin Marinas, linux-kernel, Sudeep Holla, will, kvmarm,
	linux-arm-kernel

On Tue, Dec 24, 2019 at 10:29:50AM +0000, Andrew Murray wrote:
> On Sat, Dec 21, 2019 at 01:12:14PM +0000, Marc Zyngier wrote:
> > On Fri, 20 Dec 2019 14:30:09 +0000
> > Andrew Murray <andrew.murray@arm.com> wrote:
> > 
> > > From: Sudeep Holla <sudeep.holla@arm.com>
> > > 
> > > On VHE systems, the reset value for MDCR_EL2.E2PB=b00 which defaults
> > > to profiling buffer using the EL2 stage 1 translations. 
> > 
> > Does the reset value actually matter here? I don't see it being
> > specific to VHE systems, and all we're trying to achieve is to restore
> > the SPE configuration to a state where it can be used by the host.
> > 
> > > However if the
> > > guest are allowed to use profiling buffers changing E2PB settings, we
> > 
> > How can the guest be allowed to change E2PB settings? Or do you mean
> > here that allowing the guest to use SPE will mandate changes of the
> > E2PB settings, and that we'd better restore the hypervisor state once
> > we exit?
> > 
> > > need to ensure we resume back MDCR_EL2.E2PB=b00. Currently we just
> > > do bitwise '&' with MDCR_EL2_E2PB_MASK which will retain the value.
> > > 
> > > So fix it by clearing all the bits in E2PB.
> > > 
> > > Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
> > > Signed-off-by: Andrew Murray <andrew.murray@arm.com>
> > > ---
> > >  arch/arm64/kvm/hyp/switch.c | 4 +---
> > >  1 file changed, 1 insertion(+), 3 deletions(-)
> > > 
> > > diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> > > index 72fbbd86eb5e..250f13910882 100644
> > > --- a/arch/arm64/kvm/hyp/switch.c
> > > +++ b/arch/arm64/kvm/hyp/switch.c
> > > @@ -228,9 +228,7 @@ void deactivate_traps_vhe_put(void)
> > >  {
> > >  	u64 mdcr_el2 = read_sysreg(mdcr_el2);
> > >  
> > > -	mdcr_el2 &= MDCR_EL2_HPMN_MASK |
> > > -		    MDCR_EL2_E2PB_MASK << MDCR_EL2_E2PB_SHIFT |
> > > -		    MDCR_EL2_TPMS;
> > > +	mdcr_el2 &= MDCR_EL2_HPMN_MASK | MDCR_EL2_TPMS;
> > >  
> > >  	write_sysreg(mdcr_el2, mdcr_el2);
> > >  
> > 
> > I'm OK with this change, but I believe the commit message could use
> > some tidying up.
> 
> No problem, I'll update the commit message.

This is my new description:

    arm64: KVM: reset E2PB correctly in MDCR_EL2 when exiting the guest (VHE)
    
    Upon leaving the guest on VHE systems we currently preserve the value of
    MDCR_EL2.E2PB. This field determines if the SPE profiling buffer controls
    are trapped and which translation regime they use.
    
    In order to permit guest access to SPE we may use a different translation
    regime whilst the vCPU is scheduled - therefore let's ensure that upon leaving
    the guest we set E2PB back to the value expected by the host (b00).
    
    For nVHE systems we already explicitly set E2PB back to the expected value
    of 0b11 in __deactivate_traps_nvhe.
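
The effect of dropping E2PB from the preserved bits can be illustrated with a
small standalone sketch. It models the register as a plain integer rather than
using read_sysreg()/write_sysreg(), and the field positions are assumed to
mirror the arm64 headers (HPMN in bits [4:0], E2PB in bits [13:12], TPMS at
bit 14). The helper names are hypothetical and not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed MDCR_EL2 field layout (intended to mirror the arm64 headers). */
#define MDCR_EL2_HPMN_MASK   0x1FULL
#define MDCR_EL2_E2PB_MASK   0x3ULL
#define MDCR_EL2_E2PB_SHIFT  12
#define MDCR_EL2_TPMS        (1ULL << 14)

/* Old behaviour: E2PB is among the preserved bits, so a guest-time value survives. */
static uint64_t mdcr_put_old(uint64_t mdcr_el2)
{
	return mdcr_el2 & (MDCR_EL2_HPMN_MASK |
			   (MDCR_EL2_E2PB_MASK << MDCR_EL2_E2PB_SHIFT) |
			   MDCR_EL2_TPMS);
}

/* New behaviour: E2PB is not preserved, so it falls back to b00. */
static uint64_t mdcr_put_new(uint64_t mdcr_el2)
{
	return mdcr_el2 & (MDCR_EL2_HPMN_MASK | MDCR_EL2_TPMS);
}

/* Extract the E2PB field from a register value. */
static unsigned int mdcr_e2pb(uint64_t mdcr_el2)
{
	return (mdcr_el2 >> MDCR_EL2_E2PB_SHIFT) & MDCR_EL2_E2PB_MASK;
}

/* Sanity check: a guest-time E2PB of b11 survives the old mask only. */
static void mdcr_example(void)
{
	uint64_t guest_time = (0x3ULL << MDCR_EL2_E2PB_SHIFT) |
			      MDCR_EL2_TPMS | 0x6;

	assert(mdcr_e2pb(mdcr_put_old(guest_time)) == 0x3);
	assert(mdcr_e2pb(mdcr_put_new(guest_time)) == 0x0);
}
```

In both variants HPMN and TPMS are carried over unchanged; only the handling
of E2PB differs.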

Thanks,

Andrew Murray

> 
> Thanks,
> 
> Andrew Murray
> 
> > 
> > Thanks,
> > 
> > 	M.
> > -- 
> > Jazz is not dead. It just smells funny...


* Re: [PATCH v2 09/18] arm64: KVM: enable conditional save/restore full SPE profiling buffer controls
  2019-12-21 14:13   ` Marc Zyngier
@ 2020-01-07 15:13     ` Andrew Murray
  2020-01-08 11:17       ` Marc Zyngier
  2020-01-10 10:54     ` Andrew Murray
  1 sibling, 1 reply; 78+ messages in thread
From: Andrew Murray @ 2020-01-07 15:13 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Catalin Marinas, Mark Rutland, will, Sudeep Holla, kvm, kvmarm,
	linux-arm-kernel, linux-kernel

On Sat, Dec 21, 2019 at 02:13:25PM +0000, Marc Zyngier wrote:
> On Fri, 20 Dec 2019 14:30:16 +0000
> Andrew Murray <andrew.murray@arm.com> wrote:
> 
> [somehow managed not to do a reply all, re-sending]
> 
> > From: Sudeep Holla <sudeep.holla@arm.com>
> > 
> > Now that we can save/restore the full SPE controls, we can enable it
> > if SPE is setup and ready to use in KVM. It's supported in KVM only if
> > all the CPUs in the system supports SPE.
> > 
> > However to support heterogenous systems, we need to move the check if
> > host supports SPE and do a partial save/restore.
> 
> No. Let's just not go down that path. For now, KVM on heterogeneous
> systems do not get SPE.

At present these patches only offer the SPE feature to VCPUs where the
sanitised AA64DFR0 register indicates that all CPUs have this support
(kvm_arm_support_spe_v1) at the time of setting the attribute
(KVM_SET_DEVICE_ATTR).

Therefore if a new CPU comes online without SPE support, and an
existing VCPU is scheduled onto it, then bad things happen - which I guess
must have been the intention behind this patch.


> If SPE has been enabled on a guest and a CPU
> comes up without SPE, this CPU should fail to boot (same as exposing a
> feature to userspace).

I'm unclear as to how to prevent this. We can set the FTR_STRICT flag on
the sanitised register - thus tainting the kernel if such a non-SPE CPU
comes online - though that doesn't prevent KVM from blowing up. Though
I don't believe we can prevent a CPU coming up. At the moment this is
my preferred approach.

Looking at the vcpu_load and related code, I don't see a way of saying
'don't schedule this VCPU on this CPU' or bailing in any way.

One solution could be to allow scheduling onto non-SPE CPUs but wrap the
SPE save/restore code in a macro (much like kvm_arm_spe_v1_ready) that
reads the non-sanitised feature register. Therefore we don't go bang, but
we also increase the size of any black-holes in SPE capturing. Though this
feels like something that will cause grief down the line.
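
As a rough illustration of the "read the non-sanitised feature register" idea,
the guard could look something like the sketch below. This is a standalone
model, not kernel code: the raw ID_AA64DFR0_EL1 value is passed in as an
integer, PMSVer is assumed to live in bits [35:32], and this_cpu_has_spe() is
a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed position of the PMSVer field in ID_AA64DFR0_EL1 (bits [35:32]). */
#define ID_AA64DFR0_PMSVER_SHIFT 32

/* Extract an unsigned 4-bit ID register field starting at 'shift'. */
static unsigned int id_field(uint64_t reg, unsigned int shift)
{
	assert(shift <= 60);
	return (reg >> shift) & 0xf;
}

/* Hypothetical guard: true if this CPU's raw ID register reports SPE. */
static bool this_cpu_has_spe(uint64_t raw_id_aa64dfr0)
{
	return id_field(raw_id_aa64dfr0, ID_AA64DFR0_PMSVER_SHIFT) != 0;
}
```

The save/restore paths would then be skipped whenever the guard returns false,
which is what widens the black holes mentioned above.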

Is there something else that can be done?

Thanks,

Andrew Murray

> 
> > 
> > Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
> > Signed-off-by: Andrew Murray <andrew.murray@arm.com>
> > ---
> >  arch/arm64/kvm/hyp/debug-sr.c | 33 ++++++++++++++++-----------------
> >  include/kvm/arm_spe.h         |  6 ++++++
> >  2 files changed, 22 insertions(+), 17 deletions(-)
> > 
> > diff --git a/arch/arm64/kvm/hyp/debug-sr.c b/arch/arm64/kvm/hyp/debug-sr.c
> > index 12429b212a3a..d8d857067e6d 100644
> > --- a/arch/arm64/kvm/hyp/debug-sr.c
> > +++ b/arch/arm64/kvm/hyp/debug-sr.c
> > @@ -86,18 +86,13 @@
> >  	}
> >  
> >  static void __hyp_text
> > -__debug_save_spe_nvhe(struct kvm_cpu_context *ctxt, bool full_ctxt)
> > +__debug_save_spe_context(struct kvm_cpu_context *ctxt, bool full_ctxt)
> >  {
> >  	u64 reg;
> >  
> >  	/* Clear pmscr in case of early return */
> >  	ctxt->sys_regs[PMSCR_EL1] = 0;
> >  
> > -	/* SPE present on this CPU? */
> > -	if (!cpuid_feature_extract_unsigned_field(read_sysreg(id_aa64dfr0_el1),
> > -						  ID_AA64DFR0_PMSVER_SHIFT))
> > -		return;
> > -
> >  	/* Yes; is it owned by higher EL? */
> >  	reg = read_sysreg_s(SYS_PMBIDR_EL1);
> >  	if (reg & BIT(SYS_PMBIDR_EL1_P_SHIFT))
> > @@ -142,7 +137,7 @@ __debug_save_spe_nvhe(struct kvm_cpu_context *ctxt, bool full_ctxt)
> >  }
> >  
> >  static void __hyp_text
> > -__debug_restore_spe_nvhe(struct kvm_cpu_context *ctxt, bool full_ctxt)
> > +__debug_restore_spe_context(struct kvm_cpu_context *ctxt, bool full_ctxt)
> >  {
> >  	if (!ctxt->sys_regs[PMSCR_EL1])
> >  		return;
> > @@ -210,11 +205,14 @@ void __hyp_text __debug_restore_guest_context(struct kvm_vcpu *vcpu)
> >  	struct kvm_guest_debug_arch *host_dbg;
> >  	struct kvm_guest_debug_arch *guest_dbg;
> >  
> > +	host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
> > +	guest_ctxt = &vcpu->arch.ctxt;
> > +
> > +	__debug_restore_spe_context(guest_ctxt, kvm_arm_spe_v1_ready(vcpu));
> > +
> >  	if (!(vcpu->arch.flags & KVM_ARM64_DEBUG_DIRTY))
> >  		return;
> >  
> > -	host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
> > -	guest_ctxt = &vcpu->arch.ctxt;
> >  	host_dbg = &vcpu->arch.host_debug_state.regs;
> >  	guest_dbg = kern_hyp_va(vcpu->arch.debug_ptr);
> >  
> > @@ -232,8 +230,7 @@ void __hyp_text __debug_restore_host_context(struct kvm_vcpu *vcpu)
> >  	host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
> >  	guest_ctxt = &vcpu->arch.ctxt;
> >  
> > -	if (!has_vhe())
> > -		__debug_restore_spe_nvhe(host_ctxt, false);
> > +	__debug_restore_spe_context(host_ctxt, kvm_arm_spe_v1_ready(vcpu));
> 
> So you now do an unconditional save/restore on the exit path for VHE as
> well? Even if the host isn't using the SPE HW? That's not acceptable
> as, in most cases, only the host /or/ the guest will use SPE. Here, you
> put a measurable overhead on each exit.
> 
> If the host is not using SPE, then the restore/save should happen in
> vcpu_load/vcpu_put. Only if the host is using SPE should you do
> something in the run loop. Of course, this only applies to VHE and
> non-VHE must switch eagerly.
> 
> >  
> >  	if (!(vcpu->arch.flags & KVM_ARM64_DEBUG_DIRTY))
> >  		return;
> > @@ -249,19 +246,21 @@ void __hyp_text __debug_restore_host_context(struct kvm_vcpu *vcpu)
> >  
> >  void __hyp_text __debug_save_host_context(struct kvm_vcpu *vcpu)
> >  {
> > -	/*
> > -	 * Non-VHE: Disable and flush SPE data generation
> > -	 * VHE: The vcpu can run, but it can't hide.
> > -	 */
> >  	struct kvm_cpu_context *host_ctxt;
> >  
> >  	host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
> > -	if (!has_vhe())
> > -		__debug_save_spe_nvhe(host_ctxt, false);
> > +	if (cpuid_feature_extract_unsigned_field(read_sysreg(id_aa64dfr0_el1),
> > +						 ID_AA64DFR0_PMSVER_SHIFT))
> > +		__debug_save_spe_context(host_ctxt, kvm_arm_spe_v1_ready(vcpu));
> >  }
> >  
> >  void __hyp_text __debug_save_guest_context(struct kvm_vcpu *vcpu)
> >  {
> > +	bool kvm_spe_ready = kvm_arm_spe_v1_ready(vcpu);
> > +
> > +	/* SPE present on this vCPU? */
> > +	if (kvm_spe_ready)
> > +		__debug_save_spe_context(&vcpu->arch.ctxt, kvm_spe_ready);
> >  }
> >  
> >  u32 __hyp_text __kvm_get_mdcr_el2(void)
> > diff --git a/include/kvm/arm_spe.h b/include/kvm/arm_spe.h
> > index 48d118fdb174..30c40b1bc385 100644
> > --- a/include/kvm/arm_spe.h
> > +++ b/include/kvm/arm_spe.h
> > @@ -16,4 +16,10 @@ struct kvm_spe {
> >  	bool irq_level;
> >  };
> >  
> > +#ifdef CONFIG_KVM_ARM_SPE
> > +#define kvm_arm_spe_v1_ready(v)		((v)->arch.spe.ready)
> > +#else
> > +#define kvm_arm_spe_v1_ready(v)		(false)
> > +#endif /* CONFIG_KVM_ARM_SPE */
> > +
> >  #endif /* __ASM_ARM_KVM_SPE_H */
> 
> Thanks,
> 
> 	M.
> -- 
> Jazz is not dead. It just smells funny...


* Re: [PATCH v2 09/18] arm64: KVM: enable conditional save/restore full SPE profiling buffer controls
  2020-01-07 15:13     ` Andrew Murray
@ 2020-01-08 11:17       ` Marc Zyngier
  2020-01-08 11:58         ` Will Deacon
  0 siblings, 1 reply; 78+ messages in thread
From: Marc Zyngier @ 2020-01-08 11:17 UTC (permalink / raw)
  To: Andrew Murray
  Cc: Catalin Marinas, Mark Rutland, will, Sudeep Holla, kvm, kvmarm,
	linux-arm-kernel, linux-kernel

On 2020-01-07 15:13, Andrew Murray wrote:
> On Sat, Dec 21, 2019 at 02:13:25PM +0000, Marc Zyngier wrote:
>> On Fri, 20 Dec 2019 14:30:16 +0000
>> Andrew Murray <andrew.murray@arm.com> wrote:
>> 
>> [somehow managed not to do a reply all, re-sending]
>> 
>> > From: Sudeep Holla <sudeep.holla@arm.com>
>> >
>> > Now that we can save/restore the full SPE controls, we can enable it
>> > if SPE is setup and ready to use in KVM. It's supported in KVM only if
>> > all the CPUs in the system supports SPE.
>> >
>> > However to support heterogenous systems, we need to move the check if
>> > host supports SPE and do a partial save/restore.
>> 
>> No. Let's just not go down that path. For now, KVM on heterogeneous
>> systems do not get SPE.
> 
> At present these patches only offer the SPE feature to VCPUs where the
> sanitised AA64DFR0 register indicates that all CPUs have this support
> (kvm_arm_support_spe_v1) at the time of setting the attribute
> (KVM_SET_DEVICE_ATTR).
> 
> Therefore if a new CPU comes online without SPE support, and an
> existing VCPU is scheduled onto it, then bad things happen - which I 
> guess
> must have been the intention behind this patch.

I guess that was the intent.

>> If SPE has been enabled on a guest and a CPU
>> comes up without SPE, this CPU should fail to boot (same as exposing a
>> feature to userspace).
> 
> I'm unclear as to how to prevent this. We can set the FTR_STRICT flag on
> the sanitised register - thus tainting the kernel if such a non-SPE CPU
> comes online - though that doesn't prevent KVM from blowing up. Though
> I don't believe we can prevent a CPU coming up. At the moment this is
> my preferred approach.

I'd be OK with this as a stop-gap measure. Do we know of any existing
design where only half of the CPUs have SPE?

> Looking at the vcpu_load and related code, I don't see a way of saying
> 'don't schedule this VCPU on this CPU' or bailing in any way.

That would actually be pretty easy to implement. In vcpu_load(), check
that the physical CPU has SPE. If not, raise a request for that vcpu.
In the run loop, check for that request and abort if raised, returning
to userspace.

Userspace can always check /sys/devices/arm_spe_0/cpumask and work out
where to run that particular vcpu.
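
The load/request/abort flow described above can be modelled in a few lines of
standalone C. This is only a sketch of the control flow: the request bit, the
return code and the structure are hypothetical stand-ins, and a real
implementation would presumably use the existing KVM request machinery
instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define REQ_SPE_WRONG_CPU  (1u << 0)	/* hypothetical request bit */
#define EXIT_TO_USERSPACE  (-1)		/* hypothetical return code */

struct vcpu_model {
	bool     spe_enabled;	/* guest was given SPE */
	uint32_t requests;	/* pending request bits */
};

/* Called when the vCPU thread is scheduled onto physical CPU 'cpu'. */
static void model_vcpu_load(struct vcpu_model *v, unsigned int cpu,
			    uint64_t spe_cpumask)
{
	assert(cpu < 64);
	if (v->spe_enabled && !(spe_cpumask & (1ULL << cpu)))
		v->requests |= REQ_SPE_WRONG_CPU;
}

/* One pass of the run loop: bail out before entering the guest. */
static int model_vcpu_run(struct vcpu_model *v)
{
	if (v->requests & REQ_SPE_WRONG_CPU) {
		v->requests &= ~REQ_SPE_WRONG_CPU;
		return EXIT_TO_USERSPACE;
	}
	return 0;	/* the guest would be entered here */
}
```

Loading the vCPU on a CPU outside the SPE mask makes the next run attempt
return immediately, leaving userspace to move the thread and retry.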

> 
> One solution could be to allow scheduling onto non-SPE CPUs but wrap
> the
> SPE save/restore code in a macro (much like kvm_arm_spe_v1_ready) that
> reads the non-sanitised feature register. Therefore we don't go bang, 
> but
> we also increase the size of any black-holes in SPE capturing. Though 
> this
> feels like something that will cause grief down the line.
> 
> Is there something else that can be done?

How does userspace deal with this? When SPE is only available on half of
the CPUs, how does perf work in these conditions?

Thanks,

         M.
-- 
Jazz is not dead. It just smells funny...


* Re: [PATCH v2 09/18] arm64: KVM: enable conditional save/restore full SPE profiling buffer controls
  2020-01-08 11:17       ` Marc Zyngier
@ 2020-01-08 11:58         ` Will Deacon
  2020-01-08 12:36           ` Marc Zyngier
  0 siblings, 1 reply; 78+ messages in thread
From: Will Deacon @ 2020-01-08 11:58 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Andrew Murray, Catalin Marinas, Mark Rutland, Sudeep Holla, kvm,
	kvmarm, linux-arm-kernel, linux-kernel

On Wed, Jan 08, 2020 at 11:17:16AM +0000, Marc Zyngier wrote:
> On 2020-01-07 15:13, Andrew Murray wrote:
> > On Sat, Dec 21, 2019 at 02:13:25PM +0000, Marc Zyngier wrote:
> > > On Fri, 20 Dec 2019 14:30:16 +0000
> > > Andrew Murray <andrew.murray@arm.com> wrote:
> > > 
> > > [somehow managed not to do a reply all, re-sending]
> > > 
> > > > From: Sudeep Holla <sudeep.holla@arm.com>
> > > >
> > > > Now that we can save/restore the full SPE controls, we can enable it
> > > > if SPE is setup and ready to use in KVM. It's supported in KVM only if
> > > > all the CPUs in the system supports SPE.
> > > >
> > > > However to support heterogenous systems, we need to move the check if
> > > > host supports SPE and do a partial save/restore.
> > > 
> > > No. Let's just not go down that path. For now, KVM on heterogeneous
> > > systems do not get SPE.
> > 
> > At present these patches only offer the SPE feature to VCPUs where the
> > sanitised AA64DFR0 register indicates that all CPUs have this support
> > (kvm_arm_support_spe_v1) at the time of setting the attribute
> > (KVM_SET_DEVICE_ATTR).
> > 
> > Therefore if a new CPU comes online without SPE support, and an
> > existing VCPU is scheduled onto it, then bad things happen - which I
> > guess
> > must have been the intention behind this patch.
> 
> I guess that was the intent.
> 
> > > If SPE has been enabled on a guest and a CPU
> > > comes up without SPE, this CPU should fail to boot (same as exposing a
> > > feature to userspace).
> > 
> > I'm unclear as to how to prevent this. We can set the FTR_STRICT flag on
> > the sanitised register - thus tainting the kernel if such a non-SPE CPU
> > comes online - though that doesn't prevent KVM from blowing up. Though
> > I don't believe we can prevent a CPU coming up. At the moment this is
> > my preferred approach.
> 
> I'd be OK with this as a stop-gap measure. Do we know of any existing
> design where only half of the CPUs have SPE?

No, but given how few CPUs implement SPE I'd say that this configuration
is inevitable. I certainly went out of my way to support it in the driver.

> > Looking at the vcpu_load and related code, I don't see a way of saying
> > 'don't schedule this VCPU on this CPU' or bailing in any way.
> 
> That would actually be pretty easy to implement. In vcpu_load(), check
> > that the physical CPU has SPE. If not, raise a request for that vcpu.
> In the run loop, check for that request and abort if raised, returning
> to userspace.
> 
> Userspace can always check /sys/devices/arm_spe_0/cpumask and work out
> where to run that particular vcpu.

It's also worth considering systems where there are multiple implementations
of SPE in play. Assuming we don't want to expose this to a guest, then the
right interface here is probably for userspace to pick one SPE
implementation and expose that to the guest. That fits with your idea above,
where you basically get an immediate exit if we try to schedule a vCPU onto
a CPU that isn't part of the SPE mask.

> > One solution could be to allow scheduling onto non-SPE CPUs but wrap
> > the
> > SPE save/restore code in a macro (much like kvm_arm_spe_v1_ready) that
> > reads the non-sanitised feature register. Therefore we don't go bang,
> > but
> > we also increase the size of any black-holes in SPE capturing. Though
> > this
> > feels like something that will cause grief down the line.
> > 
> > Is there something else that can be done?
> 
> How does userspace deal with this? When SPE is only available on half of
> the CPUs, how does perf work in these conditions?

Not sure about userspace, but the kernel driver works by instantiating an
SPE PMU instance only for the CPUs that have it and then that instance
profiles for only those CPUs. You also need to do something similar if
you had two CPU types with SPE, since the SPE configuration is likely to be
different between them.

Will


* Re: [PATCH v2 09/18] arm64: KVM: enable conditional save/restore full SPE profiling buffer controls
  2020-01-08 11:58         ` Will Deacon
@ 2020-01-08 12:36           ` Marc Zyngier
  2020-01-08 13:10             ` Will Deacon
  0 siblings, 1 reply; 78+ messages in thread
From: Marc Zyngier @ 2020-01-08 12:36 UTC (permalink / raw)
  To: Will Deacon
  Cc: Andrew Murray, Catalin Marinas, Mark Rutland, Sudeep Holla, kvm,
	kvmarm, linux-arm-kernel, linux-kernel

On 2020-01-08 11:58, Will Deacon wrote:
> On Wed, Jan 08, 2020 at 11:17:16AM +0000, Marc Zyngier wrote:
>> On 2020-01-07 15:13, Andrew Murray wrote:
>> > On Sat, Dec 21, 2019 at 02:13:25PM +0000, Marc Zyngier wrote:
>> > > On Fri, 20 Dec 2019 14:30:16 +0000
>> > > Andrew Murray <andrew.murray@arm.com> wrote:
>> > >
>> > > [somehow managed not to do a reply all, re-sending]
>> > >
>> > > > From: Sudeep Holla <sudeep.holla@arm.com>
>> > > >
>> > > > Now that we can save/restore the full SPE controls, we can enable it
>> > > > if SPE is setup and ready to use in KVM. It's supported in KVM only if
>> > > > all the CPUs in the system supports SPE.
>> > > >
>> > > > However to support heterogenous systems, we need to move the check if
>> > > > host supports SPE and do a partial save/restore.
>> > >
>> > > No. Let's just not go down that path. For now, KVM on heterogeneous
>> > > systems do not get SPE.
>> >
>> > At present these patches only offer the SPE feature to VCPUs where the
>> > sanitised AA64DFR0 register indicates that all CPUs have this support
>> > (kvm_arm_support_spe_v1) at the time of setting the attribute
>> > (KVM_SET_DEVICE_ATTR).
>> >
>> > Therefore if a new CPU comes online without SPE support, and an
>> > existing VCPU is scheduled onto it, then bad things happen - which I
>> > guess
>> > must have been the intention behind this patch.
>> 
>> I guess that was the intent.
>> 
>> > > If SPE has been enabled on a guest and a CPU
>> > > comes up without SPE, this CPU should fail to boot (same as exposing a
>> > > feature to userspace).
>> >
>> > I'm unclear as to how to prevent this. We can set the FTR_STRICT flag on
>> > the sanitised register - thus tainting the kernel if such a non-SPE CPU
>> > comes online - though that doesn't prevent KVM from blowing up. Though
>> > I don't believe we can prevent a CPU coming up. At the moment this is
>> > my preferred approach.
>> 
>> I'd be OK with this as a stop-gap measure. Do we know of any existing
>> design where only half of the CPUs have SPE?
> 
> No, but given how few CPUs implement SPE I'd say that this 
> configuration
> is inevitable. I certainly went out of my way to support it in the 
> driver.
> 
>> > Looking at the vcpu_load and related code, I don't see a way of saying
>> > 'don't schedule this VCPU on this CPU' or bailing in any way.
>> 
>> That would actually be pretty easy to implement. In vcpu_load(), check
>> that the physical CPU has SPE. If not, raise a request for that
>> vcpu.
>> In the run loop, check for that request and abort if raised, returning
>> to userspace.
>> 
>> Userspace can always check /sys/devices/arm_spe_0/cpumask and work out
>> where to run that particular vcpu.
> 
> It's also worth considering systems where there are multiple 
> implementations
> of SPE in play. Assuming we don't want to expose this to a guest, then 
> the
> right interface here is probably for userspace to pick one SPE
> implementation and expose that to the guest. That fits with your idea 
> above,
> where you basically get an immediate exit if we try to schedule a vCPU 
> onto
> a CPU that isn't part of the SPE mask.

Then it means that the VM should be configured with a mask indicating
which CPUs it is intended to run on, and setting such a mask is mandatory
for SPE.

> 
>> > One solution could be to allow scheduling onto non-SPE CPUs but wrap
>> > the
>> > SPE save/restore code in a macro (much like kvm_arm_spe_v1_ready) that
>> > reads the non-sanitised feature register. Therefore we don't go bang,
>> > but
>> > we also increase the size of any black-holes in SPE capturing. Though
>> > this
>> > feels like something that will cause grief down the line.
>> >
>> > Is there something else that can be done?
>> 
>> How does userspace deal with this? When SPE is only available on half 
>> of
>> the CPUs, how does perf work in these conditions?
> 
> Not sure about userspace, but the kernel driver works by instantiating 
> an
> SPE PMU instance only for the CPUs that have it and then that instance
> profiles for only those CPUs. You also need to do something similar if
> you had two CPU types with SPE, since the SPE configuration is likely 
> to be
> different between them.

So that's closer to what Andrew was suggesting above (running a guest on a
non-SPE CPU creates a profiling black hole). Except that we can't really
run a SPE-enabled guest on a non-SPE CPU, as the SPE sysregs will UNDEF
at EL1.

Conclusion: we need a mix of a cpumask to indicate which CPUs we want to
run on (generic, not-SPE related), and a check for SPE-capable CPUs.
If either of these conditions is not satisfied, the vcpu exits for userspace
to sort out the affinity.
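
Put together, the two conditions amount to a predicate along the lines of the
sketch below. Everything here is a hypothetical model (plain 64-bit masks
instead of struct cpumask, made-up structure and function names); it only
shows how the generic mask and the SPE check would combine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-VM configuration. */
struct vm_model {
	uint64_t allowed_cpus;	/* generic mask of CPUs the VM may run on */
	bool     wants_spe;	/* SPE was enabled for the guest */
};

/* True if the vCPU must exit so that userspace can fix its affinity. */
static bool must_exit_for_affinity(const struct vm_model *vm,
				   unsigned int cpu, uint64_t spe_cpus)
{
	uint64_t bit;

	assert(cpu < 64);
	bit = 1ULL << cpu;

	if (!(vm->allowed_cpus & bit))
		return true;		/* outside the configured mask */
	if (vm->wants_spe && !(spe_cpus & bit))
		return true;		/* this CPU cannot provide SPE */
	return false;
}
```

Keeping the generic mask separate from the SPE check means the same exit path
could serve other per-CPU features later.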

I hate heterogeneous systems.

         M.
-- 
Jazz is not dead. It just smells funny...


* Re: [PATCH v2 09/18] arm64: KVM: enable conditional save/restore full SPE profiling buffer controls
  2020-01-08 12:36           ` Marc Zyngier
@ 2020-01-08 13:10             ` Will Deacon
  2020-01-09 11:23               ` Andrew Murray
  0 siblings, 1 reply; 78+ messages in thread
From: Will Deacon @ 2020-01-08 13:10 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Andrew Murray, Catalin Marinas, Mark Rutland, Sudeep Holla, kvm,
	kvmarm, linux-arm-kernel, linux-kernel

On Wed, Jan 08, 2020 at 12:36:11PM +0000, Marc Zyngier wrote:
> On 2020-01-08 11:58, Will Deacon wrote:
> > On Wed, Jan 08, 2020 at 11:17:16AM +0000, Marc Zyngier wrote:
> > > On 2020-01-07 15:13, Andrew Murray wrote:
> > > > Looking at the vcpu_load and related code, I don't see a way of saying
> > > > 'don't schedule this VCPU on this CPU' or bailing in any way.
> > > 
> > > That would actually be pretty easy to implement. In vcpu_load(), check
> > > > that the physical CPU has SPE. If not, raise a request for that
> > > vcpu.
> > > In the run loop, check for that request and abort if raised, returning
> > > to userspace.
> > > 
> > > Userspace can always check /sys/devices/arm_spe_0/cpumask and work out
> > > where to run that particular vcpu.
> > 
> > It's also worth considering systems where there are multiple
> > implementations
> > of SPE in play. Assuming we don't want to expose this to a guest, then
> > the
> > right interface here is probably for userspace to pick one SPE
> > implementation and expose that to the guest. That fits with your idea
> > above,
> > where you basically get an immediate exit if we try to schedule a vCPU
> > onto
> > a CPU that isn't part of the SPE mask.
> 
> Then it means that the VM should be configured with a mask indicating
> which CPUs it is intended to run on, and setting such a mask is mandatory
> for SPE.

Yeah, and this could probably all be wrapped up by userspace so you just
pass the SPE PMU name or something and it grabs the corresponding cpumask
for you.

> > > > One solution could be to allow scheduling onto non-SPE CPUs but wrap
> > > > the
> > > > SPE save/restore code in a macro (much like kvm_arm_spe_v1_ready) that
> > > > reads the non-sanitised feature register. Therefore we don't go bang,
> > > > but
> > > > we also increase the size of any black-holes in SPE capturing. Though
> > > > this
> > > > feels like something that will cause grief down the line.
> > > >
> > > > Is there something else that can be done?
> > > 
> > > How does userspace deal with this? When SPE is only available on
> > > half of
> > > the CPUs, how does perf work in these conditions?
> > 
> > Not sure about userspace, but the kernel driver works by instantiating
> > an
> > SPE PMU instance only for the CPUs that have it and then that instance
> > profiles for only those CPUs. You also need to do something similar if
> > you had two CPU types with SPE, since the SPE configuration is likely to
> > be
> > different between them.
> 
> So that's closer to what Andrew was suggesting above (running a guest on a
> non-SPE CPU creates a profiling black hole). Except that we can't really
> run a SPE-enabled guest on a non-SPE CPU, as the SPE sysregs will UNDEF
> at EL1.

Right. I wouldn't suggest the "black hole" approach for VMs, but it works
for userspace so that's why the driver does it that way.

> Conclusion: we need a mix of a cpumask to indicate which CPUs we want to
> run on (generic, not-SPE related), and a check for SPE-capable CPUs.
> If either of these conditions is not satisfied, the vcpu exits for userspace
> to sort out the affinity.
> 
> I hate heterogeneous systems.

They hate you too ;)

Will


* Re: [PATCH v2 09/18] arm64: KVM: enable conditional save/restore full SPE profiling buffer controls
  2020-01-08 13:10             ` Will Deacon
@ 2020-01-09 11:23               ` Andrew Murray
  2020-01-09 11:25                 ` Andrew Murray
  0 siblings, 1 reply; 78+ messages in thread
From: Andrew Murray @ 2020-01-09 11:23 UTC (permalink / raw)
  To: Will Deacon
  Cc: Marc Zyngier, Catalin Marinas, Mark Rutland, Sudeep Holla, kvm,
	kvmarm, linux-arm-kernel, linux-kernel

On Wed, Jan 08, 2020 at 01:10:21PM +0000, Will Deacon wrote:
> On Wed, Jan 08, 2020 at 12:36:11PM +0000, Marc Zyngier wrote:
> > On 2020-01-08 11:58, Will Deacon wrote:
> > > On Wed, Jan 08, 2020 at 11:17:16AM +0000, Marc Zyngier wrote:
> > > > On 2020-01-07 15:13, Andrew Murray wrote:
> > > > > Looking at the vcpu_load and related code, I don't see a way of saying
> > > > > 'don't schedule this VCPU on this CPU' or bailing in any way.
> > > > 
> > > > That would actually be pretty easy to implement. In vcpu_load(), check
> > > > > that the physical CPU has SPE. If not, raise a request for that
> > > > vcpu.
> > > > In the run loop, check for that request and abort if raised, returning
> > > > to userspace.

I hadn't really noticed the kvm_make_request mechanism - however it's now
clear how this could be implemented.

This approach gives responsibility for which CPUs should be used to userspace
and if userspace gets it wrong then the KVM_RUN ioctl won't do very much.


> > > > 
> > > > Userspace can always check /sys/devices/arm_spe_0/cpumask and work out
> > > > where to run that particular vcpu.
> > > 
> > > It's also worth considering systems where there are multiple
> > > implementations
> > > of SPE in play. Assuming we don't want to expose this to a guest, then
> > > the
> > > right interface here is probably for userspace to pick one SPE
> > > implementation and expose that to the guest.

If I understand correctly then this implies the following:

 - If the host userspace indicates it wants support for SPE in the guest (via 
   KVM_SET_DEVICE_ATTR at start of day) - then we should check in vcpu_load that
   the minimum version of SPE is present on the current CPU. 'minimum' because
   we don't know why userspace has selected the given cpumask.

 - Userspace can get it wrong, i.e. it can create a CPU mask with CPUs that
   have SPE with differing versions. If it does, and all CPUs have some form of
   SPE then errors may occur in the guest. Perhaps this is OK and userspace
   shouldn't get it wrong?


> > >  That fits with your idea
> > > above,
> > > where you basically get an immediate exit if we try to schedule a vCPU
> > > onto
> > > a CPU that isn't part of the SPE mask.
> > 
> > Then it means that the VM should be configured with a mask indicating
> > which CPUs it is intended to run on, and setting such a mask is mandatory
> > for SPE.
> 
> Yeah, and this could probably all be wrapped up by userspace so you just
> pass the SPE PMU name or something and it grabs the corresponding cpumask
> for you.
> 
> > > > > One solution could be to allow scheduling onto non-SPE CPUs but wrap
> > > > > the
> > > > > SPE save/restore code in a macro (much like kvm_arm_spe_v1_ready) that
> > > > > reads the non-sanitised feature register. Therefore we don't go bang,
> > > > > but
> > > > > we also increase the size of any black-holes in SPE capturing. Though
> > > > > this
> > > > > feels like something that will cause grief down the line.
> > > > >
> > > > > Is there something else that can be done?
> > > > 
> > > > How does userspace deal with this? When SPE is only available on
> > > > half of
> > > > the CPUs, how does perf work in these conditions?
> > > 
> > > Not sure about userspace, but the kernel driver works by instantiating
> > > an
> > > SPE PMU instance only for the CPUs that have it and then that instance
> > > profiles for only those CPUs. You also need to do something similar if
> > > you had two CPU types with SPE, since the SPE configuration is likely to
> > > be
> > > different between them.
> > 
> > So that's closer to what Andrew was suggesting above (running a guest on a
> > non-SPE CPU creates a profiling black hole). Except that we can't really
> > run a SPE-enabled guest on a non-SPE CPU, as the SPE sysregs will UNDEF
> > at EL1.
> 
> Right. I wouldn't suggest the "black hole" approach for VMs, but it works
> for userspace so that's why the driver does it that way.
> 
> > Conclusion: we need a mix of a cpumask to indicate which CPUs we want to
> > run on (generic, not-SPE related), 

If I understand correctly this mask isn't exposed to KVM (in the kernel) and
KVM (in the kernel) is unaware of how the CPUs that have KVM_RUN called are
selected.

Thus this implies the cpumask is a feature of KVM tool or QEMU that would
need to be added there. (E.g. kvm_cmd_run_work would set some affinity when
creating pthreads - based on a CPU mask triggered by setting the --spe flag)?
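
On the userspace side, turning the PMU's cpumask into an affinity mask could
start with a parser like the one sketched below. It assumes the sysfs file
uses the kernel's CPU list format (for example "0-3,6"), limits itself to
CPUs 0..63 for simplicity, and parse_cpulist() is a hypothetical helper that
does not exist in kvmtool or QEMU.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse a CPU list such as "0-3,6" into a bitmask (CPUs 0..63 only).
 * Returns 0 on success, -1 on malformed or out-of-range input.
 */
static int parse_cpulist(const char *s, uint64_t *mask)
{
	assert(s && mask);
	*mask = 0;

	while (*s && *s != '\n') {
		char *end;
		unsigned long lo, hi, c;

		lo = hi = strtoul(s, &end, 10);
		if (end == s)
			return -1;		/* no digits found */
		if (*end == '-') {		/* range "lo-hi" */
			s = end + 1;
			hi = strtoul(s, &end, 10);
			if (end == s)
				return -1;
		}
		if (lo > hi || hi > 63)
			return -1;
		for (c = lo; c <= hi; c++)
			*mask |= 1ULL << c;
		if (*end == ',')
			end++;			/* skip the separator */
		else if (*end && *end != '\n')
			return -1;		/* unexpected character */
		s = end;
	}
	return 0;
}
```

The resulting mask could then be translated into a cpu_set_t and applied to
each vCPU thread with pthread_setaffinity_np() before the first KVM_RUN.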

Thanks,

Andrew Murray

> > > and a check for SPE-capable CPUs.
> > > If either of these conditions is not satisfied, the vcpu exits for userspace
> > to sort out the affinity.
> > 
> > I hate heterogeneous systems.
> 
> They hate you too ;)
> 
> Will


* Re: [PATCH v2 09/18] arm64: KVM: enable conditional save/restore full SPE profiling buffer controls
  2020-01-09 11:23               ` Andrew Murray
@ 2020-01-09 11:25                 ` Andrew Murray
  2020-01-09 12:01                   ` Will Deacon
  0 siblings, 1 reply; 78+ messages in thread
From: Andrew Murray @ 2020-01-09 11:25 UTC (permalink / raw)
  To: Will Deacon
  Cc: Catalin Marinas, kvm, Marc Zyngier, linux-kernel, Sudeep Holla,
	kvmarm, linux-arm-kernel

On Thu, Jan 09, 2020 at 11:23:37AM +0000, Andrew Murray wrote:
> On Wed, Jan 08, 2020 at 01:10:21PM +0000, Will Deacon wrote:
> > On Wed, Jan 08, 2020 at 12:36:11PM +0000, Marc Zyngier wrote:
> > > On 2020-01-08 11:58, Will Deacon wrote:
> > > > On Wed, Jan 08, 2020 at 11:17:16AM +0000, Marc Zyngier wrote:
> > > > > On 2020-01-07 15:13, Andrew Murray wrote:
> > > > > > Looking at the vcpu_load and related code, I don't see a way of saying
> > > > > > 'don't schedule this VCPU on this CPU' or bailing in any way.
> > > > > 
> > > > > That would actually be pretty easy to implement. In vcpu_load(), check
> > > > > > that the physical CPU has SPE. If not, raise a request for that
> > > > > vcpu.
> > > > > In the run loop, check for that request and abort if raised, returning
> > > > > to userspace.
> 
> I hadn't really noticed the kvm_make_request mechanism - however it's now
> clear how this could be implemented.
> 
> This approach gives responsibility for which CPUs should be used to userspace
> and if userspace gets it wrong then the KVM_RUN ioctl won't do very much.
> 
> 
> > > > > 
> > > > > Userspace can always check /sys/devices/arm_spe_0/cpumask and work out
> > > > > where to run that particular vcpu.
> > > > 
> > > > It's also worth considering systems where there are multiple
> > > > implementations
> > > > of SPE in play. Assuming we don't want to expose this to a guest, then
> > > > the
> > > > right interface here is probably for userspace to pick one SPE
> > > > implementation and expose that to the guest.
> 
> If I understand correctly then this implies the following:
> 
>  - If the host userspace indicates it wants support for SPE in the guest (via 
>    KVM_SET_DEVICE_ATTR at start of day) - then we should check in vcpu_load that
>    the minimum version of SPE is present on the current CPU. 'minimum' because
>    we don't know why userspace has selected the given cpumask.
> 
>  - Userspace can get it wrong, i.e. it can create a CPU mask with CPUs that
>    have SPE with differing versions. If it does, and all CPUs have some form of
>    SPE then errors may occur in the guest. Perhaps this is OK and userspace
>    shouldn't get it wrong?

Actually this could be guarded against by emulating ID_AA64DFR0_EL1 so as to
cap the version at the minimum SPE version - if absolutely required.
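
A minimal sketch of that capping, assuming the per-CPU ID_AA64DFR0_EL1
values for the chosen cpumask have already been collected into an array.
The names pmsver and cap_pmsver are hypothetical, and only the PMSVer
field (bits [35:32]) is considered; nothing else that describes the SPE
implementation is touched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PMSVER_SHIFT	32	/* ID_AA64DFR0_EL1.PMSVer is bits [35:32] */
#define PMSVER_MASK	(UINT64_C(0xf) << PMSVER_SHIFT)

/* Extract the PMSVer field from an ID_AA64DFR0_EL1 value. */
static unsigned int pmsver(uint64_t dfr0)
{
	return (unsigned int)((dfr0 & PMSVER_MASK) >> PMSVER_SHIFT);
}

/*
 * Hypothetical helper: return guest_dfr0 with PMSVer lowered to the
 * smallest PMSVer found in cpu_dfr0[0..n-1]. All other fields of
 * guest_dfr0 are left untouched.
 */
static uint64_t cap_pmsver(uint64_t guest_dfr0, const uint64_t *cpu_dfr0,
			   size_t n)
{
	unsigned int min = pmsver(guest_dfr0);

	assert(cpu_dfr0 || n == 0);

	for (size_t i = 0; i < n; i++) {
		unsigned int v = pmsver(cpu_dfr0[i]);

		if (v < min)
			min = v;
	}

	return (guest_dfr0 & ~PMSVER_MASK) | ((uint64_t)min << PMSVER_SHIFT);
}
```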

Thanks,

Andrew Murray

> 
> 
> > > >  That fits with your idea
> > > > above,
> > > > where you basically get an immediate exit if we try to schedule a vCPU
> > > > onto
> > > > a CPU that isn't part of the SPE mask.
> > > 
> > > Then it means that the VM should be configured with a mask indicating
> > > which CPUs it is intended to run on, and setting such a mask is mandatory
> > > for SPE.
> > 
> > Yeah, and this could probably all be wrapped up by userspace so you just
> > pass the SPE PMU name or something and it grabs the corresponding cpumask
> > for you.
> > 
> > > > > > One solution could be to allow scheduling onto non-SPE VCPUs but wrap
> > > > > > the
> > > > > > SPE save/restore code in a macro (much like kvm_arm_spe_v1_ready) that
> > > > > > reads the non-sanitised feature register. Therefore we don't go bang,
> > > > > > but
> > > > > > we also increase the size of any black-holes in SPE capturing. Though
> > > > > > this
> > > > > > feels like something that will cause grief down the line.
> > > > > >
> > > > > > Is there something else that can be done?
> > > > > 
> > > > > How does userspace deal with this? When SPE is only available on
> > > > > half of
> > > > > the CPUs, how does perf work in these conditions?
> > > > 
> > > > Not sure about userspace, but the kernel driver works by instantiating
> > > > an
> > > > SPE PMU instance only for the CPUs that have it and then that instance
> > > > profiles for only those CPUs. You also need to do something similar if
> > > > you had two CPU types with SPE, since the SPE configuration is likely to
> > > > be
> > > > different between them.
> > > 
> > > So that's closer to what Andrew was suggesting above (running a guest on a
> > > non-SPE CPU creates a profiling black hole). Except that we can't really
> > > run a SPE-enabled guest on a non-SPE CPU, as the SPE sysregs will UNDEF
> > > at EL1.
> > 
> > Right. I wouldn't suggest the "black hole" approach for VMs, but it works
> > for userspace so that's why the driver does it that way.
> > 
> > > Conclusion: we need a mix of a cpumask to indicate which CPUs we want to
> > > run on (generic, not-SPE related), 
> 
> If I understand correctly this mask isn't exposed to KVM (in the kernel) and
> KVM (in the kernel) is unaware of how the CPUs that have KVM_RUN called are
> selected.
> 
> Thus this implies the cpumask is a feature of KVM tool or QEMU that would
> need to be added there. (E.g. kvm_cmd_run_work would set some affinity when
> creating pthreads - based on a CPU mask triggered by setting the --spe flag)?
> 
> Thanks,
> 
> Andrew Murray
> 
> > and a check for SPE-capable CPUs.
> > > If any of these conditions is not satisfied, the vcpu exits for userspace
> > > to sort out the affinity.
> > > 
> > > I hate heterogeneous systems.
> > 
> > They hate you too ;)
> > 
> > Will
> _______________________________________________
> kvmarm mailing list
> kvmarm@lists.cs.columbia.edu
> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v2 09/18] arm64: KVM: enable conditional save/restore full SPE profiling buffer controls
  2020-01-09 11:25                 ` Andrew Murray
@ 2020-01-09 12:01                   ` Will Deacon
  0 siblings, 0 replies; 78+ messages in thread
From: Will Deacon @ 2020-01-09 12:01 UTC (permalink / raw)
  To: Andrew Murray
  Cc: Catalin Marinas, kvm, Marc Zyngier, linux-kernel, Sudeep Holla,
	kvmarm, linux-arm-kernel

On Thu, Jan 09, 2020 at 11:25:04AM +0000, Andrew Murray wrote:
> On Thu, Jan 09, 2020 at 11:23:37AM +0000, Andrew Murray wrote:
> > On Wed, Jan 08, 2020 at 01:10:21PM +0000, Will Deacon wrote:
> > > On Wed, Jan 08, 2020 at 12:36:11PM +0000, Marc Zyngier wrote:
> > > > On 2020-01-08 11:58, Will Deacon wrote:
> > > > > On Wed, Jan 08, 2020 at 11:17:16AM +0000, Marc Zyngier wrote:
> > > > > > On 2020-01-07 15:13, Andrew Murray wrote:
> > > > > > > Looking at the vcpu_load and related code, I don't see a way of saying
> > > > > > > 'don't schedule this VCPU on this CPU' or bailing in any way.
> > > > > > 
> > > > > > That would actually be pretty easy to implement. In vcpu_load(), check
> > > > > > > that the physical CPU has SPE. If not, raise a request for that
> > > > > > vcpu.
> > > > > > In the run loop, check for that request and abort if raised, returning
> > > > > > to userspace.
> > 
> > I hadn't really noticed the kvm_make_request mechanism - however it's now
> > clear how this could be implemented.
> > 
> > This approach gives responsibility for which CPUs should be used to userspace
> > and if userspace gets it wrong then the KVM_RUN ioctl won't do very much.
> > 
> > 
> > > > > > 
> > > > > > Userspace can always check /sys/devices/arm_spe_0/cpumask and work out
> > > > > > where to run that particular vcpu.
> > > > > 
> > > > > It's also worth considering systems where there are multiple
> > > > > implementations
> > > > > of SPE in play. Assuming we don't want to expose this to a guest, then
> > > > > the
> > > > > right interface here is probably for userspace to pick one SPE
> > > > > implementation and expose that to the guest.
> > 
> > If I understand correctly then this implies the following:
> > 
> >  - If the host userspace indicates it wants support for SPE in the guest (via 
> >    KVM_SET_DEVICE_ATTR at start of day) - then we should check in vcpu_load that
> >    the minimum version of SPE is present on the current CPU. 'minimum' because
> >    we don't know why userspace has selected the given cpumask.
> > 
> >  - Userspace can get it wrong, i.e. it can create a CPU mask with CPUs that
> >    have SPE with differing versions. If it does, and all CPUs have some form of
> >    SPE then errors may occur in the guest. Perhaps this is OK and userspace
> >    shouldn't get it wrong?
> 
> Actually this could be guarded against by emulating ID_AA64DFR0_EL1 so as to
> cap the version at the minimum SPE version - if absolutely required.

The problem is, it's not as simple as checking a version field. Instead,
you'd have to look at all of the ID registers for SPE so that you don't end
up with funny differences such as minimum sampling interval, or hardware RNG
support. Ultimately though, *much* of the trace is going to be describing
IMP DEF stuff because it's so micro-architectural, and there's very little
you can do to hide that.

Will

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v2 11/18] KVM: arm64: don't trap Statistical Profiling controls to EL2
  2019-12-23 12:10         ` Andrew Murray
@ 2020-01-09 17:25           ` Andrew Murray
  2020-01-09 17:42             ` Mark Rutland
  0 siblings, 1 reply; 78+ messages in thread
From: Andrew Murray @ 2020-01-09 17:25 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvm, Catalin Marinas, linux-kernel, Sudeep Holla, kvmarm,
	linux-arm-kernel

On Mon, Dec 23, 2019 at 12:10:42PM +0000, Andrew Murray wrote:
> On Mon, Dec 23, 2019 at 12:05:12PM +0000, Marc Zyngier wrote:
> > On 2019-12-23 11:56, Andrew Murray wrote:
> > > On Sun, Dec 22, 2019 at 10:42:05AM +0000, Marc Zyngier wrote:
> > > > On Fri, 20 Dec 2019 14:30:18 +0000,
> > > > Andrew Murray <andrew.murray@arm.com> wrote:
> > > > >
> > > > > As we now save/restore the profiler state there is no need to trap
> > > > > accesses to the statistical profiling controls. Let's unset the
> > > > > _TPMS bit.
> > > > >
> > > > > Signed-off-by: Andrew Murray <andrew.murray@arm.com>
> > > > > ---
> > > > >  arch/arm64/kvm/debug.c | 2 --
> > > > >  1 file changed, 2 deletions(-)
> > > > >
> > > > > diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c
> > > > > index 43487f035385..07ca783e7d9e 100644
> > > > > --- a/arch/arm64/kvm/debug.c
> > > > > +++ b/arch/arm64/kvm/debug.c
> > > > > @@ -88,7 +88,6 @@ void kvm_arm_reset_debug_ptr(struct kvm_vcpu
> > > > *vcpu)
> > > > >   *  - Performance monitors (MDCR_EL2_TPM/MDCR_EL2_TPMCR)
> > > > >   *  - Debug ROM Address (MDCR_EL2_TDRA)
> > > > >   *  - OS related registers (MDCR_EL2_TDOSA)
> > > > > - *  - Statistical profiler (MDCR_EL2_TPMS/MDCR_EL2_E2PB)
> > > > >   *
> > > > >   * Additionally, KVM only traps guest accesses to the debug
> > > > registers if
> > > > >   * the guest is not actively using them (see the
> > > > KVM_ARM64_DEBUG_DIRTY
> > > > > @@ -111,7 +110,6 @@ void kvm_arm_setup_debug(struct kvm_vcpu
> > > > *vcpu)
> > > > >  	 */
> > > > >  	vcpu->arch.mdcr_el2 = __this_cpu_read(mdcr_el2) &
> > > > MDCR_EL2_HPMN_MASK;
> > > > >  	vcpu->arch.mdcr_el2 |= (MDCR_EL2_TPM |
> > > > > -				MDCR_EL2_TPMS |
> > > > 
> > > > No. This is an *optional* feature (the guest could not be presented
> > > > with the SPE feature, or the support simply not be compiled in).
> > > > 
> > > > If the guest is not allowed to see the feature, for whichever
> > > > reason,
> > > > the traps *must* be enabled and handled.
> > > 
> > > I'll update this (and similar) to trap such registers when we don't
> > > support
> > > SPE in the guest.
> > > 
> > > My original concern in the cover letter was in how to prevent the guest
> > > from attempting to use these registers in the first place - I think the
> > > solution I was looking for is to trap-and-emulate ID_AA64DFR0_EL1 such
> > > that
> > > the PMSVer bits indicate that SPE is not emulated.
> > 
> > That, and active trapping of the SPE system registers resulting in injection
> > of an UNDEF into the offending guest.
> 
> Yes that's no problem.

The spec says that 'direct access to [these registers] are UNDEFINED' - is it
not more correct to handle this with trap_raz_wi than an undefined instruction?

Thanks,

Andrew Murray

> 
> Thanks,
> 
> Andrew Murray
> 
> > 
> > Thanks,
> > 
> >         M.
> > -- 
> > Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v2 11/18] KVM: arm64: don't trap Statistical Profiling controls to EL2
  2020-01-09 17:25           ` Andrew Murray
@ 2020-01-09 17:42             ` Mark Rutland
  2020-01-09 17:46               ` Andrew Murray
  0 siblings, 1 reply; 78+ messages in thread
From: Mark Rutland @ 2020-01-09 17:42 UTC (permalink / raw)
  To: Andrew Murray
  Cc: Marc Zyngier, kvm, Catalin Marinas, linux-kernel, Sudeep Holla,
	kvmarm, linux-arm-kernel

Hi Andrew,

On Thu, Jan 09, 2020 at 05:25:12PM +0000, Andrew Murray wrote:
> On Mon, Dec 23, 2019 at 12:10:42PM +0000, Andrew Murray wrote:
> > On Mon, Dec 23, 2019 at 12:05:12PM +0000, Marc Zyngier wrote:
> > > On 2019-12-23 11:56, Andrew Murray wrote:
> > > > My original concern in the cover letter was in how to prevent
> > > > the guest from attempting to use these registers in the first
> > > > place - I think the solution I was looking for is to
> > > > trap-and-emulate ID_AA64DFR0_EL1 such that the PMSVer bits
> > > > indicate that SPE is not emulated.
> > > 
> > > That, and active trapping of the SPE system registers resulting in injection
> > > of an UNDEF into the offending guest.
> > 
> > Yes that's no problem.
> 
> The spec says that 'direct access to [these registers] are UNDEFINED' - is it
> not more correct to handle this with trap_raz_wi than an undefined instruction?

The term UNDEFINED specifically means treated as an undefined
instruction. The Glossary in ARM DDI 0487E.a says for UNDEFINED:

| Indicates cases where an attempt to execute a particular encoding bit
| pattern generates an exception, that is taken to the current Exception
| level, or to the default Exception level for taking exceptions if the
| UNDEFINED encoding was executed at EL0. This applies to:
|
| * Any encoding that is not allocated to any instruction.
|
| * Any encoding that is defined as never accessible at the current
|   Exception level.
|
| * Some cases where an enable, disable, or trap control means an
|   encoding is not accessible at the current Exception level.

So these should trigger an UNDEFINED exception rather than behaving as
RAZ/WI.
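
To make the difference concrete, here is a small standalone model of the
two behaviours for a trapped access. It is only a sketch: the types and
the names model_raz_wi and model_undef are hypothetical and are not the
KVM sys_regs handlers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* What the hypervisor does after handling a trapped register access. */
enum trap_outcome {
	TRAP_EMULATED,		/* access completed, guest continues */
	TRAP_INJECT_UNDEF,	/* guest takes an Undefined Instruction exception */
};

struct reg_access {
	bool is_write;
	uint64_t regval;	/* value written, or value returned on a read */
};

/* RAZ/WI: reads return zero, writes are silently dropped. */
static enum trap_outcome model_raz_wi(struct reg_access *p)
{
	assert(p);
	if (!p->is_write)
		p->regval = 0;
	return TRAP_EMULATED;
}

/* UNDEFINED: the access never completes; the guest sees an exception. */
static enum trap_outcome model_undef(struct reg_access *p)
{
	assert(p);
	return TRAP_INJECT_UNDEF;
}
```

With RAZ/WI the guest cannot tell that the register is absent, whereas
with UNDEFINED it sees the same exception it would see on hardware
without SPE.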

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v2 11/18] KVM: arm64: don't trap Statistical Profiling controls to EL2
  2020-01-09 17:42             ` Mark Rutland
@ 2020-01-09 17:46               ` Andrew Murray
  0 siblings, 0 replies; 78+ messages in thread
From: Andrew Murray @ 2020-01-09 17:46 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Marc Zyngier, kvm, Catalin Marinas, linux-kernel, Sudeep Holla,
	kvmarm, linux-arm-kernel

On Thu, Jan 09, 2020 at 05:42:51PM +0000, Mark Rutland wrote:
> Hi Andrew,
> 
> On Thu, Jan 09, 2020 at 05:25:12PM +0000, Andrew Murray wrote:
> > On Mon, Dec 23, 2019 at 12:10:42PM +0000, Andrew Murray wrote:
> > > On Mon, Dec 23, 2019 at 12:05:12PM +0000, Marc Zyngier wrote:
> > > > On 2019-12-23 11:56, Andrew Murray wrote:
> > > > > My original concern in the cover letter was in how to prevent
> > > > > the guest from attempting to use these registers in the first
> > > > > place - I think the solution I was looking for is to
> > > > > trap-and-emulate ID_AA64DFR0_EL1 such that the PMSVer bits
> > > > > indicate that SPE is not emulated.
> > > > 
> > > > That, and active trapping of the SPE system registers resulting in injection
> > > > of an UNDEF into the offending guest.
> > > 
> > > Yes that's no problem.
> > 
> > The spec says that 'direct access to [these registers] are UNDEFINED' - is it
> > not more correct to handle this with trap_raz_wi than an undefined instruction?
> 
> The term UNDEFINED specifically means treated as an undefined
> instruction. The Glossary in ARM DDI 0487E.a says for UNDEFINED:
> 
> | Indicates cases where an attempt to execute a particular encoding bit
> | pattern generates an exception, that is taken to the current Exception
> | level, or to the default Exception level for taking exceptions if the
> | UNDEFINED encoding was executed at EL0. This applies to:
> |
> | * Any encoding that is not allocated to any instruction.
> |
> | * Any encoding that is defined as never accessible at the current
> |   Exception level.
> |
> | * Some cases where an enable, disable, or trap control means an
> |   encoding is not accessible at the current Exception level.
> 
> So these should trigger an UNDEFINED exception rather than behaving as
> RAZ/WI.

OK thanks for the clarification - I'll leave it as an undefined instruction.

Thanks,

Andrew Murray

> 
> Thanks,
> Mark.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v2 09/18] arm64: KVM: enable conditional save/restore full SPE profiling buffer controls
  2019-12-21 14:13   ` Marc Zyngier
  2020-01-07 15:13     ` Andrew Murray
@ 2020-01-10 10:54     ` Andrew Murray
  2020-01-10 11:04       ` Andrew Murray
  2020-01-10 11:18       ` Marc Zyngier
  1 sibling, 2 replies; 78+ messages in thread
From: Andrew Murray @ 2020-01-10 10:54 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Catalin Marinas, Mark Rutland, will, Sudeep Holla, kvm, kvmarm,
	linux-arm-kernel, linux-kernel

On Sat, Dec 21, 2019 at 02:13:25PM +0000, Marc Zyngier wrote:
> On Fri, 20 Dec 2019 14:30:16 +0000
> Andrew Murray <andrew.murray@arm.com> wrote:
> 
> [somehow managed not to do a reply all, re-sending]
> 
> > From: Sudeep Holla <sudeep.holla@arm.com>
> > 
> > Now that we can save/restore the full SPE controls, we can enable it
> > if SPE is setup and ready to use in KVM. It's supported in KVM only if
> > all the CPUs in the system supports SPE.
> > 
> > However to support heterogenous systems, we need to move the check if
> > host supports SPE and do a partial save/restore.
> 
> No. Let's just not go down that path. For now, KVM on heterogeneous
> systems do not get SPE. If SPE has been enabled on a guest and a CPU
> comes up without SPE, this CPU should fail to boot (same as exposing a
> feature to userspace).
> 
> > 
> > Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
> > Signed-off-by: Andrew Murray <andrew.murray@arm.com>
> > ---
> >  arch/arm64/kvm/hyp/debug-sr.c | 33 ++++++++++++++++-----------------
> >  include/kvm/arm_spe.h         |  6 ++++++
> >  2 files changed, 22 insertions(+), 17 deletions(-)
> > 
> > diff --git a/arch/arm64/kvm/hyp/debug-sr.c b/arch/arm64/kvm/hyp/debug-sr.c
> > index 12429b212a3a..d8d857067e6d 100644
> > --- a/arch/arm64/kvm/hyp/debug-sr.c
> > +++ b/arch/arm64/kvm/hyp/debug-sr.c
> > @@ -86,18 +86,13 @@
> >  	}
> >  
> >  static void __hyp_text
> > -__debug_save_spe_nvhe(struct kvm_cpu_context *ctxt, bool full_ctxt)
> > +__debug_save_spe_context(struct kvm_cpu_context *ctxt, bool full_ctxt)
> >  {
> >  	u64 reg;
> >  
> >  	/* Clear pmscr in case of early return */
> >  	ctxt->sys_regs[PMSCR_EL1] = 0;
> >  
> > -	/* SPE present on this CPU? */
> > -	if (!cpuid_feature_extract_unsigned_field(read_sysreg(id_aa64dfr0_el1),
> > -						  ID_AA64DFR0_PMSVER_SHIFT))
> > -		return;
> > -
> >  	/* Yes; is it owned by higher EL? */
> >  	reg = read_sysreg_s(SYS_PMBIDR_EL1);
> >  	if (reg & BIT(SYS_PMBIDR_EL1_P_SHIFT))
> > @@ -142,7 +137,7 @@ __debug_save_spe_nvhe(struct kvm_cpu_context *ctxt, bool full_ctxt)
> >  }
> >  
> >  static void __hyp_text
> > -__debug_restore_spe_nvhe(struct kvm_cpu_context *ctxt, bool full_ctxt)
> > +__debug_restore_spe_context(struct kvm_cpu_context *ctxt, bool full_ctxt)
> >  {
> >  	if (!ctxt->sys_regs[PMSCR_EL1])
> >  		return;
> > @@ -210,11 +205,14 @@ void __hyp_text __debug_restore_guest_context(struct kvm_vcpu *vcpu)
> >  	struct kvm_guest_debug_arch *host_dbg;
> >  	struct kvm_guest_debug_arch *guest_dbg;
> >  
> > +	host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
> > +	guest_ctxt = &vcpu->arch.ctxt;
> > +
> > +	__debug_restore_spe_context(guest_ctxt, kvm_arm_spe_v1_ready(vcpu));
> > +
> >  	if (!(vcpu->arch.flags & KVM_ARM64_DEBUG_DIRTY))
> >  		return;
> >  
> > -	host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
> > -	guest_ctxt = &vcpu->arch.ctxt;
> >  	host_dbg = &vcpu->arch.host_debug_state.regs;
> >  	guest_dbg = kern_hyp_va(vcpu->arch.debug_ptr);
> >  
> > @@ -232,8 +230,7 @@ void __hyp_text __debug_restore_host_context(struct kvm_vcpu *vcpu)
> >  	host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
> >  	guest_ctxt = &vcpu->arch.ctxt;
> >  
> > -	if (!has_vhe())
> > -		__debug_restore_spe_nvhe(host_ctxt, false);
> > +	__debug_restore_spe_context(host_ctxt, kvm_arm_spe_v1_ready(vcpu));
> 
> So you now do an unconditional save/restore on the exit path for VHE as
> well? Even if the host isn't using the SPE HW? That's not acceptable
> as, in most cases, only the host /or/ the guest will use SPE. Here, you
> put a measurable overhead on each exit.
> 
> If the host is not using SPE, then the restore/save should happen in
> vcpu_load/vcpu_put. Only if the host is using SPE should you do
> something in the run loop. Of course, this only applies to VHE and
> non-VHE must switch eagerly.
> 

On VHE where SPE is used in the guest only - we save/restore in vcpu_load/put.

On VHE where SPE is used in the host only - we save/restore in the run loop.

On VHE where SPE is used in guest and host - we save/restore in the run loop.

As the guest can't trace EL2 it doesn't matter if we restore guest SPE early
in the vcpu_load/put functions. (I assume it doesn't matter that we restore
an EL0/EL1 profiling buffer address at this point and enable tracing given
that there is nothing to trace until entering the guest).

However the reason for moving save/restore to vcpu_load/put when the host is
using SPE is to minimise the host EL2 black-out window.


On nVHE we always save/restore in the run loop. For the SPE guest-use-only
use-case we can't save/restore in vcpu_load/put - because the guest runs at
the same ELx level as the host - and thus doing so would result in the guest
tracing part of the host.

Though if we determine that (for nVHE systems) the guest SPE is profiling only
EL0 - then we could also save/restore in vcpu_load/put where SPE is only being
used in the guest.
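
To check my own reasoning, the cases above can be written as a small
decision function. This is a sketch only: struct spe_usage and
spe_pick_switch_point are hypothetical names, and the nVHE EL0-only
case is the speculative optimisation from the previous paragraph, not
something the series implements.

```c
#include <assert.h>
#include <stdbool.h>

enum spe_switch_point {
	SPE_SWITCH_LOAD_PUT,	/* save/restore in vcpu_load/vcpu_put */
	SPE_SWITCH_RUN_LOOP,	/* save/restore on every guest entry/exit */
};

struct spe_usage {
	bool vhe;		/* host kernel runs at EL2 */
	bool host_uses_spe;	/* host currently has profiling enabled */
	bool guest_el0_only;	/* guest SPE is profiling EL0 only */
};

static enum spe_switch_point spe_pick_switch_point(const struct spe_usage *u)
{
	assert(u);

	/* Host profiling active: switch around the guest, VHE or not. */
	if (u->host_uses_spe)
		return SPE_SWITCH_RUN_LOOP;

	/* VHE, guest-only use: the guest cannot trace EL2, so switch lazily. */
	if (u->vhe)
		return SPE_SWITCH_LOAD_PUT;

	/* nVHE: host and guest share EL1, so only EL0-only guests can be lazy. */
	return u->guest_el0_only ? SPE_SWITCH_LOAD_PUT : SPE_SWITCH_RUN_LOOP;
}
```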

Does that make sense? Are my reasons correct?

Thanks,

Andrew Murray


> >  
> >  	if (!(vcpu->arch.flags & KVM_ARM64_DEBUG_DIRTY))
> >  		return;
> > @@ -249,19 +246,21 @@ void __hyp_text __debug_restore_host_context(struct kvm_vcpu *vcpu)
> >  
> >  void __hyp_text __debug_save_host_context(struct kvm_vcpu *vcpu)
> >  {
> > -	/*
> > -	 * Non-VHE: Disable and flush SPE data generation
> > -	 * VHE: The vcpu can run, but it can't hide.
> > -	 */
> >  	struct kvm_cpu_context *host_ctxt;
> >  
> >  	host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
> > -	if (!has_vhe())
> > -		__debug_save_spe_nvhe(host_ctxt, false);
> > +	if (cpuid_feature_extract_unsigned_field(read_sysreg(id_aa64dfr0_el1),
> > +						 ID_AA64DFR0_PMSVER_SHIFT))
> > +		__debug_save_spe_context(host_ctxt, kvm_arm_spe_v1_ready(vcpu));
> >  }
> >  
> >  void __hyp_text __debug_save_guest_context(struct kvm_vcpu *vcpu)
> >  {
> > +	bool kvm_spe_ready = kvm_arm_spe_v1_ready(vcpu);
> > +
> > +	/* SPE present on this vCPU? */
> > +	if (kvm_spe_ready)
> > +		__debug_save_spe_context(&vcpu->arch.ctxt, kvm_spe_ready);
> >  }
> >  
> >  u32 __hyp_text __kvm_get_mdcr_el2(void)
> > diff --git a/include/kvm/arm_spe.h b/include/kvm/arm_spe.h
> > index 48d118fdb174..30c40b1bc385 100644
> > --- a/include/kvm/arm_spe.h
> > +++ b/include/kvm/arm_spe.h
> > @@ -16,4 +16,10 @@ struct kvm_spe {
> >  	bool irq_level;
> >  };
> >  
> > +#ifdef CONFIG_KVM_ARM_SPE
> > +#define kvm_arm_spe_v1_ready(v)		((v)->arch.spe.ready)
> > +#else
> > +#define kvm_arm_spe_v1_ready(v)		(false)
> > +#endif /* CONFIG_KVM_ARM_SPE */
> > +
> >  #endif /* __ASM_ARM_KVM_SPE_H */
> 
> Thanks,
> 
> 	M.
> -- 
> Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v2 09/18] arm64: KVM: enable conditional save/restore full SPE profiling buffer controls
  2020-01-10 10:54     ` Andrew Murray
@ 2020-01-10 11:04       ` Andrew Murray
  2020-01-10 11:51         ` Marc Zyngier
  2020-01-10 11:18       ` Marc Zyngier
  1 sibling, 1 reply; 78+ messages in thread
From: Andrew Murray @ 2020-01-10 11:04 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvm, Catalin Marinas, linux-kernel, Sudeep Holla, will, kvmarm,
	linux-arm-kernel

On Fri, Jan 10, 2020 at 10:54:36AM +0000, Andrew Murray wrote:
> On Sat, Dec 21, 2019 at 02:13:25PM +0000, Marc Zyngier wrote:
> > On Fri, 20 Dec 2019 14:30:16 +0000
> > Andrew Murray <andrew.murray@arm.com> wrote:
> > 
> > [somehow managed not to do a reply all, re-sending]
> > 
> > > From: Sudeep Holla <sudeep.holla@arm.com>
> > > 
> > > Now that we can save/restore the full SPE controls, we can enable it
> > > if SPE is setup and ready to use in KVM. It's supported in KVM only if
> > > all the CPUs in the system supports SPE.
> > > 
> > > However to support heterogenous systems, we need to move the check if
> > > host supports SPE and do a partial save/restore.
> > 
> > No. Let's just not go down that path. For now, KVM on heterogeneous
> > systems do not get SPE. If SPE has been enabled on a guest and a CPU
> > comes up without SPE, this CPU should fail to boot (same as exposing a
> > feature to userspace).
> > 
> > > 
> > > Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
> > > Signed-off-by: Andrew Murray <andrew.murray@arm.com>
> > > ---
> > >  arch/arm64/kvm/hyp/debug-sr.c | 33 ++++++++++++++++-----------------
> > >  include/kvm/arm_spe.h         |  6 ++++++
> > >  2 files changed, 22 insertions(+), 17 deletions(-)
> > > 
> > > diff --git a/arch/arm64/kvm/hyp/debug-sr.c b/arch/arm64/kvm/hyp/debug-sr.c
> > > index 12429b212a3a..d8d857067e6d 100644
> > > --- a/arch/arm64/kvm/hyp/debug-sr.c
> > > +++ b/arch/arm64/kvm/hyp/debug-sr.c
> > > @@ -86,18 +86,13 @@
> > >  	}
> > >  
> > >  static void __hyp_text
> > > -__debug_save_spe_nvhe(struct kvm_cpu_context *ctxt, bool full_ctxt)
> > > +__debug_save_spe_context(struct kvm_cpu_context *ctxt, bool full_ctxt)
> > >  {
> > >  	u64 reg;
> > >  
> > >  	/* Clear pmscr in case of early return */
> > >  	ctxt->sys_regs[PMSCR_EL1] = 0;
> > >  
> > > -	/* SPE present on this CPU? */
> > > -	if (!cpuid_feature_extract_unsigned_field(read_sysreg(id_aa64dfr0_el1),
> > > -						  ID_AA64DFR0_PMSVER_SHIFT))
> > > -		return;
> > > -
> > >  	/* Yes; is it owned by higher EL? */
> > >  	reg = read_sysreg_s(SYS_PMBIDR_EL1);
> > >  	if (reg & BIT(SYS_PMBIDR_EL1_P_SHIFT))
> > > @@ -142,7 +137,7 @@ __debug_save_spe_nvhe(struct kvm_cpu_context *ctxt, bool full_ctxt)
> > >  }
> > >  
> > >  static void __hyp_text
> > > -__debug_restore_spe_nvhe(struct kvm_cpu_context *ctxt, bool full_ctxt)
> > > +__debug_restore_spe_context(struct kvm_cpu_context *ctxt, bool full_ctxt)
> > >  {
> > >  	if (!ctxt->sys_regs[PMSCR_EL1])
> > >  		return;
> > > @@ -210,11 +205,14 @@ void __hyp_text __debug_restore_guest_context(struct kvm_vcpu *vcpu)
> > >  	struct kvm_guest_debug_arch *host_dbg;
> > >  	struct kvm_guest_debug_arch *guest_dbg;
> > >  
> > > +	host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
> > > +	guest_ctxt = &vcpu->arch.ctxt;
> > > +
> > > +	__debug_restore_spe_context(guest_ctxt, kvm_arm_spe_v1_ready(vcpu));
> > > +
> > >  	if (!(vcpu->arch.flags & KVM_ARM64_DEBUG_DIRTY))
> > >  		return;
> > >  
> > > -	host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
> > > -	guest_ctxt = &vcpu->arch.ctxt;
> > >  	host_dbg = &vcpu->arch.host_debug_state.regs;
> > >  	guest_dbg = kern_hyp_va(vcpu->arch.debug_ptr);
> > >  
> > > @@ -232,8 +230,7 @@ void __hyp_text __debug_restore_host_context(struct kvm_vcpu *vcpu)
> > >  	host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
> > >  	guest_ctxt = &vcpu->arch.ctxt;
> > >  
> > > -	if (!has_vhe())
> > > -		__debug_restore_spe_nvhe(host_ctxt, false);
> > > +	__debug_restore_spe_context(host_ctxt, kvm_arm_spe_v1_ready(vcpu));
> > 
> > So you now do an unconditional save/restore on the exit path for VHE as
> > well? Even if the host isn't using the SPE HW? That's not acceptable
> > as, in most cases, only the host /or/ the guest will use SPE. Here, you
> > put a measurable overhead on each exit.
> > 
> > If the host is not using SPE, then the restore/save should happen in
> > vcpu_load/vcpu_put. Only if the host is using SPE should you do
> > something in the run loop. Of course, this only applies to VHE and
> > non-VHE must switch eagerly.
> > 
> 
> On VHE where SPE is used in the guest only - we save/restore in vcpu_load/put.
> 
> On VHE where SPE is used in the host only - we save/restore in the run loop.
> 
> On VHE where SPE is used in guest and host - we save/restore in the run loop.
> 
> As the guest can't trace EL2 it doesn't matter if we restore guest SPE early
> in the vcpu_load/put functions. (I assume it doesn't matter that we restore
> an EL0/EL1 profiling buffer address at this point and enable tracing given
> that there is nothing to trace until entering the guest).
> 
> However the reason for moving save/restore to vcpu_load/put when the host is
> using SPE is to minimise the host EL2 black-out window.
> 
> 
> On nVHE we always save/restore in the run loop. For the SPE guest-use-only
> use-case we can't save/restore in vcpu_load/put - because the guest runs at
> the same ELx level as the host - and thus doing so would result in the guest
> tracing part of the host.
> 
> Though if we determine that (for nVHE systems) the guest SPE is profiling only
> EL0 - then we could also save/restore in vcpu_load/put where SPE is only being
> used in the guest.
> 
> Does that make sense, are my reasons correct?

Also I'm making the following assumptions:

 - We determine if the host or guest are using SPE by seeing if profiling
   (e.g. PMSCR_EL1) is enabled. That should determine *when* we restore as per
   my previous email.

 - I'm less sure on this: We should determine *what* we restore based on the
   availability of the SPE feature and not if it is being used - so for guest
   this is if the guest has the feature on the vcpu. For host this is based on
   the CPU feature registers.

   The downside of this is that if you have SPE support present on guest and
   host and they aren't being used, then you still save/restore upon entering/
   leaving a guest. The reason I feel this is needed is to prevent the issue
   where the host starts programming the SPE registers, but is preempted by
   KVM entering a guest, before it could enable host SPE. Thus when we enter the
   guest we don't save all the registers, we return to the host and the host
   SPE carries on from where it left off and enables it - yet because we didn't
   restore all the programmed registers it doesn't work.
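
Put as code, the two assumptions separate "is SPE in use" from "does the
full context need switching". The sketch below is illustrative only:
struct spe_ctx and both helpers are hypothetical names, and "enabled" is
taken to mean either of the PMSCR_EL1 E0SPE/E1SPE bits being set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PMSCR_EL1_E0SPE	(UINT64_C(1) << 0)	/* profiling enabled at EL0 */
#define PMSCR_EL1_E1SPE	(UINT64_C(1) << 1)	/* profiling enabled at EL1 */

struct spe_ctx {
	bool feature_present;	/* host: CPU feature regs; guest: vcpu feature */
	uint64_t pmscr_el1;	/* saved PMSCR_EL1 value */
};

/* Decides *when* to switch: is profiling actually enabled? */
static bool spe_profiling_enabled(const struct spe_ctx *c)
{
	assert(c);
	return (c->pmscr_el1 & (PMSCR_EL1_E0SPE | PMSCR_EL1_E1SPE)) != 0;
}

/*
 * Decides *what* to switch: the full register set is needed whenever the
 * feature is present, even if profiling has not been enabled yet, so that
 * a half-programmed host configuration survives a guest run.
 */
static bool spe_needs_full_switch(const struct spe_ctx *c)
{
	assert(c);
	return c->feature_present;
}
```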

Thanks,

Andrew Murray

> 
> Thanks,
> 
> Andrew Murray
> 
> 
> > >  
> > >  	if (!(vcpu->arch.flags & KVM_ARM64_DEBUG_DIRTY))
> > >  		return;
> > > @@ -249,19 +246,21 @@ void __hyp_text __debug_restore_host_context(struct kvm_vcpu *vcpu)
> > >  
> > >  void __hyp_text __debug_save_host_context(struct kvm_vcpu *vcpu)
> > >  {
> > > -	/*
> > > -	 * Non-VHE: Disable and flush SPE data generation
> > > -	 * VHE: The vcpu can run, but it can't hide.
> > > -	 */
> > >  	struct kvm_cpu_context *host_ctxt;
> > >  
> > >  	host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
> > > -	if (!has_vhe())
> > > -		__debug_save_spe_nvhe(host_ctxt, false);
> > > +	if (cpuid_feature_extract_unsigned_field(read_sysreg(id_aa64dfr0_el1),
> > > +						 ID_AA64DFR0_PMSVER_SHIFT))
> > > +		__debug_save_spe_context(host_ctxt, kvm_arm_spe_v1_ready(vcpu));
> > >  }
> > >  
> > >  void __hyp_text __debug_save_guest_context(struct kvm_vcpu *vcpu)
> > >  {
> > > +	bool kvm_spe_ready = kvm_arm_spe_v1_ready(vcpu);
> > > +
> > > +	/* SPE present on this vCPU? */
> > > +	if (kvm_spe_ready)
> > > +		__debug_save_spe_context(&vcpu->arch.ctxt, kvm_spe_ready);
> > >  }
> > >  
> > >  u32 __hyp_text __kvm_get_mdcr_el2(void)
> > > diff --git a/include/kvm/arm_spe.h b/include/kvm/arm_spe.h
> > > index 48d118fdb174..30c40b1bc385 100644
> > > --- a/include/kvm/arm_spe.h
> > > +++ b/include/kvm/arm_spe.h
> > > @@ -16,4 +16,10 @@ struct kvm_spe {
> > >  	bool irq_level;
> > >  };
> > >  
> > > +#ifdef CONFIG_KVM_ARM_SPE
> > > +#define kvm_arm_spe_v1_ready(v)		((v)->arch.spe.ready)
> > > +#else
> > > +#define kvm_arm_spe_v1_ready(v)		(false)
> > > +#endif /* CONFIG_KVM_ARM_SPE */
> > > +
> > >  #endif /* __ASM_ARM_KVM_SPE_H */
> > 
> > Thanks,
> > 
> > 	M.
> > -- 
> > Jazz is not dead. It just smells funny...
> _______________________________________________
> kvmarm mailing list
> kvmarm@lists.cs.columbia.edu
> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v2 09/18] arm64: KVM: enable conditional save/restore full SPE profiling buffer controls
  2020-01-10 10:54     ` Andrew Murray
  2020-01-10 11:04       ` Andrew Murray
@ 2020-01-10 11:18       ` Marc Zyngier
  2020-01-10 12:12         ` Andrew Murray
  1 sibling, 1 reply; 78+ messages in thread
From: Marc Zyngier @ 2020-01-10 11:18 UTC (permalink / raw)
  To: Andrew Murray
  Cc: Catalin Marinas, Mark Rutland, will, Sudeep Holla, kvm, kvmarm,
	linux-arm-kernel, linux-kernel

On 2020-01-10 10:54, Andrew Murray wrote:
> On Sat, Dec 21, 2019 at 02:13:25PM +0000, Marc Zyngier wrote:
>> On Fri, 20 Dec 2019 14:30:16 +0000
>> Andrew Murray <andrew.murray@arm.com> wrote:
>> 
>> [somehow managed not to do a reply all, re-sending]
>> 
>> > From: Sudeep Holla <sudeep.holla@arm.com>
>> >
>> > Now that we can save/restore the full SPE controls, we can enable it
>> > if SPE is set up and ready to use in KVM. It's supported in KVM only if
>> > all the CPUs in the system support SPE.
>> >
>> > However to support heterogeneous systems, we need to move the check if
>> > host supports SPE and do a partial save/restore.
>> 
>> No. Let's just not go down that path. For now, KVM on heterogeneous
>> systems do not get SPE. If SPE has been enabled on a guest and a CPU
>> comes up without SPE, this CPU should fail to boot (same as exposing a
>> feature to userspace).
>> 
>> >
>> > Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
>> > Signed-off-by: Andrew Murray <andrew.murray@arm.com>
>> > ---
>> >  arch/arm64/kvm/hyp/debug-sr.c | 33 ++++++++++++++++-----------------
>> >  include/kvm/arm_spe.h         |  6 ++++++
>> >  2 files changed, 22 insertions(+), 17 deletions(-)
>> >
>> > diff --git a/arch/arm64/kvm/hyp/debug-sr.c b/arch/arm64/kvm/hyp/debug-sr.c
>> > index 12429b212a3a..d8d857067e6d 100644
>> > --- a/arch/arm64/kvm/hyp/debug-sr.c
>> > +++ b/arch/arm64/kvm/hyp/debug-sr.c
>> > @@ -86,18 +86,13 @@
>> >  	}
>> >
>> >  static void __hyp_text
>> > -__debug_save_spe_nvhe(struct kvm_cpu_context *ctxt, bool full_ctxt)
>> > +__debug_save_spe_context(struct kvm_cpu_context *ctxt, bool full_ctxt)
>> >  {
>> >  	u64 reg;
>> >
>> >  	/* Clear pmscr in case of early return */
>> >  	ctxt->sys_regs[PMSCR_EL1] = 0;
>> >
>> > -	/* SPE present on this CPU? */
>> > -	if (!cpuid_feature_extract_unsigned_field(read_sysreg(id_aa64dfr0_el1),
>> > -						  ID_AA64DFR0_PMSVER_SHIFT))
>> > -		return;
>> > -
>> >  	/* Yes; is it owned by higher EL? */
>> >  	reg = read_sysreg_s(SYS_PMBIDR_EL1);
>> >  	if (reg & BIT(SYS_PMBIDR_EL1_P_SHIFT))
>> > @@ -142,7 +137,7 @@ __debug_save_spe_nvhe(struct kvm_cpu_context *ctxt, bool full_ctxt)
>> >  }
>> >
>> >  static void __hyp_text
>> > -__debug_restore_spe_nvhe(struct kvm_cpu_context *ctxt, bool full_ctxt)
>> > +__debug_restore_spe_context(struct kvm_cpu_context *ctxt, bool full_ctxt)
>> >  {
>> >  	if (!ctxt->sys_regs[PMSCR_EL1])
>> >  		return;
>> > @@ -210,11 +205,14 @@ void __hyp_text __debug_restore_guest_context(struct kvm_vcpu *vcpu)
>> >  	struct kvm_guest_debug_arch *host_dbg;
>> >  	struct kvm_guest_debug_arch *guest_dbg;
>> >
>> > +	host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
>> > +	guest_ctxt = &vcpu->arch.ctxt;
>> > +
>> > +	__debug_restore_spe_context(guest_ctxt, kvm_arm_spe_v1_ready(vcpu));
>> > +
>> >  	if (!(vcpu->arch.flags & KVM_ARM64_DEBUG_DIRTY))
>> >  		return;
>> >
>> > -	host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
>> > -	guest_ctxt = &vcpu->arch.ctxt;
>> >  	host_dbg = &vcpu->arch.host_debug_state.regs;
>> >  	guest_dbg = kern_hyp_va(vcpu->arch.debug_ptr);
>> >
>> > @@ -232,8 +230,7 @@ void __hyp_text __debug_restore_host_context(struct kvm_vcpu *vcpu)
>> >  	host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
>> >  	guest_ctxt = &vcpu->arch.ctxt;
>> >
>> > -	if (!has_vhe())
>> > -		__debug_restore_spe_nvhe(host_ctxt, false);
>> > +	__debug_restore_spe_context(host_ctxt, kvm_arm_spe_v1_ready(vcpu));
>> 
>> So you now do an unconditional save/restore on the exit path for VHE as
>> well? Even if the host isn't using the SPE HW? That's not acceptable
>> as, in most cases, only the host /or/ the guest will use SPE. Here, you
>> put a measurable overhead on each exit.
>> 
>> If the host is not using SPE, then the restore/save should happen in
>> vcpu_load/vcpu_put. Only if the host is using SPE should you do
>> something in the run loop. Of course, this only applies to VHE and
>> non-VHE must switch eagerly.
>> 
> 
> On VHE where SPE is used in the guest only - we save/restore in 
> vcpu_load/put.

Yes.

> On VHE where SPE is used in the host only - we save/restore in the run 
> loop.

Why? If only the host is using SPE, why should we do *anything at all*?

> On VHE where SPE is used in guest and host - we save/restore in the run 
> loop.
> 
> As the guest can't trace EL2 it doesn't matter if we restore guest SPE early
> in the vcpu_load/put functions. (I assume it doesn't matter that we restore
> an EL0/EL1 profiling buffer address at this point and enable tracing given
> that there is nothing to trace until entering the guest).

As long as you do it after the EL1 sysregs have been restored so that the SPE
HW has a valid context, we should be fine. Don't restore it before that point
though (you have no idea whether the SPE HW can do speculative memory accesses
that would use the wrong page tables).

> However the reason for moving save/restore to vcpu_load/put when the host is
> using SPE is to minimise the host EL2 black-out window.

You should move it to *the run loop* when both host and guest are using SPE.

> On nVHE we always save/restore in the run loop. For the SPE guest-use-only
> use-case we can't save/restore in vcpu_load/put - because the guest runs at
> the same ELx level as the host - and thus doing so would result in the guest
> tracing part of the host.

Not only. It would actively corrupt memory in the host by using the wrong
page tables.

> Though if we determine that (for nVHE systems) the guest SPE is profiling only
> EL0 - then we could also save/restore in vcpu_load/put where SPE is only being
> used in the guest.

Same as above: wrong MM context, speculation, potential memory corruption.

> Does that make sense, are my reasons correct?

Not entirely. I think you should use the following table:

VHE | Host-SPE | Guest-SPE | Switch location
 0  |     0    |     0     | none
 0  |     0    |     1     | run loop
 0  |     1    |     0     | run loop
 0  |     1    |     1     | run loop
 1  |     0    |     0     | none
 1  |     0    |     1     | load/put
 1  |     1    |     0     | none
 1  |     1    |     1     | run loop
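
The table above can be sketched as a small C helper. This is only an
illustration: the enum and function names are hypothetical and do not come
from the series, and the three inputs are assumed to be booleans that have
already been computed elsewhere.

```c
#include <stdbool.h>

/* Hypothetical names: where the SPE context switch would take place. */
enum spe_switch_loc {
	SPE_SWITCH_NONE,	/* nothing to save/restore */
	SPE_SWITCH_LOAD_PUT,	/* in vcpu_load()/vcpu_put() */
	SPE_SWITCH_RUN_LOOP,	/* on every guest entry/exit */
};

static enum spe_switch_loc
spe_switch_location(bool vhe, bool host_spe, bool guest_spe)
{
	/* nVHE: switch eagerly whenever either side uses SPE. */
	if (!vhe)
		return (host_spe || guest_spe) ? SPE_SWITCH_RUN_LOOP
					       : SPE_SWITCH_NONE;

	/* VHE, guest not using SPE: nothing to do, even if the host is. */
	if (!guest_spe)
		return SPE_SWITCH_NONE;

	/* VHE, guest using SPE: run loop only if the host uses it too. */
	return host_spe ? SPE_SWITCH_RUN_LOOP : SPE_SWITCH_LOAD_PUT;
}
```

Writing it this way makes the asymmetry visible: on VHE the host's use of SPE
only matters once the guest is using it as well.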

Thanks,

         M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v2 09/18] arm64: KVM: enable conditional save/restore full SPE profiling buffer controls
  2020-01-10 11:04       ` Andrew Murray
@ 2020-01-10 11:51         ` Marc Zyngier
  2020-01-10 12:12           ` Andrew Murray
  0 siblings, 1 reply; 78+ messages in thread
From: Marc Zyngier @ 2020-01-10 11:51 UTC (permalink / raw)
  To: Andrew Murray
  Cc: kvm, Catalin Marinas, linux-kernel, Sudeep Holla, will, kvmarm,
	linux-arm-kernel

On 2020-01-10 11:04, Andrew Murray wrote:
> On Fri, Jan 10, 2020 at 10:54:36AM +0000, Andrew Murray wrote:
>> On Sat, Dec 21, 2019 at 02:13:25PM +0000, Marc Zyngier wrote:
>> > On Fri, 20 Dec 2019 14:30:16 +0000
>> > Andrew Murray <andrew.murray@arm.com> wrote:
>> >
>> > [somehow managed not to do a reply all, re-sending]
>> >
>> > > From: Sudeep Holla <sudeep.holla@arm.com>
>> > >
>> > > Now that we can save/restore the full SPE controls, we can enable it
>> > > if SPE is set up and ready to use in KVM. It's supported in KVM only if
>> > > all the CPUs in the system support SPE.
>> > >
>> > > However to support heterogeneous systems, we need to move the check if
>> > > host supports SPE and do a partial save/restore.
>> >
>> > No. Let's just not go down that path. For now, KVM on heterogeneous
>> > systems do not get SPE. If SPE has been enabled on a guest and a CPU
>> > comes up without SPE, this CPU should fail to boot (same as exposing a
>> > feature to userspace).
>> >
>> > >
>> > > Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
>> > > Signed-off-by: Andrew Murray <andrew.murray@arm.com>
>> > > ---
>> > >  arch/arm64/kvm/hyp/debug-sr.c | 33 ++++++++++++++++-----------------
>> > >  include/kvm/arm_spe.h         |  6 ++++++
>> > >  2 files changed, 22 insertions(+), 17 deletions(-)
>> > >
>> > > diff --git a/arch/arm64/kvm/hyp/debug-sr.c b/arch/arm64/kvm/hyp/debug-sr.c
>> > > index 12429b212a3a..d8d857067e6d 100644
>> > > --- a/arch/arm64/kvm/hyp/debug-sr.c
>> > > +++ b/arch/arm64/kvm/hyp/debug-sr.c
>> > > @@ -86,18 +86,13 @@
>> > >  	}
>> > >
>> > >  static void __hyp_text
>> > > -__debug_save_spe_nvhe(struct kvm_cpu_context *ctxt, bool full_ctxt)
>> > > +__debug_save_spe_context(struct kvm_cpu_context *ctxt, bool full_ctxt)
>> > >  {
>> > >  	u64 reg;
>> > >
>> > >  	/* Clear pmscr in case of early return */
>> > >  	ctxt->sys_regs[PMSCR_EL1] = 0;
>> > >
>> > > -	/* SPE present on this CPU? */
>> > > -	if (!cpuid_feature_extract_unsigned_field(read_sysreg(id_aa64dfr0_el1),
>> > > -						  ID_AA64DFR0_PMSVER_SHIFT))
>> > > -		return;
>> > > -
>> > >  	/* Yes; is it owned by higher EL? */
>> > >  	reg = read_sysreg_s(SYS_PMBIDR_EL1);
>> > >  	if (reg & BIT(SYS_PMBIDR_EL1_P_SHIFT))
>> > > @@ -142,7 +137,7 @@ __debug_save_spe_nvhe(struct kvm_cpu_context *ctxt, bool full_ctxt)
>> > >  }
>> > >
>> > >  static void __hyp_text
>> > > -__debug_restore_spe_nvhe(struct kvm_cpu_context *ctxt, bool full_ctxt)
>> > > +__debug_restore_spe_context(struct kvm_cpu_context *ctxt, bool full_ctxt)
>> > >  {
>> > >  	if (!ctxt->sys_regs[PMSCR_EL1])
>> > >  		return;
>> > > @@ -210,11 +205,14 @@ void __hyp_text __debug_restore_guest_context(struct kvm_vcpu *vcpu)
>> > >  	struct kvm_guest_debug_arch *host_dbg;
>> > >  	struct kvm_guest_debug_arch *guest_dbg;
>> > >
>> > > +	host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
>> > > +	guest_ctxt = &vcpu->arch.ctxt;
>> > > +
>> > > +	__debug_restore_spe_context(guest_ctxt, kvm_arm_spe_v1_ready(vcpu));
>> > > +
>> > >  	if (!(vcpu->arch.flags & KVM_ARM64_DEBUG_DIRTY))
>> > >  		return;
>> > >
>> > > -	host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
>> > > -	guest_ctxt = &vcpu->arch.ctxt;
>> > >  	host_dbg = &vcpu->arch.host_debug_state.regs;
>> > >  	guest_dbg = kern_hyp_va(vcpu->arch.debug_ptr);
>> > >
>> > > @@ -232,8 +230,7 @@ void __hyp_text __debug_restore_host_context(struct kvm_vcpu *vcpu)
>> > >  	host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
>> > >  	guest_ctxt = &vcpu->arch.ctxt;
>> > >
>> > > -	if (!has_vhe())
>> > > -		__debug_restore_spe_nvhe(host_ctxt, false);
>> > > +	__debug_restore_spe_context(host_ctxt, kvm_arm_spe_v1_ready(vcpu));
>> >
>> > So you now do an unconditional save/restore on the exit path for VHE as
>> > well? Even if the host isn't using the SPE HW? That's not acceptable
>> > as, in most cases, only the host /or/ the guest will use SPE. Here, you
>> > put a measurable overhead on each exit.
>> >
>> > If the host is not using SPE, then the restore/save should happen in
>> > vcpu_load/vcpu_put. Only if the host is using SPE should you do
>> > something in the run loop. Of course, this only applies to VHE and
>> > non-VHE must switch eagerly.
>> >
>> 
>> On VHE where SPE is used in the guest only - we save/restore in 
>> vcpu_load/put.
>> 
>> On VHE where SPE is used in the host only - we save/restore in the run 
>> loop.
>> 
>> On VHE where SPE is used in guest and host - we save/restore in the 
>> run loop.
>> 
>> As the guest can't trace EL2 it doesn't matter if we restore guest SPE early
>> in the vcpu_load/put functions. (I assume it doesn't matter that we restore
>> an EL0/EL1 profiling buffer address at this point and enable tracing given
>> that there is nothing to trace until entering the guest).
>> 
>> However the reason for moving save/restore to vcpu_load/put when the host is
>> using SPE is to minimise the host EL2 black-out window.
>> 
>> 
>> On nVHE we always save/restore in the run loop. For the SPE guest-use-only
>> use-case we can't save/restore in vcpu_load/put - because the guest runs at
>> the same ELx level as the host - and thus doing so would result in the guest
>> tracing part of the host.
>> 
>> Though if we determine that (for nVHE systems) the guest SPE is profiling only
>> EL0 - then we could also save/restore in vcpu_load/put where SPE is only being
>> used in the guest.
>> 
>> Does that make sense, are my reasons correct?
> 
> Also I'm making the following assumptions:
> 
>  - We determine if the host or guest are using SPE by seeing if profiling
>    (e.g. PMSCR_EL1) is enabled. That should determine *when* we restore as per
>    my previous email.

Yes.

>  - I'm less sure on this: We should determine *what* we restore based on the
>    availability of the SPE feature and not if it is being used - so for guest
>    this is if the guest has the feature on the vcpu. For host this is based on
>    the CPU feature registers.

As long as the guest's feature is conditioned on the HW being present *and*
that you're running on a CPU that has the HW.
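
A minimal sketch of that condition, assuming hypothetical helper names.
PMSVer is the 4-bit field at bits [35:32] of ID_AA64DFR0_EL1; the register
value is passed in as a parameter rather than read with an MRS, so that the
snippet builds on any machine.

```c
#include <stdbool.h>
#include <stdint.h>

/* PMSVer occupies bits [35:32] of ID_AA64DFR0_EL1; non-zero means SPE. */
#define PMSVER_SHIFT	32
#define ID_FIELD_MASK	0xfULL

/* Extract an unsigned 4-bit ID register field starting at 'shift'. */
static unsigned int id_field(uint64_t reg, unsigned int shift)
{
	return (unsigned int)((reg >> shift) & ID_FIELD_MASK);
}

/*
 * Hypothetical helper: the guest may use SPE only if the feature was
 * enabled on the vcpu *and* the CPU we are currently running on
 * implements SPE. 'id_aa64dfr0' stands in for the value read on that CPU.
 */
static bool guest_spe_usable(bool vcpu_has_spe, uint64_t id_aa64dfr0)
{
	return vcpu_has_spe && id_field(id_aa64dfr0, PMSVER_SHIFT) != 0;
}
```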

>    The downside of this is that if you have SPE support present on guest and
>    host and they aren't being used, then you still save/restore upon entering/
>    leaving a guest. The reason I feel this is needed is to prevent the issue
>    where the host starts programming the SPE registers, but is preempted by
>    KVM entering a guest, before it could enable host SPE. Thus when we enter the
>    guest we don't save all the registers, we return to the host and the host
>    SPE carries on from where it left off and enables it - yet because we didn't
>    restore all the programmed registers it doesn't work.

Saving the host registers is never optional if they are shared with the guest.

         M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v2 09/18] arm64: KVM: enable conditional save/restore full SPE profiling buffer controls
  2020-01-10 11:18       ` Marc Zyngier
@ 2020-01-10 12:12         ` Andrew Murray
  2020-01-10 13:34           ` Marc Zyngier
  0 siblings, 1 reply; 78+ messages in thread
From: Andrew Murray @ 2020-01-10 12:12 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Catalin Marinas, Mark Rutland, will, Sudeep Holla, kvm, kvmarm,
	linux-arm-kernel, linux-kernel

On Fri, Jan 10, 2020 at 11:18:48AM +0000, Marc Zyngier wrote:
> On 2020-01-10 10:54, Andrew Murray wrote:
> > On Sat, Dec 21, 2019 at 02:13:25PM +0000, Marc Zyngier wrote:
> > > On Fri, 20 Dec 2019 14:30:16 +0000
> > > Andrew Murray <andrew.murray@arm.com> wrote:
> > > 
> > > [somehow managed not to do a reply all, re-sending]
> > > 
> > > > From: Sudeep Holla <sudeep.holla@arm.com>
> > > >
> > > > Now that we can save/restore the full SPE controls, we can enable it
> > > > if SPE is set up and ready to use in KVM. It's supported in KVM only if
> > > > all the CPUs in the system support SPE.
> > > >
> > > > However to support heterogeneous systems, we need to move the check if
> > > > host supports SPE and do a partial save/restore.
> > > 
> > > No. Let's just not go down that path. For now, KVM on heterogeneous
> > > systems do not get SPE. If SPE has been enabled on a guest and a CPU
> > > comes up without SPE, this CPU should fail to boot (same as exposing a
> > > feature to userspace).
> > > 
> > > >
> > > > Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
> > > > Signed-off-by: Andrew Murray <andrew.murray@arm.com>
> > > > ---
> > > >  arch/arm64/kvm/hyp/debug-sr.c | 33 ++++++++++++++++-----------------
> > > >  include/kvm/arm_spe.h         |  6 ++++++
> > > >  2 files changed, 22 insertions(+), 17 deletions(-)
> > > >
> > > > diff --git a/arch/arm64/kvm/hyp/debug-sr.c b/arch/arm64/kvm/hyp/debug-sr.c
> > > > index 12429b212a3a..d8d857067e6d 100644
> > > > --- a/arch/arm64/kvm/hyp/debug-sr.c
> > > > +++ b/arch/arm64/kvm/hyp/debug-sr.c
> > > > @@ -86,18 +86,13 @@
> > > >  	}
> > > >
> > > >  static void __hyp_text
> > > > -__debug_save_spe_nvhe(struct kvm_cpu_context *ctxt, bool full_ctxt)
> > > > +__debug_save_spe_context(struct kvm_cpu_context *ctxt, bool full_ctxt)
> > > >  {
> > > >  	u64 reg;
> > > >
> > > >  	/* Clear pmscr in case of early return */
> > > >  	ctxt->sys_regs[PMSCR_EL1] = 0;
> > > >
> > > > -	/* SPE present on this CPU? */
> > > > -	if (!cpuid_feature_extract_unsigned_field(read_sysreg(id_aa64dfr0_el1),
> > > > -						  ID_AA64DFR0_PMSVER_SHIFT))
> > > > -		return;
> > > > -
> > > >  	/* Yes; is it owned by higher EL? */
> > > >  	reg = read_sysreg_s(SYS_PMBIDR_EL1);
> > > >  	if (reg & BIT(SYS_PMBIDR_EL1_P_SHIFT))
> > > > @@ -142,7 +137,7 @@ __debug_save_spe_nvhe(struct kvm_cpu_context *ctxt, bool full_ctxt)
> > > >  }
> > > >
> > > >  static void __hyp_text
> > > > -__debug_restore_spe_nvhe(struct kvm_cpu_context *ctxt, bool full_ctxt)
> > > > +__debug_restore_spe_context(struct kvm_cpu_context *ctxt, bool full_ctxt)
> > > >  {
> > > >  	if (!ctxt->sys_regs[PMSCR_EL1])
> > > >  		return;
> > > > @@ -210,11 +205,14 @@ void __hyp_text __debug_restore_guest_context(struct kvm_vcpu *vcpu)
> > > >  	struct kvm_guest_debug_arch *host_dbg;
> > > >  	struct kvm_guest_debug_arch *guest_dbg;
> > > >
> > > > +	host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
> > > > +	guest_ctxt = &vcpu->arch.ctxt;
> > > > +
> > > > +	__debug_restore_spe_context(guest_ctxt, kvm_arm_spe_v1_ready(vcpu));
> > > > +
> > > >  	if (!(vcpu->arch.flags & KVM_ARM64_DEBUG_DIRTY))
> > > >  		return;
> > > >
> > > > -	host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
> > > > -	guest_ctxt = &vcpu->arch.ctxt;
> > > >  	host_dbg = &vcpu->arch.host_debug_state.regs;
> > > >  	guest_dbg = kern_hyp_va(vcpu->arch.debug_ptr);
> > > >
> > > > @@ -232,8 +230,7 @@ void __hyp_text __debug_restore_host_context(struct kvm_vcpu *vcpu)
> > > >  	host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
> > > >  	guest_ctxt = &vcpu->arch.ctxt;
> > > >
> > > > -	if (!has_vhe())
> > > > -		__debug_restore_spe_nvhe(host_ctxt, false);
> > > > +	__debug_restore_spe_context(host_ctxt, kvm_arm_spe_v1_ready(vcpu));
> > > 
> > > So you now do an unconditional save/restore on the exit path for VHE as
> > > well? Even if the host isn't using the SPE HW? That's not acceptable
> > > as, in most cases, only the host /or/ the guest will use SPE. Here, you
> > > put a measurable overhead on each exit.
> > > 
> > > If the host is not using SPE, then the restore/save should happen in
> > > vcpu_load/vcpu_put. Only if the host is using SPE should you do
> > > something in the run loop. Of course, this only applies to VHE and
> > > non-VHE must switch eagerly.
> > > 
> > 
> > On VHE where SPE is used in the guest only - we save/restore in
> > vcpu_load/put.
> 
> Yes.
> 
> > On VHE where SPE is used in the host only - we save/restore in the run
> > loop.
> 
> Why? If only the host is using SPE, why should we do *anything at all*?

Oh yeah of course, we trap them in this case.

(Do I understand correctly that we don't/can't trap them for nVHE? - and so
we should save/restore them for this use-case in nVHE)


> 
> > On VHE where SPE is used in guest and host - we save/restore in the run
> > loop.
> > 
> > As the guest can't trace EL2 it doesn't matter if we restore guest SPE early
> > in the vcpu_load/put functions. (I assume it doesn't matter that we restore
> > an EL0/EL1 profiling buffer address at this point and enable tracing given
> > that there is nothing to trace until entering the guest).
> 
> As long as you do it after the EL1 sysregs have been restored so that the SPE
> HW has a valid context, we should be fine. Don't restore it before that point
> though (you have no idea whether the SPE HW can do speculative memory accesses
> that would use the wrong page tables).

Right, so don't enable tracing until SPE has a valid context. I understand
that to mean at least the SPE buffer address registers (PMBPTR, PMBLIMITR)
in the right context with respect to the E2PB bits (translation regime)
and having those tables mapped in (which I think relate to the __activateX,
__sysreg_restore_guest_stateX type of calls in kvm_vcpu_run_X right?).

I think that means we can restore the registers no earlier than vcpu_load/put,
but we can't re-enable tracing (PMSCR) any earlier than just before
__set_guest_arch_workaround_state. I think that applies to both VHE and nVHE?
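
If that reading is right, the restore splits into two phases. The sketch
below models the ordering with plain structs. Every name in it is
hypothetical, the "hardware" is a mock that ignores details such as the
enable bit in PMBLIMITR, and it only illustrates the ordering being asked
about, not how the series implements it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mock of the SPE registers relevant to the ordering question. */
struct spe_regs {
	uint64_t pmbptr;	/* profiling buffer write pointer */
	uint64_t pmblimitr;	/* profiling buffer limit */
	uint64_t pmscr;		/* sampling control; 0 = sampling disabled */
};

/* Mock CPU: live SPE registers plus whether the guest MM context is set up. */
struct mock_cpu {
	struct spe_regs hw;
	bool guest_ctxt_valid;
};

/* Phase 1 (e.g. at vcpu_load): program the buffer, keep sampling off. */
static void spe_restore_buffer(struct mock_cpu *cpu,
			       const struct spe_regs *saved)
{
	cpu->hw.pmscr = 0;
	cpu->hw.pmbptr = saved->pmbptr;
	cpu->hw.pmblimitr = saved->pmblimitr;
}

/* Phase 2 (late in the run loop): enable only once the context is valid. */
static bool spe_enable_profiling(struct mock_cpu *cpu,
				 const struct spe_regs *saved)
{
	if (!cpu->guest_ctxt_valid)
		return false;	/* too early: wrong page tables */

	cpu->hw.pmscr = saved->pmscr;
	return true;
}
```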

> 
> > However the reason for moving save/restore to vcpu_load/put when the host is
> > using SPE is to minimise the host EL2 black-out window.
> 
> You should move it to *the run loop* when both host and guest are using SPE.
> 
> > On nVHE we always save/restore in the run loop. For the SPE guest-use-only
> > use-case we can't save/restore in vcpu_load/put - because the guest runs at
> > the same ELx level as the host - and thus doing so would result in the guest
> > tracing part of the host.
> 
> Not only. It would actively corrupt memory in the host by using the wrong
> page tables.
> 
> > Though if we determine that (for nVHE systems) the guest SPE is profiling only
> > EL0 - then we could also save/restore in vcpu_load/put where SPE is only being
> > used in the guest.
> 
> Same as above: wrong MM context, speculation, potential memory corruption.
> 
> > Does that make sense, are my reasons correct?
> 
> Not entirely. I think you should use the following table:
> 
> VHE | Host-SPE | Guest-SPE | Switch location
>  0  |     0    |     0     | none
>  0  |     0    |     1     | run loop
>  0  |     1    |     0     | run loop
>  0  |     1    |     1     | run loop
>  1  |     0    |     0     | none
>  1  |     0    |     1     | load/put
>  1  |     1    |     0     | none
>  1  |     1    |     1     | run loop

Thanks,

Andrew Murray

> 
> Thanks,
> 
>         M.
> -- 
> Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v2 09/18] arm64: KVM: enable conditional save/restore full SPE profiling buffer controls
  2020-01-10 11:51         ` Marc Zyngier
@ 2020-01-10 12:12           ` Andrew Murray
  0 siblings, 0 replies; 78+ messages in thread
From: Andrew Murray @ 2020-01-10 12:12 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvm, Catalin Marinas, linux-kernel, Sudeep Holla, will, kvmarm,
	linux-arm-kernel

On Fri, Jan 10, 2020 at 11:51:39AM +0000, Marc Zyngier wrote:
> On 2020-01-10 11:04, Andrew Murray wrote:
> > On Fri, Jan 10, 2020 at 10:54:36AM +0000, Andrew Murray wrote:
> > > On Sat, Dec 21, 2019 at 02:13:25PM +0000, Marc Zyngier wrote:
> > > > On Fri, 20 Dec 2019 14:30:16 +0000
> > > > Andrew Murray <andrew.murray@arm.com> wrote:
> > > >
> > > > [somehow managed not to do a reply all, re-sending]
> > > >
> > > > > From: Sudeep Holla <sudeep.holla@arm.com>
> > > > >
> > > > > Now that we can save/restore the full SPE controls, we can enable it
> > > > > > if SPE is set up and ready to use in KVM. It's supported in KVM only if
> > > > > > all the CPUs in the system support SPE.
> > > > > >
> > > > > > However to support heterogeneous systems, we need to move the check if
> > > > > host supports SPE and do a partial save/restore.
> > > >
> > > > No. Let's just not go down that path. For now, KVM on heterogeneous
> > > > systems do not get SPE. If SPE has been enabled on a guest and a CPU
> > > > comes up without SPE, this CPU should fail to boot (same as exposing a
> > > > feature to userspace).
> > > >
> > > > >
> > > > > Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
> > > > > Signed-off-by: Andrew Murray <andrew.murray@arm.com>
> > > > > ---
> > > > >  arch/arm64/kvm/hyp/debug-sr.c | 33 ++++++++++++++++-----------------
> > > > >  include/kvm/arm_spe.h         |  6 ++++++
> > > > >  2 files changed, 22 insertions(+), 17 deletions(-)
> > > > >
> > > > > diff --git a/arch/arm64/kvm/hyp/debug-sr.c b/arch/arm64/kvm/hyp/debug-sr.c
> > > > > index 12429b212a3a..d8d857067e6d 100644
> > > > > --- a/arch/arm64/kvm/hyp/debug-sr.c
> > > > > +++ b/arch/arm64/kvm/hyp/debug-sr.c
> > > > > @@ -86,18 +86,13 @@
> > > > >  	}
> > > > >
> > > > >  static void __hyp_text
> > > > > -__debug_save_spe_nvhe(struct kvm_cpu_context *ctxt, bool full_ctxt)
> > > > > +__debug_save_spe_context(struct kvm_cpu_context *ctxt, bool full_ctxt)
> > > > >  {
> > > > >  	u64 reg;
> > > > >
> > > > >  	/* Clear pmscr in case of early return */
> > > > >  	ctxt->sys_regs[PMSCR_EL1] = 0;
> > > > >
> > > > > -	/* SPE present on this CPU? */
> > > > > -	if (!cpuid_feature_extract_unsigned_field(read_sysreg(id_aa64dfr0_el1),
> > > > > -						  ID_AA64DFR0_PMSVER_SHIFT))
> > > > > -		return;
> > > > > -
> > > > >  	/* Yes; is it owned by higher EL? */
> > > > >  	reg = read_sysreg_s(SYS_PMBIDR_EL1);
> > > > >  	if (reg & BIT(SYS_PMBIDR_EL1_P_SHIFT))
> > > > > @@ -142,7 +137,7 @@ __debug_save_spe_nvhe(struct kvm_cpu_context *ctxt, bool full_ctxt)
> > > > >  }
> > > > >
> > > > >  static void __hyp_text
> > > > > -__debug_restore_spe_nvhe(struct kvm_cpu_context *ctxt, bool full_ctxt)
> > > > > +__debug_restore_spe_context(struct kvm_cpu_context *ctxt, bool full_ctxt)
> > > > >  {
> > > > >  	if (!ctxt->sys_regs[PMSCR_EL1])
> > > > >  		return;
> > > > > @@ -210,11 +205,14 @@ void __hyp_text __debug_restore_guest_context(struct kvm_vcpu *vcpu)
> > > > >  	struct kvm_guest_debug_arch *host_dbg;
> > > > >  	struct kvm_guest_debug_arch *guest_dbg;
> > > > >
> > > > > +	host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
> > > > > +	guest_ctxt = &vcpu->arch.ctxt;
> > > > > +
> > > > > +	__debug_restore_spe_context(guest_ctxt, kvm_arm_spe_v1_ready(vcpu));
> > > > > +
> > > > >  	if (!(vcpu->arch.flags & KVM_ARM64_DEBUG_DIRTY))
> > > > >  		return;
> > > > >
> > > > > -	host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
> > > > > -	guest_ctxt = &vcpu->arch.ctxt;
> > > > >  	host_dbg = &vcpu->arch.host_debug_state.regs;
> > > > >  	guest_dbg = kern_hyp_va(vcpu->arch.debug_ptr);
> > > > >
> > > > > @@ -232,8 +230,7 @@ void __hyp_text __debug_restore_host_context(struct kvm_vcpu *vcpu)
> > > > >  	host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
> > > > >  	guest_ctxt = &vcpu->arch.ctxt;
> > > > >
> > > > > -	if (!has_vhe())
> > > > > -		__debug_restore_spe_nvhe(host_ctxt, false);
> > > > > +	__debug_restore_spe_context(host_ctxt, kvm_arm_spe_v1_ready(vcpu));
> > > >
> > > > So you now do an unconditional save/restore on the exit path for VHE as
> > > > well? Even if the host isn't using the SPE HW? That's not acceptable
> > > > as, in most cases, only the host /or/ the guest will use SPE. Here, you
> > > > put a measurable overhead on each exit.
> > > >
> > > > If the host is not using SPE, then the restore/save should happen in
> > > > vcpu_load/vcpu_put. Only if the host is using SPE should you do
> > > > something in the run loop. Of course, this only applies to VHE and
> > > > non-VHE must switch eagerly.
> > > >
> > > 
> > > On VHE where SPE is used in the guest only - we save/restore in
> > > vcpu_load/put.
> > > 
> > > On VHE where SPE is used in the host only - we save/restore in the
> > > run loop.
> > > 
> > > On VHE where SPE is used in guest and host - we save/restore in the
> > > run loop.
> > > 
> > > As the guest can't trace EL2 it doesn't matter if we restore guest SPE early
> > > in the vcpu_load/put functions. (I assume it doesn't matter that we restore
> > > an EL0/EL1 profiling buffer address at this point and enable tracing given
> > > that there is nothing to trace until entering the guest).
> > > 
> > > However the reason for moving save/restore to vcpu_load/put when the host is
> > > using SPE is to minimise the host EL2 black-out window.
> > > 
> > > 
> > > On nVHE we always save/restore in the run loop. For the SPE guest-use-only
> > > use-case we can't save/restore in vcpu_load/put - because the guest runs at
> > > the same ELx level as the host - and thus doing so would result in the guest
> > > tracing part of the host.
> > > 
> > > Though if we determine that (for nVHE systems) the guest SPE is profiling only
> > > EL0 - then we could also save/restore in vcpu_load/put where SPE is only being
> > > used in the guest.
> > > 
> > > Does that make sense, are my reasons correct?
> > 
> > Also I'm making the following assumptions:
> > 
> >  - We determine if the host or guest are using SPE by seeing if profiling
> >    (e.g. PMSCR_EL1) is enabled. That should determine *when* we restore as per
> >    my previous email.
> 
> Yes.
> 
> >  - I'm less sure on this: We should determine *what* we restore based on the
> >    availability of the SPE feature and not if it is being used - so for guest
> >    this is if the guest has the feature on the vcpu. For host this is based on
> >    the CPU feature registers.
> 
> As long as the guest's feature is conditioned on the HW being present *and*
> that you're running on a CPU that has the HW.

Yes that makes sense.


> 
> >    The downside of this is that if you have SPE support present on guest and
> >    host and they aren't being used, then you still save/restore upon entering/
> >    leaving a guest. The reason I feel this is needed is to prevent the issue
> >    where the host starts programming the SPE registers, but is preempted by
> >    KVM entering a guest, before it could enable host SPE. Thus when we enter the
> >    guest we don't save all the registers, we return to the host and the host
> >    SPE carries on from where it left off and enables it - yet because we didn't
> >    restore all the programmed registers it doesn't work.
> 
> Saving the host registers is never optional if they are shared with the
> guest.

That makes me feel better :)

Thanks,

Andrew Murray

> 
>         M.
> -- 
> Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v2 09/18] arm64: KVM: enable conditional save/restore full SPE profiling buffer controls
  2020-01-10 12:12         ` Andrew Murray
@ 2020-01-10 13:34           ` Marc Zyngier
  0 siblings, 0 replies; 78+ messages in thread
From: Marc Zyngier @ 2020-01-10 13:34 UTC (permalink / raw)
  To: Andrew Murray
  Cc: Catalin Marinas, Mark Rutland, will, Sudeep Holla, kvm, kvmarm,
	linux-arm-kernel, linux-kernel

On 2020-01-10 12:12, Andrew Murray wrote:
> On Fri, Jan 10, 2020 at 11:18:48AM +0000, Marc Zyngier wrote:
>> On 2020-01-10 10:54, Andrew Murray wrote:
>> > On Sat, Dec 21, 2019 at 02:13:25PM +0000, Marc Zyngier wrote:
>> > > On Fri, 20 Dec 2019 14:30:16 +0000
>> > > Andrew Murray <andrew.murray@arm.com> wrote:
>> > >
>> > > [somehow managed not to do a reply all, re-sending]
>> > >
>> > > > From: Sudeep Holla <sudeep.holla@arm.com>
>> > > >
>> > > > Now that we can save/restore the full SPE controls, we can enable it
>> > > > if SPE is set up and ready to use in KVM. It's supported in KVM only if
>> > > > all the CPUs in the system support SPE.
>> > > >
>> > > > However to support heterogeneous systems, we need to move the check if
>> > > > host supports SPE and do a partial save/restore.
>> > >
>> > > No. Let's just not go down that path. For now, KVM on heterogeneous
>> > > systems does not get SPE. If SPE has been enabled on a guest and a CPU
>> > > comes up without SPE, this CPU should fail to boot (same as exposing a
>> > > feature to userspace).
>> > >
>> > > >
>> > > > Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
>> > > > Signed-off-by: Andrew Murray <andrew.murray@arm.com>
>> > > > ---
>> > > >  arch/arm64/kvm/hyp/debug-sr.c | 33 ++++++++++++++++-----------------
>> > > >  include/kvm/arm_spe.h         |  6 ++++++
>> > > >  2 files changed, 22 insertions(+), 17 deletions(-)
>> > > >
>> > > > diff --git a/arch/arm64/kvm/hyp/debug-sr.c b/arch/arm64/kvm/hyp/debug-sr.c
>> > > > index 12429b212a3a..d8d857067e6d 100644
>> > > > --- a/arch/arm64/kvm/hyp/debug-sr.c
>> > > > +++ b/arch/arm64/kvm/hyp/debug-sr.c
>> > > > @@ -86,18 +86,13 @@
>> > > >  	}
>> > > >
>> > > >  static void __hyp_text
>> > > > -__debug_save_spe_nvhe(struct kvm_cpu_context *ctxt, bool full_ctxt)
>> > > > +__debug_save_spe_context(struct kvm_cpu_context *ctxt, bool full_ctxt)
>> > > >  {
>> > > >  	u64 reg;
>> > > >
>> > > >  	/* Clear pmscr in case of early return */
>> > > >  	ctxt->sys_regs[PMSCR_EL1] = 0;
>> > > >
>> > > > -	/* SPE present on this CPU? */
>> > > > -	if (!cpuid_feature_extract_unsigned_field(read_sysreg(id_aa64dfr0_el1),
>> > > > -						  ID_AA64DFR0_PMSVER_SHIFT))
>> > > > -		return;
>> > > > -
>> > > >  	/* Yes; is it owned by higher EL? */
>> > > >  	reg = read_sysreg_s(SYS_PMBIDR_EL1);
>> > > >  	if (reg & BIT(SYS_PMBIDR_EL1_P_SHIFT))
>> > > > @@ -142,7 +137,7 @@ __debug_save_spe_nvhe(struct kvm_cpu_context *ctxt, bool full_ctxt)
>> > > >  }
>> > > >
>> > > >  static void __hyp_text
>> > > > -__debug_restore_spe_nvhe(struct kvm_cpu_context *ctxt, bool full_ctxt)
>> > > > +__debug_restore_spe_context(struct kvm_cpu_context *ctxt, bool full_ctxt)
>> > > >  {
>> > > >  	if (!ctxt->sys_regs[PMSCR_EL1])
>> > > >  		return;
>> > > > @@ -210,11 +205,14 @@ void __hyp_text __debug_restore_guest_context(struct kvm_vcpu *vcpu)
>> > > >  	struct kvm_guest_debug_arch *host_dbg;
>> > > >  	struct kvm_guest_debug_arch *guest_dbg;
>> > > >
>> > > > +	host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
>> > > > +	guest_ctxt = &vcpu->arch.ctxt;
>> > > > +
>> > > > +	__debug_restore_spe_context(guest_ctxt, kvm_arm_spe_v1_ready(vcpu));
>> > > > +
>> > > >  	if (!(vcpu->arch.flags & KVM_ARM64_DEBUG_DIRTY))
>> > > >  		return;
>> > > >
>> > > > -	host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
>> > > > -	guest_ctxt = &vcpu->arch.ctxt;
>> > > >  	host_dbg = &vcpu->arch.host_debug_state.regs;
>> > > >  	guest_dbg = kern_hyp_va(vcpu->arch.debug_ptr);
>> > > >
>> > > > @@ -232,8 +230,7 @@ void __hyp_text __debug_restore_host_context(struct kvm_vcpu *vcpu)
>> > > >  	host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
>> > > >  	guest_ctxt = &vcpu->arch.ctxt;
>> > > >
>> > > > -	if (!has_vhe())
>> > > > -		__debug_restore_spe_nvhe(host_ctxt, false);
>> > > > +	__debug_restore_spe_context(host_ctxt, kvm_arm_spe_v1_ready(vcpu));
>> > >
>> > > So you now do an unconditional save/restore on the exit path for VHE
>> > > as well? Even if the host isn't using the SPE HW? That's not acceptable
>> > > as, in most cases, only the host /or/ the guest will use SPE. Here,
>> > > you put a measurable overhead on each exit.
>> > >
>> > > If the host is not using SPE, then the restore/save should happen in
>> > > vcpu_load/vcpu_put. Only if the host is using SPE should you do
>> > > something in the run loop. Of course, this only applies to VHE and
>> > > non-VHE must switch eagerly.
>> > >
>> >
>> > On VHE where SPE is used in the guest only - we save/restore in
>> > vcpu_load/put.
>> 
>> Yes.
>> 
>> > On VHE where SPE is used in the host only - we save/restore in the run
>> > loop.
>> 
>> Why? If only the host is using SPE, why should we do *anything at all*?
> 
> Oh yeah of course, we trap them in this case.
> 
> (Do I understand correctly that we don't/can't trap them for nVHE? - and so
> we should save/restore them for this use-case in nVHE)

We can always trap. Otherwise we wouldn't be able to hide the feature
from the guest.

>> > On VHE where SPE is used in guest and host - we save/restore in the run
>> > loop.
>> >
>> > As the guest can't trace EL2 it doesn't matter if we restore guest SPE
>> > early in the vcpu_load/put functions. (I assume it doesn't matter that we
>> > restore an EL0/EL1 profiling buffer address at this point and enable
>> > tracing given that there is nothing to trace until entering the guest).
>> 
>> As long as you do it after the EL1 sysregs have been restored so that the
>> SPE HW has a valid context, we should be fine. Don't restore it before that
>> point though (you have no idea whether the SPE HW can do speculative memory
>> accesses that would use the wrong page tables).
> 
> Right, so don't enable tracing until SPE has a valid context. I understand
> that to mean at least the SPE buffer address registers (PMBPTR, PMBLIMITR)
> in the right context with respect to the E2PB bits (translation regime)
> and having those tables mapped in (which I think relate to the __activateX,
> __sysreg_restore_guest_stateX type of calls in kvm_vcpu_run_X right?).

The full MM context has to be in place before you can do anything. This means
at least TTBR*_EL1, TCR_EL1 and co. But maybe this note in the SPE architecture
document would allow us to relax things:

"The Statistical Profiling Extension is always disabled if the owning Exception
level is a lower Exception level than the current Exception level."

So as long as you restore the guest state from EL2, SPE should be disabled.
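
For reference, the VHE cases discussed above can be written down as a small
decision function. This is only a sketch under stated assumptions: the enum,
its values and spe_switch_point_vhe() are hypothetical names that do not exist
in the patch or in the kernel, and non-VHE (which must switch eagerly) is
deliberately left out.

```c
#include <stdbool.h>

/* Hypothetical names, for illustration only; not taken from the series. */
enum spe_switch_point {
	SPE_SWITCH_NONE,	/* nothing to switch; guest accesses trap to EL2 */
	SPE_SWITCH_LOAD_PUT,	/* save/restore in vcpu_load()/vcpu_put() */
	SPE_SWITCH_RUN_LOOP,	/* save/restore around each guest entry/exit */
};

/*
 * VHE only. Returns where the SPE context would be switched, following
 * the three cases in this thread:
 *   guest only     -> vcpu_load/vcpu_put
 *   host only      -> nothing at all (SPE stays hidden behind traps)
 *   host and guest -> run loop
 */
enum spe_switch_point
spe_switch_point_vhe(bool host_uses_spe, bool guest_uses_spe)
{
	if (!guest_uses_spe)
		return SPE_SWITCH_NONE;
	if (!host_uses_spe)
		return SPE_SWITCH_LOAD_PUT;
	return SPE_SWITCH_RUN_LOOP;
}
```

The point of the split is cost: only the "both" case pays for a save/restore
on every exit, which is the overhead objected to earlier in the thread.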

> I think that means we can restore the registers no earlier than vcpu_load/put
> but we can't re-enable the tracing (PMSCR) any earlier than just before
> __set_guest_arch_workaround_state. I think that applies to both VHE and nVHE?

I'm sorry, but I don't understand what you mean.

         M.
-- 
Jazz is not dead. It just smells funny...


* Re: [PATCH v2 10/18] arm64: KVM/debug: use EL1&0 stage 1 translation regime
  2019-12-22 10:34   ` Marc Zyngier
  2019-12-24 11:11     ` Andrew Murray
@ 2020-01-13 16:31     ` Andrew Murray
  2020-01-15 14:03       ` Marc Zyngier
  1 sibling, 1 reply; 78+ messages in thread
From: Andrew Murray @ 2020-01-13 16:31 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Catalin Marinas, Will Deacon, kvm, linux-kernel, Sudeep Holla,
	kvmarm, linux-arm-kernel

On Sun, Dec 22, 2019 at 10:34:55AM +0000, Marc Zyngier wrote:
> On Fri, 20 Dec 2019 14:30:17 +0000,
> Andrew Murray <andrew.murray@arm.com> wrote:
> > 
> > From: Sudeep Holla <sudeep.holla@arm.com>
> > 
> > Now that we have all the save/restore mechanism in place, let's enable
> > the translation regime used by buffer from EL2 stage 1 to EL1 stage 1
> > on VHE systems.
> > 
> > Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
> > [ Reword commit, don't trap to EL2 ]
> 
> Not trapping to EL2 for the case where we don't allow SPE in the
> guest is not acceptable.
> 
> > Signed-off-by: Andrew Murray <andrew.murray@arm.com>
> > ---
> >  arch/arm64/kvm/hyp/switch.c | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> > index 67b7c160f65b..6c153b79829b 100644
> > --- a/arch/arm64/kvm/hyp/switch.c
> > +++ b/arch/arm64/kvm/hyp/switch.c
> > @@ -100,6 +100,7 @@ static void activate_traps_vhe(struct kvm_vcpu *vcpu)
> >  
> >  	write_sysreg(val, cpacr_el1);
> >  
> > +	write_sysreg(vcpu->arch.mdcr_el2 | 3 << MDCR_EL2_E2PB_SHIFT, mdcr_el2);
> >  	write_sysreg(kvm_get_hyp_vector(), vbar_el1);
> >  }
> >  NOKPROBE_SYMBOL(activate_traps_vhe);
> > @@ -117,6 +118,7 @@ static void __hyp_text __activate_traps_nvhe(struct kvm_vcpu *vcpu)
> >  		__activate_traps_fpsimd32(vcpu);
> >  	}
> >  
> > +	write_sysreg(vcpu->arch.mdcr_el2 | 3 << MDCR_EL2_E2PB_SHIFT, mdcr_el2);
> 
> There is a _MASK macro that can replace this '3', and is in keeping
> with the rest of the code.
> 
> It still remains that it looks like the wrong place to do this, and
> vcpu_load seems much better. Why should you write to mdcr_el2 on each
> entry to the guest, since you know whether it has SPE enabled at the
> point where it gets scheduled?

For nVHE, the only reason we'd want to change E2PB on entry/exit of guest
would be if the host is also using SPE. If the host is using SPE whilst
the vcpu is 'loaded' but we're not in the guest, then host SPE could raise
an interrupt - we need the E2PB bits to allow access from EL1 (host).
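
As an illustration of the _MASK suggestion above, a minimal sketch of composing
the E2PB field without the bare '3'. The two macro values are assumed to
mirror arch/arm64/include/asm/kvm_arm.h and should be checked against the tree
in use; mdcr_el2_set_e2pb() is a hypothetical helper, not something from the
patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to match the kernel's kvm_arm.h definitions; verify before use. */
#define MDCR_EL2_E2PB_MASK	UINT64_C(0x3)
#define MDCR_EL2_E2PB_SHIFT	12

/* Hypothetical helper: replace the E2PB field of an MDCR_EL2 value. */
uint64_t mdcr_el2_set_e2pb(uint64_t mdcr, uint64_t e2pb)
{
	/* The field is two bits wide, so anything larger is a caller bug. */
	assert(e2pb <= MDCR_EL2_E2PB_MASK);

	/* Clear the old field first, then insert the new value. */
	mdcr &= ~(MDCR_EL2_E2PB_MASK << MDCR_EL2_E2PB_SHIFT);
	return mdcr | (e2pb << MDCR_EL2_E2PB_SHIFT);
}
```

Passing MDCR_EL2_E2PB_MASK as the field value sets both bits, which gives the
same result as the '3 << MDCR_EL2_E2PB_SHIFT' in the hunk, while passing 0
hands the buffer back to EL2.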

Thanks,

Andrew Murray

> 
> 	M.
> 
> -- 
> Jazz is not dead, it just smells funny.


* Re: [PATCH v2 10/18] arm64: KVM/debug: use EL1&0 stage 1 translation regime
  2020-01-13 16:31     ` Andrew Murray
@ 2020-01-15 14:03       ` Marc Zyngier
  0 siblings, 0 replies; 78+ messages in thread
From: Marc Zyngier @ 2020-01-15 14:03 UTC (permalink / raw)
  To: Andrew Murray
  Cc: Catalin Marinas, Will Deacon, kvm, linux-kernel, Sudeep Holla,
	kvmarm, linux-arm-kernel

On 2020-01-13 16:31, Andrew Murray wrote:
> On Sun, Dec 22, 2019 at 10:34:55AM +0000, Marc Zyngier wrote:
>> On Fri, 20 Dec 2019 14:30:17 +0000,
>> Andrew Murray <andrew.murray@arm.com> wrote:
>> >
>> > From: Sudeep Holla <sudeep.holla@arm.com>
>> >
>> > Now that we have all the save/restore mechanism in place, let's enable
>> > the translation regime used by buffer from EL2 stage 1 to EL1 stage 1
>> > on VHE systems.
>> >
>> > Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
>> > [ Reword commit, don't trap to EL2 ]
>> 
>> Not trapping to EL2 for the case where we don't allow SPE in the
>> guest is not acceptable.
>> 
>> > Signed-off-by: Andrew Murray <andrew.murray@arm.com>
>> > ---
>> >  arch/arm64/kvm/hyp/switch.c | 2 ++
>> >  1 file changed, 2 insertions(+)
>> >
>> > diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
>> > index 67b7c160f65b..6c153b79829b 100644
>> > --- a/arch/arm64/kvm/hyp/switch.c
>> > +++ b/arch/arm64/kvm/hyp/switch.c
>> > @@ -100,6 +100,7 @@ static void activate_traps_vhe(struct kvm_vcpu *vcpu)
>> >
>> >  	write_sysreg(val, cpacr_el1);
>> >
>> > +	write_sysreg(vcpu->arch.mdcr_el2 | 3 << MDCR_EL2_E2PB_SHIFT, mdcr_el2);
>> >  	write_sysreg(kvm_get_hyp_vector(), vbar_el1);
>> >  }
>> >  NOKPROBE_SYMBOL(activate_traps_vhe);
>> > @@ -117,6 +118,7 @@ static void __hyp_text __activate_traps_nvhe(struct kvm_vcpu *vcpu)
>> >  		__activate_traps_fpsimd32(vcpu);
>> >  	}
>> >
>> > +	write_sysreg(vcpu->arch.mdcr_el2 | 3 << MDCR_EL2_E2PB_SHIFT, mdcr_el2);
>> 
>> There is a _MASK macro that can replace this '3', and is in keeping
>> with the rest of the code.
>> 
>> It still remains that it looks like the wrong place to do this, and
>> vcpu_load seems much better. Why should you write to mdcr_el2 on each
>> entry to the guest, since you know whether it has SPE enabled at the
>> point where it gets scheduled?
> 
> For nVHE, the only reason we'd want to change E2PB on entry/exit of guest
> would be if the host is also using SPE. If the host is using SPE whilst
> the vcpu is 'loaded' but we're not in the guest, then host SPE could raise
> an interrupt - we need the E2PB bits to allow access from EL1 (host).

My comment was of course for VHE. nVHE hardly makes use of load/put at all,
for obvious reasons.

         M.
-- 
Jazz is not dead. It just smells funny...


Thread overview: 78+ messages
2019-12-20 14:30 [PATCH v2 00/18] arm64: KVM: add SPE profiling support Andrew Murray
2019-12-20 14:30 ` [PATCH v2 01/18] dt-bindings: ARM SPE: highlight the need for PPI partitions on heterogeneous systems Andrew Murray
2019-12-20 14:30 ` [PATCH v2 02/18] arm64: KVM: reset E2PB correctly in MDCR_EL2 when exiting the guest(VHE) Andrew Murray
2019-12-21 13:12   ` Marc Zyngier
2019-12-24 10:29     ` Andrew Murray
2020-01-02 16:21       ` Andrew Murray
2019-12-20 14:30 ` [PATCH v2 03/18] arm64: KVM: define SPE data structure for each vcpu Andrew Murray
2019-12-21 13:19   ` Marc Zyngier
2019-12-24 12:01     ` Andrew Murray
2019-12-20 14:30 ` [PATCH v2 04/18] arm64: KVM: add SPE system registers to sys_reg_descs Andrew Murray
2019-12-20 14:30 ` [PATCH v2 05/18] arm64: KVM/VHE: enable the use PMSCR_EL12 on VHE systems Andrew Murray
2019-12-20 14:30 ` [PATCH v2 06/18] arm64: KVM: split debug save restore across vm/traps activation Andrew Murray
2019-12-20 14:30 ` [PATCH v2 07/18] arm64: KVM/debug: drop pmscr_el1 and use sys_regs[PMSCR_EL1] in kvm_cpu_context Andrew Murray
2019-12-20 14:30 ` [PATCH v2 08/18] arm64: KVM: add support to save/restore SPE profiling buffer controls Andrew Murray
2019-12-21 13:57   ` Marc Zyngier
2019-12-24 10:49     ` Andrew Murray
2019-12-24 15:17       ` Andrew Murray
2019-12-24 15:48         ` Marc Zyngier
2019-12-20 14:30 ` [PATCH v2 09/18] arm64: KVM: enable conditional save/restore full " Andrew Murray
2019-12-20 18:06   ` Mark Rutland
2019-12-24 12:15     ` Andrew Murray
2019-12-21 14:13   ` Marc Zyngier
2020-01-07 15:13     ` Andrew Murray
2020-01-08 11:17       ` Marc Zyngier
2020-01-08 11:58         ` Will Deacon
2020-01-08 12:36           ` Marc Zyngier
2020-01-08 13:10             ` Will Deacon
2020-01-09 11:23               ` Andrew Murray
2020-01-09 11:25                 ` Andrew Murray
2020-01-09 12:01                   ` Will Deacon
2020-01-10 10:54     ` Andrew Murray
2020-01-10 11:04       ` Andrew Murray
2020-01-10 11:51         ` Marc Zyngier
2020-01-10 12:12           ` Andrew Murray
2020-01-10 11:18       ` Marc Zyngier
2020-01-10 12:12         ` Andrew Murray
2020-01-10 13:34           ` Marc Zyngier
2019-12-20 14:30 ` [PATCH v2 10/18] arm64: KVM/debug: use EL1&0 stage 1 translation regime Andrew Murray
2019-12-22 10:34   ` Marc Zyngier
2019-12-24 11:11     ` Andrew Murray
2020-01-13 16:31     ` Andrew Murray
2020-01-15 14:03       ` Marc Zyngier
2019-12-20 14:30 ` [PATCH v2 11/18] KVM: arm64: don't trap Statistical Profiling controls to EL2 Andrew Murray
2019-12-20 18:08   ` Mark Rutland
2019-12-22 10:42   ` Marc Zyngier
2019-12-23 11:56     ` Andrew Murray
2019-12-23 12:05       ` Marc Zyngier
2019-12-23 12:10         ` Andrew Murray
2020-01-09 17:25           ` Andrew Murray
2020-01-09 17:42             ` Mark Rutland
2020-01-09 17:46               ` Andrew Murray
2019-12-20 14:30 ` [PATCH v2 12/18] KVM: arm64: add a new vcpu device control group for SPEv1 Andrew Murray
2019-12-22 11:03   ` Marc Zyngier
2019-12-24 12:30     ` Andrew Murray
2019-12-20 14:30 ` [PATCH v2 13/18] perf: arm_spe: Add KVM structure for obtaining IRQ info Andrew Murray
2019-12-22 11:24   ` Marc Zyngier
2019-12-24 12:35     ` Andrew Murray
2019-12-20 14:30 ` [PATCH v2 14/18] KVM: arm64: spe: Provide guest virtual interrupts for SPE Andrew Murray
2019-12-22 12:07   ` Marc Zyngier
2019-12-24 11:50     ` Andrew Murray
2019-12-24 12:42       ` Marc Zyngier
2019-12-24 13:08         ` Andrew Murray
2019-12-24 13:22           ` Marc Zyngier
2019-12-24 13:36             ` Andrew Murray
2019-12-24 13:46               ` Marc Zyngier
2019-12-20 14:30 ` [PATCH v2 15/18] perf: arm_spe: Handle guest/host exclusion flags Andrew Murray
2019-12-20 18:10   ` Mark Rutland
2019-12-22 12:10   ` Marc Zyngier
2019-12-23 12:10     ` Andrew Murray
2019-12-23 12:18       ` Marc Zyngier
2019-12-20 14:30 ` [PATCH v2 16/18] KVM: arm64: enable SPE support Andrew Murray
2019-12-20 14:30 ` [PATCH v2 17/18, KVMTOOL] update_headers: Sync kvm UAPI headers with linux v5.5-rc2 Andrew Murray
2019-12-20 14:30 ` [PATCH v2 18/18, KVMTOOL] kvm: add a vcpu feature for SPEv1 support Andrew Murray
2019-12-20 17:55 ` [PATCH v2 00/18] arm64: KVM: add SPE profiling support Mark Rutland
2019-12-24 12:54   ` Andrew Murray
2019-12-21 10:48 ` Marc Zyngier
2019-12-22 12:22   ` Marc Zyngier
2019-12-24 12:56     ` Andrew Murray
